[Bug tree-optimization/71361] New: [7 Regression] Changes in ivopts caused perf regression on x86

2016-05-31 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71361

Bug ID: 71361
   Summary: [7 Regression] Changes in ivopts caused perf
regression on x86
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: iverbin at gcc dot gnu.org
CC: amker.cheng at gmail dot com, izamyatin at gmail dot com,
kyukhin at gcc dot gnu.org
  Target Milestone: ---

r235805 leads to performance regression on x86.

Reduced testcase:

int arr_1[512];
int arr_2[512];

int main ()
{
  int c1[512];
  int c2[512];
  int res[512];

  for (int i = 0; i < 512; i++)
    arr_1[i] = arr_2[i] = c1[i] = c2[i] = i;

  for (int l = 0; l < 100; l++)
    for (int k = 1; k <= 9; k++)
      {
        int n1 = 1 << k;
        int n2 = n1 >> 1;

        for (int j = 0; j < n2; j++)
          for (int i = j; i < 512; i += n1)
            {
              int idx = i + n2;
              int x1 = arr_1[idx] * c1[j] + arr_2[idx] * c2[j];
              int x2 = arr_2[idx] * c1[j] + arr_1[idx] * c2[j];

              arr_1[i] = x1;
              arr_2[i] = x2;
              arr_1[idx] = x1;
              arr_2[idx] = x2;
            }
      }

  return 0;
}

Compilation options: -Ofast -m32 -fPIE
GCC is configured --with-arch=corei7 --with-cpu=corei7 --with-fpmath=sse
Run time on Sandy Bridge increased by ~20%
Run time on Atom increased by ~60%

Below are the dumps of the innermost loop after the ivopts pass.

Before the regression there are 2 induction variables, which are used as bases
for all 6 memory accesses:

  # i_66 = PHI <i_37(7), j_65(11)>
  # ivtmp.19_63 = PHI <ivtmp.19_95(7), ivtmp.19_76(11)>
  # ivtmp.20_17 = PHI <ivtmp.20_15(7), ivtmp.20_73(11)>
  _59 = (void *) ivtmp.19_63;
  _58 = (sizetype) n2_20;
  _22 = MEM[base: _59, index: _58, step: 4, offset: 0B];
  _24 = _22 * pretmp_105;
  _55 = (void *) ivtmp.20_17;
  _54 = (sizetype) n2_20;
  _25 = MEM[base: _55, index: _54, step: 4, offset: 0B];
  _27 = _25 * pretmp_107;
  x1_28 = _24 + _27;
  _30 = _25 * pretmp_105;
  _31 = _22 * pretmp_107;
  x2_32 = _30 + _31;
  _51 = (void *) ivtmp.19_63;
  MEM[base: _51, offset: 0B] = x1_28;
  _50 = (void *) ivtmp.20_17;
  MEM[base: _50, offset: 0B] = x2_32;
  _57 = (void *) ivtmp.19_63;
  _56 = (sizetype) n2_20;
  MEM[base: _57, index: _56, step: 4, offset: 0B] = x1_28;
  _53 = (void *) ivtmp.20_17;
  _52 = (sizetype) n2_20;
  MEM[base: _53, index: _52, step: 4, offset: 0B] = x2_32;
  i_37 = n1_19 + i_66;
  ivtmp.19_95 = ivtmp.19_63 + _77;
  ivtmp.20_15 = ivtmp.20_17 + _12;
  if (i_37 <= 511)
goto ;
  else
goto ;

After the regression there is only one induction variable, which is used as the
index for 4 memory accesses:

  # i_66 = PHI <i_37(7), j_65(11)>
  # ivtmp.22_63 = PHI <ivtmp.22_95(7), ivtmp.22_76(11)>
  _22 = MEM[symbol: arr_1, index: ivtmp.22_63, offset: 0B];
  _24 = _22 * pretmp_105;
  _25 = MEM[symbol: arr_2, index: ivtmp.22_63, offset: 0B];
  _27 = _25 * pretmp_107;
  x1_28 = _24 + _27;
  _30 = _25 * pretmp_105;
  _31 = _22 * pretmp_107;
  x2_32 = _30 + _31;
  _17 = (sizetype) i_66;
  _15 = _17 * 4;
  MEM[symbol: arr_1, index: _15, offset: 0B] = x1_28;
  _14 = (sizetype) i_66;
  _12 = _14 * 4;
  MEM[symbol: arr_2, index: _12, offset: 0B] = x2_32;
  MEM[symbol: arr_1, index: ivtmp.22_63, offset: 0B] = x1_28;
  MEM[symbol: arr_2, index: ivtmp.22_63, offset: 0B] = x2_32;
  i_37 = n1_19 + i_66;
  ivtmp.22_95 = ivtmp.22_63 + _77;
  if (i_37 <= 511)
goto ;
  else
goto ;
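
The difference between the two dumps can be sketched in C. This is only an illustrative model, not GCC output: the function names are hypothetical, the arrays are passed as parameters instead of being the globals arr_1 and arr_2, and the element type is unsigned so that the arithmetic cannot overflow in an undefined way. The first function keeps one address induction variable per array (as integers cast to pointers at each use, like ivtmp.19 and ivtmp.20); the second keeps a single byte-offset induction variable and recomputes i * 4 for the two stores at index i.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { N = 512 };

/* Shape of the loop before the change: one address IV per array.
   n2 acts as a loop-invariant scaled index.  The caller must ensure
   that i + n2 < N holds on every iteration.  */
void
butterfly_two_ivs (unsigned *a1, unsigned *a2, int j, int n1, int n2,
                   unsigned c1j, unsigned c2j)
{
  assert (j >= 0 && j < N && n1 > 0 && n2 >= 0);
  uintptr_t iv1 = (uintptr_t) (a1 + j);
  uintptr_t iv2 = (uintptr_t) (a2 + j);
  uintptr_t step = (uintptr_t) n1 * sizeof (unsigned);

  for (int i = j; i < N; i += n1)
    {
      unsigned *p1 = (unsigned *) iv1;
      unsigned *p2 = (unsigned *) iv2;
      unsigned x1 = p1[n2] * c1j + p2[n2] * c2j;
      unsigned x2 = p2[n2] * c1j + p1[n2] * c2j;

      p1[0] = x1;
      p2[0] = x2;
      p1[n2] = x1;
      p2[n2] = x2;
      iv1 += step;
      iv2 += step;
    }
}

/* Shape of the loop after the change: a single byte-offset IV that
   addresses element i + n2 of both arrays; the offset of element i
   is recomputed from i on every iteration.  */
void
butterfly_one_iv (unsigned *a1, unsigned *a2, int j, int n1, int n2,
                  unsigned c1j, unsigned c2j)
{
  assert (j >= 0 && j < N && n1 > 0 && n2 >= 0);
  size_t off = ((size_t) j + (size_t) n2) * sizeof (unsigned);
  size_t step = (size_t) n1 * sizeof (unsigned);

  for (int i = j; i < N; i += n1)
    {
      unsigned *q1 = (unsigned *) ((char *) a1 + off);
      unsigned *q2 = (unsigned *) ((char *) a2 + off);
      unsigned x1 = *q1 * c1j + *q2 * c2j;
      unsigned x2 = *q2 * c1j + *q1 * c2j;
      size_t lo = (size_t) i * sizeof (unsigned);  /* extra multiply */

      *(unsigned *) ((char *) a1 + lo) = x1;
      *(unsigned *) ((char *) a2 + lo) = x2;
      *q1 = x1;
      *q2 = x2;
      off += step;
    }
}
```

Both functions compute the same values; the second form simply trades one induction variable for extra per-iteration address arithmetic, which is the pattern visible in the second dump.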

As a result, the final assembly contains 13% more instructions.

Before regression:

  .L5:
movl    (%edi,%ebx,4), %eax
movd    %xmm1, %edx
movd    %xmm0, %ecx
imull   (%esi,%ebx,4), %ecx
imull   %eax, %edx
addl    %ecx, %edx
movd    %xmm0, %ecx
imull   %ecx, %eax
movd    %xmm1, %ecx
imull   (%esi,%ebx,4), %ecx
movl    %edx, (%esi)
movl    %edx, (%esi,%ebx,4)
movd    %xmm5, %edx
addl    %edx, %esi
addl    %ecx, %eax
movl    %eax, (%edi)
movl    %eax, (%edi,%ebx,4)
movd    %xmm4, %eax
addl    %edx, %edi
addl    %eax, -4124(%ebp)
movl    -4124(%ebp), %ecx
cmpl    $511, %ecx
jle     .L5

After regression:

  .L5:
movd    %xmm5, %edi
movd    %xmm3, %edx
movd    %xmm1, %ebx
imull   (%eax,%edx), %ebx
movd    %xmm4, %ecx
movd    %xmm4, %edx
imull   (%eax,%edi), %ecx
addl    %ecx, %ebx
movd    %xmm1, %ecx
imull   (%eax,%edi), %ecx
movd    %ecx, %xmm0
movd    %xmm3, %ecx
imull   (%eax,%ecx), %edx
movd    %xmm0, %ecx
addl    %edx, %ecx

[Bug target/71088] New: [i386, AVX-512, Perf] vpermi2ps instead of vpermps emitted

2016-05-12 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71088

Bug ID: 71088
   Summary: [i386, AVX-512, Perf] vpermi2ps instead of vpermps
emitted
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: iverbin at gcc dot gnu.org
CC: ienkovich at gcc dot gnu.org, izamyatin at gmail dot com,
kyukhin at gcc dot gnu.org, ubizjak at gmail dot com
  Target Milestone: ---

Testcase:

float foo (float *arr1, float *arr2, float *max_x, int M, float s)
{
  float *res = new float[M];

  for (int i = M - 1; i >= 0; i--)
    for (int j = 0; j <= i; j++)
      {
        float x = arr1[j] * arr2[i - j] + s;
        res[j] = x > max_x[j] ? x : max_x[j];
      }

  return res[0];
}

To reproduce:

$ g++ -S test.cpp -Ofast -funroll-loops -march=knl

GCC emits the vpermi2ps instruction to reverse the elements of arr2.  However,
this instruction writes its result into the index register, so an additional
mov is needed before each vpermi2ps to restore the indexes [1].
There are also some odd movs after each vpermi2ps [2].  It's not clear why the
result of vpermi2ps isn't passed directly to vfmadd132ps.

.L1:
vmovups   (%r11), %zmm9
vmovdqa64 %zmm2, %zmm1  # [1]
vpermi2ps %zmm9, %zmm9, %zmm1
vmovdqa64 %zmm2, %zmm16 # [1]
vmovaps   %zmm1, %zmm10 # [2]
vmovdqa64 %zmm2, %zmm1  # [1]
vmovups   -64(%r11), %zmm12
vfmadd132ps   (%rax,%r9), %zmm3, %zmm10
vpermi2ps %zmm12, %zmm12, %zmm1
vmaxps    (%rcx,%r9), %zmm10, %zmm11
vmovaps   %zmm1, %zmm13 # [2]
vmovdqa64 %zmm2, %zmm1  # [1]
vmovups   -128(%r11), %zmm15
vfmadd132ps   64(%rax,%r9), %zmm3, %zmm13
vmovups   -192(%r11), %zmm6
vpermi2ps %zmm15, %zmm15, %zmm1
vpermi2ps %zmm6, %zmm6, %zmm16
vmovaps   %zmm1, %zmm4  # [2]
vmovaps   %zmm16, %zmm7 # [2]
vmaxps    64(%rcx,%r9), %zmm13, %zmm14
vfmadd132ps   128(%rax,%r9), %zmm3, %zmm4
vfmadd132ps   192(%rax,%r9), %zmm3, %zmm7
vmaxps    128(%rcx,%r9), %zmm4, %zmm5
leal  4(%r15), %r15d
vmaxps    192(%rcx,%r9), %zmm7, %zmm8
cmpl  %esi, %r15d
vmovups   %zmm11, (%r8,%r9)
leaq  -256(%r11), %r11
vmovups   %zmm14, 64(%r8,%r9)
vmovups   %zmm5, 128(%r8,%r9)
vmovups   %zmm8, 192(%r8,%r9)
leaq  256(%r9), %r9
jb        .L1

Instead of this, vpermps can be used.  It doesn't overwrite the index register,
which makes it possible to get rid of 8 movs in this loop:

.L2:
lea   (,%r12,4), %r10
negq  %r10
addq  %rbx, %r10
vpermps   -64(%r10), %zmm3, %zmm4
vpermps   -128(%r10), %zmm3, %zmm6
vpermps   -192(%r10), %zmm3, %zmm8
vpermps   -256(%r10), %zmm3, %zmm10
vfmadd132ps   (%r11,%r12,4), %zmm2, %zmm4
vfmadd132ps   64(%r11,%r12,4), %zmm2, %zmm6
vfmadd132ps   128(%r11,%r12,4), %zmm2, %zmm8
vfmadd132ps   192(%r11,%r12,4), %zmm2, %zmm10
vmaxps    (%r13,%r12,4), %zmm4, %zmm5
vmovups   %zmm5, (%rdi,%r12,4)
vmaxps    64(%r13,%r12,4), %zmm6, %zmm7
vmovups   %zmm7, 64(%rdi,%r12,4)
vmaxps    128(%r13,%r12,4), %zmm8, %zmm9
vmovups   %zmm9, 128(%rdi,%r12,4)
vmaxps    192(%r13,%r12,4), %zmm10, %zmm11
vmovups   %zmm11, 192(%rdi,%r12,4)
addq  $64, %r12
cmpq  %rax, %r12
jb        .L2
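
The cost of the overwritten index register can be illustrated with a scalar C model. This is a sketch under stated assumptions, not the instructions' exact definition: the names lane_t, model_permute, model_permute2, reverse_with_permute and reverse_with_permute2 are hypothetical, one "register" is modelled as 16 lanes, and the two-source permute is assumed to pick its source from bit 4 of each index. The point is only that the second form has to re-create the index vector before every use, which corresponds to the movs marked [1] above.

```c
#include <assert.h>

enum { LANES = 16 };

/* A lane that can hold either an index or a float, because the modelled
   two-source permute reuses its index operand as the destination.  */
typedef union { int i; float f; } lane_t;

/* Model in the style of vpermps: dst[i] = src[idx[i]]; idx is only read.
   dst and src must not overlap.  */
static void
model_permute (float *dst, const int *idx, const float *src)
{
  for (int i = 0; i < LANES; i++)
    dst[i] = src[idx[i] & (LANES - 1)];
}

/* Model in the style of vpermi2ps: the index operand is also the
   destination, so the indexes are destroyed by the operation.  */
static void
model_permute2 (lane_t *idx_dst, const float *a, const float *b)
{
  for (int i = 0; i < LANES; i++)
    {
      int sel = idx_dst[i].i & (2 * LANES - 1);
      idx_dst[i].f = sel < LANES ? a[sel] : b[sel - LANES];
    }
}

/* Reverse each 16-element block of IN using the read-only-index form.  */
void
reverse_with_permute (float *out, const float *in, int nblocks)
{
  assert (nblocks >= 0);
  int rev[LANES];
  for (int i = 0; i < LANES; i++)
    rev[i] = LANES - 1 - i;

  for (int b = 0; b < nblocks; b++)
    model_permute (out + b * LANES, rev, in + b * LANES);
}

/* Same result with the index-overwriting form.  Returns how many times
   the index vector had to be copied (once per block).  */
int
reverse_with_permute2 (float *out, const float *in, int nblocks)
{
  assert (nblocks >= 0);
  int copies = 0;
  int rev[LANES];
  for (int i = 0; i < LANES; i++)
    rev[i] = LANES - 1 - i;

  for (int b = 0; b < nblocks; b++)
    {
      lane_t tmp[LANES];
      for (int i = 0; i < LANES; i++)
        tmp[i].i = rev[i];              /* the extra copy, like [1] */
      copies++;
      model_permute2 (tmp, in + b * LANES, in + b * LANES);
      for (int i = 0; i < LANES; i++)
        out[b * LANES + i] = tmp[i].f;
    }
  return copies;
}
```

Calling both functions on the same input gives identical output, while reverse_with_permute2 reports one index copy per block.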

[Bug other/69582] [meta-bug] Cilk+

2016-04-20 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69582
Bug 69582 depends on bug 69363, which changed state.

Bug 69363 Summary: ICE when doing a pragma simd reduction with max
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69363

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|---                         |FIXED

[Bug c++/69363] ICE when doing a pragma simd reduction with max

2016-04-20 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69363

iverbin at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|---                         |FIXED

--- Comment #7 from iverbin at gcc dot gnu.org ---
Fixed in GCC 7.

[Bug c++/69363] ICE when doing a pragma simd reduction with max

2016-04-20 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69363

--- Comment #6 from iverbin at gcc dot gnu.org ---
Author: iverbin
Date: Wed Apr 20 15:48:53 2016
New Revision: 235290

URL: https://gcc.gnu.org/viewcvs?rev=235290&root=gcc&view=rev
Log:
Fix PR c++/69363

gcc/c-family/
PR c++/69363
* c-cilkplus.c (c_finish_cilk_clauses): Remove function.
* c-common.h (c_finish_cilk_clauses): Remove declaration.
gcc/c/
PR c++/69363
* c-parser.c (c_parser_cilk_all_clauses): Use c_finish_omp_clauses
instead of c_finish_cilk_clauses.
* c-tree.h (c_finish_omp_clauses): Add new default argument.
* c-typeck.c (c_finish_omp_clauses): Add new argument.  Allow
floating-point variables in the linear clause for Cilk Plus.
gcc/cp/
PR c++/69363
* cp-tree.h (finish_omp_clauses): Add new default argument.
* parser.c (cp_parser_cilk_simd_all_clauses): Use finish_omp_clauses
instead of c_finish_cilk_clauses.
* semantics.c (finish_omp_clauses): Add new argument.  Allow
floating-point variables in the linear clause for Cilk Plus.
gcc/testsuite/
PR c++/69363
* c-c++-common/cilk-plus/PS/clauses3.c: Adjust dg-error string.
* c-c++-common/cilk-plus/PS/clauses4.c: New test.
* c-c++-common/cilk-plus/PS/pr69363.c: New test.

Added:
trunk/gcc/testsuite/c-c++-common/cilk-plus/PS/clauses4.c
trunk/gcc/testsuite/c-c++-common/cilk-plus/PS/pr69363.c
Modified:
trunk/gcc/c-family/ChangeLog
trunk/gcc/c-family/c-cilkplus.c
trunk/gcc/c-family/c-common.h
trunk/gcc/c/ChangeLog
trunk/gcc/c/c-parser.c
trunk/gcc/c/c-tree.h
trunk/gcc/c/c-typeck.c
trunk/gcc/cp/ChangeLog
trunk/gcc/cp/cp-tree.h
trunk/gcc/cp/parser.c
trunk/gcc/cp/semantics.c
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/c-c++-common/cilk-plus/PS/clauses3.c

[Bug middle-end/70506] New: [CilkPlus] error: location references block not in block tree

2016-04-01 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70506

Bug ID: 70506
   Summary: [CilkPlus] error: location references block not in
block tree
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Keywords: ice-checking
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: iverbin at gcc dot gnu.org
  Target Milestone: ---

$ cat test.c
void foo ()
{
  int i, x = 0;

  _Cilk_for (i = 0; i < 100; i++)
    x++;

  _Cilk_for (i = 0; i < 100; i++)
    x++;
}

$ gcc -c -fcilkplus test.c
test.c: In function ‘foo._cilk_for_fn.0’:
test.c:10:1: error: location references block not in block tree
 }
 ^
D.1952 = .omp_data_i->x;
test.c:10:1: error: location references block not in block tree
D.1953 = D.1952 + 1;
test.c:10:1: error: location references block not in block tree
.omp_data_i->x = D.1953;
test.c:10:1: internal compiler error: verify_gimple failed
0xe31754 verify_gimple_in_cfg(function*, bool)
gcc/tree-cfg.c:5125
0xcd17a8 execute_function_todo
gcc/passes.c:1958
0xcd0941 do_per_function
gcc/passes.c:1652
0xcd1984 execute_todo
gcc/passes.c:2010
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.


Here is DECL_INITIAL (fn->decl) block for foo._cilk_for_fn.0:

 
unit size 
align 32 symtab 0 alias set -1 canonical type 0x... precision 32
min  max 
pointer_to_this >
used SI file test.c line 3 col 10 size  unit size

align 32 context 
value-expr 

arg 0 
nothrow arg 0 
arg 1 > arg 1 >>>


However, similar OpenMP testcase works fine, because corresponding DECL_INITIAL
(fn->decl) contains a subblock:

 
unit size 
align 32 symtab 0 alias set -1 canonical type 0x... precision 32
min  max 
pointer_to_this >
used SI file test.c line 3 col 10 size  unit size

align 32 context 
value-expr 

arg 0 
nothrow arg 0 
arg 1 > arg 1 >>
subblocks 
used SI file test.c line 3 col 7 size  unit
size 
align 32 context > supercontext
>>

[Bug testsuite/64177] Various cilk+ testsuite failures

2016-03-29 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64177

--- Comment #2 from iverbin at gcc dot gnu.org ---
Actually, only 3 tests require 2+ workers (they fail with export
CILK_NWORKERS=1):
FAIL: c-c++-common/cilk-plus/CK/spawning_arg.c
FAIL: c-c++-common/cilk-plus/CK/steal_check.c
FAIL: g++.dg/cilk-plus/CK/catch_exc.cc

It's unclear what happens with others. Maybe it's a bug in libcilkrts.

[Bug testsuite/64177] Various cilk+ testsuite failures

2016-03-28 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64177

iverbin at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |iverbin at gcc dot gnu.org,
                   |                            |tschwinge at gcc dot gnu.org

--- Comment #1 from iverbin at gcc dot gnu.org ---
This issue was discussed here:
https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01817.html

We need __cilkrts_set_param ("nworkers", "2"); in such tests.

[Bug driver/68463] Offloading fails when some objects are compiled with LTO and some without

2016-02-25 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68463

iverbin at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #5 from iverbin at gcc dot gnu.org ---
Fixed in GCC 6.

[Bug driver/68463] Offloading fails when some objects are compiled with LTO and some without

2016-02-25 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68463

--- Comment #4 from iverbin at gcc dot gnu.org ---
Author: iverbin
Date: Thu Feb 25 12:23:52 2016
New Revision: 233712

URL: https://gcc.gnu.org/viewcvs?rev=233712&root=gcc&view=rev
Log:
gcc/
PR driver/68463
* config/gnu-user.h (CRTOFFLOADBEGIN): Define.  Add crtoffloadbegin.o if
offloading is enabled and -fopenacc or -fopenmp is specified.
(CRTOFFLOADEND): Likewise.
(GNU_USER_TARGET_STARTFILE_SPEC): Add CRTOFFLOADBEGIN.
(GNU_USER_TARGET_ENDFILE_SPEC): Add CRTOFFLOADEND.
* lto-wrapper.c (offloadbegin, offloadend): Remove static vars.
(offload_objects_file_name): New static var.
(tool_cleanup): Remove offload_objects_file_name file.
(find_offloadbeginend): Replace with ...
(find_crtoffloadtable): ... this.
(run_gcc): Remove offload_argc and offload_argv.
Get offload_objects_file_name from -foffload-objects=... option.
Read names of object files with offload from this file, pass them to
compile_images_for_offload_targets.  Don't call find_offloadbeginend and
don't pass offloadbegin and offloadend to the linker.  Don't pass
offload non-LTO files to the linker, because now they're not claimed.
libgcc/
PR driver/68463
* Makefile.in (crtoffloadtable$(objext)): New rule.
* configure.ac (extra_parts): Add crtoffloadtable$(objext) if
enable_offload_targets is not empty.
* configure: Regenerate.
* offloadstuff.c: Move __OFFLOAD_TABLE__ from crtoffloadend to
crtoffloadtable.
libgomp/
PR driver/68463
* testsuite/libgomp.oacc-c-c++-common/parallel-dims-2.c: Remove.
lto-plugin/
PR driver/68463
* lto-plugin.c (struct plugin_offload_file): New.
(offload_files): Change type.
(offload_files_last, offload_files_last_obj): New.
(offload_files_last_lto): New.
(free_2): Adjust accordingly.
(all_symbols_read_handler): Don't add offload files to lto_arg_ptr.
Don't call free_1 for offload_files.  Write names of object files with
offloading to the temporary file.  Add new option to lto_arg_ptr.
(claim_file_handler): Don't claim file if it contains offload sections
without LTO sections.  If it contains offload sections, add to the
list.

Removed:
trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims-2.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/gnu-user.h
trunk/gcc/lto-wrapper.c
trunk/libgcc/ChangeLog
trunk/libgcc/Makefile.in
trunk/libgcc/configure
trunk/libgcc/configure.ac
trunk/libgcc/offloadstuff.c
trunk/libgomp/ChangeLog
trunk/lto-plugin/ChangeLog
trunk/lto-plugin/lto-plugin.c

[Bug c++/69363] ICE when doing a pragma simd reduction with max

2016-02-17 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69363

iverbin at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |WAITING

--- Comment #5 from iverbin at gcc dot gnu.org ---
Waiting for stage1: https://gcc.gnu.org/ml/gcc-patches/2016-02/msg01183.html

[Bug c++/69363] ICE when doing a pragma simd reduction with max

2016-02-16 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69363

iverbin at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

--- Comment #3 from iverbin at gcc dot gnu.org ---
It turns out that there is at least one difference between Cilk Plus and OpenMP
finalization - Cilk Plus allows float and double variables in linear clause,
while OpenMP doesn't.  I'm going to adjust finish_omp_clauses accordingly.

[Bug c++/69363] ICE when doing a pragma simd reduction with max

2016-02-15 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69363

iverbin at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |iverbin at gcc dot gnu.org

--- Comment #2 from iverbin at gcc dot gnu.org ---
(In reply to Jakub Jelinek from comment #1)
> The bug is that cp_parser_cilk_simd_all_clauses and
> c_parser_cilk_simd_all_clauses calls c_finish_cilk_clauses rather than the
> OpenMP clauses finalization routines in each of the FEs, perhaps with some
> argument that would say it wants Cilk+ semantics instead of OpenMP.
> That way, it misses lots of important actions that need to be performed on
> the clauses.

Yep, this patch fixes the original testcase and 23 failures in the Cilk Plus
Conformance Suite v1.2.1 without new regressions.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 6f47edf..9e12a96 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -37621,7 +37621,7 @@ cp_parser_cilk_simd_all_clauses (cp_parser *parser, cp_token *pragma_token)
   if (clauses == error_mark_node)
     return error_mark_node;
   else
-    return c_finish_cilk_clauses (clauses);
+    return finish_omp_clauses (clauses, false);
 }

 /* Main entry-point for parsing Cilk Plus <#pragma simd> for loops.  */

[Bug libgomp/69607] undefined reference to MAIN__._omp_fn.0 in atomic_capture-1.f with -flto

2016-02-04 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69607

--- Comment #7 from iverbin at gcc dot gnu.org ---
I believe we should drop support of offloading without linker plugin.

[Bug libgomp/69607] undefined reference to MAIN__._omp_fn.0 in atomic_capture-1.f with -flto

2016-02-04 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69607

iverbin at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2016-02-04
     Ever confirmed|0                           |1

--- Comment #9 from iverbin at gcc dot gnu.org ---
(In reply to vries from comment #8)
> (In reply to iverbin from comment #7)
> > I believe we should drop support of offloading without linker plugin.
> 
> Same failures occur with -fuse-linker-plugin though.

Ok, I see.
Maybe we could promote all *omp_fn* to global? It should fix "undefined
reference" from offload table in one partition to *omp_fn* in another.

[Bug fortran/69090] Allocatable arrays mishandled in 'omp declare target'

2016-01-02 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69090

iverbin at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |iverbin at gcc dot gnu.org

--- Comment #4 from iverbin at gcc dot gnu.org ---
(In reply to Alexander Monakov from comment #0)
> Compiling and running the following testcase with non-shared-memory
> accelerator segfaults in the target region, because only pointed-to data of
> the allocatable array is copied, but not the array structure (.data, .offset
> fields) itself.  From my reading of the OpenMP spec, allocatable arrays are
> not explicitely allowed in the 'declare target' directive, so the code is
> ill-formed. However, no diagnostic is issued, and generally I don't know
> what GCC intends to do here.

Do you mean *global* allocatable arrays, or do locals fail too?
We discussed it a bit here: https://gcc.gnu.org/ml/gcc/2015-03/msg00238.html
There is also a discussion on the OpenMP ML about clarifying the spec; as I
understand it, global allocatable arrays are not allowed by OpenMP 4.5, so it
would be nice to have a diagnostic instead of a segfault.

[Bug driver/68463] Offloading fails when some objects are compiled with LTO and some without

2015-11-30 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68463

iverbin at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-11-30
                 CC|                            |bernds at gcc dot gnu.org,
                   |                            |hubicka at gcc dot gnu.org,
                   |                            |jakub at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #2 from iverbin at gcc dot gnu.org ---
> I presume the same issue exists for GCC 5.
Yes.

It seems that we can fix this issue by passing a new option to lto-wrapper,
which will contain a list of object files with offload (or the name of a file
containing the list).  It will also allow removing some hacky code from
lto-wrapper, like this comparison:
if (strncmp (argv[i], "-fresolution=", sizeof ("-fresolution=") ...

E.g., if there are 4 objects:
* obj1.o - non-LTO, offload;
* obj2.o - LTO, non-offload;
* obj3.o - non-LTO, non-offload;
* obj4.o - LTO, offload;

then the linker plugin will claim only obj2.o and obj4.o, as was intended.  So
it will call lto-wrapper, passing obj2.o and obj4.o as argv.  But additionally
the linker plugin will pass something like: -foffload_objects="obj1.o,obj4.o".
lto-wrapper will perform LTO on the objects from argv as usual, and additionally
compile target images using offload IR from obj1.o and obj4.o.
The tables should also match, because the host table will consist of pieces from
all LTO objects with offload plus pieces from non-LTO objects with offload.  We
just need to reorder offload_objects correspondingly before passing them to the
target compiler (obj4.o,obj1.o).
However, in this case obj1.o and obj4.o cannot both be surrounded by
crtoffload{begin,end}.o, because lto-wrapper cannot place crtoffload* before or
after obj1.o, since it is unclaimed.  But I guess this can be fixed by
something like a linker script, which will place sections from crtoffload* at
the beginning/end of the final joint section.
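
The selection and ordering described in this comment can be sketched in C. This only models the proposal and is not code from lto-plugin or lto-wrapper: struct object and the functions list_claimed and list_offload_objects are hypothetical names, and the comma-separated output format is just the one used in the example above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one input object file.  */
struct object
{
  const char *name;
  bool has_lto;      /* contains LTO IR */
  bool has_offload;  /* contains offload IR */
};

/* Append NAME to the comma-separated list in BUF (capacity CAP).  */
static void
append_name (char *buf, size_t cap, const char *name)
{
  size_t len = strlen (buf);
  size_t need = strlen (name) + (len ? 1 : 0);
  assert (len + need < cap);
  if (len)
    buf[len++] = ',';
  strcpy (buf + len, name);
}

/* Objects the linker plugin would claim: only those with LTO IR.  */
void
list_claimed (const struct object *objs, size_t n, char *buf, size_t cap)
{
  assert (cap > 0);
  buf[0] = '\0';
  for (size_t i = 0; i < n; i++)
    if (objs[i].has_lto)
      append_name (buf, cap, objs[i].name);
}

/* Objects with offload IR, reordered so that LTO objects come first and
   non-LTO objects follow, matching the host table layout described.  */
void
list_offload_objects (const struct object *objs, size_t n, char *buf,
                      size_t cap)
{
  assert (cap > 0);
  buf[0] = '\0';
  for (size_t i = 0; i < n; i++)
    if (objs[i].has_offload && objs[i].has_lto)
      append_name (buf, cap, objs[i].name);
  for (size_t i = 0; i < n; i++)
    if (objs[i].has_offload && !objs[i].has_lto)
      append_name (buf, cap, objs[i].name);
}
```

For the four objects in the example, list_claimed yields "obj2.o,obj4.o" and list_offload_objects yields "obj4.o,obj1.o".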

[Bug other/68463] New: Offloading fails when some objects are compiled with LTO and some without

2015-11-20 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68463

Bug ID: 68463
   Summary: Offloading fails when some objects are compiled with
LTO and some without
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Keywords: openacc, openmp
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: iverbin at gcc dot gnu.org
  Target Milestone: ---

The reproducer:

$ cat foo.c

void foo ()
{
  #pragma omp target
  ;
}

$ cat bar.c

void bar ()
{
  #pragma omp target
  ;
}

$ cat main.c

extern void foo ();
extern void bar ();

int main ()
{
  foo ();
  bar ();
  return 0;
}

$ gcc -c -fopenmp -flto foo.c
$ gcc -c -fopenmp bar.c main.c
$ gcc -fopenmp foo.o bar.o main.o

main.o: In function `main':
main.c:(.text+0x14): undefined reference to `bar'
collect2: error: ld returned 1 exit status

This happens because the linker plugin in claim_file_handler claims bar.o, and
the linker just drops it, because it considers bar.o to be an LTO object.
Without offload the plugin claims only LTO objects, but now it claims objects
with any IR.  (Yes, offloading misuses lto-plugin and lto-wrapper a bit.)

Even worse, it also fails with -foffload=disable, because we decided to stream
out offload IR unconditionally:
https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00628.html
So -foffload=disable only disables compilation of target images in lto-wrapper,
but objects are handled by the linker plugin before that.

The first solution that comes to mind is not to claim objects that contain
offload IR without LTO IR.  But this will cause a run-time error:
"libgomp: Cannot map target functions or variables (expected 1, have 2)",
because lto-wrapper will surround only *.ltrans.o (derived from foo.o) with
crtoffload{begin,end}.o, and bar.o will be added at the end of the list of
objects from lto-wrapper.  But we need this order to get a correct host table:
"crtoffloadbegin.o, *.ltrans.o, bar.o, crtoffloadend.o".
Here is a bit more about tables:
https://gcc.gnu.org/wiki/Offloading#Address_mapping_tables
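
Why the link order matters can be shown with a toy model in C. It is a simplification under stated assumptions, not libgomp or linker code: the types and the functions host_table_size and total_offload_entries are hypothetical, and each object is assumed to contribute a fixed number of table entries that count only when they land between the begin and end markers.

```c
#include <assert.h>
#include <stddef.h>

/* Kind of item in a modelled link order.  */
enum item_kind { TABLE_BEGIN, TABLE_END, OBJECT };

struct link_item
{
  enum item_kind kind;
  int offload_entries;  /* entries contributed by an OBJECT, else unused */
};

/* Number of entries the host would see: only those placed between the
   begin marker and the end marker.  */
int
host_table_size (const struct link_item *order, size_t n)
{
  assert (order != NULL || n == 0);
  int count = 0;
  int inside = 0;
  for (size_t i = 0; i < n; i++)
    switch (order[i].kind)
      {
      case TABLE_BEGIN:
        inside = 1;
        break;
      case TABLE_END:
        inside = 0;
        break;
      case OBJECT:
        if (inside)
          count += order[i].offload_entries;
        break;
      }
  return count;
}

/* Number of entries in the target image: everything with offload IR is
   compiled for the target, regardless of its position.  */
int
total_offload_entries (const struct link_item *order, size_t n)
{
  assert (order != NULL || n == 0);
  int count = 0;
  for (size_t i = 0; i < n; i++)
    if (order[i].kind == OBJECT)
      count += order[i].offload_entries;
  return count;
}
```

With the order begin, *.ltrans.o, end, bar.o the model gives a host table of 1 entry against 2 target entries, which mirrors the "expected 1, have 2" message; with bar.o moved before the end marker both counts are 2.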

Or maybe we should implement a new linker offload-plugin with its own
offload-wrapper, but I don't know how difficult it would be to support 2 plugins
in the linkers, and it really doesn't solve the issue with
crtoffload{begin,end}.o placement.

Or maybe just print an error during linking that offloading doesn't support
mixing LTO and non-LTO objects (even if some of them don't have offload)?

[Bug other/67652] liboffloadmic/runtime/offload_engine.cpp:176: strange expression in sizeof ?

2015-09-29 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67652

iverbin at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #4 from iverbin at gcc dot gnu.org ---
Fixed in trunk.
GCC 5 doesn't have such an issue.


[Bug other/67652] liboffloadmic/runtime/offload_engine.cpp:176: strange expression in sizeof ?

2015-09-28 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67652

--- Comment #3 from iverbin at gcc dot gnu.org ---
Author: iverbin
Date: Mon Sep 28 16:10:16 2015
New Revision: 228210

URL: https://gcc.gnu.org/viewcvs?rev=228210&root=gcc&view=rev
Log:
PR other/67652
liboffloadmic/
* runtime/offload_engine.cpp (Engine::init_process): Fix sizeof.

Modified:
trunk/liboffloadmic/ChangeLog
trunk/liboffloadmic/runtime/offload_engine.cpp


[Bug c/67652] liboffloadmic/runtime/offload_engine.cpp:176: strange expression in sizeof ?

2015-09-21 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67652

iverbin at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org|iverbin at gcc dot gnu.org

--- Comment #2 from iverbin at gcc dot gnu.org ---
Thanks, I will fix it.


[Bug libgomp/66950] FAIL: libgomp.fortran/examples-4/simd-7.f90 -O0 execution test

2015-08-30 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66950

iverbin at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
                 CC|                            |iverbin at gcc dot gnu.org
         Resolution|---                         |FIXED

--- Comment #2 from iverbin at gcc dot gnu.org ---
Fixed.


[Bug libgomp/66950] FAIL: libgomp.fortran/examples-4/simd-7.f90 -O0 execution test

2015-07-22 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66950

--- Comment #1 from iverbin at gcc dot gnu.org ---
Author: iverbin
Date: Wed Jul 22 17:19:31 2015
New Revision: 226080

URL: https://gcc.gnu.org/viewcvs?rev=226080&root=gcc&view=rev
Log:
2015-07-22  Maxim Blumenthal  maxim.blument...@intel.com

PR libgomp/66950
* testsuite/libgomp.c/examples-4/simd-7.c (N): Change to 30 from 45.
(fib_ref): New function.
(fib): Correct corner cases in the recursion.
(main): Replace the non-simd loop with fib_ref call.
* testsuite/libgomp.fortran/examples-4/simd-7.f90: (fib_ref): New
subroutine.
(fibonacci): Lower the parameter N to 30.  Correct accordingly check
for the last array element value.  Replace the non-simd loop with
fib_ref call.  Remove redundant b_ref array.  Remove the comparison
of the last array element with according Fibonacci sequence element.
(fib): Correct corner cases in the recursion.

Modified:
trunk/libgomp/ChangeLog
trunk/libgomp/testsuite/libgomp.c/examples-4/simd-7.c
trunk/libgomp/testsuite/libgomp.fortran/examples-4/simd-7.f90


[Bug libgomp/65338] Offloading from DSO is broken after OpenACC merge to trunk

2015-04-07 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65338

iverbin at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #2 from iverbin at gcc dot gnu.org ---
Fixed in trunk (r221878).


[Bug libgomp/65338] New: [5 Regression] Offloading from DSO is broken after OpenACC merge to trunk

2015-03-06 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65338

Bug ID: 65338
   Summary: [5 Regression] Offloading from DSO is broken after
OpenACC merge to trunk
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libgomp
  Assignee: unassigned at gcc dot gnu.org
  Reporter: iverbin at gcc dot gnu.org
CC: jakub at gcc dot gnu.org, kyukhin at gcc dot gnu.org,
tschwinge at gcc dot gnu.org

The testcase:

+ test.c: +

int f_aaa (void);

int main ()
{
  int x = f_aaa ();
  #pragma omp target
x++;
  return x;
}

+ libaaa.c: +

int f_aaa (void)
{
  int x = 0;
  #pragma omp target
x = 10;
  return x;
}

++

$ gcc -fopenmp -shared -fPIC libaaa.c -o libaaa.so
$ gcc -fopenmp -L. -laaa test.c
$ ./a.out
libgomp: Target function wasn't mapped


The problem is caused by this change:

-gomp_register_images_for_device (struct gomp_device_descr *device)
+gomp_register_image_for_device (struct gomp_device_descr *device,
+                                struct offload_image_descr *image)
 {
-  int i;
-  for (i = 0; i < num_offload_images; i++)
+  if (!device->offload_regions_registered
+      && (device->type == image->type
+          || device->type == OFFLOAD_TARGET_TYPE_HOST))
     {
-      struct offload_image_descr *image = offload_images[i];
-      if (image->type == device->type)
-        device->register_image_func (image->host_table, image->target_data);
+      device->register_image_func (image->host_table, image->target_data);
+      device->offload_regions_registered = true;
     }
 }

We should at least remove device->offload_regions_registered, or rework
loading/registration to support dlopen'ed libraries. Related mail thread:
https://gcc.gnu.org/ml/gcc-patches/2015-02/msg01455.html
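
The effect of the flag can be reproduced with a small stand-alone C model. It is a sketch, not libgomp code: the struct layouts, the num_registered counter and both function names are hypothetical, and the OFFLOAD_TARGET_TYPE_HOST arm of the condition is left out for brevity. It only shows that a per-device "registered" flag lets the first image through and silently skips every later one, such as the image coming from libaaa.so.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for the descriptors in the diff.  */
struct image
{
  int type;
  const char *name;
};

struct device
{
  int type;
  bool offload_regions_registered;
  int num_registered;  /* how many images reached the device */
};

/* Behaviour modelled on the new code: register at most one image.  */
void
register_image_once (struct device *dev, const struct image *img)
{
  assert (dev != NULL && img != NULL);
  if (!dev->offload_regions_registered && dev->type == img->type)
    {
      dev->num_registered++;
      dev->offload_regions_registered = true;
    }
}

/* Behaviour without the flag: every matching image is registered.  */
void
register_image_each (struct device *dev, const struct image *img)
{
  assert (dev != NULL && img != NULL);
  if (dev->type == img->type)
    dev->num_registered++;
}
```

Registering two images of the same type (one for the executable, one for the shared library) leaves num_registered at 1 with the first function and at 2 with the second.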


[Bug lto/63923] FAIL: libgomp.c/examples-4/e.50.1.c (test for excess errors)

2015-01-30 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63923

iverbin at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #3 from iverbin at gcc dot gnu.org ---
Fixed.


[Bug testsuite/64605] [5 Regression] ERROR: (DejaGnu) proc libatomic_target_compile lto1738.c lto1738.o object additional_flags=-flto does not exist.

2015-01-16 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64605

--- Comment #2 from iverbin at gcc dot gnu.org ---
Author: iverbin
Date: Fri Jan 16 11:29:54 2015
New Revision: 219722

URL: https://gcc.gnu.org/viewcvs?rev=219722&root=gcc&view=rev
Log:
PR testsuite/64605

libatomic/
* testsuite/lib/libatomic.exp: Do not load gcc-dg.exp.
* testsuite/libatomic.c/c.exp: Load gcc-dg.exp.

Modified:
trunk/libatomic/ChangeLog
trunk/libatomic/testsuite/lib/libatomic.exp
trunk/libatomic/testsuite/libatomic.c/c.exp


[Bug testsuite/64605] [5 Regression] ERROR: (DejaGnu) proc libatomic_target_compile lto1738.c lto1738.o object additional_flags=-flto does not exist.

2015-01-16 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64605

iverbin at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
                 CC|                            |mikestump at comcast dot net
         Resolution|---                         |FIXED

--- Comment #3 from iverbin at gcc dot gnu.org ---
Fixed by r219722


[Bug testsuite/64605] New: [5 Regression] ERROR: (DejaGnu) proc libatomic_target_compile lto1738.c lto1738.o object additional_flags=-flto does not exist.

2015-01-14 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64605

Bug ID: 64605
   Summary: [5 Regression] ERROR: (DejaGnu) proc
libatomic_target_compile lto1738.c lto1738.o object
additional_flags=-flto does not exist.
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: iverbin at gcc dot gnu.org

A fix for check_effective_target_lto (r219606) caused an ERROR in libatomic
testsuite.

To reproduce: make check-target-libatomic

ERROR: (DejaGnu) proc libatomic_target_compile lto4486.c lto4486.o object
additional_flags=-flto does not exist.
The error code is NONE
The info on the error is:
invalid command name libatomic_target_compile
while executing
::tcl_unknown libatomic_target_compile lto4486.c lto4486.o object
additional_flags=-flto
(uplevel body line 1)
invoked from within
uplevel 1 ::tcl_unknown $args


[Bug middle-end/64412] [regression] ICE in offload compiler: in extract_insn, at recog.c:2327

2015-01-08 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64412

--- Comment #19 from iverbin at gcc dot gnu.org ---
(In reply to iverbin from comment #18)
> It seems that the problem with offload is that -fPIC option is passed to the
> offload compiler, but not passed to the host compiler. If I add -fPIC to the
> host compiler as well, everything is ok.
>
> I don't know how -fPIC option affects IR before streaming out,
> -fdump-tree-optimized are identical for pic/nonpic cases, but
> .gnu.offload_lto_.decls sections are different. However debug_tree
> (vnode->decl) for G in ipa_write_summaries are identical for pic/nonpic
> cases.
>
> So, the question is, how to figure out what is different in G's declaration
> in IR, and how it can affect further expansion?

The regression is caused by LTO streaming of TARGET_OPTIMIZE_NODE:
https://gcc.gnu.org/ml/gcc-patches/2015-01/msg00376.html


[Bug middle-end/64412] [regression] ICE in offload compiler: in extract_insn, at recog.c:2327

2014-12-31 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64412

--- Comment #18 from iverbin at gcc dot gnu.org ---
(In reply to Uroš Bizjak from comment #17)
> The x86 backend did survive many years just fine, so I think offload should
> be fixed to follow approach that generic middle-end takes. The testcase I
> posted expands without problems; in this case middle-end knows that symbol
> address has to be moved to a register. It looks like offload bypasses
> generic expansion functions (that would magically fix this issue).

It seems that the problem with offload is that -fPIC option is passed to the
offload compiler, but not passed to the host compiler. If I add -fPIC to the
host compiler as well, everything is ok.

Offload RTL for pic host, pic offload:
  (insn 8 6 9 4 (set (reg:DI 88)
  (mem/u/c:DI (const:DI (unspec:DI [
  (symbol_ref:DI ("G") <var_decl 0x7f2c78238c60 G>)
  ] UNSPEC_GOTPCREL)) [0  S8 A8])) addr.c:6 -1
   (nil))

Offload RTL for nonpic host, pic offload:
  (insn 8 6 9 4 (set (reg:CC 17 flags)
  (compare:CC (mem/f/c:DI (plus:DI (reg/f:DI 82 virtual-stack-vars)
  (const_int -8 [0xfff8])) [0 p+0 S8 A64])
  (symbol_ref:DI ("G") <var_decl 0x7f5546a88c60 G>))) addr.c:6 -1
   (nil))

I don't know how -fPIC option affects IR before streaming out,
-fdump-tree-optimized are identical for pic/nonpic cases, but
.gnu.offload_lto_.decls sections are different. However debug_tree
(vnode->decl) for G in ipa_write_summaries are identical for pic/nonpic
cases.

So, the question is, how to figure out what is different in G's declaration in
IR, and how it can affect further expansion?

[Bug target/64412] [regression] ICE in offload compiler: in extract_insn, at recog.c:2327

2014-12-29 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64412

--- Comment #4 from iverbin at gcc dot gnu.org ---
(In reply to H.J. Lu from comment #3)
> (In reply to iverbin from comment #2)
> > (In reply to H.J. Lu from comment #1)
> > > (In reply to iverbin from comment #0)
> > > > To reproduce using Intel Xeon Phi emulation:
> > > > 1. Build offload and host compilers as described in
> > > > https://gcc.gnu.org/wiki/Offloading#How_to_try_offloading_enabled_GCC
> > > > 2. Run make check-target-libgomp RUNTESTFLAGS=c.exp=e.53.5.c
> > >
> > > Can you create a standalone testcase for the Intel Xeon Phi offload
> > > cross compiler?  It will be easier to debug.
> >
> > The offload model in GCC implies 2 compilers: one produces IR for OpenMP
> > target regions, and another compiles this IR for Intel Xeon Phi.
> > There is no single compiler, which could stream offload IR out, then stream
> > it in, and then compile.
> > I can reduce e.53.5.c testcase, not sure whether this is helpful.
>
> Can you use gcc -v -save-temps to see what is passed to the offload
> compiler and feed them to the offload compiler directly?

Yes, this is possible.
However, the function preload_common_nodes, modified in r218767, is used for
both IN/OUT streaming, therefore the IR should be produced and consumed by
compilers built from the same sources.

Here are the reduced testcase and corresponding IR for: gcc -fopenmp -O1 -S
pr64412.c

To reproduce the error:
1. Configure and make gcc with:
--enable-as-accelerator-for=x86_64-unknown-linux
--host=x86_64-intelmicemul-linux --build=x86_64-intelmicemul-linux
--target=x86_64-intelmicemul-linux
2. Run:
   as pr64412.s -o pr64412.o
   x86_64-unknown-linux-accel-x86_64-intelmicemul-linux-gnu-gcc -xlto -fopenmp -O1 -shared -fPIC pr64412.o


[Bug target/64412] [regression] ICE in offload compiler: in extract_insn, at recog.c:2327

2014-12-29 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64412

--- Comment #5 from iverbin at gcc dot gnu.org ---
Created attachment 34350
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34350&action=edit
Source code


[Bug target/64412] [regression] ICE in offload compiler: in extract_insn, at recog.c:2327

2014-12-29 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64412

--- Comment #6 from iverbin at gcc dot gnu.org ---
Created attachment 34351
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34351&action=edit
pr64412.s


[Bug target/64412] [regression] ICE in offload compiler: in extract_insn, at recog.c:2327

2014-12-29 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64412

--- Comment #9 from iverbin at gcc dot gnu.org ---
(In reply to H.J. Lu from comment #8)
> Created attachment 34357 [details]
> A patch
>
> Can you try this?

Thank you, e.53.5.c now passes.

However, for-3.c and for-11.C still fail with another unrecognizable insn. I
attached a reduced testcase (pr64412_2).


[Bug target/64412] [regression] ICE in offload compiler: in extract_insn, at recog.c:2327

2014-12-29 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64412

--- Comment #10 from iverbin at gcc dot gnu.org ---
Created attachment 34359
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34359&action=edit
pr64412_2.c


[Bug target/64412] [regression] ICE in offload compiler: in extract_insn, at recog.c:2327

2014-12-29 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64412

--- Comment #11 from iverbin at gcc dot gnu.org ---
Created attachment 34360
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34360&action=edit
pr64412_2.s


[Bug lto/64412] New: [regression] ICE in offload compiler: in extract_insn, at recog.c:2327

2014-12-26 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64412

Bug ID: 64412
   Summary: [regression] ICE in offload compiler: in extract_insn,
at recog.c:2327
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
  Assignee: unassigned at gcc dot gnu.org
  Reporter: iverbin at gcc dot gnu.org
CC: bernds at gcc dot gnu.org, hubicka at gcc dot gnu.org,
kyukhin at gcc dot gnu.org, tschwinge at gcc dot gnu.org

After fixing PR lto/64043 (r218767) the offload target compiler began crashing
while reading intermediate bytecode.

FAIL: libgomp.c/examples-4/e.53.5.c (internal compiler error)
FAIL: libgomp.c/for-3.c (internal compiler error)
FAIL: libgomp.c++/for-11.C (internal compiler error)
FAIL: libgomp.fortran/examples-4/e.53.3.f90   -O2  (internal compiler error)
etc.


To reproduce using Intel Xeon Phi emulation:
1. Build offload and host compilers as described in
https://gcc.gnu.org/wiki/Offloading#How_to_try_offloading_enabled_GCC
2. Run make check-target-libgomp RUNTESTFLAGS=c.exp=e.53.5.c


libgomp/testsuite/libgomp.c/examples-4/e.53.5.c: In function 'accum._omp_fn.1':
libgomp/testsuite/libgomp.c/examples-4/e.53.5.c:53:13: error: unrecognizable
insn:
 #pragma omp parallel for reduction(+:tmp)
 ^
(insn 176 66 177 4 (set (reg:DI 0 ax)
(symbol_ref:DI ("Q") <var_decl 0x7fb57ffcb900 Q>)) -1
 (nil))
libgomp/testsuite/libgomp.c/examples-4/e.53.5.c:53:13: internal compiler error:
in extract_insn, at recog.c:2327
0xbadaf7 _fatal_insn(char const*, rtx_def const*, char const*, int, char
const*)
gcc/rtl-error.c:110
0xbadb38 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
gcc/rtl-error.c:118
0xb60522 extract_insn(rtx_insn*)
gcc/recog.c:2327
0xb60221 extract_constrain_insn(rtx_insn*)
gcc/recog.c:2228
0xb6e973 copyprop_hardreg_forward_1
gcc/regcprop.c:773
0xb701db execute
gcc/regcprop.c:1279
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See http://gcc.gnu.org/bugs.html for instructions.
mkoffload-intelmic: fatal error:
x86_64-pc-linux-gnu-accel-x86_64-intelmicemul-linux-gnu-gcc returned 1 exit
status
compilation terminated.
lto-wrapper: fatal error: accel/x86_64-intelmicemul-linux-gnu/mkoffload
returned 1 exit status
compilation terminated.
ld: lto-wrapper failed
collect2: error: ld returned 1 exit status


[Bug lto/63923] FAIL: libgomp.c/examples-4/e.50.1.c (test for excess errors)

2014-12-26 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63923

iverbin at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |iverbin at gcc dot gnu.org

--- Comment #2 from iverbin at gcc dot gnu.org ---
This issue is fixed by r217773


[Bug lto/64412] [regression] ICE in offload compiler: in extract_insn, at recog.c:2327

2014-12-26 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64412

--- Comment #2 from iverbin at gcc dot gnu.org ---
(In reply to H.J. Lu from comment #1)
> (In reply to iverbin from comment #0)
> > To reproduce using Intel Xeon Phi emulation:
> > 1. Build offload and host compilers as described in
> > https://gcc.gnu.org/wiki/Offloading#How_to_try_offloading_enabled_GCC
> > 2. Run make check-target-libgomp RUNTESTFLAGS=c.exp=e.53.5.c
>
> Can you create a standalone testcase for the Intel Xeon Phi offload
> cross compiler?  It will be easier to debug.

The offload model in GCC implies 2 compilers: one produces IR for OpenMP target
regions, and another compiles this IR for Intel Xeon Phi.
There is no single compiler, which could stream offload IR out, then stream it
in, and then compile.
I can reduce e.53.5.c testcase, not sure whether this is helpful.


[Bug regression/63868] [5 Regression] Multiple failures in the libgomp test suite between r217458 and r217501.

2014-11-19 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63868

--- Comment #6 from iverbin at gcc dot gnu.org ---
Author: iverbin
Date: Wed Nov 19 13:52:12 2014
New Revision: 217773

URL: https://gcc.gnu.org/viewcvs?rev=217773&root=gcc&view=rev
Log:
PR regression/63868
* cgraph.c (cgraph_node::create): Guard g->have_offload with
ifdef ENABLE_OFFLOADING.
* omp-low.c (create_omp_child_function): Likewise.
(expand_omp_target): Guard node->mark_force_output and offload_funcs
with ifdef ENABLE_OFFLOADING.
* varpool.c (varpool_node::get_create): Guard g->have_offload and
offload_vars with ifdef ENABLE_OFFLOADING.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/cgraph.c
trunk/gcc/omp-low.c
trunk/gcc/varpool.c
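
The guarding pattern named in the log above can be sketched in plain C. This is only an illustration under stated assumptions: `struct global_state`, `struct node`, `g_state` and `mark_offloadable` are hypothetical stand-ins invented here, not the actual GCC data structures or functions touched by r217773.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the compiler-wide state; the name only echoes
   the have_offload flag mentioned in the commit log.  */
struct global_state
{
  bool have_offload;
};

static struct global_state g_state;

/* Hypothetical stand-in for a call graph or varpool node.  */
struct node
{
  bool offloadable;
  bool force_output;
};

/* Mark N as belonging to an offload region.  The global flag and the
   forced output are only touched when the compiler was configured with
   offloading support, i.e. when ENABLE_OFFLOADING is defined.  */
static void
mark_offloadable (struct node *n)
{
  assert (n != NULL);
  n->offloadable = true;
#ifdef ENABLE_OFFLOADING
  g_state.have_offload = true;
  n->force_output = true;
#endif
}
```

Built without -DENABLE_OFFLOADING, the function only sets the per-node flag, so a compiler that cannot offload never enters the offload-specific code paths.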


[Bug regression/63868] [5 Regression] Multiple failures in the libgomp test suite between r217458 and r217501.

2014-11-17 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63868

iverbin at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2014-11-17
          Component|libgomp                     |regression
           Assignee|unassigned at gcc dot gnu.org      |iverbin at gcc dot gnu.org
   Target Milestone|---                         |5.0
     Ever confirmed|0                           |1


[Bug bootstrap/63853] [5.0 Regression] The use of strchrnul breaks bootstrap on x86_64-apple-darwin14.

2014-11-13 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63853

--- Comment #12 from iverbin at gcc dot gnu.org ---
Author: iverbin
Date: Thu Nov 13 22:06:15 2014
New Revision: 217524

URL: https://gcc.gnu.org/viewcvs?rev=217524&root=gcc&view=rev
Log:
2014-11-13  Dominique Dhumieres  domi...@lps.ens.fr

PR bootstrap/63853
gcc/
* gcc.c (handle_foffload_option): Replace strchrnul with strchr.
* lto-wrapper.c (parse_env_var, append_offload_options): Likewise.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/gcc.c
trunk/gcc/lto-wrapper.c
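
strchrnul is a GNU extension, so it is missing from some C libraries, which is what broke the Darwin bootstrap. A minimal sketch of how the same behaviour can be had from ISO C strchr follows; `find_char_or_end` and `first_item_length` are hypothetical helpers written for this illustration and are not the code committed to gcc.c or lto-wrapper.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer to the first occurrence of C in S, or to the
   terminating NUL of S when C does not occur.  This matches what
   strchrnul returns, but uses only strchr and strlen.  */
static const char *
find_char_or_end (const char *s, int c)
{
  assert (s != NULL);
  const char *p = strchr (s, c);
  return p ? p : s + strlen (s);
}

/* Length of the first item in a comma-separated LIST.  */
static size_t
first_item_length (const char *list)
{
  return (size_t) (find_char_or_end (list, ',') - list);
}
```

For a list such as "nvptx,intelmic" the first item has length 5; for a list without a comma the whole string is the first item.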


[Bug rtl-optimization/63618] CSE at IRA pass delete SET_GOT which is used later

2014-10-23 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63618

--- Comment #5 from iverbin at gcc dot gnu.org ---
Author: iverbin
Date: Thu Oct 23 16:52:11 2014
New Revision: 216596

URL: https://gcc.gnu.org/viewcvs?rev=216596&root=gcc&view=rev
Log:
PR target/63534
PR target/63618
gcc/
* cse.c (delete_trivially_dead_insns): Consider PIC register is used
while it is pseudo.
* dce.c (deletable_insn_p): Likewise.
gcc/testsuite/
* gcc.target/i386/pr63618.c: New test.

Added:
trunk/gcc/testsuite/gcc.target/i386/pr63618.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/cse.c
trunk/gcc/dce.c
trunk/gcc/testsuite/ChangeLog


[Bug target/63534] [5 Regression] Bootstrap failure on x86_64/i686-linux

2014-10-23 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63534

--- Comment #37 from iverbin at gcc dot gnu.org ---
Author: iverbin
Date: Thu Oct 23 16:52:11 2014
New Revision: 216596

URL: https://gcc.gnu.org/viewcvs?rev=216596&root=gcc&view=rev
Log:
PR target/63534
PR target/63618
gcc/
* cse.c (delete_trivially_dead_insns): Consider PIC register is used
while it is pseudo.
* dce.c (deletable_insn_p): Likewise.
gcc/testsuite/
* gcc.target/i386/pr63618.c: New test.

Added:
trunk/gcc/testsuite/gcc.target/i386/pr63618.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/cse.c
trunk/gcc/dce.c
trunk/gcc/testsuite/ChangeLog


[Bug c/63307] [4.9/5 Regression] Cilk+ breaks -fcompare-debug bootstrap

2014-10-20 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63307

--- Comment #6 from iverbin at gcc dot gnu.org ---
Author: iverbin
Date: Mon Oct 20 15:22:09 2014
New Revision: 216483

URL: https://gcc.gnu.org/viewcvs?rev=216483&root=gcc&view=rev
Log:
PR c/63307
gcc/c-family/
* cilk.c: Include vec.h.
(struct cilk_decls): New structure.
(wrapper_parm_cb): Split this function to...
(fill_decls_vec): ...this...
(create_parm_list): ...and this.
(compare_decls): New function.
(for_local_cb): Remove.
(wrapper_local_cb): Ditto.
(build_wrapper_type): For now first traverse and fill vector of
declarations then sort it and then deal with sorted vector.
(cilk_outline): Ditto.
(declare_one_free_variable): Ditto.

Modified:
trunk/gcc/c-family/ChangeLog
trunk/gcc/c-family/cilk.c
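
The "fill a vector, sort it, then process the sorted vector" approach from the log above can be sketched as follows. The point, presumably, is that the processing order no longer depends on the traversal order of the underlying container, which is what a -fcompare-debug bootstrap needs. `struct decl_entry`, `compare_by_uid` and `sort_decls` are hypothetical names chosen for this sketch; the real cilk.c code works on GCC trees and its own vec type.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record for one collected declaration; UID stands for any
   key that is stable between two compilations of the same source.  */
struct decl_entry
{
  unsigned uid;
  const char *name;
};

/* qsort comparator: ascending order by UID, written without subtraction
   so that it cannot overflow.  */
static int
compare_by_uid (const void *a, const void *b)
{
  const struct decl_entry *x = a;
  const struct decl_entry *y = b;
  return (x->uid > y->uid) - (x->uid < y->uid);
}

/* Sort the N collected entries in V so that later processing visits
   them in an order that depends only on the stable keys.  */
static void
sort_decls (struct decl_entry *v, size_t n)
{
  assert (v != NULL || n == 0);
  if (n > 1)
    qsort (v, n, sizeof *v, compare_by_uid);
}
```

After sorting, the caller can walk the array front to back and emit parameters or fields in the same order on every run.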


[Bug bootstrap/63536] [5 Regression] bootstrap failed when configured with --with-cpu=slm

2014-10-15 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63536

--- Comment #5 from iverbin at gcc dot gnu.org ---
Author: iverbin
Date: Wed Oct 15 17:49:27 2014
New Revision: 216280

URL: https://gcc.gnu.org/viewcvs?rev=216280&root=gcc&view=rev
Log:
PR target/63536
gcc/java/
* lang.c (java_print_error_function): Add check on NULL function
context.


Modified:
trunk/gcc/java/ChangeLog
trunk/gcc/java/lang.c


[Bug target/63534] [5 Regression] Bootstrap failure on x86_64/i686-linux

2014-10-14 Thread iverbin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63534

--- Comment #6 from iverbin at gcc dot gnu.org ---
Author: iverbin
Date: Tue Oct 14 16:26:57 2014
New Revision: 216208

URL: https://gcc.gnu.org/viewcvs?rev=216208&root=gcc&view=rev
Log:
PR target/63534
gcc/
* config/i386/i386.c (ix86_expand_split_stack_prologue): Make
__morestack local.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.c