[Bug target/96827] [10/11 Regression] __m128i from _mm_set_epi32 is backwards with -O3

2020-10-01 Thread joel.hutton at arm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96827

Joel Hutton  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|REOPENED|RESOLVED

--- Comment #14 from Joel Hutton  ---
Backported to GCC 10.

[Bug libgomp/96837] A false if clause in "omp parallel" seriously affects the performance

2020-09-30 Thread joel.hutton at arm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96837

Joel Hutton  changed:

   What|Removed |Added

 CC||joel.hutton at arm dot com

--- Comment #5 from Joel Hutton  ---
Sorry, my commit does not address this bug; I made a typo in the PR number in
the commit message.

[Bug target/96827] [10/11 Regression] __m128i from _mm_set_epi32 is backwards with -O3

2020-09-30 Thread joel.hutton at arm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96827

Joel Hutton  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from Joel Hutton  ---
This is fixed on trunk by 97b798d8; unfortunately, I made a typo in the commit
message.

[Bug target/96827] [10/11 Regression] __m128i from _mm_set_epi32 is backwards with -O3

2020-09-07 Thread joel.hutton at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96827

--- Comment #8 from Joel Hutton  ---
I'm working on this.

I believe this may have been introduced by my earlier SLP vector constructor
patch (commit 10d1592).

What I believe to be the relevant section:

+  else if (constructor)
+    {
+      tree rhs = gimple_assign_rhs1 (stmt_info->stmt);
+      tree val;
+      FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (rhs), i, val)
+        {
+          if (TREE_CODE (val) == SSA_NAME)
+            {
+              gimple* def = SSA_NAME_DEF_STMT (val);
+              stmt_vec_info def_info = vinfo->lookup_stmt (def);
+              /* Value is defined in another basic block.  */
+              if (!def_info)
+                return false;
+              scalar_stmts.safe_push (def_info);
+            }
+          else
+            return false;
+        }
+    }

I'm investigating, but I suspect pushing to a stack which is then popped from
later has created a reversal of element order.
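
The suspected failure mode can be illustrated in isolation. The following is a
minimal sketch in standard C++, not GCC code: std::vector stands in for the
scalar_stmts vec, and the names collect_in_order, drain_by_popping and
self_check are hypothetical. It only shows that elements appended in order and
then consumed by popping from the back come out reversed; whether the SLP code
actually consumes them this way is what remains to be confirmed.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for pushing constructor elements in source order,
// as the safe_push call in the quoted patch does.
std::vector<int> collect_in_order (const std::vector<int> &elts)
{
  std::vector<int> stack;
  for (int e : elts)
    stack.push_back (e);
  return stack;
}

// Hypothetical consumer that takes elements from the back (LIFO order).
std::vector<int> drain_by_popping (std::vector<int> stack)
{
  std::vector<int> out;
  while (!stack.empty ())
    {
      out.push_back (stack.back ());
      stack.pop_back ();
    }
  return out;
}

// Pushing 0..3 and then popping yields 3..0, i.e. the lanes are reversed.
void self_check ()
{
  std::vector<int> elts = {0, 1, 2, 3};
  std::vector<int> expected = {3, 2, 1, 0};
  assert (drain_by_popping (collect_in_order (elts)) == expected);
}
```

If this is what happens, either iterating the collected elements front to back
or reversing them once before use would restore the original lane order.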

[Bug tree-optimization/85804] [8/9/10 Regression][AArch64] Mis-compilation of loop with strided array access and xor reduction

2020-03-03 Thread joel.hutton at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85804

Joel Hutton  changed:

   What|Removed |Added

 CC||joel.hutton at arm dot com

--- Comment #9 from Joel Hutton  ---
This was fixed on trunk by 69f8c1ae (From SVN: r276700)

[Bug target/92922] [10 regression] [ilp32] FAIL: gcc.target/aarch64/sve/acle/asm/ldnt1_u32.c -std=c90 -O1 -g -DTEST_FULL (internal compiler error)

2020-02-26 Thread joel.hutton at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92922

--- Comment #2 from Joel Hutton  ---
This was fixed by Richard Sandiford's patch. 

commit fb15e2bab5267213b8706fa6a29eeef94f62a524
Author: Richard Sandiford 
Date:   Mon Jan 20 19:29:25 2020 +

aarch64: Fix SVE ACLE handling of SImode pointers

[Bug target/92922] [10 regression] [ilp32] FAIL: gcc.target/aarch64/sve/acle/asm/ldnt1_u32.c -std=c90 -O1 -g -DTEST_FULL (internal compiler error)

2020-02-26 Thread joel.hutton at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92922

Joel Hutton  changed:

   What|Removed |Added

 CC||joel.hutton at arm dot com

--- Comment #1 from Joel Hutton  ---
This appears to be fixed on trunk.

[Bug target/93135] [10 Regression] g++.dg/cpp0x/initlist118.C fails on aarch64

2020-01-29 Thread joel.hutton at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93135
Bug 93135 depends on bug 93221, which changed state.

Bug 93221 Summary: [10 Regression] ICE maximum number of generated reload insns 
per insn achieved (90) on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93221

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug target/93221] [10 Regression] ICE maximum number of generated reload insns per insn achieved (90) on aarch64

2020-01-29 Thread joel.hutton at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93221

Joel Hutton  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from Joel Hutton  ---
Fixed on trunk.

[Bug rtl-optimization/93303] [10 Regression] ICE in lra_constraints.c4948 on aarch64-linux-gnu

2020-01-29 Thread joel.hutton at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93303
Bug 93303 depends on bug 93221, which changed state.

Bug 93221 Summary: [10 Regression] ICE maximum number of generated reload insns 
per insn achieved (90) on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93221

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug target/93221] [10 Regression] ICE maximum number of generated reload insns per insn achieved (90) on aarch64

2020-01-20 Thread joel.hutton at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93221

--- Comment #6 from Joel Hutton  ---
The regression seems to be introduced by this commit:

commit 11b8091fb33c894cea20702d3e85389723987910
Author: Eric Botcazou 
Date:   Wed Dec 18 23:03:23 2019 +

* ira.c (ira): Use simple LRA algorithm when not optimizing.

From-SVN: r279550

[Bug target/93221] [10 Regression] ICE maximum number of generated reload insns per insn achieved (90) on aarch64

2020-01-20 Thread joel.hutton at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93221

--- Comment #5 from Joel Hutton  ---
There's some problem with inserting an OI reload before an OI insn, which in
turn requires another OI reload before it, and so on.

   18: r98:OI=r99:OI
  REG_DEAD r97:V4SI
Inserting insn reload before:
   19: r99:OI=r97:V4SI#0

0 Non input pseudo reload: reject++
  alt=0,overall=13,losers=2,rld_nregs=4
0 Non pseudo reload: reject++
  alt=1,overall=7,losers=1,rld_nregs=2
0 Non input pseudo reload: reject++
1 Spill pseudo into memory: reject+=3
Using memory insn operand 1: reject+=3
alt=2,overall=19,losers=2 -- refuse
 Choosing alt 1 in insn 19:  (0) Utv  (1) w {*aarch64_movoi}
  Creating newreg=100, assigning class FP_REGS to r100
   19: r99:OI=r100:OI
Inserting insn reload before:
   20: r100:OI=r97:V4SI#0

[Bug target/93221] [10 Regression] ICE maximum number of generated reload insns per insn achieved (90) on aarch64

2020-01-10 Thread joel.hutton at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93221

Joel Hutton  changed:

   What|Removed |Added

 CC||joel.hutton at arm dot com

--- Comment #1 from Joel Hutton  ---
I'm taking a look at this.

[Bug testsuite/92391] gcc.dg/vect/bb-slp-40.c FAILs

2019-11-29 Thread joel.hutton at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92391

--- Comment #13 from Joel Hutton  ---
This appears to no longer be failing in the latest 'gcc-testresults'. Can this
be closed?

[Bug testsuite/92391] gcc.dg/vect/bb-slp-40.c FAILs

2019-11-26 Thread joel.hutton at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92391

--- Comment #11 from Joel Hutton  ---
I see, I think you're right. I was able to replicate the failure when running
the whole 'vect' testsuite. I tried the following change:

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 5fe1e83492c..a4418a31516 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5753,7 +5753,7 @@ proc check_effective_target_vect_bswap { } {
 # one vector length.

 proc check_effective_target_vect_char_add { } {
-return [check_cached_effective_target_indexed vect_int {
+return [check_cached_effective_target_indexed vect_char_add {
   expr {
  [istarget i?86-*-*] || [istarget x86_64-*-*]
  || ([istarget powerpc*-*-*]


which appeared to work; however, I'm not familiar with how
check_cached_effective_target_indexed works, so I'm not sure whether this is
sufficient.
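
To show why the key in the one-line change matters, here is a minimal sketch of
the general hazard. It is written in C++ rather than Tcl and is not modelled on
the real implementation of check_cached_effective_target_indexed. It assumes
only that a result is memoized under the name passed as the first argument; the
helper cached_check and the function self_check are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical memoizing helper: runs `check` at most once per key and
// returns the stored result on every later call with the same key.
bool cached_check (std::map<std::string, bool> &cache,
                   const std::string &key,
                   const std::function<bool ()> &check)
{
  auto it = cache.find (key);
  if (it != cache.end ())
    return it->second;
  bool result = check ();
  cache[key] = result;
  return result;
}

// Two different checks that share one key: the second never runs and
// silently inherits the first one's answer.
void self_check ()
{
  std::map<std::string, bool> cache;
  bool int_ok = cached_check (cache, "vect_int", [] { return true; });
  bool char_add_wrong_key
    = cached_check (cache, "vect_int", [] { return false; });
  bool char_add_own_key
    = cached_check (cache, "vect_char_add", [] { return false; });
  assert (int_ok);
  assert (char_add_wrong_key);   // stale result from the vect_int entry
  assert (!char_add_own_key);    // evaluated independently under its own key
}
```

Under that assumption, the result would depend on which check populated the
cache first, which would be consistent with the failure showing up only when
the whole 'vect' testsuite is run.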

[Bug testsuite/92391] gcc.dg/vect/bb-slp-40.c FAILs

2019-11-26 Thread joel.hutton at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92391

--- Comment #9 from Joel Hutton  ---
Weird, I tested on gcc202.

% uname -a
Linux gcc202 4.19.0-5-sparc64-smp #1 SMP Debian 4.19.37-6 (2019-07-18) sparc64
GNU/Linux

% cat gcc/testsuite/gcc/gcc.sum
Test run by joelh on Tue Nov 26 17:22:27 2019
Native configuration is sparc64-unknown-linux-gnu

=== gcc tests ===

Schedule of variations:
unix

Running target unix
Running /home/joelh/gcc/src/gcc/testsuite/gcc.dg/vect/vect.exp ...
UNSUPPORTED: gcc.dg/vect/bb-slp-40.c
UNSUPPORTED: gcc.dg/vect/bb-slp-40.c -flto -ffat-lto-objects

=== gcc Summary ===

# of unsupported tests  2
/home/joelh/gcc/objdir/gcc/xgcc  version 10.0.0 20191126 (experimental) (GCC)

[Bug testsuite/92391] gcc.dg/vect/bb-slp-40.c FAILs

2019-11-26 Thread joel.hutton at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92391

--- Comment #6 from Joel Hutton  ---
This should be fixed with Richard Sandiford's changes.

[Bug tree-optimization/86504] vectorization failure for a nest loop

2019-11-21 Thread joel.hutton at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86504

--- Comment #11 from Joel Hutton  ---
Should be fixed on trunk by r277784

[Bug tree-optimization/86504] vectorization failure for a nest loop

2019-11-21 Thread joel.hutton at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86504

--- Comment #10 from Joel Hutton  ---
Should be fixed on trunk

[Bug testsuite/92391] gcc.dg/vect/bb-slp-40.c FAILs

2019-11-08 Thread joel.hutton at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92391

--- Comment #4 from Joel Hutton  ---
Hi Rainer

I set up an account with cfarm and tested on gcc202. The test fails because,
on SPARC, no constructor is generated, for whatever reason (see below), making
the test not really applicable. I suggest making the test an xfail, so that if
at some point in the future SPARC generates a constructor here, the test will
apply. The other option is to skip it for SPARC.

tree output on SPARC at the slp pass:
.
.
.
  MEM[(char *)d_202 + 25B] = _136;
  # .MEM_233 = VDEF <.MEM_232>
  MEM[(char *)d_202 + 26B] = _140;
  # .MEM_234 = VDEF <.MEM_233>
  MEM[(char *)d_202 + 27B] = _144;
  # .MEM_235 = VDEF <.MEM_234>
  MEM[(char *)d_202 + 28B] = _148;
  # .MEM_236 = VDEF <.MEM_235>
  MEM[(char *)d_202 + 29B] = _152;
  # .MEM_237 = VDEF <.MEM_236>
  MEM[(char *)d_202 + 30B] = _156;
  # .MEM_238 = VDEF <.MEM_237>
  MEM[(char *)d_202 + 31B] = _160;
  # PT = { D.1522 } (nonlocal, interposable)
  d_239 = d_202 + 32;

[Bug testsuite/92391] gcc.dg/vect/bb-slp-40.c FAILs

2019-11-06 Thread joel.hutton at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92391

--- Comment #2 from Joel Hutton  ---
As this has failed since it was introduced, and I don't have a SPARC machine to
test on, I suggest making this XFAIL on SPARC.

[Bug testsuite/92391] gcc.dg/vect/bb-slp-40.c FAILs

2019-11-06 Thread joel.hutton at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92391

--- Comment #1 from Joel Hutton  ---
I'm looking into this.

[Bug other/92366] new test case gcc.dg/vect/bb-slp-41.c fails with its introduction in r277784

2019-11-05 Thread joel.hutton at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92366

--- Comment #2 from Joel Hutton  ---
I'm looking into this. The testcase triggered a case with a constructor with a
large number of elements (at least on aarch64).

[Bug tree-optimization/86504] vectorization failure for a nest loop

2019-07-30 Thread joel.hutton at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86504

Joel Hutton  changed:

   What|Removed |Added

 CC||joel.hutton at arm dot com

--- Comment #8 from Joel Hutton  ---
(In reply to Richard Biener from comment #3)

Hi Richard,

> So the vectorization issue would be that basic-block vectorization doesn't
> catch this in a very nice way - on x86 we pull out the invariant computation
> and have a vectorized (outer) loop storing to d.

Just a small clarification: do you mean to say that there is a difference
between the way x86 and aarch64 handle this? As far as I can see, they handle
this in the same way.