subject:"\[Bug middle\-end\/111683\] \[11\/12\/13\/14 Regression\] Incorrect answer when using SSE2 intrinsics with \-O3"

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

2024-03-23 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #26 from GCC Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:8fc5593df8e0d36cc5bd8ea21097a491a634a866

commit r14-9639-g8fc5593df8e0d36cc5bd8ea21097a491a634a866
Author: Jakub Jelinek 
Date:   Sat Mar 23 11:17:44 2024 +0100

predcom: Punt for steps which aren't multiples of access size [PR111683]

On the following testcases, there is no overlap between data references
within a single iteration, but the data references have size which is twice
as large as the step, which means the data references overlap with the next
iteration which predcom doesn't take into account.
As discussed in the PR, even if the reference size is smaller than step,
if step isn't a multiple of the reference size, there could be overlaps
with
some other iteration later on.
The initial version of the patch regressed (test still passed, but predcom
didn't optimize anymore) pr71083.c which has a packed char, short structure
and was reading/writing the short 2 bytes in there with step 3.
The following patch deals with that by retrying for COMPONENT_REFs also the
aggregate sizes etc., so that it then compares 3 bytes against step 3.
In make check-gcc/check-g++ this patch I believe affects code generation
for only the 2 new testcases according to statistics I've gathered.

2024-03-23  Jakub Jelinek  

PR middle-end/111683
* tree-predcom.cc (pcom_worker::suitable_component_p): If has_write
and comp_step is RS_NONZERO, return false if any reference in the
component doesn't have DR_STEP a multiple of access size.

* gcc.dg/pr111683-1.c: New test.
* gcc.dg/pr111683-2.c: New test.

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

2024-03-22 Thread rguenther at suse dot de via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #25 from rguenther at suse dot de  ---
On Fri, 22 Mar 2024, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683
> 
> --- Comment #24 from Jakub Jelinek  ---
> (In reply to Richard Biener from comment #23)
> > It looks like this could go to suitable_reference_p instead?
> 
> You mean return false for those making them not suitable at all?
> I thought without a write such references would act more like RS_ANY, but
> a reason I didn't just treat such references as RS_ANY rather than RS_NONZERO
> in suitable_reference_p was because of the assert that all refs in a component
> have the same ref_step_type but nothing actually comparing it before the
> assertion.

Hmm, true.

> But if you think I should just return false from suitable_reference_p if the
> step isn't a multiple of the sizes, I can surely try that.
> 
> > That said, I do wonder why with my patch split_data_refs_to_components
> > doesn't fixup.  I think it's supposed to handle the case where
> > dependences are unknown conservatively...
> 
> I'm afraid I'm not familiar with data refs enough to know what was going on.

I tried to follow what happens there and I'm also somewhat lost.

Anyway, I think fixing this in predcom in a convenient place even if
it might be not the true fix is OK.  You might want to put a comment
before any such fix indicating there might be more latent issues
in predcom or dependence analysis in general.

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

2024-03-22 Thread jakub at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #24 from Jakub Jelinek  ---
(In reply to Richard Biener from comment #23)
> It looks like this could go to suitable_reference_p instead?

You mean return false for those making them not suitable at all?
I thought without a write such references would act more like RS_ANY, but
a reason I didn't just treat such references as RS_ANY rather than RS_NONZERO
in suitable_reference_p was because of the assert that all refs in a component
have the same ref_step_type but nothing actually comparing it before the
assertion.

But if you think I should just return false from suitable_reference_p if the
step isn't a multiple of the sizes, I can surely try that.

> That said, I do wonder why with my patch split_data_refs_to_components
> doesn't fixup.  I think it's supposed to handle the case where
> dependences are unknown conservatively...

I'm afraid I'm not familiar with data refs enough to know what was going on.

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

2024-03-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #23 from Richard Biener  ---
It looks like this could go to suitable_reference_p instead?

That said, I do wonder why with my patch split_data_refs_to_components
doesn't fixup.  I think it's supposed to handle the case where
dependences are unknown conservatively...

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

2024-03-22 Thread jakub at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

Jakub Jelinek  changed:

   What|Removed |Added

  Attachment #57768|0   |1
is obsolete||

--- Comment #22 from Jakub Jelinek  ---
Created attachment 57769
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57769=edit
gcc14-pr111683.patch

Actually for bit-fields, TREE_TYPE on the ref is the underlying type, so either
we'd need to somehow better try to understand the actual reference size in that
case, or simply look throught the BIT_FIELD_REF COMPONENT_REFs.

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

2024-03-22 Thread jakub at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #21 from Jakub Jelinek  ---
Created attachment 57768
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57768=edit
gcc14-pr111683.patch

Updated patch which uses wi::multiple_of_p but doesn't regress pr71083.c.
To be precise, it actually wasn't about foo with the bitfield, that used to be
optimized before/after, but bar which uses packed struct with char and short
members, step was 3 but the short access had size 2.  But given that the 2
sized access is a COMPONENT_REF of the 3 byte sized struct, if step is a
multiple of
3, that is fine as well.

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

2024-03-22 Thread jakub at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #20 from Jakub Jelinek  ---
That is true.  I've been also wondering whether e.g. for the pr71083.c case we
couldn't just look through all COMPONENT_REFs of the DR_REF (and maybe
ARRAY_REFs with constant indexes) and check type size of the aggregate it is in
against the size.  That could perhaps fix the predcom of pr71083.c, because it
is a 24-bit bitfield in 4 byte structure.

Will you take this over, or should I tweak my patch?

Note, statistics I've gathered with the above posted patch was that the patch
did something in
/home/jakub/src/gcc/gcc/testsuite/c-c++-common/torture/pr101636.c bar
/home/jakub/src/gcc/gcc/testsuite/gcc.c-torture/compile/pr32399.c f
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/autopar/pr69109-2.c f
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/graphite/pr71575-2.c fn1
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/loop-versioning-6.c f1
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/pr111683-1.c main
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/pr111683-2.c main
but only changed the generated code in the last 2 (those are intentional).
The older patch with wi::multiple_of_p affected
/home/jakub/src/gcc/gcc/testsuite/c-c++-common/torture/pr101636.c bar
/home/jakub/src/gcc/gcc/testsuite/gcc.c-torture/compile/pr32399.c f
/home/jakub/src/gcc/gcc/testsuite/gcc.c-torture/execute/pr71083.c bar
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/autopar/pr69109-2.c f
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/graphite/pr71575-2.c fn1
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/loop-versioning-6.c f1
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/pr111683-1.c main
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/pr111683-2.c main
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/torture/pr112736.c fn1
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/torture/pr68379.c fn1
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/tree-ssa/pr87022.c main
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/vect/pr101445.c foo
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/vect/pr68502-2.c reset_nodes
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/vect/pr81740-2.c main
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/vect/vect-avg-16.c f
/home/jakub/src/gcc/gcc/testsuite/g++.dg/vect/pr95297.cc test
and changed code generation of the 2 expected tests and pr71083.c.  All this
statistics was solely from make check-gcc check-g++.

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

2024-03-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #19 from Richard Biener  ---
But note that {0, +, 3 } and {2, + , 3} with size 2 will still eventually
overlap.  See dr_analyze_indices where we attempt to make DR_INIT a
multiple of the element size (probably also not a perfect solution).

In the end representing the access function for the pointer with byte
granularity is a mistake and it should be translated to be based on
the access size.  That of course makes a byte step of 3 with an access
size of 2 not representable (but it avoids these kind of issues).

The other possibility is to filter the remaining problematic cases
during dependence analysis itself where we see both DR_INITs.  As said
the index based dependence analysis doesn't know about such thing as
partial element overlaps.

For the PR at hand your patch looks reasonable, but I think there's
a problem in how we handle the pointer-based access functions.

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

2024-03-22 Thread jakub at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #18 from Jakub Jelinek  ---
Using wi::multiple_of_p is what I've tried first but that regressed the
generated code for pr71083.c (the test still PASSed, but predcom no longer
triggered on it).
I think it has access size 3 and step 4, so step is not a multiple of access
size, but that is not a problem, all we actually care about is just if access
size isn't bigger than step.

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

2024-03-22 Thread jakub at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #17 from Jakub Jelinek  ---
Created attachment 57767
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57767=edit
gcc14-pr111683.patch

Patch I've tested overnight for this.

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

2024-03-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #16 from Richard Biener  ---
Created attachment 57766
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57766=edit
patch

Oddly enough the following patch doesn't fix it but it correctly prevents us
from recording the access functions and data dependence analysis now says

(Data Dep:
#(Data Ref:
#  bb: 3
#  stmt: _2 = MEM[(U * {ref-all})_1];
#  ref: MEM[(U * {ref-all})_1];
#  base_object: MEM[(U * {ref-all})_1];
#)
#(Data Ref:
#  bb: 3
#  stmt: _7 = MEM[(U * {ref-all})_6];
#  ref: MEM[(U * {ref-all})_6];
#  base_object: MEM[(U * {ref-all})_6];
#)
(don't know)

but predictive commoning still does

Store-loads chain 0x48229b0
  max distance 2, may reuse first
  inits MEM[(U * {ref-all})] MEM[(U * {ref-all}) + 8B]
  references:
MEM[(U * {ref-all})_6] (id 3, write)
  offset -2
  distance 0
MEM[(U * {ref-all})_1] (id 0)
  offset 0
  distance 2

Executing predictive commoning without unrolling.

predcom has its own analysis it seems and it makes a similar mistake.

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

2024-03-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

Richard Biener  changed:

   What|Removed |Added

 CC||rsandifo at gcc dot gnu.org

--- Comment #15 from Richard Biener  ---
the problem is in dependence analysis itself - by design classic dependence
analysis relies on a[i] and a[i+1] not aliasing but for DRs like

Creating dr for MEM[(U * {ref-all})_1]
analyze_innermost: success.
base_address: 
offset from base address: 0
constant offset from base address: 0
step: 8
base alignment: 32
base misalignment: 0 
offset alignment: 128
step alignment: 8
base_object: MEM[(U * {ref-all})]
Access function 0: {0B, +, 8}_1

The access function evolves in a way that this doesn't hold.

It might be tempting to fail data-ref analysis itself but it's really
subscript dependence testing that doesn't work.  OTOH we already
have code in dr_analyze_indices to deal with similar cases concerning
overlap by adjusted base offset.

Note in theory we handle non-constant increments, so we eventually
would have to ditch support for those at all.

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

2024-03-21 Thread jakub at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

Jakub Jelinek  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org

--- Comment #14 from Jakub Jelinek  ---
It is indeed pcom.
Slightly better testcase which doesn't use double arithmetics but long long:
long long b[6] = { 3, 4, 5, 6, 7, 8 };
long long c[16];
typedef long long U __attribute__ ((vector_size(16), may_alias, aligned(1)));
typedef long long V __attribute__ ((vector_size(16), may_alias));

int
main ()
{
  for (int f = 0; f < 6; f++)
{
  *(U *) [f] = *(U *) [f] + (V) { b[f], b[f] };
  *(U *) [f + 2] = *(U *) [f + 2] + (V) { b[f], b[f] };
}
  if (c[1] != 7)
__builtin_abort ();
  return 0;
}

And it doesn't need to be about vector types either, nor use any aligned(1)
loads/stores in the source:
int b[6] = { 3, 4, 5, 6, 7, 8 }, c[12];
int d[16] = { 0, 1, 3, 6, 10, 14, 12, 9, 5, 0, 0, 0 };

int
main ()
{
  int i;
  if (sizeof (int) * 2 != sizeof (long long))
return 0;
  for (i = 0; i < 6; i++)
{
  long long a;
  __builtin_memcpy (, [i], sizeof (a));
  a += (((long long) i) << (sizeof (int) * __CHAR_BIT__)) + i;
  __builtin_memcpy ([i], , sizeof (a));
  __builtin_memcpy (, [i + 2], sizeof (a));
  a += (((long long) i) << (sizeof (int) * __CHAR_BIT__)) + i;
  __builtin_memcpy ([i + 2], , sizeof (a));
}
  if (__builtin_memcmp ([0], [0], sizeof (c)))
__builtin_abort ();
  return 0;
}
On the last testcase the *.pcom change against previous dump is:
 int main ()
 {
+  unsigned long D__lsm1.7;
+  unsigned long D__lsm0.6;
   long long int a;
   int i;
   int * _1;
@@ -21,18 +31,24 @@ int main ()
   unsigned long _20;
   sizetype _25;
   sizetype _28;
+  unsigned long _33;
+  unsigned long _37;
   unsigned int ivtmp_38;
   unsigned int ivtmp_39;

[local count: 153437704]:
+  _37 = MEM  [(char * {ref-all})];
+  _33 = MEM  [(char * {ref-all}) + 4B];

[local count: 920304121]:
   # i_26 = PHI 
   # ivtmp_39 = PHI 
+  # D__lsm0.6_34 = PHI 
+  # D__lsm1.7_35 = PHI 
   _28 = (sizetype) i_26;
   _9 = _28 * 4;
   _1 =  + _9;
-  _12 = MEM  [(char * {ref-all})_1];
+  _12 = D__lsm0.6_34;
   a_13 = (long long int) _12;
   _19 = _28 * 4294967297;
   _2 = (long long int) _19;
@@ -48,6 +64,7 @@ int main ()
   _6 = _2 + a_18;
   _20 = (unsigned long) _6;
   MEM  [(char * {ref-all})_5] = _20;
+  D__lsm0.6_36 = _20;
   i_23 = i_26 + 1;
   ivtmp_38 = ivtmp_39 - 1;
   if (ivtmp_38 != 0)

The data refs in the loop have access fns {0B, +, 4}_1 and {8B, +, 4}_1, so
within
the same iteration there is no overlap between the two, but because they are
actually
8 byte loads/stores, there is partial overlap between the adjacent iterations
in them that pcom doesn't take into account.

So, shall we somewhere punt for references with larger access sizes than their
DR_STEP?
Something else?

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

2024-03-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #13 from Andrew Pinski  ---
(In reply to Jakub Jelinek from comment #12)
> Further simplified without any headers, -O1 works, -O2 fails.

For this testcase (at -O3), GCC 4.7.4 works but 4.8.1 fails.

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

2024-03-19 Thread jakub at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #12 from Jakub Jelinek  ---
Further simplified without any headers, -O1 works, -O2 fails.
double a[2] = { 0.8147, 0.9058 };
double b[6] = { 0.1576, 0.9706, 1, 1, 1, 1 };
double c[16];
typedef double U __attribute__ ((vector_size(16), may_alias, aligned(1)));
typedef double V __attribute__ ((vector_size(16), may_alias));

int
main ()
{
  for (int f = 0; f < 6; f++)
{
  *(U *) [f] = *(U *) [f] + (V) { b[f], b[f] } * *(U *) a;
  *(U *) [f + 2] = *(U *) [f + 2] + (V) { b[f], b[f] } * *(U *) a;
}
  if (c[1] <= 0.93 || c[1] >= 0.94)
__builtin_abort ();
  return 0;
}

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

2024-03-12 Thread jakub at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

Jakub Jelinek  changed:

   What|Removed |Added

   Priority|P1  |P2

--- Comment #11 from Jakub Jelinek  ---
GCC 7-13 has been released with this bug, so P2.

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

2024-03-11 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #10 from Andrew Pinski  ---
Created attachment 57678
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57678=edit
testcase that fails all the way back to GCC 4.8.x

This one fails until GCC 4.8.1, works in GCC 4.7.4 but that is because the
outer loop is unrolled in GCC 4.7.4.

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

2024-03-11 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #9 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #8)
> Note with the reduced testcase in comment #7, -fno-tree-vrp has no effect on
> it any more so it might make sense to do another bisection on that testcase.

I am suspecting r7-104-gfc9cf6da84a899 exposed the issue now. in GCC 6,
_mm_storeu_pd/_mm_loadu_pd were still builtins, but in GCC 7, they become
directly unaligned loads (starting with r7-104-gfc9cf6da84a899 ).

I suspect if manually change _mm_storeu_pd/_mm_loadu_pd to be similar to what
GCC 7 had, then we might even up with this failing before GCC 7.

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

2024-03-11 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

Andrew Pinski  changed:

   What|Removed |Added

   Keywords|needs-reduction |

--- Comment #8 from Andrew Pinski  ---
Note with the reduced testcase in comment #7, -fno-tree-vrp has no effect on it
any more so it might make sense to do another bisection on that testcase.

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

2024-03-11 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #7 from Andrew Pinski  ---
Created attachment 57677
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57677=edit
Slightly more reduced

Removes the extra induction variables and manually unroll the inner loop.

What I am seeing is the first store is being removed.

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

2024-03-10 Thread sjames at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #6 from Sam James  ---
Created attachment 57660
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57660=edit
reduced.c

Bit smaller.

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

2024-03-09 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #5 from kugan at gcc dot gnu.org ---
 -O3 -fno-tree-vectorize  and -O3 -fno-tree-vrp works. I looked at the ever
dump and it is not doing anything suspicious. Looks like range_info usage in
vectoriser is causing the problem.

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3

2024-03-07 Thread law at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org
   Priority|P3  |P1

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3

2023-12-12 Thread jakub at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org
   Keywords|needs-bisection |

--- Comment #4 from Jakub Jelinek  ---
Started with r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e
Slightly reduced:
#include 

void convn_script(const double in9[10], const double in10[7], double out5[16])
{
  int iB = 0;
  int iC = 0;
  __builtin_memset([0], 0, 16U * sizeof(double));
  for (int i = 0; i < 7; i++) {
int b_i;
int vectorUB;
if (i + 10 <= 15)
  b_i = 9;
else
  b_i = 15 - i;
vectorUB = (((b_i + 1) / 2) << 1) - 2;
for (int r = 0; r <= vectorUB; r += 2) {
  __m128d b_r;
  b_i = iC + r;
  b_r = _mm_loadu_pd([b_i]);
  _mm_storeu_pd([b_i],
_mm_add_pd(b_r, _mm_mul_pd(_mm_set1_pd(in10[iB]),
   _mm_loadu_pd([r];
}
iC = iB + 1;
iB++;
  }
}

int main() {
  double in9[10] = {0.8147, 0.9058, 0.1270, 0.9134, 0.6324, 0.0975, 0.2785,
0.5469, 0.9575, 0.9649};
  double in10[7] = {0.1576, 0.9706, 0.9572, 0.4854, 0.8003, 0.1419, 0.4218};
  double out5[16];
  convn_script(in9, in10, out5);
  for(int i = 0; i < 16; i++)
__builtin_printf ("%d %f\n", i, out5[i]);
}

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3

2023-10-04 Thread amonakov at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #3 from Alexander Monakov  ---
This is predcom. It's easier to see what's going wrong with

#pragma GCC unroll 99

added to the innermost loop so it's unrolled at -O2 and comparing

g++ -O2 -fpredictive-commoning --param=max-unroll-times=1

vs. plain g++ -O2 (or diffing 'pcom' dump against the preceding pass).

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3

2023-10-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

Richard Biener  changed:

   What|Removed |Added

  Known to fail||13.2.1

--- Comment #2 from Richard Biener  ---
Just to say, -fno-strict-aliasing -fstack-reuse=none doesn't help. 
convn_script can be made 'noipa' and still reproduces the failure. 
-fno-tree-vectorize
changes code and result but still differs from -O2.

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3

2023-10-03 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
  Known to fail||7.1.0
   Keywords||needs-bisection,
   ||needs-reduction
   Last reconfirmed||2023-10-03
   Target Milestone|--- |11.5
Summary|Incorrect answer when using |[11/12/13/14 Regression]
   |SSE2 intrinsics with -O3|Incorrect answer when using
   ||SSE2 intrinsics with -O3
 Status|UNCONFIRMED |NEW
  Known to work||6.1.0, 6.4.0

--- Comment #1 from Andrew Pinski  ---
Confirmed. Started between GCC 6 and GCC 7. Looks unrolling related too.

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3

27 matches

Site Navigation

Mail list logo

Footer information