[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2020-06-11 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |11.0

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2020-05-18 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359
Bug 57359 depends on bug 95172, which changed state.

Bug 95172 Summary: [11 Regression] wrong code at -O1 on x86_64-linux-gnu since 
r11-272-gb6ff3ddecfa93d53
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95172

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2020-05-12 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359
Bug 57359 depends on bug 94988, which changed state.

Bug 94988 Summary: [11 Regression] FAIL: gcc.target/i386/pr64110.c 
scan-assembler vmovd[\\t ]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94988

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2020-05-12 Thread ro at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359
Bug 57359 depends on bug 94988, which changed state.

Bug 94988 Summary: [11 Regression] FAIL: gcc.target/i386/pr64110.c 
scan-assembler vmovd[\\t ]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94988

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|FIXED   |---

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2020-05-12 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359
Bug 57359 depends on bug 95045, which changed state.

Bug 95045 Summary: [11 Regression] wrong code at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95045

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2020-05-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359
Bug 57359 depends on bug 94988, which changed state.

Bug 94988 Summary: [11 Regression] FAIL: gcc.target/i386/pr64110.c 
scan-assembler vmovd[\\t ]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94988

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2020-05-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359
Bug 57359 depends on bug 95025, which changed state.

Bug 95025 Summary: [11 Regression] ICE in execute_sm_exit at 
gcc/tree-ssa-loop-im.c:2224 since r11-161-g283cb9ea6293e813
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95025

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2020-05-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359

Richard Biener  changed:

   What|Removed |Added

 CC||msebor at gcc dot gnu.org

--- Comment #34 from Richard Biener  ---
*** Bug 90668 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2020-05-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359
Bug 57359 depends on bug 90668, which changed state.

Bug 90668 Summary: loop invariant moving a dependent store out of a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90668

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2020-05-07 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED
  Known to work||11.0
  Known to fail||10.0

--- Comment #33 from Richard Biener  ---
Fixed on trunk.

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2020-05-07 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359

--- Comment #32 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:283cb9ea6293e813e48a1b769e1e0779918ea20a

commit r11-161-g283cb9ea6293e813e48a1b769e1e0779918ea20a
Author: Richard Biener 
Date:   Mon Apr 27 14:45:54 2020 +0200

tree-optimization/57359 - rewrite SM code

This rewrites store-motion to process candidates where we can
ensure order preserving separately and with no need to disambiguate
against all stores.  Those candidates we cannot handle this way
are validated to be independent on all stores (w/o TBAA) and then
processed as "unordered" (all conditionally executed stores are so
as well).

This will necessary cause
  FAIL: gcc.dg/graphite/pr80906.c scan-tree-dump graphite "isl AST to
Gimple succeeded"
because the SM previously performed is not valid for exactly the PR57359
reason, we still perform SM of qc for the innermost loop but that's not
enough.

There is still room for improvements because we still check some
constraints
for the order preserving cases that are only necessary in the current
strict way for the unordered ones.  Leaving that for the furture.

2020-05-07  Richard Biener  

PR tree-optimization/57359
* tree-ssa-loop-im.c (im_mem_ref::indep_loop): Remove.
(in_mem_ref::dep_loop): Repurpose.
(LOOP_DEP_BIT): Remove.
(enum dep_kind): New.
(enum dep_state): Likewise.
(record_loop_dependence): New function to populate the
dependence cache.
(query_loop_dependence): New function to query the dependence
cache.
(memory_accesses::refs_in_loop): Rename to ...
(memory_accesses::refs_loaded_in_loop): ... this and change to
only record loads.
(outermost_indep_loop): Adjust.
(mem_ref_alloc): Likewise.
(gather_mem_refs_stmt): Likewise.
(mem_refs_may_alias_p): Add tbaa_p parameter and pass it down.
(struct sm_aux): New.
(execute_sm): Split code generation on exits, record state
into new hash-map.
(enum sm_kind): New.
(execute_sm_exit): Exit code generation part.
(sm_seq_push_down): Helper for sm_seq_valid_bb performing
dependence checking on stores reached from exits.
(sm_seq_valid_bb): New function gathering SM stores on exits.
(hoist_memory_references): Re-implement.
(refs_independent_p): Add tbaa_p parameter and pass it down.
(record_dep_loop): Remove.
(ref_indep_loop_p_1): Fold into ...
(ref_indep_loop_p): ... this and generalize for three kinds
of dependence queries.
(can_sm_ref_p): Adjust according to hoist_memory_references
changes.
(store_motion_loop): Don't do anything if the set of SM
candidates is empty.
(tree_ssa_lim_initialize): Adjust.
(tree_ssa_lim_finalize): Likewise.

* gcc.dg/torture/pr57359-1.c: New testcase.
* gcc.dg/torture/pr57359-1.c: Likewise.
* gcc.dg/tree-ssa/ssa-lim-14.c: Likewise.
* gcc.dg/graphite/pr80906.c: XFAIL.

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2020-04-30 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359

Richard Biener  changed:

   What|Removed |Added

  Attachment #48381|0   |1
is obsolete||

--- Comment #31 from Richard Biener  ---
Created attachment 48422
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48422=edit
more complex approac

So this is a feature-wise finished patch for the more complex approach (some
possible improvements are still marked as ???).  I'm throwing this on SPEC
now and appreciate correctness testing.  It patches ontop of the PR39612 patch
which is ready for early GCC 11.

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2020-04-27 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359

--- Comment #30 from Richard Biener  ---
Created attachment 48381
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48381=edit
more complex approach, POC

Another testcase, this time for store ordering (IIRC we may have a duplicate
for
this).  We currently use a fixed order of *p and *r for _both_ exits which
is of course wrong.

POC patch fixing all issues attached, it needs to be enhanced to not give
up when not all stores we want to move are stored in the exit block.  Also
some memory leaks need fixing and we need to avoid doing redundant work when
enhancing the scheme and eventually enhance it to prune SMs we cannot perform.


extern void abort();

typedef int A;
typedef float B;

void __attribute__((noinline,noclone))
foo(A * p, B *r, long unk, long oh)
{
  for (long i = 0; i < unk; ++i) {
  *p = 1;
  *r = 2;
  if (oh & i)
break;
  *r = 3;
  *p = 4;
  }
}

int main(void)
{
  union { A x; B f; } u;
  foo(, , 1, 1);
  if (u.x != 4) abort();
  foo(, , 2, 1);
  if (u.f != 2) abort ();
  return 0;
}

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2020-04-27 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359

--- Comment #29 from Richard Biener  ---
And another testcase showing that with a conditional invariant store we may
_never_ apply store-motion since conditional means we do not know the
original order of stores in the loop.  *sigh*

typedef int A;
typedef float B;

float x[256];

void __attribute__((noinline,noclone))
foo(long unk, long ack)
{
  B *q = x;
  for (long i = 0; i < unk; ++i)
{
  q[i] = 42;
  if (ack & i)
*((A*)q + 4) = 1;
}
}

int main(void)
{
  foo(5, 3);
  if (x[4] != 42) __builtin_abort();
  return 0;
}


implementation-wise it should still work, in hoist_memory_references, when
we computed the whole set of refs to apply store-motion to in a loop,
track the sequencing of stores from entry to the latch edge and at
merge points reject (parts of?) the sequence when there are mismatches
in ordering.  For the above there'd be { q[i], *((A*)q + 4) } and { q[i] }
and thus a mismatch.  If we reject only parts of the sequence those
parts would need to be disambiguated against the previous stores in the
sequence w/o TBAA.

Trying to code that up now.

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2020-04-27 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359

--- Comment #28 from Richard Biener  ---
OK, so more "advanced" testcases are a bit difficult because the
ref_always_accessed_p logic is too simple and it's required for
store-motion of accesses in conditional paths.  Basically if
we have if (test) { *p = 1; } else { *p = 0; } LIM doesn't figure
*p is accessed unconditionally (that's a missed optimization).

Anyway, here's a testcase circumventing that by using static storage
and showing that conditionally executed SM (with and without
-fallow-store-data-races) fail.  It's supposed to be a testcase for a "merge"
point of two
different sequenced store sequences we need to check.

typedef int A;
typedef float B;

float x[256];

void __attribute__((noinline,noclone))
foo(long unk, long ack)
{
  B *q = x;
  for (long i = 0; i < unk; ++i)
{
  if (ack & i)
{
  *((A*)q + 4) = 1;
  q[i] = 42;
}
  else
{
  q[i] = 42;
  *((A*)q + 4) = 1;
}
}
}

int main(void)
{
  foo(5, 6);
  if (x[4] != 42) __builtin_abort();
  return 0;
}

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2020-04-21 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359

--- Comment #27 from Richard Biener  ---
(In reply to Richard Biener from comment #24)
> (In reply to Richard Biener from comment #22)
> > Created attachment 48311 [details]
> > patch
> > 
> > Note that apart from the possible bad impact on optimization when fixing 
> > this
> > bug an actual fix is complicated by the custom "optimized" dependence
> > analysis
> > code in the loop invariant motion pass.
> > 
> > A conservative "simple" patch would be the attached but that doesn't 
> > preserve
> > store-motion for the following (because the LIM data dependence code doesn't
> > care about stmt order):
> > 
> > typedef int A;
> > typedef float B;
> > 
> > void __attribute__((noinline,noclone))
> > foo(A *p, B *q, long unk)
> > {
> >   for (long i = 0; i < unk; ++i) {
> >   q[i] = 42;
> >   *p = 1;
> >   }
> > }
> > 
> > usually this bug doesn't manifest itself but of course the fix will be
> > experienced everywhere.  Benchmarking the simple patch might reveal
> > it's not an issue (but I doubt that...).
> 
> One case like this is gcc.dg/tree-ssa/pr81744.c which fails after the patch
> because we do not SM the global induction variable update which is already
> last before exit.  Similarly gcc.dg/graphite/pr80906.c and
> gcc.target/i386/pr64110.c - that's all of the GCC testsuite fallout on
> x86_64.  I do not
> think those regressions are acceptable on its own but I'll throw the patch
> on SPEC CPU 2006 to get more data (I fear even a solution preserving the
> cited regressions will regress actual code too much).

Results on a x86 Haswell CPU (-Ofast -march=native -flto), base unpatched
and peak patched (current trunk):

  Estimated   Estimated
Base Base   BasePeak Peak   Peak
Benchmarks  Ref.   Run Time Ratio   Ref.   Run Time Ratio
-- --  -  ---  -  -
410.bwaves  13590170   80.0 *   13590175   77.6 *
416.gamess  19580614   31.9 *   19580614   31.9 *
433.milc 9180335   27.4 *9180338   27.2 *
434.zeusmp   9100227   40.0 *9100228   39.8 *
435.gromacs  7140244   29.2 *7140245   29.2 *
436.cactusADM   11950225   53.2 *   11950224   53.3 *
437.leslie3d 9400217   43.4 *9400225   41.8 *
444.namd 8020304   26.4 *8020302   26.5 *
447.dealII  11440201   56.8 *   11440202   56.6 *
450.soplex   8340226   36.9 *8340227   36.7 *
453.povray   5320101   52.8 *5320101   52.9 *
454.calculix 8250265   31.1 *8250265   31.1 *
459.GemsFDTD10610316   33.5 *   10610315   33.6 *
465.tonto9840258   38.1 *9840258   38.1 *
470.lbm 13740256   53.7 *   13740261   52.7 *
481.wrf 11170235   47.5 *   11170237   47.2 *
482.sphinx3 19490370   52.7 *   19490373   52.3 *
 Est. SPECfp_base2006  41.3
 Est. SPECfp2006   41.0

400.perlbench9770249   39.2 *9770248   39.3 *
401.bzip29650388   24.9 *9650389   24.8 *
403.gcc  8050228   35.3 *8050230   35.0 *
429.mcf  9120246   37.1 *9120241   37.9 *
445.gobmk   10490388   27.1 *   10490388   27.0 *
456.hmmer9330152   61.3 *9330151   61.7 *
458.sjeng   12100426   28.4 *   12100428   28.3 *
462.libquantum  20720314   66.0 *   20720308   67.3 *
464.h264ref 22130414   53.5 *   22130414   53.4 *
471.omnetpp  6250290   21.5 *6250301   20.8 *
473.astar7020308   22.8 *7020308   22.8 *
483.xalancbmk6900180   38.4 *6900181   38.2 *
 Est. SPECint(R)_base2006  35.5
 Est. SPECint2006  35.5

the "positive" ones are actually noise where spec median picked not the
fastest run for base.  There's enough negatives to not consider this
simple solution.  Actual consistent negatives to look at (to make sure
a better solution handles required cases) are 437.leslie3d and 471.omnetpp,
the rest are too much in the noise.

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2020-04-20 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359

--- Comment #26 from rguenther at suse dot de  ---
On Mon, 20 Apr 2020, pascal_cuoq at hotmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359
> 
> --- Comment #25 from Pascal Cuoq  ---
> Would it be reasonable to have three settings for -fstrict-aliasing, rather
> then the current two?
> 
> - off
> - strict
> - assume-no-reuse
> 
> (I would let you find a better name for the third one.)

I think it's clearly a GCC bug we need to fix, for users we might want to
give more control in telling the compiler about loop dependences.

So no, I don't like another -f[no-]strict-* option.

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2020-04-20 Thread pascal_cuoq at hotmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359

--- Comment #25 from Pascal Cuoq  ---
Would it be reasonable to have three settings for -fstrict-aliasing, rather
then the current two?

- off
- strict
- assume-no-reuse

(I would let you find a better name for the third one.)

It seems to me that the wrong transformation corresponds to code patterns that
a C developers familiar with the compiled code would be able to tell are used
or not in the compiled code: repurposing of dynamically allocated memory and
access to a union through pointers.

(Note: in https://trust-in-soft.com/wp-content/uploads/2017/01/vmcai.pdf we
remarked that GCC is particularly user-friendly in both documenting what uses
of unions are compatible with -fstrict-aliasing and in sticking to the
documented behavior. The place in the GCC documentation where this is
documented would already explain a lot of the context for explaining the
difference between the strict and assume-no-reuse settings.)

Even if you set -fstrict-aliasing=assume-no-reuse as implied by -O2, this would
still be a much welcome improvement compared to the current situation. GCC
would continue to be compared fairly in benchmarks to other compilers that have
the same approximation, most programs would continue to work fine because they
do not rely on the dangerous patterns, and those that rely on the dangerous
pattern would have a way out.

It would be vaguely comparable to -ffast-math.

One possible psychological drawback may be that if the “strict” option exists,
some users may ask why GCC does not stick to it in the -On settings.

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2020-04-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359

--- Comment #24 from Richard Biener  ---
(In reply to Richard Biener from comment #22)
> Created attachment 48311 [details]
> patch
> 
> Note that apart from the possible bad impact on optimization when fixing this
> bug an actual fix is complicated by the custom "optimized" dependence
> analysis
> code in the loop invariant motion pass.
> 
> A conservative "simple" patch would be the attached but that doesn't preserve
> store-motion for the following (because the LIM data dependence code doesn't
> care about stmt order):
> 
> typedef int A;
> typedef float B;
> 
> void __attribute__((noinline,noclone))
> foo(A *p, B *q, long unk)
> {
>   for (long i = 0; i < unk; ++i) {
>   q[i] = 42;
>   *p = 1;
>   }
> }
> 
> usually this bug doesn't manifest itself but of course the fix will be
> experienced everywhere.  Benchmarking the simple patch might reveal
> it's not an issue (but I doubt that...).

One case like this is gcc.dg/tree-ssa/pr81744.c which fails after the patch
because we do not SM the global induction variable update which is already
last before exit.  Similarly gcc.dg/graphite/pr80906.c and
gcc.target/i386/pr64110.c - that's all of the GCC testsuite fallout on x86_64. 
I do not
think those regressions are acceptable on its own but I'll throw the patch
on SPEC CPU 2006 to get more data (I fear even a solution preserving the
cited regressions will regress actual code too much).

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2020-04-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359

--- Comment #23 from Richard Biener  ---
(In reply to Richard Biener from comment #22)
> Created attachment 48311 [details]
> patch
> 
> Note that apart from the possible bad impact on optimization when fixing this
> bug an actual fix is complicated by the custom "optimized" dependence
> analysis
> code in the loop invariant motion pass.
> 
> A conservative "simple" patch would be the attached but that doesn't preserve
> store-motion for the following (because the LIM data dependence code doesn't
> care about stmt order):
> 
> typedef int A;
> typedef float B;
> 
> void __attribute__((noinline,noclone))
> foo(A *p, B *q, long unk)
> {
>   for (long i = 0; i < unk; ++i) {
>   q[i] = 42;
>   *p = 1;
>   }
> }
> 
> usually this bug doesn't manifest itself but of course the fix will be
> experienced everywhere.  Benchmarking the simple patch might reveal
> it's not an issue (but I doubt that...).

Which means a better approach might be to, in addition to the existing
dependence testing, verify we can sink the stores through all exits
(IIRC there is/was a similar bug involving ordering of stores sunk where we'd
have to preserve the order).  There's again the difficult cases like

 for ()
  {
*p = 1;
q[i] = 42;
if (test)
  *p = 2; 
  }

which we currently sink as

 for ()
   {
 p_1 = 1;
 q[i] = 42;
 if (test)
   p_1 = 2;
   }
 *p = p_1;

if we want to preserve all sinking we'd have to replay all [possibly
aliased] stores - again more difficult when the store we sink is
in the latch (or when multiple exits are involved).  For the above
example do

  for ()
   {
 p_1 = 1;
 q[i] = 42;
 if (test)
   p_1 = 2;
   }
  *p = p_1;
  q[i] = 42;
  if (test)
*p = p_1;

with scanning for valid re-orderings starting from all exits we want
to consider sinking two stores that cannot be reordered.

Or we want to re-think the whole store-motion data dependence code.

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2020-04-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359

--- Comment #22 from Richard Biener  ---
Created attachment 48311
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48311=edit
patch

Note that apart from the possible bad impact on optimization when fixing this
bug an actual fix is complicated by the custom "optimized" dependence analysis
code in the loop invariant motion pass.

A conservative "simple" patch would be the attached but that doesn't preserve
store-motion for the following (because the LIM data dependence code doesn't
care about stmt order):

typedef int A;
typedef float B;

void __attribute__((noinline,noclone))
foo(A *p, B *q, long unk)
{
  for (long i = 0; i < unk; ++i) {
  q[i] = 42;
  *p = 1;
  }
}

usually this bug doesn't manifest itself but of course the fix will be
experienced everywhere.  Benchmarking the simple patch might reveal
it's not an issue (but I doubt that...).

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2020-04-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359

Richard Biener  changed:

   What|Removed |Added

 CC||pascal_cuoq at hotmail dot com

--- Comment #21 from Richard Biener  ---
*** Bug 94658 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2020-01-16 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359

--- Comment #20 from rguenther at suse dot de  ---
On Fri, 17 Jan 2020, pinskia at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359
> 
> --- Comment #19 from Andrew Pinski  ---
> (In reply to Richard Biener from comment #14)
> > Testcase from PR81028
> > 
> > extern void abort();
> > 
> > typedef int A;
> > typedef float B;
> > 
> > void __attribute__((noinline,noclone))
> > foo(A *p, B *q, long unk)
> > {
> >   for (long i = 0; i < unk; ++i) {
> >   *p = 1;
> >   q[i] = 42;
> >   }
> > }
> > 
> 
> I don't see how the above function cannot be optimized to what we currently
> optimize it to?  The C/C++ aliasing rules say p and q cannot be a related
> pointer at all.
> 
> In the inplacement new case, there is another issue going on.  Locals and
> changing types was a defect report against C++ (or at least I thought I saw
> one).

This is about "placement new" or rather the middle-end memory model.
You can also simply make p/q point to different union members of the
same union where then the above is still valid with unk == 0 but we
mess it up, re-ordering the stores (basically we say the stores don't
conflict which is not correct for WAW or WAR dependences)

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2020-01-16 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359

--- Comment #19 from Andrew Pinski  ---
(In reply to Richard Biener from comment #14)
> Testcase from PR81028
> 
> extern void abort();
> 
> typedef int A;
> typedef float B;
> 
> void __attribute__((noinline,noclone))
> foo(A *p, B *q, long unk)
> {
>   for (long i = 0; i < unk; ++i) {
>   *p = 1;
>   q[i] = 42;
>   }
> }
> 

I don't see how the above function cannot be optimized to what we currently
optimize it to?  The C/C++ aliasing rules say p and q cannot be a related
pointer at all.

In the inplacement new case, there is another issue going on.  Locals and
changing types was a defect report against C++ (or at least I thought I saw
one).

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2020-01-16 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359

Andrew Pinski  changed:

   What|Removed |Added

 CC||msl023508 at gmail dot com

--- Comment #18 from Andrew Pinski  ---
*** Bug 93298 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2019-04-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359

--- Comment #17 from Richard Biener  ---
We'd need to peel the last iteration as the following shows.  This also means
it is not enough to prove the loop actually iterates.  Peeling the last
iteration means we have to be able to identify that iteration.  Alternatively,
we might be able to re-issue all stores from the last iteration up until
the exit on the exit edge though if there are conditional stores on this
path this might prove interesting.

There's always the option to not apply store-motion here of course.

extern void abort();

typedef int A;
typedef float B;

void __attribute__((noinline,noclone))
foo(A *p, B *q, long unk)
{
  for (long i = 0; i < unk; ++i) {
  *p = 1;
  q[i] = 42;
  }
}

int main(void)
{
  char *mem = __builtin_malloc (sizeof (A) * 5);
  foo((A *)mem + 4, (B *)mem, 5);
  if (*((B *)mem + 4) != 42) abort();
  return 0;
}

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2019-01-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359

Richard Biener  changed:

   What|Removed |Added

   Keywords||alias
   Last reconfirmed|2013-05-31 00:00:00 |2019-1-8
Summary|wrong code for union access |store motion causes wrong
   |at -O3 on x86_64-linux  |code for union access at
   ||-O3

--- Comment #16 from Richard Biener  ---
Re-confirmed.  Not really working on this - all other compilers make the same
"mistake".  Cost of "fixing" is quite high (peeling one iteration).