[Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8

2024-01-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

--- Comment #12 from GCC Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:a98d5130a6dcff2ed4db371e500550134777b8cf

commit r14-8346-ga98d5130a6dcff2ed4db371e500550134777b8cf
Author: Richard Biener 
Date:   Mon Jan 15 12:55:20 2024 +0100

rtl-optimization/113255 - base_alias_check vs. pointer difference

When the x86 backend generates code for cpymem with the rep_8byte
strathegy for the 8 byte aligned main rep movq it needs to compute
an adjusted pointer to the source after doing a prologue aligning
the destination.  It computes that via

  src_ptr + (dest_ptr - orig_dest_ptr)

which is perfectly fine.  On RTL this is then

8: r134:DI=const(`g'+0x44)
9: {r133:DI=frame:DI-0x4c;clobber flags:CC;}
  REG_UNUSED flags:CC
   56: r129:DI=const(`g'+0x4c)
   57: {r129:DI=r129:DI&0xfff8;clobber flags:CC;}
  REG_UNUSED flags:CC
  REG_EQUAL const(`g'+0x4c)&0xfff8
   58: {r118:DI=r134:DI-r129:DI;clobber flags:CC;}
  REG_DEAD r134:DI
  REG_UNUSED flags:CC
  REG_EQUAL const(`g'+0x44)-r129:DI
   59: {r119:DI=r133:DI-r118:DI;clobber flags:CC;}
  REG_DEAD r133:DI
  REG_UNUSED flags:CC

but as written find_base_term happily picks the first candidate
it finds for the MINUS which means it picks const(`g') rather
than the correct frame:DI.  This way find_base_term (but also
the unfixed find_base_value used by init_alias_analysis to
initialize REG_BASE_VALUE) performs pointer analysis isn't
sound.  The following restricts the handling of multi-operand
operations to the case we know only one can be a pointer.

This for example causes gcc.dg/tree-ssa/pr94969.c to miss some
RTL PRE (I've opened PR113395 for this).  A more drastic patch,
removing base_alias_check results in only gcc.dg/guality/pr41447-1.c
regressing (so testsuite coverage is bad).  I've looked at
gcc.dg/tree-ssa tests and mostly scheduling changes are present,
the cc1plus .text size is only 230 bytes worse.  With the this
less drastic patch below most scheduling changes are gone.

x86_64 might not the very best target to test for impact, but
test coverage on other targets is unlikely to be very much better.

PR rtl-optimization/113255
* alias.cc (find_base_term): Remove PLUS/MINUS handling
when both operands are not CONST_INT_P.

* gcc.dg/torture/pr113255.c: New testcase.

[Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8

2024-01-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

--- Comment #11 from Richard Biener  ---
diff --git a/gcc/alias.cc b/gcc/alias.cc
index b2ec4806d22..0150dd699db 100644
--- a/gcc/alias.cc
+++ b/gcc/alias.cc
@@ -2272,6 +2272,8 @@ static bool
 base_alias_check (rtx x, rtx x_base, rtx y, rtx y_base,
  machine_mode x_mode, machine_mode y_mode)
 {
+  return 1;
+
   /* If the address itself has no known base see if a known equivalent
  value has one.  If either address still has no known base, nothing
  is known about aliasing.  */

(an experiment I did many years ago already) gives clean testresults
besides

+FAIL: gcc.dg/guality/pr41447-1.c   -O2  -DPREVENT_OPTIMIZATION  execution test
+FAIL: gcc.dg/guality/pr41447-1.c   -O3 -g  -DPREVENT_OPTIMIZATION  execution
test
+FAIL: gcc.dg/guality/pr41447-1.c   -Os  -DPREVENT_OPTIMIZATION  execution test
+FAIL: gcc.dg/guality/pr41447-1.c   -O2 -flto -fno-use-linker-plugin
-flto-partition=none  -DPREVENT_OPTIMIZATION execution test

and only minor change in code generation:

   textdata bss dec hex filename
48374841  64824 1939512 50379177300b9a9 ../obj2/gcc/cc1plus
48375065  64824 1939512 50379401300ba89 gcc/cc1plus

where the larger binary is the patched one.

Assembly-wise there are scheduling changes, missed scheduling over spills.

Recovering this and similar cases should be as easy as marking spill
MEMs with a flag (in MEM_EXPR) for example, distinguishing (classes of)
stack memory from the rest.  We should have this already (spill_slot_decl,
set_mem_attrs_for_spill), not sure why it doesn't look effective.

Improving test coverage for desired transforms would be nice as well.

[Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8

2024-01-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

--- Comment #10 from Richard Biener  ---
Hmm, trying to fix find_base_term isn't enough, init_alias_analysis
find_base_value needs to be adjusted as well.  One "obvious" mistake there
is a missing

diff --git a/gcc/alias.cc b/gcc/alias.cc
index b2ec4806d22..6aeb2167520 100644
--- a/gcc/alias.cc
+++ b/gcc/alias.cc
@@ -1492,6 +1492,13 @@ find_base_value (rtx src)
   {
rtx temp, src_0 = XEXP (src, 0), src_1 = XEXP (src, 1);

+   /* If both operands of a MINUS are known pointers then the
+  base is not either of them.  */
+   if (GET_CODE (src) == MINUS
+   && REG_P (src_0) && REG_POINTER (src_0)
+   && REG_P (src_1) && REG_POINTER (src_1))
+ return 0;
+
/* If either operand is a REG that is a known pointer, then it
   is the base.  */
if (REG_P (src_0) && REG_POINTER (src_0))

but of course that's not conservative - not having REG_POINTER set doesn't
mean it's not a pointer.  But even when we assume REG_POINTER is
correct the minus operands might not be REG_P.

This is really all totally wrong for what it is (pointer analysis on RTL).
On RTL we also lost constraints that arithmetic stays within an object.
It should likely be scrapped completely and re-done, possibly having
SET_DEST_POINTS_TO to be able to put SSA points-to info to SETs
(REG_ATTRs are too coarse, but would be possible as well, losing some of
the flow sensitivity).  Incoming args & frame analysis would need to be
implemented of course.  As said, I'm not sure analyzing RTL will yield
anything good while being conservative - and optimistic points-to is what
leads us to these kind of bugs ...

[Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8

2024-01-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

--- Comment #9 from Richard Biener  ---
So this:

static void
expand_set_or_cpymem_prologue_epilogue_by_misaligned_moves (rtx destmem, rtx
srcmem,
...
  /* See how many bytes we skipped.  */
  saveddest = expand_simple_binop (GET_MODE (*destptr), MINUS, saveddest,
   *destptr,
   saveddest, 1, OPTAB_DIRECT);
  /* Adjust srcptr and count.  */
  if (!issetmem)
*srcptr = expand_simple_binop (GET_MODE (*srcptr), MINUS, *srcptr,
   saveddest, *srcptr, 1, OPTAB_DIRECT);

is the problematical op, but I'm not seeing an easy way to avoid the MINUS
here.  It's also difficult to match (minus (..) (and ... aligning-cst))
in find_base_term as the AND is exposed via CSELIB values only.

find_base_term is known broken but base_alias_check continues to be "useful"
for aliasing with spill slots mostly.  find_base_term tries to do ad-hoc
points-to analysis but is not conservative in any way - it doesn't even
have a way to say a final "I don't know" which means there's no way to
perform a conservative correction to it.  In fact I don't think we can
make it conservative.

[Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8

2024-01-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

Richard Biener  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=49330,
   ||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=53705

--- Comment #8 from Richard Biener  ---
3: NOTE_INSN_FUNCTION_BEG
8: r134:DI=const(`g'+0x44)
9: {r133:DI=frame:DI-0x4c;clobber flags:CC;}
  REG_UNUSED flags:CC
...
   56: r129:DI=const(`g'+0x4c)
   57: {r129:DI=r129:DI&0xfff8;clobber flags:CC;}
  REG_UNUSED flags:CC
  REG_EQUAL const(`g'+0x4c)&0xfff8
   58: {r118:DI=r134:DI-r129:DI;clobber flags:CC;}
  REG_DEAD r134:DI
  REG_UNUSED flags:CC
  REG_EQUAL const(`g'+0x44)-r129:DI
   59: {r119:DI=r133:DI-r118:DI;clobber flags:CC;}
  REG_DEAD r133:DI
  REG_UNUSED flags:CC

so the source is

   - ((+0x44) - (+0x44)&0xfff...8)

that is, the original source adjusted by the alignment prologue amount
which is aligning the destination.  That's a classical example for where
find_base_term is confused.  I'm not sure there's a way to "obfuscate"
things here.  I think find_base_term ignores paths with & (align ops)
but it goes down the subtract base path here.  So maybe instead of
subtracting using & (align-1) for that would work.

[Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8

2024-01-09 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

--- Comment #7 from Jakub Jelinek  ---
Slightly cleaned up testcase:

struct S { unsigned a[10]; unsigned y; unsigned b[6]; } g[2];

__attribute__((noinline, noclone)) int
test (int x)
{
  struct S e[2] = { g[0], g[1] };
  int r = 0;
  if (x >= 0)
{
  r++;
  e[1].y++;
}
  g[1] = e[1];
  return r;
}

int
main ()
{
  test (1);
  if (g[1].y != 1)
__builtin_abort ();
  return 0;
}

[Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8

2024-01-09 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

Uroš Bizjak  changed:

   What|Removed |Added

 CC||hubicka at gcc dot gnu.org

--- Comment #6 from Uroš Bizjak  ---
(In reply to Richard Biener from comment #4)

> I can't decipher this from what expand generates but the problem lies
> there (the rep_8bytes expansion).

Let's ask Honza.

[Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8

2024-01-08 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

--- Comment #5 from Jakub Jelinek  ---
(In reply to Richard Biener from comment #4)
> I'm unsure the above parallel is valid, isn't parallel executing stmts
> in "parallel" (unspecified order)?

I don't see anything invalid on it.  In addition to the memory copying, it
describes the other effects at the end of the pattern (that the counter goes to
0 and the di/si pointers are incremented by the original counter times 8).  All
the uses of pseudos in the pattern are the values of those pseudos before the
instruction, then all the sets at the end set the updated pseudos.

[Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8

2024-01-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

--- Comment #4 from Richard Biener  ---
DSE thinks the store is dead because it falls off the function.

(insn 41 40 46 4 (set (mem/c:SI (plus:DI (reg/f:DI 19 frame)
(const_int -36 [0xffdc])) [2 e[1].y+0 S4 A32])
(reg:SI 98 [ e$1$y ])) "t.c":21:9 85 {*movsi_internal}
 (expr_list:REG_DEAD (reg:SI 98 [ e$1$y ])
(nil)))

...

(insn 64 63 68 4 (parallel [
(set (reg:DI 131)
(const_int 0 [0]))
(set (reg/f:DI 129)
(plus:DI (ashift:DI (reg:DI 131)
(const_int 3 [0x3]))
(reg/f:DI 129)))
(set (reg/f:DI 119)
(plus:DI (ashift:DI (reg:DI 131)
(const_int 3 [0x3]))
(reg/f:DI 119)))
(set (mem/c:BLK (reg/f:DI 129) [1  A64])
(mem/c:BLK (reg/f:DI 119) [1  A8]))
(use (reg:DI 131))
]) "t.c":21:9 1416 {*rep_movdi_rex64}
 (expr_list:REG_UNUSED (reg:DI 131)
(expr_list:REG_UNUSED (reg/f:DI 129)
(expr_list:REG_UNUSED (reg/f:DI 119)
(nil)

it appears to be a bit convoluted how we compute the source of the block
copy (reg/f:DI 119) and this is probably confusing "frame related" vs.
non-frame-related MEM disambiguation.

I'm unsure the above parallel is valid, isn't parallel executing stmts
in "parallel" (unspecified order)?

**scanning insn=64
  mem: (reg/f:DI 119)

   after canon_rtx address: (reg/f:DI 119)

   after cselib_expand address: (minus:DI (plus:DI (reg/f:DI 129)
(reg/f:DI 133))
(reg/f:DI 134))

   after canon_rtx address: (minus:DI (plus:DI (reg/f:DI 129)
(reg/f:DI 133))
(reg/f:DI 134))
  varying cselib base=17:1741806 offset = 0
 processing cselib load mem:(mem/c:BLK (reg/f:DI 119) [1  A8])
 processing cselib load against insn 55
removing from active insn=55 has store
 processing cselib load against insn 47
removing from active insn=47 has store
 processing cselib load against insn 41
mems_found = 0, cannot_delete = true

we're calling canon_true_dependence with

(gdb) p debug_rtx (store_info->mem)
(mem/c:SI (plus:DI (reg/f:DI 19 frame)
(const_int -36 [0xffdc])) [2 e[1].y+0 S4 A32])
(gdb) p debug_rtx (mem)
(mem/c:BLK (reg/f:DI 119) [1  A8])
(gdb) p debug_rtx (mem_addr)
(minus:DI (plus:DI (value:DI 8:1741643 @0x49c8ba8/0x49b8e80)
(value:DI 11:11 @0x49c8bf0/0x49b8f10))
(value:DI 9:9 @0x49c8bc0/0x49b8eb0))

and find_base_term of the mem_addr is returning

(symbol_ref:DI ("g_e") [flags 0x2] )

which is because of the weird way we compute the source address (involving
the address of the destination).  There's a very old bug about find_base_term
which tends to pick up a wrong base if multiple candidates are involved.

I can't decipher this from what expand generates but the problem lies
there (the rep_8bytes expansion).

[Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8

2024-01-07 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

--- Comment #3 from Uroš Bizjak  ---
_.dse1 pass is removing the store for some reason, -fno-dse "fixes" the
testcase.

Before _.dse1 pass, we have:

(insn 41 40 46 4 (set (mem/c:SI (plus:DI (reg/f:DI 19 frame)
(const_int -36 [0xffdc])) [2 e[1].y+0 S4 A32])
(reg:SI 98 [ e$1$y ])) "pr113255.c":21:9 85 {*movsi_internal}
 (expr_list:REG_DEAD (reg:SI 98 [ e$1$y ])
(nil)))

But _.dse1 pass decides that:

**scanning insn=41
  mem: (plus:DI (reg/f:DI 19 frame)
(const_int -36 [0xffdc]))

   after canon_rtx address: (plus:DI (reg/f:DI 19 frame)
(const_int -36 [0xffdc]))
  gid=1 offset=-36
 processing const base store gid=1[-36..-32)
mems_found = 1, cannot_delete = false

...

Locally deleting insn 41
deferring deletion of insn with uid = 41.

[Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8

2024-01-07 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

Jakub Jelinek  changed:

   What|Removed |Added

   Priority|P3  |P2
   Keywords|needs-bisection |
 CC||jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek  ---
Started with r11-963-g80d6f89e78fc3b772701988cc73aa8e8006283be

[Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8

2024-01-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |11.5
  Known to fail||10.2.0
   Last reconfirmed||2024-01-06
  Known to work||10.1.0
  Component|c   |target
 Ever confirmed|0   |1
Summary|wrong code with -O2 |[11/12/13/14 Regression]
   |-mtune=k8   |wrong code with -O2
   ||-mtune=k8
 Status|UNCONFIRMED |NEW

--- Comment #1 from Andrew Pinski  ---
Confirmed.

This looks to be a target issue.
`-O2 -mtune=k8  -mstringop-strategy=libcall` works but
`-O2 -mtune=k8  -mstringop-strategy=rep_8byte` does not.