[Bug ipa/106935] [11/12/13/14/15 Regression] ICE in redirect_call_stmt_to_callee, at cgraph.cc:1505 since r10-5098-g9b14fc3326e08797

2024-04-30 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106935

--- Comment #3 from Martin Jambor  ---
This ICE no longer happens with GCC 13, in fact after r13-4240-gfeeb0d68f1c708
(Martin Jambor: ipa-cp: Do not consider useless aggregate constants).  From the
patch description, it does not look to be a fix of the underlying issue.

[Bug ipa/102310] [11/12 Regression] ICE in visit_ref_for_mod_analysis with OpenACC

2024-04-30 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102310

Martin Jambor  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |13.1.0
            Summary|[11/12/13/14/15 Regression] |[11/12 Regression] ICE in
                   |ICE in                      |visit_ref_for_mod_analysis
                   |visit_ref_for_mod_analysis  |with OpenACC
                   |with OpenACC                |

--- Comment #10 from Martin Jambor  ---
This has been fixed in GCC 13 by r13-2665-g23baa717c991d7 (Julian Brown:
OpenMP/OpenACC struct sibling list gimplification extension and rework).

[Bug tree-optimization/113964] [11/12/13/14/15 Regression] repeat copy of struct

2024-04-17 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113964

--- Comment #5 from Martin Jambor  ---
(In reply to Richard Biener from comment #2)
> No, I think the issue is that ESRA leaves e.f0 alone:
> 
>   e$f3_7 = e.f3;
>   e$f0$f4_8 = e.f0.f4;
>   _1 = e$f0$f4_8;
>   _2 = (unsigned char) _1;
>   e$f3_9 = _2;
>   e.f0 = g_50;
>   e$f3_10 = MEM  [(struct S1 *)_50];
>   e$f0$f4_11 = MEM  [(struct S1 *)_50 + 24B];
>   MEM  [(union U8 *)] = e$f3_10;
>   MEM  [(union U8 *) + 24B] = e$f0$f4_11;
>   g_16 = e.f0;
> 
> it looks like it materializes the e.f0 = g_15 copy but fails to elide that
> (maybe assuming sth else will?)?  And then for some reason the final
> g_16 = e.f0 copy isn't replaced?!
> 
> So somehow SRA's heuristics go off.
> 
> Martin?

I am afraid this is just another example of what flow-insensitive SRA cannot
optimize well.  I'll keep it in the list of testcases to hopefully one day
improve on when we make it flow sensitive.

[Bug rtl-optimization/114452] Functions invoked through compile-time table of function pointers not inlined

2024-04-11 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114452

--- Comment #6 from Martin Jambor  ---
(In reply to Paweł Bylica from comment #5)
> (In reply to Martin Jambor from comment #4)
> > In this testcase all (well, both) functions referenced from the array
> > are semantically equivalent which is recognized by ICF but making it
> > be able to pass this information to the inliner would be
> > non-trivial... and is this the common case worth optimizing for?
> 
> I reduced the original code to the array of two identical functions.
> Originally, they weren't identical. I can update the test case if this makes
> more sense.

Probably not.  But how many elements does the array have in the original code? 
Perhaps we could speculatively inline them if there are only a few.
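
For readers without the testcase at hand, the pattern under discussion looks
roughly like the sketch below.  All names are made up for illustration and
this is not the reporter's code.  The second function is only a hand-written
picture of what speculating on a small set of targets would mean at the
source level, not what GCC actually emits.

```cpp
// Two small callees that happen to be semantically identical, as in the
// reduced testcase.  Names are hypothetical.
static int add_one_a (int x) { return x + 1; }
static int add_one_b (int x) { return x + 1; }

typedef int (*handler_fn) (int);

// Compile-time constant table of function pointers.
static const handler_fn table[] = { add_one_a, add_one_b };
static const unsigned table_len = sizeof (table) / sizeof (table[0]);

int
dispatch_all (int x)
{
  // Until this loop is unrolled, table[i] is a load with a variable index,
  // so the call target is not a known constant when the inliner runs.
  for (unsigned i = 0; i < table_len; i++)
    x = table[i] (x);
  return x;
}

int
dispatch_all_speculative (int x)
{
  // Hand-written equivalent of guessing the likely targets: each guarded
  // call is direct and therefore an ordinary inlining candidate.
  for (unsigned i = 0; i < table_len; i++)
    {
      handler_fn f = table[i];
      if (f == add_one_a)
        x = add_one_a (x);
      else if (f == add_one_b)
        x = add_one_b (x);
      else
        x = f (x);   // fallback keeps the indirect call
    }
  return x;
}
```

With only a few table entries the chain of comparisons stays short, which is
why the number of elements in the original code matters.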

[Bug testsuite/114662] [14 regression] new test case c_lto_pr113359-2 from r14-9841-g1e3312a25a7b34 fails

2024-04-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114662

--- Comment #5 from Martin Jambor  ---
Thanks a lot for taking care of it before I had a chance to.

[Bug ipa/113907] [11/12/13/14 regression] ICU miscompiled on x86 since r14-5109-ga291237b628f41

2024-04-08 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907

--- Comment #75 from Martin Jambor  ---
The above fixes the testcase from comment #58.  I am not sure if any other
testcases discussed here remain unresolved.  I am also not sure to what extent
we want that patch of mine, I guess I'll re-visit the idea in a few weeks.

[Bug ipa/113359] [13/14 Regression] LTO miscompilation of ceph on aarch64 and x86_64

2024-04-08 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359

--- Comment #26 from Martin Jambor  ---
This should be fixed on master, I'll backport the fix in a few weeks to at
least gcc-13 where it was reported.

[Bug ipa/114247] RISC-V: miscompile at -O3 and IPA SRA

2024-04-05 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114247

--- Comment #9 from Martin Jambor  ---
On master this has been fixed by r14-9813-g8cd0d29270d4ed where I
unfortunately copy-pasted a wrong bug number :-/

I assume this needs backporting to at least gcc-13 and gcc-12. I'll do
that in a week or two.

[Bug tree-optimization/113964] [11/12/13/14/15 Regression] repeat copy of struct

2024-04-05 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113964

--- Comment #4 from Martin Jambor  ---
Oops. I made a mistake, the commit above fixes PR 114247, sorry :-/
This one is the next in my queue.  Sorry again.

[Bug ipa/114247] RISC-V: miscompile at -O3 and IPA SRA

2024-04-04 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114247

--- Comment #7 from Martin Jambor  ---
Thanks, I will bootstrap and test the patch on x86_64 and submit it
for review then.

Can I ask you, can you please modify the testcase so that it does not
use printf but simply calls __builtin_abort in the miscompiled case
and just returns zero from main if it is OK?  That way we could
include it in our test suite.  Thanks a lot.
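
A minimal sketch of the requested shape, assuming a placeholder function
compute () in place of the code from the actual reproducer (which is not
shown here):

```cpp
// Sketch of a self-checking testcase.  compute () is a hypothetical
// placeholder for whatever code the real reproducer exercises.
static int __attribute__ ((noinline))
compute (int x)
{
  return x * 2 + 1;
}

// Returns zero when the result is as expected and aborts otherwise, so a
// test harness sees the failure without parsing any printed output.
int
check_result ()
{
  if (compute (20) != 41)
    __builtin_abort ();
  return 0;
}
```

In a real testcase main would then simply be
"int main () { return check_result (); }".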

[Bug ipa/113359] [13/14 Regression] LTO miscompilation of ceph on aarch64 and x86_64

2024-04-04 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359

--- Comment #24 from Martin Jambor  ---
(In reply to Jan Hubicka from comment #23)
> I however wonder if we really guarantee to copy the paddings everywhere else
> than the total scalarization part?
> (i.e. in all paths through the RTL expansion)

In PR 80689 I proposed that we sometimes not do that and the idea was
rejected.  And as far as I can recall the code, I don't think we do.

Anyway, I have sent the patch to the mailing list:
https://inbox.sourceware.org/gcc-patches/ri6jzlc25db@virgil.suse.cz/T/#u

[Bug ipa/113907] [11/12/13/14 regression] ICU miscompiled on x86 since r14-5109-ga291237b628f41

2024-04-04 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907

--- Comment #71 from Martin Jambor  ---
I have sent the patch to the mailing list:
https://inbox.sourceware.org/gcc-patches/ri6le5s25kl@virgil.suse.cz/T/#u

[Bug ipa/111571] [13 Regression] ICE in modify_call, at ipa-param-manipulation.cc:656

2024-04-04 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111571

Martin Jambor  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[13/14 Regression] ICE in   |[13 Regression] ICE in
                   |modify_call, at             |modify_call, at
                   |ipa-param-manipulation.cc:6 |ipa-param-manipulation.cc:6
                   |56                          |56

--- Comment #6 from Martin Jambor  ---
Fixed on master, fix queued for backporting to gcc 13 branch.

[Bug ipa/114247] RISC-V: miscompile at -O3 and IPA SRA

2024-04-04 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114247

--- Comment #4 from Martin Jambor  ---
I don't seem to be able to get riscv64 qemu running in reasonable
time.  Can someone please verify that the following patch fixes
the issue?

diff --git a/gcc/ipa-param-manipulation.cc b/gcc/ipa-param-manipulation.cc
index 3e0df6a6f77..b4ca78b652e 100644
--- a/gcc/ipa-param-manipulation.cc
+++ b/gcc/ipa-param-manipulation.cc
@@ -740,6 +740,12 @@ ipa_param_adjustments::modify_call (cgraph_edge *cs,
  }
   if (repl)
{
+ if (!useless_type_conversion_p(apm->type, repl->typed.type))
+   {
+ repl = force_value_to_type (apm->type, repl);
+ repl = force_gimple_operand_gsi (&gsi, repl,
+  true, NULL, true, GSI_SAME_STMT);
+   }
  vargs.quick_push (repl);
  continue;
}

[Bug ipa/114247] RISC-V: miscompile at -O3 and IPA SRA

2024-04-03 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114247

Martin Jambor  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org  |jamborm at gcc dot gnu.org

--- Comment #3 from Martin Jambor  ---
Mine.

[Bug ipa/113359] [13/14 Regression] LTO miscompilation of ceph on aarch64 and x86_64

2024-03-28 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359

--- Comment #22 from Martin Jambor  ---
Created attachment 57828
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57828&action=edit
Potential fix

I'm testing this patch

[Bug rtl-optimization/114452] Functions invoked through compile-time table of function pointers not inlined

2024-03-27 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114452

Martin Jambor  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|DUPLICATE                   |---
   Last reconfirmed|                            |2024-03-27
     Ever confirmed|0                           |1

--- Comment #4 from Martin Jambor  ---
This does not look like a duplicate of PR 111573.

Nevertheless, it is not quite obvious what to do here.  Inlining
happens before unrolling and I am not sure we'd consider unrolling in
early optimizations.  And without unrolling, the load from the array
is not easy to fold.

In this testcase all (well, both) functions referenced from the array
are semantically equivalent which is recognized by ICF but making it
be able to pass this information to the inliner would be
non-trivial... and is this the common case worth optimizing for?

[Bug ipa/113907] [11/12/13/14 regression] ICU miscompiled on x86 since r14-5109-ga291237b628f41

2024-03-20 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907

--- Comment #66 from Martin Jambor  ---
Created attachment 57750
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57750&action=edit
Patch comparing jump functions

I'm testing this patch.  (Not sure how to best check that it does not
inadvertently pessimize ICF too much, except for ICF testcases.)

[Bug ipa/114254] [11/12/13 regression] Indirect inlining through C++ member pointers fails if the underlying class has a virtual function

2024-03-20 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114254

Martin Jambor  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[11/12/13/14 regression]    |[11/12/13 regression]
                   |Indirect inlining through   |Indirect inlining through
                   |C++ member pointers fails   |C++ member pointers fails
                   |if the underlying class has |if the underlying class has
                   |a virtual function          |a virtual function

--- Comment #3 from Martin Jambor  ---
Fixed on trunk.  I may consider backporting to GCC 13 but probably not to
earlier versions.

[Bug ipa/108802] [11/12/13 Regression] missed inlining of call via pointer to member function

2024-03-20 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108802

Martin Jambor  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[11/12/13/14 Regression]    |[11/12/13 Regression]
                   |missed inlining of call via |missed inlining of call via
                   |pointer to member function  |pointer to member function

--- Comment #10 from Martin Jambor  ---
Fixed on trunk.  I may consider backporting to GCC 13 but probably not to
earlier versions.

[Bug ipa/113907] [11/12/13/14 regression] ICU miscompiled on x86 since r14-5109-ga291237b628f41

2024-03-20 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907

--- Comment #65 from Martin Jambor  ---
I hope to have some jump-function comparison functions ready for testing later
today.

[Bug target/112980] 64-bit powerpc ELFv2 does not allow nops to be generated before function global entry point

2024-03-19 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112980

--- Comment #5 from Martin Jambor  ---
I'd like to ping this, are there plans to implement this in the near-ish term?

[Bug ipa/111571] [13/14 Regression] ICE in modify_call, at ipa-param-manipulation.cc:656

2024-03-15 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111571

--- Comment #4 from Martin Jambor  ---
I have proposed a fix on the mailing list:
https://inbox.sourceware.org/gcc-patches/ri6r0gbwf7l@virgil.suse.cz/T/#u

[Bug tree-optimization/113757] [14 regression] ICE when building legion-23.03.0 since r14-8398

2024-03-08 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113757

Martin Jambor  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #10 from Martin Jambor  ---
Fixed.

[Bug ipa/114254] [11/12/13/14 regression] Indirect inlining through C++ member pointers fails if the underlying class has a virtual function

2024-03-08 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114254

--- Comment #1 from Martin Jambor  ---
I have proposed a patch on the mailing list:
https://inbox.sourceware.org/gcc-patches/ri6r0gkzvi4@virgil.suse.cz/T/#u

[Bug ipa/108802] [11/12/13/14 Regression] missed inlining of call via pointer to member function

2024-03-08 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108802

--- Comment #8 from Martin Jambor  ---
I have proposed an improved patch on the mailing list:
https://inbox.sourceware.org/gcc-patches/ri6r0gkzvi4@virgil.suse.cz/T/#u

[Bug ipa/114254] New: Indirect inlining through C++ member pointers fails if the underlying class has a virtual function

2024-03-06 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114254

            Bug ID: 114254
           Summary: Indirect inlining through C++ member pointers fails if
                    the underlying class has a virtual function
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: ipa
          Assignee: jamborm at gcc dot gnu.org
          Reporter: jamborm at gcc dot gnu.org
  Target Milestone: ---

Created attachment 57634
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57634&action=edit
testcase

Just add a virtual method to the class in our test
testsuite/g++.dg/ipa/iinline-2.C and indirect inlining will
unfortunately stop working.

At some point the C++ FE got clever and stopped emitting the complex
code checking if a member pointer points to a virtual method or a
normal one when the base class does not have any virtual method.  But
that meant that our testcases stopped exercising the pattern matching
code in ipa_analyze_indirect_call_uses and when that code changed with
r10-917-g3b47da42de621c (Martin Jambor: Make SRA re-construct original
memory accesses when easy) because of a small mistake, we lost the
intended ability to inline also these cases.

So this is a regression against 9.5, unfortunately.
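
For illustration, the kind of code affected looks roughly like the sketch
below.  It is not the attached testcase and not iinline-2.C itself; the class
and function names are made up.

```cpp
// Hypothetical class.  Merely having a virtual method means the front end
// has to emit the run-time check of whether a member pointer refers to a
// virtual or a normal method at each call through such a pointer.
struct Widget
{
  int value;
  virtual int scaled (int k) { return value * k; }
  int plain (int k) { return value + k; }
};

typedef int (Widget::*method_ptr) (int);

// The indirect call we would like to see inlined once call_through itself
// is inlined into a caller that passes a known member pointer.
static int
call_through (Widget *w, method_ptr m, int k)
{
  return (w->*m) (k);
}

int
use_plain ()
{
  Widget w;
  w.value = 40;
  return call_through (&w, &Widget::plain, 2);
}
```

Removing the virtual method from Widget gives the simpler form that our
existing tests were exercising.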

[Bug tree-optimization/114238] New: Multiple 554.roms_r run-time regressions (4%-20%) since r14-9193-ga0b1798042d033

2024-03-05 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114238

            Bug ID: 114238
           Summary: Multiple 554.roms_r run-time regressions (4%-20%)
                    since r14-9193-ga0b1798042d033
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jamborm at gcc dot gnu.org
                CC: rguenth at gcc dot gnu.org
            Blocks: 26163
  Target Milestone: ---
              Host: x86_64-linux, aarch64-linux
            Target: x86_64-linux, aarch64-linux

Our LNT instance has detected that the run-time of benchmark 554.roms_r
from the SPEC 2017 FPrate suite regressed on all machines on most
configurations by 4-20%.

For example:

simple -O2 -flto on AMD Zen 3 regressed by 14%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=470.537.0

on Zen2 -O2 -flto regression is the worst, 20%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=298.537.0

-Ofast -march=native -flto on AMD Zen 4 regressed by 7%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=959.537.0

-Ofast -march=native on AMD Zen 2 regressed by 17%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=295.537.0

but it also happens on Intel Skylake:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=800.537.0

or Aarch64:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=587.537.0

and there are smaller regressions on the PGO configurations too.

I have bisected the Zen 3 -O2 -flto case to r14-9193-ga0b1798042d033
(Richard Biener: tree-optimization/114074 - CHREC multiplication and
undefined overflow).  I have then verified that the Zen 4 -Ofast
-march=native -flto and Zen 2 -Ofast -march=native cases have also
been introduced by it:

commit a0b1798042d033fd2cc2c806afbb77875dd2909b
Author: Richard Biener 
Date:   Mon Feb 26 13:33:21 2024 +0100

tree-optimization/114074 - CHREC multiplication and undefined overflow

When folding a multiply CHRECs are handled like {a, +, b} * c
is {a*c, +, b*c} but that isn't generally correct when overflow
invokes undefined behavior.  The following uses unsigned arithmetic
unless either a is zero or a and b have the same sign.

I've used simple early outs for INTEGER_CSTs and otherwise use
a range-query since we lack a tree_expr_nonpositive_p and
get_range_pos_neg isn't a good fit.

PR tree-optimization/114074
* tree-chrec.h (chrec_convert_rhs): Default at_stmt arg to NULL.
* tree-chrec.cc (chrec_fold_multiply): Canonicalize inputs.
Handle poly vs. non-poly multiplication correctly with respect
to undefined behavior on overflow.

* gcc.dg/torture/pr114074.c: New testcase.
* gcc.dg/pr68317.c: Adjust expected location of diagnostic.
* gcc.dg/vect/vect-early-break_119-pr114068.c: Do not expect
loop to be vectorized.
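
To make the overflow issue in the commit message concrete, here is a small
worked example.  It is only a sketch with made-up helper names and has
nothing to do with how tree-chrec.cc is written.  With a = -2^30, b = 2^30
and c = 2, the expression (a + b*i) * c stays within signed 32 bits for
i = 0 and i = 1, but the step b*c of the folded form {a*c, +, b*c} does not,
so computing that step in signed arithmetic would overflow.  Doing the
folded computation in unsigned arithmetic gives the same values without
relying on signed overflow.

```cpp
#include <cstdint>

// Value of {a, +, b} * c at iteration i, evaluated as written:
// (a + b*i) * c.  64-bit arithmetic is used so the sketch itself never
// overflows.
static int64_t
original_form (int32_t a, int32_t b, int32_t c, int64_t i)
{
  return (int64_t (a) + int64_t (b) * i) * c;
}

// The folded form {a*c, +, b*c} with every operation done in 32-bit
// unsigned arithmetic, which wraps around.  The final conversion back to
// int32_t is modular with GCC (and required to be so since C++20).
static int32_t
folded_unsigned (int32_t a, int32_t b, int32_t c, uint32_t i)
{
  uint32_t base = uint32_t (a) * uint32_t (c);
  uint32_t step = uint32_t (b) * uint32_t (c);
  return int32_t (base + step * i);
}

// True if b*c, the step of the folded CHREC, fits in a signed 32-bit int.
static bool
step_fits_in_int32 (int32_t b, int32_t c)
{
  int64_t step = int64_t (b) * c;
  return step >= INT32_MIN && step <= INT32_MAX;
}
```

Here a is negative and b is positive, i.e. neither of the two conditions
under which the commit keeps signed arithmetic (a is zero, or a and b have
the same sign) holds.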


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

[Bug ipa/108802] [11/12/13/14 Regression] missed inlining of call via pointer to member function

2024-02-21 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108802

--- Comment #7 from Martin Jambor  ---
I have proposed a patch on the mailing list:
https://inbox.sourceware.org/gcc-patches/ri6y1bdx3yg@virgil.suse.cz/T/#u

[Bug ipa/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP

2024-02-21 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

Martin Jambor  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #16 from Martin Jambor  ---
Fixed.

[Bug ipa/111573] lambda functions often not inlined and optimized out

2024-02-20 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111573

--- Comment #2 from Martin Jambor  ---
I cannot see any difference at -O3 with or without -fno-early-inlining.

[Bug tree-optimization/112312] -O3 produces worse code than -O2 for std::ranges::lower_bound in some cases, not marking a loop as finite

2024-02-20 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112312

--- Comment #4 from Martin Jambor  ---
It seems this has been fixed in current master (which is to become gcc 14).
If my bisecting is correct, it has been fixed by r14-5628-g53ba8d669550d3 (Jan
Hubicka: inter-procedural value range propagation).

I guess it would be nice to add this testcase to the testsuite, so I'm keeping
this bug open (and on my TODO list).

[Bug ipa/108802] [11/12/13/14 Regression] missed inlining of call via pointer to member function

2024-02-19 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108802

Martin Jambor  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org  |jamborm at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #6 from Martin Jambor  ---
I think I know what to do.

[Bug ipa/113359] [13 Regression] LTO miscompilation of ceph on aarch64

2024-02-19 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359

--- Comment #15 from Martin Jambor  ---
Created attachment 57462
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57462&action=edit
Simple testcase (needs disabling early - and only early - SRA)

This is a simpler testcase which exhibits the problem on x86_64-linux
and current master.  Steps to reproduce:

$ ~/gcc/trunk/inst/bin/gcc -O2 -fno-strict-aliasing -fno-ipa-cp \
    --disable-tree-esra -flto pr113359.c -c -o 1.o
cc1: note: disable pass tree-esra for functions in the range of [0, 4294967295]

$ ~/gcc/trunk/inst/bin/gcc -O2 -fno-strict-aliasing -fno-ipa-cp \
    --disable-tree-esra -flto -DFILE2 pr113359.c -c -o 2.o
cc1: note: disable pass tree-esra for functions in the range of [0, 4294967295]

$ ~/gcc/trunk/inst/bin/gcc -flto 1.o 2.o -o test.exe

$ ./test.exe 
Aborted (core dumped)


If you add -fno-ipa-icf to the "compilation" commands, the test will
pass.

Late (post ICF) intra-procedural SRA is necessary to exhibit the
problem.  On the other hand, early SRA must be suppressed or it will
scalarize the aggregate assignment too early and the results will look
different to IPA-ICF.  Instead of using --disable-tree-esra we could
pass the address of tmp in both geta() and getb() to an empty function
coming from a third compilation unit.

Disabling strict aliasing is also necessary to show the problem; with
strict aliasing IPA-ICF takes the alias class of types into account
when hashing and considers geta() and getb() different from the start.

[Bug tree-optimization/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP

2024-02-19 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

--- Comment #6 from Martin Jambor  ---
I have proposed a patch on the mailing list that converts the array of lattices
to a vector:
https://inbox.sourceware.org/gcc-patches/ri6frxoxzpk@virgil.suse.cz/T/#u

[Bug lto/113712] [11/12/13/14 Regression] lto crash: when building 641.leela_s peek with Example-gcc-linux-x86.cfg (SPEC2017 1.1.9) since r10-3311-gff6686d2e5f797

2024-02-12 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113712

--- Comment #20 from Martin Jambor  ---
I have access to the benchmark and building it with -fprofile-generate
it fails for me (with an ICE in add_symbol_to_partition_1) only when I
use -fno-use-linker-plugin and either -std=c++11 or -std=c++03. Using
-std=c++14 also avoids the issue.  In any event, -fno-use-linker-plugin
looks necessary.

[Bug lto/113712] [11/12/13/14 Regression] lto crash: when building 641.leela_s peek with Example-gcc-linux-x86.cfg (SPEC2017 1.1.9) since r10-3311-gff6686d2e5f797

2024-02-12 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113712

--- Comment #18 from Martin Jambor  ---
(In reply to Filip Kastl from comment #17)
> I've bisected this (using the test from Andrew Pinski) to
> r10-3311-gff6686d2e5f797

That's a coincidence, with -fno-ipa-sra the testcase fails even earlier,
IPA-SRA was just hiding it, most probably by localizing some symbol before the
linking stage.

Bugs that are only reproducible with -fno-use-linker-plugin are unlikely to get
a high priority.  But I understand that the original issue does not need it?

(Also, the issue is supposed to be reproducible on x86_64-linux, right?)

[Bug target/113847] [14 Regression] 10% slowdown of 462.libquantum on AMD Ryzen 7700X and Ryzen 7900X

2024-02-12 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113847

--- Comment #6 from Martin Jambor  ---
(In reply to Richard Biener from comment #5)
> CCing also Martin who should know how/why IPA SRA doesn't reconstruct the
> component ref chain here 

I have not had a look at this specific case (yet), but IPA-SRA just
doesn't (unlike intraprocedural SRA) and always creates MEM_REFs (in
callers).  I guess we could stream field offsets and/or array_ref
indices and attempt to reconstruct it for simple (non-union,
non-otherwise-overlapping) types, even if it would make the
ipa_adjusted_param type (and thus ipa_param_adjustments) slightly
bigger and add another vector.

> or why it chooses the dynamic type as it does
> (possibly local SRA when fully scalarizing an aggregate copy does the same).

That is unlikely.  Total scalarization in intraprocedural SRA just
follows the type of the decl whereas IPA-SRA (and intra-SRA too when
not totally scalarizing) takes all types from existing memory
accesses.

[Bug tree-optimization/113833] 435.gromacs fails verification on with -Ofast -march={cascadelake,icelake-server} and PGO after r14-7272-g57f611604e8bab

2024-02-12 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113833

--- Comment #4 from Martin Jambor  ---
Created attachment 57397
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57397&action=edit
-fopt-info-vec before/after comparison

(In reply to Richard Biener from comment #3)
> A compare before/after the patch of -fopt-info-vec output might show the few
> cases that are affected by the patch.

I hope I have not messed anything up.  I have added -fopt-info-vec right after
-fprofile-use into the spec config and then grepped the output for
':[^:]*:[^:]*: optimized'.  Then I sorted (because the build was parallel) and
compared the output and it seems there are quite a few *fewer* instances of
vectorization happening.
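
The filtering step described above amounts to the following, shown here as a
small sketch rather than the grep and sort pipeline that was actually used.
The function name is made up, and the sample lines mentioned in the note
after the code are invented examples of the general "file:line:column: kind:
message" shape, not output taken from the benchmark build.

```cpp
#include <algorithm>
#include <regex>
#include <string>
#include <vector>

// Keep only the lines matching the same pattern that was passed to grep
// and sort them, so that output from a parallel build can be compared.
std::vector<std::string>
optimized_lines (const std::vector<std::string> &lines)
{
  static const std::regex pattern (":[^:]*:[^:]*: optimized");
  std::vector<std::string> kept;
  for (const std::string &line : lines)
    if (std::regex_search (line, pattern))
      kept.push_back (line);
  std::sort (kept.begin (), kept.end ());
  return kept;
}
```

Given a line such as "a.c:2:9: optimized: loop vectorized" the function
keeps it, while a line such as "a.c:1:1: missed: not vectorized" is dropped.
Running it on the before and after logs and comparing the two results shows
which loops changed status.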

[Bug tree-optimization/110422] asm goto vs SRA

2024-02-09 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422

Martin Jambor  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #9 from Martin Jambor  ---
Fixed on all open release branches too.

[Bug tree-optimization/113833] New: 435.gromacs fails verification on with -Ofast -march={cascadelake,icelake-server} and PGO after r14-7272-g57f611604e8bab

2024-02-08 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113833

            Bug ID: 113833
           Summary: 435.gromacs fails verification on with -Ofast
                    -march={cascadelake,icelake-server} and PGO after
                    r14-7272-g57f611604e8bab
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jamborm at gcc dot gnu.org
                CC: fxue at os dot amperecomputing.com
            Blocks: 26163
  Target Milestone: ---
              Host: x86_64-linux
            Target: x86_64-linux

After r14-7272-g57f611604e8bab (Feng Xue: Do not count unused scalar
use when marking STMT_VINFO_LIVE_P [PR113091]), our runs of SPEC 2006
CPU benchmark 435.gromacs on Icelake-server CPU compiled with -Ofast
-march=native and PGO (with and without LTO) started failing with
miscompare error:

  0002:  3.07684e+02
 3.03476e+02
   ^

I subsequently verified the failure on an Intel CascadeLake and
bisected it to the aforementioned commit.  We don't see it on our AMD
or Ampere testers (using -march=native).

I guess the miscomparison error may be well within what is expected
when using -Ofast but even in that case it would be nice to have it
documented here that that is indeed expected.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

[Bug tree-optimization/113757] [14 regression] ICE when building legion-23.03.0 since r14-8398

2024-02-08 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113757

--- Comment #8 from Martin Jambor  ---
I have proposed a fix on the mailing list:
https://inbox.sourceware.org/gcc-patches/ri6bk8r5kfi@virgil.suse.cz/T/#u

[Bug ipa/113359] [13 Regression] LTO miscompilation of ceph on aarch64

2024-02-07 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359

--- Comment #14 from Martin Jambor  ---
(In reply to rguent...@suse.de from comment #13)
> Might be also an interaction with IPA ICF in case there's a pointer to
> the pair involved?

Yes, this is exactly what seems to be happening.  The problem goes
away with -fno-ipa-icf.

(Possibly because the testcase uses -fno-strict-aliasing,) IPA-ICF
merges two functions which copy a structure.  The type of that access
is what IPA-SRA saves, but it loads only the one from one of the merged
functions.  SRA then uses the (wrong) type to split aggregate copies
into copies by individual fields.

I have talked to Honza about this.  It seems that IPA-ICF needs to be
careful about aggregates with holes in different places.  The ideal next
step would be to create a testcase not dependent on IPA-SRA.
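
Why holes in different places matter can be seen in the following sketch.
The two struct types are made up and are not the types from ceph; the sketch
also assumes a typical LP64 ABI such as x86_64-linux, where both types are
16 bytes.  Copying an object field by field according to the wrong one of
two same-sized types skips the bytes which that type considers padding.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Two hypothetical 16-byte layouts.  Wide uses all 16 bytes, Narrow has a
// 4-byte hole at the end.
struct Wide   { uint64_t first; uint64_t second; };
struct Narrow { uint64_t first; uint32_t second; };

static_assert (sizeof (Wide) == sizeof (Narrow),
               "the sketch assumes equally sized types");

// Copy *src to *dst field by field as if both had type Narrow, similar to
// what a scalarized aggregate copy does.  Bytes that are padding in Narrow
// are not copied.
static void
copy_fieldwise_as_narrow (Wide *dst, const Wide *src)
{
  const unsigned char *from = reinterpret_cast<const unsigned char *> (src);
  unsigned char *to = reinterpret_cast<unsigned char *> (dst);
  std::memcpy (to + offsetof (Narrow, first),
               from + offsetof (Narrow, first), sizeof (uint64_t));
  std::memcpy (to + offsetof (Narrow, second),
               from + offsetof (Narrow, second), sizeof (uint32_t));
}

// Returns true when the first field survives but the second one does not.
bool
fieldwise_copy_loses_data ()
{
  Wide src = { 1, 0x1122334455667788ULL };
  Wide dst = { 0, 0 };
  copy_fieldwise_as_narrow (&dst, &src);
  return dst.first == src.first && dst.second != src.second;
}
```

Two functions copying a Wide and a Narrow respectively can look identical
as long as they are plain aggregate copies, which is presumably why merging
them is tempting.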

[Bug ipa/113359] [13 Regression] LTO miscompilation of ceph on aarch64

2024-02-05 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359

--- Comment #9 from Martin Jambor  ---
SRA creates the replacements (in GCC 13) during total scalarization,
i.e. the bit that is not driven by pre-existing accesses to
aggregates but kicks in because SRA sees an aggregate that is small
and regular, and so splits it according to its type in the hope that
it will go away.

Unfortunately, the LTO and non-LTO cases see a different type.
I have added dumping of types and fields of totally scalarized
records and got the following.

In the non-LTO case, the type of the aggregate is:
   constant 128>
unit-size  constant 16>
align:64 warn_if_not_align:0 symtab:1430035184 alias-set -1 canonical-type
0x553cabd0
...

and specifically its third field is a pointer:
  
pointer_to_this >
unsigned DI
size 
unit-size 
align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x562729d8 reference_to_this >
used unsigned nonlocal decl_3 DI /usr/include/c++/13/bits/stl_pair.h:194:11
size  constant 64>
unit-size  constant 8>
align:64 warn_if_not_align:0 offset_align 128 decl_not_flexarray: 0
offset  constant 0>
bit-offset  constant 64> context >


However, in the LTO case the type of the aggregate is:
   constant 128>
unit-size  constant 16>
align:64 warn_if_not_align:0 symtab:0 alias-set 98 canonical-type
0x61cc1498
...

which however has an unsigned int as its third field:
 
unit-size 
align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x62410690 precision:32 min  max 
pointer_to_this  reference_to_this
>
unsigned nonlocal SI /usr/include/c++/13/bits/stl_pair.h:194:11
size  constant 32>
unit-size  constant 4>
align:32 warn_if_not_align:0 offset_align 128 decl_not_flexarray: 0
offset  constant 0>
bit-offset  constant 64> context >

And so only an unsigned int replacement is created.

The name of the aggregate indicates it has been created by IPA-SRA and
so that is where I am looking right now, but IPA-SRA simply takes (and
streams) the type of the access in the original function body for
these.  Can't this perhaps be some type-merging issue?

[Bug tree-optimization/113757] [14 regression] ICE when building legion-23.03.0 since r14-8398

2024-02-05 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113757

Martin Jambor  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org  |jamborm at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #7 from Martin Jambor  ---
This is a very particular interaction of the patch with speculative
devirtualization.  Mine.

[Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native

2024-01-31 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646

--- Comment #3 from Martin Jambor  ---
(In reply to Richard Biener from comment #1)
> Did you try with -fprofile-partial-training (is that default on?  it
> probably should ...).  Can you please try training with the rate data
> instead of train
> to rule out a mismatch?

With -fprofile-partial-training the znver4 LTO vs LTOPGO regression (on a newer
master) goes down from 66% to 54%.  

So far I did not find a way to easily train with the reference run (when I add
"train_with = refrate" to the config, I always get "ERROR: The workload
specified by train_with MUST be a training workload!")

[Bug target/113655] New: Cross compiling to mips64-elf fails because "MIPS_EXPLICIT_RELOCS was not declared" after r14-8386-g58af788d1d0825

2024-01-29 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113655

Bug ID: 113655
   Summary: Cross compiling to mips64-elf fails because
"MIPS_EXPLICIT_RELOCS was not declared" after
r14-8386-g58af788d1d0825
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
CC: syq at gcc dot gnu.org
  Target Milestone: ---
  Host: x86_64-linux
Target: mips64-elf

Starting with r14-8386-g58af788d1d0825 (MIPS: Accept arguments for
-mexplicit-relocs), when I try to test that cross compilation from
x86_64-linux to target mips64-elf still works by configuring gcc with:

../src/configure --prefix=/home/mjambor/gcc/mine/inst --enable-languages=c,c++
--enable-checking=yes --disable-bootstrap --disable-multilib --enable-obsolete
--target=mips64-elf

and then building just the compiler with make -j64 all-host,

the compilation fails with:

options.cc:3474:3: error: ‘MIPS_EXPLICIT_RELOCS’ was not declared in this
scope; did you mean ‘MIPS_EXPLICIT_RELOCS_NONE’?
 3474 |   MIPS_EXPLICIT_RELOCS, /* mips_opt_explicit_relocs */
      |   ^~~~~~~~~~~~~~~~~~~~
      |   MIPS_EXPLICIT_RELOCS_NONE


Our buildbot reports failures when building a cross-compiler for
mips64el-st-linux-gnu, mips64octeon-linux, mipsisa64r2-linux,
mipsisa32r2-linux-gnu, mipsisa64r2-sde-elf, mipsisa32-elfoabi,
mipsisa64-elfoabi, mipsisa64r2el-elf, mipsisa64sr71k-elf,
mipsisa64sb1-elf, mips64-elf, mipsel-elf, mips64vr-elf,
mips64orion-elf, mips-rtems, mips-wrs-vxworks, mipstx39-elf and I
suspect the problem is the same or similar.

[Bug gcov-profile/113646] New: PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native

2024-01-28 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646

Bug ID: 113646
   Summary: PGO hurts run-time of 538.imagick_r as much as 68% at
-Ofast -march=native
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: gcov-profile
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
CC: hubicka at gcc dot gnu.org
Blocks: 26163
  Target Milestone: ---
  Host: x86_64-linux, aarch64-linux
Target: x86_64-linux, aarch64-linux

Using profile guided optimization is very detrimental when compiling SPEC 2017
FPrate benchmark 538.imagick_r at -Ofast -march=native (with or without LTO) on
all machines where I have tried.

On Zen4, using PGO results in a binary that is 68% slower than without PGO when
not using LTO and 65% slower with LTO:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=970.507.0=966.507.0=959.507.0=958.507.0;

On Zen3, using PGO slows the binary down by 22% when not using LTO and by 30%
with LTO:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=471.507.0=473.507.0=475.507.0=477.507.0;

On Zen2, PGO regresses by 16% without LTO and by 28% with it:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=295.507.0=293.507.0=287.507.0=286.507.0;

On our Altra CPU, the slowdowns are 26% and 45%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=584.507.0=583.507.0=587.507.0=589.507.0;

On an Intel CascadeLake machine, they are 24% and 41%. (Our LNT Intel machine
is temporarily offline, unfortunately).

It is of course possible that the training workload does not match the
reference one very well.  However, this was not a problem in the past
(apparently the problem is that our non-PGO results improved but our PGO ones
did not).  Also, other compilers such as LLVM achieve better run-times with PGO
than without.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

[Bug target/113641] New: 510.parest_r with PGO at O2 slower than GCC 12 (7% on Zen 3&2, 4% on CascadeLake) since r13-4272-g8caf155a3d6e23

2024-01-28 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113641

Bug ID: 113641
   Summary: 510.parest_r with PGO at O2 slower than GCC 12 (7% on
Zen 3&2, 4% on CascadeLake) since
r13-4272-g8caf155a3d6e23
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
Blocks: 26163
  Target Milestone: ---
  Host: x86_64-linux-gnu
Target: x86_64-linux-gnu

During the development of GCC 13, 510.parest_r run-time regressed on x86_64:
when built with profile guided optimization and just plain -O2, master produces
slower code than GCC 12.  The difference is not big but fairly clear cut, about
7.6% on Zen3:

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=740.457.0=892.457.0=694.457.0;

and about 7.2% on Zen2:

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=777.457.0=932.457.0=687.457.0;

The graphs above show use of both LTO and PGO but LTO is not necessary.

I was able to bisect the regression to commit r13-4272-g8caf155a3d6e23 (i386:
Only enable small loop unrolling in backend [PR 107692]).  parest_r is also
about 4% slower when compiled with this revision than with the previous one on
Intel CascadeLake.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

[Bug target/113600] [14 regression] 525.x264_r run-time regresses by 8% with PGO -Ofast -march=znver4

2024-01-26 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113600

--- Comment #4 from Martin Jambor  ---
(In reply to Hongtao Liu from comment #2)
> A patch is posted at
> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640276.html
> 
> Would you give a try to see if it fixes the regression, I don't currently
> have a znver4 machine for testing.

Unfortunately it does not.

(In reply to Richard Biener from comment #3)
> I think we need to figure out what exactly gets slower (and hope it's not
> scattered all over the place)

I have collected some profiles:

r14-5602-ge6269bb69c0734

# Samples: 516K of event 'cycles:u'
# Event count (approx.): 468008188417
#
# Overhead   Samples  Command          Shared Object                          Symbol
# ........  ........  ...............  .....................................  ............................
#
    13.55%     69886  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] mc_chroma
    11.05%     57017  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_satd_16x16
     9.24%     47693  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_satd_8x8
     8.67%     44733  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] get_ref
     4.84%     24984  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] sub16x16_dct
     4.16%     21484  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_me_search_ref
     3.30%     17033  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_hadamard_ac_16x16
     2.28%     11770  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_satd_4x4
     2.10%     10824  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] quant_trellis_cabac
     2.07%     10694  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] hpel_filter
     2.05%     10616  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] sub8x8_dct
     1.86%      9593  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] refine_subpel
     1.70%      8788  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] quant_4x4
     1.57%      8077  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_sad_16x16
     1.16%      6324  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] frame_init_lowres_core
     1.14%      5867  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_sa8d_8x8
     1.11%      5738  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_cabac_encode_decision_c
     1.08%      5736  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_var_16x16



r14-5603-g2b59e2b4dff421

# Samples: 550K of event 'cycles:u'
# Event count (approx.): 498834737657
#
# Overhead   Samples  Command          Shared Object                          Symbol
# ........  ........  ...............  .....................................  ............................
#
    18.21%    100151  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_satd_16x16
    12.37%     68006  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] mc_chroma
     8.51%     46815  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_satd_8x8
     7.56%     41560  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] get_ref
     4.53%     24901  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] sub16x16_dct
     3.92%     21561  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_me_search_ref
     3.08%     16963  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_hadamard_ac_16x16
     2.41%     13239  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_satd_4x4
     1.99%     10931  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] quant_trellis_cabac
     1.96%     10801  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] hpel_filter
     1.95%     10764  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] sub8x8_dct
     1.56%      8587  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] quant_4x4
     1.49%      8166  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] refine_subpel
     1.48%      8124  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_sad_16x16
     1.09%      6328  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] frame_init_lowres_core
     1.07%      5901  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_sa8d_8x8
     1.04%      5703  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_cabac_encode_decision_c
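
One possible way to read the two profiles against each other is to rank the
symbols by how many samples they gained between r14-5602 and r14-5603.  The
following is only a minimal sketch: the names SampleDelta,
rank_by_sample_growth and compare_x264_profiles are hypothetical, and only a
handful of the sample counts listed in the two reports are copied in.  Note
that the total sample counts of the two runs differ (516K vs. 550K), so the
deltas are a rough indication rather than a precise measurement.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// One row of the comparison: sample counts of a symbol in both profiles.
struct SampleDelta {
  std::string symbol;
  long before;
  long after;
  long diff;  // after - before; positive means more samples in the newer build
};

// Rank the symbols present in both profiles by the number of samples gained.
// (Hypothetical helper, not part of perf or of any GCC tooling.)
std::vector<SampleDelta> rank_by_sample_growth(
    const std::map<std::string, long> &before,
    const std::map<std::string, long> &after) {
  std::vector<SampleDelta> rows;
  for (const auto &entry : before) {
    auto it = after.find(entry.first);
    if (it == after.end())
      continue;  // symbol is missing from the second profile
    rows.push_back({entry.first, entry.second, it->second,
                    it->second - entry.second});
  }
  std::sort(rows.begin(), rows.end(),
            [](const SampleDelta &a, const SampleDelta &b) {
              return a.diff > b.diff;
            });
  return rows;
}

// Sample counts copied from the two reports (a subset of the top entries).
std::vector<SampleDelta> compare_x264_profiles() {
  const std::map<std::string, long> r14_5602 = {
      {"mc_chroma", 69886},
      {"x264_pixel_satd_16x16", 57017},
      {"x264_pixel_satd_8x8", 47693},
      {"get_ref", 44733},
      {"x264_pixel_satd_4x4", 11770}};
  const std::map<std::string, long> r14_5603 = {
      {"mc_chroma", 68006},
      {"x264_pixel_satd_16x16", 100151},
      {"x264_pixel_satd_8x8", 46815},
      {"get_ref", 41560},
      {"x264_pixel_satd_4x4", 13239}};
  return rank_by_sample_growth(r14_5602, r14_5603);
}
```

With these inputs x264_pixel_satd_16x16 comes out first, having gained 43134
samples (57017 to 100151), while the other listed symbols change only slightly.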

[Bug tree-optimization/107946] [13/14 Regression] 507.cactuBSSN_r regresses by ~9% on znver3 with PGO since r13-3875-g9e11ceef165bc0

2024-01-26 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107946

Martin Jambor  changed:

   What|Removed |Added

   Last reconfirmed||2024-01-26
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #7 from Martin Jambor  ---
This regression is still there (as the graphs linked in the summary show).

[Bug target/113600] New: 525.x264_r run-time regresses by 8% with PGO -Ofast -march=znver4

2024-01-25 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113600

Bug ID: 113600
   Summary: 525.x264_r run-time regresses by 8% with PGO -Ofast
-march=znver4
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
CC: liuhongt at gcc dot gnu.org
Blocks: 26163
  Target Milestone: ---
  Host: x86_64-linux-gnu
Target: x86_64-linux-gnu

With profile-feedback, -Ofast and -march=native on an AMD Zen 4, there is a
recent 8% regression:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=979.377.0=966.377.0;

With both PGO and LTO, the situation is similar (6%):
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=977.377.0=958.377.0;

On a Zen3 machine, there is a 2% bump around the same time:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=900.377.0=473.377.0;

I have bisected the (non-LTO) Zen 4 case to commit r14-5603-g2b59e2b4dff421:

2b59e2b4dff42118fe3a505f07b9a6aa4cf53bdf is the first bad commit
commit 2b59e2b4dff42118fe3a505f07b9a6aa4cf53bdf
Author: liuhongt 
Date:   Thu Nov 16 18:38:39 2023 +0800

Support reduc_{plus,xor,and,ior}_scal_m for vector integer mode.

BB vectorizer relies on the backend support of
.REDUC_{PLUS,IOR,XOR,AND} to vectorize reduction.

gcc/ChangeLog:

PR target/112325
* config/i386/sse.md (reduc__scal_): New expander.
(REDUC_ANY_LOGIC_MODE): New iterator.
(REDUC_PLUS_MODE): Extend to VxHI/SI/DImode.
(REDUC_SSE_PLUS_MODE): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112325-1.c: New test.
* gcc.target/i386/pr112325-2.c: New test.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

[Bug target/105275] 525.x264_r and 538.imagick_r regressed on x86_64 at -O2 with PGO after r12-7319-g90d693bdc9d718

2024-01-24 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105275

--- Comment #3 from Martin Jambor  ---
I have re-checked this year again (using master revision
r14-7200-g95440171d0e615)  but this time on a high-frequency Zen3 CPU (EPYC
75F3). Run-time of 525.x264_r built with master with PGO and -O2 improved by
5.49% compared to GCC 13 and so compared to GCC 11 the regression dropped to
4.2%.

Run-time of 538.imagick_r compiled with the same options and master is 5.8%
slower on this CPU than when compiling it with GCC 11.

With both PGO and LTO, 525.x264_r is now only 2.8% slower than GCC 11.  In case
of 538.imagick_r the regression is 2.01% on the zen3, but it is 7.49% on a zen4
machine :-/

[Bug ipa/112616] [11/12/13/14 Regression] wrong code at -O{s, 2, 3} on x86_64-linux-gnu since r10-3311

2024-01-24 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112616

--- Comment #8 from Martin Jambor  ---
Fixed on trunk.  I did not want to backport this but because this variant does
not require disabling DCE, I will probably do so after a few weeks on master, if
there are no issues.

[Bug ipa/108007] [11/12/13/14 Regression] wrong code at -Os and above with "-fno-dce -fno-tree-dce" on x86_64-linux-gnu since r10-3311-gff6686d2e5f797

2024-01-24 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108007

--- Comment #22 from Martin Jambor  ---
Fixed on trunk.  I did not want to backport this but because of PR 112616 I
will probably do so after a few weeks on master, if there are no issues.

[Bug ipa/113490] [14 Regression] ICE: in propagate_vals_across_arith_jfunc, at ipa-cp.cc:2425 at -O3 since r14-285

2024-01-24 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113490

Martin Jambor  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Martin Jambor  ---
Fixed.

[Bug tree-optimization/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP

2024-01-22 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

--- Comment #4 from Martin Jambor  ---
The right place to free stuff in lattices post-IPA would be
ipa_node_params::~ipa_node_params(), where we should iterate over lattices and
deinitialize them, or perhaps destruct the array, because ipcp_vr_lattice
directly contains Value_Range, which AFAIU directly contains int_range_max,
which has a virtual destructor... so it does not look like a POD anymore.  This
escaped me when I was looking at the IPA-VR changes but hopefully it should not
be too difficult to deal with.
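
The general problem can be illustrated with a small stand-alone sketch.
Everything in it is hypothetical: RangeLike, LatticeLike and NodeParamsLike are
illustrative stand-ins and not the real GCC classes, and the raw malloc'ed
array is only an assumption about how such storage might be managed.  The point
is merely that once an element type is not trivially destructible, the owner
has to run the element destructors explicitly before releasing the storage,
otherwise whatever the elements own is leaked.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>
#include <type_traits>

// Number of heap buffers currently owned by RangeLike objects.
static int live_buffers = 0;

// Hypothetical stand-in for a range class that owns heap memory and has a
// virtual destructor.
struct RangeLike {
  int *buf;
  RangeLike() : buf(new int[4]) { ++live_buffers; }
  RangeLike(const RangeLike &) = delete;
  RangeLike &operator=(const RangeLike &) = delete;
  virtual ~RangeLike() {
    delete[] buf;
    --live_buffers;
  }
};

// Hypothetical lattice that directly contains such a range.
struct LatticeLike {
  RangeLike range;
};

// The containing type is no longer trivially destructible (not a POD).
static_assert(!std::is_trivially_destructible<LatticeLike>::value,
              "LatticeLike needs its destructor to be run");

// Hypothetical owner that keeps its lattices in raw storage.
struct NodeParamsLike {
  LatticeLike *lattices;
  unsigned count;

  explicit NodeParamsLike(unsigned n)
      : lattices(static_cast<LatticeLike *>(
            std::malloc(n * sizeof(LatticeLike)))),
        count(n) {
    assert(lattices != nullptr || n == 0);
    for (unsigned i = 0; i < count; ++i)
      new (&lattices[i]) LatticeLike();  // construct in place
  }
  NodeParamsLike(const NodeParamsLike &) = delete;
  NodeParamsLike &operator=(const NodeParamsLike &) = delete;

  ~NodeParamsLike() {
    // Without this loop every RangeLike buffer would be leaked.
    for (unsigned i = 0; i < count; ++i)
      lattices[i].~LatticeLike();
    std::free(lattices);
  }
};

// Returns how many buffers are still alive after an owner went out of scope.
int buffers_left_after(unsigned n) {
  {
    NodeParamsLike params(n);
    assert(live_buffers == static_cast<int>(n));
  }
  return live_buffers;
}
```

In this sketch buffers_left_after(3) returns 0 only because the destructor of
the owner iterates over the elements; dropping that loop leaves three buffers
allocated.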

[Bug ipa/113490] [14 Regression] ICE: in propagate_vals_across_arith_jfunc, at ipa-cp.cc:2425 at -O3 since r14-285

2024-01-20 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113490

--- Comment #5 from Martin Jambor  ---
I have proposed a fix on the mailing list: 
https://inbox.sourceware.org/gcc-patches/ri6cytv3eyy.fsf@/T/#u

[Bug other/94629] 10 issues located by the PVS-studio static analyzer

2024-01-20 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94629

--- Comment #28 from Martin Jambor  ---
(In reply to David Binderman from comment #27)
> The original article checked gcc-10.
> gcc-13 is checked in the following article:
> 
> https://pvs-studio.com/en/blog/posts/cpp/1067/
> 
> I suspect it would be most unwise if any release of gcc after 13 
> introduced new bugs that were known to pvs-studio.

And is there already a bugzilla bug about these (or should I create one)?
I believe a new one would be better than re-using this one.

[Bug ipa/113490] [14 Regression] ICE: in propagate_vals_across_arith_jfunc, at ipa-cp.cc:2425 at -O3 since r14-285

2024-01-19 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113490

Martin Jambor  changed:

           What|Removed                      |Added

         Status|NEW                          |ASSIGNED
       Assignee|unassigned at gcc dot gnu.org|jamborm at gcc dot gnu.org

--- Comment #3 from Martin Jambor  ---
Still, let me have a look.

[Bug tree-optimization/110422] asm goto vs SRA

2024-01-19 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422

--- Comment #5 from Martin Jambor  ---
Fixed on trunk, I plan to backport to open release branches in the upcoming
weeks.

[Bug other/89863] [meta-bug] Issues in gcc that other static analyzers (cppcheck, clang-static-analyzer, PVS-studio) find that gcc misses

2024-01-17 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89863
Bug 89863 depends on bug 94629, which changed state.

Bug 94629 Summary: 10 issues located by the PVS-studio static analyzer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94629

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug other/94629] 10 issues located by the PVS-studio static analyzer

2024-01-17 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94629

Martin Jambor  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #26 from Martin Jambor  ---
(In reply to Martin Liška from comment #25)
> No, there's still the 'ipa_polymorphic_call_context::set_by_invariant' issue
> that's waiting for Honza.

Finally fixed with:

https://gcc.gnu.org/g:4f4820964ebffc03249d98239a4ad2b43dd1a486

commit r14-8191-g4f4820964ebffc03249d98239a4ad2b43dd1a486
Author: Jan Hubicka 
Date:   Wed Jan 17 19:16:47 2024 +0100

Remove accidental hack in ipa_polymorphic_call_context::set_by_invariant

I managed to commit a hack setting offset to 0 in
ipa_polymorphic_call_context::set_by_invariant.  This makes it to give up
on multiple inheritance, but most likely won't give bad code since the other
base will be of different type.

gcc/ChangeLog:

* ipa-polymorphic-call.cc
(ipa_polymorphic_call_context::set_by_invariant): Remove
accidental hack reseting offset.

[Bug tree-optimization/110422] asm goto vs SRA

2024-01-17 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422

Martin Jambor  changed:

           What|Removed                      |Added

         Status|NEW                          |ASSIGNED
       Assignee|unassigned at gcc dot gnu.org|jamborm at gcc dot gnu.org

--- Comment #3 from Martin Jambor  ---
Mine.

[Bug ipa/112616] [11/12/13/14 Regression] wrong code at -O{s, 2, 3} on x86_64-linux-gnu since r10-3311

2024-01-16 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112616

--- Comment #6 from Martin Jambor  ---
(In reply to Andrew Pinski from comment #1)
>   # q_11 = PHI <0B(2), removed_return.14_14(D)(4),
> removed_return.14_14(D)(3)>
>   _12 = *q_11;
> 
> 
> WTF

Well, _12 is not used anywhere, so the code expects the entire load to be DCEd.
 But it gets optimized to 

  _2 = MEM[(int *)0B]; 

before DCE sees it and then even if _2 is never used anywhere, apparently the
statement is kept there as an intended trap (I guess).

I have adjusted my patch to make DCE for removed returns part of IPA edge
redirection so that it does not have compare-debug problems and submitted it
for review in: https://inbox.sourceware.org/gcc-patches/ri6cyu1e9kw.fsf@/T/#u

[Bug ipa/108007] [11/12/13/14 Regression] wrong code at -Os and above with "-fno-dce -fno-tree-dce" on x86_64-linux-gnu since r10-3311-gff6686d2e5f797

2024-01-16 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108007

--- Comment #20 from Martin Jambor  ---
I have submitted a slightly modified patch to the mailing list:
https://inbox.sourceware.org/gcc-patches/ri6cyu1e9kw.fsf@/T/#u

[Bug target/113296] [14 Regression] SPEC 2006 434.zeusmp segfaults on Aarch64 when built with -Ofast -march=native -flto

2024-01-12 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113296

Martin Jambor  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Martin Jambor  ---
According to our buildbot results, this has resolved itself sometime between 1
and 2 days ago.

I assume nobody wants to go and investigate what issue it was if it does not
reappear, so let me close the bug.

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2024-01-12 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
Bug 26163 depends on bug 113296, which changed state.

Bug 113296 Summary: [14 Regression] SPEC 2006 434.zeusmp segfaults on Aarch64 
when built with -Ofast -march=native -flto
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113296

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/113178] [14 Regression] ice in find_uses_to_rename_use

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113178

Martin Jambor  changed:

   What|Removed |Added

   Keywords|needs-bisection |

--- Comment #6 from Martin Jambor  ---
(In reply to David Binderman from comment #4)
> Reduced range seems to be g:0994ddd86f9c3d82 to g:a657c7e3518fcfc7.
> 
> All commits in this range are by Tamar.

Specifically r14-6822-g01f4251b8775c8 (Tamar Christina: middle-end: Support
vectorization of loops with multiple exits.)

[Bug tree-optimization/107823] [13/14 Regression] Dead Code Elimination Regression at -Os (trunk vs. 12.2.0) since r13-1934-g353fd1ec3df92f

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107823

Martin Jambor  changed:

   What|Removed |Added

   Keywords|needs-bisection |

--- Comment #6 from Martin Jambor  ---
This has been fixed by commit r14-4089-gd45ddc2c04e471 (Richard Biener:
tree-optimization/111294 - backwards threader PHI costing).

[Bug tree-optimization/109744] mesa/panvk: bogus Warray-bounds on gcc 12.2, fixed in 13 branch

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109744

Martin Jambor  changed:

   What|Removed |Added

   Keywords|needs-bisection |

--- Comment #3 from Martin Jambor  ---
The warning went away with commit r13-4389-gfd8dd6c0384969 (Richard Biener:
tree-optimization/107852 - missed optimization with PHIs).

[Bug c++/109753] [13/14 Regression] pragma GCC target causes std::vector not to compile (always_inline on constructor)

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109753

Martin Jambor  changed:

   What|Removed |Added

   Keywords|needs-bisection |

--- Comment #11 from Martin Jambor  ---
It seems there is nothing to bisect any more, please re-add the keyword if I am
wrong.

[Bug target/109780] [12/13/14 Regression] csmith: runtime crash with -O2 -march=znver1

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109780

Martin Jambor  changed:

   What|Removed |Added

   Keywords|needs-bisection |

--- Comment #26 from Martin Jambor  ---
Seems like there is nothing to bisect any more, please re-add the keyword if I
am wrong.

[Bug c++/109823] [11/12/13/14 Regression] ICE with trailing return of decltype of a fold expression in nested generic variadic lambda

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109823

Martin Jambor  changed:

   What|Removed |Added

 CC||jason at gcc dot gnu.org
   Keywords|needs-bisection |

--- Comment #4 from Martin Jambor  ---
The testcase from comment #1 started ICEing with commit dc58fa9f3142097b (Jason
Merrill: PR c++/84036 - ICE with variadic capture).

[Bug c/109828] [13/14 Regression] static compound literal with flexible array in initializer leads to invalid size and ICE

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109828

Martin Jambor  changed:

   What|Removed |Added

 CC||jsm28 at gcc dot gnu.org
   Keywords|needs-bisection |

--- Comment #11 from Martin Jambor  ---
ICE compiling testcase

-
#include <stddef.h>

struct s {
    int i;
    char c[];
};

const struct s s = { .c = "0", };
const struct s *const r = &(constexpr struct s) { .c = "1", };
const struct s *const t = &(static struct s) { .c = "2", };

size_t ice(void)
{
    return __builtin_object_size(t, 1);
}
--

with options -O2 -std=gnu2x -S was introduced with commit
r13-3930-gb556d1773db717 (Joseph Myers: c: C2x constexpr); before that, the
testcase is simply rejected with an error because it uses constexpr.

[Bug c++/109918] [11/12/13/14 Regression] Unexpected -Woverloaded-virtual with virtual conversion operators

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109918

Martin Jambor  changed:

   What|Removed |Added

   Keywords|needs-bisection |
 CC||nathan at gcc dot gnu.org

--- Comment #3 from Martin Jambor  ---
The testcase with -Werror=overloaded-virtual started failing with commit
r8-2669-gbff8b385e997a8 (Nathan Sidwell: Conversion operators have a special
name).

[Bug target/110001] [13/14 regression] Suboptimal code generation for branchless binary search

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110001

Martin Jambor  changed:

   What|Removed |Added

   Keywords|needs-bisection |
 CC||amacleod at redhat dot com

--- Comment #6 from Martin Jambor  ---
Even though I can confirm the observation from comment #1 that the optimized
tree dump does not seem to change in any meaningful way, bisection leads to
commit r12-4871-g502ffb1f389011 (Andrew MacLeod: Switch vrp2 to ranger).

[Bug c++/110065] [11/12/13/14 Regression] [C++20/2b] auto return type in template argument causes ICE, also accepts-invalid

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110065

Martin Jambor  changed:

   What|Removed |Added

 CC||jamborm at gcc dot gnu.org,
   ||jason at gcc dot gnu.org
   Keywords|needs-bisection |

--- Comment #2 from Martin Jambor  ---
The ICE when compiling the reduced testcase from comment #1 started with
r14-1659-gd3e2a174b13dd0 (Jason Merrill: c++: diagnose auto in template arg) -
before that it was an error.

[Bug tree-optimization/110091] [12/13/14 Regression] bogus -Wdangling-pointer on non-pointer values

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110091

Martin Jambor  changed:

   What|Removed |Added

 CC||jamborm at gcc dot gnu.org
   Keywords|needs-bisection |

--- Comment #3 from Martin Jambor  ---
The warning started appearing from its very introduction to gcc in
r12-6606-g9d6a0f388eb048 (Martin Sebor: Add -Wdangling-pointer [PR63272]).

[Bug middle-end/110294] [11 Regression] Segmentation fault with '-O3 -fno-dce -fno-toplevel-reorder -fno-tree-dce -fno-tree-pta -fno-tree-sink -ftoplevel-reorder'

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110294

Martin Jambor  changed:

   What|Removed |Added

 CC||jamborm at gcc dot gnu.org
   Keywords|needs-bisection |

--- Comment #6 from Martin Jambor  ---
(In reply to Xi Ruoyao from comment #2)
> Not reproducible with GCC 13.1.  I guess it's a duplicate of a fixed issue.

The testcase stopped failing with r12-248-gb58dc0b803057c (Richard Biener:
tree-optimization/99912 - delete trivially dead stmts during DSE)

[Bug tree-optimization/110450] [14 Regression] Dead Code Elimination Regression at -O2 since r14-261-g0ef3756adf0

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110450

Martin Jambor  changed:

   What|Removed |Added

   Keywords|needs-bisection |
 CC||jamborm at gcc dot gnu.org

--- Comment #3 from Martin Jambor  ---
This has been fixed with r14-4141-gbf6b107e2a3423 (Andrew MacLeod: New early
__builtin_unreachable processing).

[Bug ipa/110705] [11/12 Regression] ICE at -O2 and above: in gimplify_modify_expr, at gimplify.cc:6255 (on GCC-12.x)

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110705

Martin Jambor  changed:

   What|Removed |Added

 CC||jamborm at gcc dot gnu.org
   Keywords|needs-bisection |

--- Comment #2 from Martin Jambor  ---
This has been fixed with r13-1695-gb0f02eeb906b63 (Eric Botcazou: Fix ICE on
view conversion between struct and integer)

[Bug tree-optimization/110768] [14 Regression] Dead Code Elimination Regression since r14-2623-gc11a3aedec2

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110768

Martin Jambor  changed:

   What|Removed |Added

 CC||jamborm at gcc dot gnu.org
   Keywords|needs-bisection |

--- Comment #3 from Martin Jambor  ---
This has been fixed with r14-5109-ga291237b628f41 (Andrew MacLeod: Remove
simple ranges from trailing zero bitmasks)

[Bug libgomp/110842] [14 Regression] Openmp loops with KIND=16 DO loops

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110842

Martin Jambor  changed:

   What|Removed |Added

   Keywords|needs-bisection |

--- Comment #5 from Martin Jambor  ---
So IIUC nothing to bisect here and so I am removing the tag.  Please re-add if
I am somehow mistaken.

[Bug tree-optimization/110941] [14 Regression] Dead Code Elimination Regression at -O3 since r14-2379-gc496d15954c

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110941

Martin Jambor  changed:

   What|Removed |Added

 CC||jamborm at gcc dot gnu.org
   Keywords|needs-bisection |

--- Comment #4 from Martin Jambor  ---
This has been fixed with r14-5109-ga291237b628f41 (Andrew MacLeod: Remove
simple ranges from trailing zero bitmasks).

[Bug tree-optimization/110942] [14 Regression] Dead Code Elimination Regression at -O3 since r14-1165-g257c2be7ff8

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110942

Martin Jambor  changed:

   What|Removed |Added

   Keywords|needs-bisection |
 CC||jamborm at gcc dot gnu.org

--- Comment #5 from Martin Jambor  ---
This has been fixed with r14-5109-ga291237b628f41 (Andrew MacLeod: Remove
simple ranges from trailing zero bitmasks.)

[Bug tree-optimization/111003] [14 Regression] Dead Code Elimination Regression at -O3 since r14-2161-g237e83e2158

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111003

Martin Jambor  changed:

   What|Removed |Added

   Keywords|needs-bisection |
 CC||jamborm at gcc dot gnu.org

--- Comment #4 from Martin Jambor  ---
This has been fixed with r14-4786-gd118738e71cf46 (Richi's restrict invariant
motion of shifts).

[Bug tree-optimization/111012] [14 Regression] Dead Code Elimination Regression at -O3 since r14-573-g69f1a8af45d

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111012

Martin Jambor  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 CC||jamborm at gcc dot gnu.org
 Status|NEW |RESOLVED
   Keywords|needs-bisection |

--- Comment #3 from Martin Jambor  ---
This has been fixed with Richi's r14-3982-g9ea74d235c7e78 (better DCE after
forwprop).  Given the title of the patch I guess it's safe to declare this
fixed.

[Bug fortran/111291] ASAN error: heap-use-after-free gcc/fortran/parse.cc:359 in decode_statement

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111291

Martin Jambor  changed:

   What|Removed |Added

 CC|mjambor at suse dot cz |mikael at gcc dot gnu.org

--- Comment #3 from Martin Jambor  ---
This has been introduced with r14-7062-gbcf7ebba9115cc (fortran: Restore
interface to its previous state on error [PR48776]).

[Bug tree-optimization/110841] [14 Regression] Dead Code Elimination Regression since r14-2675-gef28aadad6e

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110841

Martin Jambor  changed:

   What|Removed |Added

   Keywords|needs-bisection |
 CC||jamborm at gcc dot gnu.org

--- Comment #3 from Martin Jambor  ---
The testcase has been fixed with r14-4141-gbf6b107e2a3423 (New early
__builtin_unreachable processing.)

[Bug ipa/113197] [14 Regression] ICE in handle_call_arg, at tree-ssa-structalias.cc:4119

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113197

Martin Jambor  changed:

   What|Removed |Added

 CC||jamborm at gcc dot gnu.org

--- Comment #8 from Martin Jambor  ---
Indeed this (the reduced testcase from comment #3) can be bisected to Honza's
r12-5177-g494bdadf28d0fb (Enable pure-const discovery in modref).

[Bug target/113296] New: SPEC 2006 434.zeusmp segfaults on Aarch64 when built with -Ofast -march=native -flto

2024-01-09 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113296

Bug ID: 113296
   Summary: SPEC 2006 434.zeusmp segfaults on Aarch64 when built
with -Ofast -march=native -flto
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
Blocks: 26163
  Target Milestone: ---
  Host: aarch64-linux
Target: aarch64-linux

Our Aarch64 benchmarker (armv8.2-a+crypto+fp16+rcpc+dotprod+ssbs)
signals that SPEC 2006 434.zeusmp segfaults at run-time when built
with -Ofast -march=native -flto and master revision
r14-7022-g34d339bbd0c1f5.


Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x4002978f in ???
#1  0x403ff0 in ggen_
at
/home/gcc/buildworker/source/cpu2006/benchspec/CPU2006/434.zeusmp/build/build_peak_amd64-m64-mine./ggen.f:762
#2  0x407a0f in setup_
at
/home/gcc/buildworker/source/cpu2006/benchspec/CPU2006/434.zeusmp/build/build_peak_amd64-m64-mine./setup.f:1135
#3  0x40fc3b in mstart_
at
/home/gcc/buildworker/source/cpu2006/benchspec/CPU2006/434.zeusmp/build/build_peak_amd64-m64-mine./mstart.f:301
#4  0x425ee3 in zeusmp
at
/home/gcc/buildworker/source/cpu2006/benchspec/CPU2006/434.zeusmp/build/build_peak_amd64-m64-mine./zeusmp.fppized.f:620
#5  0x400d5f in main
at
/home/gcc/buildworker/source/cpu2006/benchspec/CPU2006/434.zeusmp/build/build_peak_amd64-m64-mine./zeusmp.fppized.f:769


Unfortunately, at the moment I don't have access to another
adequate Aarch64 machine to debug further and so I cannot provide more
information (and so the component "target" is likely bogus, sorry).


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

[Bug target/113295] New: SPEC 2006 416.gamess miscompares on Aarch64 when built with -Ofast -march=native -flto

2024-01-09 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113295

Bug ID: 113295
   Summary: SPEC 2006 416.gamess miscompares on Aarch64 when built
with -Ofast -march=native -flto
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
Blocks: 26163
  Target Milestone: ---
  Host: aarch64-linux
Target: aarch64-linux

Our Aarch64 benchmarker (armv8.2-a+crypto+fp16+rcpc+dotprod+ssbs)
signals that SPEC 2006 416.gamess miscompares when built with -Ofast
-march=native -flto and master revision r14-7022-g34d339bbd0c1f5.

*** Miscompare of cytosine.2.out; 
1120:  1 C0.027630132   0.018067739   0.002234116
   1 C -223.432234062   7.107716215  -9.326017293
^
1121:  2 C0.012259576  -0.006051645  -0.67202
   2 C -205.307990130-173.019401916  -6.442472179
^
1122:  3 C   -0.012829758   0.003221329  -0.000743429
   3 C   -0.91858-263.923127366   3.131404191
^
1123:  4 N   -0.041204707   0.020932737  -0.000372560
   4 N  291.766837166-257.876625173  10.788390925
^
1124:  5 C0.057007688   0.032540385  -0.000909621
   5 C -204.215830139  57.403322317  -0.929403441
^
1125:  6 N   -0.015041867  -0.049945043   0.002129121
   6 N  117.540483305 300.802327718 -10.562573219
^
1126:  7 O   -0.076442899  -0.041673056  -0.000117411
   7 O  481.672983389 238.169793443   3.121961894
^
1127:  8 N0.034391335  -0.016048119  -0.001905357
   8 N -247.780884876  91.547672097  -1.133077767
^
1128:  9 H0.014938973   0.008953835   0.000373759
   9 H   -3.948185240   0.659195159  -0.007774350
^
1129: 10 H   -0.002268325   0.023480419   0.10207
  10 H   -0.160267091  -4.621874235  -0.178161165


Unfortunately, at the moment I don't have access to another
adequate Aarch64 machine to debug further and so at this time I cannot
provide more information (and so the component "target" is likely
bogus, sorry).


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

[Bug tree-optimization/113144] [14 regression] ICE when building dpkg-1.21.15 in verify_dominators (error: dominator of 9 should be 48, not 12)

2024-01-09 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113144

--- Comment #13 from Martin Jambor  ---
The testcase below segfaults when compiled with master configured with
release checking.  However, it is very likely affected by this bug
(with a checking-enabled compiler it fails the same way the testcases
for this issue do) and so I did not want to file a new bug for a
testcase where we know we currently have problems keeping dominance
information up to date.

Tamar, after you fix this issue, can you please check if the following
segfaults when compiled with -std=gnu99 -fpermissive -fgnu89-inline
-Ofast -march=znver2 -fprofile-generate -S ?

Thanks!

replace_reg_with_saved_mem_i, replace_reg_with_saved_mem_nregs,
replace_reg_with_saved_mem_mem_1;
replace_reg_with_saved_mem_mode() {
  if (replace_reg_with_saved_mem_i)
    return;
  while (++replace_reg_with_saved_mem_i < replace_reg_with_saved_mem_nregs)
    if (replace_reg_with_saved_mem_i)
      break;
  if (replace_reg_with_saved_mem_i)
    if (replace_reg_with_saved_mem_mem_1)
      adjust_address_1();
  replace_reg_with_saved_mem_mem_1 ? fancy_abort() : 0;
}

[Bug ipa/112616] [11/12/13/14 Regression] wrong code at -O{s, 2, 3} on x86_64-linux-gnu since r10-3311

2024-01-05 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112616

Martin Jambor  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |jamborm at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #5 from Martin Jambor  ---
(In reply to Andrew Pinski from comment #2)
> This is like PR 108007 but unlike that one, -fno-tree-dce is not used.

But the patch fixes it, so I guess it's time to make it pass ppc64le bootstrap.

(But I did not want to backport that patch, I wonder whether we can't figure
out something simpler :-/ )

[Bug tree-optimization/113145] [14 regression] ICE in verify_dominators when building mit-krb5-1.21.2 since r14-6822-g01f4251b8775c8

2024-01-05 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113145

Martin Jambor  changed:

   What|Removed |Added

Summary|[14 regression] ICE when|[14 regression] ICE in
   |building mit-krb5-1.21.2|verify_dominators when
   ||building mit-krb5-1.21.2
   ||since
   ||r14-6822-g01f4251b8775c8
   Last reconfirmed||2024-01-05
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
 CC||jamborm at gcc dot gnu.org

--- Comment #2 from Martin Jambor  ---
This has been introduced with r14-6822-g01f4251b8775c8 (middle-end: Support
vectorization of loops with multiple exits).

I have reduced another testcase from 526.blender_r, which however requires
-Ofast -march=x86-64-v3 -fprofile-generate so the original is probably better:

void *check_for_dupid_lb_0;
char check_for_dupid_name;
int check_for_dupid_nr;
void BLI_split_name_num();
char check_for_dupid() {
  int a;
  while (1) {
    for (; check_for_dupid_lb_0;)
      BLI_split_name_num();
    a = 0;
    for (; a < 64; a++)
      if (a >= check_for_dupid_nr)
        break;
    if (a && check_for_dupid_name)
      return 1;
  }
}

[Bug tree-optimization/113237] New: [14 Regression] ICE verify_ssa failed when building 500.perlbench_r since r14-6822-g01f4251b8775c8

2024-01-04 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113237

Bug ID: 113237
   Summary: [14 Regression] ICE verify_ssa failed when building
500.perlbench_r since r14-6822-g01f4251b8775c8
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
CC: tnfchris at gcc dot gnu.org
Blocks: 26163
  Target Milestone: ---
  Host: x86_64-linux
Target: x86_64-linux

With a compiler configured with --enable-checking=yes and the following
testcase derived from 500.perlbench_r with -O3 -march=x86-64-v3 I get a
verify_ssa ICE:

$ cat test.c 
long Perl_pp_split_limit;
int Perl_block_gimme();
int Perl_pp_split() {
  char strend;
  long iters;
  int gimme = Perl_block_gimme();
  while (--Perl_pp_split_limit) {
    if (gimme)
      iters++;
    if (strend)
      break;
  }
  if (iters)
    return 0;
}

$ $PREFIX/gcc -O3 -march=x86-64-v3  -S test.c 
test.c: In function ‘Perl_pp_split’:
test.c:3:5: error: definition in block 4 does not dominate use in block 6
3 | int Perl_pp_split() {
  | ^
for SSA_NAME: vect_iters_12.12_110 in statement:
vect_iters_12.12_111 = PHI 
PHI argument
vect_iters_12.12_110
for PHI node
vect_iters_12.12_111 = PHI 
during GIMPLE pass: vect
test.c:3:5: internal compiler error: verify_ssa failed
0x129673f verify_ssa(bool, bool)
/home/mjambor/gcc/mine/src/gcc/tree-ssa.cc:1203
0xf0bcd5 execute_function_todo
/home/mjambor/gcc/mine/src/gcc/passes.cc:2095
0xf0c13e execute_todo
/home/mjambor/gcc/mine/src/gcc/passes.cc:2142
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.

I have bisected the failure to r14-6822-g01f4251b8775c8 (middle-end: Support
vectorization of loops with multiple exits).  I have checked whether the patch
attached to PR 113137 helps, but unfortunately it does not.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

[Bug middle-end/109849] suboptimal code for vector walking loop

2024-01-03 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #34 from Martin Jambor  ---
(In reply to Jan Hubicka from comment #32)
> > /tmp/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/stl_algobase.h:437:
> > warning: 'void* __builtin_memcpy(void*, const void*, long unsigned int)'
> > writing between 2 and 9223372036854775806 bytes into a region of size 0
> > overflows the destination [-Wstringop-overflow=]
> 
> It warns on:
> 
>   template<bool _IsMove>
> struct __copy_move<_IsMove, true, random_access_iterator_tag>
> {
>   template<typename _Tp, typename _Up>
> _GLIBCXX20_CONSTEXPR
> static _Up*
> __copy_m(_Tp* __first, _Tp* __last, _Up* __result)
> {
>   const ptrdiff_t _Num = __last - __first;
>   if (__builtin_expect(_Num > 1, true))
> __builtin_memmove(__result, __first, sizeof(_Tp) * _Num);
>   else if (_Num == 1)
> std::__copy_move<_IsMove, false, random_access_iterator_tag>::
>   __assign_one(__result, __first);
>   return __result + _Num;
> }
> };
> 
> It is likely false positive on a code path that never happens in real
> code, but we now optimize it better.
> 

We end up with:
   [local count: 64736968]:
  __builtin_memcpy (1B, v$_M_impl$D10203$_M_start_448, _354);

IIRC the statement variant is created by jump threading (specifically
thread2).

Moreover, if I understand the comment in compute_objsize_r about the
INTEGER_CST case correctly, small integers are considered potential
"result of erroneous null pointer addition/subtraction."  So not
warning on a constant 1 destination does not seem to be desirable.
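
For reference, the branch structure of the quoted __copy_m can be modeled
in plain C roughly as below.  This is only a sketch under assumptions: the
name copy_m and the fixed int element type are hypothetical stand-ins for
the libstdc++ template, and __builtin_expect is omitted because it does not
change the result.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical C model of the quoted __copy_m: bulk copy when there is
   more than one element, a single assignment when there is exactly one,
   and nothing at all when the range is empty.  */
static int *copy_m(const int *first, const int *last, int *result)
{
  const ptrdiff_t num = last - first;
  if (num > 1)
    memmove(result, first, sizeof(int) * (size_t) num);
  else if (num == 1)
    *result = *first;          /* stands in for __assign_one */
  return result + num;
}
```

The memmove call is only reached on the num > 1 path, which is the path the
warning is issued for.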

[Bug tree-optimization/112822] [14 regression] ICE: invalid RHS for gimple memory store after r14-5831-gaae723d360ca26

2023-12-12 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112822

--- Comment #9 from Martin Jambor  ---
Thank you, I have proposed the patch on the mailing list:

https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640356.html

If it is approved, I'd also like you to add the testcase to the testsuite as a
target specific test.

[Bug middle-end/112822] [14 regression] ICE: invalid RHS for gimple memory store after r14-5831-gaae723d360ca26

2023-12-11 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112822

--- Comment #5 from Martin Jambor  ---
The following should fix it.  I'll try a bit more to come up with a testcase
that would not require __builtin_vec_vsx_st but so far my simple attempts
failed. 


diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
index 3bd0c7a9af0..99a1b0a6d17 100644
--- a/gcc/tree-sra.cc
+++ b/gcc/tree-sra.cc
@@ -4219,11 +4219,15 @@ load_assign_lhs_subreplacements (struct access *lacc,
  if (racc && racc->grp_to_be_replaced)
{ 
  rhs = get_access_replacement (racc);
+ bool vce = false;
  if (!useless_type_conversion_p (lacc->type, racc->type))
-   rhs = fold_build1_loc (sad->loc, VIEW_CONVERT_EXPR,
-  lacc->type, rhs);
+   {
+ rhs = fold_build1_loc (sad->loc, VIEW_CONVERT_EXPR,
+lacc->type, rhs);
+ vce = true;
+   }

- if (racc->grp_partial_lhs && lacc->grp_partial_lhs)
+ if (lacc->grp_partial_lhs && (vce || racc->grp_partial_lhs))
rhs = force_gimple_operand_gsi (&sad->old_gsi, rhs, true,
NULL_TREE, true,
GSI_SAME_STMT);
}
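
The effect of the last hunk on the gimplification condition can be checked
in isolation.  The following is a minimal sketch, assuming the three flags
can be treated as independent booleans; old_cond, new_cond and
count_differences are hypothetical names and not part of tree-sra.cc.

```c
#include <stdbool.h>

/* Condition before the patch: both accesses must be partial LHSs.  */
static bool old_cond(bool lacc_partial, bool racc_partial, bool vce)
{
  (void) vce;                  /* the old condition ignored the V_C_E */
  return racc_partial && lacc_partial;
}

/* Condition after the patch: a freshly built VIEW_CONVERT_EXPR also
   forces gimplification when the left access is a partial LHS.  */
static bool new_cond(bool lacc_partial, bool racc_partial, bool vce)
{
  return lacc_partial && (vce || racc_partial);
}

/* Count the flag combinations on which the two conditions disagree.  */
static int count_differences(void)
{
  int n = 0;
  for (int l = 0; l < 2; l++)
    for (int r = 0; r < 2; r++)
      for (int v = 0; v < 2; v++)
        if (old_cond(l, r, v) != new_cond(l, r, v))
          n++;
  return n;
}
```

Under this model the two conditions differ only when lacc->grp_partial_lhs
is set, a VIEW_CONVERT_EXPR was built and racc->grp_partial_lhs is clear.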
