[Bug other/115174] New test case gcc.dg/lto/pr113359-2 fails

2024-05-21 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115174

Martin Jambor  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Martin Jambor  ---
Should be fixed with r13-8785-gc827f46d8652d7

Sorry for forgetting to backport the testcase fix.

[Bug ipa/113359] [13 Regression] LTO miscompilation of ceph on aarch64 and x86_64

2024-05-20 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359

--- Comment #32 from Martin Jambor  ---
(In reply to Marc Poulhiès from comment #31)
> Hello Martin,
> 
> Any chance the fix that fixes the new test for 32bits can be also backported?
> 
> 4923ed49b93352bcf9e43cafac38345e4a54c3f8
> https://gcc.gnu.org/g:4923ed49b93352bcf9e43cafac38345e4a54c3f8
> 
> Not sure why it's not tagged so that it would appear here.

My apologies for not including this commit, I completely forgot about it. 
Unfortunately I'm afraid it will have to wait until after the 13.3 release, but
I will backport it quickly afterwards.  Sorry again.

[Bug ipa/114985] [15 regression] internal compiler error: in discriminator_fail during stage2

2024-05-15 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114985

--- Comment #20 from Martin Jambor  ---
The IL we generate the jump function from is:
  
  _1 = cclauses_2(D) != 0B;
  c_parser_omp_all_clauses (_1);

Which translates to the expected jump function:
  callsite  void c_parser_omp_teams(int**)/3 -> int* c_parser_omp_all_clauses(bool)/1 :
    param 0: PASS THROUGH: 0, op ne_expr 0B

so IPA looks like it's doing what it should.

(In reply to Aldy Hernandez from comment #6)
> I wonder if something like this would work.
> 
> diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
> index 5781f50..ea8a685 100644
> --- a/gcc/ipa-cp.cc
> +++ b/gcc/ipa-cp.cc
> @@ -1730,6 +1730,8 @@ ipa_value_range_from_jfunc (vrange &vr,
> }
>else
> {
> + if (TREE_CODE_CLASS (operation) == tcc_comparison)
> +   vr_type = boolean_type_node;
>   Value_Range op_res (vr_type);
>   Value_Range res (vr_type);
>   tree op = ipa_get_jf_pass_through_operand (jfunc);

This looks OKish and we also do a similar thing in
ipa_get_jf_arith_result.

Also note that the ipa_value_range_from_jfunc already has a parameter
that tells it what type the result should be.  It is called parm_type,
which is boolean_type in the case that ICEs.  So we can even bail out
if we really encounter jump function created from bad IL.

I was thinking of using parm_type from the beginning, to
initialize op_res with it, but there are jump functions representing
an operation followed by a truncation, for example for:

  _2 = complain_6(D) & 1;
  _3 = (int) std_alignof_7(D);
  cxx_sizeof_or_alignof_type (_3, _2);

where _r is in fact bool (has smaller size and precision) and trying
to make ranger do the bit_and_expr directly to bool leads to a failed
assert in fold_range (the test of m_operator->operand_check_p).

So doing the operation in the original type - unless it is a
comparison - and then using ipa_vr_operation_and_type_effects seems to
be the right thing to do.
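
To make the rule above concrete, here is a toy model of it.  It is only
a sketch under my own assumptions: ToyRange, OpKind and all function
names are invented for illustration and are not GCC or ranger APIs.  A
comparison yields a boolean range directly, while an arithmetic
operation such as a bit-and is evaluated in the operand's type first and
only then converted to the (boolean) parameter type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Toy model only: none of these names exist in GCC.
enum class OpKind { Comparison, Arithmetic };

struct ToyRange {
  std::int64_t lo;
  std::int64_t hi;
};

// Range of "x != cst".  The result is boolean whatever the type of x is.
ToyRange range_of_ne(ToyRange x, std::int64_t cst) {
  if (cst < x.lo || cst > x.hi)
    return {1, 1};  // x can never be equal to cst
  if (x.lo == x.hi)
    return {0, 0};  // x is exactly cst
  return {0, 1};
}

// Range of "x & mask" for a non-negative mask, computed in the type of x.
ToyRange range_of_and(ToyRange x, std::int64_t mask) {
  assert(mask >= 0);
  if (x.lo >= 0)
    return {0, std::min(x.hi, mask)};
  return {0, mask};
}

// Conversion to a boolean parameter: keep the range if it already fits,
// otherwise conservatively fall back to "any boolean".
ToyRange convert_to_bool(ToyRange r) {
  if (r.lo >= 0 && r.hi <= 1)
    return r;
  return {0, 1};
}

// Comparisons are evaluated as boolean from the start; everything else
// is evaluated in the operand type and converted afterwards.
ToyRange range_for_bool_param(OpKind kind, ToyRange x, std::int64_t cst) {
  if (kind == OpKind::Comparison)
    return range_of_ne(x, cst);
  return convert_to_bool(range_of_and(x, cst));
}
```

In this model the bit-and is never evaluated directly in the boolean
type, which is the situation that trips the operand check described
above.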

But I am really curious why propagate_vr_across_jump_function does not
need the same check for tcc_comparison operators and generally why is
it so different (in the non-scc case)?  Why is ipa_supports_p (this
predicate has a really really really bad name BTW and I am completely
at a loss as to what it does and how or why) used there and not in
ipa_value_range_from_jfunc?

(I also cannot prevent myself from ranting a little that it would
really help if all the ranger (helper) classes and functions were
better documented.)

[Bug ipa/114247] RISC-V: miscompile at -O3 and IPA SRA

2024-05-15 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114247

Martin Jambor  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #11 from Martin Jambor  ---
Fixed.

[Bug ipa/113359] [13 Regression] LTO miscompilation of ceph on aarch64 and x86_64

2024-05-14 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359

Martin Jambor  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #30 from Martin Jambor  ---
...so set to fixed as well.

[Bug ipa/113359] [13 Regression] LTO miscompilation of ceph on aarch64 and x86_64

2024-05-14 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359

--- Comment #29 from Martin Jambor  ---
Fixed

[Bug ipa/114985] [15 regression] internal compiler error: in discriminator_fail during stage2

2024-05-13 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114985

--- Comment #19 from Martin Jambor  ---
The following minimized testcase ICEs with r15-312-g36e877996936ab
cross-compiler to ppc64le with -O2 nicely:


void omp_clause_elt_check(int *, const char *, const char *);
enum { C_OMP_CLAUSE_SPLIT_COUNT };
enum c_omp_region_type { C_ORT_OMP };
void c_finish_omp_clauses(int *, c_omp_region_type);
int *c_parser_omp_all_clauses_prev;
int *c_parser_omp_all_clauses(bool finish_p) {
  if (finish_p)
    c_finish_omp_clauses(c_parser_omp_all_clauses_prev, C_ORT_OMP);
  return c_parser_omp_all_clauses_prev;
}
int c_parser_omp_teams___trans_tmp_104;
static void c_parser_omp_teams(int **cclauses) {
  c_parser_omp_all_clauses(cclauses);
  omp_clause_elt_check(&c_parser_omp_teams___trans_tmp_104, "", __FUNCTION__);
}
void c_parser_omp_target() {
  int *cclauses[C_OMP_CLAUSE_SPLIT_COUNT];
  c_parser_omp_teams(cclauses);
}

[Bug ipa/114985] [15 regression] internal compiler error: in discriminator_fail during stage2

2024-05-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114985

--- Comment #16 from Martin Jambor  ---
I'll have a look, hopefully on Monday.

[Bug ipa/106935] [11/12/13/14/15 Regression] ICE in redirect_call_stmt_to_callee, at cgraph.cc:1505 since r10-5098-g9b14fc3326e08797

2024-05-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106935

Martin Jambor  changed:

   What|Removed |Added

 CC||hubicka at gcc dot gnu.org

--- Comment #4 from Martin Jambor  ---
We hit an assert guarding that we have not already massaged call
arguments before modifying them during call redirection as that would
end up in wrong code.  We do that by looking first whether the decl in
the statement is the same as the decl of the cgraph_edge callee and if
not, if the node associated with the decl from the statement has any
parameter adjustment info.

The issue here is that we are in the process of inlining an artificial
thunk, which calls to a cgraph_node clone with adjustments from its
inception.  That would normally not be a problem because of the first
check above (both decls would be the same, we don't really redirect
these calls, not even in this case).  But the call is actually
recursive, and so the decl from the call graph edge is one created by
save_inline_function_body whereas the one in the statement is the
original one.

I guess we need to detect this particular situation.

[Bug c++/114935] New: Miscompilation of initializer_list in presence of exceptions

2024-05-03 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114935

Bug ID: 114935
   Summary: Miscompilation of initializer_list in
presence of exceptions
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
CC: jason at gcc dot gnu.org
  Target Milestone: ---
  Host: x86_64-linux-gnu
Target: x86_64-linux-gnu

The following testcase:

#include <initializer_list>
#include <string>

void __attribute__((noipa))
tata(std::initializer_list<std::string> init)
{
  throw 1;
}

int
main()
{
  try
    {
      tata({ "0123456789012346" }); // using shorter string or "..."s works
    }
  catch (...)
    {
    }
}

aborts when compiled with GCC 14 even when not optimizing.

I have bisected the failure to r14-1705-g2764335bd336f2 (Jason
Merrill: c++: build initializer_list in a loop [PR105838])

This has been extracted from libstorage-ng testsuite and originally
filed as https://bugzilla.opensuse.org/show_bug.cgi?id=1223820

[Bug tree-optimization/107021] [13 Regression] 511.povray_r error with -Ofast -march=znver2 -flto since r13-2810-gb7fd7fb5011106

2024-05-02 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107021

Martin Jambor  changed:

   What|Removed |Added

 CC||jamborm at gcc dot gnu.org

--- Comment #11 from Martin Jambor  ---
It seems that clang is hitting the same problem now:
https://discourse.llvm.org/t/fast-math-spec-2017-fp-failure-for-povray/74959

[Bug ipa/106935] [11/12/13/14/15 Regression] ICE in redirect_call_stmt_to_callee, at cgraph.cc:1505 since r10-5098-g9b14fc3326e08797

2024-04-30 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106935

--- Comment #3 from Martin Jambor  ---
This ICE no longer happens with GCC 13, in fact after r13-4240-gfeeb0d68f1c708
(Martin Jambor: ipa-cp: Do not consider useless aggregate constants).  From the
patch description, it does not look to be a fix of the underlying issue.

[Bug ipa/102310] [11/12 Regression] ICE in visit_ref_for_mod_analysis with OpenACC

2024-04-30 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102310

Martin Jambor  changed:

   What|Removed |Added

  Known to work||13.1.0
Summary|[11/12/13/14/15 Regression] |[11/12 Regression] ICE in
   |ICE in  |visit_ref_for_mod_analysis
   |visit_ref_for_mod_analysis  |with OpenACC
   |with OpenACC|

--- Comment #10 from Martin Jambor  ---
This has been fixed in GCC 13 by r13-2665-g23baa717c991d7 (Julian Brown:
OpenMP/OpenACC struct sibling list gimplification extension and rework).

[Bug tree-optimization/113964] [11/12/13/14/15 Regression] repeat copy of struct

2024-04-17 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113964

--- Comment #5 from Martin Jambor  ---
(In reply to Richard Biener from comment #2)
> No, I think the issue is that ESRA leaves e.f0 alone:
> 
>   e$f3_7 = e.f3;
>   e$f0$f4_8 = e.f0.f4;
>   _1 = e$f0$f4_8;
>   _2 = (unsigned char) _1;
>   e$f3_9 = _2;
>   e.f0 = g_50;
>   e$f3_10 = MEM  [(struct S1 *)_50];
>   e$f0$f4_11 = MEM  [(struct S1 *)_50 + 24B];
>   MEM  [(union U8 *)] = e$f3_10;
>   MEM  [(union U8 *) + 24B] = e$f0$f4_11;
>   g_16 = e.f0;
> 
> it looks like it materializes the e.f0 = g_15 copy but fails to elide that
> (maybe assuming sth else will?)?  And then for some reason the final
> g_16 = e.f0 copy isn't replaced?!
> 
> So somehow SRAs heuristics go off.
> 
> Martin?

I am afraid this is just another example of what flow-insensitive SRA cannot
optimize well.  I'll keep it in the list of testcases to hopefully one day
improve on when we make it flow sensitive.

[Bug rtl-optimization/114452] Functions invoked through compile-time table of function pointers not inlined

2024-04-11 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114452

--- Comment #6 from Martin Jambor  ---
(In reply to Paweł Bylica from comment #5)
> (In reply to Martin Jambor from comment #4)
> > In this testcase all (well, both) functions referenced from the array
> > are semantically equivalent which is recognized by ICF but making it
> > be able to pass this information to the inliner would be
> > non-trivial... and is this the common case worth optimizing for?
> 
> I reduced the original code to the array of two identical functions.
> Originally, there weren't identical. I can update the test case if this make
> more sense.

Probably not.  But how many elements does the array have in the original code? 
Perhaps we could speculatively inline them if there are only few.
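
As a hand-written picture of what such speculation would amount to (a
sketch only; the table, the handler names and dispatch_speculative are
made up for illustration and this is not what GCC currently emits), the
indirect call can be guarded by comparisons against the few known
targets, so that each guarded call becomes a direct, inlinable one:

```cpp
#include <cassert>

using Handler = int (*)(int);

inline int add_one(int x) { return x + 1; }
inline int times_two(int x) { return x * 2; }

// A small compile-time table of function pointers.
constexpr Handler kTable[] = {add_one, times_two};

// Plain indirect call: the callee is only known once the load from the
// table has been folded to a constant.
int dispatch_indirect(unsigned idx, int x) { return kTable[idx](x); }

// Speculative variant: with only a few possible targets, test for each
// one and call it directly, keeping the indirect call as a fallback.
int dispatch_speculative(unsigned idx, int x) {
  Handler h = kTable[idx];
  if (h == add_one)
    return add_one(x);
  if (h == times_two)
    return times_two(x);
  return h(x);
}
```

The cost is one comparison per candidate target, which is why this
would only make sense for tables with few elements.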

[Bug testsuite/114662] [14 regression] new test case c_lto_pr113359-2 from r14-9841-g1e3312a25a7b34 fails

2024-04-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114662

--- Comment #5 from Martin Jambor  ---
Thanks a lot for taking care of it before I had a chance to.

[Bug ipa/113907] [11/12/13/14 regression] ICU miscompiled on x86 since r14-5109-ga291237b628f41

2024-04-08 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907

--- Comment #75 from Martin Jambor  ---
The above fixes the testcase from comment #58.  I am not sure if any other
testcases discussed here remain unresolved.  I am also not sure to what extent
we want that patch of mine, I guess I'll re-visit the idea in a few weeks.

[Bug ipa/113359] [13/14 Regression] LTO miscompilation of ceph on aarch64 and x86_64

2024-04-08 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359

--- Comment #26 from Martin Jambor  ---
This should be fixed on master, I'll backport the fix in a few weeks to at
least gcc-13 where it was reported.

[Bug ipa/114247] RISC-V: miscompile at -O3 and IPA SRA

2024-04-05 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114247

--- Comment #9 from Martin Jambor  ---
On master this has been fixed by r14-9813-g8cd0d29270d4ed where I
unfortunately copy-pasted a wrong bug number :-/

I assume this needs backporting to at least gcc-13 and gcc-12. I'll do
that in a week or two.

[Bug tree-optimization/113964] [11/12/13/14/15 Regression] repeat copy of struct

2024-04-05 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113964

--- Comment #4 from Martin Jambor  ---
Oops. I made a mistake, the commit above fixes PR 114247, sorry :-/
This one is the next in my queue.  Sorry again.

[Bug ipa/114247] RISC-V: miscompile at -O3 and IPA SRA

2024-04-04 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114247

--- Comment #7 from Martin Jambor  ---
Thanks, I will bootstrap and test the patch on x86_64 and submit it
for review then.

Can I ask you, can you please modify the testcase so that it does not
use printf but simply calls __builtin_abort in the miscompiled case
and just returns zero from main if it is OK?  That way we could
include it in our test suite.  Thanks a lot.
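
For illustration, the requested shape is roughly the following.  This is
only a sketch: scale_and_add and run_testcase are hypothetical names,
the arithmetic is a placeholder for whatever the real testcase computes,
and in an actual testsuite file the body of run_testcase would simply be
main.

```cpp
#include <cassert>

// Hypothetical function standing in for the code that gets miscompiled.
// noinline keeps the call from being folded away; GCC testcases often
// use the stronger noipa attribute for the same purpose.
__attribute__((noinline)) int scale_and_add(int x, int y) {
  return x * 3 + y;
}

// In a real testcase this would be main(): abort on a wrong result,
// return zero when everything is OK, and print nothing.
int run_testcase() {
  if (scale_and_add(4, 5) != 17)
    __builtin_abort();
  return 0;
}
```

A test written this way needs no output matching; the harness only has
to check that the program exits successfully.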

[Bug ipa/113359] [13/14 Regression] LTO miscompilation of ceph on aarch64 and x86_64

2024-04-04 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359

--- Comment #24 from Martin Jambor  ---
(In reply to Jan Hubicka from comment #23)
> I however wonder if we really guarantee to copy the paddings everywhere else
> then the total scalarization part?
> (i.e. in all paths through the RTL expansion)

I wanted that we sometimes don't do that in PR 80689 and the idea was
refused.  And as far as I can recall the code I don't think we do.

Anyway, I have sent the patch to the mailing list:
https://inbox.sourceware.org/gcc-patches/ri6jzlc25db@virgil.suse.cz/T/#u

[Bug ipa/113907] [11/12/13/14 regression] ICU miscompiled on x86 since r14-5109-ga291237b628f41

2024-04-04 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907

--- Comment #71 from Martin Jambor  ---
I have sent the patch to the mailing list:
https://inbox.sourceware.org/gcc-patches/ri6le5s25kl@virgil.suse.cz/T/#u

[Bug ipa/111571] [13 Regression] ICE in modify_call, at ipa-param-manipulation.cc:656

2024-04-04 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111571

Martin Jambor  changed:

   What|Removed |Added

Summary|[13/14 Regression] ICE in   |[13 Regression] ICE in
   |modify_call, at |modify_call, at
   |ipa-param-manipulation.cc:6 |ipa-param-manipulation.cc:6
   |56  |56

--- Comment #6 from Martin Jambor  ---
Fixed on master, fix queued for backporting to gcc 13 branch.

[Bug ipa/114247] RISC-V: miscompile at -O3 and IPA SRA

2024-04-04 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114247

--- Comment #4 from Martin Jambor  ---
I don't seem to be able to get riscv64 qemu running in reasonable
time.  Can someone please verify that the following patch fixes
the issue?

diff --git a/gcc/ipa-param-manipulation.cc b/gcc/ipa-param-manipulation.cc
index 3e0df6a6f77..b4ca78b652e 100644
--- a/gcc/ipa-param-manipulation.cc
+++ b/gcc/ipa-param-manipulation.cc
@@ -740,6 +740,12 @@ ipa_param_adjustments::modify_call (cgraph_edge *cs,
  }
   if (repl)
{
+ if (!useless_type_conversion_p(apm->type, repl->typed.type))
+   {
+ repl = force_value_to_type (apm->type, repl);
+ repl = force_gimple_operand_gsi (&gsi, repl,
+  true, NULL, true, GSI_SAME_STMT);
+   }
  vargs.quick_push (repl);
  continue;
}

[Bug ipa/114247] RISC-V: miscompile at -O3 and IPA SRA

2024-04-03 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114247

Martin Jambor  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jamborm at gcc dot gnu.org

--- Comment #3 from Martin Jambor  ---
Mine.

[Bug ipa/113359] [13/14 Regression] LTO miscompilation of ceph on aarch64 and x86_64

2024-03-28 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359

--- Comment #22 from Martin Jambor  ---
Created attachment 57828
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57828&action=edit
Potential fix

I'm testing this patch

[Bug rtl-optimization/114452] Functions invoked through compile-time table of function pointers not inlined

2024-03-27 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114452

Martin Jambor  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|DUPLICATE   |---
   Last reconfirmed||2024-03-27
 Ever confirmed|0   |1

--- Comment #4 from Martin Jambor  ---
This does not look like a duplicate of PR 111573.

Nevertheless, it is not quite obvious what to do here.  Inlining
happens before unrolling and I am not sure we'd consider unrolling in
early optimizations.  And without unrolling, the load from the array
is not easy to fold.

In this testcase all (well, both) functions referenced from the array
are semantically equivalent which is recognized by ICF but making it
be able to pass this information to the inliner would be
non-trivial... and is this the common case worth optimizing for?
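
A small sketch of the pattern in question (the names below are invented
for illustration and are not taken from the reporter's testcase): in the
loop form the callee depends on the induction variable, whereas after
unrolling each table load has a constant index and folds to a known
function.

```cpp
#include <cassert>
#include <cstddef>

using Step = int (*)(int);

inline int step_a(int x) { return x + 3; }
inline int step_b(int x) { return x * 5; }

constexpr Step kSteps[] = {step_a, step_b};
constexpr std::size_t kNumSteps = sizeof(kSteps) / sizeof(kSteps[0]);

// Loop form: kSteps[i] is not a constant until the loop is unrolled, so
// the inliner sees an indirect call with an unknown target.
int run_loop(int x) {
  for (std::size_t i = 0; i < kNumSteps; ++i)
    x = kSteps[i](x);
  return x;
}

// Manually unrolled form: each index is a constant, the loads fold to
// step_a and step_b, and the calls become direct.
int run_unrolled(int x) {
  x = kSteps[0](x);
  x = kSteps[1](x);
  return x;
}
```

Both functions compute the same value; the difference is only in how
early the call targets become visible to the compiler.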

[Bug ipa/113907] [11/12/13/14 regression] ICU miscompiled since on x86 since r14-5109-ga291237b628f41

2024-03-20 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907

--- Comment #66 from Martin Jambor  ---
Created attachment 57750
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57750&action=edit
Patch comparing jump functions

I'm testing this patch.  (Not sure how to best check that it does not
inadvertently pessimize ICF too much, except for ICF testcases.)

[Bug ipa/114254] [11/12/13 regression] Indirect inlining through C++ member pointers fails if the underlying class has a virtual function

2024-03-20 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114254

Martin Jambor  changed:

   What|Removed |Added

Summary|[11/12/13/14 regression]|[11/12/13 regression]
   |Indirect inlining through   |Indirect inlining through
   |C++ member pointers fails   |C++ member pointers fails
   |if the underlying class has |if the underlying class has
   |a virtual function  |a virtual function

--- Comment #3 from Martin Jambor  ---
Fixed on trunk.  I may consider backporting to GCC 13 but probably not to
earlier versions.

[Bug ipa/108802] [11/12/13 Regression] missed inlining of call via pointer to member function

2024-03-20 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108802

Martin Jambor  changed:

   What|Removed |Added

Summary|[11/12/13/14 Regression]|[11/12/13 Regression]
   |missed inlining of call via |missed inlining of call via
   |pointer to member function  |pointer to member function

--- Comment #10 from Martin Jambor  ---
Fixed on trunk.  I may consider backporting to GCC 13 but probably not to
earlier versions.

[Bug ipa/113907] [11/12/13/14 regression] ICU miscompiled since on x86 since r14-5109-ga291237b628f41

2024-03-20 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907

--- Comment #65 from Martin Jambor  ---
I hope to have some jump-function comparison functions ready for testing later
today.

[Bug target/112980] 64-bit powerpc ELFv2 does not allow nops to be generated before function global entry point

2024-03-19 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112980

--- Comment #5 from Martin Jambor  ---
I'd like to ping this, are there plans to implement this in the near-ish term?

[Bug ipa/111571] [13/14 Regression] ICE in modify_call, at ipa-param-manipulation.cc:656

2024-03-15 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111571

--- Comment #4 from Martin Jambor  ---
I have proposed a fix on the mailing list:
https://inbox.sourceware.org/gcc-patches/ri6r0gbwf7l@virgil.suse.cz/T/#u

[Bug tree-optimization/113757] [14 regression] ICE when building legion-23.03.0 since r14-8398

2024-03-08 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113757

Martin Jambor  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #10 from Martin Jambor  ---
Fixed.

[Bug ipa/114254] [11/12/13/14 regression] Indirect inlining through C++ member pointers fails if the underlying class has a virtual function

2024-03-08 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114254

--- Comment #1 from Martin Jambor  ---
I have proposed a patch on the mailing list:
https://inbox.sourceware.org/gcc-patches/ri6r0gkzvi4@virgil.suse.cz/T/#u

[Bug ipa/108802] [11/12/13/14 Regression] missed inlining of call via pointer to member function

2024-03-08 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108802

--- Comment #8 from Martin Jambor  ---
I have proposed an improved patch on the mailing list:
https://inbox.sourceware.org/gcc-patches/ri6r0gkzvi4@virgil.suse.cz/T/#u

[Bug ipa/114254] New: Indirect inlining through C++ member pointers fails if the underlying class has a virtual function

2024-03-06 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114254

Bug ID: 114254
   Summary: Indirect inlining through C++ member pointers fails if
the underlying class has a virtual function
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: ipa
  Assignee: jamborm at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
  Target Milestone: ---

Created attachment 57634
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57634&action=edit
testcase

Just add a virtual method to the class in our test
testsuite/g++.dg/ipa/iinline-2.C and it will unfortunately stop
working.

At some point the C++ FE got clever and stopped emitting the complex
code checking if a member pointer points to a virtual method or a
normal one when the base class does not have any virtual method.  But
that meant that our testcases stopped exercising the pattern matching
code in ipa_analyze_indirect_call_uses and when that code changed with
r10-917-g3b47da42de621c (Martin Jambor: Make SRA re-construct original
memory accesses when easy) because of a small mistake, we lost the
intended ability to inline also these cases.

So this is a regression against 9.5, unfortunately.
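
For readers unfamiliar with the pattern, a minimal sketch follows.  It
is not the attached testcase; Counter, call_through and caller are
made-up names.  Because the class has a virtual function, the call
through the member pointer has to decide at run time whether the pointer
designates a virtual or an ordinary method, and indirect inlining needs
to see through that check to inline plus_one into caller.

```cpp
#include <cassert>

struct Counter {
  int value;
  explicit Counter(int v) : value(v) {}
  virtual ~Counter() = default;
  // The presence of a virtual method is what brings back the more
  // complex member-pointer call sequence.
  virtual int doubled() const { return value * 2; }
  int plus_one() const { return value + 1; }
};

using CounterFn = int (Counter::*)() const;

// Call through a pointer to member function passed as a parameter.
int call_through(const Counter &c, CounterFn fn) { return (c.*fn)(); }

// The member pointer is a known constant here, so ideally
// call_through and then plus_one would both be inlined.
int caller(const Counter &c) { return call_through(c, &Counter::plus_one); }
```

Dropping the virtual members from Counter gives the simpler case that
the existing testcases still exercise.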

[Bug tree-optimization/114238] New: Multiple 554.roms_r run-time regressions (4%-20%) since r14-9193-ga0b1798042d033

2024-03-05 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114238

Bug ID: 114238
   Summary: Multiple 554.roms_r run-time regressions (4%-20%)
since r14-9193-ga0b1798042d033
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
CC: rguenth at gcc dot gnu.org
Blocks: 26163
  Target Milestone: ---
  Host: x86_64-linux, aarch64-linux
Target: x86_64-linux, aarch64-linux

Our LNT instance has detected that runtime of benchmark 554.roms_r
from the SPEC 2017 FPrate suite regressed on all machines on most
configurations by 4-20%.

For example:

simple -O2 -flto on AMD Zen 3 regressed by 14%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=470.537.0

on Zen2 -O2 -flto regression is the worst, 20%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=298.537.0

-Ofast -march=native -flto on AMD Zen 4 regressed by 7%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=959.537.0

-Ofast -march=native on AMD Zen 2 regressed by 17%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=295.537.0

but it also happens on Intel Skylake:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=800.537.0

or Aarch64:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=587.537.0

and there are smaller regressions on the PGO configurations too.

I have bisected the Zen3 -O2 -flto case to r14-9193-ga0b1798042d033
(Richard Biener: tree-optimization/114074 - CHREC multiplication and
undefined overflow).  I have then verified that the Zen 4 -Ofast
-march=native -flto and Zen 2 -Ofast -march=native cases have also
been introduced by it:

commit a0b1798042d033fd2cc2c806afbb77875dd2909b
Author: Richard Biener 
Date:   Mon Feb 26 13:33:21 2024 +0100

tree-optimization/114074 - CHREC multiplication and undefined overflow

When folding a multiply CHRECs are handled like {a, +, b} * c
is {a*c, +, b*c} but that isn't generally correct when overflow
invokes undefined behavior.  The following uses unsigned arithmetic
unless either a is zero or a and b have the same sign.

I've used simple early outs for INTEGER_CSTs and otherwise use
a range-query since we lack a tree_expr_nonpositive_p and
get_range_pos_neg isn't a good fit.

PR tree-optimization/114074
* tree-chrec.h (chrec_convert_rhs): Default at_stmt arg to NULL.
* tree-chrec.cc (chrec_fold_multiply): Canonicalize inputs.
Handle poly vs. non-poly multiplication correctly with respect
to undefined behavior on overflow.

* gcc.dg/torture/pr114074.c: New testcase.
* gcc.dg/pr68317.c: Adjust expected location of diagnostic.
* gcc.dg/vect/vect-early-break_119-pr114068.c: Do not expect
loop to be vectorized.
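
A small worked example of the overflow problem the commit message
describes may help.  This is my own illustration with made-up helper
names, not code from the commit.  With a = 2^30, b = -(2^30 - 1) and
c = 4, the value of {a, +, b} * c at iteration 1 is (a + b) * c = 4,
which is perfectly representable, but the folded form needs a*c = 2^32,
which overflows a signed 32-bit integer.  Evaluating the folded form in
wrapping unsigned arithmetic still yields 4.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

bool fits_in_int32(std::int64_t v) {
  return v >= std::numeric_limits<std::int32_t>::min() &&
         v <= std::numeric_limits<std::int32_t>::max();
}

// ({a, +, b} * c) at iteration i, evaluated exactly in 64 bits.
std::int64_t unfolded_exact(std::int32_t a, std::int32_t b, std::int32_t c,
                            std::int32_t i) {
  return (std::int64_t{a} + std::int64_t{b} * i) * c;
}

// The folded form {a*c, +, b*c} at iteration i, evaluated in unsigned
// 32-bit arithmetic, where overflow wraps instead of being undefined.
std::uint32_t folded_unsigned(std::int32_t a, std::int32_t b, std::int32_t c,
                              std::int32_t i) {
  std::uint32_t ac = static_cast<std::uint32_t>(a) * static_cast<std::uint32_t>(c);
  std::uint32_t bc = static_cast<std::uint32_t>(b) * static_cast<std::uint32_t>(c);
  return ac + bc * static_cast<std::uint32_t>(i);
}
```

Here a and b have opposite signs, which is one of the cases where,
according to the commit message, the multiplication is now carried out
in unsigned arithmetic.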


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

[Bug ipa/108802] [11/12/13/14 Regression] missed inlining of call via pointer to member function

2024-02-21 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108802

--- Comment #7 from Martin Jambor  ---
I have proposed a patch on the mailing list:
https://inbox.sourceware.org/gcc-patches/ri6y1bdx3yg@virgil.suse.cz/T/#u

[Bug ipa/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP

2024-02-21 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

Martin Jambor  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #16 from Martin Jambor  ---
Fixed.

[Bug ipa/111573] lambda functions often not inlined and optimized out

2024-02-20 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111573

--- Comment #2 from Martin Jambor  ---
I cannot see any difference at -O3 with or without -fno-early-inlining.

[Bug tree-optimization/112312] -O3 produces worse code than -O2 for std::ranges::lower_bound in some cases, not marking a loop as finite

2024-02-20 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112312

--- Comment #4 from Martin Jambor  ---
It seems this has been fixed in current master (which is to become gcc 14).
If my bisecting is correct, it has been fixed by r14-5628-g53ba8d669550d3 (Jan
Hubicka: inter-procedural value range propagation).

I guess it would be nice to add this testcase to the testsuite, so I'm keeping
this bug open (and on my TODO list).

[Bug ipa/108802] [11/12/13/14 Regression] missed inlining of call via pointer to member function

2024-02-19 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108802

Martin Jambor  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |jamborm at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #6 from Martin Jambor  ---
I think I know what to do.

[Bug ipa/113359] [13 Regression] LTO miscompilation of ceph on aarch64

2024-02-19 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359

--- Comment #15 from Martin Jambor  ---
Created attachment 57462
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57462&action=edit
Simple testcase (needs disabling early - and only early - SRA)

This is a simpler testcase which exhibits the problem on x86_64-linux
and current master.  Steps to reproduce:

$ ~/gcc/trunk/inst/bin/gcc -O2 -fno-strict-aliasing -fno-ipa-cp --disable-tree-esra -flto pr113359.c -c -o 1.o
cc1: note: disable pass tree-esra for functions in the range of [0, 4294967295]

$ ~/gcc/trunk/inst/bin/gcc -O2 -fno-strict-aliasing -fno-ipa-cp --disable-tree-esra -flto -DFILE2 pr113359.c -c -o 2.o
cc1: note: disable pass tree-esra for functions in the range of [0, 4294967295]

$ ~/gcc/trunk/inst/bin/gcc -flto 1.o 2.o -o test.exe

$ ./test.exe 
Aborted (core dumped)


If you add -fno-ipa-icf to the "compilation" commands, the test will
pass.

Late (post ICF) intra-procedural SRA is necessary to exhibit the
problem.  On the other hand, early SRA must be suppressed or it will
scalarize the aggregate assignment too early and the results will look
different to IPA-ICF.  Instead of using --disable-tree-esra we could
pass the address of tmp in both geta() and getb() to an empty function
coming from a third compilation unit.

Disabling strict aliasing is also necessary to show the problem, with
strict aliasing IPA-ICF takes the alias class of types into account
when hashing and considers geta() and getb() different from the start.

[Bug tree-optimization/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP

2024-02-19 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

--- Comment #6 from Martin Jambor  ---
I have proposed a patch on the mailing list that converts the array of lattices
to a vector:
https://inbox.sourceware.org/gcc-patches/ri6frxoxzpk@virgil.suse.cz/T/#u
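
As a general illustration of why such a conversion can matter (this is
my assumption about the mechanism, not a description of the patch;
ToyLattice and both helper functions are invented names), objects placed
into raw storage that is later released never have their destructors
run, whereas a vector destroys every element it owns:

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Stand-in for a type that would release resources in its destructor.
struct ToyLattice {
  static int live;
  ToyLattice() { ++live; }
  ToyLattice(const ToyLattice &) { ++live; }
  ~ToyLattice() { --live; }
};
int ToyLattice::live = 0;

// Raw array: objects are constructed in place, but the storage is
// released without running any destructor.
int leftover_with_raw_array(std::size_t n) {
  int before = ToyLattice::live;
  void *raw = ::operator new(n * sizeof(ToyLattice));
  ToyLattice *arr = static_cast<ToyLattice *>(raw);
  for (std::size_t i = 0; i < n; ++i)
    new (arr + i) ToyLattice();
  ::operator delete(raw);
  return ToyLattice::live - before;
}

// Vector: every element is destroyed when the vector goes out of scope.
int leftover_with_vector(std::size_t n) {
  int before = ToyLattice::live;
  {
    std::vector<ToyLattice> v(n);
  }
  return ToyLattice::live - before;
}
```

The counter only tracks objects whose destructor never ran; with a real
resource-owning type each of those would correspond to leaked memory.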

[Bug lto/113712] [11/12/13/14 Regression] lto crash: when building 641.leela_s peek with Example-gcc-linux-x86.cfg (SPEC2017 1.1.9) since r10-3311-gff6686d2e5f797

2024-02-12 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113712

--- Comment #20 from Martin Jambor  ---
I have access to the benchmark and building it with -fprofile-generate
it fails for me (with an ICE in add_symbol_to_partition_1) only when I
use -fno-use-linker-plugin and either -std=c++11 or -std=c++03. Using
-std=c++14 also avoids the issue.  In any event, -fno-use-linker-plugin
looks necessary.

[Bug lto/113712] [11/12/13/14 Regression] lto crash: when building 641.leela_s peek with Example-gcc-linux-x86.cfg (SPEC2017 1.1.9) since r10-3311-gff6686d2e5f797

2024-02-12 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113712

--- Comment #18 from Martin Jambor  ---
(In reply to Filip Kastl from comment #17)
> I've bisected this (using the test from Andrew Pinski) to
> r10-3311-gff6686d2e5f797

That's a coincidence, with -fno-ipa-sra the testcase fails even earlier,
IPA-SRA was just hiding it, most probably by localizing some symbol before the
linking stage.

Bugs that are only reproducible with -fno-use-linker-plugin are unlikely to get
a high priority.  But I understand that the original issue does not need it?

(Also, the issue is supposed to be reproducible on x86_64-linux, right?)

[Bug target/113847] [14 Regression] 10% slowdown of 462.libquantum on AMD Ryzen 7700X and Ryzen 7900X

2024-02-12 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113847

--- Comment #6 from Martin Jambor  ---
(In reply to Richard Biener from comment #5)
> CCing also Martin who should know how/why IPA SRA doesn't reconstruct the
> component ref chain here 

I have not had a look at this specific case (yet), but IPA-SRA just
doesn't (unlike intraprocedural SRA) and always creates MEM_REFs (in
callers).  I guess we could stream field offsets and/or array_ref
indices and attempt to reconstruct it for simple (non-union,
non-otherwise-overlapping) types, even if it would make the
ipa_adjusted_param type (and thus ipa_param_adjustments) slightly
bigger and add another vector.

> or why it choses the dynamic type as it does
> (possibly local SRA when fully scalarizing an aggregate copy does the same).

That is unlikely.  Total scalarization in intraprocedural SRA just
follows the type of the decl whereas IPA-SRA (and intra-SRA too when
not totally scalarizing) takes all types from existing memory
accesses.

[Bug tree-optimization/113833] 435.gromacs fails verification on with -Ofast -march={cascadelake,icelake-server} and PGO after r14-7272-g57f611604e8bab

2024-02-12 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113833

--- Comment #4 from Martin Jambor  ---
Created attachment 57397
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57397&action=edit
-fopt-info-vec before/after comparison

(In reply to Richard Biener from comment #3)
> A compare before/after the patch of -fopt-info-vec output might show the few
> cases that are affected by the patch.

I hope I have not messed anything up.  I have added -fopt-info-vec right after
-fprofile-use into the spec config and then grepped the output for
':[^:]*:[^:]*: optimized'.  Then I sorted (because the build was parallel) and
compared the output and it seems there are quite a few *fewer* instances of
vectorization happening.

[Bug tree-optimization/110422] asm goto vs SRA

2024-02-09 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422

Martin Jambor  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from Martin Jambor  ---
Fixed on all open release branches too.

[Bug tree-optimization/113833] New: 435.gromacs fails verification on with -Ofast -march={cascadelake,icelake-server} and PGO after r14-7272-g57f611604e8bab

2024-02-08 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113833

Bug ID: 113833
   Summary: 435.gromacs fails verification on with -Ofast
-march={cascadelake,icelake-server} and PGO after
r14-7272-g57f611604e8bab
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
CC: fxue at os dot amperecomputing.com
Blocks: 26163
  Target Milestone: ---
  Host: x86_64-linux
Target: x86_64-linux

After r14-7272-g57f611604e8bab (Feng Xue: Do not count unused scalar
use when marking STMT_VINFO_LIVE_P [PR113091]), our runs of SPEC 2006
CPU benchmark 435.gromacs on Icelake-server CPU compiled with -Ofast
-march=native and PGO (with and without LTO) started failing with
a miscompare error:

  0002:  3.07684e+02
 3.03476e+02
   ^

I subsequently verified the failure on an Intel CascadeLake and
bisected it to the aforementioned commit.  We don't see it on our AMD
or Ampere testers (using -march=native).

I guess the miscomparison error may be well within what is expected
when using -Ofast but even in that case it would be nice to have it
documented here that that is indeed expected.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

[Bug tree-optimization/113757] [14 regression] ICE when building legion-23.03.0 since r14-8398

2024-02-08 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113757

--- Comment #8 from Martin Jambor  ---
I have proposed a fix on the mailing list:
https://inbox.sourceware.org/gcc-patches/ri6bk8r5kfi@virgil.suse.cz/T/#u

[Bug ipa/113359] [13 Regression] LTO miscompilation of ceph on aarch64

2024-02-07 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359

--- Comment #14 from Martin Jambor  ---
(In reply to rguent...@suse.de from comment #13)
> Might be also an interaction with IPA ICF in case there's a pointer to
> the pair involved?

Yes, this is exactly what seems to be happening.  The problem goes
away with -fno-ipa-icf.

(Possibly because the testcase uses -fno-strict-aliasing,) IPA-ICF
merges two functions which copy a structure.  The type of that access
is what IPA-SRA saves, but after the merge only the type from one of
the merged functions is loaded.  SRA then uses the (wrong) type to
split aggregate copies into copies by individual fields.

I have talked to Honza about this.  It seems that IPA-ICF needs to be
careful about aggregates with holes in different places.  The ideal next
step would be to create a testcase not dependent on IPA-SRA.

[Bug ipa/113359] [13 Regression] LTO miscompilation of ceph on aarch64

2024-02-05 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359

--- Comment #9 from Martin Jambor  ---
SRA creates the replacements (in GCC 13) during total scalarization,
i.e. the bit that is not driven by pre-existing accesses to
aggregates but kicks in because SRA sees an aggregate that is small
and regular, and so splits it according to its type in the hope that
it will go away.

Unfortunately, SRA sees a different type in the LTO and non-LTO cases.
I have added a dumping of types and fields of totally scalarized
records and got the following.

In the non-LTO case, the type of the aggregate is:
   constant 128>
unit-size  constant 16>
align:64 warn_if_not_align:0 symtab:1430035184 alias-set -1 canonical-type
0x553cabd0
...

and specifically its third field is a pointer:
  
pointer_to_this >
unsigned DI
size 
unit-size 
align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x562729d8 reference_to_this >
used unsigned nonlocal decl_3 DI /usr/include/c++/13/bits/stl_pair.h:194:11
size  constant 64>
unit-size  constant 8>
align:64 warn_if_not_align:0 offset_align 128 decl_not_flexarray: 0
offset  constant 0>
bit-offset  constant 64> context >


However, in the LTO case the type of the aggregate is:
   constant 128>
unit-size  constant 16>
align:64 warn_if_not_align:0 symtab:0 alias-set 98 canonical-type
0x61cc1498
...

which however has an unsigned int as its third field:
 
unit-size 
align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x62410690 precision:32 min  max 
pointer_to_this  reference_to_this
>
unsigned nonlocal SI /usr/include/c++/13/bits/stl_pair.h:194:11
size  constant 32>
unit-size  constant 4>
align:32 warn_if_not_align:0 offset_align 128 decl_not_flexarray: 0
offset  constant 0>
bit-offset  constant 64> context >

And so only an unsigned int replacement is created.

The name of the aggregate indicates it has been created by IPA-SRA and
so that is where I am looking right now, but IPA-SRA simply takes (and
streams) the type of the access in the original function body for
these.  Can't this perhaps be some type-merging issue?

[Bug tree-optimization/113757] [14 regression] ICE when building legion-23.03.0 since r14-8398

2024-02-05 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113757

Martin Jambor  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |jamborm at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #7 from Martin Jambor  ---
This is a very particular interaction of the patch with speculative
devirtualization.  Mine.

[Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native

2024-01-31 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646

--- Comment #3 from Martin Jambor  ---
(In reply to Richard Biener from comment #1)
> Did you try with -fprofile-partial-training (is that default on?  it
> probably should ...).  Can you please try training with the rate data
> instead of train
> to rule out a mismatch?

With -fprofile-partial-training the znver4 LTO vs LTOPGO regression (on a newer
master) goes down from 66% to 54%.  

So far I did not find a way to easily train with the reference run (when I add
"train_with = refrate" to the config, I always get "ERROR: The workload
specified by train_with MUST be a training workload!")

[Bug target/113655] New: Cross compiling to mips64-elf fails because "MIPS_EXPLICIT_RELOCS was not declared" after r14-8386-g58af788d1d0825

2024-01-29 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113655

Bug ID: 113655
   Summary: Cross compiling to mips64-elf fails because
"MIPS_EXPLICIT_RELOCS was not declared" after
r14-8386-g58af788d1d0825
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
CC: syq at gcc dot gnu.org
  Target Milestone: ---
  Host: x86_64-linux
Target: mips64-elf

Starting with r14-8386-g58af788d1d0825 (MIPS: Accept arguments for
-mexplicit-relocs), when I try to test that cross compilation from
x86_64-linux to target mips64-elf still works by configuring gcc with:

../src/configure --prefix=/home/mjambor/gcc/mine/inst --enable-languages=c,c++
--enable-checking=yes --disable-bootstrap --disable-multilib --enable-obsolete
--target=mips64-elf

and then building just the compiler with make -j64 all-host,

the compilation fails with:

options.cc:3474:3: error: ‘MIPS_EXPLICIT_RELOCS’ was not declared in this
scope; did you mean ‘MIPS_EXPLICIT_RELOCS_NONE’?
 3474 |   MIPS_EXPLICIT_RELOCS, /* mips_opt_explicit_relocs */
  |   ^~~~
  |   MIPS_EXPLICIT_RELOCS_NONE


Our buildbot reports failures when building a cross-compiler for
mips64el-st-linux-gnu, mips64octeon-linux, mipsisa64r2-linux,
mipsisa32r2-linux-gnu, mipsisa64r2-sde-elf, mipsisa32-elfoabi,
mipsisa64-elfoabi, mipsisa64r2el-elf, mipsisa64sr71k-elf,
mipsisa64sb1-elf, mips64-elf, mipsel-elf, mips64vr-elf,
mips64orion-elf, mips-rtems, mips-wrs-vxworks, mipstx39-elf and I
suspect the problem is the same or similar.

[Bug gcov-profile/113646] New: PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native

2024-01-28 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646

Bug ID: 113646
   Summary: PGO hurts run-time of 538.imagick_r as much as 68% at
-Ofast -march=native
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: gcov-profile
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
CC: hubicka at gcc dot gnu.org
Blocks: 26163
  Target Milestone: ---
  Host: x86_64-linux, aarch64-linux
Target: x86_64-linux, aarch64-linux

Using profile guided optimization is very detrimental when compiling SPEC 2017
FPrate benchmark 538.imagick_r at -Ofast -march=native (with or without LTO) on
all machines where I have tried.

On Zen4, using PGO results in a binary that is 68% slower than one built
without it when not using LTO, and 65% slower with LTO:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=970.507.0=966.507.0=959.507.0=958.507.0;

On Zen3, using PGO slows the binary down by 22% when not using LTO and by 30%
with LTO:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=471.507.0=473.507.0=475.507.0=477.507.0;

On Zen2, PGO regresses by 16% without LTO and by 28% with it:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=295.507.0=293.507.0=287.507.0=286.507.0;

On our Altra CPU, the slowdowns are 26% and 45%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=584.507.0=583.507.0=587.507.0=589.507.0;

On an Intel CascadeLake machine, they are 24% and 41%. (Our LNT Intel machine
is temporarily offline, unfortunately).

It is of course possible that the training workload does not match the
reference one very well.  However, this was not a problem in the past
(apparently the problem is that our non-PGO results improved but our PGO ones
did not).  Also, other compilers such as LLVM achieve better run-times with PGO
than without.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

[Bug target/113641] New: 510.parest_r with PGO at O2 slower than GCC 12 (7% on Zen 3&2, 4% on CascadeLake) since r13-4272-g8caf155a3d6e23

2024-01-28 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113641

Bug ID: 113641
   Summary: 510.parest_r with PGO at O2 slower than GCC 12 (7% on
Zen 3&2, 4% on CascadeLake) since
r13-4272-g8caf155a3d6e23
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
Blocks: 26163
  Target Milestone: ---
  Host: x86_64-linux-gnu
Target: x86_64-linux-gnu

During the development of GCC 13, 510.parest_r run-time regressed on x86_64:
when built with profile guided optimization and just plain O2, master is slower
than GCC 12.  The difference is not big but fairly clear cut, about 7.6%
on Zen3:

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=740.457.0=892.457.0=694.457.0;

and about 7.2% on Zen2:

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=777.457.0=932.457.0=687.457.0;

The graphs above show use of both LTO and PGO but LTO is not necessary.

I was able to bisect the regression to commit r13-4272-g8caf155a3d6e23 (i386:
Only enable small loop unrolling in backend [PR 107692]).  parest_r is also
about 4% slower when compiled with this revision than with the previous one on
Intel CascadeLake.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

[Bug target/113600] [14 regression] 525.x264_r run-time regresses by 8% with PGO -Ofast -march=znver4

2024-01-26 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113600

--- Comment #4 from Martin Jambor  ---
(In reply to Hongtao Liu from comment #2)
> A patch is posted at
> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640276.html
> 
> Would you give a try to see if it fixes the regression, I don't currently
> have a znver4 machine for testing.

Unfortunately it does not.

(In reply to Richard Biener from comment #3)
> I think we need to figure out what exactly gets slower (and hope it's not
> scattered all over the place)

I have collected some profiles:

r14-5602-ge6269bb69c0734

# Samples: 516K of event 'cycles:u'
# Event count (approx.): 468008188417
# Overhead  Samples  Command          Shared Object                          Symbol
#
    13.55%    69886  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] mc_chroma
    11.05%    57017  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_satd_16x16
     9.24%    47693  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_satd_8x8
     8.67%    44733  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] get_ref
     4.84%    24984  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] sub16x16_dct
     4.16%    21484  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_me_search_ref
     3.30%    17033  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_hadamard_ac_16x16
     2.28%    11770  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_satd_4x4
     2.10%    10824  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] quant_trellis_cabac
     2.07%    10694  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] hpel_filter
     2.05%    10616  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] sub8x8_dct
     1.86%     9593  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] refine_subpel
     1.70%     8788  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] quant_4x4
     1.57%     8077  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_sad_16x16
     1.16%     6324  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] frame_init_lowres_core
     1.14%     5867  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_sa8d_8x8
     1.11%     5738  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_cabac_encode_decision_c
     1.08%     5736  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_var_16x16



r14-5603-g2b59e2b4dff421

# Samples: 550K of event 'cycles:u'
# Event count (approx.): 498834737657
# Overhead  Samples  Command          Shared Object                          Symbol
#
    18.21%   100151  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_satd_16x16
    12.37%    68006  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] mc_chroma
     8.51%    46815  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_satd_8x8
     7.56%    41560  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] get_ref
     4.53%    24901  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] sub16x16_dct
     3.92%    21561  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_me_search_ref
     3.08%    16963  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_hadamard_ac_16x16
     2.41%    13239  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_satd_4x4
     1.99%    10931  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] quant_trellis_cabac
     1.96%    10801  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] hpel_filter
     1.95%    10764  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] sub8x8_dct
     1.56%     8587  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] quant_4x4
     1.49%     8166  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] refine_subpel
     1.48%     8124  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_sad_16x16
     1.09%     6328  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] frame_init_lowres_core
     1.07%     5901  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_sa8d_8x8
     1.04%     5703  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_cabac_encode_decision_c

[Bug tree-optimization/107946] [13/14 Regression] 507.cactuBSSN_r regresses by ~9% on znver3 with PGO since r13-3875-g9e11ceef165bc0

2024-01-26 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107946

Martin Jambor  changed:

   What|Removed |Added

   Last reconfirmed||2024-01-26
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #7 from Martin Jambor  ---
This regression is still there (as the graphs linked in the summary show).

[Bug target/113600] New: 525.x264_r run-time regresses by 8% with PGO -Ofast -march=znver4

2024-01-25 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113600

Bug ID: 113600
   Summary: 525.x264_r run-time regresses by 8% with PGO -Ofast
-march=znver4
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
CC: liuhongt at gcc dot gnu.org
Blocks: 26163
  Target Milestone: ---
  Host: x86_64-linux-gnu
Target: x86_64-linux-gnu

With profile-feedback, -Ofast and -march=native on an AMD Zen 4, there is a
recent 8% regression:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=979.377.0=966.377.0;

With both PGO and LTO, the situation is similar (6%):
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=977.377.0=958.377.0;

On a Zen3 machine, there is a 2% bump around the same time:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=900.377.0=473.377.0;

I have bisected the (non-LTO) Zen 4 case to commit r14-5603-g2b59e2b4dff421:

2b59e2b4dff42118fe3a505f07b9a6aa4cf53bdf is the first bad commit
commit 2b59e2b4dff42118fe3a505f07b9a6aa4cf53bdf
Author: liuhongt 
Date:   Thu Nov 16 18:38:39 2023 +0800

Support reduc_{plus,xor,and,ior}_scal_m for vector integer mode.

BB vectorizer relies on the backend support of
.REDUC_{PLUS,IOR,XOR,AND} to vectorize reduction.

gcc/ChangeLog:

PR target/112325
* config/i386/sse.md (reduc__scal_): New expander.
(REDUC_ANY_LOGIC_MODE): New iterator.
(REDUC_PLUS_MODE): Extend to VxHI/SI/DImode.
(REDUC_SSE_PLUS_MODE): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112325-1.c: New test.
* gcc.target/i386/pr112325-2.c: New test.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

[Bug target/105275] 525.x264_r and 538.imagick_r regressed on x86_64 at -O2 with PGO after r12-7319-g90d693bdc9d718

2024-01-24 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105275

--- Comment #3 from Martin Jambor  ---
I have re-checked this year again (using master revision
r14-7200-g95440171d0e615)  but this time on a high-frequency Zen3 CPU (EPYC
75F3). Run-time of 525.x264_r built with master with PGO and -O2 improved by
5.49% compared to GCC 13 and so compared to GCC 11 the regression dropped to
4.2%.

Run-time of 538.imagick_r compiled with the same options and master is 5.8%
slower on this CPU than when compiling it with GCC 11.

With both PGO and LTO, 525.x264_r is now only 2.8% slower than GCC 11.  In case
of 538.imagick_r the regression is 2.01% on the zen3, but it is 7.49% on a zen4
machine :-/

[Bug ipa/112616] [11/12/13/14 Regression] wrong code at -O{s, 2, 3} on x86_64-linux-gnu since r10-3311

2024-01-24 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112616

--- Comment #8 from Martin Jambor  ---
Fixed on trunk.  I did not want to backport this but because this variant does
not require disabling DCE, I will probably do so after a few weeks on master, if
there are no issues.

[Bug ipa/108007] [11/12/13/14 Regression] wrong code at -Os and above with "-fno-dce -fno-tree-dce" on x86_64-linux-gnu since r10-3311-gff6686d2e5f797

2024-01-24 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108007

--- Comment #22 from Martin Jambor  ---
Fixed on trunk.  I did not want to backport this but because of PR 112616 I
will probably do so after a few weeks on master, if there are no issues.

[Bug ipa/113490] [14 Regression] ICE: in propagate_vals_across_arith_jfunc, at ipa-cp.cc:2425 at -O3 since r14-285

2024-01-24 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113490

Martin Jambor  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Martin Jambor  ---
Fixed.

[Bug tree-optimization/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP

2024-01-22 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

--- Comment #4 from Martin Jambor  ---
The right place to free stuff in lattices post-IPA would be in
ipa_node_params::~ipa_node_params() where we should iterate over lattices and
deinitialize them, or perhaps destruct the array, because ipcp_vr_lattice
directly contains Value_Range which AFAIU directly contains int_range_max which
has a virtual destructor, so it does not look like a POD anymore.  This has
escaped me when I was looking at the IPA-VR changes but hopefully it should not
be too difficult to deal with.

[Bug ipa/113490] [14 Regression] ICE: in propagate_vals_across_arith_jfunc, at ipa-cp.cc:2425 at -O3 since r14-285

2024-01-20 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113490

--- Comment #5 from Martin Jambor  ---
I have proposed a fix on the mailing list: 
https://inbox.sourceware.org/gcc-patches/ri6cytv3eyy.fsf@/T/#u

[Bug other/94629] 10 issues located by the PVS-studio static analyzer

2024-01-20 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94629

--- Comment #28 from Martin Jambor  ---
(In reply to David Binderman from comment #27)
> The original article checked gcc-10.
> gcc-13 is checked in the following article:
> 
> https://pvs-studio.com/en/blog/posts/cpp/1067/
> 
> I suspect it would be most unwise if any release of gcc after 13 
> introduced new bugs that were known to pvs-studio.

And is there already a bugzilla bug about these (or should I create one)?
I believe a new one would be better than re-using this one.

[Bug ipa/113490] [14 Regression] ICE: in propagate_vals_across_arith_jfunc, at ipa-cp.cc:2425 at -O3 since r14-285

2024-01-19 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113490

Martin Jambor  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jamborm at gcc dot gnu.org

--- Comment #3 from Martin Jambor  ---
Still, let me have a look.

[Bug tree-optimization/110422] asm goto vs SRA

2024-01-19 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422

--- Comment #5 from Martin Jambor  ---
Fixed on trunk.  I plan to backport to open release branches in the upcoming
weeks.

[Bug other/89863] [meta-bug] Issues in gcc that other static analyzers (cppcheck, clang-static-analyzer, PVS-studio) find that gcc misses

2024-01-17 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89863
Bug 89863 depends on bug 94629, which changed state.

Bug 94629 Summary: 10 issues located by the PVS-studio static analyzer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94629

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug other/94629] 10 issues located by the PVS-studio static analyzer

2024-01-17 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94629

Martin Jambor  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #26 from Martin Jambor  ---
(In reply to Martin Liška from comment #25)
> No, there's still the 'ipa_polymorphic_call_context::set_by_invariant' issue
> that's waiting for Honza.

Finally fixed with:

https://gcc.gnu.org/g:4f4820964ebffc03249d98239a4ad2b43dd1a486

commit r14-8191-g4f4820964ebffc03249d98239a4ad2b43dd1a486
Author: Jan Hubicka 
Date:   Wed Jan 17 19:16:47 2024 +0100

Remove accidental hack in ipa_polymorphic_call_context::set_by_invariant

I managed to commit a hack setting offset to 0 in
ipa_polymorphic_call_context::set_by_invariant.  This makes it to give up
on multiple
inheritance, but most likely won't give bad code since the ohter base will
be of
different type.

gcc/ChangeLog:

* ipa-polymorphic-call.cc
(ipa_polymorphic_call_context::set_by_invariant): Remove
accidental hack reseting offset.

[Bug tree-optimization/110422] asm goto vs SRA

2024-01-17 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422

Martin Jambor  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jamborm at gcc dot gnu.org

--- Comment #3 from Martin Jambor  ---
Mine.

[Bug ipa/112616] [11/12/13/14 Regression] wrong code at -O{s, 2, 3} on x86_64-linux-gnu since r10-3311

2024-01-16 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112616

--- Comment #6 from Martin Jambor  ---
(In reply to Andrew Pinski from comment #1)
>   # q_11 = PHI <0B(2), removed_return.14_14(D)(4),
> removed_return.14_14(D)(3)>
>   _12 = *q_11;
> 
> 
> WTF

Well, _12 is not used anywhere, so the code expects the entire load to be DCEd.
But it gets optimized to

  _2 = MEM[(int *)0B]; 

before DCE sees it and then even if _2 is never used anywhere, apparently the
statement is kept there as an intended trap (I guess).

I have adjusted my patch to make DCE for removed returns part of IPA edge
redirection so that it does not have compare-debug problems and submitted it
for review in: https://inbox.sourceware.org/gcc-patches/ri6cyu1e9kw.fsf@/T/#u

[Bug ipa/108007] [11/12/13/14 Regression] wrong code at -Os and above with "-fno-dce -fno-tree-dce" on x86_64-linux-gnu since r10-3311-gff6686d2e5f797

2024-01-16 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108007

--- Comment #20 from Martin Jambor  ---
I have submitted a slightly modified patch to the mailing list:
https://inbox.sourceware.org/gcc-patches/ri6cyu1e9kw.fsf@/T/#u

[Bug target/113296] [14 Regression] SPEC 2006 434.zeusmp segfaults on Aarch64 when built with -Ofast -march=native -flto

2024-01-12 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113296

Martin Jambor  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Martin Jambor  ---
According to our buildbot results, this has resolved itself sometime between 1
and 2 days ago.

I assume nobody wants to go and investigate what issue it was if it does not
reappear, so let me close the bug.

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2024-01-12 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
Bug 26163 depends on bug 113296, which changed state.

Bug 113296 Summary: [14 Regression] SPEC 2006 434.zeusmp segfaults on Aarch64 when built with -Ofast -march=native -flto
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113296

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/113178] [14 Regression] ice in find_uses_to_rename_use

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113178

Martin Jambor  changed:

   What|Removed |Added

   Keywords|needs-bisection |

--- Comment #6 from Martin Jambor  ---
(In reply to David Binderman from comment #4)
> Reduced range seems to be g:0994ddd86f9c3d82 to g:a657c7e3518fcfc7.
> 
> All commits in this range are by Tamar.

Specifically r14-6822-g01f4251b8775c8 (Tamar Christina: middle-end: Support
vectorization of loops with multiple exits.)

[Bug tree-optimization/107823] [13/14 Regression] Dead Code Elimination Regression at -Os (trunk vs. 12.2.0) since r13-1934-g353fd1ec3df92f

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107823

Martin Jambor  changed:

   What|Removed |Added

   Keywords|needs-bisection |

--- Comment #6 from Martin Jambor  ---
This has been fixed by commit r14-4089-gd45ddc2c04e471 (Richard Biener:
tree-optimization/111294 - backwards threader PHI costing).

[Bug tree-optimization/109744] mesa/panvk: bogus Warray-bounds on gcc 12.2, fixed in 13 branch

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109744

Martin Jambor  changed:

   What|Removed |Added

   Keywords|needs-bisection |

--- Comment #3 from Martin Jambor  ---
The warning went away with commit r13-4389-gfd8dd6c0384969 (Richard Biener:
tree-optimization/107852 - missed optimization with PHIs).

[Bug c++/109753] [13/14 Regression] pragma GCC target causes std::vector not to compile (always_inline on constructor)

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109753

Martin Jambor  changed:

   What|Removed |Added

   Keywords|needs-bisection |

--- Comment #11 from Martin Jambor  ---
It seems there is nothing to bisect any more, please re-add the keyword if I am
wrong.

[Bug target/109780] [12/13/14 Regression] csmith: runtime crash with -O2 -march=znver1

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109780

Martin Jambor  changed:

   What|Removed |Added

   Keywords|needs-bisection |

--- Comment #26 from Martin Jambor  ---
Seems like there is nothing to bisect any more, please re-add the keyword if I
am wrong.

[Bug c++/109823] [11/12/13/14 Regression] ICE with trailing return of decltype of a fold expression in nested generic variadic lambda

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109823

Martin Jambor  changed:

   What|Removed |Added

 CC||jason at gcc dot gnu.org
   Keywords|needs-bisection |

--- Comment #4 from Martin Jambor  ---
The testcase from comment #1 started ICEing with commit dc58fa9f3142097b (Jason
Merrill: PR c++/84036 - ICE with variadic capture).

[Bug c/109828] [13/14 Regression] static compound literal with flexible array in initializer leads to invalid size and ICE

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109828

Martin Jambor  changed:

   What|Removed |Added

 CC||jsm28 at gcc dot gnu.org
   Keywords|needs-bisection |

--- Comment #11 from Martin Jambor  ---
ICE compiling testcase

----------
#include <stddef.h>

struct s {
    int i;
    char c[];
};

const struct s s = { .c = "0", };
const struct s *const r = &(constexpr struct s) { .c = "1", };
const struct s *const t = &(static struct s) { .c = "2", };

size_t ice(void)
{
    return __builtin_object_size(t, 1);
}
----------

with options -O2 -std=gnu2x -S was introduced with commit
r13-3930-gb556d1773db717 (Joseph Myers: c: C2x constexpr); the testcase simply
errors before that because it tests constexprs.

[Bug c++/109918] [11/12/13/14 Regression] Unexpected -Woverloaded-virtual with virtual conversion operators

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109918

Martin Jambor  changed:

   What|Removed |Added

   Keywords|needs-bisection |
 CC||nathan at gcc dot gnu.org

--- Comment #3 from Martin Jambor  ---
The testcase with -Werror=overloaded-virtual started failing with commit
r8-2669-gbff8b385e997a8 (Nathan Sidwell: Conversion operators have a special
name).

[Bug target/110001] [13/14 regression] Suboptimal code generation for branchless binary search

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110001

Martin Jambor  changed:

   What|Removed |Added

   Keywords|needs-bisection |
 CC||amacleod at redhat dot com

--- Comment #6 from Martin Jambor  ---
Even though I can confirm the observation from comment #1 that the optimized
tree dump does not seem to change in any meaningful way, bisection leads to
commit r12-4871-g502ffb1f389011 (Andrew MacLeod: Switch vrp2 to ranger).

[Bug c++/110065] [11/12/13/14 Regression] [C++20/2b] auto return type in template argument causes ICE, also accepts-invalid

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110065

Martin Jambor  changed:

   What|Removed |Added

 CC||jamborm at gcc dot gnu.org,
   ||jason at gcc dot gnu.org
   Keywords|needs-bisection |

--- Comment #2 from Martin Jambor  ---
The ICE when compiling the reduced testcase from comment #1 started with
r14-1659-gd3e2a174b13dd0 (Jason Merrill: c++: diagnose auto in template arg) -
before that it was an error.

[Bug tree-optimization/110091] [12/13/14 Regression] bogus -Wdangling-pointer on non-pointer values

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110091

Martin Jambor  changed:

   What|Removed |Added

 CC||jamborm at gcc dot gnu.org
   Keywords|needs-bisection |

--- Comment #3 from Martin Jambor  ---
The warning started appearing from its very introduction to gcc in
r12-6606-g9d6a0f388eb048 (Martin Sebor: Add -Wdangling-pointer [PR63272]).

[Bug middle-end/110294] [11 Regression] Segmentation fault with '-O3 -fno-dce -fno-toplevel-reorder -fno-tree-dce -fno-tree-pta -fno-tree-sink -ftoplevel-reorder'

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110294

Martin Jambor  changed:

   What|Removed |Added

 CC||jamborm at gcc dot gnu.org
   Keywords|needs-bisection |

--- Comment #6 from Martin Jambor  ---
(In reply to Xi Ruoyao from comment #2)
> Not reproducible with GCC 13.1.  I guess it's a duplicate of a fixed issue.

The testcase stopped failing with r12-248-gb58dc0b803057c (Richard Biener:
tree-optimization/99912 - delete trivially dead stmts during DSE)

[Bug tree-optimization/110450] [14 Regression] Dead Code Elimination Regression at -O2 since r14-261-g0ef3756adf0

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110450

Martin Jambor  changed:

   What|Removed |Added

   Keywords|needs-bisection |
 CC||jamborm at gcc dot gnu.org

--- Comment #3 from Martin Jambor  ---
This has been fixed with r14-4141-gbf6b107e2a3423 (Andrew MacLeod: New early
__builtin_unreachable processing).

[Bug ipa/110705] [11/12 Regression] ICE at -O2 and above: in gimplify_modify_expr, at gimplify.cc:6255 (on GCC-12.x)

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110705

Martin Jambor  changed:

   What|Removed |Added

 CC||jamborm at gcc dot gnu.org
   Keywords|needs-bisection |

--- Comment #2 from Martin Jambor  ---
This has been fixed with r13-1695-gb0f02eeb906b63 (Eric Botcazou: Fix ICE on
view conversion between struct and integer)

[Bug tree-optimization/110768] [14 Regression] Dead Code Elimination Regression since r14-2623-gc11a3aedec2

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110768

Martin Jambor  changed:

   What|Removed |Added

 CC||jamborm at gcc dot gnu.org
   Keywords|needs-bisection |

--- Comment #3 from Martin Jambor  ---
This has been fixed with r14-5109-ga291237b628f41 (Andrew MacLeod: Remove
simple ranges from trailing zero bitmasks)

[Bug libgomp/110842] [14 Regression] Openmp loops with KIND=16 DO loops

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110842

Martin Jambor  changed:

   What|Removed |Added

   Keywords|needs-bisection |

--- Comment #5 from Martin Jambor  ---
So IIUC nothing to bisect here and so I am removing the tag.  Please re-add if
I am somehow mistaken.

[Bug tree-optimization/110941] [14 Regression] Dead Code Elimination Regression at -O3 since r14-2379-gc496d15954c

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110941

Martin Jambor  changed:

   What|Removed |Added

 CC||jamborm at gcc dot gnu.org
   Keywords|needs-bisection |

--- Comment #4 from Martin Jambor  ---
This has been fixed with r14-5109-ga291237b628f41 (Andrew MacLeod: Remove
simple ranges from trailing zero bitmasks).

[Bug tree-optimization/110942] [14 Regression] Dead Code Elimination Regression at -O3 since r14-1165-g257c2be7ff8

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110942

Martin Jambor  changed:

   What|Removed |Added

   Keywords|needs-bisection |
 CC||jamborm at gcc dot gnu.org

--- Comment #5 from Martin Jambor  ---
This has been fixed with r14-5109-ga291237b628f41 (Andrew MacLeod: Remove
simple ranges from trailing zero bitmasks).

[Bug tree-optimization/111003] [14 Regression] Dead Code Elimination Regression at -O3 since r14-2161-g237e83e2158

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111003

Martin Jambor  changed:

   What|Removed |Added

   Keywords|needs-bisection |
 CC||jamborm at gcc dot gnu.org

--- Comment #4 from Martin Jambor  ---
This has been fixed with r14-4786-gd118738e71cf46 (Richi's restrict invariant
motion of shifts).

[Bug tree-optimization/111012] [14 Regression] Dead Code Elimination Regression at -O3 since r14-573-g69f1a8af45d

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111012

Martin Jambor  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 CC||jamborm at gcc dot gnu.org
 Status|NEW |RESOLVED
   Keywords|needs-bisection |

--- Comment #3 from Martin Jambor  ---
This has been fixed with Richi's r14-3982-g9ea74d235c7e78 (better DCE after
forwprop).  Given the title of the patch I guess it's safe to declare this
fixed.

[Bug fortran/111291] ASAN error: heap-use-after-free gcc/fortran/parse.cc:359 in decode_statement

2024-01-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111291

Martin Jambor  changed:

   What|Removed |Added

 CC|mjambor at suse dot cz |mikael at gcc dot gnu.org

--- Comment #3 from Martin Jambor  ---
This has been introduced with r14-7062-gbcf7ebba9115cc (fortran: Restore
interface to its previous state on error [PR48776]).
