[Bug tree-optimization/113551] [13/14 Regression] Miscompilation with -O1 -funswitch-loops -fno-strict-overflow

2024-01-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113551

Richard Biener  changed:

   What|Removed |Added

   Keywords||needs-bisection
 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org

--- Comment #6 from Richard Biener  ---
trunk doesn't unswitch for me (needs bisection).  Let me check what happens on
the branch.

[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.

2024-01-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

Tamar Christina  changed:

   What|Removed |Added

   Target Milestone|--- |14.0
   Priority|P3  |P1
  Component|middle-end  |tree-optimization

[Bug middle-end/113552] New: [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.

2024-01-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

Bug ID: 113552
   Summary: [11/12/13/14 Regression] vectorizer generates calls to
vector math routines with 1 simd lane.
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: link-failure
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64-*

In GCC 7 the Arm vector PCS was implemented to support libmvec but the libmvec
component never made it into glibc until now.

GLIBC 2.39 which will be paired with GCC 14 now implements the vector math
routines.

However consider this function:

> cat cosmo.fppized3.f
  SUBROUTINE a(b)
  DIMENSION b(3,0)
  COMMON c
  DO 4 m=1,c
 DO 4 d=1,3
 b(d,m)=b(d,m)+COS(5.0D00*m)
   4  CONTINUE
  END
  DIMENSION e(53)
  DIMENSION f(6,91),g(6,91),h(6,91),
 *  i(6,91),j(6,91),k(6,86)
  DIMENSION l(107)
  END

and compiled with headers from a glibc 2.39:

> aarch64-unknown-linux-gnu-gfortran -S -o - -Ofast 
> -L/data/repro/glibc/usr/lib64 -I/data/repro/glibc/include 
> --sysroot=/data/repro/glibc -w cosmo.fppized3.f

produces:

        fmul    v13.2d, v13.2d, v19.2d
        fmov    d0, d13
        bl      _ZGVnN1v_cos
        fmov    d12, d0
        dup     d0, v13.d[1]
        bl      _ZGVnN1v_cos
        fmov    d31, d0
        stp     d12, d31, [sp, 96]

which has deconstructed the vector to scalar and performs a vector call with 1
element.
This is not just inefficient: _ZGVnN1v_cos does not exist in glibc, so we
produce code that cannot be linked.

It looks like the vectorizer starts with 4 floats and widens to 2x 2 double. 
But then during vectorizable simd this is again split into multiple vectors,
even though the operation already fits in a vector:

cosmo.fppized3.f:4:13: note:   -->vectorizing SLP node starting from: _49 =
__builtin_cos (_48);
cosmo.fppized3.f:4:13: note:   vect_is_simple_use: operand _47 * 5.0e+0, type
of def: internal
cosmo.fppized3.f:4:13: note:   transform call.
cosmo.fppized3.f:4:13: note:   add new stmt: _132 = BIT_FIELD_REF
;
cosmo.fppized3.f:4:13: note:   add new stmt: _133 = cos.simdclone.0 (_132);
cosmo.fppized3.f:4:13: note:   add new stmt: _134 = BIT_FIELD_REF
;
cosmo.fppized3.f:4:13: note:   add new stmt: _135 = cos.simdclone.0 (_134);
cosmo.fppized3.f:4:13: note:   add new stmt: vect__49.27_136 = {_133, _135};
cosmo.fppized3.f:4:13: note:   add new stmt: _137 = BIT_FIELD_REF
;
cosmo.fppized3.f:4:13: note:   add new stmt: _138 = cos.simdclone.0 (_137);
cosmo.fppized3.f:4:13: note:   add new stmt: _139 = BIT_FIELD_REF
;
cosmo.fppized3.f:4:13: note:   add new stmt: _140 = cos.simdclone.0 (_139);
...

Because we happen to have a V1DF mode that is meant to only be used by some
intrinsics the operation succeeds.

So several issues here:

1. We should stop the new libmvec headers in glibc from applying to GCC
10, 9, 8 and 7, since we can't fix those anymore.  So we need a GCC version
check on them; however, glibc is now frozen for release.
2. The vectorizer should not decompose a simd call if the input and result
don't require it.
3. We shouldn't generate a call with simdlen 1.  That said in theory this could
still be beneficial because it would allow the rest of the code to vectorize
and the vector pcs is cheaper to call.

[Bug testsuite/113548] gcc.dg/vect/vect-ifcvt-19.c ICEs on LLP64 target

2024-01-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113548

--- Comment #4 from Richard Biener  ---
Note for 'sizetype' you want to use '__SIZETYPE__', not '__SIZE_TYPE__'

[Bug c++/113541] Rejects __attribute__((section)) on explicit instantiation declaration of ctor/dtor

2024-01-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113541

Richard Biener  changed:

   What|Removed |Added

   Keywords||rejects-valid
Version|unknown |14.0
  Known to work||4.9.4
  Known to fail||5.1.0

--- Comment #2 from Richard Biener  ---
It sounds like an issue with the C++ mandated aliases.

But I'll note that the template instantiations have to adhere to certain
linkage so I wonder if simply putting them into a different section isn't going
to break the ABI.

[Bug middle-end/113540] missing -Warray-bounds warning with malloc and a simple loop

2024-01-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113540

Richard Biener  changed:

   What|Removed |Added

 Blocks||56456
   Keywords||diagnostic
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2024-01-23

--- Comment #1 from Richard Biener  ---
If you remove the volatile, like

#include <stdlib.h>

char *foo (void)
{
  char *t;
  t = malloc (4);
  for (int i = 0; i <= 4; i++)
    t[i] = 0;
  return t;
}

you get

t.c: In function 'foo':
t.c:8:10: warning: '__builtin_memset' writing 5 bytes into a region of size 4
[-Wstringop-overflow=]
8 | t[i] = 0;
  | ~^~~
t.c:6:7: note: destination object of size 4 allocated by 'malloc'
6 |   t = malloc (4);
  |   ^~

note this is because we then unroll the loop.  If you change it like

#include <stdlib.h>

short *foo (void)
{
  short *t;
  t = malloc (8);
  for (int i = 0; i <= 4; i++)
    t[i] = 13;
  return t;
}

you get

t.c: In function 'foo':
t.c:8:6: warning: array subscript 4 is outside array bounds of 'short int[4]'
[-Warray-bounds=]
8 | t[i] = 13;
  | ~^~~
t.c:6:7: note: at offset 8 into object of size 8 allocated by 'malloc'
6 |   t = malloc (8);
  |   ^~

because we unroll the loop.  Upping the bounds like

#include <stdlib.h>

short *foo (void)
{
  short *t;
  t = malloc (64);
  for (int i = 0; i <= 32; i++)
    t[i] = 13;
  return t;
}

no longer warns because we hit unroll limits.  This is also the reason
we do not diagnose the original testcase - there's currently no analysis
done to compute the set of values 'i' must reach for the purpose of
array-bound diagnostics.  Instead we use value-ranges which are
conservative, aka [-INF, INF] is "correct".  But that means we only
diagnose cases where _all_ values of the range fall outside of the
array.

Using niter analysis and SCEV we could do a better job in cases like the
one in this bug.

I'm quite sure we have related/duplicate bugreports for this already.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56456
[Bug 56456] [meta-bug] bogus/missing -Warray-bounds

[Bug target/113255] [11/12/13 Regression] wrong code with -O2 -mtune=k8

2024-01-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

Richard Biener  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
  Known to work||14.0
 Status|NEW |ASSIGNED
Summary|[11/12/13/14 Regression]|[11/12/13 Regression] wrong
   |wrong code with -O2 |code with -O2 -mtune=k8
   |-mtune=k8   |

--- Comment #13 from Richard Biener  ---
Fixed on trunk so far.

[Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8

2024-01-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

--- Comment #12 from GCC Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:a98d5130a6dcff2ed4db371e500550134777b8cf

commit r14-8346-ga98d5130a6dcff2ed4db371e500550134777b8cf
Author: Richard Biener 
Date:   Mon Jan 15 12:55:20 2024 +0100

rtl-optimization/113255 - base_alias_check vs. pointer difference

When the x86 backend generates code for cpymem with the rep_8byte
strategy for the 8 byte aligned main rep movq it needs to compute
an adjusted pointer to the source after doing a prologue aligning
the destination.  It computes that via

  src_ptr + (dest_ptr - orig_dest_ptr)

which is perfectly fine.  On RTL this is then

8: r134:DI=const(`g'+0x44)
9: {r133:DI=frame:DI-0x4c;clobber flags:CC;}
  REG_UNUSED flags:CC
   56: r129:DI=const(`g'+0x4c)
   57: {r129:DI=r129:DI&0xfff8;clobber flags:CC;}
  REG_UNUSED flags:CC
  REG_EQUAL const(`g'+0x4c)&0xfff8
   58: {r118:DI=r134:DI-r129:DI;clobber flags:CC;}
  REG_DEAD r134:DI
  REG_UNUSED flags:CC
  REG_EQUAL const(`g'+0x44)-r129:DI
   59: {r119:DI=r133:DI-r118:DI;clobber flags:CC;}
  REG_DEAD r133:DI
  REG_UNUSED flags:CC

but as written find_base_term happily picks the first candidate
it finds for the MINUS which means it picks const(`g') rather
than the correct frame:DI.  The way find_base_term (but also
the unfixed find_base_value used by init_alias_analysis to
initialize REG_BASE_VALUE) performs pointer analysis isn't
sound.  The following restricts the handling of multi-operand
operations to the case where we know only one can be a pointer.

This for example causes gcc.dg/tree-ssa/pr94969.c to miss some
RTL PRE (I've opened PR113395 for this).  A more drastic patch,
removing base_alias_check results in only gcc.dg/guality/pr41447-1.c
regressing (so testsuite coverage is bad).  I've looked at
gcc.dg/tree-ssa tests and mostly scheduling changes are present,
the cc1plus .text size is only 230 bytes worse.  With the
less drastic patch below most scheduling changes are gone.

x86_64 might not be the very best target to test for impact, but
test coverage on other targets is unlikely to be very much better.

PR rtl-optimization/113255
* alias.cc (find_base_term): Remove PLUS/MINUS handling
when both operands are not CONST_INT_P.

* gcc.dg/torture/pr113255.c: New testcase.

[Bug debug/112718] [11/12/13 Regression] ICE: in add_dwarf_attr, at dwarf2out.cc:4501 with -g -fdebug-types-section -flto -ffat-lto-objects

2024-01-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112718

Richard Biener  changed:

   What|Removed |Added

  Known to work||14.0
   Priority|P3  |P2
Summary|[11/12/13/14 Regression]|[11/12/13 Regression] ICE:
   |ICE: in add_dwarf_attr, at  |in add_dwarf_attr, at
   |dwarf2out.cc:4501 with -g   |dwarf2out.cc:4501 with -g
   |-fdebug-types-section -flto |-fdebug-types-section -flto
   |-ffat-lto-objects   |-ffat-lto-objects

--- Comment #4 from Richard Biener  ---
Fixed on trunk so far.

[Bug debug/112718] [11/12/13/14 Regression] ICE: in add_dwarf_attr, at dwarf2out.cc:4501 with -g -fdebug-types-section -flto -ffat-lto-objects

2024-01-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112718

--- Comment #3 from GCC Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:7218f5050cb7163edae331f54ca163248ab48bfa

commit r14-8345-g7218f5050cb7163edae331f54ca163248ab48bfa
Author: Richard Biener 
Date:   Mon Jan 22 15:42:59 2024 +0100

debug/112718 - reset all type units with -ffat-lto-objects

When mixing -flto, -ffat-lto-objects and -fdebug-types-section we
fail to reset all type units after early output resulting in an
ICE when attempting to add then duplicate sibling attributes.

PR debug/112718
* dwarf2out.cc (dwarf2out_finish): Reset all type units
for the fat part of an LTO compile.

* gcc.dg/debug/pr112718.c: New testcase.

[Bug tree-optimization/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP

2024-01-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2024-01-23
 Ever confirmed|0   |1
 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jamborm at gcc dot gnu.org

--- Comment #5 from Richard Biener  ---
(In reply to Martin Jambor from comment #4)
> The right place where to free stuff in lattices post-IPA would be in
> ipa_node_params::~ipa_node_params() where we should iterate over lattices
> and deinitialize them or perhaps destruct the array because since
> ipcp_vr_lattice directly contains Value_Range which AFAIU directly contains
> int_range_max which has a virtual destructor... does not look like a POD
> anymore.  This has escaped me when I was looking at the IPA-VR changes but
> hopefully it should not be too difficult to deal with.

OK, that might work for the IPA side.

It's quite unusual to introduce a virtual DTOR in the middle of the class
hierarchy though.  Grepping I do see quite some direct uses of 'irange'
and also 'vrange' which do not have the DTOR visible but 'irange' already
exposes and uses 'maybe_resize'.  I think those should only be introduced
in the class exposing the virtual DTOR (but why virtual?!).

Would be nice to have a picture of the range class hierarchies with
pointers on which types to use in which circumstances ...

For example:

  Value_Range vr (parm_type);
...
   irange &r = as_a <irange> (vr);
   irange_bitmask bm = r.get_bitmask ();
...

should that really use 'irange'?  Why not int_range&?

All the complication might be because of GC (irange is GTY but int_range is
not), but re-allocation would happen with 'new', not ggc_alloc, so ...

But yes, please try to fix IPA CP, I'll see if this pops up elsewhere as well
then.

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441

--- Comment #11 from Richard Biener  ---
(In reply to Tamar Christina from comment #9)
> There is a weird costing going on in the PHI nodes though:
> 
> m_108 = PHI  1 times vector_stmt costs 0 in body 
> m_108 = PHI  2 times scalar_to_vec costs 0 in prologue
> 
> they have collapsed to 0. which can't be right..

Note this is likely because of the backend going wrong.

bool
vectorizable_phi (vec_info *,
  stmt_vec_info stmt_info, gimple **vec_stmt,
  slp_tree slp_node, stmt_vector_for_cost *cost_vec)
{
..

  /* For single-argument PHIs assume coalescing which means zero cost
 for the scalar and the vector PHIs.  This avoids artificially
 favoring the vector path (but may pessimize it in some cases).  */
  if (gimple_phi_num_args (as_a <gphi *> (stmt_info->stmt)) > 1)
    record_stmt_cost (cost_vec, SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node),
                      vector_stmt, stmt_info, vectype, 0, vect_body);

You could check if we call this with sane values.

[Bug target/113507] can't build a cross compiler to rs6000-ibm-aix7.2

2024-01-22 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113507

Kewen Lin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 CC||segher at gcc dot gnu.org
   Last reconfirmed||2024-01-23
 Ever confirmed|0   |1

--- Comment #5 from Kewen Lin  ---
(In reply to H.J. Lu from comment #3)
> (In reply to Kewen Lin from comment #2)
> > Guessing /usr/local/bin/ld is a gnu ld? Based on what I heard before, gnu ld
> > has some problems on aix, people pass object files to aix system and use aix
> > ld there. Not sure if the understanding still holds.
> 
> I am building a cross compiler.  No AIX tools are involved.

Thanks for clarifying, I was dull and misunderstood it.

Confirmed.  Some symbols come from rs6000-builtin.cc (which is not generated),
and it in turn requires some symbols from rs6000-builtins.cc (which is
generated).  Neither object file is included in linking.  The diff below fixes it:

diff --git a/gcc/config.gcc b/gcc/config.gcc
index b2d7d7dd475..6b62e4fe56c 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -557,8 +557,10 @@ rs6000*-*-*)
 extra_options="${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt"
 extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
 extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
+extra_objs="${extra_objs} rs6000-builtin.o rs6000-builtins.o"
 target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.cc \$(srcdir)/config/rs6000/rs6000-call.cc"
 target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-pcrel-opt.cc"
+target_gtfiles="$target_gtfiles ./rs6000-builtins.h"
 ;;
 sparc*-*-*)
 cpu_type=sparc

According to David's comment "rs6000-ibm-aix doesn't exist any more", and I
vaguely remember Segher also mentioning that the rs6000*-*-*) case has become
stale.  Maybe we can aggressively drop the whole rs6000*-*-*) case handling?

[Bug tree-optimization/113551] [13/14 Regression] Miscompilation with -O1 -funswitch-loops -fno-strict-overflow

2024-01-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113551

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
Summary|Miscompilation with -O1 |[13/14 Regression]
   |-funswitch-loops|Miscompilation with -O1
   |-fno-strict-overflow|-funswitch-loops
   ||-fno-strict-overflow
   Last reconfirmed||2024-01-23
 Status|UNCONFIRMED |NEW
  Known to fail|5.4.0   |13.2.0
   Target Milestone|--- |13.3

--- Comment #5 from Andrew Pinski  ---
Confirmed at least for the bad unswitch which causes the other wrong code to
happen.

[Bug rtl-optimization/113533] [14 Regression] Code generation regression after change for pr111267

2024-01-22 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113533

--- Comment #11 from Oleg Endo  ---
(In reply to Roger Sayle from comment #10)

> I've found an interesting table of SH cycle counts (for different CPUs) at
> http://www.shared-ptr.com/sh_insns.html

Yeah, I know.  I did that ;)

> In my proposed patch, the address cost (1) when optimizing for size attempts
> to return the additional size of an instruction based on the addressing
> mode.  For register, and reg+reg addressing modes there is no size increase
> (overhead), and for addressing modes with displacements, and displacements to
> address pointers, there is a cost.

AFAIR, I've added the 'sh_address_cost' function.  The intention was/is to
encourage/discourage usage of certain address modes based on the side effects
and impact on the surrounding code.  All insns/addr modes have the same length
and basically same execution time.  However, e.g. @(reg+reg) has a constraint
on 'r0' usage, so I weighted that heavier.  If there's anything that could use
@(reg+disp) as an alternative, that'd be better in some cases. (not sure if
such optimizations actually are done...)

> (2) when optimizing for speed, address
> cost remains between 0 and 3, and is used to prioritize between (equivalent
> numbers of) instructions.  Normally, rtx_costs are defined in terms of
> COSTS_N_INSNS, which multiplies by 4.  Hence on many platforms a single
> instruction that references memory may be encoded as COSTS_N_INSNS(1)+1 (or
> a more complex addressing mode as COSTS_N_INSNS(1)+2) to show that this is
> disfavored to a single instruction that doesn't reference memory,
> COSTS_N_INSNS(1)+0.

That's actually what sh_rtx_costs was supposed to do as well.  I think in usual
cases it does that, only that apparently I've screwed up the {SIGN|ZERO}_EXTEND
for the case of the mem load and it shows up only now, many years later.

It's still not entirely clear to me why we would want to squash the costs of
addresses to 0 when optimizing for size.  What effect does it have on the
generated code?  I can't imagine how it could make the code any
smaller.

With your patch, in case of the SIGN_EXTEND with mem operand, it would make the
address cost 0 with -Os, which would return COSTS_N_INSNS(1) for reg operand as
well as mem operand.  So both insns are equally weighted and could be
considered interchangeable.  And we might bump into this type of regression
again, if some (future) optimization decides that it can interchange/substitute
insns of the same cost... 


> For example, SH currently reports multiplications as a single cycle operation,

That doesn't seem to be the case.  It's supposed to be using the function
'multcosts' in sh.cc, which returns at least a cost of '2'.  Note that on SH1
and SH2 there is no dynamic (barrel) shift.  So actually some multiplications
could be faster than stitched shifts.


> sh_rtx_costs doesn't distinguish the machine mode, so the costs of SImode 
> multiplications are the same as DImode multiplications.

I guess this is because SH doesn't have real DImode multiplication (64 x 64 ->
64/128 bit).  It can only do 32 x 32 -> 64 bit widening multiplication.  Any
real DImode multiplication will result in either an expanded sequence to
calculate the sum of partial products or a libcall, AFAIR.

[Bug target/113507] can't build a cross compiler to rs6000-ibm-aix7.2

2024-01-22 Thread dje at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113507

--- Comment #4 from David Edelsohn  ---
rs6000-ibm-aix doesn't exist anymore.  This should have been configured as
powerpc-ibm-aix7.2 .  Maybe there is some magic about the "powerpc" name?

Those variables are provided by generated files and apparently something is not
generating them when building a cross compiler.

[Bug tree-optimization/113551] Miscompilation with -O1 -funswitch-loops -fno-strict-overflow

2024-01-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113551

Andrew Pinski  changed:

   What|Removed |Added

  Known to fail||5.4.0

--- Comment #4 from Andrew Pinski  ---
The incorrect unswitch has been happening since at least GCC 5 ...

[Bug tree-optimization/113551] Miscompilation with -O1 -funswitch-loops -fno-strict-overflow

2024-01-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113551

Andrew Pinski  changed:

   What|Removed |Added

  Component|rtl-optimization|tree-optimization
   Keywords||wrong-code

--- Comment #3 from Andrew Pinski  ---
Looks like the unswitch is happening when it should not be ...

[Bug rtl-optimization/113551] Miscompilation with -O1 -funswitch-loops -fno-strict-overflow

2024-01-22 Thread yshuiv7 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113551

--- Comment #2 from Yuxuan Shui  ---
regression from 12.3 -> 13.2

[Bug rtl-optimization/113551] Miscompilation with -O1 -funswitch-loops -fno-strict-overflow

2024-01-22 Thread yshuiv7 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113551

--- Comment #1 from Yuxuan Shui  ---
code is reduced from perf, source file util/dsos.c

[Bug rtl-optimization/113551] New: Miscompilation with -O1 -funswitch-loops -fno-strict-overflow

2024-01-22 Thread yshuiv7 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113551

Bug ID: 113551
   Summary: Miscompilation with -O1 -funswitch-loops
-fno-strict-overflow
   Product: gcc
   Version: 13.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: yshuiv7 at gmail dot com
  Target Milestone: ---

Code:

struct obj {
    int __pad;
    int i;
};

/* aborts when called with NULL */
int assert_not_null(void *);

void bug(struct obj **root, struct obj *dso) {
    while (1) {
        struct obj *this = *root;

        if (dso == (void *)0)
            // should return here
            return;

        if (dso == this)
            return;

        // shouldn't reach here
        assert_not_null(dso);

        if (!>i)
            break;
    }
}

// call like this: bug(, NULL);

Result:

* -O1: ok
* -O1 -funswitch-loops: ok
* -O1 -fno-strict-overflow: ok
* -O1 -funswitch-loops -fno-strict-overflow: abort

[Bug rtl-optimization/113533] [14 Regression] Code generation regression after change for pr111267

2024-01-22 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113533

--- Comment #10 from Roger Sayle  ---
Hi Oleg.  Great question.  The "speed" parameter passed to rtx_costs, and
address_cost indicates whether the middle-end is optimizing for performance, and
interested in the number of cycles taken by each instruction, or optimizing
for size, and interested in the number of bytes used to encode the instruction.
 Previously, this speed parameter was ignored by the SH backend, so the costs
were the same independent of the objective function.

In my proposed patch, the address cost (1) when optimizing for size attempts to
return the additional size of an instruction based on the addressing mode.  For
register, and reg+reg addressing modes there is no size increase (overhead),
and for addressing modes with displacements, and displacements to address
pointers, there is a cost.  (2) when optimizing for speed, address cost remains
between 0 and 3, and is used to prioritize between (equivalent numbers of)
instructions.  Normally, rtx_costs are defined in terms of COSTS_N_INSNS, which
multiplies by 4.  Hence on many platforms a single instruction that references
memory may be encoded as COSTS_N_INSNS(1)+1 (or a more complex addressing mode
as COSTS_N_INSNS(1)+2) to show that this is disfavored to a single instruction
that doesn't reference memory, COSTS_N_INSNS(1)+0.

This is the fix for this particular regression; SIGN_EXTEND of a register now
costs COSTS_N_INSNS(1), and SIGN_EXTEND of a MEM now costs COSTS_N_INSNS(1)+1.

A useful way to debug rtx_costs is to use the -dP command line option, and then
look at the [c=X, l=Y] annotations in the assembly language file.  One way to
check/confirm that these are sensible is that ideally they should be correlated
when optimizing for size (with -Os or -Oz).

I've found an interesting table of SH cycle counts (for different CPUs) at
http://www.shared-ptr.com/sh_insns.html and these could be used to improve
sh_rtx_costs further.  For example, SH currently reports multiplications as
a single cycle operation, which doesn't match the hardware specs, and prevents
GCC from using synth_mult to produce faster (or shorter) sequences using shifts
and additions.  Likewise, sh_rtx_costs doesn't distinguish the machine mode,
so the costs of SImode multiplications are the same as DImode multiplications.

In comment #5 you mention GCC's defaults; it turns out that for rtx_costs the
default values that would be provided by the middle-end, may be more accurate
than the values (currently) specified by the backend.

I hope this answers your question.

[Bug target/53929] [meta-bug] -masm=intel with global symbol

2024-01-22 Thread lh_mouse at 126 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53929

--- Comment #25 from LIU Hao  ---
Created attachment 57191
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57191&action=edit
Draft patch

This is a draft patch, bootstrapped on {i686,x86_64}-w64-mingw32 successfully.
Haven't run tests though.

[Bug c++/90463] Documentation: -Wunused not listed among the options enabled by -Wall

2024-01-22 Thread sandra at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90463

sandra at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from sandra at gcc dot gnu.org ---
Marking this fixed now.

[Bug c/89180] [meta-bug] bogus/missing -Wunused warnings

2024-01-22 Thread sandra at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89180
Bug 89180 depends on bug 90463, which changed state.

Bug 90463 Summary: Documentation: -Wunused not listed among the options enabled by -Wall
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90463

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug rtl-optimization/113533] [14 Regression] Code generation regression after change for pr111267

2024-01-22 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113533

--- Comment #9 from Oleg Endo  ---
(In reply to Roger Sayle from comment #8)
> Created attachment 57190 [details]
> proposed patch
> 
> Proposed patch to provide a sane/saner set of rtx_costs for SH.  There's
> plenty more that could be done, but these changes are (more than) sufficient
> to resolve the code quality regression caused by improved fwprop.  If
> someone could try this out on SH, and report back the results, that would be
> great.


You've added differentiation for 'speed ?' in 'sh_address_cost'.  Like this
one.

   /* 'GBR + 0'.  Account one more because of R0 restriction.  */
   if (REG_P (x) && REGNO (x) == GBR_REG)
-return 2;
+return speed ? 2 : 0;

What's the intention here?  Why is the cost of the address computation
reduced when not optimizing for speed?  It distorts the address costs and makes
them all equal.

[Bug c++/90463] Documentation: -Wunused not listed among the options enabled by -Wall

2024-01-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90463

--- Comment #3 from GCC Commits  ---
The master branch has been updated by Sandra Loosemore :

https://gcc.gnu.org/g:7e758890a4c86db790a5f9aef0191eef77047f65

commit r14-8342-g7e758890a4c86db790a5f9aef0191eef77047f65
Author: Sandra Loosemore 
Date:   Mon Jan 22 22:38:49 2024 +

Correct lists of options enabled by -Wall and -Wextra [PR90463]

gcc/ChangeLog
PR c++/90463
* doc/invoke.texi (Warning Options): Correct lists of options
enabled by -Wall and -Wextra by checking against common.opt
and c-family/c.opt.

[Bug c++/113531] [14 Regression] AddressSanitizer: stack-use-after-scope when iterating over initializer list since r14-1500-g4d935f52b0d5c0

2024-01-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113531

--- Comment #1 from Andrew Pinski  ---
It would be useful to get a reduced testcase without the use of the Catch2Main
library.

[Bug c++/113531] [14 Regression] AddressSanitizer: stack-use-after-scope when iterating over initializer list

2024-01-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113531

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||wrong-code
   Target Milestone|--- |14.0
Summary|AddressSanitizer:   |[14 Regression]
   |stack-use-after-scope when  |AddressSanitizer:
   |iterating over initializer  |stack-use-after-scope when
   |list|iterating over initializer
   ||list

[Bug rtl-optimization/113533] [14 Regression] Code generation regression after change for pr111267

2024-01-22 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113533

--- Comment #8 from Roger Sayle  ---
Created attachment 57190
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57190&action=edit
proposed patch

Proposed patch to provide a sane/saner set of rtx_costs for SH.  There's plenty
more that could be done, but these changes are (more than) sufficient to
resolve the code quality regression caused by improved fwprop.  If someone
could try this out on SH, and report back the results, that would be great.

[Bug target/113550] data512_t initializers dereference a clobbered register

2024-01-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113550

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2024-01-22
 Ever confirmed|0   |1
   Keywords||wrong-code
 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |pinskia at gcc dot gnu.org

--- Comment #1 from Andrew Pinski  ---
Should be an easy fix.

The pattern:
(define_insn "*aarch64_movv8di"
  [(set (match_operand:V8DI 0 "nonimmediate_operand" "=r,m,r")
(match_operand:V8DI 1 "general_operand" " r,r,m"))]
  "(register_operand (operands[0], V8DImode)
|| register_operand (operands[1], V8DImode))"
  "#"
  [(set_attr "type" "multiple,multiple,multiple")
   (set_attr "length" "32,16,16")]
)

Is missing a & on the r/m case.

Or the split could be improved such that the one that gets loaded last is the
one that might conflict:
(define_split
  [(set (match_operand:V8DI 0 "nonimmediate_operand")
(match_operand:V8DI 1 "general_operand"))]
  "reload_completed"
  [(const_int 0)]

aarch64_simd_emit_reg_reg_move handles this case already too.

I am going to go with improving the define_split ...
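
A toy model of the ordering idea above, as a sketch only.  It is not the
aarch64 backend code: the register file, the word-indexed memory, and the
names load512_naive and load512_overlap_safe are all made up for
illustration, and it loads single words rather than ldp pairs.  The point
it shows is that when the base register is also one of the destinations,
every other word has to be loaded first and the overlapping one last.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS  16   /* size of the toy register file */
#define WORDS_512 8    /* a 512-bit value is eight 64-bit words */

/* Load eight words into regs[0..7], reading the address from regs[base]
   on every step.  If base is in 0..7 the address is overwritten part way
   through, which is the hazard described in the report.  The caller must
   make sure every value used as an address stays inside mem.  */
void load512_naive(uint64_t regs[NUM_REGS], const uint64_t *mem, int base)
{
  for (int i = 0; i < WORDS_512; i++)
    regs[i] = mem[regs[base] + i];
}

/* Same load, but the destination that overlaps the base register is
   written last, so the address stays valid for all earlier loads.  */
void load512_overlap_safe(uint64_t regs[NUM_REGS], const uint64_t *mem,
                          int base)
{
  assert(base >= 0 && base < NUM_REGS);

  for (int i = 0; i < WORDS_512; i++)
    if (i != base)
      regs[i] = mem[regs[base] + i];

  /* Only needed when the base register is one of the destinations.  */
  if (base < WORDS_512)
    regs[base] = mem[regs[base] + base];
}
```

With the base address held in register 0, the naive order follows the
freshly loaded word 0 as if it were the address, while the reordered
version reads all eight words from the original address.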

[Bug target/113549] float simd crash on windows in gcc.dg/vect/vect-simd-clone-16b.c

2024-01-22 Thread nightstrike at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113549

--- Comment #4 from nightstrike  ---
(In reply to Andrew Pinski from comment #3)
> Either the stack size or the stack alignment issue.
> 
> I am suspecting a stack alignment issue.

Possibly related: PR110273

[Bug target/113550] New: data512_t initializers dereference a clobbered register

2024-01-22 Thread ianthompson at microsoft dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113550

Bug ID: 113550
   Summary: data512_t initializers dereference a clobbered
register
   Product: gcc
   Version: 12.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ianthompson at microsoft dot com
  Target Milestone: ---
Target: aarch64

Created attachment 57189
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57189&action=edit
Additional non-minimal failing cases

When initializing or copying a data512_t, the compiler is generating code which
clobbers the register containing the source pointer of the copy. Initially
observed on Arm GNU Toolchain 12.2.Rel1, but this also reproduces on trunk.

Minimal reproduction, hits a segfault when compiled with "aarch64-none-elf-gcc
-march=armv9-a+ls64":

#include <arm_acle.h>
void test_data512_init() {
data512_t my_value = {};
}

This code generates this assembly snippet for initializing my_value:
adrp x0, .LC0
add x0, x0, :lo12:.LC0
ldp x0, x1, [x0]
ldp x2, x3, [x0, 16]
ldp x4, x5, [x0, 32]
ldp x6, x7, [x0, 48]

Notice that the first ldp clobbers x0, redirecting the remaining 3 loads to
whatever address happens to be in val[0] of the initializer.

Similar incorrect code is generated in many other situations that involve
copying a data512_t (passing a global variable to a function, dereferencing a
data512_t*, etc). See the attached source file for the other failing cases I'm
seeing.

[Bug rtl-optimization/113546] [13/14 Regression] aarch64: bootstrap-debug-lean broken with -fcompare-debug failure since r13-2921-gf1adf45b17f7f1

2024-01-22 Thread acoplan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113546

--- Comment #5 from Alex Coplan  ---
FWIW the original preprocessed testcase (regex.i) also started failing with the
same commit (as the reduced testcase).

[Bug c++/113347] [12/13 Regression] ICE during gimplification building TVM since r13-8079-gd237e7b291ff52

2024-01-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113347

Andrew Pinski  changed:

   What|Removed |Added

 CC||csfore at posteo dot net

--- Comment #9 from Andrew Pinski  ---
*** Bug 113547 has been marked as a duplicate of this bug. ***

[Bug c++/113547] [13 Regression] c++: In function ‘std::vector package_b_info()’: cc1plus: internal compiler error: Segmentation fault

2024-01-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113547

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #4 from Andrew Pinski  ---
(In reply to Christopher Fore from comment #3)
> Backtrace:
> 
> In function ‘std::vector package_b_info()’:
> cc1plus: internal compiler error: Segmentation fault
> 0xe4dfcf crash_signal
>   
> /var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/
> toplev.cc:314
> 0x759446 error_operand_p(tree_node const*)
>   
> /var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/
> tree.h:4501
> 0x759446 cp_gimplify_expr(tree_node**, gimple**, gimple**)
>   
> /var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/
> cp/cp-gimplify.cc:550


Yep, a dup of bug 113347.

*** This bug has been marked as a duplicate of bug 113347 ***

[Bug c++/113547] [13 Regression] c++: In function ‘std::vector package_b_info()’: cc1plus: internal compiler error: Segmentation fault

2024-01-22 Thread csfore at posteo dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113547

--- Comment #3 from Christopher Fore  ---
Backtrace:

In function ‘std::vector package_b_info()’:
cc1plus: internal compiler error: Segmentation fault
0xe4dfcf crash_signal
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/toplev.cc:314
0x759446 error_operand_p(tree_node const*)
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/tree.h:4501
0x759446 cp_gimplify_expr(tree_node**, gimple**, gimple**)
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/cp/cp-gimplify.cc:550
0xb9c8d7 gimplify_expr(tree_node**, gimple**, gimple**, bool (*)(tree_node*),
int)
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/gimplify.cc:16292
0xba3f45 gimplify_stmt(tree_node**, gimple**)
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/gimplify.cc:7226
0xba3f45 gimplify_compound_expr
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/gimplify.cc:6412
0xb9eaba gimplify_expr(tree_node**, gimple**, gimple**, bool (*)(tree_node*),
int)
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/gimplify.cc:16373
0xb9d3a2 gimplify_stmt(tree_node**, gimple**)
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/gimplify.cc:7226
0xb9d3a2 gimplify_and_add(tree_node*, gimple**)
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/gimplify.cc:492
0xb9d3a2 gimplify_expr(tree_node**, gimple**, gimple**, bool (*)(tree_node*),
int)
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/gimplify.cc:16751
0xb9e065 gimplify_stmt(tree_node**, gimple**)
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/gimplify.cc:7226
0xb9e065 gimplify_statement_list
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/gimplify.cc:2019
0xb9e065 gimplify_expr(tree_node**, gimple**, gimple**, bool (*)(tree_node*),
int)
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/gimplify.cc:16828
0xba576b gimplify_stmt(tree_node**, gimple**)
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/gimplify.cc:7226
0xba576b gimplify_bind_expr
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/gimplify.cc:1430
0xb9dcd1 gimplify_expr(tree_node**, gimple**, gimple**, bool (*)(tree_node*),
int)
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/gimplify.cc:16584
0xba1629 gimplify_stmt(tree_node**, gimple**)
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/gimplify.cc:7226
0xba1629 gimplify_body(tree_node*, bool)
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/gimplify.cc:17645
0xba1a02 gimplify_function_tree(tree_node*)
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/gimplify.cc:17844
0xa10d77 cgraph_node::analyze()
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/cgraphunit.cc:684
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://bugs.gentoo.org/> for instructions.

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-22 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441

--- Comment #10 from JuzheZhong  ---
(In reply to Tamar Christina from comment #9)
> So on SVE the change is cost modelling.
> 
> Bisect landed on g:33c2b70dbabc02788caabcbc66b7baeafeb95bcf which changed
> the compiler's defaults to using the new throughput matched cost modelling
> used by newer cores.
> 
> It looks like this changes which mode the compiler picks for when using a
> fixed register size.
> 
> This is because the new cost model (correctly) models the costs for FMAs and
> promotions.
> 
> Before:
> 
> array1[0][_1] 1 times scalar_load costs 1 in prologue
> (int) _2 1 times scalar_stmt costs 1 in prologue
> 
> after:
> 
> array1[0][_1] 1 times scalar_load costs 1 in prologue 
> (int) _2 1 times scalar_stmt costs 0 in prologue 
> 
> and the cost goes from:
> 
> Vector inside of loop cost: 125
> 
> to
> 
> Vector inside of loop cost: 83 
> 
> so far, nothing sticks out, and in fact the profitability for VNx4QI drops
> from
> 
> Calculated minimum iters for profitability: 5
> 
> to
> 
> Calculated minimum iters for profitability: 3
> 
> This causes a clash, as this is now exactly the same cost as VNx2QI which
> used to be what it preferred before.
> 
> Which then leads it to pick the higher VF.
> 
> In the end smaller VF shows:
> 
> ;; Guessed iterations of loop 4 is 0.500488. New upper bound 1.
> 
> and now we get:
> 
> Vectorization factor 16 seems too large for profile prevoiusly believed to
> be consistent; reducing.  
> ;; Guessed iterations of loop 4 is 0.500488. New upper bound 0.
> ;; Scaling loop 4 with scale 66.6% (guessed) to reach upper bound 0
> 
> which I guess is the big difference.
> 
> There is a weird costing going on in the PHI nodes though:
> 
> m_108 = PHI  1 times vector_stmt costs 0 in body 
> m_108 = PHI  2 times scalar_to_vec costs 0 in prologue
> 
> they have collapsed to 0. which can't be right..

I don't think this change causes the regression, since the regression happens
not only on ARM SVE but also on RVV.
It should be a middle-end issue.

I believe you'd better use -fno-vect-cost-model.

[Bug testsuite/113548] gcc.dg/vect/vect-ifcvt-19.c ICEs on LLP64 target

2024-01-22 Thread nightstrike at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113548

--- Comment #3 from nightstrike  ---
Seeing as how this is a testsuite issue, it seems that the crash in the same
location applies to the following:

FAIL: gcc.dg/vect/vect-ifcvt-19.c (internal compiler error: in build2, at
tree.cc:5097)
FAIL: gcc.dg/vect/slp-reduc-10a.c (internal compiler error: in build2, at
tree.cc:5097)
FAIL: gcc.dg/vect/slp-reduc-10b.c (internal compiler error: in build2, at
tree.cc:5097)
FAIL: gcc.dg/vect/slp-reduc-10c.c (internal compiler error: in build2, at
tree.cc:5097)
FAIL: gcc.dg/vect/slp-reduc-10d.c (internal compiler error: in build2, at
tree.cc:5097)
FAIL: gcc.dg/vect/slp-reduc-10e.c (internal compiler error: in build2, at
tree.cc:5097)
FAIL: gcc.dg/vect/vect-ifcvt-19.c -flto -ffat-lto-objects (internal compiler
error: in build2, at tree.cc:5097)
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects (internal
compiler error: in build2, at tree.cc:5097)
FAIL: gcc.dg/vect/vect-cond-arith-2.c (internal compiler error: in build2, at
tree.cc:5097)
FAIL: gcc.dg/vect/slp-reduc-10b.c -flto -ffat-lto-objects (internal compiler
error: in build2, at tree.cc:5097)
FAIL: gcc.dg/vect/slp-reduc-10c.c -flto -ffat-lto-objects (internal compiler
error: in build2, at tree.cc:5097)
FAIL: gcc.dg/vect/slp-reduc-10d.c -flto -ffat-lto-objects (internal compiler
error: in build2, at tree.cc:5097)
FAIL: gcc.dg/vect/slp-reduc-10e.c -flto -ffat-lto-objects (internal compiler
error: in build2, at tree.cc:5097)

[Bug target/113549] float simd crash on windows in gcc.dg/vect/vect-simd-clone-16b.c

2024-01-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113549

--- Comment #3 from Andrew Pinski  ---
Either the stack size or the stack alignment issue.

I am suspecting a stack alignment issue.

[Bug c++/109642] False Positive -Wdangling-reference with std::span-like classes

2024-01-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109642

--- Comment #15 from GCC Commits  ---
The trunk branch has been updated by Marek Polacek :

https://gcc.gnu.org/g:c596ce03120cc22e141186401c6656009ddebdaa

commit r14-8339-gc596ce03120cc22e141186401c6656009ddebdaa
Author: Marek Polacek 
Date:   Mon Jan 22 16:12:33 2024 -0500

c++: extend Wdangling-reference17.C [PR109642]

This patch extends g++.dg/warn/Wdangling-reference17.C with code
from PR109642.  I'm not creating a new test because this one
already #includes the required headers.

PR c++/109642

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wdangling-reference17.C: Additional testing.

[Bug target/113549] float simd crash on windows in gcc.dg/vect/vect-simd-clone-16b.c

2024-01-22 Thread nightstrike at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113549

--- Comment #2 from nightstrike  ---
Test 16e uses double instead of float, which also crashes.

[Bug target/113549] float simd crash on windows in gcc.dg/vect/vect-simd-clone-16b.c

2024-01-22 Thread nightstrike at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113549

--- Comment #1 from nightstrike  ---
Created attachment 57188
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57188&action=edit
Failing source for easier copying

[Bug target/113549] New: float simd crash on windows in gcc.dg/vect/vect-simd-clone-16b.c

2024-01-22 Thread nightstrike at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113549

Bug ID: 113549
   Summary: float simd crash on windows in
gcc.dg/vect/vect-simd-clone-16b.c
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nightstrike at gmail dot com
  Target Milestone: ---

Created attachment 57187
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57187&action=edit
Assembly output

The vect-simd-clone-16b.c test runs the vect-simd-clone-16.c test with the TYPE
set to float.  The default type is int, which works fine.  Reducing that
testcase yields the following:


```
#define TYPE float
#pragma omp declare simd inbranch
TYPE __attribute__((noinline))
foo (TYPE a)
{
  return a + 1;
}

void
masked_fixed (TYPE * a, TYPE * b)
{
  #pragma omp simd
  for (int i = 0; i < 128; i++)
b[i] = a[i]<1 ? foo(a[i]) : a[i];
}

int main() {
TYPE a[1024] = {0};
TYPE b[1024] = {0};
masked_fixed(a, b);
return 0;
}
```

The noipa attribute and the __restrict keywords were removed from masked_fixed.
 noinline is required on foo.


Minimal set of compile arguments required to trigger the problem:
$ x86_64-w64-mingw32-gcc  a.c -fopenmp-simd -O2 -mavx

Note that dropping to -O1 or removing -mavx avoids the crash.

Assembly from -save-temps -fverbose-asm attached.

This is technically running under wine 8.0.  This is the backtrace provided by
wine:

```
wine: Unhandled page fault on read access to  at address
00014000163F (thread 0024), starting debugger...
Unhandled exception: page fault on read access to 0x in 64-bit
code (0x014000163f).
Register dump:
 rip:00014000163f rsp:0021dc50 rbp:0021dcd0 eflags:00010246
(  R- --  I  Z- -P- )
 rax: rbx: rcx:0021dcf0
rdx:0021dcd0
 rsi:0021ed70 rdi:0021dd70  r8:0021dcb0 
r9:00c92000 r10:00c90330
 r11: r12:0021dcb0 r13:0021dcf0
r14: r15:
Stack dump:
0x21dc50:   
0x21dc60:   
0x21dc70:   
0x21dc80:   
0x21dc90:   
0x21dca0:   
0x21dcb0:   
0x21dcc0:   
0x21dcd0:   
0x21dce0:   
0x21dcf0:   
0x21dd00:   
Backtrace:
=>0 0x014000163f in a (+0x163f) (0x21dcd0)
  1 0x0140003384 in a (+0x3384) (0x21fdf0)
  2 0x0140001340 in a (+0x1340) (0x21fdf0)
  3 0x0140001146 in a (+0x1146) (0x21fe30)
  4 0x007b647b51 BaseThreadInitThunk+0x11(unknown=,
entry=, arg=)
[H:\home\user\p\gcc\src\wine-8.0-rc4p2p3\dlls\kernel32\thread.c:61] in kernel32
(0x0
00021fe60)
0x014000163f a+0x163f: ldsl %esp,%edi
```

[Bug target/109929] profiledbootstrap failure on aarch64-linux-gnu with graphite optimization

2024-01-22 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109929

--- Comment #7 from Richard Sandiford  ---
Hmm, yeah, like you say, neither of those commits should have made a difference
to whether bootstrap works.  I guess the problem is just latent now.

[Bug c++/109642] False Positive -Wdangling-reference with std::span-like classes

2024-01-22 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109642

--- Comment #14 from Marek Polacek  ---
(In reply to Miro Palmu from comment #11)
> I'm not sure if this is useful information, but using span with a view in a
> range-based for loop triggers a false positive -Wdangling-reference on gcc
> 14.0.1 20240117 but not on gcc 13.2.
> 
> // On godbolt: https://godbolt.org/z/x9jKh4MoW
> #include <ranges>
> #include <span>
> #include <vector>
> 
> int main() {
> const auto vec = std::vector{ 1, 2, 3 };
> const auto s = std::span{vec};
> 
> // -Wdangling-reference warning on gcc 14.0.1 20240117 but not on gcc 13.2
> for ([[maybe_unused]] auto _ : s | std::views::take(2)) { }
> 
> // No warning
> for ([[maybe_unused]] auto _ : vec | std::views::take(2)) { }
> 
> // No warning
> const auto s_view = s | std::views::take(2);
> for ([[maybe_unused]] auto _ : s_view) { }
> }

This should be fixed now.  I'm going to expand Wdangling-reference17.C with
this test though.  Thanks.

[Bug testsuite/113548] gcc.dg/vect/vect-ifcvt-19.c ICEs on LLP64 target

2024-01-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113548

Andrew Pinski  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |pinskia at gcc dot 
gnu.org

--- Comment #2 from Andrew Pinski  ---
I will fix this testcase today or tomorrow.  It should not be hard.

[Bug target/109929] profiledbootstrap failure on aarch64-linux-gnu with graphite optimization

2024-01-22 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109929

--- Comment #6 from Xi Ruoyao  ---
The first commit deferring the failure to stagefeedback is:

commit 575858508090b18dcbc176db285c9f55227ca4c0
Author: Richard Sandiford 
Date:   Tue Oct 17 23:46:33 2023 +0100

aarch64: Use vecs to store register save order

[Bug testsuite/113548] gcc.dg/vect/vect-ifcvt-19.c ICEs on LLP64 target

2024-01-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113548

Andrew Pinski  changed:

   What|Removed |Added

  Component|tree-optimization   |testsuite
   Keywords||testsuite-fail
   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=108954
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
Summary|ICE vect-ifcvt-19 in|gcc.dg/vect/vect-ifcvt-19.c
   |build2, at tree.cc:5097 |ICEs on LLP64 target
 Target|x86_64-w64-mingw32  |*-*-mingw
   Last reconfirmed||2024-01-22

--- Comment #1 from Andrew Pinski  ---
Note this is just the bug for the testcase issue rather than the ICE; the ICE
is PR 108954.

We should change the type of _2 and _1 to __SIZE_TYPE__ from `unsigned long`,
as size_type on mingw (and some other targets) is NOT the same size as
`unsigned long`.
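
The size mismatch can be checked directly.  The following is only a sketch:
the helper names are hypothetical, and __SIZE_TYPE__ is the GCC (and Clang)
predefined macro that expands to the type underlying size_t.  On LP64
targets such as x86_64 Linux both types are 64 bits wide; on LLP64 targets
such as x86_64-w64-mingw32 `unsigned long` is 32 bits while size_t is 64.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* The compiler's own notion of the size type, independent of headers.  */
typedef __SIZE_TYPE__ target_size_t;

/* Width of the size type in bits.  */
unsigned size_type_bits(void)
{
  return (unsigned)(sizeof(target_size_t) * CHAR_BIT);
}

/* Width of unsigned long in bits.  */
unsigned unsigned_long_bits(void)
{
  return (unsigned)(sizeof(unsigned long) * CHAR_BIT);
}

/* Returns 1 on data models where unsigned long is narrower than the size
   type (for example LLP64), and 0 otherwise (for example LP64).  */
int unsigned_long_is_narrower_than_size_type(void)
{
  return sizeof(unsigned long) < sizeof(target_size_t);
}
```

A test that spells an offset type as `unsigned long` is therefore only
correct where the last helper returns 0; using __SIZE_TYPE__ avoids that
dependence on the data model.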

[Bug tree-optimization/113548] New: ICE vect-ifcvt-19 in build2, at tree.cc:5097

2024-01-22 Thread nightstrike at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113548

Bug ID: 113548
   Summary: ICE vect-ifcvt-19 in build2, at tree.cc:5097
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nightstrike at gmail dot com
  Target Milestone: ---

Created attachment 57186
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57186&action=edit
Preprocessed source from -freport-bug

ICE during testsuite run for vect-ifcvt-19.  Many similarly titled bugs, I
think the current title of this one is equally unhelpful, so feel free to
change this PR title to something better.

Running linux 64 to windows 64 cross compiler, fails on 11, 12, 13, 14.  I
didn't test prior versions.

Backtrace:

0x8bb855 build2(tree_code, tree_node*, tree_node*, tree_node*)
../../gcc/tree.cc:5097
0xa4dd1f build2_loc(unsigned int, tree_code, tree_node*, tree_node*,
tree_node*)
../../gcc/tree.h:4750
0xa4dd1f c_parser_gimple_binary_expression
../../gcc/c/gimple-parser.cc:1068
0xa4ec71 c_parser_gimple_statement
../../gcc/c/gimple-parser.cc:878
0xa4f95a c_parser_gimple_compound_statement
../../gcc/c/gimple-parser.cc:669
0xa51a58 c_parser_parse_gimple_body(c_parser*, char*, c_declspec_il,
profile_count)
../../gcc/c/gimple-parser.cc:253
0xa3d3f4 c_parser_declaration_or_fndef
../../gcc/c/c-parser.cc:3011
0xa4764b c_parser_external_declaration
../../gcc/c/c-parser.cc:2046
0xa48035 c_parser_translation_unit
../../gcc/c/c-parser.cc:1900
0xa48035 c_parse_file()
../../gcc/c/c-parser.cc:26815
0xabf271 c_common_parse_file()
../../gcc/c-family/c-opts.cc:1301

[Bug c++/113541] Rejects __attribute__((section)) on explicit instantiation declaration of ctor/dtor

2024-01-22 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113541

Marek Polacek  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
 CC||mpolacek at gcc dot gnu.org
   Last reconfirmed||2024-01-22

--- Comment #1 from Marek Polacek  ---
The error started with r5-1210-ge257a17cb9cc4d.

[Bug libgomp/113513] [OpenMP] libgomp: cuCtxGetDevice error with OMP_DISPLAY_ENV=true OMP_TARGET_OFFLOAD="mandatory" for libgomp.c/target-52.c

2024-01-22 Thread burnus at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113513

--- Comment #2 from Tobias Burnus  ---
Patch:
  https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643648.html

[Bug c++/113547] [13 Regression] c++: In function ‘std::vector package_b_info()’: cc1plus: internal compiler error: Segmentation fault

2024-01-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113547

Andrew Pinski  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=113347

--- Comment #2 from Andrew Pinski  ---
Most likely a dup of bug 113347.

[Bug c++/113547] [13 Regression] c++: In function ‘std::vector package_b_info()’: cc1plus: internal compiler error: Segmentation fault

2024-01-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113547

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |13.3
Summary|c++: In function|[13 Regression] c++: In
   |‘std::vector|function ‘std::vector
   |package_b_info()’: cc1plus: |package_b_info()’: cc1plus:
   |internal compiler error:|internal compiler error:
   |Segmentation fault  |Segmentation fault

[Bug c++/113547] c++: In function ‘std::vector package_b_info()’: cc1plus: internal compiler error: Segmentation fault

2024-01-22 Thread csfore at posteo dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113547

--- Comment #1 from Christopher Fore  ---
Created attachment 57185
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57185&action=edit
minimized file with cvise

[Bug c++/113547] New: c++: In function ‘std::vector package_b_info()’: cc1plus: internal compiler error: Segmentation fault

2024-01-22 Thread csfore at posteo dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113547

Bug ID: 113547
   Summary: c++: In function ‘std::vector package_b_info()’:
cc1plus: internal compiler error: Segmentation fault
   Product: gcc
   Version: 13.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: csfore at posteo dot net
  Target Milestone: ---

Created attachment 57184
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57184&action=edit
original preprocessed file

Originally reported downstream in Gentoo at https://bugs.gentoo.org/920322 when
building package =dev-util/build2-0.14.0

Command line required to trigger for the original:

x86_64-pc-linux-gnu-g++ -std=c++20 -c -fdirectives-only manifest-utility.o.ii

Command line required for the minimized version:

x86_64-pc-linux-gnu-g++ manifest-utility.o.ii


$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-pc-linux-gnu/13/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with:
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/configure
--host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --prefix=/usr
--bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/13
--includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/13/include
--datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/13
--mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/13/man
--infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/13/info
--with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13
--disable-silent-rules --disable-dependency-tracking
--with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/13/python
--enable-languages=c,c++,go,fortran --enable-obsolete --enable-secureplt
--disable-werror --with-system-zlib --enable-nls --without-included-gettext
--disable-libunwind-exceptions --enable-checking=release
--with-bugurl=https://bugs.gentoo.org/ --with-pkgversion='Gentoo
13.2.1_p20240113-r1 p12' --with-gcc-major-version-only --enable-libstdcxx-time
--enable-lto --disable-libstdcxx-pch --enable-shared --enable-threads=posix
--enable-__cxa_atexit --enable-clocale=gnu --enable-multilib
--with-multilib-list=m32,m64 --disable-fixed-point --enable-targets=all
--enable-libgomp --disable-libssp --disable-libada --disable-cet
--disable-systemtap --disable-valgrind-annotations --disable-vtable-verify
--disable-libvtv --with-zstd --without-isl --enable-default-pie
--enable-default-ssp
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.2.1 20240113 (Gentoo 13.2.1_p20240113-r1 p12)

[Bug debug/113382] FAIL: gcc.dg/debug/btf/btf-bitfields-3.c scan-assembler-times [\t ]0x6000004[\t ]+[^\n]*btt_info 1

2024-01-22 Thread danglin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113382

John David Anglin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from John David Anglin  ---
Fixed on trunk.

[Bug debug/113382] FAIL: gcc.dg/debug/btf/btf-bitfields-3.c scan-assembler-times [\t ]0x6000004[\t ]+[^\n]*btt_info 1

2024-01-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113382

--- Comment #6 from GCC Commits  ---
The master branch has been updated by John David Anglin :

https://gcc.gnu.org/g:bc77c035c45bb224790b1c03d06a64c8a1cc51c5

commit r14-8338-gbc77c035c45bb224790b1c03d06a64c8a1cc51c5
Author: John David Anglin 
Date:   Mon Jan 22 19:07:32 2024 +

Add -gno-strict-dwarf to dg-options in various btf enum tests

The -gno-strict-dwarf option is needed to ensure enum signedness
is added to type_die.

2024-01-22  John David Anglin  

gcc/testsuite/ChangeLog:

PR debug/113382
* gcc.dg/debug/btf/btf-bitfields-3.c: Add -gno-strict-dwarf
option to dg-options.
* gcc.dg/debug/btf/btf-enum-1.c: Likewise.
* gcc.dg/debug/btf/btf-enum-small.c: Likewise.
* gcc.dg/debug/btf/btf-enum64-1.c: Likewise.

[Bug ada/113536] valid reduction expression rejected by -gnatVo

2024-01-22 Thread devotus at yahoo dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113536

--- Comment #1 from Jack Perry  ---
Per Simon Wright, gcc 14.0.0 does not fail on this, whereas gcc 14.0.1 does, in
the same location, but with a different error: `expected type "Value"... found
type "Standard.Character"`

I edited his message to conform with the types I used in the example below, but
I've also observed it on godbolt's compiler explorer when using gnat "trunk".

[Bug fortran/113152] Fortran 2023 half-cycle trigonometric functions

2024-01-22 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113152

--- Comment #18 from anlauf at gcc dot gnu.org ---
(In reply to Steve Kargl from comment #17)
> Is there something that is different between your OS and FreeBSD?
> Or is there some fundamental difference between C and C++ that
> I am unaware of?

You should not expect everybody to have the latest MPFR installed.
That's the whole point.

Please use #if / #else / #endif
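
A minimal sketch of the #if / #else / #endif form, under stated
assumptions: the DEMO_* macros stand in for MPFR_VERSION_MAJOR and
MPFR_VERSION_MINOR, and demo_native_half_cycle_backend is a hypothetical
function that only a new enough library would declare.  No MPFR header is
used here.  With a plain `if (DEMO_HALF_CYCLE)` the compiler still has to
parse the dead branch, so a call to a function the installed header does
not declare is rejected in C++ and diagnosed in modern C; with #if the
branch is never seen by the compiler at all.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical version of the installed library: 4.1, i.e. too old.  */
#define DEMO_VERSION_MAJOR 4
#define DEMO_VERSION_MINOR 1

/* Same shape as the macro quoted in the thread: true from 4.2 onwards.  */
#define DEMO_HALF_CYCLE \
  (DEMO_VERSION_MAJOR * 100 + DEMO_VERSION_MINOR >= 402)

/* Reports which implementation was selected at preprocessing time.  */
const char *half_cycle_backend(void)
{
#if DEMO_HALF_CYCLE
  /* Only compiled when the library is new enough to declare this
     (hypothetical) function.  */
  return demo_native_half_cycle_backend();
#else
  /* Older library: this branch must not mention the newer function.  */
  return "fallback";
#endif
}
```

Because the version here is 4.1, the condition evaluates to 0 and only the
fallback branch reaches the compiler, so the file builds even though the
newer function is never declared.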

[Bug rtl-optimization/113546] [13/14 Regression] aarch64: bootstrap-debug-lean broken with -fcompare-debug failure since r13-2921-gf1adf45b17f7f1

2024-01-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113546

--- Comment #4 from Andrew Pinski  ---
Note the reduced testcase might NOT be a representative of the original issue
though ...

[Bug rtl-optimization/113546] [13/14 Regression] aarch64: bootstrap-debug-lean broken with -fcompare-debug failure since r13-2921-gf1adf45b17f7f1

2024-01-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113546

Andrew Pinski  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=100733

--- Comment #3 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #2)
> That means this is most likely a dup of bug 107169.

And PR 100733.

[Bug rtl-optimization/113546] [13/14 Regression] aarch64: bootstrap-debug-lean broken with -fcompare-debug failure since r13-2921-gf1adf45b17f7f1

2024-01-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113546

Andrew Pinski  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=107169

--- Comment #2 from Andrew Pinski  ---
(In reply to Alex Coplan from comment #1)
> The reduced testcase started failing with
> r13-2921-gf1adf45b17f7f1ede463524d80032bb2ec866ead:
> 
> commit f1adf45b17f7f1ede463524d80032bb2ec866ead
> Author: Eugene Rozenfeld 
> Date:   Thu Apr 21 23:42:15 2022
> 
> Add instruction level discriminator support.
> 
> This is the first in a series of patches to enable discriminator support
> in AutoFDO.

That means this is most likely a dup of bug 107169.

[Bug rtl-optimization/113546] [13/14 Regression] aarch64: bootstrap-debug-lean broken with -fcompare-debug failure since r13-2921-gf1adf45b17f7f1

2024-01-22 Thread acoplan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113546

Alex Coplan  changed:

   What|Removed |Added

   Keywords||compare-debug-failure
Summary|aarch64:|[13/14 Regression] aarch64:
   |bootstrap-debug-lean broken |bootstrap-debug-lean broken
   |with -fcompare-debug|with -fcompare-debug
   |failure |failure since
   ||r13-2921-gf1adf45b17f7f1
 Target||aarch64-*-*

--- Comment #1 from Alex Coplan  ---
The reduced testcase started failing with
r13-2921-gf1adf45b17f7f1ede463524d80032bb2ec866ead:

commit f1adf45b17f7f1ede463524d80032bb2ec866ead
Author: Eugene Rozenfeld 
Date:   Thu Apr 21 23:42:15 2022

Add instruction level discriminator support.

This is the first in a series of patches to enable discriminator support
in AutoFDO.

[Bug c++/113529] Incorrect result of requires-expression in case of function call ambiguity and `operator<=>`

2024-01-22 Thread ppalka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113529

Patrick Palka  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |ppalka at gcc dot 
gnu.org
 CC||ppalka at gcc dot gnu.org
 Status|NEW |ASSIGNED

[Bug debug/113382] FAIL: gcc.dg/debug/btf/btf-bitfields-3.c scan-assembler-times [\t ]0x6000004[\t ]+[^\n]*btt_info 1

2024-01-22 Thread danglin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113382

--- Comment #5 from John David Anglin  ---
The problem seems to be DW_AT_encoding is not found in this call:
static ctf_id_t
gen_ctf_enumeration_type (ctf_container_ref ctfc, dw_die_ref enumeration)
{
  const char *enum_name = get_AT_string (enumeration, DW_AT_name);
  unsigned int bit_size = ctf_die_bitsize (enumeration);
  unsigned int signedness = get_AT_unsigned (enumeration, DW_AT_encoding);

get_AT() returns NULL.

This is because dwarf_strict is 1:
  if (!dwarf_strict)
add_AT_unsigned (type_die, DW_AT_encoding,
 TYPE_UNSIGNED (type)
 ? DW_ATE_unsigned
 : DW_ATE_signed);

I believe we need to add -gno-strict-dwarf option on hppa*64*-*-hpux*.
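
To illustrate the failure mode described above, here is a minimal sketch. It assumes the attribute getter yields 0 when the attribute is absent, which matches the "signedness = 0" observation in comment #4. The `die` map, `AT_encoding` key, and function names are hypothetical stand-ins, not the real dwarf2out/dwarf2ctf interfaces; only the two DW_ATE values are taken from the DWARF specification.

```cpp
#include <map>

// DWARF base-type encodings (values from the DWARF specification).
constexpr unsigned DW_ATE_signed = 0x5;
constexpr unsigned DW_ATE_unsigned = 0x7;

// Hypothetical DIE model: attribute key -> unsigned value.
using die = std::map<int, unsigned>;
constexpr int AT_encoding = 1;  // placeholder key, not the real DW_AT_encoding value

// Mimics a getter that returns 0 when the attribute is not present.
unsigned get_unsigned (const die &d, int attr)
{
  auto it = d.find (attr);
  return it == d.end () ? 0 : it->second;
}

// The enum is treated as unsigned only if the encoding attribute says so.
// With the attribute missing, 0 != DW_ATE_unsigned, so every enum looks signed.
bool enum_is_unsigned (const die &d)
{
  return get_unsigned (d, AT_encoding) == DW_ATE_unsigned;
}
```

With the encoding attribute omitted (as under strict DWARF), `enum_is_unsigned` is false even for an enum whose type is unsigned, which would be consistent with the btt_info mismatch in the test.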

[Bug fortran/113152] Fortran 2023 half-cycle trigonometric functions

2024-01-22 Thread sgk at troutmask dot apl.washington.edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113152

--- Comment #17 from Steve Kargl  ---
On Mon, Jan 22, 2024 at 05:35:41PM +, anlauf at gcc dot gnu.org wrote:
> --- Comment #16 from anlauf at gcc dot gnu.org ---
> (In reply to Steve Kargl from comment #14)
> > On Sun, Jan 21, 2024 at 09:52:39PM +, anlauf at gcc dot gnu.org wrote:
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113152
> > > 
> > > I think that you cannot do
> > > 
> > > +  if (MPFR_HALF_CYCLE)
> > > 
> > > you really must use
> > > 
> > > #if MPFR_HALF_CYCLE
> > > 
> > 
> > #include <stdio.h>
> > #include "mpfr.h"
> > 
> > #define MPFR_HALF_CYCLE (MPFR_VERSION_MAJOR * 100 + MPFR_VERSION_MINOR >= 402)
> > 
> > int
> > main(void)
> > {
> >if (MPFR_HALF_CYCLE)
> >   printf("here\n");
> >else
> >   printf("there\n");
> >return (0);
> > }
> > 
> > % cc -o z -I/usr/local/include a.c && ./z
> 
> This does not test the right thing.
> 
> % cat sgk.cc
> #include <stdio.h>
> 
> #define MPFR_HALF_CYCLE 0

This is not what the pre-processor should be doing
(on at least FreeBSD).  See below.


> int
> main(void)
> {
>if (MPFR_HALF_CYCLE)
>   printf_not_declared_if_0 ("here\n");
>else
>   printf ("there\n");
>return (0);
> }
> 
> % g++ sgk.cc
> sgk.cc: In function 'int main()':
> sgk.cc:9:7: error: 'printf_not_declared_if_0' was not declared in this scope
>printf_not_declared_if_0 ("here\n");
>^~~~

Of course, it will fail.  You need to actually have a
printf_not_declared_if_0 function defined during parsing.

#include <stdio.h>
#include <stdlib.h>

#define MPFR_HALF_CYCLE 1
#define printf_not_declared_if_0(a) abort()

int
main(void)
{
   if (MPFR_HALF_CYCLE)
  printf_not_declared_if_0 ("here\n");
   else
  printf ("there\n");
   return (0);
}

~/work/x/bin/g++ -I/usr/local/include -o z a.cc && ./z
Abort (core dumped)

Changing the MPFR_HALF_CYCLE define from 1 to 0:

 ~/work/x/bin/g++ -I/usr/local/include -o z a.cc && ./z
there

Going back to my original example and g++ from master, I'm seeing

% ~/work/x/bin/g++ -I/usr/local/include -E a.cc

int
main(void)
{
   if ((
# 9 "a.cc" 3
  4 
# 9 "a.cc"
  * 100 + 
# 9 "a.cc" 3
  2 
# 9 "a.cc"
  >= 402))
  printf("here\n");
   else
  printf("there\n");
   return (0);
}

and with clang++

% c++ -E -I/usr/local/include a.cc
int
main(void)
{
   if ((4 * 100 + 2 >= 402))
  printf("here\n");
   else
  printf("there\n");
   return (0);
}

Is there something that is different between your OS and FreeBSD?
Or is there some fundamental difference between C and C++ that
I am unaware of?

[Bug tree-optimization/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP

2024-01-22 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

--- Comment #4 from Martin Jambor  ---
The right place to free the lattice contents post-IPA would be
ipa_node_params::~ipa_node_params(), where we should iterate over the lattices
and deinitialize them, or perhaps destruct the array.  ipcp_vr_lattice directly
contains Value_Range, which AFAIU directly contains int_range_max, which has a
virtual destructor, so it does not look like a POD anymore.  This escaped me
when I was looking at the IPA-VR changes, but hopefully it should not be too
difficult to deal with.
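
As a rough sketch of the pattern in question (not the actual IPA-CP code): when objects that own heap memory live in raw-allocated storage, freeing the storage alone leaks whatever the objects own, so each destructor has to run first. All type and function names below are hypothetical stand-ins; `growable_range` plays the role of a range class that may allocate, and `lattice` the role of the structure embedding it.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

static int live_ranges = 0;  // number of objects still holding resources

struct range_base {          // hypothetical base with a virtual destructor
  virtual ~range_base () {}
};

struct growable_range : range_base {  // owns heap memory, so it is not a POD
  int *extra;
  growable_range () : extra (new int[8]) { ++live_ranges; }
  growable_range (const growable_range &) = delete;
  growable_range &operator= (const growable_range &) = delete;
  ~growable_range () override { delete[] extra; --live_ranges; }
};

struct lattice {             // embeds the range directly
  growable_range vr;
};

// Allocate raw storage for n lattices and construct them in place.
lattice *make_lattices (std::size_t n)
{
  void *raw = std::malloc (n * sizeof (lattice));
  if (!raw)
    throw std::bad_alloc ();
  lattice *l = static_cast<lattice *> (raw);
  for (std::size_t i = 0; i < n; ++i)
    new (&l[i]) lattice ();
  return l;
}

// Correct teardown: run every destructor, then release the raw storage.
// Calling std::free alone would leak each `extra` array.
void destroy_lattices (lattice *l, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    l[i].~lattice ();
  std::free (l);
}
```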

[Bug fortran/113152] Fortran 2023 half-cycle trigonometric functions

2024-01-22 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113152

anlauf at gcc dot gnu.org changed:

   What|Removed |Added

 CC||anlauf at gcc dot gnu.org

--- Comment #16 from anlauf at gcc dot gnu.org ---
(In reply to Steve Kargl from comment #14)
> On Sun, Jan 21, 2024 at 09:52:39PM +, anlauf at gcc dot gnu.org wrote:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113152
> > 
> > I think that you cannot do
> > 
> > +  if (MPFR_HALF_CYCLE)
> > 
> > you really must use
> > 
> > #if MPFR_HALF_CYCLE
> > 
> 
> #include <stdio.h>
> #include "mpfr.h"
> 
> #define MPFR_HALF_CYCLE (MPFR_VERSION_MAJOR * 100 + MPFR_VERSION_MINOR >= 402)
> 
> int
> main(void)
> {
>if (MPFR_HALF_CYCLE)
>   printf("here\n");
>else
>   printf("there\n");
>return (0);
> }
> 
> % cc -o z -I/usr/local/include a.c && ./z

This does not test the right thing.

% cat sgk.cc
#include <stdio.h>

#define MPFR_HALF_CYCLE 0

int
main(void)
{
   if (MPFR_HALF_CYCLE)
  printf_not_declared_if_0 ("here\n");
   else
  printf ("there\n");
   return (0);
}

% g++ sgk.cc
sgk.cc: In function 'int main()':
sgk.cc:9:7: error: 'printf_not_declared_if_0' was not declared in this scope
   printf_not_declared_if_0 ("here\n");
   ^~~~

[Bug target/113030] parsecpu.awk's chkarch/chkcpu commands is broken for aliases

2024-01-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113030

Andrew Pinski  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |14.0

--- Comment #7 from Andrew Pinski  ---
Fixed.

[Bug target/113030] parsecpu.awk's chkarch/chkcpu commands is broken for aliases

2024-01-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113030

--- Comment #6 from GCC Commits  ---
The trunk branch has been updated by Andrew Pinski :

https://gcc.gnu.org/g:41caf6b0d603408a829b37f7f7fb09d64d814d48

commit r14-8337-g41caf6b0d603408a829b37f7f7fb09d64d814d48
Author: Andrew Pinski 
Date:   Sat Jan 20 23:12:31 2024 -0800

arm: Fix parsecpu.awk for aliases [PR113030]

So the problem here is that the two functions check_cpu and check_arch use
the wrong variable to check if an alias is valid for that cpu/arch.
check_cpu uses cpu_optaliases instead of cpu_opt_alias. cpu_optaliases
is an array indexed by the cpuname that contains all of the valid aliases
for that cpu, but cpu_opt_alias is a doubly indexed array, indexed
by cpuname and the alias, which provides what the alias is for that option.
A similar thing happens for check_arch and arch_optaliases vs
arch_opt_alias.

Tested by running:
```
awk -f config/arm/parsecpu.awk -v cmd="chkarch armv7-a+simd" config/arm/arm-cpus.in
awk -f config/arm/parsecpu.awk -v cmd="chkarch armv7-a+neon" config/arm/arm-cpus.in
awk -f config/arm/parsecpu.awk -v cmd="chkarch armv7-a+neon-vfpv3" config/arm/arm-cpus.in
```
None of them returns an error.

gcc/ChangeLog:

PR target/113030
* config/arm/parsecpu.awk (check_cpu): Use cpu_opt_alias
instead of cpu_optaliases.
(check_arch): Use arch_opt_alias instead of arch_optaliases.

Signed-off-by: Andrew Pinski 
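
The difference between the two tables described in the commit message can be sketched roughly as follows. This is an illustration only, with made-up cpu, alias, and option names; the real tables are awk arrays built from arm-cpus.in. The point is that a validity check for a (cpu, alias) pair has to consult the table keyed by both values.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical data.  First table: cpu -> every alias accepted for that cpu.
static const std::map<std::string, std::set<std::string>> optaliases = {
  {"cpu-a", {"simd", "neon"}},
};

// Second table: (cpu, alias) -> the option the alias stands for.
static const std::map<std::pair<std::string, std::string>, std::string>
  opt_alias = {
    {{"cpu-a", "simd"}, "option-x"},
    {{"cpu-a", "neon"}, "option-x"},
};

// Validity check against the doubly keyed table, as the fix does.
bool alias_is_valid (const std::string &cpu, const std::string &alias)
{
  return opt_alias.count (std::make_pair (cpu, alias)) != 0;
}

// Resolve an alias to its option; empty string if the pair is unknown.
std::string alias_target (const std::string &cpu, const std::string &alias)
{
  auto it = opt_alias.find (std::make_pair (cpu, alias));
  return it == opt_alias.end () ? std::string () : it->second;
}
```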

[Bug c++/113545] ICE in label_matches with constexpr function with switch-statement and converted (nonconstant, cast address) input

2024-01-22 Thread hp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113545

Hans-Peter Nilsson  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2024-01-22
 Ever confirmed|0   |1

[Bug rtl-optimization/113546] New: aarch64: bootstrap-debug-lean broken with -fcompare-debug failure

2024-01-22 Thread acoplan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113546

Bug ID: 113546
   Summary: aarch64: bootstrap-debug-lean broken with
-fcompare-debug failure
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: acoplan at gcc dot gnu.org
  Target Milestone: ---

I tried a bootstrap --with-build-config=bootstrap-debug-lean on aarch64 and it
failed with an -fcompare-debug failure building libiberty/regex.c:

make[3]: Entering directory '/data/ajc/toolchain/builds/bstrap-lean/libiberty'
if [ x"-fPIC" != x ]; then \
  /home/alecop01/toolchain/builds/bstrap-lean/./prev-gcc/xgcc
-B/home/alecop01/toolchain/builds/bstrap-lean/./prev-gcc/
-B/home/alecop01/toolchain/builds/bstrap-lean/aarch64-unknown-linux-gnu/bin/
-B/home/alecop01/toolchain/builds/bstrap-lean/aarch64-unknown-linux-gnu/bin/
-B/home/alecop01/toolchain/builds/bstrap-lean/aarch64-unknown-linux-gnu/lib/
-isystem
/home/alecop01/toolchain/builds/bstrap-lean/aarch64-unknown-linux-gnu/include
-isystem
/home/alecop01/toolchain/builds/bstrap-lean/aarch64-unknown-linux-gnu/sys-include
  -fchecking=1 -c -DHAVE_CONFIG_H -g -O2 -fchecking=1 -fcompare-debug  -I.
-I/home/alecop01/toolchain/src/gcc/libiberty/../include  -W -Wall
-Wwrite-strings -Wc++-compat -Wstrict-prototypes -Wshadow=local -pedantic 
-D_GNU_SOURCE  -fPIC /home/alecop01/toolchain/src/gcc/libiberty/regex.c -o
pic/regex.o; \
else true; fi
xgcc: error: /home/alecop01/toolchain/src/gcc/libiberty/regex.c:
‘-fcompare-debug’ failure
Makefile:1219: recipe for target 'regex.o' failed
make[3]: *** [regex.o] Error 1
make[3]: Leaving directory '/data/ajc/toolchain/builds/bstrap-lean/libiberty'
Makefile:11725: recipe for target 'all-stage3-libiberty' failed
make[2]: *** [all-stage3-libiberty] Error 2
make[2]: Leaving directory '/data/ajc/toolchain/builds/bstrap-lean'
Makefile:26292: recipe for target 'stage3-bubble' failed
make[1]: *** [stage3-bubble] Error 2
make[1]: Leaving directory '/data/ajc/toolchain/builds/bstrap-lean'
Makefile:1099: recipe for target 'all' failed
make: *** [all] Error 2

Here is a reduced testcase for that:

$ cat t.c
int x;
void f() {
fail:
  switch (x) { case 0: goto fail;; }
}
$ ./xgcc -B . -c t.c -fcompare-debug -O -S -o /dev/null
xgcc: error: t.c: ‘-fcompare-debug’ failure

[Bug c++/102626] [c++20] compiler crash when invoking constexpr function in the constructor of template class

2024-01-22 Thread ppalka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102626

Patrick Palka  changed:

   What|Removed |Added

 CC||ppalka at gcc dot gnu.org
   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=86933,
   ||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=92969

--- Comment #5 from Patrick Palka  ---
PR86933, PR92969 and this all seem related.  GCC seems to mishandle a type
template parameter pack appearing in the return type of a pointer to data
member NTTP pack:

typename ...Ts, Ts S::* ...ms

[Bug c++/113544] [14 Regression] bogus incomplete type error with dependent data member in local class in generic lambda since r14-278

2024-01-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113544

Jakub Jelinek  changed:

   What|Removed |Added

   Priority|P3  |P1
 CC||jakub at gcc dot gnu.org

[Bug other/111966] GCN '--with-arch=[...]' not considered for 'mkoffload' default 'elf_arch'

2024-01-22 Thread burnus at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111966

Tobias Burnus  changed:

   What|Removed |Added

 CC||burnus at gcc dot gnu.org
 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Tobias Burnus  ---
FIXED on mainline/GCC 14.

[Bug tree-optimization/110679] Missed optimization opportunity with countr_zero

2024-01-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110679

Andrew Pinski  changed:

   What|Removed |Added

 CC||janschultke at googlemail dot com

--- Comment #2 from Andrew Pinski  ---
*** Bug 113543 has been marked as a duplicate of this bug. ***

[Bug target/113543] Poor codegen for bit-counting functions (countl_zero, countl_one, countr_zero, countr_one)

2024-01-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113543

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Andrew Pinski  ---
The rest are a dup of bug 110679.

*** This bug has been marked as a duplicate of bug 110679 ***

[Bug target/113543] Poor codegen for bit-counting functions (countl_zero, countl_one, countr_zero, countr_one)

2024-01-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113543

Andrew Pinski  changed:

   What|Removed |Added

  Component|c++ |target

--- Comment #1 from Andrew Pinski  ---
>Clang does not emit the extra xor instruction. I don't really know why. 

This is a performance errata in some Intel cores and GCC implements that while
LLVM/clang does NOT. See PR 62011 on that.

[Bug c++/113545] New: ICE in label_matches with constexpr function with switch-statement and converted (nonconstant, cast address) input

2024-01-22 Thread hp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113545

Bug ID: 113545
   Summary: ICE in label_matches with constexpr function with
switch-statement and converted (nonconstant, cast
address) input
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hp at gcc dot gnu.org
  Target Milestone: ---

For the following test-case, g++ ICEs from at least gcc-10 and forward (i.e.
apparently not a regression):
```
char foo;

constexpr unsigned char bar(__UINTPTR_TYPE__ baz)
{
  switch (baz)
{
case 13:
  return 11;
case 14:
  return 78;
case 2048:
  return 13;
default:
  return 42;
}
}

__attribute__ ((__noipa__))
void xyzzy(int x)
{
  if (x != 42)
__builtin_abort ();
}

int main()
{
  unsigned const char c = bar(reinterpret_cast<__UINTPTR_TYPE__>(&foo));
  xyzzy(c);
}
```

Example backtrace with -std=c++20 -O3:

../n.cc: In function 'int main()':
../n.cc:27:30:   in 'constexpr' expansion of 'bar(((long unsigned int)(& foo)))'
../n.cc:5:3: internal compiler error: in label_matches, at cp/constexpr.cc:6925
5 |   switch (baz)
  |   ^~
0x63894c label_matches
/gcctop/gcc/cp/constexpr.cc:6925
0xa0bb3d cxx_eval_constant_expression
/gcctop/gcc/cp/constexpr.cc:7319
0xa0d2b2 cxx_eval_statement_list
/gcctop/gcc/cp/constexpr.cc:6969
0xa0d2b2 cxx_eval_constant_expression
/gcctop/gcc/cp/constexpr.cc:8316
0xa1e782 cxx_eval_switch_expr
/gcctop/gcc/cp/constexpr.cc:7115
0xa0cb6b cxx_eval_constant_expression
/gcctop/gcc/cp/constexpr.cc:8412
0xa0aae6 cxx_eval_call_expression
/gcctop/gcc/cp/constexpr.cc:3288
0xa0c2ef cxx_eval_constant_expression
/gcctop/gcc/cp/constexpr.cc:7524
0xa17d9a cxx_eval_outermost_constant_expr
/gcctop/gcc/cp/constexpr.cc:8822
0xa1d28f maybe_constant_value(tree_node*, tree_node*, mce_value)
/gcctop/gcc/cp/constexpr.cc:9110
0xa49e40 cp_fully_fold
/gcctop/gcc/cp/cp-gimplify.cc:2831
0xa49ed9 cp_fully_fold
/gcctop/gcc/cp/cp-gimplify.cc:2825
0xa49ed9 cp_fully_fold_init(tree_node*)
/gcctop/gcc/cp/cp-gimplify.cc:2861
0xc7a204 store_init_value(tree_node*, tree_node*, vec**, int)
/gcctop/gcc/cp/typeck2.cc:926
0xa6ca32 check_initializer
/gcctop/gcc/cp/decl.cc:7810
0xa941be cp_finish_decl(tree_node*, tree_node*, bool, tree_node*, int,
cp_decomp*)
/gcctop/gcc/cp/decl.cc:8842
0xb95477 cp_parser_init_declarator
/gcctop/gcc/cp/parser.cc:23618
0xb6ac98 cp_parser_simple_declaration
/gcctop/gcc/cp/parser.cc:15890
0xb8f830 cp_parser_declaration_statement
/gcctop/gcc/cp/parser.cc:14926
0xb97215 cp_parser_statement
/gcctop/gcc/cp/parser.cc:12882
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.

[Bug c++/113544] [14 Regression] bogus incomplete type error with dependent data member in local class in generic lambda since r14-278

2024-01-22 Thread ppalka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113544

Patrick Palka  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Target Milestone|--- |14.0
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2024-01-22

[Bug c++/113544] New: [14 Regression] bogus incomplete type error with dependent data member in local class in generic lambda since r14-278

2024-01-22 Thread ppalka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113544

Bug ID: 113544
   Summary: [14 Regression] bogus incomplete type error with
dependent data member in local class in generic lambda
since r14-278
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ppalka at gcc dot gnu.org
  Target Milestone: ---

template<class T>
void f() {
  [](auto parm) {
    struct type {
      decltype(parm) x;
    };
  };
}

template void f<int>();

: In instantiation of ‘struct f()type’:
:6:5:   required from ‘void f() [with T = int]’
:10:22:   required from here
:5:22: error: ‘f()type::x’ has incomplete type
:5:22: error: invalid use of dependent type ‘decltype (parm)’

[Bug c++/113543] New: Poor codegen for bit-counting functions (countl_zero, countl_one, countr_zero, countr_one)

2024-01-22 Thread janschultke at googlemail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113543

Bug ID: 113543
   Summary: Poor codegen for bit-counting functions (countl_zero,
countl_one, countr_zero, countr_one)
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: janschultke at googlemail dot com
  Target Milestone: ---

## Code to Reproduce (https://godbolt.org/z/qPeszhaPv)

#include <bit>

template <typename T>
T countr_zero(T x) {
return std::countr_zero(x);
}

template unsigned char countr_zero(unsigned char);
template unsigned short countr_zero(unsigned short);
template unsigned int countr_zero(unsigned int);
template unsigned long countr_zero(unsigned long);
template unsigned long long countr_zero(unsigned long long);

template <typename T>
T countr_one(T x) {
return std::countr_one(x);
}

template unsigned char countr_one(unsigned char);
template unsigned short countr_one(unsigned short);
template unsigned int countr_one(unsigned int);
template unsigned long countr_one(unsigned long);
template unsigned long long countr_one(unsigned long long);


template <typename T>
T countl_zero(T x) {
return std::countl_zero(x);
}

template unsigned char countl_zero(unsigned char);
template unsigned short countl_zero(unsigned short);
template unsigned int countl_zero(unsigned int);
template unsigned long countl_zero(unsigned long);
template unsigned long long countl_zero(unsigned long long);

template <typename T>
T countl_one(T x) {
return std::countl_zero(x);
}

template unsigned char countl_one(unsigned char);
template unsigned short countl_one(unsigned short);
template unsigned int countl_one(unsigned int);
template unsigned long countl_one(unsigned long);
template unsigned long long countl_one(unsigned long long);


## Summary

GCC consistently emits much more code for these functions than clang.
For example, GCC:

> unsigned int countl_one(unsigned int):
>   xor eax, eax
>   lzcnt   eax, edi
>   ret

Clang does not emit the extra xor instruction. I don't really know why. LZCNT
has a wide contract and should be equivalent to std::countl_zero.

It gets a lot worse though:

> unsigned short countl_zero(unsigned short):
>   mov eax, 16
>   test    di, di
>   je  .L23
>   movzx   edi, di
>   lzcnt   edi, edi
>   lea eax, [rdi-16]
> .L23:
>   ret

I don't really know what all of this schmutz is. Clang emits lzcnt and ret in
this case.


Another bit of disappointing codegen is this:
> unsigned char countr_zero(unsigned char):
>   movzx   eax, dil
>   xor edx, edx
>   tzcnt   edx, eax
>   test    dil, dil
>   mov eax, 8
>   cmovne  eax, edx
>   ret

Clang emits:
>   or  edi, 256
>   tzcnt   eax, edi
>   ret

This clang codegen is very clever. It simply adds a bit on the left, so that
the 32-bit routine can be re-used with only one additional instruction.
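
The guard-bit trick can be written out in source form as a small sketch. It uses `__builtin_ctz`, a GCC/Clang builtin, rather than `std::countr_zero`, so that it does not depend on C++20; the function names are made up for illustration.

```cpp
// Reference: count trailing zero bits of an 8-bit value, 8 for x == 0.
unsigned ctz8_reference (unsigned char x)
{
  unsigned n = 0;
  while (n < 8 && ((x >> n) & 1u) == 0)
    ++n;
  return n;
}

// Guard-bit variant: setting bit 8 means the argument of the 32-bit count
// is never zero, and the result is capped at 8 without any branch or cmov.
unsigned ctz8_with_guard_bit (unsigned char x)
{
  return static_cast<unsigned> (
    __builtin_ctz (static_cast<unsigned> (x) | 0x100u));
}
```

Both functions agree for every 8-bit input, which is why the extra `or` is the only instruction needed on top of the 32-bit `tzcnt`.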

[Bug tree-optimization/113239] [13/14 regression] After 822a11a1e64, bogus -Warray-bounds warnings in std::vector

2024-01-22 Thread dimitry--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113239

--- Comment #8 from Dimitry Andric  ---
(In reply to Frank Ch. Eigler from comment #7)
> Wonder if this similar but different diagnostic is closely related:
...
> where the c++ code in question is a straight
> 
> vector<> foo;
> vector<> bar;
> foo.insert(foo.end(), bar.begin(), bar.end());

I can't reproduce the warning here with a vector example, the function is
entirely optimized away too. But even if I return the result, e.g.:

std::vector f(std::vector bar)
{
  std::vector foo;
  foo.insert(foo.end(), bar.begin(), bar.end());
  return foo;
}

still no warning. But I think you might need to reduce the mutatee.cxx case.

That said, the warning you show is triggered in a different place, and the
"between 9 and 9223372036854775800 bytes" is also different.

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441

--- Comment #9 from Tamar Christina  ---
So on SVE the change is cost modelling.

Bisect landed on g:33c2b70dbabc02788caabcbc66b7baeafeb95bcf which changed the
compiler's defaults to using the new throughput matched cost modelling used by
newer cores.

It looks like this changes which mode the compiler picks for when using a fixed
register size.

This is because the new cost model (correctly) models the costs for FMAs and
promotions.

Before:

array1[0][_1] 1 times scalar_load costs 1 in prologue
(int) _2 1 times scalar_stmt costs 1 in prologue

after:

array1[0][_1] 1 times scalar_load costs 1 in prologue 
(int) _2 1 times scalar_stmt costs 0 in prologue 

and the cost goes from:

Vector inside of loop cost: 125

to

Vector inside of loop cost: 83 

so far, nothing sticks out, and in fact the profitability for VNx4QI drops from

Calculated minimum iters for profitability: 5

to

Calculated minimum iters for profitability: 3

This causes a clash, as this is now exactly the same cost as VNx2QI which used
to be what it preferred before.

Which then leads it to pick the higher VF.

In the end smaller VF shows:

;; Guessed iterations of loop 4 is 0.500488. New upper bound 1.

and now we get:

Vectorization factor 16 seems too large for profile prevoiusly believed to be
consistent; reducing.  
;; Guessed iterations of loop 4 is 0.500488. New upper bound 0.
;; Scaling loop 4 with scale 66.6% (guessed) to reach upper bound 0

which I guess is the big difference.

There is a weird costing going on in the PHI nodes though:

m_108 = PHI  1 times vector_stmt costs 0 in body 
m_108 = PHI  2 times scalar_to_vec costs 0 in prologue

They have collapsed to 0, which can't be right.

[Bug rtl-optimization/113533] [14 Regression] Code generation regression after change for pr111267

2024-01-22 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113533

--- Comment #7 from Oleg Endo  ---
(In reply to Roger Sayle from comment #6)
> To help diagnose the problem, I came up with this simple patch:

Thanks for looking into it!

> which then helps reveal that on sh3-linux-gnu with -O1 we see:

I think this will also happen on all sh-elf sub-targets, not necessarily
sh3-linux... if it helps anything ... 

> propagating insn 6 into insn 12, replacing:
> (set (reg:SI 174 [ _1 ])
> (sign_extend:SI (reg:QI 169 [ *a_7(D) ])))
> successfully matched this instruction to *extendqisi2_compact_snd:
> (set (reg:SI 174 [ _1 ])
> (sign_extend:SI (mem:QI (reg/v/f:SI 168 [ aD.1817 ]) [0 *a_7(D)+0 S1
> A8])))
> change is profitable (cost 4 -> cost 1)
> 
> which confirms Andrew's and Oleg's analyses above; the sh_rtx_costs function
> is a little odd... Reading from memory is four times faster than using a
> pseudo!?
> I'm investigating a "costs" patch for the SH backend.

Looks like sh_rtx_costs function assumes that the costs of the whole RTX are
summed up outside in the caller.

In sh_rtx_costs, for SIGN_EXTEND and ZERO_EXTEND, the 'sh_address_cost' is returned
directly for the MEM_P case. It should probably have gone through COSTS_N_INSNS
to get it onto the same scale as the arith_reg_operand case.
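
To make the scale mismatch concrete, here is a small sketch. It assumes the usual definition of COSTS_N_INSNS as N * 4, which would be consistent with the "cost 4 -> cost 1" line in the dump; the variable names and the raw address cost of 1 are illustrative assumptions, not values read from the SH backend.

```cpp
// Assumed to mirror GCC's COSTS_N_INSNS macro: N instructions in RTX cost units.
constexpr int costs_n_insns (int n) { return n * 4; }

// Register operand case: one instruction, expressed in RTX cost units.
constexpr int reg_extend_cost = costs_n_insns (1);

// Memory operand case as described above: an address cost returned unscaled.
constexpr int raw_address_cost = 1;

// The same address cost brought onto the COSTS_N_INSNS scale.
constexpr int scaled_mem_cost = costs_n_insns (raw_address_cost);

// fwprop-style check: a replacement is accepted if it is cheaper.
constexpr bool looks_profitable (int old_cost, int new_cost)
{
  return new_cost < old_cost;
}
```

With the unscaled value the memory form looks four times cheaper than the register form; once both are on the same scale they compare equal and the propagation would no longer look like a win.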

[Bug c++/102626] [c++20] compiler crash when invoking constexpr function in the constructor of template class

2024-01-22 Thread hp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102626

Hans-Peter Nilsson  changed:

   What|Removed |Added

 CC||hp at gcc dot gnu.org
   Last reconfirmed|2021-10-11 00:00:00 |2024-01-14 0:00

--- Comment #4 from Hans-Peter Nilsson  ---
Searching for a constexpr-related bug (not this one) I can confirm that (for
cris-elf at least) the bug is still there at r14-7232-gb468821eea8d
(the test-case in comment #2 with "-std=c++20 -O3")

[Bug tree-optimization/113239] [13/14 regression] After 822a11a1e64, bogus -Warray-bounds warnings in std::vector

2024-01-22 Thread fche at redhat dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113239

--- Comment #7 from Frank Ch. Eigler  ---
Wonder if this similar but different diagnostic is closely related:

https://kojipkgs.fedoraproject.org//work/tasks/6259/112176259/build.log

[...]
inlined from ‘mutatee::instrument_dynprobe_target(BPatch_object*,
dynprobe_target const&)’ at mutatee.cxx:444:22:
/usr/include/c++/14/bits/stl_algobase.h:438:30: error: ‘memmove’ writing
between 9 and 9223372036854775800 bytes into a region of size 0 overflows the
destination [-Werror=stringop-overflow=]
  438 | __builtin_memmove(__result, __first, sizeof(_Tp) * _Num);
  | ~^~~
In file included from
/usr/include/c++/14/x86_64-redhat-linux/bits/c++allocator.h:33,
 from /usr/include/c++/14/bits/allocator.h:46,
 from /usr/include/c++/14/string:43:
In member function ‘std::__new_allocator::allocate(unsigned
long, void const*)’,
[...]

where the c++ code in question is a straight

vector<> foo;
vector<> bar;
foo.insert(foo.end(), bar.begin(), bar.end());

[Bug rtl-optimization/113542] New: gcc.target/arm/bics_3.c regression after change for pr111267

2024-01-22 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113542

Bug ID: 113542
   Summary: gcc.target/arm/bics_3.c regression after change for
pr111267
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: roger at nextmovesoftware dot com
  Target Milestone: ---

This PR is a placeholder for tracking the reported failures of
FAIL: gcc.target/arm/bics_3.c scan-assembler-times bics\tr[0-9]+, r[0-9]+,
r[0-9]+ 2
FAIL: gcc.target/arm/bics_3.c scan-assembler-times bics\tr[0-9]+, r[0-9]+,
r[0-9]+, .sl #2 1
See https://linaro.atlassian.net/browse/GNU-1117

Alas, I've been unable to reproduce the failure on cross compilers to either
arm-linux-gnueabihf or armv8l-unknown-linux-gnueabihf, so I suspect that
there's some configuration option or compile-time flag I'm missing that's
required to trigger these failures (which I'm hoping are "missed optimization"
rather than "wrong code").

Hopefully, filing this PR provides a mechanism to allow folks to help me
investigate this issue.  My apologies for the temporary inconvenience.
Setting the component to "rtl-optimization" until this is confirmed to be a
target (ARM backend) issue.

[Bug debug/113382] FAIL: gcc.dg/debug/btf/btf-bitfields-3.c scan-assembler-times [\t ]0x6000004[\t ]+[^\n]*btt_info 1

2024-01-22 Thread danglin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113382

--- Comment #4 from John David Anglin  ---
dtd->dtd_enum_unsigned is set in ctf_add_enum:
dtd->dtd_enum_unsigned = eunsigned;

  /* Generate a CTF type for the enumeration.  */
  enumeration_type_id = ctf_add_enum (ctfc, CTF_ADD_ROOT,
  enum_name, bit_size / 8,
  (signedness == DW_ATE_unsigned),
  enumeration);

signedness = 0

(gdb) bt
#0  ctf_add_enum (ctfc=0x83ffbfea7c00, flag=1,
name=0x83ffbfe6b188 "foo", size=4, eunsigned=false,
die=0x83ffbfea2320) at ../../gcc/gcc/ctfc.cc:591
#1  0x4082df34 in gen_ctf_enumeration_type (
enumeration=0x83ffbfea2320, ctfc=0x83ffbfea7c00)
at ../../gcc/gcc/dwarf2ctf.cc:762
#2  gen_ctf_type (ctfc=0x83ffbfea7c00, die=0x83ffbfea2320)
at ../../gcc/gcc/dwarf2ctf.cc:899
#3  0x4082e8b8 in ctf_do_die (die=0x83ffbfea2320)
at ../../gcc/gcc/dwarf2ctf.cc:978
#4  0x4088f9b0 in ctf_debug_do_cu (die=)
at ../../gcc/gcc/dwarf2out.cc:33017
#5  ctf_debug_do_cu (die=) at ../../gcc/gcc/dwarf2out.cc:33010
#6  dwarf2out_early_finish (filename=0x83ffbfea7c00 "▒\362\004\002")
at ../../gcc/gcc/dwarf2out.cc:33146
#7  0x407de578 in symbol_table::finalize_compilation_unit (
this=0x83ffbfe6b188) at ../../gcc/gcc/cgraphunit.cc:2579
#8  0x40d55338 in compile_file () at ../../gcc/gcc/toplev.cc:474

[Bug c++/113541] New: Rejects __attribute__((section)) on explicit instantiation declaration of ctor/dtor

2024-01-22 Thread arthur.j.odwyer at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113541

Bug ID: 113541
   Summary: Rejects __attribute__((section)) on explicit
instantiation declaration of ctor/dtor
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: arthur.j.odwyer at gmail dot com
  Target Milestone: ---

// https://godbolt.org/z/34Wdj1ox8

template<class T>
struct S {
    S(int) {}
    void operator=(int) {}
    void f(int) {}
    ~S() {}
};
template __attribute__((section("TEST"))) S<int>::S(int); // error
template __attribute__((section("TEST"))) void S<int>::f(int); // OK
template __attribute__((section("TEST"))) void S<int>::operator=(int); // OK
template __attribute__((section("TEST"))) S<int>::~S(); // error

===

: In instantiation of 'S::S(int) [with T = int]':
:9:56:   required from here
:3:5: error: section of alias 'S::S(int) [with T = int]' must match
section of its target
3 | S(int) {}
  | ^

The problem seems to be only with the constructor and destructor, i.e., the two
kinds of functions that codegen two object-code definitions (base object xtor
and complete object xtor) for a single C++ declaration.

Somehow, giving `S` a virtual base class (`struct S : virtual B`) fixes the
problem. Then both codegenned xtors correctly wind up in the "TEST" section.

GCC 4.9.4 is happy with the code as written. The bug started happening with GCC
5.

(This was noted on Slack in June 2019, but never reported on Bugzilla AFAICT
until now: https://cpplang.slack.com/archives/C5GN4SP41/p1560800562026000 )

[Bug rtl-optimization/113533] [14 Regression] Code generation regression after change for pr111267

2024-01-22 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113533

Roger Sayle  changed:

   What|Removed |Added

   Last reconfirmed||2024-01-22
 Status|UNCONFIRMED |NEW
 CC||roger at nextmovesoftware dot com
 Ever confirmed|0   |1

--- Comment #6 from Roger Sayle  ---
To help diagnose the problem, I came up with this simple patch:
diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc
index 7872609b336..dc563ac2ca1 100644
--- a/gcc/fwprop.cc
+++ b/gcc/fwprop.cc
@@ -492,6 +492,9 @@ try_fwprop_subst_pattern (obstack_watermark ,
insn_change _change,
   " (cost %d -> cost %d)\n", old_cost, new_cost);
ok = false;
  }
+   else if (dump_file)
+ fprintf (dump_file, "change is profitable"
+  " (cost %d -> cost %d)\n", old_cost, new_cost);
   }

   if (!ok)

which then helps reveal that on sh3-linux-gnu with -O1 we see:
propagating insn 6 into insn 12, replacing:
(set (reg:SI 174 [ _1 ])
(sign_extend:SI (reg:QI 169 [ *a_7(D) ])))
successfully matched this instruction to *extendqisi2_compact_snd:
(set (reg:SI 174 [ _1 ])
(sign_extend:SI (mem:QI (reg/v/f:SI 168 [ aD.1817 ]) [0 *a_7(D)+0 S1 A8])))
change is profitable (cost 4 -> cost 1)

which confirms Andrew's and Oleg's analyses above; the sh_rtx_costs function is
a little odd... Reading from memory is four times faster than using a pseudo!?
I'm investigating a "costs" patch for the SH backend.  My apologies for the
temporary inconvenience, and thanks to Jeff for catching/spotting this.

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-22 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #31 from JuzheZhong  ---
machine dep reorg  : 403.69 ( 56%)  23.48 ( 93%) 427.17 ( 57%) 5290k (  0%)

Confirmed: with RTL DF checking removed, LICM is no longer a compile-time hog.

The VSETVL pass accounts for 56% of compile time.

Even though I can't see a memory hog in the GGC -ftime-report output, I can see
33G of memory usage in htop.

This confirms that both the compile-time hog and the memory hog are VSETVL pass
issues.

I will work on optimizing the compile time as well as the memory usage of the
VSETVL pass.

[Bug debug/112718] [11/12/13/14 Regression] ICE: in add_dwarf_attr, at dwarf2out.cc:4501 with -g -fdebug-types-section -flto -ffat-lto-objects

2024-01-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112718

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org

--- Comment #2 from Richard Biener  ---
I have a patch, but other issues with -fdebug-types-section and -flto will
prevail.

[Bug target/113087] [14] RISC-V rv64gcv vector: Runtime mismatch with rv64gc

2024-01-22 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113087

--- Comment #38 from Robin Dapp  ---
deepsjeng also looks ok here.

[Bug middle-end/113540] New: missing -Warray-bounds warning with malloc and a simple loop

2024-01-22 Thread vincent-gcc at vinc17 dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113540

Bug ID: 113540
   Summary: missing -Warray-bounds warning with malloc and a
simple loop
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vincent-gcc at vinc17 dot net
  Target Milestone: ---

Consider the following code:

#include <stdlib.h>

int main (void)
{
  volatile char *t;
  t = malloc (4);
  for (int i = 0; i <= 4; i++)
    t[i] = 0;
  return 0;
}

With -O2 -Warray-bounds, I do not get any warning.

Replacing the loop by "t[4] = 0;" gives a warning "array subscript 4 is outside
array bounds of 'volatile char[4]'" as expected.

Or replacing the use of malloc() by "volatile char t[4];" also gives a warning.

Tested with gcc (Debian 20240117-1) 14.0.1 20240117 (experimental) [master
r14-8187-gb00be6f1576]. But previous versions do not give any warning either.

[Bug c/113518] ICE: in gimplify_expr, at gimplify.cc:18596 with atomic_fetch_or_explicit() on _BitInt()

2024-01-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113518

Jakub Jelinek  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek  ---
Created attachment 57183
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57183&action=edit
gcc14-pr113518.patch

Untested fix.

[Bug middle-end/113514] Imprecise __builtin_dynamic_object_size when using a set local variable

2024-01-22 Thread siddhesh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113514

Siddhesh Poyarekar  changed:

   What|Removed |Added

   Last reconfirmed||2024-01-22
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

[Bug middle-end/113514] Wrong __builtin_dynamic_object_size when using a set local variable

2024-01-22 Thread siddhesh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113514

--- Comment #5 from Siddhesh Poyarekar  ---
What seems to be happening is that early_objsz bails out since the subobject
size at that point is not a constant; I remember concluding that it's safest to
stick to constant sizes here, but I can't remember why I came to that
conclusion.  Then in constant propagation (literally the next pass in -O2), the
reference gets folded into a MEM_REF and we have the classic case of the
subobject reference being lost, due to which we see the whole object size there
instead of the subobject size.

I need to try and remember why I decided against generating expressions in
early_objsz.

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|13.3|14.0
Summary|[13/14 Regression] Fail to  |[14 Regression] Fail to
   |fold the last element with  |fold the last element with
   |multiple loop   |multiple loop
