[Bug tree-optimization/115221] [15 regression] ICE when building reiser4progs (propagate_updated_value, at gimple-range-cache.cc:1368)

2024-05-24 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115221

Aldy Hernandez  changed:

   What|Removed |Added

 CC||amacleod at redhat dot com

--- Comment #5 from Aldy Hernandez  ---
Has anyone bisected this?

[Bug tree-optimization/115191] [15 regression] ICE when building stklos (in verify_range, at value-range.cc:1526)

2024-05-23 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115191

Aldy Hernandez  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #9 from Aldy Hernandez  ---
Fixed.

[Bug tree-optimization/115191] [15 regression] ICE when building stklos (in verify_range, at value-range.cc:1526)

2024-05-22 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115191

--- Comment #7 from Aldy Hernandez  ---
Created attachment 58272
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58272=edit
patch in testing

[Bug tree-optimization/115191] [15 regression] ICE when building stklos (in verify_range, at value-range.cc:1526)

2024-05-22 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115191

--- Comment #6 from Aldy Hernandez  ---
(In reply to Andrew Pinski from comment #4)
> Confirmed.
> 
>   /* After the optimization PHI result can have value
>  which it couldn't have previously.  */
>   int_range_max r;
>   if (get_global_range_query ()->range_of_expr (r,
> phires,
> phi))
> 
> 
> Yes int_range_max here is almost definitely wrong for pointer types.
> Note this code is originally from r13-4878-g3d6bb832022160 .

Exactly.  Assuming r could be either an integer or a pointer, that should be
Value_Range which is type agnostic.

[Bug tree-optimization/115191] [15 regression] ICE when building stklos (in verify_range, at value-range.cc:1526)

2024-05-22 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115191

Aldy Hernandez  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |aldyh at gcc dot gnu.org
Summary|[15 regression] ICE when|[15 regression] ICE when
   |building stklos in  |building stklos (in
   |verify_range due to |verify_range, at
   |phiopt's value_replacement  |value-range.cc:1526)
   Last reconfirmed|2024-05-22 00:00:00 |

--- Comment #5 from Aldy Hernandez  ---
Mine.

Phiopt is using int_range_max for a range that could be either an int or a
pointer.

[Bug middle-end/115131] [15 regression] ICE when building (external) rtl88x2bu kernel module (in verify_range, at value-range.cc:677)

2024-05-17 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115131

Aldy Hernandez  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Aldy Hernandez  ---
Fixed

[Bug middle-end/115131] [15 regression] ICE when building (external) rtl88x2bu kernel module (in verify_range, at value-range.cc:677)

2024-05-17 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115131

--- Comment #5 from Aldy Hernandez  ---
Created attachment 58224
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58224=edit
patch in testing

[Bug middle-end/115131] [15 regression] ICE when building (external) rtl88x2bu kernel module (in verify_range, at value-range.cc:677)

2024-05-17 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115131

Aldy Hernandez  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Assignee|unassigned at gcc dot gnu.org  |aldyh at gcc dot gnu.org
   Last reconfirmed||2024-05-17
 Ever confirmed|0   |1

--- Comment #3 from Aldy Hernandez  ---
Mine.

[Bug middle-end/115128] [15 regression] ICE when building aflplusplus (internal compiler error: in type, at value-range.h:983)

2024-05-17 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115128

Aldy Hernandez  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #7 from Aldy Hernandez  ---
Fixed

[Bug middle-end/115128] [15 regression] ICE when building aflplusplus (internal compiler error: in type, at value-range.h:983)

2024-05-17 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115128

--- Comment #3 from Aldy Hernandez  ---
Created attachment 58222
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58222=edit
patch in testing

[Bug middle-end/115128] [15 regression] ICE when building aflplusplus (internal compiler error: in type, at value-range.h:983)

2024-05-17 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115128

Aldy Hernandez  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2024-05-17
 Status|UNCONFIRMED |NEW

--- Comment #1 from Aldy Hernandez  ---
Mine.

[Bug middle-end/96564] [11/12/13/14/15 Regression] New maybe use of uninitialized variable warning since r11-959

2024-05-16 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96564

--- Comment #19 from Aldy Hernandez  ---
(In reply to Richard Biener from comment #17)
> Handling pointer-vs-pointer in ptrs_compare_unequal isn't enough since we
> have
> 
>   # PT = nonlocal null
>   unsigned int * x_7(D) = x;
> ...
>   # PT = null { D.2785 }
>   a_9 = malloc (_2);
>   if (a_9 == 0B) 
> goto ; [0.04%]
>   else
> goto ; [99.96%]
> 
>[local count: 429496]:
>   goto ; [100.00%]
> 
>[local count: 1073312328]:
>   if (x_7(D) != a_9)
> 
> so querying points-to only has to consider both pointers being NULL.  Now,
> I'm not sure how prange handles the above and whether or how it integrates
> knowledge from flow-insensitive points-to analysis.

prange currently does nothing different than what irange did.  It's a first
step in providing points-to and propagating the information through the CFG
like we do for integer ranges.  Or more precisely, prange is meant to be
nothing more than a pair of integer end points (not unlimited like
int_range_max), plus a value/mask pair.

> 
> Aldy might know.
> 
> I'm testing a patch enhancing ptrs_compare_unequal right now, also filling
> in a missing bit in points-to info.

[Bug ipa/114985] [15 regression] internal compiler error: in discriminator_fail during stage2

2024-05-16 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114985

--- Comment #32 from Aldy Hernandez  ---
(In reply to r...@cebitec.uni-bielefeld.de from comment #31)
> > --- Comment #29 from Aldy Hernandez  ---
> [...]
> > Ok, what's the minimum configuration I need to build here?
> >
> > srcdir/configure --build=sparc-sun-solaris2.11
> >
> > srcdir/configure --build=sparc-sun-solaris2.11 --without-gnu-as
> > --without-gnu-ld
> >
> > ??
> 
>   srcdir/configure --build=sparcv9-sun-solaris2.11 --without-gnu-as
>   --with-as=/usr/bin/as --without-gnu-ld --with-ld=/usr/bin/ld

OK, I'll add this to my notes.  Thanks.

BTW, the above seems like it would be a perfect thing to put in /etc/motd :).

> 
> should do the trick.
> 
> Preferably prepend /usr/gnu/bin to PATH.
> 
> > I really don't care how it builds, I just want the bare minimum so I can
> > bootstrap and run tests.
> >
> > A minor rant, but why can't all this be set up automatically in the compile
> > farm machines?  Keeping track of minor nuances of each architecture is
> > distracting.  They should all be set up, whether by setting default paths in
> 
> Agreed, but you always run into stuff like this when leaving Linux (like
> AIX, HP-UX, or macOS).  PATHs won't help much for reasons explained in
> install.texi.
> 
> > /etc/profile or whatever, or by having the relevant patches in GCC's source
> > base, such that they work with src/configure && make.
> 
> But what should be the default: GNU or native tools?  You have choices
> here and need to decide what you need/want.  No one can do this for you.

My guess would be to use gnu tools if available, cause those would be
"guaranteed" to work correctly to build gcc?

> 
> > I know this isn't your fault, but if more cfarm boxes where set up to go 
> > with
> > no surprises, I'd add more boxen to my testing harness.
> 
> Well, I *did* set up the Solaris cfarm systems, actually ;-)  I see your
> point, but things are not that simple unfortunately.  And GCC's
> configure doesn't need to cater to users of the cfarm only...
> 
> Some of the issues (like the need for --build) have been caused by
> upstream (like the config.guess maintainers that refuse to listen to
> reason ;-).  I thought about improving the --with-as/ld situation; maybe
> something can be done there.

Thanks for the explanation.

[Bug ipa/114985] [15 regression] internal compiler error: in discriminator_fail during stage2

2024-05-16 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114985

--- Comment #30 from Aldy Hernandez  ---
(In reply to Aldy Hernandez from comment #29)

> > > gmake[3]: Leaving directory '/home/aldyh/bld/clean'
> > > Comparing stages 2 and 3
> > > Bootstrap comparison failure!
> > > gcc/tree-vect-stmts.o differs
> > 
> > I'm not seeing this myself.
> 
> Dunno.  It could be the particular revision was broken??

Interestingly, I just did the following with latest trunk, and sparc
bootstrapped and is running tests:

configure && make -j128 && make check -j128 -k

No --with-something-rather or path tweaks or --build=blah.

*shrugs*

[Bug ipa/114985] [15 regression] internal compiler error: in discriminator_fail during stage2

2024-05-16 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114985

--- Comment #29 from Aldy Hernandez  ---
(In reply to r...@cebitec.uni-bielefeld.de from comment #28)
> > --- Comment #27 from Aldy Hernandez  ---
> > This is in cfarm216.cfarm.et:
> >
> > aldyh@s11-sparc:~/bld/clean$ hostname
> > s11-sparc.cfarm
> > aldyh@s11-sparc:~/bld/clean$ uname -a
> > SunOS s11-sparc.cfarm 5.11 11.4.68.164.2 sun4v sparc sun4v logical-domain
> > aldyh@s11-sparc:~/bld/clean$ ~/src/clean/configure && gmake -j80 && gmake 
> > check
> > -k -j80
> 
> You need to be careful that the assembler and linker actually used match
> gcc's idea thereof: 
> 
>   https://gcc.gnu.org/install/specific.html#x-x-solaris2
> 
> Best specify all of --build sparcv9-sun-solaris2.11 (if you're
> bootstrapping with a 64-bit-default gcc/g++) and
> 
>   --with-as=/usr/gnu/bin/as (or /opt/cfarm/binutils-latest-64/bin/as)
>   --with-gnu-as
> 
> (/usr/bin/as works too, but you need to specify both the path and
> --without-gnu-as)
> 
>   --with-ld=/usr/bin/ld
>   --without-gnu-ld
> 
> Having two different assemblers and linkers (Solaris and GNU) available
> can lead to confusion if you rely on PATH alone.

Ok, what's the minimum configuration I need to build here?

srcdir/configure --build=sparc-sun-solaris2.11

srcdir/configure --build=sparc-sun-solaris2.11 --without-gnu-as
--without-gnu-ld

??

I really don't care how it builds, I just want the bare minimum so I can
bootstrap and run tests.

A minor rant, but why can't all this be set up automatically in the compile
farm machines?  Keeping track of minor nuances of each architecture is
distracting.  They should all be set up, whether by setting default paths in
/etc/profile or whatever, or by having the relevant patches in GCC's source
base, such that they work with src/configure && make.

I know this isn't your fault, but if more cfarm boxes where set up to go with
no surprises, I'd add more boxen to my testing harness.

> 
> > gmake[3]: Leaving directory '/home/aldyh/bld/clean'
> > Comparing stages 2 and 3
> > Bootstrap comparison failure!
> > gcc/tree-vect-stmts.o differs
> 
> I'm not seeing this myself.

Dunno.  It could be the particular revision was broken??

[Bug ipa/114985] [15 regression] internal compiler error: in discriminator_fail during stage2

2024-05-16 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114985

--- Comment #27 from Aldy Hernandez  ---
(In reply to r...@cebitec.uni-bielefeld.de from comment #26)
> > --- Comment #25 from Aldy Hernandez  ---
> > prange has been enabled again, after testing on x86-64 and ppc64le linux. 
> > Aarch has no space to run tests on the compile farm, and sparc bootstrap was
> > already broken.
> 
> Huh?  Current trunk bootstraps just fine on Solaris/SPARC without any
> patches.  What issue do you see?
> 
> Besides, there *is* a Solaris/SPARC system in the cfarm (cfarm216), so
> you can try for yourself.

commit 5609d77e683944439fae38323ecabc44a1eb4671 (HEAD -> clean)
Author: Christoph Müllner 
Date:   Wed May 15 01:34:54 2024 +0200

RISC-V: Test cbo.zero expansion for rv32

We had an issue when expanding via cmo-zero for RV32.
This was fixed upstream, but we don't have a RV32 test.
Therefore, this patch introduces such a test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmo-zicboz-zic64-1.c: Fix for rv32.

Signed-off-by: Christoph Müllner 


This is in cfarm216.cfarm.et:

aldyh@s11-sparc:~/bld/clean$ hostname
s11-sparc.cfarm
aldyh@s11-sparc:~/bld/clean$ uname -a
SunOS s11-sparc.cfarm 5.11 11.4.68.164.2 sun4v sparc sun4v logical-domain
aldyh@s11-sparc:~/bld/clean$ ~/src/clean/configure && gmake -j80 && gmake check
-k -j80
...
...
gmake[2]: Entering directory '/home/aldyh/bld/clean'
gmake[3]: Entering directory '/home/aldyh/bld/clean'
rm -f stage_current
gmake[3]: Leaving directory '/home/aldyh/bld/clean'
Comparing stages 2 and 3
Bootstrap comparison failure!
gcc/tree-vect-stmts.o differs
gmake[2]: *** [Makefile:26649: compare] Error 1
gmake[2]: Leaving directory '/home/aldyh/bld/clean'
gmake[1]: *** [Makefile:26629: stage3-bubble] Error 2
gmake[1]: Leaving directory '/home/aldyh/bld/clean'
gmake: *** [Makefile:1099: all] Error 2
aldyh@s11-sparc:~/bld/clean$

[Bug ipa/114985] [15 regression] internal compiler error: in discriminator_fail during stage2

2024-05-16 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114985

--- Comment #25 from Aldy Hernandez  ---
prange has been enabled again, after testing on x86-64 and ppc64le linux. 
Aarch has no space to run tests on the compile farm, and sparc bootstrap was
already broken.

The problem in this PR can be triggered by commenting the two new
operand_check_p conditionals:

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index 09cab761822..76548f31853 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -1744,7 +1744,7 @@ ipa_value_range_from_jfunc (vrange ,
 pointer type to hold the result instead of a boolean
 type.  Avoid trapping in the sanity check in
 fold_range until this is fixed.  */
- || !handler.operand_check_p (vr_type, srcvr.type (), op_vr.type
())
+ /*|| !handler.operand_check_p (vr_type, srcvr.type (), op_vr.type
())*/
  || !handler.fold_range (op_res, vr_type, srcvr, op_vr))
op_res.set_varying (vr_type);

@@ -2556,9 +2556,11 @@ propagate_vr_across_jump_function (cgraph_edge *cs,
ipa_jump_func *jfunc,
 pointer type to hold the result instead of a boolean
 type.  Avoid trapping in the sanity check in
 fold_range until this is fixed.  */
+ /*
  || !handler.operand_check_p (operand_type,
   src_lats->m_value_range.m_vr.type
(),
   op_vr.type ())
+ */
  || !handler.fold_range (op_res, operand_type,
  src_lats->m_value_range.m_vr, op_vr))
op_res.set_varying (param_type);

You can reproduce on x86-64 as well as ppc64le, and the problem can be seen
with and without prange support enabled.

[Bug ipa/114985] [15 regression] internal compiler error: in discriminator_fail during stage2

2024-05-16 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114985

--- Comment #22 from Aldy Hernandez  ---
(In reply to Martin Jambor from comment #20)

Thanks for looking into this.

> The IL we generate the jump function from is:
>   
>   _1 = cclauses_2(D) != 0B;
>   c_parser_omp_all_clauses (_1);
> 
> Which translates to the expected jump function:
>   callsite  void c_parser_omp_teams(int**)/3 -> int*
> c_parser_omp_all_clauses(bool)/1 :
>  param 0: PASS THROUGH: 0, op ne_expr 0B
> 
> so IPA looks like it's doing what it should.

So, is it bad IL?  If so, do you know what created it?

> 
> (In reply to Aldy Hernandez from comment #6)
> > I wonder if something like this would work.
> > 
> > diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
> > index 5781f50..ea8a685 100644
> > --- a/gcc/ipa-cp.cc
> > +++ b/gcc/ipa-cp.cc
> > @@ -1730,6 +1730,8 @@ ipa_value_range_from_jfunc (vrange ,
> > }
> >else
> > {
> > + if (TREE_CODE_CLASS (operation) == tcc_comparison)
> > +   vr_type = boolean_type_node;
> >   Value_Range op_res (vr_type);
> >   Value_Range res (vr_type);
> >   tree op = ipa_get_jf_pass_through_operand (jfunc);
> 
> This looks OKish and we also do a similar thing in
> ipa_get_jf_arith_result.
> 
> Also note that the ipa_value_range_from_jfunc already has a parameter
> that tells it what type the result should be.  It is called parm_type,
> which is boolean_type in the case that ICEs.  So we can even bail out
> if we really encounter jump function created from bad IL.
> 
> I was thinking of using use parm_type from the beginning, to
> initialize op_res with it, but there are jump functions representing
> an operation followed by a truncation, for example for:
> 
>   _2 = complain_6(D) & 1;
>   _3 = (int) std_alignof_7(D);
>   cxx_sizeof_or_alignof_type (_3, _2);
> 
> where _r is in fact bool (has smaller size and precision) and trying
> to make ranger do the bit_and_expr directly to bool leads to a failed
> assert in fold_range (the test of m_operator->operand_check_p).

ranger should accept any combination the IL does.  What we try to diagnose with
the operand_check_p's is combinations that would also be incorrect in the IL.

> 
> So doing the operation in the original type - unless it is a
> comparison - and then using ipa_vr_operation_and_type_effects seems to
> be the right thing to do.
> 
> But I am really curious why propagate_vr_across_jump_function does not
> need the same check for tcc_comparison operators and generally why is

In my local tree I have mods to both ipa_value_range_from_jfunc and
propagate_vr_across_jump_function, because I found a failing testcase that
needed it.  So you probably need to tweak both places.

> it so different (in the non-scc case)?  Why is ipa_supports_p (this
> predicate has a really really really bad name BTW and I am completely
> at loss as to what it does and how or why) used there and not in
> ipa_value_range_from_jfunc?

Feel free to change it to whatever you prefer.  I was just trying to avoid
adding "&& prange::supports_type_p()" in a bunch of places.  Although what we
probably want is to verify that the types match with operands_check_p as you do
in ipa_vr_operation_and_type_effects():

  return (handler.operand_check_p (dst_type, src_type, dst_type)
  && handler.fold_range (dst_vr, dst_type, src_vr, varying)
  && !dst_vr.varying_p ()
  && !dst_vr.undefined_p ());

My apologies if I have dirtied ipa-* somewhat.  To be honest, I would much
rather not have to touch the code, as I'm not very familiar with it.  Feel free
to fix it however way you feel is best.  I'll be happy to provide feedback as
needed.

> 
> (I also cannot prevent myself from ranting a little that it would
> really help if all the ranger (helper) classes and functions were
> better documented.)

Andrew has been in documentation and clean-up mode for a few months now, and
should be able to push something out soon.

[Bug tree-optimization/114995] C++23 Assume keyword not being used for vectorization

2024-05-15 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114995

--- Comment #13 from Aldy Hernandez  ---
(In reply to Aldy Hernandez from comment #10)
> Created attachment 58202 [details]
> proof of concept implementing a range-op entry for builtin_assume_aligned
> 
> Something like this (properly coded and enhanced) would work.
> 
> The pointers_handled_p() cruft is temporary boilerplate to indicate that we
> only support a fold-range() of PRANGE = IRANGE op PRANGE for this operation.
> This is going away when the dust settles.

FYI, I've removed the pointers_handled_p requirement for new range-op entries.

[Bug tree-optimization/114995] C++23 Assume keyword not being used for vectorization

2024-05-14 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114995

--- Comment #11 from Aldy Hernandez  ---
Just to clarify.  prange as well as irange keep a value/mask pair where we can
store alignment info, so every calculation (range-op) along the way can track
this properly.

Now, for pointers we loose this information across passes because of the union
in SSA_NAME_RANGE_INFO (struct tree_ssa_name) which keeps the range info in an
ptr_info_def, not a proper vrange_storage.  It's in my long term TODO list to
propose we properly track pointer ranges once prange comes live.

Note that IPA keeps alignment info on the side, so part of this info is kept
across passes.  But I assume they're doing this, because originally ranges
didn't have alignment (value/mask) info associated with them.

[Bug tree-optimization/114995] C++23 Assume keyword not being used for vectorization

2024-05-14 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114995

--- Comment #10 from Aldy Hernandez  ---
Created attachment 58202
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58202=edit
proof of concept implementing a range-op entry for builtin_assume_aligned

Something like this (properly coded and enhanced) would work.

The pointers_handled_p() cruft is temporary boilerplate to indicate that we
only support a fold-range() of PRANGE = IRANGE op PRANGE for this operation. 
This is going away when the dust settles.

[Bug tree-optimization/114995] C++23 Assume keyword not being used for vectorization

2024-05-14 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114995

--- Comment #9 from Aldy Hernandez  ---
(In reply to Jakub Jelinek from comment #7)
> The above examples just show misunderstanding what __builtin_assume_aligned
> is and what it is not.  You need to use the result of the built-in function
> in the accesses to be able to use the alignment information, if you just try
> to compare __builtin_assume_aligned (x, 32) == x, it will just fold as
> always true.  The design of the builtin is to attach the alignment
> information to the result of the builtin function only.
> 
> CCing Aldy/Andrew for whether prange can or could be taught to handle the
> assume cases with uintptr_t and bitwise and + comparison.

All the pieces are there to make it work, both with the assume aligned and with
the uintptr_t case.  And we could probably get it all without prange.

For example:

#include 

void foo (const float *);

void bar1 (const float *array)
{
  [[assume(array != nullptr)]];
  const float *aligned = (const float *) __builtin_assume_aligned (array, 32);
  foo (aligned);
}

The __builtin_assume_aligned hasn't been expanded by evrp, so we should be able
to add a range-op entry for it.  This is what evrp sees:

void bar1 (const float * array)
{
  const float * aligned;

   :
  aligned_2 = __builtin_assume_aligned (array_1(D), 32);
  foo (aligned_2);
  return;

}

All we need is a range-op implementation for builtin_assume_aligned.  The
attached crude implementation does it.

=== BB 2 
 :
aligned_2 = __builtin_assume_aligned (array_1(D), 32);
foo (aligned_2);
return;

aligned_2 : [prange] const float * [0, +INF] MASK 0x VALUE 0x0

That is, the bottom 32 bits are cleared.

Andrew will have to comment on the uintptr_t idiom, because it gets expanded
into an .ASSUME() function which I'm unfamiliar with.

For this small function:

void bar2 (const float *array)
{
  [[assume((uintptr_t (array) & (32 - 1)) == 0)]];
  foo (array);
}

evrp expands to:

=== BB 2 
Partial equiv (array.0_3 pe64 array_2(D))
 :
array.0_3 = (long unsigned int) array_2(D);
_4 = array.0_3 & 31;
_5 = _4 == 0;
return _5;

_4 : [irange] long unsigned int [0, 31] MASK 0x1f VALUE 0x0

I don't see any reason why we couldn't get that array.0_3 and array_2 are
aligned to 32-bits.  Maybe we don't set the value/mask pair for the
bitwise_and::op1_range?  The value/mask stuff is not very fleshed out,
especially for the op1_range operators.

[Bug ipa/114985] [15 regression] internal compiler error: in discriminator_fail during stage2

2024-05-11 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114985

--- Comment #18 from Aldy Hernandez  ---
Ah, it looks like seurer already beat me to the preprocessed source.

[Bug ipa/114985] [15 regression] internal compiler error: in discriminator_fail during stage2

2024-05-11 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114985

--- Comment #17 from Aldy Hernandez  ---
(In reply to Martin Jambor from comment #16)
> I'll have look, hopefully on Monday.

Thanks Martin.

To reproduce the problem:

1. Revert this patch:

commit d7bb8eaade3cd3aa70715c8567b4d7b08098e699 (master, clean)
Author: Aldy Hernandez 
Date:   Fri May 10 00:29:13 2024 +0200

Revert: "Enable prange support." [PR114985]

2. Trap on pointer mismatches:

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 000ec802e66..c14b72c19de 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -52,7 +52,7 @@ along with GCC; see the file COPYING3.  If not see
 // Set to 1 to trap on range-op entries that cannot handle the pointer
 // combination being requested.  This is a temporary sanity check to
 // aid in debugging, and will be removed later in the release cycle.
-#define TRAP_ON_UNHANDLED_POINTER_OPERATORS 0
+#define TRAP_ON_UNHANDLED_POINTER_OPERATORS 1

 // Instantiate the operators which apply to multiple types here.

I will also attach a preprocessed file to reproduce the problem on ppc64le that
can save you some time.

[Bug tree-optimization/115026] [15 Regression] msp430-elf fails gcc.dg/pr66444.c with prange enabled

2024-05-10 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115026

--- Comment #8 from Aldy Hernandez  ---
(In reply to Jeffrey A. Law from comment #7)
> So what's the magic to re-enable prange?  I can do that and spin a fresh
> build.

You could revert this patch:

commit d7bb8eaade3cd3aa70715c8567b4d7b08098e699 (master, clean)
Author: Aldy Hernandez 
Date:   Fri May 10 00:29:13 2024 +0200

Revert: "Enable prange support." [PR114985]

This reverts commit 36e877996936abd8bd08f8b1d983c8d1023a5842 until the IPA
pass is fixed with regards to POINTER = POINTER  POINTER.

[Bug tree-optimization/115026] [15 Regression] msp430-elf fails gcc.dg/pr66444.c with prange enabled

2024-05-10 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115026

Aldy Hernandez  changed:

   What|Removed |Added

 Status|NEW |WAITING

--- Comment #6 from Aldy Hernandez  ---
This fixes the PR, but since prange is currently disabled, let's keep it open
until it's re-enabled.

[Bug tree-optimization/115026] msp430-elf fails gcc.dg/pr66444.c with prange enabled

2024-05-10 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115026

--- Comment #4 from Aldy Hernandez  ---
Jeff, if you still have your tree around, could you try this patch?

I'll queue it with the rest of patches I will push before enabling prange when
the IPA issues are sorted out.

[Bug tree-optimization/115026] msp430-elf fails gcc.dg/pr66444.c with prange enabled

2024-05-10 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115026

--- Comment #3 from Aldy Hernandez  ---
Created attachment 58169
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58169=edit
proposed patch

[Bug tree-optimization/115026] msp430-elf fails gcc.dg/pr66444.c with prange enabled

2024-05-10 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115026

--- Comment #2 from Aldy Hernandez  ---
OK, this is embarrassing.

We are incorrectly folding a POINTER_PLUS_EXPR range operation:

 Folding statement: x_7 = 2048B + _2;
-Queued stmt for removal.  Folds to: 2062B
+Queued stmt for removal.  Folds to: 0B

The reason is that the prange::update_bitmask() code is ignoring its operand
and using the current bitmask.

I have no idea how this bootstrapped *any* architecture.  I suppose it needs a
value/mask pair (0xe / 0x0) for the second operand that actually indicates a
singleton.  *shrug*

Thanks for reporting this.

[Bug tree-optimization/115026] msp430-elf fails gcc.dg/pr66444.c with prange enabled

2024-05-10 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115026

Aldy Hernandez  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Assignee|unassigned at gcc dot gnu.org  |aldyh at gcc dot gnu.org
   Last reconfirmed||2024-05-10

--- Comment #1 from Aldy Hernandez  ---
All mine baby!

[Bug tree-optimization/115026] New: msp430-elf fails gcc.dg/pr66444.c with prange enabled

2024-05-10 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115026

Bug ID: 115026
   Summary: msp430-elf fails gcc.dg/pr66444.c with prange enabled
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: aldyh at gcc dot gnu.org
  Target Milestone: ---

This is a separate issue reported by Jeff in PR115009.  I have reproduced on a
cross compiler.

Copied below is his report:

And on msp430-elf we're getting a codegen correctness issue on msp430-elf. 
gcc.dg/pr66444.c fails in the simulator.  The -O2 code difference looks like:

*** good.s  Thu May  9 20:41:37 2024
--- bad.s   Thu May  9 20:41:44 2024
*** baz:
*** 73,81 
  ; saved regs:(none)
; start of prologue
; end of prologue
!   MOV.W   #2062, R12
CALL#fn1
!   MOV.W   #2062, R12
CALL#fn2
MOV.B   #0, R12
; start of epilogue
--- 73,81 
  ; saved regs:(none)
; start of prologue
; end of prologue
!   MOV.B   #0, R12
CALL#fn1
!   MOV.B   #0, R12
CALL#fn2
MOV.B   #0, R12
; start of epilogue


We're mucking up the pointer that baz() passes down to fn1 and fn2 I think.

[Bug ipa/114985] [15 regression] internal compiler error: in discriminator_fail during stage2

2024-05-10 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114985

--- Comment #12 from Aldy Hernandez  ---
Created attachment 58168
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58168=edit
proposed patch in testing

[Bug tree-optimization/115009] [15 regression] AVR: ICE in alloc, at value-range-storage.cc:598

2024-05-10 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115009

Aldy Hernandez  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #14 from Aldy Hernandez  ---
The pushed patch fixes the problem with different sized pointers.

[Bug ipa/114985] [15 regression] internal compiler error: in discriminator_fail during stage2

2024-05-09 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114985

--- Comment #11 from Aldy Hernandez  ---
I have reverted the prange enabling patch until the IPA pass is fixed.

[Bug ipa/114985] [15 regression] internal compiler error: in discriminator_fail during stage2

2024-05-09 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114985

--- Comment #10 from Aldy Hernandez  ---
(In reply to David Edelsohn from comment #9)
> The patch in comment 6 succeeds for me, but it seems more of a heavy-handed
> band-aid that confirms the symptom, but covers up the problem.
> 
> Something in GCC apparently has generated invalid IR that was not caught
> earlier.  GCC should not generate
> 
> POINTER = POINTER CMP POINTER
> 
> But the trunk should not be left in a broken state as per GCC development
> policy.  The broken trunk interferes with the work of other developers and
> may mask other broken patches being committed.
> 
> This patch should be reverted until the source of the problem is discovered
> and fixed.

The range request is not necessarily coming from the IR, but it is a request
that the IPA pass is making of the ranger, which does adhere by gimple IR
rules.  So this may not be bad IR, but just a quirk of how IPA makes requests
of the ranger.  As such, I don't think it's heavy handed, but we do need the
IPA experts to opine.

[Bug tree-optimization/115009] [15 regression] AVR: ICE in alloc, at value-range-storage.cc:598

2024-05-09 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115009

--- Comment #11 from Aldy Hernandez  ---
(In reply to Jeffrey A. Law from comment #8)
> And on msp430-elf we're getting a codegen correctness issue on msp430-elf. 
> gcc.dg/pr66444.c fails in the simulator.  The -O2 code difference looks like:
> 
> *** good.s  Thu May  9 20:41:37 2024
> --- bad.s   Thu May  9 20:41:44 2024
> *** baz:
> *** 73,81 
>   ; saved regs:(none)
> ; start of prologue
> ; end of prologue
> !   MOV.W   #2062, R12
> CALL#fn1
> !   MOV.W   #2062, R12
> CALL#fn2
> MOV.B   #0, R12
> ; start of epilogue
> --- 73,81 
>   ; saved regs:(none)
> ; start of prologue
> ; end of prologue
> !   MOV.B   #0, R12
> CALL#fn1
> !   MOV.B   #0, R12
> CALL#fn2
> MOV.B   #0, R12
> ; start of epilogue
> 
> 
> We're mucking up the pointer that baz() passes down to fn1 and fn2 I think.

Hmmm, this may be something different.  I can look at it tomorrow.

[Bug tree-optimization/115009] [15 regression] AVR: ICE in alloc, at value-range-storage.cc:598

2024-05-09 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115009

--- Comment #10 from Aldy Hernandez  ---
(In reply to Andrew Pinski from comment #7)
> For rl78:
> static scalar_int_mode
> rl78_addr_space_address_mode (addr_space_t addrspace)
> {
>   switch (addrspace)
> {
> case ADDR_SPACE_GENERIC:
>   return HImode;
> case ADDR_SPACE_NEAR:
>   return HImode;
> case ADDR_SPACE_FAR:
>   return SImode;
> default:
>   gcc_unreachable ();
> }
> }
> 
> So yes it is obvious that address space can have different sizes for
> pointers ...

I can test my proposed patch on a cross for avr, but for this testcase on
rl78-none and rl78-elf, I get:

a.c:1:8: error: unknown type name ‘scalar_int_mode’
1 | static scalar_int_mode
  |^~~
a.c:2:31: error: unknown type name ‘addr_space_t’
2 | rl78_addr_space_address_mode (addr_space_t addrspace)
  |   ^~~~

Either way...could someone be so kind as to test on avr and rl78 to see if this
fixes the problem?

[Bug tree-optimization/115009] [15 regression] AVR: ICE in alloc, at value-range-storage.cc:598

2024-05-09 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115009

Aldy Hernandez  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |aldyh at gcc dot gnu.org

--- Comment #9 from Aldy Hernandez  ---
Created attachment 58155
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58155=edit
untested patch

[Bug tree-optimization/114912] [15 regression] SIGBUS in wi::copy<> on SPARC since r15-88-gc60b3e211c5557 since char array is not aligned to what it needs to be

2024-05-09 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114912

Aldy Hernandez  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #19 from Aldy Hernandez  ---
(In reply to r...@cebitec.uni-bielefeld.de from comment #18)
> > --- Comment #16 from Aldy Hernandez  ---
> > (In reply to r...@cebitec.uni-bielefeld.de from comment #14)
> >> > --- Comment #13 from Aldy Hernandez  ---
> >> > BTW, I'm waiting for a review, or at least a nod from a C++ savvy
> >> > person here:
> >> >
> >> > https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650634.html
> >> 
> >> I can give the patch a whirl, thanks.
> >
> > I've attached a rebased patch against current mainline.  Let me know if it
> > works on your end, and I'll commit.
> 
> I've included both this patch ...
> 
> >> I had Andrew's patch in my tree to avoid the issue.  Unfortunately,
> >> Solaris/SPARC bootstrap is broken again due to PR ipa/114985.
> >
> > I have provided a patch for that PR as well, but the IPA folk need to say if
> > this is the correct approach.
> 
> ... and that one in last night's SPARC bootstraps, which worked just
> fine again.  Thanks.

Thanks for testing.  I'll close this PR as fixed in mainline then.  FWIW, I
also retested on x86-64 Linux.

[Bug tree-optimization/114912] [15 regression] SIGBUS in wi::copy<> on SPARC since r15-88-gc60b3e211c5557 since char array is not aligned to what it needs to be

2024-05-08 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114912

--- Comment #16 from Aldy Hernandez  ---
(In reply to r...@cebitec.uni-bielefeld.de from comment #14)
> > --- Comment #13 from Aldy Hernandez  ---
> > BTW, I'm waiting for a review, or at least a nod from a C++ savvy person 
> > here:
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650634.html
> 
> I can give the patch a whirl, thanks.

I've attached a rebased patch against current mainline.  Let me know if it
works on your end, and I'll commit.

> 
> I had Andrew's patch in my tree to avoid the issue.  Unfortunately,
> Solaris/SPARC bootstrap is broken again due to PR ipa/114985.

I have provided a patch for that PR as well, but the IPA folk need to say if
this is the correct approach.

[Bug tree-optimization/114912] [15 regression] SIGBUS in wi::copy<> on SPARC since r15-88-gc60b3e211c5557 since char array is not aligned to what it needs to be

2024-05-08 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114912

--- Comment #15 from Aldy Hernandez  ---
Created attachment 58136
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58136=edit
proposed patch in testing

[Bug ipa/114985] [15 regression] internal compiler error: in discriminator_fail during stage2

2024-05-08 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114985

--- Comment #6 from Aldy Hernandez  ---
I wonder if something like this would work.

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index 5781f50..ea8a685 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -1730,6 +1730,8 @@ ipa_value_range_from_jfunc (vrange ,
}
   else
{
+ if (TREE_CODE_CLASS (operation) == tcc_comparison)
+   vr_type = boolean_type_node;
  Value_Range op_res (vr_type);
  Value_Range res (vr_type);
  tree op = ipa_get_jf_pass_through_operand (jfunc);

[Bug tree-optimization/114912] [15 regression] SIGBUS in wi::copy<> on SPARC since r15-88-gc60b3e211c5557 since char array is not aligned to what it needs to be

2024-05-08 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114912

--- Comment #13 from Aldy Hernandez  ---
BTW, I'm waiting for a review, or at least a nod from a C++ savvy person here:

https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650634.html

[Bug bootstrap/114985] [15 regression] internal compiler error: in discriminator_fail during stage2

2024-05-08 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114985

Aldy Hernandez  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2024-05-08
 Status|UNCONFIRMED |NEW
 CC||amacleod at redhat dot com,
   ||hubicka at gcc dot gnu.org,
   ||jamborm at gcc dot gnu.org

--- Comment #4 from Aldy Hernandez  ---
Confirmed.

Here is some background on tracking discriminator failures.

The sanity check in the range_op dispatch code has noticed that it has an
unsupported pointer range combination.  This is the sanity check:

bool
range_op_handler::fold_range (vrange , tree type,
  const vrange ,
  const vrange ,
  relation_trio rel) const
...
...
  if (has_pointer_operand_p (r, lh, rh)
  && !m_operator->pointers_handled_p (DISPATCH_FOLD_RANGE,
  dispatch_kind (r, lh, rh)))
discriminator_fail (r, lh, rh);

The above code fails if the operator cannot handle the pointer combo it was
passed, and the operator at hand is op_equal:

(gdb) p *this
$11 = {m_operator = 0x1463a590 }

What's being attempted is a  =  ==  as per the
somewhat cryptic error:

DISCRIMINATOR FAIL.  Dispatch > RO_PPP <

This is because for operator_equal::fold_range(), we only handle INTEGER =
POINTER OP_EQUAL POINTER.  That is, the result must be an integer:

bool
operator_equal::pointers_handled_p (range_op_dispatch_type type,
unsigned dispatch) const
{
  switch (type)
{
case DISPATCH_FOLD_RANGE:
  return dispatch == RO_IPP;
...
...
}


This all comes from ipa_value_range_from_jfunc() which is trying to calculate
the equality of two pointer ranges and store the result in a pointer.  I
believe this is incorrect, as the result of equality should be a
boolean_type_node, or at least an integer.

This is where IPA is trying to call fold_range with the invalid combo:

 Value_Range op_res (vr_type);
  Value_Range res (vr_type);
  tree op = ipa_get_jf_pass_through_operand (jfunc);
  Value_Range op_vr (TREE_TYPE (op));
  range_op_handler handler (operation);

  ipa_range_set_and_normalize (op_vr, op);

  if (!handler
  || !op_res.supports_type_p (vr_type)
=>|| !handler.fold_range (op_res, vr_type, srcvr, op_vr))
op_res.set_varying (vr_type);


It is trying to call fold_range() with EQ_EXPR for two pointer operands, but
storing the result in a pointer:

(gdb) p operation
$18 = EQ_EXPR
(gdb) p debug(srcvr)
[prange] union tree_node * * [0, +INF] MASK 0xfff8 VALUE 0x0
$14 = void
(gdb) p debug(op_vr)
[prange] union tree_node * * [0, 0] MASK 0x0 VALUE 0x0
$15 = void

IMO op_res should have an integer type, but it is a pointer:

(gdb) p debug_generic_stmt(vr_type)
union tree_node * *

...causing op_res to be a pointer:

(gdb) p debug(op_res)
[prange] UNDEFINED

If  =  OP_EQUAL  is valid gimple, then we should
change operator_equal::fold_range() to accept all pointer operands.  If not,
then we need to change the IPA pass.

I would appreciate if an IL expert could opine here.

[Bug bootstrap/114985] [15 regression] internal compiler error: in discriminator_fail during stage2

2024-05-08 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114985

--- Comment #2 from Aldy Hernandez  ---
Yeah, that's mine.

Can you attach a preprocessed file of the offending file?

[Bug tree-optimization/114912] [15 regression] SIGBUS in wi::copy<> on SPARC since r15-88-gc60b3e211c5557 since char array is not aligned to what it needs to be

2024-05-03 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114912

--- Comment #12 from Aldy Hernandez  ---
(In reply to Andrew Pinski from comment #10)
> If Aldy does not fix it by Saturday, I will give the union a try then. I
> will also try out the solaris machine on the compile farm if possible.

Sorry, didn't mean for you to pick this up.  Thanks for the analysis.  I can
take it from here.

[Bug middle-end/114912] [15 regression] SIGBUS in wi::copy<> on SPARC

2024-05-01 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114912

--- Comment #1 from Aldy Hernandez  ---
Since this happens while building libgcc during stage1, perhaps this can be
reproduced with a cross?  Would it be possible to get the preprocessed file
that's failing?

You could try /var/gcc/reghunt/sigbus-range/288807/./gcc/xgcc -save-temps [blah
blah], and attach the libgcc2.i file that gets generated.

[Bug tree-optimization/111864] [12/13/14 Regression] Dead Code Elimination Regression

2024-03-15 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111864

--- Comment #3 from Aldy Hernandez  ---
(In reply to Jeffrey A. Law from comment #2)
> It almost looks like a costing issue.  The threaders find opportunities to
> thread all the incoming edges in the key block to the path which avoids the
> call..  But all the paths get rejected.  
> 
> 
> This is the  key block:
> 
> ;;   basic block 11, loop depth 0, count 976284897 (estimated locally, freq
> 0.9092), maybe hot
> ;;prev block 10, next block 12, flags: (NEW, REACHABLE, VISITED)
> ;;pred:   10 [99.8% (guessed)]  count:225266786 (estimated locally,
> freq 0.2098) (TRUE_VALUE,EXECUTABLE)
> ;;14 [100.0% (guessed)]  count:751018112 (estimated locally,
> freq 0.6994) (TRUE_VALUE,EXECUTABLE)
>   # o.10_11 = PHI <(10), o.10_28(14)>
>   _17 = o.10_11 == 
>   _20 = o.10_11 == 
>   _27 = _20 | _17;
>   if (_27 != 0)
> goto ; [58.87%]
>   else
> goto ; [41.13%]
> 
> It's pretty obvious that 10->11 can thread to 6.  If we look at the other
> incoming edge we need o.10_28 which comes from bb14 with the value   So
> that path should be 14->10->11 threading to 6.
> 
> But they get rejected during threadfull2.

In order for threadfull2 to thread 10->11->6, it would have to handle pointer
equivalences, but the new threader doesn't currently handle them.  In VRP we
are able to handle equivs through the pointer_equiv_analyzer class which
pushes/pops state.  It only works when traversing in dominator order, which I
suppose is technically the case when going down a path.

So...I think you could wire pointer_equiv_analyzer to the new threader and get
this, though it may need to be also taught about PHIs.  The class is very
simplistic, and was only a stop gap until we got pointer ranges (with points-to
info).

Question though, why didn't the old threader get this?  I see bb11 looks
exactly the same in DOM3, and the old threader does handle pointer equivalences
through its scoped tables.  May be the IL is too complicated?

If we want to fix this in this release, I think we could do it with
pointer_equiv_analyzer, but for future releases we should be able to get it for
free when pranges are implemented.  Half the work is already done...let's see
how far I get considering I'll be going on paternity leave for a few months
this year.

[Bug tree-optimization/114331] Missed optimization: indicate knownbits from dominating condition switch(trunc(a))

2024-03-15 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114331

--- Comment #13 from Aldy Hernandez  ---
I think y'all have it all figured out.  Basically,

operator_cast::op1_range() is solving num_5 in the equation:

[111,111] = (short int) num_5

Where lhs is:
(gdb) p debug(lhs)
[irange] short int [111, 111]

And the bitmask is implicit, but correctly calculated:
(gdb) p debug(lhs.get_bitmask ())
MASK 0x0 VALUE 0x6f

And the range for num_5 is just varying.

You are looking at the true side of the truncating_cast_p() conditional in
op1_range.  Essentially, once it starts building the range it ignores known
bitmasks, which cause it to return an overly conservative range.

And yes, bitmasks are calculated on demand per the comment in get_bitmask:

  // The mask inherent in the range is calculated on-demand.  For   
  // example, [0,255] does not have known bits set by default.  This
  // saves us considerable time, because setting it at creation incurs  
  // a large penalty for irange::set.  At the time of writing there 
  // was a 5% slowdown in VRP if we kept the mask precisely up to date  
  // at all times.  Instead, we default to -1 and set it when   
  // explicitly requested.  However, this function will always return   
  // the correct mask.

We may have to revisit this.  Perhaps it doesn't matter any more, and we could
keep bitmasks up to date.  It would definitely make the dumps easier to read. 
Though, at least for this testcase the bitmask is correctly returned with
get_bitmask().

I think the analysis in comment 11 is spot on.  And no Andrew, it's not *my*
code, it all comes from the bit-ccp code :-P.

fold_range() should call update_bitmask(), which eventually calls
bit_value_binop() in the CCP code.

And yes Jakub, as you have noticed, BIT_IOR_EXPR, BIT_XOR_EXPR, and likely
other operators may need to be tweaked to take bitmasks into account.  I
wouldn't be surprised if there's a lot of low hanging fruit in this space.

[Bug tree-optimization/114331] Missed optimization: indicate knownbits from dominating condition switch(trunc(a))

2024-03-14 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114331

--- Comment #6 from Aldy Hernandez  ---
(In reply to Andrew Macleod from comment #5)
> (In reply to Jakub Jelinek from comment #4)
> > Actually, looking at value-range.h, irange_bitmask doesn't have just the
> > mask but also value, so I wonder why it isn't able to figure this out.
> > I'd think m_mask should be ~0x and value 111.
> 
> _1 = (short int) num_5(D);
> _2 = (int) _1;
> switch (_1)
> 
> globally we know 
> _2 : [irange] int [-32768, 32767]
> and coming into bb 3:
> _2 : [irange] int [-32768, 32767]
> 2->3  _1 :  [irange] short int [111, 111]
> 2->3  _2 :  [irange] int [111, 111]
> 2->3  num_5(D) :[irange] int [-INF, -65537][-65425, -65425][111,
> 111][65536, +INF]
> 
> I guess what you are looking for is to add known bits to the range produced
> for num_5 on the outgoing edge.
> 
> This would have to be done in operator_cast::op1_range  where we are
> reducing it to known range of the switch.  I do not believe it makes any
> attempts to set bitmasks based on casts like that currently.
> 
> so, GORI working backwards has _1 = [irange] short int [111, 111] , so it
> would be satisfying the expression:
>   short int [111, 111] = (short int) num_5(D);
> 
> when evaluating num_5 in operator_cast::op1_range, in the truncating_cast_p
> section we could also see if there is a bitmask pattern for the LHS that
> could be applied to the range..  I think that basically what you are asking
> for.   Simple enough for a constant I suppose for starters.   Or maybe Aldy
> has a routine that picks bitmasks and values from ranges somewhere?

You may want to look at:

// Return the bitmask inherent in the range.

irange_bitmask
irange::get_bitmask_from_range () const
{
}

IIRC, this code was originally authored by Jakub and was embedded in the
tree-ssanames.cc code setting nonzero bits every time we set a global range.  I
just abstracted it and adapted it to work within our framework.

[Bug ipa/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP

2024-02-22 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

--- Comment #18 from Aldy Hernandez  ---
(In reply to rguent...@suse.de from comment #17)
> On Wed, 21 Feb 2024, aldyh at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476
> > 
> > --- Comment #8 from Aldy Hernandez  ---
> > (In reply to Richard Biener from comment #5)
> > > (In reply to Martin Jambor from comment #4)
> > > > The right place where to free stuff in lattices post-IPA would be in
> > > > ipa_node_params::~ipa_node_params() where we should iterate over 
> > > > lattices
> > > > and deinitialize them or perhaps destruct the array because since
> > > > ipcp_vr_lattice directly contains Value_Range which AFAIU directly 
> > > > contains
> > > > int_range_max which has a virtual destructor... does not look like a POD
> > > > anymore.  This has escaped me when I was looking at the IPA-VR changes 
> > > > but
> > > > hopefully it should not be too difficult to deal with.
> > > 
> > > OK, that might work for the IPA side.
> > > 
> > > It's quite unusual to introduce a virtual DTOR in the middle of the class
> > > hierarchy though.  Grepping I do see quite some direct uses of 'irange'
> > > and also 'vrange' which do not have the DTOR visible but 'irange' already
> > > exposes and uses 'maybe_resize'.  I think those should only be introduced
> > > in the class exposing the virtual DTOR (but why virtual?!).
> > > 
> > > Would be nice to have a picture of the range class hierarchies with
> > > pointers on which types to use in which circumstances ...
> > 
> > For reference, you should always use the most base class you can.  If
> > you can get the work done with the vrange API, use that.  If you're
> > dealing with an integer, use irange.  The int_range<> derived class
> > should only be used for defining a variable, so:
> > 
> > int_range<123> foobar; // Defined with storage.
> > 
> > if (is_a )
> > {
> >   irange  = as_a  (foobar);
> >   ...
> > }
> > 
> > // Use an irange reference here, not an int_range reference.
> > void somefunc (const irange )
> > {
> > }
> > 
> > Also, the reason irange does not have a destructor is because it has
> > no storage.  Only int_range<> has storage associated with it, so it is
> > the only one able to see if the range grew:
> 
> But when I do
> 
>  irange *r = new int_range<>;
>  delete r;
> 
> then it will fail to release memory?  Are virtual DTORs not exactly
> invented for this, when you delete on a reference/pointer to a base
> class but the object is really a derived one?

There is no memory to release above, as int_range<> does not grow by default. 
Only int_range_max can grow, and it's meant to live on the stack by design. 
That was the whole point:

typedef int_range<3, /*RESIZABLE=*/true> int_range_max;

I suppose if you specifically wanted to shoot yourself in the foot, you could
do:

irange *r = new int_range<5, true>;

...and that would fail to release memory, but why would you do that?  If you're
allocating a lot of ranges, we have the vrange_allocator written specifically
for that, which is a lot less memory intensive.  It's what we use in the cache.

If ranges are anywhere but the stack, they should go through the allocator
which will squish down things appropriately, as iranges are very big.  The
ipa_vr class also uses the allocator in order to keep the memory footprint
down.

I guess it wouldn't hurt to have a virtual destructor in irange or even vrange
for future proofing, but it's not needed right now.  I don't have any strong
opinions, unless there's a performance penalty.

[Bug tree-optimization/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP

2024-02-21 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

--- Comment #13 from Aldy Hernandez  ---
(In reply to Jakub Jelinek from comment #12)
> (In reply to Aldy Hernandez from comment #11)
> > Both patches pass test.  Up to the release maintainers to decide if they
> > want to include them in this release.  Otherwise, I'll queue them up for
> > later.
> 
> This is an important regression, why shouldn't it be included in GCC 14?
> GCC trunk is open for regression and documentation fixes...

Martin's change to IPA fixes the leak in the PR.  My changes are just cleanups,
as no other pass is using ranges in GC space.

[Bug tree-optimization/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP

2024-02-21 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

--- Comment #11 from Aldy Hernandez  ---
Both patches pass test.  Up to the release maintainers to decide if they want
to include them in this release.  Otherwise, I'll queue them up for later.

[Bug tree-optimization/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP

2024-02-21 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

--- Comment #10 from Aldy Hernandez  ---
Created attachment 57478
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57478=edit
Remove GTY support for vrange and friends

Bootstraps.  Tests are pending.

[Bug tree-optimization/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP

2024-02-21 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

--- Comment #9 from Aldy Hernandez  ---
Created attachment 57477
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57477=edit
Remove virtual from int_range destructor.

Bootstraps.  Tests are pending.

[Bug tree-optimization/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP

2024-02-21 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

--- Comment #8 from Aldy Hernandez  ---
(In reply to Richard Biener from comment #5)
> (In reply to Martin Jambor from comment #4)
> > The right place where to free stuff in lattices post-IPA would be in
> > ipa_node_params::~ipa_node_params() where we should iterate over lattices
> > and deinitialize them or perhaps destruct the array because since
> > ipcp_vr_lattice directly contains Value_Range which AFAIU directly contains
> > int_range_max which has a virtual destructor... does not look like a POD
> > anymore.  This has escaped me when I was looking at the IPA-VR changes but
> > hopefully it should not be too difficult to deal with.
> 
> OK, that might work for the IPA side.
> 
> It's quite unusual to introduce a virtual DTOR in the middle of the class
> hierarchy though.  Grepping I do see quite some direct uses of 'irange'
> and also 'vrange' which do not have the DTOR visible but 'irange' already
> exposes and uses 'maybe_resize'.  I think those should only be introduced
> in the class exposing the virtual DTOR (but why virtual?!).
> 
> Would be nice to have a picture of the range class hierarchies with
> pointers on which types to use in which circumstances ...

For reference, you should always use the most base class you can.  If
you can get the work done with the vrange API, use that.  If you're
dealing with an integer, use irange.  The int_range<> derived class
should only be used for defining a variable, so:

int_range<123> foobar; // Defined with storage.

if (is_a )
{
  irange  = as_a  (foobar);
  ...
}

// Use an irange reference here, not an int_range reference.
void somefunc (const irange )
{
}

Also, the reason irange does not have a destructor is because it has
no storage.  Only int_range<> has storage associated with it, so it is
the only one able to see if the range grew:

template
inline
int_range::~int_range ()
{
  if (RESIZABLE && m_base != m_ranges)
delete[] m_base;
}

The irange superclass does not have storage, just the m_base pointer.
OTOH, m_ranges[] lives in int_range<>:

template
class int_range : public irange
{
...
private:
  wide_int m_ranges[N*2];
};

This shouldn't be a problem because you should never be putting a raw irange
pointer in a container that you allocate/deallocate.

Does this help?  If so, I could definitely document it in value-range.h.

[Bug tree-optimization/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP

2024-02-21 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

--- Comment #7 from Aldy Hernandez  ---
Let me see if I can untangle things here.  Thanks for chasing
this down, BTW.

Value_Range doesn't need a CTOR because it has an int_range_max which
does have one (courtesy of int_range<>), so things get initialized
correctly.  Deleting a memory allocated container that has a
Value_Range also works, because of the ~int_range<> destructor:

+  {
+struct container
+{
+  Value_Range vr;
+};
+struct container *pointer = new container;
+pointer->vr.set_type (integer_type_node);
+for (int i = 0; i < 100; i += 5)
+  {
+   int_range<1> tmp (integer_type_node,
+ wi::shwi (i, TYPE_PRECISION (integer_type_node)),
+ wi::shwi (i+1, TYPE_PRECISION (integer_type_node)));
+   pointer->vr.union_ (tmp);
+  }
+delete pointer;

Deleting pointer causes us to call int_range::~int_range() which clean things appropriately.

So it looks like the problem is the missing destructor in IPA.

That being said, there's a few things I've noticed thanks to you guys'
analysis.

First, the virtual marker in the int_range<> destructor is not needed.
IPA + ranges went through various changes this release, and I believe
that was left over from when value_range (int_range<2>) was being
placed in GC space directly.

We went through various incantations this release wrt IPA storage of
ranges, both in GC and in the stack.  Ultimately we ended up with
ipa_vr, which I believe is the only use of ranges in GC space, and
furthermore it uses vrange_storage not vrange nor irange nor anything
else.  For that matter, vrange_storage doesn't even have pointers, so
we shouldn't need GTY markers at all.

It seems that since IPA is the only GC user of iranges, I think we
can safely remove GTY support from the entire vrange hierarchy.

The rule of thumb should be that only int_range<> and derivatives
should be in short-term storage, and GC users should use
vrange_storage (like ipa_vr is doing, or perhaps vrange_allocator
which automates the allocation).

[Bug tree-optimization/113752] [14 Regression] warning: ‘%s’ directive writing up to 10218 bytes into a region of size between 0 and 10240 [-Wformat-overflow=] since r14-261-g0ef3756adf078c

2024-02-12 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113752

Aldy Hernandez  changed:

   What|Removed |Added

 CC|aldyh at redhat dot com|

--- Comment #3 from Aldy Hernandez  ---
Can't reproduce on x86-64 on recent trunk:

abulafia:~/bld/t/gcc []$ ./xgcc -B./ -c -O3 -Wall a.c
abulafia:~/bld/t/gcc []$ cat a.c
char a[10256];
char b;
char *c, *g;
int d, e, f;
int sprintf(char *, char *, ...);
unsigned long strlen(char *);
int h(char *j) {
  if (strlen(j) + strlen(c) + strlen(g) + 32 > 10256)
return 0;
  sprintf(a, "%s:%s:%d:%d:%d:%c:%s\n", j, c, d, e, f, b, g);
  return 1;
}

[Bug tree-optimization/113735] ICE: in operator[], at vec.h:910 with _BitInt() at -O and above

2024-02-08 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113735

Aldy Hernandez  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #8 from Aldy Hernandez  ---
fixed in trunk

[Bug tree-optimization/113735] ICE: in operator[], at vec.h:910 with _BitInt() at -O and above

2024-02-06 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113735

--- Comment #6 from Aldy Hernandez  ---
Created attachment 57336
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57336=edit
Proposed patch #2

Thanks for the suggestion Jakub.

[Bug tree-optimization/113735] ICE: in operator[], at vec.h:910 with _BitInt() at -O and above

2024-02-06 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113735

--- Comment #4 from Aldy Hernandez  ---
Created attachment 57335
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57335=edit
Proposed patch

Patch in testing.

[Bug tree-optimization/113735] ICE: in operator[], at vec.h:910 with _BitInt() at -O and above

2024-02-05 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113735

--- Comment #2 from Aldy Hernandez  ---
(In reply to Jakub Jelinek from comment #1)
> Slightly tweaked, still -O1:
> char b;
> void bar (void);
> 
> void
> foo (_BitInt(6110) j)
> {
>   for (;;)
> {
>   _BitInt(10) k = b % j;
>   for (j = 6; j; --j)
>   if (k)
> bar ();
> }
> }
> 
> The ICE is in on:
> 721 if (!m_equiv[bb->index])
> because bb->index is larger than m_equiv size.
> The bitint lowering pass works by walking the IL and replacing some
> statements in there with lowered code, which can involve new basic blocks
> (either through splitting existing blocks or creating fresh new ones) and
> the like.  And asks the ranger about range of statements during that.
> Is that something that ranger doesn't/can't support?
> So, would I need to ensure to find out what ranges I'll need before making
> any changes,
> ask for them, remember them somewhere on the side and then use them during
> the transformations?

We're generally OK with changing IL, if the semantic of the statement doesn't
change.  For example, you couldn't change the meaning of a statement to mean
something totally different, as that would invalidate the cached value of the
resulting SSA.  The cache was tweaked in the last release or so to handle
growing SSA names.

I'm not totally sure on changing BBs, as VRP didn't add BB's in flight, IIRC. 
But I do see code in the relation oracle handling a change in the BB count:

void
equiv_oracle::limit_check (basic_block bb)
{
  int i = (bb) ? bb->index : last_basic_block_for_fn (cfun);
  if (i >= (int)m_equiv.length ())
m_equiv.safe_grow_cleared (last_basic_block_for_fn (cfun) + 1);
}

Interestingly, this function is never used.

Andrew, was this an oversight?

The attached patch seems to fix the problem, and it looks like all other
dereferences of m_equiv[] have a suitable check, or are looking for things
already known to have an entry.Andrew would know for sure.

diff --git a/gcc/value-relation.cc b/gcc/value-relation.cc
index 27f9ad61c0e..619ee5f0867 100644
--- a/gcc/value-relation.cc
+++ b/gcc/value-relation.cc
@@ -718,6 +718,7 @@ equiv_oracle::add_equiv_to_block (basic_block bb, bitmap
equiv_set)

   // Check if this is the first time a block has an equivalence added.
   // and create a header block. And set the summary for this block.
+  limit_check (bb);
   if (!m_equiv[bb->index])
 {
   ptr = (equiv_chain *) obstack_alloc (_chain_obstack,

[Bug tree-optimization/110603] [14 Regression] GCC, ICE: internal compiler error: in verify_range, at value-range.cc:1104 since r14-255

2024-01-10 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110603

Aldy Hernandez  changed:

   What|Removed |Added

 CC||amacleod at redhat dot com

--- Comment #4 from Aldy Hernandez  ---
(In reply to wierton from comment #0)
> The testing program:
> ```
> typedef long unsigned int size_t;
> void *memcpy(void *, const void *, size_t);
> int snprintf(char *restrict, size_t, const char *restrict, ...);
> 
> extern char a[2];
> void test_func_on_line_62(void) {
>   memcpy(a, "12", sizeof("12") - 1);
>   const int res = snprintf(0, 0, "%s", a);
>   if (res <= 3)
> do {
>   extern void f(void);
>   f();
> } while (0);
> }

The sprintf pass is ICEing because it's trying to build a nonsensical range of
[2,1].  Legacy irange tried harder with swapped ranges, but in the above case
it would actually drop to VARYING:

-  /* There's one corner case, if we had [C+1, C] before we now have
-that again.  But this represents an empty value range, so drop
-to varying in this case.  */

Which would cause the sprintf pass to set a global range of VARYING.  I can't
remember whether this meant nuking the known global range, or ignoring it
altogether (the semantics changed in the last release or two).  My guess is the
later, since set_range_info() improves ranges, never pessimizes them.

Now the reason we're passing swapped endpoints seems to originate in
get_range_strlen_dynamic().  It is setting a min of 2, courtesy of the nonzero
characters in the memcpy:

memcpy(a, "12", sizeof("12") - 1);

This comes from tree-ssa-strlen.c:
  if (!pdata->minlen && si->nonzero_chars)
{
  if (TREE_CODE (si->nonzero_chars) == INTEGER_CST)
pdata->minlen = si->nonzero_chars;


Further down we set a max of 1, stemming from the size of a[2] minus 1 for the
terminating null:

  if (TREE_CODE (size) == INTEGER_CST)
{
  ++off;   /* Increment for the terminating nul.  */
  tree toffset = build_int_cst (size_type_node, off);
  pdata->maxlen = fold_build2 (MINUS_EXPR, size_type_node,
size,
   toffset);
  pdata->maxbound = pdata->maxlen;
}

I don't understand this code enough to opine, but at the worst we could bail if
the ends are swapped.  It's no worse than what we had before.

[Bug tree-optimization/102958] std::u8string suboptimal compared to std::string

2024-01-09 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102958

--- Comment #7 from Aldy Hernandez  ---
Created attachment 57016
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57016=edit
preprocessed testcase with GCC13

Compile with -O2 -std=c++20

[Bug tree-optimization/102958] std::u8string suboptimal compared to std::string

2024-01-09 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102958

Aldy Hernandez  changed:

   What|Removed |Added

 CC||aldyh at gcc dot gnu.org,
   ||amacleod at redhat dot com,
   ||jakub at gcc dot gnu.org,
   ||jason at gcc dot gnu.org,
   ||jwakely.gcc at gmail dot com
Summary|std::u8string suboptimal|std::u8string suboptimal
   |compared to std::string,|compared to std::string
   |triggers warnings   |

--- Comment #6 from Aldy Hernandez  ---
Adjusting description, since not only have we disabled the warning in the C++
headers so this no longer warns, but the underlying problem has nothing to do
with warnings.

The char_traits specialization is opaque enough such that we can't
figure out the length:

  static _GLIBCXX17_CONSTEXPR size_t
  length(const char_type* __s)
  {
#if __cplusplus >= 201703L
if (std::__is_constant_evaluated())
  return __gnu_cxx::char_traits::length(__s);
#endif
size_t __i = 0;
while (!eq(__s[__i], char_type()))
  ++__i;
return __i;
  }

OTOH, the  specialization falls back to a __builtin_strlen which which is
trivial to see through.

I think this boils down to pinski's comment that we fail to see a string length
calculation in the following sequence, which survives all the way to the
.optimized dump:

   [local count: 8687547538]:
  # __i_46 = PHI <__i_22(3), 0(2)>
  __i_22 = __i_46 + 1;
  _24 = MEM[(const char_type &)"123456789" + __i_22 * 1];
  if (_24 != 0)
goto ; [89.00%]
  else
goto ; [11.00%]

I've seen variations of the above being turned into __builtin_strlen by fre,
ldist, as well as the strlen [ass.  Who's job is it perform this optimization?

[Bug tree-optimization/111458] [11 Regression] ICE in in dfs_enumerate_from, at cfganal.c:1560

2023-09-19 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111458

--- Comment #4 from Aldy Hernandez  ---
(In reply to Richard Biener from comment #3)

> This issue is still latent in the forward threader.  The backward threader
> calls this function from back_threader_profitability::profitable_path_p,
> so before doing any CFG changes.  In GCC 12 and later the remaining forward
> threader instances are all from DOM(?).

Yes.  Remaining uses of the forward threader are all from DOM.

I'd like to remove the forward threader in the next release, but it requires
being absolutely sure ranger catches everything the scoped tables do,
especially the floating point stuff.

[Bug c/111468] cannot express unordered equal in gimple FE

2023-09-19 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111468

--- Comment #6 from Aldy Hernandez  ---
(In reply to Richard Biener from comment #4)
> Fixed on trunk.

Sweet.  Thanks so much.  This really helps.

[Bug middle-end/111468] New: cannot express unordered equal in gimple FE

2023-09-18 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111468

Bug ID: 111468
   Summary: cannot express unordered equal in gimple FE
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: aldyh at gcc dot gnu.org
  Target Milestone: ---

Created attachment 55930
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55930=edit
Failing testcase

It looks like you can't express an UNEQ_EXPR in the GIMPLE FE.

Am I missing something?  The snippet below compiles fine with:

  if (x_1 == a_5(D))

but not with:

  if (x_1 u== a_5(D))

And at least tree-pretty-print.cc has:

case UNEQ_EXPR:
  return "u==";

I'm trying to come up with a threading test for some unordered stuff, and I'm
getting:

$ ./cc1 c.c -O2 -fgimple -quiet
c.c: In function ‘foo’:
c.c:19:10: error: expected ‘)’ before ‘u’
   19 |   if (x_1 u== a_5(D))
  |  ^~
  |  )

[snip snip]

Is there a blessed way of expressing unordered comparisons in the GIMPLE FE?

[Bug tree-optimization/110875] [14 Regression] Dead Code Elimination Regression since r14-2501-g285c9d042e9

2023-08-21 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110875

Aldy Hernandez  changed:

   What|Removed |Added

 CC||amacleod at redhat dot com

--- Comment #2 from Aldy Hernandez  ---
(In reply to Andrew Pinski from comment #1)

> I am super confused about VRP's ranges:
> We have the following that ranges that get exported and their relationships:
> Global Exported: a.8_105 = [irange] int [-2, 0]
>   _10 = a.8_105 + -1;
> Global Exported: _10 = [irange] int [-INF, -6][-3, -1][1, 2147483645]
>   _103 = (unsigned int) _10;
> Global Exported: _103 = [irange] unsigned int [1, 2147483645][2147483648,
> 4294967290][4294967294, +INF]
> Simplified relational if (_103 > 1)
>  into if (_103 != 1)
> 
> 
> Shouldn't the range of _10 just be [-3,-1] 
> If so _103 can't get 0 or 1 ? And then if that gets it right then the call
> to foo will go away.

[It looks like a caching issue of some kind.  Looping Andrew.]

Yes, that is indeed confusing.  _10 should have a more refined range.

Note that there's a dependency between a.8_105 and _10:

 [local count: 327784168]:
# f_lsm.17_26 = PHI 
# a.8_105 = PHI <0(3), _10(13)>
# b_lsm.19_33 = PHI 
# b_lsm_flag.20_53 = PHI <0(3), 1(13)>
# a_lsm.21_49 = PHI <_54(D)(3), _10(13)>
_9 = e.10_39 + 4294967061;
_10 = a.8_105 + -1;
if (_10 != -3(OVF))
  goto ; [94.50%]
else
  goto ; [5.50%]

This is what I see with --param=ranger-debug=tracegori in VRP2...

We first calculate a.8_105 to [-INF, -5][-2, 0][2, 2147483646]:

1140 range_of_stmt (a.8_105) at stmt a.8_105 = PHI <0(3), _10(13)>
1141   ROS dependence fill
 ROS dep fill (a.8_105) at stmt a.8_105 = PHI <0(3), _10(13)>
 ROS dep fill (_10) at stmt _10 = a.8_105 + -1;
1142 range_of_expr(a.8_105) at stmt _10 = a.8_105 + -1;
 TRUE : (1142) range_of_expr (a.8_105) [irange] int [-INF, -5][-2,
0][2, 2147483646]

Which we later refine with SCEV:

Statement _10 = a.8_105 + -1;
 is executed at most 2147483647 (bounded by 2147483647) + 1 times in loop 4.
   Loops range found for a.8_105: [irange] int [-2, 0] and calculated range
:[irange] int [-INF, -6][-2, 0][2, 2147483645]
 TRUE : (1140) range_of_stmt (a.8_105) [irange] int [-2, 0]
Global Exported: a.8_105 = [irange] int [-2, 0]

I have verified that range_of_expr after this point returns [-2, 0], so we know
both globally and locally this refined range.

However, when we try to fold _10 later on, we use the cached value instead of
recalculating with the new range for a.8_105:

Folding statement: _10 = a.8_105 + -1;
872  range_of_stmt (_10) at stmt _10 = a.8_105 + -1;
 TRUE : (872)  cached (_10) [irange] int [-INF, -6][-3, -1][1,
2147483645]

[Bug ipa/110753] [14 Regression] ICE in meet_with_1, at ipa-cp.cc:1057

2023-08-18 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110753

Aldy Hernandez  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Aldy Hernandez  ---
(In reply to Aldy Hernandez from comment #3)

> MASK 0xfffc VALUE 0x0 (low 2 bits are known to be 1, everything
> else is unknown).

FWIW, the above implies the low 2 bits are known to be 0, not 1.  Not that it
matters, as I just fixed this PR in trunk :).

[Bug ipa/110753] [14 Regression] ICE in meet_with_1, at ipa-cp.cc:1057

2023-08-03 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110753

Aldy Hernandez  changed:

   What|Removed |Added

 CC||aldyh at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |aldyh at gcc dot gnu.org

--- Comment #3 from Aldy Hernandez  ---
The intersection of two mask/value pairs is being pessimized incorrectly.

MASK 0x0 VALUE 0x1 (value is known to be 1)
MASK 0xfffc VALUE 0x0 (low 2 bits are known to be 1, everything
else is unknown).

irange_bitmask::intersect() is intersecting these to:

MASK 0x VALUE 0x0

(i.e. totally unknown value).

This is causing union_bitmask() to return a change when none actually occurred,
because we end up losing the bitmask.

Mine.

[Bug tree-optimization/107043] range information not used in popcount

2023-07-12 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107043

Aldy Hernandez  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Aldy Hernandez  ---
Finally... fixed in trunk :).

[Bug tree-optimization/107053] ones bits is not tracked and popcount is not tracked

2023-07-12 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107053

Aldy Hernandez  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Aldy Hernandez  ---
fixed in trunk

[Bug middle-end/110228] [13/14 Regression] llvm-16 miscompiled due to an maybe uninitialized and optimizations saying that the uninitialized has a nonzero bits of 1.

2023-06-20 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110228

--- Comment #17 from Aldy Hernandez  ---
(In reply to Andrew Pinski from comment #4)
> Phiopt does this:
> ```
>   v_13 == 1 ? 1 : LookupFlags_6
> Matching expression match.pd:1990, gimple-match-5.cc:23
> Matching expression match.pd:1990, gimple-match-5.cc:23
> Matching expression match.pd:2479, gimple-match-4.cc:35
> Matching expression match.pd:2482, gimple-match-3.cc:66
> Matching expression match.pd:2489, gimple-match-2.cc:58
> Matching expression match.pd:1947, gimple-match-7.cc:20
> Applying pattern match.pd:4742, gimple-match-7.cc:15326
> Folded into the sequence:
> _17 = v_13 == 1;
> _18 = LookupFlags_6 | _17;
> Removing basic block 5
> ;; basic block 5, loop depth 1
> ;;  pred:   4
> ;;  succ:   6
> ```
> As zero_one_valued_p returns true for LookupFlags_6 because
> tree_nonzero_bits/get_nonzero_bits returns 1 for it.
> 
> So it is ranger in the end that returns that it has a nonzero bits of 1.
> Which is completely wrong as LookupFlags_6 is defined by:
>   # LookupFlags_6 = PHI 
> 
> 
> Which has an uninitialized variable in it which is not defined at what its
> value would be 

BTW, it doesn't seem ranger would be involved here at all, since this bug
happens with -O1, and evrp nor VRP* run at this level.  The only other way
ranger could be involved at -O1 is through DOM threading's use of ranger, but
phiopt1 from which your dump above comes from, runs before DOM.

I realize further downthread y'all conclude this is an unitialized issue, but
I'm just trying to understand how ranger could have been involved.

FWIW, there are other passes which set global nonzero bits at -O1, most notably
CCP.  Theoretically any pass which calls set_range_info() or set_nonzero_bits()
would alter everyone's view of a range (strlen pass, PRE, DOM, sprintf, CCP,
etc).

And way down on the list of likely scenarios is fold_range() being called with
a global query object.  This would get ranger involved in folding something
with known global values (no additional lookups), but it's unlikely.  You
probably got a nonzero bits from some other pass.

[Bug middle-end/110233] [12/13/14 Regression] Wrong code at -Os on x86_64-linux-gnu

2023-06-14 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110233

--- Comment #4 from Aldy Hernandez  ---
FWIW, a less intrusive and probably more correct way of seeing what ranger
knows at this point would be to put a breakpoint where you're seeing:

Queued stmt for removal.  Folds to: 2147483647

This is in tree-ssa-propagate.cc.

There you can do:

(gdb) p cfun->x_range_query->dump (stderr)

which will pick up the current ranger and dump what it knows about the IL,
without tickling any new queries:

...
...
=== BB 3 
b.6_9   [irange] int [1334323000, +INF] NONZERO 0x7fff
 [local count: 357878153]:
_28 = b.6_9 * 2;
ivtmp.19_27 = 2147483647;
ivtmp.22_31 = (unsigned long) 
goto ; [100.00%]

_28 : [irange] int [+INF, +INF] NONZERO 0x7ffe

[Bug middle-end/110233] [12/13/14 Regression] Wrong code at -Os on x86_64-linux-gnu

2023-06-14 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110233

--- Comment #3 from Aldy Hernandez  ---
(In reply to Andrew Pinski from comment #2)
> VRP2/DOM3 produces the wrong folding for some reason:
> Folding statement: _27 = b.6_9 * 2;
> Queued stmt for removal.  Folds to: 2147483647
> 
> I don't uinderstand how it could get that from:

Putting a breakpoint in execute_ranger_vrp() for the VRP2 instance, and dumping
what the ranger knows with debug_ranger() we see:

=== BB 3 
b.6_9   [irange] int [1334323000, +INF] NONZERO 0x7fff
 [local count: 357878153]:
_28 = b.6_9 * 2;
ivtmp.19_27 = 2147483647;
ivtmp.22_31 = (unsigned long) 
goto ; [100.00%]

ivtmp.19_27 : [irange] unsigned int [2147483647, 2147483647] NONZERO 0x7fff
_28 : [irange] int [+INF, +INF] NONZERO 0x7ffe
ivtmp.22_31 : [irange] unsigned long [1, +INF]

So on entry to BB3, b.6_9 is [1334323000, +INF].  This comes from the 2->3
edge, which is the only incoming edge to BB3.

Isn't multiplying any number in that range an overflow, and thus undefined?

Ranger is coming back with:

_28 : [irange] int [+INF, +INF] NONZERO 0x7ffe

which sounds reasonable, and the reason you're seeing 2147483647.

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e

2023-05-23 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #38 from Aldy Hernandez  ---
(In reply to Andrew Macleod from comment #37)
> (In reply to CVS Commits from comment #36)
> 
> > For the curious, a particular hot spot for IPA in this area was:
> > 
> > ipcp_vr_lattice::meet_with_1 (const value_range *other_vr)
> > {
> > ...
> > ...
> >   value_range save (m_vr);
> >   m_vr.union_ (*other_vr);
> >   return m_vr != save;
> > }
> > 
> > The problem isn't the resizing (since we do that at most once) but the
> > fact that for some functions with lots of callers we end up a huge
> > range that gets copied and compared for every meet operation.  Maybe
> > the IPA algorithm could be adjusted somehow??.
> 
> isn't the the same thing as 
> 
>   return m_vr.union_ (*other_vr);
> 
> which should be much faster without all the copying... ?

Indeed.  That is in the committed version of the patch.  I just forgot to
update the message.

The reason I didn't originally do as you suggested was that there was a bug in
the union code that returned TRUE for a union that hadn't changed anything. 
That has also been fixed in trunk.

[Bug tree-optimization/109934] [14 Regression] Wrong code at -O3 on x86_64-linux-gnu

2023-05-23 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109934

Aldy Hernandez  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #5 from Aldy Hernandez  ---
fixed.  Thanks for reporting.

[Bug tree-optimization/109934] [14 Regression] Wrong code at -O3 on x86_64-linux-gnu

2023-05-23 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109934

--- Comment #3 from Aldy Hernandez  ---
Created attachment 55140
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55140=edit
untested patch in testing

[Bug tree-optimization/109934] [14 Regression] Wrong code at -O3 on x86_64-linux-gnu

2023-05-23 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109934

Aldy Hernandez  changed:

   What|Removed |Added

 CC||amacleod at redhat dot com
   Assignee|unassigned at gcc dot gnu.org  |aldyh at gcc dot gnu.org

--- Comment #2 from Aldy Hernandez  ---
Woah...this is a latent bug in irange::invert that seems to have been here for
a very long time.

In the loop unswitching code we do:

false_range = true_range;
if (!false_range.varying_p ()
&& !false_range.undefined_p ())
  false_range.invert ();

...and get the false_range all wrong:

(gdb) p debug(false_range)
[irange] unsigned int [44, 44][111, 111][222, 222] NONZERO 0xff
$40 = void
(gdb) n
(gdb) n
(gdb) n
(gdb) n
(gdb) p debug(false_range)
[irange] unsigned int [44, +INF]

In no universe is the inverse of the false_range equal to [44, +INF].  Whoops.

This craziness happens here:

  if (m_num_ranges == m_max_ranges
  && lower_bound () != type_min
  && upper_bound () != type_max)
{
  m_base[1] = type_max;
  m_num_ranges = 1;
  return;
}

I have no idea what we were trying to do here, but it's clearly wrong.  This
probably never triggered because the old legacy code didn't use this code, and
the new code used int_range<255> (int_range_max) which made it extremely
unlikely this would ever trigger.

[Bug tree-optimization/109920] [14 Regression] value-range.h: Mismatched new [] and delete

2023-05-23 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109920

Aldy Hernandez  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #5 from Aldy Hernandez  ---
Fixed.  David, thanks for the analysis.  You did 99% of the work here :).

[Bug tree-optimization/109920] [14 Regression] value-range.h: Mismatched new [] and delete

2023-05-22 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109920

--- Comment #2 from Aldy Hernandez  ---
Created attachment 55137
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55137=edit
untested patch

Does this fix the problem?

[Bug ipa/109886] UBSAN error: shift exponent 64 is too large for 64-bit type when compiling gcc.c-torture/compile/pr96796.c

2023-05-22 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109886

--- Comment #4 from Aldy Hernandez  ---
(In reply to Andrew Pinski from comment #3)
> (In reply to Aldy Hernandez from comment #2)
> > If irange::supports_p (TREE_TYPE (arg)) is true, we're talking about an
> > integer/pointer, but if range_cast is being called on a parm_type of
> > RECORD_TYPE, someone's trying to cast a structure to an integer.  Is that
> > the intent here, because that will not work with ranges??
> 
> That is correct. The generated code has a VIEW_CONVERT_EXR from an integer
> type to a RECORD_TYPE.

Eeeech.  In that case, then what you suggest is reasonable.  Bail if param_type
is not supported by the underlying range.  Maybe the IPA experts could opine?

[Bug ipa/109886] UBSAN error: shift exponent 64 is too large for 64-bit type when compiling gcc.c-torture/compile/pr96796.c

2023-05-22 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109886

--- Comment #2 from Aldy Hernandez  ---
(In reply to Andrew Pinski from comment #1)
> Breakpoint 6, range_cast (r=..., type=) at
> /home/apinski/src/upstream-gcc/gcc/gcc/range-op.cc:4853
> 4853  Value_Range tmp (r);
> 
> 
> Confirmed.
> The code looks like:
> ```
> int g_5, func_1_l_32, func_50___trans_tmp_31;
> ...
> int func_1() { func_50(g_3_4, g_5, func_1_l_32, 8, 3); }
> ...
> struct S0 *func_50(int p_51, struct S0 p_52, struct S1 p_53, int p_54,
>int p_55) {
> ...
> }
> ```
> 
> Code in gcc:
>   if (TREE_CODE (arg) == SSA_NAME
>   && param_type
>   /* Limit the ranger query to integral types as the rest
>  of this file uses value_range's, which only hold
>  integers and pointers.  */
>   && irange::supports_p (TREE_TYPE (arg))
>   && get_range_query (cfun)->range_of_expr (vr, arg)
>   && !vr.undefined_p ())
> {
>   value_range resvr = vr;
>   range_cast (resvr, param_type);
>   if (!resvr.undefined_p () && !resvr.varying_p ())
> ipa_set_jfunc_vr (jfunc, );
>   else
> gcc_assert (!jfunc->m_vr);
> }
>   else
> gcc_assert (!jfunc->m_vr);
> 
> 
> Maybe there should be an extra check for `irange::supports_p (param_type)`
> too to catch the case where param_type type is not supported at all.

If irange::supports_p (TREE_TYPE (arg)) is true, we're talking about an
integer/pointer, but if range_cast is being called on a parm_type of
RECORD_TYPE, someone's trying to cast a structure to an integer.  Is that the
intent here, because that will not work with ranges??

[Bug tree-optimization/109791] -Wstringop-overflow warning with -O3 and _GLIBCXX_USE_CXX11_ABI=0

2023-05-11 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109791

--- Comment #6 from Aldy Hernandez  ---

> but the issue with the PHI node remains unless we sink the  part
> (but there's many uses of __i_14).  I guess it's still the "easiest"
> way to get rangers help.  Aka make
> 
>  # __i_14' = PHI <1(10), 2(9)>
>  __i_14 =  + __i_14'; // would be a POINTER_PLUS_EXPR
> 
> it's probably still not a complete fix but maybe a good start.  Of course
> it increases the number of stmts - [ + 1B] was an 'invariant'
> (of course the PHI result isn't).  There's not a good place for this
> transform - we never "fold" PHIs (and this would be an un-folding).

Ughh, that sucks.  Let's see if Andrew has any ideas, but on my end I won't be
able to work on prange until much later this cycle-- assuming I finish what I
have on my plate.

[Bug tree-optimization/109791] -Wstringop-overflow warning with -O3 and _GLIBCXX_USE_CXX11_ABI=0

2023-05-11 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109791

--- Comment #4 from Aldy Hernandez  ---
BTW, another reason I had to drop the prange work was because IPA was doing
their own thing with ranges outside of the irange API, so it was harder to
separate things out.  So really, all this stuff was related to legacy, which is
mostly gone, and my upcoming work on IPA later this cycle should clean up the
rest of IPA.

[Bug tree-optimization/109791] -Wstringop-overflow warning with -O3 and _GLIBCXX_USE_CXX11_ABI=0

2023-05-11 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109791

--- Comment #3 from Aldy Hernandez  ---
(In reply to Richard Biener from comment #2)
> Confirmed.  This is a missed optimization, we fail to optimize the loop guard
> 
>  [local count: 329643239]:
> _4 = (unsigned long)   [(void *) + 2B];
> _6 = (unsigned long) __i_14;
> _50 = -_6;
> _100 = _4 + 18446744073709551615;
> _40 = _100 - _6;
> _41 = _40 > 13;
> if (_41 != 0)

Do we even get a non-zero range for _4?  I'm assuming that even if we get that,
we can't see that +2B minus 1 is also nonzero?

> 
> with __i_14 being
> 
>  [local count: 452186132]:
> # __i_14 = PHI <  [(void *) + 1B](10),  
> [(void *) + 2B](9)>
> 
> I'll note that the strlen pass runs before VRP (but after DOM), but I'll
> also note that likely ranger isn't very good with these kind of
> "symbolic" ranges?  How would we handle this?  Using two
> relations, __i_14 >=  + 1 && __i_14 <=  + 2?

Yeah, we don't do much with that.  Although the pointer equivalency class
should help in VRP's case.  We do some simple pointer tracking and even call
into gimple fold to simplify statements, but it's far from perfect and as you
say, strlen is running before VRP, so it wouldn't help in this case.

> 
> DOM has
> 
> Optimizing block #16
> 
> 1>>> STMT 1 =   [(void *) + 2B] ge_expr __i_14
> 1>>> STMT 1 =   [(void *) + 2B] ne_expr __i_14
> 1>>> STMT 0 =   [(void *) + 2B] eq_expr __i_14
> 1>>> STMT 1 =   [(void *) + 2B] gt_expr __i_14
> 1>>> STMT 0 =   [(void *) + 2B] le_expr __i_14
> Optimizing statement _4 = (unsigned long)   [(void *) + 2B];
> LKUP STMT _4 = nop_expr   [(void *) + 2B]
> 2>>> STMT _4 = nop_expr   [(void *) + 2B]
> Optimizing statement _6 = (unsigned long) __i_14;
> LKUP STMT _6 = nop_expr __i_14
> 2>>> STMT _6 = nop_expr __i_14
> Optimizing statement _50 = -_6;
>  Registering value_relation (_6 pe64 __i_14) (bb16) at _6 = (unsigned long)
> __i_14;
> LKUP STMT _50 = negate_expr _6
> 2>>> STMT _50 = negate_expr _6
> Optimizing statement _100 = _4 + 18446744073709551615;
> LKUP STMT _100 = _4 plus_expr 18446744073709551615
> 2>>> STMT _100 = _4 plus_expr 18446744073709551615
> Optimizing statement _40 = _100 - _6;
>  Registering value_relation (_100 < _4) (bb16) at _100 = _4 +
> 18446744073709551615;
> LKUP STMT _40 = _100 minus_expr _6
> 2>>> STMT _40 = _100 minus_expr _6
> Optimizing statement _41 = _40 > 13;
> LKUP STMT _41 = _40 gt_expr 13
> 2>>> STMT _41 = _40 gt_expr 13
> LKUP STMT _40 le_expr 14
> Optimizing statement if (_41 != 0)
> 
> Visiting conditional with predicate: if (_41 != 0)
> 
> With known ranges
> _41: [irange] bool VARYING

Ranger won't do anything, but can DOM's scoped tables do anything here?  The
hybrid threader in DOM first asks DOM's internal mechanism before asking ranger
(which seems useless in this case):

dom_jt_simplifier::simplify (gimple *stmt, gimple *within_stmt,
 basic_block bb, jt_state *state)
{
  /* First see if the conditional is in the hash table.  */
  tree cached_lhs =  m_avails->lookup_avail_expr (stmt, false, true);
  if (cached_lhs)
return cached_lhs;

  /* Otherwise call the ranger if possible.  */
  if (state)
return hybrid_jt_simplifier::simplify (stmt, within_stmt, bb, state);

  return NULL;
}

Our long term plan for pointers was providing a prange class and separating
pointers from irange.  This class would track zero/nonzero mostly, but it would
also do some pointer equivalence tracking, thus subsuming what we do with
pointer_equiv_analyzer which is currently restricted to VRP and is an
on-the-side hack.

The prototype I had for it last release tracked the equivalence plus an offset,
so we should be able to do arithmetic on a prange of say [ + X].  I had to
put it aside because the frange work took way longer than expected, plus legacy
got in the way.  Neither of this is an issue now, so we'll see.. but that's the
plan.  I should dust off those patches :-/.

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e

2023-05-10 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #35 from Aldy Hernandez  ---
We could also tweak the number of sub-ranges.  8 (??) also sounds good for a
few percent less in performance drop, if we care.

p.s. I did try the auto_vec thing for a 25% loss in VRP performance, even when
using address(), reserve(), etc.  I may have gotten something wrong, but it
didn't look promising.  I could post my attempt and someone could take it from
there, but I think the one irange approach with sensible defaults that
automatically grow to MAX, better.

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e

2023-05-10 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #34 from Aldy Hernandez  ---
Excellent ideas!

For that matter, we may get away with defaulting to 3 sub-ranges and always
resizing as needed (up to MAX).  Needing more than 3 sub-ranges is so rare
(less than 0.5% of the time), that the penalty will be small.

Furthermore, these defaults are sensible enough that we could nuke int_range
altogether and have irange have this small [3*2] array.  After all, most uses
of int_range now are int_range_max, since we never know the size of the
range (except in rare cases such as boolean_type_node, etc).  This would
simplify the code and get rid of the annoying templates which I hate.  No need
for int_range_max, or int_range, etc.  Just plain irange.

This would give us an irange of 592 bytes compared to 40912 for int_range_max
currently.  Plus, it's not that far away from int_range<2> which currently is
432 bytes, and as I mentioned, barely happens as we mostly use int_range_max.

I think this is a nice trade off.  Cleaner more flexible code, without
templates.

Oh... preliminary tests show it's a 5% penalty for VRP, which is more than
covered by our 13.22% improvement (plus Andrew's cache improvements) to VRP.

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e

2023-05-09 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #30 from Aldy Hernandez  ---
(In reply to Jakub Jelinek from comment #29)
> Comment on attachment 55031 [details]
> WIP patch for a dynamic int_range<>
> 
> What I meant is that by using a auto_vec could avoid reimplementing larger
> chunks of vec.h/vec.cc for this.
> Resizing by adding 10 more ranges can have higher compile time cost than
> what vec.cc (calculate_allocation_1) does - doubles the size each time for
> smaller sizes and and multiplying previous by 3 / 2 for larger ones.

Hmmm, that may require a lot more work reshuffling how irange is implemented
internally.  I'll take a peek, but I can't afford to spend another week on this
;-).  Also, adding 10 or multiplying by 2, or even adding 5 IIRC, didn't make
much of a difference in our benchmarks.

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e

2023-05-09 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #28 from Aldy Hernandez  ---
Created attachment 55031
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55031=edit
WIP patch for a dynamic int_range<>

Here's my WIP.

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e

2023-05-09 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #24 from Aldy Hernandez  ---
FYI.  I originally tried new/delete for allocation, which was a tad slower than
ggc_alloc / ggc_free.  Not too much, but measurable.

Another idea would be to have a global obstack which auto_int_range<> uses,
which gets freed at the end of every pass.  This saves us various ggc_free()s
throughout, especially at destruction.

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e

2023-05-09 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #23 from Aldy Hernandez  ---
An update on the int_range_max memory bloat work.

As Andrew mentioned, having int_range<25> solves the problem, but is just
kicking the can down the road.  I ran some stats on what we actually need on a
bootstrap, and 99.7% of ranges fit in a 3 sub-range range, but we need more to
represent switches, etc.

There's no clear winner for choosing , as the distribution for anything past
<3> is rather random.  What I did see was that at no point do we need more than
125 sub-ranges (on a set of .ii files from a boostrap).

I've implemented various alternatives using a dynamic approach similar to what
we do for auto_vec.  I played with allocating 2x as much as needed, and
allocating 10 or 20 more than needed, as well going from N to 255 in one go. 
All of it required some shuffling to make sure the penalty isn't much wrt
virtuals, etc, but I think the dynamic approach is the way to go.

The question is how much of a performance hit are we willing take in order to
reduce the memory footprint.  Memory to speed is a linear relationship here, so
we just have to pick a number we're happy with.

Here are some numbers for various sub-ranges (the sub-ranges grow automatically
in union, intersect, invert, and assignment, which are the methods that grow in
sub-ranges).

trunk (wide_ints <255>) =>  40912 bytes  
GCC 12 (trees <255>)=>   4112 bytes
auto_int_range<2>   =>432 bytes  (5.14% penalty for VRP)
auto_int_range<3>   =>592 bytes  (4.01% penalty)
auto_int_range<8>   =>   1392 bytes  (2.68% penalty)
auto_int_range<10>  =>   1712 bytes  (2.14% penalty)

As you can see, even at N=10, we're still 24X smaller than trunk and 2.4X
smaller than GCC12 for a 2.14% performance drop.

I'm tempted to just pick a number and tweak this later as we have ultimate
flexibility here.  Plus, we can also revert to a very small N, and have passes
that care about switches use their own temporaries (auto_int_range<20> or
such).

Note that we got a 13.22% improvement for the wide_int+legacy work, so even the
absolute worst case of a 5.14% penalty would have us sitting on a net 8.76%
improvement over GCC12.

Bike shedding welcome ;-)

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e

2023-05-03 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

Aldy Hernandez  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org

--- Comment #16 from Aldy Hernandez  ---
Is there a way to measure peak memory usage in the compiler?  I'd like to
gather some stats.  Also do we have a handful of programs we typically use to
measure memory usage?  ISTR that Richard (??) had a set of memory hog programs.

[Bug tree-optimization/54627] VRP uses lots of memory and compile-time

2023-05-03 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54627

Aldy Hernandez  changed:

   What|Removed |Added

 CC||aldyh at gcc dot gnu.org

--- Comment #3 from Aldy Hernandez  ---
Just ran into this while looking for something else.  Should this be closed, as
the implementation of VRP is different now?  (i.e. no equivalence set bitmaps).

[Bug ipa/109711] [14 regression] ICE (tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in verify_range, at value-range.cc:1060) when building ffmpeg-4.4.4 since r14-377-gc92b8be9b52b

2023-05-03 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109711

Aldy Hernandez  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #11 from Aldy Hernandez  ---
fixed

[Bug ipa/109711] [14 regression] ICE (tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in verify_range, at value-range.cc:1060) when building ffmpeg-4.4.4 since r14-377-gc92b8be9b52b

2023-05-03 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109711

Aldy Hernandez  changed:

   What|Removed |Added

  Attachment #54980|0   |1
is obsolete||

--- Comment #9 from Aldy Hernandez  ---
Created attachment 54982
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54982=edit
patch in testing

I think it's easiest to restore the legacy behavior for now.  A proper
conversion of IPA should follow.

[Bug ipa/109711] [14 regression] ICE (tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in verify_range, at value-range.cc:1060) when building ffmpeg-4.4.4 since r14-377-gc92b8be9b52b

2023-05-03 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109711

--- Comment #8 from Aldy Hernandez  ---
Created attachment 54980
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54980=edit
untested

This may fix it.

[Bug ipa/109711] [14 regression] ICE (tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in verify_range, at value-range.cc:1060) when building ffmpeg-4.4.4 since r14-377-gc92b8be9b52b

2023-05-03 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109711

Aldy Hernandez  changed:

   What|Removed |Added

 CC||amacleod at redhat dot com

--- Comment #7 from Aldy Hernandez  ---
It looks like IPA is trying to set a range for a float even though value_range
does not support floats.  This worked before because legacy silently set a
range of error_mark_node.

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e

2023-05-03 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #13 from Aldy Hernandez  ---
(In reply to Jakub Jelinek from comment #12)
> Perhaps change int_range to have the wide_ints as auto_vec with reserved
> space for a few (perhaps 3 (times 2))?

Our original implementation was exactly that, but we switched to the current
one years ago (maybe it coincided with our switch back to trees, I can't
remember).  But yes, we're discussing options.

  1   2   3   4   5   6   7   8   9   10   >