[Bug tree-optimization/111864] [12/13/14 Regression] Dead Code Elimination Regression

2024-03-15 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111864

--- Comment #3 from Aldy Hernandez  ---
(In reply to Jeffrey A. Law from comment #2)
> It almost looks like a costing issue.  The threaders find opportunities to
> thread all the incoming edges in the key block to the path which avoids the
> call..  But all the paths get rejected.  
> 
> 
> This is the  key block:
> 
> ;;   basic block 11, loop depth 0, count 976284897 (estimated locally, freq
> 0.9092), maybe hot
> ;;prev block 10, next block 12, flags: (NEW, REACHABLE, VISITED)
> ;;pred:   10 [99.8% (guessed)]  count:225266786 (estimated locally,
> freq 0.2098) (TRUE_VALUE,EXECUTABLE)
> ;;14 [100.0% (guessed)]  count:751018112 (estimated locally,
> freq 0.6994) (TRUE_VALUE,EXECUTABLE)
>   # o.10_11 = PHI <(10), o.10_28(14)>
>   _17 = o.10_11 == 
>   _20 = o.10_11 == 
>   _27 = _20 | _17;
>   if (_27 != 0)
> goto ; [58.87%]
>   else
> goto ; [41.13%]
> 
> It's pretty obvious that 10->11 can thread to 6.  If we look at the other
> incoming edge we need o.10_28 which comes from bb14 with the value   So
> that path should be 14->10->11 threading to 6.
> 
> But they get rejected during threadfull2.

In order for threadfull2 to thread 10->11->6, it would have to handle pointer
equivalences, but the new threader doesn't currently handle them.  In VRP we
are able to handle equivs through the pointer_equiv_analyzer class which
pushes/pops state.  It only works when traversing in dominator order, which I
suppose is technically the case when going down a path.

So... I think you could wire pointer_equiv_analyzer into the new threader and
get this, though it may also need to be taught about PHIs.  The class is very
simplistic, and was only a stopgap until we got pointer ranges (with points-to
info).

Question though, why didn't the old threader get this?  I see bb11 looks
exactly the same in DOM3, and the old threader does handle pointer equivalences
through its scoped tables.  Maybe the IL is too complicated?

If we want to fix this in this release, I think we could do it with
pointer_equiv_analyzer, but for future releases we should be able to get it for
free when pranges are implemented.  Half the work is already done...let's see
how far I get considering I'll be going on paternity leave for a few months
this year.

[Bug tree-optimization/114331] Missed optimization: indicate knownbits from dominating condition switch(trunc(a))

2024-03-15 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114331

--- Comment #13 from Aldy Hernandez  ---
I think y'all have it all figured out.  Basically,

operator_cast::op1_range() is solving num_5 in the equation:

[111,111] = (short int) num_5

Where lhs is:
(gdb) p debug(lhs)
[irange] short int [111, 111]

And the bitmask is implicit, but correctly calculated:
(gdb) p debug(lhs.get_bitmask ())
MASK 0x0 VALUE 0x6f

And the range for num_5 is just varying.
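
(For reading these dumps: a set bit in MASK means "unknown", while the
clear MASK bits are known to hold the corresponding VALUE bits, so MASK
0x0 VALUE 0x6f says every bit is known and the value is 0x6f, i.e. 111.
A generic sketch of the consistency check, not the irange_bitmask API:)

#include <stdint.h>

/* A candidate value is consistent with a MASK/VALUE pair when all of
   its known bits (the clear MASK bits) agree with VALUE.  */
static bool
consistent_p (uint64_t mask, uint64_t value, uint64_t candidate)
{
  return ((candidate ^ value) & ~mask) == 0;
}

/* consistent_p (0x0, 0x6f, 111) is true; any candidate other than 111
   is rejected, because MASK 0x0 says every bit is known.  */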

You are looking at the true side of the truncating_cast_p() conditional in
op1_range.  Essentially, once it starts building the range it ignores known
bitmasks, which causes it to return an overly conservative range.

And yes, bitmasks are calculated on demand per the comment in get_bitmask:

  // The mask inherent in the range is calculated on-demand.  For   
  // example, [0,255] does not have known bits set by default.  This
  // saves us considerable time, because setting it at creation incurs  
  // a large penalty for irange::set.  At the time of writing there 
  // was a 5% slowdown in VRP if we kept the mask precisely up to date  
  // at all times.  Instead, we default to -1 and set it when   
  // explicitly requested.  However, this function will always return   
  // the correct mask.

We may have to revisit this.  Perhaps it doesn't matter any more, and we could
keep bitmasks up to date.  It would definitely make the dumps easier to read. 
Though, at least for this testcase the bitmask is correctly returned with
get_bitmask().

I think the analysis in comment 11 is spot on.  And no Andrew, it's not *my*
code, it all comes from the bit-ccp code :-P.

fold_range() should call update_bitmask(), which eventually calls
bit_value_binop() in the CCP code.

And yes Jakub, as you have noticed, BIT_IOR_EXPR, BIT_XOR_EXPR, and likely
other operators may need to be tweaked to take bitmasks into account.  I
wouldn't be surprised if there's a lot of low hanging fruit in this space.
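
For example, the IOR case boils down to the usual known-bits rule: a
result bit is known to be 1 if it is known to be 1 in either operand,
and known to be 0 only if it is known to be 0 in both.  A generic
sketch in the same MASK/VALUE encoding (not the actual bit_value_binop
code):

#include <stdint.h>

/* Known-bits propagation for bitwise OR.  Set mask bits mean
   "unknown"; ~mask selects the known VALUE bits.  */
static void
bit_or_masks (uint64_t m1, uint64_t v1, uint64_t m2, uint64_t v2,
              uint64_t *mask_out, uint64_t *value_out)
{
  uint64_t known_one  = (~m1 & v1) | (~m2 & v2);
  uint64_t known_zero = (~m1 & ~v1) & (~m2 & ~v2);
  *mask_out  = ~(known_one | known_zero);
  *value_out = known_one;
}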

[Bug tree-optimization/114331] Missed optimization: indicate knownbits from dominating condition switch(trunc(a))

2024-03-14 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114331

--- Comment #6 from Aldy Hernandez  ---
(In reply to Andrew Macleod from comment #5)
> (In reply to Jakub Jelinek from comment #4)
> > Actually, looking at value-range.h, irange_bitmask doesn't have just the
> > mask but also value, so I wonder why it isn't able to figure this out.
> > I'd think m_mask should be ~0x and value 111.
> 
> _1 = (short int) num_5(D);
> _2 = (int) _1;
> switch (_1)
> 
> globally we know 
> _2 : [irange] int [-32768, 32767]
> and coming into bb 3:
> _2 : [irange] int [-32768, 32767]
> 2->3  _1 :  [irange] short int [111, 111]
> 2->3  _2 :  [irange] int [111, 111]
> 2->3  num_5(D) :[irange] int [-INF, -65537][-65425, -65425][111,
> 111][65536, +INF]
> 
> I guess what you are looking for is to add known bits to the range produced
> for num_5 on the outgoing edge.
> 
> This would have to be done in operator_cast::op1_range  where we are
> reducing it to known range of the switch.  I do not believe it makes any
> attempts to set bitmasks based on casts like that currently.
> 
> so, GORI working backwards has _1 = [irange] short int [111, 111] , so it
> would be satisfying the expression:
>   short int [111, 111] = (short int) num_5(D);
> 
> when evaluating num_5 in operator_cast::op1_range, in the truncating_cast_p
> section we could also see if there is a bitmask pattern for the LHS that
> could be applied to the range..  I think that basically what you are asking
> for.   Simple enough for a constant I suppose for starters.   Or maybe Aldy
> has a routine that picks bitmasks and values from ranges somewhere?

You may want to look at:

// Return the bitmask inherent in the range.

irange_bitmask
irange::get_bitmask_from_range () const
{
}

IIRC, this code was originally authored by Jakub and was embedded in the
tree-ssanames.cc code setting nonzero bits every time we set a global range.  I
just abstracted it and adapted it to work within our framework.
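
The idea is roughly this (a sketch of the concept, not the actual
function body): every bit above the most significant bit where the two
bounds differ is the same for all values in [lo, hi], so those bits
are known.

#include <stdint.h>

static void
bitmask_from_range (uint64_t lo, uint64_t hi,
                    uint64_t *mask, uint64_t *value)
{
  /* Smear the highest set bit of lo^hi downwards; everything at or
     below it can vary within the range and is therefore unknown.  */
  uint64_t diff = lo ^ hi;
  diff |= diff >> 1;  diff |= diff >> 2;  diff |= diff >> 4;
  diff |= diff >> 8;  diff |= diff >> 16; diff |= diff >> 32;
  *mask = diff;             /* set bits = unknown */
  *value = lo & ~diff;      /* the shared high bits */
}

/* For [111, 111] this yields MASK 0x0 VALUE 0x6f; for [0, 255] it
   yields MASK 0xff VALUE 0x0, i.e. only the high bits are known (to
   be zero).  */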

[Bug ipa/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP

2024-02-22 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

--- Comment #18 from Aldy Hernandez  ---
(In reply to rguent...@suse.de from comment #17)
> On Wed, 21 Feb 2024, aldyh at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476
> > 
> > --- Comment #8 from Aldy Hernandez  ---
> > (In reply to Richard Biener from comment #5)
> > > (In reply to Martin Jambor from comment #4)
> > > > The right place where to free stuff in lattices post-IPA would be in
> > > > ipa_node_params::~ipa_node_params() where we should iterate over 
> > > > lattices
> > > > and deinitialize them or perhaps destruct the array because since
> > > > ipcp_vr_lattice directly contains Value_Range which AFAIU directly 
> > > > contains
> > > > int_range_max which has a virtual destructor... does not look like a POD
> > > > anymore.  This has escaped me when I was looking at the IPA-VR changes 
> > > > but
> > > > hopefully it should not be too difficult to deal with.
> > > 
> > > OK, that might work for the IPA side.
> > > 
> > > It's quite unusual to introduce a virtual DTOR in the middle of the class
> > > hierarchy though.  Grepping I do see quite some direct uses of 'irange'
> > > and also 'vrange' which do not have the DTOR visible but 'irange' already
> > > exposes and uses 'maybe_resize'.  I think those should only be introduced
> > > in the class exposing the virtual DTOR (but why virtual?!).
> > > 
> > > Would be nice to have a picture of the range class hierarchies with
> > > pointers on which types to use in which circumstances ...
> > 
> > For reference, you should always use the most base class you can.  If
> > you can get the work done with the vrange API, use that.  If you're
> > dealing with an integer, use irange.  The int_range<> derived class
> > should only be used for defining a variable, so:
> > 
> > int_range<123> foobar; // Defined with storage.
> > 
> > if (is_a )
> > {
> >   irange  = as_a  (foobar);
> >   ...
> > }
> > 
> > // Use an irange reference here, not an int_range reference.
> > void somefunc (const irange )
> > {
> > }
> > 
> > Also, the reason irange does not have a destructor is because it has
> > no storage.  Only int_range<> has storage associated with it, so it is
> > the only one able to see if the range grew:
> 
> But when I do
> 
>  irange *r = new int_range<>;
>  delete r;
> 
> then it will fail to release memory?  Are virtual DTORs not exactly
> invented for this, when you delete on a reference/pointer to a base
> class but the object is really a derived one?

There is no memory to release above, as int_range<> does not grow by default. 
Only int_range_max can grow, and it's meant to live on the stack by design. 
That was the whole point:

typedef int_range<3, /*RESIZABLE=*/true> int_range_max;

I suppose if you specifically wanted to shoot yourself in the foot, you could
do:

irange *r = new int_range<5, true>;

...and that would fail to release memory, but why would you do that?  If you're
allocating a lot of ranges, we have the vrange_allocator written specifically
for that, which is a lot less memory intensive.  It's what we use in the cache.

If ranges are anywhere but the stack, they should go through the allocator
which will squish down things appropriately, as iranges are very big.  The
ipa_vr class also uses the allocator in order to keep the memory footprint
down.

I guess it wouldn't hurt to have a virtual destructor in irange or even vrange
for future proofing, but it's not needed right now.  I don't have any strong
opinions, unless there's a performance penalty.

[Bug tree-optimization/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP

2024-02-21 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

--- Comment #13 from Aldy Hernandez  ---
(In reply to Jakub Jelinek from comment #12)
> (In reply to Aldy Hernandez from comment #11)
> > Both patches pass test.  Up to the release maintainers to decide if they
> > want to include them in this release.  Otherwise, I'll queue them up for
> > later.
> 
> This is an important regression, why shouldn't it be included in GCC 14?
> GCC trunk is open for regression and documentation fixes...

Martin's change to IPA fixes the leak in the PR.  My changes are just cleanups,
as no other pass is using ranges in GC space.

[Bug tree-optimization/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP

2024-02-21 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

--- Comment #11 from Aldy Hernandez  ---
Both patches pass test.  Up to the release maintainers to decide if they want
to include them in this release.  Otherwise, I'll queue them up for later.

[Bug tree-optimization/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP

2024-02-21 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

--- Comment #10 from Aldy Hernandez  ---
Created attachment 57478
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57478=edit
Remove GTY support for vrange and friends

Bootstraps.  Tests are pending.

[Bug tree-optimization/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP

2024-02-21 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

--- Comment #9 from Aldy Hernandez  ---
Created attachment 57477
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57477=edit
Remove virtual from int_range destructor.

Bootstraps.  Tests are pending.

[Bug tree-optimization/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP

2024-02-21 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

--- Comment #8 from Aldy Hernandez  ---
(In reply to Richard Biener from comment #5)
> (In reply to Martin Jambor from comment #4)
> > The right place where to free stuff in lattices post-IPA would be in
> > ipa_node_params::~ipa_node_params() where we should iterate over lattices
> > and deinitialize them or perhaps destruct the array because since
> > ipcp_vr_lattice directly contains Value_Range which AFAIU directly contains
> > int_range_max which has a virtual destructor... does not look like a POD
> > anymore.  This has escaped me when I was looking at the IPA-VR changes but
> > hopefully it should not be too difficult to deal with.
> 
> OK, that might work for the IPA side.
> 
> It's quite unusual to introduce a virtual DTOR in the middle of the class
> hierarchy though.  Grepping I do see quite some direct uses of 'irange'
> and also 'vrange' which do not have the DTOR visible but 'irange' already
> exposes and uses 'maybe_resize'.  I think those should only be introduced
> in the class exposing the virtual DTOR (but why virtual?!).
> 
> Would be nice to have a picture of the range class hierarchies with
> pointers on which types to use in which circumstances ...

For reference, you should always use the most base class you can.  If
you can get the work done with the vrange API, use that.  If you're
dealing with an integer, use irange.  The int_range<> derived class
should only be used for defining a variable, so:

int_range<123> foobar; // Defined with storage.

if (is_a )
{
  irange  = as_a  (foobar);
  ...
}

// Use an irange reference here, not an int_range reference.
void somefunc (const irange )
{
}

Also, the reason irange does not have a destructor is because it has
no storage.  Only int_range<> has storage associated with it, so it is
the only one able to see if the range grew:

template
inline
int_range::~int_range ()
{
  if (RESIZABLE && m_base != m_ranges)
delete[] m_base;
}

The irange superclass does not have storage, just the m_base pointer.
OTOH, m_ranges[] lives in int_range<>:

template
class int_range : public irange
{
...
private:
  wide_int m_ranges[N*2];
};

This shouldn't be a problem because you should never be putting a raw irange
pointer in a container that you allocate/deallocate.
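
A stripped-down analogy of the layout (made-up names, not the real
classes) may make this clearer: the base class carries only a pointer,
the derived template owns the storage, and deleting through a base
pointer skips the derived destructor, which is exactly why you
shouldn't do it:

// Base: no storage of its own, just a pointer to whatever storage the
// derived class provides.
struct base_sketch
{
  int *m_base;
};

// Derived: owns the array, points m_base at it, and frees the heap
// buffer only if it ever grew past the embedded storage.
template<unsigned N, bool RESIZABLE = false>
struct derived_sketch : base_sketch
{
  int m_storage[N];
  derived_sketch () { m_base = m_storage; }
  ~derived_sketch ()
  {
    if (RESIZABLE && m_base != m_storage)
      delete[] m_base;
  }
};

static void
demo ()
{
  derived_sketch<3> on_stack;                    // fine: dies with the scope
  base_sketch *p = new derived_sketch<3, true>;  // dubious
  delete p;   // ~derived_sketch never runs (no virtual dtor), so a
              // grown buffer would leak
}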

Does this help?  If so, I could definitely document it in value-range.h.

[Bug tree-optimization/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP

2024-02-21 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

--- Comment #7 from Aldy Hernandez  ---
Let me see if I can untangle things here.  Thanks for chasing
this down, BTW.

Value_Range doesn't need a CTOR because it has an int_range_max which
does have one (courtesy of int_range<>), so things get initialized
correctly.  Deleting a memory allocated container that has a
Value_Range also works, because of the ~int_range<> destructor:

+  {
+struct container
+{
+  Value_Range vr;
+};
+struct container *pointer = new container;
+pointer->vr.set_type (integer_type_node);
+for (int i = 0; i < 100; i += 5)
+  {
+   int_range<1> tmp (integer_type_node,
+ wi::shwi (i, TYPE_PRECISION (integer_type_node)),
+ wi::shwi (i+1, TYPE_PRECISION (integer_type_node)));
+   pointer->vr.union_ (tmp);
+  }
+delete pointer;

Deleting the pointer causes us to call int_range::~int_range(), which cleans things up appropriately.

So it looks like the problem is the missing destructor in IPA.

That being said, there's a few things I've noticed thanks to you guys'
analysis.

First, the virtual marker in the int_range<> destructor is not needed.
IPA + ranges went through various changes this release, and I believe
that was left over from when value_range (int_range<2>) was being
placed in GC space directly.

We went through various incantations this release wrt IPA storage of
ranges, both in GC and in the stack.  Ultimately we ended up with
ipa_vr, which I believe is the only use of ranges in GC space, and
furthermore it uses vrange_storage not vrange nor irange nor anything
else.  For that matter, vrange_storage doesn't even have pointers, so
we shouldn't need GTY markers at all.

Since IPA seems to be the only GC user of iranges, I think we
can safely remove GTY support from the entire vrange hierarchy.

The rule of thumb should be that only int_range<> and derivatives
should be in short-term storage, and GC users should use
vrange_storage (like ipa_vr is doing, or perhaps vrange_allocator
which automates the allocation).

[Bug tree-optimization/113752] [14 Regression] warning: ‘%s’ directive writing up to 10218 bytes into a region of size between 0 and 10240 [-Wformat-overflow=] since r14-261-g0ef3756adf078c

2024-02-12 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113752

Aldy Hernandez  changed:

   What|Removed |Added

 CC|aldyh at redhat dot com|

--- Comment #3 from Aldy Hernandez  ---
Can't reproduce on x86-64 on recent trunk:

abulafia:~/bld/t/gcc []$ ./xgcc -B./ -c -O3 -Wall a.c
abulafia:~/bld/t/gcc []$ cat a.c
char a[10256];
char b;
char *c, *g;
int d, e, f;
int sprintf(char *, char *, ...);
unsigned long strlen(char *);
int h(char *j) {
  if (strlen(j) + strlen(c) + strlen(g) + 32 > 10256)
return 0;
  sprintf(a, "%s:%s:%d:%d:%d:%c:%s\n", j, c, d, e, f, b, g);
  return 1;
}

[Bug tree-optimization/113735] ICE: in operator[], at vec.h:910 with _BitInt() at -O and above

2024-02-08 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113735

Aldy Hernandez  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #8 from Aldy Hernandez  ---
fixed in trunk

[Bug tree-optimization/113735] ICE: in operator[], at vec.h:910 with _BitInt() at -O and above

2024-02-06 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113735

--- Comment #6 from Aldy Hernandez  ---
Created attachment 57336
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57336=edit
Proposed patch #2

Thanks for the suggestion Jakub.

[Bug tree-optimization/113735] ICE: in operator[], at vec.h:910 with _BitInt() at -O and above

2024-02-06 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113735

--- Comment #4 from Aldy Hernandez  ---
Created attachment 57335
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57335=edit
Proposed patch

Patch in testing.

[Bug tree-optimization/113735] ICE: in operator[], at vec.h:910 with _BitInt() at -O and above

2024-02-05 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113735

--- Comment #2 from Aldy Hernandez  ---
(In reply to Jakub Jelinek from comment #1)
> Slightly tweaked, still -O1:
> char b;
> void bar (void);
> 
> void
> foo (_BitInt(6110) j)
> {
>   for (;;)
> {
>   _BitInt(10) k = b % j;
>   for (j = 6; j; --j)
>   if (k)
> bar ();
> }
> }
> 
> The ICE is in on:
> 721 if (!m_equiv[bb->index])
> because bb->index is larger than m_equiv size.
> The bitint lowering pass works by walking the IL and replacing some
> statements in there with lowered code, which can involve new basic blocks
> (either through splitting existing blocks or creating fresh new ones) and
> the like.  And asks the ranger about range of statements during that.
> Is that something that ranger doesn't/can't support?
> So, would I need to ensure to find out what ranges I'll need before making
> any changes,
> ask for them, remember them somewhere on the side and then use them during
> the transformations?

We're generally OK with changing the IL, as long as the semantics of a statement
don't change.  For example, you couldn't change a statement to mean something
totally different, as that would invalidate the cached value of the resulting
SSA name.  The cache was tweaked in the last release or so to handle a growing
number of SSA names.

I'm not totally sure about changing BBs, as VRP didn't add BBs in flight, IIRC. 
But I do see code in the relation oracle handling a change in the BB count:

void
equiv_oracle::limit_check (basic_block bb)
{
  int i = (bb) ? bb->index : last_basic_block_for_fn (cfun);
  if (i >= (int)m_equiv.length ())
m_equiv.safe_grow_cleared (last_basic_block_for_fn (cfun) + 1);
}

Interestingly, this function is never used.

Andrew, was this an oversight?

The attached patch seems to fix the problem, and it looks like all other
dereferences of m_equiv[] have a suitable check, or are looking for things
already known to have an entry.  Andrew would know for sure.

diff --git a/gcc/value-relation.cc b/gcc/value-relation.cc
index 27f9ad61c0e..619ee5f0867 100644
--- a/gcc/value-relation.cc
+++ b/gcc/value-relation.cc
@@ -718,6 +718,7 @@ equiv_oracle::add_equiv_to_block (basic_block bb, bitmap
equiv_set)

   // Check if this is the first time a block has an equivalence added.
   // and create a header block. And set the summary for this block.
+  limit_check (bb);
   if (!m_equiv[bb->index])
 {
   ptr = (equiv_chain *) obstack_alloc (_chain_obstack,

[Bug tree-optimization/110603] [14 Regression] GCC, ICE: internal compiler error: in verify_range, at value-range.cc:1104 since r14-255

2024-01-10 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110603

Aldy Hernandez  changed:

   What|Removed |Added

 CC||amacleod at redhat dot com

--- Comment #4 from Aldy Hernandez  ---
(In reply to wierton from comment #0)
> The testing program:
> ```
> typedef long unsigned int size_t;
> void *memcpy(void *, const void *, size_t);
> int snprintf(char *restrict, size_t, const char *restrict, ...);
> 
> extern char a[2];
> void test_func_on_line_62(void) {
>   memcpy(a, "12", sizeof("12") - 1);
>   const int res = snprintf(0, 0, "%s", a);
>   if (res <= 3)
> do {
>   extern void f(void);
>   f();
> } while (0);
> }

The sprintf pass is ICEing because it's trying to build a nonsensical range of
[2,1].  Legacy irange tried harder with swapped ranges, but in the above case
it would actually drop to VARYING:

-  /* There's one corner case, if we had [C+1, C] before we now have
-that again.  But this represents an empty value range, so drop
-to varying in this case.  */

Which would cause the sprintf pass to set a global range of VARYING.  I can't
remember whether this meant nuking the known global range, or ignoring it
altogether (the semantics changed in the last release or two).  My guess is the
latter, since set_range_info() improves ranges, never pessimizes them.

Now the reason we're passing swapped endpoints seems to originate in
get_range_strlen_dynamic().  It is setting a min of 2, courtesy of the nonzero
characters in the memcpy:

memcpy(a, "12", sizeof("12") - 1);

This comes from tree-ssa-strlen.c:
  if (!pdata->minlen && si->nonzero_chars)
{
  if (TREE_CODE (si->nonzero_chars) == INTEGER_CST)
pdata->minlen = si->nonzero_chars;


Further down we set a max of 1, stemming from the size of a[2] minus 1 for the
terminating null:

  if (TREE_CODE (size) == INTEGER_CST)
{
  ++off;   /* Increment for the terminating nul.  */
  tree toffset = build_int_cst (size_type_node, off);
  pdata->maxlen = fold_build2 (MINUS_EXPR, size_type_node,
size,
   toffset);
  pdata->maxbound = pdata->maxlen;
}

I don't understand this code well enough to opine, but at worst we could bail if
the endpoints are swapped.  It's no worse than what we had before.
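
For the bail option, something along these lines would do (a sketch
only; the placement is hypothetical, the pdata fields are the ones
quoted above):

  /* If the computed endpoints crossed (e.g. the [2,1] above), drop
     them rather than hand the sprintf pass a nonsensical range.  */
  if (pdata->minlen
      && pdata->maxlen
      && TREE_CODE (pdata->minlen) == INTEGER_CST
      && TREE_CODE (pdata->maxlen) == INTEGER_CST
      && tree_int_cst_lt (pdata->maxlen, pdata->minlen))
    {
      pdata->minlen = NULL_TREE;
      pdata->maxlen = NULL_TREE;
    }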

[Bug tree-optimization/102958] std::u8string suboptimal compared to std::string

2024-01-09 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102958

--- Comment #7 from Aldy Hernandez  ---
Created attachment 57016
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57016=edit
preprocessed testcase with GCC13

Compile with -O2 -std=c++20

[Bug tree-optimization/102958] std::u8string suboptimal compared to std::string

2024-01-09 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102958

Aldy Hernandez  changed:

   What|Removed |Added

 CC||aldyh at gcc dot gnu.org,
   ||amacleod at redhat dot com,
   ||jakub at gcc dot gnu.org,
   ||jason at gcc dot gnu.org,
   ||jwakely.gcc at gmail dot com
Summary|std::u8string suboptimal|std::u8string suboptimal
   |compared to std::string,|compared to std::string
   |triggers warnings   |

--- Comment #6 from Aldy Hernandez  ---
Adjusting the description: we have disabled the warning in the C++ headers, so
this no longer warns, and in any case the underlying problem has nothing to do
with warnings.

The char_traits specialization is opaque enough that we can't
figure out the length:

  static _GLIBCXX17_CONSTEXPR size_t
  length(const char_type* __s)
  {
#if __cplusplus >= 201703L
if (std::__is_constant_evaluated())
  return __gnu_cxx::char_traits::length(__s);
#endif
size_t __i = 0;
while (!eq(__s[__i], char_type()))
  ++__i;
return __i;
  }

OTOH, the <char> specialization falls back to __builtin_strlen, which is
trivial to see through.

I think this boils down to pinski's comment that we fail to see a string length
calculation in the following sequence, which survives all the way to the
.optimized dump:

   [local count: 8687547538]:
  # __i_46 = PHI <__i_22(3), 0(2)>
  __i_22 = __i_46 + 1;
  _24 = MEM[(const char_type &)"123456789" + __i_22 * 1];
  if (_24 != 0)
goto ; [89.00%]
  else
goto ; [11.00%]

I've seen variations of the above being turned into __builtin_strlen by fre,
ldist, as well as the strlen pass.  Whose job is it to perform this optimization?
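
For reference, the idiom in question reduced to plain C (not the PR's
code): a hand-written scan that is semantically a strlen, and that
fre/ldist/strlen could in principle turn into __builtin_strlen:

unsigned long
my_length (const char *s)
{
  unsigned long i = 0;
  /* Same shape as the PHI + load + compare loop in the dump above.  */
  while (s[i] != 0)
    ++i;
  return i;   /* ideally this whole loop becomes __builtin_strlen (s) */
}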

[Bug tree-optimization/111458] [11 Regression] ICE in in dfs_enumerate_from, at cfganal.c:1560

2023-09-19 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111458

--- Comment #4 from Aldy Hernandez  ---
(In reply to Richard Biener from comment #3)

> This issue is still latent in the forward threader.  The backward threader
> calls this function from back_threader_profitability::profitable_path_p,
> so before doing any CFG changes.  In GCC 12 and later the remaining forward
> threader instances are all from DOM(?).

Yes.  Remaining uses of the forward threader are all from DOM.

I'd like to remove the forward threader in the next release, but it requires
being absolutely sure ranger catches everything the scoped tables do,
especially the floating point stuff.

[Bug c/111468] cannot express unordered equal in gimple FE

2023-09-19 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111468

--- Comment #6 from Aldy Hernandez  ---
(In reply to Richard Biener from comment #4)
> Fixed on trunk.

Sweet.  Thanks so much.  This really helps.

[Bug middle-end/111468] New: cannot express unordered equal in gimple FE

2023-09-18 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111468

Bug ID: 111468
   Summary: cannot express unordered equal in gimple FE
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: aldyh at gcc dot gnu.org
  Target Milestone: ---

Created attachment 55930
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55930=edit
Failing testcase

It looks like you can't express an UNEQ_EXPR in the GIMPLE FE.

Am I missing something?  The snippet below compiles fine with:

  if (x_1 == a_5(D))

but not with:

  if (x_1 u== a_5(D))

And at least tree-pretty-print.cc has:

case UNEQ_EXPR:
  return "u==";

I'm trying to come up with a threading test for some unordered stuff, and I'm
getting:

$ ./cc1 c.c -O2 -fgimple -quiet
c.c: In function ‘foo’:
c.c:19:10: error: expected ‘)’ before ‘u’
   19 |   if (x_1 u== a_5(D))
  |  ^~
  |  )

[snip snip]

Is there a blessed way of expressing unordered comparisons in the GIMPLE FE?

[Bug tree-optimization/110875] [14 Regression] Dead Code Elimination Regression since r14-2501-g285c9d042e9

2023-08-21 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110875

Aldy Hernandez  changed:

   What|Removed |Added

 CC||amacleod at redhat dot com

--- Comment #2 from Aldy Hernandez  ---
(In reply to Andrew Pinski from comment #1)

> I am super confused about VRP's ranges:
> We have the following that ranges that get exported and their relationships:
> Global Exported: a.8_105 = [irange] int [-2, 0]
>   _10 = a.8_105 + -1;
> Global Exported: _10 = [irange] int [-INF, -6][-3, -1][1, 2147483645]
>   _103 = (unsigned int) _10;
> Global Exported: _103 = [irange] unsigned int [1, 2147483645][2147483648,
> 4294967290][4294967294, +INF]
> Simplified relational if (_103 > 1)
>  into if (_103 != 1)
> 
> 
> Shouldn't the range of _10 just be [-3,-1] 
> If so _103 can't get 0 or 1 ? And then if that gets it right then the call
> to foo will go away.

[It looks like a caching issue of some kind.  Looping Andrew.]

Yes, that is indeed confusing.  _10 should have a more refined range.

Note that there's a dependency between a.8_105 and _10:

 [local count: 327784168]:
# f_lsm.17_26 = PHI 
# a.8_105 = PHI <0(3), _10(13)>
# b_lsm.19_33 = PHI 
# b_lsm_flag.20_53 = PHI <0(3), 1(13)>
# a_lsm.21_49 = PHI <_54(D)(3), _10(13)>
_9 = e.10_39 + 4294967061;
_10 = a.8_105 + -1;
if (_10 != -3(OVF))
  goto ; [94.50%]
else
  goto ; [5.50%]

This is what I see with --param=ranger-debug=tracegori in VRP2...

We first calculate a.8_105 to [-INF, -5][-2, 0][2, 2147483646]:

1140 range_of_stmt (a.8_105) at stmt a.8_105 = PHI <0(3), _10(13)>
1141   ROS dependence fill
 ROS dep fill (a.8_105) at stmt a.8_105 = PHI <0(3), _10(13)>
 ROS dep fill (_10) at stmt _10 = a.8_105 + -1;
1142 range_of_expr(a.8_105) at stmt _10 = a.8_105 + -1;
 TRUE : (1142) range_of_expr (a.8_105) [irange] int [-INF, -5][-2,
0][2, 2147483646]

Which we later refine with SCEV:

Statement _10 = a.8_105 + -1;
 is executed at most 2147483647 (bounded by 2147483647) + 1 times in loop 4.
   Loops range found for a.8_105: [irange] int [-2, 0] and calculated range
:[irange] int [-INF, -6][-2, 0][2, 2147483645]
 TRUE : (1140) range_of_stmt (a.8_105) [irange] int [-2, 0]
Global Exported: a.8_105 = [irange] int [-2, 0]

I have verified that range_of_expr after this point returns [-2, 0], so we know
both globally and locally this refined range.

However, when we try to fold _10 later on, we use the cached value instead of
recalculating with the new range for a.8_105:

Folding statement: _10 = a.8_105 + -1;
872  range_of_stmt (_10) at stmt _10 = a.8_105 + -1;
 TRUE : (872)  cached (_10) [irange] int [-INF, -6][-3, -1][1,
2147483645]

[Bug ipa/110753] [14 Regression] ICE in meet_with_1, at ipa-cp.cc:1057

2023-08-18 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110753

Aldy Hernandez  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Aldy Hernandez  ---
(In reply to Aldy Hernandez from comment #3)

> MASK 0xfffc VALUE 0x0 (low 2 bits are known to be 1, everything
> else is unknown).

FWIW, the above implies the low 2 bits are known to be 0, not 1.  Not that it
matters, as I just fixed this PR in trunk :).

[Bug ipa/110753] [14 Regression] ICE in meet_with_1, at ipa-cp.cc:1057

2023-08-03 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110753

Aldy Hernandez  changed:

   What|Removed |Added

 CC||aldyh at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |aldyh at gcc dot gnu.org

--- Comment #3 from Aldy Hernandez  ---
The intersection of two mask/value pairs is being pessimized incorrectly.

MASK 0x0 VALUE 0x1 (value is known to be 1)
MASK 0xfffc VALUE 0x0 (low 2 bits are known to be 1, everything
else is unknown).

irange_bitmask::intersect() is intersecting these to:

MASK 0x VALUE 0x0

(i.e. totally unknown value).

This is causing union_bitmask() to return a change when none actually occurred,
because we end up losing the bitmask.

Mine.
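
For the record, here is what the intersection should preserve (a
generic sketch of the mask/value math, not the irange_bitmask
implementation): a bit stays known if it is known in either input, and
if a bit is known in both inputs but disagrees, the intersection is
empty, which is the case for the two pairs above (bit 0 is known to be
1 in one and 0 in the other).

#include <stdint.h>

/* Intersect two value sets described as MASK/VALUE pairs (set mask
   bits = unknown).  Returns false when the known bits conflict,
   i.e. the intersection is the empty set.  */
static bool
intersect_masks (uint64_t m1, uint64_t v1, uint64_t m2, uint64_t v2,
                 uint64_t *mask_out, uint64_t *value_out)
{
  uint64_t known1 = ~m1, known2 = ~m2;
  uint64_t both = known1 & known2;
  if ((v1 & both) != (v2 & both))
    return false;                       /* conflicting known bits */
  *mask_out = ~(known1 | known2);
  *value_out = (v1 & known1) | (v2 & known2);
  return true;
}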

[Bug tree-optimization/107043] range information not used in popcount

2023-07-12 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107043

Aldy Hernandez  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Aldy Hernandez  ---
Finally... fixed in trunk :).

[Bug tree-optimization/107053] ones bits is not tracked and popcount is not tracked

2023-07-12 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107053

Aldy Hernandez  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Aldy Hernandez  ---
fixed in trunk

[Bug middle-end/110228] [13/14 Regression] llvm-16 miscompiled due to an maybe uninitialized and optimizations saying that the uninitialized has a nonzero bits of 1.

2023-06-20 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110228

--- Comment #17 from Aldy Hernandez  ---
(In reply to Andrew Pinski from comment #4)
> Phiopt does this:
> ```
>   v_13 == 1 ? 1 : LookupFlags_6
> Matching expression match.pd:1990, gimple-match-5.cc:23
> Matching expression match.pd:1990, gimple-match-5.cc:23
> Matching expression match.pd:2479, gimple-match-4.cc:35
> Matching expression match.pd:2482, gimple-match-3.cc:66
> Matching expression match.pd:2489, gimple-match-2.cc:58
> Matching expression match.pd:1947, gimple-match-7.cc:20
> Applying pattern match.pd:4742, gimple-match-7.cc:15326
> Folded into the sequence:
> _17 = v_13 == 1;
> _18 = LookupFlags_6 | _17;
> Removing basic block 5
> ;; basic block 5, loop depth 1
> ;;  pred:   4
> ;;  succ:   6
> ```
> As zero_one_valued_p returns true for LookupFlags_6 because
> tree_nonzero_bits/get_nonzero_bits returns 1 for it.
> 
> So it is ranger in the end that returns that it has a nonzero bits of 1.
> Which is completely wrong as LookupFlags_6 is defined by:
>   # LookupFlags_6 = PHI 
> 
> 
> Which has an uninitialized variable in it which is not defined at what its
> value would be 

BTW, it doesn't seem ranger would be involved here at all, since this bug
happens with -O1, and neither evrp nor VRP* runs at this level.  The only other
way ranger could be involved at -O1 is through DOM threading's use of ranger,
but phiopt1, which your dump above comes from, runs before DOM.

I realize further downthread y'all conclude this is an uninitialized issue, but
I'm just trying to understand how ranger could have been involved.

FWIW, there are other passes which set global nonzero bits at -O1, most notably
CCP.  Theoretically any pass which calls set_range_info() or set_nonzero_bits()
would alter everyone's view of a range (strlen pass, PRE, DOM, sprintf, CCP,
etc).

And way down on the list of likely scenarios is fold_range() being called with
a global query object.  This would get ranger involved in folding something
with known global values (no additional lookups), but it's unlikely.  You
probably got the nonzero bits from some other pass.

[Bug middle-end/110233] [12/13/14 Regression] Wrong code at -Os on x86_64-linux-gnu

2023-06-14 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110233

--- Comment #4 from Aldy Hernandez  ---
FWIW, a less intrusive and probably more correct way of seeing what ranger
knows at this point would be to put a breakpoint where you're seeing:

Queued stmt for removal.  Folds to: 2147483647

This is in tree-ssa-propagate.cc.

There you can do:

(gdb) p cfun->x_range_query->dump (stderr)

which will pick up the current ranger and dump what it knows about the IL,
without tickling any new queries:

...
...
=== BB 3 
b.6_9   [irange] int [1334323000, +INF] NONZERO 0x7fff
 [local count: 357878153]:
_28 = b.6_9 * 2;
ivtmp.19_27 = 2147483647;
ivtmp.22_31 = (unsigned long) 
goto ; [100.00%]

_28 : [irange] int [+INF, +INF] NONZERO 0x7ffe

[Bug middle-end/110233] [12/13/14 Regression] Wrong code at -Os on x86_64-linux-gnu

2023-06-14 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110233

--- Comment #3 from Aldy Hernandez  ---
(In reply to Andrew Pinski from comment #2)
> VRP2/DOM3 produces the wrong folding for some reason:
> Folding statement: _27 = b.6_9 * 2;
> Queued stmt for removal.  Folds to: 2147483647
> 
> I don't uinderstand how it could get that from:

Putting a breakpoint in execute_ranger_vrp() for the VRP2 instance, and dumping
what the ranger knows with debug_ranger() we see:

=== BB 3 
b.6_9   [irange] int [1334323000, +INF] NONZERO 0x7fff
 [local count: 357878153]:
_28 = b.6_9 * 2;
ivtmp.19_27 = 2147483647;
ivtmp.22_31 = (unsigned long) 
goto ; [100.00%]

ivtmp.19_27 : [irange] unsigned int [2147483647, 2147483647] NONZERO 0x7fff
_28 : [irange] int [+INF, +INF] NONZERO 0x7ffe
ivtmp.22_31 : [irange] unsigned long [1, +INF]

So on entry to BB3, b.6_9 is [1334323000, +INF].  This comes from the 2->3
edge, which is the only incoming edge to BB3.

Isn't multiplying any number in that range an overflow, and thus undefined?

Ranger is coming back with:

_28 : [irange] int [+INF, +INF] NONZERO 0x7ffe

which sounds reasonable, and the reason you're seeing 2147483647.
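
To spell out the arithmetic: 1334323000 * 2 = 2668646000, which is
larger than INT_MAX (2147483647), so every value in [1334323000, +INF]
overflows when doubled, and the [+INF, +INF] result simply reflects
that.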

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e

2023-05-23 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #38 from Aldy Hernandez  ---
(In reply to Andrew Macleod from comment #37)
> (In reply to CVS Commits from comment #36)
> 
> > For the curious, a particular hot spot for IPA in this area was:
> > 
> > ipcp_vr_lattice::meet_with_1 (const value_range *other_vr)
> > {
> > ...
> > ...
> >   value_range save (m_vr);
> >   m_vr.union_ (*other_vr);
> >   return m_vr != save;
> > }
> > 
> > The problem isn't the resizing (since we do that at most once) but the
> > fact that for some functions with lots of callers we end up a huge
> > range that gets copied and compared for every meet operation.  Maybe
> > the IPA algorithm could be adjusted somehow??.
> 
> isn't the the same thing as 
> 
>   return m_vr.union_ (*other_vr);
> 
> which should be much faster without all the copying... ?

Indeed.  That is in the committed version of the patch.  I just forgot to
update the message.

The reason I didn't originally do as you suggested was that there was a bug in
the union code that returned TRUE for a union that hadn't changed anything. 
That has also been fixed in trunk.

[Bug tree-optimization/109934] [14 Regression] Wrong code at -O3 on x86_64-linux-gnu

2023-05-23 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109934

Aldy Hernandez  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #5 from Aldy Hernandez  ---
fixed.  Thanks for reporting.

[Bug tree-optimization/109934] [14 Regression] Wrong code at -O3 on x86_64-linux-gnu

2023-05-23 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109934

--- Comment #3 from Aldy Hernandez  ---
Created attachment 55140
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55140=edit
untested patch in testing

[Bug tree-optimization/109934] [14 Regression] Wrong code at -O3 on x86_64-linux-gnu

2023-05-23 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109934

Aldy Hernandez  changed:

   What|Removed |Added

 CC||amacleod at redhat dot com
   Assignee|unassigned at gcc dot gnu.org  |aldyh at gcc dot gnu.org

--- Comment #2 from Aldy Hernandez  ---
Woah...this is a latent bug in irange::invert that seems to have been here for
a very long time.

In the loop unswitching code we do:

false_range = true_range;
if (!false_range.varying_p ()
&& !false_range.undefined_p ())
  false_range.invert ();

...and get the false_range all wrong:

(gdb) p debug(false_range)
[irange] unsigned int [44, 44][111, 111][222, 222] NONZERO 0xff
$40 = void
(gdb) n
(gdb) n
(gdb) n
(gdb) n
(gdb) p debug(false_range)
[irange] unsigned int [44, +INF]

In no universe is the inverse of the false_range equal to [44, +INF].  Whoops.
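
For the record, the correct inverse of unsigned int [44, 44][111,
111][222, 222] over the full unsigned int domain would be:

[irange] unsigned int [0, 43][45, 110][112, 221][223, +INF]

i.e. everything except 44, 111 and 222, which is nowhere near
[44, +INF].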

This craziness happens here:

  if (m_num_ranges == m_max_ranges
  && lower_bound () != type_min
  && upper_bound () != type_max)
{
  m_base[1] = type_max;
  m_num_ranges = 1;
  return;
}

I have no idea what we were trying to do here, but it's clearly wrong.  This
probably never triggered because the old legacy code didn't use this code, and
the new code used int_range<255> (int_range_max) which made it extremely
unlikely this would ever trigger.

[Bug tree-optimization/109920] [14 Regression] value-range.h: Mismatched new [] and delete

2023-05-23 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109920

Aldy Hernandez  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #5 from Aldy Hernandez  ---
Fixed.  David, thanks for the analysis.  You did 99% of the work here :).

[Bug tree-optimization/109920] [14 Regression] value-range.h: Mismatched new [] and delete

2023-05-22 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109920

--- Comment #2 from Aldy Hernandez  ---
Created attachment 55137
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55137=edit
untested patch

Does this fix the problem?

[Bug ipa/109886] UBSAN error: shift exponent 64 is too large for 64-bit type when compiling gcc.c-torture/compile/pr96796.c

2023-05-22 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109886

--- Comment #4 from Aldy Hernandez  ---
(In reply to Andrew Pinski from comment #3)
> (In reply to Aldy Hernandez from comment #2)
> > If irange::supports_p (TREE_TYPE (arg)) is true, we're talking about an
> > integer/pointer, but if range_cast is being called on a parm_type of
> > RECORD_TYPE, someone's trying to cast a structure to an integer.  Is that
> > the intent here, because that will not work with ranges??
> 
> That is correct. The generated code has a VIEW_CONVERT_EXR from an integer
> type to a RECORD_TYPE.

Eeeech.  In that case, what you suggest is reasonable.  Bail if param_type
is not supported by the underlying range.  Maybe the IPA experts could opine?
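
Concretely, the extra check Andrew suggests would slot in next to the
existing one (a sketch on top of the code quoted above):

      && irange::supports_p (TREE_TYPE (arg))
      /* Also bail when the *target* type is not representable as a
         range (e.g. the RECORD_TYPE param_type here), instead of
         range_cast'ing into it.  */
      && irange::supports_p (param_type)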

[Bug ipa/109886] UBSAN error: shift exponent 64 is too large for 64-bit type when compiling gcc.c-torture/compile/pr96796.c

2023-05-22 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109886

--- Comment #2 from Aldy Hernandez  ---
(In reply to Andrew Pinski from comment #1)
> Breakpoint 6, range_cast (r=..., type=) at
> /home/apinski/src/upstream-gcc/gcc/gcc/range-op.cc:4853
> 4853  Value_Range tmp (r);
> 
> 
> Confirmed.
> The code looks like:
> ```
> int g_5, func_1_l_32, func_50___trans_tmp_31;
> ...
> int func_1() { func_50(g_3_4, g_5, func_1_l_32, 8, 3); }
> ...
> struct S0 *func_50(int p_51, struct S0 p_52, struct S1 p_53, int p_54,
>int p_55) {
> ...
> }
> ```
> 
> Code in gcc:
>   if (TREE_CODE (arg) == SSA_NAME
>   && param_type
>   /* Limit the ranger query to integral types as the rest
>  of this file uses value_range's, which only hold
>  integers and pointers.  */
>   && irange::supports_p (TREE_TYPE (arg))
>   && get_range_query (cfun)->range_of_expr (vr, arg)
>   && !vr.undefined_p ())
> {
>   value_range resvr = vr;
>   range_cast (resvr, param_type);
>   if (!resvr.undefined_p () && !resvr.varying_p ())
> ipa_set_jfunc_vr (jfunc, );
>   else
> gcc_assert (!jfunc->m_vr);
> }
>   else
> gcc_assert (!jfunc->m_vr);
> 
> 
> Maybe there should be an extra check for `irange::supports_p (param_type)`
> too to catch the case where param_type type is not supported at all.

If irange::supports_p (TREE_TYPE (arg)) is true, we're talking about an
integer/pointer, but if range_cast is being called on a param_type of
RECORD_TYPE, someone's trying to cast a structure to an integer.  Is that the
intent here?  Because that will not work with ranges.

[Bug tree-optimization/109791] -Wstringop-overflow warning with -O3 and _GLIBCXX_USE_CXX11_ABI=0

2023-05-11 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109791

--- Comment #6 from Aldy Hernandez  ---

> but the issue with the PHI node remains unless we sink the  part
> (but there's many uses of __i_14).  I guess it's still the "easiest"
> way to get rangers help.  Aka make
> 
>  # __i_14' = PHI <1(10), 2(9)>
>  __i_14 =  + __i_14'; // would be a POINTER_PLUS_EXPR
> 
> it's probably still not a complete fix but maybe a good start.  Of course
> it increases the number of stmts - [ + 1B] was an 'invariant'
> (of course the PHI result isn't).  There's not a good place for this
> transform - we never "fold" PHIs (and this would be an un-folding).

Ughh, that sucks.  Let's see if Andrew has any ideas, but on my end I won't be
able to work on prange until much later this cycle-- assuming I finish what I
have on my plate.

[Bug tree-optimization/109791] -Wstringop-overflow warning with -O3 and _GLIBCXX_USE_CXX11_ABI=0

2023-05-11 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109791

--- Comment #4 from Aldy Hernandez  ---
BTW, another reason I had to drop the prange work was that IPA was doing
its own thing with ranges outside of the irange API, so it was harder to
separate things out.  So really, all this stuff was related to legacy, which is
mostly gone, and my upcoming work on IPA later this cycle should clean up the
rest of IPA.

[Bug tree-optimization/109791] -Wstringop-overflow warning with -O3 and _GLIBCXX_USE_CXX11_ABI=0

2023-05-11 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109791

--- Comment #3 from Aldy Hernandez  ---
(In reply to Richard Biener from comment #2)
> Confirmed.  This is a missed optimization, we fail to optimize the loop guard
> 
>  [local count: 329643239]:
> _4 = (unsigned long)   [(void *) + 2B];
> _6 = (unsigned long) __i_14;
> _50 = -_6;
> _100 = _4 + 18446744073709551615;
> _40 = _100 - _6;
> _41 = _40 > 13;
> if (_41 != 0)

Do we even get a non-zero range for _4?  I'm assuming that even if we get that,
we can't see that +2B minus 1 is also nonzero?

> 
> with __i_14 being
> 
>  [local count: 452186132]:
> # __i_14 = PHI <  [(void *) + 1B](10),  
> [(void *) + 2B](9)>
> 
> I'll note that the strlen pass runs before VRP (but after DOM), but I'll
> also note that likely ranger isn't very good with these kind of
> "symbolic" ranges?  How would we handle this?  Using two
> relations, __i_14 >=  + 1 && __i_14 <=  + 2?

Yeah, we don't do much with that.  Although the pointer equivalency class
should help in VRP's case.  We do some simple pointer tracking and even call
into gimple fold to simplify statements, but it's far from perfect and as you
say, strlen is running before VRP, so it wouldn't help in this case.

> 
> DOM has
> 
> Optimizing block #16
> 
> 1>>> STMT 1 =   [(void *) + 2B] ge_expr __i_14
> 1>>> STMT 1 =   [(void *) + 2B] ne_expr __i_14
> 1>>> STMT 0 =   [(void *) + 2B] eq_expr __i_14
> 1>>> STMT 1 =   [(void *) + 2B] gt_expr __i_14
> 1>>> STMT 0 =   [(void *) + 2B] le_expr __i_14
> Optimizing statement _4 = (unsigned long)   [(void *) + 2B];
> LKUP STMT _4 = nop_expr   [(void *) + 2B]
> 2>>> STMT _4 = nop_expr   [(void *) + 2B]
> Optimizing statement _6 = (unsigned long) __i_14;
> LKUP STMT _6 = nop_expr __i_14
> 2>>> STMT _6 = nop_expr __i_14
> Optimizing statement _50 = -_6;
>  Registering value_relation (_6 pe64 __i_14) (bb16) at _6 = (unsigned long)
> __i_14;
> LKUP STMT _50 = negate_expr _6
> 2>>> STMT _50 = negate_expr _6
> Optimizing statement _100 = _4 + 18446744073709551615;
> LKUP STMT _100 = _4 plus_expr 18446744073709551615
> 2>>> STMT _100 = _4 plus_expr 18446744073709551615
> Optimizing statement _40 = _100 - _6;
>  Registering value_relation (_100 < _4) (bb16) at _100 = _4 +
> 18446744073709551615;
> LKUP STMT _40 = _100 minus_expr _6
> 2>>> STMT _40 = _100 minus_expr _6
> Optimizing statement _41 = _40 > 13;
> LKUP STMT _41 = _40 gt_expr 13
> 2>>> STMT _41 = _40 gt_expr 13
> LKUP STMT _40 le_expr 14
> Optimizing statement if (_41 != 0)
> 
> Visiting conditional with predicate: if (_41 != 0)
> 
> With known ranges
> _41: [irange] bool VARYING

Ranger won't do anything, but can DOM's scoped tables do anything here?  The
hybrid threader in DOM first asks DOM's internal mechanism before asking ranger
(which seems useless in this case):

dom_jt_simplifier::simplify (gimple *stmt, gimple *within_stmt,
 basic_block bb, jt_state *state)
{
  /* First see if the conditional is in the hash table.  */
  tree cached_lhs =  m_avails->lookup_avail_expr (stmt, false, true);
  if (cached_lhs)
return cached_lhs;

  /* Otherwise call the ranger if possible.  */
  if (state)
return hybrid_jt_simplifier::simplify (stmt, within_stmt, bb, state);

  return NULL;
}

Our long term plan for pointers was providing a prange class and separating
pointers from irange.  This class would track zero/nonzero mostly, but it would
also do some pointer equivalence tracking, thus subsuming what we do with
pointer_equiv_analyzer which is currently restricted to VRP and is an
on-the-side hack.

The prototype I had for it last release tracked the equivalence plus an offset,
so we should be able to do arithmetic on a prange of say [ + X].  I had to
put it aside because the frange work took way longer than expected, plus legacy
got in the way.  Neither of these is an issue now, so we'll see... but that's
the plan.  I should dust off those patches :-/.

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e

2023-05-10 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #35 from Aldy Hernandez  ---
We could also tweak the number of sub-ranges.  8 (??) also sounds good for a
few percent less in performance drop, if we care.

p.s. I did try the auto_vec thing for a 25% loss in VRP performance, even when
using address(), reserve(), etc.  I may have gotten something wrong, but it
didn't look promising.  I could post my attempt and someone could take it from
there, but I think the single-irange approach, with sensible defaults that
automatically grow to MAX, is better.

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e

2023-05-10 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #34 from Aldy Hernandez  ---
Excellent ideas!

For that matter, we may get away with defaulting to 3 sub-ranges and always
resizing as needed (up to MAX).  Needing more than 3 sub-ranges is so rare
(less than 0.5% of the time) that the penalty will be small.

Furthermore, these defaults are sensible enough that we could nuke int_range
altogether and have irange itself carry this small [3*2] array.  After all, most uses
of int_range now are int_range_max, since we never know the size of the
range (except in rare cases such as boolean_type_node, etc).  This would
simplify the code and get rid of the annoying templates which I hate.  No need
for int_range_max, or int_range, etc.  Just plain irange.

This would give us an irange of 592 bytes compared to 40912 for int_range_max
currently.  Plus, it's not that far away from int_range<2>, which currently is
432 bytes and which, as I mentioned, is barely used since we mostly use
int_range_max.

I think this is a nice trade off.  Cleaner more flexible code, without
templates.
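
Roughly the shape I have in mind (a sketch with made-up names, not
the actual irange layout): a small embedded array that covers the
common case, swapped for a heap buffer the first time more sub-ranges
are needed.

struct sketch_range
{
  static const unsigned DEFAULT_PAIRS = 3;   // covers ~99.7% of ranges
  static const unsigned MAX_PAIRS = 255;     // hard ceiling

  wide_int m_embedded[DEFAULT_PAIRS * 2];
  wide_int *m_base = m_embedded;
  unsigned m_max_ranges = DEFAULT_PAIRS;

  void maybe_grow (unsigned needed)
  {
    if (needed <= m_max_ranges)
      return;
    // Resize once, straight to the ceiling, so we never grow twice.
    wide_int *p = new wide_int[MAX_PAIRS * 2];
    for (unsigned i = 0; i < m_max_ranges * 2; ++i)
      p[i] = m_base[i];
    if (m_base != m_embedded)
      delete[] m_base;
    m_base = p;
    m_max_ranges = MAX_PAIRS;
  }

  ~sketch_range ()
  {
    if (m_base != m_embedded)
      delete[] m_base;
  }
};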

Oh... preliminary tests show it's a 5% penalty for VRP, which is more than
covered by our 13.22% improvement (plus Andrew's cache improvements) to VRP.

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e

2023-05-09 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #30 from Aldy Hernandez  ---
(In reply to Jakub Jelinek from comment #29)
> Comment on attachment 55031 [details]
> WIP patch for a dynamic int_range<>
> 
> What I meant is that by using a auto_vec could avoid reimplementing larger
> chunks of vec.h/vec.cc for this.
> Resizing by adding 10 more ranges can have higher compile time cost than
> what vec.cc (calculate_allocation_1) does - doubles the size each time for
> smaller sizes and and multiplying previous by 3 / 2 for larger ones.

Hmmm, that may require a lot more work reshuffling how irange is implemented
internally.  I'll take a peek, but I can't afford to spend another week on this
;-).  Also, adding 10 or multiplying by 2, or even adding 5 IIRC, didn't make
much of a difference in our benchmarks.

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e

2023-05-09 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #28 from Aldy Hernandez  ---
Created attachment 55031
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55031=edit
WIP patch for a dynamic int_range<>

Here's my WIP.

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e

2023-05-09 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #24 from Aldy Hernandez  ---
FYI.  I originally tried new/delete for allocation, which was a tad slower than
ggc_alloc / ggc_free.  Not too much, but measurable.

Another idea would be to have a global obstack which auto_int_range<> uses,
which gets freed at the end of every pass.  This saves us various ggc_free()s
throughout, especially at destruction.

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e

2023-05-09 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #23 from Aldy Hernandez  ---
An update on the int_range_max memory bloat work.

As Andrew mentioned, having int_range<25> solves the problem, but is just
kicking the can down the road.  I ran some stats on what we actually need on a
bootstrap, and 99.7% of ranges fit in 3 sub-ranges, but we need more to
represent switches, etc.

There's no clear winner for choosing N, as the distribution for anything past
<3> is rather random.  What I did see was that at no point do we need more than
125 sub-ranges (on a set of .ii files from a bootstrap).

I've implemented various alternatives using a dynamic approach similar to what
we do for auto_vec.  I played with allocating 2x as much as needed, and
allocating 10 or 20 more than needed, as well as going from N to 255 in one go.
All of it required some shuffling to make sure the penalty isn't much wrt
virtuals, etc, but I think the dynamic approach is the way to go.

The question is how much of a performance hit are we willing take in order to
reduce the memory footprint.  Memory to speed is a linear relationship here, so
we just have to pick a number we're happy with.

Here are some numbers for various sub-ranges (the sub-ranges grow automatically
in union, intersect, invert, and assignment, which are the methods that grow in
sub-ranges).

trunk (wide_ints <255>) =>  40912 bytes  
GCC 12 (trees <255>)=>   4112 bytes
auto_int_range<2>   =>432 bytes  (5.14% penalty for VRP)
auto_int_range<3>   =>592 bytes  (4.01% penalty)
auto_int_range<8>   =>   1392 bytes  (2.68% penalty)
auto_int_range<10>  =>   1712 bytes  (2.14% penalty)

As you can see, even at N=10, we're still 24X smaller than trunk and 2.4X
smaller than GCC12 for a 2.14% performance drop.

I'm tempted to just pick a number and tweak this later as we have ultimate
flexibility here.  Plus, we can also revert to a very small N, and have passes
that care about switches use their own temporaries (auto_int_range<20> or
such).

Note that we got a 13.22% improvement for the wide_int+legacy work, so even the
absolute worst case of a 5.14% penalty would have us sitting on a net 8.76%
improvement over GCC12.

Bike shedding welcome ;-)

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e

2023-05-03 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

Aldy Hernandez  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org

--- Comment #16 from Aldy Hernandez  ---
Is there a way to measure peak memory usage in the compiler?  I'd like to
gather some stats.  Also do we have a handful of programs we typically use to
measure memory usage?  ISTR that Richard (??) had a set of memory hog programs.

[Bug tree-optimization/54627] VRP uses lots of memory and compile-time

2023-05-03 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54627

Aldy Hernandez  changed:

   What|Removed |Added

 CC||aldyh at gcc dot gnu.org

--- Comment #3 from Aldy Hernandez  ---
Just ran into this while looking for something else.  Should this be closed, as
the implementation of VRP is different now?  (i.e. no equivalence set bitmaps).

[Bug ipa/109711] [14 regression] ICE (tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in verify_range, at value-range.cc:1060) when building ffmpeg-4.4.4 since r14-377-gc92b8be9b52b

2023-05-03 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109711

Aldy Hernandez  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #11 from Aldy Hernandez  ---
fixed

[Bug ipa/109711] [14 regression] ICE (tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in verify_range, at value-range.cc:1060) when building ffmpeg-4.4.4 since r14-377-gc92b8be9b52b

2023-05-03 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109711

Aldy Hernandez  changed:

   What|Removed |Added

  Attachment #54980|0   |1
is obsolete||

--- Comment #9 from Aldy Hernandez  ---
Created attachment 54982
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54982&action=edit
patch in testing

I think it's easiest to restore the legacy behavior for now.  A proper
conversion of IPA should follow.

[Bug ipa/109711] [14 regression] ICE (tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in verify_range, at value-range.cc:1060) when building ffmpeg-4.4.4 since r14-377-gc92b8be9b52b

2023-05-03 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109711

--- Comment #8 from Aldy Hernandez  ---
Created attachment 54980
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54980&action=edit
untested

This may fix it.

[Bug ipa/109711] [14 regression] ICE (tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in verify_range, at value-range.cc:1060) when building ffmpeg-4.4.4 since r14-377-gc92b8be9b52b

2023-05-03 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109711

Aldy Hernandez  changed:

   What|Removed |Added

 CC||amacleod at redhat dot com

--- Comment #7 from Aldy Hernandez  ---
It looks like IPA is trying to set a range for a float even though value_range
does not support floats.  This worked before because legacy silently set a
range of error_mark_node.

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e

2023-05-03 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #13 from Aldy Hernandez  ---
(In reply to Jakub Jelinek from comment #12)
> Perhaps change int_range to have the wide_ints as auto_vec with reserved
> space for a few (perhaps 3 (times 2))?

Our original implementation was exactly that, but we switched to the current
one years ago (maybe it coincided with our switch back to trees, I can't
remember).  But yes, we're discussing options.

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr

2023-05-02 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #6 from Aldy Hernandez  ---
I forgot to add.  Running with "ulimit -s unlimited" does not segfault.

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr

2023-05-02 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

Aldy Hernandez  changed:

   What|Removed |Added

 CC||amacleod at redhat dot com

--- Comment #4 from Aldy Hernandez  ---
Deep recursion in the ranger is causing us to run out of stack space.  This
failure coincides with switching the internal representation of iranges to
wide_ints, which take more space.

The problem happens in the waccess pass and can be seen with:
--param=ranger-debug=all

It looks like we start going bat shit crazy around:

2401   range_on_edge (_1011) on edge 1016->1017
2402 range_on_exit (_1011) from BB 1016
2403   range_of_expr(_1011) at stmt resx 4
2404 range_on_entry (_1011) to BB 1016
2405   range_of_stmt (_1011) at stmt _1011 = PHI <_1947(3),
_1010(1013)>

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr

2023-05-02 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

Aldy Hernandez  changed:

   What|Removed |Added

   Last reconfirmed||2023-05-02
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr

2023-05-02 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

Aldy Hernandez  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |aldyh at gcc dot gnu.org
   Priority|P3  |P1
 CC|aldyh at redhat dot com|aldyh at gcc dot gnu.org

--- Comment #3 from Aldy Hernandez  ---
Mine.  Started with:

c92b8be9b52b7e0de5ad67bc268dad1498181908 is the first bad commit
commit c92b8be9b52b7e0de5ad67bc268dad1498181908
Author: Aldy Hernandez 
Date:   Sun Feb 19 17:43:43 2023 +0100

Convert internal representation of irange to wide_ints.

[Bug ipa/109639] [14 regression] Internal compiler error: tree check: expected integer_cst, have addr_expr in to_wide, at tree.h:6283 since r14-249-g3c9372dfee0bb8

2023-04-27 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109639

Aldy Hernandez  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #16 from Aldy Hernandez  ---
fixed

[Bug ipa/109639] [14 regression] Internal compiler error: tree check: expected integer_cst, have addr_expr in to_wide, at tree.h:6283 since r14-249-g3c9372dfee0bb8

2023-04-27 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109639

--- Comment #14 from Aldy Hernandez  ---
Created attachment 54939
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54939&action=edit
proposed patch in testing

[Bug ipa/109639] [14 regression] Internal compiler error: tree check: expected integer_cst, have addr_expr in to_wide, at tree.h:6283 since r14-249-g3c9372dfee0bb8

2023-04-27 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109639

--- Comment #13 from Aldy Hernandez  ---
Sorry, it wasn't the setter doing the normalization, but the
range_fold_{unary,binary}_expr helpers.  Since IPA was the only user of these
deprecated functions, this should be relatively easy to contain.

[Bug ipa/109639] [14 regression] Internal compiler error: tree check: expected integer_cst, have addr_expr in to_wide, at tree.h:6283 since r14-249-g3c9372dfee0bb8

2023-04-27 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109639

Aldy Hernandez  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |aldyh at gcc dot gnu.org
 CC||amacleod at redhat dot com

--- Comment #12 from Aldy Hernandez  ---
We're trying to build a range of [, ] and are failing because iranges no
longer support symbolics.  Truth be told, we haven't supported them in a while,
because ranger doesn't.  However, the setter was normalizing addresses to
non-zero, which it is no longer doing.
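
For reference, the old normalization amounted to roughly this (a sketch, not
the actual legacy code):

/* Degrade a symbolic address bound to the "nonzero" pointer range
   rather than keeping the symbolic end point.  */
static bool
normalize_symbolic_address (irange &r, tree type, tree min, tree max)
{
  if (TREE_CODE (min) == ADDR_EXPR || TREE_CODE (max) == ADDR_EXPR)
    {
      r.set_nonzero (type);
      return true;
    }
  return false;
}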

Mine.

[Bug ipa/109639] [14 regression] Internal compiler error: tree check: expected integer_cst, have addr_expr in to_wide, at tree.h:6283 since r14-249-g3c9372dfee0bb8

2023-04-27 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109639

Aldy Hernandez  changed:

   What|Removed |Added

 CC||slyfox at gcc dot gnu.org

--- Comment #11 from Aldy Hernandez  ---
*** Bug 109643 has been marked as a duplicate of this bug. ***

[Bug ipa/109643] [14 Regression] IPA inline ICE on pkg-config-0.29.2 since r14-249-g3c9372dfee0bb8

2023-04-27 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109643

Aldy Hernandez  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #4 from Aldy Hernandez  ---
dup

*** This bug has been marked as a duplicate of bug 109639 ***

[Bug tree-optimization/109593] [14 Regression] Valgrind error doing build of latest gcc trunk since r14-32-g10e481b154c5fc63

2023-04-23 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109593

Aldy Hernandez  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #11 from Aldy Hernandez  ---
fixed

[Bug tree-optimization/109593] [14 Regression] Valgrind error doing build of latest gcc trunk since r14-32-g10e481b154c5fc63

2023-04-23 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109593

--- Comment #8 from Aldy Hernandez  ---
Created attachment 54908
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54908&action=edit
untested patch

Patch in testing.  Does this fix the problem on the reporter's side?

[Bug c/109593] valgrind error doing build of latest gcc trunk

2023-04-22 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109593

--- Comment #5 from Aldy Hernandez  ---
Huh.  I'm gonna guess:

commit 10e481b154c5fc63e6ce4b449ce86cecb87a6015
Author: Aldy Hernandez 
Date:   Thu Jan 26 04:46:54 2023 +0100

Return true from operator== for two identical ranges containing NAN.


-  if (known_isnan () || src.known_isnan ())
-   return false;
-
   return (real_identical (&m_min, &src.m_min)
          && real_identical (&m_max, &src.m_max)
          && m_pos_nan == src.m_pos_nan

If either operand is a NAN, m_min or m_max have garbage.  Can you verify that's
the commit?  Either way, I can take a look at it on Monday.

[Bug tree-optimization/91645] Missed optimization with sqrt(x*x)

2023-03-29 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91645

Aldy Hernandez  changed:

   What|Removed |Added

 CC||llvm at rifkin dot dev

--- Comment #10 from Aldy Hernandez  ---
*** Bug 103559 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/103559] Can't optimize away < 0 check on sqrt

2023-03-29 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103559

Aldy Hernandez  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE
 CC||aldyh at gcc dot gnu.org

--- Comment #5 from Aldy Hernandez  ---
(In reply to Jeremy R. from comment #0)
> For a simple invocation of sqrt, gcc inserts a < 0 check to set math errno
> if needed. E.g.
> 
> float f(float x) {
> return sqrt(x);
> }
> 
> Is generated as
> 
> f(float):
> vxorps  xmm1, xmm1, xmm1
> vucomissxmm1, xmm0
> ja  .L10
> vsqrtss xmm0, xmm0, xmm0
> ret
> .L10:
> jmp sqrtf
> 
> 
> Unfortunately, this check is still present when the GCC is able to prove
> that x is non-negative:
> 
> float f(float x) {
> if(x < 0) [[unlikely]] {
> __builtin_unreachable();
> } else {
> return sqrt(x);
> }
> }

x could also be a NAN which I *think* the hardware sqrt won't handle, so a
better test would be:

if (x < 0.0 || __builtin_isnan(x)) [[unlikely]]
  __builtin_unreachable ();

or perhaps:

if (!__builtin_isgreater(x, 0.0))

Either way, this is a subset of PR91645 so I'm closing it as a duplicate.

*** This bug has been marked as a duplicate of bug 91645 ***

[Bug tree-optimization/91645] Missed optimization with sqrt(x*x)

2023-03-29 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91645

--- Comment #9 from Aldy Hernandez  ---
It looks like what we want for this test is actually !isgreaterequal() not
isless(), since we want to exclude the possibility of a NAN.  Like this:

float test (float x)
{
  if (!__builtin_isgreaterequal (x, 0.0))
__builtin_unreachable();
  return sqrtf (x);
}

After VRP1 removes the unreachable, the range for x_1 is correctly exported as
>= -0.0 without a NAN:

Global Exported (via unreachable): x_1(D) = [frange] float [-0.0 (-0x0.0p+0),
+Inf]

We end up with this:

float test (float x)
{
  float _4;

   [local count: 1073741824]:
  _4 = sqrtf (x_1(D));
  return _4;

}

which then CDCE expands with the unnecessary checking code:

   [local count: 1073741824]:
  DCE_COND_LB.2_5 = x_1(D);
  DCE_COND_LB_TEST.3_6 = DCE_COND_LB.2_5 u>= 0.0;
  if (DCE_COND_LB_TEST.3_6 != 0)
goto ; [99.95%]
  else
goto ; [0.05%]

   [local count: 1073204960]:
  _4 = .SQRT (x_1(D));
  goto ; [100.00%]

   [local count: 536864]:
  _7 = sqrtf (x_1(D));

So the CDCE pass needs to be enhanced to use the ranger, instead of heuristics,
to determine that the argument to sqrtf is neither negative nor NAN.

In this particular case, we could use global ranges for free, but there's no
reason we can't use an actual on-demand ranger for more complex scenarios.

Just a guess here, but use_internal_fn() in CDCE shrink wraps the call to sqrt
into a check with appropriate dispatch.  We could emit the .SQRT call directly
if the range of x_1 is not NAN and not negative.
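
A rough sketch of what that check could look like, using the ranger query
entry points (the helper name and its exact placement inside CDCE are
assumptions):

/* Return true if the single argument of CALL is known to be neither a
   NAN nor negative, so the .SQRT internal call can be emitted without
   the errno fallback.  */
static bool
sqrt_arg_known_safe_p (gcall *call, range_query *query)
{
  tree arg = gimple_call_arg (call, 0);
  frange r;
  if (!query->range_of_expr (r, arg, call)
      || r.undefined_p ()
      || r.maybe_isnan ())
    return false;
  /* Allow -0.0, reject anything that can be strictly negative.  */
  REAL_VALUE_TYPE lb = r.lower_bound ();
  return !real_isneg (&lb) || real_iszero (&lb);
}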

[Bug tree-optimization/109154] [13 regression] jump threading de-optimizes nested floating point comparisons

2023-03-27 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154

--- Comment #13 from Aldy Hernandez  ---
(In reply to Jakub Jelinek from comment #12)

> (In reply to Aldy Hernandez from comment #10)
> > BTW, I don't think it helps at all here, but casting from l_10 to a float,
> > we know _1 can't be either -0.0 or +-INF or +-NAN.  We could add a range-op
> > entry for NOP_EXPR / CONVERT_EXPR to expose this fact.  Well, at the very
> > least that it can't be a NAN...in the current representation for frange's.
> 
> We definitely should add range-ops for conversions from integral to floating
> point and from floating to integral and their reverses.  But until we have
> more than one range, if the integral value is VARYING, for 32-bit signed int
> the range would be
> [-0x1.p+31, 0x1.p+31] so nothing specific around zero.  With 3+ ranges we
> could make it
> [-0x1.p+31, -1.][0., 0.][1., 0x1.p+31] if we think normal values around zero
> are important special cases.

Ultimately we want "unlimited" sub-ranges like we have for int_range_max, but
who knows what Pandora's box that will open :-/.

[Bug c/109233] [12/13 Regression] warning: array subscript 5 is above array bounds of ‘struct tg3_napi[5]’ since r12-2591

2023-03-22 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109233

--- Comment #10 from Aldy Hernandez  ---
(In reply to Jakub Jelinek from comment #8)

> And on the ranger side why we have determined the [0, 5] range rather than
> [0, 4], whether it is related to inaccurate number of iterations estimation,
> or ranger using it incorrectly, ...

The [0, 5] is actually coming from SCEV, which ranger is using to refine the
range.  Presumably, ranger is doing worse than SCEV, because it doesn't improve
it.

$ grep 'Loops range fo' a.c.*evrp
   Loops range found for i_3: [irange] int [0, 5] NONZERO 0x7 and calculated
range :[irange] int [-2147483647, +INF]
   Loops range found for i_3: [irange] int [0, 5] NONZERO 0x7 and calculated
range :[irange] int [0, 6] NONZERO 0x7

Perhaps Andrew can pontificate on the recalculations / iterations / etc.

[Bug tree-optimization/109154] [13 regression] jump threading de-optimizes nested floating point comparisons

2023-03-22 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154

--- Comment #10 from Aldy Hernandez  ---
(In reply to Andrew Macleod from comment #9)
> (In reply to Richard Biener from comment #7)
> > (In reply to Richard Biener from comment #6)
> > > ah, probably it's the missing CSE there:
> > > 
> > >  :
> > > _1 = (float) l_10;
> > > _2 = _1 < 0.0;
> > > zone1_17 = (int) _2;
> > > if (_1 < 0.0)

BTW, I don't think it helps at all here, but casting from l_10 to a float, we
know _1 can't be either -0.0 or +-INF or +-NAN.  We could add a range-op entry
for NOP_EXPR / CONVERT_EXPR to expose this fact.  Well, at the very least that
it can't be a NAN...in the current representation for frange's.
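
As a minimal illustration of the "at least not a NAN" part (the hook name and
signature are made up; this is not the actual range-op table entry):

/* Result range of (float) int_expr: even with no better endpoints, an
   integer-to-float conversion never produces a NAN.  */
static void
fold_int_to_float_range (frange &r, tree float_type)
{
  r.set_varying (float_type);   /* start from [-Inf, +Inf] +-NAN */
  r.clear_nan ();               /* ...but a NAN can never be produced */
}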

[Bug tree-optimization/109154] [13 regression] jump threading de-optimizes nested floating point comparisons

2023-03-22 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154

--- Comment #8 from Aldy Hernandez  ---
(In reply to avieira from comment #5)
> Im slightly confused here, on entry to BB 5 we know the opposite of _1 < 0.0
> no? if we branch to BB 5 we know !(_1 < 0.0) so we can't fold _1 <= 1.0, we
> just know that the range of _1 is >= 0.0 . Or am I misreading, I've not
> tried compiling myself just going off the code both of you posted here.

Sorry, I should've been more clear.

_1 is < 0.0 on entry to BB5, but only on the 4->5->?? path which is what's
being analyzed.

[Bug tree-optimization/109154] [13 regression] jump threading de-optimizes nested floating point comparisons

2023-03-22 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154

Aldy Hernandez  changed:

   What|Removed |Added

 CC||amacleod at redhat dot com,
   ||law at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org

--- Comment #4 from Aldy Hernandez  ---
(In reply to Tamar Christina from comment #3)
> Aldy, any thoughts here?

We need a "real" threading expert on this one, as the decision by ranger is
correct.  It looks like this is a profitability issue in the threader.

The problem can be seen with -O2 --param=threader-debug=all, on the threadfull1
dump:

The threaded path is:
path: 4->5->7 SUCCESS

ranger can fold the conditional in BB5:

if (_1 < 1.0e+0)
  goto ; [41.00%]
else
  goto ; [59.00%]

...because on entry to BB5 we know _1 < 0.0:

   [local count: 955630225]:
  _1 = (float) l_9;
  _2 = _1 < 0.0;
  zone1_15 = (int) _2;
  if (_1 < 0.0)
goto ; [41.00%]
  else
goto ; [59.00%]

   [local count: 391808389]:

   [local count: 955630225]:
  # iftmp.0_10 = PHI 
  fasten_main_natpro_chrg_init.2_3 = fasten_main_natpro_chrg_init;
  _4 = fasten_main_natpro_chrg_init.2_3 * iftmp.0_10;
  _5 = (float) _4;
  if (_1 < 1.0e+0)
goto ; [41.00%]
  else
goto ; [59.00%]

If this shouldn't be threaded because of a hot/cold issue, perhaps the code
goes in back_threader_profitability::profitable_path_p() where there's already
logic wrt hot blocks.

[Bug ipa/81323] IPA-VRP doesn't handle return values

2023-03-21 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81323

--- Comment #7 from Aldy Hernandez  ---
(In reply to Andrew Macleod from comment #6)
> (In reply to Jakub Jelinek from comment #4)
> > Or the ranger could do it itself, similarly to how it handles .ASSUME, but
> > without actually querying anything but the global range of the return value
> > if any.  Though, doing that in the range means that we won't know ranges of
> > functions which with LTO are in a different partition, while doing it as IPA
> > optimization could allow even that to work.
> 
> Aldy has been doing some IPA/LTO related cleanup with ranges... Hopefully we
> can get this all connected next release.

I'm sitting on a patchset for GCC 14 that will revamp all the range handling in
ipa-*.* to be less ad-hoc, and use generic ranges (vrange even, so
type-agnostic).  The plan is to integrate this with the internal range storage
used by IPA (currently pairs of wide ints or value_range's) so that IPA at
least has the infrastructure to handle generic ranges.

Some discussion upstream is still needed, but the general idea should be
feasible for GCC 14.  It will be up to the IPA maintainers to handle floats/etc
internally though.

[Bug tree-optimization/109008] [13 Regression] Wrong code in scipy package since r13-3926-gd4c2f1d376da6f

2023-03-08 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109008

--- Comment #29 from Aldy Hernandez  ---
Created attachment 54604
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54604&action=edit
untested patch for NAN state copying

This is what I had in mind.

Notice I haven't touched the fields in frange itself, to keep disruption to a
minimum in this release.  In the next release we could replace m_pos_nan and
m_neg_nan with nan_state.

If this helps, I could test and document it.

[Bug tree-optimization/109008] [13 Regression] Wrong code in scipy package since r13-3926-gd4c2f1d376da6f

2023-03-08 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109008

--- Comment #27 from Aldy Hernandez  ---
(In reply to Richard Biener from comment #21)
> So without messing with real.cc to try exposing 0.5ulp adjustments for GCC
> 13 I'd simply do something like the following:
> 
> diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc
> index ff42b95de4f..1ae68503664 100644
> --- a/gcc/range-op-float.cc
> +++ b/gcc/range-op-float.cc
> @@ -2214,7 +2214,26 @@ public:
>  range_op_handler minus (MINUS_EXPR, type);
>  if (!minus)
>return false;
> -return float_binary_op_range_finish (minus.fold_range (r, type, lhs,
> op2),
> +/* The forward operation is
> +lhs = r + op2
> +   where the result is +-0.5ulp of lhs before rounding.  For the
> +   reverse operation we need the lhs range before rounding, so
> +   conservatively use nextafter on it.
> +   ???  REAL_VALUE_TYPE could handle this more precisely if we
> +   make sure to not round the inputs for the format before the
> +   real_operation is carried out and when we can properly round
> +   towards +-Inf for the lower/upper bounds.  */
> +frange wlhs (lhs);
> +if (!lhs.known_isnan ())
> +  {
> +   REAL_VALUE_TYPE lhsl = lhs.lower_bound ();
> +   frange_nextafter (TYPE_MODE (type), lhsl, dconstninf);
> +   REAL_VALUE_TYPE lhsu = lhs.upper_bound ();
> +   frange_nextafter (TYPE_MODE (type), lhsu, dconstinf);
> +   wlhs.set (type, lhsl, lhsu);
> +   wlhs.union_ (lhs);  /* Copy NaN state.  */
> +  }
> +return float_binary_op_range_finish (minus.fold_range (r, type, wlhs,
> op2),
>  r, type, lhs);
>}
>virtual bool op2_range (frange , tree type,

Could we abstract this into an inline function until this could be properly
implemented in real.cc?

[Bug tree-optimization/109008] [13 Regression] Wrong code in scipy package since r13-3926-gd4c2f1d376da6f

2023-03-08 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109008

Aldy Hernandez  changed:

   What|Removed |Added

 CC||amacleod at redhat dot com

--- Comment #26 from Aldy Hernandez  ---
(In reply to Richard Biener from comment #16)
> (In reply to Jakub Jelinek from comment #14)
> > Created attachment 54599 [details]
> > gcc13-pr109008-wip.patch
> > 
> > I have now this WIP but it doesn't work correctly yet, need to debug it now,
> > just finished writing it and making it compile.
> > Note, after the binary search it is still unfinished, I'd like to then
> > adjust it one bit at a time to narrow it further.  But before working on
> > that it needs to work correctly.
> 
> Looks nicer and more precise (even though the iterating function is ugly
> and you also suffer from the lack of copying of alternate range state,
> but using .union () is a way out here I guess ;)).

While working on LTO streaming of ranges, I also noticed that we don't have a
way of accessing the individual NAN sign bits or a way of copying NAN state. 
Why don't we get this right instead of hacking around it?

Would it help if we provided the following?

nan_state_t frange::get_nan_state ();

frange::set (tree type, const REAL_VALUE_TYPE &, const REAL_VALUE_TYPE &, const
nan_state_t &, value_range_kind = VR_RANGE);

We could implement nan_state_t to include the m_pos_nan and m_neg_nan for
minimal disruption now, and later use it for extending the NAN information
(signalling, etc).
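
A minimal sketch of what such a nan_state_t could look like (the exact shape
is an assumption; only the member names come from the proposal above):

/* Carries the +NAN / -NAN possibility bits so they can be copied
   between franges or streamed out for LTO.  */
class nan_state_t
{
public:
  nan_state_t (bool pos_nan = true, bool neg_nan = true)
    : m_pos_nan (pos_nan), m_neg_nan (neg_nan) {}
  bool pos_p () const { return m_pos_nan; }
  bool neg_p () const { return m_neg_nan; }
private:
  bool m_pos_nan;
  bool m_neg_nan;
};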

[Bug tree-optimization/107561] [13 Regression] g++.dg/pr71488.C and [g++.dg/warn/Warray-bounds-16.C -m32] regression due to -Wstringop-overflow problem

2023-02-27 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107561

--- Comment #14 from Aldy Hernandez  ---
(In reply to Richard Biener from comment #11)
> So I've missed the VR_ANTI_RANGE handling in get_size_range where we run into
> 
>   wide_int maxsize = wi::to_wide (max_object_size ());
>   min = wide_int::from (min, maxsize.get_precision (), UNSIGNED);
>   max = wide_int::from (max, maxsize.get_precision (), UNSIGNED);
>   if (wi::eq_p (0, min - 1))
> {
>   /* EXP is unsigned and not in the range [1, MAX].  That means
>  it's either zero or greater than MAX.  Even though 0 would
>  normally be detected by -Walloc-zero, unless ALLOW_ZERO
>  is set, set the range to [MAX, TYPE_MAX] so that when MAX
>  is greater than the limit the whole range is diagnosed.  */
>   wide_int maxsize = wi::to_wide (max_object_size ());
>   if (flags & SR_ALLOW_ZERO)
> {
>   if (wi::leu_p (maxsize, max + 1)
>   || !(flags & SR_USE_LARGEST))
> min = max = wi::zero (expprec);
>   else
> {
>   min = max + 1;
>   max = wi::to_wide (TYPE_MAX_VALUE (exptype));
> }
> }
>   else
> {
>   min = max + 1;
>   max = wi::to_wide (TYPE_MAX_VALUE (exptype));
> }
> 
> and from [0,0] [8, +INF] pick [8, +INF] based on the comments reasoning.

Ughh, you're reaching all the problematic cases I ran into while trying to
remove legacy.

> 
> This all wouldn't happen if we'd be able to CSE the zero size ...
> 
> We can now try to put additional heuristic ontop of the above heuristic,
> namely when the object we write to is of size zero set SR_ALLOW_ZERO.
> Or try to "undo" the multiplication trick which would probably make us
> end up with VARYING.
> 
> I'll note that with the earlier proposed change we regress the following,
> that's an observation I make a lot of times - all "weirdness" in the code
> is backed by (artificial) testcases verifying it works exactly as coded ...

+1

> 
> FAIL: gcc.dg/Wstringop-overflow-15.c pr82608 (test for warnings, line 37)
> FAIL: gcc.dg/Wstringop-overflow-15.c pr82608 (test for warnings, line 38)
> FAIL: gcc.dg/Wstringop-overflow-15.c pr82608 (test for warnings, line 69)
> FAIL: gcc.dg/Wstringop-overflow-15.c pr82608 (test for warnings, line 70)
> FAIL: gcc.dg/Wstringop-overflow-56.c  (test for warnings, line 64)
> FAIL: gcc.dg/Wstringop-overflow-56.c  (test for warnings, line 75)
> FAIL: gcc.dg/Wstringop-overflow-56.c  (test for warnings, line 86)
> FAIL: gcc.dg/Wstringop-overflow-56.c  (test for warnings, line 97)
> FAIL: gcc.dg/Wstringop-overflow-56.c  (test for warnings, line 108)
> FAIL: gcc.dg/Wstringop-overflow-56.c  (test for warnings, line 148)
> FAIL: gcc.dg/Wstringop-overflow-56.c  (test for warnings, line 159)
> FAIL: gcc.dg/attr-alloc_size-11.c  (test for warnings, line 52)
> FAIL: gcc.dg/attr-alloc_size-11.c  (test for warnings, line 53)
> FAIL: gcc.dg/attr-alloc_size-11.c  (test for warnings, line 54)
> FAIL: gcc.dg/attr-alloc_size-11.c  (test for warnings, line 55)
> FAIL: gcc.dg/attr-alloc_size-11.c  (test for warnings, line 56)
> FAIL: gcc.dg/attr-alloc_size-11.c  (test for warnings, line 57)
> FAIL: gcc.dg/attr-alloc_size-11.c  (test for warnings, line 58)
> FAIL: gcc.dg/attr-alloc_size-11.c  (test for warnings, line 64)
> FAIL: gcc.dg/attr-alloc_size-11.c  (test for warnings, line 65)
> FAIL: gcc.dg/attr-alloc_size-3.c  (test for warnings, line 438)
> FAIL: gcc.dg/attr-alloc_size-4.c  (test for warnings, line 138)
> FAIL: gcc.dg/attr-alloc_size-4.c  (test for warnings, line 143)
> FAIL: gcc.dg/attr-alloc_size-4.c  (test for warnings, line 187)
> FAIL: gcc.dg/pr98721-1.c  (test for warnings, line 11)
> FAIL: gcc.dg/pr98721-1.c  (test for warnings, line 12)
> 
> For example gcc.dg/pr98721-1.c has
> 
> int
> foo (int n)
> {
>   if (n <= 0)
> {
>   char vla[n];  /* { dg-message "source object 'vla'
> of size 0" } */
>   return __builtin_strlen (vla);/* { dg-warning "'__builtin_strlen'
> reading 1 or more bytes from a region of size 0" } */
> 
> but of course we do not diagnose
> 
> int
> foo (int n)
> {
>   if (n < 0)
> {
>   char vla[n];
> 
> or when no condition is present or a n > 32 condition is present.

Yup, ran into that too.

> 
> I fear it's not possible to "fix" this testcase without changing the
> expectation on a bunch of other testcases.  But the messaging to the
> user is quite unhelpful because it doesn't actually inform him about
> above reasoning.

Agreed.

[Bug tree-optimization/107561] [13 Regression] g++.dg/pr71488.C and [g++.dg/warn/Warray-bounds-16.C -m32] regression due to -Wstringop-overflow problem

2023-02-27 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107561

--- Comment #12 from Aldy Hernandez  ---
(In reply to Richard Biener from comment #10)
> (In reply to Richard Biener from comment #9)
> > Note I think there's still a bug in value_range (irange) here. 
> > get_size_range
> > does
> > 
> >   if (integral)
> > {
> >   value_range vr;
> > 
> >   query->range_of_expr (vr, exp, stmt);
> > 
> >   if (vr.undefined_p ())
> > vr.set_varying (TREE_TYPE (exp));
> >   range_type = vr.kind ();
> >   min = wi::to_wide (vr.min ());
> >   max = wi::to_wide (vr.max ());
> > 
> > and we have vr:
> > 
> > (gdb) p vr
> > $13 = { = { = {
> >   _vptr.vrange = 0x3693a30 +16>, 
> >   m_kind = VR_ANTI_RANGE, m_discriminator = VR_IRANGE}, 
> > m_num_ranges = 1 '\001', m_max_ranges = 1 '\001', 
> > m_nonzero_mask = , m_base = 0x7fffc8f0}, m_ranges = {
> > , }}
> > (gdb) p vr.dump (stderr)
> > [irange] unsigned int [0, 0][8, +INF]$17 = void
> > 
> > but vr.min () produces 1 and vr.max () produces 7, just as if it doesn't
> > interpret VR_ANTI_RANGE transparently here (if that's the intent?!).
> > At least
> > 
> > // Return the highest bound of a range expressed as a tree.
> > 
> > inline tree
> > irange::tree_upper_bound () const
> > 
> > suggests that.  Note that vr.num_pairs () produces 2 (because constant_p ())
> > but vr.m_num_ranges is 1 and tree_upper_bound uses m_num_ranges.
> > 
> > I suppose irange::{min,max,tree_lower_bound,tree_upper_bound} miss "support"
> > for legacy_mode_p here.
> 
> OTOH gimple-array-bounds.cc does
> 
>   const value_range *vr = NULL;
>   if (TREE_CODE (low_sub_org) == SSA_NAME)
> {
>   vr = get_value_range (low_sub_org, stmt);
>   if (!vr->undefined_p () && !vr->varying_p ())
> {
>   low_sub = vr->kind () == VR_RANGE ? vr->max () : vr->min ();
>   up_sub = vr->kind () == VR_RANGE ? vr->min () : vr->max ();
> }
> 
> so the bug is a documentation bug on min/max/lower/upper bound?!
> 
> I'm policeing other uses of value_range right now.

Haven't looked at this yet, but min/max/lower/upper bound are broken and
inconsistent.  They give different results in legacy and the new irange API. 
This was not by intent, but it crept in :/.  My fault.  I noticed this when
removing the legacy code for next stage1.

[Bug testsuite/108835] gm2 tests at large -jNN numbers do not return

2023-02-20 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108835

--- Comment #4 from Aldy Hernandez  ---
Thank you very much for such a prompt fix.

[Bug testsuite/108835] New: gm2 tests at large -jNN numbers do not return

2023-02-17 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108835

Bug ID: 108835
   Summary: gm2 tests at large -jNN numbers do not return
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: aldyh at gcc dot gnu.org
  Target Milestone: ---

Running make check -j55 sometimes yields tests that fail to terminate in the
gm2/ directory.  For example, coroutine.x5 and testtransfer.x5.

Worst case scenario, there should be a timeout for these tests.

[Bug tree-optimization/108697] constructing a path-range-query is expensive

2023-02-07 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108697

--- Comment #5 from Aldy Hernandez  ---
(In reply to Richard Biener from comment #3)
> But yes, your observation about m_has_cache_entry is correct - if that has
> any value (it makes reset_path "cheap"), then it should be also relied on
> upon
> growth.  Maybe make this bitmap an optional feature of the cache itself
> and handle it transparently there?

BTW, if putting the bitmap in the ssa_global_cache itself works, that's fine as
well, especially if it makes the ranger's cache faster :).

[Bug tree-optimization/108697] constructing a path-range-query is expensive

2023-02-07 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108697

--- Comment #4 from Aldy Hernandez  ---
Created attachment 54420
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54420&action=edit
reuse path_range_query SSA cache for all of back_threader class

Off of the top of my head (i.e. probably very broken ;-)), this is what I had
in mind...  Share a cache between subsequent instantiations of
path_range_query.

Note the change to the set_cache:

 path_range_query::set_cache (const vrange , tree name)
 {
   unsigned v = SSA_NAME_VERSION (name);
-  bitmap_set_bit (m_has_cache_entry, v);
+  if (bitmap_set_bit (m_has_cache_entry, v))
+m_cache->clear_global_range (name);
   m_cache->set_global_range (name, r);
 }

Since we're sharing the cache, make sure we nuke whatever it had there before,
otherwise set_global_range will try to reuse the slot if it thinks it fits. 
Although...maybe that's not even necessary.  I'm not sure...you'd have to play
around with it, but I think the general idea would work.

[Bug tree-optimization/108697] constructing a path-range-query is expensive

2023-02-07 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108697

--- Comment #1 from Aldy Hernandez  ---
Sharing a cache with, say, the ranger is a no-go, because the path_range_query's
cache is specific to the path, but perhaps we could share the path's cache
between constructions by providing a constructor that takes the cache, as you
suggest.

The path query has a bitmap to determine if a cache entry is used
(m_has_cache_entry), so I think we could reuse a dirty cache since the bitmap
gets cleared at construction.  I assume the main culprit is the
path_range_query's being instantiated from class back_threader?  If so, there
could be one cache per back_threader?

Hmmm, perhaps we'd have to call ssa_global_cache::clear_global_range() before
each set_global_range() for reused caches, otherwise the latter would try to
reuse slots (fits_p and all that).

Also, is it the clearing part of safe_grow_cleared() that is taking up the
time, or just safe_grow()?  Because ISTM we don't need to clear the cache for use
in path_range_query.  So we could even add a flag to use safe_grow() instead of
safe_grow_cleared() for the path query?

[Bug tree-optimization/108647] [13 Regression] ICE in upper_bound, at value-range.h:950 with -O3 since r13-2974-g67166c9ec35d58ef

2023-02-03 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108647

--- Comment #15 from Aldy Hernandez  ---
(In reply to Jakub Jelinek from comment #14)
> Created attachment 54405 [details]
> gcc13-pr108647.patch
> 
> Here is what I'm about to test momentarily, though I must say I don't
> understand those operator_cast changes at all.  I thought casts are unary
> operators and so I don't understand what kind of range would be op2 in that
> case.

Oh poop, sorry.  Unary operators always have the resulting type passed as
VARYING in op2.  It would never be undefined.  Sorry for the noise; you can
disregard the cast changes.

[Bug tree-optimization/108647] [13 Regression] ICE in upper_bound, at value-range.h:950 with -O3 since r13-2974-g67166c9ec35d58ef

2023-02-03 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108647

--- Comment #13 from Aldy Hernandez  ---
Created attachment 54404
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54404&action=edit
frange changes

These are the analogous changes to range-op-float.cc.

Patch in testing.

[Bug tree-optimization/108647] [13 Regression] ICE in upper_bound, at value-range.h:950 with -O3 since r13-2974-g67166c9ec35d58ef

2023-02-03 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108647

--- Comment #12 from Aldy Hernandez  ---
(In reply to Jakub Jelinek from comment #7)
> So
> --- gcc/range-op.cc.jj2023-02-03 10:51:40.699003658 +0100
> +++ gcc/range-op.cc   2023-02-03 16:04:39.264159294 +0100
> @@ -642,7 +642,8 @@ operator_equal::op1_range (irange , tr
>  case BRS_FALSE:
>// If the result is false, the only time we know anything is
>// if OP2 is a constant.
> -  if (wi::eq_p (op2.lower_bound(), op2.upper_bound()))
> +  if (!op2.undefined_p ()
> +   && wi::eq_p (op2.lower_bound(), op2.upper_bound()))
>   {
> r = op2;
> r.invert ();
> @@ -755,7 +756,8 @@ operator_not_equal::op1_range (irange 
>  case BRS_TRUE:
>// If the result is true, the only time we know anything is if
>// OP2 is a constant.
> -  if (wi::eq_p (op2.lower_bound(), op2.upper_bound()))
> +  if (!op2.undefined_p ()
> +   && wi::eq_p (op2.lower_bound(), op2.upper_bound()))
>   {
> r = op2;
> r.invert ();
> @@ -920,6 +922,9 @@ operator_lt::op1_range (irange , tree
>   const irange ,
>   relation_trio) const
>  {
> +  if (op2.undefined_p ())
> +return false;
> +
>switch (get_bool_state (r, lhs, type))
>  {
>  case BRS_TRUE:
> @@ -942,6 +947,9 @@ operator_lt::op2_range (irange , tree
>   const irange ,
>   relation_trio) const
>  {
> +  if (op1.undefined_p ())
> +return false;
> +
>switch (get_bool_state (r, lhs, type))
>  {
>  case BRS_TRUE:
> @@ -1031,6 +1039,9 @@ operator_le::op1_range (irange , tree
>   const irange ,
>   relation_trio) const
>  {
> +  if (op2.undefined_p ())
> +return false;
> +
>switch (get_bool_state (r, lhs, type))
>  {
>  case BRS_TRUE:
> @@ -1053,6 +1064,9 @@ operator_le::op2_range (irange , tree
>   const irange ,
>   relation_trio) const
>  {
> +  if (op1.undefined_p ())
> +return false;
> +
>switch (get_bool_state (r, lhs, type))
>  {
>  case BRS_TRUE:
> @@ -1141,6 +1155,9 @@ operator_gt::op1_range (irange , tree
>   const irange , const irange ,
>   relation_trio) const
>  {
> +  if (op2.undefined_p ())
> +return false;
> +
>switch (get_bool_state (r, lhs, type))
>  {
>  case BRS_TRUE:
> @@ -1163,6 +1180,9 @@ operator_gt::op2_range (irange , tree
>   const irange ,
>   relation_trio) const
>  {
> +  if (op1.undefined_p ())
> +return false;
> +
>switch (get_bool_state (r, lhs, type))
>  {
>  case BRS_TRUE:
> @@ -1252,6 +1272,9 @@ operator_ge::op1_range (irange , tree
>   const irange ,
>   relation_trio) const
>  {
> +  if (op2.undefined_p ())
> +return false;
> +
>switch (get_bool_state (r, lhs, type))
>  {
>  case BRS_TRUE:
> @@ -1274,6 +1297,9 @@ operator_ge::op2_range (irange , tree
>   const irange ,
>   relation_trio) const
>  {
> +  if (op1.undefined_p ())
> +return false;
> +
>switch (get_bool_state (r, lhs, type))
>  {
>  case BRS_TRUE:
> then plus testcase?

Looks good to me.

Do you mind adding this bit to your testing?

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 136b709385c..fdc0a6c05fd 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -2678,7 +2678,6 @@ operator_cast::op1_range (irange , tree type,
   if (lhs.undefined_p ())
 return false;
   tree lhs_type = lhs.type ();
-  gcc_checking_assert (types_compatible_p (op2.type(), type));

   // If we are calculating a pointer, shortcut to what we really care about.
   if (POINTER_TYPE_P (type))
@@ -2705,6 +2704,8 @@ operator_cast::op1_range (irange , tree type,
   return true;
 }

+  if (op2.undefined_p ())
+return false;
   if (truncating_cast_p (op2, lhs))
 {
   if (lhs.varying_p ())

This catches the cast operator which will ICE in truncating_cast_p when op2 is
undefined.

I've removed the checking assert since any number of operations further down
will ICE if the types don't match.

[Bug tree-optimization/108647] [13 Regression] ICE in upper_bound, at value-range.h:950 with -O3 since r13-2974-g67166c9ec35d58ef

2023-02-03 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108647

--- Comment #5 from Aldy Hernandez  ---
> Perhaps just adding if (op2.undefined_p ()) return false; above most of the
> switch (get_bool_state (r, lhs, type)) lines (in methods that refer to op2;
> and similarly for op1) for the comparison operators.

This feels like the right approach.  We also do something similar in some of
the binary operators.  For example, plus/minus_expr do:

  if (lhs.undefined_p ())
return false;

For this testcase we have:

[0,0] = UNDEFINED < op2

I don't think we can determine anything for op2 in this situation, so the right
thing would be to return false, which amounts to returning VARYING.

The binary operators would also have to handle operators of undefined, but
those seem to all have gates on singleton_p() or go through the fold_range()
routine for the inverse operation, so they'd be handled correctly.

So yeah, it looks like just handling the op[12]_range operators above the
get_bool_state would do the trick.

Hmmm, it seems like there are 2 operators that have possible problematic
handling of op2.undefined_p():

1. operator_cast::op1_range (irange , tree type,

   The assert looks at the op2 type.  That will ICE with an UNDEFINED.

   The truncating_cast_p path will also look at op2's type blindly.

2. operator_bitwise_xor::op1_range

  Imagine: [0,0] = x ^ UNDEFINED

  op1_range won't ICE, but it will do:

case BRS_FALSE:
  r = op2;

  I don't think we know anything about x in this case.  Though if op2 is
UNDEFINED, I guess it
  wouldn't hurt to assume op1 is also UNDEFINED.

   Also earlier in the function, for:

 UNDEFINED = x ^ op2

   will return x = UNDEFINED for any op2.  

So at least #1 would also have to be handled because it could ICE.

Also, range-op-float.cc needs the same treatment prior all the get_bool_state
calls.

As you mention, another alternative would be to always call
empty_range_varying, as we're doing for abs::op[12]_range:

  if (empty_range_varying (r, type, lhs, op2))
return true;

Thoughts?  Andrew?

Hmmm, I wonder why we haven't tripped over this scenario in past releases (LHS
being defined, but op2 being UNDEFINED).

[Bug tree-optimization/108639] [13 Regression] ICE on valid code at -O1 and above: in decompose, at wide-int.h:984 since r13-5578

2023-02-03 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108639

--- Comment #12 from Aldy Hernandez  ---
Yeah, I've been mulling around the idea of removing the type from storage of
both irange and frange.  It seems we need it for setting a type (setting the
endpoints for varying, querying HONOR_NANS, HONOR_INFINITIES, etc.), but not in
the storage itself.  Something to keep in mind when moving to wide_ints.

It does seem we need to keep the sign and precision somewhere.  The precision
lives in the wide-int, but the sign bit still needs storage??

[Bug tree-optimization/108639] [13 Regression] ICE on valid code at -O1 and above: in decompose, at wide-int.h:984 since r13-5578

2023-02-02 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108639

--- Comment #7 from Aldy Hernandez  ---
Jakub, take a look and see if you agree.  I've fired off some tests.

[Bug tree-optimization/108639] [13 Regression] ICE on valid code at -O1 and above: in decompose, at wide-int.h:984 since r13-5578

2023-02-02 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108639

--- Comment #6 from Aldy Hernandez  ---
Created attachment 54393
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54393&action=edit
untested patch for irange::operator==

[Bug tree-optimization/108639] [13 Regression] ICE on valid code at -O1 and above: in decompose, at wide-int.h:984 since r13-5578

2023-02-02 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108639

Aldy Hernandez  changed:

   What|Removed |Added

 CC||aldyh at gcc dot gnu.org

--- Comment #5 from Aldy Hernandez  ---
(In reply to Jakub Jelinek from comment #3)
> Created attachment 54391 [details]
> gcc13-pr108639.patch
> 
> Untested fix.

I think the problem is more fundamental than that.  The equality operator for
irange is not ICEing on the sub-range comparison (whose bounds also have
different precisions), but is dying in the nonzero mask comparison.

[Bug tree-optimization/108540] [13 Regression] Frange miscompilation of ruby since r13-3261

2023-01-26 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108540

--- Comment #7 from Aldy Hernandez  ---
(In reply to Jakub Jelinek from comment #6)
> Created attachment 54351 [details]
> gcc13-pr108540.patch
> 
> Untested fix.

LGTM.  Thanks for looking at this.

[Bug tree-optimization/108447] [13 Regression] glibc math/test-*-iseqsig failures

2023-01-19 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108447

--- Comment #14 from Aldy Hernandez  ---
(In reply to rguent...@suse.de from comment #13)
> On Thu, 19 Jan 2023, jakub at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108447
> > 
> > --- Comment #12 from Jakub Jelinek  ---
> > As a workaround in stage4, perhaps, but long term the relations make a lot 
> > of
> > sense even for floating point with NANs.  If you know <= relation between 2
> > vars and know the range of one of them, the other can be computed from it, 
> > ...
> 
> I think it makes more sense to fix the relation combining code, maybe
> simply disable that for FP as a workaround?

Yeah, the combining code definitely needs some tweaks.

Thinking out loud here...

It looks like we ignore relations in the various relation fold_range() methods
if NAN can be a possibility.  See how frelop_early_resolve() only looks at the
relation if the operands are !NAN.  So perhaps this is why we never used
validate_relation().

On the other hand, the op1_range methods *do* use the relations.  For example,
on EQ_EXPR (foperator_equal::op1_range), we clear NAN on the TRUE side because
equality could never happen if there was a NAN present (regardless of VREL_*). 
On the FALSE side we set_nan() only if VREL_EQ.

So we ignore the oracle's relations in fold_range() throughout, unless we're
sure !NAN, and use the oracle from op1_range.

[Bug tree-optimization/108447] [13 Regression] glibc math/test-*-iseqsig failures

2023-01-19 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108447

--- Comment #11 from Aldy Hernandez  ---
Hmmm, I wonder if we could do this all in validate_relation like Andrew had
planned.

If NAN is a possibility in either x or y, then we could disallow any relation
recording right off the bat, and avoid any special casing in union/intersect. 
Perhaps allow the != relation even if NAN?

[Bug tree-optimization/108447] [13 Regression] glibc math/test-*-iseqsig failures

2023-01-19 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108447

--- Comment #9 from Aldy Hernandez  ---
(In reply to Jakub Jelinek from comment #4)
> I see fold_using_range::relation_fold_and_or
> which sees relation1 VREL_LE and relation2 VREL_GE and !is_and, and because
> of
> relation_union (relation1, relation2) == VREL_VARYING fold it to 1.
> But for floating point comparisons, LE | GE is not always true, it is true if
> neither operand is NAN, otherwise false.

Ah, it was the union not the intersect.  So we need something here that would
avoid resolving this to 1 if maybe_isnan() is true for either source operand.

[Bug tree-optimization/108447] [13 Regression] glibc math/test-*-iseqsig failures

2023-01-19 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108447

--- Comment #8 from Aldy Hernandez  ---
(In reply to Jakub Jelinek from comment #1)
> This one started with r13-1933-g24012539ae3410174.
> I think the problem is in:
> 
> Folding statement: _3 = cmp1_10 & cmp2_11;
> Not folded
> Folding statement: if (_3 != 0)
>  Registering value_relation (x_8(D) <= y_9(D)) on (2->3)
>  Registering value_relation (x_8(D) >= y_9(D)) on (2->3)
> Intersecting with existing (x_8(D) <= y_9(D)) to produce (x_8(D) ==
> y_9(D)) Updated.
> and later
> Folding statement: _1 = cmp1_10 | cmp2_11;
>   Relation adjustment: cmp1_10  and cmp2_11  combine to produce [irange]
> _Bool [1, 1]
> Queued stmt for removal.  Folds to: 1
> 
> When NANs are honored, the above is not true for floating point comparisons.
> x <= y and x >= y comparisons are signaling (if either operand is qNaN or
> sNaN), x == y and x != y aren't (they only signal if either operand is
> sNaN), and while

*blink*.  Woah.  I did not know that.

As Andrew mentions down thread, we had validate_relation() to verify whether a
relation could be registered.  So for example, given x == x,
validate_relation() would be called at registration time and call the range-ops
entry for EQ_EXPR, which would return false for this combo if x could be a NAN.

For some reason now forgotten, we never ended up calling validate_relation() at the
oracle's registry.  My guess is that since we ignore the oracle's answers in
frelop_early_resolve() (if !maybe_isnan) in each of the relational operators,
it was no longer needed.

Although for the testcase here, perhaps this would not have helped either.  Is
there anything in x_8 or y_9 that could be used to indicate that we can't register
these relations?

  :
  cmp1_10 = x_8(D) <= y_9(D);
  cmp2_11 = x_8(D) >= y_9(D);
  _3 = cmp1_10 & cmp2_11;
  if (_3 != 0)

That is, when we see x_8 <= y_9, is there anything in either the range of x_8
or y_9 that would hint that we can't register the <= relation?  I would have
assumed that we could only register this relation if neither x_8 nor y_9 can
have a NAN.  Would this help?

If this won't help, then perhaps the issue is in the oracle's relation
intersection code.  We would need a way to indicate that VREL_LE ^ VREL_GE is
VREL_VARYING if the source operands may have a NAN (or some other fancy FP
condition I don't understand ;-))?
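
A rough sketch of what gating the registration could look like (the helper
and its call site are assumptions; only maybe_isnan() is existing API):

/* Should the oracle record relation REL between two float operands?
   x != y is safe even with NANs; any ordering relation requires both
   operands to be NAN-free.  */
static bool
fp_relation_recordable_p (relation_kind rel, const frange &op1,
                          const frange &op2)
{
  if (rel == VREL_NE)
    return true;
  return !op1.maybe_isnan () && !op2.maybe_isnan ();
}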

[Bug tree-optimization/107608] [13 Regression] Failure on fold-overflow-1.c and pr95115.c

2023-01-18 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107608

Aldy Hernandez  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #38 from Aldy Hernandez  ---
fixed in trunk

[Bug tree-optimization/107608] [13 Regression] Failure on fold-overflow-1.c and pr95115.c

2023-01-18 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107608

--- Comment #36 from Aldy Hernandez  ---
Can we close this PR?
