[Bug tree-optimization/111864] [12/13/14 Regression] Dead Code Elimination Regression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111864

--- Comment #3 from Aldy Hernandez ---
(In reply to Jeffrey A. Law from comment #2)
> It almost looks like a costing issue.  The threaders find opportunities to
> thread all the incoming edges in the key block to the path which avoids the
> call.  But all the paths get rejected.
>
> This is the key block:
>
> ;; basic block 11, loop depth 0, count 976284897 (estimated locally, freq
> 0.9092), maybe hot
> ;; prev block 10, next block 12, flags: (NEW, REACHABLE, VISITED)
> ;; pred: 10 [99.8% (guessed)] count:225266786 (estimated locally,
> freq 0.2098) (TRUE_VALUE,EXECUTABLE)
> ;; 14 [100.0% (guessed)] count:751018112 (estimated locally,
> freq 0.6994) (TRUE_VALUE,EXECUTABLE)
> # o.10_11 = PHI <(10), o.10_28(14)>
> _17 = o.10_11 ==
> _20 = o.10_11 ==
> _27 = _20 | _17;
> if (_27 != 0)
>   goto ; [58.87%]
> else
>   goto ; [41.13%]
>
> It's pretty obvious that 10->11 can thread to 6.  If we look at the other
> incoming edge we need o.10_28 which comes from bb14 with the value  So
> that path should be 14->10->11 threading to 6.
>
> But they get rejected during threadfull2.

In order for threadfull2 to thread 10->11->6, it would have to handle pointer
equivalences, but the new threader doesn't currently handle them.

In VRP we are able to handle equivalences through the pointer_equiv_analyzer
class, which pushes/pops state.  It only works when traversing in dominator
order, which I suppose is technically the case when going down a path.
So... I think you could wire pointer_equiv_analyzer into the new threader and
get this, though it may also need to be taught about PHIs.  The class is very
simplistic, and was only a stopgap until we get pointer ranges (with
points-to info).

Question though: why didn't the old threader get this?  I see bb11 looks
exactly the same in DOM3, and the old threader does handle pointer
equivalences through its scoped tables.  Maybe the IL is too complicated?
If we want to fix this in this release, I think we could do it with
pointer_equiv_analyzer, but for future releases we should be able to get it
for free when pranges are implemented.  Half the work is already done... let's
see how far I get, considering I'll be going on paternity leave for a few
months this year.
[Bug tree-optimization/114331] Missed optimization: indicate knownbits from dominating condition switch(trunc(a))
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114331

--- Comment #13 from Aldy Hernandez ---
I think y'all have it all figured out.

Basically, operator_cast::op1_range() is solving num_5 in the equation:

  [111,111] = (short int) num_5

Where lhs is:

(gdb) p debug(lhs)
[irange] short int [111, 111]

And the bitmask is implicit, but correctly calculated:

(gdb) p debug(lhs.get_bitmask ())
MASK 0x0 VALUE 0x6f

And the range for num_5 is just varying.

You are looking at the true side of the truncating_cast_p() conditional in
op1_range.  Essentially, once it starts building the range it ignores known
bitmasks, which causes it to return an overly conservative range.

And yes, bitmasks are calculated on demand per the comment in get_bitmask:

  // The mask inherent in the range is calculated on-demand.  For
  // example, [0,255] does not have known bits set by default.  This
  // saves us considerable time, because setting it at creation incurs
  // a large penalty for irange::set.  At the time of writing there
  // was a 5% slowdown in VRP if we kept the mask precisely up to date
  // at all times.  Instead, we default to -1 and set it when
  // explicitly requested.  However, this function will always return
  // the correct mask.

We may have to revisit this.  Perhaps it doesn't matter any more, and we
could keep bitmasks up to date.  It would definitely make the dumps easier to
read.  Though, at least for this testcase, the bitmask is correctly returned
with get_bitmask().

I think the analysis in comment 11 is spot on.  And no Andrew, it's not *my*
code, it all comes from the bit-ccp code :-P.  fold_range() should call
update_bitmask(), which eventually calls bit_value_binop() in the CCP code.

And yes Jakub, as you have noticed, BIT_IOR_EXPR, BIT_XOR_EXPR, and likely
other operators may need to be tweaked to take bitmasks into account.  I
wouldn't be surprised if there's a lot of low-hanging fruit in this space.
[Bug tree-optimization/114331] Missed optimization: indicate knownbits from dominating condition switch(trunc(a))
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114331

--- Comment #6 from Aldy Hernandez ---
(In reply to Andrew Macleod from comment #5)
> (In reply to Jakub Jelinek from comment #4)
> > Actually, looking at value-range.h, irange_bitmask doesn't have just the
> > mask but also value, so I wonder why it isn't able to figure this out.
> > I'd think m_mask should be ~0x and value 111.
>
> _1 = (short int) num_5(D);
> _2 = (int) _1;
> switch (_1)
>
> globally we know
> _2 : [irange] int [-32768, 32767]
> and coming into bb 3:
> _2 : [irange] int [-32768, 32767]
> 2->3 _1 : [irange] short int [111, 111]
> 2->3 _2 : [irange] int [111, 111]
> 2->3 num_5(D) : [irange] int [-INF, -65537][-65425, -65425][111,
> 111][65536, +INF]
>
> I guess what you are looking for is to add known bits to the range produced
> for num_5 on the outgoing edge.
>
> This would have to be done in operator_cast::op1_range where we are
> reducing it to a known range of the switch.  I do not believe it makes any
> attempt to set bitmasks based on casts like that currently.
>
> So, GORI working backwards has _1 = [irange] short int [111, 111], so it
> would be satisfying the expression:
> short int [111, 111] = (short int) num_5(D);
>
> When evaluating num_5 in operator_cast::op1_range, in the truncating_cast_p
> section we could also see if there is a bitmask pattern for the LHS that
> could be applied to the range.  I think that's basically what you are
> asking for.  Simple enough for a constant, I suppose, for starters.  Or
> maybe Aldy has a routine that picks bitmasks and values from ranges
> somewhere?

You may want to look at:

// Return the bitmask inherent in the range.

irange_bitmask
irange::get_bitmask_from_range () const
{
}

IIRC, this code was originally authored by Jakub and was embedded in the
tree-ssanames.cc code setting nonzero bits every time we set a global range.
I just abstracted it and adapted it to work within our framework.
[Bug ipa/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

--- Comment #18 from Aldy Hernandez ---
(In reply to rguent...@suse.de from comment #17)
> On Wed, 21 Feb 2024, aldyh at gcc dot gnu.org wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476
> >
> > --- Comment #8 from Aldy Hernandez ---
> > (In reply to Richard Biener from comment #5)
> > > (In reply to Martin Jambor from comment #4)
> > > > The right place where to free stuff in lattices post-IPA would be in
> > > > ipa_node_params::~ipa_node_params() where we should iterate over
> > > > lattices and deinitialize them, or perhaps destruct the array,
> > > > because since ipcp_vr_lattice directly contains Value_Range which
> > > > AFAIU directly contains int_range_max which has a virtual
> > > > destructor... it does not look like a POD anymore.  This escaped me
> > > > when I was looking at the IPA-VR changes but hopefully it should not
> > > > be too difficult to deal with.
> > >
> > > OK, that might work for the IPA side.
> > >
> > > It's quite unusual to introduce a virtual DTOR in the middle of the
> > > class hierarchy though.  Grepping, I do see quite some direct uses of
> > > 'irange' and also 'vrange' which do not have the DTOR visible, but
> > > 'irange' already exposes and uses 'maybe_resize'.  I think those
> > > should only be introduced in the class exposing the virtual DTOR (but
> > > why virtual?!).
> > >
> > > Would be nice to have a picture of the range class hierarchies with
> > > pointers on which types to use in which circumstances ...
> >
> > For reference, you should always use the most base class you can.  If
> > you can get the work done with the vrange API, use that.  If you're
> > dealing with an integer, use irange.  The int_range<> derived class
> > should only be used for defining a variable, so:
> >
> > int_range<123> foobar;  // Defined with storage.
> >
> > if (is_a )
> >   {
> >     irange = as_a (foobar);
> >     ...
> >   }
> >
> > // Use an irange reference here, not an int_range reference.
> > void somefunc (const irange )
> > {
> > }
> >
> > Also, the reason irange does not have a destructor is because it has
> > no storage.  Only int_range<> has storage associated with it, so it is
> > the only one able to see if the range grew:
>
> But when I do
>
>   irange *r = new int_range<>;
>   delete r;
>
> then it will fail to release memory?  Are virtual DTORs not exactly
> invented for this, when you delete on a reference/pointer to a base
> class but the object is really a derived one?

There is no memory to release above, as int_range<> does not grow by
default.  Only int_range_max can grow, and it's meant to live on the stack
by design.  That was the whole point:

typedef int_range<3, /*RESIZABLE=*/true> int_range_max;

I suppose if you specifically wanted to shoot yourself in the foot, you
could do:

irange *r = new int_range<5, true>;

...and that would fail to release memory, but why would you do that?

If you're allocating a lot of ranges, we have the vrange_allocator written
specifically for that, which is a lot less memory intensive.  It's what we
use in the cache.  If ranges are anywhere but the stack, they should go
through the allocator, which will squish things down appropriately, as
iranges are very big.  The ipa_vr class also uses the allocator in order to
keep the memory footprint down.

I guess it wouldn't hurt to have a virtual destructor in irange or even
vrange for future proofing, but it's not needed right now.  I don't have any
strong opinions, unless there's a performance penalty.
[Bug tree-optimization/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

--- Comment #13 from Aldy Hernandez ---
(In reply to Jakub Jelinek from comment #12)
> (In reply to Aldy Hernandez from comment #11)
> > Both patches pass tests.  Up to the release maintainers to decide if
> > they want to include them in this release.  Otherwise, I'll queue them
> > up for later.
>
> This is an important regression, why shouldn't it be included in GCC 14?
> GCC trunk is open for regression and documentation fixes...

Martin's change to IPA fixes the leak in the PR.  My changes are just
cleanups, as no other pass is using ranges in GC space.
[Bug tree-optimization/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

--- Comment #11 from Aldy Hernandez ---
Both patches pass tests.  Up to the release maintainers to decide if they
want to include them in this release.  Otherwise, I'll queue them up for
later.
[Bug tree-optimization/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

--- Comment #10 from Aldy Hernandez ---
Created attachment 57478
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57478&action=edit
Remove GTY support for vrange and friends

Bootstraps.  Tests are pending.
[Bug tree-optimization/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

--- Comment #9 from Aldy Hernandez ---
Created attachment 57477
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57477&action=edit
Remove virtual from int_range destructor.

Bootstraps.  Tests are pending.
[Bug tree-optimization/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

--- Comment #8 from Aldy Hernandez ---
(In reply to Richard Biener from comment #5)
> (In reply to Martin Jambor from comment #4)
> > The right place where to free stuff in lattices post-IPA would be in
> > ipa_node_params::~ipa_node_params() where we should iterate over
> > lattices and deinitialize them, or perhaps destruct the array, because
> > since ipcp_vr_lattice directly contains Value_Range which AFAIU directly
> > contains int_range_max which has a virtual destructor... it does not
> > look like a POD anymore.  This escaped me when I was looking at the
> > IPA-VR changes but hopefully it should not be too difficult to deal
> > with.
>
> OK, that might work for the IPA side.
>
> It's quite unusual to introduce a virtual DTOR in the middle of the class
> hierarchy though.  Grepping, I do see quite some direct uses of 'irange'
> and also 'vrange' which do not have the DTOR visible, but 'irange' already
> exposes and uses 'maybe_resize'.  I think those should only be introduced
> in the class exposing the virtual DTOR (but why virtual?!).
>
> Would be nice to have a picture of the range class hierarchies with
> pointers on which types to use in which circumstances ...

For reference, you should always use the most base class you can.  If you
can get the work done with the vrange API, use that.  If you're dealing with
an integer, use irange.  The int_range<> derived class should only be used
for defining a variable, so:

int_range<123> foobar;  // Defined with storage.

if (is_a )
  {
    irange = as_a (foobar);
    ...
  }

// Use an irange reference here, not an int_range reference.
void somefunc (const irange )
{
}

Also, the reason irange does not have a destructor is because it has no
storage.  Only int_range<> has storage associated with it, so it is the only
one able to see if the range grew:

template
inline
int_range::~int_range ()
{
  if (RESIZABLE && m_base != m_ranges)
    delete[] m_base;
}

The irange superclass does not have storage, just the m_base pointer.  OTOH,
m_ranges[] lives in int_range<>:

template
class int_range : public irange
{
  ...
private:
  wide_int m_ranges[N*2];
};

This shouldn't be a problem because you should never be putting a raw irange
pointer in a container that you allocate/deallocate.

Does this help?  If so, I could definitely document it in value-range.h.
[Bug tree-optimization/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

--- Comment #7 from Aldy Hernandez ---
Let me see if I can untangle things here.  Thanks for chasing this down,
BTW.

Value_Range doesn't need a CTOR because it has an int_range_max, which does
have one (courtesy of int_range<>), so things get initialized correctly.

Deleting a heap-allocated container that has a Value_Range also works,
because of the ~int_range<> destructor:

+  {
+    struct container
+    {
+      Value_Range vr;
+    };
+    struct container *pointer = new container;
+    pointer->vr.set_type (integer_type_node);
+    for (int i = 0; i < 100; i += 5)
+      {
+        int_range<1> tmp (integer_type_node,
+                          wi::shwi (i, TYPE_PRECISION (integer_type_node)),
+                          wi::shwi (i+1, TYPE_PRECISION (integer_type_node)));
+        pointer->vr.union_ (tmp);
+      }
+    delete pointer;

Deleting pointer causes us to call int_range::~int_range(), which cleans
things up appropriately.  So it looks like the problem is the missing
destructor in IPA.

That being said, there are a few things I've noticed thanks to you guys'
analysis.

First, the virtual marker on the int_range<> destructor is not needed.  IPA
+ ranges went through various changes this release, and I believe that was
left over from when value_range (int_range<2>) was being placed in GC space
directly.

We went through various incantations this release wrt IPA storage of ranges,
both in GC and on the stack.  Ultimately we ended up with ipa_vr, which I
believe is the only use of ranges in GC space, and furthermore it uses
vrange_storage, not vrange nor irange nor anything else.  For that matter,
vrange_storage doesn't even have pointers, so we shouldn't need GTY markers
at all.

Since IPA is the only GC user of iranges, I think we can safely remove GTY
support from the entire vrange hierarchy.

The rule of thumb should be that only int_range<> and derivatives should be
in short-term storage, and GC users should use vrange_storage (like ipa_vr
is doing, or perhaps vrange_allocator, which automates the allocation).
[Bug tree-optimization/113752] [14 Regression] warning: ‘%s’ directive writing up to 10218 bytes into a region of size between 0 and 10240 [-Wformat-overflow=] since r14-261-g0ef3756adf078c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113752

Aldy Hernandez changed:

           What    |Removed                    |Added
----------------------------------------------------------------------------
                 CC|aldyh at redhat dot com    |

--- Comment #3 from Aldy Hernandez ---
Can't reproduce on x86-64 on recent trunk:

abulafia:~/bld/t/gcc []$ ./xgcc -B./ -c -O3 -Wall a.c
abulafia:~/bld/t/gcc []$ cat a.c
char a[10256];
char b;
char *c, *g;
int d, e, f;
int sprintf(char *, char *, ...);
unsigned long strlen(char *);
int h(char *j) {
  if (strlen(j) + strlen(c) + strlen(g) + 32 > 10256)
    return 0;
  sprintf(a, "%s:%s:%d:%d:%d:%c:%s\n", j, c, d, e, f, b, g);
  return 1;
}
[Bug tree-optimization/113735] ICE: in operator[], at vec.h:910 with _BitInt() at -O and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113735

Aldy Hernandez changed:

           What    |Removed                    |Added
----------------------------------------------------------------------------
         Resolution|---                        |FIXED
             Status|NEW                        |RESOLVED

--- Comment #8 from Aldy Hernandez ---
Fixed in trunk.
[Bug tree-optimization/113735] ICE: in operator[], at vec.h:910 with _BitInt() at -O and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113735

--- Comment #6 from Aldy Hernandez ---
Created attachment 57336
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57336&action=edit
Proposed patch #2

Thanks for the suggestion, Jakub.
[Bug tree-optimization/113735] ICE: in operator[], at vec.h:910 with _BitInt() at -O and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113735

--- Comment #4 from Aldy Hernandez ---
Created attachment 57335
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57335&action=edit
Proposed patch

Patch in testing.
[Bug tree-optimization/113735] ICE: in operator[], at vec.h:910 with _BitInt() at -O and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113735

--- Comment #2 from Aldy Hernandez ---
(In reply to Jakub Jelinek from comment #1)
> Slightly tweaked, still -O1:
> char b;
> void bar (void);
>
> void
> foo (_BitInt(6110) j)
> {
>   for (;;)
>     {
>       _BitInt(10) k = b % j;
>       for (j = 6; j; --j)
>         if (k)
>           bar ();
>     }
> }
>
> The ICE is on:
> 721       if (!m_equiv[bb->index])
> because bb->index is larger than the m_equiv size.
> The bitint lowering pass works by walking the IL and replacing some
> statements in there with lowered code, which can involve new basic blocks
> (either through splitting existing blocks or creating fresh new ones) and
> the like.  And it asks the ranger about the range of statements during
> that.
> Is that something that ranger doesn't/can't support?
> So, would I need to ensure to find out what ranges I'll need before making
> any changes, ask for them, remember them somewhere on the side and then
> use them during the transformations?

We're generally OK with changing the IL, as long as the semantics of a
statement don't change.  For example, you couldn't change the meaning of a
statement to mean something totally different, as that would invalidate the
cached value of the resulting SSA.  The cache was tweaked in the last
release or so to handle growing SSA names.

I'm not totally sure about changing BBs, as VRP didn't add BBs in flight,
IIRC.  But I do see code in the relation oracle handling a change in the BB
count:

void
equiv_oracle::limit_check (basic_block bb)
{
  int i = (bb) ? bb->index : last_basic_block_for_fn (cfun);
  if (i >= (int)m_equiv.length ())
    m_equiv.safe_grow_cleared (last_basic_block_for_fn (cfun) + 1);
}

Interestingly, this function is never used.  Andrew, was this an oversight?

The attached patch seems to fix the problem, and it looks like all other
dereferences of m_equiv[] have a suitable check, or are looking for things
already known to have an entry.  Andrew would know for sure.
diff --git a/gcc/value-relation.cc b/gcc/value-relation.cc
index 27f9ad61c0e..619ee5f0867 100644
--- a/gcc/value-relation.cc
+++ b/gcc/value-relation.cc
@@ -718,6 +718,7 @@ equiv_oracle::add_equiv_to_block (basic_block bb, bitmap equiv_set)

   // Check if this is the first time a block has an equivalence added.
   // and create a header block.  And set the summary for this block.
+  limit_check (bb);
   if (!m_equiv[bb->index])
     {
       ptr = (equiv_chain *) obstack_alloc (_chain_obstack,
[Bug tree-optimization/110603] [14 Regression] GCC, ICE: internal compiler error: in verify_range, at value-range.cc:1104 since r14-255
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110603

Aldy Hernandez changed:

           What    |Removed                    |Added
----------------------------------------------------------------------------
                 CC|                           |amacleod at redhat dot com

--- Comment #4 from Aldy Hernandez ---
(In reply to wierton from comment #0)
> The testing program:
> ```
> typedef long unsigned int size_t;
> void *memcpy(void *, const void *, size_t);
> int snprintf(char *restrict, size_t, const char *restrict, ...);
>
> extern char a[2];
> void test_func_on_line_62(void) {
>   memcpy(a, "12", sizeof("12") - 1);
>   const int res = snprintf(0, 0, "%s", a);
>   if (res <= 3)
>     do {
>       extern void f(void);
>       f();
>     } while (0);
> }

The sprintf pass is ICEing because it's trying to build a nonsensical range
of [2,1].

Legacy irange tried harder with swapped ranges, but in the above case it
would actually drop to VARYING:

-  /* There's one corner case, if we had [C+1, C] before we now have
-     that again.  But this represents an empty value range, so drop
-     to varying in this case.  */

Which would cause the sprintf pass to set a global range of VARYING.  I
can't remember whether this meant nuking the known global range, or ignoring
it altogether (the semantics changed in the last release or two).  My guess
is the latter, since set_range_info() improves ranges, never pessimizes
them.

Now, the reason we're passing swapped endpoints seems to originate in
get_range_strlen_dynamic().  It is setting a min of 2, courtesy of the
nonzero characters in the memcpy:

  memcpy(a, "12", sizeof("12") - 1);

This comes from tree-ssa-strlen.c:

      if (!pdata->minlen && si->nonzero_chars)
        {
          if (TREE_CODE (si->nonzero_chars) == INTEGER_CST)
            pdata->minlen = si->nonzero_chars;

Further down we set a max of 1, stemming from the size of a[2] minus 1 for
the terminating nul:

              if (TREE_CODE (size) == INTEGER_CST)
                {
                  ++off;   /* Increment for the terminating nul.  */
                  tree toffset = build_int_cst (size_type_node, off);
                  pdata->maxlen = fold_build2 (MINUS_EXPR, size_type_node,
                                               size, toffset);
                  pdata->maxbound = pdata->maxlen;
                }

I don't understand this code enough to opine, but at the worst we could
bail if the ends are swapped.  It's no worse than what we had before.
[Bug tree-optimization/102958] std::u8string suboptimal compared to std::string
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102958

--- Comment #7 from Aldy Hernandez ---
Created attachment 57016
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57016&action=edit
preprocessed testcase with GCC 13

Compile with -O2 -std=c++20.
[Bug tree-optimization/102958] std::u8string suboptimal compared to std::string
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102958

Aldy Hernandez changed:

           What    |Removed                    |Added
----------------------------------------------------------------------------
                 CC|                           |aldyh at gcc dot gnu.org,
                   |                           |amacleod at redhat dot com,
                   |                           |jakub at gcc dot gnu.org,
                   |                           |jason at gcc dot gnu.org,
                   |                           |jwakely.gcc at gmail dot com
            Summary|std::u8string suboptimal   |std::u8string suboptimal
                   |compared to std::string,   |compared to std::string
                   |triggers warnings          |

--- Comment #6 from Aldy Hernandez ---
Adjusting the description, since not only have we disabled the warning in
the C++ headers so this no longer warns, but the underlying problem has
nothing to do with warnings.

The char_traits specialization is opaque enough that we can't figure out the
length:

      static _GLIBCXX17_CONSTEXPR size_t
      length(const char_type* __s)
      {
#if __cplusplus >= 201703L
        if (std::__is_constant_evaluated())
          return __gnu_cxx::char_traits::length(__s);
#endif
        size_t __i = 0;
        while (!eq(__s[__i], char_type()))
          ++__i;
        return __i;
      }

OTOH, the specialization falls back to a __builtin_strlen, which is trivial
to see through.

I think this boils down to pinski's comment that we fail to see a string
length calculation in the following sequence, which survives all the way to
the .optimized dump:

  [local count: 8687547538]:
  # __i_46 = PHI <__i_22(3), 0(2)>
  __i_22 = __i_46 + 1;
  _24 = MEM[(const char_type &)"123456789" + __i_22 * 1];
  if (_24 != 0)
    goto ; [89.00%]
  else
    goto ; [11.00%]

I've seen variations of the above being turned into __builtin_strlen by fre
and ldist, as well as the strlen pass.  Whose job is it to perform this
optimization?
[Bug tree-optimization/111458] [11 Regression] ICE in in dfs_enumerate_from, at cfganal.c:1560
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111458

--- Comment #4 from Aldy Hernandez ---
(In reply to Richard Biener from comment #3)
> This issue is still latent in the forward threader.  The backward threader
> calls this function from back_threader_profitability::profitable_path_p,
> so before doing any CFG changes.  In GCC 12 and later the remaining
> forward threader instances are all from DOM(?).

Yes.  The remaining uses of the forward threader are all from DOM.  I'd like
to remove the forward threader in the next release, but it requires being
absolutely sure ranger catches everything the scoped tables do, especially
the floating point stuff.
[Bug c/111468] cannot express unordered equal in gimple FE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111468

--- Comment #6 from Aldy Hernandez ---
(In reply to Richard Biener from comment #4)
> Fixed on trunk.

Sweet.  Thanks so much.  This really helps.
[Bug middle-end/111468] New: cannot express unordered equal in gimple FE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111468

            Bug ID: 111468
           Summary: cannot express unordered equal in gimple FE
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: aldyh at gcc dot gnu.org
  Target Milestone: ---

Created attachment 55930
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55930&action=edit
Failing testcase

It looks like you can't express an UNEQ_EXPR in the GIMPLE FE.  Am I missing
something?

The snippet below compiles fine with:

  if (x_1 == a_5(D))

but not with:

  if (x_1 u== a_5(D))

And at least tree-pretty-print.cc has:

    case UNEQ_EXPR:
      return "u==";

I'm trying to come up with a threading test for some unordered stuff, and
I'm getting:

$ ./cc1 c.c -O2 -fgimple -quiet
c.c: In function ‘foo’:
c.c:19:10: error: expected ‘)’ before ‘u’
   19 |   if (x_1 u== a_5(D))
      |          ^~
      |          )

[snip snip]

Is there a blessed way of expressing unordered comparisons in the GIMPLE FE?
[Bug tree-optimization/110875] [14 Regression] Dead Code Elimination Regression since r14-2501-g285c9d042e9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110875

Aldy Hernandez changed:

           What    |Removed                    |Added
----------------------------------------------------------------------------
                 CC|                           |amacleod at redhat dot com

--- Comment #2 from Aldy Hernandez ---
(In reply to Andrew Pinski from comment #1)
> I am super confused about VRP's ranges:
> We have the following ranges that get exported and their relationships:
> Global Exported: a.8_105 = [irange] int [-2, 0]
> _10 = a.8_105 + -1;
> Global Exported: _10 = [irange] int [-INF, -6][-3, -1][1, 2147483645]
> _103 = (unsigned int) _10;
> Global Exported: _103 = [irange] unsigned int [1, 2147483645][2147483648,
> 4294967290][4294967294, +INF]
> Simplified relational if (_103 > 1)
>  into if (_103 != 1)
>
> Shouldn't the range of _10 just be [-3,-1]?
> If so, _103 can't be 0 or 1?  And then if that gets it right, the call to
> foo will go away.

[It looks like a caching issue of some kind.  Looping in Andrew.]

Yes, that is indeed confusing.  _10 should have a more refined range.  Note
that there's a dependency between a.8_105 and _10:

  [local count: 327784168]:
  # f_lsm.17_26 = PHI
  # a.8_105 = PHI <0(3), _10(13)>
  # b_lsm.19_33 = PHI
  # b_lsm_flag.20_53 = PHI <0(3), 1(13)>
  # a_lsm.21_49 = PHI <_54(D)(3), _10(13)>
  _9 = e.10_39 + 4294967061;
  _10 = a.8_105 + -1;
  if (_10 != -3(OVF))
    goto ; [94.50%]
  else
    goto ; [5.50%]

This is what I see with --param=ranger-debug=tracegori in VRP2...

We first calculate a.8_105 to [-INF, -5][-2, 0][2, 2147483646]:

1140 range_of_stmt (a.8_105) at stmt a.8_105 = PHI <0(3), _10(13)>
  1141 ROS dependence fill
    ROS dep fill (a.8_105) at stmt a.8_105 = PHI <0(3), _10(13)>
    ROS dep fill (_10) at stmt _10 = a.8_105 + -1;
  1142 range_of_expr(a.8_105) at stmt _10 = a.8_105 + -1;
    TRUE : (1142) range_of_expr (a.8_105) [irange] int [-INF, -5][-2, 0][2,
2147483646]

Which we later refine with SCEV:

Statement _10 = a.8_105 + -1;
 is executed at most 2147483647 (bounded by 2147483647) + 1 times in loop 4.
Loops range found for a.8_105: [irange] int [-2, 0] and calculated range:
[irange] int [-INF, -6][-2, 0][2, 2147483645]
  TRUE : (1140) range_of_stmt (a.8_105) [irange] int [-2, 0]
Global Exported: a.8_105 = [irange] int [-2, 0]

I have verified that range_of_expr after this point returns [-2, 0], so we
know both globally and locally this refined range.

However, when we try to fold _10 later on, we use the cached value instead
of recalculating with the new range for a.8_105:

Folding statement: _10 = a.8_105 + -1;
872 range_of_stmt (_10) at stmt _10 = a.8_105 + -1;
  TRUE : (872) cached (_10) [irange] int [-INF, -6][-3, -1][1, 2147483645]
[Bug ipa/110753] [14 Regression] ICE in meet_with_1, at ipa-cp.cc:1057
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110753

Aldy Hernandez changed:

           What    |Removed                    |Added
----------------------------------------------------------------------------
             Status|NEW                        |RESOLVED
         Resolution|---                        |FIXED

--- Comment #5 from Aldy Hernandez ---
(In reply to Aldy Hernandez from comment #3)
> MASK 0xfffc VALUE 0x0 (low 2 bits are known to be 1, everything
> else is unknown).

FWIW, the above implies the low 2 bits are known to be 0, not 1.  Not that
it matters, as I just fixed this PR in trunk :).
[Bug ipa/110753] [14 Regression] ICE in meet_with_1, at ipa-cp.cc:1057
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110753

Aldy Hernandez changed:

           What    |Removed                    |Added
----------------------------------------------------------------------------
                 CC|                           |aldyh at gcc dot gnu.org
           Assignee|unassigned at gcc dot gnu.org |aldyh at gcc dot gnu.org

--- Comment #3 from Aldy Hernandez ---
The intersection of two mask/value pairs is being pessimized incorrectly.

MASK 0x0 VALUE 0x1 (value is known to be 1)
MASK 0xfffc VALUE 0x0 (low 2 bits are known to be 1, everything else is
unknown).

irange_bitmask::intersect() is intersecting these to:

MASK 0x VALUE 0x0 (i.e. a totally unknown value).

This is causing union_bitmask() to return a change when none actually
occurred, because we end up losing the bitmask.

Mine.
[Bug tree-optimization/107043] range information not used in popcount
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107043

Aldy Hernandez changed:

           What    |Removed                    |Added
----------------------------------------------------------------------------
             Status|NEW                        |RESOLVED
         Resolution|---                        |FIXED

--- Comment #6 from Aldy Hernandez ---
Finally... fixed in trunk :).
[Bug tree-optimization/107053] ones bits is not tracked and popcount is not tracked
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107053

Aldy Hernandez changed:

           What    |Removed                    |Added
----------------------------------------------------------------------------
             Status|NEW                        |RESOLVED
         Resolution|---                        |FIXED

--- Comment #5 from Aldy Hernandez ---
Fixed in trunk.
[Bug middle-end/110228] [13/14 Regression] llvm-16 miscompiled due to an maybe uninitialized and optimizations saying that the uninitialized has a nonzero bits of 1.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110228 --- Comment #17 from Aldy Hernandez --- (In reply to Andrew Pinski from comment #4) > Phiopt does this: > ``` > v_13 == 1 ? 1 : LookupFlags_6 > Matching expression match.pd:1990, gimple-match-5.cc:23 > Matching expression match.pd:1990, gimple-match-5.cc:23 > Matching expression match.pd:2479, gimple-match-4.cc:35 > Matching expression match.pd:2482, gimple-match-3.cc:66 > Matching expression match.pd:2489, gimple-match-2.cc:58 > Matching expression match.pd:1947, gimple-match-7.cc:20 > Applying pattern match.pd:4742, gimple-match-7.cc:15326 > Folded into the sequence: > _17 = v_13 == 1; > _18 = LookupFlags_6 | _17; > Removing basic block 5 > ;; basic block 5, loop depth 1 > ;; pred: 4 > ;; succ: 6 > ``` > As zero_one_valued_p returns true for LookupFlags_6 because > tree_nonzero_bits/get_nonzero_bits returns 1 for it. > > So it is ranger in the end that returns that it has a nonzero bits of 1. > Which is completely wrong as LookupFlags_6 is defined by: > # LookupFlags_6 = PHI > > > Which has an uninitialized variable in it which is not defined at what its > value would be BTW, it doesn't seem ranger would be involved here at all, since this bug happens with -O1, and evrp nor VRP* run at this level. The only other way ranger could be involved at -O1 is through DOM threading's use of ranger, but phiopt1 from which your dump above comes from, runs before DOM. I realize further downthread y'all conclude this is an unitialized issue, but I'm just trying to understand how ranger could have been involved. FWIW, there are other passes which set global nonzero bits at -O1, most notably CCP. Theoretically any pass which calls set_range_info() or set_nonzero_bits() would alter everyone's view of a range (strlen pass, PRE, DOM, sprintf, CCP, etc). And way down on the list of likely scenarios is fold_range() being called with a global query object. 
This would get ranger involved in folding something with known global values (no additional lookups), but it's unlikely. You probably got the nonzero bits from some other pass.
[Bug middle-end/110233] [12/13/14 Regression] Wrong code at -Os on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110233 --- Comment #4 from Aldy Hernandez --- FWIW, a less intrusive and probably more correct way of seeing what ranger knows at this point would be to put a breakpoint where you're seeing: Queued stmt for removal. Folds to: 2147483647 This is in tree-ssa-propagate.cc. There you can do: (gdb) p cfun->x_range_query->dump (stderr) which will pick up the current ranger and dump what it knows about the IL, without tickling any new queries: ... ... === BB 3 b.6_9 [irange] int [1334323000, +INF] NONZERO 0x7fff [local count: 357878153]: _28 = b.6_9 * 2; ivtmp.19_27 = 2147483647; ivtmp.22_31 = (unsigned long) goto ; [100.00%] _28 : [irange] int [+INF, +INF] NONZERO 0x7ffe
[Bug middle-end/110233] [12/13/14 Regression] Wrong code at -Os on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110233 --- Comment #3 from Aldy Hernandez --- (In reply to Andrew Pinski from comment #2) > VRP2/DOM3 produces the wrong folding for some reason: > Folding statement: _27 = b.6_9 * 2; > Queued stmt for removal. Folds to: 2147483647 > > I don't understand how it could get that from: Putting a breakpoint in execute_ranger_vrp() for the VRP2 instance, and dumping what the ranger knows with debug_ranger() we see: === BB 3 b.6_9 [irange] int [1334323000, +INF] NONZERO 0x7fff [local count: 357878153]: _28 = b.6_9 * 2; ivtmp.19_27 = 2147483647; ivtmp.22_31 = (unsigned long) goto ; [100.00%] ivtmp.19_27 : [irange] unsigned int [2147483647, 2147483647] NONZERO 0x7fff _28 : [irange] int [+INF, +INF] NONZERO 0x7ffe ivtmp.22_31 : [irange] unsigned long [1, +INF] So on entry to BB3, b.6_9 is [1334323000, +INF]. This comes from the 2->3 edge, which is the only incoming edge to BB3. Isn't multiplying any number in that range by 2 an overflow, and thus undefined? Ranger is coming back with: _28 : [irange] int [+INF, +INF] NONZERO 0x7ffe which sounds reasonable, and is the reason you're seeing 2147483647.
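For the record, the overflow claim checks out arithmetically. A quick sketch (a hypothetical helper, not ranger code) verifying that doubling any int in [1334323000, +INF] exceeds INT_MAX, so the signed multiply invokes undefined behavior on every path:

```cpp
#include <cassert>
#include <climits>

// Hypothetical check: does 2 * x exceed INT_MAX when computed in a
// wider type?  If it does for the entire incoming range, the signed
// int multiply `b.6_9 * 2` always overflows, which is why a saturated
// result like [+INF, +INF] is a defensible answer.
bool
doubling_overflows (long long x)
{
  return 2 * x > (long long) INT_MAX;
}
```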
[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695 --- Comment #38 from Aldy Hernandez --- (In reply to Andrew Macleod from comment #37) > (In reply to CVS Commits from comment #36) > > > For the curious, a particular hot spot for IPA in this area was: > > > > ipcp_vr_lattice::meet_with_1 (const value_range *other_vr) > > { > > ... > > ... > > value_range save (m_vr); > > m_vr.union_ (*other_vr); > > return m_vr != save; > > } > > > > The problem isn't the resizing (since we do that at most once) but the > > fact that for some functions with lots of callers we end up with a huge > > range that gets copied and compared for every meet operation. Maybe > > the IPA algorithm could be adjusted somehow??. > > isn't that the same thing as > > return m_vr.union_ (*other_vr); > > which should be much faster without all the copying... ? Indeed. That is in the committed version of the patch. I just forgot to update the message. The reason I didn't originally do as you suggested was that there was a bug in the union code that returned TRUE for a union that hadn't changed anything. That has also been fixed in trunk.
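The copy-and-compare versus direct-return distinction can be sketched with a toy single-interval range (purely illustrative; this is a stand-in for the value_range API, not GCC's class):

```cpp
#include <algorithm>
#include <cassert>

// Toy single-interval range.  union_ widens this range to cover
// `other` and reports whether anything changed, so callers don't
// need a saved copy plus an equality comparison afterwards.
struct toy_range
{
  int lo, hi;

  bool
  union_ (const toy_range &other)
  {
    int nlo = std::min (lo, other.lo);
    int nhi = std::max (hi, other.hi);
    bool changed = nlo != lo || nhi != hi;
    lo = nlo;
    hi = nhi;
    return changed;
  }
};
```

With a changed-flag return, `meet_with_1` collapses to a single call, which matters when the ranges involved carry many sub-ranges.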
[Bug tree-optimization/109934] [14 Regression] Wrong code at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109934 Aldy Hernandez changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #5 from Aldy Hernandez --- fixed. Thanks for reporting.
[Bug tree-optimization/109934] [14 Regression] Wrong code at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109934 --- Comment #3 from Aldy Hernandez --- Created attachment 55140 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55140&action=edit untested patch in testing
[Bug tree-optimization/109934] [14 Regression] Wrong code at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109934 Aldy Hernandez changed: What|Removed |Added CC||amacleod at redhat dot com Assignee|unassigned at gcc dot gnu.org |aldyh at gcc dot gnu.org --- Comment #2 from Aldy Hernandez --- Woah...this is a latent bug in irange::invert that seems to have been here for a very long time. In the loop unswitching code we do: false_range = true_range; if (!false_range.varying_p () && !false_range.undefined_p ()) false_range.invert (); ...and get the false_range all wrong: (gdb) p debug(false_range) [irange] unsigned int [44, 44][111, 111][222, 222] NONZERO 0xff $40 = void (gdb) n (gdb) n (gdb) n (gdb) n (gdb) p debug(false_range) [irange] unsigned int [44, +INF] In no universe is the inverse of the false_range equal to [44, +INF]. Whoops. This craziness happens here: if (m_num_ranges == m_max_ranges && lower_bound () != type_min && upper_bound () != type_max) { m_base[1] = type_max; m_num_ranges = 1; return; } I have no idea what we were trying to do here, but it's clearly wrong. This probably never triggered because the old legacy code didn't use this code, and the new code used int_range<255> (int_range_max) which made it extremely unlikely this would ever trigger.
[Bug tree-optimization/109920] [14 Regression] value-range.h: Mismatched new [] and delete
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109920 Aldy Hernandez changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #5 from Aldy Hernandez --- Fixed. David, thanks for the analysis. You did 99% of the work here :).
[Bug tree-optimization/109920] [14 Regression] value-range.h: Mismatched new [] and delete
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109920 --- Comment #2 from Aldy Hernandez --- Created attachment 55137 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55137&action=edit untested patch Does this fix the problem?
[Bug ipa/109886] UBSAN error: shift exponent 64 is too large for 64-bit type when compiling gcc.c-torture/compile/pr96796.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109886 --- Comment #4 from Aldy Hernandez --- (In reply to Andrew Pinski from comment #3) > (In reply to Aldy Hernandez from comment #2) > > If irange::supports_p (TREE_TYPE (arg)) is true, we're talking about an > > integer/pointer, but if range_cast is being called on a parm_type of > > RECORD_TYPE, someone's trying to cast a structure to an integer. Is that > > the intent here, because that will not work with ranges?? > > That is correct. The generated code has a VIEW_CONVERT_EXR from an integer > type to a RECORD_TYPE. Eeeech. In that case, then what you suggest is reasonable. Bail if param_type is not supported by the underlying range. Maybe the IPA experts could opine?
[Bug ipa/109886] UBSAN error: shift exponent 64 is too large for 64-bit type when compiling gcc.c-torture/compile/pr96796.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109886 --- Comment #2 from Aldy Hernandez --- (In reply to Andrew Pinski from comment #1) > Breakpoint 6, range_cast (r=..., type=) at > /home/apinski/src/upstream-gcc/gcc/gcc/range-op.cc:4853 > 4853 Value_Range tmp (r); > > > Confirmed. > The code looks like: > ``` > int g_5, func_1_l_32, func_50___trans_tmp_31; > ... > int func_1() { func_50(g_3_4, g_5, func_1_l_32, 8, 3); } > ... > struct S0 *func_50(int p_51, struct S0 p_52, struct S1 p_53, int p_54, >int p_55) { > ... > } > ``` > > Code in gcc: > if (TREE_CODE (arg) == SSA_NAME > && param_type > /* Limit the ranger query to integral types as the rest > of this file uses value_range's, which only hold > integers and pointers. */ > && irange::supports_p (TREE_TYPE (arg)) > && get_range_query (cfun)->range_of_expr (vr, arg) > && !vr.undefined_p ()) > { > value_range resvr = vr; > range_cast (resvr, param_type); > if (!resvr.undefined_p () && !resvr.varying_p ()) > ipa_set_jfunc_vr (jfunc, ); > else > gcc_assert (!jfunc->m_vr); > } > else > gcc_assert (!jfunc->m_vr); > > > Maybe there should be an extra check for `irange::supports_p (param_type)` > too to catch the case where param_type type is not supported at all. If irange::supports_p (TREE_TYPE (arg)) is true, we're talking about an integer/pointer, but if range_cast is being called on a parm_type of RECORD_TYPE, someone's trying to cast a structure to an integer. Is that the intent here, because that will not work with ranges??
[Bug tree-optimization/109791] -Wstringop-overflow warning with -O3 and _GLIBCXX_USE_CXX11_ABI=0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109791 --- Comment #6 from Aldy Hernandez --- > but the issue with the PHI node remains unless we sink the part > (but there's many uses of __i_14). I guess it's still the "easiest" > way to get rangers help. Aka make > > # __i_14' = PHI <1(10), 2(9)> > __i_14 = + __i_14'; // would be a POINTER_PLUS_EXPR > > it's probably still not a complete fix but maybe a good start. Of course > it increases the number of stmts - [ + 1B] was an 'invariant' > (of course the PHI result isn't). There's not a good place for this > transform - we never "fold" PHIs (and this would be an un-folding). Ughh, that sucks. Let's see if Andrew has any ideas, but on my end I won't be able to work on prange until much later this cycle, assuming I finish what I have on my plate.
[Bug tree-optimization/109791] -Wstringop-overflow warning with -O3 and _GLIBCXX_USE_CXX11_ABI=0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109791 --- Comment #4 from Aldy Hernandez --- BTW, another reason I had to drop the prange work was because IPA was doing their own thing with ranges outside of the irange API, so it was harder to separate things out. So really, all this stuff was related to legacy, which is mostly gone, and my upcoming work on IPA later this cycle should clean up the rest of IPA.
[Bug tree-optimization/109791] -Wstringop-overflow warning with -O3 and _GLIBCXX_USE_CXX11_ABI=0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109791 --- Comment #3 from Aldy Hernandez --- (In reply to Richard Biener from comment #2) > Confirmed. This is a missed optimization, we fail to optimize the loop guard > > [local count: 329643239]: > _4 = (unsigned long) [(void *) + 2B]; > _6 = (unsigned long) __i_14; > _50 = -_6; > _100 = _4 + 18446744073709551615; > _40 = _100 - _6; > _41 = _40 > 13; > if (_41 != 0) Do we even get a non-zero range for _4? I'm assuming that even if we get that, we can't see that +2B minus 1 is also nonzero? > > with __i_14 being > > [local count: 452186132]: > # __i_14 = PHI < [(void *) + 1B](10), > [(void *) + 2B](9)> > > I'll note that the strlen pass runs before VRP (but after DOM), but I'll > also note that likely ranger isn't very good with these kind of > "symbolic" ranges? How would we handle this? Using two > relations, __i_14 >= + 1 && __i_14 <= + 2? Yeah, we don't do much with that. Although the pointer equivalency class should help in VRP's case. We do some simple pointer tracking and even call into gimple fold to simplify statements, but it's far from perfect and as you say, strlen is running before VRP, so it wouldn't help in this case. 
> > DOM has > > Optimizing block #16 > > 1>>> STMT 1 = [(void *) + 2B] ge_expr __i_14 > 1>>> STMT 1 = [(void *) + 2B] ne_expr __i_14 > 1>>> STMT 0 = [(void *) + 2B] eq_expr __i_14 > 1>>> STMT 1 = [(void *) + 2B] gt_expr __i_14 > 1>>> STMT 0 = [(void *) + 2B] le_expr __i_14 > Optimizing statement _4 = (unsigned long) [(void *) + 2B]; > LKUP STMT _4 = nop_expr [(void *) + 2B] > 2>>> STMT _4 = nop_expr [(void *) + 2B] > Optimizing statement _6 = (unsigned long) __i_14; > LKUP STMT _6 = nop_expr __i_14 > 2>>> STMT _6 = nop_expr __i_14 > Optimizing statement _50 = -_6; > Registering value_relation (_6 pe64 __i_14) (bb16) at _6 = (unsigned long) > __i_14; > LKUP STMT _50 = negate_expr _6 > 2>>> STMT _50 = negate_expr _6 > Optimizing statement _100 = _4 + 18446744073709551615; > LKUP STMT _100 = _4 plus_expr 18446744073709551615 > 2>>> STMT _100 = _4 plus_expr 18446744073709551615 > Optimizing statement _40 = _100 - _6; > Registering value_relation (_100 < _4) (bb16) at _100 = _4 + > 18446744073709551615; > LKUP STMT _40 = _100 minus_expr _6 > 2>>> STMT _40 = _100 minus_expr _6 > Optimizing statement _41 = _40 > 13; > LKUP STMT _41 = _40 gt_expr 13 > 2>>> STMT _41 = _40 gt_expr 13 > LKUP STMT _40 le_expr 14 > Optimizing statement if (_41 != 0) > > Visiting conditional with predicate: if (_41 != 0) > > With known ranges > _41: [irange] bool VARYING Ranger won't do anything, but can DOM's scoped tables do anything here? The hybrid threader in DOM first asks DOM's internal mechanism before asking ranger (which seems useless in this case): dom_jt_simplifier::simplify (gimple *stmt, gimple *within_stmt, basic_block bb, jt_state *state) { /* First see if the conditional is in the hash table. */ tree cached_lhs = m_avails->lookup_avail_expr (stmt, false, true); if (cached_lhs) return cached_lhs; /* Otherwise call the ranger if possible. 
*/ if (state) return hybrid_jt_simplifier::simplify (stmt, within_stmt, bb, state); return NULL; } Our long term plan for pointers was providing a prange class and separating pointers from irange. This class would track zero/nonzero mostly, but it would also do some pointer equivalence tracking, thus subsuming what we do with pointer_equiv_analyzer, which is currently restricted to VRP and is an on-the-side hack. The prototype I had for it last release tracked the equivalence plus an offset, so we should be able to do arithmetic on a prange of say [ + X]. I had to put it aside because the frange work took way longer than expected, plus legacy got in the way. Neither of these is an issue now, so we'll see... but that's the plan. I should dust off those patches :-/.
[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695 --- Comment #35 from Aldy Hernandez --- We could also tweak the number of sub-ranges. 8 (??) also sounds good for a few percent less in performance drop, if we care. p.s. I did try the auto_vec thing for a 25% loss in VRP performance, even when using address(), reserve(), etc. I may have gotten something wrong, but it didn't look promising. I could post my attempt and someone could take it from there, but I think the one-irange approach, with sensible defaults that automatically grow to MAX, is better.
[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695 --- Comment #34 from Aldy Hernandez --- Excellent ideas! For that matter, we may get away with defaulting to 3 sub-ranges and always resizing as needed (up to MAX). Needing more than 3 sub-ranges is so rare (less than 0.5% of the time), that the penalty will be small. Furthermore, these defaults are sensible enough that we could nuke int_range altogether and have irange have this small [3*2] array. After all, most uses of int_range now are int_range_max, since we never know the size of the range (except in rare cases such as boolean_type_node, etc). This would simplify the code and get rid of the annoying templates which I hate. No need for int_range_max, or int_range, etc. Just plain irange. This would give us an irange of 592 bytes compared to 40912 for int_range_max currently. Plus, it's not that far away from int_range<2> which currently is 432 bytes, and as I mentioned, barely happens as we mostly use int_range_max. I think this is a nice trade off. Cleaner more flexible code, without templates. Oh... preliminary tests show it's a 5% penalty for VRP, which is more than covered by our 13.22% improvement (plus Andrew's cache improvements) to VRP.
[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695 --- Comment #30 from Aldy Hernandez --- (In reply to Jakub Jelinek from comment #29) > Comment on attachment 55031 [details] > WIP patch for a dynamic int_range<> > > What I meant is that by using an auto_vec you could avoid reimplementing larger > chunks of vec.h/vec.cc for this. > Resizing by adding 10 more ranges can have higher compile time cost than > what vec.cc (calculate_allocation_1) does - doubles the size each time for > smaller sizes and multiplies the previous by 3 / 2 for larger ones. Hmmm, that may require a lot more work reshuffling how irange is implemented internally. I'll take a peek, but I can't afford to spend another week on this ;-). Also, adding 10 or multiplying by 2, or even adding 5 IIRC, didn't make much of a difference in our benchmarks.
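Jakub's point about the vec.cc growth policy can be sketched as follows (the starting size and the small/large threshold here are illustrative, not calculate_allocation_1's exact values):

```cpp
#include <cassert>

// Doubling while small, then 3/2 growth once larger, amortizes
// reallocation cost better than adding a fixed 10 slots each time:
// the number of reallocations becomes logarithmic in the final size.
unsigned
grow_capacity (unsigned current, unsigned needed)
{
  unsigned alloc = current ? current : 4;
  while (alloc < needed)
    alloc = alloc < 64 ? alloc * 2 : alloc + alloc / 2;
  return alloc;
}
```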
[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695 --- Comment #28 from Aldy Hernandez --- Created attachment 55031 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55031&action=edit WIP patch for a dynamic int_range<> Here's my WIP.
[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695 --- Comment #24 from Aldy Hernandez --- FYI. I originally tried new/delete for allocation, which was a tad slower than ggc_alloc / ggc_free. Not too much, but measurable. Another idea would be to have a global obstack which auto_int_range<> uses, which gets freed at the end of every pass. This saves us various ggc_free()s throughout, especially at destruction.
[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695 --- Comment #23 from Aldy Hernandez --- An update on the int_range_max memory bloat work. As Andrew mentioned, having int_range<25> solves the problem, but is just kicking the can down the road. I ran some stats on what we actually need on a bootstrap, and 99.7% of ranges fit in a 3 sub-range range, but we need more to represent switches, etc. There's no clear winner for choosing <N>, as the distribution for anything past <3> is rather random. What I did see was that at no point do we need more than 125 sub-ranges (on a set of .ii files from a bootstrap). I've implemented various alternatives using a dynamic approach similar to what we do for auto_vec. I played with allocating 2x as much as needed, and allocating 10 or 20 more than needed, as well as going from N to 255 in one go. All of it required some shuffling to make sure the penalty isn't much wrt virtuals, etc., but I think the dynamic approach is the way to go. The question is how much of a performance hit we are willing to take in order to reduce the memory footprint. Memory to speed is a linear relationship here, so we just have to pick a number we're happy with. Here are some numbers for various sub-ranges (the sub-ranges grow automatically in union, intersect, invert, and assignment, which are the methods that grow in sub-ranges):

trunk (wide_ints <255>) => 40912 bytes
GCC 12 (trees <255>)    =>  4112 bytes
auto_int_range<2>       =>   432 bytes (5.14% penalty for VRP)
auto_int_range<3>       =>   592 bytes (4.01% penalty)
auto_int_range<8>       =>  1392 bytes (2.68% penalty)
auto_int_range<10>      =>  1712 bytes (2.14% penalty)

As you can see, even at N=10, we're still 24X smaller than trunk and 2.4X smaller than GCC12 for a 2.14% performance drop. I'm tempted to just pick a number and tweak this later as we have ultimate flexibility here. Plus, we can also revert to a very small N, and have passes that care about switches use their own temporaries (auto_int_range<20> or such).
Note that we got a 13.22% improvement for the wide_int+legacy work, so even the absolute worst case of a 5.14% penalty would have us sitting on a net 8.76% improvement over GCC12. Bike shedding welcome ;-)
[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695 Aldy Hernandez changed: What|Removed |Added CC||rguenth at gcc dot gnu.org --- Comment #16 from Aldy Hernandez --- Is there a way to measure peak memory usage in the compiler? I'd like to gather some stats. Also do we have a handful of programs we typically use to measure memory usage? ISTR that Richard (??) had a set of memory hog programs.
[Bug tree-optimization/54627] VRP uses lots of memory and compile-time
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54627 Aldy Hernandez changed: What|Removed |Added CC||aldyh at gcc dot gnu.org --- Comment #3 from Aldy Hernandez --- Just ran into this while looking for something else. Should this be closed, as the implementation of VRP is different now? (i.e. no equivalence set bitmaps).
[Bug ipa/109711] [14 regression] ICE (tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in verify_range, at value-range.cc:1060) when building ffmpeg-4.4.4 since r14-377-gc92b8be9b52b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109711 Aldy Hernandez changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #11 from Aldy Hernandez --- fixed
[Bug ipa/109711] [14 regression] ICE (tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in verify_range, at value-range.cc:1060) when building ffmpeg-4.4.4 since r14-377-gc92b8be9b52b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109711 Aldy Hernandez changed: What|Removed |Added Attachment #54980|0 |1 is obsolete|| --- Comment #9 from Aldy Hernandez --- Created attachment 54982 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54982&action=edit patch in testing I think it's easiest to restore the legacy behavior for now. A proper conversion of IPA should follow.
[Bug ipa/109711] [14 regression] ICE (tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in verify_range, at value-range.cc:1060) when building ffmpeg-4.4.4 since r14-377-gc92b8be9b52b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109711 --- Comment #8 from Aldy Hernandez --- Created attachment 54980 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54980&action=edit untested This may fix it.
[Bug ipa/109711] [14 regression] ICE (tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in verify_range, at value-range.cc:1060) when building ffmpeg-4.4.4 since r14-377-gc92b8be9b52b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109711 Aldy Hernandez changed: What|Removed |Added CC||amacleod at redhat dot com --- Comment #7 from Aldy Hernandez --- It looks like IPA is trying to set a range for a float even though value_range does not support floats. This worked before because legacy silently set a range of error_mark_node.
[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695 --- Comment #13 from Aldy Hernandez --- (In reply to Jakub Jelinek from comment #12) > Perhaps change int_range to have the wide_ints as auto_vec with reserved > space for a few (perhaps 3 (times 2))? Our original implementation was exactly that, but we switched to the current one years ago (maybe it coincided with our switch back to trees, I can't remember). But yes, we're discussing options.
[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695 --- Comment #6 from Aldy Hernandez --- I forgot to add. Running with "ulimit -s unlimited" does not segfault.
[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695 Aldy Hernandez changed: What|Removed |Added CC||amacleod at redhat dot com --- Comment #4 from Aldy Hernandez --- Deep recursion in the ranger is causing us to run out of stack space. This failure coincides with switching the internal representation of irange's to wide_ints which take more space. The problem happens in the waccess pass and can be seen with: --param=ranger-debug=all It looks like we start going bat shit crazy around: 2401 range_on_edge (_1011) on edge 1016->1017 2402 range_on_exit (_1011) from BB 1016 2403 range_of_expr(_1011) at stmt resx 4 2404 range_on_entry (_1011) to BB 1016 2405 range_of_stmt (_1011) at stmt _1011 = PHI <_1947(3), _1010(1013)>
[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695 Aldy Hernandez changed: What|Removed |Added Last reconfirmed||2023-05-02 Status|UNCONFIRMED |NEW Ever confirmed|0 |1
[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695 Aldy Hernandez changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |aldyh at gcc dot gnu.org Priority|P3 |P1 CC|aldyh at redhat dot com|aldyh at gcc dot gnu.org --- Comment #3 from Aldy Hernandez --- Mine. Started with: c92b8be9b52b7e0de5ad67bc268dad1498181908 is the first bad commit commit c92b8be9b52b7e0de5ad67bc268dad1498181908 Author: Aldy Hernandez Date: Sun Feb 19 17:43:43 2023 +0100 Convert internal representation of irange to wide_ints.
[Bug ipa/109639] [14 regression] Internal compiler error: tree check: expected integer_cst, have addr_expr in to_wide, at tree.h:6283 since r14-249-g3c9372dfee0bb8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109639 Aldy Hernandez changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #16 from Aldy Hernandez --- fixed
[Bug ipa/109639] [14 regression] Internal compiler error: tree check: expected integer_cst, have addr_expr in to_wide, at tree.h:6283 since r14-249-g3c9372dfee0bb8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109639 --- Comment #14 from Aldy Hernandez --- Created attachment 54939 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54939&action=edit proposed patch in testing
[Bug ipa/109639] [14 regression] Internal compiler error: tree check: expected integer_cst, have addr_expr in to_wide, at tree.h:6283 since r14-249-g3c9372dfee0bb8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109639 --- Comment #13 from Aldy Hernandez --- Sorry, it wasn't the setter doing the normalization, but the range_fold_{unary,binary}_expr helpers. Since IPA was the only user of these deprecated functions, this should be relatively easy to contain.
[Bug ipa/109639] [14 regression] Internal compiler error: tree check: expected integer_cst, have addr_expr in to_wide, at tree.h:6283 since r14-249-g3c9372dfee0bb8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109639 Aldy Hernandez changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |aldyh at gcc dot gnu.org CC||amacleod at redhat dot com --- Comment #12 from Aldy Hernandez --- We're trying to build a range of [, ] and are failing because irange's no longer support symbolics. Truth be told, we haven't supported them in a while, because ranger doesn't. However, the setter was normalizing addresses to non-zero, which it is no longer doing. Mine.
[Bug ipa/109639] [14 regression] Internal compiler error: tree check: expected integer_cst, have addr_expr in to_wide, at tree.h:6283 since r14-249-g3c9372dfee0bb8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109639 Aldy Hernandez changed: What|Removed |Added CC||slyfox at gcc dot gnu.org --- Comment #11 from Aldy Hernandez --- *** Bug 109643 has been marked as a duplicate of this bug. ***
[Bug ipa/109643] [14 Regression] IPA inline ICE on pkg-config-0.29.2 since r14-249-g3c9372dfee0bb8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109643 Aldy Hernandez changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |DUPLICATE --- Comment #4 from Aldy Hernandez --- dup *** This bug has been marked as a duplicate of bug 109639 ***
[Bug tree-optimization/109593] [14 Regression] Valgrind error doing build of latest gcc trunk since r14-32-g10e481b154c5fc63
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109593 Aldy Hernandez changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #11 from Aldy Hernandez --- fixed
[Bug tree-optimization/109593] [14 Regression] Valgrind error doing build of latest gcc trunk since r14-32-g10e481b154c5fc63
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109593 --- Comment #8 from Aldy Hernandez --- Created attachment 54908 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54908&action=edit untested patch Patch in testing. Does this fix the problem on the reporter's side?
[Bug c/109593] valgrind error doing build of latest gcc trunk
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109593 --- Comment #5 from Aldy Hernandez --- Huh. I'm gonna guess: commit 10e481b154c5fc63e6ce4b449ce86cecb87a6015 Author: Aldy Hernandez Date: Thu Jan 26 04:46:54 2023 +0100 Return true from operator== for two identical ranges containing NAN. - if (known_isnan () || src.known_isnan ()) - return false; - return (real_identical (_min, _min) && real_identical (_max, _max) && m_pos_nan == src.m_pos_nan If either operand is a NAN, m_min or m_max have garbage. Can you verify that's the commit? Either way, I can take a look at it on Monday.
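The hazard described here can be modeled with a toy float-range (purely illustrative; this is not the real frange layout): when a range is NAN-only, its endpoints hold uninitialized data, so equality must be decided without reading them.

```cpp
#include <cassert>

// Toy float range: lo/hi are only meaningful when !nan_only.
struct toy_frange
{
  double lo, hi;   // garbage when nan_only is set
  bool nan_only;

  bool
  equal_p (const toy_frange &other) const
  {
    // Decide NAN-only cases first, without touching lo/hi, since
    // those fields are not initialized in that state.  Comparing
    // them would read indeterminate memory -- exactly the kind of
    // access valgrind flags.
    if (nan_only || other.nan_only)
      return nan_only == other.nan_only;
    return lo == other.lo && hi == other.hi;
  }
};
```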
[Bug tree-optimization/91645] Missed optimization with sqrt(x*x)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91645 Aldy Hernandez changed: What|Removed |Added CC||llvm at rifkin dot dev --- Comment #10 from Aldy Hernandez --- *** Bug 103559 has been marked as a duplicate of this bug. ***
[Bug tree-optimization/103559] Can't optimize away < 0 check on sqrt
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103559 Aldy Hernandez changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |DUPLICATE CC||aldyh at gcc dot gnu.org --- Comment #5 from Aldy Hernandez --- (In reply to Jeremy R. from comment #0) > For a simple invocation of sqrt, gcc inserts a < 0 check to set math errno > if needed. E.g. > > float f(float x) { > return sqrt(x); > } > > Is generated as > > f(float): > vxorps xmm1, xmm1, xmm1 > vucomissxmm1, xmm0 > ja .L10 > vsqrtss xmm0, xmm0, xmm0 > ret > .L10: > jmp sqrtf > > > Unfortunately, this check is still present when the GCC is able to prove > that x is non-negative: > > float f(float x) { > if(x < 0) [[unlikely]] { > __builtin_unreachable(); > } else { > return sqrt(x); > } > } x could also be a NAN which I *think* the hardware sqrt won't handle, so a better test would be: if (x < 0.0 || __builtin_isnan(x)) [[unlikely]] __builtin_unreachable (); or perhaps: if (!__builtin_isgreater(x, 0.0)) Either way, this is a subset of PR91645 so I'm closing it as a duplicate. *** This bug has been marked as a duplicate of bug 91645 ***
[Bug tree-optimization/91645] Missed optimization with sqrt(x*x)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91645 --- Comment #9 from Aldy Hernandez --- It looks like what we want for this test is actually !isgreaterequal() not isless(), since we want to exclude the possibility of a NAN. Like this: float test (float x) { if (!__builtin_isgreaterequal (x, 0.0)) __builtin_unreachable(); return sqrtf (x); } After VRP1 removes the unreachable, the range for x_1 is correctly exported as >= -0.0 without a NAN: Global Exported (via unreachable): x_1(D) = [frange] float [-0.0 (-0x0.0p+0), +Inf] We end up with this: float test (float x) { float _4; [local count: 1073741824]: _4 = sqrtf (x_1(D)); return _4; } which then CDCE expands with the unnecessary checking code: [local count: 1073741824]: DCE_COND_LB.2_5 = x_1(D); DCE_COND_LB_TEST.3_6 = DCE_COND_LB.2_5 u>= 0.0; if (DCE_COND_LB_TEST.3_6 != 0) goto ; [99.95%] else goto ; [0.05%] [local count: 1073204960]: _4 = .SQRT (x_1(D)); goto ; [100.00%] [local count: 536864]: _7 = sqrtf (x_1(D)); So the CDCE pass needs to be enhanced to use the ranger, instead of heuristics, to determine that the argument to sqrtf is neither negative nor NAN. In this particular case, we could use global ranges for free, but there's no reason we can't use an actual on-demand ranger for more complex scenarios. Just a guess here, but use_internal_fn() in CDCE shrink wraps the call to sqrt into a check with appropriate dispatch. We could emit the .SQRT call directly if the range of x_1 is not NAN and not negative.
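The proposed enhancement boils down to a predicate like the following (a hypothetical helper sketching the decision, not CDCE code): the shrink-wrap guard before .SQRT is redundant exactly when the known range excludes both NANs and negative values.

```cpp
#include <cassert>

// Toy model of what a range query might tell us about sqrt's argument.
struct range_info
{
  double min;      // known lower bound
  bool maybe_nan;  // range may contain a NAN
};

// The u>= 0.0 guard (and the errno-setting libcall fallback) is only
// needed when the argument could be negative or a NAN.
bool
sqrt_guard_needed (const range_info &r)
{
  return r.maybe_nan || r.min < -0.0;
}
```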
[Bug tree-optimization/109154] [13 regression] jump threading de-optimizes nested floating point comparisons
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154 --- Comment #13 from Aldy Hernandez --- (In reply to Jakub Jelinek from comment #12) > (In reply to Aldy Hernandez from comment #10) > > BTW, I don't think it helps at all here, but casting from l_10 to a float, > > we know _1 can't be either -0.0 or +-INF or +-NAN. We could add a range-op > > entry for NOP_EXPR / CONVERT_EXPR to expose this fact. Well, at the very > > least that it can't be a NAN...in the current representation for frange's. > > We definitely should add range-ops for conversions from integral to floating > point and from floating to integral and their reverses. But until we have > more than one range, if the integral value is VARYING, for 32-bit signed int > the range would be > [-0x1.p+31, 0x1.p+31] so nothing specific around zero. With 3+ ranges we > could make it > [-0x1.p+31, -1.][0., 0.][1., 0x1.p+31] if we think normal values around zero > are important special cases. Ultimately we want "unlimited" sub-ranges like we have for int_range_max, but who knows what pandora's box that will open :-/.
[Bug c/109233] [12/13 Regression] warning: array subscript 5 is above array bounds of ‘struct tg3_napi[5]’ since r12-2591
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109233 --- Comment #10 from Aldy Hernandez --- (In reply to Jakub Jelinek from comment #8) > And on the ranger side why we have determined the [0, 5] range rather than > [0, 4], whether it is related to inaccurate number of iterations estimation, > or ranger using it incorrectly, ... The [0, 5] is actually coming from SCEV, which ranger is using to refine the range. Presumably, ranger is doing worse than SCEV, because it doesn't improve it. $ grep 'Loops range fo' a.c.*evrp Loops range found for i_3: [irange] int [0, 5] NONZERO 0x7 and calculated range :[irange] int [-2147483647, +INF] Loops range found for i_3: [irange] int [0, 5] NONZERO 0x7 and calculated range :[irange] int [0, 6] NONZERO 0x7 Perhaps Andrew can pontificate on the recalculations / iterations / etc.
[Bug tree-optimization/109154] [13 regression] jump threading de-optimizes nested floating point comparisons
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154 --- Comment #10 from Aldy Hernandez --- (In reply to Andrew Macleod from comment #9) > (In reply to Richard Biener from comment #7) > > (In reply to Richard Biener from comment #6) > > > ah, probably it's the missing CSE there: > > > > > > : > > > _1 = (float) l_10; > > > _2 = _1 < 0.0; > > > zone1_17 = (int) _2; > > > if (_1 < 0.0) BTW, I don't think it helps at all here, but casting from l_10 to a float, we know _1 can't be either -0.0 or +-INF or +-NAN. We could add a range-op entry for NOP_EXPR / CONVERT_EXPR to expose this fact. Well, at the very least that it can't be a NAN...in the current representation for frange's.
[Bug tree-optimization/109154] [13 regression] jump threading de-optimizes nested floating point comparisons
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154 --- Comment #8 from Aldy Hernandez --- (In reply to avieira from comment #5) > Im slightly confused here, on entry to BB 5 we know the opposite of _1 < 0.0 > no? if we branch to BB 5 we know !(_1 < 0.0) so we can't fold _1 <= 1.0, we > just know that the range of _1 is >= 0.0 . Or am I misreading, I've not > tried compiling myself just going off the code both of you posted here. Sorry, I should've been more clear. _1 is < 0.0 on entry to BB5, but only on the 4->5->?? path which is what's being analyzed.
[Bug tree-optimization/109154] [13 regression] jump threading de-optimizes nested floating point comparisons
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154 Aldy Hernandez changed: What|Removed |Added CC||amacleod at redhat dot com, ||law at gcc dot gnu.org, ||rguenth at gcc dot gnu.org --- Comment #4 from Aldy Hernandez --- (In reply to Tamar Christina from comment #3) > Aldy, any thoughts here? We need a "real" threading expert on this one, as the decision by ranger is correct. It looks like this is a profitability issue in the threader. The problem can be seen with -O2 --param=threader-debug=all, on the threadfull1 dump: The threaded path is: path: 4->5->7 SUCCESS ranger can fold the conditional in BB5: if (_1 < 1.0e+0) goto ; [41.00%] else goto ; [59.00%] ...because on entry to BB5 we know _1 < 0.0: [local count: 955630225]: _1 = (float) l_9; _2 = _1 < 0.0; zone1_15 = (int) _2; if (_1 < 0.0) goto ; [41.00%] else goto ; [59.00%] [local count: 391808389]: [local count: 955630225]: # iftmp.0_10 = PHI fasten_main_natpro_chrg_init.2_3 = fasten_main_natpro_chrg_init; _4 = fasten_main_natpro_chrg_init.2_3 * iftmp.0_10; _5 = (float) _4; if (_1 < 1.0e+0) goto ; [41.00%] else goto ; [59.00%] If this shouldn't be threaded because of a hot/cold issue, perhaps the code goes in back_threader_profitability::profitable_path_p() where there's already logic wrt hot blocks.
[Bug ipa/81323] IPA-VRP doesn't handle return values
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81323 --- Comment #7 from Aldy Hernandez --- (In reply to Andrew Macleod from comment #6) > (In reply to Jakub Jelinek from comment #4) > > Or the ranger could do it itself, similarly to how it handles .ASSUME, but > > without actually querying anything but the global range of the return value > > if any. Though, doing that in the range means that we won't know ranges of > > functions which with LTO are in a different partition, while doing it as IPA > > optimization could allow even that to work. > > Aldy has been doing some IPA/LTO related cleanup with ranges... Hopefully we > can get this all connected next release. I'm sitting on a patchset for GCC 14 that will revamp all the range handling in ipa-*.* to be less ad-hoc, and use generic ranges (vrange even, so type-agnostic). The plan is to integrate this with the internal range storage used by IPA (currently pairs of wide ints or value_range's) so that IPA at least has the infrastructure to handle generic ranges. Some discussion upstream is still needed, but the general idea should be feasible for GCC 14. It will be up to the IPA maintainers to handle floats/etc internally though.
[Bug tree-optimization/109008] [13 Regression] Wrong code in scipy package since r13-3926-gd4c2f1d376da6f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109008 --- Comment #29 from Aldy Hernandez --- Created attachment 54604 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54604&action=edit untested patch for NAN state copying This is what I had in mind. Notice I haven't touched the fields in frange itself, to keep disruption to a minimum in this release. In the next release we could replace m_pos_nan and m_neg_nan with nan_state. If this helps, I could test and document it.
[Bug tree-optimization/109008] [13 Regression] Wrong code in scipy package since r13-3926-gd4c2f1d376da6f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109008 --- Comment #27 from Aldy Hernandez --- (In reply to Richard Biener from comment #21) > So without messing with real.cc to try exposing 0.5ulp adjustments for GCC > 13 I'd simply do something like the following: > > diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc > index ff42b95de4f..1ae68503664 100644 > --- a/gcc/range-op-float.cc > +++ b/gcc/range-op-float.cc > @@ -2214,7 +2214,26 @@ public: > range_op_handler minus (MINUS_EXPR, type); > if (!minus) >return false; > -return float_binary_op_range_finish (minus.fold_range (r, type, lhs, > op2), > +/* The forward operation is > +lhs = r + op2 > + where the result is +-0.5ulp of lhs before rounding. For the > + reverse operation we need the lhs range before rounding, so > + conservatively use nextafter on it. > + ??? REAL_VALUE_TYPE could handle this more precisely if we > + make sure to not round the inputs for the format before the > + real_operation is carried out and when we can properly round > + towards +-Inf for the lower/upper bounds. */ > +frange wlhs (lhs); > +if (!lhs.known_isnan ()) > + { > + REAL_VALUE_TYPE lhsl = lhs.lower_bound (); > + frange_nextafter (TYPE_MODE (type), lhsl, dconstninf); > + REAL_VALUE_TYPE lhsu = lhs.upper_bound (); > + frange_nextafter (TYPE_MODE (type), lhsu, dconstinf); > + wlhs.set (type, lhsl, lhsu); > + wlhs.union_ (lhs); /* Copy NaN state. */ > + } > +return float_binary_op_range_finish (minus.fold_range (r, type, wlhs, > op2), > r, type, lhs); >} >virtual bool op2_range (frange , tree type, Could we abstract this into an inline function until this could be properly implemented in real.cc?
[Bug tree-optimization/109008] [13 Regression] Wrong code in scipy package since r13-3926-gd4c2f1d376da6f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109008 Aldy Hernandez changed: What|Removed |Added CC||amacleod at redhat dot com --- Comment #26 from Aldy Hernandez --- (In reply to Richard Biener from comment #16) > (In reply to Jakub Jelinek from comment #14) > > Created attachment 54599 [details] > > gcc13-pr109008-wip.patch > > > > I have now this WIP but it doesn't work correctly yet, need to debug it now, > > just finished writing it and making it compile. > > Note, after the binary search it is still unfinished, I'd like to then > > adjust it one bit at a time to narrow it further. But before working on > > that it needs to work correctly. > > Looks nicer and more precise (even though the iterating function is ugly > and you also suffer from the lack of copying of alternate range state, > but using .union () is a way out here I guess ;)). While working on LTO streaming of ranges, I also noticed that we don't have a way of accessing the individual NAN sign bits or a way of copying NAN state. Why don't we get this right instead of hacking around it? Would it help if we provided the following? nan_state_t frange::get_nan_state (); frange::set (tree type, const REAL_VALUE_TYPE &, const REAL_VALUE_TYPE &, const nan_state_t &, value_range_kind = VR_RANGE); We could implement nan_state_t to include the m_pos_nan and m_neg_nan for minimal disruption now, and later use it for extending the NAN information (signalling, etc).
[Bug tree-optimization/107561] [13 Regression] g++.dg/pr71488.C and [g++.dg/warn/Warray-bounds-16.C -m32] regression due to -Wstringop-overflow problem
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107561 --- Comment #14 from Aldy Hernandez --- (In reply to Richard Biener from comment #11) > So I've missed the VR_ANTI_RANGE handling in get_size_range where we run into > > wide_int maxsize = wi::to_wide (max_object_size ()); > min = wide_int::from (min, maxsize.get_precision (), UNSIGNED); > max = wide_int::from (max, maxsize.get_precision (), UNSIGNED); > if (wi::eq_p (0, min - 1)) > { > /* EXP is unsigned and not in the range [1, MAX]. That means > it's either zero or greater than MAX. Even though 0 would > normally be detected by -Walloc-zero, unless ALLOW_ZERO > is set, set the range to [MAX, TYPE_MAX] so that when MAX > is greater than the limit the whole range is diagnosed. */ > wide_int maxsize = wi::to_wide (max_object_size ()); > if (flags & SR_ALLOW_ZERO) > { > if (wi::leu_p (maxsize, max + 1) > || !(flags & SR_USE_LARGEST)) > min = max = wi::zero (expprec); > else > { > min = max + 1; > max = wi::to_wide (TYPE_MAX_VALUE (exptype)); > } > } > else > { > min = max + 1; > max = wi::to_wide (TYPE_MAX_VALUE (exptype)); > } > > and from [0,0] [8, +INF] pick [8, +INF] based on the comments reasoning. Ughh, you're reaching all the problematic cases I ran into while trying to remove legacy. > > This all wouldn't happen if we'd be able to CSE the zero size ... > > We can now try to put additional heuristic ontop of the above heuristic, > namely when the object we write to is of size zero set SR_ALLOW_ZERO. > Or try to "undo" the multiplication trick which would probably make us > end up with VARYING. > > I'll note that with the earlier proposed change we regress the following, > that's an observation I make a lot of times - all "weirdness" in the code > is backed by (artificial) testcases verifying it works exactly as coded ... 
+1 > > FAIL: gcc.dg/Wstringop-overflow-15.c pr82608 (test for warnings, line 37) > FAIL: gcc.dg/Wstringop-overflow-15.c pr82608 (test for warnings, line 38) > FAIL: gcc.dg/Wstringop-overflow-15.c pr82608 (test for warnings, line 69) > FAIL: gcc.dg/Wstringop-overflow-15.c pr82608 (test for warnings, line 70) > FAIL: gcc.dg/Wstringop-overflow-56.c (test for warnings, line 64) > FAIL: gcc.dg/Wstringop-overflow-56.c (test for warnings, line 75) > FAIL: gcc.dg/Wstringop-overflow-56.c (test for warnings, line 86) > FAIL: gcc.dg/Wstringop-overflow-56.c (test for warnings, line 97) > FAIL: gcc.dg/Wstringop-overflow-56.c (test for warnings, line 108) > FAIL: gcc.dg/Wstringop-overflow-56.c (test for warnings, line 148) > FAIL: gcc.dg/Wstringop-overflow-56.c (test for warnings, line 159) > FAIL: gcc.dg/attr-alloc_size-11.c (test for warnings, line 52) > FAIL: gcc.dg/attr-alloc_size-11.c (test for warnings, line 53) > FAIL: gcc.dg/attr-alloc_size-11.c (test for warnings, line 54) > FAIL: gcc.dg/attr-alloc_size-11.c (test for warnings, line 55) > FAIL: gcc.dg/attr-alloc_size-11.c (test for warnings, line 56) > FAIL: gcc.dg/attr-alloc_size-11.c (test for warnings, line 57) > FAIL: gcc.dg/attr-alloc_size-11.c (test for warnings, line 58) > FAIL: gcc.dg/attr-alloc_size-11.c (test for warnings, line 64) > FAIL: gcc.dg/attr-alloc_size-11.c (test for warnings, line 65) > FAIL: gcc.dg/attr-alloc_size-3.c (test for warnings, line 438) > FAIL: gcc.dg/attr-alloc_size-4.c (test for warnings, line 138) > FAIL: gcc.dg/attr-alloc_size-4.c (test for warnings, line 143) > FAIL: gcc.dg/attr-alloc_size-4.c (test for warnings, line 187) > FAIL: gcc.dg/pr98721-1.c (test for warnings, line 11) > FAIL: gcc.dg/pr98721-1.c (test for warnings, line 12) > > For example gcc.dg/pr98721-1.c has > > int > foo (int n) > { > if (n <= 0) > { > char vla[n]; /* { dg-message "source object 'vla' > of size 0" } */ > return __builtin_strlen (vla);/* { dg-warning "'__builtin_strlen' > reading 1 or more bytes from a 
region of size 0" } */ > > but of course we do not diagnose > > int > foo (int n) > { > if (n < 0) > { > char vla[n]; > > or when no condition is present or a n > 32 condition is present. Yup, ran into that too. > > I fear it's not possible to "fix" this testcase without changing the > expectation on a bunch of other testcases. But the messaging to the > user is quite unhelpful because it doesn't actually inform him about > above reasoning. Agreed.
[Bug tree-optimization/107561] [13 Regression] g++.dg/pr71488.C and [g++.dg/warn/Warray-bounds-16.C -m32] regression due to -Wstringop-overflow problem
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107561 --- Comment #12 from Aldy Hernandez --- (In reply to Richard Biener from comment #10) > (In reply to Richard Biener from comment #9) > > Note I think there's still a bug in value_range (irange) here. > > get_size_range > > does > > > > if (integral) > > { > > value_range vr; > > > > query->range_of_expr (vr, exp, stmt); > > > > if (vr.undefined_p ()) > > vr.set_varying (TREE_TYPE (exp)); > > range_type = vr.kind (); > > min = wi::to_wide (vr.min ()); > > max = wi::to_wide (vr.max ()); > > > > and we have vr: > > > > (gdb) p vr > > $13 = { = { = { > > _vptr.vrange = 0x3693a30 +16>, > > m_kind = VR_ANTI_RANGE, m_discriminator = VR_IRANGE}, > > m_num_ranges = 1 '\001', m_max_ranges = 1 '\001', > > m_nonzero_mask = , m_base = 0x7fffc8f0}, m_ranges = { > > , }} > > (gdb) p vr.dump (stderr) > > [irange] unsigned int [0, 0][8, +INF]$17 = void > > > > but vr.min () produces 1 and vr.max () produces 7, just as if it doesn't > > interpret VR_ANTI_RANGE transparently here (if that's the intent?!). > > At least > > > > // Return the highest bound of a range expressed as a tree. > > > > inline tree > > irange::tree_upper_bound () const > > > > suggests that. Note that vr.num_pairs () produces 2 (because constant_p ()) > > but vr.m_num_ranges is 1 and tree_upper_bound uses m_num_ranges. > > > > I suppose irange::{min,max,tree_lower_bound,tree_upper_bound} miss "support" > > for legacy_mode_p here. > > OTOH gimple-array-bounds.cc does > > const value_range *vr = NULL; > if (TREE_CODE (low_sub_org) == SSA_NAME) > { > vr = get_value_range (low_sub_org, stmt); > if (!vr->undefined_p () && !vr->varying_p ()) > { > low_sub = vr->kind () == VR_RANGE ? vr->max () : vr->min (); > up_sub = vr->kind () == VR_RANGE ? vr->min () : vr->max (); > } > > so the bug is a documentation bug on min/max/lower/upper bound?! > > I'm policeing other uses of value_range right now. 
Haven't looked at this yet, but min/max/lower/upper bound are broken and inconsistent. They give different results in legacy and the new irange API. This was not by intent, but it crept in :/. My fault. I noticed this when removing the legacy code for next stage1.
[Bug testsuite/108835] gm2 tests at large -jNN numbers do not return
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108835 --- Comment #4 from Aldy Hernandez --- Thank you very much for such a prompt fix.
[Bug testsuite/108835] New: gm2 tests at large -jNN numbers do not return
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108835 Bug ID: 108835 Summary: gm2 tests at large -jNN numbers do not return Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: testsuite Assignee: unassigned at gcc dot gnu.org Reporter: aldyh at gcc dot gnu.org Target Milestone: --- Running make check -j55 sometimes yields tests that fail to terminate in the gm2/ directory. For example, coroutine.x5 and testtransfer.x5. Worst case scenario there should be a timeout for these tests.
[Bug tree-optimization/108697] constructing a path-range-query is expensive
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108697 --- Comment #5 from Aldy Hernandez --- (In reply to Richard Biener from comment #3) > But yes, your observation about m_has_cache_entry is correct - if that has > any value (it makes reset_path "cheap"), then it should be also relied on > upon > growth. Maybe make this bitmap an optional feature of the cache itself > and handle it transparently there? BTW, if putting the bitmap in the ssa_global_cache itself works, that's fine as well, especially if it makes the ranger's cache faster :).
[Bug tree-optimization/108697] constructing a path-range-query is expensive
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108697 --- Comment #4 from Aldy Hernandez --- Created attachment 54420 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54420&action=edit reuse path_range_query SSA cache for all of back_threader class Off of the top of my head (i.e. probably very broken ;-)), this is what I had in mind... Share a cache between subsequent instantiations of path_range_query. Note the change to the set_cache: path_range_query::set_cache (const vrange &r, tree name) { unsigned v = SSA_NAME_VERSION (name); - bitmap_set_bit (m_has_cache_entry, v); + if (bitmap_set_bit (m_has_cache_entry, v)) + m_cache->clear_global_range (name); m_cache->set_global_range (name, r); } Since we're sharing the cache, make sure we nuke whatever it had there before, otherwise set_global_range will try to reuse the slot if it thinks it fits. Although...maybe that's not even necessary. I'm not sure...you'd have to play around with it, but I think the general idea would work.
[Bug tree-optimization/108697] constructing a path-range-query is expensive
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108697 --- Comment #1 from Aldy Hernandez --- Sharing a cache with say the ranger is a no go, because the path_range_query's cache is specific to the path, but perhaps we could share the path's cache between constructions providing a constructor that takes the cache as you suggest. The path query has a bitmap to determine if a cache entry is used (m_has_cache_entry), so I think we could reuse a dirty cache since the bitmap gets cleared at construction. I assume the main culprit is the path_range_query's being instantiated from class back_threader? If so, there could be one cache per back_threader? Hmmm, perhaps we'd have to call ssa_global_cache::clear_global_range() before each set_global_range() for reused caches otherwise the latter would try to reuse slots (fits_p and all that). Also, is it the clearing part of safe_grow_cleared() that is taking up the time, or just safe_grow. Cause ISTM we don't need to clear the cache for use in path_range_query. So we could even add a flag to use safe_grow() instead of safe_grow_cleared() for the path query?
[Bug tree-optimization/108647] [13 Regression] ICE in upper_bound, at value-range.h:950 with -O3 since r13-2974-g67166c9ec35d58ef
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108647 --- Comment #15 from Aldy Hernandez --- (In reply to Jakub Jelinek from comment #14) > Created attachment 54405 [details] > gcc13-pr108647.patch > > Here is what I'm about to test momentarily, though I must say I don't > understand those operator_cast changes at all. I thought casts are unary > operators and so I don't understand what kind of range would be op2 in that > case. Oh poop, sorry. Unary operators always have the resulting type passed as VARYING in op2. It would never be undefined. Sorry for the noise; you can disregard the cast changes.
[Bug tree-optimization/108647] [13 Regression] ICE in upper_bound, at value-range.h:950 with -O3 since r13-2974-g67166c9ec35d58ef
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108647 --- Comment #13 from Aldy Hernandez --- Created attachment 54404 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54404&action=edit frange changes These are the analogous changes to range-op-float.cc. Patch in testing.
[Bug tree-optimization/108647] [13 Regression] ICE in upper_bound, at value-range.h:950 with -O3 since r13-2974-g67166c9ec35d58ef
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108647 --- Comment #12 from Aldy Hernandez --- (In reply to Jakub Jelinek from comment #7) > So > --- gcc/range-op.cc.jj2023-02-03 10:51:40.699003658 +0100 > +++ gcc/range-op.cc 2023-02-03 16:04:39.264159294 +0100 > @@ -642,7 +642,8 @@ operator_equal::op1_range (irange , tr > case BRS_FALSE: >// If the result is false, the only time we know anything is >// if OP2 is a constant. > - if (wi::eq_p (op2.lower_bound(), op2.upper_bound())) > + if (!op2.undefined_p () > + && wi::eq_p (op2.lower_bound(), op2.upper_bound())) > { > r = op2; > r.invert (); > @@ -755,7 +756,8 @@ operator_not_equal::op1_range (irange > case BRS_TRUE: >// If the result is true, the only time we know anything is if >// OP2 is a constant. > - if (wi::eq_p (op2.lower_bound(), op2.upper_bound())) > + if (!op2.undefined_p () > + && wi::eq_p (op2.lower_bound(), op2.upper_bound())) > { > r = op2; > r.invert (); > @@ -920,6 +922,9 @@ operator_lt::op1_range (irange , tree > const irange , > relation_trio) const > { > + if (op2.undefined_p ()) > +return false; > + >switch (get_bool_state (r, lhs, type)) > { > case BRS_TRUE: > @@ -942,6 +947,9 @@ operator_lt::op2_range (irange , tree > const irange , > relation_trio) const > { > + if (op1.undefined_p ()) > +return false; > + >switch (get_bool_state (r, lhs, type)) > { > case BRS_TRUE: > @@ -1031,6 +1039,9 @@ operator_le::op1_range (irange , tree > const irange , > relation_trio) const > { > + if (op2.undefined_p ()) > +return false; > + >switch (get_bool_state (r, lhs, type)) > { > case BRS_TRUE: > @@ -1053,6 +1064,9 @@ operator_le::op2_range (irange , tree > const irange , > relation_trio) const > { > + if (op1.undefined_p ()) > +return false; > + >switch (get_bool_state (r, lhs, type)) > { > case BRS_TRUE: > @@ -1141,6 +1155,9 @@ operator_gt::op1_range (irange , tree > const irange , const irange , > relation_trio) const > { > + if (op2.undefined_p ()) > +return false; > + >switch (get_bool_state (r, lhs, 
type)) > { > case BRS_TRUE: > @@ -1163,6 +1180,9 @@ operator_gt::op2_range (irange , tree > const irange , > relation_trio) const > { > + if (op1.undefined_p ()) > +return false; > + >switch (get_bool_state (r, lhs, type)) > { > case BRS_TRUE: > @@ -1252,6 +1272,9 @@ operator_ge::op1_range (irange , tree > const irange , > relation_trio) const > { > + if (op2.undefined_p ()) > +return false; > + >switch (get_bool_state (r, lhs, type)) > { > case BRS_TRUE: > @@ -1274,6 +1297,9 @@ operator_ge::op2_range (irange , tree > const irange , > relation_trio) const > { > + if (op1.undefined_p ()) > +return false; > + >switch (get_bool_state (r, lhs, type)) > { > case BRS_TRUE: > then plus testcase? Looks good to me. Do you mind adding this bit to your testing? diff --git a/gcc/range-op.cc b/gcc/range-op.cc index 136b709385c..fdc0a6c05fd 100644 --- a/gcc/range-op.cc +++ b/gcc/range-op.cc @@ -2678,7 +2678,6 @@ operator_cast::op1_range (irange , tree type, if (lhs.undefined_p ()) return false; tree lhs_type = lhs.type (); - gcc_checking_assert (types_compatible_p (op2.type(), type)); // If we are calculating a pointer, shortcut to what we really care about. if (POINTER_TYPE_P (type)) @@ -2705,6 +2704,8 @@ operator_cast::op1_range (irange , tree type, return true; } + if (op2.undefined_p ()) +return false; if (truncating_cast_p (op2, lhs)) { if (lhs.varying_p ()) This catches the cast operator which will ICE in truncating_cast_p when op2 is undefined. I've removed the checking assert since any number of operations further down will ICE if the types don't match.
[Bug tree-optimization/108647] [13 Regression] ICE in upper_bound, at value-range.h:950 with -O3 since r13-2974-g67166c9ec35d58ef
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108647 --- Comment #5 from Aldy Hernandez --- > Perhaps just adding if (op2.undefined_p ()) return false; above most of the > switch (get_bool_state (r, lhs, type)) lines (in methods that refer to op2; > and similarly for op1) for the comparison operators. This feels like the right approach. We also do something similar in some of the binary operators. For example, plus/minus_expr do: if (lhs.undefined_p ()) return false; For this testcase we have: [0,0] = UNDEFINED < op2 I don't think we can determine anything for op2 in this situation, so the right thing would be to return false, which amounts to returning VARYING. The binary operators would also have to handle operators of undefined, but those seem to all have gates on singleton_p() or go through the fold_range() routine for the inverse operation, so they'd be handled correctly. So yeah, it looks like just handling the op[12]_range operators above the get_bool_state would do the trick. Hmmm, it seems like there are 2 operators that have possible problematic handling of op2.undefined_p(): 1. operator_cast::op1_range (irange , tree type, The assert looks at the op2 type. That will ICE with an UNDEFINED. The truncating_cast_p path will also look at op2's type blindly. 2. operator_bitwise_xor::op1_range Imagine: [0,0] = x ^ UNDEFINED op1_range won't ICE, but it will do: case BRS_FALSE: r = op2; I don't think we know anything about x in this case. Though if, op2 is UNDEFINED, I guess it wouldn't hurt to assume op1 is also UNDEFINED. Also earlier in the function, for: UNDEFINED = x ^ op2 will return x = UNDEFINED for any op2. So at least #1 would also have to be handled because it could ICE. Also, range-op-float.cc needs the same treatment prior all the get_bool_state calls. As you mention, another alternative would be to always call empty_range_varying, as we're doing for abs::op[12]_range: if (empty_range_varying (r, type, lhs, op2)) return true; Thoughts? Andrew? 
Hmmm, I wonder why we haven't tripped over this scenario in past releases (LHS being defined, but op2 being UNDEFINED).
[Bug tree-optimization/108639] [13 Regression] ICE on valid code at -O1 and above: in decompose, at wide-int.h:984 since r13-5578
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108639 --- Comment #12 from Aldy Hernandez --- Yeah, I've been mulling around the idea of removing the type from storage of both irange and frange. It seems we need it for setting a type (setting the endpoints for varying, querying HONOR_NANS, HONOR_INFINITIES, etc.), but not in the storage itself. Something to keep in mind when moving to wide_ints. It does seem we need to keep the sign and precision somewhere. The precision lives in the wide-int, but the sign bit still needs storage??
[Bug tree-optimization/108639] [13 Regression] ICE on valid code at -O1 and above: in decompose, at wide-int.h:984 since r13-5578
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108639 --- Comment #7 from Aldy Hernandez --- Jakub, take a look and see if you agree. I've fired off some tests.
[Bug tree-optimization/108639] [13 Regression] ICE on valid code at -O1 and above: in decompose, at wide-int.h:984 since r13-5578
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108639 --- Comment #6 from Aldy Hernandez --- Created attachment 54393 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54393&action=edit untested patch for irange::operator==
[Bug tree-optimization/108639] [13 Regression] ICE on valid code at -O1 and above: in decompose, at wide-int.h:984 since r13-5578
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108639 Aldy Hernandez changed: What|Removed |Added CC||aldyh at gcc dot gnu.org --- Comment #5 from Aldy Hernandez --- (In reply to Jakub Jelinek from comment #3) > Created attachment 54391 [details] > gcc13-pr108639.patch > > Untested fix. I think the problem is more fundamental than that. The equality operator for irange is not ICEing for the sub-range comparison (which also have different precision), but is dying in the nonzero mask comparison.
[Bug tree-optimization/108540] [13 Regression] Frange miscompilation of ruby since r13-3261
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108540 --- Comment #7 from Aldy Hernandez --- (In reply to Jakub Jelinek from comment #6) > Created attachment 54351 [details] > gcc13-pr108540.patch > > Untested fix. LGTM. Thanks for looking at this.
[Bug tree-optimization/108447] [13 Regression] glibc math/test-*-iseqsig failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108447 --- Comment #14 from Aldy Hernandez --- (In reply to rguent...@suse.de from comment #13) > On Thu, 19 Jan 2023, jakub at gcc dot gnu.org wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108447 > > > > --- Comment #12 from Jakub Jelinek --- > > As a workaround in stage4, perhaps, but long term the relations make a lot > > of > > sense even for floating point with NANs. If you know <= relation between 2 > > vars and know the range of one of them, the other can be computed from it, > > ... > > I think it makes more sense to fix the relation combining code, maybe > simply disable that for FP as a workaround? Yeah, the combining code definitely needs some tweaks. Thinking out loud here... It looks like we ignore relations in the various relation fold_range() methods if NAN can be a possibility. See how frelop_early_resolve() only looks at the relation if the operands are !NAN. So perhaps this is why we never used validate_relation(). On the other hand, the op1_range methods *do* use the relations. For example, on EQ_EXPR (foperator_equal::op1_range), we clear NAN on the TRUE side because equality could never happen if there was a NAN present (regardless of VREL_*). On the FALSE side we set_nan() only if VREL_EQ. So we ignore the oracle's relations in fold_range() throughout, unless we're sure !NAN, and use the oracle from op1_range.
[Bug tree-optimization/108447] [13 Regression] glibc math/test-*-iseqsig failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108447 --- Comment #11 from Aldy Hernandez --- Hmmm, I wonder if we could do this all in validate_relation like Andrew had planned. If NAN is a possibility in either x or y, then we could disallow any relation recording right off the bat, and avoid any special casing in union/intersect. Perhaps allow the != relation even if NAN?
[Bug tree-optimization/108447] [13 Regression] glibc math/test-*-iseqsig failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108447 --- Comment #9 from Aldy Hernandez --- (In reply to Jakub Jelinek from comment #4) > I see fold_using_range::relation_fold_and_or > which sees relation1 VREL_LE and relation2 VREL_GE and !is_and, and because > of > relation_union (relation1, relation2) == VREL_VARYING fold it to 1. > But for floating point comparisons, LE | GE is not always true, it is true if > neither operand is NAN, otherwise false. Ah, it was the union not the intersect. So we need something here that would avoid resolving this to 1 if maybe_isnan() is true for either source operand.
[Bug tree-optimization/108447] [13 Regression] glibc math/test-*-iseqsig failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108447 --- Comment #8 from Aldy Hernandez --- (In reply to Jakub Jelinek from comment #1)
> This one started with r13-1933-g24012539ae3410174.
> I think the problem is in:
>
> Folding statement: _3 = cmp1_10 & cmp2_11;
> Not folded
> Folding statement: if (_3 != 0)
> Registering value_relation (x_8(D) <= y_9(D)) on (2->3)
> Registering value_relation (x_8(D) >= y_9(D)) on (2->3)
> Intersecting with existing (x_8(D) <= y_9(D)) to produce
> (x_8(D) == y_9(D)) Updated.
>
> and later
>
> Folding statement: _1 = cmp1_10 | cmp2_11;
> Relation adjustment: cmp1_10 and cmp2_11 combine to produce [irange]
> _Bool [1, 1]
> Queued stmt for removal. Folds to: 1
>
> When NANs are honored, the above is not true for floating point comparisons.
> x <= y and x >= y comparisons are signaling (if either operand is qNaN or
> sNaN), x == y and x != y aren't (they only signal if either operand is
> sNaN), and while

*blink*. Woah. I did not know that.

As Andrew mentions down thread, we had validate_relation() to verify whether a relation could be registered. So for example, given x == x, validate_relation() would be called at registration time and call the range-ops entry for EQ_EXPR, which would return false for this combo if x could be a NAN.

For some forgotten reason, we forgot to call validate_relation() at the oracle's registry. My guess is that since we ignore the oracle's answers in frelop_early_resolve() (if !maybe_isnan) in each of the relational operators, it was no longer needed. Although for the testcase here, perhaps this would not have helped either.

Is there anything in x_8 or y_9 that could be used to indicate that we can't register these relations?

  cmp1_10 = x_8(D) <= y_9(D);
  cmp2_11 = x_8(D) >= y_9(D);
  _3 = cmp1_10 & cmp2_11;
  if (_3 != 0)

That is, when we see x_8 <= y_9, is there anything in either the range of x_8 or y_9 that would hint that we can't register the <= relation?
I would have assumed that we could only register this relation if neither x_8 nor y_9 can be a NAN. Would that help?

If not, then perhaps the issue is in the oracle's relation intersection code: we would need a way to indicate that VREL_LE ^ VREL_GE is VREL_VARYING if the source operands may have a NAN (or some other fancy FP condition I don't understand ;-))?
[Bug tree-optimization/107608] [13 Regression] Failure on fold-overflow-1.c and pr95115.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107608

Aldy Hernandez changed:

           What    |Removed |Added
 ---------------------------------
         Resolution|---     |FIXED
             Status|NEW     |RESOLVED

--- Comment #38 from Aldy Hernandez ---
Fixed in trunk.
[Bug tree-optimization/107608] [13 Regression] Failure on fold-overflow-1.c and pr95115.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107608 --- Comment #36 from Aldy Hernandez --- Can we close this PR?