[Bug tree-optimization/101061] tree-vrp misoptimization on skylake+ using union-based aliasing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061 Richard Biener changed: What|Removed |Added Resolution|--- |INVALID Status|UNCONFIRMED |RESOLVED --- Comment #15 from Richard Biener --- Let's close this now.
[Bug tree-optimization/101061] tree-vrp misoptimization on skylake+ using union-based aliasing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061 --- Comment #14 from rguenther at suse dot de --- On Wed, 16 Jun 2021, alexander.gr...@tu-dresden.de wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061 > > --- Comment #13 from Alexander Grund --- > > But what you can see is that the resulting pointer is used for the > > initialization and not the placement address as literally written in source. > > So I assume it was supposed to be "Y::Y (D_6557, 1);" ? > > > I'm not sure how one can solve this issue with using placement new > > but are unions not sufficiently restricted so that copy assignment > > should work (and activate the appropriate union member)? Thus > > > > slot->mutable_value = pair(k, v); > > The problem is not the copy, the problem is that the value may contain any > kind > of data, think e.g. a pair of strings. And at the initial point (i.e. first > emplace) the slot is a casted pointer into uninitialized data. I.e. the above > would be an assignment into an object which does not exist. And (especially) > for such non-trivial types this would break. > > I think it will work for trivial types though, although it is UB due to > lifetime rules: You can't use an object (here: assign to) which has not > started > its lifetime yet. I see. I would need to read up what kind of restrictions recent C++ standards place on union members, but in C a store to a non-active union member makes that active and IIRC for tradidional POD data types the same should hold true for C++, even w/o requiring an explicit placement new. > However e.g. pair has custom copy and regular constructors so I think it will > run into the issue you mentioned: The ctor will access the object via the > this-pointer and not via the full union-thing and hence might misoptimise > later > > This would mean that in conclusion the use case of putting std::pairs in an > union and accessing them via aliasing is unsupported by (at least) GCC. Is > that > correct? Without restricting the set of C++ features used, yes. Even accessing the data via union.memb.getX (); would involve a 'this' pointer and thus break things. std::pair is probably a special-case that might work since you use pair.first rather than a method though ;) You can see this type-punning via unions exception was invented for C ;)
[Bug tree-optimization/101061] tree-vrp misoptimization on skylake+ using union-based aliasing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061 --- Comment #13 from Alexander Grund --- > But what you can see is that the resulting pointer is used for the > initialization and not the placement address as literally written in source. So I assume it was supposed to be "Y::Y (D_6557, 1);" ? > I'm not sure how one can solve this issue with using placement new > but are unions not sufficiently restricted so that copy assignment > should work (and activate the appropriate union member)? Thus > > slot->mutable_value = pair(k, v); The problem is not the copy, the problem is that the value may contain any kind of data, think e.g. a pair of strings. And at the initial point (i.e. first emplace) the slot is a casted pointer into uninitialized data. I.e. the above would be an assignment into an object which does not exist. And (especially) for such non-trivial types this would break. I think it will work for trivial types though, although it is UB due to lifetime rules: You can't use an object (here: assign to) which has not started its lifetime yet. However e.g. pair has custom copy and regular constructors so I think it will run into the issue you mentioned: The ctor will access the object via the this-pointer and not via the full union-thing and hence might misoptimise later This would mean that in conclusion the use case of putting std::pairs in an union and accessing them via aliasing is unsupported by (at least) GCC. Is that correct?
[Bug tree-optimization/101061] tree-vrp misoptimization on skylake+ using union-based aliasing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061 --- Comment #12 from rguenther at suse dot de --- On Wed, 16 Jun 2021, alexander.gr...@tu-dresden.de wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061 > > Alexander Grund changed: > >What|Removed |Added > > Version|8.4.0 |8.5.0 > Known to fail||8.3.0, 8.4.0, 8.5.0 > Known to work||9.1.0 > > --- Comment #10 from Alexander Grund --- > > My suspicion is, that GCC loads the value of slot2 before constructing the > > object > > I have just confirmed that: memsetting the whole region with a magic value > shows exactly that: Where a new value should be created and then returned, > then > the memset-value is returned instead. When the map already has an entry for > the > key (i.e. no new value created) then the existing value (i.e. the correct one) > is returned. > > Question now: Is the in-place construction UB and just happens to "normally" > work or is that an optimizer bug that was fixed or gone latent? And how to > proceed to further narrow this down? I think what can be clearly said is that in-place construction of a union member and then accessing the constructed object via another union member (aka do type punning via unions) is not supported even if the later access has the union visible in the access (because the in-place construction does not). For simple cases if everything is inlined GCC usually plays nice and does what-you-mean in case it can see must-aliases, thus *(int *)x = 1; *(float *)x = 0.f; return *(int *)x; is, even if technically UB, _not_ "miscompiled" ("miscompiled" it would optimize to return 1).
[Bug tree-optimization/101061] tree-vrp misoptimization on skylake+ using union-based aliasing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061 --- Comment #11 from rguenther at suse dot de --- On Wed, 16 Jun 2021, alexander.gr...@tu-dresden.de wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061 > > --- Comment #9 from Alexander Grund --- > > Note that when the union type is visible in the access path then GCC allows > > punning without further restrictions. Thus the accesses as written above > > are OK. > > Now I have to ask again for clarification because this seemingly contradicts > your previous statement. So let me try to state the previous question again: > > Given a pointer `slot_type* slot` (where slot_type is the union as defined > before) is it legal to access `slot->value.first`, > `slot->mutable_value.first`, > `slot->key` interchangeably? > > In all 3 accesses the union is visible in the access path (if I understood > that > correctly) so the type punning should be ok, shouldn't it? Yes - but it routinely happens that in C++ you end up returning a reference to the union member as abstraction and an access to that reference does _not_ have the union visible but only the member. > > the circumstances have been so there's incentive to do an invalid > > transform... > > So is this indeed a GCC bug which may be fixed or gone latent? I don't think so, but we cannot rule this out at this point (but see above). > Otherwise, maybe the construction is the problem? What Abseil basically > does is allocating a (suitably aligned) buffer of chars and in-place > constructing those slot_type unions/pairs there as needed (i.e. similar > to std::vector) After abstractions what basically happens is: > > // alloc buffer done, then: > slot_type* slot_buffer = reinterpret_cast(buffer); > > // on emplace: > size_t x = ...; > slot_type* slot = slot_buffer + x; > new(slot) slot_type; > new(>mutable_value) pair(k, v); > slot_type* slot2 = slot_buffer + x; // same as before, actually done through > the iterator > idx_vec[i] = slot2->value.second; so if slot_type is the union type then new(>mutable_value) pair(k, v) looks problematic since that calls operator new with a pointer to the union member and the actual construction receives >mutable_value as address of type decltype (slot->mutable_value) * which falls foul of the union punning rule. I'm not sure how one can solve this issue with using placement new but are unions not sufficiently restricted so that copy assignment should work (and activate the appropriate union member)? Thus slot->mutable_value = pair(k, v); ? The compiler should hopefully be able to elide the copy. > My suspicion is, that GCC loads the value of slot2 before constructing the > object, at least sometimes. Printing the values in the vector often shows a > run > of zeroes before some other (potentially correct) values. I'd guess the > correct > values happen, when no insertion took place, i.e. the construction was done > already in a prior loop iteration. Yes, GCC would simply "skip" the offending placement new and if it finds a suitable definition before it would replace the load with said definition. To expand on the placement new issue if you for example have struct Y { Y(int); }; union X { int i; float f; }; void foo (void *p) { X *q = reinterpret_cast (p); new (>i) Y (1); } we end up with (early unoptimized IL): void __GIMPLE (void * p) { void * D_6558; void * D_6557; union X * q; q = p; D_6558 = >i; D_6557 = operator new (1ul, D_6558); try { Y::Y (D_6571, 1); } catch { operator delete (D_6557, D_6558); } } where 'operator new' is the usual noop that just returns the passed pointer here. But what you can see is that the resulting pointer is used for the initialization and not the placement address as literally written in source. And even then, the CTOR that is involved of course sees only 'this' which is a plain pointer to its class type and all accesses in the CTOR will not have the union visible. That might be unexpected but I think it's a quite natural consequence of C++ abstraction :/
[Bug tree-optimization/101061] tree-vrp misoptimization on skylake+ using union-based aliasing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061 Alexander Grund changed: What|Removed |Added Version|8.4.0 |8.5.0 Known to fail||8.3.0, 8.4.0, 8.5.0 Known to work||9.1.0 --- Comment #10 from Alexander Grund --- > My suspicion is, that GCC loads the value of slot2 before constructing the > object I have just confirmed that: memsetting the whole region with a magic value shows exactly that: Where a new value should be created and then returned, then the memset-value is returned instead. When the map already has an entry for the key (i.e. no new value created) then the existing value (i.e. the correct one) is returned. Question now: Is the in-place construction UB and just happens to "normally" work or is that an optimizer bug that was fixed or gone latent? And how to proceed to further narrow this down?
[Bug tree-optimization/101061] tree-vrp misoptimization on skylake+ using union-based aliasing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061 --- Comment #9 from Alexander Grund --- > Note that when the union type is visible in the access path then GCC allows > punning without further restrictions. Thus the accesses as written above are > OK. Now I have to ask again for clarification because this seemingly contradicts your previous statement. So let me try to state the previous question again: Given a pointer `slot_type* slot` (where slot_type is the union as defined before) is it legal to access `slot->value.first`, `slot->mutable_value.first`, `slot->key` interchangeably? In all 3 accesses the union is visible in the access path (if I understood that correctly) so the type punning should be ok, shouldn't it? > the circumstances have been so there's incentive to do an invalid > transform... So is this indeed a GCC bug which may be fixed or gone latent? Otherwise, maybe the construction is the problem? What Abseil basically does is allocating a (suitably aligned) buffer of chars and in-place constructing those slot_type unions/pairs there as needed (i.e. similar to std::vector) After abstractions what basically happens is: // alloc buffer done, then: slot_type* slot_buffer = reinterpret_cast(buffer); // on emplace: size_t x = ...; slot_type* slot = slot_buffer + x; new(slot) slot_type; new(>mutable_value) pair(k, v); slot_type* slot2 = slot_buffer + x; // same as before, actually done through the iterator idx_vec[i] = slot2->value.second; My suspicion is, that GCC loads the value of slot2 before constructing the object, at least sometimes. Printing the values in the vector often shows a run of zeroes before some other (potentially correct) values. I'd guess the correct values happen, when no insertion took place, i.e. the construction was done already in a prior loop iteration.
[Bug tree-optimization/101061] tree-vrp misoptimization on skylake+ using union-based aliasing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061 --- Comment #8 from rguenther at suse dot de --- On June 15, 2021 4:27:37 PM GMT+02:00, "alexander.gr...@tu-dresden.de" wrote: >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061 > >--- Comment #6 from Alexander Grund --- >Oh and for completeness: The same applies to the following union, >doesn't it? >I.e. given this: > >struct TF_TString_Raw { > uint8_t raw[24]; >}; >struct TF_TString_Small { > uint8_t size; > char str[23]; >}; >struct TF_TString_Large { > size_t size; > size_t cap; > char *ptr; >}; >union{ >TF_TString_Raw raw; >TF_TString_Small smll; >TF_TString_Large large; >} x; > >it is not allowed to access x.raw.raw[0] AND then x.large.size (not >even a >common initial subsequence) but also not x.raw.raw[0] AND then >x.small.size >(maybe a common sequence, not sure about the array, but not implemented >in GCC) Note that when the union type is visible in the access path then GCC allows punning without further restrictions. Thus the accesses as written above are OK. What is not OK is mixed accesses via pointers to the small/large/base aggregate types. What is OK is accesses via an effective character type (uint8_t is usually as implementation detail, not sure if by the standard), thus char *
[Bug tree-optimization/101061] tree-vrp misoptimization on skylake+ using union-based aliasing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061 --- Comment #7 from rguenther at suse dot de --- On June 15, 2021 4:21:12 PM GMT+02:00, "alexander.gr...@tu-dresden.de" wrote: >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061 > >--- Comment #5 from Alexander Grund --- >So am I right assuming that the following is basically UB as per GCC >(although >it should work as per the standard)? > >template >union slot_type { > map_slot_type() {} > ~map_slot_type() = delete; > using value_type = std::pair; > using mutable_value_type = std::pair; > > value_type value; > mutable_value_type mutable_value; > K key; >}; > >--> I.e. given a pointer `slot_type* slot` it _should_ have been >possible to >access `slot->value.first`, `slot->mutable_value.first`, `slot->key` >interchangeably but in fact it is not, i.e. not implemented in GCC. That's correct. >I'm asking because it "may" work, and e.g. seemingly does in GCC 9+, Yeah, the circumstances have been so there's incentive to do an invalid transform... >but >yeah... If that is indeed unsupported I'll open a bug report against >the repo >using this. Note that this effectively disallows having "flat" maps >that return >`std::pair` via their iterators but have support for moving >items >effectively (i.e. via `std::pair` pointers)
[Bug tree-optimization/101061] tree-vrp misoptimization on skylake+ using union-based aliasing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061 --- Comment #6 from Alexander Grund --- Oh and for completeness: The same applies to the following union, doesn't it? I.e. given this: struct TF_TString_Raw { uint8_t raw[24]; }; struct TF_TString_Small { uint8_t size; char str[23]; }; struct TF_TString_Large { size_t size; size_t cap; char *ptr; }; union{ TF_TString_Raw raw; TF_TString_Small smll; TF_TString_Large large; } x; it is not allowed to access x.raw.raw[0] AND then x.large.size (not even a common initial subsequence) but also not x.raw.raw[0] AND then x.small.size (maybe a common sequence, not sure about the array, but not implemented in GCC)
[Bug tree-optimization/101061] tree-vrp misoptimization on skylake+ using union-based aliasing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061 --- Comment #5 from Alexander Grund --- So am I right assuming that the following is basically UB as per GCC (although it should work as per the standard)? template union slot_type { map_slot_type() {} ~map_slot_type() = delete; using value_type = std::pair; using mutable_value_type = std::pair; value_type value; mutable_value_type mutable_value; K key; }; --> I.e. given a pointer `slot_type* slot` it _should_ have been possible to access `slot->value.first`, `slot->mutable_value.first`, `slot->key` interchangeably but in fact it is not, i.e. not implemented in GCC. I'm asking because it "may" work, and e.g. seemingly does in GCC 9+, but yeah... If that is indeed unsupported I'll open a bug report against the repo using this. Note that this effectively disallows having "flat" maps that return `std::pair` via their iterators but have support for moving items effectively (i.e. via `std::pair` pointers)
[Bug tree-optimization/101061] tree-vrp misoptimization on skylake+ using union-based aliasing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061 --- Comment #4 from rguenther at suse dot de --- On Tue, 15 Jun 2021, alexander.gr...@tu-dresden.de wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061 > > --- Comment #3 from Alexander Grund --- > You are right, it actually seems to be the combination of those to, so -O2 > -fno-strict-aliasing and -O2 -fno-tree-vrp both make it work. > > The layout-compatible refers to the "common initial sequence" that is allowed > to be inspected for inactive union members, see the link. We definitely do not implement the common initial sequence rule. Where this manifests is when you for example have struct X { int i; }; struct Y { int i; int j; } *a; and access *(struct X *)a - thus when you do an access with an effective type of the _aggregate_ that forms the common initial sequence. Accessing the members via a pointer to their type works fine (here int *). Note that ((struct X *)a)->i also constitutes an access of an object of type X and thus is not implemented with the common initial sequence rule in mind. Note this is never going to change for GCC since this rule is not well thought out (to name it politely ;)). You can, as workaround use sth like struct X { int i; } __attribute__((may_alias)); where the indirect accesses to type X will alias everything. Or alternatively use type composition.
[Bug tree-optimization/101061] tree-vrp misoptimization on skylake+ using union-based aliasing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061 --- Comment #3 from Alexander Grund --- You are right, it actually seems to be the combination of those to, so -O2 -fno-strict-aliasing and -O2 -fno-tree-vrp both make it work. The layout-compatible refers to the "common initial sequence" that is allowed to be inspected for inactive union members, see the link. I also tested GCC 8.5.0 but there it is not fixed. (BTW: I checked the Github mirror which doesn't list 8.5.0) I'm aware that it is no longer supported but I too think that it is a latent bug or maybe just outright unsupported/UB > If you have ways to pinpoint the function where things go wrong then it might > be possible to spot a wrong transform there In the provided source the bug happens in this 2 lines: auto it = uniq.emplace(Tin(i), j); idx_vec(i) = it.first->second; Tin and idx_vec can be replaced by std::vector and std::vector respectively and the bug still occurs. However there are quite some abstractions especially in the first line including a conversion from their custom string type to their custom string_view I know it has got not main, because it is part of the tensorflow shared library loaded into the Python process and run from there. I wasn't able to reduce this further as mentioned as almost any change leads to this bug disappearing. I see that the tree-vrp has had many string-related changes which may handle their custom string better, see https://github.com/tensorflow/tensorflow/blob/c405c59212c764485818fad8b34294c0b553e6fb/tensorflow/core/platform/ctstring_internal.h#L118-L121 However they are dangerously close to UB in many places, e.g. https://github.com/tensorflow/tensorflow/blob/c405c59212c764485818fad8b34294c0b553e6fb/tensorflow/core/platform/ctstring_internal.h#L184-L186 (the code executed here) accesses an array of one union member and a variable of another. I'm also not sure if allocating a chunk of memory and in-place construction of an union containing a const member is actually allowed. I just want to know for sure, that either their code is non-compliant or there is a bug in the compiler. I can do experiments with the source but lack the knowledge to trace it through the compiler. See also my issue in the TF repo: https://github.com/tensorflow/tensorflow/issues/47179
[Bug tree-optimization/101061] tree-vrp misoptimization on skylake+ using union-based aliasing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061 --- Comment #2 from Richard Biener --- If it is related to those union accesses then it should have nothing to do with -ftree-vrp but it should vanish with -O2 -fno-strict-aliasing at least. Note we do not implement any of "layout compatible" (whatever that exactly is) for union members but if the difference is really just 'const T' vs. 'T' then this should work. Also note that GCC 8 is no longer supported and there's a newer 8.5.0 release available as well. It looks like the bug requires quite special circumstances to happen thus it's likely just gone latent with newer compilers and/or different flags. Unfortunately the preprocessed source lacks a main() and thus it's hard to determine whether the bug occurs or not. If you have ways to pinpoint the function where things go wrong then it might be possible to spot a wrong transform there (but a C++ code base is always hard to analyze because of all the inlining and abstraction that happens).
[Bug tree-optimization/101061] tree-vrp misoptimization on skylake+ using union-based aliasing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061 --- Comment #1 from Alexander Grund --- Created attachment 51006 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51006=edit compressed preprocessed source