[Bug tree-optimization/101061] tree-vrp misoptimization on skylake+ using union-based aliasing

2021-06-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |INVALID
 Status|UNCONFIRMED |RESOLVED

--- Comment #15 from Richard Biener  ---
Let's close this now.

[Bug tree-optimization/101061] tree-vrp misoptimization on skylake+ using union-based aliasing

2021-06-16 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061

--- Comment #14 from rguenther at suse dot de  ---
On Wed, 16 Jun 2021, alexander.gr...@tu-dresden.de wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061
> 
> --- Comment #13 from Alexander Grund  ---
> > But what you can see is that the resulting pointer is used for the 
> > initialization and not the placement address as literally written in source.
> 
> So I assume it was supposed to be "Y::Y (D_6557, 1);" ?
> 
> > I'm not sure how one can solve this issue with using placement new
> > but are unions not sufficiently restricted so that copy assignment
> > should work (and activate the appropriate union member)?  Thus
> > 
> >   slot->mutable_value = pair(k, v);
> 
> The problem is not the copy, the problem is that the value may contain any 
> kind
> of data, think e.g. a pair of strings. And at the initial point (i.e. first
> emplace) the slot is a casted pointer into uninitialized data. I.e. the above
> would be an assignment into an object which does not exist. And (especially)
> for such non-trivial types this would break.
> 
> I think it will work for trivial types though, although it is UB due to
> lifetime rules: You can't use an object (here: assign to) which has not 
> started
> its lifetime yet.

I see.  I would need to read up what kind of restrictions recent C++
standards place on union members, but in C a store to a non-active
union member makes that active and IIRC for tradidional POD data types
the same should hold true for C++, even w/o requiring an explicit
placement new.

> However e.g. pair has custom copy and regular constructors so I think it will
> run into the issue you mentioned: The ctor will access the object via the
> this-pointer and not via the full union-thing and hence might misoptimise 
> later
> 
> This would mean that in conclusion the use case of putting std::pairs in an
> union and accessing them via aliasing is unsupported by (at least) GCC. Is 
> that
> correct?

Without restricting the set of C++ features used, yes.  Even
accessing the data via union.memb.getX (); would involve a 'this'
pointer and thus break things.  std::pair is probably a special-case
that might work since you use pair.first rather than a method though ;)

You can see this type-punning via unions exception was invented
for C ;)

[Bug tree-optimization/101061] tree-vrp misoptimization on skylake+ using union-based aliasing

2021-06-16 Thread alexander.grund--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061

--- Comment #13 from Alexander Grund  ---
> But what you can see is that the resulting pointer is used for the 
> initialization and not the placement address as literally written in source.

So I assume it was supposed to be "Y::Y (D_6557, 1);" ?

> I'm not sure how one can solve this issue with using placement new
> but are unions not sufficiently restricted so that copy assignment
> should work (and activate the appropriate union member)?  Thus
> 
>   slot->mutable_value = pair(k, v);

The problem is not the copy, the problem is that the value may contain any kind
of data, think e.g. a pair of strings. And at the initial point (i.e. first
emplace) the slot is a casted pointer into uninitialized data. I.e. the above
would be an assignment into an object which does not exist. And (especially)
for such non-trivial types this would break.

I think it will work for trivial types though, although it is UB due to
lifetime rules: You can't use an object (here: assign to) which has not started
its lifetime yet.

However e.g. pair has custom copy and regular constructors so I think it will
run into the issue you mentioned: The ctor will access the object via the
this-pointer and not via the full union-thing and hence might misoptimise later

This would mean that in conclusion the use case of putting std::pairs in an
union and accessing them via aliasing is unsupported by (at least) GCC. Is that
correct?

[Bug tree-optimization/101061] tree-vrp misoptimization on skylake+ using union-based aliasing

2021-06-16 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061

--- Comment #12 from rguenther at suse dot de  ---
On Wed, 16 Jun 2021, alexander.gr...@tu-dresden.de wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061
> 
> Alexander Grund  changed:
> 
>What|Removed |Added
> 
> Version|8.4.0   |8.5.0
>   Known to fail||8.3.0, 8.4.0, 8.5.0
>   Known to work||9.1.0
> 
> --- Comment #10 from Alexander Grund  ---
> > My suspicion is, that GCC loads the value of slot2 before constructing the 
> > object
> 
> I have just confirmed that: memsetting the whole region with a magic value
> shows exactly that: Where a new value should be created and then returned, 
> then
> the memset-value is returned instead. When the map already has an entry for 
> the
> key (i.e. no new value created) then the existing value (i.e. the correct one)
> is returned.
> 
> Question now: Is the in-place construction UB and just happens to "normally"
> work or is that an optimizer bug that was fixed or gone latent? And how to
> proceed to further narrow this down?

I think what can be clearly said is that in-place construction of
a union member and then accessing the constructed object via
another union member (aka do type punning via unions) is not supported
even if the later access has the union visible in the access
(because the in-place construction does not).

For simple cases if everything is inlined GCC usually plays nice
and does what-you-mean in case it can see must-aliases, thus

 *(int *)x = 1;
 *(float *)x = 0.f;
 return *(int *)x;

is, even if technically UB, _not_ "miscompiled" ("miscompiled" it
would optimize to return 1).

[Bug tree-optimization/101061] tree-vrp misoptimization on skylake+ using union-based aliasing

2021-06-16 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061

--- Comment #11 from rguenther at suse dot de  ---
On Wed, 16 Jun 2021, alexander.gr...@tu-dresden.de wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061
> 
> --- Comment #9 from Alexander Grund  ---
> > Note that when the union type is visible in the access path then GCC allows 
> > punning without further restrictions. Thus the accesses as written above 
> > are OK.
> 
> Now I have to ask again for clarification because this seemingly contradicts
> your previous statement. So let me try to state the previous question again:
> 
> Given a pointer `slot_type* slot` (where slot_type is the union as defined
> before) is it legal to access `slot->value.first`, 
> `slot->mutable_value.first`,
> `slot->key` interchangeably?
> 
> In all 3 accesses the union is visible in the access path (if I understood 
> that
> correctly) so the type punning should be ok, shouldn't it?

Yes - but it routinely happens that in C++ you end up returning a
reference to the union member as abstraction and an access to that 
reference does _not_ have the union visible but only the member.

> > the circumstances have been so there's incentive to do an invalid 
> > transform... 
> 
> So is this indeed a GCC bug which may be fixed or gone latent?

I don't think so, but we cannot rule this out at this point (but see 
above).

> Otherwise, maybe the construction is the problem? What Abseil basically 
> does is allocating a (suitably aligned) buffer of chars and in-place 
> constructing those slot_type unions/pairs there as needed (i.e. similar 
> to std::vector) After abstractions what basically happens is:
> 
> // alloc buffer done, then:
> slot_type* slot_buffer = reinterpret_cast(buffer);
> 
> // on emplace:
> size_t x = ...;
> slot_type* slot = slot_buffer + x;
> new(slot) slot_type;
> new(>mutable_value) pair(k, v);
> slot_type* slot2 = slot_buffer + x; // same as before, actually done through
> the iterator
> idx_vec[i] = slot2->value.second;

so if slot_type is the union type then

  new(>mutable_value) pair(k, v)

looks problematic since that calls operator new with a pointer to the
union member and the actual construction receives >mutable_value
as address of type decltype (slot->mutable_value) * which falls foul
of the union punning rule.

I'm not sure how one can solve this issue with using placement new
but are unions not sufficiently restricted so that copy assignment
should work (and activate the appropriate union member)?  Thus

  slot->mutable_value = pair(k, v);

?  The compiler should hopefully be able to elide the copy.

> My suspicion is, that GCC loads the value of slot2 before constructing the
> object, at least sometimes. Printing the values in the vector often shows a 
> run
> of zeroes before some other (potentially correct) values. I'd guess the 
> correct
> values happen, when no insertion took place, i.e. the construction was done
> already in a prior loop iteration.

Yes, GCC would simply "skip" the offending placement new and if it finds
a suitable definition before it would replace the load with said 
definition.

To expand on the placement new issue if you for example have

struct Y { Y(int); };
union X { int i; float f; };

void foo (void *p)
{
  X *q = reinterpret_cast  (p);
  new (>i) Y (1);
}

we end up with (early unoptimized IL):

void __GIMPLE (void * p)
{
  void * D_6558;
  void * D_6557;
  union X * q;

  q = p;
  D_6558 = >i;
  D_6557 = operator new (1ul, D_6558);
  try
{
  Y::Y (D_6571, 1);
}
  catch
{
  operator delete (D_6557, D_6558);
}
}

where 'operator new' is the usual noop that just returns the passed
pointer here.  But what you can see is that the resulting pointer
is used for the initialization and not the placement address as
literally written in source.  And even then, the CTOR that is
involved of course sees only 'this' which is a plain pointer to
its class type and all accesses in the CTOR will not have the
union visible.

That might be unexpected but I think it's a quite natural
consequence of C++ abstraction :/

[Bug tree-optimization/101061] tree-vrp misoptimization on skylake+ using union-based aliasing

2021-06-16 Thread alexander.grund--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061

Alexander Grund  changed:

   What|Removed |Added

Version|8.4.0   |8.5.0
  Known to fail||8.3.0, 8.4.0, 8.5.0
  Known to work||9.1.0

--- Comment #10 from Alexander Grund  ---
> My suspicion is, that GCC loads the value of slot2 before constructing the 
> object

I have just confirmed that: memsetting the whole region with a magic value
shows exactly that: Where a new value should be created and then returned, then
the memset-value is returned instead. When the map already has an entry for the
key (i.e. no new value created) then the existing value (i.e. the correct one)
is returned.

Question now: Is the in-place construction UB and just happens to "normally"
work or is that an optimizer bug that was fixed or gone latent? And how to
proceed to further narrow this down?

[Bug tree-optimization/101061] tree-vrp misoptimization on skylake+ using union-based aliasing

2021-06-16 Thread alexander.grund--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061

--- Comment #9 from Alexander Grund  ---
> Note that when the union type is visible in the access path then GCC allows 
> punning without further restrictions. Thus the accesses as written above are 
> OK.

Now I have to ask again for clarification because this seemingly contradicts
your previous statement. So let me try to state the previous question again:

Given a pointer `slot_type* slot` (where slot_type is the union as defined
before) is it legal to access `slot->value.first`, `slot->mutable_value.first`,
`slot->key` interchangeably?

In all 3 accesses the union is visible in the access path (if I understood that
correctly) so the type punning should be ok, shouldn't it?

> the circumstances have been so there's incentive to do an invalid 
> transform... 

So is this indeed a GCC bug which may be fixed or gone latent?

Otherwise, maybe the construction is the problem? What Abseil basically does is
allocating a (suitably aligned) buffer of chars and in-place constructing those
slot_type unions/pairs there as needed (i.e. similar to std::vector)
After abstractions what basically happens is:

// alloc buffer done, then:
slot_type* slot_buffer = reinterpret_cast(buffer);

// on emplace:
size_t x = ...;
slot_type* slot = slot_buffer + x;
new(slot) slot_type;
new(>mutable_value) pair(k, v);
slot_type* slot2 = slot_buffer + x; // same as before, actually done through
the iterator
idx_vec[i] = slot2->value.second;


My suspicion is, that GCC loads the value of slot2 before constructing the
object, at least sometimes. Printing the values in the vector often shows a run
of zeroes before some other (potentially correct) values. I'd guess the correct
values happen, when no insertion took place, i.e. the construction was done
already in a prior loop iteration.

[Bug tree-optimization/101061] tree-vrp misoptimization on skylake+ using union-based aliasing

2021-06-15 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061

--- Comment #8 from rguenther at suse dot de  ---
On June 15, 2021 4:27:37 PM GMT+02:00, "alexander.gr...@tu-dresden.de"
 wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061
>
>--- Comment #6 from Alexander Grund  ---
>Oh and for completeness: The same applies to the following union,
>doesn't it?
>I.e. given this:
>
>struct TF_TString_Raw {
>  uint8_t raw[24];
>};
>struct TF_TString_Small {
>  uint8_t size;
>  char str[23];
>};
>struct TF_TString_Large { 
>  size_t size;
>  size_t cap;
>  char *ptr;
>};
>union{
>TF_TString_Raw raw;
>TF_TString_Small smll;
>TF_TString_Large large;
>} x;  
>
>it is not allowed to access x.raw.raw[0] AND then x.large.size (not
>even a
>common initial subsequence) but also not x.raw.raw[0] AND then
>x.small.size
>(maybe a common sequence, not sure about the array, but not implemented
>in GCC)

Note that when the union type is visible in the access path then GCC allows
punning without further restrictions. Thus the accesses as written above are
OK. What is not OK is mixed accesses via pointers to the small/large/base
aggregate types. What is OK is accesses via an effective character type
(uint8_t is usually as implementation detail, not sure if by the standard),
thus char *

[Bug tree-optimization/101061] tree-vrp misoptimization on skylake+ using union-based aliasing

2021-06-15 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061

--- Comment #7 from rguenther at suse dot de  ---
On June 15, 2021 4:21:12 PM GMT+02:00, "alexander.gr...@tu-dresden.de"
 wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061
>
>--- Comment #5 from Alexander Grund  ---
>So am I right assuming that the following is basically UB as per GCC
>(although
>it should work as per the standard)?
>
>template 
>union slot_type {
>  map_slot_type() {}
>  ~map_slot_type() = delete;
>  using value_type = std::pair;
>  using mutable_value_type = std::pair;
>
>  value_type value;
>  mutable_value_type mutable_value;
>  K key;
>};
>
>--> I.e. given a pointer `slot_type* slot` it _should_ have been
>possible to
>access `slot->value.first`, `slot->mutable_value.first`, `slot->key`
>interchangeably but in fact it is not, i.e. not implemented in GCC.

That's correct. 

>I'm asking because it "may" work, and e.g. seemingly does in GCC 9+,

Yeah, the circumstances have been so there's incentive to do an invalid
transform... 

>but
>yeah... If that is indeed unsupported I'll open a bug report against
>the repo
>using this. Note that this effectively disallows having "flat" maps
>that return
>`std::pair` via their iterators but have support for moving
>items
>effectively (i.e. via `std::pair` pointers)

[Bug tree-optimization/101061] tree-vrp misoptimization on skylake+ using union-based aliasing

2021-06-15 Thread alexander.grund--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061

--- Comment #6 from Alexander Grund  ---
Oh and for completeness: The same applies to the following union, doesn't it?
I.e. given this:

struct TF_TString_Raw {
  uint8_t raw[24];
};
struct TF_TString_Small {
  uint8_t size;
  char str[23];
};
struct TF_TString_Large { 
  size_t size;
  size_t cap;
  char *ptr;
};
union{
TF_TString_Raw raw;
TF_TString_Small smll;
TF_TString_Large large;
} x;  

it is not allowed to access x.raw.raw[0] AND then x.large.size (not even a
common initial subsequence) but also not x.raw.raw[0] AND then x.small.size
(maybe a common sequence, not sure about the array, but not implemented in GCC)

[Bug tree-optimization/101061] tree-vrp misoptimization on skylake+ using union-based aliasing

2021-06-15 Thread alexander.grund--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061

--- Comment #5 from Alexander Grund  ---
So am I right assuming that the following is basically UB as per GCC (although
it should work as per the standard)?

template 
union slot_type {
  map_slot_type() {}
  ~map_slot_type() = delete;
  using value_type = std::pair;
  using mutable_value_type = std::pair;

  value_type value;
  mutable_value_type mutable_value;
  K key;
};

--> I.e. given a pointer `slot_type* slot` it _should_ have been possible to
access `slot->value.first`, `slot->mutable_value.first`, `slot->key`
interchangeably but in fact it is not, i.e. not implemented in GCC.

I'm asking because it "may" work, and e.g. seemingly does in GCC 9+, but
yeah... If that is indeed unsupported I'll open a bug report against the repo
using this. Note that this effectively disallows having "flat" maps that return
`std::pair` via their iterators but have support for moving items
effectively (i.e. via `std::pair` pointers)

[Bug tree-optimization/101061] tree-vrp misoptimization on skylake+ using union-based aliasing

2021-06-15 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061

--- Comment #4 from rguenther at suse dot de  ---
On Tue, 15 Jun 2021, alexander.gr...@tu-dresden.de wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061
> 
> --- Comment #3 from Alexander Grund  ---
> You are right, it actually seems to be the combination of those to, so -O2
> -fno-strict-aliasing and -O2 -fno-tree-vrp both make it work.
> 
> The layout-compatible refers to the "common initial sequence" that is allowed
> to be inspected for inactive union members, see the link.

We definitely do not implement the common initial sequence rule.  Where
this manifests is when you for example have

 struct X { int i; };
 struct Y { int i; int j; } *a;

and access *(struct X *)a - thus when you do an access with an effective
type of the _aggregate_ that forms the common initial sequence.  Accessing
the members via a pointer to their type works fine (here int *).  Note
that ((struct X *)a)->i also constitutes an access of an object of
type X and thus is not implemented with the common initial sequence
rule in mind.

Note this is never going to change for GCC since this rule is not
well thought out (to name it politely ;)).

You can, as workaround use sth like

struct X { int i; } __attribute__((may_alias));

where the indirect accesses to type X will alias everything.
Or alternatively use type composition.

[Bug tree-optimization/101061] tree-vrp misoptimization on skylake+ using union-based aliasing

2021-06-15 Thread alexander.grund--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061

--- Comment #3 from Alexander Grund  ---
You are right, it actually seems to be the combination of those to, so -O2
-fno-strict-aliasing and -O2 -fno-tree-vrp both make it work.

The layout-compatible refers to the "common initial sequence" that is allowed
to be inspected for inactive union members, see the link.

I also tested GCC 8.5.0 but there it is not fixed. (BTW: I checked the Github
mirror which doesn't list 8.5.0) I'm aware that it is no longer supported but I
too think that it is a latent bug or maybe just outright unsupported/UB

> If you have ways to pinpoint the function where things go wrong then it might 
> be possible to spot a wrong transform there

In the provided source the bug happens in this 2 lines:
auto it = uniq.emplace(Tin(i), j);
idx_vec(i) = it.first->second;

Tin and idx_vec can be replaced by std::vector and std::vector
respectively and the bug still occurs. However there are quite some
abstractions especially in the first line including a conversion from their
custom string type to their custom string_view

I know it has got not main, because it is part of the tensorflow shared library
loaded into the Python process and run from there. I wasn't able to reduce this
further as mentioned as almost any change leads to this bug disappearing.

I see that the tree-vrp has had many string-related changes which may handle
their custom string better, see
https://github.com/tensorflow/tensorflow/blob/c405c59212c764485818fad8b34294c0b553e6fb/tensorflow/core/platform/ctstring_internal.h#L118-L121
However they are dangerously close to UB in many places, e.g.
https://github.com/tensorflow/tensorflow/blob/c405c59212c764485818fad8b34294c0b553e6fb/tensorflow/core/platform/ctstring_internal.h#L184-L186
(the code executed here) accesses an array of one union member and a variable
of another.

I'm also not sure if allocating a chunk of memory and in-place construction of
an union containing a const member is actually allowed.

I just want to know for sure, that either their code is non-compliant or there
is a bug in the compiler. I can do experiments with the source but lack the
knowledge to trace it through the compiler.

See also my issue in the TF repo:
https://github.com/tensorflow/tensorflow/issues/47179

[Bug tree-optimization/101061] tree-vrp misoptimization on skylake+ using union-based aliasing

2021-06-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061

--- Comment #2 from Richard Biener  ---
If it is related to those union accesses then it should have nothing to do with
-ftree-vrp but it should vanish with -O2 -fno-strict-aliasing at least.

Note we do not implement any of "layout compatible" (whatever that exactly is)
for union members but if the difference is really just 'const T' vs. 'T' then
this should work.

Also note that GCC 8 is no longer supported and there's a newer 8.5.0 release
available as well.

It looks like the bug requires quite special circumstances to happen thus it's
likely just gone latent with newer compilers and/or different flags.

Unfortunately the preprocessed source lacks a main() and thus it's hard to
determine whether the bug occurs or not.  If you have ways to pinpoint
the function where things go wrong then it might be possible to spot a wrong
transform there (but a C++ code base is always hard to analyze because of all
the inlining and abstraction that happens).

[Bug tree-optimization/101061] tree-vrp misoptimization on skylake+ using union-based aliasing

2021-06-14 Thread alexander.grund--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101061

--- Comment #1 from Alexander Grund  ---
Created attachment 51006
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51006=edit
compressed preprocessed source