Re: Importance of transformations that turn data dependencies into control dependencies?

2016-03-01 Thread Paul E. McKenney
On Tue, Mar 01, 2016 at 05:55:07PM +0100, Michael Matz wrote:
> Hi,
> 
> On Tue, 1 Mar 2016, Richard Biener wrote:
> 
> > > What about the example I gave above?  Is it unrealistic for compilers 
> > > to ever do something like this, or is it just unlikely to gain much 
> > > performance, or is it just that GCC does not do this today?
> > 
> > GCC does not do this today with the exception of value-profiling.  GCC 
> > in other cases does not establish equivalences but only relations (a < 
> > b, etc.) that are not a problem as far as I can see because those do not 
> > allow changing expressions using a to use b.
> 
> Made up example using relations:
> 
> int32_t a, b;
> a = (b >> 31) & 1;
> 
> -->
> 
> if (b < 0)
>   a = 1;
> else
>   a = 0;
> 
> data-dep to control-dep and only relations :)  (I think this is taken care 
> of by Paul's wording, ignoring the fact that these aren't pointers anyway 
> and hence don't carry a dependency through them, only onto them at most)

Agreed, these are not pointers, and therefore do not carry dependencies.

Thanx, Paul



Re: Importance of transformations that turn data dependencies into control dependencies?

2016-03-01 Thread Michael Matz
Hi,

On Tue, 1 Mar 2016, Richard Biener wrote:

> > What about the example I gave above?  Is it unrealistic for compilers 
> to ever do something like this, or is it just unlikely to gain much 
> > performance, or is it just that GCC does not do this today?
> 
> GCC does not do this today with the exception of value-profiling.  GCC 
> in other cases does not establish equivalences but only relations (a < 
> b, etc.) that are not a problem as far as I can see because those do not 
> allow changing expressions using a to use b.

Made up example using relations:

int32_t a, b;
a = (b >> 31) & 1;

-->

if (b < 0)
  a = 1;
else
  a = 0;

data-dep to control-dep and only relations :)  (I think this is taken care 
of by Paul's wording, ignoring the fact that these aren't pointers anyway 
and hence don't carry a dependency through them, only onto them at most)


Ciao,
Michael.


Re: Importance of transformations that turn data dependencies into control dependencies?

2016-03-01 Thread Richard Biener
On Fri, Feb 26, 2016 at 8:10 PM, Torvald Riegel  wrote:
> On Fri, 2016-02-26 at 11:49 +0100, Richard Biener wrote:
>> On Thu, Feb 25, 2016 at 6:33 PM, Torvald Riegel  wrote:
>> > On Wed, 2016-02-24 at 13:14 +0100, Richard Biener wrote:
>> >> On Tue, Feb 23, 2016 at 8:38 PM, Torvald Riegel  
>> >> wrote:
>> >> > I'd like to know, based on the GCC experience, how important we consider
>> >> > optimizations that may turn data dependencies of pointers into control
>> >> > dependencies.  I'm thinking about all optimizations or transformations
>> >> > that guess that a pointer might have a specific value, and then create
>> >> > (specialized) code that assumes this value and is only executed if the
>> >> > pointer actually has this value.  For example:
>> >> >
>> >> > int d[2] = {23, compute_something()};
>> >> >
>> >> > int compute(int v) {
>> >> >   if (likely(v == 23)) return 23;
>> >> >   else ;
>> >> > }
>> >> >
>> >> > int bar() {
>> >> >   int *p = ptr.load(memory_order_consume);
>> >> >   size_t reveal_that_p_is_in_d = p - d;
>> >> >   return compute(*p);
>> >> > }
>> >> >
>> >> > Could be transformed to (after inlining compute(), and specializing for
>> >> > the likely path):
>> >> >
>> >> > int bar() {
>> >> >   int *p = ptr.load(memory_order_consume);
>> >> >   if (p == d) return 23;
>> >> >   else ;
>> >> > }
>> >>
>> >> Note that if a user writes
>> >>
>> >>   if (p == d)
>> >>     {
>> >>       ... do lots of stuff via p ...
>> >>     }
>> >>
>> >> GCC might rewrite accesses to p as accesses to d and thus expose
>> >> those opportunities.  Is that a transform that isn't valid then or is
>> >> the code written by the user (establishing the equivalency) to blame?
>> >
>> > In the context of this memory_order_consume proposal, this transform
>> > would be valid because the program has already "revealed" what value p
>> > has after the branch has been taken.
>> >
>> >> There's a PR where this kind of equivalency leads to unexpected (wrong?)
>> >> points-to results for example.
>> >>
>> >> > Other potential examples that come to mind are de-virtualization, or
>> >> > feedback-directed optimizations that have observed at runtime that a
>> >> > certain pointer is likely to be always equal to some other pointer (eg.,
>> >> > if p is almost always &d[0], and specializing for that).
>> >>
>> >> Those are the cases that are quite important in practice.
>> >
>> > Could you quantify this somehow, even if it's a very rough estimate?
>> > I'm asking because if it's significant and widely used, then this would
>> > require users or compiler implementors to make a difficult trade-off
>> > (ie, do you want mo_consume performance or performance through those
>> > other optimizations?).
>>
>> Probably resolved by your followup on how the transform is safe anyway.
>
> Are you saying this because de-virtualization would be the only
> optimization that matters for performance and turns data into control
> dependencies (but can be considered safe)?
>
> What about feedback-directed optimizations, for which you said it would
> be one of the important cases too?  Will these only look at equivalences in 
> non-mutable data too (ie, like vtables)?

We do have other value-profiling transforms - for example division/modulo
by a frequently observed constant, which turns z = y / x into:

  if (x == 256)
z = y / 256;
  else
z = y / x;

> What about the example I gave above?  Is it unrealistic for compilers to
> ever do something like this, or is it just unlikely to gain much
> performance, or is it just that GCC does not do this today?

GCC does not do this today with the exception of value-profiling.  GCC in
other cases does not establish equivalences but only relations (a < b, etc.)
that are not a problem as far as I can see because those do not allow
changing expressions using a to use b.

>> >> > Also, it would be interesting to me to know how often we may turn data
>> >> > dependencies into control dependencies in cases where this doesn't
>> >> > affect performance significantly.
>> >>
>> >> I suppose we try to avoid that but can we ever know for sure?  Like
>> >> speculative devirtualization does this (with the intent that it _does_ 
>> >> matter,
>> >> of course).
>> >>
>> >> I suppose establishing non-dependence isn't an issue, like with the
>> >> vectorizer adding runtime dependence checks and applying versioning
>> >> to get a vectorized and a not vectorized loop (in case there are 
>> >> dependences)?
>> >
>> > I'm not sure I understand you correctly.  Do you have a brief example,
>> > perhaps?  For mo_consume and its data dependencies, if there might be a
>> > dependence, the compiler would have to preserve it; but I guess that
>> > both a vectorized loop and one that accesses each element separately
>> > would preserve dependences because it's doing those accesses, and they
>> > depend on the input data.
>> > OTOH, perhaps HW vector instructions don't get the ordering guarantees
>> > from data dependences -- Paul, do you know of any such cases?

Re: Importance of transformations that turn data dependencies into control dependencies?

2016-02-29 Thread Torvald Riegel
On Fri, 2016-02-26 at 20:10 +0100, Torvald Riegel wrote:
> On Fri, 2016-02-26 at 11:49 +0100, Richard Biener wrote:
> > On Thu, Feb 25, 2016 at 6:33 PM, Torvald Riegel  wrote:
> > > On Wed, 2016-02-24 at 13:14 +0100, Richard Biener wrote:
> > >> On Tue, Feb 23, 2016 at 8:38 PM, Torvald Riegel  
> > >> wrote:
> > >> > I'd like to know, based on the GCC experience, how important we 
> > >> > consider
> > >> > optimizations that may turn data dependencies of pointers into control
> > >> > dependencies.  I'm thinking about all optimizations or transformations
> > >> > that guess that a pointer might have a specific value, and then create
> > >> > (specialized) code that assumes this value and is only executed if the
> > >> > pointer actually has this value.  For example:
> > >> >
> > >> > int d[2] = {23, compute_something()};
> > >> >
> > >> > int compute(int v) {
> > >> >   if (likely(v == 23)) return 23;
> > >> >   else ;
> > >> > }
> > >> >
> > >> > int bar() {
> > >> >   int *p = ptr.load(memory_order_consume);
> > >> >   size_t reveal_that_p_is_in_d = p - d;
> > >> >   return compute(*p);
> > >> > }
> > >> >
> > >> > Could be transformed to (after inlining compute(), and specializing for
> > >> > the likely path):
> > >> >
> > >> > int bar() {
> > >> >   int *p = ptr.load(memory_order_consume);
> > >> >   if (p == d) return 23;
> > >> >   else ;
> > >> > }
> > >>

[snip]

> What about the example I gave above?  Is it unrealistic for compilers to
> ever do something like this, or is it just unlikely to gain much
> performance, or is it just that GCC does not do this today?

Modification of d happens-before the mo_consume load in this case, so
it's "immutable" data and the this specific example would be okay.
Sorry for the noise.

However, I find it nontrivial to figure out whether a compiler would
_not_ be able to replace a data dependency with a control dependency.
Such impossibility results are hard.  Yet this is something that users
of the new mo_consume proposal would have to do, and I'm not sure we'll
end up at a state where this is relatively simple to figure out.  If we
don't know when a programmer intends to use a dependency, we also cannot
warn if there's none.



Re: Importance of transformations that turn data dependencies into control dependencies?

2016-02-29 Thread Torvald Riegel
On Fri, 2016-02-26 at 11:49 +0100, Richard Biener wrote:
> On Thu, Feb 25, 2016 at 6:33 PM, Torvald Riegel  wrote:
> > On Wed, 2016-02-24 at 13:14 +0100, Richard Biener wrote:
> >> On Tue, Feb 23, 2016 at 8:38 PM, Torvald Riegel  wrote:
> >> > I'd like to know, based on the GCC experience, how important we consider
> >> > optimizations that may turn data dependencies of pointers into control
> >> > dependencies.  I'm thinking about all optimizations or transformations
> >> > that guess that a pointer might have a specific value, and then create
> >> > (specialized) code that assumes this value and is only executed if the
> >> > pointer actually has this value.  For example:
> >> >
> >> > int d[2] = {23, compute_something()};
> >> >
> >> > int compute(int v) {
> >> >   if (likely(v == 23)) return 23;
> >> >   else ;
> >> > }
> >> >
> >> > int bar() {
> >> >   int *p = ptr.load(memory_order_consume);
> >> >   size_t reveal_that_p_is_in_d = p - d;
> >> >   return compute(*p);
> >> > }
> >> >
> >> > Could be transformed to (after inlining compute(), and specializing for
> >> > the likely path):
> >> >
> >> > int bar() {
> >> >   int *p = ptr.load(memory_order_consume);
> >> >   if (p == d) return 23;
> >> >   else ;
> >> > }
> >>
> >> Note that if a user writes
> >>
> >>   if (p == d)
> >>     {
> >>       ... do lots of stuff via p ...
> >>     }
> >>
> >> GCC might rewrite accesses to p as accesses to d and thus expose
> >> those opportunities.  Is that a transform that isn't valid then or is
> >> the code written by the user (establishing the equivalency) to blame?
> >
> > In the context of this memory_order_consume proposal, this transform
> > would be valid because the program has already "revealed" what value p
> > has after the branch has been taken.
> >
> >> There's a PR where this kind of equivalency leads to unexpected (wrong?)
> >> points-to results for example.
> >>
> >> > Other potential examples that come to mind are de-virtualization, or
> >> > feedback-directed optimizations that have observed at runtime that a
> >> > certain pointer is likely to be always equal to some other pointer (eg.,
> >> > if p is almost always &d[0], and specializing for that).
> >>
> >> Those are the cases that are quite important in practice.
> >
> > Could you quantify this somehow, even if it's a very rough estimate?
> > I'm asking because if it's significant and widely used, then this would
> > require users or compiler implementors to make a difficult trade-off
> > (ie, do you want mo_consume performance or performance through those
> > other optimizations?).
> 
> Probably resolved by your followup on how the transform is safe anyway.

Are you saying this because de-virtualization would be the only
optimization that matters for performance and turns data into control
dependencies (but can be considered safe)?

What about feedback-directed optimizations, for which you said it would
be one of the important cases too?  Will these only look at equivalences in
non-mutable data too (ie, like vtables)?

What about the example I gave above?  Is it unrealistic for compilers to
ever do something like this, or is it just unlikely to gain much
performance, or is it just that GCC does not do this today?

> >> > Also, it would be interesting to me to know how often we may turn data
> >> > dependencies into control dependencies in cases where this doesn't
> >> > affect performance significantly.
> >>
> >> I suppose we try to avoid that but can we ever know for sure?  Like
> >> speculative devirtualization does this (with the intent that it _does_ 
> >> matter,
> >> of course).
> >>
> >> I suppose establishing non-dependence isn't an issue, like with the
> >> vectorizer adding runtime dependence checks and applying versioning
> >> to get a vectorized and a not vectorized loop (in case there are 
> >> dependences)?
> >
> > I'm not sure I understand you correctly.  Do you have a brief example,
> > perhaps?  For mo_consume and its data dependencies, if there might be a
> > dependence, the compiler would have to preserve it; but I guess that
> > both a vectorized loop and one that accesses each element separately
> > would preserve dependences because it's doing those accesses, and they
> > depend on the input data.
> > OTOH, perhaps HW vector instructions don't get the ordering guarantees
> > from data dependences -- Paul, do you know of any such cases?
> 
> A brief example would be for
> 
>  void foo (int *a, int *b, int n)
>  {
>    for (int i = 0; i < n; ++i)
>      a[i] = b[i];
>  }
> 
> which we can vectorize like
> 
>   if (a + n < b || b + n < a)
>     {
>       vectorized loop
>     }
>   else
>     {
>       not vectorized loop
>     }
> 
> note how we're not establishing equivalences between pointers but
> non-dependence vs. possible dependence.

Okay, non-dependence of this kind (/ disjoint-access parallelism) isn't
a problem, as long as both the vectorized and non-vectorized loops would
perform the 

Re: Importance of transformations that turn data dependencies into control dependencies?

2016-02-26 Thread Jeff Law

On 02/24/2016 05:14 AM, Richard Biener wrote:


Note that if a user writes

   if (p == d)
     {
       ... do lots of stuff via p ...
     }

GCC might rewrite accesses to p as accesses to d and thus expose
those opportunities.  Is that a transform that isn't valid then or is
the code written by the user (establishing the equivalency) to blame?
I think Torvald later clarified this wasn't a problem.  If it was, then 
it wouldn't be hard to disable this for pointers and I doubt the impact 
would be measurable.




There's a PR where this kind of equivalency leads to unexpected (wrong?)
points-to results for example.
Yea, I've watched a couple discussions around this issue.  I don't 
recall them reaching any conclusion about the validity of the testcases.


If the final determination is that such equivalence propagation is 
unsafe for the points-to aliasing system, we can just disable those 
pointer equivalence tracking bits.





Other potential examples that come to mind are de-virtualization, or
feedback-directed optimizations that have observed at runtime that a
certain pointer is likely to be always equal to some other pointer (eg.,
if p is almost always &d[0], and specializing for that).


Those are the cases that are quite important in practice.
I was most concerned about de-virt and feedback stuff that specializes 
paths based on expected values.



Jeff


Re: Importance of transformations that turn data dependencies into control dependencies?

2016-02-26 Thread Paul E. McKenney
On Fri, Feb 26, 2016 at 11:49:29AM +0100, Richard Biener wrote:
> On Thu, Feb 25, 2016 at 6:33 PM, Torvald Riegel  wrote:
> > On Wed, 2016-02-24 at 13:14 +0100, Richard Biener wrote:
> >> On Tue, Feb 23, 2016 at 8:38 PM, Torvald Riegel  wrote:
> >> > I'd like to know, based on the GCC experience, how important we consider
> >> > optimizations that may turn data dependencies of pointers into control
> >> > dependencies.  I'm thinking about all optimizations or transformations
> >> > that guess that a pointer might have a specific value, and then create
> >> > (specialized) code that assumes this value and is only executed if the
> >> > pointer actually has this value.  For example:
> >> >
> >> > int d[2] = {23, compute_something()};
> >> >
> >> > int compute(int v) {
> >> >   if (likely(v == 23)) return 23;
> >> >   else ;
> >> > }
> >> >
> >> > int bar() {
> >> >   int *p = ptr.load(memory_order_consume);
> >> >   size_t reveal_that_p_is_in_d = p - d;
> >> >   return compute(*p);
> >> > }
> >> >
> >> > Could be transformed to (after inlining compute(), and specializing for
> >> > the likely path):
> >> >
> >> > int bar() {
> >> >   int *p = ptr.load(memory_order_consume);
> >> >   if (p == d) return 23;
> >> >   else ;
> >> > }
> >>
> >> Note that if a user writes
> >>
> >>   if (p == d)
> >>     {
> >>       ... do lots of stuff via p ...
> >>     }
> >>
> >> GCC might rewrite accesses to p as accesses to d and thus expose
> >> those opportunities.  Is that a transform that isn't valid then or is
> >> the code written by the user (establishing the equivalency) to blame?
> >
> > In the context of this memory_order_consume proposal, this transform
> > would be valid because the program has already "revealed" what value p
> > has after the branch has been taken.
> >
> >> There's a PR where this kind of equivalency leads to unexpected (wrong?)
> >> points-to results for example.
> >>
> >> > Other potential examples that come to mind are de-virtualization, or
> >> > feedback-directed optimizations that have observed at runtime that a
> >> > certain pointer is likely to be always equal to some other pointer (eg.,
> >> > if p is almost always &d[0], and specializing for that).
> >>
> >> Those are the cases that are quite important in practice.
> >
> > Could you quantify this somehow, even if it's a very rough estimate?
> > I'm asking because if it's significant and widely used, then this would
> > require users or compiler implementors to make a difficult trade-off
> > (ie, do you want mo_consume performance or performance through those
> > other optimizations?).
> 
> Probably resolved by your followup on how the transform is safe anyway.
> 
> >> > Also, it would be interesting to me to know how often we may turn data
> >> > dependencies into control dependencies in cases where this doesn't
> >> > affect performance significantly.
> >>
> >> I suppose we try to avoid that but can we ever know for sure?  Like
> >> speculative devirtualization does this (with the intent that it _does_ 
> >> matter,
> >> of course).
> >>
> >> I suppose establishing non-dependence isn't an issue, like with the
> >> vectorizer adding runtime dependence checks and applying versioning
> >> to get a vectorized and a not vectorized loop (in case there are 
> >> dependences)?
> >
> > I'm not sure I understand you correctly.  Do you have a brief example,
> > perhaps?  For mo_consume and its data dependencies, if there might be a
> > dependence, the compiler would have to preserve it; but I guess that
> > both a vectorized loop and one that accesses each element separately
> > would preserve dependences because it's doing those accesses, and they
> > depend on the input data.
> > OTOH, perhaps HW vector instructions don't get the ordering guarantees
> > from data dependences -- Paul, do you know of any such cases?
> 
> A brief example would be for
> 
>  void foo (int *a, int *b, int n)
>  {
>    for (int i = 0; i < n; ++i)
>      a[i] = b[i];
>  }
> 
> which we can vectorize like
> 
>   if (a + n < b || b + n < a)
>     {
>       vectorized loop
>     }
>   else
>     {
>       not vectorized loop
>     }
> 
> note how we're not establishing equivalences between pointers but
> non-dependence vs. possible dependence.

Thank you for the clarification, I will check.

If I am not too confused, such a loop might want to use x86 non-temporal
stores, which require special handling even with acquire and release,
but that is presumably already handled.
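
For concreteness, a minimal sketch of that special handling (the buffer
publication protocol here is my own made-up example, not something from
this thread):

#include <immintrin.h>
#include <atomic>

void publish(int *buf, int n, std::atomic<bool> &ready) {
  for (int i = 0; i < n; ++i)
    _mm_stream_si32(&buf[i], i);   // weakly ordered non-temporal stores
  _mm_sfence();                    // drain the write-combining buffers
  ready.store(true, std::memory_order_release);
}

Without the _mm_sfence(), the release store alone would not order the
preceding non-temporal stores.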

Thanx, Paul

> >> > The background for this question is Paul McKenney's recently updated
> >> > proposal for a different memory_order_consume specification:
> >> > http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0190r0.pdf
> >> >
> >> > In a nutshell, this requires a compiler to either prove that a pointer
> >> > value is not carrying a dependency (simplified, its value somehow
> >> > originates from a 

Re: Importance of transformations that turn data dependencies into control dependencies?

2016-02-26 Thread Richard Biener
On Thu, Feb 25, 2016 at 6:33 PM, Torvald Riegel  wrote:
> On Wed, 2016-02-24 at 13:14 +0100, Richard Biener wrote:
>> On Tue, Feb 23, 2016 at 8:38 PM, Torvald Riegel  wrote:
>> > I'd like to know, based on the GCC experience, how important we consider
>> > optimizations that may turn data dependencies of pointers into control
>> > dependencies.  I'm thinking about all optimizations or transformations
>> > that guess that a pointer might have a specific value, and then create
>> > (specialized) code that assumes this value and is only executed if the
>> > pointer actually has this value.  For example:
>> >
>> > int d[2] = {23, compute_something()};
>> >
>> > int compute(int v) {
>> >   if (likely(v == 23)) return 23;
>> >   else ;
>> > }
>> >
>> > int bar() {
>> >   int *p = ptr.load(memory_order_consume);
>> >   size_t reveal_that_p_is_in_d = p - d;
>> >   return compute(*p);
>> > }
>> >
>> > Could be transformed to (after inlining compute(), and specializing for
>> > the likely path):
>> >
>> > int bar() {
>> >   int *p = ptr.load(memory_order_consume);
>> >   if (p == d) return 23;
>> >   else ;
>> > }
>>
>> Note that if a user writes
>>
>>   if (p == d)
>>     {
>>       ... do lots of stuff via p ...
>>     }
>>
>> GCC might rewrite accesses to p as accesses to d and thus expose
>> those opportunities.  Is that a transform that isn't valid then or is
>> the code written by the user (establishing the equivalency) to blame?
>
> In the context of this memory_order_consume proposal, this transform
> would be valid because the program has already "revealed" what value p
> has after the branch has been taken.
>
>> There's a PR where this kind of equivalency leads to unexpected (wrong?)
>> points-to results for example.
>>
>> > Other potential examples that come to mind are de-virtualization, or
>> > feedback-directed optimizations that have observed at runtime that a
>> > certain pointer is likely to be always equal to some other pointer (eg.,
>> > if p is almost always &d[0], and specializing for that).
>>
>> Those are the cases that are quite important in practice.
>
> Could you quantify this somehow, even if it's a very rough estimate?
> I'm asking because if it's significant and widely used, then this would
> require users or compiler implementors to make a difficult trade-off
> (ie, do you want mo_consume performance or performance through those
> other optimizations?).

Probably resolved by your followup on how the transform is safe anyway.

>> > Also, it would be interesting to me to know how often we may turn data
>> > dependencies into control dependencies in cases where this doesn't
>> > affect performance significantly.
>>
>> I suppose we try to avoid that but can we ever know for sure?  Like
>> speculative devirtualization does this (with the intent that it _does_ 
>> matter,
>> of course).
>>
>> I suppose establishing non-dependence isn't an issue, like with the
>> vectorizer adding runtime dependence checks and applying versioning
>> to get a vectorized and a not vectorized loop (in case there are 
>> dependences)?
>
> I'm not sure I understand you correctly.  Do you have a brief example,
> perhaps?  For mo_consume and its data dependencies, if there might be a
> dependence, the compiler would have to preserve it; but I guess that
> both a vectorized loop and one that accesses each element separately
> would preserve dependences because it's doing those accesses, and they
> depend on the input data.
> OTOH, perhaps HW vector instructions don't get the ordering guarantees
> from data dependences -- Paul, do you know of any such cases?

A brief example would be for

 void foo (int *a, int *b, int n)
 {
   for (int i = 0; i < n; ++i)
     a[i] = b[i];
 }

which we can vectorize like

  if (a + n < b || b + n < a)
    {
      vectorized loop
    }
  else
    {
      not vectorized loop
    }

note how we're not establishing equivalences between pointers but
non-dependence vs. possible dependence.
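
Written out as source (my own rendering of the versioned code; the
disjointness test keeps the conservative form used above):

void foo_versioned (int *a, int *b, int n)
{
  if (a + n < b || b + n < a)
    {
      /* provably non-overlapping: vector loads/stores are fine here */
      for (int i = 0; i < n; ++i)
        a[i] = b[i];
    }
  else
    {
      /* possible overlap: keep the scalar, in-order loop */
      for (int i = 0; i < n; ++i)
        a[i] = b[i];
    }
}

Only the relations a + n < b and b + n < a are established; no equivalence
such as a == b is ever used to rewrite an access.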

>> > The background for this question is Paul McKenney's recently updated
>> > proposal for a different memory_order_consume specification:
>> > http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0190r0.pdf
>> >
>> > In a nutshell, this requires a compiler to either prove that a pointer
>> > value is not carrying a dependency (simplified, its value somehow
>> > originates from a memory_order_consume load), or it has to
>> > conservatively assume that it does; if it does, the compiler must not
>> > turn data dependencies into control dependencies in generated code.
>> > (The data dependencies, in contrast to control dependencies, enforce
>> > memory ordering on archs such as Power and ARM; these orderings then
>> > allow for not having to use an acquire HW barrier in the generated
>> > code.)
>> >
>> > Given that such a proof will likely be hard for a compiler (dependency
>> > chains propagate through assignments to variables on the heap and stack,
>> > chains are not 

Re: Importance of transformations that turn data dependencies into control dependencies?

2016-02-25 Thread Torvald Riegel
On Thu, 2016-02-25 at 18:33 +0100, Torvald Riegel wrote:
> On Wed, 2016-02-24 at 13:14 +0100, Richard Biener wrote:
> > On Tue, Feb 23, 2016 at 8:38 PM, Torvald Riegel  wrote:
> > > I'd like to know, based on the GCC experience, how important we consider
> > > optimizations that may turn data dependencies of pointers into control
> > > dependencies.  I'm thinking about all optimizations or transformations
> > > that guess that a pointer might have a specific value, and then create
> > > (specialized) code that assumes this value and is only executed if the
> > > pointer actually has this value.  For example:
> > >
> > > int d[2] = {23, compute_something()};
> > >
> > > int compute(int v) {
> > >   if (likely(v == 23)) return 23;
> > >   else ;
> > > }
> > >
> > > int bar() {
> > >   int *p = ptr.load(memory_order_consume);
> > >   size_t reveal_that_p_is_in_d = p - d;
> > >   return compute(*p);
> > > }
> > >
> > > Could be transformed to (after inlining compute(), and specializing for
> > > the likely path):
> > >
> > > int bar() {
> > >   int *p = ptr.load(memory_order_consume);
> > >   if (p == d) return 23;
> > >   else ;
> > > }
> > 
> > Note that if a user writes
> > 
> >   if (p == d)
> >     {
> >       ... do lots of stuff via p ...
> >     }
> > 
> > GCC might rewrite accesses to p as accesses to d and thus expose
> > those opportunities.  Is that a transform that isn't valid then or is
> > the code written by the user (establishing the equivalency) to blame?
> 
> In the context of this memory_order_consume proposal, this transform
> would be valid because the program has already "revealed" what value p
> has after the branch has been taken.
> 
> > There's a PR where this kind of equivalency leads to unexpected (wrong?)
> > points-to results for example.
> > 
> > > Other potential examples that come to mind are de-virtualization, or
> > > feedback-directed optimizations that have observed at runtime that a
> > > certain pointer is likely to be always equal to some other pointer (eg.,
> > > if p is almost always &d[0], and specializing for that).
> > 
> > Those are the cases that are quite important in practice.
> 
> Could you quantify this somehow, even if it's a very rough estimate?
> I'm asking because if it's significant and widely used, then this would
> require users or compiler implementors to make a difficult trade-off
> (ie, do you want mo_consume performance or performance through those
> other optimizations?).
> 
> > > Also, it would be interesting to me to know how often we may turn data
> > > dependencies into control dependencies in cases where this doesn't
> > > affect performance significantly.
> > 
> > I suppose we try to avoid that but can we ever know for sure?  Like
> > speculative devirtualization does this (with the intent that it _does_ 
> > matter,
> > of course).

Due to a think-o on my part, I need to add that the transformations
that turn data into control dependencies would need to operate on data
that is not considered "constant" during the lifetime of the
application; IOW, all modifications to the data accessed through the
original data dependence would always have to happen-before any of the
accesses that get turned into a control dependency.

In the de-virtualization case I suppose this would be the case, because
the vtables won't change, so if the compiler turns this:
  func = p->vtable[23];
into this
  if (p->vtable == structA.vtable)
    func = structA.vtable[23];  // or inlines func directly...
then this would not matter for the memory_order_consume load because the
vtables don't get modified concurrently.
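
In source terms, a minimal sketch of such a speculative devirtualization
(the types are invented here, and the type check is written out by hand):

#include <typeinfo>

struct Base { virtual int f() { return 0; } virtual ~Base() = default; };
struct A final : Base { int f() override { return 23; } };

int call(Base *p) {
  return p->f();              // indirect call: the target depends on p's vptr
}

int call_speculated(Base *p) {
  if (typeid(*p) == typeid(A))
    return 23;                // inlined A::f(); only a control dependency on p
  return p->f();              // fallback keeps the virtual dispatch
}

Because the vtable of *p is fixed at construction, and construction
happens-before any concurrent access, replacing the data dependency with
this control dependency should be harmless here.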



Re: Importance of transformations that turn data dependencies into control dependencies?

2016-02-25 Thread Torvald Riegel
On Wed, 2016-02-24 at 13:14 +0100, Richard Biener wrote:
> On Tue, Feb 23, 2016 at 8:38 PM, Torvald Riegel  wrote:
> > I'd like to know, based on the GCC experience, how important we consider
> > optimizations that may turn data dependencies of pointers into control
> > dependencies.  I'm thinking about all optimizations or transformations
> > that guess that a pointer might have a specific value, and then create
> > (specialized) code that assumes this value and is only executed if the
> > pointer actually has this value.  For example:
> >
> > int d[2] = {23, compute_something()};
> >
> > int compute(int v) {
> >   if (likely(v == 23)) return 23;
> >   else ;
> > }
> >
> > int bar() {
> >   int *p = ptr.load(memory_order_consume);
> >   size_t reveal_that_p_is_in_d = p - d;
> >   return compute(*p);
> > }
> >
> > Could be transformed to (after inlining compute(), and specializing for
> > the likely path):
> >
> > int bar() {
> >   int *p = ptr.load(memory_order_consume);
> >   if (p == d) return 23;
> >   else ;
> > }
> 
> Note that if a user writes
> 
>   if (p == d)
>     {
>       ... do lots of stuff via p ...
>     }
> 
> GCC might rewrite accesses to p as accesses to d and thus expose
> those opportunities.  Is that a transform that isn't valid then or is
> the code written by the user (establishing the equivalency) to blame?

In the context of this memory_order_consume proposal, this transform
would be valid because the program has already "revealed" what value p
has after the branch has been taken.

> There's a PR where this kind of equivalency leads to unexpected (wrong?)
> points-to results for example.
> 
> > Other potential examples that come to mind are de-virtualization, or
> > feedback-directed optimizations that have observed at runtime that a
> > certain pointer is likely to be always equal to some other pointer (eg.,
> > if p is almost always &d[0], and specializing for that).
> 
> Those are the cases that are quite important in practice.

Could you quantify this somehow, even if it's a very rough estimate?
I'm asking because if it's significant and widely used, then this would
require users or compiler implementors to make a difficult trade-off
(ie, do you want mo_consume performance or performance through those
other optimizations?).

> > Also, it would be interesting to me to know how often we may turn data
> > dependencies into control dependencies in cases where this doesn't
> > affect performance significantly.
> 
> I suppose we try to avoid that but can we ever know for sure?  Like
> speculative devirtualization does this (with the intent that it _does_ matter,
> of course).
> 
> I suppose establishing non-dependence isn't an issue, like with the
> vectorizer adding runtime dependence checks and applying versioning
> to get a vectorized and a not vectorized loop (in case there are dependences)?

I'm not sure I understand you correctly.  Do you have a brief example,
perhaps?  For mo_consume and its data dependencies, if there might be a
dependence, the compiler would have to preserve it; but I guess that
both a vectorized loop and one that accesses each element separately
would preserve dependences because it's doing those accesses, and they
depend on the input data.
OTOH, perhaps HW vector instructions don't get the ordering guarantees
from data dependences -- Paul, do you know of any such cases?

> > The background for this question is Paul McKenney's recently updated
> > proposal for a different memory_order_consume specification:
> > http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0190r0.pdf
> >
> > In a nutshell, this requires a compiler to either prove that a pointer
> > value is not carrying a dependency (simplified, its value somehow
> > originates from a memory_order_consume load), or it has to
> > conservatively assume that it does; if it does, the compiler must not
> > turn data dependencies into control dependencies in generated code.
> > (The data dependencies, in contrast to control dependencies, enforce
> > memory ordering on archs such as Power and ARM; these orderings then
> > allow for not having to use an acquire HW barrier in the generated
> > code.)
> >
> > Given that such a proof will likely be hard for a compiler (dependency
> > chains propagate through assignments to variables on the heap and stack,
> > chains are not marked in the code, and points-to analysis can be hard),
> > a compiler faces a trade-off between either:
> > (1) trying to support this memory_order_consume specification and likely
> > disallowing all transformations that change data dependencies into
> > control dependencies, or
> > (2) not supporting the proposal by simply emitting memory_order_acquire
> > code, but getting no new constraints on transformations in return (ie, what
> > we do for memory_order_consume today).
> >
> > A compiler could let users make this choice, but this will be hard for
> > users too, and the compiler would still have to pick a default.
> >
> > 

Re: Importance of transformations that turn data dependencies into control dependencies?

2016-02-24 Thread Richard Biener
On Tue, Feb 23, 2016 at 8:38 PM, Torvald Riegel  wrote:
> I'd like to know, based on the GCC experience, how important we consider
> optimizations that may turn data dependencies of pointers into control
> dependencies.  I'm thinking about all optimizations or transformations
> that guess that a pointer might have a specific value, and then create
> (specialized) code that assumes this value and is only executed if the
> pointer actually has this value.  For example:
>
> int d[2] = {23, compute_something()};
>
> int compute(int v) {
>   if (likely(v == 23)) return 23;
>   else ;
> }
>
> int bar() {
>   int *p = ptr.load(memory_order_consume);
>   size_t reveal_that_p_is_in_d = p - d;
>   return compute(*p);
> }
>
> Could be transformed to (after inlining compute(), and specializing for
> the likely path):
>
> int bar() {
>   int *p = ptr.load(memory_order_consume);
>   if (p == d) return 23;
>   else ;
> }

Note that if a user writes

  if (p == d)
    {
      ... do lots of stuff via p ...
    }

GCC might rewrite accesses to p as accesses to d and thus expose
those opportunities.  Is that a transform that isn't valid then or is
the code written by the user (establishing the equivalency) to blame?

There's a PR where this kind of equivalency leads to unexpected (wrong?)
points-to results for example.

> Other potential examples that come to mind are de-virtualization, or
> feedback-directed optimizations that have observed at runtime that a
> certain pointer is likely to be always equal to some other pointer (eg.,
> if p is almost always &d[0], and specializing for that).

Those are the cases that are quite important in practice.

> Also, it would be interesting to me to know how often we may turn data
> dependencies into control dependencies in cases where this doesn't
> affect performance significantly.

I suppose we try to avoid that but can we ever know for sure?  Like
speculative devirtualization does this (with the intent that it _does_ matter,
of course).

I suppose establishing non-dependence isn't an issue, like with the
vectorizer adding runtime dependence checks and applying versioning
to get a vectorized and a not vectorized loop (in case there are dependences)?

> The background for this question is Paul McKenney's recently updated
> proposal for a different memory_order_consume specification:
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0190r0.pdf
>
> In a nutshell, this requires a compiler to either prove that a pointer
> value is not carrying a dependency (simplified, its value somehow
> originates from a memory_order_consume load), or it has to
> conservatively assume that it does; if it does, the compiler must not
> turn data dependencies into control dependencies in generated code.
> (The data dependencies, in contrast to control dependencies, enforce
> memory ordering on archs such as Power and ARM; these orderings then
> allow for not having to use an acquire HW barrier in the generated
> code.)
>
> Given that such a proof will likely be hard for a compiler (dependency
> chains propagate through assignments to variables on the heap and stack,
> chains are not marked in the code, and points-to analysis can be hard),
> a compiler faces a trade-off between either:
> (1) trying to support this memory_order_consume specification and likely
> disallowing all transformations that change data dependencies into
> control dependencies, or
> (2) not supporting the proposal by simply emitting memory_order_acquire
> code, but getting no new constraints on transformations in return (ie, what
> we do for memory_order_consume today).
>
> A compiler could let users make this choice, but this will be hard for
> users too, and the compiler would still have to pick a default.
>
> Therefore, it would be good to know how important such transformations
> or optimizations are in practice.  If they are, it either means somewhat
> slower code everywhere (or at least having to change much in today's
> compilers), or not supporting the new memory_order_consume (at least not
> in the default setting of the compiler).

IMHO all the memory order semantics are too complicated already so what's
the reason to complicate them even more...?  See for example how our alias
analysis (which is supposed to also honor barriers) gives up completely
on atomics.

Richard.

> Thanks for any opinions.
>


Importance of transformations that turn data dependencies into control dependencies?

2016-02-23 Thread Torvald Riegel
I'd like to know, based on the GCC experience, how important we consider
optimizations that may turn data dependencies of pointers into control
dependencies.  I'm thinking about all optimizations or transformations
that guess that a pointer might have a specific value, and then create
(specialized) code that assumes this value and is only executed if the
pointer actually has this value.  For example:

int d[2] = {23, compute_something()};

int compute(int v) {
  if (likely(v == 23)) return 23;
  else ;
}

int bar() {
  int *p = ptr.load(memory_order_consume);
  size_t reveal_that_p_is_in_d = p - d;
  return compute(*p);
}

Could be transformed to (after inlining compute(), and specializing for
the likely path):

int bar() {
  int *p = ptr.load(memory_order_consume);
  if (p == d) return 23;
  else ;
}

Other potential examples that come to mind are de-virtualization, or
feedback-directed optimizations that have observed at runtime that a
certain pointer is likely to be always equal to some other pointer (eg.,
if p is almost always &d[0], and specializing for that).

Also, it would be interesting to me to know how often we may turn data
dependencies into control dependencies in cases where this doesn't
affect performance significantly.


The background for this question is Paul McKenney's recently updated
proposal for a different memory_order_consume specification:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0190r0.pdf

In a nutshell, this requires a compiler to either prove that a pointer
value is not carrying a dependency (simplified, its value somehow
originates from a memory_order_consume load), or it has to
conservatively assume that it does; if it does, the compiler must not
turn data dependencies into control dependencies in generated code.
(The data dependencies, in contrast to control dependencies, enforce
memory ordering on archs such as Power and ARM; these orderings then
allow for not having to use an acquire HW barrier in the generated
code.)
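
As a minimal sketch of the pattern this is meant to support (the names
here are invented for illustration):

#include <atomic>

struct Node { int payload; };
std::atomic<Node*> published{nullptr};

int reader() {
  Node *p = published.load(std::memory_order_consume);
  if (p == nullptr) return -1;
  // Data dependency: the address of p->payload is computed from the loaded
  // value, so Power and ARM order this read after the load without an
  // acquire barrier.
  return p->payload;
  // If a compiler speculated "if (p == known) return known->payload;", only
  // a control dependency would remain, and a control dependency does not
  // order a subsequent load on those architectures.
}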

Given that such a proof will likely be hard for a compiler (dependency
chains propagate through assignments to variables on the heap and stack,
chains are not marked in the code, and points-to analysis can be hard),
a compiler faces a trade-off between either:
(1) trying to support this memory_order_consume specification and likely
disallowing all transformations that change data dependencies into
control dependencies, or
(2) not supporting the proposal by simply emitting memory_order_acquire
code, but getting no new constraints on transformations in return (ie, what
we do for memory_order_consume today).
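
To illustrate why such a proof is hard, consider this made-up example
(helper() is defined elsewhere, possibly in another translation unit):

#include <atomic>

struct Ctx { int *slot; };
extern std::atomic<int*> ptr;
void helper(Ctx *ctx);

int hard_to_track(Ctx *ctx) {
  int *p = ptr.load(std::memory_order_consume);
  ctx->slot = p;       // the dependency chain escapes to the heap...
  helper(ctx);         // ...flows through code the compiler cannot see...
  int *q = ctx->slot;  // ...and may come back: q may still carry a dependency
  return *q;           // so the ordering of this load must be preserved
}

Nothing in the source marks q as dependency-carrying; only points-to
analysis could prove that it is not.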

A compiler could let users make this choice, but this will be hard for
users too, and the compiler would still have to pick a default.

Therefore, it would be good to know how important such transformations
or optimizations are in practice.  If they are, it either means somewhat
slower code everywhere (or at least having to change much in today's
compilers), or not supporting the new memory_order_consume (at least not
in the default setting of the compiler).

Thanks for any opinions.