[Bug middle-end/90348] [7/8/9/10 Regression] Partition of char arrays is incorrect in some cases

2019-05-07 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90348

--- Comment #15 from Richard Biener  ---
Educating people about -fstack-reuse is also a possibility, thus leave the
issue to workarounds like that, experimenting with full rewrites that are
obviously not back-portable.

[Bug middle-end/90348] [7/8/9/10 Regression] Partition of char arrays is incorrect in some cases

2019-05-07 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90348

--- Comment #14 from Jakub Jelinek  ---
(In reply to Michael Matz from comment #12)
> (In reply to Jakub Jelinek from comment #11)
> > before that region.  If we can say for:
> >   for (...)
> > {
> >   unsigned char v[10];
> >   unsigned char *p = foo (v);
> >   *p = 1;
> >   unsigned char w[10];
> >   bar (w);
> > }
> > hoist the p = foo (v); call before the loop, then indeed we are in big
> > trouble.
> 
> This is effectively what the testcase is doing (just that 'foo' is no call,
> but a normal address expression), so yes, we can do that, and yes we are in
> big
> trouble :-/

Well, that p = foo (v); or store of memory is something that will have (unless
const call, indeed) a vuse or even a vdef and the alias oracle should usually
consider it to be conflicting with the clobber stmt, so I'd hope it isn't
hoisted in that case, while for the pure address arithmetics there is nothing
that can easily stop the hoisting.

Now, if there isn't really an easy way out of this and given how rarely this
has been reported (I think we have this PR and PR61203, don't remember anything
else), the options can be do nothing, or do something simple that isn't that
expensive, will fix this testcase and while not perfect solution will still
allow variable sharing in the general case almost as often as before, or change
the IL representation so that such hoisting or sinking of addresses is not
possible.

[Bug middle-end/90348] [7/8/9/10 Regression] Partition of char arrays is incorrect in some cases

2019-05-07 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90348

--- Comment #13 from Richard Biener  ---
(In reply to Jakub Jelinek from comment #11)
> (In reply to Richard Biener from comment #10)
> > (In reply to Jakub Jelinek from comment #9)
> > > Created attachment 46312 [details]
> > > gcc10-pr90348.patch
> > > 
> > > Untested patch that implements what was written in #c5.  I agree that
> > > without further changes to the IL, determining if one can hoist addresses 
> > > of
> > > local variables or not is going to be hard, would require computing the
> > > variable life info in each pass that would do something similar.  On the
> > > other side, admittedly such hoisting results in worse code generation
> > > because if the address is hoisted earlier than where it used to be live
> > > before, then there will be more stack conflicts.
> > 
> > Ick.  You need to handle pt->anything and pt->escaped (walk the escaped
> > solution) as well.  And !SSA_NAME_PTR_INFO (op) is of course the same
> > as pt->anything.
> 
> Do I?
> I thought I don't.  The thing is, I believe the partitioning only cares
> about where (originally in the source) the automatic variables are
> referenced.
> Now, if I have say:
> __attribute__((noipa)) type *foo (type *x) { return x; }
> or similar, in order for a variable to be live before the clobber, it needs
> to escape in the area where the variable is live, we don't care what we do
> with whatever it returned, unless we actually hoist such escaping calls
> before that region.  If we can say for:
>   for (...)
> {
>   unsigned char v[10];
>   unsigned char *p = foo (v);
>   *p = 1;
>   unsigned char w[10];
>   bar (w);
> }
> hoist the p = foo (v); call before the loop, then indeed we are in big
> trouble.
> But if we can't, the attached patch was just meant as a shorthand for trying
> to do a dataflow propagation of the can a pointer SSA_NAME point to variable
> xyz, if we don't consider escaping (to other functions and to global state).
> What I'm trying to "undo" is just the hoisting of the address loads, not
> hoisting of those address loads plus function calls in which it escapes or
> where that address is saved to memory.
> 
> If I have to consider pt->anything and pt->escaped, then it will be as
> useless for the variable conflicts as is removing the important clearing of
> the bitmap bit on clobber stmt, we won't share stack slots pretty much at
> all.

The point about !SSA_NAME_PTR_INFO is that passes might clear it, replace
such SSA name with a new one, forgetting to copy it, etc.  The point
about anything is that points-to analysis might just have given up
(consider the provenance thing where we might end up with anything for
going through some uintptr arithmetic), the point about escaped is that
this is just a placeholder for a set of variables and points-to tends
to shrink sets by removing bits also in escaped.  Also
global = ptr; ptr2 = global; will have escaped in the sets but not
necessarily the bit of the original decl.

So my comments are just about correctly interpreting points-to data.

[Bug middle-end/90348] [7/8/9/10 Regression] Partition of char arrays is incorrect in some cases

2019-05-07 Thread matz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90348

--- Comment #12 from Michael Matz  ---
(In reply to Jakub Jelinek from comment #11)
> before that region.  If we can say for:
>   for (...)
> {
>   unsigned char v[10];
>   unsigned char *p = foo (v);
>   *p = 1;
>   unsigned char w[10];
>   bar (w);
> }
> hoist the p = foo (v); call before the loop, then indeed we are in big
> trouble.

This is effectively what the testcase is doing (just that 'foo' is no call, but
a normal address expression), so yes, we can do that, and yes we are in big
trouble :-/

> If I have to consider pt->anything and pt->escaped, then it will be as
> useless for the variable conflicts as is removing the important clearing of
> the bitmap bit on clobber stmt, we won't share stack slots pretty much at
> all.

Yeah; if we don't want to patch the specific situation for this testcase
(which might be okayish, we haven't seen this problem very often over the
last years), but want to really fix it we might have to take more involved
means like doing stack slot sharing before gimplification and rewriting
the IL to reflect this.  Or give up on sharing (which isn't a good idea).
Gnah.

[Bug middle-end/90348] [7/8/9/10 Regression] Partition of char arrays is incorrect in some cases

2019-05-07 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90348

--- Comment #11 from Jakub Jelinek  ---
(In reply to Richard Biener from comment #10)
> (In reply to Jakub Jelinek from comment #9)
> > Created attachment 46312 [details]
> > gcc10-pr90348.patch
> > 
> > Untested patch that implements what was written in #c5.  I agree that
> > without further changes to the IL, determining if one can hoist addresses of
> > local variables or not is going to be hard, would require computing the
> > variable life info in each pass that would do something similar.  On the
> > other side, admittedly such hoisting results in worse code generation
> > because if the address is hoisted earlier than where it used to be live
> > before, then there will be more stack conflicts.
> 
> Ick.  You need to handle pt->anything and pt->escaped (walk the escaped
> solution) as well.  And !SSA_NAME_PTR_INFO (op) is of course the same
> as pt->anything.

Do I?
I thought I don't.  The thing is, I believe the partitioning only cares about
where (originally in the source) the automatic variables are referenced.
Now, if I have say:
__attribute__((noipa)) type *foo (type *x) { return x; }
or similar, in order for a variable to be live before the clobber, it needs to
escape in the area where the variable is live, we don't care what we do with
whatever it returned, unless we actually hoist such escaping calls before that
region.  If we can say for:
  for (...)
{
  unsigned char v[10];
  unsigned char *p = foo (v);
  *p = 1;
  unsigned char w[10];
  bar (w);
}
hoist the p = foo (v); call before the loop, then indeed we are in big trouble.
But if we can't, the attached patch was just meant as a shorthand for trying to
do a dataflow propagation of the can a pointer SSA_NAME point to variable xyz,
if we don't consider escaping (to other functions and to global state).
What I'm trying to "undo" is just the hoisting of the address loads, not
hoisting of those address loads plus function calls in which it escapes or
where that address is saved to memory.

If I have to consider pt->anything and pt->escaped, then it will be as useless
for the variable conflicts as is removing the important clearing of the bitmap
bit on clobber stmt, we won't share stack slots pretty much at all.

[Bug middle-end/90348] [7/8/9/10 Regression] Partition of char arrays is incorrect in some cases

2019-05-07 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90348

--- Comment #10 from Richard Biener  ---
(In reply to Jakub Jelinek from comment #9)
> Created attachment 46312 [details]
> gcc10-pr90348.patch
> 
> Untested patch that implements what was written in #c5.  I agree that
> without further changes to the IL, determining if one can hoist addresses of
> local variables or not is going to be hard, would require computing the
> variable life info in each pass that would do something similar.  On the
> other side, admittedly such hoisting results in worse code generation
> because if the address is hoisted earlier than where it used to be live
> before, then there will be more stack conflicts.

Ick.  You need to handle pt->anything and pt->escaped (walk the escaped
solution) as well.  And !SSA_NAME_PTR_INFO (op) is of course the same
as pt->anything.

I see it quickly degrading given the last PTA compute is far far away...

[Bug middle-end/90348] [7/8/9/10 Regression] Partition of char arrays is incorrect in some cases

2019-05-07 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90348

--- Comment #9 from Jakub Jelinek  ---
Created attachment 46312
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46312=edit
gcc10-pr90348.patch

Untested patch that implements what was written in #c5.  I agree that without
further changes to the IL, determining if one can hoist addresses of local
variables or not is going to be hard, would require computing the variable life
info in each pass that would do something similar.  On the other side,
admittedly such hoisting results in worse code generation because if the
address is hoisted earlier than where it used to be live before, then there
will be more stack conflicts.

[Bug middle-end/90348] [7/8/9/10 Regression] Partition of char arrays is incorrect in some cases

2019-05-07 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90348

--- Comment #8 from Richard Biener  ---
(In reply to Michael Matz from comment #7)
> No, this is not a problem in the stack slot sharing algorithm, but rather in
> the input.  As presented to expand, and only showing the important parts,
> and reordering some BBs to make the flow more obvious:
> 
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _30 = (unsigned long) 
>   ivtmp.27_29 = _30 + 1;
>   goto ; [100.00%]
> 
> So, 'in' becomes "live" here, it's address in _30 and _29.  Fallthrough to
> bb5,
> which also uses in, but otherwise is uninteresting, except that it can reach
> only BBs 6 and 7:
> 
> ;;   basic block 5, loop depth 1
>   ...
>   _2 = check_zero (, _31);
>   if (_2 != 0)
> goto ; [99.96%]
>   else
> goto ; [0.04%]
> 
> bb6 is a no-return block, hence uninteresting.  bb7 _is_ interesting in that
> it clobbers in:
> 
> ;;   basic block 7, loop depth 1
> ;;pred:   5
>   in ={v} {CLOBBER};
>   if (i_11 != 5)
> goto ; [83.33%]
>   else
> goto ; [16.67%]
> 
> Note that the semantics of the clobber is not only that the former contents
> are destroyed, but rather that the very storage associated with the clobbered
> entity is gone.  So, from now on, any pointers into 'in', and memory accesses
> into 'in' are invalid.  Nevertheless the flow from bb7 goes to bb 8 and 9,
> the latter being the return block, so:
> 
> ;;   basic block 8, loop depth 1
> ;;pred:   7
>   if (i_11 > 0)
> goto ; [100.00%]
>   else
> goto ; [0.00%]
> 
> and here we finally have a path into bb3, which is the other interesting one:
> 
> ;;   basic block 3, loop depth 2
>   # ivtmp.20_6 = PHI <_30(8), ivtmp.20_18(3)>
> 
>  BOOM!  Here _30 is used, and ...
> 
>   _4 = (void *) ivtmp.20_6;
>   MEM[base: _4, offset: 0B] = 0;
> 
> ... even written into ...  That's invalid.  _30 is associated with an
> object that is clobbered and gone ...
> 
>   set_one ();
>   buf ={v} {CLOBBER};
>   ivtmp.20_18 = ivtmp.20_6 + 1;
> 
> ... and as the MEM[] write can't have possibly been into 'in' (as that is
> invalid, as 'in' didn't exist at the MEM access), it's okay and sensible to
> allocate 'in' and 'buf' into the same memory.
> 
> It seems to be a common misconception of what the clobber is really about.
> I designed it to mean what I wrote above, the storage associated with the
> clobbered entities is gone after the clobber (not merely it's former
> contents!). 
> 
> But ivopts or dom extends the lifetime of 'in' (by moving an address-taken
> earlier) and hence the lifetime of its storage, but without doing anything
> about the clobber (and hence not _really_ extending the lifetime).  That
> doesn't work.
> It's basically a mirrored transformation of the usual take-address-of-local
> and access it out of it's declared scope, just that here the _start_ of the
> supposed lifetime is moved out of the declared scope, not the end.

While I spotted this "issue" as well I respectfully disagree about
the semantics.  Not because they are not sound but because of
the implementation challenge resolving around address-takens being
values with no data dependence on the clobbers.

I thought the liveness algorithm would be able to handle this situation.
If not we need to prune clobbers that became "invalid" somehow.
Not sure how - the address value never "dies", so we'd need to compute
liveness of the things the address escapes to - SSA names are easy,
calls - well, we have to give up.  This possiby defeats the whole idea
of doing stack sharing analysis (I remember kernel/fortran testcases
passing the stack slots by reference to a function).

Now if we would want to try forcing the original semantics we need a
verifier verifying the IL state against bogus transforms - but then we
would have a way to prune the invalid ones.

[Bug middle-end/90348] [7/8/9/10 Regression] Partition of char arrays is incorrect in some cases

2019-05-06 Thread matz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90348

--- Comment #7 from Michael Matz  ---
No, this is not a problem in the stack slot sharing algorithm, but rather in
the input.  As presented to expand, and only showing the important parts,
and reordering some BBs to make the flow more obvious:

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _30 = (unsigned long) 
  ivtmp.27_29 = _30 + 1;
  goto ; [100.00%]

So, 'in' becomes "live" here, it's address in _30 and _29.  Fallthrough to bb5,
which also uses in, but otherwise is uninteresting, except that it can reach
only BBs 6 and 7:

;;   basic block 5, loop depth 1
  ...
  _2 = check_zero (, _31);
  if (_2 != 0)
goto ; [99.96%]
  else
goto ; [0.04%]

bb6 is a no-return block, hence uninteresting.  bb7 _is_ interesting in that
it clobbers in:

;;   basic block 7, loop depth 1
;;pred:   5
  in ={v} {CLOBBER};
  if (i_11 != 5)
goto ; [83.33%]
  else
goto ; [16.67%]

Note that the semantics of the clobber is not only that the former contents
are destroyed, but rather that the very storage associated with the clobbered
entity is gone.  So, from now on, any pointers into 'in', and memory accesses
into 'in' are invalid.  Nevertheless the flow from bb7 goes to bb 8 and 9,
the latter being the return block, so:

;;   basic block 8, loop depth 1
;;pred:   7
  if (i_11 > 0)
goto ; [100.00%]
  else
goto ; [0.00%]

and here we finally have a path into bb3, which is the other interesting one:

;;   basic block 3, loop depth 2
  # ivtmp.20_6 = PHI <_30(8), ivtmp.20_18(3)>

 BOOM!  Here _30 is used, and ...

  _4 = (void *) ivtmp.20_6;
  MEM[base: _4, offset: 0B] = 0;

... even written into ...  That's invalid.  _30 is associated with an
object that is clobbered and gone ...

  set_one ();
  buf ={v} {CLOBBER};
  ivtmp.20_18 = ivtmp.20_6 + 1;

... and as the MEM[] write can't have possibly been into 'in' (as that is
invalid, as 'in' didn't exist at the MEM access), it's okay and sensible to
allocate 'in' and 'buf' into the same memory.

It seems to be a common misconception of what the clobber is really about.
I designed it to mean what I wrote above, the storage associated with the
clobbered entities is gone after the clobber (not merely it's former
contents!). 

But ivopts or dom extends the lifetime of 'in' (by moving an address-taken
earlier) and hence the lifetime of its storage, but without doing anything
about the clobber (and hence not _really_ extending the lifetime).  That
doesn't work.
It's basically a mirrored transformation of the usual take-address-of-local
and access it out of it's declared scope, just that here the _start_ of the
supposed lifetime is moved out of the declared scope, not the end.

[Bug middle-end/90348] [7/8/9/10 Regression] Partition of char arrays is incorrect in some cases

2019-05-06 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90348

--- Comment #6 from Jakub Jelinek  ---
We'd probably need to change decl_to_stack_part from hash_map to
hash_map that would just map DECL_UIDs which have
DECL_RTL_IF_SET equal to pc_rtx to the partition numbers, rather than from
decl, or have some other mapping from DECL_UIDs that appear in the points to
info vars back to the decls.

[Bug middle-end/90348] [7/8/9/10 Regression] Partition of char arrays is incorrect in some cases

2019-05-06 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90348

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek  ---
I think before ivopts this looks fine, we have:
   [local count: 858993460]:
  # j_20 = PHI <0(10), j_13(11)>
  in[j_20] = 0;
  set_one ();
  buf ={v} {CLOBBER};
  j_13 = j_20 + 1;
  if (i_6 > j_20)
goto ; [80.00%]
  else
goto ; [20.00%]
in the inner loop and
   [local count: 214748365]:
  in ={v} {CLOBBER};
  i_10 = i_6 + 1;
  ivtmp_22 = ivtmp_4 - 1;
  if (ivtmp_22 != 0)
goto ; [80.00%]
  else
goto ; [20.00%]
near the end of the outer loop.  The problem is that ivopts turns the in[j_20]
access into
   [local count: 42959980]:
  _30 = (unsigned int) 
  ivtmp.27_29 = _30 + 1;
  goto ; [100.00%]

   [local count: 171798691]:
  ivtmp.21_16 = (unsigned int) 
  _23 = (unsigned int) 

   [local count: 858993460]:
  # ivtmp.21_21 = PHI 
  _18 = (void *) ivtmp.21_21;
  MEM[base: _18, offset: 0B] = 0;
  set_one ();
  buf ={v} {CLOBBER};
which is still fine, for the life analysis we still see in as live in bb 10 and
therefore in bb3 too.
But later on dom3 changes
[local count: 171798691]:
-  ivtmp.21_16 = (unsigned int) 
-  _23 = (unsigned int) 
+  ivtmp.21_16 = _30;
+  _23 = _30;
and now  is live clearly only in bb2 before the outer loop for the life
analysis purposes.  Now, if we kill that
  if (gimple_clobber_p (stmt))
{
  tree lhs = gimple_assign_lhs (stmt);
  size_t *v;
  /* Nested function lowering might introduce LHSs
 that are COMPONENT_REFs.  */
  if (!VAR_P (lhs))
continue;
  if (DECL_RTL_IF_SET (lhs) == pc_rtx
  && (v = decl_to_stack_part->get (lhs)))
bitmap_clear_bit (work, *v);
}
you've mentioned, I'm afraid we lose all the stack slot sharing possibilities,
or at least most of them.
So I'm afraid we need to do something smarter.
Either we need to track during the life analysis algorithm what variables
SSA_NAMEs can point to (even for non-pointers, as in this testcase ivopts uses
an integral value (unsigned int)  and then later casts to integers), or
could we use the alias analysis points to information for that?

For this testcase, it would be enough to walk for referenced pointer SSA_NAMEs
through their points to variables and mark those variables as also seen in the
IL.

[Bug middle-end/90348] [7/8/9/10 Regression] Partition of char arrays is incorrect in some cases

2019-05-06 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90348

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2

[Bug middle-end/90348] [7/8/9/10 Regression] Partition of char arrays is incorrect in some cases

2019-05-06 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90348

Richard Biener  changed:

   What|Removed |Added

 CC||matz at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org

--- Comment #4 from Richard Biener  ---
Sth wrong with liveness analysis.  There's obvious liveness of IN from BB2:

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _30 = (unsigned long) 
  ivtmp.30_29 = _30 + 1;
  goto ; [100.00%]
;;succ:   5

to

;;   basic block 7, loop depth 1
;;pred:   5
  in ={v} {CLOBBER};
  i_10 = i_6 + 1;
  if (i_10 != 5)
goto ; [80.00%]
  else
goto ; [20.00%]
;;succ:   3
;;8

but maybe that being at different loop depth somehow confuses the algorithm
in fact having it there seems odd to me but the address-taking in BB2 is
the result of IVOPTs hoisting.  The CLOBBER doesn't effect hoisting
the address but RTL expansion liveness compute splits 'in' into multiple
logical instances at the CLOBBER which _does_ make the addresses effectively
different ...