[Bug tree-optimization/68673] Handle __builtin_GOMP_task optimally in ipa-pta

2015-12-04 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68673

--- Comment #7 from vries at gcc dot gnu.org ---
The constraints generated for the GOMP_task call look ok to me:
...
_Z2f3v._omp_cpyfn.1.arg0 = _CPFN_TMP.0+64
_Z2f3v._omp_cpyfn.1.arg1 = &.omp_data_o.2.0+64

_Z2f3v._omp_fn.0.arg0 = _CPFN_TMP.0+64
...

And we seem to be able to conclude that the loads and stores in
_Z2f3v._omp_fn.0 point to a, b and c:
...
_12 = { ESCAPED NONLOCAL a }
_14 = { ESCAPED NONLOCAL b }
_16 = { ESCAPED NONLOCAL c }
...
But AFAIU, the escaped/nonlocal bit causes the optimization to fail.

I'm not sure yet if the escaped/nonlocal bit is:
- too conservative, or
- actually necessary (due to either GOMP_task, or the testcase).

2015-12-04 Thread vries at gcc dot gnu.org

--- Comment #4 from vries at gcc dot gnu.org ---
Created attachment 36912
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36912&action=edit
task-ipa-pta-3.C.074i.pta (pta dump for f3)

2015-12-04 Thread vries at gcc dot gnu.org

--- Comment #2 from vries at gcc dot gnu.org ---
Created attachment 36910
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36910&action=edit
tentative patch, part 1, handles f1 and f2

2015-12-04 Thread vries at gcc dot gnu.org

--- Comment #3 from vries at gcc dot gnu.org ---
Created attachment 36911
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36911&action=edit
tentative patch, part 2, attempt to handle f3

2015-12-04 Thread jakub at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek  ---
Note that GOMP_target_ext has a similar depend argument.

2015-12-04 Thread jakub at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek  ---
Comment on attachment 36910
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36910
tentative patch, part 1, handles f1 and f2

What is the reason for testing for NULL depend?
I guess it depends on what the tree-ssa-structalias.c code is actually doing
with it.  depend is either NULL, or a pointer to an array which contains
pointers and some sizes, but 1) those pointers/sizes never escape to the task
callback functions, and 2) they are never actually used as addresses, just as
cookies.
So, we need to ensure that if depend is non-NULL, then PTA/IPA-PTA/the alias
oracle/whatever else does not try to optimize the argument away, or think it
isn't used; from this POV the call acts as a read-only use of the array.  But
other than that it acts as GOMP_task with NULL depend.
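
The "cookies" point above can be sketched roughly as follows.  This is only a
toy model, not libgomp code: run_task_depend_sketch, depend_demo and the array
layout (entry 0 holds the count, the remaining entries hold addresses) are
assumptions made up for illustration.  The model reads the depend array,
compares the stored pointers as integer keys, never dereferences them, and
never forwards them to the task callback.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical task body: only ever sees its own data argument.
static void
set_flag (void *p)
{
  *static_cast<int *> (p) = 1;
}

// Toy model of a task launch with a depend array (assumed layout:
// depend[0] is the number of entries, depend[1..n] are addresses).
// Returns how many entries match a cookie recorded by an earlier call.
static unsigned
run_task_depend_sketch (void (*fn) (void *), void *data,
                        void *const *depend,
                        std::set<std::uintptr_t> &seen)
{
  unsigned hits = 0;
  if (depend != nullptr)
    {
      std::size_t n = reinterpret_cast<std::uintptr_t> (depend[0]);
      for (std::size_t i = 0; i < n; i++)
        {
          // Read-only use of the array: the pointer value is a key only,
          // it is never dereferenced.
          std::uintptr_t cookie
            = reinterpret_cast<std::uintptr_t> (depend[i + 1]);
          if (!seen.insert (cookie).second)
            hits++;
        }
    }
  fn (data);  // depend is not passed on to the callback
  return hits;
}

// Two launches depending on the same address: the second one reports
// one matching cookie.
unsigned
depend_demo ()
{
  std::set<std::uintptr_t> seen;
  int x = 0, flag = 0;
  void *depend[] = { reinterpret_cast<void *> (std::uintptr_t (1)), &x };
  run_task_depend_sketch (set_flag, &flag, depend, seen);
  return run_task_depend_sketch (set_flag, &flag, depend, seen);
}
```

In this model x is never read or written through depend, which is the
property a points-to analysis would want to rely on: the array itself is
read, but what its entries point to does not escape into the callback.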

2015-12-03 Thread jakub at gcc dot gnu.org

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek  ---
E.g.
#define N 2

int
f1 (void)
{
  int a[N], b[N], c[N];
  int *ap = &a[0];
  int *bp = &b[0];
  int *cp = &c[0];

#pragma omp task shared (ap, bp, cp)
  for (unsigned int idx = 0; idx < N; idx++)
    {
      ap[idx] = 1;
      bp[idx] = 2;
      cp[idx] = ap[idx];
    }

  return *cp;
}

int
f2 (void)
{
  int a[N], b[N], c[N];
  int *ap = &a[0];
  int *bp = &b[0];
  int *cp = &c[0];

#pragma omp task // implicitly firstprivate (ap, bp, cp)
  for (unsigned int idx = 0; idx < N; idx++)
    {
      ap[idx] = 1;
      bp[idx] = 2;
      cp[idx] = ap[idx];
    }

  return *cp;
}

struct A { A (); A (const A &); ~A (); int a; void foo (); };

int
f3 (void)
{
  int a[N], b[N], c[N];
  int *ap = &a[0];
  int *bp = &b[0];
  int *cp = &c[0];
  A d;

#pragma omp task shared (ap, bp, cp) firstprivate (d)
  for (unsigned int idx = 0; idx < N; idx++)
    {
      d.foo ();
      ap[idx] = 1;
      bp[idx] = 2;
      cp[idx] = ap[idx];
    }

  return *cp;
}

The first two GOMP_task calls use the same struct in the caller and the
callee, while the third one uses different structs: the caller's struct has a
pointer to A in it, and the callee's struct has A itself in it.  The cpyfn
callback is called to run the copy constructor and to copy the other pointers
etc. from the caller's struct to the callee's struct.  Note that with the
recent optimization I wrote, f1 is handled as firstprivate (ap, bp, cp) too,
because none of those pointers are written in the task region.
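
To make the two-struct case concrete, here is a rough hand-written model of
what f3 amounts to.  It is a sketch under assumptions, not the compiler's
actual output: caller_data, callee_data, task_fn, task_cpyfn and
run_task_sketch are invented names, A is a simplified stand-in for the
testcase's struct, and run_task_sketch only mimics the "copy via cpyfn, then
call fn on the copy" protocol; the real runtime entry point takes further
arguments such as size, alignment, and depend.

```cpp
#include <new>

static const unsigned N = 2;

// Simplified stand-in for the testcase's struct A.
struct A
{
  int a;
  A () : a (0) {}
  A (const A &o) : a (o.a) {}
  void foo () { a++; }
};

// Caller-side record: the pointers, plus a pointer to the firstprivate A.
struct caller_data
{
  int *ap, *bp, *cp;
  A *d;
};

// Callee-side record: the same pointers, but A itself is embedded.
struct callee_data
{
  int *ap, *bp, *cp;
  A d;
};

// Plays the role of the outlined task body.
static void
task_fn (void *p)
{
  callee_data *data = static_cast<callee_data *> (p);
  for (unsigned int idx = 0; idx < N; idx++)
    {
      data->d.foo ();  // works on the task's private copy
      data->ap[idx] = 1;
      data->bp[idx] = 2;
      data->cp[idx] = data->ap[idx];
    }
}

// Plays the role of the cpyfn callback: copy the pointers and run A's
// copy constructor into the callee-side record.
static void
task_cpyfn (void *dst, void *src)
{
  caller_data *in = static_cast<caller_data *> (src);
  new (dst) callee_data { in->ap, in->bp, in->cp, *in->d };
}

// Toy model of the runtime call (buffer type hard-coded for brevity).
static void
run_task_sketch (void (*fn) (void *), void *data,
                 void (*cpyfn) (void *, void *))
{
  if (cpyfn)
    {
      alignas (callee_data) unsigned char buf[sizeof (callee_data)];
      cpyfn (buf, data);  // arg0 is the temporary, arg1 the caller's struct
      fn (buf);           // the task body receives the temporary
    }
  else
    fn (data);            // same struct on both sides, as in f1 and f2
}

int
f3_sketch ()
{
  int a[N], b[N], c[N];
  A d;
  caller_data o = { &a[0], &b[0], &c[0], &d };
  run_task_sketch (task_fn, &o, task_cpyfn);
  return c[0];
}
```

Under this model the task body only ever sees the temporary filled in by the
copy callback, so the pointers to a, b and c reach it indirectly through two
different record layouts rather than through one shared struct.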