[Bug tree-optimization/68673] Handle __builtin_GOMP_task optimally in ipa-pta
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68673

--- Comment #7 from vries at gcc dot gnu.org ---
The constraints generated for the GOMP_task call look ok to me:
...
_Z2f3v._omp_cpyfn.1.arg0 = _CPFN_TMP.0+64
_Z2f3v._omp_cpyfn.1.arg1 = &.omp_data_o.2.0+64
_Z2f3v._omp_fn.0.arg0 = _CPFN_TMP.0+64
...

And we seem to be able to conclude that the loads and stores in
_Z2f3v._omp_fn.0 point to a, b and c:
...
_12 = { ESCAPED NONLOCAL a }
_14 = { ESCAPED NONLOCAL b }
_16 = { ESCAPED NONLOCAL c }
...

But AFAIU, the escaped/nonlocal bit causes the optimization to fail.

I'm not sure yet if the escaped/nonlocal bit is:
- too conservative, or
- actually necessary (due to either GOMP_task, or the testcase).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68673

--- Comment #4 from vries at gcc dot gnu.org ---
Created attachment 36912
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36912&action=edit
task-ipa-pta-3.C.074i.pta (pta dump for f3)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68673

--- Comment #2 from vries at gcc dot gnu.org ---
Created attachment 36910
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36910&action=edit
tentative patch, part 1, handles f1 and f2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68673

--- Comment #3 from vries at gcc dot gnu.org ---
Created attachment 36911
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36911&action=edit
tentative patch, part 2, attempt to handle f3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68673

--- Comment #6 from Jakub Jelinek ---
Note GOMP_target_ext has a similar depend argument.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68673

--- Comment #5 from Jakub Jelinek ---
Comment on attachment 36910
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36910
tentative patch, part 1, handles f1 and f2

What is the reason for testing for NULL depend? I guess it depends on what
the tree-ssa-structalias.c code is actually doing with it.

depend is either NULL or a pointer to an array which contains pointers and
some sizes, but
1) those pointers/sizes never escape to the task callback functions
2) they are never actually used as addresses, just as cookies.

So, we need to ensure that if depend is non-NULL, then PTA/IPA-PTA/alias
oracle/whatever else does not try to optimize the argument away, or think it
isn't used, so from this POV the call acts as a readonly use of the array.
But other than that it acts as GOMP_task with NULL depend.
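
To make the "cookies" point concrete, here is a minimal sketch of a dependence tracker that only ever compares the addresses it is handed and never dereferences them. Everything in it is hypothetical: dep_tracker, record and must_wait are illustration names, and the flat array of pointers is an assumed layout, not libgomp's actual depend array format or data structure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical tracker (an assumption for illustration, not libgomp code).
// Addresses are stored as integers and used purely as lookup keys.
struct dep_tracker {
    std::set<std::uintptr_t> pending;

    // Remember the cookies of a task that has been queued.
    void record(void *const *cookies, std::size_t n)
    {
        for (std::size_t i = 0; i < n; i++)
            pending.insert(reinterpret_cast<std::uintptr_t>(cookies[i]));
    }

    // True if any cookie matches one already recorded. The pointers are
    // converted and compared, never dereferenced.
    bool must_wait(void *const *cookies, std::size_t n) const
    {
        for (std::size_t i = 0; i < n; i++)
            if (pending.count(reinterpret_cast<std::uintptr_t>(cookies[i])))
                return true;
        return false;
    }
};
```

Because the tracker never reads or writes through the stored addresses, a model like this cannot make the pointed-to objects visible to the task body, which is the property points 1) and 2) above rely on.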
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68673

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek ---
E.g.

    #define N 2

    int
    f1 (void)
    {
      int a[N], b[N], c[N];
      int *ap = &a[0];
      int *bp = &b[0];
      int *cp = &c[0];

    #pragma omp task shared (ap, bp, cp)
      for (unsigned int idx = 0; idx < N; idx++)
        {
          ap[idx] = 1;
          bp[idx] = 2;
          cp[idx] = ap[idx];
        }

      return *cp;
    }

    int
    f2 (void)
    {
      int a[N], b[N], c[N];
      int *ap = &a[0];
      int *bp = &b[0];
      int *cp = &c[0];

    #pragma omp task // implicitly firstprivate (ap, bp, cp)
      for (unsigned int idx = 0; idx < N; idx++)
        {
          ap[idx] = 1;
          bp[idx] = 2;
          cp[idx] = ap[idx];
        }

      return *cp;
    }

    struct A { A (); A (const A &); ~A (); int a; void foo (); };

    int
    f3 (void)
    {
      int a[N], b[N], c[N];
      int *ap = &a[0];
      int *bp = &b[0];
      int *cp = &c[0];
      A d;

    #pragma omp task shared (ap, bp, cp) firstprivate (d)
      for (unsigned int idx = 0; idx < N; idx++)
        {
          d.foo ();
          ap[idx] = 1;
          bp[idx] = 2;
          cp[idx] = ap[idx];
        }

      return *cp;
    }

The first two GOMP_task calls have the same struct used on the caller and
callee, while the third one has different structs (the caller's struct has a
pointer to A in it, the callee's struct has A itself in it, and the cpyfn
callback is called to run the copy constructor and copy other pointers etc.
from the caller's struct to the callee's struct).

Note with the recent optimization I wrote, f1 is handled as firstprivate
(ap, bp, cp) too, because none of those pointers are written in the task
region.
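
The difference between the two struct shapes can be sketched without the OpenMP runtime. The code below is only an illustration under assumptions: mock_task is a hypothetical stand-in that runs the body immediately (it is not GOMP_task and does not reproduce its signature), the layout structs and function names are invented for the example, and A is given trivial bodies plus a copy counter so the sketch is self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

constexpr unsigned N = 2;

struct A {
    static int copies;                       // counts copy-constructor calls
    int a;
    A() : a(0) {}
    A(const A &o) : a(o.a) { ++copies; }
    ~A() {}
};
int A::copies = 0;

// f1/f2 shape: caller and callee share one layout.
struct shared_layout { int *ap, *bp, *cp; };
// f3 shape: the caller passes a pointer to A, the callee owns an A.
struct caller_layout { int *ap, *bp, *cp; A *d; };
struct callee_layout { int *ap, *bp, *cp; A d; };

using task_fn = void (*)(void *);
using copy_fn = void (*)(void *dst, void *src);

// Hypothetical stand-in for the runtime call (assumption, not libgomp).
void mock_task(task_fn fn, void *data, copy_fn cpyfn, std::size_t arg_size)
{
    assert(fn != nullptr && data != nullptr);
    void *buf = ::operator new(arg_size);    // storage for the callee's block
    if (cpyfn != nullptr)
        cpyfn(buf, data);                    // layouts differ: callback fills buf
    else
        std::memcpy(buf, data, arg_size);    // same layout: bytewise copy
    fn(buf);
    ::operator delete(buf);
}

void body_shared(void *p)
{
    shared_layout *s = static_cast<shared_layout *>(p);
    for (unsigned idx = 0; idx < N; idx++) {
        s->ap[idx] = 1; s->bp[idx] = 2; s->cp[idx] = s->ap[idx];
    }
}

void cpyfn_f3(void *dst, void *src)
{
    caller_layout *in = static_cast<caller_layout *>(src);
    // Build the callee's block in place; this runs A's copy constructor once.
    new (dst) callee_layout{in->ap, in->bp, in->cp, *in->d};
}

void body_f3(void *p)
{
    callee_layout *s = static_cast<callee_layout *>(p);
    for (unsigned idx = 0; idx < N; idx++) {
        s->ap[idx] = 1; s->bp[idx] = 2; s->cp[idx] = s->ap[idx];
    }
    s->~callee_layout();                     // destroy the private copy of A
}

int run_shared()
{
    int a[N], b[N], c[N];
    shared_layout in = {a, b, c};
    mock_task(body_shared, &in, nullptr, sizeof in);
    return c[0];
}

int run_f3()
{
    int a[N], b[N], c[N];
    A d;
    caller_layout in = {a, b, c, &d};
    mock_task(body_f3, &in, cpyfn_f3, sizeof(callee_layout));
    return c[0];
}
```

In the shared-layout case the body sees an exact copy of what the caller built, so the pointers in it are trivially the caller's. In the f3 shape the pointers reach the body only through the copy callback, which is why a points-to analysis would need to model both the callback's arguments and the body's argument to connect ap, bp and cp back to a, b and c.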