Hi,
Thanks for the help. Some more questions:
1) I am trying to workshare reduction operators, currently working on
SUM.
INTEGER N
REAL AA(N), MYSUM
!$OMP PARALLEL
!$OMP WORKSHARE
MYSUM = SUM(AA)
!$OMP END WORKSHARE
!$OMP END PARALLEL
To compute SUM, the scalarizer creates a temporary variable (let's call
it val2) for accumulating the sum.
In order to workshare the sum, I am attempting to create an OMP_FOR loop
with an OMP reduction clause for the temporary val2. In pseudocode this
would be:
OMP DO REDUCTION(+:val2)
DO I=1,N
val2 = val2 + AA(I)
END DO
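In tree terms, what I build is roughly the following (a sketch; omp_for
is my name for the OMP_FOR node created for the scalarized loop, and
val2 is the accumulator's VAR_DECL):

tree c = build_omp_clause (OMP_CLAUSE_REDUCTION);
OMP_CLAUSE_DECL (c) = val2;
OMP_CLAUSE_REDUCTION_CODE (c) = PLUS_EXPR;
/* Chain the new clause onto the OMP_FOR's clause list.  */
OMP_CLAUSE_CHAIN (c) = OMP_FOR_CLAUSES (omp_for);
OMP_FOR_CLAUSES (omp_for) = c;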
The problem is that the gimplifier reports an error: "reduction
variable val.2 is private in outer context". I think this is because
the enclosing parallel region treats val2 as private by default.
I have tried creating an extra OMP shared clause for val2:
sharedreduction = build_omp_clause (OMP_CLAUSE_SHARED);
OMP_CLAUSE_DECL (sharedreduction) = reduction_variable;
where reduction_variable is the tree node for val2. I then attach this
clause to the clause chain of the OMP_PARALLEL construct, roughly like this:
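/* omp_parallel is my name for the enclosing OMP_PARALLEL node.  */
OMP_CLAUSE_CHAIN (sharedreduction) = OMP_PARALLEL_CLAUSES (omp_parallel);
OMP_PARALLEL_CLAUSES (omp_parallel) = sharedreduction;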
Doing this trips the following assertion in gimplify.c:omp_add_variable:
  /* The only combination of data sharing classes we should see is
     FIRSTPRIVATE and LASTPRIVATE.  */
  nflags = n->value | flags;
  gcc_assert ((nflags & GOVD_DATA_SHARE_CLASS)
              == (GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE));
I think this happens because val2 is first added with GOVD_SHARED |
GOVD_EXPLICIT flags because of my explicit shared clause, and is later
re-added (from the default handling of the enclosing parallel
construct?) with GOVD_LOCAL | GOVD_SEEN flags.
If I ignore that assertion, another one fails in expr.c:
  /* Variables inherited from containing functions should have
     been lowered by this point.  */
  context = decl_function_context (exp);
  gcc_assert (!context
              || context == current_function_decl
              || TREE_STATIC (exp)
              /* ??? C++ creates functions that are not
                 TREE_STATIC.  */
              || TREE_CODE (exp) == FUNCTION_DECL);
I guess val2 is not lowered properly? Ignoring this assertion as well
triggers an RTL error (an assignment between mismatched machine modes,
DImode to SFmode), so something is definitely wrong.
Do I need to attach val2's tree node declaration somewhere else?
2) Again for the reduction operators: I would subsequently perform the
scalar assignment MYSUM = val2 in a single thread using OMP SINGLE. Is
there a better way? I don't think I can use the program-defined MYSUM
as the reduction variable inside the sum loop, because the rhs needs to
be fully evaluated before the lhs is assigned to.
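For concreteness, the shape I have in mind for the example above (a
sketch, ignoring the zero-initialization of val2 that the scalarizer
emits before the loop):

!$OMP DO REDUCTION(+:val2)
DO I=1,N
val2 = val2 + AA(I)
END DO
!$OMP END DO
!$OMP SINGLE
MYSUM = val2
!$OMP END SINGLE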
3) gfc_check_dependency seems like an appropriate helper for the
dependence analysis between the statements of the workshare block; I
sketch a possible use below. If you have other suggestions, let me know.
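A rough sketch of the use I have in mind (hypothetical names: stmt1 and
stmt2 are adjacent EXEC_ASSIGN statements inside the workshare block,
with expr/expr2 being the lhs/rhs of each assignment):

/* If nothing stmt2 touches overlaps the lhs written by stmt1, no
   barrier is needed between the two generated constructs.  */
if (!gfc_check_dependency (stmt1->expr, stmt2->expr2, false)
    && !gfc_check_dependency (stmt1->expr, stmt2->expr, false))
  /* Safe to put OMP_CLAUSE_NOWAIT on the construct generated
     for stmt1.  */ ;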
thanks,
- Vasilis
On Mon, Apr 14, 2008 at 6:47 AM, Jakub Jelinek <[EMAIL PROTECTED]> wrote:
> Hi!
>
>
> On Wed, Apr 09, 2008 at 11:29:24PM -0500, Vasilis Liaskovitis wrote:
> > I am a beginner interested in learning gcc internals and contributing
> > to the community.
>
> Thanks for showing interest in this area!
>
>
> > I have started implementing PR35423 - omp workshare in the fortran
> > front-end. I have some questions - any guidance and suggestions are
> > welcome:
> >
> > - For scalar assignments, wrapping them in OMP_SINGLE clause.
>
> Yes, though if there are a couple of adjacent scalar assignments which don't
> involve function calls and won't take too long to execute, you want
> to put them all into one OMP_SINGLE. If the assignments may take long
> because of function calls and there are several such ones adjacent,
> you can use OMP_SECTIONS.
>
> Furthermore, for all statements, not just the scalar ones, you want to
> do dependency analysis between all the statements within !$omp workshare,
> and make OMP_SINGLE, OMP_FOR or OMP_SECTIONS and add OMP_CLAUSE_NOWAIT
> to them where no barrier is needed.
>
>
> > - Array/subarray assignments: For assignments handled by the
> > scalarizer, I now create an OMP_FOR loop instead of a LOOP_EXPR for
> > the outermost scalarized loop. This achieves worksharing at the
> > outermost loop level.
>
> Yes, though on gomp-3_0-branch you actually could use collapsed OMP_FOR
> loop too. Just bear in mind that, for best performance at least with
> static OMP_FOR scheduling, ideally the same memory (part of the array in this
> case) is accessed by the same thread, as then it is in that CPU's caches.
> Of course that's not always possible, but if it can be done, gfortran
> should try that.
>
>
> > Some array assignments are handled by functions (e.g.
> > gfc_build_memcpy_call generates calls to memcpy). For these, I believe
> > we need to divide the arrays into chunks and have each thread call the
> > builtin function on its own chunk. E.g., if we have the following call
> > in a parallel workshare construct:
> >
> > memcpy(dst, src, len)
> >
> > I generate this pseudocode:
> >
> > {