Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory

2022-02-15 Thread Julian Brown
On Mon, 14 Feb 2022 16:56:35 +0100
Thomas Schwinge  wrote:

> Hi Julian!
> 
> Two more questions here, in context of 
> "[12 Regression] ICE in expand_gimple_stmt_1, at cfgexpand.c:3932
> since r12-980-g29a2f51806c":
> 
> On 2019-06-03T17:02:45+0100, Julian Brown 
> wrote:
> > +/* Record vars listed in private clauses in CLAUSES in CTX.  This
> > information
> > +   is used to mark up variables that should be made private
> > per-gang.  */ +
> > +static void
> > +oacc_record_private_var_clauses (omp_context *ctx, tree clauses)
> > +{
> > +  tree c;
> > +
> > +  if (!ctx)
> > +return;
> > +
> > +  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
> > +if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE)
> > +  {
> > +   tree decl = OMP_CLAUSE_DECL (c);
> > +   if (VAR_P (decl) && TREE_ADDRESSABLE (decl))
> > + ctx->oacc_addressable_var_decls->safe_push (decl);
> > +  }
> > +}  
> 
> So, here we analyze 'OMP_CLAUSE_DECL (c)' (as is, without translation
> through 'lookup_decl (decl, ctx)')...

I think you're right that this one should be using lookup_decl, but...

> > +/* Record addressable vars declared in BINDVARS in CTX.  This
> > information is
> > +   used to mark up variables that should be made private per-gang.
> >  */ +
> > +static void
> > +oacc_record_vars_in_bind (omp_context *ctx, tree bindvars)
> > +{
> > +  if (!ctx)
> > +return;
> > +
> > +  for (tree v = bindvars; v; v = DECL_CHAIN (v))
> > +if (VAR_P (v) && TREE_ADDRESSABLE (v))
> > +  ctx->oacc_addressable_var_decls->safe_push (v);
> > +}  
> 
> ..., and similarly here analyze 'v' (without 'lookup_decl (v,
> ctx)')...

I'm not so sure about this one: if the variables are declared at a
particular binding level, I think they have to be in the current OMP
context (and thus shadow any definitions that might be present in the
parent context)? Maybe that can be confirmed via an assertion.

> > +/* Mark addressable variables which are declared implicitly or
> > explicitly as
> > +   gang private with a special attribute.  These may need to have
> > their
> > +   declarations altered later on in compilation (e.g. in
> > +   execute_oacc_device_lower or the backend, depending on how the
> > OpenACC
> > +   execution model is implemented on a given target) to ensure
> > that sharing
> > +   semantics are correct.  */
> > +
> > +static void
> > +mark_oacc_gangprivate (vec *decls, omp_context *ctx)
> > +{
> > +  int i;
> > +  tree decl;
> > +
> > +  FOR_EACH_VEC_ELT (*decls, i, decl)
> > +{
> > +  for (omp_context *thisctx = ctx; thisctx; thisctx =
> > thisctx->outer)
> > +   {
> > + tree inner_decl = maybe_lookup_decl (decl, thisctx);
> > + if (inner_decl)
> > +   {
> > + decl = inner_decl;
> > + break;
> > +   }
> > +   }
> > +  if (!lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES
> > (decl)))
> > +   {
> > + if (dump_file && (dump_flags & TDF_DETAILS))
> > +   {
> > + fprintf (dump_file,
> > +  "Setting 'oacc gangprivate' attribute for
> > decl:");
> > + print_generic_decl (dump_file, decl, TDF_SLIM);
> > + fputc ('\n', dump_file);
> > +   }
> > + DECL_ATTRIBUTES (decl)
> > +   = tree_cons (get_identifier ("oacc gangprivate"),
> > +NULL, DECL_ATTRIBUTES (decl));
> > +   }
> > +}
> > +}  
> 
> ..., but here we action on the 'maybe_lookup_decl'-translated
> 'inner_decl', if applicable.  In certain cases that one may be
> different from the original 'decl'.  (In particular (only?), when the
> OMP lowering has made 'decl' "late 'TREE_ADDRESSABLE'".)  This
> assymetry I understand to give rise to 
> "[12 Regression] ICE in expand_gimple_stmt_1, at cfgexpand.c:3932
> since r12-980-g29a2f51806c".
> 
> It makes sense to me that we do the OpenACC privatization on the
> 'lookup_decl' -- but shouldn't we then do that in the analysis phase,
> too?  (This appears to work fine for OpenACC 'private' clauses (...,
> and avoids marking a few as addressable/gang-private), and for those
> in 'gimple_bind_vars' it doesn't seem to make a difference (for the
> current test cases and/or compiler transformations).)

Yes, I think you're right.

> And, second question: what case did you run into or foresee, that you
> here need the 'thisctx' loop and 'maybe_lookup_decl', instead of a
> plain 'lookup_decl (decl, ctx)'?  Per my testing that's sufficient.

I'd probably misunderstood about lookup_decl walking up through parent
contexts itself... oops.

> Unless you think this needs more consideration, I suggest to do these
> two changes.  (I have a WIP patch in testing.)

Sounds good to me.

Thank you,

Julian


Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory

2022-02-14 Thread Thomas Schwinge
Hi Julian!

Two more questions here, in context of 
"[12 Regression] ICE in expand_gimple_stmt_1, at cfgexpand.c:3932 since
r12-980-g29a2f51806c":

On 2019-06-03T17:02:45+0100, Julian Brown  wrote:
> This is a new version of the patch, rebased

The code as we've now got it in master branch has changed some more, but
I think the behavior I'm seeing may have been introduced here:

> and with a couple of
> additional bugfixes, as follows:
>
> Firstly, in mark_oacc_gangprivate, each decl is looked up (using
> maybe_lookup_decl) to apply the "oacc gangprivate" attribute to the
> innermost-nested copy of the decl.

> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -137,6 +137,12 @@ struct omp_context

> +  /* Addressable variable decls in this context.  */
> +  vec *oacc_addressable_var_decls;
>  };

> +/* Record vars listed in private clauses in CLAUSES in CTX.  This information
> +   is used to mark up variables that should be made private per-gang.  */
> +
> +static void
> +oacc_record_private_var_clauses (omp_context *ctx, tree clauses)
> +{
> +  tree c;
> +
> +  if (!ctx)
> +return;
> +
> +  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
> +if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE)
> +  {
> + tree decl = OMP_CLAUSE_DECL (c);
> + if (VAR_P (decl) && TREE_ADDRESSABLE (decl))
> +   ctx->oacc_addressable_var_decls->safe_push (decl);
> +  }
> +}

So, here we analyze 'OMP_CLAUSE_DECL (c)' (as is, without translation
through 'lookup_decl (decl, ctx)')...

> +/* Record addressable vars declared in BINDVARS in CTX.  This information is
> +   used to mark up variables that should be made private per-gang.  */
> +
> +static void
> +oacc_record_vars_in_bind (omp_context *ctx, tree bindvars)
> +{
> +  if (!ctx)
> +return;
> +
> +  for (tree v = bindvars; v; v = DECL_CHAIN (v))
> +if (VAR_P (v) && TREE_ADDRESSABLE (v))
> +  ctx->oacc_addressable_var_decls->safe_push (v);
> +}

..., and similarly here analyze 'v' (without 'lookup_decl (v, ctx)')...

> +/* Mark addressable variables which are declared implicitly or explicitly as
> +   gang private with a special attribute.  These may need to have their
> +   declarations altered later on in compilation (e.g. in
> +   execute_oacc_device_lower or the backend, depending on how the OpenACC
> +   execution model is implemented on a given target) to ensure that sharing
> +   semantics are correct.  */
> +
> +static void
> +mark_oacc_gangprivate (vec *decls, omp_context *ctx)
> +{
> +  int i;
> +  tree decl;
> +
> +  FOR_EACH_VEC_ELT (*decls, i, decl)
> +{
> +  for (omp_context *thisctx = ctx; thisctx; thisctx = thisctx->outer)
> + {
> +   tree inner_decl = maybe_lookup_decl (decl, thisctx);
> +   if (inner_decl)
> + {
> +   decl = inner_decl;
> +   break;
> + }
> + }
> +  if (!lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (decl)))
> + {
> +   if (dump_file && (dump_flags & TDF_DETAILS))
> + {
> +   fprintf (dump_file,
> +"Setting 'oacc gangprivate' attribute for decl:");
> +   print_generic_decl (dump_file, decl, TDF_SLIM);
> +   fputc ('\n', dump_file);
> + }
> +   DECL_ATTRIBUTES (decl)
> + = tree_cons (get_identifier ("oacc gangprivate"),
> +  NULL, DECL_ATTRIBUTES (decl));
> + }
> +}
> +}

..., but here we action on the 'maybe_lookup_decl'-translated
'inner_decl', if applicable.  In certain cases that one may be different
from the original 'decl'.  (In particular (only?), when the OMP lowering
has made 'decl' "late 'TREE_ADDRESSABLE'".)  This assymetry I understand
to give rise to  "[12 Regression] ICE in
expand_gimple_stmt_1, at cfgexpand.c:3932 since r12-980-g29a2f51806c".

It makes sense to me that we do the OpenACC privatization on the
'lookup_decl' -- but shouldn't we then do that in the analysis phase,
too?  (This appears to work fine for OpenACC 'private' clauses (..., and
avoids marking a few as addressable/gang-private), and for those in
'gimple_bind_vars' it doesn't seem to make a difference (for the current
test cases and/or compiler transformations).)

And, second question: what case did you run into or foresee, that you
here need the 'thisctx' loop and 'maybe_lookup_decl', instead of a plain
'lookup_decl (decl, ctx)'?  Per my testing that's sufficient.

Unless you think this needs more consideration, I suggest to do these two
changes.  (I have a WIP patch in testing.)


Grüße
 Thomas
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory

2021-05-21 Thread Thomas Schwinge
Hi!

On 2019-06-07T15:08:37+0100, Julian Brown  wrote:
> Hi Jakub,
>
> Thanks for the review! I believe I've addressed all your comments in
> the attached version of the patch.
>
> On Mon, 3 Jun 2019 18:23:00 +0200
> Jakub Jelinek  wrote:
>> > +/* Record vars listed in private clauses in CLAUSES in CTX.  This 
>> > information
>> > +   is used to mark up variables that should be made private per-gang.  */
>> > +
>> > +static void
>> > +oacc_record_private_var_clauses (omp_context *ctx, tree clauses)
>> > +{
>> > +  [...]
>> > +}
>>
>> You don't want to do this for all GOMP_FOR or GOMP_TARGET context,
>> I'd hope you only want to do that for OpenACC contexts.

> I've [...] fixed the patch to only call oacc_record_private_var_clauses in
> OpenACC contexts.

> commit 6c2a018b940d0b132395048b0600f7d897319ee2
> Author: Julian Brown 
> Date:   Thu Aug 9 20:27:04 2018 -0700
>
> [OpenACC] Add support for gang local storage allocation in shared memory

> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c

> @@ -8599,6 +8681,9 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context 
> *ctx)
>
>push_gimplify_context ();
>
> +  if (is_gimple_omp_oacc (ctx->stmt))
> +oacc_record_private_var_clauses (ctx, gimple_omp_for_clauses (stmt));
> +
>lower_omp (gimple_omp_for_pre_body_ptr (stmt), ctx);
>
>block = make_node (BLOCK);

So, yes -- but then, apparently, that again got lost in a later version
of the patch.  ;-)

I've pushed "[OpenACC privatization] Don't evaluate OpenMP 'for' clauses
[PR90115]" to master branch in commit
3a285ebd0cf5ab762726018515d23280fa6dd445, see attached.


Grüße
 Thomas


-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf
>From 3a285ebd0cf5ab762726018515d23280fa6dd445 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 20 May 2021 15:22:24 +0200
Subject: [PATCH] [OpenACC privatization] Don't evaluate OpenMP 'for' clauses
 [PR90115]

	gcc/
	PR middle-end/90115
	* omp-low.c (lower_omp_for): Don't evaluate OpenMP 'for' clauses.
---
 gcc/omp-low.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index da827ef2e34..a86c6c1e82c 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -11067,7 +11067,8 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   push_gimplify_context ();
 
-  oacc_privatization_scan_clause_chain (ctx, gimple_omp_for_clauses (stmt));
+  if (is_gimple_omp_oacc (ctx->stmt))
+oacc_privatization_scan_clause_chain (ctx, gimple_omp_for_clauses (stmt));
 
   lower_omp (gimple_omp_for_pre_body_ptr (stmt), ctx);
 
-- 
2.30.2



Add 'libgomp.oacc-c-c++-common/loop-gwv-2.c' (was: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory)

2021-05-19 Thread Thomas Schwinge
Hi!

On 2018-08-13T21:41:50+0100, Julian Brown  wrote:
> On Mon, 13 Aug 2018 11:42:26 -0700 Cesar Philippidis  
> wrote:
>> On 08/13/2018 09:21 AM, Julian Brown wrote:
>> > diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c 
>> > b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
>> > new file mode 100644
>> > index 000..2fa708a
>> > --- /dev/null
>> > +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
>> > @@ -0,0 +1,106 @@
>> > +/* { dg-xfail-run-if "gangprivate failure" { 
>> > openacc_nvidia_accel_selected } { "-O0" } { "" } } */

>> is the above xfail still necessary? It seems to xpass
>> for me on nvptx. However, I see this regression on the host:
>>
>> FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/loop-gwv-2.c
>> -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1  -O2  execution test

> Oops, this was the version of the patch I meant to post (and the one I
> tested). The XFAIL on loop-gwv-2.c isn't necessary, plus that test
> needed some other fixes to make it pass for NVPTX (it was written for
> GCN to start with).

As I should find out later, this testcase actually does work without the
code changes (OpenACC privatization levels) that it's accompanying -- and
I don't actually see anything in the testcase that the code changes would
trigger for.  Maybe it was for some earlier revision of these code
changes?  Anyway, as it's all-PASS for all systems that I've tested on,
I've now pushed "Add 'libgomp.oacc-c-c++-common/loop-gwv-2.c'" to master
branch in commit 5a16fb19e7c4274f8dd9bbdd30d7d06fe2eff8af, see attached.


Grüße
 Thomas


-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf
>From 5a16fb19e7c4274f8dd9bbdd30d7d06fe2eff8af Mon Sep 17 00:00:00 2001
From: Julian Brown 
Date: Mon, 13 Aug 2018 21:41:50 +0100
Subject: [PATCH] Add 'libgomp.oacc-c-c++-common/loop-gwv-2.c'

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New.
---
 .../libgomp.oacc-c-c++-common/loop-gwv-2.c| 95 +++
 1 file changed, 95 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
new file mode 100644
index 000..a4f81a39e24
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
@@ -0,0 +1,95 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#if 0
+#define DEBUG(DIM, IDX, VAL) \
+  fprintf (stderr, "%sdist[%d] = %d\n", (DIM), (IDX), (VAL))
+#else
+#define DEBUG(DIM, IDX, VAL)
+#endif
+
+#define N (32*32*32)
+
+int
+check (const char *dim, int *dist, int dimsize)
+{
+  int ix;
+  int exit = 0;
+
+  for (ix = 0; ix < dimsize; ix++)
+{
+  DEBUG(dim, ix, dist[ix]);
+  if (dist[ix] < (N) / (dimsize + 0.5)
+	  || dist[ix] > (N) / (dimsize - 0.5))
+	{
+	  fprintf (stderr, "did not distribute to %ss (%d not between %d "
+		   "and %d)\n", dim, dist[ix], (int) ((N) / (dimsize + 0.5)),
+		   (int) ((N) / (dimsize - 0.5)));
+	  exit |= 1;
+	}
+}
+
+  return exit;
+}
+
+int main ()
+{
+  int ary[N];
+  int ix;
+  int exit = 0;
+  int gangsize = 0, workersize = 0, vectorsize = 0;
+  int *gangdist, *workerdist, *vectordist;
+
+  for (ix = 0; ix < N;ix++)
+ary[ix] = -1;
+
+#pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+	copy(ary) copyout(gangsize, workersize, vectorsize)
+  {
+#pragma acc loop gang worker vector
+for (unsigned ix = 0; ix < N; ix++)
+  {
+	int g, w, v;
+
+	g = __builtin_goacc_parlevel_id (GOMP_DIM_GANG);
+	w = __builtin_goacc_parlevel_id (GOMP_DIM_WORKER);
+	v = __builtin_goacc_parlevel_id (GOMP_DIM_VECTOR);
+
+	ary[ix] = (g << 16) | (w << 8) | v;
+  }
+
+gangsize = __builtin_goacc_parlevel_size (GOMP_DIM_GANG);
+workersize = __builtin_goacc_parlevel_size (GOMP_DIM_WORKER);
+vectorsize = __builtin_goacc_parlevel_size (GOMP_DIM_VECTOR);
+  }
+
+  gangdist = (int *) alloca (gangsize * sizeof (int));
+  workerdist = (int *) alloca (workersize * sizeof (int));
+  vectordist = (int *) alloca (vectorsize * sizeof (int));
+  memset (gangdist, 0, gangsize * sizeof (int));
+  memset (workerdist, 0, workersize * sizeof (int));
+  memset (vectordist, 0, vectorsize * sizeof (int));
+
+  /* Test that work is shared approximately equally amongst each active
+ gang/worker/vector.  */
+  for (ix = 0; ix < N; ix++)
+{
+  int g = (ary[ix] >> 16) & 255;
+  int w = (ary[ix] >> 8) & 255;
+  int v = ary[ix] & 255;
+
+  gangdist[g]++;
+  workerdist[w]++;
+  vectordist[v]++;
+}
+
+  exit = check ("gang", gangdist, gangsize);
+  exit |= check ("worker", workerdist, workersize);
+  exit |= check ("vector", vectordist, vectorsize);
+
+  return exit;
+}
-- 
2.30.2



Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory

2019-11-06 Thread Julian Brown
Hi!

This is a new patch that takes a different approach to the last-posted
version in this thread. I have combined the previous incremental patches
on the og9 branch that culminated in the following patch:

https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01220.html

From that email, the following explanation was given of the previous
approaches taken as to how the partitioning level for OpenACC "private"
variables was calculated and represented in the compiler, and how this
patch differs:

 - The first (by Chung-Lin Tang) recorded which variables should be
   made private per-gang in each front end (i.e. separately in C, C++
   and Fortran) using a new attribute "oacc gangprivate". This was
   deemed too early; the final determination about which loops are
   assigned which parallelism level has not yet been made at parse time.

 - The second, last discussed here:

 https://gcc.gnu.org/ml/gcc-patches/2019-06/msg00726.html

   moved the analysis of OpenACC contexts to determine parallelism
   levels to omp-low.c (but kept the "oacc gangprivate" attribute and
   the NVPTX backend parts). However (as mentioned in that mail), this
   is still too early: in fact the final determination of the
   parallelism level for each loop (especially for loops without
   explicit gang/worker/vector clauses) does not happen until we reach
   the device compiler, in the oaccloops pass.

This patch builds on the second approach, but delays fixing the
parallelism level of each "private" variable (those that are
addressable, and declared private using OpenACC clauses or by defining
them in a scope nested within a compute region or partitioned loop)
until the oaccdevlow pass. This is done by adding a new internal UNIQUE
function (OACC_PRIVATE) that lists (the address of) each private
variable as an argument. These new internal functions fit into the
existing scheme for demarking OpenACC loops, as described in comments
in the patch.

Use of the "oacc gangprivate" attribute is now restricted to the NVPTX
backend (and could probably be replaced with some lighter-weight
mechanism as a followup).

I realised I omitted to make some of the cosmetic changes Thomas
highlighted below on starting to write this email, but I can do that
(with suitable retesting) if desired before committing.

On Wed, 12 Jun 2019 20:42:16 +0100
Julian Brown  wrote:

> On Wed, 12 Jun 2019 13:57:22 +0200
> Thomas Schwinge  wrote:
> 
> > I understand right that this will address some aspects of PR90115
> > "OpenACC: predetermined private levels for variables declared in
> > blocks" (so please mention that one in the ChangeLog updates, and
> > commit log), but it doesn't address all of these aspects (and see
> > also Cesar's list in
> > ),
> > and also not yet PR90114 "Predetermined private levels for variables
> > declared in OpenACC accelerator routines"?  
> 
> There's two possible reasons for placing gang-private variables in
> shared memory: correct implementation of OpenACC semantics, or
> optimisation, since shared memory is faster than local memory (on
> NVidia devices). Handling of private variables is intimately tied
> with the execution model for gangs/workers/vectors implemented by a
> particular target: for PTX, that's handled in the backend using a
> broadcasting/neutering scheme.
> 
> That is sufficient for code that e.g. sets a variable in worker-single
> mode and expects to use the value in worker-partitioned mode. The
> difficulty (semantics-wise) comes when the user wants to do something
> like an atomic operation in worker-partitioned mode and expects a
> worker-single variable to be shared across each partitioned worker.
> Forcing use of shared memory for such variables makes that work
> properly.
> 
> It is *not* sufficient for the next level down, though -- expecting to
> perform atomic operations in vector-partitioned mode on a variable
> that is declared in vector-single mode, i.e. so that it is supposed to
> be shared across all vector elements. AFAIK, that's not
> straightforward, and we haven't attempted to implement it.
> 
> I think the original motivation for this patch was optimisation,
> though -- typical code won't try to use atomics in this way. Cesar's
> list of caveats that you linked to seems to support that notion.

After a little further investigation, I came to the conclusion that the
patch was always originally about correctness, but optimisation. But
that's largely academic now.

> > I guess I'm not terribly happy with the 'goacc.expand_accel_var'
> > name. Using different "memories" for specially tagged DECLs seems
> > to be a pretty generic concept (address spaces?), and...  
> 
> This is partly another NVPTX weirdness -- the target uses address
> spaces, but only within the backend, and without using the generic
> middle-end address space machinery. The other reason for using an
> attribute instead of assigning an address space is 

Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory

2019-06-12 Thread Julian Brown
On Wed, 12 Jun 2019 13:57:22 +0200
Thomas Schwinge  wrote:

> Hi!
> 
> First, thanks for picking this up, and improving the patch you
> inherited.

Thanks for review!

> I understand right that this will address some aspects of PR90115
> "OpenACC: predetermined private levels for variables declared in
> blocks" (so please mention that one in the ChangeLog updates, and
> commit log), but it doesn't address all of these aspects (and see
> also Cesar's list in
> ),
> and also not yet PR90114 "Predetermined private levels for variables
> declared in OpenACC accelerator routines"?

There's two possible reasons for placing gang-private variables in
shared memory: correct implementation of OpenACC semantics, or
optimisation, since shared memory is faster than local memory (on NVidia
devices). Handling of private variables is intimately tied with the
execution model for gangs/workers/vectors implemented by a particular
target: for PTX, that's handled in the backend using a
broadcasting/neutering scheme.

That is sufficient for code that e.g. sets a variable in worker-single
mode and expects to use the value in worker-partitioned mode. The
difficulty (semantics-wise) comes when the user wants to do something
like an atomic operation in worker-partitioned mode and expects a
worker-single variable to be shared across each partitioned worker.
Forcing use of shared memory for such variables makes that work
properly.

It is *not* sufficient for the next level down, though -- expecting to
perform atomic operations in vector-partitioned mode on a variable
that is declared in vector-single mode, i.e. so that it is supposed to
be shared across all vector elements. AFAIK, that's not
straightforward, and we haven't attempted to implement it.

I think the original motivation for this patch was optimisation, though
-- typical code won't try to use atomics in this way. Cesar's list of
caveats that you linked to seems to support that notion.

> On Fri, 7 Jun 2019 15:08:37 +0100, Julian Brown
>  wrote:
> > --- a/gcc/config/nvptx/nvptx.c
> > +++ b/gcc/config/nvptx/nvptx.c  
> 
> > @@ -5237,6 +5248,10 @@ nvptx_file_end (void)
> >  write_shared_buffer (asm_out_file, vector_red_sym,
> >  vector_red_align, vector_red_size);
> >  
> > +  if (gangprivate_shared_size)
> > +write_shared_buffer (asm_out_file, gangprivate_shared_sym,
> > +gangprivate_shared_align,
> > gangprivate_shared_size);  
> 
> Curious, what is the reason that we maintain this
> '__gangprivate_shared' variable on a per-file basis instead of on a
> per-function basis (with names '__gangprivate_shared_[function]', or
> similar), which should make it more obvious where each block of
> '.shared' memory belongs to?

I can't comment on that, I'm afraid that was a part of the patch that I
inherited and didn't alter much...

> > --- a/gcc/doc/tm.texi
> > +++ b/gcc/doc/tm.texi  
> 
> > +@deftypefn {Target Hook} rtx TARGET_GOACC_EXPAND_ACCEL_VAR (tree
> > @var{var}) +This hook, if defined, is used by accelerator target
> > back-ends to expand +specially handled kinds of VAR_DECL
> > expressions.  A particular use is to +place variables with specific
> > attributes inside special accelarator +memories.  A return value of
> > NULL indicates that the target does not +handle this VAR_DECL, and
> > normal RTL expanding is resumed. +@end deftypefn  
> 
> I guess I'm not terribly happy with the 'goacc.expand_accel_var' name.
> Using different "memories" for specially tagged DECLs seems to be a
> pretty generic concept (address spaces?), and...

This is partly another NVPTX weirdness -- the target uses address
spaces, but only within the backend, and without using the generic
middle-end address space machinery. The other reason for using an
attribute instead of assigning an address space is that the former can
be detected by the target compiler, but will be ignored by the host
compiler. Forcing use of an address space this early would mean that
the same non-standard address space would have to make sense for both
host and offloaded code.

For AMD GCN, we do use the generic address space support, and I found
that I could re-use the "oacc gangprivate" attribute -- but not the
expand_accel_var hook (expand time is too late for that target).
Instead, another new hook "TARGET_GOACC_ADJUST_GANGPRIVATE_DECL" is
called from omp-offload.c:execute_oacc_device_lower for variables that
have the "oacc gangprivate" attribute set. Those bits haven't been
posted upstream yet, though.

> > --- a/gcc/expr.c
> > +++ b/gcc/expr.c
> > @@ -9974,8 +9974,19 @@ expand_expr_real_1 (tree exp, rtx target,
> > machine_mode tmode, exp = SSA_NAME_VAR (ssa_name);
> >goto expand_decl_rtl;
> >  
> > -case PARM_DECL:
> >  case VAR_DECL:
> > +  /* Allow accel compiler to handle specific cases of
> > variables,
> > +specifically those tagged with the "oacc gangprivate"

Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory

2019-06-12 Thread Thomas Schwinge
Hi!

First, thanks for picking this up, and improving the patch you inherited.


Then, just a few individual comments, not a complete review.

(As far as I concerned, and as far as relevant, these can be addressed
later, incrementally, of course.)


I understand right that this will address some aspects of PR90115
"OpenACC: predetermined private levels for variables declared in blocks"
(so please mention that one in the ChangeLog updates, and commit log),
but it doesn't address all of these aspects (and see also Cesar's list in
),
and also not yet PR90114 "Predetermined private levels for variables
declared in OpenACC accelerator routines"?


On Fri, 7 Jun 2019 15:08:37 +0100, Julian Brown  wrote:
> --- a/gcc/config/nvptx/nvptx.c
> +++ b/gcc/config/nvptx/nvptx.c

> @@ -5237,6 +5248,10 @@ nvptx_file_end (void)
>  write_shared_buffer (asm_out_file, vector_red_sym,
>vector_red_align, vector_red_size);
>  
> +  if (gangprivate_shared_size)
> +write_shared_buffer (asm_out_file, gangprivate_shared_sym,
> +  gangprivate_shared_align, gangprivate_shared_size);

Curious, what is the reason that we maintain this '__gangprivate_shared'
variable on a per-file basis instead of on a per-function basis (with
names '__gangprivate_shared_[function]', or similar), which should make
it more obvious where each block of '.shared' memory belongs to?


> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi

> +@deftypefn {Target Hook} rtx TARGET_GOACC_EXPAND_ACCEL_VAR (tree @var{var})
> +This hook, if defined, is used by accelerator target back-ends to expand
> +specially handled kinds of VAR_DECL expressions.  A particular use is to
> +place variables with specific attributes inside special accelarator
> +memories.  A return value of NULL indicates that the target does not
> +handle this VAR_DECL, and normal RTL expanding is resumed.
> +@end deftypefn

I guess I'm not terribly happy with the 'goacc.expand_accel_var' name.
Using different "memories" for specially tagged DECLs seems to be a
pretty generic concept (address spaces?), and...

> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -9974,8 +9974,19 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode 
> tmode,
>exp = SSA_NAME_VAR (ssa_name);
>goto expand_decl_rtl;
>  
> -case PARM_DECL:
>  case VAR_DECL:
> +  /* Allow accel compiler to handle specific cases of variables,
> +  specifically those tagged with the "oacc gangprivate" attribute,
> +  which may be intended to be placed in special memory in GPUs.  */
> +  if (flag_openacc && targetm.goacc.expand_accel_var)
> + {
> +   temp = targetm.goacc.expand_accel_var (exp);
> +   if (temp)
> + return temp;
> + }
> +  /* ... fall through ...  */
> +
> +case PARM_DECL:

... I'm thus confused that there isn't already a generic mechanism
available in GCC, that we can just use instead of adding a new one here?
Thinking about the "address spaces" stuff in 'gcc/target.def' -- or is
that the wrong concept?  (I'm not familiar with all that, and haven't
looked closely.)


> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c

> +/* Record vars listed in private clauses in CLAUSES in CTX.  This information
> +   is used to mark up variables that should be made private per-gang.  */
> +
> +static void
> +oacc_record_private_var_clauses (omp_context *ctx, tree clauses)
> +{
> +  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
> +if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE)
> +  {
> + tree decl = OMP_CLAUSE_DECL (c);
> + if (VAR_P (decl) && TREE_ADDRESSABLE (decl))
> +   {
> + ctx->oacc_addressable_var_decls.safe_push (decl);
> + maybe_oacc_gangprivate_vars = true;
> +   }
> +  }
> +}

Are all the relevant variables addressable?  And/or, need only those be
considered?

> +/* Record addressable vars declared in BINDVARS in CTX.  This information is
> +   used to mark up variables that should be made private per-gang.  */
> +
> +static void
> +oacc_record_vars_in_bind (omp_context *ctx, tree bindvars)
> +{
> +  for (tree v = bindvars; v; v = DECL_CHAIN (v))
> +if (VAR_P (v) && TREE_ADDRESSABLE (v))
> +  {
> + ctx->oacc_addressable_var_decls.safe_push (v);
> + maybe_oacc_gangprivate_vars = true;
> +  }
> +}

Likewise.


> +/* Mark addressable variables which are declared implicitly or explicitly as
> +   gang private with a special attribute.  These may need to have their
> +   declarations altered later on in compilation (e.g. in
> +   execute_oacc_device_lower or the backend, depending on how the OpenACC
> +   execution model is implemented on a given target) to ensure that sharing
> +   semantics are correct.  */
> +
> +static void
> +mark_oacc_gangprivate (vec *decls, omp_context *ctx)
> +{
> +  int i;
> +  tree decl;
> +
> +  FOR_EACH_VEC_ELT (*decls, i, decl)
> +{
> +  for 

Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory

2019-06-12 Thread Tom de Vries
On 12-06-19 12:22, Jakub Jelinek wrote:
> On Fri, Jun 07, 2019 at 03:08:37PM +0100, Julian Brown wrote:
>> diff --git a/gcc/omp-low.c b/gcc/omp-low.c
>> index a7f35ffe416..67e1e82ec00 100644
>> --- a/gcc/omp-low.c
>> +++ b/gcc/omp-low.c
>> @@ -9794,6 +9882,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, 
>> omp_context *ctx)
>>  
>>if (offloaded)
>>  {
>> +  mark_oacc_gangprivate (>oacc_addressable_var_decls, ctx);
>> +
> 
> The above one still doesn't seem to be guarded for OpenACC constructs only.
> 
> As for the rest of the patch, you need Tom to look over the nvptx changes.

I haven't seen any nvptx changes mentioned since I ok-ed the nvptx part
( https://gcc.gnu.org/ml/gcc-patches/2018-10/msg00324.html ), so on that
basis I'd say it's still ok.

Thanks,
- Tom


Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory

2019-06-12 Thread Jakub Jelinek
On Fri, Jun 07, 2019 at 03:08:37PM +0100, Julian Brown wrote:
> diff --git a/gcc/omp-low.c b/gcc/omp-low.c
> index a7f35ffe416..67e1e82ec00 100644
> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -9794,6 +9882,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, 
> omp_context *ctx)
>  
>if (offloaded)
>  {
> +  mark_oacc_gangprivate (>oacc_addressable_var_decls, ctx);
> +

The above one still doesn't seem to be guarded for OpenACC constructs only.

As for the rest of the patch, you need Tom to look over the nvptx changes.

Jakub


Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory

2019-06-07 Thread Julian Brown
Hi Jakub,

Thanks for the review! I believe I've addressed all your comments in
the attached version of the patch.

On Mon, 3 Jun 2019 18:23:00 +0200
Jakub Jelinek  wrote:

> Why vec * rather than vec?

> > @@ -878,6 +884,7 @@ new_omp_context (gimple *stmt, omp_context
> > *outer_ctx) }
> >  
> >ctx->cb.decl_map = new hash_map;
> > +  ctx->oacc_addressable_var_decls = new vec ();  
> 
> You then don't have to new it here and delete below.  As the context
> is cleared with XCNEW, you don't need to do anything here, and just
> release when deleting.  Note, even if using a pointer for some reason
> was needed (not in this case), using unconditional new for something
> only used for small subset of contexts is unacceptable, it would be
> then desirable to only create when needed.

Fixed.

> > +/* Record vars listed in private clauses in CLAUSES in CTX.  This
> > information
> > +   is used to mark up variables that should be made private
> > per-gang.  */ +
> > +static void
> > +oacc_record_private_var_clauses (omp_context *ctx, tree clauses)
> > +{
> > +  tree c;
> > +
> > +  if (!ctx)
> > +return;
> > +
> > +  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
> > +if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE)
> > +  {
> > +   tree decl = OMP_CLAUSE_DECL (c);
> > +   if (VAR_P (decl) && TREE_ADDRESSABLE (decl))
> > + ctx->oacc_addressable_var_decls->safe_push (decl);
> > +  }
> > +}  
> 
> You don't want to do this for all GOMP_FOR or GOMP_TARGET context,
> I'd hope you only want to do that for OpenACC contexts.  Perhaps it
> is ok to bail out early if the context isn't OpenACC one.  On the
> other side, the if (!ctx) condition makes no sense, the callers of
> course guarantee that ctx is non-NULL.

I'm not sure where that came from -- ctx can be NULL at the top-level
of lower_omp as called from execute_lower_omp. Maybe that was left over
from an earlier version of the patch. Anyway, I've removed that bit
and fixed the patch to only call oacc_record_private_var_clauses in
OpenACC contexts.

> > @@ -10665,6 +10774,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p,
> > omp_context *ctx) ctx);
> >break;
> >  case GIMPLE_BIND:
> > +  oacc_record_vars_in_bind (ctx, gimple_bind_vars (as_a  > *> (stmt)));  
> 
> Again, why is this done unconditionally?  It should be relevant to
> gather it only in some subset of context, so guard that and don't do
> it otherwise.

And here (where ctx *can* be NULL).

> >lower_omp (gimple_bind_body_ptr (as_a  (stmt)),
> > ctx); maybe_remove_omp_member_access_dummy_vars (as_a 
> > (stmt)); break;
> > @@ -10905,6 +11015,7 @@ execute_lower_omp (void)
> >  
> >if (all_contexts)
> >  {
> > +  splay_tree_foreach (all_contexts,
> > process_oacc_gangprivate_1, NULL);  
> 
> Similarly.  Either guard with if (flag_openacc), or have some flag
> cleared at the start of the pass and set only if you find something
> interesting so that the splay_tree_foreach does something.

I've introduced maybe_oacc_gangprivate_vars, and the splay tree walk is
only called if that's true. It's set whenever something's put in
oacc_addressable_var_decls in some omp context.

Re-tested with offloading to NVPTX. OK?

Thanks,

Julian

commit 6c2a018b940d0b132395048b0600f7d897319ee2
Author: Julian Brown 
Date:   Thu Aug 9 20:27:04 2018 -0700

[OpenACC] Add support for gang local storage allocation in shared memory

2019-06-03  Julian Brown  
Chung-Lin Tang  

gcc/
* config/nvptx/nvptx.c (tree-hash-traits.h): Include.
(gangprivate_shared_size): New global variable.
(gangprivate_shared_align): Likewise.
(gangprivate_shared_sym): Likewise.
(gangprivate_shared_hmap): Likewise.
(nvptx_option_override): Initialize gangprivate_shared_sym,
gangprivate_shared_align.
(nvptx_file_end): Output gangprivate_shared_sym.
(nvptx_goacc_expand_accel_var): New function.
(nvptx_set_current_function): Initialise gangprivate_shared_hmap. Add
function comment.
(TARGET_GOACC_EXPAND_ACCEL): Likewise.
* doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
* expr.c (expand_expr_real_1): Remap VAR_DECLs marked with the
"oacc gangprivate" attribute.
* omp-low.c (omp_context): Add oacc_partitioning_level and
oacc_addressable_var_decls fields.
(maybe_oacc_gangprivate_vars): New global variable.
(delete_omp_context): Release oacc_addressable_var_decls in old
omp_context.
(lower_oacc_head_tail): Record partitioning-level count in omp context.
(oacc_record_private_var_clauses, oacc_record_vars_in_bind,
mark_oacc_gangprivate): New functions.
(lower_omp_for): Call 

Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory

2019-06-03 Thread Jakub Jelinek
On Mon, Jun 03, 2019 at 05:02:45PM +0100, Julian Brown wrote:
> * omp-low.c (omp_context): Add oacc_partitioning_level and
> oacc_addressable_var_decls fields.
> (new_omp_context): Initialize oacc_addressable_var_decls in new
> omp_context.
> (delete_omp_context): Delete oacc_addressable_var_decls in old
> omp_context.
> (lower_oacc_head_tail): Record partitioning-level count in omp 
> context.
> (oacc_record_private_var_clauses, oacc_record_vars_in_bind,
> mark_oacc_gangprivate): New functions.
> (lower_omp_for): Call oacc_record_private_var_clauses with "for"
> clauses.
> (lower_omp_target): Likewise, for "target" clauses.
> Call mark_oacc_gangprivate for offloaded target regions.
> (process_oacc_gangprivate_1): New function.
> (lower_omp_1): Call oacc_record_vars_in_bind for GIMPLE_BIND within 
> OMP
> regions.
> (execute_lower_omp): Call process_oacc_gangprivate_1 for each OMP
> context.

Just commenting on the above part:

> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -137,6 +137,12 @@ struct omp_context
>  
>/* True if this construct can be cancelled.  */
>bool cancellable;
> +
> +  /* The number of levels of OpenACC partitioning invoked in this context.  
> */
> +  unsigned oacc_partitioning_levels;
> +
> +  /* Addressable variable decls in this context.  */
> +  vec *oacc_addressable_var_decls;

Why vec * rather than vec?

> @@ -878,6 +884,7 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
>  }
>  
>ctx->cb.decl_map = new hash_map;
> +  ctx->oacc_addressable_var_decls = new vec ();

You then don't have to new it here and delete below.  As the context is
cleared with XCNEW, you don't need to do anything here, and just
release when deleting.  Note, even if using a pointer for some reason was
needed (not in this case), using unconditional new for something only used
for small subset of contexts is unacceptable, it would be then desirable to
only create when needed.

>  
>return ctx;
>  }
> @@ -960,6 +967,7 @@ delete_omp_context (splay_tree_value value)
>  }
>  
>delete ctx->lastprivate_conditional_map;
> +  delete ctx->oacc_addressable_var_decls;
>  
>XDELETE (ctx);
>  }
> @@ -8458,6 +8469,79 @@ lower_omp_for_lastprivate (struct omp_for_data *fd, 
> gimple_seq *body_p,
>  }
>  }
>  
> +/* Record vars listed in private clauses in CLAUSES in CTX.  This information
> +   is used to mark up variables that should be made private per-gang.  */
> +
> +static void
> +oacc_record_private_var_clauses (omp_context *ctx, tree clauses)
> +{
> +  tree c;
> +
> +  if (!ctx)
> +return;
> +
> +  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
> +if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE)
> +  {
> + tree decl = OMP_CLAUSE_DECL (c);
> + if (VAR_P (decl) && TREE_ADDRESSABLE (decl))
> +   ctx->oacc_addressable_var_decls->safe_push (decl);
> +  }
> +}

You don't want to do this for all GOMP_FOR or GOMP_TARGET context, I'd hope
you only want to do that for OpenACC contexts.  Perhaps it is ok
to bail out early if the context isn't OpenACC one.  On the other side, the
if (!ctx) condition makes no sense, the callers of course guarantee that ctx
is non-NULL.

> @@ -10665,6 +10774,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context 
> *ctx)
>ctx);
>break;
>  case GIMPLE_BIND:
> +  oacc_record_vars_in_bind (ctx, gimple_bind_vars (as_a  
> (stmt)));

Again, why is this done unconditionally?  It should be relevant to gather it
only in some subset of context, so guard that and don't do it otherwise.

>lower_omp (gimple_bind_body_ptr (as_a  (stmt)), ctx);
>maybe_remove_omp_member_access_dummy_vars (as_a  (stmt));
>break;
> @@ -10905,6 +11015,7 @@ execute_lower_omp (void)
>  
>if (all_contexts)
>  {
> +  splay_tree_foreach (all_contexts, process_oacc_gangprivate_1, NULL);

Similarly.  Either guard with if (flag_openacc), or have some flag cleared
at the start of the pass and set only if you find something interesting so
that the splay_tree_foreach does something.

Jakub


Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory

2019-06-03 Thread Julian Brown
On Tue, 11 Dec 2018 15:08:11 +
Julian Brown  wrote:

> Is this version OK? Re-tested with offloading to NVPTX.

This is a ping for the patch posted here:

https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00749.html

This is a new version of the patch, rebased and with a couple of
additional bugfixes, as follows:

Firstly, in mark_oacc_gangprivate, each decl is looked up (using
maybe_lookup_decl) to apply the "oacc gangprivate" attribute to the
innermost-nested copy of the decl.

Secondly, I'd misunderstood when the maximum parallelism level was
calculated for each nested omp_context, meaning that the code to
trigger adding the "oacc gangprivate" attribute could trigger in the
wrong circumstances. I've fixed this by moving the attribute-setting to
execute_lower_omp.

I've also added a new testcase (gangprivate-attrib-2.f90). Re-tested
with offloading to nvptx.

OK for trunk?

Thank you,

Julian

2019-06-03  Julian Brown  
Chung-Lin Tang  

gcc/
* config/nvptx/nvptx.c (tree-hash-traits.h): Include.
(gangprivate_shared_size): New global variable.
(gangprivate_shared_align): Likewise.
(gangprivate_shared_sym): Likewise.
(gangprivate_shared_hmap): Likewise.
(nvptx_option_override): Initialize gangprivate_shared_sym,
gangprivate_shared_align.
(nvptx_file_end): Output gangprivate_shared_sym.
(nvptx_goacc_expand_accel_var): New function.
(nvptx_set_current_function): Initialise gangprivate_shared_hmap. Add
function comment.
(TARGET_GOACC_EXPAND_ACCEL): Likewise.
* doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
* expr.c (expand_expr_real_1): Remap VAR_DECLs marked with the
"oacc gangprivate" attribute.
* omp-low.c (omp_context): Add oacc_partitioning_level and
oacc_addressable_var_decls fields.
(new_omp_context): Initialize oacc_addressable_var_decls in new
omp_context.
(delete_omp_context): Delete oacc_addressable_var_decls in old
omp_context.
(lower_oacc_head_tail): Record partitioning-level count in omp context.
(oacc_record_private_var_clauses, oacc_record_vars_in_bind,
mark_oacc_gangprivate): New functions.
(lower_omp_for): Call oacc_record_private_var_clauses with "for"
clauses.
(lower_omp_target): Likewise, for "target" clauses.
Call mark_oacc_gangprivate for offloaded target regions.
(process_oacc_gangprivate_1): New function.
(lower_omp_1): Call oacc_record_vars_in_bind for GIMPLE_BIND within OMP
regions.
(execute_lower_omp): Call process_oacc_gangprivate_1 for each OMP
context.
* target.def (expand_accel_var): New hook.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
* testsuite/libgomp.oacc-c/pr85465.c: New test.
* testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: New test.
* testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90: New test.
>From 917189cd07fcb68ba289c5fbcd768b7d4dff785f Mon Sep 17 00:00:00 2001
From: Julian Brown 
Date: Thu, 9 Aug 2018 20:27:04 -0700
Subject: [PATCH] [OpenACC] Add support for gang local storage allocation in
 shared memory

2019-06-03  Julian Brown  
	Chung-Lin Tang  

	gcc/
	* config/nvptx/nvptx.c (tree-hash-traits.h): Include.
	(gangprivate_shared_size): New global variable.
	(gangprivate_shared_align): Likewise.
	(gangprivate_shared_sym): Likewise.
	(gangprivate_shared_hmap): Likewise.
	(nvptx_option_override): Initialize gangprivate_shared_sym,
	gangprivate_shared_align.
	(nvptx_file_end): Output gangprivate_shared_sym.
	(nvptx_goacc_expand_accel_var): New function.
	(nvptx_set_current_function): Initialise gangprivate_shared_hmap. Add
	function comment.
	(TARGET_GOACC_EXPAND_ACCEL): Likewise.
	* doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
	* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
	* expr.c (expand_expr_real_1): Remap VAR_DECLs marked with the
	"oacc gangprivate" attribute.
	* omp-low.c (omp_context): Add oacc_partitioning_level and
	oacc_addressable_var_decls fields.
	(new_omp_context): Initialize oacc_addressable_var_decls in new
	omp_context.
	(delete_omp_context): Delete oacc_addressable_var_decls in old
	omp_context.
	(lower_oacc_head_tail): Record partitioning-level count in omp context.
	(oacc_record_private_var_clauses, oacc_record_vars_in_bind,
	mark_oacc_gangprivate): New functions.
	(lower_omp_for): Call oacc_record_private_var_clauses with "for"
	clauses.
	(lower_omp_target): Likewise, for "target" clauses.
	Call mark_oacc_gangprivate for offloaded target regions.
	(process_oacc_gangprivate_1): New funct

Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory

2018-12-11 Thread Julian Brown
On Fri, 17 Aug 2018 18:39:00 +0200
Bernhard Reutner-Fischer  wrote:

> On 16 August 2018 17:46:43 CEST, Julian Brown
>  wrote:
> >On Wed, 15 Aug 2018 21:56:54 +0200
> >Bernhard Reutner-Fischer  wrote:
> >  
> >> On 15 August 2018 18:46:37 CEST, Julian Brown
> >>  wrote:  
> >> >On Mon, 13 Aug 2018 12:06:21 -0700
> >> >Cesar Philippidis  wrote:
> >> 
> >> atttribute has more t than strictly necessary. 
> >> Don't like signed integer levels where they should be some
> >> unsigned. Also don't like single switch cases instead of if.
> >> And omitting function comments even if the hook way above is
> >> documented may be ok ish but is a bit lazy ;)  
> >
> >Here's a new version with those comments addressed. I also changed
> >the logic around a little to avoid adding decls to the vec in
> >omp_context which would never be given the gang-private attribute.
> >
> >Re-tested with offloading to NVPTX.
> >
> >OK?  
> 
> (TREE_CODE (var) == VAR_DECL
> Is nowadays known as VAR_P (decl), FWIW.

Fixed. (And also Tom's formatting nit mentioned in another email.)

> ISTM that global variables are not JIT-friendly.
> No further comments from me.

Probably true, but AFAIK nobody's trying to use the (GCC) JIT with the
PTX backend, and the backend already uses global variables for several
other purposes. Of course PTX code is JIT'ted itself by the NVidia
runtime, but I guess that's not what you were referring to!

Is this version OK? Re-tested with offloading to NVPTX.

Thanks,

Julian
commit 3335ddfa72944be5359280116e8eb4febd4ed3c7
Author: Julian Brown 
Date:   Thu Aug 9 20:27:04 2018 -0700

[OpenACC] Add support for gang local storage allocation in shared memory

2018-08-10  Julian Brown  
	Chung-Lin Tang  

	gcc/
	* config/nvptx/nvptx.c (tree-hash-traits.h): Include.
	(gangprivate_shared_size): New global variable.
	(gangprivate_shared_align): Likewise.
	(gangprivate_shared_sym): Likewise.
	(gangprivate_shared_hmap): Likewise.
	(nvptx_option_override): Initialize gangprivate_shared_sym,
	gangprivate_shared_align.
	(nvptx_file_end): Output gangprivate_shared_sym.
	(nvptx_goacc_expand_accel_var): New function.
	(nvptx_set_current_function): New function.
	(TARGET_SET_CURRENT_FUNCTION): Define hook.
	(TARGET_GOACC_EXPAND_ACCEL): Likewise.
	* doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
	* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
	* expr.c (expand_expr_real_1): Remap decls marked with the
	"oacc gangprivate" attribute.
	* omp-low.c (omp_context): Add oacc_partitioning_level and
	oacc_addressable_var_decls fields.
	(new_omp_context): Initialize oacc_addressable_var_decls in new
	omp_context.
	(delete_omp_context): Delete oacc_addressable_var_decls in old
	omp_context.
	(lower_oacc_head_tail): Record partitioning-level count in omp context.
	(oacc_record_private_var_clauses, oacc_record_vars_in_bind)
	(mark_oacc_gangprivate): New functions.
	(lower_omp_for): Call oacc_record_private_var_clauses with "for"
	clauses.  Call mark_oacc_gangprivate for gang-partitioned loops.
	(lower_omp_target): Call oacc_record_private_var_clauses with "target"
	clauses.
	Call mark_oacc_gangprivate for offloaded target regions.
	(lower_omp_1): Call vars_in_bind for GIMPLE_BIND within OMP regions.
	* target.def (expand_accel_var): New hook.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
	* testsuite/libgomp.oacc-c/pr85465.c: New test.
	* testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: New test.

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 9903a27..02c2847 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -73,6 +73,7 @@
 #include "cfgloop.h"
 #include "fold-const.h"
 #include "intl.h"
+#include "tree-hash-traits.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -137,6 +138,12 @@ static unsigned worker_red_size;
 static unsigned worker_red_align;
 static GTY(()) rtx worker_red_sym;
 
+/* Shared memory block for gang-private variables.  */
+static unsigned gangprivate_shared_size;
+static unsigned gangprivate_shared_align;
+static GTY(()) rtx gangprivate_shared_sym;
+static hash_map gangprivate_shared_hmap;
+
 /* Global lock variable, needed for 128bit worker & gang reductions.  */
 static GTY(()) tree global_lock_var;
 
@@ -210,6 +217,10 @@ nvptx_option_override (void)
   SET_SYMBOL_DATA_AREA (worker_red_sym, DATA_AREA_SHARED);
   worker_red_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
 
+  gangprivate_shared_sym = gen_rtx_SYMBOL_REF (Pmode, "__gangprivate_shared");
+  SET_SYMBOL_DATA_AREA (gangprivate_shared_sym, DATA_AREA_SHARED);
+  gangprivate_shared_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
+
   diagnose_openacc_conflict (TARGET_GOMP, 

Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory

2018-10-05 Thread Tom de Vries
On 8/16/18 5:46 PM, Julian Brown wrote:
> On Wed, 15 Aug 2018 21:56:54 +0200
> Bernhard Reutner-Fischer  wrote:
> 
>> On 15 August 2018 18:46:37 CEST, Julian Brown
>>  wrote:
>>> On Mon, 13 Aug 2018 12:06:21 -0700
>>> Cesar Philippidis  wrote:  
>>
>> atttribute has more t than strictly necessary. 
>> Don't like signed integer levels where they should be some unsigned. 
>> Also don't like single switch cases instead of if.
>> And omitting function comments even if the hook way above is
>> documented may be ok ish but is a bit lazy ;)
> 
> Here's a new version with those comments addressed. I also changed the
> logic around a little to avoid adding decls to the vec in omp_context
> which would never be given the gang-private attribute.
> 
> Re-tested with offloading to NVPTX.
> 
> OK?

As far as the nvptx part is concerned, I see:
...
=== ERROR type #4: trailing operator (1 error(s)) ===
gcc/config/nvptx/nvptx.c:5946:27: gangprivate_shared_size =
...

Otherwise, the nvptx part is OK.

Thanks,
- Tom

> 
> Julian
> 
> 2018-08-10  Julian Brown  
> Chung-Lin Tang  
> 
> gcc/
> * config/nvptx/nvptx.c (tree-hash-traits.h): Include.
> (gangprivate_shared_size): New global variable.
> (gangprivate_shared_align): Likewise.
> (gangprivate_shared_sym): Likewise.
> (gangprivate_shared_hmap): Likewise.
> (nvptx_option_override): Initialize gangprivate_shared_sym,
> gangprivate_shared_align.
> (nvptx_file_end): Output gangprivate_shared_sym.
> (nvptx_goacc_expand_accel_var): New function.
> (nvptx_set_current_function): New function.
> (TARGET_SET_CURRENT_FUNCTION): Define hook.
> (TARGET_GOACC_EXPAND_ACCEL): Likewise.
> * doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
> * doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
> * expr.c (expand_expr_real_1): Remap decls marked with the
> "oacc gangprivate" attribute.
> * omp-low.c (omp_context): Add oacc_partitioning_level and
> oacc_addressable_var_decls fields.
> (new_omp_context): Initialize oacc_addressable_var_decls in new
> omp_context.
> (delete_omp_context): Delete oacc_addressable_var_decls in old
> omp_context.
> (lower_oacc_head_tail): Record partitioning-level count in omp 
> context.
> (oacc_record_private_var_clauses, oacc_record_vars_in_bind)
> (mark_oacc_gangprivate): New functions.
> (lower_omp_for): Call oacc_record_private_var_clauses with "for"
> clauses.  Call mark_oacc_gangprivate for gang-partitioned loops.
> (lower_omp_target): Call oacc_record_private_var_clauses with "target"
> clauses.
> Call mark_oacc_gangprivate for offloaded target regions.
> (lower_omp_1): Call vars_in_bind for GIMPLE_BIND within OMP regions.
> * target.def (expand_accel_var): New hook.
> 
> libgomp/
> * testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
> * testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
> * testsuite/libgomp.oacc-c/pr85465.c: New test.
> * testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: New test.
> 


Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory

2018-08-17 Thread Bernhard Reutner-Fischer
On 16 August 2018 17:46:43 CEST, Julian Brown  wrote:
>On Wed, 15 Aug 2018 21:56:54 +0200
>Bernhard Reutner-Fischer  wrote:
>
>> On 15 August 2018 18:46:37 CEST, Julian Brown
>>  wrote:
>> >On Mon, 13 Aug 2018 12:06:21 -0700
>> >Cesar Philippidis  wrote:  
>> 
>> atttribute has more t than strictly necessary. 
>> Don't like signed integer levels where they should be some unsigned. 
>> Also don't like single switch cases instead of if.
>> And omitting function comments even if the hook way above is
>> documented may be ok ish but is a bit lazy ;)
>
>Here's a new version with those comments addressed. I also changed the
>logic around a little to avoid adding decls to the vec in omp_context
>which would never be given the gang-private attribute.
>
>Re-tested with offloading to NVPTX.
>
>OK?

(TREE_CODE (var) == VAR_DECL
Is nowadays known as VAR_P (decl), FWIW.

ISTM that global variables are not JIT-friendly.
No further comments from me.

Thanks,


Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory

2018-08-16 Thread Julian Brown
On Wed, 15 Aug 2018 21:56:54 +0200
Bernhard Reutner-Fischer  wrote:

> On 15 August 2018 18:46:37 CEST, Julian Brown
>  wrote:
> >On Mon, 13 Aug 2018 12:06:21 -0700
> >Cesar Philippidis  wrote:  
> 
> atttribute has more t than strictly necessary. 
> Don't like signed integer levels where they should be some unsigned. 
> Also don't like single switch cases instead of if.
> And omitting function comments even if the hook way above is
> documented may be ok ish but is a bit lazy ;)

Here's a new version with those comments addressed. I also changed the
logic around a little to avoid adding decls to the vec in omp_context
which would never be given the gang-private attribute.

Re-tested with offloading to NVPTX.

OK?

Julian

2018-08-10  Julian Brown  
Chung-Lin Tang  

gcc/
* config/nvptx/nvptx.c (tree-hash-traits.h): Include.
(gangprivate_shared_size): New global variable.
(gangprivate_shared_align): Likewise.
(gangprivate_shared_sym): Likewise.
(gangprivate_shared_hmap): Likewise.
(nvptx_option_override): Initialize gangprivate_shared_sym,
gangprivate_shared_align.
(nvptx_file_end): Output gangprivate_shared_sym.
(nvptx_goacc_expand_accel_var): New function.
(nvptx_set_current_function): New function.
(TARGET_SET_CURRENT_FUNCTION): Define hook.
(TARGET_GOACC_EXPAND_ACCEL): Likewise.
* doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
* expr.c (expand_expr_real_1): Remap decls marked with the
"oacc gangprivate" attribute.
* omp-low.c (omp_context): Add oacc_partitioning_level and
oacc_addressable_var_decls fields.
(new_omp_context): Initialize oacc_addressable_var_decls in new
omp_context.
(delete_omp_context): Delete oacc_addressable_var_decls in old
omp_context.
(lower_oacc_head_tail): Record partitioning-level count in omp context.
(oacc_record_private_var_clauses, oacc_record_vars_in_bind)
(mark_oacc_gangprivate): New functions.
(lower_omp_for): Call oacc_record_private_var_clauses with "for"
clauses.  Call mark_oacc_gangprivate for gang-partitioned loops.
(lower_omp_target): Call oacc_record_private_var_clauses with "target"
clauses.
Call mark_oacc_gangprivate for offloaded target regions.
(lower_omp_1): Call vars_in_bind for GIMPLE_BIND within OMP regions.
* target.def (expand_accel_var): New hook.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
* testsuite/libgomp.oacc-c/pr85465.c: New test.
* testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: New test.
commit e276442550a85b62866ba13890eacf4e946d1079
Author: Julian Brown 
Date:   Thu Aug 9 20:27:04 2018 -0700

[OpenACC] Add support for gang local storage allocation in shared memory

2018-08-10  Julian Brown  
	Chung-Lin Tang  

	gcc/
	* config/nvptx/nvptx.c (tree-hash-traits.h): Include.
	(gangprivate_shared_size): New global variable.
	(gangprivate_shared_align): Likewise.
	(gangprivate_shared_sym): Likewise.
	(gangprivate_shared_hmap): Likewise.
	(nvptx_option_override): Initialize gangprivate_shared_sym,
	gangprivate_shared_align.
	(nvptx_file_end): Output gangprivate_shared_sym.
	(nvptx_goacc_expand_accel_var): New function.
	(nvptx_set_current_function): New function.
	(TARGET_SET_CURRENT_FUNCTION): Define hook.
	(TARGET_GOACC_EXPAND_ACCEL): Likewise.
	* doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
	* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
	* expr.c (expand_expr_real_1): Remap decls marked with the
	"oacc gangprivate" attribute.
	* omp-low.c (omp_context): Add oacc_partitioning_level and
	oacc_addressable_var_decls fields.
	(new_omp_context): Initialize oacc_addressable_var_decls in new
	omp_context.
	(delete_omp_context): Delete oacc_addressable_var_decls in old
	omp_context.
	(lower_oacc_head_tail): Record partitioning-level count in omp context.
	(oacc_record_private_var_clauses, oacc_record_vars_in_bind)
	(mark_oacc_gangprivate): New functions.
	(lower_omp_for): Call oacc_record_private_var_clauses with "for"
	clauses.  Call mark_oacc_gangprivate for gang-partitioned loops.
	(lower_omp_target): Call oacc_record_private_var_clauses with "target"
	clauses.
	Call mark_oacc_gangprivate for offloaded target regions.
	(lower_omp_1): Call vars_in_bind for GIMPLE_BIND within OMP regions.
	* target.def (expand_accel_var): New hook.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New 

Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory

2018-08-15 Thread Bernhard Reutner-Fischer
On 15 August 2018 18:46:37 CEST, Julian Brown  wrote:
>On Mon, 13 Aug 2018 12:06:21 -0700
>Cesar Philippidis  wrote:

atttribute has more t than strictly necessary. 
Don't like signed integer levels where they should be some unsigned. 
Also don't like single switch cases instead of if.
And omitting function comments even if the hook way above is documented may be 
ok ish but is a bit lazy ;)

thanks, 

>
>> So in other words, this is safe for fortran. It probably could use a
>> fortran test, because that functionality wasn't explicitly exercised
>> in og7/og8.
>
>Here's a new version of the patch with a Fortran test case. It's not
>too easy to write a test that depends on whether gang-local variables
>actually end up in the right kind of memory, so I wrote one that scans
>the omplower dump instead. Many other (including execution) tests will
>already trigger the new behaviour.
>
>Tested with offloading to NVPTX.
>
>OK?
>
>Thanks,
>
>Julian
>
>2018-08-10  Julian Brown  
>Chung-Lin Tang  
>
>gcc/
>* config/nvptx/nvptx.c (tree-hash-traits.h): Include.
>(gangprivate_shared_size): New global variable.
>(gangprivate_shared_align): Likewise.
>(gangprivate_shared_sym): Likewise.
>(gangprivate_shared_hmap): Likewise.
>(nvptx_option_override): Initialize gangprivate_shared_sym,
>gangprivate_shared_align.
>(nvptx_file_end): Output gangprivate_shared_sym.
>(nvptx_goacc_expand_accel_var): New function.
>(nvptx_set_current_function): New function.
>(TARGET_SET_CURRENT_FUNCTION): Define hook.
>(TARGET_GOACC_EXPAND_ACCEL): Likewise.
>  * doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
>* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
>* expr.c (expand_expr_real_1): Remap decls marked with the
>"oacc gangprivate" atttribute.
>  * omp-low.c (omp_context): Add oacc_partitioning_level and oacc_decls
>fields.
>(new_omp_context): Initialize oacc_decls in new omp_context.
>(delete_omp_context): Delete oacc_decls in old omp_context.
>(lower_oacc_head_tail): Record partitioning-level count in omp context.
>(oacc_record_private_var_clauses, oacc_record_vars_in_bind)
>(mark_oacc_gangprivate): New functions.
>   (lower_omp_for): Call oacc_record_private_var_clauses with "for"
>   clauses.  Call mark_oacc_gangprivate for gang-partitioned loops.
> (lower_omp_target): Call oacc_record_private_var_clauses with "target"
>clauses.
>Call mark_oacc_gangprivate for offloaded target regions.
>   (lower_omp_1): Call vars_in_bind for GIMPLE_BIND within OMP regions.
>* target.def (expand_accel_var): New hook.
>
>libgomp/
>  * testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
>* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
>* testsuite/libgomp.oacc-c/pr85465.c: New test.
>   * testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: New test.



Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory

2018-08-15 Thread Julian Brown
On Mon, 13 Aug 2018 12:06:21 -0700
Cesar Philippidis  wrote:

> So in other words, this is safe for fortran. It probably could use a
> fortran test, because that functionality wasn't explicitly exercised
> in og7/og8.

Here's a new version of the patch with a Fortran test case. It's not
too easy to write a test that depends on whether gang-local variables
actually end up in the right kind of memory, so I wrote one that scans
the omplower dump instead. Many other (including execution) tests will
already trigger the new behaviour.

Tested with offloading to NVPTX.

OK?

Thanks,

Julian

2018-08-10  Julian Brown  
Chung-Lin Tang  

gcc/
* config/nvptx/nvptx.c (tree-hash-traits.h): Include.
(gangprivate_shared_size): New global variable.
(gangprivate_shared_align): Likewise.
(gangprivate_shared_sym): Likewise.
(gangprivate_shared_hmap): Likewise.
(nvptx_option_override): Initialize gangprivate_shared_sym,
gangprivate_shared_align.
(nvptx_file_end): Output gangprivate_shared_sym.
(nvptx_goacc_expand_accel_var): New function.
(nvptx_set_current_function): New function.
(TARGET_SET_CURRENT_FUNCTION): Define hook.
(TARGET_GOACC_EXPAND_ACCEL): Likewise.
* doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
* expr.c (expand_expr_real_1): Remap decls marked with the
"oacc gangprivate" atttribute.
* omp-low.c (omp_context): Add oacc_partitioning_level and oacc_decls
fields.
(new_omp_context): Initialize oacc_decls in new omp_context.
(delete_omp_context): Delete oacc_decls in old omp_context.
(lower_oacc_head_tail): Record partitioning-level count in omp context.
(oacc_record_private_var_clauses, oacc_record_vars_in_bind)
(mark_oacc_gangprivate): New functions.
(lower_omp_for): Call oacc_record_private_var_clauses with "for"
clauses.  Call mark_oacc_gangprivate for gang-partitioned loops.
(lower_omp_target): Call oacc_record_private_var_clauses with "target"
clauses.
Call mark_oacc_gangprivate for offloaded target regions.
(lower_omp_1): Call vars_in_bind for GIMPLE_BIND within OMP regions.
* target.def (expand_accel_var): New hook.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
* testsuite/libgomp.oacc-c/pr85465.c: New test.
* testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: New test.
commit b73428237720be8d5b6e793f8615204356336d30
Author: Julian Brown 
Date:   Thu Aug 9 20:27:04 2018 -0700

[OpenACC] Add support for gang local storage allocation in shared memory

2018-08-10  Julian Brown  
	Chung-Lin Tang  

	gcc/
	* config/nvptx/nvptx.c (tree-hash-traits.h): Include.
	(gangprivate_shared_size): New global variable.
	(gangprivate_shared_align): Likewise.
	(gangprivate_shared_sym): Likewise.
	(gangprivate_shared_hmap): Likewise.
	(nvptx_option_override): Initialize gangprivate_shared_sym,
	gangprivate_shared_align.
	(nvptx_file_end): Output gangprivate_shared_sym.
	(nvptx_goacc_expand_accel_var): New function.
	(nvptx_set_current_function): New function.
	(TARGET_SET_CURRENT_FUNCTION): Define hook.
	(TARGET_GOACC_EXPAND_ACCEL): Likewise.
	* doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
	* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
	* expr.c (expand_expr_real_1): Remap decls marked with the
	"oacc gangprivate" atttribute.
	* omp-low.c (omp_context): Add oacc_partitioning_level and oacc_decls
	fields.
	(new_omp_context): Initialize oacc_decls in new omp_context.
	(delete_omp_context): Delete oacc_decls in old omp_context.
	(lower_oacc_head_tail): Record partitioning-level count in omp context.
	(oacc_record_private_var_clauses, oacc_record_vars_in_bind)
	(mark_oacc_gangprivate): New functions.
	(lower_omp_for): Call oacc_record_private_var_clauses with "for"
	clauses.  Call mark_oacc_gangprivate for gang-partitioned loops.
	(lower_omp_target): Call oacc_record_private_var_clauses with "target"
	clauses.
	Call mark_oacc_gangprivate for offloaded target regions.
	(lower_omp_1): Call vars_in_bind for GIMPLE_BIND within OMP regions.
	* target.def (expand_accel_var): New hook.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
	* testsuite/libgomp.oacc-c/pr85465.c: New test.
	* testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: New test.

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index c0b0a2e..14eb842 100644
--- a/gcc/config/nvptx/nvptx.c
+++ 

Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory

2018-08-13 Thread Julian Brown
On Mon, 13 Aug 2018 11:42:26 -0700
Cesar Philippidis  wrote:

> On 08/13/2018 09:21 AM, Julian Brown wrote:
> 
> > diff --git
> > a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
> > b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c new file
> > mode 100644 index 000..2fa708a --- /dev/null
> > +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
> > @@ -0,0 +1,106 @@
> > +/* { dg-xfail-run-if "gangprivate
> > failure" { openacc_nvidia_accel_selected } { "-O0" } { "" } } */  
> 
> As a quick comment, I like the approach that you've taken with this
> patch, but the og8 patch only applies the gangprivate attribute in the
> c/c++ FE. I'd have to review the notes, but I seem to recall that
> excluding that clause in fortran was deliberate. Chung-Lin, do you
> recall the rationale behind that?
> 
> With that aside, is the above xfail still necessary? It seems to xpass
> for me on nvptx. However, I see this regression on the host:
> 
> FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/loop-gwv-2.c
> -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1  -O2  execution test
> 
> There could be other regressions, but I only tested the new tests
> introduced by the patch so far.

Oops, this was the version of the patch I meant to post (and the one I
tested). The XFAIL on loop-gwv-2.c isn't necessary, plus that test
needed some other fixes to make it pass for NVPTX (it was written for
GCN to start with).

Everything else is the same. I'll see what I can come up with for a
Fortran test.

Thanks,

Julian
commit 7834b2f0dffec3e56e510c04e1663424b778fdfb
Author: Julian Brown 
Date:   Thu Aug 9 20:27:04 2018 -0700

[OpenACC] Add support for gang local storage allocation in shared memory

2018-08-10  Julian Brown  
	Chung-Lin Tang  

	gcc/
	* config/nvptx/nvptx.c (tree-hash-traits.h): Include.
	(gangprivate_shared_size): New global variable.
	(gangprivate_shared_align): Likewise.
	(gangprivate_shared_sym): Likewise.
	(gangprivate_shared_hmap): Likewise.
	(nvptx_option_override): Initialize gangprivate_shared_sym,
	gangprivate_shared_align.
	(nvptx_file_end): Output gangprivate_shared_sym.
	(nvptx_goacc_expand_accel_var): New function.
	(nvptx_set_current_function): New function.
	(TARGET_SET_CURRENT_FUNCTION): Define hook.
	(TARGET_GOACC_EXPAND_ACCEL): Likewise.
	* doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
	* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
	* expr.c (expand_expr_real_1): Remap decls marked with the
	"oacc gangprivate" atttribute.
	* omp-low.c (omp_context): Add oacc_partitioning_level and oacc_decls
	fields.
	(new_omp_context): Initialize oacc_decls in new omp_context.
	(delete_omp_context): Delete oacc_decls in old omp_context.
	(lower_oacc_head_tail): Record partitioning-level count in omp context.
	(oacc_record_private_var_clauses, oacc_record_vars_in_bind)
	(mark_oacc_gangprivate): New functions.
	(lower_omp_for): Call oacc_record_private_var_clauses with "for"
	clauses.  Call mark_oacc_gangprivate for gang-partitioned loops.
	(lower_omp_target): Call oacc_record_private_var_clauses with "target"
	clauses.
	Call mark_oacc_gangprivate for offloaded target regions.
	(lower_omp_1): Call vars_in_bind for GIMPLE_BIND within OMP regions.
	* target.def (expand_accel_var): New hook.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
	* testsuite/libgomp.oacc-c/pr85465.c: New test.

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index c0b0a2e..14eb842 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -73,6 +73,7 @@
 #include "cfgloop.h"
 #include "fold-const.h"
 #include "intl.h"
+#include "tree-hash-traits.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -137,6 +138,12 @@ static unsigned worker_red_size;
 static unsigned worker_red_align;
 static GTY(()) rtx worker_red_sym;
 
+/* Shared memory block for gang-private variables.  */
+static unsigned gangprivate_shared_size;
+static unsigned gangprivate_shared_align;
+static GTY(()) rtx gangprivate_shared_sym;
+static hash_map gangprivate_shared_hmap;
+
 /* Global lock variable, needed for 128bit worker & gang reductions.  */
 static GTY(()) tree global_lock_var;
 
@@ -210,6 +217,10 @@ nvptx_option_override (void)
   SET_SYMBOL_DATA_AREA (worker_red_sym, DATA_AREA_SHARED);
   worker_red_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
 
+  gangprivate_shared_sym = gen_rtx_SYMBOL_REF (Pmode, "__gangprivate_shared");
+  SET_SYMBOL_DATA_AREA (gangprivate_shared_sym, DATA_AREA_SHARED);
+  gangprivate_shared_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
+
   diagnose_openacc_conflict (TARGET_GOMP, "-mgomp");
   diagnose_openacc_conflict (TARGET_SOFT_STACK, "-msoft-stack");
   

Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory

2018-08-13 Thread Cesar Philippidis
On 08/13/2018 11:42 AM, Cesar Philippidis wrote:
> On 08/13/2018 09:21 AM, Julian Brown wrote:
> 
>> diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c 
>> b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
>> new file mode 100644
>> index 000..2fa708a
>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
>> @@ -0,0 +1,106 @@
>> +/* { dg-xfail-run-if "gangprivate failure" { openacc_nvidia_accel_selected 
>> } { "-O0" } { "" } } */
> 
> As a quick comment, I like the approach that you've taken with this
> patch, but the og8 patch only applies the gangprivate attribute in the
> c/c++ FE. I'd have to review the notes, but I seem to recall that
> excluding that clause in fortran was deliberate. Chung-Lin, do you
> recall the rationale behind that?

I found this in an old email:

  The older version of fortran that OpenACC supports doesn't have a
  concept of lexically scoped blocks like c/c++, so this isn't relevant
  except for explicit gang private variables.

So in other words, this is safe for fortran. It probably could use a
fortran test, because that functionality wasn't explicitly exercised in
og7/og8.

Cesar


Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory

2018-08-13 Thread Cesar Philippidis
On 08/13/2018 09:21 AM, Julian Brown wrote:

> diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c 
> b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
> new file mode 100644
> index 000..2fa708a
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
> @@ -0,0 +1,106 @@
> +/* { dg-xfail-run-if "gangprivate failure" { openacc_nvidia_accel_selected } 
> { "-O0" } { "" } } */

As a quick comment, I like the approach that you've taken with this
patch, but the og8 patch only applies the gangprivate attribute in the
c/c++ FE. I'd have to review the notes, but I seem to recall that
excluding that clause in fortran was deliberate. Chung-Lin, do you
recall the rationale behind that?

With that aside, is the above xfail still necessary? It seems to xpass
for me on nvptx. However, I see this regression on the host:

FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/loop-gwv-2.c
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1  -O2  execution test

There could be other regressions, but I only tested the new tests
introduced by the patch so far.

Cesar


[PATCH, OpenACC] Add support for gang local storage allocation in shared memory

2018-08-13 Thread Julian Brown
This patch adds support for placing gang-private variables in NVPTX
per-CU shared memory. This is done by marking up addressable variables
declared at the appropriate parallelism level with an attribute ("oacc
gangprivate") in omp-low.c.

Target-dependent code in the NVPTX backend then modifies the symbol
associated with the variable at expand time via a new target hook
(TARGET_GOACC_EXPAND_ACCEL_VAR) in order to place it in shared memory,
which is faster to access than the ".local" memory that would otherwise
be used for such variables. This has (theoretical, at least)
consequences on program semantics, in that the shared memory is also
statically-allocated rather than obeying stack discipline -- but you
can't have recursive routine calls in OpenACC anyway, so that's no big
deal.

Other targets can use the same attribute in different ways, as
appropriate.

OK for trunk?

Thanks,

Julian

2018-08-10  Julian Brown  
Chung-Lin Tang  

gcc/
* config/nvptx/nvptx.c (tree-hash-traits.h): Include.
(gangprivate_shared_size): New global variable.
(gangprivate_shared_align): Likewise.
(gangprivate_shared_sym): Likewise.
(gangprivate_shared_hmap): Likewise.
(nvptx_option_override): Initialize gangprivate_shared_sym,
gangprivate_shared_align.
(nvptx_file_end): Output gangprivate_shared_sym.
(nvptx_goacc_expand_accel_var): New function.
(nvptx_set_current_function): New function.
(TARGET_SET_CURRENT_FUNCTION): Define hook.
(TARGET_GOACC_EXPAND_ACCEL): Likewise.
* doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
* expr.c (expand_expr_real_1): Remap decls marked with the
"oacc gangprivate" atttribute.
* omp-low.c (omp_context): Add oacc_partitioning_level and oacc_decls
fields.
(new_omp_context): Initialize oacc_decls in new omp_context.
(delete_omp_context): Delete oacc_decls in old omp_context.
(lower_oacc_head_tail): Record partitioning-level count in omp context.
(oacc_record_private_var_clauses, oacc_record_vars_in_bind)
(mark_oacc_gangprivate): New functions.
(lower_omp_for): Call oacc_record_private_var_clauses with "for"
clauses.  Call mark_oacc_gangprivate for gang-partitioned loops.
(lower_omp_target): Call oacc_record_private_var_clauses with "target"
clauses.
Call mark_oacc_gangprivate for offloaded target regions.
(lower_omp_1): Call vars_in_bind for GIMPLE_BIND within OMP regions.
* target.def (expand_accel_var): New hook.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
* testsuite/libgomp.oacc-c/pr85465.c: New test.
commit 9637e7ea887e100f35d99b8d12101f9f8a9b94e3
Author: Julian Brown 
Date:   Thu Aug 9 20:27:04 2018 -0700

[OpenACC] Add support for gang local storage allocation in shared memory

2018-08-10  Julian Brown  
	Chung-Lin Tang  

	gcc/
	* config/nvptx/nvptx.c (tree-hash-traits.h): Include.
	(gangprivate_shared_size): New global variable.
	(gangprivate_shared_align): Likewise.
	(gangprivate_shared_sym): Likewise.
	(gangprivate_shared_hmap): Likewise.
	(nvptx_option_override): Initialize gangprivate_shared_sym,
	gangprivate_shared_align.
	(nvptx_file_end): Output gangprivate_shared_sym.
	(nvptx_goacc_expand_accel_var): New function.
	(nvptx_set_current_function): New function.
	(TARGET_SET_CURRENT_FUNCTION): Define hook.
	(TARGET_GOACC_EXPAND_ACCEL): Likewise.
	* doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
	* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
	* expr.c (expand_expr_real_1): Remap decls marked with the
	"oacc gangprivate" atttribute.
	* omp-low.c (omp_context): Add oacc_partitioning_level and oacc_decls
	fields.
	(new_omp_context): Initialize oacc_decls in new omp_context.
	(delete_omp_context): Delete oacc_decls in old omp_context.
	(lower_oacc_head_tail): Record partitioning-level count in omp context.
	(oacc_record_private_var_clauses, oacc_record_vars_in_bind)
	(mark_oacc_gangprivate): New functions.
	(lower_omp_for): Call oacc_record_private_var_clauses with "for"
	clauses.  Call mark_oacc_gangprivate for gang-partitioned loops.
	(lower_omp_target): Call oacc_record_private_var_clauses with "target"
	clauses.
	Call mark_oacc_gangprivate for offloaded target regions.
	(lower_omp_1): Call vars_in_bind for GIMPLE_BIND within OMP regions.
	* target.def (expand_accel_var): New hook.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
	*