Re: [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices

2022-12-05 Thread Jakub Jelinek via Gcc-patches
On Thu, Nov 24, 2022 at 03:09:02PM +0100, Marcel Vollweiler wrote:
> gcc/ChangeLog:
> 
>   * gimplify.cc (optimize_target_teams): Set initial num_teams_upper
>   to "-2" instead of "1" for non-existing num_teams clause in order to
>   disambiguate from the case of an existing num_teams clause with value 1.
> 
> libgomp/ChangeLog:
> 
>   * config/gcn/icv-device.c (omp_get_teams_thread_limit): Added to
>   allow processing of device-specific values.
>   (omp_set_teams_thread_limit): Likewise.
>   (ialias): Likewise.
>   * config/nvptx/icv-device.c (omp_get_teams_thread_limit): Likewise.
>   (omp_set_teams_thread_limit): Likewise.
>   (ialias): Likewise.
>   * icv-device.c (omp_get_teams_thread_limit): Likewise.
>   (ialias): Likewise.
>   (omp_set_teams_thread_limit): Likewise.
>   * icv.c (omp_set_teams_thread_limit): Removed.
>   (omp_get_teams_thread_limit): Likewise.
>   (ialias): Likewise.
>   * libgomp.texi: Updated documentation for nvptx and gcn corresponding
>   to the limitation of the number of teams.
>   * plugin/plugin-gcn.c (limit_teams): New helper function that limits
>   the number of teams by twice the number of compute units.
>   (parse_target_attributes): Limit the number of teams on gcn offload
>   devices.
>   * target.c (get_gomp_offload_icvs): Added teams_thread_limit_var
>   handling.
>   (gomp_load_image_to_device): Added a size check for the ICVs struct
>   variable.
>   (gomp_copy_back_icvs): New function that is used in GOMP_target_ext to
>   copy back the ICV values from device to host.
>   (GOMP_target_ext): Update the number of teams and threads in the kernel
>   args also considering device-specific values.
>   * testsuite/libgomp.c-c++-common/icv-4.c: Fixed an error in the reading
>   of OMP_TEAMS_THREAD_LIMIT from the environment.
>   * testsuite/libgomp.c-c++-common/icv-5.c: Extended.
>   * testsuite/libgomp.c-c++-common/icv-6.c: Extended.
>   * testsuite/libgomp.c-c++-common/icv-7.c: Extended.
>   * testsuite/libgomp.c-c++-common/icv-9.c: New test.
>   * testsuite/libgomp.fortran/icv-5.f90: New test.
>   * testsuite/libgomp.fortran/icv-6.f90: New test.
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/gomp/target-teams-1.c: Adapt expected values for
>   num_teams from "1" to "-2" in cases without num_teams clause.
>   * g++.dg/gomp/target-teams-1.C: Likewise.
>   * gfortran.dg/gomp/defaultmap-4.f90: Likewise.
>   * gfortran.dg/gomp/defaultmap-5.f90: Likewise.
>   * gfortran.dg/gomp/defaultmap-6.f90: Likewise.

Ok, thanks.

Jakub



Re: [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices

2022-11-24 Thread Marcel Vollweiler

Hi Jakub,


> * testsuite/libgomp.c-c++-common/icv-4.c: Bugfix.

Better say what exactly you changed in words.


Changed.


> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -14153,7 +14153,7 @@ optimize_target_teams (tree target, gimple_seq
*pre_p)
>struct gimplify_omp_ctx *target_ctx = gimplify_omp_ctxp;
>
>if (teams == NULL_TREE)
> -num_teams_upper = integer_one_node;
> +num_teams_upper = build_int_cst (integer_type_node, -2);
>else
>  for (c = OMP_TEAMS_CLAUSES (teams); c; c = OMP_CLAUSE_CHAIN (c))
>{

The function comment above optimize_target_teams contains detailed description
on what the values mean and why, so it definitely should document what -2 means
and when it is used.
I know you have documentation in libgomp for it, but it should be in both 
places.


I updated the comment with an explanation for "-2".



> +  intptr_t new_teams = orig_teams, new_threads = orig_threads;
> +  /* ORIG_TEAMS == -2: No explicit teams construct specified. Set to 1.

Two spaces after .


Corrected here and at other places.



> + ORIG_TEAMS == -1: TEAMS construct with NUM_TEAMS clause specified, but
the
> +  value could not be specified. No Change.

Likewise.
lowercase change ?


Corrected.



> + ORIG_TEAMS == 0: TEAMS construct without NUM_TEAMS clause.
> + Set device-specific value.
> + ORIG_TEAMS > 0: Value was already set through e.g. NUM_TEAMS clause.
> +No change.  */
> +  if (orig_teams == -2)
> +new_teams = 1;
> +  else if (orig_teams == 0)
> +{
> +  struct gomp_offload_icv_list *item = gomp_get_offload_icv_item 
(device);
> +  if (item != NULL)
> +   new_teams = item->icvs.nteams;
> +}
> +  /* The device-specific teams-thread-limit is only set if (a) an explicit 
TEAMS
> + region exists, i.e. ORIG_TEAMS > -2, and (b) THREADS was not already 
set by
> + e.g. a THREAD_LIMIT clause.  */
> +  if (orig_teams >= -2 && orig_threads == 0)

The comment talks about ORIG_TEAMS > -2, but the condition is >= -2.
So which one is it?


Thanks for the hint. It should be indeed "> -2" since teams_thread_limit "sets
the maximum number of OpenMP threads to use in each contention group created by
a teams construct" (OpenMP 5.2, section 21.6.2). So if there is no (explicit)
teams construct, then teams_thread_limit doesn't need to be copied to the 
device.



> +  /* This tests a large number of teams and threads. If it is larger than
> +2^15+1 then the according argument in the kernels arguments list
> +is encoded with two items instead of one. On NVIDIA there is an
> +adjustment for too large teams and threads. For AMD such adjustment
> +exists only for threads and will cause runtime errors with a two
> +large

s/two/too/ ?
Shouldn't amdgcn adjusts also number of teams?


I adjusted now also the number of teams in the amdgcn plugin. As upper bound I
chose two times the number of compute units. This seems to be sufficient when
one team is executed at one compute unit. This at least avoids the queueing of a
large amount of teams and the corresponding memory allocation.

The drawback is that a user is probably not aware of the actual number of
compute units (which is not very large on gfx cards, e.g. 120 for gfx908 and 104
for gfx90a) and thus maybe expects different values from omp_get_team_num(). For
instance in something like the following:

#pragma omp target
#pragma omp teams num_teams(220)
#pragma omp distribute parallel for
  for(int i = 0; i < 220; ++i)
{
#pragma omp critical
   ... omp_get_team_num () ...
}

On a gfx90a card with 104 compute units 12 threads are assigned to "reused"
teams (instead of having their own teams) that would not be the case without the
limit.

Alternatively, we could just define some (larger) constant number (though I
don't know a reasonable value here). But this does actually not solve the above
mentioned drawback. I think, we need to find a compromise between an
unneccessary small upper bound and the chance to get memory allocation failures
due to a too large number of teams.



As for testcases, have you tested this in a native setup where 
dg-set-target-env-var
actually works?


Besides remote testing with offloading (which does not yet work with
dg-set-target-env-var), I also tested locally on x86_64-pc-linux-gnu with one
nvptx offload device without issues (using "make check" and verifying that
offloading is indeed used).

Marcel
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
This patch adds support for omp_get_max_teams, omp_set_num_teams, and
omp_{gs}et_teams_thread_limit on offload devices. That includes the usage of
device-specific ICV values (specified as environment variables or changed on a
device). In order 

Re: [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices

2022-09-30 Thread Jakub Jelinek via Gcc-patches
On Sun, Sep 18, 2022 at 10:24:43AM +0200, Marcel Vollweiler wrote:
> gcc/ChangeLog:
> 
>   * gimplify.cc (optimize_target_teams): Set initial num_teams_upper
>   to "-2" instead of "1" for non-existing num_teams clause in order to
>   disambiguate from the case of an existing num_teams clause with value 1.
> 
> libgomp/ChangeLog:
> 
>   * config/gcn/icv-device.c (omp_get_teams_thread_limit): Added to
>   allow processing of device-specific values.
>   (omp_set_teams_thread_limit): Likewise.
>   (ialias): Likewise.
>   * config/nvptx/icv-device.c (omp_get_teams_thread_limit): Likewise.
>   (omp_set_teams_thread_limit): Likewise.
>   (ialias): Likewise.
>   * icv-device.c (omp_get_teams_thread_limit): Likewise.
>   (ialias): Likewise.
>   (omp_set_teams_thread_limit): Likewise.
>   * icv.c (omp_set_teams_thread_limit): Removed.
>   (omp_get_teams_thread_limit): Likewise.
>   (ialias): Likewise.
>   * target.c (get_gomp_offload_icvs): Added teams_thread_limit_var
>   handling.
>   (gomp_load_image_to_device): Added a size check for the ICVs struct
>   variable.
>   (gomp_copy_back_icvs): New function that is used in GOMP_target_ext to
>   copy back the ICV values from device to host.
>   (GOMP_target_ext): Update the number of teams and threads in the kernel
>   args also considering device-specific values.
>   * testsuite/libgomp.c-c++-common/icv-4.c: Bugfix.

Better say what exactly you changed in words.

>   * testsuite/libgomp.c-c++-common/icv-5.c: Extended.
>   * testsuite/libgomp.c-c++-common/icv-6.c: Extended.
>   * testsuite/libgomp.c-c++-common/icv-7.c: Extended.
>   * testsuite/libgomp.c-c++-common/icv-9.c: New test.
>   * testsuite/libgomp.fortran/icv-5.f90: New test.
>   * testsuite/libgomp.fortran/icv-6.f90: New test.
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/gomp/target-teams-1.c: Adapt expected values for
>   num_teams from "1" to "-2" in cases without num_teams clause.
>   * g++.dg/gomp/target-teams-1.C: Likewise.
>   * gfortran.dg/gomp/defaultmap-4.f90: Likewise.
>   * gfortran.dg/gomp/defaultmap-5.f90: Likewise.
>   * gfortran.dg/gomp/defaultmap-6.f90: Likewise.

> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -14153,7 +14153,7 @@ optimize_target_teams (tree target, gimple_seq *pre_p)
>struct gimplify_omp_ctx *target_ctx = gimplify_omp_ctxp;
>  
>if (teams == NULL_TREE)
> -num_teams_upper = integer_one_node;
> +num_teams_upper = build_int_cst (integer_type_node, -2);
>else
>  for (c = OMP_TEAMS_CLAUSES (teams); c; c = OMP_CLAUSE_CHAIN (c))
>{

The function comment above optimize_target_teams contains detailed
description on what the values mean and why, so it definitely should
document what -2 means and when it is used.
I know you have documentation in libgomp for it, but it should be in both
places.

> +  intptr_t new_teams = orig_teams, new_threads = orig_threads;
> +  /* ORIG_TEAMS == -2: No explicit teams construct specified. Set to 1.

Two spaces after .

> + ORIG_TEAMS == -1: TEAMS construct with NUM_TEAMS clause specified, but 
> the
> +value could not be specified. No Change.

Likewise.
lowercase change ?

> + ORIG_TEAMS == 0: TEAMS construct without NUM_TEAMS clause.
> +   Set device-specific value.
> + ORIG_TEAMS > 0: Value was already set through e.g. NUM_TEAMS clause.
> +  No change.  */
> +  if (orig_teams == -2)
> +new_teams = 1;
> +  else if (orig_teams == 0)
> +{
> +  struct gomp_offload_icv_list *item = gomp_get_offload_icv_item 
> (device);
> +  if (item != NULL)
> + new_teams = item->icvs.nteams;
> +}
> +  /* The device-specific teams-thread-limit is only set if (a) an explicit 
> TEAMS
> + region exists, i.e. ORIG_TEAMS > -2, and (b) THREADS was not already 
> set by
> + e.g. a THREAD_LIMIT clause.  */
> +  if (orig_teams >= -2 && orig_threads == 0)

The comment talks about ORIG_TEAMS > -2, but the condition is >= -2.
So which one is it?

> +  /* This tests a large number of teams and threads. If it is larger than
> +  2^15+1 then the according argument in the kernels arguments list
> +  is encoded with two items instead of one. On NVIDIA there is an
> +  adjustment for too large teams and threads. For AMD such adjustment
> +  exists only for threads and will cause runtime errors with a two large

s/two/too/ ?
Shouldn't amdgcn adjusts also number of teams?

As for testcases, have you tested this in a native setup where 
dg-set-target-env-var
actually works?

Jakub



Re: [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices

2022-09-18 Thread Marcel Vollweiler

Hi Jakub,

The last version of this patch was slightly adapted to the latest changes of the
device-specific environment variable syntax
(https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601145.html), also
considering the latest related bug fixes (commits 994ea892bd02d and 
7d37c7f67c1bb).

The new patch version was bootstrapped and tested on x86_64-linux with nvptx and
amdgcn offloading without regressions.

Marcel
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
This patch adds support for omp_get_max_teams, omp_set_num_teams, and
omp_{gs}et_teams_thread_limit on offload devices. That includes the usage of
device-specific ICV values (specified as environment variables or changed on a
device). In order to reuse device-specific ICV values, a copy back mechanism is
implemented that copies ICV values back from device to the host. 

gcc/ChangeLog:

* gimplify.cc (optimize_target_teams): Set initial num_teams_upper
to "-2" instead of "1" for non-existing num_teams clause in order to
disambiguate from the case of an existing num_teams clause with value 1.

libgomp/ChangeLog:

* config/gcn/icv-device.c (omp_get_teams_thread_limit): Added to
allow processing of device-specific values.
(omp_set_teams_thread_limit): Likewise.
(ialias): Likewise.
* config/nvptx/icv-device.c (omp_get_teams_thread_limit): Likewise.
(omp_set_teams_thread_limit): Likewise.
(ialias): Likewise.
* icv-device.c (omp_get_teams_thread_limit): Likewise.
(ialias): Likewise.
(omp_set_teams_thread_limit): Likewise.
* icv.c (omp_set_teams_thread_limit): Removed.
(omp_get_teams_thread_limit): Likewise.
(ialias): Likewise.
* target.c (get_gomp_offload_icvs): Added teams_thread_limit_var
handling.
(gomp_load_image_to_device): Added a size check for the ICVs struct
variable.
(gomp_copy_back_icvs): New function that is used in GOMP_target_ext to
copy back the ICV values from device to host.
(GOMP_target_ext): Update the number of teams and threads in the kernel
args also considering device-specific values.
* testsuite/libgomp.c-c++-common/icv-4.c: Bugfix.
* testsuite/libgomp.c-c++-common/icv-5.c: Extended.
* testsuite/libgomp.c-c++-common/icv-6.c: Extended.
* testsuite/libgomp.c-c++-common/icv-7.c: Extended.
* testsuite/libgomp.c-c++-common/icv-9.c: New test.
* testsuite/libgomp.fortran/icv-5.f90: New test.
* testsuite/libgomp.fortran/icv-6.f90: New test.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/target-teams-1.c: Adapt expected values for
num_teams from "1" to "-2" in cases without num_teams clause.
* g++.dg/gomp/target-teams-1.C: Likewise.
* gfortran.dg/gomp/defaultmap-4.f90: Likewise.
* gfortran.dg/gomp/defaultmap-5.f90: Likewise.
* gfortran.dg/gomp/defaultmap-6.f90: Likewise.

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index dcdc852..b393ed8 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -14153,7 +14153,7 @@ optimize_target_teams (tree target, gimple_seq *pre_p)
   struct gimplify_omp_ctx *target_ctx = gimplify_omp_ctxp;
 
   if (teams == NULL_TREE)
-num_teams_upper = integer_one_node;
+num_teams_upper = build_int_cst (integer_type_node, -2);
   else
 for (c = OMP_TEAMS_CLAUSES (teams); c; c = OMP_CLAUSE_CHAIN (c))
   {
diff --git a/gcc/testsuite/c-c++-common/gomp/target-teams-1.c 
b/gcc/testsuite/c-c++-common/gomp/target-teams-1.c
index 51b8d48..74d60e1 100644
--- a/gcc/testsuite/c-c++-common/gomp/target-teams-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/target-teams-1.c
@@ -81,5 +81,5 @@ foo (int a, int b, long c, long d)
 /* { dg-final { scan-tree-dump-times "thread_limit\\(-1\\)" 3 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "num_teams\\(0\\)" 4 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "thread_limit\\(0\\)" 6 "gimple" } } */
-/* { dg-final { scan-tree-dump-times "num_teams\\(1\\)" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "num_teams\\(-2\\)" 2 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "thread_limit\\(1\\)" 0 "gimple" } } */
diff --git a/gcc/testsuite/g++.dg/gomp/target-teams-1.C 
b/gcc/testsuite/g++.dg/gomp/target-teams-1.C
index f78a608..29e5597 100644
--- a/gcc/testsuite/g++.dg/gomp/target-teams-1.C
+++ b/gcc/testsuite/g++.dg/gomp/target-teams-1.C
@@ -88,5 +88,5 @@ foo (int a, int b, long c, long d)
 /* { dg-final { scan-tree-dump-times "thread_limit\\(-1\\)" 3 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "num_teams\\(0\\)" 4 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "thread_limit\\(0\\)" 6 "gimple" } } */
-/* { dg-final { 

Re: [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices

2022-08-03 Thread Marcel Vollweiler

Hi Jakub,

This patch was reduced a bit and most of your comments were considered in the
last submission of the environment variable syntax extension patch
(https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599175.html). This patch
also builds on that envvar patch version.

The nteams-var related content was moved from this patch to the envvar patch as
that is closely connected. However, additional testing and testing of copy back
device-specific nteams-var ICV values is still included in this patch together
with the teams-thread-limit-var content.


--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -13994,7 +13994,7 @@ optimize_target_teams (tree target, gimple_seq *pre_p)
struct gimplify_omp_ctx *target_ctx = gimplify_omp_ctxp;

if (teams == NULL_TREE)
-num_teams_upper = integer_one_node;
+num_teams_upper = integer_minus_two_node;


No, please don't introduce this, it is quite costly to have a GC trees
like integer_one_node, so they should stay for the most commonly used
numbers, -2 isn't like that.  Just build_int_cst (integer_type_node, -2).


integer_minus_two_node was replaced by "build_int_cst (integer_type_node, -2)".




--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -642,6 +642,7 @@ enum tree_index {
TI_INTEGER_ONE,
TI_INTEGER_THREE,
TI_INTEGER_MINUS_ONE,
+  TI_INTEGER_MINUS_TWO,
TI_NULL_POINTER,

TI_SIZE_ZERO,
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 8f83ea1..8cb474d 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -9345,6 +9345,7 @@ build_common_tree_nodes (bool signed_char)
integer_one_node = build_int_cst (integer_type_node, 1);
integer_three_node = build_int_cst (integer_type_node, 3);
integer_minus_one_node = build_int_cst (integer_type_node, -1);
+  integer_minus_two_node = build_int_cst (integer_type_node, -2);

size_zero_node = size_int (0);
size_one_node = size_int (1);
diff --git a/gcc/tree.h b/gcc/tree.h
index cea49a5..1aeb009 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -4206,6 +4206,7 @@ tree_strip_any_location_wrapper (tree exp)
  #define integer_one_node   global_trees[TI_INTEGER_ONE]
  #define integer_three_node  global_trees[TI_INTEGER_THREE]
  #define integer_minus_one_node global_trees[TI_INTEGER_MINUS_ONE]
+#define integer_minus_two_node  global_trees[TI_INTEGER_MINUS_TWO]
  #define size_zero_node global_trees[TI_SIZE_ZERO]
  #define size_one_node  global_trees[TI_SIZE_ONE]
  #define bitsize_zero_node  global_trees[TI_BITSIZE_ZERO]


And drop the above 3 hunks.


Removed.




--- a/libgomp/config/gcn/icv-device.c
+++ b/libgomp/config/gcn/icv-device.c
@@ -37,6 +37,7 @@ volatile int GOMP_DEFAULT_DEVICE_VAR;
  volatile int GOMP_MAX_ACTIVE_LEVELS_VAR;
  volatile omp_proc_bind_t GOMP_BIND_VAR;
  volatile int GOMP_NTEAMS_VAR;
+volatile int GOMP_TEAMS_THREAD_LIMIT_VAR;


I really don't like this copying of individual ICVs one by one to the
device, copy a struct containing them and access fields in that struct.


I recently changed this in
https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599175.html. So there is
one struct containing all ICVs that are copied from host to the device and back.




--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -116,6 +116,7 @@ struct addr_pair
  #define GOMP_MAX_ACTIVE_LEVELS_VAR __gomp_max_active_levels
  #define GOMP_BIND_VAR __gomp_bind
  #define GOMP_NTEAMS_VAR __gomp_nteams
+#define GOMP_TEAMS_THREAD_LIMIT_VAR __gomp_teams_thread_limit_var


Likewise here.


Those were all removed.




@@ -527,13 +538,19 @@ struct gomp_icv_list {

  extern void *gomp_get_icv_value_ptr (struct gomp_icv_list **list,
  int device_num);
-extern struct gomp_icv_list *gomp_run_sched_var_dev_list;
-extern struct gomp_icv_list *gomp_run_sched_chunk_size_dev_list;
+extern struct gomp_icv_list* gomp_add_device_specific_icv (int dev_num,
+   size_t size,
+struct gomp_icv_list 
**list);
+extern struct gomp_icv_list *gomp_initial_run_sched_var_dev_list;
+extern struct gomp_icv_list *gomp_initial_run_sched_chunk_size_dev_list;
+extern struct gomp_icv_list *gomp_initial_max_active_levels_var_dev_list;
+extern struct gomp_icv_list *gomp_initial_proc_bind_var_dev_list;
+extern struct gomp_icv_list *gomp_initial_proc_bind_var_list_dev_list;
+extern struct gomp_icv_list *gomp_initial_proc_bind_var_list_len_dev_list;
+extern struct gomp_icv_list *gomp_initial_nteams_var_dev_list;
+
  extern struct gomp_icv_list *gomp_nteams_var_dev_list;
-extern struct gomp_icv_list *gomp_max_active_levels_var_dev_list;
-extern struct gomp_icv_list *gomp_proc_bind_var_dev_list;
-extern struct gomp_icv_list *gomp_proc_bind_var_list_dev_list;
-extern struct gomp_icv_list *gomp_proc_bind_var_list_len_dev_list;
+extern struct gomp_icv_list *gomp_teams_thread_limit_var_dev_list;


Nor these 

Re: [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices

2022-06-30 Thread Jakub Jelinek via Gcc-patches
On Thu, Apr 14, 2022 at 06:06:24PM +0200, Marcel Vollweiler wrote:
> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -13994,7 +13994,7 @@ optimize_target_teams (tree target, gimple_seq *pre_p)
>struct gimplify_omp_ctx *target_ctx = gimplify_omp_ctxp;
>  
>if (teams == NULL_TREE)
> -num_teams_upper = integer_one_node;
> +num_teams_upper = integer_minus_two_node;

No, please don't introduce this, it is quite costly to have a GC trees
like integer_one_node, so they should stay for the most commonly used
numbers, -2 isn't like that.  Just build_int_cst (integer_type_node, -2).

> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -642,6 +642,7 @@ enum tree_index {
>TI_INTEGER_ONE,
>TI_INTEGER_THREE,
>TI_INTEGER_MINUS_ONE,
> +  TI_INTEGER_MINUS_TWO,
>TI_NULL_POINTER,
>  
>TI_SIZE_ZERO,
> diff --git a/gcc/tree.cc b/gcc/tree.cc
> index 8f83ea1..8cb474d 100644
> --- a/gcc/tree.cc
> +++ b/gcc/tree.cc
> @@ -9345,6 +9345,7 @@ build_common_tree_nodes (bool signed_char)
>integer_one_node = build_int_cst (integer_type_node, 1);
>integer_three_node = build_int_cst (integer_type_node, 3);
>integer_minus_one_node = build_int_cst (integer_type_node, -1);
> +  integer_minus_two_node = build_int_cst (integer_type_node, -2);
>  
>size_zero_node = size_int (0);
>size_one_node = size_int (1);
> diff --git a/gcc/tree.h b/gcc/tree.h
> index cea49a5..1aeb009 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -4206,6 +4206,7 @@ tree_strip_any_location_wrapper (tree exp)
>  #define integer_one_node global_trees[TI_INTEGER_ONE]
>  #define integer_three_node  global_trees[TI_INTEGER_THREE]
>  #define integer_minus_one_node   
> global_trees[TI_INTEGER_MINUS_ONE]
> +#define integer_minus_two_node   
> global_trees[TI_INTEGER_MINUS_TWO]
>  #define size_zero_node   global_trees[TI_SIZE_ZERO]
>  #define size_one_nodeglobal_trees[TI_SIZE_ONE]
>  #define bitsize_zero_nodeglobal_trees[TI_BITSIZE_ZERO]

And drop the above 3 hunks.

> --- a/libgomp/config/gcn/icv-device.c
> +++ b/libgomp/config/gcn/icv-device.c
> @@ -37,6 +37,7 @@ volatile int GOMP_DEFAULT_DEVICE_VAR;
>  volatile int GOMP_MAX_ACTIVE_LEVELS_VAR;
>  volatile omp_proc_bind_t GOMP_BIND_VAR;
>  volatile int GOMP_NTEAMS_VAR;
> +volatile int GOMP_TEAMS_THREAD_LIMIT_VAR;

I really don't like this copying of individual ICVs one by one to the
device, copy a struct containing them and access fields in that struct.

> --- a/libgomp/libgomp-plugin.h
> +++ b/libgomp/libgomp-plugin.h
> @@ -116,6 +116,7 @@ struct addr_pair
>  #define GOMP_MAX_ACTIVE_LEVELS_VAR __gomp_max_active_levels
>  #define GOMP_BIND_VAR __gomp_bind
>  #define GOMP_NTEAMS_VAR __gomp_nteams
> +#define GOMP_TEAMS_THREAD_LIMIT_VAR __gomp_teams_thread_limit_var

Likewise here.

> @@ -527,13 +538,19 @@ struct gomp_icv_list {
>  
>  extern void *gomp_get_icv_value_ptr (struct gomp_icv_list **list,
>int device_num);
> -extern struct gomp_icv_list *gomp_run_sched_var_dev_list;
> -extern struct gomp_icv_list *gomp_run_sched_chunk_size_dev_list;
> +extern struct gomp_icv_list* gomp_add_device_specific_icv (int dev_num,
> +size_t size,
> + struct 
> gomp_icv_list **list);
> +extern struct gomp_icv_list *gomp_initial_run_sched_var_dev_list;
> +extern struct gomp_icv_list *gomp_initial_run_sched_chunk_size_dev_list;
> +extern struct gomp_icv_list *gomp_initial_max_active_levels_var_dev_list;
> +extern struct gomp_icv_list *gomp_initial_proc_bind_var_dev_list;
> +extern struct gomp_icv_list *gomp_initial_proc_bind_var_list_dev_list;
> +extern struct gomp_icv_list *gomp_initial_proc_bind_var_list_len_dev_list;
> +extern struct gomp_icv_list *gomp_initial_nteams_var_dev_list;
> +
>  extern struct gomp_icv_list *gomp_nteams_var_dev_list;
> -extern struct gomp_icv_list *gomp_max_active_levels_var_dev_list;
> -extern struct gomp_icv_list *gomp_proc_bind_var_dev_list;
> -extern struct gomp_icv_list *gomp_proc_bind_var_list_dev_list;
> -extern struct gomp_icv_list *gomp_proc_bind_var_list_len_dev_list;
> +extern struct gomp_icv_list *gomp_teams_thread_limit_var_dev_list;

Nor these per-var lists.  For a specific device, walk the list with
all the vars in it, start with the most specific (matching dev number),
then just dev and then all and fill in from it what is going to be copied.
> --- a/libgomp/plugin/plugin-gcn.c
> +++ b/libgomp/plugin/plugin-gcn.c
> @@ -572,7 +572,8 @@ static char *GOMP_ICV_STRINGS[] =
>XSTRING (GOMP_DYN_VAR),
>XSTRING (GOMP_MAX_ACTIVE_LEVELS_VAR),
>XSTRING (GOMP_BIND_VAR),
> -  XSTRING (GOMP_NTEAMS_VAR)
> +  XSTRING (GOMP_NTEAMS_VAR),
> +  XSTRING (GOMP_TEAMS_THREAD_LIMIT_VAR)

Then you don't need to e.g. track the names of the individual vars, just
one for the whole ICV block.

Jakub



[PING][Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices

2022-06-09 Thread Marcel Vollweiler

Hi,

I’d like to ping the patch for the OpenMP runtime routines omp_get_max_teams,
omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices:

https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593260.html

This patch builds on the following patch which is currently in revision/review:
- [PATCH] OpenMP, libgomp: Environment variable syntax extension.
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588728.html

As several technical details will be changed anyway due to revision of the
environment variable extension patch, a complete review does not make sense yet
from my point of view. However, I wondered if a "rough" review about the main
approach/idea is feasible, so that necessary changes could be included in the
revision that is needed anyway.

Thanks
Marcel
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


[Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices

2022-04-14 Thread Marcel Vollweiler

Hi,

This patch adds support for omp_get_max_teams, omp_set_num_teams, and
omp_{gs}et_teams_thread_limit on offload devices.

The patch builds on the following patches which are submitted, but not yet
approved/committed:
- [PATCH] OpenMP, libgomp: Environment variable syntax extension.
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588728.html
- [PATCH] OpenMP, libgomp: Add new runtime routine omp_get_mapped_ptr.
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591556.html

The OpenMP runtime routines omp_get_max_teams, omp_set_num_teams, and
omp_{gs}et_teams_thread_limit were introduced in OpenMP 5.1 and where already
implemented for the host usage with patch
https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581283.html

The new patch extends the functionality of these OpenMP runtime routines by the
usage also on the device, i.e. device-specific values for nteams-var and
teams-thread-limit-var ICVs can now be retrieved and set also on the device. The
updated number of teams/threads are then used when launching the kernel.

The following main aspects are considered:
(a) Implemented the functions in the according icv-device files.
(b) Added structures to not only store initial device-specific values (they have
to be kept for omp_display_env) but also device-specific ICV values that can be
changed on the device at runtime.
(c) Changed the gimplification:
(c.1) Introduced integer_minus_two_node.
(c.2) For target regions that do not include teams constructs, now the clause
num_teams(-2) is added instead num_teams(1). This was necessary as num_teams(1)
is ambigious: it can also mean that a teams construct with explicit num_teams(1)
clause was specified inside the target region. The disambiguation is needed in
order to choose the correct thread limit: teams-thread-limit-var is only
intended for teams constructs such that if there is no teams construct, then the
number of threads is limited by thread-limit-var.
(d) Extend GOMP_target_ext. The host needs to set the device-specific ICV values
before the kernel is launched. The number of teams and threads are members of
the args list and are modified when no value was specified in an explicit clause
and the computation of the value was not postponed due to mapped variables.
(d.1) The arguments list is copied in order to guarantee immutability.
(e) Added copy back mechanism for ICVs which are modified on the device. The
only way to change device-specific ICVs is to do it on the device. As the
device-specific values are sometimes needed also on the host when the kernel is
launched (particularly number of teams and threads) they have to be copied back.

The patch was tested on x86_64-linux with nvptx and gcn offloading. All with no
regressions.

Marcel
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
gcc/ChangeLog:

* gimplify.cc (optimize_target_teams): Changed integer_one_node to
integer_minus_two_node in case of non-existing teams construct in target
region due to disambiguation. Previously, num_teams(1) was used as
clause on the target construct when (a) no teams construct exists in the
target region or (b) a teams construct with explicit num_teams(1)
clause was specified.
* tree-core.h (enum tree_index): Added TI_INTEGER_MINUS_TWO.
* tree.cc (build_common_tree_nodes): Added integer_minus_two_node.
* tree.h (integer_minus_two_node): Likewise.

libgomp/ChangeLog:

* config/gcn/icv-device.c (omp_set_num_teams): Added.
(omp_get_teams_thread_limit): Added.
(omp_set_teams_thread_limit): Added.
(ialias): Added for omp_set_num_teams and omp_{gs}et_teams_thread_limit.
* config/nvptx/icv-device.c (omp_set_num_teams): Likewise.
(omp_get_teams_thread_limit): Likewise.
(omp_set_teams_thread_limit): Likewise.
(ialias): Likewise.
* env.c (struct gomp_default_icv_t): Added to hold default ICV values.
(struct gomp_icv_list): Removed static.
(omp_display_env): Renaming of used lists.
(add_device_specific_icv): Removed static.
(gomp_add_device_specific_icv): Removed static.
(parse_device_specific): Renaming of used lists and added storing of
parsed values in lists of modifiable ICV values. 
* icv-device.c (omp_set_num_teams): Added.
(ialias): Added for omp_set_num_teams and omp_{gs}et_teams_thread_limit.
(omp_get_teams_thread_limit): Added.
(omp_set_teams_thread_limit): Added.
* icv.c (omp_set_num_teams): Removed.
(omp_set_teams_thread_limit): Removed.
(omp_get_teams_thread_limit): Removed.
(ialias): Removed for omp_set_num_teams and
omp_{gs}et_teams_thread_limit.
* libgomp-plugin.h