Re: [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices
On Thu, Nov 24, 2022 at 03:09:02PM +0100, Marcel Vollweiler wrote: > gcc/ChangeLog: > > * gimplify.cc (optimize_target_teams): Set initial num_teams_upper > to "-2" instead of "1" for non-existing num_teams clause in order to > disambiguate from the case of an existing num_teams clause with value 1. > > libgomp/ChangeLog: > > * config/gcn/icv-device.c (omp_get_teams_thread_limit): Added to > allow processing of device-specific values. > (omp_set_teams_thread_limit): Likewise. > (ialias): Likewise. > * config/nvptx/icv-device.c (omp_get_teams_thread_limit): Likewise. > (omp_set_teams_thread_limit): Likewise. > (ialias): Likewise. > * icv-device.c (omp_get_teams_thread_limit): Likewise. > (ialias): Likewise. > (omp_set_teams_thread_limit): Likewise. > * icv.c (omp_set_teams_thread_limit): Removed. > (omp_get_teams_thread_limit): Likewise. > (ialias): Likewise. > * libgomp.texi: Updated documentation for nvptx and gcn corresponding > to the limitation of the number of teams. > * plugin/plugin-gcn.c (limit_teams): New helper function that limits > the number of teams by twice the number of compute units. > (parse_target_attributes): Limit the number of teams on gcn offload > devices. > * target.c (get_gomp_offload_icvs): Added teams_thread_limit_var > handling. > (gomp_load_image_to_device): Added a size check for the ICVs struct > variable. > (gomp_copy_back_icvs): New function that is used in GOMP_target_ext to > copy back the ICV values from device to host. > (GOMP_target_ext): Update the number of teams and threads in the kernel > args also considering device-specific values. > * testsuite/libgomp.c-c++-common/icv-4.c: Fixed an error in the reading > of OMP_TEAMS_THREAD_LIMIT from the environment. > * testsuite/libgomp.c-c++-common/icv-5.c: Extended. > * testsuite/libgomp.c-c++-common/icv-6.c: Extended. > * testsuite/libgomp.c-c++-common/icv-7.c: Extended. > * testsuite/libgomp.c-c++-common/icv-9.c: New test. > * testsuite/libgomp.fortran/icv-5.f90: New test. > * testsuite/libgomp.fortran/icv-6.f90: New test. > > gcc/testsuite/ChangeLog: > > * c-c++-common/gomp/target-teams-1.c: Adapt expected values for > num_teams from "1" to "-2" in cases without num_teams clause. > * g++.dg/gomp/target-teams-1.C: Likewise. > * gfortran.dg/gomp/defaultmap-4.f90: Likewise. > * gfortran.dg/gomp/defaultmap-5.f90: Likewise. > * gfortran.dg/gomp/defaultmap-6.f90: Likewise. Ok, thanks. Jakub
Re: [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices
Hi Jakub, > * testsuite/libgomp.c-c++-common/icv-4.c: Bugfix. Better say what exactly you changed in words. Changed. > --- a/gcc/gimplify.cc > +++ b/gcc/gimplify.cc > @@ -14153,7 +14153,7 @@ optimize_target_teams (tree target, gimple_seq *pre_p) >struct gimplify_omp_ctx *target_ctx = gimplify_omp_ctxp; > >if (teams == NULL_TREE) > -num_teams_upper = integer_one_node; > +num_teams_upper = build_int_cst (integer_type_node, -2); >else > for (c = OMP_TEAMS_CLAUSES (teams); c; c = OMP_CLAUSE_CHAIN (c)) >{ The function comment above optimize_target_teams contains detailed description on what the values mean and why, so it definitely should document what -2 means and when it is used. I know you have documentation in libgomp for it, but it should be in both places. I updated the comment with an explanation for "-2". > + intptr_t new_teams = orig_teams, new_threads = orig_threads; > + /* ORIG_TEAMS == -2: No explicit teams construct specified. Set to 1. Two spaces after . Corrected here and at other places. > + ORIG_TEAMS == -1: TEAMS construct with NUM_TEAMS clause specified, but the > + value could not be specified. No Change. Likewise. lowercase change ? Corrected. > + ORIG_TEAMS == 0: TEAMS construct without NUM_TEAMS clause. > + Set device-specific value. > + ORIG_TEAMS > 0: Value was already set through e.g. NUM_TEAMS clause. > +No change. */ > + if (orig_teams == -2) > +new_teams = 1; > + else if (orig_teams == 0) > +{ > + struct gomp_offload_icv_list *item = gomp_get_offload_icv_item (device); > + if (item != NULL) > + new_teams = item->icvs.nteams; > +} > + /* The device-specific teams-thread-limit is only set if (a) an explicit TEAMS > + region exists, i.e. ORIG_TEAMS > -2, and (b) THREADS was not already set by > + e.g. a THREAD_LIMIT clause. */ > + if (orig_teams >= -2 && orig_threads == 0) The comment talks about ORIG_TEAMS > -2, but the condition is >= -2. So which one is it? Thanks for the hint. It should be indeed "> -2" since teams_thread_limit "sets the maximum number of OpenMP threads to use in each contention group created by a teams construct" (OpenMP 5.2, section 21.6.2). So if there is no (explicit) teams construct, then teams_thread_limit doesn't need to be copied to the device. > + /* This tests a large number of teams and threads. If it is larger than > +2^15+1 then the according argument in the kernels arguments list > +is encoded with two items instead of one. On NVIDIA there is an > +adjustment for too large teams and threads. For AMD such adjustment > +exists only for threads and will cause runtime errors with a two > +large s/two/too/ ? Shouldn't amdgcn adjusts also number of teams? I adjusted now also the number of teams in the amdgcn plugin. As upper bound I chose two times the number of compute units. This seems to be sufficient when one team is executed at one compute unit. This at least avoids the queueing of a large amount of teams and the corresponding memory allocation. The drawback is that a user is probably not aware of the actual number of compute units (which is not very large on gfx cards, e.g. 120 for gfx908 and 104 for gfx90a) and thus maybe expects different values from omp_get_team_num(). For instance in something like the following: #pragma omp target #pragma omp teams num_teams(220) #pragma omp distribute parallel for for(int i = 0; i < 220; ++i) { #pragma omp critical ... omp_get_team_num () ... } On a gfx90a card with 104 compute units 12 threads are assigned to "reused" teams (instead of having their own teams) that would not be the case without the limit. Alternatively, we could just define some (larger) constant number (though I don't know a reasonable value here). But this does actually not solve the above mentioned drawback. I think, we need to find a compromise between an unneccessary small upper bound and the chance to get memory allocation failures due to a too large number of teams. As for testcases, have you tested this in a native setup where dg-set-target-env-var actually works? Besides remote testing with offloading (which does not yet work with dg-set-target-env-var), I also tested locally on x86_64-pc-linux-gnu with one nvptx offload device without issues (using "make check" and verifying that offloading is indeed used). Marcel - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 This patch adds support for omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices. That includes the usage of device-specific ICV values (specified as environment variables or changed on a device). In order
Re: [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices
On Sun, Sep 18, 2022 at 10:24:43AM +0200, Marcel Vollweiler wrote: > gcc/ChangeLog: > > * gimplify.cc (optimize_target_teams): Set initial num_teams_upper > to "-2" instead of "1" for non-existing num_teams clause in order to > disambiguate from the case of an existing num_teams clause with value 1. > > libgomp/ChangeLog: > > * config/gcn/icv-device.c (omp_get_teams_thread_limit): Added to > allow processing of device-specific values. > (omp_set_teams_thread_limit): Likewise. > (ialias): Likewise. > * config/nvptx/icv-device.c (omp_get_teams_thread_limit): Likewise. > (omp_set_teams_thread_limit): Likewise. > (ialias): Likewise. > * icv-device.c (omp_get_teams_thread_limit): Likewise. > (ialias): Likewise. > (omp_set_teams_thread_limit): Likewise. > * icv.c (omp_set_teams_thread_limit): Removed. > (omp_get_teams_thread_limit): Likewise. > (ialias): Likewise. > * target.c (get_gomp_offload_icvs): Added teams_thread_limit_var > handling. > (gomp_load_image_to_device): Added a size check for the ICVs struct > variable. > (gomp_copy_back_icvs): New function that is used in GOMP_target_ext to > copy back the ICV values from device to host. > (GOMP_target_ext): Update the number of teams and threads in the kernel > args also considering device-specific values. > * testsuite/libgomp.c-c++-common/icv-4.c: Bugfix. Better say what exactly you changed in words. > * testsuite/libgomp.c-c++-common/icv-5.c: Extended. > * testsuite/libgomp.c-c++-common/icv-6.c: Extended. > * testsuite/libgomp.c-c++-common/icv-7.c: Extended. > * testsuite/libgomp.c-c++-common/icv-9.c: New test. > * testsuite/libgomp.fortran/icv-5.f90: New test. > * testsuite/libgomp.fortran/icv-6.f90: New test. > > gcc/testsuite/ChangeLog: > > * c-c++-common/gomp/target-teams-1.c: Adapt expected values for > num_teams from "1" to "-2" in cases without num_teams clause. > * g++.dg/gomp/target-teams-1.C: Likewise. > * gfortran.dg/gomp/defaultmap-4.f90: Likewise. > * gfortran.dg/gomp/defaultmap-5.f90: Likewise. > * gfortran.dg/gomp/defaultmap-6.f90: Likewise. > --- a/gcc/gimplify.cc > +++ b/gcc/gimplify.cc > @@ -14153,7 +14153,7 @@ optimize_target_teams (tree target, gimple_seq *pre_p) >struct gimplify_omp_ctx *target_ctx = gimplify_omp_ctxp; > >if (teams == NULL_TREE) > -num_teams_upper = integer_one_node; > +num_teams_upper = build_int_cst (integer_type_node, -2); >else > for (c = OMP_TEAMS_CLAUSES (teams); c; c = OMP_CLAUSE_CHAIN (c)) >{ The function comment above optimize_target_teams contains detailed description on what the values mean and why, so it definitely should document what -2 means and when it is used. I know you have documentation in libgomp for it, but it should be in both places. > + intptr_t new_teams = orig_teams, new_threads = orig_threads; > + /* ORIG_TEAMS == -2: No explicit teams construct specified. Set to 1. Two spaces after . > + ORIG_TEAMS == -1: TEAMS construct with NUM_TEAMS clause specified, but > the > +value could not be specified. No Change. Likewise. lowercase change ? > + ORIG_TEAMS == 0: TEAMS construct without NUM_TEAMS clause. > + Set device-specific value. > + ORIG_TEAMS > 0: Value was already set through e.g. NUM_TEAMS clause. > + No change. */ > + if (orig_teams == -2) > +new_teams = 1; > + else if (orig_teams == 0) > +{ > + struct gomp_offload_icv_list *item = gomp_get_offload_icv_item > (device); > + if (item != NULL) > + new_teams = item->icvs.nteams; > +} > + /* The device-specific teams-thread-limit is only set if (a) an explicit > TEAMS > + region exists, i.e. ORIG_TEAMS > -2, and (b) THREADS was not already > set by > + e.g. a THREAD_LIMIT clause. */ > + if (orig_teams >= -2 && orig_threads == 0) The comment talks about ORIG_TEAMS > -2, but the condition is >= -2. So which one is it? > + /* This tests a large number of teams and threads. If it is larger than > + 2^15+1 then the according argument in the kernels arguments list > + is encoded with two items instead of one. On NVIDIA there is an > + adjustment for too large teams and threads. For AMD such adjustment > + exists only for threads and will cause runtime errors with a two large s/two/too/ ? Shouldn't amdgcn adjusts also number of teams? As for testcases, have you tested this in a native setup where dg-set-target-env-var actually works? Jakub
Re: [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices
Hi Jakub, The last version of this patch was slightly adapted to the latest changes of the device-specific environment variable syntax (https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601145.html), also considering the latest related bug fixes (commits 994ea892bd02d and 7d37c7f67c1bb). The new patch version was bootstrapped and tested on x86_64-linux with nvptx and amdgcn offloading without regressions. Marcel - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 This patch adds support for omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices. That includes the usage of device-specific ICV values (specified as environment variables or changed on a device). In order to reuse device-specific ICV values, a copy back mechanism is implemented that copies ICV values back from device to the host. gcc/ChangeLog: * gimplify.cc (optimize_target_teams): Set initial num_teams_upper to "-2" instead of "1" for non-existing num_teams clause in order to disambiguate from the case of an existing num_teams clause with value 1. libgomp/ChangeLog: * config/gcn/icv-device.c (omp_get_teams_thread_limit): Added to allow processing of device-specific values. (omp_set_teams_thread_limit): Likewise. (ialias): Likewise. * config/nvptx/icv-device.c (omp_get_teams_thread_limit): Likewise. (omp_set_teams_thread_limit): Likewise. (ialias): Likewise. * icv-device.c (omp_get_teams_thread_limit): Likewise. (ialias): Likewise. (omp_set_teams_thread_limit): Likewise. * icv.c (omp_set_teams_thread_limit): Removed. (omp_get_teams_thread_limit): Likewise. (ialias): Likewise. * target.c (get_gomp_offload_icvs): Added teams_thread_limit_var handling. (gomp_load_image_to_device): Added a size check for the ICVs struct variable. (gomp_copy_back_icvs): New function that is used in GOMP_target_ext to copy back the ICV values from device to host. (GOMP_target_ext): Update the number of teams and threads in the kernel args also considering device-specific values. * testsuite/libgomp.c-c++-common/icv-4.c: Bugfix. * testsuite/libgomp.c-c++-common/icv-5.c: Extended. * testsuite/libgomp.c-c++-common/icv-6.c: Extended. * testsuite/libgomp.c-c++-common/icv-7.c: Extended. * testsuite/libgomp.c-c++-common/icv-9.c: New test. * testsuite/libgomp.fortran/icv-5.f90: New test. * testsuite/libgomp.fortran/icv-6.f90: New test. gcc/testsuite/ChangeLog: * c-c++-common/gomp/target-teams-1.c: Adapt expected values for num_teams from "1" to "-2" in cases without num_teams clause. * g++.dg/gomp/target-teams-1.C: Likewise. * gfortran.dg/gomp/defaultmap-4.f90: Likewise. * gfortran.dg/gomp/defaultmap-5.f90: Likewise. * gfortran.dg/gomp/defaultmap-6.f90: Likewise. diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc index dcdc852..b393ed8 100644 --- a/gcc/gimplify.cc +++ b/gcc/gimplify.cc @@ -14153,7 +14153,7 @@ optimize_target_teams (tree target, gimple_seq *pre_p) struct gimplify_omp_ctx *target_ctx = gimplify_omp_ctxp; if (teams == NULL_TREE) -num_teams_upper = integer_one_node; +num_teams_upper = build_int_cst (integer_type_node, -2); else for (c = OMP_TEAMS_CLAUSES (teams); c; c = OMP_CLAUSE_CHAIN (c)) { diff --git a/gcc/testsuite/c-c++-common/gomp/target-teams-1.c b/gcc/testsuite/c-c++-common/gomp/target-teams-1.c index 51b8d48..74d60e1 100644 --- a/gcc/testsuite/c-c++-common/gomp/target-teams-1.c +++ b/gcc/testsuite/c-c++-common/gomp/target-teams-1.c @@ -81,5 +81,5 @@ foo (int a, int b, long c, long d) /* { dg-final { scan-tree-dump-times "thread_limit\\(-1\\)" 3 "gimple" } } */ /* { dg-final { scan-tree-dump-times "num_teams\\(0\\)" 4 "gimple" } } */ /* { dg-final { scan-tree-dump-times "thread_limit\\(0\\)" 6 "gimple" } } */ -/* { dg-final { scan-tree-dump-times "num_teams\\(1\\)" 2 "gimple" } } */ +/* { dg-final { scan-tree-dump-times "num_teams\\(-2\\)" 2 "gimple" } } */ /* { dg-final { scan-tree-dump-times "thread_limit\\(1\\)" 0 "gimple" } } */ diff --git a/gcc/testsuite/g++.dg/gomp/target-teams-1.C b/gcc/testsuite/g++.dg/gomp/target-teams-1.C index f78a608..29e5597 100644 --- a/gcc/testsuite/g++.dg/gomp/target-teams-1.C +++ b/gcc/testsuite/g++.dg/gomp/target-teams-1.C @@ -88,5 +88,5 @@ foo (int a, int b, long c, long d) /* { dg-final { scan-tree-dump-times "thread_limit\\(-1\\)" 3 "gimple" } } */ /* { dg-final { scan-tree-dump-times "num_teams\\(0\\)" 4 "gimple" } } */ /* { dg-final { scan-tree-dump-times "thread_limit\\(0\\)" 6 "gimple" } } */ -/* { dg-final {
Re: [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices
Hi Jakub, This patch was reduced a bit and most of your comments were considered in the last submission of the environment variable syntax extension patch (https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599175.html). This patch also builds on that envvar patch version. The nteams-var related content was moved from this patch to the envvar patch as that is closely connected. However, additional testing and testing of copy back device-specific nteams-var ICV values is still included in this patch together with the teams-thread-limit-var content. --- a/gcc/gimplify.cc +++ b/gcc/gimplify.cc @@ -13994,7 +13994,7 @@ optimize_target_teams (tree target, gimple_seq *pre_p) struct gimplify_omp_ctx *target_ctx = gimplify_omp_ctxp; if (teams == NULL_TREE) -num_teams_upper = integer_one_node; +num_teams_upper = integer_minus_two_node; No, please don't introduce this, it is quite costly to have a GC trees like integer_one_node, so they should stay for the most commonly used numbers, -2 isn't like that. Just build_int_cst (integer_type_node, -2). integer_minus_two_node was replaced by "build_int_cst (integer_type_node, -2)". --- a/gcc/tree-core.h +++ b/gcc/tree-core.h @@ -642,6 +642,7 @@ enum tree_index { TI_INTEGER_ONE, TI_INTEGER_THREE, TI_INTEGER_MINUS_ONE, + TI_INTEGER_MINUS_TWO, TI_NULL_POINTER, TI_SIZE_ZERO, diff --git a/gcc/tree.cc b/gcc/tree.cc index 8f83ea1..8cb474d 100644 --- a/gcc/tree.cc +++ b/gcc/tree.cc @@ -9345,6 +9345,7 @@ build_common_tree_nodes (bool signed_char) integer_one_node = build_int_cst (integer_type_node, 1); integer_three_node = build_int_cst (integer_type_node, 3); integer_minus_one_node = build_int_cst (integer_type_node, -1); + integer_minus_two_node = build_int_cst (integer_type_node, -2); size_zero_node = size_int (0); size_one_node = size_int (1); diff --git a/gcc/tree.h b/gcc/tree.h index cea49a5..1aeb009 100644 --- a/gcc/tree.h +++ b/gcc/tree.h @@ -4206,6 +4206,7 @@ tree_strip_any_location_wrapper (tree exp) #define integer_one_node global_trees[TI_INTEGER_ONE] #define integer_three_node global_trees[TI_INTEGER_THREE] #define integer_minus_one_node global_trees[TI_INTEGER_MINUS_ONE] +#define integer_minus_two_node global_trees[TI_INTEGER_MINUS_TWO] #define size_zero_node global_trees[TI_SIZE_ZERO] #define size_one_node global_trees[TI_SIZE_ONE] #define bitsize_zero_node global_trees[TI_BITSIZE_ZERO] And drop the above 3 hunks. Removed. --- a/libgomp/config/gcn/icv-device.c +++ b/libgomp/config/gcn/icv-device.c @@ -37,6 +37,7 @@ volatile int GOMP_DEFAULT_DEVICE_VAR; volatile int GOMP_MAX_ACTIVE_LEVELS_VAR; volatile omp_proc_bind_t GOMP_BIND_VAR; volatile int GOMP_NTEAMS_VAR; +volatile int GOMP_TEAMS_THREAD_LIMIT_VAR; I really don't like this copying of individual ICVs one by one to the device, copy a struct containing them and access fields in that struct. I recently changed this in https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599175.html. So there is one struct containing all ICVs that are copied from host to the device and back. --- a/libgomp/libgomp-plugin.h +++ b/libgomp/libgomp-plugin.h @@ -116,6 +116,7 @@ struct addr_pair #define GOMP_MAX_ACTIVE_LEVELS_VAR __gomp_max_active_levels #define GOMP_BIND_VAR __gomp_bind #define GOMP_NTEAMS_VAR __gomp_nteams +#define GOMP_TEAMS_THREAD_LIMIT_VAR __gomp_teams_thread_limit_var Likewise here. Those were all removed. @@ -527,13 +538,19 @@ struct gomp_icv_list { extern void *gomp_get_icv_value_ptr (struct gomp_icv_list **list, int device_num); -extern struct gomp_icv_list *gomp_run_sched_var_dev_list; -extern struct gomp_icv_list *gomp_run_sched_chunk_size_dev_list; +extern struct gomp_icv_list* gomp_add_device_specific_icv (int dev_num, + size_t size, +struct gomp_icv_list **list); +extern struct gomp_icv_list *gomp_initial_run_sched_var_dev_list; +extern struct gomp_icv_list *gomp_initial_run_sched_chunk_size_dev_list; +extern struct gomp_icv_list *gomp_initial_max_active_levels_var_dev_list; +extern struct gomp_icv_list *gomp_initial_proc_bind_var_dev_list; +extern struct gomp_icv_list *gomp_initial_proc_bind_var_list_dev_list; +extern struct gomp_icv_list *gomp_initial_proc_bind_var_list_len_dev_list; +extern struct gomp_icv_list *gomp_initial_nteams_var_dev_list; + extern struct gomp_icv_list *gomp_nteams_var_dev_list; -extern struct gomp_icv_list *gomp_max_active_levels_var_dev_list; -extern struct gomp_icv_list *gomp_proc_bind_var_dev_list; -extern struct gomp_icv_list *gomp_proc_bind_var_list_dev_list; -extern struct gomp_icv_list *gomp_proc_bind_var_list_len_dev_list; +extern struct gomp_icv_list *gomp_teams_thread_limit_var_dev_list; Nor these
Re: [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices
On Thu, Apr 14, 2022 at 06:06:24PM +0200, Marcel Vollweiler wrote: > --- a/gcc/gimplify.cc > +++ b/gcc/gimplify.cc > @@ -13994,7 +13994,7 @@ optimize_target_teams (tree target, gimple_seq *pre_p) >struct gimplify_omp_ctx *target_ctx = gimplify_omp_ctxp; > >if (teams == NULL_TREE) > -num_teams_upper = integer_one_node; > +num_teams_upper = integer_minus_two_node; No, please don't introduce this, it is quite costly to have a GC trees like integer_one_node, so they should stay for the most commonly used numbers, -2 isn't like that. Just build_int_cst (integer_type_node, -2). > --- a/gcc/tree-core.h > +++ b/gcc/tree-core.h > @@ -642,6 +642,7 @@ enum tree_index { >TI_INTEGER_ONE, >TI_INTEGER_THREE, >TI_INTEGER_MINUS_ONE, > + TI_INTEGER_MINUS_TWO, >TI_NULL_POINTER, > >TI_SIZE_ZERO, > diff --git a/gcc/tree.cc b/gcc/tree.cc > index 8f83ea1..8cb474d 100644 > --- a/gcc/tree.cc > +++ b/gcc/tree.cc > @@ -9345,6 +9345,7 @@ build_common_tree_nodes (bool signed_char) >integer_one_node = build_int_cst (integer_type_node, 1); >integer_three_node = build_int_cst (integer_type_node, 3); >integer_minus_one_node = build_int_cst (integer_type_node, -1); > + integer_minus_two_node = build_int_cst (integer_type_node, -2); > >size_zero_node = size_int (0); >size_one_node = size_int (1); > diff --git a/gcc/tree.h b/gcc/tree.h > index cea49a5..1aeb009 100644 > --- a/gcc/tree.h > +++ b/gcc/tree.h > @@ -4206,6 +4206,7 @@ tree_strip_any_location_wrapper (tree exp) > #define integer_one_node global_trees[TI_INTEGER_ONE] > #define integer_three_node global_trees[TI_INTEGER_THREE] > #define integer_minus_one_node > global_trees[TI_INTEGER_MINUS_ONE] > +#define integer_minus_two_node > global_trees[TI_INTEGER_MINUS_TWO] > #define size_zero_node global_trees[TI_SIZE_ZERO] > #define size_one_nodeglobal_trees[TI_SIZE_ONE] > #define bitsize_zero_nodeglobal_trees[TI_BITSIZE_ZERO] And drop the above 3 hunks. > --- a/libgomp/config/gcn/icv-device.c > +++ b/libgomp/config/gcn/icv-device.c > @@ -37,6 +37,7 @@ volatile int GOMP_DEFAULT_DEVICE_VAR; > volatile int GOMP_MAX_ACTIVE_LEVELS_VAR; > volatile omp_proc_bind_t GOMP_BIND_VAR; > volatile int GOMP_NTEAMS_VAR; > +volatile int GOMP_TEAMS_THREAD_LIMIT_VAR; I really don't like this copying of individual ICVs one by one to the device, copy a struct containing them and access fields in that struct. > --- a/libgomp/libgomp-plugin.h > +++ b/libgomp/libgomp-plugin.h > @@ -116,6 +116,7 @@ struct addr_pair > #define GOMP_MAX_ACTIVE_LEVELS_VAR __gomp_max_active_levels > #define GOMP_BIND_VAR __gomp_bind > #define GOMP_NTEAMS_VAR __gomp_nteams > +#define GOMP_TEAMS_THREAD_LIMIT_VAR __gomp_teams_thread_limit_var Likewise here. > @@ -527,13 +538,19 @@ struct gomp_icv_list { > > extern void *gomp_get_icv_value_ptr (struct gomp_icv_list **list, >int device_num); > -extern struct gomp_icv_list *gomp_run_sched_var_dev_list; > -extern struct gomp_icv_list *gomp_run_sched_chunk_size_dev_list; > +extern struct gomp_icv_list* gomp_add_device_specific_icv (int dev_num, > +size_t size, > + struct > gomp_icv_list **list); > +extern struct gomp_icv_list *gomp_initial_run_sched_var_dev_list; > +extern struct gomp_icv_list *gomp_initial_run_sched_chunk_size_dev_list; > +extern struct gomp_icv_list *gomp_initial_max_active_levels_var_dev_list; > +extern struct gomp_icv_list *gomp_initial_proc_bind_var_dev_list; > +extern struct gomp_icv_list *gomp_initial_proc_bind_var_list_dev_list; > +extern struct gomp_icv_list *gomp_initial_proc_bind_var_list_len_dev_list; > +extern struct gomp_icv_list *gomp_initial_nteams_var_dev_list; > + > extern struct gomp_icv_list *gomp_nteams_var_dev_list; > -extern struct gomp_icv_list *gomp_max_active_levels_var_dev_list; > -extern struct gomp_icv_list *gomp_proc_bind_var_dev_list; > -extern struct gomp_icv_list *gomp_proc_bind_var_list_dev_list; > -extern struct gomp_icv_list *gomp_proc_bind_var_list_len_dev_list; > +extern struct gomp_icv_list *gomp_teams_thread_limit_var_dev_list; Nor these per-var lists. For a specific device, walk the list with all the vars in it, start with the most specific (matching dev number), then just dev and then all and fill in from it what is going to be copied. > --- a/libgomp/plugin/plugin-gcn.c > +++ b/libgomp/plugin/plugin-gcn.c > @@ -572,7 +572,8 @@ static char *GOMP_ICV_STRINGS[] = >XSTRING (GOMP_DYN_VAR), >XSTRING (GOMP_MAX_ACTIVE_LEVELS_VAR), >XSTRING (GOMP_BIND_VAR), > - XSTRING (GOMP_NTEAMS_VAR) > + XSTRING (GOMP_NTEAMS_VAR), > + XSTRING (GOMP_TEAMS_THREAD_LIMIT_VAR) Then you don't need to e.g. track the names of the individual vars, just one for the whole ICV block. Jakub
[PING][Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices
Hi, I’d like to ping the patch for the OpenMP runtime routines omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices: https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593260.html This patch builds on the following patch which is currently in revision/review: - [PATCH] OpenMP, libgomp: Environment variable syntax extension. https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588728.html As several technical details will be changed anyway due to revision of the environment variable extension patch, a complete review does not make sense yet from my point of view. However, I wondered if a "rough" review about the main approach/idea is feasible, so that necessary changes could be included in the revision that is needed anyway. Thanks Marcel - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
[Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices
Hi, This patch adds support for omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices. The patch builds on the following patches which are submitted, but not yet approved/committed: - [PATCH] OpenMP, libgomp: Environment variable syntax extension. https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588728.html - [PATCH] OpenMP, libgomp: Add new runtime routine omp_get_mapped_ptr. https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591556.html The OpenMP runtime routines omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit were introduced in OpenMP 5.1 and where already implemented for the host usage with patch https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581283.html The new patch extends the functionality of these OpenMP runtime routines by the usage also on the device, i.e. device-specific values for nteams-var and teams-thread-limit-var ICVs can now be retrieved and set also on the device. The updated number of teams/threads are then used when launching the kernel. The following main aspects are considered: (a) Implemented the functions in the according icv-device files. (b) Added structures to not only store initial device-specific values (they have to be kept for omp_display_env) but also device-specific ICV values that can be changed on the device at runtime. (c) Changed the gimplification: (c.1) Introduced integer_minus_two_node. (c.2) For target regions that do not include teams constructs, now the clause num_teams(-2) is added instead num_teams(1). This was necessary as num_teams(1) is ambigious: it can also mean that a teams construct with explicit num_teams(1) clause was specified inside the target region. The disambiguation is needed in order to choose the correct thread limit: teams-thread-limit-var is only intended for teams constructs such that if there is no teams construct, then the number of threads is limited by thread-limit-var. (d) Extend GOMP_target_ext. The host needs to set the device-specific ICV values before the kernel is launched. The number of teams and threads are members of the args list and are modified when no value was specified in an explicit clause and the computation of the value was not postponed due to mapped variables. (d.1) The arguments list is copied in order to guarantee immutability. (e) Added copy back mechanism for ICVs which are modified on the device. The only way to change device-specific ICVs is to do it on the device. As the device-specific values are sometimes needed also on the host when the kernel is launched (particularly number of teams and threads) they have to be copied back. The patch was tested on x86_64-linux with nvptx and gcn offloading. All with no regressions. Marcel - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 gcc/ChangeLog: * gimplify.cc (optimize_target_teams): Changed integer_one_node to integer_minus_two_node in case of non-existing teams construct in target region due to disambiguation. Previously, num_teams(1) was used as clause on the target construct when (a) no teams construct exists in the target region or (b) a teams construct with explicit num_teams(1) clause was specified. * tree-core.h (enum tree_index): Added TI_INTEGER_MINUS_TWO. * tree.cc (build_common_tree_nodes): Added integer_minus_two_node. * tree.h (integer_minus_two_node): Likewise. libgomp/ChangeLog: * config/gcn/icv-device.c (omp_set_num_teams): Added. (omp_get_teams_thread_limit): Added. (omp_set_teams_thread_limit): Added. (ialias): Added for omp_set_num_teams and omp_{gs}et_teams_thread_limit. * config/nvptx/icv-device.c (omp_set_num_teams): Likewise. (omp_get_teams_thread_limit): Likewise. (omp_set_teams_thread_limit): Likewise. (ialias): Likewise. * env.c (struct gomp_default_icv_t): Added to hold default ICV values. (struct gomp_icv_list): Removed static. (omp_display_env): Renaming of used lists. (add_device_specific_icv): Removed static. (gomp_add_device_specific_icv): Removed static. (parse_device_specific): Renaming of used lists and added storing of parsed values in lists of modifiable ICV values. * icv-device.c (omp_set_num_teams): Added. (ialias): Added for omp_set_num_teams and omp_{gs}et_teams_thread_limit. (omp_get_teams_thread_limit): Added. (omp_set_teams_thread_limit): Added. * icv.c (omp_set_num_teams): Removed. (omp_set_teams_thread_limit): Removed. (omp_get_teams_thread_limit): Removed. (ialias): Removed for omp_set_num_teams and omp_{gs}et_teams_thread_limit. * libgomp-plugin.h