Re: [PATCH, OpenACC 2.7] Implement reductions for arrays and structs

2024-03-13 Thread Tobias Burnus

Hi Chung-Lin,


https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641669.html

Chung-Lin Tang wrote:

this patch implements reductions for arrays and structs for OpenACC. Following 
the pattern for OpenACC reductions [...]


(Stumbled over while looking at the Fortran patch, but applying to 
C/C++, hence mentioned here; the Fortran patch is at 
https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645205.html )



OpenACC permits array elements and subarrays. I have not checked whether
array elements are currently rejected or fully supported, but I am missing a
testcase for both array elements (unless there is one already) and array
sections.


If implemented, I think there should be a working run-time test.
If not supported, there should be a sorry_at error for those.

Note: the parser should handle array sections as OpenMP handles them.

The testcase should cover something like the following:

void f(int n)
{
  int x[5][5]; // Multidimensional array
  int y[n]; // VLA
  int *z = (int*)malloc(5*5*sizeof(int)); // Allocated array

... reduction(+:x)
... reduction(+:y)

... reduction(+:x[0:5][2:1])  // OK
... reduction(+:x[1:4][2:1])
  // invalid - while contiguous, first dim does not span the whole array
... reduction(+:y[2:2])  // OK
... reduction(+:y[3:])  // OK - same as [3:n-3]
... reduction(+:y[:2])  // OK - same as [0:2]
... reduction(+:z[1:2][1:6])  // OK

And the same where at least one of the constant numbers is replaced by
a variable.

Note: The 'invalid' reduction is fine in terms of being contiguous (last
dimension contains a single element, hence, the dimension before does
not need to span the whole extent) - but OpenACC requires all
dimensions but the last to span the whole range.


See "2.7.1 Data Specification in Data Clauses" for the subarray description.

I think - if known at compile time - there should also be a diagnostic
if any dimension but the last does not span the whole range.
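
A run-time test along those lines could be sketched roughly as below. This is
only an illustration and not part of the patch: the function names
reduce_whole and reduce_section, the sizes and the expected values are made up
here, and it assumes a compiler that accepts array and array-section
reductions. Without OpenACC enabled, the pragmas are ignored and the loops run
serially, which should give the same result.

```c
#include <assert.h>

#define ROWS 8
#define COLS 5

/* Whole-array reduction: acc[j] receives the sum of (i + j) over all rows.
   The names and sizes are hypothetical, chosen only for this sketch.  */
void
reduce_whole (int acc[COLS])
{
  int tmp[COLS] = { 0 };

#pragma acc parallel loop reduction(+:tmp)
  for (int i = 0; i < ROWS; i++)
    for (int j = 0; j < COLS; j++)
      tmp[j] += i + j;

  for (int j = 0; j < COLS; j++)
    acc[j] = tmp[j];
}

/* Array-section reduction with variable bounds: only tmp[lo] up to
   tmp[lo + len - 1] take part; the other elements stay zero.  */
void
reduce_section (int acc[COLS], int lo, int len)
{
  int tmp[COLS] = { 0 };

  assert (lo >= 0 && len >= 0 && lo + len <= COLS);

#pragma acc parallel loop reduction(+:tmp[lo:len])
  for (int i = 0; i < ROWS; i++)
    for (int j = lo; j < lo + len; j++)
      tmp[j] += i + j;

  for (int j = 0; j < COLS; j++)
    acc[j] = tmp[j];
}
```

Passing lo and len as function arguments covers the "constant replaced by a
variable" case; a variant with literal bounds would cover the other one.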


Thanks,

Tobias


Re: [PATCH, OpenACC 2.7] Implement reductions for arrays and structs

2024-01-10 Thread Julian Brown
On Tue, 2 Jan 2024 23:21:21 +0800
Chung-Lin Tang wrote:

> To Julian, there is a patch to the middle-end neutering, a hack
> actually, that detects SSA_NAMEs used in reduction array MEM_REFs,
> and avoids single->parallel copying (by moving those definitions
> before BUILT_IN_GOACC_SINGLE_COPY_START). This appears to work
> because reductions do their own initializing of the private copy.

It looks OK to me I think (bearing in mind your following paragraph, of
course!). I wonder though if maybe non-SSA (i.e. addressable) variables
need to be handled also, i.e. parts like this:

+  /* For accesses of variables used in array reductions, instead of
+     propagating the value for the main thread to all other worker threads
+     (which doesn't make sense as a reduction private var), move the defs
+     of such SSA_NAMEs to before the copy block and leave them alone (each
+     thread should access their own local copy).  */
+  for (gimple_stmt_iterator i = gsi_after_labels (from); !gsi_end_p (i);)
+    {
+      gimple *stmt = gsi_stmt (i);
+      if (gimple_assign_single_p (stmt)
+          && def_escapes_block->contains (gimple_assign_lhs (stmt))
+          && TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME)

are only handling SSA-converted variables. But maybe that's OK?

> As we discussed in our internal calls, the real proper way is to
> create the private array in a more appropriate stage, but that is too
> long a shot for now. The changes here are needed at least for some
> -O0 cases (when under optimization, propagation of the private
> copies' local address eliminate the SSA_NAME and things actually just
> work in that case). So please bear with this hack.

HTH,

Julian


[PATCH, OpenACC 2.7] Implement reductions for arrays and structs

2024-01-02 Thread Chung-Lin Tang
Hi Thomas, Andrew,
this patch implements reductions for arrays and structs for OpenACC. Following
the pattern for OpenACC reductions, this is mostly in the respective NVPTX/GCN
backends' *_goacc_reduction_setup/init/fini/teardown hooks, particularly in the
fini part, and the [nvptx/gcn]_reduction_update routines. The code is largely
the same between the two targets; the main difference is the lack of vector
mode handling in GCN.

To Julian, there is a patch to the middle-end neutering, a hack actually, that 
detects SSA_NAMEs used in reduction array MEM_REFs, and avoids single->parallel 
copying (by moving those definitions before BUILT_IN_GOACC_SINGLE_COPY_START). 
This appears to work because reductions do their own initializing of the 
private copy.
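
To illustrate why no single->parallel copy is needed for these variables, here
is a rough host-side model of an array '+' reduction. It is only a sketch under
assumptions: the names (model_array_reduction, NWORKERS, LEN) are invented for
this mail, the worker loop is simulated serially, and it is not how the
setup/init/fini/teardown hooks are actually implemented. The point is that
every worker starts from the identity value in its own private copy, so the
main thread's value never has to be propagated.

```c
#include <assert.h>
#include <stddef.h>

#define NWORKERS 4
#define LEN 3

/* Hypothetical model: element i of data contributes to result[i % LEN].  */
void
model_array_reduction (const int *data, size_t n, int result[LEN])
{
  int priv[NWORKERS][LEN];

  assert (data != NULL && result != NULL);

  /* "init": each worker sets its private copy to the identity of '+'.
     Nothing is copied from the main thread.  */
  for (int w = 0; w < NWORKERS; w++)
    for (int k = 0; k < LEN; k++)
      priv[w][k] = 0;

  /* Loop body: worker w handles items w, w + NWORKERS, ...  */
  for (int w = 0; w < NWORKERS; w++)
    for (size_t i = (size_t) w; i < n; i += NWORKERS)
      priv[w][i % LEN] += data[i];

  /* "fini": combine all private copies into the original variable,
     which keeps whatever value it had before the region.  */
  for (int k = 0; k < LEN; k++)
    for (int w = 0; w < NWORKERS; w++)
      result[k] += priv[w][k];
}
```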

As we discussed in our internal calls, the proper way is to create the
private array at a more appropriate stage, but that is too long a shot for now.
The changes here are needed at least for some -O0 cases (under optimization,
propagation of the private copies' local address eliminates the SSA_NAME and
things actually just work in that case). So please bear with this hack.

I believe the newly added libgomp testcases should be fairly complete. Note,
though, that one case, the reduction of * for double arrays, has been commented
out for now, because there appears to be a (presumably) unrelated issue causing
this case to fail (it may have to do with the loop-based atomic form used by
both NVPTX and GCN). It should probably be XFAILed instead of commented out; I
will do this in the next iteration.
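
For readers unfamiliar with the term, a "loop-based atomic form" is a
compare-and-swap retry loop. The sketch below shows the general idea for
multiplying a double, written as host-side C11; it is an analogue for
illustration only, not the code the NVPTX or GCN backends generate, and all
names in it (atomic_double_bits, atomic_mul_double and the helpers) are
invented here. The double is kept as its 64-bit pattern so that an integer
compare-and-swap can be used.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

_Static_assert (sizeof (double) == sizeof (uint64_t),
                "sketch assumes a 64-bit double");

/* Hypothetical cell type: a double stored as its bit pattern.  */
typedef _Atomic uint64_t atomic_double_bits;

static uint64_t
double_to_bits (double d)
{
  uint64_t b;
  memcpy (&b, &d, sizeof b);
  return b;
}

static double
bits_to_double (uint64_t b)
{
  double d;
  memcpy (&d, &b, sizeof d);
  return d;
}

void
atomic_store_double (atomic_double_bits *cell, double value)
{
  atomic_store (cell, double_to_bits (value));
}

double
atomic_load_double (atomic_double_bits *cell)
{
  return bits_to_double (atomic_load (cell));
}

/* Multiply *cell by factor: read the old value, compute the product, and
   retry if another thread changed the cell in the meantime.  On failure,
   atomic_compare_exchange_weak refreshes 'expected' with the current
   contents, so the product is recomputed from the new value.  */
void
atomic_mul_double (atomic_double_bits *cell, double factor)
{
  assert (cell != NULL);

  uint64_t expected = atomic_load (cell);
  uint64_t desired;
  do
    desired = double_to_bits (bits_to_double (expected) * factor);
  while (!atomic_compare_exchange_weak (cell, &expected, desired));
}
```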

Thanks,
Chung-Lin

2024-01-02  Chung-Lin Tang  

gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_clause_reduction): Adjustments for
OpenACC-specific cases.
* c-typeck.cc (c_oacc_reduction_defined_type_p): New function.
(c_oacc_reduction_code_name): Likewise.
(c_finish_omp_clauses): Handle OpenACC cases using new functions.

gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_clause_reduction): Adjustments for
OpenACC-specific cases.
* semantics.cc (cp_oacc_reduction_defined_type_p): New function.
(cp_oacc_reduction_code_name): Likewise.
(finish_omp_reduction_clause): Handle OpenACC cases using new functions.

gcc/ChangeLog:
* config/gcn/gcn-tree.cc (gcn_reduction_update): Additions for
handling ARRAY_TYPE and RECORD_TYPE reductions.
(gcn_goacc_reduction_setup): Likewise.
(gcn_goacc_reduction_init): Likewise.
(gcn_goacc_reduction_fini): Likewise.
(gcn_goacc_reduction_teardown): Likewise.

* config/nvptx/nvptx.cc (nvptx_gen_shuffle): Properly generate
V2SI shuffle using vec_extract op.
(nvptx_get_shared_red_addr): Adjust type/alignment calculations to
use TYPE_SIZE/ALIGN_UNIT instead of machine mode based.
(nvptx_reduction_update): Additions for handling ARRAY_TYPE and
RECORD_TYPE reductions.
(nvptx_goacc_reduction_setup): Likewise.
(nvptx_goacc_reduction_init): Likewise.
(nvptx_goacc_reduction_fini): Likewise.
(nvptx_goacc_reduction_teardown): Likewise.

* omp-low.cc (scan_sharing_clauses): Adjust ARRAY_REF pointer type
building to use decl type, rather than generic ptr_type_node.
(omp_reduction_init_op): Add ARRAY_TYPE and RECORD_TYPE init op
construction.
(lower_oacc_reductions): Add code to teardown/recover array access
MEM_REF in OMP_CLAUSE_DECL, to accommodate lookup requirements.
Adjust type/alignment calculations to use TYPE_SIZE/ALIGN_UNIT
instead of machine mode based.

* omp-oacc-neuter-broadcast.cc (worker_single_copy):
Add 'hash_set *array_reduction_base_vars' parameter.
Add xxx.

(neuter_worker_single): Add 'hash_set *array_reduction_base_vars'
parameter. Adjust recursive calls to self and worker_single_copy.
(oacc_do_neutering): Add 'hash_set *array_reduction_base_vars'
parameter. Adjust call to neuter_worker_single.
(execute_omp_oacc_neuter_broadcast): Add local
'hash_set array_reduction_base_vars' declaration. Collect MEM_REF
base-pointer SSA_NAMEs of arrays into array_reduction_base_vars. Add
'&array_reduction_base_vars' argument to call of oacc_do_neutering.

* omp-offload.cc (default_goacc_reduction): Add unshare_expr.

gcc/testsuite/ChangeLog:
* c-c++-common/goacc/reduction-9.c: New test.
* c-c++-common/goacc/reduction-10.c: New test.
* c-c++-common/goacc/reduction-11.c: New test.
* c-c++-common/goacc/reduction-12.c: New test.
* c-c++-common/goacc/reduction-13.c: New test.

libgomp/ChangeLog:
* testsuite/libgomp.oacc-c-c++-common/reduction.h
(check_reduction_array_xx): New macro.
(operator_apply): Likewise.
(check_reduction_array_op): Likewise.
(check_reduction_arraysec_op): Likewise.