[PATCH, OpenACC] (1/2) Fix implicit mapping for array slices on lexically-enclosing data constructs (PR70828)

2018-08-28 Thread Julian Brown
This patch implements support for array slices (with a non-zero base
element) declared on OpenACC data constructs. Any lexically-enclosed
parallel or kernels regions should "inherit" such mappings, e.g. if we
have:

#pragma acc data copy(arr[10:20])
{
#pragma acc parallel loop
  for (...) { ...arr[X]... }
}

the mapping for "arr" on the data construct takes precedence over the
default mapping behaviour for the parallel construct, which is to map
the whole array (OpenACC 2.5, "2.5.1. Parallel Construct" and
elsewhere).
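
A fuller, compilable version of the case above might look like the
following sketch. The function name, array size and loop bounds are
made up for illustration and are not taken from the new tests. Built
without -fopenacc the pragmas are ignored and the code simply runs on
the host, giving the same result.

```c
#define N 40

/* Doubles elements 10..29 of a local array inside an OpenACC data
   region and returns the sum of those elements.  */
int
scaled_slice_sum (void)
{
  int arr[N];
  int sum = 0;

  for (int i = 0; i < N; i++)
    arr[i] = i;

  /* Only the slice starting at element 10, of length 20, is mapped.  */
#pragma acc data copy(arr[10:20])
  {
    /* No data clause for arr here: the intent is that the enclosing
       slice mapping is reused instead of mapping the whole array.  */
#pragma acc parallel loop
    for (int i = 10; i < 30; i++)
      arr[i] *= 2;
  }

  for (int i = 10; i < 30; i++)
    sum += arr[i];

  return sum;
}
```

The loop deliberately touches only elements inside the mapped slice;
accesses outside it would not be covered by the data construct.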

Tested with offloading to nvptx. (This patch differs in implementation
somewhat from the version on the gomp4, etc. branches.)

OK to apply?

Thanks,

Julian

2018-08-28  Julian Brown  
Cesar Philippidis  

PR middle-end/70828

gcc/
* gimplify.c (gimplify_omp_ctx): Add decl_data_clause hash map.
(new_omp_context): Initialise above.
(delete_omp_context): Delete above.
(gimplify_scan_omp_clauses): Scan for array mappings on data constructs,
and record in above map.
(gomp_needs_data_present): New function.
(gimplify_adjust_omp_clauses_1): Handle data mappings (e.g. array
slices) declared in lexically-enclosing data constructs.
* omp-low.c (lower_omp_target): Allow decl for bias not to be present
in omp context.

gcc/testsuite/
* c-c++-common/goacc/acc-data-chain.c: New test.
* gfortran.dg/goacc/pr70828.f90: New test.
* gfortran.dg/goacc/pr70828-2.f90: New test.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/pr70828.c: New test.
* testsuite/libgomp.oacc-fortran/implicit_copy.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828-2.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828-3.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828-5.f90: New test.
From 9123c4ddd701c40c3e85a0c6cd327066542b9e7a Mon Sep 17 00:00:00 2001
From: Julian Brown 
Date: Thu, 16 Aug 2018 20:02:10 -0700
Subject: [PATCH 1/2] Inheritance of array sections on data constructs.

2018-08-28  Julian Brown  
	Cesar Philippidis  

	gcc/
	* gimplify.c (gimplify_omp_ctx): Add decl_data_clause hash map.
	(new_omp_context): Initialise above.
	(delete_omp_context): Delete above.
	(gimplify_scan_omp_clauses): Scan for array mappings on data constructs,
	and record in above map.
	(gomp_needs_data_present): New function.
	(gimplify_adjust_omp_clauses_1): Handle data mappings (e.g. array
	slices) declared in lexically-enclosing data constructs.
	* omp-low.c (lower_omp_target): Allow decl for bias not to be present
	in omp context.

	gcc/testsuite/
	* c-c++-common/goacc/acc-data-chain.c: New test.
	* gfortran.dg/goacc/pr70828.f90: New test.
	* gfortran.dg/goacc/pr70828-2.f90: New test.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/pr70828.c: New test.
	* testsuite/libgomp.oacc-fortran/implicit_copy.f90: New test.
	* testsuite/libgomp.oacc-fortran/pr70828.f90: New test.
	* testsuite/libgomp.oacc-fortran/pr70828-2.f90: New test.
	* testsuite/libgomp.oacc-fortran/pr70828-3.f90: New test.
	* testsuite/libgomp.oacc-fortran/pr70828-5.f90: New test.
---
 gcc/gimplify.c | 97 +-
 gcc/omp-low.c  |  7 +-
 gcc/testsuite/c-c++-common/goacc/acc-data-chain.c  | 24 ++
 gcc/testsuite/gfortran.dg/goacc/pr70828.f90| 22 +
 .../libgomp.oacc-c-c++-common/pr70828-2.c  | 34 
 .../testsuite/libgomp.oacc-c-c++-common/pr70828.c  | 27 ++
 .../libgomp.oacc-fortran/implicit_copy.f90 | 30 +++
 .../testsuite/libgomp.oacc-fortran/pr70828-2.f90   | 31 +++
 .../testsuite/libgomp.oacc-fortran/pr70828-3.f90   | 34 
 .../testsuite/libgomp.oacc-fortran/pr70828-5.f90   | 29 +++
 libgomp/testsuite/libgomp.oacc-fortran/pr70828.f90 | 24 ++
 11 files changed, 354 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/acc-data-chain.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/pr70828.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/pr70828-2.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/pr70828.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/implicit_copy.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/pr70828-2.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/pr70828-3.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/pr70828-5.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/pr70828.f90

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index dbd0f0e..d704aef 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -191,6 +191,7 @@ struct gimplify_omp_ctx
   bool target_map_scalars_firstprivate;
   bool target_map_pointers_as_0len_arrays;
   bool target_firstprivatize_array_bases

Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory

2018-08-16 Thread Julian Brown
On Wed, 15 Aug 2018 21:56:54 +0200
Bernhard Reutner-Fischer  wrote:

> On 15 August 2018 18:46:37 CEST, Julian Brown
>  wrote:
> >On Mon, 13 Aug 2018 12:06:21 -0700
> >Cesar Philippidis  wrote:  
> 
> atttribute has more t than strictly necessary. 
> Don't like signed integer levels where they should be some unsigned. 
> Also don't like single switch cases instead of if.
> And omitting function comments even if the hook way above is
> documented may be ok ish but is a bit lazy ;)

Here's a new version with those comments addressed. I also changed the
logic around a little to avoid adding decls to the vec in omp_context
which would never be given the gang-private attribute.

Re-tested with offloading to NVPTX.

OK?

Julian

2018-08-10  Julian Brown  
Chung-Lin Tang  

gcc/
* config/nvptx/nvptx.c (tree-hash-traits.h): Include.
(gangprivate_shared_size): New global variable.
(gangprivate_shared_align): Likewise.
(gangprivate_shared_sym): Likewise.
(gangprivate_shared_hmap): Likewise.
(nvptx_option_override): Initialize gangprivate_shared_sym,
gangprivate_shared_align.
(nvptx_file_end): Output gangprivate_shared_sym.
(nvptx_goacc_expand_accel_var): New function.
(nvptx_set_current_function): New function.
(TARGET_SET_CURRENT_FUNCTION): Define hook.
(TARGET_GOACC_EXPAND_ACCEL): Likewise.
* doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
* expr.c (expand_expr_real_1): Remap decls marked with the
"oacc gangprivate" attribute.
* omp-low.c (omp_context): Add oacc_partitioning_level and
oacc_addressable_var_decls fields.
(new_omp_context): Initialize oacc_addressable_var_decls in new
omp_context.
(delete_omp_context): Delete oacc_addressable_var_decls in old
omp_context.
(lower_oacc_head_tail): Record partitioning-level count in omp context.
(oacc_record_private_var_clauses, oacc_record_vars_in_bind)
(mark_oacc_gangprivate): New functions.
(lower_omp_for): Call oacc_record_private_var_clauses with "for"
clauses.  Call mark_oacc_gangprivate for gang-partitioned loops.
(lower_omp_target): Call oacc_record_private_var_clauses with "target"
clauses.
Call mark_oacc_gangprivate for offloaded target regions.
(lower_omp_1): Call vars_in_bind for GIMPLE_BIND within OMP regions.
* target.def (expand_accel_var): New hook.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
* testsuite/libgomp.oacc-c/pr85465.c: New test.
* testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: New test.
commit e276442550a85b62866ba13890eacf4e946d1079
Author: Julian Brown 
Date:   Thu Aug 9 20:27:04 2018 -0700

[OpenACC] Add support for gang local storage allocation in shared memory

2018-08-10  Julian Brown  
	Chung-Lin Tang  

	gcc/
	* config/nvptx/nvptx.c (tree-hash-traits.h): Include.
	(gangprivate_shared_size): New global variable.
	(gangprivate_shared_align): Likewise.
	(gangprivate_shared_sym): Likewise.
	(gangprivate_shared_hmap): Likewise.
	(nvptx_option_override): Initialize gangprivate_shared_sym,
	gangprivate_shared_align.
	(nvptx_file_end): Output gangprivate_shared_sym.
	(nvptx_goacc_expand_accel_var): New function.
	(nvptx_set_current_function): New function.
	(TARGET_SET_CURRENT_FUNCTION): Define hook.
	(TARGET_GOACC_EXPAND_ACCEL): Likewise.
	* doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
	* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
	* expr.c (expand_expr_real_1): Remap decls marked with the
	"oacc gangprivate" attribute.
	* omp-low.c (omp_context): Add oacc_partitioning_level and
	oacc_addressable_var_decls fields.
	(new_omp_context): Initialize oacc_addressable_var_decls in new
	omp_context.
	(delete_omp_context): Delete oacc_addressable_var_decls in old
	omp_context.
	(lower_oacc_head_tail): Record partitioning-level count in omp context.
	(oacc_record_private_var_clauses, oacc_record_vars_in_bind)
	(mark_oacc_gangprivate): New functions.
	(lower_omp_for): Call oacc_record_private_var_clauses with "for"
	clauses.  Call mark_oacc_gangprivate for gang-partitioned loops.
	(lower_omp_target): Call oacc_record_private_var_clauses with "target"
	clauses.
	Call mark_oacc_gangprivate for offloaded target regions.
	(lower_omp_1): Call vars_in_bind for GIMPLE_BIND within OMP regions.
	* target.def (expand_accel_var): New hook.

	libgomp/
	* testsuite/libgomp.oacc-c-c++

Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory

2018-08-15 Thread Julian Brown
On Mon, 13 Aug 2018 12:06:21 -0700
Cesar Philippidis  wrote:

> So in other words, this is safe for fortran. It probably could use a
> fortran test, because that functionality wasn't explicitly exercised
> in og7/og8.

Here's a new version of the patch with a Fortran test case. It's not
too easy to write a test that depends on whether gang-local variables
actually end up in the right kind of memory, so I wrote one that scans
the omplower dump instead. Many other (including execution) tests will
already trigger the new behaviour.

Tested with offloading to NVPTX.

OK?

Thanks,

Julian

2018-08-10  Julian Brown  
Chung-Lin Tang  

gcc/
* config/nvptx/nvptx.c (tree-hash-traits.h): Include.
(gangprivate_shared_size): New global variable.
(gangprivate_shared_align): Likewise.
(gangprivate_shared_sym): Likewise.
(gangprivate_shared_hmap): Likewise.
(nvptx_option_override): Initialize gangprivate_shared_sym,
gangprivate_shared_align.
(nvptx_file_end): Output gangprivate_shared_sym.
(nvptx_goacc_expand_accel_var): New function.
(nvptx_set_current_function): New function.
(TARGET_SET_CURRENT_FUNCTION): Define hook.
(TARGET_GOACC_EXPAND_ACCEL): Likewise.
* doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
* expr.c (expand_expr_real_1): Remap decls marked with the
"oacc gangprivate" atttribute.
* omp-low.c (omp_context): Add oacc_partitioning_level and oacc_decls
fields.
(new_omp_context): Initialize oacc_decls in new omp_context.
(delete_omp_context): Delete oacc_decls in old omp_context.
(lower_oacc_head_tail): Record partitioning-level count in omp context.
(oacc_record_private_var_clauses, oacc_record_vars_in_bind)
(mark_oacc_gangprivate): New functions.
(lower_omp_for): Call oacc_record_private_var_clauses with "for"
clauses.  Call mark_oacc_gangprivate for gang-partitioned loops.
(lower_omp_target): Call oacc_record_private_var_clauses with "target"
clauses.
Call mark_oacc_gangprivate for offloaded target regions.
(lower_omp_1): Call vars_in_bind for GIMPLE_BIND within OMP regions.
* target.def (expand_accel_var): New hook.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
* testsuite/libgomp.oacc-c/pr85465.c: New test.
* testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: New test.
commit b73428237720be8d5b6e793f8615204356336d30
Author: Julian Brown 
Date:   Thu Aug 9 20:27:04 2018 -0700

[OpenACC] Add support for gang local storage allocation in shared memory

2018-08-10  Julian Brown  
	Chung-Lin Tang  

	gcc/
	* config/nvptx/nvptx.c (tree-hash-traits.h): Include.
	(gangprivate_shared_size): New global variable.
	(gangprivate_shared_align): Likewise.
	(gangprivate_shared_sym): Likewise.
	(gangprivate_shared_hmap): Likewise.
	(nvptx_option_override): Initialize gangprivate_shared_sym,
	gangprivate_shared_align.
	(nvptx_file_end): Output gangprivate_shared_sym.
	(nvptx_goacc_expand_accel_var): New function.
	(nvptx_set_current_function): New function.
	(TARGET_SET_CURRENT_FUNCTION): Define hook.
	(TARGET_GOACC_EXPAND_ACCEL): Likewise.
	* doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
	* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
	* expr.c (expand_expr_real_1): Remap decls marked with the
	"oacc gangprivate" atttribute.
	* omp-low.c (omp_context): Add oacc_partitioning_level and oacc_decls
	fields.
	(new_omp_context): Initialize oacc_decls in new omp_context.
	(delete_omp_context): Delete oacc_decls in old omp_context.
	(lower_oacc_head_tail): Record partitioning-level count in omp context.
	(oacc_record_private_var_clauses, oacc_record_vars_in_bind)
	(mark_oacc_gangprivate): New functions.
	(lower_omp_for): Call oacc_record_private_var_clauses with "for"
	clauses.  Call mark_oacc_gangprivate for gang-partitioned loops.
	(lower_omp_target): Call oacc_record_private_var_clauses with "target"
	clauses.
	Call mark_oacc_gangprivate for offloaded target regions.
	(lower_omp_1): Call vars_in_bind for GIMPLE_BIND within OMP regions.
	* target.def (expand_accel_var): New hook.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
	* testsuite/libgomp.oacc-c/pr85465.c: New test.
	* testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: New test.

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx

Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory

2018-08-13 Thread Julian Brown
On Mon, 13 Aug 2018 11:42:26 -0700
Cesar Philippidis  wrote:

> On 08/13/2018 09:21 AM, Julian Brown wrote:
> 
> > diff --git
> > a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
> > b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c new file
> > mode 100644 index 000..2fa708a --- /dev/null
> > +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
> > @@ -0,0 +1,106 @@
> > +/* { dg-xfail-run-if "gangprivate
> > failure" { openacc_nvidia_accel_selected } { "-O0" } { "" } } */  
> 
> As a quick comment, I like the approach that you've taken with this
> patch, but the og8 patch only applies the gangprivate attribute in the
> c/c++ FE. I'd have to review the notes, but I seem to recall that
> excluding that clause in fortran was deliberate. Chung-Lin, do you
> recall the rationale behind that?
> 
> With that aside, is the above xfail still necessary? It seems to xpass
> for me on nvptx. However, I see this regression on the host:
> 
> FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/loop-gwv-2.c
> -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1  -O2  execution test
> 
> There could be other regressions, but I only tested the new tests
> introduced by the patch so far.

Oops, this was the version of the patch I meant to post (and the one I
tested). The XFAIL on loop-gwv-2.c isn't necessary, plus that test
needed some other fixes to make it pass for NVPTX (it was written for
GCN to start with).

Everything else is the same. I'll see what I can come up with for a
Fortran test.

Thanks,

Julian
commit 7834b2f0dffec3e56e510c04e1663424b778fdfb
Author: Julian Brown 
Date:   Thu Aug 9 20:27:04 2018 -0700

[OpenACC] Add support for gang local storage allocation in shared memory

2018-08-10  Julian Brown  
	Chung-Lin Tang  

	gcc/
	* config/nvptx/nvptx.c (tree-hash-traits.h): Include.
	(gangprivate_shared_size): New global variable.
	(gangprivate_shared_align): Likewise.
	(gangprivate_shared_sym): Likewise.
	(gangprivate_shared_hmap): Likewise.
	(nvptx_option_override): Initialize gangprivate_shared_sym,
	gangprivate_shared_align.
	(nvptx_file_end): Output gangprivate_shared_sym.
	(nvptx_goacc_expand_accel_var): New function.
	(nvptx_set_current_function): New function.
	(TARGET_SET_CURRENT_FUNCTION): Define hook.
	(TARGET_GOACC_EXPAND_ACCEL): Likewise.
	* doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
	* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
	* expr.c (expand_expr_real_1): Remap decls marked with the
	"oacc gangprivate" atttribute.
	* omp-low.c (omp_context): Add oacc_partitioning_level and oacc_decls
	fields.
	(new_omp_context): Initialize oacc_decls in new omp_context.
	(delete_omp_context): Delete oacc_decls in old omp_context.
	(lower_oacc_head_tail): Record partitioning-level count in omp context.
	(oacc_record_private_var_clauses, oacc_record_vars_in_bind)
	(mark_oacc_gangprivate): New functions.
	(lower_omp_for): Call oacc_record_private_var_clauses with "for"
	clauses.  Call mark_oacc_gangprivate for gang-partitioned loops.
	(lower_omp_target): Call oacc_record_private_var_clauses with "target"
	clauses.
	Call mark_oacc_gangprivate for offloaded target regions.
	(lower_omp_1): Call vars_in_bind for GIMPLE_BIND within OMP regions.
	* target.def (expand_accel_var): New hook.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
	* testsuite/libgomp.oacc-c/pr85465.c: New test.

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index c0b0a2e..14eb842 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -73,6 +73,7 @@
 #include "cfgloop.h"
 #include "fold-const.h"
 #include "intl.h"
+#include "tree-hash-traits.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -137,6 +138,12 @@ static unsigned worker_red_size;
 static unsigned worker_red_align;
 static GTY(()) rtx worker_red_sym;
 
+/* Shared memory block for gang-private variables.  */
+static unsigned gangprivate_shared_size;
+static unsigned gangprivate_shared_align;
+static GTY(()) rtx gangprivate_shared_sym;
+static hash_map gangprivate_shared_hmap;
+
 /* Global lock variable, needed for 128bit worker & gang reductions.  */
 static GTY(()) tree global_lock_var;
 
@@ -210,6 +217,10 @@ nvptx_option_override (void)
   SET_SYMBOL_DATA_AREA (worker_red_sym, DATA_AREA_SHARED);
   worker_red_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
 
+  gangprivate_shared_sym = gen_rtx_SYMBOL_REF (Pmode, "__gangprivate_shared");
+  SET_SYMBOL_DATA_AREA (gangprivate_s

[PATCH, OpenACC] Add support for gang local storage allocation in shared memory

2018-08-13 Thread Julian Brown
This patch adds support for placing gang-private variables in NVPTX
per-CU shared memory. This is done by marking up addressable variables
declared at the appropriate parallelism level with an attribute ("oacc
gangprivate") in omp-low.c.

Target-dependent code in the NVPTX backend then modifies the symbol
associated with the variable at expand time via a new target hook
(TARGET_GOACC_EXPAND_ACCEL_VAR) in order to place it in shared memory,
which is faster to access than the ".local" memory that would otherwise
be used for such variables. This has (theoretical, at least)
consequences on program semantics, in that the shared memory is also
statically-allocated rather than obeying stack discipline -- but you
can't have recursive routine calls in OpenACC anyway, so that's no big
deal.

Other targets can use the same attribute in different ways, as
appropriate.
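
As a rough illustration of the kind of variable this is aimed at, here
is a small sketch. The function name and sizes are invented for the
example, and whether "tmp" is actually given the attribute depends on
the checks in omp-low.c, so treat that part as an assumption. "tmp" is
an addressable array declared in the body of a gang-partitioned loop,
so each gang has its own copy, shared by the vector lanes of that gang.

```c
#define GANGS 8
#define LANES 4

/* Each gang fills its own private scratch array and then writes the
   sum of that array to out[g].  */
void
gang_private_demo (int out[GANGS])
{
#pragma acc parallel num_gangs(GANGS) copyout(out[0:GANGS])
  {
#pragma acc loop gang
    for (int g = 0; g < GANGS; g++)
      {
	/* One instance per gang: a candidate for shared memory.  */
	int tmp[LANES];

#pragma acc loop vector
	for (int i = 0; i < LANES; i++)
	  tmp[i] = g + i;

	/* Executed once per gang after the vector loop completes.  */
	int sum = 0;
	for (int i = 0; i < LANES; i++)
	  sum += tmp[i];
	out[g] = sum;
      }
  }
}
```

Without -fopenacc the pragmas are ignored and the loops run
sequentially on the host, with the same values ending up in "out".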

OK for trunk?

Thanks,

Julian

2018-08-10  Julian Brown  
Chung-Lin Tang  

gcc/
* config/nvptx/nvptx.c (tree-hash-traits.h): Include.
(gangprivate_shared_size): New global variable.
(gangprivate_shared_align): Likewise.
(gangprivate_shared_sym): Likewise.
(gangprivate_shared_hmap): Likewise.
(nvptx_option_override): Initialize gangprivate_shared_sym,
gangprivate_shared_align.
(nvptx_file_end): Output gangprivate_shared_sym.
(nvptx_goacc_expand_accel_var): New function.
(nvptx_set_current_function): New function.
(TARGET_SET_CURRENT_FUNCTION): Define hook.
(TARGET_GOACC_EXPAND_ACCEL): Likewise.
* doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
* expr.c (expand_expr_real_1): Remap decls marked with the
"oacc gangprivate" atttribute.
* omp-low.c (omp_context): Add oacc_partitioning_level and oacc_decls
fields.
(new_omp_context): Initialize oacc_decls in new omp_context.
(delete_omp_context): Delete oacc_decls in old omp_context.
(lower_oacc_head_tail): Record partitioning-level count in omp context.
(oacc_record_private_var_clauses, oacc_record_vars_in_bind)
(mark_oacc_gangprivate): New functions.
(lower_omp_for): Call oacc_record_private_var_clauses with "for"
clauses.  Call mark_oacc_gangprivate for gang-partitioned loops.
(lower_omp_target): Call oacc_record_private_var_clauses with "target"
clauses.
Call mark_oacc_gangprivate for offloaded target regions.
(lower_omp_1): Call vars_in_bind for GIMPLE_BIND within OMP regions.
* target.def (expand_accel_var): New hook.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
* testsuite/libgomp.oacc-c/pr85465.c: New test.
commit 9637e7ea887e100f35d99b8d12101f9f8a9b94e3
Author: Julian Brown 
Date:   Thu Aug 9 20:27:04 2018 -0700

[OpenACC] Add support for gang local storage allocation in shared memory

2018-08-10  Julian Brown  
	Chung-Lin Tang  

	gcc/
	* config/nvptx/nvptx.c (tree-hash-traits.h): Include.
	(gangprivate_shared_size): New global variable.
	(gangprivate_shared_align): Likewise.
	(gangprivate_shared_sym): Likewise.
	(gangprivate_shared_hmap): Likewise.
	(nvptx_option_override): Initialize gangprivate_shared_sym,
	gangprivate_shared_align.
	(nvptx_file_end): Output gangprivate_shared_sym.
	(nvptx_goacc_expand_accel_var): New function.
	(nvptx_set_current_function): New function.
	(TARGET_SET_CURRENT_FUNCTION): Define hook.
	(TARGET_GOACC_EXPAND_ACCEL): Likewise.
	* doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
	* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
	* expr.c (expand_expr_real_1): Remap decls marked with the
	"oacc gangprivate" atttribute.
	* omp-low.c (omp_context): Add oacc_partitioning_level and oacc_decls
	fields.
	(new_omp_context): Initialize oacc_decls in new omp_context.
	(delete_omp_context): Delete oacc_decls in old omp_context.
	(lower_oacc_head_tail): Record partitioning-level count in omp context.
	(oacc_record_private_var_clauses, oacc_record_vars_in_bind)
	(mark_oacc_gangprivate): New functions.
	(lower_omp_for): Call oacc_record_private_var_clauses with "for"
	clauses.  Call mark_oacc_gangprivate for gang-partitioned loops.
	(lower_omp_target): Call oacc_record_private_var_clauses with "target"
	clauses.
	Call mark_oacc_gangprivate for offloaded target regions.
	(lower_omp_1): Call vars_in_bind for GIMPLE_BIND within OMP regions.
	* target.def (expand_accel_var): New hook.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
   

Re: ivopts vs. garbage collection

2016-01-12 Thread Julian Brown
On Mon, 11 Jan 2016 13:51:25 -0700
Tom Tromey  wrote:

> > "Michael" == Michael Matz  writes:  
> 
> Michael> Well, that's a hack.  A solution is to design something that
> Michael> works generally for garbage collected languages with such
> Michael> requirements instead of arbitrarily limiting transformations
> Michael> here and there.  It could be something like the notion of
> Michael> derived pointers, where the base pointer needs to stay alive
> Michael> as long as the derived pointers are.  
> 
> This was done once in GCC, for the Modula 3 compiler.
> There was a paper about it, but I can't find it any more.
> 
> The basic idea was to emit a description of the stack frame that their
> GC could read.  They had a moving GC that could use this information
> to rewrite the frame when moving objects.

This one perhaps?

https://www.cs.purdue.edu/homes/hosking/papers/ismm06.pdf

Julian


Re: [PATCH, libgomp] Rewire OpenACC async

2015-12-01 Thread Julian Brown
On Tue, 24 Nov 2015 18:27:24 +0800
Chung-Lin Tang  wrote:

> Hi, this patch reworks some of the way that asynchronous copyouts are
> implemented for OpenACC in libgomp.
> 
> Before this patch, we had a somewhat confusing way of implementing
> this by having two refcounts for each mapping: refcount and
> async_refcount, which I never got working again after the last wave
> of async regressions showed up.
> 
> So this patch implements what I believe to be a simplification:
> async_refcount is removed, and instead of trying to queue the async
> copyouts during unmapping we actually do that during the plugin event
> handling. This requires an addition of the async stream integer as an
> argument to the register_async_cleanup plugin hook, but overall I
> think this should be more elegant than before.

This looks OK to me I think (I've only looked fairly briefly). I vaguely
remember trying something along these lines in an earlier iteration of
the async support -- maybe hitting problems with locking (I see you
have code to mitigate problems with that, and locking generally has
probably evolved a bit since I last looked at the code in detail
anyway).

Can event_gc ever be called when the *device* lock is held?

I'm slightly concerned that pushing async unmapping into event_gc means
that program-level semantics are deferred to the backend, which is
arguably the wrong place. But then I don't understand what went wrong
with the dual-refcount implementation, so maybe it's unavoidable for
some reason.

HTH,

Julian


Re: [OpenACC 0/7] host_data construct

2015-11-30 Thread Julian Brown
On Thu, 19 Nov 2015 16:57:23 +0100
Jakub Jelinek <ja...@redhat.com> wrote:

> If it is unclear, I think disallowing acc {parallel,kernels} inside of
> acc host_data might be too big hammer, but perhaps just erroring out
> or warning during gimplification that if you (explicitly or
> implicitly) try to map a var that is in use_device clause in some
> outer context, it is either wrong, unsupported or will not do what
> users think?

I think we can only assume that trying to map a variable declared in
a surrounding use_device clause is undefined behaviour. I haven't had
any response to my questions about host_data & deviceptr on the OpenACC
list.

> > #pragma acc host_data use_device(x)
> > {
> >   target_primitive(x);
> >   #pragma acc parallel deviceptr(x)
> >   {
> >     ...
> >   }
> > }
> 
> Is deviceptr as above meant to work?  That is the OpenACC counterpart
> of is_device_ptr, right?  If yes, then I'd suggest just warning if you
> try to implicitly or explicitly map something use_device in outer
> contexts, and just make sure you don't ICE on the cases where you
> warn. If the standard does not say what it means, then it is
> unspecified behavior...

A problem with deviceptr, unlike is_device_ptr, is that it turns out to
be defined only to work with pointers, not arrays (OpenACC 2.0a
2.6.5.2), and there are no rules describing the latter decaying to the
former. So at least if 'x' is an array, it appears the answer is "no".

So, the attached patch disallows (via raising an error):

* Variables being declared in explicit mapping clauses that are
  declared in enclosing host_data regions.

* Variables being implicitly used (mapped) in offloaded regions that
  are declared in enclosing host_data regions.

It's otherwise equivalent to the previously-posted version, but without
the hacks to {maybe_,}lookup_decl_in_outer_ctx. I added checks for the
above conditions during gimplification, which seemed to be about the
same phase that other similar kinds of errors are diagnosed.
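
To make the two cases concrete, here is a small sketch of usage that
stays within those rules, with the rejected forms described in a
comment. "target_primitive" is a hypothetical stand-in for a routine
that accepts a device address (it only tests the pointer and never
dereferences it), and the function name is made up; the exact wording
of the diagnostics is not shown here.

```c
#include <stddef.h>

/* Hypothetical consumer of a device address; never dereferences P.  */
static int
target_primitive (const char *p)
{
  return p != NULL;
}

int
host_data_cases (void)
{
  static char x[1024];
  int ok = 0;

#pragma acc data copyin(x)
  {
#pragma acc host_data use_device(x)
    {
      /* Fine: x is only passed on as an address.  */
      ok = target_primitive (x);

      /* Rejected with this patch if written at this point:
	 - explicit: "#pragma acc parallel present(x)" (or any other
	   mapping clause naming x);
	 - implicit: "#pragma acc parallel" whose body uses x without
	   a data clause.  */
    }
  }

  return ok;
}
```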

Tests look OK (libgomp/gcc/g++/libstdc++), and the new ones pass.

OK for mainline?

Thanks,

Julian

ChangeLog

Julian Brown  <jul...@codesourcery.com>
Cesar Philippidis  <ce...@codesourcery.com>
James Norris  <james_nor...@mentor.com>

gcc/
* c-family/c-pragma.c (oacc_pragmas): Add PRAGMA_OACC_HOST_DATA.
* c-family/c-pragma.h (pragma_kind): Add PRAGMA_OACC_HOST_DATA.
(pragma_omp_clause): Add PRAGMA_OACC_CLAUSE_USE_DEVICE.
* c/c-parser.c (c_parser_omp_clause_name): Add use_device support.
(c_parser_oacc_clause_use_device): New function.
(c_parser_oacc_all_clauses): Add use_device support.
(OACC_HOST_DATA_CLAUSE_MASK): New macro.
(c_parser_oacc_host_data): New function.
(c_parser_omp_construct): Add host_data support.
* c/c-tree.h (c_finish_oacc_host_data): Add prototype.
* c/c-typeck.c (c_finish_oacc_host_data): New function.
(c_finish_omp_clauses): Add use_device support.
* cp/cp-tree.h (finish_oacc_host_data): Add prototype.
* cp/parser.c (cp_parser_omp_clause_name): Add use_device support.
(cp_parser_oacc_all_clauses): Add use_device support.
(OACC_HOST_DATA_CLAUSE_MASK): New macro.
(cp_parser_oacc_host_data): New function.
(cp_parser_omp_construct): Add host_data support.
(cp_parser_pragma): Add host_data support.
* cp/semantics.c (finish_omp_clauses): Add use_device support.
(finish_oacc_host_data): New function.
* gimple-pretty-print.c (dump_gimple_omp_target): Add host_data
support.
* gimple.h (gf_mask): Add GF_OMP_TARGET_KIND_OACC_HOST_DATA.
(is_gimple_omp_oacc): Add support for above.
* gimplify.c (omp_region_type): Add ORT_ACC_HOST_DATA.
(omp_notice_variable): Diagnose undefined implicit uses of
use_device variables in offloaded regions.
(gimplify_scan_omp_clauses): Add host_data, use_device
support. Diagnose undefined mapping of use_device variables in
OpenACC clauses.
(gimplify_omp_workshare): Add host_data support.
(gimplify_expr): Likewise.
* omp-builtins.def (BUILT_IN_GOACC_HOST_DATA): New.
* omp-low.c (lookup_decl_in_outer_ctx)
(maybe_lookup_decl_in_outer_ctx): Add optional argument to skip
host_data regions.
(scan_sharing_clauses): Support use_device.
(check_omp_nesting_restrictions): Support host_data.
(expand_omp_target): Support host_data.
(lower_omp_target): Skip over outer host_data regions when looking
up decls. Support use_device.
(make_gimple_omp_edges): Support host_data.
* tree-nested.c (convert_nonlocal_omp_clauses): Add use_device
clause.

libgomp/
* oacc-parallel.c (GOACC_host_data): New function.
* libgomp.map (GOACC_host_data): Add to GOACC_2.0.1.
* testsuite/libgomp.oacc-c-c++-common/host_data-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/host_data-2.c: New t

Re: [OpenACC 0/7] host_data construct

2015-11-19 Thread Julian Brown
On Thu, 19 Nov 2015 14:13:45 +0100
Jakub Jelinek <ja...@redhat.com> wrote:

> On Wed, Nov 18, 2015 at 12:47:47PM +0000, Julian Brown wrote:
> 
> The FE/gimplifier part is okay, but I really don't like the
> omp-low.c changes, mostly the *lookup_decl_in_outer_ctx* changes.
> If I count well, we have right now 27 maybe_lookup_decl_in_outer_ctx
> callers and 7 lookup_decl_in_outer_ctx callers, you want to change
> behavior of 1 maybe_lookup_decl_in_outer_ctx and 1
> lookup_decl_in_outer_ctx.  Why exactly those 2 and not the others?

The not-very-good reason is that those are merely the places that
allowed the supplied examples to work, and I'm wary of changing other
code that I don't understand very well.

> What are the exact rules (what does the standard say about it)?
> I'd expect that all phases (scan_sharing_clauses, lower_omp* and
> expand_omp*) should agree on the same behavior, otherwise I can't see
> how it can work properly.

OK, thanks -- as to what the standard says, it's so ill-specified in
this area that nothing can be learned about the behaviour of offloaded
regions within host_data constructs, and my question about that on the
technical mailing list is still unanswered (actually Nathan suggested
in private mail that the conservative thing to do would be to disallow
offloaded regions entirely within host_data constructs, so maybe that's
the way to go).

OpenMP 4.5 seems to *not* specify the skipping-over behaviour for
use_device_ptr variables (p105, lines 20-23):

"The is_device_ptr clause is used to indicate that a list item is a
device pointer already in the device data environment and that it
should be used directly. Support for device pointers created outside
of OpenMP, specifically outside of the omp_target_alloc routine and the
use_device_ptr clause, is implementation defined."

That suggests that use_device_ptr is a valid way to create device
pointers for use in enclosed target regions: the behaviour I assumed
was wrong for OpenACC. So I think my guess at the "most-obvious"
behaviour was probably misguided anyway.

It's maybe even more complicated. Consider the example:

char x[1024];

#pragma acc enter data copyin(x)

#pragma acc host_data use_device(x)
{
  target_primitive(x);
  #pragma acc parallel present(x)[1]
  {
    x[5] = 0; [2]
  }
}

Here, the "present" clause marked [1] will fail (because 'x' is a
target pointer now). If it's omitted, the array access [2] will cause an
implicit present_or_copy to be used for the 'x' pointer (which again
will fail, because now 'x' points to target data). Maybe what we
actually need is,

#pragma acc host_data use_device(x)
{
  target_primitive(x);
  #pragma acc parallel deviceptr(x)
  {
...
  }
}

with the deviceptr(x) clause magically substituted in the parallel
construct, but I'm struggling to see how we could justify doing that
when that behaviour's not mentioned in the spec at all.

Aha, so: maybe manually using deviceptr(x) is implicitly mandatory in
this situation, and missing it out should be an error? That suddenly
seems to make most sense. I'll see about fixing the patch to do that.
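
As a rough illustration of why [1] and [2] fail while deviceptr(x) would
not, here is a toy model of a present-table lookup. It assumes the lookup
always treats its key as a host address. Every name in it (toy_mapping,
toy_lookup_present, toy_deviceptr, the addresses) is made up for the
sketch and is not libgomp's actual data structure or API.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: a one-entry "present table" mapping a host address
   range to a device address.  All names and addresses are hypothetical.  */
struct toy_mapping
{
  uintptr_t host_start;
  uintptr_t host_end;   /* One past the last mapped host byte.  */
  uintptr_t dev_start;
};

/* Model of a present()/present_or_copy lookup: the key is always treated
   as a *host* address.  Returns the device address, or 0 if not mapped.  */
static uintptr_t
toy_lookup_present (const struct toy_mapping *m, uintptr_t key)
{
  if (key >= m->host_start && key < m->host_end)
    return m->dev_start + (key - m->host_start);
  return 0;
}

/* Model of deviceptr(): the value is already a device address, so it is
   passed through with no table lookup at all.  */
static uintptr_t
toy_deviceptr (uintptr_t dev_addr)
{
  return dev_addr;
}

/* Walk through the cases from the example.  Returns a bitmask:
   bit 0: present() with the host 'x' succeeds;
   bit 1: present() with the device 'x' succeeds;
   bit 2: deviceptr() with the device 'x' yields the device address.  */
static int
toy_host_data_cases (void)
{
  struct toy_mapping m = { 0x1000, 0x1000 + 1024, 0x900000 };
  uintptr_t host_x = 0x1000;
  /* Inside host_data use_device(x), 'x' names the device address.  */
  uintptr_t dev_x = toy_lookup_present (&m, host_x);
  int result = 0;

  assert (dev_x == m.dev_start);

  if (toy_lookup_present (&m, host_x) != 0)
    result |= 1;
  if (toy_lookup_present (&m, dev_x) != 0)
    result |= 2;
  if (toy_deviceptr (dev_x) == m.dev_start)
    result |= 4;
  return result;
}
```

In this model, toy_host_data_cases () sets bits 0 and 2 only: the lookup
keyed on the device address finds nothing, which corresponds to the
failures at [1] and [2], whereas the pass-through needs no lookup.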

Julian


Re: [OpenACC 0/7] host_data construct

2015-11-18 Thread Julian Brown
On Thu, 12 Nov 2015 11:16:21 +
Julian Brown <jul...@codesourcery.com> wrote:

> Here's a version of the patch which (hopefully) brings OpenACC on par
> with OpenMP with respect to use_device/use_device_ptr variables. The
> implementation is essentially the same now for OpenACC as for OpenMP
> (i.e. using mapping structures): so for now, only array or pointer
> variables can be used as use_device variables. The included tests have
> been adjusted accordingly.

Here's a rebased version of the patch, since the previous version no
longer applies cleanly. Re-tested OK (libgomp tests). ChangeLog as
before. (Ping.)

Julian

commit 0201a5927c380da65d6400afad4a0e277fb85786
Author: Julian Brown <jul...@codesourcery.com>
Date:   Mon Nov 2 06:31:47 2015 -0800

OpenACC host_data support using mapping regions.

diff --git a/gcc/c-family/c-pragma.c b/gcc/c-family/c-pragma.c
index 12c3e75..56cf697 100644
--- a/gcc/c-family/c-pragma.c
+++ b/gcc/c-family/c-pragma.c
@@ -1251,6 +1251,7 @@ static const struct omp_pragma_def oacc_pragmas[] = {
   { "declare", PRAGMA_OACC_DECLARE },
   { "enter", PRAGMA_OACC_ENTER_DATA },
   { "exit", PRAGMA_OACC_EXIT_DATA },
+  { "host_data", PRAGMA_OACC_HOST_DATA },
   { "kernels", PRAGMA_OACC_KERNELS },
   { "loop", PRAGMA_OACC_LOOP },
   { "parallel", PRAGMA_OACC_PARALLEL },
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 999ac67..dd246b9 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -33,6 +33,7 @@ enum pragma_kind {
   PRAGMA_OACC_DECLARE,
   PRAGMA_OACC_ENTER_DATA,
   PRAGMA_OACC_EXIT_DATA,
+  PRAGMA_OACC_HOST_DATA,
   PRAGMA_OACC_KERNELS,
   PRAGMA_OACC_LOOP,
   PRAGMA_OACC_PARALLEL,
@@ -167,6 +168,7 @@ enum pragma_omp_clause {
   PRAGMA_OACC_CLAUSE_SELF,
   PRAGMA_OACC_CLAUSE_SEQ,
   PRAGMA_OACC_CLAUSE_TILE,
+  PRAGMA_OACC_CLAUSE_USE_DEVICE,
   PRAGMA_OACC_CLAUSE_VECTOR,
   PRAGMA_OACC_CLAUSE_VECTOR_LENGTH,
   PRAGMA_OACC_CLAUSE_WAIT,
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 7b10764..0a5c8bb 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -10267,6 +10267,8 @@ c_parser_omp_clause_name (c_parser *parser)
 	result = PRAGMA_OMP_CLAUSE_UNTIED;
 	  else if (!strcmp ("use_device_ptr", p))
 	result = PRAGMA_OMP_CLAUSE_USE_DEVICE_PTR;
+	  else if (!strcmp ("use_device", p))
+	result = PRAGMA_OACC_CLAUSE_USE_DEVICE;
 	  break;
 	case 'v':
 	  if (!strcmp ("vector", p))
@@ -11619,6 +11621,15 @@ c_parser_oacc_clause_tile (c_parser *parser, tree list)
   return c;
 }
 
+/* OpenACC 2.0:
+   use_device ( variable-list ) */
+
+static tree
+c_parser_oacc_clause_use_device (c_parser *parser, tree list)
+{
+  return c_parser_omp_var_list_parens (parser, OMP_CLAUSE_USE_DEVICE, list);
+}
+
 /* OpenACC:
wait ( int-expr-list ) */
 
@@ -12928,6 +12939,10 @@ c_parser_oacc_all_clauses (c_parser *parser, omp_clause_mask mask,
 	  clauses = c_parser_oacc_data_clause (parser, c_kind, clauses);
 	  c_name = "self";
 	  break;
+	case PRAGMA_OACC_CLAUSE_USE_DEVICE:
+	  clauses = c_parser_oacc_clause_use_device (parser, clauses);
+	  c_name = "use_device";
+	  break;
 	case PRAGMA_OACC_CLAUSE_SEQ:
 	  clauses = c_parser_oacc_simple_clause (parser, OMP_CLAUSE_SEQ,
 		clauses);
@@ -13577,6 +13592,29 @@ c_parser_oacc_enter_exit_data (c_parser *parser, bool enter)
 
 
 /* OpenACC 2.0:
+   # pragma acc host_data oacc-data-clause[optseq] new-line
+ structured-block
+*/
+
+#define OACC_HOST_DATA_CLAUSE_MASK	\
+	( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_USE_DEVICE) )
+
+static tree
+c_parser_oacc_host_data (location_t loc, c_parser *parser)
+{
+  tree stmt, clauses, block;
+
+  clauses = c_parser_oacc_all_clauses (parser, OACC_HOST_DATA_CLAUSE_MASK,
+   "#pragma acc host_data");
+
+  block = c_begin_omp_parallel ();
+  add_stmt (c_parser_omp_structured_block (parser));
+  stmt = c_finish_oacc_host_data (loc, clauses, block);
+  return stmt;
+}
+
+
+/* OpenACC 2.0:
 
# pragma acc loop oacc-loop-clause[optseq] new-line
  structured-block
@@ -16884,6 +16922,9 @@ c_parser_omp_construct (c_parser *parser)
 case PRAGMA_OACC_DATA:
   stmt = c_parser_oacc_data (loc, parser);
   break;
+case PRAGMA_OACC_HOST_DATA:
+  stmt = c_parser_oacc_host_data (loc, parser);
+  break;
 case PRAGMA_OACC_KERNELS:
 case PRAGMA_OACC_PARALLEL:
   strcpy (p_name, "#pragma acc");
diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
index 6bc216a..848131e 100644
--- a/gcc/c/c-tree.h
+++ b/gcc/c/c-tree.h
@@ -653,6 +653,7 @@ extern tree c_finish_goto_ptr (location_t, tree);
 extern tree c_expr_to_decl (tree, bool *, bool *);
 extern tree c_finish_omp_construct (location_t, enum tree_code, tree, tree);
 extern tree c_finish_oacc_data (location_t, tree, tree);
+extern tree c_finish_oacc_host_data (location_t, tree, tree);
 extern tree c_begi

Re: [OpenACC 0/7] host_data construct

2015-11-12 Thread Julian Brown
On Mon, 2 Nov 2015 18:33:39 +
Julian Brown <jul...@codesourcery.com> wrote:

> On Mon, 26 Oct 2015 19:34:22 +0100
> Jakub Jelinek <ja...@redhat.com> wrote:
> 
> > Your use_device sounds very similar to use_device_ptr clause in
> > OpenMP, which is allowed on #pragma omp target data construct and is
> > implemented quite a bit differently from this; it is unclear if the
> > OpenACC standard requires this kind of implementation, or you just
> > chose to implement it this way.  In particular, the GOMP_target_data
> > call puts the variables mentioned in the use_device_ptr clauses into
> > the mapping structures (similarly how map clause appears) and the
> > corresponding vars are privatized within the target data region
> > (which is a host region, basically a fancy { } braces), where the
> > private variables contain the offloading device's pointers.  
> 
> As the author of the original patch, I have to say using the mapping
> structures seems like a far better approach, but I've hit some trouble
> with the details of adapting OpenACC to use that method.

Here's a version of the patch which (hopefully) brings OpenACC on par
with OpenMP with respect to use_device/use_device_ptr variables. The
implementation is essentially the same now for OpenACC as for OpenMP
(i.e. using mapping structures): so for now, only array or pointer
variables can be used as use_device variables. The included tests have
been adjusted accordingly.

One awkward part of the implementation concerns nesting offloaded
regions within host_data regions:

#define N 1024

int main (int argc, char* argv[])
{
  int x[N];

#pragma acc data copyin (x[0:N])
  {
    int *xp;
#pragma acc host_data use_device (x)
    {
      [...]
#pragma acc parallel present (x) copyout (xp)
      {
        xp = x;
      }
    }

    assert (xp == acc_deviceptr (x));
  }

  return 0;
}

I think the meaning of 'x' as seen within the clauses of the parallel
directive should be the *host* version of x, not the mapped target
address (I've asked on the OpenACC technical mailing list to clarify
this point, but no reply as yet). The changes to
{maybe_,}lookup_decl_in_outer_ctx "skip over" host_data contexts when
called from lower_omp_target. There's probably an analogous case for
OpenMP, but I've not tried to handle that.
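
The "skip over" lookup can be pictured with a small stand-alone model.
This is only a sketch under assumptions: toy_ctx, toy_lookup_in_outer_ctx
and the integer "bindings" are invented for illustration and do not
reproduce the omp-low.c data structures or the patch itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kinds of enclosing construct.  */
enum toy_ctx_kind { TOY_DATA, TOY_HOST_DATA, TOY_PARALLEL };

struct toy_ctx
{
  enum toy_ctx_kind kind;
  const char *decl_name;   /* Single decl remapped by this context.  */
  int decl_value;          /* Stand-in for the remapped decl.  */
  struct toy_ctx *outer;   /* Lexically-enclosing context, or NULL.  */
};

/* Search the contexts enclosing CTX for NAME, innermost first.  If
   SKIP_HOST_DATA is nonzero, host_data contexts are ignored, so the
   caller sees the binding from the next enclosing context instead.
   Returns the binding's value, or -1 if NAME is not remapped.  */
static int
toy_lookup_in_outer_ctx (const struct toy_ctx *ctx, const char *name,
                         int skip_host_data)
{
  const struct toy_ctx *up;

  assert (ctx != NULL);

  for (up = ctx->outer; up != NULL; up = up->outer)
    {
      if (skip_host_data && up->kind == TOY_HOST_DATA)
        continue;
      if (up->decl_name != NULL && strcmp (up->decl_name, name) == 0)
        return up->decl_value;
    }
  return -1;
}

/* Build data { host_data { parallel } } and look up "x" from the
   parallel context.  The data construct binds x to 1 (think "host x")
   and the host_data construct binds it to 2 (think "device address").  */
static int
toy_lookup_x (int skip_host_data)
{
  struct toy_ctx data = { TOY_DATA, "x", 1, NULL };
  struct toy_ctx host_data = { TOY_HOST_DATA, "x", 2, &data };
  struct toy_ctx parallel = { TOY_PARALLEL, NULL, 0, &host_data };

  return toy_lookup_in_outer_ctx (&parallel, "x", skip_host_data);
}
```

Here toy_lookup_x (0) finds the host_data binding, while toy_lookup_x (1)
skips it and finds the one from the data construct, which is the
behaviour described above for the clauses of the parallel directive.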

No regressions for libgomp tests, and the new tests pass. OK for trunk?

Thanks,

Julian

ChangeLog

Julian Brown  <jul...@codesourcery.com>
Cesar Philippidis  <ce...@codesourcery.com>
James Norris  <james_nor...@mentor.com>

gcc/
* c-family/c-pragma.c (oacc_pragmas): Add PRAGMA_OACC_HOST_DATA.
* c-family/c-pragma.h (pragma_kind): Add PRAGMA_OACC_HOST_DATA.
(pragma_omp_clause): Add PRAGMA_OACC_CLAUSE_USE_DEVICE.
* c/c-parser.c (c_parser_omp_clause_name): Add use_device support.
(c_parser_oacc_clause_use_device): New function.
(c_parser_oacc_all_clauses): Add use_device support.
(OACC_HOST_DATA_CLAUSE_MASK): New macro.
(c_parser_oacc_host_data): New function.
(c_parser_omp_construct): Add host_data support.
* c/c-tree.h (c_finish_oacc_host_data): Add prototype.
* c/c-typeck.c (c_finish_oacc_host_data): New function.
(c_finish_omp_clauses): Add use_device support.
* cp/cp-tree.h (finish_oacc_host_data): Add prototype.
* cp/parser.c (cp_parser_omp_clause_name): Add use_device support.
(cp_parser_oacc_all_clauses): Add use_device support.
(OACC_HOST_DATA_CLAUSE_MASK): New macro.
(cp_parser_oacc_host_data): New function.
(cp_parser_omp_construct): Add host_data support.
(cp_parser_pragma): Add host_data support.
* cp/semantics.c (finish_omp_clauses): Add use_device support.
(finish_oacc_host_data): New function.
* gimple-pretty-print.c (dump_gimple_omp_target): Add host_data
support.
* gimple.h (gf_mask): Add GF_OMP_TARGET_KIND_OACC_HOST_DATA.
(is_gimple_omp_oacc): Add support for above.
* gimplify.c (gimplify_scan_omp_clauses): Add host_data, use_device
support.
(gimplify_omp_workshare): Add host_data support.
(gimplify_expr): Likewise.
* omp-builtins.def (BUILT_IN_GOACC_HOST_DATA): New.
* omp-low.c (lookup_decl_in_outer_ctx)
(maybe_lookup_decl_in_outer_ctx): Add optional argument to skip
host_data regions.
(scan_sharing_clauses): Support use_device.
(check_omp_nesting_restrictions): Support host_data.
(expand_omp_target): Support host_data.
(lower_omp_target): Skip over outer host_data regions when looking
up decls. Support use_device.
(make_gimple_omp_edges): Support host_data.
* tree-nested.c (convert_nonlocal_omp_clauses): Add use_device
clause.

libgomp/
* oacc-parallel.c (GOACC_host_data): New function.
* libgomp.map (GOACC_host_data): Add to GOACC_2.0.1.
* testsuite/libgomp.oacc-c-c++-common/host_data-1.c: New test.
* tests

Re: [PATCH/RFC/RFA] Machine modes for address printing (all targets)

2015-11-09 Thread Julian Brown
On Thu, 5 Nov 2015 11:22:04 +0100
Bernd Schmidt  wrote:

> >  static void
> > -mcore_print_operand_address (FILE * stream, rtx x)
> > +mcore_print_operand_address (FILE * stream, machine_mode mode
> > ATTRIBUTE_UNUSED,
> > +rtx x)  
> 
> So apparently we're settling on writing the unused arg as just 
> "machine_mode" without a name. Please change everywhere.
> 
> > @@ -1754,7 +1754,7 @@ mmix_print_operand_punct_valid_p (unsign
> >  /* TARGET_PRINT_OPERAND_ADDRESS.  */
> >
> >  static void
> > -mmix_print_operand_address (FILE *stream, rtx x)
> > +mmix_print_operand_address (FILE *stream, machine_mode mode, rtx x)
> >  {
> >if (REG_P (x))
> >  {  
> 
> The arg appears to be unused - I'd expect to see a warning here.

I've fixed those two, and a handful of other bits I missed.

> Other than that it looks OK. I'm not going to require that you test 
> every target, but it would be good to have the full set built to cc1 
> before and after, and please be on the lookout for fallout.

Thanks! I used the attached "build-all.sh" to test all the targets
affected by the patch with "make all-gcc": those now all succeed
(I'm sure I reinvented a wheel here, but perhaps the target list is
useful to someone else).

Julian

ChangeLog

gcc/
* final.c (output_asm_insn): Pass VOIDmode to output_address.
(output_address): Add MODE argument. Pass to print_operand_address
hook.
* targhooks.c (default_print_operand_address): Add MODE argument.
* targhooks.h (default_print_operand_address): Update prototype.
* output.h (output_address): Update prototype.
* target.def (print_operand_address): Add MODE argument.
* config/vax/vax.c (print_operand_address): Pass VOIDmode to
output_address.
(print_operand): Pass access mode to output_address.
* config/mcore/mcore.c (mcore_print_operand_address): Add MODE
argument.
(mcore_print_operand): Update calls to mcore_print_operand_address.
* config/fr30/fr30.c (fr30_print_operand): Pass VOIDmode to
output_address.
* config/lm32/lm32.c (lm32_print_operand): Pass mode in calls to
output_address.
* config/tilegx/tilegx.c (output_memory_reference_mode): Remove
global.
(tilegx_print_operand): Don't set above global. Update calls to
output_address.
(tilegx_print_operand_address): Add MODE argument. Use instead of
output_memory_reference_mode global.
* config/frv/frv.c (frv_print_operand_address): Add MODE argument.
(frv_print_operand): Pass mode to frv_print_operand_address calls.
* config/mn10300/mn10300.c (mn10300_print_operand): Pass mode to
output_address.
* config/cris/cris.c (cris_print_operand_address): Add MODE
argument.
(cris_print_operand): Pass mode to output_address calls.
* config/spu/spu.c (print_operand): Pass mode to output_address
calls.
* config/aarch64/aarch64.h (aarch64_print_operand)
(aarch64_print_operand_address): Remove prototypes.
* config/aarch64/aarch64.c (aarch64_memory_reference_mode): Delete
global.
(aarch64_print_operand): Make static. Update calls to
output_address.
(aarch64_print_operand_address): Add MODE argument. Use instead of
aarch64_memory_reference_mode global.
(TARGET_PRINT_OPERAND, TARGET_PRINT_OPERAND_ADDRESS): Define target
hooks.
* config/aarch64/aarch64.h (PRINT_OPERAND, PRINT_OPERAND_ADDRESS):
Delete macro definitions.
* config/pa/pa.c (pa_print_operand): Pass mode in output_address
calls.
* config/xtensa/xtensa.c (print_operand): Pass mode in
output_address calls.
* config/h8300/h8300.c (h8300_print_operand_address): Add MODE
argument.
(h83000_print_operand): Update calls to h8300_print_operand_address
and output_address.
* config/ia64/ia64.c (ia64_print_operand_address): Add MODE
argument.
* config/tilepro/tilepro.c (output_memory_reference_mode): Delete
global.
(tilepro_print_operand): Pass mode to output_address.
(tilepro_print_operand_address): Add MODE argument. Use instead of
output_memory_reference_mode.
* config/nvptx/nvptx.c (output_decl_chunk, nvptx_assemble_integer)
(nvptx_output_call_insn, nvptx_print_address_operand): Pass VOIDmode
to output_address calls.
(nvptx_print_operand_address): Add MODE argument.
* config/alpha/alpha.c (print_operand): Pass mode argument in
output_address calls.
* config/m68k/m68k.c (print_operand): Pass mode argument in
output_address call.
* config/avr/avr.c (avr_print_operand_address): Add MODE argument.
(avr_print_operand): Update calls to avr_print_operand_address.
* config/sparc/sparc.c (sparc_print_operand_address): Add MODE
argument. Update calls to output_address.
(sparc_print_operand): Pass mode to output_address.
* config/iq2000/iq2000.c (iq2000_print_operand_address): Add MODE
argument.
(iq2000_print_operand): Pass mode in output_address calls.
* 

[PATCH/RFC/RFA] Machine modes for address printing (all targets)

2015-11-04 Thread Julian Brown
Hi,

Depending on assembler syntax and supported addressing modes, several
targets need to know the machine mode for a memory access when printing
an address (i.e. for automodify addresses that need to know the size
of their access), but it is not available with the current
TARGET_PRINT_OPERAND_ADDRESS hook. This leads to an ugly corner in the
operand output mechanism, where address printing gets split between
different parts of a backend, or some other hack (e.g. a global
variable) is used to communicate the machine mode to the address
printing hook.

Using a global variable also leads to a latent (?) bug on at least
AArch64: attempts to use the 'a' operand printing code cause final.c
to call output_address (in turn invoking the PRINT_OPERAND_ADDRESS
macro) *without* first setting the magic global
aarch64_memory_reference_mode, which means a stale value will be used
instead.

The full list of targets that use some form of workaround for the lack
of machine mode in the address printing hook is (E&OE):

aarch64: uses magic global.
arc: pre/post inc/dec handled in print_operand.
arm: uses magic global.
c6x
epiphany: offsets handled in print_operand.
m32r: hard-wires 4 for access size.
nds32
tilegx: uses magic global.
tilepro: uses magic global.

That's not all targets by any means, but may be enough to warrant a
change in the interface. I propose that:

* The output_address function should have a machine_mode argument
  added. Bare addresses (e.g. the 'a' case in final.c) should pass
  "VOIDmode" for this argument.

* Other callers of output_address -- actually all in backends -- can
  pass the machine mode for the memory access in question.

* The TARGET_PRINT_OPERAND_ADDRESS hook shall also have a machine_mode
  argument added. The legacy PRINT_OPERAND_ADDRESS hook can be left
  alone. (The documentation for the operand-printing hooks needs fixing
  too, incidentally.)
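
To make the difference concrete, here is a small stand-alone sketch
contrasting the "magic global" style with an explicit mode argument. It
is a toy under assumptions, not the GCC interface: toy_mode, the
toy_print_address* functions and the "[xN], SIZE" output syntax are
invented for illustration, and output goes to a buffer rather than a
FILE * to keep it easy to check.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for machine_mode values.  */
enum toy_mode { TOY_VOIDMODE, TOY_QIMODE, TOY_SIMODE, TOY_DIMODE };

static int
toy_mode_size (enum toy_mode mode)
{
  switch (mode)
    {
    case TOY_QIMODE: return 1;
    case TOY_SIMODE: return 4;
    case TOY_DIMODE: return 8;
    default: return 0;
    }
}

/* Old style: the access mode reaches the address printer through a
   global that the operand printer is expected to set beforehand.  */
static enum toy_mode toy_memory_reference_mode = TOY_VOIDMODE;

static int
toy_print_address_global (char *buf, size_t len, int regno)
{
  return snprintf (buf, len, "[x%d], %d", regno,
                   toy_mode_size (toy_memory_reference_mode));
}

/* Proposed style: the mode is an explicit argument, and VOIDmode means
   "bare address, no access size known".  */
static int
toy_print_address (char *buf, size_t len, enum toy_mode mode, int regno)
{
  assert (buf != NULL && len > 0);

  if (mode == TOY_VOIDMODE)
    return snprintf (buf, len, "[x%d]", regno);
  return snprintf (buf, len, "[x%d], %d", regno, toy_mode_size (mode));
}

/* Print a DImode memory operand, then a bare address (the 'a' case).
   STALE receives what the global-based printer emits for the bare
   address; FIXED receives what the explicit-argument printer emits.  */
static void
toy_demo (char *stale, char *fixed, size_t len)
{
  char scratch[32];

  toy_memory_reference_mode = TOY_DIMODE;   /* Set for a MEM operand.  */
  toy_print_address_global (scratch, sizeof scratch, 0);

  /* Bare-address path: nobody resets the global, so it is stale.  */
  toy_print_address_global (stale, len, 1);

  /* With an explicit argument the caller simply passes VOIDmode.  */
  toy_print_address (fixed, len, TOY_VOIDMODE, 1);
}
```

After toy_demo, the global-based output for the bare address still
carries the size left over from the earlier DImode operand, while the
explicit-argument version prints the plain address.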

The attached patch makes this change, fairly mechanically. This removes
(most of) the magic globals for address printing, but I haven't tried
to refactor the targets that use other hacks to print correct
auto-modify addresses (that can be done by their respective
maintainers, hopefully, and should result in a nice cleanup).

Unfortunately I can't hope to test all the targets affected, though the
subset of targets that it's relatively easy for me to build, build
fine. I also ran regression tests for AArch64.

OK to apply, or any comments, or any further testing required?

Thanks,

Julian

ChangeLog

gcc/
* final.c (output_asm_insn): Pass VOIDmode to output_address.
(output_address): Add MODE argument. Pass to print_operand_address
hook.
* targhooks.c (default_print_operand_address): Add MODE argument.
* targhooks.h (default_print_operand_address): Update prototype.
* output.h (output_address): Update prototype.
* target.def (print_operand_address): Add MODE argument.
* config/vax/vax.c (print_operand_address): Pass VOIDmode to
output_address.
(print_operand): Pass access mode to output_address.
* config/mcore/mcore.c (mcore_print_operand_address): Add MODE
argument.
(mcore_print_operand): Update calls to mcore_print_operand_address.
* config/fr30/fr30.c (fr30_print_operand): Pass VOIDmode to
output_address.
* config/lm32/lm32.c (lm32_print_operand): Pass mode in calls to
output_address.
* config/tilegx/tilegx.c (output_memory_reference_mode): Remove
global.
(tilegx_print_operand): Don't set above global. Update calls to
output_address.
(tilegx_print_operand_address): Add MODE argument. Use instead of
output_memory_reference_mode global.
* config/frv/frv.c (frv_print_operand_address): Add MODE argument.
* config/mn10300/mn10300.c (mn10300_print_operand): Pass mode to
output_address.
* config/cris/cris.c (cris_print_operand_address): Add MODE
argument.
(cris_print_operand): Pass mode to output_address calls.
* config/spu/spu.c (print_operand): Pass mode to output_address
calls.
* config/aarch64/aarch64.h (aarch64_print_operand)
(aarch64_print_operand_address): Remove prototypes.
* config/aarch64/aarch64.c (aarch64_memory_reference_mode): Delete
global.
(aarch64_print_operand): Make static. Update calls to
output_address.
(aarch64_print_operand_address): Add MODE argument. Use instead of
aarch64_memory_reference_mode global.
(TARGET_PRINT_OPERAND, TARGET_PRINT_OPERAND_ADDRESS): Define target
hooks.
* config/aarch64/aarch64.h (PRINT_OPERAND, PRINT_OPERAND_ADDRESS):
Delete macro definitions.
* config/pa/pa.c (pa_print_operand): Pass mode in output_address
calls.
* config/xtensa/xtensa.c (print_operand): Pass mode in
output_address calls.
* config/h8300/h8300.c (h8300_print_operand_address): Add MODE
argument.
(h83000_print_operand): Update calls to h8300_print_operand_address
and output_address.
* config/ia64/ia64.c 

Re: [Bulk] [OpenACC 0/7] host_data construct

2015-11-02 Thread Julian Brown
On Mon, 26 Oct 2015 19:34:22 +0100
Jakub Jelinek  wrote:

> Your use_device sounds very similar to use_device_ptr clause in
> OpenMP, which is allowed on #pragma omp target data construct and is
> implemented quite a bit differently from this; it is unclear if the
> OpenACC standard requires this kind of implementation, or you just
> chose to implement it this way.  In particular, the GOMP_target_data
> call puts the variables mentioned in the use_device_ptr clauses into
> the mapping structures (similarly how map clause appears) and the
> corresponding vars are privatized within the target data region
> (which is a host region, basically a fancy { } braces), where the
> private variables contain the offloading device's pointers.

As the author of the original patch, I have to say using the mapping
structures seems like a far better approach, but I've hit some trouble
with the details of adapting OpenACC to use that method.

Firstly, on trunk at least, use_device_ptr variables are restricted to
pointer or array types: that restriction doesn't exist in OpenACC, nor
actually could I find it in the OpenMP 4.1 document (my guess is the
standards are supposed to match in this regard). I think that a program
such as this should work:

void target_fn (int *targ_data);

int
main (int argc, char *argv[])
{
  char out;
  int myvar;
#pragma omp target enter data map(to: myvar)

#pragma omp target data use_device_ptr(myvar) map(from:out)
  {
    target_fn (&myvar);
    out = 5;
  }

  return 0;
}

"myvar" would have its address taken in the use_device_ptr region, and
places where the corresponding mapped variable has its address taken
would be replaced by a direct use of the mapped pointer. (Or is that
not a well-formed thing to do, in general?). This fails with "error:
'use_device_ptr' variable is neither a pointer nor an array".

Secondly, attempts to use use_device_ptr on (e.g.
dynamically-allocated) arrays accessed through a pointer cause an ICE
with the existing trunk OpenMP code:

#include <stdlib.h>

void target_fn (char *targ_data);

int
main (int argc, char *argv[])
{
  char *myarr, out;

  myarr = malloc (1024);

#pragma omp target data map(to: myarr[0:1024])
  {
#pragma omp target data use_device_ptr(myarr) map(from:out)
{
  target_fn (myarr);
  out = 5;
}
  }

  return 0;
}

udp3.c: In function 'main':
udp3.c:6:1: internal compiler error: in make_decl_rtl, at varasm.c:1298
 main (int argc, char *argv[])
 ^
0x111256b make_decl_rtl(tree_node*)
/scratch/jbrown/openacc-trunk/src/gcc-mainline/gcc/varasm.c:1294
0x9ea005 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
expand_modifier, rtx_def**, bool)
/scratch/jbrown/openacc-trunk/src/gcc-mainline/gcc/expr.c:9559
0x9e31c2 expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier, 
rtx_def**, bool)
/scratch/jbrown/openacc-trunk/src/gcc-mainline/gcc/expr.c:7892
0x9cb4ae expand_expr
/scratch/jbrown/openacc-trunk/src/gcc-mainline/gcc/expr.h:255
0x9d907d expand_assignment(tree_node*, tree_node*, bool)
/scratch/jbrown/openacc-trunk/src/gcc-mainline/gcc/expr.c:5089
0x89e219 expand_gimple_stmt_1
/scratch/jbrown/openacc-trunk/src/gcc-mainline/gcc/cfgexpand.c:3576
0x89e60d expand_gimple_stmt
/scratch/jbrown/openacc-trunk/src/gcc-mainline/gcc/cfgexpand.c:3672
0x8a5773 expand_gimple_basic_block
/scratch/jbrown/openacc-trunk/src/gcc-mainline/gcc/cfgexpand.c:5676
0x8a72d4 execute
/scratch/jbrown/openacc-trunk/src/gcc-mainline/gcc/cfgexpand.c:6288

Furthermore, this looks strange to me (006t.omplower):

  .omp_data_arr.5.out = &out;
  myarr.8 = myarr;
  .omp_data_arr.5.myarr = myarr.8;
  #pragma omp target data map(from:out [len: 1]) use_device_ptr(myarr)
{
  D.2436 = .omp_data_arr.5.myarr;
  myarr = D.2436;

That's clobbering the original myarr variable, right?

Any clues on these two? The omp-low.c code is rather opaque to me...

Thanks,

Julian


Re: [PATCH] [ARM] neon-testgen.ml typo

2015-11-02 Thread Julian Brown
Hi,

On Thu, 29 Oct 2015 10:23:58 -0700
Jim Wilson  wrote:

> I noticed a comment typo in this file while using grep to look for
> other stuff.  The typo is easy to fix.
> 
> I tried running neon-testgen.ml to verify, but it is apparently no
> longer valid ocaml, as it doesn't work with the ocamlc 4.01.0 I have
> on Ubuntu 14.04.  I get a syntax error.  Someone who knows ocaml will
> have to fix this.  Meanwhile, the patch to fix the typo should still
> be OK, as this is a separate problem.

This seems to work for me (semicolons in OCaml are separators not
terminators - I'm not sure why this worked before). OK to apply?

Julian

ChangeLog

gcc/
* config/arm/neon-testgen.ml (emit_epilogue): Remove extraneous
brackets and semicolon.

Index: gcc/config/arm/neon-testgen.ml
===
--- gcc/config/arm/neon-testgen.ml	(revision 229410)
+++ gcc/config/arm/neon-testgen.ml	(working copy)
@@ -130,14 +130,14 @@ let emit_call chan const_valuator c_type
 let emit_epilogue chan features regexps =
   let no_op = List.exists (fun feature -> feature = No_op) features in
 Printf.fprintf chan "}\n\n";
-(if not no_op then
-   List.iter (fun regexp ->
-   Printf.fprintf chan
- "/* { dg-final { scan-assembler \"%s\" } } */\n" regexp)
+if not no_op then
+  List.iter (fun regexp ->
+  Printf.fprintf chan
+"/* { dg-final { scan-assembler \"%s\" } } */\n" regexp)
 regexps
- else
-   ()
-);
+else
+  ()
+
 
 (* Check a list of C types to determine which ones are pointers and which
ones are const.  *)


Re: [gomp4 00/14] NVPTX: further porting

2015-10-22 Thread Julian Brown
On Thu, 22 Oct 2015 19:41:51 +0300
Alexander Monakov  wrote:

> On Thu, 22 Oct 2015, Jakub Jelinek wrote:
> > Does that apply also to threads within a warp?  I.e. is .local
> > local to each thread in the warp, or to the whole warp, and if the
> > former, how can say at the start of a SIMD region or at its end the
> > local vars be broadcast to other threads and collected back?  One
> > thing is scalar vars, another pointers, or references to various
> > types, or even bigger indirection.  
> 
> .local is indeed local to each warp member, not the warp as a whole.
> What OpenACC/PTX implementation does is to copy the whole stack
> frame, plus live registers: the implementation is in
> nvptx.c:nvptx_propagate.
> 
> I see two possible alternative approaches for OpenMP/PTX.

> The second approach is to run all threads in the warp all the time,
> making sure they execute the same code with the same data, and thus
> build up the same local state.  In this case we'd need to ensure this
> invariant: if threads in the warp have the same state prior to
> executing an instruction, they also have the same state after
> executing that instruction (plus global state changes as if only one
> thread executed that instruction).
> 
> Most instructions are safe w.r.t this invariant.

> Was something like this considered (and rejected?) for OpenACC?

I'm not sure we understood the "global state changes as if only one
thread executed that instruction" bit (do you have a citation?). But
anyway, even if that works for threads within a warp, it doesn't work
for warps within a CTA, so we'd still need some broadcast mechanism for
those.

Julian


Re: [OpenACC 1/11] UNIQUE internal function

2015-10-22 Thread Julian Brown
On Thu, 22 Oct 2015 10:05:30 +0200
Richard Biener  wrote:

> On Thu, Oct 22, 2015 at 9:59 AM, Jakub Jelinek 
> wrote:
> > On Thu, Oct 22, 2015 at 09:49:29AM +0200, Richard Biener wrote:  
> >> >> Jakub, IYR I originally had IFN_FORK and IFN_JOIN as such
> >> >> distinct internal fns.  This replaces that scheme.
> >> >>
> >> >> ok?  
> >> >
> >> > Hmm, I'd just have used gimple_has_volatile_ops on the call?
> >> > That should have the
> >> > desired effects.  
> >>
> >> That is, whatever new IFNs you need are ok, but special-casing
> >> them is not necessary if you properly mark the calls as volatile.  
> >
> > I don't see gimple_has_volatile_ops used in tracer.c or
> > tree-ssa-threadedge.c.  Setting gimple_has_volatile_ops on those
> > IFNs is fine, but I think they are even stronger than that.  
> 
> Hmm, indeed.  Now I fail to see how the implemented property
> "preserves the CFG looping structure".  And I would have expected
> can_copy_bbs_p to be adjusted instead (catching more cases and the
> threading and tracer case as well).
> 
> As far as I can see nothing would prevent dissolving the loop by
> completely unolling it for example.  Or deleting it because it has no
> side-effects.
> 
> So you'd need to be more precise as to what properties you are trying
> to preserve by placing a single stmt somewhere.

FWIW an earlier, abandoned attempt at solving the same problem was
discussed in the following thread, continuing through June:

  https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02647.html

Though the details of lowering of OpenACC constructs have changed with
Nathan's current patches, the underlying problem remains the same. PTX
requires certain operations (bar.sync) to be executed uniformly by all
threads in a CTA. IIUC this affects "JOIN" points across all
workers/vectors in a gang, in particular (though this is generic code,
other -- particularly GPU -- targets may have similar restrictions).

HTH,

Julian


Re: Repository for the conversion machinery

2015-09-01 Thread Julian Brown
On Fri, 28 Aug 2015 17:50:53 +
Joseph Myers  wrote:

> shinwell = Mark Shinwell 
>   (Jane Street)

Mark's current address is mshinw...@janestreet.com.

Julian


[gomp4] Some additional OpenACC reduction tests

2015-07-29 Thread Julian Brown
Hi,

This is a set of 19 new tests for OpenACC reductions, covering several
ways of performing reductions over the parallel and loop directives
using gang or worker/vector level parallelism. (The semantics are quite
subtle in some places, but I believe the tests follow the specification
to the letter at least, E&OE.)

Several of these do not pass yet, so have been marked with XFAILs.

I will apply to gomp4 branch shortly.

Cheers,

Julian

ChangeLog

libgomp/
* testsuite/libgomp.oacc-c-c++-common/loop-reduction-*.c: New tests.
* testsuite/par-reduction-*.c: New tests.
* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-*.c:
New tests.

commit d6cb22b11bbe6f536bd0f6d5ce8349266040
Author: Julian Brown <jul...@codesourcery.com>
Date:   Wed Jul 29 10:04:36 2015 -0700

Some new OpenACC reduction tests.

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c
new file mode 100644
index 000..52f9a8f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c
@@ -0,0 +1,43 @@
+#include assert.h
+
+/* Test of reduction on loop directive (gangs, non-private reduction
+   variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, arr[1024], res = 0, hres = 0;
+
+  for (i = 0; i  1024; i++)
+arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		   copy(res)
+  {
+#pragma acc loop gang reduction(+:res)
+for (i = 0; i  1024; i++)
+  res += arr[i];
+  }
+
+  for (i = 0; i  1024; i++)
+hres += arr[i];
+
+  assert (res == hres);
+
+  res = hres = 1;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		   copy(res)
+  {
+#pragma acc loop gang reduction(*:res)
+for (i = 0; i  12; i++)
+  res *= arr[i];
+  }
+
+  for (i = 0; i < 12; i++)
+hres *= arr[i];
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c
new file mode 100644
index 000..b5e3b2f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c
@@ -0,0 +1,28 @@
+#include <assert.h>
+
+/* Test of reduction on loop directive (gangs and vectors, non-private
+   reduction variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, arr[1024], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		   copy(res)
+  {
+#pragma acc loop gang vector reduction(+:res)
+for (i = 0; i < 1024; i++)
+  res += arr[i];
+  }
+
+  for (i = 0; i < 1024; i++)
+hres += arr[i];
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c
new file mode 100644
index 000..d724680
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c
@@ -0,0 +1,28 @@
+#include <assert.h>
+
+/* Test of reduction on loop directive (gangs and workers, non-private
+   reduction variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, arr[1024], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		   copy(res)
+  {
+#pragma acc loop gang worker reduction(+:res)
+for (i = 0; i < 1024; i++)
+  res += arr[i];
+  }
+
+  for (i = 0; i < 1024; i++)
+hres += arr[i];
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c
new file mode 100644
index 000..d610373
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c
@@ -0,0 +1,28 @@
+#include <assert.h>
+
+/* Test of reduction on loop directive (gangs, workers and vectors, non-private
+   reduction variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, arr[1024], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		   copy(res)
+  {
+#pragma acc loop gang worker vector reduction(+:res)
+for (i = 0; i < 1024; i++)
+  res += arr[i];
+  }
+
+  for (i = 0; i < 1024; i++)
+hres += arr[i];
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c
new file mode 100644
index 000..3e5c707
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c
@@ -0,0 +1,36 @@
+/* { dg-xfail-run-if "TODO" { openacc_nvidia_accel_selected } { "*" } { "" } } */
+
+#include <assert.h>

Re: [gomp4] Remove device-specific filtering during parsing for OpenACC

2015-07-27 Thread Julian Brown
On Fri, 17 Jul 2015 14:57:14 +0200
Thomas Schwinge tho...@codesourcery.com wrote:

> In combination with the equivalent change to
> gcc/cp/parser.c:cp_parser_oacc_all_clauses,
> gcc/c-family/c-omp.c:c_oacc_filter_device_types, and transitively also
> the struct identifier_hasher and c_oacc_extract_device_id function
> preceding it, are now unused.  (Not an exhaustive list; have not
> checked which other auxiliary functions etc. Cesar has added in his
> device_type changes.)  Does it make any sense to keep these for
> later, or dump them now?

The attached patch removes this dead code...

> > --- a/gcc/c/c-typeck.c
> > +++ b/gcc/c/c-typeck.c
> > @@ -12568,6 +12568,10 @@ c_finish_omp_clauses (tree clauses, bool oacc)
> >  	  pc = OMP_CLAUSE_CHAIN (c);
> >  	  continue;
> >  
> > +case OMP_CLAUSE_DEVICE_TYPE:
> > +	  pc = OMP_CLAUSE_DEVICE_TYPE_CLAUSES (c);
> > +	  continue;
> > +
> >  	case OMP_CLAUSE_INBRANCH:
> >  	case OMP_CLAUSE_NOTINBRANCH:
> >  	  if (branch_seen)
> 
> From a quick glance only, this seems to be different from the C++
> front end (have not checked Fortran).
> 
> I have not looked at what the front end parsing is now actually
> doing; is it just attaching any clauses following a device_type
> clause to the latter?  (The same should be done for all front ends,
> obviously.  Even if it's not important right now, because of the
> sorry diagnostic that will be emitted later on as soon as there is
> one device_type clause, this should best be addressed now, while you
> still remember what's going on here ;-) so that there will be no bad
> surprises once we actually implement the handling in OMP
> lowering/streaming/device compilers.)
> 
> Do we manually need to take care to
> finalize (c_finish_omp_clauses et al.) such masked clause chains,
> or will the right thing happen automatically?

...and fixes the C and C++ frontend to finalize parsed
device_type clauses properly (although so far finalization doesn't do
anything for the clauses that can be associated with a device_type
clause anyway, so there's no actual change in behaviour).

I haven't moved the sorry reporting for the unsupported device_type
clause to scan_sharing_clauses because it doesn't seem to be
particularly a more logical place, and doing so breaks the tests that
scan the omp-low dumps.

I will apply to gomp4 branch as obvious, shortly.

Thanks,

Julian

ChangeLog

gcc/
* c-family/c-omp.c (c_oacc_extract_device_id, identifier_hasher)
(c_oacc_filter_device_types): Remove dead code.
* c/c-typeck.c (c_finish_omp_clauses): Add scanning for sub-clauses
of device_type clause.
* cp/semantics.c (finish_omp_clauses): Likewise.

commit e24a9cd14d4b8b5dab8b37218b29844787809648
Author: Julian Brown jul...@codesourcery.com
Date:   Mon Jul 27 07:31:10 2015 -0700

Clause finalization cleanups and dead code removal.

diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c
index b76de69..10190d7 100644
--- a/gcc/c-family/c-omp.c
+++ b/gcc/c-family/c-omp.c
@@ -1081,132 +1081,6 @@ c_omp_predetermined_sharing (tree decl)
   return OMP_CLAUSE_DEFAULT_UNSPECIFIED;
 }
 
-/* Return a numerical code representing the device_type.  Currently,
-   only device_type(nvidia) is supported.  All device_type parameters
-   are treated as case-insensitive keywords.  */
-
-static int
-c_oacc_extract_device_id (const char *device)
-{
-  if (!strcasecmp (device, "nvidia"))
-return GOMP_DEVICE_NVIDIA_PTX;
-  else if (!strcmp (device, "*"))
-return GOMP_DEVICE_DEFAULT;
-  return GOMP_DEVICE_NONE;
-}
-
-struct identifier_hasher : ggc_cache_ptr_hash<tree_node>
-{
-  static hashval_t hash (tree t) { return htab_hash_pointer (t); }
-  static bool equal (tree a, tree b)
-  {
-return !strcmp(IDENTIFIER_POINTER (a), IDENTIFIER_POINTER (b));
-  }
-};
-
-/* Filter out the list of unsupported OpenACC device_types.  */
-
-tree
-c_oacc_filter_device_types (tree clauses)
-{
-  tree c, prev;
-  tree dtype = NULL_TREE;
-  tree seen_nvidia = NULL_TREE;
-  tree seen_default = NULL_TREE;
-  hash_table<identifier_hasher> *dt_htab
-= hash_table<identifier_hasher>::create_ggc (10);
-
-  /* First scan for all device_type clauses.  */
-  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
-{
-  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEVICE_TYPE)
-	{
-	  tree t;
-
-	  for (t = OMP_CLAUSE_DEVICE_TYPE_DEVICES (c); t; t = TREE_CHAIN (t))
-	{
-	  if (dt_htab->find (t))
-		{
-		  error_at (OMP_CLAUSE_LOCATION (c),
-			"duplicate device_type (%s)",
-			IDENTIFIER_POINTER (t));
-		  goto filter_dtype;
-		}
-
-	  int code = c_oacc_extract_device_id (IDENTIFIER_POINTER (t));
-
-	  if (code == GOMP_DEVICE_DEFAULT)
-		seen_default = OMP_CLAUSE_DEVICE_TYPE_CLAUSES (c);
-	  else if (code == GOMP_DEVICE_NVIDIA_PTX)
-		seen_nvidia = OMP_CLAUSE_DEVICE_TYPE_CLAUSES (c);
-	  else
-		{
-		  /* The OpenACC technical committee advises compilers
-		 to silently ignore unknown devices.  */
-		}
-
-	  tree *slot = dt_htab->find_slot (t, INSERT);
-	  *slot = t

Re: [gomp4] Remove device-specific filtering during parsing for OpenACC

2015-07-17 Thread Julian Brown
On Fri, 17 Jul 2015 14:57:14 +0200
Thomas Schwinge tho...@codesourcery.com wrote:

> Hi Julian!
> 
> On Thu, 16 Jul 2015 16:32:12 +0100, Julian Brown
> jul...@codesourcery.com wrote:
> > This patch removes the device-specific filtering (for NVidia PTX)
> > from the parsing stages of the host compiler (for the device_type
> > clause -- separately for C, C++ and Fortran) in favour of fully
> > parsing the device_type clauses, but not actually implementing
> > anything for them (device_type support is a feature that we're not
> > planning to implement just yet: the existing support is something
> > of a red herring).
> > 
> > With this patch, the parsed device_type clauses will be ready at OMP
> > lowering time whenever we choose to do something with them (e.g.
> > transforming them into a representation that can be streamed out and
> > re-read by the appropriate offload compiler). The representation is
> > more-or-less the same for all supported languages
> 
> Thanks!
> 
> > modulo clause ordering.
> 
> Is that something that a) doesn't need to be/already has been
> addressed (with your patch), or b) still needs to be addressed?

It's something that doesn't matter, I think: clauses are chained
together like this:

  num_gangs
  num_workers
  ...
  |
  device_type(foo)
  \__num_gangs(OMP_CLAUSE_DEVICE_TYPE_CLAUSES)
  |  num_workers
  |  ...
  device_type(bar)
  \__num_gangs
  |  num_workers
  |  ...
  V
  (OMP_CLAUSE_CHAIN)

foo and bar are OMP_CLAUSE_DEVICE_TYPE_DEVICES -- tree lists. The
Fortran front-end will emit num_gangs, num_workers etc. clauses in a
fixed order (irrespective of their order in the source program), but the
C and C++ frontends will emit them in the (reverse of the) order
encountered.

There isn't really a consumer for this information yet, but when there
is, it will just have to not care about that (which should be
straightforward, I think).
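
To make the "not caring about order" point concrete, here is a minimal
sketch of an order-insensitive consumer. It is only an illustration: the
struct below is a hypothetical stand-in for GCC's tree-based clause
chain (the "sub" and "chain" fields model OMP_CLAUSE_DEVICE_TYPE_CLAUSES
and OMP_CLAUSE_CHAIN respectively), and lookup_clause is an invented
helper, not an existing GCC function. It assumes that a clause attached
to a matching device_type takes precedence over a top-level clause of
the same name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a clause node; GCC's real representation
   is a tree (OMP_CLAUSE_*), not this struct.  */
struct clause
{
  const char *name;      /* E.g. "num_gangs" or "device_type".  */
  int value;             /* Clause argument, if any.  */
  const char *device;    /* For device_type nodes only: the device name.  */
  struct clause *sub;    /* Models OMP_CLAUSE_DEVICE_TYPE_CLAUSES.  */
  struct clause *chain;  /* Models OMP_CLAUSE_CHAIN.  */
};

/* Search one flat chain for a clause called NAME, in whatever order
   the front end happened to emit the clauses.  */
static const struct clause *
find_in_chain (const struct clause *c, const char *name)
{
  for (; c != NULL; c = c->chain)
    if (strcmp (c->name, name) == 0)
      return c;
  return NULL;
}

/* Look up clause NAME for DEVICE.  Clauses attached to a matching
   device_type node are preferred; otherwise fall back to the top-level
   chain.  Returns 1 and stores the argument in *VALUE on success,
   0 if the clause is absent.  */
int
lookup_clause (const struct clause *clauses, const char *device,
               const char *name, int *value)
{
  const struct clause *hit = NULL;

  for (const struct clause *c = clauses; c != NULL && hit == NULL;
       c = c->chain)
    if (strcmp (c->name, "device_type") == 0
        && strcmp (c->device, device) == 0)
      hit = find_in_chain (c->sub, name);

  if (hit == NULL)
    hit = find_in_chain (clauses, name);
  if (hit == NULL)
    return 0;

  *value = hit->value;
  return 1;
}
```

Because every lookup walks the chain by name, the result is the same
whether the sub-clauses arrive in source order, reverse order, or the
fixed order used by the Fortran front end.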

> > I've altered the dtype-*.* tests to account for the new behaviour
> > (and to not use e.g. mixed-case "nVidia" or "acc_device_nvidia"
> > names, which are contrary to the recommendations in the spec).
> 
> OpenACC 2.0a indeed seems to suggest that device_type arguments are
> case-sensitive -- contrary to the ACC_DEVICE_TYPE environment
> variable, which probably is where the idea came from to parse them
> case-insensitively.
> 
> As to the latter invalid names, I thought the idea has been to
> verify that the clauses following such device_type clauses are
> indeed ignored in the later processing.  (Obviously, there should've
> been comments indicating that, as otherwise that's very confusing --
> as we've just seen -- due to the similarity to the runtime library's
> acc_device_* device type values.)

Yes, and there are still some tests for that functionality. I figured
there wasn't much point in over-testing it, especially since none of
this code does that much yet.

> > OK to apply, or any comments?
> 
> Your commit r225927 appears to have caused:
> 
> [-PASS:-]{+FAIL: libgomp.fortran/declare-simd-2.f90   -O0
> (internal compiler error)+} {+FAIL:+}
> libgomp.fortran/declare-simd-2.f90   -O0  (test for excess errors)
> [-PASS:-]{+UNRESOLVED:+} libgomp.fortran/declare-simd-2.f90   -O0
> [-execution test-] [-PASS:-]{+compilation failed to produce
> executable+} [same for other optimization levels]
> 
> [...]/source-gcc/libgomp/testsuite/libgomp.fortran/declare-simd-3.f90:17:0:
> internal compiler error: Segmentation fault
> 0xc39b6f crash_signal
>     [...]/source-gcc/gcc/toplev.c:352
> 0x7043a8 gfc_trans_omp_clauses
>     [...]/source-gcc/gcc/fortran/trans-openmp.c:2671
> 0x7049a8 gfc_trans_omp_declare_simd(gfc_namespace*)
>     [...]/source-gcc/gcc/fortran/trans-openmp.c:4589
> 0x6b8542 gfc_get_extern_function_decl(gfc_symbol*)
>     [...]/source-gcc/gcc/fortran/trans-decl.c:2025
> 0x6b878d gfc_get_extern_function_decl(gfc_symbol*)
>     [...]/source-gcc/gcc/fortran/trans-decl.c:1820
> 0x6ce952 conv_function_val
>     [...]/source-gcc/gcc/fortran/trans-expr.c:3601
> 0x6ce952 gfc_conv_procedure_call(gfc_se*, gfc_symbol*,
>     gfc_actual_arglist*, gfc_expr*, vec<tree_node*, va_gc, vl_embed>*)
>     [...]/source-gcc/gcc/fortran/trans-expr.c:5873
> 0x6cf4c2 gfc_conv_expr(gfc_se*, gfc_expr*)
>     [...]/source-gcc/gcc/fortran/trans-expr.c:7391
> 0x6d71d0 gfc_trans_assignment_1
>     [...]/source-gcc/gcc/fortran/trans-expr.c:9127
> 0x692465 trans_code
>     [...]/source-gcc/gcc/fortran/trans.c:1674
> 0x6fa457 gfc_trans_omp_code
>     [...]/source-gcc/gcc/fortran/trans-openmp.c:2711
> 0x705410 gfc_trans_omp_do
>     [...]/source-gcc/gcc/fortran/trans-openmp.c:3459
> 0x707f9f gfc_trans_omp_directive(gfc_code*)
>     [...]/source-gcc/gcc/fortran/trans-openmp.c:4521
> 0x6922b7 trans_code
>     [...]/source-gcc/gcc/fortran/trans.c:1924
> 0x6c0660 gfc_generate_function_code(gfc_namespace*)
>     [...]/source-gcc/gcc/fortran/trans-decl.c:6231
> 0x64d630 translate_all_program_units
>     [...]/source-gcc/gcc/fortran

Re: [gomp4] Remove device-specific filtering during parsing for OpenACC

2015-07-17 Thread Julian Brown
On Fri, 17 Jul 2015 14:57:14 +0200
Thomas Schwinge tho...@codesourcery.com wrote:

> Your commit r225927 appears to have caused:
> 
> [-PASS:-]{+FAIL: libgomp.fortran/declare-simd-2.f90   -O0
> (internal compiler error)+} {+FAIL:+}
> libgomp.fortran/declare-simd-2.f90   -O0  (test for excess errors)
> [-PASS:-]{+UNRESOLVED:+} libgomp.fortran/declare-simd-2.f90   -O0
> [-execution test-] [-PASS:-]{+compilation failed to produce
> executable+} [same for other optimization levels]

This is fixed by the attached. I will apply shortly.
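
For readers without the GCC sources to hand, the failure mode can be
sketched roughly as follows. This is an illustration under stated
assumptions, not the real code: struct clause_set is a hypothetical,
much-simplified stand-in for gfc_omp_clauses that models only the two
fields used by the loop, and count_device_type_groups is an invented
function.

```c
#include <stddef.h>

/* Hypothetical, much-simplified stand-in for gfc_omp_clauses: only the
   two fields used by the loop in the patch are modelled.  */
struct clause_set
{
  const char *device_types;          /* Non-NULL if a device_type group follows.  */
  struct clause_set *dtype_clauses;  /* Clauses belonging to that group.  */
};

/* Count device_type groups.  Without the initial NULL test, evaluating
   "clauses->device_types" in the loop condition would dereference a
   null pointer when no clauses were parsed at all.  The extra
   "clauses != NULL" in the loop condition is a defensive addition in
   this sketch, not part of the patch.  */
int
count_device_type_groups (const struct clause_set *clauses)
{
  int n = 0;

  if (clauses == NULL)
    return 0;

  for (; clauses != NULL && clauses->device_types != NULL;
       clauses = clauses->dtype_clauses)
    n++;

  return n;
}
```

Calling count_device_type_groups (NULL) returns 0 instead of crashing,
which mirrors the early "return NULL_TREE" added by the patch.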

Thanks,

Julian

ChangeLog

gcc/fortran/
* trans-openmp.c (gfc_trans_omp_clauses): Add NULL check for
clauses.

commit 7171ab9066e6b4bb84c317d1892a3a0a77cf63ae
Author: Julian Brown jul...@codesourcery.com
Date:   Fri Jul 17 11:46:56 2015 -0700

Add NULL check for clauses in gfc_trans_omp_clauses.

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 20a1e65..378dd3b 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -2668,6 +2668,9 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
   tree omp_clauses = gfc_trans_omp_clauses_1 (block, clauses, where,
 	  declare_simd);
 
+  if (clauses == NULL)
+return NULL_TREE;
+
   for (; clauses->device_types; clauses = clauses->dtype_clauses)
 {
   tree c, following_clauses = NULL_TREE, dev_list = NULL_TREE;


[gomp4] Remove device-specific filtering during parsing for OpenACC

2015-07-16 Thread Julian Brown
Hi,

This patch removes the device-specific filtering (for NVidia PTX) from
the parsing stages of the host compiler (for the device_type clause --
separately for C, C++ and Fortran) in favour of fully parsing the
device_type clauses, but not actually implementing anything for them
(device_type support is a feature that we're not planning to implement
just yet: the existing support is something of a red herring).

With this patch, the parsed device_type clauses will be ready at OMP
lowering time whenever we choose to do something with them (e.g.
transforming them into a representation that can be streamed out and
re-read by the appropriate offload compiler). The representation is
more-or-less the same for all supported languages, modulo
clause ordering.

I've altered the dtype-*.* tests to account for the new behaviour (and
to not use e.g. mixed-case "nVidia" or "acc_device_nvidia" names, which
are contrary to the recommendations in the spec).

OK to apply, or any comments?

Thanks,

Julian

ChangeLog

gcc/
* gimplify.c (gimplify_scan_omp_clauses): Handle
OMP_CLAUSE_DEVICE_TYPE.
(gimplify_adjust_omp_clauses): Likewise.
* omp-low.c (scan_sharing_clauses): Likewise.
(expand_omp_target): Add sorry for device_type support.
* tree-pretty-print.c (dump_omp_clause): Add device_type support.
* tree.c (walk_tree_1): Likewise.

gcc/c/
* c-parser.c (c_parser_oacc_all_clauses): Don't call
c_oacc_filter_device_types.
* c-typeck.c (c_finish_omp_clauses): Handle OMP_CLAUSE_DEVICE_TYPE.

gcc/cp/
* parser.c (cp_parser_oacc_all_clauses): Don't call
c_oacc_filter_device_types.
* pt.c (tsubst_omp_clauses): Handle OMP_CLAUSE_DEVICE_TYPE.
* semantics.c (finish_omp_clauses): Likewise.

gcc/fortran/
* gfortran.h (gfc_omp_clauses): Change dtype int field to
device_types gfc_expr_list.
* openmp.c (gfc_match_omp_clauses): Remove scan_dtype variable (add
OMP_CLAUSE_DEVICE_TYPE directly to appropriate bitmasks). Parse all
device_type clauses without filtering.
(OACC_LOOP_CLAUSE_DEVICE_TYPE_MASK)
(OACC_KERNELS_CLAUSE_DEVICE_TYPE_MASK)
(OACC_PARALLEL_CLAUSE_DEVICE_TYPE_MASK)
(OACC_ROUTINE_CLAUSE_DEVICE_TYPE_MASK)
(OACC_UPDATE_CLAUSE_DEVICE_TYPE_MASK): Add OMP_CLAUSE_DEVICE_TYPE.
* trans-openmp.c (gfc_trans_omp_clauses): Translate device_type
clauses, and split old body into...
(gfc_trans_omp_clauses_1): New function.

gcc/testsuite/
* c-c++-common/goacc/dtype-1.c: Update test for new behaviour.
* c-c++-common/goacc/dtype-2.c: Likewise.
* c-c++-common/goacc/dtype-3.c: Likewise.
* c-c++-common/goacc/dtype-4.c: Likewise.
* gfortran.dg/goacc/dtype-1.f95: Likewise.
* gfortran.dg/goacc/dtype-2.f95: Likewise.
* gfortran.dg/goacc/dtype-3.f: Likewise.

commit 123298186bb8ce87f84b6a3a72743939d4fdae11
Author: Julian Brown jul...@codesourcery.com
Date:   Thu Jul 16 08:06:01 2015 -0700

Fix device_type parsing, add sorry() for missing implementation of remainder.

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 1c65abf..d90c18e 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -12439,10 +12439,7 @@ c_parser_oacc_all_clauses (c_parser *parser, omp_clause_mask mask,
   c_parser_skip_to_pragma_eol (parser);
 
   if (finish_p)
-{
-  clauses = c_oacc_filter_device_types (clauses);
-  return c_finish_omp_clauses (clauses, true);
-}
+return c_finish_omp_clauses (clauses, true);
 
   return clauses;
 }
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 98b8e3d..dcc246c 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -12568,6 +12568,10 @@ c_finish_omp_clauses (tree clauses, bool oacc)
 	  pc = OMP_CLAUSE_CHAIN (c);
 	  continue;
 
+case OMP_CLAUSE_DEVICE_TYPE:
+	  pc = OMP_CLAUSE_DEVICE_TYPE_CLAUSES (c);
+	  continue;
+
 	case OMP_CLAUSE_INBRANCH:
 	case OMP_CLAUSE_NOTINBRANCH:
 	  if (branch_seen)
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 28f0048..80aabed 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -29879,10 +29879,7 @@ cp_parser_oacc_all_clauses (cp_parser *parser, omp_clause_mask mask,
   cp_parser_skip_to_pragma_eol (parser, pragma_tok);
 
   if (finish_p)
-{
-  clauses = c_oacc_filter_device_types (clauses);
-  return finish_omp_clauses (clauses, true);
-}
+return finish_omp_clauses (clauses, true);
 
   return clauses;
 }
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 205dc30..056b2c1 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -13666,6 +13666,7 @@ tsubst_omp_clauses (tree clauses, bool declare_simd,
 	case OMP_CLAUSE_AUTO:
 	case OMP_CLAUSE_SEQ:
 	case OMP_CLAUSE_TILE:
+	case OMP_CLAUSE_DEVICE_TYPE:
 	  break;
 	default:
 	  gcc_unreachable ();
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 8935eb6..1ce1dfa 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -5951,6 +5951,7 @@ finish_omp_clauses (tree clauses, bool oacc)
 	case OMP_CLAUSE_BIND:
 	case OMP_CLAUSE_NOHOST:
 	case

Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-22 Thread Julian Brown
On Mon, 22 Jun 2015 16:24:56 +0200
Jakub Jelinek ja...@redhat.com wrote:

> On Mon, Jun 22, 2015 at 02:55:49PM +0100, Julian Brown wrote:
> > One problem is that (at least on the GPU hardware we've considered
> > so far) we're somewhat constrained in how much control we have over
> > how the underlying hardware executes code: it's possible to draw up
> > a scheme where OpenACC source-level control-flow semantics are
> > reflected directly in the PTX assembly output (e.g. to say "all
> > threads in a CTA/warp will be coherent after such-and-such a
> > loop"), and lowering OpenACC directives quite early seems to make
> > that relatively tractable. (Even if the resulting code is
> > relatively un-optimisable due to the abnormal edges inserted to
> > make sure that the CFG doesn't become ill-formed.)
> > 
> > If arbitrary optimisations are done between OMP-lowering time and
> > somewhere around vectorisation (say), it's less clear if that
> > correspondence can be maintained. Say if the code executed by half
> > the threads in a warp becomes physically separated from the code
> > executed by the other half of the threads in a warp due to some loop
> > optimisation, we can no longer easily determine where that warp will
> > reconverge, and certain other operations (relying on coherent warps
> > -- e.g. CTA synchronisation) become impossible. A similar issue
> > exists for warps within a CTA.
> > 
> > So, essentially -- I don't know how late loop lowering would
> > interact with:
> > 
> > (a) Maintaining a CFG that will work with PTX.
> > 
> > (b) Predication for worker-single and/or vector-single modes
> > (actually all currently-proposed schemes have problems with proper
> > representation of data-dependencies for variables and
> > compiler-generated temporaries between predicated regions.)
> 
> I don't understand why lowering the way you suggest helps here at all.
> In the proposed scheme, you essentially have whole function
> in e.g. worker-single or vector-single mode, which you need to be
> able to handle properly in any case, because users can write such
> routines themselves.  And then you can have a loop in such a function
> that has some special attribute, a hint that it is desirable to
> vectorize it (for PTX the PTX way) or use vector-single mode for it
> in a worker-single function.  So, the special pass then of course
> needs to handle all the needed broadcasting and reduction required to
> change the mode from e.g. worker-single to vector-single, but the
> convergence points still would be either on the boundary of such
> loops to be vectorized or parallelized, or wherever else they appear
> in normal vector-single or worker-single functions (around the calls
> to certainly calls?).

I think most of my concerns are centred around loops (with the markings
you suggest) that might be split into parts: if that cannot happen for
loops that are annotated as you describe, maybe things will work out OK.

(Apologies for my ignorance here, this isn't a part of the compiler
that I know anything about.)

Julian


Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-22 Thread Julian Brown
On Mon, 22 Jun 2015 16:24:56 +0200
Jakub Jelinek ja...@redhat.com wrote:

> On Mon, Jun 22, 2015 at 02:55:49PM +0100, Julian Brown wrote:
> > One problem is that (at least on the GPU hardware we've considered
> > so far) we're somewhat constrained in how much control we have over
> > how the underlying hardware executes code: it's possible to draw up
> > a scheme where OpenACC source-level control-flow semantics are
> > reflected directly in the PTX assembly output (e.g. to say "all
> > threads in a CTA/warp will be coherent after such-and-such a
> > loop"), and lowering OpenACC directives quite early seems to make
> > that relatively tractable. (Even if the resulting code is
> > relatively un-optimisable due to the abnormal edges inserted to
> > make sure that the CFG doesn't become ill-formed.)
> > 
> > If arbitrary optimisations are done between OMP-lowering time and
> > somewhere around vectorisation (say), it's less clear if that
> > correspondence can be maintained. Say if the code executed by half
> > the threads in a warp becomes physically separated from the code
> > executed by the other half of the threads in a warp due to some loop
> > optimisation, we can no longer easily determine where that warp will
> > reconverge, and certain other operations (relying on coherent warps
> > -- e.g. CTA synchronisation) become impossible. A similar issue
> > exists for warps within a CTA.
> > 
> > So, essentially -- I don't know how late loop lowering would
> > interact with:
> > 
> > (a) Maintaining a CFG that will work with PTX.
> > 
> > (b) Predication for worker-single and/or vector-single modes
> > (actually all currently-proposed schemes have problems with proper
> > representation of data-dependencies for variables and
> > compiler-generated temporaries between predicated regions.)
> 
> I don't understand why lowering the way you suggest helps here at all.
> In the proposed scheme, you essentially have whole function
> in e.g. worker-single or vector-single mode, which you need to be
> able to handle properly in any case, because users can write such
> routines themselves.

In vector-single or worker-single mode, divergence of threads within a
warp or a CTA is controlled by broadcasting the controlling expression
of conditional branches to the set of inactive threads, so each of
those follows along with the active thread. So you only get
potentially-problematic thread divergence when workers or vectors are
operating in partitioned mode.
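
A small host-side model may help picture this. It is purely a sketch
under stated assumptions: WARP_SIZE, the function names and the idea of
counting "branch directions" are all invented for illustration, and
nothing here corresponds to actual compiler or PTX output. In "single"
mode only thread 0 evaluates the branch condition and the result is
broadcast, so every thread takes the same direction; in "partitioned"
mode each thread evaluates its own condition, so the group may diverge.

```c
#include <assert.h>

#define WARP_SIZE 32

/* Count how many distinct directions the threads of one group take at
   a branch: 1 means the group stays converged, 2 means it diverges.  */
static int
branch_directions (const int taken[WARP_SIZE])
{
  int saw_true = 0, saw_false = 0;

  for (int lane = 0; lane < WARP_SIZE; lane++)
    {
      if (taken[lane])
        saw_true = 1;
      else
        saw_false = 1;
    }

  return saw_true + saw_false;
}

/* "Single" mode: only thread 0 is active and evaluates the predicate;
   its result is broadcast to the inactive threads, which follow along.  */
int
single_mode_directions (int (*pred) (int lane))
{
  int taken[WARP_SIZE];
  int active_result = pred (0);

  assert (pred != 0);

  for (int lane = 0; lane < WARP_SIZE; lane++)
    taken[lane] = active_result;  /* Broadcast copy.  */

  return branch_directions (taken);
}

/* "Partitioned" mode: every thread evaluates the predicate for itself.  */
int
partitioned_mode_directions (int (*pred) (int lane))
{
  int taken[WARP_SIZE];

  for (int lane = 0; lane < WARP_SIZE; lane++)
    taken[lane] = pred (lane);

  return branch_directions (taken);
}

/* Example predicate resembling "if (j < M / 2)" with j == lane.  */
int
lower_half (int lane)
{
  return lane < WARP_SIZE / 2;
}
```

With the lower_half predicate, the single-mode model stays converged
while the partitioned-mode model splits in two, which is the case where
a well-defined reconvergence point matters.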

So, for instance, a made-up example:

#pragma acc parallel
{
  #pragma acc loop gang
  for (i = 0; i < N; i++)
  {
    #pragma acc loop worker
    for (j = 0; j < M; j++)
    {
      if (j < M / 2)
        /* stmt 1 */
      else
        /* stmt 2 */
    }

    /* reconvergence point: thread barrier */

    [...]
  }
}

Here stmt 1 and stmt 2 execute in worker-partitioned, vector-single
mode. With early lowering, the reconvergence point can be
inserted at the end of the loop, and abnormal edges (etc.) can be used
to ensure that the CFG does not get changed in such a way that there is
no longer a unique point at which the loop threads reconverge.

With late lowering, it's no longer obvious to me if that can still be
done.

Julian


Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-22 Thread Julian Brown
On Fri, 19 Jun 2015 14:25:57 +0200
Jakub Jelinek ja...@redhat.com wrote:

> On Fri, Jun 19, 2015 at 11:53:14AM +0200, Bernd Schmidt wrote:
> > On 05/28/2015 05:08 PM, Jakub Jelinek wrote:
> > 
> > > I understand it is more work, I'd just like to ask that when
> > > designing stuff for the OpenACC offloading you (plural) try to
> > > take the other offloading devices and host fallback into account.
> > 
> > The problem is that many of the transformations we need to do are
> > really GPU specific, and with the current structure of
> > omplow/ompexp they are being done in the host compiler. The
> > offloading scheme we decided on does not give us the means to write
> > out multiple versions of an offloaded function where each target
> > gets a different one. For that reason I think we should postpone
> > these lowering decisions until we're in the accel compiler, where
> > they could be controlled by target hooks, and over the last two
> > weeks I've been doing some experiments to see how that could be
> > achieved.
> 
> I wonder why struct loop flags and other info together with function
> attributes and/or cgraph flags and other info aren't sufficient for
> the OpenACC needs.
> Have you or Thomas looked what we're doing for OpenMP simd / Cilk+
> simd?
> 
> Why can't the execution model (normal, vector-single and
> worker-single) be simply attributes on functions or cgraph node flags
> and the kind of #acc loop simply be flags on struct loop, like
> already OpenMP simd / Cilk+ simd is?

One problem is that (at least on the GPU hardware we've considered so
far) we're somewhat constrained in how much control we have over how the
underlying hardware executes code: it's possible to draw up a scheme
where OpenACC source-level control-flow semantics are reflected directly
in the PTX assembly output (e.g. to say "all threads in a CTA/warp will
be coherent after such-and-such a loop"), and lowering OpenACC
directives quite early seems to make that relatively tractable. (Even
if the resulting code is relatively un-optimisable due to the abnormal
edges inserted to make sure that the CFG doesn't become ill-formed.)

If arbitrary optimisations are done between OMP-lowering time and
somewhere around vectorisation (say), it's less clear if that
correspondence can be maintained. Say if the code executed by half the
threads in a warp becomes physically separated from the code executed
by the other half of the threads in a warp due to some loop
optimisation, we can no longer easily determine where that warp will
reconverge, and certain other operations (relying on coherent warps --
e.g. CTA synchronisation) become impossible. A similar issue exists for
warps within a CTA.

So, essentially -- I don't know how late loop lowering would interact
with:

(a) Maintaining a CFG that will work with PTX.

(b) Predication for worker-single and/or vector-single modes
(actually all currently-proposed schemes have problems with proper
representation of data-dependencies for variables and
compiler-generated temporaries between predicated regions.)

Julian


[gomp4] Tests for private variables/state propagation

2015-06-17 Thread Julian Brown
Hi,

This is a set of tests for OpenACC private variable/state propagation
support in GCC. The associated functionality is a work-in-progress: as
such, many of these tests do not pass yet (causing incorrect results,
ICEs or even bogus assembly output). I believe the tests to be valid
OpenACC, though it's possible I misinterpreted the spec at some points!

I will apply to the gomp4 branch shortly. (We will of course be working
on addressing the failures.)
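
As a reading aid for private-vars-local-gang-1.c in this patch, here is
a host-only model of the behaviour that test expects. It is a sketch
with assumptions: gang_private_model and NUM_GANGS are invented names,
and the model assumes that with gang(static:1) and as many iterations as
gangs, iteration i runs on gang i in both loops.

```c
#define NUM_GANGS 32

/* Host-only model of the test: "x" is declared inside the parallel
   region, so each gang gets its own copy.  Iteration i is assumed to
   run on gang i in both loops (see the lead-in).  */
void
gang_private_model (int arr[NUM_GANGS])
{
  int x[NUM_GANGS];  /* One private "x" per gang.  */

  for (int gang = 0; gang < NUM_GANGS; gang++)  /* First gang loop.  */
    x[gang] = gang * 2;

  for (int gang = 0; gang < NUM_GANGS; gang++)  /* Second gang loop.  */
    arr[gang] += x[gang];
}
```

Starting from arr[i] == 3, the model leaves arr[i] == 3 + i * 2, which
is the condition the test asserts on the host after the parallel region.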

Cheers,

Julian

ChangeLog

libgomp/
* testsuite/libgomp.oacc-c-c++-common/
private-vars-par-gang-{1,2,3}.c: New tests.
* testsuite/libgomp.oacc-c-c++-common/
private-vars-local-gang-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/
private-vars-loop-gang-{1,2,3,4,5,6}.c: New tests.
* testsuite/libgomp.oacc-c-c++-common/
private-vars-loop-worker-{1,2,3,4,5,6,7}.c: New tests.
* testsuite/libgomp.oacc-c-c++-common/
private-vars-local-worker-{1,2,3,4,5}.c: New tests.
* testsuite/libgomp.oacc-c-c++-common/
private-vars-loop-vector-{1,2}.c: New tests.

commit 40193f49480f0a0b750d15049d29fd427282c5f0
Author: Julian Brown jul...@codesourcery.com
Date:   Tue Jun 16 03:50:55 2015 -0700

New set of private variable/state propagation tests.

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-local-gang-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-local-gang-1.c
new file mode 100644
index 000..ada46d0
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-local-gang-1.c
@@ -0,0 +1,38 @@
+#include <assert.h>
+
+/* Test of gang-private variables declared in local scope with parallel
+   directive.  */
+
+#if defined(ACC_DEVICE_TYPE_host) || defined(ACC_DEVICE_TYPE_host_nonshm)
+#define ACTUAL_GANGS 1
+#else
+#define ACTUAL_GANGS 32
+#endif
+
+int
+main (int argc, char* argv[])
+{
+  int x = 5, i, arr[ACTUAL_GANGS];
+
+  for (i = 0; i < ACTUAL_GANGS; i++)
+arr[i] = 3;
+
+  #pragma acc parallel copy(arr) num_gangs(ACTUAL_GANGS) num_workers(8) \
+		   vector_length(32)
+  {
+int x;
+
+#pragma acc loop gang(static:1)
+for (i = 0; i < ACTUAL_GANGS; i++)
+  x = i * 2;
+
+#pragma acc loop gang(static:1)
+for (i = 0; i < ACTUAL_GANGS; i++)
+  arr[i] += x;
+  }
+
+  for (i = 0; i < ACTUAL_GANGS; i++)
+assert (arr[i] == 3 + i * 2);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-local-worker-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-local-worker-1.c
new file mode 100644
index 000..f8658e5
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-local-worker-1.c
@@ -0,0 +1,56 @@
+/* { dg-xfail-run-if "TODO" { openacc_nvidia_accel_selected } { "*" } { "" } } */
+
+#include <assert.h>
+
+/* Test of worker-private variables declared in a local scope, broadcasting
+   to vector-partitioned mode.  Back-to-back worker loops.  */
+
+int
+main (int argc, char* argv[])
+{
+  int i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+int j;
+
+#pragma acc loop gang
+for (i = 0; i < 32; i++)
+  {
+#pragma acc loop worker
+	for (j = 0; j < 32; j++)
+	  {
+	int k;
+	int x = i ^ j * 3;
+
+	#pragma acc loop vector
+	for (k = 0; k < 32; k++)
+	  arr[i * 1024 + j * 32 + k] += x * k;
+	  }
+
+	#pragma acc loop worker
+	for (j = 0; j < 32; j++)
+	  {
+	int k;
+	int x = i | j * 5;
+	
+	#pragma acc loop vector
+	for (k = 0; k  32; k++)
+	  arr[i * 1024 + j * 32 + k] += x * k;
+	  }
+  }
+  }
+
+  for (i = 0; i  32; i++)
+for (int j = 0; j  32; j++)
+  for (int k = 0; k  32; k++)
+{
+	  int idx = i * 1024 + j * 32 + k;
+  assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-local-worker-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-local-worker-2.c
new file mode 100644
index 000..925f9a0
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-local-worker-2.c
@@ -0,0 +1,51 @@
+/* { dg-xfail-run-if "TODO" { openacc_nvidia_accel_selected } { "*" } { "" } } */
+
+#include <assert.h>
+
+/* Test of worker-private variables declared in a local scope, broadcasting
+   to vector-partitioned mode.  Successive vector loops.  */
+
+int
+main (int argc, char* argv[])
+{
+  int x = 5, i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    int x = i ^ j * 3;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	
+	x = i

[gomp4] (NVPTX) thread barriers after OpenACC worker loops

2015-06-08 Thread Julian Brown
Hi,

This patch adds a thread barrier after worker loops for OpenACC, in
accordance with OpenACC 2.0a section 2.7.3 (worker loops): "All workers
will complete execution of their assigned iterations before any worker
proceeds beyond the end of the loop." (This is quite target-specific:
work to alleviate that is still ongoing.)
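
To illustrate which loops receive the barrier, here is a minimal
standalone sketch of the mask test that the patch adds as
oacc_loop_needs_threadbarrier_p. The numeric MASK_* values and the
function name below are assumptions made for the example; only the shape
of the test is taken from the patch.

```c
#include <stdbool.h>

/* Assumed bit assignments, for illustration only.  */
#define MASK_GANG   1
#define MASK_WORKER 2
#define MASK_VECTOR 4

/* A barrier is wanted after a loop that is worker-partitioned but not
   gang-partitioned.  The vector bit does not affect the answer.  */
static bool
loop_needs_threadbarrier (int gwv_bits)
{
  return (gwv_bits & (MASK_GANG | MASK_WORKER)) == MASK_WORKER;
}
```

With these assumed values, a "loop worker" or "loop worker vector" gets a
barrier, while "loop gang worker" and "loop vector" do not.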

Barriers are special in that they should not be cloned or subject to
excessive code motion: to that end, barriers placed after loops have
their (outgoing) edge set to EDGE_ABNORMAL. That seems to suffice to
keep the barriers in the right places.

This passes libgomp testing when applied on gomp4 branch, and fixes the
previously-broken worker-partn-5.c and worker-partn-6.c tests, on top
of my previous patches:

https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02612.html
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00307.html

(ping!), but unfortunately (again, with the above patches) appears to
interact badly with Cesar's patch for vector state propagation:

https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00371.html

I haven't yet investigated why (I reverted that patch in my local
series in order to test the attached patch).

FYI,

Julian

ChangeLog

gcc/
* omp-low.c (build_oacc_threadbarrier): New function.
(oacc_loop_needs_threadbarrier_p): New function.
(expand_omp_for_static_nochunk, expand_omp_for_static_chunk):
Insert threadbarrier after worker loops.
(find_omp_for_region_data): Rename to...
(find_omp_for_region_gwv): This. Return mask, rather than modifying
REGION structure.
(build_omp_regions_1): Move modification of REGION structure to
here, after calling above function with new name.
(generate_oacc_broadcast): Use new build_oacc_threadbarrier
function.
(make_gimple_omp_edges): Make edges out of OpenACC worker loop exit
block abnormal.
* tree-ssa-alias.c (ref_maybe_used_by_call_p_1): Add
BUILT_IN_GOACC_THREADBARRIER.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/worker-partn-5.c: Remove
XFAIL.
* testsuite/libgomp.oacc-c-c++-common/worker-partn-6.c: Likewise.

commit e46fbc68b7bc7e705417475fcfb8e203056b5a51
Author: Julian Brown jul...@codesourcery.com
Date:   Fri Jun 5 10:01:01 2015 -0700

Threadbarrier after worker and vector loops.

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 55a2a12..45ff05a 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -3691,6 +3691,15 @@ build_omp_barrier (tree lhs)
   return g;
 }
 
+/* Build a call to GOACC_threadbarrier.  */
+
+static gcall *
+build_oacc_threadbarrier (void)
+{
+  tree fndecl = builtin_decl_explicit (BUILT_IN_GOACC_THREADBARRIER);
+  return gimple_build_call (fndecl, 0);
+}
+
 /* If a context was created for STMT when it was scanned, return it.  */
 
 static omp_context *
@@ -7181,6 +7190,20 @@ expand_omp_for_generic (struct omp_region *region,
 }
 
 
+/* True if a barrier is needed after a loop partitioned over
+   gangs/workers/vectors as specified by GWV_BITS.  OpenACC semantics specify
+   that a (conceptual) barrier is needed after worker and vector-partitioned
+   loops, but not after gang-partitioned loops.  Currently we are relying on
+   warp reconvergence to synchronise threads within a warp after vector loops,
+   so an explicit barrier is not helpful after those.  */
+
+static bool
+oacc_loop_needs_threadbarrier_p (int gwv_bits)
+{
+  return (gwv_bits & (MASK_GANG | MASK_WORKER)) == MASK_WORKER;
+}
+
+
 /* A subroutine of expand_omp_for.  Generate code for a parallel
    loop with static schedule and no specified chunk size.  Given
    parameters:
@@ -7523,7 +7546,11 @@ expand_omp_for_static_nochunk (struct omp_region *region,
     {
       t = gimple_omp_return_lhs (gsi_stmt (gsi));
       if (gimple_omp_for_kind (fd->for_stmt) == GF_OMP_FOR_KIND_OACC_LOOP)
-	gcc_checking_assert (t == NULL_TREE);
+	{
+	  gcc_checking_assert (t == NULL_TREE);
+	  if (oacc_loop_needs_threadbarrier_p (region->gwv_this))
+	    gsi_insert_after (&gsi, build_oacc_threadbarrier (), GSI_SAME_STMT);
+	}
       else
 	gsi_insert_after (&gsi, build_omp_barrier (t), GSI_SAME_STMT);
     }
@@ -7956,7 +7983,11 @@ expand_omp_for_static_chunk (struct omp_region *region,
     {
       t = gimple_omp_return_lhs (gsi_stmt (gsi));
       if (gimple_omp_for_kind (fd->for_stmt) == GF_OMP_FOR_KIND_OACC_LOOP)
-	gcc_checking_assert (t == NULL_TREE);
+        {
+	  gcc_checking_assert (t == NULL_TREE);
+	  if (oacc_loop_needs_threadbarrier_p (region->gwv_this))
+	    gsi_insert_after (&gsi, build_oacc_threadbarrier (), GSI_SAME_STMT);
+	}
       else
 	gsi_insert_after (&gsi, build_omp_barrier (t), GSI_SAME_STMT);
     }
@@ -10270,22 +10301,26 @@ expand_omp (struct omp_region *region)
 /* Map each basic block to an omp_region.  */
 static hash_map<basic_block, omp_region *> *bb_region_map;
 
-/* Fill in additional data for a region REGION associated with an
+/* Return a mask of GWV bits for region REGION associated with an
    OMP_FOR STMT.  */
 
-static void
-find_omp_for_region_data

[gomp4] Add tests for OpenACC worker-single/worker-partitioned modes

2015-06-04 Thread Julian Brown
Hi,

This patch adds a set of tests for worker-single predication (added
by Bernd in https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00094.html)
and worker-partitioned mode for OpenACC.

Results generally look good, though support for synchronisation after
worker loops is currently missing, so the corresponding tests are
XFAILed for NVidia (I will look into fixing that).

I will apply shortly.

Thanks,

Julian

ChangeLog

libgomp/
* testsuite/libgomp.oacc-c-c++-common/
worker-single-{1,1a,2,3,4,5,6}.c: New tests.
* testsuite/libgomp.oacc-c-c++-common/
worker-partn-{1,2,3,4,5,6,7}.c: New tests.

commit c4edb6e748c86c2bc5251707f61d4d37679194cf
Author: Julian Brown jul...@codesourcery.com
Date:   Thu Jun 4 07:16:56 2015 -0700

Add a set of OpenACC worker-single/worker-partitioned mode tests.

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-1.c
new file mode 100644
index 000..1bdb8ea
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-1.c
@@ -0,0 +1,30 @@
+#include <assert.h>
+
+/* Test worker-partitioned/vector-single mode.  */
+
+int
+main (int argc, char *argv[])
+{
+  int arr[32 * 8], i;
+
+  for (i = 0; i < 32 * 8; i++)
+    arr[i] = 0;
+
+  #pragma acc parallel copy(arr) num_gangs(8) num_workers(8) vector_length(32)
+  {
+    int j;
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	int k;
+	#pragma acc loop worker
+	for (k = 0; k < 8; k++)
+          arr[j * 8 + k] += j * 8 + k;
+      }
+  }
+
+  for (i = 0; i < 32 * 8; i++)
+    assert (arr[i] == i);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-2.c
new file mode 100644
index 000..1023e22
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-2.c
@@ -0,0 +1,44 @@
+#include <assert.h>
+
+/* Test condition in worker-partitioned mode.  */
+
+int
+main (int argc, char *argv[])
+{
+  int arr[32 * 32 * 8], i;
+
+  for (i = 0; i < 32 * 32 * 8; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(8) num_workers(8) vector_length(32)
+  {
+    int j;
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	int k;
+	#pragma acc loop worker
+	for (k = 0; k < 8; k++)
+	  {
+	    int m;
+	    if ((k % 2) == 0)
+	      {
+		#pragma acc loop vector
+		for (m = 0; m < 32; m++)
+		  arr[j * 32 * 8 + k * 32 + m]++;
+	      }
+	    else
+	      {
+		#pragma acc loop vector
+		for (m = 0; m < 32; m++)
+		  arr[j * 32 * 8 + k * 32 + m] += 2;
+	      }
+	  }
+      }
+  }
+
+  for (i = 0; i < 32 * 32 * 8; i++)
+    assert (arr[i] == i + ((i / 32) % 2) + 1);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-3.c
new file mode 100644
index 000..a13a571
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-3.c
@@ -0,0 +1,54 @@
+#include <assert.h>
+
+/* Test switch in worker-partitioned mode.  */
+
+int
+main (int argc, char *argv[])
+{
+  int arr[32 * 32 * 8], i;
+
+  for (i = 0; i < 32 * 32 * 8; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(8) num_workers(8) vector_length(32)
+  {
+    int j;
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	int k;
+	#pragma acc loop worker
+	for (k = 0; k < 8; k++)
+	  {
+	    int m;
+	    switch ((j * 32 + k) % 3)
+	    {
+	    case 0:
+	      #pragma acc loop vector
+	      for (m = 0; m < 32; m++)
+		arr[j * 32 * 8 + k * 32 + m]++;
+	      break;
+
+	    case 1:
+	      #pragma acc loop vector
+	      for (m = 0; m < 32; m++)
+		arr[j * 32 * 8 + k * 32 + m] += 2;
+	      break;
+
+	    case 2:
+	      #pragma acc loop vector
+	      for (m = 0; m < 32; m++)
+		arr[j * 32 * 8 + k * 32 + m] += 3;
+	      break;
+
+	    default: ;
+	    }
+	  }
+      }
+  }
+
+  for (i = 0; i < 32 * 32 * 8; i++)
+    assert (arr[i] == i + ((i / 32) % 3) + 1);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-4.c
new file mode 100644
index 000..0902c80
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-4.c
@@ -0,0 +1,54 @@
+#include <assert.h>
+
+/* Test worker-single/worker-partitioned transitions.  */
+
+int
+main (int argc, char *argv[])
+{
+  int n[32], arr[32 * 32], i;
+
+  for (i = 0; i < 32 * 32; i++)
+    arr[i] = 0;
+
+  for (i = 0; i < 32; i++)
+    n[i] = 0;
+
+  #pragma acc parallel copy(n, arr) num_gangs(8) num_workers(16) \
+	  vector_length(32)
+  {
+    int j;
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	int k;
+
+	n[j]++;
+
+	#pragma acc loop worker
+	for (k = 0; k < 32; k++)
+          arr[j * 32 + k]++;
+
+	n[j]++;
+
+	#pragma acc loop worker
+	for (k = 0; k < 32; k++)
+          arr[j * 32 + k]++;
+
+	n[j]++;
+
+	#pragma acc loop worker
+	for (k = 0

Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-03 Thread Julian Brown
On Thu, 28 May 2015 16:37:04 +0200
Richard Biener <richard.guent...@gmail.com> wrote:

> On Thu, May 28, 2015 at 4:06 PM, Julian Brown
> <jul...@codesourcery.com> wrote:
> > For NVPTX, it is vitally important that the divergence of threads
> > within a warp can be controlled: in particular we must be able to
> > generate code that we know reconverges at a particular point.
> > Unfortunately GCC's middle-end optimisers can cause this property to
> > be violated, which causes problems for the OpenACC execution model
> > we're planning to use for NVPTX.
>
> Hmm, I don't think adding a new edge flag is good nor necessary.  It
> seems to me that instead the broadcast operation should have abnormal
> control flow and thus basic-blocks should be split either before or
> after it (so either incoming or outgoing edge(s) should be
> abnormal).  I suppose splitting before the broadcast would be best
> (thus handle it similar to setjmp ()).

Here's a version of the patch that uses abnormal edges with semantics
unchanged, splitting the false/non-execution edge using a dummy block
to avoid the prohibited case of both EDGE_TRUE/EDGE_FALSE and
EDGE_ABNORMAL on the outgoing edges of a GIMPLE_COND.

So for a fragment like this:

  if (threadIdx.x == 0) /* cond_bb */
  {
    /* work */
    p0 = ...; /* assign */
  }
  pN = broadcast(p0);
  if (pN) goto T; else goto F;

Incoming edges to a broadcast operation have EDGE_ABNORMAL set:

  +--------+
  |cond_bb |--------,
  +--------+        |
      | (true edge) | (false edge)
      v             v
  +--------+    +-------+
  | (work) |    | dummy |
  +--------+    +-------+
  | assign |        |
  +--------+        |
ABNORM|             |ABNORM
      v             |
  +--------+<-------'
  |  bcast |
  +--------+
  |  cond  |
  +--------+
     / \
    T   F

The abnormal edges actually serve two purposes, I think: as well as
ensuring the broadcast operation takes place when a warp is
non-diverged/coherent, they ensure that p0 is not seen as uninitialised
along the false path from cond_bb, possibly leading to the broadcast
operation being optimised away as partially redundant. This feels
somewhat fragile though! We'll have to continue to think about
warp divergence in subsequent patches.
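
As a host-side illustration of the intended behaviour (not code from the
patch), the sketch below models one warp executing the fragment above:
only lane 0 does the work, the predicate is then broadcast, and every
lane branches the same way. WARP_SIZE, the function names and the
"input > 10" condition are all assumptions invented for the example, and
the broadcast is an ordinary loop standing in for the real warp
operation.

```c
#define WARP_SIZE 32

/* Hypothetical stand-in for the warp broadcast: every lane receives the
   value held by lane 0.  */
static void
broadcast_from_lane0 (const int *per_lane_in, int *per_lane_out)
{
  for (int lane = 0; lane < WARP_SIZE; lane++)
    per_lane_out[lane] = per_lane_in[0];
}

/* Model one predicated block and return how many lanes end up taking
   the T branch.  Lanes other than 0 skip the work (the false edge
   through the dummy block), so their p0 is never assigned by the work;
   it is zero-initialised here only to keep the model well defined.  */
static int
lanes_taking_true_branch (int input)
{
  int p0[WARP_SIZE] = { 0 };
  int pN[WARP_SIZE];
  int taken_t = 0;

  for (int lane = 0; lane < WARP_SIZE; lane++)
    if (lane == 0)              /* cond_bb */
      p0[lane] = input > 10;    /* work, then assign */

  broadcast_from_lane0 (p0, pN);  /* all lanes reconverged here */

  for (int lane = 0; lane < WARP_SIZE; lane++)
    if (pN[lane])               /* cond: goto T, else goto F */
      taken_t++;

  return taken_t;
}
```

In this model the result is always either 0 or WARP_SIZE, which is the
property the abnormal edges are meant to protect: no lane can reach the
final branch with a predicate that differs from lane 0's.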

The patch passes libgomp testing (with Bernd's recent worker-single
patch also). OK for gomp4 branch (together with the
previously-mentioned inline thread builtin patch)?

Thanks,

Julian

ChangeLog

gcc/
* omp-low.c (make_predication_test): Split false block out of
cond_bb, making latter edge abnormal.
(predicate_bb): Set EDGE_ABNORMAL on edges before broadcast
operations.

commit 38056ae4a29f93ce54715dfad843a233f3b0fd2a
Author: Julian Brown jul...@codesourcery.com
Date:   Mon Jun 1 11:12:41 2015 -0700

Use abnormal edges before broadcast ops

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 7048f9f..310eb72 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -10555,7 +10555,16 @@ make_predication_test (edge true_edge, basic_block skip_dest_bb, int mask)
   gsi_insert_after (tmp_gsi, cond_stmt, GSI_NEW_STMT);
 
   true_edge->flags = EDGE_TRUE_VALUE;
-  make_edge (cond_bb, skip_dest_bb, EDGE_FALSE_VALUE);
+
+  /* Force an abnormal edge before a broadcast operation that might be present
+     in SKIP_DEST_BB.  This is only done for the non-execution edge (with
+     respect to the predication done by this function) -- the opposite
+     (execution) edge that reaches the broadcast operation must be made
+     abnormal also, e.g. in this function's caller.  */
+  edge e = make_edge (cond_bb, skip_dest_bb, EDGE_FALSE_VALUE);
+  basic_block false_abnorm_bb = split_edge (e);
+  edge abnorm_edge = single_succ_edge (false_abnorm_bb);
+  abnorm_edge->flags |= EDGE_ABNORMAL;
 }
 
 /* Apply OpenACC predication to basic block BB which is in
@@ -10605,6 +10614,7 @@ predicate_bb (basic_block bb, struct omp_region *parent, int mask)
 		   mask);
 
   edge e = split_block (bb, splitpoint);
+  e->flags = EDGE_ABNORMAL;
   skip_dest_bb = e->dest;
 
   gimple_cond_set_condition (as_a <gcond *> (stmt), EQ_EXPR,
@@ -10624,6 +10634,7 @@ predicate_bb (basic_block bb, struct omp_region *parent, int mask)
 		   gsi_asgn, mask);
 
   edge e = split_block (bb, splitpoint);
+  e->flags = EDGE_ABNORMAL;
   skip_dest_bb = e->dest;
 
   gimple_switch_set_index (sstmt, new_var);


[gomp4] Expand OpenACC thread builtins inline

2015-05-28 Thread Julian Brown
For partitioned loops, we're currently calling library functions (in
libgcc) to determine the cardinality of the set of threads a particular
loop is distributed over (given a set of gang/worker/vector toggles),
and the index of the current thread within that set.

This patch reimplements those two functions in terms of the
(PTX-specific!) builtins that Bernd has recently added in order to
implement vector-single/worker-single predication, which expand
directly to machine instructions on the target (or to constant zero/one
on the host). It also makes use of the same gwv bitfields that are set
up by that new code.
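
For readers unfamiliar with the two quantities involved, here is a rough
standalone sketch of what "cardinality of the thread set" and "index
within that set" mean for a given gwv mask. It is not the code in the
patch: the MASK_* values, the function names, the parameter lists and
the gang-major linearisation order are all assumptions made for the
example.

```c
/* Assumed bit assignments, for illustration only.  */
#define MASK_GANG   1
#define MASK_WORKER 2
#define MASK_VECTOR 4

/* Number of threads a loop partitioned over GWV_BITS is spread across,
   given the launch dimensions.  Levels that are not selected contribute
   a factor of one.  */
static int
num_threads (int gwv_bits, int gangs, int workers, int vector_len)
{
  int n = 1;
  if (gwv_bits & MASK_GANG)
    n *= gangs;
  if (gwv_bits & MASK_WORKER)
    n *= workers;
  if (gwv_bits & MASK_VECTOR)
    n *= vector_len;
  return n;
}

/* Index of the current thread within that set, linearised gang-major.
   GANG, WORKER and LANE are this thread's coordinates.  */
static int
thread_num (int gwv_bits, int workers, int vector_len,
            int gang, int worker, int lane)
{
  int id = 0;
  if (gwv_bits & MASK_GANG)
    id = gang;
  if (gwv_bits & MASK_WORKER)
    id = id * workers + worker;
  if (gwv_bits & MASK_VECTOR)
    id = id * vector_len + lane;
  return id;
}
```

With an empty mask the sketch yields one thread with index zero, which
matches the constant one/zero expansion described above for the host.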

The previous BUILT_IN_GOACC_GET_THREAD_NUM and
BUILT_IN_GOACC_GET_NUM_THREADS builtins are removed entirely.

This works reasonably well, but there are some regressions caused by
middle-end optimisers having extra freedom to manipulate the CFG in
ways that PTX cannot support without the optimisation barrier of the
calls to the thread builtins being present. This will be addressed by a
follow-on patch.

Pre-approved for gomp4, but I'll wait for comments on the follow-on
patch before applying so as not to leave the branch in a broken state.

Thanks,

Julian

ChangeLog

gcc/
* builtins.c (expand_oacc_builtin): Return const1_rtx for
ntid/nctaid builtins when the associated patterns are not present.
* omp-builtins.def (BUILT_IN_GOACC_GET_THREAD_NUM)
(BUILT_IN_GOACC_GET_NUM_THREADS): Remove.
* omp-low.c (struct omp_for_data): Remove gang, worker, vector
fields.
(extract_omp_for_data): Don't initialise deleted gang, worker,
vector fields.
(expand_oacc_get_num_threads, expand_oacc_get_thread_num): New
functions.
(lower_reduction_clauses): Use above functions.
(expand_omp_for_static_nochunk): Likewise.
(expand_omp_for_static_chunk): Likewise.
commit 1be8ada44a9f91d2eba16ef1f81243707647f237
Author: Julian Brown jul...@codesourcery.com
Date:   Fri May 15 03:20:42 2015 -0700

Inlined OpenACC thread builtins.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index ebd4b4a..cd51821 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -5964,8 +5964,8 @@ expand_oacc_builtin (enum built_in_function fcode, tree exp, rtx target)
     case BUILT_IN_GOACC_NTID:
 #ifdef HAVE_oacc_ntid
       icode = CODE_FOR_oacc_ntid;
-      result = const1_rtx;
 #endif
+      result = const1_rtx;
       break;
     case BUILT_IN_GOACC_TID:
 #ifdef HAVE_oacc_tid
@@ -5975,8 +5975,8 @@ expand_oacc_builtin (enum built_in_function fcode, tree exp, rtx target)
     case BUILT_IN_GOACC_NCTAID:
 #ifdef HAVE_oacc_nctaid
       icode = CODE_FOR_oacc_nctaid;
-      result = const1_rtx;
 #endif
+      result = const1_rtx;
       break;
     case BUILT_IN_GOACC_CTAID:
 #ifdef HAVE_oacc_ctaid
diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index ac1f802..47d9e45 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -69,10 +69,6 @@ DEF_GOACC_BUILTIN (BUILT_IN_GOACC_NCTAID, GOACC_nctaid,
 		   BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_CTAID, "GOACC_ctaid",
 		   BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
-DEF_GOACC_BUILTIN (BUILT_IN_GOACC_GET_THREAD_NUM, "GOACC_get_thread_num",
-		   BT_FN_INT_INT_INT_INT, ATTR_NOTHROW_LEAF_LIST)
-DEF_GOACC_BUILTIN (BUILT_IN_GOACC_GET_NUM_THREADS, "GOACC_get_num_threads",
-		   BT_FN_INT_INT_INT_INT, ATTR_NOTHROW_LEAF_LIST)
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_GET_GANGLOCAL_PTR, "GOACC_get_ganglocal_ptr",
 		   BT_FN_PTR, ATTR_NOTHROW_LEAF_LIST)
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_DEVICEPTR, "GOACC_deviceptr",
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index b114887..f82247b 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -263,7 +263,6 @@ struct omp_for_data
   tree chunk_size;
   gomp_for *for_stmt;
   tree pre, iter_type;
-  tree gang, worker, vector;
   int collapse;
   bool have_nowait, have_ordered;
   enum omp_clause_schedule_kind sched_kind;
@@ -749,16 +748,6 @@ extract_omp_for_data (gomp_for *for_stmt, struct omp_for_data *fd,
       gcc_assert (fd->chunk_size == NULL_TREE);
       fd->chunk_size = build_int_cst (TREE_TYPE (fd->loop.v), 1);
     }
-
-  /* Extract the OpenACC gang, worker and vector clauses.  */
-  t = find_omp_clause (gimple_omp_for_clauses (for_stmt), OMP_CLAUSE_GANG);
-  fd->gang = (t == NULL_TREE) ? integer_zero_node : integer_one_node;
-
-  t = find_omp_clause (gimple_omp_for_clauses (for_stmt), OMP_CLAUSE_WORKER);
-  fd->worker = (t == NULL_TREE) ? integer_zero_node : integer_one_node;
-
-  t = find_omp_clause (gimple_omp_for_clauses (for_stmt), OMP_CLAUSE_VECTOR);
-  fd->vector = (t == NULL_TREE) ? integer_zero_node : integer_one_node;
 }
 
 
@@ -4919,6 +4908,159 @@ is_atomic_compatible_reduction (tree var, omp_context *ctx)
   return true;
 }
 
+
+/* Find the total number of threads used by a region partitioned by
+   GWV_BITS.  Setup code required for the calculation is added to SEQ.  Note
+   that this is currently used from both OMP-lowering and OMP-expansion phases,
+   and uses

Re: acc_on_device for device_type_host_nonshm

2015-05-28 Thread Julian Brown
On Thu, 28 May 2015 04:48:58 -0700
H.J. Lu <hjl.to...@gmail.com> wrote:

> On Thu, May 21, 2015 at 4:10 AM, Jakub Jelinek <ja...@redhat.com>
> wrote:
> > On Thu, May 21, 2015 at 01:02:12PM +0200, Thomas Schwinge wrote:
> >> Hi!
> >>
> >> On Thu, 7 May 2015 19:32:26 +0100, Julian Brown
> >> <jul...@codesourcery.com> wrote:
> >> > Here's a new version of the patch [...]
> >>
> >> > OK for trunk?
> >>
> >> Makes sense to me (with just a request to drop the testsuite
> >> changes, see below), to get the existing regressions under
> >> control.  Jakub?
> >
> > Ok for trunk.
> >
> >> > PR libgomp/65742
> >> >
> >> > gcc/
> >> > * builtins.c (expand_builtin_acc_on_device): Don't use
> >> > open-coded sequence for !ACCEL_COMPILER.
> >> >
> >
> It breaks bootstrap on x86:
>
> https://gcc.gnu.org/ml/gcc-regression/2015-05/msg00389.html
>
> I checked in this to fix it.

Apologies, and thanks!

Julian


[gomp4] Preserve NVPTX reconvergence points

2015-05-28 Thread Julian Brown
 (canonicalize_loop_closed_ssa):
Likewise.
* predict.c (tree_bb_level_predictions): Likewise.
* profile.c (instrument_edges, branch_prop, find_spanning_tree):
Likewise.
* tree-cfg.c (replace_uses_by, gimple_split_edge)
(gimple_redirect_edge_and_branch, split_critical_edges): Likewise.
* tree-cfgcleanup.c (tree_forwarder_block_p, remove_forwarder_block)
(pass_merge_phi::execute): Likewise.
* tree-chkp.c (chkp_fix_cfg): Likewise.
* tree-if-conv.c (if_convertible_bb_p): Likewise.
* tree-inline.c (update_ssa_across_abnormal_edges): Likewise.
* tree-into-ssa.c (rewrite_update_phi_arguments)
(rewrite_update_dom_walker::before_dom_children)
(create_new_def_for): Likewise.
* tree-outof-ssa.c (eliminate_phi): Likewise.
* tree-phinodes.c (add_phi_arg): Likewise.
* tree-ssa-coalesce.c (coalesce_cost_edge, create_outofssa_var_map)
(coalesce_partitions): Likewise.
* tree-ssa-dom.c (cprop_into_successor_phis)
(dom_opt_dom_walker::after_dom_children, propagate_rhs_into_lhs):
Likewise.
* tree-ssa-loop-im.c (loop_suitable_for_sm): Likewise.
* tree-ssa-loop-prefetch.c (emit_mfence_after_loop)
(may_use_storent_in_loop_p): Likewise.
* tree-ssa-phiopt.c (tree_ssa_phiopt_worker): Likewise.
* tree-ssa-pre.c (compute_antic, insert_into_preds_of_block):
Likewise.
* tree-ssa-propagate.c (simulate_block, replace_phi_args_in):
Likewise.
* tree-ssa-sink.c (sink_code_in_bb): Likewise.
* tree-ssa-threadedge.c (thread_across_edge): Likewise.
* tree-ssa-threadupdate.c (thread_single_edge): Likewise.
* tree-ssa-uninit.c (compute_control_dep_chain): Likewise.
* tree-ssa.c (verify_phi_args): Likewise.
* tree-vect-loop.c (vect_analyze_loop_form): Likewise.
* value-prof.c (gimple_ic): Likewise.
* tree-vrp.c (infer_value_range, process_assert_insertions_for):
Likewise.
(find_conditional_asserts): Skip over EDGE_TO_RECONVERGENCE edges.
commit 472bd543b30356f7a4c59efc961f9f61b11ca197
Author: Julian Brown jul...@codesourcery.com
Date:   Wed May 20 11:35:45 2015 -0700

Introduce EDGE_TO_RECONVERGENCE, and tweak some uses of EDGE_ABNORMAL.

diff --git a/gcc/basic-block.h b/gcc/basic-block.h
index f28fa57..7fe25f0 100644
--- a/gcc/basic-block.h
+++ b/gcc/basic-block.h
@@ -70,7 +70,8 @@ enum cfg_edge_flags {
Test the edge flags on EDGE_COMPLEX to detect all forms of strange
control flow transfers.  */
 #define EDGE_COMPLEX \
-  (EDGE_ABNORMAL | EDGE_ABNORMAL_CALL | EDGE_EH | EDGE_PRESERVE)
+  (EDGE_ABNORMAL | EDGE_ABNORMAL_CALL | EDGE_EH | EDGE_PRESERVE \
+   | EDGE_TO_RECONVERGENCE)
 
 struct GTY(()) rtl_bb_info {
   /* The first insn of the block is embedded into bb->il.x.  */
@@ -559,6 +560,20 @@ bb_has_abnormal_pred (basic_block bb)
   return false;
 }
 
+static inline bool
+bb_has_abnorm_or_reconv_pred (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    {
+      if (e->flags & (EDGE_ABNORMAL | EDGE_TO_RECONVERGENCE))
+	return true;
+    }
+  return false;
+}
+
 /* Return the fallthru edge in EDGES if it exists, NULL otherwise.  */
 static inline edge
 find_fallthru_edge (vec<edge, va_gc> *edges)
@@ -629,9 +644,10 @@ has_abnormal_or_eh_outgoing_edge_p (basic_block bb)
   edge_iterator ei;
 
   FOR_EACH_EDGE (e, ei, bb->succs)
-    if (e->flags & (EDGE_ABNORMAL | EDGE_EH))
+    if (e->flags & (EDGE_ABNORMAL | EDGE_EH | EDGE_TO_RECONVERGENCE))
       return true;
 
   return false;
 }
+
 #endif /* GCC_BASIC_BLOCK_H */
diff --git a/gcc/cfg-flags.def b/gcc/cfg-flags.def
index eedcd69..fd51e2f 100644
--- a/gcc/cfg-flags.def
+++ b/gcc/cfg-flags.def
@@ -177,6 +177,10 @@ DEF_EDGE_FLAG(TM_UNINSTRUMENTED, 15)
 /* Abort (over) edge out of a GIMPLE_TRANSACTION statement.  */
 DEF_EDGE_FLAG(TM_ABORT, 16)
 
+/* An immutable edge to an OpenACC (currently, NVPTX) reconvergence point. 
+   This flag is only used for the GIMPLE CFG.  */
+DEF_EDGE_FLAG(TO_RECONVERGENCE, 17)
+
 #endif
 
 /*
diff --git a/gcc/cfgbuild.c b/gcc/cfgbuild.c
index 7cbed50..7185f07 100644
--- a/gcc/cfgbuild.c
+++ b/gcc/cfgbuild.c
@@ -449,7 +449,7 @@ purge_dead_tablejump_edges (basic_block bb, rtx_jump_table_data *table)
       if (FULL_STATE (e->dest) & BLOCK_USED_BY_TABLEJUMP)
 	SET_STATE (e->dest, FULL_STATE (e->dest)
 			     & ~(size_t) BLOCK_USED_BY_TABLEJUMP);
-      else if (!(e->flags & (EDGE_ABNORMAL | EDGE_EH)))
+      else if (!(e->flags & (EDGE_ABNORMAL | EDGE_EH | EDGE_TO_RECONVERGENCE)))
 	{
 	  remove_edge (e);
 	  continue;
diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c
index 797d14a..e73062a 100644
--- a/gcc/cfgcleanup.c
+++ b/gcc/cfgcleanup.c
@@ -2031,7 +2031,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
 
   /* Avoid deleting preserve label when redirecting ABNORMAL edges.  */
   if (block_has_preserve_label (e1->dest)
-      && (e1->flags & EDGE_ABNORMAL))
+      && (e1->flags & (EDGE_ABNORMAL | EDGE_TO_RECONVERGENCE)))
     return false;
 
   /* Here we know that the insns in the end of SRC1

Re: [gomp4] Vector-single predication

2015-05-21 Thread Julian Brown
On Thu, 21 May 2015 13:57:00 +0200
Jakub Jelinek <ja...@redhat.com> wrote:

> On Thu, May 21, 2015 at 01:42:11PM +0200, Bernd Schmidt wrote:
> > This uses the patch I committed yesterday which introduces warp
> > broadcasts to implement the vector-single predication needed for
> > OpenACC. Outside a loop with vector parallelism, only one of the
> > threads representing a vector must execute, the others follow
> > along. So we skip the real work in each basic block for the
> > inactive threads, then broadcast the direction to take in the
> > control flow graph from the active one, and jump as a group.
> >
> > This will get extended with similar functionality for
> > worker-single. Julian is working on some patches on top of that to
> > ensure the later optimizers don't destroy the control flow - we
> > really need the threads to reconverge and perform the
> > broadcast/jump in lockstep.
> >
> > Committed on gomp-4_0-branch.
>
> What do you do with function calls?
> Do you call them just in the (tid.x & 31) == 0 threads (then they
> can't use vectorization), or for all threads (then it is an ABI
> change, they would need to know whether they are called this way and
> depending on that handle it similarly (skip all the real work, except
> for function calls, for (tid.x & 31) != 0, unless it is a vectorized
> region). Or is OpenACC restricting this to statements in the
> constructs directly (rather than anywhere in the region)?

OpenACC handles function calls specially (calling them routines -- of
varying sorts, gang, worker, vector or seq, affecting where they can be
invoked from). The plan is that all threads will call such routines --
and then some threads will be neutered as appropriate within the
routines themselves.

That's not actually implemented yet, though.

Julian


Re: [gomp4] Vector-single predication

2015-05-21 Thread Julian Brown
On Thu, 21 May 2015 14:38:19 +0100
Julian Brown <jul...@codesourcery.com> wrote:

> On Thu, 21 May 2015 15:21:54 +0200
> Jakub Jelinek <ja...@redhat.com> wrote:
>
> > On Thu, May 21, 2015 at 02:05:12PM +0100, Julian Brown wrote:
> > > OpenACC handles function calls specially (calling them routines
> > > -- of varying sorts, gang, worker, vector or seq, affecting where
> > > they can be invoked from). The plan is that all threads will call
> > > such routines -- and then some threads will be neutered as
> > > appropriate within the routines themselves, as appropriate.
> >
> > All functions will behave that way, or just some using some magic
> > attribute etc.?  Say will newlib functions behave this way (math
> > functions, printf, ...)?
>
> It's actually unclear at this point if regular functions are
> supported by OpenACC at all (the spec says nothing about them). They
> probably raise interesting questions about re-entrancy,
> synchronisation, and so on.

...actually, replied too soon: regular math functions, etc. will be
handled the same as routines declared with seq. They won't contain
partitioned loops, and can be called from anywhere in an offloaded
region.
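
As a small illustration of a seq routine (written for this note, not
taken from any of the patches), the sketch below declares an ordinary
helper as "routine seq" and calls it from a parallel loop. The function
names are made up for the example. Built without OpenACC support the
pragmas are simply ignored and the loop runs sequentially on the host;
built with -fopenacc the same source should be eligible for offloading.

```c
/* A sequential routine: it contains no partitioned loops, so it can be
   called from any level of parallelism in an offloaded region.  */
#pragma acc routine seq
static int
scale_and_offset (int x)
{
  return x * 2 + 1;
}

/* Apply the routine to every element of ARR from a parallel loop.  */
static void
apply_routine (int *arr, int n)
{
  #pragma acc parallel loop copy(arr[0:n])
  for (int i = 0; i < n; i++)
    arr[i] = scale_and_offset (arr[i]);
}
```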

Julian


Re: [gomp4] Vector-single predication

2015-05-21 Thread Julian Brown
On Thu, 21 May 2015 15:21:54 +0200
Jakub Jelinek <ja...@redhat.com> wrote:

> On Thu, May 21, 2015 at 02:05:12PM +0100, Julian Brown wrote:
> > OpenACC handles function calls specially (calling them routines
> > -- of varying sorts, gang, worker, vector or seq, affecting where
> > they can be invoked from). The plan is that all threads will call
> > such routines -- and then some threads will be neutered as
> > appropriate within the routines themselves, as appropriate.
>
> All functions will behave that way, or just some using some magic
> attribute etc.?  Say will newlib functions behave this way (math
> functions, printf, ...)?

It's actually unclear at this point if regular functions are
supported by OpenACC at all (the spec says nothing about them). They
probably raise interesting questions about re-entrancy,
synchronisation, and so on.

> For math functions e.g. it would be nice if
> they could behave both ways (perhaps as separate entrypoints), so
> have the possibility to say how many threads from the warp will
> perform the operation and then work on array arguments and array
> return value (kind like OpenMP or Cilk+ elemental functions, just
> perhaps with different argument/return value passing conventions).

And that's something that's way outside the spec as currently defined,
AFAIK.

Julian


[gomp4] Lack of OpenACC NVPTX devices is not an error during scanning

2015-05-19 Thread Julian Brown
Hi,

This patch fixes an oversight whereby if the CUDA libraries are
available for some reason on a system that doesn't actually contain an
nVidia card, an OpenACC program will raise an error if the NVPTX
backend is picked as a default instead of falling back to some other
device instead.
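
The intended behaviour can be modelled without any CUDA dependency. The
sketch below is only an illustration: the struct, the function-pointer
interface and all names in it are hypothetical and do not correspond to
the real libgomp plugin API. It shows a backend whose runtime fails to
initialise reporting zero devices, so that the scan moves on to the next
candidate.

```c
#include <stddef.h>

/* Hypothetical backend description for this sketch only.  */
typedef int (*init_fn) (void);    /* returns 0 on success */
typedef int (*count_fn) (void);   /* number of devices visible */

struct backend
{
  const char *name;
  init_fn init;
  count_fn count;
};

/* A runtime that fails to initialise is treated as "no devices", not
   as a fatal error.  */
static int
backend_num_devices (const struct backend *b)
{
  if (b->init () != 0)
    return 0;
  return b->count ();
}

/* Return the first backend that has at least one device, or NULL.  */
static const struct backend *
pick_backend (const struct backend *list, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (backend_num_devices (&list[i]) > 0)
      return &list[i];
  return NULL;
}

/* Stub callbacks used to exercise the sketch.  */
static int init_fails (void) { return 1; }
static int init_ok (void) { return 0; }
static int one_device (void) { return 1; }
```

For example, given a list holding a backend built from init_fails
followed by one built from init_ok, pick_backend returns the second
entry rather than stopping at the first.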

OK for gomp4 branch? For trunk?

Thanks,

Julian

ChangeLog

libgomp/
* plugin/plugin-nvptx.c (nvptx_get_num_devices): Return zero
on cuInit failure.

commit 696a0d7e22bb8217ff581886cdf0979bfc2e85bb
Author: Julian Brown jul...@codesourcery.com
Date:   Fri May 15 03:22:56 2015 -0700

Lack of PTX devices is not an error during scanning.

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index b36691a..d09a91c 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -781,7 +781,13 @@ nvptx_get_num_devices (void)
      until cuInit has been called.  Just call it now (but don't yet do any
      further initialization).  */
   if (instantiated_devices == 0)
-    cuInit (0);
+    {
+      r = cuInit (0);
+      /* This is not an error: e.g. we may have CUDA libraries installed but
+         no devices available.  */
+      if (r != CUDA_SUCCESS)
+        return 0;
+    }
 
   r = cuDeviceGetCount (&n);
   if (r!= CUDA_SUCCESS)


[gomp4] Add OpenACC vector-single/vector-partitioned tests

2015-05-19 Thread Julian Brown
Hi,

This patch adds several tests of vector-single/vector-partitioned mode,
as part of work implementing the OpenACC execution model.

Pre-approved for gomp4 branch. I will apply there shortly.

Thanks,

Julian

ChangeLog

libgomp/
* testsuite/libgomp.oacc-c-c++-common/vec-single-{1,2,3,4,5,6}.c:
New tests.
* testsuite/libgomp.oacc-c-c++-common/vec-partn-{1,2,3,4,5,6}.c:
New tests.
commit b2bb572cef2b6b0984d65995e070dc424b03a525
Author: jbrown jbrown@e7755896-6108-0410-9592-8049d3e74e28
Date:   Mon May 11 16:04:48 2015 +

Add vector-single/vector-partitioned tests.

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-1.c
new file mode 100644
index 000..b21e588
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-1.c
@@ -0,0 +1,30 @@
+#include <assert.h>
+
+/* Test basic vector-partitioned mode transitions.  */
+
+int
+main (int argc, char *argv[])
+{
+  int n = 0, arr[32], i;
+
+  for (i = 0; i < 32; i++)
+    arr[i] = 0;
+
+  #pragma acc parallel copy(n, arr) num_gangs(1) num_workers(1) \
+		       vector_length(32)
+  {
+    int j;
+    n++;
+    #pragma acc loop vector
+    for (j = 0; j < 32; j++)
+      arr[j]++;
+    n++;
+  }
+
+  assert (n == 2);
+
+  for (i = 0; i < 32; i++)
+    assert (arr[i] == 1);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-2.c
new file mode 100644
index 000..1ff222d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-2.c
@@ -0,0 +1,43 @@
+#include <assert.h>
+
+/* Test vector-partitioned, gang-partitioned mode.  */
+
+int
+main (int argc, char *argv[])
+{
+  int n[32], arr[1024], i;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = 0;
+
+  for (i = 0; i < 32; i++)
+    n[i] = 0;
+
+  #pragma acc parallel copy(n, arr) num_gangs(32) num_workers(1) \
+		       vector_length(32)
+  {
+    int j, k;
+
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 32; j++)
+      n[j]++;
+
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      #pragma acc loop vector
+      for (k = 0; k < 32; k++)
+	arr[j * 32 + k]++;
+
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 32; j++)
+      n[j]++;
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (n[i] == 2);
+
+  for (i = 0; i < 1024; i++)
+    assert (arr[i] == 1);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-3.c
new file mode 100644
index 000..7908d4c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-3.c
@@ -0,0 +1,54 @@
+#include <assert.h>
+
+/* Test conditional vector-partitioned loops.  */
+
+int
+main (int argc, char *argv[])
+{
+  int n[32], arr[1024], i;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = 0;
+
+  for (i = 0; i < 32; i++)
+    n[i] = 0;
+
+  #pragma acc parallel copy(n, arr) num_gangs(32) num_workers(1) \
+		       vector_length(32)
+  {
+    int j, k;
+
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 32; j++)
+      n[j]++;
+
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	if ((j % 2) == 0)
+	  {
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[j * 32 + k]++;
+	  }
+	else
+	  {
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[j * 32 + k]--;
+	  }
+      }
+
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 32; j++)
+      n[j]++;
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (n[i] == 2);
+
+  for (i = 0; i < 1024; i++)
+    assert (arr[i] == (((i % 64) < 32) ? 1 : -1));
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-4.c
new file mode 100644
index 000..4ea3bf2
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-4.c
@@ -0,0 +1,46 @@
+#include <assert.h>
+
+/* Test conditions inside vector-partitioned loops.  */
+
+int
+main (int argc, char *argv[])
+{
+  int n[32], arr[1024], i;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  for (i = 0; i < 32; i++)
+    n[i] = 0;
+
+  #pragma acc parallel copy(n, arr) num_gangs(32) num_workers(1) \
+		       vector_length(32)
+  {
+    int j, k;
+
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 32; j++)
+      n[j]++;
+
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	#pragma acc loop vector
+	for (k = 0; k < 32; k++)
+	  if ((arr[j * 32 + k] % 2) != 0)
+	    arr[j * 32 + k] *= 2;
+      }
+
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 32; j++)
+      n[j]++;
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (n[i] == 2);
+
+  for (i = 0; i < 1024; i++)
+    assert (arr[i] == ((i % 2) == 0 ? i : i * 2));
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-5.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-5.c
new file mode 100644
index 

Re: acc_on_device for device_type_host_nonshm (was: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks) (PR65742)

2015-05-07 Thread Julian Brown
On Fri, 17 Apr 2015 15:16:19 +0200
Jakub Jelinek ja...@redhat.com wrote:

 On Tue, Apr 14, 2015 at 05:43:26PM +0200, Thomas Schwinge wrote:
  On Tue, 14 Apr 2015 15:15:02 +0100, Julian Brown
  jul...@codesourcery.com wrote:
   On Wed, 8 Apr 2015 17:58:56 +0300
   Ilya Verbin iver...@gmail.com wrote:
I see several regressions:
FAIL:
libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_on_device-1.c
-DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution
test FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c
-DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution
test
   
   I think there may be multiple issues here. The attached patch
   addresses one -- acc_device_type not distinguishing between
   offloaded and host code with the host_nonshm plugin.
  
  (You mean acc_on_device?)
  
   --- libgomp/oacc-init.c   (revision 221922)
   +++ libgomp/oacc-init.c   (working copy)
   @@ -548,7 +549,14 @@ ialias (acc_set_device_num)
int
acc_on_device (acc_device_t dev)
{
   -  if (acc_get_device_type () == acc_device_host_nonshm)
   +  struct goacc_thread *thr = goacc_thread ();
   +
   +  /* We only want to appear to be the host_nonshm plugin from
   offloaded
   + code -- i.e. within a parallel region.  Test a flag set by
   the
   + openacc_parallel hook of the host_nonshm plugin to
   determine that.  */
   +  if (acc_get_device_type () == acc_device_host_nonshm
   +      && thr && thr->target_tls
   +      && ((struct nonshm_thread *)thr->target_tls)->nonshm_exec)
return dev == acc_device_host_nonshm || dev ==
   acc_device_not_host; 
  /* Just rely on the compiler builtin.  */
  
  Really, acc_on_device is implemented as a compiler builtin (which
  is just disabled for a few libgomp test cases, in order to test the
  acc_on_device library function in libgomp), and I never understood
  why the fallback implementation in libgomp (cited above) should
  be doing anything different from the GCC builtin.  Is the problem
  actually, that some
 
 The question is if the builtin expansion isn't wrong, at least as
 long as the host_nonshm device is meant to be supported.  The
 #ifdef ACCEL_COMPILER
 case is easier, at least as long as ACCEL_COMPILER compiled code is
 not meant to be able to offload to other devices (or host again), but
 the non-ACCEL_COMPILER case means the code is either on the host, or
 host_nonshm, or e.g. with Intel MIC you could have some shared
 library be compiled by the host compiler, but then actually linked
 into the MIC offloaded path.  In all those cases, I think it is just
 the library that can determine the return value.
 
 E.g. OpenMP omp_is_initial_device function is also only implemented
 in the library, perhaps at some point I could expand it for #ifdef
 ACCEL_COMPILER as builtin, but not for the host code, at least not
 due to the host-nonshm plugin.

Here's a new version of the patch that doesn't use the open-coded
expansion for acc_on_device for the host compiler at all. This means
that the host and the host_nonshm plugin should DTRT without any
special compiler options (which have thus been removed from the libgomp
tests that set them or refer to them).

So now, for the host, acc_on_device returns:

acc_on_device (acc_device_none): true
acc_on_device (acc_device_host): true
otherwise: false

When the host_nonshm plugin is active, acc_on_device returns:

acc_on_device (acc_device_host_nonshm): true (except when host
fallback is in effect, i.e. because of a false if clause).
acc_on_device (acc_device_not_host): likewise.
otherwise: false

In particular, the host_nonshm plugin doesn't consider itself to be
running code on the host.
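
The return values listed above can be summarised as a small truth-table
function. This is only a sketch of the described behaviour, not the
libgomp implementation: dev_type, model_acc_on_device and the
in_offload_region flag are hypothetical names, and treating host fallback
exactly like the plain host case is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the acc_device_t values mentioned in this
   message; the real enumeration lives in openacc.h.  */
typedef enum
{
  dev_none,
  dev_host,
  dev_host_nonshm,
  dev_not_host,
  dev_nvidia
} dev_type;

/* ACTIVE is the device type in use (dev_host or dev_host_nonshm here).
   IN_OFFLOAD_REGION is false when host fallback is in effect, e.g.
   because of a false "if" clause.  */
static bool
model_acc_on_device (dev_type active, bool in_offload_region,
                     dev_type query)
{
  if (active == dev_host_nonshm && in_offload_region)
    return query == dev_host_nonshm || query == dev_not_host;

  /* Plain host.  Assumption: host fallback answers like the host.  */
  return query == dev_none || query == dev_host;
}
```

The offload-region flag plays the role that the nonshm_exec flag in
thread-local data plays in the patch.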

OK for trunk?

Julian

ChangeLog

PR libgomp/65742

gcc/
* builtins.c (expand_builtin_acc_on_device): Don't use open-coded
sequence for !ACCEL_COMPILER.

libgomp/
* oacc-init.c (plugin/plugin-host.h): Include.
(acc_on_device): Check whether we're in an offloaded region for
host_nonshm
plugin. Don't use __builtin_acc_on_device.
* plugin/plugin-host.c (GOMP_OFFLOAD_openacc_parallel): Set
nonshm_exec flag in thread-local data.
(GOMP_OFFLOAD_openacc_create_thread_data): Allocate thread-local
data for host_nonshm plugin.
(GOMP_OFFLOAD_openacc_destroy_thread_data): Free thread-local data
for host_nonshm plugin.
* plugin/plugin-host.h: New.
* testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c: Remove
-fno-builtin-acc_on_device flag.
* testsuite/libgomp.oacc-c-c++-common/if-1.c: Likewise.
* testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Remove
comment re: acc_on_device builtin.
* testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Likewise.
* testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f: Likewise.

commit adccf2e7d313263d585f63e752a4d36653d47811
Author: Julian Brown jul...@codesourcery.com
Date:   Tue Apr 21 12:40:45 2015 -0700

Non-SHM acc_on_device fixes

diff --git a/gcc/builtins.c b/gcc

Re: [PATCH] Fix OpenACC shutdown and PTX image unloading (PR65904)

2015-05-07 Thread Julian Brown
On Wed, 6 May 2015 10:32:56 +0200
Thomas Schwinge tho...@codesourcery.com wrote:

 Hi!
 
 On Fri, 1 May 2015 10:47:19 +0100, Julian Brown
 jul...@codesourcery.com wrote:
  The patch also fixes a thinko that was revealed in image unloading
  in the NVPTX backend. Tested for libgomp with PTX offloading.
 
 Confirming that both nvptx (PR65904 fixed) and (emulated) intelmic (no
 changes) testing look fine.

Thanks for testing!

 By the way, do we need to lock ptx_devices in
 libgomp/plugin/plugin-nvptx.c:nvptx_attach_host_thread_to_device and
 libgomp/plugin/plugin-nvptx.c:GOMP_OFFLOAD_openacc_create_thread_data?
 (ptx_dev_lock?  If yes, its definition as well as instantiated_devices
 should be moved to a more appropriate place, probably?)

Probably yes (though I'm not sure what you mean about moving the
instantiated_devices and ptx_dev_lock to a more appropriate place?).

 Also, several
 accesses to instantiated_devices are not locked by ptx_dev_lock but
 should be, from my cursory review.

I'm not sure about that.

  --- a/libgomp/target.c
  +++ b/libgomp/target.c
  @@ -797,32 +797,79 @@ GOMP_offload_register (void *host_table, enum
  offload_target_type target_type, gomp_mutex_unlock (register_lock);
   }
   
  -/* This function should be called from every offload image while
  unloading.
  -   It gets the descriptor of the host func and var tables
  HOST_TABLE, TYPE of
  -   the target, and TARGET_DATA needed by target plugin.  */
  +/* DEVICEP should be locked on entry, and remains locked on exit.
  */
 
 (I'm not a native speaker, but would use what I consider to be more
 explicit/precise language: »must be locked« instead of »should be«.
 I'll be happy to learn should they mean the same thing?)

I've changed the wording in a couple of comments.

   
  -void
  -GOMP_offload_unregister (void *host_table, enum
  offload_target_type target_type,
  -void *target_data)
  +static void
  +gomp_deoffload_image_from_device (struct gomp_device_descr
  *devicep,
  + void *host_table, void
  *target_data) {
 
  +/* This function should be called from every offload image while
  unloading.
 
 s%from%for%, I think?  (And, s%should%must%, again?)

No, this really is from -- this comment wasn't actually added by my
patch, just moved. I'm also not sure about should in this instance --
unloading an image is already a corner-case, and maybe there are
circumstances in which it'd be impossible for some given object to call
the function?

  +   It gets the descriptor of the host func and var tables
  HOST_TABLE, TYPE of
  +   the target, and TARGET_DATA needed by target plugin.  */
  +
  +void
  +GOMP_offload_unregister (void *host_table, enum
  offload_target_type target_type,
  +void *target_data)
  +{
 
  -/* Free address mapping tables.  MM must be locked on entry, and
  remains locked
  -   on return.  */
  +/* Free address mapping tables for an active device DEVICEP.  This
  includes
  +   both mapped offload functions/variables, and mapped user data
  regions.
  +   To be used before shutting a device down: subsequently
  reinitialising the
  +   device will repopulate the offload image mappings.  */
   
   attribute_hidden void
  -gomp_free_memmap (struct splay_tree_s *mem_map)
  +gomp_free_memmap (struct gomp_device_descr *devicep)
   {
  +  int i;
  +  struct splay_tree_s *mem_map = &devicep->mem_map;
  +
  +  assert (devicep->is_initialized);
  +
  +  gomp_mutex_lock (&devicep->lock);
 
 Need to lock before first access to *devicep?

Fixed.

  +  
  +  /* Unmap offload images that are registered to this device.  */
  +  for (i = 0; i < num_offload_images; i++)
  +    {
  +      struct offload_image_descr *image = &offload_images[i];
 
 Need to take register_lock when accessing offload_images?

This too. Retested for libgomp/NVPTX.

OK for trunk now?

Thanks,

Julian

ChangeLog

PR libgomp/65904

libgomp/
* libgomp.h (gomp_free_memmap): Update prototype.
* oacc-init.c (acc_shutdown_1): Pass device descriptor to
gomp_free_memmap. Don't lock device around call.
* target.c (gomp_map_vars): Initialise tgt->array to NULL before
early exit.
(GOMP_offload_unregister): Split out and call...
(gomp_deoffload_image_from_device): This new function.
(gomp_free_memmap): Call gomp_deoffload_image_from_device.
* plugin/nvptx.c (struct ptx_image_data): Add ord, fn_descs fields.
(nvptx_init): Tweak comment.
(nvptx_attach_host_thread_to_device): Add locking with ptx_dev_lock
around ptx_devices accesses.
(GOMP_OFFLOAD_load_image): Populate new ptx_image_data fields.
(GOMP_OFFLOAD_unload_image): Switch to ORD'th device before freeing
images, and use fn_descs field from ptx_image_data instead of
incorrectly using a pointer derived from target_data.
(GOMP_OFFLOAD_openacc_create_thread_data): Add locking around
ptx_devices accesses.
Index: libgomp/target.c

Re: OpenACC: initialization with unsupported acc_device_t (was: [PR testsuite/65205, libgomp/65993] Fix dg-shouldfail usage in OpenACC libgomp tests)

2015-05-07 Thread Julian Brown
On Tue, 5 May 2015 16:09:18 +0200
Thomas Schwinge tho...@codesourcery.com wrote:

 Hi!
 
 On Tue, 5 May 2015 08:43:48 -0400, John David Anglin
 dave.ang...@bell.net wrote:
  On 2015-05-05 5:43 AM, Thomas Schwinge wrote:
   FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/lib-62.c
   -DACC_DEVICE_TYPE_hos
   t=1 -DACC_MEM_SHARED=1 output pattern test, is , should match
   invalid size
   With this one I'll need your help: please cite from libgomp.log
   (or, from a manual run) the actual output message that you're
   getting.
  There's no output message:
  # ./lib-62.exe
  Segmentation fault (core dumped)

 As this is a PA-RISC HP-UX system, I feel certain that you don't
 actually have nvptx offloading available (so, the nvptx libgomp
 plugin is not being built).  However, this test case, contains an
 unconditional acc_init call for acc_device_nvidia, and I would then
 guess that this situation is not (not anymore?) correctly handled
 (abort with »offloading to [...] not possible«, or similar; see
 libgomp.oacc-c-c++-common/lib-4.c) in libgomp -- Julian, could this be
 due to your recent libgomp OpenACC initialization changes?  (When
 working on this in a build that does have nvptx offloading
 configured, I think you should be able to simulate the situation by
 hiding (temporarily deleting, or similar) the nvptx libgomp
 plugin?)

The attached patch contains (what I hope should be) a fix for this,
tested by running the libgomp testsuite (with nvptx offloading), and by
deleting the nvptx plugin, with the patch applied, and ensuring that
lib-62.c no longer segfaults in that case.

The patch also tidies up a few other error paths around resolve_device,
and de-duplicates some error message reporting code.
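
The shape of that error-path change can be sketched as follows. This is a
simplified model, not the oacc-init.c code: device_descr, dispatch_table
and lookup_device are hypothetical names, and the table with two missing
plugins is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, much-reduced device descriptor.  */
struct device_descr
{
  const char *name;
};

#define N_TYPES 3

static struct device_descr host_dev = { "host" };

/* Slot 0 is the host; slots 1 and 2 model plugins that were not built or
   could not be loaded.  */
static struct device_descr *dispatch_table[N_TYPES]
  = { &host_dev, NULL, NULL };

/* Look up the descriptor for TYPE.  If there is none, either exit with a
   message (FAIL_IS_ERROR true) or return NULL and let the caller decide
   what to do.  */
static struct device_descr *
lookup_device (unsigned type, bool fail_is_error)
{
  struct device_descr *d = type < N_TYPES ? dispatch_table[type] : NULL;

  if (d == NULL && fail_is_error)
    {
      fprintf (stderr, "device type %u not supported\n", type);
      exit (EXIT_FAILURE);
    }

  return d;
}
```

Callers that merely query (such as a device-count function) would pass
false and handle NULL, while initialisation paths would pass true.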

 Then, I don't know why libgomp.oacc-c-c++-common/lib-62.c contains
 this explicit acc_init call with acc_device_nvidia -- generally, the
 test cases should not contain such unconditional statements.  So,
 let's then please remove this.  See
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-66.c for a very
 similar test case, which does this differently.

I've not touched this test though -- but I have tweaked
libgomp.oacc-c-c++-common/lib-4.c that should now expect a slightly
different error output.

OK for trunk?

Thanks,

Julian

ChangeLog

libgomp/
* oacc-init.c (resolve_device): Add FAIL_IS_ERROR argument. Update
function comment. Only call gomp_fatal if new argument is true.
(acc_dev_num_out_of_range): New function.
(acc_init_1, acc_shutdown_1): Update call to resolve_device. Call
acc_dev_num_out_of_range as appropriate.
(acc_get_num_devices, acc_set_device_type, acc_get_device_type)
(acc_get_device_num, acc_set_device_num): Update calls to resolve_device.
* testsuite/libgomp.oacc-c-c++-common/lib-4.c: Update expected test
output.
commit 221b5dea47cdb7611456ca3cf28d180d3ff1156a
Author: Julian Brown jul...@codesourcery.com
Date:   Thu May 7 08:39:16 2015 -0700

Clean up initialisation when no devices of a particular type are available.

diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index f2c60ec..cd50521 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -109,10 +109,12 @@ name_of_acc_device_t (enum acc_device_t type)
 }
 }
 
-/* ACC_DEVICE_LOCK should be held before calling this function.  */
+/* ACC_DEVICE_LOCK must be held before calling this function.  If FAIL_IS_ERROR
+   is true, this function raises an error if there are no devices of type D,
+   otherwise it returns NULL in that case.  */
 
 static struct gomp_device_descr *
-resolve_device (acc_device_t d)
+resolve_device (acc_device_t d, bool fail_is_error)
 {
   acc_device_t d_arg = d;
 
@@ -130,7 +132,13 @@ resolve_device (acc_device_t d)
 		&& dispatchers[d]->get_num_devices_func () > 0)
 		goto found;
 
-	gomp_fatal ("device type %s not supported", goacc_device_type);
+	if (fail_is_error)
+	  {
+		gomp_mutex_unlock (&acc_device_lock);
+		gomp_fatal ("device type %s not supported", goacc_device_type);
+	  }
+	else
+	  return NULL;
 	  }
 
 	/* No default device specified, so start scanning for any non-host
@@ -149,7 +157,13 @@ resolve_device (acc_device_t d)
 	  d = acc_device_host;
 	  goto found;
 	}
-  gomp_fatal ("no device found");
+  if (fail_is_error)
+{
+	  gomp_mutex_unlock (&acc_device_lock);
+	  gomp_fatal ("no device found");
+	}
+  else
+return NULL;
   break;
 
 case acc_device_host:
@@ -157,7 +171,12 @@ resolve_device (acc_device_t d)
 
 default:
   if (d > _ACC_device_hwm)
-	gomp_fatal ("device %u out of range", (unsigned)d);
+	{
+	  if (fail_is_error)
+	goto unsupported_device;
+	  else
+	return NULL;
+	}
   break;
 }
  found:
@@ -166,12 +185,30 @@ resolve_device (acc_device_t d)
 	  && d != acc_device_default
 	  && d != acc_device_not_host);
 
+  if (dispatchers[d] == NULL && fail_is_error)
+{
+unsupported_device:
+  gomp_mutex_unlock (&acc_device_lock);
+  gomp_fatal (device

[PATCH] Fix OpenACC shutdown and PTX image unloading (PR65904)

2015-05-01 Thread Julian Brown
Hi,

This patch fixes PR65904, a double-free error that started occurring
after recent libgomp changes to the way offload images are registered
with the runtime.

Offload images now map all functions/data using just two malloc'ed
blocks, but the function gomp_free_memmap did not take that into
account, and treated all mappings as if they had their own blocks (as
they do if created by gomp_map_vars): so attempting to free the whole
map at once failed when it hit mappings for an offload image.

The fix is to split offload-image freeing out of GOMP_offload_unregister
into a separate function, and call that from gomp_free_memmap for the
given device before freeing the rest of the memory map.
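
To make the failure mode concrete, here is a toy model of the fix: the
mappings that came from an offload image share an allocation and must be
released once, before the per-mapping frees. Everything here is
hypothetical (struct mapping, from_image, free_all_mappings), and the
model assumes a single image with one shared block rather than the two
blocks the real code uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, much-simplified mapping record.  */
struct mapping
{
  void *block;          /* Allocation backing this mapping.  */
  bool from_image;      /* True if BLOCK is shared by image mappings.  */
};

/* Release every allocation behind MAPS exactly once and return how many
   calls to free were made.  Freeing each mapping's block individually
   would free the shared image block repeatedly, which is the double free
   described above.  */
static int
free_all_mappings (struct mapping *maps, size_t n)
{
  int frees = 0;
  void *image_block = NULL;
  size_t i;

  /* Pass 1: drop the image mappings, remembering their shared block.  */
  for (i = 0; i < n; i++)
    if (maps[i].from_image)
      {
        image_block = maps[i].block;
        maps[i].block = NULL;
      }

  if (image_block != NULL)
    {
      free (image_block);
      frees++;
    }

  /* Pass 2: the remaining mappings each own their block.  */
  for (i = 0; i < n; i++)
    if (!maps[i].from_image && maps[i].block != NULL)
      {
        free (maps[i].block);
        maps[i].block = NULL;
        frees++;
      }

  return frees;
}
```

Two image mappings plus one user mapping therefore cost two frees, not
three.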

The patch also fixes a thinko that was revealed in image unloading in
the NVPTX backend. Tested for libgomp with PTX offloading.

OK for trunk?

Thanks,

Julian

ChangeLog

libgomp/
* libgomp.h (gomp_free_memmap): Update prototype.
* oacc-init.c (acc_shutdown_1): Pass device descriptor to
gomp_free_memmap. Don't lock device around call.
* target.c (gomp_map_vars): Initialise tgt->array to NULL before
early exit.
(GOMP_offload_unregister): Split out and call...
(gomp_deoffload_image_from_device): This new function.
(gomp_free_memmap): Call gomp_deoffload_image_from_device.

* plugin/nvptx.c (struct ptx_image_data): Add ord, fn_descs fields.
(GOMP_OFFLOAD_load_image): Populate above fields.
(GOMP_OFFLOAD_unload_image): Switch to ORD'th device before freeing
images, and use fn_descs field from ptx_image_data instead of
incorrectly using a pointer derived from target_data.

commit 14e8e35a494a5a8231ab1a3cad38a2157bca7e4a
Author: Julian Brown jul...@codesourcery.com
Date:   Thu Apr 30 10:19:58 2015 -0700

Fix freeing of memory maps during acc shutdown.

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 5272f01..5e0e09c 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -777,7 +777,7 @@ extern struct target_mem_desc *gomp_map_vars (struct gomp_device_descr *,
 extern void gomp_copy_from_async (struct target_mem_desc *);
 extern void gomp_unmap_vars (struct target_mem_desc *, bool);
 extern void gomp_init_device (struct gomp_device_descr *);
-extern void gomp_free_memmap (struct splay_tree_s *);
+extern void gomp_free_memmap (struct gomp_device_descr *);
 extern void gomp_fini_device (struct gomp_device_descr *);
 
 /* work.c */
diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index 503f8b8..f2c60ec 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -245,9 +245,7 @@ acc_shutdown_1 (acc_device_t d)
 
   if (walk->dev)
 	{
-	  gomp_mutex_lock (&walk->dev->lock);
-	  gomp_free_memmap (&walk->dev->mem_map);
-	  gomp_mutex_unlock (&walk->dev->lock);
+	  gomp_free_memmap (walk->dev);
 
 	  walk->dev = NULL;
 	  walk->base_dev = NULL;
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 583ec87..2cc0ae0 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -334,8 +334,10 @@ struct ptx_event
 
 struct ptx_image_data
 {
+  int ord;
   void *target_data;
   CUmodule module;
+  struct targ_fn_descriptor *fn_descs;
   struct ptx_image_data *next;
 };
 
@@ -1625,13 +1627,6 @@ GOMP_OFFLOAD_load_image (int ord, void *target_data,
 
   link_ptx (module, img_header[0]);
 
-  pthread_mutex_lock (&ptx_image_lock);
-  new_image = GOMP_PLUGIN_malloc (sizeof (struct ptx_image_data));
-  new_image->target_data = target_data;
-  new_image->module = module;
-  new_image->next = ptx_images;
-  ptx_images = new_image;
-  pthread_mutex_unlock (&ptx_image_lock);
 
   /* The mkoffload utility emits a table of pointers/integers at the start of
  each offload image:
@@ -1652,8 +1647,21 @@ GOMP_OFFLOAD_load_image (int ord, void *target_data,
 
   *target_table = GOMP_PLUGIN_malloc (sizeof (struct addr_pair)
   * (fn_entries + var_entries));
-  targ_fns = GOMP_PLUGIN_malloc (sizeof (struct targ_fn_descriptor)
- * fn_entries);
+  if (fn_entries > 0)
+    targ_fns = GOMP_PLUGIN_malloc (sizeof (struct targ_fn_descriptor)
+				   * fn_entries);
+  else
+    targ_fns = NULL;
+
+  pthread_mutex_lock (&ptx_image_lock);
+  new_image = GOMP_PLUGIN_malloc (sizeof (struct ptx_image_data));
+  new_image->ord = ord;
+  new_image->target_data = target_data;
+  new_image->module = module;
+  new_image->fn_descs = targ_fns;
+  new_image->next = ptx_images;
+  ptx_images = new_image;
+  pthread_mutex_unlock (&ptx_image_lock);
 
   for (i = 0; i < fn_entries; i++)
 {
@@ -1687,23 +1695,22 @@ GOMP_OFFLOAD_load_image (int ord, void *target_data,
 }
 
 void
-GOMP_OFFLOAD_unload_image (int tid __attribute__((unused)), void *target_data)
+GOMP_OFFLOAD_unload_image (int ord, void *target_data)
 {
-  void **img_header = (void **) target_data;
-  struct targ_fn_descriptor *targ_fns
-= (struct targ_fn_descriptor *) img_header[0];
   struct ptx_image_data *image, *prev = NULL, *newhd = NULL;
 
-  free (targ_fns

Re: [PATCH] Tidy up locking for libgomp OpenACC entry points

2015-04-24 Thread Julian Brown
On Thu, 23 Apr 2015 18:41:34 +0200
Thomas Schwinge tho...@codesourcery.com wrote:

 Hi!
 
 On Wed, 22 Apr 2015 19:42:43 +0100, Julian Brown
 jul...@codesourcery.com wrote:
  This patch is an attempt to fix some potential race conditions with
  accesses to shared data structures from multiple concurrent threads
  in libgomp's OpenACC entry points. The main change is to move
  locking out of lookup_host and lookup_dev in oacc-mem.c and into
  their callers (which can then hold the locks for the whole
  operation that they are performing).
 
 Yeah, that makes sense I guess.
 
  Also missing locking has been added for gomp_acc_insert_pointer.
  
  Tests look OK (with offloading to NVidia PTX).
 
 How did you test to get some confidence in the locking being
 sufficient?

Merely by running the existing tests and via inspection, sadly. I'm not
sure how much value we'd get from implementing an exhaustive threading
testsuite at this stage: I guess testcases will be easier to come by in
the future if/when people start to use e.g. OpenMP and OpenACC together.

 Going further (separate patch?), a few more comments:
 
 Is it OK that oacc-init.c:cached_base_dev is accessed without locking?
 
 Generally, we have to keep in mind that the same device may be
 accessed in parallel through both OpenACC and OpenMP interfaces.  For
 this, for example, in oacc-init.c, even though acc_device_lock is
 held, is it OK to call gomp_init_device(D) without D-lock being
 locked?  (Compare to target.c code.)
 
 Please document what exactly oacc-init.c:acc_device_lock is to guard.
 I'm not sure I'm understanding this correctly.

I've attached a follow-on patch that documents the purpose of
acc_device_lock -- and also fixes some places that should have been
holding the lock, but were not.

I've also added locking (with dev-lock) when calling gomp_init_device
and gomp_fini_device from the OpenACC initialisation/finalisation code.

 Should oacc-init.c:acc_shutdown_1 release goacc_thread_lock before any
 gomp_fatal calls?  (That seems to be the general policy in libgomp.)

I added this to the first patch.

  --- a/libgomp/oacc-mem.c
  +++ b/libgomp/oacc-mem.c
 
  @@ -120,25 +116,32 @@ acc_free (void *d)
   {
 splay_tree_key k;
 struct goacc_thread *thr = goacc_thread ();
  +  struct gomp_device_descr *acc_dev = thr->dev;
   
 if (!d)
   return;
   
 assert (thr && thr->dev);
   
  +  gomp_mutex_lock (&acc_dev->lock);
  +
 /* We don't have to call lazy open here, as the ptr value must
  have been returned by acc_malloc.  It's not permitted to pass NULL
  in (unless you got that null from acc_malloc).  */
  -  if ((k = lookup_dev (thr->dev->openacc.data_environ, d, 1)))
  -   {
  -     void *offset;
  +  if ((k = lookup_dev (acc_dev->openacc.data_environ, d, 1)))
  +    {
  +      void *offset;
  +
  +      offset = d - k->tgt->tgt_start + k->tgt_offset;
   
  -     offset = d - k->tgt->tgt_start + k->tgt_offset;
  +      gomp_mutex_unlock (&acc_dev->lock);
   
  -     acc_unmap_data ((void *)(k->host_start + offset));
  -   }
  +      acc_unmap_data ((void *)(k->host_start + offset));
  +    }
  +  else
  +    gomp_mutex_unlock (&acc_dev->lock);
 
 Does it make sense to make the unlock unconditional, and move the
 acc_unmap_data after it, guarded by »if (k)«?

I've left this one -- just a stylistic tweak, but I think it's fine
as-is.

  -  thr->dev->free_func (thr->dev->target_id, d);
  +  acc_dev->free_func (acc_dev->target_id, d);
   }
   
   void
  @@ -178,16 +181,24 @@ acc_deviceptr (void *h)
 goacc_lazy_initialize ();
   
 struct goacc_thread *thr = goacc_thread ();
  +  struct gomp_device_descr *dev = thr->dev;
  +
  +  gomp_mutex_lock (&dev->lock);
   
  -  n = lookup_host (thr->dev, h, 1);
  +  n = lookup_host (dev, h, 1);
   
 if (!n)
  -    return NULL;
  +    {
  +      gomp_mutex_unlock (&dev->lock);
  +      return NULL;
  +    }
   
 offset = h - n->host_start;
   
 d = n->tgt->tgt_start + n->tgt_offset + offset;
   
  +  gomp_mutex_unlock (&dev->lock);
  +
 return d;
   }
 
 Do we need to retain the lock while working with n?  If not, the
 unlock could be placed right after the lookup_host, unconditionally.
 I'm confused -- it's commonly being done (retained) in target.c code,
 but not in the tgt_fn lookup in target.c:GOMP_target.

I think the difference can be explained as follows: a given mapping
(splay_tree_key_s) is essentially immutable after it is created (apart
from the refcounts). Thus it can be safely accessed *so long as we know
it will not be deallocated*.

Now, in some parts of target.c, we have an active target_mem_desc,
corresponding to a set of host-target mappings. So long as we are
holding that target_mem_desc (e.g. as we are in GOMP_target_data), we
know that none of the associated mappings' refcounts will fall to zero:
so, we can access them (read only) safely without explicitly holding the
lock.

But, that's *not* the case for e.g. acc_deviceptr: that can be called
at any point, in particular

[PATCH] Tidy up locking for libgomp OpenACC entry points

2015-04-22 Thread Julian Brown
Hi,

This patch is an attempt to fix some potential race conditions with
accesses to shared data structures from multiple concurrent threads in
libgomp's OpenACC entry points. The main change is to move locking out
of lookup_host and lookup_dev in oacc-mem.c and into their callers
(which can then hold the locks for the whole operation that they are
performing).
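
A minimal sketch of that locking pattern, using pthreads and a flat array
in place of libgomp's mutex wrappers and splay tree: the lookup helper
takes no lock itself, and the caller holds the lock across both the lookup
and the use of the result. The names (struct range, struct device,
lookup_host_locked, host_to_device) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-to-device mapping record.  */
struct range
{
  uintptr_t host_start, host_end, dev_start;
};

struct device
{
  pthread_mutex_t lock;
  struct range *ranges;
  size_t n_ranges;
};

/* Return the range containing host address H, or NULL.  DEV->lock must be
   held by the caller on entry, and remains held on exit.  */
static struct range *
lookup_host_locked (struct device *dev, uintptr_t h)
{
  for (size_t i = 0; i < dev->n_ranges; i++)
    if (dev->ranges[i].host_start <= h && h < dev->ranges[i].host_end)
      return &dev->ranges[i];

  return NULL;
}

/* Translate host address H to a device address, or return 0 if H is not
   mapped.  The lock covers the lookup and the reads of the record, so the
   record cannot be removed in between.  */
static uintptr_t
host_to_device (struct device *dev, uintptr_t h)
{
  uintptr_t d = 0;

  pthread_mutex_lock (&dev->lock);

  struct range *r = lookup_host_locked (dev, h);
  if (r != NULL)
    d = r->dev_start + (h - r->host_start);

  pthread_mutex_unlock (&dev->lock);

  return d;
}
```

Had the helper locked and unlocked internally, another thread could unmap
the range between the lookup and the address computation.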

Also missing locking has been added for gomp_acc_insert_pointer.

Tests look OK (with offloading to NVidia PTX).

OK? (For the gomp4 branch, maybe, if trunk's not suitable at the
moment?)

Thanks,

Julian

ChangeLog

libgomp/
* oacc-mem.c (lookup_host): Remove locking from function. Note
locking requirement for caller in function comment.
(lookup_dev): Likewise.
(acc_free, acc_deviceptr, acc_hostptr, acc_is_present)
(acc_map_data, acc_unmap_data, present_create_copy, delete_copyout)
(update_dev_host, gomp_acc_insert_pointer, gomp_acc_remove_pointer):
Add locking.
commit 983e08e46be24380a52095851cd9c6eb481eb47c
Author: Julian Brown jul...@codesourcery.com
Date:   Tue Apr 21 12:42:17 2015 -0700

More locking in oacc-mem.c

diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 89ef5fc..d53af4b 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -35,7 +35,8 @@
 #include <stdint.h>
 #include <assert.h>
 
-/* Return block containing [H->S), or NULL if not contained.  */
+/* Return block containing [H->S), or NULL if not contained.  The device lock
+   for DEV must be locked on entry, and remains locked on exit.  */
 
 static splay_tree_key
 lookup_host (struct gomp_device_descr *dev, void *h, size_t s)
@@ -46,9 +47,7 @@ lookup_host (struct gomp_device_descr *dev, void *h, size_t s)
   node.host_start = (uintptr_t) h;
   node.host_end = (uintptr_t) h + s;
 
-  gomp_mutex_lock (&dev->lock);
   key = splay_tree_lookup (&dev->mem_map, &node);
-  gomp_mutex_unlock (&dev->lock);
 
   return key;
 }
@@ -56,7 +55,8 @@ lookup_host (struct gomp_device_descr *dev, void *h, size_t s)
 /* Return block containing [D->S), or NULL if not contained.
The list isn't ordered by device address, so we have to iterate
over the whole array.  This is not expected to be a common
-   operation.  */
+   operation.  The device lock associated with TGT must be locked on entry, and
+   remains locked on exit.  */
 
 static splay_tree_key
 lookup_dev (struct target_mem_desc *tgt, void *d, size_t s)
@@ -67,16 +67,12 @@ lookup_dev (struct target_mem_desc *tgt, void *d, size_t s)
   if (!tgt)
     return NULL;
 
-  gomp_mutex_lock (&tgt->device_descr->lock);
-
   for (t = tgt; t != NULL; t = t->prev)
     {
       if (t->tgt_start <= (uintptr_t) d && t->tgt_end >= (uintptr_t) d + s)
         break;
     }
 
-  gomp_mutex_unlock (&tgt->device_descr->lock);
-
   if (!t)
     return NULL;
 
@@ -120,25 +116,32 @@ acc_free (void *d)
 {
   splay_tree_key k;
   struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr->dev;
 
   if (!d)
     return;
 
   assert (thr && thr->dev);
 
+  gomp_mutex_lock (&acc_dev->lock);
+
   /* We don't have to call lazy open here, as the ptr value must have
      been returned by acc_malloc.  It's not permitted to pass NULL in
      (unless you got that null from acc_malloc).  */
-  if ((k = lookup_dev (thr->dev->openacc.data_environ, d, 1)))
-   {
- void *offset;
+  if ((k = lookup_dev (acc_dev->openacc.data_environ, d, 1)))
+    {
+      void *offset;
+
+      offset = d - k->tgt->tgt_start + k->tgt_offset;
 
- offset = d - k->tgt->tgt_start + k->tgt_offset;
+      gomp_mutex_unlock (&acc_dev->lock);
 
- acc_unmap_data ((void *)(k->host_start + offset));
-   }
+      acc_unmap_data ((void *)(k->host_start + offset));
+    }
+  else
+    gomp_mutex_unlock (&acc_dev->lock);
 
-  thr->dev->free_func (thr->dev->target_id, d);
+  acc_dev->free_func (acc_dev->target_id, d);
 }
 
 void
@@ -178,16 +181,24 @@ acc_deviceptr (void *h)
   goacc_lazy_initialize ();
 
   struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *dev = thr->dev;
+
+  gomp_mutex_lock (&dev->lock);
 
-  n = lookup_host (thr->dev, h, 1);
+  n = lookup_host (dev, h, 1);
 
   if (!n)
-    return NULL;
+    {
+      gomp_mutex_unlock (&dev->lock);
+      return NULL;
+    }
 
   offset = h - n->host_start;
 
   d = n->tgt->tgt_start + n->tgt_offset + offset;
 
+  gomp_mutex_unlock (&dev->lock);
+
   return d;
 }
 
@@ -204,16 +215,24 @@ acc_hostptr (void *d)
   goacc_lazy_initialize ();
 
   struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr->dev;
 
-  n = lookup_dev (thr->dev->openacc.data_environ, d, 1);
+  gomp_mutex_lock (&acc_dev->lock);
+
+  n = lookup_dev (acc_dev->openacc.data_environ, d, 1);
 
   if (!n)
-    return NULL;
+    {
+      gomp_mutex_unlock (&acc_dev->lock);
+      return NULL;
+    }
 
   offset = d - n->tgt->tgt_start + n->tgt_offset;
 
   h = n->host_start + offset;
 
+  gomp_mutex_unlock (&acc_dev->lock);
+
   return h;
 }
 
@@ -232,6 +251,8 @@ acc_is_present (void *h, size_t s)
   struct goacc_thread *thr

Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch) (PR65742)

2015-04-17 Thread Julian Brown
On Tue, 14 Apr 2015 15:15:02 +0100
Julian Brown jul...@codesourcery.com wrote:

> On Wed, 8 Apr 2015 17:58:56 +0300
> Ilya Verbin iver...@gmail.com wrote:
>
> > On Wed, Apr 08, 2015 at 15:31:42 +0100, Julian Brown wrote:
> > > This version is mostly the same as the last posted version but
> > > has a tweak in GOACC_parallel to account for the new splay tree
> > > arrangement for target functions:
> > >
> > > -  tgt_fn = (void (*)) tgt_fn_key->tgt->tgt_start;
> > > +  tgt_fn = (void (*)) tgt_fn_key->tgt_offset;
> > >
> > > Have there been any other changes I might have missed?
> >
> > No.
> >
> > > It passes libgomp testing on NVPTX. OK?
> >
> > Have you tested it with disabled offloading?
> >
> > I see several regressions:
> > FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_on_device-1.c
> > -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
> > FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c
> > -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
>
> I think there may be multiple issues here. The attached patch
> addresses one -- acc_device_type not distinguishing between
> offloaded and host code with the host_nonshm plugin.

The patch appears to fix the original issue after all: I've re-run
tests with host==target and the failures no longer appear. Also the
same has been noted by Dominique d'Humieres in PR65742.

> The other problem is that it appears that the ACC_DEVICE_TYPE
> environment variable is not getting set properly on the target for
> (any of) the OpenACC tests: this means a lot of the time the wrong
> plugin is being tested, and means that the above tests (and several
> others) still fail. That will apparently need some more engineering
> (on our part).

Fixing this turns out to require more DejaGNU-fu than I have: AFAICT,
setting a per-test environment variable from an .exp file can't easily
be done at present. The potentially useful-looking
{dg-}set-target-env-var doesn't look quite suitable for this purpose,
and besides which doesn't actually seem to be implemented for host !=
target anyway.

(At least, if this fragment of gcc-dg.exp is anything to go by:

    if { [info exists set_target_env_var] \
         && [llength $set_target_env_var] != 0 } {
        if { [is_remote target] } {
            return [list "unsupported" ""]
        } ...
).

So: OK for trunk?

Thanks,

Julian

> ChangeLog
>
> libgomp/
> * oacc-init.c (acc_on_device): Check whether we're in an offloaded
> region for host_nonshm plugin.
> * plugin/plugin-host.c (GOMP_OFFLOAD_openacc_parallel): Set
> nonshm_exec flag in thread-local data.
> (GOMP_OFFLOAD_openacc_create_thread_data): Allocate thread-local
> data for host_nonshm plugin.
> (GOMP_OFFLOAD_openacc_destroy_thread_data): Free thread-local
> data for host_nonshm plugin.
> * plugin/plugin-host.h: New.


Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-04-14 Thread Julian Brown
On Wed, 8 Apr 2015 17:58:56 +0300
Ilya Verbin iver...@gmail.com wrote:

> On Wed, Apr 08, 2015 at 15:31:42 +0100, Julian Brown wrote:
> > This version is mostly the same as the last posted version but has a
> > tweak in GOACC_parallel to account for the new splay tree
> > arrangement for target functions:
> >
> > -  tgt_fn = (void (*)) tgt_fn_key->tgt->tgt_start;
> > +  tgt_fn = (void (*)) tgt_fn_key->tgt_offset;
> >
> > Have there been any other changes I might have missed?
>
> No.
>
> > It passes libgomp testing on NVPTX. OK?
>
> Have you tested it with disabled offloading?
>
> I see several regressions:
> FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_on_device-1.c
> -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
> FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c
> -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test

I think there may be multiple issues here. The attached patch addresses
one -- acc_device_type not distinguishing between offloaded and host
code with the host_nonshm plugin.
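
One way to make a host-based plugin tell offloaded code from ordinary host code is a per-thread flag that is set only while the offloaded function runs. The sketch below illustrates that idea in isolation; `struct nonshm_state`, `on_device`, `run_offloaded` and `record_on_device` are hypothetical names for this example, not libgomp interfaces:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-thread plugin state, standing in for plugin-specific
   thread-local data.  */
struct nonshm_state
{
  bool in_offload;
};

static _Thread_local struct nonshm_state state;

/* Stand-in for an acc_on_device-style query: report "device" only while
   an offloaded function is running on this thread.  */
static bool
on_device (void)
{
  return state.in_offload;
}

/* Stand-in for the plugin's "parallel" hook: mark the region, run the
   offloaded function on this thread, then clear the mark again.  */
static void
run_offloaded (void (*fn) (void *), void *arg)
{
  state.in_offload = true;
  fn (arg);
  state.in_offload = false;
}

/* Example offloaded function: records what on_device reported.  */
static void
record_on_device (void *arg)
{
  *(bool *) arg = on_device ();
}
```

Because the flag is thread-local, a host thread that is not inside an offloaded region keeps seeing the host answer even while another thread is executing offloaded code.
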

The other problem is that it appears that the ACC_DEVICE_TYPE
environment variable is not getting set properly on the target for (any
of) the OpenACC tests: this means a lot of the time the wrong plugin
is being tested, and means that the above tests (and several others)
still fail. That will apparently need some more engineering (on our
part).

(Not asking for review just yet, JFYI.)

Julian

ChangeLog

libgomp/
* oacc-init.c (acc_on_device): Check whether we're in an offloaded
region for host_nonshm plugin.
* plugin/plugin-host.c (GOMP_OFFLOAD_openacc_parallel): Set
nonshm_exec flag in thread-local data.
(GOMP_OFFLOAD_openacc_create_thread_data): Allocate thread-local
data for host_nonshm plugin.
(GOMP_OFFLOAD_openacc_destroy_thread_data): Free thread-local data
for host_nonshm plugin.
* plugin/plugin-host.h: New.

Index: libgomp/oacc-init.c
===
--- libgomp/oacc-init.c	(revision 221922)
+++ libgomp/oacc-init.c	(working copy)
@@ -29,6 +29,7 @@
 #include "libgomp.h"
 #include "oacc-int.h"
 #include "openacc.h"
+#include "plugin/plugin-host.h"
 #include <assert.h>
 #include <stdlib.h>
 #include <strings.h>
@@ -548,7 +549,14 @@ ialias (acc_set_device_num)
 int
 acc_on_device (acc_device_t dev)
 {
-  if (acc_get_device_type () == acc_device_host_nonshm)
+  struct goacc_thread *thr = goacc_thread ();
+
+  /* We only want to appear to be the host_nonshm plugin from "offloaded"
+     code -- i.e. within a parallel region.  Test a flag set by the
+     openacc_parallel hook of the host_nonshm plugin to determine that.  */
+  if (acc_get_device_type () == acc_device_host_nonshm
+      && thr && thr->target_tls
+      && ((struct nonshm_thread *)thr->target_tls)->nonshm_exec)
     return dev == acc_device_host_nonshm || dev == acc_device_not_host;
 
   /* Just rely on the compiler builtin.  */
Index: libgomp/plugin/plugin-host.c
===
--- libgomp/plugin/plugin-host.c	(revision 221922)
+++ libgomp/plugin/plugin-host.c	(working copy)
@@ -44,6 +44,7 @@
 #include <stdlib.h>
 #include <string.h>
 #include <stdio.h>
+#include <stdbool.h>
 
 #ifdef HOST_NONSHM_PLUGIN
 #define STATIC
@@ -55,6 +56,10 @@
 #define SELF "host: "
 #endif
 
+#ifdef HOST_NONSHM_PLUGIN
+#include "plugin-host.h"
+#endif
+
 STATIC const char *
 GOMP_OFFLOAD_get_name (void)
 {
@@ -174,7 +179,10 @@ GOMP_OFFLOAD_openacc_parallel (void (*fn
 			   void *targ_mem_desc __attribute__ ((unused)))
 {
 #ifdef HOST_NONSHM_PLUGIN
+  struct nonshm_thread *thd = GOMP_PLUGIN_acc_thread ();
+  thd->nonshm_exec = true;
   fn (devaddrs);
+  thd->nonshm_exec = false;
 #else
   fn (hostaddrs);
 #endif
@@ -232,11 +240,20 @@ STATIC void *
 GOMP_OFFLOAD_openacc_create_thread_data (int ord
 	 __attribute__ ((unused)))
 {
+#ifdef HOST_NONSHM_PLUGIN
+  struct nonshm_thread *thd
+    = GOMP_PLUGIN_malloc (sizeof (struct nonshm_thread));
+  thd->nonshm_exec = false;
+  return thd;
+#else
   return NULL;
+#endif
 }
 
 STATIC void
-GOMP_OFFLOAD_openacc_destroy_thread_data (void *tls_data
-	  __attribute__ ((unused)))
+GOMP_OFFLOAD_openacc_destroy_thread_data (void *tls_data)
 {
+#ifdef HOST_NONSHM_PLUGIN
+  free (tls_data);
+#endif
 }
Index: libgomp/plugin/plugin-host.h
===
--- libgomp/plugin/plugin-host.h	(revision 0)
+++ libgomp/plugin/plugin-host.h	(revision 0)
@@ -0,0 +1,37 @@
+/* OpenACC Runtime Library: acc_device_host, acc_device_host_nonshm.
+
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version

Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-04-08 Thread Julian Brown
On Tue, 7 Apr 2015 17:26:45 +0200
Jakub Jelinek ja...@redhat.com wrote:

> On Mon, Apr 06, 2015 at 03:45:57PM +0300, Ilya Verbin wrote:
> > On Wed, Apr 01, 2015 at 15:20:25 +0200, Jakub Jelinek wrote:
> > > LGTM with proper ChangeLog entry.
> >
> > I've committed this patch into trunk.
> >
> > Julian, you probably want to update the nvptx plugin.
>
> Note that as the number of P1s without posted fixes is now zero, it is
> likely RC1 will be done this week, so if you want nvptx working in
> GCC 5, please post a fix as soon as possible.

This version is mostly the same as the last posted version but has a
tweak in GOACC_parallel to account for the new splay tree arrangement
for target functions:

-  tgt_fn = (void (*)) tgt_fn_key->tgt->tgt_start;
+  tgt_fn = (void (*)) tgt_fn_key->tgt_offset;

Have there been any other changes I might have missed?

It passes libgomp testing on NVPTX. OK?

Thanks,

Julian

commit ac06b5e25e170061bb9855b9ea4b8e5696816bf1
Author: Julian Brown jul...@codesourcery.com
Date:   Tue Apr 7 09:23:58 2015 -0700

NVPTX load/unload and init-rework patch.

diff --git a/gcc/config/nvptx/mkoffload.c b/gcc/config/nvptx/mkoffload.c
index 02c44b6..dbc68bc 100644
--- a/gcc/config/nvptx/mkoffload.c
+++ b/gcc/config/nvptx/mkoffload.c
@@ -839,6 +839,7 @@ process (FILE *in, FILE *out)
 {
   const char *input = read_file (in);
   Token *tok = tokenize (input);
+  unsigned int nvars = 0, nfuncs = 0;
 
   do
 tok = parse_file (tok);
@@ -850,16 +851,17 @@ process (FILE *in, FILE *out)
   write_stmts (out, rev_stmts (fns));
   fprintf (out, ";\n\n");
   fprintf (out, "static const char *var_mappings[] = {\n");
-  for (id_map *id = var_ids; id; id = id->next)
+  for (id_map *id = var_ids; id; id = id->next, nvars++)
     fprintf (out, "\t\"%s\"%s\n", id->ptx_name, id->next ? "," : "");
   fprintf (out, "};\n\n");
   fprintf (out, "static const char *func_mappings[] = {\n");
-  for (id_map *id = func_ids; id; id = id->next)
+  for (id_map *id = func_ids; id; id = id->next, nfuncs++)
     fprintf (out, "\t\"%s\"%s\n", id->ptx_name, id->next ? "," : "");
   fprintf (out, "};\n\n");
 
   fprintf (out, "static const void *target_data[] = {\n");
-  fprintf (out, "  ptx_code, var_mappings, func_mappings\n");
+  fprintf (out, "  ptx_code, (void*) %u, var_mappings, (void*) %u, "
+		"func_mappings\n", nvars, nfuncs);
   fprintf (out, "};\n\n");
 
   fprintf (out, "extern void GOMP_offload_register (const void *, int, void *);\n");
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index a1d42c5..5272f01 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -655,9 +655,6 @@ struct target_mem_desc {
   /* Corresponding target device descriptor.  */
   struct gomp_device_descr *device_descr;
 
-  /* Memory mapping info for the thread that created this descriptor.  */
-  struct splay_tree_s *mem_map;
-
   /* List of splay keys to remove (or decrease refcount)
  at the end of region.  */
   splay_tree_key list[];
@@ -691,18 +688,6 @@ typedef struct acc_dispatch_t
   /* This is guarded by the lock in the outer struct gomp_device_descr.  */
   struct target_mem_desc *data_environ;
 
-  /* Extra information required for a device instance by a given target.  */
-  /* This is guarded by the lock in the outer struct gomp_device_descr.  */
-  void *target_data;
-
-  /* Open or close a device instance.  */
-  void *(*open_device_func) (int n);
-  int (*close_device_func) (void *h);
-
-  /* Set or get the device number.  */
-  int (*get_device_num_func) (void);
-  void (*set_device_num_func) (int);
-
   /* Execute.  */
   void (*exec_func) (void (*) (void *), size_t, void **, void **, size_t *,
 		 unsigned short *, int, int, int, int, void *);
@@ -720,7 +705,7 @@ typedef struct acc_dispatch_t
   void (*async_set_async_func) (int);
 
   /* Create/destroy TLS data.  */
-  void *(*create_thread_data_func) (void *);
+  void *(*create_thread_data_func) (int);
   void (*destroy_thread_data_func) (void *);
 
   /* NVIDIA target specific routines.  */
diff --git a/libgomp/oacc-async.c b/libgomp/oacc-async.c
index 08b7c5e..1f5827e 100644
--- a/libgomp/oacc-async.c
+++ b/libgomp/oacc-async.c
@@ -26,7 +26,7 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
-
+#include <assert.h>
 #include "openacc.h"
 #include "libgomp.h"
 #include "oacc-int.h"
@@ -37,13 +37,23 @@ acc_async_test (int async)
   if (async < acc_async_sync)
     gomp_fatal ("invalid async argument: %d", async);
 
-  return base_dev->openacc.async_test_func (async);
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr->dev)
+    gomp_fatal ("no device active");
+
+  return thr->dev->openacc.async_test_func (async);
 }
 
 int
 acc_async_test_all (void)
 {
-  return base_dev->openacc.async_test_all_func ();
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr->dev)
+    gomp_fatal ("no device active");
+
+  return thr->dev->openacc.async_test_all_func ();
 }
 
 void
@@ -52,19 +62,34 @@ acc_wait (int async)
   if (async < acc_async_sync

Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-04-08 Thread Julian Brown
On Wed, 8 Apr 2015 17:58:56 +0300
Ilya Verbin iver...@gmail.com wrote:

> Have you tested it with disabled offloading?
>
> I see several regressions:
> FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_on_device-1.c
> -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
> FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c
> -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test

No -- thanks for the note. I've committed the patch now, but I'll try
to get to looking at these in the next day or two (it's probably
something relatively minor, I guess).

Julian


Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-03-30 Thread Julian Brown
On Mon, 30 Mar 2015 18:42:02 +0200
Jakub Jelinek ja...@redhat.com wrote:

> On Thu, Mar 26, 2015 at 11:41:30PM +0300, Ilya Verbin wrote:
> > Here is the latest patch for libgomp and mic plugin.
> > make check-target-libgomp using intelmic emul passed.
> > Also I used a testcase from the attachment.
>
> This applies cleanly.
>
> > Latest ptx part is here, I guess:
> > https://gcc.gnu.org/ml/gcc-patches/2015-02/msg01407.html
>
> But the one Julian posted doesn't apply on top of your patch.
> If there is any interdiff needed on top of your patch, can it be
> posted against trunk + your patch?

Here's a version of my patch against trunk and Ilya's latest patch
(hopefully!). Tests look OK (libgomp + PTX).

HTH,

Julian

commit f203634ace786b5bb2fdce56f123f3fba236dda3
Author: Julian Brown jul...@codesourcery.com
Date:   Mon Mar 30 14:37:53 2015 -0700

nvptx load/unload support, init rework

diff --git a/gcc/config/nvptx/mkoffload.c b/gcc/config/nvptx/mkoffload.c
index 02c44b6..dbc68bc 100644
--- a/gcc/config/nvptx/mkoffload.c
+++ b/gcc/config/nvptx/mkoffload.c
@@ -839,6 +839,7 @@ process (FILE *in, FILE *out)
 {
   const char *input = read_file (in);
   Token *tok = tokenize (input);
+  unsigned int nvars = 0, nfuncs = 0;
 
   do
 tok = parse_file (tok);
@@ -850,16 +851,17 @@ process (FILE *in, FILE *out)
   write_stmts (out, rev_stmts (fns));
   fprintf (out, ";\n\n");
   fprintf (out, "static const char *var_mappings[] = {\n");
-  for (id_map *id = var_ids; id; id = id->next)
+  for (id_map *id = var_ids; id; id = id->next, nvars++)
     fprintf (out, "\t\"%s\"%s\n", id->ptx_name, id->next ? "," : "");
   fprintf (out, "};\n\n");
   fprintf (out, "static const char *func_mappings[] = {\n");
-  for (id_map *id = func_ids; id; id = id->next)
+  for (id_map *id = func_ids; id; id = id->next, nfuncs++)
     fprintf (out, "\t\"%s\"%s\n", id->ptx_name, id->next ? "," : "");
   fprintf (out, "};\n\n");
 
   fprintf (out, "static const void *target_data[] = {\n");
-  fprintf (out, "  ptx_code, var_mappings, func_mappings\n");
+  fprintf (out, "  ptx_code, (void*) %u, var_mappings, (void*) %u, "
+		"func_mappings\n", nvars, nfuncs);
   fprintf (out, "};\n\n");
 
   fprintf (out, "extern void GOMP_offload_register (const void *, int, void *);\n");
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index a1d42c5..5272f01 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -655,9 +655,6 @@ struct target_mem_desc {
   /* Corresponding target device descriptor.  */
   struct gomp_device_descr *device_descr;
 
-  /* Memory mapping info for the thread that created this descriptor.  */
-  struct splay_tree_s *mem_map;
-
   /* List of splay keys to remove (or decrease refcount)
  at the end of region.  */
   splay_tree_key list[];
@@ -691,18 +688,6 @@ typedef struct acc_dispatch_t
   /* This is guarded by the lock in the outer struct gomp_device_descr.  */
   struct target_mem_desc *data_environ;
 
-  /* Extra information required for a device instance by a given target.  */
-  /* This is guarded by the lock in the outer struct gomp_device_descr.  */
-  void *target_data;
-
-  /* Open or close a device instance.  */
-  void *(*open_device_func) (int n);
-  int (*close_device_func) (void *h);
-
-  /* Set or get the device number.  */
-  int (*get_device_num_func) (void);
-  void (*set_device_num_func) (int);
-
   /* Execute.  */
   void (*exec_func) (void (*) (void *), size_t, void **, void **, size_t *,
 		 unsigned short *, int, int, int, int, void *);
@@ -720,7 +705,7 @@ typedef struct acc_dispatch_t
   void (*async_set_async_func) (int);
 
   /* Create/destroy TLS data.  */
-  void *(*create_thread_data_func) (void *);
+  void *(*create_thread_data_func) (int);
   void (*destroy_thread_data_func) (void *);
 
   /* NVIDIA target specific routines.  */
diff --git a/libgomp/oacc-async.c b/libgomp/oacc-async.c
index 08b7c5e..1f5827e 100644
--- a/libgomp/oacc-async.c
+++ b/libgomp/oacc-async.c
@@ -26,7 +26,7 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
-
+#include <assert.h>
 #include "openacc.h"
 #include "libgomp.h"
 #include "oacc-int.h"
@@ -37,13 +37,23 @@ acc_async_test (int async)
   if (async < acc_async_sync)
     gomp_fatal ("invalid async argument: %d", async);
 
-  return base_dev->openacc.async_test_func (async);
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr->dev)
+    gomp_fatal ("no device active");
+
+  return thr->dev->openacc.async_test_func (async);
 }
 
 int
 acc_async_test_all (void)
 {
-  return base_dev->openacc.async_test_all_func ();
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr->dev)
+    gomp_fatal ("no device active");
+
+  return thr->dev->openacc.async_test_all_func ();
 }
 
 void
@@ -52,19 +62,34 @@ acc_wait (int async)
   if (async < acc_async_sync)
     gomp_fatal ("invalid async argument: %d", async);
 
-  base_dev->openacc.async_wait_func (async);
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr->dev)
+gomp_fatal (no device

Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-03-27 Thread Julian Brown
, present_create_copy)
(delete_copyout, update_dev_host, gomp_acc_remove_pointer): Tweak
lookup_host calls.
* oacc-parallel.c (select_acc_device): Remove. Replace calls with
goacc_lazy_initialize (throughout).
(GOACC_parallel): Use lock and splay tree from gomp_device_descr not
gomp_memory_mapping.
* target.c (gomp_map_vars, gomp_copy_from_async, gomp_unmap_vars)
(gomp_splay_tree_insert_mapping, GOMP_offload_unregister)
(GOMP_target): Use splay tree and lock directly in
gomp_device_descr, not gomp_memory_mapping.
(gomp_update): Remove mm argument. Use splay tree and lock directly
in gomp_device_descr.
(gomp_free_memmap): Change argument to struct splay_tree_s.
(gomp_load_plugin_for_device): Don't initialise openacc
open_device, close_device, get_device_num or set_device_num hooks.
Don't initialise target_data or deleted mem_map is_initialized,
splay_tree.root fields.
* plugin/plugin-host.c (GOMP_OFFLOAD_openacc_open_device)
(GOMP_OFFLOAD_openacc_close_device)
(GOMP_OFFLOAD_openacc_get_device_num)
(GOMP_OFFLOAD_openacc_set_device_num): Remove.
(GOMP_OFFLOAD_openacc_create_thread_data): Change (unused) argument
to int.
* plugin/plugin-nvptx.c (pthread.h): Include.
(ptx_inited): Remove.
(instantiated_devices, ptx_dev_lock): New.
(struct ptx_image_data): New.
(ptx_devices, ptx_images, ptx_image_lock): New.
(fini_streams_for_device): Reorder cuStreamDestroy call.
(nvptx_get_num_devices): Remove forward declaration.
(nvptx_init): Change return type to bool.
(nvptx_fini): Remove.
(nvptx_attach_host_thread_to_device): New.
(nvptx_open_device): Return struct ptx_device* instead of void*.
(nvptx_close_device): Change argument type to struct ptx_device*,
return type to void.
(nvptx_get_num_devices): Use instantiated_devices not ptx_inited.
(kernel_target_data, kernel_host_table): Remove static globals.
(GOMP_OFFLOAD_register_image, GOMP_OFFLOAD_get_table): Remove.
(GOMP_OFFLOAD_init_device): Reimplement.
(GOMP_OFFLOAD_fini_device): Likewise.
(GOMP_OFFLOAD_load_image, GOMP_OFFLOAD_unload_image): New.
(GOMP_OFFLOAD_alloc, GOMP_OFFLOAD_free, GOMP_OFFLOAD_dev2host)
(GOMP_OFFLOAD_host2dev): Use ORD argument.
(GOMP_OFFLOAD_openacc_open_device)
(GOMP_OFFLOAD_openacc_close_device)
(GOMP_OFFLOAD_openacc_set_device_num)
(GOMP_OFFLOAD_openacc_get_device_num): Remove.
(GOMP_OFFLOAD_openacc_create_thread_data): Change argument to int
(device number).

libgomp/testsuite/
* libgomp.oacc-c-c++-common/lib-9.c: Fix devnum check in test.

commit 63091061f227f124d8d496fd3064982935178f3a
Author: Julian Brown jul...@codesourcery.com
Date:   Mon Feb 23 11:55:41 2015 -0800

nvptx load/unload image support, init rework

fix multi-device tests

more load/unload patch cleanups

misc fixes

diff --git a/gcc/config/nvptx/mkoffload.c b/gcc/config/nvptx/mkoffload.c
index 02c44b6..dbc68bc 100644
--- a/gcc/config/nvptx/mkoffload.c
+++ b/gcc/config/nvptx/mkoffload.c
@@ -839,6 +839,7 @@ process (FILE *in, FILE *out)
 {
   const char *input = read_file (in);
   Token *tok = tokenize (input);
+  unsigned int nvars = 0, nfuncs = 0;
 
   do
 tok = parse_file (tok);
@@ -850,16 +851,17 @@ process (FILE *in, FILE *out)
   write_stmts (out, rev_stmts (fns));
   fprintf (out, ";\n\n");
   fprintf (out, "static const char *var_mappings[] = {\n");
-  for (id_map *id = var_ids; id; id = id->next)
+  for (id_map *id = var_ids; id; id = id->next, nvars++)
     fprintf (out, "\t\"%s\"%s\n", id->ptx_name, id->next ? "," : "");
   fprintf (out, "};\n\n");
   fprintf (out, "static const char *func_mappings[] = {\n");
-  for (id_map *id = func_ids; id; id = id->next)
+  for (id_map *id = func_ids; id; id = id->next, nfuncs++)
     fprintf (out, "\t\"%s\"%s\n", id->ptx_name, id->next ? "," : "");
   fprintf (out, "};\n\n");
 
   fprintf (out, "static const void *target_data[] = {\n");
-  fprintf (out, "  ptx_code, var_mappings, func_mappings\n");
+  fprintf (out, "  ptx_code, (void*) %u, var_mappings, (void*) %u, "
+		"func_mappings\n", nvars, nfuncs);
   fprintf (out, "};\n\n");
 
   fprintf (out, "extern void GOMP_offload_register (const void *, int, void *);\n");
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 3fc9aa9..822d2fe 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -656,9 +656,6 @@ struct target_mem_desc {
   /* Corresponding target device descriptor.  */
   struct gomp_device_descr *device_descr;
 
-  /* Memory mapping info for the thread that created this descriptor.  */
-  struct gomp_memory_mapping *mem_map;
-
   /* List of splay keys to remove (or decrease refcount)
  at the end of region.  */
   splay_tree_key list[];
@@ -683,20 +680,6 @@ struct splay_tree_key_s {
 
 #include "splay-tree.h"
 
-/* Information about mapped memory regions (per device/context).  */
-
-struct gomp_memory_mapping
-{
-  /* Mutex for operating with the splay tree and other shared

Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-03-09 Thread Julian Brown
On Fri, 6 Mar 2015 17:01:13 +0300
Ilya Verbin iver...@gmail.com wrote:

> On Thu, Feb 26, 2015 at 20:25:11 +0300, Ilya Verbin wrote:
> > On Wed, Feb 25, 2015 at 10:36:08 +0100, Thomas Schwinge wrote:
> > > Julian Brown jul...@codesourcery.com wrote:
> > > > This is a version of the previously-posted patch to rework
> > > > initialisation and support the proposed load/unload hooks,
> > > > merged to gomp4 branch and tested alongside the two patches
> > > > (from
> >
> > Currently the 'struct gomp_memory_mapping' contains 'lock' and
> > 'is_initialized'. Do you still need them?  Or we can use
> > gomp_device_descr::lock and is_initialized instead?  If yes, then
> > we can replace the gomp_memory_mapping structure with a splay_tree,
> > as it was before the OpenACC merge.
>
> Ping?

Apologies, I've been distracted with travel and other things. I
suspect, as you suggest, that the gomp_memory_mapping
lock/is_initialized fields may no longer be required. I haven't yet had
time to address that nor all of Thomas's comments on the patch (mostly
breakage with multiple devices), and I'm unlikely to have time this
week either due to vacation...

Thanks,

Julian


Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-02-25 Thread Julian Brown
On Wed, 25 Feb 2015 10:36:08 +0100
Thomas Schwinge tho...@codesourcery.com wrote:

> Hi!
>
> On Tue, 24 Feb 2015 11:29:51 +, Julian Brown
> jul...@codesourcery.com wrote:
> > Test results look OK, barring a suspected harness issue (lib-83
> > failing with a timeout for nvptx
>
> However, I'm seeing a class of testsuite regressions: all variants of
> libgomp.oacc-fortran/lib-5.f90 and libgomp.oacc-fortran/lib-7.f90
> FAIL: »libgomp: cuMemFreeHost error: invalid value«.  I see these two
> test cases contain a lot of acc_get_num_devices and similar calls --
> I've been testing this on our nvidiak20-2 system, which contains two
> Nvidia K20 cards, so maybe there's something wrong in that regard.
> (But why is this failing only for Fortran -- are we missing C/C++
> tests in that area?) Can you have a look, or want me to?

I can have a look at that.

> > --- a/gcc/config/nvptx/mkoffload.c
> > +++ b/gcc/config/nvptx/mkoffload.c
> > @@ -850,16 +851,17 @@ process (FILE *in, FILE *out)
>
> >    fprintf (out, "static const void *target_data[] = {\n");
> > -  fprintf (out, "  ptx_code, var_mappings, func_mappings\n");
> > +  fprintf (out, "  ptx_code, (void*) %u, var_mappings, (void*) %u, "
> > +		"func_mappings\n", nvars, nfuncs);
> >    fprintf (out, "};\n\n");
>
> I wondered if it's maybe more elegant to just separate those by NULL
> delimiters instead of the size integers casted to void * (spaces
> missing)?  But then, that'd need double scanning in the consumer,
> libgomp/plugin/plugin-nvptx.c:GOMP_OFFLOAD_load_image, because we
> need to allocate an appropriately sized array, so maybe your more
> expressive approach is better indeed.

Yeah, I considered both: there's probably not much to choose between
the approaches. They use the same amount of space.
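
The two layouts being compared can be sketched as follows. This is an illustration only: `count_from_prefix` and `count_by_scan` are hypothetical helpers, and the table contents are made up rather than taken from mkoffload output. With a count stored in the slot before each table, the consumer can size its allocation with one load; with NULL-terminated tables it has to walk each table once to count and again to copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count-prefixed layout, as in the patch:
     { code, (void *) nvars, vars, (void *) nfuncs, funcs }
   The entry count is an integer stored in a pointer-sized slot, so reading
   it is a single load followed by a cast.  */
static size_t
count_from_prefix (const void *const *target_data, size_t slot)
{
  return (size_t) (uintptr_t) target_data[slot];
}

/* NULL-terminated alternative: the count is only known after scanning the
   whole table, so a consumer that allocates an array of the right size
   must scan once before it can copy.  */
static size_t
count_by_scan (const char *const *names)
{
  size_t n = 0;
  while (names[n] != NULL)
    n++;
  return n;
}
```

Each layout costs one extra pointer-sized slot per table (a count or a terminator), which matches the observation that both use the same amount of space.
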

> > --- a/libgomp/oacc-async.c
> > +++ b/libgomp/oacc-async.c
> > @@ -34,44 +34,68 @@
> >  int
> >  acc_async_test (int async)
> >  {
> > +  struct goacc_thread *thr = goacc_thread ();
> > +
> >    if (async < acc_async_sync)
> >      gomp_fatal ("invalid async argument: %d", async);
> >
> > -  return base_dev->openacc.async_test_func (async);
> > +  assert (thr->dev);
> > +
> > +  return thr->dev->openacc.async_test_func (async);
> >  }

> Here, and in several other places: is this code conforming to the
> OpenACC specification?  Do we need to (lazily) initialize in all
> these places, or in goacc_thread, or gracefully fail (see below) if
> not initialized (basically in all places where you currently assert
> (thr->dev))?
>
> #include <openacc.h>
>
> int main(int argc, char *argv[])
> {
>   return acc_async_test(0);
> }
>
> [sigsegv]

Whether it conforms to the spec or not is a hard question to answer,
because a lot of behaviour is left undefined. But here are two
possibly-useful made-up guidelines:

1. Does the program work the same with OpenACC disabled?

2. Does some strange use of OpenACC functionality (including library
   calls, etc.) probably indicate user error?

Much of the lazy initialisation code is there so that (1) can be true
-- i.e., a program can use OpenACC directives without making an
explicit call to acc_init or other API-specific initialisation code.

But this case is an explicit call to the OpenACC runtime library, so the
program can't work without -fopenacc enabled, so we can follow
guideline (2) instead. And in this case, it's meaningless to test for
completion of async operation when no device is active.

Of course though, this should be an actual error rather than a crash.
But, I don't think we want to lazily-initialise here.
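
A sketch of the "real error rather than a crash" behaviour described above, reduced to a self-contained example. The types and the `checked_async_test` helper are hypothetical; the later versions of the patch in this thread perform the equivalent check inside `acc_async_test` and call `gomp_fatal` with "no device active", whereas this sketch returns a status code so that it can be exercised in isolation:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device descriptor with a single async hook.  */
struct device
{
  int (*async_test) (int async);
};

/* Hypothetical per-thread state; DEV is NULL until a device is active.  */
struct thread_state
{
  struct device *dev;
};

enum status { STATUS_OK, STATUS_NO_DEVICE };

/* Check that a device is active before dispatching to its hook.  On
   success the hook's result is stored in *DONE.  No lazy initialisation
   is attempted: a missing device is reported to the caller.  */
static enum status
checked_async_test (const struct thread_state *thr, int async, int *done)
{
  if (thr == NULL || thr->dev == NULL)
    return STATUS_NO_DEVICE;	/* The runtime would raise a fatal error.  */

  *done = thr->dev->async_test (async);
  return STATUS_OK;
}

/* Trivial hook used for demonstration: every operation has completed.  */
static int
always_done (int async)
{
  (void) async;
  return 1;
}
```
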

> Also, I'm not sure what the expected outcome of this code sequence is:
>
> acc_init(acc_device_nvidia);
> acc_shutdown(acc_device_nvidia);
> acc_async_test(0);
>
> a.out: [...]/source-gcc/libgomp/oacc-async.c:42: acc_async_test:
> Assertion `thr->dev' failed. Aborted (core dumped)
>
> If the OpenACC specification can be read such that all this indeed is
> undefined behavior, then aborting/crashing is OK, of course.

Again, this would probably indicate user error in a real program, so it
should raise a (real) error message.

> > --- a/libgomp/oacc-cuda.c
> > +++ b/libgomp/oacc-cuda.c
> > @@ -34,51 +34,53 @@
> >  void *
> >  acc_get_current_cuda_device (void)
> >  {
> > -  void *p = NULL;
> > +  struct goacc_thread *thr = goacc_thread ();
> >
> > -  if (base_dev && base_dev->openacc.cuda.get_current_device_func)
> > -    p = base_dev->openacc.cuda.get_current_device_func ();
> > +  if (thr && thr->dev && thr->dev->openacc.cuda.get_current_device_func)
> > +    return thr->dev->openacc.cuda.get_current_device_func ();
> >
> > -  return p;
> > +  return NULL;
> >  }
>
> Here, and in other places, it looks as if we'd fail gracefully.

Not sure about this (maybe it should be an error too?), but...

   int
   acc_set_cuda_stream (int async, void *stream)
   {
  -  int s = -1;
  +  struct goacc_thread *thr;
   
     if (async < 0 || stream == NULL)
       return 0;
   
     goacc_lazy_initialize ();
   
  -  if (base_dev && base_dev->openacc.cuda.set_stream_func)
  -    s = base_dev->openacc.cuda.set_stream_func

Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-02-24 Thread Julian Brown
Hi,

On Wed, 4 Feb 2015 15:05:45 +
Julian Brown jul...@codesourcery.com wrote:

 The major changes are:
 
 * The removal of the OpenACC-specific plugin hooks open_device,
   close_device, set_device_num and get_device_num. The functionality
   has been moved into the init/fini hooks (for the first two) or moved
   into the target-independent OpenACC parts, respectively.
 
 * The PTX mkoffload utility has been extended to support variables as
   well as function mapping, to fill out support for the load/unload
   image hooks. (Not really tested so far!)
 
 * The plugin hooks that are shared between OpenMP and OpenACC now
   support the device number argument properly: that should help with
   (eventually) unifying the plugin interface for the two APIs. (With
   set_device_num and get_device_num removed, the plugin is stateless
   with respect to which device is currently active. The rest of the
   OpenACC hooks -- async functions, etc. -- should probably be changed
   to take a device number argument too, but that could be a follow-on
   patch.)
 
 * The limitation of having only one type of device active
 simultaneously in the OpenACC runtime has (theoretically!) been
 removed.

This is a version of the previously-posted patch to rework
initialisation and support the proposed load/unload hooks, merged to
gomp4 branch and tested alongside the two patches (from
https://gcc.gnu.org/wiki/Offloading#nvptx_Offloading):

http://news.gmane.org/find-root.php?message_id=%3C20150218100035.GF1746%40tucnak.redhat.com%3E

http://news.gmane.org/find-root.php?message_id=%3C546CF508.9010807%40codesourcery.com%3E

As well as Ilya Verbin's patch:

https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01605.html

Test results look OK, barring a suspected harness issue (lib-83
failing with a timeout for nvptx, though it works fine from the command
line).

OK for gomp4 branch? I could commit Ilya's patch there too if so.

Thanks,

Julian

ChangeLog

gcc/
* config/nvptx/mkoffload.c (process): Support variable mapping.

libgomp/
* libgomp.h (acc_dispatch_t): Remove open_device_func,
close_device_func, get_device_num_func, set_device_num_func,
target_data members. Change create_thread_data_func argument to
device number instead of generic pointer.
* oacc-async.c (assert.h): Include.
(acc_async_test, acc_async_test_all, acc_wait, acc_wait_async)
(acc_wait_all, acc_wait_all_async): Use current host thread's
active device, not base_dev.
* oacc-cuda.c (acc_get_current_cuda_device)
(acc_get_current_cuda_context, acc_get_cuda_stream)
(acc_set_cuda_stream): Likewise.
* oacc-host.c (host_dispatch): Don't set open_device_func,
close_device_func, get_device_num_func or set_device_num_func.
* oacc-init.c (base_dev, init_key): Remove.
(cached_base_dev): New.
(name_of_acc_device_t): New.
(acc_init_1): Initialise default-numbered device, not zeroth.
(acc_shutdown_1): Close all devices of a given type.
(goacc_destroy_thread): Don't use base_dev.
(lazy_open, lazy_init, lazy_init_and_open): Remove.
(goacc_attach_host_thread_to_device): New.
(acc_init): Reimplement with goacc_attach_host_thread_to_device.
(acc_get_num_devices): Don't use base_dev.
(acc_set_device_type): Reimplement.
(acc_get_device_type): Don't use base_dev.
(acc_get_device_num): Tweak logic.
(acc_set_device_num): Likewise.
(goacc_runtime_initialize): Initialize cached_base_dev not base_dev.
(goacc_lazy_initialize): Reimplement with acc_init and
goacc_attach_host_thread_to_device.
* oacc-int.h (goacc_thread): Add base_dev field.
(base_dev): Remove extern declaration.
(goacc_attach_host_thread_to_device): Add prototype.
* oacc-mem.c (acc_malloc): Use current thread's device instead of
base_dev.
(acc_free): Likewise.
(acc_memcpy_to_device): Likewise.
(acc_memcpy_from_device): Likewise.
* oacc-parallel.c (select_acc_device): Remove. Replace calls with
goacc_lazy_initialize (throughout).
* target.c (gomp_load_plugin_for_device): Don't initialise openacc
open_device, close_device, get_device_num or set_device_num hooks.
Don't initialise target_data.
* plugin/plugin-host.c (GOMP_OFFLOAD_openacc_open_device)
(GOMP_OFFLOAD_openacc_close_device)
(GOMP_OFFLOAD_openacc_get_device_num)
(GOMP_OFFLOAD_openacc_set_device_num): Remove.
(GOMP_OFFLOAD_openacc_create_thread_data): Change (unused) argument
to int.
* plugin/plugin-nvptx.c (pthread.h): Include.
(ptx_inited): Remove.
(instantiated_devices, ptx_dev_lock): New.
(struct ptx_image_data): New.
(ptx_devices, ptx_images, ptx_image_lock): New.
(nvptx_get_num_devices): Remove forward declaration.
(nvptx_init): Change return type to bool.
(nvptx_fini): Remove.
(nvptx_attach_host_thread_to_device): New.
(nvptx_open_device): Return struct ptx_device* instead of void*.
(nvptx_close_device): Change argument

Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-02-04 Thread Julian Brown
On Tue, 3 Feb 2015 23:01:04 +0300
Ilya Verbin iver...@gmail.com wrote:

 On 03 Feb 13:00, Julian Brown wrote:
  On Tue, 3 Feb 2015 14:28:44 +0300
  Ilya Verbin iver...@gmail.com wrote:
   On 27 Jan 14:07, Julian Brown wrote:
    On Mon, 26 Jan 2015 17:34:26 +0300
    Ilya Verbin iver...@gmail.com wrote:
     Here is my current patch, it works for OpenMP-MIC, but
     obviously will not work for PTX, since it requires
     symmetrical changes in the plugin.  Could you please take a
     look, whether it is possible to support this new interface in
     PTX plugin?
    
    I think it can probably be made to work. I'll have a look in
    more detail.
   
   Do you have any progress on this?
  
  I'm still working on a patch to update OpenACC support and the PTX
  backend to use load/unload_image and to unify
  initialisation/opening. So far I think the answer is basically
  yes, the new interface can be supported, though I might request a
  minor tweak -- e.g. that load_image takes an extra void **
  argument so that a libgomp backend can allocate a block of generic
  metadata relating to the image, then that same block would be
  passed (void *) to the unload hook so the backend can use it there
  and deallocate it when it's finished with.
  
  Would that be possible? (It'd mostly be for a CUmodule handle:
  this could be stashed away somewhere within the nvptx backend, but
  it might be neater to put it in generic code since it'll probably
  be useful for other backends anyway.)
 
 An extra argument is not a problem, however I don't quite get the
 idea. PTX plugin allocates some data while loading, and needs this
 data while unloading?  Then why not to create a hash table with
 image_ptr - metadata mapping inside the plugin? [...]

Right -- that's what I meant by "could be stashed away somewhere within
the nvptx backend". I just thought that retaining a generic chunk of
state for each (JIT-compiled, in this case) block of code might be
something that would be useful for other targets too. I've kept
the required information (for now at least) within the nvptx backend as
an associative list.
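
For illustration, an associative list of that kind might look roughly
like the sketch below. The toy_* names are made up (the patch itself
uses struct ptx_image_data, ptx_images and ptx_image_lock), the metadata
is just an opaque pointer standing in for something like a CUmodule
handle, and locking is left out for brevity:

```c
#include <assert.h>
#include <stdlib.h>

/* One entry per loaded image, keyed by the image pointer.  */
struct toy_image_data
{
  const void *image_ptr;
  void *metadata;
  struct toy_image_data *next;
};

static struct toy_image_data *toy_images;

/* Called from the load hook.  Returns 0 on success, -1 if out of memory.  */
int
toy_image_register (const void *image_ptr, void *metadata)
{
  struct toy_image_data *n = malloc (sizeof *n);

  if (!n)
    return -1;

  n->image_ptr = image_ptr;
  n->metadata = metadata;
  n->next = toy_images;
  toy_images = n;
  return 0;
}

/* Called from the unload hook.  Removes the entry for IMAGE_PTR and
   returns its metadata, or NULL if the image was never registered.  */
void *
toy_image_unregister (const void *image_ptr)
{
  for (struct toy_image_data **p = &toy_images; *p; p = &(*p)->next)
    if ((*p)->image_ptr == image_ptr)
      {
        struct toy_image_data *found = *p;
        void *metadata = found->metadata;

        *p = found->next;
        free (found);
        return metadata;
      }

  return NULL;
}
```

With an extra void ** argument on load_image, the generic code would
hold the metadata pointer instead and the lookup would go away.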

This (WIP) patch is based on top of a version of your patch that I
merged to our internal branch: that's still the easiest way for me to
test the PTX backend (with unloading support) at present, and it passes
libgomp testing that way. Trunk should be fairly close, but I haven't
tried applying it there yet.

The major changes are:

* The removal of the OpenACC-specific plugin hooks open_device,
  close_device, set_device_num and get_device_num. The functionality
  has been moved into the init/fini hooks (for the first two) or moved
  into the target-independent OpenACC parts, respectively.

* The PTX mkoffload utility has been extended to support variables as
  well as function mapping, to fill out support for the load/unload
  image hooks. (Not really tested so far!)

* The plugin hooks that are shared between OpenMP and OpenACC now
  support the device number argument properly: that should help with
  (eventually) unifying the plugin interface for the two APIs. (With
  set_device_num and get_device_num removed, the plugin is stateless
  with respect to which device is currently active. The rest of the
  OpenACC hooks -- async functions, etc. -- should probably be changed
  to take a device number argument too, but that could be a follow-on
  patch.)

* The limitation of having only one type of device active simultaneously
  in the OpenACC runtime has (theoretically!) been removed.
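
To make the third point concrete, the split is roughly as in this sketch
(hypothetical toy_* names, not the actual hooks): the plugin entry point
takes the device number explicitly and keeps no "current device" of its
own, while the target-independent side remembers the active device per
host thread.

```c
#include <assert.h>

#define TOY_NUM_DEVICES 4

/* Plugin side: per-device state only, no notion of a current device.  */
static int toy_dev_initialised[TOY_NUM_DEVICES];

/* Hypothetical plugin hook.  Returns 1 on success, 0 for a bad number.  */
int
toy_plugin_init_device (int n)
{
  if (n < 0 || n >= TOY_NUM_DEVICES)
    return 0;

  toy_dev_initialised[n] = 1;
  return 1;
}

/* Target-independent side: each host thread tracks its own active
   device number; -1 means none has been selected yet.  */
static _Thread_local int toy_active_device = -1;

/* Returns 0 on success; on failure the active device is unchanged.  */
int
toy_set_device_num (int n)
{
  if (!toy_plugin_init_device (n))
    return -1;

  toy_active_device = n;
  return 0;
}

int
toy_get_device_num (void)
{
  return toy_active_device;
}
```

The sketch uses C11 _Thread_local purely for brevity.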

Thoughts?

Thanks,

Julian

ChangeLog

gcc/
* config/nvptx/mkoffload.c (process): Support variable mapping.

libgomp/
* libgomp.h (acc_dispatch_t): Remove open_device_func,
close_device_func, get_device_num_func, set_device_num_func,
target_data members. Change create_thread_data_func argument to
device number instead of generic pointer.
* oacc-async.c (assert.h): Include.
(acc_async_test, acc_async_test_all, acc_wait, acc_wait_async)
(acc_wait_all, acc_wait_all_async): Use current host thread's
active device, not base_dev.
* oacc-cuda.c (acc_get_current_cuda_device)
(acc_get_current_cuda_context, acc_get_cuda_stream)
(acc_set_cuda_stream): Likewise.
* oacc-host.c (host_dispatch): Don't set open_device_func,
close_device_func, get_device_num_func or set_device_num_func.
* oacc-init.c (base_dev, init_key): Remove.
(cached_base_dev): New.
(name_of_acc_device_t): New.
(acc_init_1): Initialise default-numbered device, not zeroth.
(acc_shutdown_1): Close all devices of a given type.
(goacc_destroy_thread): Don't use base_dev.
(lazy_open, lazy_init, lazy_init_and_open): Remove.
(goacc_attach_host_thread_to_device): New.
(acc_init): Reimplement with goacc_attach_host_thread_to_device.
(acc_get_num_devices): Don't use base_dev.
(acc_set_device_type): Reimplement.
(acc_get_device_type): Don't use base_dev.
(acc_get_device_num): Tweak

Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-02-03 Thread Julian Brown
On Tue, 3 Feb 2015 14:28:44 +0300
Ilya Verbin iver...@gmail.com wrote:

 Hi Julian!
 
 On 27 Jan 14:07, Julian Brown wrote:
  On Mon, 26 Jan 2015 17:34:26 +0300
  Ilya Verbin iver...@gmail.com wrote:
   Here is my current patch, it works for OpenMP-MIC, but obviously
   will not work for PTX, since it requires symmetrical changes in
   the plugin.  Could you please take a look, whether it is possible
   to support this new interface in PTX plugin?
  
  I think it can probably be made to work. I'll have a look in more
  detail.
 
 Do you have any progress on this?

I'm still working on a patch to update OpenACC support and the PTX
backend to use load/unload_image and to unify initialisation/opening.
So far I think the answer is basically yes, the new interface can be
supported, though I might request a minor tweak -- e.g. that
load_image takes an extra void ** argument so that a libgomp backend
can allocate a block of generic metadata relating to the image, then
that same block would be passed (void *) to the unload hook so the
backend can use it there and deallocate it when it's finished with.

Would that be possible? (It'd mostly be for a CUmodule handle: this
could be stashed away somewhere within the nvptx backend, but it might
be neater to put it in generic code since it'll probably be useful for
other backends anyway.)

Thanks,

Julian


Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-01-27 Thread Julian Brown
On Mon, 26 Jan 2015 17:34:26 +0300
Ilya Verbin iver...@gmail.com wrote:

 Here is my current patch, it works for OpenMP-MIC, but obviously
 will not work for PTX, since it requires symmetrical changes in the
 plugin.  Could you please take a look, whether it is possible to
 support this new interface in PTX plugin?

I think it can probably be made to work. I'll have a look in more
detail.

Thanks,

Julian


Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-01-27 Thread Julian Brown
On Mon, 26 Jan 2015 14:44:19 +0100
Thomas Schwinge tho...@codesourcery.com wrote:

  On 17 Jan 02:16, Ilya Verbin wrote:
   Unfortunately, it broke offloading from shared libraries (I mean
   common libs with NEEDED entries, not dlopened).
 
 Sorry for that!
 
   Such things are not covered by the
   testsuite, that's why you missed this issue.  Here is a simple
   testcase:
 
 http://news.gmane.org/find-root.php?message_id=%3C20150116231632.GB48380%40msticlxl57.ims.intel.com%3E
 
 Probably a good motivation for adding such a test case.  ;-)
 
   So, you don't assume that a device can have multiple images from
   multiple libs?
  
  Ping?
 
 This probably is just a bug that we introduced with our changes?
 (Julian?)

AFAICR, we haven't yet figured out how to make (shared) libraries work
with PTX. Actually I'm not entirely sure if static libraries containing
PTX code will work either. But, multiple images (e.g. from different
object files) are supported, via the loop in gomp_target_init.

(The semantics of gomp_register_image_for_device were changed, but not
-- intentionally! -- to limit the number of offloaded images to one.)

  Also, could you please explain, why did you divide a device
  initialization into two functions -- gomp_init_device and
  gomp_init_tables?
 
 As I understand it (again, Julian, please correct me if I got that
 wrong), the reason is that for OpenACC support, we need these as two
 separate (independent) actions.  Is this causing problems for OpenMP
 offloading?

This was certainly necessary at some point, when the support for
multiple devices of the same type in the OpenACC runtime was delegated
entirely to target-dependent code. Later (after one round of
refactoring), the gomp_device_descr and the memory map were still
separate, with the former possibly representing a number of devices,
and the latter having independent copies for each instance of a device.

That's largely been refactored (again) away now though -- a
gomp_device_descr and its memory map are stored together, per-device
instance. So this separation of their initialisation can probably go
away, although some (somewhat delicate) code in oacc-init.c would need
to be tweaked.

Julian


Re: [PATCH 4/5] OpenACC 2.0 support for libgomp - new tests (repost)

2014-11-17 Thread Julian Brown
On Sat, 15 Nov 2014 00:58:56 +
Julian Brown jul...@codesourcery.com wrote:

 On Thu, 13 Nov 2014 11:15:18 +0100
 Jakub Jelinek ja...@redhat.com wrote:
 
   +# Turn on OpenACC.
   +# XXX (TEMPORARY): Remove the -flto once that's properly integrated.
   +lappend ALWAYS_CFLAGS "additional_flags=-fopenacc -flto"
  
  Do you still need that?
 
 I'm not sure -- I can't easily check on trunk without the middle-end
 bits, and I haven't tried to incorporate those in my testing yet. I'll
 try to check this on e.g. the gomp4 branch soon.

It seems that -flto *is* still needed at present -- I'm not sure what
the plan was for integrating it properly. Making -fopenacc imply
-flto via specs or similar?

Thanks,

Julian


Re: [PATCH 3/5] OpenACC 2.0 support for libgomp - outline documentation (repost)

2014-11-14 Thread Julian Brown
On Thu, 13 Nov 2014 11:05:10 +0100
Tobias Burnus tobias.bur...@physik.fu-berlin.de wrote:

 Jakub Jelinek wrote:
   -* libgomp: (libgomp).GNU OpenMP runtime library
   +* libgomp: (libgomp).GNU OpenACC and OpenMP runtime library
    @end direntry
 
  See Dave Malcolm's patch, please integrate it into your patchset.
 
 Namely, https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01317.html
 
 
 However, a grep shows also the following spots which have to be
 updated:
 
 gcc/fortran/gfortran.texi-@option{-fopenmp}.  This also arranges for automatic linking of the
 gcc/fortran/gfortran.texi:GNU OpenMP runtime library @ref{Top,,libgomp,libgomp,GNU OpenMP
 gcc/fortran/gfortran.texi-runtime library}.
 --
 gcc/fortran/intrinsic.texi-@file{omp_lib.h}. The procedures provided by @code{OMP_LIB} can be found
 gcc/fortran/intrinsic.texi:in the @ref{Top,,Introduction,libgomp,GNU OpenMP runtime library} manual,
 gcc/fortran/intrinsic.texi-the named constants defined in the modules are listed
 --
 gcc/doc/sourcebuild.texi-@item libgomp
 gcc/doc/sourcebuild.texi:The GNU OpenMP runtime library.
 gcc/doc/sourcebuild.texi-

Thanks -- here's a new version of the patch, which incorporates David
Malcolm's new backronym for libgomp, and edits the above files also.

Julian

commit 06fc24fb9ffcf70aa49158f12db3f592bca5c3ff
Author: Julian Brown jul...@codesourcery.com
Date:   Thu Nov 13 04:21:16 2014 -0800

OpenACC documentation.

-xx-xx  Thomas Schwinge  tho...@codesourcery.com
	James Norris  jnor...@codesourcery.com
	David Malcolm dmalc...@redhat.com
	Julian Brown  jul...@codesourcery.com

libgomp/
* libgomp.texi: Outline documentation for OpenACC.

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 20a206d..373dbb6 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -89,7 +89,7 @@ The Go runtime library.  The bulk of this library is mirrored from the
 @uref{http://code.google.com/@/p/@/go/, master Go repository}.
 
 @item libgomp
-The GNU OpenMP runtime library.
+The GNU Offloading and Multi Processing library.
 
 @item libiberty
 The @code{libiberty} library, used for portability and for some
diff --git a/gcc/fortran/intrinsic.texi b/gcc/fortran/intrinsic.texi
index 90c9a3a..52db989 100644
--- a/gcc/fortran/intrinsic.texi
+++ b/gcc/fortran/intrinsic.texi
@@ -14030,8 +14030,8 @@ The OpenMP Fortran runtime library routines are provided both in
 a form of two Fortran 90 modules, named @code{OMP_LIB} and 
 @code{OMP_LIB_KINDS}, and in a form of a Fortran @code{include} file named
 @file{omp_lib.h}. The procedures provided by @code{OMP_LIB} can be found
-in the @ref{Top,,Introduction,libgomp,GNU OpenMP runtime library} manual,
-the named constants defined in the modules are listed
+in the @ref{Top,,Introduction,libgomp,GNU Offloading and Multi Processing
+library} manual, the named constants defined in the modules are listed
 below.
 
 For details refer to the actual
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 254be57..4bd7ab8 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -31,11 +31,14 @@ texts being (a) (see below), and with the Back-Cover Texts being (b)
 @ifinfo
 @dircategory GNU Libraries
 @direntry
-* libgomp: (libgomp).GNU OpenMP runtime library
+* libgomp: (libgomp).   GNU Offloading and Multi Processing Runtime library
 @end direntry
 
-This manual documents the GNU implementation of the OpenMP API for 
-multi-platform shared-memory parallel programming in C/C++ and Fortran.
+This manual documents libgomp, the GNU Offloading and Multi
+Processing Runtime library.  This is the GNU implementation of the OpenMP
+API for multi-platform shared-memory parallel programming in C/C++ and
+Fortran and of the OpenACC and OpenMP APIs for offloading of code to accelerator
+devices from the same languages.
 
 Published by the Free Software Foundation
 51 Franklin Street, Fifth Floor
@@ -48,7 +51,7 @@ Boston, MA 02110-1301 USA
 @setchapternewpage odd
 
 @titlepage
-@title The GNU OpenMP Implementation
+@title The GNU OpenACC and OpenMP Implementation
 @page
 @vskip 0pt plus 1filll
 @comment For the @value{version-GCC} Version*
@@ -69,7 +72,11 @@ Boston, MA 02110-1301, USA@*
 @top Introduction
 @cindex Introduction
 
-This manual documents the usage of libgomp, the GNU implementation of the 
+This manual documents the usage of libgomp, the GNU Offloading and Multi
+Processing Runtime library.  This is the GNU implementation of the
+@uref{http://www.openacc.org/, OpenACC} Application Programming Interface (API)
+for offloading of code to accelerator devices in C/C++ and Fortran, and
+the GNU implementation of the 
 @uref{http://www.openmp.org, OpenMP} Application Programming Interface (API)
 for multi-platform shared-memory parallel programming in C/C++ and Fortran.
 
@@ -81,23 +88,617 @@ for multi-platform shared-memory parallel programming in C/C

Re: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost)

2014-11-12 Thread Julian Brown
On Wed, 12 Nov 2014 11:06:26 +0100
Jakub Jelinek ja...@redhat.com wrote:

 On Tue, Nov 11, 2014 at 01:53:23PM +, Julian Brown wrote:
  A few OpenMP tests fail with the new host_nonshm plugin (with
  failures of the form libgomp: Trying to update
  [0x605820..0x605824) object that is not mapped), probably because
  of middle-end bugs. I haven't investigated those in detail.
 
 Depends how exactly your host_nonshm plugin works.  A few tests in the
 testsuite use #pragma omp declare target variables, so if host_nonshm
 plugin is something like I had on the gomp-4_0-branch initially as
 hackish device 257, where code is run on the host, and map directives
 simply malloc/free host memory and memcpy stuff around, then without
 extra work the #pragma omp declare target variables indeed can't work.
 You'd either need to support a strange partially shared memory model,
 where #pragma omp declare target variables would be shared (you'd
 still need to populate the mapping data structures with those vars
 and identity map them), or not so conforming model where you'd map
 them on entering the target regions if they aren't mapped yet (the
 thing is that then if the variables are changed on the host in
 between the start of the program and the target region, you'd use the
 changed values instead the values they were originally assigned), or
 map them in some constructor (but, how would you know if a
 host_nonshm plugin is going to be used in the future).

Thanks for the review! I'll work on addressing your comments. Your
characterization of the host_nonshm plugin sounds accurate, but OOI,
what does the Intel MIC plugin do differently that means it is not
subject to the same problem with target variables?
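
For reference, the behaviour being described can be modelled with a toy
like the one below. This is only a sketch with made-up toy_* names, not
the plugin's code: "device" memory is a separate malloc'd copy, mapping
is a memcpy, and an update of an object that was never entered in the
table (as would be the case for a declare target variable) is rejected.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_MAPPINGS 16

struct toy_mapping { void *host; void *dev; size_t size; };

static struct toy_mapping toy_map[TOY_MAX_MAPPINGS];
static size_t toy_map_count;

/* Returns the "device" address for [host, host+size), or NULL if that
   range is not fully contained in an existing mapping.  */
void *
toy_device_addr (const void *host, size_t size)
{
  uintptr_t start = (uintptr_t) host;

  for (size_t i = 0; i < toy_map_count; i++)
    {
      uintptr_t base = (uintptr_t) toy_map[i].host;

      if (start >= base && start + size <= base + toy_map[i].size)
        return (char *) toy_map[i].dev + (start - base);
    }

  return NULL;
}

/* Like map(to:): allocate a separate copy and fill it from the host.
   Returns 0 on success, -1 on failure.  */
int
toy_map_to (void *host, size_t size)
{
  void *dev;

  if (toy_map_count == TOY_MAX_MAPPINGS || !(dev = malloc (size)))
    return -1;

  memcpy (dev, host, size);
  toy_map[toy_map_count++] = (struct toy_mapping) { host, dev, size };
  return 0;
}

/* Like update from(): copy the "device" copy back to the host.
   Returns -1 for an object that is not mapped.  */
int
toy_update_from (void *host, size_t size)
{
  void *dev = toy_device_addr (host, size);

  if (!dev)
    return -1;

  memcpy (host, dev, size);
  return 0;
}
```

Nothing here populates the table for global variables at start-up.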

 One can always use the intelmicemul plugin to test nonshared-memory
 stuff without any HW (provided the host is x86_64/i686), so do we
 really need host_nonshm plugin?

It might still be useful for testing (non-shm) OpenACC without
hardware, I guess (or for pedagogical purposes) -- perhaps we could
remove the TARGET_CAP_OPENMP_400 flag, if that's not expected to work.

Julian


[PATCH 2/5] OpenACC 2.0 support for libgomp - temporarily work around missing __builtin_acc_on_device (repost)

2014-11-11 Thread Julian Brown
On Tue, 23 Sep 2014 19:19:55 +0100
Julian Brown jul...@codesourcery.com wrote:

 The patches implementing __builtin_acc_on_device are still in
 processing. For the time being this patch removes the dependency on
 that builtin in the OpenACC runtime.
 
 Julian
 
 -xx-xx  Julian Brown  jul...@codesourcery.com
 
   libgomp/
   * oacc-init.c (acc_on_device): Temporarily hard-code for host
   instead of using __builtin_acc_on_device.

This patch remains unchanged from the last posting.

OK to apply?

Julian

From 99e76023ff0759925403b43e19612fb859c3759e Mon Sep 17 00:00:00 2001
From: Julian Brown jul...@codesourcery.com
Date: Fri, 19 Sep 2014 11:28:11 -0700
Subject: [PATCH 2/5] Work around lack of __builtin_acc_on_device for now

-xx-xx  Julian Brown  jul...@codesourcery.com

libgomp/
* oacc-init.c (acc_on_device): Temporarily hard-code for host
instead of using __builtin_acc_on_device.
---
 libgomp/oacc-init.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index 8c91ea7..1cbb4d7 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -545,8 +545,20 @@ acc_on_device (acc_device_t dev)
       && acc_device_type (thr->dev->type) == acc_device_host_nonshm)
     return dev == acc_device_host_nonshm || dev == acc_device_not_host;
 
+#if 1
+  /* Support for __builtin_acc_on_device comes in later patches.  */
+  switch (dev)
+    {
+    case acc_device_none:
+    case acc_device_host:
+      return 1;
+    default:
+      return 0;
+    }
+#else
   /* Just rely on the compiler builtin.  */
   return __builtin_acc_on_device (dev);
+#endif
 }
 ialias (acc_on_device)
 
-- 
1.7.10.4



[PATCH 3/5] OpenACC 2.0 support for libgomp - outline documentation (repost)

2014-11-11 Thread Julian Brown
On Tue, 23 Sep 2014 19:20:14 +0100
Julian Brown jul...@codesourcery.com wrote:

 This patch provides some documentation for the new OpenACC bits in
 libgomp.
 
 Julian
 
 -xx-xx  Thomas Schwinge  tho...@codesourcery.com
   James Norris  jnor...@codesourcery.com
 
   libgomp/
   * libgomp.texi: Outline documentation for OpenACC.

This patch also remains unchanged from the last posting.

OK to apply?

Julian

From 1f17beb70b5607d1884fad1cb4734857f0e7846f Mon Sep 17 00:00:00 2001
From: Julian Brown jul...@codesourcery.com
Date: Mon, 22 Sep 2014 02:45:29 -0700
Subject: [PATCH 3/5] OpenACC documentation.

-xx-xx  Thomas Schwinge  tho...@codesourcery.com
	James Norris  jnor...@codesourcery.com

libgomp/
* libgomp.texi: Outline documentation for OpenACC.
---
 libgomp/libgomp.texi |  661 --
 1 file changed, 636 insertions(+), 25 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 254be57..9530a2b 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -31,10 +31,12 @@ texts being (a) (see below), and with the Back-Cover Texts being (b)
 @ifinfo
 @dircategory GNU Libraries
 @direntry
-* libgomp: (libgomp).GNU OpenMP runtime library
+* libgomp: (libgomp).GNU OpenACC and OpenMP runtime library
 @end direntry
 
-This manual documents the GNU implementation of the OpenMP API for 
+This manual documents the GNU implementation of the OpenACC API for 
+offloading of code to accelerator devices in C/C++ and Fortran and
+the GNU implementation of the OpenMP API for 
 multi-platform shared-memory parallel programming in C/C++ and Fortran.
 
 Published by the Free Software Foundation
@@ -48,7 +50,7 @@ Boston, MA 02110-1301 USA
 @setchapternewpage odd
 
 @titlepage
-@title The GNU OpenMP Implementation
+@title The GNU OpenACC and OpenMP Implementation
 @page
 @vskip 0pt plus 1filll
 @comment For the @value{version-GCC} Version*
@@ -69,7 +71,10 @@ Boston, MA 02110-1301, USA@*
 @top Introduction
 @cindex Introduction
 
-This manual documents the usage of libgomp, the GNU implementation of the 
+This manual documents the usage of libgomp, the GNU implementation of the
+@uref{http://www.openacc.org/, OpenACC} Application Programming Interface (API)
+for offloading of code to accelerator devices in C/C++ and Fortran, and
+the GNU implementation of the 
 @uref{http://www.openmp.org, OpenMP} Application Programming Interface (API)
 for multi-platform shared-memory parallel programming in C/C++ and Fortran.
 
@@ -81,23 +86,619 @@ for multi-platform shared-memory parallel programming in C/C++ and Fortran.
 @comment  better formatting.
 @comment
 @menu
-* Enabling OpenMP::How to enable OpenMP for your applications.
-* Runtime Library Routines::   The OpenMP runtime application programming 
-   interface.
-* Environment Variables::  Influencing runtime behavior with environment 
-   variables.
-* The libgomp ABI::Notes on the external ABI presented by libgomp.
-* Reporting Bugs:: How to report bugs in GNU OpenMP.
-* Copying::GNU general public license says
-   how you can copy and share libgomp.
-* GNU Free Documentation License::
-   How you can copy and share this manual.
-* Funding::How to help assure continued work for free 
-   software.
-* Library Index::  Index of this documentation.
+* Enabling OpenACC:: How to enable OpenACC for your
+ applications.
+* OpenACC Runtime Library Routines:: The OpenACC runtime application
+  programming interface.
+* OpenACC Environment Variables::Influencing OpenACC runtime behavior with
+ environment variables.
+* OpenACC Library Interoperability:: OpenACC library interoperability with the
+ NVIDIA CUBLAS library.
+* Enabling OpenMP::  How to enable OpenMP for your
+ applications.
+* OpenMP Runtime Library Routines: Runtime Library Routines.
+ The OpenMP runtime application programming
+ interface.
+* OpenMP Environment Variables: Environment Variables.
+ Influencing OpenMP runtime behavior with
+ environment variables.
+* The libgomp ABI::  Notes on the external libgomp ABI.
+* Reporting Bugs::   How to report bugs.
+* Copying::  GNU general public license says how you
+ can copy and share libgomp.
+* GNU Free Documentation License::   How you can copy and share

[PATCH 5/5] OpenACC 2.0 support for libgomp - temporary test harness tweaks

2014-11-11 Thread Julian Brown
Hi,

As mentioned in the previous mail in this series, testing the OpenACC
runtime support in libgomp is going to be awkward until the associated
middle-end pieces are ready. This stop-gap patch helps to allow tests
(that don't use any of the pragmas, only calling the run-time library
directly) to run successfully.

OK to apply?

Thanks,

Julian

ChangeLog

libgomp/
* testsuite/libgomp.oacc-c++/c++.exp (ALWAYS_CFLAGS): Temporarily
replace -fopenacc with -lgomp -lpthread, until -fopenacc support
lands upstream.
* testsuite/libgomp.oacc-c/c.exp (ALWAYS_CFLAGS): Likewise.
* testsuite/libgomp.oacc-fortran/fortran.exp (ALWAYS_CFLAGS):
Similar, but without -lpthread.
From c70f2aca94bc306e4600282aa81bc1a758ad81fa Mon Sep 17 00:00:00 2001
From: Julian Brown jul...@codesourcery.com
Date: Tue, 11 Nov 2014 02:54:09 -0800
Subject: [PATCH 5/5] Temporary testing tweaks

libgomp/
* testsuite/libgomp.oacc-c++/c++.exp (ALWAYS_CFLAGS): Temporarily replace
-fopenacc with -lgomp -lpthread, until -fopenacc support lands upstream.
* testsuite/libgomp.oacc-c/c.exp (ALWAYS_CFLAGS): Likewise.
* testsuite/libgomp.oacc-fortran/fortran.exp (ALWAYS_CFLAGS): Similar, but
without -lpthread.
---
 libgomp/testsuite/libgomp.oacc-c++/c++.exp |4 +++-
 libgomp/testsuite/libgomp.oacc-c/c.exp |4 +++-
 libgomp/testsuite/libgomp.oacc-fortran/fortran.exp |4 +++-
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-c++/c++.exp b/libgomp/testsuite/libgomp.oacc-c++/c++.exp
index b8b3e85..1060344 100644
--- a/libgomp/testsuite/libgomp.oacc-c++/c++.exp
+++ b/libgomp/testsuite/libgomp.oacc-c++/c++.exp
@@ -23,7 +23,9 @@ dg-init
 
 # Turn on OpenACC.
 # XXX (TEMPORARY): Remove the -flto once that's properly integrated.
-lappend ALWAYS_CFLAGS "additional_flags=-fopenacc -flto"
+#lappend ALWAYS_CFLAGS "additional_flags=-fopenacc -flto"
+# TODO: Revert this temporary hack when OpenACC middle-end pieces are submitted.
+lappend ALWAYS_CFLAGS "additional_flags=-lgomp -flto -lpthread"
 
 set blddir [lookfor_file [get_multilibs] libgomp]
 
diff --git a/libgomp/testsuite/libgomp.oacc-c/c.exp b/libgomp/testsuite/libgomp.oacc-c/c.exp
index 5558ec8..85528aa 100644
--- a/libgomp/testsuite/libgomp.oacc-c/c.exp
+++ b/libgomp/testsuite/libgomp.oacc-c/c.exp
@@ -28,7 +28,9 @@ dg-init
 
 # Turn on OpenACC.
 # XXX (TEMPORARY): Remove the -flto once that's properly integrated.
-lappend ALWAYS_CFLAGS "additional_flags=-fopenacc -flto"
+#lappend ALWAYS_CFLAGS "additional_flags=-fopenacc -flto"
+# TODO: Revert temporary hack when OpenACC middle-end pieces are submitted.
+lappend ALWAYS_CFLAGS "additional_flags=-lgomp -flto -lpthread"
 
 lappend libgomp_compile_options compiler=$GCC_UNDER_TEST
 
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp b/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
index 0ada038..27cf4d5 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
+++ b/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
@@ -23,7 +23,9 @@ dg-init
 
 # Turn on OpenACC.
 # XXX (TEMPORARY): Remove the -flto once that's properly integrated.
-lappend ALWAYS_CFLAGS "additional_flags=-fopenacc -flto"
+#lappend ALWAYS_CFLAGS "additional_flags=-fopenacc -flto"
+# TODO: Revert this temporary hack when OpenACC middle-end pieces are submitted.
+lappend ALWAYS_CFLAGS "additional_flags=-lgomp -flto"
 
 if { $blddir != "" } {
 set lang_source_re {^.*\.[fF](|90|95|03|08)$}
-- 
1.7.10.4



Re: [patch] OpenACC fortran front end

2014-11-11 Thread Julian Brown
On Tue, 11 Nov 2014 08:10:29 +0100
Jakub Jelinek ja...@redhat.com wrote:

 On Mon, Nov 10, 2014 at 02:43:38PM -0800, Cesar Philippidis wrote:
   I'll post a separate patch with the fortran tests later. If
   anyone wants to test this patch, please use gomp-4_0-branch
   instead. You don't need a CUDA accelerator to use
   OpenACC, but some of the runtime tests will fail because that
   branch doesn't include the nvptx backend.
   Now that the first series of PTX target patches have been
   committed: I assume it is still true that nvptx doesn't work
   because the libgomp bits aren't in yet, isn't it?
  
  That's correct. The nvptx backend also depends on the offloading
  changes that a team from Intel is working on for the MIC target.
  But Julian should be posting the libgomp patches tomorrow, I think,
  since his changes are somewhat self-contained.
 
 For the middle-end and libgomp changes, can you talk to the Intel
 folks to update their git branch to latest trunk (so that you have
 the nvptx bits in there) and send middle-end and libgomp diffs
 against that? As far as I remember, most of the changes from the
 branch are now approved, they are just waiting for review of the LTO
 related changes in the middle-end (please, correct me if I've missed
 something).

We've been preparing new patches against trunk for the libgomp and
middle-end bits: I've now posted the former, and the latter are on
their way soon, I believe. The middle-end bits are also present on the
gomp-4_0-branch SVN branch (likewise, the libgomp pieces), and I
believe we're planning to merge the PTX bits there also now they've
been committed to trunk.

Is it really worthwhile merging our patches to yet another branch at
this stage?

Thanks,

Julian


Re: [patch] OpenACC fortran front end

2014-11-11 Thread Julian Brown
On Tue, 11 Nov 2014 17:51:01 +0100
Jakub Jelinek ja...@redhat.com wrote:

 On Tue, Nov 11, 2014 at 02:52:20PM +, Julian Brown wrote:
  On Tue, 11 Nov 2014 08:10:29 +0100
  Jakub Jelinek ja...@redhat.com wrote:
  
  We've been preparing new patches against trunk for the libgomp and
  middle-end bits: I've now posted the former, and the latter are on
  their way soon, I believe. The middle-end bits are also present on
  the gomp-4_0-branch SVN branch (likewise, the libgomp pieces), and I
  believe we're planning to merge the PTX bits there also now they've
  been committed to trunk.
  
  Is it really worthwhile merging our patches to yet another branch at
  this stage?
 
 The point is that the kyukhin/gomp4-offload branch is mostly reviewed
 now (waiting for Richard and/or Honza now to review the last LTO bits)
 and your patches have huge overlap with that, so sending patches
 against trunk that implement the same thing would mean reviewing the
 same bits again, and worse if there are conflicts between the two
 patchsets, if both patchsets were to be approved, one couldn't be
 committed anyway.

Yeah, understood, and apologies for not making that clearer: as Cesar
mentions, my patches are meant to apply (as well as I could manage) on
top of Ilya's ones that have mostly been approved, and there should be
no overlap in functionality (Ilya's patches subsume patches 1-6 in my
previously-posted series). Our approach to branch management perhaps
hasn't been perfect here -- it didn't dawn on me until quite late in the
submission process that Intel had been working on their own branch
rather than the gomp-4_0-branch, and that the patches they would be
posting would be based on the former rather than the latter. But, we've
tried hard to accommodate the differences that have arisen in the
meantime.

Thanks,

Julian


Re: [gomp4] Move libgomp plugins into subdirectory

2014-11-06 Thread Julian Brown
On Thu, 6 Nov 2014 10:06:00 +0100
Thomas Schwinge tho...@codesourcery.com wrote:

 Hi Julian!
 
 On Wed, 5 Nov 2014 17:57:10 +, Julian Brown
 jul...@codesourcery.com wrote:
  This patch moves plugin-nvptx.c and plugin-host.c (from oacc-host.c)
  into a new plugin subdirectory, as requested by Jakub, and to
  match more closely the layout of the Intel MIC pieces. This also
  moves the autotools bits to enable the NVPTX plugin and locate CUDA
  libraries into the plugin directory's (new) configury bits.
 
 Hmm.  And then we cross-include files in libgomp/ from
 libgomp/plugin/ as well as the other way round (libgomp/oacc-host.c
 including libgomp/plugin/plugin-host.c, for example) -- whilst these
 two regimes are configured by two separate Autoconf instances?  Is
 this really the intended scheme, or should we maybe rather have a
 top-level libgomp Autoconf/Automake system (as before), which is
 amended by libgomp/plugin/configfrag.ac and
 libgomp/plugin/Makefrag.am files that are included from
 libgomp/configure.ac and libgomp/Makefile.am?

I don't know -- I was trying to follow existing practice (or how I
imagine that to be) with regard to recursive autotools invocations
(e.g. libjava/libltdl), and I have some FUD, probably misplaced, about
how well non-recursive autotools works.

A couple of the header files (oacc-plugin.h, libgomp-plugin.h) might be
better placed within the plugin directory, but plugins will generally
still need to include some headers direct from libgomp/. Maybe this
reorg is just a bad idea?

  Test results look reasonable with my (patched for PTX support)
  version of the gomp4 branch. I'll apply it there shortly.
 
 Mid-air collision with my yesterday's libgomp changes -- with your
 patch in (r217162), gomp-4_0-branch doesn't even build; the files
 added/moved to libgomp/plugins/ are missing some of my changes.  (I
 didn't look/compare in more detail.)

Apologies, I thought I'd fixed those up, but it looks like I missed a
bit.

  libgomp/
  * Makefile.am (SUBDIRS): Add plugin.
  (DIST_SUBDIRS): Define.
  (libgomp_plugin_nvptx_*): Remove nvptx support from here.
  (libgomp_plugin_host_nonshm_*): Likewise.
  * Makefile.in: Regenerate.
  * configure: Regenerate.
  * oacc-host.c: Replace with #include of plugin/plugin-host.c
  code, move implementation to the latter.
  * plugin/plugin-host.c: New file.
  * plugin-nvptx.c: Move to...
  * plugin/plugin-nvptx.c: New file.
  * plugin/Makefile.am: New.
  * plugin/Makefile.in: Regenerate.
  * plugin/aclocal.m4: Regenerate.
  * plugin/configure: Regenerate.
 
 Please check in the regenerated libgomp/config.h.in, update
 contrib/gcc_update, and make generation of
 libgomp/testsuite/libgomp-test-support.exp work again, that is,
 substitution of @CUDA_DRIVER_INCLUDE@ and @CUDA_DRIVER_LIB@ (perhaps
 move instantiation from libgomp/configure.ac to
 libgomp/plugin/configure.ac).

I'll fix this.

Thanks,

Julian


Re: [gomp4] Move libgomp plugins into subdirectory

2014-11-06 Thread Julian Brown
On Thu, 6 Nov 2014 11:11:42 +0100
Jakub Jelinek ja...@redhat.com wrote:

 On Thu, Nov 06, 2014 at 10:06:00AM +0100, Thomas Schwinge wrote:
  Hi Julian!
  
  On Wed, 5 Nov 2014 17:57:10 +, Julian Brown
  jul...@codesourcery.com wrote:
   This patch moves plugin-nvptx.c and plugin-host.c (from
   oacc-host.c) into a new plugin subdirectory, as requested by
   Jakub, and to match more closely the layout of the Intel MIC
   pieces. This also moves the autotools bits to enable the NVPTX
   plugin and locate CUDA libraries into the plugin directory's
   (new) configury bits.
  
  Hmm.  And then we cross-include files in libgomp/ from
  libgomp/plugin/ as well as the other way round (libgomp/oacc-host.c
  including libgomp/plugin/plugin-host.c, for example) -- whilst
  these two regimes are configured by two separate Autoconf
  instances?  Is this really the intended scheme, or should we maybe
  rather have a top-level libgomp Autoconf/Automake system (as
  before), which is amended by libgomp/plugin/configfrag.ac and
  libgomp/plugin/Makefrag.am files that are included from
  libgomp/configure.ac and libgomp/Makefile.am?

I'll apply the attached fixes for now in case anyone's blocked on the
broken libgomp build, and then...

 I agree a plugin fragment into libgomp/configure.ac and/or
 libgomp/Makefile* is better.

work on refactoring those configury bits (which will revert some of the
attached, including moving libgomp-test-support.exp.in back to its
previous location, but never mind).

Thanks,

Julian

ChangeLog

* contrib/gcc_update (libgomp/plugin/aclocal.m4)
(libgomp/plugin/Makefile.in, libgomp/plugin/configure)
(libgomp/plugin/config.h.in): Add.

libgomp/
* oacc-init.c (resolve_device, _acc_init): Fix init_device_func
hook naming.
* plugin/plugin-host.c (GOMP_OFFLOAD_openacc_avail): Remove.
(host_dispatch): Don't set avail_func hook.
* plugin/configure.ac (libgomp-test-support.exp): Add to
AC_CONFIG_FILES.
* plugin/configure: Regenerate.
* testsuite/libgomp-test-support.exp.in: Move from here...
* plugin/libgomp-test-support.exp.in: ...to here.
* plugin/Makefile.in: Regenerate.
* testsuite/lib/libgomp.exp (libgomp-test-support.exp): Find in
plugin dir, for now.
* testsuite/Makefile.in: Regenerate.
* configure.ac (testsuite/libgomp-test-support.exp): Remove from
AC_CONFIG_FILES.
* config.h.in: Regenerate.
* configure: Regenerate.
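
For readers following the oacc-init.c hunk below, the device scan it touches can be sketched roughly as follows. This is a simplified illustration, not the libgomp code: struct dispatcher, find_device and the sample table are hypothetical names, and the only behaviour taken from the patch is that a candidate is accepted when its entry is non-NULL, its name matches case-insensitively, and its init_device_func hook returns a value greater than zero.

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical, cut-down dispatcher entry.  */
struct dispatcher
{
  const char *name;
  int (*init_device_func) (void);  /* Returns the number of usable devices.  */
};

/* Return the index of the first entry in DISPATCHERS[0..N-1] whose name
   matches TYPE (case-insensitively) and which reports at least one device,
   or -1 if there is none.  A NULL TYPE matches any name.  */
static int
find_device (const struct dispatcher *const *dispatchers, size_t n,
             const char *type)
{
  for (size_t d = 0; d < n; d++)
    if (dispatchers[d]
        && (type == NULL || !strcasecmp (type, dispatchers[d]->name))
        && dispatchers[d]->init_device_func () > 0)
      return (int) d;
  return -1;
}

/* Sample data, for illustration only.  */
static int no_devices (void) { return 0; }
static int one_device (void) { return 1; }

static const struct dispatcher nvptx_absent = { "nvptx", no_devices };
static const struct dispatcher host_nonshm = { "host_nonshm", one_device };

/* An empty slot, a plugin with no hardware present, and a usable plugin.  */
static const struct dispatcher *const sample[] =
  { NULL, &nvptx_absent, &host_nonshm };
```

With this table, a request for "nvptx" finds nothing because the hook reports zero devices, while a request with no particular type falls through to the host_nonshm entry.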
Index: libgomp/oacc-init.c
===
--- libgomp/oacc-init.c	(revision 217192)
+++ libgomp/oacc-init.c	(working copy)
@@ -97,7 +97,7 @@ resolve_device (acc_device_t d)
 	while (++d != _ACC_device_hwm)
 	  if (dispatchers[d]
 		&& !strcasecmp (goacc_device_type, dispatchers[d]->name)
-		&& dispatchers[d]->device_init_func () > 0)
+		&& dispatchers[d]->init_device_func () > 0)
 		goto found;
 
 	gomp_fatal ("device type %s not supported", goacc_device_type);
@@ -112,7 +112,7 @@ resolve_device (acc_device_t d)
 case acc_device_not_host:
   /* Find the first available device after acc_device_not_host.  */
   while (++d != _ACC_device_hwm)
-	if (dispatchers[d] && dispatchers[d]->device_init_func () > 0)
+	if (dispatchers[d] && dispatchers[d]->init_device_func () > 0)
 	  goto found;
   if (d_arg == acc_device_default)
 	{	  
@@ -140,7 +140,7 @@ resolve_device (acc_device_t d)
 }
 
 /* This is called when plugins have been initialized, and serves to call
-   (indirectly) the target's device_init hook.  Calling multiple times without
+   (indirectly) the target's init_device hook.  Calling multiple times without
an intervening _acc_shutdown call is an error.  */
 
 static struct gomp_device_descr const *
@@ -150,7 +150,7 @@ _acc_init (acc_device_t d)
 
   acc_dev = resolve_device (d);
 
-  if (!acc_dev || acc_dev->device_init_func () <= 0)
+  if (!acc_dev || acc_dev->init_device_func () <= 0)
     gomp_fatal ("device %u not supported", (unsigned)d);
 
   if (acc_dev->is_initialized)
Index: libgomp/plugin/plugin-host.c
===
--- libgomp/plugin/plugin-host.c	(revision 217192)
+++ libgomp/plugin/plugin-host.c	(working copy)
@@ -153,16 +153,6 @@ GOMP_OFFLOAD_get_table (struct mapping_t
   return 0;
 }
 
-STATIC bool
-GOMP_OFFLOAD_openacc_avail (void)
-{
-#ifdef DEBUG
-  fprintf (stderr, SELF "%s:%s\n", __FILE__, __FUNCTION__);
-#endif
-
-  return 1;
-}
-
 STATIC void *
 GOMP_OFFLOAD_openacc_open_device (int n)
 {
@@ -415,9 +405,6 @@ static struct gomp_device_descr host_dis
   .get_device_num_func = GOMP_OFFLOAD_openacc_get_device_num,
   .set_device_num_func = GOMP_OFFLOAD_openacc_set_device_num,
 
-  /* Device available.  */
-  .avail_func = GOMP_OFFLOAD_openacc_avail,
-
   .exec_func = GOMP_OFFLOAD_openacc_parallel,
 
   .register_async_cleanup_func
Index: libgomp/plugin/configure.ac

Re: [gomp4] Move libgomp plugins into subdirectory

2014-11-06 Thread Julian Brown
On Thu, 6 Nov 2014 15:37:42 +
Julian Brown jul...@codesourcery.com wrote:

 On Thu, 6 Nov 2014 11:11:42 +0100
 Jakub Jelinek ja...@redhat.com wrote:
 
  On Thu, Nov 06, 2014 at 10:06:00AM +0100, Thomas Schwinge wrote:
   Hmm.  And then we cross-include files in libgomp/ from
   libgomp/plugin/ as well as the other way round
   (libgomp/oacc-host.c including libgomp/plugin/plugin-host.c, for
   example) -- whilst these two regimes are configured by two
   separate Autoconf instances?  Is this really the intended scheme,
   or should we maybe rather have a top-level libgomp
   Autoconf/Automake system (as before), which is amended by
   libgomp/plugin/configfrag.ac and libgomp/plugin/Makefrag.am files
   that are included from libgomp/configure.ac and
   libgomp/Makefile.am?

  I agree a plugin fragment into libgomp/configure.ac and/or
  libgomp/Makefile* is better.
 
 [...] work on refactoring those configury bits (which will revert some
 of the attached, including moving libgomp-test-support.exp.in back to
 its previous location, but never mind).

Does this look like what you had in mind? (I think liboffloadmic uses a
similar recursive autotools invocation for its libgomp plugin -- maybe
that wants refactoring too?).

Thanks,

Julian

ChangeLog

* contrib/gcc_update (libgomp/aclocal.m4, libgomp/Makefile.in)
(libgomp/configure, libgomp/config.h.in): Add depends for plugin
config fragments.
(libgomp/plugin/aclocal.m4, libgomp/plugin/Makefile.in)
(libgomp/plugin/configure, libgomp/plugin/config.h.in): Remove.

libgomp/
* Makefile.am (SUBDIRS): Remove plugin subdir.
(DIST_SUBDIRS): Delete.
(search_path): Add $(top_srcdir)/../include.
(AM_CPPFLAGS): Remove -I$(top_srcdir)/../include.
(plugin/Makefrag.am): Include.
* Makefile.in: Regenerate.
* configure.ac (plugin): Remove from AC_CONFIG_SUBDIRS.
(plugin/configfrag.ac): Include.
(testsuite/libgomp-test-support.exp): Add to AC_CONFIG_FILES.
* configure: Regenerate.
* plugin/Makefile.am: Remove, refactor into...
* plugin/Makefrag.am: ...this. New.
* plugin/aclocal.m4: Remove.
* plugin/config.h.in: Remove.
* plugin/configure: Remove.
* plugin/configure.ac: Remove, refactor into...
* plugin/configfrag.ac: ...this. New.
* plugin/libgomp-test-support.exp.in: Move back to...
* testsuite/libgomp-test-support.exp.in: Here.
* testsuite/lib/libgomp.exp (libgomp-test-support.exp): Include
from current directory, not plugin dir.

commit ea1335fc5a4aed75ad0f299969520f10e2f27435
Author: Julian Brown jul...@codesourcery.com
Date:   Thu Nov 6 11:54:25 2014 -0800

Don't use recursive autoconf/automake for libgomp plugins

diff --git a/contrib/gcc_update b/contrib/gcc_update
index a50dc8c..2903d7a 100755
--- a/contrib/gcc_update
+++ b/contrib/gcc_update
@@ -138,15 +138,11 @@ libjava/libltdl/config-h.in: libjava/libltdl/configure.ac libjava/libltdl/acloca
 libcpp/aclocal.m4: libcpp/configure.ac
 libcpp/Makefile.in: libcpp/configure.ac libcpp/aclocal.m4
 libcpp/configure: libcpp/configure.ac libcpp/aclocal.m4
-libgomp/aclocal.m4: libgomp/configure.ac libgomp/acinclude.m4
-libgomp/Makefile.in: libgomp/Makefile.am libgomp/aclocal.m4
+libgomp/aclocal.m4: libgomp/configure.ac libgomp/acinclude.m4 libgomp/plugin/configfrag.ac
+libgomp/Makefile.in: libgomp/Makefile.am libgomp/aclocal.m4 libgomp/plugin/Makefrag.am
 libgomp/testsuite/Makefile.in: libgomp/testsuite/Makefile.am libgomp/aclocal.m4
-libgomp/configure: libgomp/configure.ac libgomp/aclocal.m4
-libgomp/config.h.in: libgomp/configure.ac libgomp/aclocal.m4
-libgomp/plugin/aclocal.m4: libgomp/plugin/configure.ac
-libgomp/plugin/Makefile.in: libgomp/plugin/Makefile.am libgomp/plugin/aclocal.m4
-libgomp/plugin/configure: libgomp/plugin/configure.ac libgomp/plugin/aclocal.m4
-libgomp/plugin/config.h.in: libgomp/plugin/configure.ac libgomp/plugin/aclocal.m4
+libgomp/configure: libgomp/configure.ac libgomp/aclocal.m4 libgomp/plugin/configfrag.ac
+libgomp/config.h.in: libgomp/configure.ac libgomp/aclocal.m4 libgomp/plugin/configfrag.ac
 libitm/aclocal.m4: libitm/configure.ac libitm/acinclude.m4
 libitm/Makefile.in: libitm/Makefile.am libitm/aclocal.m4
 libitm/testsuite/Makefile.in: libitm/testsuite/Makefile.am libitm/aclocal.m4
diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index f265c5d..dc2f88a 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -1,21 +1,21 @@
 ## Process this file with automake to produce Makefile.in
 
 ACLOCAL_AMFLAGS = -I .. -I ../config
-SUBDIRS = testsuite plugin
-DIST_SUBDIRS = plugin
+SUBDIRS = testsuite
 
 ## May be used by toolexeclibdir.
 gcc_version := $(shell cat $(top_srcdir)/../gcc/BASE-VER)
 
 config_path = @config_path@
-search_path = $(addprefix $(top_srcdir)/config/, $(config_path)) $(top_srcdir)
+search_path = $(addprefix $(top_srcdir)/config/, $(config_path)) $(top_srcdir) \
+	  $(top_srcdir)/../include
 
 fincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version

[gomp4] Use GOMP_OFFLOAD_ prefix for (OpenACC) plugin hooks

2014-11-05 Thread Julian Brown
Hi,

Mirroring changes in Ilya Verbin's libgomp offloading pieces posted to
trunk, this patch adds a prefix of GOMP_OFFLOAD_ to the OpenACC plugin
hooks. Some of these bits will not be needed for a trunk version of the
patch once Ilya's patch is approved (I'm hoping other
incompatibilities haven't crept in other than the renaming!).

I will apply to the gomp4 branch shortly.

Thanks,

Julian

ChangeLog

libgomp/
* oacc-host.c: Add GOMP_OFFLOAD_ prefix for plugin hooks. Rename
device_init to init_device, device_fini to fini_device,
offload_register to register_image and remove extraneous device_
from device_alloc, device_free, device_dev2host, device_host2dev and
device_run.
(host_dispatch): Use new names for hooks.
* oacc-init.c: Use new names for hooks, throughout.
* plugin-nvptx.c: Likewise.
* target.c: Likewise.
(gomp_load_plugin_for_device): Likewise. Look for new hook names.
* target.h (gomp_device_descr): Use new hook names.
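
As an illustration of the naming scheme in the ChangeLog above, here is a minimal sketch of a dispatch table whose fields are filled in from GOMP_OFFLOAD_-prefixed functions. It is not the libgomp code: struct device_descr is a hypothetical, cut-down stand-in for gomp_device_descr, the HOOK macro is an assumption made for brevity, and the return values of get_num_devices and fini_device are invented for the example. Only the hook names, and the fact that init_device returns get_num_devices (), come from the patch.

```c
/* Hypothetical, cut-down device descriptor; the real gomp_device_descr
   has many more hooks.  */
struct device_descr
{
  const char *(*get_name_func) (void);
  int (*get_num_devices_func) (void);
  int (*init_device_func) (void);
  int (*fini_device_func) (void);
};

/* Host "plugin" hooks, all carrying the common GOMP_OFFLOAD_ prefix.  */
static const char *
GOMP_OFFLOAD_get_name (void)
{
  return "host";
}

static int
GOMP_OFFLOAD_get_num_devices (void)
{
  return 1;  /* Assumed value, for illustration.  */
}

static int
GOMP_OFFLOAD_init_device (void)
{
  return GOMP_OFFLOAD_get_num_devices ();
}

static int
GOMP_OFFLOAD_fini_device (void)
{
  return 0;  /* Assumed value, for illustration.  */
}

/* Initialise descriptor field NAME_func from GOMP_OFFLOAD_NAME.  */
#define HOOK(name) .name##_func = GOMP_OFFLOAD_##name

static const struct device_descr host_dispatch =
{
  HOOK (get_name),
  HOOK (get_num_devices),
  HOOK (init_device),
  HOOK (fini_device),
};
```

Keeping the field name and the function suffix identical means a loader that looks hooks up by name only needs to prepend the prefix to the field name.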
commit 4e1b71a5e0d15de4c6e89ab5139964e32b563d68
Author: Julian Brown jul...@codesourcery.com
Date:   Wed Nov 5 02:34:22 2014 -0800

Use GOMP_OFFLOAD_ prefix for plugin hooks.

diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index fc3e77c..02794bb 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -60,7 +60,7 @@ static struct gomp_device_descr host_dispatch;
 #endif
 
 STATIC const char *
-get_name (void)
+GOMP_OFFLOAD_get_name (void)
 {
 #ifdef DEBUG
   fprintf (stderr, SELF "%s:%s\n", __FILE__, __FUNCTION__);
@@ -74,7 +74,7 @@ get_name (void)
 }
 
 STATIC int
-get_type (void)
+GOMP_OFFLOAD_get_type (void)
 {
 #ifdef DEBUG
   fprintf (stderr, SELF "%s:%s\n", __FILE__, __FUNCTION__);
@@ -88,7 +88,7 @@ get_type (void)
 }
 
 STATIC unsigned int
-get_caps (void)
+GOMP_OFFLOAD_get_caps (void)
 {
   unsigned int caps = TARGET_CAP_OPENACC_200 | TARGET_CAP_OPENMP_400
 		  | TARGET_CAP_NATIVE_EXEC;
@@ -105,7 +105,7 @@ get_caps (void)
 }
 
 STATIC int
-get_num_devices (void)
+GOMP_OFFLOAD_get_num_devices (void)
 {
 #ifdef DEBUG
   fprintf (stderr, SELF "%s:%s\n", __FILE__, __FUNCTION__);
@@ -115,7 +115,7 @@ get_num_devices (void)
 }
 
 STATIC void
-offload_register (void *host_table, void *target_data)
+GOMP_OFFLOAD_register_image (void *host_table, void *target_data)
 {
 #ifdef DEBUG
   fprintf (stderr, SELF "%s:%s (%p, %p)\n", __FILE__, __FUNCTION__, host_table,
@@ -124,17 +124,17 @@ offload_register (void *host_table, void *target_data)
 }
 
 STATIC int
-device_init (void)
+GOMP_OFFLOAD_init_device (void)
 {
 #ifdef DEBUG
   fprintf (stderr, SELF "%s:%s\n", __FILE__, __FUNCTION__);
 #endif
 
-  return get_num_devices ();
+  return GOMP_OFFLOAD_get_num_devices ();
 }
 
 STATIC int
-device_fini (void)
+GOMP_OFFLOAD_fini_device (void)
 {
 #ifdef DEBUG
   fprintf (stderr, SELF "%s:%s\n", __FILE__, __FUNCTION__);
@@ -144,7 +144,7 @@ device_fini (void)
 }
 
 STATIC int
-device_get_table (struct mapping_table **table)
+GOMP_OFFLOAD_get_table (struct mapping_table **table)
 {
 #ifdef DEBUG
   fprintf (stderr, SELF "%s:%s (%p)\n", __FILE__, __FUNCTION__, table);
@@ -154,7 +154,7 @@ device_get_table (struct mapping_table **table)
 }
 
 STATIC bool
-openacc_avail (void)
+GOMP_OFFLOAD_openacc_avail (void)
 {
 #ifdef DEBUG
   fprintf (stderr, SELF "%s:%s\n", __FILE__, __FUNCTION__);
@@ -164,7 +164,7 @@ openacc_avail (void)
 }
 
 STATIC void *
-openacc_open_device (int n)
+GOMP_OFFLOAD_openacc_open_device (int n)
 {
 #ifdef DEBUG
   fprintf (stderr, SELF "%s:%s (%u)\n", __FILE__, __FUNCTION__, n);
@@ -174,7 +174,7 @@ openacc_open_device (int n)
 }
 
 STATIC int
-openacc_close_device (void *hnd)
+GOMP_OFFLOAD_openacc_close_device (void *hnd)
 {
 #ifdef DEBUG
   fprintf (stderr, SELF "%s:%s (%p)\n", __FILE__, __FUNCTION__, hnd);
@@ -184,7 +184,7 @@ openacc_close_device (void *hnd)
 }
 
 STATIC int
-openacc_get_device_num (void)
+GOMP_OFFLOAD_openacc_get_device_num (void)
 {
 #ifdef DEBUG
   fprintf (stderr, SELF "%s:%s\n", __FILE__, __FUNCTION__);
@@ -194,7 +194,7 @@ openacc_get_device_num (void)
 }
 
 STATIC void
-openacc_set_device_num (int n)
+GOMP_OFFLOAD_openacc_set_device_num (int n)
 {
 #ifdef DEBUG
   fprintf (stderr, SELF "%s:%s (%u)\n", __FILE__, __FUNCTION__, n);
@@ -205,7 +205,7 @@ openacc_set_device_num (int n)
 }
 
 STATIC void *
-device_alloc (size_t s)
+GOMP_OFFLOAD_alloc (size_t s)
 {
   void *ptr = GOMP(malloc) (s);
 
@@ -217,7 +217,7 @@ device_alloc (size_t s)
 }
 
 STATIC void
-device_free (void *p)
+GOMP_OFFLOAD_free (void *p)
 {
 #ifdef DEBUG
   fprintf (stderr, SELF "%s:%s (%p)\n", __FILE__, __FUNCTION__, p);
@@ -227,7 +227,7 @@ device_free (void *p)
 }
 
 STATIC void *
-device_host2dev (void *d, const void *h, size_t s)
+GOMP_OFFLOAD_host2dev (void *d, const void *h, size_t s)
 {
 #ifdef DEBUG
   fprintf (stderr, SELF "%s:%s (%p, %p, %zd)\n", __FILE__, __FUNCTION__, d, h,
@@ -242,7 +242,7 @@ device_host2dev (void *d, const void *h, size_t s)
 }
 
 STATIC void *
-device_dev2host (void *h, const void *d, size_t s

[gomp4] Move libgomp plugins into subdirectory

2014-11-05 Thread Julian Brown
Hi,

This patch moves plugin-nvptx.c and plugin-host.c (from oacc-host.c)
into a new plugin subdirectory, as requested by Jakub, and to match
more closely the layout of the Intel MIC pieces. This also moves the
autotools bits to enable the NVPTX plugin and locate CUDA libraries
into the plugin directory's (new) configury bits.

So far this only changes the location of the source files: the plugins
themselves are still installed to the same place as before (alongside
libgomp itself).

Test results look reasonable with my (patched for PTX support) version
of the gomp4 branch. I'll apply it there shortly.

Thanks,

Julian

ChangeLog

libgomp/
* Makefile.am (SUBDIRS): Add plugin.
(DIST_SUBDIRS): Define.
(libgomp_plugin_nvptx_*): Remove nvptx support from here.
(libgomp_plugin_host_nonshm_*): Likewise.
* Makefile.in: Regenerate.
* configure: Regenerate.
* oacc-host.c: Replace with #include of plugin/plugin-host.c code,
move implementation to the latter.
* plugin/plugin-host.c: New file.
* plugin-nvptx.c: Move to...
* plugin/plugin-nvptx.c: New file.
* plugin/Makefile.am: New.
* plugin/Makefile.in: Regenerate.
* plugin/aclocal.m4: Regenerate.
* plugin/configure: Regenerate.
commit 8994fb8c1b9d52cb9c82a61227a450df29e61806
Author: Julian Brown jul...@codesourcery.com
Date:   Wed Nov 5 02:54:30 2014 -0800

Move libgomp plugins into their own directory.

diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index e0ab763..f265c5d 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -1,7 +1,8 @@
 ## Process this file with automake to produce Makefile.in
 
 ACLOCAL_AMFLAGS = -I .. -I ../config
-SUBDIRS = testsuite
+SUBDIRS = testsuite plugin
+DIST_SUBDIRS = plugin
 
 ## May be used by toolexeclibdir.
 gcc_version := $(shell cat $(top_srcdir)/../gcc/BASE-VER)
@@ -21,27 +22,6 @@ AM_LDFLAGS = $(XLDFLAGS) $(SECTION_LDFLAGS) $(OPT_LDFLAGS)
 toolexeclib_LTLIBRARIES = libgomp.la
 nodist_toolexeclib_HEADERS = libgomp.spec
 
-if PLUGIN_NVPTX
-# Nvidia PTX OpenACC plugin.
-libgomp_plugin_nvptx_version_info = -version-info $(libtool_VERSION)
-toolexeclib_LTLIBRARIES += libgomp-plugin-nvptx.la
-libgomp_plugin_nvptx_la_SOURCES = plugin-nvptx.c
-libgomp_plugin_nvptx_la_CPPFLAGS = $(AM_CPPFLAGS) $(PLUGIN_NVPTX_CPPFLAGS)
-libgomp_plugin_nvptx_la_LDFLAGS = $(libgomp_plugin_nvptx_version_info) \
-	$(lt_host_flags)
-libgomp_plugin_nvptx_la_LDFLAGS += $(PLUGIN_NVPTX_LDFLAGS)
-libgomp_plugin_nvptx_la_LIBADD = $(PLUGIN_NVPTX_LIBS)
-libgomp_plugin_nvptx_la_LIBTOOLFLAGS = --tag=disable-static
-endif
-
-libgomp_plugin_host_nonshm_version_info = -version-info $(libtool_VERSION)
-toolexeclib_LTLIBRARIES += libgomp-plugin-host_nonshm.la
-libgomp_plugin_host_nonshm_la_SOURCES = oacc-host.c
-libgomp_plugin_host_nonshm_la_CPPFLAGS = $(AM_CPPFLAGS) -DHOST_NONSHM_PLUGIN
-libgomp_plugin_host_nonshm_la_LDFLAGS = \
-	$(libgomp_plugin_host_nonshm_version_info) $(lt_host_flags)
-libgomp_plugin_host_nonshm_la_LIBTOOLFLAGS = --tag=disable-static
-
 if LIBGOMP_BUILD_VERSIONED_SHLIB
 # -Wc is only a libtool option.
 comma = ,
diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index d12376e..ea3e1ca 100644
diff --git a/libgomp/configure b/libgomp/configure
index 7daccd9..11a7ae0 100755
diff --git a/libgomp/configure.ac b/libgomp/configure.ac
index 89c6b31..e883945 100644
--- a/libgomp/configure.ac
+++ b/libgomp/configure.ac
@@ -30,42 +30,6 @@ LIBGOMP_ENABLE(generated-files-in-srcdir, no, ,
 AC_MSG_RESULT($enable_generated_files_in_srcdir)
 AM_CONDITIONAL(GENINSRC, test $enable_generated_files_in_srcdir = yes)
 
-# Look for the CUDA driver package.
-CUDA_DRIVER_INCLUDE=
-CUDA_DRIVER_LIB=
-AC_SUBST(CUDA_DRIVER_INCLUDE)
-AC_SUBST(CUDA_DRIVER_LIB)
-CUDA_DRIVER_CPPFLAGS=
-CUDA_DRIVER_LDFLAGS=
-AC_ARG_WITH(cuda-driver,
-	[AS_HELP_STRING([--with-cuda-driver=PATH],
-		[specify prefix directory for installed CUDA driver package.
-		 Equivalent to --with-cuda-driver-include=PATH/include
-		 plus --with-cuda-driver-lib=PATH/lib])])
-AC_ARG_WITH(cuda-driver-include,
-	[AS_HELP_STRING([--with-cuda-driver-include=PATH],
-		[specify directory for installed CUDA driver include files])])
-AC_ARG_WITH(cuda-driver-lib,
-	[AS_HELP_STRING([--with-cuda-driver-lib=PATH],
-		[specify directory for the installed CUDA driver library])])
-if test x$with_cuda_driver != x; then
-  CUDA_DRIVER_INCLUDE=$with_cuda_driver/include
-  CUDA_DRIVER_LIB=$with_cuda_driver/lib
-fi
-if test x$with_cuda_driver_include != x; then
-  CUDA_DRIVER_INCLUDE=$with_cuda_driver_include
-fi
-if test x$with_cuda_driver_lib != x; then
-  CUDA_DRIVER_LIB=$with_cuda_driver_lib
-fi
-if test x$CUDA_DRIVER_INCLUDE != x; then
-  CUDA_DRIVER_CPPFLAGS=-I$CUDA_DRIVER_INCLUDE
-fi
-if test x$CUDA_DRIVER_LIB != x; then
-  CUDA_DRIVER_LDFLAGS=-L$CUDA_DRIVER_LIB
-fi
-
-
 # ---
 # ---
 
@@ -241,52 +205,7 @@ elif test x$enable_accelerator != xno; then
   AC_MSG_ERROR([Can't have support for accelerators without support for plugins])
 fi

Re: [gomp4] Use GOMP_OFFLOAD_ prefix for (OpenACC) plugin hooks

2014-11-05 Thread Julian Brown
On Wed, 5 Nov 2014 22:02:33 +0300
Ilya Verbin iver...@gmail.com wrote:

 Hi,
 
 On 05 Nov 17:56, Julian Brown wrote:
  +GOMP_OFFLOAD_register_image (void *host_table, void *target_data)
  +GOMP_OFFLOAD_get_table (struct mapping_table **table)
 
 FYI, these interfaces may change in the near future.
 Currently GOMP_OFFLOAD_get_table returns a joint table for all
 images, offloaded to a device.  But this doesn't work properly with
 offloading from dlopened libs. Do you plan to support such cases for
 PTX? Perhaps it's worth replacing them with a function like
 GOMP_OFFLOAD_load_image, which will offload one image, and return a
 target table for this image. In this case there is no need to pass
 host_table to the plugin, and return a joint table, since libgomp
 will join host and target tables itself.

I made some changes to table initialisation on the gomp4 branch also --
probably not enough to genuinely support multiple devices, but
hopefully some of the way there. Have you seen those? I haven't
considered dlopened libs though.
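
To make the suggestion above concrete, here is a rough sketch of what "libgomp joins the host and target tables itself" could look like once a plugin returns only a per-image target table. Everything here is hypothetical: struct addr_pair, struct mapping, join_tables and host_to_target are invented for illustration, and the sketch assumes that both tables list the same symbols in the same order, which the thread does not state.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address range of one offloaded function or variable.  */
struct addr_pair
{
  uintptr_t start;
  uintptr_t end;   /* One past the last byte.  */
};

/* Hypothetical joined entry: a host range and the matching device range.  */
struct mapping
{
  struct addr_pair host;
  struct addr_pair target;
};

/* Join the first N entries of HOST and TARGET index-by-index into OUT,
   assuming both tables list the same symbols in the same order.
   Returns the number of entries written.  */
static size_t
join_tables (const struct addr_pair *host, const struct addr_pair *target,
             size_t n, struct mapping *out)
{
  for (size_t i = 0; i < n; i++)
    {
      out[i].host = host[i];
      out[i].target = target[i];
    }
  return n;
}

/* Translate host address ADDR to a device address using N joined entries.
   Returns 0 if ADDR is not covered by any entry.  */
static uintptr_t
host_to_target (const struct mapping *map, size_t n, uintptr_t addr)
{
  for (size_t i = 0; i < n; i++)
    if (addr >= map[i].host.start && addr < map[i].host.end)
      return map[i].target.start + (addr - map[i].host.start);
  return 0;
}
```

Because the join happens per image, a table belonging to a dlopened library could in principle be added or dropped without rebuilding a single joint table for the whole device.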

 Another question is what to do with multiple devices of same type.
 Can they have different images?  There are 2 options:
 1. GOMP_OFFLOAD_load_image will offload one image to one device and
 receive a table from it.
 or
 2. GOMP_OFFLOAD_register_image will register one image in the plugin
 for all devices of same type, and
 GOMP_OFFLOAD_get_table will return a table for one image and for one
 device.

Similarly, I added (partial, in the case of OpenMP) support for
multiple devices of the same type on the gomp4 branch.

Thanks,

Julian


Re: [gomp4] Rationalise thread-local variables in libgomp OpenACC support

2014-10-29 Thread Julian Brown
On Tue, 28 Oct 2014 11:16:19 +
Julian Brown jul...@codesourcery.com wrote:

 Hi,
 
 This patch rationalises TLS support by moving all thread-local
 variables into a single structure. Because this meant interfering with
 how per-thread/per-device initialisation was done, I took the
 opportunity to tidy up a couple of other bits along the way.
 Highlights are:

Here's a slightly-updated version of the patch, adjusted for Thomas's
removal of the queue.h list-handling functions. ChangeLog as before.
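
The general shape of "all thread-local variables in a single structure" can be sketched as below. This is an illustration only, not the code in the patch: struct acc_thread, its fields, and the acc_thread and acc_thread_free helpers are hypothetical names, and the lazy allocation on first use is an assumption about how such a structure might be set up.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-thread OpenACC state, gathered into one structure
   instead of several separate thread-local variables.  */
struct acc_thread
{
  int device_num;     /* Device selected by this thread.  */
  void *target_data;  /* Plugin-specific per-thread data.  */
  void *mapped_data;  /* Head of this thread's data environment.  */
};

/* The only thread-local variable: a pointer to the structure above.  */
static _Thread_local struct acc_thread *acc_tls;

/* Return the calling thread's state, allocating it zero-filled on first
   use.  Returns NULL if allocation fails.  */
static struct acc_thread *
acc_thread (void)
{
  if (!acc_tls)
    acc_tls = calloc (1, sizeof *acc_tls);
  return acc_tls;
}

/* Release the calling thread's state, e.g. at thread exit.  */
static void
acc_thread_free (void)
{
  free (acc_tls);
  acc_tls = NULL;
}
```

With one pointer per thread, per-thread and per-device initialisation has a single place to hang its state, and a plugin can be handed that pointer through one accessor rather than a set of exported variables.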

Thanks,

Julian

commit ab4e9ff7a52e43418d6d2fc5b5e76e0065e130d5
Author: Julian Brown jul...@codesourcery.com
Date:   Mon Oct 27 08:43:07 2014 -0700

TLS rework

diff --git a/libgomp/env.c b/libgomp/env.c
index 32fb92c..8b22e6f 100644
--- a/libgomp/env.c
+++ b/libgomp/env.c
@@ -28,6 +28,7 @@
 #include "libgomp.h"
 #include "libgomp_f.h"
 #include "target.h"
+#include "oacc-int.h"
 #include <ctype.h>
 #include <stdlib.h>
 #include <stdio.h>
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index e31573c..1496437 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -50,8 +50,4 @@ extern void GOMP_PLUGIN_mutex_destroy (gomp_mutex_t *mutex);
 extern void GOMP_PLUGIN_mutex_lock (gomp_mutex_t *mutex);
 extern void GOMP_PLUGIN_mutex_unlock (gomp_mutex_t *mutex);
 
-/* target.c */
-
-extern void GOMP_PLUGIN_async_unmap_vars (void *ptr);
-
 #endif
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index 538aabb..c6a88a2 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -337,4 +337,5 @@ PLUGIN_1.0 {
 	GOMP_PLUGIN_mutex_lock;
 	GOMP_PLUGIN_mutex_unlock;
 	GOMP_PLUGIN_async_unmap_vars;
+	GOMP_PLUGIN_acc_thread;
 };
diff --git a/libgomp/oacc-async.c b/libgomp/oacc-async.c
index 08b6b95..dddfe05 100644
--- a/libgomp/oacc-async.c
+++ b/libgomp/oacc-async.c
@@ -29,6 +29,7 @@
 #include "openacc.h"
 #include "libgomp.h"
 #include "target.h"
+#include "oacc-int.h"
 
 int
 acc_async_test (int async)
@@ -36,13 +37,13 @@ acc_async_test (int async)
   if (async < acc_async_sync)
     gomp_fatal ("invalid async argument: %d", async);
 
-  return ACC_dev->openacc.async_test_func (async);
+  return base_dev->openacc.async_test_func (async);
 }
 
 int
 acc_async_test_all (void)
 {
-  return ACC_dev->openacc.async_test_all_func ();
+  return base_dev->openacc.async_test_all_func ();
 }
 
 void
@@ -51,22 +52,19 @@ acc_wait (int async)
   if (async < acc_async_sync)
     gomp_fatal ("invalid async argument: %d", async);
 
-  ACC_dev->openacc.async_wait_func (async);
-  return;
+  base_dev->openacc.async_wait_func (async);
 }
 
 void
 acc_wait_async (int async1, int async2)
 {
-  ACC_dev->openacc.async_wait_async_func (async1, async2);
-  return;
+  base_dev->openacc.async_wait_async_func (async1, async2);
 }
 
 void
 acc_wait_all (void)
 {
-  ACC_dev->openacc.async_wait_all_func ();
-  return;
+  base_dev->openacc.async_wait_all_func ();
 }
 
 void
@@ -75,6 +73,5 @@ acc_wait_all_async (int async)
   if (async < acc_async_sync)
     gomp_fatal ("invalid async argument: %d", async);
 
-  ACC_dev->openacc.async_wait_all_async_func (async);
-  return;
+  base_dev->openacc.async_wait_all_async_func (async);
 }
diff --git a/libgomp/oacc-cuda.c b/libgomp/oacc-cuda.c
index f587325..3daf5b1 100644
--- a/libgomp/oacc-cuda.c
+++ b/libgomp/oacc-cuda.c
@@ -29,14 +29,15 @@
 #include "config.h"
 #include "libgomp.h"
 #include "target.h"
+#include "oacc-int.h"
 
 void *
 acc_get_current_cuda_device (void)
 {
   void *p = NULL;
 
-  if (ACC_dev && ACC_dev->openacc.cuda.get_current_device_func)
-    p = ACC_dev->openacc.cuda.get_current_device_func ();
+  if (base_dev && base_dev->openacc.cuda.get_current_device_func)
+    p = base_dev->openacc.cuda.get_current_device_func ();
 
   return p;
 }
@@ -46,8 +47,8 @@ acc_get_current_cuda_context (void)
 {
   void *p = NULL;
 
-  if (ACC_dev  ACC_dev-openacc.cuda.get_current_context_func)
-p = ACC_dev-openacc.cuda.get_current_context_func ();
+  if (base_dev  base_dev-openacc.cuda.get_current_context_func)
+p = base_dev-openacc.cuda.get_current_context_func ();
 
   return p;
 }
@@ -60,8 +61,8 @@ acc_get_cuda_stream (int async)
   if (async  0)
 return p;
 
-  if (ACC_dev  ACC_dev-openacc.cuda.get_stream_func)
-p = ACC_dev-openacc.cuda.get_stream_func (async);
+  if (base_dev  base_dev-openacc.cuda.get_stream_func)
+p = base_dev-openacc.cuda.get_stream_func (async);
 
   return p;
 }
@@ -73,9 +74,11 @@ acc_set_cuda_stream (int async, void *stream)
 
   if (async  0 || stream == NULL)
 return 0;
+  
+  ACC_lazy_initialize ();
 
-  if (ACC_dev  ACC_dev-openacc.cuda.set_stream_func)
-s = ACC_dev-openacc.cuda.set_stream_func (async, stream);
+  if (base_dev  base_dev-openacc.cuda.set_stream_func)
+s = base_dev-openacc.cuda.set_stream_func (async, stream);
 
   return s;
 }
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index f44ca5e..6fe8f6c 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -35,6 +35,9 @@
 #include "target.h"
 #ifdef HOST_NONSHM_PLUGIN
 #include libgomp

[gomp4] Rationalise thread-local variables in libgomp OpenACC support

2014-10-28 Thread Julian Brown
 memmap_t argument to struct
gomp_memory_mapping.
(lookup_dev): Change memmap_t argument to struct target_mem_desc.
Use list_count not refcount for iterating over mapped elements.
(acc_malloc): Use base_dev not ACC_dev.
(acc_free): Update call to lookup_dev. Use base_dev not ACC_dev.
(acc_memcpy_to_device, acc_memcpy_from_device): Use base_dev not
ACC_dev.
(acc_deviceptr, acc_is_present): Update call to lookup_host.
(acc_hostptr): Update call to lookup_dev.
(acc_map_data): Look up thread device instead of using ACC_dev,
update calls to lookup_host, lookup_dev. Use data environment in
device descriptor.
(acc_unmap_data): Update call to lookup_host. Remove mapped data
from data environment not ACC_memmap.
(present_create_copy): Update call to lookup_host. Use data
environment instead of list in ACC_memmap.
(delete_copyout): Update call to lookup_host. Look up device in
current thread info instead of using ACC_dev.
(update_dev_host): Look up device in current thread info instead of
using ACC_dev.
* oacc-parallel.c (oacc-int.h): Include.
(struct devgeom, devgeom, dump_devaddrs): Remove.
(select_acc_device): Call ACC_lazy_initialize earlier.
(GOACC_parallel): Use device for current thread instead of ACC_dev.
Use memory map from current device.
(GOACC_data_start): Likewise. Use thread info block for mapped data.
(GOACC_data_end): Use thread info block for mapped data.
(goacc_wait): Use device for current thread instead of ACC_dev.
(GOACC_update): Likewise. Formatting fixes.
* oacc-plugin.c (ACC_plugin_register): Remove.
(oacc-int.h): Include.
(GOMP_PLUGIN_acc_thread): New.
* oacc-plugin.h (target.h): Don't include.
(ACC_plugin_register): Remove.
(GOMP_PLUGIN_async_unmap_vars, GOMP_PLUGIN_acc_thread): Add extern
declarations.
* plugin-nvptx.c (oacc-plugin.h): Include.
(current_stream, PTX_dev, PTX_devices): Remove.
(struct nvptx_thread): New.
(nvptx_thread): New function.
(select_stream_for_async): Locate ptx_dev in device-specific TLS
data instead of using TLS PTX_dev variable.
(PTX_init): Don't initialize PTX_devices.
(PTX_open_device): Remove PTX_devices list handling. Tweak context
initialization.
(PTX_close_device): Remove PTX_devices list handling. Find PTX
device info via function argument instead of global TLS variable.
(PTX_get_num_devices): Make callable when backend has not been
initialized.
(event_gc): Find PTX device info, current stream via nvptx_thread.
(event_add, PTX_exec, PTX_host2dev, PTX_dev2host)
(PTX_async_test_all, PTX_wait_all, PTX_wait_all_async)
(PTX_get_current_cuda_device, PTX_get_current_cuda_context)
(PTX_get_cuda_stream, PTX_set_cuda_stream, openacc_close_device)
(openacc_set_device_num, openacc_register_async_cleanup)
(openacc_async_set_async): Likewise.
(openacc_create_thread_data, openacc_destroy_thread_data): New.
* target.c (oacc-int.h): Include.
(gomp_fini_device): Split out memory-map freeing into...
(gomp_free_memmap): ...this new function.
(gomp_load_plugin_for_device): Initialize
openacc.create_thread_data_func, openacc.destroy_thread_data_func
hooks.
(gomp_find_available_plugins): Initialize one target_device_descr
per physical device.
* target.h (oacc-int.h): Don't include.
(ACC_dispatch_t): Declare here. Add data_environ, ord fields.
Update comment for mem_map field.
(gomp_free_memmap): Add prototype.
commit 898dba8e56827d7dde964e63f53c804c59674e9b
Author: Julian Brown jul...@codesourcery.com
Date:   Mon Oct 27 08:43:07 2014 -0700

TLS rework

diff --git a/libgomp/env.c b/libgomp/env.c
index 32fb92c..8b22e6f 100644
--- a/libgomp/env.c
+++ b/libgomp/env.c
@@ -28,6 +28,7 @@
 #include "libgomp.h"
 #include "libgomp_f.h"
 #include "target.h"
+#include "oacc-int.h"
 #include <ctype.h>
 #include <stdlib.h>
 #include <stdio.h>
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index e31573c..1496437 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -50,8 +50,4 @@ extern void GOMP_PLUGIN_mutex_destroy (gomp_mutex_t *mutex);
 extern void GOMP_PLUGIN_mutex_lock (gomp_mutex_t *mutex);
 extern void GOMP_PLUGIN_mutex_unlock (gomp_mutex_t *mutex);
 
-/* target.c */
-
-extern void GOMP_PLUGIN_async_unmap_vars (void *ptr);
-
 #endif
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index 538aabb..c6a88a2 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -337,4 +337,5 @@ PLUGIN_1.0 {
 	GOMP_PLUGIN_mutex_lock;
 	GOMP_PLUGIN_mutex_unlock;
 	GOMP_PLUGIN_async_unmap_vars;
+	GOMP_PLUGIN_acc_thread;
 };
diff --git a/libgomp/oacc-async.c b/libgomp/oacc-async.c
index 08b6b95..dddfe05 100644
--- a/libgomp/oacc-async.c
+++ b/libgomp/oacc-async.c
@@ -29,6 +29,7 @@
 #include "openacc.h"
 #include "libgomp.h"
 #include "target.h"
+#include "oacc-int.h"
 
 int
 acc_async_test (int async)
@@ -36,13

[gomp4] Remove goacc_parse_device_num

2014-10-28 Thread Julian Brown
Hi,

This patch removes the goacc_parse_device_num function in libgomp's
env.c since it is redundant with parse_int. I also added some bounds
checking for the device number in oacc-init.c (the behaviour is left as
implementation defined in the OpenACC 2.0 spec, so I chose to raise
an error for an out-of-range device number).
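
For illustration only (this is not code from the patch): a minimal sketch of the two steps described above, parsing a device number from a string and then range-checking it. The names parse_nonneg_int and device_ord_in_range are made up for this example; libgomp's own parse_int takes the environment variable name rather than its value, and the real bounds check calls gomp_fatal instead of returning a flag.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parse a non-negative decimal integer from TEXT.
   Returns false (leaving *VALUE untouched) if TEXT is NULL, empty,
   has trailing junk, or is out of range for int.  */
static bool
parse_nonneg_int (const char *text, int *value)
{
  char *end;
  long v;

  if (text == NULL || *text == '\0')
    return false;

  errno = 0;
  v = strtol (text, &end, 10);
  if (errno != 0 || *end != '\0' || v < 0 || v > INT_MAX)
    return false;

  *value = (int) v;
  return true;
}

/* Hypothetical helper: true if ORD names one of NUM_DEVICES devices.
   A caller would raise a fatal error when this returns false.  */
static bool
device_ord_in_range (int ord, int num_devices)
{
  return ord >= 0 && ord < num_devices;
}
```

A caller would fall back to device 0 when parsing fails, and report an error when the parsed number is not in range for the selected device type.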

OK for gomp4 branch?

Thanks,

Julian

ChangeLog

libgomp/
* env.c (goacc_parse_device_num): Remove.
(initialize_env): Use parse_int instead of goacc_parse_device_num.
* oacc-init.c (lazy_open): Add bounds check for device number.

commit 1dacb833b33d179553723faecf4b32e89efc69a9
Author: Julian Brown jul...@codesourcery.com
Date:   Tue Oct 28 06:03:47 2014 -0700

ACC_DEVICE_NUM tweaks

diff --git a/libgomp/env.c b/libgomp/env.c
index 8b22e6f..02bce0c 100644
--- a/libgomp/env.c
+++ b/libgomp/env.c
@@ -1016,27 +1016,6 @@ parse_affinity (bool ignore)
   return false;
 }
 
-
-static void
-goacc_parse_device_num (void)
-{
-  const char *env = getenv ("ACC_DEVICE_NUM");
-  int default_num = -1;
-  
-  if (env && *env != '\0')
-    {
-      char *end;
-      default_num = strtol (env, &end, 0);
-      
-      if (*end || default_num < 0)
-        default_num = 0;
-    }
-  else
-    default_num = 0;
-  
-  goacc_device_num = default_num;
-}
-
 static void
 goacc_parse_device_type (void)
 {
@@ -1310,7 +1289,9 @@ initialize_env (void)
   handle_omp_display_env (stacksize, wait_policy);
   
   /* Look for OpenACC-specific environment variables.  */
-  goacc_parse_device_num ();
+  if (!parse_int ("ACC_DEVICE_NUM", &goacc_device_num, true))
+    goacc_device_num = 0;
+
   goacc_parse_device_type ();
 
   /* Initialize OpenACC-specific internal state.  */
diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index 489ac14..24e911b 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -249,6 +249,9 @@ lazy_open (int ord)
   if (ord < 0)
     ord = goacc_device_num;
 
+  if (ord >= base_dev->get_num_devices_func ())
+    gomp_fatal ("device %u does not exist", ord);
+
   if (!thr)
     thr = goacc_new_thread ();
   


[gomp4] Don't put acc_notify_var in thread-local struct

2014-10-28 Thread Julian Brown
Hi,

This patch moves acc_notify_var out of gomp_task_icv and makes it
simply a global variable instead.
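
For illustration only (not code from the patch): a small sketch of a notification helper gated by a plain global flag, which is the shape the change moves towards. The names notify_enabled and notify_printf are invented stand-ins for goacc_notify_var and gomp_notify; the return value is added here only to make the behaviour easy to check.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for goacc_notify_var: one process-wide flag,
   set once at start-up, instead of a field copied into per-task state.  */
int notify_enabled;

/* Print a message to OUT only when notifications are enabled.
   Returns the number of characters written, or 0 when disabled.  */
static int
notify_printf (FILE *out, const char *fmt, ...)
{
  va_list ap;
  int n;

  if (!notify_enabled)
    return 0;

  va_start (ap, fmt);
  n = vfprintf (out, fmt, ap);
  va_end (ap);
  return n;
}
```

Because the flag is read-only after initialisation, no per-thread lookup is needed on the notification path.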

OK for gomp4 branch?

Thanks,

Julian

ChangeLog

libgomp/
* env.c (goacc_notify_var): New.
(initialize_env): Use above instead of
gomp_global_icv.acc_notify_var.
* error.c (gomp_vnotify): Use goacc_notify_var.
(gomp_notify): Fix formatting.
* libgomp.h (gomp_task_icv): Remove acc_notify_var field.
(goacc_notify_var): Add extern declaration.

commit 5b18c3e134779ee562af11702d2ba2c4baa66370
Author: Julian Brown jul...@codesourcery.com
Date:   Tue Oct 28 06:45:41 2014 -0700

acc_notify_var tweaks

diff --git a/libgomp/env.c b/libgomp/env.c
index 02bce0c..03206dd 100644
--- a/libgomp/env.c
+++ b/libgomp/env.c
@@ -79,6 +79,7 @@ unsigned long gomp_bind_var_list_len;
 void **gomp_places_list;
 unsigned long gomp_places_list_len;
 
+int goacc_notify_var;
 int goacc_device_num;
 char* goacc_device_type;
 
@@ -1196,7 +1197,7 @@ initialize_env (void)
       gomp_global_icv.thread_limit_var
 	= thread_limit_var > INT_MAX ? UINT_MAX : thread_limit_var;
     }
-  parse_int ("GCC_ACC_NOTIFY", &gomp_global_icv.acc_notify_var, true);
+  parse_int ("GCC_ACC_NOTIFY", &goacc_notify_var, true);
 #ifndef HAVE_SYNC_BUILTINS
   gomp_mutex_init (&gomp_managed_threads_lock);
 #endif
diff --git a/libgomp/error.c b/libgomp/error.c
index 5f400cc..320b4d2 100644
--- a/libgomp/error.c
+++ b/libgomp/error.c
@@ -76,13 +76,12 @@ gomp_fatal (const char *fmt, ...)
 void
 gomp_vnotify (const char *msg, va_list list)
 {
-  struct gomp_task_icv *icv = gomp_icv (false);
-  if (icv->acc_notify_var)
+  if (goacc_notify_var)
     vfprintf (stderr, msg, list);
 }
 
 void
-gomp_notify(const char *msg, ...)
+gomp_notify (const char *msg, ...)
 {
   va_list list;
   
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 8b7327d..206b293 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -238,7 +238,6 @@ struct gomp_task_icv
   bool dyn_var;
   bool nest_var;
   char bind_var;
-  int acc_notify_var;
   /* Internal ICV.  */
   struct target_mem_desc *target_data;
 };
@@ -257,6 +256,7 @@ extern unsigned long gomp_bind_var_list_len;
 extern void **gomp_places_list;
 extern unsigned long gomp_places_list_len;
 
+extern int goacc_notify_var;
 extern int goacc_device_num;
 extern char* goacc_device_type;
 


[gomp4] Remove redundant get_caps hook invocations

2014-10-28 Thread Julian Brown
Hi,

This patch causes the get_caps hook to be called only once during
device initialisation, and caches the result in the device's
capabilities field.
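
For illustration only (not code from the patch): a minimal sketch of the caching pattern described above. The struct layout, the CAP_* bits and fake_get_caps are invented for this example; only the idea of calling the hook once and testing the stored value afterwards comes from the patch.

```c
/* Hypothetical capability bits, standing in for TARGET_CAP_*.  */
#define CAP_OPENMP  (1u << 0)
#define CAP_OPENACC (1u << 1)

/* Hypothetical, cut-down device descriptor.  */
struct device
{
  unsigned int (*get_caps) (void);  /* plugin hook */
  unsigned int capabilities;        /* cached result of get_caps */
};

/* Counts hook invocations so the caching can be observed.  */
static int caps_calls;

static unsigned int
fake_get_caps (void)
{
  caps_calls++;
  return CAP_OPENACC;
}

/* Call the hook exactly once and remember the answer.  */
static void
init_device (struct device *dev)
{
  dev->capabilities = dev->get_caps ();
}

/* Later queries only look at the cached field.  */
static int
device_has (const struct device *dev, unsigned int cap)
{
  return (dev->capabilities & cap) != 0;
}
```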

OK for gomp4 branch?

Thanks,

Julian

ChangeLog

libgomp/
* target.c (gomp_load_plugin_for_device): Only call get_caps once.
(gomp_find_available_plugins): ...and don't call it again here.

commit 271ee70eec93866e312c7b9363cb0e736b6361d3
Author: Julian Brown jul...@codesourcery.com
Date:   Tue Oct 28 07:14:19 2014 -0700

Remove redundant get_caps calls.

diff --git a/libgomp/target.c b/libgomp/target.c
index 73a186b..615ba6b 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1036,9 +1036,10 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
   DLSYM (device_free);
   DLSYM (device_dev2host);
   DLSYM (device_host2dev);
-  if (device->get_caps_func () & TARGET_CAP_OPENMP_400)
+  device->capabilities = device->get_caps_func ();
+  if (device->capabilities & TARGET_CAP_OPENMP_400)
     DLSYM (device_run);
-  if (device->get_caps_func () & TARGET_CAP_OPENACC_200)
+  if (device->capabilities & TARGET_CAP_OPENACC_200)
     {
       optional_present = optional_total = 0;
       DLSYM_OPT (openacc.exec, openacc_parallel);
@@ -1167,7 +1168,6 @@ gomp_find_available_plugins (void)
 	  devicep->mem_map.is_initialized = false;
 	  devicep->type = devicep->get_type_func ();
 	  devicep->name = devicep->get_name_func ();
-	  devicep->capabilities = devicep->get_caps_func ();
 	  gomp_mutex_init (&devicep->mem_map.lock);
 	  devicep->ord = i;
 	  devicep->target_data = NULL;


[gomp4] Remove stray debugging code

2014-10-28 Thread Julian Brown
Hi,

This patch removes some debugging code leftover from development. It's
probably not helpful to keep it around now.

OK for gomp4 branch?

Thanks,

Julian

ChangeLog

libgomp/
* oacc-host.c (DEBUG): Remove undefine.
* plugin-nvptx.c (DEBUG, DISABLE_ASYNC): Remove comment-out macro
definitions.
* target.c (dump_mappings): Remove debugging function.

commit 13794d26fc95225268e05abf9912ab6eba3c7b3f
Author: Julian Brown jul...@codesourcery.com
Date:   Tue Oct 28 06:49:19 2014 -0700

Remove stray debugging code

diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index 6fe8f6c..fc3e77c 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -45,8 +45,6 @@
 #include <string.h>
 #include <stdio.h>
 
-#undef DEBUG
-
 #ifdef HOST_NONSHM_PLUGIN
 #define STATIC
 #define GOMP(X) GOMP_PLUGIN_##X
diff --git a/libgomp/plugin-nvptx.c b/libgomp/plugin-nvptx.c
index c5bdf73..8d040fe 100644
--- a/libgomp/plugin-nvptx.c
+++ b/libgomp/plugin-nvptx.c
@@ -30,9 +30,6 @@
is not clear as to what that state might be.  Or how one might
propagate it from one thread to another.  */
 
-//#define DEBUG
-//#define DISABLE_ASYNC
-
 #include "openacc.h"
 #include "config.h"
 #include "libgomp.h"
diff --git a/libgomp/target.c b/libgomp/target.c
index bce8ca6..73a186b 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -110,34 +110,6 @@ resolve_device (int device_id)
   return &devices[device_id];
 }
 
-__attribute__((used)) static void
-dump_mappings (FILE *f, splay_tree_node node)
-{
-  int i;
-  
-  splay_tree_key k = &node->key;
-  
-  if (!k)
-    return;
-  
-  fprintf (f, "key %p: host_start %p, host_end %p, tgt_offset %p, refcount %d, "
-	   "copy_from %s\n", k, (void *) k->host_start,
-	   (void *) k->host_end, (void *) k->tgt_offset, (int) k->refcount,
-	   k->copy_from ? "true" : "false");
-  fprintf (f, "tgt->refcount %d, tgt->tgt_start %p, tgt->tgt_end %p, "
-	   "tgt->to_free %p, tgt->prev %p, tgt->list_count %d, "
-	   "tgt->device_descr %p\n", (int) k->tgt->refcount,
-	   (void *) k->tgt->tgt_start, (void *) k->tgt->tgt_end,
-	   k->tgt->to_free, k->tgt->prev, (int) k->tgt->list_count,
-	   k->tgt->device_descr);
-
-  for (i = 0; i < k->tgt->list_count; i++)
-    fprintf (f, "item %d: %p\n", i, k->tgt->list[i]);
-  
-  dump_mappings (f, node->left);
-  dump_mappings (f, node->right);
-}
-
 /* Handle the case where splay_tree_lookup found oldn for newn.
    Helper function of gomp_map_vars.  */
 


[gomp4] Remove gomp_map_vars mem_map argument

2014-10-28 Thread Julian Brown
Hi,

This patch removes the now-redundant gomp_memory_mapping argument from
gomp_map_vars, introduced when OpenACC kept the structure in question
in a different place from OpenMP. Both now keep the memory map in the
gomp_device_descr structure, so there's no need to pass both that and
the memory map to the function explicitly.
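
For illustration only (not code from the patch): a toy version of the refactoring described above, where a function stops taking the memory map as a separate argument and derives it from the device descriptor instead. The types and the map_vars function are simplified, hypothetical stand-ins for gomp_device_descr, gomp_memory_mapping and gomp_map_vars.

```c
/* Hypothetical, cut-down memory map: just counts mapped entries.  */
struct memory_mapping
{
  int entries;
};

/* Hypothetical, cut-down device descriptor that owns its memory map.  */
struct device_descr
{
  struct memory_mapping mem_map;
};

/* Before the change the signature would have been
     map_vars (struct device_descr *dev, struct memory_mapping *mm, int mapnum)
   with every caller passing &dev->mem_map.  Deriving the map inside the
   function removes the chance of passing a map that belongs to another
   device.  Returns the total number of entries after mapping.  */
static int
map_vars (struct device_descr *dev, int mapnum)
{
  struct memory_mapping *mm = &dev->mem_map;

  mm->entries += mapnum;
  return mm->entries;
}
```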

OK for gomp4 branch?

Thanks,

Julian

ChangeLog

libgomp/
* target.c (gomp_map_vars): Remove MM argument.
(GOMP_target, GOMP_target_data): Update calls to gomp_map_vars.
* oacc-mem.c (acc_map_data, present_create_copy): Update calls to
gomp_map_vars.
* oacc-parallel.c (GOACC_parallel, GOACC_data_start): Likewise.
* target.h (gomp_map_vars): Update prototype.

commit 3afc4e592a6d8a796ec0c44bb8dc808b1392fd29
Author: Julian Brown jul...@codesourcery.com
Date:   Tue Oct 28 09:17:01 2014 -0700

Remove gomp_map_vars mem_map argument

diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index d812f72..582a1e0 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -257,7 +257,7 @@ acc_map_data (void *h, void *d, size_t s)
       if (d != h)
         gomp_fatal ("cannot map data on shared-memory system");
 
-      tgt = gomp_map_vars (NULL, NULL, 0, NULL, NULL, NULL, NULL, true, false);
+      tgt = gomp_map_vars (NULL, 0, NULL, NULL, NULL, NULL, true, false);
     }
   else
     {
@@ -275,9 +275,8 @@ acc_map_data (void *h, void *d, size_t s)
 	gomp_fatal ("device address [%p, +%d] is already mapped", (void *)d,
 		    (int)s);
 
-      tgt = gomp_map_vars ((struct gomp_device_descr *) acc_dev,
-			   &acc_dev->mem_map, mapnum, &hostaddrs,
-			   &devaddrs, &sizes, &kinds, true, false);
+      tgt = gomp_map_vars (acc_dev, mapnum, &hostaddrs, &devaddrs, &sizes,
+			   &kinds, true, false);
     }
 
   tgt->prev = acc_dev->openacc.data_environ;
@@ -383,9 +382,8 @@ present_create_copy (unsigned f, void *h, size_t s)
       else
         kinds = GOMP_MAP_ALLOC;
 
-      tgt = gomp_map_vars ((struct gomp_device_descr *) acc_dev,
-			   &acc_dev->mem_map, mapnum, &hostaddrs,
-			   NULL, &s, &kinds, true, false);
+      tgt = gomp_map_vars (acc_dev, mapnum, &hostaddrs, NULL, &s, &kinds, true,
+			   false);
 
       gomp_mutex_lock (&acc_dev->mem_map.lock);
 
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index b787df7..1639244 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -173,9 +173,8 @@ GOACC_parallel (int device, void (*fn) (void *), const void *openmp_target,
   else
     tgt_fn = (void (*)) fn;
 
-  tgt = gomp_map_vars ((struct gomp_device_descr *) acc_dev,
-		       &acc_dev->mem_map, mapnum, hostaddrs,
-		       NULL, sizes, kinds, true, false);
+  tgt = gomp_map_vars (acc_dev, mapnum, hostaddrs, NULL, sizes, kinds, true,
+		       false);
 
   devaddrs = alloca (sizeof (void *) * mapnum);
   for (i = 0; i < mapnum; i++)
@@ -217,7 +216,7 @@ GOACC_data_start (int device, const void *openmp_target, size_t mapnum,
   if ((acc_dev->capabilities & TARGET_CAP_SHARED_MEM)
       || !if_clause_condition_value)
     {
-      tgt = gomp_map_vars (NULL, NULL, 0, NULL, NULL, NULL, NULL, true, false);
+      tgt = gomp_map_vars (NULL, 0, NULL, NULL, NULL, NULL, true, false);
       tgt->prev = thr->mapped_data;
       thr->mapped_data = tgt;
 
@@ -225,9 +224,8 @@ GOACC_data_start (int device, const void *openmp_target, size_t mapnum,
     }
 
   gomp_notify ("  %s: prepare mappings\n", __FUNCTION__);
-  tgt = gomp_map_vars ((struct gomp_device_descr *) acc_dev,
-		       &acc_dev->mem_map, mapnum, hostaddrs,
-		       NULL, sizes, kinds, true, false);
+  tgt = gomp_map_vars (acc_dev, mapnum, hostaddrs, NULL, sizes, kinds, true,
+		       false);
   gomp_notify ("  %s: mappings prepared\n", __FUNCTION__);
   tgt->prev = thr->mapped_data;
   thr->mapped_data = tgt;
diff --git a/libgomp/target.c b/libgomp/target.c
index 615ba6b..507488e 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -134,14 +134,14 @@ get_kind (bool is_openacc, void *kinds, int idx)
 }
 
 attribute_hidden struct target_mem_desc *
-gomp_map_vars (struct gomp_device_descr *devicep,
-	       struct gomp_memory_mapping *mm, size_t mapnum,
-	       void **hostaddrs, void **devaddrs, size_t *sizes,
-	       void *kinds, bool is_openacc, bool is_target)
+gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
+	       void **hostaddrs, void **devaddrs, size_t *sizes, void *kinds,
+	       bool is_openacc, bool is_target)
 {
   size_t i, tgt_align, tgt_size, not_found_cnt = 0;
   const int rshift = is_openacc ? 8 : 3;
   const int typemask = is_openacc ? 0xff : 0x7;
+  struct gomp_memory_mapping *mm = &devicep->mem_map;
   struct splay_tree_key_s cur_node;
   struct target_mem_desc *tgt
     = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum);
@@ -861,8 +861,8 @@ GOMP_target (int device, void (*fn) (void *), const void *openmp_target,
   gomp_mutex_unlock (&mm->lock);
 
   struct target_mem_desc *tgt_vars
-= gomp_map_vars (devicep, devicep-mem_map, mapnum, hostaddrs, NULL

Re: [gomp4] Remove gomp_map_vars mem_map argument

2014-10-28 Thread Julian Brown
On Tue, 28 Oct 2014 16:52:22 +
Julian Brown jul...@codesourcery.com wrote:

> Hi,
> 
> This patch removes the now-redundant gomp_memory_mapping argument from
> gomp_map_vars, introduced when OpenACC kept the structure in question
> in a different place from OpenMP. Both now keep the memory map in the
> gomp_device_descr structure, so there's no need to pass both that and
> the memory map to the function explicitly.
> 
> OK for gomp4 branch?

Forgot to say: this patch and the previous three have been tested with
no regressions (alongside a version of Bernd's PTX support patches) on
the gomp4 branch (libgomp tests).

Julian


[gomp4] Use GOMP_PLUGIN_ not gomp_plugin_ for libgomp plugin API

2014-10-17 Thread Julian Brown
Hi,

As the title says, this patch makes the libgomp plugin API use the
GOMP_PLUGIN_ prefix rather than gomp_plugin_. This is purely a
mechanical change.

OK for the gomp4 branch?

Thanks,

Julian

ChangeLog

libgomp/
* libgomp-plugin.c (gomp_plugin_*): Rename to...
(GOMP_PLUGIN_*): This.
* libgomp-plugin.h: Likewise.
* libgomp.map: Likewise.
* oacc-host.c (GOMP): Use GOMP_PLUGIN_ in macro expansion.
* oacc-plugin.c (gomp_plugin_*): Rename to...
(GOMP_PLUGIN_*): This.
* plugin-nvptx.c: Likewise.

commit cce63ddb8895d3b51a176d68045b7920affc05e5
Author: Julian Brown jul...@codesourcery.com
Date:   Wed Oct 15 02:05:08 2014 -0700

Use GOMP_PLUGIN_ not gomp_plugin_ for libgomp plugin API.

diff --git a/libgomp/libgomp-plugin.c b/libgomp/libgomp-plugin.c
index 46dd7b0..0f72bb9 100644
--- a/libgomp/libgomp-plugin.c
+++ b/libgomp/libgomp-plugin.c
@@ -31,25 +31,25 @@
 #include "target.h"
 
 void *
-gomp_plugin_malloc (size_t size)
+GOMP_PLUGIN_malloc (size_t size)
 {
   return gomp_malloc (size);
 }
 
 void *
-gomp_plugin_malloc_cleared (size_t size)
+GOMP_PLUGIN_malloc_cleared (size_t size)
 {
   return gomp_malloc_cleared (size);
 }
 
 void *
-gomp_plugin_realloc (void *ptr, size_t size)
+GOMP_PLUGIN_realloc (void *ptr, size_t size)
 {
   return gomp_realloc (ptr, size);
 }
 
 void
-gomp_plugin_error (const char *msg, ...)
+GOMP_PLUGIN_error (const char *msg, ...)
 {
   va_list ap;
   
@@ -59,7 +59,7 @@ gomp_plugin_error (const char *msg, ...)
 }
 
 void
-gomp_plugin_notify (const char *msg, ...)
+GOMP_PLUGIN_notify (const char *msg, ...)
 {
   va_list ap;
   
@@ -69,7 +69,7 @@ gomp_plugin_notify (const char *msg, ...)
 }
 
 void
-gomp_plugin_fatal (const char *msg, ...)
+GOMP_PLUGIN_fatal (const char *msg, ...)
 {
   va_list ap;
   
@@ -82,25 +82,25 @@ gomp_plugin_fatal (const char *msg, ...)
 }
 
 void
-gomp_plugin_mutex_init (gomp_mutex_t *mutex)
+GOMP_PLUGIN_mutex_init (gomp_mutex_t *mutex)
 {
   gomp_mutex_init (mutex);
 }
 
 void
-gomp_plugin_mutex_destroy (gomp_mutex_t *mutex)
+GOMP_PLUGIN_mutex_destroy (gomp_mutex_t *mutex)
 {
   gomp_mutex_destroy (mutex);
 }
 
 void
-gomp_plugin_mutex_lock (gomp_mutex_t *mutex)
+GOMP_PLUGIN_mutex_lock (gomp_mutex_t *mutex)
 {
   gomp_mutex_lock (mutex);
 }
 
 void
-gomp_plugin_mutex_unlock (gomp_mutex_t *mutex)
+GOMP_PLUGIN_mutex_unlock (gomp_mutex_t *mutex)
 {
   gomp_mutex_unlock (mutex);
 }
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index 0ecb407..e31573c 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -31,27 +31,27 @@
 
 /* alloc.c */
 
-extern void *gomp_plugin_malloc (size_t) __attribute__((malloc));
-extern void *gomp_plugin_malloc_cleared (size_t) __attribute__((malloc));
-extern void *gomp_plugin_realloc (void *, size_t);
+extern void *GOMP_PLUGIN_malloc (size_t) __attribute__((malloc));
+extern void *GOMP_PLUGIN_malloc_cleared (size_t) __attribute__((malloc));
+extern void *GOMP_PLUGIN_realloc (void *, size_t);
 
 /* error.c */
 
-extern void gomp_plugin_notify(const char *msg, ...);
-extern void gomp_plugin_error (const char *, ...)
+extern void GOMP_PLUGIN_notify(const char *msg, ...);
+extern void GOMP_PLUGIN_error (const char *, ...)
 	__attribute__((format (printf, 1, 2)));
-extern void gomp_plugin_fatal (const char *, ...)
+extern void GOMP_PLUGIN_fatal (const char *, ...)
 	__attribute__((noreturn, format (printf, 1, 2)));
 
 /* mutex.c */
 
-extern void gomp_plugin_mutex_init (gomp_mutex_t *mutex);
-extern void gomp_plugin_mutex_destroy (gomp_mutex_t *mutex);
-extern void gomp_plugin_mutex_lock (gomp_mutex_t *mutex);
-extern void gomp_plugin_mutex_unlock (gomp_mutex_t *mutex);
+extern void GOMP_PLUGIN_mutex_init (gomp_mutex_t *mutex);
+extern void GOMP_PLUGIN_mutex_destroy (gomp_mutex_t *mutex);
+extern void GOMP_PLUGIN_mutex_lock (gomp_mutex_t *mutex);
+extern void GOMP_PLUGIN_mutex_unlock (gomp_mutex_t *mutex);
 
 /* target.c */
 
-extern void gomp_plugin_async_unmap_vars (void *ptr);
+extern void GOMP_PLUGIN_async_unmap_vars (void *ptr);
 
 #endif
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index e1e87d9..538aabb 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -326,15 +326,15 @@ GOACC_2.0 {
 # FIXME: Hygiene/grouping/naming?
 PLUGIN_1.0 {
   global:
-	gomp_plugin_malloc;
-	gomp_plugin_malloc_cleared;
-	gomp_plugin_realloc;
-	gomp_plugin_error;
-	gomp_plugin_notify;
-	gomp_plugin_fatal;
-	gomp_plugin_mutex_init;
-	gomp_plugin_mutex_destroy;
-	gomp_plugin_mutex_lock;
-	gomp_plugin_mutex_unlock;
-	gomp_plugin_async_unmap_vars;
+	GOMP_PLUGIN_malloc;
+	GOMP_PLUGIN_malloc_cleared;
+	GOMP_PLUGIN_realloc;
+	GOMP_PLUGIN_error;
+	GOMP_PLUGIN_notify;
+	GOMP_PLUGIN_fatal;
+	GOMP_PLUGIN_mutex_init;
+	GOMP_PLUGIN_mutex_destroy;
+	GOMP_PLUGIN_mutex_lock;
+	GOMP_PLUGIN_mutex_unlock;
+	GOMP_PLUGIN_async_unmap_vars;
 };
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index 7a50d65..a47617a 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c

[gomp4] Fix include path configury for gomp-constants.h

2014-10-17 Thread Julian Brown
Hi,

This patch tweaks the include path configury used by libgomp to find
the gomp-constants.h header, as suggested by Jakub.

OK for the gomp4 branch?

Thanks,

Julian

libgomp/
* Makefile.am (AM_CPPFLAGS): Fix search path for locating
gomp-constants.h.
* Makefile.in: Regenerate.

commit a682a91d68d3ffb1516a1589ef093e00151a6078
Author: Julian Brown jul...@codesourcery.com
Date:   Wed Oct 15 02:12:07 2014 -0700

Fix include path configury for gomp-constants.h.

diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index 7ddb0a4..77f71ee 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -14,8 +14,7 @@ libsubincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/include
 
 vpath % $(strip $(search_path))
 
-AM_CPPFLAGS = $(addprefix -I, $(search_path)) \
-	$(addprefix -I, $(search_path)/../include)
+AM_CPPFLAGS = $(addprefix -I, $(search_path)) -I $(top_srcdir)/../include
 AM_CFLAGS = $(XCFLAGS)
 AM_LDFLAGS = $(XLDFLAGS) $(SECTION_LDFLAGS) $(OPT_LDFLAGS)
 
diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index 4965442..fdd18ff 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -333,9 +333,7 @@ gcc_version := $(shell cat $(top_srcdir)/../gcc/BASE-VER)
 search_path = $(addprefix $(top_srcdir)/config/, $(config_path)) $(top_srcdir)
 fincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/finclude
 libsubincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/include
-AM_CPPFLAGS = $(addprefix -I, $(search_path)) \
-	$(addprefix -I, $(search_path)/../include)
-
+AM_CPPFLAGS = $(addprefix -I, $(search_path)) -I $(top_srcdir)/../include
 AM_CFLAGS = $(XCFLAGS)
 AM_LDFLAGS = $(XLDFLAGS) $(SECTION_LDFLAGS) $(OPT_LDFLAGS)
 toolexeclib_LTLIBRARIES = libgomp.la $(am__append_1) \


[gomp4] Asynchronous data unmapping wait fixes for OpenACC

2014-10-17 Thread Julian Brown
Hi,

This patch introduces a new plugin hook in libgomp to register a
callback function to clean up host-side bookkeeping data after an
asynchronous operation has completed (replacing the previous ad-hoc
method used in the NVPTX backend), and adds code to ensure that same
cleanup is done reliably in the NVPTX backend when the user program
hits a wait directive, or equivalent.
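
For illustration only (not code from the patch): a minimal sketch of the general idea of registering a cleanup callback when asynchronous work is queued and running it later, at a synchronisation point. The queue, its size and all names here are invented for this example; the NVPTX plugin instead records a PTX_EVT_ASYNC_CLEANUP event and processes it in event_gc.

```c
#include <stddef.h>

#define MAX_PENDING 8

typedef void (*cleanup_fn) (void *);

/* One deferred host-side cleanup action.  */
struct pending_cleanup
{
  cleanup_fn fn;
  void *data;
};

static struct pending_cleanup pending[MAX_PENDING];
static size_t n_pending;

/* Record FN (DATA) to be run once the asynchronous work has completed.
   Returns 0 on success, -1 if the (fixed-size, hypothetical) queue is full.  */
static int
register_async_cleanup (cleanup_fn fn, void *data)
{
  if (n_pending == MAX_PENDING)
    return -1;

  pending[n_pending].fn = fn;
  pending[n_pending].data = data;
  n_pending++;
  return 0;
}

/* Called at a wait point: run every registered cleanup and empty the
   queue.  Returns the number of cleanups that were run.  */
static size_t
run_pending_cleanups (void)
{
  size_t i, done = n_pending;

  for (i = 0; i < done; i++)
    pending[i].fn (pending[i].data);

  n_pending = 0;
  return done;
}

/* Example callback: bump the int counter that DATA points to.  */
static void
count_cleanup (void *data)
{
  ++*(int *) data;
}
```

The point of the split is that registration is cheap and can happen right after the copy-back is queued, while the bookkeeping itself is deferred until the runtime knows the operation has finished.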

OK for the gomp4 branch?

Thanks,

Julian

ChangeLog

libgomp/
* oacc-host.c (openacc_register_async_cleanup): New.
(host_dispatch): Initialise register_async_cleanup_func entry.
* oacc-int.h (struct ACC_dispatch_t): Add
register_async_cleanup_func hook.
* oacc-parallel.c (GOACC_parallel): Call
register_async_cleanup_func hook after queuing asynchronous
copy-back.
* plugin-nvptx.c (enum PTX_event_type): Add PTX_EVT_ASYNC_CLEANUP.
(struct PTX_event): Remove tgt field.
(event_gc): Don't do async cleanup in PTX_EVT_KNL, do it in
PTX_EVT_ASYNC_CLEANUP instead.
(event_add): Remove tgt argument. Support PTX_EVT_ASYNC_CLEANUP
events.
(PTX_exec, PTX_host2dev, PTX_dev2host, PTX_wait_async)
(PTX_wait_all_async): Update calls to event_add.
(openacc_register_async_cleanup): New.
(PTX_async_test): Call event_gc on success path.
(PTX_async_test_all): Likewise.
* target.c (gomp_load_plugin_for_device): Initialise
register_async_cleanup hook.
commit 78d6b16bf258106282f791f2e7b3010bf75f2a86
Author: Julian Brown jul...@codesourcery.com
Date:   Wed Oct 15 02:10:00 2014 -0700

Async fixes/improvements.

diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index a47617a..f44ca5e 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -294,6 +294,16 @@ openacc_parallel (void (*fn) (void *), size_t mapnum __attribute__((unused)),
 }
 
 STATIC void
+openacc_register_async_cleanup (void *targ_mem_desc)
+{
+#ifdef HOST_NONSHM_PLUGIN
+  /* Asynchronous launches are executed synchronously on the (non-SHM) host,
+     so there's no point in delaying host-side cleanup -- just do it now.  */
+  GOMP_PLUGIN_async_unmap_vars (targ_mem_desc);
+#endif
+}
+
+STATIC void
 openacc_async_set_async (int async __attribute__((unused)))
 {
 #ifdef DEBUG
@@ -397,6 +407,8 @@ static struct gomp_device_descr host_dispatch =
 
   .exec_func = openacc_parallel,
 
+  .register_async_cleanup_func = openacc_register_async_cleanup,
+
   .async_set_async_func = openacc_async_set_async,
   .async_test_func = openacc_async_test,
   .async_test_all_func = openacc_async_test_all,
diff --git a/libgomp/oacc-int.h b/libgomp/oacc-int.h
index e1d2e32..03529cc 100644
--- a/libgomp/oacc-int.h
+++ b/libgomp/oacc-int.h
@@ -64,6 +64,9 @@ typedef struct ACC_dispatch_t
   void (*exec_func) (void (*) (void *), size_t, void **, void **, size_t *,
 		 unsigned short *, int, int, int, int, void *);
 
+  /* async cleanup callback registration */
+  void (*register_async_cleanup_func) (void *);
+
   /* asynchronous routines  */
   int (*async_test_func) (int);
   int (*async_test_all_func) (void);
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index 57ac8de..e3f156c 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -213,7 +213,10 @@ GOACC_parallel (int device, void (*fn) (void *), const void *openmp_target,
   if (async < acc_async_noval)
     gomp_unmap_vars (tgt, true);
   else
-    gomp_copy_from_async (tgt);
+    {
+      gomp_copy_from_async (tgt);
+      ACC_dev->openacc.register_async_cleanup_func (tgt);
+    }
 
   ACC_dev->openacc.async_set_async_func (acc_async_sync);
 }
diff --git a/libgomp/plugin-nvptx.c b/libgomp/plugin-nvptx.c
index e163f3a..f193229 100644
--- a/libgomp/plugin-nvptx.c
+++ b/libgomp/plugin-nvptx.c
@@ -317,7 +317,8 @@ enum PTX_event_type
 {
   PTX_EVT_MEM,
   PTX_EVT_KNL,
-  PTX_EVT_SYNC
+  PTX_EVT_SYNC,
+  PTX_EVT_ASYNC_CLEANUP
 };
 
 struct PTX_event
@@ -325,7 +326,6 @@ struct PTX_event
   CUevent *evt;
   int type;
   void *addr;
-  void *tgt;
   int ord;
   SLIST_ENTRY(PTX_event) next;
 };
@@ -946,6 +946,10 @@ event_gc (bool memmap_lockable)
 	  break;
 	
 	case PTX_EVT_KNL:
+  map_pop (ptx_event->addr);
+	  break;
+
+	case PTX_EVT_ASYNC_CLEANUP:
   {
 	/* The function GOMP_PLUGIN_async_unmap_vars needs to claim the
 		   memory-map splay tree lock for the current device, so we
@@ -955,9 +959,7 @@ event_gc (bool memmap_lockable)
 	if (!memmap_lockable)
 		  goto next_event;
 
-	map_pop (ptx_event->addr);
-		if (ptx_event->tgt)
-		  GOMP_PLUGIN_async_unmap_vars (ptx_event->tgt);
+		GOMP_PLUGIN_async_unmap_vars (ptx_event->addr);
   }
 	  break;
 	}
@@ -978,17 +980,17 @@ event_gc (bool memmap_lockable)
 }
 
 static void
-event_add (enum PTX_event_type type, CUevent *e, void *h, void *tgt)
+event_add (enum PTX_event_type type, CUevent *e, void *h)
 {
   struct PTX_event *ptx_event;
 
-  assert (type == PTX_EVT_MEM || type == PTX_EVT_KNL || type

[gomp] [3/3] OpenACC 2.0 support for libgomp - documentation

2014-10-14 Thread Julian Brown
This is a version of the patch:

https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02024.html

against gomp4 branch instead of mainline.

OK to apply?

Thanks,

Julian

-xx-xx  Thomas Schwinge  tho...@codesourcery.com
James Norris  jnor...@codesourcery.com

libgomp/
* libgomp.texi: Outline documentation for OpenACC.
From c58006a7ade2a9556bd73bac9ef45b3bbd62ca37 Mon Sep 17 00:00:00 2001
From: Julian Brown jul...@codesourcery.com
Date: Wed, 17 Sep 2014 10:26:56 -0700
Subject: [PATCH 2/3] OpenACC documentation

---
 libgomp/libgomp.texi |  661 --
 1 file changed, 636 insertions(+), 25 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 254be57..9530a2b 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -31,10 +31,12 @@ texts being (a) (see below), and with the Back-Cover Texts being (b)
 @ifinfo
 @dircategory GNU Libraries
 @direntry
-* libgomp: (libgomp).          GNU OpenMP runtime library
+* libgomp: (libgomp).          GNU OpenACC and OpenMP runtime library
 @end direntry
 
-This manual documents the GNU implementation of the OpenMP API for 
+This manual documents the GNU implementation of the OpenACC API for 
+offloading of code to accelerator devices in C/C++ and Fortran and
+the GNU implementation of the OpenMP API for 
 multi-platform shared-memory parallel programming in C/C++ and Fortran.
 
 Published by the Free Software Foundation
@@ -48,7 +50,7 @@ Boston, MA 02110-1301 USA
 @setchapternewpage odd
 
 @titlepage
-@title The GNU OpenMP Implementation
+@title The GNU OpenACC and OpenMP Implementation
 @page
 @vskip 0pt plus 1filll
 @comment For the @value{version-GCC} Version*
@@ -69,7 +71,10 @@ Boston, MA 02110-1301, USA@*
 @top Introduction
 @cindex Introduction
 
-This manual documents the usage of libgomp, the GNU implementation of the 
+This manual documents the usage of libgomp, the GNU implementation of the
+@uref{http://www.openacc.org/, OpenACC} Application Programming Interface (API)
+for offloading of code to accelerator devices in C/C++ and Fortran, and
+the GNU implementation of the 
 @uref{http://www.openmp.org, OpenMP} Application Programming Interface (API)
 for multi-platform shared-memory parallel programming in C/C++ and Fortran.
 
@@ -81,23 +86,619 @@ for multi-platform shared-memory parallel programming in C/C++ and Fortran.
 @comment  better formatting.
 @comment
 @menu
-* Enabling OpenMP::            How to enable OpenMP for your applications.
-* Runtime Library Routines::   The OpenMP runtime application programming 
-                               interface.
-* Environment Variables::      Influencing runtime behavior with environment 
-                               variables.
-* The libgomp ABI::            Notes on the external ABI presented by libgomp.
-* Reporting Bugs::             How to report bugs in GNU OpenMP.
-* Copying::                    GNU general public license says
-                               how you can copy and share libgomp.
-* GNU Free Documentation License::
-                               How you can copy and share this manual.
-* Funding::                    How to help assure continued work for free 
-                               software.
-* Library Index::              Index of this documentation.
+* Enabling OpenACC::                 How to enable OpenACC for your
+                                     applications.
+* OpenACC Runtime Library Routines:: The OpenACC runtime application
+                                     programming interface.
+* OpenACC Environment Variables::    Influencing OpenACC runtime behavior with
+                                     environment variables.
+* OpenACC Library Interoperability:: OpenACC library interoperability with the
+                                     NVIDIA CUBLAS library.
+* Enabling OpenMP::                  How to enable OpenMP for your
+                                     applications.
+* OpenMP Runtime Library Routines: Runtime Library Routines.
+                                     The OpenMP runtime application programming
+                                     interface.
+* OpenMP Environment Variables: Environment Variables.
+                                     Influencing OpenMP runtime behavior with
+                                     environment variables.
+* The libgomp ABI::                  Notes on the external libgomp ABI.
+* Reporting Bugs::                   How to report bugs.
+* Copying::                          GNU general public license says how you
+                                     can copy and share libgomp.
+* GNU Free Documentation License::   How you can copy and share this manual.
+* Funding::                          How to help assure continued work for free
+                                     software.
+* Library Index::                    Index of this documentation.
 @end menu
 
 
+
+@c

Re: [PATCH 0/10] OpenACC 2.0 support for libgomp

2014-10-02 Thread Julian Brown
Hi,

On Wed, 24 Sep 2014 14:32:31 +0200
Jakub Jelinek ja...@redhat.com wrote:

> On Tue, Sep 23, 2014 at 07:17:25PM +0100, Julian Brown wrote:
> > The upcoming patch series constitutes our current (still
> > in-progress) implementation of run-time support for OpenACC 2.0 in
> > libgomp. We've tried to build on top of the (also currently WIP)
> > support for OpenMP 4.0's target construct, sharing code where
> > possible: because of this, I've also prepared versions of (a fairly
> > minimal, hopefully correct set of) prerequisite patches that apply
> > to current mainline (and were previously on the gomp 4.0 branch),
> > although in many cases we weren't the original authors of those.
> >
> > Other parts of the OpenACC support for GCC are being sent upstream
> > concurrently with this runtime support (and are co-dependent with
> > it), so unfortunately, though the main part of the implementation
> > (part 7/10) works on our internal branch, I haven't yet been able
> > to convincingly test the series I'm about to post upstream. However
> > this code will be useful to others who are posting their bits of
> > OpenACC support upstream, so perhaps it'd be useful to commit it
> > anyway (we have to start somewhere!).
>
> Just random comments about all the 10 patches:

Thanks for your comments -- I'm planning to address the things you've
brought up, but will probably change tack a little and do that work on
the gomp-4_0-branch (rather than working directly on mainline). That
way I can (hopefully) send incremental patches rather than working
entirely locally then sending another over-sized patch.

> Cache the return value?  Also, I must say I'm not particularly excited
> about different plugins not supporting both OpenMP 4.0 and OpenACC 2.0
> offloading.  Why is that needed?

For now, because OpenACC supports some stuff that (AFAIK!) OpenMP
doesn't, such as asynchronous execution. The eventual plan is for the
plugin interface to be generic, but we're not there yet.

> +  /* Make sure all the CUDA functions are there if any of them
> are.  */
> +  if (optional_present && optional_present != optional_total)
> +   {
> + err = "plugin missing OpenACC CUDA handler function";
> + goto out;
> +   }
>
> So, any plugin that doesn't support CUDA will not support OpenACC?
> I hoped OpenACC would not be so tied to one particular HW...

The intention was for that section to allow zero CUDA handling
functions, or all of them. For better or worse, OpenACC defines a few
APIs which are target-specific (for NVidia, AMD, Intel so far, IIRC).
An OpenACC application doesn't have to use any of those, of course.
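
As an illustration of the "zero or all" rule described above, here is a minimal sketch in C. The struct and its hook names are hypothetical and are not taken from the patch; only the counter names and the error string echo the quoted check.

```c
#include <stddef.h>

/* Hypothetical table of optional, target-specific hooks.  The field
   names are illustrative only; they are not the plugin's real symbols.  */
struct optional_hooks
{
  void *get_current_device;
  void *get_current_context;
  void *get_stream;
  void *set_stream;
};

/* Return NULL when the hooks are consistent (none present, or all
   present); otherwise return an error string.  */
static const char *
check_optional_hooks (const struct optional_hooks *h)
{
  const void *hooks[] = { h->get_current_device, h->get_current_context,
			  h->get_stream, h->set_stream };
  size_t optional_total = sizeof hooks / sizeof hooks[0];
  size_t optional_present = 0;
  size_t i;

  for (i = 0; i < optional_total; i++)
    if (hooks[i] != NULL)
      optional_present++;

  /* A partial set is the only rejected case.  */
  if (optional_present && optional_present != optional_total)
    return "plugin missing OpenACC CUDA handler function";

  return NULL;
}
```

Under this reading, a plugin that provides none of the target-specific hooks is accepted just like one that provides all of them.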

> that is not how ChangeLog entries should look like, if a line is not
> starting with ( after the tab, it should not contain extra spaces
> after the tab, so move "Use these." and "hack." (and in other spots)
> two columns to the left.

That was merely a copy/paste error of some sort, apologies.

Thanks,

Julian


[PATCH 1/10] OpenACC 2.0 support for libgomp - offloading support

2014-09-23 Thread Julian Brown
This patch is by Jakub Jelinek, and was originally posted here:

  https://gcc.gnu.org/ml/gcc-patches/2013-09/msg01098.html

Parts of the patch subsequently landed on mainline as part of the
following patch:

  https://gcc.gnu.org/ml/gcc-patches/2013-10/msg00505.html

But not the OpenMP target parts. This patch therefore contains the
delta between those two patches.

Julian

-xx-xx  Jakub Jelinek  ja...@redhat.com

libgomp/
* splay-tree.h: New file.
* target.c (splay_tree_node, splay_tree, splay_tree_key): New
typedefs. (struct target_mem_desc, struct splay_tree_key_s):
New structures. (splay_compare): New inline function.
* libgomp.h (gomp_get_num_devices): Add prototype.
(gomp_get_num_devices): Add FIXME comment.
(resolve_device): Use default_device_var ICV.  Add temporarily
magic testing device number 257.
(dev_splay_tree, dev_env_lock): New variables.
(gomp_map_vars_existing, gomp_map_vars, gomp_unmap_tgt,
gomp_unmap_vars, gomp_update): New functions.
(GOMP_target, GOMP_target_data, GOMP_target_end_data,
GOMP_target_update): Add support for magic testing device
number 257.
commit fc39aa98eba906466226c17fb455e57ebcfc1bc6
Author: Julian Brown jul...@codesourcery.com
Date:   Fri Sep 19 08:33:05 2014 -0700

Delta between upstream and gomp-4_0-branch version of r202620:

2013-09-16  Jakub Jelinek  ja...@redhat.com

   * splay-tree.h: New file.
   * target.c: Include stdbool.h.
   (splay_tree_node, splay_tree, splay_tree_key): New typedefs.
   (struct target_mem_desc, struct splay_tree_key_s): New structures.
   (splay_compare): New inline function.
   (gomp_get_num_devices): New function.
   (resolve_device): Use default_device_var ICV.  Add temporarily
   magic testing device number 257.
   (dev_splay_tree, dev_env_lock): New variables.
   (gomp_map_vars_existing, gomp_map_vars, gomp_unmap_tgt,
   gomp_unmap_vars, gomp_update): New functions.
   (GOMP_target, GOMP_target_data, GOMP_target_end_data,
   GOMP_target_update): Add support for magic testing device number 257.
   * libgomp.h (struct target_mem_desc): Forward declare.
   (struct gomp_task_icv): Add default_device_var and target_data.
   (gomp_get_num_devices): New prototype.
   * env.c (gomp_global_icv): Add default_device_var initializer.
   (parse_int): New function.
   (handle_omp_display_env): Print OMP_DEFAULT_DEVICE.
   (initialize_env): Initialize default_device_var.
   (omp_set_default_device): Set default_device_var ICV.
   (omp_get_default_device): Query default_device_var ICV.
   (omp_get_num_devices): Call gomp_get_num_devices.
   (omp_get_num_teams, omp_get_team_num, omp_is_initial_device): Add
   comments.

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index a1482cc..d53a326 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -608,6 +608,10 @@ extern void gomp_free_thread (void *);
 
 extern int gomp_get_num_devices (void);
 
+/* target.c */
+
+extern int gomp_get_num_devices (void);
+
 /* work.c */
 
 extern void gomp_init_work_share (struct gomp_work_share *, bool, unsigned);
diff --git a/libgomp/splay-tree.h b/libgomp/splay-tree.h
new file mode 100644
index 000..04a71d1
--- /dev/null
+++ b/libgomp/splay-tree.h
@@ -0,0 +1,232 @@
+/* A splay-tree datatype.
+   Copyright 1998-2013
+   Free Software Foundation, Inc.
+   Contributed by Mark Mitchell (m...@markmitchell.com).
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   http://www.gnu.org/licenses/.  */
+
+/* The splay tree code copied from include/splay-tree.h and adjusted,
+   so that all the data lives directly in splay_tree_node_s structure
+   and no extra allocations are needed.
+
+   Files including this header should before including it add:
+typedef struct splay_tree_node_s *splay_tree_node;
+typedef struct splay_tree_s

[PATCH 0/10] OpenACC 2.0 support for libgomp

2014-09-23 Thread Julian Brown
Hi,

The upcoming patch series constitutes our current (still in-progress)
implementation of run-time support for OpenACC 2.0 in libgomp. We've
tried to build on top of the (also currently WIP) support for OpenMP
4.0's target construct, sharing code where possible: because of this,
I've also prepared versions of (a fairly minimal, hopefully correct set
of) prerequisite patches that apply to current mainline (and were
previously on the gomp 4.0 branch), although in many cases we weren't
the original authors of those.

Other parts of the OpenACC support for GCC are being sent upstream
concurrently with this runtime support (and are co-dependent with it),
so unfortunately, though the main part of the implementation (part 7/10)
works on our internal branch, I haven't yet been able to convincingly
test the series I'm about to post upstream. However this code will be
useful to others who are posting their bits of OpenACC support
upstream, so perhaps it'd be useful to commit it anyway (we have to
start somewhere!).

I've tried to retain proper attribution for all the forthcoming patches,
but I may have made mistakes. Please let me know if so!

Thanks,

Julian


[PATCH 2/10] OpenACC 2.0 support for libgomp - initial plugin support

2014-09-23 Thread Julian Brown
This patch is by Michael Zolotukhin and was originally posted here:

  https://gcc.gnu.org/ml/gcc-patches/2013-09/msg01469.html

It contains an initial implementation of plugin support for libgomp,
for implementing different hardware devices for pieces of accelerated
code to be offloaded to.

I also merged a minor follow-up fix by Thomas Schwinge.

Julian

-xx-xx  Michael Zolotukhin  michael.v.zolotuk...@intel.com
Thomas Schwinge  tho...@codesourcery.com

   * configure.ac: Add checks for plugins support.
   * config.h.in: Regenerated.
   * configure: Regenerated.
   * target.c (struct target_mem_desc): Add device_descr field.
   (devices): New.
   (num_devices): New.
   (struct gomp_device_descr): New.
   (gomp_get_num_devices): Call gomp_target_init.
   (resolve_device): Return device_descr instead of int.
   (gomp_map_vars): Add devicep argument and update the function
   accordingly.
   (gomp_unmap_tgt): Likewise.
   (gomp_unmap_vars): Likewise.
   (gomp_update): Likewise.
   (GOMP_target): Use device_descr struct.
   (GOMP_target_data): Likewise.
   (GOMP_target_update): Likewise.
   (gomp_check_plugin_file_name): New.
   (gomp_load_plugin_for_device): New.
   (gomp_find_available_plugins): New.
   (gomp_target_init): New.
commit 75ef137a74cbd6af36a75b30edf60350ec9eae0d
Author: Julian Brown jul...@codesourcery.com
Date:   Fri Sep 19 08:51:44 2014 -0700

Merge of r202827.

-xx-xx  Michael Zolotukhin  michael.v.zolotuk...@intel.com
	Thomas Schwinge  tho...@codesourcery.com

   * configure.ac: Add checks for plugins support.
   * config.h.in: Regenerated.
   * configure: Regenerated.
   * target.c (struct target_mem_desc): Add device_descr field.
   (devices): New.
   (num_devices): New.
   (struct gomp_device_descr): New.
   (gomp_get_num_devices): Call gomp_target_init.
   (resolve_device): Return device_descr instead of int.
   (gomp_map_vars): Add devicep argument and update the function
   accordingly.
   (gomp_unmap_tgt): Likewise.
   (gomp_unmap_vars): Likewise.
   (gomp_update): Likewise.
   (GOMP_target): Use device_descr struct.
   (GOMP_target_data): Likewise.
   (GOMP_target_update): Likewise.
   (gomp_check_plugin_file_name): New.
   (gomp_load_plugin_for_device): New.
   (gomp_find_available_plugins): New.
   (gomp_target_init): New.

diff --git a/libgomp/config.h.in b/libgomp/config.h.in
index 14c7e2a..67f5420 100644
--- a/libgomp/config.h.in
+++ b/libgomp/config.h.in
@@ -30,6 +30,9 @@
 /* Define to 1 if you have the <inttypes.h> header file. */
 #undef HAVE_INTTYPES_H
 
+/* Define to 1 if you have the `dl' library (-ldl). */
+#undef HAVE_LIBDL
+
 /* Define to 1 if you have the <memory.h> header file. */
 #undef HAVE_MEMORY_H
 
@@ -107,6 +110,9 @@
 /* Define to the version of this package. */
 #undef PACKAGE_VERSION
 
+/* Define if all infrastructure, needed for plugins, is supported. */
+#undef PLUGIN_SUPPORT
+
 /* The size of `char', as computed by sizeof. */
 #undef SIZEOF_CHAR
 
diff --git a/libgomp/configure b/libgomp/configure
index 766eb09..704f22a 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -15052,6 +15052,69 @@ fi
 rm -f core conftest.err conftest.$ac_objext \
 conftest$ac_exeext conftest.$ac_ext
 
+plugin_support=yes
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for dlsym in -ldl" >&5
+$as_echo_n "checking for dlsym in -ldl... " >&6; }
+if test "${ac_cv_lib_dl_dlsym+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_check_lib_save_LIBS=$LIBS
+LIBS="-ldl  $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char dlsym ();
+int
+main ()
+{
+return dlsym ();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  ac_cv_lib_dl_dlsym=yes
+else
+  ac_cv_lib_dl_dlsym=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+conftest$ac_exeext conftest.$ac_ext
+LIBS=$ac_check_lib_save_LIBS
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_dl_dlsym" >&5
+$as_echo "$ac_cv_lib_dl_dlsym" >&6; }
+if test "x$ac_cv_lib_dl_dlsym" = x""yes; then :
+  cat >>confdefs.h <<_ACEOF
+#define HAVE_LIBDL 1
+_ACEOF
+
+  LIBS="-ldl $LIBS"
+
+else
+  plugin_support=no
+fi
+
+ac_fn_c_check_header_mongrel "$LINENO" "dirent.h" "ac_cv_header_dirent_h" "$ac_includes_default"
+if test "x$ac_cv_header_dirent_h" = x""yes; then :
+
+else
+  plugin_support=no
+fi
+
+
+
+if test x$plugin_support = xyes; then
+
+$as_echo "#define PLUGIN_SUPPORT 1" >>confdefs.h
+
+fi
+
 # Check for functions needed.
 for ac_func in getloadavg clock_gettime strtoull
 do :
diff --git a/libgomp

[PATCH 4/10] OpenACC 2.0 support for libgomp - host plugin

2014-09-23 Thread Julian Brown
This patch was originally by Thomas Schwinge and was posted here:

  https://gcc.gnu.org/ml/gcc-patches/2014-02/msg01172.html

It implements a plugin for host execution that can be used for testing
non-shared-memory semantics on a virtual target device. It's merged
with a minor follow-up patch, also by Thomas.
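
To illustrate the idea of a host plugin that behaves like a separate, non-shared-memory device, the following is a small sketch in C. The `sketch_*` names are hypothetical: they mirror the shape of the `device_alloc`, `device_free`, `device_dev2host` and `device_host2dev` hooks named in the ChangeLog, but this is not the libgomp descriptor type.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical descriptor holding the four memory hooks.  */
struct sketch_device
{
  void *(*alloc) (size_t);
  void (*free_mem) (void *);
  void *(*dev2host) (void *, const void *, size_t);
  void *(*host2dev) (void *, const void *, size_t);
};

/* On the host "device", both transfer directions are plain copies.  */
static void *
sketch_copy (void *dest, const void *src, size_t n)
{
  return memcpy (dest, src, n);
}

static const struct sketch_device sketch_host_device =
  { malloc, free, sketch_copy, sketch_copy };

/* Copy N bytes to the device and back again.  Returns 0 on success,
   -1 if the device allocation failed, 1 if the data came back changed.  */
static int
sketch_round_trip (const struct sketch_device *dev, const void *in,
		   void *out, size_t n)
{
  void *devmem = dev->alloc (n);

  if (devmem == NULL)
    return -1;
  dev->host2dev (devmem, in, n);
  dev->dev2host (out, devmem, n);
  dev->free_mem (devmem);
  return memcmp (in, out, n) != 0;
}
```

Because the "device" copy lives in a separate allocation, code that forgets a transfer sees stale data, which is the property that makes such a plugin useful for testing.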

Julian

-xx-xx  Thomas Schwinge  tho...@codesourcery.com
James Norris  jnor...@codesourcery.com

   * plugin-host.c: New file.
   * target.c (struct gomp_device_descr): Add device_alloc_func,
   device_free_func, device_dev2host_func, device_host2dev_func
   members.
   (gomp_load_plugin_for_device): Load these.
   (gomp_map_vars, gomp_unmap_tgt, gomp_unmap_vars, gomp_update):
 Use these.
   (resolve_device, gomp_find_available_plugins): Remove ID 257
 hack.
commit 1adb683c08079789d013713751a15803b26f11c2
Author: Julian Brown jul...@codesourcery.com
Date:   Fri Sep 19 09:07:08 2014 -0700

Merge r207938.

2014-02-20  Thomas Schwinge  tho...@codesourcery.com
	James Norris  jnor...@codesourcery.com

   * plugin-host.c: New file.
   * target.c (struct gomp_device_descr): Add device_alloc_func,
   device_free_func, device_dev2host_func, device_host2dev_func
   members.
   (gomp_load_plugin_for_device): Load these.
   (gomp_map_vars, gomp_unmap_tgt, gomp_unmap_vars, gomp_update): Use
   these.
   (resolve_device, gomp_find_available_plugins): Remove ID 257 hack.

Merge r207940.

2014-02-20  Thomas Schwinge  tho...@codesourcery.com

	* target.c (gomp_load_plugin_for_device): Don't call dlclose if
	dlopen failed.

diff --git a/libgomp/plugin-host.c b/libgomp/plugin-host.c
new file mode 100644
index 000..5354ebe
--- /dev/null
+++ b/libgomp/plugin-host.c
@@ -0,0 +1,84 @@
+/* Plugin for non-shared memory host execution.
+
+   Copyright (C) 2014 Free Software Foundation, Inc.
+
+   Contributed by Thomas Schwinge tho...@codesourcery.com.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   http://www.gnu.org/licenses/.  */
+
+/* Simple implementation of a libgomp plugin for non-shared memory host
+   execution.  */
+
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+bool
+device_available (void)
+{
+#ifdef DEBUG
+  printf ("libgomp plugin: %s:%s\n", __FILE__, __FUNCTION__);
+#endif
+
+  return true;
+}
+
+void *
+device_alloc (size_t size)
+{
+  void *ptr = malloc (size);
+
+#ifdef DEBUG
+  printf ("libgomp plugin: %s:%s (%zd): %p\n", __FILE__, __FUNCTION__, size, ptr);
+#endif
+
+  return ptr;
+}
+
+void
+device_free (void *ptr)
+{
+#ifdef DEBUG
+  printf ("libgomp plugin: %s:%s (%p)\n", __FILE__, __FUNCTION__, ptr);
+#endif
+
+  free (ptr);
+}
+
+void *device_dev2host (void *dest, const void *src, size_t n)
+{
+#ifdef DEBUG
+  printf ("libgomp plugin: %s:%s (%p, %p, %zd)\n", __FILE__, __FUNCTION__, dest, src, n);
+#endif
+
+  return memcpy (dest, src, n);
+}
+
+void *device_host2dev (void *dest, const void *src, size_t n)
+{
+#ifdef DEBUG
+  printf ("libgomp plugin: %s:%s (%p, %p, %zd)\n", __FILE__, __FUNCTION__, dest, src, n);
+#endif
+
+  return memcpy (dest, src, n);
+}
diff --git a/libgomp/target.c b/libgomp/target.c
index f1e776b..d0db4c2 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -122,6 +122,10 @@ struct gomp_device_descr
 
   /* Function handlers.  */
   bool (*device_available_func) (void);
+  void *(*device_alloc_func) (size_t);
+  void (*device_free_func) (void *);
+  void *(*device_dev2host_func)(void *, const void *, size_t);
+  void *(*device_host2dev_func)(void *, const void *, size_t);
 
   /* Splay tree containing information about mapped memory regions.  */
   struct splay_tree_s dev_splay_tree;
@@ -145,14 +149,10 @@ resolve_device (int device_id)
   struct gomp_task_icv *icv = gomp_icv (false);
   device_id = icv->default_device_var;
 }
-  if (device_id >= gomp_get_num_devices ()
-      && device_id != 257)
+  if (device_id < 0
+      || device_id >= gomp_get_num_devices

[PATCH 3/10] OpenACC 2.0 support for libgomp - Don't update copy_from for existing mappings

2014-09-23 Thread Julian Brown
This patch is by Ilya Verbin and was originally posted here:

  https://gcc.gnu.org/ml/gcc-patches/2014-02/msg01011.html

This is a fix for OpenMP semantics re: mapping of memory for a target
device.
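
A minimal sketch in C of the behaviour this fix aims for, assuming (from the hunk in this message) that reusing an existing mapping should only bump its reference count and leave its `copy_from` flag alone. The `sketch_*` names are hypothetical; the field names follow those visible in the diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for a mapped host range.  */
struct sketch_mapping
{
  uintptr_t host_start;
  uintptr_t host_end;
  bool copy_from;
  unsigned refcount;
};

/* Reuse OLDN for the requested range [HOST_START, HOST_END).  Returns
   false if the request is not contained in the existing mapping.  */
static bool
sketch_reuse_mapping (struct sketch_mapping *oldn, uintptr_t host_start,
		      uintptr_t host_end, bool wants_copy_from)
{
  if (host_start < oldn->host_start || host_end > oldn->host_end)
    return false;

  /* Deliberately ignored: the new request does not change whether the
     existing mapping is copied back to the host.  */
  (void) wants_copy_from;

  oldn->refcount++;
  return true;
}
```

With this shape, whether data is copied back is decided once, by the construct that created the mapping.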

Julian

-xx-xx  Ilya Verbin  ilya.ver...@intel.com

   * target.c (gomp_map_vars_existing): Don't update copy_from for
 the existing mappings.
commit 76da6cdeb61190c6b39f02656a91a24e26bc3006
Author: Julian Brown jul...@codesourcery.com
Date:   Fri Sep 19 09:03:49 2014 -0700

Merge r207897.

2014-02-17  Ilya Verbin  ilya.ver...@intel.com

   * target.c (gomp_map_vars_existing): Don't update copy_from for the
   existing mappings.

diff --git a/libgomp/target.c b/libgomp/target.c
index 55b3781..f1e776b 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -170,11 +170,6 @@ gomp_map_vars_existing (splay_tree_key oldn, splay_tree_key newn,
 		"[%p..%p) is already mapped",
 		(void *) newn->host_start, (void *) newn->host_end,
 		(void *) oldn->host_start, (void *) oldn->host_end);
-  if (((kind & 7) == 2 || (kind & 7) == 3)
-      && !oldn->copy_from
-      && oldn->host_start == newn->host_start
-      && oldn->host_end == newn->host_end)
-    oldn->copy_from = true;
   oldn->refcount++;
 }
 


[PATCH 6/10] OpenACC 2.0 support for libgomp - Fortran bits

2014-09-23 Thread Julian Brown
This patch is by Thomas Schwinge and Jakub Jelinek, and was originally
posted here:

  https://gcc.gnu.org/ml/gcc-patches/2014-07/msg00656.html

It adds some mappings required by the OpenACC implementation for
Fortran.

Julian

-xx-xx  Thomas Schwinge  tho...@codesourcery.com
Jakub Jelinek  ja...@redhat.com

   * target.c (gomp_map_vars, gomp_unmap_vars, gomp_update): Support
   NULL mappings as well as mapping kind OMP_CLAUSE_MAP_TO_PSET.
   Also, some code reformatting.
commit b661af0d60506bf174b687dbd0a590bacd0a4ed4
Author: Julian Brown jul...@codesourcery.com
Date:   Fri Sep 19 09:18:03 2014 -0700

Merge r212405.

2014-07-09  Thomas Schwinge  tho...@codesourcery.com
	Jakub Jelinek  ja...@redhat.com

	* target.c (gomp_map_vars, gomp_unmap_vars, gomp_update): Support
	NULL mappings as well as mapping kind OMP_CLAUSE_MAP_TO_PSET.
	Also, some code reformatting.

diff --git a/libgomp/target.c b/libgomp/target.c
index ef62228..64b787e 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -238,6 +238,11 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
   gomp_mutex_lock (&devicep->dev_env_lock);
   for (i = 0; i < mapnum; i++)
 {
+  if (hostaddrs[i] == NULL)
+	{
+	  tgt->list[i] = NULL;
+	  continue;
+	}
   cur_node.host_start = (uintptr_t) hostaddrs[i];
   if ((kinds[i] & 7) != 4)
 	cur_node.host_end = cur_node.host_start + sizes[i];
@@ -259,6 +264,22 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	tgt_align = align;
 	  tgt_size = (tgt_size + align - 1) & ~(align - 1);
 	  tgt_size += cur_node.host_end - cur_node.host_start;
+	  if ((kinds[i] & 7) == 5)
+	{
+	  size_t j;
+	  for (j = i + 1; j < mapnum; j++)
+		if ((kinds[j] & 7) != 4)
+		  break;
+		else if ((uintptr_t) hostaddrs[j] < cur_node.host_start
+			 || ((uintptr_t) hostaddrs[j] + sizeof (void *)
+			     > cur_node.host_end))
+		  break;
+		else
+		  {
+		tgt->list[j] = NULL;
+		i++;
+		  }
+	}
 	}
 }
 
@@ -281,10 +302,13 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 {
   tgt->array = gomp_malloc (not_found_cnt * sizeof (*tgt->array));
   splay_tree_node array = tgt->array;
+  size_t j;
 
   for (i = 0; i < mapnum; i++)
 	if (tgt->list[i] == NULL)
 	  {
+	if (hostaddrs[i] == NULL)
+	  continue;
 	splay_tree_key k = &array->key;
 	k->host_start = (uintptr_t) hostaddrs[i];
 	if ((kinds[i] & 7) != 4)
@@ -324,14 +348,25 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		/* FIXME: Perhaps add some smarts, like if copying
 		   several adjacent fields from host to target, use some
 		   host buffer to avoid sending each var individually.  */
-		devicep->device_host2dev_func((void *) (tgt->tgt_start
-			+ k->tgt_offset),
-		  (void *) k->host_start,
-		  k->host_end - k->host_start);
+		devicep->device_host2dev_func
+		  ((void *) (tgt->tgt_start + k->tgt_offset),
+		   (void *) k->host_start,
+		   k->host_end - k->host_start);
 		break;
 		  case 4: /* POINTER */
 		cur_node.host_start
 		  = (uintptr_t) *(void **) k->host_start;
+		if (cur_node.host_start == (uintptr_t) NULL)
+		  {
+			cur_node.tgt_offset = (uintptr_t) NULL;
+			/* Copy from host to device memory.  */
+			/* FIXME: see above FIXME comment.  */
+			devicep->device_host2dev_func
+			  ((void *) (tgt->tgt_start + k->tgt_offset),
+			   (void *) &cur_node.tgt_offset,
+			   sizeof (void *));
+			break;
+		  }
 		/* Add bias to the pointer value.  */
 		cur_node.host_start += sizes[i];
 		cur_node.host_end = cur_node.host_start + 1;
@@ -363,11 +398,86 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		cur_node.tgt_offset -= sizes[i];
 		/* Copy from host to device memory.  */
 		/* FIXME: see above FIXME comment.  */
-		devicep->device_host2dev_func ((void *) (tgt->tgt_start
-			 + k->tgt_offset),
-		   (void *) &cur_node.tgt_offset,
-		   sizeof (void *));
+		devicep->device_host2dev_func
+		  ((void *) (tgt->tgt_start + k->tgt_offset),
+		   (void *) &cur_node.tgt_offset,
+		   sizeof (void *));
 		break;
+		  case 5: /* TO_PSET */
+		/* Copy from host to device memory.  */
+		/* FIXME: see above FIXME comment.  */
+		devicep->device_host2dev_func
+		  ((void *) (tgt->tgt_start + k->tgt_offset),
+		   (void *) k->host_start,
+		   (k->host_end - k->host_start));
+		for (j = i + 1; j < mapnum; j++)
+		  if ((kinds[j] & 7) != 4)
+			break;
+		  else if ((uintptr_t) hostaddrs[j] < k->host_start
+			   || ((uintptr_t) hostaddrs[j] + sizeof (void *)
+			       > k->host_end))
+			break;
+		  else
+			{
+			  tgt->list[j] = k;
+			  k->refcount++;
+			  cur_node.host_start
+			= (uintptr_t) *(void **) hostaddrs[j];
+			  if (cur_node.host_start == (uintptr_t) NULL)
+			{
+			  cur_node.tgt_offset = (uintptr_t

[PATCH 5/10] OpenACC 2.0 support for libgomp - offload image registration

2014-09-23 Thread Julian Brown
This patch is by Ilya Verbin and was originally posted here:

  https://gcc.gnu.org/ml/gcc-patches/2014-03/msg00591.html

It implements a scheme for offloaded target-device code to register
itself with the libgomp runtime.
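
The registration scheme can be pictured with a small sketch in C: offload images record themselves in a global table, and the runtime later picks out the images that match a given device type. All `sketch_*` names, the fixed table size and the integer target type are assumptions made for illustration; they are not the signature of `GOMP_offload_register`.

```c
#include <stddef.h>

#define SKETCH_MAX_IMAGES 8

/* Hypothetical record of one registered offload image.  */
struct sketch_image
{
  void *host_table;
  int target_type;
  void *target_data;
};

static struct sketch_image sketch_images[SKETCH_MAX_IMAGES];
static size_t sketch_num_images;

/* Record an image for later use.  Returns 0 on success, -1 if the
   fixed-size table is full.  */
static int
sketch_offload_register (void *host_table, int target_type,
			 void *target_data)
{
  if (sketch_num_images == SKETCH_MAX_IMAGES)
    return -1;

  sketch_images[sketch_num_images].host_table = host_table;
  sketch_images[sketch_num_images].target_type = target_type;
  sketch_images[sketch_num_images].target_data = target_data;
  sketch_num_images++;
  return 0;
}

/* Count the registered images that match TARGET_TYPE, as a device
   plugin of that type might do when it is first loaded.  */
static size_t
sketch_count_images_for (int target_type)
{
  size_t i, count = 0;

  for (i = 0; i < sketch_num_images; i++)
    if (sketch_images[i].target_type == target_type)
      count++;
  return count;
}
```

Separating registration from use lets images register before any device plugin has been loaded.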

Julian

-xx-xx  Ilya Verbin  ilya.ver...@intel.com

   * libgomp.map (GOMP_4.0.1): New symbol version.
   Add GOMP_offload_register.
   * plugin-host.c (device_available): Replace with:
   (get_num_devices): This.
   (get_type): New.
   (offload_register): Ditto.
   (device_init): Ditto.
   (device_get_table): Ditto.
   (device_run): Ditto.
   * target.c (target_type): New enum.
   (offload_image_descr): New struct.
   (offload_images, num_offload_images): New globals.
   (struct gomp_device_descr): Remove device_available_func.
   Add type, is_initialized, get_type_func, get_num_devices_func,
   offload_register_func, device_init_func, device_get_table_func,
   device_run_func.
   (mapping_table): New struct.
   (GOMP_offload_register): New function.
   (gomp_init_device): Ditto.
   (GOMP_target): Add device initialization and lookup for target
 fn. (GOMP_target_data): Add device initialization.
   (GOMP_target_update): Ditto.
   (gomp_load_plugin_for_device): Take handles for get_type,
   get_num_devices, offload_register, device_init, device_get_table,
   device_run functions.
   (gomp_register_images_for_device): New function.
   (gomp_find_available_plugins): Add registration of offload
 images.
commit a8ad9504670363d8fd68e8e29f4a7455aae14446
Author: Julian Brown jul...@codesourcery.com
Date:   Fri Sep 19 09:16:11 2014 -0700

Merge r208657.

2014-03-18  Ilya Verbin  ilya.ver...@intel.com

   * libgomp.map (GOMP_4.0.1): New symbol version.
   Add GOMP_offload_register.
   * plugin-host.c (device_available): Replace with:
   (get_num_devices): This.
   (get_type): New.
   (offload_register): Ditto.
   (device_init): Ditto.
   (device_get_table): Ditto.
   (device_run): Ditto.
   * target.c (target_type): New enum.
   (offload_image_descr): New struct.
   (offload_images, num_offload_images): New globals.
   (struct gomp_device_descr): Remove device_available_func.
   Add type, is_initialized, get_type_func, get_num_devices_func,
   offload_register_func, device_init_func, device_get_table_func,
   device_run_func.
   (mapping_table): New struct.
   (GOMP_offload_register): New function.
   (gomp_init_device): Ditto.
   (GOMP_target): Add device initialization and lookup for target fn.
   (GOMP_target_data): Add device initialization.
   (GOMP_target_update): Ditto.
   (gomp_load_plugin_for_device): Take handles for get_type,
   get_num_devices, offload_register, device_init, device_get_table,
   device_run functions.
   (gomp_register_images_for_device): New function.
   (gomp_find_available_plugins): Add registration of offload images.

diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index b102fd8..f36df23 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -227,3 +227,8 @@ GOMP_4.0 {
 	GOMP_target_update;
 	GOMP_teams;
 } GOMP_3.0;
+
+GOMP_4.0.1 {
+  global:
+	GOMP_offload_register;
+} GOMP_4.0;
diff --git a/libgomp/plugin-host.c b/libgomp/plugin-host.c
index 5354ebe..ec0c78c 100644
--- a/libgomp/plugin-host.c
+++ b/libgomp/plugin-host.c
@@ -33,14 +33,53 @@
 #include <stdlib.h>
 #include <string.h>
 
-bool
-device_available (void)
+const int TARGET_TYPE_HOST = 0;
+
+int
+get_type (void)
 {
 #ifdef DEBUG
   printf ("libgomp plugin: %s:%s\n", __FILE__, __FUNCTION__);
 #endif
 
-  return true;
+  return TARGET_TYPE_HOST;
+}
+
+int
+get_num_devices (void)
+{
+#ifdef DEBUG
+  printf ("libgomp plugin: %s:%s\n", __FILE__, __FUNCTION__);
+#endif
+
+  return 1;
+}
+
+void
+offload_register (void *host_table, void *target_data)
+{
+#ifdef DEBUG
+  printf ("libgomp plugin: %s:%s (%p, %p)\n", __FILE__, __FUNCTION__,
+	  host_table, target_data);
+#endif
+}
+
+void
+device_init (void)
+{
+#ifdef DEBUG
+  printf ("libgomp plugin: %s:%s\n", __FILE__, __FUNCTION__);
+#endif
+}
+
+int
+device_get_table (void *table)
+{
+#ifdef DEBUG
+  printf ("libgomp plugin: %s:%s (%p)\n", __FILE__, __FUNCTION__, table);
+#endif
+
+  return 0;
 }
 
 void *
@@ -82,3 +121,16 @@ void *device_host2dev (void *dest, const void *src, size_t n)
 
   return memcpy (dest, src, n);
 }
+
+void
+device_run (void *fn_ptr, void *vars)
+{
+#ifdef DEBUG
+  printf ("libgomp plugin: %s:%s (%p, %p)\n", __FILE__, __FUNCTION__, fn_ptr,
+	  vars);
+#endif
+
+  void (*fn)(void *) = (void (*)(void *)) fn_ptr;
+
+  fn (vars);
+}
diff --git a/libgomp/target.c b/libgomp/target.c
index d0db4c2..ef62228 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -84,6 +84,26 @@ struct

[PATCH 8/10] OpenACC 2.0 support for libgomp - temporarily work around missing __builtin_acc_on_device

2014-09-23 Thread Julian Brown
The patches implementing __builtin_acc_on_device are still in
processing. For the time being this patch removes the dependency on
that builtin in the OpenACC runtime.

Julian

-xx-xx  Julian Brown  jul...@codesourcery.com

libgomp/
* oacc-init.c (acc_on_device): Temporarily hard-code for host
instead of using __builtin_acc_on_device.
commit b74fb2fcb435b646499e9558a64b3989b64ad943
Author: Julian Brown jul...@codesourcery.com
Date:   Fri Sep 19 11:28:11 2014 -0700

Work around lack of __builtin_acc_on_device for now

diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index af2d2aa..35fe643 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -451,8 +451,20 @@ ialias (acc_set_device_num)
 int
 acc_on_device (acc_device_t dev)
 {
+#if 1
+  /* Support for __builtin_acc_on_device comes in later patches.  */
+  switch (dev)
+    {
+    case acc_device_none:
+    case acc_device_host:
+      return 1;
+    default:
+      return 0;
+    }
+#else
   /* Just rely on the compiler builtin.  */
   return __builtin_acc_on_device (dev);
+#endif
 }
 ialias (acc_on_device)
 


[PATCH 9/10] OpenACC 2.0 support for libgomp - outline documentation

2014-09-23 Thread Julian Brown
This patch provides some documentation for the new OpenACC bits in
libgomp.

Julian

-xx-xx  Thomas Schwinge  tho...@codesourcery.com
James Norris  jnor...@codesourcery.com

libgomp/
* libgomp.texi: Outline documentation for OpenACC.
commit c1b3a366e95ff50d8f30fb0e942c0c25a51108c7
Author: Julian Brown jul...@codesourcery.com
Date:   Mon Sep 22 02:45:29 2014 -0700

OpenACC documentation.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 254be57..9530a2b 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -31,10 +31,12 @@ texts being (a) (see below), and with the Back-Cover Texts being (b)
 @ifinfo
 @dircategory GNU Libraries
 @direntry
-* libgomp: (libgomp).GNU OpenMP runtime library
+* libgomp: (libgomp).GNU OpenACC and OpenMP runtime library
 @end direntry
 
-This manual documents the GNU implementation of the OpenMP API for 
+This manual documents the GNU implementation of the OpenACC API for 
+offloading of code to accelerator devices in C/C++ and Fortran and
+the GNU implementation of the OpenMP API for 
 multi-platform shared-memory parallel programming in C/C++ and Fortran.
 
 Published by the Free Software Foundation
@@ -48,7 +50,7 @@ Boston, MA 02110-1301 USA
 @setchapternewpage odd
 
 @titlepage
-@title The GNU OpenMP Implementation
+@title The GNU OpenACC and OpenMP Implementation
 @page
 @vskip 0pt plus 1filll
 @comment For the @value{version-GCC} Version*
@@ -69,7 +71,10 @@ Boston, MA 02110-1301, USA@*
 @top Introduction
 @cindex Introduction
 
-This manual documents the usage of libgomp, the GNU implementation of the 
+This manual documents the usage of libgomp, the GNU implementation of the
+@uref{http://www.openacc.org/, OpenACC} Application Programming Interface (API)
+for offloading of code to accelerator devices in C/C++ and Fortran, and
+the GNU implementation of the 
 @uref{http://www.openmp.org, OpenMP} Application Programming Interface (API)
 for multi-platform shared-memory parallel programming in C/C++ and Fortran.
 
@@ -81,23 +86,619 @@ for multi-platform shared-memory parallel programming in C/C++ and Fortran.
 @comment  better formatting.
 @comment
 @menu
-* Enabling OpenMP::            How to enable OpenMP for your applications.
-* Runtime Library Routines::   The OpenMP runtime application programming
-                               interface.
-* Environment Variables::      Influencing runtime behavior with environment
-                               variables.
-* The libgomp ABI::            Notes on the external ABI presented by libgomp.
-* Reporting Bugs::             How to report bugs in GNU OpenMP.
-* Copying::                    GNU general public license says
-                               how you can copy and share libgomp.
-* GNU Free Documentation License::
-                               How you can copy and share this manual.
-* Funding::                    How to help assure continued work for free
-                               software.
-* Library Index::              Index of this documentation.
+* Enabling OpenACC::                 How to enable OpenACC for your
+                                     applications.
+* OpenACC Runtime Library Routines:: The OpenACC runtime application
+                                     programming interface.
+* OpenACC Environment Variables::    Influencing OpenACC runtime behavior with
+                                     environment variables.
+* OpenACC Library Interoperability:: OpenACC library interoperability with the
+                                     NVIDIA CUBLAS library.
+* Enabling OpenMP::                  How to enable OpenMP for your
+                                     applications.
+* OpenMP Runtime Library Routines: Runtime Library Routines.
+                                     The OpenMP runtime application programming
+                                     interface.
+* OpenMP Environment Variables: Environment Variables.
+                                     Influencing OpenMP runtime behavior with
+                                     environment variables.
+* The libgomp ABI::                  Notes on the external libgomp ABI.
+* Reporting Bugs::                   How to report bugs.
+* Copying::                          GNU general public license says how you
+                                     can copy and share libgomp.
+* GNU Free Documentation License::   How you can copy and share this manual.
+* Funding::                          How to help assure continued work for free
+                                     software.
+* Library Index::                    Index of this documentation.
 @end menu
 
 
+
+@c -
+@c Enabling OpenACC
+@c -
+
+@node Enabling OpenACC
+@chapter Enabling OpenACC
+
+To activate the OpenACC extensions for C/C++ and Fortran

Re: GCC ARM: aligned access

2014-09-02 Thread Julian Brown
On Mon, 1 Sep 2014 09:14:31 +0800
Peng Fan van.free...@gmail.com wrote:

 On 09/01/2014 08:09 AM, Matt Thomas wrote:
  
  On Aug 31, 2014, at 11:32 AM, Joel Sherrill
  joel.sherr...@oarcorp.com wrote:
  I think this is totally expected. You were passed a u8 pointer
  which is aligned for that type (no restrictions likely). You cast
  it to a type with stricter alignment requirements. The code is
  just flawed. Some CPUs handle unaligned accesses but not your ARM.
  
 ARMv7 and ARMv6 (except ARMv6-M) support unaligned access. When a u8
 pointer is cast to a u32 pointer, should GCC take the alignment
 problem into consideration to avoid possible errors, because of
 -mno-unaligned-access?

Using -munaligned-access (or its inverse) isn't enough to make GCC
generate code that can perform arbitrary unaligned accesses, because
several instructions (e.g. VFP loads/stores or load/store multiple
instructions IIRC) must still act on naturally-aligned data even when
the hardware flag to enable unaligned accesses is on, and those
instructions will still be generated by GCC when they are considered
safe, i.e. when not doing explicitly-unaligned accesses in packed
structures or similar.

It would be *possible* to add an option to the backend to allow
arbitrary alignment for any access, I think, but it's not at all clear
that it's a good idea, and would certainly negatively affect
performance.

(If you need unaligned accesses, you can use e.g. memcpy, and that will
probably generate good inline code.)
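
A minimal sketch of the memcpy approach, for concreteness. The function names are made up for this example; the technique itself is standard C and does not depend on -munaligned-access:

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from a pointer with no alignment guarantee.
   Copying through memcpy avoids casting P to uint32_t *, so the
   compiler never assumes P is 4-byte aligned.  */
static uint32_t
load_u32_unaligned (const uint8_t *p)
{
  uint32_t v;
  memcpy (&v, p, sizeof v);
  return v;
}

/* The matching store.  */
static void
store_u32_unaligned (uint8_t *p, uint32_t v)
{
  memcpy (p, &v, sizeof v);
}
```

The value is read in the target's native byte order, exactly as a direct u32 load would. Whether the copy is expanded inline depends on the compiler, target and optimization level.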

Julian


Re: [PATCH] Fix GDB PR15559 (inferior calls using thiscall calling convention)

2014-06-24 Thread Julian Brown
On Fri, 9 May 2014 17:33:41 +0100
Julian Brown jul...@codesourcery.com wrote:

 On Wed, 7 May 2014 09:41:27 -0600
 Tom Tromey tro...@redhat.com wrote:
 
  Tom The usual approach is some appropriate text somewhere on the
  Tom GCC wiki (though I suppose a note in the mail archives would
  Tom do in a pinch) along with a URL in a comment in the
  Tom appropriate file (dwarf2.h or dwarf2.def).
  
  Tom Could you please do that?
  
  Julian How's this, as a first attempt?
  Julian http://gcc.gnu.org/wiki/GNUDwarfExtensions
  
  Sorry I didn't reply to this sooner.
  That page looks great.  Thanks for doing this.
 
 Thanks! Now, does anyone want to review the patch itself? :-)

Ping?

Julian


Re: RTABI half-precision conversion functions (ping)

2014-06-24 Thread Julian Brown
On Thu, 29 May 2014 11:16:52 +0100
Julian Brown jul...@codesourcery.com wrote:

 On Thu, 19 Jul 2012 14:47:54 +0100
 Julian Brown jul...@codesourcery.com wrote:
 
  On Thu, 19 Jul 2012 13:54:57 +0100
  Paul Brook p...@codesourcery.com wrote:
  
But, that means EABI-conformant callers are also perfectly
entitled to sign-extend half-float values before calling our
helper functions (although GCC itself won't do that). Using
unsigned int and taking care to only examine the low-order
bits of the value in the helper function itself serves to fix
the latent bug, is compatible with existing code, allows us to
be conformant with the eabi, and allows use of aliases to make
the __gnu and __aeabi functions the same.
   
   As long as LTO never sees this mismatch we should be fine :-)
   AFAIK we don't currently have any way of expressing the actual ABI.
  
  Let's not worry about that for now :-).
  
The patch no longer applied as-is, so I've updated it (attached,
re-tested). Note that there are no longer any target-independent
changes (though I'm not certain that the symbol versions are
still correct).

OK to apply?
   
   I think this deserves a comment in the source.  Otherwise it's
   liable to get fixed in the future :-) Something along the lines
   of "While the EABI describes the arguments to the half-float
   helper routines as 'short', it does not require that they be
   extended to full register width. The normal ABI requires that the
   caller sign/zero extend short values to 32 bit.  We use unsigned
   int arguments to prevent the gcc making assumptions about the
   high half of the register."
  
  Here's a version with an explanatory comment. I also fixed a couple
  of minor formatting nits I noticed (they don't upset the diff too
  much, I don't think).
 
 It looks like this one got forgotten about. Ping?
 
 Context:
 
 https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00902.html
 https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00912.html
 
 This is an EABI-conformance fix.
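
To illustrate the point about unsigned int arguments, here is a rough sketch of a half-to-single conversion helper that looks only at the low 16 bits of its argument. It is not the libgcc implementation: the function name is invented, and it assumes the IEEE half-precision format (with infinities and NaNs) and a 32-bit IEEE float.

```c
#include <stdint.h>
#include <string.h>

/* Convert an IEEE half-precision value, passed in the low 16 bits of
   A, to float.  The high 16 bits of A are ignored, so it makes no
   difference whether the caller sign-extended, zero-extended or left
   garbage in the top half of the register.  */
static float
half_to_float_sketch (unsigned int a)
{
  uint32_t h = a & 0xffffu;		/* Only the low half is meaningful.  */
  uint32_t sign = (h & 0x8000u) << 16;
  uint32_t exp = (h >> 10) & 0x1fu;
  uint32_t mant = h & 0x3ffu;
  uint32_t bits;
  float f;

  if (exp == 0x1f)			/* Infinity or NaN.  */
    bits = sign | 0x7f800000u | (mant << 13);
  else if (exp != 0)			/* Normal: rebias 15 -> 127.  */
    bits = sign | ((exp + 112) << 23) | (mant << 13);
  else if (mant == 0)			/* Signed zero.  */
    bits = sign;
  else					/* Subnormal: normalise first.  */
    {
      unsigned int shift = 0;
      while ((mant & 0x400u) == 0)
	{
	  mant <<= 1;
	  shift++;
	}
      mant &= 0x3ffu;			/* Drop the now-implicit leading 1.  */
      bits = sign | ((113 - shift) << 23) | (mant << 13);
    }

  memcpy (&f, &bits, sizeof f);
  return f;
}
```

With this shape, 0xc000 and its sign-extended form 0xffffc000 both convert to -2.0f, which is the behaviour the explanatory comment is meant to protect.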

Ping?

Julian


Re: Handle MULTILIB_REUSE in auto-generated SYSROOT_SUFFIX_SPEC macro

2014-06-24 Thread Julian Brown
On Thu, 5 Jun 2014 20:23:27 +0100
Julian Brown jul...@codesourcery.com wrote:

 Hi,
 
 The print-sysroot-suffix.sh script that can be used (via the
 t-sysroot-suffix makefile fragment) to auto-generate
 the SYSROOT_SUFFIX_SPEC macro for non-trivial multilib setups does not
 take into account the MULTILIB_REUSE target fragment variable.
 
 I'm not sure of a way to demonstrate how this causes problems with a
 vanilla tree, but consider the attached patch
 (arm-sysroot-mlib-arrangement-1.diff) intended to create a compiler
 with three multilibs:

Ping? (Note that no in-tree targets use both print-sysroot-suffix.sh
and MULTILIB_REUSE, AFAICT, so this patch is mostly useful to
3rd-party integrators.)

Julian


Re: [PATCH, ARM] Don't use NEON for autovectorization in big-endian mode

2014-06-24 Thread Julian Brown
On Mon, 16 Jun 2014 12:42:36 +0100
Julian Brown jul...@codesourcery.com wrote:

 Hi,
 
 As discussed several times previously, support for NEON in ARM
 big-endian mode is quite broken because of differing assumptions about
 lane ordering made by the ARM EABI and the set of NEON intrinsics on
 the one hand, and the vectorizer on the other.
 
 Fixing this properly would involve quite a large overhaul of the
 NEON backend implementation, and such an overhaul does not appear to
 be forthcoming. Unfortunately this leaves big-endian mode with a
 problem: even if the user is not explicitly using NEON intrinsics,
 compiling with NEON and the vectorizer enabled (i.e. -O3) can quite
 easily lead to incorrect code being generated.
 
 This is the patch we've been using internally for a while to work
 around the problem. When applied:

Ping?

Julian


[PATCH, ARM] Don't use NEON for autovectorization in big-endian mode

2014-06-16 Thread Julian Brown
Hi,

As discussed several times previously, support for NEON in ARM
big-endian mode is quite broken because of differing assumptions about
lane ordering made by the ARM EABI and the set of NEON intrinsics on
the one hand, and the vectorizer on the other.

Fixing this properly would involve quite a large overhaul of the NEON
backend implementation, and such an overhaul does not appear to be
forthcoming. Unfortunately this leaves big-endian mode with a problem:
even if the user is not explicitly using NEON intrinsics, compiling
with NEON and the vectorizer enabled (i.e. -O3) can quite easily lead
to incorrect code being generated.

This is the patch we've been using internally for a while to work
around the problem. When applied:

* We do not allow Neon vectors to be used for autovectorization.
  Vectorization is not disabled completely: ARM core registers (e.g.
  four chars packed into a core register) can still be used to vectorize
  loops in limited circumstances. I think this is mildly preferable to
  forcing -ftree-vectorize to be off entirely for big-endian NEON.

* Intrinsics are not touched. Those which attempt to mix generic vector
  operations with the ABI-defined vector types (i.e. those which are
  implemented with __builtin_shuffle) are, I think, technically
  incorrect -- but in the sense of two wrongs making a right, so the
  end result appears to work.

* Generic vectors (i.e. direct use of __attribute__((vector_size(foo)))
  types) will continue to behave strangely in big-endian mode.

This of course continues to be suboptimal, but at least in *the
common case* we stop generating bad code.
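
For reference, "generic vectors" in the last bullet means code of roughly the following shape, using the GCC vector extension directly rather than arm_neon.h. The type and function names are invented for the example:

```c
/* A 16-byte vector of four ints, declared with the GCC/Clang
   vector_size extension rather than an ABI-defined NEON type.  */
typedef int v4si_sketch __attribute__ ((vector_size (16)));

/* Element-wise addition; the compiler chooses the instructions.  */
static v4si_sketch
add4_sketch (v4si_sketch a, v4si_sketch b)
{
  return a + b;
}

/* Read one lane by subscript.  Subscripts follow array (memory)
   order, which is where the lane-numbering disagreement with the
   NEON intrinsics shows up in big-endian mode.  */
static int
lane_sketch (v4si_sketch v, int i)
{
  return v[i];
}
```

Code like this is untouched by the patch, so it keeps its current big-endian behaviour.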

Testing in big-endian mode on user-space QEMU (ARMv7-A, NEON, softfp)
shows (apart from some noise) test diffs as attached. Notice the large
number of removed execution failures, in particular.

OK to apply?

Thanks,

Julian

ChangeLog

gcc/
* config/arm/arm.c (arm_array_mode_supported_p): No array modes for
big-endian NEON.
(arm_preferred_simd_mode): Don't use NEON vectors for
autovectorization in big-endian mode.
(arm_autovectorize_vector_sizes): Don't iterate over other vector
sizes for big-endian NEON.

gcc/testsuite/
* lib/target-supports.exp (check_vect_support_and_set_flags): Don't
run vect tests for big-endian ARM NEON.
* gcc.target/arm/neon/vect-vcvt.c: XFAIL for !arm_little_endian.
* gcc.target/arm/neon/vect-vcvtq.c: Likewise.
* gcc.target/arm/neon-vshl-imm-1.c: Likewise.
* gcc.target/arm/neon-vshr-imm-1.c: Likewise.
* gcc.target/arm/neon-vmls-1.c: Likewise.
* gcc.target/arm/neon-vmla-1.c: Likewise.
* gcc.target/arm/neon-vfma-1.c: Likewise.
* gcc.target/arm/neon-vfms-1.c: Likewise.
* gcc.target/arm/neon-vorn-vbic.c: Likewise.
* gcc.target/arm/neon-vlshr-imm-1.c: Likewise.
* gcc.target/arm/neon-vcond-ltgt.c: Likewise.
* gcc.target/arm/neon-vcond-gt.c: Likewise.
* gcc.target/arm/neon-vcond-unordered.c: Likewise.
Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c	(revision 210209)
+++ gcc/config/arm/arm.c	(working copy)
@@ -28813,7 +28813,7 @@ static bool
 arm_array_mode_supported_p (enum machine_mode mode,
 			unsigned HOST_WIDE_INT nelems)
 {
-  if (TARGET_NEON
+  if (TARGET_NEON && !BYTES_BIG_ENDIAN
       && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode))
       && (nelems >= 2 && nelems <= 4))
     return true;
@@ -28828,7 +28828,7 @@ arm_array_mode_supported_p (enum machine
 static enum machine_mode
 arm_preferred_simd_mode (enum machine_mode mode)
 {
-  if (TARGET_NEON)
+  if (TARGET_NEON && !BYTES_BIG_ENDIAN)
     switch (mode)
       {
       case SFmode:
@@ -29845,7 +29845,8 @@ arm_vector_alignment (const_tree type)
 static unsigned int
 arm_autovectorize_vector_sizes (void)
 {
-  return TARGET_NEON_VECTORIZE_DOUBLE ? 0 : (16 | 8);
+  return (TARGET_NEON_VECTORIZE_DOUBLE || (TARGET_NEON && BYTES_BIG_ENDIAN))
+	 ? 0 : (16 | 8);
 }
 
 static bool
Index: gcc/testsuite/gcc.target/arm/neon/vect-vcvtq.c
===
--- gcc/testsuite/gcc.target/arm/neon/vect-vcvtq.c	(revision 210209)
+++ gcc/testsuite/gcc.target/arm/neon/vect-vcvtq.c	(working copy)
@@ -24,5 +24,5 @@ int convert()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { xfail { ! arm_little_endian } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.target/arm/neon/vect-vcvt.c
===
--- gcc/testsuite/gcc.target/arm/neon/vect-vcvt.c	(revision 210209)
+++ gcc/testsuite/gcc.target/arm/neon/vect-vcvt.c	(working copy)
@@ -24,5 +24,5 @@ int convert()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { xfail { ! arm_little_endian } 

Handle MULTILIB_REUSE in auto-generated SYSROOT_SUFFIX_SPEC macro

2014-06-05 Thread Julian Brown
Hi,

The print-sysroot-suffix.sh script that can be used (via the
t-sysroot-suffix makefile fragment) to auto-generate
the SYSROOT_SUFFIX_SPEC macro for non-trivial multilib setups does not
take into account the MULTILIB_REUSE target fragment variable.

I'm not sure of a way to demonstrate how this causes problems with a
vanilla tree, but consider the attached patch
(arm-sysroot-mlib-arrangement-1.diff) intended to create a compiler
with three multilibs:

  .;                      (little-endian, soft float)
  be;@mbig-endian         (big-endian, soft float)
  vfp;@mfloat-abi=softfp  (little-endian, hardware FP)

Notice that we are not building a multilib for the be+vfp combination.
Instead we use the MULTILIB_REUSE macro to make that combination fall
back to using just the soft-float big-endian multilib:

MULTILIB_REUSE = mbig-endian=mbig-endian/mfloat-abi.softfp

But now, compiling code will fail with errors such as:

$ arm-none-linux-gnueabi-gcc hello.c -mbig-endian -mfloat-abi=softfp \
-o hello
../arm-none-linux-gnueabi/bin/ld: 
/path/to/install/arm-none-linux-gnueabi/libc/usr/lib/libc.a(s_signbit.o): 
compiled for a little endian system and target is big endian

Invoking the compiler with -print-sysroot vs. -print-multi-directory
illustrates the problem:

$ arm-none-linux-gnueabi-gcc hello.c -mbig-endian -mfloat-abi=softfp \
-print-sysroot
/path/to/install/arm-none-linux-gnueabi/libc

$ arm-none-linux-gnueabi-gcc hello.c -mbig-endian -mfloat-abi=softfp \
-print-multi-directory
be

What we wanted was for the first command to give the same result that
invoking without -mfloat-abi=softfp does (which was the purpose of the
MULTILIB_REUSE setting):

$ arm-none-linux-gnueabi-gcc hello.c -mbig-endian -print-sysroot
/path/to/install/arm-none-linux-gnueabi/libc/be

but that doesn't work at present. The attached patch fixes that: it's
based on a part of CodeSourcery's earlier MULTILIB_ALIASES support
(by Paul Brook originally, I think; I don't think it ever made it
upstream, but it worked quite similarly to MULTILIB_REUSE, which did),
and allows the above multilib arrangement to work correctly.
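
As a rough illustration of what the script does with each MULTILIB_REUSE entry, the C sketch below mirrors the two sed expressions in the patch: the part before the first "=" names the multilib to reuse, the part after the last "=" is the option combination that falls back to it, and "." is turned back into "=" in both. The function names are invented, and unlike sed this sketch rejects an entry with no "=" at all.

```c
#include <stddef.h>
#include <string.h>

/* Replace every '.' in S with '=', as sed -e 's/\./=/g' does.  */
static void
dots_to_equals_sketch (char *s)
{
  for (; *s; s++)
    if (*s == '.')
      *s = '=';
}

/* Split ENTRY ("reused=combination") into L and R.
   Returns 0 on success, -1 if there is no '=' or a buffer is too
   small.  */
static int
split_reuse_entry_sketch (const char *entry, char *l, size_t lsize,
			  char *r, size_t rsize)
{
  const char *first = strchr (entry, '=');	/* s/=.*$// keeps the text before this.  */
  const char *last = strrchr (entry, '=');	/* s/^.*=// keeps the text after this.  */
  size_t llen, rlen;

  if (first == NULL)
    return -1;

  llen = (size_t) (first - entry);
  rlen = strlen (last + 1);
  if (llen >= lsize || rlen >= rsize)
    return -1;

  memcpy (l, entry, llen);
  l[llen] = '\0';
  memcpy (r, last + 1, rlen + 1);

  dots_to_equals_sketch (l);
  dots_to_equals_sketch (r);
  return 0;
}
```

For the entry above this gives l = "mbig-endian" and r = "mbig-endian/mfloat-abi=softfp", so the generated case statement rewrites the option string "/mbig-endian/mfloat-abi=softfp/" to "/mbig-endian/" before the sysroot suffix is looked up.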

OK for mainline? (The ARM bits are for reference only and are not meant
to be committed, of course.)

Thanks,

Julian

ChangeLog

gcc/
* config/print-sysroot-suffix.sh: Handle MULTILIB_REUSE settings.
* config/t-sysroot-suffix (sysroot-suffix.h): Pass MULTILIB_REUSE
to print-sysroot-suffix.sh script.
Index: gcc/config.gcc
===
--- gcc/config.gcc	(revision 210209)
+++ gcc/config.gcc	(working copy)
@@ -1014,7 +1014,9 @@ arm*-*-linux-*)			# ARM GNU/Linux with E
 	;;
 	esac
 	tmake_file="${tmake_file} arm/t-arm arm/t-arm-elf arm/t-bpabi arm/t-linux-eabi"
+	tmake_file="$tmake_file t-sysroot-suffix"
 	tm_file="$tm_file arm/bpabi.h arm/linux-eabi.h arm/aout.h arm/arm.h"
+	tm_file="$tm_file ./sysroot-suffix.h"
 	# Define multilib configuration for arm-linux-androideabi.
 	case ${target} in
 	*-androideabi)
Index: gcc/config/arm/t-linux-eabi
===
--- gcc/config/arm/t-linux-eabi	(revision 210209)
+++ gcc/config/arm/t-linux-eabi	(working copy)
@@ -20,8 +20,15 @@
 # CLEAR_INSN_CACHE in linux-gas.h does not work in Thumb mode.
 # If you set MULTILIB_OPTIONS to a non-empty value you should also set
 # MULTILIB_DEFAULTS in linux-elf.h.
-MULTILIB_OPTIONS	=
-MULTILIB_DIRNAMES	=
+MULTILIB_OPTIONS	= mbig-endian mfloat-abi=softfp
+MULTILIB_DIRNAMES	= be vfp
+MULTILIB_OSDIRNAMES	= mbig-endian=!be mfloat-abi.softfp=!vfp
+MULTILIB_MATCHES	=
+MULTILIB_EXCEPTIONS	=
+
+MULTILIB_REUSE		= mbig-endian=mbig-endian/mfloat-abi.softfp
+
+MULTILIB_REQUIRED	= mbig-endian mfloat-abi=softfp
 
 #MULTILIB_OPTIONS += mcpu=fa606te/mcpu=fa626te/mcpu=fmp626/mcpu=fa726te
 #MULTILIB_DIRNAMES+= fa606te fa626te fmp626 fa726te
Index: gcc/config/print-sysroot-suffix.sh
===
Index: gcc/config/t-sysroot-suffix
===
--- gcc/config/print-sysroot-suffix.sh	(revision 210209)
+++ gcc/config/print-sysroot-suffix.sh	(working copy)
@@ -29,6 +29,7 @@
 #  MULTILIB_OSDIRNAMES \
 #  MULTILIB_OPTIONS \
 #  MULTILIB_MATCHES \
+#  MULTILIB_REUSE
 #   t-sysroot-suffix.h
 
 # The three options exactly correspond to the variables of the same
@@ -54,6 +55,7 @@ set -e
 dirnames="$1"
 options="$2"
 matches="$3"
+reuse="$4"
 
 cat > print-sysroot-suffix3.sh <<\EOF
 #! /bin/sh
@@ -80,7 +82,14 @@ shift 2
 n="\" \\
 $padding\""
 if [ $# = 0 ]; then
+  case $optstring in
 EOF
+for x in $reuse; do
+  l=`echo $x | sed -e 's/=.*$//' -e 's/\./=/g'`
+  r=`echo $x | sed -e 's/^.*=//' -e 's/\./=/g'`
+  echo "/$r/) optstring=\"/$l/\" ;;" >> print-sysroot-suffix2.sh
+done
+echo "  esac" >> print-sysroot-suffix2.sh
 
 pat=
 for x in $dirnames; do
--- gcc/config/t-sysroot-suffix	(revision 210209)
+++ 
