Re: [PATCH 15/40] graphite: Extend SCoP detection dump output

2022-05-18 Thread Harwath, Frederik
Hi Richard,

On Tue, 2022-05-17 at 08:21 +, Richard Biener wrote:
> On Mon, 16 May 2022, Tobias Burnus wrote:
>
> > As requested by Richard: Rediffed patch.
> >
> > Changes: s/.c/.cc/ + some whitespace changes.
> > (At least in my email reader, some  were lost. I also fixed
> > too-long line
> > issues.)
> >
> > In addition, FOR_EACH_LOOP was replaced by 'for (auto loop : ...'
> > (macro was removed late in GCC 12 development ? r12-2605-
> > ge41ba804ba5f5c)
> >
> > Otherwise, it should be identical to Frederik's patch, earlier in
> > this thread.
> >
> > On 15.12.21 16:54, Frederik Harwath wrote:
> > > Extend dump output to make understanding why Graphite rejects to
> > > include a loop in a SCoP easier (for GCC developers).
> >
> > OK for mainline?
>
> +  if (printed)
> +fprintf (file, "\b\b");
>
> please find other means of omitting ", ", like by printing it
> _before_ the number but only for the second and following loop
> number.

Done.

>
> I'll also note that
>
> +static void
> +print_sese_loop_numbers (FILE *file, sese_l sese)
> +{
> +  bool printed = false;
> +  for (auto loop : loops_list (cfun, 0))
> +{
> +  if (loop_in_sese_p (loop, sese))
> +   fprintf (file, "%d, ", loop->num);
> +  printed = true;
> +}
>
> is hardly optimal.  Please instead iterate over
> sese.entry->dest->loop_father and children instead which you can do
> by passing that as extra argument to loops_list.

Done.

This had to be extended a little bit, because a SCoP
can consist of consecutive loop-nests and iterating
only over "loops_list (cfun, LI_INCLUDE_ROOT, sese.entry->dest-
>loop_father))" would output only the loops from the first
loop-nest in the SCoP (cf. the test file scop-22a.c that I added).

>
> +
> +  if (dump_file && dump_flags & TDF_DETAILS)
> +{
> +  fprintf (dump_file, "Loops in SCoP: ");
> +  for (auto loop : loops_list (cfun, 0))
> +   if (loop_in_sese_p (loop, s))
> + fprintf (dump_file, "%d ", loop->num);
> +  fprintf (dump_file, "\n");
> +}
>
> you are duplicating functionality of the function you just added ...
>

Fixed.

> Otherwise looks OK to me.

Can I commit the revised patch?

Thanks for your review,
Frederik

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From fb268a37704b1598a84051c735514ff38adad038 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Wed, 18 May 2022 07:59:42 +0200
Subject: [PATCH] graphite: Extend SCoP detection dump output

Extend dump output to make understanding why Graphite rejects to
include a loop in a SCoP easier (for GCC developers).

gcc/ChangeLog:

	* graphite-scop-detection.cc (scop_detection::can_represent_loop):
	Output reason for failure to dump file.
	(scop_detection::harmful_loop_in_region): Likewise.
	(scop_detection::graphite_can_represent_expr): Likewise.
	(scop_detection::stmt_has_simple_data_refs_p): Likewise.
	(scop_detection::stmt_simple_for_scop_p): Likewise.
	(print_sese_loop_numbers): New function.
	(scop_detection::add_scop): Use from here.

gcc/testsuite/ChangeLog:

	* gcc.dg/graphite/scop-22a.c: New test.
---
 gcc/graphite-scop-detection.cc   | 184 ---
 gcc/testsuite/gcc.dg/graphite/scop-22a.c |  56 +++
 2 files changed, 219 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/graphite/scop-22a.c

diff --git a/gcc/graphite-scop-detection.cc b/gcc/graphite-scop-detection.cc
index 8c0ee9975579..9792d87ee0ae 100644
--- a/gcc/graphite-scop-detection.cc
+++ b/gcc/graphite-scop-detection.cc
@@ -69,12 +69,27 @@ public:
 fprintf (output.dump_file, "%d", i);
 return output;
   }
+
   friend debug_printer &
   operator<< (debug_printer , const char *s)
   {
 fprintf (output.dump_file, "%s", s);
 return output;
   }
+
+  friend debug_printer &
+  operator<< (debug_printer , gimple* stmt)
+  {
+print_gimple_stmt (output.dump_file, stmt, 0, TDF_VOPS | TDF_MEMSYMS);
+return output;
+  }
+
+  friend debug_printer &
+  operator<< (debug_printer , tree t)
+  {
+print_generic_expr (output.dump_file, t, TDF_SLIM);
+return output;
+  }
 } dp;
 
 #define DEBUG_PRINT(args) do \
@@ -506,6 +521,27 @@ scop_detection::merge_sese (sese_l first, sese_l second) const
   return combined;
 }
 
+/* Print the loop numbers of the loops contained in SESE to FILE. */
+
+static void
+print_sese_loop_numbers (FILE *file, sese_l sese)
+{
+  bool first_loop = true;
+  for (loop_p nest = sese.entry->dest->loop_father; nest; nest = nest->next)
+{
+  if (!loop_in_sese_p (nest, sese))
+break;
+
+  for (auto loop : loops_list (cfun, LI_INCLUDE_ROOT, nest))
+{
+  gcc_assert (loop_in_sese_p (loop, sese));
+
+  fprintf (file, "%s%d", first_loop ? "" : ", ", loop->num);
+  first_loop = 

[PATCH, committed][OpenACC] Adapt libgomp acc_get_property.f90 test

2020-02-21 Thread Harwath, Frederik
Hi,
The commit r10-6721-g8d1a1cb1b816381bf60cb1211c93b8eba1fe1472 has changed
the name of the type that is used for the return value of the Fortran
acc_get_property function without adapting the test acc_get_property.f90.

This obvious patch fixes that problem. Committed as 
r10-6782-g83d45e1d7155a5a600d8a4aa01aca00d3c6c2d3a.

Best regards,
Frederik
From 83d45e1d7155a5a600d8a4aa01aca00d3c6c2d3a Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Fri, 21 Feb 2020 15:26:02 +0100
Subject: [PATCH] Adapt libgomp acc_get_property.f90 test

The commit r10-6721-g8d1a1cb1b816381bf60cb1211c93b8eba1fe1472 has changed
the name of the type that is used for the return value of the Fortran
acc_get_property function without adapting the test acc_get_property.f90.

2020-02-21  Frederik Harwath  

	* testsuite/libgomp.oacc-fortran/acc_get_property.f90: Adapt to
	changes from 2020-02-19, i.e. use integer(c_size_t) instead of
	integer(acc_device_property) for the type of the return value of
	acc_get_property.
---
 libgomp/ChangeLog  | 7 +++
 .../testsuite/libgomp.oacc-fortran/acc_get_property.f90| 3 ++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/libgomp/ChangeLog b/libgomp/ChangeLog
index 3c640c7350b..bff3ae58c9a 100644
--- a/libgomp/ChangeLog
+++ b/libgomp/ChangeLog
@@ -1,3 +1,10 @@
+2020-02-21  Frederik Harwath  
+
+	* testsuite/libgomp.oacc-fortran/acc_get_property.f90: Adapt to
+	changes from 2020-02-19, i.e. use integer(c_size_t) instead of
+	integer(acc_device_property) for the type of the return value of
+	acc_get_property.
+
 2020-02-19  Tobias Burnus  
 
 	* .gitattributes: New; whitespace handling for Fortran's openacc_lib.h.
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/acc_get_property.f90 b/libgomp/testsuite/libgomp.oacc-fortran/acc_get_property.f90
index 80ae292f41f..1af7cc3b988 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/acc_get_property.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/acc_get_property.f90
@@ -26,13 +26,14 @@ end program test
 ! and do basic device independent validation.
 subroutine print_device_properties (device_type)
   use openacc
+  use iso_c_binding, only: c_size_t
   implicit none
 
   integer, intent(in) :: device_type
 
   integer :: device_count
   integer :: device
-  integer(acc_device_property) :: v
+  integer(c_size_t) :: v
   character*256 :: s
 
   device_count = acc_get_num_devices(device_type)
-- 
2.17.1



Re: [PATCH] openmp: ignore nowait if async execution is unsupported [PR93481]

2020-02-13 Thread Harwath, Frederik
Hi Jakub,

On 13.02.20 09:30, Jakub Jelinek wrote:
> On Thu, Feb 13, 2020 at 09:04:36AM +0100, Harwath, Frederik wrote:
>> --- a/libgomp/target.c
>> +++ b/libgomp/target.c
>> @@ -2022,6 +2022,16 @@ GOMP_target (int device, void (*fn) (void *), const 
>> void *unused,
>>gomp_unmap_vars (tgt_vars, true);
>>  }
>>  
>> +static unsigned int
> 
> Add inline?
> 

Added.

>> @@ -2257,6 +2269,8 @@ GOMP_target_update_ext (int device, size_t mapnum, 
>> void **hostaddrs,
>>  {
>>struct gomp_device_descr *devicep = resolve_device (device);
>>  
>> +  flags = clear_unsupported_flags (devicep, flags);

>> @@ -2398,6 +2412,8 @@ GOMP_target_enter_exit_data (int device, size_t 
>> mapnum, void **hostaddrs,
>>  {
>>struct gomp_device_descr *devicep = resolve_device (device);
>>  
>> +  flags = clear_unsupported_flags (devicep, flags);

> I don't see why you need to do the above two.  GOMP_TARGET_TASK_DATA
> is done on the host side, async_run callback isn't called in that case
> and while we create a task, all we do is wait for the (host) dependencies
> in there and then perform the data transfer we need.
> I think it is perfectly fine to ignore nowait on target but honor it
> on target update or target {enter,exit} data.

I see. Removed.


> Otherwise LGTM.

Thanks for the review! I have committed the patch with those changes. I forgot 
to include the ChangeLog
entry which I had to add in a separate commit. Sorry for that! It seems that I 
have to adapt my workflow -
perhaps some pre-push hook ;-).

Best regards,
Frederik

From 001ab12e620c6f117b2e93c77d188bd62fe7ba03 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Thu, 13 Feb 2020 07:30:16 +0100
Subject: [PATCH 1/2] openmp: ignore nowait if async execution is unsupported
 [PR93481]

An OpenMP "nowait" clause on a target construct currently leads to
a call to GOMP_OFFLOAD_async_run in the plugin that is used for
offloading at execution time. The nvptx plugin contains only a stub
of this function that always produces a fatal error if called.

This commit changes the "nowait" implementation to ignore the clause
if the executing device's plugin does not implement GOMP_OFFLOAD_async_run.
The stub in the nvptx plugin is removed which effectively means that
programs containing "nowait" can now be executed with nvptx offloading
as if the clause had not been used.
This behavior is consistent with the OpenMP specification which says that
"[...] execution of the target task *may* be deferred" (emphasis added),
cf. OpenMP 5.0, page 172.

libgomp/

	* plugin/plugin-nvptx.c: Remove GOMP_OFFLOAD_async_run stub.
	* target.c (gomp_load_plugin_for_device): Make "async_run" loading
	optional.
	(gomp_target_task_fn): Assert "devicep->async_run_func".
	(clear_unsupported_flags): New function to remove unsupported flags
	(right now only GOMP_TARGET_FLAG_NOWAIT) that can be be ignored.
	(GOMP_target_ext): Apply clear_unsupported_flags to flags.
	* testsuite/libgomp.c/target-33.c:
	Remove xfail for offload_target_nvptx.
	* testsuite/libgomp.c/target-34.c: Likewise.
---
 libgomp/plugin/plugin-nvptx.c   |  7 +--
 libgomp/target.c| 15 ++-
 libgomp/testsuite/libgomp.c/target-33.c |  3 ---
 libgomp/testsuite/libgomp.c/target-34.c |  3 ---
 4 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 6033c71a9db..ec103a2f40b 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1931,9 +1931,4 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)
   nvptx_stacks_free (stacks, teams * threads);
 }
 
-void
-GOMP_OFFLOAD_async_run (int ord, void *tgt_fn, void *tgt_vars, void **args,
-			void *async_data)
-{
-  GOMP_PLUGIN_fatal ("GOMP_OFFLOAD_async_run unimplemented");
-}
+/* TODO: Implement GOMP_OFFLOAD_async_run. */
diff --git a/libgomp/target.c b/libgomp/target.c
index 3df007283f4..0ff727de47d 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -2022,6 +2022,16 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
   gomp_unmap_vars (tgt_vars, true);
 }
 
+static inline unsigned int
+clear_unsupported_flags (struct gomp_device_descr *devicep, unsigned int flags)
+{
+  /* If we cannot run asynchronously, simply ignore nowait.  */
+  if (devicep != NULL && devicep->async_run_func == NULL)
+flags &= ~GOMP_TARGET_FLAG_NOWAIT;
+
+  return flags;
+}
+
 /* Like GOMP_target, but KINDS is 16-bit, UNUSED is no longer present,
and several arguments have been added:
FLAGS is a bitmask, see GOMP_TARGET_FLAG_* in gomp-constants.h.
@@ -2054,6 +2064,8 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
   size_t tgt_align = 0, tgt_size = 0;
   bool f

[PATCH] openmp: ignore nowait if async execution is unsupported [PR93481]

2020-02-13 Thread Harwath, Frederik
Hi Jakub,

On 10.02.20 08:49, Harwath, Frederik wrote:

>> There has been even in some PR a suggestion that instead of failing
>> in nvptx async_run we should just ignore the nowait clause if the plugin
>> doesn't implement it properly.
> 
> This must be https://gcc.gnu.org/PR93481.

The attached patch implements the behavior that has been suggested in the PR.
It makes GOMP_OFFLOAD_async_run optional, removes the stub which produces
the error described in the PR from the nvptx plugin, and changes the 
nowait-handling
to ignore the clause if GOMP_OFFLOAD_async_run is not available for the 
executing
device's plugin. I have tested the patch by running the full libgomp testsuite 
with
nvptx-none offloading on x86_64-linux-gnu. I have observed no regressions.

Ok to push the commit to master?

For the record: Someone should implement GOMP_OFFLOAD_async_run properly
in the nvtpx plugin.

Best regards,
Frederik

From 1258f713be317870e9171281e3f7c3a174773aa1 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Thu, 13 Feb 2020 07:30:16 +0100
Subject: [PATCH] openmp: ignore nowait if async execution is unsupported
 [PR93481]

An OpenMP "nowait" clause on a target construct currently leads to
a call to GOMP_OFFLOAD_async_run in the plugin that is used for
offloading at execution time. The nvptx plugin contains only a stub
of this function that always produces a fatal error if called.

This commit changes the "nowait" implementation to ignore the clause
if the executing device's plugin does not implement GOMP_OFFLOAD_async_run.
The stub in the nvptx plugin is removed which effectively means that
programs containing "nowait" can now be executed with nvptx offloading
as if the clause had not been used.
This behavior is consistent with the OpenMP specification which says that
"[...] execution of the target task *may* be deferred" (emphasis added),
cf. OpenMP 5.0, page 172.

libgomp/

	* plugin/plugin-nvptx.c: Remove GOMP_OFFLOAD_async_run stub.
	* target.c (gomp_load_plugin_for_device): Make "async_run" loading
	optional.
	(gomp_target_task_fn): Assert "devicep->async_run_func".
	(clear_unsupported_flags): New function to remove unsupported flags
	(right now only GOMP_TARGET_FLAG_NOWAIT) that can be be ignored.
	(GOMP_target_ext): Apply clear_unsupported_flags to flags.
	(GOMP_target_update_ext): Likewise.
	(GOMP_target_enter_exit_data): Likewise.
	* testsuite/libgomp.c/target-33.c:
	Remove xfail for offload_target_nvptx.
	* testsuite/libgomp.c/target-34.c: Likewise.
---
 libgomp/plugin/plugin-nvptx.c   |  7 +--
 libgomp/target.c| 19 ++-
 libgomp/testsuite/libgomp.c/target-33.c |  3 ---
 libgomp/testsuite/libgomp.c/target-34.c |  3 ---
 4 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 6033c71a9db..ec103a2f40b 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1931,9 +1931,4 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)
   nvptx_stacks_free (stacks, teams * threads);
 }
 
-void
-GOMP_OFFLOAD_async_run (int ord, void *tgt_fn, void *tgt_vars, void **args,
-			void *async_data)
-{
-  GOMP_PLUGIN_fatal ("GOMP_OFFLOAD_async_run unimplemented");
-}
+/* TODO: Implement GOMP_OFFLOAD_async_run. */
diff --git a/libgomp/target.c b/libgomp/target.c
index 3df007283f4..4fbf963f305 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -2022,6 +2022,16 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
   gomp_unmap_vars (tgt_vars, true);
 }
 
+static unsigned int
+clear_unsupported_flags (struct gomp_device_descr *devicep, unsigned int flags)
+{
+  /* If we cannot run asynchronously, simply ignore nowait.  */
+  if (devicep != NULL && devicep->async_run_func == NULL)
+flags &= ~GOMP_TARGET_FLAG_NOWAIT;
+
+  return flags;
+}
+
 /* Like GOMP_target, but KINDS is 16-bit, UNUSED is no longer present,
and several arguments have been added:
FLAGS is a bitmask, see GOMP_TARGET_FLAG_* in gomp-constants.h.
@@ -2054,6 +2064,8 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
   size_t tgt_align = 0, tgt_size = 0;
   bool fpc_done = false;
 
+  flags = clear_unsupported_flags (devicep, flags);
+
   if (flags & GOMP_TARGET_FLAG_NOWAIT)
 {
   struct gomp_thread *thr = gomp_thread ();
@@ -2257,6 +2269,8 @@ GOMP_target_update_ext (int device, size_t mapnum, void **hostaddrs,
 {
   struct gomp_device_descr *devicep = resolve_device (device);
 
+  flags = clear_unsupported_flags (devicep, flags);
+
   /* If there are depend clauses, but nowait is not present,
  block the parent task until the dependencies are resolved
  and then just continue with the rest of the function as if it
@@ -2398,6 +2412,8 @@ GOMP_target_enter_exit_data (int device, size_t mapnum, void **hostaddrs,
 {

Re: [PATCH] xfail and improve some failing libgomp tests

2020-02-09 Thread Harwath, Frederik
Hi Jakub,

On 07.02.20 16:29, Jakub Jelinek wrote:
> On Fri, Feb 07, 2020 at 09:56:38AM +0100, Harwath, Frederik wrote:
>> * {target-32.c, thread-limit-2.c}:
>> no "usleep" implemented for nvptx. Cf. https://gcc.gnu.org/PR81690
> 
> Please don't, I want to deal with that using declare variant, just didn't
> get yet around to finishing the last patch needed for that.  Will try next 
> week.

Ok, great! looking forward to see a better solution.

>> * target-{33,34}.c:
>> no "GOMP_OFFLOAD_async_run" implemented in plugin-nvptx.c. Cf. 
>> https://gcc.gnu.org/PR81688
>>
>> * target-link-1.c:
>> omp "target link" not implemented for nvptx. Cf. https://gcc.gnu.org/PR81689
> 
> I guess this is ok, though of course the right thing would be to implement
> both
Ok, this means that I can commit the attached patch which contains only the 
changes to
target-{33,43}.c and target-link-1.c? Of course, I agree that those features 
should be
implemented.

> There has been even in some PR a suggestion that instead of failing
> in nvptx async_run we should just ignore the nowait clause if the plugin
> doesn't implement it properly.

This must be https://gcc.gnu.org/PR93481.

Best regards,
Frederik


From e5165ccb143022614920dbd208f6f368b84b4382 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Mon, 10 Feb 2020 08:08:00 +0100
Subject: [PATCH] Add xfails to libgomp tests target-{33,34}.c, target-link-1.c

Add xfails for nvptx offloading because
"no GOMP_OFFLOAD_async_run implemented in plugin-nvptx.c"
(https://gcc.gnu.org/PR81688) and because
"omp target link not implemented for nvptx"
(https://gcc.gnu.org/PR81689).

libgomp/
	* testsuite/libgomp.c/target-33.c: Add xfail for execution on
	offload_target_nvptx, cf. https://gcc.gnu.org/PR81688.
	* testsuite/libgomp.c/target-34.c: Likewise.
	* testsuite/libgomp.c/target-link-1.c: Add xfail for
	offload_target_nvptx, cf. https://gcc.gnu.org/PR81689.
---
 libgomp/testsuite/libgomp.c/target-33.c | 3 +++
 libgomp/testsuite/libgomp.c/target-34.c | 3 +++
 libgomp/testsuite/libgomp.c/target-link-1.c | 3 +++
 3 files changed, 9 insertions(+)

diff --git a/libgomp/testsuite/libgomp.c/target-33.c b/libgomp/testsuite/libgomp.c/target-33.c
index 1bed4b6bc67..15d2d7e38ab 100644
--- a/libgomp/testsuite/libgomp.c/target-33.c
+++ b/libgomp/testsuite/libgomp.c/target-33.c
@@ -1,3 +1,6 @@
+/* { dg-xfail-run-if "GOMP_OFFLOAD_async_run not implemented" { offload_target_nvptx } }
+   Cf. https://gcc.gnu.org/PR81688.  */
+
 extern void abort (void);
 
 int
diff --git a/libgomp/testsuite/libgomp.c/target-34.c b/libgomp/testsuite/libgomp.c/target-34.c
index 66d9f54202b..5a3596424d8 100644
--- a/libgomp/testsuite/libgomp.c/target-34.c
+++ b/libgomp/testsuite/libgomp.c/target-34.c
@@ -1,3 +1,6 @@
+/* { dg-xfail-run-if "GOMP_OFFLOAD_async_run not implemented" { offload_target_nvptx } }
+   Cf. https://gcc.gnu.org/PR81688.  */
+
 extern void abort (void);
 
 int
diff --git a/libgomp/testsuite/libgomp.c/target-link-1.c b/libgomp/testsuite/libgomp.c/target-link-1.c
index 681677cc2aa..99ce33bc9b4 100644
--- a/libgomp/testsuite/libgomp.c/target-link-1.c
+++ b/libgomp/testsuite/libgomp.c/target-link-1.c
@@ -1,3 +1,6 @@
+/* { dg-xfail-if "#pragma omp target link not implemented" { offload_target_nvptx } }
+   Cf. https://gcc.gnu.org/PR81689.  */
+
 struct S { int s, t; };
 
 int a = 1, b = 1;
-- 
2.17.1



[PATCH] xfail and improve some failing libgomp tests

2020-02-07 Thread Harwath, Frederik
Hi,
the libgomp testsuite contains some test cases (all in 
/libgomp/testsuite/libgomp.c/)
which fail with nvptx offloading because of some long standing issues:

* {target-32.c, thread-limit-2.c}:
no "usleep" implemented for nvptx. Cf. https://gcc.gnu.org/PR81690

* target-{33,34}.c:
no "GOMP_OFFLOAD_async_run" implemented in plugin-nvptx.c. Cf. 
https://gcc.gnu.org/PR81688

* target-link-1.c:
omp "target link" not implemented for nvptx. Cf. https://gcc.gnu.org/PR81689


All these issues have been known, at least, since 2016:

https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00972.html

As suggested in this mail:
 "Short term, it should be possible to implement something like -foffload=^nvptx
to skip PTX (and only PTX) offloading on those tests."

Well, we can now skip/xfail tests for nvptx offloading using the effective 
target
"offload_target_nvptx" and the present patch uses this to xfail the tests for 
which
no short-term solution is in sight, i.e. the GOMP_OFFLOAD_async_run and the 
"target link"
related failures.

Regarding the "usleep" issue, I have decided to follow Jakub's suggestion
(cf. https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01026.html) to
replace usleep by busy waiting. As noted by Tobias
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81690#c4), this involves creating 
separate
test files for the cases with and without usleep. This solution is a bit 
cumbersome but I
think we can live with it, in particular, since the actual test case 
implementations do not
get duplicated (they have been moved into auxiliary header files which are 
shared by both
variants of the corresponding tests).

Since the "usleep" issue also concerns amdgcn, I have introduced an effective 
target
"offload_target_amdgcn" to add xfails for this offloading target, too. This 
behaves like
"offload_target_nvptx" but for amdgcn. Note that the existing amdgcn effective 
targets
cannot be used for our purpose since they are OpenACC-specific.

The new thread-limit-2-nosleep.c should now pass for both nvptx and amdgcn 
offloading
whereas thread-limit-2.c should xfail. The new target-32-nosleep.c passes with 
amdgcn
offloading, but xfails with nvptx offloading, because it also needs the 
unimplemented
GOMP_OFFLOAD_async_run.

With the patch, the detailed test summary now looks as follows for me:

nvptx offloading:

// Expected execution failures due to missing usleep
PASS: libgomp.c/target-32-nosleep.c (test for excess errors)
XFAIL: libgomp.c/target-32-nosleep.c execution test// missing 
GOMP_OFFLOAD_async_run
XFAIL: libgomp.c/target-32.c (test for excess errors)
UNRESOLVED: libgomp.c/target-32.c compilation failed to produce executable

PASS: libgomp.c/thread-limit-2-nosleep.c (test for excess errors)
PASS: libgomp.c/thread-limit-2-nosleep.c execution test
XFAIL: libgomp.c/thread-limit-2.c (test for excess errors)
UNRESOLVED: libgomp.c/thread-limit-2.c compilation failed to produce executable

// Expected execution failures due to missing GOMP_OFFLOAD_async_run
PASS: libgomp.c/target-33.c (test for excess errors)
XFAIL: libgomp.c/target-33.c execution test
PASS: libgomp.c/target-34.c (test for excess errors)
XFAIL: libgomp.c/target-34.c execution test

// Expected compilation failures due to missing target link
XFAIL: libgomp.c/target-link-1.c (test for excess errors)
UNRESOLVED: libgomp.c/target-link-1.c compilation failed to produce executable


amdgcn offloading:

// Tests using usleep
PASS: libgomp.c/target-32-nosleep.c (test for excess errors)
PASS: libgomp.c/target-32-nosleep.c execution test
XFAIL: libgomp.c/target-32.c 7 blank line(s) in output
XFAIL: libgomp.c/target-32.c (test for excess errors)
UNRESOLVED: libgomp.c/target-32.c compilation failed to produce executable

PASS: libgomp.c/thread-limit-2-nosleep.c (test for excess errors)
PASS: libgomp.c/thread-limit-2-nosleep.c execution test
XFAIL: libgomp.c/thread-limit-2.c 1 blank line(s) in output
XFAIL: libgomp.c/thread-limit-2.c (test for excess errors)

// No failures since GOMP_OFFLOAD_async_run works on amdgcn
PASS: libgomp.c/target-33.c (test for excess errors)
PASS: libgomp.c/target-33.c execution test
PASS: libgomp.c/target-34.c (test for excess errors)
PASS: libgomp.c/target-34.c execution test

// No xfail here
PASS: libgomp.c/target-link-1.c (test for excess errors)
FAIL: libgomp.c/target-link-1.c execution test

Note that target-link-1.c execution does also fail on amdgcn.
Since - in contrast to nvptx - it seems that the cause of this failure
has not yet been investigated and discussed, I have not added an xfail
for amdgcn to this test.

All testing has been done with a x86_64-linux-gnu host and target.

Ok to commit this patch?

Best regards,
Frederik





From 6e5e2d45f02235a0f72e6130dcd8d52f88f7b126 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Fri, 7 Feb 2020 08:03:00 +0100
Subject: [PATCH] xfail and improve some failing libgomp tests

* libgomp.c/{target-32.c,thread-limit-2.c}

Regarding failures because "no usleep implemented for 

Re: Make OpenACC 'acc_get_property' with 'acc_device_current' work (was: [PATCH] Add OpenACC 2.6 `acc_get_property' support)

2020-02-03 Thread Harwath, Frederik
Hi Thomas,

On 30.01.20 16:54, Thomas Schwinge wrote:
> 
> [...] the 'acc_device_current' interface should work already now.
> 
> [...] Please review
> the attached (Tobias the Fortran test cases, please), and test with AMD
> GCN offloading.  If approving this patch, please respond with

I have tested the patch with AMD GCN offloading and I have observed no 
regressions.
The new tests pass as expected and print the correct output.
Great that you have extended the Fortran tests!

> diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
> index ef12b4c16d01..c28c0f689ba2 100644
> --- a/libgomp/oacc-init.c
> +++ b/libgomp/oacc-init.c
> @@ -796,7 +796,9 @@ get_property_any (int ord, acc_device_t d, 
> acc_device_property_t prop)
> size_t
> acc_get_property (int ord, acc_device_t d, acc_device_property_t prop)
> {
> -  if (!known_device_type_p (d))
> +  if (d == acc_device_current)
> +; /* Allowed only for 'acc_get_property', 'acc_get_property_string'.  */
> +  else if (!known_device_type_p (d))
> unknown_device_type_error(d);

I don't like the empty if branch very much. Introducing a variable
(for instance, "bool allowed_device_type = acc_device_current
|| known_device_type(d);") would also provide a place for your comment.
You could also extract a function to avoid duplicating the explanation
in acc_get_property_string.

The patch looks good to me.

Reviewed-by: Frederik Harwath  

Best regards,
Frederik



Re: [PATCH] Add OpenACC acc_get_property support for AMD GCN

2020-01-31 Thread Harwath, Frederik
Hi Thomas,

On 30.01.20 17:08, Thomas Schwinge wrote:

> I understand correctly that the only reason for:
> 
> On 2020-01-29T10:52:57+0100, "Harwath, Frederik"  
> wrote:
>>  * testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c
>>  (expect_device_properties): Split function into ...
>>  (expect_device_string_properties): ... this new function ...
>>  (expect_device_memory): ... and this new function.
> 
> ... this split is that we can't test 'expect_device_memory' here:

> [...]

> ..., because that one doesn't (re-)implement the 'acc_property_memory'
> interface?

Correct. But why "re-"? It has not been implemented before.

>> --- a/libgomp/plugin/plugin-gcn.c
>> +++ b/libgomp/plugin/plugin-gcn.c
> 
>> @@ -4115,12 +4141,37 @@ GOMP_OFFLOAD_openacc_async_dev2host (int device, 
>> void *dst, const void *src,
>>  union goacc_property_value
>>  GOMP_OFFLOAD_openacc_get_property (int device, enum goacc_property prop)
>>  {
>> [...]
>> +  switch (prop)
>> +{
>> +case GOACC_PROPERTY_FREE_MEMORY:
>> +  /* Not supported. */
>> +  break;
> 
> (OK, can be added later when somebody feels like doing that.)

Well, "not supported" means that there seems to be no (reasonable) way to obtain
the necessary information from the runtime - in contrast to the nvptx plugin
where it can be obtained easily through the CUDA API.

> 
>> +case GOACC_PROPERTY_MEMORY:
>> +  {
>> +size_t size;
>> +hsa_region_t region = agent->data_region;
>> +hsa_status_t status =
>> +  hsa_fns.hsa_region_get_info_fn (region, HSA_REGION_INFO_SIZE, );
>> +if (status == HSA_STATUS_SUCCESS)
>> +  propval.val = size;
>> +break;
>> +  }
>> [...]
>>  }
> 
> Here we got 'acc_property_memory' implemented, but not here:
> 
>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-gcn.c

Yes, there seems to be no straightforward way to determine the expected value 
through
the runtime API. We might of course try to replicate the logic that is
used in plugin-gcn.c.

Best regards,
Frederik



Re: [PATCH] Add OpenACC acc_get_property support for AMD GCN

2020-01-29 Thread Harwath, Frederik
Hi Thomas,

On 29.01.20 18:44, Thomas Schwinge wrote:

>> +  size_t len = sizeof hsa_context.driver_version_s;
>> +  int printed = snprintf (hsa_context.driver_version_s, len,
>> +  "HSA Runtime %hu.%hu", (unsigned short int)major,
>> +  (unsigned short int)minor);
>> +  if (printed >= len)
>> +GCN_WARNING ("HSA runtime version string was truncated."
>> + "Version %hu.%hu is too long.", (unsigned short int)major,
>> + (unsigned short int)minor);
> 
> (Can it actually happen that 'snprintf' returns 'printed > len' --
> meaning that it's written into random memory?  I thought 'snprintf' has a
> hard stop at 'len'?  Or does this indicate the amount of memory it
> would've written?  I should re-read the manpage at some point...)  ;-)
> 

Yes, "printed > len" can happen. Seems that I have chosen a bad variable name.
"actual_len" (of the formatted string that should have been written -
excluding the terminating '\0') would have been more appropriate.


> For 'printed = len' does or doesn't 'snprintf' store the terminating
> 'NUL' character, or do we manually have to set:
> 
> hsa_context.driver_version_s[len - 1] = '\0';
> 
> ... in that case?

No, in this case, the printed string is missing the last character, but the
terminating '\0' has been written. Consider:

#include 

int main () {
char s[] = "foo";
char buf[3];

// buf is too short to hold terminating '\0'
int actual_len = snprintf (buf, 3, "%s", s);
printf ("buf: %s\n", buf);
printf ("actual_len: %d\n", actual_len);
}

Output:


buf: fo
actual_len: 3

> 
>> @@ -3410,15 +3432,19 @@ GOMP_OFFLOAD_init_device (int n)
> 
>> -  char buf[64];
>>status = hsa_fns.hsa_agent_get_info_fn (agent->id, HSA_AGENT_INFO_NAME,
>> -  );
>> +  >name);
>>if (status != HSA_STATUS_SUCCESS)
>>  return hsa_error ("Error querying the name of the agent", status);
> 
> (That's of course pre-existing, but) this looks like a dangerous API,
> given that 'hsa_agent_get_info_fn' doesn't know 'sizeof agent->name' (or
> 'sizeof buf' before)...

The API documentation
(cf. 
https://rocm-documentation.readthedocs.io/en/latest/ROCm_API_References/ROCr-API.html)
states that "the type of this attribute is a NUL-terminated char[64]".
But, right, should this ever change, we might not notice it.

Best regards,
Frederik




[PATCH][OpenACC] Add acc_device_radeon to name_of_acc_device_t function

2020-01-29 Thread Harwath, Frederik
Hi,
we should handle acc_device_radeon in the name_of_acc_device_t function
which is used in libgomp/oacc-init.c to display the name of devices
in several error messages.

Ok to commit this patch to master?

Best regards,
Frederik

From 6aacba3e8123ce5e0961857802fd7d8a103aa96b Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Mon, 27 Jan 2020 15:41:26 +0100
Subject: [PATCH] Add acc_device_radeon to name_of_acc_device_t function

libgomp/
	* oacc-init.c (name_of_acc_device_t): Handle acc_device_radeon.
---
 libgomp/oacc-init.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index 89a30b3e716..ef12b4c16d0 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -115,6 +115,7 @@ name_of_acc_device_t (enum acc_device_t type)
 case acc_device_host: return "host";
 case acc_device_not_host: return "not_host";
 case acc_device_nvidia: return "nvidia";
+case acc_device_radeon: return "radeon";
 default: unknown_device_type_error (type);
 }
   __builtin_unreachable ();
-- 
2.17.1



Re: [PATCH] Add OpenACC acc_get_property support for AMD GCN

2020-01-29 Thread Harwath, Frederik
Hi Andrew,

On 29.01.20 11:38, Andrew Stubbs wrote:
> On 29/01/2020 09:52, Harwath, Frederik wrote:

> 
> Patch 1 is OK with the formatting fixed.
> Patch 2 is OK.
> 
> Thanks very much,
> 

Committed as 2e5ea57959183bd5bd0356739bb5167417401a31 and 
87c3fcfa6bbb5c372d4e275276d21f601d0b62b0.

Thank you for the review,
Frederik



Re: [PATCH] Add OpenACC acc_get_property support for AMD GCN

2020-01-29 Thread Harwath, Frederik
Hi Andrew,

On 28.01.20 16:42, Andrew Stubbs wrote:
> On 28/01/2020 14:55, Harwath, Frederik wrote:
> 
> If we're going to use a fixed-size buffer then we should use snprintf and 
> emit GCN_WARNING if the return value is greater than 
> "sizeof(driver_version_s)", even though that is unlikely. Do the same in the 
> testcase, but use a bigger buffer so that truncation causes a mismatch and 
> test failure.

Ok.


> I realise that an existing function in this testcase uses this layout, but 
> the code style does not normally have the parameter list on the next line, 
> and certainly not in column 1.

Ok. I have also adjusted the formatting in the other acc_get_property tests to 
the code style. I have turned this into a separate trivial patch.

Ok to commit the revised patch?

Best regards,
Frederik

From fb15cb9058feeda8891d6454d32f43fda885b789 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Wed, 29 Jan 2020 10:19:50 +0100
Subject: [PATCH 1/2] Add OpenACC acc_get_property support for AMD GCN

Add full support for the OpenACC 2.6 acc_get_property and
acc_get_property_string functions to the libgomp GCN plugin.

libgomp/
	* plugin-gcn.c (struct agent_info): Add fields "name" and
	"vendor_name" ...
	(GOMP_OFFLOAD_init_device): ... and init from here.
	(struct hsa_context_info): Add field "driver_version_s" ...
	(init_hsa_contest): ... and init from here.
	(GOMP_OFFLOAD_openacc_get_property): Replace stub with a proper
	implementation.
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property.c:
	Enable test execution for amdgcn and host offloading targets.
	* testsuite/libgomp.oacc-fortran/acc_get_property.f90: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c
	(expect_device_properties): Split function into ...
	(expect_device_string_properties): ... this new function ...
	(expect_device_memory): ... and this new function.
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-gcn.c:
	Add test.
---
 libgomp/plugin/plugin-gcn.c   |  71 --
 .../acc_get_property-aux.c|  79 ++-
 .../acc_get_property-gcn.c| 132 ++
 .../acc_get_property.c|   5 +-
 .../libgomp.oacc-fortran/acc_get_property.f90 |   2 -
 5 files changed, 242 insertions(+), 47 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-gcn.c

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 7854c142f05..45c625495b9 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -425,7 +425,10 @@ struct agent_info
 
   /* The instruction set architecture of the device. */
   gcn_isa device_isa;
-
+  /* Name of the agent. */
+  char name[64];
+  /* Name of the vendor of the agent. */
+  char vendor_name[64];
   /* Command queues of the agent.  */
   hsa_queue_t *sync_queue;
   struct goacc_asyncqueue *async_queues, *omp_async_queue;
@@ -544,6 +547,8 @@ struct hsa_context_info
   int agent_count;
   /* Array of agent_info structures describing the individual HSA agents.  */
   struct agent_info *agents;
+  /* Driver version string. */
+  char driver_version_s[30];
 };
 
 /* Format of the on-device heap.
@@ -1513,6 +1518,23 @@ init_hsa_context (void)
 	GOMP_PLUGIN_error ("Failed to list all HSA runtime agents");
 }
 
+  uint16_t minor, major;
+  status = hsa_fns.hsa_system_get_info_fn (HSA_SYSTEM_INFO_VERSION_MINOR, );
+  if (status != HSA_STATUS_SUCCESS)
+GOMP_PLUGIN_error ("Failed to obtain HSA runtime minor version");
+  status = hsa_fns.hsa_system_get_info_fn (HSA_SYSTEM_INFO_VERSION_MAJOR, );
+  if (status != HSA_STATUS_SUCCESS)
+GOMP_PLUGIN_error ("Failed to obtain HSA runtime major version");
+
+  size_t len = sizeof hsa_context.driver_version_s;
+  int printed = snprintf (hsa_context.driver_version_s, len,
+			  "HSA Runtime %hu.%hu", (unsigned short int)major,
+			  (unsigned short int)minor);
+  if (printed >= len)
+GCN_WARNING ("HSA runtime version string was truncated."
+		 "Version %hu.%hu is too long.", (unsigned short int)major,
+		 (unsigned short int)minor);
+
   hsa_context.initialized = true;
   return true;
 }
@@ -3410,15 +3432,19 @@ GOMP_OFFLOAD_init_device (int n)
 return hsa_error ("Error requesting maximum queue size of the GCN agent",
 		  status);
 
-  char buf[64];
   status = hsa_fns.hsa_agent_get_info_fn (agent->id, HSA_AGENT_INFO_NAME,
-	  );
+	  >name);
   if (status != HSA_STATUS_SUCCESS)
 return hsa_error ("Error querying the name of the agent", status);
 
-  agent->device_isa = isa_code (buf);
+  agent->device_isa = isa_code (agent->name);
   if (agent->device_isa < 0)
-return hsa_error ("Unknown GCN agent architecture.", HSA_STATUS_ERROR);
+return hsa_error ("Unknown GCN agent 

[PATCH] Add OpenACC acc_get_property support for AMD GCN

2020-01-28 Thread Harwath, Frederik
Hi,
this patch adds full support for the OpenACC 2.6 acc_get_property and
acc_get_property_string functions to the libgomp GCN plugin. This replaces
the existing stub in libgomp/plugin-gcn.c.

Andrew: The value returned for acc_property_memory ("size of device memory in 
bytes"
according to the spec) is the HSA_REGION_INFO_SIZE of the agent's data_region. 
This
has been adapted from a previous incomplete implementation that we had on the 
OG9 branch.
Does that sound reasonable?

I have tested the patch with amdgcn and nvptx offloading.

Ok to commit this to the main branch?


Best regards,
Frederik

From 6f1855281c38993a088f9b4af020a786f8e05fe9 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Tue, 28 Jan 2020 08:01:00 +0100
Subject: [PATCH] Add OpenACC acc_get_property support for AMD GCN

Add full support for the OpenACC 2.6 acc_get_property and
acc_get_property_string functions to the libgomp GCN plugin.

libgomp/
	* plugin-gcn.c (struct agent_info): Add fields "name" and
	"vendor_name" ...
	(GOMP_OFFLOAD_init_device): ... and init from here.
	(struct hsa_context_info): Add field "driver_version_s" ...
	(init_hsa_contest): ... and init from here.
	(GOMP_OFFLOAD_openacc_get_property): Replace stub with a proper
	implementation.
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property.c:
	Enable test execution for amdgcn and host offloading targets.
	* testsuite/libgomp.oacc-fortran/acc_get_property.f90: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c
	(expect_device_properties): Split function into ...
	(expect_device_string_properties): ... this new function ...
	(expect_device_memory): ... and this new function.
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-gcn.c:
	Add test.
---
 libgomp/plugin/plugin-gcn.c   |  63 +++--
 .../acc_get_property-aux.c|  60 +---
 .../acc_get_property-gcn.c| 132 ++
 .../acc_get_property.c|   5 +-
 .../libgomp.oacc-fortran/acc_get_property.f90 |   2 -
 5 files changed, 224 insertions(+), 38 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-gcn.c

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 7854c142f05..0a09daaa0a4 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -425,7 +425,10 @@ struct agent_info
 
   /* The instruction set architecture of the device. */
   gcn_isa device_isa;
-
+  /* Name of the agent. */
+  char name[64];
+  /* Name of the vendor of the agent. */
+  char vendor_name[64];
   /* Command queues of the agent.  */
   hsa_queue_t *sync_queue;
   struct goacc_asyncqueue *async_queues, *omp_async_queue;
@@ -544,6 +547,8 @@ struct hsa_context_info
   int agent_count;
   /* Array of agent_info structures describing the individual HSA agents.  */
   struct agent_info *agents;
+  /* Driver version string. */
+  char driver_version_s[30];
 };
 
 /* Format of the on-device heap.
@@ -1513,6 +1518,15 @@ init_hsa_context (void)
 	GOMP_PLUGIN_error ("Failed to list all HSA runtime agents");
 }
 
+  uint16_t minor, major;
+  status = hsa_fns.hsa_system_get_info_fn (HSA_SYSTEM_INFO_VERSION_MINOR, );
+  if (status != HSA_STATUS_SUCCESS)
+GOMP_PLUGIN_error ("Failed to obtain HSA runtime minor version");
+  status = hsa_fns.hsa_system_get_info_fn (HSA_SYSTEM_INFO_VERSION_MAJOR, );
+  if (status != HSA_STATUS_SUCCESS)
+GOMP_PLUGIN_error ("Failed to obtain HSA runtime major version");
+  sprintf (hsa_context.driver_version_s, "HSA Runtime %d.%d", major, minor);
+
   hsa_context.initialized = true;
   return true;
 }
@@ -3410,15 +3424,19 @@ GOMP_OFFLOAD_init_device (int n)
 return hsa_error ("Error requesting maximum queue size of the GCN agent",
 		  status);
 
-  char buf[64];
   status = hsa_fns.hsa_agent_get_info_fn (agent->id, HSA_AGENT_INFO_NAME,
-	  );
+	  >name);
   if (status != HSA_STATUS_SUCCESS)
 return hsa_error ("Error querying the name of the agent", status);
 
-  agent->device_isa = isa_code (buf);
+  agent->device_isa = isa_code (agent->name);
   if (agent->device_isa < 0)
-return hsa_error ("Unknown GCN agent architecture.", HSA_STATUS_ERROR);
+return hsa_error ("Unknown GCN agent architecture", HSA_STATUS_ERROR);
+
+  status = hsa_fns.hsa_agent_get_info_fn (agent->id, HSA_AGENT_INFO_VENDOR_NAME,
+	  >vendor_name);
+  if (status != HSA_STATUS_SUCCESS)
+return hsa_error ("Error querying the vendor name of the agent", status);
 
   status = hsa_fns.hsa_queue_create_fn (agent->id, queue_size,
 	HSA_QUEUE_TYPE_MULTI,
@@ -4115,12 +4133,37 @@ GOMP_OFFLOAD_openacc_async_dev2host (int device, void *dst, const void *src,
 union goacc_property_value
 GOMP_OFFLOAD_openacc_get_property (int device, enum goacc_property prop)
 {
-  /* Stub. Check device and return default value for unsupported properties. */
-  /* TODO: Implement this function. */
-  get_agent_info (device);
+  

Re: [PATCH] Add OpenACC 2.6 `acc_get_property' support

2020-01-24 Thread Harwath, Frederik
Hi Thomas,

On 23.01.20 15:32, Thomas Schwinge wrote:

> On 2020-01-20T15:01:01+0100, "Harwath, Frederik"  
> wrote:
>> On 16.01.20 17:00, Thomas Schwinge wrote:
>>> On 2019-12-20T17:46:57+0100, "Harwath, Frederik" 
>>>  wrote:
>> Ok to push the commit to master?
> 
> Thanks, OK.  Reviewed-by: Thomas Schwinge 

Thank you. Committed as 4bd03ed69bd789278a0286017b692f49052ffe5c, including the 
changes to the size_t
formatting.

Best regards,
Frederik

> 
> 
> As a low-priority follow-up, please look into:
> 
> 
> source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c:
>  In function 'expect_device_properties':
> 
> source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c:74:24:
>  warning: format '%d' expects argument of type 'int', but argument 3 has type 
> 'const char *' [-Wformat=]
>74 |   fprintf (stderr, "Expected value of unknown string property 
> to be NULL, "
>   |
> ^~~~
>75 | "but was %d.\n", s);
>   |  ~
>   |  |
>   |  const char *
> 
> source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c:75:19:
>  note: format string is defined here
>75 | "but was %d.\n", s);
>   |  ~^
>   |   |
>   |   int
>   |  %s
> 
> ..., and (random example):
> 
>>int unknown_property = 16058;
>> -  int v = acc_get_property (dev_num, dev_type, 
>> (acc_device_property_t)unknown_property);
>> +  size_t v = acc_get_property (dev_num, dev_type, 
>> (acc_device_property_t)unknown_property);
>>if (v != 0)
>>  {
>>fprintf (stderr, "Expected value of unknown numeric property to equal 
>> 0, "
>> -   "but was %d.\n", v);
>> +   "but was %zd.\n", v);
>>abort ();
>>  }
> 
> ..., shouldn't that be '%zu' given that 'size_t' is 'unsigned'?
> 
> libgomp.oacc-c-c++-common/acc_get_property-aux.c:  fprintf (stderr, 
> "Expected acc_property_memory to equal %zd, "
> libgomp.oacc-c-c++-common/acc_get_property-aux.c:"but was 
> %zd.\n", expected_memory, total_mem);
> libgomp.oacc-c-c++-common/acc_get_property-aux.c:", but free 
> memory was %zd and total memory was %zd.\n",
> libgomp.oacc-c-c++-common/acc_get_property-aux.c:"but was 
> %zd.\n", v);
> libgomp.oacc-c-c++-common/acc_get_property.c:  printf ("Total 
> memory: %zd\n", v);
> libgomp.oacc-c-c++-common/acc_get_property.c:  printf ("Free 
> memory: %zd\n", v);
> 
> 
> Grüße
>  Thomas
> 

From 4bd03ed69bd789278a0286017b692f49052ffe5c Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Mon, 20 Jan 2020 14:07:03 +0100
Subject: [PATCH 1/2] Fix expectation and types in acc_get_property tests

* Weaken expectation concerning acc_property_free_memory.
  Do not expect the value returned by CUDA since that value might have
  changed in the meantime.
* Use correct type for the results of calls to acc_get_property in tests.

libgomp/
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c
	(expect_device_properties): Remove "expected_free_mem" argument,
	change "expected_total_mem" argument type to size_t;
	change types of acc_get_property results to size_t,
	adapt format strings.
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property.c:
	Use %zu instead of %zd to print size_t values.
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-2.c: Adapt and
	rename to ...
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-nvptx.c: ... this.
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-3.c: Adapt and
	rename to ...
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-host.c: ... this.

Reviewed-by: Thomas Schwinge  
---
 .../acc_get_property-aux.c| 30 +--
 ...t_property-3.c => acc_get_property-host.c} |  7 ++---
 ..._property-2.c => acc_get_property-nvptx.c} |  9 +++---
 .../acc_get_property.c|  4 +--
 4 files changed, 25 insertions(+), 25 deletions(-)
 rename libgomp/testsuite/libgomp.oacc-c-c++-common/{acc_get_property-3.c => acc_get_property-host.c} (63%)
 rename libgomp/testsuite/libgomp.oacc-c-c++-common/{acc_get_property-2.c => acc_get_property-nvptx.c} (86%)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c b/libgomp/te

Re: [PATCH][amdgcn] Add runtime ISA check for amdgcn offloading

2020-01-20 Thread Harwath, Frederik
Hi Andrew,
Thanks for the review! I have attached a revised patch containing the changes 
that you suggested.

On 20.01.20 11:00, Andrew Stubbs wrote:

> On 20/01/2020 06:57, Harwath, Frederik wrote:
>> Is it ok to commit this patch to the master branch?
> 
> I can't see anything significantly wrong with the code of the patch, however 
> I have some minor issues I'd like fixed in the text.
> 
> [...] Please move the functions down into the "Utility functions" group. The 
> const static variables should probably go with them.

Done.

>> @@ -3294,7 +3415,11 @@ GOMP_OFFLOAD_init_device (int n)
>>    );
>>    if (status != HSA_STATUS_SUCCESS)
>>  return hsa_error ("Error querying the name of the agent", status);
>> -  agent->gfx900_p = (strncmp (buf, "gfx900", 6) == 0);
>> +  agent->gfx900_p = (strncmp (buf, gcn_gfx900_s, gcn_isa_name_len) == 0);
>> +
>> +  agent->device_isa = isa_code (buf);
>> +  if (agent->device_isa < 0)
>> +    return hsa_error ("Unknown GCN agent architecture.", HSA_STATUS_ERROR);
> 
> Can device_isa not just replace gfx900_p? I think it's only tested in one 
> place, and that would be easily substituted.
> 

Yes, I have changed that one place to use agent->device_isa.

I would commit the patch then if nobody objects :-). The other approaches (fat 
binaries etc.) that have been discussed in
this thread seem to be long-term projects and until something like this gets 
implemented the early error checking
implemented by this patch seems to be better than nothing.

Frederik
From 470892454bf0d67ea71c2399f5819713592e46a0 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Mon, 20 Jan 2020 07:45:43 +0100
Subject: [PATCH] Add runtime ISA check for amdgcn offloading

When executing code that uses amdgcn GPU offloading, the ISA of the GPU must
match the ISA for which the code has been compiled.  So far, the libgomp amdgcn
plugin did not attempt to verify this.  In case of a mismatch, the user is
confronted with an unhelpful error message produced by the HSA runtime.

This commit implements a runtime ISA check. In the case of a ISA mismatch, the
execution is aborted with a clear error message and a hint at the correct
compilation parameters for the GPU on which the execution has been attempted.

libgomp/
	* plugin/plugin-gcn.c (EF_AMDGPU_MACH): New enum.
	* (EF_AMDGPU_MACH_MASK): New constant.
	* (gcn_isa): New typedef.
	* (gcn_gfx801_s): New constant.
	* (gcn_gfx803_s): New constant.
	* (gcn_gfx900_s): New constant.
	* (gcn_gfx906_s): New constant.
	* (gcn_isa_name_len): New constant.
	* (elf_gcn_isa_field): New function.
	* (isa_hsa_name): New function.
	* (isa_gcc_name): New function.
	* (isa_code): New function.
	* (struct agent_info): Add field "device_isa" and remove field
	"gfx900_p".
	* (GOMP_OFFLOAD_init_device): Adapt agent init to "agent_info"
	field changes, fail if device has unknown ISA.
	* (parse_target_attributes): Replace "gfx900_p" by "device_isa".
	* (isa_matches_agent): New function ...
	* (create_and_finalize_hsa_program): ... used from here to check
	that the GPU ISA and the code-object ISA match.
---
 libgomp/plugin/plugin-gcn.c | 131 ++--
 1 file changed, 127 insertions(+), 4 deletions(-)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 16ce251f3a5..de470a3dd33 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -396,6 +396,20 @@ struct gcn_image_desc
   struct global_var_info *global_variables;
 };
 
+/* This enum mirrors the corresponding LLVM enum's values for all ISAs that we
+   support.
+   See https://llvm.org/docs/AMDGPUUsage.html#amdgpu-ef-amdgpu-mach-table */
+
+typedef enum {
+  EF_AMDGPU_MACH_AMDGCN_GFX801 = 0x028,
+  EF_AMDGPU_MACH_AMDGCN_GFX803 = 0x02a,
+  EF_AMDGPU_MACH_AMDGCN_GFX900 = 0x02c,
+  EF_AMDGPU_MACH_AMDGCN_GFX906 = 0x02f,
+} EF_AMDGPU_MACH;
+
+const static int EF_AMDGPU_MACH_MASK = 0x00ff;
+typedef EF_AMDGPU_MACH gcn_isa;
+
 /* Description of an HSA GPU agent (device) and the program associated with
it.  */
 
@@ -408,8 +422,9 @@ struct agent_info
   /* Whether the agent has been initialized.  The fields below are usable only
  if it has been.  */
   bool initialized;
-  /* Precomputed check for problem architectures.  */
-  bool gfx900_p;
+
+  /* The instruction set architecture of the device. */
+  gcn_isa device_isa;
 
   /* Command queues of the agent.  */
   hsa_queue_t *sync_queue;
@@ -1232,7 +1247,8 @@ parse_target_attributes (void **input,
 
   if (gcn_dims_found)
 {
-  if (agent->gfx900_p && gcn_threads == 0 && override_z_dim == 0)
+  if (agent->device_isa == EF_AMDGPU_MACH_AMDGCN_GFX900
+	  && gcn_threads == 0 && override_z_dim == 0)
 	{
 	  gcn_threads = 4

Re: [PATCH] Add OpenACC 2.6 `acc_get_property' support

2020-01-20 Thread Harwath, Frederik
Hi Thomas,
I have attached a patch containing the changes that you suggested.

On 16.01.20 17:00, Thomas Schwinge wrote:

> On 2019-12-20T17:46:57+0100, "Harwath, Frederik"  
> wrote:
>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-2.c
> 
> I suggest to rename this one to 'acc_get_property-nvptx.c'> [...]
>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-3.c

> I suggest to rename this one to 'acc_get_property-host.c'.

I renamed both.

> This assumes that the 'cuda*' interfaces and OpenACC/libgomp interfaces
> handle/order device numbers in the same way -- which it seems they do,
> but just noting this in case this becomes an issue at some point.

Correct, I have added a corresponding comment to acc_get_property-nvptx.c.

> Aside from improper data types being used for storing/printing the memory
> information, we have to expect 'acc_property_free_memory' to change
> between two invocations.  ;-)

Right! I have removed the assertion and changed it into ...
> 
> Better to just verify that 'free_mem >= 0' (by means of 'size_t' data
> type, I suppose), and 'free_mem <= total_mem'?

... this.

> 
> (..., and for avoidance of doubt: I think there's no point in
> special-casing this one for 'acc_device_host' where we know that
> 'free_mem' is always zero -- this may change in the future.)

Sure! But with the new "free_mem <= total_mem" assertion and since we assert
total_mem == 0 and since free_mem >= 0, we effectively also assert that in the
test right now ;-).


Ok to push the commit to master?

Best regards,
Frederik
From ef5a959bedc3214e86d6a683a02b693d82847ecd Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Mon, 20 Jan 2020 14:07:03 +0100
Subject: [PATCH] Fix expectation and types in acc_get_property tests

* Weaken expectation concerning acc_property_free_memory.
  Do not expect the value returned by CUDA since that value might have
  changed in the meantime.
* Use correct type for the results of calls to acc_get_property in tests.

libgomp/
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c
	(expect_device_properties): Remove "expected_free_mem" argument,
	change "expected_total_mem" argument type to size_t;
	change types of acc_get_property results to size_t.
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-2.c: Adapt and
	rename to ...
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-nvptx.c: ... this.
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-3.c: Adapt and
	rename to ...
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-host.c: ... this.

Reviewed-by: Thomas Schwinge  
---
 .../acc_get_property-aux.c| 28 +--
 ...t_property-3.c => acc_get_property-host.c} |  7 ++---
 ..._property-2.c => acc_get_property-nvptx.c} |  9 +++---
 3 files changed, 22 insertions(+), 22 deletions(-)
 rename libgomp/testsuite/libgomp.oacc-c-c++-common/{acc_get_property-3.c => acc_get_property-host.c} (63%)
 rename libgomp/testsuite/libgomp.oacc-c-c++-common/{acc_get_property-2.c => acc_get_property-nvptx.c} (86%)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c
index 952bdbf6aea..76c29501839 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c
@@ -8,9 +8,8 @@
 
 void expect_device_properties
 (acc_device_t dev_type, int dev_num,
- int expected_total_mem, int expected_free_mem,
- const char* expected_vendor, const char* expected_name,
- const char* expected_driver)
+ size_t expected_memory, const char* expected_vendor,
+ const char* expected_name, const char* expected_driver)
 {
   const char *vendor = acc_get_property_string (dev_num, dev_type,
 		acc_property_vendor);
@@ -21,22 +20,23 @@ void expect_device_properties
   abort ();
 }
 
-  int total_mem = acc_get_property (dev_num, dev_type,
-acc_property_memory);
-  if (total_mem != expected_total_mem)
+  size_t total_mem = acc_get_property (dev_num, dev_type,
+   acc_property_memory);
+  if (total_mem != expected_memory)
 {
-  fprintf (stderr, "Expected acc_property_memory to equal %d, "
-	   "but was %d.\n", expected_total_mem, total_mem);
+  fprintf (stderr, "Expected acc_property_memory to equal %zd, "
+	   "but was %zd.\n", expected_memory, total_mem);
   abort ();
 
 }
 
-  int free_mem = acc_get_property (dev_num, dev_type,
+  size_t free_mem = acc_get_property (dev_num, dev_type,
    acc_property_free_memory);
-  if (free_mem != expected_free_mem)
+  if (free_mem > total_mem)
 {
-  fprintf (stderr, "Expected acc_property_free_memory to equal %d, "
-

[PATCH][amdgcn] Add runtime ISA check for amdgcn offloading

2020-01-19 Thread Harwath, Frederik
Hi,
this patch implements a runtime ISA check for amdgcn offloading.
The check verifies that the ISA of the GPU to which we try to offload matches
the ISA for which the code to be offloaded has been compiled. If it detects
a mismatch, it emits an error message which contains a hint at the correct 
compilation
parameters for the GPU. For instance:

  "libgomp: GCN fatal error: GCN code object ISA 'gfx906' does not match GPU 
ISA 'gfx900'.
   Try to recompile with '-foffload=-march=gfx900'."
or
  "libgomp: GCN fatal error: GCN code object ISA 'gfx900' does not match agent 
ISA 'gfx803'.
   Try to recompile with '-foffload=-march=fiji'."

(By the way, the names that we use for the ISAs are a bit inconsistent. Perhaps 
we should just
 use the gfx-names for all ISAs everywhere?.)

Without this patch, the user only gets an confusing error message from the HSA 
runtime which
fails to load the GCN object code.

I have checked that the code does not lead to any regressions when running
the test suite correctly, i.e. with the "-foffload=-march=..." option
given to the compiler matching the architecture of the GPU.
It seems difficult to implement an automated test that triggers an ISA mismatch.
I have tested manually (for different combinations of the compilation flags
and offloading GPU ISAs) that the runtime ISA check produces the expected error 
messages.

Is it ok to commit this patch to the master branch?

Frederik



From 27981f9c93d1efed6d943dae4ea0c52147c02d5b Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Mon, 20 Jan 2020 07:45:43 +0100
Subject: [PATCH] Add runtime ISA check for amdgcn offloading

When executing code that uses amdgcn GPU offloading, the ISA of the GPU must
match the ISA for which the code has been compiled.  So far, the libgomp amdgcn
plugin did not attempt to verify this.  In case of a mismatch, the user is
confronted with an unhelpful error message produced by the HSA runtime.

This commit implements a runtime ISA check. In the case of a ISA mismatch, the
execution is aborted with a clear error message and a hint at the correct
compilation parameters for the GPU on which the execution has been attempted.

libgomp/
	* plugin/plugin-gcn.c (EF_AMDGPU_MACH): New enum.
	(EF_AMDGPU_MACH_MASK): New constant.
	(gcn_isa): New typedef.
	(gcn_gfx801_s): New constant.
	(gcn_gfx803_s): New constant.
	(gcn_gfx900_s): New constant.
	(gcn_gfx906_s): New constant.
	(gcn_isa_name_len): New constant.
	(elf_gcn_isa_field): New function.
	(isa_hsa_name): New function.
	(isa_gcc_name): New function.
	(isa_code): New function.
	(struct agent_info): Add field "device_isa" ...
	(GOMP_OFFLOAD_init_device): ... and init from here,
	failing if device has unknown ISA; adapt init of "gfx900_p"
	to use new constants.
	(isa_matches_agent): New function ...
	(create_and_finalize_hsa_program): ... used from here to check
	that the GPU ISA and the code-object ISA match.
---
 libgomp/plugin/plugin-gcn.c | 127 +++-
 1 file changed, 126 insertions(+), 1 deletion(-)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 16ce251f3a5..14f4a707a7c 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -396,6 +396,88 @@ struct gcn_image_desc
   struct global_var_info *global_variables;
 };
 
+/* This enum mirrors the corresponding LLVM enum's values for all ISAs that we
+   support.
+   See https://llvm.org/docs/AMDGPUUsage.html#amdgpu-ef-amdgpu-mach-table */
+
+typedef enum {
+  EF_AMDGPU_MACH_AMDGCN_GFX801 = 0x028,
+  EF_AMDGPU_MACH_AMDGCN_GFX803 = 0x02a,
+  EF_AMDGPU_MACH_AMDGCN_GFX900 = 0x02c,
+  EF_AMDGPU_MACH_AMDGCN_GFX906 = 0x02f,
+} EF_AMDGPU_MACH;
+
+const static int EF_AMDGPU_MACH_MASK = 0x00ff;
+typedef EF_AMDGPU_MACH gcn_isa;
+
+const static char* gcn_gfx801_s = "gfx801";
+const static char* gcn_gfx803_s = "gfx803";
+const static char* gcn_gfx900_s = "gfx900";
+const static char* gcn_gfx906_s = "gfx906";
+const static int gcn_isa_name_len = 6;
+
+static int
+elf_gcn_isa_field (Elf64_Ehdr *image)
+{
+  return image->e_flags & EF_AMDGPU_MACH_MASK;
+}
+
+/* Returns the name that the HSA runtime uses for the ISA or NULL if we do not
+   support the ISA. */
+
+static const char*
+isa_hsa_name (int isa) {
+  switch(isa)
+{
+case EF_AMDGPU_MACH_AMDGCN_GFX801:
+  return gcn_gfx801_s;
+case EF_AMDGPU_MACH_AMDGCN_GFX803:
+  return gcn_gfx803_s;
+case EF_AMDGPU_MACH_AMDGCN_GFX900:
+  return gcn_gfx900_s;
+case EF_AMDGPU_MACH_AMDGCN_GFX906:
+  return gcn_gfx906_s;
+}
+  return NULL;
+}
+
+/* Returns the user-facing name that GCC uses to identify the architecture (e.g.
+   with -march) or NULL if we do not support the ISA.
+   Keep in sync with /gcc/config/gcn/gcn.{c,opt}.  */
+
+static const char*
+isa_gcc_name (int isa) {
+  switch(isa)
+{
+case EF_AMDGPU_MACH_AMDGCN_GFX801:
+  return "carrizo";
+case EF_AMDGPU_MACH_AMDGCN_GFX803:
+  return "fiji";
+default:
+  return 

*ping* - Re: [Patch] Rework OpenACC nested reduction clause consistency checking (was: Re: [PATCH][committed] Warn about inconsistent OpenACC nested reduction clauses)

2020-01-08 Thread Harwath, Frederik
PING

Hi Jakub,
I have attached a version of the patch that has been rebased on the current 
trunk.

Frederik

On 03.12.19 12:16, Harwath, Frederik wrote:
> On 08.11.19 07:41, Harwath, Frederik wrote:
>> On 06.11.19 14:00, Jakub Jelinek wrote:
>> [...]
>>> I'm not sure it is a good idea to use a TREE_LIST in this case, vec would be
>>> more natural, wouldn't it.
>> [...]
>>> If gimplifier is not the right spot, then use a splay tree + vector instead?
>>> splay tree for the outer ones, vector for the local ones, and put into both
>>> the clauses, so you can compare reduction code etc.
>>
>> Sounds like a good idea. I am going to try that.
> 
> Below you can find a patch that reimplements the nested reductions check using
> more appropriate data structures. [...]


From b08855328c52e36143770e442e50ba87f25c14b3 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Wed, 8 Jan 2020 14:00:44 +0100
Subject: [PATCH] Rework OpenACC nested reduction clause consistency checking

Revision 277875 of trunk introduced a consistency check for nested OpenACC
reduction clauses. The implementation has two drawbacks:
1) It uses suboptimal data structures for storing information about
   the reduction clauses.
2) The warnings issued for *repeated* inconsistent use of reduction operators
   are confusing. For instance, on three nested loops that use the reduction
   operators +, -, + on the same variable, we obtain a warning at the switch
   from + to - (as desired) and another warning about the switch from - to +.
   It would be preferable to avoid the second warning since + is consistent
   with the first reduction operator.

This commit attempts to fix both problems by using more appropriate data
structures (splay trees and vectors instead of tree lists) for keeping track of
the information about the reduction clauses.

2020-01-08  Frederik Harwath  

	gcc/
	* omp-low.c (omp_context): Removed fields local_reduction_clauses,
	outer_reduction_clauses; added fields oacc_reduction_clauses,
	oacc_reductions_stack.
	(oacc_reduction_clause_location): New struct.
	(oacc_reduction_var_occ): New struct.
	(new_omp_context): Adjust omp_context initialization to new fields.
	(delete_omp_context): Adjust omp_context deletion to new fields.
	(rewind_oacc_reductions_stack): New function.
	(check_oacc_reduction_clause): New function.
	(check_oacc_reduction_clauses): New function.
	(scan_sharing_clauses): Call check_oacc_reduction_clause for
	reduction clauses (this handles clauses on compute regions)
	if a new optional flag is enabled.
	(scan_omp_for): Remove old nested reduction check, call
	 check_oacc_reduction_clauses instead.
	(scan_omp_target): Adapt call to scan_sharing_clauses to enable the new
	flag.

   	gcc/testsuite/
	* c-c++-common/goacc/nested-reductions-warn.c: Add dg-prune-output to
	 ignore warnings that are not relevant to the test.
	(acc_parallel): Stop expecting pruned warnings, adjust expected
	warnings to changes in omp-low.c, add checks for info messages about the
	location of clauses.
	(acc_parallel_loop): Likewise.
	(acc_parallel_reduction): Likewise.
	(acc_parallel_loop_reduction): Likewise.
	(acc_routine): Likewise.
	(acc_kernels): Likewise.

	* gfortran.dg/goacc/nested-reductions-warn.f90: Likewise.
---
 gcc/omp-low.c | 306 --
 .../goacc/nested-reductions-warn.c|  81 ++---
 .../goacc/nested-reductions-warn.f90  |  83 ++---
 3 files changed, 271 insertions(+), 199 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index e692a53a3de..6026b7aff89 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -73,6 +73,9 @@ along with GCC; see the file COPYING3.  If not see
scanned for regions which are then moved to a new
function, to be invoked by the thread library, or offloaded.  */
 
+
+struct oacc_reduction_var_occ;
+
 /* Context structure.  Used to store information about each parallel
directive in the code.  */
 
@@ -128,12 +131,6 @@ struct omp_context
  corresponding tracking loop iteration variables.  */
   hash_map *lastprivate_conditional_map;
 
-  /* A tree_list of the reduction clauses in this context.  */
-  tree local_reduction_clauses;
-
-  /* A tree_list of the reduction clauses in outer contexts.  */
-  tree outer_reduction_clauses;
-
   /* Nesting depth of this context.  Used to beautify error messages re
  invalid gotos.  The outermost ctx is depth 1, with depth 0 being
  reserved for the main body of the function.  */
@@ -163,8 +160,52 @@ struct omp_context
 
   /* True if there is bind clause on the construct (i.e. a loop construct).  */
   bool loop_p;
+
+  /* A mapping that maps a variable to information about the last OpenACC
+ reduction clause that used the variable above the current context.
+ This information is used for checking the nesting restrictions for
+ reduction clauses by the function check_oacc

Re: [PATCH] Add OpenACC 2.6 `acc_get_property' support

2019-12-22 Thread Harwath, Frederik
Hi Thomas,

>> Is it ok to commit the patch to trunk?
> 
> OK, thanks.  And then some follow-up/clean-up next year, also including
> some of the open questions that I've snipped off here.

Right, thanks for the review! I have committed the patch as r279710 with a
minor change: I have disabled the new acc_get_property.{c,f90} tests for
the amdgcn offload target for now.

Best regards,
Frederik



Re: [PATCH] Add OpenACC 2.6 `acc_get_property' support

2019-12-20 Thread Harwath, Frederik
Hi Thomas,
thanks for the review! I have attached a revised patch.

> > There is no AMD GCN support yet. This will be added later on.
>
> ACK, just to note that there now is a 'libgomp/plugin/plugin-gcn.c' that
> at least needs to get a stub implementation (can mostly copy from
> 'libgomp/plugin/plugin-hsa.c'?) as otherwise the build will fail.

Yes, I have added a stub. A full implementation will follow soon.
The implementation in the OG9 branch that Andrew mentioned will need a
bit of polishing.

> Tobias has generally reviewed the Fortran bits, correct?

Yes, he has done that internally.

> | Before Frederik starts working on integrating this into GCC trunk, do you
> | (Jakub) agree with the libgomp plugin interface changes as implemented by
> | Maciej?  For example, top-level 'GOMP_OFFLOAD_get_property' function in
> | 'struct gomp_device_descr' instead of stuffing this into its
> | 'acc_dispatch_t openacc'.  (I never understood why the OpenACC functions
> | need to be segregated like they are.)
>
> Jakub didn't answer, but I now myself decided that we should group this
> with the other OpenACC libgomp-plugin functions, as this interface is
> defined in terms of OpenACC-specific stuff such as 'acc_device_t'.
> Frederik, please work on that, also try to move function definitions etc.
> into appropriate places in case they aren't; ask if you need help.
> That needs to be updated.

Is it ok to do this in a separate follow-up patch?


> >  .../acc-get-property-2.c  |  68 +
> >  .../acc-get-property-3.c  |  19 +++
> >  .../acc-get-property-aux.c|  60 
> >  .../acc-get-property.c|  75 ++
> >  .../libgomp.oacc-fortran/acc-get-property.f90 |  80 ++
>
> Please name all these 'acc_get_property*', which is the name of the
> interface tested.

Ok.


> > --- a/include/gomp-constants.h
> > +++ b/include/gomp-constants.h
> > @@ -178,6 +178,20 @@ enum gomp_map_kind
> >=20=20
> >  #define GOMP_DEVICE_ICV-1
> >  #define GOMP_DEVICE_HOST_FALLBACK  -2
> > +#define GOMP_DEVICE_CURRENT-3
> [...]
>
> Not should if this should be grouped with 'GOMP_DEVICE_ICV',
> 'GOMP_DEVICE_HOST_FALLBACK', for it is not related to there.
>
> [...]
>
> Should this actually get value '-1' instead of '-3'?  Or, is the OpenACC
> 'acc_device_t' code already paying special attention to negative values
> '-1', '-2'?  (I don't think so.)
> | Also, 'acc_device_current' is a libgomp-internal thing (doesn't interface
> | with the compiler proper), so strictly speaking 'GOMP_DEVICE_CURRENT'
> | isn't needed in 'include/gomp-constants.h'.  But probably still a good
> | idea to list it there, in this canonical place, to keep the several lists
> | of device types coherent.
> still wonder about that...  ;-)

I have removed GOMP_DEVICE_CURRENT from gomp-constants.h.
Changing the value of GOMP_DEVICE_ICV violates the following static asserts in 
oacc-parallel.c:

 /* In the ABI, the GOACC_FLAGs are encoded as an inverted bitmask, so that we
   continue to support the following two legacy values.  */
_Static_assert (GOACC_FLAGS_UNMARSHAL (GOMP_DEVICE_ICV) == 0,
"legacy GOMP_DEVICE_ICV broken");
_Static_assert (GOACC_FLAGS_UNMARSHAL (GOMP_DEVICE_HOST_FALLBACK)
== GOACC_FLAG_HOST_FALLBACK,
"legacy GOMP_DEVICE_HOST_FALLBACK broken");

> > +/* Device property codes.  Keep in sync with
> > +   libgomp/{openacc.h,openacc.f90,openacc_lib.h}:acc_device_property_t
>
> | Same thing, libgomp-internal, not sure whether to list these here?
>
> > +   as well as libgomp/libgomp-plugin.h.  */
>
> (Not sure why 'libgomp/libgomp-plugin.h' is relevant here?)

It does not seem to be relevant. Right now, openacc_lib.h is also not relevant.
I have removed both file names from the comment.

> > +#define GOMP_DEVICE_PROPERTY_MEMORY1
> > +#define GOMP_DEVICE_PROPERTY_FREE_MEMORY   2
> > +#define GOMP_DEVICE_PROPERTY_NAME  0x10001
> > +#define GOMP_DEVICE_PROPERTY_VENDOR0x10002
> > +#define GOMP_DEVICE_PROPERTY_DRIVER0x10003
> > +
> > +/* Internal property mask to tell numeric and string values apart.  */
> > +#define GOMP_DEVICE_PROPERTY_STRING_MASK   0x1
>
> (Maybe should use an 'enum'?)

I have changed this to an enum. However, this does not improve the code much,
since we cannot use the enum for the function arguments in the plugins
because gomp-constants.h is not included from there.

> Maybe this stuff should move from 'include/gomp-constants.h' to
> 'libgomp/oacc-int.h'.  I'll think about that again, when I'm awake again
> tomorrow.  ;-)

Have you made up your mind yet? :-)


> > --- a/libgomp/libgomp-plugin.h
> > +++ b/libgomp/libgomp-plugin.h
> > @@ -54,6 +54,13 @@ enum offload_target_type
> >OFFLOAD_TARGET_TYPE_GCN =3D 8
> >  };
> >=20=20
> > +/* Container type for passing device properties.  */
> > +union 

[PATCH, committed] Fix PR92901: Change test expectation for C++ in OpenACC test clause-locations.c

2019-12-11 Thread Harwath, Frederik
Hi,
I have committed the attached trivial patch to trunk as r279215. The columns of 
the clause locations are reported differently
by the C and C++ front-end and hence we need different test expectations for 
both languages.

Best regards,
Frederik

r279215 | frederik | 2019-12-11 09:26:18 +0100 (Mi, 11 Dez 2019) | 12 lines

Fix PR92901: Change test expectation for C++ in OpenACC test clause-locations.c 

The columns of the clause locations that are reported for C and C++ are
different and hence we need separate test expectations for both languages.

2019-12-11  Frederik Harwath  

	PR other/92901
	/gcc/testsuite/
	* c-c++-common/clause-locations.c: Adjust test expectation for C++.




Index: gcc/testsuite/c-c++-common/goacc/clause-locations.c
===
--- gcc/testsuite/c-c++-common/goacc/clause-locations.c	(revision 279214)
+++ gcc/testsuite/c-c++-common/goacc/clause-locations.c	(working copy)
@@ -9,7 +9,9 @@
 #pragma acc loop reduction(+:sum)
 for (i = 1; i <= 10; i++)
   {
-#pragma acc loop reduction(-:diff) reduction(-:sum)  /* { dg-warning "53: conflicting reduction operations for .sum." } */
+#pragma acc loop reduction(-:diff) reduction(-:sum)
+	/* { dg-warning "53: conflicting reduction operations for .sum." "" { target c } .-1 } */
+	/* { dg-warning "56: conflicting reduction operations for .sum." "" { target c++ } .-2 } */
 	for (j = 1; j <= 10; j++)
 	  sum = 1;
   }


Re: [PATCH 0/2] Add tests to verify OpenACC clause locations

2019-12-10 Thread Harwath, Frederik
Hi Thomas,

On 10.12.19 15:44, Thomas Schwinge wrote:

> Thanks, yes, with my following remarks considered, and acted on per your
> preference.  To record the review effort, please include "Reviewed-by:
> Thomas Schwinge " in the commit log, see
> .

Committed as r279168 and r279169.

Frederik




Re: [PATCH 0/2] Add tests to verify OpenACC clause locations

2019-12-10 Thread Harwath, Frederik
Hi Thomas,

On 10.12.19 15:44, Thomas Schwinge wrote:

>> Frederik Harwath (2):
>>   Use clause locations in OpenACC nested reduction warnings
>>   Add tests to verify OpenACC clause locations
> 
> I won't insist, but suggest (common practice) to merge that into one
> patch: bug fix plus test cases, using the summary line of your first
> patch.> [...]
> It's of course always OK to add new test cases, but wouldn't the same
> test coverage be reached by just adding such checking to the existing
> test cases in 'c-c++-common/goacc/nested-reductions-warn.c',
> 'gfortran.dg/goacc/nested-reductions-warn.f90'?

Sure, we could have everything in one patch and one test. The rationale
for splitting the patches and for splitting the tests is that the tests do
not try to verify the nested reductions validation code. They try to verify
that the language front-ends set the correct locations for clauses.
Without a possibility to do proper unit testing, I just had to find some
way to check the clauses. I had no immediate success triggering one of the
very few other warnings that use the location of omp_clauses from both Fortran
and C code and hence I went with the nested reductions code.

Thanks for your review!

Best regards,
Frederik




[PATCH] Fix column information for omp_clauses in Fortran code

2019-12-09 Thread Harwath, Frederik
Hi,
Tobias has recently fixed a problem with the column information in gfortran 
locations
("PR 92793 - fix column used for error diagnostic"). Diagnostic messages for 
OpenMP/OpenACC
clauses do not contain the right column information yet. The reason is that the 
location
information of the first clause is used for all clauses on a line and hence the 
columns
are wrong for all but the first clause. The attached patch fixes this problem.

I have tested the patch manually by adapting the validity check for nested 
OpenACC reductions (see omp-low.c)
to include the location of clauses in warnings instead of the location of the 
loop to which the clause belongs.
I can add a regression test based on this later on after adapting the code in 
omp-low.c.

Is it ok to include the patch in trunk?

Best regards,
Frederik


On 04.12.19 14:37, Tobias Burnus wrote:
> As reported internally by Frederik, gfortran currently passes LOCATION_COLUMN 
> == 0 to the middle end. The reason for that is how parsing works – gfortran 
> reads the input line by line.
> 
> For internal error diagnostic (fortran/error.c), the column location was 
> corrected –  but not for locations passed to the middle end. Hence, the 
> diagnostic there wasn't optimal.
> 
> Fixed by introducing a new function; now one only needs to make sure that no 
> new code will re-introduce "lb->location" :-)
> 
> Build and regtested on x86-64-gnu-linux.
> OK for the trunk?
> 
> Tobias

From af3a63b64f38d522b0091a123a919d1f20f5a8b1 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Mon, 9 Dec 2019 15:07:53 +0100
Subject: [PATCH] Fix column information for omp_clauses in Fortran code

The location of all OpenMP/OpenACC clauses on any given line in Fortran code
always points to the first clause on that line. Hence, the column information
is wrong for all clauses but the first one.

Use the correct location for each clause instead.

2019-12-09  Frederik Harwath  

/gcc/fortran/
	* trans-openmp (gfc_trans_omp_reduction_list): Pass correct location for each
	clause to build_omp_clause.
---
 gcc/fortran/trans-openmp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index d07ff86fc0b..356fd04e6c3 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -1982,7 +1982,7 @@ gfc_trans_omp_reduction_list (gfc_omp_namelist *namelist, tree list,
 	tree t = gfc_trans_omp_variable (namelist->sym, false);
 	if (t != error_mark_node)
 	  {
-	tree node = build_omp_clause (gfc_get_location (),
+	tree node = build_omp_clause (gfc_get_location (>where),
 	  OMP_CLAUSE_REDUCTION);
 	OMP_CLAUSE_DECL (node) = t;
 	if (mark_addressable)
-- 
2.17.1



[PATCH][AMDGCN] Skip test gcc/testsuite/gcc.dg/asm-4.c

2019-12-04 Thread Harwath, Frederik
Hi,
the inline assembly "p" modifier ("An operand that is a valid memory address is 
allowed",
cf. 
https://gcc.gnu.org/onlinedocs/gcc/Simple-Constraints.html#Simple-Constraints)
is not supported on AMD GCN. This causes an ICE during the compilation of 
gcc.dg/asm-4.c.
We should skip the test for the amdgcn-*-* target.

Can I merge the patch below into trunk?

Best regards,
Frederik


2019-12-05  Frederik Harwath  

gcc/testsuite/
* gcc.dg/asm-4.c: Skip on target amdgcn-*-*.

Index: gcc/testsuite/gcc.dg/asm-4.c
===
--- gcc/testsuite/gcc.dg/asm-4.c(revision 278932)
+++ gcc/testsuite/gcc.dg/asm-4.c(working copy)
@@ -3,6 +3,7 @@

 /* "p" modifier can't be used to generate a valid memory address with ILP32.  
*/
 /* { dg-skip-if "" { aarch64*-*-* && ilp32 } } */
+/* { dg-skip-if "'p' is not supported for GCN" { amdgcn-*-* } } */

 int main()
 {



Re: [Patch, Fortran] PR 92793 - fix column used for error diagnostic

2019-12-04 Thread Harwath, Frederik
Hi Tobias,

On 04.12.19 14:37, Tobias Burnus wrote:
> As reported internally by Frederik, gfortran currently passes LOCATION_COLUMN 
> == 0 to the middle end. The reason for that is how parsing works – gfortran 
> reads the input line by line.
> 
> For internal error diagnostic (fortran/error.c), the column location was 
> corrected –  but not for locations passed to the middle end. Hence, the 
> diagnostic there wasn't optimal.

I am not sure if those changes have any impact on existing diagnostics - 
probably not or you would have needed to change some tests in your patch. Thus, 
I want to confirm that this fixes the
problems that I had when trying to emit warnings that referenced the location 
of OpenACC reduction clauses from pass_lower_omp when compiling Fortran code.
Where previously

inform (OMP_CLAUSE_LOCATION (some_omp_clause), "Some message.");

would produce

[...] /gcc/testsuite/gfortran.dg/goacc/nested-reductions-warn.f90:19:0: note: 
Some message.

I now get the expected result:

[...] /gcc/testsuite/gfortran.dg/goacc/nested-reductions-warn.f90:19:27: note: 
Some message.

(Well, not completely as expected. In this case where the clause is an OpenACC 
reduction clause, the location of the clause is a bit off because it points to 
the reduction variable and not
to the beginning of the clause, but that's another issue which is not related 
to this patch ;-) )

The existing translation of the reduction clauses has another bug. It uses the 
location of the first clause from the reduction list for all clauses. This 
could be fixed by changing the patch as follows:

> @@ -1854,7 +1854,7 @@ gfc_trans_omp_reduction_list (gfc_omp_namelist 
> *namelist, tree list,
>   tree t = gfc_trans_omp_variable (namelist->sym, false);
>   if (t != error_mark_node)
> {
> - tree node = build_omp_clause (where.lb->location,
> + tree node = build_omp_clause (gfc_get_location (),
> OMP_CLAUSE_REDUCTION);
>   OMP_CLAUSE_DECL (node) = t;
>   if (mark_addressable)

Here "" should be ">where" to use the location of the current 
clause. I have verified that this yields the correct locations for all clauses 
using the nested-reductions-warn.f90 test.


Thank you for fixing this!

Best regards,
Frederik



Re: [PATCH 2/4] Validate acc_device_t uses

2019-12-03 Thread Harwath, Frederik
Hi Thomas,

On 03.12.19 13:14, Thomas Schwinge wrote:
> You once had this patch separate, but then merged into the upstream
> submission of 'acc_get_property'; let's please keep this separate.
> 
> With changes as indicated below, please commit this to trunk [...]

Ok, I have committed the patch as revision 278936. You can find the committed 
version in the attachment. Thank you for the review!

> Generally, does usage of these functions obsolete some existing usage of
> 'acc_dev_num_out_of_range'?  (OK to address later.)

I think it does. I am going to verify this.

>> @@ -168,7 +184,7 @@ resolve_device (acc_device_t d, bool fail_is_error)
>>break;
>>  
>>  default:
>> -  if (d > _ACC_device_hwm)
>> +  if (!acc_known_device_type (d))
>>  {
>>if (fail_is_error)
>>  goto unsupported_device;
> 
> Note that this had 'd > _ACC_device_hwm', not '>=' as it now does, that
> is, previously didn't reject 'd == _ACC_device_hwm' but now does -- but I
> suppose this was an (minor) bug that existed before, so OK to change as
> you did?
Right, I do not see any reasons why it should accept ACC_device_hwm and
the change did not cause any regressions.

Best regards,
Frederik



r278937 | frederik | 2019-12-03 15:38:54 +0100 (Di, 03 Dez 2019) | 25 lines

Validate acc_device_t uses

Check that function arguments of type acc_device_t
are valid enumeration values in all publicly visible
functions from oacc-init.c.

2019-12-03  Frederik Harwath  

libgomp/
* oacc-init.c (acc_known_device_type): Add function.
(unknown_device_type_error): Add function.
(name_of_acc_device_t): Change to call unknown_device_type_error
on unknown type.
(resolve_device): Use acc_known_device_type.
(acc_init): Fail if acc_device_t argument is not valid.
(acc_shutdown): Likewise.
(acc_get_num_devices): Likewise.
(acc_set_device_type): Likewise.
(acc_get_device_num): Likewise.
(acc_set_device_num): Likewise.
(acc_on_device): Add comment that argument validity is not checked.

Reviewed-by: Thomas Schwinge 



Index: libgomp/oacc-init.c
===
--- libgomp/oacc-init.c	(revision 278936)
+++ libgomp/oacc-init.c	(working copy)
@@ -82,6 +82,18 @@
   gomp_mutex_unlock (_device_lock);
 }
 
+static bool
+known_device_type_p (acc_device_t d)
+{
+  return d >= 0 && d < _ACC_device_hwm;
+}
+
+static void
+unknown_device_type_error (acc_device_t invalid_type)
+{
+  gomp_fatal ("unknown device type %u", invalid_type);
+}
+
 /* OpenACC names some things a little differently.  */
 
 static const char *
@@ -103,8 +115,9 @@
 case acc_device_host: return "host";
 case acc_device_not_host: return "not_host";
 case acc_device_nvidia: return "nvidia";
-default: gomp_fatal ("unknown device type %u", (unsigned) type);
+default: unknown_device_type_error (type);
 }
+  __builtin_unreachable ();
 }
 
 /* ACC_DEVICE_LOCK must be held before calling this function.  If FAIL_IS_ERROR
@@ -123,7 +136,7 @@
 	if (goacc_device_type)
 	  {
 	/* Lookup the named device.  */
-	while (++d != _ACC_device_hwm)
+	while (known_device_type_p (++d))
 	  if (dispatchers[d]
 		  && !strcasecmp (goacc_device_type,
   get_openacc_name (dispatchers[d]->name))
@@ -147,7 +160,7 @@
 
 case acc_device_not_host:
   /* Find the first available device after acc_device_not_host.  */
-  while (++d != _ACC_device_hwm)
+  while (known_device_type_p (++d))
 	if (dispatchers[d] && dispatchers[d]->get_num_devices_func () > 0)
 	  goto found;
   if (d_arg == acc_device_default)
@@ -168,7 +181,7 @@
   break;
 
 default:
-  if (d > _ACC_device_hwm)
+  if (!known_device_type_p (d))
 	{
 	  if (fail_is_error)
 	goto unsupported_device;
@@ -505,6 +518,9 @@
 void
 acc_init (acc_device_t d)
 {
+  if (!known_device_type_p (d))
+unknown_device_type_error (d);
+
   gomp_init_targets_once ();
 
   gomp_mutex_lock (_device_lock);
@@ -519,6 +535,9 @@
 void
 acc_shutdown (acc_device_t d)
 {
+  if (!known_device_type_p (d))
+unknown_device_type_error (d);
+
   gomp_init_targets_once ();
 
   gomp_mutex_lock (_device_lock);
@@ -533,6 +552,9 @@
 int
 acc_get_num_devices (acc_device_t d)
 {
+  if (!known_device_type_p (d))
+unknown_device_type_error (d);
+
   int n = 0;
   struct gomp_device_descr *acc_dev;
 
@@ -564,6 +586,9 @@
 void
 acc_set_device_type (acc_device_t d)
 {
+  if (!known_device_type_p (d))
+unknown_device_type_error (d);
+
   struct gomp_device_descr *base_dev, *acc_dev;
   struct goacc_thread *thr = goacc_thread ();
 
@@ -647,12 +672,12 @@
 int
 acc_get_device_num (acc_device_t d)
 {
+  if (!known_device_type_p (d))
+unknown_device_type_error (d);
+
   const struct 

[Patch] Rework OpenACC nested reduction clause consistency checking (was: Re: [PATCH][committed] Warn about inconsistent OpenACC nested reduction clauses)

2019-12-03 Thread Harwath, Frederik
Hi Jakub,

On 08.11.19 07:41, Harwath, Frederik wrote:
> On 06.11.19 14:00, Jakub Jelinek wrote:
> [...]
>> I'm not sure it is a good idea to use a TREE_LIST in this case, vec would be
>> more natural, wouldn't it.
> 
> Yes.
> 
> [...]
>> If gimplifier is not the right spot, then use a splay tree + vector instead?
>> splay tree for the outer ones, vector for the local ones, and put into both
>> the clauses, so you can compare reduction code etc.
> 
> Sounds like a good idea. I am going to try that.

Below you can find a patch that reimplements the nested reductions check using
more appropriate data structures. As an additional benefit, the quality of the 
warnings
has also improved (see description below). I have checked the patch by running 
the testsuite on
x86_64-pc-linux-gnu.

Best regards,
Frederik

From 94ca786172afa7dab7630d75965bf6d6f0dd24e1 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Tue, 3 Dec 2019 10:38:01 +0100
Subject: [PATCH] Rework OpenACC nested reduction clause consistency checking

Revision 277875 of trunk introduced a consistency check for nested OpenACC
reduction clauses. The implementation has two drawbacks:
1) It uses suboptimal data structures for storing information about
   the reduction clauses.
2) The warnings issued for *repeated* inconsistent use of reduction operators
   are confusing. For instance, on three nested loops that use the reduction
   operators +, -, + on the same variable, we obtain a warning at the switch
   from + to - (as desired) and another warning about the switch from - to +.
   It would be preferable to avoid the second warning since + is consistent
   with the first reduction operator.

This commit attempts to fix both problems by using more appropriate data
structures (splay trees and vectors instead of tree lists) for keeping track of
the information about the reduction clauses.

2019-12-3  Frederik Harwath  

	gcc/
	* omp-low.c (omp_context): Removed fields local_reduction_clauses,
	outer_reduction_clauses; added fields oacc_reduction_clauses,
	oacc_reductions_stack.
	(oacc_reduction_clause_location): New struct.
	(oacc_reduction_var_occ): New struct.
	(new_omp_context): Adjust omp_context initialization to new fields.
	(delete_omp_context): Adjust omp_context deletion to new fields.
	(rewind_oacc_reductions_stack): New function.
	(check_oacc_reduction_clause): New function.
	(check_oacc_reduction_clauses): New function.
	(scan_sharing_clauses): Call check_oacc_reduction_clause for
	reduction clauses (this handles clauses on compute regions)
	if a new optional flag is enabled.
	(scan_omp_for): Remove old nested reduction check, call
	 check_oacc_reduction_clauses instead.
	(scan_omp_target): Adapt call to scan_sharing_clauses to enable the new
	flag.

   	gcc/testsuite/
	* c-c++-common/goacc/nested-reductions-warn.c: Add dg-prune-output to
	 ignore warnings that are not relevant to the test.
	(acc_parallel): Stop expecting pruned warnings, adjust expected
	warnings to changes in omp-low.c, add checks for info messages about the
	location of clauses.
	(acc_parallel_loop): Likewise.
	(acc_parallel_reduction): Likewise.
	(acc_parallel_loop_reduction): Likewise.
	(acc_routine): Likewise.
	(acc_kernels): Likewise.

	* gfortran.dg/goacc/nested-reductions-warn.f90: Likewise.
---
 gcc/omp-low.c | 305 --
 .../goacc/nested-reductions-warn.c|  81 ++---
 .../goacc/nested-reductions-warn.f90  |  83 ++---
 3 files changed, 271 insertions(+), 198 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 19132f76da2..ba04e7477dc 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -73,6 +73,9 @@ along with GCC; see the file COPYING3.  If not see
scanned for regions which are then moved to a new
function, to be invoked by the thread library, or offloaded.  */
 
+
+struct oacc_reduction_var_occ;
+
 /* Context structure.  Used to store information about each parallel
directive in the code.  */
 
@@ -128,12 +131,6 @@ struct omp_context
  corresponding tracking loop iteration variables.  */
   hash_map *lastprivate_conditional_map;
 
-  /* A tree_list of the reduction clauses in this context.  */
-  tree local_reduction_clauses;
-
-  /* A tree_list of the reduction clauses in outer contexts.  */
-  tree outer_reduction_clauses;
-
   /* Nesting depth of this context.  Used to beautify error messages re
  invalid gotos.  The outermost ctx is depth 1, with depth 0 being
  reserved for the main body of the function.  */
@@ -163,8 +160,52 @@ struct omp_context
 
   /* True if there is bind clause on the construct (i.e. a loop construct).  */
   bool loop_p;
+
+  /* A mapping that maps a variable to information about the last OpenACC
+ reduction clause that used the variable above the current context.
+ This information is used for checking the nesting restrictions for
+ reduction clauses by the function ch

Re: [PATCH] Fix ICE in re-simplification of VEC_COND_EXPR

2019-11-29 Thread Harwath, Frederik
On 29.11.19 15:46, Richard Sandiford wrote:

> Thanks for doing this, looks good to me FWIW.  I was seeing the same
> failure for SVE but hadn't found time to look at it.

Thank you all for the review. Committed as r278853.

Frederik



Re: [PATCH] Fix ICE in re-simplification of VEC_COND_EXPR (was: Re: [PATCH][amdgcn] Fix ICE in re-simplification of VEC_COND_EXPR)

2019-11-29 Thread Harwath, Frederik
Hi Jakub,

On 29.11.19 14:41, Jakub Jelinek wrote:

> s/use/Use/
>
> [...]
>
> s/. /.  /

Right, thanks. Does that look ok for inclusion in trunk now?

Best regards,
Frederik


2019-11-29  Frederik Harwath  

gcc/
* gimple-match-head.c (maybe_resimplify_conditional_op): Use
generic_expr_could_trap_p to check if the condition of COND_EXPR or
VEC_COND_EXPR can trap.
---
 gcc/gimple-match-head.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
index 2996bade301..9010f11621e 100644
--- a/gcc/gimple-match-head.c
+++ b/gcc/gimple-match-head.c
@@ -144,9 +144,21 @@ maybe_resimplify_conditional_op (gimple_seq *seq, 
gimple_match_op *res_op,
   /* Likewise if the operation would not trap.  */
   bool honor_trapv = (INTEGRAL_TYPE_P (res_op->type)
  && TYPE_OVERFLOW_TRAPS (res_op->type));
-  if (!operation_could_trap_p ((tree_code) res_op->code,
-  FLOAT_TYPE_P (res_op->type),
-  honor_trapv, res_op->op_or_null (1)))
+  tree_code op_code = (tree_code) res_op->code;
+  bool op_could_trap;
+
+  /* COND_EXPR and VEC_COND_EXPR will trap if, and only if, the condition
+ traps and hence we have to check this.  For all other operations, we
+ don't need to consider the operands.  */
+  if (op_code == COND_EXPR || op_code == VEC_COND_EXPR)
+   op_could_trap = generic_expr_could_trap_p (res_op->ops[0]);
+  else
+   op_could_trap = operation_could_trap_p ((tree_code) res_op->code,
+   FLOAT_TYPE_P (res_op->type),
+   honor_trapv,
+   res_op->op_or_null (1));
+
+  if (!op_could_trap)
{
  res_op->cond.cond = NULL_TREE;
  return false;
-- 
2.17.1



[PATCH] Fix ICE in re-simplification of VEC_COND_EXPR (was: Re: [PATCH][amdgcn] Fix ICE in re-simplification of VEC_COND_EXPR)

2019-11-29 Thread Harwath, Frederik
Hi,

On 29.11.19 13:51, Harwath, Frederik wrote:

>> condition for the inner vec_cond.  Your fix looks reasonable but is
>> very badly formatted.  Can you instead do

I hope the formatting looks better now. I have also removed the [amdgcn] from 
the subject line since
the fact that this has been discovered in the context of amdgcn is not really 
essential.

Best regards,
Frederik


2019-11-29  Frederik Harwath  

gcc/
* gimple-match-head.c (maybe_resimplify_conditional_op): use
generic_expr_could_trap_p to check if the condition of COND_EXPR or
VEC_COND_EXPR can trap.
---
 gcc/gimple-match-head.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
index 2996bade301..c763a80a6d1 100644
--- a/gcc/gimple-match-head.c
+++ b/gcc/gimple-match-head.c
@@ -144,9 +144,21 @@ maybe_resimplify_conditional_op (gimple_seq *seq, 
gimple_match_op *res_op,
   /* Likewise if the operation would not trap.  */
   bool honor_trapv = (INTEGRAL_TYPE_P (res_op->type)
  && TYPE_OVERFLOW_TRAPS (res_op->type));
-  if (!operation_could_trap_p ((tree_code) res_op->code,
-  FLOAT_TYPE_P (res_op->type),
-  honor_trapv, res_op->op_or_null (1)))
+  tree_code op_code = (tree_code) res_op->code;
+  bool op_could_trap;
+
+  /* COND_EXPR and VEC_COND_EXPR will trap if, and only if, the condition
+ traps and hence we have to check this. For all other operations, we
+ don't need to consider the operands. */
+  if (op_code == COND_EXPR || op_code == VEC_COND_EXPR)
+   op_could_trap = generic_expr_could_trap_p (res_op->ops[0]);
+  else
+   op_could_trap = operation_could_trap_p ((tree_code) res_op->code,
+   FLOAT_TYPE_P (res_op->type),
+   honor_trapv,
+   res_op->op_or_null (1));
+
+  if (!op_could_trap)
{
  res_op->cond.cond = NULL_TREE;
  return false;
-- 
2.17.1




Re: [PATCH][amdgcn] Fix ICE in re-simplification of VEC_COND_EXPR

2019-11-29 Thread Harwath, Frederik
Hi Richard,

On 29.11.19 13:37, Richard Biener wrote:
> On Fri, Nov 29, 2019 at 1:24 PM Harwath, Frederik
>  wrote:
> [...]
>> It seems that this rule is not invoked when compiling for x86_64 where the 
>> generated code for vect-cond-reduc-1.c does not contain anything that would
>> match this rule. Could it be that there is no test covering this rule for 
>> commonly tested architectures?
> 
> This was all added for aarch64 SVE.  So it looks like the outer plus
> was conditional and we end up inheriting the
I should have mentioned this, it was indeed a COND_ADD.

> condition for the inner vec_cond.  Your fix looks reasonable but is
> very badly formatted.  Can you instead do
> 
>  if (op_Code == cOND_EPXR || op_code == vEC_COND_EXPR)
>op_could_trap = generic_expr_could_trap (..)
>  else
>   op_could_trap = operation_could_trap_p (...
> 

Sorry, sure!

Thanks,
Frederik



[PATCH][amdgcn] Fix ICE in re-simplification of VEC_COND_EXPR

2019-11-29 Thread Harwath, Frederik
Hi,
currently, on trunk, the tests gcc.dg/vect/vect-cond-reduc-1.c and 
gcc.dg/pr68286.c fail when compiling for amdgcn-unknown-amdhsa.
The reason seems to lie in the interaction of the changes that have been 
introduced by revision r276659
("Allow COND_EXPR and VEC_COND_EXPR condtions to trap" by Ilya Leoshkevich) of 
trunk and the vectorized code that is generated for amdgcn.

If the function maybe_resimplify_conditional_op from gimple-match-head.c gets 
called on a conditional operation without an "else" part, it
makes the operation unconditional, but only if the operation cannot trap. To 
check this, it uses operation_could_trap_p.
This ends up in a violated assertion in the latter function if 
maybe_resimplify_conditional_op is called on a COND_EXPR or VEC_COND_EXPR:

 /* This function cannot tell whether or not COND_EXPR and VEC_COND_EXPR could
 trap, because that depends on the respective condition op.  */
  gcc_assert (op != COND_EXPR && op != VEC_COND_EXPR);

A related issue has been resolved by the patch that was committed as r276915 
("PR middle-end/92063" by Jakub Jelinek).

In our case, the error is triggered by the simplification rule at line 3450 of 
gcc/match.pd:

/* A + (B vcmp C ? 1 : 0) -> A - (B vcmp C ? -1 : 0), since vector comparisons
   return all -1 or all 0 results.  */
/* ??? We could instead convert all instances of the vec_cond to negate,
   but that isn't necessarily a win on its own.  */
(simplify
 (plus:c @3 (view_convert? (vec_cond:s @0 integer_each_onep@1 integer_zerop@2)))
 (if (VECTOR_TYPE_P (type)
  && known_eq (TYPE_VECTOR_SUBPARTS (type),
   TYPE_VECTOR_SUBPARTS (TREE_TYPE (@1)))
  && (TYPE_MODE (TREE_TYPE (type))
  == TYPE_MODE (TREE_TYPE (TREE_TYPE (@1)
  (minus @3 (view_convert (vec_cond @0:0 (negate @1) @2)
)

It seems that this rule is not invoked when compiling for x86_64 where the 
generated code for vect-cond-reduc-1.c does not contain anything that would
match this rule. Could it be that there is no test covering this rule for 
commonly tested architectures?

I have changed maybe_resimplify_conditional_op to check if a COND_EXPR or 
VEC_COND_EXPR could trap by checking whether the condition can trap using
generic_expr_could_trap_p. Judging from the comment above the assertion and the 
code changes of r276659, it seems that this is both necessary and
sufficient to verify if those expressions can trap.

Does that sound reasonable and can the patch be included in trunk?

The patch fixes the failing tests for me and does not cause any visible 
regressions in the results of "make check" which I have executed for targets 
amdgcn-unknown-amdhsa
and x86_64-pc-linux-gnu.

Best regards,
Frederik



2019-11-28  Frederik Harwath  

gcc/
* gimple-match-head.c (maybe_resimplify_conditional_op): use
generic_expr_could_trap_p to check if the condition of COND_EXPR or
VEC_COND_EXPR can trap.
---
 gcc/gimple-match-head.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
index 2996bade301..4da6c4d7458 100644
--- a/gcc/gimple-match-head.c
+++ b/gcc/gimple-match-head.c
@@ -144,9 +144,17 @@ maybe_resimplify_conditional_op (gimple_seq *seq, 
gimple_match_op *res_op,
   /* Likewise if the operation would not trap.  */
   bool honor_trapv = (INTEGRAL_TYPE_P (res_op->type)
  && TYPE_OVERFLOW_TRAPS (res_op->type));
-  if (!operation_could_trap_p ((tree_code) res_op->code,
-  FLOAT_TYPE_P (res_op->type),
-  honor_trapv, res_op->op_or_null (1)))
+  tree_code op_code = (tree_code) res_op->code;
+  /* COND_EXPR and VEC_COND_EXPR will trap if, and only if, the condition
+traps and hence we have to check this. For all other operations, we
+don't need to consider the operands. */
+  bool op_could_trap = op_code == COND_EXPR || op_code == VEC_COND_EXPR ?
+   generic_expr_could_trap_p (res_op->ops[0]) :
+   operation_could_trap_p ((tree_code) res_op->code,
+   FLOAT_TYPE_P (res_op->type),
+   honor_trapv, res_op->op_or_null (1));
+
+  if (!op_could_trap)
{
  res_op->cond.cond = NULL_TREE;
  return false;
-- 
2.17.1



Re: [PATCH 5/7] Remove last leftover usage of params* files.

2019-11-12 Thread Harwath, Frederik
Hi Martin,

On 06.11.19 13:40, Martin Liska wrote:

>   (finalize_options_struct): Remove.

This patch has been committed by now, but it seems that a single use of 
finalize_options_struct has been overlooked
in gcc/tree-streamer-in.c.

Best regards,
Frederik



Re: [PATCH][committed] Warn about inconsistent OpenACC nested reduction clauses

2019-11-07 Thread Harwath, Frederik
Hi Jakub,

On 06.11.19 14:00, Jakub Jelinek wrote:
> On Wed, Nov 06, 2019 at 01:41:47PM +0100, frede...@codesourcery.com wrote:
>> --- a/gcc/omp-low.c
>> +++ b/gcc/omp-low.c
>> @@ -128,6 +128,12 @@ struct omp_context
>> [...]
>> +  /* A tree_list of the reduction clauses in this context.  */
>> +  tree local_reduction_clauses;
>> +
>> +  /* A tree_list of the reduction clauses in outer contexts.  */
>> +  tree outer_reduction_clauses;
> 
> Could there be acc in the name to make it clear it is OpenACC only?

Yes, will be added.


>> @@ -910,6 +916,8 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
>> [...]
>> +  ctx->local_reduction_clauses = NULL;
>> [...]
>> @@ -925,6 +933,8 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
>> [...]
>> +  ctx->local_reduction_clauses = NULL;
>> +  ctx->outer_reduction_clauses = NULL;
> 
> The = NULL assignments are unnecessary in all 3 cases, ctx is allocated with
> XCNEW.

Ok, will be removed.

>> @@ -1139,6 +1149,11 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
>>goto do_private;
>>  
>>  case OMP_CLAUSE_REDUCTION:
>> +  if (is_oacc_parallel (ctx) || is_oacc_kernels (ctx))
>> +ctx->local_reduction_clauses
>> +  = tree_cons (NULL, c, ctx->local_reduction_clauses);
> 
> I'm not sure it is a good idea to use a TREE_LIST in this case, vec would be
> more natural, wouldn't it.

Yes.

> Or, wouldn't it be better to do this checking in the gimplifier instead of
> omp-low.c?  There we have splay trees with GOVD_REDUCTION etc. for the
> variables, so it wouldn't be O(#reductions^2) compile time> It is true that 
> the gimplifier doesn't record the reduction codes (after
> all, OpenMP has UDRs and so there can be fairly arbitrary reductions).


Right, I have considered moving the implementation somewhere else before.
I am going to look into this, but perhaps we will just keep it where it is
if otherwise the implementation becomes more complicated.

> Consider million reduction clauses on nested loops.
> If gimplifier is not the right spot, then use a splay tree + vector instead?
> splay tree for the outer ones, vector for the local ones, and put into both
> the clauses, so you can compare reduction code etc.

Sounds like a good idea. I am going to try that.
However, I have not seen the suboptimal data structure choices
of the original patch as a problem, since the case of million reduction clauses
has not occurred to me.

Thank you for your feedback!

Best regards,
Frederik





Re: [PATCH] Report errors on inconsistent OpenACC nested reduction clauses

2019-11-06 Thread Harwath, Frederik
Hi Thomas,

On 05.11.19 15:22, Thomas Schwinge wrote:

> For your convenience, I'm attaching an incremental patch, to be merged
> into yours.> [...]> With that addressed, OK for trunk.

Thank you. I have merged the patches and committed.

> A few more comments to address separately, later on.

I will look into your remaining questions.

Best regards,
Frederik



Re: Add OpenACC 2.6 `acc_get_property' support

2019-11-05 Thread Harwath, Frederik
Hi Thomas,

> On 07.10.19 20:41, Thomas Schwinge wrote:
> > On 2018-12-03T16:51:14+, "Maciej W. Rozycki"  
> > wrote:
> > Add generic support for the OpenACC 2.6 `acc_get_property' and
> > `acc_get_property_string' routines [...]
>
> ..., which allow for user code to query the implementation for stuff like:
>
> > OpenACC vendor: GNU
> > OpenACC name: GOMP
> > OpenACC driver: 1.0
>
> [...]
>
> > --- a/include/gomp-constants.h
> > +++ b/include/gomp-constants.h
> > @@ -215,10 +215,24 @@ enum gomp_map_kind
> >  #define GOMP_DEVICE_NVIDIA_PTX 5
> >  #define GOMP_DEVICE_INTEL_MIC  6
> >  #define GOMP_DEVICE_HSA7
> > +#define GOMP_DEVICE_CURRENT8
>
> This is used for 'acc_device_current', relevant only for
> 'acc_get_property', to return "the value of the property for the current
> device".  This should probably use a more special (negative?) value
> instead of eight, so that when additional real device types are added
> later on, we can just add them with increasing numbers, and keep the
> scanning code simple.

Yes, I use the first unused negative value.

> (Use of 'acc_device_current' as an argument to other functions taking an
> 'acc_device_t' is undefined, and should be rejected with 'gomp_fatal'?)

So far, there seems to be essentially no validity checking for acc_device_t
and other enums in the relevant parts of the code. I have added such checks
to public functions which take acc_device_t arguments.

> > --- a/libgomp/plugin/plugin-nvptx.c
> > +++ b/libgomp/plugin/plugin-nvptx.c
>
> > +union gomp_device_property_value
> > +GOMP_OFFLOAD_get_property (int n, int prop)
> > +{
> > +  union gomp_device_property_value propval = { .val = 0 };
> > +
> > +  pthread_mutex_lock (_dev_lock);
> > +
> > +  if (!nvptx_init () || n >= nvptx_get_num_devices ())
> > +{
> > +  pthread_mutex_unlock (_dev_lock);
> > +  return propval;
> > +}
>
> Isn't it implicit that 'get_num_devices' has been called while loading
> the plugin, so we don't have to do any initialization that here?  (But I
> may be misremembering that.)

Yes, a call path for nvptx_get_num_devices during initialization, in case we
are using the nvptx plugin:
acc_init (oacc_init.c) -> gomp_init_targets_once (target.c) ->
gomp_target_init (target.c) -> GOMP_OFFLOAD_get_num_devices (plugin-nvptx.c) ->
nvptx_get_num_devices

For nvptx_init, a call path is:
acc_init (oacc_init.c) -> acc_init_1 (oacc_init.c) -> gomp_init_device 
(oacc_init.c) ->
GOMP_OFFLOAD_init_device (plugin-nvptx.c) -> nvptx_init

Hence, yes, we should not call nvptx_init from here.


> > +  switch (prop)
> > +{
> > +case GOMP_DEVICE_PROPERTY_MEMORY:
> > +  {
> > +   size_t total_mem;
> > +   CUdevice dev;
> > +
> > +   CUDA_CALL_ERET (propval, cuDeviceGet, , n);
>
> Isn't that already known as 'ptx_devices[n]'?  (Likewise elsewhere.)

Yes, that gets set during GOMP_OFFLOAD_init_device.

> + CUDA_CALL_ERET (propval, cuDeviceTotalMem, _mem, dev);
> + propval.val = total_mem;
> +  }
> +  break;

> +case GOMP_DEVICE_PROPERTY_NAME:
> +  {
> + static char name[256];
> + CUdevice dev;
> +
> + CUDA_CALL_ERET (propval, cuDeviceGet, , n);
> + CUDA_CALL_ERET (propval, cuDeviceGetName, name, sizeof (name), dev);
> + propval.ptr = name;
> +  }
> +  break;

> Uh, that's not thread-safe, is it?

Not at all.

> Otherwise, perhaps make this 'name' a property of 'struct ptx_device' in
> the nvptx plugin here, and keep it live while the device is open
> ('nvptx_open_device'), together with other per-device data?

That's what I do now.

> Is that 'snprintf' formatting the generic way to display a CUDA driver
> version number?

At least, the same formatting is applied by NVidia's deviceQuery example
from cuda-samples
(i.e. 
https://github.com/NVIDIA/cuda-samples/blob/master/Samples/deviceQuery/deviceQuery.cpp#L106).
For me, the output yields "CUDA Driver Version / Runtime Version 9.1 / 9.1" with
the nvidia-cuda-toolkit 9.1.


> > As, in theory, such Nvidia GPU offloading support could also be
> > implemented via the Nouveau/Mesa GalliumCompute driver, should the string
> > returned here actually include "CUDA Driver"?

This seems like a good way to disambiguate between different drivers, but I am 
not sure if there
are any compatibility issues that we have to consider (PGI?). The standard does 
not impose
any restrictions on the format of the string.


> > +default:
> > +  break;
>
> Should this 'GOMP_PLUGIN_error' or even 'GOMP_PLUGIN_fatal'?  (Similar
> then elsewhere.)

Yes, I chose GOMP_PLUGIN_error.

> > --- /dev/null
> > +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc-get-property.c
> > @@ -0,0 +1,37 @@
> > +/* Test the `acc_get_property' and '`acc_get_property_string' library
> > +   functions. */
> > +/* { dg-do run } */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +int main ()
> > +{
> > +  const char *s;
> > +  size_t 

Re: [PATCH] Report errors on inconsistent OpenACC nested reduction clauses

2019-10-29 Thread Harwath, Frederik

On 24.10.19 16:31, Thomas Schwinge wrote:

Hi,
I have attached a revised patch.


[...] I was wondering if the way in which the patch
avoids issuing errors about operator switches more than once by modifying the 
clauses (cf. the
corresponding comment in omp-low.c) could lead to problems [...]
 
"Patching up" erroneous state or even completely removing OMP clauses is

-- as far as I understand -- acceptable to avoid "issuing errors about
operator switches more than once".  This doesn't affect code generation,
because no code will be generated at all.

(Does that answer your question?)



Yes, thank you.



Regarding my suggestions to "demote error to warning diagnostics", I'd
suggest that at this point we do *not* try to fix for the user any
presumed wrong/missing 'reduction' clauses (difficult/impossible to do
correctly in the general case), but really only diagnose them.


Ok, I have changed the errors into warnings and I have removed the
code for avoiding repeated messages.

So just C/C++ testing, no Fortran at all.  This is not ideal, but
probably (hopefully) acceptable given that this is working on the middle
end representation shared between all front ends.


Thanks to Tobias, we now also have Fortran tests.


To match the order in 'struct omp_context' (see above), move these new
initializations before those of 'ctx->depth'.  (Even if that also just
achieves "some local consistency".)  ;-)


Done.


@@ -1131,6 +1141,9 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
  
  	case OMP_CLAUSE_REDUCTION:

case OMP_CLAUSE_IN_REDUCTION:
+ if (is_oacc_parallel (ctx) || is_oacc_kernels (ctx))
+   ctx->local_reduction_clauses
+ = tree_cons (NULL, c, ctx->local_reduction_clauses);
  decl = OMP_CLAUSE_DECL (c);
  if (TREE_CODE (decl) == MEM_REF)
{


I think this should really only apply to 'OMP_CLAUSE_REDUCTION' but not > 
'OMP_CLAUSE_IN_REDUCTION' (please verify)?


Right, I have moved the new code to the OMP_CLAUSE_REDUCTION case above.



I'm usually the last one to complain about such things ;-) -- but here
really the indentation of the new code seems to be off?  Please verify.
Maybe you had set a tab-stop to four spaces instead of eight?


Oh, it should look better now.


--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/nested-reductions-fail.c


Rename to '*-warn.c', and instead of 'dg-error' use 'dg-warning'
(possibly more than currently).


Ok.


--- a/gcc/testsuite/c-c++-common/goacc/reduction-6.c
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-6.c
@@ -16,17 +16,6 @@ int foo (int N)
}
}
  
-  #pragma acc parallel

-  {
-#pragma acc loop reduction(+:b)
-for (int i = 0; i < N; i++)
-  {
-#pragma acc loop
-   for (int j = 0; j < N; j++)
- b += 1;
-  }
-  }
-
#pragma acc parallel
{
  #pragma acc loop reduction(+:c)


That one stays in, but gets a 'dg-warning'.


What warning would you expect to see here? I do not get any warnings.

Best regards,
Frederik

>From 22f45d4c2c11febce171272f9289c487aed4f9d7 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Tue, 29 Oct 2019 12:39:23 +0100
Subject: [PATCH] Warn about inconsistent OpenACC nested reduction clauses
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

OpenACC (cf. OpenACC 2.7, section 2.9.11. "reduction clause";
this was first clarified by OpenACC 2.6) requires that, if a
variable is used in reduction clauses on two nested loops, then
there must be reduction clauses for that variable on all loops
that are nested in between the two loops and all these reduction
clauses must use the same operator.
This commit introduces a check for that property which reports
warnings if it is violated.

In gcc/testsuite/c-c++-common/goacc/reduction-6.c, we remove the erroneous
reductions on variable b; adding a reduction clause to make it compile cleanly
would make it a duplicate of the test for variable c.

2019-10-29  Gergö Barany  
		Tobias Burnus  
		Frederik Harwath  
		Thomas Schwinge  

	 gcc/
	 * omp-low.c (struct omp_context): New fields
	 local_reduction_clauses, outer_reduction_clauses.
	 (new_omp_context): Initialize these.
	 (scan_sharing_clauses): Record reduction clauses on OpenACC constructs.
	 (scan_omp_for): Check reduction clauses for incorrect nesting.
	 gcc/testsuite/
	 * c-c++-common/goacc/nested-reductions-warn.c: New test.
	 * c-c++-common/goacc/nested-reductions.c: New test.
	 * c-c++-common/goacc/reduction-6.c: Adjust.
	 * gfortran.dg/goacc/nested-reductions-warn.f90: New test.
	 * gfortran.dg/goacc/nested-reductions.f90: New test.
	 libgomp/
	 * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c:
	 Add missing reduction clauses.
	 * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c:
	 Likewise.
	 * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c:
	 

[PATCH] Report errors on inconsistent OpenACC nested reduction, clauses

2019-10-21 Thread Harwath, Frederik

Hi,
OpenACC requires that, if a variable is used in reduction clauses on two nested 
loops, then there
must be reduction clauses for that variable on all loops that are nested in 
between the two loops
and all these reduction clauses must use the same operator; this has been first 
clarified by
OpenACC 2.6. This commit introduces a check for that property which reports 
errors if the property
is violated.

I have tested the patch by comparing "make check" results and I am not aware of 
any regressions.

Gergö has implemented the check and it works, but I was wondering if the way in 
which the patch
avoids issuing errors about operator switches more than once by modifying the 
clauses (cf. the
corresponding comment in omp-low.c) could lead to problems - the processing 
might still continue
after the error on the modified tree, right? I was also wondering about the 
best place for such
checks. Should this be a part of "pass_lower_omp" (as in the patch) or should 
it run earlier
like, for instance, "pass_diagnose_omp_blocks".

Can the patch be included in trunk?

Frederik



>From 99796969c1bf91048c6383dfb1b8576bdd9efd7d Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Mon, 21 Oct 2019 08:27:58 +0200
Subject: [PATCH] Report errors on inconsistent OpenACC nested reduction
 clauses
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

OpenACC (cf. OpenACC 2.7, section 2.9.11. "reduction clause";
this was first clarified by OpenACC 2.6) requires that, if a
variable is used in reduction clauses on two nested loops, then
there must be reduction clauses for that variable on all loops
that are nested in between the two loops and all these reduction
clauses must use the same operator.
This commit introduces a check for that property which reports
errors if it is violated.

In gcc/testsuite/c-c++-common/goacc/reduction-6.c, we remove the erroneous
reductions on variable b; adding a reduction clause to make it compile cleanly
would make it a duplicate of the test for variable c.

2010-10-21  Gergö Barany  
		Frederik Harwath  

	 gcc/
	 * omp-low.c (struct omp_context): New fields
	 local_reduction_clauses, outer_reduction_clauses.
	 (new_omp_context): Initialize these.
	 (scan_sharing_clauses): Record reduction clauses on OpenACC
	 constructs.
	 (scan_omp_for): Check reduction clauses for incorrect nesting.
	 gcc/testsuite/
	 * c-c++-common/goacc/nested-reductions-fail.c: New test.
	 * c-c++-common/goacc/nested-reductions.c: New test.
	 * c-c++-common/goacc/reduction-6.c: Adjust.
	 libgomp/
	 * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c:
	 Add missing reduction clauses.
	 * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c:
	 Likewise.
	 * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c:
	 Likewise.
	 * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c:
	 Likewise.
---
 gcc/omp-low.c | 107 +++-
 .../goacc/nested-reductions-fail.c| 492 ++
 .../c-c++-common/goacc/nested-reductions.c| 420 +++
 .../c-c++-common/goacc/reduction-6.c  |  11 -
 .../par-loop-comb-reduction-1.c   |   2 +-
 .../par-loop-comb-reduction-2.c   |   2 +-
 .../par-loop-comb-reduction-3.c   |   2 +-
 .../par-loop-comb-reduction-4.c   |   2 +-
 8 files changed, 1022 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/nested-reductions-fail.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/nested-reductions.c

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 279b6ef893a..a2212274685 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -127,6 +127,12 @@ struct omp_context
  corresponding tracking loop iteration variables.  */
   hash_map *lastprivate_conditional_map;
 
+  /* A tree_list of the reduction clauses in this context.  */
+  tree local_reduction_clauses;
+
+  /* A tree_list of the reduction clauses in outer contexts.  */
+  tree outer_reduction_clauses;
+
   /* Nesting depth of this context.  Used to beautify error messages re
  invalid gotos.  The outermost ctx is depth 1, with depth 0 being
  reserved for the main body of the function.  */
@@ -902,6 +908,8 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
   ctx->cb = outer_ctx->cb;
   ctx->cb.block = NULL;
   ctx->depth = outer_ctx->depth + 1;
+  ctx->local_reduction_clauses = NULL;
+  ctx->outer_reduction_clauses = ctx->outer_reduction_clauses;
 }
   else
 {
@@ -917,6 +925,8 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
   ctx->cb.adjust_array_error_bounds = true;
   ctx->cb.dont_remap_vla_if_no_change = true;
   ctx->depth = 1;
+  ctx->local_reduction_clauses = NULL;
+  ctx->outer_reduction_clauses = NULL;
 }
 
   ctx->cb.decl_map = new hash_map;
@@ -1131,6 +1141,9 @@ scan_sharing_clauses 

Add myself to MAINTAINERS files

2019-10-01 Thread Harwath, Frederik
2019-10-01  Frederik Harwath 

* MAINTAINERS: Add myself to Write After Approval

Index: ChangeLog
===
--- ChangeLog   (revision 276390)
+++ ChangeLog   (working copy)
@@ -1,3 +1,7 @@
+2019-10-01  Frederik Harwath 
+
+   * MAINTAINERS: Add myself to Write After Approval
+
 2019-09-26  Richard Sandiford  

* MAINTAINERS: Add myself as an aarch64 maintainer.
Index: MAINTAINERS
===
--- MAINTAINERS (revision 276390)
+++ MAINTAINERS (working copy)
@@ -409,6 +409,7 @@
 Wei Guozhi 
 Mostafa Hagog  
 Andrew Haley   
+Frederik Harwath   
 Stuart Hastings
 Michael Haubenwallner  

 Pat Haugen 






Re: [PATCH] libgomp_g.h: Include stdint.h instead of gstdint.h

2019-09-30 Thread Harwath, Frederik


Hi Jakub,

Am 30.09.2019 um 09:25 schrieb Jakub Jelinek:
> On Mon, Sep 30, 2019 at 12:03:00AM -0700, Frederik Harwath wrote:
>> The patch changes libgomp/libgomp_g.h to include stdint.h instead of the 
>> internal gstdint.h. [...]
> 
> That looks wrong, will make libgomp less portable. [...]
>   Jakub

We have discussed this issue with Joseph Myers. Let me quote what Joseph
wrote:

"I think including  is appropriate (and, more generally,
removing the special configure support for GCC_HEADER_STDINT for
anything built only for the target - note that libgcc/gstdint.h has a
comment saying it's about libdecnumber portability to *hosts*, not
targets, without stdint.h). On any target without stdint.h, GCC should
be providing its own; the only targets where GCC does not yet know about
target stdint.h types are SymbianOS, LynxOS, QNX, TPF (see GCC bug 448),
and I think it's pretty unlikely libgomp would do anything useful for
those (and if in fact they do provide stdint.h, there wouldn't be an
issue anyway)."

Hence, I think the change will not affect portability negatively.

Best regards,
Frederik