Re: [Patch] plugin/plugin-gcn.c: Fix error handling of GOMP_OFFLOAD_openacc_async_construct (was: [Patch] libgomp/plugin/plugin-gcn.c: async-queue init - fix function-return type and fail fatally)

2024-12-04 Thread Thomas Schwinge
Hi Tobias!

On 2024-12-03T11:28:19+0100, Tobias Burnus  wrote:
> Thomas Schwinge wrote:
>> That's a bug in 'libgomp/plugin/plugin-gcn.c:maybe_init_omp_async' (or
>> its users); the real user of ['GOMP_OFFLOAD_openacc_async_construct'] does
>> handle the error condition:
>
> Well, it does not really handle it: It also just calls 'fatal',
> admittedly after unlocking but if the program dies it does not
> really make a difference.

It makes a very big difference: without unlocking, the process deadlocks.

> However, there are known issues with libgomp-plugin calling FATAL,
> which does not really trigger the FATAL part in libgomp, cf.
> https://gcc.gnu.org/PR109664

Right.

> The attached patch now defers the error handling to a bit later,
> either still in libgomp-plugin or in one case it even propagates
> the error through to a user-called routine.
>
> Comments before I commit it?

Looks good to me, one incremental step forward, thanks!

(I see there are more missing-unlocking issues in the GCN
'GOMP_OFFLOAD_openacc_async_construct', and not only there; that's for
another day.)


Grüße
 Thomas


> plugin/plugin-gcn.c: Fix error handling of 
> GOMP_OFFLOAD_openacc_async_construct
>
> Follow up to r15-5392-g884637b6362391. As the name implies,
> GOMP_OFFLOAD_openacc_async_construct is also externally called.
> Hence, partially revert previous commit to permit unlocking handling
> in oacc-async.c's lookup_goacc_asyncqueue by not failing fatally.
>
> Hence, also the other (indirect) callers had to be updated:
> GOMP_OFFLOAD_dev2dev fails now with 'false' and
> GOMP_OFFLOAD_async_run fatally.
>
> libgomp/ChangeLog:
>
>   * plugin/plugin-gcn.c (GOMP_OFFLOAD_dev2dev, GOMP_OFFLOAD_async_run):
>   Handle omp_async_queue == NULL after call tomaybe_init_omp_async.
>   (GOMP_OFFLOAD_openacc_async_construct): Use error not fatal error,
>   partially reverting r15-5392.
>
>  libgomp/plugin/plugin-gcn.c | 14 --
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
> index d26b93657bf..239acf8cb75 100644
> --- a/libgomp/plugin/plugin-gcn.c
> +++ b/libgomp/plugin/plugin-gcn.c
> @@ -3939,6 +3939,8 @@ GOMP_OFFLOAD_dev2dev (int device, void *dst, const void 
> *src, size_t n)
>  {
>struct agent_info *agent = get_agent_info (device);
>maybe_init_omp_async (agent);
> +  if (!agent->omp_async_queue)
> + return false;
>queue_push_copy (agent->omp_async_queue, dst, src, n);
>return true;
>  }
> @@ -4350,6 +4352,8 @@ GOMP_OFFLOAD_async_run (int device, void *tgt_fn, void 
> *tgt_vars,
>  }
>  
>maybe_init_omp_async (agent);
> +  if (!agent->omp_async_queue)
> +GOMP_PLUGIN_fatal ("Asynchronous queue initialization failed");
>queue_push_launch (agent->omp_async_queue, kernel, tgt_vars, kla);
>queue_push_callback (agent->omp_async_queue,
>  GOMP_PLUGIN_target_task_completion, async_data);
> @@ -4388,9 +4392,7 @@ GOMP_OFFLOAD_openacc_async_exec (void (*fn_ptr) (void 
> *),
>gcn_exec (kernel, devaddrs, dims, targ_mem_desc, true, aq);
>  }
>  
> -/* Create a new asynchronous thread and queue for running future kernels;
> -   issues a fatal error if the queue cannot be created as all callers expect
> -   that the queue exists.  */
> +/* Create a new asynchronous thread and queue for running future kernels.  */
>  
>  struct goacc_asyncqueue *
>  GOMP_OFFLOAD_openacc_async_construct (int device)
> @@ -4418,17 +4420,17 @@ GOMP_OFFLOAD_openacc_async_construct (int device)
>  
>if (pthread_mutex_init (&aq->mutex, NULL))
>  {
> -  GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue mutex");
> +  GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue mutex");
>return NULL;
>  }
>if (pthread_cond_init (&aq->queue_cond_in, NULL))
>  {
> -  GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue cond");
> +  GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue cond");
>return NULL;
>  }
>if (pthread_cond_init (&aq->queue_cond_out, NULL))
>  {
> -  GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue cond");
> +  GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue cond");
>return NULL;
>  }
>  


[Patch] plugin/plugin-gcn.c: Fix error handling of GOMP_OFFLOAD_openacc_async_construct (was: [Patch] libgomp/plugin/plugin-gcn.c: async-queue init - fix function-return type and fail fatally)

2024-12-03 Thread Tobias Burnus

Thomas Schwinge wrote:

That's a bug in 'libgomp/plugin/plugin-gcn.c:maybe_init_omp_async' (or
its users); the real user of 'GOMP_OFFLOAD_openacc_async_exec' does
handle the error condition:


Well, it does not really handle it: It also just calls 'fatal',
admittedly after unlocking but if the program dies it does not
really make a difference.

However, there are known issues with libgomp-plugin calling FATAL,
which does not really trigger the FATAL part in libgomp, cf.
https://gcc.gnu.org/PR109664

* * *

The attached patch now defers the error handling to a bit later,
either still in libgomp-plugin or in one case it even propagates
the error through to a user-called routine.

Comments before I commit it?

Tobias
plugin/plugin-gcn.c: Fix error handling of GOMP_OFFLOAD_openacc_async_construct

Follow up to r15-5392-g884637b6362391. As the name implies,
GOMP_OFFLOAD_openacc_async_construct is also externally called.
Hence, partially revert previous commit to permit unlocking handling
in oacc-async.c's lookup_goacc_asyncqueue by not failing fatally.

Hence, also the other (indirect) callers had to be updated:
GOMP_OFFLOAD_dev2dev fails now with 'false' and
GOMP_OFFLOAD_async_run fatally.

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (GOMP_OFFLOAD_dev2dev, GOMP_OFFLOAD_async_run):
	Handle omp_async_queue == NULL after call tomaybe_init_omp_async.
	(GOMP_OFFLOAD_openacc_async_construct): Use error not fatal error,
	partially reverting r15-5392.

 libgomp/plugin/plugin-gcn.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index d26b93657bf..239acf8cb75 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -3939,6 +3939,8 @@ GOMP_OFFLOAD_dev2dev (int device, void *dst, const void *src, size_t n)
 {
   struct agent_info *agent = get_agent_info (device);
   maybe_init_omp_async (agent);
+  if (!agent->omp_async_queue)
+	return false;
   queue_push_copy (agent->omp_async_queue, dst, src, n);
   return true;
 }
@@ -4350,6 +4352,8 @@ GOMP_OFFLOAD_async_run (int device, void *tgt_fn, void *tgt_vars,
 }
 
   maybe_init_omp_async (agent);
+  if (!agent->omp_async_queue)
+GOMP_PLUGIN_fatal ("Asynchronous queue initialization failed");
   queue_push_launch (agent->omp_async_queue, kernel, tgt_vars, kla);
   queue_push_callback (agent->omp_async_queue,
 		   GOMP_PLUGIN_target_task_completion, async_data);
@@ -4388,9 +4392,7 @@ GOMP_OFFLOAD_openacc_async_exec (void (*fn_ptr) (void *),
   gcn_exec (kernel, devaddrs, dims, targ_mem_desc, true, aq);
 }
 
-/* Create a new asynchronous thread and queue for running future kernels;
-   issues a fatal error if the queue cannot be created as all callers expect
-   that the queue exists.  */
+/* Create a new asynchronous thread and queue for running future kernels.  */
 
 struct goacc_asyncqueue *
 GOMP_OFFLOAD_openacc_async_construct (int device)
@@ -4418,17 +4420,17 @@ GOMP_OFFLOAD_openacc_async_construct (int device)
 
   if (pthread_mutex_init (&aq->mutex, NULL))
 {
-  GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue mutex");
+  GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue mutex");
   return NULL;
 }
   if (pthread_cond_init (&aq->queue_cond_in, NULL))
 {
-  GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue cond");
+  GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue cond");
   return NULL;
 }
   if (pthread_cond_init (&aq->queue_cond_out, NULL))
 {
-  GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue cond");
+  GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue cond");
   return NULL;
 }
 


Re: [Patch] libgomp/plugin/plugin-gcn.c: async-queue init - fix function-return type and fail fatally

2024-11-27 Thread Thomas Schwinge
Hi Tobias!

On 2024-11-18T14:23:24+0100, Tobias Burnus  wrote:
> This fixes a C23 error, causing a build fail: 'false'
> should have been 'NULL'.

ACK.

> The NULL value is not really handled as the code calling
> maybe_init_omp_async assumes that agent->omp_async_queue can be 
> dereferenced. Hence, besides fixing the false/NULL issue, it switches to 
> a 'fatal' error. Comments before I commit it after lunch?

(Please don't bundle up unrelated changes...)

That's a bug in 'libgomp/plugin/plugin-gcn.c:maybe_init_omp_async' (or
its users); the real user of 'GOMP_OFFLOAD_openacc_async_exec' does
handle the error condition:

libgomp/oacc-async.c-  if (!dev->openacc.async.asyncqueue[async])
libgomp/oacc-async.c-{
libgomp/oacc-async.c-  dev->openacc.async.asyncqueue[async]
libgomp/oacc-async.c:   = dev->openacc.async.construct_func 
(dev->target_id);
libgomp/oacc-async.c-
libgomp/oacc-async.c-  if (!dev->openacc.async.asyncqueue[async])
libgomp/oacc-async.c-   {
libgomp/oacc-async.c- gomp_mutex_unlock (&dev->openacc.async.lock);
libgomp/oacc-async.c- gomp_fatal ("async %d creation failed", async);
libgomp/oacc-async.c-   }

..., but needs to 'gomp_mutex_unlock' before 'gomp_fatal', which your
change now circumvents.  Therefore, please revert these 's%error%fatal'
changes, and instead fix up the libgomp GCN plugin-internal usage of
'GOMP_OFFLOAD_openacc_async_construct'.


Grüße
 Thomas


> --- a/libgomp/plugin/plugin-gcn.c
> +++ b/libgomp/plugin/plugin-gcn.c
> @@ -4388,7 +4388,8 @@ GOMP_OFFLOAD_openacc_async_exec (void (*fn_ptr) (void 
> *),
>gcn_exec (kernel, devaddrs, dims, targ_mem_desc, true, aq);
>  }
>  
> -/* Create a new asynchronous thread and queue for running future kernels.  */
> +/* Create a new asynchronous thread and queue for running future kernels;
> +   fails with a fatal error as all callers expect the queue to exist.  */
>  
>  struct goacc_asyncqueue *
>  GOMP_OFFLOAD_openacc_async_construct (int device)
> @@ -4416,18 +4417,18 @@ GOMP_OFFLOAD_openacc_async_construct (int device)
>  
>if (pthread_mutex_init (&aq->mutex, NULL))
>  {
> -  GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue mutex");
> -  return false;
> +  GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue mutex");
> +  return NULL;
>  }
>if (pthread_cond_init (&aq->queue_cond_in, NULL))
>  {
> -  GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue cond");
> -  return false;
> +  GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue cond");
> +  return NULL;
>  }
>if (pthread_cond_init (&aq->queue_cond_out, NULL))
>  {
> -  GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue cond");
> -  return false;
> +  GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue cond");
> +  return NULL;
>  }
>  
>hsa_status_t status = hsa_fns.hsa_queue_create_fn (agent->id,


Re: [Patch] libgomp/plugin/plugin-gcn.c: async-queue init - fix function-return type and fail fatally

2024-11-18 Thread Tobias Burnus

Committed r15-5392-g884637b6362391 (as attached)

Albeit with a minor comment change as my new comment was slightly 
misleading.


Tobias Burnus wrote:

This fixes a C23 error, causing a build fail: 'false'
should have been 'NULL'.

The NULL value is not really handled as the code calling
maybe_init_omp_async assumes that agent->omp_async_queue can be 
dereferenced. Hence, besides fixing the false/NULL issue, it switches 
to a 'fatal' error. Comments before I commit it after lunch?


Tobias
commit 884637b6362391921100efa2c7db4f4452e2a13f
Author: Tobias Burnus 
Date:   Mon Nov 18 14:58:21 2024 +0100

libgomp/plugin/plugin-gcn.c: async-queue init - fix function-return type and fail fatally

libgomp/ChangeLog:

* plugin/plugin-gcn.c (GOMP_OFFLOAD_openacc_async_construct): In
case of an error, call GOMP_PLUGIN_fatal not ..._error; use NULL
not false in return.
---
 libgomp/plugin/plugin-gcn.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index f2f2940de9d..d26b93657bf 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -4388,7 +4388,9 @@ GOMP_OFFLOAD_openacc_async_exec (void (*fn_ptr) (void *),
   gcn_exec (kernel, devaddrs, dims, targ_mem_desc, true, aq);
 }
 
-/* Create a new asynchronous thread and queue for running future kernels.  */
+/* Create a new asynchronous thread and queue for running future kernels;
+   issues a fatal error if the queue cannot be created as all callers expect
+   that the queue exists.  */
 
 struct goacc_asyncqueue *
 GOMP_OFFLOAD_openacc_async_construct (int device)
@@ -4416,18 +4418,18 @@ GOMP_OFFLOAD_openacc_async_construct (int device)
 
   if (pthread_mutex_init (&aq->mutex, NULL))
 {
-  GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue mutex");
-  return false;
+  GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue mutex");
+  return NULL;
 }
   if (pthread_cond_init (&aq->queue_cond_in, NULL))
 {
-  GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue cond");
-  return false;
+  GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue cond");
+  return NULL;
 }
   if (pthread_cond_init (&aq->queue_cond_out, NULL))
 {
-  GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue cond");
-  return false;
+  GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue cond");
+  return NULL;
 }
 
   hsa_status_t status = hsa_fns.hsa_queue_create_fn (agent->id,


[Patch] libgomp/plugin/plugin-gcn.c: async-queue init - fix function-return type and fail fatally

2024-11-18 Thread Tobias Burnus

This fixes a C23 error, causing a build fail: 'false'
should have been 'NULL'.

The NULL value is not really handled as the code calling
maybe_init_omp_async assumes that agent->omp_async_queue can be 
dereferenced. Hence, besides fixing the false/NULL issue, it switches to 
a 'fatal' error. Comments before I commit it after lunch? Tobias
libgomp/plugin/plugin-gcn.c: async-queue init - fix function-return type and fail fatally

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (GOMP_OFFLOAD_openacc_async_construct): In
	case of an error, call GOMP_PLUGIN_fatal not ..._error; use NULL
	not false in return.

 libgomp/plugin/plugin-gcn.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index f2f2940de9d..e9af05f43cb 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -4388,7 +4388,8 @@ GOMP_OFFLOAD_openacc_async_exec (void (*fn_ptr) (void *),
   gcn_exec (kernel, devaddrs, dims, targ_mem_desc, true, aq);
 }
 
-/* Create a new asynchronous thread and queue for running future kernels.  */
+/* Create a new asynchronous thread and queue for running future kernels;
+   fails with a fatal error as all callers expect the queue to exist.  */
 
 struct goacc_asyncqueue *
 GOMP_OFFLOAD_openacc_async_construct (int device)
@@ -4416,18 +4417,18 @@ GOMP_OFFLOAD_openacc_async_construct (int device)
 
   if (pthread_mutex_init (&aq->mutex, NULL))
 {
-  GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue mutex");
-  return false;
+  GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue mutex");
+  return NULL;
 }
   if (pthread_cond_init (&aq->queue_cond_in, NULL))
 {
-  GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue cond");
-  return false;
+  GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue cond");
+  return NULL;
 }
   if (pthread_cond_init (&aq->queue_cond_out, NULL))
 {
-  GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue cond");
-  return false;
+  GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue cond");
+  return NULL;
 }
 
   hsa_status_t status = hsa_fns.hsa_queue_create_fn (agent->id,


Re: [Patch] libgomp/plugin/plugin-gcn.c: async-queue init - fix function-return type and fail fatally

2024-11-18 Thread Andrew Stubbs

On 11/18/24 13:23, Tobias Burnus wrote:

This fixes a C23 error, causing a build fail: 'false'
should have been 'NULL'.

The NULL value is not really handled as the code calling
maybe_init_omp_async assumes that agent->omp_async_queue can be 
dereferenced. Hence, besides fixing the false/NULL issue, it switches to 
a 'fatal' error. Comments before I commit it after lunch? Tobias


LGTM.

Andrew