Re: [Patch] plugin/plugin-gcn.c: Fix error handling of GOMP_OFFLOAD_openacc_async_construct (was: [Patch] libgomp/plugin/plugin-gcn.c: async-queue init - fix function-return type and fail fatally)
Hi Tobias! On 2024-12-03T11:28:19+0100, Tobias Burnus wrote: > Thomas Schwinge wrote: >> That's a bug in 'libgomp/plugin/plugin-gcn.c:maybe_init_omp_async' (or >> its users); the real user of ['GOMP_OFFLOAD_openacc_async_construct'] does >> handle the error condition: > > Well, it does not really handle it: It also just calls 'fatal', > admittedly after unlocking but if the program dies it does not > really make a difference. It makes a very big difference: without unlocking, the process deadlocks. > However, there are known issues with libgomp-plugin calling FATAL, > which does not really trigger the FATAL part in libgomp, cf. > https://gcc.gnu.org/PR109664 Right. > The attached patch now defers the error handling to a bit later, > either still in libgomp-plugin or in one case it even propagates > the error through to a user-called routine. > > Comments before I commit it? Looks good to me, one incremental step forward, thanks! (I see there are more missing-unlocking issues in the GCN 'GOMP_OFFLOAD_openacc_async_construct', and not only there; that's for another day.) Grüße Thomas > plugin/plugin-gcn.c: Fix error handling of > GOMP_OFFLOAD_openacc_async_construct > > Follow up to r15-5392-g884637b6362391. As the name implies, > GOMP_OFFLOAD_openacc_async_construct is also externally called. > Hence, partially revert previous commit to permit unlocking handling > in oacc-async.c's lookup_goacc_asyncqueue by not failing fatally. > > Hence, also the other (indirect) callers had to be updated: > GOMP_OFFLOAD_dev2dev fails now with 'false' and > GOMP_OFFLOAD_async_run fatally. > > libgomp/ChangeLog: > > * plugin/plugin-gcn.c (GOMP_OFFLOAD_dev2dev, GOMP_OFFLOAD_async_run): > Handle omp_async_queue == NULL after call tomaybe_init_omp_async. > (GOMP_OFFLOAD_openacc_async_construct): Use error not fatal error, > partially reverting r15-5392. > > libgomp/plugin/plugin-gcn.c | 14 -- > 1 file changed, 8 insertions(+), 6 deletions(-) > > diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c > index d26b93657bf..239acf8cb75 100644 > --- a/libgomp/plugin/plugin-gcn.c > +++ b/libgomp/plugin/plugin-gcn.c > @@ -3939,6 +3939,8 @@ GOMP_OFFLOAD_dev2dev (int device, void *dst, const void > *src, size_t n) > { >struct agent_info *agent = get_agent_info (device); >maybe_init_omp_async (agent); > + if (!agent->omp_async_queue) > + return false; >queue_push_copy (agent->omp_async_queue, dst, src, n); >return true; > } > @@ -4350,6 +4352,8 @@ GOMP_OFFLOAD_async_run (int device, void *tgt_fn, void > *tgt_vars, > } > >maybe_init_omp_async (agent); > + if (!agent->omp_async_queue) > +GOMP_PLUGIN_fatal ("Asynchronous queue initialization failed"); >queue_push_launch (agent->omp_async_queue, kernel, tgt_vars, kla); >queue_push_callback (agent->omp_async_queue, > GOMP_PLUGIN_target_task_completion, async_data); > @@ -4388,9 +4392,7 @@ GOMP_OFFLOAD_openacc_async_exec (void (*fn_ptr) (void > *), >gcn_exec (kernel, devaddrs, dims, targ_mem_desc, true, aq); > } > > -/* Create a new asynchronous thread and queue for running future kernels; > - issues a fatal error if the queue cannot be created as all callers expect > - that the queue exists. */ > +/* Create a new asynchronous thread and queue for running future kernels. */ > > struct goacc_asyncqueue * > GOMP_OFFLOAD_openacc_async_construct (int device) > @@ -4418,17 +4420,17 @@ GOMP_OFFLOAD_openacc_async_construct (int device) > >if (pthread_mutex_init (&aq->mutex, NULL)) > { > - GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue mutex"); > + GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue mutex"); >return NULL; > } >if (pthread_cond_init (&aq->queue_cond_in, NULL)) > { > - GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue cond"); > + GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue cond"); >return NULL; > } >if (pthread_cond_init (&aq->queue_cond_out, NULL)) > { > - GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue cond"); > + GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue cond"); >return NULL; > } >
[Patch] plugin/plugin-gcn.c: Fix error handling of GOMP_OFFLOAD_openacc_async_construct (was: [Patch] libgomp/plugin/plugin-gcn.c: async-queue init - fix function-return type and fail fatally)
Thomas Schwinge wrote: That's a bug in 'libgomp/plugin/plugin-gcn.c:maybe_init_omp_async' (or its users); the real user of 'GOMP_OFFLOAD_openacc_async_exec' does handle the error condition: Well, it does not really handle it: It also just calls 'fatal', admittedly after unlocking but if the program dies it does not really make a difference. However, there are known issues with libgomp-plugin calling FATAL, which does not really trigger the FATAL part in libgomp, cf. https://gcc.gnu.org/PR109664 * * * The attached patch now defers the error handling to a bit later, either still in libgomp-plugin or in one case it even propagates the error through to a user-called routine. Comments before I commit it? Tobias plugin/plugin-gcn.c: Fix error handling of GOMP_OFFLOAD_openacc_async_construct Follow up to r15-5392-g884637b6362391. As the name implies, GOMP_OFFLOAD_openacc_async_construct is also externally called. Hence, partially revert previous commit to permit unlocking handling in oacc-async.c's lookup_goacc_asyncqueue by not failing fatally. Hence, also the other (indirect) callers had to be updated: GOMP_OFFLOAD_dev2dev fails now with 'false' and GOMP_OFFLOAD_async_run fatally. libgomp/ChangeLog: * plugin/plugin-gcn.c (GOMP_OFFLOAD_dev2dev, GOMP_OFFLOAD_async_run): Handle omp_async_queue == NULL after call tomaybe_init_omp_async. (GOMP_OFFLOAD_openacc_async_construct): Use error not fatal error, partially reverting r15-5392. libgomp/plugin/plugin-gcn.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c index d26b93657bf..239acf8cb75 100644 --- a/libgomp/plugin/plugin-gcn.c +++ b/libgomp/plugin/plugin-gcn.c @@ -3939,6 +3939,8 @@ GOMP_OFFLOAD_dev2dev (int device, void *dst, const void *src, size_t n) { struct agent_info *agent = get_agent_info (device); maybe_init_omp_async (agent); + if (!agent->omp_async_queue) + return false; queue_push_copy (agent->omp_async_queue, dst, src, n); return true; } @@ -4350,6 +4352,8 @@ GOMP_OFFLOAD_async_run (int device, void *tgt_fn, void *tgt_vars, } maybe_init_omp_async (agent); + if (!agent->omp_async_queue) +GOMP_PLUGIN_fatal ("Asynchronous queue initialization failed"); queue_push_launch (agent->omp_async_queue, kernel, tgt_vars, kla); queue_push_callback (agent->omp_async_queue, GOMP_PLUGIN_target_task_completion, async_data); @@ -4388,9 +4392,7 @@ GOMP_OFFLOAD_openacc_async_exec (void (*fn_ptr) (void *), gcn_exec (kernel, devaddrs, dims, targ_mem_desc, true, aq); } -/* Create a new asynchronous thread and queue for running future kernels; - issues a fatal error if the queue cannot be created as all callers expect - that the queue exists. */ +/* Create a new asynchronous thread and queue for running future kernels. */ struct goacc_asyncqueue * GOMP_OFFLOAD_openacc_async_construct (int device) @@ -4418,17 +4420,17 @@ GOMP_OFFLOAD_openacc_async_construct (int device) if (pthread_mutex_init (&aq->mutex, NULL)) { - GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue mutex"); + GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue mutex"); return NULL; } if (pthread_cond_init (&aq->queue_cond_in, NULL)) { - GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue cond"); + GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue cond"); return NULL; } if (pthread_cond_init (&aq->queue_cond_out, NULL)) { - GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue cond"); + GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue cond"); return NULL; }
Re: [Patch] libgomp/plugin/plugin-gcn.c: async-queue init - fix function-return type and fail fatally
Hi Tobias! On 2024-11-18T14:23:24+0100, Tobias Burnus wrote: > This fixes a C23 error, causing a build fail: 'false' > should have been 'NULL'. ACK. > The NULL value is not really handled as the code calling > maybe_init_omp_async assumes that agent->omp_async_queue can be > dereferenced. Hence, besides fixing the false/NULL issue, it switches to > a 'fatal' error. Comments before I commit it after lunch? (Please don't bundle up unrelated changes...) That's a bug in 'libgomp/plugin/plugin-gcn.c:maybe_init_omp_async' (or its users); the real user of 'GOMP_OFFLOAD_openacc_async_exec' does handle the error condition: libgomp/oacc-async.c- if (!dev->openacc.async.asyncqueue[async]) libgomp/oacc-async.c-{ libgomp/oacc-async.c- dev->openacc.async.asyncqueue[async] libgomp/oacc-async.c: = dev->openacc.async.construct_func (dev->target_id); libgomp/oacc-async.c- libgomp/oacc-async.c- if (!dev->openacc.async.asyncqueue[async]) libgomp/oacc-async.c- { libgomp/oacc-async.c- gomp_mutex_unlock (&dev->openacc.async.lock); libgomp/oacc-async.c- gomp_fatal ("async %d creation failed", async); libgomp/oacc-async.c- } ..., but needs to 'gomp_mutex_unlock' before 'gomp_fatal', which your change now circumvents. Therefore, please revert these 's%error%fatal' changes, and instead fix up the libgomp GCN plugin-internal usage of 'GOMP_OFFLOAD_openacc_async_construct'. Grüße Thomas > --- a/libgomp/plugin/plugin-gcn.c > +++ b/libgomp/plugin/plugin-gcn.c > @@ -4388,7 +4388,8 @@ GOMP_OFFLOAD_openacc_async_exec (void (*fn_ptr) (void > *), >gcn_exec (kernel, devaddrs, dims, targ_mem_desc, true, aq); > } > > -/* Create a new asynchronous thread and queue for running future kernels. */ > +/* Create a new asynchronous thread and queue for running future kernels; > + fails with a fatal error as all callers expect the queue to exist. */ > > struct goacc_asyncqueue * > GOMP_OFFLOAD_openacc_async_construct (int device) > @@ -4416,18 +4417,18 @@ GOMP_OFFLOAD_openacc_async_construct (int device) > >if (pthread_mutex_init (&aq->mutex, NULL)) > { > - GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue mutex"); > - return false; > + GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue mutex"); > + return NULL; > } >if (pthread_cond_init (&aq->queue_cond_in, NULL)) > { > - GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue cond"); > - return false; > + GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue cond"); > + return NULL; > } >if (pthread_cond_init (&aq->queue_cond_out, NULL)) > { > - GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue cond"); > - return false; > + GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue cond"); > + return NULL; > } > >hsa_status_t status = hsa_fns.hsa_queue_create_fn (agent->id,
Re: [Patch] libgomp/plugin/plugin-gcn.c: async-queue init - fix function-return type and fail fatally
Committed r15-5392-g884637b6362391 (as attached) Albeit with a minor comment change as my new comment was slightly misleading. Tobias Burnus wrote: This fixes a C23 error, causing a build fail: 'false' should have been 'NULL'. The NULL value is not really handled as the code calling maybe_init_omp_async assumes that agent->omp_async_queue can be dereferenced. Hence, besides fixing the false/NULL issue, it switches to a 'fatal' error. Comments before I commit it after lunch? Tobias commit 884637b6362391921100efa2c7db4f4452e2a13f Author: Tobias Burnus Date: Mon Nov 18 14:58:21 2024 +0100 libgomp/plugin/plugin-gcn.c: async-queue init - fix function-return type and fail fatally libgomp/ChangeLog: * plugin/plugin-gcn.c (GOMP_OFFLOAD_openacc_async_construct): In case of an error, call GOMP_PLUGIN_fatal not ..._error; use NULL not false in return. --- libgomp/plugin/plugin-gcn.c | 16 +--- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c index f2f2940de9d..d26b93657bf 100644 --- a/libgomp/plugin/plugin-gcn.c +++ b/libgomp/plugin/plugin-gcn.c @@ -4388,7 +4388,9 @@ GOMP_OFFLOAD_openacc_async_exec (void (*fn_ptr) (void *), gcn_exec (kernel, devaddrs, dims, targ_mem_desc, true, aq); } -/* Create a new asynchronous thread and queue for running future kernels. */ +/* Create a new asynchronous thread and queue for running future kernels; + issues a fatal error if the queue cannot be created as all callers expect + that the queue exists. */ struct goacc_asyncqueue * GOMP_OFFLOAD_openacc_async_construct (int device) @@ -4416,18 +4418,18 @@ GOMP_OFFLOAD_openacc_async_construct (int device) if (pthread_mutex_init (&aq->mutex, NULL)) { - GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue mutex"); - return false; + GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue mutex"); + return NULL; } if (pthread_cond_init (&aq->queue_cond_in, NULL)) { - GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue cond"); - return false; + GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue cond"); + return NULL; } if (pthread_cond_init (&aq->queue_cond_out, NULL)) { - GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue cond"); - return false; + GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue cond"); + return NULL; } hsa_status_t status = hsa_fns.hsa_queue_create_fn (agent->id,
[Patch] libgomp/plugin/plugin-gcn.c: async-queue init - fix function-return type and fail fatally
This fixes a C23 error, causing a build fail: 'false' should have been 'NULL'. The NULL value is not really handled as the code calling maybe_init_omp_async assumes that agent->omp_async_queue can be dereferenced. Hence, besides fixing the false/NULL issue, it switches to a 'fatal' error. Comments before I commit it after lunch? Tobias libgomp/plugin/plugin-gcn.c: async-queue init - fix function-return type and fail fatally libgomp/ChangeLog: * plugin/plugin-gcn.c (GOMP_OFFLOAD_openacc_async_construct): In case of an error, call GOMP_PLUGIN_fatal not ..._error; use NULL not false in return. libgomp/plugin/plugin-gcn.c | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c index f2f2940de9d..e9af05f43cb 100644 --- a/libgomp/plugin/plugin-gcn.c +++ b/libgomp/plugin/plugin-gcn.c @@ -4388,7 +4388,8 @@ GOMP_OFFLOAD_openacc_async_exec (void (*fn_ptr) (void *), gcn_exec (kernel, devaddrs, dims, targ_mem_desc, true, aq); } -/* Create a new asynchronous thread and queue for running future kernels. */ +/* Create a new asynchronous thread and queue for running future kernels; + fails with a fatal error as all callers expect the queue to exist. */ struct goacc_asyncqueue * GOMP_OFFLOAD_openacc_async_construct (int device) @@ -4416,18 +4417,18 @@ GOMP_OFFLOAD_openacc_async_construct (int device) if (pthread_mutex_init (&aq->mutex, NULL)) { - GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue mutex"); - return false; + GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue mutex"); + return NULL; } if (pthread_cond_init (&aq->queue_cond_in, NULL)) { - GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue cond"); - return false; + GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue cond"); + return NULL; } if (pthread_cond_init (&aq->queue_cond_out, NULL)) { - GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue cond"); - return false; + GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue cond"); + return NULL; } hsa_status_t status = hsa_fns.hsa_queue_create_fn (agent->id,
Re: [Patch] libgomp/plugin/plugin-gcn.c: async-queue init - fix function-return type and fail fatally
On 11/18/24 13:23, Tobias Burnus wrote: This fixes a C23 error, causing a build fail: 'false' should have been 'NULL'. The NULL value is not really handled as the code calling maybe_init_omp_async assumes that agent->omp_async_queue can be dereferenced. Hence, besides fixing the false/NULL issue, it switches to a 'fatal' error. Comments before I commit it after lunch? Tobias LGTM. Andrew