Re: [PATCH] Avoid using an unsupported agent when offloading to GCN

2024-01-26 Thread Andrew Stubbs

On 26/01/2024 10:40, Richard Biener wrote:

The following avoids selecting an unsupported agent early, avoiding
later asserts when we rely on it being supported.

tested on x86_64-unknown-linux-gnu -> amdhsa-gcn on gfx1060

that's the alternative to the other patch.  I do indeed seem to get
the other (unsupported) agent selected somehow after the other supported
agent finished a kernel run.  Not sure if it's the CPU or the IGPU though.

OK?  Which variant?


So, looking at it again, the original intent of the assert was to alert 
toolchain developers that they missed adding a new name when porting to 
a new device, but I concur that it's not ideal when the assert 
encounters an unknown device in the wild.


However, if we're trying to do something more useful than merely fixing 
an ugly error message, maybe we should look at removing unsupported 
devices in "suitable_hsa_agent_p" instead? Unsupported GPUs wouldn't be 
assigned a device number at all.


Probably devices that are GPUs but skipped because they are unsupported 
should be mentioned on GOMP_DEBUG (as well as GCN_DEBUG)?


The goal should be that folks with your twin-GPU setup shouldn't have to 
work around it, but I don't really want to remove the message for people 
who only have one device but don't realize it is unsupported.


On the other hand, if a user has two devices that *are* supported, but 
the second one is preferred, they'll have to set OMP_DEFAULT_DEVICE 
explicitly, and is this so different?


As a user, WDYT?

Andrew



libgomp/
* plugin/plugin-gcn.c (get_agent_info): When the agent isn't supported
return NULL.
---
  libgomp/plugin/plugin-gcn.c | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index d8c3907c108..f453f630e06 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1036,6 +1036,8 @@ print_kernel_dispatch (struct kernel_dispatch *dispatch, 
unsigned indent)
  /* }}}  */
  /* {{{ Utility functions  */
  
+static const char* isa_hsa_name (int isa);

+
  /* Cast the thread local storage to gcn_thread.  */
  
  static inline struct gcn_thread *

@@ -1589,6 +1591,11 @@ get_agent_info (int n)
GOMP_PLUGIN_error ("Attempt to use an uninitialized GCN agent.");
return NULL;
  }
+  if (!isa_hsa_name (hsa_context.agents[n].device_isa))
+{
+  GOMP_PLUGIN_error ("Attempt to use an unsupported GCN agent.");
+  return NULL;
+}
return &hsa_context.agents[n];
  }
  




[PATCH] Avoid using an unsupported agent when offloading to GCN

2024-01-26 Thread Richard Biener
The following avoids selecting an unsupported agent early, avoiding
later asserts when we rely on it being supported.

tested on x86_64-unknown-linux-gnu -> amdhsa-gcn on gfx1060

that's the alternative to the other patch.  I do indeed seem to get
the other (unsupported) agent selected somehow after the other supported
agent finished a kernel run.  Not sure if it's the CPU or the IGPU though.

OK?  Which variant?

libgomp/
* plugin/plugin-gcn.c (get_agent_info): When the agent isn't supported
return NULL.
---
 libgomp/plugin/plugin-gcn.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index d8c3907c108..f453f630e06 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1036,6 +1036,8 @@ print_kernel_dispatch (struct kernel_dispatch *dispatch, 
unsigned indent)
 /* }}}  */
 /* {{{ Utility functions  */
 
+static const char* isa_hsa_name (int isa);
+
 /* Cast the thread local storage to gcn_thread.  */
 
 static inline struct gcn_thread *
@@ -1589,6 +1591,11 @@ get_agent_info (int n)
   GOMP_PLUGIN_error ("Attempt to use an uninitialized GCN agent.");
   return NULL;
 }
+  if (!isa_hsa_name (hsa_context.agents[n].device_isa))
+{
+  GOMP_PLUGIN_error ("Attempt to use an unsupported GCN agent.");
+  return NULL;
+}
   return &hsa_context.agents[n];
 }
 
-- 
2.35.3