Re: [Patch] invoke.texi: Add note that -foffload= does not affect device detection
Hi Sandra,

Sandra Loosemore wrote:
> On 3/1/24 08:23, Tobias Burnus wrote:
>> Maybe the proposed wording will help others to avoid this pitfall. (Or is this superfluous as -foffload= is not much used and, even if, no one then remembers or finds this note?)
>
> Well, I spent a long time looking at this, and my only conclusion is that I don't really understand what the problem you're trying to solve is. If it's problematical to have the runtime know about offload devices the compiled code isn't using, don't users also need to know how to restrict the runtime to a particular set of devices the same way -foffload= lets you do, and not just how to disable offloading in the runtime entirely? It's pretty clearly documented already how -foffload affects the compiler's behavior, and the library's behavior is already documented in its own manual. Maybe what we don't have is a tutorial on how to build/link/run programs using a specific offload device, or on the host?

The problem is code like the following, which is perfectly valid and works

(A) if you don't have any offload device (independent of the compiler options), and
(B) if you have an offload device (supported by your libgomp) and compiled with offloading support for that device.

But (C) if you have an offload device and compile with

  gcc -fopenmp -foffload=disable

it fails at run time with:

  dev = 0 / num devs = 1
  Segmentation fault (core dumped)

The problem is that there is a mismatch between the code (which assumes no offload code and, hence, always host fallback) and the run-time library (which detects offload devices), such that the API routines use a different device than the 'target' code:

  #include <stdio.h>
  #include <omp.h>

  #define N 2064

  int main ()
  {
    int *x = (int*) omp_target_alloc (sizeof(int)*N, omp_get_default_device ());
    printf ("dev = %d / num devs = %d\n", omp_get_default_device (), omp_get_num_devices ());
  #pragma omp target is_device_ptr(x)
    for (int i = 0; i < N; ++i)
      x[i] = i;
  }

---

On the technical side, it is not really surprising, but it might still be confusing for the user. Obviously, it can also occur if you compile, e.g., for AMD GCN and only an Nvidia device is available; but there the solution would be the same (disable all devices). (OpenMP 6.0 will provide an environment variable that allows fine-tuning of the available devices.)

Questions:

* Is such a usage common enough to matter? I guess for some benchmarks it makes sense, to test whether real offloading or host fallback is faster; and if the latter is true, it might also get used in operational code.

* Are API routines used in such code in a way that it breaks? (Unfortunately, not very unlikely in larger code.)

If there is enough real-world usage (= 2x yes to the questions above):

* How to word it to help users and not to confuse them?

Tobias
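As a minimal sketch (not a libgomp feature, and not the wording the patch proposes), a program in this situation could probe once where a 'target' region for the default device actually runs and, if it falls back to the host, use omp_get_initial_device () for the API routines as well. This assumes OpenMP 5.x semantics, where omp_target_alloc accepts the initial device and then returns host memory, and that host fallback is permitted (OMP_TARGET_OFFLOAD not set to MANDATORY). The helper names target_runs_on_host and select_usable_device are hypothetical; the #ifndef _OPENMP stand-ins only exist so the file also builds without -fopenmp.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Minimal stand-ins (not from any library) so the file also builds without
   -fopenmp; without OpenMP everything runs on the host.  */
static int omp_get_default_device (void) { return 0; }
static int omp_get_initial_device (void) { return 0; }
static int omp_is_initial_device (void) { return 1; }
#endif

/* Hypothetical helper: returns 1 if a 'target' region for device DEV runs on
   the host, e.g. because the binary has no offload code for DEV (as with
   -foffload=disable) although the runtime detected the device.
   Assumes host fallback is permitted (OMP_TARGET_OFFLOAD is not MANDATORY).  */
int
target_runs_on_host (int dev)
{
  int on_host = 1;
#pragma omp target device(dev) map(from: on_host)
  on_host = omp_is_initial_device ();
  return on_host;
}

/* Hypothetical helper: a device number that API routines such as
   omp_target_alloc and 'target' regions will both use consistently.  */
int
select_usable_device (void)
{
  int dev = omp_get_default_device ();
  if (dev != omp_get_initial_device () && target_runs_on_host (dev))
    dev = omp_get_initial_device ();
  return dev;
}
```

With dev = select_usable_device (), the example program would allocate with omp_target_alloc (sizeof (int) * N, dev) and add device(dev) to its 'target' directive, so that both sides agree on the device. Without changing the source, setting OMP_TARGET_OFFLOAD=DISABLED at run time should, per the OpenMP 5.0 specification, make the runtime behave as if the host were the only device, which corresponds to the "disable all devices" solution.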
Re: [Patch] invoke.texi: Add note that -foffload= does not affect device detection
Hi,

Sandra Loosemore wrote:
> On 3/1/24 17:29, Sandra Loosemore wrote:
>> On 3/1/24 08:23, Tobias Burnus wrote:
>>> Aside: Shouldn't all the HTML documents start with a and before the table of contents? Currently, it has:
>>>
>>>   Top (GNU libgomp)
>>>
>>> and the body starts with
>>>
>>>   Short Table of Contents

I note that the 'Top(...)' in already appears in the GCC 8.5 docs (created with Texinfo 6.5, while GCC 7.5, created with Texinfo 6.3, is okay). And the disappears in the GCC 10.5 docs, created with Texinfo 7.0dev.

I have no idea why the 'Top(...)' appears with Texinfo 6.5, but the missing is because of Texinfo 7.0; cf. https://git.savannah.gnu.org/cgit/texinfo.git/plain/NEWS

I think it would be useful to remove the 'Top()' in and add the in general. For the GCC website, we might want to set TOP_NODE_UP_URL.

>> I think this is a bug in the version of Texinfo used to produce the HTML content for the GCC web site. Looking at a recent build of my own using Texinfo 6.7, I do see
>>
>>   GNU libgomp
>>
>> The manual on the web site says it was produced by "GNU Texinfo 7.0dev".
>
> I poked at this a little and apparently you need to fiddle with the SHOW_TITLE or NO_TOP_NODE_OUTPUT customization variables in recent versions of Texinfo in order to get the document title to show up in HTML output.
>
> https://www.gnu.org/software/texinfo/manual/texinfo/texinfo.html#index-SHOW_005fTITLE
>
> Probably this has to be controlled by a configure check since older Texinfo versions may barf on unknown options.
>
> ...
>
> I'd think that if we were going to do that, we'd also want to use an official release version of Texinfo instead of a "dev" snapshot.

(I concur that we should update 7.0dev to 7.0.3 or 7.1 on the server to have a defined version.)

Thanks,

Tobias
Re: [Patch] invoke.texi: Add note that -foffload= does not affect device detection
On 3/1/24 17:29, Sandra Loosemore wrote:
> On 3/1/24 08:23, Tobias Burnus wrote:
>> Aside: Shouldn't all the HTML documents start with a and before the table of contents? Currently, it has:
>>
>>   Top (GNU libgomp)
>>
>> and the body starts with
>>
>>   Short Table of Contents
>
> I think this is a bug in the version of Texinfo used to produce the HTML content for the GCC web site. Looking at a recent build of my own using Texinfo 6.7, I do see
>
>   GNU libgomp
>
> The manual on the web site says it was produced by "GNU Texinfo 7.0dev".

I poked at this a little and apparently you need to fiddle with the SHOW_TITLE or NO_TOP_NODE_OUTPUT customization variables in recent versions of Texinfo in order to get the document title to show up in HTML output.

https://www.gnu.org/software/texinfo/manual/texinfo/texinfo.html#index-SHOW_005fTITLE

Probably this has to be controlled by a configure check since older Texinfo versions may barf on unknown options.

I'm not at a good point to fiddle with this myself right now (I'm deep inside more metadirective/declare variant hacking); also, I have no idea how to redo the HTML manuals linked from the GCC web site to tweak the formatting in this way. I'd think that if we were going to do that, we'd also want to use an official release version of Texinfo instead of a "dev" snapshot.

-Sandra
Re: [Patch] invoke.texi: Add note that -foffload= does not affect device detection
On 3/1/24 08:23, Tobias Burnus wrote:
> Not very often, but I do keep running into issues (fails, segfaults) related to testing programs compiled with a GCC without offload configured and then using the system libraries. That's equivalent to having the system compiler (or any offload compiler) and compiling with -foffload=disable.
>
> The problem is that while the program only contains host code, the run-time library still initializes devices when an API routine, such as omp_get_num_devices, is invoked. This can lead to odd bugs: target regions, obviously, use host fallback (for any device number), but the API routines happily operate on the actual devices. (The same issue occurs when compiling for one offload target type and running on a system that has devices of another type.)
>
> I assume that that's not a very common problem, but it can be rather confusing when hitting this issue. Maybe the proposed wording will help others to avoid this pitfall. (Or is this superfluous as -foffload= is not much used and, even if, no one then remembers or finds this note?)
>
> Thoughts?

Well, I spent a long time looking at this, and my only conclusion is that I don't really understand what the problem you're trying to solve is. If it's problematical to have the runtime know about offload devices the compiled code isn't using, don't users also need to know how to restrict the runtime to a particular set of devices the same way -foffload= lets you do, and not just how to disable offloading in the runtime entirely? It's pretty clearly documented already how -foffload affects the compiler's behavior, and the library's behavior is already documented in its own manual. Maybe what we don't have is a tutorial on how to build/link/run programs using a specific offload device, or on the host?

Anyway, I don't really object to the text you want to add, but it makes me more confused instead of less so. :-S

> * * *
>
> It was not clear to me how to refer to libgomp.texi:
> - Should it be 'libgomp', as 'info libgomp' or the URL https://gcc.gnu.org/onlinedocs/libgomp/ (or the filename of the PDF) implies?
> - Or 'GNU Offloading and Multi Processing Runtime Library Manual', as named in the link at https://gcc.gnu.org/onlinedocs or on the title page of the PDF? But that name is not repeated in the info file or the HTML file.
> - Or even 'GNU libgomp', to mirror a substring in the of the HTML file?
>
> I now ended up only implicitly referring to that document.

The Texinfo input file has "@settitle GNU libgomp".

> Aside: Shouldn't all the HTML documents start with a and before the table of contents? Currently, it has:
>
>   Top (GNU libgomp)
>
> and the body starts with
>
>   Short Table of Contents

I think this is a bug in the version of Texinfo used to produce the HTML content for the GCC web site. Looking at a recent build of my own using Texinfo 6.7, I do see

  GNU libgomp

The manual on the web site says it was produced by "GNU Texinfo 7.0dev".

-Sandra
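For the case Tobias mentions of compiling for one offload target type while the system has devices of another type, the mismatch can be made visible with a per-device probe: for each device the runtime reports, check whether a 'target' region on it actually executes on the host. This is only a sketch under stated assumptions: count_devices_without_offload_code is a hypothetical helper, not part of libgomp; the probe assumes host fallback is permitted (OMP_TARGET_OFFLOAD not set to MANDATORY); and the #ifndef _OPENMP stand-ins merely let the file build without -fopenmp.

```c
#include <stdio.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Minimal stand-ins (not from any library) so the file also builds without
   -fopenmp; without OpenMP there are no devices.  */
static int omp_get_num_devices (void) { return 0; }
static int omp_is_initial_device (void) { return 1; }
#endif

/* Hypothetical diagnostic: count the devices that the runtime reports but for
   which 'target' regions fall back to the host because the binary contains no
   offload code for them.  Assumes OMP_TARGET_OFFLOAD is not MANDATORY.  */
int
count_devices_without_offload_code (void)
{
  int n = omp_get_num_devices ();
  int missing = 0;
  for (int dev = 0; dev < n; dev++)
    {
      int on_host = 1;
#pragma omp target device(dev) map(from: on_host)
      on_host = omp_is_initial_device ();
      if (on_host)
        {
          /* API routines would use this device, 'target' code would not.  */
          fprintf (stderr, "device %d: 'target' regions use host fallback\n", dev);
          missing++;
        }
    }
  return missing;
}
```

Calling this once at program start, and warning when it returns a nonzero count, would flag the -foffload=disable situation before API routines and 'target' regions start to disagree about which device they use.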