Re: [Patch] invoke.texi: Add note that -foffload= does not affect device detection

2024-03-04 Thread Tobias Burnus

Hi Sandra,

Sandra Loosemore wrote:

On 3/1/24 08:23, Tobias Burnus wrote:

Maybe the proposed wording will help others to avoid this pitfall.
(Or is this superfluous as -foffload= is not much used and, even if it is,
no one then remembers or finds this note?)


Well, I spent a long time looking at this, and my only conclusion is 
that I don't really understand what the problem you're trying to solve 
is.  If it's problematical to have the runtime know about offload 
devices the compiled code isn't using, don't users also need to know 
how to restrict the runtime to a particular set of devices the same 
way -foffload= lets you do, and not just how to disable offloading in 
the runtime entirely?
It's pretty clearly documented already how -foffload affects the 
compiler's behavior, and the library's behavior is already documented 
in its own manual.  Maybe what we don't have is a tutorial on how to 
build/link/run programs using a specific offload device, or on the host?


The problem is code like the following, which is perfectly valid
and works

(A) if you don't have any offload device
(independent of the compiler options), or

(B) if you have an offload device (supported by your libgomp)
and compiled with offloading support (for that device).

But (C) if you have an offload device and compile as:
  gcc -fopenmp -foffload=disable

it will fail at runtime with:

  dev = 0 / num devs = 1
  Segmentation fault (core dumped)

The problem is that there is a mismatch between the code (which
assumes that there is no offload code and always uses host fallback)
and the run-time library (which detects offload devices), such that
the API routines use a different device than the 'target' code:


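#include <omp.h>
#include <stdio.h>

#define N 2064
int
main ()
{
  /* If libgomp detects an offload device at run time (independent of
     -foffload=), that device is the default device and X points into
     its memory.  */
  int *x = (int*) omp_target_alloc (sizeof(int)*N,
                                    omp_get_default_device ());
  printf ("dev = %d / num devs = %d\n",
          omp_get_default_device (), omp_get_num_devices ());
  /* With -foffload=disable, no device code exists, so this region uses
     host fallback and dereferences the device pointer X on the host.  */
  #pragma omp target is_device_ptr(x)
  for (int i = 0; i < N; ++i)
    x[i] = i;
}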
---
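A minimal sketch of one way to make the example above consistent, assuming the program should keep working when built with -foffload=disable (or for a device type that is not present): first probe where a 'target' region for the default device actually runs, and if it falls back to the host, pass omp_get_initial_device () to the API routines instead. The helper names target_uses_host_fallback, effective_device and fill_and_sum are hypothetical; the #ifdef _OPENMP branches exist only so the sketch also builds without -fopenmp. It also assumes OMP_TARGET_OFFLOAD is not set to mandatory, where a region that cannot run on the device is an error rather than a fallback.

```c
#include <assert.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>

/* Hypothetical helper: returns nonzero if a 'target' region for device
   DEV runs on the host, i.e. host fallback is used because no code for
   DEV's device type was compiled (for instance, after -foffload=disable).  */
static int
target_uses_host_fallback (int dev)
{
  int on_host = 1;
  #pragma omp target device(dev) map(from: on_host)
  on_host = omp_is_initial_device ();
  return on_host;
}

/* Hypothetical helper: the device number the API routines should use so
   that they agree with where 'target' regions for DEV actually run.  */
static int
effective_device (int dev)
{
  return target_uses_host_fallback (dev) ? omp_get_initial_device () : dev;
}
#endif

/* The example above, made consistent: fills N ints inside a 'target'
   region and returns their sum, or -1 if the allocation fails.  */
long
fill_and_sum (int n)
{
  assert (n > 0);
  long sum = 0;
#ifdef _OPENMP
  int dev = effective_device (omp_get_default_device ());
  /* On host fallback, DEV is the initial device, so this is host memory.  */
  int *x = (int *) omp_target_alloc (sizeof (int) * (size_t) n, dev);
#else
  int *x = (int *) malloc (sizeof (int) * (size_t) n);
#endif
  if (x == NULL)
    return -1;
#ifdef _OPENMP
  #pragma omp target device(dev) is_device_ptr(x) map(tofrom: sum)
#endif
  {
    for (int i = 0; i < n; ++i)
      x[i] = i;
    for (int i = 0; i < n; ++i)
      sum += x[i];
  }
#ifdef _OPENMP
  omp_target_free (x, dev);
#else
  free (x);
#endif
  return sum;
}
```

In case (C), the probe reports host fallback, so both omp_target_alloc and the 'target' region use the initial device and X is ordinary host memory; in cases (A) and (B) the behavior matches the original code.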

On the technical side, this is not really surprising, but it
might still be confusing for the user. Obviously, it can
also occur if you compile, e.g., for AMD GCN and only an
Nvidia device is available, but there the solution would be
the same (disable all devices).

(OpenMP 6.0 will provide an environment variable that allows
fine-tuning of the available devices.)


Questions:

* Is such a usage common enough to matter?
I guess it makes sense for some benchmark use: to test whether
real offloading or host fallback is faster; and if the latter
is faster, it might also get used in operational code.

* Are API routines used in such a code in a way that it breaks?
(Unfortunately not very unlikely in larger code.)

If there is enough real-world usage (= 2x yes to the questions above):
* How should it be worded so that it helps users instead of confusing them?

Tobias


Re: [Patch] invoke.texi: Add note that -foffload= does not affect device detection

2024-03-04 Thread Tobias Burnus

Hi,

Sandra Loosemore wrote:

On 3/1/24 17:29, Sandra Loosemore wrote:

On 3/1/24 08:23, Tobias Burnus wrote:
Aside: Shouldn't all the HTML documents start with a  and 
 before

the table of content? Currently, it has:
   Top (GNU libgomp)
and the body starts with
   Short Table of Contents


I note that the 'Top(...)' in  already appears in the GCC 8.5 
docs (created with Texinfo 6.5; while GCC 7.5, created with texinfo 6.3, 
is okay). And the  disappears in the GCC 10.5 doc, created with 
Texinfo 7.0dev.


I have no idea why the 'Top(...)' appears with Texinfo 6.5, but the 
missing  is because of Texinfo 7.0, cf. 
https://git.savannah.gnu.org/cgit/texinfo.git/plain/NEWS


I think it would be useful to remove the 'Top()' in  and add the 
 in general.


For the GCC website, we might want to set TOP_NODE_UP_URL.

I think this is a bug in the version of Texinfo used to produce the 
HTML content for the GCC web site.  Looking at a recent build of my 
own using Texinfo 6.7, I do see



GNU libgomp

The manual on the web site says it was produced by "GNU Texinfo 7.0dev".


I poked at this a little and apparently you need to fiddle with the 
SHOW_TITLE or NO_TOP_NODE_OUTPUT customization variables in recent 
versions of Texinfo in order to get the document title to show up in 
HTML output.


https://www.gnu.org/software/texinfo/manual/texinfo/texinfo.html#index-SHOW_005fTITLE 



Probably this has to be controlled by a configure check since older 
Texinfo versions may barf on unknown options.

...
I'd think that if we were going to do that, we'd also want to use an 
official release version of Texinfo instead of a "dev" snapshot.


(I concur that we should update 7.0dev to 7.0.3 or 7.1 on the server to 
have a defined version.)


Thanks,

Tobias



Re: [Patch] invoke.texi: Add note that -foffload= does not affect device detection

2024-03-03 Thread Sandra Loosemore

On 3/1/24 17:29, Sandra Loosemore wrote:

On 3/1/24 08:23, Tobias Burnus wrote:
Aside: Shouldn't all the HTML documents start with a  and  
before

the table of content? Currently, it has:
   Top (GNU libgomp)
and the body starts with
   Short Table of Contents


I think this is a bug in the version of Texinfo used to produce the HTML 
content for the GCC web site.  Looking at a recent build of my own using 
Texinfo 6.7, I do see



GNU libgomp

The manual on the web site says it was produced by "GNU Texinfo 7.0dev".


I poked at this a little and apparently you need to fiddle with the 
SHOW_TITLE or NO_TOP_NODE_OUTPUT customization variables in recent 
versions of Texinfo in order to get the document title to show up in 
HTML output.


https://www.gnu.org/software/texinfo/manual/texinfo/texinfo.html#index-SHOW_005fTITLE

Probably this has to be controlled by a configure check since older 
Texinfo versions may barf on unknown options.


I'm not at a good point to fiddle with this myself right now (I'm deep 
inside more metadirective/declare variant hacking), and I also have no 
idea how to redo the HTML manuals linked from the GCC web site to tweak 
the formatting in this way.  I'd think that if we were going to do that, 
we'd also want to use an official release version of Texinfo instead of 
a "dev" snapshot.


-Sandra


Re: [Patch] invoke.texi: Add note that -foffload= does not affect device detection

2024-03-01 Thread Sandra Loosemore

On 3/1/24 08:23, Tobias Burnus wrote:

Not very often, but I do keep running into issues (failures, segfaults)
related to testing programs compiled with a GCC that has no offloading
configured and then run using the system libraries. That's equivalent
to using the system compiler (or any offload compiler) and
compiling with -foffload=disable.

The problem is that while the program only contains host code,
the run-time library still initializes devices when an API
routine, such as omp_get_num_devices, is invoked. This can
lead to odd bugs: target regions will, obviously, use host
fallback (for any device number), but the API routines will
happily operate on the actual devices.

(The same issue occurs when compiling for one offload target type
and running on a system that has devices of another type.)
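The mismatch described above can be made visible with a small probe, sketched here under the assumption (taken from the description above) that a 'target' region for a device without matching offload code silently falls back to the host. The function name count_devices_running_target_code is hypothetical; it compares the number of devices the run-time library reports with the number that actually execute the program's 'target' regions. It assumes OMP_TARGET_OFFLOAD is not set to mandatory, in which case such a region is an error rather than a fallback.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: counts how many of the devices that libgomp
   reports via omp_get_num_devices () actually execute this program's
   'target' regions.  A device for which the region falls back to the
   host (no code compiled for its device type, e.g. GCN code on an
   Nvidia-only system, or -foffload=disable) is not counted.  Without
   OpenMP, the result is 0.  */
int
count_devices_running_target_code (void)
{
  int count = 0;
#ifdef _OPENMP
  int num = omp_get_num_devices ();
  for (int dev = 0; dev < num; ++dev)
    {
      int on_host = 1;
      /* On a real device, omp_is_initial_device () returns 0.  */
      #pragma omp target device(dev) map(from: on_host)
      on_host = omp_is_initial_device ();
      if (!on_host)
        ++count;
    }
  assert (count <= num);
#endif
  return count;
}
```

If omp_get_num_devices () is positive but this returns 0, the API routines and the 'target' regions can disagree about which device is in use, as in the segfaulting example.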

I assume that that's not a very common problem, but it can be
rather confusing when hitting this issue.

Maybe the proposed wording will help others to avoid this pitfall.
(Or is this superfluous as -foffload= is not much used and, even if it is,
no one then remembers or finds this note?)

Thoughts?


Well, I spent a long time looking at this, and my only conclusion is 
that I don't really understand what the problem you're trying to solve 
is.  If it's problematical to have the runtime know about offload 
devices the compiled code isn't using, don't users also need to know how 
to restrict the runtime to a particular set of devices the same way 
-foffload= lets you do, and not just how to disable offloading in the 
runtime entirely?


It's pretty clearly documented already how -foffload affects the 
compiler's behavior, and the library's behavior is already documented in 
its own manual.  Maybe what we don't have is a tutorial on how to 
build/link/run programs using a specific offload device, or on the host?


Anyway, I don't really object to the text you want to add, but it makes 
me more confused instead of less so.  :-S




* * *

It was not clear to me how to refer to libgomp.texi:
- Should it be 'libgomp', as 'info libgomp' or the URL
   https://gcc.gnu.org/onlinedocs/libgomp/ (or the filename of the PDF)
   implies?
- Or 'GNU Offloading and Multi Processing Runtime Library Manual',
   as it is named in the link at https://gcc.gnu.org/onlinedocs or on
   the title page of the PDF? But that name is not repeated in the info
   file or the HTML file.
- Or even 'GNU libgomp', to mirror a substring in the title of the HTML
   file?

I ended up referring to that document only implicitly.


The Texinfo input file has "@settitle GNU libgomp".

Aside: Shouldn't all the HTML documents start with a  and  
before

the table of content? Currently, it has:
   Top (GNU libgomp)
and the body starts with
   Short Table of Contents


I think this is a bug in the version of Texinfo used to produce the HTML 
content for the GCC web site.  Looking at a recent build of my own using 
Texinfo 6.7, I do see



GNU libgomp

The manual on the web site says it was produced by "GNU Texinfo 7.0dev".

-Sandra