[PATCH] [testsuite] Fix gcc.dg/pr115066.c fail on aarch64

2024-05-14 Thread Tom de Vries
On aarch64, I get this failure:
...
FAIL: gcc.dg/pr115066.c scan-assembler \\.byte\\t0xb\\t# Define macro strx
...

This happens because we expect to match:
...
.byte   0xb # Define macro strx
...
but instead we get:
...
.byte   0xb // Define macro strx
...

Fix this by not explicitly matching the comment marker.

Tested on aarch64 and x86_64.
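
As an illustration only (not part of the patch), the effect of the pattern
change can be sketched in shell.  The two sample lines are modelled on the
output quoted above; the variable and function names are made up for the
sketch, and grep's basic regular expressions stand in for the Tcl regexps
that scan-assembler actually uses:

```sh
# Sample -dA output lines; only the comment marker differs
# ('#' on x86_64, '//' on aarch64).
x86_line=$(printf '\t.byte\t0xb\t# Define macro strx')
aarch64_line=$(printf '\t.byte\t0xb\t// Define macro strx')

tab=$(printf '\t')

# Old pattern: hard-codes '#' as the comment marker.
old_re="\\.byte${tab}0xb${tab}# Define macro strx"
# New pattern: '.*' accepts whatever comment marker the target uses.
new_re="\\.byte${tab}0xb${tab}.* Define macro strx"

# matches REGEX LINE: succeeds if LINE matches REGEX.
matches () {
  printf '%s\n' "$2" | grep -q -e "$1"
}
```

With these definitions, old_re matches only the x86_64 line, while new_re
matches both.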

gcc/testsuite/ChangeLog:

2024-05-14  Tom de Vries  

* gcc.dg/pr115066.c: Don't match comment marker.
---
 gcc/testsuite/gcc.dg/pr115066.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pr115066.c b/gcc/testsuite/gcc.dg/pr115066.c
index 645757df209..a7e98500160 100644
--- a/gcc/testsuite/gcc.dg/pr115066.c
+++ b/gcc/testsuite/gcc.dg/pr115066.c
@@ -2,7 +2,7 @@
 /* { dg-skip-if "split DWARF unsupported" { hppa*-*-hpux* powerpc*-ibm-aix* *-*-darwin* } } */
 /* { dg-options "-gsplit-dwarf -g3 -dA -gdwarf-4" } */
 /* { dg-final { scan-assembler-times {\.section\t"?\.debug_macro} 1 } } */
-/* { dg-final { scan-assembler-not {\.byte\t0x5\t# Define macro strp} } } */
-/* { dg-final { scan-assembler {\.byte\t0xb\t# Define macro strx} } } */
+/* { dg-final { scan-assembler-not {\.byte\t0x5\t.* Define macro strp} } } */
+/* { dg-final { scan-assembler {\.byte\t0xb\t.* Define macro strx} } } */
 
 #define foo 1

base-commit: b7003b4cc5e263343f047fe64ed1ae12f561b2d1
-- 
2.35.3



[PATCH] [debug] Fix dwarf v4 .debug_macro.dwo

2024-05-14 Thread Tom de Vries
Consider a hello world, compiled with -gsplit-dwarf and dwarf version 4, and 
-g3:
...
$ gcc -gdwarf-4 -gsplit-dwarf /data/vries/hello.c -g3 -save-temps -dA
...

In section .debug_macro.dwo, we have:
...
.Ldebug_macro0:
        .value  0x4     # DWARF macro version number
        .byte   0x2     # Flags: 32-bit, lineptr present
        .long   .Lskeleton_debug_line0
        .byte   0x3     # Start new file
        .uleb128 0      # Included from line number 0
        .uleb128 0x1    # file /data/vries/hello.c
        .byte   0x5     # Define macro strp
        .uleb128 0      # At line number 0
        .uleb128 0x1d0  # The macro: "__STDC__ 1"
...

Given that we use a DW_MACRO_define_strp, we'd expect 0x1d0 to be an
offset into a .debug_str.dwo section.

But in fact, 0x1d0 is an index into the string offset table in
.debug_str_offsets.dwo:
...
.long   0x34f0  # indexed string 0x1d0: __STDC__ 1
...

Add asserts that catch this inconsistency, and fix this by using
DW_MACRO_define_strx instead.

Tested on x86_64.

PR debug/115066
---
 gcc/dwarf2out.cc| 20 ++--
 gcc/testsuite/gcc.dg/pr115066.c |  8 
 2 files changed, 22 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr115066.c

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index eedb13bb069..70b7f5f42cd 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -29045,7 +29045,7 @@ output_macinfo_op (macinfo_entry *ref)
  && !DWARF2_INDIRECT_STRING_SUPPORT_MISSING_ON_TARGET
  && (debug_str_section->common.flags & SECTION_MERGE) != 0)
{
- if (dwarf_split_debug_info && dwarf_version >= 5)
+ if (dwarf_split_debug_info)
ref->code = ref->code == DW_MACINFO_define
? DW_MACRO_define_strx : DW_MACRO_undef_strx;
  else
@@ -29097,12 +29097,20 @@ output_macinfo_op (macinfo_entry *ref)
   HOST_WIDE_INT_PRINT_UNSIGNED,
   ref->lineno);
   if (node->form == DW_FORM_strp)
-dw2_asm_output_offset (dwarf_offset_size, node->label,
-   debug_str_section, "The macro: \"%s\"",
-   ref->info);
+   {
+ gcc_assert (ref->code == DW_MACRO_define_strp
+ || ref->code == DW_MACRO_undef_strp);
+ dw2_asm_output_offset (dwarf_offset_size, node->label,
+debug_str_section, "The macro: \"%s\"",
+ref->info);
+   }
   else
-dw2_asm_output_data_uleb128 (node->index, "The macro: \"%s\"",
- ref->info);
+   {
+ gcc_assert (ref->code == DW_MACRO_define_strx
+ || ref->code == DW_MACRO_undef_strx);
+ dw2_asm_output_data_uleb128 (node->index, "The macro: \"%s\"",
+  ref->info);
+   }
   break;
 case DW_MACRO_import:
   dw2_asm_output_data (1, ref->code, "Import");
diff --git a/gcc/testsuite/gcc.dg/pr115066.c b/gcc/testsuite/gcc.dg/pr115066.c
new file mode 100644
index 000..645757df209
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr115066.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-skip-if "split DWARF unsupported" { hppa*-*-hpux* powerpc*-ibm-aix* *-*-darwin* } } */
+/* { dg-options "-gsplit-dwarf -g3 -dA -gdwarf-4" } */
+/* { dg-final { scan-assembler-times {\.section\t"?\.debug_macro} 1 } } */
+/* { dg-final { scan-assembler-not {\.byte\t0x5\t# Define macro strp} } } */
+/* { dg-final { scan-assembler {\.byte\t0xb\t# Define macro strx} } } */
+
+#define foo 1

base-commit: 2d0eeb529d400e61197a09c56011be976dd81ef0
-- 
2.35.3



Re: [Patch] nvptx/mkoffload.cc: Add dummy proc for OpenMP rev-offload table [PR108098]

2023-04-04 Thread Tom de Vries via Gcc-patches

On 4/4/23 11:02, Thomas Schwinge wrote:

Hi!

Are we going to install such a work-around?



Hi,

LGTM.

Thanks,
- Tom



Grüße
  Thomas


On 2022-12-19T13:04:43+0100, I wrote:

Hi!

On 2022-12-16T17:19:00+0100, Tobias Burnus  wrote:

Seems to be a CUDA JIT issue


A Nvidia Driver JIT issue, more precisely.  ;-)


which is fixed by adding a dummy procedure.


Gah...  :-|


Lightly tested with 4 systems at hand, where 2 failed before.


I'm happy to confirm that indeed this does resolve the issue for all
configurations that I reported in 
"OpenMP/nvptx reverse offload execution test FAILs".


As I said on IRC, #gcc, 2022-12-16:


[...] we're unlikely to reverse-engineer the exact version/conditions
where this got fixed, so don't have a useful means for versioning the
workaround.  Fortunately, it doesn't "cost" anything really.  (In
contrast to some other GCC/nvptx back end workarounds, as I
understand.)



Grüße
  Thomas



One had 10.2 and
the other had some ancient CUDA where 'nvptx-smi' did not print a CUDA version
and required -mptx=3.1.
(I did check that offloading indeed happened and no host fallback was done.)

OK for mainline?

Tobias




nvptx/mkoffload.cc: Add dummy proc for OpenMP rev-offload table [PR108098]

Seemingly, the ptx JIT of CUDA <= 10.2 replaces function pointers in global
variables by NULL if a translation does not contain any executable code. It
works with CUDA 11.1.  The code of this commit is about reverse offload;
having NULL values disables the side of reverse offload during image load.

Solution is the same as found by Thomas for a related issue: Adding a dummy
procedure. Cf. the PR of this issue and Thomas' patch
"nvptx: Support global constructors/destructors via 'collect2'"
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607749.html

As that approach also works here:

Co-authored-by: Thomas Schwinge 

gcc/
  PR libgomp/108098

  * config/nvptx/mkoffload.cc (process): Emit dummy procedure
  alongside reverse-offload function table to prevent NULL values
  of the function addresses.

---
  gcc/config/nvptx/mkoffload.cc | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index 5d89ba8..8306aa0 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -357,6 +357,20 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
  fputc (sm_ver2[i], out);
fprintf (out, "\"\n\t\".file 1 \\\"\\\"\"\n");

+  /* WORKAROUND - see PR 108098
+ It seems as if older CUDA JIT compiler optimizes the function pointers
+ in offload_func_table to NULL, which can be prevented by adding a
+ dummy procedure. With CUDA 11.1, it seems to work fine without
+ workaround while CUDA 10.2 as some ancient version have need the
+ workaround. Assuming CUDA 11.0 fixes it, emitting it could be
+ restricted to 'if (sm_ver2[0] < 8 && version2[0] < 7)' as sm_80 and
+ PTX ISA 7.0 are new in CUDA 11.0; for 11.1 it would be sm_86 and
+ PTX ISA 7.1.  */
+  fprintf (out, "\n\t\".func __dummy$func ( );\"\n");
+  fprintf (out, "\t\".func __dummy$func ( )\"\n");
+  fprintf (out, "\t\"{\"\n");
+  fprintf (out, "\t\"}\"\n");
+
size_t fidx = 0;
for (id = func_ids; id; id = id->next)
  {

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955




Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx

2022-12-16 Thread Tom de Vries via Gcc-patches

On 9/21/22 09:45, Chung-Lin Tang wrote:

Hi Tom,
I had a patch submitted earlier, where I reported that the current way of
implementing barriers in libgomp on nvptx created a quite significant
performance drop on some SPEChpc2021 benchmarks:
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/600818.html

That previous patch wasn't accepted well (admittedly, it was kind of a hack).

So in this patch, I tried to (mostly) re-implement team-barriers for NVPTX.



Ack.

Basically, instead of trying to have the GPU do CPU-with-OS-like things that
it isn't suited for, barriers are implemented simplistically with bar.*
synchronization instructions.  Tasks are processed after threads have joined,
and only if team->task_count != 0.

(Arguably, there might be a little bit of performance forfeited where earlier
arriving threads could've been used to process tasks ahead of other threads.
But that again falls into requiring implementing complex futex-wait/wake like
behavior.  Really, that kind of tasking is not what target offloading is
usually used for.)



Please try to add this insight somewhere as a comment in the code, f.i. 
in the header comment of bar.h.



Implementation highlight notes:
1. gomp_team_barrier_wake() is now an empty function (threads never "wake" in
   the usual manner).
2. gomp_team_barrier_cancel() now uses the "exit" PTX instruction.
3. gomp_barrier_wait_last() is now implemented using "bar.arrive".
4. gomp_team_barrier_wait_end()/gomp_team_barrier_wait_cancel_end():
   The main synchronization is done using a 'bar.red' instruction.  This
   reduces across all threads the condition (team->task_count != 0), to
   enable the task processing down below if any thread created a task.
   (This bar.red usage required the second GCC patch in this series.)


This patch has been tested on x86_64/powerpc64le with nvptx offloading, using
the libgomp, ovo, omptests, and sollve_vv testsuites, all without regressions.
Also verified that the performance of SPEChpc 2021 521.miniswp_t and
534.hpgmgfv_t, which regressed in the GCC12 cycle, has been restored to
devel/omp/gcc-11 (OG11) branch levels.  Is this okay for trunk?



AFAIU the waiters and lock fields are no longer used, so they can be removed.

Yes, LGTM, please apply (after the other one).

Thanks for addressing this.

FWIW, tested on NVIDIA RTX A2000 with driver 525.60.11.

(also suggest backporting to GCC12 branch, if performance regression can 
be considered a defect)




That's ok, but wait a while after applying on trunk before doing that, 
say a month.


Thanks,
- Tom


Thanks,
Chung-Lin

libgomp/ChangeLog:

2022-09-21  Chung-Lin Tang  

 * config/nvptx/bar.c (generation_to_barrier): Remove.
 (futex_wait,futex_wake,do_spin,do_wait): Remove.
 (GOMP_WAIT_H): Remove.
 (#include "../linux/bar.c"): Remove.
 (gomp_barrier_wait_end): New function.
 (gomp_barrier_wait): Likewise.
 (gomp_barrier_wait_last): Likewise.
 (gomp_team_barrier_wait_end): Likewise.
 (gomp_team_barrier_wait): Likewise.
 (gomp_team_barrier_wait_final): Likewise.
 (gomp_team_barrier_wait_cancel_end): Likewise.
 (gomp_team_barrier_wait_cancel): Likewise.
 (gomp_team_barrier_cancel): Likewise.
 * config/nvptx/bar.h (gomp_team_barrier_wake): Remove
 prototype, add new static inline function.


Re: [PATCH, nvptx, 2/2] Reimplement libgomp barriers for nvptx: bar.red instruction support in GCC

2022-12-16 Thread Tom de Vries via Gcc-patches

On 9/21/22 09:45, Chung-Lin Tang wrote:

Hi Tom, following the first patch.

This new barrier implementation I posted in the first patch uses the
'bar.red' instruction.  Usually this could've been easily done with a single
line of inline assembly.  However I quickly realized that because the NVPTX
GCC port is implemented with all virtual general registers, we don't have a
register constraint usable to select "predicate registers".  Since bar.red
uses predicate typed values, I can't create it directly using inline asm.


So it appears that the most simple way of accessing it is with a target
builtin.  The attached patch adds bar.red instructions to the nvptx port, and
__builtin_nvptx_bar_red_* builtins to use it.  The code should support all
variations of bar.red (and, or, and popc operations).

(This support was used to implement the first libgomp barrier patch, so must
be approved together)




What I conclude from what you're telling me here is that this is the 
first patch in the series rather than the second.


So, LGTM, please apply it, unless it cannot be applied by itself without 
causing regressions, in which case you need to fix those first.


IWBN if this also included standalone test-cases in 
gcc/testsuite/gcc.target/nvptx, but I suppose we can live without for now.


Thanks,
- Tom


Thanks,
Chung-Lin

2022-09-21  Chung-Lin Tang  

gcc/ChangeLog:

 * config/nvptx/nvptx.cc (nvptx_print_operand): Add 'p'
 case, adjust comments.
 (enum nvptx_builtins): Add NVPTX_BUILTIN_BAR_RED_AND,
 NVPTX_BUILTIN_BAR_RED_OR, and NVPTX_BUILTIN_BAR_RED_POPC.
 (nvptx_expand_bar_red): New function.
 (nvptx_init_builtins):
 Add DEFs of __builtin_nvptx_bar_red_[and/or/popc].
 (nvptx_expand_builtin): Use nvptx_expand_bar_red to expand
 NVPTX_BUILTIN_BAR_RED_[AND/OR/POPC] cases.

 * config/nvptx/nvptx.md (define_c_enum "unspecv"): Add
 UNSPECV_BARRED_AND, UNSPECV_BARRED_OR, and UNSPECV_BARRED_POPC.
 (BARRED): New int iterator.
 (barred_op,barred_mode,barred_ptxtype): New int attrs.
 (nvptx_barred_): New define_insn.


Re: nvptx: In 'STARTFILE_SPEC', fix 'crt0.o' for '-mmainkernel' (was: [MentorEmbedded/nvptx-tools] Match standard 'ld' "search" behavior (PR #38))

2022-11-18 Thread Tom de Vries via Gcc-patches

On 11/19/22 00:25, Thomas Schwinge wrote:

Hi!

Re
:

On 2022-11-18T11:05:23-0800, I wrote:

Actually, in GCC/nvptx target testing, this #38's commit 
886a95faf66bf66a82fc0fe7d2a9fd9e9fec2820 "ld: Don't search for input files in 
'-L'directories" is generally causing linking to fail with:

```
error opening crt0.o
collect2: error: ld returned 1 exit status
compiler exited with status 1
```

I'm investigating.


OK to push the attached
GCC "nvptx: In 'STARTFILE_SPEC', fix 'crt0.o' for '-mmainkernel'" to all
active GCC branches?  (... instead of having to restore this "blunder"
(do "search for input files in '-L'directories") in nvptx-tools...)



Hi,

yes, LGTM.

Thanks,
- Tom



Grüße
  Thomas




[committed] Don't build readline/libreadline.a, when --with-system-readline is supplied

2022-10-21 Thread Tom de Vries via Gcc-patches
Hi,

[ Committed as obvious as per
https://gcc.gnu.org/legacy-ml/gcc-patches/2018-12/msg00299.html . ]

https://sourceware.org/bugzilla/show_bug.cgi?id=18632

The bundled libreadline is always built, even if the system is
./configure'd --with-system-readline and the built libreadline.a is not
used.

Proposed patch:

Fix ./configure.ac not to process readline/ when --with-system-readline
is provided

* configure.ac: Don't configure readline if --with-system-readline is
used.
* configure: Re-generate.


Committed to trunk.

Thanks,
- Tom

Don't build readline/libreadline.a, when --with-system-readline is supplied

---
 configure| 6 ++
 configure.ac | 6 ++
 2 files changed, 12 insertions(+)

diff --git a/configure b/configure
index d9aa84c6138..007a77a5f6c 100755
--- a/configure
+++ b/configure
@@ -2946,6 +2946,12 @@ if test x$with_system_zlib = xyes ; then
   noconfigdirs="$noconfigdirs zlib"
 fi
 
+# Don't compile the bundled readline/libreadline.a if --with-system-readline
+# is provided.
+if test x$with_system_readline = xyes ; then
+  noconfigdirs="$noconfigdirs readline"
+fi
+
 # some tools are so dependent upon X11 that if we're not building with X,
 # it's not even worth trying to configure, much less build, that tool.
 
diff --git a/configure.ac b/configure.ac
index 2cff32e300e..1df410bba1f 100644
--- a/configure.ac
+++ b/configure.ac
@@ -247,6 +247,12 @@ if test x$with_system_zlib = xyes ; then
   noconfigdirs="$noconfigdirs zlib"
 fi
 
+# Don't compile the bundled readline/libreadline.a if --with-system-readline
+# is provided.
+if test x$with_system_readline = xyes ; then
+  noconfigdirs="$noconfigdirs readline"
+fi
+
 # some tools are so dependent upon X11 that if we're not building with X, 
 # it's not even worth trying to configure, much less build, that tool.
 


Re: Restore default 'sorry' 'TARGET_ASM_CONSTRUCTOR', 'TARGET_ASM_DESTRUCTOR' (was: [PATCH 1/3] STABS: remove -gstabs and -gxcoff functionality)

2022-10-10 Thread Tom de Vries via Gcc-patches

On 10/10/22 16:19, Thomas Schwinge wrote:

With that, OK to push?


FWIW, nvptx change looks in the obvious category to me.

Thanks,
- Tom


[PATCH] Add --without-makeinfo

2022-10-04 Thread Tom de Vries via Gcc-patches
Hi,

Currently, we cannot build gdb without makeinfo installed.

It would be convenient to work around this by using the configure flag
MAKEINFO=/usr/bin/true or some such, but that doesn't work because top-level
configure requires a makeinfo of at least version 4.7, and that version check
fails for /usr/bin/true, so we end up with MAKEINFO=missing instead.

What does work is this:
...
$ ./configure
$ make MAKEINFO=/usr/bin/true
...
but the drawback is that it'll have to be specified for each make invocation.

Fix this by adding support for --without-makeinfo in top-level configure.

Tested by building gdb on x86_64-linux, and verifying that no .info files
were generated.
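
A note on the shell test itself, as a sketch under assumptions rather than a
proposed change: when neither --with-makeinfo nor --without-makeinfo is given,
with_makeinfo is presumably unset, so an unquoted 'test $with_makeinfo = "no"'
expands to 'test = no'.  The x-prefix idiom (as in
'test x$with_system_zlib = xyes' elsewhere in top-level configure) avoids
relying on how test handles that.  pick_makeinfo below is a hypothetical
helper, and the string 'makeinfo' stands in for the normal detection logic:

```sh
# pick_makeinfo VALUE: print the MAKEINFO setting chosen for a given
# value of with_makeinfo ("" models the option not being given).
pick_makeinfo () {
  with_makeinfo=$1
  # The x prefix keeps 'test' well-formed even when the variable is empty.
  if test "x$with_makeinfo" = xno; then
    echo true
  else
    echo makeinfo   # stand-in for the usual makeinfo detection
  fi
}
```

Here 'pick_makeinfo no' selects 'true', while an empty or 'yes' value falls
through to the usual detection.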

OK for trunk?

Thanks,
- Tom

Add --without-makeinfo

ChangeLog:

2022-09-05  Tom de Vries  

* configure.ac: Add --without-makeinfo.
* configure: Regenerate.

---
 configure| 4 
 configure.ac | 4 
 2 files changed, 8 insertions(+)

diff --git a/configure b/configure
index f14e0efd675..eb84add60cb 100755
--- a/configure
+++ b/configure
@@ -8399,6 +8399,9 @@ fi
 done
 test -n "$MAKEINFO" || MAKEINFO="$MISSING makeinfo"
 
+if test $with_makeinfo = "no"; then
+MAKEINFO=true
+else
 case " $build_configdirs " in
   *" texinfo "*) MAKEINFO='$$r/$(BUILD_SUBDIR)/texinfo/makeinfo/makeinfo' ;;
   *)
@@ -8414,6 +8417,7 @@ case " $build_configdirs " in
 ;;
 
 esac
+fi
 
 # FIXME: expect and dejagnu may become build tools?
 
diff --git a/configure.ac b/configure.ac
index 0152c69292e..e4a2c076674 100644
--- a/configure.ac
+++ b/configure.ac
@@ -3441,6 +3441,9 @@ case " $build_configdirs " in
 esac
 
 AC_CHECK_PROGS([MAKEINFO], makeinfo, [$MISSING makeinfo])
+if test $with_makeinfo = "no"; then
+MAKEINFO=true
+else
 case " $build_configdirs " in
   *" texinfo "*) MAKEINFO='$$r/$(BUILD_SUBDIR)/texinfo/makeinfo/makeinfo' ;;
   *)
@@ -3456,6 +3459,7 @@ changequote(,)
 ;;
 changequote([,])
 esac
+fi
 
 # FIXME: expect and dejagnu may become build tools?
 


Re: [PING^5] nvptx: Allow '--with-arch' to override the default '-misa' (was: nvptx multilib setup)

2022-09-18 Thread Tom de Vries via Gcc-patches

On 8/6/22 21:20, Thomas Schwinge wrote:

Hi Tom!



Hi Thomas,

thanks for doing this.

Series approved.

As I mentioned, I'm not completely happy with the multilib name, but I
don't think it makes sense to postpone approval for this.


Thanks,
- Tom


Ping.


Grüße
  Thomas


On 2022-07-27T17:48:58+0200, I wrote:

Hi Tom!

Ping.


Grüße
  Thomas


On 2022-07-20T14:46:03+0200, I wrote:

Hi Tom!

Ping.


Grüße
  Thomas


On 2022-07-13T10:42:44+0200, I wrote:

Hi Tom!

Ping.


Grüße
  Thomas


On 2022-07-05T16:59:23+0200, I wrote:

Hi Tom!

Ping.


Grüße
  Thomas


On 2022-06-15T23:18:10+0200, I wrote:

Hi Tom!

On 2022-05-13T16:20:14+0200, I wrote:

On 2022-02-04T13:09:29+0100, Tom de Vries via Gcc  wrote:

On 2/4/22 08:21, Thomas Schwinge wrote:

On 2022-02-03T13:35:55+, "vries at gcc dot gnu.org via Gcc-bugs" wrote:

I've tested this using (recommended) driver 470.94 on boards:



while iterating over dimensions { -mptx=3.1 , -mptx=6.3 } x { GOMP_NVPTX_JIT=-O0, 
 }.


Do you use separate (nvptx-none offload target only?) builds for
different '-mptx' variants (likewise: '-misa'), or have you hacked up the
multilib configuration?


Neither, I'm using --target_board=unix/foffload= for that.


ACK, I see.  So these flags then only affect GCC/nvptx code generation
for the actual user code (here: GCC libgomp test cases), but for the
GCC/nvptx target libraries (such as: libc, libm, libgfortran, libgomp --
the latter especially relevant for OpenMP), it uses PTX code from one of
the two "pre-compiled" GCC/nvptx multilibs: default or '-mptx=3.1'.

Meaning, one can't just use such a flag for "completely building code"
for a specific configuration.  Random example,
'-foffload-options=nvptx-none=-march=sm_75': as GCC/nvptx target
libraries aren't being built for '-march=sm_75' multilib,
'-foffload-options=nvptx-none=-march=sm_75' uses the default multilib,
which isn't '-march=sm_75'.



   ('gcc/config/nvptx/t-nvptx:MULTILIB_OPTIONS'

etc., I suppose?)  Should we add a few representative configurations to
be built by default?  And/or, should we have a way to 'configure' per
user needs (I suppose: '--with-multilib-list=[...]', as supported for a
few other targets?)?  (I see there's also a new
'--with-multilib-generator=[...]', haven't looked in detail.)  No matter
which way: again, combinatorial explosion is a problem, of course...


As far as I know, the gcc build doesn't finish when switching default to
higher than sm_35, so there's little point to go to a multilib setup at
this point.  But once we fix that, we could reconsider, otherwise,
things are likely to regress again.


As far as I remember, several issues have been fixed.  Still waiting for
Roger's "middle-end: Support ABIs that pass FP values as wider integers"
or something similar, but that PR104489 issue is being worked around by
"Limit HFmode support to mexperimental", if I got that right.

Now I'm not suggesting we should now enable all or any random GCC/nvptx
multilibs, to get all these variants of GCC/nvptx target libraries built;
especially also given that GCC/nvptx code generation currently doesn't
make too much use of the new capabilities.

However, we do have a specific request that a customer would like to be
able to change at GCC 'configure' time the GCC/nvptx default multilib
(including that being used for building corresponding GCC/nvptx target
libraries).

Per 'gcc/doc/install.texi', I do see that some GCC targets allow for
GCC 'configure'-time '--with-multilib-list=[...]', or
'--with-multilib-generator=[...]', and I suppose we could be doing
something similar?  But before starting implementing, I'd like your
input, as you'll be the one to approve in the end.  And/or, maybe you've
already made up your own ideas about that?


So, instead of "random GCC/nvptx multilib configuration" (last
paragraph), I've come up with a way to implement our customer's request
(second last paragraph): 'configure' GCC/nvptx '--with-arch=sm_70'.

I think I've implemented this in a way so that "random GCC/nvptx multilib
configuration" may eventually be implemented on top of that.  For easy
review/testing I've split my changes into three commits, see attached
"nvptx: Make default '-misa=sm_30' explicit",
"nvptx: Introduce dummy multilib option for default '-misa=sm_30'",
"nvptx: Allow '--with-arch' to override the default '-misa'".

To the best of my knowledge, the first two patches do not change any
user-visible behavior (I generally 'diff'ed target libraries, and
compared a good number of 'gcc -print-multi-directory [flags]'), and
likewise with the third patch, given implicit (default) or explicit
'--with-arch=sm_30', and that with '--with-arch=sm_70', for example, the
'-misa=sm_70' multilib variants are used for implicit (default) or
explicit '-misa=sm_70' or higher, and the '-misa=sm_30' multilib variants
are used for explicit lower '-misa'.
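
The selection rule described in the paragraph above can be modelled roughly
as follows.  This is only a sketch of the stated behavior, not how GCC's
multilib machinery implements it; pick_multilib and sm_num are hypothetical
names, and it assumes exactly two multilib variants (the configured default
and sm_30):

```sh
# sm_num sm_NN: print the numeric part of an sm_NN name.
sm_num () {
  echo "${1#sm_}"
}

# pick_multilib WITH_ARCH [MISA]: print the multilib variant that would
# be used for a compilation, per the rule described in the text.
pick_multilib () {
  with_arch=$1      # configure-time default, e.g. sm_70
  misa=${2:-$1}     # explicit -misa, or the default if none is given
  if [ "$(sm_num "$misa")" -ge "$(sm_num "$with_arch")" ]; then
    echo "$with_arch"
  else
    echo sm_30
  fi
}
```

For example, 'pick_multilib sm_70 sm_75' and 'pick_multilib sm_70' both
select the sm_70 variant, while 'pick_multilib sm_70 sm_35' falls back to
sm_30.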

What do you

Re: [committed][nvptx] Add uniform_warp_check insn

2022-09-14 Thread Tom de Vries via Gcc-patches

On 9/14/22 11:41, Thomas Schwinge wrote:

Hi Tom!

On 2022-02-01T19:31:27+0100, Tom de Vries via Gcc-patches wrote:

Hi,

On a GT 1030, with driver version 470.94 and -mptx=3.1 I run into:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \
   -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \
   -O2 execution test
...


This relates to PR99932 "OpenACC/nvptx offloading execution regressions
starting with CUDA 11.2-era Nvidia Driver 460.27.04".  You've fixed that
for GCC 12+, but testing nvptx offloading for GCC 11, GCC 10 branches on
a system with current Nvidia Driver, we're still running into the several
FAILs and annoying 'WARNING: program timed out' reported in PR99932.

GCC 11, GCC 10 only build '.version 3.1' target libraries, and with
"[nvptx] Add bar.warp.sync" in place (which this patch here thus only
textually depends on), we've got good test results again.

OK to push to GCC 11, GCC 10 branches the attached
"[nvptx] Add uniform_warp_check insn"?




LGTM.

Thanks,
- Tom


Re: [committed][nvptx] Add bar.warp.sync

2022-09-14 Thread Tom de Vries via Gcc-patches

On 9/14/22 11:41, Thomas Schwinge wrote:

Hi Tom!

On 2022-02-01T19:31:13+0100, Tom de Vries via Gcc-patches wrote:

On a GT 1030 (sm_61), with driver version 470.94 I run into:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \
   -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \
   -O2 execution test
...


This relates to PR99932 "OpenACC/nvptx offloading execution regressions
starting with CUDA 11.2-era Nvidia Driver 460.27.04".  You've fixed that
for GCC 12+, but testing nvptx offloading for GCC 11, GCC 10 branches on
a system with current Nvidia Driver, we're still running into the several
FAILs and annoying 'WARNING: program timed out' reported in PR99932.

Given that GCC 11, GCC 10 only build '.version 3.1' target libraries, the
patch below is actually a no-op ('!TARGET_PTX_6_0'; thus the attached
"nvptx: Define (dummy) 'TARGET_PTX_6_0'"), but having it makes it easier
to cherry-pick the actual relevant "[nvptx] Add uniform_warp_check insn".

OK to push to GCC 11, GCC 10 branches the attached
"nvptx: Define (dummy) 'TARGET_PTX_6_0'", "[nvptx] Add bar.warp.sync"?



LGTM, and thanks for taking care of this.

Thanks,
- Tom



[PING^2][PATCH][gdb/build] Fix build breaker with --enabled-shared

2022-09-06 Thread Tom de Vries via Gcc-patches

On 7/12/22 15:42, Tom de Vries wrote:

[ dropped gdb-patches, since already applied there. ]

On 6/27/22 15:38, Tom de Vries wrote:

On 6/27/22 15:03, Tom de Vries wrote:

Hi,

When building gdb with --enable-shared, I run into:
...
ld: build/zlib/libz.a(libz_a-inffast.o): relocation R_X86_64_32S against \
  `.rodata' can not be used when making a shared object; recompile with -fPIC
ld: build/zlib/libz.a(libz_a-inflate.o): warning: relocation against \
  `inflateResetKeep' in read-only section `.text'
collect2: error: ld returned 1 exit status
make[3]: *** [libbfd.la] Error 1
...

This is a regression since commit a08bdb159bb ("[gdb/build] Fix gdbserver
build with -fsanitize=thread").

The problem is that a single case statement in configure is shared to handle
special requirements for both the host libiberty and host zlib, which has the
effect that only one is handled.

Fix this by handling libiberty and zlib each in its own case statement.
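
The failure mode can be reproduced with a reduced example.  This is a
simplified sketch: the function names are hypothetical, the pattern lists are
shortened, and the extra condition that the real zlib arm checks is left out.
A case statement runs only the first arm whose pattern matches, so with both
gdbserver and bfd in configdirs the shared statement never reaches the bfd
arm:

```sh
# One case statement for both libraries: at most one arm is executed.
flags_shared_case () {
  configdirs=$1
  libiberty_flags=
  zlib_flags=
  case " $configdirs " in
    *" gdbserver "*) libiberty_flags=--enable-shared ;;
    *" bfd "*)       zlib_flags=--enable-shared ;;
  esac
  echo "libiberty=$libiberty_flags zlib=$zlib_flags"
}

# One case statement per library: each requirement is checked on its own.
flags_separate_cases () {
  configdirs=$1
  libiberty_flags=
  zlib_flags=
  case " $configdirs " in
    *" gdbserver "*) libiberty_flags=--enable-shared ;;
  esac
  case " $configdirs " in
    *" bfd "*) zlib_flags=--enable-shared ;;
  esac
  echo "libiberty=$libiberty_flags zlib=$zlib_flags"
}
```

Calling both with "bfd gdbserver" shows the difference: the shared variant
leaves the zlib flags empty, the separate variant sets both.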

Built on x86_64-linux, with and without --enable-shared.

OK for gcc trunk?





Ping^2.

Thanks,
- Tom


To fix the buildbot breakage, I already pushed to the gdb repo.

Thanks,
- Tom



[gdb/build] Fix build breaker with --enabled-shared

ChangeLog:

2022-06-27  Tom de Vries  

* configure.ac: Set extra_host_libiberty_configure_flags and
extra_host_zlib_configure_flags in separate case statements.
* configure: Regenerate.

---
  configure    | 8 ++--
  configure.ac | 8 ++--
  2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/configure b/configure
index aac80b88d70..be433ef6d5d 100755
--- a/configure
+++ b/configure
@@ -6962,13 +6962,18 @@ fi
  # Sometimes we have special requirements for the host libiberty.
  extra_host_libiberty_configure_flags=
-extra_host_zlib_configure_flags=
  case " $configdirs " in
    *" lto-plugin "* | *" libcc1 "* | *" gdbserver "*)
  # When these are to be built as shared libraries, the same applies to
  # libiberty.
  extra_host_libiberty_configure_flags=--enable-shared
  ;;
+esac
+
+
+# Sometimes we have special requirements for the host zlib.
+extra_host_zlib_configure_flags=
+case " $configdirs " in
    *" bfd "*)
  # When bfd is to be built as a shared library, the same applies to
  # zlib.
@@ -6979,7 +6984,6 @@ case " $configdirs " in
  esac
-
  # Produce a warning message for the subdirs we can't configure.
  # This isn't especially interesting in the Cygnus tree, but in the individual
  # FSF releases, it's important to let people know when their machine isn't

diff --git a/configure.ac b/configure.ac
index 29f74d10b5a..1651cbf3b02 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2342,13 +2342,18 @@ fi
  # Sometimes we have special requirements for the host libiberty.
  extra_host_libiberty_configure_flags=
-extra_host_zlib_configure_flags=
  case " $configdirs " in
    *" lto-plugin "* | *" libcc1 "* | *" gdbserver "*)
  # When these are to be built as shared libraries, the same applies to
  # libiberty.
  extra_host_libiberty_configure_flags=--enable-shared
  ;;
+esac
+AC_SUBST(extra_host_libiberty_configure_flags)
+
+# Sometimes we have special requirements for the host zlib.
+extra_host_zlib_configure_flags=
+case " $configdirs " in
    *" bfd "*)
  # When bfd is to be built as a shared library, the same applies to
  # zlib.
@@ -2357,7 +2362,6 @@ case " $configdirs " in
  fi
  ;;
  esac
-AC_SUBST(extra_host_libiberty_configure_flags)
  AC_SUBST(extra_host_zlib_configure_flags)
  # Produce a warning message for the subdirs we can't configure.


Re: [PATCH] nvptx: Silence unused variable warning

2022-09-06 Thread Tom de Vries via Gcc-patches

On 8/28/22 13:09, Jan-Benedict Glaw wrote:

Hi!

The nvptx backend defines ASM_OUTPUT_DEF along with
ASM_OUTPUT_DEF_FROM_DECLS.  Much like the rs6000 coff target, nvptx
triggers an unused variable warning:

/usr/lib/gcc-snapshot/bin/g++  -fno-PIE -c   -g -O2   -DIN_GCC  
-DCROSS_DIRECTORY_STRUCTURE   -fno-exceptions -fno-rtti 
-fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic 
-Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common 
 -DHAVE_CONFIG_H -I. -I. -I../../gcc/gcc -I../../gcc/gcc/. 
-I../../gcc/gcc/../include -I../../gcc/gcc/../libcpp/include 
-I../../gcc/gcc/../libcody  -I../../gcc/gcc/../libdecnumber 
-I../../gcc/gcc/../libdecnumber/dpd -I../libdecnumber 
-I../../gcc/gcc/../libbacktrace   -o varasm.o -MT varasm.o -MMD -MP -MF 
./.deps/varasm.TPo ../../gcc/gcc/varasm.cc
../../gcc/gcc/varasm.cc: In function 'void 
output_constant_pool_contents(rtx_constant_pool*)':
../../gcc/gcc/varasm.cc:4318:21: error: unused variable 'name' 
[-Werror=unused-variable]
  4318 | const char *name = XSTR (desc->sym, 0);
   | ^~~~
cc1plus: all warnings being treated as errors
make[1]: *** [Makefile:1145: varasm.o] Error 1


Fixed the same way:

diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index ed72c253191..71297440566 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -321,6 +321,9 @@ struct GTY(()) machine_function
  #define ASM_OUTPUT_DEF(FILE,LABEL1,LABEL2)\
do  \
  { \
+  (void) (FILE);   \
+  (void) (LABEL1); \
+  (void) (LABEL2); \
gcc_unreachable (); \
  } \
while (0)
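
A minimal sketch of why the casts help, assuming a macro that ignores its operands. SKETCH_OUTPUT_DEF and sketch_emit are hypothetical names, not GCC code, and the sketch leaves out the gcc_unreachable () call so that it can run:

```c
#include <stdio.h>

/* Hypothetical stand-in for a target macro that ignores its operands.
   The (void) casts count as uses, so a caller's local variable that is
   only passed to the macro no longer triggers -Wunused-variable.  */
#define SKETCH_OUTPUT_DEF(STREAM, LABEL1, LABEL2) \
  do                                              \
    {                                             \
      (void) (STREAM);                            \
      (void) (LABEL1);                            \
      (void) (LABEL2);                            \
    }                                             \
  while (0)

/* Hypothetical caller, assumed to be shaped like the varasm.cc code in
   the warning: 'name' exists only to be handed to the macro.  */
int
sketch_emit (FILE *stream, const char *sym)
{
  const char *name = sym;
  SKETCH_OUTPUT_DEF (stream, name, "alias");
  return 0;
}
```

Dropping the three casts from the sketch should bring back the unused-variable diagnostic for 'name' when compiling with -Wall.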


Ok for HEAD?



LGTM.

Thanks,
- Tom


Re: [PING] nvptx: forward '-v' command-line option to assembler, linker

2022-09-05 Thread Tom de Vries via Gcc-patches




On 6/7/22 17:41, Thomas Schwinge wrote:

Subject: [PING] nvptx: forward '-v' command-line option to assembler, linker
From: Thomas Schwinge
Date: 6/7/22, 17:41
To: Tobias Burnus, "Tom de Vries"



Hi!

On 2022-05-30T09:06:21+0200, Tobias Burnus  wrote:

On 29.05.22 22:49, Thomas Schwinge wrote:

Not sure if that's what you had in mind, but what do you think about the
attached "nvptx: forward '-v' command-line option to assembler, linker"?
OK to push to GCC master branch (after merging
<https://github.com/MentorEmbedded/nvptx-tools/pull/37>
"Put '-v' verbose output onto stderr instead of stdout")?

I was mainly thinking of some way to have it available — which
'-foffload-options=-Wa,-v' already permits on the GCC side. (Once the
nvptx-tools patch actually makes use of the '-v'.)

(Merged a week ago.)


If I understand your patch correctly, this patch now causes 'gcc -v' to
imply 'gcc -v -Wa,-v'. I think that's okay, since 'gcc -v' already
outputs a lot of lines and those lines can be helpful to understand what
happens and what not.

ACK.


Tom, your thoughts on this?

Ping.


Grüße
  Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


0001-nvptx-forward-v-command-line-option-to-assembler-lin.patch

 From 17c35607d4927299b0c4bd19dd6fd205c85c4a4b Mon Sep 17 00:00:00 2001
From: Thomas Schwinge
Date: Sun, 29 May 2022 22:31:43 +0200
Subject: [PATCH] nvptx: forward '-v' command-line option to assembler, linker

For example, for offloading compilation with '-save-temps -v', before vs. after
word-diff then looks like:

 [...]
  [...]/build-gcc-offload-nvptx-none/gcc/as {+-v -v+} -o 
./a.xnvptx-none.mkoffload.o ./a.xnvptx-none.mkoffload.s
 {+Verifying sm_30 code with sm_35 code generation.+}
 {+ ptxas -c -o /dev/null ./a.xnvptx-none.mkoffload.o --gpu-name sm_35 -O0+}
 [...]
  [...]/build-gcc-offload-nvptx-none/gcc/collect2 {+-v -v+} -o 
./a.xnvptx-none.mkoffload [...] @./a.xnvptx-none.mkoffload.args.1 -lgomp -lgcc 
-lc -lgcc
 {+collect2 version 12.0.1 20220428 (experimental)+}
 {+[...]/build-gcc-offload-nvptx-none/gcc/collect-ld -v -v -o 
./a.xnvptx-none.mkoffload [...] ./a.xnvptx-none.mkoffload.o -lgomp -lgcc -lc 
-lgcc+}
 {+Linking ./a.xnvptx-none.mkoffload.o as 0+}
 {+trying lib libc.a+}
 {+trying lib libgcc.a+}
 {+trying lib libgomp.a+}
 {+Resolving abort+}
 {+Resolving acc_on_device+}
 {+Linking libgomp.a::oacc-init.o/ as 1+}
 {+Linking libc.a::lib_a-abort.o/   as 2+}
 [...]

(This depends on <https://github.com/MentorEmbedded/nvptx-tools/pull/37>
"Put '-v' verbose output onto stderr instead of stdout".)



Ack, I see that has been merged.

The ASM_SPEC part LGTM.

The results of the LINK_SPEC part looked very verbose to me at first glance, 
given that it prints info that with gnu ld we'd only see with 
-Wl,-trace.  But I suppose that's more a question of what we print 
with nvptx-none-ld -v.


Still, I wonder, normally we don't pass -v to ld, and need -Wl,-v for 
that.  So, any particular reason why we would do things differently for 
nvptx?


Thanks,
- Tom


gcc/
* config/nvptx/nvptx.h (ASM_SPEC, LINK_SPEC): Define.
---
  gcc/config/nvptx/nvptx.h | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index ed72c253191..b184f1d0150 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -27,6 +27,13 @@
  
  /* Run-time Target.  */
  
+/* Assembler supports '-v' option; handle similar to
+   '../../gcc.cc:asm_options', 'HAVE_GNU_AS'.  */
+#define ASM_SPEC "%{v}"
+
+/* Linker supports '-v' option.  */
+#define LINK_SPEC "%{v}"
+
  #define STARTFILE_SPEC "%{mmainkernel:crt0.o}"
  
  #define TARGET_CPU_CPP_BUILTINS() nvptx_cpu_cpp_builtins ()

-- 2.25.1




Re: [PING][PATCH][gdb/build] Fix build breaker with --enabled-shared

2022-07-12 Thread Tom de Vries via Gcc-patches

On 7/12/22 15:59, Iain Sandoe wrote:

Hi Tom


On 12 Jul 2022, at 14:42, Tom de Vries via Gcc-patches wrote:

[ dropped gdb-patches, since already applied there. ]

On 6/27/22 15:38, Tom de Vries wrote:

On 6/27/22 15:03, Tom de Vries wrote:

Hi,

When building gdb with --enable-shared, I run into:
...
ld: build/zlib/libz.a(libz_a-inffast.o): relocation R_X86_64_32S against \
`.rodata' can not be used when making a shared object; recompile with -fPIC
ld: build/zlib/libz.a(libz_a-inflate.o): warning: relocation against \
`inflateResetKeep' in read-only section `.text'
collect2: error: ld returned 1 exit status
make[3]: *** [libbfd.la] Error 1
...

This is a regression since commit a08bdb159bb ("[gdb/build] Fix gdbserver
build with -fsanitize=thread").

The problem is that a single case statement in configure is shared to handle
special requirements for both the host libiberty and host zlib, which has the
effect that only one is handled.

Fix this by handling libiberty and zlib each in its own case statement.

Build on x86_64-linux, with and without --enable-shared.

OK for gcc trunk?



Ping.


see also
https://gcc.gnu.org/pipermail/gcc-patches/2022-June/597263.html
which is approved, but I haven't pushed it yet.



Ack.


Do you not see any issues with GMP et al. (or are they not used)?


Well, it's used, but I'm not building it in-tree:
...
$ ldd ./gdb | grep gmp
libgmp.so.10 => /usr/lib64/libgmp.so.10 (0x7f7008706000)
...

Thanks,
- Tom


Iain



Thanks,
- Tom


To fix the buildbot breakage, I already pushed to the gdb repo.
Thanks,
- Tom


[gdb/build] Fix build breaker with --enabled-shared

ChangeLog:

2022-06-27  Tom de Vries  

 * configure.ac: Set extra_host_libiberty_configure_flags and
 extra_host_zlib_configure_flags in separate case statements.
 * configure: Regenerate.

---
   configure| 8 ++--
   configure.ac | 8 ++--
   2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/configure b/configure
index aac80b88d70..be433ef6d5d 100755
--- a/configure
+++ b/configure
@@ -6962,13 +6962,18 @@ fi
   # Sometimes we have special requirements for the host libiberty.
   extra_host_libiberty_configure_flags=
-extra_host_zlib_configure_flags=
   case " $configdirs " in
 *" lto-plugin "* | *" libcc1 "* | *" gdbserver "*)
   # When these are to be built as shared libraries, the same applies to
   # libiberty.
   extra_host_libiberty_configure_flags=--enable-shared
   ;;
+esac
+
+
+# Sometimes we have special requirements for the host zlib.
+extra_host_zlib_configure_flags=
+case " $configdirs " in
 *" bfd "*)
   # When bfd is to be built as a shared library, the same applies to
   # zlib.
@@ -6979,7 +6984,6 @@ case " $configdirs " in
   esac
-
   # Produce a warning message for the subdirs we can't configure.
   # This isn't especially interesting in the Cygnus tree, but in the individual
   # FSF releases, it's important to let people know when their machine isn't
diff --git a/configure.ac b/configure.ac
index 29f74d10b5a..1651cbf3b02 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2342,13 +2342,18 @@ fi
   # Sometimes we have special requirements for the host libiberty.
   extra_host_libiberty_configure_flags=
-extra_host_zlib_configure_flags=
   case " $configdirs " in
 *" lto-plugin "* | *" libcc1 "* | *" gdbserver "*)
   # When these are to be built as shared libraries, the same applies to
   # libiberty.
   extra_host_libiberty_configure_flags=--enable-shared
   ;;
+esac
+AC_SUBST(extra_host_libiberty_configure_flags)
+
+# Sometimes we have special requirements for the host zlib.
+extra_host_zlib_configure_flags=
+case " $configdirs " in
 *" bfd "*)
   # When bfd is to be built as a shared library, the same applies to
   # zlib.
@@ -2357,7 +2362,6 @@ case " $configdirs " in
   fi
   ;;
   esac
-AC_SUBST(extra_host_libiberty_configure_flags)
   AC_SUBST(extra_host_zlib_configure_flags)
   # Produce a warning message for the subdirs we can't configure.




[PING][PATCH][gdb/build] Fix build breaker with --enabled-shared

2022-07-12 Thread Tom de Vries via Gcc-patches

[ dropped gdb-patches, since already applied there. ]

On 6/27/22 15:38, Tom de Vries wrote:

On 6/27/22 15:03, Tom de Vries wrote:

Hi,

When building gdb with --enable-shared, I run into:
...
ld: build/zlib/libz.a(libz_a-inffast.o): relocation R_X86_64_32S against \
   `.rodata' can not be used when making a shared object; recompile with -fPIC
ld: build/zlib/libz.a(libz_a-inflate.o): warning: relocation against \
   `inflateResetKeep' in read-only section `.text'
collect2: error: ld returned 1 exit status
make[3]: *** [libbfd.la] Error 1
...

This is a regression since commit a08bdb159bb ("[gdb/build] Fix gdbserver
build with -fsanitize=thread").

The problem is that a single case statement in configure is shared to handle
special requirements for both the host libiberty and host zlib, which has the
effect that only one is handled.

Fix this by handling libiberty and zlib each in its own case statement.

Build on x86_64-linux, with and without --enable-shared.

OK for gcc trunk?



Ping.

Thanks,
- Tom


To fix the buildbot breakage, I already pushed to the gdb repo.

Thanks,
- Tom



[gdb/build] Fix build breaker with --enabled-shared

ChangeLog:

2022-06-27  Tom de Vries  

* configure.ac: Set extra_host_libiberty_configure_flags and
extra_host_zlib_configure_flags in separate case statements.
* configure: Regenerate.

---
  configure    | 8 ++--
  configure.ac | 8 ++--
  2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/configure b/configure
index aac80b88d70..be433ef6d5d 100755
--- a/configure
+++ b/configure
@@ -6962,13 +6962,18 @@ fi
  # Sometimes we have special requirements for the host libiberty.
  extra_host_libiberty_configure_flags=
-extra_host_zlib_configure_flags=
  case " $configdirs " in
    *" lto-plugin "* | *" libcc1 "* | *" gdbserver "*)
  # When these are to be built as shared libraries, the same applies to
  # libiberty.
  extra_host_libiberty_configure_flags=--enable-shared
  ;;
+esac
+
+
+# Sometimes we have special requirements for the host zlib.
+extra_host_zlib_configure_flags=
+case " $configdirs " in
    *" bfd "*)
  # When bfd is to be built as a shared library, the same applies to
  # zlib.
@@ -6979,7 +6984,6 @@ case " $configdirs " in
  esac
-
  # Produce a warning message for the subdirs we can't configure.
  # This isn't especially interesting in the Cygnus tree, but in the individual
  # FSF releases, it's important to let people know when their machine isn't
diff --git a/configure.ac b/configure.ac
index 29f74d10b5a..1651cbf3b02 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2342,13 +2342,18 @@ fi
  # Sometimes we have special requirements for the host libiberty.
  extra_host_libiberty_configure_flags=
-extra_host_zlib_configure_flags=
  case " $configdirs " in
    *" lto-plugin "* | *" libcc1 "* | *" gdbserver "*)
  # When these are to be built as shared libraries, the same applies to
  # libiberty.
  extra_host_libiberty_configure_flags=--enable-shared
  ;;
+esac
+AC_SUBST(extra_host_libiberty_configure_flags)
+
+# Sometimes we have special requirements for the host zlib.
+extra_host_zlib_configure_flags=
+case " $configdirs " in
    *" bfd "*)
  # When bfd is to be built as a shared library, the same applies to
  # zlib.
@@ -2357,7 +2362,6 @@ case " $configdirs " in
  fi
  ;;
  esac
-AC_SUBST(extra_host_libiberty_configure_flags)
  AC_SUBST(extra_host_zlib_configure_flags)
  # Produce a warning message for the subdirs we can't configure.


Re: [PATCH][gdb/build] Fix build breaker with --enabled-shared

2022-06-27 Thread Tom de Vries via Gcc-patches

On 6/27/22 15:03, Tom de Vries wrote:

Hi,

When building gdb with --enable-shared, I run into:
...
ld: build/zlib/libz.a(libz_a-inffast.o): relocation R_X86_64_32S against \
   `.rodata' can not be used when making a shared object; recompile with -fPIC
ld: build/zlib/libz.a(libz_a-inflate.o): warning: relocation against \
   `inflateResetKeep' in read-only section `.text'
collect2: error: ld returned 1 exit status
make[3]: *** [libbfd.la] Error 1
...

This is a regression since commit a08bdb159bb ("[gdb/build] Fix gdbserver
build with -fsanitize=thread").

The problem is that a single case statement in configure is shared to handle
special requirements for both the host libiberty and host zlib, which has the
effect that only one is handled.

Fix this by handling libiberty and zlib each in its own case statement.

Build on x86_64-linux, with and without --enable-shared.

OK for gcc trunk?



To fix the buildbot breakage, I already pushed to the gdb repo.

Thanks,
- Tom



[gdb/build] Fix build breaker with --enabled-shared

ChangeLog:

2022-06-27  Tom de Vries  

* configure.ac: Set extra_host_libiberty_configure_flags and
extra_host_zlib_configure_flags in separate case statements.
* configure: Regenerate.

---
  configure| 8 ++--
  configure.ac | 8 ++--
  2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/configure b/configure
index aac80b88d70..be433ef6d5d 100755
--- a/configure
+++ b/configure
@@ -6962,13 +6962,18 @@ fi
  
  # Sometimes we have special requirements for the host libiberty.
  extra_host_libiberty_configure_flags=
-extra_host_zlib_configure_flags=
  case " $configdirs " in
*" lto-plugin "* | *" libcc1 "* | *" gdbserver "*)
  # When these are to be built as shared libraries, the same applies to
  # libiberty.
  extra_host_libiberty_configure_flags=--enable-shared
  ;;
+esac
+
+
+# Sometimes we have special requirements for the host zlib.
+extra_host_zlib_configure_flags=
+case " $configdirs " in
*" bfd "*)
  # When bfd is to be built as a shared library, the same applies to
  # zlib.
@@ -6979,7 +6984,6 @@ case " $configdirs " in
  esac
  
  
-
  # Produce a warning message for the subdirs we can't configure.
  # This isn't especially interesting in the Cygnus tree, but in the individual
  # FSF releases, it's important to let people know when their machine isn't
diff --git a/configure.ac b/configure.ac
index 29f74d10b5a..1651cbf3b02 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2342,13 +2342,18 @@ fi
  
  # Sometimes we have special requirements for the host libiberty.
  extra_host_libiberty_configure_flags=
-extra_host_zlib_configure_flags=
  case " $configdirs " in
*" lto-plugin "* | *" libcc1 "* | *" gdbserver "*)
  # When these are to be built as shared libraries, the same applies to
  # libiberty.
  extra_host_libiberty_configure_flags=--enable-shared
  ;;
+esac
+AC_SUBST(extra_host_libiberty_configure_flags)
+
+# Sometimes we have special requirements for the host zlib.
+extra_host_zlib_configure_flags=
+case " $configdirs " in
*" bfd "*)
  # When bfd is to be built as a shared library, the same applies to
  # zlib.
@@ -2357,7 +2362,6 @@ case " $configdirs " in
  fi
  ;;
  esac
-AC_SUBST(extra_host_libiberty_configure_flags)
  AC_SUBST(extra_host_zlib_configure_flags)
  
  # Produce a warning message for the subdirs we can't configure.


[PATCH][gdb/build] Fix build breaker with --enabled-shared

2022-06-27 Thread Tom de Vries via Gcc-patches
Hi,

When building gdb with --enable-shared, I run into:
...
ld: build/zlib/libz.a(libz_a-inffast.o): relocation R_X86_64_32S against \
  `.rodata' can not be used when making a shared object; recompile with -fPIC
ld: build/zlib/libz.a(libz_a-inflate.o): warning: relocation against \
  `inflateResetKeep' in read-only section `.text'
collect2: error: ld returned 1 exit status
make[3]: *** [libbfd.la] Error 1
...

This is a regression since commit a08bdb159bb ("[gdb/build] Fix gdbserver
build with -fsanitize=thread").

The problem is that a single case statement in configure is shared to handle
special requirements for both the host libiberty and host zlib, which has the
effect that only one is handled.

Fix this by handling libiberty and zlib each in its own case statement.

Build on x86_64-linux, with and without --enable-shared.

OK for gcc trunk?

Thanks,
- Tom

[gdb/build] Fix build breaker with --enabled-shared

ChangeLog:

2022-06-27  Tom de Vries  

* configure.ac: Set extra_host_libiberty_configure_flags and
extra_host_zlib_configure_flags in separate case statements.
* configure: Regenerate.

---
 configure| 8 ++--
 configure.ac | 8 ++--
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/configure b/configure
index aac80b88d70..be433ef6d5d 100755
--- a/configure
+++ b/configure
@@ -6962,13 +6962,18 @@ fi
 
 # Sometimes we have special requirements for the host libiberty.
 extra_host_libiberty_configure_flags=
-extra_host_zlib_configure_flags=
 case " $configdirs " in
   *" lto-plugin "* | *" libcc1 "* | *" gdbserver "*)
 # When these are to be built as shared libraries, the same applies to
 # libiberty.
 extra_host_libiberty_configure_flags=--enable-shared
 ;;
+esac
+
+
+# Sometimes we have special requirements for the host zlib.
+extra_host_zlib_configure_flags=
+case " $configdirs " in
   *" bfd "*)
 # When bfd is to be built as a shared library, the same applies to
 # zlib.
@@ -6979,7 +6984,6 @@ case " $configdirs " in
 esac
 
 
-
 # Produce a warning message for the subdirs we can't configure.
 # This isn't especially interesting in the Cygnus tree, but in the individual
 # FSF releases, it's important to let people know when their machine isn't
diff --git a/configure.ac b/configure.ac
index 29f74d10b5a..1651cbf3b02 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2342,13 +2342,18 @@ fi
 
 # Sometimes we have special requirements for the host libiberty.
 extra_host_libiberty_configure_flags=
-extra_host_zlib_configure_flags=
 case " $configdirs " in
   *" lto-plugin "* | *" libcc1 "* | *" gdbserver "*)
 # When these are to be built as shared libraries, the same applies to
 # libiberty.
 extra_host_libiberty_configure_flags=--enable-shared
 ;;
+esac
+AC_SUBST(extra_host_libiberty_configure_flags)
+
+# Sometimes we have special requirements for the host zlib.
+extra_host_zlib_configure_flags=
+case " $configdirs " in
   *" bfd "*)
 # When bfd is to be built as a shared library, the same applies to
 # zlib.
@@ -2357,7 +2362,6 @@ case " $configdirs " in
 fi
 ;;
 esac
-AC_SUBST(extra_host_libiberty_configure_flags)
 AC_SUBST(extra_host_zlib_configure_flags)
 
 # Produce a warning message for the subdirs we can't configure.


[PATCH][gdb/build] Fix gdbserver build with -fsanitize=thread

2022-06-25 Thread Tom de Vries via Gcc-patches
Hi,

When building gdbserver with -fsanitize=thread (added to CFLAGS/CXXFLAGS) we
run into:
...
ld: ../libiberty/libiberty.a(safe-ctype.o): warning: relocation against \
  `__tsan_init' in read-only section `.text'
ld: ../libiberty/libiberty.a(safe-ctype.o): relocation R_X86_64_PC32 \
  against symbol `__tsan_init' can not be used when making a shared object; \
  recompile with -fPIC
ld: final link failed: bad value
collect2: error: ld returned 1 exit status
make[1]: *** [libinproctrace.so] Error 1
...
which looks similar to what is described in commit 78e49486944 ("[gdb/build]
Fix gdbserver build with -fsanitize=address").

The gdbserver component builds a shared library libinproctrace.so, which uses
libiberty and therefore requires the pic variant.  The gdbserver Makefile is
set up to use this variant, if available, but it's not there.

Fix this by listing gdbserver in the toplevel configure alongside libcc1, as a
component that needs the libiberty pic variant, setting:
...
extra_host_libiberty_configure_flags=--enable-shared
...

Tested on x86_64-linux.

OK for trunk gcc?

Thanks,
- Tom

[gdb/build] Fix gdbserver build with -fsanitize=thread

---
 configure| 2 +-
 configure.ac | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 1badcb314f8..aac80b88d70 100755
--- a/configure
+++ b/configure
@@ -6964,7 +6964,7 @@ fi
 extra_host_libiberty_configure_flags=
 extra_host_zlib_configure_flags=
 case " $configdirs " in
-  *" lto-plugin "* | *" libcc1 "*)
+  *" lto-plugin "* | *" libcc1 "* | *" gdbserver "*)
 # When these are to be built as shared libraries, the same applies to
 # libiberty.
 extra_host_libiberty_configure_flags=--enable-shared
diff --git a/configure.ac b/configure.ac
index 5b6e2048514..29f74d10b5a 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2344,7 +2344,7 @@ fi
 extra_host_libiberty_configure_flags=
 extra_host_zlib_configure_flags=
 case " $configdirs " in
-  *" lto-plugin "* | *" libcc1 "*)
+  *" lto-plugin "* | *" libcc1 "* | *" gdbserver "*)
 # When these are to be built as shared libraries, the same applies to
 # libiberty.
 extra_host_libiberty_configure_flags=--enable-shared


[committed][gdb/build] Fix build for gcc < 11

2022-06-15 Thread Tom de Vries via Gcc-patches
Hi,

When building trunk on openSUSE Leap 15.3 with system gcc 7.5.0, I run into:
...
In file included from ../bfd/bfd.h:46:0,
 from gdb/defs.h:37,
 from gdb/debuginfod-support.c:19:
gdb/debuginfod-support.c: In function ‘bool debuginfod_is_enabled()’:
gdb/../include/diagnostics.h:42:3: error: unknown option after \
  ‘#pragma GCC diagnostic’ kind [-Werror=pragmas]
   _Pragma (DIAGNOSTIC_STRINGIFY (GCC diagnostic ignored option))
   ^
gdb/../include/diagnostics.h:80:3: note: in expansion of macro \
  ‘DIAGNOSTIC_IGNORE’
   DIAGNOSTIC_IGNORE ("-Wstringop-overread")
   ^
gdb/debuginfod-support.c:201:4: note: in expansion of macro \
  ‘DIAGNOSTIC_IGNORE_STRINGOP_OVERREAD’
DIAGNOSTIC_IGNORE_STRINGOP_OVERREAD
^
...

The problem is that the warning -Wstringop-overread was introduced in
gcc 11, and we can only tell gcc to ignore it if gcc knows about it.

Fix this by guarding the DIAGNOSTIC_IGNORE_STRINGOP_OVERREAD definition in
diagnostics.h with '#if __GNUC__ >= 11'.
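
The guard can be sketched in isolation as follows. This is a minimal illustration, not the contents of include/diagnostics.h: the SKETCH_ macro and sketch_length are hypothetical, and the empty fallback in the '#else' branch is a choice made for this sketch.

```c
#include <string.h>

#if defined (__GNUC__) && __GNUC__ >= 11
/* GCC 11 and later know -Wstringop-overread, so naming it is safe.  */
# define SKETCH_IGNORE_STRINGOP_OVERREAD \
  _Pragma ("GCC diagnostic ignored \"-Wstringop-overread\"")
#else
/* Older or other compilers: expand to nothing rather than name an
   option they would reject.  */
# define SKETCH_IGNORE_STRINGOP_OVERREAD
#endif

/* Hypothetical user of the macro; the push/pop pair keeps the ignored
   warning local to this function.  */
size_t
sketch_length (const char *s)
{
  size_t n;
#if defined (__GNUC__)
  _Pragma ("GCC diagnostic push")
#endif
  SKETCH_IGNORE_STRINGOP_OVERREAD
  n = strlen (s);
#if defined (__GNUC__)
  _Pragma ("GCC diagnostic pop")
#endif
  return n;
}
```

With a compiler older than gcc 11, the macro expands to nothing, so the unknown-option error from '#pragma GCC diagnostic' should not appear.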

Tested on x86_64-linux, by completing a build.

Committed to trunk.

Thanks,
- Tom

[gdb/build] Fix build for gcc < 11

---
 include/diagnostics.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/diagnostics.h b/include/diagnostics.h
index 8bf5a3c3d9b..3da88282261 100644
--- a/include/diagnostics.h
+++ b/include/diagnostics.h
@@ -76,8 +76,10 @@
 # define DIAGNOSTIC_IGNORE_STRINGOP_TRUNCATION \
   DIAGNOSTIC_IGNORE ("-Wstringop-truncation")
 
+# if __GNUC__ >= 11
 # define DIAGNOSTIC_IGNORE_STRINGOP_OVERREAD \
   DIAGNOSTIC_IGNORE ("-Wstringop-overread")
+#endif
 
 # define DIAGNOSTIC_IGNORE_FORMAT_NONLITERAL \
   DIAGNOSTIC_IGNORE ("-Wformat-nonliteral")


Re: libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA'

2022-05-12 Thread Tom de Vries via Gcc-patches

On 4/28/22 15:45, Thomas Schwinge wrote:

Hi Tom!

On 2022-04-08T09:35:44+0200, Tom de Vries  wrote:

On 4/8/22 00:27, Thomas Schwinge wrote:

On 2017-01-13T19:11:23+0100, Jakub Jelinek  wrote:

Especially for distributions it is undesirable to need to have proprietary
CUDA libraries and headers installed when building GCC.



--- libgomp/plugin/configfrag.ac.jj   2017-01-13 12:07:56.0 +0100
+++ libgomp/plugin/configfrag.ac  2017-01-13 17:33:26.608240936 +0100



+   PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
+   PLUGIN_NVPTX_LIBS='-ldl'
+   PLUGIN_NVPTX_DYNAMIC=1



+AC_DEFINE_UNQUOTED([PLUGIN_NVPTX_DYNAMIC], [$PLUGIN_NVPTX_DYNAMIC],
+  [Define to 1 if the NVIDIA plugin should dlopen libcuda.so.1, 0 if it should be linked against it.])


Actually, the conditionals leading to 'PLUGIN_NVPTX_DYNAMIC=1' here do
control two orthogonal aspects; OK to disentangle that with the attached
"libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into
'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA'"?



we discussed dropping --with-cuda, so do I understand it correctly that
you now propose to drop --with-cuda and --with-cuda-driver-lib but
intend to keep --with-cuda-driver-include ?


No, I think you're reading too much into this first patch.  ;-)

The goal with this patch is just to help disentangle two orthogonal
concepts (as described in the commit log), and then...


Can you explain what user or maintainer scenario is served by this?


... in a next step, we may indeed remove the current user-visible
'--with-cuda-driver' etc., but keep the underlying functionality
available for the developers.  That's to address the point you'd made in
the "Proposal to remove '--with-cuda-driver'" thread: that it still
"could be useful for debugging / comparison purposes" -- and especially
for development purposes, in my opinion: if you develop CUDA API-level
changes in the libgomp nvptx plugin, it's likely to be easier to just use
the full CUDA toolkit 'cuda.h' and directly link against libcuda (so that
you've got all symbols etc. available), and only once you know what
exactly you need, update GCC's 'include/cuda/cuda.h' and
'libgomp/plugin/cuda-lib.def'.

With that hopefully clarified, OK to push the re-attached
"libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into
'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA'"?



Ack, understood, thanks for the detailed explanation.

LGTM.

Thanks,
- Tom


Is
there a problem with using gcc's cuda.h?


No, all good.


Grüße
  Thomas




Re: [committed][nvptx] Fix ASM_SPEC workaround for sm_30

2022-04-11 Thread Tom de Vries via Gcc-patches

On 4/7/22 16:17, Thomas Schwinge wrote:

Hi!

On 2022-03-31T09:40:47+0200, Tom de Vries via Gcc-patches 
 wrote:

Newer versions of CUDA no longer support sm_30, and nvptx-tools
currently doesn't handle that gracefully when verifying
( https://github.com/MentorEmbedded/nvptx-tools/issues/30 ).


There's now <https://github.com/MentorEmbedded/nvptx-tools/pull/33>
'as: Deal with CUDA 11.0, "Support for Kepler 'sm_30' and 'sm_32'
architecture based products is dropped"' available for comment/testing.


There's a --no-verify work-around in place in ASM_SPEC, but that one doesn't
work when using -Wa,--verify on the command line.


With that resolved in nvptx-tools, we may then revert these GCC-level
workarounds, GCC commit bf4832d6fa817f66009f100a9cd68953062add7d
"[nvptx] Fix ASM_SPEC workaround for sm_30", and
GCC commit 12fa7641ceed9c9139e2ea7b62c11f3dc5b6f6f4
"[nvptx] Use --no-verify for sm_30".  OK to push, once nvptx-tools ready?


Use a more robust workaround: verify using sm_35 when misa=sm_30 is specified
(either implicitly or explicitly).


Thanks for that suggestion!



Hi,

I've tested the nvptx-tools patch in combination with a patch that 
removes ASM_SPEC, and that went fine.


[ Well apart from a new libgomp FAIL:
...
FAIL: libgomp.oacc-fortran/private-variables.f90 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O1 
 at line 142 (test for bogus messages, line 131)
...
but I assume that's unrelated ]

So, patch that removes ASM_SPEC pre-approved.

Thanks,
- Tom


Re: libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA' (was: [PATCH] Allow building GCC with PTX offloading even without CUDA

2022-04-08 Thread Tom de Vries via Gcc-patches

On 4/8/22 00:27, Thomas Schwinge wrote:

Hi!

On 2017-01-13T19:11:23+0100, Jakub Jelinek  wrote:

Especially for distributions it is undesirable to need to have proprietary
CUDA libraries and headers installed when building GCC.



--- libgomp/plugin/configfrag.ac.jj   2017-01-13 12:07:56.0 +0100
+++ libgomp/plugin/configfrag.ac  2017-01-13 17:33:26.608240936 +0100



+   PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
+   PLUGIN_NVPTX_LIBS='-ldl'
+   PLUGIN_NVPTX_DYNAMIC=1



+AC_DEFINE_UNQUOTED([PLUGIN_NVPTX_DYNAMIC], [$PLUGIN_NVPTX_DYNAMIC],
+  [Define to 1 if the NVIDIA plugin should dlopen libcuda.so.1, 0 if it should be linked against it.])


Actually, the conditionals leading to 'PLUGIN_NVPTX_DYNAMIC=1' here do
control two orthogonal aspects; OK to disentangle that with the attached
"libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into
'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA'"?



Hi Thomas,

we discussed dropping --with-cuda, so do I understand it correctly that 
you now propose to drop --with-cuda and --with-cuda-driver-lib but 
intend to keep --with-cuda-driver-include ?


Can you explain what user or maintainer scenario is served by this?  Is 
there a problem with using gcc's cuda.h?


Thanks,
- Tom



Grüße
  Thomas




Re: Proposal to remove '--with-cuda-driver' (was: [wwwdocs][patch] gcc-12: Nvptx updates)

2022-04-06 Thread Tom de Vries via Gcc-patches

On 4/5/22 17:14, Thomas Schwinge wrote:

Hi!

Still catching up with GCC/nvptx back end changes...  %-)


In the following I'm not discussing the patch to document
"gcc-12: Nvptx updates", but rather one aspect of the
"gcc-12: Nvptx updates" themselves.  ;-)

On 2022-03-30T14:27:41+0200, Tom de Vries  wrote:

+  The -march flag has been added.  The -misa
+flag is now considered an alias of the -march flag.
+  Support for PTX ISA target architectures sm_53,
+sm_70, sm_75 and sm_80 has been
+added.  These can be specified using the -march flag.
+  The default PTX ISA target architecture has been set back
+to sm_30, to fix support for sm_30 boards.
+  The -march-map flag has been added.  The
+-march-map value will be mapped to a valid
+-march flag value.  For instance,
+-march-map=sm_50 maps to -march=sm_35.
+This can be used to specify that generated code is to be executed on a
+board with at least some specific compute capability, without having to
+know the valid values for the -march flag.
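
As a rough illustration of such a mapping: the sketch below assumes the rule is "pick the largest supported -march value that does not exceed the requested one", which matches the sm_50 to sm_35 example but is not taken from the GCC sources. The table lists only the architectures named in this thread, and sketch_march_map is a hypothetical name.

```c
#include <stddef.h>

/* -march values named in this thread, in ascending order.  Treating this
   as the complete list is an assumption of the sketch.  */
static const int sketch_valid_march[] = { 30, 35, 53, 70, 75, 80 };

/* Return the largest listed value not exceeding REQUESTED (the number in
   sm_NN), or -1 if every listed value is larger.  */
int
sketch_march_map (int requested)
{
  int best = -1;
  size_t n = sizeof sketch_valid_march / sizeof sketch_valid_march[0];

  for (size_t i = 0; i < n; i++)
    if (sketch_valid_march[i] <= requested)
      best = sketch_valid_march[i];

  return best;
}
```

Under these assumptions, sketch_march_map (50) returns 35, in line with the -march-map=sm_50 example above.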


Regarding the following:


The -mptx flag has been added to specify the PTX ISA version
for the generated code; permitted values are 3.1
-  (default, matches previous GCC versions) and 6.3.
+  (matches previous GCC versions), 6.0, 6.3,
+  and 7.0. If not specified, the used version is the minimal
+  version required for -march but at least 6.0.



For "the PTX ISA version [used is] at least '6.0'", per
<https://docs.nvidia.com/cuda/parallel-thread-execution/#release-notes>,
this means we now require "CUDA 9.0, driver r384" (or more recent).


Well, that would be the case if there was no -mptx=3.1.


Per <https://developer.nvidia.com/cuda-toolkit-archive>:
"CUDA Toolkit 9.0 (Sept 2017)", so ~4.5 years old.
Per <https://download.nvidia.com/XFree86/Linux-x86_64/>, I'm guessing a


I just see a list with version numbers there, I'm not sure what 
information you're referring to.



similar timeframe for the imprecise "r384" Driver version stated in that
table.  That should all be fine (re not mandating use of all-too-recent
versions).



I don't know what an imprecise driver is.


Now, consider doing a GCC/nvptx offloading build with
'--with-cuda-driver' pointing to CUDA 9.0 (or more recent).  This means
that the libgomp nvptx plugin may now use CUDA Driver features of the
CUDA 9.0 distribution ("driver r384", etc.) -- because that's what it is
being 'configure'd and linked against.  (I say "may now use", because
we're currently not making a lot of effort to use "modern" CUDA Driver
features -- but we could, and probably should.  That's a separate
discussion, of course.)  It then follows that the libgomp nvptx plugin
has a hard dependency on CUDA Driver features of the CUDA 9.0
distribution ("driver r384", etc.).  That's dependency as in ABI: via
'*.so' symbol versions as well as internal CUDA interface configuration;
see  doing different '#define's for different
'__CUDA_API_VERSION' etc.)

Now assume one such dependency on "modern" CUDA Driver were not
implemented by:



Thanks for reminding me, I forgot about this configure option.


+  An mptx-3.1 multilib was added.  This allows using older
+  drivers which do not support PTX ISA version 6.0.


... this "old" CUDA Driver.  Then you do have the '-mptx-3.1' multilib to
use with "old" CUDA Driver -- but you cannot actually use the libgomp
nvptx plugin, because that's been built against "modern" CUDA Driver.



I remember the following problem: using --with-cuda-driver to specify 
which CUDA driver interface (version) you want to link the libgomp plugin 
against, and then using an older driver in combination with that libgomp 
plugin.  We may run into trouble, typically at libgomp plugin load 
time, with an error mentioning an unresolved symbol or an insufficient 
ABI symbol version.


So, do I understand it correctly that your point is that using -mptx=3.1 
doesn't fix that problem?



Same problem, generally, for 'nvptx-run' of the nvptx-tools, which has
similar CUDA Driver dependencies.

Now, that may currently be a latent problem only, because we're not
actually making use of "modern" CUDA Driver features.  But, I'd like to
resolve this "impedance mismatch", before we actually run into such
problems.



It would be helpful for me if you could come up with an example of a 
modification to the libgomp plugin that would cause trouble in 
combination with -mptx=3.1.



Already long ago Jakub put in changes to use '--without-cuda-driver' to
"Allow building GCC with PTX offloading even without CUDA being installed
(gcc and nvptx-tools patches)": "Especially for distributions it is
undesirable to need to have proprietary CUDA libraries and headers
installed when building GCC.", and 

Re: [PATCH][libgomp, testsuite, nvptx] Limit recursion in declare_target-{1,2}.f90

2022-04-04 Thread Tom de Vries via Gcc-patches

On 4/4/22 13:07, Jakub Jelinek wrote:

On Mon, Apr 04, 2022 at 01:05:12PM +0200, Tom de Vries wrote:

2022-04-04  Tom de Vries  

* testsuite/libgomp.fortran/examples-4/on_device_arch.c: Copy from
parent dir.


Wouldn't just ! { dg-additional-sources ../on_device_arch.c }
work?


It does; pushed with that update.

Thanks,
- Tom



Re: [PATCH][libgomp, testsuite, nvptx] Limit recursion in declare_target-{1,2}.f90

2022-04-04 Thread Tom de Vries via Gcc-patches

On 4/1/22 17:57, Tom de Vries wrote:

On 4/1/22 17:38, Jakub Jelinek wrote:

On Fri, Apr 01, 2022 at 05:34:50PM +0200, Tom de Vries wrote:

Do you perhaps have an idea why it's failing?


Because you call on_device_arch_nvptx () outside of
!$omp target region, so unless the host device is NVPTX,
it will not be true.



That bit does work, because on_device_arch_nvptx calls on_device_arch, 
which contains the omp target bit:

...
static int
on_device_arch (int d)
{
   int d_cur;
   #pragma omp target map(from:d_cur)
   d_cur = device_arch ();

   return d_cur == d;
}

int
on_device_arch_nvptx ()
{
   return on_device_arch (GOMP_DEVICE_NVIDIA_PTX);
}
...

So I realized that I didn't do a good job of specifying the problem I 
encountered, and went looking at it, at which point I realized the error 
message had changed, and knew how to fix it ... So, my apologies, some 
confusion on my part.


Anyway, the attached patch avoids any nvptx-related tcl directives (just for 
one test-case for now).  To me, this seems the most robust solution.


Is this approach acceptable?


I intend to commit this in a few days, unless there are objections.

Thanks,
- Tom

[libgomp/testsuite] Fix libgomp.fortran/examples-4/declare_target-{1,2}.f90

The test-cases libgomp.fortran/examples-4/declare_target-{1,2}.f90 mean to
set an nvptx-specific limit using offload_target_nvptx, but also change
behaviour for amd.

That is, there is now a difference in behaviour between:
- a compiler configured for GCN offloading, and
- a compiler configured for both GCN and nvptx offloading.

Fix this by using on_device_arch_nvptx instead.

Tested on x86_64 with nvptx accelerator.

libgomp/ChangeLog:

2022-04-04  Tom de Vries  

	* testsuite/libgomp.fortran/examples-4/on_device_arch.c: Copy from
	parent dir.
	* testsuite/libgomp.fortran/examples-4/declare_target-1.f90: Use
	on_device_arch_nvptx instead of offload_target_nvptx.
	* testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Same.

---
 .../examples-4/declare_target-1.f90| 31 +-
 .../examples-4/declare_target-2.f90| 31 +-
 .../libgomp.fortran/examples-4/on_device_arch.c|  3 +++
 3 files changed, 41 insertions(+), 24 deletions(-)

diff --git a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90 b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
index 03c5c53ed67..acded20f756 100644
--- a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
@@ -1,16 +1,6 @@
 ! { dg-do run }
-! { dg-additional-options "-cpp" }
-! Reduced from 25 to 23, otherwise execution runs out of thread stack on
-! Nvidia Titan V.
-! Reduced from 23 to 22, otherwise execution runs out of thread stack on
-! Nvidia T400 (2GB variant), when run with GOMP_NVPTX_JIT=-O0.
-! Reduced from 22 to 20, otherwise execution runs out of thread stack on
-! Nvidia RTX A2000 (6GB variant), when run with GOMP_NVPTX_JIT=-O0.
-! { dg-additional-options "-DREC_DEPTH=20" { target { offload_target_nvptx } } } */
-
-#ifndef REC_DEPTH
-#define REC_DEPTH 25
-#endif
+! { dg-additional-sources on_device_arch.c }
+! { dg-prune-output "command-line option '-fintrinsic-modules-path=.*' is valid for Fortran but not for C" }
 
 module e_53_1_mod
   integer :: THRESHOLD = 20
@@ -38,6 +28,23 @@ end module
 
 program e_53_1
   use e_53_1_mod, only : fib, fib_wrapper
+  integer :: REC_DEPTH = 25
+
+  interface
+integer function on_device_arch_nvptx() bind(C)
+end function on_device_arch_nvptx
+  end interface
+
+  if (on_device_arch_nvptx () /= 0) then
+ ! Reduced from 25 to 23, otherwise execution runs out of thread stack on
+ ! Nvidia Titan V.
+ ! Reduced from 23 to 22, otherwise execution runs out of thread stack on
+ ! Nvidia T400 (2GB variant), when run with GOMP_NVPTX_JIT=-O0.
+ ! Reduced from 22 to 20, otherwise execution runs out of thread stack on
+ ! Nvidia RTX A2000 (6GB variant), when run with GOMP_NVPTX_JIT=-O0.
+ REC_DEPTH = 20
+  end if
+
   if (fib (15) /= fib_wrapper (15)) stop 1
   if (fib (REC_DEPTH) /= fib_wrapper (REC_DEPTH)) stop 2
 end program
diff --git a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90 b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90
index 0e8bea578a8..27a5cec2e9d 100644
--- a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90
+++ b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90
@@ -1,20 +1,27 @@
 ! { dg-do run }
-! { dg-additional-options "-cpp" }
-! Reduced from 25 to 23, otherwise execution runs out of thread stack on
-! Nvidia Titan V.
-! Reduced from 23 to 22, otherwise execution runs out of thread stack on
-! Nvidia T400 (2GB variant), when run with GOMP_NVPTX_JIT=-O0.
-! Reduced from 22 to 18, otherwise execution runs out of thread stack on
-! Nvidia RTX A2000 (6GB variant), w

Re: [PATCH][libgomp, testsuite, nvptx] Limit recursion in declare_target-{1,2}.f90

2022-04-01 Thread Tom de Vries via Gcc-patches

On 4/1/22 17:38, Jakub Jelinek wrote:

On Fri, Apr 01, 2022 at 05:34:50PM +0200, Tom de Vries wrote:

Do you perhaps have an idea why it's failing?


Because you call on_device_arch_nvptx () outside of
!$omp target region, so unless the host device is NVPTX,
it will not be true.



That bit does work, because on_device_arch_nvptx calls on_device_arch, 
which contains the omp target bit:

...
static int
on_device_arch (int d)
{
  int d_cur;
  #pragma omp target map(from:d_cur)
  d_cur = device_arch ();

  return d_cur == d;
}

int
on_device_arch_nvptx ()
{
  return on_device_arch (GOMP_DEVICE_NVIDIA_PTX);
}
...

So I realized that I didn't do a good job of specifying the problem I 
encountered, and went looking at it, at which point I realized the error 
message had changed, and knew how to fix it ... So, my apologies, some 
confusion on my part.


Anyway, the attached patch avoids any nvptx-related tcl directives (just for 
one test-case for now).  To me, this seems the most robust solution.


Is this approach acceptable?

Thanks,
- Tom
[libgomp/testsuite] Fix libgomp.fortran/examples-4/declare_target-1.f90

---
 .../examples-4/declare_target-1.f90| 31 +-
 .../libgomp.fortran/examples-4/on_device_arch.c|  3 +++
 2 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90 b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
index 03c5c53ed67..acded20f756 100644
--- a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
@@ -1,16 +1,6 @@
 ! { dg-do run }
-! { dg-additional-options "-cpp" }
-! Reduced from 25 to 23, otherwise execution runs out of thread stack on
-! Nvidia Titan V.
-! Reduced from 23 to 22, otherwise execution runs out of thread stack on
-! Nvidia T400 (2GB variant), when run with GOMP_NVPTX_JIT=-O0.
-! Reduced from 22 to 20, otherwise execution runs out of thread stack on
-! Nvidia RTX A2000 (6GB variant), when run with GOMP_NVPTX_JIT=-O0.
-! { dg-additional-options "-DREC_DEPTH=20" { target { offload_target_nvptx } } } */
-
-#ifndef REC_DEPTH
-#define REC_DEPTH 25
-#endif
+! { dg-additional-sources on_device_arch.c }
+! { dg-prune-output "command-line option '-fintrinsic-modules-path=.*' is valid for Fortran but not for C" }
 
 module e_53_1_mod
   integer :: THRESHOLD = 20
@@ -38,6 +28,23 @@ end module
 
 program e_53_1
   use e_53_1_mod, only : fib, fib_wrapper
+  integer :: REC_DEPTH = 25
+
+  interface
+integer function on_device_arch_nvptx() bind(C)
+end function on_device_arch_nvptx
+  end interface
+
+  if (on_device_arch_nvptx () /= 0) then
+ ! Reduced from 25 to 23, otherwise execution runs out of thread stack on
+ ! Nvidia Titan V.
+ ! Reduced from 23 to 22, otherwise execution runs out of thread stack on
+ ! Nvidia T400 (2GB variant), when run with GOMP_NVPTX_JIT=-O0.
+ ! Reduced from 22 to 20, otherwise execution runs out of thread stack on
+ ! Nvidia RTX A2000 (6GB variant), when run with GOMP_NVPTX_JIT=-O0.
+ REC_DEPTH = 20
+  end if
+
   if (fib (15) /= fib_wrapper (15)) stop 1
   if (fib (REC_DEPTH) /= fib_wrapper (REC_DEPTH)) stop 2
 end program
diff --git a/libgomp/testsuite/libgomp.fortran/examples-4/on_device_arch.c b/libgomp/testsuite/libgomp.fortran/examples-4/on_device_arch.c
new file mode 100644
index 000..f8bef19e021
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/examples-4/on_device_arch.c
@@ -0,0 +1,3 @@
+/* Auxiliar file.  */
+/* { dg-do compile  { target skip-all-targets } } */
+#include "../../libgomp.c-c++-common/on_device_arch.h"


Re: [PATCH][libgomp, testsuite, nvptx] Limit recursion in declare_target-{1,2}.f90

2022-04-01 Thread Tom de Vries via Gcc-patches

On 4/1/22 14:28, Thomas Schwinge wrote:

Hi Tom!

On 2022-04-01T13:24:40+0200, Tom de Vries  wrote:

When running testcases libgomp.fortran/examples-4/declare_target-{1,2}.f90 on
an RTX A2000 (sm_86) with driver 510.60.02 and with GOMP_NVPTX_JIT=-O0 I run
into:
...
FAIL: libgomp.fortran/examples-4/declare_target-1.f90 -O0 \
   -DGOMP_NVPTX_JIT=-O0 execution test
FAIL: libgomp.fortran/examples-4/declare_target-2.f90 -O0 \
   -DGOMP_NVPTX_JIT=-O0 execution test
...

Fix this by further limiting recursion depth in the test-cases for nvptx.

Furthermore, make the recursion depth limiting nvptx-specific.


Careful:


--- a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
@@ -1,4 +1,16 @@
  ! { dg-do run }
+! { dg-additional-options "-cpp" }
+! Reduced from 25 to 23, otherwise execution runs out of thread stack on
+! Nvidia Titan V.
+! Reduced from 23 to 22, otherwise execution runs out of thread stack on
+! Nvidia T400 (2GB variant), when run with GOMP_NVPTX_JIT=-O0.
+! Reduced from 22 to 20, otherwise execution runs out of thread stack on
+! Nvidia RTX A2000 (6GB variant), when run with GOMP_NVPTX_JIT=-O0.
+! { dg-additional-options "-DREC_DEPTH=20" { target { offload_target_nvptx } } } */


'offload_target_nvptx' doesn't mean that offloading execution is done on
nvptx, but rather that we're "*compiling* for offload target nvptx"
(emphasis mine).  That means, with such a change we're now getting
different behavior in a system with an AMD GPU, when using a toolchain
that only has GCN offloading configured vs. a toolchain that has GCN and
nvptx offloading configured.  This isn't going to cause any real
problems, of course, but it's confusing, and a bad example of
'offload_target_nvptx'.

'offload_device_nvptx' ought to work: "using nvptx offload device".



Thanks for pointing that out.

I tried to understand this multiple offloading configuration a bit, and 
came up with the following mental model: it's possible to have a host 
with say an nvptx and amd offloading device, and then configure and 
build a toolchain that can generate a single executable that can offload 
to either device, depending on the value of appropriate openacc/openmp 
environment variables.


So, in principle the libgomp testsuite could have a mode in which it 
does that: run the same executable twice, once for each offloading 
device.  In that case, even using offload_device_nvptx would not be 
accurate enough, and we'd need to test for offload device type at 
runtime, as used to be done in 
libgomp/testsuite/libgomp.fortran/task-detach-6.f90.


I've tried to copy that setup to 
libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90, but 
that doesn't seem to work anymore.  I've also tried copying that 
test-case to 
libgomp/testsuite/libgomp.fortran/copy-of-declare_target-1.f90 to rule 
out any subdir-related problems, but no luck there either.


Attached is that copy approach, could you try it out and see if it works 
for you?


Do you perhaps have an idea why it's failing?

I can make a patch using offload_device_nvptx, but I'd prefer to 
understand first why the approach above isn't working.


Thanks,
- Tom

[libgomp/testsuite] Add libgomp.fortran/copy-of-declare_target-1.f90

---
 .../libgomp.fortran/copy-of-declare_target-1.f90   | 49 ++
 1 file changed, 49 insertions(+)

diff --git a/libgomp/testsuite/libgomp.fortran/copy-of-declare_target-1.f90 b/libgomp/testsuite/libgomp.fortran/copy-of-declare_target-1.f90
new file mode 100644
index 000..6dcf5312070
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/copy-of-declare_target-1.f90
@@ -0,0 +1,49 @@
+! { dg-do run }
+! { dg-additional-sources on_device_arch.c }
+
+module e_53_1_mod
+  integer :: THRESHOLD = 20
+contains
+  integer recursive function fib (n) result (f)
+!$omp declare target
+integer :: n
+if (n <= 0) then
+  f = 0
+else if (n == 1) then
+  f = 1
+else
+  f = fib (n - 1) + fib (n - 2)
+end if
+  end function
+
+  integer function fib_wrapper (n)
+integer :: x
+!$omp target map(to: n) map(from: x) if(n > THRESHOLD)
+  x = fib (n)
+!$omp end target
+fib_wrapper = x
+  end function
+end module
+
+program e_53_1
+  use e_53_1_mod, only : fib, fib_wrapper
+  integer :: REC_DEPTH = 25
+
+  interface
+integer function on_device_arch_nvptx() bind(C)
+end function on_device_arch_nvptx
+  end interface
+
+  if (on_device_arch_nvptx () /= 0) then
+ ! Reduced from 25 to 23, otherwise execution runs out of thread stack on
+ ! Nvidia Titan V.
+ ! Reduced from 23 to 22, otherwise execution runs out of thread stack on
+ ! Nvidia T400 (2GB variant), when run with GOMP_NVPTX_JIT=-O0.
+ ! Reduced from 22 to 20, otherwise execution runs out of thread stack on
+ ! Nvidia RTX A2000 (6GB variant), when run with GOMP_NVPTX_JIT=-O

[PATCH][libgomp, testsuite, nvptx] Limit recursion in declare_target-{1,2}.f90

2022-04-01 Thread Tom de Vries via Gcc-patches
Hi,

When running testcases libgomp.fortran/examples-4/declare_target-{1,2}.f90 on
an RTX A2000 (sm_86) with driver 510.60.02 and with GOMP_NVPTX_JIT=-O0 I run
into:
...
FAIL: libgomp.fortran/examples-4/declare_target-1.f90 -O0 \
  -DGOMP_NVPTX_JIT=-O0 execution test
FAIL: libgomp.fortran/examples-4/declare_target-2.f90 -O0 \
  -DGOMP_NVPTX_JIT=-O0 execution test
...

Fix this by further limiting recursion depth in the test-cases for nvptx.

Furthermore, make the recursion depth limiting nvptx-specific.

Tested on x86_64 with nvptx accelerator.

Any comments?

Thanks,
- Tom

[libgomp, testsuite, nvptx] Limit recursion in declare_target-{1,2}.f90

libgomp/ChangeLog:

2022-04-01  Tom de Vries  

* testsuite/libgomp.fortran/examples-4/declare_target-1.f90: Define
and use REC_DEPTH.
* testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Same.

---
 .../libgomp.fortran/examples-4/declare_target-1.f90  | 18 +-
 .../libgomp.fortran/examples-4/declare_target-2.f90  | 20 ++--
 2 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90 b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
index b761979ecde..03c5c53ed67 100644
--- a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
@@ -1,4 +1,16 @@
 ! { dg-do run }
+! { dg-additional-options "-cpp" }
+! Reduced from 25 to 23, otherwise execution runs out of thread stack on
+! Nvidia Titan V.
+! Reduced from 23 to 22, otherwise execution runs out of thread stack on
+! Nvidia T400 (2GB variant), when run with GOMP_NVPTX_JIT=-O0.
+! Reduced from 22 to 20, otherwise execution runs out of thread stack on
+! Nvidia RTX A2000 (6GB variant), when run with GOMP_NVPTX_JIT=-O0.
+! { dg-additional-options "-DREC_DEPTH=20" { target { offload_target_nvptx } } } */
+
+#ifndef REC_DEPTH
+#define REC_DEPTH 25
+#endif
 
 module e_53_1_mod
   integer :: THRESHOLD = 20
@@ -27,9 +39,5 @@ end module
 program e_53_1
   use e_53_1_mod, only : fib, fib_wrapper
   if (fib (15) /= fib_wrapper (15)) stop 1
-  ! Reduced from 25 to 23, otherwise execution runs out of thread stack on
-  ! Nvidia Titan V.
-  ! Reduced from 23 to 22, otherwise execution runs out of thread stack on
-  ! Nvidia T400 (2GB variant), when run with GOMP_NVPTX_JIT=-O0.
-  if (fib (22) /= fib_wrapper (22)) stop 2
+  if (fib (REC_DEPTH) /= fib_wrapper (REC_DEPTH)) stop 2
 end program
diff --git a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90 b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90
index f576c25ba39..0e8bea578a8 100644
--- a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90
+++ b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90
@@ -1,16 +1,24 @@
 ! { dg-do run }
+! { dg-additional-options "-cpp" }
+! Reduced from 25 to 23, otherwise execution runs out of thread stack on
+! Nvidia Titan V.
+! Reduced from 23 to 22, otherwise execution runs out of thread stack on
+! Nvidia T400 (2GB variant), when run with GOMP_NVPTX_JIT=-O0.
+! Reduced from 22 to 18, otherwise execution runs out of thread stack on
+! Nvidia RTX A2000 (6GB variant), when run with GOMP_NVPTX_JIT=-O0.
+! { dg-additional-options "-DREC_DEPTH=18" { target { offload_target_nvptx } } } */
+
+#ifndef REC_DEPTH
+#define REC_DEPTH 25
+#endif
 
 program e_53_2
   !$omp declare target (fib)
   integer :: x, fib
   !$omp target map(from: x)
-! Reduced from 25 to 23, otherwise execution runs out of thread stack on
-! Nvidia Titan V.
-! Reduced from 23 to 22, otherwise execution runs out of thread stack on
-! Nvidia T400 (2GB variant), when run with GOMP_NVPTX_JIT=-O0.
-x = fib (22)
+x = fib (REC_DEPTH)
   !$omp end target
-  if (x /= fib (22)) stop 1
+  if (x /= fib (REC_DEPTH)) stop 1
 end program
 
 integer recursive function fib (n) result (f)


[committed][libgomp, testsuite, nvptx] Fix dg-output test in vector-length-128-7.c

2022-04-01 Thread Tom de Vries via Gcc-patches
Hi,

When running test-case libgomp.oacc-c-c++-common/vector-length-128-7.c on an
RTX A2000 (sm_86) with driver 510.60.02 I run into:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vector-length-128-7.c \
  -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  \
  output pattern test
...

The failing check verifies the launch dimensions:
...
/* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: \
launch gangs=1, workers=8, vectors=128" } */
...
which fails because (as we can see with GOMP_DEBUG=1) the actual num_workers
is 6:
...
  nvptx_exec: kernel main$_omp_fn$0: launch gangs=1, workers=6, vectors=128
...

This is due to the result of cuOccupancyMaxPotentialBlockSize (which suggests
'a launch configuration with reasonable occupancy') printed just before:
...
cuOccupancyMaxPotentialBlockSize: grid = 52, block = 768
...
[ Note: 6 * 128 == 768. ]

Fix this by updating the check to allow num_workers in the range 1 to 8.
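
The arithmetic in the note above can be sketched in C as follows.  This is
only a minimal sketch, under the assumption that the number of workers gets
capped such that workers * vector_length fits in the suggested block size;
the helper name clamp_workers is hypothetical and is not taken from the
libgomp nvptx plugin.

```c
#include <assert.h>

/* Hypothetical helper, not taken from the libgomp nvptx plugin: cap the
   requested number of workers so that workers * vector_length fits in
   the block size suggested by cuOccupancyMaxPotentialBlockSize.  */
int
clamp_workers (int requested_workers, int vector_length, int block_size)
{
  assert (requested_workers > 0 && vector_length > 0 && block_size > 0);

  /* Number of complete vectors that fit in one block, but at least 1.  */
  int max_workers = block_size / vector_length;
  if (max_workers < 1)
    max_workers = 1;

  return requested_workers < max_workers ? requested_workers : max_workers;
}
```

With the values from the log, clamp_workers (8, 128, 768) gives 6, while a
suggested block size of 1024 would leave the requested 8 workers untouched.
Both results fall in the range accepted by the updated check.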

Tested on x86_64 with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom

[libgomp, testsuite, nvptx] Fix dg-output test in vector-length-128-7.c

libgomp/ChangeLog:

2022-04-01  Tom de Vries  

* testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: Fix
num_workers check.

---
 libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c
index 4a8c1bf549e..92b3de03636 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c
@@ -37,4 +37,4 @@ main (void)
 }
 
 /* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 0, 128\\)" "oaccloops" } } */
-/* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=8, vectors=128" } */
+/* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=\[1-8\], vectors=128" } */


[committed][nvptx, testsuite] Fix gcc.target/nvptx/alias-*.c on sm_80

2022-04-01 Thread Tom de Vries via Gcc-patches
Hi,

When running test-cases gcc.target/nvptx/alias-*.c on target board
nvptx-none-run/-misa=sm_80 we run into fails because the test-cases add
-mptx=6.3, which doesn't support sm_80.

Fix this by only adding -mptx=6.3 if necessary, and simplify the test-cases by
using ptx_alias feature abstractions:
...
/* { dg-do run { target runtime_ptx_alias } } */
/* { dg-add-options ptx_alias } */
...

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx, testsuite] Fix gcc.target/nvptx/alias-*.c on sm_80

gcc/testsuite/ChangeLog:

2022-04-01  Tom de Vries  

* gcc.target/nvptx/nvptx.exp
(check_effective_target_runtime_ptx_isa_version_6_3): Rename and
generalize to ...
(check_effective_target_runtime_ptx_isa_version_at_least): .. this.
(check_effective_target_default_ptx_isa_version_at_least)
(check_effective_target_runtime_ptx_alias, add_options_for_ptx_alias):
New proc.
* gcc.target/nvptx/alias-1.c: Use "target runtime_ptx_alias" and
"dg-add-options ptx_alias".
* gcc.target/nvptx/alias-2.c: Same.
* gcc.target/nvptx/alias-3.c: Same.
* gcc.target/nvptx/alias-4.c: Same.

---
 gcc/testsuite/gcc.target/nvptx/alias-1.c |  5 +--
 gcc/testsuite/gcc.target/nvptx/alias-2.c |  5 +--
 gcc/testsuite/gcc.target/nvptx/alias-3.c |  5 +--
 gcc/testsuite/gcc.target/nvptx/alias-4.c |  5 +--
 gcc/testsuite/gcc.target/nvptx/nvptx.exp | 62 +---
 5 files changed, 70 insertions(+), 12 deletions(-)

diff --git a/gcc/testsuite/gcc.target/nvptx/alias-1.c b/gcc/testsuite/gcc.target/nvptx/alias-1.c
index f68716e77dd..d251eee6e42 100644
--- a/gcc/testsuite/gcc.target/nvptx/alias-1.c
+++ b/gcc/testsuite/gcc.target/nvptx/alias-1.c
@@ -1,6 +1,7 @@
 /* { dg-do link } */
-/* { dg-do run { target runtime_ptx_isa_version_6_3 } } */
-/* { dg-options "-save-temps -malias -mptx=6.3" } */
+/* { dg-do run { target runtime_ptx_alias } } */
+/* { dg-options "-save-temps" } */
+/* { dg-add-options ptx_alias } */
 
 int v;
 
diff --git a/gcc/testsuite/gcc.target/nvptx/alias-2.c b/gcc/testsuite/gcc.target/nvptx/alias-2.c
index e2dc9b1f5ac..96cb7e2c1ef 100644
--- a/gcc/testsuite/gcc.target/nvptx/alias-2.c
+++ b/gcc/testsuite/gcc.target/nvptx/alias-2.c
@@ -1,6 +1,7 @@
 /* { dg-do link } */
-/* { dg-do run { target runtime_ptx_isa_version_6_3 } } */
-/* { dg-options "-save-temps -malias -mptx=6.3 -O2" } */
+/* { dg-do run { target runtime_ptx_alias } } */
+/* { dg-options "-save-temps -O2" } */
+/* { dg-add-options ptx_alias } */
 
 #include "alias-1.c"
 
diff --git a/gcc/testsuite/gcc.target/nvptx/alias-3.c b/gcc/testsuite/gcc.target/nvptx/alias-3.c
index 60486e50826..39649e30b91 100644
--- a/gcc/testsuite/gcc.target/nvptx/alias-3.c
+++ b/gcc/testsuite/gcc.target/nvptx/alias-3.c
@@ -1,6 +1,7 @@
 /* { dg-do link } */
-/* { dg-do run { target runtime_ptx_isa_version_6_3 } } */
-/* { dg-options "-save-temps -malias -mptx=6.3" } */
+/* { dg-do run { target runtime_ptx_alias } } */
+/* { dg-options "-save-temps" } */
+/* { dg-add-options ptx_alias } */
 
 /* Copy of alias-1.c, with static __f and f.  */
 
diff --git a/gcc/testsuite/gcc.target/nvptx/alias-4.c b/gcc/testsuite/gcc.target/nvptx/alias-4.c
index 956150a6b3f..28163c0faa0 100644
--- a/gcc/testsuite/gcc.target/nvptx/alias-4.c
+++ b/gcc/testsuite/gcc.target/nvptx/alias-4.c
@@ -1,6 +1,7 @@
 /* { dg-do link } */
-/* { dg-do run { target runtime_ptx_isa_version_6_3 } } */
-/* { dg-options "-save-temps -malias -mptx=6.3 -O2" } */
+/* { dg-do run { target runtime_ptx_alias } } */
+/* { dg-options "-save-temps -O2" } */
+/* { dg-add-options ptx_alias } */
 
 #include "alias-3.c"
 
diff --git a/gcc/testsuite/gcc.target/nvptx/nvptx.exp b/gcc/testsuite/gcc.target/nvptx/nvptx.exp
index e69b6d35fed..e9622ae7aaa 100644
--- a/gcc/testsuite/gcc.target/nvptx/nvptx.exp
+++ b/gcc/testsuite/gcc.target/nvptx/nvptx.exp
@@ -25,11 +25,65 @@ if ![istarget nvptx*-*-*] then {
 # Load support procs.
 load_lib gcc-dg.exp
 
-# Return 1 if code with -mptx=6.3 can be run.
-proc check_effective_target_runtime_ptx_isa_version_6_3 { args } {
-return [check_runtime run_ptx_isa_6_3 {
+# Return 1 if code by default compiles for at least PTX ISA version
+# major.minor.
+proc check_effective_target_default_ptx_isa_version_at_least { major minor } {
+set name default_ptx_isa_version_at_least_${major}_${minor}
+
+set supported_p \
+   [concat \
+"((__PTX_ISA_VERSION_MAJOR__ == $major" \
+"  && __PTX_ISA_VERSION_MINOR__ >= $minor)" \
+" || (__PTX_ISA_VERSION_MAJOR__ > $major))"]
+
+set src \
+   [list \
+"#if $supported_p" \
+"#else" \
+"#error unsupported" \
+"#endif"]
+

[committed][nvptx, testsuite] Fix typo in gcc.target/nvptx/march.c

2022-03-31 Thread Tom de Vries via Gcc-patches
Hi,

The dg-options line in gcc.target/nvptx/march.c:
...
/* { dg-options "-march=sm_30"} */
...
currently doesn't have any effect because it's missing a space between '"' and
'}'.

Fix this by adding the missing space.

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx, testsuite] Fix typo in gcc.target/nvptx/march.c

gcc/testsuite/ChangeLog:

2022-03-31  Tom de Vries  

* gcc.target/nvptx/march.c: Add missing space in dg-options line.

---
 gcc/testsuite/gcc.target/nvptx/march.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/nvptx/march.c b/gcc/testsuite/gcc.target/nvptx/march.c
index ec91f21c903..d1dd715798c 100644
--- a/gcc/testsuite/gcc.target/nvptx/march.c
+++ b/gcc/testsuite/gcc.target/nvptx/march.c
@@ -1,4 +1,4 @@
-/* { dg-options "-march=sm_30"} */
+/* { dg-options "-march=sm_30" } */
 
 #include "main.c"
 


[committed][nvptx] Fix ASM_SPEC workaround for sm_30

2022-03-31 Thread Tom de Vries via Gcc-patches
Hi,

Newer versions of CUDA no longer support sm_30, and the nvptx-tools
assembler (as) currently doesn't handle that gracefully when verifying
( https://github.com/MentorEmbedded/nvptx-tools/issues/30 ).

There's a --no-verify work-around in place in ASM_SPEC, but that one doesn't
work when using -Wa,--verify on the command line.

Use a more robust workaround: verify using sm_35 when misa=sm_30 is specified
(either implicitly or explicitly).
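
The effect of the new spec can be modelled in C roughly as follows.  This
is a sketch only: the GCC driver evaluates the spec string itself and does
not call a function like this, and the name asm_arch_for_misa is
hypothetical.  It just mirrors the three branches of the spec (explicit
sm_30, catch-all, no -misa given).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the three ASM_SPEC branches.  MISA is the value
   given to -misa, or NULL if the option was absent.  Returns the
   architecture that ends up being passed to the assembler using "-m".  */
const char *
asm_arch_for_misa (const char *misa)
{
  /* Implicit misa=sm_30: no option given, verify with sm_35.  */
  if (misa == NULL)
    return "sm_35";

  /* Explicit misa=sm_30: also verify with sm_35.  */
  if (strcmp (misa, "sm_30") == 0)
    return "sm_35";

  /* Catch-all: pass the value through unchanged.  */
  return misa;
}
```

So both a missing -misa and -misa=sm_30 result in "-m sm_35", while for
instance -misa=sm_75 results in "-m sm_75".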

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx] Fix ASM_SPEC workaround for sm_30

gcc/ChangeLog:

2022-03-30  Tom de Vries  

* config/nvptx/nvptx.h (ASM_SPEC): Use "-m sm_35" for -misa=sm_30.

---
 gcc/config/nvptx/nvptx.h | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index 75ac7a666b1..3b06f33032f 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -29,10 +29,24 @@
 
 #define STARTFILE_SPEC "%{mmainkernel:crt0.o}"
 
-/* Default needs to be in sync with default for misa in nvptx.opt.
-   We add a default here to work around a hard-coded sm_30 default in
-   nvptx-as.  */
-#define ASM_SPEC "%{misa=*:-m %*; :-m sm_35}%{misa=sm_30:--no-verify}"
+/* Newer versions of CUDA no longer support sm_30, and nvptx-tools as
+   currently doesn't handle that gracefully when verifying
+   ( https://github.com/MentorEmbedded/nvptx-tools/issues/30 ).  Work around
+   this by verifying with sm_35 when having misa=sm_30 (either implicitly
+   or explicitly).  */
+#define ASM_SPEC   \
+  "%{" \
+  /* Explicit misa=sm_30.  */  \
+  "misa=sm_30:-m sm_35"\
+  /* Separator. */ \
+  "; " \
+  /* Catch-all. */ \
+  "misa=*:-m %*"   \
+  /* Separator. */ \
+  "; " \
+  /* Implicit misa=sm_30.  */  \
+  ":-m sm_35"  \
+  "}"
 
 #define TARGET_CPU_CPP_BUILTINS() nvptx_cpu_cpp_builtins ()
 


[wwwdocs][patch] gcc-12: Nvptx updates.

2022-03-30 Thread Tom de Vries via Gcc-patches
[ was: Re: [wwwdocs][patch] gcc-12/changes.html: Document -misa update 
for nvptx ]


On 3/3/22 13:27, Tobias Burnus wrote:

The current wording, https://gcc.gnu.org/gcc-12/changes.html#nvptx ,
is outdated and (now wrongly) encourages the use of -mptx=.

Updated as follows.


I've taken these changes as a base, revised and added some more items.

Any comments?

Also, feel free to instead comment on the full-text version below 
(copied from firefox after opening the page), that might be more readable.


Thanks,
- Tom

-

NVPTX

The -march flag has been added. The -misa flag is now considered an 
alias of the -march flag.


Support for PTX ISA target architectures sm_53, sm_70, sm_75 and 
sm_80 has been added. These can be specified using the -march flag.


The default PTX ISA target architecture has been set back to sm_30, 
to fix support for sm_30 boards.


The -march-map flag has been added. The -march-map value will be 
mapped to a valid -march flag value. For instance, -march-map=sm_50 
maps to -march=sm_35. This can be used to specify that generated code is 
to be executed on a board with at least some specific compute 
capability, without having to know the valid values for the -march flag.


The -mptx flag has been added to specify the PTX ISA version for 
the generated code; permitted values are 3.1 (matches previous GCC 
versions), 6.0, 6.3, and 7.0. If not specified, the used version is the 
minimal version required for -march but at least 6.0.


An mptx-3.1 multilib was added. This allows using older drivers 
which do not support PTX ISA version 6.0.


The new __PTX_SM__ predefined macro allows code to check the PTX 
ISA target architecture being targeted by the compiler.


The new __PTX_ISA_VERSION_MAJOR__ and __PTX_ISA_VERSION_MINOR__ 
predefined macros allow code to check the PTX ISA version being 
targeted by the compiler.
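
To illustrate the -march-map, -mptx and predefined macro items above, here
is a rough C sketch.  It rests on assumptions and is not the compiler's
implementation: the function names are hypothetical, -march-map is assumed
to pick the highest -march value not above the given one (which matches
the sm_50 to sm_35 example), and the per-architecture minimal PTX ISA
versions in the table are assumed values.

```c
#include <stddef.h>

/* -march values listed above, as numbers, in ascending order.  */
static const int march_values[] = { 30, 35, 53, 70, 75, 80 };

/* Assumed minimal PTX ISA version per -march value, encoded as
   major * 10 + minor.  These numbers are an assumption of this sketch.  */
static const int min_ptx[] = { 31, 31, 42, 60, 63, 70 };

#define N_MARCH (sizeof march_values / sizeof march_values[0])

/* Assumed -march-map rule: the highest -march value not above SM.
   Returns -1 if SM is below the lowest -march value.  */
int
march_map (int sm)
{
  int result = -1;
  for (size_t i = 0; i < N_MARCH; i++)
    if (march_values[i] <= sm)
      result = march_values[i];
  return result;
}

/* Version used when -mptx is not specified: the minimal version required
   for MARCH, but at least 6.0.  Returns -1 for an unknown MARCH.  */
int
default_ptx_version (int march)
{
  for (size_t i = 0; i < N_MARCH; i++)
    if (march_values[i] == march)
      return min_ptx[i] < 60 ? 60 : min_ptx[i];
  return -1;
}

/* Compile-time check using the new predefined macros.  Returns 0 when
   the macros are not defined, that is, when not compiling for nvptx.  */
int
ptx_isa_at_least_6_3 (void)
{
#if defined (__PTX_ISA_VERSION_MAJOR__) && defined (__PTX_ISA_VERSION_MINOR__)
  return (__PTX_ISA_VERSION_MAJOR__ > 6
          || (__PTX_ISA_VERSION_MAJOR__ == 6
              && __PTX_ISA_VERSION_MINOR__ >= 3));
#else
  return 0;
#endif
}
```

Under these assumptions, march_map (50) gives 35, default_ptx_version (30)
gives 60 (PTX ISA 6.0), and default_ptx_version (80) gives 70 (PTX ISA
7.0).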


-gcc-12: Nvptx updates.

Co-Authored-By: Tobias Burnus 

---
 htdocs/gcc-12/changes.html | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
index 689feeba..d95f7253 100644
--- a/htdocs/gcc-12/changes.html
+++ b/htdocs/gcc-12/changes.html
@@ -493,12 +493,33 @@ a work-in-progress.
 
 NVPTX
 
+  The -march flag has been added.  The -misa
+flag is now considered an alias of the -march flag.
+  Support for PTX ISA target architectures sm_53,
+sm_70, sm_75 and sm_80 has been
+added.  These can be specified using the -march flag.
+  The default PTX ISA target architecture has been set back
+to sm_30, to fix support for sm_30 boards.
+  The -march-map flag has been added.  The
+-march-map value will be mapped to a valid
+-march flag value.  For instance,
+-march-map=sm_50 maps to -march=sm_35.
+This can be used to specify that generated code is to be executed on a
+board with at least some specific compute capability, without having to
+know the valid values for the -march flag.
   The -mptx flag has been added to specify the PTX ISA version
   for the generated code; permitted values are 3.1
-  (default, matches previous GCC versions) and 6.3.
+  (matches previous GCC versions), 6.0, 6.3,
+  and 7.0. If not specified, the used version is the minimal
+  version required for -march but at least 6.0.
   
+  An mptx-3.1 multilib was added.  This allows using older
+  drivers which do not support PTX ISA version 6.0.
   The new __PTX_SM__ predefined macro allows code to check the
-  compute model being targeted by the compiler.
+  PTX ISA target architecture being targeted by the compiler.
+  The new __PTX_ISA_VERSION_MAJOR__
+  and __PTX_ISA_VERSION_MINOR__ predefined macros allow code
+  to check the PTX ISA version being targeted by the compiler.
 
 
 


Re: [PATCH][nvptx, doc] Update misa and mptx, add march and march-map

2022-03-30 Thread Tom de Vries via Gcc-patches

On 3/30/22 11:02, Tobias Burnus wrote:

On 30.03.22 10:03, Tom de Vries wrote:


On 3/29/22 16:47, Tobias Burnus wrote:

I think it would be useful to have additionally some wording for the
(new in GCC 12/new since today) macros,

[...]

The macro is defined also if the option is not specified, so I think
this formulation is not 100% clear in that aspect.  I've reformulated
to fix that.

Fine. (It was a copy, paste + modify from elsewhere.)


Also, I took out the detail of how the value is determined, since
we're just following __CUDA_ARCH__ rather than defining our own policy.


OK. While I am not sure that it is obvious, also the example makes clear
what value to expect. Combining the two, I concur that the details
aren't required.


Any comments?


LGTM.

Tobias


PS: Regarding the sm_30 -> sm_35 change (earlier in this email thread).
That was not meant to be in the .texi file, but just as an item to
remember when updating the wwwdocs / gcc-12/changes.html document.



I see, I misunderstood then.  FWIW, it's already added to the version in 
my sandbox.



It was/is also not completely clear to me whether there is still this
CUDA 11.x issue of not supporting sm_30 (only sm_35 and higher) or not.


Thanks for reminding me of this issue.


I assume it still exists but is mitigated at
compiler-usage/libgomp-runtime-usage time as PTX ISA now defaults to 6.0
such that CUDA – but shouldn't it still see sm_30 instead of sm_35 in
this case?

If so, I think it will still show up when using either explicitly PTX
ISA 3.1 or when building GCC itself and all of the following holds:
nvptx-tools is installed, CUDA (in a too new version) is installed
(ptxas in $PATH), and the pending nvptx-tools pull request, which ignores
the non-explicit '--verify' when .target sm_xx or PTX ISA .version is not
supported by ptxas, has not been applied.



I don't think the 6.0 default has any influence (and I'll be using 
-mptx=3.1 below to make sure we run into the worst-case behaviour).


Anyway, in absence of an nvptx-tools fix I committed a work-around in 
the compiler:

...
#define ASM_SPEC "%{misa=*:-m %*; :-m sm_35}%{misa=sm_30:--no-verify}"
...

Note that this was before reverting back the default to sm_30, and I 
probably forgot to update this spot when changing the default.


So now, there are effectively two workarounds in place.

This (implicitly using sm_30) passes:
...
$ ( PATH=$PATH:~/cuda/11.6.0/bin; ./gcc.sh ~/hello.c -c -save-temps 
-Wa,--verify -mptx=3.1 )

...
because as we can see with -v, sm_35 is used to verify:
...
 ./build-gcc/gcc/as -m sm_35 --verify -o hello.o hello.s
...

This (explicitly using sm_30) passes:
...
$ ( PATH=$PATH:~/cuda/11.6.0/bin; ./gcc.sh ~/hello.c -c -save-temps 
-march=sm_30 -mptx=3.1 )

...
because as we can see with -v, the --no-verify workaround is triggered:
...
 ./build-gcc/gcc/as -m sm_30 --no-verify -o hello.o hello.s
...

But that one stops working once we use an explicit -Wa,--verify:
...
$ ( PATH=$PATH:~/cuda/11.6.0/bin; ./gcc.sh ~/hello.c -c -save-temps 
-Wa,--verify -march=sm_30 -mptx=3.1  )

ptxas fatal   : Value 'sm_30' is not defined for option 'gpu-name'
nvptx-as: ptxas returned 255 exit status
...

So, it seems using sm_35 to verify sm_30 is the most robust workaround.
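
The selection rule behind that workaround can be sketched in plain C.  This
is only an illustration of the decision, not the spec-string implementation
in nvptx.h; the function name asm_verify_arch is hypothetical, and a NULL
argument is assumed to stand for "no -misa given", that is, the implicit
sm_30 default.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative only: pick the architecture passed to the assembler via -m.
   A missing -misa (NULL) stands for the implicit sm_30 default.  */
const char *
asm_verify_arch (const char *misa)
{
  /* sm_30, whether implicit or explicit, is verified as sm_35.  */
  if (misa == NULL || strcmp (misa, "sm_30") == 0)
    return "sm_35";
  /* Every other architecture is passed through unchanged.  */
  return misa;
}
```

With this rule an explicit -Wa,--verify no longer matters, since ptxas never
sees sm_30 in the first place.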

I'm currently testing attached patch.

Thanks,
- Tom

[nvptx] Fix ASM_SPEC workaround for sm_30

Newer versions of CUDA no longer support sm_30, and nvptx-tools as
currently doesn't handle that gracefully when verifying
( https://github.com/MentorEmbedded/nvptx-tools/issues/30 ).

There's a --no-verify work-around in place in ASM_SPEC, but that one doesn't
work when using -Wa,--verify on the command line.

Use a more robust workaround: verify using sm_35 when misa=sm_30 is specified
(either implicitly or explicitly).

Tested on nvptx.

gcc/ChangeLog:

2022-03-30  Tom de Vries  

	* config/nvptx/nvptx.h (ASM_SPEC): Use "-m sm_35" for -misa=sm_30.

---
 gcc/config/nvptx/nvptx.h | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index 75ac7a666b13..3b06f33032fd 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -29,10 +29,24 @@
 
 #define STARTFILE_SPEC "%{mmainkernel:crt0.o}"
 
-/* Default needs to be in sync with default for misa in nvptx.opt.
-   We add a default here to work around a hard-coded sm_30 default in
-   nvptx-as.  */
-#define ASM_SPEC "%{misa=*:-m %*; :-m sm_35}%{misa=sm_30:--no-verify}"
+/* Newer versions of CUDA no longer support sm_30, and nvptx-tools as
+   currently doesn't handle that gracefully when verifying
+   ( https://github.com/MentorEmbedded/nvptx-tools/issues/30 ).  Work around
+   this by verifying with sm_35 when having misa=sm_30 (either implicitly
+   or explicitly).  */
+#define ASM_SPEC\
+  "%{"		\
+  /* Explicit misa=sm_30.  */			\
+  "misa=sm_30:-m sm_35"\
+  /* Separator.	

Re: [PATCH][nvptx, doc] Update misa and mptx, add march and march-map

2022-03-30 Thread Tom de Vries via Gcc-patches

On 3/29/22 16:47, Tobias Burnus wrote:

On 29.03.22 16:28, Tobias Burnus wrote:


On 29.03.22 15:39, Tom de Vries wrote:

Any comments?


I think it would be useful to have additionally some wording for the
(new in GCC 12/new since today) macros,


Agreed.


i.e. something like:

--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -27546,6 +27546,10 @@
  strings must be lower-case.  Valid ISA strings include @samp{sm_30} and
  @samp{sm_35}.  The default ISA is sm_35.

+This option causes the preprocessor macro @code{__PTX_SM__} to be defined
+to the architecture number multiplied by ten; for instance, for
+@samp{sm_35}, it has the value @samp{350}.
+


The macro is defined also if the option is not specified, so I think 
this formulation is not 100% clear in that aspect.  I've reformulated to 
fix that.


Also, I took out the detail of how the value is determined, since we're 
just following __CUDA_ARCH__ rather than defining our own policy.



  @item -mptx=@var{version-string}
  @opindex mptx
  Generate code for given the specified PTX version (e.g.@: @samp{7.0}).
@@ -27553,6 +27557,10 @@
  @samp{7.0}.  The default PTX version is 6.0, unless a higher minimal
  version is required for specified PTX ISA via option @option{-misa=}.

+This option causes the preprocessor macros @code{__PTX_ISA_VERSION_MAJOR__}
+and @code{__PTX_ISA_VERSION_MINOR__} to be defined; for instance,
+for @samp{3.1} the macros have the values @samp{3} and @samp{1}, respectively.
+


Reformulated this as well.

Any comments?

Thanks,
- Tom

[nvptx, doc] Document predefined macros at march and mptx

Document predefined macros:
- __PTX_SM__ ,
- __PTX_ISA_VERSION_MAJOR__ and
- __PTX_ISA_VERSION_MINOR__ .

gcc/ChangeLog:

2022-03-29  Tom de Vries  

	* doc/invoke.texi (march): Document __PTX_SM__.
	 (mptx): Document __PTX_ISA_VERSION_MAJOR__ and
	 __PTX_ISA_VERSION_MINOR__.

Co-Authored-By: Tobias Burnus 

---
 gcc/doc/invoke.texi | 9 +
 1 file changed, 9 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 43b75132c91b..09715a510b4d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -27547,6 +27547,10 @@ Generate code for the specified PTX ISA target architecture
 @samp{sm_35}, @samp{sm_53}, @samp{sm_70}, @samp{sm_75} and
 @samp{sm_80}.  The default target architecture is sm_30.
 
+This option sets the value of the preprocessor macro
+@code{__PTX_SM__}; for instance, for @samp{sm_35}, it has the value
+@samp{350}.
+
 @item -misa=@var{architecture-string}
 @opindex misa
 Alias of @option{-march=}.
@@ -27566,6 +27570,11 @@ Valid version strings include @samp{3.1}, @samp{6.0}, @samp{6.3}, and
 version is required for specified PTX ISA target architecture via
 option @option{-march=}.
 
+This option sets the values of the preprocessor macros
+@code{__PTX_ISA_VERSION_MAJOR__} and @code{__PTX_ISA_VERSION_MINOR__};
+for instance, for @samp{3.1} the macros have the values @samp{3} and
+@samp{1}, respectively.
+
 @item -mmainkernel
 @opindex mmainkernel
 Link in code for a __main kernel.  This is for stand-alone instead of


Re: [PATCH][nvptx, doc] Update misa and mptx, add march and march-map

2022-03-30 Thread Tom de Vries via Gcc-patches

On 3/29/22 16:28, Tobias Burnus wrote:

Hi Tom,

On 29.03.22 15:39, Tom de Vries wrote:

Any comments?
+(e.g.@: @samp{sm_35}).  Valid architecture strings are @samp{sm_30},
+@samp{sm_35}, @samp{sm_53} @samp{sm_70}, @samp{sm_75} and
+@samp{sm_80}.  The default target architecture is sm_30.


Missing comma (",") between sm_53 and sm_70.



Ack, fixed.


I want to note that the default is now back at sm_30;
for GCC 11 it was changed to sm_35, cf. 
https://gcc.gnu.org/gcc-11/changes.html


I think changes are better described in release notes.

(We also need to update the wwwdocs release notes before the release, but it
can also be done after branching).



Right, I'll follow up on your proposal from beginning of this month.


+@item -march-map=@var{architecture-string}
+@opindex march
+Select the closest available @option{-march=} value that is not more
+capable.  For instance, for @option{-march-map=sm_50} select
+@option{-march=sm_35}, and for @option{-march-map=sm_53} select
+@option{-march=sm_53}.


(Somehow, I am not completely happy with the wording, but, admittedly, I
don't have a better suggestion.)


I feel the same, so committed as is.

Thanks,
- Tom


[committed][nvptx] Add __PTX_ISA_VERSION_{MAJOR,MINOR}__

2022-03-29 Thread Tom de Vries via Gcc-patches
Hi,

Add preprocessor macros __PTX_ISA_VERSION_MAJOR__ and
__PTX_ISA_VERSION_MINOR__.

For the default 6.0, we have:
...
 $ echo | cc1 -E -dD - 2>&1 | grep PTX_ISA_VERSION
 #define __PTX_ISA_VERSION_MAJOR__ 6
 #define __PTX_ISA_VERSION_MINOR__ 0
...
and for 3.1, we have:
...
 $ echo | cc1 -mptx=3.1 -E -dD - 2>&1 | grep PTX_ISA_VERSION
 #define __PTX_ISA_VERSION_MAJOR__ 3
 #define __PTX_ISA_VERSION_MINOR__ 1
...

These can be used to express things like:
...
 #if __PTX_ISA_VERSION_MAJOR__ > 4 \
     || (__PTX_ISA_VERSION_MAJOR__ == 4 && __PTX_ISA_VERSION_MINOR__ >= 1)
   /* Code using %dynamic_smem_size.  */
 #else
   /* Fallback code.  */
 #endif
...
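
A version check of this kind has to treat major and minor together: testing
the two numbers independently would, for instance, reject 6.0 when at least
4.1 is wanted.  A minimal sketch of a reusable helper follows.  The macro
name PTX_ISA_VERSION_AT_LEAST and the function have_ptx_4_1 are hypothetical
and not provided by GCC, and the fallback definitions exist only so that the
snippet also compiles with a non-nvptx compiler.

```c
/* Hypothetical fallback for host builds; on nvptx these are predefined.  */
#ifndef __PTX_ISA_VERSION_MAJOR__
#define __PTX_ISA_VERSION_MAJOR__ 6
#define __PTX_ISA_VERSION_MINOR__ 0
#endif

/* True if the targeted PTX ISA version is at least MAJOR.MINOR.  */
#define PTX_ISA_VERSION_AT_LEAST(major, minor) \
  (__PTX_ISA_VERSION_MAJOR__ > (major) \
   || (__PTX_ISA_VERSION_MAJOR__ == (major) \
       && __PTX_ISA_VERSION_MINOR__ >= (minor)))

/* Return 1 if code requiring PTX ISA 4.1 or newer was compiled in.  */
int
have_ptx_4_1 (void)
{
#if PTX_ISA_VERSION_AT_LEAST (4, 1)
  return 1;
#else
  return 0;
#endif
}
```

The macro expands to an integer constant expression, so it can be used both
in #if directives and in ordinary C conditions.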

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx] Add __PTX_ISA_VERSION_{MAJOR,MINOR}__

gcc/ChangeLog:

2022-03-29  Tom de Vries  

PR target/104857
* config/nvptx/nvptx-c.cc (nvptx_cpu_cpp_builtins): Emit
__PTX_ISA_VERSION_MAJOR__ and __PTX_ISA_VERSION_MINOR__.
* config/nvptx/nvptx.cc (ptx_version_to_number): New function.
* config/nvptx/nvptx-protos.h (ptx_version_to_number): Declare.

gcc/testsuite/ChangeLog:

2022-03-29  Tom de Vries  

PR target/104857
* gcc.target/nvptx/ptx31.c: New test.
* gcc.target/nvptx/ptx60.c: New test.
* gcc.target/nvptx/ptx63.c: New test.
* gcc.target/nvptx/ptx70.c: New test.

---
 gcc/config/nvptx/nvptx-c.cc|  9 +
 gcc/config/nvptx/nvptx-protos.h|  1 +
 gcc/config/nvptx/nvptx.cc  | 22 ++
 gcc/testsuite/gcc.target/nvptx/ptx31.c | 10 ++
 gcc/testsuite/gcc.target/nvptx/ptx60.c | 10 ++
 gcc/testsuite/gcc.target/nvptx/ptx63.c | 10 ++
 gcc/testsuite/gcc.target/nvptx/ptx70.c | 10 ++
 7 files changed, 72 insertions(+)

diff --git a/gcc/config/nvptx/nvptx-c.cc b/gcc/config/nvptx/nvptx-c.cc
index 02f75625064..f060a8ab1d4 100644
--- a/gcc/config/nvptx/nvptx-c.cc
+++ b/gcc/config/nvptx/nvptx-c.cc
@@ -49,5 +49,14 @@ nvptx_cpu_cpp_builtins (void)
 #include "nvptx-sm.def"
 #undef NVPTX_SM
   cpp_define (parse_in, ptx_sm);
+
+  {
+unsigned major
+  = ptx_version_to_number ((ptx_version)ptx_version_option, true);
+unsigned minor
+  = ptx_version_to_number ((ptx_version)ptx_version_option, false);
+cpp_define_formatted (parse_in, "__PTX_ISA_VERSION_MAJOR__=%u", major);
+cpp_define_formatted (parse_in, "__PTX_ISA_VERSION_MINOR__=%u", minor);
+  }
 }
 
diff --git a/gcc/config/nvptx/nvptx-protos.h b/gcc/config/nvptx/nvptx-protos.h
index ca0a87ee4bd..dfa08ec8319 100644
--- a/gcc/config/nvptx/nvptx-protos.h
+++ b/gcc/config/nvptx/nvptx-protos.h
@@ -44,6 +44,7 @@ extern void nvptx_cpu_cpp_builtins (void);
 extern void nvptx_register_pragmas (void);
 extern unsigned int nvptx_data_alignment (const_tree, unsigned int);
 extern void nvptx_asm_output_def_from_decls (FILE *, tree, tree);
+extern unsigned int ptx_version_to_number (enum ptx_version, bool);
 
 #ifdef RTX_CODE
 extern void nvptx_expand_oacc_fork (unsigned);
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 87efc23bd96..e4297e2d6c3 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -272,6 +272,28 @@ ptx_version_to_string (enum ptx_version v)
 }
 }
 
+unsigned int
+ptx_version_to_number (enum ptx_version v, bool major_p)
+{
+  switch (v)
+{
+case PTX_VERSION_3_0:
+  return major_p ? 3 : 0;
+case PTX_VERSION_3_1:
+  return major_p ? 3 : 1;
+case PTX_VERSION_4_2:
+  return major_p ? 4 : 2;
+case PTX_VERSION_6_0:
+  return major_p ? 6 : 0;
+case PTX_VERSION_6_3:
+  return major_p ? 6 : 3;
+case PTX_VERSION_7_0:
+  return major_p ? 7 : 0;
+default:
+  gcc_unreachable ();
+}
+}
+
 static const char *
 sm_version_to_string (enum ptx_isa sm)
 {
diff --git a/gcc/testsuite/gcc.target/nvptx/ptx31.c 
b/gcc/testsuite/gcc.target/nvptx/ptx31.c
new file mode 100644
index 000..46b5e1ba405
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/ptx31.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-march=sm_30 -mptx=3.1" } */
+
+#if __PTX_ISA_VERSION_MAJOR__ != 3
+#error wrong value for __PTX_ISA_VERSION_MAJOR__
+#endif
+
+#if __PTX_ISA_VERSION_MINOR__ != 1
+#error wrong value for __PTX_ISA_VERSION_MINOR__
+#endif
diff --git a/gcc/testsuite/gcc.target/nvptx/ptx60.c 
b/gcc/testsuite/gcc.target/nvptx/ptx60.c
new file mode 100644
index 000..267a9c64f1e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/ptx60.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-march=sm_30 -mptx=6.0" } */
+
+#if __PTX_ISA_VERSION_MAJOR__ != 6
+#error wrong value for __PTX_ISA_VERSION_MAJOR__
+#endif
+
+#if __PTX_ISA_VERSION_MINOR__ != 0
+#error wrong value for __PTX_ISA_VERSION_MINOR__
+#endif
diff --git a/gcc/testsuite/gcc.target/nvptx/ptx63.c 
b/gcc/testsuite/gcc.target/nvptx/ptx63.c
new file mode 100644
index 000..13d02e132ae
--- /d

[PATCH][nvptx, doc] Update misa and mptx, add march and march-map

2022-03-29 Thread Tom de Vries via Gcc-patches
Hi,

Update nvptx documentation:
- Use meaningful terms: "PTX ISA target architecture" and "PTX ISA version".
- Remove invalid claim that "ISA strings must be lower-case".
- Add missing sm_xx entries.
- Fix default ISA.
- Add march, copying misa doc.
- Declare misa an march alias.
- Add march-map.
- Fix "for given the specified" typo.

Any comments?

Thanks,
- Tom

[nvptx, doc] Update misa and mptx, add march and march-map

gcc/ChangeLog:

2022-03-29  Tom de Vries  

* doc/invoke.texi (misa, mptx): Update.
(march, march-map): Add.

---
 gcc/doc/invoke.texi | 27 ---
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 554e04ecbf3a..eb2fe959e600 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -27540,18 +27540,31 @@ These options are defined for Nvidia PTX:
 Ignored, but preserved for backward compatibility.  Only 64-bit ABI is
 supported.
 
-@item -misa=@var{ISA-string}
+@item -march=@var{architecture-string}
 @opindex march
-Generate code for given the specified PTX ISA (e.g.@: @samp{sm_35}).  ISA
-strings must be lower-case.  Valid ISA strings include @samp{sm_30} and
-@samp{sm_35}.  The default ISA is sm_35.
+Generate code for the specified PTX ISA target architecture
+(e.g.@: @samp{sm_35}).  Valid architecture strings are @samp{sm_30},
+@samp{sm_35}, @samp{sm_53} @samp{sm_70}, @samp{sm_75} and
+@samp{sm_80}.  The default target architecture is sm_30.
+
+@item -misa=@var{architecture-string}
+@opindex misa
+Alias of @option{-march=}.
+
+@item -march-map=@var{architecture-string}
+@opindex march
+Select the closest available @option{-march=} value that is not more
+capable.  For instance, for @option{-march-map=sm_50} select
+@option{-march=sm_35}, and for @option{-march-map=sm_53} select
+@option{-march=sm_53}.
 
 @item -mptx=@var{version-string}
 @opindex mptx
-Generate code for given the specified PTX version (e.g.@: @samp{7.0}).
+Generate code for the specified PTX ISA version (e.g.@: @samp{7.0}).
 Valid version strings include @samp{3.1}, @samp{6.0}, @samp{6.3}, and
-@samp{7.0}.  The default PTX version is 6.0, unless a higher minimal
-version is required for specified PTX ISA via option @option{-misa=}.
+@samp{7.0}.  The default PTX version is 6.0, unless a higher version
+is required for specified PTX ISA target architecture via option
+@option{-march=}.
 
 @item -mmainkernel
 @opindex mmainkernel


[committed][nvptx] Update help text for m64

2022-03-29 Thread Tom de Vries via Gcc-patches
Hi,

In the docs we have for m64:
...
Ignored, but preserved for backward compatibility.  Only 64-bit ABI is
supported.
...

But with --target-help, we have instead:
...
$ gcc --target-help
  ...
  -m64Generate code for a 64-bit ABI.
...
which could be interpreted as meaning that generating code for a 32-bit ABI is
still possible.

Fix this by instead emitting the same text as in the docs:
...
  -m64Ignored, but preserved for backward compatibility.  Only 64-bit
  ABI is supported.
...

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx] Update help text for m64

gcc/ChangeLog:

2022-03-29  Tom de Vries  

* config/nvptx/nvptx.opt (m64): Update help text to reflect that it
is ignored.

---
 gcc/config/nvptx/nvptx.opt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index 58eddeeabf4..55a10572dd1 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -25,7 +25,8 @@
 
 m64
 Target RejectNegative Mask(ABI64)
-Generate code for a 64-bit ABI.
+Ignored, but preserved for backward compatibility.  Only 64-bit ABI is
+supported.
 
 mmainkernel
 Target RejectNegative


[committed][nvptx] Add march-map

2022-03-29 Thread Tom de Vries via Gcc-patches
Hi,

Say we have an sm_50 board, and we want to run a benchmark using the highest
possible march setting.

Currently there's march=sm_30, march=sm_35, march=sm_53, but no march=sm_50.

So, we'd need to pick march=sm_35.

Likewise, for a test script that handles multiple boards, we'd need a mapping
from native board sm_xx to march, which might have to be updated with newer
gcc releases.

Add an option march-map, such that we can just specify march-map=sm_50, and
let the compiler map this to the appropriate march.

The option is implemented as a list of aliases, such that we have a somewhat
lengthy list (17 lines in total):
...
$ gcc --help=target
  ...
  -march-map=sm_30Same as -misa=sm_30.
  -march-map=sm_32Same as -misa=sm_30.
  ...
  -march-map=sm_87Same as -misa=sm_80.
  -march-map=sm_90Same as -misa=sm_80.
...

This implementation was chosen in the hope that it'll be easier if
we end up with some misa multilib.

It would be nice to have the mapping list generated from an updated
nvptx-sm.def, but for now it's spelled out in nvptx.opt.
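
The mapping that the alias list spells out follows one rule: pick the highest
supported -march value that does not exceed the board's sm_xx.  A minimal C
sketch of that rule is given below.  It is not how nvptx.opt implements the
option (that is a plain list of aliases); the names supported_sm and
march_map are hypothetical, and architectures are written as bare numbers
(35 for sm_35).

```c
#include <stddef.h>

/* -march values accepted by GCC 12 for nvptx, as numbers, in ascending
   order.  The loop below relies on that ordering.  */
static const int supported_sm[] = { 30, 35, 53, 70, 75, 80 };

/* Return the largest supported value that is <= SM, or -1 if SM is
   below every supported value.  */
int
march_map (int sm)
{
  int best = -1;
  for (size_t i = 0; i < sizeof supported_sm / sizeof supported_sm[0]; i++)
    if (supported_sm[i] <= sm)
      best = supported_sm[i];
  return best;
}
```

For example, march_map (50) yields 35 and march_map (90) yields 80, matching
the -march-map=sm_50 and -march-map=sm_90 aliases listed above.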

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx] Add march-map

gcc/ChangeLog:

2022-03-29  Tom de Vries  

PR target/104714
* config/nvptx/nvptx.opt (march-map=*): Add aliases.

gcc/testsuite/ChangeLog:

2022-03-29  Tom de Vries  

PR target/104714
* gcc.target/nvptx/march-map.c: New test.

---
 gcc/config/nvptx/nvptx.opt | 51 ++
 gcc/testsuite/gcc.target/nvptx/march-map.c |  5 +++
 2 files changed, 56 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index b5d0170e9e9..58eddeeabf4 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -60,6 +60,57 @@ march=
 Target RejectNegative Joined Alias(misa=)
 Alias:
 
+march-map=sm_30
+Target RejectNegative Alias(misa=,sm_30)
+
+march-map=sm_32
+Target RejectNegative Alias(misa=,sm_30)
+
+march-map=sm_35
+Target RejectNegative Alias(misa=,sm_35)
+
+march-map=sm_37
+Target RejectNegative Alias(misa=,sm_35)
+
+march-map=sm_50
+Target RejectNegative Alias(misa=,sm_35)
+
+march-map=sm_52
+Target RejectNegative Alias(misa=,sm_35)
+
+march-map=sm_53
+Target RejectNegative Alias(misa=,sm_53)
+
+march-map=sm_60
+Target RejectNegative Alias(misa=,sm_53)
+
+march-map=sm_61
+Target RejectNegative Alias(misa=,sm_53)
+
+march-map=sm_62
+Target RejectNegative Alias(misa=,sm_53)
+
+march-map=sm_70
+Target RejectNegative Alias(misa=,sm_70)
+
+march-map=sm_72
+Target RejectNegative Alias(misa=,sm_70)
+
+march-map=sm_75
+Target RejectNegative Alias(misa=,sm_75)
+
+march-map=sm_80
+Target RejectNegative Alias(misa=,sm_80)
+
+march-map=sm_86
+Target RejectNegative Alias(misa=,sm_80)
+
+march-map=sm_87
+Target RejectNegative Alias(misa=,sm_80)
+
+march-map=sm_90
+Target RejectNegative Alias(misa=,sm_80)
+
 Enum
 Name(ptx_version) Type(int)
 Known PTX ISA versions (for use with the -mptx= option):
diff --git a/gcc/testsuite/gcc.target/nvptx/march-map.c 
b/gcc/testsuite/gcc.target/nvptx/march-map.c
new file mode 100644
index 000..00838e55fc0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/march-map.c
@@ -0,0 +1,5 @@
+/* { dg-options "-march-map=sm_50" } */
+
+#include "main.c"
+
+/* { dg-final { scan-assembler-times "\\.target\tsm_35" 1 } } */


[committed][nvptx] Add march alias for misa

2022-03-29 Thread Tom de Vries via Gcc-patches
Hi,

The target option misa has the following description:
...
$ gcc --target-help 2>&1 | grep misa
  -misa=  Specify the PTX ISA target architecture to use.
...

The name misa is somewhat poorly chosen.  It suggests that, in
-misa=sm_30, sm_30 is the name of a specific Instruction Set Architecture.
Instead, sm_30 is the name of a specific target architecture in the generic
PTX Instruction Set Architecture.

Furthermore, there's mptx, which also has ISA in the description:
...
  -mptx=  Specify the PTX ISA version to use.
...

Add the more intuitive alias march for misa:
...
$ gcc --target-help 2>&1 | grep march
  -march= Alias:  Same as -misa=.
...

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx] Add march alias for misa

gcc/ChangeLog:

2022-03-29  Tom de Vries  

* config/nvptx/nvptx.opt (march): Add alias of misa.

gcc/testsuite/ChangeLog:

2022-03-29  Tom de Vries  

* gcc.target/nvptx/main.c: New test.
* gcc.target/nvptx/march.c: New test.

---
 gcc/config/nvptx/nvptx.opt | 4 
 gcc/testsuite/gcc.target/nvptx/main.c  | 7 +++
 gcc/testsuite/gcc.target/nvptx/march.c | 5 +
 3 files changed, 16 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index 1f684ed8860..b5d0170e9e9 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -56,6 +56,10 @@ misa=
 Target RejectNegative ToLower Joined Enum(ptx_isa) Var(ptx_isa_option) 
Init(PTX_ISA_SM30)
 Specify the PTX ISA target architecture to use.
 
+march=
+Target RejectNegative Joined Alias(misa=)
+Alias:
+
 Enum
 Name(ptx_version) Type(int)
 Known PTX ISA versions (for use with the -mptx= option):
diff --git a/gcc/testsuite/gcc.target/nvptx/main.c 
b/gcc/testsuite/gcc.target/nvptx/main.c
new file mode 100644
index 000..3af2b575842
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/main.c
@@ -0,0 +1,7 @@
+/* { dg-do link } */
+
+int
+main (void)
+{
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/nvptx/march.c 
b/gcc/testsuite/gcc.target/nvptx/march.c
new file mode 100644
index 000..ec91f21c903
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/march.c
@@ -0,0 +1,5 @@
+/* { dg-options "-march=sm_30"} */
+
+#include "main.c"
+
+/* { dg-final { scan-assembler-times "\\.target\tsm_30" 1 } } */


[committed][nvptx] Improve help description of misa and mptx

2022-03-28 Thread Tom de Vries via Gcc-patches
Hi,

Currently we have:
...
$ gcc --target-help 2>&1 | egrep "misa|mptx"
  -misa=  Specify the version of the ptx ISA to use.
  -mptx=  Specify the version of the ptx version to use.
  Known PTX ISA versions (for use with the -misa= option):
  Known PTX versions (for use with the -mptx= option):
...

As reported in PR104818, the "version of the ptx version" doesn't make much
sense.

Furthermore, the description of misa (and 'Known ISA versions') is misleading
because it does not specify the version of the PTX ISA, but rather the PTX ISA
target architecture.

Fix this by printing instead:
...
$ gcc --target-help 2>&1 | egrep "misa|mptx"
  -misa=  Specify the PTX ISA target architecture to use.
  -mptx=  Specify the PTX ISA version to use.
  Known PTX ISA target architectures (for use with the -misa= option):
  Known PTX ISA versions (for use with the -mptx= option):
...

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx] Improve help description of misa and mptx

gcc/ChangeLog:

2022-03-28  Tom de Vries  

PR target/104818
* config/nvptx/gen-opt.sh (ptx_isa): Improve help text.
* config/nvptx/nvptx-gen.opt: Regenerate.
* config/nvptx/nvptx.opt (misa, mptx, ptx_version): Improve help text.
* config/nvptx/t-nvptx (s-nvptx-gen-opt): Add missing dependency on
gen-opt.sh.

---
 gcc/config/nvptx/gen-opt.sh| 2 +-
 gcc/config/nvptx/nvptx-gen.opt | 2 +-
 gcc/config/nvptx/nvptx.opt | 6 +++---
 gcc/config/nvptx/t-nvptx   | 3 ++-
 4 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/gcc/config/nvptx/gen-opt.sh b/gcc/config/nvptx/gen-opt.sh
index 5248ed25090c..ba048891a499 100644
--- a/gcc/config/nvptx/gen-opt.sh
+++ b/gcc/config/nvptx/gen-opt.sh
@@ -44,7 +44,7 @@ echo
 cat < tmp-nvptx-gen.opt
$(SHELL) $(srcdir)/../move-if-change \


Re: [PATCH][libgomp, testsuite] Fix hardcoded libexec in plugin/configfrag.ac

2022-03-28 Thread Tom de Vries via Gcc-patches

On 3/28/22 14:04, Richard Biener wrote:

On Mon, 28 Mar 2022, Andreas Schwab wrote:


On Mär 28 2022, Richard Biener via Gcc-patches wrote:


OK in principle, but I have no idea on how portable

$(libexecdir:\$(exec_prefix)/%=%)

is going to be?


We already require GNU make, don't we?


We should aim for POSIX shell compatibility here, whatever that
exactly is.


It's not a shell construct.


Ah, it's only substituted into Makefile.in - in that case yes,
we already require GNU make.  If it's evaluated by that and not
a subshell of it then it should be indeed fine.


Ack, then committed the first version.

Thanks,
- Tom


Re: [PATCH][libgomp, testsuite] Fix hardcoded libexec in plugin/configfrag.ac

2022-03-28 Thread Tom de Vries via Gcc-patches

On 3/28/22 10:49, Richard Biener wrote:

On Mon, 28 Mar 2022, Tom de Vries wrote:


Hi,

When building an nvptx offloading configuration on openSUSE Leap 15.3, the
site script /usr/share/site/x86_64-unknown-linux-gnu is activated, setting
libexecdir to ${exec_prefix}/lib rather than ${exec_prefix}/libexec:
...
| # If user did not specify libexecdir, set the correct target:
| # Nor FHS nor openSUSE allow prefix/libexec. Let's default to prefix/lib.
|
| if test "$libexecdir" = '${exec_prefix}/libexec' ; then
|   libexecdir='${exec_prefix}/lib'
| fi
...

However, in libgomp/plugin/configfrag.ac we hardcode libexec:
...
 # Configure additional search paths.
 if test x"$tgt_dir" != x; then
   offload_additional_options="$offload_additional_options \
 -B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) \
-B$tgt_dir/bin"
...

Fix this by using /$(libexecdir:\$(exec_prefix)/%=%)/ instead of /libexec/.

Tested on x86_64-linux with nvptx accelerator.

OK for trunk?


OK in principle, but I have no idea on how portable

$(libexecdir:\$(exec_prefix)/%=%)

is going to be?  We should aim for POSIX shell compatibility here,
whatever that exactly is.


I tried to avoid this construct by using shell variable substitution, 
but then I end up using $(shell ...) instead, I'm not sure if that is 
any better.


Thanks,
- Tom
[libgomp, testsuite] Fix hardcoded libexec in plugin/configfrag.ac

When building an nvptx offloading configuration on openSUSE Leap 15.3, the
site script /usr/share/site/x86_64-unknown-linux-gnu is activated, setting
libexecdir to ${exec_prefix}/lib rather than ${exec_prefix}/libexec:
...
| # If user did not specify libexecdir, set the correct target:
| # Nor FHS nor openSUSE allow prefix/libexec. Let's default to prefix/lib.
|
| if test "$libexecdir" = '${exec_prefix}/libexec' ; then
|   libexecdir='${exec_prefix}/lib'
| fi
...

However, in libgomp/plugin/configfrag.ac we hardcode libexec:
...
# Configure additional search paths.
if test x"$tgt_dir" != x; then
  offload_additional_options="$offload_additional_options \
-B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) \
	-B$tgt_dir/bin"
...

Fix this by using:
...
  /$(shell dir=$(libexecdir); echo $${dir\#$(exec_prefix)/})/
...
instead of /libexec/.

Tested on x86_64-linux with nvptx accelerator.

libgomp/ChangeLog:

2022-03-28  Tom de Vries  

	* plugin/configfrag.ac: Use /$(libexecdir:\$(exec_prefix)/%=%)/
	instead of /libexec/.
	* configure: Regenerate.

---
 libgomp/configure| 2 +-
 libgomp/plugin/configfrag.ac | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libgomp/configure b/libgomp/configure
index a73a6d44003..081bf2d64c0 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -15419,7 +15419,7 @@ rm -f core conftest.err conftest.$ac_objext \
 fi
 # Configure additional search paths.
 if test x"$tgt_dir" != x; then
-  offload_additional_options="$offload_additional_options -B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/bin"
+  offload_additional_options="$offload_additional_options -B$tgt_dir/\$(shell dir=\$(libexecdir); echo \$\${dir\#\$(exec_prefix)/})/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/bin"
   offload_additional_lib_paths="$offload_additional_lib_paths:$tgt_dir/lib64:$tgt_dir/lib:$tgt_dir/lib32"
 else
   offload_additional_options="$offload_additional_options -B\$(libexecdir)/gcc/\$(target_alias)/\$(gcc_version) -B\$(bindir)"
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
index da573bd8387..1eef2907bc2 100644
--- a/libgomp/plugin/configfrag.ac
+++ b/libgomp/plugin/configfrag.ac
@@ -254,7 +254,7 @@ if test x"$enable_offload_targets" != x; then
 fi
 # Configure additional search paths.
 if test x"$tgt_dir" != x; then
-  offload_additional_options="$offload_additional_options -B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/bin"
+  offload_additional_options="$offload_additional_options -B$tgt_dir/\$(shell dir=\$(libexecdir); echo \$\${dir\#\$(exec_prefix)/})/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/bin"
   offload_additional_lib_paths="$offload_additional_lib_paths:$tgt_dir/lib64:$tgt_dir/lib:$tgt_dir/lib32"
 else
   offload_additional_options="$offload_additional_options -B\$(libexecdir)/gcc/\$(target_alias)/\$(gcc_version) -B\$(bindir)"


[PATCH][libgomp, testsuite] Fix hardcoded libexec in plugin/configfrag.ac

2022-03-28 Thread Tom de Vries via Gcc-patches
Hi,

When building an nvptx offloading configuration on openSUSE Leap 15.3, the
site script /usr/share/site/x86_64-unknown-linux-gnu is activated, setting
libexecdir to ${exec_prefix}/lib rather than ${exec_prefix}/libexec:
...
| # If user did not specify libexecdir, set the correct target:
| # Nor FHS nor openSUSE allow prefix/libexec. Let's default to prefix/lib.
|
| if test "$libexecdir" = '${exec_prefix}/libexec' ; then
|   libexecdir='${exec_prefix}/lib'
| fi
...

However, in libgomp/plugin/configfrag.ac we hardcode libexec:
...
# Configure additional search paths.
if test x"$tgt_dir" != x; then
  offload_additional_options="$offload_additional_options \
-B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) \
-B$tgt_dir/bin"
...

Fix this by using /$(libexecdir:\$(exec_prefix)/%=%)/ instead of /libexec/.

Tested on x86_64-linux with nvptx accelerator.

OK for trunk?

Thanks,
- Tom

[libgomp, testsuite] Fix hardcoded libexec in plugin/configfrag.ac

libgomp/ChangeLog:

2022-03-28  Tom de Vries  

* plugin/configfrag.ac: Use /$(libexecdir:\$(exec_prefix)/%=%)/
instead of /libexec/.
* configure: Regenerate.

---
 libgomp/configure            | 2 +-
 libgomp/plugin/configfrag.ac | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libgomp/configure b/libgomp/configure
index a73a6d44003..bdbe3d142d1 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -15419,7 +15419,7 @@ rm -f core conftest.err conftest.$ac_objext \
 fi
 # Configure additional search paths.
 if test x"$tgt_dir" != x; then
-  offload_additional_options="$offload_additional_options -B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/bin"
+  offload_additional_options="$offload_additional_options -B$tgt_dir/\$(libexecdir:\$(exec_prefix)/%=%)/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/bin"
   offload_additional_lib_paths="$offload_additional_lib_paths:$tgt_dir/lib64:$tgt_dir/lib:$tgt_dir/lib32"
 else
   offload_additional_options="$offload_additional_options -B\$(libexecdir)/gcc/\$(target_alias)/\$(gcc_version) -B\$(bindir)"
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
index da573bd8387..9f9d0a7f08c 100644
--- a/libgomp/plugin/configfrag.ac
+++ b/libgomp/plugin/configfrag.ac
@@ -254,7 +254,7 @@ if test x"$enable_offload_targets" != x; then
 fi
 # Configure additional search paths.
 if test x"$tgt_dir" != x; then
-  offload_additional_options="$offload_additional_options -B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/bin"
+  offload_additional_options="$offload_additional_options -B$tgt_dir/\$(libexecdir:\$(exec_prefix)/%=%)/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/bin"
   offload_additional_lib_paths="$offload_additional_lib_paths:$tgt_dir/lib64:$tgt_dir/lib:$tgt_dir/lib32"
 else
   offload_additional_options="$offload_additional_options -B\$(libexecdir)/gcc/\$(target_alias)/\$(gcc_version) -B\$(bindir)"


Re: [PATCH][libgomp, testsuite] Scale down some OpenACC test-cases

2022-03-25 Thread Tom de Vries via Gcc-patches

On 3/25/22 13:35, Thomas Schwinge wrote:

Hi!

On 2022-03-25T13:08:52+0100, Tom de Vries  wrote:

On 3/25/22 11:04, Tobias Burnus wrote:

On 25.03.22 10:27, Jakub Jelinek via Gcc-patches wrote:

On Fri, Mar 25, 2022 at 10:18:49AM +0100, Tom de Vries wrote:

[...]
Fix this by scaling down the failing test-cases.
Tested on x86_64-linux with nvptx accelerator.
[...]

Will defer to Thomas, as it is a purely OpenACC change.

One way to do it is
/* { dg-additional-options "-DEXPENSIVE" { target run_expensive_tests } } */
and using
#ifdef EXPENSIVE
[...]

For the Fortran test it would mean .F90 extension though...


Alternatively, use the "-cpp" flag in 'dg-additional-options', which also
enables C preprocessing in gfortran.


Ack, updated patch accordingly.


Not sure if this additional "complexity" is really necessary here: as far
as I can tell, there's no actual rationale behind the original number of
iterations, so it seems fine to unconditionally scale them down.  I'd
thus move forward with your original patch -- but won't object to the
'run_expensive_tests' variant either; the latter is already used in a
handful of other libgomp test cases.



Ack, committed the GCC_TEST_RUN_EXPENSIVE variant.

Thanks,
- Tom


Re: [PATCH][libgomp, testsuite] Scale down some OpenACC test-cases

2022-03-25 Thread Tom de Vries via Gcc-patches

On 3/25/22 11:04, Tobias Burnus wrote:

On 25.03.22 10:27, Jakub Jelinek via Gcc-patches wrote:

On Fri, Mar 25, 2022 at 10:18:49AM +0100, Tom de Vries wrote:

[...]
Fix this by scaling down the failing test-cases.
Tested on x86_64-linux with nvptx accelerator.
[...]

Will defer to Thomas, as it is a purely OpenACC change.

One way to do it is
/* { dg-additional-options "-DEXPENSIVE" { target run_expensive_tests } } */

and using
#ifdef EXPENSIVE
[...]

For the Fortran test it would mean .F90 extension though...


Alternatively, use the "-cpp" flag in 'dg-additional-options', which also
enables C preprocessing in gfortran.



Ack, updated patch accordingly.

Thanks,
- Tom

[libgomp, testsuite] Scale down some OpenACC test-cases

When a display manager is running on an nvidia card, all CUDA kernel launches
get a 5-second watchdog timer.

Consequently, when running the libgomp testsuite with nvptx accelerator and
GOMP_NVPTX_JIT=-O0 we run into a few FAILs like this:
...
libgomp: cuStreamSynchronize error: the launch timed out and was terminated
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \
  -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 \
  execution test
...

Fix this by scaling down the failing test-cases by default, and reverting to
the original behaviour for GCC_TEST_RUN_EXPENSIVE=1.
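
The scaling pattern can be shown in isolation. The following is a minimal
host-only sketch, not one of the test-cases: count_iterations is a made-up
function that only mirrors the loop bounds used in parallel-dims.c, with no
OpenACC involved.

```c
#include <assert.h>

/* Build with -DEXPENSIVE to get the original iteration count back.  */
#ifdef EXPENSIVE
#define N 100
#else
#define N 50
#endif

/* Count the iterations of a loop shaped like the ones in the test-case.
   The loop runs from N * gangs_actual down to -N * gangs_actual + 1.  */
int
count_iterations (int gangs_actual)
{
  assert (gangs_actual >= 0);

  int count = 0;
  for (int i = N * gangs_actual; i > -N * gangs_actual; --i)
    ++count;
  return count;
}
```

The loop body executes 2 * N * gangs_actual times, so halving N halves the
work done per kernel launch.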

Tested on x86_64-linux with nvptx accelerator.

libgomp/ChangeLog:

2022-03-25  Tom de Vries  

	PR libgomp/105042
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Reduce
	execution time.
	* testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Same.
	* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Same.

---
 .../libgomp.oacc-c-c++-common/parallel-dims.c  | 45 +-
 .../libgomp.oacc-c-c++-common/vred2d-128.c |  6 +++
 .../libgomp.oacc-fortran/parallel-dims.f90 | 18 +++--
 3 files changed, 46 insertions(+), 23 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
index b1cfe37df8a..6798e23ef70 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
@@ -1,6 +1,8 @@
 /* OpenACC parallelism dimensions clauses: num_gangs, num_workers,
vector_length.  */
 
+/* { dg-additional-options "-DEXPENSIVE" { target run_expensive_tests } } */
+
 /* { dg-additional-options "--param=openacc-kernels=decompose" } */
 
 /* { dg-additional-options "-fopt-info-all-omp" }
@@ -49,6 +51,11 @@ static int acc_vector ()
   return __builtin_goacc_parlevel_id (GOMP_DIM_VECTOR);
 }
 
+#ifdef EXPENSIVE
+#define N 100
+#else
+#define N 50
+#endif
 
 int main ()
 {
@@ -76,7 +83,7 @@ int main ()
 {
   /* We're actually executing with num_gangs (1).  */
   gangs_actual = 1;
-  for (int i = 100 * gangs_actual; i > -100 * gangs_actual; --i)
+  for (int i = N * gangs_actual; i > -N * gangs_actual; --i)
 	{
 	  gangs_min = gangs_max = acc_gang ();
 	  workers_min = workers_max = acc_worker ();
@@ -115,7 +122,7 @@ int main ()
   reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
   /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
   /* { dg-optimized {assigned OpenACC gang loop parallelism} {} { target *-*-* } l_loop_i$c_loop_i } */
-  for (int i = 100 * gangs_actual; i > -100 * gangs_actual; --i)
+  for (int i = N * gangs_actual; i > -N * gangs_actual; --i)
 	{
 	  gangs_min = gangs_max = acc_gang ();
 	  workers_min = workers_max = acc_worker ();
@@ -154,7 +161,7 @@ int main ()
   reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
   /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
   /* { dg-optimized {assigned OpenACC worker loop parallelism} {} { target *-*-* } l_loop_i$c_loop_i } */
-  for (int i = 100 * workers_actual; i > -100 * workers_actual; --i)
+  for (int i = N * workers_actual; i > -N * workers_actual; --i)
 	{
 	  gangs_min = gangs_max = acc_gang ();
 	  workers_min = workers_max = acc_worker ();
@@ -200,7 +207,7 @@ int main ()
   reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
   /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
   /* { dg-optimized {assigned OpenACC vector loop parallelism} {} { target *-*-* } l_loop_i$c_loop_i } */
-  for (int i = 100 * vectors_actual; i > -100 * vectors_actual; --

[PATCH][libgomp, testsuite] Scale down some OpenACC test-cases

2022-03-25 Thread Tom de Vries via Gcc-patches
Hi,

When a display manager is running on an nvidia card, all CUDA kernel launches
get a 5-second watchdog timer.

Consequently, when running the libgomp testsuite with nvptx accelerator and
GOMP_NVPTX_JIT=-O0 we run into a few FAILs like this:
...
libgomp: cuStreamSynchronize error: the launch timed out and was terminated
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \
  -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 \
  execution test
...

Fix this by scaling down the failing test-cases.

Tested on x86_64-linux with nvptx accelerator.

OK for trunk?

Thanks,
- Tom

[libgomp, testsuite] Scale down some OpenACC test-cases

libgomp/ChangeLog:

2022-03-25  Tom de Vries  

PR libgomp/105042
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Reduce
execution time.
* testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Same.
* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Same.

---
 .../libgomp.oacc-c-c++-common/parallel-dims.c  | 39 +++---
 .../libgomp.oacc-c-c++-common/vred2d-128.c |  2 +-
 .../libgomp.oacc-fortran/parallel-dims.f90 | 10 +++---
 3 files changed, 27 insertions(+), 24 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
index b1cfe37df8a..d9e4bd0d75f 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
@@ -49,6 +49,7 @@ static int acc_vector ()
   return __builtin_goacc_parlevel_id (GOMP_DIM_VECTOR);
 }
 
+#define N 50
 
 int main ()
 {
@@ -76,7 +77,7 @@ int main ()
 {
   /* We're actually executing with num_gangs (1).  */
   gangs_actual = 1;
-  for (int i = 100 * gangs_actual; i > -100 * gangs_actual; --i)
+  for (int i = N * gangs_actual; i > -N * gangs_actual; --i)
{
  gangs_min = gangs_max = acc_gang ();
  workers_min = workers_max = acc_worker ();
@@ -115,7 +116,7 @@ int main ()
   reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
   /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
   /* { dg-optimized {assigned OpenACC gang loop parallelism} {} { target *-*-* } l_loop_i$c_loop_i } */
-  for (int i = 100 * gangs_actual; i > -100 * gangs_actual; --i)
+  for (int i = N * gangs_actual; i > -N * gangs_actual; --i)
{
  gangs_min = gangs_max = acc_gang ();
  workers_min = workers_max = acc_worker ();
@@ -154,7 +155,7 @@ int main ()
   reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
   /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
   /* { dg-optimized {assigned OpenACC worker loop parallelism} {} { target *-*-* } l_loop_i$c_loop_i } */
-  for (int i = 100 * workers_actual; i > -100 * workers_actual; --i)
+  for (int i = N * workers_actual; i > -N * workers_actual; --i)
{
  gangs_min = gangs_max = acc_gang ();
  workers_min = workers_max = acc_worker ();
@@ -200,7 +201,7 @@ int main ()
   reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
   /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
   /* { dg-optimized {assigned OpenACC vector loop parallelism} {} { target *-*-* } l_loop_i$c_loop_i } */
-  for (int i = 100 * vectors_actual; i > -100 * vectors_actual; --i)
+  for (int i = N * vectors_actual; i > -N * vectors_actual; --i)
{
  gangs_min = gangs_max = acc_gang ();
  workers_min = workers_max = acc_worker ();
@@ -250,7 +251,7 @@ int main ()
}
   /* As we're executing GR not GP, don't multiply with a "gangs_actual"
 factor.  */
-  for (int i = 100 /* * gangs_actual */; i > -100 /* * gangs_actual */; --i)
+  for (int i = N /* * gangs_actual */; i > -N /* * gangs_actual */; --i)
{
  gangs_min = gangs_max = acc_gang ();
  workers_min = workers_max = acc_worker ();
@@ -291,7 +292,7 @@ int main ()
   reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
   /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
   /* { dg-optimized {assigned OpenACC gang loop parallelism} {} { target *-*-* } l_loop_i$c_loop_i } */
-  for (int i = 100 * ga

Re: [PATCH][libatomic] Fix return value in libat_test_and_set

2022-03-24 Thread Tom de Vries via Gcc-patches

On 3/24/22 11:59, Jakub Jelinek wrote:

On Thu, Mar 24, 2022 at 11:01:30AM +0100, Tom de Vries wrote:

Shouldn't that be instead
return (woldval & ((UWORD) -1 << shift)) != 0;
or
return (woldval & ((UWORD) ~(UWORD) 0 << shift)) != 0;
?


Well, I used '(woldval & wval) == wval' based on the fact that the set
operation uses a bitor:
...
   wval = (UWORD)__GCC_ATOMIC_TEST_AND_SET_TRUEVAL << shift;
   woldval = __atomic_load_n (wptr, __ATOMIC_RELAXED);
   do
 {
   t = woldval | wval;
...
so apparently we do not care here about bits not in
__GCC_ATOMIC_TEST_AND_SET_TRUEVAL (or alternatively, we care but assume that
they're 0).

AFAIU, it would have been more precise to compare the entire byte with
__GCC_ATOMIC_TEST_AND_SET_TRUEVAL, but then it would have made sense to set
the entire byte in the set part as well.

Anyway, that doesn't seem to be what you're proposing.  During investigation
of the failure I found that the address used is word-aligned, so shift
becomes 0 in that case.  AFAICT, the fix you're proposing is a nop for shift
== 0, and indeed, it doesn't fix the failure I'm observing.


Ah, sorry, I certainly meant
   return (woldval & ((UTYPE) -1 << shift)) != 0;
or
   return (woldval & ((UTYPE) ~(UTYPE) 0 << shift)) != 0;
i.e. more portable ways of
   return (woldval & (0xff << shift)) != 0;
which don't hardcode that UTYPE is 8-bit unsigned char.
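
To make the difference between the two suggestions concrete, here is a small
sketch of the masks involved. It assumes a 32-bit UWORD and a one-byte UTYPE
(both typedefs below are stand-ins chosen for illustration, not the libatomic
definitions), and word_mask and byte_mask are made-up names.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t UWORD;  /* assumption: 32-bit word */
typedef uint8_t UTYPE;   /* assumption: one-byte flag */

/* Mask from the first suggestion: every bit from SHIFT upwards.  */
UWORD
word_mask (unsigned shift)
{
  assert (shift < 32);
  return (UWORD) -1 << shift;
}

/* Mask from the corrected suggestion: only the UTYPE-sized field at
   SHIFT.  The value is widened to UWORD before shifting so that no
   promoted int is shifted into its sign bit.  */
UWORD
byte_mask (unsigned shift)
{
  assert (shift <= 24);
  return (UWORD) (UTYPE) ~(UTYPE) 0 << shift;
}
```

For shift == 0 the word-wide mask has all bits set, so masking with it changes
nothing, which matches the observation that the first suggestion is a nop in
the word-aligned case. The byte-wide mask selects only the flag's own byte.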



I see, that makes sense.


If one uses just __atomic_test_and_set and __atomic_clear, then I think
it makes no difference.
But testing whether the old byte was non-zero better matches the previous
intent: a previous value that is neither 0 nor
__GCC_ATOMIC_TEST_AND_SET_TRUEVAL is treated as "set" as well.
I think we don't need to change the loop: even for a woldval byte containing,
say, 42, woldval | wval still leaves it non-zero.

The documentation argues against using those atomics on types other than
bool and {,{un,}signed }char, but libatomic still supports those; I believe
that when one doesn't have hw-specific support for these, __atomic_clear will
clear the entire UTYPE.


Ack, updated patch, added missing changelog contribution.

OK for trunk?

Thanks,
- Tom

[libatomic] Fix return value in libat_test_and_set

On nvptx (using a Quadro K2000 with driver 470.103.01) I ran into this:
...
FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test
...
which minimized to:
...
  #include <stdatomic.h>
  atomic_flag a = ATOMIC_FLAG_INIT;
  int main () {
    if ((atomic_flag_test_and_set) (&a))
      __builtin_abort ();
    return 0;
  }
...

The atomic_flag_test_and_set is implemented using __atomic_test_and_set_1,
which corresponds to the "word-sized compare-and-swap loop" version of
libat_test_and_set in libatomic/tas_n.c.

The semantics of a test-and-set is that the return value is "true if and only
if the previous contents were 'set'".

But the code uses:
...
  return woldval != 0;
...
which means it doesn't look only at the byte that was either set or not set,
but at the entire word.

Fix this by using instead:
...
  return (woldval & ((UTYPE) ~(UTYPE) 0 << shift)) != 0;
...
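
The effect of the two return expressions can be modelled without any atomics.
The sketch below is a plain, non-atomic model, not the libatomic code:
model_test_and_set and TRUEVAL are made-up names, a 32-bit word and a one-byte
flag are assumed, and the neighbouring bytes are assumed to hold unrelated
non-zero data (the report does not say what they actually contained).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: stands in for __GCC_ATOMIC_TEST_AND_SET_TRUEVAL.  */
#define TRUEVAL 1u

/* Non-atomic model of the word-sized loop: set the one-byte flag that
   lives SHIFT bits into *WPTR and report whether it was set before.
   MASKED selects the fixed return expression over the original one.  */
bool
model_test_and_set (uint32_t *wptr, unsigned shift, bool masked)
{
  assert (shift <= 24);

  uint32_t wval = (uint32_t) TRUEVAL << shift;
  uint32_t woldval = *wptr;

  *wptr = woldval | wval;
  if (masked)
    return (woldval & ((uint32_t) 0xff << shift)) != 0;
  return woldval != 0;
}
```

Starting from a word value of 0x2a00 (flag byte clear, next byte non-zero),
the unmasked variant reports the flag as already set, while the masked variant
reports it as clear and only returns true on the second call.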

Tested on nvptx.

libatomic/ChangeLog:

2022-03-24  Tom de Vries  

	PR target/105011
	* tas_n.c (libat_test_and_set): Fix return value.

---
 libatomic/tas_n.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libatomic/tas_n.c b/libatomic/tas_n.c
index d0d8c283b495..524312e7d8db 100644
--- a/libatomic/tas_n.c
+++ b/libatomic/tas_n.c
@@ -73,7 +73,7 @@ SIZE(libat_test_and_set) (UTYPE *mptr, int smodel)
  __ATOMIC_RELAXED, __ATOMIC_RELAXED));
 
   post_barrier (smodel);
-  return woldval != 0;
+  return (woldval & ((UTYPE) ~(UTYPE) 0 << shift)) != 0;
 }
 
 #define DONE 1


Re: [PATCH][libatomic] Fix return value in libat_test_and_set

2022-03-24 Thread Tom de Vries via Gcc-patches

On 3/24/22 10:02, Jakub Jelinek wrote:

On Thu, Mar 24, 2022 at 09:28:15AM +0100, Tom de Vries via Gcc-patches wrote:

Hi,

On nvptx (using a Quadro K2000 with driver 470.103.01) I ran into this:
...
FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test
...
which minimized to:
...
   #include <stdatomic.h>
   atomic_flag a = ATOMIC_FLAG_INIT;
   int main () {
     if ((atomic_flag_test_and_set) (&a))
       __builtin_abort ();
     return 0;
   }
...

The atomic_flag_test_and_set is implemented using __atomic_test_and_set_1,
which corresponds to the "word-sized compare-and-swap loop" version of
libat_test_and_set in libatomic/tas_n.c.

The semantics of a test-and-set is that the return value is "true if and only
if the previous contents were 'set'".

But the code uses:
...
   return woldval != 0;
...
which means it doesn't look only at the byte that was either set or not set,
but at the entire word.

Fix this by using instead:
...
   return (woldval & wval) == wval;


Shouldn't that be instead
   return (woldval & ((UWORD) -1 << shift)) != 0;
or
   return (woldval & ((UWORD) ~(UWORD) 0 << shift)) != 0;
?


Well, I used '(woldval & wval) == wval' based on the fact that the set 
operation uses a bitor:

...
  wval = (UWORD)__GCC_ATOMIC_TEST_AND_SET_TRUEVAL << shift;
  woldval = __atomic_load_n (wptr, __ATOMIC_RELAXED);
  do
{
  t = woldval | wval;
...
so apparently we do not care here about bits not in 
__GCC_ATOMIC_TEST_AND_SET_TRUEVAL (or alternatively, we care but assume 
that they're 0).


AFAIU, it would have been more precise to compare the entire byte with 
__GCC_ATOMIC_TEST_AND_SET_TRUEVAL, but then it would have made sense to 
set the entire byte in the set part as well.


Anyway, that doesn't seem to be what you're proposing.  During 
investigation of the failure I found that the address used is 
word-aligned, so shift becomes 0 in that case.  AFAICT, the fix you're 
proposing is a nop for shift == 0, and indeed, it doesn't fix the 
failure I'm observing.



The exact __GCC_ATOMIC_TEST_AND_SET_TRUEVAL varies (the most usual
value is 1, but sparc uses 0xff and m68k/sh use 0x80), falseval is
always 0 though and (woldval & wval) == wval
is testing whether some bits of the oldval are all set rather than
whether the old byte was 0.
Say for trueval 1 it tests whether the least significant bit is set,
for 0x80 if the most significant bit of the byte is set, for
0xff whether all bits are set.
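
The two candidate checks can be compared on single byte values. This is only
an illustrative sketch; all_true_bits_set and byte_nonzero are made-up names,
and the trueval arguments are the values mentioned above (1, 0x80, 0xff).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* (old & trueval) == trueval: the check used in the first patch.  */
bool
all_true_bits_set (uint8_t old_byte, uint8_t trueval)
{
  assert (trueval != 0);
  return (old_byte & trueval) == trueval;
}

/* old != 0: treat any non-zero byte as "set".  */
bool
byte_nonzero (uint8_t old_byte)
{
  return old_byte != 0;
}
```

The checks agree when the old byte is either 0 or exactly trueval. They differ
for other values: an old byte of 42 is "set" for byte_nonzero but not for
all_true_bits_set with trueval 1, and an old byte of 0x01 is not "set" for
all_true_bits_set with trueval 0x80.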


Yes, I noticed that.

AFAIU, the proposed patch does the right thing under the assumption that we
don't care about bits not set in __GCC_ATOMIC_TEST_AND_SET_TRUEVAL.


If that's not acceptable, I can submit a patch that doesn't have that 
assumption, and tests the entire byte (but should I also fix the set 
operation then?).


Thanks,
- Tom




[PATCH][libatomic] Fix return value in libat_test_and_set

2022-03-24 Thread Tom de Vries via Gcc-patches
Hi,

On nvptx (using a Quadro K2000 with driver 470.103.01) I ran into this:
...
FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test
...
which minimized to:
...
  #include <stdatomic.h>
  atomic_flag a = ATOMIC_FLAG_INIT;
  int main () {
    if ((atomic_flag_test_and_set) (&a))
      __builtin_abort ();
    return 0;
  }
...

The atomic_flag_test_and_set is implemented using __atomic_test_and_set_1,
which corresponds to the "word-sized compare-and-swap loop" version of
libat_test_and_set in libatomic/tas_n.c.

The semantics of a test-and-set is that the return value is "true if and only
if the previous contents were 'set'".

But the code uses:
...
  return woldval != 0;
...
which means it doesn't look only at the byte that was either set or not set,
but at the entire word.

Fix this by using instead:
...
  return (woldval & wval) == wval;
...

Tested on nvptx.

OK for trunk?

Thanks,
- Tom

[libatomic] Fix return value in libat_test_and_set

---
 libatomic/tas_n.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libatomic/tas_n.c b/libatomic/tas_n.c
index d0d8c283b49..65eaa7753a5 100644
--- a/libatomic/tas_n.c
+++ b/libatomic/tas_n.c
@@ -73,7 +73,7 @@ SIZE(libat_test_and_set) (UTYPE *mptr, int smodel)
 __ATOMIC_RELAXED, __ATOMIC_RELAXED));
 
   post_barrier (smodel);
-  return woldval != 0;
+  return (woldval & wval) == wval;
 }
 
 #define DONE 1


[committed][nvptx] Use '%' as register prefix

2022-03-22 Thread Tom de Vries via Gcc-patches
Hi,

The percentage sign as first character of a ptx identifier can be used to
avoid name conflicts, e.g., between user-defined variable names and
compiler-generated names.

The insn nvptx_uniform_warp_check contains register names without '%' prefix,
which potentially could lead to name conflicts with user-defined variable
names.

Fix this by adding a '%' prefix, more specifically a '%r_' prefix to avoid a
name conflict with ptx special registers.

Tested on x86_64 with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom

[nvptx] Use '%' as register prefix

gcc/ChangeLog:

2022-03-20  Tom de Vries  

PR target/104925
* config/nvptx/nvptx.md (define_insn "nvptx_uniform_warp_check"):
Use % as register prefix.

---
 gcc/config/nvptx/nvptx.md | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 5550ce25513..8ed685027b5 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -2280,13 +2280,14 @@ (define_insn "nvptx_uniform_warp_check"
   {
 const char *insns[] = {
   "{",
-  "\\t"  ".reg.b32""\\t" "act;",
-  "%.\\t""vote.ballot.b32" "\\t" "act,1;",
-  "\\t"  ".reg.pred"   "\\t" "do_abort;",
-  "\\t"  "mov.pred""\\t" "do_abort,0;",
-  "%.\\t""setp.ne.b32" "\\t" "do_abort,act,0x;",
-  "@ do_abort\\t" "trap;",
-  "@ do_abort\\t" "exit;",
+  "\\t"  ".reg.b32""\\t" "%%r_act;",
+  "%.\\t""vote.ballot.b32" "\\t" "%%r_act,1;",
+  "\\t"  ".reg.pred"   "\\t" "%%r_do_abort;",
+  "\\t"  "mov.pred""\\t" "%%r_do_abort,0;",
+  "%.\\t""setp.ne.b32" "\\t" "%%r_do_abort,%%r_act,"
+ "0x;",
+  "@ %%r_do_abort\\t" "trap;",
+  "@ %%r_do_abort\\t" "exit;",
   "}",
   NULL
 };


[committed][nvptx] Limit HFmode support to mexperimental

2022-03-22 Thread Tom de Vries via Gcc-patches
Hi,

With PR104489 still open and end-of-stage-4 approaching, classify HFmode
support as experimental, which is not enabled by default but can be enabled
using -mexperimental.

This fixes the nvptx build when the default sm_xx is set to sm_53 or higher.

Note that we're not using -mfp16 or some such, because that might create
expectations about being able to switch support on or off in the future, and
at this point it's not clear why, once reaching non-experimental status, it
shouldn't always be enabled.

Committed to trunk.

Thanks,
- Tom

[nvptx] Limit HFmode support to mexperimental

gcc/ChangeLog:

2022-03-19  Tom de Vries  

* config/nvptx/nvptx.cc (nvptx_scalar_mode_supported_p)
(nvptx_libgcc_floating_mode_supported_p): Only enable HFmode for
mexperimental.

gcc/testsuite/ChangeLog:

2022-03-19  Tom de Vries  

* gcc.target/nvptx/float16-1.c: Add additional-options -mexperimental.
* gcc.target/nvptx/float16-2.c: Same.
* gcc.target/nvptx/float16-3.c: Same.
* gcc.target/nvptx/float16-4.c: Same.
* gcc.target/nvptx/float16-5.c: Same.
* gcc.target/nvptx/float16-6.c: Same.

---
 gcc/config/nvptx/nvptx.cc  | 4 ++--
 gcc/testsuite/gcc.target/nvptx/float16-1.c | 1 +
 gcc/testsuite/gcc.target/nvptx/float16-2.c | 1 +
 gcc/testsuite/gcc.target/nvptx/float16-3.c | 1 +
 gcc/testsuite/gcc.target/nvptx/float16-4.c | 1 +
 gcc/testsuite/gcc.target/nvptx/float16-5.c | 1 +
 gcc/testsuite/gcc.target/nvptx/float16-6.c | 1 +
 7 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index b2f7b4af392..87efc23bd96 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -7156,7 +7156,7 @@ nvptx_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED,
 static bool
 nvptx_scalar_mode_supported_p (scalar_mode mode)
 {
-  if (mode == HFmode && TARGET_SM53)
+  if (nvptx_experimental && mode == HFmode && TARGET_SM53)
 return true;
 
   return default_scalar_mode_supported_p (mode);
@@ -7165,7 +7165,7 @@ nvptx_scalar_mode_supported_p (scalar_mode mode)
 static bool
 nvptx_libgcc_floating_mode_supported_p (scalar_float_mode mode)
 {
-  if (mode == HFmode && TARGET_SM53)
+  if (nvptx_experimental && mode == HFmode && TARGET_SM53)
 return true;
 
   return default_libgcc_floating_mode_supported_p (mode);
diff --git a/gcc/testsuite/gcc.target/nvptx/float16-1.c b/gcc/testsuite/gcc.target/nvptx/float16-1.c
index 873a0543535..017774c2941 100644
--- a/gcc/testsuite/gcc.target/nvptx/float16-1.c
+++ b/gcc/testsuite/gcc.target/nvptx/float16-1.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -ffast-math -misa=sm_53 -mptx=_" } */
+/* { dg-additional-options "-mexperimental" } */
 
 _Float16 var;
 
diff --git a/gcc/testsuite/gcc.target/nvptx/float16-2.c b/gcc/testsuite/gcc.target/nvptx/float16-2.c
index 30a3092bc29..e15b685253b 100644
--- a/gcc/testsuite/gcc.target/nvptx/float16-2.c
+++ b/gcc/testsuite/gcc.target/nvptx/float16-2.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -ffast-math -misa=sm_80 -mptx=_" } */
+/* { dg-additional-options "-mexperimental" } */
 
 _Float16 x;
 _Float16 y;
diff --git a/gcc/testsuite/gcc.target/nvptx/float16-3.c b/gcc/testsuite/gcc.target/nvptx/float16-3.c
index edd6514a976..1c646902055 100644
--- a/gcc/testsuite/gcc.target/nvptx/float16-3.c
+++ b/gcc/testsuite/gcc.target/nvptx/float16-3.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -misa=sm_53 -mptx=_" } */
+/* { dg-additional-options "-mexperimental" } */
 
 _Float16 var;
 
diff --git a/gcc/testsuite/gcc.target/nvptx/float16-4.c b/gcc/testsuite/gcc.target/nvptx/float16-4.c
index 0a823971e75..1c24ec8c3b2 100644
--- a/gcc/testsuite/gcc.target/nvptx/float16-4.c
+++ b/gcc/testsuite/gcc.target/nvptx/float16-4.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -ffast-math -misa=sm_53 -mptx=_" } */
+/* { dg-additional-options "-mexperimental" } */
 
 _Float16 var;
 
diff --git a/gcc/testsuite/gcc.target/nvptx/float16-5.c b/gcc/testsuite/gcc.target/nvptx/float16-5.c
index 2261f42baac..9ae3365e1a6 100644
--- a/gcc/testsuite/gcc.target/nvptx/float16-5.c
+++ b/gcc/testsuite/gcc.target/nvptx/float16-5.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -ffast-math -misa=sm_53 -mptx=_" } */
+/* { dg-additional-options "-mexperimental" } */
 
 _Float16 a;
 _Float16 b;
diff --git a/gcc/testsuite/gcc.target/nvptx/float16-6.c b/gcc/testsuite/gcc.target/nvptx/float16-6.c
index 9ca714ca76f..37c580429c5 100644
--- a/gcc/testsuite/gcc.target/nvptx/float16-6.c
+++ b/gcc/testsuite/gcc.target/nvptx/float16-6.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -misa=sm_53 -mptx=_" } */
+/* { dg-additional-options "-mexperimental" } */
 
 _Float16 x;
 _Float16 y;


[committed][nvptx] Add mexperimental

2022-03-22 Thread Tom de Vries via Gcc-patches
Hi,

Add new option -mexperimental.

This makes it possible to develop a new feature on trunk without disturbing
trunk, rather than developing it to completion in a development branch.

The equivalent of the feature branch merge then becomes making the
functionality available for -mno-experimental.

If several features are developed at the same time, we can do something like
-mexperimental=feature1,feature2, but for now that's not necessary.

For now, the option has no effect.

Committed to trunk.

Thanks,
- Tom

[nvptx] Add mexperimental

gcc/ChangeLog:

2022-03-19  Tom de Vries  

* config/nvptx/nvptx.opt (mexperimental): New option.

---
 gcc/config/nvptx/nvptx.opt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index 980428b58cc..11288d1a8ee 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -88,3 +88,6 @@ Target Var(nvptx_comment) Init(1) Undocumented
 
 malias
 Target Var(nvptx_alias) Init(0) Undocumented
+
+mexperimental
+Target Var(nvptx_experimental) Init(0) Undocumented


[committed][nvptx] Use .alias directive for mptx >= 6.3

2022-03-22 Thread Tom de Vries via Gcc-patches
Hi,

Starting with ptx isa version 6.3, a ptx directive .alias is available.
Use this directive to support symbol aliases, as far as possible.

The alias support is off by default.  It can be turned on using a switch
-malias.

Furthermore, for pre-sm_75, it's not effective unless the ptx version is
bumped to 6.3 or higher using -mptx (given that the default for pre-sm_75 is
6.0).

The alias support has the following limitations.

Only function aliases are supported.

Weak aliases are not supported.  That is, if I disable the check in
nvptx_asm_output_def_from_decls that disallows this, a weak alias is emitted
and parsed by the driver.  But the test gcc.dg/globalalias.c starts failing,
with the behaviour matching the comment about "weird behavior of AIX's .set
pseudo-op": a weak alias may resolve to different functions in different
files.

Aliases to weak symbols are not supported (see gcc.dg/localalias.c).  This is
currently not prohibited by the compiler, but with the driver link we run
into: "error: Function test with .weak scope cannot be aliased".

Aliases to aliases are not supported (see libgomp.c-c++-common/pr96390.c).
This is currently not prohibited by the compiler, but with the driver link we
run into: "Internal error: alias to unknown symbol".

Unreferenced aliases are not emitted (these can occur f.i. when inlining a
call to an alias).  This avoids driver link error "Internal error: reference
to deleted section".
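
For reference, the alias kinds named in the limitations above can be written in
GNU C as shown below. This is a host-side sketch that assumes GCC on a target
with alias support (for instance ELF); impl, plain_alias, weak_alias and
check_aliases are made-up names, and nothing in it is nvptx-specific.

```c
#include <assert.h>

/* The function that the aliases resolve to.  */
int
impl (void)
{
  return 42;
}

/* Plain function alias: the kind described above as supported.  */
int plain_alias (void) __attribute__ ((alias ("impl")));

/* Weak alias: the kind described above as not supported.  */
int weak_alias (void) __attribute__ ((weak, alias ("impl")));

/* On a host where both kinds work, each alias calls impl.  */
void
check_aliases (void)
{
  assert (plain_alias () == impl ());
  assert (weak_alias () == impl ());
}
```

On such a host both aliases behave like impl. According to the description
above, with -malias on nvptx only the plain function alias form is expected to
be usable.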

When enabling malias by default, libgomp detects alias support and
consequently libgomp.a will contain a few uses of .alias.  This however
results in aforementioned "Internal error: reference to deleted section" in
many test-cases.  Either there's some error with how .alias is used, or
there's a driver bug.  While this issue is not resolved, we keep malias
off-by-default.

At some point we may add support in the nvptx-tools linker for symbol
aliases, and define f.i. malias=ptx and malias=ld to choose between the two in
the compiler.

An example of where this support is useful, is the OvO (OpenMP vs Offload)
testsuite.  The testsuite passes already at -O2.  But at -O0, there are errors
in some c++ test-cases due to missing symbol alias support.  By compiling with
-malias, the whole testsuite passes also at -O0.

This patch causes a regression:
...
-PASS: gcc.dg/pr60797.c  (test for errors, line 4)
+FAIL: gcc.dg/pr60797.c  (test for errors, line 4)
...
The test-case is skipped for effective target alias, and both without and with
this patch the nvptx target is considered to not support it, so the test-case is
executed.  The test-case expects an error message along the lines of "alias
definitions not supported in this configuration", but instead we run into:
...
gcc.dg/pr60797.c:4:12: error: foo aliased to undefined symbol
...
This is probably due to the fact that the nvptx backend now defines macros
ASM_OUTPUT_DEF and ASM_OUTPUT_DEF_FROM_DECLS, so from the point of view of the
common part of the compiler, aliases are supported.

Committed to trunk.

Thanks,
- Tom

[nvptx] Use .alias directive for mptx >= 6.3

gcc/ChangeLog:

2022-03-18  Tom de Vries  

PR target/104957
* config/nvptx/nvptx-protos.h (nvptx_asm_output_def_from_decls): 
Declare.
* config/nvptx/nvptx.cc (write_fn_proto_1): Don't add function marker
for alias.
(SET_ASM_OP, NVPTX_ASM_OUTPUT_DEF): New macro def.
(nvptx_asm_output_def_from_decls): New function.
* config/nvptx/nvptx.h (ASM_OUTPUT_DEF): New macro def, define to
gcc_unreachable ().
(ASM_OUTPUT_DEF_FROM_DECLS): New macro def, define to
nvptx_asm_output_def_from_decls.
* config/nvptx/nvptx.opt (malias): New opt.

gcc/testsuite/ChangeLog:

2022-03-18  Tom de Vries  

PR target/104957
* gcc.target/nvptx/alias-1.c: New test.
* gcc.target/nvptx/alias-2.c: New test.
* gcc.target/nvptx/alias-3.c: New test.
* gcc.target/nvptx/alias-4.c: New test.
* gcc.target/nvptx/nvptx.exp
(check_effective_target_runtime_ptx_isa_version_6_3): New proc.

---
 gcc/config/nvptx/nvptx-protos.h  |  1 +
 gcc/config/nvptx/nvptx.cc| 74 +++-
 gcc/config/nvptx/nvptx.h | 17 
 gcc/config/nvptx/nvptx.opt   |  3 ++
 gcc/testsuite/gcc.target/nvptx/alias-1.c | 27 
 gcc/testsuite/gcc.target/nvptx/alias-2.c | 13 ++
 gcc/testsuite/gcc.target/nvptx/alias-3.c | 29 +
 gcc/testsuite/gcc.target/nvptx/alias-4.c | 12 ++
 gcc/testsuite/gcc.target/nvptx/nvptx.exp |  7 +++
 9 files changed, 182 insertions(+), 1 deletion(-)

diff --git a/gcc/config/nvptx/nvptx-protos.h b/gcc/config/nvptx/nvptx-protos.h
index 0bf9af406a2..ca0a87ee4bd 100644
--- a/gcc/config/nvptx/nvptx-protos.h
+++ b/gcc/config/nvptx/nvptx-protos.h
@@ -43,6 +43,7 @@ extern void nvptx_output_ascii (FILE *, const char *, 
uns

[committed][nvptx] Add warp sync at simt exit

2022-03-22 Thread Tom de Vries via Gcc-patches
Hi,

Consider this code (with N defined to 1024):
...
  float v = 0.0;
  #pragma omp target map(tofrom: v)
  #pragma omp parallel for simd
  for (int i = 0 ; i < N; i++)
{
  #pragma omp atomic update
  v = v + 1.0;
}
...

It hangs when executing on target board unix/-foffload=-misa=sm_75, using
drivers 470.103.01 and 510.54 on a T400 board (sm_75).

I'm tentatively identifying the problem as a bug in -muniform-simt for
architectures that support Independent Thread Scheduling (sm_70 and later).

The problem -muniform-simt is trying to address is to make sure that a
register produced outside an openmp simd region is available when used in any
lane inside an simd region.

The solution is to, outside an simd region, execute in all warp lanes, thus
producing consistent values in result registers in each warp thread.

This approach doesn't work when executing in all warp lanes multiplies the
side effects from 1 to 32 separate side effects, which is the case for atomic
insns.  So atomic insns are rewritten to execute only in lane 0, and if
there are any results, those are propagated to the other threads in the warp.
[ And likewise for system calls malloc, free, vprintf. ]
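
To make the side-effect multiplication concrete, here is a small host-side model in plain C. It is only a sketch: WARP_SIZE, the function names and the regs array are assumptions made for illustration, and a sequential loop over lanes stands in for the warp.

```c
#define WARP_SIZE 32

/* Model of naive uniform execution: every lane performs the update, so a
   single source-level "v = v + 1" happens WARP_SIZE times.  */
int
update_in_all_lanes (int v)
{
  for (int lane = 0; lane < WARP_SIZE; lane++)
    v = v + 1;
  return v;
}

/* Model of the rewritten form: only lane 0 performs the update, and the
   result is then propagated to the register copy of every lane.  */
int
update_in_lane_0 (int v, int regs[WARP_SIZE])
{
  for (int lane = 0; lane < WARP_SIZE; lane++)
    if (lane == 0)
      v = v + 1;

  for (int lane = 0; lane < WARP_SIZE; lane++)
    regs[lane] = v;

  return v;
}
```

Starting from 0, the first function ends at 32 and the second at 1, with all 32 register copies in sync.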

Now, consider a non-atomic update: ld, add, store.  The store has side
effects, are those multiplied or not?

Pre-sm_70 we can assume that at the end of an SIMT region, any divergent
control flow has reconverged, and we have a uniform warp, executing in lock
step.  So:
- the load will load the same value into the result register across the warp,
- the add will write the same value into the result register across the warp,
- the store will write the same value to the same memory location, 32 times,
  at once, having the result of a single store.
So, no side-effect multiplication (well, at least that's the observation).

Starting sm_70, the threads in a warp are no longer guaranteed to reconverge
after divergence.  There's a "Convergence Optimizer" that can identify
that it is safe for a warp to reconverge, but that works only as long as the
code does not contain "synchronizing operations".

Consequently, the ld, add, store sequence can be executed by a non-uniform
warp, which means the side effects can have multiplied, and the registers are
no longer guaranteed to be in sync.

The atomic update in the example above is translated using an atom.cas loop,
which means that we have divergence (because only one thread is allowed to
succeed at a time) and the "Convergence Optimizer" doesn't reconverge probably
because the atom.cas counts as a "synchronizing operation".  So, it seems
plausible that the root cause for the mentioned hang is the problem described
above.

Fix this by adding an explicit warp sync at simt exit.

Note that we're assuming here that the warp will stay uniform until the next
SIMT region entry.

Tested on x86_64 with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom

[nvptx] Add warp sync at simt exit

gcc/ChangeLog:

2022-03-09  Tom de Vries  

PR target/104916
PR target/104783
* config/nvptx/nvptx.md (define_expand "omp_simt_exit"): Emit warp
sync (or uniform warp check for mptx < 6.0).

libgomp/ChangeLog:

2022-03-15  Tom de Vries  

PR target/104916
PR target/104783
* testsuite/libgomp.c/pr104783-2.c: New test.

---
 gcc/config/nvptx/nvptx.md|  4 
 libgomp/testsuite/libgomp.c/pr104783-2.c | 25 +
 2 files changed, 29 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 1dec7caa0d1..5550ce25513 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -1881,6 +1881,10 @@ (define_expand "omp_simt_exit"
   ""
 {
   emit_insn (gen_omp_simt_exit (Pmode, operands[0]));
+  if (TARGET_PTX_6_0)
+emit_insn (gen_nvptx_warpsync ());
+  else
+emit_insn (gen_nvptx_uniform_warp_check ());
   DONE;
 })
 
diff --git a/libgomp/testsuite/libgomp.c/pr104783-2.c b/libgomp/testsuite/libgomp.c/pr104783-2.c
new file mode 100644
index 000..8750d915d01
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/pr104783-2.c
@@ -0,0 +1,25 @@
+#define N (32 * 32)
+
+#define TYPE float
+#define VAR v
+#define INIT 0.0
+#define UPDATE + 1.0
+#define EXPECTED N
+
+int
+main (void)
+{
+  TYPE VAR = INIT;
+  #pragma omp target map(tofrom: VAR)
+  #pragma omp parallel for simd
+  for (int i = 0 ; i < N; i++)
+{
+  #pragma omp atomic update
+  VAR = VAR UPDATE;
+}
+
+  if (VAR != EXPECTED)
+__builtin_abort ();
+
+  return 0;
+}


Re: [PING^2][PATCH][final] Handle compiler-generated asm insn

2022-03-21 Thread Tom de Vries via Gcc-patches

On 3/21/22 14:49, Richard Biener wrote:

On Mon, Mar 21, 2022 at 12:50 PM Tom de Vries  wrote:


On 3/21/22 08:58, Richard Biener wrote:

On Thu, Mar 17, 2022 at 4:10 PM Tom de Vries via Gcc-patches
 wrote:


On 3/9/22 13:50, Tom de Vries wrote:

On 2/22/22 14:55, Tom de Vries wrote:

Hi,

For the nvptx port, with -mptx-comment we have in pr53465.s:
...
   // #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
   // Start: Added by -minit-regs=3:
   // #NO_APP
   mov.u32 %r26, 0;
   // #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
   // End: Added by -minit-regs=3:
   // #NO_APP
...

The comments were generated using the compiler-generated equivalent of:
...
 asm ("// Comment");
...
but both the printed location and the NO_APP/APP are unnecessary for a
compiler-generated asm insn.

Fix this by handling ASM_INPUT_SOURCE_LOCATION == UNKNOWN_LOCATION in
final_scan_insn_1, such that we simply get:
...
   // Start: Added by -minit-regs=3:
   mov.u32 %r26, 0;
   // End: Added by -minit-regs=3:
...

Tested on nvptx.

OK for trunk?





Ping^2.

Tobias just reported an ICE in PR104968, and this patch fixes it.

I'd like to know whether this patch is acceptable for stage 4 or not.

If not, I need to fix PR104968 in a different way.  Say, disable
-mcomment by default, or trying harder to propagate source info on
outlined functions.




Hi,

thanks for the review.


Usually targets use UNSPECs to emit compiler-generated "asm"
instructions.


Ack. [ I could go down that route eventually, but for now I'm hoping to
implement this without having to change the port. ]


I think an unknown location is a reasonable but not
the best way to identify 'compiler-generated', we might lose
the location through optimization.  (why does it not use
the INSN_LOCATION?)



I don't know.  FWIW, at the time that ASM_INPUT_SOURCE_LOCATION was
introduced (2007), there was no INSN_LOCATION yet (introduced in 2012),
only INSN_LOCATOR, my guess is that it has something to do with that.


Rather than a location I'd use sth like DECL_ARTIFICIAL to
disable 'user-mangling', do we have something like that for
ASM or an insn in general?


Haven't found it.


If not maybe there's an unused
bit on ASMs we can enable this way.


Done.  I've used the jump flag for that.

Updated, untested patch attached.

Is this what you meant?


Hmm.  I now read that ASM_INPUT is in every PATTERN of an insn


Maybe I misunderstand, but that sounds incorrect to me.  That is, can 
you point me to where you read that?


Maybe you're referring to the fact that an ASM_INPUT may occur inside an 
ASM_OPERANDS, as "a convenient way to hold a string" (quoting rtl.def)?



and wonder how this all works out there.  That is, by default the
ASM_INPUT would be artificial (for regular define_insn) but asm("")
in source would mark them ASM_INPUT_USER_P or so.



If you're suggesting to make it by default artificial, then that doesn't 
sound like a bad idea to me.  In this iteration I haven't implemented 
this (yet), but instead explicitly marked as artificial some other uses 
of ASM_INPUT.



But then I know nothing here.  I did expect us to look at
ASM_OPERANDS instead of just ASM_INPUT (but the code you
are changing is about ASM_INPUT).



I extended the rationale in the commit log a bit to include a 
description of what the rtl-equivalent of 'asm ("// Comment")' looks 
like, and there's no ASM_OPERANDS there.



That said, the comments should probably explicitly say this
is about ASM_INPUT in an ASM_OPERANDS  instruction
template, not some other pattern.



AFAIU, this isn't about an ASM_INPUT in an ASM_OPERANDS  instruction 
template, so at this point I haven't updated the comment.


Thanks,
- Tom

[final] Handle compiler-generated asm insn

For the nvptx port, with -mptx-comment we have for test-case pr53465.c at
mach:
...
(insn 66 43 65 3 (asm_input ("// Start: Added by -minit-regs=3:")) -1
 (nil))
(insn 65 66 67 3 (set (reg/v:SI 26 [ d ])
(const_int 0 [0])) 6 {*movsi_insn}
 (nil))
(insn 67 65 44 3 (asm_input ("// End: Added by -minit-regs=3:")) -1
 (nil))
...
and in pr53465.s:
...
// #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
// Start: Added by -minit-regs=3:
// #NO_APP
mov.u32 %r26, 0;
// #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
// End: Added by -minit-regs=3:
// #NO_APP
...

[ The comment insns were modelled after:
...
  asm ("// Comment");
...
which expands to:
...
(insn 5 2 6 2 (parallel [
(asm_input/v ("// Comment") test.c:4)
(clobber (mem:BLK (scratch) [0  A8]))
]) "test.c":4:3 -1
 (nil))
...
Note btw the differences: the comment insn has no clo

Re: [PING^2][PATCH][final] Handle compiler-generated asm insn

2022-03-21 Thread Tom de Vries via Gcc-patches

On 3/21/22 08:58, Richard Biener wrote:

On Thu, Mar 17, 2022 at 4:10 PM Tom de Vries via Gcc-patches
 wrote:


On 3/9/22 13:50, Tom de Vries wrote:

On 2/22/22 14:55, Tom de Vries wrote:

Hi,

For the nvptx port, with -mptx-comment we have in pr53465.s:
...
  // #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
  // Start: Added by -minit-regs=3:
  // #NO_APP
  mov.u32 %r26, 0;
  // #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
  // End: Added by -minit-regs=3:
  // #NO_APP
...

The comments were generated using the compiler-generated equivalent of:
...
asm ("// Comment");
...
but both the printed location and the NO_APP/APP are unnecessary for a
compiler-generated asm insn.

Fix this by handling ASM_INPUT_SOURCE_LOCATION == UNKNOWN_LOCATION in
final_scan_insn_1, such that we simply get:
...
  // Start: Added by -minit-regs=3:
  mov.u32 %r26, 0;
  // End: Added by -minit-regs=3:
...

Tested on nvptx.

OK for trunk?





Ping^2.

Tobias just reported an ICE in PR104968, and this patch fixes it.

I'd like to know whether this patch is acceptable for stage 4 or not.

If not, I need to fix PR104968 in a different way.  Say, disable
-mcomment by default, or trying harder to propagate source info on
outlined functions.




Hi,

thanks for the review.


Usually targets use UNSPECs to emit compiler-generated "asm"
instructions.


Ack. [ I could go down that route eventually, but for now I'm hoping to 
implement this without having to change the port. ]



I think an unknown location is a reasonable but not
the best way to identify 'compiler-generated', we might lose
the location through optimization.  (why does it not use
the INSN_LOCATION?)



I don't know.  FWIW, at the time that ASM_INPUT_SOURCE_LOCATION was 
introduced (2007), there was no INSN_LOCATION yet (introduced in 2012), 
only INSN_LOCATOR, my guess is that it has something to do with that.



Rather than a location I'd use sth like DECL_ARTIFICIAL to
disable 'user-mangling', do we have something like that for
ASM or an insn in general? 


Haven't found it.


If not maybe there's an unused
bit on ASMs we can enable this way.


Done.  I've used the jump flag for that.

Updated, untested patch attached.

Is this what you meant?

Thanks,
- Tom

[final] Handle compiler-generated asm insn

For the nvptx port, with -mptx-comment we have in pr53465.s:
...
// #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
// Start: Added by -minit-regs=3:
// #NO_APP
mov.u32 %r26, 0;
// #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
// End: Added by -minit-regs=3:
// #NO_APP
...

The comments were generated using the compiler-generated equivalent of:
...
  asm ("// Comment");
...
but both the printed location and the NO_APP/APP are unnecessary for a
compiler-generated asm insn.

Fix this by:
- adding new flag ASM_INPUT_ARTIFICIAL_P
- in gen_comment:
  - setting ASM_INPUT_ARTIFICIAL_P to 1
  - setting ASM_INPUT_SOURCE_LOCATION to UNKNOWN_LOCATION,
- in final_scan_insn_1:
  - handling ASM_INPUT_SOURCE_LOCATION == UNKNOWN_LOCATION and
  ASM_INPUT_ARTIFICIAL_P
such that we simply get:
...
// Start: Added by -minit-regs=3:
mov.u32 %r26, 0;
// End: Added by -minit-regs=3:
...
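
The intended effect on the assembly output can be modelled outside the compiler. The following is a simplified, hypothetical stand-in, not the code in final.cc: struct asm_input_model and emit_asm_input are invented names, the "//" comment marker is hard-coded as for nvptx, and the NO_APP line is printed eagerly here instead of being deferred to the next non-asm insn.

```c
#include <stdio.h>

/* Hypothetical stand-in for an ASM_INPUT rtx.  */
struct asm_input_model
{
  const char *text;	/* The asm string.  */
  const char *file;	/* NULL models UNKNOWN_LOCATION.  */
  int line;
  int artificial_p;	/* Models ASM_INPUT_ARTIFICIAL_P.  */
};

/* Write the assembly for X into BUF and return the number of characters
   that snprintf reports.  Assumption of this sketch: an insn that is
   artificial or has no location gets only the string itself, anything
   else gets the APP/NO_APP wrapper plus a location line.  */
int
emit_asm_input (char *buf, size_t size, const struct asm_input_model *x)
{
  int user_p = !x->artificial_p && x->file != NULL && x->line != 0;

  if (!user_p)
    return snprintf (buf, size, "\t%s\n", x->text);

  return snprintf (buf, size, "// #APP\n// %d \"%s\" 1\n\t%s\n// #NO_APP\n",
		   x->line, x->file, x->text);
}
```

With artificial_p set, only the tab-indented comment line is produced, which corresponds to the shortened output shown above.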

Tested on nvptx.

gcc/ChangeLog:

2022-02-21  Tom de Vries  

	PR rtl-optimization/104596
	* rtl.h (struct rtx_def): Document use of jump flag in ASM_INPUT.
	(ASM_INPUT_ARTIFICIAL_P): New macro.
	* config/nvptx/nvptx.cc (gen_comment): Use gen_rtx_ASM_INPUT instead
	of gen_rtx_ASM_INPUT_loc.  Set ASM_INPUT_ARTIFICIAL_P.
	* final.cc (final_scan_insn_1): Handle
	ASM_INPUT_SOURCE_LOCATION == UNKNOWN_LOCATION and
	ASM_INPUT_ARTIFICIAL_P.

---
 gcc/config/nvptx/nvptx.cc |  5 +++--
 gcc/final.cc  | 18 --
 gcc/rtl.h |  3 +++
 3 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 87efc23bd96a..93df3f309d18 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -5442,8 +5442,9 @@ gen_comment (const char *s)
   size_t len = strlen (ASM_COMMENT_START) + strlen (sep) + strlen (s) + 1;
   char *comment = (char *) alloca (len);
   snprintf (comment, len, "%s%s%s", ASM_COMMENT_START, sep, s);
-  return gen_rtx_ASM_INPUT_loc (VOIDmode, ggc_strdup (comment),
-DECL_SOURCE_LOCATION (cfun->decl));
+  rtx asm_input = gen_rtx_ASM_INPUT (VOIDmode, ggc_strdup (comment));
+  ASM_INPUT_ARTIFICIAL_P (asm_input) = 1;
+  return asm_input;
 }
 
 /* Initialize all declared regs at function entry.
diff --git a/gcc/final.cc b/gcc/final.cc
index a9868861bd2c..fee512869482 100644
--- a/gcc/final.cc
+++ b/gcc/final.cc
@@ -2642,15 +2642,21 @@ final_scan_insn

Re: [PATCH][openmp] Set location for taskloop stmts

2022-03-18 Thread Tom de Vries via Gcc-patches

On 3/18/22 15:56, Jakub Jelinek wrote:

On Fri, Mar 18, 2022 at 03:42:48PM +0100, Tom de Vries wrote:

And for NVPTX we somehow lower the taskloop into GIMPLE_ASM
or how we end up ICEing?



In the nvptx backend, gen_comment (triggering not very frequently atm) uses
gen_rtx_ASM_INPUT_loc with as location argument DECL_SOURCE_LOCATION
(cfun->decl).


Ok.


Alternatively, if there's a better way to get some random valid location
than DECL_SOURCE_LOCATION (cfun->decl), that would also work for me. ]


No objection against doing that, but if we do it, we should probably do it
for all or at least most gimple_build_omp_* calls, not just these 2.
So in gimplify_omp_parallel, gimplify_omp_task, another spot in
gimplify_omp_for beyond these 2, gimplify_omp_workshare (ideally just
in one spot for all the cases), gimplify_omp_target_update,
gimplify_omp_atomic, gimplify_omp_ordered, gimplify_expr's
case OMP_* that call gimple_build_omp_*.
Or is it normally handled using
if (!gimple_seq_empty_p (internal_post))
  {
annotate_all_with_location (internal_post, input_location);
gimplify_seq_add_seq (pre_p, internal_post);
  }
and we just need to catch the cases where we gimplify something into
multiple nested stmts because annotate_all_with_location doesn't
walk into gimple_omp_body?


I can try to update the patch to take care of these additional cases.

I reckon answering the questions that you're asking requires writing
test-cases for all of these.


Actually, in the light of annotate_all_with_location annotating
the newly generated sequence except for the stmts in nested contexts
I think only the two spots you have in your patch is what needs adjusting.
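
The point about nested contexts can be sketched with a toy statement list in C. This is a hypothetical model, not the gimple API: struct stmt_model and both functions are invented here, and location 0 stands in for UNKNOWN_LOCATION.

```c
#include <stddef.h>

/* Hypothetical stand-in for a gimple statement.  */
struct stmt_model
{
  int location;			/* 0 models UNKNOWN_LOCATION.  */
  struct stmt_model *next;	/* Next statement in the same sequence.  */
  struct stmt_model *body;	/* Nested body, e.g. of an OMP construct.  */
};

/* Give LOCATION to every statement in SEQ that has none yet.  Like the
   behaviour described above, this does not walk into nested bodies.  */
void
annotate_sequence (struct stmt_model *seq, int location)
{
  for (struct stmt_model *s = seq; s != NULL; s = s->next)
    if (s->location == 0)
      s->location = location;
}

/* Count statements without a location, including those in nested bodies.  */
int
count_unlocated (const struct stmt_model *seq)
{
  int n = 0;

  for (const struct stmt_model *s = seq; s != NULL; s = s->next)
    n += (s->location == 0) + count_unlocated (s->body);

  return n;
}
```

After annotating an outer statement that has one nested statement, the nested one is still without a location, which is why it has to be set explicitly.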

But I'd do it only when actually dealing with a OMP_TASKLOOP, so both
in the spot of your second hunk and for consistency with the
annotate_all_with_location do there (pseudo patch):
+  gimple_set_location (gfor, input_location);
g = gimple_build_bind (NULL_TREE, gfor, NULL_TREE);
g = gimple_build_omp_task (g, task_clauses, NULL_TREE, NULL_TREE,
   NULL_TREE, NULL_TREE, NULL_TREE);
gimple_omp_task_set_taskloop_p (g, true);
+  gimple_set_location (g, input_location);
g = gimple_build_bind (NULL_TREE, g, NULL_TREE);
gomp_for *gforo
  = gimple_build_omp_for (g, GF_OMP_FOR_KIND_TASKLOOP, 
outer_for_clauses,
  gimple_omp_for_collapse (gfor),
  gimple_omp_for_pre_body (gfor));
gimple_omp_for_set_pre_body (gfor, NULL);
gimple_omp_for_set_combined_p (gforo, true);
gimple_omp_for_set_combined_into_p (gfor, true);
In theory we could do it for the gimple_build_bind results too, but we don't
do that in other spots where we gimple_build_bind in OpenMP/OpenACC related
gimplification.

Ok for trunk with those tweaks.


Ack, committed (in two steps though, I accidentally first committed the 
old patch).


Thanks,
- Tom


[committed][openmp] Fix SIMT reduction using TRUTH_{AND,OR}IF_EXPR

2022-03-18 Thread Tom de Vries via Gcc-patches
Hi,

Consider test-case pr104952-1.c, included in this commit, containing:
...
  #pragma omp target map(tofrom:result) map(to:arr)
  #pragma omp simd reduction(||: result)
...

When run on x86_64 with nvptx accelerator, the test-case either aborts or
hangs.

The reduction clause is translated by the SIMT code (active for nvptx) as a
butterfly reduction loop with this butterfly shuffle / update pair:
...
  D.2163 = D.2163 || .GOMP_SIMT_XCHG_BFLY (D.2163, D.2164)
...
in the loop body.

The problem is that the butterfly shuffle is possibly not executed, while it
needs to be executed unconditionally.

Fix this by translating instead as:
...
  D.tmp_bfly = .GOMP_SIMT_XCHG_BFLY (D.2163, D.2164)
  D.2163 = D.2163 || D.tmp_bfly
...
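
The same short-circuit effect can be reproduced in plain C. In this sketch xchg_bfly is a hypothetical stand-in for .GOMP_SIMT_XCHG_BFLY that merely counts how often it is called; the other names are likewise made up for illustration.

```c
/* Number of times the modelled shuffle has been executed.  */
int xchg_calls;

/* Hypothetical stand-in for the butterfly shuffle: it must run in every
   lane, which is modelled here as a call with a visible side effect.  */
int
xchg_bfly (int other_lane_value)
{
  xchg_calls++;
  return other_lane_value;
}

/* Original translation: the shuffle is the right-hand operand of ||, so
   it is skipped whenever ACC is already nonzero.  */
int
reduce_short_circuit (int acc, int other_lane_value)
{
  return acc || xchg_bfly (other_lane_value);
}

/* Fixed translation: the shuffle result goes into a temporary first, so
   it is executed unconditionally.  */
int
reduce_unconditional (int acc, int other_lane_value)
{
  int tmp_bfly = xchg_bfly (other_lane_value);
  return acc || tmp_bfly;
}
```

With acc equal to 1, the first form never calls the shuffle, while the second calls it exactly once and still produces the same reduction result.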

Tested on x86_64-linux with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom

[openmp] Fix SIMT reduction using TRUTH_{AND,OR}IF_EXPR

gcc/ChangeLog:

2022-03-17  Tom de Vries  

PR target/104952
* omp-low.cc (lower_rec_input_clauses): Make sure GOMP_SIMT_XCHG_BFLY
is executed unconditionally.

libgomp/ChangeLog:

2022-03-17  Tom de Vries  

PR target/104952
* testsuite/libgomp.c/pr104952-1.c: New test.
* testsuite/libgomp.c/pr104952-2.c: New test.

---
 gcc/omp-low.cc   |  5 -
 libgomp/testsuite/libgomp.c/pr104952-1.c | 24 
 libgomp/testsuite/libgomp.c/pr104952-2.c | 22 ++
 3 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index cfc63d6a104..392bb18bc5d 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -6743,7 +6743,10 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
  x = build_call_expr_internal_loc
(UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY,
 TREE_TYPE (ivar), 2, ivar, simt_lane);
- x = build2 (code, TREE_TYPE (ivar), ivar, x);
+ /* Make sure x is evaluated unconditionally.  */
+ tree bfly_var = create_tmp_var (TREE_TYPE (ivar));
+ gimplify_assign (bfly_var, x, &llist[2]);
+ x = build2 (code, TREE_TYPE (ivar), ivar, bfly_var);
  gimplify_assign (ivar, x, &llist[2]);
}
  tree ivar2 = ivar;
diff --git a/libgomp/testsuite/libgomp.c/pr104952-1.c b/libgomp/testsuite/libgomp.c/pr104952-1.c
new file mode 100644
index 000..a3bfb1e77df
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/pr104952-1.c
@@ -0,0 +1,24 @@
+#define N 32
+
+static char arr[N];
+
+int
+main (void)
+{
+  unsigned int result = 0;
+
+  for (unsigned int i = 0; i < N; ++i)
+arr[i] = 0;
+
+  arr[5] = 42;
+
+#pragma omp target map(tofrom:result) map(to:arr)
+#pragma omp simd reduction(||: result)
+  for (unsigned int i = 0; i < N; ++i)
+result = result || arr[i];
+
+  if (result != 1)
+__builtin_abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/pr104952-2.c b/libgomp/testsuite/libgomp.c/pr104952-2.c
new file mode 100644
index 000..7ab4bcdb8af
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/pr104952-2.c
@@ -0,0 +1,22 @@
+#define N 32
+
+static char arr[N];
+
+int
+main (void)
+{
+  unsigned int result = 2;
+
+  for (unsigned int i = 0; i < N; ++i)
+arr[i] = i + 1;
+
+#pragma omp target map(tofrom:result) map(to:arr)
+#pragma omp simd reduction(&&: result)
+  for (unsigned int i = 0; i < N; ++i)
+result = result && arr[i];
+
+  if (result != 1)
+__builtin_abort ();
+
+  return 0;
+}


Re: [PATCH][openmp] Set location for taskloop stmts

2022-03-18 Thread Tom de Vries via Gcc-patches

On 3/18/22 14:01, Jakub Jelinek wrote:

On Fri, Mar 18, 2022 at 01:44:00PM +0100, Tom de Vries wrote:

The test-case included in this patch contains:
...
   #pragma omp taskloop simd shared(a) lastprivate(myId)
...

This is translated to 3 taskloop statements in gimple, visible with
-fdump-tree-gimple:
...
   #pragma omp taskloop private(D.2124)
 #pragma omp taskloop shared(a) shared(myId) private(i.0) firstprivate(a_h)
   #pragma omp taskloop lastprivate(myId)
...

But when exposing the gimple statement locations using
-fdump-tree-gimple-lineno, we find that only the first one has location
information.

Fix this by adding the missing location information.

Tested gomp.exp on x86_64.

Tested libgomp testsuite on x86_64 with nvptx accelerator.


And for NVPTX we somehow lower the taskloop into GIMPLE_ASM
or how we end up ICEing?



In the nvptx backend, gen_comment (triggering not very frequently atm) 
uses gen_rtx_ASM_INPUT_loc with as location argument 
DECL_SOURCE_LOCATION (cfun->decl).


If this location is UNKNOWN_LOCATION, we run into an ICE, which is fixed 
by the proposed patch "[final] Handle compiler-generated asm insn" ( 
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590721.html ).


As for the openmp test-case, we end up lowering at least one of those 
taskloops into an outlined function, and if its location is 
UNKNOWN_LOCATION and gen_comment is triggered in the body, we run into 
the ICE.


[ My preferred solution is to have "[final] Handle compiler-generated 
asm insn" approved and committed, but no response so far, maybe ignored 
for not being stage-4 material, I'm not sure.


Alternatively, if there's a better way to get some random valid location 
than DECL_SOURCE_LOCATION (cfun->decl), that would also work for me. ]



No objection against doing that, but if we do it, we should probably do it
for all or at least most gimple_build_omp_* calls, not just these 2.
So in gimplify_omp_parallel, gimplify_omp_task, another spot in
gimplify_omp_for beyond these 2, gimplify_omp_workshare (ideally just
in one spot for all the cases), gimplify_omp_target_update,
gimplify_omp_atomic, gimplify_omp_ordered, gimplify_expr's
case OMP_* that call gimple_build_omp_*.
Or is it normally handled using
   if (!gimple_seq_empty_p (internal_post))
 {
   annotate_all_with_location (internal_post, input_location);
   gimplify_seq_add_seq (pre_p, internal_post);
 }
and we just need to catch the cases where we gimplify something into
multiple nested stmts because annotate_all_with_location doesn't
walk into gimple_omp_body?


I can try to update the patch to take care of these additional cases.

I reckon answering the questions that you're asking requires writing 
test-cases for all of these.


Thanks,
- Tom


[PATCH][openmp] Set location for taskloop stmts

2022-03-18 Thread Tom de Vries via Gcc-patches
Hi,

The test-case included in this patch contains:
...
  #pragma omp taskloop simd shared(a) lastprivate(myId)
...

This is translated to 3 taskloop statements in gimple, visible with
-fdump-tree-gimple:
...
  #pragma omp taskloop private(D.2124)
#pragma omp taskloop shared(a) shared(myId) private(i.0) firstprivate(a_h)
  #pragma omp taskloop lastprivate(myId)
...

But when exposing the gimple statement locations using
-fdump-tree-gimple-lineno, we find that only the first one has location
information.

Fix this by adding the missing location information.

Tested gomp.exp on x86_64.

Tested libgomp testsuite on x86_64 with nvptx accelerator.

OK for trunk?

Thanks,
- Tom

[openmp] Set location for taskloop stmts

gcc/ChangeLog:

2022-03-18  Tom de Vries  

* gimplify.cc (gimplify_omp_for): Set taskloop location.

gcc/testsuite/ChangeLog:

2022-03-18  Tom de Vries  

* c-c++-common/gomp/pr104968.c: New test.

---
 gcc/gimplify.cc|  2 ++
 gcc/testsuite/c-c++-common/gomp/pr104968.c | 14 ++
 2 files changed, 16 insertions(+)

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 139a0de6100..c46589639e4 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -13178,6 +13178,7 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
   gfor = gimple_build_omp_for (for_body, kind, OMP_FOR_CLAUSES (orig_for_stmt),
   TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt)),
   for_pre_body);
+  gimple_set_location (gfor, EXPR_LOCATION (*expr_p));
   if (orig_for_stmt != for_stmt)
 gimple_omp_for_set_combined_p (gfor, true);
   if (gimplify_omp_ctxp
@@ -13361,6 +13362,7 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
   g = gimple_build_bind (NULL_TREE, gfor, NULL_TREE);
   g = gimple_build_omp_task (g, task_clauses, NULL_TREE, NULL_TREE,
 NULL_TREE, NULL_TREE, NULL_TREE);
+  gimple_set_location (g, EXPR_LOCATION (*expr_p));
   gimple_omp_task_set_taskloop_p (g, true);
   g = gimple_build_bind (NULL_TREE, g, NULL_TREE);
   gomp_for *gforo
diff --git a/gcc/testsuite/c-c++-common/gomp/pr104968.c b/gcc/testsuite/c-c++-common/gomp/pr104968.c
new file mode 100644
index 000..2977db2f433
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/pr104968.c
@@ -0,0 +1,14 @@
+/* { dg-additional-options "-fdump-tree-gimple-lineno" }  */
+
+int
+main (void)
+{
+  double a[10], a_h[10];
+  int myId = -1;
+#pragma omp target map(tofrom:a)
+#pragma omp taskloop simd shared(a) lastprivate(myId) /* { dg-line here } */
+for(int i = 0 ; i < 10; i++) if (a[i] != a_h[i]) { }
+}
+
+/* { dg-final { scan-tree-dump-times "#pragma omp taskloop" 3 "gimple" } }  */
+/* { dg-final { scan-tree-dump-times "(?n)\\\[.*pr104968.c:[get-absolute-line '' here]:.*\\\] #pragma omp taskloop" 3 "gimple" } }  */


[PING^2][PATCH][final] Handle compiler-generated asm insn

2022-03-17 Thread Tom de Vries via Gcc-patches

On 3/9/22 13:50, Tom de Vries wrote:

On 2/22/22 14:55, Tom de Vries wrote:

Hi,

For the nvptx port, with -mptx-comment we have in pr53465.s:
...
 // #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
 // Start: Added by -minit-regs=3:
 // #NO_APP
 mov.u32 %r26, 0;
 // #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
 // End: Added by -minit-regs=3:
 // #NO_APP
...

The comments were generated using the compiler-generated equivalent of:
...
   asm ("// Comment");
...
but both the printed location and the NO_APP/APP are unnecessary for a
compiler-generated asm insn.

Fix this by handling ASM_INPUT_SOURCE_LOCATION == UNKNOWN_LOCATION in
final_scan_insn_1, such that we simply get:
...
 // Start: Added by -minit-regs=3:
 mov.u32 %r26, 0;
 // End: Added by -minit-regs=3:
...

Tested on nvptx.

OK for trunk?





Ping^2.

Tobias just reported an ICE in PR104968, and this patch fixes it.

I'd like to know whether this patch is acceptable for stage 4 or not.

If not, I need to fix PR104968 in a different way.  Say, disable 
-mcomment by default, or trying harder to propagate source info on 
outlined functions.


Thanks,
- Tom


[final] Handle compiler-generated asm insn

gcc/ChangeLog:

2022-02-21  Tom de Vries  

PR rtl-optimization/104596
* config/nvptx/nvptx.cc (gen_comment): Use gen_rtx_ASM_INPUT instead
of gen_rtx_ASM_INPUT_loc.
* final.cc (final_scan_insn_1): Handle
ASM_INPUT_SOURCE_LOCATION == UNKNOWN_LOCATION.

---
  gcc/config/nvptx/nvptx.cc |  3 +--
  gcc/final.cc  | 17 +++--
  2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 858789e6df7..4124c597f24 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -5381,8 +5381,7 @@ gen_comment (const char *s)
    size_t len = strlen (ASM_COMMENT_START) + strlen (sep) + strlen 
(s) + 1;

    char *comment = (char *) alloca (len);
    snprintf (comment, len, "%s%s%s", ASM_COMMENT_START, sep, s);
-  return gen_rtx_ASM_INPUT_loc (VOIDmode, ggc_strdup (comment),
-    cfun->function_start_locus);
+  return gen_rtx_ASM_INPUT (VOIDmode, ggc_strdup (comment));
  }
  /* Initialize all declared regs at function entry.
diff --git a/gcc/final.cc b/gcc/final.cc
index a9868861bd2..e6443ef7a4f 100644
--- a/gcc/final.cc
+++ b/gcc/final.cc
@@ -2642,15 +2642,20 @@ final_scan_insn_1 (rtx_insn *insn, FILE *file, 
int optimize_p ATTRIBUTE_UNUSED,

  if (string[0])
    {
  expanded_location loc;
+    bool unknown_loc_p
+  = ASM_INPUT_SOURCE_LOCATION (body) == UNKNOWN_LOCATION;
-    app_enable ();
-    loc = expand_location (ASM_INPUT_SOURCE_LOCATION (body));
-    if (*loc.file && loc.line)
-  fprintf (asm_out_file, "%s %i \"%s\" 1\n",
-   ASM_COMMENT_START, loc.line, loc.file);
+    if (!unknown_loc_p)
+  {
+    app_enable ();
+    loc = expand_location (ASM_INPUT_SOURCE_LOCATION (body));
+    if (*loc.file && loc.line)
+  fprintf (asm_out_file, "%s %i \"%s\" 1\n",
+   ASM_COMMENT_START, loc.line, loc.file);
+  }
  fprintf (asm_out_file, "\t%s\n", string);
  #if HAVE_AS_LINE_ZERO
-    if (*loc.file && loc.line)
+    if (!unknown_loc_p && loc.file && *loc.file && loc.line)
    fprintf (asm_out_file, "%s 0 \"\" 2\n", ASM_COMMENT_START);
  #endif
    }


PING**4 - [PATCH] middle-end: Support ABIs that pass FP values as wider integers.

2022-03-14 Thread Tom de Vries via Gcc-patches

On 3/2/22 20:18, Jeff Law via Gcc-patches wrote:



On 2/28/2022 5:54 AM, Richard Biener via Gcc-patches wrote:

On Mon, 28 Feb 2022, Tobias Burnus wrote:


Ping**3

On 23.02.22 09:42, Tobias Burnus wrote:

PING**2 for the ME review or at least comments to that patch,
which fixes a build issue/ICE with nvptx

Patch:
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590139.html
(for gcc/cfgexpand.cc + gcc/expr.cc)

(There is some discussion by Tom and Roger about the BE in the patch
thread, which does not relate to the ME patch. But there is no
ME-patch comment so far.)

The related BE patch has been already committed, but to be effective, it
needs the ME patch.

I'm not sure I'm qualified to review this - maybe Richard is.
I'd initially ignored the patch as it didn't seem a good fit for stage4, 
subsequent messages changed my mind about it, but I never went back to 
take a deeper look at Roger's patch.


Ping.

[ FWIW, I'd appreciate it if a response came before the end of stage 4, 
such that I have some time left to deal with fallout in case the patch 
is not approved. ]


Thanks,
- Tom


[committed][nvptx] Use no,yes for attribute predicable

2022-03-10 Thread Tom de Vries via Gcc-patches
Hi,

The documentation states about the predicable instruction attribute:
...
This attribute must be a boolean (i.e. have exactly two elements in its
list-of-values), with the possible values being no and yes.
...

The nvptx port has instead:
...
(define_attr "predicable" "false,true"
  (const_string "true"))
...

Fix this by updating to:
...
(define_attr "predicable" "no,yes"
  (const_string "yes"))
...

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx] Use no,yes for attribute predicable

gcc/ChangeLog:

2022-03-08  Tom de Vries  

PR target/104840
* config/nvptx/nvptx.md (define_attr "predicable"): Use no,yes instead
of false,true.

---
 gcc/config/nvptx/nvptx.md | 40 
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 1ccb0f11e4c..1dec7caa0d1 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -172,8 +172,8 @@ (define_predicate "symbol_ref_function_operand"
   return SYMBOL_REF_FUNCTION_P (op);
 })
 
-(define_attr "predicable" "false,true"
-  (const_string "true"))
+(define_attr "predicable" "no,yes"
+  (const_string "yes"))
 
 (define_cond_exec
   [(match_operator 0 "predicate_operator"
@@ -911,7 +911,7 @@ (define_insn "br_true"
  (pc)))]
   ""
   "%j0\\tbra\\t%l1;"
-  [(set_attr "predicable" "false")])
+  [(set_attr "predicable" "no")])
 
 (define_insn "br_false"
   [(set (pc)
@@ -921,7 +921,7 @@ (define_insn "br_false"
  (pc)))]
   ""
   "%J0\\tbra\\t%l1;"
-  [(set_attr "predicable" "false")])
+  [(set_attr "predicable" "no")])
 
 ;; unified conditional branch
 (define_insn "br_true_uni"
@@ -931,7 +931,7 @@ (define_insn "br_true_uni"
 (label_ref (match_operand 1 "" "")) (pc)))]
   ""
   "%j0\\tbra.uni\\t%l1;"
-  [(set_attr "predicable" "false")])
+  [(set_attr "predicable" "no")])
 
 (define_insn "br_false_uni"
   [(set (pc) (if_then_else
@@ -940,7 +940,7 @@ (define_insn "br_false_uni"
 (label_ref (match_operand 1 "" "")) (pc)))]
   ""
   "%J0\\tbra.uni\\t%l1;"
-  [(set_attr "predicable" "false")])
+  [(set_attr "predicable" "no")])
 
 (define_expand "cbranch4"
   [(set (pc)
@@ -1619,7 +1619,7 @@ (define_insn "return"
 {
   return nvptx_output_return ();
 }
-  [(set_attr "predicable" "false")])
+  [(set_attr "predicable" "no")])
 
 (define_expand "epilogue"
   [(clobber (const_int 0))]
@@ -1712,7 +1712,7 @@ (define_insn "trap_if_true"
(const_int 0))]
   ""
   "%j0 trap; %j0 exit;"
-  [(set_attr "predicable" "false")])
+  [(set_attr "predicable" "no")])
 
 (define_insn "trap_if_false"
   [(trap_if (eq (match_operand:BI 0 "nvptx_register_operand" "R")
@@ -1720,7 +1720,7 @@ (define_insn "trap_if_false"
(const_int 0))]
   ""
   "%J0 trap; %J0 exit;"
-  [(set_attr "predicable" "false")])
+  [(set_attr "predicable" "no")])
 
 (define_expand "ctrap4"
   [(trap_if (match_operator 0 "nvptx_comparison_operator"
@@ -1769,28 +1769,28 @@ (define_insn "nvptx_fork"
   UNSPECV_FORK)]
   ""
   "// fork %0;"
-  [(set_attr "predicable" "false")])
+  [(set_attr "predicable" "no")])
 
 (define_insn "nvptx_forked"
   [(unspec_volatile:SI [(match_operand:SI 0 "const_int_operand" "")]
   UNSPECV_FORKED)]
   ""
   "// forked %0;"
-  [(set_attr "predicable" "false")])
+  [(set_attr "predicable" "no")])
 
 (define_insn "nvptx_joining"
   [(unspec_volatile:SI [(match_operand:SI 0 "const_int_operand" "")]
   UNSPECV_JOINING)]
   ""
   "// joining %0;"
-  [(set_attr "predicable" "false")])
+  [(set_attr "predicable" "no")])
 
 (define_insn "nvptx_join"
   [(unspec_volatile:SI [(match_operand:SI 0 "const_int_operand" "")]
   UNSPECV_JOIN)]
   ""
   "// join %0;"
-  [(set_attr "predicable" "false")])
+  [(set_attr "predicabl

[committed][nvptx] Disable warp sync in simt region

2022-03-10 Thread Tom de Vries via Gcc-patches
Hi,

I ran into a hang for this code:
...
  #pragma omp target map(tofrom: counter_N0)
  #pragma omp simd
  for (int i = 0 ; i < 1 ; i++ )
{
  #pragma omp atomic update
  counter_N0 = counter_N0 + 1 ;
}
...

This has to do with the nature of -muniform-simt.  It has two modes of
operation: inside and outside an SIMT region.

Outside an SIMT region, a warp pretends to execute a single thread, but
actually executes in all threads, to keep the local registers in all threads
consistent.  This approach works unless the insn that is executed is a syscall
or an atomic insn.  In that case, the insn is predicated, such that it
executes in only one thread.  If the predicated insn writes a result to a
register, then that register is propagated to the other threads, after which
the local registers in all threads are consistent again.

Inside an SIMT region, a warp executes in all threads.  However, the
predication and propagation for syscalls and atomic insns is also present
here, because nvptx_reorg_uniform_simt works on all code.  Care has been taken
though to ensure that the predication and propagation is a nop.  That is,
inside an SIMT region:
- the predicate evaluates to true for each thread, and
- the propagation insn copies a register from each thread to the same thread.
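
To make the two modes concrete, here is a minimal host-side sketch in C.  It
is only a model under stated assumptions, not the nvptx implementation: the
names warp_model, predicated_atomic_add, run_outside_simt and run_inside_simt
are hypothetical, a warp is assumed to have 32 lanes, and the atomics of the
individual lanes are modelled as happening one after the other.

```c
#include <stdbool.h>
#include <stdint.h>

#define WARP_SIZE 32

/* Hypothetical model of one warp: one copy of a register per lane.  */
struct warp_model
{
  uint32_t reg[WARP_SIZE];
};

/* Model of a predicated atomic add followed by an index shuffle.
   PRED[lane] says whether the lane executes the atomic; SRC_LANE[lane] is
   the lane whose register value the lane reads in the shuffle.  */
static void
predicated_atomic_add (struct warp_model *w, uint32_t *mem, uint32_t val,
                       const bool pred[WARP_SIZE],
                       const int src_lane[WARP_SIZE])
{
  uint32_t before[WARP_SIZE];

  for (int lane = 0; lane < WARP_SIZE; lane++)
    if (pred[lane])
      {
        /* The atomic returns the old value and updates memory.  */
        w->reg[lane] = *mem;
        *mem += val;
      }

  /* The shuffle reads the values the lanes held before the exchange.  */
  for (int lane = 0; lane < WARP_SIZE; lane++)
    before[lane] = w->reg[lane];
  for (int lane = 0; lane < WARP_SIZE; lane++)
    w->reg[lane] = before[src_lane[lane]];
}

/* Outside an SIMT region: only the master lane executes the atomic, and
   every lane then reads the result from the master lane.  */
void
run_outside_simt (struct warp_model *w, uint32_t *mem, uint32_t val,
                  int master)
{
  bool pred[WARP_SIZE];
  int src_lane[WARP_SIZE];

  for (int lane = 0; lane < WARP_SIZE; lane++)
    {
      pred[lane] = lane == master;
      src_lane[lane] = master;
    }
  predicated_atomic_add (w, mem, val, pred, src_lane);
}

/* Inside an SIMT region: the predicate is true for each lane, and each
   lane reads its own register, so predication and propagation are nops.  */
void
run_inside_simt (struct warp_model *w, uint32_t *mem, uint32_t val)
{
  bool pred[WARP_SIZE];
  int src_lane[WARP_SIZE];

  for (int lane = 0; lane < WARP_SIZE; lane++)
    {
      pred[lane] = true;
      src_lane[lane] = lane;
    }
  predicated_atomic_add (w, mem, val, pred, src_lane);
}
```

In this model, run_outside_simt leaves the same value in the register of every
lane after a single memory update, while run_inside_simt performs one update
per lane and leaves each lane with its own result.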

That works fine, until we use -mptx=6.0, and instead of using the deprecated
warp propagation insn shfl, we start using shfl.sync:
...
  @%r33 atom.add.u32_, [%r29], 1;
shfl.sync.idx.b32   %r30, %r30, %r32, 31, 0x;
...

The shfl.sync specifies a member mask indicating all threads, but given that
the loop only has a single iteration, only thread 0 will execute the insn,
where it will hang waiting for the other threads.

Fix this by predicating the shfl.sync (and likewise, bar.warp.sync and the
uniform warp check) such that it only executes outside the SIMT region.
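
The hang and the fix can be modelled in a few lines of C.  This is a sketch
under a simplifying assumption: a warp-level sync completes only once every
lane named in its member mask has arrived.  The helper names are hypothetical
and a 32-lane warp is assumed, so "all threads" is a mask with all 32 bits
set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if a warp-level sync naming MEMBER_MASK can complete when
   only the lanes in ARRIVED_MASK reach it.  Simplified model: the sync
   waits until every lane named in MEMBER_MASK has arrived.  */
bool
warp_sync_completes (uint32_t member_mask, uint32_t arrived_mask)
{
  return (member_mask & ~arrived_mask) == 0;
}

/* Model of the fix: the sync is predicated on being outside an SIMT
   region, so inside an SIMT region it is not executed and cannot block.  */
bool
predicated_warp_sync_completes (bool outside_simt, uint32_t member_mask,
                                uint32_t arrived_mask)
{
  if (!outside_simt)
    return true;
  return warp_sync_completes (member_mask, arrived_mask);
}
```

With only lane 0 arriving (arrived mask 0x1) and an all-lanes member mask,
warp_sync_completes returns false, which corresponds to the hang; the
predicated variant returns true inside the SIMT region because the sync is
skipped.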

Tested on x86_64 with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom

[nvptx] Disable warp sync in simt region

gcc/ChangeLog:

2022-03-08  Tom de Vries  

PR target/104783
* config/nvptx/nvptx.cc (nvptx_init_unisimt_predicate)
(nvptx_output_unisimt_switch): Handle unisimt_outside_simt_predicate.
(nvptx_get_unisimt_outside_simt_predicate): New function.
(predicate_insn): New function, factored out of ...
(nvptx_reorg_uniform_simt): ... here.  Predicate all emitted insns.
* config/nvptx/nvptx.h (struct machine_function): Add
unisimt_outside_simt_predicate field.
* config/nvptx/nvptx.md (define_insn "nvptx_warpsync")
(define_insn "nvptx_uniform_warp_check"): Make predicable.

libgomp/ChangeLog:

2022-03-10  Tom de Vries  

* testsuite/libgomp.c/pr104783.c: New test.

---
 gcc/config/nvptx/nvptx.cc  | 45 +++---
 gcc/config/nvptx/nvptx.h   |  1 +
 gcc/config/nvptx/nvptx.md  | 29 --
 libgomp/testsuite/libgomp.c/pr104783.c | 18 ++
 4 files changed, 76 insertions(+), 17 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index c41e305a34f..3a7be63c290 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -1364,6 +1364,13 @@ nvptx_init_unisimt_predicate (FILE *file)
   int master = REGNO (cfun->machine->unisimt_master);
   int pred = REGNO (cfun->machine->unisimt_predicate);
   fprintf (file, "\t\tld.shared.u32 %%r%d, [%%r%d];\n", master, loc);
+  if (cfun->machine->unisimt_outside_simt_predicate)
+   {
+ int pred_outside_simt
+   = REGNO (cfun->machine->unisimt_outside_simt_predicate);
+ fprintf (file, "\t\tsetp.eq.u32 %%r%d, %%r%d, 0;\n",
+  pred_outside_simt, master);
+   }
   fprintf (file, "\t\tmov.u32 %%ustmp0, %%laneid;\n");
   /* Compute 'master lane index' as 'laneid & __nvptx_uni[tid.y]'.  */
   fprintf (file, "\t\tand.b32 %%r%d, %%r%d, %%ustmp0;\n", master, master);
@@ -1589,6 +1596,13 @@ nvptx_output_unisimt_switch (FILE *file, bool entering)
   fprintf (file, "\t{\n");
   fprintf (file, "\t\t.reg.u32 %%ustmp2;\n");
   fprintf (file, "\t\tmov.u32 %%ustmp2, %d;\n", entering ? -1 : 0);
+  if (cfun->machine->unisimt_outside_simt_predicate)
+{
+  int pred_outside_simt
+   = REGNO (cfun->machine->unisimt_outside_simt_predicate);
+  fprintf (file, "\t\tmov.pred %%r%d, %d;\n", pred_outside_simt,
+  entering ? 0 : 1);
+}
   if (!crtl->is_leaf)
 {
   int loc = REGNO (cfun->machine->unisimt_location);
@@ -3242,6 +3256,13 @@ nvptx_get_unisimt_predicate ()
   return pred ? pred : pred = gen_reg_rtx (BImode);
 }
 
+static rtx
+nvptx_get_unisimt_outside_simt_predicate ()
+{
+  rtx  = cfun->machine->unisimt_outside_simt_predicate;
+  retu

[committed][nvptx] Handle unused result in nvptx_unisimt_handle_set

2022-03-10 Thread Tom de Vries via Gcc-patches
Hi,

For an example:
...
  #pragma omp target map(tofrom: counter_N0)
  #pragma omp simd
  for (int i = 0 ; i < 1 ; i++ )
{
  #pragma omp atomic update
  counter_N0 = counter_N0 + 1 ;
}
...
I noticed that the result of the atomic update (%r30) is propagated:
...
  @%r33 atom.add.u32_, [%r29], 1;
shfl.sync.idx.b32   %r30, %r30, %r32, 31, 0x;
...
even though it is unused (which is why the bit bucket operand _ is used).

Fix this by not emitting the shuffle in this case, such that we have instead:
...
  @%r33 atom.add.u32_, [%r29], 1;
bar.warp.sync   0x;
...

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx] Handle unused result in nvptx_unisimt_handle_set

gcc/ChangeLog:

2022-03-07  Tom de Vries  

* config/nvptx/nvptx.cc (nvptx_unisimt_handle_set): Handle unused
result.

gcc/testsuite/ChangeLog:

2022-03-07  Tom de Vries  

* gcc.target/nvptx/uniform-simt-4.c: New test.

---
 gcc/config/nvptx/nvptx.cc   |  4 +++-
 gcc/testsuite/gcc.target/nvptx/uniform-simt-4.c | 22 ++
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 14911bd15f1..c41e305a34f 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -3274,7 +3274,9 @@ static bool
 nvptx_unisimt_handle_set (rtx set, rtx_insn *insn, rtx master)
 {
   rtx reg;
-  if (GET_CODE (set) == SET && REG_P (reg = SET_DEST (set)))
+  if (GET_CODE (set) == SET
+  && REG_P (reg = SET_DEST (set))
+  && find_reg_note (insn, REG_UNUSED, reg) == NULL_RTX)
 {
   emit_insn_after (nvptx_gen_shuffle (reg, reg, master, SHUFFLE_IDX),
   insn);
diff --git a/gcc/testsuite/gcc.target/nvptx/uniform-simt-4.c 
b/gcc/testsuite/gcc.target/nvptx/uniform-simt-4.c
new file mode 100644
index 000..c33de7a4111
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/uniform-simt-4.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -muniform-simt -mptx=_" } */
+
+enum memmodel
+{
+  MEMMODEL_RELAXED = 0
+};
+
+unsigned long long int *p64;
+unsigned long long int v64;
+
+int
+main()
+{
+  __atomic_fetch_add (p64, v64, MEMMODEL_RELAXED);
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "atom.add.u64\[\t \]+_," 1 } } */
+/* { dg-final { scan-assembler-times "bar.warp.sync" 1 } } */
+/* { dg-final { scan-assembler-not "shfl.sync.idx" } } */


[committed][nvptx] Use bit-bucket operand for atom insns

2022-03-10 Thread Tom de Vries via Gcc-patches
Hi,

For an atomic fetch operation that doesn't use the result:
...
  __atomic_fetch_add (p64, v64, MEMMODEL_RELAXED);
...
we currently emit:
...
  atom.add.u64 %r26, [%r25], %r27;
...

Detect the REG_UNUSED reg-note for %r26, and emit instead:
...
  atom.add.u64 _, [%r25], %r27;
...

Likewise for all atom insns.
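
As a rough illustration of the operand choice, here is a standalone C sketch.
The helper format_atom_dest and its result_unused flag are hypothetical and
stand in for the REG_UNUSED check; the sketch does not use GCC's RTL API.

```c
#include <stdbool.h>
#include <stdio.h>

/* Write the destination operand of an atom insn into BUF: the bit bucket
   "_" if the result is unused, otherwise the register %rREGNO.
   Returns the value returned by snprintf.  */
int
format_atom_dest (char *buf, size_t size, int regno, bool result_unused)
{
  if (result_unused)
    return snprintf (buf, size, "_");
  return snprintf (buf, size, "%%r%d", regno);
}
```

For the example above, a call with regno 26 gives "%r26" when the result is
used and "_" when it is not.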

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx] Use bit-bucket operand for atom insns

gcc/ChangeLog:

2022-03-07  Tom de Vries  

PR target/104815
* config/nvptx/nvptx.cc (nvptx_print_operand): Handle 'x' operand
modifier.
* config/nvptx/nvptx.md: Use %x0 destination operand in atom insns.

gcc/testsuite/ChangeLog:

2022-03-07  Tom de Vries  

PR target/104815
* gcc.target/nvptx/atomic-bit-bucket-dest.c: New test.

---
 gcc/config/nvptx/nvptx.cc  | 11 ++-
 gcc/config/nvptx/nvptx.md  | 10 +++
 .../gcc.target/nvptx/atomic-bit-bucket-dest.c  | 35 ++
 3 files changed, 50 insertions(+), 6 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 6ca99a61cbd..14911bd15f1 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -2835,7 +2835,8 @@ nvptx_mem_maybe_shared_p (const_rtx x)
S -- print a shuffle kind specified by CONST_INT
t -- print a type opcode suffix, promoting QImode to 32 bits
T -- print a type size in bits
-   u -- print a type opcode suffix without promotions.  */
+   u -- print a type opcode suffix without promotions.
+   x -- print a destination operand that may also be a bit bucket.  */
 
 static void
 nvptx_print_operand (FILE *file, rtx x, int code)
@@ -2863,6 +2864,14 @@ nvptx_print_operand (FILE *file, rtx x, int code)
 
   switch (code)
 {
+case 'x':
+  if (current_output_insn != NULL
+ && find_reg_note (current_output_insn, REG_UNUSED, x) != NULL_RTX)
+   {
+ fputs ("_", file);
+ return;
+   }
+  goto common;
 case 'B':
   if (SYMBOL_REF_P (XEXP (x, 0)))
switch (SYMBOL_DATA_AREA (XEXP (x, 0)))
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 8079763077f..1cbf197065f 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -2050,7 +2050,7 @@ (define_insn "atomic_compare_and_swap_1"
   ""
   {
 const char *t
-  = "%.\\tatom%A1.cas.b%T0\\t%0, %1, %2, %3;";
+  = "%.\\tatom%A1.cas.b%T0\\t%x0, %1, %2, %3;";
 return nvptx_output_atomic_insn (t, operands, 1, 4);
   }
   [(set_attr "atomic" "true")])
@@ -2076,7 +2076,7 @@ (define_insn "atomic_exchange"
return "";
   }
 const char *t
-  = "%.\tatom%A1.exch.b%T0\t%0, %1, %2;";
+  = "%.\tatom%A1.exch.b%T0\t%x0, %1, %2;";
 return nvptx_output_atomic_insn (t, operands, 1, 3);
   }
   [(set_attr "atomic" "true")])
@@ -2166,7 +2166,7 @@ (define_insn "atomic_fetch_add"
return "";
   }
 const char *t
-  = "%.\\tatom%A1.add%t0\\t%0, %1, %2;";
+  = "%.\\tatom%A1.add%t0\\t%x0, %1, %2;";
 return nvptx_output_atomic_insn (t, operands, 1, 3);
   }
   [(set_attr "atomic" "true")])
@@ -2196,7 +2196,7 @@ (define_insn "atomic_fetch_addsf"
return "";
   }
 const char *t
-  = "%.\\tatom%A1.add%t0\\t%0, %1, %2;";
+  = "%.\\tatom%A1.add%t0\\t%x0, %1, %2;";
 return nvptx_output_atomic_insn (t, operands, 1, 3);
   }
   [(set_attr "atomic" "true")])
@@ -2226,7 +2226,7 @@ (define_insn "atomic_fetch_"
return "";
   }
 const char *t
-  = "%.\\tatom%A1..b%T0\\t%0, %1, %2;";
+  = "%.\\tatom%A1..b%T0\\t%x0, %1, %2;";
 return nvptx_output_atomic_insn (t, operands, 1, 3);
   }
 
diff --git a/gcc/testsuite/gcc.target/nvptx/atomic-bit-bucket-dest.c 
b/gcc/testsuite/gcc.target/nvptx/atomic-bit-bucket-dest.c
new file mode 100644
index 000..7e3ffcece06
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/atomic-bit-bucket-dest.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -misa=sm_35" } */
+
+enum memmodel
+{
+  MEMMODEL_RELAXED = 0
+};
+
+unsigned long long int *p64;
+unsigned long long int v64;
+
+int
+main()
+{
+  __atomic_fetch_add (p64, v64, MEMMODEL_RELAXED);
+  __atomic_fetch_and (p64, v64, MEMMODEL_RELAXED);
+  __atomic_fetch_or (p64, v64, MEMMODEL_RELAXED);
+  __atomic_fetch_xor (p64, v64, MEMMODEL_RELAXED);
+  __atomic_exchange_n (p64, v64, MEMMODEL_RELAXED);
+
+  {
+unsigned long long expected = v64;
+__atomic_compare_exchange_n (p64, &expected, 0, 0, MEMMODEL_RELAXED,
+MEMMODEL_RELAXED);
+  }
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "atom.add.u64\[\t \]+_,

[committed][nvptx] Use atom.and.b64 instead of atom.b64.and

2022-03-10 Thread Tom de Vries via Gcc-patches
Hi,

The ptx manual prescribes the instruction format atom{.space}.op.type but the
compiler currently emits:
...
  atom.b64.and %r31, [%r30], %r32;
...
which uses the instruction format atom{.space}.type.op.

Fix this by emitting instead:
...
  atom.and.b64  %r31, [%r30], %r32;
...
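
The ordering can be sketched as a small C helper that assembles the mnemonic
in the atom{.space}.op.type order.  The function name and its string
parameters are hypothetical; it is only meant to show where the operation and
the type go.

```c
#include <stdio.h>

/* Build an atom mnemonic in the order atom{.space}.op.type.
   SPACE is either "" or a state space including its dot, e.g. ".global".
   Returns the value returned by snprintf.  */
int
format_atom_mnemonic (char *buf, size_t size, const char *space,
                      const char *op, const char *type)
{
  return snprintf (buf, size, "atom%s.%s.%s", space, op, type);
}
```

Calling it with "", "and" and "b64" gives atom.and.b64, and with ".global",
"or" and "b32" gives atom.global.or.b32, matching the updated test patterns.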

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx] Use atom.and.b64 instead of atom.b64.and

gcc/ChangeLog:

2022-03-07  Tom de Vries  

* config/nvptx/nvptx.md (define_insn "atomic_fetch_"):
Emit atom.and.b64 instead of atom.b64.and.

gcc/testsuite/ChangeLog:

2022-03-07  Tom de Vries  

* gcc.target/nvptx/atomic_fetch-1.c: Update.
* gcc.target/nvptx/atomic_fetch-2.c: Update.

---
 gcc/config/nvptx/nvptx.md   |  2 +-
 gcc/testsuite/gcc.target/nvptx/atomic_fetch-1.c | 36 -
 gcc/testsuite/gcc.target/nvptx/atomic_fetch-2.c | 18 ++---
 3 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index a453c1de503..8079763077f 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -2226,7 +2226,7 @@ (define_insn "atomic_fetch_"
return "";
   }
 const char *t
-  = "%.\\tatom%A1.b%T0.\\t%0, %1, %2;";
+  = "%.\\tatom%A1..b%T0\\t%0, %1, %2;";
 return nvptx_output_atomic_insn (t, operands, 1, 3);
   }
 
diff --git a/gcc/testsuite/gcc.target/nvptx/atomic_fetch-1.c 
b/gcc/testsuite/gcc.target/nvptx/atomic_fetch-1.c
index 941cf3a2ab4..801572928cb 100644
--- a/gcc/testsuite/gcc.target/nvptx/atomic_fetch-1.c
+++ b/gcc/testsuite/gcc.target/nvptx/atomic_fetch-1.c
@@ -66,35 +66,35 @@ main()
 /* Generic.  */
 
 /* { dg-final { scan-assembler-times "atom.add.u64" 1 } } */
-/* { dg-final { scan-assembler-times "atom.b64.and" 1 } } */
-/* { dg-final { scan-assembler-times "atom.b64.or" 1 } } */
-/* { dg-final { scan-assembler-times "atom.b64.xor" 1 } } */
+/* { dg-final { scan-assembler-times "atom.and.b64" 1 } } */
+/* { dg-final { scan-assembler-times "atom.or.b64" 1 } } */
+/* { dg-final { scan-assembler-times "atom.xor.b64" 1 } } */
 
 /* { dg-final { scan-assembler-times "atom.add.u32" 1 } } */
-/* { dg-final { scan-assembler-times "atom.b32.and" 1 } } */
-/* { dg-final { scan-assembler-times "atom.b32.or" 1 } } */
-/* { dg-final { scan-assembler-times "atom.b32.xor" 1 } } */
+/* { dg-final { scan-assembler-times "atom.and.b32" 1 } } */
+/* { dg-final { scan-assembler-times "atom.or.b32" 1 } } */
+/* { dg-final { scan-assembler-times "atom.xor.b32" 1 } } */
 
 /* Global.  */
 
 /* { dg-final { scan-assembler-times "atom.global.add.u64" 1 } } */
-/* { dg-final { scan-assembler-times "atom.global.b64.and" 1 } } */
-/* { dg-final { scan-assembler-times "atom.global.b64.or" 1 } } */
-/* { dg-final { scan-assembler-times "atom.global.b64.xor" 1 } } */
+/* { dg-final { scan-assembler-times "atom.global.and.b64" 1 } } */
+/* { dg-final { scan-assembler-times "atom.global.or.b64" 1 } } */
+/* { dg-final { scan-assembler-times "atom.global.xor.b64" 1 } } */
 
 /* { dg-final { scan-assembler-times "atom.global.add.u32" 1 } } */
-/* { dg-final { scan-assembler-times "atom.global.b32.and" 1 } } */
-/* { dg-final { scan-assembler-times "atom.global.b32.or" 1 } } */
-/* { dg-final { scan-assembler-times "atom.global.b32.xor" 1 } } */
+/* { dg-final { scan-assembler-times "atom.global.and.b32" 1 } } */
+/* { dg-final { scan-assembler-times "atom.global.or.b32" 1 } } */
+/* { dg-final { scan-assembler-times "atom.global.xor.b32" 1 } } */
 
 /* Shared.  */
 
 /* { dg-final { scan-assembler-times "atom.shared.add.u64" 1 } } */
-/* { dg-final { scan-assembler-times "atom.shared.b64.and" 1 } } */
-/* { dg-final { scan-assembler-times "atom.shared.b64.or" 1 } } */
-/* { dg-final { scan-assembler-times "atom.shared.b64.xor" 1 } } */
+/* { dg-final { scan-assembler-times "atom.shared.and.b64" 1 } } */
+/* { dg-final { scan-assembler-times "atom.shared.or.b64" 1 } } */
+/* { dg-final { scan-assembler-times "atom.shared.xor.b64" 1 } } */
 
 /* { dg-final { scan-assembler-times "atom.shared.add.u32" 1 } } */
-/* { dg-final { scan-assembler-times "atom.shared.b32.and" 1 } } */
-/* { dg-final { scan-assembler-times "atom.shared.b32.or" 1 } } */
-/* { dg-final { scan-assembler-times "atom.shared.b32.xor" 1 } } */
+/* { dg-final { scan-assembler-times "atom.shared.and.b32" 1 } } */
+/* { dg-final { scan-assembler-times "atom.shared.or.b32" 1 } } */
+/* { dg-final { scan-assembler-times "atom.shared.xor.b32" 1 

[committed][nvptx] Add multilib mptx=3.1

2022-03-10 Thread Tom de Vries via Gcc-patches
Hi,

With commit 5b5e456f018 ("[nvptx] Build libraries with mptx=3.1") the
intention was that the ptx isa version for all libraries was switched back to
3.1 using MULTILIB_EXTRA_OPTS, without changing the default 6.0.

Further testing revealed that this is not the case, and some libs were still
built with 6.0.

Fix this by introducing an mptx=3.1 multilib.

Adding a multilib should be avoided if possible, because it adds build time.
But I think it's a reasonable trade-off.  With --disable-multilib, the default
lib with misa=sm_30 and mptx=6.0 should be usable in most scenarios.  With
--enable-multilib, we can enable older drivers, as well as generate code
similar to how that was done in previous gcc releases, which is very useful.

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx] Add multilib mptx=3.1

gcc/ChangeLog:

2022-03-07  Tom de Vries  

* config/nvptx/t-nvptx (MULTILIB_EXTRA_OPTS): Move mptx=3.1 ...
(MULTILIB_OPTIONS): ... here.

---
 gcc/config/nvptx/t-nvptx | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/gcc/config/nvptx/t-nvptx b/gcc/config/nvptx/t-nvptx
index a4a5341bb24..b63c4a5a39d 100644
--- a/gcc/config/nvptx/t-nvptx
+++ b/gcc/config/nvptx/t-nvptx
@@ -30,6 +30,4 @@ s-nvptx-gen-opt: $(srcdir)/config/nvptx/nvptx-sm.def
  tmp-nvptx-gen.opt $(srcdir)/config/nvptx/nvptx-gen.opt
$(STAMP) s-nvptx-gen-opt
 
-MULTILIB_OPTIONS = mgomp
-
-MULTILIB_EXTRA_OPTS = mptx=3.1
+MULTILIB_OPTIONS = mgomp mptx=3.1


[committed][nvptx] Restore default to sm_30

2022-03-10 Thread Tom de Vries via Gcc-patches
Hi,

With commit 07667c911b1 ("[nvptx] Build libraries with misa=sm_30") the
intention was that the sm_xx for all libraries was switched back to sm_30
using MULTILIB_EXTRA_OPTS, without changing the default sm_35.

Testing on an sm_30 board revealed that some libs were still built with sm_35,
so fix this by switching back to default sm_30.

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx] Restore default to sm_30

gcc/ChangeLog:

2022-03-07  Tom de Vries  

PR target/104758
* config/nvptx/nvptx.opt (misa): Set default to sm_30.
* config/nvptx/t-nvptx (MULTILIB_EXTRA_OPTS): Remove misa=sm_30.

---
 gcc/config/nvptx/nvptx.opt | 2 +-
 gcc/config/nvptx/t-nvptx   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index c83ceb3568b..fea99c5d406 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -53,7 +53,7 @@ Generate code for OpenMP offloading: enables -msoft-stack and 
-muniform-simt.
 
 ; Default needs to be in sync with default in ASM_SPEC in nvptx.h.
 misa=
-Target RejectNegative ToLower Joined Enum(ptx_isa) Var(ptx_isa_option) 
Init(PTX_ISA_SM35)
+Target RejectNegative ToLower Joined Enum(ptx_isa) Var(ptx_isa_option) 
Init(PTX_ISA_SM30)
 Specify the version of the ptx ISA to use.
 
 Enum
diff --git a/gcc/config/nvptx/t-nvptx b/gcc/config/nvptx/t-nvptx
index 8f67264d132..a4a5341bb24 100644
--- a/gcc/config/nvptx/t-nvptx
+++ b/gcc/config/nvptx/t-nvptx
@@ -32,4 +32,4 @@ s-nvptx-gen-opt: $(srcdir)/config/nvptx/nvptx-sm.def
 
 MULTILIB_OPTIONS = mgomp
 
-MULTILIB_EXTRA_OPTS = misa=sm_30 mptx=3.1
+MULTILIB_EXTRA_OPTS = mptx=3.1


[PING][PATCH][final] Handle compiler-generated asm insn

2022-03-09 Thread Tom de Vries via Gcc-patches

On 2/22/22 14:55, Tom de Vries wrote:

Hi,

For the nvptx port, with -mptx-comment we have in pr53465.s:
...
 // #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
 // Start: Added by -minit-regs=3:
 // #NO_APP
 mov.u32 %r26, 0;
 // #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
 // End: Added by -minit-regs=3:
 // #NO_APP
...

The comments were generated using the compiler-generated equivalent of:
...
   asm ("// Comment");
...
but both the printed location and the NO_APP/APP are unnecessary for a
compiler-generated asm insn.

Fix this by handling ASM_INPUT_SOURCE_LOCATION == UNKNOWN_LOCATION in
final_scan_insn_1, such that we simply get:
...
 // Start: Added by -minit-regs=3:
 mov.u32 %r26, 0;
 // End: Added by -minit-regs=3:
...
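
The intended behaviour can be sketched in standalone C as follows.  This is a
simplified model, not the final.cc code: struct src_loc and format_asm_string
are hypothetical, an unknown location is modelled as a null pointer, and the
APP/NO_APP markers are left out so that only the location line is shown.

```c
#include <stdio.h>

/* Hypothetical stand-in for an expanded source location.  */
struct src_loc
{
  const char *file;
  int line;
};

/* Format the output for an asm string into BUF.  If LOC is known, a
   location line precedes the string; if LOC is NULL (unknown location,
   i.e. a compiler-generated asm), only the string itself is printed.
   Returns the value returned by snprintf.  */
int
format_asm_string (char *buf, size_t size, const char *comment_start,
                   const char *string, const struct src_loc *loc)
{
  if (loc != NULL && loc->file != NULL && *loc->file && loc->line)
    return snprintf (buf, size, "%s %d \"%s\" 1\n\t%s\n",
                     comment_start, loc->line, loc->file, string);
  return snprintf (buf, size, "\t%s\n", string);
}
```

With a null location the result is just the tab-indented comment, which is
the shape of the shortened output shown above.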

Tested on nvptx.

OK for trunk?



Ping.

Thanks,
- Tom


[final] Handle compiler-generated asm insn

gcc/ChangeLog:

2022-02-21  Tom de Vries  

PR rtl-optimization/104596
* config/nvptx/nvptx.cc (gen_comment): Use gen_rtx_ASM_INPUT instead
of gen_rtx_ASM_INPUT_loc.
* final.cc (final_scan_insn_1): Handle
ASM_INPUT_SOURCE_LOCATION == UNKNOWN_LOCATION.

---
  gcc/config/nvptx/nvptx.cc |  3 +--
  gcc/final.cc  | 17 +++--
  2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 858789e6df7..4124c597f24 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -5381,8 +5381,7 @@ gen_comment (const char *s)
size_t len = strlen (ASM_COMMENT_START) + strlen (sep) + strlen (s) + 1;
char *comment = (char *) alloca (len);
snprintf (comment, len, "%s%s%s", ASM_COMMENT_START, sep, s);
-  return gen_rtx_ASM_INPUT_loc (VOIDmode, ggc_strdup (comment),
-   cfun->function_start_locus);
+  return gen_rtx_ASM_INPUT (VOIDmode, ggc_strdup (comment));
  }
  
  /* Initialize all declared regs at function entry.

diff --git a/gcc/final.cc b/gcc/final.cc
index a9868861bd2..e6443ef7a4f 100644
--- a/gcc/final.cc
+++ b/gcc/final.cc
@@ -2642,15 +2642,20 @@ final_scan_insn_1 (rtx_insn *insn, FILE *file, int 
optimize_p ATTRIBUTE_UNUSED,
if (string[0])
  {
expanded_location loc;
+   bool unknown_loc_p
+ = ASM_INPUT_SOURCE_LOCATION (body) == UNKNOWN_LOCATION;
  
-		app_enable ();

-   loc = expand_location (ASM_INPUT_SOURCE_LOCATION (body));
-   if (*loc.file && loc.line)
- fprintf (asm_out_file, "%s %i \"%s\" 1\n",
-  ASM_COMMENT_START, loc.line, loc.file);
+   if (!unknown_loc_p)
+ {
+   app_enable ();
+   loc = expand_location (ASM_INPUT_SOURCE_LOCATION (body));
+   if (*loc.file && loc.line)
+ fprintf (asm_out_file, "%s %i \"%s\" 1\n",
+  ASM_COMMENT_START, loc.line, loc.file);
+ }
fprintf (asm_out_file, "\t%s\n", string);
  #if HAVE_AS_LINE_ZERO
-   if (*loc.file && loc.line)
+   if (!unknown_loc_p && loc.file && *loc.file && loc.line)
  fprintf (asm_out_file, "%s 0 \"\" 2\n", ASM_COMMENT_START);
  #endif
  }


[committed][nvptx] Build libraries with mptx=3.1

2022-03-03 Thread Tom de Vries via Gcc-patches
Hi,

In gcc-5 to gcc-11, the ptx isa version was 3.1.

On trunk, the default is now 6.0, which is also what will be the value in
the libraries.

Consequently, there may be setups with an older driver that worked with
gcc-11, but will become unsupported with gcc-12.

Fix this by building the libraries with mptx=3.1.

After this, setups with an older driver still won't work out of the box
with gcc-12, because the default ptx isa version has changed, but should work
after specifying mptx=3.1.

Committed to trunk.

Thanks,
- Tom

[nvptx] Build libraries with mptx=3.1

gcc/ChangeLog:

2022-03-03  Tom de Vries  

* config/nvptx/t-nvptx (MULTILIB_EXTRA_OPTS): Add mptx=3.1.

---
 gcc/config/nvptx/t-nvptx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/nvptx/t-nvptx b/gcc/config/nvptx/t-nvptx
index 056d2dd2d04..8f67264d132 100644
--- a/gcc/config/nvptx/t-nvptx
+++ b/gcc/config/nvptx/t-nvptx
@@ -32,4 +32,4 @@ s-nvptx-gen-opt: $(srcdir)/config/nvptx/nvptx-sm.def
 
 MULTILIB_OPTIONS = mgomp
 
-MULTILIB_EXTRA_OPTS = misa=sm_30
+MULTILIB_EXTRA_OPTS = misa=sm_30 mptx=3.1


[committed][nvptx] Build libraries with misa=sm_30

2022-03-03 Thread Tom de Vries via Gcc-patches
Hi,

In gcc-11, when specifying -misa=sm_30, an executable may still contain sm_35
code (due to libraries being built with the default -misa=sm_35), so it won't
run on an sm_30 board.

Fix this by building libraries with sm_30, as was the case in gcc-5 to gcc-10.

Committed to trunk.

Thanks,
- Tom

[nvptx] Build libraries with misa=sm_30

gcc/ChangeLog:

2022-03-03  Tom de Vries  

PR target/104758
* config/nvptx/t-nvptx (MULTILIB_EXTRA_OPTS): Add misa=sm_30.

---
 gcc/config/nvptx/t-nvptx | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/nvptx/t-nvptx b/gcc/config/nvptx/t-nvptx
index f17fc9c19aa..056d2dd2d04 100644
--- a/gcc/config/nvptx/t-nvptx
+++ b/gcc/config/nvptx/t-nvptx
@@ -31,3 +31,5 @@ s-nvptx-gen-opt: $(srcdir)/config/nvptx/nvptx-sm.def
$(STAMP) s-nvptx-gen-opt
 
 MULTILIB_OPTIONS = mgomp
+
+MULTILIB_EXTRA_OPTS = misa=sm_30


[committed][nvptx] Use --no-verify for sm_30

2022-03-03 Thread Tom de Vries via Gcc-patches
Hi,

In PR97348, we ran into the problem that recent CUDA dropped support for
sm_30, which inhibited the build when building with CUDA bin in the path,
because the nvptx-tools assembler uses CUDA's ptxas to do ptx verification.

To fix this, in gcc-11 the default sm_xx was moved from sm_30 to sm_35.

This however broke support for sm_30 boards: an executable built for sm_30
might contain sm_35 code from the libraries, which are built with the default
sm_xx (PR104758).

We want to fix this by going back to having the libraries built with sm_30, as
was the case for gcc-5 to gcc-10.  That however reintroduces the problem from
PR97348.

Deal with PR97348 in the simplest way possible: when calling the assembler for
sm_30, specify --no-verify.
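
To spell out what the resulting assembler arguments look like, here is a
small C model of the spec "%{misa=*:-m %*; :-m sm_35}%{misa=sm_30:--no-verify}".
The helper format_asm_args is hypothetical; it models the resulting argument
list as one space-separated string rather than the spec machinery itself.

```c
#include <stdio.h>
#include <string.h>

/* Format the assembler arguments into BUF.  MISA is the value given with
   -misa=, or NULL if the option was not given, in which case the default
   from the spec (sm_35) is used.  For sm_30, --no-verify is added.
   Returns the value returned by snprintf.  */
int
format_asm_args (char *buf, size_t size, const char *misa)
{
  const char *arch = misa != NULL ? misa : "sm_35";
  const char *no_verify
    = (misa != NULL && strcmp (misa, "sm_30") == 0) ? " --no-verify" : "";

  return snprintf (buf, size, "-m %s%s", arch, no_verify);
}
```

In this model only an explicit -misa=sm_30 adds --no-verify; without -misa
the assembler is called with -m sm_35 and verification stays enabled.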

This has the unfortunate effect that after fixing PR104758 by building
libraries with sm_30, the libraries are no longer verified.  This can be
improved upon by:
- adding a configure test in gcc that tests if CUDA supports sm_30, and
  if so disabling this patch
- dealing with this in nvptx-tools somehow, either:
  - detect at ptxas execution time that it doesn't support sm_30, or
  - detect this at nvptx-tools configure time.

Committed to trunk.

Thanks,
- Tom

[nvptx] Use --no-verify for sm_30

gcc/ChangeLog:

2022-03-03  Tom de Vries  

* config/nvptx/nvptx.h (ASM_SPEC): Add %{misa=sm_30:--no-verify}.

---
 gcc/config/nvptx/nvptx.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index 4ab412bc7d8..3ca22a595d2 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -32,7 +32,7 @@
 /* Default needs to be in sync with default for misa in nvptx.opt.
We add a default here to work around a hard-coded sm_30 default in
nvptx-as.  */
-#define ASM_SPEC "%{misa=*:-m %*; :-m sm_35}"
+#define ASM_SPEC "%{misa=*:-m %*; :-m sm_35}%{misa=sm_30:--no-verify}"
 
 #define TARGET_CPU_CPP_BUILTINS() nvptx_cpu_cpp_builtins ()
 


[committed][nvptx] Add -mptx=_ in gcc.target/nvptx/smxx.c

2022-03-03 Thread Tom de Vries via Gcc-patches
Hi,

With target board nvptx-none-run/-mptx=3.1 we run into:
...
cc1: error: PTX version (-mptx) needs to be at least 4.2 to support \
  selected -misa (sm_53)^M
compiler exited with status 1
FAIL: gcc.target/nvptx/sm53.c (test for excess errors)
...

Fix this by adding -mptx=_ in sm53.c and similar.

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx] Add -mptx=_ in gcc.target/nvptx/smxx.c

gcc/testsuite/ChangeLog:

2022-03-03  Tom de Vries  

* gcc.target/nvptx/sm53.c: Add -mptx=_.
* gcc.target/nvptx/sm70.c: Same.
* gcc.target/nvptx/sm75.c: Same.
* gcc.target/nvptx/sm80.c: Same.

---
 gcc/testsuite/gcc.target/nvptx/sm53.c | 2 +-
 gcc/testsuite/gcc.target/nvptx/sm70.c | 2 +-
 gcc/testsuite/gcc.target/nvptx/sm75.c | 2 +-
 gcc/testsuite/gcc.target/nvptx/sm80.c | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/nvptx/sm53.c b/gcc/testsuite/gcc.target/nvptx/sm53.c
index c47790b6448..b4d819c6a79 100644
--- a/gcc/testsuite/gcc.target/nvptx/sm53.c
+++ b/gcc/testsuite/gcc.target/nvptx/sm53.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-misa=sm_53" } */
+/* { dg-options "-misa=sm_53 -mptx=_" } */
 
 #if __PTX_SM__ != 530
 #error wrong value for __PTX_SM__
diff --git a/gcc/testsuite/gcc.target/nvptx/sm70.c b/gcc/testsuite/gcc.target/nvptx/sm70.c
index dc5a5fd8bfa..4bd012b5680 100644
--- a/gcc/testsuite/gcc.target/nvptx/sm70.c
+++ b/gcc/testsuite/gcc.target/nvptx/sm70.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-misa=sm_70" } */
+/* { dg-options "-misa=sm_70 -mptx=_" } */
 
 #if __PTX_SM__ != 700
 #error wrong value for __PTX_SM__
diff --git a/gcc/testsuite/gcc.target/nvptx/sm75.c b/gcc/testsuite/gcc.target/nvptx/sm75.c
index c098bf77ca2..d159d3f5fb3 100644
--- a/gcc/testsuite/gcc.target/nvptx/sm75.c
+++ b/gcc/testsuite/gcc.target/nvptx/sm75.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-misa=sm_75" } */
+/* { dg-options "-misa=sm_75 -mptx=_" } */
 
 #if __PTX_SM__ != 750
 #error wrong value for __PTX_SM__
diff --git a/gcc/testsuite/gcc.target/nvptx/sm80.c b/gcc/testsuite/gcc.target/nvptx/sm80.c
index 3770563eb16..ef6d8b7fa23 100644
--- a/gcc/testsuite/gcc.target/nvptx/sm80.c
+++ b/gcc/testsuite/gcc.target/nvptx/sm80.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-misa=sm_80" } */
+/* { dg-options "-misa=sm_80 -mptx=_" } */
 
 #if __PTX_SM__ != 800
 #error wrong value for __PTX_SM__


[committed][nvptx] Handle DCmode in define_expand "omp_simt_xchg_{bfly,idx}"

2022-03-01 Thread Tom de Vries via Gcc-patches
Hi,

For a test-case doing an openmp target simd reduction on a complex double:
...
  DOUBLE COMPLEX :: counter_N0
  ...
  !$OMP TARGET SIMD reduction(+: counter_N0)
...
we run into:
...
during RTL pass: expand
b.f90: In function ‘MAIN__._omp_fn.0’:
b.f90:23:32: internal compiler error: in expand_insn, at optabs.cc:8029
   23 | counter_N0 = counter_N0 + 1.
  |^
0x10f1cd3 expand_insn(insn_code, unsigned int, expand_operand*)
gcc/optabs.cc:8029
0xeac435 expand_GOMP_SIMT_XCHG_BFLY
gcc/internal-fn.cc:375
...

Fix this by handling DCmode and CDImode in define_expand
"omp_simt_xchg_{bfly,idx}".

Tested on x86_64 with nvptx accelerator.
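
The idea behind the fix described above, shuffling the real and imaginary
halves of a complex value separately, can be sketched on the host as follows.
This is a plain C simulation under stated assumptions: the struct cplx type,
the LANES constant and both function names are made up for illustration and
do not correspond to code in nvptx.cc.

```c
#include <assert.h>

#define LANES 32  /* Assumed warp width for this simulation.  */

/* Hypothetical stand-in for a complex value held as two scalar parts.  */
struct cplx { double re, im; };

/* Scalar butterfly exchange: lane I receives the value of lane I ^ MASK.  */
void
xchg_bfly_double (double dst[LANES], const double src[LANES], unsigned mask)
{
  assert (mask < LANES);
  for (unsigned i = 0; i < LANES; i++)
    dst[i] = src[i ^ mask];
}

/* Complex butterfly exchange, built from two scalar exchanges: one for
   the real parts and one for the imaginary parts.  */
void
xchg_bfly_cplx (struct cplx dst[LANES], const struct cplx src[LANES],
		unsigned mask)
{
  double re_in[LANES], im_in[LANES], re_out[LANES], im_out[LANES];

  for (unsigned i = 0; i < LANES; i++)
    {
      re_in[i] = src[i].re;
      im_in[i] = src[i].im;
    }

  xchg_bfly_double (re_out, re_in, mask);
  xchg_bfly_double (im_out, im_in, mask);

  for (unsigned i = 0; i < LANES; i++)
    {
      dst[i].re = re_out[i];
      dst[i].im = im_out[i];
    }
}
```

The point of the sketch is only that no dedicated complex shuffle is needed
once the scalar one exists; the same index and kind are reused for both parts.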

Committed to trunk.

Thanks,
- Tom

[nvptx] Handle DCmode in define_expand "omp_simt_xchg_{bfly,idx}"

gcc/ChangeLog:

2022-02-28  Tom de Vries  

PR target/102429
* config/nvptx/nvptx.cc (nvptx_gen_shuffle): Handle DCmode and CDImode.
* config/nvptx/nvptx.md
(define_predicate "nvptx_register_or_complex_di_df_register_operand"):
New predicate.
(define_expand "omp_simt_xchg_bfly", define_expand "omp_simt_xchg_idx"):
Use nvptx_register_or_complex_di_df_register_operand.

---
 gcc/config/nvptx/nvptx.cc | 17 +
 gcc/config/nvptx/nvptx.md | 20 
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index f3179efa8d6..6ca99a61cbd 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -1941,6 +1941,23 @@ nvptx_gen_shuffle (rtx dst, rtx src, rtx idx, nvptx_shuffle_kind kind)
 
   switch (GET_MODE (dst))
 {
+  case E_DCmode:
+  case E_CDImode:
+   {
+ gcc_assert (GET_CODE (dst) == CONCAT);
+ gcc_assert (GET_CODE (src) == CONCAT);
+ rtx dst_real = XEXP (dst, 0);
+ rtx dst_imag = XEXP (dst, 1);
+ rtx src_real = XEXP (src, 0);
+ rtx src_imag = XEXP (src, 1);
+
+ start_sequence ();
+ emit_insn (nvptx_gen_shuffle (dst_real, src_real, idx, kind));
+ emit_insn (nvptx_gen_shuffle (dst_imag, src_imag, idx, kind));
+ res = get_insns ();
+ end_sequence ();
+   }
+   break;
 case E_SImode:
   res = gen_nvptx_shufflesi (dst, src, idx, GEN_INT (kind));
   break;
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 4989b5642e2..a453c1de503 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -94,6 +94,18 @@ (define_predicate "nvptx_register_operand"
   return register_operand (op, mode);
 })
 
+(define_predicate "nvptx_register_or_complex_di_df_register_operand"
+  (ior (match_code "reg")
+   (match_code "concat"))
+{
+  if (GET_CODE (op) == CONCAT)
+return ((GET_MODE (op) == DCmode || GET_MODE (op) == CDImode)
+   && nvptx_register_operand (XEXP (op, 0), mode)
+   && nvptx_register_operand (XEXP (op, 1), mode));
+
+  return nvptx_register_operand (op, mode);
+})
+
 (define_predicate "nvptx_nonimmediate_operand"
   (match_code "mem,reg")
 {
@@ -1902,8 +1914,8 @@ (define_expand "omp_simt_ordered"
 ;; Implement IFN_GOMP_SIMT_XCHG_BFLY: perform a "butterfly" exchange
 ;; across lanes
 (define_expand "omp_simt_xchg_bfly"
-  [(match_operand 0 "nvptx_register_operand" "=R")
-   (match_operand 1 "nvptx_register_operand" "R")
+  [(match_operand 0 "nvptx_register_or_complex_di_df_register_operand" "=R")
+   (match_operand 1 "nvptx_register_or_complex_di_df_register_operand" "R")
(match_operand:SI 2 "nvptx_nonmemory_operand" "Ri")]
   ""
 {
@@ -1915,8 +1927,8 @@ (define_expand "omp_simt_xchg_bfly"
 ;; Implement IFN_GOMP_SIMT_XCHG_IDX: broadcast value in operand 1
 ;; from lane given by index in operand 2 to operand 0 in all lanes
 (define_expand "omp_simt_xchg_idx"
-  [(match_operand 0 "nvptx_register_operand" "=R")
-   (match_operand 1 "nvptx_register_operand" "R")
+  [(match_operand 0 "nvptx_register_or_complex_di_df_register_operand" "=R")
+   (match_operand 1 "nvptx_register_or_complex_di_df_register_operand" "R")
(match_operand:SI 2 "nvptx_nonmemory_operand" "Ri")]
   ""
 {


[committed][nvptx] Add nvptx-gen.h and nvptx-gen.opt

2022-03-01 Thread Tom de Vries via Gcc-patches
Hi,

Use nvptx-sm.def to generate new files nvptx-gen.h and nvptx-gen.opt, and:
- include nvptx-gen.h in nvptx.h, and
- add nvptx-gen.opt to extra_options (before nvptx.opt, in case that matters).

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx] Add nvptx-gen.h and nvptx-gen.opt

gcc/ChangeLog:

2022-02-25  Tom de Vries  

* config.gcc (nvptx*-*-*): Add nvptx/nvptx-gen.opt to extra_options.
* config/nvptx/gen-copyright.sh: New file.
* config/nvptx/gen-h.sh: New file.
* config/nvptx/gen-opt.sh: New file.
* config/nvptx/nvptx.h (TARGET_SM35, TARGET_SM53, TARGET_SM70)
(TARGET_SM75, TARGET_SM80): Move ...
* config/nvptx/nvptx-gen.h: ... here.  New file, generate.
* config/nvptx/nvptx.opt (Enum ptx_isa): Move ...
* config/nvptx/nvptx-gen.opt: ... here.  New file, generate.
* config/nvptx/t-nvptx ($(srcdir)/config/nvptx/nvptx-gen.h)
($(srcdir)/config/nvptx/nvptx-gen.opt): New make target.

---
 gcc/config.gcc|  1 +
 gcc/config/nvptx/gen-copyright.sh | 82 +++
 gcc/config/nvptx/gen-h.sh | 44 +
 gcc/config/nvptx/gen-opt.sh   | 66 +++
 gcc/config/nvptx/nvptx-gen.h  | 29 ++
 gcc/config/nvptx/nvptx-gen.opt| 42 
 gcc/config/nvptx/nvptx.h  |  6 +--
 gcc/config/nvptx/nvptx.opt| 22 ---
 gcc/config/nvptx/t-nvptx  | 17 
 9 files changed, 282 insertions(+), 27 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 2cc5aeec9e4..3833bfa16a9 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -477,6 +477,7 @@ nvptx-*-*)
cpu_type=nvptx
c_target_objs="nvptx-c.o"
cxx_target_objs="nvptx-c.o"
+   extra_options="${extra_options} nvptx/nvptx-gen.opt"
;;
 or1k*-*-*)
cpu_type=or1k
diff --git a/gcc/config/nvptx/gen-copyright.sh b/gcc/config/nvptx/gen-copyright.sh
new file mode 100644
index 000..79f48995acc
--- /dev/null
+++ b/gcc/config/nvptx/gen-copyright.sh
@@ -0,0 +1,82 @@
+#!/bin/sh
+
+# Copyright (C) 2022 Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+style="$1"
+case $style in
+opt)
+;;
+c)
+   first=true
+;;
+*)
+   echo "Unknown style: \"$style\""
+   exit 1
+   ;;
+esac
+
+( cat <http://www.gnu.org/licenses/>.
+EOF
+) | while read line; do
+case $style in
+   opt)
+   if [ "$line" = "" ]; then
+   echo ";"
+   else
+   echo "; $line"
+   fi
+   ;;
+   c)
+   if $first; then
+   echo "/* $line"
+   first=false
+   else
+   if [ "$line" = "" ]; then
+   echo
+   else
+   echo "   $line"
+   fi
+   fi
+   ;;
+esac
+done
+
+
+case $style in
+c)
+   echo "*/"
+   ;;
+esac
diff --git a/gcc/config/nvptx/gen-h.sh b/gcc/config/nvptx/gen-h.sh
new file mode 100644
index 000..605f874055a
--- /dev/null
+++ b/gcc/config/nvptx/gen-h.sh
@@ -0,0 +1,44 @@
+#!/bin/sh
+
+# Copyright (C) 2022 Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+nvptx_sm_def="$1/nvptx-sm.def"
+gen_copyright_sh="$1/gen-copyright.sh"
+
+sms=$(grep ^NVPTX_SM $nvptx_sm_def | sed 's/.*(//;s/,.*//')
+
+cat <= PTX_ISA_SM$sm)
+EOF
+done
diff --git a/gcc/config/nvptx/gen-opt.sh b/gcc/config/nvptx/gen-opt.sh
new fil

[committed][nvptx] Use nvptx-sm.def for t-omp-device

2022-03-01 Thread Tom de Vries via Gcc-patches
Hi,

Add a script gen-omp-device-properties.sh that uses nvptx-sm.def to generate
omp-device-properties-nvptx.

Tested on x86_64 with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom

[nvptx] Use nvptx-sm.def for t-omp-device

gcc/ChangeLog:

2022-02-25  Tom de Vries  

* config/nvptx/gen-omp-device-properties.sh: New file.
* config/nvptx/t-omp-device: Use gen-omp-device-properties.sh.

---
 gcc/config/nvptx/gen-omp-device-properties.sh | 33 +++
 gcc/config/nvptx/t-omp-device |  7 +++---
 2 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/gcc/config/nvptx/gen-omp-device-properties.sh b/gcc/config/nvptx/gen-omp-device-properties.sh
new file mode 100644
index 000..175092cdde6
--- /dev/null
+++ b/gcc/config/nvptx/gen-omp-device-properties.sh
@@ -0,0 +1,33 @@
+#!/bin/sh
+
+# Copyright (C) 2022 Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+nvptx_sm_def="$1/nvptx-sm.def"
+
+sms=$(grep ^NVPTX_SM $nvptx_sm_def | sed 's/.*(//;s/,.*//')
+
+echo kind: gpu
+echo arch: nvptx
+
+isa=""
+for sm in $sms; do
+isa="$isa sm_$sm"
+done
+
+echo isa: $isa
diff --git a/gcc/config/nvptx/t-omp-device b/gcc/config/nvptx/t-omp-device
index 4228218a424..c2b28a41ee4 100644
--- a/gcc/config/nvptx/t-omp-device
+++ b/gcc/config/nvptx/t-omp-device
@@ -1,4 +1,3 @@
-omp-device-properties-nvptx: $(srcdir)/config/nvptx/nvptx.cc
-   echo kind: gpu > $@
-   echo arch: nvptx >> $@
-   echo isa: sm_30 sm_35 sm_53 sm_70 sm_75 sm_80 >> $@
+omp-device-properties-nvptx: $(srcdir)/config/nvptx/nvptx-sm.def
+   $(SHELL) $(srcdir)/config/nvptx/gen-omp-device-properties.sh \
+ "$(srcdir)/config/nvptx" > $@


[committed][nvptx] Add nvptx-sm.def

2022-03-01 Thread Tom de Vries via Gcc-patches
Hi,

Add a file gcc/config/nvptx/nvptx-sm.def that lists all sm_xx versions used in
the port, like so:
...
NVPTX_SM(30, NVPTX_SM_SEP)
NVPTX_SM(35, NVPTX_SM_SEP)
NVPTX_SM(53, NVPTX_SM_SEP)
NVPTX_SM(70, NVPTX_SM_SEP)
NVPTX_SM(75, NVPTX_SM_SEP)
NVPTX_SM(80,)
...
and use it in various places using a pattern:
...
  #define NVPTX_SM(XX, SEP) { ... }
  #include "nvptx-sm.def"
  #undef NVPTX_SM
...
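
For readers unfamiliar with this pattern, here is a minimal single-file sketch
of the same technique.  To keep it self-contained, it replaces the separate
nvptx-sm.def include with a list macro; SM_LIST, enum sm_isa and the two
functions are hypothetical names chosen for the example, not the identifiers
used in the port.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the .def file: one entry per sm_xx version.  */
#define SM_LIST(ENTRY) \
  ENTRY (30) ENTRY (35) ENTRY (53) ENTRY (70) ENTRY (75) ENTRY (80)

/* Expand the list into enumerators.  */
enum sm_isa
{
#define SM_ENUM(XX) SM_ISA_##XX,
  SM_LIST (SM_ENUM)
#undef SM_ENUM
  SM_ISA_COUNT
};

/* Expand the same list into switch cases.  */
const char *
sm_isa_to_string (enum sm_isa sm)
{
  switch (sm)
    {
#define SM_CASE(XX) case SM_ISA_##XX: return #XX;
      SM_LIST (SM_CASE)
#undef SM_CASE
    default:
      return NULL;
    }
}

/* Expand the same list into a chain of name comparisons.
   Returns the enumerator value, or -1 for an unknown name.  */
int
sm_isa_from_name (const char *name)
{
  assert (name != NULL);
#define SM_MATCH(XX) \
  if (strcmp (name, "sm_" #XX) == 0) return SM_ISA_##XX;
  SM_LIST (SM_MATCH)
#undef SM_MATCH
  return -1;
}
```

Adding a new version then means touching only the list; the enum, the string
conversion and the name lookup all follow automatically.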

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx] Add nvptx-sm.def

gcc/ChangeLog:

2022-02-25  Tom de Vries  

* config/nvptx/nvptx-sm.def: New file.
* config/nvptx/nvptx-c.cc (nvptx_cpu_cpp_builtins): Use nvptx-sm.def.
* config/nvptx/nvptx-opts.h (enum ptx_isa): Same.
* config/nvptx/nvptx.cc (sm_version_to_string)
(nvptx_omp_device_kind_arch_isa): Same.

---
 gcc/config/nvptx/nvptx-c.cc   | 22 ++
 gcc/config/nvptx/nvptx-opts.h | 11 +--
 gcc/config/nvptx/nvptx-sm.def | 30 ++
 gcc/config/nvptx/nvptx.cc | 36 
 4 files changed, 57 insertions(+), 42 deletions(-)

diff --git a/gcc/config/nvptx/nvptx-c.cc b/gcc/config/nvptx/nvptx-c.cc
index b2375fb5b16..02f75625064 100644
--- a/gcc/config/nvptx/nvptx-c.cc
+++ b/gcc/config/nvptx/nvptx-c.cc
@@ -39,17 +39,15 @@ nvptx_cpu_cpp_builtins (void)
 cpp_define (parse_in, "__nvptx_softstack__");
   if (TARGET_UNIFORM_SIMT)
 cpp_define (parse_in,"__nvptx_unisimt__");
-  if (TARGET_SM80)
-cpp_define (parse_in, "__PTX_SM__=800");
-  else if (TARGET_SM75)
-cpp_define (parse_in, "__PTX_SM__=750");
-  else if (TARGET_SM70)
-cpp_define (parse_in, "__PTX_SM__=700");
-  else if (TARGET_SM53)
-cpp_define (parse_in, "__PTX_SM__=530");
-  else if (TARGET_SM35)
-cpp_define (parse_in, "__PTX_SM__=350");
-  else
-cpp_define (parse_in,"__PTX_SM__=300");
+
+  const char *ptx_sm = NULL;
+#define NVPTX_SM(XX, SEP) \
+  {\
+if (TARGET_SM ## XX)   \
+  ptx_sm = "__PTX_SM__=" #XX "0"; \
+  }
+#include "nvptx-sm.def"
+#undef NVPTX_SM
+  cpp_define (parse_in, ptx_sm);
 }
 
diff --git a/gcc/config/nvptx/nvptx-opts.h b/gcc/config/nvptx/nvptx-opts.h
index 30852b6992c..86b433caae8 100644
--- a/gcc/config/nvptx/nvptx-opts.h
+++ b/gcc/config/nvptx/nvptx-opts.h
@@ -22,12 +22,11 @@
 
 enum ptx_isa
 {
-  PTX_ISA_SM30,
-  PTX_ISA_SM35,
-  PTX_ISA_SM53,
-  PTX_ISA_SM70,
-  PTX_ISA_SM75,
-  PTX_ISA_SM80
+#define NVPTX_SM(XX, SEP) PTX_ISA_SM ## XX SEP
+#define NVPTX_SM_SEP ,
+#include "nvptx-sm.def"
+#undef NVPTX_SM_SEP
+#undef NVPTX_SM
 };
 
 enum ptx_version
diff --git a/gcc/config/nvptx/nvptx-sm.def b/gcc/config/nvptx/nvptx-sm.def
new file mode 100644
index 000..c552eb0c88b
--- /dev/null
+++ b/gcc/config/nvptx/nvptx-sm.def
@@ -0,0 +1,30 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef NVPTX_SM_SEP
+#define NVPTX_SM_SEP
+#endif
+
+NVPTX_SM (30, NVPTX_SM_SEP)
+NVPTX_SM (35, NVPTX_SM_SEP)
+NVPTX_SM (53, NVPTX_SM_SEP)
+NVPTX_SM (70, NVPTX_SM_SEP)
+NVPTX_SM (75, NVPTX_SM_SEP)
+NVPTX_SM (80,)
+
+#undef NVPTX_SM_SEP
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 7862a90a65a..f3179efa8d6 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -276,18 +276,11 @@ sm_version_to_string (enum ptx_isa sm)
 {
   switch (sm)
 {
-case PTX_ISA_SM30:
-  return "30";
-case PTX_ISA_SM35:
-  return "35";
-case PTX_ISA_SM53:
-  return "53";
-case PTX_ISA_SM70:
-  return "70";
-case PTX_ISA_SM75:
-  return "75";
-case PTX_ISA_SM80:
-  return "80";
+#define NVPTX_SM(XX, SEP)  \
+  case PTX_ISA_SM ## XX:   \
+   return #XX;
+#include "nvptx-sm.def"
+#undef NVPTX_SM
 default:
   gcc_unreachable ();
 }
@@ -6177,18 +6170,13 @@ nvptx_omp_device_kind_arch_isa (enum omp_device_kind_arch_isa trait,
 case omp_device_arch:
   return strcmp (name, "nvptx") == 0;
 case omp_device_isa:
-  if (strcmp (name, "sm_30") == 0)
-   r

[committed][nvptx, testsuite] Add gcc.target/nvptx/sm*.c

2022-03-01 Thread Tom de Vries via Gcc-patches
Hi,

Add a few test-cases that test passing each -misa=sm_xx version and verify that
the proper __PTX_SM__ is defined.

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx, testsuite] Add gcc.target/nvptx/sm*.c

gcc/testsuite/ChangeLog:

2022-02-25  Tom de Vries  

* gcc.target/nvptx/sm30.c: New test.
* gcc.target/nvptx/sm35.c: New test.
* gcc.target/nvptx/sm53.c: New test.
* gcc.target/nvptx/sm70.c: New test.
* gcc.target/nvptx/sm75.c: New test.
* gcc.target/nvptx/sm80.c: New test.

---
 gcc/testsuite/gcc.target/nvptx/sm30.c | 6 ++
 gcc/testsuite/gcc.target/nvptx/sm35.c | 6 ++
 gcc/testsuite/gcc.target/nvptx/sm53.c | 6 ++
 gcc/testsuite/gcc.target/nvptx/sm70.c | 6 ++
 gcc/testsuite/gcc.target/nvptx/sm75.c | 6 ++
 gcc/testsuite/gcc.target/nvptx/sm80.c | 6 ++
 6 files changed, 36 insertions(+)

diff --git a/gcc/testsuite/gcc.target/nvptx/sm30.c b/gcc/testsuite/gcc.target/nvptx/sm30.c
new file mode 100644
index 000..4b3531788d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/sm30.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-misa=sm_30" } */
+
+#if __PTX_SM__ != 300
+#error wrong value for __PTX_SM__
+#endif
diff --git a/gcc/testsuite/gcc.target/nvptx/sm35.c b/gcc/testsuite/gcc.target/nvptx/sm35.c
new file mode 100644
index 000..ff3d1793846
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/sm35.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-misa=sm_35" } */
+
+#if __PTX_SM__ != 350
+#error wrong value for __PTX_SM__
+#endif
diff --git a/gcc/testsuite/gcc.target/nvptx/sm53.c b/gcc/testsuite/gcc.target/nvptx/sm53.c
new file mode 100644
index 000..c47790b6448
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/sm53.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-misa=sm_53" } */
+
+#if __PTX_SM__ != 530
+#error wrong value for __PTX_SM__
+#endif
diff --git a/gcc/testsuite/gcc.target/nvptx/sm70.c b/gcc/testsuite/gcc.target/nvptx/sm70.c
new file mode 100644
index 000..dc5a5fd8bfa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/sm70.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-misa=sm_70" } */
+
+#if __PTX_SM__ != 700
+#error wrong value for __PTX_SM__
+#endif
diff --git a/gcc/testsuite/gcc.target/nvptx/sm75.c b/gcc/testsuite/gcc.target/nvptx/sm75.c
new file mode 100644
index 000..c098bf77ca2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/sm75.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-misa=sm_75" } */
+
+#if __PTX_SM__ != 750
+#error wrong value for __PTX_SM__
+#endif
diff --git a/gcc/testsuite/gcc.target/nvptx/sm80.c b/gcc/testsuite/gcc.target/nvptx/sm80.c
new file mode 100644
index 000..3770563eb16
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/sm80.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-misa=sm_80" } */
+
+#if __PTX_SM__ != 800
+#error wrong value for __PTX_SM__
+#endif


[committed][libgomp, testsuite, nvptx] Add -mptx=_ in declare-variant-3-sm*.c

2022-02-28 Thread Tom de Vries via Gcc-patches
Hi,

When running with target board unix/-foffload=-mptx=3.1, we run into:
...
lto1: error: PTX version (-mptx) needs to be at least 4.2 to support \
  selected -misa (sm_53)^M
mkoffload: fatal error: x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned \
  1 exit status^M
compilation terminated.^M
  ...
FAIL: libgomp.c/declare-variant-3-sm53.c (test for excess errors)
...

Fix this by adding -foffload=-mptx=_ in the libgomp.c/declare-variant-3-sm*.c
test-cases.

Tested on x86_64 with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom

[libgomp, testsuite, nvptx] Add -mptx=_ in declare-variant-3-sm*.c

libgomp/ChangeLog:

2022-02-28  Tom de Vries  

* testsuite/libgomp.c/declare-variant-3-sm30.c: Add -foffload=-mptx=_.
* testsuite/libgomp.c/declare-variant-3-sm35.c: Same.
* testsuite/libgomp.c/declare-variant-3-sm53.c: Same.
* testsuite/libgomp.c/declare-variant-3-sm70.c: Same.
* testsuite/libgomp.c/declare-variant-3-sm75.c: Same.
* testsuite/libgomp.c/declare-variant-3-sm80.c: Same.

---
 libgomp/testsuite/libgomp.c/declare-variant-3-sm30.c | 2 +-
 libgomp/testsuite/libgomp.c/declare-variant-3-sm35.c | 2 +-
 libgomp/testsuite/libgomp.c/declare-variant-3-sm53.c | 2 +-
 libgomp/testsuite/libgomp.c/declare-variant-3-sm70.c | 2 +-
 libgomp/testsuite/libgomp.c/declare-variant-3-sm75.c | 2 +-
 libgomp/testsuite/libgomp.c/declare-variant-3-sm80.c | 2 +-
 6 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/libgomp/testsuite/libgomp.c/declare-variant-3-sm30.c b/libgomp/testsuite/libgomp.c/declare-variant-3-sm30.c
index ad1602c13cd..a49bc12064a 100644
--- a/libgomp/testsuite/libgomp.c/declare-variant-3-sm30.c
+++ b/libgomp/testsuite/libgomp.c/declare-variant-3-sm30.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target { offload_target_nvptx } } } */
-/* { dg-additional-options "-foffload=-misa=sm_30" } */
+/* { dg-additional-options "-foffload=-misa=sm_30 -foffload=-mptx=_" } */
 /* { dg-additional-options "-foffload=-fdump-tree-optimized" } */
 
 #include "declare-variant-3.h"
diff --git a/libgomp/testsuite/libgomp.c/declare-variant-3-sm35.c b/libgomp/testsuite/libgomp.c/declare-variant-3-sm35.c
index 1a7cda2456b..9f71acb8738 100644
--- a/libgomp/testsuite/libgomp.c/declare-variant-3-sm35.c
+++ b/libgomp/testsuite/libgomp.c/declare-variant-3-sm35.c
@@ -1,5 +1,5 @@
 /* { dg-do link { target { offload_target_nvptx } } } */
-/* { dg-additional-options "-foffload=-misa=sm_35" } */
+/* { dg-additional-options "-foffload=-misa=sm_35 -foffload=-mptx=_" } */
 /* { dg-additional-options "-foffload=-fdump-tree-optimized" } */
 
 #include "declare-variant-3.h"
diff --git a/libgomp/testsuite/libgomp.c/declare-variant-3-sm53.c b/libgomp/testsuite/libgomp.c/declare-variant-3-sm53.c
index a37b5fdaa28..fa713920ce0 100644
--- a/libgomp/testsuite/libgomp.c/declare-variant-3-sm53.c
+++ b/libgomp/testsuite/libgomp.c/declare-variant-3-sm53.c
@@ -1,5 +1,5 @@
 /* { dg-do link { target { offload_target_nvptx } } } */
-/* { dg-additional-options "-foffload=-misa=sm_53" } */
+/* { dg-additional-options "-foffload=-misa=sm_53 -foffload=-mptx=_" } */
 /* { dg-additional-options "-foffload=-fdump-tree-optimized" } */
 
 #include "declare-variant-3.h"
diff --git a/libgomp/testsuite/libgomp.c/declare-variant-3-sm70.c b/libgomp/testsuite/libgomp.c/declare-variant-3-sm70.c
index ab022cd79f9..90f0116c582 100644
--- a/libgomp/testsuite/libgomp.c/declare-variant-3-sm70.c
+++ b/libgomp/testsuite/libgomp.c/declare-variant-3-sm70.c
@@ -1,5 +1,5 @@
 /* { dg-do link { target { offload_target_nvptx } } } */
-/* { dg-additional-options "-foffload=-misa=sm_70" } */
+/* { dg-additional-options "-foffload=-misa=sm_70 -foffload=-mptx=_" } */
 /* { dg-additional-options "-foffload=-fdump-tree-optimized" } */
 
 #include "declare-variant-3.h"
diff --git a/libgomp/testsuite/libgomp.c/declare-variant-3-sm75.c b/libgomp/testsuite/libgomp.c/declare-variant-3-sm75.c
index 7d09195d9c4..86f2e72866a 100644
--- a/libgomp/testsuite/libgomp.c/declare-variant-3-sm75.c
+++ b/libgomp/testsuite/libgomp.c/declare-variant-3-sm75.c
@@ -1,5 +1,5 @@
 /* { dg-do link { target { offload_target_nvptx } } } */
-/* { dg-additional-options "-foffload=-misa=sm_75" } */
+/* { dg-additional-options "-foffload=-misa=sm_75 -foffload=-mptx=_" } */
 /* { dg-additional-options "-foffload=-fdump-tree-optimized" } */
 
 #include "declare-variant-3.h"
diff --git a/libgomp/testsuite/libgomp.c/declare-variant-3-sm80.c b/libgomp/testsuite/libgomp.c/declare-variant-3-sm80.c
index 898ae6e4da8..de208d9bdd1 100644
--- a/libgomp/testsuite/libgomp.c/declare-variant-3-sm80.c
+++ b/libgomp/testsuite/libgomp.c/declare-variant-3-sm80.c
@@ -1,5 +1,5 @@
 /* { dg-do link { target { offload_target_nvptx } } } */
-/* { dg-additional-options "-foffload=-misa=sm_80" } */
+/* { dg-additional-options "-foffload=-misa=sm_80 -foffload=-mptx=_" } */
 /* { dg-additional-options "-foffload=-fdump-tree-optimized" } */
 
 #include "declare-variant-3.h"


[committed][nvptx, testsuite] Add -mptx=_ in nvptx.exp test-cases

2022-02-28 Thread Tom de Vries via Gcc-patches
Hi,

When running with target board nvptx-none-run/-mptx=3.1, I run into:
...
cc1: error: PTX version (-mptx) needs to be at least 4.2 to support selected \
  -misa (sm_53)^M
compiler exited with status 1
FAIL: gcc.target/nvptx/atomic-store-1.c (test for excess errors)
...

Fix this and similar cases by adding an explicit -mptx=_ setting.

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx, testsuite] Add -mptx=_ in nvptx.exp test-cases

gcc/testsuite/ChangeLog:

2022-02-28  Tom de Vries  

* gcc.target/nvptx/atomic-store-1.c: Add -mptx=_.
* gcc.target/nvptx/atomic-store-2.c: Same.
* gcc.target/nvptx/float16-1.c: Same.
* gcc.target/nvptx/float16-2.c: Same.
* gcc.target/nvptx/float16-3.c: Same.
* gcc.target/nvptx/float16-4.c: Same.
* gcc.target/nvptx/float16-5.c: Same.
* gcc.target/nvptx/float16-6.c: Same.
* gcc.target/nvptx/tanh-1.c: Same.
* gcc.target/nvptx/uniform-simt-1.c: Same.
* gcc.target/nvptx/uniform-simt-3.c: Same.

---
 gcc/testsuite/gcc.target/nvptx/atomic-store-1.c | 2 +-
 gcc/testsuite/gcc.target/nvptx/atomic-store-2.c | 2 +-
 gcc/testsuite/gcc.target/nvptx/float16-1.c  | 2 +-
 gcc/testsuite/gcc.target/nvptx/float16-2.c  | 2 +-
 gcc/testsuite/gcc.target/nvptx/float16-3.c  | 2 +-
 gcc/testsuite/gcc.target/nvptx/float16-4.c  | 2 +-
 gcc/testsuite/gcc.target/nvptx/float16-5.c  | 2 +-
 gcc/testsuite/gcc.target/nvptx/float16-6.c  | 2 +-
 gcc/testsuite/gcc.target/nvptx/tanh-1.c | 2 +-
 gcc/testsuite/gcc.target/nvptx/uniform-simt-1.c | 2 +-
 gcc/testsuite/gcc.target/nvptx/uniform-simt-3.c | 2 +-
 11 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/gcc/testsuite/gcc.target/nvptx/atomic-store-1.c b/gcc/testsuite/gcc.target/nvptx/atomic-store-1.c
index d611f2d410f..eecd00854f7 100644
--- a/gcc/testsuite/gcc.target/nvptx/atomic-store-1.c
+++ b/gcc/testsuite/gcc.target/nvptx/atomic-store-1.c
@@ -2,7 +2,7 @@
shared state space.  */
 
 /* { dg-do compile } */
-/* { dg-options "-misa=sm_53" } */
+/* { dg-options "-misa=sm_53 -mptx=_" } */
 
 enum memmodel
 {
diff --git a/gcc/testsuite/gcc.target/nvptx/atomic-store-2.c b/gcc/testsuite/gcc.target/nvptx/atomic-store-2.c
index b58f33f2abd..127d2c4cbe2 100644
--- a/gcc/testsuite/gcc.target/nvptx/atomic-store-2.c
+++ b/gcc/testsuite/gcc.target/nvptx/atomic-store-2.c
@@ -2,7 +2,7 @@
shared state space.  */
 
 /* { dg-do compile } */
-/* { dg-options "-misa=sm_70" } */
+/* { dg-options "-misa=sm_70 -mptx=_" } */
 
 enum memmodel
 {
diff --git a/gcc/testsuite/gcc.target/nvptx/float16-1.c b/gcc/testsuite/gcc.target/nvptx/float16-1.c
index 9c3f8fe8f9d..873a0543535 100644
--- a/gcc/testsuite/gcc.target/nvptx/float16-1.c
+++ b/gcc/testsuite/gcc.target/nvptx/float16-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -misa=sm_53 -ffast-math" } */
+/* { dg-options "-O2 -ffast-math -misa=sm_53 -mptx=_" } */
 
 _Float16 var;
 
diff --git a/gcc/testsuite/gcc.target/nvptx/float16-2.c b/gcc/testsuite/gcc.target/nvptx/float16-2.c
index 2d1dc1aafb5..30a3092bc29 100644
--- a/gcc/testsuite/gcc.target/nvptx/float16-2.c
+++ b/gcc/testsuite/gcc.target/nvptx/float16-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ffast-math -misa=sm_80" } */
+/* { dg-options "-O2 -ffast-math -misa=sm_80 -mptx=_" } */
 
 _Float16 x;
 _Float16 y;
diff --git a/gcc/testsuite/gcc.target/nvptx/float16-3.c b/gcc/testsuite/gcc.target/nvptx/float16-3.c
index 3abcec39a8a..edd6514a976 100644
--- a/gcc/testsuite/gcc.target/nvptx/float16-3.c
+++ b/gcc/testsuite/gcc.target/nvptx/float16-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -misa=sm_53" } */
+/* { dg-options "-O2 -misa=sm_53 -mptx=_" } */
 
 _Float16 var;
 
diff --git a/gcc/testsuite/gcc.target/nvptx/float16-4.c b/gcc/testsuite/gcc.target/nvptx/float16-4.c
index 173f9600ac7..0a823971e75 100644
--- a/gcc/testsuite/gcc.target/nvptx/float16-4.c
+++ b/gcc/testsuite/gcc.target/nvptx/float16-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -misa=sm_53 -ffast-math" } */
+/* { dg-options "-O2 -ffast-math -misa=sm_53 -mptx=_" } */
 
 _Float16 var;
 
diff --git a/gcc/testsuite/gcc.target/nvptx/float16-5.c b/gcc/testsuite/gcc.target/nvptx/float16-5.c
index 700b3159a97..2261f42baac 100644
--- a/gcc/testsuite/gcc.target/nvptx/float16-5.c
+++ b/gcc/testsuite/gcc.target/nvptx/float16-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -misa=sm_53 -ffast-math" } */
+/* { dg-options "-O2 -ffast-math -misa=sm_53 -mptx=_" } */
 
 _Float16 a;
 _Float16 b;
diff --git a/gcc/testsuite/gcc.target/nvptx/float16-6.c b/gcc/testsuite/gcc.target/nvptx/float16-6.c
index 4889577f7f6..9ca714ca76f 100644
--- a/gcc/testsuite/gcc.target/nvptx/float16-6.c
+++ b/gcc/testsuite/gcc.target/nvptx/float16-6.c
@@

[committed][nvptx] Add -mptx=_

2022-02-28 Thread Tom de Vries via Gcc-patches
Hi,

Add an -mptx=_ value that indicates the default ptx version.

It can be used to undo an explicit -mptx setting, so this:
...
$ gcc test.c -mptx=3.1 -mptx=_
...
has the same effect as:
...
$ gcc test.c
...
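
The "last option wins, and _ means default" behaviour can be modelled with a
small sentinel-value sketch.  Everything here is an assumption made for the
example: the type and function names are hypothetical, only two concrete
versions are listed, and the fallback is passed in by the caller rather than
computed the way the port does it.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical reduced version enum; PTX_VER_DEFAULT is the sentinel
   selected by "-mptx=_".  */
enum ptx_ver { PTX_VER_DEFAULT, PTX_VER_3_1, PTX_VER_7_0 };

struct ptx_opt
{
  bool set;            /* Was any -mptx= option seen?  */
  enum ptx_ver value;  /* Value of the last -mptx= option seen.  */
};

/* Record one command-line argument.  Later options override earlier
   ones.  Returns false if ARG is not a recognized -mptx= option.  */
bool
parse_mptx (struct ptx_opt *opt, const char *arg)
{
  static const char prefix[] = "-mptx=";
  assert (opt != NULL && arg != NULL);

  if (strncmp (arg, prefix, sizeof prefix - 1) != 0)
    return false;

  const char *v = arg + sizeof prefix - 1;
  if (strcmp (v, "_") == 0)
    opt->value = PTX_VER_DEFAULT;
  else if (strcmp (v, "3.1") == 0)
    opt->value = PTX_VER_3_1;
  else if (strcmp (v, "7.0") == 0)
    opt->value = PTX_VER_7_0;
  else
    return false;

  opt->set = true;
  return true;
}

/* Resolve the effective version: an unset option and the sentinel both
   fall back to FALLBACK.  */
enum ptx_ver
resolve_ptx (const struct ptx_opt *opt, enum ptx_ver fallback)
{
  assert (opt != NULL);
  if (!opt->set || opt->value == PTX_VER_DEFAULT)
    return fallback;
  return opt->value;
}
```

In this model, parsing "-mptx=3.1" followed by "-mptx=_" resolves to the same
version as parsing nothing at all, which is the property the test-cases rely
on.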

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx] Add -mptx=_

gcc/ChangeLog:

2022-02-28  Tom de Vries  

* config/nvptx/nvptx-opts.h (enum ptx_version): Add
PTX_VERSION_default.
* config/nvptx/nvptx.cc (handle_ptx_version_option): Handle
PTX_VERSION_default.
* config/nvptx/nvptx.opt: Add EnumValue "_" / PTX_VERSION_default.

---
 gcc/config/nvptx/nvptx-opts.h | 1 +
 gcc/config/nvptx/nvptx.cc | 3 ++-
 gcc/config/nvptx/nvptx.opt| 3 +++
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/config/nvptx/nvptx-opts.h b/gcc/config/nvptx/nvptx-opts.h
index e918d43ea16..30852b6992c 100644
--- a/gcc/config/nvptx/nvptx-opts.h
+++ b/gcc/config/nvptx/nvptx-opts.h
@@ -32,6 +32,7 @@ enum ptx_isa
 
 enum ptx_version
 {
+  PTX_VERSION_default,
   PTX_VERSION_3_0,
   PTX_VERSION_3_1,
   PTX_VERSION_4_2,
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index b9451c2ed09..7862a90a65a 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -296,7 +296,8 @@ sm_version_to_string (enum ptx_isa sm)
 static void
 handle_ptx_version_option (void)
 {
-  if (!OPTION_SET_P (ptx_version_option))
+  if (!OPTION_SET_P (ptx_version_option)
+  || ptx_version_option == PTX_VERSION_default)
 {
   ptx_version_option = default_ptx_version_option ();
   return;
diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index 9776c3b9a1f..f555ad1d8bf 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -94,6 +94,9 @@ Enum(ptx_version) String(6.3) Value(PTX_VERSION_6_3)
 EnumValue
 Enum(ptx_version) String(7.0) Value(PTX_VERSION_7_0)
 
+EnumValue
+Enum(ptx_version) String(_) Value(PTX_VERSION_default)
+
 mptx=
 Target RejectNegative ToLower Joined Enum(ptx_version) Var(ptx_version_option)
 Specify the version of the ptx version to use.


[committed][nvptx, testsuite] Add -misa=sm_30 in nvptx/atomic-store-3.c

2022-02-28 Thread Tom de Vries via Gcc-patches
Hi,

When running with target board nvptx-none-run/-misa=sm_70 I run into:
...
FAIL: gcc.target/nvptx/atomic-store-3.c scan-assembler-times st.global.u32 1
FAIL: gcc.target/nvptx/atomic-store-3.c scan-assembler-times st.global.u64 1
...

Fix this by adding an explicit -misa=sm_30 in the test-case.

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx, testsuite] Add -misa=sm_30 in nvptx/atomic-store-3.c

gcc/testsuite/ChangeLog:

2022-02-28  Tom de Vries  

* gcc.target/nvptx/atomic-store-3.c: Add -misa=sm_30.

---
 gcc/testsuite/gcc.target/nvptx/atomic-store-3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/nvptx/atomic-store-3.c 
b/gcc/testsuite/gcc.target/nvptx/atomic-store-3.c
index cc0264f2b06..5d417b84b3e 100644
--- a/gcc/testsuite/gcc.target/nvptx/atomic-store-3.c
+++ b/gcc/testsuite/gcc.target/nvptx/atomic-store-3.c
@@ -1,7 +1,7 @@
 /* Test the atomic store expansion, global state space.  */
 
 /* { dg-do compile } */
-/* { dg-additional-options "-Wno-long-long" } */
+/* { dg-additional-options "-Wno-long-long -misa=sm_30" } */
 
 enum memmodel
 {


[committed][nvptx, testsuite] Add -misa=sm_30 in nvptx/uniform-simt-2.c

2022-02-28 Thread Tom de Vries via Gcc-patches
Hi,

When running with target board nvptx-none-run/-misa=sm_53 we run into:
...
cc1: error: PTX version (-mptx) needs to be at least 4.2 to support selected \
  -misa (sm_53)^M
compiler exited with status 1
FAIL: gcc.target/nvptx/uniform-simt-2.c (test for excess errors)
...

Fix this by adding an explicit -misa=sm_30 in the test-case.

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx, testsuite] Add -misa=sm_30 in nvptx/uniform-simt-2.c

gcc/testsuite/ChangeLog:

2022-02-28  Tom de Vries  

* gcc.target/nvptx/uniform-simt-2.c: Add -misa=sm_30.

---
 gcc/testsuite/gcc.target/nvptx/uniform-simt-2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/nvptx/uniform-simt-2.c 
b/gcc/testsuite/gcc.target/nvptx/uniform-simt-2.c
index 0f1e4e780fe..b1eee0d618f 100644
--- a/gcc/testsuite/gcc.target/nvptx/uniform-simt-2.c
+++ b/gcc/testsuite/gcc.target/nvptx/uniform-simt-2.c
@@ -1,4 +1,4 @@
-/* { dg-options "-O2 -muniform-simt -mptx=3.1" } */
+/* { dg-options "-O2 -muniform-simt -mptx=3.1 -misa=sm_30" } */
 
 enum memmodel
 {


[committed][nvptx, testsuite] Add -misa=sm_35 in nvptx/rotate.c

2022-02-28 Thread Tom de Vries via Gcc-patches
Hi,

When running with target board nvptx-none-run/-misa=sm_30 we run into:
...
FAIL: gcc.target/nvptx/rotate.c scan-assembler-times shf.l.wrap.b32 1
FAIL: gcc.target/nvptx/rotate.c scan-assembler-times shf.r.wrap.b32 1
FAIL: gcc.target/nvptx/rotate.c scan-assembler-not and.b32
...

Fix this by adding an explicit -misa=sm_35 in the test-case.

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx, testsuite] Add -misa=sm_35 in nvptx/rotate.c

gcc/testsuite/ChangeLog:

2022-02-28  Tom de Vries  

* gcc.target/nvptx/rotate.c: Add -misa=sm_35.

---
 gcc/testsuite/gcc.target/nvptx/rotate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/nvptx/rotate.c 
b/gcc/testsuite/gcc.target/nvptx/rotate.c
index 1c9b83b4809..a6045166b57 100644
--- a/gcc/testsuite/gcc.target/nvptx/rotate.c
+++ b/gcc/testsuite/gcc.target/nvptx/rotate.c
@@ -1,5 +1,5 @@
 /* { dg-do assemble } */
-/* { dg-options "-O2 -save-temps" } */
+/* { dg-options "-O2 -save-temps -misa=sm_35" } */
 
 #define MASK 0x1f
 


Re: [PATCH][libgomp, testsuite, nvptx] Add libgomp.c/declare-variant-3-sm*.c

2022-02-24 Thread Tom de Vries via Gcc-patches

On 2/24/22 11:09, Jakub Jelinek wrote:

On Thu, Feb 24, 2022 at 11:01:22AM +0100, Tom de Vries wrote:

[ was: Re: [Patch] nvptx: Add -mptx=6.0 + -misa=sm_70 ]

On 2/24/22 09:29, Tom de Vries wrote:

I'll try to submit a patch with one or more test-cases.


Hi,

These test-cases exercise the omp declare variant construct using the
available nvptx isas.

OK for trunk?

Thanks,
- Tom



[libgomp, testsuite, nvptx] Add libgomp.c/declare-variant-3-sm*.c

Add openmp test-cases that test the omp declare variant construct:
...
   #pragma omp declare variant (f30) match (device={isa("sm_30")})
...
using the available nvptx isas.

On a Pascal board GT 1030 with sm_61, we have these unsupported:
...
UNSUPPORTED: libgomp.c/declare-variant-3-sm70.c
UNSUPPORTED: libgomp.c/declare-variant-3-sm75.c
UNSUPPORTED: libgomp.c/declare-variant-3-sm80.c
...
and on a Turing board T400 with sm_75, we have only this one:
...
UNSUPPORTED: libgomp.c/declare-variant-3-sm80.c
...

Tested on x86_64 with nvptx accelerator.


I think testing it through dg-do link tests with -fdump-tree-optimized
or so would be better, you wouldn't need access to actual hardware level
and checking in the dump what function is actually called for each case is
easy.



Done, except for the sm_30 test, which is still dg-do run (although I've
added the compile-time test) and should pass on all boards (since we
don't support anything below sm_30).


OK for trunk?

Thanks,
- Tom
[libgomp, testsuite, nvptx] Add libgomp.c/declare-variant-3-sm*.c

Add openmp test-cases that test the omp declare variant construct:
...
  #pragma omp declare variant (f30) match (device={isa("sm_30")})
...
using the available nvptx isas.

Only the one for sm_30 is a dg-do run test-case, the other ones are dg-do
link.

Tested on x86_64 with nvptx accelerator.

libgomp/ChangeLog:

2022-02-24  Tom de Vries  

	* testsuite/libgomp.c/declare-variant-3-sm30.c: New test.
	* testsuite/libgomp.c/declare-variant-3-sm35.c: New test.
	* testsuite/libgomp.c/declare-variant-3-sm53.c: New test.
	* testsuite/libgomp.c/declare-variant-3-sm70.c: New test.
	* testsuite/libgomp.c/declare-variant-3-sm75.c: New test.
	* testsuite/libgomp.c/declare-variant-3-sm80.c: New test.
	* testsuite/libgomp.c/declare-variant-3.h: New header file.

---
 .../testsuite/libgomp.c/declare-variant-3-sm30.c   |  7 +++
 .../testsuite/libgomp.c/declare-variant-3-sm35.c   |  7 +++
 .../testsuite/libgomp.c/declare-variant-3-sm53.c   |  7 +++
 .../testsuite/libgomp.c/declare-variant-3-sm70.c   |  7 +++
 .../testsuite/libgomp.c/declare-variant-3-sm75.c   |  7 +++
 .../testsuite/libgomp.c/declare-variant-3-sm80.c   |  7 +++
 libgomp/testsuite/libgomp.c/declare-variant-3.h| 66 ++
 7 files changed, 108 insertions(+)

diff --git a/libgomp/testsuite/libgomp.c/declare-variant-3-sm30.c b/libgomp/testsuite/libgomp.c/declare-variant-3-sm30.c
new file mode 100644
index 000..ad1602c13cd
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/declare-variant-3-sm30.c
@@ -0,0 +1,7 @@
+/* { dg-do run { target { offload_target_nvptx } } } */
+/* { dg-additional-options "-foffload=-misa=sm_30" } */
+/* { dg-additional-options "-foffload=-fdump-tree-optimized" } */
+
+#include "declare-variant-3.h"
+
+/* { dg-final { scan-offload-tree-dump "= f30 \\(\\);" "optimized" } } */
diff --git a/libgomp/testsuite/libgomp.c/declare-variant-3-sm35.c b/libgomp/testsuite/libgomp.c/declare-variant-3-sm35.c
new file mode 100644
index 000..1a7cda2456b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/declare-variant-3-sm35.c
@@ -0,0 +1,7 @@
+/* { dg-do link { target { offload_target_nvptx } } } */
+/* { dg-additional-options "-foffload=-misa=sm_35" } */
+/* { dg-additional-options "-foffload=-fdump-tree-optimized" } */
+
+#include "declare-variant-3.h"
+
+/* { dg-final { scan-offload-tree-dump "= f35 \\(\\);" "optimized" } } */
diff --git a/libgomp/testsuite/libgomp.c/declare-variant-3-sm53.c b/libgomp/testsuite/libgomp.c/declare-variant-3-sm53.c
new file mode 100644
index 000..a37b5fdaa28
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/declare-variant-3-sm53.c
@@ -0,0 +1,7 @@
+/* { dg-do link { target { offload_target_nvptx } } } */
+/* { dg-additional-options "-foffload=-misa=sm_53" } */
+/* { dg-additional-options "-foffload=-fdump-tree-optimized" } */
+
+#include "declare-variant-3.h"
+
+/* { dg-final { scan-offload-tree-dump "= f53 \\(\\);" "optimized" } } */
diff --git a/libgomp/testsuite/libgomp.c/declare-variant-3-sm70.c b/libgomp/testsuite/libgomp.c/declare-variant-3-sm70.c
new file mode 100644
index 000..ab022cd79f9
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/declare-variant-3-sm70.c
@@ -0,0 +1,7 @@
+/* { dg-do link { target { offload_target_nvptx } } } */
+/* { dg-additional-options "-foffload=-misa=sm_70" } */
+/* { dg

[PATCH][libgomp, testsuite, nvptx] Add libgomp.c/declare-variant-3-sm*.c

2022-02-24 Thread Tom de Vries via Gcc-patches

[ was: Re: [Patch] nvptx: Add -mptx=6.0 + -misa=sm_70 ]

On 2/24/22 09:29, Tom de Vries wrote:

I'll try to submit a patch with one or more test-cases.


Hi,

These test-cases exercise the omp declare variant construct using the 
available nvptx isas.


OK for trunk?

Thanks,
- Tom

[libgomp, testsuite, nvptx] Add libgomp.c/declare-variant-3-sm*.c

Add openmp test-cases that test the omp declare variant construct:
...
  #pragma omp declare variant (f30) match (device={isa("sm_30")})
...
using the available nvptx isas.

On a Pascal board GT 1030 with sm_61, we have these unsupported:
...
UNSUPPORTED: libgomp.c/declare-variant-3-sm70.c
UNSUPPORTED: libgomp.c/declare-variant-3-sm75.c
UNSUPPORTED: libgomp.c/declare-variant-3-sm80.c
...
and on a Turing board T400 with sm_75, we have only this one:
...
UNSUPPORTED: libgomp.c/declare-variant-3-sm80.c
...

Tested on x86_64 with nvptx accelerator.

libgomp/ChangeLog:

2022-02-24  Tom de Vries  

	* testsuite/lib/libgomp.exp
	(check_effective_target_offload_device_nvptx_sm_xx)
	(check_effective_target_offload_device_nvptx_sm_30)
	(check_effective_target_offload_device_nvptx_sm_35)
	(check_effective_target_offload_device_nvptx_sm_53)
	(check_effective_target_offload_device_nvptx_sm_70)
	(check_effective_target_offload_device_nvptx_sm_75)
	(check_effective_target_offload_device_nvptx_sm_80): New proc.
	* testsuite/libgomp.c/declare-variant-3-sm30.c: New test.
	* testsuite/libgomp.c/declare-variant-3-sm35.c: New test.
	* testsuite/libgomp.c/declare-variant-3-sm53.c: New test.
	* testsuite/libgomp.c/declare-variant-3-sm70.c: New test.
	* testsuite/libgomp.c/declare-variant-3-sm75.c: New test.
	* testsuite/libgomp.c/declare-variant-3-sm80.c: New test.
	* testsuite/libgomp.c/declare-variant-3.h: New header file.

---
 libgomp/testsuite/lib/libgomp.exp  | 46 +++
 .../testsuite/libgomp.c/declare-variant-3-sm30.c   |  5 ++
 .../testsuite/libgomp.c/declare-variant-3-sm35.c   |  5 ++
 .../testsuite/libgomp.c/declare-variant-3-sm53.c   |  5 ++
 .../testsuite/libgomp.c/declare-variant-3-sm70.c   |  5 ++
 .../testsuite/libgomp.c/declare-variant-3-sm75.c   |  5 ++
 .../testsuite/libgomp.c/declare-variant-3-sm80.c   |  5 ++
 libgomp/testsuite/libgomp.c/declare-variant-3.h| 66 ++
 8 files changed, 142 insertions(+)

diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index 8c5ecfff0ac..d664863b15c 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -426,6 +426,52 @@ proc check_effective_target_offload_device_nvptx { } {
 } ]
 }
 
+# Return 1 if using nvptx offload device which supports -misa=sm_$SM.
+proc check_effective_target_offload_device_nvptx_sm_xx { sm } {
+if { ![check_effective_target_offload_device_nvptx] } {
+	return 0
+}
+return [check_runtime_nocache offload_device_nvptx_sm_$sm {
+  int main ()
+	{
+	  int x = 1;
+	  #pragma omp target map(tofrom: x)
+	x--;
+	  return x;
+	}
+} "-foffload=-misa=sm_$sm" ]
+}
+
+# See check_effective_target_offload_device_nvptx_sm_xx.
+proc check_effective_target_offload_device_nvptx_sm_30 { } {
+return [check_effective_target_offload_device_nvptx_sm_xx 30]
+}
+
+# See check_effective_target_offload_device_nvptx_sm_xx.
+proc check_effective_target_offload_device_nvptx_sm_35 { } {
+return [check_effective_target_offload_device_nvptx_sm_xx 35]
+}
+
+# See check_effective_target_offload_device_nvptx_sm_xx.
+proc check_effective_target_offload_device_nvptx_sm_53 { } {
+return [check_effective_target_offload_device_nvptx_sm_xx 53]
+}
+
+# See check_effective_target_offload_device_nvptx_sm_xx.
+proc check_effective_target_offload_device_nvptx_sm_70 { } {
+return [check_effective_target_offload_device_nvptx_sm_xx 70]
+}
+
+# See check_effective_target_offload_device_nvptx_sm_xx.
+proc check_effective_target_offload_device_nvptx_sm_75 { } {
+return [check_effective_target_offload_device_nvptx_sm_xx 75]
+}
+
+# See check_effective_target_offload_device_nvptx_sm_xx.
+proc check_effective_target_offload_device_nvptx_sm_80 { } {
+return [check_effective_target_offload_device_nvptx_sm_xx 80]
+}
+
 # Return 1 if at least one Nvidia GPU is accessible.
 
 proc check_effective_target_openacc_nvidia_accel_present { } {
diff --git a/libgomp/testsuite/libgomp.c/declare-variant-3-sm30.c b/libgomp/testsuite/libgomp.c/declare-variant-3-sm30.c
new file mode 100644
index 000..7c680b07a94
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/declare-variant-3-sm30.c
@@ -0,0 +1,5 @@
+/* { dg-do run { target { offload_target_nvptx } } } */
+/* { dg-require-effective-target offload_device_nvptx_sm_30 } */
+/* { dg-additional-options "-foffload=-misa=sm_30" } */
+
+#include "declare-variant-3.h"
diff --git a/libgomp/testsuite/libgomp.c/declare-variant-3-sm35.c b/libgomp/testsuite/libgomp.c/declare-variant-3-sm35.c
new file mode 100644
index 

Re: [Patch] nvptx: Add -mptx=6.0 + -misa=sm_70

2022-02-24 Thread Tom de Vries via Gcc-patches

On 2/22/22 17:03, Tobias Burnus wrote:

Hi Tom,

On 22.02.22 15:43, Tom de Vries wrote:

On 2/17/22 18:24, Tobias Burnus wrote:

--- a/gcc/config/nvptx/t-omp-device
+++ b/gcc/config/nvptx/t-omp-device
@@ -1,4 +1,4 @@
 echo kind: gpu > $@
 echo arch: nvptx >> $@
-    echo isa: sm_30 sm_35 >> $@
+    echo isa: sm_30 sm_35 sm_53 sm_70 sm_75 sm_80 >> $@


I'm not sure I understand how this is used.  Is this user-visible?  Is
there a libgomp test-case where we can observe a difference?


That's used for OpenMP context selectors; that way, one can provide, e.g.,
one variant for nvptx and one for gcn, as with:

#pragma omp declare variant (on_nvptx) match(construct={target},device={arch(nvptx)})
#pragma omp declare variant (on_gcn) match(construct={target},device={arch(gcn)})

...
   #pragma omp target map(from:v)
   v = on ();
which then either calls 'on' or 'on_nvptx' or 'on_gcn'
(from libgomp/testsuite/libgomp.c/target-42.c)


The following testcases use 'arch(nvptx)':

libgomp/testsuite/libgomp.c-c++-common/on_device_arch.h
libgomp/testsuite/libgomp.c/target-42.c
libgomp/testsuite/libgomp.c/usleep.h
libgomp/testsuite/libgomp.fortran/declare-variant-1.f90

For ISA, there is only one run-time test:

libgomp/testsuite/libgomp.c/declare-variant-1.c

but only for x86-64: match (device={isa("avx512f")})

The sm_35 also appears, but only in the compile-time tests:
gcc/testsuite/{c-c++-common,gfortran.dg}/gomp/declare-variant-{9,10}.*



Thanks for the explanation.

I've updated the patch to include changes to 
nvptx_omp_device_kind_arch_isa, and committed.


I'll try to submit a patch with one or more test-cases.

Thanks,
- Tom

[nvptx] Add missing t-omp-device isas

In t-omp-device we list isas that can be used in omp declare variant like so:
...
  #pragma omp declare variant (f30) match (device={isa("sm_30")})
...
and in nvptx_omp_device_kind_arch_isa we handle them.

Update both to reflect the current list of isas.
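
The matching rule in the nvptx.cc hunk below ("TARGET_SMxx && !TARGET_SMyy")
can be read as "the named isa is the highest listed level that does not
exceed the selected one".  A minimal sketch of that rule in plain C follows.
It assumes that each TARGET_SMxx macro means "the selected isa is at least
sm_xx"; isa_table and isa_selector_matches are hypothetical names and not
part of GCC.

```c
#include <stddef.h>
#include <string.h>

/* Ascending list of isa levels, as in the updated t-omp-device.  */
static const struct { const char *name; int level; } isa_table[] = {
  { "sm_30", 30 }, { "sm_35", 35 }, { "sm_53", 53 },
  { "sm_70", 70 }, { "sm_75", 75 }, { "sm_80", 80 },
};

/* Return 1 if NAME is the highest listed isa that does not exceed
   SELECTED (the numeric -misa level, e.g. 53 for sm_53), else 0.
   Hypothetical helper modelling "TARGET_SMxx && !TARGET_SMyy".  */
int
isa_selector_matches (const char *name, int selected)
{
  const size_t n = sizeof isa_table / sizeof isa_table[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (name, isa_table[i].name) == 0)
      {
        int at_least = selected >= isa_table[i].level;
        int below_next = i + 1 == n || selected < isa_table[i + 1].level;
        return at_least && below_next;
      }
  return 0;   /* An unknown isa name never matches.  */
}
```

Under this model exactly one isa name matches for any selected level, so at
most one isa-based variant is chosen per compilation.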

Tested on x86_64-linux with nvptx accelerator.

gcc/ChangeLog:

2022-02-23  Tom de Vries  

	* config/nvptx/nvptx.cc (nvptx_omp_device_kind_arch_isa): Handle
	sm_70, sm_75 and sm_80.
	* config/nvptx/t-omp-device: Add sm_53, sm_70, sm_75 and sm_80.

Co-Authored-By: Tobias Burnus 

---
 gcc/config/nvptx/nvptx.cc | 8 +++-
 gcc/config/nvptx/t-omp-device | 2 +-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 6f6d592e462..b9451c2ed09 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -6181,7 +6181,13 @@ nvptx_omp_device_kind_arch_isa (enum omp_device_kind_arch_isa trait,
   if (strcmp (name, "sm_35") == 0)
 	return TARGET_SM35 && !TARGET_SM53;
   if (strcmp (name, "sm_53") == 0)
-	return TARGET_SM53;
+	return TARGET_SM53 && !TARGET_SM70;
+  if (strcmp (name, "sm_70") == 0)
+	return TARGET_SM70 && !TARGET_SM75;
+  if (strcmp (name, "sm_75") == 0)
+	return TARGET_SM75 && !TARGET_SM80;
+  if (strcmp (name, "sm_80") == 0)
+	return TARGET_SM80;
   return 0;
 default:
   gcc_unreachable ();
diff --git a/gcc/config/nvptx/t-omp-device b/gcc/config/nvptx/t-omp-device
index 8765d9f1881..4228218a424 100644
--- a/gcc/config/nvptx/t-omp-device
+++ b/gcc/config/nvptx/t-omp-device
@@ -1,4 +1,4 @@
 omp-device-properties-nvptx: $(srcdir)/config/nvptx/nvptx.cc
 	echo kind: gpu > $@
 	echo arch: nvptx >> $@
-	echo isa: sm_30 sm_35 >> $@
+	echo isa: sm_30 sm_35 sm_53 sm_70 sm_75 sm_80 >> $@


[committed][nvptx] Add shf.{l,r}.wrap insn

2022-02-24 Thread Tom de Vries via Gcc-patches

On 2/23/22 12:40, Tom de Vries wrote:

Hi,

Ptx contains funnel shift operations shf.l.wrap and shf.r.wrap that can be
used to implement 32-bit left or right rotate.

Add define_insns rotlsi3 and rotrsi3.
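
To illustrate why a funnel shift with both inputs equal is a rotate, here is
a host-side model in plain C.  It assumes the PTX definition of shf in wrap
mode (the shift amount is taken modulo 32, and the 64-bit value b:a is
shifted); shf_l_wrap, shf_r_wrap, rotl32 and rotr32 are hypothetical helper
names, not part of the patch.

```c
#include <stdint.h>

/* Model of "shf.l.wrap.b32 d, a, b, c": shift the 64-bit value b:a left
   by c & 31 and return the upper 32 bits.  */
static uint32_t
shf_l_wrap (uint32_t a, uint32_t b, uint32_t c)
{
  uint64_t v = ((uint64_t) b << 32) | a;
  return (uint32_t) ((v << (c & 31)) >> 32);
}

/* Model of "shf.r.wrap.b32 d, a, b, c": shift the 64-bit value b:a right
   by c & 31 and return the lower 32 bits.  */
static uint32_t
shf_r_wrap (uint32_t a, uint32_t b, uint32_t c)
{
  uint64_t v = ((uint64_t) b << 32) | a;
  return (uint32_t) (v >> (c & 31));
}

/* With a == b, the bits shifted out on one side re-enter on the other,
   which is exactly a 32-bit rotate.  */
uint32_t
rotl32 (uint32_t x, uint32_t n)
{
  return shf_l_wrap (x, x, n);
}

uint32_t
rotr32 (uint32_t x, uint32_t n)
{
  return shf_r_wrap (x, x, n);
}
```

This mirrors the emitted templates, which pass operand 1 twice ("%0, %1, %1,
%2"), and the values checked in rotate-run.c, where a count of 8 + 32 behaves
like a count of 8.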

Currently testing.



And committed.

Thanks,
- Tom


[nvptx] Add shf.{l,r}.wrap insn

gcc/ChangeLog:

2022-02-23  Tom de Vries  

* config/nvptx/nvptx.md (define_insn "rotlsi3", define_insn
"rotrsi3"): New define_insn.

gcc/testsuite/ChangeLog:

2022-02-23  Tom de Vries  

* gcc.target/nvptx/rotate-run.c: New test.
* gcc.target/nvptx/rotate.c: New test.

---
  gcc/config/nvptx/nvptx.md   | 16 
  gcc/testsuite/gcc.target/nvptx/rotate-run.c | 23 +++
  gcc/testsuite/gcc.target/nvptx/rotate.c | 20 
  3 files changed, 59 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 216e89f230ac..4989b5642e29 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -808,6 +808,22 @@
""
"%.\\tshr.u%T0\\t%0, %1, %2;")
  
+(define_insn "rotlsi3"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+   (rotate:SI (match_operand:SI 1 "nvptx_register_operand" "R")
+  (and:SI (match_operand:SI 2 "nvptx_nonmemory_operand" "Ri")
+  (const_int 31))))]
+  "TARGET_SM35"
+  "%.\\tshf.l.wrap.b32\\t%0, %1, %1, %2;")
+
+(define_insn "rotrsi3"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+   (rotatert:SI (match_operand:SI 1 "nvptx_register_operand" "R")
+(and:SI (match_operand:SI 2 "nvptx_nonmemory_operand" "Ri")
+(const_int 31))))]
+  "TARGET_SM35"
+  "%.\\tshf.r.wrap.b32\\t%0, %1, %1, %2;")
+
  ;; Logical operations
  
  (define_code_iterator any_logic [and ior xor])

diff --git a/gcc/testsuite/gcc.target/nvptx/rotate-run.c 
b/gcc/testsuite/gcc.target/nvptx/rotate-run.c
new file mode 100644
index ..14cb6f8b0b3f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/rotate-run.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "rotate.c"
+
+#define ASSERT(EXPR)   \
+  do   \
+{  \
+  if (!(EXPR)) \
+   __builtin_abort (); \
+} while (0)
+
+int
+main (void)
+{
+  ASSERT (rotl (0x12345678, 8) == 0x34567812);
+  ASSERT (rotl (0x12345678, 8 + 32) == 0x34567812);
+
+  ASSERT (rotr (0x12345678, 8) == 0x78123456);
+  ASSERT (rotr (0x12345678, 8 + 32) == 0x78123456);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/nvptx/rotate.c 
b/gcc/testsuite/gcc.target/nvptx/rotate.c
new file mode 100644
index ..1c9b83b4809d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/rotate.c
@@ -0,0 +1,20 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -save-temps" } */
+
+#define MASK 0x1f
+
+unsigned int
+rotl (unsigned int val, unsigned int cnt) {
+  cnt &= MASK;
+  return (val << cnt) | (val >> (-cnt & MASK));
+}
+
+unsigned int
+rotr (unsigned int val, unsigned int cnt) {
+  cnt &= MASK;
+  return (val >> cnt) | (val << (-cnt & MASK));
+}
+
+/* { dg-final { scan-assembler-times "shf.l.wrap.b32" 1 } } */
+/* { dg-final { scan-assembler-times "shf.r.wrap.b32" 1 } } */
+/* { dg-final { scan-assembler-not "and.b32" } } */


[committed][nvptx] Fix dummy location in gen_comment

2022-02-24 Thread Tom de Vries via Gcc-patches

On 2/23/22 12:58, Thomas Schwinge wrote:

Hi!

On 2022-02-23T12:14:57+0100, Tom de Vries via Gcc-patches 
 wrote:

[ Re: [committed][nvptx] Add -mptx-comment ]

On 2/22/22 14:53, Tom de Vries wrote:

Add functionality that indicates which insns are added by -minit-regs, such
that for instance we have for pr53465.s:
...
  // #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
  // Start: Added by -minit-regs=3:
  // #NO_APP
  mov.u32 %r26, 0;
  // #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
  // End: Added by -minit-regs=3:
  // #NO_APP
...

Can be switched off using -mno-ptx-comment.

Tested on nvptx.


But tested in combination with another patch, which is still waiting for
review.

This patch by itself caused some regressions


I'd just begun analyzing and determined that it was
commit c2b23aaaf4457278403c01cd145cd3936683384e
"[nvptx] Add -mptx-comment" that causes a load of FAILs in nvptx
offloading testing:

 Program received signal SIGSEGV, Segmentation fault.
 0x0084abad in final_scan_insn_1 (insn=insn@entry=0x77380940, 
file=file@entry=0x1f50c40, optimize_p=optimize_p@entry=0, 
nopeepholes=nopeepholes@entry=0, seen=seen@entry=0x7fffd07c) at 
[...]/source-gcc/gcc/final.cc:2650
 2650        if (*loc.file && loc.line)
 (gdb) print loc
 $1 = {file = 0x0, line = 0, column = 0, data = 0x0, sysp = false}
 (gdb) bt
 #0  0x0084abad in final_scan_insn_1 
(insn=insn@entry=0x77380940, file=file@entry=0x1f50c40, 
optimize_p=optimize_p@entry=0, nopeepholes=nopeepholes@entry=0, 
seen=seen@entry=0x7fffd07c) at [...]/source-gcc/gcc/final.cc:2650
 #1  0x0084b86a in final_scan_insn (insn=insn@entry=0x77380940, 
file=file@entry=0x1f50c40, optimize_p=optimize_p@entry=0, 
nopeepholes=nopeepholes@entry=0, seen=seen@entry=0x7fffd07c) at 
[...]/source-gcc/gcc/final.cc:2942
 #2  0x0084823a in final_1 (first=0x774631c0, file=0x1f50c40, 
seen=1, optimize_p=0) at [...]/source-gcc/gcc/final.cc:1999
 #3  0x0085091a in rest_of_handle_final () at 
[...]/source-gcc/gcc/final.cc:4287
 #4  0x00850de4 in (anonymous namespace)::pass_final::execute 
(this=0x1f4bd00) at [...]/source-gcc/gcc/final.cc:4365
 #5  0x00b781b1 in execute_one_pass (pass=pass@entry=0x1f4bd00) at 
[...]/source-gcc/gcc/passes.cc:2639
 #6  0x00b7855a in execute_pass_list_1 (pass=0x1f4bd00) at 
[...]/source-gcc/gcc/passes.cc:2739
 #7  0x00b7858d in execute_pass_list_1 (pass=0x1f4b820) at 
[...]/source-gcc/gcc/passes.cc:2740
 #8  0x00b7858d in execute_pass_list_1 (pass=0x1f49d20, 
pass@entry=0x1f45780) at [...]/source-gcc/gcc/passes.cc:2740
 #9  0x00b785e9 in execute_pass_list (fn=0x772e1e40, 
pass=0x1f45780) at [...]/source-gcc/gcc/passes.cc:2750
 #10 0x00732a66 in cgraph_node::expand (this=0x772efbb0) at 
[...]/source-gcc/gcc/cgraphunit.cc:1836
 #11 0x0073336a in cgraph_order_sort::process (this=0x20730f8) at 
[...]/source-gcc/gcc/cgraphunit.cc:2075
 #12 0x007336f4 in output_in_order () at 
[...]/source-gcc/gcc/cgraphunit.cc:2143
 #13 0x00733dbe in symbol_table::compile (this=0x77542000) at 
[...]/source-gcc/gcc/cgraphunit.cc:2347
 #14 0x0065d79b in lto_main () at 
[...]/source-gcc/gcc/lto/lto.cc:655
 #15 0x00c709e6 in compile_file () at 
[...]/source-gcc/gcc/toplev.cc:454
 #16 0x00c73abb in do_compile (no_backend=no_backend@entry=false) 
at [...]/source-gcc/gcc/toplev.cc:2160
 #17 0x00c73ea6 in toplev::main (this=this@entry=0x7fffd4b0, 
argc=argc@entry=16, argv=0x1f1db40, argv@entry=0x7fffd5b8) at 
[...]/source-gcc/gcc/toplev.cc:2312
 #18 0x0174fe5f in main (argc=16, argv=0x7fffd5b8) at 
[...]/source-gcc/gcc/main.cc:41


currently testing attached
fix.


Per the test results that I've got so far (but is still running), your
proposed fix does resolve the SIGSEGVs, thanks.


Thanks for testing this, and sorry for the fall-out.

Now committed.

Thanks,
- Tom


[PATCH][nvptx] Add shf.{l,r}.wrap insn

2022-02-23 Thread Tom de Vries via Gcc-patches
Hi,

Ptx contains funnel shift operations shf.l.wrap and shf.r.wrap that can be
used to implement 32-bit left or right rotate.

Add define_insns rotlsi3 and rotrsi3.

Currently testing.

Thanks,
- Tom

[nvptx] Add shf.{l,r}.wrap insn

gcc/ChangeLog:

2022-02-23  Tom de Vries  

* config/nvptx/nvptx.md (define_insn "rotlsi3", define_insn
"rotrsi3"): New define_insn.

gcc/testsuite/ChangeLog:

2022-02-23  Tom de Vries  

* gcc.target/nvptx/rotate-run.c: New test.
* gcc.target/nvptx/rotate.c: New test.

---
 gcc/config/nvptx/nvptx.md   | 16 
 gcc/testsuite/gcc.target/nvptx/rotate-run.c | 23 +++
 gcc/testsuite/gcc.target/nvptx/rotate.c | 20 
 3 files changed, 59 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 216e89f230ac..4989b5642e29 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -808,6 +808,22 @@
   ""
   "%.\\tshr.u%T0\\t%0, %1, %2;")
 
+(define_insn "rotlsi3"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+   (rotate:SI (match_operand:SI 1 "nvptx_register_operand" "R")
+  (and:SI (match_operand:SI 2 "nvptx_nonmemory_operand" "Ri")
+  (const_int 31))))]
+  "TARGET_SM35"
+  "%.\\tshf.l.wrap.b32\\t%0, %1, %1, %2;")
+
+(define_insn "rotrsi3"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+   (rotatert:SI (match_operand:SI 1 "nvptx_register_operand" "R")
+(and:SI (match_operand:SI 2 "nvptx_nonmemory_operand" "Ri")
+(const_int 31))))]
+  "TARGET_SM35"
+  "%.\\tshf.r.wrap.b32\\t%0, %1, %1, %2;")
+
 ;; Logical operations
 
 (define_code_iterator any_logic [and ior xor])
diff --git a/gcc/testsuite/gcc.target/nvptx/rotate-run.c 
b/gcc/testsuite/gcc.target/nvptx/rotate-run.c
new file mode 100644
index ..14cb6f8b0b3f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/rotate-run.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "rotate.c"
+
+#define ASSERT(EXPR)   \
+  do   \
+{  \
+  if (!(EXPR)) \
+   __builtin_abort (); \
+} while (0)
+
+int
+main (void)
+{
+  ASSERT (rotl (0x12345678, 8) == 0x34567812);
+  ASSERT (rotl (0x12345678, 8 + 32) == 0x34567812);
+
+  ASSERT (rotr (0x12345678, 8) == 0x78123456);
+  ASSERT (rotr (0x12345678, 8 + 32) == 0x78123456);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/nvptx/rotate.c 
b/gcc/testsuite/gcc.target/nvptx/rotate.c
new file mode 100644
index ..1c9b83b4809d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/rotate.c
@@ -0,0 +1,20 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -save-temps" } */
+
+#define MASK 0x1f
+
+unsigned int
+rotl (unsigned int val, unsigned int cnt) {
+  cnt &= MASK;
+  return (val << cnt) | (val >> (-cnt & MASK));
+}
+
+unsigned int
+rotr (unsigned int val, unsigned int cnt) {
+  cnt &= MASK;
+  return (val >> cnt) | (val << (-cnt & MASK));
+}
+
+/* { dg-final { scan-assembler-times "shf.l.wrap.b32" 1 } } */
+/* { dg-final { scan-assembler-times "shf.r.wrap.b32" 1 } } */
+/* { dg-final { scan-assembler-not "and.b32" } } */


[PATCH][nvptx] Fix dummy location in gen_comment

2022-02-23 Thread Tom de Vries via Gcc-patches

[ Re: [committed][nvptx] Add -mptx-comment ]

On 2/22/22 14:53, Tom de Vries wrote:

Hi,

Add functionality that indicates which insns are added by -minit-regs, such
that for instance we have for pr53465.s:
...
 // #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
 // Start: Added by -minit-regs=3:
 // #NO_APP
 mov.u32 %r26, 0;
 // #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
 // End: Added by -minit-regs=3:
 // #NO_APP
...

Can be switched off using -mno-ptx-comment.

Tested on nvptx.


But tested in combination with another patch, which is still waiting for 
review.


This patch by itself caused some regressions, currently testing attached 
fix.


Thanks,
- Tom
[nvptx] Fix dummy location in gen_comment

I committed "[nvptx] Add -mptx-comment", but tested it in combination with the
proposed "[final] Handle compiler-generated asm insn" (
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590721.html ), so
by itself the commit introduced some regressions:
...
FAIL: gcc.dg/20020426-2.c (internal compiler error: Segmentation fault)
FAIL: gcc.dg/analyzer/zlib-3.c (internal compiler error: Segmentation fault)
FAIL: gcc.dg/pr101223.c (internal compiler error: Segmentation fault)
FAIL: gcc.dg/torture/pr80764.c   -O2  (internal compiler error: Segmentation fault)
...

These are due to cfun->function_start_locus == 0.

Fix these by using DECL_SOURCE_LOCATION (cfun->decl) instead.
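
To make the failure mode concrete: the backtrace reported elsewhere in this
thread stops at a check of the form "*loc.file && loc.line" with
loc.file == 0.  The sketch below models that situation with a hypothetical,
reduced location struct.  The null guard is for illustration only and is not
what this patch does; the patch instead passes a valid location.

```c
#include <stddef.h>

/* Hypothetical, reduced model of an expanded source location.  */
struct loc_sketch
{
  const char *file;   /* NULL when the location is unknown (locus 0).  */
  int line;
};

/* Evaluating "*loc.file && loc.line" with file == NULL dereferences a
   null pointer.  This variant tests the pointer first, purely for
   illustration.  */
int
has_usable_location (struct loc_sketch loc)
{
  return loc.file != NULL && *loc.file != '\0' && loc.line != 0;
}
```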

Tested on nvptx.

gcc/ChangeLog:

2022-02-23  Tom de Vries  

	* config/nvptx/nvptx.cc (gen_comment): Use
	DECL_SOURCE_LOCATION (cfun->decl) instead of cfun->function_start_locus.

---
 gcc/config/nvptx/nvptx.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 858789e6df76..6f6d592e4621 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -5382,7 +5382,7 @@ gen_comment (const char *s)
   char *comment = (char *) alloca (len);
   snprintf (comment, len, "%s%s%s", ASM_COMMENT_START, sep, s);
   return gen_rtx_ASM_INPUT_loc (VOIDmode, ggc_strdup (comment),
-cfun->function_start_locus);
+DECL_SOURCE_LOCATION (cfun->decl));
 }
 
 /* Initialize all declared regs at function entry.


Re: [committed][nvptx] Use nvptx_warpsync / nvptx_uniform_warp_check for -muniform-simt

2022-02-23 Thread Tom de Vries via Gcc-patches

On 2/23/22 10:06, Thomas Schwinge wrote:

Hi Tom!

This is me again, following along GCC/nvptx development, and asking
questions.  ;-)



Yes, thanks for that, that's useful :)


On 2022-02-19T20:07:18+0100, Tom de Vries via Gcc-patches 
 wrote:

With the default ptx isa 6.0, we have for uniform-simt-1.c:
...
 @%r33   atom.global.cas.b32 %r26, [a], %r28, %r29;
 shfl.sync.idx.b32   %r26, %r26, %r32, 31, 0x;
...

The atomic insn is predicated by -muniform-simt, and the subsequent insn does
a warp sync, at which point the warp is uniform again.


I understand the concern here is Independent Thread Scheduling, where the
execution of predicated-off threads of a warp ('@ ! %r33') may proceed
with the next instruction, 'shfl', without implicitly waiting for the
other threads of a warp still working on the 'atom'?  Hence, the 'sync'
aspect of 'shfl.sync', as a means that PTX provides at the ISA level such
that we're getting the desired semantics: as its first step, "wait for
all threads in membermask to arrive".



Indeed.


But with -mptx=3.1, we have instead:
...
 @%r33   atom.global.cas.b32 %r26, [a], %r28, %r29;
 shfl.idx.b32        %r26, %r26, %r32, 31;
...

The shfl does not sync the warp, and we want the warp to go back to executing
uniformly asap.  We cannot enforce this


Is it really the case that such code may cause "permanent" warp-divergent
execution (until re-converging "somewhere")?  My understanding has been
that predicated-off threads of a warp ('@ ! %r33') would simply idle,
implicitly waiting for the other threads of a warp still working on the
'atom' -- due to the nature of a shared program counter per warp, and the
desire to re-converge as soon as possible.

For example, PTX ISA 7.2, 3.1. "A Set of SIMT Multiprocessors":

| [...]
| At every instruction issue time, the SIMT unit selects a warp that is ready
| to execute and issues the next instruction to the active threads of the
| warp. A warp executes one common instruction at a time, so full efficiency
| is realized when all threads of a warp agree on their execution path. If
| threads of a warp diverge via a data-dependent conditional branch, the warp
| serially executes each branch path taken, disabling threads that are not on
| that path, and when all paths complete, the threads converge back to the
| same execution path. [...]

So I'd have assumed that after the potentially-diverging
'@%r33'-predicated 'atom' instruction, we're implicitly re-converging for
the unpredicated 'shfl' (as long as Independent Thread Scheduling isn't
involved, which it isn't for '-mptx=3.1')?

As I'm understanding you, my understanding is not correct, and we may
thus be getting "permanent" warp-divergent execution as soon as there's
any predication/conditional involved that may evaluate differently for
individual threads of a warp, and we thus need such *explicit*
synchronization after all such instances?



Reading the ptx manual, I think your interpretation of what _should_ 
happen is right.


Regardless, the JIT is still free to translate say a block of equally 
predicated insns using a branch as long as it inserts a warp sync right 
after.  And then there might be a JIT bug that optimizes that sync away, 
or shifts it further out, past the shfl.


So perhaps the rationale should have been formulated more in terms of 
the shfl.  Note btw that it's possible that there's a compiler bug that 
does a diverging branch earlier, which would give problems for the shfl, 
and which the check would catch.


Note that the uniform-warp-check insn doesn't enforce convergence.  It 
only checks that the warp is convergent.


So, if the warp is not convergent, the check will abort.

If the warp is convergent, the JIT optimizer is free to optimize the 
check away.


And sometimes we have seen that adding the check makes the warp 
convergent (as in: preventing some JIT bug from triggering).


Anyway, unfortunately at this point I don't remember whether I found a 
smoking gun specifically for openmp.


Thanks,
- Tom


but at least check this using
nvptx_uniform_warp_check, similar to how that is done for openacc.

Likewise, detect the case that no shfl insn is emitted, and add a
nvptx_uniform_warp_check or nvptx_warpsync.


For example, 'nvptx-none/mgomp/libatomic/cas_1_.o':

 [...]
  @ %r71 atom.cas.b64 %r62,[%r35],%r29,%r61;
 +{
 +.reg .b32 act;
 +vote.ballot.b32 act,1;
 +.reg .pred uni;
 +setp.eq.b32 uni,act,0xffffffff;
 +@ ! uni trap;
 +@ ! uni exit;
 +}
  mov.b64 {%r69,%r70},%r62;
  shfl.idx.b32 %r69,%r69,%r68,31;
  shfl.idx.b32 %r70,%r70,%r68,31;
 [...]

So that's basically an 'assert' that all threads of a warp are converged.
(Is the JIT maybe even able to optimize that out?)  I guess I just wonder
if that's not satisfied implicitly.
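
The check quoted above can be modelled on the host as a rough sketch. It assumes a 32-lane warp and that the ballot is compared against the all-ones mask; the names `ballot_active` and `warp_is_uniform` are hypothetical and exist neither in GCC nor in PTX. It only illustrates the "assert that every lane is active" idea and says nothing about what the JIT actually does.

```c
#include <stdbool.h>
#include <stdint.h>

#define WARP_SIZE 32  /* assumed warp width */

/* Model of 'vote.ballot.b32 act,1': bit i is set iff lane i is
   currently executing.  */
static uint32_t
ballot_active (const bool active[WARP_SIZE])
{
  uint32_t mask = 0;
  for (int lane = 0; lane < WARP_SIZE; lane++)
    if (active[lane])
      mask |= (uint32_t) 1 << lane;
  return mask;
}

/* Model of the 'setp.eq.b32' comparison: the warp counts as uniform
   only when every lane contributed to the ballot.  In the PTX
   sequence, a false result leads to 'trap'/'exit'.  */
static bool
warp_is_uniform (const bool active[WARP_SIZE])
{
  return ballot_active (active) == 0xffffffffu;
}
```

As in the PTX, the model only detects divergence; it does nothing to restore convergence.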


Grüße
  Thomas



[nvptx] Use nvptx_warpsync / nvptx_uniform_warp_check f

Re: [PATCH] middle-end: Support ABIs that pass FP values as wider integers.

2022-02-22 Thread Tom de Vries via Gcc-patches

On 2/22/22 17:08, Roger Sayle wrote:


Hi Tom,

I'll admit that I'd not myself considered the ABI issues when I initially
proposed experimental HFmode support for the nvptx backend, and was
surprised when I finally tracked down the source of the problem you'd
reported: that libgcc spots HFmode support exists and immediately starts
passing/returning values in this type.

The one precedent that I can point to is that LLVM's nvptx backend passes
HFmode values in SImode regs,   see https://reviews.llvm.org/D28540


Interesting, thanks for the link.


Their motivation is that not all PTX ISAs support fp16, so for compatibility
with say sm_30/sm_35, fp16 values are treated like b16, i.e. HImode.
At this point, the nvptx ABI states that HImode values are passed as SImode,
so we end up with the interesting mismatch of HFmode<->SImode.


Indeed, that sounds plausible.

And IIUC, that also means that this leaves the door open for us to 
implement fp16 support for pre-sm_53 using b16 in a compatible way.


Then I think the current solution is OK, thanks for digging this up.

Thanks,
-Tom


I guess the same thing affects host code, where an i386/x86 host that
doesn't support 16-bit floating point, can pass "unsigned short" values
to and from the accelerator, and likewise this HImode locally gets passed
in a wider (often WORD_MODE) integer types on most x86 ABIs.

My guess is that passing SFmode in DImode may have been supported
in older versions of GCC, before handling of SUBREGs was tightened up,
so this might be considered a regression.

Cheers,
Roger
--


-----Original Message-----
From: Tom de Vries 
Sent: 22 February 2022 15:43
To: Roger Sayle ; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] middle-end: Support ABIs that pass FP values as wider
integers.

On 2/9/22 21:12, Roger Sayle wrote:


This patch adds middle-end support for target ABIs that pass/return
floating point values in integer registers with precision wider than
the original FP mode.  An example is the nvptx backend where 16-bit
HFmode registers are passed/returned as (promoted to) SImode registers.
Unfortunately, this currently falls foul of the various (recent?)
sanity checks that (very sensibly) prevent creating paradoxical
SUBREGs of floating point registers.  The approach below is to
explicitly perform the conversion/promotion in two steps, via an
integer mode of same precision as the floating point value.  So on
nvptx, 16-bit HFmode is initially converted to 16-bit HImode (using
SUBREG), then zero-extended to SImode, and likewise when going the
other way, parameters truncated to HImode then converted to HFmode
(using SUBREG).  These changes are localized to expand_value_return
and expanding DECL_RTL to support strange ABIs, rather than inside
convert_modes or gen_lowpart, as mismatched precision integer/FP
conversions should be explicit in the RTL, and these semantics not generally
visible/implicit in user code.




Hi Roger,

I cannot comment on the patch, but I do wonder (after your "strange ABI"
comment): did we actively decide on (or align to) a register passing ABI for
HFmode, or has it merely been decided by the implementation of
promote_arg:
...
static machine_mode
promote_arg (machine_mode mode, bool prototyped)
{
  if (!prototyped && mode == SFmode)
    /* K&R float promotion for unprototyped functions.  */
    mode = DFmode;
  else if (GET_MODE_SIZE (mode) < GET_MODE_SIZE (SImode))
    mode = SImode;

  return mode;
}
...

There may be a rationale why it's good to pass a HF as SI, but it's not
documented there.

Anyway, I checked what cuda does for HF, and it passes a byte array:
...
.param .align 2 .b8 _Z5helloPj6__halfs_param_1[2],
...

So, I guess what I'm saying is I'd like to understand why we're having the
HF -> SI promotion.

Thanks,
- Tom




Re: [PATCH] nvptx: Back-end portion of a fix for PR target/104489.

2022-02-22 Thread Tom de Vries via Gcc-patches

On 2/11/22 11:38, Roger Sayle wrote:

This one line fix/tweak is the back-end specific change for a fix for
PR target/104489, that allows the ISA for GCC's nvptx backend to be bumped
to sm_53.  The machine-independent middle-end pieces were posted here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590139.html

This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu,
together with the above middle-end patch and changes identical to those
described by Tom de Vries in the PR, with make and make -k check, where
the build now completes, and there are no regressions in the testsuite.

Ok for mainline?

2022-02-11  Roger Sayle  

gcc/ChangeLog
PR target/104489
* config/nvptx/nvptx.md (*movhf_insn): Add subregs_ok attribute.



LGTM.

Thanks,
- Tom



Re: [PATCH] middle-end: Support ABIs that pass FP values as wider integers.

2022-02-22 Thread Tom de Vries via Gcc-patches

On 2/9/22 21:12, Roger Sayle wrote:


This patch adds middle-end support for target ABIs that pass/return
floating point values in integer registers with precision wider than
the original FP mode.  An example is the nvptx backend where 16-bit
HFmode registers are passed/returned as (promoted to) SImode registers.
Unfortunately, this currently falls foul of the various (recent?) sanity
checks that (very sensibly) prevent creating paradoxical SUBREGs of
floating point registers.  The approach below is to explicitly perform the
conversion/promotion in two steps, via an integer mode of same precision
as the floating point value.  So on nvptx, 16-bit HFmode is initially
converted to 16-bit HImode (using SUBREG), then zero-extended to SImode,
and likewise when going the other way, parameters truncated to HImode
then converted to HFmode (using SUBREG).  These changes are localized
to expand_value_return and expanding DECL_RTL to support strange ABIs,
rather than inside convert_modes or gen_lowpart, as mismatched
precision integer/FP conversions should be explicit in the RTL,
and these semantics not generally visible/implicit in user code.
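
The two-step conversion described above can be illustrated at the bit level with a small sketch. It assumes the 16 bits of the half-precision value are already held in a `uint16_t` (standing in for the HFmode to HImode SUBREG step); the function names `hf_bits_to_si` and `si_to_hf_bits` are hypothetical and do not appear in the patch.

```c
#include <stdint.h>

/* Outgoing direction: the HFmode bits are first viewed as a 16-bit
   integer (HImode, same bits), then zero-extended to 32 bits (SImode).  */
static uint32_t
hf_bits_to_si (uint16_t hf_bits)
{
  uint16_t hi = hf_bits;   /* HFmode -> HImode: bit-for-bit reinterpretation */
  return (uint32_t) hi;    /* HImode -> SImode: zero-extension */
}

/* Incoming direction: truncate the 32-bit value to 16 bits (HImode),
   then view those bits as the half-precision value again (HFmode).  */
static uint16_t
si_to_hf_bits (uint32_t si)
{
  uint16_t hi = (uint16_t) (si & 0xffffu);  /* SImode -> HImode: truncation */
  return hi;                                /* HImode -> HFmode: same bits */
}
```

For instance, the half-precision encoding of -1.0 (0xbc00) becomes 0x0000bc00 and not 0xffffbc00, because the extension step is a zero-extension.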



Hi Roger,

I cannot comment on the patch, but I do wonder (after your "strange ABI" 
comment): did we actively decide on (or align to) a register passing ABI 
for HFmode, or has it merely been decided by the implementation of 
promote_arg:

...
static machine_mode
promote_arg (machine_mode mode, bool prototyped)
{
  if (!prototyped && mode == SFmode)
    /* K&R float promotion for unprototyped functions.  */
    mode = DFmode;
  else if (GET_MODE_SIZE (mode) < GET_MODE_SIZE (SImode))
    mode = SImode;

  return mode;
}
...

There may be a rationale why it's good to pass a HF as SI, but it's not 
documented there.


Anyway, I checked what cuda does for HF, and it passes a byte array:
...
.param .align 2 .b8 _Z5helloPj6__halfs_param_1[2],
...

So, I guess what I'm saying is I'd like to understand why we're having 
the HF -> SI promotion.
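
To make the effect of the quoted promote_arg concrete, here is a standalone model of its decision logic. The `enum mode` values and the `mode_size` helper are simplified stand-ins invented for this sketch, not GCC's real machine_mode or GET_MODE_SIZE; only the branching mirrors the quoted function.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for a few GCC machine modes.  */
enum mode { QImode, HImode, SImode, DImode, HFmode, SFmode, DFmode };

/* Size in bytes of each illustrative mode.  */
static int
mode_size (enum mode m)
{
  switch (m)
    {
    case QImode:
      return 1;
    case HImode:
    case HFmode:
      return 2;
    case SImode:
    case SFmode:
      return 4;
    default:
      return 8;  /* DImode, DFmode */
    }
}

/* Same branching as the quoted promote_arg: unprototyped float goes to
   double, and anything narrower than SImode goes to SImode.  */
static enum mode
promote_arg_model (enum mode m, bool prototyped)
{
  if (!prototyped && m == SFmode)
    return DFmode;
  if (mode_size (m) < mode_size (SImode))
    return SImode;
  return m;
}
```

Because the second branch only compares sizes, a 2-byte HFmode is promoted to SImode in the same way as HImode, which is consistent with the HF -> SI promotion being a side effect of the size test rather than a documented choice.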


Thanks,
- Tom

