[Bug c++/113599] [14 Regression] Wrong computation of member offset through pointer-to-member since r14-5503

2024-01-25 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113599

Tom de Vries  changed:

   What|Removed |Added

 CC||vries at gcc dot gnu.org

--- Comment #2 from Tom de Vries  ---
FWIW, the inherit order is relevant, after applying this change we get the
expected result:
...
-struct thread_info : public dummy, public intrusive_list_node {
+struct thread_info : public intrusive_list_node, public dummy {
...

This could be used as workaround.

[Bug debug/112565] Abnormal Jump in Execution using 'stepi' Command in GDB under O2 optimization

2023-11-16 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112565

--- Comment #1 from Tom de Vries  ---
(In reply to Anonymous from comment #0)
> Tom de Vries suggests that this issue may be attributed to a GCC
> optimization bug.

I do not.

[Bug sanitizer/110799] [tsan] False positive due to -fhoist-adjacent-loads

2023-07-25 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799

--- Comment #7 from Tom de Vries  ---
(In reply to Alexander Monakov from comment #5)
> This trips Valgrind's data race detector (valgrind --tool=helgrind) too. So
> I don't think checking SANITIZE_THREAD is the correct approach.

Can you elaborate on what you consider a correct approach?

[Bug sanitizer/110799] [tsan] False positive due to -fhoist-adjacent-loads

2023-07-25 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799

--- Comment #6 from Tom de Vries  ---
(In reply to rguent...@suse.de from comment #4)
> I'm suggesting to not fix it ;) 

Can you explain why ?

It doesn't look difficult to fix to me, and I don't see any downsides.

> That said, is TSAN a useful vehicle?

Well, false positives aside, yes.

[Bug sanitizer/110799] [tsan] False positive due to -fhoist-adjacent-loads

2023-07-25 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799

--- Comment #2 from Tom de Vries  ---
(In reply to Richard Biener from comment #1)
> We consider introducing load data races OK, what's the difference here? 

This is a load vs. store data race.

> There are other passes that would do similar things but in practice the
> loads would be considered to possibly trap so the real-world impact might be
> limited?

If you're suggesting to fix this in a more generic way, I'm all for it.

[Bug sanitizer/110799] New: [tsan] False positive due to -fhoist-adjacent-loads

2023-07-25 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799

Bug ID: 110799
   Summary: [tsan] False positive due to -fhoist-adjacent-loads
   Product: gcc
   Version: 13.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: sanitizer
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org,
jakub at gcc dot gnu.org, kcc at gcc dot gnu.org, marxin at 
gcc dot gnu.org
  Target Milestone: ---

I build gdb with -O2 -fsanitizer=thread, and ran into a false positive due to
-fhoist-adjacent-loads.

Minimal example:
...
$ cat race.c
#include 
#include 

int c;

struct s
{
  int a;
  int b;
};

struct s v1;

int v3;

void *
thread1 (void *x)
{
  v1.a = 1;
  return NULL;
}

void *
thread2 (void *x)
{
  v3 = c ? v1.a : v1.b;
  return NULL;
}

int
main (void)
{
  pthread_t t[2];

  pthread_create([0], NULL, thread1, NULL);
  pthread_create([1], NULL, thread2, NULL);

  pthread_join(t[0], NULL);
  pthread_join(t[1], NULL);

  return 0;
}
...

With O0, runs fine:
...
$ gcc race.c -fsanitize=thread -g
$ ./a.out 
$
...

With O2, a race is reported:
...
$ gcc race.c -fsanitize=thread -g -O2
$ ./a.out 
==
WARNING: ThreadSanitizer: data race (pid=24538)
  Read of size 4 at 0x00404060 by thread T2:
#0 thread2 /data/vries/gdb/race.c:26 (a.out+0x401299) (BuildId:
295673549b1e99c73c70a2a8d26944f177f88c15)
#1   (libtsan.so.2+0x3c329) (BuildId:
8f2a9be581a0fcb3d7109755a6067408093b9dbd)

  Previous write of size 4 at 0x00404060 by thread T1:
#0 thread1 /data/vries/gdb/race.c:19 (a.out+0x401257) (BuildId:
295673549b1e99c73c70a2a8d26944f177f88c15)
#1   (libtsan.so.2+0x3c329) (BuildId:
8f2a9be581a0fcb3d7109755a6067408093b9dbd)
...

With -fno-hoist-adjacent-loads, it's fine again:
...
$ gcc race.c -fsanitize=thread -g -O2 -fno-hoist-adjacent-loads
$ ./a.out 
$
...

The optimization transforms these loads:
...
  v3 = c ? v1.a : v1.b;
...
into:
...
  int tmp_a = v1.a;
  int tmp_b = v1.b;
  v3 = c ? tmp_a : tmp_b
...
which introduces the false positive.

So I wonder if there should be a change like this:
...
 static bool
 gate_hoist_loads (void)
 {
   return (flag_hoist_adjacent_loads == 1
+  && (flag_sanitize & SANITIZE_THREAD) == 0
   && param_l1_cache_line_size
   && HAVE_conditional_move);
 }
...

[Bug c/109708] New: [c, doc] wdangling-pointer example broken

2023-05-03 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109708

Bug ID: 109708
   Summary: [c, doc] wdangling-pointer example broken
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

I ran into a Wdangling-pointer warning and decided to read the docs and try out
an example.

The first one listed is:
...
int f (int c1, int c2, x)
{
  char *p = strchr ((char[]){ c1, c2 }, c3);
  // warning: dangling pointer to a compound literal
  return p ? *p : 'x';
}
...

It's not a complete example, x is missing a declared type and c3 is undeclared. 

After fixing that (and adding the implicit "#include "), we have an
example that compiles:
...
#include 

int f (int c1, int c2, int c3)
{
  char *p = strchr ((char[]){ c1, c2 }, c3);
  return p ? *p : 'x';
}
...
but no warning, not at O0, O1, O2 or O3:
...
$ gcc test.c -Wdangling-pointer=1 -c
$
...

After reading the description of the warning, I managed to come up with:
...
char
f (char c1, char c2)
{
  char *p;

  {
p = (char[]) { c1, c2 };
  } 

  return *p;
}
...
which does manage to trigger the warning for O0-O3:
...
$ gcc test.c -Wdangling-pointer=1 -c
test.c: In function ‘f’:
test.c:10:10: warning: using dangling pointer ‘p’ to an unnamed temporary
[-Wdangling-pointer=]
   10 |   return *p;
  |  ^~
test.c:7:18: note: unnamed temporary defined here
7 | p = (char[]) { c1, c2 };
  |  ^
$
...

It might be worth mentioning that it's a C example, when using g++ we have:
...
$ g++ test.c -Wdangling-pointer=1 -c
test.c: In function ‘char f(char, char)’:
test.c:7:18: error: taking address of temporary array
7 | p = (char[]) { c1, c2 };
  |  ^~
...

BTW, note that the warning can be fixed by doing:
...
 char
 f (char c1, char c2)
 {
   char *p;
+  char c;

  {
p = (char[]) { c1, c2 };
+   c = *p;
  }

-  return *p;
+  return c;
}
...

[Bug debug/108600] Use DW_LNS_set_prologue_end

2023-01-31 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108600

--- Comment #4 from Tom de Vries  ---
(In reply to Tom de Vries from comment #0)
> Note that for for instance gdb test-case gdb.ada/ref_param.exp, this
> convention was broken for gcc 7.5.0 (and I don't know how much earlier), and
> my current guess is that it got fixed in gcc 11.1.0, by commit c029fcb5680
> ("Reset force_source_line in final.c").

Confirmed, see PR108615.

[Bug debug/108615] Incorrect prologue marker in line table

2023-01-31 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108615

Tom de Vries  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |11.0
   Keywords||wrong-debug

--- Comment #1 from Tom de Vries  ---
Closing with milestone gcc 11.0.

[Bug debug/108615] New: Incorrect prologue marker in line table

2023-01-31 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108615

Bug ID: 108615
   Summary: Incorrect prologue marker in line table
   Product: gcc
   Version: 10.4.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

[ Filing FTR, to link a commit to a test-case that it fixes, and to be able to
refer to it from gdb sources. ]

Consider pck.adb/pck.ads from here (
https://sourceware.org/git/?p=binutils-gdb.git;a=tree;f=gdb/testsuite/gdb.ada/ref_param;h=823556d8928bd1de865e71400092e6a44b2773cd;hb=HEAD
).

Compile like so, with system gcc 7.5.0:
...
$ gcc pck.adb -S -g
...

In the .s file we find:
...
pck__call_me:
.LFB3:
.file 1 "pck.adb"
.loc 1 18 0
.cfi_startproc
.loc 1 18 0
pushq   %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq%rsp, %rbp
.cfi_def_cfa_register 6
movq%rdi, -8(%rbp)
.loc 1 20 0
...

There's a convention that the second entry in the line table indicates the end
of the prologue.

So the second .loc indicates that the prologue ends there, while it ends only
after the third insn, at line 20.

Same with 10.4.0.

Fixed by commit c029fcb5680 ("Reset force_source_line in final.c"), first
available in release 11.1.0.

With 11.3.0 we have:
...
pck__call_me:
.LFB3:
.file 1 "pck.adb"
.loc 1 18 4
.cfi_startproc
pushq   %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq%rsp, %rbp
.cfi_def_cfa_register 6
movq%rdi, -8(%rbp)
.loc 1 20 16
...

[Bug debug/108600] Use DW_LNS_set_prologue_end

2023-01-31 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108600

--- Comment #3 from Tom de Vries  ---
(In reply to Tom de Vries from comment #2)
> (In reply to Tom de Vries from comment #1)
> > Created attachment 54371 [details]
> 
> We probably don't want to emit in all cases, maybe limiting to 
> "dwarf_version >= 3", or "!dwarf_strict || dwarf_version >= 3".

Let's see about dwarf_strict.

Semantics:
...
-gstrict-dwarf

Disallow using extensions of later DWARF standard version than selected
with -gdwarf-version. On most targets using non-conflicting DWARF extensions
from later standard versions is allowed.
...

For the -gas-loc-support case (gcc emitting .locs), even when passing
-gdwarf-2, gas emits a v3 version .debug_line section (since binutils-2_32). 
And even if we'd fix that (I've filed
https://sourceware.org/bugzilla/show_bug.cgi?id=30064), the way gas works is by
bumping the dwarf version when encountering a feature that requires a higher
version, so using end_prologue in a loc directive would then end up bumping
dwarf_level to 3, bumping also the .debug_line version.

For the -gno-as-loc-support case (gcc emitting .debug_line contribution), for
-gdwarf-2 gcc indeed emits a v2 .debug_line section.  But, that makes
DW_LNS_set_prologue_end fall in the range of vendor specific extensions, and we
can't conflict with that.

Taking this all into account, I think it's better not to emit
DW_LNS_set_prologue_end for -gdwarf-2 -gno-strict-dwarf.

[Bug debug/47471] [10/11/12/13 Regression] stdarg functions extraneous too-early prologue end

2023-01-30 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47471

Tom de Vries  changed:

   What|Removed |Added

 CC||vries at gcc dot gnu.org

--- Comment #24 from Tom de Vries  ---
I can reproduce this with gcc 4.8.5:
...
f:
.LFB0:
.file 1 "test.c"
.loc 1 3 0
.cfi_startproc
pushq   %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq%rsp, %rbp
.cfi_def_cfa_register 6
subq$60, %rsp
movq%rsi, -168(%rbp)
movq%rdx, -160(%rbp)
movq%rcx, -152(%rbp)
movq%r8, -144(%rbp)
movq%r9, -136(%rbp)
testb   %al, %al
je  .L2
.loc 1 3 0
movaps  %xmm0, -128(%rbp)
movaps  %xmm1, -112(%rbp)
movaps  %xmm2, -96(%rbp)
movaps  %xmm3, -80(%rbp)
movaps  %xmm4, -64(%rbp)
movaps  %xmm5, -48(%rbp)
movaps  %xmm6, -32(%rbp)
movaps  %xmm7, -16(%rbp)
.L2:
movl%edi, -180(%rbp)
.loc 1 4 0
movlv(%rip), %eax
...

But with gcc 7.5.0, I get:
...
f:
.LFB0:
.file 1 "test.c"
.loc 1 3 0
.cfi_startproc
pushq   %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq%rsp, %rbp
.cfi_def_cfa_register 6
subq$72, %rsp
movl%edi, -180(%rbp)
movq%rsi, -168(%rbp)
movq%rdx, -160(%rbp)
movq%rcx, -152(%rbp)
movq%r8, -144(%rbp)
movq%r9, -136(%rbp)
testb   %al, %al
je  .L3
movaps  %xmm0, -128(%rbp)
movaps  %xmm1, -112(%rbp)
movaps  %xmm2, -96(%rbp)
movaps  %xmm3, -80(%rbp)
movaps  %xmm4, -64(%rbp)
movaps  %xmm5, -48(%rbp)
movaps  %xmm6, -32(%rbp)
movaps  %xmm7, -16(%rbp)
.L3:
.loc 1 4 0
movlv(%rip), %eax
addl$1, %eax
movl%eax, v(%rip)
...

So, isn't this fixed?

[Bug debug/108600] Use DW_LNS_set_prologue_end

2023-01-30 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108600

--- Comment #2 from Tom de Vries  ---
(In reply to Tom de Vries from comment #1)
> Created attachment 54371 [details]

We probably don't want to emit in all cases, maybe limiting to 
"dwarf_version >= 3", or "!dwarf_strict || dwarf_version >= 3".

[Bug debug/108600] Use DW_LNS_set_prologue_end

2023-01-30 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108600

--- Comment #1 from Tom de Vries  ---
Created attachment 54371
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54371=edit
tentative patch

Tentative patch.

For hello.c, for the -gas-loc-support case it gives us:
...
$ gcc -g ~/hello.c -S -o-
  ...
main:
.LFB0:
.file 1 "/home/vries/hello.c"
.loc 1 5 1
.cfi_startproc
pushq   %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq%rsp, %rbp
.cfi_def_cfa_register 6
.loc 1 6 3 prologue_end
movl$.LC0, %edi

...

And for the -gno-as-loc-support case:
...
$ gcc -g ~/hello.c -c -gno-as-loc-support
$ llvm-dwarfdump --debug-line hello.o
  ...
AddressLine   Column File   ISA Discriminator Flags
-- -- -- -- --- - -
0x  5  0  1   0 0  is_stmt
0x0004  6  1  1   0 0  is_stmt prologue_end
0x000e  8  3  1   0 0  is_stmt
0x0013  9 10  1   0 0  is_stmt
0x0015  9  1  1   0 0  is_stmt end_sequence
...

[Bug debug/108600] New: Use DW_LNS_set_prologue_end

2023-01-30 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108600

Bug ID: 108600
   Summary: Use DW_LNS_set_prologue_end
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

The prologue_end marker (introduced in dwarf v3) is currently not emitted by
gcc.

This is the case for -gno-as-loc-support (gcc emitting .debug_line
contribution).

And for -gas-loc-support (gcc emitting .loc directives).

Note that there is an existing convention that marks the end of the prologue:
the second entry in the line table marks the first insn after the prologue.

However, that convention is not known by everyone, which makes it easier to
break it without noticing.

Note that for for instance gdb test-case gdb.ada/ref_param.exp, this convention
was broken for gcc 7.5.0 (and I don't know how much earlier), and my current
guess is that it got fixed in gcc 11.1.0, by commit c029fcb5680 ("Reset
force_source_line in final.c").

The only way to make the end of the prologue visible in assembly as well as
disassembly is to make it explicit by using DW_LNS_set_prologue_end.

[Bug libgomp/108098] OpenMP/nvptx reverse offload execution test FAILs

2022-12-15 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108098

--- Comment #1 from Tom de Vries  ---
(In reply to Thomas Schwinge from comment #0)
> $ nvidia-smi
> [...]
> | NVIDIA-SMI 440.33.01Driver Version: 440.33.01CUDA Version: 10.2
> [...]
> |   0  Tesla K80  [...]
> [...]
> |   1  Tesla K80  [...]
> 

I'm not sure if it matters for triggering this problem, but if I look at this
board at nvidia drivers download and select cuda 10.2 and production branch, I
get :
...
version:440.118.02
Release Date:   2020.9.30
...

Then using the "Beta and Older Drivers" I find the version you're using is:
...
version: 440.33.01
Release date:  November 19, 2019
...

Please always use the latest drivers when reporting a problem.

[Bug debug/107909] New: [powerpc64le, debug] Incorrect call site location due to nop after call insn

2022-11-29 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107909

Bug ID: 107909
   Summary: [powerpc64le, debug] Incorrect call site location due
to nop after call insn
   Product: gcc
   Version: 7.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Consider test-case vla-optimized-out.c:
...
int
__attribute__((noinline,weak)) __attribute__((noclone))
f1 (int i)
{
  char a[i + 1];
  a[0] = 5;
  return a[0];
}

int
main (void)
{
  volatile int j;
  int i = 5;
  asm volatile ("" : "=r" (i) : "0" (i));
  j = f1 (i);
  return 0;
}
...
compiled with -O1 -g.

We generate a nop after the bl insn:
...
bl f1# 11   *call_value_nonlocal_aixdi  [length = 8]
nop
.LVL4:
...
and the label after the nop is a call site location:
...
.uleb128 0x5 # (DIE (0x67) DW_TAG_GNU_call_site)
.8byte  .LVL4# DW_AT_low_pc
.4byte  0x81 # DW_AT_abstract_origin
...

Consequently we can't actually find the call site:
...
$ gdb -q -batch ./a.out -ex "break f1" -ex run -ex "set debug entry-values 1"
-ex "print sizeof (a)"
Breakpoint 1 at 0x165c: file vla-optimized-out.c, line 8.

Breakpoint 1, f1 (i=5) at vla-optimized-out.c:8
8   }
DW_OP_entry_value resolving cannot find DW_TAG_call_site 0x1690 in main
$1 = 
...

If we manually fix this in the .s file, we get a bit further:
...
$ gdb -q -batch ./a.out -ex "break f1" -ex run -ex "set debug entry-values 1"
-ex "print sizeof (a)"
Breakpoint 1 at 0x165c: file vla-optimized-out.c, line 8.

Breakpoint 1, f1 (i=5) at vla-optimized-out.c:8
8   }
Cannot find matching parameter at DW_TAG_call_site 0x1690 at main
$1 = 
...

The problem now is that the DW_AT_abstract_origin in the DW_TAG_GNU_call_site
is not properly handled by gdb. 

I reproduced this with gcc 7.5.0, but after looking at the pattern for
call_value_nonlocal_aixdi I think this should be reproducible with trunk.

[Bug other/87741] Don't build readline/libreadline.a in GDB, when --with-system-readline is supplied

2022-10-21 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87741

Tom de Vries  changed:

   What|Removed |Added

 CC||vries at gcc dot gnu.org
 Status|NEW |RESOLVED
   Target Milestone|--- |13.0
 Resolution|--- |FIXED
  Component|bootstrap   |other

--- Comment #8 from Tom de Vries  ---
(In reply to Andrew Pinski from comment #7)
> https://sourceware.org/git/?p=binutils-gdb.git;a=commit;f=configure.ac;
> h=69961a84c9b3744a10248fb6cbccc3c688a1e0a5
> 
> It would be useful if both configures were synced up again 

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=36ba985145ffa8e2078033fc1f1cf22851707a8e

[Bug target/99555] [OpenMP/nvptx] Execution-time hang for simple nested OpenMP 'target'/'parallel'/'task' constructs

2022-09-06 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99555

--- Comment #17 from Tom de Vries  ---
(In reply to Thomas Schwinge from comment #14)
> > That's with a Nvidia Tesla K20c GPU, Driver Version: 346.46.
> > As that version is "a bit old", I shall first update this, before we spend
> > any further time on analyzing this.
> 
> Cross-checking on another system with Nvidia Tesla K20c GPU but more recent
> Driver Version I'm not seeing such an issue.
> 
> On the "old" system, gradually upgrading Driver Version: 346.46 to 352.99,
> 361.93.02, 375.88 (always the latest (?) version of the respective series),
> these all did not resolve the problem.
> 
> Only starting with 384.59 (that is, early version of the 384.X series), that
> then did resolve the issue.  That's still using the GCC/nvptx '-mptx=3.1'
> multilib.
> 
> (We couldn't with earlier series, but given this is 384.X, we may now also
> cross-check with the default multilib, and that also was fine.)
> 
> Now, I don't know if at all we would like to spend any more effort on this
> issue, given that it only appears with rather old pre-384.X versions -- but
> on the other hand, the GCC/nvptx '-mptx=3.1' multilib is meant to keep these
> supported?  (... which is why I'm running such testing; and certainly the
> timeouts are annoying there.)
> 
> It might be another issue with pre-384.X versions of the Nvidia PTX JIT, or
> is there the slight possibility that GCC is generating/libgomp contains some
> "weird" code that post-384.X version happen to "fix up" -- probably the
> former rather than the latter?  (Or, the chance of GPU hardware/firmware or
> some other system weirdness -- unlikely, otherwise behaves totally fine?)
> 
> I don't know where to find complete Nvidia Driver/JIT release notes, where
> the 375.X -> 384.X notes might provide an idea of what got fixed, and we
> might then add another 'WORKAROUND_PTXJIT_BUG' for that -- maybe simple,
> maybe not.
> 
> Any thoughts, Tom?

I care about old cards, not about old drivers.  The oldest card we support is
an sm_30, and last driver series that supports that one is 470.x (and AFAIU, is
therefore supported by nvidia for that arch).

There's the legacy series, 390.x, which is the last to support fermi, but we
don't support any fermi cards or earlier.  I did do some testing with this one
for later cards, but reported issues are acknowledged but not fixed by nvidia,
so ... this is already out of scope for me.

So yeah, IWBN to come up with workarounds for various older drivers, but I'm
not investing time in that.  Is there a problem for you to move to 470.x or
later (515.x) ?  Is there a card for which that causes problems ?

[Bug debug/105772] [debug, i386] sched2 moves get_pc_thunk call past debug_insn

2022-05-30 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105772

--- Comment #2 from Tom de Vries  ---
As background info, I'm proposing a patch for gdb to have the
architecture-specific prologue skipper skip over the get_pc_thunk call:
https://sourceware.org/pipermail/gdb-patches/2022-May/189563.html , which helps
to skip over the prologue with -O0 -pie -fPIE code.

But that causes a regression in test-case gdb/testsuite/gdb.base/break.exp,
because of this PR.

[Bug debug/105772] New: [debug, i386] sched2 moves get_pc_thunk call past debug_insn

2022-05-30 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105772

Bug ID: 105772
   Summary: [debug, i386] sched2 moves get_pc_thunk call past
debug_insn
   Product: gcc
   Version: 12.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Consider the test-case source gdb/testsuite/gdb.base/break1.c (
https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/testsuite/gdb.base/break1.c;h=24d0d15dc2e8bd14f6b14fabbb38fec43db9c990;hb=HEAD
) containing function marker4.

Extracting the relevant part:
...
struct some_struct
{
  int a_field;
  int b_field;
  union { int z_field; };
};

struct some_struct values[50];

void marker4 (long d) { values[0].a_field = d; }/* set breakpoint 14
here */
...

When compiling with gcc 12.1.0 and -O2 like so:
...
$ gcc -fno-stack-protector -m32 -fPIE -pie -w -c -g break1.c -O2
...
we have before sched2:
...
(note 4 1 14 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 14 4 11 2 NOTE_INSN_PROLOGUE_END)
(insn/f 11 14 3 2 (parallel [
(set (reg:SI 0 ax [82])
(unspec:SI [
(const_int 0 [0])
] UNSPEC_SET_GOT))
(clobber (reg:CC 17 flags))
]) 931 {*set_got}
 (expr_list:REG_UNUSED (reg:CC 17 flags)
(expr_list:REG_EQUIV (unspec:SI [
(const_int 0 [0])
] UNSPEC_SET_GOT)
(expr_list:REG_CFA_FLUSH_QUEUE (nil)
(nil)
(note 3 11 6 2 NOTE_INSN_FUNCTION_BEG)
(debug_insn 6 3 12 2 (debug_marker) "break1.c":59:25 -1
 (nil))
(insn 12 6 8 2 (set (reg/v:SI 1 dx [orig:83 d ] [83])
(mem/c:SI (plus:SI (reg/f:SI 7 sp)
(const_int 4 [0x4])) [5 d+0 S4 A32])) "break1.c":59:43 81
{*movsi_internal}
 (expr_list:REG_EQUIV (mem/c:SI (reg/f:SI 16 argp) [5 d+0 S4 A32])
(nil)))
...
and after:
...
(note 4 1 14 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 14 4 3 2 NOTE_INSN_PROLOGUE_END)
(note 3 14 6 2 NOTE_INSN_FUNCTION_BEG)
(debug_insn 6 3 11 2 (debug_marker) "break1.c":59:25 -1
 (nil))
(insn/f:TI 11 6 12 2 (parallel [
(set (reg:SI 0 ax [82])
(unspec:SI [
(const_int 0 [0])
] UNSPEC_SET_GOT))
(clobber (reg:CC 17 flags))
]) 931 {*set_got}
 (expr_list:REG_UNUSED (reg:CC 17 flags)
(expr_list:REG_EQUIV (unspec:SI [
(const_int 0 [0])
] UNSPEC_SET_GOT)
(expr_list:REG_CFA_FLUSH_QUEUE (nil)
(nil)
(insn 12 11 8 2 (set (reg/v:SI 1 dx [orig:83 d ] [83])
(mem/c:SI (plus:SI (reg/f:SI 7 sp)
(const_int 4 [0x4])) [5 d+0 S4 A32])) "break1.c":59:43 81
{*movsi_internal}
 (expr_list:REG_EQUIV (mem/c:SI (reg/f:SI 16 argp) [5 d+0 S4 A32])
(nil)))
...

This moves the get_pc_thunk call after the debug_insn, making it (in terms of
debug info) part of the first statement instead of the prologue.

That is, with -O1 we have insn:
...
000d :
   d:   e8 fc ff ff ff  call   e 
  12:   05 01 00 00 00  add$0x1,%eax
  17:   8b 54 24 04 mov0x4(%esp),%edx
  1b:   89 90 00 00 00 00   mov%edx,0x0(%eax)
  21:   c3  ret
...
and line info:
...
File nameLine numberStarting addressViewStmt
break1.c  59 0xd   x
break1.c  59 0xd   1
break1.c  590x17   x
break1.c  590x17   1
break1.c  590x21
break1.c   -0x22
...
so at 0x17 we have the start of a statement.

But with -O2 we have identical insn:
...

0030 :
  30:   e8 fc ff ff ff  call   31 
  35:   05 01 00 00 00  add$0x1,%eax
  3a:   8b 54 24 04 mov0x4(%esp),%edx
  3e:   89 90 00 00 00 00   mov%edx,0x0(%eax)
  44:   c3  ret
...
but different line info:
...
File nameLine numberStarting addressViewStmt
break1.c  590x30   x
break1.c  590x30   1   x
break1.c  590x3a
break1.c  590x44
break1.c   -0x45
...
so at 0x3a we don't have the start of a statement.

[Bug target/104893] [nvptx] Handle Independent Thread Scheduling for sm_70+ with -msoft-stack

2022-03-30 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104893

Tom de Vries  changed:

   What|Removed |Added

 Resolution|--- |WORKSFORME
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Tom de Vries  ---
(In reply to Tom de Vries from comment #1)
> (In reply to Tom de Vries from comment #0)
> > The per-thread call stack is handled for .local memory by the CUDA driver.
> > 
> > For the 'soft stack' that's not the case.
> 
> Hmm, actually there's .local memory used, just not "directly".  Possibly the
> documentation needs updating to point that out.
> 
> So, there doesn't seem to be an issue related to overlapping storage.
> 
> So I wonder, is the stack pointer also per thread then? Or still per-warp?

OK, here ( https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203#c6 ) we read:
...
The pointer is switched between per-warp global memory and per-lane local
memory.
...

So, I think this should be fine then.

Marking this resolved-worksforme until we run into an actual failing test-case.

[Bug target/104857] [nvptx] Add macro specifying ptx isa version

2022-03-29 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104857

Tom de Vries  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED
   Target Milestone|--- |12.0

--- Comment #3 from Tom de Vries  ---
Committed.

[Bug target/104714] [nvptx] Means to specify any sm_xx

2022-03-29 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104714

Tom de Vries  changed:

   What|Removed |Added

   Target Milestone|--- |12.0
 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #3 from Tom de Vries  ---
Added march-map.

[Bug driver/105096] New: --target-help not an alias for --help=target

2022-03-29 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105096

Bug ID: 105096
   Summary: --target-help not an alias for --help=target
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: trivial
  Priority: P3
 Component: driver
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

In common.opt, we read:
...
-target-help
Common Driver
Alias for --help=target.
...

But that doesn't seem to be correct, I get different results.

For instance, for nvptx target, we have an malias that atm is undocumented, and
we have:
...
$ cc1 --target-help 2>&1 | grep malias
$
...
and:
...
$ cc1 --help=target 2>&1 | grep malias
  -malias   [disabled]
$...

[Bug c/53037] warn_if_not_aligned(X)

2022-03-29 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53037

Tom de Vries  changed:

   What|Removed |Added

 CC||vries at gcc dot gnu.org

--- Comment #43 from Tom de Vries  ---
*** Bug 81909 has been marked as a duplicate of this bug. ***

[Bug target/81909] Missing warning in gcc.dg/pr53037-{2,3}.c

2022-03-29 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81909

Tom de Vries  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #3 from Tom de Vries  ---
Marking resolved-duplicate.

*** This bug has been marked as a duplicate of bug 53037 ***

[Bug target/81728] nvptx-run: error getting kernel result: the launch timed out and was terminated

2022-03-29 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81728

Tom de Vries  changed:

   What|Removed |Added

 Resolution|--- |WORKSFORME
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Tom de Vries  ---
The error message looks like the one produced due to the 5 second watchdog that
is installed when the board is running a display manager.

Anyway, in order to run into this timeout, the test-case should hang, and it
currently doesn't.  At least not on any of the boards I've recently tested
with, with the supported drivers.  So, possibly this was fixed in gcc, or in a
driver.

There's not enough information in here to reproduce, so closing as
resolved-worksforme.

[Bug target/104818] Duplicate word "version" in option -mptx description

2022-03-28 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104818

Tom de Vries  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED
   Target Milestone|--- |12.0

--- Comment #2 from Tom de Vries  ---
Fixed in aforementioned commit.

[Bug libgomp/105042] [libgomp, GOMP_NVPTX_JIT=-O0] Openacc testsuite failures when X runs on nvidia driver

2022-03-28 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105042

Tom de Vries  changed:

   What|Removed |Added

   Severity|normal  |enhancement

--- Comment #8 from Tom de Vries  ---
With the conversation shifted to better error messages, re-classifying as
enhancement.

[Bug target/105075] [nvptx] Generate sad insn (sum of absolute differences)

2022-03-28 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105075

--- Comment #6 from Tom de Vries  ---
Created attachment 52698
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52698=edit
Demonstrator patch with stepping stone patterns for combine

(In reply to Tom de Vries from comment #2)
> Also, I wonder if defining a stepping-stone intermediate pattern could help
> combine.

Well, that approach seems to work.

[Bug target/105014] [nvptx] FAIL: gcc.dg/pr97459-1.c execution test

2022-03-28 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105014

--- Comment #6 from Tom de Vries  ---
Reproducer filed at https://github.com/vries/nvidia-bugs/tree/master/shift-and

PR filed at nvidia ( https://developer.nvidia.com/nvidia_bug/3585290 ).

[Bug target/105075] [nvptx] Generate sad insn (sum of absolute differences)

2022-03-28 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105075

Tom de Vries  changed:

   What|Removed |Added

  Attachment #52693|0   |1
is obsolete||

--- Comment #3 from Tom de Vries  ---
Created attachment 52694
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52694=edit
Demonstrator patch v2

Forgot to add test-case.

[Bug target/105075] [nvptx] Generate sad insn (sum of absolute differences)

2022-03-28 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105075

--- Comment #2 from Tom de Vries  ---
AFAIU, at gimple level support is limited to vectors, so that doesn't help to
generate the insn for the simple, scalar case.

It would be nice if combine could generate it and we wouldn't have to use a
peephole, but AFAIU the pattern is too complex for that.

I wonder if reformulating using a conditional could help there (ptx isa
describes semantics using "c + ((a

[Bug target/105075] [nvptx] Generate sad insn (sum of absolute differences)

2022-03-28 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105075

--- Comment #1 from Tom de Vries  ---
Created attachment 52693
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52693=edit
Demonstrator patch

This patch adds:
- modeling of insn sad.u32 in the .md file
- peephole2 to generate it
  (which is incomplete, it needs some safety-checks related to
  using unique intermediate regs)
- extra instance of peephole2 pass (otherwise, the peephole is not triggered)
- extra instance of fast_rtl_dce pass (otherwise, unused intermediate
  insn are not cleaned up)

So for the usad_2 in the test-case, we have without the patch:
...
cvt.u64.u32 %r32, %r28; 
cvt.u64.u32 %r33, %r29; 
sub.u64 %r34, %r32, %r33;   
abs.s64 %r35, %r34; 
cvt.u32.u64 %r36, %r35; 
add.u32 %value, %r36, %r30; 
...
and with:
...
sad.u32 %value, %r28, %r29, %r30;   
...

[Bug target/105075] New: [nvptx] Generate sad insn (sum of absolute differences)

2022-03-28 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105075

Bug ID: 105075
   Summary: [nvptx] Generate sad insn (sum of absolute
differences)
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

ptx has sad ((sum of absolute differences)) insn, which is currently not
modeled in the .md file.

[Bug libgomp/105042] [libgomp, GOMP_NVPTX_JIT=-O0] Openacc testsuite failures when X runs on nvidia driver

2022-03-25 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105042

--- Comment #5 from Tom de Vries  ---
(In reply to Richard Biener from comment #1)
> Doesn't whatever driver/library API we use from libgomp to invoke workloads
> report actual errors?  Maybe we need to improve there.

This:
...
libgomp: cuStreamSynchronize error: the launch timed out and was terminated
...
seems to be the string for cudaErrorLaunchTimeout, which AFAICT is dedicated to
this situation, so we could treat that error code specially in cuda_error in
plugin-nvptx.c and emit a custom message.

Say:
...
libgomp: cuStreamSynchronize error: the launch timed out and was terminated (5
second time-out caused by launching on a device running a display manager)
...

Alternatively, we could detect cudaDeviceProp::kernelExecTimeoutEnabled and
emit a warning when initializing or before launching the first kernel.

[Bug libgomp/105042] [libgomp, GOMP_NVPTX_JIT=-O0] Openacc testsuite failures when X runs on nvidia driver

2022-03-25 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105042

--- Comment #4 from Tom de Vries  ---
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/592275.html

[Bug libgomp/105042] [libgomp, GOMP_NVPTX_JIT=-O0] Openacc testsuite failures when X runs on nvidia driver

2022-03-25 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105042

--- Comment #3 from Tom de Vries  ---
(In reply to Tom de Vries from comment #2)
> (In reply to Richard Biener from comment #1)
> > Doesn't whatever driver/library API we use from libgomp to invoke workloads
> > report actual errors?  Maybe we need to improve there.
> 
> Good point, it reported some form of timeout.  I'll post the exact form once
> I reproduce.

It's:
...
Execution timeout is: 300
spawn [open ...]^M

libgomp: cuStreamSynchronize error: the launch timed out and was terminated
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0 
execution test
...

Googling a bit about this error message (
https://forums.developer.nvidia.com/t/need-to-remove-timeouts-and-the-launch-timed-out-and-was-terminated-message/16741/2
) shows that running a display manager sets a 5/10 seconds watchdog timer on
any kernel.

[Bug target/105014] [nvptx] FAIL: gcc.dg/pr97459-1.c execution test

2022-03-24 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105014

--- Comment #5 from Tom de Vries  ---
Minimal test-case:
...
void __attribute__((noinline)) foo (unsigned long long d0) {
  unsigned long long __a;
  __a = 0x38;
  for (; __a > 0; __a -= 8)
if (((d0 >> __a) & 0xff) != 0)
  break;

   __builtin_printf ("__a: 0x%llx\n", __a);
}

int main (void) {
  foo (1);
  return 0;
}
...

Different value of __a:
...
$ ./install/bin/nvptx-none-run -O0 ./pr97459-1.exe ; echo;
./install/bin/nvptx-none-run ./pr97459-1.exe 
__a: 0x0

__a: 0x30
...

[Bug target/105011] [nvptx] FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test

2022-03-24 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105011

Tom de Vries  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
   Target Milestone|--- |12.0
 Resolution|--- |FIXED

--- Comment #5 from Tom de Vries  ---
Fixed by commit.

[Bug libgomp/105042] [libgomp, GOMP_NVPTX_JIT=-O0] Openacc testsuite failures when X runs on nvidia driver

2022-03-24 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105042

--- Comment #2 from Tom de Vries  ---
(In reply to Richard Biener from comment #1)
> Doesn't whatever driver/library API we use from libgomp to invoke workloads
> report actual errors?  Maybe we need to improve there.

Good point, it reported some form of timeout.  I'll post the exact form once I
reproduce.

[Bug target/105014] [nvptx] FAIL: gcc.dg/pr97459-1.c execution test

2022-03-24 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105014

--- Comment #4 from Tom de Vries  ---
(In reply to Tom de Vries from comment #1)
> With -O0 JIT instead:

Also OK with JIT -O1, problems start at JIT -O2.

[Bug target/105011] [nvptx] FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test

2022-03-24 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105011

--- Comment #3 from Tom de Vries  ---
Submitted fix :
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/592211.html

Though without changelog, apparently.

[Bug libgomp/105042] New: [libgomp, GOMP_NVPTX_JIT=-O0] Openacc testsuite failures when X runs on nvidia driver

2022-03-24 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105042

Bug ID: 105042
   Summary: [libgomp, GOMP_NVPTX_JIT=-O0] Openacc testsuite
failures when X runs on nvidia driver
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libgomp
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
CC: jakub at gcc dot gnu.org
  Target Milestone: ---

I usually have only an nvidia-compute$n driver package installed, but sometimes
(as happened when I updated the system yesterday) also x11-video-nvidia$n,
after which X is run on the nvidia card (instead of on the builtin intel
graphics).

With such a setup, I run into a cluster of FAILs, all for GOMP_NVPTX_JIT=-O0:
...
$ grep ^FAIL: 2/libgomp.sum
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0
-DGOMP_NVPTX_JIT=-O0 execution test
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vred2d-128.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0
-DGOMP_NVPTX_JIT=-O0 execution test
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vred2d-128.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2
-DGOMP_NVPTX_JIT=-O0 execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/parallel-dims.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0
-DGOMP_NVPTX_JIT=-O0 execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/vred2d-128.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0
-DGOMP_NVPTX_JIT=-O0 execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/vred2d-128.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2
-DGOMP_NVPTX_JIT=-O0 execution test
FAIL: libgomp.oacc-fortran/parallel-dims.f90 -DACC_DEVICE_TYPE_nvidia=1
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0 -DGOMP_NVPTX_JIT=-O0 execution
test
FAIL: libgomp.oacc-fortran/parallel-dims.f90 -DACC_DEVICE_TYPE_nvidia=1
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O1 -DGOMP_NVPTX_JIT=-O0 execution
test
FAIL: libgomp.oacc-fortran/parallel-dims.f90 -DACC_DEVICE_TYPE_nvidia=1
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -Os -DGOMP_NVPTX_JIT=-O0 execution
test
...

Note that this is with a patch from PR104423 that runs tests both with default
JIT optimization and GOMP_NVPTX_JIT=-O0, hence the -DGOMP_NVPTX_JIT=-O0 tag. 
But it can be reproduced by just doing:
...
export GOMP_NVPTX_JIT=-O0
...

It could be that the test-cases just need scaling down.  OTOH, it also could be
that there's an underlying problem that only surfaces when other processes are
run in parallel, or specifically, X.

This is on board K2000 with driver 470.103.01.

The board has 2GB of memory, and according to nvidia-smi, having the X
processes takes a couple of 100MBs, and ./parallel-dims.exe just takes 15MiB,
so at first glance it doesn't seem to be an out-of-board-memory thing.

I do observe reduced system responsiveness while running the tests, so maybe
it's the compute capacity rather than memory which is exhausted.

[Bug target/105019] [nvptx] malias in libgomp results in "Internal error: reference to deleted section"

2022-03-23 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105019

--- Comment #6 from Tom de Vries  ---
(In reply to Tom de Vries from comment #4)
> OK, I think this is the pattern:
> ...
> $ cat gcc/testsuite/gcc.target/nvptx/alias-5.c

FTR, same thing if I use static functions:
...
static void __attribute__((noinline))
__f ()
{
  v = 1;
}

static void f () __attribute__ ((alias ("__f")));

static void
g (void)
{
  f ();
}
...

[Bug target/105019] [nvptx] malias in libgomp results in "Internal error: reference to deleted section"

2022-03-23 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105019

--- Comment #5 from Tom de Vries  ---
Creating a CUDA example is hampered by the fact that there's no symbol alias
support, AFAICT.

I'd like to write something like:
...
__device__ void __foo ()
{
  printf ("__foo\n");
}
__device__ void foo () __attribute__((alias ("__foo")));
__device__ void bar ()
{
  foo ();
}
__global__ void hello_world ()
{
  bar ();
}
...
(and then comment out bar to reproduce the problem), but all attempts to get
this compiled and executed end up in various errors.

Of course we can resort to hand-edited ptx, or adding .alias directives in asm
statements, but that's likely to produce an 'unsupported' response by nvidia
when filed as bug report.

[Bug target/105019] [nvptx] malias in libgomp results in "Internal error: reference to deleted section"

2022-03-23 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105019

--- Comment #4 from Tom de Vries  ---
OK, I think this is the pattern:
...
$ cat gcc/testsuite/gcc.target/nvptx/alias-5.c
/* { dg-do link } */
/* { dg-do run { target runtime_ptx_isa_version_6_3 } } */
/* { dg-options "-save-temps -malias -mptx=6.3" } */

int v;

void __attribute__((noinline))
__f ()
{
  v = 1;
}

void f () __attribute__ ((alias ("__f")));

void
g (void)
{
  f ();
}

int
main (void)
{
  if (v != 0)
__builtin_abort ();

  return 0;
}
...

There's a function __f:
...
// BEGIN GLOBAL FUNCTION DEF: __f
.visible .func __f
{
.reg .u32 %r22;
mov.u32 %r22,1;
st.global.u32 [v],%r22;
ret;
}
...
with alias f:
...
// BEGIN GLOBAL FUNCTION DECL: __f
.visible .func __f;
.visible .func f;
.alias f,__f;
...
called from g:
...
// BEGIN GLOBAL FUNCTION DEF: g
.visible .func g
{
{
call f;
}
ret;
}
...

However, g is unused.

So we have:
...
PASS: gcc.target/nvptx/alias-5.c (test for excess errors)
spawn nvptx-none-run ./alias-5.exe^M
fatal   : Internal error: reference to deleted section^M
nvptx-run: cuLinkComplete failed: unknown error (CUDA_ERROR_UNKNOWN, 999)^M
FAIL: gcc.target/nvptx/alias-5.c execution test
...

Calling g from main fixes the internal error.

[Bug target/105019] [nvptx] malias in libgomp results in "Internal error: reference to deleted section"

2022-03-23 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105019

--- Comment #3 from Tom de Vries  ---
Aliases in failing .exe:
...
$ strings declare_target-1.exe | grep "\.alias"
.alias gomp_ialias_GOMP_taskgroup_start,GOMP_taskgroup_start;
.alias gomp_ialias_GOMP_taskgroup_end,GOMP_taskgroup_end;
.alias
gomp_ialias_GOMP_taskgroup_reduction_register,GOMP_taskgroup_reduction_register;
...

[Bug target/105019] [nvptx] malias in libgomp results in "Internal error: reference to deleted section"

2022-03-23 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105019

--- Comment #2 from Tom de Vries  ---
Aliases in libgomp.a:
...
$ grep "\.alias"
build-gcc-offload-nvptx-none/nvptx-none/mgomp/libgomp/.libs/libgomp.a
.alias gomp_ialias_GOMP_loop_runtime_next,GOMP_loop_runtime_next;
.alias gomp_ialias_GOMP_loop_ull_runtime_next,GOMP_loop_ull_runtime_next;
.alias gomp_ialias_GOMP_parallel_end,GOMP_parallel_end;
.alias gomp_ialias_GOMP_taskgroup_start,GOMP_taskgroup_start;
.alias gomp_ialias_GOMP_taskgroup_end,GOMP_taskgroup_end;
.alias
gomp_ialias_GOMP_taskgroup_reduction_register,GOMP_taskgroup_reduction_register;
.alias gomp_ialias_omp_capture_affinity,omp_capture_affinity;
.alias gomp_ialias_omp_aligned_alloc,omp_aligned_alloc;
.alias gomp_ialias_omp_free,omp_free;
.alias gomp_ialias_omp_aligned_calloc,omp_aligned_calloc;
...

[Bug target/105019] [nvptx] malias in libgomp results in "Internal error: reference to deleted section"

2022-03-23 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105019

--- Comment #1 from Tom de Vries  ---
To trigger:
...
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 87efc23bd96..8bf9ea90a77 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -245,6 +245,9 @@ default_ptx_version_option (void)
  warp convergence.  */
   res = MAX (res, PTX_VERSION_6_0);

+  /* Pick at least 6.3, to enable using malias.  */
+  res = MAX (res, PTX_VERSION_6_3);
+
   /* Verify that we pick a version that supports the sm.  */
   gcc_assert (first <= res);
   return res;
diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index 11288d1a8ee..a4aece80682 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -87,7 +87,7 @@ mptx-comment
 Target Var(nvptx_comment) Init(1) Undocumented

 malias-
-Target Var(nvptx_alias) Init(0) Undocumented
+Target Var(nvptx_alias) Init(1) Undocumented

 mexperimental
 Target Var(nvptx_experimental) Init(0) Undocumented
...
rebuild gcc, run libgomp tests, and:
...
$ grep -c "Internal error: reference to deleted section" libgomp.log 
637
...

For instance:
...
Execution timeout is: 300
spawn [open ...]^M

libgomp: Link error log fatal   : Internal error: reference to deleted section


libgomp: cuLinkComplete error: unknown error

libgomp: Cannot map target functions or variables (expected 2, have 4294967295)
FAIL: libgomp.c/../libgomp.c-c++-common/declare_target-1.c execution test
...

[Bug target/105019] New: [nvptx] malias in libgomp results in "Internal error: reference to deleted section"

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105019

Bug ID: 105019
   Summary: [nvptx] malias in libgomp results in "Internal error:
reference to deleted section"
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

As mentioned in the commit message for malias:
...
When enabling malias by default, libgomp detects alias support and
consequently libgomp.a will contains a few uses of .alias.  This however
results in aforementioned "Internal error: reference to deleted section" in
many test-cases.  Either there's some error with how .alias is used, or
there's a driver bug.  While this issue is not resolved, we keep malias
off-by-default.
...

This needs to be investigated, and if it's a driver bug, reported to nvidia, or
otherwise fixed or worked around.

Note: the same error showed up in a test-case where the call to an alias was
inlined, and consequently the alias referenced a defined but unused function. 
To observe this, disable this bit:
...
  if (!cgraph_node::get (name)->referred_to_p ())
/* Prevent "Internal error: reference to deleted section".  */
return;
...
in nvptx_asm_output_def_from_decls and run nvptx.exp=alias-2.c:
...
PASS: gcc.target/nvptx/alias-2.c (test for excess errors)
spawn nvptx-none-run ./alias-2.exe^M
fatal   : Internal error: reference to deleted section^M
nvptx-run: cuLinkComplete failed: unknown error (CUDA_ERROR_UNKNOWN, 999)^M
FAIL: gcc.target/nvptx/alias-2.c execution test
...

So it's possible that the error somehow related to this scenario.

[Bug target/105014] [nvptx] FAIL: gcc.dg/pr97459-1.c execution test

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105014

--- Comment #3 from Tom de Vries  ---
(In reply to Tom de Vries from comment #2)
> (In reply to Tom de Vries from comment #0)
> > On a quadro k2000 with driver 470.103.01, I run into:
> 
> So, sm_30.
> 
> > ...
> > FAIL: gcc.dg/pr97459-1.c execution test
> 
> Reproduced on geforce gt710 (sm_35), with same driver.

But not on quadro k620 (sm_50).

[Bug target/105014] [nvptx] FAIL: gcc.dg/pr97459-1.c execution test

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105014

--- Comment #2 from Tom de Vries  ---
(In reply to Tom de Vries from comment #0)
> On a quadro k2000 with driver 470.103.01, I run into:

So, sm_30.

> ...
> FAIL: gcc.dg/pr97459-1.c execution test

Reproduced on geforce gt710 (sm_35), with same driver.

[Bug target/105018] [nvptx] Need better alias support

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105018

--- Comment #2 from Tom de Vries  ---
As mentioned before by amonakov, a possibility is to add alias support to the
nvptx-tools linker, and use that.

[Bug target/105018] [nvptx] Need better alias support

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105018

--- Comment #1 from Tom de Vries  ---
(In reply to Tom de Vries from comment #0)
> Aliases to aliases are not supported (see libgomp.c-c++-common/pr96390.c).
> This is currently not prohibited by the compiler, but with the driver link we
> run into:  "Internal error: alias to unknown symbol" .

And that is the reason that libgomp.c-c++-common/pr96390.c and friends doesn't
pass when I do:
...
/* { dg-additional-options "-foffload=-mptx=6.3 -foffload=-malias" { target
offload_target_nvptx } } */
...

[Bug target/105018] New: [nvptx] Need better alias support

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105018

Bug ID: 105018
   Summary: [nvptx] Need better alias support
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

We currently have alias support enabled by malias, which relies on the ptx
.alias directive.

There is a number of limitations, listed in the commit adding malias:
...
Only function aliases are supported.

Weak aliases are not supported.  That is, if I disable the check in
nvptx_asm_output_def_from_decls that disallows this, a weak alias is emitted
and parsed by the driver.  But the test gcc.dg/globalalias.c starts failing,
with the behaviour matching the comment about "weird behavior of AIX's .set
pseudo-op": a weak alias may resolve to different functions in different
files.

Aliases to weak symbols are not supported (see gcc.dg/localalias.c).  This is
currently not prohibited by the compiler, but with the driver link we run
into: "error: Function test with .weak scope cannot be aliased".

Aliases to aliases are not supported (see libgomp.c-c++-common/pr96390.c).
This is currently not prohibited by the compiler, but with the driver link we
run into:  "Internal error: alias to unknown symbol" .

Unreferenced aliases are not emitted (these can occur f.i. when inlining a
call to an alias).  This avoids driver link error "Internal error: reference
to deleted section".
...

We'd like an implementation that doesn't have (all of) these limitations.

[Bug target/97106] [nvptx] Issues with weak aliases introduced by C++

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97106

Tom de Vries  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
   Target Milestone|--- |12.0
 Resolution|--- |FIXED

--- Comment #5 from Tom de Vries  ---
Using the test-case from comment 0 and:
...
/* { dg-additional-options "-foffload=-malias -foffload=-mptx=6.3 -O0" } */
...
I get:
...
$ strings test.exe | grep -i alias.alias
_ZN1VILi1EEC1ImvEET_,_ZN1VILi1EEC2ImvEET_;
...
so I see a normal alias, not a weak alias.

The test-case still fails in the abort.

Note that I get the same result with:
...
/* { dg-additional-options "-foffload=-mno-alias -foffload=-mptx=6.3 -O2" } */
...

There may be a problem with the test-case, there may be a problem with nvptx
c++ support, but the alias issue seems to have been addresses, so I'm closing
this one.

[Bug target/97106] [nvptx] Issues with weak aliases introduced by C++

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97106
Bug 97106 depends on bug 97102, which changed state.

Bug 97102 Summary: [nvptx] PTX JIT compilation failed when using aliases
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97102

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

[Bug target/97102] [nvptx] PTX JIT compilation failed when using aliases

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97102

Tom de Vries  changed:

   What|Removed |Added

   Target Milestone|--- |12.0
 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #8 from Tom de Vries  ---
The test-case from comment 0 now works in combination with:
/* { dg-additional-options "-foffload=-malias -foffload=-mptx=6.3" } */

[Bug libgomp/98215] Coalescing memory in target region creates slower code

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98215

Tom de Vries  changed:

   What|Removed |Added

   Severity|normal  |enhancement
   Keywords||missed-optimization
 CC||vries at gcc dot gnu.org

[Bug target/104916] [nvptx] Handle Independent Thread Scheduling for sm_70+ with -muniform-simt

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104916

Tom de Vries  changed:

   What|Removed |Added

   Target Milestone|--- |12.0

[Bug target/104916] [nvptx] Handle Independent Thread Scheduling for sm_70+ with -muniform-simt

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104916

Tom de Vries  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Tom de Vries  ---
Fixed.

[Bug target/104783] [nvptx, openmp] Hang/abort with atomic update in simd construct

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104783

Tom de Vries  changed:

   What|Removed |Added

 Resolution|--- |FIXED
   Target Milestone|--- |12.0
 Status|UNCONFIRMED |RESOLVED

--- Comment #8 from Tom de Vries  ---
Fixed.

[Bug target/104957] [nvptx] Use .alias directive (available starting ptx isa version 6.3)

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104957

Tom de Vries  changed:

   What|Removed |Added

   Target Milestone|--- |12.0
 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Tom de Vries  ---
Committed.

[Bug target/104925] [nvptx] Use "%" as register prefix

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104925

Tom de Vries  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED
   Target Milestone|--- |12.0

--- Comment #2 from Tom de Vries  ---
Fixed.

[Bug libgcc/105016] [libgcc, TARGET_HAS_NO_HW_DIVIDE] Incorrect result for __udivmodti4

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105016

--- Comment #3 from Tom de Vries  ---
In libgcc.h, I see:
...
#define __udivmoddi4__NDW(udivmod,4)
...
and for LIBGCC2_UNITS_PER_WORD == 8 we have:
...
#define __NDW(a,b)  __ ## a ## ti ## b
...

So, AFAICT it's possible that __udivmoddi4 is mapped to __udivmodti4.

[Bug libgcc/105016] [libgcc, TARGET_HAS_NO_HW_DIVIDE] Incorrect result for __udivmodti4

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105016

--- Comment #1 from Tom de Vries  ---
Created attachment 52662
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52662=edit
test-case

[Bug libgcc/105016] New: [libgcc, TARGET_HAS_NO_HW_DIVIDE] Incorrect result for __udivmodti4

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105016

Bug ID: 105016
   Summary: [libgcc, TARGET_HAS_NO_HW_DIVIDE] Incorrect result for
__udivmodti4
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libgcc
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

While investigating PR105014, I copied the TARGET_HAS_NO_HW_DIVIDE
implementation of __udivmoddi4 to the test-case.

On x86, when using the native __udivmodti4, I get:
...
$ ./a.out 
a   : 0xfffb
b   : 0x0001
div : 0xfffb
mod : 0x
... 

But with the TARGET_HAS_NO_HW_DIVIDE version I get instead:
...
a   : 0xfffb
b   : 0x0001
div : 0x
mod : 0xfffc
...

[Bug target/105014] [nvptx] FAIL: gcc.dg/pr97459-1.c execution test

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105014

--- Comment #1 from Tom de Vries  ---
First FAIL minimizes to:
...
typedef __uint128_t T;

union u {
  T t;
  struct {
unsigned long long x;
unsigned long long y;
  } xy;
};

#define PRINT(VAR)  \
  do\
{   \
  __builtin_printf (#VAR ": lo: %llx\n", VAR.xy.x); \
  __builtin_printf (#VAR ": hi: %llx\n", VAR.xy.y); \
} \
  while (0)

extern T __udivmodti4 (T, T, T *);

int
main (void)
{
  union u a, b, mod, div;
  a.t = -4;
  b.t = 1;
  PRINT (a);
  PRINT (b);
  div.t = __udivmodti4 (a.t, b.t, );
  PRINT (div);
  PRINT (mod);
  if (mod.t != 0)
__builtin_abort ();

  return 0;
}
...

Fails like this:
...
$ ./install/bin/nvptx-none-run  ./pr97459-1.exe 
a: lo: fffc
a: hi: 
b: lo: 1
b: hi: 0
div: lo: fffd
div: hi: 
mod: lo: 
mod: hi: 0
nvptx-run: error getting kernel result: unspecified launch failure
(CUDA_ERROR_LAUNCH_FAILED, 719)
$
...

With -O0 JIT instead:
...
$ ./install/bin/nvptx-none-run -O0  ./pr97459-1.exe 
a: lo: fffc
a: hi: 
b: lo: 1
b: hi: 0
div: lo: fffc
div: hi: 
mod: lo: 0
mod: hi: 0
...

[Bug target/105014] New: [nvptx] FAIL: gcc.dg/pr97459-1.c execution test

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105014

Bug ID: 105014
   Summary: [nvptx] FAIL: gcc.dg/pr97459-1.c execution test
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

On a quadro k2000 with driver 470.103.01, I run into:
...
FAIL: gcc.dg/pr97459-1.c execution test
FAIL: gcc.dg/pr97459-2.c execution test
FAIL: gcc.dg/pr97459-3.c execution test
FAIL: gcc.dg/pr97459-4.c execution test
FAIL: gcc.dg/pr97459-5.c execution test
FAIL: gcc.dg/pr97459-6.c execution test
...

[Bug target/105011] [nvptx] FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105011

--- Comment #2 from Tom de Vries  ---
Even better:
...
diff --git a/libatomic/tas_n.c b/libatomic/tas_n.c
index d0d8c283b495..65eaa7753a51 100644
--- a/libatomic/tas_n.c
+++ b/libatomic/tas_n.c
@@ -73,7 +73,7 @@ SIZE(libat_test_and_set) (UTYPE *mptr, int smodel)
 __ATOMIC_RELAXED, __ATOMIC_RELAXED));

   post_barrier (smodel);
-  return woldval != 0;
+  return (woldval & wval) == wval;
 }

 #define DONE 1
...

That also gives back accurate results in case
TARGET_ATOMIC_TEST_AND_SET_TRUEVAL has more than one bit set.

[Bug target/105011] [nvptx] FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105011

--- Comment #1 from Tom de Vries  ---
(In reply to Tom de Vries from comment #0)
> It should probably do something like:
> ...
>   return (woldval & wval) != 0;
> ...

Indeed, that fixes the FAILs.

[Bug target/105011] New: [nvptx] FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105011

Bug ID: 105011
   Summary: [nvptx] FAIL: gcc.dg/atomic/stdatomic-flag-2.c   -O1
execution test
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

On a quadro k2000 with driver 470.103.01 I'm running into this cluster of
FAILs:
...
FAIL: gcc.dg/atomic/stdatomic-flag-2.c   -O1  execution test
FAIL: gcc.dg/atomic/stdatomic-flag-2.c   -O3 -g  execution test
FAIL: gcc.dg/atomic/stdatomic-flag.c   -O3 -g  execution test
...

Minimizing the first FAIL, I end up with:
...
#include 

extern void abort (void);
atomic_flag a = ATOMIC_FLAG_INIT;

int
main ()
{
  int b;

  if ((atomic_flag_test_and_set) ())
abort ();

  return 0;
}
...

The atomic access is done by libatomic, using a 64-bit cas loop.

If we print the address of a, we have:
...
: 000700700200
...
so the pointer is already 64-bit aligned.

If we print the 64-bit value of *(unsigned long long *) before and after the
test-and-set, we have:
...
a: 00024f00
a: 00024f01
...
so that looks all-right as well.

At first glance, the problem is in libatomic, tas_n.c:
...
  wval = (UWORD)__GCC_ATOMIC_TEST_AND_SET_TRUEVAL << shift;
  woldval = __atomic_load_n (wptr, __ATOMIC_RELAXED);
  do
{
  t = woldval | wval;
}
  while (!atomic_compare_exchange_w (wptr, , t, true,
 __ATOMIC_RELAXED, __ATOMIC_RELAXED));

  post_barrier (smodel);
  return woldval != 0;
...

What is returned is woldval != 0, but that tests the entire word, not just the
byte we're interested in.

It should probably do something like:
...
  return (woldval & wval) != 0;
...

[Bug middle-end/105001] If executing with non-nvptx offloading, but nvptx offloading compilation is enabled: FAIL: libgomp.c/pr104783.c execution test

2022-03-21 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105001

--- Comment #1 from Tom de Vries  ---
Interesting.

Can you compare dump files to see where the difference comes from?

[Bug target/104936] [nvptx] Handle weak decl/def distinction in common code

2022-03-21 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104936

Tom de Vries  changed:

   What|Removed |Added

   Severity|normal  |enhancement
   Keywords||internal-improvement

[Bug target/104991] New: [nvptx] Simplify muniform-simt transformation

2022-03-20 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104991

Bug ID: 104991
   Summary: [nvptx] Simplify muniform-simt transformation
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

The muniform-simt reorg pass transforms the insn stream, both inside and
outside an SIMT region.

The transform rewrites atomic insns by adding a predicate to execute in one
thread only outside the SIMT region.  Furthermore, if the atomic insn has a
result, a shuffle is added to propagate the result to all threads in the warp.

Inside the SIMT region, the predicate evaluates to true such that all threads
in the warp execute it.

And the source lane register for the shuffle is set such that the shuffle is a
nop inside the SIMT region.

However, since we've started using shfl.sync for the shuffle, the shuffle now
has it's own predicate, and consequently having a source lane register with
different values inside and outside the SIMT region is no longer necessary.

[Bug target/104968] [nvptx][OpenMP] SIGSEGV / ICE in final_scan_insn_1

2022-03-18 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104968

Tom de Vries  changed:

   What|Removed |Added

 Resolution|--- |FIXED
   Target Milestone|--- |12.0
 Status|UNCONFIRMED |RESOLVED

--- Comment #7 from Tom de Vries  ---
Fixed by
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=356e2720e9030927579024c2f060d665a0b9080f
.

[Bug target/104952] [nvptx][OpenMP] wrong code with OR / AND reduction ('reduction(||:' and '&&') with SIMT

2022-03-18 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104952

Tom de Vries  changed:

   What|Removed |Added

 Resolution|--- |FIXED
   Target Milestone|--- |12.0
 Status|UNCONFIRMED |RESOLVED

--- Comment #12 from Tom de Vries  ---
Fixed by "[openmp] Fix SIMT reduction using TRUTH_{AND,OR}IF_EXPR".

[Bug target/104968] [nvptx][OpenMP] SIGSEGV / ICE in final_scan_insn_1

2022-03-18 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104968

--- Comment #6 from Tom de Vries  ---
(In reply to Tom de Vries from comment #5)
> This patch fixes the ICE at openmp level:
> ...
> diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
> index 139a0de6100..19af384c634 100644
> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -13361,6 +13361,7 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
>g = gimple_build_bind (NULL_TREE, gfor, NULL_TREE);
>g = gimple_build_omp_task (g, task_clauses, NULL_TREE, NULL_TREE,
>  NULL_TREE, NULL_TREE, NULL_TREE);
> +  gimple_set_location (g, EXPR_LOCATION (*expr_p));
>gimple_omp_task_set_taskloop_p (g, true);
>g = gimple_build_bind (NULL_TREE, g, NULL_TREE);
>gomp_for *gforo
> ...

Submitted a more complete patch here (
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591954.html ).

[Bug target/104957] [nvptx] Use .alias directive (available starting ptx isa version 6.3)

2022-03-18 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104957

Tom de Vries  changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug target/104968] [nvptx][OpenMP] SIGSEGV / ICE in final_scan_insn_1

2022-03-17 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104968

Tom de Vries  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #5 from Tom de Vries  ---
This patch fixes the ICE at openmp level:
...
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 139a0de6100..19af384c634 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -13361,6 +13361,7 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
   g = gimple_build_bind (NULL_TREE, gfor, NULL_TREE);
   g = gimple_build_omp_task (g, task_clauses, NULL_TREE, NULL_TREE,
 NULL_TREE, NULL_TREE, NULL_TREE);
+  gimple_set_location (g, EXPR_LOCATION (*expr_p));
   gimple_omp_task_set_taskloop_p (g, true);
   g = gimple_build_bind (NULL_TREE, g, NULL_TREE);
   gomp_for *gforo
...

[Bug target/104968] [nvptx][OpenMP] SIGSEGV / ICE in final_scan_insn_1

2022-03-17 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104968

--- Comment #4 from Tom de Vries  ---
This ( https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591912.html )
proposed patch fixes this ICE, pinged again.

[Bug target/104968] [nvptx][OpenMP] SIGSEGV / ICE in final_scan_insn_1

2022-03-17 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104968

--- Comment #3 from Tom de Vries  ---
(In reply to Tom de Vries from comment #2)
> (In reply to Tom de Vries from comment #1)
> > Can't reproduce.
> > 
> > It this not fixed by:
> > ...
> > commit 7862f6ccd85a001e4d70abb00bb95d8c7846ba80
> > Author: Tom de Vries 
> > Date:   Wed Feb 23 09:33:33 2022 +0100
> > 
> > [nvptx] Fix dummy location in gen_comment
> > ...
> > ?
> 
> Hmm, wait, of course I have a patch in my stack that's pending for upstream.
> Let me undo that one and retry.

Ack, reproduced.

[Bug target/104968] [nvptx][OpenMP] SIGSEGV / ICE in final_scan_insn_1

2022-03-17 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104968

--- Comment #2 from Tom de Vries  ---
(In reply to Tom de Vries from comment #1)
> Can't reproduce.
> 
> It this not fixed by:
> ...
> commit 7862f6ccd85a001e4d70abb00bb95d8c7846ba80
> Author: Tom de Vries 
> Date:   Wed Feb 23 09:33:33 2022 +0100
> 
> [nvptx] Fix dummy location in gen_comment
> ...
> ?

Hmm, wait, of course I have a patch in my stack that's pending for upstream.
Let me undo that one and retry.

[Bug target/104968] [nvptx][OpenMP] SIGSEGV / ICE in final_scan_insn_1

2022-03-17 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104968

--- Comment #1 from Tom de Vries  ---
Can't reproduce.

It this not fixed by:
...
commit 7862f6ccd85a001e4d70abb00bb95d8c7846ba80
Author: Tom de Vries 
Date:   Wed Feb 23 09:33:33 2022 +0100

[nvptx] Fix dummy location in gen_comment
...
?

[Bug target/104952] [nvptx][OpenMP] wrong code with OR / AND reduction ('reduction(||:' and '&&') with SIMT

2022-03-17 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104952

--- Comment #9 from Tom de Vries  ---
Created attachment 52647
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52647=edit
Tentative patch with test-cases, rationale and changelog

I'll put this through testing, and submit if no problems found.

[Bug target/104952] [nvptx][OpenMP] wrong code with OR / AND reduction ('reduction(||:' and '&&') with SIMT

2022-03-17 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104952

--- Comment #8 from Tom de Vries  ---
(In reply to Jakub Jelinek from comment #6)
> And yes, #c1 is valid.

Thanks for confirming.

> But would be nice to have similar test with && and
> initial result = 2; and arr[] say { 1, 2, 3, 4, 5, 6, 7, ..., 32 } and test
> result is 1 at the end to make sure we don't actually do just
> orig = orig & (private != 0)
> style merging or even just
> orig = orig & private;

Ack, will add that.

[Bug target/104952] [nvptx][OpenMP] wrong code with OR / AND reduction ('reduction(||:' and '&&') with SIMT

2022-03-17 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104952

--- Comment #7 from Tom de Vries  ---
Alternative fix that doesn't require fiddling with the 'code' var:
...
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index d932d74cb03..d0ddd4a6142 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -6734,7 +6734,10 @@ lower_rec_input_clauses (tree clauses, gimple_seq
*ilist, gimpl
e_seq *dlist,
  x = build_call_expr_internal_loc
(UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY,
 TREE_TYPE (ivar), 2, ivar, simt_lane);
- x = build2 (code, TREE_TYPE (ivar), ivar, x);
+ /* Make sure x is evaluated unconditionally.  */
+ tree bfly_var = create_tmp_var (TREE_TYPE (ivar));
+ gimplify_assign (bfly_var, x, [2]);
+ x = build2 (code, TREE_TYPE (ivar), ivar, bfly_var);
  gimplify_assign (ivar, x, [2]);
}
  tree ivar2 = ivar;
...

[Bug target/104952] [nvptx][OpenMP] wrong code with OR / AND reduction ('reduction(||:' and '&&') with SIMT

2022-03-17 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104952

--- Comment #4 from Tom de Vries  ---
This fixes it:
...
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index d932d74cb03..f2ac8f98e32 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -6734,7 +6734,21 @@ lower_rec_input_clauses (tree clauses, gimple_seq
*ilist, gimpl
e_seq *dlist,
  x = build_call_expr_internal_loc
(UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY,
 TREE_TYPE (ivar), 2, ivar, simt_lane);
- x = build2 (code, TREE_TYPE (ivar), ivar, x);
+ /* Make sure x is evaluated unconditionally.  */
+ enum tree_code update_code;
+ switch (OMP_CLAUSE_REDUCTION_CODE (c))
+   {
+   case TRUTH_ANDIF_EXPR:
+ update_code = TRUTH_AND_EXPR;
+ break;
+   case TRUTH_ORIF_EXPR:
+ update_code = TRUTH_OR_EXPR;
+ break;
+   default:
+ update_code = code;
+ break;
+   }
+ x = build2 (update_code, TREE_TYPE (ivar), ivar, x);
  gimplify_assign (ivar, x, [2]);
}
  tree ivar2 = ivar;
...

[Bug target/104952] [nvptx][OpenMP] wrong code with OR / AND reduction ('reduction(||:' and '&&') with SIMT

2022-03-17 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104952

--- Comment #3 from Tom de Vries  ---
Hmm, that seems to be actually due to:
...
  if (sctx.is_simt)
{
  if (!simt_lane)
simt_lane = create_tmp_var (unsigned_type_node);
  x = build_call_expr_internal_loc
(UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY,
 TREE_TYPE (ivar), 2, ivar, simt_lane);
  x = build2 (code, TREE_TYPE (ivar), ivar, x);
  gimplify_assign (ivar, x, [2]);
}
...
which gimplifies assigning:
...
(gdb) call debug_generic_expr (x)
D.2163 || .GOMP_SIMT_XCHG_BFLY (D.2163, D.2164)
...
to:
...
(gdb) call debug_generic_expr (ivar)
D.2163
...

[Bug target/104952] [nvptx][OpenMP] wrong code with OR / AND reduction ('reduction(||:' and '&&') with SIMT

2022-03-17 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104952

--- Comment #2 from Tom de Vries  ---
I think the problem can be seen already at omp-lower, in the body of the
butterfly loop.

Let's first look at what we have if we use reduction op '|':
...
D.2173 = .GOMP_SIMT_VF ();
D.2164 = 1;
D.2161 = 0;
goto ;
:
D.2165 = D.2163;
D.2165 = D.2163;
D.2166 = .GOMP_SIMT_XCHG_BFLY (D.2165, D.2164);
D.2167 = D.2165 | D.2166;
D.2163 = D.2167;
D.2164 = D.2164 << 1;
:
if (D.2164 < D.2173) goto ; else goto ;
:
...
Fairly straightforward, we have a loop, runs a couple of times, first a shuffle
(GOMP_SIMT_XCHG_BFLY), then an update (D.2167 = D.2165 | D.2166).

Now compare that with reduction op '||':
...
D.2183 = .GOMP_SIMT_VF ();
D.2164 = 1;
D.2161 = 0;
goto ;
:
D.2169 = D.2163;
D.2170 = (_Bool) D.2169;
if (D.2170 != 0) goto ; else goto ;
:
D.2169 = D.2163;
D.2172 = .GOMP_SIMT_XCHG_BFLY (D.2169, D.2164);
D.2173 = (_Bool) D.2172;
if (D.2173 != 0) goto ; else goto ;
:
iftmp.5 = 1;
goto ;
:
iftmp.5 = 0;
:
D.2163 = iftmp.5;
D.2164 = D.2164 << 1;
:
if (D.2164 < D.2183) goto ; else goto ;
:
...

The shuffle is now conditional.  I think the shuffle is inserted too late, in
the middle of the update rather than before.

[Bug target/104952] [nvptx][OpenMP] wrong code with OR / AND reduction ('reduction(||:' and '&&') with SIMT

2022-03-17 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104952

Tom de Vries  changed:

   What|Removed |Added

   Keywords||openmp

--- Comment #1 from Tom de Vries  ---
I can reproduce the problem.

I've made the simd explicit (I hope that's still valid openmp code):
...
$ cat libgomp/testsuite/libgomp.c/test.c
#define N 32

static char arr[N];

int
main (void)
{
  unsigned int result = 0;

  for (unsigned int i = 0; i < N; ++i)
arr[i] = 0;
  arr[5] = 1;

#pragma omp target map(tofrom:result) map(to:arr)
#pragma omp simd reduction(||: result)
  for (unsigned int i = 0; i < N; ++i)
result = result || arr[i];

  if (result != 1)
return 1;

  return 0;
}
...

Easy workaround:
...
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index d932d74cb03..bf6845d654e 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -4641,6 +4641,15 @@ lower_rec_simd_input_clauses (tree new_var, omp_context
*ctx,
  sctx->max_vf = 1;
  break;
  }
+
+ if (OMP_CLAUSE_REDUCTION_CODE (c) == TRUTH_ANDIF_EXPR
+ || OMP_CLAUSE_REDUCTION_CODE (c) == TRUTH_ORIF_EXPR)
+   {
+ sctx->max_vf = 1;
+ break;
+   }
}
}
   if (maybe_gt (sctx->max_vf, 1U))
...

[Bug target/104957] [nvptx] Use .alias directive (available starting ptx isa version 6.3)

2022-03-16 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104957

--- Comment #3 from Tom de Vries  ---
The OvO testsuite, when run at -O2 passes, because it inlines all .alias
instances.

But at -O0, it doesn't.  With -foffload=-malias that's fixed.

[Bug target/104957] [nvptx] Use .alias directive (available starting ptx isa version 6.3)

2022-03-16 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104957

--- Comment #2 from Tom de Vries  ---
So, what do we get after specifying -malias -mptx=6.3?

Alias attribute only for functions, not variables.

No support for weak alias (allowing this does compile, but we run into
execution fails in gcc.dg/globalalias.c and gcc.dg/pr77587.c).

No support for aliases of weak functions.  We can't detect this in the
compiler, so we'll run into linker error "error: Function test with .weak scope
cannot be aliased".

[Bug target/104957] [nvptx] Use .alias directive (available starting ptx isa version 6.3)

2022-03-16 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104957

--- Comment #1 from Tom de Vries  ---
Created attachment 52636
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52636=edit
Tentative patch

Patch that I'm currently working on.

Adds -malias, off by default.

It's off by default because when doing a build with libgomp and malias on by
default, libgomp uses .alias a few times, and that ends up in a linker error
"Internal error: reference to deleted section" with OvO test-cases (haven't
tried others)

This may be a driver error, or incorrect usage of the .alias directive.  The
answer might be found by playing around with .alias in cuda examples.

Things I tried manually in the ptx were:
- resolving the .alias: this worked
- enforcing order: function definition, alias declaration, alias definition
  to precisely match example in ptx manual: didn't work.

[Bug target/104957] New: [nvptx] Use .alias directive (available starting ptx isa version 6.3)

2022-03-16 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104957

Bug ID: 104957
   Summary: [nvptx] Use .alias directive (available starting ptx
isa version 6.3)
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

[ There is a number of nvptx PRs open about alias support.  The focus of this
PR is $subject, rather than supporting some specific source construct. ]

So, we have an .alias directive in ptx, can we use it for something?

Ideally, we'd use the .alias directive for all our needs, but it's too limited
for that.

OTOH, currently we just error out on any .alias usage in the source code, so we
could try to add an implementation that only errors out for things it doesn't
support.

[Bug target/97106] [nvptx] Issues with weak aliases introduced by C++

2022-03-15 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97106

--- Comment #4 from Tom de Vries  ---
This:
...
$ cat alias.c
void __f ()
{
  __builtin_printf ("hello\n");
}

void f () __attribute__ ((alias ("__f")));

int
main (void)
{
  f ();
  return 0;
}
...
works fine at -O0 and -O1:
...
$ ./gcc.sh -O0 ./alias.c
$ ./install/bin/nvptx-none-run a.out
hello
$ ./gcc.sh -O1 ./alias.c
$ ./install/bin/nvptx-none-run a.out
hello
...
but at -O2 we have:
...
$ ./gcc.sh -O2 ./alias.c
$ ./install/bin/nvptx-none-run a.out
fatal   : Internal error: reference to deleted section
nvptx-run: cuLinkComplete failed: unknown error (CUDA_ERROR_UNKNOWN, 999)
...

This seems to be due to f/__f being inlined into main, after which we have an
alias declaration which is unused:
...
.visible .func f;
.alias f,__f;
...
Removing these two lines make the executable run fine again.

Note: same thing when using nvptx-none-run -O0.

Fixed by:
...
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index ab1f62359d4b..3e51bf15776c 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -77,6 +77,7 @@
 #include "opts.h"
 #include "tree-pretty-print.h"
 #include "rtl-iter.h"
+#include "cgraph.h"

 /* This file should be included last.  */
 #include "target-def.h"
@@ -7396,6 +7397,10 @@ nvptx_mem_local_p (rtx mem)
 void
 nvptx_asm_output_def_from_decls (FILE *stream, tree name, tree value)
 {
+  if (!cgraph_node::get (name)->referred_to_p ())
+/* Prevent "Internal error: reference to deleted section".  */
+return;
+
   std::stringstream s;
   write_fn_proto (s, false, get_fnname_from_decl (name), name);
   fputs (s.str().c_str(), stream);
...

[Bug target/104936] New: [nvptx] Handle weak decl/def distinction in common code

2022-03-15 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104936

Bug ID: 104936
   Summary: [nvptx] Handle weak decl/def distinction in common
code
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

At docs for ASM_WEAKEN_LABEL (stream, name) we find:
...
If you don’t define this macro or ASM_WEAKEN_DECL, GCC will not support weak
symbols and you should not define the SUPPORTS_WEAK macro. 
...

However, we have:
...
$ grep define.*WEAKEN gcc/config/nvptx/*
$ 
...
but still:
...
$ grep SUPPORTS_WEAK gcc/config/nvptx/*
gcc/config/nvptx/nvptx.h:#define SUPPORTS_WEAK 1
...

I think an argument for the discrepancy is made here:
...
  /*  We support weak defintions, and hence have the right  
  ASM_WEAKEN_DECL definition.  Diagnose the problem here.  */
  if (DECL_WEAK (decl))
error_at (DECL_SOURCE_LOCATION (decl),
  "PTX does not support weak declarations"
  " (only weak definitions)");
...
where it's my understanding that the "right ASM_WEAKEN_DECL" is meant to refer
to ASM_WEAKEN_DECL not being defined.

It would be nice to solve this somehow in the common code instead of deviating
from prescribed target macro usage.

[Bug target/104768] [nvptx] Exploit Independent Thread Scheduling for sm_70+

2022-03-15 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104768

--- Comment #1 from Tom de Vries  ---
Hmm, reading about it a bit more, it's more about enabling algorithms that were
not possible before, than about performance improvements.

So, we should aim at having test-cases, both openacc and openmp that hang on
previous architectures but pass with sm_70+.

  1   2   3   4   5   6   7   8   9   10   >