[Bug c++/113599] [14 Regression] Wrong computation of member offset through pointer-to-member since r14-5503
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113599 Tom de Vries changed: What|Removed |Added CC||vries at gcc dot gnu.org --- Comment #2 from Tom de Vries --- FWIW, the inherit order is relevant, after applying this change we get the expected result: ... -struct thread_info : public dummy, public intrusive_list_node { +struct thread_info : public intrusive_list_node, public dummy { ... This could be used as workaround.
[Bug debug/112565] Abnormal Jump in Execution using 'stepi' Command in GDB under O2 optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112565 --- Comment #1 from Tom de Vries --- (In reply to Anonymous from comment #0) > Tom de Vries suggests that this issue may be attributed to a GCC > optimization bug. I do not.
[Bug sanitizer/110799] [tsan] False positive due to -fhoist-adjacent-loads
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799 --- Comment #7 from Tom de Vries --- (In reply to Alexander Monakov from comment #5) > This trips Valgrind's data race detector (valgrind --tool=helgrind) too. So > I don't think checking SANITIZE_THREAD is the correct approach. Can you elaborate on what you consider a correct approach?
[Bug sanitizer/110799] [tsan] False positive due to -fhoist-adjacent-loads
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799 --- Comment #6 from Tom de Vries --- (In reply to rguent...@suse.de from comment #4) > I'm suggesting to not fix it ;) Can you explain why ? It doesn't look difficult to fix to me, and I don't see any downsides. > That said, is TSAN a useful vehicle? Well, false positives aside, yes.
[Bug sanitizer/110799] [tsan] False positive due to -fhoist-adjacent-loads
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799 --- Comment #2 from Tom de Vries --- (In reply to Richard Biener from comment #1) > We consider introducing load data races OK, what's the difference here? This is a load vs. store data race. > There are other passes that would do similar things but in practice the > loads would be considered to possibly trap so the real-world impact might be > limited? If you're suggesting to fix this in a more generic way, I'm all for it.
[Bug sanitizer/110799] New: [tsan] False positive due to -fhoist-adjacent-loads
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799 Bug ID: 110799 Summary: [tsan] False positive due to -fhoist-adjacent-loads Product: gcc Version: 13.1.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: sanitizer Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org, jakub at gcc dot gnu.org, kcc at gcc dot gnu.org, marxin at gcc dot gnu.org Target Milestone: --- I build gdb with -O2 -fsanitizer=thread, and ran into a false positive due to -fhoist-adjacent-loads. Minimal example: ... $ cat race.c #include #include int c; struct s { int a; int b; }; struct s v1; int v3; void * thread1 (void *x) { v1.a = 1; return NULL; } void * thread2 (void *x) { v3 = c ? v1.a : v1.b; return NULL; } int main (void) { pthread_t t[2]; pthread_create([0], NULL, thread1, NULL); pthread_create([1], NULL, thread2, NULL); pthread_join(t[0], NULL); pthread_join(t[1], NULL); return 0; } ... With O0, runs fine: ... $ gcc race.c -fsanitize=thread -g $ ./a.out $ ... With O2, a race is reported: ... $ gcc race.c -fsanitize=thread -g -O2 $ ./a.out == WARNING: ThreadSanitizer: data race (pid=24538) Read of size 4 at 0x00404060 by thread T2: #0 thread2 /data/vries/gdb/race.c:26 (a.out+0x401299) (BuildId: 295673549b1e99c73c70a2a8d26944f177f88c15) #1 (libtsan.so.2+0x3c329) (BuildId: 8f2a9be581a0fcb3d7109755a6067408093b9dbd) Previous write of size 4 at 0x00404060 by thread T1: #0 thread1 /data/vries/gdb/race.c:19 (a.out+0x401257) (BuildId: 295673549b1e99c73c70a2a8d26944f177f88c15) #1 (libtsan.so.2+0x3c329) (BuildId: 8f2a9be581a0fcb3d7109755a6067408093b9dbd) ... With -fno-hoist-adjacent-loads, it's fine again: ... $ gcc race.c -fsanitize=thread -g -O2 -fno-hoist-adjacent-loads $ ./a.out $ ... The optimization transforms these loads: ... v3 = c ? v1.a : v1.b; ... into: ... int tmp_a = v1.a; int tmp_b = v1.b; v3 = c ? tmp_a : tmp_b ... which introduces the false positive. So I wonder if there should be a change like this: ... static bool gate_hoist_loads (void) { return (flag_hoist_adjacent_loads == 1 + && (flag_sanitize & SANITIZE_THREAD) == 0 && param_l1_cache_line_size && HAVE_conditional_move); } ...
[Bug c/109708] New: [c, doc] wdangling-pointer example broken
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109708 Bug ID: 109708 Summary: [c, doc] wdangling-pointer example broken Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- I ran into a Wdangling-pointer warning and decided to read the docs and try out an example. The first one listed is: ... int f (int c1, int c2, x) { char *p = strchr ((char[]){ c1, c2 }, c3); // warning: dangling pointer to a compound literal return p ? *p : 'x'; } ... It's not a complete example, x is missing a declared type and c3 is undeclared. After fixing that (and adding the implicit "#include "), we have an example that compiles: ... #include int f (int c1, int c2, int c3) { char *p = strchr ((char[]){ c1, c2 }, c3); return p ? *p : 'x'; } ... but no warning, not at O0, O1, O2 or O3: ... $ gcc test.c -Wdangling-pointer=1 -c $ ... After reading the description of the warning, I managed to come up with: ... char f (char c1, char c2) { char *p; { p = (char[]) { c1, c2 }; } return *p; } ... which does manage to trigger the warning for O0-O3: ... $ gcc test.c -Wdangling-pointer=1 -c test.c: In function ‘f’: test.c:10:10: warning: using dangling pointer ‘p’ to an unnamed temporary [-Wdangling-pointer=] 10 | return *p; | ^~ test.c:7:18: note: unnamed temporary defined here 7 | p = (char[]) { c1, c2 }; | ^ $ ... It might be worth mentioning that it's a C example, when using g++ we have: ... $ g++ test.c -Wdangling-pointer=1 -c test.c: In function ‘char f(char, char)’: test.c:7:18: error: taking address of temporary array 7 | p = (char[]) { c1, c2 }; | ^~ ... BTW, note that the warning can be fixed by doing: ... char f (char c1, char c2) { char *p; + char c; { p = (char[]) { c1, c2 }; + c = *p; } - return *p; + return c; } ...
[Bug debug/108600] Use DW_LNS_set_prologue_end
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108600 --- Comment #4 from Tom de Vries --- (In reply to Tom de Vries from comment #0) > Note that for for instance gdb test-case gdb.ada/ref_param.exp, this > convention was broken for gcc 7.5.0 (and I don't know how much earlier), and > my current guess is that it got fixed in gcc 11.1.0, by commit c029fcb5680 > ("Reset force_source_line in final.c"). Confirmed, see PR108615.
[Bug debug/108615] Incorrect prologue marker in line table
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108615 Tom de Vries changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED Target Milestone|--- |11.0 Keywords||wrong-debug --- Comment #1 from Tom de Vries --- Closing with milestone gcc 11.0.
[Bug debug/108615] New: Incorrect prologue marker in line table
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108615 Bug ID: 108615 Summary: Incorrect prologue marker in line table Product: gcc Version: 10.4.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: debug Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- [ Filing FTR, to link a commit to a test-case that it fixes, and to be able to refer to it from gdb sources. ] Consider pck.adb/pck.ads from here ( https://sourceware.org/git/?p=binutils-gdb.git;a=tree;f=gdb/testsuite/gdb.ada/ref_param;h=823556d8928bd1de865e71400092e6a44b2773cd;hb=HEAD ). Compile like so, with system gcc 7.5.0: ... $ gcc pck.adb -S -g ... In the .s file we find: ... pck__call_me: .LFB3: .file 1 "pck.adb" .loc 1 18 0 .cfi_startproc .loc 1 18 0 pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset 6, -16 movq%rsp, %rbp .cfi_def_cfa_register 6 movq%rdi, -8(%rbp) .loc 1 20 0 ... There's a convention that the second entry in the line table indicates the end of the prologue. So the second .loc indicates that the prologue ends there, while it ends only after the third insn, at line 20. Same with 10.4.0. Fixed by commit c029fcb5680 ("Reset force_source_line in final.c"), first available in release 11.1.0. With 11.3.0 we have: ... pck__call_me: .LFB3: .file 1 "pck.adb" .loc 1 18 4 .cfi_startproc pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset 6, -16 movq%rsp, %rbp .cfi_def_cfa_register 6 movq%rdi, -8(%rbp) .loc 1 20 16 ...
[Bug debug/108600] Use DW_LNS_set_prologue_end
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108600 --- Comment #3 from Tom de Vries --- (In reply to Tom de Vries from comment #2) > (In reply to Tom de Vries from comment #1) > > Created attachment 54371 [details] > > We probably don't want to emit in all cases, maybe limiting to > "dwarf_version >= 3", or "!dwarf_strict || dwarf_version >= 3". Let's see about dwarf_strict. Semantics: ... -gstrict-dwarf Disallow using extensions of later DWARF standard version than selected with -gdwarf-version. On most targets using non-conflicting DWARF extensions from later standard versions is allowed. ... For the -gas-loc-support case (gcc emitting .locs), even when passing -gdwarf-2, gas emits a v3 version .debug_line section (since binutils-2_32). And even if we'd fix that (I've filed https://sourceware.org/bugzilla/show_bug.cgi?id=30064), the way gas works is by bumping the dwarf version when encountering a feature that requires a higher version, so using end_prologue in a loc directive would then end up bumping dwarf_level to 3, bumping also the .debug_line version. For the -gno-as-loc-support case (gcc emitting .debug_line contribution), for -gdwarf-2 gcc indeed emits a v2 .debug_line section. But, that makes DW_LNS_set_prologue_end fall in the range of vendor specific extensions, and we can't conflict with that. Taking this all into account, I think it's better not to emit DW_LNS_set_prologue_end for -gdwarf-2 -gno-strict-dwarf.
[Bug debug/47471] [10/11/12/13 Regression] stdarg functions extraneous too-early prologue end
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47471 Tom de Vries changed: What|Removed |Added CC||vries at gcc dot gnu.org --- Comment #24 from Tom de Vries --- I can reproduce this with gcc 4.8.5: ... f: .LFB0: .file 1 "test.c" .loc 1 3 0 .cfi_startproc pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset 6, -16 movq%rsp, %rbp .cfi_def_cfa_register 6 subq$60, %rsp movq%rsi, -168(%rbp) movq%rdx, -160(%rbp) movq%rcx, -152(%rbp) movq%r8, -144(%rbp) movq%r9, -136(%rbp) testb %al, %al je .L2 .loc 1 3 0 movaps %xmm0, -128(%rbp) movaps %xmm1, -112(%rbp) movaps %xmm2, -96(%rbp) movaps %xmm3, -80(%rbp) movaps %xmm4, -64(%rbp) movaps %xmm5, -48(%rbp) movaps %xmm6, -32(%rbp) movaps %xmm7, -16(%rbp) .L2: movl%edi, -180(%rbp) .loc 1 4 0 movlv(%rip), %eax ... But with gcc 7.5.0, I get: ... f: .LFB0: .file 1 "test.c" .loc 1 3 0 .cfi_startproc pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset 6, -16 movq%rsp, %rbp .cfi_def_cfa_register 6 subq$72, %rsp movl%edi, -180(%rbp) movq%rsi, -168(%rbp) movq%rdx, -160(%rbp) movq%rcx, -152(%rbp) movq%r8, -144(%rbp) movq%r9, -136(%rbp) testb %al, %al je .L3 movaps %xmm0, -128(%rbp) movaps %xmm1, -112(%rbp) movaps %xmm2, -96(%rbp) movaps %xmm3, -80(%rbp) movaps %xmm4, -64(%rbp) movaps %xmm5, -48(%rbp) movaps %xmm6, -32(%rbp) movaps %xmm7, -16(%rbp) .L3: .loc 1 4 0 movlv(%rip), %eax addl$1, %eax movl%eax, v(%rip) ... So, isn't this fixed?
[Bug debug/108600] Use DW_LNS_set_prologue_end
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108600 --- Comment #2 from Tom de Vries --- (In reply to Tom de Vries from comment #1) > Created attachment 54371 [details] We probably don't want to emit in all cases, maybe limiting to "dwarf_version >= 3", or "!dwarf_strict || dwarf_version >= 3".
[Bug debug/108600] Use DW_LNS_set_prologue_end
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108600 --- Comment #1 from Tom de Vries --- Created attachment 54371 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54371=edit tentative patch Tentative patch. For hello.c, for the -gas-loc-support case it gives us: ... $ gcc -g ~/hello.c -S -o- ... main: .LFB0: .file 1 "/home/vries/hello.c" .loc 1 5 1 .cfi_startproc pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset 6, -16 movq%rsp, %rbp .cfi_def_cfa_register 6 .loc 1 6 3 prologue_end movl$.LC0, %edi ... And for the -gno-as-loc-support case: ... $ gcc -g ~/hello.c -c -gno-as-loc-support $ llvm-dwarfdump --debug-line hello.o ... AddressLine Column File ISA Discriminator Flags -- -- -- -- --- - - 0x 5 0 1 0 0 is_stmt 0x0004 6 1 1 0 0 is_stmt prologue_end 0x000e 8 3 1 0 0 is_stmt 0x0013 9 10 1 0 0 is_stmt 0x0015 9 1 1 0 0 is_stmt end_sequence ...
[Bug debug/108600] New: Use DW_LNS_set_prologue_end
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108600 Bug ID: 108600 Summary: Use DW_LNS_set_prologue_end Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: debug Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- The prologue_end marker (introduced in dwarf v3) is currently not emitted by gcc. This is the case for -gno-as-loc-support (gcc emitting .debug_line contribution). And for -gas-loc-support (gcc emitting .loc directives). Note that there is an existing convention that marks the end of the prologue: the second entry in the line table marks the first insn after the prologue. However, that convention is not known by everyone, which makes it easier to break it without noticing. Note that for for instance gdb test-case gdb.ada/ref_param.exp, this convention was broken for gcc 7.5.0 (and I don't know how much earlier), and my current guess is that it got fixed in gcc 11.1.0, by commit c029fcb5680 ("Reset force_source_line in final.c"). The only way to make the end of the prologue visible in assembly as well as disassembly is to make it explicit by using DW_LNS_set_prologue_end.
[Bug libgomp/108098] OpenMP/nvptx reverse offload execution test FAILs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108098 --- Comment #1 from Tom de Vries --- (In reply to Thomas Schwinge from comment #0) > $ nvidia-smi > [...] > | NVIDIA-SMI 440.33.01Driver Version: 440.33.01CUDA Version: 10.2 > [...] > | 0 Tesla K80 [...] > [...] > | 1 Tesla K80 [...] > I'm not sure if it matters for triggering this problem, but if I look at this board at nvidia drivers download and select cuda 10.2 and production branch, I get : ... version:440.118.02 Release Date: 2020.9.30 ... Then using the "Beta and Older Drivers" I find the version you're using is: ... version: 440.33.01 Release date: November 19, 2019 ... Please always use the latest drivers when reporting a problem.
[Bug debug/107909] New: [powerpc64le, debug] Incorrect call site location due to nop after call insn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107909 Bug ID: 107909 Summary: [powerpc64le, debug] Incorrect call site location due to nop after call insn Product: gcc Version: 7.5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: debug Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- Consider test-case vla-optimized-out.c: ... int __attribute__((noinline,weak)) __attribute__((noclone)) f1 (int i) { char a[i + 1]; a[0] = 5; return a[0]; } int main (void) { volatile int j; int i = 5; asm volatile ("" : "=r" (i) : "0" (i)); j = f1 (i); return 0; } ... compiled with -O1 -g. We generate a nop after the bl insn: ... bl f1# 11 *call_value_nonlocal_aixdi [length = 8] nop .LVL4: ... and the label after the nop is a call site location: ... .uleb128 0x5 # (DIE (0x67) DW_TAG_GNU_call_site) .8byte .LVL4# DW_AT_low_pc .4byte 0x81 # DW_AT_abstract_origin ... Consequently we can't actually find the call site: ... $ gdb -q -batch ./a.out -ex "break f1" -ex run -ex "set debug entry-values 1" -ex "print sizeof (a)" Breakpoint 1 at 0x165c: file vla-optimized-out.c, line 8. Breakpoint 1, f1 (i=5) at vla-optimized-out.c:8 8 } DW_OP_entry_value resolving cannot find DW_TAG_call_site 0x1690 in main $1 = ... If we manually fix this in the .s file, we get a bit further: ... $ gdb -q -batch ./a.out -ex "break f1" -ex run -ex "set debug entry-values 1" -ex "print sizeof (a)" Breakpoint 1 at 0x165c: file vla-optimized-out.c, line 8. Breakpoint 1, f1 (i=5) at vla-optimized-out.c:8 8 } Cannot find matching parameter at DW_TAG_call_site 0x1690 at main $1 = ... The problem now is that the DW_AT_abstract_origin in the DW_TAG_GNU_call_site is not properly handled by gdb. I reproduced this with gcc 7.5.0, but after looking at the pattern for call_value_nonlocal_aixdi I think this should be reproducible with trunk.
[Bug other/87741] Don't build readline/libreadline.a in GDB, when --with-system-readline is supplied
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87741 Tom de Vries changed: What|Removed |Added CC||vries at gcc dot gnu.org Status|NEW |RESOLVED Target Milestone|--- |13.0 Resolution|--- |FIXED Component|bootstrap |other --- Comment #8 from Tom de Vries --- (In reply to Andrew Pinski from comment #7) > https://sourceware.org/git/?p=binutils-gdb.git;a=commit;f=configure.ac; > h=69961a84c9b3744a10248fb6cbccc3c688a1e0a5 > > It would be useful if both configures were synced up again https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=36ba985145ffa8e2078033fc1f1cf22851707a8e
[Bug target/99555] [OpenMP/nvptx] Execution-time hang for simple nested OpenMP 'target'/'parallel'/'task' constructs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99555 --- Comment #17 from Tom de Vries --- (In reply to Thomas Schwinge from comment #14) > > That's with a Nvidia Tesla K20c GPU, Driver Version: 346.46. > > As that version is "a bit old", I shall first update this, before we spend > > any further time on analyzing this. > > Cross-checking on another system with Nvidia Tesla K20c GPU but more recent > Driver Version I'm not seeing such an issue. > > On the "old" system, gradually upgrading Driver Version: 346.46 to 352.99, > 361.93.02, 375.88 (always the latest (?) version of the respective series), > these all did not resolve the problem. > > Only starting with 384.59 (that is, early version of the 384.X series), that > then did resolve the issue. That's still using the GCC/nvptx '-mptx=3.1' > multilib. > > (We couldn't with earlier series, but given this is 384.X, we may now also > cross-check with the default multilib, and that also was fine.) > > Now, I don't know if at all we would like to spend any more effort on this > issue, given that it only appears with rather old pre-384.X versions -- but > on the other hand, the GCC/nvptx '-mptx=3.1' multilib is meant to keep these > supported? (... which is why I'm running such testing; and certainly the > timeouts are annoying there.) > > It might be another issue with pre-384.X versions of the Nvidia PTX JIT, or > is there the slight possibility that GCC is generating/libgomp contains some > "weird" code that post-384.X version happen to "fix up" -- probably the > former rather than the latter? (Or, the chance of GPU hardware/firmware or > some other system weirdness -- unlikely, otherwise behaves totally fine?) > > I don't know where to find complete Nvidia Driver/JIT release notes, where > the 375.X -> 384.X notes might provide an idea of what got fixed, and we > might then add another 'WORKAROUND_PTXJIT_BUG' for that -- maybe simple, > maybe not. > > Any thoughts, Tom? I care about old cards, not about old drivers. The oldest card we support is an sm_30, and last driver series that supports that one is 470.x (and AFAIU, is therefore supported by nvidia for that arch). There's the legacy series, 390.x, which is the last to support fermi, but we don't support any fermi cards or earlier. I did do some testing with this one for later cards, but reported issues are acknowledged but not fixed by nvidia, so ... this is already out of scope for me. So yeah, IWBN to come up with workarounds for various older drivers, but I'm not investing time in that. Is there a problem for you to move to 470.x or later (515.x) ? Is there a card for which that causes problems ?
[Bug debug/105772] [debug, i386] sched2 moves get_pc_thunk call past debug_insn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105772 --- Comment #2 from Tom de Vries --- As background info, I'm proposing a patch for gdb to have the architecture-specific prologue skipper skip over the get_pc_thunk call: https://sourceware.org/pipermail/gdb-patches/2022-May/189563.html , which helps to skip over the prologue with -O0 -pie -fPIE code. But that causes a regression in test-case gdb/testsuite/gdb.base/break.exp, because of this PR.
[Bug debug/105772] New: [debug, i386] sched2 moves get_pc_thunk call past debug_insn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105772 Bug ID: 105772 Summary: [debug, i386] sched2 moves get_pc_thunk call past debug_insn Product: gcc Version: 12.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: debug Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- Consider the test-case source gdb/testsuite/gdb.base/break1.c ( https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/testsuite/gdb.base/break1.c;h=24d0d15dc2e8bd14f6b14fabbb38fec43db9c990;hb=HEAD ) containing function marker4. Extracting the relevant part: ... struct some_struct { int a_field; int b_field; union { int z_field; }; }; struct some_struct values[50]; void marker4 (long d) { values[0].a_field = d; }/* set breakpoint 14 here */ ... When compiling with gcc 12.1.0 and -O2 like so: ... $ gcc -fno-stack-protector -m32 -fPIE -pie -w -c -g break1.c -O2 ... we have before sched2: ... (note 4 1 14 2 [bb 2] NOTE_INSN_BASIC_BLOCK) (note 14 4 11 2 NOTE_INSN_PROLOGUE_END) (insn/f 11 14 3 2 (parallel [ (set (reg:SI 0 ax [82]) (unspec:SI [ (const_int 0 [0]) ] UNSPEC_SET_GOT)) (clobber (reg:CC 17 flags)) ]) 931 {*set_got} (expr_list:REG_UNUSED (reg:CC 17 flags) (expr_list:REG_EQUIV (unspec:SI [ (const_int 0 [0]) ] UNSPEC_SET_GOT) (expr_list:REG_CFA_FLUSH_QUEUE (nil) (nil) (note 3 11 6 2 NOTE_INSN_FUNCTION_BEG) (debug_insn 6 3 12 2 (debug_marker) "break1.c":59:25 -1 (nil)) (insn 12 6 8 2 (set (reg/v:SI 1 dx [orig:83 d ] [83]) (mem/c:SI (plus:SI (reg/f:SI 7 sp) (const_int 4 [0x4])) [5 d+0 S4 A32])) "break1.c":59:43 81 {*movsi_internal} (expr_list:REG_EQUIV (mem/c:SI (reg/f:SI 16 argp) [5 d+0 S4 A32]) (nil))) ... and after: ... (note 4 1 14 2 [bb 2] NOTE_INSN_BASIC_BLOCK) (note 14 4 3 2 NOTE_INSN_PROLOGUE_END) (note 3 14 6 2 NOTE_INSN_FUNCTION_BEG) (debug_insn 6 3 11 2 (debug_marker) "break1.c":59:25 -1 (nil)) (insn/f:TI 11 6 12 2 (parallel [ (set (reg:SI 0 ax [82]) (unspec:SI [ (const_int 0 [0]) ] UNSPEC_SET_GOT)) (clobber (reg:CC 17 flags)) ]) 931 {*set_got} (expr_list:REG_UNUSED (reg:CC 17 flags) (expr_list:REG_EQUIV (unspec:SI [ (const_int 0 [0]) ] UNSPEC_SET_GOT) (expr_list:REG_CFA_FLUSH_QUEUE (nil) (nil) (insn 12 11 8 2 (set (reg/v:SI 1 dx [orig:83 d ] [83]) (mem/c:SI (plus:SI (reg/f:SI 7 sp) (const_int 4 [0x4])) [5 d+0 S4 A32])) "break1.c":59:43 81 {*movsi_internal} (expr_list:REG_EQUIV (mem/c:SI (reg/f:SI 16 argp) [5 d+0 S4 A32]) (nil))) ... This moves the get_pc_thunk call after the debug_insn, making it (in terms of debug info) part of the first statement instead of the prologue. That is, with -O1 we have insn: ... 000d : d: e8 fc ff ff ff call e 12: 05 01 00 00 00 add$0x1,%eax 17: 8b 54 24 04 mov0x4(%esp),%edx 1b: 89 90 00 00 00 00 mov%edx,0x0(%eax) 21: c3 ret ... and line info: ... File nameLine numberStarting addressViewStmt break1.c 59 0xd x break1.c 59 0xd 1 break1.c 590x17 x break1.c 590x17 1 break1.c 590x21 break1.c -0x22 ... so at 0x17 we have the start of a statement. But with -O2 we have identical insn: ... 0030 : 30: e8 fc ff ff ff call 31 35: 05 01 00 00 00 add$0x1,%eax 3a: 8b 54 24 04 mov0x4(%esp),%edx 3e: 89 90 00 00 00 00 mov%edx,0x0(%eax) 44: c3 ret ... but different line info: ... File nameLine numberStarting addressViewStmt break1.c 590x30 x break1.c 590x30 1 x break1.c 590x3a break1.c 590x44 break1.c -0x45 ... so at 0x3a we don't have the start of a statement.
[Bug target/104893] [nvptx] Handle Independent Thread Scheduling for sm_70+ with -msoft-stack
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104893 Tom de Vries changed: What|Removed |Added Resolution|--- |WORKSFORME Status|UNCONFIRMED |RESOLVED --- Comment #2 from Tom de Vries --- (In reply to Tom de Vries from comment #1) > (In reply to Tom de Vries from comment #0) > > The per-thread call stack is handled for .local memory by the CUDA driver. > > > > For the 'soft stack' that's not the case. > > Hmm, actually there's .local memory used, just not "directly". Possibly the > documentation needs updating to point that out. > > So, there doesn't seem to be an issue related to overlapping storage. > > So I wonder, is the stack pointer also per thread then? Or still per-warp? OK, here ( https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203#c6 ) we read: ... The pointer is switched between per-warp global memory and per-lane local memory. ... So, I think this should be fine then. Marking this resolved-worksforme until we run into an actual failing test-case.
[Bug target/104857] [nvptx] Add macro specifying ptx isa version
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104857 Tom de Vries changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED Target Milestone|--- |12.0 --- Comment #3 from Tom de Vries --- Committed.
[Bug target/104714] [nvptx] Means to specify any sm_xx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104714 Tom de Vries changed: What|Removed |Added Target Milestone|--- |12.0 Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #3 from Tom de Vries --- Added march-map.
[Bug driver/105096] New: --target-help not an alias for --help=target
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105096 Bug ID: 105096 Summary: --target-help not an alias for --help=target Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: trivial Priority: P3 Component: driver Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- In common.opt, we read: ... -target-help Common Driver Alias for --help=target. ... But that doesn't seem to be correct, I get different results. For instance, for nvptx target, we have an malias that atm is undocumented, and we have: ... $ cc1 --target-help 2>&1 | grep malias $ ... and: ... $ cc1 --help=target 2>&1 | grep malias -malias [disabled] $...
[Bug c/53037] warn_if_not_aligned(X)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53037 Tom de Vries changed: What|Removed |Added CC||vries at gcc dot gnu.org --- Comment #43 from Tom de Vries --- *** Bug 81909 has been marked as a duplicate of this bug. ***
[Bug target/81909] Missing warning in gcc.dg/pr53037-{2,3}.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81909 Tom de Vries changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |DUPLICATE --- Comment #3 from Tom de Vries --- Marking resolved-duplicate. *** This bug has been marked as a duplicate of bug 53037 ***
[Bug target/81728] nvptx-run: error getting kernel result: the launch timed out and was terminated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81728 Tom de Vries changed: What|Removed |Added Resolution|--- |WORKSFORME Status|UNCONFIRMED |RESOLVED --- Comment #1 from Tom de Vries --- The error message looks like the one produced due to the 5 second watchdog that is installed when the board is running a display manager. Anyway, in order to run into this timeout, the test-case should hang, and it currently doesn't. At least not on any of the boards I've recently tested with, with the supported drivers. So, possibly this was fixed in gcc, or in a driver. There's not enough information in here to reproduce, so closing as resolved-worksforme.
[Bug target/104818] Duplicate word "version" in option -mptx description
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104818 Tom de Vries changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED Target Milestone|--- |12.0 --- Comment #2 from Tom de Vries --- Fixed in aforementioned commit.
[Bug libgomp/105042] [libgomp, GOMP_NVPTX_JIT=-O0] Openacc testsuite failures when X runs on nvidia driver
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105042 Tom de Vries changed: What|Removed |Added Severity|normal |enhancement --- Comment #8 from Tom de Vries --- With the conversation shifted to better error messages, re-classifying as enhancement.
[Bug target/105075] [nvptx] Generate sad insn (sum of absolute differences)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105075 --- Comment #6 from Tom de Vries --- Created attachment 52698 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52698=edit Demonstrator patch with stepping stone patterns for combine (In reply to Tom de Vries from comment #2) > Also, I wonder if defining a stepping-stone intermediate pattern could help > combine. Well, that approach seems to work.
[Bug target/105014] [nvptx] FAIL: gcc.dg/pr97459-1.c execution test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105014 --- Comment #6 from Tom de Vries --- Reproducer filed at https://github.com/vries/nvidia-bugs/tree/master/shift-and PR filed at nvidia ( https://developer.nvidia.com/nvidia_bug/3585290 ).
[Bug target/105075] [nvptx] Generate sad insn (sum of absolute differences)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105075 Tom de Vries changed: What|Removed |Added Attachment #52693|0 |1 is obsolete|| --- Comment #3 from Tom de Vries --- Created attachment 52694 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52694=edit Demonstrator patch v2 Forgot to add test-case.
[Bug target/105075] [nvptx] Generate sad insn (sum of absolute differences)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105075 --- Comment #2 from Tom de Vries --- AFAIU, at gimple level support is limited to vectors, so that doesn't help to generate the insn for the simple, scalar case. It would be nice if combine could generate it and we wouldn't have to use a peephole, but AFAIU the pattern is too complex for that. I wonder if reformulating using a conditional could help there (ptx isa describes semantics using "c + ((a
[Bug target/105075] [nvptx] Generate sad insn (sum of absolute differences)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105075 --- Comment #1 from Tom de Vries --- Created attachment 52693 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52693=edit Demonstrator patch This patch adds: - modeling of insn sad.u32 in the .md file - peephole2 to generate it (which is incomplete, it needs some safety-checks related to using unique intermediate regs) - extra instance of peephole2 pass (otherwise, the peephole is not triggered) - extra instance of fast_rtl_dce pass (otherwise, unused intermediate insn are not cleaned up) So for the usad_2 in the test-case, we have without the patch: ... cvt.u64.u32 %r32, %r28; cvt.u64.u32 %r33, %r29; sub.u64 %r34, %r32, %r33; abs.s64 %r35, %r34; cvt.u32.u64 %r36, %r35; add.u32 %value, %r36, %r30; ... and with: ... sad.u32 %value, %r28, %r29, %r30; ...
[Bug target/105075] New: [nvptx] Generate sad insn (sum of absolute differences)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105075 Bug ID: 105075 Summary: [nvptx] Generate sad insn (sum of absolute differences) Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- ptx has sad ((sum of absolute differences)) insn, which is currently not modeled in the .md file.
[Bug libgomp/105042] [libgomp, GOMP_NVPTX_JIT=-O0] Openacc testsuite failures when X runs on nvidia driver
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105042 --- Comment #5 from Tom de Vries --- (In reply to Richard Biener from comment #1) > Doesn't whatever driver/library API we use from libgomp to invoke workloads > report actual errors? Maybe we need to improve there. This: ... libgomp: cuStreamSynchronize error: the launch timed out and was terminated ... seems to be the string for cudaErrorLaunchTimeout, which AFAICT is dedicated to this situation, so we could treat that error code specially in cuda_error in plugin-nvptx.c and emit a custom message. Say: ... libgomp: cuStreamSynchronize error: the launch timed out and was terminated (5 second time-out caused by launching on a device running a display manager) ... Alternatively, we could detect cudaDeviceProp::kernelExecTimeoutEnabled and emit a warning when initializing or before launching the first kernel.
[Bug libgomp/105042] [libgomp, GOMP_NVPTX_JIT=-O0] Openacc testsuite failures when X runs on nvidia driver
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105042 --- Comment #4 from Tom de Vries --- https://gcc.gnu.org/pipermail/gcc-patches/2022-March/592275.html
[Bug libgomp/105042] [libgomp, GOMP_NVPTX_JIT=-O0] Openacc testsuite failures when X runs on nvidia driver
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105042 --- Comment #3 from Tom de Vries --- (In reply to Tom de Vries from comment #2) > (In reply to Richard Biener from comment #1) > > Doesn't whatever driver/library API we use from libgomp to invoke workloads > > report actual errors? Maybe we need to improve there. > > Good point, it reported some form of timeout. I'll post the exact form once > I reproduce. It's: ... Execution timeout is: 300 spawn [open ...]^M libgomp: cuStreamSynchronize error: the launch timed out and was terminated FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test ... Googling a bit about this error message ( https://forums.developer.nvidia.com/t/need-to-remove-timeouts-and-the-launch-timed-out-and-was-terminated-message/16741/2 ) shows that running a display manager sets a 5/10 seconds watchdog timer on any kernel.
[Bug target/105014] [nvptx] FAIL: gcc.dg/pr97459-1.c execution test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105014 --- Comment #5 from Tom de Vries --- Minimal test-case: ... void __attribute__((noinline)) foo (unsigned long long d0) { unsigned long long __a; __a = 0x38; for (; __a > 0; __a -= 8) if (((d0 >> __a) & 0xff) != 0) break; __builtin_printf ("__a: 0x%llx\n", __a); } int main (void) { foo (1); return 0; } ... Different value of __a: ... $ ./install/bin/nvptx-none-run -O0 ./pr97459-1.exe ; echo; ./install/bin/nvptx-none-run ./pr97459-1.exe __a: 0x0 __a: 0x30 ...
[Bug target/105011] [nvptx] FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105011 Tom de Vries changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Target Milestone|--- |12.0 Resolution|--- |FIXED --- Comment #5 from Tom de Vries --- Fixed by commit.
[Bug libgomp/105042] [libgomp, GOMP_NVPTX_JIT=-O0] Openacc testsuite failures when X runs on nvidia driver
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105042 --- Comment #2 from Tom de Vries --- (In reply to Richard Biener from comment #1) > Doesn't whatever driver/library API we use from libgomp to invoke workloads > report actual errors? Maybe we need to improve there. Good point, it reported some form of timeout. I'll post the exact form once I reproduce.
[Bug target/105014] [nvptx] FAIL: gcc.dg/pr97459-1.c execution test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105014 --- Comment #4 from Tom de Vries --- (In reply to Tom de Vries from comment #1) > With -O0 JIT instead: Also OK with JIT -O1, problems start at JIT -O2.
[Bug target/105011] [nvptx] FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105011 --- Comment #3 from Tom de Vries --- Submitted fix : https://gcc.gnu.org/pipermail/gcc-patches/2022-March/592211.html Though without changelog, apparently.
[Bug libgomp/105042] New: [libgomp, GOMP_NVPTX_JIT=-O0] Openacc testsuite failures when X runs on nvidia driver
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105042 Bug ID: 105042 Summary: [libgomp, GOMP_NVPTX_JIT=-O0] Openacc testsuite failures when X runs on nvidia driver Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libgomp Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org CC: jakub at gcc dot gnu.org Target Milestone: --- I usually have only an nvidia-compute$n driver package installed, but sometimes (as happened when I updated the system yesterday) also x11-video-nvidia$n, after which X is run on the nvidia card (instead of on the builtin intel graphics). With such a setup, I run into a cluster of FAILs, all for GOMP_NVPTX_JIT=-O0: ... $ grep ^FAIL: 2/libgomp.sum FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 -DGOMP_NVPTX_JIT=-O0 execution test FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vred2d-128.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 -DGOMP_NVPTX_JIT=-O0 execution test FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vred2d-128.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 -DGOMP_NVPTX_JIT=-O0 execution test FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/parallel-dims.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 -DGOMP_NVPTX_JIT=-O0 execution test FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/vred2d-128.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 -DGOMP_NVPTX_JIT=-O0 execution test FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/vred2d-128.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 -DGOMP_NVPTX_JIT=-O0 execution test FAIL: libgomp.oacc-fortran/parallel-dims.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 -DGOMP_NVPTX_JIT=-O0 execution test FAIL: libgomp.oacc-fortran/parallel-dims.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O1 -DGOMP_NVPTX_JIT=-O0 execution test FAIL: libgomp.oacc-fortran/parallel-dims.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -Os -DGOMP_NVPTX_JIT=-O0 execution test ... Note that this is with a patch from PR104423 that runs tests both with default JIT optimization and GOMP_NVPTX_JIT=-O0, hence the -DGOMP_NVPTX_JIT=-O0 tag. But it can be reproduced by just doing: ... export GOMP_NVPTX_JIT=-O0 ... It could be that the test-cases just need scaling down. OTOH, it also could be that there's an underlying problem that only surfaces when other processes are run in parallel, or specifically, X. This is on board K2000 with driver 470.103.01. The board has 2GB of memory, and according to nvidia-smi, having the X processes takes a couple of 100MBs, and ./parallel-dims.exe just takes 15MiB, so at first glance it doesn't seem to be an out-of-board-memory thing. I do observe reduced system responsiveness while running the tests, so maybe it's the compute capacity rather than memory which is exhausted.
[Bug target/105019] [nvptx] malias in libgomp results in "Internal error: reference to deleted section"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105019 --- Comment #6 from Tom de Vries --- (In reply to Tom de Vries from comment #4) > OK, I think this is the pattern: > ... > $ cat gcc/testsuite/gcc.target/nvptx/alias-5.c FTR, same thing if I use static functions: ... static void __attribute__((noinline)) __f () { v = 1; } static void f () __attribute__ ((alias ("__f"))); static void g (void) { f (); } ...
[Bug target/105019] [nvptx] malias in libgomp results in "Internal error: reference to deleted section"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105019 --- Comment #5 from Tom de Vries --- Creating a CUDA example is hampered by the fact that there's no symbol alias support, AFAICT. I'd like to write something like: ... __device__ void __foo () { printf ("__foo\n"); } __device__ void foo () __attribute__((alias ("__foo"))); __device__ void bar () { foo (); } __global__ void hello_world () { bar (); } ... (and then comment out bar to reproduce the problem), but all attempts to get this compiled and executed end up in various errors. Of course we can resort to hand-edited ptx, or adding .alias directives in asm statements, but that's likely to produce an 'unsupported' response by nvidia when filed as bug report.
[Bug target/105019] [nvptx] malias in libgomp results in "Internal error: reference to deleted section"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105019 --- Comment #4 from Tom de Vries --- OK, I think this is the pattern: ... $ cat gcc/testsuite/gcc.target/nvptx/alias-5.c /* { dg-do link } */ /* { dg-do run { target runtime_ptx_isa_version_6_3 } } */ /* { dg-options "-save-temps -malias -mptx=6.3" } */ int v; void __attribute__((noinline)) __f () { v = 1; } void f () __attribute__ ((alias ("__f"))); void g (void) { f (); } int main (void) { if (v != 0) __builtin_abort (); return 0; } ... There's a function __f: ... // BEGIN GLOBAL FUNCTION DEF: __f .visible .func __f { .reg .u32 %r22; mov.u32 %r22,1; st.global.u32 [v],%r22; ret; } ... with alias f: ... // BEGIN GLOBAL FUNCTION DECL: __f .visible .func __f; .visible .func f; .alias f,__f; ... called from g: ... // BEGIN GLOBAL FUNCTION DEF: g .visible .func g { { call f; } ret; } ... However, g is unused. So we have: ... PASS: gcc.target/nvptx/alias-5.c (test for excess errors) spawn nvptx-none-run ./alias-5.exe^M fatal : Internal error: reference to deleted section^M nvptx-run: cuLinkComplete failed: unknown error (CUDA_ERROR_UNKNOWN, 999)^M FAIL: gcc.target/nvptx/alias-5.c execution test ... Calling g from main fixes the internal error.
[Bug target/105019] [nvptx] malias in libgomp results in "Internal error: reference to deleted section"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105019 --- Comment #3 from Tom de Vries --- Aliases in failing .exe: ... $ strings declare_target-1.exe | grep "\.alias" .alias gomp_ialias_GOMP_taskgroup_start,GOMP_taskgroup_start; .alias gomp_ialias_GOMP_taskgroup_end,GOMP_taskgroup_end; .alias gomp_ialias_GOMP_taskgroup_reduction_register,GOMP_taskgroup_reduction_register; ...
[Bug target/105019] [nvptx] malias in libgomp results in "Internal error: reference to deleted section"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105019 --- Comment #2 from Tom de Vries --- Aliases in libgomp.a: ... $ grep "\.alias" build-gcc-offload-nvptx-none/nvptx-none/mgomp/libgomp/.libs/libgomp.a .alias gomp_ialias_GOMP_loop_runtime_next,GOMP_loop_runtime_next; .alias gomp_ialias_GOMP_loop_ull_runtime_next,GOMP_loop_ull_runtime_next; .alias gomp_ialias_GOMP_parallel_end,GOMP_parallel_end; .alias gomp_ialias_GOMP_taskgroup_start,GOMP_taskgroup_start; .alias gomp_ialias_GOMP_taskgroup_end,GOMP_taskgroup_end; .alias gomp_ialias_GOMP_taskgroup_reduction_register,GOMP_taskgroup_reduction_register; .alias gomp_ialias_omp_capture_affinity,omp_capture_affinity; .alias gomp_ialias_omp_aligned_alloc,omp_aligned_alloc; .alias gomp_ialias_omp_free,omp_free; .alias gomp_ialias_omp_aligned_calloc,omp_aligned_calloc; ...
[Bug target/105019] [nvptx] malias in libgomp results in "Internal error: reference to deleted section"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105019 --- Comment #1 from Tom de Vries --- To trigger: ... diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc index 87efc23bd96..8bf9ea90a77 100644 --- a/gcc/config/nvptx/nvptx.cc +++ b/gcc/config/nvptx/nvptx.cc @@ -245,6 +245,9 @@ default_ptx_version_option (void) warp convergence. */ res = MAX (res, PTX_VERSION_6_0); + /* Pick at least 6.3, to enable using malias. */ + res = MAX (res, PTX_VERSION_6_3); + /* Verify that we pick a version that supports the sm. */ gcc_assert (first <= res); return res; diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt index 11288d1a8ee..a4aece80682 100644 --- a/gcc/config/nvptx/nvptx.opt +++ b/gcc/config/nvptx/nvptx.opt @@ -87,7 +87,7 @@ mptx-comment Target Var(nvptx_comment) Init(1) Undocumented malias- -Target Var(nvptx_alias) Init(0) Undocumented +Target Var(nvptx_alias) Init(1) Undocumented mexperimental Target Var(nvptx_experimental) Init(0) Undocumented ... rebuild gcc, run libgomp tests, and: ... $ grep -c "Internal error: reference to deleted section" libgomp.log 637 ... For instance: ... Execution timeout is: 300 spawn [open ...]^M libgomp: Link error log fatal : Internal error: reference to deleted section libgomp: cuLinkComplete error: unknown error libgomp: Cannot map target functions or variables (expected 2, have 4294967295) FAIL: libgomp.c/../libgomp.c-c++-common/declare_target-1.c execution test ...
[Bug target/105019] New: [nvptx] malias in libgomp results in "Internal error: reference to deleted section"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105019 Bug ID: 105019 Summary: [nvptx] malias in libgomp results in "Internal error: reference to deleted section" Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- As mentioned in the commit message for malias: ... When enabling malias by default, libgomp detects alias support and consequently libgomp.a will contains a few uses of .alias. This however results in aforementioned "Internal error: reference to deleted section" in many test-cases. Either there's some error with how .alias is used, or there's a driver bug. While this issue is not resolved, we keep malias off-by-default. ... This needs to be investigated, and if it's a driver bug, reported to nvidia, or otherwise fixed or worked around. Note: the same error showed up in a test-case where the call to an alias was inlined, and consequently the alias referenced a defined but unused function. To observe this, disable this bit: ... if (!cgraph_node::get (name)->referred_to_p ()) /* Prevent "Internal error: reference to deleted section". */ return; ... in nvptx_asm_output_def_from_decls and run nvptx.exp=alias-2.c: ... PASS: gcc.target/nvptx/alias-2.c (test for excess errors) spawn nvptx-none-run ./alias-2.exe^M fatal : Internal error: reference to deleted section^M nvptx-run: cuLinkComplete failed: unknown error (CUDA_ERROR_UNKNOWN, 999)^M FAIL: gcc.target/nvptx/alias-2.c execution test ... So it's possible that the error somehow related to this scenario.
[Bug target/105014] [nvptx] FAIL: gcc.dg/pr97459-1.c execution test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105014 --- Comment #3 from Tom de Vries --- (In reply to Tom de Vries from comment #2) > (In reply to Tom de Vries from comment #0) > > On a quadro k2000 with driver 470.103.01, I run into: > > So, sm_30. > > > ... > > FAIL: gcc.dg/pr97459-1.c execution test > > Reproduced on geforce gt710 (sm_35), with same driver. But not on quadro k620 (sm_50).
[Bug target/105014] [nvptx] FAIL: gcc.dg/pr97459-1.c execution test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105014 --- Comment #2 from Tom de Vries --- (In reply to Tom de Vries from comment #0) > On a quadro k2000 with driver 470.103.01, I run into: So, sm_30. > ... > FAIL: gcc.dg/pr97459-1.c execution test Reproduced on geforce gt710 (sm_35), with same driver.
[Bug target/105018] [nvptx] Need better alias support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105018 --- Comment #2 from Tom de Vries --- As mentioned before by amonakov, a possibility is to add alias support to the nvptx-tools linker, and use that.
[Bug target/105018] [nvptx] Need better alias support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105018 --- Comment #1 from Tom de Vries --- (In reply to Tom de Vries from comment #0) > Aliases to aliases are not supported (see libgomp.c-c++-common/pr96390.c). > This is currently not prohibited by the compiler, but with the driver link we > run into: "Internal error: alias to unknown symbol" . And that is the reason that libgomp.c-c++-common/pr96390.c and friends doesn't pass when I do: ... /* { dg-additional-options "-foffload=-mptx=6.3 -foffload=-malias" { target offload_target_nvptx } } */ ...
[Bug target/105018] New: [nvptx] Need better alias support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105018 Bug ID: 105018 Summary: [nvptx] Need better alias support Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- We currently have alias support enabled by malias, which relies on the ptx .alias directive. There is a number of limitations, listed in the commit adding malias: ... Only function aliases are supported. Weak aliases are not supported. That is, if I disable the check in nvptx_asm_output_def_from_decls that disallows this, a weak alias is emitted and parsed by the driver. But the test gcc.dg/globalalias.c starts failing, with the behaviour matching the comment about "weird behavior of AIX's .set pseudo-op": a weak alias may resolve to different functions in different files. Aliases to weak symbols are not supported (see gcc.dg/localalias.c). This is currently not prohibited by the compiler, but with the driver link we run into: "error: Function test with .weak scope cannot be aliased". Aliases to aliases are not supported (see libgomp.c-c++-common/pr96390.c). This is currently not prohibited by the compiler, but with the driver link we run into: "Internal error: alias to unknown symbol" . Unreferenced aliases are not emitted (these can occur f.i. when inlining a call to an alias). This avoids driver link error "Internal error: reference to deleted section". ... We'd like an implementation that doesn't have (all of) these limitations.
[Bug target/97106] [nvptx] Issues with weak aliases introduced by C++
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97106 Tom de Vries changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Target Milestone|--- |12.0 Resolution|--- |FIXED --- Comment #5 from Tom de Vries --- Using the test-case from comment 0 and: ... /* { dg-additional-options "-foffload=-malias -foffload=-mptx=6.3 -O0" } */ ... I get: ... $ strings test.exe | grep -i alias.alias _ZN1VILi1EEC1ImvEET_,_ZN1VILi1EEC2ImvEET_; ... so I see a normal alias, not a weak alias. The test-case still fails in the abort. Note that I get the same result with: ... /* { dg-additional-options "-foffload=-mno-alias -foffload=-mptx=6.3 -O2" } */ ... There may be a problem with the test-case, there may be a problem with nvptx c++ support, but the alias issue seems to have been addresses, so I'm closing this one.
[Bug target/97106] [nvptx] Issues with weak aliases introduced by C++
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97106 Bug 97106 depends on bug 97102, which changed state. Bug 97102 Summary: [nvptx] PTX JIT compilation failed when using aliases https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97102 What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED
[Bug target/97102] [nvptx] PTX JIT compilation failed when using aliases
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97102 Tom de Vries changed: What|Removed |Added Target Milestone|--- |12.0 Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #8 from Tom de Vries --- The test-case from comment 0 now works in combination with: /* { dg-additional-options "-foffload=-malias -foffload=-mptx=6.3" } */
[Bug libgomp/98215] Coalescing memory in target region creates slower code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98215 Tom de Vries changed: What|Removed |Added Severity|normal |enhancement Keywords||missed-optimization CC||vries at gcc dot gnu.org
[Bug target/104916] [nvptx] Handle Independent Thread Scheduling for sm_70+ with -muniform-simt
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104916 Tom de Vries changed: What|Removed |Added Target Milestone|--- |12.0
[Bug target/104916] [nvptx] Handle Independent Thread Scheduling for sm_70+ with -muniform-simt
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104916 Tom de Vries changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #5 from Tom de Vries --- Fixed.
[Bug target/104783] [nvptx, openmp] Hang/abort with atomic update in simd construct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104783 Tom de Vries changed: What|Removed |Added Resolution|--- |FIXED Target Milestone|--- |12.0 Status|UNCONFIRMED |RESOLVED --- Comment #8 from Tom de Vries --- Fixed.
[Bug target/104957] [nvptx] Use .alias directive (available starting ptx isa version 6.3)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104957 Tom de Vries changed: What|Removed |Added Target Milestone|--- |12.0 Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #5 from Tom de Vries --- Committed.
[Bug target/104925] [nvptx] Use "%" as register prefix
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104925 Tom de Vries changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED Target Milestone|--- |12.0 --- Comment #2 from Tom de Vries --- Fixed.
[Bug libgcc/105016] [libgcc, TARGET_HAS_NO_HW_DIVIDE] Incorrect result for __udivmodti4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105016 --- Comment #3 from Tom de Vries --- In libgcc.h, I see: ... #define __udivmoddi4__NDW(udivmod,4) ... and for LIBGCC2_UNITS_PER_WORD == 8 we have: ... #define __NDW(a,b) __ ## a ## ti ## b ... So, AFAICT it's possible that __udivmoddi4 is mapped to __udivmodti4.
[Bug libgcc/105016] [libgcc, TARGET_HAS_NO_HW_DIVIDE] Incorrect result for __udivmodti4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105016 --- Comment #1 from Tom de Vries --- Created attachment 52662 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52662=edit test-case
[Bug libgcc/105016] New: [libgcc, TARGET_HAS_NO_HW_DIVIDE] Incorrect result for __udivmodti4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105016 Bug ID: 105016 Summary: [libgcc, TARGET_HAS_NO_HW_DIVIDE] Incorrect result for __udivmodti4 Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libgcc Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- While investigating PR105014, I copied the TARGET_HAS_NO_HW_DIVIDE implementation of __udivmoddi4 to the test-case. On x86, when using the native __udivmodti4, I get: ... $ ./a.out a : 0xfffb b : 0x0001 div : 0xfffb mod : 0x ... But with the TARGET_HAS_NO_HW_DIVIDE version I get instead: ... a : 0xfffb b : 0x0001 div : 0x mod : 0xfffc ...
[Bug target/105014] [nvptx] FAIL: gcc.dg/pr97459-1.c execution test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105014 --- Comment #1 from Tom de Vries --- First FAIL minimizes to: ... typedef __uint128_t T; union u { T t; struct { unsigned long long x; unsigned long long y; } xy; }; #define PRINT(VAR) \ do\ { \ __builtin_printf (#VAR ": lo: %llx\n", VAR.xy.x); \ __builtin_printf (#VAR ": hi: %llx\n", VAR.xy.y); \ } \ while (0) extern T __udivmodti4 (T, T, T *); int main (void) { union u a, b, mod, div; a.t = -4; b.t = 1; PRINT (a); PRINT (b); div.t = __udivmodti4 (a.t, b.t, ); PRINT (div); PRINT (mod); if (mod.t != 0) __builtin_abort (); return 0; } ... Fails like this: ... $ ./install/bin/nvptx-none-run ./pr97459-1.exe a: lo: fffc a: hi: b: lo: 1 b: hi: 0 div: lo: fffd div: hi: mod: lo: mod: hi: 0 nvptx-run: error getting kernel result: unspecified launch failure (CUDA_ERROR_LAUNCH_FAILED, 719) $ ... With -O0 JIT instead: ... $ ./install/bin/nvptx-none-run -O0 ./pr97459-1.exe a: lo: fffc a: hi: b: lo: 1 b: hi: 0 div: lo: fffc div: hi: mod: lo: 0 mod: hi: 0 ...
[Bug target/105014] New: [nvptx] FAIL: gcc.dg/pr97459-1.c execution test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105014 Bug ID: 105014 Summary: [nvptx] FAIL: gcc.dg/pr97459-1.c execution test Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- On a quadro k2000 with driver 470.103.01, I run into: ... FAIL: gcc.dg/pr97459-1.c execution test FAIL: gcc.dg/pr97459-2.c execution test FAIL: gcc.dg/pr97459-3.c execution test FAIL: gcc.dg/pr97459-4.c execution test FAIL: gcc.dg/pr97459-5.c execution test FAIL: gcc.dg/pr97459-6.c execution test ...
[Bug target/105011] [nvptx] FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105011 --- Comment #2 from Tom de Vries --- Even better: ... diff --git a/libatomic/tas_n.c b/libatomic/tas_n.c index d0d8c283b495..65eaa7753a51 100644 --- a/libatomic/tas_n.c +++ b/libatomic/tas_n.c @@ -73,7 +73,7 @@ SIZE(libat_test_and_set) (UTYPE *mptr, int smodel) __ATOMIC_RELAXED, __ATOMIC_RELAXED)); post_barrier (smodel); - return woldval != 0; + return (woldval & wval) == wval; } #define DONE 1 ... That also gives back accurate results in case TARGET_ATOMIC_TEST_AND_SET_TRUEVAL has more than one bit set.
[Bug target/105011] [nvptx] FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105011 --- Comment #1 from Tom de Vries --- (In reply to Tom de Vries from comment #0) > It should probably do something like: > ... > return (woldval & wval) != 0; > ... Indeed, that fixes the FAILs.
[Bug target/105011] New: [nvptx] FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105011 Bug ID: 105011 Summary: [nvptx] FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- On a quadro k2000 with driver 470.103.01 I'm running into this cluster of FAILs: ... FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O3 -g execution test FAIL: gcc.dg/atomic/stdatomic-flag.c -O3 -g execution test ... Minimizing the first FAIL, I end up with: ... #include extern void abort (void); atomic_flag a = ATOMIC_FLAG_INIT; int main () { int b; if ((atomic_flag_test_and_set) ()) abort (); return 0; } ... The atomic access is done by libatomic, using a 64-bit cas loop. If we print the address of a, we have: ... : 000700700200 ... so the pointer is already 64-bit aligned. If we print the 64-bit value of *(unsigned long long *) before and after the test-and-set, we have: ... a: 00024f00 a: 00024f01 ... so that looks all-right as well. At first glance, the problem is in libatomic, tas_n.c: ... wval = (UWORD)__GCC_ATOMIC_TEST_AND_SET_TRUEVAL << shift; woldval = __atomic_load_n (wptr, __ATOMIC_RELAXED); do { t = woldval | wval; } while (!atomic_compare_exchange_w (wptr, , t, true, __ATOMIC_RELAXED, __ATOMIC_RELAXED)); post_barrier (smodel); return woldval != 0; ... What is returned is woldval != 0, but that tests the entire word, not just the byte we're interested in. It should probably do something like: ... return (woldval & wval) != 0; ...
[Bug middle-end/105001] If executing with non-nvptx offloading, but nvptx offloading compilation is enabled: FAIL: libgomp.c/pr104783.c execution test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105001 --- Comment #1 from Tom de Vries --- Interesting. Can you compare dump files to see where the difference comes from?
[Bug target/104936] [nvptx] Handle weak decl/def distinction in common code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104936 Tom de Vries changed: What|Removed |Added Severity|normal |enhancement Keywords||internal-improvement
[Bug target/104991] New: [nvptx] Simplify muniform-simt transformation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104991 Bug ID: 104991 Summary: [nvptx] Simplify muniform-simt transformation Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- The muniform-simt reorg pass transforms the insn stream, both inside and outside an SIMT region. The transform rewrites atomic insns by adding a predicate to execute in one thread only outside the SIMT region. Furthermore, if the atomic insn has a result, a shuffle is added to propagate the result to all threads in the warp. Inside the SIMT region, the predicate evaluates to true such that all threads in the warp execute it. And the source lane register for the shuffle is set such that the shuffle is a nop inside the SIMT region. However, since we've started using shfl.sync for the shuffle, the shuffle now has it's own predicate, and consequently having a source lane register with different values inside and outside the SIMT region is no longer necessary.
[Bug target/104968] [nvptx][OpenMP] SIGSEGV / ICE in final_scan_insn_1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104968 Tom de Vries changed: What|Removed |Added Resolution|--- |FIXED Target Milestone|--- |12.0 Status|UNCONFIRMED |RESOLVED --- Comment #7 from Tom de Vries --- Fixed by https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=356e2720e9030927579024c2f060d665a0b9080f .
[Bug target/104952] [nvptx][OpenMP] wrong code with OR / AND reduction ('reduction(||:' and '&&') with SIMT
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104952 Tom de Vries changed: What|Removed |Added Resolution|--- |FIXED Target Milestone|--- |12.0 Status|UNCONFIRMED |RESOLVED --- Comment #12 from Tom de Vries --- Fixed by "[openmp] Fix SIMT reduction using TRUTH_{AND,OR}IF_EXPR".
[Bug target/104968] [nvptx][OpenMP] SIGSEGV / ICE in final_scan_insn_1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104968 --- Comment #6 from Tom de Vries --- (In reply to Tom de Vries from comment #5) > This patch fixes the ICE at openmp level: > ... > diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc > index 139a0de6100..19af384c634 100644 > --- a/gcc/gimplify.cc > +++ b/gcc/gimplify.cc > @@ -13361,6 +13361,7 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p) >g = gimple_build_bind (NULL_TREE, gfor, NULL_TREE); >g = gimple_build_omp_task (g, task_clauses, NULL_TREE, NULL_TREE, > NULL_TREE, NULL_TREE, NULL_TREE); > + gimple_set_location (g, EXPR_LOCATION (*expr_p)); >gimple_omp_task_set_taskloop_p (g, true); >g = gimple_build_bind (NULL_TREE, g, NULL_TREE); >gomp_for *gforo > ... Submitted a more complete patch here ( https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591954.html ).
[Bug target/104957] [nvptx] Use .alias directive (available starting ptx isa version 6.3)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104957 Tom de Vries changed: What|Removed |Added Severity|normal |enhancement
[Bug target/104968] [nvptx][OpenMP] SIGSEGV / ICE in final_scan_insn_1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104968 Tom de Vries changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #5 from Tom de Vries --- This patch fixes the ICE at openmp level: ... diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc index 139a0de6100..19af384c634 100644 --- a/gcc/gimplify.cc +++ b/gcc/gimplify.cc @@ -13361,6 +13361,7 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p) g = gimple_build_bind (NULL_TREE, gfor, NULL_TREE); g = gimple_build_omp_task (g, task_clauses, NULL_TREE, NULL_TREE, NULL_TREE, NULL_TREE, NULL_TREE); + gimple_set_location (g, EXPR_LOCATION (*expr_p)); gimple_omp_task_set_taskloop_p (g, true); g = gimple_build_bind (NULL_TREE, g, NULL_TREE); gomp_for *gforo ...
[Bug target/104968] [nvptx][OpenMP] SIGSEGV / ICE in final_scan_insn_1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104968 --- Comment #4 from Tom de Vries --- This ( https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591912.html ) proposed patch fixes this ICE, pinged again.
[Bug target/104968] [nvptx][OpenMP] SIGSEGV / ICE in final_scan_insn_1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104968 --- Comment #3 from Tom de Vries --- (In reply to Tom de Vries from comment #2) > (In reply to Tom de Vries from comment #1) > > Can't reproduce. > > > > It this not fixed by: > > ... > > commit 7862f6ccd85a001e4d70abb00bb95d8c7846ba80 > > Author: Tom de Vries > > Date: Wed Feb 23 09:33:33 2022 +0100 > > > > [nvptx] Fix dummy location in gen_comment > > ... > > ? > > Hmm, wait, of course I have a patch in my stack that's pending for upstream. > Let me undo that one and retry. Ack, reproduced.
[Bug target/104968] [nvptx][OpenMP] SIGSEGV / ICE in final_scan_insn_1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104968 --- Comment #2 from Tom de Vries --- (In reply to Tom de Vries from comment #1) > Can't reproduce. > > It this not fixed by: > ... > commit 7862f6ccd85a001e4d70abb00bb95d8c7846ba80 > Author: Tom de Vries > Date: Wed Feb 23 09:33:33 2022 +0100 > > [nvptx] Fix dummy location in gen_comment > ... > ? Hmm, wait, of course I have a patch in my stack that's pending for upstream. Let me undo that one and retry.
[Bug target/104968] [nvptx][OpenMP] SIGSEGV / ICE in final_scan_insn_1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104968 --- Comment #1 from Tom de Vries --- Can't reproduce. It this not fixed by: ... commit 7862f6ccd85a001e4d70abb00bb95d8c7846ba80 Author: Tom de Vries Date: Wed Feb 23 09:33:33 2022 +0100 [nvptx] Fix dummy location in gen_comment ... ?
[Bug target/104952] [nvptx][OpenMP] wrong code with OR / AND reduction ('reduction(||:' and '&&') with SIMT
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104952 --- Comment #9 from Tom de Vries --- Created attachment 52647 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52647=edit Tentative patch with test-cases, rationale and changelog I'll put this through testing, and submit if no problems found.
[Bug target/104952] [nvptx][OpenMP] wrong code with OR / AND reduction ('reduction(||:' and '&&') with SIMT
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104952 --- Comment #8 from Tom de Vries --- (In reply to Jakub Jelinek from comment #6) > And yes, #c1 is valid. Thanks for confirming. > But would be nice to have similar test with && and > initial result = 2; and arr[] say { 1, 2, 3, 4, 5, 6, 7, ..., 32 } and test > result is 1 at the end to make sure we don't actually do just > orig = orig & (private != 0) > style merging or even just > orig = orig & private; Ack, will add that.
[Bug target/104952] [nvptx][OpenMP] wrong code with OR / AND reduction ('reduction(||:' and '&&') with SIMT
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104952 --- Comment #7 from Tom de Vries --- Alternative fix that doesn't require fiddling with the 'code' var: ... diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc index d932d74cb03..d0ddd4a6142 100644 --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -6734,7 +6734,10 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimpl e_seq *dlist, x = build_call_expr_internal_loc (UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY, TREE_TYPE (ivar), 2, ivar, simt_lane); - x = build2 (code, TREE_TYPE (ivar), ivar, x); + /* Make sure x is evaluated unconditionally. */ + tree bfly_var = create_tmp_var (TREE_TYPE (ivar)); + gimplify_assign (bfly_var, x, [2]); + x = build2 (code, TREE_TYPE (ivar), ivar, bfly_var); gimplify_assign (ivar, x, [2]); } tree ivar2 = ivar; ...
[Bug target/104952] [nvptx][OpenMP] wrong code with OR / AND reduction ('reduction(||:' and '&&') with SIMT
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104952 --- Comment #4 from Tom de Vries --- This fixes it: ... diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc index d932d74cb03..f2ac8f98e32 100644 --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -6734,7 +6734,21 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimpl e_seq *dlist, x = build_call_expr_internal_loc (UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY, TREE_TYPE (ivar), 2, ivar, simt_lane); - x = build2 (code, TREE_TYPE (ivar), ivar, x); + /* Make sure x is evaluated unconditionally. */ + enum tree_code update_code; + switch (OMP_CLAUSE_REDUCTION_CODE (c)) + { + case TRUTH_ANDIF_EXPR: + update_code = TRUTH_AND_EXPR; + break; + case TRUTH_ORIF_EXPR: + update_code = TRUTH_OR_EXPR; + break; + default: + update_code = code; + break; + } + x = build2 (update_code, TREE_TYPE (ivar), ivar, x); gimplify_assign (ivar, x, [2]); } tree ivar2 = ivar; ...
[Bug target/104952] [nvptx][OpenMP] wrong code with OR / AND reduction ('reduction(||:' and '&&') with SIMT
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104952 --- Comment #3 from Tom de Vries --- Hmm, that seems to be actually due to: ... if (sctx.is_simt) { if (!simt_lane) simt_lane = create_tmp_var (unsigned_type_node); x = build_call_expr_internal_loc (UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY, TREE_TYPE (ivar), 2, ivar, simt_lane); x = build2 (code, TREE_TYPE (ivar), ivar, x); gimplify_assign (ivar, x, [2]); } ... which gimplifies assigning: ... (gdb) call debug_generic_expr (x) D.2163 || .GOMP_SIMT_XCHG_BFLY (D.2163, D.2164) ... to: ... (gdb) call debug_generic_expr (ivar) D.2163 ...
[Bug target/104952] [nvptx][OpenMP] wrong code with OR / AND reduction ('reduction(||:' and '&&') with SIMT
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104952 --- Comment #2 from Tom de Vries --- I think the problem can be seen already at omp-lower, in the body of the butterfly loop. Let's first look at what we have if we use reduction op '|': ... D.2173 = .GOMP_SIMT_VF (); D.2164 = 1; D.2161 = 0; goto ; : D.2165 = D.2163; D.2165 = D.2163; D.2166 = .GOMP_SIMT_XCHG_BFLY (D.2165, D.2164); D.2167 = D.2165 | D.2166; D.2163 = D.2167; D.2164 = D.2164 << 1; : if (D.2164 < D.2173) goto ; else goto ; : ... Fairly straightforward, we have a loop, runs a couple of times, first a shuffle (GOMP_SIMT_XCHG_BFLY), then an update (D.2167 = D.2165 | D.2166). Now compare that with reduction op '||': ... D.2183 = .GOMP_SIMT_VF (); D.2164 = 1; D.2161 = 0; goto ; : D.2169 = D.2163; D.2170 = (_Bool) D.2169; if (D.2170 != 0) goto ; else goto ; : D.2169 = D.2163; D.2172 = .GOMP_SIMT_XCHG_BFLY (D.2169, D.2164); D.2173 = (_Bool) D.2172; if (D.2173 != 0) goto ; else goto ; : iftmp.5 = 1; goto ; : iftmp.5 = 0; : D.2163 = iftmp.5; D.2164 = D.2164 << 1; : if (D.2164 < D.2183) goto ; else goto ; : ... The shuffle is now conditional. I think the shuffle is inserted too late, in the middle of the update rather than before.
[Bug target/104952] [nvptx][OpenMP] wrong code with OR / AND reduction ('reduction(||:' and '&&') with SIMT
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104952 Tom de Vries changed: What|Removed |Added Keywords||openmp --- Comment #1 from Tom de Vries --- I can reproduce the problem. I've made the simd explicit (I hope that's still valid openmp code): ... $ cat libgomp/testsuite/libgomp.c/test.c #define N 32 static char arr[N]; int main (void) { unsigned int result = 0; for (unsigned int i = 0; i < N; ++i) arr[i] = 0; arr[5] = 1; #pragma omp target map(tofrom:result) map(to:arr) #pragma omp simd reduction(||: result) for (unsigned int i = 0; i < N; ++i) result = result || arr[i]; if (result != 1) return 1; return 0; } ... Easy workaround: ... diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc index d932d74cb03..bf6845d654e 100644 --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -4641,6 +4641,15 @@ lower_rec_simd_input_clauses (tree new_var, omp_context *ctx, sctx->max_vf = 1; break; } + + if (OMP_CLAUSE_REDUCTION_CODE (c) == TRUTH_ANDIF_EXPR + || OMP_CLAUSE_REDUCTION_CODE (c) == TRUTH_ORIF_EXPR) + { + sctx->max_vf = 1; + break; + } } } if (maybe_gt (sctx->max_vf, 1U)) ...
[Bug target/104957] [nvptx] Use .alias directive (available starting ptx isa version 6.3)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104957 --- Comment #3 from Tom de Vries --- The OvO testsuite, when run at -O2 passes, because it inlines all .alias instances. But at -O0, it doesn't. With -foffload=-malias that's fixed.
[Bug target/104957] [nvptx] Use .alias directive (available starting ptx isa version 6.3)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104957 --- Comment #2 from Tom de Vries --- So, what do we get after specifying -malias -mptx=6.3? Alias attribute only for functions, not variables. No support for weak alias (allowing this does compile, but we run into execution fails in gcc.dg/globalalias.c and gcc.dg/pr77587.c). No support for aliases of weak functions. We can't detect this in the compiler, so we'll run into linker error "error: Function test with .weak scope cannot be aliased".
[Bug target/104957] [nvptx] Use .alias directive (available starting ptx isa version 6.3)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104957 --- Comment #1 from Tom de Vries --- Created attachment 52636 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52636=edit Tentative patch Patch that I'm currently working on. Adds -malias, off by default. It's off by default because when doing a build with libgomp and malias on by default, libgomp uses .alias a few times, and that ends up in a linker error "Internal error: reference to deleted section" with OvO test-cases (haven't tried others) This may be a driver error, or incorrect usage of the .alias directive. The answer might be found by playing around with .alias in cuda examples. Things I tried manually in the ptx were: - resolving the .alias: this worked - enforcing order: function definition, alias declaration, alias definition to precisely match example in ptx manual: didn't work.
[Bug target/104957] New: [nvptx] Use .alias directive (available starting ptx isa version 6.3)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104957 Bug ID: 104957 Summary: [nvptx] Use .alias directive (available starting ptx isa version 6.3) Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- [ There is a number of nvptx PRs open about alias support. The focus of this PR is $subject, rather than supporting some specific source construct. ] So, we have an .alias directive in ptx, can we use it for something? Ideally, we'd use the .alias directive for all our needs, but it's too limited for that. OTOH, currently we just error out on any .alias usage in the source code, so we could try to add an implementation that only errors out for things it doesn't support.
[Bug target/97106] [nvptx] Issues with weak aliases introduced by C++
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97106 --- Comment #4 from Tom de Vries --- This: ... $ cat alias.c void __f () { __builtin_printf ("hello\n"); } void f () __attribute__ ((alias ("__f"))); int main (void) { f (); return 0; } ... works fine at -O0 and -O1: ... $ ./gcc.sh -O0 ./alias.c $ ./install/bin/nvptx-none-run a.out hello $ ./gcc.sh -O1 ./alias.c $ ./install/bin/nvptx-none-run a.out hello ... but at -O2 we have: ... $ ./gcc.sh -O2 ./alias.c $ ./install/bin/nvptx-none-run a.out fatal : Internal error: reference to deleted section nvptx-run: cuLinkComplete failed: unknown error (CUDA_ERROR_UNKNOWN, 999) ... This seems to be due to f/__f being inlined into main, after which we have an alias declaration which is unused: ... .visible .func f; .alias f,__f; ... Removing these two lines make the executable run fine again. Note: same thing when using nvptx-none-run -O0. Fixed by: ... diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc index ab1f62359d4b..3e51bf15776c 100644 --- a/gcc/config/nvptx/nvptx.cc +++ b/gcc/config/nvptx/nvptx.cc @@ -77,6 +77,7 @@ #include "opts.h" #include "tree-pretty-print.h" #include "rtl-iter.h" +#include "cgraph.h" /* This file should be included last. */ #include "target-def.h" @@ -7396,6 +7397,10 @@ nvptx_mem_local_p (rtx mem) void nvptx_asm_output_def_from_decls (FILE *stream, tree name, tree value) { + if (!cgraph_node::get (name)->referred_to_p ()) +/* Prevent "Internal error: reference to deleted section". */ +return; + std::stringstream s; write_fn_proto (s, false, get_fnname_from_decl (name), name); fputs (s.str().c_str(), stream); ...
[Bug target/104936] New: [nvptx] Handle weak decl/def distinction in common code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104936 Bug ID: 104936 Summary: [nvptx] Handle weak decl/def distinction in common code Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- At docs for ASM_WEAKEN_LABEL (stream, name) we find: ... If you don’t define this macro or ASM_WEAKEN_DECL, GCC will not support weak symbols and you should not define the SUPPORTS_WEAK macro. ... However, we have: ... $ grep define.*WEAKEN gcc/config/nvptx/* $ ... but still: ... $ grep SUPPORTS_WEAK gcc/config/nvptx/* gcc/config/nvptx/nvptx.h:#define SUPPORTS_WEAK 1 ... I think an argument for the discrepancy is made here: ... /* We support weak defintions, and hence have the right ASM_WEAKEN_DECL definition. Diagnose the problem here. */ if (DECL_WEAK (decl)) error_at (DECL_SOURCE_LOCATION (decl), "PTX does not support weak declarations" " (only weak definitions)"); ... where it's my understanding that the "right ASM_WEAKEN_DECL" is meant to refer to ASM_WEAKEN_DECL not being defined. It would be nice to solve this somehow in the common code instead of deviating from prescribed target macro usage.
[Bug target/104768] [nvptx] Exploit Independent Thread Scheduling for sm_70+
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104768 --- Comment #1 from Tom de Vries --- Hmm, reading about it a bit more, it's more about enabling algorithms that were not possible before, than about performance improvements. So, we should aim at having test-cases, both openacc and openmp that hang on previous architectures but pass with sm_70+.