[Bug debug/111409] Invalid .debug_macro.dwo macro information for split DWARF
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111409

Tom de Vries changed:

What |Removed |Added
CC   |        |hjl.tools at gmail dot com

--- Comment #6 from Tom de Vries ---
*** Bug 87472 has been marked as a duplicate of this bug. ***
[Bug debug/87472] Unknown macro opcode with -gsplit-dwarf -g3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87472

Tom de Vries changed:

What       |Removed     |Added
Status     |UNCONFIRMED |RESOLVED
CC         |            |vries at gcc dot gnu.org
Resolution |---         |DUPLICATE

--- Comment #4 from Tom de Vries ---
Duplicate.

*** This bug has been marked as a duplicate of bug 111409 ***
[Bug debug/111409] Invalid .debug_macro.dwo macro information for split DWARF
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111409

Tom de Vries changed:

What             |Removed |Added
CC               |        |vries at gcc dot gnu.org
Target Milestone |---     |14.0

--- Comment #5 from Tom de Vries ---
Fixed in 14.1 release. Corresponding target milestone not available, so using 14.0.
[Bug debug/115066] [debug, gsplit-dwarf, gdwarf-4, g3] DW_MACRO_define_strp used for debug_str_offsets index
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115066

Tom de Vries changed:

What       |Removed  |Added
Resolution |---      |FIXED
Status     |REOPENED |RESOLVED

--- Comment #11 from Tom de Vries ---
(In reply to Rainer Orth from comment #10)
> The new test currently FAILs on Solaris/SPARC with the native as:
>
> FAIL: gcc.dg/pr115066.c scan-assembler \.byte\t0xb\t# Define macro strx
>
> The relevant snippet of pr115066.s is
>
>         .section ".debug_macro.dwo",#exclude,#progbits
> .LLdebug_macro0:
>         .uahalf 0x4     ! DWARF macro version number
>         .byte   0x2     ! Flags: 32-bit, lineptr present
>         .uaword .LLskeleton_debug_line0
>         .byte   0x1     ! Define macro
>
> while when using gas, I have
>
>         .section .debug_macro.dwo,"e",@progbits
> .LLdebug_macro0:
>         .uahalf 0x4     ! DWARF macro version number
>         .byte   0x2     ! Flags: 32-bit, lineptr present
>         .uaword .LLskeleton_debug_line0
>         .byte   0xb     ! Define macro strx
>
> AFAICS from dwarf2out.cc (output_macinfo_op), the requirements for using
> DW_MACRO_define_strx are (among others)
> !DWARF2_INDIRECT_STRING_SUPPORT_MISSING_ON_TARGET && SECTION_MERGE.
>
> However, with the native assembler, SHF_MERGE doesn't work (as emits
> something ld cannot link).
>
> I wonder how best to handle this: just skip the test on sparc*-sun-solaris2*
> && !gas? Theoretically, there could be other targets with similar issues.

This looks like a test-case issue, so re-closing the PR.

How about:
...
-/* { dg-final { scan-assembler {\.byte\t0xb\t[^\n\r]* Define macro strx} } } */
+/* { dg-final { scan-assembler {\.byte\t0xb\t[^\n\r]* Define macro strx|\.byte\t0x1\t[^\n\r]* Define macro} ...
...
?
[Bug debug/115066] [debug, gsplit-dwarf, gdwarf-4, g3] DW_MACRO_define_strp used for debug_str_offsets index
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115066

Tom de Vries changed:

What             |Removed     |Added
Target Milestone |---         |15.0
Resolution       |---         |FIXED
Status           |UNCONFIRMED |RESOLVED

--- Comment #9 from Tom de Vries ---
Fixed.
[Bug debug/115066] [debug, gsplit-dwarf, gdwarf-4, g3] DW_MACRO_define_strp used for debug_str_offsets index
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115066 --- Comment #7 from Tom de Vries --- Submitted here ( https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651586.html ).
[Bug debug/115066] [debug, gsplit-dwarf, gdwarf-4, g3] DW_MACRO_define_strp used for debug_str_offsets index
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115066

--- Comment #6 from Tom de Vries ---
(In reply to Jakub Jelinek from comment #5)
> Just make it if (dwarf_split_debug_info) then?

That works as well indeed:
...
diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index eedb13bb069..70b7f5f42cd 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -29045,7 +29045,7 @@ output_macinfo_op (macinfo_entry *ref)
          && !DWARF2_INDIRECT_STRING_SUPPORT_MISSING_ON_TARGET
          && (debug_str_section->common.flags & SECTION_MERGE) != 0)
        {
-         if (dwarf_split_debug_info && dwarf_version >= 5)
+         if (dwarf_split_debug_info)
            ref->code = ref->code == DW_MACINFO_define
                        ? DW_MACRO_define_strx : DW_MACRO_undef_strx;
          else
@@ -29097,12 +29097,20 @@ output_macinfo_op (macinfo_entry *ref)
                                   HOST_WIDE_INT_PRINT_UNSIGNED,
                                   ref->lineno);
       if (node->form == DW_FORM_strp)
-        dw2_asm_output_offset (dwarf_offset_size, node->label,
-                               debug_str_section, "The macro: \"%s\"",
-                               ref->info);
+        {
+          gcc_assert (ref->code == DW_MACRO_define_strp
+                      || ref->code == DW_MACRO_undef_strp);
+          dw2_asm_output_offset (dwarf_offset_size, node->label,
+                                 debug_str_section, "The macro: \"%s\"",
+                                 ref->info);
+        }
       else
-        dw2_asm_output_data_uleb128 (node->index, "The macro: \"%s\"",
-                                     ref->info);
+        {
+          gcc_assert (ref->code == DW_MACRO_define_strx
+                      || ref->code == DW_MACRO_undef_strx);
+          dw2_asm_output_data_uleb128 (node->index, "The macro: \"%s\"",
+                                       ref->info);
+        }
       break;
     case DW_MACRO_import:
       dw2_asm_output_data (1, ref->code, "Import");
...

I've also added asserts detecting this PR.
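For illustration only (not part of the patch): a minimal C sketch of the opcode choice before and after the change, plus an analogue of the new asserts. The opcode values are the DWARF 5 numbers; `str_form`, `operand_form`, `pick_op_old`, `pick_op_new`, `op_matches_form` and `check_entry` are hypothetical names, and the sketch assumes that with split DWARF the macro string is always referenced by index.

```c
#include <assert.h>
#include <stdbool.h>

/* Macro opcodes as numbered in the DWARF 5 specification.  */
enum macro_op {
  DW_MACRO_define_strp = 0x05,
  DW_MACRO_undef_strp  = 0x06,
  DW_MACRO_define_strx = 0x0b,
  DW_MACRO_undef_strx  = 0x0c
};

/* Hypothetical stand-in for the two operand kinds: a byte offset into
   .debug_str, or an index into .debug_str_offsets.  */
enum str_form { FORM_OFFSET, FORM_INDEX };

/* Old condition: strx only for split DWARF with version >= 5.  */
enum macro_op
pick_op_old (bool is_define, bool split, int version)
{
  if (split && version >= 5)
    return is_define ? DW_MACRO_define_strx : DW_MACRO_undef_strx;
  return is_define ? DW_MACRO_define_strp : DW_MACRO_undef_strp;
}

/* Patched condition: strx whenever split DWARF is in use.  */
enum macro_op
pick_op_new (bool is_define, bool split)
{
  if (split)
    return is_define ? DW_MACRO_define_strx : DW_MACRO_undef_strx;
  return is_define ? DW_MACRO_define_strp : DW_MACRO_undef_strp;
}

/* Assumption of this sketch: split DWARF strings are referenced by index.  */
enum str_form
operand_form (bool split)
{
  return split ? FORM_INDEX : FORM_OFFSET;
}

/* True if the opcode announces the same kind of operand that is emitted.  */
bool
op_matches_form (enum macro_op op, enum str_form form)
{
  if (form == FORM_OFFSET)
    return op == DW_MACRO_define_strp || op == DW_MACRO_undef_strp;
  return op == DW_MACRO_define_strx || op == DW_MACRO_undef_strx;
}

/* Analogue of the gcc_assert calls in the patch.  */
void
check_entry (enum macro_op op, enum str_form form)
{
  assert (op_matches_form (op, form));
}
```

With these definitions, pick_op_old (true, true, 4) yields a strp opcode while the operand is an index, which is the mismatch reported in this PR; pick_op_new (true, true) yields the strx opcode.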
[Bug debug/115066] [debug, gsplit-dwarf, gdwarf-4, g3] DW_MACRO_define_strp used for debug_str_offsets index
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115066

--- Comment #4 from Tom de Vries ---
(In reply to Richard Biener from comment #3)
> And with -gstrict-dwarf -gdwarf-4?

With:
...
$ gcc.sh -gdwarf-4 -gsplit-dwarf /data/vries/hello.c -g3 -gstrict-dwarf
...
we get:
...
        .section .debug_macinfo.dwo,"e",@progbits
.Ldebug_macinfo0:
        .byte   0x3     # Start new file
        .uleb128 0      # Included from line number 0
        .uleb128 0x1    # file /data/vries/hello.c
        .byte   0x1     # Define macro
        .uleb128 0      # At line number 0
        .ascii "__STDC__ 1\0"   # The macro
...
[Bug debug/115066] [debug, gsplit-dwarf, gdwarf-4, g3] DW_MACRO_define_strp used for debug_str_offsets index
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115066

--- Comment #2 from Tom de Vries ---
(In reply to Tom de Vries from comment #1)
> Looking at the source code, I wonder if this would fix it:
> ...
> diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
> index eedb13bb069..045858bf638 100644
> --- a/gcc/dwarf2out.cc
> +++ b/gcc/dwarf2out.cc
> @@ -29045,7 +29045,7 @@ output_macinfo_op (macinfo_entry *ref)
>           && !DWARF2_INDIRECT_STRING_SUPPORT_MISSING_ON_TARGET
>           && (debug_str_section->common.flags & SECTION_MERGE) != 0)
>         {
> -         if (dwarf_split_debug_info && dwarf_version >= 5)
> +         if (dwarf_split_debug_info && (!dwarf_strict || dwarf_version >= 5))
>             ref->code = ref->code == DW_MACINFO_define
>                         ? DW_MACRO_define_strx : DW_MACRO_undef_strx;
>           else
> ...

With that change I get:
...
.Ldebug_macro0:
        .value  0x4     # DWARF macro version number
        .byte   0x2     # Flags: 32-bit, lineptr present
        .long   .Lskeleton_debug_line0
        .byte   0x3     # Start new file
        .uleb128 0      # Included from line number 0
        .uleb128 0x1    # file /data/vries/hello.c
        .byte   0xb     # Define macro strx
        .uleb128 0      # At line number 0
        .uleb128 0x17b  # The macro: "__STDC__ 1"
...
[Bug debug/115066] [debug, gsplit-dwarf, gdwarf-4, g3] DW_MACRO_define_strp used for debug_str_offsets index
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115066

--- Comment #1 from Tom de Vries ---
Looking at the source code, I wonder if this would fix it:
...
diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index eedb13bb069..045858bf638 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -29045,7 +29045,7 @@ output_macinfo_op (macinfo_entry *ref)
          && !DWARF2_INDIRECT_STRING_SUPPORT_MISSING_ON_TARGET
          && (debug_str_section->common.flags & SECTION_MERGE) != 0)
        {
-         if (dwarf_split_debug_info && dwarf_version >= 5)
+         if (dwarf_split_debug_info && (!dwarf_strict || dwarf_version >= 5))
            ref->code = ref->code == DW_MACINFO_define
                        ? DW_MACRO_define_strx : DW_MACRO_undef_strx;
          else
...
[Bug debug/115066] New: [debug, gsplit-dwarf, gdwarf-4, g3] DW_MACRO_define_strp used for debug_str_offsets index
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115066

Bug ID: 115066
Summary: [debug, gsplit-dwarf, gdwarf-4, g3] DW_MACRO_define_strp used for debug_str_offsets index
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: debug
Assignee: unassigned at gcc dot gnu.org
Reporter: vries at gcc dot gnu.org
Target Milestone: ---

Consider a hello world, compiled with split dwarf and dwarf version 4, and -g3 for macro info:
...
$ gcc -gdwarf-4 -gsplit-dwarf /data/vries/hello.c -g3 -save-temps -dA
...

In section .debug_macro.dwo, we have:
...
.Ldebug_macro0:
        .value  0x4     # DWARF macro version number
        .byte   0x2     # Flags: 32-bit, lineptr present
        .long   .Lskeleton_debug_line0
        .byte   0x3     # Start new file
        .uleb128 0      # Included from line number 0
        .uleb128 0x1    # file /data/vries/hello.c
        .byte   0x5     # Define macro strp
        .uleb128 0      # At line number 0
        .uleb128 0x1d0  # The macro: "__STDC__ 1"
...

So, given that we use a DW_MACRO_define_strp, we'd expect 0x1d0 to be an offset into a .debug_str section. However, in .debug_str.dwo we find:
...
0x01d0 455f584f 50454e32 4b385853 49005345 E_XOPEN2K8XSI.SE
...

In fact, 0x1d0 is an index into the string offset table in .debug_str_offsets.dwo:
...
        .long   0x34f0  # indexed string 0x1d0: __STDC__ 1
...

So, it looks like DW_MACRO_define_strx should have been used instead.
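To make the difference between the two operand kinds concrete, here is a small C sketch of how a consumer might resolve each one. It is an illustration under assumptions, not code from gdb or any other tool: `struct str_sections`, `resolve_strp` and `resolve_strx` are hypothetical names, offsets are assumed to be 32-bit, and the offset table is assumed to have no header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of the two string sections.  */
struct str_sections {
  const char *debug_str;        /* bytes of the string section */
  size_t debug_str_size;
  const uint32_t *str_offsets;  /* entries of the string offsets table */
  size_t str_offsets_count;
};

/* strp-style operand: a byte offset straight into the string section.  */
const char *
resolve_strp (const struct str_sections *s, uint64_t operand)
{
  assert (operand < s->debug_str_size);
  return s->debug_str + operand;
}

/* strx-style operand: an index into the offsets table; the table entry
   is the byte offset into the string section.  */
const char *
resolve_strx (const struct str_sections *s, uint64_t operand)
{
  assert (operand < s->str_offsets_count);
  return resolve_strp (s, s->str_offsets[operand]);
}
```

Reading an index through resolve_strp lands in the middle of an unrelated string, which matches the E_XOPEN2K8XSI bytes found at 0x1d0 above.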
[Bug c++/113599] [14 Regression] Wrong computation of member offset through pointer-to-member since r14-5503
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113599

Tom de Vries changed:

What |Removed |Added
CC   |        |vries at gcc dot gnu.org

--- Comment #2 from Tom de Vries ---
FWIW, the inheritance order is relevant: after applying this change we get the expected result:
...
-struct thread_info : public dummy, public intrusive_list_node {
+struct thread_info : public intrusive_list_node, public dummy {
...

This could be used as a workaround.
[Bug debug/112565] Abnormal Jump in Execution using 'stepi' Command in GDB under O2 optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112565 --- Comment #1 from Tom de Vries --- (In reply to Anonymous from comment #0) > Tom de Vries suggests that this issue may be attributed to a GCC > optimization bug. I do not.
[Bug sanitizer/110799] [tsan] False positive due to -fhoist-adjacent-loads
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799

--- Comment #7 from Tom de Vries ---
(In reply to Alexander Monakov from comment #5)
> This trips Valgrind's data race detector (valgrind --tool=helgrind) too. So
> I don't think checking SANITIZE_THREAD is the correct approach.

Can you elaborate on what you consider a correct approach?
[Bug sanitizer/110799] [tsan] False positive due to -fhoist-adjacent-loads
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799

--- Comment #6 from Tom de Vries ---
(In reply to rguent...@suse.de from comment #4)
> I'm suggesting to not fix it ;)

Can you explain why? It doesn't look difficult to fix to me, and I don't see any downsides.

> That said, is TSAN a useful vehicle?

Well, false positives aside, yes.
[Bug sanitizer/110799] [tsan] False positive due to -fhoist-adjacent-loads
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799

--- Comment #2 from Tom de Vries ---
(In reply to Richard Biener from comment #1)
> We consider introducing load data races OK, what's the difference here?

This is a load vs. store data race.

> There are other passes that would do similar things but in practice the
> loads would be considered to possibly trap so the real-world impact might be
> limited?

If you're suggesting to fix this in a more generic way, I'm all for it.
[Bug sanitizer/110799] New: [tsan] False positive due to -fhoist-adjacent-loads
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799

Bug ID: 110799
Summary: [tsan] False positive due to -fhoist-adjacent-loads
Product: gcc
Version: 13.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: sanitizer
Assignee: unassigned at gcc dot gnu.org
Reporter: vries at gcc dot gnu.org
CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org, jakub at gcc dot gnu.org, kcc at gcc dot gnu.org, marxin at gcc dot gnu.org
Target Milestone: ---

I built gdb with -O2 -fsanitize=thread, and ran into a false positive due to -fhoist-adjacent-loads.

Minimal example:
...
$ cat race.c
#include <pthread.h>
#include <stddef.h>

int c;

struct s
{
  int a;
  int b;
};

struct s v1;
int v3;

void *
thread1 (void *x)
{
  v1.a = 1;
  return NULL;
}

void *
thread2 (void *x)
{
  v3 = c ? v1.a : v1.b;
  return NULL;
}

int
main (void)
{
  pthread_t t[2];
  pthread_create(&t[0], NULL, thread1, NULL);
  pthread_create(&t[1], NULL, thread2, NULL);
  pthread_join(t[0], NULL);
  pthread_join(t[1], NULL);
  return 0;
}
...

With O0, runs fine:
...
$ gcc race.c -fsanitize=thread -g
$ ./a.out
$
...

With O2, a race is reported:
...
$ gcc race.c -fsanitize=thread -g -O2
$ ./a.out
==
WARNING: ThreadSanitizer: data race (pid=24538)
  Read of size 4 at 0x00404060 by thread T2:
    #0 thread2 /data/vries/gdb/race.c:26 (a.out+0x401299) (BuildId: 295673549b1e99c73c70a2a8d26944f177f88c15)
    #1 (libtsan.so.2+0x3c329) (BuildId: 8f2a9be581a0fcb3d7109755a6067408093b9dbd)

  Previous write of size 4 at 0x00404060 by thread T1:
    #0 thread1 /data/vries/gdb/race.c:19 (a.out+0x401257) (BuildId: 295673549b1e99c73c70a2a8d26944f177f88c15)
    #1 (libtsan.so.2+0x3c329) (BuildId: 8f2a9be581a0fcb3d7109755a6067408093b9dbd)
...

With -fno-hoist-adjacent-loads, it's fine again:
...
$ gcc race.c -fsanitize=thread -g -O2 -fno-hoist-adjacent-loads
$ ./a.out
$
...

The optimization transforms these loads:
...
  v3 = c ? v1.a : v1.b;
...
into:
...
  int tmp_a = v1.a;
  int tmp_b = v1.b;
  v3 = c ? tmp_a : tmp_b;
...
which introduces the false positive: the load of v1.a now happens unconditionally, so it races with the store in thread1 even when c is 0.

So I wonder if there should be a change like this:
...
 static bool
 gate_hoist_loads (void)
 {
   return (flag_hoist_adjacent_loads == 1
+          && (flag_sanitize & SANITIZE_THREAD) == 0
           && param_l1_cache_line_size
           && HAVE_conditional_move);
 }
...
[Bug c/109708] New: [c, doc] wdangling-pointer example broken
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109708

Bug ID: 109708
Summary: [c, doc] wdangling-pointer example broken
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: vries at gcc dot gnu.org
Target Milestone: ---

I ran into a Wdangling-pointer warning and decided to read the docs and try out an example.

The first one listed is:
...
int f (int c1, int c2, x)
{
  char *p = strchr ((char[]){ c1, c2 }, c3);
  // warning: dangling pointer to a compound literal
  return p ? *p : 'x';
}
...

It's not a complete example: x is missing a declared type and c3 is undeclared.

After fixing that (and adding the implicit "#include <string.h>"), we have an example that compiles:
...
#include <string.h>

int f (int c1, int c2, int c3)
{
  char *p = strchr ((char[]){ c1, c2 }, c3);
  return p ? *p : 'x';
}
...
but no warning, not at O0, O1, O2 or O3:
...
$ gcc test.c -Wdangling-pointer=1 -c
$
...

After reading the description of the warning, I managed to come up with:
...
char
f (char c1, char c2)
{
  char *p;

  {
    p = (char[]) { c1, c2 };
  }

  return *p;
}
...
which does manage to trigger the warning for O0-O3:
...
$ gcc test.c -Wdangling-pointer=1 -c
test.c: In function ‘f’:
test.c:10:10: warning: using dangling pointer ‘p’ to an unnamed temporary [-Wdangling-pointer=]
   10 |   return *p;
      |          ^~
test.c:7:18: note: unnamed temporary defined here
    7 |     p = (char[]) { c1, c2 };
      |                  ^
$
...

It might be worth mentioning that it's a C example, when using g++ we have:
...
$ g++ test.c -Wdangling-pointer=1 -c
test.c: In function ‘char f(char, char)’:
test.c:7:18: error: taking address of temporary array
    7 |     p = (char[]) { c1, c2 };
      |                  ^~
...

BTW, note that the warning can be fixed by doing:
...
 char
 f (char c1, char c2)
 {
   char *p;
+  char c;

   {
     p = (char[]) { c1, c2 };
+    c = *p;
   }

-  return *p;
+  return c;
 }
...
[Bug debug/108600] Use DW_LNS_set_prologue_end
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108600

--- Comment #4 from Tom de Vries ---
(In reply to Tom de Vries from comment #0)
> Note that, for instance, for gdb test-case gdb.ada/ref_param.exp, this
> convention was broken for gcc 7.5.0 (and I don't know how much earlier), and
> my current guess is that it got fixed in gcc 11.1.0, by commit c029fcb5680
> ("Reset force_source_line in final.c").

Confirmed, see PR108615.
[Bug debug/108615] Incorrect prologue marker in line table
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108615

Tom de Vries changed:

What             |Removed     |Added
Status           |UNCONFIRMED |RESOLVED
Resolution       |---         |FIXED
Target Milestone |---         |11.0
Keywords         |            |wrong-debug

--- Comment #1 from Tom de Vries ---
Closing with milestone gcc 11.0.
[Bug debug/108615] New: Incorrect prologue marker in line table
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108615

Bug ID: 108615
Summary: Incorrect prologue marker in line table
Product: gcc
Version: 10.4.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: debug
Assignee: unassigned at gcc dot gnu.org
Reporter: vries at gcc dot gnu.org
Target Milestone: ---

[ Filing FTR, to link a commit to a test-case that it fixes, and to be able to refer to it from gdb sources. ]

Consider pck.adb/pck.ads from here ( https://sourceware.org/git/?p=binutils-gdb.git;a=tree;f=gdb/testsuite/gdb.ada/ref_param;h=823556d8928bd1de865e71400092e6a44b2773cd;hb=HEAD ).

Compile like so, with system gcc 7.5.0:
...
$ gcc pck.adb -S -g
...

In the .s file we find:
...
pck__call_me:
.LFB3:
        .file 1 "pck.adb"
        .loc 1 18 0
        .cfi_startproc
        .loc 1 18 0
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        movq    %rdi, -8(%rbp)
        .loc 1 20 0
...

There's a convention that the second entry in the line table indicates the end of the prologue. So the second .loc indicates that the prologue ends there, while it ends only after the third insn, at line 20.

Same with 10.4.0.

Fixed by commit c029fcb5680 ("Reset force_source_line in final.c"), first available in release 11.1.0.

With 11.3.0 we have:
...
pck__call_me:
.LFB3:
        .file 1 "pck.adb"
        .loc 1 18 4
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        movq    %rdi, -8(%rbp)
        .loc 1 20 16
...
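As a rough illustration of why the extra .loc matters, the convention can be sketched in C as follows. This is an assumption-laden sketch, not gdb's actual prologue analysis: `struct line_row` and `prologue_end_by_convention` are hypothetical names, and the addresses used with it are illustrative offsets from the function start rather than values taken from a real object file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of a function's line table (only the fields needed here).  */
struct line_row {
  uint64_t addr;   /* offset from the start of the function */
  unsigned line;
};

/* Convention: the second row of the function's line table marks the
   first insn after the prologue.  */
uint64_t
prologue_end_by_convention (const struct line_row *rows, size_t n)
{
  assert (n >= 2);
  return rows[1].addr;
}
```

For a table shaped like the 7.5.0 output (two rows for line 18 at the function start, then line 20), this returns the function start itself, so a consumer following the convention would not skip the prologue. For a table shaped like the 11.3.0 output (line 18, then line 20), it returns the address of the line 20 row.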
[Bug debug/108600] Use DW_LNS_set_prologue_end
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108600

--- Comment #3 from Tom de Vries ---
(In reply to Tom de Vries from comment #2)
> (In reply to Tom de Vries from comment #1)
> > Created attachment 54371 [details]
>
> We probably don't want to emit in all cases, maybe limiting to
> "dwarf_version >= 3", or "!dwarf_strict || dwarf_version >= 3".

Let's see about dwarf_strict.

Semantics:
...
-gstrict-dwarf
    Disallow using extensions of later DWARF standard version than selected with -gdwarf-version. On most targets using non-conflicting DWARF extensions from later standard versions is allowed.
...

For the -gas-loc-support case (gcc emitting .locs), even when passing -gdwarf-2, gas emits a v3 version .debug_line section (since binutils-2_32).

And even if we'd fix that (I've filed https://sourceware.org/bugzilla/show_bug.cgi?id=30064), the way gas works is by bumping the dwarf version when encountering a feature that requires a higher version, so using end_prologue in a loc directive would then end up bumping dwarf_level to 3, bumping also the .debug_line version.

For the -gno-as-loc-support case (gcc emitting .debug_line contribution), for -gdwarf-2 gcc indeed emits a v2 .debug_line section. But, that makes DW_LNS_set_prologue_end fall in the range of vendor specific extensions, and we can't conflict with that.

Taking this all into account, I think it's better not to emit DW_LNS_set_prologue_end for -gdwarf-2 -gno-strict-dwarf.
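The two candidate conditions differ in exactly one configuration, which a short C sketch can make explicit. The function names are hypothetical and this is not the attached patch; the only facts relied on are that DWARF 2 defines standard line-number opcodes 1 through 9 and that DWARF 3 adds DW_LNS_set_prologue_end as opcode 0x0a (with 12 standard opcodes in total from version 3 on).

```c
#include <assert.h>
#include <stdbool.h>

/* Opcode number assigned by DWARF 3.  */
#define DW_LNS_set_prologue_end 0x0a

/* Number of standard line-number opcodes defined by a DWARF version:
   9 in version 2, 12 from version 3 onwards.  */
int
standard_opcode_count (int dwarf_version)
{
  assert (dwarf_version >= 2);
  return dwarf_version == 2 ? 9 : 12;
}

/* True if OPCODE is a standard opcode in the given DWARF version.  */
bool
opcode_is_standard (int dwarf_version, int opcode)
{
  return opcode >= 1 && opcode <= standard_opcode_count (dwarf_version);
}

/* Candidate 1: "dwarf_version >= 3".  */
bool
emit_version_only (int dwarf_version)
{
  return dwarf_version >= 3;
}

/* Candidate 2: "!dwarf_strict || dwarf_version >= 3".  */
bool
emit_unless_strict (bool dwarf_strict, int dwarf_version)
{
  return !dwarf_strict || dwarf_version >= 3;
}
```

The candidates only disagree for -gdwarf-2 -gno-strict-dwarf, where candidate 2 would emit an opcode that a version 2 line table does not define as standard. That is the case the comment above argues against.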
[Bug debug/47471] [10/11/12/13 Regression] stdarg functions extraneous too-early prologue end
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47471

Tom de Vries changed:

What |Removed |Added
CC   |        |vries at gcc dot gnu.org

--- Comment #24 from Tom de Vries ---
I can reproduce this with gcc 4.8.5:
...
f:
.LFB0:
        .file 1 "test.c"
        .loc 1 3 0
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        subq    $60, %rsp
        movq    %rsi, -168(%rbp)
        movq    %rdx, -160(%rbp)
        movq    %rcx, -152(%rbp)
        movq    %r8, -144(%rbp)
        movq    %r9, -136(%rbp)
        testb   %al, %al
        je      .L2
        .loc 1 3 0
        movaps  %xmm0, -128(%rbp)
        movaps  %xmm1, -112(%rbp)
        movaps  %xmm2, -96(%rbp)
        movaps  %xmm3, -80(%rbp)
        movaps  %xmm4, -64(%rbp)
        movaps  %xmm5, -48(%rbp)
        movaps  %xmm6, -32(%rbp)
        movaps  %xmm7, -16(%rbp)
.L2:
        movl    %edi, -180(%rbp)
        .loc 1 4 0
        movl    v(%rip), %eax
...

But with gcc 7.5.0, I get:
...
f:
.LFB0:
        .file 1 "test.c"
        .loc 1 3 0
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        subq    $72, %rsp
        movl    %edi, -180(%rbp)
        movq    %rsi, -168(%rbp)
        movq    %rdx, -160(%rbp)
        movq    %rcx, -152(%rbp)
        movq    %r8, -144(%rbp)
        movq    %r9, -136(%rbp)
        testb   %al, %al
        je      .L3
        movaps  %xmm0, -128(%rbp)
        movaps  %xmm1, -112(%rbp)
        movaps  %xmm2, -96(%rbp)
        movaps  %xmm3, -80(%rbp)
        movaps  %xmm4, -64(%rbp)
        movaps  %xmm5, -48(%rbp)
        movaps  %xmm6, -32(%rbp)
        movaps  %xmm7, -16(%rbp)
.L3:
        .loc 1 4 0
        movl    v(%rip), %eax
        addl    $1, %eax
        movl    %eax, v(%rip)
...

So, isn't this fixed?
[Bug debug/108600] Use DW_LNS_set_prologue_end
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108600

--- Comment #2 from Tom de Vries ---
(In reply to Tom de Vries from comment #1)
> Created attachment 54371 [details]

We probably don't want to emit in all cases, maybe limiting to "dwarf_version >= 3", or "!dwarf_strict || dwarf_version >= 3".
[Bug debug/108600] Use DW_LNS_set_prologue_end
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108600

--- Comment #1 from Tom de Vries ---
Created attachment 54371
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54371&action=edit
tentative patch

Tentative patch.

For hello.c, for the -gas-loc-support case it gives us:
...
$ gcc -g ~/hello.c -S -o-
...
main:
.LFB0:
        .file 1 "/home/vries/hello.c"
        .loc 1 5 1
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        .loc 1 6 3 prologue_end
        movl    $.LC0, %edi
...

And for the -gno-as-loc-support case:
...
$ gcc -g ~/hello.c -c -gno-as-loc-support
$ llvm-dwarfdump --debug-line hello.o
...
Address Line Column File ISA Discriminator Flags
------- ---- ------ ---- --- ------------- -----
0x         5      0    1   0             0 is_stmt
0x0004     6      1    1   0             0 is_stmt prologue_end
0x000e     8      3    1   0             0 is_stmt
0x0013     9     10    1   0             0 is_stmt
0x0015     9      1    1   0             0 is_stmt end_sequence
...
[Bug debug/108600] New: Use DW_LNS_set_prologue_end
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108600

Bug ID: 108600
Summary: Use DW_LNS_set_prologue_end
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: debug
Assignee: unassigned at gcc dot gnu.org
Reporter: vries at gcc dot gnu.org
Target Milestone: ---

The prologue_end marker (introduced in DWARF v3) is currently not emitted by gcc.

This is the case both for -gno-as-loc-support (gcc emitting the .debug_line contribution) and for -gas-loc-support (gcc emitting .loc directives).

Note that there is an existing convention that marks the end of the prologue: the second entry in the line table marks the first insn after the prologue. However, that convention is not known by everyone, which makes it easier to break it without noticing.

Note that, for instance, for gdb test-case gdb.ada/ref_param.exp, this convention was broken for gcc 7.5.0 (and I don't know how much earlier), and my current guess is that it got fixed in gcc 11.1.0, by commit c029fcb5680 ("Reset force_source_line in final.c").

The only way to make the end of the prologue visible in assembly as well as disassembly is to make it explicit by using DW_LNS_set_prologue_end.
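To sketch what an explicit marker buys a consumer, here is a hypothetical lookup in C that prefers a prologue_end flag and only falls back to the second-entry convention when no row carries the flag. None of these names come from gcc or gdb, and the fallback order is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One row of a function's line table (only the fields needed here).  */
struct line_row {
  uint64_t addr;
  unsigned line;
  bool prologue_end;   /* set by DW_LNS_set_prologue_end */
};

/* Look for an explicit marker; on success store its address in *ADDR.  */
bool
explicit_prologue_end (const struct line_row *rows, size_t n, uint64_t *addr)
{
  for (size_t i = 0; i < n; i++)
    if (rows[i].prologue_end)
      {
        *addr = rows[i].addr;
        return true;
      }
  return false;
}

/* Prefer the explicit marker; otherwise use the second-entry convention.  */
uint64_t
find_prologue_end (const struct line_row *rows, size_t n)
{
  uint64_t addr;

  assert (n >= 2);
  if (explicit_prologue_end (rows, n, &addr))
    return addr;
  return rows[1].addr;
}
```

With the flag present, a duplicated first row no longer changes the answer, which is the robustness argument made in the report.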
[Bug libgomp/108098] OpenMP/nvptx reverse offload execution test FAILs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108098

--- Comment #1 from Tom de Vries ---
(In reply to Thomas Schwinge from comment #0)
> $ nvidia-smi
> [...]
> | NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2
> [...]
> | 0 Tesla K80 [...]
> [...]
> | 1 Tesla K80 [...]
>

I'm not sure if it matters for triggering this problem, but if I look up this board on the nvidia drivers download page and select cuda 10.2 and production branch, I get:
...
version: 440.118.02
Release Date: 2020.9.30
...

Then using the "Beta and Older Drivers" I find the version you're using is:
...
version: 440.33.01
Release date: November 19, 2019
...

Please always use the latest drivers when reporting a problem.
[Bug debug/107909] New: [powerpc64le, debug] Incorrect call site location due to nop after call insn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107909

Bug ID: 107909
Summary: [powerpc64le, debug] Incorrect call site location due to nop after call insn
Product: gcc
Version: 7.5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: debug
Assignee: unassigned at gcc dot gnu.org
Reporter: vries at gcc dot gnu.org
Target Milestone: ---

Consider test-case vla-optimized-out.c:
...
int
__attribute__((noinline,weak)) __attribute__((noclone))
f1 (int i)
{
  char a[i + 1];
  a[0] = 5;
  return a[0];
}

int
main (void)
{
  volatile int j;
  int i = 5;
  asm volatile ("" : "=r" (i) : "0" (i));
  j = f1 (i);
  return 0;
}
...
compiled with -O1 -g.

We generate a nop after the bl insn:
...
        bl f1   # 11 *call_value_nonlocal_aixdi [length = 8]
        nop
.LVL4:
...
and the label after the nop is a call site location:
...
        .uleb128 0x5    # (DIE (0x67) DW_TAG_GNU_call_site)
        .8byte  .LVL4   # DW_AT_low_pc
        .4byte  0x81    # DW_AT_abstract_origin
...

Consequently we can't actually find the call site:
...
$ gdb -q -batch ./a.out -ex "break f1" -ex run -ex "set debug entry-values 1" -ex "print sizeof (a)"
Breakpoint 1 at 0x165c: file vla-optimized-out.c, line 8.

Breakpoint 1, f1 (i=5) at vla-optimized-out.c:8
8 }
DW_OP_entry_value resolving cannot find DW_TAG_call_site 0x1690 in main
$1 =
...

If we manually fix this in the .s file, we get a bit further:
...
$ gdb -q -batch ./a.out -ex "break f1" -ex run -ex "set debug entry-values 1" -ex "print sizeof (a)"
Breakpoint 1 at 0x165c: file vla-optimized-out.c, line 8.

Breakpoint 1, f1 (i=5) at vla-optimized-out.c:8
8 }
Cannot find matching parameter at DW_TAG_call_site 0x1690 at main
$1 =
...

The problem now is that the DW_AT_abstract_origin in the DW_TAG_GNU_call_site is not properly handled by gdb.

I reproduced this with gcc 7.5.0, but after looking at the pattern for call_value_nonlocal_aixdi I think this should be reproducible with trunk.
[Bug other/87741] Don't build readline/libreadline.a in GDB, when --with-system-readline is supplied
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87741

Tom de Vries changed:

What             |Removed   |Added
CC               |          |vries at gcc dot gnu.org
Status           |NEW       |RESOLVED
Target Milestone |---       |13.0
Resolution       |---       |FIXED
Component        |bootstrap |other

--- Comment #8 from Tom de Vries ---
(In reply to Andrew Pinski from comment #7)
> https://sourceware.org/git/?p=binutils-gdb.git;a=commit;f=configure.ac;h=69961a84c9b3744a10248fb6cbccc3c688a1e0a5
>
> It would be useful if both configures were synced up again

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=36ba985145ffa8e2078033fc1f1cf22851707a8e
[Bug target/99555] [OpenMP/nvptx] Execution-time hang for simple nested OpenMP 'target'/'parallel'/'task' constructs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99555

--- Comment #17 from Tom de Vries ---
(In reply to Thomas Schwinge from comment #14)
> > That's with a Nvidia Tesla K20c GPU, Driver Version: 346.46.
> > As that version is "a bit old", I shall first update this, before we spend
> > any further time on analyzing this.
>
> Cross-checking on another system with Nvidia Tesla K20c GPU but more recent
> Driver Version I'm not seeing such an issue.
>
> On the "old" system, gradually upgrading Driver Version: 346.46 to 352.99,
> 361.93.02, 375.88 (always the latest (?) version of the respective series),
> these all did not resolve the problem.
>
> Only starting with 384.59 (that is, early version of the 384.X series), that
> then did resolve the issue. That's still using the GCC/nvptx '-mptx=3.1'
> multilib.
>
> (We couldn't with earlier series, but given this is 384.X, we may now also
> cross-check with the default multilib, and that also was fine.)
>
> Now, I don't know if at all we would like to spend any more effort on this
> issue, given that it only appears with rather old pre-384.X versions -- but
> on the other hand, the GCC/nvptx '-mptx=3.1' multilib is meant to keep these
> supported? (... which is why I'm running such testing; and certainly the
> timeouts are annoying there.)
>
> It might be another issue with pre-384.X versions of the Nvidia PTX JIT, or
> is there the slight possibility that GCC is generating/libgomp contains some
> "weird" code that post-384.X version happen to "fix up" -- probably the
> former rather than the latter? (Or, the chance of GPU hardware/firmware or
> some other system weirdness -- unlikely, otherwise behaves totally fine?)
>
> I don't know where to find complete Nvidia Driver/JIT release notes, where
> the 375.X -> 384.X notes might provide an idea of what got fixed, and we
> might then add another 'WORKAROUND_PTXJIT_BUG' for that -- maybe simple,
> maybe not.
>
> Any thoughts, Tom?

I care about old cards, not about old drivers.

The oldest card we support is an sm_30, and last driver series that supports that one is 470.x (and AFAIU, is therefore supported by nvidia for that arch).

There's the legacy series, 390.x, which is the last to support fermi, but we don't support any fermi cards or earlier. I did do some testing with this one for later cards, but reported issues are acknowledged but not fixed by nvidia, so ... this is already out of scope for me.

So yeah, IWBN to come up with workarounds for various older drivers, but I'm not investing time in that.

Is there a problem for you to move to 470.x or later (515.x)? Is there a card for which that causes problems?
[Bug debug/105772] [debug, i386] sched2 moves get_pc_thunk call past debug_insn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105772 --- Comment #2 from Tom de Vries --- As background info, I'm proposing a patch for gdb to have the architecture-specific prologue skipper skip over the get_pc_thunk call: https://sourceware.org/pipermail/gdb-patches/2022-May/189563.html , which helps to skip over the prologue with -O0 -pie -fPIE code. But that causes a regression in test-case gdb/testsuite/gdb.base/break.exp, because of this PR.
[Bug debug/105772] New: [debug, i386] sched2 moves get_pc_thunk call past debug_insn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105772 Bug ID: 105772 Summary: [debug, i386] sched2 moves get_pc_thunk call past debug_insn Product: gcc Version: 12.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: debug Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- Consider the test-case source gdb/testsuite/gdb.base/break1.c ( https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/testsuite/gdb.base/break1.c;h=24d0d15dc2e8bd14f6b14fabbb38fec43db9c990;hb=HEAD ) containing function marker4. Extracting the relevant part: ... struct some_struct { int a_field; int b_field; union { int z_field; }; }; struct some_struct values[50]; void marker4 (long d) { values[0].a_field = d; }/* set breakpoint 14 here */ ... When compiling with gcc 12.1.0 and -O2 like so: ... $ gcc -fno-stack-protector -m32 -fPIE -pie -w -c -g break1.c -O2 ... we have before sched2: ... (note 4 1 14 2 [bb 2] NOTE_INSN_BASIC_BLOCK) (note 14 4 11 2 NOTE_INSN_PROLOGUE_END) (insn/f 11 14 3 2 (parallel [ (set (reg:SI 0 ax [82]) (unspec:SI [ (const_int 0 [0]) ] UNSPEC_SET_GOT)) (clobber (reg:CC 17 flags)) ]) 931 {*set_got} (expr_list:REG_UNUSED (reg:CC 17 flags) (expr_list:REG_EQUIV (unspec:SI [ (const_int 0 [0]) ] UNSPEC_SET_GOT) (expr_list:REG_CFA_FLUSH_QUEUE (nil) (nil) (note 3 11 6 2 NOTE_INSN_FUNCTION_BEG) (debug_insn 6 3 12 2 (debug_marker) "break1.c":59:25 -1 (nil)) (insn 12 6 8 2 (set (reg/v:SI 1 dx [orig:83 d ] [83]) (mem/c:SI (plus:SI (reg/f:SI 7 sp) (const_int 4 [0x4])) [5 d+0 S4 A32])) "break1.c":59:43 81 {*movsi_internal} (expr_list:REG_EQUIV (mem/c:SI (reg/f:SI 16 argp) [5 d+0 S4 A32]) (nil))) ... and after: ... 
(note 4 1 14 2 [bb 2] NOTE_INSN_BASIC_BLOCK) (note 14 4 3 2 NOTE_INSN_PROLOGUE_END) (note 3 14 6 2 NOTE_INSN_FUNCTION_BEG) (debug_insn 6 3 11 2 (debug_marker) "break1.c":59:25 -1 (nil)) (insn/f:TI 11 6 12 2 (parallel [ (set (reg:SI 0 ax [82]) (unspec:SI [ (const_int 0 [0]) ] UNSPEC_SET_GOT)) (clobber (reg:CC 17 flags)) ]) 931 {*set_got} (expr_list:REG_UNUSED (reg:CC 17 flags) (expr_list:REG_EQUIV (unspec:SI [ (const_int 0 [0]) ] UNSPEC_SET_GOT) (expr_list:REG_CFA_FLUSH_QUEUE (nil) (nil) (insn 12 11 8 2 (set (reg/v:SI 1 dx [orig:83 d ] [83]) (mem/c:SI (plus:SI (reg/f:SI 7 sp) (const_int 4 [0x4])) [5 d+0 S4 A32])) "break1.c":59:43 81 {*movsi_internal} (expr_list:REG_EQUIV (mem/c:SI (reg/f:SI 16 argp) [5 d+0 S4 A32]) (nil))) ... This moves the get_pc_thunk call after the debug_insn, making it (in terms of debug info) part of the first statement instead of the prologue. That is, with -O1 we have insn: ... 000d : d: e8 fc ff ff ff call e 12: 05 01 00 00 00 add$0x1,%eax 17: 8b 54 24 04 mov0x4(%esp),%edx 1b: 89 90 00 00 00 00 mov%edx,0x0(%eax) 21: c3 ret ... and line info: ... File nameLine numberStarting addressViewStmt break1.c 59 0xd x break1.c 59 0xd 1 break1.c 590x17 x break1.c 590x17 1 break1.c 590x21 break1.c -0x22 ... so at 0x17 we have the start of a statement. But with -O2 we have identical insn: ... 0030 : 30: e8 fc ff ff ff call 31 35: 05 01 00 00 00 add$0x1,%eax 3a: 8b 54 24 04 mov0x4(%esp),%edx 3e: 89 90 00 00 00 00 mov%edx,0x0(%eax) 44: c3 ret ... but different line info: ... File nameLine numberStarting addressViewStmt break1.c 590x30 x break1.c 590x30 1 x break1.c 590x3a break1.c 590x44 break1.c -0x45 ... so at 0x3a we don't have the start of a statement.
[Bug target/104893] [nvptx] Handle Independent Thread Scheduling for sm_70+ with -msoft-stack
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104893 Tom de Vries changed: What|Removed |Added Resolution|--- |WORKSFORME Status|UNCONFIRMED |RESOLVED --- Comment #2 from Tom de Vries --- (In reply to Tom de Vries from comment #1) > (In reply to Tom de Vries from comment #0) > > The per-thread call stack is handled for .local memory by the CUDA driver. > > > > For the 'soft stack' that's not the case. > > Hmm, actually there's .local memory used, just not "directly". Possibly the > documentation needs updating to point that out. > > So, there doesn't seem to be an issue related to overlapping storage. > > So I wonder, is the stack pointer also per thread then? Or still per-warp? OK, here ( https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203#c6 ) we read: ... The pointer is switched between per-warp global memory and per-lane local memory. ... So, I think this should be fine then. Marking this resolved-worksforme until we run into an actual failing test-case.
[Bug target/104857] [nvptx] Add macro specifying ptx isa version
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104857 Tom de Vries changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED Target Milestone|--- |12.0 --- Comment #3 from Tom de Vries --- Committed.
[Bug target/104714] [nvptx] Means to specify any sm_xx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104714 Tom de Vries changed: What|Removed |Added Target Milestone|--- |12.0 Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #3 from Tom de Vries --- Added march-map.
[Bug driver/105096] New: --target-help not an alias for --help=target
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105096 Bug ID: 105096 Summary: --target-help not an alias for --help=target Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: trivial Priority: P3 Component: driver Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: ---

In common.opt, we read:
...
-target-help
Common Driver
Alias for --help=target.
...

But that doesn't seem to be correct: I get different results. For instance, for the nvptx target, we have an malias option that atm is undocumented, and we have:
...
$ cc1 --target-help 2>&1 | grep malias
$
...
and:
...
$ cc1 --help=target 2>&1 | grep malias
-malias [disabled]
$
...
[Bug c/53037] warn_if_not_aligned(X)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53037 Tom de Vries changed: What|Removed |Added CC||vries at gcc dot gnu.org --- Comment #43 from Tom de Vries --- *** Bug 81909 has been marked as a duplicate of this bug. ***
[Bug target/81909] Missing warning in gcc.dg/pr53037-{2,3}.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81909 Tom de Vries changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |DUPLICATE --- Comment #3 from Tom de Vries --- Marking resolved-duplicate. *** This bug has been marked as a duplicate of bug 53037 ***
[Bug target/81728] nvptx-run: error getting kernel result: the launch timed out and was terminated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81728 Tom de Vries changed: What|Removed |Added Resolution|--- |WORKSFORME Status|UNCONFIRMED |RESOLVED --- Comment #1 from Tom de Vries --- The error message looks like the one produced due to the 5 second watchdog that is installed when the board is running a display manager. Anyway, in order to run into this timeout, the test-case should hang, and it currently doesn't. At least not on any of the boards I've recently tested with, with the supported drivers. So, possibly this was fixed in gcc, or in a driver. There's not enough information in here to reproduce, so closing as resolved-worksforme.
[Bug target/104818] Duplicate word "version" in option -mptx description
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104818 Tom de Vries changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED Target Milestone|--- |12.0 --- Comment #2 from Tom de Vries --- Fixed in aforementioned commit.
[Bug libgomp/105042] [libgomp, GOMP_NVPTX_JIT=-O0] Openacc testsuite failures when X runs on nvidia driver
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105042 Tom de Vries changed: What|Removed |Added Severity|normal |enhancement --- Comment #8 from Tom de Vries --- With the conversation shifted to better error messages, re-classifying as enhancement.
[Bug target/105075] [nvptx] Generate sad insn (sum of absolute differences)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105075 --- Comment #6 from Tom de Vries --- Created attachment 52698 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52698&action=edit Demonstrator patch with stepping stone patterns for combine (In reply to Tom de Vries from comment #2) > Also, I wonder if defining a stepping-stone intermediate pattern could help > combine. Well, that approach seems to work.
[Bug target/105014] [nvptx] FAIL: gcc.dg/pr97459-1.c execution test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105014 --- Comment #6 from Tom de Vries --- Reproducer filed at https://github.com/vries/nvidia-bugs/tree/master/shift-and PR filed at nvidia ( https://developer.nvidia.com/nvidia_bug/3585290 ).
[Bug target/105075] [nvptx] Generate sad insn (sum of absolute differences)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105075 Tom de Vries changed: What|Removed |Added Attachment #52693|0 |1 is obsolete|| --- Comment #3 from Tom de Vries --- Created attachment 52694 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52694&action=edit Demonstrator patch v2 Forgot to add test-case.
[Bug target/105075] [nvptx] Generate sad insn (sum of absolute differences)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105075 --- Comment #2 from Tom de Vries --- AFAIU, at gimple level support is limited to vectors, so that doesn't help to generate the insn for the simple, scalar case. It would be nice if combine could generate it and we wouldn't have to use a peephole, but AFAIU the pattern is too complex for that. I wonder if reformulating using a conditional could help there (ptx isa describes semantics using "c + ((a
[Bug target/105075] [nvptx] Generate sad insn (sum of absolute differences)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105075 --- Comment #1 from Tom de Vries --- Created attachment 52693 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52693&action=edit Demonstrator patch

This patch adds:
- modeling of insn sad.u32 in the .md file
- peephole2 to generate it (which is incomplete, it needs some safety-checks related to using unique intermediate regs)
- extra instance of peephole2 pass (otherwise, the peephole is not triggered)
- extra instance of fast_rtl_dce pass (otherwise, unused intermediate insns are not cleaned up)

So for the usad_2 in the test-case, we have without the patch:
...
cvt.u64.u32 %r32, %r28;
cvt.u64.u32 %r33, %r29;
sub.u64 %r34, %r32, %r33;
abs.s64 %r35, %r34;
cvt.u32.u64 %r36, %r35;
add.u32 %value, %r36, %r30;
...
and with:
...
sad.u32 %value, %r28, %r29, %r30;
...
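For reference, the scalar computation performed by the six-instruction sequence above, and by sad.u32 in a single step, can be sketched in C roughly as follows. This is only an illustration: the function names are hypothetical (the usad_2 source from the test-case is not quoted here), and the conditional form is the usual definition of an unsigned sum of absolute differences rather than text taken from the patch.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the unpatched sequence: widen both
   operands to 64 bits, subtract, take the absolute value, truncate back
   to 32 bits, then add the accumulator.  */
uint32_t
usad_widened (uint32_t a, uint32_t b, uint32_t c)
{
  int64_t diff = (int64_t) a - (int64_t) b;
  if (diff < 0)
    diff = -diff;
  return (uint32_t) diff + c;
}

/* Hypothetical helper computing the same value with a conditional,
   without widening.  */
uint32_t
usad_conditional (uint32_t a, uint32_t b, uint32_t c)
{
  return c + (a < b ? b - a : a - b);
}
```

Both helpers agree for all inputs, including wrap-around of the final addition; for example, with a = 3, b = 10, c = 5 each returns 12.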
[Bug target/105075] New: [nvptx] Generate sad insn (sum of absolute differences)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105075 Bug ID: 105075 Summary: [nvptx] Generate sad insn (sum of absolute differences) Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- ptx has a sad (sum of absolute differences) insn, which is currently not modeled in the .md file.
[Bug libgomp/105042] [libgomp, GOMP_NVPTX_JIT=-O0] Openacc testsuite failures when X runs on nvidia driver
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105042 --- Comment #5 from Tom de Vries --- (In reply to Richard Biener from comment #1) > Doesn't whatever driver/library API we use from libgomp to invoke workloads > report actual errors? Maybe we need to improve there. This: ... libgomp: cuStreamSynchronize error: the launch timed out and was terminated ... seems to be the string for cudaErrorLaunchTimeout, which AFAICT is dedicated to this situation, so we could treat that error code specially in cuda_error in plugin-nvptx.c and emit a custom message. Say: ... libgomp: cuStreamSynchronize error: the launch timed out and was terminated (5 second time-out caused by launching on a device running a display manager) ... Alternatively, we could detect cudaDeviceProp::kernelExecTimeoutEnabled and emit a warning when initializing or before launching the first kernel.
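The first suggestion (special-casing the time-out code when formatting the error) might look roughly like the sketch below. It is deliberately standalone and does not reproduce the real cuda_error in plugin-nvptx.c: the helper name, its signature and the placeholder constant are all assumptions made for illustration, and real code would use the error constant from the CUDA driver headers instead.

```c
#include <stdio.h>

/* Placeholder standing in for the driver's launch time-out error code.
   The value is an assumption for this sketch only.  */
enum { SKETCH_ERROR_LAUNCH_TIMEOUT = 702 };

/* Hypothetical helper: format an error line for the failing driver call
   FN with error string ERRSTR, appending a hint when CODE is the launch
   time-out.  Returns what snprintf returns.  */
int
format_launch_error (char *buf, size_t len, const char *fn, int code,
                     const char *errstr)
{
  const char *hint = "";
  if (code == SKETCH_ERROR_LAUNCH_TIMEOUT)
    hint = " (5 second time-out caused by launching on a device"
           " running a display manager)";
  return snprintf (buf, len, "libgomp: %s error: %s%s", fn, errstr, hint);
}
```

Keeping the hint as a suffix leaves the existing message text intact, so anything that matches on the current wording would presumably keep working.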
[Bug libgomp/105042] [libgomp, GOMP_NVPTX_JIT=-O0] Openacc testsuite failures when X runs on nvidia driver
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105042 --- Comment #4 from Tom de Vries --- https://gcc.gnu.org/pipermail/gcc-patches/2022-March/592275.html
[Bug libgomp/105042] [libgomp, GOMP_NVPTX_JIT=-O0] Openacc testsuite failures when X runs on nvidia driver
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105042 --- Comment #3 from Tom de Vries --- (In reply to Tom de Vries from comment #2) > (In reply to Richard Biener from comment #1) > > Doesn't whatever driver/library API we use from libgomp to invoke workloads > > report actual errors? Maybe we need to improve there. > > Good point, it reported some form of timeout. I'll post the exact form once > I reproduce. It's: ... Execution timeout is: 300 spawn [open ...]^M libgomp: cuStreamSynchronize error: the launch timed out and was terminated FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test ... Googling a bit about this error message ( https://forums.developer.nvidia.com/t/need-to-remove-timeouts-and-the-launch-timed-out-and-was-terminated-message/16741/2 ) shows that running a display manager sets a 5/10 seconds watchdog timer on any kernel.
[Bug target/105014] [nvptx] FAIL: gcc.dg/pr97459-1.c execution test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105014 --- Comment #5 from Tom de Vries ---

Minimal test-case:
...
void __attribute__((noinline))
foo (unsigned long long d0)
{
  unsigned long long __a;
  __a = 0x38;
  for (; __a > 0; __a -= 8)
    if (((d0 >> __a) & 0xff) != 0)
      break;
  __builtin_printf ("__a: 0x%llx\n", __a);
}

int
main (void)
{
  foo (1);
  return 0;
}
...

Different value of __a:
...
$ ./install/bin/nvptx-none-run -O0 ./pr97459-1.exe ; echo; ./install/bin/nvptx-none-run ./pr97459-1.exe
__a: 0x0

__a: 0x30
...
[Bug target/105011] [nvptx] FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105011 Tom de Vries changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Target Milestone|--- |12.0 Resolution|--- |FIXED --- Comment #5 from Tom de Vries --- Fixed by commit.
[Bug libgomp/105042] [libgomp, GOMP_NVPTX_JIT=-O0] Openacc testsuite failures when X runs on nvidia driver
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105042 --- Comment #2 from Tom de Vries --- (In reply to Richard Biener from comment #1) > Doesn't whatever driver/library API we use from libgomp to invoke workloads > report actual errors? Maybe we need to improve there. Good point, it reported some form of timeout. I'll post the exact form once I reproduce.
[Bug target/105014] [nvptx] FAIL: gcc.dg/pr97459-1.c execution test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105014 --- Comment #4 from Tom de Vries --- (In reply to Tom de Vries from comment #1) > With -O0 JIT instead: Also OK with JIT -O1, problems start at JIT -O2.
[Bug target/105011] [nvptx] FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105011 --- Comment #3 from Tom de Vries --- Submitted fix: https://gcc.gnu.org/pipermail/gcc-patches/2022-March/592211.html (though without a changelog, apparently).
[Bug libgomp/105042] New: [libgomp, GOMP_NVPTX_JIT=-O0] Openacc testsuite failures when X runs on nvidia driver
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105042 Bug ID: 105042 Summary: [libgomp, GOMP_NVPTX_JIT=-O0] Openacc testsuite failures when X runs on nvidia driver Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libgomp Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org CC: jakub at gcc dot gnu.org Target Milestone: --- I usually have only an nvidia-compute$n driver package installed, but sometimes (as happened when I updated the system yesterday) also x11-video-nvidia$n, after which X is run on the nvidia card (instead of on the builtin intel graphics). With such a setup, I run into a cluster of FAILs, all for GOMP_NVPTX_JIT=-O0: ... $ grep ^FAIL: 2/libgomp.sum FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 -DGOMP_NVPTX_JIT=-O0 execution test FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vred2d-128.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 -DGOMP_NVPTX_JIT=-O0 execution test FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vred2d-128.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 -DGOMP_NVPTX_JIT=-O0 execution test FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/parallel-dims.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 -DGOMP_NVPTX_JIT=-O0 execution test FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/vred2d-128.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 -DGOMP_NVPTX_JIT=-O0 execution test FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/vred2d-128.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 -DGOMP_NVPTX_JIT=-O0 execution test FAIL: libgomp.oacc-fortran/parallel-dims.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 -DGOMP_NVPTX_JIT=-O0 execution test FAIL: libgomp.oacc-fortran/parallel-dims.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none -O1 -DGOMP_NVPTX_JIT=-O0 execution test FAIL: libgomp.oacc-fortran/parallel-dims.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -Os -DGOMP_NVPTX_JIT=-O0 execution test ... Note that this is with a patch from PR104423 that runs tests both with default JIT optimization and GOMP_NVPTX_JIT=-O0, hence the -DGOMP_NVPTX_JIT=-O0 tag. But it can be reproduced by just doing: ... export GOMP_NVPTX_JIT=-O0 ... It could be that the test-cases just need scaling down. OTOH, it also could be that there's an underlying problem that only surfaces when other processes are run in parallel, or specifically, X. This is on board K2000 with driver 470.103.01. The board has 2GB of memory, and according to nvidia-smi, having the X processes takes a couple of 100MBs, and ./parallel-dims.exe just takes 15MiB, so at first glance it doesn't seem to be an out-of-board-memory thing. I do observe reduced system responsiveness while running the tests, so maybe it's the compute capacity rather than memory which is exhausted.
[Bug target/105019] [nvptx] malias in libgomp results in "Internal error: reference to deleted section"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105019 --- Comment #6 from Tom de Vries --- (In reply to Tom de Vries from comment #4)
> OK, I think this is the pattern:
> ...
> $ cat gcc/testsuite/gcc.target/nvptx/alias-5.c

FTR, same thing if I use static functions:
...
static void __attribute__((noinline))
__f ()
{
  v = 1;
}

static void f () __attribute__ ((alias ("__f")));

static void
g (void)
{
  f ();
}
...
[Bug target/105019] [nvptx] malias in libgomp results in "Internal error: reference to deleted section"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105019 --- Comment #5 from Tom de Vries ---

Creating a CUDA example is hampered by the fact that there's no symbol alias support, AFAICT. I'd like to write something like:
...
__device__ void __foo () { printf ("__foo\n"); }
__device__ void foo () __attribute__((alias ("__foo")));
__device__ void bar () { foo (); }
__global__ void hello_world () { bar (); }
...
(and then comment out bar to reproduce the problem), but all attempts to get this compiled and executed end up in various errors.

Of course we can resort to hand-edited ptx, or adding .alias directives in asm statements, but that's likely to produce an 'unsupported' response from nvidia when filed as a bug report.
[Bug target/105019] [nvptx] malias in libgomp results in "Internal error: reference to deleted section"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105019 --- Comment #4 from Tom de Vries ---

OK, I think this is the pattern:
...
$ cat gcc/testsuite/gcc.target/nvptx/alias-5.c
/* { dg-do link } */
/* { dg-do run { target runtime_ptx_isa_version_6_3 } } */
/* { dg-options "-save-temps -malias -mptx=6.3" } */

int v;

void __attribute__((noinline))
__f ()
{
  v = 1;
}

void f () __attribute__ ((alias ("__f")));

void
g (void)
{
  f ();
}

int
main (void)
{
  if (v != 0)
    __builtin_abort ();
  return 0;
}
...

There's a function __f:
...
// BEGIN GLOBAL FUNCTION DEF: __f
.visible .func __f
{
.reg .u32 %r22;
mov.u32 %r22,1;
st.global.u32 [v],%r22;
ret;
}
...
with alias f:
...
// BEGIN GLOBAL FUNCTION DECL: __f
.visible .func __f;
.visible .func f;
.alias f,__f;
...
called from g:
...
// BEGIN GLOBAL FUNCTION DEF: g
.visible .func g
{
{
call f;
}
ret;
}
...

However, g is unused. So we have:
...
PASS: gcc.target/nvptx/alias-5.c (test for excess errors)
spawn nvptx-none-run ./alias-5.exe^M
fatal : Internal error: reference to deleted section^M
nvptx-run: cuLinkComplete failed: unknown error (CUDA_ERROR_UNKNOWN, 999)^M
FAIL: gcc.target/nvptx/alias-5.c execution test
...

Calling g from main fixes the internal error.
[Bug target/105019] [nvptx] malias in libgomp results in "Internal error: reference to deleted section"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105019 --- Comment #3 from Tom de Vries --- Aliases in failing .exe: ... $ strings declare_target-1.exe | grep "\.alias" .alias gomp_ialias_GOMP_taskgroup_start,GOMP_taskgroup_start; .alias gomp_ialias_GOMP_taskgroup_end,GOMP_taskgroup_end; .alias gomp_ialias_GOMP_taskgroup_reduction_register,GOMP_taskgroup_reduction_register; ...
[Bug target/105019] [nvptx] malias in libgomp results in "Internal error: reference to deleted section"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105019 --- Comment #2 from Tom de Vries --- Aliases in libgomp.a: ... $ grep "\.alias" build-gcc-offload-nvptx-none/nvptx-none/mgomp/libgomp/.libs/libgomp.a .alias gomp_ialias_GOMP_loop_runtime_next,GOMP_loop_runtime_next; .alias gomp_ialias_GOMP_loop_ull_runtime_next,GOMP_loop_ull_runtime_next; .alias gomp_ialias_GOMP_parallel_end,GOMP_parallel_end; .alias gomp_ialias_GOMP_taskgroup_start,GOMP_taskgroup_start; .alias gomp_ialias_GOMP_taskgroup_end,GOMP_taskgroup_end; .alias gomp_ialias_GOMP_taskgroup_reduction_register,GOMP_taskgroup_reduction_register; .alias gomp_ialias_omp_capture_affinity,omp_capture_affinity; .alias gomp_ialias_omp_aligned_alloc,omp_aligned_alloc; .alias gomp_ialias_omp_free,omp_free; .alias gomp_ialias_omp_aligned_calloc,omp_aligned_calloc; ...
[Bug target/105019] [nvptx] malias in libgomp results in "Internal error: reference to deleted section"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105019 --- Comment #1 from Tom de Vries --- To trigger: ... diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc index 87efc23bd96..8bf9ea90a77 100644 --- a/gcc/config/nvptx/nvptx.cc +++ b/gcc/config/nvptx/nvptx.cc @@ -245,6 +245,9 @@ default_ptx_version_option (void) warp convergence. */ res = MAX (res, PTX_VERSION_6_0); + /* Pick at least 6.3, to enable using malias. */ + res = MAX (res, PTX_VERSION_6_3); + /* Verify that we pick a version that supports the sm. */ gcc_assert (first <= res); return res; diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt index 11288d1a8ee..a4aece80682 100644 --- a/gcc/config/nvptx/nvptx.opt +++ b/gcc/config/nvptx/nvptx.opt @@ -87,7 +87,7 @@ mptx-comment Target Var(nvptx_comment) Init(1) Undocumented malias- -Target Var(nvptx_alias) Init(0) Undocumented +Target Var(nvptx_alias) Init(1) Undocumented mexperimental Target Var(nvptx_experimental) Init(0) Undocumented ... rebuild gcc, run libgomp tests, and: ... $ grep -c "Internal error: reference to deleted section" libgomp.log 637 ... For instance: ... Execution timeout is: 300 spawn [open ...]^M libgomp: Link error log fatal : Internal error: reference to deleted section libgomp: cuLinkComplete error: unknown error libgomp: Cannot map target functions or variables (expected 2, have 4294967295) FAIL: libgomp.c/../libgomp.c-c++-common/declare_target-1.c execution test ...
[Bug target/105019] New: [nvptx] malias in libgomp results in "Internal error: reference to deleted section"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105019 Bug ID: 105019 Summary: [nvptx] malias in libgomp results in "Internal error: reference to deleted section" Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: ---

As mentioned in the commit message for malias:
...
When enabling malias by default, libgomp detects alias support and consequently libgomp.a will contains a few uses of .alias. This however results in aforementioned "Internal error: reference to deleted section" in many test-cases. Either there's some error with how .alias is used, or there's a driver bug. While this issue is not resolved, we keep malias off-by-default.
...

This needs to be investigated, and if it's a driver bug, reported to nvidia, or otherwise fixed or worked around.

Note: the same error showed up in a test-case where the call to an alias was inlined, and consequently the alias referenced a defined but unused function. To observe this, disable this bit:
...
  if (!cgraph_node::get (name)->referred_to_p ())
    /* Prevent "Internal error: reference to deleted section". */
    return;
...
in nvptx_asm_output_def_from_decls and run nvptx.exp=alias-2.c:
...
PASS: gcc.target/nvptx/alias-2.c (test for excess errors)
spawn nvptx-none-run ./alias-2.exe^M
fatal : Internal error: reference to deleted section^M
nvptx-run: cuLinkComplete failed: unknown error (CUDA_ERROR_UNKNOWN, 999)^M
FAIL: gcc.target/nvptx/alias-2.c execution test
...

So it's possible that the error is somehow related to this scenario.
[Bug target/105014] [nvptx] FAIL: gcc.dg/pr97459-1.c execution test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105014 --- Comment #3 from Tom de Vries --- (In reply to Tom de Vries from comment #2) > (In reply to Tom de Vries from comment #0) > > On a quadro k2000 with driver 470.103.01, I run into: > > So, sm_30. > > > ... > > FAIL: gcc.dg/pr97459-1.c execution test > > Reproduced on geforce gt710 (sm_35), with same driver. But not on quadro k620 (sm_50).
[Bug target/105014] [nvptx] FAIL: gcc.dg/pr97459-1.c execution test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105014 --- Comment #2 from Tom de Vries --- (In reply to Tom de Vries from comment #0) > On a quadro k2000 with driver 470.103.01, I run into: So, sm_30. > ... > FAIL: gcc.dg/pr97459-1.c execution test Reproduced on geforce gt710 (sm_35), with same driver.
[Bug target/105018] [nvptx] Need better alias support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105018 --- Comment #2 from Tom de Vries --- As mentioned before by amonakov, a possibility is to add alias support to the nvptx-tools linker, and use that.
[Bug target/105018] [nvptx] Need better alias support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105018 --- Comment #1 from Tom de Vries --- (In reply to Tom de Vries from comment #0)
> Aliases to aliases are not supported (see libgomp.c-c++-common/pr96390.c).
> This is currently not prohibited by the compiler, but with the driver link we
> run into: "Internal error: alias to unknown symbol" .

And that is the reason that libgomp.c-c++-common/pr96390.c and friends don't pass when I do:
...
/* { dg-additional-options "-foffload=-mptx=6.3 -foffload=-malias" { target offload_target_nvptx } } */
...
[Bug target/105018] New: [nvptx] Need better alias support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105018 Bug ID: 105018 Summary: [nvptx] Need better alias support Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: ---

We currently have alias support enabled by malias, which relies on the ptx .alias directive. There are a number of limitations, listed in the commit adding malias:
...
- Only function aliases are supported.
- Weak aliases are not supported. That is, if I disable the check in nvptx_asm_output_def_from_decls that disallows this, a weak alias is emitted and parsed by the driver. But the test gcc.dg/globalalias.c starts failing, with the behaviour matching the comment about "weird behavior of AIX's .set pseudo-op": a weak alias may resolve to different functions in different files.
- Aliases to weak symbols are not supported (see gcc.dg/localalias.c). This is currently not prohibited by the compiler, but with the driver link we run into: "error: Function test with .weak scope cannot be aliased".
- Aliases to aliases are not supported (see libgomp.c-c++-common/pr96390.c). This is currently not prohibited by the compiler, but with the driver link we run into: "Internal error: alias to unknown symbol" .
- Unreferenced aliases are not emitted (these can occur f.i. when inlining a call to an alias). This avoids driver link error "Internal error: reference to deleted section".
...

We'd like an implementation that doesn't have (all of) these limitations.
[Bug target/97106] [nvptx] Issues with weak aliases introduced by C++
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97106 Tom de Vries changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Target Milestone|--- |12.0 Resolution|--- |FIXED --- Comment #5 from Tom de Vries ---

Using the test-case from comment 0 and:
...
/* { dg-additional-options "-foffload=-malias -foffload=-mptx=6.3 -O0" } */
...
I get:
...
$ strings test.exe | grep -i alias
.alias _ZN1VILi1EEC1ImvEET_,_ZN1VILi1EEC2ImvEET_;
...
so I see a normal alias, not a weak alias.

The test-case still fails in the abort. Note that I get the same result with:
...
/* { dg-additional-options "-foffload=-mno-alias -foffload=-mptx=6.3 -O2" } */
...

There may be a problem with the test-case, or there may be a problem with nvptx c++ support, but the alias issue seems to have been addressed, so I'm closing this one.
[Bug target/97106] [nvptx] Issues with weak aliases introduced by C++
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97106 Bug 97106 depends on bug 97102, which changed state. Bug 97102 Summary: [nvptx] PTX JIT compilation failed when using aliases https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97102 What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED
[Bug target/97102] [nvptx] PTX JIT compilation failed when using aliases
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97102 Tom de Vries changed: What|Removed |Added Target Milestone|--- |12.0 Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #8 from Tom de Vries --- The test-case from comment 0 now works in combination with: /* { dg-additional-options "-foffload=-malias -foffload=-mptx=6.3" } */
[Bug libgomp/98215] Coalescing memory in target region creates slower code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98215 Tom de Vries changed: What|Removed |Added Severity|normal |enhancement Keywords||missed-optimization CC||vries at gcc dot gnu.org
[Bug target/104916] [nvptx] Handle Independent Thread Scheduling for sm_70+ with -muniform-simt
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104916 Tom de Vries changed: What|Removed |Added Target Milestone|--- |12.0
[Bug target/104916] [nvptx] Handle Independent Thread Scheduling for sm_70+ with -muniform-simt
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104916 Tom de Vries changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #5 from Tom de Vries --- Fixed.
[Bug target/104783] [nvptx, openmp] Hang/abort with atomic update in simd construct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104783 Tom de Vries changed: What|Removed |Added Resolution|--- |FIXED Target Milestone|--- |12.0 Status|UNCONFIRMED |RESOLVED --- Comment #8 from Tom de Vries --- Fixed.
[Bug target/104957] [nvptx] Use .alias directive (available starting ptx isa version 6.3)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104957 Tom de Vries changed: What|Removed |Added Target Milestone|--- |12.0 Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #5 from Tom de Vries --- Committed.
[Bug target/104925] [nvptx] Use "%" as register prefix
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104925 Tom de Vries changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED Target Milestone|--- |12.0 --- Comment #2 from Tom de Vries --- Fixed.
[Bug libgcc/105016] [libgcc, TARGET_HAS_NO_HW_DIVIDE] Incorrect result for __udivmodti4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105016 --- Comment #3 from Tom de Vries ---

In libgcc.h, I see:
...
#define __udivmoddi4 __NDW(udivmod,4)
...
and for LIBGCC2_UNITS_PER_WORD == 8 we have:
...
#define __NDW(a,b) __ ## a ## ti ## b
...

So, AFAICT it's possible that __udivmoddi4 is mapped to __udivmodti4.
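The token pasting can be checked in isolation with a small sketch like the one below. The SKETCH_ macro names and the helper function are hypothetical, chosen to avoid clashing with the real libgcc definitions; only the pasting pattern itself is taken from the quoted macro for the 8-byte word case.

```c
/* Same pasting pattern as the quoted __NDW for the 8-byte word case.  */
#define SKETCH_NDW(a, b) __ ## a ## ti ## b

/* Two-level stringification, so that the argument is macro-expanded
   before being turned into a string.  */
#define SKETCH_STR1(x) #x
#define SKETCH_STR(x) SKETCH_STR1 (x)

/* Hypothetical helper: returns the symbol name that a mapping of the
   form __NDW(udivmod,4) resolves to.  */
const char *
mapped_udivmod_name (void)
{
  return SKETCH_STR (SKETCH_NDW (udivmod, 4));
}
```

The helper returns the string "__udivmodti4", which is consistent with the conclusion that the name written as __udivmoddi4 in the source ends up as the ti variant on such targets.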
[Bug libgcc/105016] [libgcc, TARGET_HAS_NO_HW_DIVIDE] Incorrect result for __udivmodti4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105016 --- Comment #1 from Tom de Vries --- Created attachment 52662 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52662&action=edit test-case
[Bug libgcc/105016] New: [libgcc, TARGET_HAS_NO_HW_DIVIDE] Incorrect result for __udivmodti4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105016 Bug ID: 105016 Summary: [libgcc, TARGET_HAS_NO_HW_DIVIDE] Incorrect result for __udivmodti4 Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libgcc Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- While investigating PR105014, I copied the TARGET_HAS_NO_HW_DIVIDE implementation of __udivmoddi4 to the test-case. On x86, when using the native __udivmodti4, I get: ... $ ./a.out a : 0xfffb b : 0x0001 div : 0xfffb mod : 0x ... But with the TARGET_HAS_NO_HW_DIVIDE version I get instead: ... a : 0xfffb b : 0x0001 div : 0x mod : 0xfffc ...
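One way to cross-check a candidate quotient/remainder pair on a host with working 128-bit division is to compare it against the compiler's own / and % operators, as in the sketch below. The helper name is hypothetical, and the code assumes a compiler that provides unsigned __int128 (such as GCC or Clang on a 64-bit target) and a non-zero divisor.

```c
/* 128-bit unsigned type; a GCC/Clang extension, assumed available.  */
__extension__ typedef unsigned __int128 u128;

/* Hypothetical checker: returns 1 if DIV and MOD are the quotient and
   remainder of A / B according to the native operators, 0 otherwise.
   B must be non-zero.  */
int
udivmod_matches_native (u128 a, u128 b, u128 div, u128 mod)
{
  return div == a / b && mod == a % b;
}
```

For the inputs in this report (a equal to -4 converted to the unsigned 128-bit type, b equal to 1), the only pair that passes is div equal to a and mod equal to 0, so any non-zero mod is rejected.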
[Bug target/105014] [nvptx] FAIL: gcc.dg/pr97459-1.c execution test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105014 --- Comment #1 from Tom de Vries ---

First FAIL minimizes to:
...
typedef __uint128_t T;

union u
{
  T t;
  struct { unsigned long long x; unsigned long long y; } xy;
};

#define PRINT(VAR) \
  do \
    { \
      __builtin_printf (#VAR ": lo: %llx\n", VAR.xy.x); \
      __builtin_printf (#VAR ": hi: %llx\n", VAR.xy.y); \
    } \
  while (0)

extern T __udivmodti4 (T, T, T *);

int
main (void)
{
  union u a, b, mod, div;
  a.t = -4;
  b.t = 1;
  PRINT (a);
  PRINT (b);
  div.t = __udivmodti4 (a.t, b.t, &mod.t);
  PRINT (div);
  PRINT (mod);
  if (mod.t != 0)
    __builtin_abort ();
  return 0;
}
...

Fails like this:
...
$ ./install/bin/nvptx-none-run ./pr97459-1.exe
a: lo: fffc
a: hi:
b: lo: 1
b: hi: 0
div: lo: fffd
div: hi:
mod: lo:
mod: hi: 0
nvptx-run: error getting kernel result: unspecified launch failure (CUDA_ERROR_LAUNCH_FAILED, 719)
$
...

With -O0 JIT instead:
...
$ ./install/bin/nvptx-none-run -O0 ./pr97459-1.exe
a: lo: fffc
a: hi:
b: lo: 1
b: hi: 0
div: lo: fffc
div: hi:
mod: lo: 0
mod: hi: 0
...
[Bug target/105014] New: [nvptx] FAIL: gcc.dg/pr97459-1.c execution test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105014 Bug ID: 105014 Summary: [nvptx] FAIL: gcc.dg/pr97459-1.c execution test Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- On a quadro k2000 with driver 470.103.01, I run into: ... FAIL: gcc.dg/pr97459-1.c execution test FAIL: gcc.dg/pr97459-2.c execution test FAIL: gcc.dg/pr97459-3.c execution test FAIL: gcc.dg/pr97459-4.c execution test FAIL: gcc.dg/pr97459-5.c execution test FAIL: gcc.dg/pr97459-6.c execution test ...
[Bug target/105011] [nvptx] FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105011

--- Comment #2 from Tom de Vries ---
Even better:
...
diff --git a/libatomic/tas_n.c b/libatomic/tas_n.c
index d0d8c283b495..65eaa7753a51 100644
--- a/libatomic/tas_n.c
+++ b/libatomic/tas_n.c
@@ -73,7 +73,7 @@ SIZE(libat_test_and_set) (UTYPE *mptr, int smodel)
                                     __ATOMIC_RELAXED, __ATOMIC_RELAXED));
   post_barrier (smodel);
-  return woldval != 0;
+  return (woldval & wval) == wval;
 }
 #define DONE 1
...

That also gives back accurate results in case TARGET_ATOMIC_TEST_AND_SET_TRUEVAL has more than one bit set.
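To make the difference between the two proposed return expressions concrete, here is a minimal standalone sketch. It is not the libatomic code: the helper names `was_set_any` and `was_set_all` are hypothetical, and the true-value is passed as a plain parameter rather than taken from the target macro. The assumption being illustrated is that the two checks only disagree when the flag's bits hold some, but not all, of the bits of a multi-bit true-value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Check in the style of comment #1: true if ANY bit of the shifted
   true-value was set in the old word.  (Hypothetical helper.)  */
static bool
was_set_any (uint64_t woldval, uint64_t trueval, unsigned shift)
{
  uint64_t wval = trueval << shift;
  return (woldval & wval) != 0;
}

/* Check in the style of comment #2: true only if EVERY bit of the
   shifted true-value was set in the old word.  (Hypothetical helper.)  */
static bool
was_set_all (uint64_t woldval, uint64_t trueval, unsigned shift)
{
  uint64_t wval = trueval << shift;
  return (woldval & wval) == wval;
}
```

With a single-bit true-value such as 1, both helpers agree for every input. With a hypothetical two-bit true-value of 3, an old word of 1 makes `was_set_any` return true while `was_set_all` returns false.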
[Bug target/105011] [nvptx] FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105011

--- Comment #1 from Tom de Vries ---
(In reply to Tom de Vries from comment #0)
> It should probably do something like:
> ...
> return (woldval & wval) != 0;
> ...

Indeed, that fixes the FAILs.
[Bug target/105011] New: [nvptx] FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105011

Bug ID: 105011
Summary: [nvptx] FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: vries at gcc dot gnu.org
Target Milestone: ---

On a quadro k2000 with driver 470.103.01 I'm running into this cluster of FAILs:
...
FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test
FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O3 -g execution test
FAIL: gcc.dg/atomic/stdatomic-flag.c -O3 -g execution test
...

Minimizing the first FAIL, I end up with:
...
#include <stdatomic.h>

extern void abort (void);

atomic_flag a = ATOMIC_FLAG_INIT;

int
main ()
{
  int b;
  if ((atomic_flag_test_and_set) (&a))
    abort ();
  return 0;
}
...

The atomic access is done by libatomic, using a 64-bit cas loop.

If we print the address of a, we have:
...
&a: 000700700200
...
so the pointer is already 64-bit aligned.

If we print the 64-bit value of *(unsigned long long *)&a before and after the test-and-set, we have:
...
a: 00024f00
a: 00024f01
...
so that looks all right as well.

At first glance, the problem is in libatomic, tas_n.c:
...
  wval = (UWORD)__GCC_ATOMIC_TEST_AND_SET_TRUEVAL << shift;
  woldval = __atomic_load_n (wptr, __ATOMIC_RELAXED);
  do
    {
      t = woldval | wval;
    }
  while (!atomic_compare_exchange_w (wptr, &woldval, t, true,
                                     __ATOMIC_RELAXED, __ATOMIC_RELAXED));
  post_barrier (smodel);
  return woldval != 0;
...

What is returned is woldval != 0, but that tests the entire word, not just the byte we're interested in.

It should probably do something like:
...
  return (woldval & wval) != 0;
...
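The diagnosis above can be modelled with a small standalone sketch. This is not the libatomic implementation: the names `tas_whole_word` and `tas_masked` are hypothetical, `TRUEVAL` is an assumed stand-in for `__GCC_ATOMIC_TEST_AND_SET_TRUEVAL`, and the compare-exchange loop is replaced by a plain, non-atomic store for brevity. The point is only to show how neighbouring nonzero bytes in the containing word make the whole-word test report a clear flag as already set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-in for __GCC_ATOMIC_TEST_AND_SET_TRUEVAL.  */
#define TRUEVAL 1u

/* Models the reported behaviour: the old WORD is tested, so any
   nonzero neighbouring byte makes the flag look already set.  */
static bool
tas_whole_word (uint64_t *wptr, unsigned shift)
{
  uint64_t wval = (uint64_t) TRUEVAL << shift;
  uint64_t woldval = *wptr;
  *wptr = woldval | wval;	/* non-atomic stand-in for the cas loop */
  return woldval != 0;
}

/* Models the suggested change: only the flag's own bits are tested.  */
static bool
tas_masked (uint64_t *wptr, unsigned shift)
{
  uint64_t wval = (uint64_t) TRUEVAL << shift;
  uint64_t woldval = *wptr;
  *wptr = woldval | wval;	/* non-atomic stand-in for the cas loop */
  return (woldval & wval) != 0;
}
```

Starting from a word like the 00024f00 printed above, with the flag in the lowest byte (shift 0), `tas_whole_word` returns true on the very first call even though the flag byte was clear, whereas `tas_masked` returns false on the first call and true on the second.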
[Bug middle-end/105001] If executing with non-nvptx offloading, but nvptx offloading compilation is enabled: FAIL: libgomp.c/pr104783.c execution test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105001 --- Comment #1 from Tom de Vries --- Interesting. Can you compare dump files to see where the difference comes from?
[Bug target/104936] [nvptx] Handle weak decl/def distinction in common code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104936

Tom de Vries changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
           Keywords|                            |internal-improvement
[Bug target/104991] New: [nvptx] Simplify muniform-simt transformation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104991

Bug ID: 104991
Summary: [nvptx] Simplify muniform-simt transformation
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: vries at gcc dot gnu.org
Target Milestone: ---

The muniform-simt reorg pass transforms the insn stream, both inside and outside an SIMT region.

The transform rewrites atomic insns by adding a predicate to execute in one thread only outside the SIMT region. Furthermore, if the atomic insn has a result, a shuffle is added to propagate the result to all threads in the warp.

Inside the SIMT region, the predicate evaluates to true such that all threads in the warp execute it. And the source lane register for the shuffle is set such that the shuffle is a nop inside the SIMT region.

However, since we've started using shfl.sync for the shuffle, the shuffle now has its own predicate, and consequently having a source lane register with different values inside and outside the SIMT region is no longer necessary.
[Bug target/104968] [nvptx][OpenMP] SIGSEGV / ICE in final_scan_insn_1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104968

Tom de Vries changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
   Target Milestone|---                         |12.0
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #7 from Tom de Vries ---
Fixed by https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=356e2720e9030927579024c2f060d665a0b9080f .
[Bug target/104952] [nvptx][OpenMP] wrong code with OR / AND reduction ('reduction(||:' and '&&') with SIMT
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104952

Tom de Vries changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
   Target Milestone|---                         |12.0
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #12 from Tom de Vries ---
Fixed by "[openmp] Fix SIMT reduction using TRUTH_{AND,OR}IF_EXPR".
[Bug target/104968] [nvptx][OpenMP] SIGSEGV / ICE in final_scan_insn_1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104968

--- Comment #6 from Tom de Vries ---
(In reply to Tom de Vries from comment #5)
> This patch fixes the ICE at openmp level:
> ...
> diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
> index 139a0de6100..19af384c634 100644
> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -13361,6 +13361,7 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
>        g = gimple_build_bind (NULL_TREE, gfor, NULL_TREE);
>        g = gimple_build_omp_task (g, task_clauses, NULL_TREE, NULL_TREE,
>                                   NULL_TREE, NULL_TREE, NULL_TREE);
> +      gimple_set_location (g, EXPR_LOCATION (*expr_p));
>        gimple_omp_task_set_taskloop_p (g, true);
>        g = gimple_build_bind (NULL_TREE, g, NULL_TREE);
>        gomp_for *gforo
> ...

Submitted a more complete patch here ( https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591954.html ).
[Bug target/104957] [nvptx] Use .alias directive (available starting ptx isa version 6.3)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104957

Tom de Vries changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
[Bug target/104968] [nvptx][OpenMP] SIGSEGV / ICE in final_scan_insn_1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104968

Tom de Vries changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #5 from Tom de Vries ---
This patch fixes the ICE at openmp level:
...
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 139a0de6100..19af384c634 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -13361,6 +13361,7 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
       g = gimple_build_bind (NULL_TREE, gfor, NULL_TREE);
       g = gimple_build_omp_task (g, task_clauses, NULL_TREE, NULL_TREE,
                                  NULL_TREE, NULL_TREE, NULL_TREE);
+      gimple_set_location (g, EXPR_LOCATION (*expr_p));
       gimple_omp_task_set_taskloop_p (g, true);
       g = gimple_build_bind (NULL_TREE, g, NULL_TREE);
       gomp_for *gforo
...
[Bug target/104968] [nvptx][OpenMP] SIGSEGV / ICE in final_scan_insn_1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104968 --- Comment #4 from Tom de Vries --- This ( https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591912.html ) proposed patch fixes this ICE, pinged again.
[Bug target/104968] [nvptx][OpenMP] SIGSEGV / ICE in final_scan_insn_1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104968

--- Comment #3 from Tom de Vries ---
(In reply to Tom de Vries from comment #2)
> (In reply to Tom de Vries from comment #1)
> > Can't reproduce.
> >
> > Is this not fixed by:
> > ...
> > commit 7862f6ccd85a001e4d70abb00bb95d8c7846ba80
> > Author: Tom de Vries
> > Date: Wed Feb 23 09:33:33 2022 +0100
> >
> > [nvptx] Fix dummy location in gen_comment
> > ...
> > ?
>
> Hmm, wait, of course I have a patch in my stack that's pending for upstream.
> Let me undo that one and retry.

Ack, reproduced.
[Bug target/104968] [nvptx][OpenMP] SIGSEGV / ICE in final_scan_insn_1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104968

--- Comment #2 from Tom de Vries ---
(In reply to Tom de Vries from comment #1)
> Can't reproduce.
>
> Is this not fixed by:
> ...
> commit 7862f6ccd85a001e4d70abb00bb95d8c7846ba80
> Author: Tom de Vries
> Date: Wed Feb 23 09:33:33 2022 +0100
>
> [nvptx] Fix dummy location in gen_comment
> ...
> ?

Hmm, wait, of course I have a patch in my stack that's pending for upstream. Let me undo that one and retry.
[Bug target/104968] [nvptx][OpenMP] SIGSEGV / ICE in final_scan_insn_1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104968

--- Comment #1 from Tom de Vries ---
Can't reproduce.

Is this not fixed by:
...
commit 7862f6ccd85a001e4d70abb00bb95d8c7846ba80
Author: Tom de Vries
Date: Wed Feb 23 09:33:33 2022 +0100

[nvptx] Fix dummy location in gen_comment
...
?
[Bug target/104952] [nvptx][OpenMP] wrong code with OR / AND reduction ('reduction(||:' and '&&') with SIMT
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104952

--- Comment #9 from Tom de Vries ---
Created attachment 52647
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52647&action=edit
Tentative patch with test-cases, rationale and changelog

I'll put this through testing, and submit if no problems are found.
[Bug target/104952] [nvptx][OpenMP] wrong code with OR / AND reduction ('reduction(||:' and '&&') with SIMT
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104952

--- Comment #8 from Tom de Vries ---
(In reply to Jakub Jelinek from comment #6)
> And yes, #c1 is valid.

Thanks for confirming.

> But would be nice to have similar test with && and
> initial result = 2; and arr[] say { 1, 2, 3, 4, 5, 6, 7, ..., 32 } and test
> result is 1 at the end to make sure we don't actually do just
> orig = orig & (private != 0)
> style merging or even just
> orig = orig & private;

Ack, will add that.
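As a rough illustration of why the suggested test starts from result = 2, here is a sequential model in plain C. It is a sketch only and does not use OpenMP or SIMT: each array element is treated as one "lane" whose private copy starts at 1 (the identity for &&), and the three merge functions correspond to the correct logical merge and the two bitwise styles named in the quoted comment. All function names are hypothetical.

```c
/* One simulated lane: the private copy starts at the && identity
   and is combined with a single array element.  */
static int
lane_private (int elem)
{
  int priv = 1;
  priv = priv && elem;
  return priv;
}

/* Correct merge: logical AND, which normalizes the result to 0/1.  */
static int
merge_logical (int orig, int priv)
{
  return orig && priv;
}

/* Suspect style 1: orig = orig & (private != 0).  */
static int
merge_bitand_bool (int orig, int priv)
{
  return orig & (priv != 0);
}

/* Suspect style 2: orig = orig & private.  */
static int
merge_bitand (int orig, int priv)
{
  return orig & priv;
}

/* Fold all lanes into the original variable with the given merge.  */
static int
reduce_and (int init, const int *arr, int n, int (*merge) (int, int))
{
  int result = init;
  for (int i = 0; i < n; i++)
    result = merge (result, lane_private (arr[i]));
  return result;
}
```

In this model, with result initialized to 2 and arr holding 1 through 32, the logical merge ends at 1, while both bitwise styles compute 2 & 1 on the first merge and end at 0, so a check for result == 1 would distinguish them.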
[Bug target/104952] [nvptx][OpenMP] wrong code with OR / AND reduction ('reduction(||:' and '&&') with SIMT
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104952

--- Comment #7 from Tom de Vries ---
Alternative fix that doesn't require fiddling with the 'code' var:
...
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index d932d74cb03..d0ddd4a6142 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -6734,7 +6734,10 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
                  x = build_call_expr_internal_loc
                    (UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY,
                     TREE_TYPE (ivar), 2, ivar, simt_lane);
-                 x = build2 (code, TREE_TYPE (ivar), ivar, x);
+                 /* Make sure x is evaluated unconditionally.  */
+                 tree bfly_var = create_tmp_var (TREE_TYPE (ivar));
+                 gimplify_assign (bfly_var, x, &llist[2]);
+                 x = build2 (code, TREE_TYPE (ivar), ivar, bfly_var);
                  gimplify_assign (ivar, x, &llist[2]);
                }
              tree ivar2 = ivar;
...
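The effect of hoisting the exchange into a temporary can be sketched in plain C. This is only an analogy, under the assumption that 'code' is a short-circuit operator such as TRUTH_ANDIF_EXPR and that the butterfly exchange needs to be executed regardless of the local value: `fake_xchg_bfly` is a hypothetical stand-in for IFN_GOMP_SIMT_XCHG_BFLY that just counts its calls, and the two merge functions mirror the shape of the expression before and after the patch.

```c
/* Counts how often the stand-in exchange is actually evaluated.  */
static int xchg_calls;

/* Hypothetical stand-in for the SIMT butterfly exchange; it returns
   its argument unchanged and records that it was called.  */
static int
fake_xchg_bfly (int val)
{
  xchg_calls++;
  return val;
}

/* Shape before the patch: the exchange is the right operand of &&,
   so it is skipped whenever ivar is already false.  */
static int
merge_conditional (int ivar)
{
  return ivar && fake_xchg_bfly (ivar);
}

/* Shape after the patch: the exchange is evaluated into a temporary
   first, then combined with the short-circuit operator.  */
static int
merge_unconditional (int ivar)
{
  int bfly_var = fake_xchg_bfly (ivar);
  return ivar && bfly_var;
}
```

Calling `merge_conditional (0)` leaves `xchg_calls` unchanged, while `merge_unconditional (0)` increments it; both return 0. On real SIMT hardware the exchanged value would come from another lane, which this sequential model does not attempt to reproduce.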