[Bug debug/111409] Invalid .debug_macro.dwo macro information for split DWARF

2024-05-21 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111409

Tom de Vries  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hjl.tools at gmail dot com

--- Comment #6 from Tom de Vries  ---
*** Bug 87472 has been marked as a duplicate of this bug. ***

[Bug debug/87472] Unknown macro opcode with -gsplit-dwarf -g3

2024-05-21 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87472

Tom de Vries  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
                 CC|                            |vries at gcc dot gnu.org
         Resolution|---                         |DUPLICATE

--- Comment #4 from Tom de Vries  ---
Duplicate.

*** This bug has been marked as a duplicate of bug 111409 ***

[Bug debug/111409] Invalid .debug_macro.dwo macro information for split DWARF

2024-05-21 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111409

Tom de Vries  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vries at gcc dot gnu.org
   Target Milestone|---                         |14.0

--- Comment #5 from Tom de Vries  ---
Fixed in 14.1 release.  Corresponding target milestone not available, so using
14.0.

[Bug debug/115066] [debug, gsplit-dwarf, gdwarf-4, g3] DW_MACRO_define_strp used for debug_str_offsets index

2024-05-16 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115066

Tom de Vries  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|REOPENED                    |RESOLVED

--- Comment #11 from Tom de Vries  ---
(In reply to Rainer Orth from comment #10)
> The new test currently FAILs on Solaris/SPARC with the native as:
> 
> FAIL: gcc.dg/pr115066.c scan-assembler \.byte\t0xb\t# Define macro strx
> 
> The relevant snippet of pr115066.s is
> 
> .section ".debug_macro.dwo",#exclude,#progbits
> .LLdebug_macro0:
> .uahalf 0x4 ! DWARF macro version number
> .byte   0x2 ! Flags: 32-bit, lineptr present
> .uaword .LLskeleton_debug_line0
> .byte   0x1 ! Define macro
> 
> while when using gas, I have
> 
> .section .debug_macro.dwo,"e",@progbits
> .LLdebug_macro0: 
> .uahalf 0x4 ! DWARF macro version number
> .byte   0x2 ! Flags: 32-bit, lineptr present
> .uaword .LLskeleton_debug_line0
> .byte   0xb ! Define macro strx
> 
> AFAICS from dwarf2out.cc (output_macinfo_op), the requirements for using
> DW_MACRO_define_strx are (among others)
> !DWARF2_INDIRECT_STRING_SUPPORT_MISSING_ON_TARGET && SECTION_MERGE.
> 
> However, with the native assembler, SHF_MERGE doesn't work (as emits
> something
> ld cannot link).
> 
> I wonder how best to handle this: just skip the test on sparc*-sun-solaris2*
> && !gas?  Theoretically, there could be other targets with similar issues.

This looks like a test-case issue, so re-closing the PR.

How about:
...
-/* { dg-final { scan-assembler {\.byte\t0xb\t[^\n\r]* Define macro strx} } } */
+/* { dg-final { scan-assembler {\.byte\t0xb\t[^\n\r]* Define macro strx|\.byte\t0x1\t[^\n\r]* Define macro} } } */
...
?

[Bug debug/115066] [debug, gsplit-dwarf, gdwarf-4, g3] DW_MACRO_define_strp used for debug_str_offsets index

2024-05-14 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115066

Tom de Vries  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |15.0
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #9 from Tom de Vries  ---
Fixed.

[Bug debug/115066] [debug, gsplit-dwarf, gdwarf-4, g3] DW_MACRO_define_strp used for debug_str_offsets index

2024-05-14 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115066

--- Comment #7 from Tom de Vries  ---
Submitted here ( https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651586.html
).

[Bug debug/115066] [debug, gsplit-dwarf, gdwarf-4, g3] DW_MACRO_define_strp used for debug_str_offsets index

2024-05-13 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115066

--- Comment #6 from Tom de Vries  ---
(In reply to Jakub Jelinek from comment #5)
> Just make it if (dwarf_split_debug_info) then?

That works as well indeed:
...
diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index eedb13bb069..70b7f5f42cd 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -29045,7 +29045,7 @@ output_macinfo_op (macinfo_entry *ref)
  && !DWARF2_INDIRECT_STRING_SUPPORT_MISSING_ON_TARGET
  && (debug_str_section->common.flags & SECTION_MERGE) != 0)
{
- if (dwarf_split_debug_info && dwarf_version >= 5)
+ if (dwarf_split_debug_info)
ref->code = ref->code == DW_MACINFO_define
? DW_MACRO_define_strx : DW_MACRO_undef_strx;
  else
@@ -29097,12 +29097,20 @@ output_macinfo_op (macinfo_entry *ref)
   HOST_WIDE_INT_PRINT_UNSIGNED,
   ref->lineno);
   if (node->form == DW_FORM_strp)
-dw2_asm_output_offset (dwarf_offset_size, node->label,
-   debug_str_section, "The macro: \"%s\"",
-   ref->info);
+   {
+ gcc_assert (ref->code == DW_MACRO_define_strp
+ || ref->code == DW_MACRO_undef_strp);
+ dw2_asm_output_offset (dwarf_offset_size, node->label,
+debug_str_section, "The macro: \"%s\"",
+ref->info);
+   }
   else
-dw2_asm_output_data_uleb128 (node->index, "The macro: \"%s\"",
- ref->info);
+   {
+ gcc_assert (ref->code == DW_MACRO_define_strx
+ || ref->code == DW_MACRO_undef_strx);
+ dw2_asm_output_data_uleb128 (node->index, "The macro: \"%s\"",
+  ref->info);
+   }
   break;
 case DW_MACRO_import:
   dw2_asm_output_data (1, ref->code, "Import");
...

I've also added asserts detecting this PR.
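
For reference, the opcode choice after this change can be modelled outside of GCC.  This is only a sketch, not the dwarf2out.cc code: pick_macro_opcode, check_operand_form and the indirect_ok parameter are hypothetical names, with indirect_ok standing in for the unchanged surrounding conditions.  The opcode values are the DWARF 5 ones and match the 0x1 / 0x5 / 0xb bytes in the listings in this PR.

```c
#include <assert.h>
#include <stdbool.h>

/* DWARF 5 macro opcode values.  */
enum
{
  DW_MACRO_define = 0x01,
  DW_MACRO_undef = 0x02,
  DW_MACRO_define_strp = 0x05,
  DW_MACRO_undef_strp = 0x06,
  DW_MACRO_define_strx = 0x0b,
  DW_MACRO_undef_strx = 0x0c
};

/* Hypothetical model of the opcode choice.  INDIRECT_OK stands in for
   the existing conditions (indirect strings supported, mergeable
   .debug_str section, and so on).  */
static int
pick_macro_opcode (bool is_define, bool indirect_ok, bool split_dwarf)
{
  if (!indirect_ok)
    return is_define ? DW_MACRO_define : DW_MACRO_undef;

  /* After the change: split DWARF uses the strx forms regardless of
     the selected DWARF version.  */
  if (split_dwarf)
    return is_define ? DW_MACRO_define_strx : DW_MACRO_undef_strx;

  return is_define ? DW_MACRO_define_strp : DW_MACRO_undef_strp;
}

/* Mirrors the intent of the added asserts: an index operand requires a
   strx opcode, an offset operand requires a strp opcode.  */
static void
check_operand_form (int opcode, bool operand_is_index)
{
  if (operand_is_index)
    assert (opcode == DW_MACRO_define_strx || opcode == DW_MACRO_undef_strx);
  else
    assert (opcode == DW_MACRO_define_strp || opcode == DW_MACRO_undef_strp);
}
```

With the old condition, the split-dwarf / DWARF 4 combination fell through to the strp opcode while the operand was still emitted as an index, which is the mismatch check_operand_form would trip on.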

[Bug debug/115066] [debug, gsplit-dwarf, gdwarf-4, g3] DW_MACRO_define_strp used for debug_str_offsets index

2024-05-13 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115066

--- Comment #4 from Tom de Vries  ---
(In reply to Richard Biener from comment #3)
> And with -gstrict-dwarf -gdwarf-4?

With:
...
$ gcc.sh -gdwarf-4 -gsplit-dwarf /data/vries/hello.c -g3 -gstrict-dwarf
...
we get:
...
.section .debug_macinfo.dwo,"e",@progbits
.Ldebug_macinfo0:
.byte   0x3 # Start new file
.uleb128 0  # Included from line number 0
.uleb128 0x1 # file /data/vries/hello.c
.byte   0x1 # Define macro
.uleb128 0  # At line number 0
.ascii "__STDC__ 1\0"   # The macro
...

[Bug debug/115066] [debug, gsplit-dwarf, gdwarf-4, g3] DW_MACRO_define_strp used for debug_str_offsets index

2024-05-13 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115066

--- Comment #2 from Tom de Vries  ---
(In reply to Tom de Vries from comment #1)
> Looking at the source code, I wonder if this would fix it:
> ...
> diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
> index eedb13bb069..045858bf638 100644
> --- a/gcc/dwarf2out.cc
> +++ b/gcc/dwarf2out.cc
> @@ -29045,7 +29045,7 @@ output_macinfo_op (macinfo_entry *ref)
> && !DWARF2_INDIRECT_STRING_SUPPORT_MISSING_ON_TARGET
> && (debug_str_section->common.flags & SECTION_MERGE) != 0)
>   {
> -   if (dwarf_split_debug_info && dwarf_version >= 5)
> +   if (dwarf_split_debug_info && (!dwarf_strict || dwarf_version >= 5))
>   ref->code = ref->code == DW_MACINFO_define
>   ? DW_MACRO_define_strx : DW_MACRO_undef_strx;
> else
> ...

With that change I get:
...
.Ldebug_macro0:
.value  0x4 # DWARF macro version number
.byte   0x2 # Flags: 32-bit, lineptr present
.long   .Lskeleton_debug_line0
.byte   0x3 # Start new file
.uleb128 0  # Included from line number 0
.uleb128 0x1 # file /data/vries/hello.c
.byte   0xb # Define macro strx
.uleb128 0  # At line number 0
.uleb128 0x17b  # The macro: "__STDC__ 1"
...

[Bug debug/115066] [debug, gsplit-dwarf, gdwarf-4, g3] DW_MACRO_define_strp used for debug_str_offsets index

2024-05-13 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115066

--- Comment #1 from Tom de Vries  ---
Looking at the source code, I wonder if this would fix it:
...
diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index eedb13bb069..045858bf638 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -29045,7 +29045,7 @@ output_macinfo_op (macinfo_entry *ref)
  && !DWARF2_INDIRECT_STRING_SUPPORT_MISSING_ON_TARGET
  && (debug_str_section->common.flags & SECTION_MERGE) != 0)
{
- if (dwarf_split_debug_info && dwarf_version >= 5)
+ if (dwarf_split_debug_info && (!dwarf_strict || dwarf_version >= 5))
ref->code = ref->code == DW_MACINFO_define
? DW_MACRO_define_strx : DW_MACRO_undef_strx;
  else
...

[Bug debug/115066] New: [debug, gsplit-dwarf, gdwarf-4, g3] DW_MACRO_define_strp used for debug_str_offsets index

2024-05-13 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115066

Bug ID: 115066
   Summary: [debug, gsplit-dwarf, gdwarf-4, g3]
DW_MACRO_define_strp used for debug_str_offsets index
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Consider a hello world, compiled with split dwarf and dwarf version 4, and -g3
for macro info:
...
$ gcc -gdwarf-4 -gsplit-dwarf /data/vries/hello.c -g3 -save-temps -dA
...

In section .debug_macro.dwo, we have:
...
.Ldebug_macro0:
.value  0x4 # DWARF macro version number
.byte   0x2 # Flags: 32-bit, lineptr present
.long   .Lskeleton_debug_line0
.byte   0x3 # Start new file
.uleb128 0  # Included from line number 0
.uleb128 0x1 # file /data/vries/hello.c
.byte   0x5 # Define macro strp
.uleb128 0  # At line number 0
.uleb128 0x1d0  # The macro: "__STDC__ 1"
...

So, given that we use a DW_MACRO_define_strp, we'd expect 0x1d0 to be an offset
into a .debug_str section.

However, in .debug_str.dwo we find:
...
  0x01d0 455f584f 50454e32 4b385853 49005345 E_XOPEN2K8XSI.SE
...

In fact, 0x1d0 is an index into the string offset table in
.debug_str_offsets.dwo:
...
.long   0x34f0  # indexed string 0x1d0: __STDC__ 1
...

So, it looks like DW_MACRO_define_strx should have been used instead.
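
To make the difference between the two operand kinds concrete, here is a toy model.  It is a sketch with made-up section contents; resolve_strp, resolve_strx and the two arrays are hypothetical stand-ins for a consumer reading .debug_str and .debug_str_offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy .debug_str contents: three NUL-terminated strings back to back.  */
static const char debug_str[] = "__STDC__ 1\0__unix__ 1\0__linux__ 1";

/* Toy .debug_str_offsets table: byte offset of each string above.  */
static const size_t debug_str_offsets[] = { 0, 11, 22 };

/* DW_MACRO_define_strp: the operand is a byte offset into .debug_str.  */
static const char *
resolve_strp (size_t offset)
{
  assert (offset < sizeof debug_str);
  return debug_str + offset;
}

/* DW_MACRO_define_strx: the operand is an index into .debug_str_offsets,
   which in turn holds the byte offset into .debug_str.  */
static const char *
resolve_strx (size_t index)
{
  assert (index < sizeof debug_str_offsets / sizeof debug_str_offsets[0]);
  return debug_str + debug_str_offsets[index];
}
```

Feeding the same operand to both shows the problem: resolve_strx (1) gives the second string, while resolve_strp (1) lands one byte into the first string, much like the mid-string hit at 0x1d0 above.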

[Bug c++/113599] [14 Regression] Wrong computation of member offset through pointer-to-member since r14-5503

2024-01-25 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113599

Tom de Vries  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vries at gcc dot gnu.org

--- Comment #2 from Tom de Vries  ---
FWIW, the inheritance order is relevant: after applying this change we get the
expected result:
...
-struct thread_info : public dummy, public intrusive_list_node {
+struct thread_info : public intrusive_list_node, public dummy {
...

This could be used as a workaround.

[Bug debug/112565] Abnormal Jump in Execution using 'stepi' Command in GDB under O2 optimization

2023-11-16 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112565

--- Comment #1 from Tom de Vries  ---
(In reply to Anonymous from comment #0)
> Tom de Vries suggests that this issue may be attributed to a GCC
> optimization bug.

I do not.

[Bug sanitizer/110799] [tsan] False positive due to -fhoist-adjacent-loads

2023-07-25 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799

--- Comment #7 from Tom de Vries  ---
(In reply to Alexander Monakov from comment #5)
> This trips Valgrind's data race detector (valgrind --tool=helgrind) too. So
> I don't think checking SANITIZE_THREAD is the correct approach.

Can you elaborate on what you consider a correct approach?

[Bug sanitizer/110799] [tsan] False positive due to -fhoist-adjacent-loads

2023-07-25 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799

--- Comment #6 from Tom de Vries  ---
(In reply to rguent...@suse.de from comment #4)
> I'm suggesting to not fix it ;) 

Can you explain why?

It doesn't look difficult to fix to me, and I don't see any downsides.

> That said, is TSAN a useful vehicle?

Well, false positives aside, yes.

[Bug sanitizer/110799] [tsan] False positive due to -fhoist-adjacent-loads

2023-07-25 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799

--- Comment #2 from Tom de Vries  ---
(In reply to Richard Biener from comment #1)
> We consider introducing load data races OK, what's the difference here? 

This is a load vs. store data race.

> There are other passes that would do similar things but in practice the
> loads would be considered to possibly trap so the real-world impact might be
> limited?

If you're suggesting to fix this in a more generic way, I'm all for it.

[Bug sanitizer/110799] New: [tsan] False positive due to -fhoist-adjacent-loads

2023-07-25 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799

Bug ID: 110799
   Summary: [tsan] False positive due to -fhoist-adjacent-loads
   Product: gcc
   Version: 13.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: sanitizer
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org,
jakub at gcc dot gnu.org, kcc at gcc dot gnu.org, marxin at 
gcc dot gnu.org
  Target Milestone: ---

I built gdb with -O2 -fsanitize=thread, and ran into a false positive due to
-fhoist-adjacent-loads.

Minimal example:
...
$ cat race.c
#include <pthread.h>
#include <stddef.h>

int c;

struct s
{
  int a;
  int b;
};

struct s v1;

int v3;

void *
thread1 (void *x)
{
  v1.a = 1;
  return NULL;
}

void *
thread2 (void *x)
{
  v3 = c ? v1.a : v1.b;
  return NULL;
}

int
main (void)
{
  pthread_t t[2];

  pthread_create(&t[0], NULL, thread1, NULL);
  pthread_create(&t[1], NULL, thread2, NULL);

  pthread_join(t[0], NULL);
  pthread_join(t[1], NULL);

  return 0;
}
...

With O0, runs fine:
...
$ gcc race.c -fsanitize=thread -g
$ ./a.out 
$
...

With O2, a race is reported:
...
$ gcc race.c -fsanitize=thread -g -O2
$ ./a.out 
==
WARNING: ThreadSanitizer: data race (pid=24538)
  Read of size 4 at 0x00404060 by thread T2:
#0 thread2 /data/vries/gdb/race.c:26 (a.out+0x401299) (BuildId:
295673549b1e99c73c70a2a8d26944f177f88c15)
#1   (libtsan.so.2+0x3c329) (BuildId:
8f2a9be581a0fcb3d7109755a6067408093b9dbd)

  Previous write of size 4 at 0x00404060 by thread T1:
#0 thread1 /data/vries/gdb/race.c:19 (a.out+0x401257) (BuildId:
295673549b1e99c73c70a2a8d26944f177f88c15)
#1   (libtsan.so.2+0x3c329) (BuildId:
8f2a9be581a0fcb3d7109755a6067408093b9dbd)
...

With -fno-hoist-adjacent-loads, it's fine again:
...
$ gcc race.c -fsanitize=thread -g -O2 -fno-hoist-adjacent-loads
$ ./a.out 
$
...

The optimization transforms these loads:
...
  v3 = c ? v1.a : v1.b;
...
into:
...
  int tmp_a = v1.a;
  int tmp_b = v1.b;
  v3 = c ? tmp_a : tmp_b;
...
which introduces the false positive.
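
The extra read can be made visible without threads.  The following is a sketch only: the access-recording helpers and the READ_A / READ_B bits are hypothetical instrumentation, loosely playing the role that tsan's instrumentation plays in the real program.

```c
#include <assert.h>
#include <stdbool.h>

struct s
{
  int a;
  int b;
};

/* Bits recording which field was read.  */
enum { READ_A = 1, READ_B = 2 };

static int
read_a (const struct s *p, unsigned *mask)
{
  assert (p && mask);
  *mask |= READ_A;
  return p->a;
}

static int
read_b (const struct s *p, unsigned *mask)
{
  assert (p && mask);
  *mask |= READ_B;
  return p->b;
}

/* Source form: only the selected field is read.  */
static int
select_as_written (const struct s *p, bool c, unsigned *mask)
{
  return c ? read_a (p, mask) : read_b (p, mask);
}

/* Hoisted form: both fields are read unconditionally, then one value
   is selected.  */
static int
select_hoisted (const struct s *p, bool c, unsigned *mask)
{
  int tmp_a = read_a (p, mask);
  int tmp_b = read_b (p, mask);
  return c ? tmp_a : tmp_b;
}
```

With c == 0, as in race.c, both forms return the same value, but only the hoisted one touches field a, which is the field thread1 writes.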

So I wonder if there should be a change like this:
...
 static bool
 gate_hoist_loads (void)
 {
   return (flag_hoist_adjacent_loads == 1
+  && (flag_sanitize & SANITIZE_THREAD) == 0
   && param_l1_cache_line_size
   && HAVE_conditional_move);
 }
...

[Bug c/109708] New: [c, doc] wdangling-pointer example broken

2023-05-03 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109708

Bug ID: 109708
   Summary: [c, doc] wdangling-pointer example broken
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

I ran into a Wdangling-pointer warning and decided to read the docs and try out
an example.

The first one listed is:
...
int f (int c1, int c2, x)
{
  char *p = strchr ((char[]){ c1, c2 }, c3);
  // warning: dangling pointer to a compound literal
  return p ? *p : 'x';
}
...

It's not a complete example: x is missing a declared type and c3 is undeclared.

After fixing that (and adding the implicit "#include <string.h>"), we have an
example that compiles:
...
#include <string.h>

int f (int c1, int c2, int c3)
{
  char *p = strchr ((char[]){ c1, c2 }, c3);
  return p ? *p : 'x';
}
...
but no warning, not at O0, O1, O2 or O3:
...
$ gcc test.c -Wdangling-pointer=1 -c
$
...

After reading the description of the warning, I managed to come up with:
...
char
f (char c1, char c2)
{
  char *p;

  {
    p = (char[]) { c1, c2 };
  }

  return *p;
}
...
which does manage to trigger the warning for O0-O3:
...
$ gcc test.c -Wdangling-pointer=1 -c
test.c: In function ‘f’:
test.c:10:10: warning: using dangling pointer ‘p’ to an unnamed temporary
[-Wdangling-pointer=]
   10 |   return *p;
      |          ^~
test.c:7:18: note: unnamed temporary defined here
    7 |     p = (char[]) { c1, c2 };
      |                  ^
$
...

It might be worth mentioning that it's a C example, when using g++ we have:
...
$ g++ test.c -Wdangling-pointer=1 -c
test.c: In function ‘char f(char, char)’:
test.c:7:18: error: taking address of temporary array
    7 |     p = (char[]) { c1, c2 };
      |                  ^~
...

BTW, note that the warning can be fixed by doing:
...
 char
 f (char c1, char c2)
 {
   char *p;
+  char c;

   {
     p = (char[]) { c1, c2 };
+    c = *p;
   }
 
-  return *p;
+  return c;
 }
...
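
An alternative to copying the value out (this is not what the diff above does) would be to give the array function scope, so that the pointer never outlives its target.  A minimal sketch, with f_outer_scope as a hypothetical name; I'd expect it not to trigger the warning, but haven't checked that across optimization levels:

```c
/* The array now lives for the whole function body, so the pointer taken
   in the inner block is still valid at the return statement.  */
char
f_outer_scope (char c1, char c2)
{
  char buf[] = { c1, c2 };
  char *p;

  {
    p = buf;
  }

  return *p;
}
```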

[Bug debug/108600] Use DW_LNS_set_prologue_end

2023-01-31 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108600

--- Comment #4 from Tom de Vries  ---
(In reply to Tom de Vries from comment #0)
> Note that, for instance, for gdb test-case gdb.ada/ref_param.exp, this
> convention was broken for gcc 7.5.0 (and I don't know how much earlier), and
> my current guess is that it got fixed in gcc 11.1.0, by commit c029fcb5680
> ("Reset force_source_line in final.c").

Confirmed, see PR108615.

[Bug debug/108615] Incorrect prologue marker in line table

2023-01-31 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108615

Tom de Vries  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED
   Target Milestone|---                         |11.0
           Keywords|                            |wrong-debug

--- Comment #1 from Tom de Vries  ---
Closing with milestone gcc 11.0.

[Bug debug/108615] New: Incorrect prologue marker in line table

2023-01-31 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108615

Bug ID: 108615
   Summary: Incorrect prologue marker in line table
   Product: gcc
   Version: 10.4.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

[ Filing FTR, to link a commit to a test-case that it fixes, and to be able to
refer to it from gdb sources. ]

Consider pck.adb/pck.ads from here (
https://sourceware.org/git/?p=binutils-gdb.git;a=tree;f=gdb/testsuite/gdb.ada/ref_param;h=823556d8928bd1de865e71400092e6a44b2773cd;hb=HEAD
).

Compile like so, with system gcc 7.5.0:
...
$ gcc pck.adb -S -g
...

In the .s file we find:
...
pck__call_me:
.LFB3:
.file 1 "pck.adb"
.loc 1 18 0
.cfi_startproc
.loc 1 18 0
pushq   %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq    %rsp, %rbp
.cfi_def_cfa_register 6
movq    %rdi, -8(%rbp)
.loc 1 20 0
...

There's a convention that the second entry in the line table indicates the end
of the prologue.

So the second .loc indicates that the prologue ends there, while it ends only
after the third insn, at line 20.

Same with 10.4.0.

Fixed by commit c029fcb5680 ("Reset force_source_line in final.c"), first
available in release 11.1.0.

With 11.3.0 we have:
...
pck__call_me:
.LFB3:
.file 1 "pck.adb"
.loc 1 18 4
.cfi_startproc
pushq   %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq    %rsp, %rbp
.cfi_def_cfa_register 6
movq    %rdi, -8(%rbp)
.loc 1 20 16
...
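
The effect on the second-entry convention can be sketched as follows.  This is a model, not what any particular consumer implements: prologue_end_by_convention and the two tables are hypothetical, and the offsets assume the usual x86-64 encodings of the three insns (1 + 3 + 4 bytes) and that each .loc produces its own line-table entry.

```c
#include <assert.h>
#include <stddef.h>

struct line_entry
{
  unsigned long addr;	/* Offset from the start of the function.  */
  unsigned line;
};

/* The convention: the second line-table entry of a function marks the
   first insn after the prologue.  */
static unsigned long
prologue_end_by_convention (const struct line_entry *tab, size_t n)
{
  assert (n >= 2);
  return tab[1].addr;
}

/* pck__call_me with gcc 7.5.0: the repeated ".loc 1 18 0" gives a
   second entry at the function start.  */
static const struct line_entry table_7_5_0[] = {
  { 0, 18 }, { 0, 18 }, { 8, 20 }
};

/* pck__call_me with gcc 11.3.0: a single entry for line 18.  */
static const struct line_entry table_11_3_0[] = {
  { 0, 18 }, { 8, 20 }
};
```

For the 7.5.0 table the convention yields offset 0, so the prologue appears to be empty; for the 11.3.0 table it yields offset 8, the insn at line 20.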

[Bug debug/108600] Use DW_LNS_set_prologue_end

2023-01-31 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108600

--- Comment #3 from Tom de Vries  ---
(In reply to Tom de Vries from comment #2)
> (In reply to Tom de Vries from comment #1)
> > Created attachment 54371 [details]
> 
> We probably don't want to emit in all cases, maybe limiting to 
> "dwarf_version >= 3", or "!dwarf_strict || dwarf_version >= 3".

Let's see about dwarf_strict.

Semantics:
...
-gstrict-dwarf

Disallow using extensions of later DWARF standard version than selected
with -gdwarf-version. On most targets using non-conflicting DWARF extensions
from later standard versions is allowed.
...

For the -gas-loc-support case (gcc emitting .locs), even when passing
-gdwarf-2, gas emits a v3 version .debug_line section (since binutils-2_32). 
And even if we'd fix that (I've filed
https://sourceware.org/bugzilla/show_bug.cgi?id=30064), the way gas works is by
bumping the dwarf version when encountering a feature that requires a higher
version, so using end_prologue in a loc directive would then end up bumping
dwarf_level to 3, bumping also the .debug_line version.

For the -gno-as-loc-support case (gcc emitting .debug_line contribution), for
-gdwarf-2 gcc indeed emits a v2 .debug_line section.  But, that makes
DW_LNS_set_prologue_end fall in the range of vendor specific extensions, and we
can't conflict with that.

Taking this all into account, I think it's better not to emit
DW_LNS_set_prologue_end for -gdwarf-2 -gno-strict-dwarf.
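
Spelled out, the two candidate conditions differ in exactly one case.  A small sketch, with hypothetical function names; the parameters mirror the dwarf_version and dwarf_strict variables:

```c
#include <stdbool.h>

/* Candidate 1: emit only when the selected DWARF version has the
   opcode.  DWARF_STRICT is deliberately ignored.  */
static bool
emit_prologue_end_versioned (int dwarf_version, bool dwarf_strict)
{
  (void) dwarf_strict;
  return dwarf_version >= 3;
}

/* Candidate 2: additionally emit as an extension when not strict.  */
static bool
emit_prologue_end_lenient (int dwarf_version, bool dwarf_strict)
{
  return !dwarf_strict || dwarf_version >= 3;
}
```

Only for -gdwarf-2 -gno-strict-dwarf do the two disagree, and that is the case where, per the above, emitting looks like the wrong choice.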

[Bug debug/47471] [10/11/12/13 Regression] stdarg functions extraneous too-early prologue end

2023-01-30 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47471

Tom de Vries  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vries at gcc dot gnu.org

--- Comment #24 from Tom de Vries  ---
I can reproduce this with gcc 4.8.5:
...
f:
.LFB0:
.file 1 "test.c"
.loc 1 3 0
.cfi_startproc
pushq   %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq    %rsp, %rbp
.cfi_def_cfa_register 6
subq    $60, %rsp
movq    %rsi, -168(%rbp)
movq    %rdx, -160(%rbp)
movq    %rcx, -152(%rbp)
movq    %r8, -144(%rbp)
movq    %r9, -136(%rbp)
testb   %al, %al
je  .L2
.loc 1 3 0
movaps  %xmm0, -128(%rbp)
movaps  %xmm1, -112(%rbp)
movaps  %xmm2, -96(%rbp)
movaps  %xmm3, -80(%rbp)
movaps  %xmm4, -64(%rbp)
movaps  %xmm5, -48(%rbp)
movaps  %xmm6, -32(%rbp)
movaps  %xmm7, -16(%rbp)
.L2:
movl    %edi, -180(%rbp)
.loc 1 4 0
movl    v(%rip), %eax
...

But with gcc 7.5.0, I get:
...
f:
.LFB0:
.file 1 "test.c"
.loc 1 3 0
.cfi_startproc
pushq   %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq    %rsp, %rbp
.cfi_def_cfa_register 6
subq    $72, %rsp
movl    %edi, -180(%rbp)
movq    %rsi, -168(%rbp)
movq    %rdx, -160(%rbp)
movq    %rcx, -152(%rbp)
movq    %r8, -144(%rbp)
movq    %r9, -136(%rbp)
testb   %al, %al
je  .L3
movaps  %xmm0, -128(%rbp)
movaps  %xmm1, -112(%rbp)
movaps  %xmm2, -96(%rbp)
movaps  %xmm3, -80(%rbp)
movaps  %xmm4, -64(%rbp)
movaps  %xmm5, -48(%rbp)
movaps  %xmm6, -32(%rbp)
movaps  %xmm7, -16(%rbp)
.L3:
.loc 1 4 0
movl    v(%rip), %eax
addl    $1, %eax
movl    %eax, v(%rip)
...

So, isn't this fixed?

[Bug debug/108600] Use DW_LNS_set_prologue_end

2023-01-30 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108600

--- Comment #2 from Tom de Vries  ---
(In reply to Tom de Vries from comment #1)
> Created attachment 54371 [details]

We probably don't want to emit in all cases, maybe limiting to 
"dwarf_version >= 3", or "!dwarf_strict || dwarf_version >= 3".

[Bug debug/108600] Use DW_LNS_set_prologue_end

2023-01-30 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108600

--- Comment #1 from Tom de Vries  ---
Created attachment 54371
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54371&action=edit
tentative patch

Tentative patch.

For hello.c, for the -gas-loc-support case it gives us:
...
$ gcc -g ~/hello.c -S -o-
  ...
main:
.LFB0:
.file 1 "/home/vries/hello.c"
.loc 1 5 1
.cfi_startproc
pushq   %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq    %rsp, %rbp
.cfi_def_cfa_register 6
.loc 1 6 3 prologue_end
movl    $.LC0, %edi

...

And for the -gno-as-loc-support case:
...
$ gcc -g ~/hello.c -c -gno-as-loc-support
$ llvm-dwarfdump --debug-line hello.o
  ...
Address  Line  Column  File  ISA  Discriminator  Flags
-------  ----  ------  ----  ---  -------------  -------------
0x          5       0     1    0              0  is_stmt
0x0004      6       1     1    0              0  is_stmt prologue_end
0x000e      8       3     1    0              0  is_stmt
0x0013      9      10     1    0              0  is_stmt
0x0015      9       1     1    0              0  is_stmt end_sequence
...

[Bug debug/108600] New: Use DW_LNS_set_prologue_end

2023-01-30 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108600

Bug ID: 108600
   Summary: Use DW_LNS_set_prologue_end
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

The prologue_end marker (introduced in dwarf v3) is currently not emitted by
gcc.

This is the case for -gno-as-loc-support (gcc emitting .debug_line
contribution).

And for -gas-loc-support (gcc emitting .loc directives).

Note that there is an existing convention that marks the end of the prologue:
the second entry in the line table marks the first insn after the prologue.

However, that convention is not known by everyone, which makes it easier to
break it without noticing.

Note that, for instance, for gdb test-case gdb.ada/ref_param.exp, this convention
was broken for gcc 7.5.0 (and I don't know how much earlier), and my current
guess is that it got fixed in gcc 11.1.0, by commit c029fcb5680 ("Reset
force_source_line in final.c").

The only way to make the end of the prologue visible in assembly as well as
disassembly is to make it explicit by using DW_LNS_set_prologue_end.
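
From the consumer side, an explicit marker could be used along these lines.  This is a sketch under assumptions: struct line_row and find_prologue_end are hypothetical, and falling back to the second-entry convention when no row is flagged is just one possible policy, not a description of what gdb does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct line_row
{
  unsigned long addr;
  unsigned line;
  bool prologue_end;	/* Set by DW_LNS_set_prologue_end.  */
};

/* Return the address of the first row flagged prologue_end.  If no row
   carries the flag, fall back to the second-entry convention.  */
static unsigned long
find_prologue_end (const struct line_row *rows, size_t n)
{
  assert (n >= 2);

  for (size_t i = 0; i < n; i++)
    if (rows[i].prologue_end)
      return rows[i].addr;

  return rows[1].addr;
}
```

With the flag present, a stray duplicate entry at the function start no longer changes the answer, which is the robustness argument made above.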

[Bug libgomp/108098] OpenMP/nvptx reverse offload execution test FAILs

2022-12-15 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108098

--- Comment #1 from Tom de Vries  ---
(In reply to Thomas Schwinge from comment #0)
> $ nvidia-smi
> [...]
> | NVIDIA-SMI 440.33.01Driver Version: 440.33.01CUDA Version: 10.2
> [...]
> |   0  Tesla K80  [...]
> [...]
> |   1  Tesla K80  [...]
> 

I'm not sure if it matters for triggering this problem, but if I look up this
board on the nvidia drivers download page and select cuda 10.2 and production
branch, I get:
...
version:440.118.02
Release Date:   2020.9.30
...

Then using the "Beta and Older Drivers" I find the version you're using is:
...
version: 440.33.01
Release date:  November 19, 2019
...

Please always use the latest drivers when reporting a problem.

[Bug debug/107909] New: [powerpc64le, debug] Incorrect call site location due to nop after call insn

2022-11-29 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107909

Bug ID: 107909
   Summary: [powerpc64le, debug] Incorrect call site location due
to nop after call insn
   Product: gcc
   Version: 7.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Consider test-case vla-optimized-out.c:
...
int
__attribute__((noinline,weak)) __attribute__((noclone))
f1 (int i)
{
  char a[i + 1];
  a[0] = 5;
  return a[0];
}

int
main (void)
{
  volatile int j;
  int i = 5;
  asm volatile ("" : "=r" (i) : "0" (i));
  j = f1 (i);
  return 0;
}
...
compiled with -O1 -g.

We generate a nop after the bl insn:
...
bl f1    # 11   *call_value_nonlocal_aixdi  [length = 8]
nop
.LVL4:
...
and the label after the nop is a call site location:
...
.uleb128 0x5 # (DIE (0x67) DW_TAG_GNU_call_site)
.8byte  .LVL4 # DW_AT_low_pc
.4byte  0x81 # DW_AT_abstract_origin
...

Consequently we can't actually find the call site:
...
$ gdb -q -batch ./a.out -ex "break f1" -ex run -ex "set debug entry-values 1"
-ex "print sizeof (a)"
Breakpoint 1 at 0x165c: file vla-optimized-out.c, line 8.

Breakpoint 1, f1 (i=5) at vla-optimized-out.c:8
8   }
DW_OP_entry_value resolving cannot find DW_TAG_call_site 0x1690 in main
$1 = <optimized out>
...
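
Why the lookup fails can be sketched like this.  It assumes the consumer needs an exact match between the return address and a recorded DW_AT_low_pc, which is what the error message suggests; find_call_site is a hypothetical name, 0x1690 is the address from the message above, and 0x1694 assumes that .LVL4 sits one 4-byte nop further on.

```c
#include <stddef.h>

/* Look up a call site by return address, requiring an exact match
   against the recorded DW_AT_low_pc values.  Returns the index of the
   matching entry, or -1 if there is none.  */
static int
find_call_site (const unsigned long *low_pcs, size_t n,
		unsigned long ret_addr)
{
  for (size_t i = 0; i < n; i++)
    if (low_pcs[i] == ret_addr)
      return (int) i;

  return -1;
}
```

With the label placed after the nop, the recorded address is 0x1694 and a lookup of 0x1690 returns -1; with the label directly after the bl, the lookup succeeds.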

If we manually fix this in the .s file, we get a bit further:
...
$ gdb -q -batch ./a.out -ex "break f1" -ex run -ex "set debug entry-values 1"
-ex "print sizeof (a)"
Breakpoint 1 at 0x165c: file vla-optimized-out.c, line 8.

Breakpoint 1, f1 (i=5) at vla-optimized-out.c:8
8   }
Cannot find matching parameter at DW_TAG_call_site 0x1690 at main
$1 = <optimized out>
...

The problem now is that the DW_AT_abstract_origin in the DW_TAG_GNU_call_site
is not properly handled by gdb. 

I reproduced this with gcc 7.5.0, but after looking at the pattern for
call_value_nonlocal_aixdi I think this should be reproducible with trunk.

[Bug other/87741] Don't build readline/libreadline.a in GDB, when --with-system-readline is supplied

2022-10-21 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87741

Tom de Vries  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vries at gcc dot gnu.org
             Status|NEW                         |RESOLVED
   Target Milestone|---                         |13.0
         Resolution|---                         |FIXED
          Component|bootstrap                   |other

--- Comment #8 from Tom de Vries  ---
(In reply to Andrew Pinski from comment #7)
> https://sourceware.org/git/?p=binutils-gdb.git;a=commit;f=configure.ac;
> h=69961a84c9b3744a10248fb6cbccc3c688a1e0a5
> 
> It would be useful if both configures were synced up again 

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=36ba985145ffa8e2078033fc1f1cf22851707a8e

[Bug target/99555] [OpenMP/nvptx] Execution-time hang for simple nested OpenMP 'target'/'parallel'/'task' constructs

2022-09-06 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99555

--- Comment #17 from Tom de Vries  ---
(In reply to Thomas Schwinge from comment #14)
> > That's with a Nvidia Tesla K20c GPU, Driver Version: 346.46.
> > As that version is "a bit old", I shall first update this, before we spend
> > any further time on analyzing this.
> 
> Cross-checking on another system with Nvidia Tesla K20c GPU but more recent
> Driver Version I'm not seeing such an issue.
> 
> On the "old" system, gradually upgrading Driver Version: 346.46 to 352.99,
> 361.93.02, 375.88 (always the latest (?) version of the respective series),
> these all did not resolve the problem.
> 
> Only starting with 384.59 (that is, early version of the 384.X series), that
> then did resolve the issue.  That's still using the GCC/nvptx '-mptx=3.1'
> multilib.
> 
> (We couldn't with earlier series, but given this is 384.X, we may now also
> cross-check with the default multilib, and that also was fine.)
> 
> Now, I don't know if at all we would like to spend any more effort on this
> issue, given that it only appears with rather old pre-384.X versions -- but
> on the other hand, the GCC/nvptx '-mptx=3.1' multilib is meant to keep these
> supported?  (... which is why I'm running such testing; and certainly the
> timeouts are annoying there.)
> 
> It might be another issue with pre-384.X versions of the Nvidia PTX JIT, or
> is there the slight possibility that GCC is generating/libgomp contains some
> "weird" code that post-384.X version happen to "fix up" -- probably the
> former rather than the latter?  (Or, the chance of GPU hardware/firmware or
> some other system weirdness -- unlikely, otherwise behaves totally fine?)
> 
> I don't know where to find complete Nvidia Driver/JIT release notes, where
> the 375.X -> 384.X notes might provide an idea of what got fixed, and we
> might then add another 'WORKAROUND_PTXJIT_BUG' for that -- maybe simple,
> maybe not.
> 
> Any thoughts, Tom?

I care about old cards, not about old drivers.  The oldest card we support is
an sm_30, and last driver series that supports that one is 470.x (and AFAIU, is
therefore supported by nvidia for that arch).

There's the legacy series, 390.x, which is the last to support fermi, but we
don't support any fermi cards or earlier.  I did do some testing with this one
for later cards, but reported issues are acknowledged but not fixed by nvidia,
so ... this is already out of scope for me.

So yeah, IWBN to come up with workarounds for various older drivers, but I'm
not investing time in that.  Is there a problem for you to move to 470.x or
later (515.x)?  Is there a card for which that causes problems?

[Bug debug/105772] [debug, i386] sched2 moves get_pc_thunk call past debug_insn

2022-05-30 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105772

--- Comment #2 from Tom de Vries  ---
As background info, I'm proposing a patch for gdb to have the
architecture-specific prologue skipper skip over the get_pc_thunk call:
https://sourceware.org/pipermail/gdb-patches/2022-May/189563.html , which helps
to skip over the prologue with -O0 -pie -fPIE code.

But that causes a regression in test-case gdb/testsuite/gdb.base/break.exp,
because of this PR.

[Bug debug/105772] New: [debug, i386] sched2 moves get_pc_thunk call past debug_insn

2022-05-30 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105772

Bug ID: 105772
   Summary: [debug, i386] sched2 moves get_pc_thunk call past
debug_insn
   Product: gcc
   Version: 12.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Consider the test-case source gdb/testsuite/gdb.base/break1.c (
https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/testsuite/gdb.base/break1.c;h=24d0d15dc2e8bd14f6b14fabbb38fec43db9c990;hb=HEAD
) containing function marker4.

Extracting the relevant part:
...
struct some_struct
{
  int a_field;
  int b_field;
  union { int z_field; };
};

struct some_struct values[50];

void marker4 (long d) { values[0].a_field = d; } /* set breakpoint 14 here */
...

When compiling with gcc 12.1.0 and -O2 like so:
...
$ gcc -fno-stack-protector -m32 -fPIE -pie -w -c -g break1.c -O2
...
we have before sched2:
...
(note 4 1 14 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 14 4 11 2 NOTE_INSN_PROLOGUE_END)
(insn/f 11 14 3 2 (parallel [
(set (reg:SI 0 ax [82])
(unspec:SI [
(const_int 0 [0])
] UNSPEC_SET_GOT))
(clobber (reg:CC 17 flags))
]) 931 {*set_got}
 (expr_list:REG_UNUSED (reg:CC 17 flags)
(expr_list:REG_EQUIV (unspec:SI [
(const_int 0 [0])
] UNSPEC_SET_GOT)
(expr_list:REG_CFA_FLUSH_QUEUE (nil)
(nil)
(note 3 11 6 2 NOTE_INSN_FUNCTION_BEG)
(debug_insn 6 3 12 2 (debug_marker) "break1.c":59:25 -1
 (nil))
(insn 12 6 8 2 (set (reg/v:SI 1 dx [orig:83 d ] [83])
(mem/c:SI (plus:SI (reg/f:SI 7 sp)
(const_int 4 [0x4])) [5 d+0 S4 A32])) "break1.c":59:43 81
{*movsi_internal}
 (expr_list:REG_EQUIV (mem/c:SI (reg/f:SI 16 argp) [5 d+0 S4 A32])
(nil)))
...
and after:
...
(note 4 1 14 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 14 4 3 2 NOTE_INSN_PROLOGUE_END)
(note 3 14 6 2 NOTE_INSN_FUNCTION_BEG)
(debug_insn 6 3 11 2 (debug_marker) "break1.c":59:25 -1
 (nil))
(insn/f:TI 11 6 12 2 (parallel [
(set (reg:SI 0 ax [82])
(unspec:SI [
(const_int 0 [0])
] UNSPEC_SET_GOT))
(clobber (reg:CC 17 flags))
]) 931 {*set_got}
 (expr_list:REG_UNUSED (reg:CC 17 flags)
(expr_list:REG_EQUIV (unspec:SI [
(const_int 0 [0])
] UNSPEC_SET_GOT)
(expr_list:REG_CFA_FLUSH_QUEUE (nil)
(nil)
(insn 12 11 8 2 (set (reg/v:SI 1 dx [orig:83 d ] [83])
(mem/c:SI (plus:SI (reg/f:SI 7 sp)
(const_int 4 [0x4])) [5 d+0 S4 A32])) "break1.c":59:43 81
{*movsi_internal}
 (expr_list:REG_EQUIV (mem/c:SI (reg/f:SI 16 argp) [5 d+0 S4 A32])
(nil)))
...

This moves the get_pc_thunk call after the debug_insn, making it (in terms of
debug info) part of the first statement instead of the prologue.

That is, with -O1 we have insn:
...
000d :
   d:   e8 fc ff ff ff          call   e
  12:   05 01 00 00 00          add    $0x1,%eax
  17:   8b 54 24 04             mov    0x4(%esp),%edx
  1b:   89 90 00 00 00 00       mov    %edx,0x0(%eax)
  21:   c3                      ret
...
and line info:
...
File name  Line number  Starting address  View  Stmt
break1.c            59               0xd        x
break1.c            59               0xd     1
break1.c            59              0x17        x
break1.c            59              0x17     1
break1.c            59              0x21
break1.c             -              0x22
...
so at 0x17 we have the start of a statement.

But with -O2 we have identical insn:
...

0030 :
  30:   e8 fc ff ff ff          call   31
  35:   05 01 00 00 00          add    $0x1,%eax
  3a:   8b 54 24 04             mov    0x4(%esp),%edx
  3e:   89 90 00 00 00 00       mov    %edx,0x0(%eax)
  44:   c3                      ret
...
but different line info:
...
File name  Line number  Starting address  View  Stmt
break1.c            59              0x30        x
break1.c            59              0x30     1  x
break1.c            59              0x3a
break1.c            59              0x44
break1.c             -              0x45
...
so at 0x3a we don't have the start of a statement.
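
To make the consequence concrete, here is a minimal sketch of how a consumer of the line table might look for the first statement boundary after the function entry. The struct and function names are hypothetical, the rows are transcribed from the two tables above (end-of-sequence rows omitted), and this is a simplification for illustration, not gdb's actual prologue analysis:

```c
#include <assert.h>
#include <stddef.h>

/* One row of a decoded line table, reduced to the two fields that matter
   here.  Hypothetical type, for illustration only.  */
struct line_row
{
  unsigned long addr;
  int is_stmt;
};

/* Rows as transcribed from the -O1 and -O2 tables in the report.  */
static const struct line_row rows_O1[] = {
  { 0xd, 1 }, { 0xd, 0 }, { 0x17, 1 }, { 0x17, 0 }, { 0x21, 0 }
};
static const struct line_row rows_O2[] = {
  { 0x30, 1 }, { 0x30, 1 }, { 0x3a, 0 }, { 0x44, 0 }
};

/* Return the first is_stmt address strictly after FUNC_START, or 0 if
   there is none.  */
static unsigned long
first_stmt_after (const struct line_row *rows, size_t n,
                  unsigned long func_start)
{
  assert (rows != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (rows[i].is_stmt && rows[i].addr > func_start)
      return rows[i].addr;
  return 0;
}
```

With the -O1 rows and a function start of 0xd this yields 0x17, the address after the get_pc_thunk sequence.  With the -O2 rows and a function start of 0x30 it finds nothing, which matches the observation that 0x3a is no longer the start of a statement.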

[Bug target/104893] [nvptx] Handle Independent Thread Scheduling for sm_70+ with -msoft-stack

2022-03-30 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104893

Tom de Vries  changed:

   What|Removed |Added

 Resolution|--- |WORKSFORME
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Tom de Vries  ---
(In reply to Tom de Vries from comment #1)
> (In reply to Tom de Vries from comment #0)
> > The per-thread call stack is handled for .local memory by the CUDA driver.
> > 
> > For the 'soft stack' that's not the case.
> 
> Hmm, actually there's .local memory used, just not "directly".  Possibly the
> documentation needs updating to point that out.
> 
> So, there doesn't seem to be an issue related to overlapping storage.
> 
> So I wonder, is the stack pointer also per thread then? Or still per-warp?

OK, here ( https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203#c6 ) we read:
...
The pointer is switched between per-warp global memory and per-lane local
memory.
...

So, I think this should be fine then.

Marking this resolved-worksforme until we run into an actual failing test-case.

[Bug target/104857] [nvptx] Add macro specifying ptx isa version

2022-03-29 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104857

Tom de Vries  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED
   Target Milestone|--- |12.0

--- Comment #3 from Tom de Vries  ---
Committed.

[Bug target/104714] [nvptx] Means to specify any sm_xx

2022-03-29 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104714

Tom de Vries  changed:

   What|Removed |Added

   Target Milestone|--- |12.0
 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #3 from Tom de Vries  ---
Added march-map.

[Bug driver/105096] New: --target-help not an alias for --help=target

2022-03-29 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105096

Bug ID: 105096
   Summary: --target-help not an alias for --help=target
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: trivial
  Priority: P3
 Component: driver
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

In common.opt, we read:
...
-target-help
Common Driver
Alias for --help=target.
...

But that doesn't seem to be correct, I get different results.

For instance, for the nvptx target, we have an -malias option that atm is undocumented, and
we have:
...
$ cc1 --target-help 2>&1 | grep malias
$
...
and:
...
$ cc1 --help=target 2>&1 | grep malias
  -malias   [disabled]
$
...

[Bug c/53037] warn_if_not_aligned(X)

2022-03-29 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53037

Tom de Vries  changed:

   What|Removed |Added

 CC||vries at gcc dot gnu.org

--- Comment #43 from Tom de Vries  ---
*** Bug 81909 has been marked as a duplicate of this bug. ***

[Bug target/81909] Missing warning in gcc.dg/pr53037-{2,3}.c

2022-03-29 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81909

Tom de Vries  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #3 from Tom de Vries  ---
Marking resolved-duplicate.

*** This bug has been marked as a duplicate of bug 53037 ***

[Bug target/81728] nvptx-run: error getting kernel result: the launch timed out and was terminated

2022-03-29 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81728

Tom de Vries  changed:

   What|Removed |Added

 Resolution|--- |WORKSFORME
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Tom de Vries  ---
The error message looks like the one produced due to the 5 second watchdog that
is installed when the board is running a display manager.

Anyway, in order to run into this timeout, the test-case should hang, and it
currently doesn't.  At least not on any of the boards I've recently tested
with, with the supported drivers.  So, possibly this was fixed in gcc, or in a
driver.

There's not enough information in here to reproduce, so closing as
resolved-worksforme.

[Bug target/104818] Duplicate word "version" in option -mptx description

2022-03-28 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104818

Tom de Vries  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED
   Target Milestone|--- |12.0

--- Comment #2 from Tom de Vries  ---
Fixed in aforementioned commit.

[Bug libgomp/105042] [libgomp, GOMP_NVPTX_JIT=-O0] Openacc testsuite failures when X runs on nvidia driver

2022-03-28 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105042

Tom de Vries  changed:

   What|Removed |Added

   Severity|normal  |enhancement

--- Comment #8 from Tom de Vries  ---
With the conversation shifted to better error messages, re-classifying as
enhancement.

[Bug target/105075] [nvptx] Generate sad insn (sum of absolute differences)

2022-03-28 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105075

--- Comment #6 from Tom de Vries  ---
Created attachment 52698
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52698&action=edit
Demonstrator patch with stepping stone patterns for combine

(In reply to Tom de Vries from comment #2)
> Also, I wonder if defining a stepping-stone intermediate pattern could help
> combine.

Well, that approach seems to work.

[Bug target/105014] [nvptx] FAIL: gcc.dg/pr97459-1.c execution test

2022-03-28 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105014

--- Comment #6 from Tom de Vries  ---
Reproducer filed at https://github.com/vries/nvidia-bugs/tree/master/shift-and

PR filed at nvidia ( https://developer.nvidia.com/nvidia_bug/3585290 ).

[Bug target/105075] [nvptx] Generate sad insn (sum of absolute differences)

2022-03-28 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105075

Tom de Vries  changed:

   What|Removed |Added

  Attachment #52693|0   |1
is obsolete||

--- Comment #3 from Tom de Vries  ---
Created attachment 52694
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52694&action=edit
Demonstrator patch v2

Forgot to add test-case.

[Bug target/105075] [nvptx] Generate sad insn (sum of absolute differences)

2022-03-28 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105075

--- Comment #2 from Tom de Vries  ---
AFAIU, at gimple level support is limited to vectors, so that doesn't help to
generate the insn for the simple, scalar case.

It would be nice if combine could generate it and we wouldn't have to use a
peephole, but AFAIU the pattern is too complex for that.

I wonder if reformulating using a conditional could help there (ptx isa
describes semantics using "c + ((a

[Bug target/105075] [nvptx] Generate sad insn (sum of absolute differences)

2022-03-28 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105075

--- Comment #1 from Tom de Vries  ---
Created attachment 52693
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52693&action=edit
Demonstrator patch

This patch adds:
- modeling of insn sad.u32 in the .md file
- peephole2 to generate it
  (which is incomplete, it needs some safety-checks related to
  using unique intermediate regs)
- extra instance of peephole2 pass (otherwise, the peephole is not triggered)
- extra instance of fast_rtl_dce pass (otherwise, unused intermediate
  insn are not cleaned up)

So for the usad_2 in the test-case, we have without the patch:
...
cvt.u64.u32 %r32, %r28; 
cvt.u64.u32 %r33, %r29; 
sub.u64 %r34, %r32, %r33;   
abs.s64 %r35, %r34; 
cvt.u32.u64 %r36, %r35; 
add.u32 %value, %r36, %r30; 
...
and with:
...
sad.u32 %value, %r28, %r29, %r30;   
...
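
As a host-side sketch of why the two sequences should be equivalent: the first function below mirrors the insn sequence without the patch (widen, subtract, absolute value, truncate, add), and the second uses a conditional formulation that is assumed here to match what a single sad.u32 computes.  The function names are hypothetical and not taken from the test-case:

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the sequence without the patch: cvt.u64.u32 twice, sub.u64,
   abs.s64, cvt.u32.u64, add.u32.  */
static uint32_t
usad_widen (uint32_t a, uint32_t b, uint32_t c)
{
  int64_t diff = (int64_t) a - (int64_t) b;
  if (diff < 0)
    diff = -diff;
  /* |a - b| of two 32-bit values always fits in 32 bits.  */
  assert (diff <= (int64_t) UINT32_MAX);
  return (uint32_t) diff + c;
}

/* Conditional formulation; assumed to describe the sad.u32 result.  */
static uint32_t
usad_cond (uint32_t a, uint32_t b, uint32_t c)
{
  return c + (a < b ? b - a : a - b);
}
```

Both variants wrap modulo 2^32 in the final addition, so they agree on the extreme inputs as well.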

[Bug target/105075] New: [nvptx] Generate sad insn (sum of absolute differences)

2022-03-28 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105075

Bug ID: 105075
   Summary: [nvptx] Generate sad insn (sum of absolute
differences)
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

ptx has a sad (sum of absolute differences) insn, which is currently not
modeled in the .md file.

[Bug libgomp/105042] [libgomp, GOMP_NVPTX_JIT=-O0] Openacc testsuite failures when X runs on nvidia driver

2022-03-25 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105042

--- Comment #5 from Tom de Vries  ---
(In reply to Richard Biener from comment #1)
> Doesn't whatever driver/library API we use from libgomp to invoke workloads
> report actual errors?  Maybe we need to improve there.

This:
...
libgomp: cuStreamSynchronize error: the launch timed out and was terminated
...
seems to be the string for cudaErrorLaunchTimeout, which AFAICT is dedicated to
this situation, so we could treat that error code specially in cuda_error in
plugin-nvptx.c and emit a custom message.

Say:
...
libgomp: cuStreamSynchronize error: the launch timed out and was terminated (5
second time-out caused by launching on a device running a display manager)
...

Alternatively, we could detect cudaDeviceProp::kernelExecTimeoutEnabled and
emit a warning when initializing or before launching the first kernel.
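
A rough sketch of the first option, kept free of any CUDA dependency so that it can be compiled on its own.  The helper name and the enum are hypothetical stand-ins; the numeric value 702 is assumed to match CUDA_ERROR_LAUNCH_TIMEOUT in cuda.h, and none of this is the actual cuda_error code from plugin-nvptx.c:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the driver API result code; value assumed to match
   CUDA_ERROR_LAUNCH_TIMEOUT in cuda.h.  */
enum { SKETCH_CUDA_ERROR_LAUNCH_TIMEOUT = 702 };

/* Hypothetical helper: format "libgomp: <what> error: <desc>" into BUF and,
   for the launch timeout code only, append a hint about the watchdog.
   Returns 1 if the hint was appended, 0 otherwise.  */
static int
format_cuda_error (char *buf, size_t size, const char *what, int code,
                   const char *desc)
{
  assert (buf != NULL && size > 0);
  int timeout = (code == SKETCH_CUDA_ERROR_LAUNCH_TIMEOUT);
  const char *hint
    = (timeout
       ? " (5 second time-out caused by launching on a device running"
         " a display manager)"
       : "");
  snprintf (buf, size, "libgomp: %s error: %s%s", what, desc, hint);
  return timeout;
}
```

Any other error code goes through unchanged, so existing messages keep their current form.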

[Bug libgomp/105042] [libgomp, GOMP_NVPTX_JIT=-O0] Openacc testsuite failures when X runs on nvidia driver

2022-03-25 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105042

--- Comment #4 from Tom de Vries  ---
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/592275.html

[Bug libgomp/105042] [libgomp, GOMP_NVPTX_JIT=-O0] Openacc testsuite failures when X runs on nvidia driver

2022-03-25 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105042

--- Comment #3 from Tom de Vries  ---
(In reply to Tom de Vries from comment #2)
> (In reply to Richard Biener from comment #1)
> > Doesn't whatever driver/library API we use from libgomp to invoke workloads
> > report actual errors?  Maybe we need to improve there.
> 
> Good point, it reported some form of timeout.  I'll post the exact form once
> I reproduce.

It's:
...
Execution timeout is: 300
spawn [open ...]^M

libgomp: cuStreamSynchronize error: the launch timed out and was terminated
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0 
execution test
...

Googling a bit about this error message (
https://forums.developer.nvidia.com/t/need-to-remove-timeouts-and-the-launch-timed-out-and-was-terminated-message/16741/2
) shows that running a display manager sets a 5/10 seconds watchdog timer on
any kernel.

[Bug target/105014] [nvptx] FAIL: gcc.dg/pr97459-1.c execution test

2022-03-24 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105014

--- Comment #5 from Tom de Vries  ---
Minimal test-case:
...
void __attribute__((noinline)) foo (unsigned long long d0) {
  unsigned long long __a;
  __a = 0x38;
  for (; __a > 0; __a -= 8)
    if (((d0 >> __a) & 0xff) != 0)
      break;

  __builtin_printf ("__a: 0x%llx\n", __a);
}

int main (void) {
  foo (1);
  return 0;
}
...

Different value of __a:
...
$ ./install/bin/nvptx-none-run -O0 ./pr97459-1.exe ; echo; ./install/bin/nvptx-none-run ./pr97459-1.exe
__a: 0x0

__a: 0x30
...
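
For reference, the loop can be evaluated on the host.  The sketch below is a copy of the loop from the test-case that returns __a instead of printing it; the function name is hypothetical.  For d0 == 1 every byte examined at shifts 0x38 down to 0x8 is zero, so the loop only stops once __a reaches 0, which means the 0x0 printed with JIT -O0 is the expected value and 0x30 is not:

```c
/* Host-side copy of the loop from the test-case, returning the final
   value of __a.  Hypothetical name, for illustration only.  */
static unsigned long long
final_shift (unsigned long long d0)
{
  unsigned long long a;
  for (a = 0x38; a > 0; a -= 8)
    if (((d0 >> a) & 0xff) != 0)
      break;
  return a;
}
```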

[Bug target/105011] [nvptx] FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test

2022-03-24 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105011

Tom de Vries  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
   Target Milestone|--- |12.0
 Resolution|--- |FIXED

--- Comment #5 from Tom de Vries  ---
Fixed by commit.

[Bug libgomp/105042] [libgomp, GOMP_NVPTX_JIT=-O0] Openacc testsuite failures when X runs on nvidia driver

2022-03-24 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105042

--- Comment #2 from Tom de Vries  ---
(In reply to Richard Biener from comment #1)
> Doesn't whatever driver/library API we use from libgomp to invoke workloads
> report actual errors?  Maybe we need to improve there.

Good point, it reported some form of timeout.  I'll post the exact form once I
reproduce.

[Bug target/105014] [nvptx] FAIL: gcc.dg/pr97459-1.c execution test

2022-03-24 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105014

--- Comment #4 from Tom de Vries  ---
(In reply to Tom de Vries from comment #1)
> With -O0 JIT instead:

Also OK with JIT -O1, problems start at JIT -O2.

[Bug target/105011] [nvptx] FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test

2022-03-24 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105011

--- Comment #3 from Tom de Vries  ---
Submitted fix :
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/592211.html

Though without changelog, apparently.

[Bug libgomp/105042] New: [libgomp, GOMP_NVPTX_JIT=-O0] Openacc testsuite failures when X runs on nvidia driver

2022-03-24 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105042

Bug ID: 105042
   Summary: [libgomp, GOMP_NVPTX_JIT=-O0] Openacc testsuite
failures when X runs on nvidia driver
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libgomp
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
CC: jakub at gcc dot gnu.org
  Target Milestone: ---

I usually have only an nvidia-compute$n driver package installed, but sometimes
(as happened when I updated the system yesterday) also x11-video-nvidia$n,
after which X is run on the nvidia card (instead of on the builtin intel
graphics).

With such a setup, I run into a cluster of FAILs, all for GOMP_NVPTX_JIT=-O0:
...
$ grep ^FAIL: 2/libgomp.sum
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0
-DGOMP_NVPTX_JIT=-O0 execution test
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vred2d-128.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0
-DGOMP_NVPTX_JIT=-O0 execution test
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vred2d-128.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2
-DGOMP_NVPTX_JIT=-O0 execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/parallel-dims.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0
-DGOMP_NVPTX_JIT=-O0 execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/vred2d-128.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0
-DGOMP_NVPTX_JIT=-O0 execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/vred2d-128.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2
-DGOMP_NVPTX_JIT=-O0 execution test
FAIL: libgomp.oacc-fortran/parallel-dims.f90 -DACC_DEVICE_TYPE_nvidia=1
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0 -DGOMP_NVPTX_JIT=-O0 execution
test
FAIL: libgomp.oacc-fortran/parallel-dims.f90 -DACC_DEVICE_TYPE_nvidia=1
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O1 -DGOMP_NVPTX_JIT=-O0 execution
test
FAIL: libgomp.oacc-fortran/parallel-dims.f90 -DACC_DEVICE_TYPE_nvidia=1
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -Os -DGOMP_NVPTX_JIT=-O0 execution
test
...

Note that this is with a patch from PR104423 that runs tests both with default
JIT optimization and GOMP_NVPTX_JIT=-O0, hence the -DGOMP_NVPTX_JIT=-O0 tag. 
But it can be reproduced by just doing:
...
export GOMP_NVPTX_JIT=-O0
...

It could be that the test-cases just need scaling down.  OTOH, it also could be
that there's an underlying problem that only surfaces when other processes are
run in parallel, or specifically, X.

This is on board K2000 with driver 470.103.01.

The board has 2GB of memory, and according to nvidia-smi, having the X
processes takes a couple of 100MBs, and ./parallel-dims.exe just takes 15MiB,
so at first glance it doesn't seem to be an out-of-board-memory thing.

I do observe reduced system responsiveness while running the tests, so maybe
it's the compute capacity rather than memory which is exhausted.

[Bug target/105019] [nvptx] malias in libgomp results in "Internal error: reference to deleted section"

2022-03-23 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105019

--- Comment #6 from Tom de Vries  ---
(In reply to Tom de Vries from comment #4)
> OK, I think this is the pattern:
> ...
> $ cat gcc/testsuite/gcc.target/nvptx/alias-5.c

FTR, same thing if I use static functions:
...
static void __attribute__((noinline))
__f ()
{
  v = 1;
}

static void f () __attribute__ ((alias ("__f")));

static void
g (void)
{
  f ();
}
...

[Bug target/105019] [nvptx] malias in libgomp results in "Internal error: reference to deleted section"

2022-03-23 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105019

--- Comment #5 from Tom de Vries  ---
Creating a CUDA example is hampered by the fact that there's no symbol alias
support, AFAICT.

I'd like to write something like:
...
__device__ void __foo ()
{
  printf ("__foo\n");
}
__device__ void foo () __attribute__((alias ("__foo")));
__device__ void bar ()
{
  foo ();
}
__global__ void hello_world ()
{
  bar ();
}
...
(and then comment out bar to reproduce the problem), but all attempts to get
this compiled and executed end up in various errors.

Of course we can resort to hand-edited ptx, or adding .alias directives in asm
statements, but that's likely to produce an 'unsupported' response by nvidia
when filed as bug report.

[Bug target/105019] [nvptx] malias in libgomp results in "Internal error: reference to deleted section"

2022-03-23 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105019

--- Comment #4 from Tom de Vries  ---
OK, I think this is the pattern:
...
$ cat gcc/testsuite/gcc.target/nvptx/alias-5.c
/* { dg-do link } */
/* { dg-do run { target runtime_ptx_isa_version_6_3 } } */
/* { dg-options "-save-temps -malias -mptx=6.3" } */

int v;

void __attribute__((noinline))
__f ()
{
  v = 1;
}

void f () __attribute__ ((alias ("__f")));

void
g (void)
{
  f ();
}

int
main (void)
{
  if (v != 0)
    __builtin_abort ();

  return 0;
}
...

There's a function __f:
...
// BEGIN GLOBAL FUNCTION DEF: __f
.visible .func __f
{
.reg .u32 %r22;
mov.u32 %r22,1;
st.global.u32 [v],%r22;
ret;
}
...
with alias f:
...
// BEGIN GLOBAL FUNCTION DECL: __f
.visible .func __f;
.visible .func f;
.alias f,__f;
...
called from g:
...
// BEGIN GLOBAL FUNCTION DEF: g
.visible .func g
{
{
call f;
}
ret;
}
...

However, g is unused.

So we have:
...
PASS: gcc.target/nvptx/alias-5.c (test for excess errors)
spawn nvptx-none-run ./alias-5.exe^M
fatal   : Internal error: reference to deleted section^M
nvptx-run: cuLinkComplete failed: unknown error (CUDA_ERROR_UNKNOWN, 999)^M
FAIL: gcc.target/nvptx/alias-5.c execution test
...

Calling g from main fixes the internal error.

[Bug target/105019] [nvptx] malias in libgomp results in "Internal error: reference to deleted section"

2022-03-23 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105019

--- Comment #3 from Tom de Vries  ---
Aliases in failing .exe:
...
$ strings declare_target-1.exe | grep "\.alias"
.alias gomp_ialias_GOMP_taskgroup_start,GOMP_taskgroup_start;
.alias gomp_ialias_GOMP_taskgroup_end,GOMP_taskgroup_end;
.alias gomp_ialias_GOMP_taskgroup_reduction_register,GOMP_taskgroup_reduction_register;
...

[Bug target/105019] [nvptx] malias in libgomp results in "Internal error: reference to deleted section"

2022-03-23 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105019

--- Comment #2 from Tom de Vries  ---
Aliases in libgomp.a:
...
$ grep "\.alias" build-gcc-offload-nvptx-none/nvptx-none/mgomp/libgomp/.libs/libgomp.a
.alias gomp_ialias_GOMP_loop_runtime_next,GOMP_loop_runtime_next;
.alias gomp_ialias_GOMP_loop_ull_runtime_next,GOMP_loop_ull_runtime_next;
.alias gomp_ialias_GOMP_parallel_end,GOMP_parallel_end;
.alias gomp_ialias_GOMP_taskgroup_start,GOMP_taskgroup_start;
.alias gomp_ialias_GOMP_taskgroup_end,GOMP_taskgroup_end;
.alias gomp_ialias_GOMP_taskgroup_reduction_register,GOMP_taskgroup_reduction_register;
.alias gomp_ialias_omp_capture_affinity,omp_capture_affinity;
.alias gomp_ialias_omp_aligned_alloc,omp_aligned_alloc;
.alias gomp_ialias_omp_free,omp_free;
.alias gomp_ialias_omp_aligned_calloc,omp_aligned_calloc;
...

[Bug target/105019] [nvptx] malias in libgomp results in "Internal error: reference to deleted section"

2022-03-23 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105019

--- Comment #1 from Tom de Vries  ---
To trigger:
...
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 87efc23bd96..8bf9ea90a77 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -245,6 +245,9 @@ default_ptx_version_option (void)
  warp convergence.  */
   res = MAX (res, PTX_VERSION_6_0);

+  /* Pick at least 6.3, to enable using malias.  */
+  res = MAX (res, PTX_VERSION_6_3);
+
   /* Verify that we pick a version that supports the sm.  */
   gcc_assert (first <= res);
   return res;
diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index 11288d1a8ee..a4aece80682 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -87,7 +87,7 @@ mptx-comment
 Target Var(nvptx_comment) Init(1) Undocumented

 malias-
-Target Var(nvptx_alias) Init(0) Undocumented
+Target Var(nvptx_alias) Init(1) Undocumented

 mexperimental
 Target Var(nvptx_experimental) Init(0) Undocumented
...
rebuild gcc, run libgomp tests, and:
...
$ grep -c "Internal error: reference to deleted section" libgomp.log 
637
...

For instance:
...
Execution timeout is: 300
spawn [open ...]^M

libgomp: Link error log fatal   : Internal error: reference to deleted section


libgomp: cuLinkComplete error: unknown error

libgomp: Cannot map target functions or variables (expected 2, have 4294967295)
FAIL: libgomp.c/../libgomp.c-c++-common/declare_target-1.c execution test
...

[Bug target/105019] New: [nvptx] malias in libgomp results in "Internal error: reference to deleted section"

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105019

Bug ID: 105019
   Summary: [nvptx] malias in libgomp results in "Internal error:
reference to deleted section"
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

As mentioned in the commit message for malias:
...
When enabling malias by default, libgomp detects alias support and
consequently libgomp.a will contains a few uses of .alias.  This however
results in aforementioned "Internal error: reference to deleted section" in
many test-cases.  Either there's some error with how .alias is used, or
there's a driver bug.  While this issue is not resolved, we keep malias
off-by-default.
...

This needs to be investigated, and if it's a driver bug, reported to nvidia, or
otherwise fixed or worked around.

Note: the same error showed up in a test-case where the call to an alias was
inlined, and consequently the alias referenced a defined but unused function. 
To observe this, disable this bit:
...
  if (!cgraph_node::get (name)->referred_to_p ())
    /* Prevent "Internal error: reference to deleted section".  */
    return;
...
in nvptx_asm_output_def_from_decls and run nvptx.exp=alias-2.c:
...
PASS: gcc.target/nvptx/alias-2.c (test for excess errors)
spawn nvptx-none-run ./alias-2.exe^M
fatal   : Internal error: reference to deleted section^M
nvptx-run: cuLinkComplete failed: unknown error (CUDA_ERROR_UNKNOWN, 999)^M
FAIL: gcc.target/nvptx/alias-2.c execution test
...

So it's possible that the error is somehow related to this scenario.

[Bug target/105014] [nvptx] FAIL: gcc.dg/pr97459-1.c execution test

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105014

--- Comment #3 from Tom de Vries  ---
(In reply to Tom de Vries from comment #2)
> (In reply to Tom de Vries from comment #0)
> > On a quadro k2000 with driver 470.103.01, I run into:
> 
> So, sm_30.
> 
> > ...
> > FAIL: gcc.dg/pr97459-1.c execution test
> 
> Reproduced on geforce gt710 (sm_35), with same driver.

But not on quadro k620 (sm_50).

[Bug target/105014] [nvptx] FAIL: gcc.dg/pr97459-1.c execution test

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105014

--- Comment #2 from Tom de Vries  ---
(In reply to Tom de Vries from comment #0)
> On a quadro k2000 with driver 470.103.01, I run into:

So, sm_30.

> ...
> FAIL: gcc.dg/pr97459-1.c execution test

Reproduced on geforce gt710 (sm_35), with same driver.

[Bug target/105018] [nvptx] Need better alias support

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105018

--- Comment #2 from Tom de Vries  ---
As mentioned before by amonakov, a possibility is to add alias support to the
nvptx-tools linker, and use that.

[Bug target/105018] [nvptx] Need better alias support

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105018

--- Comment #1 from Tom de Vries  ---
(In reply to Tom de Vries from comment #0)
> Aliases to aliases are not supported (see libgomp.c-c++-common/pr96390.c).
> This is currently not prohibited by the compiler, but with the driver link we
> run into:  "Internal error: alias to unknown symbol" .

And that is the reason that libgomp.c-c++-common/pr96390.c and friends don't
pass when I do:
...
/* { dg-additional-options "-foffload=-mptx=6.3 -foffload=-malias" { target
offload_target_nvptx } } */
...

[Bug target/105018] New: [nvptx] Need better alias support

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105018

Bug ID: 105018
   Summary: [nvptx] Need better alias support
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

We currently have alias support enabled by malias, which relies on the ptx
.alias directive.

There are a number of limitations, listed in the commit adding malias:
...
Only function aliases are supported.

Weak aliases are not supported.  That is, if I disable the check in
nvptx_asm_output_def_from_decls that disallows this, a weak alias is emitted
and parsed by the driver.  But the test gcc.dg/globalalias.c starts failing,
with the behaviour matching the comment about "weird behavior of AIX's .set
pseudo-op": a weak alias may resolve to different functions in different
files.

Aliases to weak symbols are not supported (see gcc.dg/localalias.c).  This is
currently not prohibited by the compiler, but with the driver link we run
into: "error: Function test with .weak scope cannot be aliased".

Aliases to aliases are not supported (see libgomp.c-c++-common/pr96390.c).
This is currently not prohibited by the compiler, but with the driver link we
run into:  "Internal error: alias to unknown symbol" .

Unreferenced aliases are not emitted (these can occur f.i. when inlining a
call to an alias).  This avoids driver link error "Internal error: reference
to deleted section".
...

We'd like an implementation that doesn't have (all of) these limitations.
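
For reference, the alias flavours named in the list above can be written in GNU C roughly as in the sketch below. It is only an illustration for an ELF host target, where all of these forms are accepted; the function names are hypothetical and are not taken from the testsuite files mentioned.

```c
/* Illustrative only: names are hypothetical; assumes GCC or Clang on an
   ELF target, where every form below is accepted.  */

/* An ordinary function definition that the aliases refer to.  */
int target_fn (void) { return 42; }

/* A weak definition, used as the target of an alias further down.  */
__attribute__ ((weak)) int weak_target_fn (void) { return 7; }

/* Plain function alias: the one form the list describes as supported.  */
int plain_alias (void) __attribute__ ((alias ("target_fn")));

/* Weak alias: listed as not supported.  */
int weak_alias (void) __attribute__ ((weak, alias ("target_fn")));

/* Alias to a weak symbol: listed as failing at driver link time.  */
int alias_to_weak (void) __attribute__ ((alias ("weak_target_fn")));

/* Alias to an alias: listed as failing at driver link time.  */
int alias_to_alias (void) __attribute__ ((alias ("plain_alias")));
```

On such a host, calling any of the aliases simply runs the body of the function it ultimately refers to.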

[Bug target/97106] [nvptx] Issues with weak aliases introduced by C++

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97106

Tom de Vries  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
   Target Milestone|--- |12.0
 Resolution|--- |FIXED

--- Comment #5 from Tom de Vries  ---
Using the test-case from comment 0 and:
...
/* { dg-additional-options "-foffload=-malias -foffload=-mptx=6.3 -O0" } */
...
I get:
...
$ strings test.exe | grep -i alias.alias
_ZN1VILi1EEC1ImvEET_,_ZN1VILi1EEC2ImvEET_;
...
so I see a normal alias, not a weak alias.

The test-case still fails in the abort.

Note that I get the same result with:
...
/* { dg-additional-options "-foffload=-mno-alias -foffload=-mptx=6.3 -O2" } */
...

There may be a problem with the test-case, there may be a problem with nvptx
c++ support, but the alias issue seems to have been addressed, so I'm closing
this one.

[Bug target/97106] [nvptx] Issues with weak aliases introduced by C++

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97106
Bug 97106 depends on bug 97102, which changed state.

Bug 97102 Summary: [nvptx] PTX JIT compilation failed when using aliases
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97102

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

[Bug target/97102] [nvptx] PTX JIT compilation failed when using aliases

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97102

Tom de Vries  changed:

   What|Removed |Added

   Target Milestone|--- |12.0
 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #8 from Tom de Vries  ---
The test-case from comment 0 now works in combination with:
/* { dg-additional-options "-foffload=-malias -foffload=-mptx=6.3" } */

[Bug libgomp/98215] Coalescing memory in target region creates slower code

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98215

Tom de Vries  changed:

   What|Removed |Added

   Severity|normal  |enhancement
   Keywords||missed-optimization
 CC||vries at gcc dot gnu.org

[Bug target/104916] [nvptx] Handle Independent Thread Scheduling for sm_70+ with -muniform-simt

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104916

Tom de Vries  changed:

   What|Removed |Added

   Target Milestone|--- |12.0

[Bug target/104916] [nvptx] Handle Independent Thread Scheduling for sm_70+ with -muniform-simt

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104916

Tom de Vries  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Tom de Vries  ---
Fixed.

[Bug target/104783] [nvptx, openmp] Hang/abort with atomic update in simd construct

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104783

Tom de Vries  changed:

   What|Removed |Added

 Resolution|--- |FIXED
   Target Milestone|--- |12.0
 Status|UNCONFIRMED |RESOLVED

--- Comment #8 from Tom de Vries  ---
Fixed.

[Bug target/104957] [nvptx] Use .alias directive (available starting ptx isa version 6.3)

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104957

Tom de Vries  changed:

   What|Removed |Added

   Target Milestone|--- |12.0
 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Tom de Vries  ---
Committed.

[Bug target/104925] [nvptx] Use "%" as register prefix

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104925

Tom de Vries  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED
   Target Milestone|--- |12.0

--- Comment #2 from Tom de Vries  ---
Fixed.

[Bug libgcc/105016] [libgcc, TARGET_HAS_NO_HW_DIVIDE] Incorrect result for __udivmodti4

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105016

--- Comment #3 from Tom de Vries  ---
In libgcc.h, I see:
...
#define __udivmoddi4 __NDW(udivmod,4)
...
and for LIBGCC2_UNITS_PER_WORD == 8 we have:
...
#define __NDW(a,b)  __ ## a ## ti ## b
...

So, AFAICT it's possible that __udivmoddi4 is mapped to __udivmodti4.
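
The token pasting can be checked in isolation with a small sketch like the one below. NDW_DEMO mirrors the shape of the __NDW definition quoted above; the *_DEMO names and the helper functions are hypothetical, chosen to avoid redefining libgcc's own identifiers.

```c
#include <string.h>

/* Illustrative only: NDW_DEMO mirrors the shape of the __NDW definition
   quoted above for LIBGCC2_UNITS_PER_WORD == 8; the *_DEMO names are
   hypothetical.  */
#define NDW_DEMO(a, b) __ ## a ## ti ## b

/* Two-level stringification so the argument is macro-expanded first.  */
#define STR_DEMO_1(x) #x
#define STR_DEMO(x) STR_DEMO_1 (x)

/* Returns the identifier that NDW_DEMO (udivmod, 4) expands to, as text.  */
static const char *
expanded_name (void)
{
  return STR_DEMO (NDW_DEMO (udivmod, 4));
}

/* Returns non-zero if the expansion is the TImode name.  */
static int
maps_to_udivmodti4 (void)
{
  return strcmp (expanded_name (), "__udivmodti4") == 0;
}
```

With this shape of macro, a source-level definition written as __udivmoddi4 ends up emitted under the ti name.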

[Bug libgcc/105016] [libgcc, TARGET_HAS_NO_HW_DIVIDE] Incorrect result for __udivmodti4

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105016

--- Comment #1 from Tom de Vries  ---
Created attachment 52662
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52662&action=edit
test-case

[Bug libgcc/105016] New: [libgcc, TARGET_HAS_NO_HW_DIVIDE] Incorrect result for __udivmodti4

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105016

Bug ID: 105016
   Summary: [libgcc, TARGET_HAS_NO_HW_DIVIDE] Incorrect result for
__udivmodti4
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libgcc
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

While investigating PR105014, I copied the TARGET_HAS_NO_HW_DIVIDE
implementation of __udivmoddi4 to the test-case.

On x86, when using the native __udivmodti4, I get:
...
$ ./a.out 
a   : 0xfffb
b   : 0x0001
div : 0xfffb
mod : 0x
... 

But with the TARGET_HAS_NO_HW_DIVIDE version I get instead:
...
a   : 0xfffb
b   : 0x0001
div : 0x
mod : 0xfffc
...

[Bug target/105014] [nvptx] FAIL: gcc.dg/pr97459-1.c execution test

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105014

--- Comment #1 from Tom de Vries  ---
First FAIL minimizes to:
...
typedef __uint128_t T;

union u {
  T t;
  struct {
unsigned long long x;
unsigned long long y;
  } xy;
};

#define PRINT(VAR)  \
  do\
{   \
  __builtin_printf (#VAR ": lo: %llx\n", VAR.xy.x); \
  __builtin_printf (#VAR ": hi: %llx\n", VAR.xy.y); \
} \
  while (0)

extern T __udivmodti4 (T, T, T *);

int
main (void)
{
  union u a, b, mod, div;
  a.t = -4;
  b.t = 1;
  PRINT (a);
  PRINT (b);
  div.t = __udivmodti4 (a.t, b.t, &mod.t);
  PRINT (div);
  PRINT (mod);
  if (mod.t != 0)
__builtin_abort ();

  return 0;
}
...

Fails like this:
...
$ ./install/bin/nvptx-none-run  ./pr97459-1.exe 
a: lo: fffc
a: hi: 
b: lo: 1
b: hi: 0
div: lo: fffd
div: hi: 
mod: lo: 
mod: hi: 0
nvptx-run: error getting kernel result: unspecified launch failure
(CUDA_ERROR_LAUNCH_FAILED, 719)
$
...

With -O0 JIT instead:
...
$ ./install/bin/nvptx-none-run -O0  ./pr97459-1.exe 
a: lo: fffc
a: hi: 
b: lo: 1
b: hi: 0
div: lo: fffc
div: hi: 
mod: lo: 0
mod: hi: 0
...

[Bug target/105014] New: [nvptx] FAIL: gcc.dg/pr97459-1.c execution test

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105014

Bug ID: 105014
   Summary: [nvptx] FAIL: gcc.dg/pr97459-1.c execution test
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

On a quadro k2000 with driver 470.103.01, I run into:
...
FAIL: gcc.dg/pr97459-1.c execution test
FAIL: gcc.dg/pr97459-2.c execution test
FAIL: gcc.dg/pr97459-3.c execution test
FAIL: gcc.dg/pr97459-4.c execution test
FAIL: gcc.dg/pr97459-5.c execution test
FAIL: gcc.dg/pr97459-6.c execution test
...

[Bug target/105011] [nvptx] FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105011

--- Comment #2 from Tom de Vries  ---
Even better:
...
diff --git a/libatomic/tas_n.c b/libatomic/tas_n.c
index d0d8c283b495..65eaa7753a51 100644
--- a/libatomic/tas_n.c
+++ b/libatomic/tas_n.c
@@ -73,7 +73,7 @@ SIZE(libat_test_and_set) (UTYPE *mptr, int smodel)
 __ATOMIC_RELAXED, __ATOMIC_RELAXED));

   post_barrier (smodel);
-  return woldval != 0;
+  return (woldval & wval) == wval;
 }

 #define DONE 1
...

That also gives back accurate results in case
TARGET_ATOMIC_TEST_AND_SET_TRUEVAL has more than one bit set.
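
To illustrate the difference between the two return expressions, the sketch below compares them for a made-up TRUEVAL with two bits set. The helper names and all values are hypothetical; this is not the libatomic code itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative only: these helpers are hypothetical stand-ins for the
   return expression in tas_n.c.  */

/* Variant suggested earlier in this bug: true if any TRUEVAL bit was set.  */
static bool
was_set_any_bit (uint64_t woldval, uint64_t wval)
{
  return (woldval & wval) != 0;
}

/* Variant from the diff above: true only if all TRUEVAL bits were set.  */
static bool
was_set_all_bits (uint64_t woldval, uint64_t wval)
{
  return (woldval & wval) == wval;
}
```

For a single-bit wval the two variants always agree; they differ only when the old byte holds some, but not all, of the TRUEVAL bits.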

[Bug target/105011] [nvptx] FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105011

--- Comment #1 from Tom de Vries  ---
(In reply to Tom de Vries from comment #0)
> It should probably do something like:
> ...
>   return (woldval & wval) != 0;
> ...

Indeed, that fixes the FAILs.

[Bug target/105011] New: [nvptx] FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test

2022-03-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105011

Bug ID: 105011
   Summary: [nvptx] FAIL: gcc.dg/atomic/stdatomic-flag-2.c   -O1
execution test
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

On a quadro k2000 with driver 470.103.01 I'm running into this cluster of
FAILs:
...
FAIL: gcc.dg/atomic/stdatomic-flag-2.c   -O1  execution test
FAIL: gcc.dg/atomic/stdatomic-flag-2.c   -O3 -g  execution test
FAIL: gcc.dg/atomic/stdatomic-flag.c   -O3 -g  execution test
...

Minimizing the first FAIL, I end up with:
...
#include <stdatomic.h>

extern void abort (void);
atomic_flag a = ATOMIC_FLAG_INIT;

int
main ()
{
  int b;

  if ((atomic_flag_test_and_set) (&a))
    abort ();

  return 0;
}
...

The atomic access is done by libatomic, using a 64-bit cas loop.

If we print the address of a, we have:
...
&a: 000700700200
...
so the pointer is already 64-bit aligned.

If we print the 64-bit value of *(unsigned long long *)&a before and after the
test-and-set, we have:
...
a: 00024f00
a: 00024f01
...
so that looks all-right as well.

At first glance, the problem is in libatomic, tas_n.c:
...
  wval = (UWORD)__GCC_ATOMIC_TEST_AND_SET_TRUEVAL << shift;
  woldval = __atomic_load_n (wptr, __ATOMIC_RELAXED);
  do
    {
      t = woldval | wval;
    }
  while (!atomic_compare_exchange_w (wptr, &woldval, t, true,
                                     __ATOMIC_RELAXED, __ATOMIC_RELAXED));

  post_barrier (smodel);
  return woldval != 0;
...

What is returned is woldval != 0, but that tests the entire word, not just the
byte we're interested in.

It should probably do something like:
...
  return (woldval & wval) != 0;
...
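
A minimal sketch of the difference, using hypothetical helpers and made-up word contents in which a neighbouring byte of the same 64-bit word is non-zero while the flag byte (assumed here to be at shift 0) is clear:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative only: hypothetical helpers, not the libatomic code.  */

/* Current behaviour: tests the entire word.  */
static bool
old_value_whole_word (uint64_t woldval)
{
  return woldval != 0;
}

/* Proposed behaviour: tests only the bits belonging to the flag.  */
static bool
old_value_masked (uint64_t woldval, uint64_t wval)
{
  return (woldval & wval) != 0;
}
```

With a word such as 0x24f00 and wval == 1, the whole-word test reports the flag as previously set although its byte is zero, while the masked test does not.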

[Bug middle-end/105001] If executing with non-nvptx offloading, but nvptx offloading compilation is enabled: FAIL: libgomp.c/pr104783.c execution test

2022-03-21 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105001

--- Comment #1 from Tom de Vries  ---
Interesting.

Can you compare dump files to see where the difference comes from?

[Bug target/104936] [nvptx] Handle weak decl/def distinction in common code

2022-03-21 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104936

Tom de Vries  changed:

   What|Removed |Added

   Severity|normal  |enhancement
   Keywords||internal-improvement

[Bug target/104991] New: [nvptx] Simplify muniform-simt transformation

2022-03-20 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104991

Bug ID: 104991
   Summary: [nvptx] Simplify muniform-simt transformation
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

The muniform-simt reorg pass transforms the insn stream, both inside and
outside an SIMT region.

The transform rewrites atomic insns by adding a predicate to execute in one
thread only outside the SIMT region.  Furthermore, if the atomic insn has a
result, a shuffle is added to propagate the result to all threads in the warp.

Inside the SIMT region, the predicate evaluates to true such that all threads
in the warp execute it.

And the source lane register for the shuffle is set such that the shuffle is a
nop inside the SIMT region.

However, since we've started using shfl.sync for the shuffle, the shuffle now
has its own predicate, and consequently having a source lane register with
different values inside and outside the SIMT region is no longer necessary.

[Bug target/104968] [nvptx][OpenMP] SIGSEGV / ICE in final_scan_insn_1

2022-03-18 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104968

Tom de Vries  changed:

   What|Removed |Added

 Resolution|--- |FIXED
   Target Milestone|--- |12.0
 Status|UNCONFIRMED |RESOLVED

--- Comment #7 from Tom de Vries  ---
Fixed by
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=356e2720e9030927579024c2f060d665a0b9080f
.

[Bug target/104952] [nvptx][OpenMP] wrong code with OR / AND reduction ('reduction(||:' and '&&') with SIMT

2022-03-18 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104952

Tom de Vries  changed:

   What|Removed |Added

 Resolution|--- |FIXED
   Target Milestone|--- |12.0
 Status|UNCONFIRMED |RESOLVED

--- Comment #12 from Tom de Vries  ---
Fixed by "[openmp] Fix SIMT reduction using TRUTH_{AND,OR}IF_EXPR".

[Bug target/104968] [nvptx][OpenMP] SIGSEGV / ICE in final_scan_insn_1

2022-03-18 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104968

--- Comment #6 from Tom de Vries  ---
(In reply to Tom de Vries from comment #5)
> This patch fixes the ICE at openmp level:
> ...
> diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
> index 139a0de6100..19af384c634 100644
> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -13361,6 +13361,7 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
>g = gimple_build_bind (NULL_TREE, gfor, NULL_TREE);
>g = gimple_build_omp_task (g, task_clauses, NULL_TREE, NULL_TREE,
>  NULL_TREE, NULL_TREE, NULL_TREE);
> +  gimple_set_location (g, EXPR_LOCATION (*expr_p));
>gimple_omp_task_set_taskloop_p (g, true);
>g = gimple_build_bind (NULL_TREE, g, NULL_TREE);
>gomp_for *gforo
> ...

Submitted a more complete patch here (
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591954.html ).

[Bug target/104957] [nvptx] Use .alias directive (available starting ptx isa version 6.3)

2022-03-18 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104957

Tom de Vries  changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug target/104968] [nvptx][OpenMP] SIGSEGV / ICE in final_scan_insn_1

2022-03-17 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104968

Tom de Vries  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #5 from Tom de Vries  ---
This patch fixes the ICE at openmp level:
...
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 139a0de6100..19af384c634 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -13361,6 +13361,7 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
   g = gimple_build_bind (NULL_TREE, gfor, NULL_TREE);
   g = gimple_build_omp_task (g, task_clauses, NULL_TREE, NULL_TREE,
 NULL_TREE, NULL_TREE, NULL_TREE);
+  gimple_set_location (g, EXPR_LOCATION (*expr_p));
   gimple_omp_task_set_taskloop_p (g, true);
   g = gimple_build_bind (NULL_TREE, g, NULL_TREE);
   gomp_for *gforo
...

[Bug target/104968] [nvptx][OpenMP] SIGSEGV / ICE in final_scan_insn_1

2022-03-17 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104968

--- Comment #4 from Tom de Vries  ---
This ( https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591912.html )
proposed patch fixes this ICE, pinged again.

[Bug target/104968] [nvptx][OpenMP] SIGSEGV / ICE in final_scan_insn_1

2022-03-17 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104968

--- Comment #3 from Tom de Vries  ---
(In reply to Tom de Vries from comment #2)
> (In reply to Tom de Vries from comment #1)
> > Can't reproduce.
> > 
> > Is this not fixed by:
> > ...
> > commit 7862f6ccd85a001e4d70abb00bb95d8c7846ba80
> > Author: Tom de Vries 
> > Date:   Wed Feb 23 09:33:33 2022 +0100
> > 
> > [nvptx] Fix dummy location in gen_comment
> > ...
> > ?
> 
> Hmm, wait, of course I have a patch in my stack that's pending for upstream.
> Let me undo that one and retry.

Ack, reproduced.

[Bug target/104968] [nvptx][OpenMP] SIGSEGV / ICE in final_scan_insn_1

2022-03-17 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104968

--- Comment #2 from Tom de Vries  ---
(In reply to Tom de Vries from comment #1)
> Can't reproduce.
> 
> Is this not fixed by:
> ...
> commit 7862f6ccd85a001e4d70abb00bb95d8c7846ba80
> Author: Tom de Vries 
> Date:   Wed Feb 23 09:33:33 2022 +0100
> 
> [nvptx] Fix dummy location in gen_comment
> ...
> ?

Hmm, wait, of course I have a patch in my stack that's pending for upstream.
Let me undo that one and retry.

[Bug target/104968] [nvptx][OpenMP] SIGSEGV / ICE in final_scan_insn_1

2022-03-17 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104968

--- Comment #1 from Tom de Vries  ---
Can't reproduce.

Is this not fixed by:
...
commit 7862f6ccd85a001e4d70abb00bb95d8c7846ba80
Author: Tom de Vries 
Date:   Wed Feb 23 09:33:33 2022 +0100

[nvptx] Fix dummy location in gen_comment
...
?

[Bug target/104952] [nvptx][OpenMP] wrong code with OR / AND reduction ('reduction(||:' and '&&') with SIMT

2022-03-17 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104952

--- Comment #9 from Tom de Vries  ---
Created attachment 52647
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52647&action=edit
Tentative patch with test-cases, rationale and changelog

I'll put this through testing, and submit if no problems found.

[Bug target/104952] [nvptx][OpenMP] wrong code with OR / AND reduction ('reduction(||:' and '&&') with SIMT

2022-03-17 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104952

--- Comment #8 from Tom de Vries  ---
(In reply to Jakub Jelinek from comment #6)
> And yes, #c1 is valid.

Thanks for confirming.

> But would be nice to have similar test with && and
> initial result = 2; and arr[] say { 1, 2, 3, 4, 5, 6, 7, ..., 32 } and test
> result is 1 at the end to make sure we don't actually do just
> orig = orig & (private != 0)
> style merging or even just
> orig = orig & private;

Ack, will add that.
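
As a rough sketch of why the suggested test would catch those merge styles, the three variants can be compared on plain ints. The helper names are hypothetical; the merge expressions follow the ones quoted above.

```c
/* Illustrative only: hypothetical helper names; the merge styles follow
   the expressions quoted above.  */

/* Per-thread partial result of an && reduction, starting from 1.  */
static int
partial_and (const int *arr, int n)
{
  int priv = 1;
  for (int i = 0; i < n; i++)
    priv = priv && arr[i];
  return priv;
}

/* Intended merge: logical and.  */
static int
merge_logical (int orig, int priv)
{
  return orig && priv;
}

/* Wrong merge: bitwise and with the truth value of the private copy.  */
static int
merge_bitand_test (int orig, int priv)
{
  return orig & (priv != 0);
}

/* Wrong merge: plain bitwise and.  */
static int
merge_bitand (int orig, int priv)
{
  return orig & priv;
}
```

With an initial result of 2 and all array elements non-zero, only the logical merge yields 1; both bitwise styles yield 0, because 2 & 1 is 0.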

[Bug target/104952] [nvptx][OpenMP] wrong code with OR / AND reduction ('reduction(||:' and '&&') with SIMT

2022-03-17 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104952

--- Comment #7 from Tom de Vries  ---
Alternative fix that doesn't require fiddling with the 'code' var:
...
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index d932d74cb03..d0ddd4a6142 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -6734,7 +6734,10 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
  x = build_call_expr_internal_loc
(UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY,
 TREE_TYPE (ivar), 2, ivar, simt_lane);
- x = build2 (code, TREE_TYPE (ivar), ivar, x);
+ /* Make sure x is evaluated unconditionally.  */
+ tree bfly_var = create_tmp_var (TREE_TYPE (ivar));
+ gimplify_assign (bfly_var, x, &llist[2]);
+ x = build2 (code, TREE_TYPE (ivar), ivar, bfly_var);
  gimplify_assign (ivar, x, &llist[2]);
}
  tree ivar2 = ivar;
...
