Enable 'gcc.dg/pr114768.c' for nvptx target [PR114768] (was: [PATCH] rtlanal: Fix set_noop_p for volatile loads or stores [PR114768])

2024-04-19 Thread Thomas Schwinge
Hi!

On 2024-04-19T12:30:25+0200, Jakub Jelinek  wrote:
> On Fri, Apr 19, 2024 at 12:23:03PM +0200, Thomas Schwinge wrote:
>> On 2024-04-19T08:24:03+0200, Jakub Jelinek  wrote:
>> > --- gcc/testsuite/gcc.dg/pr114768.c.jj 2024-04-18 15:37:49.139433678 +0200
>> > +++ gcc/testsuite/gcc.dg/pr114768.c 2024-04-18 15:43:30.389730365 +0200
>> > @@ -0,0 +1,10 @@
>> > +/* PR rtl-optimization/114768 */
>> > +/* { dg-do compile } */
>> > +/* { dg-options "-O2 -fdump-rtl-final" } */
>> > +/* { dg-final { scan-rtl-dump "\\\(mem/v:" "final" { target { ! { nvptx*-*-* } } } } } */
>> > +
>> > +void
>> > +foo (int *p)
>> > +{
>> > +  *p = *(volatile int *) p;
>> > +}
>> 
>> Why exclude nvptx target here?  As far as I can see, it does behave in
>> exactly the same way as expected; see 'diff' of before vs. after the
>> 'gcc/rtlanal.cc' code changes:
>
> I wasn't sure if the non-RA targets (for which we don't have an effective
> target) even have final dump.
> If they do as you show, then guess the target guard can go.

ACK.  Pushed to trunk branch in
commit 9451b6c0a941dc44ca6f14ff8565d74fe56cca59
"Enable 'gcc.dg/pr114768.c' for nvptx target [PR114768]", see attached.


Grüße
 Thomas
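
For illustration only: the 'scan-rtl-dump' directive amounts to a
regular-expression search over the dump file.  A minimal Python sketch,
assuming the effective pattern is '\(mem/v:' once Tcl has processed the
quoted string; the helper name 'dump_has_volatile_mem' and the two
excerpts are hypothetical and not part of the testsuite:

```python
import re

# Assumed effective regex once Tcl has processed "\\\(mem/v:" in the test file.
VOLATILE_MEM = re.compile(r"\(mem/v:")


def dump_has_volatile_mem(dump_text: str) -> bool:
    """Return True if the RTL dump text contains a volatile MEM reference."""
    return VOLATILE_MEM.search(dump_text) is not None


# Shortened excerpts modeled on the 'final' dumps quoted in this thread.
BEFORE_FIX = "(note 6 3 10 2 NOTE_INSN_DELETED)"
AFTER_FIX = (
    "(insn 6 3 7 2 (set (reg:SI 22 [ _1 ])\n"
    "    (mem/v:SI (reg/v/f:DI 23 [ p ]))))"
)

if __name__ == "__main__":
    print(dump_has_volatile_mem(BEFORE_FIX))
    print(dump_has_volatile_mem(AFTER_FIX))
```

The point of the sketch is only that the check does not depend on
register allocation: any target that emits a 'final' dump containing the
volatile load can satisfy it.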


From 9451b6c0a941dc44ca6f14ff8565d74fe56cca59 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 19 Apr 2024 12:32:03 +0200
Subject: [PATCH] Enable 'gcc.dg/pr114768.c' for nvptx target [PR114768]

Follow-up to commit 9f295847a9c32081bdd0fe908ffba58e830a24fb
"rtlanal: Fix set_noop_p for volatile loads or stores [PR114768]": nvptx does
behave in exactly the same way as expected; see 'diff' of before vs. after the
'gcc/rtlanal.cc' code changes:

PASS: gcc.dg/pr114768.c (test for excess errors)
[-FAIL:-]{+PASS:+} gcc.dg/pr114768.c scan-rtl-dump final "\\(mem/v:"

--- 0/pr114768.c.347r.final	2024-04-19 11:34:34.577037596 +0200
+++ ./pr114768.c.347r.final	2024-04-19 12:08:00.118312524 +0200
@@ -13,15 +13,27 @@
 ;;  entry block defs 	 1 [%stack] 2 [%frame] 3 [%args]
 ;;  exit block uses 	 1 [%stack] 2 [%frame]
 ;;  regs ever live
-;;  ref usage 	r1={1d,2u} r2={1d,2u} r3={1d,1u}
-;;total ref usage 8{3d,5u,0e} in 1{1 regular + 0 call} insns.
+;;  ref usage 	r1={1d,3u} r2={1d,3u} r3={1d,2u} r22={1d,1u} r23={1d,2u}
+;;total ref usage 16{5d,11u,0e} in 4{4 regular + 0 call} insns.
 (note 1 0 4 NOTE_INSN_DELETED)
 (note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
-(note 2 4 3 2 NOTE_INSN_DELETED)
+(insn 2 4 3 2 (set (reg/v/f:DI 23 [ p ])
+(unspec:DI [
+(const_int 0 [0])
+] UNSPEC_ARG_REG)) "source-gcc/gcc/testsuite/gcc.dg/pr114768.c":8:1 14 {load_arg_regdi}
+ (nil))
 (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
-(note 6 3 10 2 NOTE_INSN_DELETED)
-(note 10 6 11 2 NOTE_INSN_EPILOGUE_BEG)
-(jump_insn 11 10 12 2 (return) "source-gcc/gcc/testsuite/gcc.dg/pr114768.c":10:1 289 {return}
+(insn 6 3 7 2 (set (reg:SI 22 [ _1 ])
+(mem/v:SI (reg/v/f:DI 23 [ p ]) [1 MEM[(volatile int *)p_3(D)]+0 S4 A32])) "source-gcc/gcc/testsuite/gcc.dg/pr114768.c":9:8 6 {*movsi_insn}
+ (nil))
+(insn 7 6 10 2 (set (mem:SI (reg/v/f:DI 23 [ p ]) [1 *p_3(D)+0 S4 A32])
+(reg:SI 22 [ _1 ])) "source-gcc/gcc/testsuite/gcc.dg/pr114768.c":9:6 6 {*movsi_insn}
+ (expr_list:REG_DEAD (reg/v/f:DI 23 [ p ])
+(expr_list:REG_DEAD (reg:SI 22 [ _1 ])
+(nil))))
+(note 10 7 13 2 NOTE_INSN_EPILOGUE_BEG)
+(note 13 10 11 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
+(jump_insn 11 13 12 3 (return) "source-gcc/gcc/testsuite/gcc.dg/pr114768.c":10:1 289 {return}
	  (nil)
  -> return)
 (barrier 12 11 0)

--- 0/pr114768.s	2024-04-19 11:34:34.577037596 +0200
+++ ./pr114768.s	2024-04-19 12:08:00.118312524 +0200
@@ -13,5 +13,10 @@
 {
	.reg.u64 %ar0;
	ld.param.u64 %ar0, [%in_ar0];
+	.reg.u32 %r22;
+	.reg.u64 %r23;
+		mov.u64	%r23, %ar0;
+		ld.u32	%r22, [%r23];
+		st.u32	[%r23], %r22;
	ret;
 }

	PR testsuite/114768
	gcc/testsuite/
	* gcc.dg/pr114768.c: Enable for nvptx target.
---
 gcc/testsuite/gcc.dg/pr114768.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/pr114768.c b/gcc/testsuite/gcc.dg/pr114768.c
index 2075f0d6b82..ffe3b368638 100644
--- a/gcc/testsuite/gcc.dg/pr114768.c
+++ b/gcc/testsuite/gcc.dg/pr114768.c
@@ -1,7 +1,7 @@
 /* PR rtl-optimization/114768 */
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-rtl-final" } */
-/* { dg-final { scan-rtl-dump "\\\(mem/v:" "final" { target { ! { nvptx*-*-* } } } } } */
+/* { dg-final { scan-rtl-dump "\\\(mem/v:" "final" } } */
 
 void
 foo (int *p)
-- 
2.34.1



Re: [PATCH] rtlanal: Fix set_noop_p for volatile loads or stores [PR114768]

2024-04-19 Thread Thomas Schwinge
Hi Jakub!

On 2024-04-19T08:24:03+0200, Jakub Jelinek  wrote:
> --- gcc/testsuite/gcc.dg/pr114768.c.jj 2024-04-18 15:37:49.139433678 +0200
> +++ gcc/testsuite/gcc.dg/pr114768.c   2024-04-18 15:43:30.389730365 +0200
> @@ -0,0 +1,10 @@
> +/* PR rtl-optimization/114768 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-final" } */
> +/* { dg-final { scan-rtl-dump "\\\(mem/v:" "final" { target { ! { nvptx*-*-* } } } } } */
> +
> +void
> +foo (int *p)
> +{
> +  *p = *(volatile int *) p;
> +}

Why exclude nvptx target here?  As far as I can see, it does behave in
exactly the same way as expected; see 'diff' of before vs. after the
'gcc/rtlanal.cc' code changes:

PASS: gcc.dg/pr114768.c (test for excess errors)
[-FAIL:-]{+PASS:+} gcc.dg/pr114768.c scan-rtl-dump final "\\(mem/v:"

--- 0/pr114768.c.347r.final 2024-04-19 11:34:34.577037596 +0200
+++ ./pr114768.c.347r.final 2024-04-19 12:08:00.118312524 +0200
@@ -13,15 +13,27 @@
 ;;  entry block defs 1 [%stack] 2 [%frame] 3 [%args]
 ;;  exit block uses 1 [%stack] 2 [%frame]
 ;;  regs ever live 
-;;  ref usage  r1={1d,2u} r2={1d,2u} r3={1d,1u} 
-;;total ref usage 8{3d,5u,0e} in 1{1 regular + 0 call} insns.
+;;  ref usage  r1={1d,3u} r2={1d,3u} r3={1d,2u} r22={1d,1u} r23={1d,2u}
+;;total ref usage 16{5d,11u,0e} in 4{4 regular + 0 call} insns.
 (note 1 0 4 NOTE_INSN_DELETED)
 (note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
-(note 2 4 3 2 NOTE_INSN_DELETED)
+(insn 2 4 3 2 (set (reg/v/f:DI 23 [ p ])
+(unspec:DI [
+(const_int 0 [0])
+] UNSPEC_ARG_REG)) "source-gcc/gcc/testsuite/gcc.dg/pr114768.c":8:1 14 {load_arg_regdi}
+ (nil))
 (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
-(note 6 3 10 2 NOTE_INSN_DELETED)
-(note 10 6 11 2 NOTE_INSN_EPILOGUE_BEG)
-(jump_insn 11 10 12 2 (return) "source-gcc/gcc/testsuite/gcc.dg/pr114768.c":10:1 289 {return}
+(insn 6 3 7 2 (set (reg:SI 22 [ _1 ])
+(mem/v:SI (reg/v/f:DI 23 [ p ]) [1 MEM[(volatile int *)p_3(D)]+0 S4 A32])) "source-gcc/gcc/testsuite/gcc.dg/pr114768.c":9:8 6 {*movsi_insn}
+ (nil))
+(insn 7 6 10 2 (set (mem:SI (reg/v/f:DI 23 [ p ]) [1 *p_3(D)+0 S4 A32])
+(reg:SI 22 [ _1 ])) "source-gcc/gcc/testsuite/gcc.dg/pr114768.c":9:6 6 {*movsi_insn}
+ (expr_list:REG_DEAD (reg/v/f:DI 23 [ p ])
+(expr_list:REG_DEAD (reg:SI 22 [ _1 ])
+(nil))))
+(note 10 7 13 2 NOTE_INSN_EPILOGUE_BEG)
+(note 13 10 11 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
+(jump_insn 11 13 12 3 (return) "source-gcc/gcc/testsuite/gcc.dg/pr114768.c":10:1 289 {return}
  (nil)
  -> return)
 (barrier 12 11 0)

--- 0/pr114768.s 2024-04-19 11:34:34.577037596 +0200
+++ ./pr114768.s 2024-04-19 12:08:00.118312524 +0200
@@ -13,5 +13,10 @@
 {
.reg.u64 %ar0;
ld.param.u64 %ar0, [%in_ar0];
+   .reg.u32 %r22;
+   .reg.u64 %r23;
+   mov.u64 %r23, %ar0;
+   ld.u32  %r22, [%r23];
+   st.u32  [%r23], %r22;
ret;
 }


Grüße
 Thomas


GCN: Enable effective-target 'vect_long_long'

2024-04-16 Thread Thomas Schwinge
Hi!

OK to push the attached "GCN: Enable effective-target 'vect_long_long'"?
(Or is that not what you'd expect to see for GCN?  I haven't checked the
actual back end code...)
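
To make the shape of the change concrete: the effective-target
procedure is a chain of "is this target X, and does it have feature Y"
tests joined by "or".  A rough Python sketch of only the three branches
visible in the attached diff; the function names, the keyword
parameters standing in for the RISC-V and LoongArch feature checks, and
the example triplets are assumptions for illustration, and the real
procedure has further branches not modeled here:

```python
from fnmatch import fnmatchcase


def istarget(triplet: str, pattern: str) -> bool:
    """Glob-match a target triplet, loosely modeled on DejaGnu's 'istarget'."""
    return fnmatchcase(triplet, pattern)


def vect_long_long_partial(triplet: str, riscv_v: bool = False,
                           loongarch_sx: bool = False) -> bool:
    """Model only the branches shown in the diff (hypothetical helper)."""
    return (
        (istarget(triplet, "riscv*-*-*") and riscv_v)
        or (istarget(triplet, "loongarch*-*-*") and loongarch_sx)
        or istarget(triplet, "amdgcn-*-*")  # the branch added by this patch
    )


if __name__ == "__main__":
    # Assumed canonical triplet strings, used only as sample input.
    print(vect_long_long_partial("amdgcn-unknown-amdhsa"))
    print(vect_long_long_partial("riscv64-unknown-linux-gnu"))
```

Unlike the RISC-V and LoongArch branches, the GCN branch is
unconditional, which is what the question above is about.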


Grüße
 Thomas


From d74cc9caadfe36652503782a8da172ae1975915c Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 16 Apr 2024 14:10:15 +0200
Subject: [PATCH] GCN: Enable effective-target 'vect_long_long'

... as made apparent by a number of unexpectedly UNSUPPORTED test cases, which
now all turn into PASS, with just one exception:

PASS: gcc.dg/vect/vect-early-break_124-pr114403.c (test for excess errors)
PASS: gcc.dg/vect/vect-early-break_124-pr114403.c execution test
FAIL: gcc.dg/vect/vect-early-break_124-pr114403.c scan-tree-dump vect "LOOP VECTORIZED"

..., which needs to be looked into, separately.

	gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_vect_long_long):
	Enable for GCN.
---
 gcc/testsuite/lib/target-supports.exp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 594837653bb..1a8459561c6 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -7692,7 +7692,8 @@ proc check_effective_target_vect_long_long { } {
 	 || ([istarget riscv*-*-*]
 		 && [check_effective_target_riscv_v])
 	 || ([istarget loongarch*-*-*]
-		 && [check_effective_target_loongarch_sx])}}]
+		 && [check_effective_target_loongarch_sx])
+	 || [istarget amdgcn-*-*] }}]
 }
 
 
-- 
2.34.1



build: Use of cargo not yet supported here in Canadian cross configurations (was: [PATCH] build: Check for cargo when building rust language)

2024-04-15 Thread Thomas Schwinge
Hi!

On 2024-04-15T13:14:42+0200, I wrote:
> On 2024-04-08T18:33:38+0200, pierre-emmanuel.pa...@embecosm.com wrote:
>> The rust frontend requires cargo to build some of its components,
>
> In GCC upstream still: 's%requires%is going to require'.  ;-)
>
>> its presence was not checked during configuration.
>
> After confirming the desired semantics/diagnostics, I've now pushed this
> to trunk branch in commit 3e1e73fc99584440e5967577f2049573eeaf4596
> "build: Check for cargo when building rust language".

On top of that, OK to push the attached
"build: Use of cargo not yet supported here in Canadian cross configurations"?


Grüße
 Thomas
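
The decision the attached patch makes can be summarized in a few lines.
A minimal Python sketch of that logic, assuming 'have_cargo', 'build'
and 'host' carry the same meaning as the shell variables in
'configure.ac'; the function name 'effective_have_cargo' and the sample
triplets are hypothetical:

```python
def effective_have_cargo(have_cargo: bool, build: str, host: str) -> bool:
    """Return whether cargo may be used, mirroring the check in the patch."""
    if have_cargo and build != host:
        # cargo would produce objects for the build system rather than the
        # host system, so it is treated as unavailable for now.
        return False
    # Otherwise assume cargo-produced objects are compatible with the build.
    return have_cargo


if __name__ == "__main__":
    print(effective_have_cargo(True, "x86_64-pc-linux-gnu", "x86_64-pc-linux-gnu"))
    print(effective_have_cargo(True, "x86_64-pc-linux-gnu", "aarch64-linux-gnu"))
```

Because 'have_cargo' is forced to "no", the existing "Disable Rust if
cargo is unavailable" case that follows in 'configure.ac' takes over
without further changes.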


From eb38990b4147951dd21f19def43072368f782af5 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 15 Apr 2024 14:27:45 +0200
Subject: [PATCH] build: Use of cargo not yet supported here in Canadian cross
 configurations

..., until <https://github.com/Rust-GCC/gccrs/issues/2898>
"'cargo' should build for the host system" is resolved.

Follow-up to commit 3e1e73fc99584440e5967577f2049573eeaf4596
"build: Check for cargo when building rust language".

	* configure.ac (have_cargo): Force to "no" in Canadian cross
	configurations
	* configure: Regenerate.
---
 configure    | 13 +++++++++++++
 configure.ac | 12 ++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/configure b/configure
index e254aa132b5..e59a870b2bd 100755
--- a/configure
+++ b/configure
@@ -9179,6 +9179,19 @@ $as_echo "$as_me: WARNING: --enable-host-shared required to build $language" >&2
   ;;
 esac
 
+# Pre-conditions to consider whether cargo being supported.
+if test x"$have_cargo" = xyes \
+  && test x"$build" != x"$host"; then
+  # Until <https://github.com/Rust-GCC/gccrs/issues/2898>
+  # "'cargo' should build for the host system" is resolved:
+  { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: use of cargo not yet supported here in Canadian cross configurations" >&5
+$as_echo "$as_me: WARNING: use of cargo not yet supported here in Canadian cross configurations" >&2;}
+  have_cargo=no
+else
+  # Assume that cargo-produced object files are compatible with what
+  # we're going to build here.
+  :
+fi
 # Disable Rust if cargo is unavailable.
 case ${add_this_lang}:${language}:${have_cargo} in
   yes:rust:no)
diff --git a/configure.ac b/configure.ac
index 87205d0ac1f..4ab54431475 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2306,6 +2306,18 @@ directories, to avoid imposing the performance cost of
   ;;
 esac
 
+# Pre-conditions to consider whether cargo being supported.
+if test x"$have_cargo" = xyes \
+  && test x"$build" != x"$host"; then
+  # Until <https://github.com/Rust-GCC/gccrs/issues/2898>
+  # "'cargo' should build for the host system" is resolved:
+  AC_MSG_WARN([use of cargo not yet supported here in Canadian cross configurations])
+  have_cargo=no
+else
+  # Assume that cargo-produced object files are compatible with what
+  # we're going to build here.
+  :
+fi
 # Disable Rust if cargo is unavailable.
 case ${add_this_lang}:${language}:${have_cargo} in
   yes:rust:no)
-- 
2.34.1



build: Don't check for host-prefixed 'cargo' program (was: [PATCH] build: Check for cargo when building rust language)

2024-04-15 Thread Thomas Schwinge
Hi!

On 2024-04-15T13:14:42+0200, I wrote:
> On 2024-04-08T18:33:38+0200, pierre-emmanuel.pa...@embecosm.com wrote:
>> The rust frontend requires cargo to build some of its components,
>
> In GCC upstream still: 's%requires%is going to require'.  ;-)
>
>> its presence was not checked during configuration.
>
> After confirming the desired semantics/diagnostics, I've now pushed this
> to trunk branch in commit 3e1e73fc99584440e5967577f2049573eeaf4596
> "build: Check for cargo when building rust language".
>
>
> I now wonder: instead of 'AC_CHECK_TOOL', shouldn't this use
> 'AC_CHECK_PROG'?  (We always want plain 'cargo', not host-prefixed
> 'aarch64-linux-gnu-cargo' etc., right?)  I'll look into changing this.

OK to push "build: Don't check for host-prefixed 'cargo' program", see
attached?
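
To spell out the difference in lookup behavior: judging by the
generated 'configure' code in the attached diff, 'AC_CHECK_TOOL' first
tries the host-prefixed name and then falls back to the plain name,
whereas 'AC_CHECK_PROGS' only tries the names it is given.  A small
Python sketch of the candidate order; the function names and the
'available' set are hypothetical stand-ins for the PATH search:

```python
def check_tool_candidates(prog, tool_prefix=""):
    """Names tried by an AC_CHECK_TOOL-style lookup, in order."""
    candidates = []
    if tool_prefix:
        candidates.append(tool_prefix + prog)  # e.g. aarch64-linux-gnu-cargo
    candidates.append(prog)                    # fallback to the plain name
    return candidates


def check_progs_candidates(progs):
    """Names tried by an AC_CHECK_PROGS-style lookup, in order."""
    return list(progs)


def first_found(candidates, available):
    """Return the first candidate present in 'available', else "no"."""
    for name in candidates:
        if name in available:
            return name
    return "no"


if __name__ == "__main__":
    print(check_tool_candidates("cargo", "aarch64-linux-gnu-"))
    print(check_progs_candidates(["cargo"]))
```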


Grüße
 Thomas


From 913be0412665d02561f8aeb999860ce8d292c61e Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 15 Apr 2024 13:33:48 +0200
Subject: [PATCH] build: Don't check for host-prefixed 'cargo' program

Follow-up to commit 3e1e73fc99584440e5967577f2049573eeaf4596
"build: Check for cargo when building rust language":

On 2024-04-15T13:14:42+0200, I wrote:
> I now wonder: instead of 'AC_CHECK_TOOL', shouldn't this use
> 'AC_CHECK_PROG'?  (We always want plain 'cargo', not host-prefixed
> 'aarch64-linux-gnu-cargo' etc., right?)  I'll look into changing this.

	* configure: Regenerate.
	config/
	* acx.m4 (ACX_PROG_CARGO): Use 'AC_CHECK_PROGS'.
---
 config/acx.m4 |  3 +--
 configure | 64 ++-
 2 files changed, 8 insertions(+), 59 deletions(-)

diff --git a/config/acx.m4 b/config/acx.m4
index 3c5fe67342e..c45e55e7f51 100644
--- a/config/acx.m4
+++ b/config/acx.m4
@@ -427,8 +427,7 @@ fi
 # Test for Rust
 # We require cargo and rustc for some parts of the rust compiler.
 AC_DEFUN([ACX_PROG_CARGO],
-[AC_REQUIRE([AC_CHECK_TOOL_PREFIX])
-AC_CHECK_TOOL(CARGO, cargo, no)
+[AC_CHECK_PROGS(CARGO, cargo, no)
 if test "x$CARGO" != xno; then
   have_cargo=yes
 else
diff --git a/configure b/configure
index dd96445ac4a..e254aa132b5 100755
--- a/configure
+++ b/configure
@@ -5818,10 +5818,10 @@ else
   have_gdc=no
 fi
 
-
-if test -n "$ac_tool_prefix"; then
-  # Extract the first word of "${ac_tool_prefix}cargo", so it can be a program name with args.
-set dummy ${ac_tool_prefix}cargo; ac_word=$2
+for ac_prog in cargo
+do
+  # Extract the first word of "$ac_prog", so it can be a program name with args.
+set dummy $ac_prog; ac_word=$2
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
 $as_echo_n "checking for $ac_word... " >&6; }
 if ${ac_cv_prog_CARGO+:} false; then :
@@ -5837,7 +5837,7 @@ do
   test -z "$as_dir" && as_dir=.
 for ac_exec_ext in '' $ac_executable_extensions; do
   if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then
-ac_cv_prog_CARGO="${ac_tool_prefix}cargo"
+ac_cv_prog_CARGO="$ac_prog"
 $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5
 break 2
   fi
@@ -5857,59 +5857,9 @@ $as_echo "no" >&6; }
 fi
 
 
-fi
-if test -z "$ac_cv_prog_CARGO"; then
-  ac_ct_CARGO=$CARGO
-  # Extract the first word of "cargo", so it can be a program name with args.
-set dummy cargo; ac_word=$2
-{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
-$as_echo_n "checking for $ac_word... " >&6; }
-if ${ac_cv_prog_ac_ct_CARGO+:} false; then :
-  $as_echo_n "(cached) " >&6
-else
-  if test -n "$ac_ct_CARGO"; then
-  ac_cv_prog_ac_ct_CARGO="$ac_ct_CARGO" # Let the user override the test.
-else
-as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
-for as_dir in $PATH
-do
-  IFS=$as_save_IFS
-  test -z "$as_dir" && as_dir=.
-for ac_exec_ext in '' $ac_executable_extensions; do
-  if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then
-ac_cv_prog_ac_ct_CARGO="cargo"
-$as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5
-break 2
-  fi
+  test -n "$CARGO" && break
 done
-  done
-IFS=$as_save_IFS
-
-fi
-fi
-ac_ct_CARGO=$ac_cv_prog_ac_ct_CARGO
-if test -n "$ac_ct_CARGO"; then
-  { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_CARGO" >&5
-$as_echo "$ac_ct_CARGO" >&6; }
-else
-  { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
-$as_echo "no" >&6; }
-fi
-
-  if test "x$ac_ct_CARGO" = x; then
-CARGO="no"
-  else
-case $cross_compiling:$ac_tool_warned in
-yes:)
-{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5
-$as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;}
-ac_tool_warned=yes ;;
-esac
-CARGO=$ac_ct_CARGO
-  fi
-else
-  CARGO="$ac_cv_prog_CARGO"
-fi
+test -n "$CARGO" || CARGO="no"
 
 if test "x$CARGO" != xno; then
   have_cargo=yes
-- 
2.34.1



Re: [PATCH] build: Check for cargo when building rust language

2024-04-15 Thread Thomas Schwinge
Hi!

On 2024-04-08T18:33:38+0200, pierre-emmanuel.pa...@embecosm.com wrote:
> The rust frontend requires cargo to build some of its components,

In GCC upstream still: 's%requires%is going to require'.  ;-)

> its presence was not checked during configuration.

After confirming the desired semantics/diagnostics, I've now pushed this
to trunk branch in commit 3e1e73fc99584440e5967577f2049573eeaf4596
"build: Check for cargo when building rust language".


I now wonder: instead of 'AC_CHECK_TOOL', shouldn't this use
'AC_CHECK_PROG'?  (We always want plain 'cargo', not host-prefixed
'aarch64-linux-gnu-cargo' etc., right?)  I'll look into changing this.


Grüße
 Thomas


> Prevent rust language from building when cargo is
> missing.
>
> config/ChangeLog:
>
>   * acx.m4: Add a macro to check for rust
>   components.
>
> ChangeLog:
>
>   * configure: Regenerate.
>   * configure.ac: Emit an error message when cargo
>   is missing.
>
> Signed-off-by: Pierre-Emmanuel Patry 
> ---
>  config/acx.m4 |  11 +
>  configure | 117 ++
>  configure.ac  |  18 
>  3 files changed, 146 insertions(+)
>
> diff --git a/config/acx.m4 b/config/acx.m4
> index 7efe98aaf96..3c5fe67342e 100644
> --- a/config/acx.m4
> +++ b/config/acx.m4
> @@ -424,6 +424,17 @@ else
>  fi
>  ])
>  
> +# Test for Rust
> +# We require cargo and rustc for some parts of the rust compiler.
> +AC_DEFUN([ACX_PROG_CARGO],
> +[AC_REQUIRE([AC_CHECK_TOOL_PREFIX])
> +AC_CHECK_TOOL(CARGO, cargo, no)
> +if test "x$CARGO" != xno; then
> +  have_cargo=yes
> +else
> +  have_cargo=no
> +fi])
> +
>  # Test for D.
>  AC_DEFUN([ACX_PROG_GDC],
>  [AC_REQUIRE([AC_CHECK_TOOL_PREFIX])
> diff --git a/configure b/configure
> index 874966fb9f0..46e66e20197 100755
> --- a/configure
> +++ b/configure
> @@ -714,6 +714,7 @@ PGO_BUILD_GEN_CFLAGS
>  HAVE_CXX11_FOR_BUILD
>  HAVE_CXX11
>  do_compare
> +CARGO
>  GDC
>  GNATMAKE
>  GNATBIND
> @@ -5786,6 +5787,104 @@ else
>have_gdc=no
>  fi
>  
> +
> +if test -n "$ac_tool_prefix"; then
> +  # Extract the first word of "${ac_tool_prefix}cargo", so it can be a 
> program name with args.
> +set dummy ${ac_tool_prefix}cargo; ac_word=$2
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
> +$as_echo_n "checking for $ac_word... " >&6; }
> +if ${ac_cv_prog_CARGO+:} false; then :
> +  $as_echo_n "(cached) " >&6
> +else
> +  if test -n "$CARGO"; then
> +  ac_cv_prog_CARGO="$CARGO" # Let the user override the test.
> +else
> +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
> +for as_dir in $PATH
> +do
> +  IFS=$as_save_IFS
> +  test -z "$as_dir" && as_dir=.
> +for ac_exec_ext in '' $ac_executable_extensions; do
> +  if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then
> +ac_cv_prog_CARGO="${ac_tool_prefix}cargo"
> +$as_echo "$as_me:${as_lineno-$LINENO}: found 
> $as_dir/$ac_word$ac_exec_ext" >&5
> +break 2
> +  fi
> +done
> +  done
> +IFS=$as_save_IFS
> +
> +fi
> +fi
> +CARGO=$ac_cv_prog_CARGO
> +if test -n "$CARGO"; then
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: result: $CARGO" >&5
> +$as_echo "$CARGO" >&6; }
> +else
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
> +$as_echo "no" >&6; }
> +fi
> +
> +
> +fi
> +if test -z "$ac_cv_prog_CARGO"; then
> +  ac_ct_CARGO=$CARGO
> +  # Extract the first word of "cargo", so it can be a program name with args.
> +set dummy cargo; ac_word=$2
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
> +$as_echo_n "checking for $ac_word... " >&6; }
> +if ${ac_cv_prog_ac_ct_CARGO+:} false; then :
> +  $as_echo_n "(cached) " >&6
> +else
> +  if test -n "$ac_ct_CARGO"; then
> +  ac_cv_prog_ac_ct_CARGO="$ac_ct_CARGO" # Let the user override the test.
> +else
> +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
> +for as_dir in $PATH
> +do
> +  IFS=$as_save_IFS
> +  test -z "$as_dir" && as_dir=.
> +for ac_exec_ext in '' $ac_executable_extensions; do
> +  if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then
> +ac_cv_prog_ac_ct_CARGO="cargo"
> +$as_echo "$as_me:${as_lineno-$LINENO}: found 
> $as_dir/$ac_word$ac_exec_ext" >&5
> +break 2
> +  fi
> +done
> +  done
> +IFS=$as_save_IFS
> +
> +fi
> +fi
> +ac_ct_CARGO=$ac_cv_prog_ac_ct_CARGO
> +if test -n "$ac_ct_CARGO"; then
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_CARGO" >&5
> +$as_echo "$ac_ct_CARGO" >&6; }
> +else
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
> +$as_echo "no" >&6; }
> +fi
> +
> +  if test "x$ac_ct_CARGO" = x; then
> +CARGO="no"
> +  else
> +case $cross_compiling:$ac_tool_warned in
> +yes:)
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not 
> prefixed with host triplet" >&5
> +$as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" 
> >&2;}
> +ac_tool_warned=yes ;;
> +esac
> +CARGO=$ac_ct_CARGO
> +  fi
> +else
> +  CARGO="$ac_cv_prog_CARGO"
> +fi
> +
> +if test "x$CARGO" != xno; then
> +  have_cargo=yes
> 

Re: [gcc r14-7544] gccrs: libproc_macro: Build statically

2024-04-15 Thread Thomas Schwinge
Hi!

On 2024-01-16T17:43:10+, Arthur Cohen via Gcc-cvs wrote:
> https://gcc.gnu.org/g:71180a9eed367667e7b2c3f6aea1ee1bba15e9b3
>
> commit r14-7544-g71180a9eed367667e7b2c3f6aea1ee1bba15e9b3
> Author: Pierre-Emmanuel Patry 
> Date:   Wed Apr 26 10:31:35 2023 +0200
>
> gccrs: libproc_macro: Build statically
> 
> We do not need dynamic linking, all use case of this library cover can
> be done statically hence the change.
> 
> gcc/rust/ChangeLog:
> 
> * Make-lang.in: Link against the static libproc_macro.

> --- a/gcc/rust/Make-lang.in
> +++ b/gcc/rust/Make-lang.in
> @@ -182,11 +182,14 @@ RUST_ALL_OBJS = $(GRS_OBJS) $(RUST_TARGET_OBJS)
>  
>  rust_OBJS = $(RUST_ALL_OBJS) rust/rustspec.o
>  
> +RUST_LDFLAGS = $(LDFLAGS) -L./../libgrust/libproc_macro
> +RUST_LIBDEPS = $(LIBDEPS) ../libgrust/libproc_macro/libproc_macro.a
> +
>  # The compiler itself is called crab1
> -crab1$(exeext): $(RUST_ALL_OBJS) attribs.o $(BACKEND) $(LIBDEPS) $(rust.prev)
> +crab1$(exeext): $(RUST_ALL_OBJS) attribs.o $(BACKEND) $(RUST_LIBDEPS) $(rust.prev)
>   @$(call LINK_PROGRESS,$(INDEX.rust),start)
> - +$(LLINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ \
> -   $(RUST_ALL_OBJS) attribs.o $(BACKEND) $(LIBS) $(BACKENDLIBS)
> + +$(LLINKER) $(ALL_LINKERFLAGS) $(RUST_LDFLAGS) -o $@ \
> +   $(RUST_ALL_OBJS) attribs.o $(BACKEND) $(LIBS) ../libgrust/libproc_macro/libproc_macro.a $(BACKENDLIBS)
>   @$(call LINK_PROGRESS,$(INDEX.rust),end)

The 'crab1' compiler is (at least potentially) just one of several
executables that 'gcc/rust/Make-lang.in' may build, which may all have
different library dependencies, etc.  Instead of via generic 'RUST_[...]'
variables, those dependencies etc. should therefore be specified as they
are individually necessary.

I've pushed to trunk branch the following clean-up commits, see attached:

  - commit cb70a49b30f0a22ec7a1b7df29c3ab370d603f90
    "Remove 'libgrust/libproc_macro_internal' from 'gcc/rust/Make-lang.in:RUST_LDFLAGS'"
  - commit f7c8fa7280c85cbdea45be9c09f36123ff16a78a
    "Inline 'gcc/rust/Make-lang.in:RUST_LDFLAGS' into single user"
  - commit 24d92f65f9ed9b3c730c59f700ce2f5c038c8207
    "Add 'gcc/rust/Make-lang.in:LIBPROC_MACRO_INTERNAL'"
  - commit e3fda76af4f342ad1ba8bd901a72d811e8357e99
    "Inline 'gcc/rust/Make-lang.in:RUST_LIBDEPS' into single user"


Grüße
 Thomas


From cb70a49b30f0a22ec7a1b7df29c3ab370d603f90 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 28 Feb 2024 22:41:42 +0100
Subject: [PATCH 1/4] Remove 'libgrust/libproc_macro_internal' from
 'gcc/rust/Make-lang.in:RUST_LDFLAGS'

This isn't necessary, as the full path to 'libproc_macro_internal.a' is
specified elsewhere.

	gcc/rust/
	* Make-lang.in (RUST_LDFLAGS): Remove
	'libgrust/libproc_macro_internal'.
---
 gcc/rust/Make-lang.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/rust/Make-lang.in b/gcc/rust/Make-lang.in
index 4d73412739d..e901668b93d 100644
--- a/gcc/rust/Make-lang.in
+++ b/gcc/rust/Make-lang.in
@@ -208,7 +208,7 @@ RUST_ALL_OBJS = $(GRS_OBJS) $(RUST_TARGET_OBJS)
 
 rust_OBJS = $(RUST_ALL_OBJS) rust/rustspec.o
 
-RUST_LDFLAGS = $(LDFLAGS) -L./../libgrust/libproc_macro_internal
+RUST_LDFLAGS = $(LDFLAGS)
 RUST_LIBDEPS = $(LIBDEPS) ../libgrust/libproc_macro_internal/libproc_macro_internal.a
 
 # The compiler itself is called crab1
-- 
2.34.1

From f7c8fa7280c85cbdea45be9c09f36123ff16a78a Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 28 Feb 2024 22:45:18 +0100
Subject: [PATCH 2/4] Inline 'gcc/rust/Make-lang.in:RUST_LDFLAGS' into single
 user

	gcc/rust/
	* Make-lang.in (RUST_LDFLAGS): Inline into single user.
---
 gcc/rust/Make-lang.in | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/rust/Make-lang.in b/gcc/rust/Make-lang.in
index e901668b93d..ffeb325d6ce 100644
--- a/gcc/rust/Make-lang.in
+++ b/gcc/rust/Make-lang.in
@@ -208,13 +208,12 @@ RUST_ALL_OBJS = $(GRS_OBJS) $(RUST_TARGET_OBJS)
 
 rust_OBJS = $(RUST_ALL_OBJS) rust/rustspec.o
 
-RUST_LDFLAGS = $(LDFLAGS)
 RUST_LIBDEPS = $(LIBDEPS) ../libgrust/libproc_macro_internal/libproc_macro_internal.a
 
 # The compiler itself is called crab1
 crab1$(exeext): $(RUST_ALL_OBJS) attribs.o $(BACKEND) $(RUST_LIBDEPS) $(rust.prev)
 	@$(call LINK_PROGRESS,$(INDEX.rust),start)
-	+$(LLINKER) $(ALL_LINKERFLAGS) $(RUST_LDFLAGS) -o $@ \
+	+$(LLINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ \
 	  $(RUST_ALL_OBJS) attribs.o $(BACKEND) $(LIBS) ../libgrust/libproc_macro_internal/libproc_macro_internal.a $(BACKENDLIBS)
 	@$(call LINK_PROGRESS,$(INDEX.rust),end)
 
-- 
2.34.1

From 24d92f65f9ed9b3c730c59f700ce2f5c038c8207 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 28 Feb 2024 22:51:24 +0100
Subject: [PATCH 3/4] Add 'gcc/rust/Make-lang.in:LIBPROC_MACRO_INTERNAL'

... to avoid verbatim repeti

Re: [nvptx PATCH] Correct pattern for popcountdi2 insn in nvptx.md.

2024-04-12 Thread Thomas Schwinge
Hi Roger!

On 2023-01-09T13:29:14+, "Roger Sayle"  wrote:
> The result of a POPCOUNT operation in RTL should have the same mode
> as its operand.  This corrects the specification of popcount in
> the nvptx backend, splitting the current generic define_insn into
> two, one for popcountsi2 and the other for popcountdi2 (the latter
> with an explicit truncate).
>
> This patch has been tested on nvptx-none (hosted on x86_64-pc-linux-gnu)
> with make and make -k check with no new failures.  This functionality is
> already tested by gcc.target/nvptx/popc-[123].c.

So I compared '-fdump-rtl-all' and '*.s' of current vs. patched for those
three '*.c' files.  It is expected that I only see '(popcount:SI [DI])'
-> '(truncate:SI (popcount:DI [DI]))', but not any actually observable
change, right?
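
As a sanity check on that expectation, here is a small Python model
(not GCC code) of the two forms.  It assumes DImode means 64 bits and
SImode 32 bits; the helper names are made up for illustration.  Since a
64-bit popcount is at most 64, truncating it to 32 bits cannot change
the value, which is consistent with seeing no observable difference:

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def popcount_di(value: int) -> int:
    """Population count of a 64-bit (DImode-like) value."""
    return bin(value & MASK64).count("1")


def truncate_si(value: int) -> int:
    """Truncate to 32 bits, modeling 'truncate:SI'."""
    return value & MASK32


if __name__ == "__main__":
    for sample in (0, 1, 0x8000000000000001, MASK64):
        # The truncated result always equals the untruncated popcount.
        print(hex(sample), popcount_di(sample), truncate_si(popcount_di(sample)))
```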

Shouldn't the current erroneous form trigger a '--enable-checking=rtl'
error?

> Ok for mainline?

OK, thanks.


..., and sorry for the great delay!  The chaos that came upon my group
half a year ago, and resulted in us having to switch employers, has not
exactly helped in setting aside proper time to better learn the GCC
back end.  But, fortunately, we've been able to switch employers!


Grüße
 Thomas


> 2023-01-09  Roger Sayle  
>
> gcc/ChangeLog
>   * config/nvptx/nvptx.md (popcount<mode>2): Split into...
>   (popcountsi2): define_insn handling SImode popcount.
>   (popcountdi2): define_insn handling DImode popcount, with an
>   explicit truncate:SI to produce an SImode result.
>
> Thanks in advance,
> Roger
> --
>
> diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
> index 740c4de..461540e 100644
> --- a/gcc/config/nvptx/nvptx.md
> +++ b/gcc/config/nvptx/nvptx.md
> @@ -658,11 +658,18 @@
>DONE;
>  })
>  
> -(define_insn "popcount<mode>2"
> +(define_insn "popcountsi2"
>[(set (match_operand:SI 0 "nvptx_register_operand" "=R")
> - (popcount:SI (match_operand:SDIM 1 "nvptx_register_operand" "R")))]
> + (popcount:SI (match_operand:SI 1 "nvptx_register_operand" "R")))]
>""
> -  "%.\\tpopc.b%T1\\t%0, %1;")
> +  "%.\\tpopc.b32\\t%0, %1;")
> +
> +(define_insn "popcountdi2"
> +  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
> + (truncate:SI
> +   (popcount:DI (match_operand:DI 1 "nvptx_register_operand" "R"))))]
> +  ""
> +  "%.\\tpopc.b64\\t%0, %1;")
>  
>  ;; Multiplication variants
>  


Re: [PATCH] Regenerate opt.urls

2024-04-12 Thread Thomas Schwinge
Hi!

After having received around a dozen more buildbot notifications...

On 2024-04-10T06:46:04-0700, Palmer Dabbelt  wrote:
> On Tue, 09 Apr 2024 07:57:24 PDT (-0700), ishitatsuy...@gmail.com wrote:
>> Fixes: 97069657c4e ("RISC-V: Implement TLS Descriptors.")
>>
>> gcc/ChangeLog:
>>  * config/riscv/riscv.opt.urls: Regenerated.
>> ---
>>  gcc/config/riscv/riscv.opt.urls | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/gcc/config/riscv/riscv.opt.urls b/gcc/config/riscv/riscv.opt.urls
>> index da31820e234..351f7f0dda2 100644
>> --- a/gcc/config/riscv/riscv.opt.urls
>> +++ b/gcc/config/riscv/riscv.opt.urls
>> @@ -89,3 +89,5 @@ UrlSuffix(gcc/RISC-V-Options.html#index-minline-strncmp)
>>  minline-strlen
>>  UrlSuffix(gcc/RISC-V-Options.html#index-minline-strlen)
>>
>> +; skipping UrlSuffix for 'mtls-dialect=' due to finding no URLs
>> +
>
> Thanks.  I had another one over here 
> ,
>  
> but let's go with yours -- I think the actual contents are the same, but 
> I didn't actually run the regenerate script.  So
>
> Reviewed-by: Palmer Dabbelt 
> Acked-by: Palmer Dabbelt 

..., I've now pushed this to trunk branch in
commit c9500083073ff5e0f5c1c9db92d7ce6e51a62919
"Regenerate opt.urls".


Grüße
 Thomas


Re: [PATCH] contrib/check-params-in-docs.py: Ignore target-specific params

2024-04-12 Thread Thomas Schwinge
Hi!

On 2024-04-12T09:08:13+0200, Filip Kastl  wrote:
> On Thu 2024-04-11 20:51:55, Thomas Schwinge wrote:
>> On 2024-04-11T19:52:51+0200, Martin Jambor  wrote:
>> > contrib/check-params-in-docs.py is a script that checks that all
>> > options reported with ./gcc/xgcc -Bgcc --help=param are in
>> > gcc/doc/invoke.texi and vice versa.
>> 
>> Eh, first time I'm hearing about this one!
>> 
>> (a) Shouldn't this be running as part of the GCC build process?
>> 
>> > gcn-preferred-vectorization-factor is in the manual but normally not
>> > reported by --help, probably because I do not have gcn offload
>> > configured.
>> 
>> No, because you've not been building GCC for GCN target.  ;-P
>> 
>> > This patch makes the script silent about this particular
>> > fact.
>> 
>> (b) Shouldn't we instead ignore any '--param's with "gcn" prefix, similar
>> to how that's done for "skip aarch64 params"?
>> 
>> (c) ..., and shouldn't we likewise skip any "x86" ones?
>> 
>> (d) ..., or in fact any target specific ones, following after the generic
>> section?  (Easily achieved with a special marker in
>> 'gcc/doc/invoke.texi', just before:
>> 
>> The following choices of @var{name} are available on AArch64 targets:
>> 
>> ..., and adjusting the 'takewhile' in 'contrib/check-params-in-docs.py'
>> accordingly?

> I've made a patch to address (b), (c), (d).  I didn't adjust takewhile.  I
> chose to do it differently since target-specific params in both invoke.texi
> and --help=params have to be ignored.

Right, I realized that after I had sent my email...

> The downside of this patch is that the script won't complain if someone adds a
> target-specific param and doesn't document it.

Yes, but that's a pre-existing problem -- unless you happened to be
targeting some x86 variant.  The target-specific '--param's will have to
be handled differently.

> What do you think?

Looks like a good incremental improvement to me, thanks!


Grüße
 Thomas


> contrib/check-params-in-docs.py is a script that checks that all options
> reported with gcc --help=params are in gcc/doc/invoke.texi and vice
> versa.
> gcc/doc/invoke.texi lists target-specific params but gcc --help=params
> doesn't.  This meant that the script would mistakenly complain about
> params missing from --help=params.  Previously, the script was just set
> to ignore aarch64 and gcn params which solved this issue only for x86.
> This patch sets the script to ignore all target-specific params.
>
> contrib/ChangeLog:
>
>   * check-params-in-docs.py: Ignore target specific params.
>
> Signed-off-by: Filip Kastl 
> ---
>  contrib/check-params-in-docs.py | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/contrib/check-params-in-docs.py b/contrib/check-params-in-docs.py
> index f7879dd8e08..ccdb8d72169 100755
> --- a/contrib/check-params-in-docs.py
> +++ b/contrib/check-params-in-docs.py
> @@ -38,6 +38,9 @@ def get_param_tuple(line):
>      description = line[i:].strip()
>      return (name, description)
>  
> +def target_specific(param):
> +    return param.split('-')[0] in ('aarch64', 'gcn', 'x86')
> +
>  
>  parser = argparse.ArgumentParser()
>  parser.add_argument('texi_file')
> @@ -45,13 +48,16 @@ parser.add_argument('params_output')
>  
>  args = parser.parse_args()
>  
> -ignored = {'logical-op-non-short-circuit', 'gcn-preferred-vectorization-factor'}
> -params = {}
> +ignored = {'logical-op-non-short-circuit'}
> +help_params = {}
>  
>  for line in open(args.params_output).readlines():
>      if line.startswith(' ' * 2) and not line.startswith(' ' * 8):
>          r = get_param_tuple(line)
> -        params[r[0]] = r[1]
> +        help_params[r[0]] = r[1]
> +
> +# Skip target-specific params
> +help_params = [x for x in help_params.keys() if not target_specific(x)]
>  
>  # Find section in .texi manual with parameters
>  texi = ([x.strip() for x in open(args.texi_file).readlines()])
> @@ -66,14 +72,13 @@ for line in texi:
>              texi_params.append(line[len(token):])
>              break
>  
> -# skip digits
> +# Skip digits
>  texi_params = [x for x in texi_params if not x[0].isdigit()]
> -# skip aarch64 params
> -texi_params = [x for x in texi_params if not x.startswith('aarch64')]
> -sorted_params = sorted(texi_params)
> +# Skip target-specific params
> +texi_params = [x for x in texi_params if not target_specific(x)]
>  
>  texi_set = set(texi_params) - ignored
> -params_set = set(params.keys()) - ignored
> +params_set = set(help_params) - ignored
>  
>  success = True
>  extra = texi_set - params_set
> -- 
> 2.43.1
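
For readers following along, here is a minimal standalone sketch of the approach the patch above takes: drop target-specific names from both the manual's list and the '--help=params' list before comparing them.  This is not the script itself; the 'compare' helper and the sample data are made up for illustration, and only the prefix test mirrors 'target_specific' from the patch.

```python
# Illustrative sketch only: 'compare' and the sample lists are hypothetical;
# the prefix test mirrors 'target_specific' from the patch.
TARGET_PREFIXES = ('aarch64', 'gcn', 'x86')


def target_specific(param):
    """Return True if the param name starts with a known target prefix."""
    return param.split('-')[0] in TARGET_PREFIXES


def compare(texi_params, help_params, ignored=frozenset()):
    """Return (only_in_texi, only_in_help), ignoring target-specific names."""
    texi_set = {p for p in texi_params if not target_specific(p)} - ignored
    help_set = {p for p in help_params if not target_specific(p)} - ignored
    return texi_set - help_set, help_set - texi_set


# Made-up inputs: the manual lists GCN and AArch64 params that an x86 build
# does not report, and the x86 build reports a param of its own.
texi = ['max-inline-insns-single', 'gcn-preferred-vectorization-factor',
        'aarch64-sve-compare-costs']
help_out = ['max-inline-insns-single', 'x86-stlf-window-ninsns']

extra, missing = compare(texi, help_out)
print(sorted(extra), sorted(missing))
```

Because both sides are filtered, neither direction of the comparison complains about target-specific names; the price, as noted in the thread, is that an undocumented target-specific param goes unnoticed.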


Re: [PATCH, OpenACC 2.7, v3] Adjust acc_map_data/acc_unmap_data interaction with reference counters

2024-04-12 Thread Thomas Schwinge
Hi Chung-Lin!

On 2024-04-11T22:08:47+0800, Chung-Lin Tang  wrote:
> On 2024/3/15 7:24 PM, Thomas Schwinge wrote:
>> -      if (n->refcount != REFCOUNT_INFINITY)
>> +      if (n->refcount != REFCOUNT_INFINITY
>> +          && n->refcount != REFCOUNT_ACC_MAP_DATA)
>>          n->refcount--;
>>        n->dynamic_refcount--;
>>      }
>>  
>> +  /* Mappings created by 'acc_map_data' may only be deleted by
>> +     'acc_unmap_data'.  */
>> +  if (n->refcount == REFCOUNT_ACC_MAP_DATA
>> +      && n->dynamic_refcount == 0)
>> +    n->dynamic_refcount = 1;
>> +
>>    if (n->refcount == 0)
>>      {
>>        bool copyout = (kind == GOMP_MAP_FROM
>> 
>> ..., which really should have the same semantics?  No strong opinion on
>> which of the two variants you now chose.
>
> My guess is that breaking off the REFCOUNT_ACC_MAP_DATA case separately will
> be lighter on any branch predictors (faster performing overall)

Eh, OK...

> so I will
> stick with my version here.


>>>> It's not clear to me why you need this handling -- instead of just
>>>> handling 'REFCOUNT_ACC_MAP_DATA' like 'REFCOUNT_INFINITY' here, that is,
>>>> early 'return'?
>>>>
>>>> Per my understanding, this code is for OpenACC only exercised for
>>>> structured data regions, and it seems strange (unnecessary?) to adjust
>>>> the 'dynamic_refcount' for these for 'acc_map_data'-mapped data?  Or am I
>>>> missing anything?
>>>
>>> No, that is not true. It goes through almost everything through
>>> gomp_map_vars_existing/_internal.
>>> This is what happens when you acc_create/acc_copyin on a mapping created by
>>> acc_map_data.

I still don't follow.  If you 'acc_map_data' something, and then
'acc_create' the same memory region, then that's handled, with
'dynamic_refcount', via 'acc_create' -> 'goacc_enter_datum' ->
'goacc_map_var_existing', all in 'libgomp/oacc-mem.c'.  Agree?

>> But I don't understand what you foresee breaking with the following (on
>> top of your v2):
>> 
>> --- a/libgomp/target.c
>> +++ b/libgomp/target.c
>> @@ -476,14 +476,14 @@ gomp_free_device_memory (struct gomp_device_descr *devicep, void *devptr)
>>  static inline void
>>  gomp_increment_refcount (splay_tree_key k, htab_t *refcount_set)
>>  {
>> -  if (k == NULL || k->refcount == REFCOUNT_INFINITY)
>> +  if (k == NULL
>> +      || k->refcount == REFCOUNT_INFINITY
>> +      || k->refcount == REFCOUNT_ACC_MAP_DATA)
>>      return;
>>  
>>    uintptr_t *refcount_ptr = &k->refcount;
>>  
>> -  if (k->refcount == REFCOUNT_ACC_MAP_DATA)
>> -    refcount_ptr = &k->dynamic_refcount;
>> -  else if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount))
>> +  if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount))
>>      refcount_ptr = &k->structelem_refcount;
> ...
>> Can you please show a test case?

That is, a test case where the 'libgomp/target.c:gomp_increment_refcount'
etc. handling is relevant.  Those test cases:

> I have re-tested the patch *without* the gomp_increment/decrement_refcount
> changes, and have these regressions (just to demonstrate what is affected):
> +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  execution test
> +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
> +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  execution test
> +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
> +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  execution test
> +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
> +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  execution test
> +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  

Re: [PATCH] contrib/check-params-in-docs.py: Ignore gcn-preferred-vectorization-factor

2024-04-11 Thread Thomas Schwinge
Hi!

On 2024-04-11T19:52:51+0200, Martin Jambor  wrote:
> contrib/check-params-in-docs.py is a script that checks that all
> options reported with ./gcc/xgcc -Bgcc --help=param are in
> gcc/doc/invoke.texi and vice versa.

Eh, first time I'm hearing about this one!

(a) Shouldn't this be running as part of the GCC build process?

> gcn-preferred-vectorization-factor is in the manual but normally not
> reported by --help, probably because I do not have gcn offload
> configured.

No, because you've not been building GCC for GCN target.  ;-P

> This patch makes the script silent about this particular
> fact.

(b) Shouldn't we instead ignore any '--param's with "gcn" prefix, similar
to how that's done for "skip aarch64 params"?

(c) ..., and shouldn't we likewise skip any "x86" ones?

(d) ..., or in fact any target specific ones, following after the generic
section?  (Easily achieved with a special marker in
'gcc/doc/invoke.texi', just before:

The following choices of @var{name} are available on AArch64 targets:

..., and adjusting the 'takewhile' in 'contrib/check-params-in-docs.py'
accordingly?


Grüße
 Thomas


> I'll push the patch as obvious momentarily.
>
> Martin
>
>
> contrib/ChangeLog:
>
> 2024-04-11  Martin Jambor  
>
>   * check-params-in-docs.py (ignored): Add
>   gcn-preferred-vectorization-factor.
> ---
>  contrib/check-params-in-docs.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/contrib/check-params-in-docs.py b/contrib/check-params-in-docs.py
> index 623c82284e2..f7879dd8e08 100755
> --- a/contrib/check-params-in-docs.py
> +++ b/contrib/check-params-in-docs.py
> @@ -45,7 +45,7 @@ parser.add_argument('params_output')
>  
>  args = parser.parse_args()
>  
> -ignored = {'logical-op-non-short-circuit'}
> +ignored = {'logical-op-non-short-circuit', 
> 'gcn-preferred-vectorization-factor'}
>  params = {}
>  
>  for line in open(args.params_output).readlines():
> -- 
> 2.44.0
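
To make suggestion (d) concrete, here is a rough sketch of cutting the manual's param list off at a marker with 'takewhile'.  The marker string is purely hypothetical (nothing like it exists in 'gcc/doc/invoke.texi'), the sample lines are made up, and this is not how the current script is written.

```python
from itertools import dropwhile, takewhile

# Hypothetical marker comment; an assumption made for this sketch only.
MARKER = '@c target-specific params follow'


def generic_param_items(texi_lines):
    """Collect '@item'/'@itemx' names between '--param' and the marker."""
    lines = dropwhile(lambda l: '@item --param' not in l, texi_lines)
    next(lines, None)  # drop the '@item --param' header line itself
    lines = takewhile(lambda l: MARKER not in l, lines)
    items = []
    for line in lines:
        for token in ('@item ', '@itemx '):
            if line.startswith(token):
                items.append(line[len(token):])
                break
    return items


# Made-up excerpt in the shape of the manual's '--param' table.
sample = [
    '@item --param @var{name}=@var{value}',
    '@item max-inline-insns-single',
    '@itemx max-inline-insns-auto',
    MARKER,
    'The following choices of @var{name} are available on AArch64 targets:',
    '@item aarch64-sve-compare-costs',
]
print(generic_param_items(sample))
```

Note that this trims only the manual side; target-specific names reported by '--help=params' would still need separate handling.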


Re: [PATCH, OpenACC 2.7] Connect readonly modifier to points-to analysis

2024-04-11 Thread Thomas Schwinge
Hi Chung-Lin, Richard!

From me just a few mechanical pieces, see below.  Richard, are you able
to again comment on Chung-Lin's general strategy, as I'm not at all
familiar with those parts of the code?

On 2024-04-03T19:50:55+0800, Chung-Lin Tang  wrote:
> On 2023/10/30 8:46 PM, Richard Biener wrote:
>>>
>>> What Chung-Lin's first patch does is mark the OMP clause for 'x' (not the
>>> 'x' decl itself!) as 'readonly', via a new 'OMP_CLAUSE_MAP_READONLY'
>>> flag.
>>>
>>> The actual optimization then is done in this second patch.  Chung-Lin
>>> found that he could use 'SSA_NAME_POINTS_TO_READONLY_MEMORY' for that.
>>> I don't have much experience with most of the following generic code, so
>>> would appreciate a helping hand, whether that conceptually makes sense as
>>> well as from the implementation point of view:
>
> First of all, I have removed all of the gimplify-stage scanning and setting of
> DECL_POINTS_TO_READONLY and SSA_NAME_POINTS_TO_READONLY_MEMORY (so no changes
> to gimplify.cc now)
>
> I remember this code was an artifact of earlier attempts to allow
> struct-member pointer mappings to also work (e.g.
> map(readonly:rec.ptr[:N])), but failed anyways.
> I think the omp_data_* member accesses when building child function side
> receiver_refs is blocking points-to analysis from working (didn't try digging
> deeper)
>
> Also during gimplify, VAR_DECLs appeared to be reused (at least in some
> cases) for map clause decl reference building, so hoping that the variables
> "happen to be" single-use and DECL_POINTS_TO_READONLY relaying into
> SSA_NAME_POINTS_TO_READONLY_MEMORY does appear to be a little risky.
>
> However, for firstprivate pointers processed during omp-low, it appears to be
> somewhat different.
> (see below description)
>
>> No, I don't think you can use that flag on non-default-defs, nor
>> preserve it on copying.  So
>> it also doesn't nicely extend to DECLs as done by the patch.  We
>> currently _only_ use it
>> for incoming parameters.  When used on arbitrary code you can get to for
>> example
>> 
>> ptr1(points-to-readonly-memory) = >x;
>> ... access via ptr1 ...
>> ptr2 = >x;
>> ... access via ptr2 ...
>> 
>> where both are your OMP regions differently constrained (the constraint is on
>> the
>> code in the region, _not_ on the actual protections of the pointed to
>> data, much like
>> for the fortran case).  But now CSE comes along and happily replaces all ptr2
>> with ptr2 in the second region and ... oops!
>
> Richard, I assume what you meant was "happily replaces all ptr2 with ptr1 in
> the second region"?
>
> That doesn't happen, because during omp-lower/expand, OMP target regions
> (which is all that this applies to currently) are separated into different
> individual child functions.
>
> (Currently, the only "effective" use of DECL_POINTS_TO_READONLY is during
> omp-lower, when for firstprivate pointers (i.e. 'a' here) we set this bit
> when constructing the first load of this pointer)
>
>   #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
>   {
>     foo (a, a[8]);
>     r = a[8];
>   }
>   #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
>   {
>     foo (a, a[12]);
>     r = a[12];
>   }
>
> After omp-expand (before SSA):
>
> __attribute__((oacc parallel, omp target entrypoint, noclone))
> void main._omp_fn.1 (const struct .omp_data_t.3 & restrict .omp_data_i)
> {
>  ...
>:
>   D.2962 = .omp_data_i->D.2947;
>   a.8 = D.2962;
>   r.1 = (*a.8)[12];
>   foo (a.8, r.1);
>   r.1 = (*a.8)[12];
>   D.2965 = .omp_data_i->r;
>   *D.2965 = r.1;
>   return;
> }
>
> __attribute__((oacc parallel, omp target entrypoint, noclone))
> void main._omp_fn.0 (const struct .omp_data_t.2 & restrict .omp_data_i)
> {
>   ...
>:
>   D.2968 = .omp_data_i->D.2939;
>   a.4 = D.2968;
>   r.0 = (*a.4)[8];
>   foo (a.4, r.0);
>   r.0 = (*a.4)[8];
>   D.2971 = .omp_data_i->r;
>   *D.2971 = r.0;
>   return;
> }
>
> So actually, the creating of DECL_POINTS_TO_READONLY and its relaying to
> SSA_NAME_POINTS_TO_READONLY_MEMORY here, is actually quite similar to a
> default-def for a PARM_DECL, at least conceptually.
>
> (If offloading was structured significantly differently, say if child
> functions were separated much earlier before omp-lowering, then this
> readonly-modifier might possibly be a direct application of 'r' in the
> "fn spec" attribute)
>
> Other changes since first version of patch include:
> 1) update of C/C++ FE changes to new style in c-family/c-omp.cc
> 2) merging of two if cases in fortran/trans-openmp.cc like Thomas suggested
> 3) Update of readonly-2.c testcase to scan before/after "fre1" pass, to
>    verify removal of a MEM load, also as Thomas suggested.

Thanks!

> I have re-tested this patch using mainline, with no regressions. Is this okay
> for mainline?

> 2024-04-03  Chung-Lin Tang  
>
> gcc/c-family/ChangeLog:
>
>   * c-omp.cc (c_omp_address_inspector::expand_array_base):
>   Set 

Re: [PATCH] openmp: Add support for the 'indirect' clause in C/C++

2024-04-11 Thread Thomas Schwinge
Hi!

I've filed <https://gcc.gnu.org/PR114690>
"OpenMP 'indirect' clause: dynamic image loading/unloading" for the
following issue:

On 2023-11-13T12:47:04+0100, Tobias Burnus  wrote:
> On 13.11.23 11:59, Thomas Schwinge wrote:
>>>> Also, for my understanding: why is 'build_indirect_map' done at kernel
>>>> invocation time (here) instead of at image load time?
>>> The splay_tree is generated on the device itself - and we currently do
>>> not start a kernel during GOMP_OFFLOAD_load_image. We could, the
>>> question is whether it makes sense. (Generating the splay_tree on the
>>> host for the device is a hassle and error prone as it needs to use
>>> device pointers at the end.)
>> Hmm.  It seems conceptually cleaner to me to set this up upfront, and
>> avoids potentially slowing down every device kernel invocation (at least
>> another function call, and 'gomp_mutex_lock' check).  Though, I agree
>> this may be "in the noise" with regards to all the other stuff going on
>> in 'gomp_gcn_enter_kernel' and elsewhere...
>
> I think the most common case is GOMP_INDIRECT_ADDR_MAP == NULL.
>
> The question is whether the lock should/could be moved inside
> 'if (!indirect_array)' or not. Probably yes:
> * doing an atomic load for the outer '!indirect_array', work on a local array
> for the build up and only assign it at the end - and just after taking the
> lock, check again whether '!indirect_array'.
>
> That way, it is lock free once built, and while building there is no race.
>
>> What I just realize, what's also unclear to me is how the current
>> implementation works with regards to several images getting loaded --
>> don't we then overwrite 'GOMP_INDIRECT_ADDR_MAP' instead of
>> (conceptually) appending to it?
>
> Yes, I think that will happen - but it looks as if the same issue also
> exists in the other code? I think that's not the first variable that has that
> issue?
>
> I think we should try to cleanup that handling, also to support calling
> a device function in a shared library from a target region in the main
> program, which currently also fails.
>
> All device routines that are in normal static libraries and in the
> object files of the main program should simply work thanks to offload
> LTO such that there is only a single GOMP_offload_register_ver call (per
> device type) and GOMP_OFFLOAD_load_image call (per device).
>
> Likewise if the offloading is only done via a single shared library. —
> Any mixing will currently fail, unfortunately. This patch just adds
> another item which does not handle it properly.
>
> (Not good but IMHO also not a showstopper for this patch.)
>
>> In the general case, additional images may also get loaded during
>> execution.  We thus need proper locking of the shared data structure, uh?
>> Or, can we have separate on-device data structures per image?  (I've not
>> yet thought about that in detail.)
>
> I think we could - but in the main-program 'omp target' case that calls
> a shared-library 'declare target' function means that we need to handle
> multiple GOMP_offload_register_ver / load_image calls such that they can
> work together.
>
> Obviously, it gets harder if the user keeps doing dlopen() / dlclose()
> of libraries containing offload code where a target/compute region is
> run before, between, and after those calls (but hopefully not running
> when calling dlopen/dlclose).
>
>> Relatedly then, when images are unloaded, we also need to remove stale
>> items from the table, and release resources (for example, the
>> 'GOMP_OFFLOAD_alloc' for 'map_target_addr').
>
> True. I think the general assumption is that images only get unloaded at
> the very end, which matches most but not all code. Yet another work item.
>
> I think we should open a new PR about this topic and collect work items
> there.


Grüße
 Thomas


Regeneration of 'gcc/config/riscv/riscv.opt.urls' (was: [PATCH v2 2/3] aarch64: Add support for aarch64-gnu (GNU/Hurd on AArch64))

2024-04-10 Thread Thomas Schwinge
Hi!

On 2024-04-09T09:24:29-0700, Palmer Dabbelt  wrote:
> On Tue, 09 Apr 2024 01:04:34 PDT (-0700), buga...@gmail.com wrote:
>> On Tue, Apr 9, 2024 at 10:27 AM Thomas Schwinge wrote:
>>> Thanks, pushed to trunk branch:
>>>
>>>   - commit 532c57f8c3a15b109a46d3e2b14d60a5c40979d5 "Move GNU/Hurd
>>>     startfile spec from config/i386/gnu.h to config/gnu.h"
>>>   - commit 9670a2326333caa8482377c00beb65723b7b4b26 "aarch64: Add support
>>>     for aarch64-gnu (GNU/Hurd on AArch64)"
>>>   - commit 46c91665f4bceba19aed56f5bd6e934c548b84ff "libgcc: Add basic
>>>     support for aarch64-gnu (GNU/Hurd on AArch64)"
>>
>> \o/ Thanks a lot!
>>
>> This will unblock merging the aarch64-gnu glibc port upstream.

\o/


>> I assume the buildbot failure that I just got an email about is
>> unrelated; it's failing on some RISC-V thing.
>
> Sorry if I missed something here, do you have a pointer?

<https://inbox.sourceware.org/20240409074850.ed7bd3858...@sourceware.org>
and several more such messages, requesting:

--- a/gcc/config/riscv/riscv.opt.urls
+++ b/gcc/config/riscv/riscv.opt.urls
@@ -89,3 +89,5 @@ UrlSuffix(gcc/RISC-V-Options.html#index-minline-strncmp)
 minline-strlen
 UrlSuffix(gcc/RISC-V-Options.html#index-minline-strlen)
 
+; skipping UrlSuffix for 'mtls-dialect=' due to finding no URLs
+

To be fixed by
<https://inbox.sourceware.org/20240409145724.9640-1-ishitatsuy...@gmail.com>
"Regenerate opt.urls".


Grüße
 Thomas


Re: [PATCH v2 2/3] aarch64: Add support for aarch64-gnu (GNU/Hurd on AArch64)

2024-04-09 Thread Thomas Schwinge
Hi!

On 2024-04-05T15:13:33+0300, Sergey Bugaev  wrote:
> On Tue, Apr 2, 2024 at 8:26 PM Richard Sandiford
>  wrote:
>> I don't know if you're waiting on me, but just in case: this and patch 3
>> still LGTM if Thomas is OK with them.
>
> Thanks. Thomas asked me to resubmit with Changelog entries added (but
> hasn't pointed out anything else), so this is waiting for him to
> confirm that this looks OK now.

Thanks, pushed to trunk branch:

  - commit 532c57f8c3a15b109a46d3e2b14d60a5c40979d5 "Move GNU/Hurd startfile 
spec from config/i386/gnu.h to config/gnu.h"
  - commit 9670a2326333caa8482377c00beb65723b7b4b26 "aarch64: Add support for 
aarch64-gnu (GNU/Hurd on AArch64)"
  - commit 46c91665f4bceba19aed56f5bd6e934c548b84ff "libgcc: Add basic support 
for aarch64-gnu (GNU/Hurd on AArch64)"


Grüße
 Thomas


Re: [PATCH] rust: Add rust.install-dvi and rust.install-html rules

2024-04-08 Thread Thomas Schwinge
Hi Christophe!

On 2024-04-04T16:27:19+0000, Christophe Lyon  wrote:
> rust has the (empty) rust.dvi and rust.html rules, but lacks the
> (empty) rust.install-dvi and rust.install-html ones.

Thanks, looks good to me.


Grüße
 Thomas


> 2024-04-04  Christophe Lyon  
>
>   gcc/rust/
>   * Make-lang.in (rust.install-dvi, rust.install-html): New rules.
> ---
>  gcc/rust/Make-lang.in | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/gcc/rust/Make-lang.in b/gcc/rust/Make-lang.in
> index 4d646018792..4d73412739d 100644
> --- a/gcc/rust/Make-lang.in
> +++ b/gcc/rust/Make-lang.in
> @@ -342,6 +342,8 @@ selftest-rust-valgrind: $(RUST_SELFTEST_DEPS)
>  # should have dependencies on info files that should be installed.
>  rust.install-info:
>  
> +rust.install-dvi:
> +rust.install-html:
>  rust.install-pdf:
>  
>  # Install man pages for the front end. This target should ignore errors.


GCN: '--param=gcn-preferred-vectorization-factor=[default,32,64]' (was: GCN: '--param=gcn-preferred-vector-lane-width=[default,32,64]')

2024-04-08 Thread Thomas Schwinge
Hi!

On 2024-04-08T13:24:06+0100, Andrew Stubbs  wrote:
> On 08/04/2024 11:45, Thomas Schwinge wrote:
>> On 2024-03-28T08:00:50+0100, I wrote:
>>> On 2024-03-22T15:54:48+0000, Andrew Stubbs  wrote:
>>>> This patch alters the default (preferred) vector size to 32 on RDNA
>>>> devices to better match the actual hardware.  64-lane vectors will
>>>> continue to be used where they are hard-coded (such as function
>>>> prologues).
>>>>
>>>> We run these devices in wavefrontsize64 for compatibility, but they
>>>> actually only have 32-lane vectors, natively.  If the upper part of a V64
>>>> is masked off (as it is in V32) then RDNA devices will skip execution of
>>>> the upper part for most operations, so this adjustment shouldn't leave too
>>>> much performance on the table.  One exception is memory instructions, so
>>>> full wavefrontsize32 support would be better.
>>>>
>>>> The advantage is that we avoid the missing V64 operations (such as permute
>>>> and vec_extract).
>>>>
>>>> Committed to mainline.
>>>
>>> In my GCN target '-march=gfx1100' testing, this commit
>>> "amdgcn: Prefer V32 on RDNA devices" does resolve (or, make latent?) a
>>> number of execution test FAILs (that is, regressions compared to earlier
>>> '-march=gfx90a' etc. testing).
>>>
>>> This commit also resolves (for my '-march=gfx1100' testing) one
>>> pre-existing FAIL (that is, already seen in '-march=gfx90a' earlier
>>> etc. testing):
>>>
>>>  PASS: gcc.dg/tree-ssa/scev-14.c (test for excess errors)
>>>  [-FAIL:-]{+PASS:+} gcc.dg/tree-ssa/scev-14.c scan-tree-dump ivopts "Overflowness wrto loop niter:\tNo-overflow"
>>>
>>> That means, this test case specifically (or, just its 'scan-tree-dump'?)
>>> needs to be adjusted for GCN V64 testing?
>>>
>>> This commit, as you'd also mentioned elsewhere, however also causes a
>>> number of regressions in 'gcc.target/gcn/gcn.exp', see list below.
>>>
>>> Those can be "fixed" with 'dg-additional-options -march=gfx90a' (or
>>> similar) in the affected test cases (let me know if you'd like me to
>>> 'git push' that), but I suppose something more elaborate may be in order?
>>> (Conditionalize those on 'target { ! gcn_rdna }', and add respective
>>> scanning for 'target gcn_rdna'?  I can help with effective-target
>>> 'gcn_rdna' (or similar), if you'd like me to.)
>>>
>>> And/or, have a '-mpreferred-simd-mode=v64' (or similar) to be used for
>>> such test cases, to override 'if (TARGET_RDNA2_PLUS)' etc. in
>>> 'gcn_vectorize_preferred_simd_mode'?
>> 
>> The latter I have quickly implemented, see attached
>> "GCN: '--param=gcn-preferred-vector-lane-width=[default,32,64]'".  OK to
>> push to trunk branch?
>> 
>> (This '--param' will also be useful for another bug/regression I'm about
>> to file.)
>> 
>>> Best, probably, both these things, to properly test both V32 and V64?
>> 
>> That part remains to be done, but is best done by someone who actually
>> knows "GCN" assembly/GCC back end -- that is, not me.
>
> I'm not sure that this is *best* solution to the problem (in general, 
> it's probably best to test the actual code that will be generated in 
> practice), but I think this option will be useful for testing 
> performance in each configuration and other correctness issues, and 
> these tests are not testing that feature.

ACK.

> However, "vector lane width" sounds like it's configuring the number of 
> bits in each lane. I think "vectorization factor" is unambiguous.
>
> OK to commit, with the name change.

Thanks, changed, and pushed v2 version to trunk branch in
commit df7625c3af004a81c13d54bb8810e03932eeb59a
"GCN: '--param=gcn-preferred-vectorization-factor=[default,32,64]'", see
attached.


Grüße
 Thomas


>>>  PASS: gcc.target/gcn/cond_fmaxnm_1.c (test for excess errors)
>>>  [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_1.c scan-assembler-not \\tv_writelane_b32\\tv[0-9]+, vcc_..
>>>  [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_1.c scan-assembler-times smaxv64df3_exec 3
>>>  [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_1.c scan-assembler-times smaxv64sf3_exec 3
>>>  PASS: gcc.target/gcn/cond_fmaxnm_1_run.c (test for excess errors)
>>> 

New effective-target 'asm_goto_with_outputs' (was: [PATCH] testsuite: Fix up lra effective target)

2024-04-08 Thread Thomas Schwinge
Hi!

On 2024-03-21T12:20:38+0100, I wrote:
> On 2024-02-16T10:48:53-0800, Mike Stump  wrote:
>> On Feb 16, 2024, at 2:16 AM, Jakub Jelinek  wrote:
>>> 
>>> There is one special case, NVPTX, which is a TARGET_NO_REGISTER_ALLOCATION
>>> target.  I think claiming for it that it is a lra target is strange (even
>>> though it effectively returns true for targetm.lra_p ()), unsure if it
>>> supports asm goto with outputs or not, if it does and we want to test it,
>>> perhaps we should introduce asm_goto_outputs effective target and use
>>> lra || nvptx-*-* for that?
>>
>> Since the port people have to maintain that code in general, I usually leave
>> it to them to try and select a cheap, maintainable way to manage it.
>>
>> If people want to pave the way, I'd tend to defer to them, having thought
>> about more than I.
>
> Here I am.  ;-)
>
> After commit e16f90be2dc8af6c371fe79044c3e668fa3dda62
> "testsuite: Fix up lra effective target", we get for nvptx target:
>
> -PASS: gcc.c-torture/compile/asmgoto-2.c   -O0  (test for excess errors)
> +ERROR: gcc.c-torture/compile/asmgoto-2.c   -O0 : no files matched glob pattern "lra1020113.c.[0-9][0-9][0-9]r.reload" for " dg-do 2 compile { target lra } "
>
> Etc.
>
> That is, the current effective-target 'lra' is not suitable for nvptx --
> which, I suppose, is OK, given that nvptx neither uses LRA nor doesn't
> use LRA.  ;-) (Therefore, effective-target 'lra' shouldn't get used in
> test cases that are active for nvptx.)
>
> However, nvptx appears to support 'asm goto' with outputs, including the
> new execution test case:
>
> PASS: gcc.dg/pr107385.c execution test
>
> I'm attaching "[WIP] New effective-target 'asm_goto_with_outputs'", which
> does address the effective-target check for nvptx, and otherwise does
> 's%lra%asm_goto_with_outputs'.  (I have not yet actually merged
> 'check_effective_target_lra' into
> 'check_effective_target_asm_goto_with_outputs'.)
>
> I have verified that all current effective-target 'lra' test cases
> actually use 'asm goto' with outputs, there is just one exception:
> 'gcc.dg/pr110079.c' (see
> <https://inbox.sourceware.org/Zel5TMMr/3BHgl0g@tucnak>
> "bb-reorder: Fix -freorder-blocks-and-partition ICEs on aarch64 with asm goto [PR110079]",
> <https://gcc.gnu.org/PR110079>
> "ICE with -freorder-blocks-and-partition and inline-asm goto").  That
> test case, 'gcc.dg/pr110079.c', currently uses 'target lra', and uses
> 'asm goto' -- but not with outputs, so is 'asm_goto_with_outputs' not
> really applicable?  The test case does PASS for nvptx target (but I've
> not verified what it's actually doing/testing).  How to handle that one?

I've now pushed a v2 version to trunk branch in
commit 3fa8bff30ab58bd8b8018764d390ec2fcc8153bb
"New effective-target 'asm_goto_with_outputs'", see attached.


Grüße
 Thomas


From 3fa8bff30ab58bd8b8018764d390ec2fcc8153bb Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 4 Mar 2024 16:04:11 +0100
Subject: [PATCH] New effective-target 'asm_goto_with_outputs'

After commit e16f90be2dc8af6c371fe79044c3e668fa3dda62
"testsuite: Fix up lra effective target", we get for nvptx target:

-PASS: gcc.c-torture/compile/asmgoto-2.c   -O0  (test for excess errors)
+ERROR: gcc.c-torture/compile/asmgoto-2.c   -O0 : no files matched glob pattern "lra1020113.c.[0-9][0-9][0-9]r.reload" for " dg-do 2 compile { target lra } "

Etc.

However, nvptx appears to support 'asm goto' with outputs, including the
new execution test case:

PASS: gcc.dg/pr107385.c execution test

Therefore, generally use new effective-target 'asm_goto_with_outputs' instead
of 'lra'.  One exception is 'gcc.dg/pr110079.c', which doesn't use 'asm goto'
with outputs, and continues using effective-target 'lra', with special-casing
nvptx target, to avoid ERROR for 'lra'.

	gcc/
	* doc/sourcebuild.texi (Effective-Target Keywords): Document
	'asm_goto_with_outputs'.  Add comment to 'lra'.
	gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_lra): Add
	comment.
	(check_effective_target_asm_goto_with_outputs): New.
	* gcc.c-torture/compile/asmgoto-2.c: Use it.
	* gcc.c-torture/compile/asmgoto-5.c: Likewise.
	* gcc.c-torture/compile/asmgoto-6.c: Likewise.
	* gcc.c-torture/compile/pr98096.c: Likewise.
	* gcc.dg/pr100590.c: Likewise.
	* gcc.dg/pr107385.c: Likewise.
	* gcc.dg/pr108095.c: Likewise.
	* gcc.dg/pr97954.c: Likewise.
	* gcc.dg/torture/pr100329.c: Likewise.
	* gcc.dg/torture/pr100398.c: Likewise.
	* gcc.dg/torture/pr100519.c: Likewise.
	* gcc.dg/torture/pr110422.c: Likewise.
	* gcc.dg/pr110079.c: Special-case nvptx target.
---
 gcc/doc/sourcebuild.texi

GCN: '--param=gcn-preferred-vector-lane-width=[default,32,64]' (was: [committed] amdgcn: Prefer V32 on RDNA devices)

2024-04-08 Thread Thomas Schwinge
ort.c scan-assembler-times 
> __udivv64hi3@rel32@lo 0
> PASS: gcc.target/gcn/simd-math-5-short.c scan-assembler-times __umodv64hi3@rel32@lo 0
>
> PASS: gcc.target/gcn/simd-math-5.c (test for excess errors)
> XFAIL: gcc.target/gcn/simd-math-5.c scan-assembler-times __divmodv64si4@rel32@lo 1
> PASS: gcc.target/gcn/simd-math-5.c scan-assembler-times __divsi3@rel32@lo 1
> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-5.c scan-assembler-times __divv64si3@rel32@lo 1
> [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-5.c scan-assembler-times __modv64si3@rel32@lo 1
> PASS: gcc.target/gcn/simd-math-5.c scan-assembler-times __udivmodv64si4@rel32@lo 0
> PASS: gcc.target/gcn/simd-math-5.c scan-assembler-times __udivsi3@rel32@lo 0
> PASS: gcc.target/gcn/simd-math-5.c scan-assembler-times __udivv64si3@rel32@lo 0
> @@ -125242,13 +125242,13 @@ PASS: gcc.target/gcn/simd-math-5.c scan-assembler-times __umodv64si3@rel32@lo 0
>
> PASS: gcc.target/gcn/smax_1.c (test for excess errors)
> PASS: gcc.target/gcn/smax_1.c scan-assembler-times \\tv_cmp_gt_i64\\tvcc, v[[0-9]+:[0-9]+], v[[0-9]+:[0-9]+] 10
> FAIL: gcc.target/gcn/smax_1.c scan-assembler-times \\tv_cmpx_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 80
> [-PASS:-]{+FAIL:+} gcc.target/gcn/smax_1.c scan-assembler-times vec_cmpv64didi 10
> PASS: gcc.target/gcn/smax_1_run.c (test for excess errors)
> PASS: gcc.target/gcn/smax_1_run.c execution test
>
> PASS: gcc.target/gcn/smin_1.c (test for excess errors)
> PASS: gcc.target/gcn/smin_1.c scan-assembler-times \\tv_cmp_lt_i64\\tvcc, v[[0-9]+:[0-9]+], v[[0-9]+:[0-9]+] 10
> FAIL: gcc.target/gcn/smin_1.c scan-assembler-times \\tv_cmpx_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 80
> [-PASS:-]{+FAIL:+} gcc.target/gcn/smin_1.c scan-assembler-times vec_cmpv64didi 10
> PASS: gcc.target/gcn/smin_1_run.c (test for excess errors)
> PASS: gcc.target/gcn/smin_1_run.c execution test
>
> PASS: gcc.target/gcn/sram-ecc-3.c (test for excess errors)
> [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-3.c scan-assembler (\\*zero_extendv64qiv64si_sdwa|\\*zero_extendv64qiv64si_shift)
>
> PASS: gcc.target/gcn/sram-ecc-4.c (test for excess errors)
> [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-4.c scan-assembler (\\*zero_extendv64hiv64si_sdwa|\\*zero_extendv64hiv64si_shift)
>
> PASS: gcc.target/gcn/sram-ecc-7.c (test for excess errors)
> [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-7.c scan-assembler (\\*zero_extendv64qiv64si_sdwa|\\*zero_extendv64qiv64si_shift)
>
> PASS: gcc.target/gcn/sram-ecc-8.c (test for excess errors)
> [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-8.c scan-assembler 
> (\\*zero_extendv64hiv64si_sdwa|\\*zero_extendv64hiv64si_shift)
>
> PASS: gcc.target/gcn/umax_1.c (test for excess errors)
> PASS: gcc.target/gcn/umax_1.c scan-assembler-times \\tv_cmp_gt_u64\\tvcc, 
> v[[0-9]+:[0-9]+], v[[0-9]+:[0-9]+] 8
> FAIL: gcc.target/gcn/umax_1.c scan-assembler-times 
> \\tv_cmpx_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 56
> [-PASS:-]{+FAIL:+} gcc.target/gcn/umax_1.c scan-assembler-times 
> vec_cmpv64didi 8
> PASS: gcc.target/gcn/umax_1_run.c (test for excess errors)
> PASS: gcc.target/gcn/umax_1_run.c execution test
>
> PASS: gcc.target/gcn/umin_1.c (test for excess errors)
> PASS: gcc.target/gcn/umin_1.c scan-assembler-times \\tv_cmp_lt_u64\\tvcc, 
> v[[0-9]+:[0-9]+], v[[0-9]+:[0-9]+] 8
> FAIL: gcc.target/gcn/umin_1.c scan-assembler-times 
> \\tv_cmpx_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 56
> [-PASS:-]{+FAIL:+} gcc.target/gcn/umin_1.c scan-assembler-times 
> vec_cmpv64didi 8
> PASS: gcc.target/gcn/umin_1_run.c (test for excess errors)
> PASS: gcc.target/gcn/umin_1_run.c execution test
>
>
> Grüße
>  Thomas
>
>
>> gcc/ChangeLog:
>>
>>  * config/gcn/gcn.cc (gcn_vectorize_preferred_simd_mode): Prefer V32 on
>>  RDNA devices.
>> ---
>>  gcc/config/gcn/gcn.cc | 26 ++
>>  1 file changed, 26 insertions(+)
>>
>> diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
>> index 498146dcde9..efb73af50c4 100644
>> --- a/gcc/config/gcn/gcn.cc
>> +++ b/gcc/config/gcn/gcn.cc
>> @@ -5226,6 +5226,32 @@ gcn_vector_mode_supported_p (machine_mode mode)
>>  static machine_mode
>>  gcn_vectorize_preferred_simd_mode (scalar_mode mode)
>>  {
>> +  /* RDNA devices have 32-lane vectors with limited support for 64-bit 
>> vectors
>> + (in particular, permute operations are only available for cases that 
>> don't
>> + span the 32-lane boundary).
>> +
>> + From the RDNA3 manual: "Hardware may choose to 

nvptx: In mkoffload.cc, call diagnostic_color_init + gcc_init_libintl: Restore 'libgomp.c/reverse-offload-sm30.c' testing (was: [Patch] nvptx: In mkoffload.cc, call diagnostic_color_init + gcc_init_li

2024-04-05 Thread Thomas Schwinge
Hi!

On 2024-04-03T14:06:45+0200, Tobias Burnus  wrote:
> Nvptx's mkoffload.cc contains 14 'fatal_error' calls and one 'warning_at'
> call, which stands out more clearly (color, bold) when enabling
>    diagnostic_color_init
> which this patch does.  Additionally, the call gcc_init_libintl permits that
> the already translated error messages also show up as translation.
>
> OK for mainline?

But you've not regression-tested this?  Pushed to trunk branch
commit 679f81a32f706645f45900fdb1659fb5fe607f77
"nvptx: In mkoffload.cc, call diagnostic_color_init + gcc_init_libintl: Restore 
'libgomp.c/reverse-offload-sm30.c' testing",
see attached.


Instead of adding support for all the '-fdiagnostics-color' variants, I
suppose we should rather switch the 'mkoffload's to use GCC's standard
option handling machinery (like in 'gcc/lto-wrapper.cc', for example)?
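
For illustration only, a minimal C sketch of the decision that the
'GCC_COLORS' workaround in the attached patch relies on: an empty value
turns colorization off, otherwise color is used when the diagnostics go
to a terminal.  'sketch_use_color' and its parameters are hypothetical
names, not taken from GCC's sources, and the real logic may consider
more than this (for example, the 'TERM' setting):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not GCC's actual code: decide whether a
   driver-like tool should emit color escape sequences.
   GCC_COLORS is the value of the environment variable, or NULL if
   it is unset.  */
bool
sketch_use_color (const char *gcc_colors, bool stderr_is_tty)
{
  /* An empty GCC_COLORS disables colorization outright.  */
  if (gcc_colors != NULL && gcc_colors[0] == '\0')
    return false;

  /* Otherwise behave like "auto": color only when writing to a
     terminal.  */
  return stderr_is_tty;
}
```

A caller would pass 'getenv ("GCC_COLORS")' and the result of an
'isatty' check on stderr.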


Grüße
 Thomas


> PS: Example: 'nvptx mkoffload:' is bold and 'fatal error:' is in red
> in English and some language variants.
>
> nvptx mkoffload: fatal error: COLLECT_GCC must be set.
> nvptx mkoffload: 致命的エラー: COLLECT_GCC must be set.
> nvptx mkoffload: erreur fatale: COLLECT_GCC doit être défini.
> nvptx mkoffload: schwerwiegender Fehler: COLLECT_GCC muss gesetzt sein.
>
> (BTW: It looks as if many languages did not translate the error string
> itself, e.g. jp or zh or pl or zh_TW/zh_CN or fi or ...)
> nvptx: In mkoffload.cc, call diagnostic_color_init + gcc_init_libintl
>
> gcc/ChangeLog:
>
>   * config/nvptx/mkoffload.cc (main): Call
>   gcc_init_libintl and diagnostic_color_init.
>
>  gcc/config/nvptx/mkoffload.cc | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
> index a7fc28cbd3f..503b1abcefd 100644
> --- a/gcc/config/nvptx/mkoffload.cc
> +++ b/gcc/config/nvptx/mkoffload.cc
> @@ -638,7 +638,9 @@ main (int argc, char **argv)
>    const char *outname = 0;
>  
>    progname = tool_name;
> +  gcc_init_libintl ();
>    diagnostic_initialize (global_dc, 0);
> +  diagnostic_color_init (global_dc);
>  
>    if (atexit (mkoffload_cleanup) != 0)
>      fatal_error (input_location, "atexit failed");


>From 679f81a32f706645f45900fdb1659fb5fe607f77 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 5 Apr 2024 14:04:53 +0200
Subject: [PATCH] nvptx: In mkoffload.cc, call diagnostic_color_init +
 gcc_init_libintl: Restore 'libgomp.c/reverse-offload-sm30.c' testing

With commit 7520a4992c94254016085a461c58c972497c4483
"nvptx: In mkoffload.cc, call diagnostic_color_init + gcc_init_libintl",
we regressed:

[-PASS:-]{+FAIL:+} libgomp.c/reverse-offload-sm30.c  at line 15 (test for warnings, line )
[-PASS:-]{+FAIL:+} libgomp.c/reverse-offload-sm30.c (test for excess errors)

	libgomp/
	* testsuite/libgomp.c/reverse-offload-sm30.c: Set 'GCC_COLORS' to the empty string.
---
 libgomp/testsuite/libgomp.c/reverse-offload-sm30.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c b/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c
index 7f10fd4ded9..cae75f03462 100644
--- a/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c
+++ b/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c
@@ -12,4 +12,7 @@ main ()
   return 0;
 }
 
+/* The 'mkoffload's currently don't obey '-fno-diagnostics-color' etc., so use a different way to effect the same thing:
+   { dg-set-compiler-env-var GCC_COLORS "" }
+   ..., so that the following regexp doesn't have to deal with color code escape sequences.  */
 /* { dg-warning "'omp requires reverse_offload' requires at least 'sm_35' for '-foffload-options=nvptx-none=-march=' - disabling offload-code generation for this device type" "" { target *-*-* } 0 } */
-- 
2.34.1



Re: [Patch] nvptx: In mkoffload.cc, call diagnostic_color_init + gcc_init_libintl

2024-04-04 Thread Thomas Schwinge
Hi Tobias!

On 2024-04-03T14:06:45+0200, Tobias Burnus  wrote:
> Nvptx's mkoffload.cc contains 14 'fatal_error' calls and one 'warning_at'
> call, which stands out more clearly (color, bold) when enabling
>    diagnostic_color_init
> which this patch does.  Additionally, the call gcc_init_libintl permits that
> the already translated error messages also show up as translation.
>
> OK for mainline?

OK, thanks.


Grüße
 Thomas


> PS: Example: 'nvptx mkoffload:' is bold and 'fatal error:' is in red
> in English and some language variants.
>
> nvptx mkoffload: fatal error: COLLECT_GCC must be set.
> nvptx mkoffload: 致命的エラー: COLLECT_GCC must be set.
> nvptx mkoffload: erreur fatale: COLLECT_GCC doit être défini.
> nvptx mkoffload: schwerwiegender Fehler: COLLECT_GCC muss gesetzt sein.
>
> (BTW: It looks as if many languages did not translate the error string
> itself, e.g. jp or zh or pl or zh_TW/zh_CN or fi or ...)

> nvptx: In mkoffload.cc, call diagnostic_color_init + gcc_init_libintl
>
> gcc/ChangeLog:
>
>   * config/nvptx/mkoffload.cc (main): Call
>   gcc_init_libintl and diagnostic_color_init.
>
>  gcc/config/nvptx/mkoffload.cc | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
> index a7fc28cbd3f..503b1abcefd 100644
> --- a/gcc/config/nvptx/mkoffload.cc
> +++ b/gcc/config/nvptx/mkoffload.cc
> @@ -638,7 +638,9 @@ main (int argc, char **argv)
>    const char *outname = 0;
>  
>    progname = tool_name;
> +  gcc_init_libintl ();
>    diagnostic_initialize (global_dc, 0);
> +  diagnostic_color_init (global_dc);
>  
>    if (atexit (mkoffload_cleanup) != 0)
>      fatal_error (input_location, "atexit failed");


Re: [committed] amdgcn: Adjust GFX10/GFX11 cache coherency

2024-04-04 Thread Thomas Schwinge
Hi!

To again state this in public:

On 2024-03-22T15:54:49+, Andrew Stubbs  wrote:
> The RDNA devices have different cache architectures to the CDNA devices, and
> the differences go deeper than just the assembler mnemonics, so we
> probably need to generate different code to maintain coherency across
> the whole device.
>
> I believe this patch is correct according to the documentation in the LLVM
> AMDGPU user guide (the ISA manual is less instructive), but I hadn't observed
> any real problems before (or after).
>
> Committed to mainline.

Thanks!  This commit does repair a lot of the GCN offloading damage noted
in 
"libgomp GCN gfx1030/gfx1100 offloading status" and thereabouts, that is,
this recovers to PASS a lot of twinkling libgomp/OpenMP/GCN execution
test cases, and their even more annoying random timeouts.

(The commit doesn't affect GCN target testing.)
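
As a reminder of what is at stake at the source level: the 'atomic_load'
etc. patterns touched by the quoted patch pick their instruction
sequence based on the requested memory model.  A minimal C11 sketch of
the usual release/acquire pairing follows; the 'sketch_*' names are made
up for illustration, and nothing here is specific to GCN:

```c
#include <stdatomic.h>

/* Hypothetical names; a flag guarding a plain (non-atomic) payload.  */
static atomic_int sketch_flag;
static int sketch_payload;

/* Writer: store the payload, then publish it with release ordering.  */
void
sketch_publish (int value)
{
  sketch_payload = value;
  atomic_store_explicit (&sketch_flag, 1, memory_order_release);
}

/* Reader: an acquire load of the flag makes the payload visible.
   Returns -1 if nothing has been published yet.  */
int
sketch_consume (void)
{
  if (atomic_load_explicit (&sketch_flag, memory_order_acquire))
    return sketch_payload;
  return -1;
}
```

If the caches are not flushed or invalidated as the memory model
requires, the reader may see the flag set but a stale payload.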


I still have a number of stabilization hacks applied to my sources -- but
I've, for example, not seen any 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' or
random timeouts in my current GCN offloading test results.


Grüße
 Thomas


> gcc/ChangeLog:
>
>   * config/gcn/gcn.md (*memory_barrier): Split into RDNA and !RDNA.
>   (atomic_load): Adjust RDNA cache settings.
>   (atomic_store): Likewise.
>   (atomic_exchange): Likewise.
> ---
>  gcc/config/gcn/gcn.md | 86 +++
>  1 file changed, 55 insertions(+), 31 deletions(-)
>
> diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
> index 3b51453aaca..574c2f87e8c 100644
> --- a/gcc/config/gcn/gcn.md
> +++ b/gcc/config/gcn/gcn.md
> @@ -1960,11 +1960,19 @@
>  (define_insn "*memory_barrier"
>[(set (match_operand:BLK 0)
>   (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))]
> -  ""
> -  "{buffer_wbinvl1_vol|buffer_gl0_inv}"
> +  "!TARGET_RDNA2_PLUS"
> +  "buffer_wbinvl1_vol"
>[(set_attr "type" "mubuf")
> (set_attr "length" "4")])
>  
> +(define_insn "*memory_barrier"
> +  [(set (match_operand:BLK 0)
> + (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))]
> +  "TARGET_RDNA2_PLUS"
> +  "buffer_gl1_inv\;buffer_gl0_inv"
> +  [(set_attr "type" "mult")
> +   (set_attr "length" "8")])
> +
>  ; FIXME: These patterns have been disabled as they do not seem to work
>  ; reliably - they can cause hangs or incorrect results.
>  ; TODO: flush caches according to memory model
> @@ -2094,9 +2102,13 @@
> case 0:
>   return "s_load%o0\t%0, %A1 glc\;s_waitcnt\tlgkmcnt(0)";
> case 1:
> - return "flat_load%o0\t%0, %A1%O1 glc\;s_waitcnt\t0";
> + return (TARGET_RDNA2 /* Not GFX11.  */
> + ? "flat_load%o0\t%0, %A1%O1 glc dlc\;s_waitcnt\t0"
> + : "flat_load%o0\t%0, %A1%O1 glc\;s_waitcnt\t0");
> case 2:
> - return "global_load%o0\t%0, %A1%O1 glc\;s_waitcnt\tvmcnt(0)";
> + return (TARGET_RDNA2 /* Not GFX11.  */
> + ? "global_load%o0\t%0, %A1%O1 glc dlc\;s_waitcnt\tvmcnt(0)"
> + : "global_load%o0\t%0, %A1%O1 glc\;s_waitcnt\tvmcnt(0)");
> }
>   break;
>case MEMMODEL_CONSUME:
> @@ -2108,15 +2120,21 @@
>   return "s_load%o0\t%0, %A1 glc\;s_waitcnt\tlgkmcnt(0)\;"
>  "s_dcache_wb_vol";
> case 1:
> - return (TARGET_RDNA2_PLUS
> + return (TARGET_RDNA2
> + ? "flat_load%o0\t%0, %A1%O1 glc dlc\;s_waitcnt\t0\;"
> +   "buffer_gl1_inv\;buffer_gl0_inv"
> + : TARGET_RDNA3
>   ? "flat_load%o0\t%0, %A1%O1 glc\;s_waitcnt\t0\;"
> -   "buffer_gl0_inv"
> +   "buffer_gl1_inv\;buffer_gl0_inv"
>   : "flat_load%o0\t%0, %A1%O1 glc\;s_waitcnt\t0\;"
> "buffer_wbinvl1_vol");
> case 2:
> - return (TARGET_RDNA2_PLUS
> + return (TARGET_RDNA2
> + ? "global_load%o0\t%0, %A1%O1 glc 
> dlc\;s_waitcnt\tvmcnt(0)\;"
> +   "buffer_gl1_inv\;buffer_gl0_inv"
> + : TARGET_RDNA3
>   ? "global_load%o0\t%0, %A1%O1 glc\;s_waitcnt\tvmcnt(0)\;"
> -   "buffer_gl0_inv"
> +   "buffer_gl1_inv\;buffer_gl0_inv"
>   : "global_load%o0\t%0, %A1%O1 glc\;s_waitcnt\tvmcnt(0)\;"
> "buffer_wbinvl1_vol");
> }
> @@ -2130,15 +2148,21 @@
>   return "s_dcache_wb_vol\;s_load%o0\t%0, %A1 glc\;"
>  "s_waitcnt\tlgkmcnt(0)\;s_dcache_inv_vol";
> case 1:
> - return (TARGET_RDNA2_PLUS
> - ? "buffer_gl0_inv\;flat_load%o0\t%0, %A1%O1 glc\;"
> -   "s_waitcnt\t0\;buffer_gl0_inv"
> + return (TARGET_RDNA2
> + ? "buffer_gl1_inv\;buffer_gl0_inv\;flat_load%o0\t%0, %A1%O1 
> glc dlc\;"
> +   "s_waitcnt\t0\;buffer_gl1_inv\;buffer_gl0_inv"
> + : TARGET_RDNA3
> +   

Re: [committed] amdgcn: Prefer V32 on RDNA devices

2024-03-28 Thread Thomas Schwinge
Hi Andrew!

On 2024-03-22T15:54:48+, Andrew Stubbs  wrote:
> This patch alters the default (preferred) vector size to 32 on RDNA devices to
> better match the actual hardware.  64-lane vectors will continue to be
> used where they are hard-coded (such as function prologues).
>
> We run these devices in wavefrontsize64 for compatibility, but they actually
> only have 32-lane vectors, natively.  If the upper part of a V64 is masked
> off (as it is in V32) then RDNA devices will skip execution of the upper part
> for most operations, so this adjustment shouldn't leave too much performance 
> on
> the table.  One exception is memory instructions, so full wavefrontsize32
> support would be better.
>
> The advantage is that we avoid the missing V64 operations (such as permute and
> vec_extract).
>
> Committed to mainline.

In my GCN target '-march=gfx1100' testing, this commit
"amdgcn: Prefer V32 on RDNA devices" does resolve (or, make latent?) a
number of execution test FAILs (that is, regressions compared to earlier
'-march=gfx90a' etc. testing).

This commit also resolves (for my '-march=gfx1100' testing) one
pre-existing FAIL (that is, already seen in earlier '-march=gfx90a'
etc. testing):

PASS: gcc.dg/tree-ssa/scev-14.c (test for excess errors)
[-FAIL:-]{+PASS:+} gcc.dg/tree-ssa/scev-14.c scan-tree-dump ivopts "Overflowness wrto loop niter:\tNo-overflow"

That means, this test case specifically (or, just its 'scan-tree-dump'?)
needs to be adjusted for GCN V64 testing?

This commit, as you'd also mentioned elsewhere, however also causes a
number of regressions in 'gcc.target/gcn/gcn.exp', see list below.

Those can be "fixed" with 'dg-additional-options -march=gfx90a' (or
similar) in the affected test cases (let me know if you'd like me to
'git push' that), but I suppose something more elaborate may be in order?
(Conditionalize those on 'target { ! gcn_rdna }', and add respective
scanning for 'target gcn_rdna'?  I can help with effective-target
'gcn_rdna' (or similar), if you'd like me to.)

And/or, have a '-mpreferred-simd-mode=v64' (or similar) to be used for
such test cases, to override 'if (TARGET_RDNA2_PLUS)' etc. in
'gcn_vectorize_preferred_simd_mode'?

Best, probably, both these things, to properly test both V32 and V64?
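
To make the override idea concrete, a rough C sketch of such a selection
follows.  All names here ('sketch_lane_pref', 'sketch_preferred_lanes')
are made up for illustration and are not the GCN back end's actual
interface; it only assumes that RDNA devices default to 32 lanes and
everything else to 64:

```c
#include <stdbool.h>

/* Hypothetical stand-in for a user override, for example from a
   command-line option or parameter.  */
enum sketch_lane_pref
{
  SKETCH_LANES_DEFAULT = 0,
  SKETCH_LANES_32 = 32,
  SKETCH_LANES_64 = 64
};

/* Return the preferred number of vector lanes: an explicit override
   wins, otherwise the choice follows the device family.  */
int
sketch_preferred_lanes (bool target_is_rdna, enum sketch_lane_pref pref)
{
  if (pref != SKETCH_LANES_DEFAULT)
    return (int) pref;
  return target_is_rdna ? 32 : 64;
}
```

With something along these lines, a test case could force V64 on an RDNA
device (or V32 elsewhere) so that both variants stay covered.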

PASS: gcc.target/gcn/cond_fmaxnm_1.c (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_1.c scan-assembler-not 
\\tv_writelane_b32\\tv[0-9]+, vcc_..
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_1.c scan-assembler-times 
smaxv64df3_exec 3
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_1.c scan-assembler-times 
smaxv64sf3_exec 3
PASS: gcc.target/gcn/cond_fmaxnm_1_run.c (test for excess errors)
PASS: gcc.target/gcn/cond_fmaxnm_1_run.c execution test

PASS: gcc.target/gcn/cond_fmaxnm_2.c (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_2.c scan-assembler-not 
\\tv_writelane_b32\\tv[0-9]+, vcc_..
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_2.c scan-assembler-times 
smaxv64df3_exec 3
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_2.c scan-assembler-times 
smaxv64sf3_exec 3
PASS: gcc.target/gcn/cond_fmaxnm_2_run.c (test for excess errors)
PASS: gcc.target/gcn/cond_fmaxnm_2_run.c execution test

PASS: gcc.target/gcn/cond_fmaxnm_3.c (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_3.c scan-assembler-not 
\\tv_writelane_b32\\tv[0-9]+, vcc_..
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_3.c scan-assembler-times 
movv64df_exec 3
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_3.c scan-assembler-times 
movv64sf_exec 3
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_3.c scan-assembler-times 
smaxv64sf3 3
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_3.c scan-assembler-times 
smaxv64sf3 3
PASS: gcc.target/gcn/cond_fmaxnm_3_run.c (test for excess errors)
PASS: gcc.target/gcn/cond_fmaxnm_3_run.c execution test

PASS: gcc.target/gcn/cond_fmaxnm_4.c (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_4.c scan-assembler-not 
\\tv_writelane_b32\\tv[0-9]+, vcc_..
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_4.c scan-assembler-times 
movv64df_exec 3
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_4.c scan-assembler-times 
movv64sf_exec 3
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_4.c scan-assembler-times 
smaxv64sf3 3
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_4.c scan-assembler-times 
smaxv64sf3 3
PASS: gcc.target/gcn/cond_fmaxnm_4_run.c (test for excess errors)
PASS: gcc.target/gcn/cond_fmaxnm_4_run.c execution test

PASS: gcc.target/gcn/cond_fmaxnm_5.c (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_5.c scan-assembler-not 
\\tv_writelane_b32\\tv[0-9]+, vcc_..
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_5.c scan-assembler-times 
smaxv64df3_exec 3
[-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_5.c scan-assembler-times 
smaxv64sf3_exec 3
PASS: 

Re: [PATCH] vect: more oversized bitmask fixups

2024-03-27 Thread Thomas Schwinge
Hi!

On 2024-03-22T14:15:36+, Andrew Stubbs  wrote:
> On 22/03/2024 08:43, Richard Biener wrote:
> Thanks, here's what I pushed.

> vect: more oversized bitmask fixups
>
> These patches fix up a failure in testcase vect/tsvc/vect-tsvc-s278.c when
> configured to use V32 instead of V64 (I plan to do this for RDNA devices).

Thanks, confirming that this "vect: more oversized bitmask fixups" does
fix the GCN target '-march=gfx1100' testing regression:

PASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.dg/vect/tsvc/vect-tsvc-s278.c execution test
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c scan-tree-dump vect "vectorized 1 loops"

PASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.dg/vect/tsvc/vect-tsvc-s279.c execution test
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c scan-tree-dump vect "vectorized 1 loops"

... that I saw introduced by "amdgcn: Prefer V32 on RDNA devices".

(The XPASSes are independent of that, pre-existing.)


Grüße
 Thomas


> The problem was that a "not" operation on the mask inadvertently enabled
> inactive lanes 31-63 and corrupted the output.  The fix is to adjust the mask
> when calling internal functions (in this case COND_MINUS), when doing masked
> loads and stores, and when doing conditional jumps (some cases were already
> handled).
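
As a rough standalone illustration of the kind of fixup described above
(not the code from the patch): when a lane mask with fewer lanes than
its integer mode has bits is inverted, the padding bits have to be
cleared again.  'sketch_clear_excess_bits' and 'sketch_masked_not' are
hypothetical helpers, and a 64-bit mask word is assumed:

```c
#include <stdint.h>

/* Keep only the low NUNITS bits of a lane mask held in a 64-bit
   integer; bits at positions NUNITS and above are padding.  */
uint64_t
sketch_clear_excess_bits (uint64_t mask, unsigned nunits)
{
  /* Shifting a 64-bit value by 64 or more is undefined, and with 64
     lanes there is no padding to clear anyway.  */
  if (nunits >= 64)
    return mask;
  return mask & ((UINT64_C (1) << nunits) - 1);
}

/* Invert a lane mask without enabling any padding lanes.  */
uint64_t
sketch_masked_not (uint64_t mask, unsigned nunits)
{
  return sketch_clear_excess_bits (~mask, nunits);
}
```

With 32 lanes, 'sketch_masked_not (0xffff, 32)' yields 0xffff0000,
whereas a plain '~' would also set bits 32 to 63.
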
>
> gcc/ChangeLog:
>
>   * dojump.cc (do_compare_rtx_and_jump): Clear excess bits in vector
>   bitmasks.
>   (do_compare_and_jump): Remove now-redundant similar code.
>   * internal-fn.cc (expand_fn_using_insn): Clear excess bits in vector
>   bitmasks.
>   (add_mask_and_len_args): Likewise.
>
> diff --git a/gcc/dojump.cc b/gcc/dojump.cc
> index 88600cb42d3..5f74b696b41 100644
> --- a/gcc/dojump.cc
> +++ b/gcc/dojump.cc
> @@ -1235,6 +1235,24 @@ do_compare_rtx_and_jump (rtx op0, rtx op1, enum 
> rtx_code code, int unsignedp,
>   }
>   }
>  
> +  /* For boolean vectors with less than mode precision
> +  make sure to fill padding with consistent values.  */
> +  if (val
> +   && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (val))
> +   && SCALAR_INT_MODE_P (mode))
> + {
> +   auto nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE (val)).to_constant ();
> +   if (maybe_ne (GET_MODE_PRECISION (mode), nunits))
> + {
> +   op0 = expand_binop (mode, and_optab, op0,
> +   GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
> +   NULL_RTX, true, OPTAB_WIDEN);
> +   op1 = expand_binop (mode, and_optab, op1,
> +   GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
> +   NULL_RTX, true, OPTAB_WIDEN);
> + }
> + }
> +
>emit_cmp_and_jump_insns (op0, op1, code, size, mode, unsignedp, val,
>  if_true_label, prob);
>  }
> @@ -1266,7 +1284,6 @@ do_compare_and_jump (tree treeop0, tree treeop1, enum 
> rtx_code signed_code,
>machine_mode mode;
>int unsignedp;
>enum rtx_code code;
> -  unsigned HOST_WIDE_INT nunits;
>  
>/* Don't crash if the comparison was erroneous.  */
>op0 = expand_normal (treeop0);
> @@ -1309,21 +1326,6 @@ do_compare_and_jump (tree treeop0, tree treeop1, enum 
> rtx_code signed_code,
>emit_insn (targetm.gen_canonicalize_funcptr_for_compare (new_op1, 
> op1));
>op1 = new_op1;
>  }
> -  /* For boolean vectors with less than mode precision
> - make sure to fill padding with consistent values.  */
> -  else if (VECTOR_BOOLEAN_TYPE_P (type)
> -&& SCALAR_INT_MODE_P (mode)
> -&& TYPE_VECTOR_SUBPARTS (type).is_constant ()
> -&& maybe_ne (GET_MODE_PRECISION (mode), nunits))
> -{
> -  gcc_assert (code == EQ || code == NE);
> -  op0 = expand_binop (mode, and_optab, op0,
> -   GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1), NULL_RTX,
> -   true, OPTAB_WIDEN);
> -  op1 = expand_binop (mode, and_optab, op1,
> -   GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1), NULL_RTX,
> -   true, OPTAB_WIDEN);
> -}
>  
>do_compare_rtx_and_jump (op0, op1, code, unsignedp, treeop0, mode,
>  ((mode == BLKmode)
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index fcf47c7fa12..5269f0ac528 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -245,6 +245,18 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, 
> unsigned int noutputs,
>  && SSA_NAME_IS_DEFAULT_DEF (rhs)
>  && VAR_P (SSA_NAME_VAR (rhs)))
>   create_undefined_input_operand ([opno], TYPE_MODE (rhs_type));
> +  else if (VECTOR_BOOLEAN_TYPE_P (rhs_type)
> +&& SCALAR_INT_MODE_P (TYPE_MODE (rhs_type))
> +&& maybe_ne (GET_MODE_PRECISION (TYPE_MODE (rhs_type)),
> + TYPE_VECTOR_SUBPARTS (rhs_type).to_constant ()))
> + {
> +  

Re: [committed] amdgcn: Ensure gfx11 is running in cumode

2024-03-22 Thread Thomas Schwinge
Hi Andrew!

On 2024-03-21T13:39:53+, Andrew Stubbs  wrote:
> CUmode "on" is the setting for compatibility with GCN and CDNA devices.

> --- a/gcc/config/gcn/gcn-hsa.h
> +++ b/gcc/config/gcn/gcn-hsa.h
> @@ -107,6 +107,7 @@ extern unsigned int gcn_local_sym_hash (const char *name);
> "%{" NO_XNACK XNACKOPT "} " \
> "%{" NO_SRAM_ECC SRAMOPT "} " \
> "%{march=gfx1030|march=gfx1100:-mattr=+wavefrontsize64} " \
> +   "%{march=gfx1030|march=gfx1100:-mattr=+cumode} " \
> "-filetype=obj"

Is this just general housekeeping, or should I be seeing any kind of
change in the GCN target '-march=gfx1100' test results?  (I'm not.)


Grüße
 Thomas


New effective-target 'asm_goto_with_outputs' (was: [PATCH] testsuite: Fix up lra effective target)

2024-03-21 Thread Thomas Schwinge
Hi!

On 2024-02-16T10:48:53-0800, Mike Stump  wrote:
> On Feb 16, 2024, at 2:16 AM, Jakub Jelinek  wrote:
>> 
>> There is one special case, NVPTX, which is a TARGET_NO_REGISTER_ALLOCATION
>> target.  I think claiming for it that it is a lra target is strange (even
>> though it effectively returns true for targetm.lra_p ()), unsure if it
>> supports asm goto with outputs or not, if it does and we want to test it,
>> perhaps we should introduce asm_goto_outputs effective target and use
>> lra || nvptx-*-* for that?
>
> Since the port people have to maintain that code in general, I usually leave
> it to them to try and select a cheap, maintainable way to manage it.
>
> If people want to pave the way, I'd tend to defer to them, having thought
> about it more than I.

Here I am.  ;-)

After commit e16f90be2dc8af6c371fe79044c3e668fa3dda62
"testsuite: Fix up lra effective target", we get for nvptx target:

-PASS: gcc.c-torture/compile/asmgoto-2.c   -O0  (test for excess errors)
+ERROR: gcc.c-torture/compile/asmgoto-2.c   -O0 : no files matched glob pattern "lra1020113.c.[0-9][0-9][0-9]r.reload" for " dg-do 2 compile { target lra } "

Etc.

That is, the current effective-target 'lra' is not suitable for nvptx --
which, I suppose, is OK, given that nvptx neither uses LRA nor doesn't
use LRA.  ;-) (Therefore, effective-target 'lra' shouldn't get used in
test cases that are active for nvptx.)

However, nvptx appears to support 'asm goto' with outputs, including the
new execution test case:

PASS: gcc.dg/pr107385.c execution test

I'm attaching "[WIP] New effective-target 'asm_goto_with_outputs'", which
does address the effective-target check for nvptx, and otherwise does
's%lra%asm_goto_with_outputs'.  (I have not yet actually merged
'check_effective_target_lra' into
'check_effective_target_asm_goto_with_outputs'.)
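
For reference, a minimal sketch of what 'asm goto' with outputs looks
like in GNU C, which is what the proposed effective-target is meant to
guard.  The function name is made up, the asm template is deliberately
empty so that the example does not depend on any particular instruction
set, and a compiler with support for output operands in 'asm goto' is
assumed:

```c
/* Hypothetical example function.  X is both an input and an output of
   the asm ('+r'), and the asm may in principle branch to 'skipped'.  */
int
sketch_bump_unless_skipped (int x)
{
  /* Empty template: no instructions are emitted, so control always
     falls through and X keeps its value.  */
  __asm__ goto ("" : "+r" (x) : : : skipped);
  return x + 1;

skipped:
  return x;
}
```

A plain 'asm goto' without the output operand is a separate, older
feature, which is why 'gcc.dg/pr110079.c' does not obviously fit the new
keyword.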

I have verified that all current effective-target 'lra' test cases
actually use 'asm goto' with outputs, there is just one exception:
'gcc.dg/pr110079.c' (see
<https://inbox.sourceware.org/Zel5TMMr/3BHgl0g@tucnak>
"bb-reorder: Fix -freorder-blocks-and-partition ICEs on aarch64 with asm goto 
[PR110079]",
<https://gcc.gnu.org/PR110079>
"ICE with -freorder-blocks-and-partition and inline-asm goto").  That
test case, 'gcc.dg/pr110079.c', currently uses 'target lra', and uses
'asm goto' -- but not with outputs, so is 'asm_goto_with_outputs' not
really applicable?  The test case does PASS for nvptx target (but I've
not verified what it's actually doing/testing).  How to handle that one?


Grüße
 Thomas


>From d9f8faaa5026bb970b3246235eb22bf9b5e9fe3a Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 4 Mar 2024 16:04:11 +0100
Subject: [PATCH] [WIP] New effective-target 'asm_goto_with_outputs'

After commit e16f90be2dc8af6c371fe79044c3e668fa3dda62
"testsuite: Fix up lra effective target", we get for nvptx target:

-PASS: gcc.c-torture/compile/asmgoto-2.c   -O0  (test for excess errors)
+ERROR: gcc.c-torture/compile/asmgoto-2.c   -O0 : no files matched glob pattern "lra1020113.c.[0-9][0-9][0-9]r.reload" for " dg-do 2 compile { target lra } "

Etc.

However, nvptx appears to support 'asm goto' with outputs, including the
new execution test case:

PASS: gcc.dg/pr107385.c execution test

TODO gcc/testsuite/gcc.dg/pr110079.c
doesn't use 'asm goto' with outputs, but does PASS for nvptx, and would ERROR for 'target lra'.
---
 gcc/doc/sourcebuild.texi| 3 +++
 gcc/testsuite/gcc.c-torture/compile/asmgoto-2.c | 2 +-
 gcc/testsuite/gcc.c-torture/compile/asmgoto-5.c | 2 +-
 gcc/testsuite/gcc.c-torture/compile/asmgoto-6.c | 3 +--
 gcc/testsuite/gcc.c-torture/compile/pr98096.c   | 2 +-
 gcc/testsuite/gcc.dg/pr100590.c | 2 +-
 gcc/testsuite/gcc.dg/pr107385.c | 2 +-
 gcc/testsuite/gcc.dg/pr108095.c | 2 +-
 gcc/testsuite/gcc.dg/pr110079.c | 2 +-
 gcc/testsuite/gcc.dg/pr97954.c  | 2 +-
 gcc/testsuite/gcc.dg/torture/pr100329.c | 2 +-
 gcc/testsuite/gcc.dg/torture/pr100398.c | 2 +-
 gcc/testsuite/gcc.dg/torture/pr100519.c | 2 +-
 gcc/testsuite/gcc.dg/torture/pr110422.c | 2 +-
 gcc/testsuite/lib/target-supports.exp   | 9 +
 15 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index b56b9c39733..a176a3c864f 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2863,6 +2863,9 @@ Target supports weak undefined symbols
 @item R_flag_in_section
 Target supports the 'R' flag in .section directive in assembly inputs.
 
+@item asm_goto_with_outputs
+Target supports 'asm goto' with outputs.
+
 @item automatic_stack_alignment
 Target supports automatic stack alignment.
 
diff --git a/gcc/testsuite/gcc.c-torture

GCN: Enable effective-target 'vect_hw_misalign'

2024-03-21 Thread Thomas Schwinge
Hi!

OK to push the attached
"GCN: Enable effective-target 'vect_hw_misalign'"?  (Or is that not what
you'd expect to see for GCN?  I haven't checked the actual back end
code...)


Grüße
 Thomas


>From dad0686e179e9395408a39ccfbf760bc30acffc9 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 20 Mar 2024 23:52:26 +0100
Subject: [PATCH] GCN: Enable effective-target 'vect_hw_misalign'

... as made apparent by commit 4e1fcf44bdc582e71408175d75e025f5be8b0e55
"testsuite: vect: Require vect_hw_misalign in gcc.dg/vect/vect-cost-model-1.c etc. [PR98238]"
causing:

 PASS: gcc.dg/vect/vect-cost-model-1.c (test for excess errors)
-PASS: gcc.dg/vect/vect-cost-model-1.c scan-tree-dump vect "LOOP VECTORIZED"

 PASS: gcc.dg/vect/vect-cost-model-3.c (test for excess errors)
-PASS: gcc.dg/vect/vect-cost-model-3.c scan-tree-dump vect "LOOP VECTORIZED"

 PASS: gcc.dg/vect/vect-cost-model-5.c (test for excess errors)
-PASS: gcc.dg/vect/vect-cost-model-5.c scan-tree-dump vect "LOOP VECTORIZED"

..., and similarly commit ffd47fb63ddc024db847daa07f8ae27fffdfcb28
"testsuite: Fix pr113431.c FAIL on sparc* [PR113431]" causing:

 PASS: gcc.dg/vect/pr113431.c (test for excess errors)
 PASS: gcc.dg/vect/pr113431.c execution test
-PASS: gcc.dg/vect/pr113431.c scan-tree-dump-times slp1 "optimized: basic block part vectorized" 2

..., which this commit all restores, and also enables a good number of further
FAIL -> PASS, UNSUPPORTED -> PASS, etc. progressions.  There are also a small
number of regressions, mostly in the SLP area apparently:

 PASS: gcc.dg/vect/bb-slp-layout-12.c (test for excess errors)
+XPASS: gcc.dg/vect/bb-slp-layout-12.c scan-tree-dump-not slp1 "duplicating permutation node"
+XFAIL: gcc.dg/vect/bb-slp-layout-12.c scan-tree-dump-times slp1 "add new stmt: [^\\n\\r]* = VEC_PERM_EXPR" 3

 PASS: gcc.dg/vect/bb-slp-layout-6.c (test for excess errors)
+FAIL: gcc.dg/vect/bb-slp-layout-6.c scan-tree-dump slp2 "absorbing input layouts"

 PASS: gcc.dg/vect/pr97428.c (test for excess errors)
 PASS: gcc.dg/vect/pr97428.c scan-tree-dump vect "Detected interleaving load of size 8"
 PASS: gcc.dg/vect/pr97428.c scan-tree-dump vect "Detected interleaving store of size 16"
 PASS: gcc.dg/vect/pr97428.c scan-tree-dump-not vect "gap of 6 elements"
-XFAIL: gcc.dg/vect/pr97428.c scan-tree-dump-times vect "vectorizing stmts using SLP" 2
+FAIL: gcc.dg/vect/pr97428.c scan-tree-dump-times vect "vectorizing stmts using SLP" 2

 PASS: gcc.dg/vect/vect-33.c (test for excess errors)
+FAIL: gcc.dg/vect/vect-33.c scan-tree-dump vect "Vectorizing an unaligned access"
 PASS: gcc.dg/vect/vect-33.c scan-tree-dump-not optimized "Invalid sum"
 PASS: gcc.dg/vect/vect-33.c scan-tree-dump-times vect "vectorized 1 loops" 1

..., so some further conditionalizing etc. seems necessary.  These seem to
mostly appear next to pre-existing similar FAILs in related test cases.
(Overall, way more PASS than FAIL.)

	gcc/testsuite/
	* lib/target-supports.exp
	(check_effective_target_vect_hw_misalign): Enable for GCN.
	(check_effective_target_vect_element_align): Adjust.
---
 gcc/testsuite/lib/target-supports.exp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 302781e91de..2291a673d53 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -8309,7 +8309,8 @@ proc check_effective_target_vect_hw_misalign { } {
 	 || ([istarget s390*-*-*]
 		 && [check_effective_target_s390_vx])
 	 || ([istarget riscv*-*-*])
-	 || ([istarget loongarch*-*-*]) } {
+	 || ([istarget loongarch*-*-*])
+	 || [istarget amdgcn*-*-*] } {
 	  return 1
 	}
 	if { [istarget arm*-*-*]
@@ -8873,8 +8874,7 @@ proc check_effective_target_vect_element_align { } {
 return [check_cached_effective_target_indexed vect_element_align {
   expr { ([istarget arm*-*-*]
 	  && ![check_effective_target_arm_vect_no_misalign])
-	 || [check_effective_target_vect_hw_misalign]
-	 || [istarget amdgcn-*-*] }}]
+	 || [check_effective_target_vect_hw_misalign] }}]
 }
 
 # Return 1 if we expect to see unaligned accesses in at least some
-- 
2.34.1



GCN: Enable effective-target 'vect_long_mult'

2024-03-21 Thread Thomas Schwinge
Hi!

OK to push the attached "GCN: Enable effective-target 'vect_long_mult'"?
(Or is that not what you'd expect to see for GCN?  I haven't checked the
actual back end code...)


Grüße
 Thomas


>From e0e58dfc350581ed0519420ad02adcc01e645eae Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 20 Mar 2024 23:56:58 +0100
Subject: [PATCH] GCN: Enable effective-target 'vect_long_mult'

... as made apparent by commit bfd6b36f08021f023e0e9223f5aea315b74a5c56
"testsuite/vect: Fix pr25413a.c expectations [PR109705]" causing:

 PASS: gcc.dg/vect/pr25413a.c (test for excess errors)
 PASS: gcc.dg/vect/pr25413a.c execution test
-PASS: gcc.dg/vect/pr25413a.c scan-tree-dump-times vect "vectorized 2 loops" 1
+FAIL: gcc.dg/vect/pr25413a.c scan-tree-dump-times vect "vectorized 1 loops" 1

..., which this commit resolves.

	gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_vect_long_mult):
	Enable for GCN.
---
 gcc/testsuite/lib/target-supports.exp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 2291a673d53..452b36ff927 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -9056,7 +9056,8 @@ proc check_effective_target_vect_long_mult { } {
 	 || ([istarget riscv*-*-*]
 	  && [check_effective_target_riscv_v])
 	 || ([istarget loongarch*-*-*]
-	  && [check_effective_target_loongarch_sx]) } {
+	  && [check_effective_target_loongarch_sx])
+	 || [istarget amdgcn-*-*] } {
 	set answer 1
 } else {
 	set answer 0
-- 
2.34.1



GCN: Enable effective-target 'vect_early_break', 'vect_early_break_hw'

2024-03-21 Thread Thomas Schwinge
Hi!

On 2024-01-12T15:02:35+0100, I wrote:
> OK to push the attached
> "GCN: Enable effective-target 'vect_early_break', 'vect_early_break_hw'"?

Ping.  (Or is that not what you'd expect to see for GCN?  I haven't
checked the actual back end code...)


> ("The relevant test cases are all-PASS with just [two] exceptions, to be
> looked into individually, later on."  I'm not currently planning to look
> into that.)

(One of those actually going to be fixed by a different patch to be
posted in a moment.)


Grüße
 Thomas


>From 3193614c4f9a8032e85a4da87bde8055aeee7d7b Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 9 Jan 2024 10:25:48 +0100
Subject: [PATCH] GCN: Enable effective-target 'vect_early_break',
 'vect_early_break_hw'

Via XPASSing test cases after commit a657c7e3518fcfc796f223d47385cad5e97dc9a5
"testsuite: un-xfail TSVC loops that check for exit control flow vectorization":

PASS: gcc.dg/vect/tsvc/vect-tsvc-s332.c (test for excess errors)
PASS: gcc.dg/vect/tsvc/vect-tsvc-s332.c execution test
[-XFAIL:-]{+XPASS:+} gcc.dg/vect/tsvc/vect-tsvc-s332.c scan-tree-dump vect "vectorized 1 loops"

PASS: gcc.dg/vect/tsvc/vect-tsvc-s481.c (test for excess errors)
PASS: gcc.dg/vect/tsvc/vect-tsvc-s481.c execution test
[-XFAIL:-]{+XPASS:+} gcc.dg/vect/tsvc/vect-tsvc-s481.c scan-tree-dump vect "vectorized 1 loops"

PASS: gcc.dg/vect/tsvc/vect-tsvc-s482.c (test for excess errors)
PASS: gcc.dg/vect/tsvc/vect-tsvc-s482.c execution test
[-XFAIL:-]{+XPASS:+} gcc.dg/vect/tsvc/vect-tsvc-s482.c scan-tree-dump vect "vectorized 1 loops"

..., it became apparent that GCN, too, does support vectorization of loops with
early breaks.  The relevant test cases are all-PASS with just the following
exceptions, to be looked into individually, later on:

PASS: gcc.dg/vect/vect-early-break_25.c (test for excess errors)
PASS: gcc.dg/vect/vect-early-break_25.c scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/vect-early-break_25.c scan-tree-dump-times vect "Alignment of access forced using peeling" 1

PASS: gcc.dg/vect/vect-early-break_56.c (test for excess errors)
PASS: gcc.dg/vect/vect-early-break_56.c execution test
XPASS: gcc.dg/vect/vect-early-break_56.c scan-tree-dump-times vect "vectorized 2 loops" 2

	gcc/testsuite/
	* lib/target-supports.exp
	(check_effective_target_vect_early_break)
	(check_effective_target_vect_early_break_hw): Enable for GCN.
---
 gcc/testsuite/lib/target-supports.exp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 75d1add894f..497c46de4cb 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4071,6 +4071,7 @@ proc check_effective_target_vect_early_break { } {
 	[istarget aarch64*-*-*]
 	|| [check_effective_target_arm_v8_neon_ok]
 	|| [check_effective_target_sse4]
+	|| [istarget amdgcn-*-*]
 	}}]
 }
 
@@ -4085,6 +4086,7 @@ proc check_effective_target_vect_early_break_hw { } {
 	[istarget aarch64*-*-*]
 	|| [check_effective_target_arm_v8_neon_hw]
 	|| [check_sse4_hw_available]
+	|| [istarget amdgcn-*-*]
 	}}]
 }
 
-- 
2.34.1



Re: [PATCH gcc] Hurd x86_64: add unwind support for signal trampoline code

2024-03-20 Thread Thomas Schwinge
Hi!

Please note that emails to , or
 don't reach me anymore, and, at least for
the time being, likewise for  --
 is the new thing; see
.
(Or use , ,
, as before.)


On 2024-03-01T02:33:10+0100, Samuel Thibault  wrote:
> Flavio Cruz, on Wed, 28 Feb 2024 22:59:09 -0500, wrote:
>> Tested with some simple toy examples where an exception is thrown in the
>> signal handler.
>> 
>> libgcc/ChangeLog:
>>  * config/i386/gnu-unwind.h: Support unwinding x86_64 signal frames.
>> 
>> Signed-off-by: Flavio Cruz 
>
> Reviewed-by: Samuel Thibault 

Thanks, pushed as commit b7c4ae5ace82b81dafffbc50e8026adfa3cc76e7.


Grüße
 Thomas


Re: [PATCH gcc 1/3] Move GNU/Hurd startfile spec from config/i386/gnu.h to config/gnu.h

2024-03-20 Thread Thomas Schwinge
Hi!

On 2024-01-03T09:49:06+, Richard Sandiford  
wrote:
> The series looks good to me FWIW, but Thomas should have the final say.

Richard, thanks for your review.

Sergey, great work on aarch64 GNU/Hurd!  (... where these GCC bits
clearly were the less complicated part...)  ;-)

Please re-submit with ChangeLog updates added to the Git commit logs; see
 ->
, and/or 'git log'
for guidance.  You may use
'contrib/gcc-changelog/git_check_commit.py --print-changelog' to verify.


Grüße
 Thomas


Re: [PATCH, OpenACC 2.7] struct/array reductions for Fortran

2024-03-18 Thread Thomas Schwinge
Hi Chung-Lin!

Thanks for your work here, which I'm beginning to look into (prerequisite
"[PATCH, OpenACC 2.7] Implement reductions for arrays and structs",
first, of course); it'll take me some time.


In non-offloading testing, I noticed for x86_64-pc-linux-gnu '-m32':

+PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O0  (test for excess errors)
+PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O0  execution test
+PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O1  (test for excess errors)
+FAIL: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O1  execution test
+PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O2  (test for excess errors)
+FAIL: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O2  execution test
+PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  (test for excess errors)
+FAIL: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  execution test
+PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O3 -g  (test for excess errors)
+FAIL: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O3 -g  execution test
+PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -Os  (test for excess errors)
+FAIL: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -Os  execution test

With optimizations enabled, it runs into 'STOP 4'.

Per '-Wextra':

[...]/libgomp.oacc-fortran/reduction-13.f90:40:6: Warning: Inequality 
comparison for REAL(4) at (1) [-Wcompare-reals]
[...]/libgomp.oacc-fortran/reduction-13.f90:63:6: Warning: Inequality 
comparison for REAL(4) at (1) [-Wcompare-reals]
[...]/libgomp.oacc-fortran/reduction-13.f90:64:6: Warning: Inequality 
comparison for REAL(8) at (1) [-Wcompare-reals]

Do we need to allow for some epsilon (generally in such test cases), or
is there another problem?

For reference:

On 2024-02-08T22:47:13+0800, Chung-Lin Tang  wrote:
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-13.f90
> @@ -0,0 +1,66 @@
> +! { dg-do run }
> +
> +! record type reductions
> +
> +program reduction_13
> +  implicit none
> +
> +  type t1
> + integer :: i
> + real :: r
> +  end type t1
> +
> +  type t2
> + real :: r
> + integer :: i
> + double precision :: d
> +  end type t2
> +
> +  integer, parameter :: n = 10, ng = 8, nw = 4, vl = 32
> +  integer :: i
> +  type(t1) :: v1, a1
> +  type (t2) :: v2, a2
> +
> +  v1%i = 0
> +  v1%r = 0
> +  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(v1)
> +  !$acc loop reduction (+:v1)
> +  do i = 1, n
> + v1%i = v1%i + 1
> + v1%r = v1%r + 2
> +  end do
> +  !$acc end parallel
> +  a1%i = 0
> +  a1%r = 0
> +  do i = 1, n
> + a1%i = a1%i + 1
> + a1%r = a1%r + 2
> +  end do
> +  if (v1%i .ne. a1%i) STOP 1
> +  if (v1%r .ne. a1%r) STOP 2
> +
> +  v2%i = 1
> +  v2%r = 1
> +  v2%d = 1
> +  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(v2)
> +  !$acc loop reduction (*:v2)
> +  do i = 1, n
> + v2%i = v2%i * 2
> + v2%r = v2%r * 1.1
> + v2%d = v2%d * 1.3
> +  end do
> +  !$acc end parallel
> +  a2%i = 1
> +  a2%r = 1
> +  a2%d = 1
> +  do i = 1, n
> + a2%i = a2%i * 2
> + a2%r = a2%r * 1.1
> + a2%d = a2%d * 1.3
> +  end do
> +
> +  if (v2%i .ne. a2%i) STOP 3
> +  if (v2%r .ne. a2%r) STOP 4
> +  if (v2%d .ne. a2%d) STOP 5
> +
> +end program reduction_13


Grüße
 Thomas


Re: [PATCH, OpenACC 2.7, v2] Adjust acc_map_data/acc_unmap_data interaction with reference counters

2024-03-15 Thread Thomas Schwinge
Hi Chung-Lin!

I realized: please add "PR libgomp/92840" to the Git commit log, as your
changes are directly a continuation of my earlier changes.


On 2024-03-05T01:18:28+0900, Chung-Lin Tang  wrote:
> On 2023/10/31 11:06 PM, Thomas Schwinge wrote:
>>> @@ -691,15 +694,27 @@ goacc_exit_datum_1 (struct gomp_device_descr 
>>> *acc_dev, void *h, size_t s,
>>>
>>>if (finalize)
>>>  {
>>> -  if (n->refcount != REFCOUNT_INFINITY)
>>> +  if (n->refcount != REFCOUNT_INFINITY
>>> +   && n->refcount != REFCOUNT_ACC_MAP_DATA)
>>>   n->refcount -= n->dynamic_refcount;
>>> -  n->dynamic_refcount = 0;
>>> +
>>> +  if (n->refcount == REFCOUNT_ACC_MAP_DATA)
>>> + /* Mappings created by acc_map_data are returned to initial
>>> +dynamic_refcount of 1. Can only be deleted by acc_unmap_data.  */
>>> + n->dynamic_refcount = 1;
>>> +  else
>>> + n->dynamic_refcount = 0;
>>>  }
>>>else if (n->dynamic_refcount)
>>>  {
>>> -  if (n->refcount != REFCOUNT_INFINITY)
>>> +  if (n->refcount != REFCOUNT_INFINITY
>>> +   && n->refcount != REFCOUNT_ACC_MAP_DATA)
>>>   n->refcount--;
>>> -  n->dynamic_refcount--;
>>> +
>>> +  /* When mapping is created by acc_map_data, dynamic_refcount must be
>>> +  maintained at >= 1.  */
>>> +  if (n->refcount != REFCOUNT_ACC_MAP_DATA || n->dynamic_refcount > 1)
>>> + n->dynamic_refcount--;
>>>  }
>> 
>> I'd find those changes more concise to understand if done the following
>> way: restore both 'if (finalize)' and 'else if (n->dynamic_refcount)'
>> branches to their original form (other than excluding 'n->refcount'
>> modification for 'REFCOUNT_ACC_MAP_DATA', as you have), and instead then
>> afterwards (that is, here), do:
>> 
>> /* Mappings created by 'acc_map_data' can only be deleted by 
>> 'acc_unmap_data'.  */
>> if (n->refcount == REFCOUNT_ACC_MAP_DATA
>> && n->dynamic_refcount == 0)
>>   n->dynamic_refcount = 1;
>> 
>> That does have the same semantics, please verify?
>
> This does not have the same semantics, because if the original 
> finalize/n->dynamic_refcount
> cases are left unmodified, they will treat REFCOUNT_ACC_MAP_DATA like a 
> normal refcount and
> decrement n->refcount, and handling n->refcount == REFCOUNT_ACC_MAP_DATA 
> later won't work either.

That's why I said: "restore [...] excluding 'n->refcount' modification
for 'REFCOUNT_ACC_MAP_DATA', as you have [...]".  Sorry if that was
unclear.

> I have however, adjusted the nesting of cases to split the 'n->refcount == 
> REFCOUNT_ACC_MAP_DATA'
> case away. This should be easier to read.

Thanks, that's easier to follow indeed.  I had meant (on top of your v2):

--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -686,35 +686,27 @@ goacc_exit_datum_1 (struct gomp_device_descr 
*acc_dev, void *h, size_t s,
   gomp_fatal ("Dynamic reference counting assert fail\n");
 }
 
-  if (n->refcount == REFCOUNT_ACC_MAP_DATA)
+  if (finalize)
 {
-  if (finalize)
-   {
- /* Mappings created by acc_map_data are returned to initial
-dynamic_refcount of 1. Can only be deleted by acc_unmap_data.  */
- n->dynamic_refcount = 1;
-   }
-  else if (n->dynamic_refcount)
-   {
- /* When mapping is created by acc_map_data, dynamic_refcount must be
-maintained at >= 1.  */
- if (n->dynamic_refcount > 1)
-   n->dynamic_refcount--;
-   }
-}
-  else if (finalize)
-{
-  if (n->refcount != REFCOUNT_INFINITY)
+  if (n->refcount != REFCOUNT_INFINITY
+ && n->refcount != REFCOUNT_ACC_MAP_DATA)
n->refcount -= n->dynamic_refcount;
   n->dynamic_refcount = 0;
 }
   else if (n->dynamic_refcount)
 {
-  if (n->refcount != REFCOUNT_INFINITY)
+  if (n->refcount != REFCOUNT_INFINITY
+ && n->refcount != REFCOUNT_ACC_MAP_DATA)
n->refcount--;
   n->dynamic_refcount--;
 }
 
+  /* Mappings created by 'acc_map_data' may only be deleted by
+ 'acc_unmap_data'.  */
+  if (n->refcount == REFCOUNT_ACC_MAP_DATA
+  && n->dynamic_refcount == 0)
+n->dynamic_refcount = 1;
+
   if (n->refcount == 0)
 {
   

OpenACC 2.7: front-end support for readonly modifier: Add basic OpenACC 'declare' testing (was: [PATCH, OpenACC 2.7, v2] readonly modifier support in front-ends)

2024-03-14 Thread Thomas Schwinge
 if (n->u.map_op == OMP_MAP_RELEASE
>> -   || n->u.map_op == OMP_MAP_DELETE)
>> +  else if (n->u.map.op == OMP_MAP_RELEASE
>> +   || n->u.map.op == OMP_MAP_DELETE)
>>  ;
>>else if (op == EXEC_OMP_TARGET_EXIT_DATA
>> || op == EXEC_OACC_EXIT_DATA)
>> @@ -4088,6 +4091,8 @@ gfc_trans_omp_clauses (stmtblock_t *block, 
>> gfc_omp_clauses *clauses,
>>  }
>>if (n->u.present_modifier)
>>  OMP_CLAUSE_MOTION_PRESENT (node) = 1;
>> +  if (list == OMP_LIST_CACHE && n->u.map.readonly)
>> +OMP_CLAUSE__CACHE__READONLY (node) = 1;
>>omp_clauses = gfc_trans_add_clause (node, omp_clauses);
>>  }
>>break;
>> @@ -6561,7 +6566,7 @@ gfc_add_clause_implicitly (gfc_omp_clauses 
>> *clauses_out,
>>n2->where = n->where;
>>n2->sym = n->sym;
>>if (is_target)
>> -n2->u.map_op = OMP_MAP_TOFROM;
>> +n2->u.map.op = OMP_MAP_TOFROM;
>>if (tail)
>>  {
>>tail->next = n2;

>> diff --git a/gcc/testsuite/gfortran.dg/goacc/readonly-1.f90 
>> b/gcc/testsuite/gfortran.dg/goacc/readonly-1.f90
>> new file mode 100644
>> index 000..696ebd08321
>> --- /dev/null
>> +++ b/gcc/testsuite/gfortran.dg/goacc/readonly-1.f90
>> @@ -0,0 +1,89 @@
>> +! { dg-additional-options "-fdump-tree-original" }
>> +
>> +subroutine foo (a, n)
>> +  integer :: n, a(:)
>> +  integer :: i, b(n), c(n)
>> +  !$acc parallel copyin(readonly: a(:), b(:n)) copyin(c(:))
>> +  do i = 1,32
>> + !$acc cache (readonly: a(:), b(:n))
>> + !$acc cache (c(:))
>> +  enddo
>> +  !$acc end parallel
>> +
>> +  !$acc kernels copyin(readonly: a(:), b(:n)) copyin(c(:))
>> +  do i = 1,32
>> + !$acc cache (readonly: a(:), b(:n))
>> + !$acc cache (c(:))
>> +  enddo
>> +  !$acc end kernels
>> +
>> +  !$acc serial copyin(readonly: a(:), b(:n)) copyin(c(:))
>> +  do i = 1,32
>> + !$acc cache (readonly: a(:), b(:n))
>> + !$acc cache (c(:))
>> +  enddo
>> +  !$acc end serial
>> +
>> +  !$acc data copyin(readonly: a(:), b(:n)) copyin(c(:))
>> +  do i = 1,32
>> + !$acc cache (readonly: a(:), b(:n))
>> + !$acc cache (c(:))
>> +  enddo
>> +  !$acc end data
>> +
>> +  !$acc enter data copyin(readonly: a(:), b(:n)) copyin(c(:))
>> +
>> +end subroutine foo
>> +
>> +program main
>> +  integer :: g(32), h(32)
>> +  integer :: i, n = 32, a(32)
>> +  integer :: b(32), c(32)
>> +
>> +  !$acc declare copyin(readonly: g), copyin(h)
>> +
>> +  !$acc parallel copyin(readonly: a(:32), b(:n)) copyin(c(:))
>> +  do i = 1,32
>> + !$acc cache (readonly: a(:), b(:n))
>> + !$acc cache (c(:))
>> +  enddo
>> +  !$acc end parallel
>> +
>> +  !$acc kernels copyin(readonly: a(:), b(:n)) copyin(c(:))
>> +  do i = 1,32
>> + !$acc cache (readonly: a(:), b(:n))
>> + !$acc cache (c(:))
>> +  enddo
>> +  !$acc end kernels
>> +
>> +  !$acc serial copyin(readonly: a(:), b(:n)) copyin(c(:))
>> +  do i = 1,32
>> + !$acc cache (readonly: a(:), b(:n))
>> + !$acc cache (c(:))
>> +  enddo
>> +  !$acc end serial
>> +
>> +  !$acc data copyin(readonly: a(:), b(:n)) copyin(c(:))
>> +  do i = 1,32
>> + !$acc cache (readonly: a(:), b(:n))
>> + !$acc cache (c(:))
>> +  enddo
>> +  !$acc end data
>> +
>> +  !$acc enter data copyin(readonly: a(:), b(:n)) copyin(c(:))
>> +
>> +end program main
>> +
>> +! { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel 
>> map\\(readonly,to:\\*.+ map\\(alloc:a.+ map\\(readonly,to:\\*.+ 
>> map\\(alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
>> +! { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel 
>> map\\(readonly,to:a.+ map\\(alloc:a.+ map\\(readonly,to:b.+ map\\(alloc:b.+ 
>> map\\(to:c.+ map\\(alloc:c.+" 1 "original" } }
>> +! { dg-final { scan-tree-dump-times "(?n)#pragma acc kernels 
>> map\\(readonly,to:\\*.+ map\\(alloc:a.+ map\\(readonly,to:\\*.+ 
>> map\\(alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
>> +! { dg-final { scan-tree-dump-times "(?n)#pragma acc kernels 
>> map

Re: [PATCH, OpenACC 2.7, v2] readonly modifier support in front-ends

2024-03-13 Thread Thomas Schwinge
Hi Chung-Lin!

On 2024-03-07T17:02:02+0900, Chung-Lin Tang  wrote:
> On 2023/10/26 6:43 PM, Thomas Schwinge wrote:
>>>>>> +++ b/gcc/tree.h
>>>>>> @@ -1813,6 +1813,14 @@ class auto_suppress_location_wrappers
>>>>>>   #define OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE(NODE) \
>>>>>> (OMP_CLAUSE_SUBCODE_CHECK (NODE, 
>>>>>> OMP_CLAUSE_MAP)->base.addressable_flag)
>>>>>>
>>>>>> +/* Nonzero if OpenACC 'readonly' modifier set, used for 'copyin'.  */
>>>>>> +#define OMP_CLAUSE_MAP_READONLY(NODE) \
>>>>>> +  TREE_READONLY (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP))
>>>>>> +
>>>>>> +/* Same as above, for use in OpenACC cache directives.  */
>>>>>> +#define OMP_CLAUSE__CACHE__READONLY(NODE) \
>>>>>> +  TREE_READONLY (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__CACHE_))
>>>>> I'm not sure if these special accessor functions are actually useful, or
>>>>> we should just directly use 'TREE_READONLY' instead?  We're only using
>>>>> them in contexts where it's clear that the 'OMP_CLAUSE_SUBCODE_CHECK' is
>>>>> satisfied, for example.
>>>> I find directly using TREE_READONLY confusing.
>>>
>>> FWIW, I've changed to use TREE_NOTHROW instead, if it can give a better 
>>> sense of safety :P
>> 
>> I don't understand that, why not use 'TREE_READONLY'?
>> 
>>> I think there's a misunderstanding here anyways: we are not relying on a 
>>> DECL marked
>>> TREE_READONLY here. We merely need the OMP_CLAUSE_MAP to be marked as 
>>> OMP_CLAUSE_MAP_READONLY == 1.
>> 
>> Yes, I understand that.  My question was why we don't just use
>> 'TREE_READONLY (c)', where 'c' is the
>> 'OMP_CLAUSE_MAP'/'OMP_CLAUSE__CACHE_' clause (not its decl), and avoid
>> the indirection through
>> '#define OMP_CLAUSE_MAP_READONLY'/'#define OMP_CLAUSE__CACHE__READONLY',
>> given that we're only using them in contexts where it's clear that the
>> 'OMP_CLAUSE_SUBCODE_CHECK' is satisfied.  I don't have a strong
>> preference, though.
>
> After further re-testing using TREE_NOTHROW, I have reverted to using 
> TREE_READONLY

ACK, thanks.

> because TREE_NOTHROW clashes
> with OMP_CLAUSE_RELEASE_DESCRIPTOR (which doesn't use the OMP_CLAUSE_MAP_* 
> naming convention and is
> not documented in gcc/tree-core.h either, hmmm...)

Yeah, it's a mess...  The same bits of information spread over three
different places.

(One day I'll turn 'tree's into a proper C++ class hierarchy, with
accessor methods for such flags, statically checked at compile-time, and
thus documented in a single place.  Etc.)

> I have added the comment adjustments in gcc/tree-core.h for the new uses of 
> TREE_READONLY/readonly_flag.
>
> We basically all use OMP_CLAUSE_SUBCODE_CHECK macros for OpenMP clause 
> expressions exclusively,
> so I don't see a reason to diverge from that style (even when context is 
> clear).

ACK.

> I have greatly expanded the test scan patterns to include 
> parallel/kernels/serial/data/enter data,
> as well as non-readonly copyin clause together with readonly.

Thanks.

> Also added simple 'declare' tests, but there is not anything to scan in the 
> 'tree-original' dump though.

Yeah, the current OpenACC 'declare' implementation is "special".

>>> --- a/gcc/fortran/openmp.cc
>>> +++ b/gcc/fortran/openmp.cc
>>> @@ -1197,7 +1197,7 @@ omp_inv_mask::omp_inv_mask (const omp_mask ) : 
>>> omp_mask (m)
>>>
>>>  static bool
>>>  gfc_match_omp_map_clause (gfc_omp_namelist **list, gfc_omp_map_op map_op,
>>> -   bool allow_common, bool allow_derived)
>>> +   bool allow_common, bool allow_derived, bool 
>>> readonly = false)
>>>  {
>>>gfc_omp_namelist **head = NULL;
>>>if (gfc_match_omp_variable_list ("", list, allow_common, NULL, , 
>>> true,
>>> @@ -1206,7 +1206,10 @@ gfc_match_omp_map_clause (gfc_omp_namelist **list, 
>>> gfc_omp_map_op map_op,
>>>  {
>>>gfc_omp_namelist *n;
>>>for (n = *head; n; n = n->next)
>>> - n->u.map_op = map_op;
>>> + {
>>> +   n->u.map.op = map_op;
>>> +   n->u.map.readonly = readonly;
>>> + }
>>>return true;
>>>  }
>> 
>> Didn't we conclude that "not doing it here is cleaner" (Tobias' words),
>> and instead do this "Similar to 'c_parser_omp_var_list_p

Re: nvptx: 'cuDeviceGetCount' failure is fatal

2024-03-08 Thread Thomas Schwinge
Hi Tobias!

On 2024-03-07T15:28:21+0100, Tobias Burnus  wrote:
> Thomas Schwinge wrote:
>> OK to push the attached "nvptx: 'cuDeviceGetCount' failure is fatal"?
>
> I think the real question is: what does a 'cuDeviceGetCount' failure mean?

Internally to the CUDA stack: the error codes that you've cited below.
Per the state we're in when calling 'cuDeviceGetCount', we only expect
'CUDA_SUCCESS'.  Therefore, in our actual use: anything else means a
fatal condition that we don't attempt to recover from, as for most
other device access failures.

> Does it mean a serious error – or could it just be a permissions issue 
> such that the user has no device access but otherwise is fine?

As you can see, we've done a 'cuInit' right before, so in case there was
any permission issue (or similar), that's already settled (in whichever
way) by the time we do the 'cuDeviceGetCount'.

> Because if it is, e.g., a permission problem – just returning '0' (no 
> devices) would seem to be the proper solution.
>
> But if it is expected to be always something serious, well, then a fatal 
> error makes more sense.

ACK; pushed in commit 37078f241a22c45db6380c5e9a79b4d08054bb3d.


Grüße
 Thomas


> The possible exit codes are:
>
> CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, 
> CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
>
> which does not really help.
>
> My impression is that 0 is usually returned if something goes wrong 
> (e.g. with permissions) such that an error is a real exception. But all 
> three choices seem to make about equal sense: either host fallback 
> (with 0 or -1) or a fatal error.
>
> Tobias


Re: [PATCH v2] openmp: Change to using a hashtab to lookup offload target addresses for indirect function calls

2024-03-08 Thread Thomas Schwinge
Hi!

On 2024-01-29T17:48:47+, Kwok Cheung Yeung  wrote:
> A splay-tree was previously used to lookup equivalent target addresses
> for a given host address on offload targets. However, as splay-trees can
> modify their structure on lookup, they are not suitable for concurrent
> access from separate teams/threads without some form of locking.

Heh.  ,-)

> This
> patch changes the lookup data structure to a hashtab instead, which does
> not have these issues.

(I've not looked into which data structure is most suitable here; not my
area of expertise.)

> The call to build_indirect_map to initialize the data structure is now
> called from just the first thread of the first team to avoid redundant
> calls to this function.

ACK, and also you've removed a number of 'volatile's, as I had questioned
earlier.  The questions of when to do the initialization and how to react
to dynamic device image load and unload remain open, as do possibly other
(but not many?) items raised during review.

I cannot formally approve this patch, but it seems a good incremental
step forward to me: per my testing so far,
(a) 'libgomp.c-c++-common/declare-target-indirect-2.c' is all-PASS,
with 'warning: this statement may fall through' resolved, and
(b) for 'libgomp.fortran/declare-target-indirect-2.f90': no more timeouts
(applies to nvptx only), and all-PASS execution test (both GCN, nvptx):

PASS: libgomp.fortran/declare-target-indirect-2.f90   -O0  (test for excess 
errors)
[-WARNING: libgomp.fortran/declare-target-indirect-2.f90   -O0  execution 
test program timed out.-]
[-XFAIL:-]{+PASS:+} libgomp.fortran/declare-target-indirect-2.f90   -O0  
execution test
PASS: libgomp.fortran/declare-target-indirect-2.f90   -O1  (test for excess 
errors)
[-WARNING: libgomp.fortran/declare-target-indirect-2.f90   -O1  execution 
test program timed out.-]
[-XFAIL:-]{+PASS:+} libgomp.fortran/declare-target-indirect-2.f90   -O1  
execution test
PASS: libgomp.fortran/declare-target-indirect-2.f90   -O2  (test for excess 
errors)
[-WARNING: libgomp.fortran/declare-target-indirect-2.f90   -O2  execution 
test program timed out.-]
[-XFAIL:-]{+PASS:+} libgomp.fortran/declare-target-indirect-2.f90   -O2  
execution test
PASS: libgomp.fortran/declare-target-indirect-2.f90   -O3 
-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
(test for excess errors)
[-WARNING: libgomp.fortran/declare-target-indirect-2.f90   -O3 
-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
execution test program timed out.-]
[-XFAIL:-]{+PASS:+} libgomp.fortran/declare-target-indirect-2.f90   -O3 
-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
execution test
PASS: libgomp.fortran/declare-target-indirect-2.f90   -O3 -g  (test for 
excess errors)
[-WARNING: libgomp.fortran/declare-target-indirect-2.f90   -O3 -g  
execution test program timed out.-]
[-XFAIL:-]{+PASS:+} libgomp.fortran/declare-target-indirect-2.f90   -O3 -g  
execution test
PASS: libgomp.fortran/declare-target-indirect-2.f90   -Os  (test for excess 
errors)
[-WARNING: libgomp.fortran/declare-target-indirect-2.f90   -Os  execution 
test program timed out.-]
[-XFAIL:-]{+PASS:+} libgomp.fortran/declare-target-indirect-2.f90   -Os  
execution test

(Of course, the patch now needs un-XFAILing of
'libgomp.fortran/declare-target-indirect-2.f90' merged in.)


Grüße
 Thomas


>   libgomp/
>   * config/accel/target-indirect.c: Include string.h and hashtab.h.
>   Remove include of splay-tree.h.  Update comments.
>   (splay_tree_prefix, splay_tree_c): Delete.
>   (struct indirect_map_t): New.
>   (hash_entry_type, htab_alloc, htab_free, htab_hash, htab_eq): New.
>   (GOMP_INDIRECT_ADD_MAP): Remove volatile qualifier.
>   (USE_SPLAY_TREE_LOOKUP): Rename to...
>   (USE_HASHTAB_LOOKUP): ..this.
>   (indirect_map, indirect_array): Delete.
>   (indirect_htab): New.
>   (build_indirect_map): Remove locking.  Build indirect map using
>   hashtab.
>   (GOMP_target_map_indirect_ptr): Use indirect_htab to lookup target
>   address.
>   (GOMP_target_map_indirect_ptr): Remove volatile qualifier.
>   * config/gcn/team.c (gomp_gcn_enter_kernel): Call build_indirect_map
>   from first thread of first team only.
>   * config/nvptx/team.c (gomp_nvptx_main): Likewise.
>   * testsuite/libgomp.c-c++-common/declare-target-indirect-2.c (main):
>   Add missing break statements.
> ---
>  libgomp/config/accel/target-indirect.c| 83 ++-
>  libgomp/config/gcn/team.c |  7 +-
>  libgomp/config/nvptx/team.c   |  9 +-
>  .../declare-target-indirect-2.c   | 14 ++--
>  4 files changed, 63 insertions(+), 50 deletions(-)
>
> diff --git a/libgomp/config/accel/target-indirect.c 
> b/libgomp/config/accel/target-indirect.c
> index c60fd547cb6..cfef1ddbc49 

Fix 'char' initialization, copy, check in 'libgomp.oacc-fortran/acc-memcpy.f90' (was: [patch] OpenACC: Add Fortran routines acc_{alloc,free,hostptr,deviceptr,memcpy_{to,from}_device*})

2024-03-08 Thread Thomas Schwinge
Hi Tobias!

On 2024-02-19T22:36:51+0100, Tobias Burnus  wrote:
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/acc-memcpy.f90

OK to push
"Fix 'char' initialization, copy, check in 
'libgomp.oacc-fortran/acc-memcpy.f90'",
see attached?


Grüße
 Thomas


> @@ -0,0 +1,47 @@
> +! { dg-do run }
> +! { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } }
> +
> +! based on libgomp.oacc-c-c++-common/lib-60.c
> +
> +program main
> +  use openacc
> +  use iso_fortran_env
> +  use iso_c_binding
> +  implicit none (type, external)
> +  integer(int8), allocatable :: char(:)
> +  type(c_ptr) :: dptr
> +  integer(c_intptr_t) :: i
> +  integer(int8) :: j
> +
> +  allocate(char(-128:127))
> +  do i = -128, 127
> +char(j) = int (j, int8)
> +  end do
> +
> +  dptr = acc_malloc (256_c_size_t)
> +  call acc_memcpy_to_device (dptr, char, 255_c_size_t)
> +
> +  do i = 0, 255
> +if (acc_is_present (transfer (transfer(char, i) + i, dptr), 1)) &
> +  stop 1
> +  end do
> +
> +  char = 0_int8
> +
> +  call acc_memcpy_from_device (char, dptr, 256_c_size_t)
> +
> +  do i = -128, 127
> +char(i) = int (j, int8)
> +if (char(i) /= j) &
> +  stop 2
> +  end do
> +
> +  do i = 0, 255
> +if (acc_is_present (transfer (transfer(char, i) + i, dptr), 1)) &
> +  stop 3
> +  end do
> +
> +  call acc_free (dptr)
> +
> +  deallocate (char)
> +end


>From 7ea60a544353fa9ff0760e11db53332195eebad4 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 6 Mar 2024 23:18:08 +0100
Subject: [PATCH] Fix 'char' initialization, copy, check in
 'libgomp.oacc-fortran/acc-memcpy.f90'
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Our dear friend '-Wuninitialized' reported:

[...]/libgomp.oacc-fortran/acc-memcpy.f90:18:27:

   18 |     char(j) = int (j, int8)
      |                           ^
Warning: ‘j’ may be used uninitialized [-Wmaybe-uninitialized]
[...]/libgomp.oacc-fortran/acc-memcpy.f90:14:20:

   14 |   integer(int8) :: j
      |                    ^
note: ‘j’ was declared here

..., but actually there were other issues.

	libgomp/
	* testsuite/libgomp.oacc-fortran/acc-memcpy.f90: Fix 'char'
	initialization, copy, check.
---
 libgomp/testsuite/libgomp.oacc-fortran/acc-memcpy.f90 | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/acc-memcpy.f90 b/libgomp/testsuite/libgomp.oacc-fortran/acc-memcpy.f90
index 670dc50ff07..844d08a4661 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/acc-memcpy.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/acc-memcpy.f90
@@ -11,15 +11,14 @@ program main
   integer(int8), allocatable :: char(:)
   type(c_ptr) :: dptr
   integer(c_intptr_t) :: i
-  integer(int8) :: j
 
   allocate(char(-128:127))
   do i = -128, 127
-    char(j) = int (j, int8)
+    char(i) = int (i, int8)
   end do
 
   dptr = acc_malloc (256_c_size_t)
-  call acc_memcpy_to_device (dptr, char, 255_c_size_t)
+  call acc_memcpy_to_device (dptr, char, 256_c_size_t)
 
   do i = 0, 255
     if (acc_is_present (transfer (transfer(char, i) + i, dptr), 1)) &
@@ -31,8 +30,7 @@ program main
   call acc_memcpy_from_device (char, dptr, 256_c_size_t)
 
   do i = -128, 127
-    char(i) = int (j, int8)
-    if (char(i) /= j) &
+    if (char(i) /= i) &
       stop 2
   end do
 
-- 
2.34.1



GCN, nvptx: Errors during device probing are fatal (was: Stabilizing flaky libgomp GCN target/offloading testing)

2024-03-08 Thread Thomas Schwinge
Hi!

On 2024-02-21T13:34:01+0100, I wrote:
> On 2024-02-01T15:49:02+0100, Richard Biener  wrote:
>> On Thu, 1 Feb 2024, Thomas Schwinge wrote:
>>> [...] what I
>>> got with '-march=gfx1100' for AMD Radeon RX 7900 XTX.  [...]
>
>>> [...] execution test FAILs.  Not all FAILs appear all the time [...]
>
> What disturbs the testing a lot is, that the GPU may get into a bad
> state, upon which any use either fails with a
> 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' error -- or by just hanging, deep in
> 'libhsa-runtime64.so.1'...

So, there's a "fun" aspect: if we run into
'HSA_STATUS_ERROR_OUT_OF_RESOURCES' (or other errors; and similar in the
libgomp nvptx plugin) during libgomp GCN plugin device probing, then it's
not fatal, but instead silently disables the libgomp plugin/device, thus
typically silently resorting to host-fallback execution.  That's not
helpful behavior in my opinion, so I propose the attached
"GCN, nvptx: Errors during device probing are fatal".  OK to push?

(That's also the behavior that's implemented in both the GCN and nvptx
target 'run' tools.)
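
To make the proposed policy concrete, here is a minimal sketch in C.  It is
not code from the libgomp plugins: the enumerators and the 'probe_policy'
function are hypothetical names that only model the decision described
above, namely that a missing run-time library or a missing device silently
disables the plugin, while any other probing error is fatal.

```c
/* Hypothetical outcomes of probing an offload plugin; these names are
   illustrative only and do not exist in libgomp.  */
enum probe_outcome
{
  PROBE_OK,          /* Run-time library loaded, devices found.  */
  PROBE_NO_RUNTIME,  /* E.g. 'libhsa-runtime64.so.1' cannot be loaded.  */
  PROBE_NO_DEVICE,   /* Run-time library loaded, but no device present.  */
  PROBE_OTHER_ERROR  /* E.g. 'HSA_STATUS_ERROR_OUT_OF_RESOURCES'.  */
};

/* What the plugin should do with a given outcome (also hypothetical).  */
enum probe_action
{
  ACTION_USE_DEVICES,    /* Offloading proceeds normally.  */
  ACTION_IGNORE_PLUGIN,  /* Silently disable; libgomp may fall back to host.  */
  ACTION_FATAL           /* Report the error and terminate.  */
};

/* Model of the proposed policy: only the two "nothing installed here"
   cases stay non-fatal.  */
static enum probe_action
probe_policy (enum probe_outcome outcome)
{
  switch (outcome)
    {
    case PROBE_OK:
      return ACTION_USE_DEVICES;
    case PROBE_NO_RUNTIME:
    case PROBE_NO_DEVICE:
      return ACTION_IGNORE_PLUGIN;
    default:
      return ACTION_FATAL;
    }
}
```

With this model, the 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' case maps to
'ACTION_FATAL' instead of looking like "no device available".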


Grüße
 Thomas


From 0dc72089dccc10d3b55096ade5fc4d72de6cb96f Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 7 Mar 2024 14:42:07 +0100
Subject: [PATCH] GCN, nvptx: Errors during device probing are fatal

Currently, we silently disable libgomp GCN and nvptx plugins/devices in
presence of certain error conditions during device probing, thus typically
silently resorting to host-fallback execution.  Make such errors fatal, similar
to any other device access later on, so that we notice early and reliably
when things go wrong.  (Keep just two cases non-fatal: (a) libgomp GCN or nvptx
plugins are available but 'libhsa-runtime64.so.1' or 'libcuda.so.1' are not,
and (b) those are available, but the corresponding devices are not.)

This resolves the issue that we've got execution test cases unexpectedly
PASSing, despite:

libgomp: GCN fatal error: Run-time could not be initialized
Runtime message: HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events.

..., and therefore they were not offloaded to the GCN device, but ran in
host-fallback execution mode.  What happened in that scenario is that in
'init_hsa_context' during the initial 'GOMP_OFFLOAD_get_num_devices' we ran
into 'HSA_STATUS_ERROR_OUT_OF_RESOURCES', which wasn't fatal, but just
silently disabled the libgomp plugin/device.

Especially "entertaining" were cases where such unintended host-fallback
execution happened during effective-target checks like
'offload_device_available' (host-fallback execution there meaning: no offload
device available), but actual test cases then were running with an offload
device available, and therefore mis-configured.

	include/
	* cuda/cuda.h (CUresult): Add 'CUDA_ERROR_NO_DEVICE'.
	libgomp/
	* plugin/plugin-gcn.c (init_hsa_context): Add and handle
	'bool probe' parameter.  Adjust all users; errors during device
	probing are fatal.
	* plugin/plugin-nvptx.c (nvptx_get_num_devices): Aside from
	'CUDA_ERROR_NO_DEVICE', errors during device probing are fatal.
---
 include/cuda/cuda.h           |  1 +
 libgomp/plugin/plugin-gcn.c   | 14 ++++++++------
 libgomp/plugin/plugin-nvptx.c |  4 +++-
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/include/cuda/cuda.h b/include/cuda/cuda.h
index 114aba4e074..0dca4b3a5c0 100644
--- a/include/cuda/cuda.h
+++ b/include/cuda/cuda.h
@@ -57,6 +57,7 @@ typedef enum {
   CUDA_ERROR_OUT_OF_MEMORY = 2,
   CUDA_ERROR_NOT_INITIALIZED = 3,
   CUDA_ERROR_DEINITIALIZED = 4,
+  CUDA_ERROR_NO_DEVICE = 100,
   CUDA_ERROR_INVALID_CONTEXT = 201,
   CUDA_ERROR_INVALID_HANDLE = 400,
   CUDA_ERROR_NOT_FOUND = 500,
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 7e141a85f31..2bea9157e9d 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1511,10 +1511,12 @@ assign_agent_ids (hsa_agent_t agent, void *data)
 }
 
 /* Initialize hsa_context if it has not already been done.
-   Return TRUE on success.  */
+   If !PROBE: returns TRUE on success.
+   If PROBE: returns TRUE on success or if the plugin/device shall be silently
+   ignored, and otherwise emits an error and returns FALSE.  */
 
 static bool
-init_hsa_context (void)
+init_hsa_context (bool probe)
 {
   hsa_status_t status;
   int agent_index = 0;
@@ -1529,7 +1531,7 @@ init_hsa_context (void)
 	GOMP_PLUGIN_fatal ("%s\n", msg);
       else
 	GCN_WARNING ("%s\n", msg);
-      return false;
+      return probe ? true : false;
     }
   status = hsa_fns.hsa_init_fn ();
   if (status != HSA_STATUS_SUCCESS)
@@ -3321,8 +3323,8 @@ GOMP_OFFLOAD_version (void)
 int
 GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask)
 {
-  if (!ini

Re: GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 'init_hsa_runtime_functions' is not fatal

2024-03-08 Thread Thomas Schwinge
Hi!

On 2024-03-07T15:07:32+0100, Tobias Burnus  wrote:
> first, I have the feeling we talk about (more or less) the same code 
> region and use the same words – but we talk about rather different 
> things. Thus, you confuse me (and possibly Andrew) – and my reply 
> confuses you.

That, indeed, is my impression, too.  :-/

And actually the biggest confusion seems to be that both of you would like
'GCN_SUPPRESS_HOST_FALLBACK' to mean something else than
'HSA_SUPPRESS_HOST_FALLBACK' originally meant.

Hopefully the
"GCN: The original meaning of 'GCN_SUPPRESS_HOST_FALLBACK' isn't applicable 
(non-shared memory system)"
does clarify that.


Just to close this out, let's try again for the other discussion items:

> Thomas Schwinge wrote:
>> On 2024-03-07T12:43:07+0100, Tobias Burnus  wrote:
>>> Thomas Schwinge wrote:
>>>> First, I think most users do not set GCN_SUPPRESS_HOST_FALLBACK – and it
>>>> is also not really desirable.
>> External users probably don't, but certainly all our internal testing is
>> setting it,
>
> First, I doubt it

'git grep --cached GCN_SUPPRESS_HOST_FALLBACK' in our internal scripts is
your friend.

> secondly, if it were true, it was broken for the 
> last 5 years or so as we definitely did not notice fails due to not 
> working offload devices. – Neither for AMD GCN nor ...

You're saying that 'GCN_SUPPRESS_HOST_FALLBACK=1' doesn't report as fatal
certain errors during device probing?  That's not what the code, or my
experience, says.

>> and also implicitly all nvptx offloading testing: simply by
>> means of having ["no" missing here -- sorry!] such knob in the libgomp nvptx 
>> plugin.
>
> I did see it at some places set for AMD but I do not see any 
> nvptx-specific environment variable which permits to do the same.

Right, that was confusing: there was a "no" missing in that sentence --
sorry!

> However:
>>   That is, the
>> libgomp nvptx plugin has an implicit 'suppress_host_fallback = true' for
>> (the original meaning of) that flag
>
> I think that's one of the problems here – you talk about 
> suppress_host_fallback (implicit, original meaning), while I talk about 
> the GCN_SUPPRESS_HOST_FALLBACK environment variable.

The 'suppress_host_fallback' internal variable directly corresponds to
the 'GCN_SUPPRESS_HOST_FALLBACK' environment variable.
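
As a small illustration of that correspondence, the following C sketch
models the check from 'init_environment_variables' as quoted elsewhere in
this thread.  The helper name 'suppress_host_fallback_from_env' is
hypothetical, and its argument stands for whatever the plugin's
'secure_getenv' lookup returned; the point is only that the flag depends on
whether the variable is set, not on its value.

```c
#include <stdbool.h>
#include <stddef.h>

/* ENV_VALUE models the result of looking up 'GCN_SUPPRESS_HOST_FALLBACK'
   in the environment: NULL if the variable is unset, otherwise its
   (possibly empty) value.  Hypothetical helper, not part of libgomp.  */
static bool
suppress_host_fallback_from_env (const char *env_value)
{
  /* The quoted plugin code only tests for presence, so any value,
     including "0" or the empty string, enables the flag.  */
  return env_value != NULL;
}
```

Under this reading, 'GCN_SUPPRESS_HOST_FALLBACK=0 ./a.out' behaves the same
as 'GCN_SUPPRESS_HOST_FALLBACK=1 ./a.out'.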

> Besides all the talk about suppress_host_fallback, 
> 'init_hsa_runtime_functions' is not fatal' of the subject line seems to 
> be something to be considered (beyond the patches you already suggested).

I'll next submit "GCN, nvptx: Errors during device probing are fatal".

>>> If I run on my Linux system the system compiler with nvptx + gcn support
>>> installed, I get (with a nvptx permission problem):
>>>
>>> $ GCN_SUPPRESS_HOST_FALLBACK=1 ./a.out
>>>
>>> libgomp: GCN host fallback has been suppressed
>>>
>>> And exit code = 1. The same result with '-foffload=disable' or with
>>> '-foffload=nvptx-none'.
>> I can't tell if that's what you expect to see there, or not?
>
> Well, obviously

In this discussion thread here, nothing was obvious to me anymore...  ;-|

> not that I get this error by default – and as your 
> wording indicated that the internal variable will be always true

That always-'true' suggestion was only for the *original* meaning of the
variable: the use in 'GOMP_OFFLOAD_can_run'.

> – and 
> not only when the env var GCN_SUPPRESS_HOST_FALLBACK is explicit set, I 
> worry that I would get the error any time.

That was exactly the point of my patch in this thread: to get rid of the
*additional*/*new* behavior that the libgomp GCN plugin derives from
'GCN_SUPPRESS_HOST_FALLBACK', different from what
'HSA_SUPPRESS_HOST_FALLBACK' originally meant.

However, I now understand that Andrew would like to keep that *new*
behavior.

>> (For avoidance of doubt: I'm expecting silent host-fallback execution in
>> case that libgomp GCN and/or nvptx plugins are available, but no
>> corresponding devices.  That's what my patch achieves.)
>
> I concur that the silent host fallback should happen by default (unless 
> env vars tell otherwise) - at least when either no code was generated 
> for the device (e.g. -foffload=disable) or when the vendor runtime 
> library is not available or no device (be it no hardware or no permission).
>
> That's the current behavior and if that remains, my main concern evaporates.

ACK, thanks.


Grüße
 Thomas


>>> If we want to remove it, we can make it always false - but I am strongly
>>> against making it always true.
>> I'm confused.  So you want the GCN and nvptx plugins to behave
>> differently in that regard?
> No – or at

GCN: The original meaning of 'GCN_SUPPRESS_HOST_FALLBACK' isn't applicable (non-shared memory system) (was: GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 'init_hsa_runtime_functions' is

2024-03-08 Thread Thomas Schwinge
Hi!

So, attached here is now a different patch
"GCN: The original meaning of 'GCN_SUPPRESS_HOST_FALLBACK' isn't applicable 
(non-shared memory system)",
that takes a different approach re clarifying the two orthogonal aspects
that the 'GCN_SUPPRESS_HOST_FALLBACK' environment variable controls:
(a) the *original* meaning via 'HSA_SUPPRESS_HOST_FALLBACK', and
(b) the *additional*/*new* meaning to report as fatal certain errors
during device probing.

As you requested, (b) remains as it is (with just the diagnostic message
clarified).  Re (a):

On 2024-03-07T14:37:10+0100, I wrote:
> On 2024-03-07T12:43:07+0100, Tobias Burnus  wrote:
>> Thomas Schwinge wrote:
>>> [...] libgomp GCN plugin 'GCN_SUPPRESS_HOST_FALLBACK' [...]
>>>
>>> [...] originates in the libgomp HSA plugin, where the idea was -- in my
>>> understanding -- that you wouldn't have device code available for all
>>> 'fn_ptr's, and in that case transparently (shared-memory system!) do
>>> host-fallback execution.  Or, with 'GCN_SUPPRESS_HOST_FALLBACK' set,
>>> you'd get those diagnosed.
>>>
>>> This has then been copied into the libgomp GCN plugin (see above).
>>> However, is it really still applicable there; don't we assume that we're
>>> generating device code for all relevant functions?

> And, one step back: how is (the original meaning of)
> 'suppress_host_fallback = false' even supposed to work on non-shared
> memory systems as currently implemented by the libgomp GCN plugin?

> [...] this whole concept of dynamic plugin-level
> host-fallback execution being in conflict with our current non-shared
> memory system configurations?

I therefore suggest getting rid of (a).

OK to push?


Grüße
 Thomas


From 2a188021ca70fc1956ba78707fdec9dcca4f734d Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 7 Mar 2024 15:51:54 +0100
Subject: [PATCH] GCN: The original meaning of 'GCN_SUPPRESS_HOST_FALLBACK'
 isn't applicable (non-shared memory system)

'GCN_SUPPRESS_HOST_FALLBACK' originated as 'HSA_SUPPRESS_HOST_FALLBACK' in the
libgomp HSA plugin, where the idea was -- in my understanding -- that you
wouldn't have device code available for all functions that may be called, and
in that case transparently (shared memory system!) do host-fallback execution.
Or, with 'HSA_SUPPRESS_HOST_FALLBACK' set, you'd get those diagnosed.

This has then been copied into the libgomp GCN plugin as
'GCN_SUPPRESS_HOST_FALLBACK'.  However, the original meaning isn't applicable
for the libgomp GCN plugin anymore: we assume that we're generating device code
for all relevant functions, and we're implementing a non-shared memory system,
where we cannot transparently do host-fallback execution for individual
functions.

However, 'GCN_SUPPRESS_HOST_FALLBACK' has gained an additional meaning, to
enforce a fatal error in case that 'libhsa-runtime64.so.1' can't be dynamically
loaded; keep that meaning.

	libgomp/
	* plugin/plugin-gcn.c (GOMP_OFFLOAD_can_run): Don't consider
	'GCN_SUPPRESS_HOST_FALLBACK' anymore (assume always-'true').
	(init_hsa_context): Adjust 'GCN_SUPPRESS_HOST_FALLBACK' error
	message.
---
 libgomp/plugin/plugin-gcn.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 4b7ab5e83c5..7e141a85f31 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1524,9 +1524,11 @@ init_hsa_context (void)
   init_environment_variables ();
   if (!init_hsa_runtime_functions ())
     {
-      GCN_WARNING ("Run-time could not be dynamically opened\n");
+      const char *msg = "Run-time could not be dynamically opened";
       if (suppress_host_fallback)
-	GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed");
+	GOMP_PLUGIN_fatal ("%s\n", msg);
+      else
+	GCN_WARNING ("%s\n", msg);
       return false;
     }
   status = hsa_fns.hsa_init_fn ();
@@ -3855,15 +3857,9 @@ GOMP_OFFLOAD_can_run (void *fn_ptr)
 
   init_kernel (kernel);
   if (kernel->initialization_failed)
-    goto failure;
+    GOMP_PLUGIN_fatal ("kernel initialization failed");
 
   return true;
-
-failure:
-  if (suppress_host_fallback)
-    GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed");
-  GCN_WARNING ("GCN target cannot be launched, doing a host fallback\n");
-  return false;
 }
 
 /* Allocate memory on device N.  */
-- 
2.34.1



Re: GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 'init_hsa_runtime_functions' is not fatal

2024-03-07 Thread Thomas Schwinge
Hi Tobias!

On 2024-03-07T12:43:07+0100, Tobias Burnus  wrote:
> Thomas Schwinge wrote:
>> An issue with libgomp GCN plugin 'GCN_SUPPRESS_HOST_FALLBACK' (which is
>> different from the libgomp-level host-fallback execution):
>>> +failure:
>>> +  if (suppress_host_fallback)
>>> +    GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed");
>>> +  GCN_WARNING ("GCN target cannot be launched, doing a host fallback\n");
>>> +  return false;
>>> +}
>>
>> This originates in the libgomp HSA plugin, where the idea was -- in my
>> understanding -- that you wouldn't have device code available for all
>> 'fn_ptr's, and in that case transparently (shared-memory system!) do
>> host-fallback execution.  Or, with 'GCN_SUPPRESS_HOST_FALLBACK' set,
>> you'd get those diagnosed.
>>
>> This has then been copied into the libgomp GCN plugin (see above).
>> However, is it really still applicable there; don't we assume that we're
>> generating device code for all relevant functions?  (I suppose everyone
>> really is testing with 'GCN_SUPPRESS_HOST_FALLBACK' set?)
>
> First, I think most users do not set GCN_SUPPRESS_HOST_FALLBACK – and it 
> is also not really desirable.

External users probably don't, but certainly all our internal testing is
setting it, and also implicitly all nvptx offloading testing: simply by
means of having such knob in the libgomp nvptx plugin.  That is, the
libgomp nvptx plugin has an implicit 'suppress_host_fallback = true' for
(the original meaning of) that flag (and does not have the "init"-error
behavior that I consider bogus, and try to remove from the libgomp GCN
plugin).

And, one step back: how is (the original meaning of)
'suppress_host_fallback = false' even supposed to work on non-shared
memory systems as currently implemented by the libgomp GCN plugin?

> If I run on my Linux system the system compiler with nvptx + gcn support
> installed, I get (with a nvptx permission problem):
>
> $ GCN_SUPPRESS_HOST_FALLBACK=1 ./a.out
>
> libgomp: GCN host fallback has been suppressed
>
> And exit code = 1. The same result with '-foffload=disable' or with 
> '-foffload=nvptx-none'.

I can't tell if that's what you expect to see there, or not?

(For avoidance of doubt: I'm expecting silent host-fallback execution in
case that libgomp GCN and/or nvptx plugins are available, but no
corresponding devices.  That's what my patch achieves.)

>> Should we thus
>> actually remove 'suppress_host_fallback' (that is, make it
>> always-'true'),
>
> If we want to remove it, we can make it always false - but I am strongly 
> against making it always true.

I'm confused.  So you want the GCN and nvptx plugins to behave
differently in that regard?  What is the rationale for that?  In
particular also regarding this whole concept of dynamic plugin-level
host-fallback execution being in conflict with our current non-shared
memory system configurations?


> Use OMP_TARGET_OFFLOAD=mandatory (or that GCN env) if you want to 
> prevent the host fallback, but don't break somewhat common systems.

That's an orthogonal concept?


Grüße
 Thomas


Re: GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 'init_hsa_runtime_functions' is not fatal

2024-03-07 Thread Thomas Schwinge
Hi Andrew!

On 2024-03-07T11:38:27+0000, Andrew Stubbs  wrote:
> On 07/03/2024 11:29, Thomas Schwinge wrote:
>> On 2019-11-12T13:29:16+0000, Andrew Stubbs  wrote:
>>> This patch contributes the GCN libgomp plugin, with the various
>>> configure and make bits to go with it.
>> 
>> An issue with libgomp GCN plugin 'GCN_SUPPRESS_HOST_FALLBACK' (which is
>> different from the libgomp-level host-fallback execution):
>> 
>>> --- /dev/null
>>> +++ b/libgomp/plugin/plugin-gcn.c
>> 
>>> +/* Flag to decide if the runtime should suppress a possible fallback to host
>>> +   execution.  */
>>> +
>>> +static bool suppress_host_fallback;
>> 
>>> +static void
>>> +init_environment_variables (void)
>>> +{
>>> +  [...]
>>> +  if (secure_getenv ("GCN_SUPPRESS_HOST_FALLBACK"))
>>> +    suppress_host_fallback = true;
>>> +  else
>>> +    suppress_host_fallback = false;
>> 
>>> +/* Return true if the HSA runtime can run function FN_PTR.  */
>>> +
>>> +bool
>>> +GOMP_OFFLOAD_can_run (void *fn_ptr)
>>> +{
>>> +  struct kernel_info *kernel = (struct kernel_info *) fn_ptr;
>>> +
>>> +  init_kernel (kernel);
>>> +  if (kernel->initialization_failed)
>>> +    goto failure;
>>> +
>>> +  return true;
>>> +
>>> +failure:
>>> +  if (suppress_host_fallback)
>>> +    GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed");
>>> +  GCN_WARNING ("GCN target cannot be launched, doing a host fallback\n");
>>> +  return false;
>>> +}
>> 
>> This originates in the libgomp HSA plugin, where the idea was -- in my
>> understanding -- that you wouldn't have device code available for all
>> 'fn_ptr's, and in that case transparently (shared-memory system!) do
>> host-fallback execution.  Or, with 'GCN_SUPPRESS_HOST_FALLBACK' set,
>> you'd get those diagnosed.
>> 
>> This has then been copied into the libgomp GCN plugin (see above).
>> However, is it really still applicable there; don't we assume that we're
>> generating device code for all relevant functions?  (I suppose everyone
>> really is testing with 'GCN_SUPPRESS_HOST_FALLBACK' set?)  Should we thus
>> actually remove 'suppress_host_fallback' (that is, make it
>> always-'true'), including removal of the 'can_run' hook?  (I suppose that
>> even in a future shared-memory "GCN" configuration, we're not expecting
>> to use this again; expecting always-'true' for 'can_run'?)
>> 
>> 
>> Now my actual issue: the libgomp GCN plugin then invented an additional
>> use of 'GCN_SUPPRESS_HOST_FALLBACK':
>> 
>>> +/* Initialize hsa_context if it has not already been done.
>>> +   Return TRUE on success.  */
>>> +
>>> +static bool
>>> +init_hsa_context (void)
>>> +{
>>> +  hsa_status_t status;
>>> +  int agent_index = 0;
>>> +
>>> +  if (hsa_context.initialized)
>>> +    return true;
>>> +  init_environment_variables ();
>>> +  if (!init_hsa_runtime_functions ())
>>> +    {
>>> +      GCN_WARNING ("Run-time could not be dynamically opened\n");
>>> +      if (suppress_host_fallback)
>>> +        GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed");
>>> +      return false;
>>> +    }
>> 
>> That is, if 'GCN_SUPPRESS_HOST_FALLBACK' is (routinely) set (for its
>> original purpose), and you have the libgomp GCN plugin configured, but
>> don't have 'libhsa-runtime64.so.1' available, you run into a fatal error.
>> 
>> The libgomp nvptx plugin in such cases silently disables the
>> plugin/device (and thus lets libgomp proper do its thing), and I propose
>> we do the same here.  OK to push the attached
>> "GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 
>> 'init_hsa_runtime_functions' is not fatal"?
>
> If you try to run the offload testsuite on a device that is not properly 
> configured then we want FAIL

Exactly, and that's what I'm working towards.  (Currently we're not
implementing that properly.)

But why is 'GCN_SUPPRESS_HOST_FALLBACK' controlling
'init_hsa_runtime_functions' relevant for that?  As you know, that
function only deals with dynamically loading 'libhsa-runtime64.so.1', and
failure to load that one (because it doesn't exist) should have the
agreed-upon behavior of *not* raising an error.  (Any other, later errors
should be fatal, I certainly agree.)

> not pass-via-fallback. You're breaking that.

Sorry, I don't follow, please explain?


Grüße
 Thomas


nvptx: 'cuDeviceGetCount' failure is fatal (was: [Patch] OpenMP: Move omp requires checks to libgomp)

2024-03-07 Thread Thomas Schwinge
Hi!

On 2022-06-08T05:56:02+0200, Tobias Burnus  wrote:
> [...] On the libgomp side: The devices which do not fulfill the requirements
> are now filtered out.  [...]

> --- a/libgomp/plugin/plugin-gcn.c
> +++ b/libgomp/plugin/plugin-gcn.c

>  /* Return the number of GCN devices on the system.  */
>  
>  int
> -GOMP_OFFLOAD_get_num_devices (void)
> +GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask)
>  {
>    if (!init_hsa_context ())
>      return 0;
> +  /* Return -1 if no omp_requires_mask cannot be fulfilled but
> +     devices were present.  */
> +  if (hsa_context.agent_count > 0 && omp_requires_mask != 0)
> +    return -1;
>    return hsa_context.agent_count;
>  }

> --- a/libgomp/plugin/plugin-nvptx.c
> +++ b/libgomp/plugin/plugin-nvptx.c

>  int
> -GOMP_OFFLOAD_get_num_devices (void)
> +GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask)
>  {
> -  return nvptx_get_num_devices ();
> +  int num_devices = nvptx_get_num_devices ();
> +  /* Return -1 if no omp_requires_mask cannot be fulfilled but
> +     devices were present.  */
> +  if (num_devices > 0 && omp_requires_mask != 0)
> +    return -1;
> +  return num_devices;
>  }

> --- a/libgomp/target.c
> +++ b/libgomp/target.c

> @@ -4132,8 +4183,19 @@ gomp_target_init (void)
>  
>        if (gomp_load_plugin_for_device (&current_device, plugin_name))
>          {
> -           new_num_devs = current_device.get_num_devices_func ();
> -           if (new_num_devs >= 1)
> +           new_num_devs = current_device.get_num_devices_func (requires_mask);
> +           if (new_num_devs < 0)
> +             {
> +               [...]
> +             }
> +           else if (new_num_devs >= 1)
>              {
>                /* Augment DEVICES and NUM_DEVICES.  */

OK to push the attached "nvptx: 'cuDeviceGetCount' failure is fatal"?
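
For readers following the hunks quoted above, here is a minimal C sketch of
how the caller distinguishes the possible return values of
'GOMP_OFFLOAD_get_num_devices'.  The enumerators and 'classify_num_devices'
are hypothetical names, not libgomp code; what the '[...]' branch actually
does is not shown in the quote and is not modeled here.

```c
/* Hypothetical classification of the plugin's device count.  */
enum num_devices_meaning
{
  DEVICES_NONE,                 /* 0: nothing to add.  */
  DEVICES_USABLE,               /* >= 1: augment the device list.  */
  DEVICES_REJECTED_BY_REQUIRES  /* < 0: devices present, but the
                                   'omp_requires_mask' cannot be fulfilled.  */
};

static enum num_devices_meaning
classify_num_devices (int new_num_devs)
{
  if (new_num_devs < 0)
    return DEVICES_REJECTED_BY_REQUIRES;
  if (new_num_devs >= 1)
    return DEVICES_USABLE;
  return DEVICES_NONE;
}
```

The sketch shows why a plugin-internal '-1' for a failed 'cuDeviceGetCount'
is a problem: the caller cannot tell it apart from the "requirements not
fulfilled" case, since both are negative.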


Grüße
 Thomas


From 8090da93cb00e4aa47a8b21b6548d739b2cebc49 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 7 Mar 2024 13:18:23 +0100
Subject: [PATCH] nvptx: 'cuDeviceGetCount' failure is fatal

Per commit 683f11843974f0bdf42f79cdcbb0c2b43c7b81b0
"OpenMP: Move omp requires checks to libgomp", we're now using 'return -1'
from 'GOMP_OFFLOAD_get_num_devices' for 'omp_requires_mask' purposes.  This
missed that via 'nvptx_get_num_devices', we could also 'return -1' for
'cuDeviceGetCount' failure.  Before, this meant silently ignoring the
plugin/device (in 'gomp_target_init') -- which has also been doubtful behavior.
Let's instead turn 'cuDeviceGetCount' failure into a fatal error, similar to
other errors during device initialization.

	libgomp/
	* plugin/plugin-nvptx.c (nvptx_get_num_devices):
	'cuDeviceGetCount' failure is fatal.
---
 libgomp/plugin/plugin-nvptx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index ffb1db67d20..81b4a7f499a 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -630,7 +630,7 @@ nvptx_get_num_devices (void)
 	}
     }
 
-  CUDA_CALL_ERET (-1, cuDeviceGetCount, &n);
+  CUDA_CALL_ASSERT (cuDeviceGetCount, &n);
   return n;
 }
 
-- 
2.34.1



GCN, nvptx: Fatal error for missing symbols in 'libhsa-runtime64.so.1', 'libcuda.so.1' (was: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patch

2024-03-07 Thread Thomas Schwinge
Hi!

On 2017-01-13T19:11:23+0100, Jakub Jelinek  wrote:
> [...] If the nvptx libgomp plugin is installed, but libcuda.so.1
> can't be found, then the plugin behaves as if there are no PTX devices
> available.  [...]

ACK.

> --- libgomp/plugin/plugin-nvptx.c.jj  2017-01-13 12:07:56.0 +0100
> +++ libgomp/plugin/plugin-nvptx.c 2017-01-13 18:00:39.693284346 +0100

> +/* -1 if init_cuda_lib has not been called yet, false
> +   if it has been and failed, true if it has been and succeeded.  */
> +static char cuda_lib_inited = -1;
>  
> -  return desc;
> +/* Dynamically load the CUDA runtime library and initialize function
> +   pointers, return false if unsuccessful, true if successful.  */
> +static bool
> +init_cuda_lib (void)
> +{
> +  if (cuda_lib_inited != -1)
> +    return cuda_lib_inited;
> +  const char *cuda_runtime_lib = "libcuda.so.1";
> +  void *h = dlopen (cuda_runtime_lib, RTLD_LAZY);
> +  cuda_lib_inited = false;
> +  if (h == NULL)
> +    return false;

..., so this has to stay.

> +# undef CUDA_ONE_CALL
> +# define CUDA_ONE_CALL(call) CUDA_ONE_CALL_1 (call)
> +# define CUDA_ONE_CALL_1(call) \
> +  cuda_lib.call = dlsym (h, #call);  \
> +  if (cuda_lib.call == NULL) \
> +    return false;

However, this (missing symbol) I'd like to make a fatal error, instead of
silently disabling the plugin/device.  OK to push the attached
"GCN, nvptx: Fatal error for missing symbols in 'libhsa-runtime64.so.1', 
'libcuda.so.1'"?

> +  [...]
> +  cuda_lib_inited = true;
> +  return true;
>  }


Grüße
 Thomas


From 6a6520e01f7e7118b556683c2934f2c64c6dbc81 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 7 Mar 2024 12:31:52 +0100
Subject: [PATCH] GCN, nvptx: Fatal error for missing symbols in
 'libhsa-runtime64.so.1', 'libcuda.so.1'

If 'libhsa-runtime64.so.1', 'libcuda.so.1' are not available, the corresponding
libgomp plugin/device gets disabled, as before.  But if they are available,
report any inconsistencies such as missing symbols, similar to how we fail in
presence of other issues during device initialization.

	libgomp/
	* plugin/plugin-gcn.c (init_hsa_runtime_functions): Fatal error
	for missing symbols.
	* plugin/plugin-nvptx.c (init_cuda_lib): Likewise.
---
 libgomp/plugin/plugin-gcn.c   | 3 ++-
 libgomp/plugin/plugin-nvptx.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 464164afb03..338225db6f4 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1382,9 +1382,10 @@ init_hsa_runtime_functions (void)
 #define DLSYM_FN(function) \
   hsa_fns.function##_fn = dlsym (handle, #function); \
   if (hsa_fns.function##_fn == NULL) \
-    return false;
+    GOMP_PLUGIN_fatal ("'%s' is missing '%s'", hsa_runtime_lib, #function);
 #define DLSYM_OPT_FN(function) \
   hsa_fns.function##_fn = dlsym (handle, #function);
+
   void *handle = dlopen (hsa_runtime_lib, RTLD_LAZY);
   if (handle == NULL)
     return false;
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 3fd6cd42fa6..ffb1db67d20 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -127,7 +127,7 @@ init_cuda_lib (void)
 # define CUDA_ONE_CALL_1(call, allow_null)		\
   cuda_lib.call = dlsym (h, #call);	\
   if (!allow_null && cuda_lib.call == NULL)		\
-    return false;
+    GOMP_PLUGIN_fatal ("'%s' is missing '%s'", cuda_runtime_lib, #call);
 #include "cuda-lib.def"
 # undef CUDA_ONE_CALL
 # undef CUDA_ONE_CALL_1
-- 
2.34.1



GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 'init_hsa_runtime_functions' is not fatal (was: [PATCH 7/7 libgomp,amdgcn] GCN Libgomp Plugin)

2024-03-07 Thread Thomas Schwinge
Hi!

On 2019-11-12T13:29:16+0000, Andrew Stubbs  wrote:
> This patch contributes the GCN libgomp plugin, with the various
> configure and make bits to go with it.

An issue with libgomp GCN plugin 'GCN_SUPPRESS_HOST_FALLBACK' (which is
different from the libgomp-level host-fallback execution):

> --- /dev/null
> +++ b/libgomp/plugin/plugin-gcn.c

> +/* Flag to decide if the runtime should suppress a possible fallback to host
> +   execution.  */
> +
> +static bool suppress_host_fallback;

> +static void
> +init_environment_variables (void)
> +{
> +  [...]
> +  if (secure_getenv ("GCN_SUPPRESS_HOST_FALLBACK"))
> +    suppress_host_fallback = true;
> +  else
> +    suppress_host_fallback = false;

> +/* Return true if the HSA runtime can run function FN_PTR.  */
> +
> +bool
> +GOMP_OFFLOAD_can_run (void *fn_ptr)
> +{
> +  struct kernel_info *kernel = (struct kernel_info *) fn_ptr;
> +
> +  init_kernel (kernel);
> +  if (kernel->initialization_failed)
> +    goto failure;
> +
> +  return true;
> +
> +failure:
> +  if (suppress_host_fallback)
> +    GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed");
> +  GCN_WARNING ("GCN target cannot be launched, doing a host fallback\n");
> +  return false;
> +}

This originates in the libgomp HSA plugin, where the idea was -- in my
understanding -- that you wouldn't have device code available for all
'fn_ptr's, and in that case transparently (shared-memory system!) do
host-fallback execution.  Or, with 'GCN_SUPPRESS_HOST_FALLBACK' set,
you'd get those diagnosed.

This has then been copied into the libgomp GCN plugin (see above).
However, is it really still applicable there; don't we assume that we're
generating device code for all relevant functions?  (I suppose everyone
really is testing with 'GCN_SUPPRESS_HOST_FALLBACK' set?)  Should we thus
actually remove 'suppress_host_fallback' (that is, make it
always-'true'), including removal of the 'can_run' hook?  (I suppose that
even in a future shared-memory "GCN" configuration, we're not expecting
to use this again; expecting always-'true' for 'can_run'?)


Now my actual issue: the libgomp GCN plugin then invented an additional
use of 'GCN_SUPPRESS_HOST_FALLBACK':

> +/* Initialize hsa_context if it has not already been done.
> +   Return TRUE on success.  */
> +
> +static bool
> +init_hsa_context (void)
> +{
> +  hsa_status_t status;
> +  int agent_index = 0;
> +
> +  if (hsa_context.initialized)
> +return true;
> +  init_environment_variables ();
> +  if (!init_hsa_runtime_functions ())
> +{
> +  GCN_WARNING ("Run-time could not be dynamically opened\n");
> +  if (suppress_host_fallback)
> + GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed");
> +  return false;
> +}

That is, if 'GCN_SUPPRESS_HOST_FALLBACK' is (routinely) set (for its
original purpose), and you have the libgomp GCN plugin configured, but
don't have 'libhsa-runtime64.so.1' available, you run into a fatal error.

The libgomp nvptx plugin in such cases silently disables the
plugin/device (and thus lets libgomp proper do its thing), and I propose
we do the same here.  OK to push the attached
"GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 
'init_hsa_runtime_functions' is not fatal"?


Grüße
 Thomas


>From f037d2d8274940f042633a0ecb18a53942c075f5 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 7 Mar 2024 10:43:15 +0100
Subject: [PATCH] GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to
 'init_hsa_runtime_functions' is not fatal

'GCN_SUPPRESS_HOST_FALLBACK' controls the libgomp GCN plugin's capability to
transparently use host-fallback execution for certain device functions; it
shouldn't control failure of libgomp GCN plugin initialization (which libgomp
handles fine: triggering use of a different plugin/device, or general
host-fallback execution, or fatal error, as applicable).

	libgomp/
	* plugin/plugin-gcn.c (init_hsa_context): Even with
	'GCN_SUPPRESS_HOST_FALLBACK' set, failure to
	'init_hsa_runtime_functions' is not fatal.
---
 libgomp/plugin/plugin-gcn.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 2771123252a..464164afb03 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1524,8 +1524,6 @@ init_hsa_context (void)
   if (!init_hsa_runtime_functions ())
 {
   GCN_WARNING ("Run-time could not be dynamically opened\n");
-  if (suppress_host_fallback)
-	GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed");
   return false;
 }
   status = hsa_fns.hsa_init_fn ();
-- 
2.34.1



amdgcn: additional gfx1030/gfx1100 support: adjust test cases (was: [PATCH] amdgcn: additional gfx1100 support)

2024-03-06 Thread Thomas Schwinge
Hi!

On 2024-01-24T12:43:04+, Andrew Stubbs  wrote:
> This [...]

... became commit 99890e15527f1f04caef95ecdd135c9f1a077f08
"amdgcn: additional gfx1030/gfx1100 support", and included the following:

> --- a/gcc/config/gcn/gcn-valu.md
> +++ b/gcc/config/gcn/gcn-valu.md
> @@ -3555,30 +3555,63 @@
>  ;; }}}
>  ;; {{{ Int/int conversions
>  
> +(define_code_iterator all_convert [truncate zero_extend sign_extend])
>  (define_code_iterator zero_convert [truncate zero_extend])
>  (define_code_attr convop [
>   (sign_extend "extend")
>   (zero_extend "zero_extend")
>   (truncate "trunc")])
>  
> -(define_insn "2"
> +(define_expand "2"
> +  [(set (match_operand:V_INT_1REG 0 "register_operand"  "=v")
> +(all_convert:V_INT_1REG
> +   (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
> +  "")
> +
> +(define_insn "*_sdwa"
>[(set (match_operand:V_INT_1REG 0 "register_operand"  "=v")
>  (zero_convert:V_INT_1REG
> (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
> -  ""
> +  "!TARGET_RDNA3"
>"v_mov_b32_sdwa\t%0, %1 dst_sel: dst_unused:UNUSED_PAD 
> src0_sel:"
>[(set_attr "type" "vop_sdwa")
> (set_attr "length" "8")])
>  
> -(define_insn "extend2"
> +(define_insn "extend_sdwa"
>[(set (match_operand:V_INT_1REG 0 "register_operand"   "=v")
>  (sign_extend:V_INT_1REG
> (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
> -  ""
> +  "!TARGET_RDNA3"
>"v_mov_b32_sdwa\t%0, sext(%1) src0_sel:"
>[(set_attr "type" "vop_sdwa")
> (set_attr "length" "8")])
>  
> +(define_insn "*_shift"
> +  [(set (match_operand:V_INT_1REG 0 "register_operand"  "=v")
> +(all_convert:V_INT_1REG
> +   (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
> +  "TARGET_RDNA3"
> +  {
> +enum {extend, zero_extend, trunc};
> +rtx shiftwidth = (mode == QImode
> +   || mode == QImode
> +   ? GEN_INT (24)
> +   : mode == HImode
> + || mode == HImode
> +   ? GEN_INT (16)
> +   : NULL);
> +operands[2] = shiftwidth;
> +
> +if (!shiftwidth)
> +  return "v_mov_b32 %0, %1";
> +else if ( == extend ||  == trunc)
> +  return "v_lshlrev_b32\t%0, %2, %1\;v_ashrrev_i32\t%0, %2, %0";
> +else
> +  return "v_lshlrev_b32\t%0, %2, %1\;v_lshrrev_b32\t%0, %2, %0";
> +  }
> +  [(set_attr "type" "mult")
> +   (set_attr "length" "8")])

OK to push the attached
"amdgcn: additional gfx1030/gfx1100 support: adjust test cases"?
Tested 'gcn.exp' for all '-march'es.


Grüße
 Thomas


>From 04b83e9aa19b02b9805e03f31db14325bb00e737 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 4 Mar 2024 10:40:39 +0100
Subject: [PATCH] amdgcn: additional gfx1030/gfx1100 support: adjust test cases

The "SDWA" changes in commit 99890e15527f1f04caef95ecdd135c9f1a077f08
"amdgcn: additional gfx1030/gfx1100 support" caused a few regressions:

PASS: gcc.target/gcn/sram-ecc-3.c (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-3.c scan-assembler zero_extendv64qiv64si2

PASS: gcc.target/gcn/sram-ecc-4.c (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-4.c scan-assembler zero_extendv64hiv64si2

PASS: gcc.target/gcn/sram-ecc-7.c (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-7.c scan-assembler zero_extendv64qiv64si2

PASS: gcc.target/gcn/sram-ecc-8.c (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-8.c scan-assembler zero_extendv64hiv64si2

Those test cases need corresponding adjustment.

	gcc/testsuite/
	* gcc.target/gcn/sram-ecc-3.c: Adjust.
	* gcc.target/gcn/sram-ecc-4.c: Likewise.
	* gcc.target/gcn/sram-ecc-7.c: Likewise.
	* gcc.target/gcn/sram-ecc-8.c: Likewise.
---
 gcc/testsuite/gcc.target/gcn/sram-ecc-3.c | 2 +-
 gcc/testsuite/gcc.target/gcn/sram-ecc-4.c | 2 +-
 gcc/testsuite/gcc.target/gcn/sram-ecc-7.c | 2 +-
 gcc/testsuite/gcc.target/gcn/sram-ecc-8.c | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/gcn/sram-ecc-3.c b/gcc/testsuite/gcc.target/gcn/sram-ecc-3.c
index 692d4578b66..bc89e3542d2 100644
--- a/gcc/testsuite/gcc.target/gcn/sram-ecc-

Stabilize flaky GCN target/offloading testing

2024-03-06 Thread Thomas Schwinge
Hi!

On 2024-02-21T17:32:13+0100, Richard Biener  wrote:
> On 21.02.2024 at 13:34, Thomas Schwinge wrote:
>> [...] per my work on <https://gcc.gnu.org/PR66005>
>> "libgomp make check time is excessive", all execution testing in libgomp
>> is serialized in 'libgomp/testsuite/lib/libgomp.exp:libgomp_load'.  [...]
>> (... with the caveat that execution tests for
>> effective-targets are *not* governed by that, as I've found yesterday.
>> I have a WIP hack for that, too.)

>> What disturbs the testing a lot is that the GPU may get into a bad
>> state, upon which any use either fails with a
>> 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' error -- or just hangs, deep in
>> 'libhsa-runtime64.so.1'...
>> 
>> I've now tried to debug the latter case (hang).  When the GPU gets into
>> this bad state (whatever exactly that is),
>> 'hsa_executable_load_code_object' still returns 'HSA_STATUS_SUCCESS', but
>> then GCN target execution ('gcn-run') hangs in 'hsa_executable_freeze'
>> vs. GCN offloading execution ('libgomp-plugin-gcn.so.1') hangs right
>> before 'hsa_executable_freeze', in the GCN heap setup 'hsa_memory_copy'.
>> There it hangs until killed (for example, until DejaGnu's timeout
>> mechanism kills the process -- just that the next GPU-using execution
>> test then runs into the same thing again...).
>> 
>> In this state (and also the 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' state),
>> we're able to recover via:
>> 
>>$ flock /tmp/gpu.lock sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover
>>0

At least most of the time.  I've found that -- sometimes... ;-( -- if
you run into 'HSA_STATUS_ERROR_OUT_OF_RESOURCES', then do
'amdgpu_gpu_recover', and then immediately re-execute, you'll again run
into 'HSA_STATUS_ERROR_OUT_OF_RESOURCES'.  That appears to be avoidable
by injecting some artificial "cool-down period"...  (The latter I've not
yet tested extensively.)

>> This is, obviously, a hack, probably needs a serial lock to not disturb
>> other things, has hard-coded 'dri/0', and as I said in
>> <https://inbox.sourceware.org/87plww8qin@euler.schwinge.ddns.net>
>> "GCN RDNA2+ vs. GCC SLP vectorizer":
>> 
>> | I've no idea what
>> | 'amdgpu_gpu_recover' would do if the GPU is also used for display.
>
> It ends up terminating your X session…

Eh  ;'-|

> (there’s some automatic driver recovery that’s also sometimes triggered which
> sounds like the same thing).

> I need to try using the integrated graphics for X11 to see if that avoids the
> issue.

A few years ago, I tried that for a Nvidia GPU laptop, and -- if I now
remember correctly -- basically got it to work, via hand-editing
'/etc/X11/xorg.conf' and all that...  But: I couldn't get external HDMI
to work in that setup, and therefore reverted to "standard".

> Guess AMD needs to improve the driver/runtime (or we - it’s open source at
> least up to the firmware).

>> However, it's very useful in my testing.  :-|
>> 
>> The question is how to detect the "hang" state without first running
>> into a timeout (and disambiguating such a timeout from a user code
>> timeout)?  Add a watchdog: call 'alarm([a few seconds])' before device
>> initialization, and before the actual GPU kernel launch cancel it with
>> 'alarm(0)'?  (..., and add a handler for 'SIGALRM' to print a distinct
>> error message that we can then react on, like for
>> 'HSA_STATUS_ERROR_OUT_OF_RESOURCES'.)  Probably 'alarm'/'SIGALRM' is a
>> no-go in libgomp -- instead, use a helper thread to similarly implement a
>> watchdog?  ('libgomp/plugin/plugin-gcn.c' already is using pthreads for
>> other purposes.)  Any other clever ideas?  What's a suitable value for
>> "a few seconds"?

I'm attaching my current "GCN: Watchdog for device image load", covering
both 'gcc/config/gcn/gcn-run.cc' and 'libgomp/plugin/plugin-gcn.c'.
(That's using 'timer_create' etc. instead of 'alarm'/'SIGALRM'.)

That, plus routing *all* potential GPU usage (in particular: including
execution tests for effective-targets, see above) through a serial lock
('flock', implemented in DejaGnu board file, outside of the
"DejaGnu timeout domain", similar to
'libgomp/testsuite/lib/libgomp.exp:libgomp_load', see above), plus
catching 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' (both the "real" ones and
the "fake" ones via "GCN: Watchdog for device image load") and in that
case 'amdgpu_gpu_recover' and re-execution of the respective executable,
does greatly stabilize flaky GCN target/offloading testing.
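
For illustration only, a 'timer_create'-based watchdog of the kind described above could look roughly like the following.  This is a minimal sketch, not the attached patch: the names 'watchdog_start', 'watchdog_stop' and 'watchdog_expired', the choice of 'SIGRTMIN' and 'CLOCK_MONOTONIC', the message text and the exit status are all assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical names throughout; this is not the code from the patch.  */
static timer_t watchdog_timer;

/* Runs in signal context, so only async-signal-safe calls are used.  */
static void
watchdog_expired (int sig)
{
  static const char msg[] = "watchdog: device image load timed out\n";
  ssize_t unused = write (STDERR_FILENO, msg, sizeof msg - 1);
  (void) unused;
  (void) sig;
  _exit (1);
}

/* Arm a one-shot timer of SECONDS seconds before a step that may hang.
   Returns 0 on success, -1 on failure.  */
int
watchdog_start (unsigned int seconds)
{
  struct sigaction sa;
  struct sigevent sev;
  struct itimerspec its;

  assert (seconds > 0);

  memset (&sa, 0, sizeof sa);
  sa.sa_handler = watchdog_expired;
  sigemptyset (&sa.sa_mask);
  if (sigaction (SIGRTMIN, &sa, NULL) != 0)
    return -1;

  memset (&sev, 0, sizeof sev);
  sev.sigev_notify = SIGEV_SIGNAL;
  sev.sigev_signo = SIGRTMIN;
  if (timer_create (CLOCK_MONOTONIC, &sev, &watchdog_timer) != 0)
    return -1;

  memset (&its, 0, sizeof its);
  its.it_value.tv_sec = seconds;	/* it_interval stays zero: one-shot.  */
  if (timer_settime (watchdog_timer, 0, &its, NULL) != 0)
    {
      timer_delete (watchdog_timer);
      return -1;
    }
  return 0;
}

/* Cancel the watchdog once the guarded step has completed.  */
int
watchdog_stop (void)
{
  return timer_delete (watchdog_timer);
}
```

The guarded region would then be bracketed by 'watchdog_start (N)' and 'watchdog_stop ()'; a test harness can key on the distinct message to tell a device hang from a user-code timeout.  On older glibc versions this may need linking with '-lrt'.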

Do we have consensus to move forward with this approach, generally?


Grüße
 Thomas


>From 217953534

Re: [Patch] OpenMP: Reject non-const 'condition' trait in Fortran (was: [Patch] OpenMP: Handle DECL_ASSEMBLER_NAME with 'declare variant')

2024-03-04 Thread Thomas Schwinge
Hi Tobias!

On 2024-02-13T18:31:02+0100, Tobias Burnus  wrote:
> --- a/gcc/fortran/openmp.cc
> +++ b/gcc/fortran/openmp.cc

> +   /* Device number must be conforming, which includes
> +  omp_initial_device (-1) and omp_invalid_device (-4).  */
> +   if (property_kind == OMP_TRAIT_PROPERTY_DEV_NUM_EXPR
> +   && otp->expr->expr_type == EXPR_CONSTANT
> +   && mpz_sgn (otp->expr->value.integer) < 0
> +   && mpz_cmp_si (otp->expr->value.integer, -1) != 0
> +   && mpz_cmp_si (otp->expr->value.integer, -4) != 0)
> + {
> +   gfc_error ("property must be a conforming device number "
> +  "at %C");

Instead of magic numbers, shouldn't this use 'include/gomp-constants.h':

/* We have a compatibility issue.  OpenMP 5.2 introduced
   omp_initial_device with value of -1 which clashes with our
   GOMP_DEVICE_ICV, so we need to remap user supplied device
   ids, -1 (aka omp_initial_device) to GOMP_DEVICE_HOST_FALLBACK,
   and -2 (one of many non-conforming device numbers, but with
   OMP_TARGET_OFFLOAD=mandatory needs to be treated a
   omp_invalid_device) to -3 (so that for dev_num >= -2U we can
   subtract 1).  -4 is then what we use for omp_invalid_device,
   which unlike the other non-conforming device numbers results
   in fatal error regardless of OMP_TARGET_OFFLOAD.  */
#define GOMP_DEVICE_ICV -1
#define GOMP_DEVICE_HOST_FALLBACK   -2
#define GOMP_DEVICE_INVALID -4


Grüße
 Thomas


Update GCC 14 OpenACC changes some more (was: [wwwdocs] gcc-14/changes.html + projects/gomp/: OpenMP + OpenACC update)

2024-03-01 Thread Thomas Schwinge
Hi!

On 2024-02-27T20:16:52+0100, Tobias Burnus  wrote:
> Minor update for older and more recent changes.
>
> Comments?

> gcc-14/changes.html + projects/gomp/: OpenMP + OpenACC update

> Update OpenACC for one new feature (Fortran interface to existing
> C/C++ routines).

> --- a/htdocs/gcc-14/changes.html
> +++ b/htdocs/gcc-14/changes.html

> +OpenACC 3.2: The following API routines are now available in
> +  Fortran using the openacc module or the
> +  open_lib.h header file: acc_alloc,
> +  acc_free, acc_hostptr,
> +  acc_deviceptr, acc_memcpy_to_device,
> +  acc_memcpy_to_device_async,
> +  acc_memcyp_from_device and
> +  acc_memcyp_from_device_async.

Thanks -- but you have to improve your copy'n'paste skills.  ;-P

On top of your wwwdocs commit f92f353bb0e932edba7d063b2609943683cf0a36
"gcc-14/changes.html + projects/gomp/: OpenMP + OpenACC update", I've
pushed commit df2bc49fc018c2b1aeb27030fe1967470d0d4ec3
"Update GCC 14 OpenACC changes some more", see attached.


Grüße
 Thomas


>From df2bc49fc018c2b1aeb27030fe1967470d0d4ec3 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 1 Mar 2024 15:01:54 +0100
Subject: [PATCH] Update GCC 14 OpenACC changes some more

Follow-up to commit f92f353bb0e932edba7d063b2609943683cf0a36
"gcc-14/changes.html + projects/gomp/: OpenMP + OpenACC update":

  - 's%acc_alloc%acc_malloc'
  - add 'acc_map_data' and 'acc_unmap_data'
  - swap 'acc_deviceptr' and 'acc_hostptr'
  - 's%memcyp%memcpy%g'
---
 htdocs/gcc-14/changes.html | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index e8004d4a..d88fbc96 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -120,12 +120,14 @@ a work-in-progress.
   constructs.
 OpenACC 3.2: The following API routines are now available in
   Fortran using the openacc module or the
-  openacc_lib.h header file: acc_alloc,
-  acc_free, acc_hostptr,
-  acc_deviceptr, acc_memcpy_to_device,
+  openacc_lib.h header file:
+  acc_malloc, acc_free,
+  acc_map_data, acc_unmap_data,
+  acc_deviceptr, acc_hostptr,
+  acc_memcpy_to_device,
   acc_memcpy_to_device_async,
-  acc_memcyp_from_device, and
-  acc_memcyp_from_device_async.
+  acc_memcpy_from_device, and
+  acc_memcpy_from_device_async.
   
   
   For offload-device code generated via OpenMP and OpenACC, the math
-- 
2.43.0



Re: [committed] Set num_threads to 50 on 32-bit hppa in two libgomp loop tests

2024-02-29 Thread Thomas Schwinge
Hi!

On 2024-02-01T19:20:57+, John David Anglin  wrote:
> Tested on hppa-unknown-linux-gnu.  Committed to trunk.

> Set num_threads to 50 on 32-bit hppa in two libgomp loop tests
>
> We support a maximum of 50 threads on 32-bit hppa.

What happens if you go higher?  Curious, what/why is that architectural
limit of 50 threads?

I wonder: shouldn't that cap at 50 threads happen inside libgomp,
generally, instead of per test case and user code (!)?  Per my
understanding, OpenMP 'num_threads' specifies a *desired* number of
threads; the implementation may limit that value.
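
To make the idea concrete, here is a minimal sketch of capping a requested thread count at a per-target maximum inside the runtime.  It does not reflect libgomp's actual code: 'effective_num_threads' and its 'target_max' parameter are hypothetical, and 50 is simply the figure quoted above for 32-bit hppa.

```c
#include <assert.h>

/* Hypothetical helper: treat REQUESTED (as given to 'num_threads') as
   a desired value and cap it at TARGET_MAX, the most threads the
   target can support.  */
unsigned int
effective_num_threads (unsigned int requested, unsigned int target_max)
{
  /* 'num_threads' must be positive; a zero limit makes no sense.  */
  assert (requested > 0 && target_max > 0);
  return requested > target_max ? target_max : requested;
}
```

With such a cap in the runtime, 'num_threads (64)' in a test case would yield 50 threads on a target limited to 50, without any per-test '#if'.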


Grüße
 Thomas


> --- a/libgomp/testsuite/libgomp.c++/loop-3.C
> +++ b/libgomp/testsuite/libgomp.c++/loop-3.C
> @@ -1,3 +1,9 @@
> +#if defined(__hppa__) && !defined(__LP64__)
> +#define NUM_THREADS 50
> +#else
> +#define NUM_THREADS 64
> +#endif
> +
>  extern "C" void abort (void);
>  int a;
>  
> @@ -19,7 +25,7 @@ foo ()
>  int
>  main (void)
>  {
> -#pragma omp parallel num_threads (64)
> +#pragma omp parallel num_threads (NUM_THREADS)
>foo ();
>  
>return 0;

> --- a/libgomp/testsuite/libgomp.c/omp-loop03.c
> +++ b/libgomp/testsuite/libgomp.c/omp-loop03.c
> @@ -1,3 +1,9 @@
> +#if defined(__hppa__) && !defined(__LP64__)
> +#define NUM_THREADS 50
> +#else
> +#define NUM_THREADS 64
> +#endif
> +
>  extern void abort (void);
>  int a;
>  
> @@ -19,7 +25,7 @@ foo ()
>  int
>  main (void)
>  {
> -#pragma omp parallel num_threads (64)
> +#pragma omp parallel num_threads (NUM_THREADS)
>foo ();
>  
>return 0;


Re: [PATCH] lto, Darwin: Fix offload section names.

2024-02-29 Thread Thomas Schwinge
Hi Iain!

On 2024-01-16T15:00:16+, Iain Sandoe  wrote:
> Currently, these section names have wrong syntax for Mach-O.
> Although they were added some time ago; recently added tests are
> now emitting them leading to new fails on Darwin.
>
> This adds a Mach-O variant for each.

>  gcc/lto-section-names.h | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/gcc/lto-section-names.h b/gcc/lto-section-names.h
> index a743deb4efb..1cdadf36ec0 100644
> --- a/gcc/lto-section-names.h
> +++ b/gcc/lto-section-names.h
> @@ -25,7 +25,11 @@ along with GCC; see the file COPYING3.  If not see
> name for the functions and static_initializers.  For other types of
> sections a '.' and the section type are appended.  */
>  #define LTO_SECTION_NAME_PREFIX ".gnu.lto_"
> +#if OBJECT_FORMAT_MACHO
> +#define OFFLOAD_SECTION_NAME_PREFIX "__GNU_OFFLD_LTO,"
> +#else
>  #define OFFLOAD_SECTION_NAME_PREFIX ".gnu.offload_lto_"
> +#endif
>  
>  /* Can be either OFFLOAD_SECTION_NAME_PREFIX when we stream IR for offload
> compiler, or LTO_SECTION_NAME_PREFIX for LTO case.  */
> @@ -35,8 +39,14 @@ extern const char *section_name_prefix;
>  
>  #define LTO_SEGMENT_NAME "__GNU_LTO"
>  
> +#if OBJECT_FORMAT_MACHO
> +#define OFFLOAD_VAR_TABLE_SECTION_NAME "__GNU_OFFLOAD,__vars"
> +#define OFFLOAD_FUNC_TABLE_SECTION_NAME "__GNU_OFFLOAD,__funcs"
> +#define OFFLOAD_IND_FUNC_TABLE_SECTION_NAME "__GNU_OFFLOAD,__ind_fns"
> +#else
>  #define OFFLOAD_VAR_TABLE_SECTION_NAME ".gnu.offload_vars"
>  #define OFFLOAD_FUNC_TABLE_SECTION_NAME ".gnu.offload_funcs"
>  #define OFFLOAD_IND_FUNC_TABLE_SECTION_NAME ".gnu.offload_ind_funcs"
> +#endif
>  
>  #endif /* GCC_LTO_SECTION_NAMES_H */

Just to note that, per my understanding, this will require corresponding
changes elsewhere, once you attempt to actually enable offloading
compilation for Darwin (which -- ;-) I suspect -- is not on your agenda
right now):

$ git grep --cached -F .gnu.offload_
gcc/config/gcn/mkoffload.cc:  if (sscanf (buf, " .section .gnu.offload_vars%c", ) > 0)
gcc/config/gcn/mkoffload.cc:  else if (sscanf (buf, " .section .gnu.offload_funcs%c", ) > 0)
gcc/config/gcn/mkoffload.cc:  /* Likewise for .gnu.offload_vars; used for reverse offload. */
gcc/config/gcn/mkoffload.cc:  else if (sscanf (buf, " .section .gnu.offload_ind_funcs%c", ) > 0)
['gcc/lto-section-names.h' adjusted per above.]
libgcc/offloadstuff.c:#define OFFLOAD_FUNC_TABLE_SECTION_NAME ".gnu.offload_funcs"
libgcc/offloadstuff.c:#define OFFLOAD_IND_FUNC_TABLE_SECTION_NAME ".gnu.offload_ind_funcs"
libgcc/offloadstuff.c:#define OFFLOAD_VAR_TABLE_SECTION_NAME ".gnu.offload_vars"
lto-plugin/lto-plugin.c:  if (startswith (name, ".gnu.offload_lto_.opts"))
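
As an illustration of the kind of adjustment such places would need, a prefix check that accepts both spellings might look as follows.  This is only a sketch: the two prefix strings are taken from the patch quoted above, while 'offload_lto_section_p', 'starts_with' and the macro names are hypothetical, and the real code may prefer to select one prefix at build time via 'OBJECT_FORMAT_MACHO'.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Prefix strings as they appear in the quoted 'gcc/lto-section-names.h'
   change; the macro names here are made up for this sketch.  */
#define OFFLOAD_PREFIX_ELF ".gnu.offload_lto_"
#define OFFLOAD_PREFIX_MACHO "__GNU_OFFLD_LTO,"

static bool
starts_with (const char *s, const char *prefix)
{
  return strncmp (s, prefix, strlen (prefix)) == 0;
}

/* Return true if NAME looks like an offload LTO section in either the
   ELF-style or the Mach-O-style spelling.  */
bool
offload_lto_section_p (const char *name)
{
  assert (name != NULL);
  return (starts_with (name, OFFLOAD_PREFIX_ELF)
	  || starts_with (name, OFFLOAD_PREFIX_MACHO));
}
```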


Grüße
 Thomas


Re: [patch] OpenACC: Add Fortran routines acc_{alloc,free,hostptr,deviceptr,memcpy_{to,from}_device*}

2024-02-27 Thread Thomas Schwinge
Hi Tobias!

On 2024-02-27T13:29:33+0100, Tobias Burnus  wrote:
> Thomas Schwinge:
>>>   @table @asis
>>>   @item @emph{Description}
>>> -This function allocates @var{len} bytes of device memory. It returns
>>> +This function allocates @var{bytes} of device memory. It returns

>> Not '@var{bytes} {+bytes+}' or similar?
>
> I think either works – depending how one parses @var{} mentally, 
> one of the variants sounds smooth and the other very odd. But I can/will 
> change it.

Yeah, I see.  Not the strongest argument ("upstream vs. local" style),
but I see that while OpenACC 3.3 doesn't for 'acc_malloc', it does, for
example, for 'acc_copyin' talk about "'bytes' bytes" (or, avoiding the
issue: "'bytes' specifies the data size in bytes").


>>> --- a/libgomp/openacc.f90
>>> +++ b/libgomp/openacc.f90

>> Assuming that 'module openacc_internal' currently is sorted per
>> appearance in the OpenACC specification (?), I suggest we continue to do
>> so.  (..., like in 'openacc_lib.h', too.)

> I will check – it looks only block-wise sorted but I might be wrong.

OK, but please don't sink too much time into that.

> I 
> followed location of the comments, placing it before the routines that 
> followed the comment, assuming that the comments were at the right spot.


>>> @@ -794,6 +881,9 @@ module openacc
>>> ...
>>> +  public :: acc_malloc, acc_free, acc_map_data, acc_unmap_data, 
>>> acc_deviceptr
>>> +  public :: acc_hostptr, acc_memcpy_to_device, acc_memcpy_to_device_async
>>> +  public :: acc_memcpy_from_device, acc_memcpy_from_device_async
>>>   ...
>>> -  ! acc_malloc: Only available in C/C++
>>> -  ! acc_free: Only available in C/C++
>>> -
>>> ...
>>> interface acc_is_present
>>>   procedure :: acc_is_present_32_h
>>>   procedure :: acc_is_present_64_h
>>>   procedure :: acc_is_present_array_h
>>> end interface

>> Is that now a different style that we're not listing the new interfaces
>> in 'module openacc' here?
>
> As there is no precedent for this type of interface, the style is by 
> nature different. But the question is which style is better. The 
> current 'openacc' is very short – and contains not a single specific 
> interface, but only generic interfaces. The actual specific-procedure 
> declarations are only in 'openacc_internal'.
>
> Those new procedures are the first ones that do not have a generic 
> interface and only a specific one. Thus, one can either put the specific 
> one into 'openacc_internal' and refer it from 'openacc' (via 'use 
> openacc_internal' + 'public :: acc_') – or place the 
> interface directly into 'openacc' (and not touching 'openacc_internal' 
> at all).
>
> During development, I accidentally had a mixture of both - and 
> then settled for the current variant. – Possibly, moving the interface 
> to 'openacc' is clearer?
>
> Thoughts?

No, sorry.  As I said: "I don't know much about Fortran interfaces".  :-|


>>> --- /dev/null
>>> +++ b/libgomp/testsuite/libgomp.fortran/acc_host_device_ptr.f90

>>> +  ! The following assumes sizeof(void*) being the same on host and device:

>> That's generally required anyway.
>
> I have to admit that I don't know OpenACC well enough to see whether 
> that's the case or not.

My thinking, "simply", is that this follows implicitly from the fact that
data layout has to match between host and device, and if pointers have
different sizes, that breaks?

For example, OpenACC 3.3, 2.6.4 "Data Structures with Pointers":

| [...]
| When a data object is copied to device memory, the values are copied exactly.
| If the data is a data structure that includes a pointer, or is just a pointer,
| the pointer value copied to device memory will be the host pointer value. [...]

> And, while I am not very consistent, I do try to 
> document stricter requirements / implementation-specific parts in a 
> testcases.

ACK, that's always good practice.

> I know that OpenMP permits that the pointer size differs

Oh, really!?

> and 'void *p = 
> omp_target_alloc (...);' might in this case not return the device 
> pointer but a handle to the device ptr. (For instance, it could be a 
> pointer to an uint128_t variable for a 128bit device pointer; I think 
> such a hardware exists in real - and uses several bits for other 
> purposes like flags.)

I do see in OpenMP 5.2, 1.2.6 "Data Terminology":

| *device address*  An address of an object that may be referenced on a
| _target device_.

| *device pointer*  An _implementation-defined handle_ that refers to a
| _device address_.

...,

Re: [patch] OpenACC: Add Fortran routines acc_{alloc,free,hostptr,deviceptr,memcpy_{to,from}_device*}

2024-02-27 Thread Thomas Schwinge
Hi Tobias!

On 2024-02-19T22:36:51+0100, Tobias Burnus  wrote:
> While waiting for some testing to finish, I got distracted and added the
> very low hanging OpenACC 3.3 fruits, i.e. those Fortran routines that directly
> map to their C counter part.
>
> Comments, remarks?

Thanks, that largely looks straight-forward.  I've not done an in-depth
review, just a few comments.  Resolve these as you think is necessary,
and then 'git push'.

I don't know much about Fortran interfaces -- I trust you've got that
under control.  ;-)

Thanks for the test cases.  Would be nice to have test cases covering all
interfaces -- but I don't think we're currently complete in that regard,
so shall not hold your contribution to higher standards.

> OpenACC: Add Fortran routines 
> acc_{alloc,free,hostptr,deviceptr,memcpy_{to,from}_device*}
>
> These routines map simply to the C counterpart and are meanwhile
> defined in OpenACC 3.3. (There are additional routine changes,
> including the Fortran addition of acc_attach/acc_detach, that
> require more work than a simple addition of an interface and
> are therefore excluded.)

I saw:

  -  "Bogus 'Warning: Interface mismatch in global procedure' with C binding"
  -  "[OpenACC][OpenACC 3.3] Add 'acc_attach'/'acc_detach' routine"

> --- a/libgomp/libgomp.texi
> +++ b/libgomp/libgomp.texi

>  @section @code{acc_malloc} -- Allocate device memory.
>  @table @asis
>  @item @emph{Description}
> -This function allocates @var{len} bytes of device memory. It returns
> +This function allocates @var{bytes} of device memory. It returns

Not '@var{bytes} {+bytes+}' or similar?

>  @section @code{acc_memcpy_to_device} -- Copy host memory to device memory.

>  @item @emph{C/C++}:
>  @multitable @columnfractions .20 .80
> -@item @emph{Prototype}: @tab @code{acc_memcpy_to_device(d_void *dest, h_void 
> *src, size_t bytes);}
> +@item @emph{Prototype}: @tab @code{void acc_memcpy_to_device(d_void* 
> data_dev_dest,}
> +@item   @tab @code{h_void* data_host_src, size_t bytes);}
> +@item @emph{Prototype}: @tab @code{void acc_memcpy_to_device_async(d_void* 
> data_dev_dest,}
> +@item   @tab @code{h_void* data_host_src, size_t bytes, int 
> async_arg);}
> +@end multitable
> +
> +@item @emph{Fortran}:
> +@multitable @columnfractions .20 .80
> +@item @emph{Interface}: @tab @code{subroutine 
> acc_memcpy_to_device(data_dev_dest, &}
> +@item   @tab @code{data_host_src, bytes)}
> +@item @emph{Interface}: @tab @code{subroutine 
> acc_memcpy_to_device_async(data_dev_dest, &}
> +@item   @tab @code{data_host_src, bytes, async_arg)}
> +@item   @tab @code{type(c_ptr), value :: data_dev_dest}
> +@item   @tab @code{type(*), dimension(*) :: data_host_src}
> +@item   @tab @code{integer(c_size_t), value :: bytes}
> +@item   @tab @code{integer(acc_handle_kind), value :: 
> async_arg}
>  @end multitable

I did wonder whether we should (here, and elsewhere) also update the
'@menu' in "OpenACC Runtime Library Routines" to list the 'async'
routines -- but the OpenACC specification also doesn't, so it shall be
fine as is here, too.

>  @item @emph{Reference}:
>  @uref{https://www.openacc.org, OpenACC specification v2.6}, section
> -3.2.31.
> +3.2.31  @uref{https://www.openacc.org, OpenACC specification v3.3}, section

(Fine as is, of course, but could -- generally -- simplify the 'diff' by
starting the new '@uref' on its own line.)

> +3.2.26..

Double '.'.

> --- a/libgomp/openacc.f90
> +++ b/libgomp/openacc.f90
> @@ -758,6 +758,93 @@ module openacc_internal
>integer (c_int), value :: async
>  end subroutine
>end interface
> +
> +  interface
> +type(c_ptr) function acc_malloc (bytes) bind(C)
> +[...]
> +end subroutine
> +  end interface
>  end module openacc_internal

Assuming that 'module openacc_internal' currently is sorted per
appearance in the OpenACC specification (?), I suggest we continue to do
so.  (..., like in 'openacc_lib.h', too.)

> @@ -794,6 +881,9 @@ module openacc
>public :: acc_copyin_async, acc_create_async, acc_copyout_async
>public :: acc_delete_async, acc_update_device_async, acc_update_self_async
>public :: acc_copyout_finalize, acc_delete_finalize
> +  public :: acc_malloc, acc_free, acc_map_data, acc_unmap_data, acc_deviceptr
> +  public :: acc_hostptr, acc_memcpy_to_device, acc_memcpy_to_device_async
> +  public :: acc_memcpy_from_device, acc_memcpy_from_device_async

Likewise.

> @@ -871,9 +961,6 @@ module openacc
>  procedure :: acc_on_device_h
>end interface
>  
> -  ! acc_malloc: Only available in C/C++
> -  ! acc_free: Only available in C/C++
> -
>! As vendor extension, the following code supports both 32bit and 64bit
>! arguments for "size"; the OpenACC standard only permits default-kind
>! integers, which are of kind 4 (i.e. 32 bits).
> @@ -953,20 

Stabilizing flaky libgomp GCN target/offloading testing (was: libgomp GCN gfx1030/gfx1100 offloading status)

2024-02-21 Thread Thomas Schwinge
Hi!

On 2024-02-01T15:49:02+0100, Richard Biener  wrote:
> On Thu, 1 Feb 2024, Thomas Schwinge wrote:
>> On 2024-01-26T10:45:10+0100, Richard Biener  wrote:
>> > On Fri, 26 Jan 2024, Richard Biener wrote:
>> >> On Wed, 24 Jan 2024, Andrew Stubbs wrote:
>> >> > [...] is enough to get gfx1100 working for most purposes, on top of the
>> >> > patch that Tobias committed a week or so ago; there are still some test
>> >> > failures to investigate, and probably some tuning to do.
>> >> > 
>> >> > It might also get gfx1030 working too. @Richi, could you test it,
>> >> > please?
>> >> 
>> >> I can report partial success here.  [...]
>> 
>> >> I'll followup with a test summary once the (serial) run of libgomp
>> >> testing finished.
>> 
>> (Why serial, by the way?)
>
> Just out of caution ... (I'm using the GPU for the desktop at the
> same time and dmesg gets spammed with some not-so reassuring
> "errors" during the offloading)

Yeah, indeed 'dmesg' is full of "notes"...

However, note that per my work on <https://gcc.gnu.org/PR66005>
"libgomp make check time is excessive", all execution testing in libgomp
is serialized in 'libgomp/testsuite/lib/libgomp.exp:libgomp_load'.  So,
no problem/difference in that regard, to run parallel
'check-target-libgomp'.  (... with the caveat that execution tests for
effective-targets are *not* governed by that, as I've found yesterday.
I have a WIP hack for that, too.)


>> [...] what I
>> got with '-march=gfx1100' for AMD Radeon RX 7900 XTX.  [...]

>> [...] execution test FAILs.  Not all FAILs appear all the time [...]

What disturbs the testing a lot is that the GPU may get into a bad
state, upon which any use either fails with a
'HSA_STATUS_ERROR_OUT_OF_RESOURCES' error -- or just hangs, deep in
'libhsa-runtime64.so.1'...

I've now tried to debug the latter case (hang).  When the GPU gets into
this bad state (whatever exactly that is),
'hsa_executable_load_code_object' still returns 'HSA_STATUS_SUCCESS', but
then GCN target execution ('gcn-run') hangs in 'hsa_executable_freeze'
vs. GCN offloading execution ('libgomp-plugin-gcn.so.1') hangs right
before 'hsa_executable_freeze', in the GCN heap setup 'hsa_memory_copy'.
There it hangs until killed (for example, until DejaGnu's timeout
mechanism kills the process -- just that the next GPU-using execution
test then runs into the same thing again...).

In this state (and also the 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' state),
we're able to recover via:

$ flock /tmp/gpu.lock sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover
0

This is, obviously, a hack, probably needs a serial lock to not disturb
other things, has hard-coded 'dri/0', and as I said in
<https://inbox.sourceware.org/87plww8qin@euler.schwinge.ddns.net>
"GCN RDNA2+ vs. GCC SLP vectorizer":

| I've no idea what
| 'amdgpu_gpu_recover' would do if the GPU is also used for display.

However, it's very useful in my testing.  :-|

The question is, how to detect the "hang" state without first running
into a timeout (and disambiguating such a timeout from a user code
timeout)?  Add a watchdog: call 'alarm([a few seconds])' before device
initialization, and before the actual GPU kernel launch cancel it with
'alarm(0)'?  (..., and add a handler for 'SIGALRM' to print a distinct
error message that we can then react on, like for
'HSA_STATUS_ERROR_OUT_OF_RESOURCES'.)  Probably 'alarm'/'SIGALRM' is a
no-go in libgomp -- instead, use a helper thread to similarly implement a
watchdog?  ('libgomp/plugin/plugin-gcn.c' already is using pthreads for
other purposes.)  Any other clever ideas?  What's a suitable value for
"a few seconds"?

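To make the helper-thread variant of the watchdog idea concrete, here is a minimal sketch using plain pthreads.  It is not libgomp code: 'struct watchdog', 'watchdog_arm', 'watchdog_disarm', 'watchdog_report_hang', and the error message are all hypothetical names made up for illustration.  The idea is to arm before device initialization and disarm right before the actual GPU kernel launch; if the deadline passes first, the callback prints a distinct message that the test harness could react to.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical watchdog state; none of these names exist in libgomp.  */
struct watchdog
{
  pthread_t thread;
  pthread_mutex_t lock;
  pthread_cond_t cond;
  bool disarmed;              /* Set once the guarded phase has completed.  */
  unsigned timeout_sec;       /* The "few seconds" to wait.  */
  void (*on_timeout) (void);  /* Called if the deadline passes first.  */
};

/* Only read this after 'watchdog_disarm' has joined the helper thread.  */
static int watchdog_fired;

/* Example callback; the message text is an assumption.  */
static void
watchdog_report_hang (void)
{
  watchdog_fired = 1;
  fprintf (stderr, "GCN watchdog: device initialization timed out\n");
}

static void *
watchdog_main (void *arg)
{
  struct watchdog *w = arg;
  struct timespec deadline;
  clock_gettime (CLOCK_REALTIME, &deadline);
  deadline.tv_sec += w->timeout_sec;

  pthread_mutex_lock (&w->lock);
  int err = 0;
  /* Loop to tolerate spurious wakeups; stop on timeout or any error.  */
  while (!w->disarmed && err == 0)
    err = pthread_cond_timedwait (&w->cond, &w->lock, &deadline);
  bool fired = !w->disarmed;
  pthread_mutex_unlock (&w->lock);

  if (fired && w->on_timeout)
    w->on_timeout ();
  return NULL;
}

/* Arm before device initialization.  Returns 0 on success.  */
static int
watchdog_arm (struct watchdog *w, unsigned timeout_sec,
	      void (*on_timeout) (void))
{
  pthread_mutex_init (&w->lock, NULL);
  pthread_cond_init (&w->cond, NULL);
  w->disarmed = false;
  w->timeout_sec = timeout_sec;
  w->on_timeout = on_timeout;
  return pthread_create (&w->thread, NULL, watchdog_main, w);
}

/* Disarm right before the actual kernel launch.  */
static void
watchdog_disarm (struct watchdog *w)
{
  pthread_mutex_lock (&w->lock);
  w->disarmed = true;
  pthread_cond_signal (&w->cond);
  pthread_mutex_unlock (&w->lock);
  pthread_join (w->thread, NULL);
  pthread_cond_destroy (&w->cond);
  pthread_mutex_destroy (&w->lock);
}
```

In a real hang the main thread would never reach 'watchdog_disarm', so the callback would presumably have to terminate the process itself after printing; that part, and the right timeout value, are left open here just as in the questions above.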

Grüße
 Thomas


Re: GCN RDNA2+ vs. GCC SLP vectorizer

2024-02-20 Thread Thomas Schwinge
Hi Richard!

On 2024-02-20T08:44:35+0100, Richard Biener  wrote:
> On Mon, 19 Feb 2024, Thomas Schwinge wrote:
>> On 2024-02-19T17:31:20+0100, I wrote:
>> > On 2024-02-19T11:52:55+0100, Richard Biener  wrote:
>> >> On Mon, 19 Feb 2024, Thomas Schwinge wrote:
>> >>> On 2024-02-16T14:53:04+0100, I wrote:
>> >>> > On 2024-02-16T12:41:06+, Andrew Stubbs  wrote:
>> >>> >> On 16/02/2024 12:26, Richard Biener wrote:
>> >>> >>> On Fri, 16 Feb 2024, Andrew Stubbs wrote:
>> >>> >>>> On 16/02/2024 10:17, Richard Biener wrote:
>> >>> >>>>> On Fri, 16 Feb 2024, Thomas Schwinge wrote:
>> >>> >>>>>> On 2023-10-20T12:51:03+0100, Andrew Stubbs 
>> >>> >>>>>>  wrote:
>> >>> >>>>>>> I've committed this patch
>> >>> >>>>>>
>> >>> >>>>>> ... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691
>> >>> >>>>>> "amdgcn: add -march=gfx1030 EXPERIMENTAL", which the later 
>> >>> >>>>>> RDNA3/gfx1100
>> >>> >>>>>> support builds on top of, and that's what I'm currently working on
>> >>> >>>>>> getting proper GCC/GCN target (not offloading) results for.
>> >>> >>>>>>
>> >>> >>>>>> Now looking at 'gcc.dg/vect/bb-slp-cond-1.c', which is reasonably 
>> >>> >>>>>> simple,
>> >>> >>>>>> and hopefully representative for other SLP execution test FAILs
>> >>> >>>>>> (regressions compared to my earlier non-gfx1100 testing).
>> >>> >>>>>>
>> >>> >>>>>>   $ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/
>> >>> >>>>>>   source-gcc/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
>> >>> >>>>>>   --sysroot=install/amdgcn-amdhsa -ftree-vectorize
>> >>> >>>>>>   -fno-tree-loop-distribute-patterns -fno-vect-cost-model 
>> >>> >>>>>> -fno-common
>> >>> >>>>>>   -O2 -fdump-tree-slp-details -fdump-tree-vect-details 
>> >>> >>>>>> -isystem
>> >>> >>>>>>   build-gcc/amdgcn-amdhsa/gfx1100/newlib/targ-include -isystem
>> >>> >>>>>>   source-gcc/newlib/libc/include
>> >>> >>>>>>   -Bbuild-gcc/amdgcn-amdhsa/gfx1100/newlib/
>> >>> >>>>>>   -Lbuild-gcc/amdgcn-amdhsa/gfx1100/newlib -wrapper
>> >>> >>>>>>   setarch,--addr-no-randomize -fdump-tree-all-all 
>> >>> >>>>>> -fdump-ipa-all-all
>> >>> >>>>>>   -fdump-rtl-all-all -save-temps -march=gfx1100
>> >>> >>>>>>
>> >>> >>>>>> The '-march=gfx1030' 'a-bb-slp-cond-1.s' is identical (apart from
>> >>> >>>>>> 'TARGET_PACKED_WORK_ITEMS' in 
>> >>> >>>>>> 'gcn_target_asm_function_prologue'), so I
>> >>> >>>>>> suppose will also exhibit the same failure mode, once again?
>> >>> >>>>>>
>> >>> >>>>>> Compared to '-march=gfx90a', the differences begin in
>> >>> >>>>>> 'a-bb-slp-cond-1.c.266r.expand' (only!), down to 
>> >>> >>>>>> 'a-bb-slp-cond-1.s'.
>> >>> >>>>>>
>> >>> >>>>>> Changed like:
>> >>> >>>>>>
>> >>> >>>>>>   @@ -38,10 +38,10 @@ int main ()
>> >>> >>>>>>#pragma GCC novector
>> >>> >>>>>>  for (i = 1; i < N; i++)
>> >>> >>>>>>if (a[i] != i%4 + 1)
>> >>> >>>>>>   -  abort ();
>> >>> >>>>>>   +  __builtin_printf("%d %d != %d\n", i, a[i], i%4 + 1);
>> >>> >>>>>>
>> >>> >>>>>>  if (a[0] != 5)
>> >>> >>>>>>   -abort ();
>> >>> >>>>>>   +__builtin_printf("%d %d !=

GCN: Restore lost '__gfx90a__' target CPU definition (was: [Patch] GCN: Add pre-initial support for gfx1100)

2024-02-19 Thread Thomas Schwinge
Hi!

On 2024-01-07T20:20:19+0100, Tobias Burnus  wrote:
> --- a/gcc/config/gcn/gcn.h
> +++ b/gcc/config/gcn/gcn.h
> @@ -30,6 +30,8 @@
>  	builtin_define ("__CDNA2__");  \
>    else if (TARGET_RDNA2)  \
>  	builtin_define ("__RDNA2__");  \
> +  else if (TARGET_RDNA3)  \
> +	builtin_define ("__RDNA3__");  \
>    if (TARGET_FIJI)  \
>  	{  \
>  	  builtin_define ("__fiji__");  \
> @@ -41,11 +43,13 @@
>  	builtin_define ("__gfx906__");  \
>    else if (TARGET_GFX908)  \
>  	builtin_define ("__gfx908__");  \
> -  else if (TARGET_GFX90a)  \
> -	builtin_define ("__gfx90a__");  \
> +  else if (TARGET_GFX1030)  \
> +	builtin_define ("__gfx1030");  \
> +  else if (TARGET_GFX1100)  \
> +	builtin_define ("__gfx1100__");  \
>    } while (0)

Supposedly it wasn't intentional that we lost gfx90a here -- I've pushed
to master branch commit 159174f25716c18a74a915cb01b9a28024ea7a3d
"GCN: Restore lost '__gfx90a__' target CPU definition", see attached.

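For context on why the lost definition matters: user code can only dispatch on the device at preprocessing time if the macro is actually defined.  A minimal sketch, using only the macro names visible in the diff in this message ('gcn_device_name' and 'gcn_is_rdna' are hypothetical helpers, not part of GCC or newlib):

```c
/* Returns the device the translation unit was compiled for, based on the
   predefined macros shown in the diff; "unknown" on any other target.  */
static const char *
gcn_device_name (void)
{
#if defined (__fiji__)
  return "fiji";
#elif defined (__gfx906__)
  return "gfx906";
#elif defined (__gfx908__)
  return "gfx908";
#elif defined (__gfx90a__)
  /* Without the restored definition, gfx90a builds fall through to
     "unknown" below.  */
  return "gfx90a";
#elif defined (__gfx1030)
  /* Spelled without trailing underscores, as in the diff.  */
  return "gfx1030";
#elif defined (__gfx1100__)
  return "gfx1100";
#else
  return "unknown";
#endif
}

/* Nonzero if compiled for one of the RDNA ISA families from the diff.  */
static int
gcn_is_rdna (void)
{
#if defined (__RDNA2__) || defined (__RDNA3__)
  return 1;
#else
  return 0;
#endif
}
```

Compiled for a non-GCN host, both helpers take their fallback branches.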

Grüße
 Thomas


From 159174f25716c18a74a915cb01b9a28024ea7a3d Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 8 Feb 2024 23:27:19 +0100
Subject: [PATCH] GCN: Restore lost '__gfx90a__' target CPU definition

Also, add some safeguards for the future.

Fix-up for commit 52a2c659ae6c21f84b6acce0afcb9b93b9dc71a0
"GCN: Add pre-initial support for gfx1100".

	gcc/
	* config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Restore lost
	'__gfx90a__' target CPU definition.  Add some safeguards for the future.
---
 gcc/config/gcn/gcn.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/config/gcn/gcn.h b/gcc/config/gcn/gcn.h
index a17f16aacc40..c314c7b4ae8e 100644
--- a/gcc/config/gcn/gcn.h
+++ b/gcc/config/gcn/gcn.h
@@ -32,6 +32,8 @@
 	builtin_define ("__RDNA2__");  \
   else if (TARGET_RDNA3)   \
 	builtin_define ("__RDNA3__");  \
+  else \
+	gcc_unreachable ();\
   if (TARGET_FIJI) \
 	{  \
 	  builtin_define ("__fiji__"); \
@@ -43,10 +45,14 @@
 	builtin_define ("__gfx906__"); \
   else if (TARGET_GFX908)  \
 	builtin_define ("__gfx908__"); \
+  else if (TARGET_GFX90a)  \
+	builtin_define ("__gfx90a__"); \
   else if (TARGET_GFX1030) \
 	builtin_define ("__gfx1030");  \
   else if (TARGET_GFX1100) \
 	builtin_define ("__gfx1100__");\
+  else \
+	gcc_unreachable ();\
   } while (0)
 
 #define ASSEMBLER_DIALECT (TARGET_RDNA2_PLUS ? 1 : 0)
-- 
2.43.0



Re: GCN RDNA2+ vs. GCC SLP vectorizer

2024-02-19 Thread Thomas Schwinge
Hi!

On 2024-02-19T17:31:20+0100, I wrote:
> On 2024-02-19T11:52:55+0100, Richard Biener  wrote:
>> On Mon, 19 Feb 2024, Thomas Schwinge wrote:
>>> On 2024-02-16T14:53:04+0100, I wrote:
>>> > On 2024-02-16T12:41:06+, Andrew Stubbs  wrote:
>>> >> On 16/02/2024 12:26, Richard Biener wrote:
>>> >>> On Fri, 16 Feb 2024, Andrew Stubbs wrote:
>>> >>>> On 16/02/2024 10:17, Richard Biener wrote:
>>> >>>>> On Fri, 16 Feb 2024, Thomas Schwinge wrote:
>>> >>>>>> On 2023-10-20T12:51:03+0100, Andrew Stubbs  
>>> >>>>>> wrote:
>>> >>>>>>> I've committed this patch
>>> >>>>>>
>>> >>>>>> ... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691
>>> >>>>>> "amdgcn: add -march=gfx1030 EXPERIMENTAL", which the later 
>>> >>>>>> RDNA3/gfx1100
>>> >>>>>> support builds on top of, and that's what I'm currently working on
>>> >>>>>> getting proper GCC/GCN target (not offloading) results for.
>>> >>>>>>
>>> >>>>>> Now looking at 'gcc.dg/vect/bb-slp-cond-1.c', which is reasonably 
>>> >>>>>> simple,
>>> >>>>>> and hopefully representative for other SLP execution test FAILs
>>> >>>>>> (regressions compared to my earlier non-gfx1100 testing).
>>> >>>>>>
>>> >>>>>>   $ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/
>>> >>>>>>   source-gcc/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
>>> >>>>>>   --sysroot=install/amdgcn-amdhsa -ftree-vectorize
>>> >>>>>>   -fno-tree-loop-distribute-patterns -fno-vect-cost-model 
>>> >>>>>> -fno-common
>>> >>>>>>   -O2 -fdump-tree-slp-details -fdump-tree-vect-details -isystem
>>> >>>>>>   build-gcc/amdgcn-amdhsa/gfx1100/newlib/targ-include -isystem
>>> >>>>>>   source-gcc/newlib/libc/include
>>> >>>>>>   -Bbuild-gcc/amdgcn-amdhsa/gfx1100/newlib/
>>> >>>>>>   -Lbuild-gcc/amdgcn-amdhsa/gfx1100/newlib -wrapper
>>> >>>>>>   setarch,--addr-no-randomize -fdump-tree-all-all 
>>> >>>>>> -fdump-ipa-all-all
>>> >>>>>>   -fdump-rtl-all-all -save-temps -march=gfx1100
>>> >>>>>>
>>> >>>>>> The '-march=gfx1030' 'a-bb-slp-cond-1.s' is identical (apart from
>>> >>>>>> 'TARGET_PACKED_WORK_ITEMS' in 'gcn_target_asm_function_prologue'), 
>>> >>>>>> so I
>>> >>>>>> suppose will also exhibit the same failure mode, once again?
>>> >>>>>>
>>> >>>>>> Compared to '-march=gfx90a', the differences begin in
>>> >>>>>> 'a-bb-slp-cond-1.c.266r.expand' (only!), down to 'a-bb-slp-cond-1.s'.
>>> >>>>>>
>>> >>>>>> Changed like:
>>> >>>>>>
>>> >>>>>>   @@ -38,10 +38,10 @@ int main ()
>>> >>>>>>#pragma GCC novector
>>> >>>>>>  for (i = 1; i < N; i++)
>>> >>>>>>if (a[i] != i%4 + 1)
>>> >>>>>>   -  abort ();
>>> >>>>>>   +  __builtin_printf("%d %d != %d\n", i, a[i], i%4 + 1);
>>> >>>>>>
>>> >>>>>>  if (a[0] != 5)
>>> >>>>>>   -abort ();
>>> >>>>>>   +__builtin_printf("%d %d != %d\n", 0, a[0], 5);
>>> >>>>>>
>>> >>>>>> ..., we see:
>>> >>>>>>
>>> >>>>>>   $ flock /tmp/gcn.lock build-gcc/gcc/gcn-run a.out
>>> >>>>>>   40 5 != 1
>>> >>>>>>   41 6 != 2
>>> >>>>>>   42 7 != 3
>>> >>>>>>   43 8 != 4
>>> >>>>>>   44 5 != 1
>>> >>>>>>   45 6 != 2
>>> >>>>>>   46 7 != 3
>>> >>>>>>   47 8 != 4
>>> >>>>>>
>>>

Re: GCN RDNA2+ vs. GCC SLP vectorizer

2024-02-19 Thread Thomas Schwinge
Hi!

On 2024-02-19T11:52:55+0100, Richard Biener  wrote:
> On Mon, 19 Feb 2024, Thomas Schwinge wrote:
>> On 2024-02-16T14:53:04+0100, I wrote:
>> > On 2024-02-16T12:41:06+, Andrew Stubbs  wrote:
>> >> On 16/02/2024 12:26, Richard Biener wrote:
>> >>> On Fri, 16 Feb 2024, Andrew Stubbs wrote:
>> >>>> On 16/02/2024 10:17, Richard Biener wrote:
>> >>>>> On Fri, 16 Feb 2024, Thomas Schwinge wrote:
>> >>>>>> On 2023-10-20T12:51:03+0100, Andrew Stubbs  
>> >>>>>> wrote:
>> >>>>>>> I've committed this patch
>> >>>>>>
>> >>>>>> ... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691
>> >>>>>> "amdgcn: add -march=gfx1030 EXPERIMENTAL", which the later 
>> >>>>>> RDNA3/gfx1100
>> >>>>>> support builds on top of, and that's what I'm currently working on
>> >>>>>> getting proper GCC/GCN target (not offloading) results for.
>> >>>>>>
>> >>>>>> Now looking at 'gcc.dg/vect/bb-slp-cond-1.c', which is reasonably 
>> >>>>>> simple,
>> >>>>>> and hopefully representative for other SLP execution test FAILs
>> >>>>>> (regressions compared to my earlier non-gfx1100 testing).
>> >>>>>>
>> >>>>>>   $ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/
>> >>>>>>   source-gcc/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
>> >>>>>>   --sysroot=install/amdgcn-amdhsa -ftree-vectorize
>> >>>>>>   -fno-tree-loop-distribute-patterns -fno-vect-cost-model 
>> >>>>>> -fno-common
>> >>>>>>   -O2 -fdump-tree-slp-details -fdump-tree-vect-details -isystem
>> >>>>>>   build-gcc/amdgcn-amdhsa/gfx1100/newlib/targ-include -isystem
>> >>>>>>   source-gcc/newlib/libc/include
>> >>>>>>   -Bbuild-gcc/amdgcn-amdhsa/gfx1100/newlib/
>> >>>>>>   -Lbuild-gcc/amdgcn-amdhsa/gfx1100/newlib -wrapper
>> >>>>>>   setarch,--addr-no-randomize -fdump-tree-all-all 
>> >>>>>> -fdump-ipa-all-all
>> >>>>>>   -fdump-rtl-all-all -save-temps -march=gfx1100
>> >>>>>>
>> >>>>>> The '-march=gfx1030' 'a-bb-slp-cond-1.s' is identical (apart from
>> >>>>>> 'TARGET_PACKED_WORK_ITEMS' in 'gcn_target_asm_function_prologue'), so 
>> >>>>>> I
>> >>>>>> suppose will also exhibit the same failure mode, once again?
>> >>>>>>
>> >>>>>> Compared to '-march=gfx90a', the differences begin in
>> >>>>>> 'a-bb-slp-cond-1.c.266r.expand' (only!), down to 'a-bb-slp-cond-1.s'.
>> >>>>>>
>> >>>>>> Changed like:
>> >>>>>>
>> >>>>>>   @@ -38,10 +38,10 @@ int main ()
>> >>>>>>#pragma GCC novector
>> >>>>>>  for (i = 1; i < N; i++)
>> >>>>>>if (a[i] != i%4 + 1)
>> >>>>>>   -  abort ();
>> >>>>>>   +  __builtin_printf("%d %d != %d\n", i, a[i], i%4 + 1);
>> >>>>>>
>> >>>>>>  if (a[0] != 5)
>> >>>>>>   -abort ();
>> >>>>>>   +__builtin_printf("%d %d != %d\n", 0, a[0], 5);
>> >>>>>>
>> >>>>>> ..., we see:
>> >>>>>>
>> >>>>>>   $ flock /tmp/gcn.lock build-gcc/gcc/gcn-run a.out
>> >>>>>>   40 5 != 1
>> >>>>>>   41 6 != 2
>> >>>>>>   42 7 != 3
>> >>>>>>   43 8 != 4
>> >>>>>>   44 5 != 1
>> >>>>>>   45 6 != 2
>> >>>>>>   46 7 != 3
>> >>>>>>   47 8 != 4
>> >>>>>>
>> >>>>>> '40..47' are the 'i = 10..11' in 'foo', and the expectation is
>> >>>>>> 'a[i * stride + 0..3] != 0'.  So, either some earlier iteration has
>> >>>>>> scribbled zero values over these (vector lane masking issue, 
>> >>>>>> p

Re: GCN RDNA2+ vs. GCC SLP vectorizer

2024-02-19 Thread Thomas Schwinge
Hi!

On 2024-02-16T14:53:04+0100, I wrote:
> On 2024-02-16T12:41:06+, Andrew Stubbs  wrote:
>> On 16/02/2024 12:26, Richard Biener wrote:
>>> On Fri, 16 Feb 2024, Andrew Stubbs wrote:
>>>> On 16/02/2024 10:17, Richard Biener wrote:
>>>>> On Fri, 16 Feb 2024, Thomas Schwinge wrote:
>>>>>> On 2023-10-20T12:51:03+0100, Andrew Stubbs  wrote:
>>>>>>> I've committed this patch
>>>>>>
>>>>>> ... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691
>>>>>> "amdgcn: add -march=gfx1030 EXPERIMENTAL", which the later RDNA3/gfx1100
>>>>>> support builds on top of, and that's what I'm currently working on
>>>>>> getting proper GCC/GCN target (not offloading) results for.
>>>>>>
>>>>>> Now looking at 'gcc.dg/vect/bb-slp-cond-1.c', which is reasonably simple,
>>>>>> and hopefully representative for other SLP execution test FAILs
>>>>>> (regressions compared to my earlier non-gfx1100 testing).
>>>>>>
>>>>>>   $ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/
>>>>>>   source-gcc/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
>>>>>>   --sysroot=install/amdgcn-amdhsa -ftree-vectorize
>>>>>>   -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common
>>>>>>   -O2 -fdump-tree-slp-details -fdump-tree-vect-details -isystem
>>>>>>   build-gcc/amdgcn-amdhsa/gfx1100/newlib/targ-include -isystem
>>>>>>   source-gcc/newlib/libc/include
>>>>>>   -Bbuild-gcc/amdgcn-amdhsa/gfx1100/newlib/
>>>>>>   -Lbuild-gcc/amdgcn-amdhsa/gfx1100/newlib -wrapper
>>>>>>   setarch,--addr-no-randomize -fdump-tree-all-all -fdump-ipa-all-all
>>>>>>   -fdump-rtl-all-all -save-temps -march=gfx1100
>>>>>>
>>>>>> The '-march=gfx1030' 'a-bb-slp-cond-1.s' is identical (apart from
>>>>>> 'TARGET_PACKED_WORK_ITEMS' in 'gcn_target_asm_function_prologue'), so I
>>>>>> suppose will also exhibit the same failure mode, once again?
>>>>>>
>>>>>> Compared to '-march=gfx90a', the differences begin in
>>>>>> 'a-bb-slp-cond-1.c.266r.expand' (only!), down to 'a-bb-slp-cond-1.s'.
>>>>>>
>>>>>> Changed like:
>>>>>>
>>>>>>   @@ -38,10 +38,10 @@ int main ()
>>>>>>#pragma GCC novector
>>>>>>  for (i = 1; i < N; i++)
>>>>>>if (a[i] != i%4 + 1)
>>>>>>   -  abort ();
>>>>>>   +  __builtin_printf("%d %d != %d\n", i, a[i], i%4 + 1);
>>>>>>
>>>>>>  if (a[0] != 5)
>>>>>>   -abort ();
>>>>>>   +__builtin_printf("%d %d != %d\n", 0, a[0], 5);
>>>>>>
>>>>>> ..., we see:
>>>>>>
>>>>>>   $ flock /tmp/gcn.lock build-gcc/gcc/gcn-run a.out
>>>>>>   40 5 != 1
>>>>>>   41 6 != 2
>>>>>>   42 7 != 3
>>>>>>   43 8 != 4
>>>>>>   44 5 != 1
>>>>>>   45 6 != 2
>>>>>>   46 7 != 3
>>>>>>   47 8 != 4
>>>>>>
>>>>>> '40..47' are the 'i = 10..11' in 'foo', and the expectation is
>>>>>> 'a[i * stride + 0..3] != 0'.  So, either some earlier iteration has
>>>>>> scribbled zero values over these (vector lane masking issue, perhaps?),
>>>>>> or some other code generation issue?
>
>>>> [...], I must be doing something different because vect/bb-slp-cond-1.c
>>>> passes for me, on gfx1100.
>
> That's strange.  I've looked at your log file (looks good), and used your
> toolchain to compile, and your 'gcn-run' to invoke, and still do get:
>
> $ flock /tmp/gcn.lock ~/gcn-run ~/bb-slp-cond-1.exe
> GCN Kernel Aborted
> Kernel aborted
>
> Andrew, later on, please try what happens when you put an unconditional
> 'abort' call into a test case?

Andrew, any luck with that yet?

Richard, are you able to reproduce the 'gcc.dg/vect/bb-slp-cond-1.c'
execution test failure mentioned above (manual compilation and
'gcn-run')?


Grüße
 Thomas


>>> I didn't try to run it - when doing make check-gcc fails to using
>>> gcn-run for test invocation
>
> N

GCN: Conditionalize 'define_expand "reduc__scal_"' on '!TARGET_RDNA2_PLUS' [PR113615] (was: [patch] gcn/gcn-valu.md: Disable fold_left_plus for TARGET_RDNA2_PLUS [PR113615])

2024-02-16 Thread Thomas Schwinge
Hi!

On 2024-01-29T11:34:05+0100, Tobias Burnus  wrote:
> Andrew wrote off list:
>"Vector reductions don't work on RDNA, as is, but they're
> supposed to be disabled by the insn condition"
>
> This patch disables "fold_left_plus_", which is about
> vectorization and in the code path shown in the backtrace.
> I can also confirm manually that it fixes the ICE I saw and
> also the ICE for the testfile that Richard's PR shows at the
> end of his backtrace.  (-O3 is needed to trigger the ICE.)

On top of that, OK to push the attached
"GCN: Conditionalize 'define_expand "reduc__scal_"' on 
'!TARGET_RDNA2_PLUS' [PR113615]"?

Which of the 'assert's are worth keeping?

Only tested 'vect.exp' for 'check-gcc-c' so far; full testing to run
later.

Please confirm I'm understanding this correctly:

Andrew's original commit c7ec7bd1c6590cf4eed267feab490288e0b8d691
"amdgcn: add -march=gfx1030 EXPERIMENTAL" did this:

 (define_expand "reduc__scal_"
   [(set (match_operand: 0 "register_operand")
(unspec:
  [(match_operand:V_ALL 1 "register_operand")]
  REDUC_UNSPEC))]
-  ""
+  "!TARGET_RDNA2" [later '!TARGET_RDNA2_PLUS']
   {
 [...]

This conditional, however, does *not* govern any explicit
'gen_reduc_plus_scal_', and therefore Tobias in
commit 7cc2262ec9a410dc56d1c1c6b950c922e14f621d
"gcn/gcn-valu.md: Disable fold_left_plus for TARGET_RDNA2_PLUS [PR113615]"
had to replicate the '!TARGET_RDNA2_PLUS' condition:

> @@ -4274,7 +4274,8 @@ (define_expand "fold_left_plus_"
>   [(match_operand: 0 "register_operand")
>(match_operand: 1 "gcn_alu_operand")
>(match_operand:V_FP 2 "gcn_alu_operand")]
> -  "can_create_pseudo_p ()
> +  "!TARGET_RDNA2_PLUS
> +   && can_create_pseudo_p ()
> && (flag_openacc || flag_openmp
> || flag_associative_math)"
>{
|  rtx dest = operands[0];
|  rtx scalar = operands[1];
|  rtx vector = operands[2];
|  rtx tmp = gen_reg_rtx (mode);
|  
|  emit_insn (gen_reduc_plus_scal_ (tmp, vector));
|  [...]

..., and I thus now have to do similar for
'gen_reduc__scal_' use in here:

 (define_expand "reduc__scal_"
   [(match_operand: 0 "register_operand")
(fminmaxop:V_FP
  (match_operand:V_FP 1 "register_operand"))]
-  ""
+  "!TARGET_RDNA2_PLUS"
   {
 /* fmin/fmax are identical to smin/smax.  */
 emit_insn (gen_reduc__scal_ (operands[0], operands[1]));
 [...]


Grüße
 Thomas


From 1ca37da07f0fd3fa2e87fcbde9f2c2aadbe320dc Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 16 Feb 2024 13:04:00 +0100
Subject: [PATCH] GCN: Conditionalize 'define_expand
 "reduc__scal_"' on '!TARGET_RDNA2_PLUS' [PR113615]

On top of commit c7ec7bd1c6590cf4eed267feab490288e0b8d691
"amdgcn: add -march=gfx1030 EXPERIMENTAL" conditionalizing
'define_expand "reduc__scal_"' on
'!TARGET_RDNA2' (later: '!TARGET_RDNA2_PLUS'), we then did similar in
commit 7cc2262ec9a410dc56d1c1c6b950c922e14f621d
"gcn/gcn-valu.md: Disable fold_left_plus for TARGET_RDNA2_PLUS [PR113615]"
to conditionalize 'define_expand "fold_left_plus_"' on
'!TARGET_RDNA2_PLUS', but I found we also need to conditionalize the related
'define_expand "reduc__scal_"' on '!TARGET_RDNA2_PLUS', to
avoid ICEs like:

[...]/gcc.dg/vect/pr108608.c: In function 'foo':
[...]/gcc.dg/vect/pr108608.c:9:1: error: unrecognizable insn:
(insn 34 33 35 2 (set (reg:V64DF 723)
(unspec:V64DF [
(reg:V64DF 690 [ vect_m_11.20 ])
(const_int 1 [0x1])
] UNSPEC_MOV_DPP_SHR)) -1
 (nil))
during RTL pass: vregs

Similar for 'gcc.dg/vect/vect-fmax-2.c', 'gcc.dg/vect/vect-fmin-2.c', and
'UNSPEC_SMAX_DPP_SHR' for 'gcc.dg/vect/vect-fmax-1.c', and
'UNSPEC_SMIN_DPP_SHR' for 'gcc.dg/vect/vect-fmin-1.c', when running 'vect.exp'
for 'check-gcc-c'.

	PR target/113615
	gcc/
	* config/gcn/gcn-valu.md (define_expand "reduc__scal_"):
	Conditionalize on '!TARGET_RDNA2_PLUS'.
---
 gcc/config/gcn/gcn-valu.md | 6 +-
 gcc/config/gcn/gcn.cc  | 4 
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 59e27d0aed79..973a72e3fc41 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -4247,6 +4247,8 @@
 	  REDUC_UNSPEC))]
   "!TARGET_RDNA2_PLUS"
   {
+gcc_checking_assert (!TARGET_RDNA2_PLUS);
+
 rtx tmp = gcn_expand_reduc_scalar (mode, operands[1],
    );
 
@@ -4261,8 +4263,10 @@
   [(match_operand: 0 "register_operand")
(fminmaxop:V_FP
  (match_operand:V_FP 1 &q

Re: GCN RDNA2+ vs. GCC SLP vectorizer

2024-02-16 Thread Thomas Schwinge
Hi!

On 2024-02-16T12:41:06+, Andrew Stubbs  wrote:
> On 16/02/2024 12:26, Richard Biener wrote:
>> On Fri, 16 Feb 2024, Andrew Stubbs wrote:
>>> On 16/02/2024 10:17, Richard Biener wrote:
>>>> On Fri, 16 Feb 2024, Thomas Schwinge wrote:
>>>>> On 2023-10-20T12:51:03+0100, Andrew Stubbs  wrote:
>>>>>> I've committed this patch
>>>>>
>>>>> ... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691
>>>>> "amdgcn: add -march=gfx1030 EXPERIMENTAL", which the later RDNA3/gfx1100
>>>>> support builds on top of, and that's what I'm currently working on
>>>>> getting proper GCC/GCN target (not offloading) results for.
>>>>>
>>>>> Now looking at 'gcc.dg/vect/bb-slp-cond-1.c', which is reasonably simple,
>>>>> and hopefully representative for other SLP execution test FAILs
>>>>> (regressions compared to my earlier non-gfx1100 testing).
>>>>>
>>>>>   $ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/
>>>>>   source-gcc/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
>>>>>   --sysroot=install/amdgcn-amdhsa -ftree-vectorize
>>>>>   -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common
>>>>>   -O2 -fdump-tree-slp-details -fdump-tree-vect-details -isystem
>>>>>   build-gcc/amdgcn-amdhsa/gfx1100/newlib/targ-include -isystem
>>>>>   source-gcc/newlib/libc/include
>>>>>   -Bbuild-gcc/amdgcn-amdhsa/gfx1100/newlib/
>>>>>   -Lbuild-gcc/amdgcn-amdhsa/gfx1100/newlib -wrapper
>>>>>   setarch,--addr-no-randomize -fdump-tree-all-all -fdump-ipa-all-all
>>>>>   -fdump-rtl-all-all -save-temps -march=gfx1100
>>>>>
>>>>> The '-march=gfx1030' 'a-bb-slp-cond-1.s' is identical (apart from
>>>>> 'TARGET_PACKED_WORK_ITEMS' in 'gcn_target_asm_function_prologue'), so I
>>>>> suppose will also exhibit the same failure mode, once again?
>>>>>
>>>>> Compared to '-march=gfx90a', the differences begin in
>>>>> 'a-bb-slp-cond-1.c.266r.expand' (only!), down to 'a-bb-slp-cond-1.s'.
>>>>>
>>>>> Changed like:
>>>>>
>>>>>   @@ -38,10 +38,10 @@ int main ()
>>>>>#pragma GCC novector
>>>>>  for (i = 1; i < N; i++)
>>>>>if (a[i] != i%4 + 1)
>>>>>   -  abort ();
>>>>>   +  __builtin_printf("%d %d != %d\n", i, a[i], i%4 + 1);
>>>>>
>>>>>  if (a[0] != 5)
>>>>>   -abort ();
>>>>>   +__builtin_printf("%d %d != %d\n", 0, a[0], 5);
>>>>>
>>>>> ..., we see:
>>>>>
>>>>>   $ flock /tmp/gcn.lock build-gcc/gcc/gcn-run a.out
>>>>>   40 5 != 1
>>>>>   41 6 != 2
>>>>>   42 7 != 3
>>>>>   43 8 != 4
>>>>>   44 5 != 1
>>>>>   45 6 != 2
>>>>>   46 7 != 3
>>>>>   47 8 != 4
>>>>>
>>>>> '40..47' are the 'i = 10..11' in 'foo', and the expectation is
>>>>> 'a[i * stride + 0..3] != 0'.  So, either some earlier iteration has
>>>>> scribbled zero values over these (vector lane masking issue, perhaps?),
>>>>> or some other code generation issue?

>>> [...], I must be doing something different because vect/bb-slp-cond-1.c
>>> passes for me, on gfx1100.

That's strange.  I've looked at your log file (looks good), and used your
toolchain to compile, and your 'gcn-run' to invoke, and still do get:

$ flock /tmp/gcn.lock ~/gcn-run ~/bb-slp-cond-1.exe
GCN Kernel Aborted
Kernel aborted

Andrew, later on, please try what happens when you put an unconditional
'abort' call into a test case?

>> I didn't try to run it - when doing make check-gcc fails to using
>> gcn-run for test invocation

Note that for such individual test cases, invoking the compiler and then
'gcn-run' manually would seem easiest?

>> what's the trick to make it do that?

I can tell you've probably not done much "embedded" or simulator testing of
GCC targets?  ;-P

> There's a config file for nvptx here: 
> https://github.com/SourceryTools/nvptx-tools/blob/master/nvptx-none-run.exp

Yes, and I have pending some updates to that one, to be finished once
I've generally got my testing set up again, to a sufficient degree...

> You can probably make the obvious 

GCN RDNA2+ vs. GCC SLP vectorizer (was: [committed] amdgcn: add -march=gfx1030 EXPERIMENTAL)

2024-02-16 Thread Thomas Schwinge
Hi!

On 2023-10-20T12:51:03+0100, Andrew Stubbs  wrote:
> I've committed this patch

... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691
"amdgcn: add -march=gfx1030 EXPERIMENTAL", which the later RDNA3/gfx1100
support builds on top of, and that's what I'm currently working on
getting proper GCC/GCN target (not offloading) results for.

Now looking at 'gcc.dg/vect/bb-slp-cond-1.c', which is reasonably simple,
and hopefully representative for other SLP execution test FAILs
(regressions compared to my earlier non-gfx1100 testing).

$ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/ 
source-gcc/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c 
--sysroot=install/amdgcn-amdhsa -ftree-vectorize 
-fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 
-fdump-tree-slp-details -fdump-tree-vect-details -isystem 
build-gcc/amdgcn-amdhsa/gfx1100/newlib/targ-include -isystem 
source-gcc/newlib/libc/include -Bbuild-gcc/amdgcn-amdhsa/gfx1100/newlib/ 
-Lbuild-gcc/amdgcn-amdhsa/gfx1100/newlib -wrapper setarch,--addr-no-randomize 
-fdump-tree-all-all -fdump-ipa-all-all -fdump-rtl-all-all -save-temps 
-march=gfx1100

The '-march=gfx1030' 'a-bb-slp-cond-1.s' is identical (apart from
'TARGET_PACKED_WORK_ITEMS' in 'gcn_target_asm_function_prologue'), so I
suppose will also exhibit the same failure mode, once again?

Compared to '-march=gfx90a', the differences begin in
'a-bb-slp-cond-1.c.266r.expand' (only!), down to 'a-bb-slp-cond-1.s'.

Changed like:

@@ -38,10 +38,10 @@ int main ()
 #pragma GCC novector
   for (i = 1; i < N; i++)
 if (a[i] != i%4 + 1)
-  abort ();
+  __builtin_printf("%d %d != %d\n", i, a[i], i%4 + 1);
 
   if (a[0] != 5)
-abort ();
+__builtin_printf("%d %d != %d\n", 0, a[0], 5);

..., we see:

$ flock /tmp/gcn.lock build-gcc/gcc/gcn-run a.out
40 5 != 1
41 6 != 2
42 7 != 3
43 8 != 4
44 5 != 1
45 6 != 2
46 7 != 3
47 8 != 4

'40..47' are the 'i = 10..11' in 'foo', and the expectation is
'a[i * stride + 0..3] != 0'.  So, either some earlier iteration has
scribbled zero values over these (vector lane masking issue, perhaps?),
or some other code generation issue?
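To spell out how the printed lines map back to 'foo', here is a scalar model.  It is an assumption reconstructed from the check in 'main' and the output above, not the verbatim test source: the value of 'N', the stride of 4, the initialization 'a[i] = i', and the exact shape of the conditional stores are guesses that happen to reproduce the observed numbers.

```c
#define N 128  /* Assumed array size.  */

/* Assumed scalar shape of 'foo': each group of four elements gets
   1..4 where the old value was nonzero, and 5..8 where it was zero.  */
static void
foo_model (int *a, int stride)
{
  for (int i = 0; i < N / stride; i++, a += stride)
    {
      a[0] = a[0] ? 1 : 5;
      a[1] = a[1] ? 2 : 6;
      a[2] = a[2] ? 3 : 7;
      a[3] = a[3] ? 4 : 8;
    }
}

/* The check from the modified 'main', counting instead of printing.  */
static int
count_mismatches (const int *a)
{
  int bad = 0;
  for (int i = 1; i < N; i++)
    if (a[i] != i % 4 + 1)
      bad++;
  if (a[0] != 5)
    bad++;
  return bad;
}

/* Runs the model on 'a[i] = i'; if ZERO_GROUPS is nonzero, elements
   40..47 (that is, 'i = 10..11' with stride 4) are zeroed first, to
   mimic what the GPU run appears to have seen.  */
static int
run_model (int zero_groups, int *a)
{
  for (int i = 0; i < N; i++)
    a[i] = i;
  if (zero_groups)
    for (int i = 40; i < 48; i++)
      a[i] = 0;
  foo_model (a, 4);
  return count_mismatches (a);
}
```

With the inputs intact the model reports no mismatches; with elements 40..47 zeroed it reports exactly eight, with the values 5, 6, 7, 8, 5, 6, 7, 8, matching the 'gcn-run' output.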


Grüße
 Thomas


Re: GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts"

2024-02-15 Thread Thomas Schwinge
Hi!

On 2024-02-15T08:49:17+0100, Richard Biener  wrote:
> On Wed, 14 Feb 2024, Andrew Stubbs wrote:
>> On 14/02/2024 13:43, Richard Biener wrote:
>> > On Wed, 14 Feb 2024, Andrew Stubbs wrote:
>> >> On 14/02/2024 13:27, Richard Biener wrote:
>> >>> On Wed, 14 Feb 2024, Andrew Stubbs wrote:
>> >>>> On 13/02/2024 08:26, Richard Biener wrote:
>> >>>>> On Mon, 12 Feb 2024, Thomas Schwinge wrote:
>> >>>>>> On 2023-10-20T12:51:03+0100, Andrew Stubbs 
>> >>>>>> wrote:
>> >>>>>>> I've committed this patch
>> >>>>>>
>> >>>>>> ... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691
>> >>>>>> "amdgcn: add -march=gfx1030 EXPERIMENTAL".
>> >>>>>>
>> >>>>>> The RDNA2 ISA variant doesn't support certain instructions previously
>> >>>>>> implemented in GCC/GCN, so a number of patterns etc. had to be
>> >>>>>> disabled:
>> >>>>>>
>> >>>>>>> [...] Vector
>> >>>>>>> reductions will need to be reworked for RDNA2.  [...]
>> >>>>>>
>> >>>>>>>* config/gcn/gcn-valu.md (@dpp_move): Disable for RDNA2.
>> >>>>>>>(addc3): Add RDNA2 syntax variant.
>> >>>>>>>(subc3): Likewise.
>> >>>>>>>(2_exec): Add RDNA2 alternatives.
>> >>>>>>>(vec_cmpdi): Likewise.
>> >>>>>>>(vec_cmpdi): Likewise.
>> >>>>>>>(vec_cmpdi_exec): Likewise.
>> >>>>>>>(vec_cmpdi_exec): Likewise.
>> >>>>>>>(vec_cmpdi_dup): Likewise.
>> >>>>>>>(vec_cmpdi_dup_exec): Likewise.
>> >>>>>>>(reduc__scal_): Disable for RDNA2.
>> >>>>>>>(*_dpp_shr_): Likewise.
>> >>>>>>>(*plus_carry_dpp_shr_): Likewise.
>> >>>>>>>(*plus_carry_in_dpp_shr_): Likewise.
>> >>>>>>
>> >>>>>> Etc.  The expectation being that GCC middle end copes with this, and
>> >>>>>> synthesizes some less ideal yet still functional vector code, I 
>> >>>>>> presume.
>> >>>>>>
>> >>>>>> The later RDNA3/gfx1100 support builds on top of this, and that's what
>> >>>>>> I'm currently working on getting proper GCC/GCN target (not 
>> >>>>>> offloading)
>> >>>>>> results for.
>> >>>>>>
>> >>>>>> I'm seeing a good number of execution test FAILs (regressions 
>> >>>>>> compared to
>> >>>>>> my earlier non-gfx1100 testing), and I've now tracked down where one
>> >>>>>> large class of those comes into existence -- [...]

>> >>>>>> With the following hack applied to 'gcc/tree-vect-loop.cc':
>> >>>>>>
>> >>>>>>@@ -6687,8 +6687,9 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>> >>>>>>   reduce_with_shift = have_whole_vector_shift (mode1);
>> >>>>>>   if (!VECTOR_MODE_P (mode1)
>> >>>>>>  || !directly_supported_p (code, vectype1))
>> >>>>>>reduce_with_shift = false;
>> >>>>>>+  reduce_with_shift = false;
>> >>>>>>
>> >>>>>> ..., I'm able to work around those regressions: by means of forcing
>> >>>>>> "Reduce using scalar code" instead of "Reduce using vector shifts".

>> The attached not-well-tested patch should allow only valid permutations.
>> Hopefully we go back to working code, but there'll be things that won't
>> vectorize. That said, the new "dump" output code has fewer and probably
>> cheaper instructions, so hmmm.
>
> This fixes the reduced builtin-bitops-1.c on RDNA2.

I confirm that "amdgcn: Disallow unsupported permute on RDNA devices"
also obsoletes my 'reduce_with_shift = false;' hack -- and also cures a
good number of additional FAILs (regressions), where presumably we
permute via different code paths.  Thanks!
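For readers not familiar with the two epilogue strategies named in the dump messages, here is a plain C model of what they compute.  This is only an illustration of the idea, not the vectorizer's code; 'reduce_scalar' and 'reduce_with_shifts' are hypothetical names, and an array stands in for a vector register.

```c
/* "Reduce using scalar code": extract each lane and accumulate.  */
static int
reduce_scalar (const int *v, int n)
{
  int sum = 0;
  for (int lane = 0; lane < n; lane++)
    sum += v[lane];
  return sum;
}

/* "Reduce using vector shifts": repeatedly add the vector to a copy of
   itself shifted down by half the remaining width, shifting in zeros.
   N must be a power of two; V is clobbered; the result ends up in
   lane 0.  */
static int
reduce_with_shifts (int *v, int n)
{
  for (int off = n / 2; off > 0; off /= 2)
    for (int lane = 0; lane < n; lane++)
      /* 'v[lane + off]' has not been updated yet in this step, because
	 lanes are visited in ascending order.  */
      v[lane] += (lane + off < n) ? v[lane + off] : 0;
  return v[0];
}
```

The shift-based variant needs a whole-vector shift (a permute) per step, which is where an unsupported permutation on RDNA devices would produce wrong results while the scalar variant keeps working.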

There also are a few

GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts" (was: [committed] amdgcn: add -march=gfx1030 EXPERIMENTAL)

2024-02-12 Thread Thomas Schwinge
Hi!

On 2023-10-20T12:51:03+0100, Andrew Stubbs  wrote:
> I've committed this patch

... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691
"amdgcn: add -march=gfx1030 EXPERIMENTAL".

The RDNA2 ISA variant doesn't support certain instructions previously
implemented in GCC/GCN, so a number of patterns etc. had to be disabled:

> [...] Vector
> reductions will need to be reworked for RDNA2.  [...]

>   * config/gcn/gcn-valu.md (@dpp_move): Disable for RDNA2.
>   (addc3): Add RDNA2 syntax variant.
>   (subc3): Likewise.
>   (2_exec): Add RDNA2 alternatives.
>   (vec_cmpdi): Likewise.
>   (vec_cmpdi): Likewise.
>   (vec_cmpdi_exec): Likewise.
>   (vec_cmpdi_exec): Likewise.
>   (vec_cmpdi_dup): Likewise.
>   (vec_cmpdi_dup_exec): Likewise.
>   (reduc__scal_): Disable for RDNA2.
>   (*_dpp_shr_): Likewise.
>   (*plus_carry_dpp_shr_): Likewise.
>   (*plus_carry_in_dpp_shr_): Likewise.

Etc.  The expectation being that GCC middle end copes with this, and
synthesizes some less ideal yet still functional vector code, I presume.

The later RDNA3/gfx1100 support builds on top of this, and that's what
I'm currently working on getting proper GCC/GCN target (not offloading)
results for.

I'm seeing a good number of execution test FAILs (regressions compared to
my earlier non-gfx1100 testing), and I've now tracked down where one
large class of those comes into existence -- not yet how to resolve,
unfortunately.  But maybe, with you guys' combined vectorizer and back
end experience, the latter will be done quickly?

Richard, I don't know if you've ever run actual GCC/GCN target (not
offloading) testing; let me know if you have any questions about that.
Given that (at least largely?) the same patterns etc. are disabled as in
my gfx1100 configuration, I suppose your gfx1030 one would exhibit the
same issues.  You can build GCC/GCN target like you build the offloading
one, just remove '--enable-as-accelerator-for=[...]'.  Likely, you can
even use an offloading GCC/GCN build to reproduce the issue below.

One example is the attached 'builtin-bitops-1.c', reduced from
'gcc.c-torture/execute/builtin-bitops-1.c', where 'my_popcount' is
miscompiled as soon as '-ftree-vectorize' is effective:

$ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/ builtin-bitops-1.c 
-Bbuild-gcc/amdgcn-amdhsa/gfx1100/newlib/ 
-Lbuild-gcc/amdgcn-amdhsa/gfx1100/newlib -fdump-tree-all-all -fdump-ipa-all-all 
-fdump-rtl-all-all -save-temps -march=gfx1100 -O1 -ftree-vectorize

In the 'diff' of 'a-builtin-bitops-1.c.179t.vect', for example, for
'-march=gfx90a' vs. '-march=gfx1100', we see:

+builtin-bitops-1.c:7:17: missed:   reduc op not supported by target.

..., and therefore:

-builtin-bitops-1.c:7:17: note:  Reduce using direct vector reduction.
+builtin-bitops-1.c:7:17: note:  Reduce using vector shifts
+builtin-bitops-1.c:7:17: note:  extract scalar result

That is, instead of one '.REDUC_PLUS' for gfx90a, for gfx1100 we build a
chain of summation of 'VEC_PERM_EXPR's.  However, there's wrong code
generated:

$ flock /tmp/gcn.lock build-gcc/gcc/gcn-run a.out
i=1, ints[i]=0x1 a=1, b=2
i=2, ints[i]=0x8000 a=1, b=2
i=3, ints[i]=0x2 a=1, b=2
i=4, ints[i]=0x4000 a=1, b=2
i=5, ints[i]=0x1 a=1, b=2
i=6, ints[i]=0x8000 a=1, b=2
i=7, ints[i]=0xa5a5a5a5 a=16, b=32
i=8, ints[i]=0x5a5a5a5a a=16, b=32
i=9, ints[i]=0xcafe a=11, b=22
i=10, ints[i]=0xcafe00 a=11, b=22
i=11, ints[i]=0xcafe a=11, b=22
i=12, ints[i]=0x a=32, b=64

(I can't tell if the 'b = 2 * a' pattern is purely coincidental?)

I don't speak enough "vectorization" to fully understand the generic
vectorized algorithm and its implementation.  It appears that the
"Reduce using vector shifts" code has been around for a very long time,
but also has gone through a number of changes.  I can't tell which GCC
targets/configurations it's actually used for (in the same way as for
GCN gfx1100), and thus whether there's an issue in that vectorizer code,
or rather in the GCN back end, or GCN back end parameterizing the generic
code?
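For reference, a scalar sketch of what "Reduce using vector shifts" is meant to compute, as I read the dump: each step adds the vector to a copy of itself shifted down by half the previous distance, with zeros shifted in, until lane 0 holds the total.  The helper names are made up for illustration, and this only models the intended semantics, not what the GCN back end actually emits.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 64  /* Matches 'vector(64)' in the dump.  */

/* Scalar stand-in for 'VEC_PERM_EXPR <v, { 0... }, { off, off + 1, ... }>':
   lane I of the result is lane I + OFF of SRC, or zero once that index
   runs past the end of SRC and into the all-zeros second operand.  */
static void
shift_lanes_down (int *dst, const int *src, size_t off)
{
  assert (off > 0 && off < LANES);
  for (size_t i = 0; i < LANES; i++)
    dst[i] = (i + off < LANES) ? src[i + off] : 0;
}

/* Sum all lanes of V by halving the shift distance each step; after the
   last step the total is in lane 0.  V is clobbered.  */
static int
reduce_plus_by_shifts (int *v)
{
  int shifted[LANES];
  for (size_t off = LANES / 2; off >= 1; off /= 2)
    {
      shift_lanes_down (shifted, v, off);
      for (size_t i = 0; i < LANES; i++)
        v[i] += shifted[i];
    }
  return v[0];
}

/* Mirror of the vectorized 'my_popcount': lane I holds bit I of X for the
   first 32 lanes (the '.COND_AND' mask), and zero for the rest.  */
static int
popcount_via_shift_reduction (unsigned int x)
{
  int lanes[LANES];
  for (size_t i = 0; i < LANES; i++)
    lanes[i] = (i < 32) ? (int) ((x >> i) & 1u) : 0;
  return reduce_plus_by_shifts (lanes);
}
```

With correct permutes, this yields the 'a' column of the output above (16 for 0xa5a5a5a5, 11 for 0xcafe).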

Manually working through the 'a-builtin-bitops-1.c.265t.optimized' code:

int my_popcount (unsigned int x)
{
  int stmp__12.12;
  vector(64) int vect__12.11;
  vector(64) unsigned int vect__1.8;
  vector(64) unsigned int _13;
  vector(64) unsigned int vect_cst__18;
  vector(64) int [all others];

   [local count: 32534376]:
  vect_cst__18 = { [all 'x_8(D)'] };
  vect__1.8_19 = vect_cst__18 >> { 0, 1, 2, [...], 61, 62, 63 };
  _13 = .COND_AND ({ [32 x '-1'], [32 x '0'] }, vect__1.8_19, { [all '1'] 
}, { [all '0'] });
  vect__12.11_24 = VIEW_CONVERT_EXPR(_13);
  _26 = VEC_PERM_EXPR ;
  _27 = vect__12.11_24 + _26;
  _28 = VEC_PERM_EXPR <_27, { [all '0'] }, { 16, 17, 18, [...], 77, 78, 79 
}>;
  _29 = _27 + _28;
  _30 = VEC_PERM_EXPR <_29, { [all '0'] }, { 8, 9, 10, [...], 69, 70, 

libgomp GCN gfx1030/gfx1100 offloading status (was: [PATCH] amdgcn: additional gfx1100 support)

2024-02-01 Thread Thomas Schwinge
Hi!

On 2024-01-26T10:45:10+0100, Richard Biener  wrote:
> On Fri, 26 Jan 2024, Richard Biener wrote:
>> On Wed, 24 Jan 2024, Andrew Stubbs wrote:
>> > [...] is enough to get gfx1100 working for most purposes, on top of the
>> > patch that Tobias committed a week or so ago; there are still some test
>> > failures to investigate, and probably some tuning to do.
>> > 
>> > It might also get gfx1030 working too. @Richi, could you test it,
>> > please?
>> 
>> I can report partial success here.  [...]

>> I'll followup with a test summary once the (serial) run of libgomp
>> testing finished.

(Why serial, by the way?)

>> At least there are quite some number of
>> actual kernel executions and PASSing testcases.
>
> === libgomp Summary ===
>
> # of expected passes            29126
> # of unexpected failures        697
> # of unexpected successes       1
> # of expected failures          703
> # of unresolved testcases       318
> # of unsupported tests          766
>
> full summary attached (compressed).

Comparing your old results ('| ' prefix in the following) with what I
got with '-march=gfx1100' for AMD Radeon RX 7900 XTX.  My GCC sources are
a few weeks old, but have all the recent fix-up commits cherry-picked,
and a work-around applied for:

/tmp/ccfrKwEK.mkoffload.2.s:29:27: error: value out of range
  .amdhsa_next_free_vgpr 516
^~~

(..., to be discussed later.)

There are, I think, no compilation FAILs anymore; I'm only commenting on
execution test FAILs.  Not all FAILs appear all the time (so it follows
that I may be missing a few), and 'libgomp.c++/../libgomp.c-c++-common'
generally behaves similarly to 'libgomp.c/../libgomp.c-c++-common', so I'm
omitting the former here.

| FAIL: libgomp.c/../libgomp.c-c++-common/error-1.c output pattern test

Not seeing that FAIL.

I also see 'libgomp.c-c++-common/for-5.c' FAIL.

| FAIL: libgomp.c/../libgomp.c-c++-common/icv-5.c execution test
| FAIL: libgomp.c/../libgomp.c-c++-common/icv-6.c execution test
| FAIL: libgomp.c/../libgomp.c-c++-common/icv-7.c execution test
| FAIL: libgomp.c/../libgomp.c-c++-common/icv-9.c execution test

I confirm 'libgomp.c-c++-common/icv-5.c', 'libgomp.c-c++-common/icv-9.c'
FAIL, but 'libgomp.c-c++-common/icv-6.c', 'libgomp.c-c++-common/icv-7.c'
PASS.

| FAIL: libgomp.c/../libgomp.c-c++-common/non-rect-loop-1.c execution test

Not seeing that FAIL.

| FAIL: libgomp.c/../libgomp.c-c++-common/reduction-6.c execution test

I confirm that FAIL, and also 'libgomp.c-c++-common/reduction-5.c'
occasionally.

| FAIL: libgomp.c/../libgomp.c-c++-common/requires-unified-addr-1.c execution test
| FAIL: libgomp.c/../libgomp.c-c++-common/requires-unified-addr-2.c execution test
| FAIL: libgomp.c/../libgomp.c-c++-common/target-45.c execution test
| FAIL: libgomp.c/../libgomp.c-c++-common/target-implicit-map-3.c execution test
| FAIL: libgomp.c/../libgomp.c-c++-common/target-is-accessible-1.c execution test

Not seeing these FAILs.

I also see 'libgomp.c-c++-common/reverse-offload-1.c' FAIL.

| FAIL: libgomp.c/../libgomp.c-c++-common/task-detach-6.c execution test
| WARNING: program timed out.
| FAIL: libgomp.c/../libgomp.c-c++-common/task-in-explicit-1.c execution test

I confirm these FAILs.

| FAIL: libgomp.c/../libgomp.c-c++-common/teams-2.c execution test

Known FAIL.

| FAIL: libgomp.c/../libgomp.c-c++-common/teams-nteams-icv-1.c execution test
| FAIL: libgomp.c/../libgomp.c-c++-common/teams-nteams-icv-2.c execution test
| FAIL: libgomp.c/../libgomp.c-c++-common/teams-nteams-icv-3.c execution test
| FAIL: libgomp.c/../libgomp.c-c++-common/teams-nteams-icv-4.c execution test
| FAIL: libgomp.c/declare-variant-4-gfx900.c (test for excess errors)
| FAIL: libgomp.c/declare-variant-4-gfx906.c (test for excess errors)
| FAIL: libgomp.c/declare-variant-4-gfx908.c (test for excess errors)
| FAIL: libgomp.c/declare-variant-4-gfx90a.c (test for excess errors)
| FAIL: libgomp.c/declare-variant-4.c execution test
| FAIL: libgomp.c/declare-variant-4.c scan-amdgcn-amdhsa-offload-tree-dump optimized "= gfx[^ ]+ ();"
| FAIL: libgomp.c/examples-4/device-2.c execution test
| WARNING: program timed out.

Not seeing these FAILs.

I also see 'libgomp.c/examples-4/teams-4.c', 'libgomp.c/target-31.c' FAIL.

| FAIL: libgomp.c/target-teams-1.c execution test

I confirm this FAIL.

| FAIL: libgomp.fortran/[...] execution test

You had a lot of FAILs there.  I only see the following:

| FAIL: libgomp.fortran/examples-4/teams-2.f90   -O0  execution test
| [...]

| FAIL: libgomp.fortran/examples-4/teams-4.f90   -O0  execution test
| [...]

| FAIL: libgomp.fortran/icv-6.f90   -O  execution test

| FAIL: libgomp.fortran/reverse-offload-1.f90   -O2  execution test
| FAIL: libgomp.fortran/reverse-offload-1.f90   -O3 

GCN: Don't hard-code number of SGPR/VGPR/AVGPR registers (was: [PATCH v3 05/10] GCN back-end code)

2024-02-01 Thread Thomas Schwinge
Hi!

On 2018-12-12T11:52:52+, Andrew Stubbs  wrote:
> This patch contains the major part of the GCN back-end.  [...]

> --- /dev/null
> +++ b/gcc/config/gcn/gcn.c

> +void
> +gcn_hsa_declare_function_name (FILE *file, const char *name, tree)
> +{

> +  /* Determine count of sgpr/vgpr registers by looking for last
> + one used.  */
> +  for (sgpr = 101; sgpr >= 0; sgpr--)
> +if (df_regs_ever_live_p (FIRST_SGPR_REG + sgpr))
> +  break;
> +  sgpr++;
> +  for (vgpr = 255; vgpr >= 0; vgpr--)
> +if (df_regs_ever_live_p (FIRST_VGPR_REG + vgpr))
> +  break;
> +  vgpr++;

> --- /dev/null
> +++ b/gcc/config/gcn/gcn.h

> +#define FIRST_SGPR_REG   0
> +#define SGPR_REGNO(N)((N)+FIRST_SGPR_REG)
> +#define LAST_SGPR_REG101

> +#define FIRST_VGPR_REG   160
> +#define VGPR_REGNO(N)((N)+FIRST_VGPR_REG)
> +#define LAST_VGPR_REG415

OK to push "GCN: Don't hard-code number of SGPR/VGPR/AVGPR registers",
see attached?
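To illustrate the idea outside of GCC: a minimal sketch of the "look for the last one used" count, written against the range macros instead of literal 101/255.  'reg_ever_live' is a hypothetical stand-in for 'df_regs_ever_live_p', and C11 '_Static_assert' stands in for GCC's 'STATIC_ASSERT'; only the register numbers are taken from the quoted 'gcn.h'.

```c
#include <assert.h>
#include <stdbool.h>

/* Register numbering as in the quoted 'gcn.h'.  */
#define FIRST_SGPR_REG 0
#define LAST_SGPR_REG 101
#define FIRST_VGPR_REG 160
#define LAST_VGPR_REG 415

/* Compile-time checks that the ranges still have the sizes that the loops
   used to hard-code.  */
_Static_assert (LAST_SGPR_REG + 1 - FIRST_SGPR_REG == 102, "SGPR count");
_Static_assert (LAST_VGPR_REG + 1 - FIRST_VGPR_REG == 256, "VGPR count");

/* Hypothetical stand-in for 'df_regs_ever_live_p': one flag per hard
   register number.  */
static bool reg_ever_live[LAST_VGPR_REG + 1];

/* Return how many registers of the range [FIRST, LAST] are needed: one
   past the index (within the range) of the highest live register, or
   zero if none is live.  */
static int
count_regs_used (int first, int last)
{
  int n;
  for (n = last - first; n >= 0; n--)
    if (reg_ever_live[first + n])
      break;
  return n + 1;
}
```

If someone later changes 'LAST_VGPR_REG', the loop bound follows automatically, and the static assertion flags that the assumed register count needs a second look.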


Grüße
 Thomas


>From ff812668636bce9d203acbcbdc19260f98857e03 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 31 Jan 2024 11:56:59 +0100
Subject: [PATCH] GCN: Don't hard-code number of SGPR/VGPR/AVGPR registers

Also add 'STATIC_ASSERT's for number of SGPR/VGPR/AVGPR registers (in
'#ifndef USED_FOR_TARGET', as otherwise 'STATIC_ASSERT' isn't available).

	gcc/
	* config/gcn/gcn.cc (gcn_hsa_declare_function_name): Don't
	hard-code number of SGPR/VGPR/AVGPR registers.
	* config/gcn/gcn.h: Add 'STATIC_ASSERT's for number of
	SGPR/VGPR/AVGPR registers.
---
 gcc/config/gcn/gcn.cc |  6 +++---
 gcc/config/gcn/gcn.h  | 15 ---
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 9d3ae2ff1110..c56576ffd9a4 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -6584,15 +6584,15 @@ gcn_hsa_declare_function_name (FILE *file, const char *name, tree decl)
 
   /* Determine count of sgpr/vgpr registers by looking for last
  one used.  */
-  for (sgpr = 101; sgpr >= 0; sgpr--)
+  for (sgpr = LAST_SGPR_REG - FIRST_SGPR_REG; sgpr >= 0; sgpr--)
 if (df_regs_ever_live_p (FIRST_SGPR_REG + sgpr))
   break;
   sgpr++;
-  for (vgpr = 255; vgpr >= 0; vgpr--)
+  for (vgpr = LAST_VGPR_REG - FIRST_VGPR_REG; vgpr >= 0; vgpr--)
 if (df_regs_ever_live_p (FIRST_VGPR_REG + vgpr))
   break;
   vgpr++;
-  for (avgpr = 255; avgpr >= 0; avgpr--)
+  for (avgpr = LAST_AVGPR_REG - FIRST_AVGPR_REG; avgpr >= 0; avgpr--)
 if (df_regs_ever_live_p (FIRST_AVGPR_REG + avgpr))
   break;
   avgpr++;
diff --git a/gcc/config/gcn/gcn.h b/gcc/config/gcn/gcn.h
index efe3c91511e5..a17f16aacc40 100644
--- a/gcc/config/gcn/gcn.h
+++ b/gcc/config/gcn/gcn.h
@@ -146,14 +146,23 @@
 #define EXEC_HI_REG	127
 #define EXECZ_REG	128
 #define SCC_REG		129
+
 /* 132-159 are reserved to simplify masks.  */
+
 #define FIRST_VGPR_REG	160
 #define VGPR_REGNO(N)	((N)+FIRST_VGPR_REG)
 #define LAST_VGPR_REG	415
+
 #define FIRST_AVGPR_REG 416
 #define AVGPR_REGNO(N)  ((N)+FIRST_AVGPR_REG)
 #define LAST_AVGPR_REG  671
 
+#ifndef USED_FOR_TARGET
+STATIC_ASSERT (LAST_SGPR_REG + 1 - FIRST_SGPR_REG == 102);
+STATIC_ASSERT (LAST_VGPR_REG + 1 - FIRST_VGPR_REG == 256);
+STATIC_ASSERT (LAST_AVGPR_REG + 1 - FIRST_AVGPR_REG == 256);
+#endif /* USED_FOR_TARGET */
+
 /* Frame Registers, and other registers */
 
 #define HARD_FRAME_POINTER_REGNUM 14
@@ -180,9 +189,9 @@
 #define HARD_FRAME_POINTER_IS_ARG_POINTER   0
 #define HARD_FRAME_POINTER_IS_FRAME_POINTER 0
 
-#define SGPR_REGNO_P(N)		((N) <= LAST_SGPR_REG)
-#define VGPR_REGNO_P(N)		((N)>=FIRST_VGPR_REG && (N) <= LAST_VGPR_REG)
-#define AVGPR_REGNO_P(N)((N)>=FIRST_AVGPR_REG && (N) <= LAST_AVGPR_REG)
+#define SGPR_REGNO_P(N)		((N) >= FIRST_SGPR_REG && (N) <= LAST_SGPR_REG)
+#define VGPR_REGNO_P(N)		((N) >= FIRST_VGPR_REG && (N) <= LAST_VGPR_REG)
+#define AVGPR_REGNO_P(N)((N) >= FIRST_AVGPR_REG && (N) <= LAST_AVGPR_REG)
 #define SSRC_REGNO_P(N)		((N) <= SCC_REG && (N) != VCCZ_REG)
 #define SDST_REGNO_P(N)		((N) <= EXEC_HI_REG && (N) != VCCZ_REG)
 #define CC_REG_P(X)		(REG_P (X) && CC_REGNO_P (REGNO (X)))
-- 
2.43.0



GCN, RDNA 3: Adjust 'sync_compare_and_swap_lds_insn'

2024-02-01 Thread Thomas Schwinge
Hi!

On 2024-01-31T11:31:00+, Andrew Stubbs  wrote:
> On 31/01/2024 10:36, Thomas Schwinge wrote:
>> OK to push "GCN, RDNA 3: Adjust 'sync_compare_and_swap_lds_insn'",
>> see attached?
>> 
>> In pre-RDNA 3 ISA manuals, there are notes for 'DS_CMPST_[...]', like:
>> 
>>  Caution, the order of src and cmp are the *opposite* of the 
>> BUFFER_ATOMIC_CMPSWAP opcode.
>> 
>> ..., and conversely in the RDNA 3 ISA manual, for 'DS_CMPSTORE_[...]':
>> 
>>  In this architecture the order of src and cmp agree with the 
>> BUFFER_ATOMIC_CMPSWAP opcode.
>> 
>> Is my understanding correct, that this isn't something we have to worry
>> about at the GCC machine description level; that's resolved at the
>> assembler level?
>
> Right, the IR uses GCC's operand order and has nothing to do with the 
> assembler syntax; the output template does the mapping.
>
>> --- a/gcc/config/gcn/gcn.md
>> +++ b/gcc/config/gcn/gcn.md
>> @@ -2095,7 +2095,12 @@
>> (match_operand:SIDI 3 "register_operand" "  v")]
>>UNSPECV_ATOMIC))]
>>""
>> -  "ds_cmpst_rtn_b %0, %1, %2, %3\;s_waitcnt\tlgkmcnt(0)"
>> +  {
>> +if (TARGET_RDNA3)
>> +  return "ds_cmpstore_rtn_b %0, %1, %2, 
>> %3\;s_waitcnt\tlgkmcnt(0)";
>> +else
>> +  return "ds_cmpst_rtn_b %0, %1, %2, 
>> %3\;s_waitcnt\tlgkmcnt(0)";
>> +  }
>>[(set_attr "type" "ds")
>> (set_attr "length" "12")])
>
> I think you need to swap %2 and %3 in the new format. ds_cmpst matches 
> GCC operand order, but ds_cmpstore has "cmp" and "src" reversed.

OK, thanks.  That was my actual question -- so, we do need to swap, and
indeed, most of the affected libgomp OpenACC test cases then PASS their
execution test.  With that changed, I've pushed to master branch
commit 6c2a40f4f4577f5d0f7bd1cfda48a5701b75744c
"GCN, RDNA 3: Adjust 'sync_compare_and_swap_lds_insn'", see
attached.
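A toy model of why the swap matters, under the assumption (taken from Andrew's remark above and from GCC's 'sync_compare_and_swap' convention) that operand 2 is the expected value and operand 3 the new value.  The 'toy_*' functions are made up for illustration; they only model the order of the two data operands, not the real LDS instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the pre-RDNA 3 instruction: data operands in GCC order,
   comparison value first.  Returns the old memory contents.  */
static uint32_t
toy_ds_cmpst (uint32_t *mem, uint32_t cmp, uint32_t src)
{
  uint32_t old = *mem;
  if (old == cmp)
    *mem = src;
  return old;
}

/* Toy model of the RDNA 3 instruction: same operation, but the two data
   operands come in the opposite order.  */
static uint32_t
toy_ds_cmpstore (uint32_t *mem, uint32_t src, uint32_t cmp)
{
  return toy_ds_cmpst (mem, cmp, src);
}

/* What the output template has to arrange for a GCC-level
   compare-and-swap with operand 2 = EXPECTED and operand 3 = DESIRED.  */
static uint32_t
emit_cas (int is_rdna3, uint32_t *mem, uint32_t expected, uint32_t desired)
{
  if (is_rdna3)
    return toy_ds_cmpstore (mem, desired, expected);  /* '%3, %2' */
  else
    return toy_ds_cmpst (mem, expected, desired);     /* '%2, %3' */
}
```

Passing the operands to the new instruction unswapped compares against the new value instead of the expected one, so the store silently doesn't happen, which fits tests that assemble fine but then FAIL their execution test.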


Grüße
 Thomas


>From 6c2a40f4f4577f5d0f7bd1cfda48a5701b75744c Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 31 Jan 2024 10:19:00 +0100
Subject: [PATCH] GCN, RDNA 3: Adjust 'sync_compare_and_swap_lds_insn'

For OpenACC/GCN '-march=gfx1100', a lot of libgomp OpenACC test cases FAIL:

/tmp/ccGfLJ8a.mkoffload.2.s:406:2: error: instruction not supported on this GPU
ds_cmpst_rtn_b32 v0, v0, v4, v3
^

In RDNA 3, 'ds_cmpst_[...]' has been replaced by 'ds_cmpstore_[...]', and the
notes for 'ds_cmpst_[...]' in pre-RDNA 3 ISA manuals:

Caution, the order of src and cmp are the *opposite* of the BUFFER_ATOMIC_CMPSWAP opcode.

..., have been resolved for 'ds_cmpstore_[...]' in the RDNA 3 ISA manual:

In this architecture the order of src and cmp agree with the BUFFER_ATOMIC_CMPSWAP opcode.

..., and therefore '%2', '%3' now swapped with regards to GCC operand order.
Most of the affected libgomp OpenACC test cases then PASS their execution test.

	gcc/
	* config/gcn/gcn.md (sync_compare_and_swap_lds_insn)
	[TARGET_RDNA3]: Adjust.
---
 gcc/config/gcn/gcn.md | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index 1f3c692b7a67..925e2cea4895 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -2074,7 +2074,12 @@
 	   (match_operand:SIDI 3 "register_operand" "  v")]
 	  UNSPECV_ATOMIC))]
   ""
-  "ds_cmpst_rtn_b %0, %1, %2, %3\;s_waitcnt\tlgkmcnt(0)"
+  {
+if (TARGET_RDNA3)
+  return "ds_cmpstore_rtn_b %0, %1, %3, %2\;s_waitcnt\tlgkmcnt(0)";
+else
+  return "ds_cmpst_rtn_b %0, %1, %2, %3\;s_waitcnt\tlgkmcnt(0)";
+  }
   [(set_attr "type" "ds")
(set_attr "length" "12")])
 
-- 
2.43.0



GCN: Remove 'FIRST_{SGPR,VGPR,AVGPR}_REG', 'LAST_{SGPR,VGPR,AVGPR}_REG' from machine description (was: [PATCH v3 04/10] GCN machine description)

2024-01-31 Thread Thomas Schwinge
Hi!

On 2018-12-12T11:52:23+, Andrew Stubbs  wrote:
> This patch contains the machine description portion of the GCN back-end.  
> [...]

> --- /dev/null
> +++ b/gcc/config/gcn/gcn.md

> +;; {{{ Constants and enums
> +
> +; Named registers
> +(define_constants
> +  [(FIRST_SGPR_REG0)
> +   (LAST_SGPR_REG 101)
> +   (FLAT_SCRATCH_REG  102)
> +   (FLAT_SCRATCH_LO_REG   102)
> +   (FLAT_SCRATCH_HI_REG   103)
> +   (XNACK_MASK_REG104)
> +   (XNACK_MASK_LO_REG 104)
> +   (XNACK_MASK_HI_REG 105)
> +   (VCC_REG   106)
> +   (VCC_LO_REG106)
> +   (VCC_HI_REG107)
> +   (VCCZ_REG  108)
> +   (TBA_REG   109)
> +   (TBA_LO_REG109)
> +   (TBA_HI_REG110)
> +   (TMA_REG   111)
> +   (TMA_LO_REG111)
> +   (TMA_HI_REG112)
> +   (TTMP0_REG 113)
> +   (TTMP11_REG124)
> +   (M0_REG125)
> +   (EXEC_REG  126)
> +   (EXEC_LO_REG   126)
> +   (EXEC_HI_REG   127)
> +   (EXECZ_REG 128)
> +   (SCC_REG   129)
> +   (FIRST_VGPR_REG160)
> +   (LAST_VGPR_REG 415)])
> +
> +(define_constants
> +  [(SP_REGNUM 16)
> +   (LR_REGNUM 18)
> +   (AP_REGNUM 416)
> +   (FP_REGNUM 418)])

Generally, shouldn't there be a better way, that avoids duplication and
instead shares such definitions between 'gcn.h' and 'gcn.md'?

Until that's settled, OK to push the attached
"GCN: Remove 'FIRST_{SGPR,VGPR,AVGPR}_REG', 'LAST_{SGPR,VGPR,AVGPR}_REG' from 
machine description"?
(I assume "still builds" is sufficient validation of this change.)


Grüße
 Thomas


>From 6af4774b4574086f5d4925333406eab4fed7f9a5 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 31 Jan 2024 13:27:34 +0100
Subject: [PATCH] GCN: Remove 'FIRST_{SGPR,VGPR,AVGPR}_REG',
 'LAST_{SGPR,VGPR,AVGPR}_REG' from machine description

They're not used there, and we avoid potentially out-of-sync definitions.

	gcc/
	* config/gcn/gcn.md (FIRST_SGPR_REG, LAST_SGPR_REG)
	(FIRST_VGPR_REG, LAST_VGPR_REG, FIRST_AVGPR_REG, LAST_AVGPR_REG):
	Don't 'define_constants'.
---
 gcc/config/gcn/gcn.md | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index 1f3c692b7a67..b3235eeea1b6 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -23,9 +23,7 @@
 
 ; Named registers
 (define_constants
-  [(FIRST_SGPR_REG		 0)
-   (CC_SAVE_REG			 22)
-   (LAST_SGPR_REG		 101)
+  [(CC_SAVE_REG			 22)
(FLAT_SCRATCH_REG		 102)
(FLAT_SCRATCH_LO_REG		 102)
(FLAT_SCRATCH_HI_REG		 103)
@@ -49,11 +47,7 @@
(EXEC_LO_REG			 126)
(EXEC_HI_REG			 127)
(EXECZ_REG			 128)
-   (SCC_REG			 129)
-   (FIRST_VGPR_REG		 160)
-   (LAST_VGPR_REG		 415)
-   (FIRST_AVGPR_REG		 416)
-   (LAST_AVGPR_REG		 671)])
+   (SCC_REG			 129)])
 
 (define_constants
   [(SP_REGNUM 16)
-- 
2.43.0



GCN: Remove 'SGPR_OR_VGPR_REGNO_P' definition (was: [PATCH v3 05/10] GCN back-end code)

2024-01-31 Thread Thomas Schwinge
Hi!

On 2018-12-12T11:52:52+, Andrew Stubbs  wrote:
> This patch contains the major part of the GCN back-end.  [...]

> --- /dev/null
> +++ b/gcc/config/gcn/gcn.h

> +#define FIRST_SGPR_REG   0
> +#define SGPR_REGNO(N)((N)+FIRST_SGPR_REG)
> +#define LAST_SGPR_REG101

> +#define FIRST_VGPR_REG   160
> +#define VGPR_REGNO(N)((N)+FIRST_VGPR_REG)
> +#define LAST_VGPR_REG415

> +#define SGPR_OR_VGPR_REGNO_P(N) ((N)>=FIRST_VGPR_REG && (N) <= LAST_SGPR_REG)

OK to push the attached "GCN: Remove 'SGPR_OR_VGPR_REGNO_P' definition"?
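To spell out the problem with the quoted definition: it takes its lower bound from the VGPR range (160) and its upper bound from the SGPR range (101), so no register number can satisfy both.  A minimal stand-alone check, with the two constants copied from the quoted 'gcn.h' and 'count_matches' being a helper made up for this sketch:

```c
#include <assert.h>

#define FIRST_VGPR_REG 160
#define LAST_SGPR_REG 101

/* The macro as quoted: lower bound from the VGPR range, upper bound from
   the SGPR range.  */
#define SGPR_OR_VGPR_REGNO_P(N) ((N)>=FIRST_VGPR_REG && (N) <= LAST_SGPR_REG)

/* Count the register numbers in [0, LIMIT) for which the macro holds.  */
static int
count_matches (int limit)
{
  int matches = 0;
  for (int n = 0; n < limit; n++)
    if (SGPR_OR_VGPR_REGNO_P (n))
      matches++;
  return matches;
}
```

'count_matches' returns zero for any limit, including one covering all of the SGPR, VGPR, and AVGPR ranges.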


Grüße
 Thomas


>From 849a52b3dcfdd840e6d24a1924962bb01762c1b1 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 31 Jan 2024 12:25:25 +0100
Subject: [PATCH] GCN: Remove 'SGPR_OR_VGPR_REGNO_P' definition

..., which was always (a) unused, and (b) bogus: always-false.

	gcc/
	* config/gcn/gcn.h (SGPR_OR_VGPR_REGNO_P): Remove.
---
 gcc/config/gcn/gcn.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/config/gcn/gcn.h b/gcc/config/gcn/gcn.h
index c2afb5e91403..efe3c91511e5 100644
--- a/gcc/config/gcn/gcn.h
+++ b/gcc/config/gcn/gcn.h
@@ -180,7 +180,6 @@
 #define HARD_FRAME_POINTER_IS_ARG_POINTER   0
 #define HARD_FRAME_POINTER_IS_FRAME_POINTER 0
 
-#define SGPR_OR_VGPR_REGNO_P(N) ((N)>=FIRST_VGPR_REG && (N) <= LAST_SGPR_REG)
 #define SGPR_REGNO_P(N)		((N) <= LAST_SGPR_REG)
 #define VGPR_REGNO_P(N)		((N)>=FIRST_VGPR_REG && (N) <= LAST_VGPR_REG)
 #define AVGPR_REGNO_P(N)((N)>=FIRST_AVGPR_REG && (N) <= LAST_AVGPR_REG)
-- 
2.43.0



GCN, RDNA 3: Adjust 'sync_compare_and_swap_lds_insn'

2024-01-31 Thread Thomas Schwinge
Hi!

OK to push "GCN, RDNA 3: Adjust 'sync_compare_and_swap_lds_insn'",
see attached?

In pre-RDNA 3 ISA manuals, there are notes for 'DS_CMPST_[...]', like:

Caution, the order of src and cmp are the *opposite* of the 
BUFFER_ATOMIC_CMPSWAP opcode.

..., and conversely in the RDNA 3 ISA manual, for 'DS_CMPSTORE_[...]':

In this architecture the order of src and cmp agree with the 
BUFFER_ATOMIC_CMPSWAP opcode.

Is my understanding correct, that this isn't something we have to worry
about at the GCC machine description level; that's resolved at the
assembler level?


Grüße
 Thomas


>From df6e031bf4b46d9e5b2de117fecd66b8b9b6dd20 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 31 Jan 2024 10:19:00 +0100
Subject: [PATCH] GCN, RDNA 3: Adjust 'sync_compare_and_swap_lds_insn'

For OpenACC/GCN '-march=gfx1100', a lot of test cases FAIL:

/tmp/ccGfLJ8a.mkoffload.2.s:406:2: error: instruction not supported on this GPU
ds_cmpst_rtn_b32 v0, v0, v4, v3
^

Apparently, in RDNA 3, 'ds_cmpst_[...]' has been replaced by
'ds_cmpstore_[...]'.

	gcc/
	* config/gcn/gcn.md (sync_compare_and_swap_lds_insn)
	[TARGET_RDNA3]: Adjust.
---
 gcc/config/gcn/gcn.md | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index 8abaef3bbdec..bbb75704140b 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -2095,7 +2095,12 @@
 	   (match_operand:SIDI 3 "register_operand" "  v")]
 	  UNSPECV_ATOMIC))]
   ""
-  "ds_cmpst_rtn_b %0, %1, %2, %3\;s_waitcnt\tlgkmcnt(0)"
+  {
+if (TARGET_RDNA3)
+  return "ds_cmpstore_rtn_b %0, %1, %2, %3\;s_waitcnt\tlgkmcnt(0)";
+else
+  return "ds_cmpst_rtn_b %0, %1, %2, %3\;s_waitcnt\tlgkmcnt(0)";
+  }
   [(set_attr "type" "ds")
(set_attr "length" "12")])
 
-- 
2.43.0



Re: [v2][patch] plugin/plugin-nvptx.c: Fix fini_device call when already shutdown [PR113513]

2024-01-29 Thread Thomas Schwinge
Hi Tobias!

On 2024-01-23T10:55:16+0100, Tobias Burnus  wrote:
> Slightly changed patch:
>
> nvptx_attach_host_thread_to_device now fails again with an error for 
> CUDA_ERROR_DEINITIALIZED, except for GOMP_OFFLOAD_fini_device.
>
> I think it makes more sense that way.

Agreed.

> Tobias Burnus wrote:
>> Testing showed that the libgomp.c/target-52.c failed with:
>>
>> libgomp: cuCtxGetDevice error: unknown cuda error
>>
>> libgomp: device finalization failed
>>
>> This testcase uses OMP_DISPLAY_ENV=true and 
>> OMP_TARGET_OFFLOAD=mandatory, and those env vars matter, i.e. it only 
>> fails if dg-set-target-env-var is honored.
>>
>> If both env vars are set, the device initialization occurs earlier as 
>> OMP_DEFAULT_DEVICE is shown due to the display-env env var and its 
>> value (when target-offload-var is 'mandatory') might be either 
>> 'omp_invalid_device' or '0'.
>>
>> It turned out that this had an effect on device finalization, which 
>> caused CUDA to stop earlier than expected. This patch now handles this 
>> case gracefully. For details, see the commit log message in the 
>> attached patch and/or the PR.

> plugin/plugin-nvptx.c: Fix fini_device call when already shutdown [PR113513]
>
> The following issue was found when running libgomp.c/target-52.c with
> nvptx offloading when the dg-set-target-env-var was honored.

Curious, I've never seen this failure mode in my several different
configurations.  :-|

> The issue
> occurred for both -foffload=disable and with offloading configured when
> an nvidia device is available.
>
> At the end of the program, the offloading parts are shut down via two means:
> The callback registered via 'atexit (gomp_target_fini)' and - via code
> generated in mkoffload, the '__attribute__((destructor)) fini' function
> that calls GOMP_offload_unregister_ver.
>
> In normal processing, first gomp_target_fini is called - which then sets
> GOMP_DEVICE_FINALIZED for the device - and later GOMP_offload_unregister_ver,
> but that's then a no-op because the state is GOMP_DEVICE_FINALIZED.
> If both OMP_DISPLAY_ENV=true and OMP_TARGET_OFFLOAD="mandatory" are set,
> the call omp_display_env already invokes gomp_init_targets_once, i.e. it
> occurs earlier than usual and is invoked via __attribute__((constructor))
> initialize_env.
>
> For some unknown reasons, while this does not have an effect on the
> order of the called plugin functions for initialization, it changes the
> order of function calls for shutting down. Namely, when the two environment
> variables are set, GOMP_offload_unregister_ver is called now before
> gomp_target_fini.

Re "unknown reasons", isn't that indeed explained by the different
'atexit' function/'__attribute__((destructor))' sequencing, due to
different order of 'atexit'/'__attribute__((constructor))' calls?

I think I agree that, defensively, we should behave correctly in libgomp
finalization, no matter in which order these calls occur.
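A toy sketch of what I mean by order-independent finalization: both shutdown paths check the device state first, so the teardown runs at most once whichever path gets there first.  All 'toy_*' names are hypothetical and only loosely modeled on the 'GOMP_DEVICE_*' states named in the commit log; this is not the libgomp implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device states.  */
enum toy_device_state
{
  TOY_DEVICE_UNINITIALIZED,
  TOY_DEVICE_INITIALIZED,
  TOY_DEVICE_FINALIZED
};

struct toy_device
{
  enum toy_device_state state;
  int images_loaded;
  int teardown_count;  /* How often the actual teardown ran.  */
};

/* Stand-in for the unregister path (the destructor): unload an image only
   while the device is still up; otherwise there is nothing left to do.  */
static bool
toy_unregister_image (struct toy_device *dev)
{
  if (dev->state != TOY_DEVICE_INITIALIZED)
    return true;
  if (dev->images_loaded > 0)
    dev->images_loaded--;
  return true;
}

/* Stand-in for the 'atexit' path: tear the device down at most once.  */
static bool
toy_target_fini (struct toy_device *dev)
{
  if (dev->state == TOY_DEVICE_INITIALIZED)
    {
      dev->images_loaded = 0;
      dev->teardown_count++;
    }
  dev->state = TOY_DEVICE_FINALIZED;
  return true;
}
```

Calling the two functions in either order leaves the device finalized, with no images loaded and exactly one teardown.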

> And it seems as if CUDA regards a call to cuModuleUnload
> (or unloading the last module?) as indication that the device context should
> be destroyed - or, at least, afterwards calling cuCtxGetDevice will return
> CUDA_ERROR_DEINITIALIZED.

However, this I don't understand -- but would like to.  Are you saying
that for:

--- libgomp/plugin/plugin-nvptx.c
+++ libgomp/plugin/plugin-nvptx.c
@@ -1556,8 +1556,16 @@ GOMP_OFFLOAD_unload_image (int ord, unsigned 
version, const void *target_data)
 if (image->target_data == target_data)
   {
*prev_p = image->next;
-   if (CUDA_CALL_NOCHECK (cuModuleUnload, image->module) != CUDA_SUCCESS)
+   CUresult r;
+   r = CUDA_CALL_NOCHECK (cuModuleUnload, image->module);
+   GOMP_PLUGIN_debug (0, "%s: cuModuleUnload: %s\n", __FUNCTION__, 
cuda_error (r));
+   if (r != CUDA_SUCCESS)
  ret = false;
+   CUdevice dev_;
+   r = CUDA_CALL_NOCHECK (cuCtxGetDevice, &dev_);
+   GOMP_PLUGIN_debug (0, "%s: cuCtxGetDevice: %s\n", __FUNCTION__, 
cuda_error (r));
+   GOMP_PLUGIN_debug (0, "%s: dev_=%d, dev->dev=%d\n", __FUNCTION__, dev_, 
dev->dev);
+   assert (dev_ == dev->dev);
free (image->fns);
free (image);
break;

..., you're seeing an error for 'libgomp.c/target-52.c' with
'env OMP_TARGET_OFFLOAD=mandatory OMP_DISPLAY_ENV=true'?  I get:

GOMP_OFFLOAD_unload_image: cuModuleUnload: no error
GOMP_OFFLOAD_unload_image: cuCtxGetDevice: no error
GOMP_OFFLOAD_unload_image: dev_=0, dev->dev=0

Or, is something else happening in between the 'cuModuleUnload' and your
reportedly failing 'cuCtxGetDevice'?

Re your PR113513 details, I don't see how your failure mode could be
related to (a) the PTX code ('--with-arch=sm_80'), or the GPU hardware
("NVIDIA RTX A1000 6GB") (..., unless the Nvidia Driver is doing "funny"
things, of course...), so could this possibly be due to a recent change
in the CUDA Driver/Nvidia Driver?  You say "CUDA Version: 12.3", but
which Nvidia Driver version?  The 

Re: [patch] nvptx.opt: Add sm_89 and sm_90a to -march-map=

2024-01-29 Thread Thomas Schwinge
Hi Tobias!

On 2024-01-20T10:57:29+0100, Tobias Burnus  wrote:
> Stumbled over this as we recently got a sm_89 card.
>
> -march-map= is mostly a future-proof way for users to always get the 
> best code generation for a specific card, without needing to know 
> which GCC version added support for which -march=sm_... (or 
> -misa=sm_...; those are aliases).
>
> sm_89 was added in CUDA 11.8 (ptx isa 7.8) and sm_90a in CUDA 12.0 (ptx 
> isa 8.0) but that's just FYI as -march-map=sm_xx, xx >= 80 is mapping to 
> -march=sm_80 and implies -mptx=7.0 (i.e. ptx isa 7.0, added in CUDA 
> 11.0); hence, any CUDA 11.0+ will do.
>
> OK for mainline?

OK, thanks.


Grüße
 Thomas


> nvptx.opt: Add sm_89 and sm_90a to -march-map=
>
> The -march-map= option maps the compute capability to the closest
> lower compute capability that has been implemented; for sm_89 and
> sm_90a, that were previously missing, that's currently -march=sm_80
> alias -misa=sm_80.
>
> gcc/ChangeLog:
>
>   * config/nvptx/nvptx.opt (march-map=): Add sm_89 and sm_90a.
>
> Signed-off-by: Tobias Burnus 
>
> diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
> index 09d75fca037..deb006663d7 100644
> --- a/gcc/config/nvptx/nvptx.opt
> +++ b/gcc/config/nvptx/nvptx.opt
> @@ -108,9 +108,15 @@ Target RejectNegative Alias(misa=,sm_80)
>  march-map=sm_87
>  Target RejectNegative Alias(misa=,sm_80)
>  
> +march-map=sm_89
> +Target RejectNegative Alias(misa=,sm_80)
> +
>  march-map=sm_90
>  Target RejectNegative Alias(misa=,sm_80)
>  
> +march-map=sm_90a
> +Target RejectNegative Alias(misa=,sm_80)
> +
>  Enum
>  Name(ptx_version) Type(int)
>  Known PTX ISA versions (for use with the -mptx= option):
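For illustration, the "closest lower compute capability that has been implemented" rule from the quoted commit log, as a small stand-alone function.  The list of implemented values is an assumption made for this sketch (the thread only confirms that everything from sm_80 upwards currently maps to sm_80), and a suffixed name such as sm_90a is treated like its numeric part; in GCC itself the mapping is spelled out entry by entry in 'nvptx.opt', as the patch shows.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: the set of implemented '-march=sm_*' values, for
   illustration only; the thread itself only confirms 'sm_80'.  */
static const int implemented_sm[] = { 30, 35, 53, 70, 75, 80 };

/* Return the closest implemented compute capability that is not higher
   than REQUESTED, or -1 if there is none.  */
static int
march_map (int requested)
{
  int best = -1;
  for (size_t i = 0; i < sizeof implemented_sm / sizeof implemented_sm[0]; i++)
    if (implemented_sm[i] <= requested && implemented_sm[i] > best)
      best = implemented_sm[i];
  return best;
}
```

With that table, 87, 89, and 90 all map to 80, matching the 'Alias(misa=,sm_80)' entries in the patch.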


Re: [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs

2024-01-26 Thread Thomas Schwinge
Hi!

Great progress that you've made!  :-)

On 2024-01-26T13:32:02+0100, Tobias Burnus  wrote:
> Tobias Burnus wrote:
>> Am 24.01.24 um 17:01 schrieb Tobias Burnus:
>>> Okay to enable gfx1100 multilib building and to document gfx1100 in 
>>> the manual?
>>
>> and, with this patch, additionally gfx1030?

> amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs

> --- a/gcc/doc/install.texi
> +++ b/gcc/doc/install.texi
> @@ -1258,12 +1258,12 @@ default set of libraries is selected based on the 
> value of
>  
>  @item amdgcn*-*-*
>  @var{list} is a comma separated list of ISA names (allowed values: 
> @code{fiji},
> -@code{gfx900}, @code{gfx906}, @code{gfx908}, @code{gfx90a}). It ought not
> -include the name of the default ISA, specified via @option{--with-arch}.  If
> -@var{list} is empty, then there will be no multilibs and only the default
> -run-time library will be built.  If @var{list} is @code{default} or
> -@option{--with-multilib-list=} is not specified, then the default set of
> -libraries is selected.
> +@code{gfx900}, @code{gfx906}, @code{gfx908}, @code{gfx90a}, @code{gfx1030},
> +@code{gfx1100}).  It ought not include the name of the default ISA, specified
> +via @option{--with-arch}.  If @var{list} is empty, then there will be no
> +multilibs and only the default run-time library will be built.  If @var{list}
> +is @code{default} or @option{--with-multilib-list=} is not specified, then
> +the default set of libraries is selected.

Further down in that file, we state:

@anchor{amdgcn-x-amdhsa}
@heading amdgcn-*-amdhsa
AMD GCN GPU target.

Instead of GNU Binutils, you will need to install LLVM 13.0.1, or later, 
[...]

LLVM 13.0.1 may still be fine for gfx1030
('[...]/amdgcn-amdhsa/gfx1030/libgcc' does get built; I've not further
tested), but it's not sufficient for gfx1100 anymore:

[...]
checking for suffix of object files... configure: error: in 
`[...]/amdgcn-amdhsa/gfx1100/libgcc':
configure: error: cannot compute suffix of object files: cannot compile
See `config.log' for more details
make[1]: *** [Makefile:14105: configure-target-libgcc] Error 1
[...]

'[...]/amdgcn-amdhsa/gfx1100/libgcc/config.log':

[...]
'gfx1100' is not a recognized processor for this target (ignoring processor)
'gfx1100' is not a recognized processor for this target (ignoring processor)
/tmp/ccZdohcj.s:1:17: error: .amdgcn_target directive's target id amdgcn-unknown-amdhsa--gfx1100 does not match the specified target id amdgcn-unknown-amdhsa--gfx000
.amdgcn_target "amdgcn-unknown-amdhsa--gfx1100"
   ^
[...]

Which version of LLVM should we be recommending?


Grüße
 Thomas


Re: [PATCH] x86: Update PR 35513 tests

2024-01-24 Thread Thomas Schwinge
Hi!

On 2022-02-10T05:55:15-0800, "H.J. Lu via Gcc-patches" 
 wrote:
> 1. Require linker with GNU_PROPERTY_1_NEEDED support for PR 35513
> run-time tests.

Moving my x86_64-pc-linux-gnu testing from an old to a newish system
(Ubuntu 20.04), I notice:

[-PASS: g++.target/i386/pr35513-1.C  -std=gnu++98 (test for excess errors)-]
[-PASS:-]{+UNSUPPORTED:+} g++.target/i386/pr35513-1.C  -std=gnu++98[-execution test-]

Etc.

[-PASS: g++.target/i386/pr35513-2.C  -std=gnu++98 (test for excess errors)-]
[-PASS:-]{+UNSUPPORTED:+} g++.target/i386/pr35513-2.C  -std=gnu++98[-execution test-]

Etc.

..., due to the 'property_1_needed' effective-target check now
diagnosing:

/usr/bin/ld: warning: /tmp/ccFNkvfI.o: unsupported GNU_PROPERTY_TYPE (5) type: 0xb0008000

..., with:

$ /usr/bin/ld --version | head -n 1
GNU ld (GNU Binutils for Ubuntu) 2.34

I'm not familiar with these properties, but I wonder whether some support
really has been removed (so that this is indeed now UNSUPPORTED), or whether
something's wrong somewhere (so that this should still PASS).

For reference:

> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp

> +proc check_effective_target_property_1_needed { } {
> +  return [check_no_compiler_messages_nocache property_1_needed executable {
> +/* Assembly code */
> +#ifdef __LP64__
> +# define __PROPERTY_ALIGN 3
> +#else
> +# define __PROPERTY_ALIGN 2
> +#endif
> +
> + .section ".note.gnu.property", "a"
> + .p2align __PROPERTY_ALIGN
> + .long 1f - 0f   /* name length.  */
> + .long 4f - 1f   /* data length.  */
> + /* NT_GNU_PROPERTY_TYPE_0.   */
> + .long 5 /* note type.  */
> +0:
> + .asciz "GNU" /* vendor name.  */
> +1:
> + .p2align __PROPERTY_ALIGN
> + /* GNU_PROPERTY_1_NEEDED.  */
> + .long 0xb0008000 /* pr_type.  */
> + .long 3f - 2f   /* pr_datasz.  */
> +2:
> + /* GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS.  */
> + .long 1
> +3:
> + .p2align __PROPERTY_ALIGN
> +4:
> + .text
> + .globl main
> +main:
> + .byte 0
> +  } ""]
> +}
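
For readers who, like me, are not familiar with these properties: the size
arithmetic behind the '1f - 0f', '4f - 1f', and '3f - 2f' expressions in the
quoted check can be sketched as follows.  This is only an illustration under
assumptions: the helper names ('align_up', 'note_sizes') and the 'NoteSizes'
struct are hypothetical, the numeric constants are the ones quoted above, and
the sketch assumes the note starts on an aligned boundary so that no padding
is needed after the 4-byte name.  It mirrors the assembler directives only; it
says nothing about what any particular linker version accepts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Constants as they appear in the quoted effective-target check.
constexpr std::uint32_t kNoteTypeGnuProperty0 = 5;          // "note type"
constexpr std::uint32_t kGnuProperty1Needed = 0xb0008000u;  // "pr_type"
constexpr std::uint32_t kIndirectExternAccess = 1;          // property payload

// Hypothetical helper: round 'n' up to a multiple of 2^p2align,
// which is what '.p2align' does to the location counter.
inline std::size_t align_up(std::size_t n, unsigned p2align) {
  const std::size_t a = std::size_t{1} << p2align;
  return (n + a - 1) & ~(a - 1);
}

// Hypothetical result type: the three lengths the assembly computes.
struct NoteSizes {
  std::size_t name;       // 1f - 0f
  std::size_t prop_data;  // 3f - 2f
  std::size_t desc;       // 4f - 1f
};

// p2align is 3 when __LP64__ is defined, else 2 (see __PROPERTY_ALIGN).
inline NoteSizes note_sizes(unsigned p2align) {
  NoteSizes s;
  s.name = sizeof("GNU");                 // 4 bytes, including the NUL
  s.prop_data = sizeof(std::uint32_t);    // the single '.long 1'
  // pr_type (4) + pr_datasz (4) + payload, padded by the final '.p2align'.
  s.desc = align_up(4 + 4 + s.prop_data, p2align);
  return s;
}
```

With these assumptions, the descriptor is 16 bytes for LP64 and 12 bytes
otherwise; the 'pr_type' value is the same '0xb0008000' that the linker
warning above reports as unsupported.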


Grüße
 Thomas


MAINTAINERS: Update my work email address

2024-01-24 Thread Thomas Schwinge
Hi!

Pushed to master branch commit 7fcdb501366632fbf98a1eff275d76b9eea91aa1
"MAINTAINERS: Update my work email address", see attached.

(Happy to talk, of course!)


| Excited to announce that Sourcery Services are now available via BayLibre,
| <https://baylibre.com/>!
| 
| GCC, GNU Toolchain, HPC, embedded -- and more to come!
| 
| (Please allow us some time to regroup.)


Copyright assignment for BayLibre is in progress.


Grüße
 Thomas


>From 7fcdb501366632fbf98a1eff275d76b9eea91aa1 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 24 Jan 2024 12:03:03 +0100
Subject: [PATCH] MAINTAINERS: Update my work email address

	* MAINTAINERS: Update my work email address.
---
 MAINTAINERS | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index ade5c9f0181f..7d3b78d276eb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -102,7 +102,7 @@ nds32 port		Shiva Chen		
 nios2 port		Chung-Lin Tang		
 nios2 port		Sandra Loosemore	
 nvptx port		Tom de Vries		
-nvptx port		Thomas Schwinge		
+nvptx port		Thomas Schwinge		
 or1k port		Stafford Horne		
 pdp11 port		Paul Koning		
 powerpcspe port		Andrew Jenner		
@@ -181,7 +181,7 @@ libgcc			Ian Lance Taylor	
 libgo			Ian Lance Taylor	
 libgomp			Jakub Jelinek		
 libgomp			Tobias Burnus		
-libgomp (OpenACC)	Thomas Schwinge		
+libgomp (OpenACC)	Thomas Schwinge		
 libgrust		All Rust front end maintainers
 libiberty		Ian Lance Taylor	
 libitm			Torvald Riegel		
@@ -253,7 +253,7 @@ auto-vectorizer		Zdenek Dvorak		
 loop infrastructure	Zdenek Dvorak		
 loop ivopts		Bin Cheng		
 loop optimizer		Bin Cheng		
-OpenACC			Thomas Schwinge		
+OpenACC			Thomas Schwinge		
 OpenACC			Tobias Burnus		
 OpenMP			Jakub Jelinek		
 OpenMP			Tobias Burnus		
-- 
2.40.1



GCN: Enable effective-target 'vect_early_break', 'vect_early_break_hw'

2024-01-12 Thread Thomas Schwinge
Hi!

OK to push the attached
"GCN: Enable effective-target 'vect_early_break', 'vect_early_break_hw'"?
("The relevant test cases are all-PASS with just [two] exceptions, to be
looked into individually, later on."  I'm not currently planning to look
into that.)


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From 3193614c4f9a8032e85a4da87bde8055aeee7d7b Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 9 Jan 2024 10:25:48 +0100
Subject: [PATCH] GCN: Enable effective-target 'vect_early_break',
 'vect_early_break_hw'

Via XPASSing test cases after commit a657c7e3518fcfc796f223d47385cad5e97dc9a5
"testsuite: un-xfail TSVC loops that check for exit control flow vectorization":

PASS: gcc.dg/vect/tsvc/vect-tsvc-s332.c (test for excess errors)
PASS: gcc.dg/vect/tsvc/vect-tsvc-s332.c execution test
[-XFAIL:-]{+XPASS:+} gcc.dg/vect/tsvc/vect-tsvc-s332.c scan-tree-dump vect "vectorized 1 loops"

PASS: gcc.dg/vect/tsvc/vect-tsvc-s481.c (test for excess errors)
PASS: gcc.dg/vect/tsvc/vect-tsvc-s481.c execution test
[-XFAIL:-]{+XPASS:+} gcc.dg/vect/tsvc/vect-tsvc-s481.c scan-tree-dump vect "vectorized 1 loops"

PASS: gcc.dg/vect/tsvc/vect-tsvc-s482.c (test for excess errors)
PASS: gcc.dg/vect/tsvc/vect-tsvc-s482.c execution test
[-XFAIL:-]{+XPASS:+} gcc.dg/vect/tsvc/vect-tsvc-s482.c scan-tree-dump vect "vectorized 1 loops"

..., it became apparent that GCN, too, does support vectorization of loops with
early breaks.  The relevant test cases are all-PASS with just the following
exceptions, to be looked into individually, later on:

PASS: gcc.dg/vect/vect-early-break_25.c (test for excess errors)
PASS: gcc.dg/vect/vect-early-break_25.c scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/vect-early-break_25.c scan-tree-dump-times vect "Alignment of access forced using peeling" 1

PASS: gcc.dg/vect/vect-early-break_56.c (test for excess errors)
PASS: gcc.dg/vect/vect-early-break_56.c execution test
XPASS: gcc.dg/vect/vect-early-break_56.c scan-tree-dump-times vect "vectorized 2 loops" 2

	gcc/testsuite/
	* lib/target-supports.exp
	(check_effective_target_vect_early_break)
	(check_effective_target_vect_early_break_hw): Enable for GCN.
---
 gcc/testsuite/lib/target-supports.exp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 75d1add894f..497c46de4cb 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4071,6 +4071,7 @@ proc check_effective_target_vect_early_break { } {
 	[istarget aarch64*-*-*]
 	|| [check_effective_target_arm_v8_neon_ok]
 	|| [check_effective_target_sse4]
+	|| [istarget amdgcn-*-*]
 	}}]
 }
 
@@ -4085,6 +4086,7 @@ proc check_effective_target_vect_early_break_hw { } {
 	[istarget aarch64*-*-*]
 	|| [check_effective_target_arm_v8_neon_hw]
 	|| [check_sse4_hw_available]
+	|| [istarget amdgcn-*-*]
 	}}]
 }
 
-- 
2.34.1
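
For illustration, a loop "with early break" in the sense used above is one
that may leave the iteration space before the counted exit.  The following is
a minimal sketch and not taken from the testsuite; the function name
'first_above' is hypothetical, and whether a given compiler actually
vectorizes it depends on the target, the flags, and the compiler version.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example: return the index of the first element greater than
// 'limit', or 'n' if no such element exists.  The 'break' is the early exit
// that an early-break-capable vectorizer has to handle in addition to the
// normal 'i < n' exit.
std::size_t first_above(const int *a, std::size_t n, int limit) {
  std::size_t i;
  for (i = 0; i < n; ++i) {
    if (a[i] > limit)
      break;  // early exit: leaves the loop before all 'n' iterations ran
  }
  return i;
}
```

Dumps from '-fdump-tree-vect' (as scanned by the test cases listed above) are
the usual way to check whether such a loop was vectorized.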



Re: [PATCHSET] Fix Rust bootstrap for future libgrust changes

2024-01-11 Thread Thomas Schwinge
Hi!

On 2024-01-11T15:22:07+0100, Arthur Cohen  wrote:
> Sorry about this - two simple changes to Makefile.def we had missed
> during our first libgrust/ patchset

I don't think those were "missed" but rather "intentionally omitted"?
I'll have to have a more detailed look.

(..., and almost no changes in the top-level build system I'd personally
dare to qualify as "simple"...)  ;-P


Grüße
 Thomas


> plus the associated regen of
> Makefile.in in each commit.
>
> Let me know if I should squash them together. I'll follow them up
> with our entire patchset.
>
> Best,
>
> Arthur


Re: [PATCH 1/8] OpenMP: lvalue parsing for map/to/from clauses (C++)

2024-01-09 Thread Thomas Schwinge
Hi Julian!

On 2024-01-07T16:04:37+0100, Tobias Burnus  wrote:
> On 05.01.24 at 13:23, Julian Brown wrote:
>> Here's a rebased/retested version [...]

> LGTM - [...]

Got pushed as commit r14-7033-g1413af02d62182bc1e19698aaa4dae406f8f13bf
"OpenMP: lvalue parsing for map/to/from clauses (C++)".

Some (hopefully minor) tuning in the test cases is necessary; for
example, for x86_64-pc-linux-gnu '-m32' testing, I see a few FAILs:

+PASS: g++.dg/gomp/array-section-1.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[1\\] [len: x != 0 ? [0-9]+ : [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: [0-9]+\\]\\)"
+PASS: g++.dg/gomp/array-section-1.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[1\\] \\[len: x != 0 \\? [0-9]+ : [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: [0-9]+\\]\\)"
+FAIL: g++.dg/gomp/array-section-1.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: \\(long int\\) 
\\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
+FAIL: g++.dg/gomp/array-section-1.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: \\(long int\\) 
\\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
+FAIL: g++.dg/gomp/array-section-1.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: \\(long int\\) 
\\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
+FAIL: g++.dg/gomp/array-section-1.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: \\(long int\\) 
\\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
+PASS: g++.dg/gomp/array-section-1.C  -std=c++98 (test for excess errors)

Etc.

+PASS: g++.dg/gomp/array-section-2.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[0\\] \\[len: \\(sizetype\\) y \\* [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: 0\\]\\)"
+PASS: g++.dg/gomp/array-section-2.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[0\\] \\[len: \\(sizetype\\) y \\* [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: 0\\]\\)"
+FAIL: g++.dg/gomp/array-section-2.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: \\(40 - \\(sizetype\\) SAVE_EXPR 
\\) \\* [0-9]+\\]\\) map\\(firstprivate:arr1 \\[pointer assign, bias: 
\\(long int\\) \\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
+FAIL: g++.dg/gomp/array-section-2.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: \\(40 - \\(sizetype\\) SAVE_EXPR 
\\) \\* [0-9]+\\]\\) map\\(firstprivate:arr1 \\[pointer assign, bias: 
\\(long int\\) \\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
+FAIL: g++.dg/gomp/array-section-2.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: \\(sizetype\\) y \\* [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: \\(long int\\) 
\\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
+FAIL: g++.dg/gomp/array-section-2.C  -std=c++98  scan-tree-dump original 
"map\\(tofrom:arr1\\[SAVE_EXPR \\] \\[len: \\(sizetype\\) y \\* [0-9]+\\]\\) 
map\\(firstprivate:arr1 \\[pointer assign, bias: \\(long int\\) 
\\[SAVE_EXPR \\] - \\(long int\\) \\]\\)"
+PASS: g++.dg/gomp/array-section-2.C  -std=c++98 (test for excess errors)

Etc.

+PASS: g++.dg/gomp/bad-array-section-4.C  -std=c++98  at line 15 (test for 
errors, line 14)
+PASS: g++.dg/gomp/bad-array-section-4.C  -std=c++98  at line 16 (test for 
errors, line 14)
+PASS: g++.dg/gomp/bad-array-section-4.C  -std=c++98  at line 17 (test for 
errors, line 14)
+PASS: g++.dg/gomp/bad-array-section-4.C  -std=c++98  at line 22 (test for 
warnings, line 21)
+PASS: g++.dg/gomp/bad-array-section-4.C  -std=c++98  at line 36 (test for 
errors, line 35)
+FAIL: g++.dg/gomp/bad-array-section-4.C  -std=c++98  at line 37 (test for 
warnings, line 35)
+PASS: g++.dg/gomp/bad-array-section-4.C  -std=c++98  at line 38 (test for 
errors, line 35)
+PASS: g++.dg/gomp/bad-array-section-4.C  -std=c++98  at line 39 (test for 
errors, line 35)
+PASS: g++.dg/gomp/bad-array-section-4.C  -std=c++98  at line 44 (test for 
warnings, line 43)
+PASS: g++.dg/gomp/bad-array-section-4.C  -std=c++98 (test for excess 
errors)

Etc.


Grüße
 Thomas


Re: [Patch] GCN: Add pre-initial support for gfx1100

2024-01-08 Thread Thomas Schwinge
Hi!

On 2024-01-08T15:30:06+0100, Tobias Burnus  wrote:
> Andrew Stubbs wrote:
>> I know there will be things that need fixing for
>> both experimental architectures.
>
> Indeed. [...]

..., like, making it even build?  ;-P

>> P.S. Apologies, but I think my commits today conflict a little; you
>> should be able to drop the hunks that patch deleted code.
>
> I did so - but I then realized that I should have also added gfx1100 to
> the new chunk.
>
> Committed as r14-7006-g97a52f69d209f6 (see attachment) - as follow up to
> the original r14-7005-g52a2c659ae6c21

Pushed to master branch commit f9290cdf4697f467fd0fb7c710f58cc12e497889
"GCN: Add pre-initial support for gfx1100: 'EF_AMDGPU_MACH_AMDGCN_GFX1100'",
see attached.


Grüße
 Thomas


>From f9290cdf4697f467fd0fb7c710f58cc12e497889 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 8 Jan 2024 20:35:27 +0100
Subject: [PATCH] GCN: Add pre-initial support for gfx1100:
 'EF_AMDGPU_MACH_AMDGCN_GFX1100'
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

../../../source-gcc/libgomp/plugin/plugin-gcn.c: In function ‘isa_hsa_name’:
../../../source-gcc/libgomp/plugin/plugin-gcn.c:1666:10: error: ‘EF_AMDGPU_MACH_AMDGCN_GFX1100’ undeclared (first use in this function); did you mean ‘EF_AMDGPU_MACH_AMDGCN_GFX1030’?
 1666 | case EF_AMDGPU_MACH_AMDGCN_GFX1100:
  |  ^
  |  EF_AMDGPU_MACH_AMDGCN_GFX1030
../../../source-gcc/libgomp/plugin/plugin-gcn.c:1666:10: note: each undeclared identifier is reported only once for each function it appears in
../../../source-gcc/libgomp/plugin/plugin-gcn.c: In function ‘isa_code’:
../../../source-gcc/libgomp/plugin/plugin-gcn.c:1711:12: error: ‘EF_AMDGPU_MACH_AMDGCN_GFX1100’ undeclared (first use in this function); did you mean ‘EF_AMDGPU_MACH_AMDGCN_GFX1030’?
 1711 | return EF_AMDGPU_MACH_AMDGCN_GFX1100;
  |^
  |EF_AMDGPU_MACH_AMDGCN_GFX1030
../../../source-gcc/libgomp/plugin/plugin-gcn.c: In function ‘max_isa_vgprs’:
../../../source-gcc/libgomp/plugin/plugin-gcn.c:1728:10: error: ‘EF_AMDGPU_MACH_AMDGCN_GFX1100’ undeclared (first use in this function); did you mean ‘EF_AMDGPU_MACH_AMDGCN_GFX1030’?
 1728 | case EF_AMDGPU_MACH_AMDGCN_GFX1100:
  |  ^
  |  EF_AMDGPU_MACH_AMDGCN_GFX1030
make[4]: *** [Makefile:813: libgomp_plugin_gcn_la-plugin-gcn.lo] Error 1

Fix-up for commit 52a2c659ae6c21f84b6acce0afcb9b93b9dc71a0
"GCN: Add pre-initial support for gfx1100".

	libgomp/
	* plugin/plugin-gcn.c (EF_AMDGPU_MACH): Add
	'EF_AMDGPU_MACH_AMDGCN_GFX1100'.
---
 libgomp/plugin/plugin-gcn.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index f24a28faa22..0339848451e 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -389,7 +389,8 @@ typedef enum {
   EF_AMDGPU_MACH_AMDGCN_GFX906 = 0x02f,
   EF_AMDGPU_MACH_AMDGCN_GFX908 = 0x030,
   EF_AMDGPU_MACH_AMDGCN_GFX90a = 0x03f,
-  EF_AMDGPU_MACH_AMDGCN_GFX1030 = 0x036
+  EF_AMDGPU_MACH_AMDGCN_GFX1030 = 0x036,
+  EF_AMDGPU_MACH_AMDGCN_GFX1100 = 0x041
 } EF_AMDGPU_MACH;
 
 const static int EF_AMDGPU_MACH_MASK = 0x00ff;
-- 
2.34.1
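
As a rough sketch of how such machine codes are typically consumed: the low
bits of the ELF header flags are masked with 'EF_AMDGPU_MACH_MASK' and then
compared against the enumerators.  The function name 'isa_name_from_eflags'
below is hypothetical and this is not the code in 'plugin-gcn.c'; only the
numeric values and the mask are taken from the hunk above, and enumerators not
visible in that hunk are deliberately left out.

```cpp
#include <cassert>
#include <cstring>

// Values copied from the hunk above; earlier enumerators are not shown
// there and are therefore omitted in this sketch.
enum EfAmdgpuMach {
  EF_AMDGPU_MACH_AMDGCN_GFX906 = 0x02f,
  EF_AMDGPU_MACH_AMDGCN_GFX908 = 0x030,
  EF_AMDGPU_MACH_AMDGCN_GFX90a = 0x03f,
  EF_AMDGPU_MACH_AMDGCN_GFX1030 = 0x036,
  EF_AMDGPU_MACH_AMDGCN_GFX1100 = 0x041
};

static const unsigned kEfAmdgpuMachMask = 0x00ff;

// Hypothetical helper: map ELF e_flags to an ISA name, or nullptr if the
// machine code is not one of those listed above.
const char *isa_name_from_eflags(unsigned e_flags) {
  switch (e_flags & kEfAmdgpuMachMask) {
    case EF_AMDGPU_MACH_AMDGCN_GFX906:  return "gfx906";
    case EF_AMDGPU_MACH_AMDGCN_GFX908:  return "gfx908";
    case EF_AMDGPU_MACH_AMDGCN_GFX90a:  return "gfx90a";
    case EF_AMDGPU_MACH_AMDGCN_GFX1030: return "gfx1030";
    case EF_AMDGPU_MACH_AMDGCN_GFX1100: return "gfx1100";
    default:                            return nullptr;
  }
}
```

A missing enumerator then shows up either as a compile error (as in the log
above) or, in a sketch like this one, as an unrecognized code at run time.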



Re: OpenMP offloading vs. C++ static local variables

2023-12-23 Thread Thomas Schwinge
Hi!

On 2023-12-21T13:58:23+0100, Jakub Jelinek  wrote:
> On Thu, Dec 21, 2023 at 01:31:19PM +0100, Thomas Schwinge wrote:
>> [...] the gimplification-level code re
>> 'Static locals [...] need to be "omp declare target"' runs *after*
>> 'omp_discover_implicit_declare_target'.  Thus my "move" idea above.
>
> Can't we mark the static locals already during that discovery?

Well, that's precisely what I had tried to communicate, earlier on.  ;-)

I'll work on that, as a refactoring, after I've gotten the current
implementation idea working.

> The addition during gimplification was probably made when we didn't have
> that at all.


>> OK to push, for a start, the attached
>> "GCN, nvptx: Basic '__cxa_guard_{acquire,abort,release}' for C++ static 
>> local variables support"?
>> That's now in libgcc not libgomp, so that it's also usable for GCN, nvptx
>> target testing, where we thus see a number of FAIL -> PASS progressions.
>
>> For now, for single-threaded GCN, nvptx target use only; extension for
>> multi-threaded offloading use to follow later.
>>
>>  libgcc/
>>  * c++-minimal/README: New.
>>  * c++-minimal/guard.c: New.
>>  * config/gcn/t-amdgcn (LIB2ADD): Add it.
>>  * config/nvptx/t-nvptx (LIB2ADD): Likewise.
>
>> +/* Copy'n'paste/edit from 'libstdc++-v3/libsupc++/cxxabi.h'.  */
>> +
>> +  int
>> +  __cxa_guard_acquire(__guard*);
>> +
>> +  void
>> +  __cxa_guard_release(__guard*);
>> +
>> +  void
>> +  __cxa_guard_abort(__guard*);
>
> When all this isn't inside a namespace, shouldn't it be indented by
> 2 spaces less?
>
>> +
>> +/* Copy'n'paste/edit from 'libstdc++-v3/libsupc++/guard.cc'.  */
>> +
>> +# undef _GLIBCXX_GUARD_TEST_AND_ACQUIRE
>> +# undef _GLIBCXX_GUARD_SET_AND_RELEASE
>> +# define _GLIBCXX_GUARD_SET_AND_RELEASE(G) _GLIBCXX_GUARD_SET (G)
>
> And without a space after # here?

Well, those were just un-edited copy'n'pastes from the original files;
now indentation/space-corrected for viewing pleasure.

> Otherwise LGTM, but hope that one day we'll get rid of it again.

Yep.

Pushed to master branch commit c0bf7ea189ecf252152fe15134f70f576bcd20b2
"GCN, nvptx: Basic '__cxa_guard_{acquire,abort,release}' for C++ static local 
variables support",
see attached.
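
To illustrate what "single-threaded use only" means for these entry points,
here is a minimal sketch.  It is not the contents of
'libgcc/c++-minimal/guard.c'; the 'sketch_' names are hypothetical (chosen so
as not to clash with the real '__cxa_guard_*' symbols), and it assumes the
generic Itanium C++ ABI convention that a nonzero first byte of the guard
object means "already initialized".  There is no locking at all, which is
exactly why this shape is only adequate without concurrent initializers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the ABI's 64-bit guard object.
typedef std::int64_t sketch_guard;

// Returns 1 if the caller must run the initializer, 0 if it already ran.
int sketch_guard_acquire(sketch_guard *g) {
  return *reinterpret_cast<char *>(g) == 0;
}

// Marks initialization as complete by setting the first byte.
void sketch_guard_release(sketch_guard *g) {
  *reinterpret_cast<char *>(g) = 1;
}

// Initialization failed (for example, by an exception): nothing was marked
// and no lock is held, so there is nothing to undo in this sketch.
void sketch_guard_abort(sketch_guard *) {}

static int init_calls = 0;

// Roughly the shape of what a compiler emits for
//   'static int value = 42;'
// inside a function, written out by hand against the sketch functions.
int get_value() {
  static sketch_guard guard = 0;
  static int value;
  if (sketch_guard_acquire(&guard)) {
    value = 42;  // the "dynamic initializer"
    ++init_calls;
    sketch_guard_release(&guard);
  }
  return value;
}
```

A multi-threaded variant would additionally need an "initialization in
progress" state and some form of waiting, which is the extension left for
later.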


Grüße
 Thomas


>From c0bf7ea189ecf252152fe15134f70f576bcd20b2 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 20 Dec 2023 12:27:48 +0100
Subject: [PATCH] GCN, nvptx: Basic '__cxa_guard_{acquire,abort,release}' for
 C++ static local variables support

For now, for single-threaded GCN, nvptx target use only; extension for
multi-threaded offloading use is to follow later.  Eventually switch to
libstdc++-v3/libsupc++ proper.

	libgcc/
	* c++-minimal/README: New.
	* c++-minimal/guard.c: New.
	* config/gcn/t-amdgcn (LIB2ADD): Add it.
	* config/nvptx/t-nvptx (LIB2ADD): Likewise.
---
 libgcc/c++-minimal/README   |  2 +
 libgcc/c++-minimal/guard.c  | 97 +
 libgcc/config/gcn/t-amdgcn  |  3 ++
 libgcc/config/nvptx/t-nvptx |  3 ++
 4 files changed, 105 insertions(+)
 create mode 100644 libgcc/c++-minimal/README
 create mode 100644 libgcc/c++-minimal/guard.c

diff --git a/libgcc/c++-minimal/README b/libgcc/c++-minimal/README
new file mode 100644
index 000..832f1265f7e
--- /dev/null
+++ b/libgcc/c++-minimal/README
@@ -0,0 +1,2 @@
+Minimal hacked-up version of some C++ support for offload devices, until we
+have libstdc++-v3/libsupc++ proper.
diff --git a/libgcc/c++-minimal/guard.c b/libgcc/c++-minimal/guard.c
new file mode 100644
index 000..e9937b07a62
--- /dev/null
+++ b/libgcc/c++-minimal/guard.c
@@ -0,0 +1,97 @@
+/* 'libstdc++-v3/libsupc++/guard.cc' for offload devices, until we have
+   libstdc++-v3/libsupc++ proper.
+
+   Copyright (C) 2002-2023 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundati

Re: OpenMP offloading vs. C++ static local variables

2023-12-21 Thread Thomas Schwinge
Hi Jakub!

On 2023-12-07T16:33:08+0100, Jakub Jelinek  wrote:
> On Thu, Dec 07, 2023 at 04:09:04PM +0100, Thomas Schwinge wrote:
>> > Yeah, I believe we should in the omp_discover_* sub-pass handle with
>> > a help of a langhook automatically mark the guard variables (possibly
>> > iff the guarded variable is marked?),
>>
>> Looking at 'gcc/omp-offload.cc:omp_discover_implicit_declare_target' left
>> me confused how that would be the code that marks up 'static' variables
>> as implicit 'omp declare target'.  Working through a simple POD example
>> (say, 's%static S s%static int i') it turns out, indeed that's not where
>> that is happening, but instead 'gcc/gimplify.cc:gimplify_bind_expr' is
>> the place:
>
> Sure, that is for the case where those local statics should be marked
> implicitly because they appear in a target function.
> They can be also marked explicitly by the user through
> #pragma omp declare target enter (name_of_static_var)
> or
> [[omp::decl (declare target)]] attribute on it etc.

These three (implicit marking, explicit '#pragma omp declare target' etc.,
or placement inside a '#pragma omp begin declare target' region) are the only
OpenMP facilities to get things 'omp declare target'ed, right?

>> That said...  Couldn't we indeed move this gimplification-level code re
>> 'Static locals [...] need to be "omp declare target"' into
>> 'gcc/omp-offload.cc:omp_discover_implicit_declare_target'?
>
> The omp-offload.cc discovery stuff was added for stuff where the OpenMP
> standard says something is implicitly declare target because there is
> some use of it satisfying some rule.
> Like, calls to functions defined in current compilation unit referenced in
> target region or something similar, or such calls referenced in declare
> target static var initializers.
> So, that feels to me like the right spot to handle the guards as well.
> Of course, the middle-end doesn't know about C++ FE's get_guard variable,
> so it should be some new language hook which would take care of it.
> The omp_discover_declare* functions can add further VAR_DECLs to the
> worklist, so I'd probably call the new language hook in the
> omp_discover_implicit_declare_target last loop.
> Or maybe even better just handle that in the
> cxx_omp_finish_decl_inits hook.  You can just
>   FOR_EACH_VARIABLE (vnode)
>     if (DECL_FUNCTION_SCOPE_P (vnode->decl)
>         && omp_declare_target_var_p (vnode->decl))
>       {
>         tree sname = mangle_guard_variable (vnode->decl);
>         tree guard = get_global_binding (sname);
>         if (guard)
>           ... mark guard as declare target if not yet marked ...
>       }
> because guard var initializers don't really mention anything and so
> their addition doesn't need to trigger further worklist changes.

That doesn't generally work, as the gimplification-level code re
'Static locals [...] need to be "omp declare target"' runs *after*
'omp_discover_implicit_declare_target'.  Thus my "move" idea above.
However, let's defer the latter one; I've now got a simple setup where
the new language hook is invoked in all necessary places.  (Will post
later.)

>> > And sure, __cxa_guard_* would need to be implemented in the offloading
>> > libsupc++.a or libstdc++.a.
>>
>> Until proper libstdc++/libsupc++ support emerges (I'm working on it...),
>> my idea was to add a temporary 'libgomp/config/accel/*.c' implementation
>> (based on 'libstdc++-v3/libsupc++/guard.cc').
>
> That looks reasonable.

OK to push, for a start, the attached
"GCN, nvptx: Basic '__cxa_guard_{acquire,abort,release}' for C++ static local 
variables support"?
That's now in libgcc not libgomp, so that it's also usable for GCN, nvptx
target testing, where we thus see a number of FAIL -> PASS progressions.


Grüße
 Thomas


>From d40678768ae90c3fe1208cffd7d92e7058db5bbf Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 20 Dec 2023 12:27:48 +0100
Subject: [PATCH] GCN, nvptx: Basic '__cxa_guard_{acquire,abort,release}' for
 C++ static local variables support

For now, for single-threaded GCN, nvptx target use only; extension for
multi-threaded offloading use to follow later.

	libgcc/
	* c++-minimal/README: New.
	* c++-minimal/guard.c: New.
	* config/gcn/t-amdgcn (LIB2ADD): Add it.
	* config/nvptx/t-nvptx (LIB2ADD): Likewise.
---
 libgcc/c++-minimal/README   |  2 +
 libgcc/c++-minimal/guard.c  | 97 +
 libgcc/config/gcn/t-amdgcn  |  3 ++
 libgcc/config/nv

RE: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-21 Thread Thomas Schwinge
Hi!

On 2023-12-13T21:52:29+0100, I wrote:
> On 2023-12-12T02:05:26+, "Zhu, Lipeng"  wrote:
>> On 2023/12/12 1:45, H.J. Lu wrote:
>>> On Sat, Dec 9, 2023 at 7:25 PM Zhu, Lipeng  wrote:
>>> > On 2023/12/9 23:23, Jakub Jelinek wrote:
>>> > > On Sat, Dec 09, 2023 at 10:39:45AM -0500, Lipeng Zhu wrote:
>>> > > > This patch tries to introduce the rwlock and split the read/write to
>>> > > > unit_root tree and unit_cache with rwlock instead of the mutex to
>>> > > > increase CPU efficiency. In the get_gfc_unit function, the
>>> > > > percentage to step into the insert_unit function is around 30%, in
>>> > > > most instances, we can get the unit in the phase of reading the
>>> > > > unit_cache or unit_root tree. So split the read/write phase by
>>> > > > rwlock would be an approach to make it more parallel.
>>> > > >
>>> > > > BTW, the IPC metrics can gain around 9x in our test server with
>>> > > > 220 cores. The benchmark we used is
>>> > > > https://github.com/rwesson/NEAT
>
>>> > > Ok for trunk, thanks.
>
>>> > Thanks! Looking forward to landing to trunk.
>
>>> Pushed for you.

> I've just filed 
> "'libgomp.fortran/rwlock_1.f90', 'libgomp.fortran/rwlock_3.f90' execution 
> test timeouts".
> Would you be able to look into that?

See my update in there.


Grüße
 Thomas


Re: [PATCH] tree-optimization/113073 - amend PR112736 fix

2023-12-20 Thread Thomas Schwinge
Hi Richard!

On 2023-12-20T14:44:29+0100, Richard Biener  wrote:
> On Wed, 20 Dec 2023, Richard Biener wrote:
>> On Wed, 20 Dec 2023, Thomas Schwinge wrote:
>> > On 2023-12-19T13:30:58+0100, Richard Biener  wrote:
>> > > The PR112736 testcase fails on RISC-V because the aligned exception
>> > > uses the wrong check.  The alignment support scheme can be
>> > > dr_aligned even when the access isn't aligned to the vector size
>> > > but some targets are happy with element alignment.  The following
>> > > fixes that.
>> > >
>> > > Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
>> >
>> > I've noticed this regresses the GCN target as follows:
>> >
>> > PASS: gcc.dg/vect/bb-slp-pr78205.c (test for excess errors)
>> > PASS: gcc.dg/vect/bb-slp-pr78205.c scan-tree-dump-times slp2 
>> > "optimized: basic block" 3
>> > PASS: gcc.dg/vect/bb-slp-pr78205.c scan-tree-dump-times slp2 "BB 
>> > vectorization with gaps at the end of a load is not supported" 1
>> > [-PASS:-]{+FAIL:+} gcc.dg/vect/bb-slp-pr78205.c scan-tree-dump-times 
>> > optimized " = c\\[4\\];" 1

> Should be fixed by r14-6748-ga8f0278ade1353

Thanks!  Confirmed that 'gcc.dg/vect/bb-slp-pr78205.c' again is all-PASS
(and no unexpected changes in the dumps, and 'bb-slp-pr78205.s' identical
once again, before vs. after).


Grüße
 Thomas


No libstdc++ for GCN (was: No libstdc++ for nvptx)

2023-12-20 Thread Thomas Schwinge
Hi!

On 2015-03-11T22:44:27+0100, I wrote:
> I committed the following in r221362:

> No libstdc++ for nvptx.
>
> The C++ front end insists to link against libstdc++ -- which we don't 
> build:
>
> $ < build-gcc/gcc/testsuite/g++/g++.log grep -o 'error opening 
> [^[:cntrl:]]*' | sort | uniq -c
>   2 error opening libasan.a
>   2 error opening libssp.a
>   12075 error opening libstdc++.a
>
> Based on GCC trunk r220892:
>
> === g++ Summary ===
>
> # of expected passes        [-63221-]{+68841+}
> # of unexpected failures    [-11751-]{+8764+}
> # of unexpected successes   6
> # of expected failures      [-246-]{+249+}
> # of unresolved testcases   [-5950-]{+3353+}
> # of unsupported tests      [-4160-]{+4143+}

> --- gcc/config/nvptx/nvptx.h
> +++ gcc/config/nvptx/nvptx.h

> +/* The C++ front end insists to link against libstdc++ -- which we don't 
> build.
> +   Tell it to instead link against the innocuous libgcc.  */
> +#define LIBSTDCXX "gcc"

Pushed to master branch commit 4d9d015cf4054f5f9df14a2c11ce81379b6caf0f
"No libstdc++ for GCN", see attached.

(Both these commits are going to get reverted once I've got libstdc++-v3
enabled for GCN, nvptx, but until then, this further harmonizes my GCN
vs. nvptx test results.)


Grüße
 Thomas


>From 4d9d015cf4054f5f9df14a2c11ce81379b6caf0f Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 13 Apr 2023 08:54:47 +0200
Subject: [PATCH] No libstdc++ for GCN

Like commit d94fae044da071381b73a2ee8afa874b14fa3820 "No libstdc++ for nvptx"
(2015) and elsewhere.

Based on commit 5f1bed2a7af828103ca23a3546466a23e8dd2f30 (2023-12-16), there
are a ton of progressions (for test cases not actually depending on libstdc++
symbols, obviously):

=== g++ Summary ===

# of expected passes        [-178369-]{+189226+}
# of unexpected failures    [-19880-]{+14089+}
# of unexpected successes   14
# of expected failures      [-1684-]{+1685+}
# of unresolved testcases   [-9820-]{+4837+}
# of unsupported tests      [-11971-]{+11968+}

..., and only two benign "regressions":

[-UNSUPPORTED:-]{+FAIL:+} g++.dg/init/array54.C  -std=c++14 {+(test for excess errors)+}
{+UNRESOLVED: g++.dg/init/array54.C  -std=c++14 compilation failed to produce executable+}
[Etc.]

[...]/g++.dg/init/array54.C:5:10: fatal error: atomic: No such file or directory

That's similar to a lot of other test cases intending to '#include' standard
C++/libstdc++ headers; to be addressed in due time.

PASS: g++.old-deja/g++.pt/const2.C  -std=c++98  at line 5 (test for warnings, line )
[-PASS:-]{+FAIL:+} g++.old-deja/g++.pt/const2.C  -std=c++98 (test for excess errors)
[Etc.]

ld: error: undefined symbol: A::i
>>> referenced by /tmp/ccqXWCSh.o:(p)

The 'error: undefined symbol' is expected here; maybe the test case should
simply use 'dg-prune-output "referenced by"'?  (This PASSed before, as the
'dg-message "i"' was satisfied by 'ld: error: unable to find library -lstdc++',
eh...)

	gcc/
	* config/gcn/gcn.h (LIBSTDCXX): Define to "gcc".
---
 gcc/config/gcn/gcn.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/config/gcn/gcn.h b/gcc/config/gcn/gcn.h
index cb52be7a3a1..b8f2854d497 100644
--- a/gcc/config/gcn/gcn.h
+++ b/gcc/config/gcn/gcn.h
@@ -836,3 +836,7 @@ enum gcn_builtin_codes
   || M == V2SFmode || M == V2DImode || M == V2DFmode) \
? 2 \
: 1)
+
+/* The C++ front end insists to link against libstdc++ -- which we don't build.
+   Tell it to instead link against the innocuous libgcc.  */
+#define LIBSTDCXX "gcc"
-- 
2.34.1



Re: [PATCH] tree-optimization/113073 - amend PR112736 fix

2023-12-20 Thread Thomas Schwinge
Hi!

On 2023-12-19T13:30:58+0100, Richard Biener  wrote:
> The PR112736 testcase fails on RISC-V because the aligned exception
> uses the wrong check.  The alignment support scheme can be
> dr_aligned even when the access isn't aligned to the vector size
> but some targets are happy with element alignment.  The following
> fixes that.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

I've noticed that this regresses the GCN target as follows:

PASS: gcc.dg/vect/bb-slp-pr78205.c (test for excess errors)
PASS: gcc.dg/vect/bb-slp-pr78205.c scan-tree-dump-times slp2 "optimized: basic block" 3
PASS: gcc.dg/vect/bb-slp-pr78205.c scan-tree-dump-times slp2 "BB vectorization with gaps at the end of a load is not supported" 1
[-PASS:-]{+FAIL:+} gcc.dg/vect/bb-slp-pr78205.c scan-tree-dump-times optimized " = c\\[4\\];" 1

As so often, I've got no clue whether that's a vectorizer, GCN back end,
or test case issue.  ;-)

'diff'ing before vs. after:

--- bb-slp-pr78205.c.191t.slp2   2023-12-20 09:49:45.834344620 +0100
+++ bb-slp-pr78205.c.191t.slp2   2023-12-20 09:10:14.706300941 +0100
[...]
@@ -505,8 +505,9 @@
 [...]/bb-slp-pr78205.c:9:8: note: create vector_type-pointer variable to 
type: vector(4) double  vectorizing a pointer ref: c[0]
 [...]/bb-slp-pr78205.c:9:8: note: created [0]
 [...]/bb-slp-pr78205.c:9:8: note: add new stmt: vect__1.7_19 = MEM 
 [(double *)];
-[...]/bb-slp-pr78205.c:9:8: note: add new stmt: vect__1.8_20 = MEM 
 [(double *) + 32B];
-[...]/bb-slp-pr78205.c:9:8: note: add new stmt: vect__1.9_21 = 
VEC_PERM_EXPR ;
+[...]/bb-slp-pr78205.c:9:8: note: add new stmt: _20 = MEM[(double *) + 
32B];
+[...]/bb-slp-pr78205.c:9:8: note: add new stmt: vect__1.8_21 = {_20, 0.0, 
0.0, 0.0};
+[...]/bb-slp-pr78205.c:9:8: note: add new stmt: vect__1.9_22 = 
VEC_PERM_EXPR ;
 [...]/bb-slp-pr78205.c:9:8: note: -->vectorizing SLP node starting 
from: a[0] = _1;
 [...]/bb-slp-pr78205.c:9:8: note: vect_is_simple_use: operand c[0], type 
of def: internal
 [...]/bb-slp-pr78205.c:9:8: note: vect_is_simple_use: operand c[1], type 
of def: internal
[...]
@@ -537,9 +538,10 @@
 [...]/bb-slp-pr78205.c:13:8: note: transform load. ncopies = 1
 [...]/bb-slp-pr78205.c:13:8: note: create vector_type-pointer variable to 
type: vector(4) double  vectorizing a pointer ref: c[2]
 [...]/bb-slp-pr78205.c:13:8: note: created [2]
-[...]/bb-slp-pr78205.c:13:8: note: add new stmt: vect__3.14_23 = MEM 
 [(double *)];
-[...]/bb-slp-pr78205.c:13:8: note: add new stmt: vect__3.15_24 = MEM 
 [(double *) + 32B];
-[...]/bb-slp-pr78205.c:13:8: note: add new stmt: vect__1.16_25 = 
VEC_PERM_EXPR ;
+[...]/bb-slp-pr78205.c:13:8: note: add new stmt: vect__3.14_24 = MEM 
 [(double *)];
+[...]/bb-slp-pr78205.c:13:8: note: add new stmt: _25 = MEM[(double *) + 
32B];
+[...]/bb-slp-pr78205.c:13:8: note: add new stmt: vect__3.15_26 = {_25, 
0.0, 0.0, 0.0};
+[...]/bb-slp-pr78205.c:13:8: note: add new stmt: vect__1.16_27 = 
VEC_PERM_EXPR ;
 [...]/bb-slp-pr78205.c:13:8: note: -->vectorizing SLP node starting 
from: b[0] = _3;
 [...]/bb-slp-pr78205.c:13:8: note: vect_is_simple_use: operand c[2], type 
of def: internal
 [...]/bb-slp-pr78205.c:13:8: note: vect_is_simple_use: operand c[3], type 
of def: internal
[...]
@@ -580,18 +582,22 @@
   double _4;
   double _5;
   vector(2) double _17;
+  double _20;
+  double _25;

[local count: 1073741824]:
   vect__1.7_19 = MEM  [(double *)];
-  vect__1.9_21 = VEC_PERM_EXPR ;
+  _20 = MEM[(double *) + 32B];
+  vect__1.9_22 = VEC_PERM_EXPR ;
   _1 = c[0];
   _2 = c[1];
-  MEM  [(double *)] = vect__1.9_21;
-  vect__3.14_23 = MEM  [(double *)];
-  vect__1.16_25 = VEC_PERM_EXPR ;
+  MEM  [(double *)] = vect__1.9_22;
+  vect__3.14_24 = MEM  [(double *)];
+  _25 = MEM[(double *) + 32B];
+  vect__1.16_27 = VEC_PERM_EXPR ;
   _3 = c[2];
   _4 = c[3];
-  MEM  [(double *)] = vect__1.16_25;
+  MEM  [(double *)] = vect__1.16_27;
   _5 = c[4];
   _17 = {_5, _5};
   MEM  [(double *)] = _17;

--- bb-slp-pr78205.c.265t.optimized   2023-12-20 09:49:45.838344586 +0100
+++ bb-slp-pr78205.c.265t.optimized   2023-12-20 09:10:14.706300941 +0100
@@ -6,17 +6,17 @@
   vector(4) double vect__1.16;
   vector(4) double vect__1.9;
   vector(4) double vect__1.7;
-  double _5;
   vector(2) double _17;
+  double _20;

[local count: 1073741824]:
   vect__1.7_19 = MEM  [(double *)];
-  vect__1.9_21 = VEC_PERM_EXPR ;
-  MEM  [(double *)] = vect__1.9_21;
-  vect__1.16_25 = VEC_PERM_EXPR ;
-  MEM  [(double *)] = vect__1.16_25;
-  _5 = c[4];
-  _17 = {_5, _5};
+  _20 = MEM[(double *) + 32B];
+  vect__1.9_22 = VEC_PERM_EXPR ;
+  MEM  [(double *)] = vect__1.9_22;
+  vect__1.16_27 = VEC_PERM_EXPR ;

Unify OpenACC/C and C++ behavior re duplicate OpenACC 'declare' directives for 'extern' variables [PR90868] (was: [committed] [PR90868] Document status quo for duplicate OpenACC 'declare' directives f

2023-12-19 Thread Thomas Schwinge
Hi!

On 2019-06-19T00:25:49+0200, I wrote:
> This doesn't resolve PR90868, but at least in trunk r272445 we now
> "Document status quo for duplicate OpenACC 'declare' directives for
> 'extern' variables", see attached.

> --- a/gcc/testsuite/c-c++-common/goacc/declare-1.c
> +++ b/gcc/testsuite/c-c++-common/goacc/declare-1.c

> +/* The same as 'f'.  */
> +
> +void
> +f_2 (void)
> +{

> +#ifndef __cplusplus
> +  /* TODO PR90868
> +
> + C: "error: variable '[...]' used more than once with '#pragma acc 
> declare'".  */
> +#else
> +  extern int ve0;
> +#pragma acc declare create(ve0)

>  /* The same as 'f' but everything contained in an OpenACC 'data' construct.  
> */
>
>  void
> @@ -115,7 +193,12 @@ f_data (void)
>  int va3;
>  # pragma acc declare device_resident(va3)
>
> -#if 0 /* TODO */
> +#if 0
> +/* TODO PR90868
> +
> +   C: "error: variable '[...]' used more than once with '#pragma acc 
> declare'".
> +   C++: ICE during gimplification.  */
> +
>  extern int ve0;
>  # pragma acc declare create(ve0)

Pushed to master branch commit cf840a7f7c14242ab7018071310851486a557d4f
"Unify OpenACC/C and C++ behavior re duplicate OpenACC 'declare' directives for 
'extern' variables [PR90868]",
see attached.
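
As a rough illustration of the idea behind the attached change (check for "used more than once" on the DECL that actually receives the attribute), here is a toy model in plain C.  All names ('toy_decl', 'toy_mark_acc_declare') are hypothetical; this is not GCC's 'tree' API, only a sketch of the ordering "resolve the alias first, then check":

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a declaration; not GCC's 'tree'.  */
struct toy_decl
{
  const char *name;
  struct toy_decl *alias;  /* Non-NULL for a block-scope 'extern'.  */
  bool acc_declared;       /* Models the "omp declare target" attribute.  */
};

/* Return true if the mark was newly set, false if the entity had
   already been marked ("used more than once").  */
bool
toy_mark_acc_declare (struct toy_decl *decl)
{
  /* First resolve to the entity that is actually being referred to...  */
  if (decl->alias != NULL)
    decl = decl->alias;

  /* ... and only then check for a duplicate, on that same entity.  */
  if (decl->acc_declared)
    return false;

  decl->acc_declared = true;
  return true;
}
```

In this model, two block-scope 'extern int ve0;' declarations in different functions alias the same entity, so the second 'declare' is diagnosed, which corresponds to the behavior that C already had.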

Later we'll have to check how changes like
commit 4e62aca0e0520e4ed2532f2d8153581190621c1a
"c++: block-scope externs get an alias [PR95677,PR31775,PR95677]",
commit db3d7270b42fe27fb05664c4fdf524ab7ad13a75
"openmp: Fix up declare target handling for vars with DECL_LOCAL_DECL_ALIAS 
[PR102640]"
(and possibly others) actually apply to OpenACC 'declare'.


Grüße
 Thomas


>From cf840a7f7c14242ab7018071310851486a557d4f Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 18 Dec 2023 17:25:17 +0100
Subject: [PATCH] Unify OpenACC/C and C++ behavior re duplicate OpenACC
 'declare' directives for 'extern' variables [PR90868]

This likely still isn't what OpenACC actually intends (addressing that is for
another day), but at least we now misbehave consistently for C and C++.

	PR c++/90868
	gcc/cp/
	* parser.cc (cp_parser_oacc_declare): For "more than once", check
	the DECL that we're actually setting the attribute on.
	gcc/testsuite/
	* c-c++-common/goacc/declare-1.c: Adjust.
	* c-c++-common/goacc/declare-2.c: Likewise.
---
 gcc/cp/parser.cc | 23 +++--
 gcc/testsuite/c-c++-common/goacc/declare-1.c |  9 +++---
 gcc/testsuite/c-c++-common/goacc/declare-2.c | 34 
 3 files changed, 29 insertions(+), 37 deletions(-)

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index e4fbab1bab5..1e2d520345b 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -46962,20 +46962,8 @@ cp_parser_oacc_declare (cp_parser *parser, cp_token *pragma_tok)
 	  continue;
 	}
 
-  if (lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl))
-	  || lookup_attribute ("omp declare target link",
-			   DECL_ATTRIBUTES (decl)))
-	{
-	  error_at (loc, "variable %qD used more than once with "
-		"%<#pragma acc declare%>", decl);
-	  error = true;
-	  continue;
-	}
-
   if (!error)
 	{
-	  tree id;
-
 	  if (DECL_LOCAL_DECL_P (decl))
 	/* We need to mark the aliased decl, as that is the entity
 	   that is being referred to.  This won't work for
@@ -46987,6 +46975,17 @@ cp_parser_oacc_declare (cp_parser *parser, cp_token *pragma_tok)
 	  if (alias != error_mark_node)
 		decl = alias;
 
+	  if (lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl))
+	  || lookup_attribute ("omp declare target link",
+   DECL_ATTRIBUTES (decl)))
+	{
+	  error_at (loc, "variable %qD used more than once with "
+			"%<#pragma acc declare%>", decl);
+	  error = true;
+	  continue;
+	}
+
+	  tree id;
 	  if (OMP_CLAUSE_MAP_KIND (t) == GOMP_MAP_LINK)
 	id = get_identifier ("omp declare target link");
 	  else
diff --git a/gcc/testsuite/c-c++-common/goacc/declare-1.c b/gcc/testsuite/c-c++-common/goacc/declare-1.c
index 46ee01b6759..808dc2ac818 100644
--- a/gcc/testsuite/c-c++-common/goacc/declare-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/declare-1.c
@@ -113,11 +113,11 @@ f_2 (void)
   int va3;
 #pragma acc declare device_resident(va3)
 
-#ifndef __cplusplus
+#if 0
   /* TODO PR90868
 
- C: "error: variable '[...]' used more than once with '#pragma acc declare'".  */
-#else
+ "error: variable '[...]' used more than once with '#pragma acc declare'".  *

libgrust: 'AM_ENABLE_MULTILIB' only for target builds [PR113056] (was: [PATCH v2 2/4] libgrust: Add libproc_macro and build system)

2023-12-18 Thread Thomas Schwinge
Hi!

> --- a/libgrust/configure.ac
> +++ b/libgrust/configure.ac

> -# AM_ENABLE_MULTILIB(, ..)
> +AM_ENABLE_MULTILIB(, ..)

Such a change was applied eventually, and is necessary for target builds
-- but potentially harmful for host builds.  OK to push the attached
"libgrust: 'AM_ENABLE_MULTILIB' only for target builds [PR113056]"?


Grüße
 Thomas


>From 71e00b191bd630aa3be66e38069c707ae76a91d3 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 18 Dec 2023 16:27:39 +0100
Subject: [PATCH] libgrust: 'AM_ENABLE_MULTILIB' only for target builds
 [PR113056]

..., but not for host builds, which don't need it and where it may cause the
build to fail.

Use what appears to be the standard pattern for that (see
'libbacktrace/configure.ac', 'zlib/configure.ac').

	PR rust/113056
	libgrust/
	* configure.ac: 'AM_ENABLE_MULTILIB' only for target builds.
	* configure: Regenerate.
---
 libgrust/configure| 8 +---
 libgrust/configure.ac | 4 +++-
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/libgrust/configure b/libgrust/configure
index 5388a0e22a6..e778a253915 100755
--- a/libgrust/configure
+++ b/libgrust/configure
@@ -2387,7 +2387,8 @@ ac_compiler_gnu=$ac_cv_c_compiler_gnu
 ac_config_files="$ac_config_files Makefile"
 
 
-# Default to --enable-multilib
+if test -n "${with_target_subdir}"; then
+  # Default to --enable-multilib
 # Check whether --enable-multilib was given.
 if test "${enable_multilib+set}" = set; then :
   enableval=$enable_multilib; case "$enableval" in
@@ -2424,6 +2425,7 @@ fi
 
 ac_config_commands="$ac_config_commands default-1"
 
+fi
 
 # Do not delete or change the following two lines.  For why, see
 # http://gcc.gnu.org/ml/libstdc++/2003-07/msg00451.html
@@ -12653,7 +12655,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12656 "configure"
+#line 12658 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -12759,7 +12761,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12762 "configure"
+#line 12764 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
diff --git a/libgrust/configure.ac b/libgrust/configure.ac
index 226c42ba649..adfb3500fb3 100644
--- a/libgrust/configure.ac
+++ b/libgrust/configure.ac
@@ -2,7 +2,9 @@ AC_INIT([libgrust], version-unused,,libgrust)
 AC_CONFIG_SRCDIR(Makefile.am)
 AC_CONFIG_FILES([Makefile])
 
-AM_ENABLE_MULTILIB(, ..)
+if test -n "${with_target_subdir}"; then
+  AM_ENABLE_MULTILIB(, ..)
+fi
 
 # Do not delete or change the following two lines.  For why, see
 # http://gcc.gnu.org/ml/libstdc++/2003-07/msg00451.html
-- 
2.34.1



Re: [committed] amdgcn: XNACK support

2023-12-18 Thread Thomas Schwinge
Hi Andrew!

On 2023-12-13T15:46:45+, Andrew Stubbs  wrote:
> Some AMD GCN devices support an "XNACK" mode in which the device can
> handle page-misses (and maybe other traps in memory instructions), but
> it's not completely invisible to software.
>
> We need this now to support OpenMP Unified Shared Memory (I plan to post
> updated patches for that in January), and in future it may enable
> support for APU devices (such as MI300).
>
> The first patch ensures that load instructions are "restartable",
> meaning that the outputs do not overwrite the input registers (address
> and offsets). This maps pretty much exactly to the GCC "early-clobber"
> concept, so we just need to add additional alternatives and then not
> generate problem instructions explicitly.
>
> The second patch is a workaround for the register allocation patch I
> asked about on gcc@ yesterday.  The early clobber increases register
> pressure which causes compile failure when LRA is unable to spill
> additional registers without needing yet more registers. This doesn't
> become a problem on gfx90a (MI200) so soon due to the additional AVGPR
> spill registers, and that's the only device that really supports USM, so
> far, so limiting XNACK to that device will work for now.

In case that's useful (I don't know which test cases you've been looking
at) -- in GCN target testing, I've (presumably) run into this issue here:

{+WARNING: gfortran.dg/pr92161.f   -O  (test for excess errors) program timed out.+}
[-PASS:-]{+FAIL:+} gfortran.dg/pr92161.f   -O  (test for excess errors)

When reproducing manually with '-march=gfx90a', the timeout disappears once
'-mxnack=off' is added.  Similarly, for '-march=gfx908', '-march=gfx906',
'-march=gfx900', it shows up with explicit '-mxnack=on'.


Grüße
 Thomas


> The -mxnack option was already added as a placeholder, so not much is
> needed there.
>
> Committed to master. An older version of these patches is already
> committed to devel/omp/gcc-13 (OG13).
>
> Andrew


Re: [PATCH v7 4/5] OpenMP/OpenACC: Unordered/non-constant component offset runtime diagnostic

2023-12-15 Thread Thomas Schwinge
Hi!

On 2023-12-14T15:26:38+0100, Tobias Burnus  wrote:
> On 19.08.23 00:47, Julian Brown wrote:
>> This patch adds support for non-constant component offsets in "map"
>> clauses for OpenMP (and the equivalents for OpenACC) [...]

Should we eventually also add some OpenACC test cases?


> LGTM with:
>
> - inclusion of your follow-up fix for shared-memory systems (see email
> of August 21)

This was applied here:

>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-2.c

>> +/* { dg-output "(\n|\r|\r\n)" } */
>> +/* { dg-output "libgomp: Mapped array elements must be the same .*(\n|\r|\r\n)+" } */
>> +/* { dg-shouldfail "" { offload_device_nonshared_as } } */

..., and here:

>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-3.c

>> +/* { dg-output "(\n|\r|\r\n)" } */
>> +/* { dg-output "libgomp: Mapped array elements must be the same .*(\n|\r|\r\n)+" } */
>> +/* { dg-shouldfail "" { offload_device_nonshared_as } } */

..., but not here:

>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.fortran/map-subarray-5.f90

>> +! { dg-output "(\n|\r|\r\n)" }
>> +! { dg-output "libgomp: Mapped array elements must be the same .*(\n|\r|\r\n)+" }
>> +! { dg-shouldfail "" { offload_device_nonshared_as } }

Pushed to master branch commit bc7546e32c5a942e240ef97776352d21105ef291
"In 'libgomp.fortran/map-subarray-5.f90', restrict 'dg-output's to 'target 
offload_device_nonshared_as'",
see attached.


Grüße
 Thomas


>From bc7546e32c5a942e240ef97776352d21105ef291 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 15 Dec 2023 13:05:24 +0100
Subject: [PATCH] In 'libgomp.fortran/map-subarray-5.f90', restrict
 'dg-output's to 'target offload_device_nonshared_as'

..., as in 'libgomp.c-c++-common/map-arrayofstruct-{2,3}.c'.

Minor fix-up for commit f5745dc1426bdb1a53ebaf7af758b2250ccbff02
"OpenMP/OpenACC: Unordered/non-constant component offset runtime diagnostic".

	libgomp/
	* testsuite/libgomp.fortran/map-subarray-5.f90: Restrict
	'dg-output's to 'target offload_device_nonshared_as'.
---
 libgomp/testsuite/libgomp.fortran/map-subarray-5.f90 | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libgomp/testsuite/libgomp.fortran/map-subarray-5.f90 b/libgomp/testsuite/libgomp.fortran/map-subarray-5.f90
index e7cdf11e610..59ad01ab76b 100644
--- a/libgomp/testsuite/libgomp.fortran/map-subarray-5.f90
+++ b/libgomp/testsuite/libgomp.fortran/map-subarray-5.f90
@@ -49,6 +49,6 @@ end do
 
 end
 
-! { dg-output "(\n|\r|\r\n)" }
-! { dg-output "libgomp: Mapped array elements must be the same .*(\n|\r|\r\n)+" }
+! { dg-output "(\n|\r|\r\n)" { target offload_device_nonshared_as } }
+! { dg-output "libgomp: Mapped array elements must be the same .*(\n|\r|\r\n)+" { target offload_device_nonshared_as } }
 ! { dg-shouldfail "" { offload_device_nonshared_as } }
-- 
2.34.1



Re: [PATCH v2 2/4] libgrust: Add libproc_macro and build system

2023-12-15 Thread Thomas Schwinge
Hi Jason!

I think you usually deal with this kind of GCC Git thing?  If not,
please let me know.

On 2023-10-26T10:21:18+0200, I wrote:
> First, I've pushed into GCC upstream Git branch devel/rust/libgrust-v2
> the "v2" libgrust changes as posted by Arthur, so that people can easily
> test this before it gets into Git master branch.  [...]

Please now delete the GCC Git 'devel/rust/libgrust-v2' branch, which was
only used temporarily, and is now obsolete.

$ git push upstream :devel/rust/libgrust-v2
remote: *** Deleting branch 'devel/rust/libgrust-v2' is not allowed.
remote: *** 
remote: *** This repository currently only allow the deletion of references
remote: *** whose name matches the following:
remote: *** 
remote: *** refs/users/[^/]*/heads/.*
remote: *** refs/vendors/[^/]*/heads/.*
remote: *** 
remote: *** Branch deletion is only allowed for user and vendor branches.  If another branch was created by mistake, contact an administrator to delete it on the server with git update-ref.  If a development branch is dead, also contact an administrator to move it under refs/dead/heads/ rather than deleting it.
remote: error: hook declined to update refs/heads/devel/rust/libgrust-v2
To git+ssh://gcc.gnu.org/git/gcc.git
 ! [remote rejected]   devel/rust/libgrust-v2 (hook declined)
error: failed to push some refs to 'git+ssh://gcc.gnu.org/git/gcc.git'


Grüße
 Thomas


RE: [PATCH v4] [tree-optimization/110279] Consider FMA in get_reassociation_width

2023-12-15 Thread Thomas Schwinge
Hi!

On 2023-12-13T08:14:28+, Di Zhao OS  wrote:
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr110279-2.c
> @@ -0,0 +1,41 @@
> +/* PR tree-optimization/110279 */
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast --param tree-reassoc-width=4 --param 
> fully-pipelined-fma=1 -fdump-tree-reassoc2-details -fdump-tree-optimized" } */
> +/* { dg-additional-options "-march=armv8.2-a" { target aarch64-*-* } } */
> +
> +#define LOOP_COUNT 8
> +typedef double data_e;
> +
> +#include <stdio.h>
> +
> +__attribute_noinline__ data_e
> +foo (data_e in)

Pushed to master branch commit 91e9e8faea4086b3b8aef2355fc12c1559d425f6
"Fix 'gcc.dg/pr110279-2.c' syntax error due to '__attribute_noinline__'",
see attached.

However:

> +{
> +  data_e a1, a2, a3, a4;
> +  data_e tmp, result = 0;
> +  a1 = in + 0.1;
> +  a2 = in * 0.1;
> +  a3 = in + 0.01;
> +  a4 = in * 0.59;
> +
> +  data_e result2 = 0;
> +
> +  for (int ic = 0; ic < LOOP_COUNT; ic++)
> +{
> +  /* Test that a complete FMA chain with length=4 is not broken.  */
> +  tmp = a1 + a2 * a2 + a3 * a3 + a4 * a4 ;
> +  result += tmp - ic;
> +  result2 = result2 / 2 - tmp;
> +
> +  a1 += 0.91;
> +  a2 += 0.1;
> +  a3 -= 0.01;
> +  a4 -= 0.89;
> +
> +}
> +
> +  return result + result2;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "was chosen for reassociation" 
> "reassoc2"} } */
> +/* { dg-final { scan-tree-dump-times {\.FMA } 3 "optimized"} } */

..., I still see these latter two tree dump scans FAIL, for GCN:

$ grep -C2 'was chosen for reassociation' pr110279-2.c.197t.reassoc2
  2 *: a3_40
  2 *: a2_39
Width = 4 was chosen for reassociation
Transforming _15 = powmult_1 + powmult_3;
 into _63 = powmult_1 + a1_38;
$ grep -F .FMA pr110279-2.c.265t.optimized
  _63 = .FMA (a2_39, a2_39, a1_38);
  _64 = .FMA (a3_40, a3_40, powmult_5);

..., nvptx:

$ grep -C2 'was chosen for reassociation' pr110279-2.c.197t.reassoc2
  2 *: a3_40
  2 *: a2_39
Width = 4 was chosen for reassociation
Transforming _15 = powmult_1 + powmult_3;
 into _63 = powmult_1 + a1_38;
$ grep -F .FMA pr110279-2.c.265t.optimized
  _63 = .FMA (a2_39, a2_39, a1_38);
  _64 = .FMA (a3_40, a3_40, powmult_5);

..., but also x86_64-pc-linux-gnu:

$  grep -C2 'was chosen for reassociation' pr110279-2.c.197t.reassoc2
  2 *: a3_40
  2 *: a2_39
Width = 2 was chosen for reassociation
Transforming _15 = powmult_1 + powmult_3;
 into _63 = powmult_1 + powmult_3;
$ grep -cF .FMA pr110279-2.c.265t.optimized
0


Grüße
 Thomas


>From 91e9e8faea4086b3b8aef2355fc12c1559d425f6 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 15 Dec 2023 10:03:12 +0100
Subject: [PATCH] Fix 'gcc.dg/pr110279-2.c' syntax error due to
 '__attribute_noinline__'

For example, for GCN or nvptx target configurations, using newlib:

FAIL: gcc.dg/pr110279-2.c (test for excess errors)
UNRESOLVED: gcc.dg/pr110279-2.c scan-tree-dump-not reassoc2 "was chosen for reassociation"
UNRESOLVED: gcc.dg/pr110279-2.c scan-tree-dump-times optimized "\\.FMA " 3

[...]/source-gcc/gcc/testsuite/gcc.dg/pr110279-2.c:11:1: error: unknown type name '__attribute_noinline__'
[...]/source-gcc/gcc/testsuite/gcc.dg/pr110279-2.c:12:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'foo'

We cannot assume 'stdio.h' to define '__attribute_noinline__' -- but then, that
also isn't necessary for this test case (there is nothing to inline into).
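
For reference, a minimal sketch of the portable spelling, should a test case actually need to suppress inlining.  To my knowledge, '__attribute_noinline__' is a convenience macro from glibc's 'sys/cdefs.h', which is why it is unknown with newlib; the GNU C attribute itself needs no header.  The function 'scale' is only a hypothetical example:

```c
typedef double data_e;

/* GNU C attribute syntax; works without including any libc header.  */
__attribute__ ((noinline)) data_e
scale (data_e in)
{
  return in * 2.0;
}
```

This compiles the same way for glibc and newlib configurations, as it depends on the compiler only.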

	gcc/testsuite/
	* gcc.dg/pr110279-2.c: Don't '#include <stdio.h>'.  Remove
	'__attribute_noinline__'.
---
 gcc/testsuite/gcc.dg/pr110279-2.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pr110279-2.c b/gcc/testsuite/gcc.dg/pr110279-2.c
index 0304a77aa66..b6b69969c6b 100644
--- a/gcc/testsuite/gcc.dg/pr110279-2.c
+++ b/gcc/testsuite/gcc.dg/pr110279-2.c
@@ -6,9 +6,7 @@
 #define LOOP_COUNT 8
 typedef double data_e;
 
-#include <stdio.h>
-
-__attribute_noinline__ data_e
+data_e
 foo (data_e in)
 {
   data_e a1, a2, a3, a4;
-- 
2.34.1



Re: [PATCH] Fix tests for gomp

2023-12-14 Thread Thomas Schwinge
Hi!

On 2023-12-13T12:09:14+0100, Jakub Jelinek  wrote:
> On Wed, Dec 13, 2023 at 11:03:50AM +, Andre Vieira (lists) wrote:
>> Hmm I think I understand what you are saying, but I'm not sure I agree.
>> So before I enabled simdclone testing for aarch64, this test had no target
>> selectors. So it checked the same for 'all simdclone test targets'. Which
>> seem to be x86 and amdgcn:
>>
>> @@ -4321,7 +4321,8 @@ proc check_effective_target_vect_simd_clones { } {
>>  return [check_cached_effective_target_indexed vect_simd_clones {
>>expr { (([istarget i?86-*-*] || [istarget x86_64-*-*])
>>   && [check_effective_target_avx512f])
>> || [istarget amdgcn-*-*]
>> || [istarget aarch64*-*-*] }}]
>>  }
>>
>> I haven't checked what amdgcn does with this test, but I'd have to assume
>> they were passing before? Though I'm not sure how amdgcn would pass the
>> original:

>> --- a/libgomp/testsuite/libgomp.c/declare-variant-1.c
>> +++ b/libgomp/testsuite/libgomp.c/declare-variant-1.c

>>  -  /* At gimplification time, we can't decide yet which function to call.  
>> */
>>  -  /* { dg-final { scan-tree-dump-times "f04 \\\(x" 2 "gimple" } } */
>
> It can't really pass there.  amdgcn certainly doesn't create 4 different
> simd clones where one has avx512f isa and others don't.
> gcn creates just one simd clone with simdlen 64 and that clone will never
> support avx512f isa and we know that already at gimplification time.

For GCN target (and likewise, nvptx target) configurations, libgomp test
cases currently are a total mess -- the reason being that those target
configurations actually (largely) implement GCN or nvptx *offloading*
configuration functionality: they lower OMP constructs and implement
libgomp functions in a way that (largely) assumes that they're
*offloading* instead of *target* configurations, and therefore things go
horribly wrong.  (This certainly is something worth fixing, but...)
Therefore, GCN or nvptx *target* configuration's
'check-target-libgomp' currently doesn't really have any value, and
certainly isn't maintained in any way.


Grüße
 Thomas


Update 'gcc.dg/vect/vect-simd-clone-*.c' GCN 'dg-warning's (was: [PATCH] aarch64: enable mixed-types for aarch64 simdclones)

2023-12-14 Thread Thomas Schwinge
Hi!

On 2023-10-16T16:03:26+0100, "Andre Vieira (lists)" wrote:
> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c
> @@ -12,8 +12,13 @@ int array[N];
>
>  #pragma omp declare simd simdlen(4) notinbranch
>  #pragma omp declare simd simdlen(4) notinbranch uniform(b) linear(c:3)
> +#ifdef __aarch64__
> +#pragma omp declare simd simdlen(2) notinbranch
> +#pragma omp declare simd simdlen(2) notinbranch uniform(b) linear(c:3)
> +#else
>  #pragma omp declare simd simdlen(8) notinbranch
>  #pragma omp declare simd simdlen(8) notinbranch uniform(b) linear(c:3)
> +#endif
>  __attribute__((noinline)) int
>  foo (int a, int b, int c)
>  {

These added lines run afoul of the end-of-file GCN-specific DejaGnu
directives:

[...]
/* { dg-warning {unsupported simdlen 8 \(amdgcn\)} "" { target amdgcn*-*-* } 18 } */
/* { dg-warning {unsupported simdlen 4 \(amdgcn\)} "" { target amdgcn*-*-* } 18 } */

Using absolute line numbers here has, indeed, also been suboptimal.
(..., and maybe, like aarch64 has now done, GCN should also suitably
parameterize the 'simdlen', to resolve this altogether?)
Until then, to resolve regressions, I've pushed to master branch
commit 7b15959f8e35b821ebfe832a36e5e712b708dae1
"Update 'gcc.dg/vect/vect-simd-clone-*.c' GCN 'dg-warning's", see
attached.
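
One way to avoid absolute line numbers, sketched below, is to attach the directive to the diagnosed line itself: a 'dg-warning' without a line number applies to the line it appears on.  This is only a sketch; it assumes that the warning is reported on the line carrying the function name, which would have to be verified against the actual GCN diagnostics:

```c
#pragma omp declare simd simdlen(8) notinbranch
__attribute__ ((noinline)) int
foo (int a, int b, int c) /* { dg-warning {unsupported simdlen 8 \(amdgcn\)} "" { target amdgcn*-*-* } } */
{
  return a + b + c;
}
```

Lines added above 'foo' later on then no longer invalidate the directive.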


Grüße
 Thomas


> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c
> @@ -12,8 +12,13 @@ int array[N] __attribute__((aligned (32)));
>
>  #pragma omp declare simd simdlen(4) notinbranch aligned(a:16) uniform(a) 
> linear(b)
>  #pragma omp declare simd simdlen(4) notinbranch aligned(a:32) uniform(a) 
> linear(b)
> +#ifdef __aarch64__
> +#pragma omp declare simd simdlen(2) notinbranch aligned(a:16) uniform(a) 
> linear(b)
> +#pragma omp declare simd simdlen(2) notinbranch aligned(a:32) uniform(a) 
> linear(b)
> +#else
>  #pragma omp declare simd simdlen(8) notinbranch aligned(a:16) uniform(a) 
> linear(b)
>  #pragma omp declare simd simdlen(8) notinbranch aligned(a:32) uniform(a) 
> linear(b)
> +#endif
>  __attribute__((noinline)) void
>  foo (int *a, int b, int c)
>  {

> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c
> @@ -12,7 +12,11 @@ float d[N];
>  int e[N];
>  unsigned short f[N];
>
> +#ifdef __aarch64__
> +#pragma omp declare simd simdlen(4) notinbranch uniform(b)
> +#else
>  #pragma omp declare simd simdlen(8) notinbranch uniform(b)
> +#endif
>  __attribute__((noinline)) float
>  foo (float a, float b, float c)
>  {

> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c
> @@ -10,7 +10,11 @@
>
>  int d[N], e[N];
>
> +#ifdef __aarch64__
> +#pragma omp declare simd simdlen(2) notinbranch uniform(b) linear(c:3)
> +#else
>  #pragma omp declare simd simdlen(4) notinbranch uniform(b) linear(c:3)
> +#endif
>  __attribute__((noinline)) long long int
>  foo (int a, int b, int c)
>  {

> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c
> @@ -12,14 +12,22 @@ int a[N], b[N];
>  long int c[N];
>  unsigned char d[N];
>
> +#ifdef __aarch64__
> +#pragma omp declare simd simdlen(2) notinbranch
> +#else
>  #pragma omp declare simd simdlen(8) notinbranch
> +#endif
>  __attribute__((noinline)) int
>  foo (long int a, int b, int c)
>  {
>return a + b + c;
>  }
>
> +#ifdef __aarch64__
> +#pragma omp declare simd simdlen(2) notinbranch
> +#else
>  #pragma omp declare simd simdlen(8) notinbranch
> +#endif
>  __attribute__((noinline)) long int
>  bar (int a, int b, long int c)
>  {


>From 7b15959f8e35b821ebfe832a36e5e712b708dae1 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 14 Dec 2023 10:47:35 +0100
Subject: [PATCH] Update 'gcc.dg/vect/vect-simd-clone-*.c' GCN 'dg-warning's

Recent commit f5fc001a84a7dbb942a6252b3162dd38b4aae311
"aarch64: enable mixed-types for aarch64 simdclones" added lines to those
test cases, and so the GCN-specific 'dg-warning' line numbers, originally
added in commit b73c49f6f88dd7f7569f9a72c8ceb04598d4c15c
"amdgcn: OpenMP SIMD routine support", got out of sync.

	gcc/testsuite/
	* gcc.dg/vect/vect-simd-clone-1.c: Update GCN 'dg-warning's.
	* gcc.dg/vect/vect-simd-clone-2.c: Likewise.
	* gcc.dg/vect/vect-simd-clone-3.c: Likewise.
	* gcc.dg/v

In 'gcc/gimple-ssa-sccopy.cc', '#define INCLUDE_ALGORITHM' instead of '#include <algorithm>' (was: [PATCH v4] A new copy propagation and PHI elimination pass)

2023-12-14 Thread Thomas Schwinge
Hi!

On 2023-12-13T17:12:11+0100, Filip Kastl  wrote:
> --- /dev/null
> +++ b/gcc/gimple-ssa-sccopy.cc

> +#include <algorithm>

Pushed to master branch commit 65e41f4fbfc539c5cc429c684176f8ea39f4b8f2
"In 'gcc/gimple-ssa-sccopy.cc', '#define INCLUDE_ALGORITHM' instead of
'#include <algorithm>'",
see attached.


Grüße
 Thomas


>From 65e41f4fbfc539c5cc429c684176f8ea39f4b8f2 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 14 Dec 2023 14:12:45 +0100
Subject: [PATCH] In 'gcc/gimple-ssa-sccopy.cc', '#define INCLUDE_ALGORITHM'
 instead of '#include <algorithm>'

... to avoid issues such as:

In file included from [...]/lib/gcc/i686-pc-linux-gnu/5.2.0/include/xmmintrin.h:34:0,
 from [...]/lib/gcc/i686-pc-linux-gnu/5.2.0/include/x86intrin.h:31,
 from [...]/i686-pc-linux-gnu/include/c++/5.2.0/i686-pc-linux-gnu/64/bits/opt_random.h:33,
 from [...]/i686-pc-linux-gnu/include/c++/5.2.0/random:50,
 from [...]/i686-pc-linux-gnu/include/c++/5.2.0/bits/stl_algo.h:66,
 from [...]/i686-pc-linux-gnu/include/c++/5.2.0/algorithm:62,
 from [...]/source-gcc/gcc/gimple-ssa-sccopy.cc:32:
[...]/lib/gcc/i686-pc-linux-gnu/5.2.0/include/mm_malloc.h:42:12: error: attempt to use poisoned "malloc"
 return malloc (size);
^
make[2]: *** [Makefile:1197: gimple-ssa-sccopy.o] Error 1

Minor fix-up for commit cd794c3961017703a4d2ca0e854ea23b3d4b6373
"A new copy propagation and PHI elimination pass".

	gcc/
	* gimple-ssa-sccopy.cc: '#define INCLUDE_ALGORITHM' instead of
	'#include <algorithm>'.
---
 gcc/gimple-ssa-sccopy.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/gimple-ssa-sccopy.cc b/gcc/gimple-ssa-sccopy.cc
index ac5ec32eb32..7ebb6c05caf 100644
--- a/gcc/gimple-ssa-sccopy.cc
+++ b/gcc/gimple-ssa-sccopy.cc
@@ -18,6 +18,7 @@ You should have received a copy of the GNU General Public License
 along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
+#define INCLUDE_ALGORITHM
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
@@ -29,7 +30,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-iterator.h"
 #include "vec.h"
 #include "hash-set.h"
-#include <algorithm>
 #include "ssa-iterators.h"
 #include "gimple-fold.h"
 #include "gimplify.h"
-- 
2.34.1



RE: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-14 Thread Thomas Schwinge
Hi Lipeng!

On 2023-12-14T02:28:22+, "Zhu, Lipeng"  wrote:
> On 2023/12/14 4:52, Thomas Schwinge wrote:
>> On 2023-12-12T02:05:26+, "Zhu, Lipeng"  wrote:
>> > On 2023/12/12 1:45, H.J. Lu wrote:
>> >> On Sat, Dec 9, 2023 at 7:25 PM Zhu, Lipeng wrote:
>> >> > On 2023/12/9 23:23, Jakub Jelinek wrote:
>> >> > > On Sat, Dec 09, 2023 at 10:39:45AM -0500, Lipeng Zhu wrote:
>> >> > > > This patch try to introduce the rwlock and split the read/write
>> >> > > > to unit_root tree and unit_cache with rwlock instead of the
>> >> > > > mutex to increase CPU efficiency. In the get_gfc_unit function,
>> >> > > > the percentage to step into the insert_unit function is around
>> >> > > > 30%, in most instances, we can get the unit in the phase of
>> >> > > > reading the unit_cache or unit_root tree. So split the
>> >> > > > read/write phase by rwlock would be an approach to make it more
>> >> > > > parallel.
>> >> > > >
>> >> > > > BTW, the IPC metrics can gain around 9x in our test server with
>> >> > > > 220 cores. The benchmark we used is
>> >> > > > https://github.com/rwesson/NEAT

>> I've just filed <https://gcc.gnu.org/PR113005>
>> "'libgomp.fortran/rwlock_1.f90', 'libgomp.fortran/rwlock_3.f90' execution
>> test timeouts".
>> Would you be able to look into that?

> Sure, I will look into that.
>
> BTW, I didn’t have the PowerPC in hands, do you mind granting the access of
> your test environment to me to help reproduce the issue?

That's unfortunately not possible: it's behind company VPN, restricted
access.  :-/ I'll later try to have at least a quick look where it's
hanging, or what it's doing.


Grüße
 Thomas


RE: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-13 Thread Thomas Schwinge
Hi Lipeng!

On 2023-12-12T02:05:26+, "Zhu, Lipeng"  wrote:
> On 2023/12/12 1:45, H.J. Lu wrote:
>> On Sat, Dec 9, 2023 at 7:25 PM Zhu, Lipeng  wrote:
>> > On 2023/12/9 23:23, Jakub Jelinek wrote:
>> > > On Sat, Dec 09, 2023 at 10:39:45AM -0500, Lipeng Zhu wrote:
>> > > > This patch try to introduce the rwlock and split the read/write to
>> > > > unit_root tree and unit_cache with rwlock instead of the mutex to
>> > > > increase CPU efficiency. In the get_gfc_unit function, the
>> > > > percentage to step into the insert_unit function is around 30%, in
>> > > > most instances, we can get the unit in the phase of reading the
>> > > > unit_cache or unit_root tree. So split the read/write phase by
>> > > > rwlock would be an approach to make it more parallel.
>> > > >
>> > > > BTW, the IPC metrics can gain around 9x in our test server with
>> > > > 220 cores. The benchmark we used is
>> > > > https://github.com/rwesson/NEAT

>> > > Ok for trunk, thanks.

>> > Thanks! Looking forward to landing to trunk.

>> Pushed for you.

> Thanks for everyone's patience and help, really appreciate that!

Congratulations on your first contribution to GCC (as far as I can tell)!
:-)


I've just filed <https://gcc.gnu.org/PR113005>
"'libgomp.fortran/rwlock_1.f90', 'libgomp.fortran/rwlock_3.f90' execution
test timeouts".
Would you be able to look into that?


Grüße
 Thomas


Fix 'libgomp/config/linux/allocator.c' 'size_t' vs. '%ld' format string mismatch (was: Build breakage)

2023-12-13 Thread Thomas Schwinge
Hi!

On 2023-12-13T20:36:40+0100, I wrote:
> On 2023-12-13T11:15:54-0800, Jerry D via Gcc  wrote:
>> I am getting this failure to build from clean trunk.
>
> This is due to commit r14-6499-g348874f0baac0f22c98ab11abbfa65fd172f6bdd
> "libgomp: basic pinned memory on Linux", which supposedly was only tested
> with '--disable-multilib' or so.  As Andrew's now on vacation --
> conveniently ;-P -- I'll soon push a fix.

Pushed to master branch commit 5445ff4a51fcee4d281f79b5f54b349290d0327d
"Fix 'libgomp/config/linux/allocator.c' 'size_t' vs. '%ld' format string 
mismatch",
see attached.


Grüße
 Thomas


>> In file included from ../../../../trunk/libgomp/config/linux/allocator.c:31:
>> ../../../../trunk/libgomp/config/linux/allocator.c: In function
>> ‘linux_memspace_alloc’:
>> ../../../../trunk/libgomp/config/linux/allocator.c:70:26: error: format
>> ‘%ld’ expects argument of type ‘long int’, but argument 3 has type
>> ‘size_t’ {aka ‘unsigned int’} [-Werror=format=]
>> 70 |   gomp_debug (0, "libgomp: failed to pin %ld bytes of"
>>|  ^
>> 71 |   " memory (ulimit too low?)\n", size);
>>|  
>>|  |
>>|  size_t
>> {aka unsigned int}
>> ../../../../trunk/libgomp/libgomp.h:186:29: note: in definition of macro
>> ‘gomp_debug’
>>186 |   (gomp_debug) ((KIND), __VA_ARGS__); \
>>| ^~~
>> ../../../../trunk/libgomp/config/linux/allocator.c:70:52: note: format
>> string is defined here
>> 70 |   gomp_debug (0, "libgomp: failed to pin %ld bytes of"
>>|  ~~^
>>||
>>|long int
>>|  %d


>From 5445ff4a51fcee4d281f79b5f54b349290d0327d Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 13 Dec 2023 17:48:11 +0100
Subject: [PATCH] Fix 'libgomp/config/linux/allocator.c' 'size_t' vs. '%ld'
 format string mismatch
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fix-up for commit 348874f0baac0f22c98ab11abbfa65fd172f6bdd
"libgomp: basic pinned memory on Linux", which may result in build failures
as follows, for example, for the '-m32' multilib of x86_64-pc-linux-gnu:

In file included from [...]/source-gcc/libgomp/config/linux/allocator.c:31:
[...]/source-gcc/libgomp/config/linux/allocator.c: In function ‘linux_memspace_alloc’:
[...]/source-gcc/libgomp/config/linux/allocator.c:70:26: error: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘size_t’ {aka ‘unsigned int’} [-Werror=format=]
   70 |   gomp_debug (0, "libgomp: failed to pin %ld bytes of"
  |  ^
   71 |   " memory (ulimit too low?)\n", size);
  |  
  |  |
  |  size_t {aka unsigned int}
[...]/source-gcc/libgomp/libgomp.h:186:29: note: in definition of macro ‘gomp_debug’
  186 |   (gomp_debug) ((KIND), __VA_ARGS__); \
  | ^~~
[...]/source-gcc/libgomp/config/linux/allocator.c:70:52: note: format string is defined here
   70 |   gomp_debug (0, "libgomp: failed to pin %ld bytes of"
  |  ~~^
  ||
  |long int
  |  %d
cc1: all warnings being treated as errors
make[9]: *** [allocator.lo] Error 1
make[9]: Leaving directory `[...]/build-gcc/x86_64-pc-linux-gnu/32/libgomp'
[...]

Fix this in the same way as used elsewhere in libgomp.

	libgomp/
	* config/linux/allocator.c (linux_memspace_alloc): Fix 'size_t'
	vs. '%ld' format string mismatch.
---
 libgomp/config/linux/allocator.c | 12 ++--
 1 file change

Re: GCC/Rust libgrust-v2/to-submit branch

2023-12-12 Thread Thomas Schwinge
Hi Arthur, Pierre-Emmanuel!

On 2023-12-12T10:39:50+0100, I wrote:
> On 2023-11-27T16:46:08+0100, I wrote:
>> On 2023-11-21T16:20:22+0100, Arthur Cohen  wrote:
>>> On 11/20/23 15:55, Thomas Schwinge wrote:
>>>> Arthur and Pierre-Emmanuel have prepared a GCC/Rust libgrust-v2/to-submit
>>>> branch: <https://github.com/Rust-GCC/gccrs/tree/libgrust-v2/to-submit>.

> Rebasing onto current master branch, there's a minor (textual) conflict
> in top-level 'configure.ac:host_libs': 'intl' replaced by 'gettext', and
> top-level 'configure' plus 'gcc/configure' have to be re-generated (the
> latter for some unrelated changes in line numbers).  Otherwise, those
> initial libgrust changes are now in the form that I thought they should
> be in -- so I suggest you fix that up (I can quickly have a look again,
> if you like)

I've noticed that you've fixed that up (looks good), but I also noticed one
additional small item: into "build: Add libgrust as compilation modules",
you'll have to add the effect of top-level 'autogen Makefile.def' (that
is, regenerate the top-level 'Makefile.in').


Grüße
 Thomas


> and then you do the "scary" 'git push' ;-) -- and then:
>
>>> All the best, and thanks again for testing :)
>>
>> :-) So I hope I've not missed any major issues...
>
> ..., we wait and see.  :-)
>
>
> Grüße
>  Thomas

