date:20230517

RE: [EXTERNAL] Re: [PATCH] Fixes and workarounds for warnings during autoprofiledbootstrap build

2023-05-17 Thread Eugene Rozenfeld via Gcc-patches

Thank you for catching this, Thomas!
I modified Makefile.tpl and regenerated Makefile.in.

Here is the patch I pushed:

[PATCH] Disable warnings as errors for STAGEautofeedback.

Compilation during STAGEautofeedback produces additional warnings
since inlining decisions with -fauto-profile are different from
other builds.

This patch disables warnings as errors for STAGEautofeedback.

Tested on x86_64-pc-linux-gnu.

ChangeLog:

* Makefile.tpl: Disable warnings as errors for STAGEautofeedback
* Makefile.in: Regenerate
---
 Makefile.in  | 8 +++++---
 Makefile.tpl | 3 +++
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/Makefile.in b/Makefile.in
index a89bac02351..b559454cc90 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -590,9 +590,8 @@ STAGEautofeedback_CXXFLAGS = $(CXXFLAGS)
 STAGEautofeedback_CXXFLAGS = $(STAGEautofeedback_CFLAGS)
 @endif target-libstdc++-v3-bootstrap
 STAGEautofeedback_TFLAGS = $(STAGE_TFLAGS)
-# Disable warnings as errors since inlining decisions with -fauto-profile
-# may result in additional warnings.
-STAGEautofeedback_CONFIGURE_FLAGS = $(filter-out --enable-werror-always,$(STAGE_CONFIGURE_FLAGS))
+STAGEautofeedback_CONFIGURE_FLAGS = $(STAGE_CONFIGURE_FLAGS)
+
 
 # By default, C and C++ are the only stage1 languages, because they are the
 # only ones we require to build with the bootstrap compiler, and also the
@@ -641,6 +640,9 @@ STAGEautoprofile_TFLAGS = $(STAGE2_TFLAGS)
 
 STAGEautofeedback_CFLAGS = $(STAGE3_CFLAGS)
 STAGEautofeedback_TFLAGS = $(STAGE3_TFLAGS)
+# Disable warnings as errors since inlining decisions with -fauto-profile
+# may result in additional warnings.
+STAGEautofeedback_CONFIGURE_FLAGS = $(filter-out --enable-werror-always,$(STAGE_CONFIGURE_FLAGS))
 
 do-compare = @do_compare@
 do-compare3 = $(do-compare)
diff --git a/Makefile.tpl b/Makefile.tpl
index 9d8ef9cf678..6bcee3021c9 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -563,6 +563,9 @@ STAGEautoprofile_TFLAGS = $(STAGE2_TFLAGS)
 
 STAGEautofeedback_CFLAGS = $(STAGE3_CFLAGS)
 STAGEautofeedback_TFLAGS = $(STAGE3_TFLAGS)
+# Disable warnings as errors since inlining decisions with -fauto-profile
+# may result in additional warnings.
+STAGEautofeedback_CONFIGURE_FLAGS = $(filter-out --enable-werror-always,$(STAGE_CONFIGURE_FLAGS))
 
 do-compare = @do_compare@
 do-compare3 = $(do-compare)
-- 
2.25.1

Eugene
-----Original Message-----
From: Thomas Schwinge  
Sent: Wednesday, May 17, 2023 12:05 AM
To: Richard Biener ; Eugene Rozenfeld 

Cc: gcc-patches@gcc.gnu.org
Subject: Re: [EXTERNAL] Re: [PATCH] Fixes and workarounds for warnings during 
autoprofiledbootstrap build

Hi!

On 2023-05-15T09:30:35+0200, Richard Biener via Gcc-patches 
 wrote:
> On Fri, May 12, 2023 at 10:35 PM Eugene Rozenfeld 
>  wrote:
>>
>> Thank you, Richard. I went with your suggestion. New patch:
>>
>>
>> [PATCH] Disable warnings as errors for STAGEautofeedback.
>>
>> Compilation during STAGEautofeedback produces additional warnings 
>> since inlining decisions with -fauto-profile are different from other 
>> builds.
>>
>> This patch disables warnings as errors for STAGEautofeedback.
>
> Can you add a comment before the filtering?
>
> Otherwise looks good to me - please leave others 24h to comment before 
> you commit.

>> --- a/Makefile.in
>> +++ b/Makefile.in
>> @@ -590,8 +590,7 @@ STAGEautofeedback_CXXFLAGS = $(CXXFLAGS)
>>  STAGEautofeedback_CXXFLAGS = $(STAGEautofeedback_CFLAGS)
>>  @endif target-libstdc++-v3-bootstrap
>>  STAGEautofeedback_TFLAGS = $(STAGE_TFLAGS)
>> -STAGEautofeedback_CONFIGURE_FLAGS = $(STAGE_CONFIGURE_FLAGS)
>> -
>> +STAGEautofeedback_CONFIGURE_FLAGS = $(filter-out --enable-werror-always,$(STAGE_CONFIGURE_FLAGS))

That's not how it works; the next person running 'autogen Makefile.def'
to regenerate 'Makefile.in' is going to undo those changes.  Instead, modify 
'Makefile.def', 'Makefile.tpl', and then 'autogen Makefile.def'.


Grüße
 Thomas


>> -----Original Message-----
>> From: Richard Biener 
>> Sent: Thursday, May 11, 2023 1:58 AM
>> To: Eugene Rozenfeld 
>> Cc: gcc-patches@gcc.gnu.org
>> Subject: Re: [EXTERNAL] Re: [PATCH] Fixes and workarounds for 
>> warnings during autoprofiledbootstrap build
>>
>> On Thu, May 11, 2023 at 4:23 AM Eugene Rozenfeld 
>>  wrote:
>> >
>> > I'm ok with disabling warnings as errors for autoprofiledbootstrap. What's 
>> > the proper way to do that? Searching for "--disable-werror" I see matches 
>> > in lib configure files but not in gcc files.
>>
>> We have --with-build-config selecting things like bootstrap-O3 and configure 
>> then disables werror by default if the build config is anything other than 
>> the default or bootstrap-debug.
>>
>> Of course profiledbootstrap and autoprofiledbootstrap are not build configs 
>> but make targets - that makes it more difficult (or impossible) to use the 
>> --disable-werror machinery here.
>>
>> There is
>>
>> STAGE_CONFIGURE_FLAGS=@stage2_werror_flag@
>>
>> so it might be possible to filter

[PATCH] RISC-V: Support RVV VREINTERPRET from vbool_t to vintm1_t

2023-05-17 Thread Pan Li via Gcc-patches

From: Pan Li 

This patch supports the RVV VREINTERPRET from the vbool*_t to the
vint*m1_t.  Aka:

vint*m1_t __riscv_vreinterpret_x_x(vbool*_t);

These APIs help users convert the vector vbool*_t to the LMUL=1
signed integer vint*_t.  According to the RVV intrinsic SPEC linked below,
the reinterpret intrinsics only change the types of the underlying contents.

https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#reinterpret-vbool-o-vintm1

For example, given below code.
vint16m8_t test_vlmul_ext_v_i16mf4_i16m8(vint16mf4_t op1) {
  return __riscv_vlmul_ext_v_i16mf4_i16m8(op1);
}

It will generate assembly code similar to the following:
vsetvli a5,zero,e8,m8,ta,ma
vlm.v   v1,0(a1)
vs1r.v  v1,0(a0)
ret

Please NOTE that the test files don't cover all the possible combinations
of the intrinsic APIs introduced by this PATCH because there are too many.
The reinterpret from vbool*_t to vuint*m1_t with lmul=1 will be covered
in another PATCH.

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/genrvv-type-indexer.cc (EEW_SIZE_LIST): New macro
for the eew size list.
(LMUL1_LOG2): New macro for the log2 value of lmul=1.
(main): Add signed_eew*_lmul1_interpret for indexer.
* config/riscv/riscv-vector-builtins-functions.def (vreinterpret):
Register vint*m1_t interpret function.
* config/riscv/riscv-vector-builtins-types.def 
(DEF_RVV_SIGNED_EEW8_LMUL1_INTERPRET_OPS):
New macro for vint8m1_t.
(DEF_RVV_SIGNED_EEW16_LMUL1_INTERPRET_OPS): Likewise.
(DEF_RVV_SIGNED_EEW32_LMUL1_INTERPRET_OPS): Likewise.
(DEF_RVV_SIGNED_EEW64_LMUL1_INTERPRET_OPS): Likewise.
(vbool1_t): Add to signed_eew*_interpret_ops.
(vbool2_t): Likewise.
(vbool4_t): Likewise.
(vbool8_t): Likewise.
(vbool16_t): Likewise.
(vbool32_t): Likewise.
(vbool64_t): Likewise.
* config/riscv/riscv-vector-builtins.cc 
(DEF_RVV_SIGNED_EEW8_LMUL1_INTERPRET_OPS):
New macro for vint*m1_t.
(DEF_RVV_SIGNED_EEW16_LMUL1_INTERPRET_OPS): Likewise.
(DEF_RVV_SIGNED_EEW32_LMUL1_INTERPRET_OPS): Likewise.
(DEF_RVV_SIGNED_EEW64_LMUL1_INTERPRET_OPS): Likewise.
(required_extensions_p): Add vint8m1_t interpret case.
* config/riscv/riscv-vector-builtins.def (signed_eew8_lmul1_interpret):
Add vint*m1_t interpret to base type.
(signed_eew16_lmul1_interpret): Likewise.
(signed_eew32_lmul1_interpret): Likewise.
(signed_eew64_lmul1_interpret): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/misc_vreinterpret_vbool_vint.c:
---
 gcc/config/riscv/genrvv-type-indexer.cc   | 13 
 .../riscv/riscv-vector-builtins-functions.def |  4 ++
 .../riscv/riscv-vector-builtins-types.def | 64 +
 gcc/config/riscv/riscv-vector-builtins.cc | 70 +++
 gcc/config/riscv/riscv-vector-builtins.def|  6 ++
 .../rvv/base/misc_vreinterpret_vbool_vint.c   | 19 -
 6 files changed, 175 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/genrvv-type-indexer.cc 
b/gcc/config/riscv/genrvv-type-indexer.cc
index 33738e41d7c..5148abdda0f 100644
--- a/gcc/config/riscv/genrvv-type-indexer.cc
+++ b/gcc/config/riscv/genrvv-type-indexer.cc
@@ -24,6 +24,8 @@ along with GCC; see the file COPYING3.  If not see
 #include 
 
 #define BOOL_SIZE_LIST {1, 2, 4, 8, 16, 32, 64}
+#define EEW_SIZE_LIST {8, 16, 32, 64}
+#define LMUL1_LOG2 0
 
 std::string
 to_lmul (int lmul_log2)
@@ -223,6 +225,10 @@ main (int argc, const char **argv)
   for (unsigned boolsize : BOOL_SIZE_LIST)
fprintf (fp, "  /*BOOL%d_INTERPRET*/ INVALID,\n", boolsize);
 
+  for (unsigned eew : EEW_SIZE_LIST)
+   fprintf (fp, "  /*SIGNED_EEW%d_LMUL1_INTERPRET*/ %s,\n", eew,
+inttype (eew, LMUL1_LOG2, /* unsigned_p */false).c_str ());
+
   for (unsigned lmul_log2_offset : {1, 2, 3, 4, 5, 6})
{
  unsigned multiple_of_lmul = 1 << lmul_log2_offset;
@@ -312,6 +318,10 @@ main (int argc, const char **argv)
   : "INVALID");
  }
 
+   for (unsigned eew : EEW_SIZE_LIST)
+ fprintf (fp, "  /*SIGNED_EEW%d_LMUL1_INTERPRET*/ INVALID,\n",
+  eew);
+
for (unsigned lmul_log2_offset : {1, 2, 3, 4, 5, 6})
  {
unsigned multiple_of_lmul = 1 << lmul_log2_offset;
@@ -374,6 +384,9 @@ main (int argc, const char **argv)
  for (unsigned boolsize : BOOL_SIZE_LIST)
fprintf (fp, "  /*BOOL%d_INTERPRET*/ INVALID,\n", boolsize);
 
+ for (unsigned eew : EEW_SIZE_LIST)
+   fprintf (fp, "  /*SIGNED_EEW%d_LMUL1_INTERPRET*/ INVALID,\n", eew);
+
  for (unsigned lmul_log2_offset : {1, 2, 3, 4, 5, 6})
{
  unsigned multiple_of_lmul = 1 << lmul_log2_offset;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def

Re: [PATCH] Fortran: set shape of initializers of zero-sized arrays [PR95374,PR104352]

2023-05-17 Thread Jerry D via Gcc-patches


On 5/17/23 11:52 AM, Harald Anlauf via Fortran wrote:

Dear all,

the attached patch is neat, because it fixes a bug by removing code ;-)

When generating the initializer for a parameter array, we excepted
the case of size 0, which however prevented the detection of array
bounds violations and led to ICEs in various places.  The solution
which removes the comparison for size > 0 also has the bonus that
it fixes a minor memory leak for the size==0 case...

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald


Looks Good To Me.

OK,

Jerry

[committed] c: Handle printf %B like %b for C2x

2023-05-17 Thread Joseph Myers

WG14 decided to change the printf %B format from a recommended
extension to an optional feature defined in normative text.  Thus,
change the format checking to handle %B like %b, so not diagnosing it
with -Wformat -std=c2x -pedantic, just as with other optional
normatively defined features (such as decimal floating point and its
associated formats, for example).

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

gcc/c-family/
* c-format.cc (print_char_table): Handle %B like %b.

gcc/testsuite/
* gcc.dg/format/c2x-printf-1.c: Test %B here.
* gcc.dg/format/ext-9.c: Do not test %B here.

diff --git a/gcc/c-family/c-format.cc b/gcc/c-family/c-format.cc
index 32858ef7c17..b4eeebcb30e 100644
--- a/gcc/c-family/c-format.cc
+++ b/gcc/c-family/c-format.cc
@@ -722,13 +722,12 @@ static const format_char_info print_char_table[] =
   { "F",   0, STD_C99, { T99_D,   BADLEN,  BADLEN,  T99_D,   BADLEN,  T99_LD,  
BADLEN,  BADLEN,  BADLEN,  T2X_D32, T2X_D64, T2X_D128, BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,   BADLEN,   BADLEN }, "-wp0 +#'I", "",   
NULL },
   { "aA",  0, STD_C99, { T99_D,   BADLEN,  BADLEN,  T99_D,   BADLEN,  T99_LD,  
BADLEN,  BADLEN,  BADLEN,  T2X_D32, T2X_D64, T2X_D128, BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,   BADLEN,   BADLEN }, "-wp0 +#",   "",   
NULL },
   /* C2X conversion specifiers.  */
-  { "b",   0, STD_C2X, { T2X_UI,  T2X_UC,  T2X_US,  T2X_UL,  T2X_ULL, TEX_ULL, 
T2X_ST,  T2X_UPD, T2X_UIM, BADLEN,  BADLEN,  BADLEN,   T2X_U8,  T2X_U16, 
T2X_U32, T2X_U64, T2X_UF8, T2X_UF16, T2X_UF32, T2X_UF64 }, "-wp0#", "i",  
NULL },
+  { "bB",  0, STD_C2X, { T2X_UI,  T2X_UC,  T2X_US,  T2X_UL,  T2X_ULL, TEX_ULL, 
T2X_ST,  T2X_UPD, T2X_UIM, BADLEN,  BADLEN,  BADLEN,   T2X_U8,  T2X_U16, 
T2X_U32, T2X_U64, T2X_UF8, T2X_UF16, T2X_UF32, T2X_UF64 }, "-wp0#", "i",  
NULL },
   /* X/Open conversion specifiers.  */
   { "C",   0, STD_EXT, { TEX_WI,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,   BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,   BADLEN,   BADLEN }, "-w","",   
NULL },
   { "S",   1, STD_EXT, { TEX_W,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,   BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,   BADLEN,   BADLEN }, "-wp",   "R",  
NULL },
   /* GNU conversion specifiers.  */
   { "m",   0, STD_EXT, { T89_V,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,   BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,   BADLEN,   BADLEN }, "-wp",   "",   
NULL },
-  { "B",   0, STD_EXT, { T2X_UI,  T2X_UC,  T2X_US,  T2X_UL,  T2X_ULL, TEX_ULL, 
T2X_ST,  T2X_UPD, T2X_UIM, BADLEN,  BADLEN,  BADLEN,   T2X_U8,  T2X_U16, 
T2X_U32, T2X_U64, T2X_UF8, T2X_UF16, T2X_UF32, T2X_UF64 }, "-wp0#", "i",  
NULL },
   { NULL,  0, STD_C89, NOLENGTHS, NULL, NULL, NULL }
 };
 
diff --git a/gcc/testsuite/gcc.dg/format/c2x-printf-1.c 
b/gcc/testsuite/gcc.dg/format/c2x-printf-1.c
index ca43d7997e5..9be7d4753d1 100644
--- a/gcc/testsuite/gcc.dg/format/c2x-printf-1.c
+++ b/gcc/testsuite/gcc.dg/format/c2x-printf-1.c
@@ -28,6 +28,18 @@ foo (unsigned int u, unsigned short us, unsigned char uc, 
unsigned long ul,
   /* Use of 'L' and 'q' for long long is an extension.  */
   printf ("%Lb", ull); /* { dg-warning "does not support" } */
   printf ("%qb", ull); /* { dg-warning "does not support" } */
+  /* Similar tests with %B.  */
+  printf ("%B %hB %hhB %lB %llB %jB %zB %tB\n", u, us, uc, ul, ull, uj, z, ut);
+  printf ("%*.*llB\n", 1, 2, ull);
+  printf ("%-B\n", u);
+  printf ("%#B\n", u);
+  printf ("%08B\n", u);
+  printf ("%+B\n", u); /* { dg-warning "flag" } */
+  printf ("% B\n", u); /* { dg-warning "flag" } */
+  printf ("%-08B\n", u); /* { dg-warning "ignored" } */
+  printf ("%08.5B\n", u); /* { dg-warning "ignored" } */
+  printf ("%LB", ull); /* { dg-warning "does not support" } */
+  printf ("%qB", ull); /* { dg-warning "does not support" } */
   /* Use of %wN and %wfN with each valid conversion specifier.  */
   printf ("%w8d %w16d %w32d %w64d %wf8d %wf16d %wf32d %wf64d",
  i8, i16, i32, i64, if8, if16, if32, if64);
@@ -35,6 +47,8 @@ foo (unsigned int u, unsigned short us, unsigned char uc, 
unsigned long ul,
  i8, i16, i32, i64, if8, if16, if32, if64);
   printf ("%w8b %w16b %w32b %w64b %wf8b %wf16b %wf32b %wf64b",
  u8, u16, u32, u64, uf8, uf16, uf32, uf64);
+  printf ("%w8B %w16B %w32B %w64B %wf8B %wf16B %wf32B %wf64B",
+ u8, u16, u32, u64, uf8, uf16, uf32, uf64);
   printf ("%w8o %w16o %w32o %w64o %wf8o %wf16o %wf32o %wf64o",
  u8, u16, u32, u64, uf8, uf16, uf32, uf64);
   printf ("%w8u %w16u %w32u %w64u %wf8u %wf16u %wf32u %wf64u",
diff --git a/gcc/testsuite/gcc.dg/format/ext-9.c 
b/gcc/testsuite/gcc.dg/format/ext-9.c
index 0aeb365e767..8f091292b72 100644
--- a/gcc/testsuite/gcc.dg/format/ext-9.c
+++

[patch] Allow plugin-specific dumps

2023-05-17 Thread Nathan Sidwell via Gcc-patches

PR 99451 is about the inability to name tree and rtl dumps by plugin name, and
it includes a patch.  But then I worked around the problem and forgot about it.
Here it is again, retested against trunk.


ok?

nathan
--
Nathan Sidwell
From e54518bc5e59ef5cdc21c652ceac41bd0c0f436c Mon Sep 17 00:00:00 2001
From: Nathan Sidwell 
Date: Wed, 17 May 2023 19:27:13 -0400
Subject: [PATCH] Allow plugin dumps

Defer dump option parsing until plugins are initialized.  This allows one to
use plugin names for dumps.

	PR other/99451
	gcc/
	* opts.h (handle_deferred_dump_options): Declare.
	* opts-global.cc (handle_common_deferred_options): Do not handle
	dump options here.
	(handle_deferred_dump_options): New.
	* toplev.cc (toplev::main): Call it after plugin init.
---
 gcc/opts-global.cc | 20 +++++++++++++++++++-
 gcc/opts.h         |  1 +
 gcc/toplev.cc      |  4 ++++
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/gcc/opts-global.cc b/gcc/opts-global.cc
index 054169158b1..a61c701621d 100644
--- a/gcc/opts-global.cc
+++ b/gcc/opts-global.cc
@@ -401,7 +401,7 @@ handle_common_deferred_options (void)
 	  break;
 
 	case OPT_fdump_:
-	  g->get_dumps ()->dump_switch_p (opt->arg);
+	  /* Deferred until plugins initialized.  */
 	  break;
 
 case OPT_fopt_info_:
@@ -494,3 +494,21 @@ handle_common_deferred_options (void)
 	}
 }
 }
+
+/* Handle deferred dump options.  */
+
+void
+handle_deferred_dump_options (void)
+{
+  unsigned int i;
+  cl_deferred_option *opt;
+  vec<cl_deferred_option> v;
+
+  if (common_deferred_options)
+    v = *((vec<cl_deferred_option> *) common_deferred_options);
+  else
+    v = vNULL;
+  FOR_EACH_VEC_ELT (v, i, opt)
+    if (opt->opt_index == OPT_fdump_)
+      g->get_dumps ()->dump_switch_p (opt->arg);
+}
diff --git a/gcc/opts.h b/gcc/opts.h
index 9959a440ca1..00f377f9ca7 100644
--- a/gcc/opts.h
+++ b/gcc/opts.h
@@ -425,6 +425,7 @@ extern void control_warning_option (unsigned int opt_index, int kind,
 extern char *write_langs (unsigned int mask);
 extern void print_ignored_options (void);
 extern void handle_common_deferred_options (void);
+extern void handle_deferred_dump_options (void);
 unsigned int parse_sanitizer_options (const char *, location_t, int,
   unsigned int, int, bool);
 
diff --git a/gcc/toplev.cc b/gcc/toplev.cc
index d53b5e78ae3..c606a0697b7 100644
--- a/gcc/toplev.cc
+++ b/gcc/toplev.cc
@@ -2253,6 +2253,10 @@ toplev::main (int argc, char **argv)
 
   initialize_plugins ();
 
+  /* Handle the dump options now that plugins have had a chance to install new
+ passes.  */
+  handle_deferred_dump_options ();
+
   if (version_flag)
 print_version (stderr, "", true);
 
-- 
2.40.1
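
The ordering issue the patch addresses can be sketched in isolation.  This is
only a toy model, not GCC code: toy_driver, deferred_opt, opt_kind and
count_enabled are hypothetical names, and the "passes" are plain strings.  It
shows why a dump request naming a plugin pass is only honoured when dump
options are handled after plugin initialization.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins; none of these names come from GCC.
enum class opt_kind { dump, other };

struct deferred_opt
{
  opt_kind kind;
  std::string arg;
};

struct toy_driver
{
  std::vector<std::string> known_passes{"optimized"};
  std::vector<std::string> enabled_dumps;

  // Plugins may register additional passes.
  void init_plugins () { known_passes.push_back ("myplugin"); }

  // Enable a dump only if the named pass is already registered.
  void handle_dump_options (const std::vector<deferred_opt> &opts)
  {
    for (const deferred_opt &o : opts)
      if (o.kind == opt_kind::dump)
        for (const std::string &p : known_passes)
          if (p == o.arg)
            enabled_dumps.push_back (o.arg);
  }
};

// Returns how many dump requests were honoured for the given ordering.
inline std::size_t
count_enabled (bool plugins_first)
{
  const std::vector<deferred_opt> opts
    = {{opt_kind::dump, "optimized"},
       {opt_kind::dump, "myplugin"},
       {opt_kind::other, "ignored"}};
  toy_driver d;
  if (plugins_first)
    d.init_plugins ();
  d.handle_dump_options (opts);
  if (!plugins_first)
    d.init_plugins ();
  return d.enabled_dumps.size ();
}
```

In this model count_enabled (false) honours only the built-in pass, while
count_enabled (true) also honours the plugin one.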

Re: [PATCH] Fix type error of 'switch (SUBREG_BYTE (op)).'

2023-05-17 Thread Jeff Law via Gcc-patches





On 5/17/23 03:03, Jin Ma wrote:

For example:
(define_insn "mov_lowpart_sidi2"
   [(set (match_operand:SI 0 "register_operand" "=r")
 (subreg:SI (match_operand:DI 1 "register_operand" " r") 0))]
   "TARGET_64BIT"
   "mov\t%0,%1")

(define_insn "mov_highpart_sidi2"
   [(set (match_operand:SI 0 "register_operand" "=r")
 (subreg:SI (match_operand:DI 1 "register_operand" " r") 1))]
   "TARGET_64BIT"
   "movh\t%0,%1")

When the above patterns are defined, the generated file insn-recog.cc will
contain 'switch (SUBREG_BYTE (op))', but since the return value of
SUBREG_BYTE is poly_uint16_pod, the following error occurs:
"error: switch quantity not an integer".

gcc/ChangeLog:

* genrecog.cc (print_nonbool_test): Fix type error of
'switch (SUBREG_BYTE (op))'.

Thanks.  Installed.

jeff
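
For readers unfamiliar with the diagnostic, here is a minimal sketch of the
language rule involved.  It is not the genrecog.cc change, which this message
does not show; poly_like and classify are hypothetical names and poly_like is
not GCC's poly_int.  A switch operand must be of integral or enumeration type
(or convertible to one), so a class-typed value has to be turned into an
integer first.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a polynomial-valued offset; not GCC's poly_int.
struct poly_like
{
  unsigned short coeffs[2];

  bool is_constant () const { return coeffs[1] == 0; }
  unsigned short to_constant () const { return coeffs[0]; }
};

static_assert (!std::is_integral<poly_like>::value,
               "poly_like is a class type with no conversion to an integer");

inline int
classify (poly_like byte)
{
  // 'switch (byte)' would be rejected here; switch on an integer
  // extracted from it instead.
  if (!byte.is_constant ())
    return -1;
  switch (byte.to_constant ())
    {
    case 0:
      return 0;
    case 4:
      return 1;
    default:
      return 2;
    }
}
```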

Re: [PATCH] RISC-V: Remove trailing spaces on lines.

2023-05-17 Thread Jeff Law via Gcc-patches





On 5/17/23 03:08, Jin Ma wrote:

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Remove
trailing spaces on lines.
* config/riscv/riscv.cc (riscv_legitimize_move): Likewise.
* config/riscv/riscv.h (enum reg_class): Likewise.
* config/riscv/riscv.md: Likewise.
---
  gcc/common/config/riscv/riscv-common.cc | 2 +-
  gcc/config/riscv/riscv.cc   | 6 +++---
  gcc/config/riscv/riscv.h| 2 +-
  gcc/config/riscv/riscv.md   | 4 ++--
  4 files changed, 7 insertions(+), 7 deletions(-)

Thanks.  Installed.
jeff

Re: [PATCH v3] emit DW_AT_name for DW_TAG_GNU_formal_parameter_pack [PR70536]

2023-05-17 Thread Ed Catmur

Ping.

On Sat, 4 Feb 2023, at 10:50, Ed Catmur wrote:
> Per 
> http://wiki.dwarfstd.org/index.php?title=C%2B%2B0x:_Variadic_templates 
> DW_TAG_GNU_formal_parameter_pack should have a DW_AT_name:
>
> 17$:  DW_TAG_formal_parameter_pack
>   DW_AT_name("args")
> 18$:  DW_TAG_formal_parameter
>   ! no DW_AT_name attribute
>   DW_AT_type(reference to 13$)
> (...)
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70536
> ---
>  gcc/dwarf2out.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
> index 1f39df3b1e2..462328acd9f 100644
> --- a/gcc/dwarf2out.cc
> +++ b/gcc/dwarf2out.cc
> @@ -23041,7 +23041,7 @@ gen_formal_parameter_pack_die  (tree parm_pack,
> && subr_die);
> 
>parm_pack_die = new_die (DW_TAG_GNU_formal_parameter_pack, subr_die, 
> parm_pack);
> -  add_src_coords_attributes (parm_pack_die, parm_pack);
> +  add_name_and_src_coords_attributes (parm_pack_die, parm_pack);
> 
>for (arg = pack_arg; arg; arg = DECL_CHAIN (arg))
>  {
> -- 
> 2.34.1
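
For context, a small sketch of source code that has a function parameter pack.
count_args is a hypothetical example, not taken from the patch or the bug
report.  The pack named args below is the kind of entity the
DW_TAG_GNU_formal_parameter_pack DIE describes, and "args" is the name the
quoted change is meant to attach to it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example function; 'args' is the function parameter pack.
template <typename... Ts>
std::size_t
count_args (Ts... args)
{
  return sizeof... (args);
}
```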

Re: [PATCH] libstdc++: use __bool_constant instead of integral_constant

2023-05-17 Thread Ken Matsui via Gcc-patches

Thank you!

On Wed, May 17, 2023 at 8:53 AM Patrick Palka  wrote:
>
> On Fri, 12 May 2023, Ken Matsui via Libstdc++ wrote:
>
> > It appears that GCC 13 has been released, but I am wondering if there
> > are any issues preventing this patch from being merged yet. Can you
> > provide any information on this?
>
> Thanks for the reminder, I pushed this to trunk just now
> (r14-940-g637edefc5863cf). Congrats on your first libstdc++ commit!
>
> >
> > On Sat, Apr 8, 2023 at 2:08 PM Ken Matsui  wrote:
> > >
> > > I see. Thank you!
> > >
> > > On Sat, Apr 8, 2023 at 12:52 AM Jonathan Wakely  
> > > wrote:
> > > >
> > > > This looks good, thanks, but we're too close to the gcc 13 release now, 
> > > > and this isn't fixing any bugs. I'll push it after the release.
> > > >
> > > >
> > > > On Thu, 23 Mar 2023, 11:07 Ken Matsui via Libstdc++, 
> > > >  wrote:
> > > >>
> > > >> In the type_traits header, both integral_constant and 
> > > >> __bool_constant
> > > >> are used. This patch unifies those usages into __bool_constant.
> > > >>
> > > >> libstdc++-v3/ChangeLog:
> > > >>
> > > >> * include/std/type_traits: Use __bool_constant instead of
> > > >> integral_constant.
> > > >>
> > > >> Signed-off-by: Ken Matsui 
> > > >> ---
> > > >>  libstdc++-v3/include/std/type_traits | 32 ++--
> > > >>  1 file changed, 16 insertions(+), 16 deletions(-)
> > > >>
> > > >> diff --git a/libstdc++-v3/include/std/type_traits 
> > > >> b/libstdc++-v3/include/std/type_traits
> > > >> index 2bd607a8b8f..bc6982f9e64 100644
> > > >> --- a/libstdc++-v3/include/std/type_traits
> > > >> +++ b/libstdc++-v3/include/std/type_traits
> > > >> @@ -578,19 +578,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > >>/// is_enum
> > > > >>template<typename _Tp>
> > > > >>  struct is_enum
> > > > >> -: public integral_constant<bool, __is_enum(_Tp)>
> > > >> +: public __bool_constant<__is_enum(_Tp)>
> > > >>  { };
> > > >>
> > > >>/// is_union
> > > > >>template<typename _Tp>
> > > > >>  struct is_union
> > > > >> -: public integral_constant<bool, __is_union(_Tp)>
> > > >> +: public __bool_constant<__is_union(_Tp)>
> > > >>  { };
> > > >>
> > > >>/// is_class
> > > > >>template<typename _Tp>
> > > > >>  struct is_class
> > > > >> -: public integral_constant<bool, __is_class(_Tp)>
> > > >> +: public __bool_constant<__is_class(_Tp)>
> > > >>  { };
> > > >>
> > > >>/// is_function
> > > >> @@ -784,7 +784,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > >>/// is_trivial
> > > > >>template<typename _Tp>
> > > > >>  struct is_trivial
> > > > >> -: public integral_constant<bool, __is_trivial(_Tp)>
> > > >> +: public __bool_constant<__is_trivial(_Tp)>
> > > >>  {
> > > >>
> > > >> static_assert(std::__is_complete_or_unbounded(__type_identity<_Tp>{}),
> > > >> "template argument must be a complete class or an unbounded 
> > > >> array");
> > > >> @@ -793,7 +793,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > >>/// is_trivially_copyable
> > > > >>template<typename _Tp>
> > > > >>  struct is_trivially_copyable
> > > > >> -: public integral_constant<bool, __is_trivially_copyable(_Tp)>
> > > >> +: public __bool_constant<__is_trivially_copyable(_Tp)>
> > > >>  {
> > > >>
> > > >> static_assert(std::__is_complete_or_unbounded(__type_identity<_Tp>{}),
> > > >> "template argument must be a complete class or an unbounded 
> > > >> array");
> > > >> @@ -802,7 +802,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > >>/// is_standard_layout
> > > > >>template<typename _Tp>
> > > > >>  struct is_standard_layout
> > > > >> -: public integral_constant<bool, __is_standard_layout(_Tp)>
> > > >> +: public __bool_constant<__is_standard_layout(_Tp)>
> > > >>  {
> > > >>
> > > >> static_assert(std::__is_complete_or_unbounded(__type_identity<_Tp>{}),
> > > >> "template argument must be a complete class or an unbounded 
> > > >> array");
> > > >> @@ -817,7 +817,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > >>  struct
> > > >>  _GLIBCXX20_DEPRECATED_SUGGEST("is_standard_layout && is_trivial")
> > > >>  is_pod
> > > > >> -: public integral_constant<bool, __is_pod(_Tp)>
> > > >> +: public __bool_constant<__is_pod(_Tp)>
> > > >>  {
> > > >>
> > > >> static_assert(std::__is_complete_or_unbounded(__type_identity<_Tp>{}),
> > > >> "template argument must be a complete class or an unbounded 
> > > >> array");
> > > >> @@ -831,7 +831,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > >>  struct
> > > >>  _GLIBCXX17_DEPRECATED
> > > >>  is_literal_type
> > > > >> -: public integral_constant<bool, __is_literal_type(_Tp)>
> > > >> +: public __bool_constant<__is_literal_type(_Tp)>
> > > >>  {
> > > >>
> > > >> static_assert(std::__is_complete_or_unbounded(__type_identity<_Tp>{}),
> > > >> "template argument must be a complete class or an unbounded 
> > > >> array");
> > > >> @@ -840,13 +840,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > >>/// is_empty
> > > > >>template<typename _Tp>
> > > > >>  struct is_empty
> > > > >> -: public integral_constant<bool, __is_empty(_Tp)>
> > > >> +: public __bool_constant<__is_empty(_Tp)>
> > > >>  { };
> > > >>
> > > >>///

[committed] hppa: Add clear_cache expander

2023-05-17 Thread John David Anglin

Tested on hppa-unknown-linux-gnu and hppa64-hp-hpux11.11.

Committed to trunk.

Dave
---

Add clear_cache expander.

2023-05-17  John David Anglin  

gcc/ChangeLog:

* config/pa/pa.md (clear_cache): New.

diff --git a/gcc/config/pa/pa.md b/gcc/config/pa/pa.md
index 7b7d7f776c7..726e12768f8 100644
--- a/gcc/config/pa/pa.md
+++ b/gcc/config/pa/pa.md
@@ -9940,6 +9940,23 @@ add,l %2,%3,%3\;bv,n %%r0(%3)"
   [(set_attr "type" "multi")
(set_attr "length" "52")])
 
+(define_expand "clear_cache"
+  [(match_operand 0 "pmode_register_operand")
+   (match_operand 1 "pmode_register_operand")]
+  ""
+{
+  rtx line_length = gen_reg_rtx (Pmode);
+
+  emit_move_insn (line_length, GEN_INT (MIN_CACHELINE_SIZE));
+  if (TARGET_64BIT)
+emit_insn (gen_icacheflushdi (operands[0], operands[1], line_length,
+ gen_reg_rtx (Pmode), gen_reg_rtx (Pmode)));
+  else
+emit_insn (gen_icacheflushsi (operands[0], operands[1], line_length,
+ gen_reg_rtx (Pmode), gen_reg_rtx (Pmode)));
+  DONE;
+})
+
 ;; An out-of-line prologue.
 (define_insn "outline_prologue_call"
   [(unspec_volatile [(const_int 0)] UNSPECV_OPC)
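
For context, a sketch of the user-level builtin that a clear_cache pattern
serves on targets that provide one.  This is an illustration only, not part of
the patch; code_buf and flush_range_demo are hypothetical names, and the
buffer holds no real instructions.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical buffer standing in for freshly written code.
static unsigned char code_buf[64];

inline bool
flush_range_demo ()
{
  std::memset (code_buf, 0, sizeof code_buf);
  char *begin = reinterpret_cast<char *> (code_buf);
  // Ask the compiler to flush the instruction cache for [begin, end).
  __builtin___clear_cache (begin, begin + sizeof code_buf);
  // The data itself is untouched by the flush.
  return code_buf[0] == 0;
}
```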




[pushed] doc: Fix a pinch of typos in extend.texi

2023-05-17 Thread Arsen Arsenović via Gcc-patches

gcc/ChangeLog:

* doc/extend.texi (C++ Concepts) <forall>: Remove extraneous
parenthesis.  Fix misnamed index entry.
<concept>: Fix misnamed index entry.
---

Pushed as obvious.

 gcc/doc/extend.texi | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 69b21a75e62..ed8b9c8a87b 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -25460,12 +25460,12 @@ assumption is valid. For example, @code{assume(n > 
0)}.
 @item axiom
 Introduces an axiom definition. Axioms introduce requirements on values.
 
-@kindex axiom
+@kindex forall
 @item forall
 Introduces a universally quantified object in an axiom. For example,
-@code{forall (int n) n + 0 == n}).
+@code{forall (int n) n + 0 == n}.
 
-@kindex axiom
+@kindex concept
 @item concept
 Introduces a concept definition. Concepts are sets of syntactic and semantic
 requirements on types and their values.
-- 
2.40.1

Re: [PATCH] Fortran: set shape of initializers of zero-sized arrays [PR95374,PR104352]

2023-05-17 Thread Mikael Morin


Le 17/05/2023 à 20:52, Harald Anlauf via Fortran a écrit :

Dear all,

the attached patch is neat, because it fixes a bug by removing code ;-)

When generating the initializer for a parameter array, we excepted
the case of size 0, which however prevented the detection of array
bounds violations and led to ICEs in various places.  The solution
which removes the comparison for size > 0 also has the bonus that
it fixes a minor memory leak for the size==0 case...

Regtested on x86_64-pc-linux-gnu.  OK for mainline?


Sure.

Thanks

Re: [PATCH] libstdc++: Synchronize PSTL with upstream

2023-05-17 Thread Jonathan Wakely via Gcc-patches

-template 
-  _OutputIterator
-__brick_generate_n(_OutputIterator __first, _Size __count, _Generator __g,
/* is_vector = */ std::true_type) noexcept
+template 

Missing uglification on Size.

+_RandomAccessIterator
+__brick_generate_n(_RandomAccessIterator __first, Size __count, _Generator
__g,
+   /* is_vector = */ std::true_type) noexcept
 {
 return __unseq_backend::__simd_generate_n(__first, __count, __g);
 }

-template 
-  _OutputIterator
-__brick_generate_n(_OutputIterator __first, _Size __count, _Generator __g,
/* is_vector = */ std::false_type) noexcept
+template 

Missing uglification on OutputIterator and Size.

+OutputIterator
+__brick_generate_n(OutputIterator __first, Size __count, _Generator __g,
/* is_vector = */ std::false_type) noexcept


-template 
-_ForwardIterator2
-__brick_adjacent_difference(_ForwardIterator1 __first, _ForwardIterator1
__last, _ForwardIterator2 __d_first,
-_BinaryOperation __op, /*is_vector=*/std::true_type) noexcept
+template 

Missing uglification on BinaryOperation.

+_RandomAccessIterator2
+__brick_adjacent_difference(_RandomAccessIterator1 __first,
_RandomAccessIterator1 __last,
+_RandomAccessIterator2 __d_first,
BinaryOperation __op,
+/*is_vector=*/std::true_type) noexcept


The above problems exist on the declaration and the definitions.


--- a/libstdc++-v3/include/pstl/glue_execution_defs.h
+++ b/libstdc++-v3/include/pstl/glue_execution_defs.h
@@ -18,8 +18,8 @@ namespace std
 {
 // Type trait
 using __pstl::execution::is_execution_policy;
-#if _PSTL_CPP14_VARIABLE_TEMPLATES_PRESENT
-#if __INTEL_COMPILER
+#if defined(_PSTL_CPP14_VARIABLE_TEMPLATES_PRESENT)
+#if defined(__INTEL_COMPILER)
 template <class T>
 constexpr bool is_execution_policy_v = is_execution_policy<T>::value;
 #else

Pre-existing, but that T should be _Tp, but it only affects the Intel
compiler branch, so meh.

Please fix these and report them upstream too.

All the actual code changes look good.

I think I'd prefer if __pattern_partial_sort_copy used
std::uninitialized_copy instead of a loop and placement-new, but that
doesn't need to hold this up. We could optimize some uses of
std::conjunction and std::conditional to use our own __and_ and
__conditional, but I'm not sure it's worth diverging from upstream to do
that.

Please fix the naming bugs noted above and push to trunk, thanks!

+Reviewed-by: Jonathan Wakely

Re: [PATCH] libstdc++: Fix up some templates [PR109883]

2023-05-17 Thread Jonathan Wakely via Gcc-patches

On Wed, 17 May 2023 at 19:18, Jakub Jelinek  wrote:

> Hi!
>
> As can be seen on the following testcase, for
>
> std::{atan2,fmod,pow,copysign,fdim,fmax,fmin,hypot,nextafter,remainder,remquo,fma}
> if one operand type is std::float{16,32,64,128}_t or std::bfloat16_t and
> another one some integral type or some other floating point type which
> promotes to the other operand's type, we can end up with endless recursion.
> This is because of a declaration ordering problem in <cmath>, where the
> float, double and long double overloads of those functions come before
> the templates which use __gnu_cxx::__promote_{2,3}, but the
> std::float{16,32,64,128}_t and std::bfloat16_t overloads come later in the
> file.  If the result of those promotions is _Float{16,32,64,128} or
> __gnu_cxx::__bfloat16_t, say std::pow(_Float64, int) calls
> std::pow(_Float64, _Float64) and the latter calls itself.
>
> The following patch fixes that by moving those templates later in the file,
> so that the calls from those templates see also the other overloads.
>

I checked that each set of moved functions is still within the same #if
group, so there's no change in the logic for whether they are defined or
not, only the point of declaration within the file.


> I think other templates in the file like e.g. isgreater etc. shouldn't be
> a problem, because those just use __builtin_isgreater etc. in their bodies.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/13.2?
>

OK for both, thanks.



>
> 2023-05-17  Jakub Jelinek  
>
> PR libstdc++/109883
> * include/c_global/cmath (atan2, fmod, pow): Move
> __gnu_cxx::__promote_2 using templates after _Float{16,32,64,128}
> and
> __gnu_cxx::__bfloat16_t overloads.
> (copysign, fdim, fmax, fmin, hypot, nextafter, remainder, remquo):
> Likewise.
> (fma): Move __gnu_cxx::__promote_3 using template after
> _Float{16,32,64,128} and __gnu_cxx::__bfloat16_t overloads.
>
> * testsuite/26_numerics/headers/cmath/constexpr_std_c++23.cc: New
> test.
>
> --- libstdc++-v3/include/c_global/cmath.jj  2023-01-16 23:19:06.225717615 +0100
> +++ libstdc++-v3/include/c_global/cmath 2023-05-17 15:07:07.823657320 +0200
> @@ -151,15 +151,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>{ return __builtin_atan2l(__y, __x); }
>  #endif
>
> -  template<typename _Tp, typename _Up>
> -inline _GLIBCXX_CONSTEXPR
> -typename __gnu_cxx::__promote_2<_Tp, _Up>::__type
> -atan2(_Tp __y, _Up __x)
> -{
> -  typedef typename __gnu_cxx::__promote_2<_Tp, _Up>::__type __type;
> -  return atan2(__type(__y), __type(__x));
> -}
> -
>using ::ceil;
>
>  #ifndef __CORRECT_ISO_CPP_MATH_H_PROTO
> @@ -286,15 +277,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>{ return __builtin_fmodl(__x, __y); }
>  #endif
>
> -  template<typename _Tp, typename _Up>
> -inline _GLIBCXX_CONSTEXPR
> -typename __gnu_cxx::__promote_2<_Tp, _Up>::__type
> -fmod(_Tp __x, _Up __y)
> -{
> -  typedef typename __gnu_cxx::__promote_2<_Tp, _Up>::__type __type;
> -  return fmod(__type(__x), __type(__y));
> -}
> -
>using ::frexp;
>
>  #ifndef __CORRECT_ISO_CPP_MATH_H_PROTO
> @@ -411,15 +393,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  #endif
>  #endif
>
> -  template<typename _Tp, typename _Up>
> -inline _GLIBCXX_CONSTEXPR
> -typename __gnu_cxx::__promote_2<_Tp, _Up>::__type
> -pow(_Tp __x, _Up __y)
> -{
> -  typedef typename __gnu_cxx::__promote_2<_Tp, _Up>::__type __type;
> -  return pow(__type(__x), __type(__y));
> -}
> -
>using ::sin;
>
>  #ifndef __CORRECT_ISO_CPP_MATH_H_PROTO
> @@ -1073,6 +1046,33 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>{ return __gnu_cxx::__bfloat16_t(__builtin_tanhf(__x)); }
>  #endif
>
> +  template<typename _Tp, typename _Up>
> +inline _GLIBCXX_CONSTEXPR
> +typename __gnu_cxx::__promote_2<_Tp, _Up>::__type
> +atan2(_Tp __y, _Up __x)
> +{
> +  typedef typename __gnu_cxx::__promote_2<_Tp, _Up>::__type __type;
> +  return atan2(__type(__y), __type(__x));
> +}
> +
> +  template<typename _Tp, typename _Up>
> +inline _GLIBCXX_CONSTEXPR
> +typename __gnu_cxx::__promote_2<_Tp, _Up>::__type
> +fmod(_Tp __x, _Up __y)
> +{
> +  typedef typename __gnu_cxx::__promote_2<_Tp, _Up>::__type __type;
> +  return fmod(__type(__x), __type(__y));
> +}
> +
> +  template<typename _Tp, typename _Up>
> +inline _GLIBCXX_CONSTEXPR
> +typename __gnu_cxx::__promote_2<_Tp, _Up>::__type
> +pow(_Tp __x, _Up __y)
> +{
> +  typedef typename __gnu_cxx::__promote_2<_Tp, _Up>::__type __type;
> +  return pow(__type(__x), __type(__y));
> +}
> +
>  #if _GLIBCXX_USE_C99_MATH
>  #if !_GLIBCXX_USE_C99_FP_MACROS_DYNAMIC
>
> @@ -2107,16 +2107,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>{ return __builtin_copysignl(__x, __y); }
>  #endif
>
> -#ifndef __CORRECT_ISO_CPP11_MATH_H_PROTO_INT
> -  template<typename _Tp, typename _Up>
> -constexpr typename __gnu_cxx::__promote_2<_Tp, _Up>::__type
> -copysign(_Tp __x, _Up __y)
> -{
> -  typedef typename __gnu_cxx::__promote_2<_Tp, _Up>::__type __type;
> -  return

[committed] libstdc++: Uncomment checks for enumeration types

2023-05-17 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux. Pushed to trunk.

-- >8 --

I don't know why these checks are disabled.

libstdc++-v3/ChangeLog:

* testsuite/18_support/headers/limits/synopsis.cc: Uncomment
checks for float_round_style and float_denorm_style.
---
 libstdc++-v3/testsuite/18_support/headers/limits/synopsis.cc | 2 --
 1 file changed, 2 deletions(-)

diff --git a/libstdc++-v3/testsuite/18_support/headers/limits/synopsis.cc b/libstdc++-v3/testsuite/18_support/headers/limits/synopsis.cc
index fe158b995cf..38e8de6adab 100644
--- a/libstdc++-v3/testsuite/18_support/headers/limits/synopsis.cc
+++ b/libstdc++-v3/testsuite/18_support/headers/limits/synopsis.cc
@@ -23,10 +23,8 @@
 namespace std {
   template class numeric_limits;
 
-#if 0
   enum float_round_style;
   enum float_denorm_style;
-#endif
 
   template<> class numeric_limits;
 
-- 
2.40.1

[committed] libstdc++: Add system_header pragma to <bits/c++config.h>

2023-05-17 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux. Pushed to trunk.

-- >8 --

Without this change many tests that depend on an effective-target will
fail when compiled with -pedantic -std=c++98. This happens because the
preprocessor check done by v3_check_preprocessor_condition uses -Werror
and includes <bits/c++config.h> directly (rather than via another
header). If <bits/c++config.h> is not a system header then this
pedwarn is not suppressed, and the effective-target check fails:

bits/c++config.h:220: error: anonymous variadic macros were introduced in C++11 [-Werror=variadic-macros]
cc1plus: all warnings being treated as errors
compiler exited with status 1
UNSUPPORTED: 18_support/headers/limits/synopsis.cc

We could consider also changing proc v3_check_preprocessor_condition so
that it includes a real header, rather than just <bits/c++config.h>, but
that's not necessary for now.

libstdc++-v3/ChangeLog:

* include/bits/c++config: Add system_header pragma.
---
 libstdc++-v3/include/bits/c++config | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index 13892787e09..009a017b048 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -30,6 +30,8 @@
 #ifndef _GLIBCXX_CXX_CONFIG_H
 #define _GLIBCXX_CXX_CONFIG_H 1
 
+#pragma GCC system_header
+
 // The major release number for the GCC release the C++ library belongs to.
 #define _GLIBCXX_RELEASE
 
-- 
2.40.1

[committed] libstdc++: Implement LWG 3877 for std::expected monadic ops

2023-05-17 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux. Pushed to trunk.

I'll probably backport this to gcc-13 too.

There are no tests for the changes to and_then and transform, because
LWG 3843 makes that hard to test. Both and_then and transform call
value(), which requires a copy constructible error_type. So we can test
that certain types can never be used as the error-type, but it's not
very useful to test.

-- >8 --

This was approved in Issaquah 2023. As well as fixing the value
categories, this fixes the fact that we were incorrectly testing E
instead of T in the or_else constraints.
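
As a rough illustration of why the precise constraint matters, here is a
minimal sketch, assuming a contrived error type. Err, old_constraint_ok and
new_constraint_ok are hypothetical names and are not part of the patch.

```cpp
#include <type_traits>

// Hypothetical error type: copyable from a const lvalue, but construction
// from a non-const lvalue selects a deleted constructor.
struct Err
{
  Err() = default;
  Err(const Err&) = default;
  Err(Err&) = delete;
};

// Old-style constraint: only asks about construction from const Err&.
constexpr bool old_constraint_ok = std::is_copy_constructible_v<Err>;

// New-style constraint for the & overload: asks about construction from
// Err&, which is the value category such an overload actually passes on.
constexpr bool new_constraint_ok = std::is_constructible_v<Err, Err&>;

static_assert(old_constraint_ok, "const-lvalue copy is allowed");
static_assert(!new_constraint_ok, "non-const lvalue copy is deleted");
```

With the old constraint such a type would be accepted and then fail inside
the function body; with the new one the overload is simply not viable.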

libstdc++-v3/ChangeLog:

* include/std/expected (expected::and_then, expected::or_else)
(expected::transform, expected::transform_error): Fix exception
specifications as per LWG 3877.
(expected::and_then, expected::transform):
Likewise.
* testsuite/20_util/expected/lwg3877.cc: New test.
---
 libstdc++-v3/include/std/expected | 48 +++---
 .../testsuite/20_util/expected/lwg3877.cc | 64 +++
 2 files changed, 88 insertions(+), 24 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/20_util/expected/lwg3877.cc

diff --git a/libstdc++-v3/include/std/expected b/libstdc++-v3/include/std/expected
index c6d26b0d224..5ea0d6a7cb9 100644
--- a/libstdc++-v3/include/std/expected
+++ b/libstdc++-v3/include/std/expected
@@ -840,7 +840,7 @@ namespace __expected
 
   // monadic operations
 
-  template<typename _Fn> requires is_copy_constructible_v<_Er>
+  template<typename _Fn> requires is_constructible_v<_Er, _Er&>
constexpr auto
and_then(_Fn&& __f) &
{
@@ -854,7 +854,7 @@ namespace __expected
return _Up(unexpect, error());
}
 
-  template<typename _Fn> requires is_copy_constructible_v<_Er>
+  template<typename _Fn> requires is_constructible_v<_Er, const _Er&>
constexpr auto
and_then(_Fn&& __f) const &
{
@@ -868,7 +868,7 @@ namespace __expected
return _Up(unexpect, error());
}
 
-  template<typename _Fn> requires is_move_constructible_v<_Er>
+  template<typename _Fn> requires is_constructible_v<_Er, _Er>
constexpr auto
and_then(_Fn&& __f) &&
{
@@ -883,7 +883,7 @@ namespace __expected
}
 
 
-  template<typename _Fn> requires is_move_constructible_v<_Er>
+  template<typename _Fn> requires is_constructible_v<_Er, const _Er>
constexpr auto
and_then(_Fn&& __f) const &&
{
@@ -897,7 +897,7 @@ namespace __expected
return _Up(unexpect, std::move(error()));
}
 
-  template<typename _Fn> requires is_copy_constructible_v<_Er>
+  template<typename _Fn> requires is_constructible_v<_Tp, _Tp&>
constexpr auto
or_else(_Fn&& __f) &
{
@@ -911,7 +911,7 @@ namespace __expected
return std::__invoke(std::forward<_Fn>(__f), error());
}
 
-  template<typename _Fn> requires is_copy_constructible_v<_Er>
+  template<typename _Fn> requires is_constructible_v<_Tp, const _Tp&>
constexpr auto
or_else(_Fn&& __f) const &
{
@@ -926,7 +926,7 @@ namespace __expected
}
 
 
-  template<typename _Fn> requires is_move_constructible_v<_Er>
+  template<typename _Fn> requires is_constructible_v<_Tp, _Tp>
constexpr auto
or_else(_Fn&& __f) &&
{
@@ -940,7 +940,7 @@ namespace __expected
return std::__invoke(std::forward<_Fn>(__f), std::move(error()));
}
 
-  template<typename _Fn> requires is_move_constructible_v<_Er>
+  template<typename _Fn> requires is_constructible_v<_Tp, const _Tp>
constexpr auto
or_else(_Fn&& __f) const &&
{
@@ -954,7 +954,7 @@ namespace __expected
return std::__invoke(std::forward<_Fn>(__f), std::move(error()));
}
 
-  template<typename _Fn> requires is_copy_constructible_v<_Er>
+  template<typename _Fn> requires is_constructible_v<_Er, _Er&>
constexpr auto
transform(_Fn&& __f) &
{
@@ -970,7 +970,7 @@ namespace __expected
return _Res(unexpect, std::move(error()));
}
 
-  template<typename _Fn> requires is_copy_constructible_v<_Er>
+  template<typename _Fn> requires is_constructible_v<_Er, const _Er&>
constexpr auto
transform(_Fn&& __f) const &
{
@@ -986,7 +986,7 @@ namespace __expected
return _Res(unexpect, std::move(error()));
}
 
-  template<typename _Fn> requires is_move_constructible_v<_Er>
+  template<typename _Fn> requires is_constructible_v<_Er, _Er>
constexpr auto
transform(_Fn&& __f) &&
{
@@ -1002,7 +1002,7 @@ namespace __expected
return _Res(unexpect, std::move(error()));
}
 
-  template<typename _Fn> requires is_move_constructible_v<_Er>
+  template<typename _Fn> requires is_constructible_v<_Er, const _Er>
constexpr auto
transform(_Fn&& __f) const &&
{
@@ -1018,7 +1018,7 @@ namespace __expected
return _Res(unexpect, std::move(error()));
}
 
-  template<typename _Fn> requires is_copy_constructible_v<_Tp>
+  template<typename _Fn> requires is_constructible_v<_Tp, _Tp&>
constexpr auto
transform_error(_Fn&& __f) &
{
@@

Re: [v2] RISC-V: Remove masking third operand of rotate instructions

2023-05-17 Thread Jeff Law via Gcc-patches

On 5/17/23 10:02, Jivan Hakobyan via Gcc-patches wrote:

Subject:
[v2] RISC-V: Remove masking third operand of rotate instructions
From:
Jivan Hakobyan via Gcc-patches 
Date:
5/17/23, 10:02

To:
gcc-patches@gcc.gnu.org

Rotate instructions do not need to mask the third operand.
For example, on RV64 the following code:

unsigned long foo1(unsigned long rs1, unsigned long rs2)
{
 long shamt = rs2 & (64 - 1);
 return (rs1 << shamt) | (rs1 >> ((64 - shamt) & (64 - 1)));
}

Compiles to:
foo1:
 andi a1,a1,63
 rol a0,a0,a1
 ret

This patch removes unnecessary masking.
Besides, I have merged masking insns for shifts that were written before.
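
To see why the mask is redundant, here is a small sketch in C++ rather than
RTL. rotl64 is a hypothetical helper modelled on foo1 above. It relies on
the rotate reading only the low log2(XLEN) bits of its shift operand, which
is how the Zbb rol/ror instructions are specified.

```cpp
#include <cstdint>

// Hypothetical helper mirroring foo1: rotate left using only the low
// six bits of the shift amount.
constexpr std::uint64_t rotl64(std::uint64_t x, std::uint64_t amount)
{
  const unsigned s = amount & 63;
  return (x << s) | (x >> ((64 - s) & 63));
}

// The top bit wraps around to bit 0.
static_assert(rotl64(0x8000000000000000ull, 1) == 1, "wrap-around");

// Shift amounts that agree in their low six bits give the same result,
// so masking the amount beforehand changes nothing.
static_assert(rotl64(0x123456789abcdef0ull, 68)
              == rotl64(0x123456789abcdef0ull, 4), "68 behaves like 4");
static_assert(rotl64(0x123456789abcdef0ull, 4)
              == 0x23456789abcdef01ull, "top nibble moves to the bottom");
```

Since the rotate already ignores the upper bits, an explicit andi with 63
in front of it can be dropped.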

gcc/ChangeLog:
 * config/riscv/riscv.md (*3_mask): New pattern,
 combined from ...
 (*si3_mask, *di3_mask): Here.
 (*3_mask_1): New pattern, combined from ...
 (*si3_mask_1, *di3_mask_1): Here.
 * config/riscv/bitmanip.md (*3_mask): New
 pattern.
 (*si3_sext_mask): Likewise.
 * config/riscv/iterators.md (shiftm1): Generalize to handle more
 masking constants.
 (bitmanip_rotate): New iterator.
 (bitmanip_optab): Add rotates.
 * config/riscv/predicates.md (const_si_mask_operand): Renamed
 from const31_operand.  Generalize to handle more mask constants.
 (const_di_mask_operand): Similarly.

gcc/testsuite/ChangeLog:
 * testsuite/gcc.target/riscv/shift-and-2.c: Fixed test
 * testsuite/gcc.target/riscv/zbb-rol-ror-01.c: New test
 * testsuite/gcc.target/riscv/zbb-rol-ror-02.c: New test
 * testsuite/gcc.target/riscv/zbb-rol-ror-03.c: New test
 * testsuite/gcc.target/riscv/zbb-rol-ror-04.c: New test
 * testsuite/gcc.target/riscv/zbb-rol-ror-05.c: New test
 * testsuite/gcc.target/riscv/zbb-rol-ror-06.c: New test
 * testsuite/gcc.target/riscv/zbb-rol-ror-07.c: New test

Thanks for the updated patch.  A few comments.

-- With the best regards Jivan Hakobyan

rotate_mask.patch

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index a27fc3e34a1..0fd0cbdeb04 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -351,6 +351,42 @@
"rolw\t%0,%1,%2"
[(set_attr "type" "bitmanip")])

+(define_insn_and_split "*3_mask"

+  [(set (match_operand:X 0 "register_operand" "= r")
+(bitmanip_rotate:X
+(match_operand:X 1 "register_operand" "  r")
+(match_operator 4 "subreg_lowpart_operator"
+ [(and:X
+   (match_operand:X 2 "register_operand"  "r")
+   (match_operand 3 "" ""))])))]
+  "TARGET_ZBB || TARGET_ZBKB"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+(bitmanip_rotate:X (match_dup 1)
+   (match_dup 2)))]
+  "operands[2] = gen_lowpart (QImode, operands[2]);"
+  [(set_attr "type" "bitmanip")
+   (set_attr "mode" "")])
So I couldn't resist the temptation to look at the mode iterator 
improvements a bit after our call this morning.  As you note, the most 
obvious changes will regress the testsuite.  But it turns out there are 
things we can do to further simplify/combine patterns.

So the trick for the above pattern is to use GPR2 rather than X for the 
mode of the bitwise-and subexpression.  That allows the mode of that 
subexpression to vary independently of the mode of the output. 
Similarly for the other pattern that you're adding to bitmanip.md.

We can use GPR/GPR2 iterators in the riscv.md changes as well.  The 
primary benefit in doing so is we can eliminate another pair of patterns.

With those changes we have simpler code that still passes all the new tests.

I've regression tested this V3 variant with no issues.  I'll commit it 
to the trunk under your name since the bulk of the work is yours.

Thanks,
jeff

commit 6da6ed95c9ca247d405da3dfb737b743686fe5e6
Author: Jivan Hakobyan 
Date:   Wed May 17 13:00:28 2023 -0600

RISC-V: Remove masking third operand of rotate instructions

Rotate instructions do not need to mask the third operand.
For example, on RV64 the following code:

unsigned long foo1(unsigned long rs1, unsigned long rs2)
{
long shamt = rs2 & (64 - 1);
return (rs1 << shamt) | (rs1 >> ((64 - shamt) & (64 - 1)));
}

Compiles to:
foo1:
andi a1,a1,63
rol a0,a0,a1
ret

This patch removes unnecessary masking.
Besides, I have merged masking insns for shifts that were written 
before.

gcc/ChangeLog:
* config/riscv/riscv.md (*3_mask): New pattern,
combined from ...
(*si3_mask, *di3_mask): Here.
(*si3_mask_1, *di3_mask_1): And here.
* config/riscv/bitmanip.md (*3_mask): New
pattern.
(*si3_sext_mask): Likewise.
* config/riscv/iterators.md

[PATCH] Fortran: set shape of initializers of zero-sized arrays [PR95374,PR104352]

2023-05-17 Thread Harald Anlauf via Gcc-patches

Dear all,

the attached patch is neat, because it fixes a bug by removing code ;-)

When generating the initializer for a parameter array, we excepted
the case of size 0, which however prevented the detection of array
bounds violations and led to ICEs in various places.  The solution
which removes the comparison for size > 0 also has the bonus that
it fixes a minor memory leak for the size==0 case...

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From 9d2995d2c1cf5708e3297fc7cffb5184d45a65cb Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Wed, 17 May 2023 20:39:18 +0200
Subject: [PATCH] Fortran: set shape of initializers of zero-sized arrays
 [PR95374,PR104352]

gcc/fortran/ChangeLog:

	PR fortran/95374
	PR fortran/104352
	* decl.cc (add_init_expr_to_sym): Set shape of initializer also for
	zero-sized arrays, so that bounds violations can be detected later.

gcc/testsuite/ChangeLog:

	PR fortran/95374
	PR fortran/104352
	* gfortran.dg/zero_sized_13.f90: New test.
---
 gcc/fortran/decl.cc |  3 +--
 gcc/testsuite/gfortran.dg/zero_sized_13.f90 | 28 +
 2 files changed, 29 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/zero_sized_13.f90

diff --git a/gcc/fortran/decl.cc b/gcc/fortran/decl.cc
index 9c4b40d4ac4..4c578d01ad4 100644
--- a/gcc/fortran/decl.cc
+++ b/gcc/fortran/decl.cc
@@ -2239,8 +2239,7 @@ add_init_expr_to_sym (const char *name, gfc_expr **initp, locus *var_locus)
 	  && gfc_is_constant_expr (init)
 	  && (init->expr_type == EXPR_CONSTANT
 		  || init->expr_type == EXPR_STRUCTURE)
-	  && spec_size (sym->as, &size)
-	  && mpz_cmp_si (size, 0) > 0)
+	  && spec_size (sym->as, &size))
 	{
 	  array = gfc_get_array_expr (init->ts.type, init->ts.kind,
 	  &init->where);
diff --git a/gcc/testsuite/gfortran.dg/zero_sized_13.f90 b/gcc/testsuite/gfortran.dg/zero_sized_13.f90
new file mode 100644
index 00000000000..4035d458b32
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/zero_sized_13.f90
@@ -0,0 +1,28 @@
+! { dg-do compile }
+! { dg-options "-w" }
+!
+! PR fortran/95374
+! PR fortran/104352 - Various ICEs for bounds violation with zero-sized arrays
+!
+! Contributed by G. Steinmetz
+
+program p
+  implicit none
+  integer :: i
+  integer, parameter :: a(0)= 0
+  integer, parameter :: b(0:-5) = 0
+  integer, parameter :: c(*) = [(a(i:i), i=0,0)] ! { dg-error "out of bounds" }
+  integer, parameter :: d(*) = [(b(i:i), i=1,1)] ! { dg-error "out of bounds" }
+  integer, parameter :: e(1) = [(a(i)  , i=1,1)] ! { dg-error "out of bounds" }
+  integer, parameter :: f(1) = [(a(i:i), i=1,1)] ! { dg-error "out of bounds" }
+  integer:: g(1) = [(a(i:i), i=0,0)] ! { dg-error "out of bounds" }
+  integer:: h(1) = [(a(i:i), i=1,1)] ! { dg-error "out of bounds" }
+  print *, [(a(i:i), i=0,0)] ! { dg-error "out of bounds" }
+  print *, [(a(i:i), i=1,1)] ! { dg-error "out of bounds" }
+  print *, any (a(1:1) == 1) ! { dg-error "out of bounds" }
+  print *, all (a(0:0) == 1) ! { dg-error "out of bounds" }
+  print *, sum (a(1:1))  ! { dg-error "out of bounds" }
+  print *, iall (a(0:0)) ! { dg-error "out of bounds" }
+  print *, minloc (a(0:0),1) ! { dg-error "out of bounds" }
+  print *, dot_product(a(1:1),a(1:1)) ! { dg-error "out of bounds" }
+end
--
2.35.3

[committed] tree-ssa-math-opts: correct -ffp-contract= check

2023-05-17 Thread Alexander Monakov via Gcc-patches

Since tree-ssa-math-opts may freely contract across statement boundaries
we should enable it only for -ffp-contract=fast instead of disabling it
for -ffp-contract=off.

No functional change, since -ffp-contract=on is not exposed yet.
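
A minimal sketch of the two predicates, assuming a three-valued mode.
FpContract and both function names are hypothetical stand-ins for
flag_fp_contract_mode and the FP_CONTRACT_* values, not the GCC code.

```cpp
// Hypothetical stand-in for the FP_CONTRACT_* modes.
enum class FpContract { Off, On, Fast };

// Old check: bail out only when contraction is off.
constexpr bool old_check_allows_fma(FpContract m)
{ return m != FpContract::Off; }

// New check: proceed only when contraction is fast.
constexpr bool new_check_allows_fma(FpContract m)
{ return m == FpContract::Fast; }

// The two agree for Off and Fast ...
static_assert(old_check_allows_fma(FpContract::Off)
              == new_check_allows_fma(FpContract::Off), "same for off");
static_assert(old_check_allows_fma(FpContract::Fast)
              == new_check_allows_fma(FpContract::Fast), "same for fast");

// ... and differ only for On, which is why nothing changes until that
// mode is exposed.
static_assert(old_check_allows_fma(FpContract::On)
              && !new_check_allows_fma(FpContract::On), "differs for on");
```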

gcc/ChangeLog:

* tree-ssa-math-opts.cc (convert_mult_to_fma): Enable only for
FP_CONTRACT_FAST (no functional change).
---

Preapproved in PR 106092, pushed to trunk.

 gcc/tree-ssa-math-opts.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
index b58a2ac9e6..d71c51dc0e 100644
--- a/gcc/tree-ssa-math-opts.cc
+++ b/gcc/tree-ssa-math-opts.cc
@@ -3320,7 +3320,7 @@ convert_mult_to_fma (gimple *mul_stmt, tree op1, tree op2,
   imm_use_iterator imm_iter;
 
   if (FLOAT_TYPE_P (type)
-  && flag_fp_contract_mode == FP_CONTRACT_OFF)
+  && flag_fp_contract_mode != FP_CONTRACT_FAST)
 return false;
 
   /* We don't want to do bitfield reduction ops.  */
-- 
2.39.2

Re: [PATCH] i386: Fix up types in __builtin_{inf, huge_val, nan{, s}, fabs, copysign}q builtins [PR109884]

2023-05-17 Thread Uros Bizjak via Gcc-patches

On Wed, May 17, 2023 at 8:08 PM Jakub Jelinek  wrote:
>
> Hi!
>
> When _Float128 support was added to C++ for 13.1, the float128t_type_node
> tree was added.  In C, float128_type_node and float128t_type_node are
> the same and represent both _Float128 and __float128, but in C++ they
> are distinct types which are handled differently in the FEs.
> When making that change, I mistakenly forgot to change the FLOAT128
> primitive type, which is used for the results of the
> __builtin_{inf,huge_val,nan{,s},fabs,copysign}q builtins and some of
> their arguments (and nothing else).
>
> The following patch fixes that.
> On ia64 we already use float128t_type_node for those builtins.  On pa,
> __float128 exists but is the same type as long double, so those builtins
> have long double types.  On powerpc it seems we don't have these builtins
> but instead define macros which map them to __builtin_*f128.  That will
> not work properly in C++; perhaps we should change those macros to be
> function-like and cast to __float128.
>
> Anyway, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
> and 13.2?
>
> 2023-05-17  Jakub Jelinek  
>
> PR c++/109884
> * config/i386/i386-builtin-types.def (FLOAT128): Use
> float128t_type_node rather than float128_type_node.
>
> * c-c++-common/pr109884.c: New test.

OK for master and branch.

Thanks,
Uros.

>
> --- gcc/config/i386/i386-builtin-types.def.jj   2022-11-28 22:25:15.838734978 +0100
> +++ gcc/config/i386/i386-builtin-types.def  2023-05-17 10:24:04.297929428 +0200
> @@ -73,7 +73,7 @@ DEF_PRIMITIVE_TYPE (BFLOAT16, ix86_bf16_
>  DEF_PRIMITIVE_TYPE (FLOAT, float_type_node)
>  DEF_PRIMITIVE_TYPE (DOUBLE, double_type_node)
>  DEF_PRIMITIVE_TYPE (FLOAT80, float80_type_node)
> -DEF_PRIMITIVE_TYPE (FLOAT128, float128_type_node)
> +DEF_PRIMITIVE_TYPE (FLOAT128, float128t_type_node)
>  DEF_PRIMITIVE_TYPE (CONST_STRING, const_string_type_node)
>
>  # MMX vectors
> --- gcc/testsuite/c-c++-common/pr109884.c.jj    2023-05-17 10:41:42.295838862 +0200
> +++ gcc/testsuite/c-c++-common/pr109884.c   2023-05-17 10:56:51.731872318 +0200
> @@ -0,0 +1,32 @@
> +/* PR c++/109884 */
> +/* PowerPC doesn't define these as builtins, but macros expanding to
> +   *f128 builtins.  */
> +/* { dg-do compile { target { __float128 && { { c || c++11 } && { ! powerpc*-*-* } } } } } */
> +/* { dg-add-options __float128 } */
> +
> +#ifdef __cplusplus
> +template <typename T, typename U>
> +struct is_same {
> +  static const bool value = false;
> +};
> +
> +template <typename T>
> +struct is_same <T, T> {
> +  static const bool value = true;
> +};
> +#define HAS_TYPE(E, U) static_assert (is_same <decltype (E), U>::value, "")
> +#else
> +#define HAS_TYPE(E, U) _Static_assert (_Generic (E, default : 0, U : 1), "")
> +#endif
> +
> +void
> +foo ()
> +{
> +  __float128 a = 0;
> +  HAS_TYPE (__builtin_infq (), __float128);
> +  HAS_TYPE (__builtin_huge_valq (), __float128);
> +  HAS_TYPE (__builtin_nanq (""), __float128);
> +  HAS_TYPE (__builtin_nansq (""), __float128);
> +  HAS_TYPE (__builtin_fabsq (a), __float128);
> +  HAS_TYPE (__builtin_copysignq (a, a), __float128);
> +}
>
> Jakub
>

[COMMITTED] i386: Adjust emulated integer vector mode multiplication costs

2023-05-17 Thread Uros Bizjak via Gcc-patches

Returned integer vector mode costs of emulated modes in
ix86_multiplication_cost are wrong and do not reflect generated
instruction sequences.  Rewrite handling of different integer vector
modes and different target ABIs to return real instruction
counts in order to calculate better costs of various emulated modes.

gcc/ChangeLog:

* config/i386/i386.cc (ix86_multiplication_cost): Correct
calculation of integer vector mode costs to reflect generated
instruction sequences of different integer vector modes and
different target ABIs.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 498fac468b5..9ab24242b59 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -20463,36 +20463,52 @@ ix86_multiplication_cost (const struct processor_costs *cost,
 return  ix86_vec_cost (mode,
   inner_mode == DFmode ? cost->mulsd : cost->mulss);
   else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
-{
-  /* vpmullq is used in this case. No emulation is needed.  */
-  if (TARGET_AVX512DQ)
-   return ix86_vec_cost (mode, cost->mulss);
+switch (mode)
+  {
+  case V16QImode:
+   /* V*QImode is emulated with 4-11 insns.  */
+   if (TARGET_AVX512BW && TARGET_AVX512VL)
+ return ix86_vec_cost (mode, cost->mulss + cost->sse_op * 3);
+   else if (TARGET_XOP)
+ return ix86_vec_cost (mode, cost->mulss * 2 + cost->sse_op * 5);
+   /* FALLTHRU */
+  case V32QImode:
+   if (TARGET_AVX512BW && mode == V32QImode)
+ return ix86_vec_cost (mode, cost->mulss + cost->sse_op * 3);
+   else
+ return ix86_vec_cost (mode, cost->mulss * 2 + cost->sse_op * 7);
 
-  /* V*QImode is emulated with 7-13 insns.  */
-  if (mode == V16QImode || mode == V32QImode)
-   {
- int extra = 11;
- if (TARGET_XOP && mode == V16QImode)
-   extra = 5;
- else if (TARGET_SSSE3)
-   extra = 6;
- return ix86_vec_cost (mode, cost->mulss * 2 + cost->sse_op * extra);
-   }
-  /* V*DImode is emulated with 5-8 insns.  */
-  else if (mode == V2DImode || mode == V4DImode)
-   {
- if (TARGET_XOP && mode == V2DImode)
-   return ix86_vec_cost (mode, cost->mulss * 2 + cost->sse_op * 3);
- else
-   return ix86_vec_cost (mode, cost->mulss * 3 + cost->sse_op * 5);
-   }
-  /* Without sse4.1, we don't have PMULLD; it's emulated with 7
-insns, including two PMULUDQ.  */
-  else if (mode == V4SImode && !(TARGET_SSE4_1 || TARGET_AVX))
-   return ix86_vec_cost (mode, cost->mulss * 2 + cost->sse_op * 5);
-  else
+  case V64QImode:
+   return ix86_vec_cost (mode, cost->mulss * 2 + cost->sse_op * 9);
+
+  case V4SImode:
+   /* pmulld is used in this case. No emulation is needed.  */
+   if (TARGET_SSE4_1)
+ goto do_native;
+   /* V4SImode is emulated with 7 insns.  */
+   else
+ return ix86_vec_cost (mode, cost->mulss * 2 + cost->sse_op * 5);
+
+  case V2DImode:
+  case V4DImode:
+   /* vpmullq is used in this case. No emulation is needed.  */
+   if (TARGET_AVX512DQ && TARGET_AVX512VL)
+ goto do_native;
+   /* V*DImode is emulated with 6-8 insns.  */
+   else if (TARGET_XOP && mode == V2DImode)
+ return ix86_vec_cost (mode, cost->mulss * 2 + cost->sse_op * 4);
+   /* FALLTHRU */
+  case V8DImode:
+   /* vpmullq is used in this case. No emulation is needed.  */
+   if (TARGET_AVX512DQ && mode == V8DImode)
+ goto do_native;
+   else
+ return ix86_vec_cost (mode, cost->mulss * 3 + cost->sse_op * 5);
+
+  default:
+  do_native:
return ix86_vec_cost (mode, cost->mulss);
-}
+  }
   else
 return (cost->mult_init[MODE_INDEX (mode)] + cost->mult_bit * 7);
 }

[PATCH] libstdc++: Fix up some templates [PR109883]

2023-05-17 Thread Jakub Jelinek via Gcc-patches

Hi!

As can be seen on the following testcase, for
std::{atan2,fmod,pow,copysign,fdim,fmax,fmin,hypot,nextafter,remainder,remquo,fma}
if one operand type is std::float{16,32,64,128}_t or std::bfloat16_t and
another one some integral type or some other floating point type which
promotes to the other operand's type, we can end up with endless recursion.
This is because of a declaration ordering problem in <cmath>, where the
float, double and long double overloads of those functions come before
the templates which use __gnu_cxx::__promote_{2,3}, but the
std::float{16,32,64,128}_t and std::bfloat16_t overloads come later in the
file.  If the result of those promotions is _Float{16,32,64,128} or
__gnu_cxx::__bfloat16_t, say std::pow(_Float64, int) calls
std::pow(_Float64, _Float64) and the latter calls itself.

The following patch fixes that by moving those templates later in the file,
so that the calls from those templates see also the other overloads.
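
The lookup rule behind this can be sketched in isolation. This is only an
illustration: the demo namespace and its functions are hypothetical and much
simpler than the <cmath> overload sets. For a fundamental argument type,
argument-dependent lookup adds no candidates, so a call inside a template
only sees overloads declared before the template's definition.

```cpp
namespace demo
{
  // Overload declared before the first template.
  constexpr int which(double) { return 1; }

  // Defined before which(float) exists, so only which(double) is visible.
  template<typename T>
  constexpr int early_dispatch(T x) { return which(x); }

  // Overload declared later in the file.
  constexpr int which(float) { return 2; }

  // Defined after both overloads, so overload resolution sees both.
  template<typename T>
  constexpr int late_dispatch(T x) { return which(x); }
}

// float has no associated namespaces, so ADL at instantiation time does
// not find which(float) for the early template.
static_assert(demo::early_dispatch(1.0f) == 1, "early template misses float");
static_assert(demo::late_dispatch(1.0f) == 2, "late template finds float");
```

In the sketch the early template merely picks a worse overload; in <cmath>
the only exact match visible to the early template was the template itself,
hence the recursion.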

I think other templates in the file like e.g. isgreater etc. shouldn't be
a problem, because those just use __builtin_isgreater etc. in their bodies.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/13.2?

2023-05-17  Jakub Jelinek  

PR libstdc++/109883
* include/c_global/cmath (atan2, fmod, pow): Move
__gnu_cxx::__promote_2 using templates after _Float{16,32,64,128} and
__gnu_cxx::__bfloat16_t overloads.
(copysign, fdim, fmax, fmin, hypot, nextafter, remainder, remquo):
Likewise.
(fma): Move __gnu_cxx::__promote_3 using template after
_Float{16,32,64,128} and __gnu_cxx::__bfloat16_t overloads.

* testsuite/26_numerics/headers/cmath/constexpr_std_c++23.cc: New test.

--- libstdc++-v3/include/c_global/cmath.jj  2023-01-16 23:19:06.225717615 +0100
+++ libstdc++-v3/include/c_global/cmath 2023-05-17 15:07:07.823657320 +0200
@@ -151,15 +151,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { return __builtin_atan2l(__y, __x); }
 #endif
 
-  template<typename _Tp, typename _Up>
-inline _GLIBCXX_CONSTEXPR
-typename __gnu_cxx::__promote_2<_Tp, _Up>::__type
-atan2(_Tp __y, _Up __x)
-{
-  typedef typename __gnu_cxx::__promote_2<_Tp, _Up>::__type __type;
-  return atan2(__type(__y), __type(__x));
-}
-
   using ::ceil;
 
 #ifndef __CORRECT_ISO_CPP_MATH_H_PROTO
@@ -286,15 +277,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { return __builtin_fmodl(__x, __y); }
 #endif
 
-  template<typename _Tp, typename _Up>
-inline _GLIBCXX_CONSTEXPR
-typename __gnu_cxx::__promote_2<_Tp, _Up>::__type
-fmod(_Tp __x, _Up __y)
-{
-  typedef typename __gnu_cxx::__promote_2<_Tp, _Up>::__type __type;
-  return fmod(__type(__x), __type(__y));
-}
-
   using ::frexp;
 
 #ifndef __CORRECT_ISO_CPP_MATH_H_PROTO
@@ -411,15 +393,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
 #endif
 
-  template<typename _Tp, typename _Up>
-inline _GLIBCXX_CONSTEXPR
-typename __gnu_cxx::__promote_2<_Tp, _Up>::__type
-pow(_Tp __x, _Up __y)
-{
-  typedef typename __gnu_cxx::__promote_2<_Tp, _Up>::__type __type;
-  return pow(__type(__x), __type(__y));
-}
-
   using ::sin;
 
 #ifndef __CORRECT_ISO_CPP_MATH_H_PROTO
@@ -1073,6 +1046,33 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { return __gnu_cxx::__bfloat16_t(__builtin_tanhf(__x)); }
 #endif
 
+  template<typename _Tp, typename _Up>
+inline _GLIBCXX_CONSTEXPR
+typename __gnu_cxx::__promote_2<_Tp, _Up>::__type
+atan2(_Tp __y, _Up __x)
+{
+  typedef typename __gnu_cxx::__promote_2<_Tp, _Up>::__type __type;
+  return atan2(__type(__y), __type(__x));
+}
+
+  template<typename _Tp, typename _Up>
+inline _GLIBCXX_CONSTEXPR
+typename __gnu_cxx::__promote_2<_Tp, _Up>::__type
+fmod(_Tp __x, _Up __y)
+{
+  typedef typename __gnu_cxx::__promote_2<_Tp, _Up>::__type __type;
+  return fmod(__type(__x), __type(__y));
+}
+
+  template<typename _Tp, typename _Up>
+inline _GLIBCXX_CONSTEXPR
+typename __gnu_cxx::__promote_2<_Tp, _Up>::__type
+pow(_Tp __x, _Up __y)
+{
+  typedef typename __gnu_cxx::__promote_2<_Tp, _Up>::__type __type;
+  return pow(__type(__x), __type(__y));
+}
+
 #if _GLIBCXX_USE_C99_MATH
 #if !_GLIBCXX_USE_C99_FP_MACROS_DYNAMIC
 
@@ -2107,16 +2107,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { return __builtin_copysignl(__x, __y); }
 #endif
 
-#ifndef __CORRECT_ISO_CPP11_MATH_H_PROTO_INT
-  template<typename _Tp, typename _Up>
-constexpr typename __gnu_cxx::__promote_2<_Tp, _Up>::__type
-copysign(_Tp __x, _Up __y)
-{
-  typedef typename __gnu_cxx::__promote_2<_Tp, _Up>::__type __type;
-  return copysign(__type(__x), __type(__y));
-}
-#endif
-
 #ifndef __CORRECT_ISO_CPP11_MATH_H_PROTO_FP
   constexpr float
   erf(float __x)
@@ -2199,16 +2189,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { return __builtin_fdiml(__x, __y); }
 #endif
 
-#ifndef __CORRECT_ISO_CPP11_MATH_H_PROTO_INT
-  template<typename _Tp, typename _Up>
-constexpr typename __gnu_cxx::__promote_2<_Tp, _Up>::__type
-fdim(_Tp __x, _Up __y)
-{
-  typedef typename __gnu_cxx::__promote_2<_Tp, _Up>::__type __type;
-  return fdim(__type(__x), __type(__y));
-}

[PATCH] i386: Fix up types in __builtin_{inf,huge_val,nan{,s},fabs,copysign}q builtins [PR109884]

2023-05-17 Thread Jakub Jelinek via Gcc-patches

Hi!

When _Float128 support was added to C++ for 13.1, the float128t_type_node
tree was added.  In C, float128_type_node and float128t_type_node are
the same and represent both _Float128 and __float128, but in C++ they
are distinct types which are handled differently in the FEs.
When making that change, I mistakenly forgot to change the FLOAT128
primitive type, which is used for the results of the
__builtin_{inf,huge_val,nan{,s},fabs,copysign}q builtins and some of
their arguments (and nothing else).

The following patch fixes that.
On ia64 we already use float128t_type_node for those builtins.  On pa,
__float128 exists but is the same type as long double, so those builtins
have long double types.  On powerpc it seems we don't have these builtins
but instead define macros which map them to __builtin_*f128.  That will
not work properly in C++; perhaps we should change those macros to be
function-like and cast to __float128.

Anyway, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
and 13.2?

2023-05-17  Jakub Jelinek  

PR c++/109884
* config/i386/i386-builtin-types.def (FLOAT128): Use
float128t_type_node rather than float128_type_node.

* c-c++-common/pr109884.c: New test.

--- gcc/config/i386/i386-builtin-types.def.jj   2022-11-28 22:25:15.838734978 +0100
+++ gcc/config/i386/i386-builtin-types.def  2023-05-17 10:24:04.297929428 +0200
@@ -73,7 +73,7 @@ DEF_PRIMITIVE_TYPE (BFLOAT16, ix86_bf16_
 DEF_PRIMITIVE_TYPE (FLOAT, float_type_node)
 DEF_PRIMITIVE_TYPE (DOUBLE, double_type_node)
 DEF_PRIMITIVE_TYPE (FLOAT80, float80_type_node)
-DEF_PRIMITIVE_TYPE (FLOAT128, float128_type_node)
+DEF_PRIMITIVE_TYPE (FLOAT128, float128t_type_node)
 DEF_PRIMITIVE_TYPE (CONST_STRING, const_string_type_node)
 
 # MMX vectors
--- gcc/testsuite/c-c++-common/pr109884.c.jj   2023-05-17 10:41:42.295838862 +0200
+++ gcc/testsuite/c-c++-common/pr109884.c   2023-05-17 10:56:51.731872318 +0200
@@ -0,0 +1,32 @@
+/* PR c++/109884 */
+/* PowerPC doesn't define these as builtins, but macros expanding to
+   *f128 builtins.  */
+/* { dg-do compile { target { __float128 && { { c || c++11 } && { ! powerpc*-*-* } } } } } */
+/* { dg-add-options __float128 } */
+
+#ifdef __cplusplus
+template <typename T, typename U>
+struct is_same {
+  static const bool value = false;
+};
+
+template <typename T>
+struct is_same <T, T> {
+  static const bool value = true;
+};
+#define HAS_TYPE(E, U) static_assert (is_same <decltype (E), U>::value, "")
+#else
+#define HAS_TYPE(E, U) _Static_assert (_Generic (E, default : 0, U : 1), "")
+#endif
+
+void
+foo ()
+{
+  __float128 a = 0;
+  HAS_TYPE (__builtin_infq (), __float128);
+  HAS_TYPE (__builtin_huge_valq (), __float128);
+  HAS_TYPE (__builtin_nanq (""), __float128);
+  HAS_TYPE (__builtin_nansq (""), __float128);
+  HAS_TYPE (__builtin_fabsq (a), __float128);
+  HAS_TYPE (__builtin_copysignq (a, a), __float128);
+}

Jakub

[v2] RISC-V: Remove masking third operand of rotate instructions

2023-05-17 Thread Jivan Hakobyan via Gcc-patches

Rotate instructions do not need the third operand to be masked.
For example, on RV64 the following code:

unsigned long foo1(unsigned long rs1, unsigned long rs2)
{
long shamt = rs2 & (64 - 1);
return (rs1 << shamt) | (rs1 >> ((64 - shamt) & (64 - 1)));
}

Compiles to:
foo1:
andi a1,a1,63
rol a0,a0,a1
ret

This patch removes the unnecessary masking.
In addition, I have merged the existing masking insns for shifts.
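
As a minimal sketch of why the mask is redundant, the following C code
models a 64-bit rotate that, like rol on RV64, only looks at the low six
bits of its shift operand.  The function names (rotl64_masked, rol_model,
mask_is_redundant) are hypothetical and exist only for this illustration;
they are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left with the shift amount masked explicitly, in the style of
   foo1 from the message above.  */
uint64_t rotl64_masked(uint64_t x, uint64_t n)
{
    uint64_t shamt = n & (64 - 1);
    return (x << shamt) | (x >> ((64 - shamt) & (64 - 1)));
}

/* Hypothetical model of the rotate instruction: it reads only the low
   six bits of the shift operand on its own.  */
uint64_t rol_model(uint64_t x, uint64_t n)
{
    unsigned s = (unsigned)(n & 63);
    return s ? (x << s) | (x >> (64 - s)) : x;
}

/* Returns 1 when masking the amount before the rotate changes nothing.  */
int mask_is_redundant(uint64_t x, uint64_t n)
{
    return rol_model(x, n & 63) == rol_model(x, n)
           && rol_model(x, n) == rotl64_masked(x, n);
}
```

Since the model already discards the upper bits of the amount, an andi in
front of the rotate cannot change the result, which is the property the
new patterns rely on.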


gcc/ChangeLog:
* config/riscv/riscv.md (*3_mask): New pattern,
combined from ...
(*si3_mask, *di3_mask): Here.
(*3_mask_1): New pattern, combined from ...
(*si3_mask_1, *di3_mask_1): Here.
* config/riscv/bitmanip.md (*3_mask): New
pattern.
(*si3_sext_mask): Likewise.
* config/riscv/iterators.md (shiftm1): Generalize to handle more
masking constants.
(bitmanip_rotate): New iterator.
(bitmanip_optab): Add rotates.
* config/riscv/predicates.md (const_si_mask_operand): Renamed
from const31_operand.  Generalize to handle more mask constants.
(const_di_mask_operand): Similarly.

gcc/testsuite/ChangeLog:
* testsuite/gcc.target/riscv/shift-and-2.c: Fixed test
* testsuite/gcc.target/riscv/zbb-rol-ror-01.c: New test
* testsuite/gcc.target/riscv/zbb-rol-ror-02.c: New test
* testsuite/gcc.target/riscv/zbb-rol-ror-03.c: New test
* testsuite/gcc.target/riscv/zbb-rol-ror-04.c: New test
* testsuite/gcc.target/riscv/zbb-rol-ror-05.c: New test
* testsuite/gcc.target/riscv/zbb-rol-ror-06.c: New test
* testsuite/gcc.target/riscv/zbb-rol-ror-07.c: New test




-- 
With the best regards
Jivan Hakobyan
diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index a27fc3e34a1..0fd0cbdeb04 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -351,6 +351,42 @@
   "rolw\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
 
+(define_insn_and_split "*3_mask"
+  [(set (match_operand:X 0 "register_operand" "= r")
+(bitmanip_rotate:X
+(match_operand:X 1 "register_operand" "  r")
+(match_operator 4 "subreg_lowpart_operator"
+ [(and:X
+   (match_operand:X 2 "register_operand"  "r")
+   (match_operand 3 "" ""))])))]
+  "TARGET_ZBB || TARGET_ZBKB"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+(bitmanip_rotate:X (match_dup 1)
+   (match_dup 2)))]
+  "operands[2] = gen_lowpart (QImode, operands[2]);"
+  [(set_attr "type" "bitmanip")
+   (set_attr "mode" "")])
+
+(define_insn_and_split "*si3_sext_mask"
+  [(set (match_operand:DI 0 "register_operand" "= r")
+  (sign_extend:DI (bitmanip_rotate:SI
+(match_operand:SI 1 "register_operand" "  r")
+(match_operator 4 "subreg_lowpart_operator"
+ [(and:DI
+   (match_operand:DI 2 "register_operand"  "r")
+   (match_operand 3 "const_si_mask_operand"))]))))]
+  "TARGET_64BIT && (TARGET_ZBB || TARGET_ZBKB)"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+  (sign_extend:DI (bitmanip_rotate:SI (match_dup 1)
+   (match_dup 2))))]
+  "operands[2] = gen_lowpart (QImode, operands[2]);"
+  [(set_attr "type" "bitmanip")
+   (set_attr "mode" "DI")])
+
 ;; orc.b (or-combine) is added as an unspec for the benefit of the support
 ;; for optimized string functions (such as strcmp).
 (define_insn "orcb2"
diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 1d56324df03..8afe98e4410 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -117,7 +117,7 @@
 (define_mode_attr HALFMODE [(DF "SI") (DI "SI") (TF "DI")])
 
 ; bitmanip mode attribute
-(define_mode_attr shiftm1 [(SI "const31_operand") (DI "const63_operand")])
+(define_mode_attr shiftm1 [(SI "const_si_mask_operand") (DI "const_di_mask_operand")])
 (define_mode_attr shiftm1p [(SI "DsS") (DI "DsD")])
 
 ;; ---
@@ -174,6 +174,8 @@
 
 (define_code_iterator clz_ctz_pcnt [clz ctz popcount])
 
+(define_code_iterator bitmanip_rotate [rotate rotatert])
+
 ;; ---
 ;; Code Attributes
 ;; ---
@@ -271,7 +273,9 @@
   (umax "umax")
   (clz "clz")
   (ctz "ctz")
-  (popcount "popcount")])
+  (popcount "popcount")
+  (rotate "rotl")
+  (rotatert "rotr")])
 (define_code_attr bitmanip_insn [(smin "min")
  (smax "max")
  (umin "minu")
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index e5adf06fa25..ffcbb9a7589 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -235,13 +235,15 @@
   (and (match_code "const_int")
(match_test "SINGLE_BIT_MASK_OPERAND (~UINTVAL (op))")))
 
-(define_predicate

Re: [PATCH] libstdc++: use __bool_constant instead of integral_constant

2023-05-17 Thread Patrick Palka via Gcc-patches

On Fri, 12 May 2023, Ken Matsui via Libstdc++ wrote:

> It appears that GCC 13 has been released, but I am wondering if there
> are any issues preventing this patch from being merged yet. Can you
> provide any information on this?

Thanks for the reminder, I pushed this to trunk just now
(r14-940-g637edefc5863cf). Congrats on your first libstdc++ commit!

> 
> On Sat, Apr 8, 2023 at 2:08 PM Ken Matsui  wrote:
> >
> > I see. Thank you!
> >
> > On Sat, Apr 8, 2023 at 12:52 AM Jonathan Wakely  
> > wrote:
> > >
> > > This looks good, thanks, but we're too close to the gcc 13 release now, 
> > > and this isn't fixing any bugs. I'll push it after the release.
> > >
> > >
> > > On Thu, 23 Mar 2023, 11:07 Ken Matsui via Libstdc++, 
> > >  wrote:
> > >>
> > >> In the type_traits header, both integral_constant and 
> > >> __bool_constant
> > >> are used. This patch unifies those usages into __bool_constant.
> > >>
> > >> libstdc++-v3/ChangeLog:
> > >>
> > >> * include/std/type_traits: Use __bool_constant instead of
> > >> integral_constant.
> > >>
> > >> Signed-off-by: Ken Matsui 
> > >> ---
> > >>  libstdc++-v3/include/std/type_traits | 32 ++--
> > >>  1 file changed, 16 insertions(+), 16 deletions(-)
> > >>
> > >> diff --git a/libstdc++-v3/include/std/type_traits 
> > >> b/libstdc++-v3/include/std/type_traits
> > >> index 2bd607a8b8f..bc6982f9e64 100644
> > >> --- a/libstdc++-v3/include/std/type_traits
> > >> +++ b/libstdc++-v3/include/std/type_traits
> > >> @@ -578,19 +578,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > >>/// is_enum
> > > >>template<typename _Tp>
> > > >>  struct is_enum
> > > >> -: public integral_constant<bool, __is_enum(_Tp)>
> > >> +: public __bool_constant<__is_enum(_Tp)>
> > >>  { };
> > >>
> > >>/// is_union
> > > >>template<typename _Tp>
> > > >>  struct is_union
> > > >> -: public integral_constant<bool, __is_union(_Tp)>
> > >> +: public __bool_constant<__is_union(_Tp)>
> > >>  { };
> > >>
> > >>/// is_class
> > > >>template<typename _Tp>
> > > >>  struct is_class
> > > >> -: public integral_constant<bool, __is_class(_Tp)>
> > >> +: public __bool_constant<__is_class(_Tp)>
> > >>  { };
> > >>
> > >>/// is_function
> > >> @@ -784,7 +784,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > >>/// is_trivial
> > > >>template<typename _Tp>
> > > >>  struct is_trivial
> > > >> -: public integral_constant<bool, __is_trivial(_Tp)>
> > >> +: public __bool_constant<__is_trivial(_Tp)>
> > >>  {
> > >>
> > >> static_assert(std::__is_complete_or_unbounded(__type_identity<_Tp>{}),
> > >> "template argument must be a complete class or an unbounded 
> > >> array");
> > >> @@ -793,7 +793,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > >>/// is_trivially_copyable
> > > >>template<typename _Tp>
> > > >>  struct is_trivially_copyable
> > > >> -: public integral_constant<bool, __is_trivially_copyable(_Tp)>
> > >> +: public __bool_constant<__is_trivially_copyable(_Tp)>
> > >>  {
> > >>
> > >> static_assert(std::__is_complete_or_unbounded(__type_identity<_Tp>{}),
> > >> "template argument must be a complete class or an unbounded 
> > >> array");
> > >> @@ -802,7 +802,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > >>/// is_standard_layout
> > > >>template<typename _Tp>
> > > >>  struct is_standard_layout
> > > >> -: public integral_constant<bool, __is_standard_layout(_Tp)>
> > >> +: public __bool_constant<__is_standard_layout(_Tp)>
> > >>  {
> > >>
> > >> static_assert(std::__is_complete_or_unbounded(__type_identity<_Tp>{}),
> > >> "template argument must be a complete class or an unbounded 
> > >> array");
> > >> @@ -817,7 +817,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > >>  struct
> > >>  _GLIBCXX20_DEPRECATED_SUGGEST("is_standard_layout && is_trivial")
> > >>  is_pod
> > > >> -: public integral_constant<bool, __is_pod(_Tp)>
> > >> +: public __bool_constant<__is_pod(_Tp)>
> > >>  {
> > >>
> > >> static_assert(std::__is_complete_or_unbounded(__type_identity<_Tp>{}),
> > >> "template argument must be a complete class or an unbounded 
> > >> array");
> > >> @@ -831,7 +831,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > >>  struct
> > >>  _GLIBCXX17_DEPRECATED
> > >>  is_literal_type
> > > >> -: public integral_constant<bool, __is_literal_type(_Tp)>
> > >> +: public __bool_constant<__is_literal_type(_Tp)>
> > >>  {
> > >>
> > >> static_assert(std::__is_complete_or_unbounded(__type_identity<_Tp>{}),
> > >> "template argument must be a complete class or an unbounded 
> > >> array");
> > >> @@ -840,13 +840,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > >>/// is_empty
> > > >>template<typename _Tp>
> > > >>  struct is_empty
> > > >> -: public integral_constant<bool, __is_empty(_Tp)>
> > >> +: public __bool_constant<__is_empty(_Tp)>
> > >>  { };
> > >>
> > >>/// is_polymorphic
> > > >>template<typename _Tp>
> > > >>  struct is_polymorphic
> > > >> -: public integral_constant<bool, __is_polymorphic(_Tp)>
> > >> +: public __bool_constant<__is_polymorphic(_Tp)>
> > >>  { };
> > >>
> > >>  #if __cplusplus >= 201402L
> > >> @@ -855,14 +855,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > >>/// @since C++14
> > >>template

Re: [aarch64] Code-gen for vector initialization involving constants

2023-05-17 Thread Prathamesh Kulkarni via Gcc-patches

On Tue, 16 May 2023 at 00:29, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > Hi Richard,
> > After committing the interleave+zip1 patch for vector initialization,
> > it seems to regress the s32 case for this patch:
> >
> > int32x4_t f_s32(int32_t x)
> > {
> >   return (int32x4_t) { x, x, x, 1 };
> > }
> >
> > code-gen:
> > f_s32:
> > movi v30.2s, 0x1
> > fmov s31, w0
> > dup v0.2s, v31.s[0]
> > ins v30.s[0], v31.s[0]
> > zip1 v0.4s, v0.4s, v30.4s
> > ret
> >
> > instead of expected code-gen:
> > f_s32:
> > movi v31.2s, 0x1
> > dup v0.4s, w0
> > ins v0.s[3], v31.s[0]
> > ret
> >
> > Cost for fallback sequence: 16
> > Cost for interleave and zip sequence: 12
> >
> > For the above case, the cost for interleave+zip1 sequence is computed as:
> > halves[0]:
> > (set (reg:V2SI 96)
> > (vec_duplicate:V2SI (reg/v:SI 93 [ x ])))
> > cost = 8
> >
> > halves[1]:
> > (set (reg:V2SI 97)
> > (const_vector:V2SI [
> > (const_int 1 [0x1]) repeated x2
> > ]))
> > (set (reg:V2SI 97)
> > (vec_merge:V2SI (vec_duplicate:V2SI (reg/v:SI 93 [ x ]))
> > (reg:V2SI 97)
> > (const_int 1 [0x1])))
> > cost = 8
> >
> > followed by:
> > (set (reg:V4SI 95)
> > (unspec:V4SI [
> > (subreg:V4SI (reg:V2SI 96) 0)
> > (subreg:V4SI (reg:V2SI 97) 0)
> > ] UNSPEC_ZIP1))
> > cost = 4
> >
> > So the total cost becomes
> > max(costs[0], costs[1]) + zip1_insn_cost
> > = max(8, 8) + 4
> > = 12
> >
> > While the fallback rtl sequence is:
> > (set (reg:V4SI 95)
> > (vec_duplicate:V4SI (reg/v:SI 93 [ x ])))
> > cost = 8
> > (set (reg:SI 98)
> > (const_int 1 [0x1]))
> > cost = 4
> > (set (reg:V4SI 95)
> > (vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 98))
> > (reg:V4SI 95)
> > (const_int 8 [0x8])))
> > cost = 4
> >
> > So total cost = 8 + 4 + 4 = 16, and we choose the interleave+zip1 sequence.
> >
> > I think the issue is probably that for the interleave+zip1 sequence we take
> > max(costs[0], costs[1]) to reflect that both halves are interleaved,
> > but for the fallback seq we use seq_cost, which assumes serial execution
> > of insns in the sequence.
> > For above fallback sequence,
> > set (reg:V4SI 95)
> > (vec_duplicate:V4SI (reg/v:SI 93 [ x ])))
> > and
> > (set (reg:SI 98)
> > (const_int 1 [0x1]))
> > could be executed in parallel, which would make its cost max(8, 4) + 4 = 12.
>
> Agreed.
>
> A good-enough substitute for this might be to ignore scalar moves
> (for both alternatives) when costing for speed.
Thanks for the suggestions. Just wondering for aarch64, if there's an easy
way we can check if insn is a scalar move, similar to riscv's scalar_move_insn_p
that checks if get_attr_type(insn) is TYPE_VIMOVXV or TYPE_VFMOVFV ?
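
The cost arithmetic discussed above can be sketched in C as follows.
This is only an illustration of the two costing rules, using the insn
costs quoted in this thread; the function names are hypothetical and do
not correspond to functions in the aarch64 backend.

```c
#include <assert.h>

static int max_int(int a, int b)
{
    return a > b ? a : b;
}

/* interleave+zip1 rule: the two halves are treated as independent, so
   only the more expensive half counts, plus the final zip1.  */
int interleave_zip1_cost(int half0, int half1, int zip1)
{
    return max_int(half0, half1) + zip1;
}

/* seq_cost-style rule: the insns are assumed to execute serially, so
   their costs are summed.  */
int serial_cost(const int *costs, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += costs[i];
    return total;
}

/* Fallback sequence when the vector dup and the scalar move are assumed
   to run in parallel before the final insert.  */
int fallback_parallel_cost(int dup, int scalar_move, int insert)
{
    return max_int(dup, scalar_move) + insert;
}
```

With the numbers from the f_s32 example, the interleave+zip1 rule gives
max(8, 8) + 4 = 12 and the serial rule gives 8 + 4 + 4 = 16 for the
fallback, while treating the dup and the scalar move as parallel brings
the fallback down to 12 as well.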
>
> > I was wondering if we should we make cost for interleave+zip1 sequence
> > more conservative
> > by not taking max, but summing up costs[0] + costs[1] even for speed ?
> > For this case,
> > that would be 8 + 8 + 4 = 20.
> >
> > It generates the fallback sequence for other cases (s8, s16, s64) from
> > the test-case.
>
> What does it do for the tests in the interleave+zip1 patch?  If it doesn't
> make a difference there then it sounds like we don't have enough tests. :)
Oh right, the tests in interleave+zip1 patch only check for s16 case,
sorry about that :/
Looking briefly at the code generated for s8, s32 and s64 case,
(a) s8, and s16 seem to use same sequence for all cases.
(b) s64 seems to use fallback sequence.
(c) For vec-init-21.c, s8 and s16 cases prefer fallback sequence
because costs are tied,
while s32 case prefers interleave+zip1:

int32x4_t f_s32(int32_t x, int32_t y)
{
  return (int32x4_t) { x, y, 1, 2 };
}

Code-gen with interleave+zip1 sequence:
f_s32:
movi v31.2s, 0x1
movi v0.2s, 0x2
ins v31.s[0], w0
ins v0.s[0], w1
zip1 v0.4s, v31.4s, v0.4s
ret

Code-gen with fallback sequence:
f_s32:
adrp x2, .LC0
ldr q0, [x2, #:lo12:.LC0]
ins v0.s[0], w0
ins v0.s[1], w1
ret

Fallback sequence cost = 20
interleave+zip1 sequence cost = 12
I assume interleave+zip1 sequence is better in this case (chosen currently) ?

I will send a follow-up patch adding cases for s8, s16 and s64 soon.
>
> Summing is only conservative if the fallback sequence is somehow "safer".
> But I don't think it is.   Building an N-element vector from N scalars
> can be done using N instructions in the fallback case and N+1 instructions
> in the interleave+zip1 case.  But the interleave+zip1 case is still
> better (speedwise) for N==16.
Ack, thanks.
Should we also prefer interleave+zip1 when the costs are tied ?
For example, for the following case:
int32x4_t f_s32(int32_t x)
{
  return (int32x4_t) { x, 1, x, 1 };
}

costs for both fallback and interleave+zip1 sequence = 12, and we

RE: [PATCH] RISC-V: Add mode switching target hook to insert rounding mode config for fixed-point instructions

2023-05-17 Thread Li, Pan2 via Gcc-patches

Committed, thanks kito.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Kito Cheng via Gcc-patches
Sent: Wednesday, May 17, 2023 6:06 PM
To: juzhe.zh...@rivai.ai
Cc: gcc-patches@gcc.gnu.org; kito.ch...@gmail.com; pal...@dabbelt.com; 
pal...@rivosinc.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH] RISC-V: Add mode switching target hook to insert rounding 
mode config for fixed-point instructions

LGTM, it's really awesome. I know it's kind of blocked by the enum stuff, so 
feel free to commit this once it is unblocked :)

On Wed, May 17, 2023 at 5:58 PM  wrote:
>
> From: Juzhe-Zhong 
>
> Hi, this patch supports the upcoming fixed-point intrinsics:
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/222
>
> It inserts the fixed-point rounding mode configuration via the mode
> switching target hooks.
>
> The mode switching target hooks are implemented on top of LCM (Lazy Code
> Motion), so both performance and correctness can be trusted.
>
> Here is the example:
>
> void f (void * in, void *out, int32_t x, int n, int m) {
>   for (int i = 0; i < n; i++) {
> vint32m1_t v = __riscv_vle32_v_i32m1 (in + i, 4);
> vint32m1_t v2 = __riscv_vle32_v_i32m1_tu (v, in + 100 + i, 4);
> vint32m1_t v3 = __riscv_vaadd_vx_i32m1 (v2, 0, VXRM_RDN, 4);
> v3 = __riscv_vaadd_vx_i32m1 (v3, 3, VXRM_RDN, 4);
> __riscv_vse32_v_i32m1 (out + 100 + i, v3, 4);
>   }
>
>   for (int i = 0; i < n; i++) {
> vint32m1_t v = __riscv_vle32_v_i32m1 (in + i + 1000, 4);
> vint32m1_t v2 = __riscv_vle32_v_i32m1_tu (v, in + 100 + i + 1000, 4);
> vint32m1_t v3 = __riscv_vaadd_vx_i32m1 (v2, 0, VXRM_RDN, 4);
> v3 = __riscv_vaadd_vx_i32m1 (v3, 3, VXRM_RDN, 4);
> __riscv_vse32_v_i32m1 (out + 100 + i + 1000, v3, 4);
>   }
> }
>
> ASM:
>
> ...
> csrwi   vxrm,2
> vsetivli        zero,4,e32,m1,tu,ma
> ...
> Loop 1
> ...
> Loop 2
>
> Mode switching can globally recognize that both Loop 1 and Loop 2 use the
> RDN rounding mode and hoist a single "csrwi vxrm,2" so that it dominates
> both Loop 1 and Loop 2.
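
To make the saving concrete, here is a toy C model.  It is not LCM and not
the code in this patch: it only looks at a straight-line sequence of
regions, each of which either needs one rounding mode or does not care,
and counts how many mode-set instructions two schemes would emit.
MODE_NONE and both function names are assumptions made for this sketch.

```c
#include <assert.h>

#define MODE_NONE (-1)   /* region does not care about the rounding mode */

/* Naive scheme: every region that needs a mode sets it itself.  */
int naive_mode_sets(const int *needed, int n)
{
    int sets = 0;
    for (int i = 0; i < n; i++)
        if (needed[i] != MODE_NONE)
            sets++;
    return sets;
}

/* Tracking scheme: remember the current mode and emit a set only when
   the next region needs a different one.  */
int tracked_mode_sets(const int *needed, int n)
{
    int current = MODE_NONE;   /* mode is unknown on entry */
    int sets = 0;
    for (int i = 0; i < n; i++) {
        if (needed[i] == MODE_NONE || needed[i] == current)
            continue;
        current = needed[i];
        sets++;
    }
    return sets;
}
```

For two consecutive loops that both need mode 2 (RDN), the naive scheme
emits two sets and the tracking scheme emits one, which matches the single
"csrwi vxrm,2" in the assembly above.  The real pass works on the control
flow graph rather than on a flat list.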
>
> I have also added sanity tests for correctness in this patch.
>
> Ok for trunk ?
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-opts.h (enum riscv_entity): New enum.
> * config/riscv/riscv.cc (riscv_emit_mode_set): New function.
> (riscv_mode_needed): Ditto.
> (riscv_mode_after): Ditto.
> (riscv_mode_entry): Ditto.
> (riscv_mode_exit): Ditto.
> (riscv_mode_priority): Ditto.
> (TARGET_MODE_EMIT): New target hook.
> (TARGET_MODE_NEEDED): Ditto.
> (TARGET_MODE_AFTER): Ditto.
> (TARGET_MODE_ENTRY): Ditto.
> (TARGET_MODE_EXIT): Ditto.
> (TARGET_MODE_PRIORITY): Ditto.
> * config/riscv/riscv.h (OPTIMIZE_MODE_SWITCHING): Ditto.
> (NUM_MODES_FOR_MODE_SWITCHING): Ditto.
> * config/riscv/riscv.md: Add csrwvxrm.
> * config/riscv/vector.md (rnu,rne,rdn,rod,none): New attribute.
> (vxrmsi): New pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/vxrm-10.c: New test.
> * gcc.target/riscv/rvv/base/vxrm-6.c: New test.
> * gcc.target/riscv/rvv/base/vxrm-7.c: New test.
> * gcc.target/riscv/rvv/base/vxrm-8.c: New test.
> * gcc.target/riscv/rvv/base/vxrm-9.c: New test.
>
> ---
>  gcc/config/riscv/riscv-opts.h |   8 ++
>  gcc/config/riscv/riscv.cc | 104 ++
>  gcc/config/riscv/riscv.h  |   6 +-
>  gcc/config/riscv/riscv.md |   3 +-
>  gcc/config/riscv/vector.md|  29 +
>  .../gcc.target/riscv/rvv/base/vxrm-10.c   |  26 +
>  .../gcc.target/riscv/rvv/base/vxrm-6.c|  15 +++
>  .../gcc.target/riscv/rvv/base/vxrm-7.c|  16 +++
>  .../gcc.target/riscv/rvv/base/vxrm-8.c|  18 +++
>  .../gcc.target/riscv/rvv/base/vxrm-9.c|  26 +
>  10 files changed, 249 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-10.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-6.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-7.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-8.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-9.c
>
> diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
> index 1b2e6de5e1b..2a16402265a 100644
> --- a/gcc/config/riscv/riscv-opts.h
> +++ b/gcc/config/riscv/riscv-opts.h
> @@ -91,6 +91,14 @@ enum riscv_multilib_select_kind {
>select_by_abi,
>  };
>
> +/* ENTITIES in mode switching.  */
> +enum riscv_entity
> +{
> +  RISCV_VXRM = 0,
> +  RISCV_FRM,
> +  MAX_RISCV_ENTITIES
> +};
> +
>  #define MASK_ZICSR    (1 << 0)
>  #define MASK_ZIFENCEI (1 << 1)
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index de5b87b1a87..0d1b83f4315 100644
> --- a/gcc/config/riscv/riscv.cc
> +++

RE: [PATCH] RISC-V: Introduce rounding mode operand into fixed-point intrinsics

2023-05-17 Thread Li, Pan2 via Gcc-patches

Committed, thanks kito.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Kito Cheng via Gcc-patches
Sent: Wednesday, May 17, 2023 11:05 PM
To: 钟居哲 
Cc: Jeff Law ; gcc-patches ; 
kito.cheng ; palmer ; palmer 
; rdapp.gcc 
Subject: Re: [PATCH] RISC-V: Introduce rounding mode operand into fixed-point 
intrinsics

LGTM, thanks!

钟居哲 於 2023年5月17日 週三，23:02寫道：

> Ping this patch which is the prerequisite of this patch:
>
> https://patchwork.sourceware.org/project/gcc/patch/20230517095818.1285188-1-juzhe.zh...@rivai.ai/
>
> which has been approved by kito.
>
> Is this patch also ok for trunk ?
>
> Thanks.
> --
> juzhe.zh...@rivai.ai
>
>
> *From:* juzhe.zhong 
> *Date:* 2023-05-17 13:25
> *To:* gcc-patches 
> *CC:* kito.cheng ; kito.cheng
> ; palmer ; palmer
> ; jeffreyalaw ; rdapp.gcc
> ; Juzhe-Zhong 
> *Subject:* [PATCH] RISC-V: Introduce rounding mode operand into
> fixed-point intrinsics
> From: Juzhe-Zhong 
>
> According to the upcoming fixed-point API:
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/222
>
> Introduce vxrm argument:
> - vint32m1_t __riscv_vsadd_vv_i32m1 (vint32m1_t op1, vint32m1_t op2,
> size_t vl);
> + vint32m1_t __riscv_vsadd_vv_i32m1 (vint32m1_t op1, vint32m1_t op2,
> size_t vxrm, size_t vl);
>
> This patch doesn't insert the csrw vxrm configuration instruction yet.
> Automatic insertion of the csrw vxrm instruction will follow in the next patch.
>
> This patch does the following:
> 1. Extends the intrinsics with the vxrm argument only.
> 2. Checks that the vxrm argument is a valid immediate and reports an error
> if it is not.
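
A minimal sketch of such an immediate check in C is given below.  The
enumerator values follow the vxrm encoding in the RISC-V vector
specification (0 = rnu, 1 = rne, 2 = rdn, 3 = rod), which agrees with the
"csrwi vxrm,2" used for RDN elsewhere in this thread.  The enum is
redefined locally and the helper names vxrm_is_valid and vxrm_name are
hypothetical; this is not the checker added to riscv-vector-builtins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local copy of the rounding modes, for illustration only.  */
enum vxrm_mode {
    VXRM_RNU = 0,   /* round-to-nearest-up */
    VXRM_RNE = 1,   /* round-to-nearest-even */
    VXRM_RDN = 2,   /* round-down (truncate) */
    VXRM_ROD = 3    /* round-to-odd */
};

/* Returns 1 if IMM is an acceptable vxrm immediate, 0 otherwise.  */
int vxrm_is_valid(long imm)
{
    return imm >= VXRM_RNU && imm <= VXRM_ROD;
}

/* Returns the mnemonic for a valid immediate, or NULL so that the
   caller can report an error for an invalid one.  */
const char *vxrm_name(long imm)
{
    static const char *const names[] = { "rnu", "rne", "rdn", "rod" };
    return vxrm_is_valid(imm) ? names[imm] : NULL;
}
```

A caller would reject anything outside 0..3 at compile time, since the
argument has to be an immediate rather than a run-time value.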
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-bases.cc: Introduce rounding
> mode.
> * config/riscv/riscv-vector-builtins-shapes.cc (struct alu_def):
> Ditto.
> (struct narrow_alu_def): Ditto.
> * config/riscv/riscv-vector-builtins.cc
> (function_builder::apply_predication): Ditto.
> (function_expander::use_exact_insn): Ditto.
> * config/riscv/riscv-vector-builtins.h
> (function_checker::arg_num): New function.
> (function_base::has_rounding_mode_operand_p): New function.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/riscv/rvv/base/bug-11.C: Adapt testcase.
> * g++.target/riscv/rvv/base/bug-12.C: Ditto.
> * g++.target/riscv/rvv/base/bug-14.C: Ditto.
> * g++.target/riscv/rvv/base/bug-15.C: Ditto.
> * g++.target/riscv/rvv/base/bug-16.C: Ditto.
> * g++.target/riscv/rvv/base/bug-17.C: Ditto.
> * g++.target/riscv/rvv/base/bug-18.C: Ditto.
> * g++.target/riscv/rvv/base/bug-19.C: Ditto.
> * g++.target/riscv/rvv/base/bug-20.C: Ditto.
> * g++.target/riscv/rvv/base/bug-21.C: Ditto.
> * g++.target/riscv/rvv/base/bug-22.C: Ditto.
> * g++.target/riscv/rvv/base/bug-23.C: Ditto.
> * g++.target/riscv/rvv/base/bug-3.C: Ditto.
> * g++.target/riscv/rvv/base/bug-5.C: Ditto.
> * g++.target/riscv/rvv/base/bug-6.C: Ditto.
> * g++.target/riscv/rvv/base/bug-8.C: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-100.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-101.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-102.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-103.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-104.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-105.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-106.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-107.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-108.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-109.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-110.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-111.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-112.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-113.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-114.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-115.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-116.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-117.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-118.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-119.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-122.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-97.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-98.c: Ditto.
> * gcc.target/riscv/rvv/base/merge_constraint-1.c: Ditto.
> * gcc.target/riscv/rvv/base/narrow_constraint-6.c: Ditto.
> * gcc.target/riscv/rvv/base/narrow_constraint-7.c: Ditto.
> * gcc.target/riscv/rvv/base/narrow_constraint-8.c: Ditto.
> *

Re: [PATCH] RISC-V: Introduce rounding mode operand into fixed-point intrinsics

2023-05-17 Thread Kito Cheng via Gcc-patches

LGTM, thanks!

钟居哲 於 2023年5月17日 週三，23:02寫道：

> Ping this patch which is the prerequisite of this patch:
>
> https://patchwork.sourceware.org/project/gcc/patch/20230517095818.1285188-1-juzhe.zh...@rivai.ai/
>
> which has been approved by kito.
>
> Is this patch also ok for trunk ?
>
> Thanks.
> --
> juzhe.zh...@rivai.ai
>
>
> *From:* juzhe.zhong 
> *Date:* 2023-05-17 13:25
> *To:* gcc-patches 
> *CC:* kito.cheng ; kito.cheng
> ; palmer ; palmer
> ; jeffreyalaw ; rdapp.gcc
> ; Juzhe-Zhong 
> *Subject:* [PATCH] RISC-V: Introduce rounding mode operand into
> fixed-point intrinsics
> From: Juzhe-Zhong 
>
> According to the upcoming fixed-point API:
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/222
>
> Introduce vxrm argument:
> - vint32m1_t __riscv_vsadd_vv_i32m1 (vint32m1_t op1, vint32m1_t op2,
> size_t vl);
> + vint32m1_t __riscv_vsadd_vv_i32m1 (vint32m1_t op1, vint32m1_t op2,
> size_t vxrm, size_t vl);
>
> This patch doesn't insert the csrw vxrm configuration instruction yet.
> Automatic insertion of the csrw vxrm instruction will follow in the next patch.
>
> This patch does the following:
> 1. Extends the intrinsics with the vxrm argument only.
> 2. Checks that the vxrm argument is a valid immediate and reports an error
> if it is not.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-bases.cc: Introduce rounding
> mode.
> * config/riscv/riscv-vector-builtins-shapes.cc (struct alu_def):
> Ditto.
> (struct narrow_alu_def): Ditto.
> * config/riscv/riscv-vector-builtins.cc
> (function_builder::apply_predication): Ditto.
> (function_expander::use_exact_insn): Ditto.
> * config/riscv/riscv-vector-builtins.h
> (function_checker::arg_num): New function.
> (function_base::has_rounding_mode_operand_p): New function.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/riscv/rvv/base/bug-11.C: Adapt testcase.
> * g++.target/riscv/rvv/base/bug-12.C: Ditto.
> * g++.target/riscv/rvv/base/bug-14.C: Ditto.
> * g++.target/riscv/rvv/base/bug-15.C: Ditto.
> * g++.target/riscv/rvv/base/bug-16.C: Ditto.
> * g++.target/riscv/rvv/base/bug-17.C: Ditto.
> * g++.target/riscv/rvv/base/bug-18.C: Ditto.
> * g++.target/riscv/rvv/base/bug-19.C: Ditto.
> * g++.target/riscv/rvv/base/bug-20.C: Ditto.
> * g++.target/riscv/rvv/base/bug-21.C: Ditto.
> * g++.target/riscv/rvv/base/bug-22.C: Ditto.
> * g++.target/riscv/rvv/base/bug-23.C: Ditto.
> * g++.target/riscv/rvv/base/bug-3.C: Ditto.
> * g++.target/riscv/rvv/base/bug-5.C: Ditto.
> * g++.target/riscv/rvv/base/bug-6.C: Ditto.
> * g++.target/riscv/rvv/base/bug-8.C: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-100.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-101.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-102.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-103.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-104.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-105.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-106.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-107.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-108.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-109.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-110.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-111.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-112.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-113.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-114.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-115.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-116.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-117.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-118.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-119.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-122.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-97.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-98.c: Ditto.
> * gcc.target/riscv/rvv/base/merge_constraint-1.c: Ditto.
> * gcc.target/riscv/rvv/base/narrow_constraint-6.c: Ditto.
> * gcc.target/riscv/rvv/base/narrow_constraint-7.c: Ditto.
> * gcc.target/riscv/rvv/base/narrow_constraint-8.c: Ditto.
> * gcc.target/riscv/rvv/base/narrow_constraint-9.c: Ditto.
> * gcc.target/riscv/rvv/base/vxrm-2.c: New test.
> * gcc.target/riscv/rvv/base/vxrm-3.c: New test.
> * gcc.target/riscv/rvv/base/vxrm-4.c: New test.
> * gcc.target/riscv/rvv/base/vxrm-5.c: New test.
>
> ---
>

Re: Re: [PATCH] RISC-V: Add mode switching target hook to insert rounding mode config for fixed-point instructions

2023-05-17 Thread 钟居哲

Hi Kito, the intrinsic doc has updated the fixed-point enum.
This patch (which you have already LGTM'd) should be merged after this patch:

https://patchwork.sourceware.org/project/gcc/patch/20230517052521.405836-1-juzhe.zh...@rivai.ai/
 
Could you respond to that patch?

Thanks.


juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-05-17 18:05
To: juzhe.zhong
CC: gcc-patches; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Add mode switching target hook to insert rounding 
mode config for fixed-point instructions
LGTM, it's really awesome. I know it's kind of blocked by the enum
stuff, so feel free to commit this once it is unblocked :)
 
On Wed, May 17, 2023 at 5:58 PM  wrote:
>
> From: Juzhe-Zhong 
>
> Hi, this patch supports the upcoming fixed-point intrinsics:
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/222
>
> It inserts the fixed-point rounding mode configuration via the mode
> switching target hooks.
>
> The mode switching target hooks are implemented on top of LCM (Lazy Code
> Motion), so both performance and correctness can be trusted.
>
> Here is the example:
>
> void f (void * in, void *out, int32_t x, int n, int m)
> {
>   for (int i = 0; i < n; i++) {
> vint32m1_t v = __riscv_vle32_v_i32m1 (in + i, 4);
> vint32m1_t v2 = __riscv_vle32_v_i32m1_tu (v, in + 100 + i, 4);
> vint32m1_t v3 = __riscv_vaadd_vx_i32m1 (v2, 0, VXRM_RDN, 4);
> v3 = __riscv_vaadd_vx_i32m1 (v3, 3, VXRM_RDN, 4);
> __riscv_vse32_v_i32m1 (out + 100 + i, v3, 4);
>   }
>
>   for (int i = 0; i < n; i++) {
> vint32m1_t v = __riscv_vle32_v_i32m1 (in + i + 1000, 4);
> vint32m1_t v2 = __riscv_vle32_v_i32m1_tu (v, in + 100 + i + 1000, 4);
> vint32m1_t v3 = __riscv_vaadd_vx_i32m1 (v2, 0, VXRM_RDN, 4);
> v3 = __riscv_vaadd_vx_i32m1 (v3, 3, VXRM_RDN, 4);
> __riscv_vse32_v_i32m1 (out + 100 + i + 1000, v3, 4);
>   }
> }
>
> ASM:
>
> ...
> csrwi   vxrm,2
> vsetivli        zero,4,e32,m1,tu,ma
> ...
> Loop 1
> ...
> Loop 2
>
> Mode switching can recognize globally that both Loop 1 and Loop 2 use the RDN
> rounding mode, and hoists a single "csrwi vxrm,2" so that it dominates both
> Loop 1 and Loop 2.
>
> Besides, I have added sanity tests that check correctness in this patch too.
>
> Ok for trunk ?
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-opts.h (enum riscv_entity): New enum.
> * config/riscv/riscv.cc (riscv_emit_mode_set): New function.
> (riscv_mode_needed): Ditto.
> (riscv_mode_after): Ditto.
> (riscv_mode_entry): Ditto.
> (riscv_mode_exit): Ditto.
> (riscv_mode_priority): Ditto.
> (TARGET_MODE_EMIT): New target hook.
> (TARGET_MODE_NEEDED): Ditto.
> (TARGET_MODE_AFTER): Ditto.
> (TARGET_MODE_ENTRY): Ditto.
> (TARGET_MODE_EXIT): Ditto.
> (TARGET_MODE_PRIORITY): Ditto.
> * config/riscv/riscv.h (OPTIMIZE_MODE_SWITCHING): Ditto.
> (NUM_MODES_FOR_MODE_SWITCHING): Ditto.
> * config/riscv/riscv.md: Add csrwvxrm.
> * config/riscv/vector.md (rnu,rne,rdn,rod,none): New attribute.
> (vxrmsi): New pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/vxrm-10.c: New test.
> * gcc.target/riscv/rvv/base/vxrm-6.c: New test.
> * gcc.target/riscv/rvv/base/vxrm-7.c: New test.
> * gcc.target/riscv/rvv/base/vxrm-8.c: New test.
> * gcc.target/riscv/rvv/base/vxrm-9.c: New test.
>
> ---
>  gcc/config/riscv/riscv-opts.h |   8 ++
>  gcc/config/riscv/riscv.cc | 104 ++
>  gcc/config/riscv/riscv.h  |   6 +-
>  gcc/config/riscv/riscv.md |   3 +-
>  gcc/config/riscv/vector.md|  29 +
>  .../gcc.target/riscv/rvv/base/vxrm-10.c   |  26 +
>  .../gcc.target/riscv/rvv/base/vxrm-6.c|  15 +++
>  .../gcc.target/riscv/rvv/base/vxrm-7.c|  16 +++
>  .../gcc.target/riscv/rvv/base/vxrm-8.c|  18 +++
>  .../gcc.target/riscv/rvv/base/vxrm-9.c|  26 +
>  10 files changed, 249 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-10.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-6.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-7.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-8.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-9.c
>
> diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
> index 1b2e6de5e1b..2a16402265a 100644
> --- a/gcc/config/riscv/riscv-opts.h
> +++ b/gcc/config/riscv/riscv-opts.h
> @@ -91,6 +91,14 @@ enum riscv_multilib_select_kind {
>    select_by_abi,
>  };
>
> +/* ENTITIES in mode switching.  */
> +enum riscv_entity
> +{
> +  RISCV_VXRM = 0,
> +  RISCV_FRM,
> +  MAX_RISCV_ENTITIES
> +};
> +
>  #define MASK_ZICSR    (1 << 0)
>  #define MASK_ZIFENCEI (1 << 1)
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index

RE: [PATCH] RISC-V: Add rounding mode enum for fixed-point intrinsics

2023-05-17 Thread Li, Pan2 via Gcc-patches

Committed as the below doc PR updated, thanks kito.

https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/222

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Kito Cheng via Gcc-patches
Sent: Wednesday, May 17, 2023 11:01 AM
To: juzhe.zh...@rivai.ai
Cc: gcc-patches@gcc.gnu.org; kito.ch...@gmail.com; pal...@dabbelt.com; 
pal...@rivosinc.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH] RISC-V: Add rounding mode enum for fixed-point intrinsics

I would like to defer this until the PR has been updated.

On Wed, May 17, 2023 at 9:52 AM  wrote:
>
> From: Juzhe-Zhong 
>
> Hi, since fixed-point intrinsics that model the rounding mode are coming:
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/222
>
> I am first exposing the vxrm rounding mode enum to users, ahead of the
> intrinsic API itself.
>
> This patch is simple and obvious.
>
> Ok for trunk ?
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins.cc (register_vxrm): New function.
> (DEF_RVV_VXRM_ENUM): New macro.
> (handle_pragma_vector): Add vxrm enum register.
> * config/riscv/riscv-vector-builtins.def (DEF_RVV_VXRM_ENUM): New macro.
> (RNU): Ditto.
> (RNE): Ditto.
> (RDN): Ditto.
> (ROD): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/vxrm-1.c: New test.
>
> ---
>  gcc/config/riscv/riscv-vector-builtins.cc | 16 ++
>  gcc/config/riscv/riscv-vector-builtins.def| 11 +++
>  .../gcc.target/riscv/rvv/base/vxrm-1.c| 29 +++
>  3 files changed, 56 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-1.c
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins.cc b/gcc/config/riscv/riscv-vector-builtins.cc
> index b7458aaace6..bcabf1ea1a6 100644
> --- a/gcc/config/riscv/riscv-vector-builtins.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins.cc
> @@ -3740,6 +3740,19 @@ verify_type_context (location_t loc, type_context_kind context, const_tree type,
>    gcc_unreachable ();
>  }
>
> +/* Register the vxrm enum.  */
> +static void
> +register_vxrm ()
> +{
> +  auto_vec values;
> +#define DEF_RVV_VXRM_ENUM(NAME, VALUE)                                          \
> +  values.quick_push (string_int_pair ("VXRM_" #NAME, VALUE));
> +#include "riscv-vector-builtins.def"
> +#undef DEF_RVV_VXRM_ENUM
> +
> +  lang_hooks.types.simulate_enum_decl (input_location, "RVV_VXRM", &values);
> +}
> +
>  /* Implement #pragma riscv intrinsic vector.  */
>  void
>  handle_pragma_vector ()
> @@ -3755,6 +3768,9 @@ handle_pragma_vector ()
>    for (unsigned int type_i = 0; type_i < NUM_VECTOR_TYPES; ++type_i)
>      register_vector_type ((enum vector_type_index) type_i);
>
> +  /* Define the enums.  */
> +  register_vxrm ();
> +
>/* Define the functions.  */
>function_table = new hash_table (1023);
>function_builder builder;
> diff --git a/gcc/config/riscv/riscv-vector-builtins.def b/gcc/config/riscv/riscv-vector-builtins.def
> index 0a387fd1617..2a1a9dbc903 100644
> --- a/gcc/config/riscv/riscv-vector-builtins.def
> +++ b/gcc/config/riscv/riscv-vector-builtins.def
> @@ -83,6 +83,11 @@ along with GCC; see the file COPYING3.  If not see
>X64_VLMUL_EXT, TUPLE_SUBPART)
>  #endif
>
> +/* Define RVV_VXRM rounding mode enum for fixed-point intrinsics.  */
> +#ifndef DEF_RVV_VXRM_ENUM
> +#define DEF_RVV_VXRM_ENUM(NAME, VALUE)
> +#endif
> +
>  /* SEW/LMUL = 64:
> Only enable when TARGET_MIN_VLEN > 32.
> Machine mode = VNx1BImode when TARGET_MIN_VLEN < 128.
> @@ -643,6 +648,11 @@ DEF_RVV_BASE_TYPE (vlmul_ext_x64, get_vector_type (type_idx))
>  DEF_RVV_BASE_TYPE (size_ptr, build_pointer_type (size_type_node))
>  DEF_RVV_BASE_TYPE (tuple_subpart, get_tuple_subpart_type (type_idx))
>
> +DEF_RVV_VXRM_ENUM (RNU, VXRM_RNU)
> +DEF_RVV_VXRM_ENUM (RNE, VXRM_RNE)
> +DEF_RVV_VXRM_ENUM (RDN, VXRM_RDN)
> +DEF_RVV_VXRM_ENUM (ROD, VXRM_ROD)
> +
>  #include "riscv-vector-type-indexer.gen.def"
>
>  #undef DEF_RVV_PRED_TYPE
> @@ -651,3 +661,4 @@ DEF_RVV_BASE_TYPE (tuple_subpart, get_tuple_subpart_type (type_idx))
>  #undef DEF_RVV_TUPLE_TYPE
>  #undef DEF_RVV_BASE_TYPE
>  #undef DEF_RVV_TYPE_INDEX
> +#undef DEF_RVV_VXRM_ENUM
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-1.c
> new file mode 100644
> index 000..0d364787ad0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-1.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +size_t f0 ()
> +{
> +  return VXRM_RNU;
> +}
> +
> +size_t f1 ()
> +{
> +  return VXRM_RNE;
> +}
> +
> +size_t f2 ()
> +{
> +  return VXRM_RDN;
> +}
> +
> +size_t f3 ()
> +{
> +  return VXRM_ROD;
> +}
> +
> +/* { dg-final { scan-assembler-times {li\s+[a-x0-9]+,\s*0} 1} } */
> +/* { dg-final { scan-assembler-times {li\s+[a-x0-9]+,\s*1} 1} } */
> +/* { dg-final { scan-assembler-times

Re: [PATCH] Convert ipcp_vr_lattice to type agnostic framework.

2023-05-17 Thread Aldy Hernandez via Gcc-patches





On 5/17/23 16:30, Aldy Hernandez wrote:

This converts the lattice to store ranges in Value_Range instead of
value_range (*) to make it type agnostic, and adjust all users
accordingly.

I think it is a good example on converting from static ranges to more
general, type agnostic ones.

I've been careful to make sure Value_Range never ends up on GC, since
it contains an int_range_max and can expand on-demand onto the heap.
Longer term storage for ranges should be done with vrange_storage, as
per the previous patch ("Provide an API for ipa_vr").

(*) I do know the Value_Range naming versus value_range is quite
annoying, but it was a judgement call last release for the eventual
migration to having "value_range" be a type agnostic range object.  We
will ultimately rename Value_Range to value_range.


I forgot to mention.  This doesn't make IPA be type agnostic per se, 
just the range usage throughout.  The IPA code is still guarded by stuff 
like:


  if (!param_type
      || (!INTEGRAL_TYPE_P (param_type)
          && !POINTER_TYPE_P (param_type)))
    return dest_lat->set_to_bottom (param_type);

It is up to the maintainers to adjust their passes, as I'm liable to 
break everything in the process ;-).


The above should probably become:

   if (!param_type || !Value_Range::supports_type_p (param_type))
...

This is the canonical way of querying whether a type is supported by 
Value_Range, the ranger temporary that can handle each supported type, 
and thus the ranger.  This is documented here:


// To query what types ranger and the entire ecosystem can support,
// use Value_Range::supports_type_p(tree type).  This is a static
// method available independently of any vrange object.
//
// To query what a given vrange variant can support, use:
//   irange::supports_p ()
//   frange::supports_p ()
//   etc
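
To make the difference between the two guards concrete, here is a minimal, self-contained sketch of the dispatch pattern described above. It does not use GCC's real classes: type_kind, toy_irange, toy_frange, toy_value_range and can_track are hypothetical stand-ins, and the set of supported kinds is only an assumption for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for the kinds of types a pass may encounter.
enum class type_kind { integer, pointer, floating, record };

// Each toy range variant reports which kinds it can represent,
// mirroring the per-variant supports_p () queries mentioned above.
struct toy_irange
{
  static bool supports_p (type_kind k)
  { return k == type_kind::integer || k == type_kind::pointer; }
};

struct toy_frange
{
  static bool supports_p (type_kind k)
  { return k == type_kind::floating; }
};

// The type-agnostic wrapper answers "can any variant handle this?"
// through a static method, so no range object is needed to ask.
struct toy_value_range
{
  static bool supports_type_p (type_kind k)
  { return toy_irange::supports_p (k) || toy_frange::supports_p (k); }
};

// Guard in the style suggested above: give up only when the type is
// missing or when no range variant supports it.
bool can_track (const type_kind *param_type)
{
  return param_type && toy_value_range::supports_type_p (*param_type);
}
```

With a guard written this way, a caller no longer hard-codes "integral or pointer"; newly supported kinds are picked up as soon as the wrapper learns about them.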

However, with the changes I have posted so far, ranges throughout have a 
much finer granularity and are no longer limited to the 2-sub-ranges in 
a value_range.  If you look at IPA dumps now, you'll see the ranges are 
much more refined and are streamed for LTO accordingly.  This is an 
improvement in and of itself.


Aldy

[PATCH] Convert ipcp_vr_lattice to type agnostic framework.

2023-05-17 Thread Aldy Hernandez via Gcc-patches

This converts the lattice to store ranges in Value_Range instead of
value_range (*) to make it type agnostic, and adjust all users
accordingly.

I think it is a good example on converting from static ranges to more
general, type agnostic ones.

I've been careful to make sure Value_Range never ends up on GC, since
it contains an int_range_max and can expand on-demand onto the heap.
Longer term storage for ranges should be done with vrange_storage, as
per the previous patch ("Provide an API for ipa_vr").

(*) I do know the Value_Range naming versus value_range is quite
annoying, but it was a judgement call last release for the eventual
migration to having "value_range" be a type agnostic range object.  We
will ultimately rename Value_Range to value_range.

OK for trunk?

gcc/ChangeLog:

* ipa-cp.cc (ipcp_vr_lattice::init): Take type argument.
(ipcp_vr_lattice::print): Call dump method.
(ipcp_vr_lattice::meet_with): Adjust for m_vr being a
Value_Range.
(ipcp_vr_lattice::meet_with_1): Make argument a reference.
(ipcp_vr_lattice::set_to_bottom): Add type argument.
(set_all_contains_variable): Same.
(initialize_node_lattices): Pass type when appropriate.
(ipa_vr_operation_and_type_effects): Make type agnostic.
(ipa_value_range_from_jfunc): Same.
(propagate_vr_across_jump_function): Same.
(propagate_constants_across_call): Same.
* ipa-fnsummary.cc (evaluate_conditions_for_known_args): Same.
(evaluate_properties_for_edge): Same.
* ipa-prop.cc (ipcp_update_vr): Same.
* ipa-prop.h (ipa_value_range_from_jfunc): Same.
(ipa_range_set_and_normalize): Same.
---
 gcc/ipa-cp.cc| 159 +++
 gcc/ipa-fnsummary.cc |  16 ++---
 gcc/ipa-prop.cc  |   2 +-
 gcc/ipa-prop.h   |  19 ++
 4 files changed, 101 insertions(+), 95 deletions(-)

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index d4b9d4ac27e..bd5b1da17b2 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -343,20 +343,29 @@ private:
 class ipcp_vr_lattice
 {
 public:
-  value_range m_vr;
+  Value_Range m_vr;
 
   inline bool bottom_p () const;
   inline bool top_p () const;
-  inline bool set_to_bottom ();
-  bool meet_with (const value_range *p_vr);
+  inline bool set_to_bottom (tree type);
+  bool meet_with (const vrange &p_vr);
   bool meet_with (const ipcp_vr_lattice &other);
-  void init () { gcc_assert (m_vr.undefined_p ()); }
+  void init (tree type);
   void print (FILE * f);
 
 private:
-  bool meet_with_1 (const value_range *other_vr);
+  bool meet_with_1 (const vrange &other_vr);
 };
 
+inline void
+ipcp_vr_lattice::init (tree type)
+{
+  if (type)
+m_vr.set_type (type);
+
+  // Otherwise m_vr will default to unsupported_range.
+}
+
 /* Structure containing lattices for a parameter itself and for pieces of
aggregates that are passed in the parameter or by a reference in a parameter
plus some other useful flags.  */
@@ -585,7 +594,7 @@ ipcp_bits_lattice::print (FILE *f)
 void
 ipcp_vr_lattice::print (FILE * f)
 {
-  dump_value_range (f, &m_vr);
+  m_vr.dump (f);
 }
 
 /* Print all ipcp_lattices of all functions to F.  */
@@ -1016,14 +1025,14 @@ set_agg_lats_contain_variable (class ipcp_param_lattices *plats)
 bool
 ipcp_vr_lattice::meet_with (const ipcp_vr_lattice &other)
 {
-  return meet_with_1 (&other.m_vr);
+  return meet_with_1 (other.m_vr);
 }
 
 /* Meet the current value of the lattice with value range described by VR
lattice.  */
 
 bool
-ipcp_vr_lattice::meet_with (const value_range *p_vr)
+ipcp_vr_lattice::meet_with (const vrange &p_vr)
 {
   return meet_with_1 (p_vr);
 }
@@ -1032,23 +1041,23 @@ ipcp_vr_lattice::meet_with (const value_range *p_vr)
OTHER_VR lattice.  Return TRUE if anything changed.  */
 
 bool
-ipcp_vr_lattice::meet_with_1 (const value_range *other_vr)
+ipcp_vr_lattice::meet_with_1 (const vrange &other_vr)
 {
   if (bottom_p ())
     return false;
 
-  if (other_vr->varying_p ())
-    return set_to_bottom ();
+  if (other_vr.varying_p ())
+    return set_to_bottom (other_vr.type ());
 
   bool res;
   if (flag_checking)
     {
-      value_range save (m_vr);
-      res = m_vr.union_ (*other_vr);
+      Value_Range save (m_vr);
+      res = m_vr.union_ (other_vr);
       gcc_assert (res == (m_vr != save));
     }
   else
-    res = m_vr.union_ (*other_vr);
+    res = m_vr.union_ (other_vr);
   return res;
 }
 
@@ -1073,16 +1082,11 @@ ipcp_vr_lattice::bottom_p () const
previously was in a different state.  */
 
 bool
-ipcp_vr_lattice::set_to_bottom ()
+ipcp_vr_lattice::set_to_bottom (tree type)
 {
   if (m_vr.varying_p ())
 return false;
-  /* ?? We create all sorts of VARYING ranges for floats, structures,
- and other types which we cannot handle as ranges.  We should
- probably avoid handling them throughout the pass, but it's easier
- to create a sensible VARYING here and let the lattice
- propagate.  */
-  m_vr.set_varying

[PATCH] Provide an API for ipa_vr.

2023-05-17 Thread Aldy Hernandez via Gcc-patches

This patch encapsulates the ipa_vr internals into an API.  It also
makes it type agnostic, in preparation for upcoming changes to IPA.

Interestingly, there's a 0.44% improvement to IPA-cp, which I'm sure
we'll soak up with future changes in this area :).

BTW, there's a note here:
+  // vrange_storage is typeless, but we need to know what type of
+  // range that is being streamed out (irange, frange, etc).  AFAICT,
+  // there's no way to get at the underlying type by the time we
+  // stream out in write_ipcp_transformation_info.
+  tree m_type;

Could someone more IPA savvy double check this is indeed the case?

OK for trunk?

gcc/ChangeLog:

* ipa-cp.cc (ipa_value_range_from_jfunc): Use new ipa_vr API.
(ipcp_store_vr_results): Same.
* ipa-prop.cc (ipa_vr::ipa_vr): New.
(ipa_vr::get_vrange): New.
(ipa_vr::set_unknown): New.
(ipa_vr::streamer_read): New.
(ipa_vr::streamer_write): New.
(write_ipcp_transformation_info): Use new ipa_vr API.
(read_ipcp_transformation_info): Same.
(ipa_vr::nonzero_p): Delete.
(ipcp_update_vr): Use new ipa_vr API.
* ipa-prop.h (class ipa_vr): Provide an API and hide internals.
* ipa-sra.cc (zap_useless_ipcp_results): Use new ipa_vr API.
* gcc.dg/ipa/pr78121.c: Adjust for vrange::dump use.
* gcc.dg/ipa/vrp1.c: Same.
* gcc.dg/ipa/vrp2.c: Same.
* gcc.dg/ipa/vrp3.c: Same.
* gcc.dg/ipa/vrp4.c: Same.
* gcc.dg/ipa/vrp5.c: Same.
* gcc.dg/ipa/vrp6.c: Same.
* gcc.dg/ipa/vrp7.c: Same.
* gcc.dg/ipa/vrp8.c: Same.
---
 gcc/ipa-cp.cc  |  22 ++---
 gcc/ipa-prop.cc| 129 -
 gcc/ipa-prop.h |  25 --
 gcc/ipa-sra.cc |   4 +-
 gcc/testsuite/gcc.dg/ipa/pr78121.c |   2 +-
 gcc/testsuite/gcc.dg/ipa/vrp1.c|   4 +-
 gcc/testsuite/gcc.dg/ipa/vrp2.c|   4 +-
 gcc/testsuite/gcc.dg/ipa/vrp3.c|   2 +-
 gcc/testsuite/gcc.dg/ipa/vrp4.c|   2 +-
 gcc/testsuite/gcc.dg/ipa/vrp5.c|   2 +-
 gcc/testsuite/gcc.dg/ipa/vrp6.c|   2 +-
 gcc/testsuite/gcc.dg/ipa/vrp7.c|   2 +-
 gcc/testsuite/gcc.dg/ipa/vrp8.c|   2 +-
 13 files changed, 109 insertions(+), 93 deletions(-)

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index 8cd0fa2cae7..d4b9d4ac27e 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -1947,13 +1947,11 @@ ipa_value_range_from_jfunc (ipa_node_params *info, cgraph_edge *cs,
 
   idx = ipa_get_jf_pass_through_formal_id (jfunc);
 
-  if (!(*sum->m_vr)[idx].known)
+  if (!(*sum->m_vr)[idx].known_p ())
return vr;
   tree vr_type = ipa_get_type (info, idx);
-  value_range srcvr (vr_type,
-(*sum->m_vr)[idx].min,
-(*sum->m_vr)[idx].max,
-(*sum->m_vr)[idx].type);
+  value_range srcvr;
+  (*sum->m_vr)[idx].get_vrange (srcvr, vr_type);
 
   enum tree_code operation = ipa_get_jf_pass_through_operation (jfunc);
 
@@ -6621,25 +6619,19 @@ ipcp_store_vr_results (void)
   for (unsigned i = 0; i < count; i++)
{
  ipcp_param_lattices *plats = ipa_get_parm_lattices (info, i);
- ipa_vr vr;
 
  if (!plats->m_value_range.bottom_p ()
  && !plats->m_value_range.top_p ()
  && dbg_cnt (ipa_cp_vr))
{
- tree min, max;
- vr.known = true;
- vr.type = get_legacy_range (plats->m_value_range.m_vr, min, max);
- vr.min = wi::to_wide (min);
- vr.max = wi::to_wide (max);
+ ipa_vr vr (plats->m_value_range.m_vr);
+ ts->m_vr->quick_push (vr);
}
  else
{
- vr.known = false;
- vr.type = VR_VARYING;
- vr.min = vr.max = wi::zero (INT_TYPE_SIZE);
+ ipa_vr vr;
+ ts->m_vr->quick_push (vr);
}
- ts->m_vr->quick_push (vr);
}
 }
 }
diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
index d7d70e5ec68..4ace410de49 100644
--- a/gcc/ipa-prop.cc
+++ b/gcc/ipa-prop.cc
@@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "symtab-clones.h"
 #include "attr-fnspec.h"
 #include "gimple-range.h"
+#include "value-range-storage.h"
 
 /* Function summary where the parameter infos are actually stored. */
 ipa_node_params_t *ipa_node_params_sum = NULL;
@@ -177,6 +178,66 @@ struct ipa_cst_ref_desc
 static object_allocator ipa_refdesc_pool
   ("IPA-PROP ref descriptions");
 
+ipa_vr::ipa_vr ()
+  : m_storage (NULL),
+m_type (NULL)
+{
+}
+
+ipa_vr::ipa_vr (const vrange &r)
+  : m_storage (ggc_alloc_vrange_storage (r)),
+m_type (r.type ())
+{
+}
+
+void
+ipa_vr::get_vrange (vrange &r, tree type) const
+{
+  m_storage->get_vrange (r, type);
+}
+
+void
+ipa_vr::set_unknown ()
+{
+  if (m_storage)
+ggc_free (m_storage);
+
+  m_storage = NULL;
+}
+
+void

[COMMITTED] Add Value_Range::operator=.

2023-05-17 Thread Aldy Hernandez via Gcc-patches

gcc/ChangeLog:

* value-range.h (Value_Range::operator=): New.
---
 gcc/value-range.h | 25 +
 1 file changed, 25 insertions(+)

diff --git a/gcc/value-range.h b/gcc/value-range.h
index ab982d18402..af81d6080da 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -523,6 +523,7 @@ public:
   Value_Range (const Value_Range &);
   void set_type (tree type);
   vrange& operator= (const vrange &);
+  Value_Range& operator= (const Value_Range &);
   bool operator== (const Value_Range &r) const;
   bool operator!= (const Value_Range &r) const;
   operator vrange &();
@@ -642,6 +643,30 @@ Value_Range::operator= (const vrange &r)
   return *m_vrange;
 }
 
+inline Value_Range &
+Value_Range::operator= (const Value_Range &r)
+{
+  if (r.m_vrange == &r.m_irange)
+    {
+      m_irange = r.m_irange;
+      m_vrange = &m_irange;
+    }
+  else if (r.m_vrange == &r.m_frange)
+    {
+      m_frange = r.m_frange;
+      m_vrange = &m_frange;
+    }
+  else if (r.m_vrange == &r.m_unsupported)
+    {
+      m_unsupported = r.m_unsupported;
+      m_vrange = &m_unsupported;
+    }
+  else
+    gcc_unreachable ();
+
+  return *this;
+}
+
 inline bool
 Value_Range::operator== (const Value_Range &r) const
 {
-- 
2.40.0
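
As an illustration of why a hand-written copy assignment is useful for a class of this shape, here is a minimal sketch of a holder that keeps a pointer to one of its own members. This is not GCC code: toy_base, toy_int, toy_float and toy_holder are hypothetical names, and only the general pattern (copy the active member, then re-point at our own copy) is taken from the patch above. A compiler-generated assignment would copy the pointer verbatim, leaving it aimed at a member of the source object.

```cpp
#include <cassert>

// Hypothetical stand-ins for the range variants stored inside the holder.
struct toy_base {};
struct toy_int : toy_base { int lo = 0, hi = 0; };
struct toy_float : toy_base { double lo = 0, hi = 0; };

class toy_holder
{
public:
  toy_holder () : m_active (&m_int) {}
  toy_holder (const toy_holder &r) : m_active (&m_int) { *this = r; }

  void set_int (int lo, int hi)
  { m_int.lo = lo; m_int.hi = hi; m_active = &m_int; }

  void set_float (double lo, double hi)
  { m_float.lo = lo; m_float.hi = hi; m_active = &m_float; }

  // Copy whichever member is active in R, then point at *our* copy.
  toy_holder &operator= (const toy_holder &r)
  {
    if (r.m_active == &r.m_int)
      {
        m_int = r.m_int;
        m_active = &m_int;
      }
    else
      {
        m_float = r.m_float;
        m_active = &m_float;
      }
    return *this;
  }

  bool holds_float () const { return m_active == &m_float; }

  // True when the active pointer refers to one of this object's members.
  bool points_into_self () const
  { return m_active == &m_int || m_active == &m_float; }

  double float_hi () const { return m_float.hi; }

private:
  toy_int m_int;
  toy_float m_float;
  const toy_base *m_active;   // always points at m_int or m_float
};
```

After `b = a`, `b.points_into_self ()` holds regardless of which member was active in `a`, which is the invariant the explicit operator is there to preserve.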

[COMMITTED] Provide support for copying unsupported ranges.

2023-05-17 Thread Aldy Hernandez via Gcc-patches

The unsupported_range class is provided for completeness' sake.  It is a
way to set VARYING/UNDEFINED ranges for unsupported types (currently
anything that is not a float, integer, or pointer).  You can't do anything
with them except set_varying and set_undefined.  We will trap on any
other operation.

This patch provides a way to copy them, just in case they creep in.
This could happen in IPA under certain circumstances.

gcc/ChangeLog:

* value-range.cc (vrange::operator=): Add a stub to copy
unsupported ranges.
* value-range.h (is_a <unsupported_range>): New.
(Value_Range::operator=): Support copying unsupported ranges.
---
 gcc/value-range.cc |  5 -
 gcc/value-range.h  | 12 
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 93c44a68365..45b1e655967 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -203,7 +203,10 @@ vrange::operator= (const vrange &src)
   else if (is_a <frange> (src))
     as_a <frange> (*this) = as_a <frange> (src);
   else
-    gcc_unreachable ();
+    {
+      gcc_checking_assert (is_a <unsupported_range> (src));
+      m_kind = src.m_kind;
+    }
   return *this;
 }
 
diff --git a/gcc/value-range.h b/gcc/value-range.h
index 0da2a42764a..ab982d18402 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -460,6 +460,13 @@ is_a <frange> (vrange &v)
   return v.m_discriminator == VR_FRANGE;
 }
 
+template <>
+inline bool
+is_a <unsupported_range> (vrange &v)
+{
+  return v.m_discriminator == VR_UNKNOWN;
+}
+
 // For resizable ranges, resize the range up to HARD_MAX_RANGES if the
 // NEEDED pairs is greater than the current capacity of the range.
 
@@ -624,6 +631,11 @@ Value_Range::operator= (const vrange &r)
       m_frange = as_a <frange> (r);
       m_vrange = &m_frange;
     }
+  else if (is_a <unsupported_range> (r))
+    {
+      m_unsupported = as_a <unsupported_range> (r);
+      m_vrange = &m_unsupported;
+    }
   else
     gcc_unreachable ();
 
-- 
2.40.0

Re: [PATCH] Add support for vrange streaming.

2023-05-17 Thread Aldy Hernandez via Gcc-patches

I'm pushing this in preparation for further changes in this area later today.

Aldy

On Thu, Apr 27, 2023 at 1:36 PM Aldy Hernandez  wrote:
>
> Thanks. I will put it aside until I start posting the IPA patches.
>
> Aldy
>
> On Thu, Apr 27, 2023, 13:02 Richard Biener  wrote:
>>
>> On Tue, Apr 18, 2023 at 2:48 PM Aldy Hernandez  wrote:
>> >
>> >
>> >
>> > On 4/18/23 11:06, Aldy Hernandez wrote:
>> > > I think it's time for the ranger folk to start owning range streaming
>> > > instead of passes (IPA, etc) doing their own thing.  I have plans for
>> > > overhauling the IPA code later this cycle to support generic ranges,
>> > > and I'd like to start cleaning up the streaming and hashing interface.
>> > >
>> > > This patch adds generic streaming support for vrange.
>> > >
>> > > I'd appreciate another set of eyes.
>> > >
>> > > Thoughts?
>> >
>> > We recently added support for querying and storing an frange's NAN
>> > without the need to be friends with the class.
>> >
>> > Adjusted patch in testing...
>>
>> I think this is reasonable once you find use for it.
>>
>> Thanks,
>> Richard.
>>
>> > Aldy
>>

RE: [GCC12 backport] arm: MVE testsuite and backend bugfixes

2023-05-17 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Stam Markianos-Wright 
> Sent: Wednesday, May 17, 2023 2:41 PM
> To: Kyrylo Tkachov ; gcc-patches@gcc.gnu.org
> Cc: Richard Earnshaw ; Andrea Corallo
> 
> Subject: [GCC12 backport] arm: MVE testsuite and backend bugfixes
> 
> 
> On 17/05/2023 10:26, Kyrylo Tkachov wrote:
> > Hi Stam,
> >
> >> -Original Message-
> >> From: Stam Markianos-Wright 
> >> Sent: Tuesday, May 16, 2023 2:32 PM
> >> To: gcc-patches@gcc.gnu.org
> >> Cc: Kyrylo Tkachov ; Richard Earnshaw
> >> ; Andrea Corallo
> 
> >> Subject: [GCC12 backport] arm: MVE testsuite and backend bugfixes
> >>
> >> Hi all,
> >>
> >> We've recently sent up a lot of patches overhauling the testsuite of the
> >> Arm MVE backend.
> >> With these changes, we've also identified and fixed a number of bugs
> >> (some backend bugs and many to do with the polymorphism of intrinsics in
> >> the MVE header file).
> >> These would all be relevant to backport to GCC12.
> >> The list is as follows (in the order they all apply on top of each other):
> >>
> >> * This patch series:
> >> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606552.html
> >> (commits 9a79b522e0663a202a288db56ebcbdcdb48bdaca to
> >> f2b54e5b796b00f0072b61f9cd6a964c66ead29b)
> >> * ecc363971aeac52481d92de8b37521f6cc2d38e6 arm: Fix MVE testsuite
> >> fallouts
> >> * 06aa66af7d0dacc1b247d9e38175e789ef159191 arm: Add missing early
> >> clobber to MVE vrev64q_m patterns
> >> * c09663eabfb84ac56ddd8d44abcab3f4902c83bd testsuite: [arm] Relax
> >> expected register names in MVE tests
> >> * 330d665ce6dcc63ed0bd78d807e69bbfc55255b6 arm: [MVE] Add missing
> >> length=8 attribute
> >> * 8d4f007398bc3f8fea812fb8cff4d7d0556d12f1 arm: fix mve intrinsics scan
> >> body tests for C++
> >> * This patch series
> >> https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610312.html
> >> (commits dd4424ef898608321b60610c4f3c98737ace3680 to
> >> 267f01a493ab8a0bec9325ce3386b946c46f2e98)
> >> * 8a1360e72d6c6056606aa5edd8c906c50f26de59 arm: Split up MVE _Generic
> >> associations to prevent type clashes [PR107515]
> >> * 3f0ca7a3e4431534bff3b8eb73709cc822e489b0 arm: Fix vcreate definition
> >> * c1093923733a1072a237f112e3239b5ebd88eadd arm: Make MVE masked stores
> >> read memory operand [PR 108177]
> >> * f54e31ddefe3ea7146624eabcb75b1c90dc59f1a arm: fix __arm_vld1q_z* and
> >> __arm_vst1q_p* intrinsics [PR108442]
> >> * 1d509f190393627cdf0afffc427b25dd21c2 arm: remove unused variables
> >> from test
> >>
> > Ok to backport.
> >
> >> -- up to this point everything applied cleanly. The final two need minor
> >> rebasing changes --
> >>
> >> * This patch series:
> >> https://gcc.gnu.org/pipermail/gcc-patches/2023-April/617008.html (Not
> >> pushed to trunk yet, but has been approved. For trunk we do now need to
> >> resolve some merge conflicts, since Christophe has started merging the
> >> MVE Intrinsic Restructuring, but these are trivial. I will also backport
> >> to GCC13 where this patch series applies cleanly)
> >> * cfa118fc089e38a94ec60ccf5b667aea015e5f60 [arm] complete vmsr/vmrs
> >> blank and case adjustments.
> >>
> >> The final one is a commit from Alexandre Oliva that is needed to ensure
> >> that we don't accidentally regress the test due to the tabs vs spaces
> >> and capitalisation on the vmrs/vmsr instructions :)
> >>
> >> After all that, no regressions on baremetal arm-none-eabi in a bunch of
> >> configurations (-marm, thumb1, thumb2, MVE, MVE.FP, softfp and hardfp):
> >>
> > Will you be sending these to the list after adjusting?
> 
> Yep, I believe we have to!
> 
> I'm thinking we should do one batch of [committed] emails for GCC12 and
> one for trunk.

Sounds good.

> 
> For GCC13 the previously sent version of the series at
> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617373.html applies
> cleanly. Let me know if there's anything further we need to do!
> 

WFM, please go ahead.
Thanks,
Kyrill

> Thanks,
> Stamatis
> 
> 
> > Thanks,
> > Kyrill
> >
> >> Thanks,
> >> Stam

[GCC12 backport] arm: MVE testsuite and backend bugfixes

2023-05-17 Thread Stamatis Markianos-Wright via Gcc-patches




On 17/05/2023 10:26, Kyrylo Tkachov wrote:

Hi Stam,


-Original Message-
From: Stam Markianos-Wright 
Sent: Tuesday, May 16, 2023 2:32 PM
To: gcc-patches@gcc.gnu.org
Cc: Kyrylo Tkachov ; Richard Earnshaw
; Andrea Corallo 
Subject: [GCC12 backport] arm: MVE testsuite and backend bugfixes

Hi all,

We've recently sent up a lot of patches overhauling the testsuite of the
Arm MVE backend.
With these changes, we've also identified and fixed a number of bugs
(some backend bugs and many to do with the polymorphism of intrinsics in
the MVE header file).
These would all be relevant to backport to GCC12.
The list is as follows (in the order they all apply on top of each other):

* This patch series:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606552.html
(commits 9a79b522e0663a202a288db56ebcbdcdb48bdaca to
f2b54e5b796b00f0072b61f9cd6a964c66ead29b)
* ecc363971aeac52481d92de8b37521f6cc2d38e6 arm: Fix MVE testsuite
fallouts
* 06aa66af7d0dacc1b247d9e38175e789ef159191 arm: Add missing early
clobber to MVE vrev64q_m patterns
* c09663eabfb84ac56ddd8d44abcab3f4902c83bd testsuite: [arm] Relax
expected register names in MVE tests
* 330d665ce6dcc63ed0bd78d807e69bbfc55255b6 arm: [MVE] Add missing
length=8 attribute
* 8d4f007398bc3f8fea812fb8cff4d7d0556d12f1 arm: fix mve intrinsics scan
body tests for C++
* This patch series
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610312.html
(commits dd4424ef898608321b60610c4f3c98737ace3680 to
267f01a493ab8a0bec9325ce3386b946c46f2e98)
* 8a1360e72d6c6056606aa5edd8c906c50f26de59 arm: Split up MVE _Generic
associations to prevent type clashes [PR107515]
* 3f0ca7a3e4431534bff3b8eb73709cc822e489b0 arm: Fix vcreate definition
* c1093923733a1072a237f112e3239b5ebd88eadd arm: Make MVE masked
stores
read memory operand [PR 108177]
* f54e31ddefe3ea7146624eabcb75b1c90dc59f1a arm: fix __arm_vld1q_z*
and
__arm_vst1q_p* intrinsics [PR108442]
* 1d509f190393627cdf0afffc427b25dd21c2 arm: remove unused variables
from test


Ok to backport.


-- up to this point everything applied cleanly. The final two need minor
rebasing changes --

* This patch series:
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/617008.html (Not
pushed to trunk yet, but has been approved. For trunk we do now need to
resolve some merge conflicts, since Christophe has started merging the
MVE Intrinsic Restructuring, but these are trivial. I will also backport
to GCC13 where this patch series applies cleanly)
* cfa118fc089e38a94ec60ccf5b667aea015e5f60 [arm] complete vmsr/vmrs
blank and case adjustments.

The final one is a commit from Alexandre Oliva that is needed to ensure
that we don't accidentally regress the test due to the tabs vs spaces
and capitalisation on the vmrs/vmsr instructions :)

After all that, no regressions on baremetal arm-none-eabi in a bunch of
configurations (-marm, thumb1, thumb2, MVE, MVE.FP, softfp and hardfp):


Will you be sending these to the list after adjusting?


Yep, I believe we have to!

I'm thinking we should do one batch of [committed] emails for GCC12 and 
one for trunk.


For GCC13 the previously sent version of the series at 
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617373.html applies 
cleanly. Let me know if there's anything further we need to do!


Thanks,
Stamatis



Thanks,
Kyrill


Thanks,
Stam

Re: [PATCH] RISC-V: Adjust stdint.h to stdint-gcc.h for rvv tests

2023-05-17 Thread Kito Cheng via Gcc-patches

> > RISC-V glibc requires the corresponding multilib to have been built there,
> > otherwise it will report something like:
> >
> >  /usr/include/gnu/stubs.h:11:11: fatal error: gnu/stubs-ilp32d.h: No
> > such file or directory
> >
> > But actually we only require those fixed-length types in order to compile
> > and scan the assembly or dump,
> > so we don't really have those multilibs built; that's the reason we
> > work around it this way.
> >
> > This way works even if the multilib build is disabled, and it seems
> > ARM has the same issue and just disables those tests:
> Then just mimic that or avoid using stdint.h and instead use things like 
> __INT32_TYPE__?

Hmmm, then it seems like we would be inlining those type definitions from
stdint-gcc.h?

I mean, what stdint-gcc.h does is just the same as that:

#ifdef __INT8_TYPE__
typedef __INT8_TYPE__ int8_t;
#endif
...
#ifdef __UINT64_TYPE__
typedef __UINT64_TYPE__ uint64_t;
#endif
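
For comparison, a test that avoids every header could spell out the same typedefs itself. The following is only a sketch under the assumption that the compiler predefines __INT32_TYPE__ and __UINT64_TYPE__ (as GCC and compatible compilers do); the names test_int32_t, test_uint64_t and add_one are made up for illustration and are not taken from any testsuite file.

```cpp
// Fixed-width typedefs taken straight from the compiler's predefined
// macros, so neither libc's stdint.h nor its multilib stubs are needed.
// Assumption: a GCC-compatible compiler that defines these macros.
#ifdef __INT32_TYPE__
typedef __INT32_TYPE__ test_int32_t;    // hypothetical name
#endif
#ifdef __UINT64_TYPE__
typedef __UINT64_TYPE__ test_uint64_t;  // hypothetical name
#endif

// Compile-time checks that the typedefs have the expected widths.
static_assert (sizeof (test_int32_t) == 4, "test_int32_t must be 32 bits");
static_assert (sizeof (test_uint64_t) == 8, "test_uint64_t must be 64 bits");

// A trivial function using the typedef, as a test body might.
constexpr test_int32_t add_one (test_int32_t x)
{
  return x + 1;
}
```

This is effectively the same thing stdint-gcc.h does, just repeated in each test, which is the trade-off being discussed.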

RE: [PATCH 1/2] PR gcc/98350:Add a param to control the length of the chain with FMA in reassoc pass

2023-05-17 Thread Cui, Lili via Gcc-patches

> I think to make a difference you need to hit the number of parallel fadd/fmul
> the pipeline can perform.  I don't think issue width is ever a problem for
> chains w/o fma and throughput of fma vs fadd + fmul should be similar.
> 

Yes, for the x86 backend, fadd, fmul and fma have the same throughput (TP),
meaning they should have the same width.
The current implementation is reasonable  /* reassoc int, fp, vec_int, vec_fp.  */.

> That said, I think iff then we should try to improve
> rewrite_expr_tree_parallel rather than adding a new function.  For example
> for the case with equal rank operands we can try to sort adds first.  I can't
> convince myself that rewrite_expr_tree_parallel honors ranks properly
> quickly.
> 

I rewrote this patch; there are mainly two changes:
1. I made some changes to rewrite_expr_tree_parallel_for_fma and used it
instead of rewrite_expr_tree_parallel. The following example shows that the
sequence generated by this patch is better.
2. Put no-mult ops and mult ops alternately at the end of the queue, which is
conducive to generating more FMAs and reducing the loss of FMAs when breaking
the chain.

With these two changes, GCC can break the chain with width = 2 and generate 6
FMAs for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98350 without any params.

--
Source code: g + h + j + s + m + n + a + b + e  (https://godbolt.org/z/G8sb86n84)
Compile options: -Ofast -mfpmath=sse -mfma
Width = 3 was chosen for reassociation
--
Old rewrite_expr_tree_parallel generates:
  _6 = g_8(D) + h_9(D);   --> parallel 0
  _3 = s_11(D) + m_12(D);  --> parallel 1
  _5 = _3 + j_10(D);
  _2 = n_13(D) + a_14(D);   --> parallel 2
  _1 = b_15(D) + e_16(D);  --> parallel 3; this is not necessary, and it is not friendly to FMA.
  _4 = _1 + _2;
  _7 = _4 + _5;
  _17 = _6 + _7;  
  return _17;

When the width = 3, we need 5 cycles here.
-- first end --
The rewritten rewrite_expr_tree_parallel (3 sets in parallel) generates:

  _3 = s_11(D) + m_12(D);  --> parallel 0
  _5 = _3 + j_10(D);
  _2 = n_13(D) + a_14(D);   --> parallel 1
  _1 = b_15(D) + e_16(D);   --> parallel 2
  _4 = _1 + _2;
  _6 = _4 + _5;
  _7 = _6 + h_9(D);
  _17 = _7 + g_8(D); 
  return _17;

When the width = 3, we need 5 cycles here.
-- second end --
Using rewrite_expr_tree_parallel_for_fma instead of rewrite_expr_tree_parallel
generates:

  _3 = s_11(D) + m_12(D);
  _6 = _3 + g_8(D);
  _2 = n_13(D) + a_14(D);
  _5 = _2 + h_9(D);
  _1 = b_15(D) + e_16(D);
  _4 = _1 + j_10(D);
  _7 = _4 + _5;
  _17 = _7 + _6;
  return _17;

When the width = 3, we need 4 cycles here.
-- third end --
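
The cycle counts quoted for the three sequences can be reproduced with a small model. The following is a minimal sketch, assuming unit latency for every add and a scheduler that greedily issues up to `width` ready operations per cycle in program order. The names `struct op`, `count_cycles`, `old_seq`, `second_seq` and `third_seq` are hypothetical and not part of GCC.

```c
/* One add in a sequence.  An operand >= 0 is the index of an earlier op
   in the same sequence; -1 marks a leaf input such as g_8(D).  */
struct op
{
  int lhs;
  int rhs;
};

/* Number of cycles needed to issue all N ops (assumes N <= 32 and
   WIDTH >= 1), issuing at most WIDTH ops per cycle in program order,
   each op having unit latency.  */
int
count_cycles (const struct op *ops, int n, int width)
{
  int issued_in[32] = { 0 };  /* Cycle in which op i issued; 0 = not yet.  */
  int remaining = n;
  int cycle = 0;

  while (remaining > 0)
    {
      int issued = 0;
      cycle++;
      for (int i = 0; i < n && issued < width; i++)
        {
          if (issued_in[i])
            continue;
          int l = ops[i].lhs, r = ops[i].rhs;
          /* An operand is ready if it is a leaf or issued in an
             earlier cycle.  */
          int l_ok = l < 0 || (issued_in[l] && issued_in[l] < cycle);
          int r_ok = r < 0 || (issued_in[r] && issued_in[r] < cycle);
          if (l_ok && r_ok)
            {
              issued_in[i] = cycle;
              issued++;
              remaining--;
            }
        }
    }
  return cycle;
}

/* The three sequences from the mail, in statement order.  */
const struct op old_seq[8] = {
  { -1, -1 },  /* _6  = g + h    */
  { -1, -1 },  /* _3  = s + m    */
  {  1, -1 },  /* _5  = _3 + j   */
  { -1, -1 },  /* _2  = n + a    */
  { -1, -1 },  /* _1  = b + e    */
  {  4,  3 },  /* _4  = _1 + _2  */
  {  5,  2 },  /* _7  = _4 + _5  */
  {  0,  6 },  /* _17 = _6 + _7  */
};

const struct op second_seq[8] = {
  { -1, -1 },  /* _3  = s + m    */
  {  0, -1 },  /* _5  = _3 + j   */
  { -1, -1 },  /* _2  = n + a    */
  { -1, -1 },  /* _1  = b + e    */
  {  3,  2 },  /* _4  = _1 + _2  */
  {  4,  1 },  /* _6  = _4 + _5  */
  {  5, -1 },  /* _7  = _6 + h   */
  {  6, -1 },  /* _17 = _7 + g   */
};

const struct op third_seq[8] = {
  { -1, -1 },  /* _3  = s + m    */
  {  0, -1 },  /* _6  = _3 + g   */
  { -1, -1 },  /* _2  = n + a    */
  {  2, -1 },  /* _5  = _2 + h   */
  { -1, -1 },  /* _1  = b + e    */
  {  4, -1 },  /* _4  = _1 + j   */
  {  5,  3 },  /* _7  = _4 + _5  */
  {  6,  1 },  /* _17 = _7 + _6  */
};
```

Under these assumptions, `count_cycles (seq, 8, 3)` gives 5, 5 and 4 cycles for the three sequences, matching the counts quoted above. A scheduler that reorders ready operations could do better on the first sequence, so the in-order assumption matters.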

Thanks,
Lili.

[PATCH] PR gcc/98350:Handle FMA friendly in reassoc pass

2023-05-17 Thread Cui, Lili via Gcc-patches

From: Lili Cui 

Make some changes in the reassoc pass to make it more friendly to the later
fma pass.  Using FMA instead of mult + add reduces register pressure and the
number of instructions retired.

There are mainly two changes:
1. Put no-mult ops and mult ops alternately at the end of the queue, which is
conducive to generating more fma and reducing the loss of FMA when breaking
the chain.
2. Rewrite the rewrite_expr_tree_parallel function to try to build parallel
chains according to the given correlation width, keeping the FMA chance as
much as possible.

TEST1:

float
foo (float a, float b, float c, float d, float *e)
{
   return  *e  + a * b + c * d ;
}

For "-Ofast -mfpmath=sse -mfma" GCC generates:
vmulss  %xmm3, %xmm2, %xmm2
vfmadd132ss %xmm1, %xmm2, %xmm0
vaddss  (%rdi), %xmm0, %xmm0
ret

With this patch GCC generates:
vfmadd213ss   (%rdi), %xmm1, %xmm0
vfmadd231ss   %xmm2, %xmm3, %xmm0
ret
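
The difference between the two listings for TEST1 comes down to how the sum is associated. Below is a minimal sketch of the two shapes; `mul_add`, `foo_before` and `foo_after` are hypothetical names, and `mul_add` is only a stand-in for a fused multiply-add (whether an FMA is actually emitted depends on target and flags).

```c
/* Stand-in for one fused multiply-add: x * y + acc.  Hypothetical helper;
   whether the compiler emits a real FMA for it depends on the target and
   flags (e.g. -mfma with contraction enabled).  */
static float
mul_add (float x, float y, float acc)
{
  return x * y + acc;
}

/* Association matching the first listing: (a * b + c * d) + *e.
   Only one multiply can be fused; the other multiply and the final add
   remain separate operations.  */
float
foo_before (float a, float b, float c, float d, const float *e)
{
  float cd = c * d;                /* vmulss */
  float sum = mul_add (a, b, cd);  /* vfmadd132ss */
  return sum + *e;                 /* vaddss */
}

/* Association matching the second listing: c * d + (a * b + *e).
   Every step has the multiply-add shape, so two FMAs suffice.  */
float
foo_after (float a, float b, float c, float d, const float *e)
{
  float acc = mul_add (a, b, *e);  /* vfmadd213ss */
  return mul_add (c, d, acc);      /* vfmadd231ss */
}
```

The two functions agree for exactly representable inputs, but in general floating-point reassociation can change the rounded result, which is why the transformation is only done under flags such as -Ofast.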

TEST2:

for (int i = 0; i < N; i++)
{
  a[i] += b[i]* c[i] + d[i] * e[i] + f[i] * g[i] + h[i] * j[i] + k[i] * l[i] + 
m[i]* o[i] + p[i];
}

For "-Ofast -mfpmath=sse -mfma"  GCC generates:
vmovapd e(%rax), %ymm4
vmulpd  d(%rax), %ymm4, %ymm3
addq    $32, %rax
vmovapd c-32(%rax), %ymm5
vmovapd j-32(%rax), %ymm6
vmulpd  h-32(%rax), %ymm6, %ymm2
vmovapd a-32(%rax), %ymm6
vaddpd  p-32(%rax), %ymm6, %ymm0
vmovapd g-32(%rax), %ymm7
vfmadd231pd b-32(%rax), %ymm5, %ymm3
vmovapd o-32(%rax), %ymm4
vmulpd  m-32(%rax), %ymm4, %ymm1
vmovapd l-32(%rax), %ymm5
vfmadd231pd f-32(%rax), %ymm7, %ymm2
vfmadd231pd k-32(%rax), %ymm5, %ymm1
vaddpd  %ymm3, %ymm0, %ymm0
vaddpd  %ymm2, %ymm0, %ymm0
vaddpd  %ymm1, %ymm0, %ymm0
vmovapd %ymm0, a-32(%rax)
cmpq    $8192, %rax
jne .L4
vzeroupper
ret

With this patch applied, GCC breaks the chain with width = 2 and generates 6 FMAs:

vmovapd a(%rax), %ymm2
vmovapd c(%rax), %ymm0
addq    $32, %rax
vmovapd e-32(%rax), %ymm1
vmovapd p-32(%rax), %ymm5
vmovapd g-32(%rax), %ymm3
vmovapd j-32(%rax), %ymm6
vmovapd l-32(%rax), %ymm4
vmovapd o-32(%rax), %ymm7
vfmadd132pd b-32(%rax), %ymm2, %ymm0
vfmadd132pd d-32(%rax), %ymm5, %ymm1
vfmadd231pd f-32(%rax), %ymm3, %ymm0
vfmadd231pd h-32(%rax), %ymm6, %ymm1
vfmadd231pd k-32(%rax), %ymm4, %ymm0
vfmadd231pd m-32(%rax), %ymm7, %ymm1
vaddpd  %ymm1, %ymm0, %ymm0
vmovapd %ymm0, a-32(%rax)
cmpq    $8192, %rax
jne .L2
vzeroupper
ret

gcc/ChangeLog:

PR gcc/98350
* tree-ssa-reassoc.cc
(rewrite_expr_tree_parallel): Rewrite this function.
(rank_ops_for_fma): New.
(reassociate_bb): Handle new function.

gcc/testsuite/ChangeLog:

PR gcc/98350
* gcc.dg/pr98350-1.c: New test.
* gcc.dg/pr98350-2.c: Ditto.
---
 gcc/testsuite/gcc.dg/pr98350-1.c |  31 
 gcc/testsuite/gcc.dg/pr98350-2.c |  11 ++
 gcc/tree-ssa-reassoc.cc  | 256 +--
 3 files changed, 215 insertions(+), 83 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr98350-1.c
 create mode 100644 gcc/testsuite/gcc.dg/pr98350-2.c

diff --git a/gcc/testsuite/gcc.dg/pr98350-1.c b/gcc/testsuite/gcc.dg/pr98350-1.c
new file mode 100644
index 000..185511c5e0a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr98350-1.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mfpmath=sse -mfma -Wno-attributes " } */
+
+/* Test that the compiler properly optimizes multiply and add 
+   to generate more FMA instructions.  */
+#define N 1024
+double a[N];
+double b[N];
+double c[N];
+double d[N];
+double e[N];
+double f[N];
+double g[N];
+double h[N];
+double j[N];
+double k[N];
+double l[N];
+double m[N];
+double o[N];
+double p[N];
+
+
+void
+foo (void)
+{
+  for (int i = 0; i < N; i++)
+  {
+a[i] += b[i] * c[i] + d[i] * e[i] + f[i] * g[i] + h[i] * j[i] + k[i] * l[i] + m[i]* o[i] + p[i];
+  }
+}
+/* { dg-final { scan-assembler-times "vfm" 6  } } */
diff --git a/gcc/testsuite/gcc.dg/pr98350-2.c b/gcc/testsuite/gcc.dg/pr98350-2.c
new file mode 100644
index 000..b35d88aead9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr98350-2.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mfpmath=sse -mfma -Wno-attributes " } */
+
+/* Test that the compiler rearrange the ops to generate more FMA.  */
+
+float
+foo1 (float a, float b, float c, float d, float *e)
+{
+   return   *e + a * b + c * d ;
+}
+/* { dg-final { scan-assembler-times "vfm" 2  } } */
diff --git a/gcc/tree-ssa-reassoc.cc b/gcc/tree-ssa-reassoc.cc
index 067a3f07f7e..52c8aab6033 100644
--- a/gcc/tree-ssa-reassoc.cc
+++ b/gcc/tree-ssa-reassoc.cc
@@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not

Re: [PATCH] RISC-V: Adjust stdint.h to stdint-gcc.h for rvv tests

2023-05-17 Thread Richard Biener via Gcc-patches




> On 17.05.2023 at 08:55, Kito Cheng wrote:
> 
> RISC-V glibc requires the corresponding multilib to have been built there,
> otherwise it reports something like:
>
>  /usr/include/gnu/stubs.h:11:11: fatal error: gnu/stubs-ilp32d.h: No such file or directory
>
> But actually we only require those fixed-length types to compile and
> scan the assembly or dump, so we don't really have those multilibs
> built; that's the reason we work around it this way.
>
> This way works even if the multilib build is disabled, and it seems
> ARM has the same issue and just disables those tests:

Then just mimic that or avoid using stdint.h and instead use things like 
__INT32_TYPE__?

> ---
> # Return 1 if this is an ARM target supporting -mfloat-abi=soft.  Some
> # multilibs may be incompatible with this option.
> 
> proc check_effective_target_arm_soft_ok { } {
>   return [check_no_compiler_messages arm_soft_ok object {
>   #include 
>   int dummy;
>   int main (void) { return 0; }
>   } "-mfloat-abi=soft"]
> }
> 
> # Return 1 if this is an ARM target supporting -mfloat-abi=soft even
> # for linking.  Some multilibs may be incompatible with this option,
> # and some linkers may reject incompatible options.
> 
> proc check_effective_target_arm_soft_ok_link { } {
>   return [check_no_compiler_messages arm_soft_ok_link executable {
>   #include 
>   int dummy;
>   int main (void) { return 0; }
>   } "-mfloat-abi=soft"]
> }
> ---
> 
> 
>> On Wed, May 17, 2023 at 2:25 PM Robin Dapp  wrote:
>> 
>>> Huh, including stdint-gcc.h looks completely wrong.  What's the issue you are
>>> trying to solve?
>> 
>> The way I understood it is that that's a temporary workaround until
>> all multilib et al. (+testsuite) configurations are in place but I
>> haven't checked the details myself.  Eventually this should be done
>> properly so we can include the regular headers.  Kito might want to
>> comment as he dealt with it before.
>> 
>> I used #include  for all those tests and Andreas Schwab reported:
>> 
>>  /usr/include/gnu/stubs.h:11:11: fatal error: gnu/stubs-ilp32d.h: No such file or directory
>> 
>> Regards
>> Robin

Re: [PATCH] vect: Missed opportunity to use [SU]ABD

2023-05-17 Thread Oluwatamilore Adebayo via Gcc-patches

> Yeah.  Like Tami says, this is what the instruction does.
> 
> I think all three definitions are equivalent: the extend/operate/truncate
> one, the ?: one above, and the "max - min" one.  Probably just personal
> preference as to which seems more natural.

Decided to switch to using the ?: one as it makes more sense
for unsigned types.
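
For reference, the three definitions mentioned in the quoted text can be written side by side. This is a minimal sketch for unsigned 8-bit elements only; the function names are hypothetical and are not the names used in the patch.

```c
#include <stdint.h>
#include <stdlib.h>

/* Extend / operate / truncate: widen to int, subtract, take abs, narrow.  */
uint8_t
abd_widen (uint8_t a, uint8_t b)
{
  return (uint8_t) abs ((int) a - (int) b);
}

/* The ?: form.  */
uint8_t
abd_select (uint8_t a, uint8_t b)
{
  return a > b ? (uint8_t) (a - b) : (uint8_t) (b - a);
}

/* The max - min form.  */
uint8_t
abd_maxmin (uint8_t a, uint8_t b)
{
  uint8_t hi = a > b ? a : b;
  uint8_t lo = a > b ? b : a;
  return (uint8_t) (hi - lo);
}

/* Returns 1 if all three forms agree for every pair of 8-bit inputs.  */
int
abd_definitions_agree (void)
{
  for (int a = 0; a < 256; a++)
    for (int b = 0; b < 256; b++)
      {
        uint8_t x = abd_widen ((uint8_t) a, (uint8_t) b);
        if (x != abd_select ((uint8_t) a, (uint8_t) b)
            || x != abd_maxmin ((uint8_t) a, (uint8_t) b))
          return 0;
      }
  return 1;
}
```

The ?: form never needs a wider intermediate type, which is one reason it reads naturally for unsigned operands.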

> It would be good to document what the parameters mean (except VINFO,
> which is obvious).

Documentation added for vect_recog_absolute_difference.

> I think this should instead be:
> 
>   if (TYPE_PRECISION (unprom_diff.type) != TYPE_PRECISION (abs_type)
>   && TYPE_UNSIGNED (unprom_diff.type)
>   && TYPE_UNSIGNED (abs_type))
> return false;
> 
> 
> 
> I think the code would be easier to follow if it used vect_widened_op_tree
> first, and only considered the unextended case on failure.

Implemented Richard's suggested changes to vect_recog_absolute_difference.

> Minor formatting nit: GCC style is to indent braces by two spaces
> further than an "if":
> 
>   if (...)
> {
>   ...
> }

Adopted this style.

New patch will be in the reply.

Re: [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives

2023-05-17 Thread Frederik Harwath via Gcc-patches


Hi Jakub,

On 16.05.23 13:00, Jakub Jelinek wrote:

On Tue, May 16, 2023 at 11:45:16AM +0200, Frederik Harwath wrote:

The place where different compilers implement the loop transformations
was discussed in an OpenMP loop transformation meeting last year. Two
compilers (another one and GCC with this patch series) transformed the loops
in the middle end after the handling of data sharing, one planned to do so.
Yet another vendor had not yet decided where it will be implemented. Clang
currently does everything in the front end, but it was mentioned that this
might change in the future e.g. for code sharing with Flang. Implementing
the loop transformations late could potentially
complicate the implementation of transformations which require adjustments
of the data sharing clauses, but this is known and consequentially, no such

When already in the FE we determine how many canonical loops a particular
loop transformation creates, I think the primary changes I'd like to see is
really have OMP_UNROLL/OMP_TILE GENERIC statements (see below) and consider
where is the best spot to lower it. I believe for data sharing it is best
done during gimplification before the containing loops are handled, it is
already shared code among all the FEs, I think will make it easier to handle
data sharing right and gimplification is also where doacross processing is
done. While there is restriction that ordered clause is incompatible with
generated loops from tile construct, there isn't one for unroll (unless
"The ordered clause must not appear on a worksharing-loop directive if the
associated loops include the generated loops of a tile directive."
means unroll partial implicitly because partial unroll tiles the loop, but
it doesn't say it acts as if it was a tile construct), so we'd have to handle

#pragma omp for ordered(2)
for (int i = 0; i < 64; i++)
#pragma omp unroll partial(4)
for (int j = 0; j < 64; j++)
{
#pragma omp ordered depend (sink: i - 1, j - 2)
#pragma omp ordered depend (source)
}
and I think handling it after gimplification is going to be increasingly
harder. Of course another possibility is to ask the language committee to
clarify, unless it has been clarified already in 6.0 (but in TR11 it is not).
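
As background for the remark that a partial unroll tiles the loop, here is a minimal hand-written sketch of what a partial unroll by 4 conceptually produces for a simple reduction. It is plain C without OpenMP directives; `sum_plain` and `sum_unrolled4` are hypothetical names, and this is not the code GCC generates.

```c
/* Reference: the loop as written, before any transformation.  */
long
sum_plain (const int *a, int n)
{
  long s = 0;
  for (int i = 0; i < n; i++)
    s += a[i];
  return s;
}

/* Hand-written equivalent of a partial unroll by 4: the outer loop
   (the "generated loop") advances in steps of 4 and the body is
   replicated; a remainder loop handles the leftover iterations.  */
long
sum_unrolled4 (const int *a, int n)
{
  long s = 0;
  int i;
  for (i = 0; i + 4 <= n; i += 4)
    {
      s += a[i];
      s += a[i + 1];
      s += a[i + 2];
      s += a[i + 3];
    }
  for (; i < n; i++)
    s += a[i];
  return s;
}
```

The generated outer loop has a different iteration space than the original, which is why clauses such as ordered(2) that refer to original iteration vectors become difficult to interpret once the inner loop has been transformed.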


I do not really expect that we will have to handle this. Questions concerning
the correctness of code after applying loop transformations came up several
times since I have been following the design meetings and the result was
always either that nothing will be changed, because the loop transformations
are not expected to ensure the correctness of enclosing directives, or that
the use of the problematic construct in conjunction with loop transformations
will be forbidden. Concerning the use of "ordered" on transformed loops, the
latter approach was suggested for all transformations, cf. issue #3494 in the
private OpenMP spec repository. I see that you have already asked for
clarification on unroll. I suppose this could also be fixed after
gimplification with reasonable effort. But let's just wait for the result of
that discussion before we continue worrying about this.


Also, I think creating temporaries is easier to be done during
gimplification than later.


This has not caused problems with the current approach.


Another option is as you implemented a separate pre-omp-lowering pass,
and another one would be do it in the omplower pass, which has actually
several subpasses internally, do it in the scan phase. Disadvantage of
a completely separate pass is that we have to walk the whole IL again,
while doing it in the scan phase means we avoid that cost. We already
do there similar transformations, scan_omp_simd transforms simd constructs
into if (...) simd else simt and then we process it with normal scan_omp_for
on what we've created. So, if you insist doing it after gimplification
perhaps for compatibility with other non-LLVM compilers, I'd prefer to
do it there rather than in a completely separate pass.


I see. This would be possible. My current approach is indeed rather
wasteful because the pass is not restricted to functions that actually
use loop transformations. I could add an attribute to such functions
that could be used to avoid the execution of the pass and hence
the gimple walk on functions that do not use transformations.


This is necessary to represent the loop nest that is affected by the
loop transformations by a single OMP_FOR to meet the expectations
of all later OpenMP code transformations. This is also the major
reason why the loop transformations are represented by clauses
instead of representing them as  "OMP_UNROLL/OMP_TILE as
GENERIC constructs like OMP_FOR" as you suggest below. Since the

I really don't see why. We try to represent what we see in the source
as OpenMP constructs as those constructs. We already have a precedent
with composite loop constructs, where for the combined constructs which
aren't innermost we temporarily use NULL

[committed] Re: [Patch,v4] Fortran/OpenMP: Fix mapping of array descriptors and deferred-length strings

2023-05-17 Thread Tobias Burnus


The patch has now been committed as r14-931-g80bb0b8a81fdc5

The only change is that I added the '&& VAR_P (length)' in 'if (sym->ts.deferred &&
VAR_P (length))' in trans-decl.cc just to avoid potential issues in case
length is not a var decl (but e.g. a '0' tree node, cf. code).

Tobias

On 23.03.23 10:28, Tobias Burnus wrote:

[...]

Another update - fixing an independent issue which makes sense to be
part of this patch.

Allocatable/pointer scalars are currently mapped as:

 #pragma omp target enter data map(to:*var.1_1 [len: 4]) map(alloc:var
[pointer assign, bias: 0])
 #pragma omp target exit data map(from:*var.2_2 [len: 4])

where 'GOMP_MAP_POINTER' is removed in gimplify.cc. In v3 (and v4) of this
patch, this kind of handling moved from gimplify.cc to
fortran/trans-openmp.cc; however, v3 has the same problem. For allocatable
arrays, we have PSET + POINTER and the PSET part is changed/set to
RELEASE/DELETE for 'exit data'.

But for scalars, the map was still left on the stack. Besides having a stale
map, this could lead to failures when the stack was popped, especially when
attempting to later map another stack variable with the same stack address,
partially overlapping with the stale POINTER.

Side remark:
I found this with a testcase that is part of an upcoming deep-mapping
follow-up patch; that test failed with -O1 but worked with -O0/-Og due to
changed stack usage.
(Deep-mapping of allocatable components is on the OG12 branch; it is
scheduled for mainline integration after stage1 opened.)


The updated mainline patch is included; map-10.f90 is the new testcase.
If anyone wants to see it separately, the patch has been committed to OG12 as
https://gcc.gnu.org/g:8ea805840200f7dfd2c11b37abf5fbfe479c2fe2

Comments/thoughts/remarks to this patch?

Tobias

PS: For the rest of the patch, see the short description below - or the
longer remarks earlier in this thread.

On 27.02.23 13:15, Tobias Burnus wrote:

And another re-diff for GCC 13/mainline, updating gcc/testsuite/

(The last change is related to the "[OG12,committed] Update dg-dump-scan
for ..." discussion + OG12 https://gcc.gnu.org/g:e4de87a2309 /
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612871.html )

On 23.02.23 17:42, Tobias Burnus wrote:

On 21.02.23 12:57, Tobias Burnus wrote:

This patch moves some generic code for Fortran out of gimplify.cc
to trans-openmp.cc and fixes several issues related to mapping.

Tested with nvptx offloading.
OK for mainline?

Tobias

Caveats:

Besides the issues shown in the commented-out code, there also remains an
issue with implicit mapping - at least for deferred-length strings,
but I wouldn't be surprised if - at least depending on the used
'defaultmap' value (e.g. 'alloc') - there are also issues with array
descriptors.

Note:

Regarding the declare target check for mapping: Without declare
target, my assumption is that the hidden length variable will
get implicitly mapped if needed. Independent of deferred-length
or not, there is probably an issue with 'defaultmap(none)' and
the hidden variable. - In any case, I prefer to defer all those
issues to later (by having them captured in one/several PR).


Tobias

PS: This patch is a follow up to
  [Patch] Fortran/OpenMP: Fix DT struct-component with 'alloc' and array descr
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604887.html
which fixed part of the problems. But as discussed on IRC, it did treat
'alloc' as special and missed some other map types. - In addition, this patch
has a much extended test coverage and fixes some more issues found that way.

-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
München; limited liability company (GmbH); Managing Directors: Thomas Heurung,
Frank Thürauf; Registered office: München; Court of registration: München,
HRB 106955

commit 80bb0b8a81fdc5d0a1c88ae3febd593868daa752
Author: Tobias Burnus 
Date:   Wed May 17 12:28:14 2023 +0200

Fortran/OpenMP: Fix mapping of array descriptors and deferred-length strings

Previously, array descriptors might have been mapped as 'alloc'
instead of 'to' for 'alloc', not updating the array bounds. The
'alloc' could also appear for 'data exit', failing with a libgomp
assert. In some cases, either array descriptors or deferred-length
string's length variable was not mapped. And, finally, some offset
calculations with array-sections mappings went wrong.

Additionally, the patch now unmaps for scalar allocatables/pointers
the GOMP_MAP_POINTER, avoiding stale mappings.

The testcases contain some commented-out tests which require follow-up
work and for which PRs exist. Those mostly relate to deferred-length
strings which have several issues beyond OpenMP support.

gcc/fortran/ChangeLog:

* trans-decl.cc (gfc_get_symbol_decl): Add attributes
such as 'declare target' also to hidden artificial
variable for deferred-length character

Re: [Ping][PATCH] libstdc++: Add missing functions to [PR79700]

2023-05-17 Thread Jonathan Wakely via Gcc-patches

On Wed, 17 May 2023 at 10:38, Nathaniel Shead wrote:

> On Wed, May 17, 2023 at 10:05:59AM +0100, Jonathan Wakely wrote:
> > On Wed, 17 May 2023 at 09:37, Nathaniel Shead wrote:
> >
> > > Now that GCC13.1 is released is it ok to merge? Thanks!
> > >
> >
> > Yes, I've been testing this locally, but I think it needs more work
> > (sorry!)
> >
> > Looking at it again, I'm not sure why I asked for the additional tests
> > because if they fail, it's a problem in libc, and there's nothing we can
> > actually do about it in libstdc++. We certainly do want std::expl(0.0L)
> > to return the same thing as std::exp(0.0L), but if it doesn't, we'll just
> > have a libstdc++ test failure caused by a bug in libc. But you wrote the
> > test now, so let's keep it. If we get failures for the test it will allow
> > us to inform the relevant libc maintainers that they have a bug.
>
> Sounds good.
>
> > Also, since you're contributing this under the DCO terms the new test
> > should not have the FSF copyright header, unless it's a derived work of
> > an existing test with that header (and in that case it should retain the
> > dates from the copied test). I don't actually bother putting the copyright
> > and license header on new tests these days. There's nothing in that test
> > that is novel or interesting, and I think it's arguably not useful or
> > meaningful to consider it copyrighted.
>
> Makes sense, I was just copying from other tests in the directory. I'll
> keep this in mind for the future, thanks!
>

Yeah, we have a mix of tests using the old conventions (with copyright and
GPL headers) and new conventions (don't bother, they're not really
meaningful on tests).

We're unlikely to *remove* the copyright notices from the old tests,
because that would require all sorts of legal wrangling, and it's not clear
that the copyright holder (the FSF) would agree to it anyway.





> > Finally, and most importantly, the new using-declarations in  are
> > not guarded by any autoconf macro. That will break targets without full
> > C99  support, e.g. djgpp declares acosf but not acosl, so the new
> > "using acosl;" would be a hard error as soon as  is included (and
> > might even prevent GCC building on that target). So I think we need a new
> > autoconf check for the existence of those functions. I'm in the process
> > of reworking the autoconf macros for  (due to PR 109818), which is
> > why I didn't address it for this patch yet.
>
> Ah, I see; yes, that would be a problem. I'm not very familiar with
> autoconf, so thanks for working this out. Let me know when you've done
> that if there's anything else I should do for this patch.
>

I hope to have an updated patch by next week, so I'll let you know once
that's ready. Thanks for your patience and for pinging the patch.



> > >
> > > On Tue, Apr 18, 2023 at 6:48 PM Jonathan Wakely 
> > > wrote:
> > > >
> > > > On Mon, 17 Apr 2023 at 09:11, Nathaniel Shead <
> nathanielosh...@gmail.com>
> > > wrote:
> > > > >
> > > > > Hi, just checking whether there were any issues with this patch?
> > > > >
> https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612386.html
> > > > >
> > > > > Otherwise I assume it won't be in GCC13.
> > > >
> > > > That's right, it's too large and invasive a change to get into GCC 13
> > > > when only submitted in February, sorry. I'll merge it to trunk once
> > > > GCC 13.1 is released though.
> > > >
> > >
> > >
>
>

Re: [PATCH] RISC-V: Add mode switching target hook to insert rounding mode config for fixed-point instructions

2023-05-17 Thread Kito Cheng via Gcc-patches

LGTM, it's really awesome. I know it's kind of blocked due to the enum
stuff, so feel free to commit this once it's unblocked :)

On Wed, May 17, 2023 at 5:58 PM  wrote:
>
> From: Juzhe-Zhong 
>
> Hi, this patch supports the upcoming fixed-point intrinsics:
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/222
>
> Insert the fixed-point rounding mode configuration via the mode switching
> target hook.
>
> The mode switching target hook is implemented by applying LCM (Lazy Code
> Motion).
>
> So both performance and correctness can be well trusted.
>
> Here is the example:
>
> void f (void * in, void *out, int32_t x, int n, int m)
> {
>   for (int i = 0; i < n; i++) {
> vint32m1_t v = __riscv_vle32_v_i32m1 (in + i, 4);
> vint32m1_t v2 = __riscv_vle32_v_i32m1_tu (v, in + 100 + i, 4);
> vint32m1_t v3 = __riscv_vaadd_vx_i32m1 (v2, 0, VXRM_RDN, 4);
> v3 = __riscv_vaadd_vx_i32m1 (v3, 3, VXRM_RDN, 4);
> __riscv_vse32_v_i32m1 (out + 100 + i, v3, 4);
>   }
>
>   for (int i = 0; i < n; i++) {
> vint32m1_t v = __riscv_vle32_v_i32m1 (in + i + 1000, 4);
> vint32m1_t v2 = __riscv_vle32_v_i32m1_tu (v, in + 100 + i + 1000, 4);
> vint32m1_t v3 = __riscv_vaadd_vx_i32m1 (v2, 0, VXRM_RDN, 4);
> v3 = __riscv_vaadd_vx_i32m1 (v3, 3, VXRM_RDN, 4);
> __riscv_vse32_v_i32m1 (out + 100 + i + 1000, v3, 4);
>   }
> }
>
> ASM:
>
> ...
> csrwi   vxrm,2
> vsetivli        zero,4,e32,m1,tu,ma
> ...
> Loop 1
> ...
> Loop 2
>
> Mode switching can globally recognize that both Loop 1 and Loop 2 use the RDN
> rounding mode and hoist a single "csrwi vxrm,2" to dominate both Loop 1
> and Loop 2.
>
> Besides, I have added correctness sanity tests in this patch too.
>
> Ok for trunk?
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-opts.h (enum riscv_entity): New enum.
> * config/riscv/riscv.cc (riscv_emit_mode_set): New function.
> (riscv_mode_needed): Ditto.
> (riscv_mode_after): Ditto.
> (riscv_mode_entry): Ditto.
> (riscv_mode_exit): Ditto.
> (riscv_mode_priority): Ditto.
> (TARGET_MODE_EMIT): New target hook.
> (TARGET_MODE_NEEDED): Ditto.
> (TARGET_MODE_AFTER): Ditto.
> (TARGET_MODE_ENTRY): Ditto.
> (TARGET_MODE_EXIT): Ditto.
> (TARGET_MODE_PRIORITY): Ditto.
> * config/riscv/riscv.h (OPTIMIZE_MODE_SWITCHING): Ditto.
> (NUM_MODES_FOR_MODE_SWITCHING): Ditto.
> * config/riscv/riscv.md: Add csrwvxrm.
> * config/riscv/vector.md (rnu,rne,rdn,rod,none): New attribute.
> (vxrmsi): New pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/vxrm-10.c: New test.
> * gcc.target/riscv/rvv/base/vxrm-6.c: New test.
> * gcc.target/riscv/rvv/base/vxrm-7.c: New test.
> * gcc.target/riscv/rvv/base/vxrm-8.c: New test.
> * gcc.target/riscv/rvv/base/vxrm-9.c: New test.
>
> ---
>  gcc/config/riscv/riscv-opts.h |   8 ++
>  gcc/config/riscv/riscv.cc | 104 ++
>  gcc/config/riscv/riscv.h  |   6 +-
>  gcc/config/riscv/riscv.md |   3 +-
>  gcc/config/riscv/vector.md|  29 +
>  .../gcc.target/riscv/rvv/base/vxrm-10.c   |  26 +
>  .../gcc.target/riscv/rvv/base/vxrm-6.c|  15 +++
>  .../gcc.target/riscv/rvv/base/vxrm-7.c|  16 +++
>  .../gcc.target/riscv/rvv/base/vxrm-8.c|  18 +++
>  .../gcc.target/riscv/rvv/base/vxrm-9.c|  26 +
>  10 files changed, 249 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-10.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-6.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-7.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-8.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-9.c
>
> diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
> index 1b2e6de5e1b..2a16402265a 100644
> --- a/gcc/config/riscv/riscv-opts.h
> +++ b/gcc/config/riscv/riscv-opts.h
> @@ -91,6 +91,14 @@ enum riscv_multilib_select_kind {
>select_by_abi,
>  };
>
> +/* ENTITIES in mode switching.  */
> +enum riscv_entity
> +{
> +  RISCV_VXRM = 0,
> +  RISCV_FRM,
> +  MAX_RISCV_ENTITIES
> +};
> +
>  #define MASK_ZICSR    (1 << 0)
>  #define MASK_ZIFENCEI (1 << 1)
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index de5b87b1a87..0d1b83f4315 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -7513,6 +7513,95 @@ riscv_vectorize_preferred_vector_alignment (const_tree type)
>return TYPE_ALIGN (type);
>  }
>
> +/* Implement Mode switching.  */
> +
> +static void
> +riscv_emit_mode_set (int entity, int mode, int prev_mode,
> +HARD_REG_SET regs_live ATTRIBUTE_UNUSED)
> +{
> +  switch (entity)
> +{
> +case RISCV_VXRM:
> +  if (mode != VXRM_MODE_NONE && mode != prev_mode)
> +

Re: [committed] libstdc++: Disable cacheline alignment for DJGPP [PR109741]

2023-05-17 Thread Jonathan Wakely via Gcc-patches

On Wed, 17 May 2023 at 10:32, Martin Jambor wrote:

> Hello,
>
> On Tue, May 16 2023, Jonathan Wakely via Gcc-patches wrote:
> > Tested powerpc64le-linux. Builds OK on djgpp too.
> >
> > Pushed to trunk.
> >
> > -- >8 --
> >
> > DJGPP (and maybe other targets) uses MAX_OFILE_ALIGNMENT=16 which means
> > that globals (and static objects) can't have alignment greater than 16.
> > This causes an error for the locks defined in src/c++11/shared_ptr.cc
> > because we try to align them to the cacheline size, to avoid false
> > sharing.
> >
> > Add a configure check for the increased alignment, and live with false
> > sharing where we can't increase the alignment.
> >
> > libstdc++-v3/ChangeLog:
> >
> >   PR libstdc++/109741
> >   * acinclude.m4 (GLIBCXX_CHECK_ALIGNAS_CACHELINE): Define.
> >   * config.h.in: Regenerate.
> >   * configure: Regenerate.
> >   * configure.ac: Use GLIBCXX_CHECK_ALIGNAS_CACHELINE.
> >   * src/c++11/shared_ptr.cc (__gnu_internal::get_mutex): Do not
> >   align lock table if not supported. use __GCC_DESTRUCTIVE_SIZE
> >   instead of hardcoded 64.
>
> The periodic tests that Martin Liška left behind for me (for now) to
> look at are now complaining that the configure and configure.ac in
> libstdc++ are not in sync, the difference being (drum roll):
>
> diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
> index 188be08d716..412c4bf0e85 100755
> --- a/libstdc++-v3/configure
> +++ b/libstdc++-v3/configure
> @@ -71957,6 +71957,7 @@ _ACEOF
>fi
>
>
> +# For src/c++11/shared_ptr.cc alignment.
>
>
>ac_ext=cpp
>


Oh no!  I was worried I'd actually broken something when I saw the email
land in my inbox :-)

Looks like I added that comment to configure.ac and then forgot to re-run
autoreconf.



> Can I commit this change or do I need to attempt to adjust the script to
> ignore comments in configure or what would be the correct way to deal
> with this? (Option requiring work on my side may take some time during
> which the otherwise useful tests would not work, I am afraid).
>
>
The "correct" fix is to run autoreconf in the libstdc++-v3 dir to regen the
configure script, but for something this trivial it would also be fine to
just make the change manually.

I've pushed the regenerated file now, as r14-930-g7ddbc6171b383c

Thanks for running those checks!

[PATCH] RISC-V: Add mode switching target hook to insert rounding mode config for fixed-point instructions

2023-05-17 Thread juzhe . zhong

From: Juzhe-Zhong 

Hi, this patch supports the upcoming fixed-point intrinsics:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/222

Insert the fixed-point rounding mode configuration via the mode switching
target hook.

The mode switching target hook is implemented by applying LCM (Lazy Code
Motion).

So both performance and correctness can be well trusted.

Here is the example:

void f (void * in, void *out, int32_t x, int n, int m)
{
  for (int i = 0; i < n; i++) {
    vint32m1_t v = __riscv_vle32_v_i32m1 (in + i, 4);
    vint32m1_t v2 = __riscv_vle32_v_i32m1_tu (v, in + 100 + i, 4);
    vint32m1_t v3 = __riscv_vaadd_vx_i32m1 (v2, 0, VXRM_RDN, 4);
    v3 = __riscv_vaadd_vx_i32m1 (v3, 3, VXRM_RDN, 4);
    __riscv_vse32_v_i32m1 (out + 100 + i, v3, 4);
  }

  for (int i = 0; i < n; i++) {
    vint32m1_t v = __riscv_vle32_v_i32m1 (in + i + 1000, 4);
    vint32m1_t v2 = __riscv_vle32_v_i32m1_tu (v, in + 100 + i + 1000, 4);
    vint32m1_t v3 = __riscv_vaadd_vx_i32m1 (v2, 0, VXRM_RDN, 4);
    v3 = __riscv_vaadd_vx_i32m1 (v3, 3, VXRM_RDN, 4);
    __riscv_vse32_v_i32m1 (out + 100 + i + 1000, v3, 4);
  }
}

ASM:

...
csrwi   vxrm,2
vsetivli        zero,4,e32,m1,tu,ma
...
Loop 1
...
Loop 2

Mode switching recognizes globally that both Loop 1 and Loop 2 use the RDN
rounding mode, and hoists a single "csrwi vxrm,2" so that it dominates both
Loop 1 and Loop 2.
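
As a rough illustration of why only one write is needed, here is a minimal sketch in plain C++ (not GCC code). It applies the same condition that riscv_emit_mode_set uses in this patch: a mode set is emitted only when the needed mode is not "none" and differs from the previous mode. The names Vxrm and count_mode_sets are hypothetical, and the model covers only a straight-line sequence of instructions, whereas the real pass works on the control-flow graph.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the VXRM modes. The numeric values follow the
// vxrm encoding (rnu=0, rne=1, rdn=2, rod=3), plus an extra "none" entry
// for instructions that do not care about the rounding mode.
enum Vxrm { VXRM_RNU = 0, VXRM_RNE = 1, VXRM_RDN = 2, VXRM_ROD = 3, VXRM_NONE = 4 };

// Count how many "csrwi vxrm,<mode>" writes a straight-line sequence needs
// when a write is emitted only if the needed mode is not NONE and differs
// from the mode currently in effect.
std::size_t count_mode_sets(const std::vector<Vxrm>& needed)
{
  std::size_t sets = 0;
  Vxrm current = VXRM_NONE;  // nothing is known on entry
  for (Vxrm mode : needed) {
    if (mode != VXRM_NONE && mode != current) {
      ++sets;
      current = mode;
    }
  }
  return sets;
}
```

With every vaadd in both loops needing RDN, the sequence collapses to a single write, which matches the single "csrwi vxrm,2" in the ASM above.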

I have also added sanity tests that check correctness in this patch.

Ok for trunk ?

gcc/ChangeLog:

* config/riscv/riscv-opts.h (enum riscv_entity): New enum.
* config/riscv/riscv.cc (riscv_emit_mode_set): New function.
(riscv_mode_needed): Ditto.
(riscv_mode_after): Ditto.
(riscv_mode_entry): Ditto.
(riscv_mode_exit): Ditto.
(riscv_mode_priority): Ditto.
(TARGET_MODE_EMIT): New target hook.
(TARGET_MODE_NEEDED): Ditto.
(TARGET_MODE_AFTER): Ditto.
(TARGET_MODE_ENTRY): Ditto.
(TARGET_MODE_EXIT): Ditto.
(TARGET_MODE_PRIORITY): Ditto.
* config/riscv/riscv.h (OPTIMIZE_MODE_SWITCHING): Ditto.
(NUM_MODES_FOR_MODE_SWITCHING): Ditto.
* config/riscv/riscv.md: Add csrwvxrm.
* config/riscv/vector.md (rnu,rne,rdn,rod,none): New attribute.
(vxrmsi): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/vxrm-10.c: New test.
* gcc.target/riscv/rvv/base/vxrm-6.c: New test.
* gcc.target/riscv/rvv/base/vxrm-7.c: New test.
* gcc.target/riscv/rvv/base/vxrm-8.c: New test.
* gcc.target/riscv/rvv/base/vxrm-9.c: New test.

---
 gcc/config/riscv/riscv-opts.h |   8 ++
 gcc/config/riscv/riscv.cc | 104 ++
 gcc/config/riscv/riscv.h  |   6 +-
 gcc/config/riscv/riscv.md |   3 +-
 gcc/config/riscv/vector.md|  29 +
 .../gcc.target/riscv/rvv/base/vxrm-10.c   |  26 +
 .../gcc.target/riscv/rvv/base/vxrm-6.c|  15 +++
 .../gcc.target/riscv/rvv/base/vxrm-7.c|  16 +++
 .../gcc.target/riscv/rvv/base/vxrm-8.c|  18 +++
 .../gcc.target/riscv/rvv/base/vxrm-9.c|  26 +
 10 files changed, 249 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vxrm-9.c

diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 1b2e6de5e1b..2a16402265a 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -91,6 +91,14 @@ enum riscv_multilib_select_kind {
   select_by_abi,
 };
 
+/* ENTITIES in mode switching.  */
+enum riscv_entity
+{
+  RISCV_VXRM = 0,
+  RISCV_FRM,
+  MAX_RISCV_ENTITIES
+};
+
 #define MASK_ZICSR    (1 << 0)
 #define MASK_ZIFENCEI (1 << 1)
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index de5b87b1a87..0d1b83f4315 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7513,6 +7513,95 @@ riscv_vectorize_preferred_vector_alignment (const_tree type)
   return TYPE_ALIGN (type);
 }
 
+/* Implement Mode switching.  */
+
+static void
+riscv_emit_mode_set (int entity, int mode, int prev_mode,
+HARD_REG_SET regs_live ATTRIBUTE_UNUSED)
+{
+  switch (entity)
+{
+case RISCV_VXRM:
+  if (mode != VXRM_MODE_NONE && mode != prev_mode)
+   emit_insn (gen_vxrmsi (gen_int_mode (mode, SImode)));
+  break;
+default:
+  gcc_unreachable ();
+}
+}
+
+/* Return mode that entity must be switched into
+   prior to the execution of insn.  */
+
+static int
+riscv_mode_needed (int entity, rtx_insn *insn)
+{
+  switch (entity)
+{
+case RISCV_VXRM:
+  return recog_memoized (insn) >= 0 ? get_attr_vxrm_mode (insn)
+

Re: [Ping][PATCH] libstdc++: Add missing functions to <cmath> [PR79700]

2023-05-17 Thread Nathaniel Shead via Gcc-patches

On Wed, May 17, 2023 at 10:05:59AM +0100, Jonathan Wakely wrote:
> On Wed, 17 May 2023 at 09:37, Nathaniel Shead wrote:
> 
> > Now that GCC13.1 is released is it ok to merge? Thanks!
> >
> 
> Yes, I've been testing this locally, but I think it needs more work (sorry!)
> 
> Looking at it again, I'm not sure why I asked for the additional tests
> because if they fail, it's a problem in libc, and there's nothing we can
> actually do about it in libstdc++. We certainly do want std::expl(0.0L) to
> return the same thing as std::exp(0.0L), but if it doesn't, we'll just have
> a libstdc++ test failure caused by a bug in libc. But you wrote the test
> now, so let's keep it. If we get failures for the test it will allow us to
> inform the relevant libc maintainers that they have a bug.

Sounds good.

> Also, since you're contributing this under the DCO terms the new test
> should not have the FSF copyright header, unless it's a derived work of an
> existing test with that header (and in that case it should retain the dates
> from the copied test). I don't actually bother putting the copyright and
> license header on new tests these days. There's nothing in that test that
> is novel or interesting, and I think it's arguably not useful or meaningful
> to consider it copyrighted.

Makes sense, I was just copying from other tests in the directory. I'll
keep this in mind for the future, thanks!

> Finally, and most importantly, the new using-declarations in <cmath> are
> not guarded by any autoconf macro. That will break targets without full C99
> <math.h> support, e.g. djgpp declares acosf but not acosl, so the new
> "using acosl;" would be a hard error as soon as <cmath> is included (and
> might even prevent GCC building on that target). So I think we need a new
> autoconf check for the existence of those functions. I'm in the process of
> reworking the autoconf macros for <cmath> (due to PR 109818), which is why
> I didn't address it for this patch yet.

Ah, I see; yes, that would be a problem. I'm not very familiar with
autoconf, so thanks for working this out. Let me know when you've done
that if there's anything else I should do for this patch.

> >
> > On Tue, Apr 18, 2023 at 6:48 PM Jonathan Wakely 
> > wrote:
> > >
> > > On Mon, 17 Apr 2023 at 09:11, Nathaniel Shead 
> > wrote:
> > > >
> > > > Hi, just checking whether there were any issues with this patch?
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612386.html
> > > >
> > > > Otherwise I assume it won't be in GCC13.
> > >
> > > That's right, it's too large and invasive a change to get into GCC 13
> > > when only submitted in February, sorry. I'll merge it to trunk once
> > > GCC 13.1 is released though.
> > >
> >
> >

Re: [committed] libstdc++: Disable cacheline alignment for DJGPP [PR109741]

2023-05-17 Thread Martin Jambor

Hello,

On Tue, May 16 2023, Jonathan Wakely via Gcc-patches wrote:
> Tested powerpc64le-linux. Builds OK on djgpp too.
>
> Pushed to trunk.
>
> -- >8 --
>
> DJGPP (and maybe other targets) uses MAX_OFILE_ALIGNMENT=16 which means
> that globals (and static objects) can't have alignment greater than 16.
> This causes an error for the locks defined in src/c++11/shared_ptr.cc
> because we try to align them to the cacheline size, to avoid false
> sharing.
>
> Add a configure check for the increased alignment, and live with false
> sharing where we can't increase the alignment.
>
> libstdc++-v3/ChangeLog:
>
>   PR libstdc++/109741
>   * acinclude.m4 (GLIBCXX_CHECK_ALIGNAS_CACHELINE): Define.
>   * config.h.in: Regenerate.
>   * configure: Regenerate.
>   * configure.ac: Use GLIBCXX_CHECK_ALIGNAS_CACHELINE.
>   * src/c++11/shared_ptr.cc (__gnu_internal::get_mutex): Do not
>   align lock table if not supported. use __GCC_DESTRUCTIVE_SIZE
>   instead of hardcoded 64.

The periodic tests that Martin Liška left behind for me (for now) to
look at are now complaining that the configure and configure.ac in
libstdc++ are not in sync, the difference being (drum roll):

diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
index 188be08d716..412c4bf0e85 100755
--- a/libstdc++-v3/configure
+++ b/libstdc++-v3/configure
@@ -71957,6 +71957,7 @@ _ACEOF
   fi
 
 
+# For src/c++11/shared_ptr.cc alignment.
 
 
   ac_ext=cpp


Can I commit this change or do I need to attempt to adjust the script to
ignore comments in configure or what would be the correct way to deal
with this? (Option requiring work on my side may take some time during
which the otherwise useful tests would not work, I am afraid).

Thanks,

Martin

RE: [GCC12 backport] arm: MVE testsuite and backend bugfixes

2023-05-17 Thread Kyrylo Tkachov via Gcc-patches

Hi Stam,

> -Original Message-
> From: Stam Markianos-Wright 
> Sent: Tuesday, May 16, 2023 2:32 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [GCC12 backport] arm: MVE testsuite and backend bugfixes
> 
> Hi all,
> 
> We've recently sent up a lot of patches overhauling the testsuite of the
> Arm MVE backend.
> With these changes, we've also identified and fixed a number of bugs
> (some backend bugs and many to do with the polymorphism of intrinsics in
> the MVE header file).
> These would all be relevant to backport to GCC12.
> The list is as follows (in the order they all apply on top of each other):
> 
> * This patch series:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606552.html
> (commits 9a79b522e0663a202a288db56ebcbdcdb48bdaca to
> f2b54e5b796b00f0072b61f9cd6a964c66ead29b)
> * ecc363971aeac52481d92de8b37521f6cc2d38e6 arm: Fix MVE testsuite
> fallouts
> * 06aa66af7d0dacc1b247d9e38175e789ef159191 arm: Add missing early
> clobber to MVE vrev64q_m patterns
> * c09663eabfb84ac56ddd8d44abcab3f4902c83bd testsuite: [arm] Relax
> expected register names in MVE tests
> * 330d665ce6dcc63ed0bd78d807e69bbfc55255b6 arm: [MVE] Add missing
> length=8 attribute
> * 8d4f007398bc3f8fea812fb8cff4d7d0556d12f1 arm: fix mve intrinsics scan
> body tests for C++
> * This patch series
> https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610312.html
> (commits dd4424ef898608321b60610c4f3c98737ace3680 to
> 267f01a493ab8a0bec9325ce3386b946c46f2e98)
> * 8a1360e72d6c6056606aa5edd8c906c50f26de59 arm: Split up MVE _Generic
> associations to prevent type clashes [PR107515]
> * 3f0ca7a3e4431534bff3b8eb73709cc822e489b0 arm: Fix vcreate definition
> * c1093923733a1072a237f112e3239b5ebd88eadd arm: Make MVE masked
> stores
> read memory operand [PR 108177]
> * f54e31ddefe3ea7146624eabcb75b1c90dc59f1a arm: fix __arm_vld1q_z*
> and
> __arm_vst1q_p* intrinsics [PR108442]
> * 1d509f190393627cdf0afffc427b25dd21c2 arm: remove unused variables
> from test
> 

Ok to backport.

> -- up to this point everything applied cleanly. The final two need minor
> rebasing changes --
> 
> * This patch series:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-April/617008.html (Not
> pushed to trunk yet, but has been approved. For trunk we do now need to
> resolve some merge conflicts, since Christophe has started merging the
> MVE Intrinsic Restructuring, but these are trivial. I will also backport
> to GCC13 where this patch series applies cleanly)
> * cfa118fc089e38a94ec60ccf5b667aea015e5f60 [arm] complete vmsr/vmrs
> blank and case adjustments.
> 
> The final one is a commit from Alexandre Oliva that is needed to ensure
> that we don't accidentally regress the test due to the tabs vs spaces
> and capitalisation on the vmrs/vmsr instructions :)
> 
> After all that, no regressions on baremetal arm-none-eabi in a bunch of
> configurations (-marm, thumb1, thumb2, MVE, MVE.FP, softfp and hardfp):
> 

Will you be sending these to the list after adjusting?
Thanks,
Kyrill

> Thanks,
> Stam

[PATCH 13-backport] riscv/linux: Don't add -latomic with -pthread

2023-05-17 Thread Bo YU via Gcc-patches


Hi,

I just want to backport the commit to gcc-13 branch:

commit 203f3060dd363361b172f7295f42bb6bf5ac0b3b
Author: Andreas Schwab 
Date:   Sat Apr 23 15:48:42 2022 +0200

riscv/linux: Don't add -latomic with -pthread

Now that we have support for inline subword atomic operations, it is no
longer necessary to link against libatomic.  This also fixes testsuite
failures because the framework does not properly set up the linker flags
for finding libatomic.
The use of atomic operations is also independent of the use of libpthread.

gcc/
* config/riscv/linux.h (LIB_SPEC): Don't redefine.

The discussion is here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104338#c20
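
For readers unfamiliar with the term, the following is a minimal sketch of a "subword" atomic operation as mentioned in the commit message: an atomic read-modify-write on an object smaller than a machine word. The function name count_to is hypothetical and used only for illustration. On a target whose atomic instructions only handle word-sized objects, an operation like this becomes a library call into libatomic unless the compiler expands it inline, which is the situation the commit message describes.

```cpp
#include <atomic>
#include <cstdint>

// Increment a one-byte atomic counter `times` times and return its value.
// The counter is 8 bits wide, so it wraps around modulo 256.
std::uint8_t count_to(unsigned times)
{
  std::atomic<std::uint8_t> counter{0};
  for (unsigned i = 0; i < times; ++i)
    counter.fetch_add(1, std::memory_order_relaxed);
  return counter.load(std::memory_order_relaxed);
}
```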


--
Regards,
--
  Bo YU

From d376ec41a9affa946df4676c3bf81118d122f281 Mon Sep 17 00:00:00 2001
From: Andreas Schwab 
Date: Sat, 23 Apr 2022 15:48:42 +0200
Subject: [PATCH 13-backport] riscv/linux: Don't add -latomic with -pthread

Now that we have support for inline subword atomic operations, it is no
longer necessary to link against libatomic.  This also fixes testsuite
failures because the framework does not properly set up the linker flags
for finding libatomic.
The use of atomic operations is also independent of the use of libpthread.

gcc/
	* config/riscv/linux.h (LIB_SPEC): Don't redefine.

(cherry picked from commit 203f3060dd363361b172f7295f42bb6bf5ac0b3b)
---
 gcc/config/riscv/linux.h | 10 --
 1 file changed, 10 deletions(-)

diff --git a/gcc/config/riscv/linux.h b/gcc/config/riscv/linux.h
index b9557a75dc7..2fdfd930cf2 100644
--- a/gcc/config/riscv/linux.h
+++ b/gcc/config/riscv/linux.h
@@ -35,16 +35,6 @@ along with GCC; see the file COPYING3.  If not see
 #undef MUSL_DYNAMIC_LINKER
 #define MUSL_DYNAMIC_LINKER "/lib/ld-musl-riscv" XLEN_SPEC MUSL_ABI_SUFFIX ".so.1"
 
-/* Because RISC-V only has word-sized atomics, it requries libatomic where
-   others do not.  So link libatomic by default, as needed.  */
-#undef LIB_SPEC
-#ifdef LD_AS_NEEDED_OPTION
-#define LIB_SPEC GNU_USER_TARGET_LIB_SPEC \
-  " %{pthread:" LD_AS_NEEDED_OPTION " -latomic " LD_NO_AS_NEEDED_OPTION "}"
-#else
-#define LIB_SPEC GNU_USER_TARGET_LIB_SPEC " -latomic "
-#endif
-
 #define ICACHE_FLUSH_FUNC "__riscv_flush_icache"
 
 #define CPP_SPEC "%{pthread:-D_REENTRANT}"
-- 
2.39.2




[PATCH] RISC-V: Remove trailing spaces on lines.

2023-05-17 Thread Jin Ma via Gcc-patches

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Remove
trailing spaces on lines.
* config/riscv/riscv.cc (riscv_legitimize_move): Likewise.
* config/riscv/riscv.h (enum reg_class): Likewise.
* config/riscv/riscv.md: Likewise.
---
 gcc/common/config/riscv/riscv-common.cc | 2 +-
 gcc/config/riscv/riscv.cc   | 6 +++---
 gcc/config/riscv/riscv.h| 2 +-
 gcc/config/riscv/riscv.md   | 4 ++--
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 3a285dfbff0..e46ddf78132 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -104,7 +104,7 @@ static const riscv_implied_info_t riscv_implied_info[] =
 
   {"zfh", "zfhmin"},
   {"zfhmin", "f"},
-  
+
   {"zhinx", "zhinxmin"},
   {"zhinxmin", "zfinx"},
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index b52e613c629..1eb3e142905 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2166,8 +2166,8 @@ riscv_legitimize_move (machine_mode mode, rtx dest, rtx src)
}
   return true;
 }
-  /* Expand 
-   (set (reg:QI target) (mem:QI (address))) 
+  /* Expand
+   (set (reg:QI target) (mem:QI (address)))
  to
(set (reg:DI temp) (zero_extend:DI (mem:QI (address))))
(set (reg:QI target) (subreg:QI (reg:DI temp) 0))
@@ -2182,7 +2182,7 @@ riscv_legitimize_move (machine_mode mode, rtx dest, rtx src)
 
   temp_reg = gen_reg_rtx (word_mode);
   zero_extend_p = (LOAD_EXTEND_OP (mode) == ZERO_EXTEND);
-  emit_insn (gen_extend_insn (temp_reg, src, word_mode, mode, 
+  emit_insn (gen_extend_insn (temp_reg, src, word_mode, mode,
  zero_extend_p));
   riscv_emit_move (dest, gen_lowpart (mode, temp_reg));
   return true;
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index f74b70de562..70087d011f4 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -575,7 +575,7 @@ enum reg_class
 #define POLY_SMALL_OPERAND_P(POLY_VALUE)   \
   (POLY_VALUE.is_constant () ? \
  SMALL_OPERAND (POLY_VALUE.to_constant ()) : false)
- 
+
 /* True if VALUE can be loaded into a register using LUI.  */
 
 #define LUI_OPERAND(VALUE) \
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index c5cf3af9868..f47ebb3a829 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -222,7 +222,7 @@ (define_attr "ext" "base,f,d,vector"
 (define_attr "ext_enabled" "no,yes"
   (cond [(eq_attr "ext" "base")
 (const_string "yes")
-   
+
 (and (eq_attr "ext" "f")
  (match_test "TARGET_HARD_FLOAT"))
 (const_string "yes")
@@ -258,7 +258,7 @@ (define_attr "enabled" "no,yes"
 ;; logical  integer logical instructions
 ;; shift   integer shift instructions
 ;; slt set less than instructions
-;; imulinteger multiply 
+;; imulinteger multiply
 ;; idivinteger divide
 ;; moveinteger register move (addi rd, rs1, 0)
 ;; fmove   floating point register move
-- 
2.17.1

Re: [Ping][PATCH] libstdc++: Add missing functions to <cmath> [PR79700]

2023-05-17 Thread Jonathan Wakely via Gcc-patches

On Wed, 17 May 2023 at 09:37, Nathaniel Shead wrote:

> Now that GCC13.1 is released is it ok to merge? Thanks!
>

Yes, I've been testing this locally, but I think it needs more work (sorry!)

Looking at it again, I'm not sure why I asked for the additional tests
because if they fail, it's a problem in libc, and there's nothing we can
actually do about it in libstdc++. We certainly do want std::expl(0.0L) to
return the same thing as std::exp(0.0L), but if it doesn't, we'll just have
a libstdc++ test failure caused by a bug in libc. But you wrote the test
now, so let's keep it. If we get failures for the test it will allow us to
inform the relevant libc maintainers that they have a bug.

Also, since you're contributing this under the DCO terms the new test
should not have the FSF copyright header, unless it's a derived work of an
existing test with that header (and in that case it should retain the dates
from the copied test). I don't actually bother putting the copyright and
license header on new tests these days. There's nothing in that test that
is novel or interesting, and I think it's arguably not useful or meaningful
to consider it copyrighted.

Finally, and most importantly, the new using-declarations in <cmath> are
not guarded by any autoconf macro. That will break targets without full C99
<math.h> support, e.g. djgpp declares acosf but not acosl, so the new
"using acosl;" would be a hard error as soon as <cmath> is included (and
might even prevent GCC building on that target). So I think we need a new
autoconf check for the existence of those functions. I'm in the process of
reworking the autoconf macros for <cmath> (due to PR 109818), which is why
I didn't address it for this patch yet.
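
A minimal sketch of what such a guard looks like, assuming a configuration macro named SKETCH_HAVE_ACOSL (a hypothetical name; a real autoconf check would define a different macro, and this is not the libstdc++ implementation). The namespace sketch is likewise only for illustration.

```cpp
#include <math.h>

// Hypothetical configuration macro. In a real build this would come from
// a configure check rather than being defined unconditionally here.
#ifndef SKETCH_HAVE_ACOSL
# define SKETCH_HAVE_ACOSL 1
#endif

namespace sketch
{
  using ::acosf;
#if SKETCH_HAVE_ACOSL
  // Only named when the C library is known to declare it. Without the
  // guard, this using-declaration is a hard error on a libc that lacks
  // acosl, even if no caller ever uses it.
  using ::acosl;
#endif
}
```

The point of the guard is that a using-declaration must name something that is already declared, so the failure happens when the header is included, not when the function is called.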

>
> On Tue, Apr 18, 2023 at 6:48 PM Jonathan Wakely 
> wrote:
> >
> > On Mon, 17 Apr 2023 at 09:11, Nathaniel Shead 
> wrote:
> > >
> > > Hi, just checking whether there were any issues with this patch?
> > > https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612386.html
> > >
> > > Otherwise I assume it won't be in GCC13.
> >
> > That's right, it's too large and invasive a change to get into GCC 13
> > when only submitted in February, sorry. I'll merge it to trunk once
> > GCC 13.1 is released though.
> >
>
>

[PATCH] Fix type error of 'switch (SUBREG_BYTE (op)).'

2023-05-17 Thread Jin Ma via Gcc-patches

For example:
(define_insn "mov_lowpart_sidi2"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (subreg:SI (match_operand:DI 1 "register_operand" " r") 0))]
  "TARGET_64BIT"
  "mov\t%0,%1")

(define_insn "mov_highpart_sidi2"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (subreg:SI (match_operand:DI 1 "register_operand" " r") 1))]
  "TARGET_64BIT"
  "movh\t%0,%1")

When the above patterns are defined, the generated file insn-recog.cc will
contain 'switch (SUBREG_BYTE (op))', but since the type of SUBREG_BYTE is
poly_uint16_pod, the following error occurs:
"error: switch quantity not an integer".

gcc/ChangeLog:

* genrecog.cc (print_nonbool_test): Fix type error of
'switch (SUBREG_BYTE (op))'.
---
 gcc/genrecog.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/genrecog.cc b/gcc/genrecog.cc
index 6dd375da5e3..04a5533ca4b 100644
--- a/gcc/genrecog.cc
+++ b/gcc/genrecog.cc
@@ -4619,6 +4619,7 @@ print_nonbool_test (output_state *os, const rtx_test &test)
   printf ("SUBREG_BYTE (");
   print_test_rtx (os, test);
   printf (")");
+  printf (".to_constant ()");
   break;
 
 case rtx_test::WIDE_INT_FIELD:
-- 
2.17.1

Re: [Ping][PATCH] libstdc++: Add missing functions to <cmath> [PR79700]

2023-05-17 Thread Nathaniel Shead via Gcc-patches

Now that GCC13.1 is released is it ok to merge? Thanks!

On Tue, Apr 18, 2023 at 6:48 PM Jonathan Wakely  wrote:
>
> On Mon, 17 Apr 2023 at 09:11, Nathaniel Shead  
> wrote:
> >
> > Hi, just checking whether there were any issues with this patch?
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612386.html
> >
> > Otherwise I assume it won't be in GCC13.
>
> That's right, it's too large and invasive a change to get into GCC 13
> when only submitted in February, sorry. I'll merge it to trunk once
> GCC 13.1 is released though.
>

[committed] wide-int: Fix up function comment

2023-05-17 Thread Jakub Jelinek via Gcc-patches

Hi!

When looking into _BitInt support, I've noticed unterminated parens in
a function comment.
Fixing thusly.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk
as obvious.

2023-05-17  Jakub Jelinek  

* wide-int.cc (wi::from_array): Add missing closing paren in function
comment.

--- gcc/wide-int.cc.jj  2023-01-02 09:32:53.890830070 +0100
+++ gcc/wide-int.cc 2023-05-16 18:50:28.782323397 +0200
@@ -139,7 +139,7 @@ canonize_uhwi (HOST_WIDE_INT *val, unsig
 
 /* Copy XLEN elements from XVAL to VAL.  If NEED_CANON, canonize the
result for an integer with precision PRECISION.  Return the length
-   of VAL (after any canonization.  */
+   of VAL (after any canonization).  */
 unsigned int
 wi::from_array (HOST_WIDE_INT *val, const HOST_WIDE_INT *xval,
unsigned int xlen, unsigned int precision, bool need_canon)

Jakub

[PATCH] RISC-V: Support RVV VREINTERPRET from v{u}int*_t to vbool[2-64]_t

2023-05-17 Thread Pan Li via Gcc-patches

From: Pan Li 

This patch supports the RVV VREINTERPRET from the int to the
vbool[2|4|8|16|32|64]_t.  Aka:

vbool[2|4|8|16|32|64]_t __riscv_vreinterpret_x_x(v{u}int[8|16|32|64]_t);

These APIs help users convert a vector LMUL=1 integer to
vbool[2-64]_t.  According to the RVV intrinsic spec linked below,
the reinterpret intrinsics only change the types of the underlying
contents.

https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#reinterpret-vbool-o-vintm1

For example, given the code below:
vbool64_t test_vreinterpret_v_u8m1_b64 (vuint8m1_t src) {
  return __riscv_vreinterpret_v_u8m1_b64 (src);
}

It will generate assembly code similar to the following:
vsetvli a5,zero,e8,mf8,ta,ma
vlm.v   v1,0(a1)
vsm.v   v1,0(a0)
ret

Please NOTE that the test files don't cover all the possible combinations
of the intrinsic APIs introduced by this PATCH because there are too many.
The reinterpret from vbool*_t to v{u}int*_t with lmul=1 will be covered
in another PATCH.

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/genrvv-type-indexer.cc (BOOL_SIZE_LIST): Add the
rest bool size, aka 2, 4, 8, 16, 32, 64.
* config/riscv/riscv-vector-builtins-functions.def (vreinterpret):
Register vbool[2|4|8|16|32|64] interpret function.
* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_BOOL2_INTERPRET_OPS):
New macro for vbool2_t.
(DEF_RVV_BOOL4_INTERPRET_OPS): Likewise.
(DEF_RVV_BOOL8_INTERPRET_OPS): Likewise.
(DEF_RVV_BOOL16_INTERPRET_OPS): Likewise.
(DEF_RVV_BOOL32_INTERPRET_OPS): Likewise.
(DEF_RVV_BOOL64_INTERPRET_OPS): Likewise.
(vint8m1_t): Add the type to bool[2|4|8|16|32|64]_interpret_ops.
(vint16m1_t): Likewise.
(vint32m1_t): Likewise.
(vint64m1_t): Likewise.
(vuint8m1_t): Likewise.
(vuint16m1_t): Likewise.
(vuint32m1_t): Likewise.
(vuint64m1_t): Likewise.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_BOOL2_INTERPRET_OPS):
New macro for vbool2_t.
(DEF_RVV_BOOL4_INTERPRET_OPS): Likewise.
(DEF_RVV_BOOL8_INTERPRET_OPS): Likewise.
(DEF_RVV_BOOL16_INTERPRET_OPS): Likewise.
(DEF_RVV_BOOL32_INTERPRET_OPS): Likewise.
(DEF_RVV_BOOL64_INTERPRET_OPS): Likewise.
(required_extensions_p): Add vbool[2|4|8|16|32|64] interpret case.
* config/riscv/riscv-vector-builtins.def (bool2_interpret): Add
vbool2_t interprect to base type.
(bool4_interpret): Likewise.
(bool8_interpret): Likewise.
(bool16_interpret): Likewise.
(bool32_interpret): Likewise.
(bool64_interpret): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/misc_vreinterpret_vbool_vint.c: Add
test cases for vbool[2|4|8|16|32|64]_t.
---
 gcc/config/riscv/genrvv-type-indexer.cc   |   2 +-
 .../riscv/riscv-vector-builtins-functions.def |   6 +
 .../riscv/riscv-vector-builtins-types.def |  97 +++-
 gcc/config/riscv/riscv-vector-builtins.cc | 105 +-
 gcc/config/riscv/riscv-vector-builtins.def|   9 +-
 .../rvv/base/misc_vreinterpret_vbool_vint.c   |  52 -
 6 files changed, 265 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/genrvv-type-indexer.cc b/gcc/config/riscv/genrvv-type-indexer.cc
index 2f0375568a8..33738e41d7c 100644
--- a/gcc/config/riscv/genrvv-type-indexer.cc
+++ b/gcc/config/riscv/genrvv-type-indexer.cc
@@ -23,7 +23,7 @@ along with GCC; see the file COPYING3.  If not see
 #include 
 #include 
 
-#define BOOL_SIZE_LIST {1}
+#define BOOL_SIZE_LIST {1, 2, 4, 8, 16, 32, 64}
 
 std::string
 to_lmul (int lmul_log2)
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 72032c6a52c..7c89a20cb24 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -509,6 +509,12 @@ DEF_RVV_FUNCTION (vreinterpret, misc, none_preds, iu_v_eew16_interpret_ops)
 DEF_RVV_FUNCTION (vreinterpret, misc, none_preds, iu_v_eew32_interpret_ops)
 DEF_RVV_FUNCTION (vreinterpret, misc, none_preds, iu_v_eew64_interpret_ops)
 DEF_RVV_FUNCTION (vreinterpret, misc, none_preds, iu_v_bool1_interpret_ops)
+DEF_RVV_FUNCTION (vreinterpret, misc, none_preds, iu_v_bool2_interpret_ops)
+DEF_RVV_FUNCTION (vreinterpret, misc, none_preds, iu_v_bool4_interpret_ops)
+DEF_RVV_FUNCTION (vreinterpret, misc, none_preds, iu_v_bool8_interpret_ops)
+DEF_RVV_FUNCTION (vreinterpret, misc, none_preds, iu_v_bool16_interpret_ops)
+DEF_RVV_FUNCTION (vreinterpret, misc, none_preds, iu_v_bool32_interpret_ops)
+DEF_RVV_FUNCTION (vreinterpret, misc, none_preds, iu_v_bool64_interpret_ops)
 DEF_RVV_FUNCTION (vlmul_ext, misc, none_preds, all_v_vlmul_ext_x2_ops)
 DEF_RVV_FUNCTION (vlmul_ext, misc, none_preds, all_v_vlmul_ext_x4_ops)
 DEF_RVV_FUNCTION (vlmul_ext, misc, none_preds, all_v_vlmul_ext_x8_ops)
diff --git

Re: RISC-V Test Errors and Failures

2023-05-17 Thread Andreas Schwab via Gcc-patches

On May 16 2023, Vineet Gupta wrote:

> Yes I was seeing similar tcl errors and such - and in my case an even
> higher count.

They are coming from commit d6654a4be3b.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

Re: [PATCH 1/2] vect: Refactor code for index == count in vect_transform_slp_perm_load_1

2023-05-17 Thread Kewen.Lin via Gcc-patches

Hi Richi,

on 2023/5/17 14:34, Richard Biener wrote:
> On Wed, May 17, 2023 at 8:09 AM Kewen.Lin  wrote:
>>
>> Hi,
>>
>> This patch is to refactor the handlings for the case (index
>> == count) in a loop of vect_transform_slp_perm_load_1, in
>> order to prepare a subsequent adjustment on *nperm.  This
>> patch doesn't have any functional changes.
> 
> The diff is impossible to be reviewed - can you explain the
> refactoring you have done or also attach a patch more clearly
> showing what you change?

Sorry, I should have made it clearer.
It mainly combines these two hunks:
  
  if (index == count && !noop_p)
    {
      // A ...
      // ++*n_perms;
    }

  if (index == count)
    {
      if (!analyze_only)
        {
          if (!noop_p)
            // B1 ...

          // B2 ...

          for ...
            {
              if (!noop_p)
                // B3 building VEC_PERM_EXPR
              else
                // B4 building nothing (no uses for B2 and its seq)
            }
        }
      // B5
    }

The former can be folded into the latter, so it becomes:

  if (index == count)
    {
      if (!noop_p)
        {
          // A ...
          // ++*n_perms;

          if (!analyze_only)
            {
              // B1 ...
              // B2 ...
              for ...
                // B3 building VEC_PERM_EXPR
            }
        }
      else if (!analyze_only)
        {
          // no B2 since it has no further uses here.
          for ...
            // B4 building nothing
        }
      // B5 ...
    }
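
The two shapes above can be compared with a small trace model. This is only a sketch: the function names are hypothetical, both functions assume index == count already holds, each letter or digit stands for one of the blocks A and B1..B5, 'n' marks the ++*n_perms increment, and the "for ..." loop is modelled as a single iteration.

```cpp
#include <string>

// Trace of the original control flow (two separate "index == count" hunks).
std::string before_refactor(bool noop_p, bool analyze_only)
{
  std::string t;
  if (!noop_p)
    t += "An";                    // A ..., ++*n_perms
  if (!analyze_only)
    {
      if (!noop_p)
        t += "1";                 // B1
      t += "2";                   // B2
      t += noop_p ? "4" : "3";    // loop body: B4 or B3
    }
  t += "5";                       // B5
  return t;
}

// Trace of the combined control flow.
std::string after_refactor(bool noop_p, bool analyze_only)
{
  std::string t;
  if (!noop_p)
    {
      t += "An";                  // A ..., ++*n_perms
      if (!analyze_only)
        t += "123";               // B1, B2, loop body B3
    }
  else if (!analyze_only)
    t += "4";                     // B2 dropped: its result is unused here
  t += "5";                       // B5
  return t;
}
```

In this model the traces are identical whenever noop_p is false, and in the noop_p case they differ only in the dropped B2, consistent with the statement that the patch has no functional changes.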

But it mainly lays the groundwork for the subsequent patch, which makes the
n_perms calculation consistent; patch 2/2 further turns it into:

  if (index == count)
    {
      if (!noop_p)
        {
          // A ...

          if (!analyze_only)
            // B1 ...

          // B2 ... (trivial computations during analyze_only or not)

          for ...
            {
              // ++*n_perms;  (now n_perms is consistent with building VEC_PERM_EXPR)
              if (analyze_only)
                continue;
              // B3 building VEC_PERM_EXPR
            }
        }
      else if (!analyze_only)
        {
          // no B2 since it has no further uses here.
          for ...
            // B4 building nothing
        }
      // B5 ...
    }

BR,
Kewen


> 
>> Bootstrapped and regtested on x86_64-redhat-linux,
>> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>>
>> BR,
>> Kewen
>> -
>> gcc/ChangeLog:
>>
>> * tree-vect-slp.cc (vect_transform_slp_perm_load_1): Refactor the
>> handling on the case index == count.
>> ---
>>  gcc/tree-vect-slp.cc | 89 ++--
>>  1 file changed, 44 insertions(+), 45 deletions(-)
>>
>> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
>> index 3b7a21724ec..e5c9d7e766e 100644
>> --- a/gcc/tree-vect-slp.cc
>> +++ b/gcc/tree-vect-slp.cc
>> @@ -8230,59 +8230,50 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, 
>> slp_tree node,
>> noop_p = false;
>>mask[index++] = mask_element;
>>
>> -  if (index == count && !noop_p)
>> +  if (index == count)
>> {
>> - indices.new_vector (mask, second_vec_index == -1 ? 1 : 2, nunits);
>> - if (!can_vec_perm_const_p (mode, mode, indices))
>> + if (!noop_p)
>> {
>> - if (dump_p)
>> + indices.new_vector (mask, second_vec_index == -1 ? 1 : 2, 
>> nunits);
>> + if (!can_vec_perm_const_p (mode, mode, indices))
>> {
>> - dump_printf_loc (MSG_MISSED_OPTIMIZATION,
>> -  vect_location,
>> -  "unsupported vect permute { ");
>> - for (i = 0; i < count; ++i)
>> + if (dump_p)
>> {
>> - dump_dec (MSG_MISSED_OPTIMIZATION, mask[i]);
>> - dump_printf (MSG_MISSED_OPTIMIZATION, " ");
>> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, 
>> vect_location,
>> +  "unsupported vect permute { ");
>> + for (i = 0; i < count; ++i)
>> +   {
>> + dump_dec (MSG_MISSED_OPTIMIZATION, mask[i]);
>> + dump_printf (MSG_MISSED_OPTIMIZATION, " ");
>> +   }
>> + dump_printf (MSG_MISSED_OPTIMIZATION, "}\n");
>> }
>> - dump_printf (MSG_MISSED_OPTIMIZATION, "}\n");
>> + gcc_assert (analyze_only);
>> + return false;
>> }
>> - gcc_assert (analyze_only);
>> - return false;
>> -   }
>>
>> - ++*n_perms;
>> -   }
>> + ++*n_perms;
>>
>> -  if (index == count)
>> -   {
>> -

RE: [PATCH] RISC-V: Support RVV VREINTERPRET from v{u}int*_t to vbool1_t

2023-05-17 Thread Li, Pan2 via Gcc-patches

Committed, thanks kito.

Pan

-Original Message-
From: Kito Cheng  
Sent: Wednesday, May 17, 2023 3:00 PM
To: Li, Pan2 
Cc: juzhe.zh...@rivai.ai; gcc-patches ; Kito.cheng 
; Wang, Yanzhang 
Subject: Re: [PATCH] RISC-V: Support RVV VREINTERPRET from v{u}int*_t to 
vbool1_t

ok, and also ok for those small API test in testsuite.

On Tue, May 16, 2023 at 9:10 AM Li, Pan2 via Gcc-patches 
 wrote:
>
> Kindly ping for this PATCH, .
>
> Pan
>
> From: Li, Pan2
> Sent: Monday, May 15, 2023 11:25 AM
> To: juzhe.zh...@rivai.ai; gcc-patches 
> Cc: Kito.cheng ; Wang, Yanzhang 
> 
> Subject: RE: [PATCH] RISC-V: Support RVV VREINTERPRET from v{u}int*_t 
> to vbool1_t
>
> Thanks Juzhe. Let’s wait kito’s suggestion.
>
> Pan
>
> From: juzhe.zh...@rivai.ai
> Sent: Monday, May 15, 2023 11:20 AM
> To: Li, Pan2 <pan2...@intel.com>; gcc-patches <gcc-patches@gcc.gnu.org>
> Cc: Kito.cheng <kito.ch...@sifive.com>; Li, Pan2 <pan2...@intel.com>;
> Wang, Yanzhang <yanzhang.w...@intel.com>
> Subject: Re: [PATCH] RISC-V: Support RVV VREINTERPRET from v{u}int*_t 
> to vbool1_t
>
> The implementation LGTM.
> But I am not sure testcase since we don't include any intrinsic API testcases 
> in GCC testsuite.
> I think it needs Kito's decision.
>
> Thanks.
> 
> juzhe.zh...@rivai.ai
>
> From: pan2.li
> Date: 2023-05-15 11:14
> To: gcc-patches
> CC: juzhe.zhong; 
> kito.cheng; 
> pan2.li; 
> yanzhang.wang
> Subject: [PATCH] RISC-V: Support RVV VREINTERPRET from v{u}int*_t to 
> vbool1_t
> From: Pan Li <pan2...@intel.com>
>
> This patch supports the RVV VREINTERPRET from the integer types to vbool1_t, i.e.:
>
> vbool1_t __riscv_vreinterpret_xx_xx(v{u}int[8|16|32|64]_t);
>
> These APIs help the users to convert vector LMUL=1 integer to vbool1_t.
> According to the RVV intrinsic SPEC as below, the reinterpret 
> intrinsics only change the types of the underlying contents.
>
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#reinterpret-vbool-o-vintm1
>
> For example, given below code.
> vbool1_t test_vreinterpret_v_i8m1_b1(vint8m1_t src) {
>   return __riscv_vreinterpret_v_i8m1_b1(src);
> }
>
> It will generate the assembly code similar as below:
> vsetvli a5,zero,e8,m8,ta,ma
> vlm.v   v1,0(a1)
> vsm.v   v1,0(a0)
> ret
>
> The rest intrinsic bool size APIs will be prepared in other PATCH.
>
> Signed-off-by: Pan Li <pan2...@intel.com>
>
> gcc/ChangeLog:
>
> * config/riscv/genrvv-type-indexer.cc (BOOL_SIZE_LIST): New
>   macro.
> (main): Add bool1 to the type indexer.
> * config/riscv/riscv-vector-builtins-functions.def
> (vreinterpret): Register vbool1 interpret function.
> * config/riscv/riscv-vector-builtins-types.def
> (DEF_RVV_BOOL1_INTERPRET_OPS): New macro.
> (vint8m1_t): Add the type to bool1_interpret_ops.
> (vint16m1_t): Ditto.
> (vint32m1_t): Ditto.
> (vint64m1_t): Ditto.
> (vuint8m1_t): Ditto.
> (vuint16m1_t): Ditto.
> (vuint32m1_t): Ditto.
> (vuint64m1_t): Ditto.
> * config/riscv/riscv-vector-builtins.cc
> (DEF_RVV_BOOL1_INTERPRET_OPS): New macro.
> (required_extensions_p): Add bool1 interpret case.
> * config/riscv/riscv-vector-builtins.def
> (bool1_interpret): Add bool1 interpret to base type.
> * config/riscv/vector.md (@vreinterpret): Add new expand with VB 
> dest for vreinterpret.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/misc_vreinterpret_vbool_vint.c: New test.
> ---
> gcc/config/riscv/genrvv-type-indexer.cc   | 19 ++
> .../riscv/riscv-vector-builtins-functions.def |  1 +
> .../riscv/riscv-vector-builtins-types.def | 17 +
> gcc/config/riscv/riscv-vector-builtins.cc | 18 +
> gcc/config/riscv/riscv-vector-builtins.def|  2 +
> gcc/config/riscv/vector.md| 10 +
> .../rvv/base/misc_vreinterpret_vbool_vint.c   | 38 +++
> 7 files changed, 105 insertions(+)
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/misc_vreinterpret_vbool_vint.c
>
> diff --git a/gcc/config/riscv/genrvv-type-indexer.cc 
> b/gcc/config/riscv/genrvv-type-indexer.cc
> index 9bf6a82601d..2f0375568a8 100644
> --- a/gcc/config/riscv/genrvv-type-indexer.cc
> +++ b/gcc/config/riscv/genrvv-type-indexer.cc
> @@ -23,6 +23,8 @@ along with GCC; see the file COPYING3.  If not see 
> #include  #include 
> +#define BOOL_SIZE_LIST {1}
> +
> std::string
> to_lmul (int lmul_log2)
> {
> @@ -218,6 +220,9 @@ main (int argc, const char **argv)
>for (unsigned eew : {8, 16, 32, 64}) fprintf (fp, "  
> /*EEW%d_INTERPRET*/ INVALID,\n", eew);
> +  for (unsigned boolsize : BOOL_SIZE_LIST) fprintf (fp, "  
> + /*BOOL%d_INTERPRET*/ INVALID,\n", boolsize);
> +
>for (unsigned

Re: [EXTERNAL] Re: [PATCH] Fixes and workarounds for warnings during autoprofiledbootstrap build

2023-05-17 Thread Thomas Schwinge

Hi!

On 2023-05-15T09:30:35+0200, Richard Biener via Gcc-patches 
 wrote:
> On Fri, May 12, 2023 at 10:35 PM Eugene Rozenfeld
>  wrote:
>>
>> Thank you, Richard. I went with your suggestion. New patch:
>>
>>
>> [PATCH] Disable warnings as errors for STAGEautofeedback.
>>
>> Compilation during STAGEautofeedback produces additional warnings
>> since inlining decisions with -fauto-profile are different from
>> other builds.
>>
>> This patches disables warnings as errors for STAGEautofeedback.
>
> Can you add a comment before the filtering?
>
> Otherwise looks good to me - please leave others 24h to comment before you
> commit.

>> --- a/Makefile.in
>> +++ b/Makefile.in
>> @@ -590,8 +590,7 @@ STAGEautofeedback_CXXFLAGS = $(CXXFLAGS)
>>  STAGEautofeedback_CXXFLAGS = $(STAGEautofeedback_CFLAGS)
>>  @endif target-libstdc++-v3-bootstrap
>>  STAGEautofeedback_TFLAGS = $(STAGE_TFLAGS)
>> -STAGEautofeedback_CONFIGURE_FLAGS = $(STAGE_CONFIGURE_FLAGS)
>> -
>> +STAGEautofeedback_CONFIGURE_FLAGS = $(filter-out 
>> --enable-werror-always,$(STAGE_CONFIGURE_FLAGS))

That's not how it works; the next person running 'autogen Makefile.def'
to regenerate 'Makefile.in' is going to undo those changes.  Instead,
modify 'Makefile.def', 'Makefile.tpl', and then 'autogen Makefile.def'.


Grüße
 Thomas


>> -Original Message-
>> From: Richard Biener 
>> Sent: Thursday, May 11, 2023 1:58 AM
>> To: Eugene Rozenfeld 
>> Cc: gcc-patches@gcc.gnu.org
>> Subject: Re: [EXTERNAL] Re: [PATCH] Fixes and workarounds for warnings 
>> during autoprofiledbootstrap build
>>
>> On Thu, May 11, 2023 at 4:23 AM Eugene Rozenfeld 
>>  wrote:
>> >
>> > I'm ok with disabling warnings as errors for autoprofiledbootstrap. What's 
>> > the proper way to do that? Searching for "--disable-werror" I see matches 
>> > in lib configure files but not in gcc files.
>>
>> We have --with-build-config selecting things like bootstrap-O3 and configure 
>> then disables werror by default if the build config is anything other than 
>> the default or bootstrap-debug.
>>
>> Of course profiledbootstrap and autoprofiledbootstrap are not build configs 
>> but make targets - that makes it more difficult (or impossible) to use the 
>> --disable-werror machinery here.
>>
>> There is
>>
>> STAGE_CONFIGURE_FLAGS=@stage2_werror_flag@
>>
>> so it might be possible to filter out --enable-werror-always from 
>> STAGEautofeedback_CONFIGURE_FLAGS?
>>
>> Richard.
>>
>> > Thanks,
>> >
>> > Eugene
>> >
>> > -Original Message-
>> > From: Richard Biener 
>> > Sent: Tuesday, May 9, 2023 11:40 PM
>> > To: Eugene Rozenfeld 
>> > Cc: gcc-patches@gcc.gnu.org
>> > Subject: [EXTERNAL] Re: [PATCH] Fixes and workarounds for warnings
>> > during autoprofiledbootstrap build
>> >
>> > On Wed, May 10, 2023 at 3:38 AM Eugene Rozenfeld via Gcc-patches 
>> >  wrote:
>> > >
>> > > autoprofiledbootstrap build produces new warnings since inlining
>> > > decisions are different from other builds. This patch contains fixes
>> > > and workarounds for those warnings.
>> > >
>> > > Tested on x86_64-pc-linux-gnu.
>> >
>> > Rather than this would it make sense to add --disable-werror to 
>> > autoprofiledbootstrap configs like we do for others?  I also wonder how 
>> > "stable" the afdo bootstrap inlining decisions are, so applying these 
>> > workarounds may not be sustainable?
>> >
>> > > gcc/ChangeLog:
>> > >
>> > > * config/i386/i386-expand.cc (expand_vec_perm_interleave2): Work 
>> > > around
>> > > -Wstringop-overflow false positive during autoprofiledbootstrap
>> > > * ipa-devirt.cc (debug_tree_odr_name): Fix for -Wformat-overflow
>> > > warning during autoprofiledbootstrap
>> > > * lra-eliminations.cc (setup_can_eliminate): Work around
>> > > -Wmaybe-uninitialized false positive during autoprofiledbootstrap
>> > > * opts-common.cc (candidates_list_and_hint): Work around
>> > > -Wstringop-overflow false positive during autoprofiledbootstrap
>> > > * tree-ssa-ccp.cc (bit_value_unop): Work around 
>> > > -Wmaybe-uninitialized
>> > > false positive during autoprofiledbootstrap
>> > > * wide-int.h (wi::copy): Work around -Wmaybe-uninitialized false
>> > > positive during autoprofiledbootstrap
>> > > ---
>> > >  gcc/config/i386/i386-expand.cc | 11 +++
>> > >  gcc/ipa-devirt.cc  |  3 ++-
>> > >  gcc/lra-eliminations.cc| 11 +++
>> > >  gcc/opts-common.cc |  1 +
>> > >  gcc/tree-ssa-ccp.cc| 11 +++
>> > >  gcc/wide-int.h | 11 +++
>> > >  6 files changed, 47 insertions(+), 1 deletion(-)
>> > >
>> > > diff --git a/gcc/config/i386/i386-expand.cc
>> > > b/gcc/config/i386/i386-expand.cc index 634fe61ba79..be9f912775b
>> > > 100644
>> > > --- a/gcc/config/i386/i386-expand.cc
>> > > +++ b/gcc/config/i386/i386-expand.cc
>> > > @@ -20419,6 +20419,13 @@ expand_vec_perm_pblendv (struct
>> > > expand_vec_perm_d *d)
>> > >
>> > >

Re: [PATCH] RISC-V: Support RVV VREINTERPRET from v{u}int*_t to vbool1_t

2023-05-17 Thread Kito Cheng via Gcc-patches

ok, and also ok for those small API test in testsuite.

On Tue, May 16, 2023 at 9:10 AM Li, Pan2 via Gcc-patches
 wrote:
>
> Kindly ping for this PATCH, .
>
> Pan
>
> From: Li, Pan2
> Sent: Monday, May 15, 2023 11:25 AM
> To: juzhe.zh...@rivai.ai; gcc-patches 
> Cc: Kito.cheng ; Wang, Yanzhang 
> 
> Subject: RE: [PATCH] RISC-V: Support RVV VREINTERPRET from v{u}int*_t to 
> vbool1_t
>
> Thanks Juzhe. Let’s wait kito’s suggestion.
>
> Pan
>
> From: juzhe.zh...@rivai.ai
> Sent: Monday, May 15, 2023 11:20 AM
> To: Li, Pan2 <pan2...@intel.com>; gcc-patches <gcc-patches@gcc.gnu.org>
> Cc: Kito.cheng <kito.ch...@sifive.com>; Li, Pan2 <pan2...@intel.com>;
> Wang, Yanzhang <yanzhang.w...@intel.com>
> Subject: Re: [PATCH] RISC-V: Support RVV VREINTERPRET from v{u}int*_t to 
> vbool1_t
>
> The implementation LGTM.
> But I am not sure testcase since we don't include any intrinsic API testcases 
> in GCC testsuite.
> I think it needs Kito's decision.
>
> Thanks.
> 
> juzhe.zh...@rivai.ai
>
> From: pan2.li
> Date: 2023-05-15 11:14
> To: gcc-patches
> CC: juzhe.zhong; 
> kito.cheng; pan2.li; 
> yanzhang.wang
> Subject: [PATCH] RISC-V: Support RVV VREINTERPRET from v{u}int*_t to vbool1_t
> From: Pan Li <pan2...@intel.com>
>
> This patch supports the RVV VREINTERPRET from the integer types to vbool1_t, i.e.:
>
> vbool1_t __riscv_vreinterpret_xx_xx(v{u}int[8|16|32|64]_t);
>
> These APIs help the users to convert vector LMUL=1 integer to vbool1_t.
> According to the RVV intrinsic SPEC as below, the reinterpret intrinsics
> only change the types of the underlying contents.
>
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#reinterpret-vbool-o-vintm1
>
> For example, given below code.
> vbool1_t test_vreinterpret_v_i8m1_b1(vint8m1_t src) {
>   return __riscv_vreinterpret_v_i8m1_b1(src);
> }
>
> It will generate the assembly code similar as below:
> vsetvli a5,zero,e8,m8,ta,ma
> vlm.v   v1,0(a1)
> vsm.v   v1,0(a0)
> ret
>
> The rest intrinsic bool size APIs will be prepared in other PATCH.
>
> Signed-off-by: Pan Li <pan2...@intel.com>
>
> gcc/ChangeLog:
>
> * config/riscv/genrvv-type-indexer.cc (BOOL_SIZE_LIST): New
>   macro.
> (main): Add bool1 to the type indexer.
> * config/riscv/riscv-vector-builtins-functions.def
> (vreinterpret): Register vbool1 interpret function.
> * config/riscv/riscv-vector-builtins-types.def
> (DEF_RVV_BOOL1_INTERPRET_OPS): New macro.
> (vint8m1_t): Add the type to bool1_interpret_ops.
> (vint16m1_t): Ditto.
> (vint32m1_t): Ditto.
> (vint64m1_t): Ditto.
> (vuint8m1_t): Ditto.
> (vuint16m1_t): Ditto.
> (vuint32m1_t): Ditto.
> (vuint64m1_t): Ditto.
> * config/riscv/riscv-vector-builtins.cc
> (DEF_RVV_BOOL1_INTERPRET_OPS): New macro.
> (required_extensions_p): Add bool1 interpret case.
> * config/riscv/riscv-vector-builtins.def
> (bool1_interpret): Add bool1 interpret to base type.
> * config/riscv/vector.md (@vreinterpret): Add new expand
> with VB dest for vreinterpret.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/misc_vreinterpret_vbool_vint.c: New test.
> ---
> gcc/config/riscv/genrvv-type-indexer.cc   | 19 ++
> .../riscv/riscv-vector-builtins-functions.def |  1 +
> .../riscv/riscv-vector-builtins-types.def | 17 +
> gcc/config/riscv/riscv-vector-builtins.cc | 18 +
> gcc/config/riscv/riscv-vector-builtins.def|  2 +
> gcc/config/riscv/vector.md| 10 +
> .../rvv/base/misc_vreinterpret_vbool_vint.c   | 38 +++
> 7 files changed, 105 insertions(+)
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/misc_vreinterpret_vbool_vint.c
>
> diff --git a/gcc/config/riscv/genrvv-type-indexer.cc 
> b/gcc/config/riscv/genrvv-type-indexer.cc
> index 9bf6a82601d..2f0375568a8 100644
> --- a/gcc/config/riscv/genrvv-type-indexer.cc
> +++ b/gcc/config/riscv/genrvv-type-indexer.cc
> @@ -23,6 +23,8 @@ along with GCC; see the file COPYING3.  If not see
> #include 
> #include 
> +#define BOOL_SIZE_LIST {1}
> +
> std::string
> to_lmul (int lmul_log2)
> {
> @@ -218,6 +220,9 @@ main (int argc, const char **argv)
>for (unsigned eew : {8, 16, 32, 64})
> fprintf (fp, "  /*EEW%d_INTERPRET*/ INVALID,\n", eew);
> +  for (unsigned boolsize : BOOL_SIZE_LIST)
> + fprintf (fp, "  /*BOOL%d_INTERPRET*/ INVALID,\n", boolsize);
> +
>for (unsigned lmul_log2_offset : {1, 2, 3, 4, 5, 6})
> {
>   unsigned multiple_of_lmul = 1 << lmul_log2_offset;
> @@ -297,6 +302,16 @@ main (int argc, const char **argv)
>inttype (eew, lmul_log2, unsigned_p).c_str ());
>   }
> + for (unsigned boolsize : BOOL_SIZE_LIST)
> +   {
> +

[PATCH] Only use NO_REGS in cost calculation when !hard_regno_mode_ok for GENERAL_REGS and mode.

2023-05-17 Thread liuhongt via Gcc-patches

r14-172-g0368d169492017 replaces GENERAL_REGS with NO_REGS in cost
calculation when the preferred register classes are not known yet.
It regressed powerpc PR109610 and PR109858; using NO_REGS looks too
aggressive when the mode can be allocated to GENERAL_REGS.
The patch takes a step back: it still uses GENERAL_REGS when
hard_regno_mode_ok holds for the mode and GENERAL_REGS, and otherwise
uses NO_REGS.
Kewen confirmed the patch fixes PR109858, and I verified it also fixes PR109610.
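
The selection rule described above can be sketched in isolation. This is only a toy model of the decision, not the ira-costs.cc code: the enum, the callback type and the toy predicate are hypothetical stand-ins for GCC's reg_class, targetm.hard_regno_mode_ok and machine modes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for GCC's reg_class values; not the real enum.  */
enum sketch_reg_class { SKETCH_NO_REGS, SKETCH_GENERAL_REGS };

/* Hypothetical callback modelling targetm.hard_regno_mode_ok.  */
typedef bool (*sketch_mode_ok_fn) (unsigned int regno, int mode);

/* Pick the class used for the first-pass cost: start from GENERAL_REGS
   and fall back to NO_REGS only when the first general register cannot
   hold MODE.  */
enum sketch_reg_class
sketch_cost_class (sketch_mode_ok_fn mode_ok,
                   unsigned int first_general_regno, int mode)
{
  assert (mode_ok != NULL);
  enum sketch_reg_class cl = SKETCH_GENERAL_REGS;
  if (!mode_ok (first_general_regno, mode))
    cl = SKETCH_NO_REGS;
  return cl;
}

/* Toy target (an assumption for illustration): general registers can
   hold only modes numbered 0..3.  */
bool
sketch_toy_mode_ok (unsigned int regno, int mode)
{
  (void) regno;
  return mode >= 0 && mode <= 3;
}
```

With the toy predicate, sketch_cost_class (sketch_toy_mode_ok, 0, 2) keeps SKETCH_GENERAL_REGS, while a mode outside 0..3 falls back to SKETCH_NO_REGS.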

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
No big performance impact for SPEC2017 on icelake server.
Ok for trunk?

gcc/ChangeLog:

* ira-costs.cc (scan_one_insn): Only use NO_REGS in cost
calculation when !hard_regno_mode_ok for GENERAL_REGS and
mode, otherwise still use GENERAL_REGS.
---
 gcc/ira-costs.cc | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/gcc/ira-costs.cc b/gcc/ira-costs.cc
index d2a801ab9b0..ae8304ff938 100644
--- a/gcc/ira-costs.cc
+++ b/gcc/ira-costs.cc
@@ -1572,12 +1572,16 @@ scan_one_insn (rtx_insn *insn)
   && (! ira_use_lra_p || ! pic_offset_table_rtx
  || ! contains_symbol_ref_p (XEXP (note, 0
 {
-  /* Costs for NO_REGS are used in cost calculation on the
-1st pass when the preferred register classes are not
-known yet.  In this case we take the best scenario.  */
-  enum reg_class cl = NO_REGS;
+  enum reg_class cl = GENERAL_REGS;
   rtx reg = SET_DEST (set);
   int num = COST_INDEX (REGNO (reg));
+  /* Costs for NO_REGS are used in cost calculation on the
+1st pass when the preferred register classes are not
+known yet.  In this case we take the best scenario when
+mode can't be put into GENERAL_REGS.  */
+  if (!targetm.hard_regno_mode_ok (ira_class_hard_regs[cl][0],
+  GET_MODE (reg)))
+   cl = NO_REGS;
 
   COSTS (costs, num)->mem_cost
-= ira_memory_move_cost[GET_MODE (reg)][cl][1] * frequency;
-- 
2.39.1.388.g2fc9e9ca3c

Re: [PATCH] RISC-V: Adjust stdint.h to stdint-gcc.h for rvv tests

2023-05-17 Thread Kito Cheng via Gcc-patches

RISC-V glibc requires the corresponding multilib to have been built;
otherwise it reports something like:

  /usr/include/gnu/stubs.h:11:11: fatal error: gnu/stubs-ilp32d.h: No
such file or directory

But we only need the fixed-length integer types in order to compile and
scan the assembly or dumps, so we don't really need those multilibs to
be built. That's the reason for this workaround.

This way works even if the multilib build is disabled. ARM seems to have
the same issue and just disables those tests:

---
# Return 1 if this is an ARM target supporting -mfloat-abi=soft.  Some
# multilibs may be incompatible with this option.

proc check_effective_target_arm_soft_ok { } {
   return [check_no_compiler_messages arm_soft_ok object {
   #include 
   int dummy;
   int main (void) { return 0; }
   } "-mfloat-abi=soft"]
}

# Return 1 if this is an ARM target supporting -mfloat-abi=soft even
# for linking.  Some multilibs may be incompatible with this option,
# and some linkers may reject incompatible options.

proc check_effective_target_arm_soft_ok_link { } {
   return [check_no_compiler_messages arm_soft_ok_link executable {
   #include 
   int dummy;
   int main (void) { return 0; }
   } "-mfloat-abi=soft"]
}
---

On Wed, May 17, 2023 at 2:25 PM Robin Dapp  wrote:
>
> > Huh, including stdint-gcc.h looks completely wrong.  What's the issue you 
> > are
> > trying to solve?
>
> The way I understood it is that that's a temporary workaround until
> all multilib et al. (+testsuite) configurations are in place but I
> haven't checked the details myself.  Eventually this should be done
> properly so we can include the regular headers.  Kito might want to
> comment as he dealt with it before.
>
> I used #include  for all those tests and Andreas Schwab reported:
>
>   /usr/include/gnu/stubs.h:11:11: fatal error: gnu/stubs-ilp32d.h: No such 
> file or directory
>
> Regards
>  Robin

Re: ping: [PATCH] rs6000: Enable const_anchor for 'addi'

2023-05-17 Thread Jiufu Guo via Gcc-patches



Gentle ping...

Jiufu Guo via Gcc-patches  writes:

> Hi,
>
> I'm thinking that we may enable this patch for stage1, so ping it.
> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603530.html
>
> BR,
> Jeff (Jiufu)
>
> Jiufu Guo  writes:
>
>> Hi,
>>
>> cse.cc has a feature called const_anchor.  It supports generating new
>> constants by adding a small gap/offset to an existing constant.
>> For example:
>>
>> void __attribute__ ((noinline)) foo (long long *a)
>> {
>>   *a++ = 0x2351847027482577LL;
>>   *a++ = 0x2351847027482578LL;
>> }
>> The second constant (0x2351847027482578LL) can be computed by adding '1'
>> to the first constant (0x2351847027482577LL).
>> This is profitable if more than one instruction is needed to build the
>> second constant.
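
The arithmetic behind the quoted example can be sketched as follows. This is a minimal sketch, not the anchor computation in cse.cc: the helper names are hypothetical, and the only fact relied on is that 'addi' takes a signed 16-bit immediate, so a gap in [-0x8000, 0x7fff] is reachable with one instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if TARGET equals ANCHOR plus a signed
   16-bit immediate, i.e. the gap lies in [-0x8000, 0x7fff].  */
bool
sketch_reachable_by_addi (int64_t anchor, int64_t target)
{
  /* Subtract in unsigned arithmetic so that wrap-around is well defined.  */
  uint64_t diff = (uint64_t) target - (uint64_t) anchor;
  /* Shifting the range by 0x8000 turns the signed check into one
     unsigned comparison.  */
  return diff + 0x8000u < 0x10000u;
}

/* Hypothetical helper: the immediate to add to ANCHOR to get TARGET.
   The caller must have checked reachability first.  */
int16_t
sketch_addi_offset (int64_t anchor, int64_t target)
{
  assert (sketch_reachable_by_addi (anchor, target));
  uint64_t diff = (uint64_t) target - (uint64_t) anchor;
  /* diff + 0x8000 is in [0, 0xffff] here, so the result fits int16_t.  */
  return (int16_t) ((int64_t) (diff + 0x8000u) - 0x8000);
}
```

For the two constants in the example, sketch_reachable_by_addi (0x2351847027482577LL, 0x2351847027482578LL) holds and the offset is 1; a gap of exactly 0x8000 is rejected.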
>>
>> * For rs6000, we can enable this functionality, as the instruction
>> 'addi' does exactly this when the gap is smaller than 0x8000.
>>
>> * Besides enabling TARGET_CONST_ANCHOR on rs6000, this patch also fixes
>> one issue:
>> "gcc_assert (SCALAR_INT_MODE_P (mode))" is a requirement of the function
>> "try_const_anchors".
>>
>> * One potential side effect of this patch:
>> Compared with
>> "r101=0x2351847027482577LL
>> ...
>> r201=0x2351847027482578LL"
>> the new r201 will be "r201=r101+1", so r101 will live longer,
>> which would increase pressure when allocating registers.
>> But I feel this is acceptable for the const_anchor feature.
>>
>> * With this patch, I checked the performance change on SPEC2017;
>> the change is not significant, since this functionality is not
>> hit on any hot path. There are runtime fluctuations/noise (e.g. on
>> povray_r/xalancbmk_r/xz_r) that are not caused by the patch.
>>
>> With this patch, I also checked the changes in object files (from
>> GCC bootstrap and SPEC). The significant changes are improvements:
>> "addi" instead of "2 or more insns: lis+or.."; it also exposes some
>> other optimization opportunities, such as combine/jump2. Code to
>> store/load one more register also occurs in a few cases,
>> but it does not impact overall performance.
>>
>> * To refine this patch, some earlier discussions are referenced:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33699
>> https://gcc.gnu.org/pipermail/gcc-patches/2009-April/260421.html
>> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566744.html
>>
>>
>> Bootstrap and regtest pass on ppc64 and ppc64le for this patch.
>> Is this ok for trunk?
>>
>>
>> BR,
>> Jeff (Jiufu)
>>
>> gcc/ChangeLog:
>>
>>  * config/rs6000/rs6000.cc (TARGET_CONST_ANCHOR): New define.
>>  * cse.cc (cse_insn): Add guard condition.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.target/powerpc/const_anchors.c: New test.
>>  * gcc.target/powerpc/try_const_anchors_ice.c: New test.
>>
>> ---
>>  gcc/config/rs6000/rs6000.cc   |  4 
>>  gcc/cse.cc|  3 ++-
>>  .../gcc.target/powerpc/const_anchors.c| 20 +++
>>  .../powerpc/try_const_anchors_ice.c   | 16 +++
>>  4 files changed, 42 insertions(+), 1 deletion(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/const_anchors.c
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/try_const_anchors_ice.c
>>
>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index d2743f7bce6..80cded6dec1 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -1760,6 +1760,10 @@ static const struct attribute_spec 
>> rs6000_attribute_table[] =
>>  
>>  #undef TARGET_UPDATE_IPA_FN_TARGET_INFO
>>  #define TARGET_UPDATE_IPA_FN_TARGET_INFO rs6000_update_ipa_fn_target_info
>> +
>> +#undef TARGET_CONST_ANCHOR
>> +#define TARGET_CONST_ANCHOR 0x8000
>> +
>>  
>>  
>>  /* Processor table.  */
>> diff --git a/gcc/cse.cc b/gcc/cse.cc
>> index b13afd4ba72..56542b91c1e 100644
>> --- a/gcc/cse.cc
>> +++ b/gcc/cse.cc
>> @@ -5005,7 +5005,8 @@ cse_insn (rtx_insn *insn)
>>if (targetm.const_anchor
>>&& !src_related
>>&& src_const
>> -  && GET_CODE (src_const) == CONST_INT)
>> +  && GET_CODE (src_const) == CONST_INT
>> +  && SCALAR_INT_MODE_P (mode))
>>  {
>>src_related = try_const_anchors (src_const, mode);
>>src_related_is_const_anchor = src_related != NULL_RTX;
>> diff --git a/gcc/testsuite/gcc.target/powerpc/const_anchors.c 
>> b/gcc/testsuite/gcc.target/powerpc/const_anchors.c
>> new file mode 100644
>> index 000..39958ff9765
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/const_anchors.c
>> @@ -0,0 +1,20 @@
>> +/* { dg-do compile { target has_arch_ppc64 } } */
>> +/* { dg-options "-O2" } */
>> +
>> +#define C1 0x2351847027482577ULL
>> +#define C2 0x2351847027482578ULL
>> +
>> +void __attribute__ ((noinline)) foo (long long *a)
>> +{
>> +  *a++ = C1;
>> +  *a++ = C2;
>> +}
>> +
>> +void __attribute__ ((noinline)) foo1 (long long *a, long long b)
>> +{
>> +

ping^^^ [PATCH] rs6000: mark tieable between INT and FLOAT

2023-05-17 Thread Jiufu Guo via Gcc-patches



Gentle ping...

Jiufu Guo via Gcc-patches  writes:

> Hi,
>
> I would ping this patch for stage1:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609504.html
>
> BR,
> Jeff (Jiufu)
>
> Jiufu Guo via Gcc-patches  writes:
>
>> Hi,
>>
>> Gently Ping:
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609504.html
>>
>> BR,
>> Jeff (Jiufu)
>>
>>
>> Jiufu Guo  writes:
>>
>>> Hi,
>>>
>>> While discussing/reviewing patches on the mailing list, we found that more
>>> modes are tieable, e.g. DI<->DF.  After some discussion, I drafted this
>>> patch to mark more modes as tieable.
>>>
>>> Bootstrap and regtest pass on ppc64{,le}.
>>> Is this ok for trunk?
>>>
>>> BR,
>>> Jeff (Jiufu)
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/rs6000/rs6000.cc (rs6000_modes_tieable_p): Mark more tieable
>>> modes.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * g++.target/powerpc/pr102024.C: Updated.
>>>
>>> ---
>>>  gcc/config/rs6000/rs6000.cc | 9 +
>>>  gcc/testsuite/g++.target/powerpc/pr102024.C | 3 ++-
>>>  2 files changed, 11 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>>> index 6ac3adcec6b..3cb0186089e 100644
>>> --- a/gcc/config/rs6000/rs6000.cc
>>> +++ b/gcc/config/rs6000/rs6000.cc
>>> @@ -1968,6 +1968,15 @@ rs6000_modes_tieable_p (machine_mode mode1, 
>>> machine_mode mode2)
>>>if (ALTIVEC_OR_VSX_VECTOR_MODE (mode2))
>>>  return false;
>>>  
>>> +  /* SFmode format (IEEE DP) in register would not as required,
>>> + So SFmode is restrict here.  */
>>> +  if (GET_MODE_CLASS (mode1) == MODE_FLOAT
>>> +  && GET_MODE_CLASS (mode2) == MODE_INT)
>>> +return GET_MODE_SIZE (mode2) == UNITS_PER_FP_WORD && mode1 != SFmode;
>>> +  if (GET_MODE_CLASS (mode1) == MODE_INT
>>> +  && GET_MODE_CLASS (mode2) == MODE_FLOAT)
>>> +return GET_MODE_SIZE (mode1) == UNITS_PER_FP_WORD && mode2 != SFmode;
>>> +
>>>if (SCALAR_FLOAT_MODE_P (mode1))
>>>  return SCALAR_FLOAT_MODE_P (mode2);
>>>if (SCALAR_FLOAT_MODE_P (mode2))
>>> diff --git a/gcc/testsuite/g++.target/powerpc/pr102024.C 
>>> b/gcc/testsuite/g++.target/powerpc/pr102024.C
>>> index 769585052b5..27d2dc5e80b 100644
>>> --- a/gcc/testsuite/g++.target/powerpc/pr102024.C
>>> +++ b/gcc/testsuite/g++.target/powerpc/pr102024.C
>>> @@ -5,7 +5,8 @@
>>>  // Test that a zero-width bit field in an otherwise homogeneous aggregate
>>>  // generates a psabi warning and passes arguments in GPRs.
>>>  
>>> -// { dg-final { scan-assembler-times {\mstd\M} 4 } }
>>> +// { dg-final { scan-assembler-times {\mmtvsrd\M} 4 { target has_arch_pwr8 
>>> } } }
>>> +// { dg-final { scan-assembler-times {\mstd\M} 4 { target { ! 
>>> has_arch_pwr8 } } } }
>>>  
>>>  struct a_thing
>>>  {

ping^^: [PATCH V2] rs6000: Enhance lowpart/highpart DI->SF by mtvsrws/mtvsrd

2023-05-17 Thread Jiufu Guo via Gcc-patches

Gentle ping...

Jiufu Guo via Gcc-patches  writes:

> Hi
>
> I would like to ping this patch for stage1:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612168.html
>
> BR,
> Jeff (Jiufu)
>
> Jiufu Guo  writes:
>
>> Hi,
>>
>> Compared with the previous version:
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609654.html
>> This patch does not use UNSPEC for insn mtvsrws anymore.  To handle
>> the subreg better on BE and LE, the predicate "lowpart_subreg_operator"
>> is introduced.  To help the combine pass match the pattern on the high 32
>> bits of DI, shiftrt is still used.
>>
>> As mentioned in PR108338, on p9, we could use mtvsrws to implement
>> the conversion from SI#0 to SF (or lowpart DI to SF).
>>
>> For example:
>>   *(long long*)buff = di;
>>   float f = *(float*)(buff);
>> We currently generate "sldi 9,3,32 ; mtvsrd 1,9 ; xscvspdpn 1,1" instead of
>> "mtvsrws 1,3 ; xscvspdpn 1,1".
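
At the source level, the two conversions discussed here (lowpart DI to SF and highpart DI to SF) are bit-casts of one 32-bit half of a 64-bit integer. A portable sketch, assuming a 32-bit IEEE float; the function names are hypothetical and this says nothing about which instructions a given compiler emits:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert (sizeof (float) == sizeof (uint32_t),
               "sketch assumes a 32-bit float");

/* Reinterpret the low 32 bits of DI as a float, independent of the
   host's endianness.  */
float
sketch_low32_as_float (uint64_t di)
{
  uint32_t bits = (uint32_t) di;
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Reinterpret the high 32 bits of DI as a float; the right shift by 32
   corresponds to the shift matched by the pattern in the patch.  */
float
sketch_high32_as_float (uint64_t di)
{
  uint32_t bits = (uint32_t) (di >> 32);
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}
```

Using memcpy on an explicitly selected half avoids both the aliasing problem of the pointer casts in the quoted snippet and its dependence on byte order.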
>>
>> This patch updates this, and also enhances the bitcast from highpart
>> DI to SF.
>>
>> Bootstrap and regtests pass on ppc64{,le}.
>> Is this ok for trunk?
>>
>> BR,
>> Jeff (Jiufu)
>>
>>  PR target/108338
>>
>> gcc/ChangeLog:
>>
>>  * config/rs6000/predicates.md (lowpart_subreg_operator): New
>>  define_predicate.
>>  * config/rs6000/rs6000.md (any_rshift): New code_iterator.
>>  (movsf_from_si2): Rename to...
>>  (movsf_from_si2_): ... this.
>>  (si2sf_mtvsrws): New define_insn.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.target/powerpc/pr108338.c: New test.
>>
>> ---
>>  gcc/config/rs6000/predicates.md |  5 +++
>>  gcc/config/rs6000/rs6000.md | 35 -
>>  gcc/testsuite/gcc.target/powerpc/pr108338.c | 42 +
>>  3 files changed, 73 insertions(+), 9 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108338.c
>>
>> diff --git a/gcc/config/rs6000/predicates.md 
>> b/gcc/config/rs6000/predicates.md
>> index 52c65534e51..e57c9d99c6b 100644
>> --- a/gcc/config/rs6000/predicates.md
>> +++ b/gcc/config/rs6000/predicates.md
>> @@ -2064,3 +2064,8 @@ (define_predicate "macho_pic_address"
>>else
>>  return false;
>>  })
>> +
>> +(define_predicate "lowpart_subreg_operator"
>> +  (and (match_code "subreg")
>> +   (match_test "subreg_lowpart_offset (mode, GET_MODE (SUBREG_REG (op)))
>> +== SUBREG_BYTE (op)")))
>> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
>> index 4a7812fa592..5b4a7f8d801 100644
>> --- a/gcc/config/rs6000/rs6000.md
>> +++ b/gcc/config/rs6000/rs6000.md
>> @@ -7539,6 +7539,14 @@ (define_split
>>   UNSPEC_MOVSI_GOT))]
>>"")
>>  
>> +(define_insn "si2sf_mtvsrws"
>> +  [(set (match_operand:SF 0 "gpc_reg_operand" "=wa")
>> +   (subreg:SF (match_operand:SI 1 "gpc_reg_operand" "r") 0))]
>> +  "TARGET_P9_VECTOR && TARGET_XSCVSPDPN"
>> +  "mtvsrws %x0,%1\n\txscvspdpn %x0,%x0"
>> +  [(set_attr "type" "mfvsr")
>> +   (set_attr "length" "8")])
>> +
>>  ;; MR  LA
>>  ;; LWZ LFIWZX  LXSIWZX
>>  ;; STW STFIWX  STXSIWX
>> @@ -8203,10 +8211,18 @@ (define_insn_and_split "movsf_from_si"
>>rtx op2 = operands[2];
>>rtx op1_di = gen_rtx_REG (DImode, REGNO (op1));
>>  
>> -  /* Move SF value to upper 32-bits for xscvspdpn.  */
>> -  emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
>> -  emit_insn (gen_p8_mtvsrd_sf (op0, op2));
>> -  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
>> +  if (TARGET_P9_VECTOR)
>> +{
>> +  emit_insn (gen_si2sf_mtvsrws (op0, gen_lowpart (SImode, op1_di)));
>> +}
>> +  else
>> +{
>> +  /* Move SF value to upper 32-bits for xscvspdpn.  */
>> +  emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
>> +  emit_insn (gen_p8_mtvsrd_sf (op0, op2));
>> +  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
>> +}
>> +
>>DONE;
>>  }
>>[(set_attr "length"
>> @@ -8219,18 +8235,19 @@ (define_insn_and_split "movsf_from_si"
>>  "*,  *, p9v,   p8v,   *, *,
>>   p8v,p8v,   p8v,   *")])
>>  
>> +(define_code_iterator any_rshift [ashiftrt lshiftrt])
>> +
>>  ;; For extracting high part element from DImode register like:
>>  ;; {%1:SF=unspec[r122:DI>>0x20#0] 86;clobber scratch;}
>>  ;; split it before reload with "and mask" to avoid generating shift right
>>  ;; 32 bit then shift left 32 bit.
>> -(define_insn_and_split "movsf_from_si2"
>> +(define_insn_and_split "movsf_from_si2_"
>>[(set (match_operand:SF 0 "gpc_reg_operand" "=wa")
>>  (unspec:SF
>> - [(subreg:SI
>> -   (ashiftrt:DI
>> + [(match_operator:SI 3 "lowpart_subreg_operator"
>> +   [(any_rshift:DI
>>  (match_operand:DI 1 "input_operand" "r")
>> -(const_int 32))
>> -   0)]
>> +(const_int 32))])]
>>   UNSPEC_SF_FROM_SI))
>>(clobber (match_scratch:DI 2 "=r"))]
>>"TARGET_NO_SF_SUBREG"
>> diff

Re: [PATCH RFC] c-family: make -fno-permissive upgrade pedwarns

2023-05-17 Thread Richard Biener via Gcc-patches

On Mon, May 15, 2023 at 3:56 PM Jason Merrill  wrote:
>
> On 5/15/23 03:32, Richard Biener wrote:
> > On Fri, May 12, 2023 at 10:54 PM Jason Merrill via Gcc-patches
> >  wrote:
> >>
> >> In the context of the recent discussion, it occurred to me that this
> >> semantic would be useful, but currently there is no easy way to access
> >> it.  Bikeshedding welcome; the use of this flag is a bit odd, but it has
> >> the advantage of being accepted without error going back at least to 4.3.
> >>
> >> -- 8< --
> >>
> >> Currently there is no flag to use to upgrade all currently-enabled pedwarns
> >> from warning to error.  -pedantic-errors also enables the -Wpedantic
> >> pedwarns, while -Werror=pedantic uselessly makes only the -Wpedantic
> >> pedwarns errors.
> >>
> >> I suggest that since -fpermissive lowers some diagnostics from error to
> >> warning, -fno-permissive could do the reverse.
> >
> > Hmm, but that makes '-fno-permissive' different from '-fpermissive
> > -fno-permissive'?
> > What about '-fpermissive -fno-permissive -fno-permissive' then?
> >
> > So I think over-loading -fno-permissive with different semantics from 
> > negating
> > the option is bad.
>
> Fair enough.  Any other thoughts?  It occurs to me now that it is
> already possible to specify this behavior with -pedantic-errors
> -Wno-pedantic, maybe that's sufficient if a bit cumbersome.

I guess so.  Maybe the -fpermissive documentation could hint at
that for this case?

Richard.

> >> gcc/ChangeLog:
> >>
> >>  * doc/invoke.texi: Document -fno-permissive.
> >>
> >> gcc/c-family/ChangeLog:
> >>
> >>  * c.opt (fpermissive): Accept in C and ObjC as well.
> >>  * c-opts.cc (c_common_post_options): -fno-permissive sets
> >>  global_dc->pedantic_errors.
> >> ---
> >>   gcc/doc/invoke.texi| 7 +++
> >>   gcc/c-family/c.opt | 2 +-
> >>   gcc/c-family/c-opts.cc | 4 
> >>   3 files changed, 12 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> >> index b92b8576027..6198df14382 100644
> >> --- a/gcc/doc/invoke.texi
> >> +++ b/gcc/doc/invoke.texi
> >> @@ -3438,11 +3438,18 @@ issue.  Currently, the only such diagnostic issued 
> >> by G++ is the one for
> >>   a name having multiple meanings within a class.
> >>
> >>   @opindex fpermissive
> >> +@opindex fno-permissive
> >>   @item -fpermissive
> >>   Downgrade some diagnostics about nonconformant code from errors to
> >>   warnings.  Thus, using @option{-fpermissive} allows some
> >>   nonconforming code to compile.
> >>
> >> +Conversely, @option{-fno-permissive} can be used to upgrade some
> >> +diagnostics about nonconformant code from warnings to errors.  This
> >> +differs from @option{-pedantic-errors} in that the latter also implies
> >> +@option{-Wpedantic}; this option does not enable additional
> >> +diagnostics, only upgrades the severity of those that are enabled.
> >> +
> >>   @opindex fno-pretty-templates
> >>   @opindex fpretty-templates
> >>   @item -fno-pretty-templates
> >> diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
> >> index cddeece..07165d2bbe8 100644
> >> --- a/gcc/c-family/c.opt
> >> +++ b/gcc/c-family/c.opt
> >> @@ -2075,7 +2075,7 @@ C ObjC C++ ObjC++
> >>   Look for and use PCH files even when preprocessing.
> >>
> >>   fpermissive
> >> -C++ ObjC++ Var(flag_permissive)
> >> +C ObjC C++ ObjC++ Var(flag_permissive)
> >>   Downgrade conformance errors to warnings.
> >>
> >>   fplan9-extensions
> >> diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
> >> index c68a2a27469..1973c068d59 100644
> >> --- a/gcc/c-family/c-opts.cc
> >> +++ b/gcc/c-family/c-opts.cc
> >> @@ -1021,6 +1021,10 @@ c_common_post_options (const char **pfilename)
> >> SET_OPTION_IF_UNSET (&global_options, &global_options_set,
> >> flag_delete_dead_exceptions, true);
> >>
> >> +  if (!global_options_set.x_flag_pedantic_errors
> >> +  && global_options_set.x_flag_permissive)
> >> +global_dc->pedantic_errors = !flag_permissive;
> >> +
> >> if (cxx_dialect >= cxx11)
> >>   {
> >> /* If we're allowing C++0x constructs, don't warn about C++98
> >>
> >> base-commit: 62c4d34ec005e95f000ffabd34da440dc62ac346
> >> --
> >> 2.31.1
> >>
> >
>

Re: [PATCH 1/2] vect: Refactor code for index == count in vect_transform_slp_perm_load_1

2023-05-17 Thread Richard Biener via Gcc-patches

On Wed, May 17, 2023 at 8:09 AM Kewen.Lin  wrote:
>
> Hi,
>
> This patch is to refactor the handlings for the case (index
> == count) in a loop of vect_transform_slp_perm_load_1, in
> order to prepare a subsequent adjustment on *nperm.  This
> patch doesn't have any functional changes.

The diff is impossible to review - can you explain the
refactoring you have done, or attach a patch that shows more
clearly what you changed?

> Bootstrapped and regtested on x86_64-redhat-linux,
> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>
> BR,
> Kewen
> -
> gcc/ChangeLog:
>
> * tree-vect-slp.cc (vect_transform_slp_perm_load_1): Refactor the
> handling on the case index == count.
> ---
>  gcc/tree-vect-slp.cc | 89 ++--
>  1 file changed, 44 insertions(+), 45 deletions(-)
>
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 3b7a21724ec..e5c9d7e766e 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -8230,59 +8230,50 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, 
> slp_tree node,
> noop_p = false;
>mask[index++] = mask_element;
>
> -  if (index == count && !noop_p)
> +  if (index == count)
> {
> - indices.new_vector (mask, second_vec_index == -1 ? 1 : 2, nunits);
> - if (!can_vec_perm_const_p (mode, mode, indices))
> + if (!noop_p)
> {
> - if (dump_p)
> + indices.new_vector (mask, second_vec_index == -1 ? 1 : 2, 
> nunits);
> + if (!can_vec_perm_const_p (mode, mode, indices))
> {
> - dump_printf_loc (MSG_MISSED_OPTIMIZATION,
> -  vect_location,
> -  "unsupported vect permute { ");
> - for (i = 0; i < count; ++i)
> + if (dump_p)
> {
> - dump_dec (MSG_MISSED_OPTIMIZATION, mask[i]);
> - dump_printf (MSG_MISSED_OPTIMIZATION, " ");
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +  "unsupported vect permute { ");
> + for (i = 0; i < count; ++i)
> +   {
> + dump_dec (MSG_MISSED_OPTIMIZATION, mask[i]);
> + dump_printf (MSG_MISSED_OPTIMIZATION, " ");
> +   }
> + dump_printf (MSG_MISSED_OPTIMIZATION, "}\n");
> }
> - dump_printf (MSG_MISSED_OPTIMIZATION, "}\n");
> + gcc_assert (analyze_only);
> + return false;
> }
> - gcc_assert (analyze_only);
> - return false;
> -   }
>
> - ++*n_perms;
> -   }
> + ++*n_perms;
>
> -  if (index == count)
> -   {
> - if (!analyze_only)
> -   {
> - tree mask_vec = NULL_TREE;
> -
> - if (! noop_p)
> -   mask_vec = vect_gen_perm_mask_checked (vectype, indices);
> + if (!analyze_only)
> +   {
> + tree mask_vec = vect_gen_perm_mask_checked (vectype, 
> indices);
>
> - if (second_vec_index == -1)
> -   second_vec_index = first_vec_index;
> + if (second_vec_index == -1)
> +   second_vec_index = first_vec_index;
>
> - for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
> -   {
> - /* Generate the permute statement if necessary.  */
> - tree first_vec = dr_chain[first_vec_index + ri];
> - tree second_vec = dr_chain[second_vec_index + ri];
> - gimple *perm_stmt;
> - if (! noop_p)
> + for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
> {
> - gassign *stmt = as_a <gassign *> (stmt_info->stmt);
> + /* Generate the permute statement if necessary.  */
> + tree first_vec = dr_chain[first_vec_index + ri];
> + tree second_vec = dr_chain[second_vec_index + ri];
> + gassign *stmt = as_a <gassign *> (stmt_info->stmt);
>   tree perm_dest
> = vect_create_destination_var (gimple_assign_lhs 
> (stmt),
>vectype);
>   perm_dest = make_ssa_name (perm_dest);
> - perm_stmt
> + gimple *perm_stmt
> = gimple_build_assign (perm_dest, VEC_PERM_EXPR,
> -  first_vec, second_vec,
> -  mask_vec);
> +  first_vec, second_vec, 
> mask_vec);
>   vect_finish_stmt_generation (vinfo, stmt_info, 
> perm_stmt,

Re: [PATCH] vect: Don't retry if the previous analysis fails

2023-05-17 Thread Richard Biener via Gcc-patches

On Wed, May 17, 2023 at 8:06 AM Kewen.Lin  wrote:
>
> Hi,
>
> When working on a cost tweaking patch, I found that a newly
> added test case has different dumpings with stage-1 and
> bootstrapped gcc.  By looking into it, the apparent reason
> is vect_analyze_loop_2 doesn't get slp_done_for_suggested_uf
> set expectedly, the following retrying will use the garbage
> slp_done_for_suggested_uf instead.  In fact, the setting of
> slp_done_for_suggested_uf only happens when the previous
> analysis succeeds, for the mentioned test case, its previous
> analysis does fail, it's unexpected to use the value of
> slp_done_for_suggested_uf any more.
>
> In function vect_analyze_loop_1, we only return success when
> res is true, which is the result of 1st analysis.  It means
> we never try to vectorize with unroll_vinfo if the previous
> analysis fails.  So this patch shouldn't break anything, and
> just stop some useless analysis early.
>
> Bootstrapped and regtested on x86_64-redhat-linux,
> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>
> Is it ok for trunk?

OK for trunk and affected branches.

Richard.

> BR,
> Kewen
> -
> gcc/ChangeLog:
>
> * tree-vect-loop.cc (vect_analyze_loop_1): Don't retry analysis with
> suggested unroll factor once the previous analysis fails.
> ---
>  gcc/tree-vect-loop.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index ed0166fedab..905145ae97b 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -3044,7 +3044,7 @@ vect_analyze_loop_1 (class loop *loop, vec_info_shared 
> *shared,
>  res ? "succeeded" : " failed",
>  GET_MODE_NAME (loop_vinfo->vector_mode));
>
> -  if (!main_loop_vinfo && suggested_unroll_factor > 1)
> +  if (res && !main_loop_vinfo && suggested_unroll_factor > 1)
>  {
>if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location,
> --
> 2.31.1
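
A minimal sketch of the hazard described in this thread, in plain C rather than GCC internals. The names analyze_once and should_retry are hypothetical stand-ins, not functions from tree-vect-loop.cc. The point is only that an out-parameter written on the success path must not be read after a failure, which is what the added "res &&" guard ensures:

```c
#include <stdbool.h>

/* Hypothetical stand-in for an analysis step: it writes *slp_done only
   when the analysis succeeds, and leaves it untouched on failure.  */
bool
analyze_once (bool will_succeed, bool *slp_done)
{
  if (!will_succeed)
    return false;
  *slp_done = true;
  return true;
}

/* Mirrors the shape of the patched condition: retry with the suggested
   unroll factor only if the previous analysis succeeded.  */
bool
should_retry (bool res, bool have_main_loop_vinfo,
              unsigned suggested_unroll_factor)
{
  return res && !have_main_loop_vinfo && suggested_unroll_factor > 1;
}
```

With res false, should_retry returns false regardless of the other inputs, so a value that analyze_once never wrote is never consulted.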

Re: [PATCH] Fix PR 106900: array-bounds warning inside simplify_builtin_call

2023-05-17 Thread Richard Biener via Gcc-patches

On Wed, May 17, 2023 at 2:54 AM Andrew Pinski via Gcc-patches
 wrote:
>
> The problem here is that VRP cannot figure out isize could not be 0
> due to using integer_zerop. This patch removes the use of integer_zerop
> and instead checks for 0 directly after converting the tree to
> an unsigned HOST_WIDE_INT. This allows VRP to figure out isize is not 0
> and `isize - 1` will always be >= 0.
>
> This patch is just to avoid the warning that GCC could produce sometimes
> and does not change any code generation or even VRP.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

Richard.

> gcc/ChangeLog:
>
> * tree-ssa-forwprop.cc (simplify_builtin_call): Check
> against 0 instead of calling integer_zerop.
> ---
>  gcc/tree-ssa-forwprop.cc | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
> index 06f19868ade..0326e6733e8 100644
> --- a/gcc/tree-ssa-forwprop.cc
> +++ b/gcc/tree-ssa-forwprop.cc
> @@ -1231,14 +1231,14 @@ simplify_builtin_call (gimple_stmt_iterator *gsi_p, 
> tree callee2)
>   tree size = gimple_call_arg (stmt2, 2);
>   /* Size must be a constant which is <= UNITS_PER_WORD and
>  <= the string length.  */
> - if (TREE_CODE (size) != INTEGER_CST || integer_zerop (size))
> + if (TREE_CODE (size) != INTEGER_CST)
> break;
>
>   if (!tree_fits_uhwi_p (size))
> break;
>
>   unsigned HOST_WIDE_INT sz = tree_to_uhwi (size);
> - if (sz > UNITS_PER_WORD || sz >= slen)
> + if (sz == 0 || sz > UNITS_PER_WORD || sz >= slen)
> break;
>
>   tree ch = gimple_call_arg (stmt2, 1);
> --
> 2.31.1
>
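
As a rough illustration of the patched check, here is a standalone C sketch. The helper name size_is_usable and its parameters are hypothetical; only the condition itself is taken from the diff above. Testing for zero on the already-converted unsigned value keeps the whole range check in one place, so code that later computes "sz - 1" can rely on it not wrapping:

```c
#include <stdbool.h>

/* Hypothetical helper mirroring the patched condition: a size is usable
   only if it is non-zero, fits in a word, and is below the string
   length.  */
bool
size_is_usable (unsigned long long sz, unsigned long long units_per_word,
                unsigned long long slen)
{
  if (sz == 0 || sz > units_per_word || sz >= slen)
    return false;
  return true;
}
```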

PING^2 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare

2023-05-17 Thread Kewen.Lin via Gcc-patches

Hi,

Gentle ping this series:

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607146.html

BR,
Kewen

> 
> on 2022/11/24 17:15, Kewen Lin wrote:
>> Hi,
>>
>> Following Segher's suggestion, this patch series is to rework
>> function rs6000_emit_vector_compare for vector float and int
>> in multiple steps, it's based on the previous attempts [1][2].
>> As mentioned in [1], the need to rework this for float is to
>> make a centralized place for vector float comparison handlings
>> instead of supporting with swapping ops and reversing code etc.
>> dispersedly.  It's also for a subsequent patch to handle
>> comparison operators with or without trapping math (PR105480).
>> With the handling on vector float reworked, we can further make
>> the handling on vector int simplified as shown.
>>
>> For Segher's concern about whether this rework causes any
>> assembly change, I constructed two testcases for vector float[3]
>> and int[4] respectively before, it showed the most are fine
>> excepting for the difference on LE and UNGT, it's demonstrated
>> as improvement since it uses GE instead of GT ior EQ.  The
>> associated test case in patch 3/9 is a good example.
>>
>> Besides, w/ and w/o the whole patch series, I built the whole
>> SPEC2017 at options -O3 and -Ofast separately, checked the
>> differences on object assembly.  The result showed that the
>> most are unchanged, except for:
>>
>>   * at -O3, 521.wrf_r has 9 object files and 526.blender_r has
>> 9 object files with differences.
>>
>>   * at -Ofast, 521.wrf_r has 12 object files, 526.blender_r has
>> one and 527.cam4_r has 4 object files with differences.
>>
>> By looking into these differences, all significant differences
>> are caused by the known improvement mentioned above transforming
>> GT ior EQ to GE, which can also affect unrolling decision due
>> to insn count.  Some other trivial differences are branch
>> target offset difference, nop difference for alignment, vsx
>> register number differences etc.
>>
>> I also evaluated the runtime performance for these changed
>> benchmarks, the result is neutral.
>>
>> These patches are bootstrapped and regress-tested
>> incrementally on powerpc64-linux-gnu P7 & P8, and
>> powerpc64le-linux-gnu P9 & P10.
>>
>> Is it ok for trunk?
>>
>> BR,
>> Kewen
>> -
>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606375.html
>> [2] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606376.html
>> [3] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606504.html
>> [4] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606506.html
>>
>> Kewen Lin (9):
>>   rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p1
>>   rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p2
>>   rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p3
>>   rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p4
>>   rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p1
>>   rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p2
>>   rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p3
>>   rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p4
>>   rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p5
>>
>>  gcc/config/rs6000/rs6000.cc | 180 ++--
>>  gcc/testsuite/gcc.target/powerpc/vcond-fp.c |  25 +++
>>  2 files changed, 74 insertions(+), 131 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vcond-fp.c
>>

Re: [PATCH] RISC-V: Adjust stdint.h to stdint-gcc.h for rvv tests

2023-05-17 Thread Robin Dapp via Gcc-patches

> Huh, including stdint-gcc.h looks completely wrong.  What's the issue you are
> trying to solve?

The way I understood it is that that's a temporary workaround until
all multilib et al. (+testsuite) configurations are in place but I
haven't checked the details myself.  Eventually this should be done
properly so we can include the regular headers.  Kito might want to
comment as he dealt with it before.

I used #include <stdint.h> for all those tests and Andreas Schwab reported:

  /usr/include/gnu/stubs.h:11:11: fatal error: gnu/stubs-ilp32d.h: No such file or directory

Regards
 Robin

PING^1 [PATCH] libgcc: Use initarray section type for .init_stack

2023-05-17 Thread Kewen.Lin via Gcc-patches

Hi,

Gentle ping this:

https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614228.html

BR,
Kewen

on 2023/3/20 14:33, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 
> One of my workmates found there is a warning like:
> 
>   libgcc/config/rs6000/morestack.S:402: Warning: ignoring
> incorrect section type for .init_array.0
> 
> when compiling libgcc/config/rs6000/morestack.S.
> 
> Since commit r13-6545 touched that file recently and was
> suspected to be responsible for this warning, I did some
> investigation and found that the warning has been there for
> a long time.  For section .init_stack*, it's preferred to use
> section type SHT_INIT_ARRAY, so this patch uses
> "@init_array" to replace "@progbits".
> 
> Although the warning is trivial, Segher suggested that I
> post this to fix it, in order to avoid any possible
> misunderstanding/confusion on the warning.
> 
> As Alan confirmed, this doesn't require a premise check
> on if the existing binutils supports "@init_array" or not,
> "because if you want split-stack to work, you must link
> with gold, any version of binutils that has gold has an
> assembler that understands @init_array". (Thanks Alan!)
> 
> Bootstrapped and regtested on x86_64-redhat-linux
> and powerpc64{,le}-linux-gnu.
> 
> Is it ok for trunk when next stage 1 comes?
> 
> BR,
> Kewen
> -
> libgcc/ChangeLog:
> 
>   * config/i386/morestack.S: Use @init_array rather than
>   @progbits for section type of section .init_array.
>   * config/rs6000/morestack.S: Likewise.
>   * config/s390/morestack.S: Likewise.
> ---
>  libgcc/config/i386/morestack.S   | 2 +-
>  libgcc/config/rs6000/morestack.S | 2 +-
>  libgcc/config/s390/morestack.S   | 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/libgcc/config/i386/morestack.S b/libgcc/config/i386/morestack.S
> index 7ae99b50cf5..c822b71e2dd 100644
> --- a/libgcc/config/i386/morestack.S
> +++ b/libgcc/config/i386/morestack.S
> @@ -850,7 +850,7 @@ __morestack_make_guard:
>  # This is ELF specific.
> 
>  #if HAVE_INITFINI_ARRAY_SUPPORT
> - .section .init_array.0,"aw",@progbits
> + .section .init_array.0,"aw",@init_array
>  #else
>   .section .ctors.65535,"aw",@progbits
>  #endif
> diff --git a/libgcc/config/rs6000/morestack.S 
> b/libgcc/config/rs6000/morestack.S
> index f2fea6abb10..dd1e27cd454 100644
> --- a/libgcc/config/rs6000/morestack.S
> +++ b/libgcc/config/rs6000/morestack.S
> @@ -399,7 +399,7 @@ ENTRY0(__morestack_make_guard)
> 
>  # Make __stack_split_initialize a high priority constructor.
>  #if HAVE_INITFINI_ARRAY_SUPPORT
> - .section .init_array.0,"aw",@progbits
> + .section .init_array.0,"aw",@init_array
>  #else
>   .section .ctors.65535,"aw",@progbits
>  #endif
> diff --git a/libgcc/config/s390/morestack.S b/libgcc/config/s390/morestack.S
> index 09a49bb8851..f52e7a6510c 100644
> --- a/libgcc/config/s390/morestack.S
> +++ b/libgcc/config/s390/morestack.S
> @@ -597,7 +597,7 @@ __morestack_make_guard:
>  # Make __stack_split_initialize a high priority constructor.
> 
>  #if HAVE_INITFINI_ARRAY_SUPPORT
> - .section .init_array.0,"aw",@progbits
> + .section .init_array.0,"aw",@init_array
>  #else
>   .section .ctors.65535,"aw",@progbits
>  #endif
> --
> 2.31.1

Re: [PATCH v2] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-05-17 Thread Kewen.Lin via Gcc-patches

Hi,

I'd like to gentle ping this patch:

https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614818.html

BR,
Kewen

on 2023/3/29 15:18, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 
> By addressing Alexander's comments, against v1 this
> patch v2 mainly:
> 
>   - Rename no_real_insns_p to no_real_nondebug_insns_p;
>   - Introduce enum rgn_bb_deps_free_action for three
> kinds of actions to free deps;
>   - Change function free_deps_for_bb_no_real_insns_p to
> resolve_forw_deps which only focuses on forward deps;
>   - Extend the handlings to cover dbg-cnt sched_block,
> add one test case for it;
>   - Move free_trg_info call in schedule_region to an
> appropriate place.
> 
> One thing I'm not sure about is the change in function
> sched_rgn_local_finish, currently the invocation to
> sched_rgn_local_free is guarded with !sel_sched_p (),
> so I just follow it, but the initialization of those
> structures (in sched_rgn_local_init) isn't guarded
> with !sel_sched_p (), it looks odd.
> 
> 
> 
> As PR108273 shows, when there is one block which only has
> NOTE_P and LABEL_P insns at non-debug mode while has some
> extra DEBUG_INSN_P insns at debug mode, after scheduling
> it, the DFA states would be different between debug mode
> and non-debug mode.  Since at non-debug mode, the block
> meets no_real_insns_p, it gets skipped; while at debug
> mode, it gets scheduled, even it only has NOTE_P, LABEL_P
> and DEBUG_INSN_P, the call of function advance_one_cycle
> will change the DFA state.  PR108519 also shows this
> issue can be exposed by some scheduler changes.
> 
> This patch is to change function no_real_insns_p to
> function no_real_nondebug_insns_p by taking debug insn into
> account, which make us not try to schedule for the block
> having only NOTE_P, LABEL_P and DEBUG_INSN_P insns,
> resulting in consistent DFA states between non-debug and
> debug mode.
> 
> Changing no_real_insns_p to no_real_nondebug_insns_p caused
> ICE when doing free_block_dependencies, the root cause is
> that we create dependencies for debug insns, those
> dependencies are expected to be resolved during scheduling
> insns, but which gets skipped after this change.
> By checking the code, it looks it's reasonable to skip to
> compute block dependences for no_real_nondebug_insns_p
> blocks.  There is also another issue, which gets exposed
> in SPEC2017 bmks build at option -O2 -g, is that we could
> skip to schedule some block, which already gets dependency
> graph built so has dependencies computed and rgn_n_insns
> accumulated, then the later verification on if the graph
> becomes exhausted by scheduling would fail as follows:
> 
>   /* Sanity check: verify that all region insns were
>  scheduled.  */
> gcc_assert (sched_rgn_n_insns == rgn_n_insns);
> 
> , and also some forward deps aren't resolved.
> 
> As Alexander pointed out, the current debug count handling
> also suffers the similar issue, so this patch handles these
> two cases together: one is for some block gets skipped by
> !dbg_cnt (sched_block), the other is for some block which
> is not no_real_nondebug_insns_p initially but becomes
> no_real_nondebug_insns_p due to speculative scheduling.
> 
> This patch can be bootstrapped and regress-tested on
> x86_64-redhat-linux, aarch64-linux-gnu and
> powerpc64{,le}-linux-gnu.
> 
> I also verified this patch can pass SPEC2017 both intrate
> and fprate bmks building at -g -O2/-O3.
> 
> Any thoughts?
> 
> BR,
> Kewen
> 
>   PR rtl-optimization/108273
> 
> gcc/ChangeLog:
> 
>   * haifa-sched.cc (no_real_insns_p): Rename to ...
>   (no_real_nondebug_insns_p): ... this, and consider DEBUG_INSN_P insn.
>   * sched-ebb.cc (schedule_ebb): Replace no_real_insns_p with
>   no_real_nondebug_insns_p.
>   * sched-int.h (no_real_insns_p): Rename to ...
>   (no_real_nondebug_insns_p): ... this.
>   * sched-rgn.cc (enum rgn_bb_deps_free_action): New enum.
>   (bb_deps_free_actions): New static variable.
>   (compute_block_dependences): Skip for no_real_nondebug_insns_p.
>   (resolve_forw_deps): New function.
>   (free_block_dependencies): Check bb_deps_free_actions and call
>   function resolve_forw_deps for RGN_BB_DEPS_FREE_ARTICIAL.
>   (compute_priorities): Replace no_real_insns_p with
>   no_real_nondebug_insns_p.
>   (schedule_region): Replace no_real_insns_p with
>   no_real_nondebug_insns_p, set RGN_BB_DEPS_FREE_ARTICIAL if the block
>   get dependencies computed before but skipped now, fix up count
>   sched_rgn_n_insns for it too.  Call free_trg_info when the block
>   gets scheduled, and move sched_rgn_local_finish after the loop
>   of free_block_dependencies loop.
>   (sched_rgn_local_init): Allocate and compute bb_deps_free_actions.
>   (sched_rgn_local_finish): Free bb_deps_free_actions.
>   * sel-sched.cc (sel_region_target_finish): Replace no_real_insns_p with
>   no_real_nondebug_insns_p.
> 
>
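
To make the renamed predicate concrete, here is a simplified C model. The enum and both functions are hypothetical illustrations, since real GCC insns are RTL objects tested with NOTE_P, LABEL_P and DEBUG_INSN_P. The sketch only shows why a block holding notes, labels and debug insns was treated differently with and without -g before the change:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified insn kinds.  */
enum insn_kind { KIND_NOTE, KIND_LABEL, KIND_DEBUG, KIND_REAL };

/* Old-style predicate: anything other than a note or label counts as
   "real", so a debug insn makes the block look non-empty.  */
bool
no_real_insns (const enum insn_kind *insns, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (insns[i] != KIND_NOTE && insns[i] != KIND_LABEL)
      return false;
  return true;
}

/* New-style predicate: debug insns are skipped as well, so the answer
   no longer depends on whether debug info is being generated.  */
bool
no_real_nondebug_insns (const enum insn_kind *insns, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (insns[i] != KIND_NOTE && insns[i] != KIND_LABEL
        && insns[i] != KIND_DEBUG)
      return false;
  return true;
}
```

For a block of { note, label, debug }, the old predicate reports real insns while the new one does not, matching the block of { note, label } seen without -g.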

Re: [PATCH] RISC-V: Adjust stdint.h to stdint-gcc.h for rvv tests

2023-05-17 Thread Richard Biener via Gcc-patches

On Tue, May 16, 2023 at 9:11 AM Robin Dapp via Gcc-patches
 wrote:
>
> > This patch would like to align the stdint.h to the stdint-gcc.h for all
> > the RVV test files. Aka:
> >
> > stdint.h => stdint-gcc.h
>
> Looks good.  Jeff already pre-approved so you can go ahead and install
> this on the trunk.

Huh, including stdint-gcc.h looks completely wrong.  What's the issue you are
trying to solve?

> Regards
>  Robin

[PATCH 2/2] vect: Enhance cost evaluation in vect_transform_slp_perm_load_1

2023-05-17 Thread Kewen.Lin via Gcc-patches

Hi,

Following Richi's suggestion in [1], I'm working on deferring
cost evaluation next to the transformation, this patch is
to enhance function vect_transform_slp_perm_load_1 which
could under-cost for vector permutation, since the costing
doesn't try to consider nvectors_per_build, it's inconsistent
with the transformation part.
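
A small C model of the counting change, as a sketch only. The function names are hypothetical and the model ignores everything except how often the permute counter is bumped. It assumes, as in the diff below, that nvectors_per_build falls back to 1 when the statement count is zero:

```c
/* Hypothetical model of the previous costing: one permute is counted
   per generated mask.  */
unsigned
count_perms_per_mask (unsigned nmasks)
{
  unsigned n_perms = 0;
  for (unsigned i = 0; i < nmasks; i++)
    ++n_perms;
  return n_perms;
}

/* Hypothetical model of the adjusted costing: one permute is counted
   per mask and per vector built from that mask, which is what the
   transformation loop actually emits.  */
unsigned
count_perms_per_vector (unsigned nmasks, unsigned nstmts)
{
  unsigned nvectors_per_build = nstmts > 0 ? nstmts : 1;
  unsigned n_perms = 0;
  for (unsigned i = 0; i < nmasks; i++)
    for (unsigned ri = 0; ri < nvectors_per_build; ri++)
      ++n_perms;
  return n_perms;
}
```

With one mask and two vector statements the first model counts 1 permute and the second counts 2.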

Bootstrapped and regtested on x86_64-redhat-linux,
aarch64-linux-gnu and powerpc64{,le}-linux-gnu.

Is it ok for trunk?

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563624.html

BR,
Kewen
-
gcc/ChangeLog:

* tree-vect-slp.cc (vect_transform_slp_perm_load_1): Adjust the
calculation on n_perms by considering nvectors_per_build.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c: New test.
---
 .../vect/costmodel/ppc/costmodel-slp-perm.c   | 23 +++
 gcc/tree-vect-slp.cc  | 66 ++-
 2 files changed, 57 insertions(+), 32 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c

diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c 
b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
new file mode 100644
index 000..e5c4dceddfb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* Specify power9 to ensure the vectorization is profitable
+   and test point stands, otherwise it could be not profitable
+   to vectorize.  */
+/* { dg-additional-options "-mdejagnu-cpu=power9 -mpower9-vector" } */
+
+/* Verify we cost the exact count for required vec_perm.  */
+
+int x[1024], y[1024];
+
+void
+foo ()
+{
+  for (int i = 0; i < 512; ++i)
+{
+  x[2 * i] = y[1023 - (2 * i)];
+  x[2 * i + 1] = y[1023 - (2 * i + 1)];
+}
+}
+
+/* { dg-final { scan-tree-dump-times "2 times vec_perm" 1 "vect" } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index e5c9d7e766e..af9a6dd4fa9 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -8115,12 +8115,12 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, 
slp_tree node,

   mode = TYPE_MODE (vectype);
   poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  unsigned int nstmts = SLP_TREE_NUMBER_OF_VEC_STMTS (node);

   /* Initialize the vect stmts of NODE to properly insert the generated
  stmts later.  */
   if (! analyze_only)
-for (unsigned i = SLP_TREE_VEC_STMTS (node).length ();
-i < SLP_TREE_NUMBER_OF_VEC_STMTS (node); i++)
+for (unsigned i = SLP_TREE_VEC_STMTS (node).length (); i < nstmts; i++)
   SLP_TREE_VEC_STMTS (node).quick_push (NULL);

   /* Generate permutation masks for every NODE. Number of masks for each NODE
@@ -8161,7 +8161,10 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, 
slp_tree node,
 (b) the permutes only need a single vector input.  */
   mask.new_vector (nunits, group_size, 3);
   nelts_to_build = mask.encoded_nelts ();
-  nvectors_per_build = SLP_TREE_VEC_STMTS (node).length ();
+  /* It's possible to obtain zero nstmts during analyze_only, so make
+it at least one to ensure the later computation for n_perms
+proceed.  */
+  nvectors_per_build = nstmts > 0 ? nstmts : 1;
   in_nlanes = DR_GROUP_SIZE (stmt_info) * 3;
 }
   else
@@ -8252,40 +8255,39 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, 
slp_tree node,
  return false;
}

- ++*n_perms;
-
+ tree mask_vec = NULL_TREE;
  if (!analyze_only)
-   {
- tree mask_vec = vect_gen_perm_mask_checked (vectype, indices);
+   mask_vec = vect_gen_perm_mask_checked (vectype, indices);

- if (second_vec_index == -1)
-   second_vec_index = first_vec_index;
+ if (second_vec_index == -1)
+   second_vec_index = first_vec_index;

- for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
+ for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
+   {
+ ++*n_perms;
+ if (analyze_only)
+   continue;
+ /* Generate the permute statement if necessary.  */
+ tree first_vec = dr_chain[first_vec_index + ri];
+ tree second_vec = dr_chain[second_vec_index + ri];
+ gassign *stmt = as_a <gassign *> (stmt_info->stmt);
+ tree perm_dest
+   = vect_create_destination_var (gimple_assign_lhs (stmt),
+  vectype);
+ perm_dest = make_ssa_name (perm_dest);
+ gimple *perm_stmt
+   = gimple_build_assign (perm_dest, VEC_PERM_EXPR, first_vec,
+  second_vec, mask_vec);
+

[PATCH] Optimized "(X - N * M) / N + M" to "X / N" if valid

2023-05-17 Thread Jiufu Guo via Gcc-patches

Hi,

This patch tries to optimize "(X - N * M) / N + M" to "X / N".
As per the discussions in PR108757, we know this transformation is valid
only under some conditions.
For C code, "/" rounds towards zero (trunc_div), and "X - N * M"
may wrap/overflow/underflow. So the transformation is valid only if
"X - N * M" does not cross zero and does not wrap/overflow/underflow.

This patch also handles the case when "N" is a power of 2, where
"(X - N * M) / N" is "(X - N * M) >> log2(N)".

Bootstrap & regtest pass on ppc64{,le} and x86_64.
Is this ok for trunk?
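
A worked example of the "does not cross zero" condition, written as a plain C sketch with hypothetical function names. It uses small values so that nothing overflows:

```c
/* Original expression: (X - N * M) / N + M, with C's truncating "/".  */
int
div_before (int x, int n, int m)
{
  return (x - n * m) / n + m;
}

/* Proposed replacement: X / N.  */
int
div_after (int x, int n)
{
  return x / n;
}
```

With N = 4 and M = 3, X = 13 gives (13 - 12) / 4 + 3 = 3 and 13 / 4 = 3, so the forms agree. X = 5 gives (5 - 12) / 4 + 3 = -1 + 3 = 2, while 5 / 4 = 1: here X is positive but X - N * M is negative, so truncation rounds in different directions and the rewrite would be wrong.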

BR,
Jeff (Jiufu)

PR tree-optimization/108757

gcc/ChangeLog:

* gimple-match-head.cc (optimize_x_minus_NM_div_N_plus_M): New function.
* match.pd ((X - N * M) / N + M): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/pr108757-1.c: New test.
* gcc.dg/pr108757-2.c: New test.
* gcc.dg/pr108757.h: New test.

---
 gcc/gimple-match-head.cc  |  54 ++
 gcc/match.pd  |  22 
 gcc/testsuite/gcc.dg/pr108757-1.c |  17 
 gcc/testsuite/gcc.dg/pr108757-2.c |  18 
 gcc/testsuite/gcc.dg/pr108757.h   | 160 ++
 5 files changed, 271 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr108757-1.c
 create mode 100644 gcc/testsuite/gcc.dg/pr108757-2.c
 create mode 100644 gcc/testsuite/gcc.dg/pr108757.h

diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
index b08cd891a13..680a4cb2fc6 100644
--- a/gcc/gimple-match-head.cc
+++ b/gcc/gimple-match-head.cc
@@ -224,3 +224,57 @@ optimize_successive_divisions_p (tree divisor, tree 
inner_div)
 }
   return true;
 }
+
+/* Return true if "(X - N * M) / N + M" can be optimized into "X / N".
+   Otherwise return false.
+
+   For unsigned,
+   If sign bit of M is 0 (clz is 0), valid range is [N*M, MAX].
+   If sign bit of M is 1, valid range is [0, MAX - N*(-M)].
+
+   For signed,
+   If N*M > 0, valid range: [MIN+N*M, 0] + [N*M, MAX]
+   If N*M < 0, valid range: [MIN, -(-N*M)] + [0, MAX - (-N*M)].  */
+
+static bool
+optimize_x_minus_NM_div_N_plus_M (tree x, wide_int n, wide_int m, tree type)
+{
+  wide_int max = wi::max_value (type);
+  signop sgn = TYPE_SIGN (type);
+  wide_int nm;
+  wi::overflow_type ovf;
+  if (TYPE_UNSIGNED (type) && wi::clz (m) == 0)
+nm = wi::mul (n, -m, sgn, &ovf);
+  else
+nm = wi::mul (n, m, sgn, &ovf);
+
+  if (ovf != wi::OVF_NONE)
+return false;
+
+  value_range vr0;
+  if (!get_range_query (cfun)->range_of_expr (vr0, x) || vr0.varying_p ()
+  || vr0.undefined_p ())
+return false;
+
+  wide_int wmin0 = vr0.lower_bound ();
+  wide_int wmax0 = vr0.upper_bound ();
+  wide_int min = wi::min_value (type);
+
+  /* unsigned */
+  if ((TYPE_UNSIGNED (type)))
+/* M > 0 (clz != 0): [N*M, MAX],  M < 0 : [0, MAX-N*(-M)]  */
+return wi::clz (m) != 0 ? wi::ge_p (wmin0, nm, sgn)
+   : wi::le_p (wmax0, max - nm, sgn);
+
+  /* signed, N*M > 0 */
+  else if (wi::gt_p (nm, 0, sgn))
+/* [N*M, MAX] or [MIN+N*M, 0] */
+return wi::ge_p (wmin0, nm, sgn)
+  || (wi::ge_p (wmin0, min + nm, sgn) && wi::le_p (wmax0, 0, sgn));
+
+  /* signed, N*M < 0 */
+  /* [MIN, N*M] or [0, MAX + N*M]*/
+  else
+return wi::le_p (wmax0, nm, sgn)
+  || (wi::ge_p (wmin0, 0, sgn) && wi::le_p (wmax0, max - (-nm), sgn));
+}
diff --git a/gcc/match.pd b/gcc/match.pd
index ceae1c34abc..1aaa5530577 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -881,6 +881,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 #endif

 
+#if GIMPLE
+/* Simplify ((t + -N*M) / N + M) -> t / N.  */
+(for div (trunc_div exact_div)
+ (simplify
+  (plus (div (plus @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
+  (with {wide_int n = wi::to_wide (@2); wide_int m = wi::to_wide (@3);}
+(if (INTEGRAL_TYPE_P (type)
+&& n * m == -wi::to_wide (@1)
+&& optimize_x_minus_NM_div_N_plus_M (@0, n, m, type))
+(div @0 @2)
+
+/* Simplify ((t + -(M<<N)) >> N + M) -> t >> N.  */
+(simplify
+ (plus (rshift (plus @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
+ (with {wide_int n = wi::to_wide (@2); wide_int m = wi::to_wide (@3);}
+   (if (INTEGRAL_TYPE_P (type)
+   && (m << n) == -wi::to_wide (@1)
+   && optimize_x_minus_NM_div_N_plus_M (@0,
+wi::one (TYPE_PRECISION (type)) << n, m, type))
+(rshift @0 @2
+#endif
+
 (for op (negate abs)
  /* Simplify cos(-x) and cos(|x|) -> cos(x).  Similarly for cosh.  */
  (for coss (COS COSH)
diff --git a/gcc/testsuite/gcc.dg/pr108757-1.c b/gcc/testsuite/gcc.dg/pr108757-1.c
new file mode 100644
index 000..349318a7c82
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr108757-1.c
@@ -0,0 +1,17 @@
+/* PR tree-optimization/108757 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <limits.h>
+#define N 4
+#define M 3
+#define GAP 0
+typedef unsigned int UINT;
+typedef int INT;
+#define UMAX UINT_MAX
+#define IMAX INT_MAX
+#define IMIN INT_MIN
+#include "pr108757.h"
+
+/* { dg-final { scan-tree-dump-not "
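
As a rough illustration of why the match.pd rule for ((t + -N*M) / N + M) -> t / N needs the range check on t, here is a small standalone C++ sketch. It is not part of the patch: the helper names (lhs, rhs, identity_holds, holds_on_range) are hypothetical, it uses plain int with the N = 4, M = 3 values from pr108757-1.c, and it only covers small values, so the overflow cases near MIN/MAX that the patch also guards against are not exercised.

```cpp
#include <cassert>

// Illustrative constants matching the N and M used in pr108757-1.c.
constexpr int N = 4;
constexpr int M = 3;

// Left-hand side of the pattern: ((t + -N*M) / N + M).
constexpr int lhs (int t) { return (t + -N * M) / N + M; }

// Right-hand side of the pattern: t / N.
constexpr int rhs (int t) { return t / N; }

// True when replacing the left-hand side by t / N keeps the value for this t.
constexpr bool identity_holds (int t) { return lhs (t) == rhs (t); }

// Hypothetical helper: checks the identity for every t in [lo, hi].
bool holds_on_range (int lo, int hi)
{
  for (int t = lo; t <= hi; ++t)
    if (!identity_holds (t))
      return false;
  return true;
}
```

With truncating signed division, the identity holds for t >= N*M and for t <= 0, but it can fail in between (for example t = 1 or t = 11), which corresponds to the "[N*M, MAX] or [MIN+N*M, 0]" comment in the signed, N*M > 0 branch.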

[PATCH 1/2] vect: Refactor code for index == count in vect_transform_slp_perm_load_1

2023-05-17 Thread Kewen.Lin via Gcc-patches

Hi,

This patch refactors the handling of the case (index == count)
in the loop of vect_transform_slp_perm_load_1, in preparation
for a subsequent adjustment to *n_perms.  It makes no
functional changes.
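
A simplified sketch of the control-flow change, for readers who want to check that merging the two index == count tests keeps the behavior. This is not GCC code: the Outcome struct and the before/after/all_cases_agree functions are hypothetical, and the early return for an unsupported permute and the code that stores the vector statement are left out.

```cpp
#include <cassert>

// Hypothetical record of what one loop iteration did; not a GCC type.
struct Outcome
{
  int n_perms = 0;     // times the permute counter was incremented
  int perm_stmts = 0;  // permute statements that would be generated
};

bool same (const Outcome &a, const Outcome &b)
{
  return a.n_perms == b.n_perms && a.perm_stmts == b.perm_stmts;
}

// Shape before the refactoring: two separate tests of index == count.
Outcome before (bool at_count, bool noop_p, bool analyze_only, int nvectors)
{
  Outcome r;
  if (at_count && !noop_p)
    ++r.n_perms;
  if (at_count)
    {
      if (!analyze_only)
        for (int ri = 0; ri < nvectors; ++ri)
          if (!noop_p)
            ++r.perm_stmts;
    }
  return r;
}

// Shape after the refactoring: one test, with the !noop_p work nested inside.
Outcome after (bool at_count, bool noop_p, bool analyze_only, int nvectors)
{
  Outcome r;
  if (at_count)
    {
      if (!noop_p)
        {
          ++r.n_perms;
          if (!analyze_only)
            for (int ri = 0; ri < nvectors; ++ri)
              ++r.perm_stmts;
        }
    }
  return r;
}

// Compares both shapes over every combination of the three flags.
bool all_cases_agree (int nvectors)
{
  for (int bits = 0; bits < 8; ++bits)
    {
      bool at_count = bits & 1, noop_p = bits & 2, analyze_only = bits & 4;
      if (!same (before (at_count, noop_p, analyze_only, nvectors),
                 after (at_count, noop_p, analyze_only, nvectors)))
        return false;
    }
  return true;
}
```

In this model both shapes produce the same counts for every flag combination, which matches the claim that the refactoring is purely structural.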

Bootstrapped and regtested on x86_64-redhat-linux,
aarch64-linux-gnu and powerpc64{,le}-linux-gnu.

BR,
Kewen
-
gcc/ChangeLog:

* tree-vect-slp.cc (vect_transform_slp_perm_load_1): Refactor the
handling on the case index == count.
---
 gcc/tree-vect-slp.cc | 89 ++--
 1 file changed, 44 insertions(+), 45 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 3b7a21724ec..e5c9d7e766e 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -8230,59 +8230,50 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
noop_p = false;
   mask[index++] = mask_element;

-  if (index == count && !noop_p)
+  if (index == count)
{
- indices.new_vector (mask, second_vec_index == -1 ? 1 : 2, nunits);
- if (!can_vec_perm_const_p (mode, mode, indices))
+ if (!noop_p)
{
- if (dump_p)
+ indices.new_vector (mask, second_vec_index == -1 ? 1 : 2, nunits);
+ if (!can_vec_perm_const_p (mode, mode, indices))
{
- dump_printf_loc (MSG_MISSED_OPTIMIZATION,
-  vect_location,
-  "unsupported vect permute { ");
- for (i = 0; i < count; ++i)
+ if (dump_p)
{
- dump_dec (MSG_MISSED_OPTIMIZATION, mask[i]);
- dump_printf (MSG_MISSED_OPTIMIZATION, " ");
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+  "unsupported vect permute { ");
+ for (i = 0; i < count; ++i)
+   {
+ dump_dec (MSG_MISSED_OPTIMIZATION, mask[i]);
+ dump_printf (MSG_MISSED_OPTIMIZATION, " ");
+   }
+ dump_printf (MSG_MISSED_OPTIMIZATION, "}\n");
}
- dump_printf (MSG_MISSED_OPTIMIZATION, "}\n");
+ gcc_assert (analyze_only);
+ return false;
}
- gcc_assert (analyze_only);
- return false;
-   }

- ++*n_perms;
-   }
+ ++*n_perms;

-  if (index == count)
-   {
- if (!analyze_only)
-   {
- tree mask_vec = NULL_TREE;
-
- if (! noop_p)
-   mask_vec = vect_gen_perm_mask_checked (vectype, indices);
+ if (!analyze_only)
+   {
+ tree mask_vec = vect_gen_perm_mask_checked (vectype, indices);

- if (second_vec_index == -1)
-   second_vec_index = first_vec_index;
+ if (second_vec_index == -1)
+   second_vec_index = first_vec_index;

- for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
-   {
- /* Generate the permute statement if necessary.  */
- tree first_vec = dr_chain[first_vec_index + ri];
- tree second_vec = dr_chain[second_vec_index + ri];
- gimple *perm_stmt;
- if (! noop_p)
+ for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
{
- gassign *stmt = as_a <gassign *> (stmt_info->stmt);
+ /* Generate the permute statement if necessary.  */
+ tree first_vec = dr_chain[first_vec_index + ri];
+ tree second_vec = dr_chain[second_vec_index + ri];
+ gassign *stmt = as_a<gassign *> (stmt_info->stmt);
  tree perm_dest
= vect_create_destination_var (gimple_assign_lhs (stmt),
   vectype);
  perm_dest = make_ssa_name (perm_dest);
- perm_stmt
+ gimple *perm_stmt
= gimple_build_assign (perm_dest, VEC_PERM_EXPR,
-  first_vec, second_vec,
-  mask_vec);
+  first_vec, second_vec, mask_vec);
  vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
   gsi);
  if (dce_chain)
@@ -8290,15 +8281,23 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, slp_tree node,
  bitmap_set_bit (used_defs, first_vec_index + ri);
  bitmap_set_bit (used_defs, second_vec_index + ri);
}
+
+ /* Store the vector statement in NODE.  */

[PATCH] vect: Don't retry if the previous analysis fails

2023-05-17 Thread Kewen.Lin via Gcc-patches

Hi,

While working on a cost tweaking patch, I found that a newly
added test case produces different dump output with stage-1
and bootstrapped gcc.  Looking into it, the apparent reason
is that vect_analyze_loop_2 doesn't set
slp_done_for_suggested_uf as expected, so the following retry
uses a garbage value of slp_done_for_suggested_uf instead.
In fact, slp_done_for_suggested_uf is only set when the
previous analysis succeeds.  For the test case mentioned, the
previous analysis fails, so the value of
slp_done_for_suggested_uf should not be used at all.

In function vect_analyze_loop_1, we only return success when
res is true, where res is the result of the first analysis.
This means we never try to vectorize with unroll_vinfo if the
previous analysis fails.  So this patch shouldn't break
anything; it just stops some useless analysis early.
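
A minimal sketch of the guard change, modelled outside GCC. The function names old_retry_condition and new_retry_condition are hypothetical, and have_main_loop_vinfo stands in for the main_loop_vinfo pointer being non-null; only the boolean condition itself comes from the patch.

```cpp
#include <cassert>

// Condition before the patch: the retry with the suggested unroll factor
// did not look at the result of the first analysis.
bool old_retry_condition (bool res, bool have_main_loop_vinfo,
                          unsigned suggested_unroll_factor)
{
  (void) res;  // unused before the patch
  return !have_main_loop_vinfo && suggested_unroll_factor > 1;
}

// Condition after the patch: retry only when the first analysis succeeded.
bool new_retry_condition (bool res, bool have_main_loop_vinfo,
                          unsigned suggested_unroll_factor)
{
  return res && !have_main_loop_vinfo && suggested_unroll_factor > 1;
}
```

The two conditions differ only when res is false, which is the case where slp_done_for_suggested_uf was never set and the retry would read an uninitialized value.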

Bootstrapped and regtested on x86_64-redhat-linux,
aarch64-linux-gnu and powerpc64{,le}-linux-gnu.

Is it ok for trunk?

BR,
Kewen
-
gcc/ChangeLog:

* tree-vect-loop.cc (vect_analyze_loop_1): Don't retry analysis with
suggested unroll factor once the previous analysis fails.
---
 gcc/tree-vect-loop.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index ed0166fedab..905145ae97b 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3044,7 +3044,7 @@ vect_analyze_loop_1 (class loop *loop, vec_info_shared *shared,
 res ? "succeeded" : " failed",
 GET_MODE_NAME (loop_vinfo->vector_mode));

-  if (!main_loop_vinfo && suggested_unroll_factor > 1)
+  if (res && !main_loop_vinfo && suggested_unroll_factor > 1)
 {
   if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
--
2.31.1
