[PATCH v2] RISC-V: Fix VWEXTF iterator requirement

2023-06-18 Thread Li Xu
gcc/ChangeLog:

* config/riscv/vector-iterators.md: zvfh/zvfhmin depends on the Zve32f extension.
---
 gcc/config/riscv/vector-iterators.md | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index 8c71c9e22cc..92b372986c7 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -503,12 +503,12 @@
 ])

 (define_mode_iterator VWEXTF [
-  (VNx1SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
-  (VNx2SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_VECTOR_ELEN_FP_32")
-  (VNx4SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_VECTOR_ELEN_FP_32")
-  (VNx8SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_VECTOR_ELEN_FP_32")
-  (VNx16SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32")
-  (VNx32SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
+  (VNx1SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
+  (VNx2SF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx4SF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx8SF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx16SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
+  (VNx32SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")

   (VNx1DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN < 128")
   (VNx2DF "TARGET_VECTOR_ELEN_FP_64")
--
2.17.1



Re: [PATCH ver 2] rs6000, fix vec_replace_unaligned builtin arguments

2023-06-18 Thread Kewen.Lin via Gcc-patches
Hi Carl,

on 2023/6/16 00:00, Carl Love wrote:
> GCC maintainers:
> 
> Version 2, fixed various typos.  Updated the change log body to say the
> instruction counts were updated.  The instruction counts changed as a
> result of changing the first argument of the vec_replace_unaligned
> builtin call from vector unsigned long long (vull) to vector unsigned
> char (vuc).  When the first argument was vull the builtin call
> generated the vinsd instruction for the two test cases.  The updated
> call with vuc as the first argument generates two vinsw instructions
> instead.  Patch was retested on Power 10 with no regressions.
> 
> The following patch fixes the first argument in the builtin definition
> and the corresponding test cases.  Initially, the builtin specification
> was wrong due to a cut and paste error.  The documentation was fixed in:
> 
>commit ed3fea09b18f67e757b5768b42cb6e816626f1db
>Author: Bill Schmidt 
>Date:   Fri Feb 4 13:07:17 2022 -0600
> 
>rs6000: Correct function prototypes for vec_replace_unaligned
> 
>Due to a pasto error in the documentation, vec_replace_unaligned was
>implemented with the same function prototypes as vec_replace_elt.  It was
>intended that vec_replace_unaligned always specify output vectors as having
>type vector unsigned char, to emphasize that elements are potentially
>misaligned by this built-in function.  This patch corrects the
>misimplementation.
> 
> 
> This patch fixes the arguments in the definitions and updates the
> testcases accordingly.  Additionally, a few minor spacing issues are
> fixed.
> 
> The patch has been tested on Power 10 with no regressions.  Please let
> me know if the patch is acceptable for mainline.  Thanks.
> 
>  Carl 
> 
> --
> rs6000, fix vec_replace_unaligned builtin arguments
> 
> The first argument of the vec_replace_unaligned builtin should always be
> unsigned char, as specified in gcc/doc/extend.texi.
> 
> This patch fixes the builtin definitions and updates the testcases to use
> the correct arguments.  The expected instruction counts for the testcase
> are updated.
> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-overload.def (__builtin_vec_replace_un):
>   Fix first argument type.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/ver-replace-word-runnable.c

Wrong case name, s/ver/vec/

Sorry that I didn't catch this during the previous review.
How do you generate the changelog?  I guess you don't type this manually
but use something like gcc-{commit}-mklog?  That should get this name
right, though it's not perfect. :)

Except for the vinsd/vinsw expected insn counts, which were replied to
in another thread, the others look good to me.

BR,
Kewen
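For readers following along, the reason the first argument (and the result) is typed vector unsigned char is that vec_replace_unaligned works at byte granularity. A rough scalar model of that semantics (a hypothetical helper for illustration only, not the rs6000 builtin, which compiles to vins* instructions) is:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Scalar model of a byte-granular "replace unaligned": overwrite sizeof(T)
// bytes of a 16-byte vector at an arbitrary byte offset.  Illustrative
// sketch of why the operand type is vector unsigned char, nothing more.
template <typename T>
std::array<uint8_t, 16> replace_unaligned (std::array<uint8_t, 16> v,
                                           T val, unsigned byte_off)
{
  assert (byte_off + sizeof (T) <= 16);
  std::memcpy (v.data () + byte_off, &val, sizeof (T));
  return v;
}
```

Because the offset is in bytes, the replaced element may straddle any wider lane boundary, so typing the vector operand as anything wider than unsigned char would be misleading.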


Re: [PATCH] RISC-V: Fix iterator requirement

2023-06-18 Thread juzhe.zh...@rivai.ai
I understand this patch fixes VWF, VWF_ZVE64 and VWEXTF, based on the current
upstream code.

I agree with "VWEXTF" changes.

But not the "VWF" and "VWF_ZVE64" changes, since the current reduction patterns
have bugs on ZVE32* and ZVE64* and we have refactored them:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622089.html
So these two fixes are not necessary at the moment.

Would you mind sending a V2 with only the "VWEXTF" fix?

Thanks.


juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2023-06-19 12:26
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; Li Xu
Subject: [PATCH] RISC-V: Fix iterator requirement
VWF is defined under TARGET_MIN_VLEN >= 128.
VWEXTF: zvfh/zvfhmin depends on the Zve32f extension.
 
gcc/ChangeLog:
 
* config/riscv/vector-iterators.md: Fix requirement
---
gcc/config/riscv/vector-iterators.md | 24 +---
1 file changed, 13 insertions(+), 11 deletions(-)
 
diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index 8c71c9e22cc..bc3cde58612 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -330,18 +330,20 @@
])
(define_mode_iterator VWF [
-  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
   (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
-  (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
-  (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
-  (VNx1SF "TARGET_MIN_VLEN < 128") VNx2SF VNx4SF VNx8SF (VNx16SF "TARGET_MIN_VLEN > 32") (VNx32SF "TARGET_MIN_VLEN >= 128")
+  (VNx32HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx64HF "TARGET_VECTOR_ELEN_FP_16")
+  VNx2SF VNx4SF VNx8SF VNx16SF VNx32SF
])
(define_mode_iterator VWF_ZVE64 [
-  VNx1HF VNx2HF VNx4HF VNx8HF VNx16HF VNx32HF
+  (VNx1HF "TARGET_VECTOR_ELEN_FP_16") (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx4HF "TARGET_VECTOR_ELEN_FP_16") (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx32HF "TARGET_VECTOR_ELEN_FP_16")
   VNx1SF VNx2SF VNx4SF VNx8SF VNx16SF
])
@@ -503,12 +505,12 @@
])
(define_mode_iterator VWEXTF [
-  (VNx1SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
-  (VNx2SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_VECTOR_ELEN_FP_32")
-  (VNx4SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_VECTOR_ELEN_FP_32")
-  (VNx8SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_VECTOR_ELEN_FP_32")
-  (VNx16SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32")
-  (VNx32SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
+  (VNx1SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
+  (VNx2SF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx4SF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx8SF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx16SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
+  (VNx32SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
   (VNx1DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN < 128")
   (VNx2DF "TARGET_VECTOR_ELEN_FP_64")
-- 
2.17.1
 
 


Re: RE: [PATCH v1] RISC-V: Bugfix for RVV widening reduction in ZVE32/64

2023-06-18 Thread juzhe.zh...@rivai.ai
I notice VWF_ZVE64 
should be removed.


juzhe.zh...@rivai.ai
 
From: Li, Pan2
Date: 2023-06-19 09:29
To: 钟居哲; gcc-patches
CC: rdapp.gcc; Jeff Law; Wang, Yanzhang; kito.cheng
Subject: RE: [PATCH v1] RISC-V: Bugfix for RVV widening reduction in ZVE32/64
Thanks Juzhe, will not send the V2 as only commit log change.
 
Pan
 
From: 钟居哲  
Sent: Monday, June 19, 2023 6:02 AM
To: Li, Pan2 ; gcc-patches 
Cc: rdapp.gcc ; Jeff Law ; Li, Pan2 
; Wang, Yanzhang ; kito.cheng 

Subject: Re: [PATCH v1] RISC-V: Bugfix for RVV widening reduction in ZVE32/64
 
Add target into changelog:
PR target/110299
 
Otherwise, LGTM.


juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-06-18 23:13
To: gcc-patches
CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Bugfix for RVV widening reduction in ZVE32/64
From: Pan Li 
 
The RVV widening reduction has 3 different patterns for zve128+, zve64
and zve32.  They take the same iterator with different attributes.
However, we need the generated function code_for_reduc (code, mode1, mode2).
The implementation of code_for_reduc may look like below.
 
code_for_reduc (code, mode1, mode2)
{
  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
return CODE_FOR_pred_reduc_maxvnx1hfvnx16hf; // ZVE128+
 
  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
return CODE_FOR_pred_reduc_maxvnx1hfvnx8hf;  // ZVE64
 
  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
return CODE_FOR_pred_reduc_maxvnx1hfvnx4hf;  // ZVE32
}
 
Thus there is a problem here.  For example, for zve32 we will have
code_for_reduc (max, VNx1HF, VNx1HF), which returns the code of
the ZVE128+ pattern instead of the ZVE32 one.
 
This patch merges the 3 patterns into one pattern, and passes both the
input_vector and the ret_vector to code_for_reduc.  For example, for ZVE32
the call will be code_for_reduc (max, VNx1HF, VNx2HF), and the correct
ZVE32 code will be returned as expected.
 
Please note both GCC 13 and 14 are impacted by this issue.
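The dispatch problem described above can be reproduced in miniature: when several table entries share the same key, a first-match lookup can only ever return the first one, and widening the key is the fix. Here is a standalone sketch (names are illustrative, not the generated insn-codes):

```cpp
#include <cassert>
#include <string>

// Toy model of the insn-code lookup.  With only the input mode as the key,
// all three reduction patterns collide and a first-match lookup always
// yields the ZVE128+ entry.
enum Mode { VNx1HF, VNx2HF, VNx4HF, VNx8HF, VNx16HF };

std::string code_for_ambiguous (Mode in)
{
  if (in == VNx1HF) return "pred_reduc_max_vnx1hf_vnx16hf"; // ZVE128+
  if (in == VNx1HF) return "pred_reduc_max_vnx1hf_vnx8hf";  // ZVE64: dead
  if (in == VNx1HF) return "pred_reduc_max_vnx1hf_vnx4hf";  // ZVE32: dead
  return "unknown";
}

// Adding the return-vector mode to the key makes every entry unique, which
// is the essence of the fix.
std::string code_for_fixed (Mode in, Mode ret)
{
  if (in == VNx1HF && ret == VNx16HF) return "pred_reduc_max_vnx1hf_vnx16hf";
  if (in == VNx1HF && ret == VNx8HF)  return "pred_reduc_max_vnx1hf_vnx8hf";
  if (in == VNx1HF && ret == VNx4HF)  return "pred_reduc_max_vnx1hf_vnx4hf";
  return "unknown";
}
```

The ambiguous version returns the ZVE128+ code regardless of the target; the two-mode version distinguishes all three.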
 
Signed-off-by: Pan Li 
Co-Authored by: Juzhe-Zhong 
 
PR 110299
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-bases.cc: Adjust expand for
modes.
* config/riscv/vector-iterators.md: Remove VWLMUL1, VWLMUL1_ZVE64,
VWLMUL1_ZVE32.
* config/riscv/vector.md
(@pred_widen_reduc_plus): Removed.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): New pattern.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): Ditto.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/pr110299-1.c: New test.
* gcc.target/riscv/rvv/base/pr110299-1.h: New test.
* gcc.target/riscv/rvv/base/pr110299-2.c: New test.
* gcc.target/riscv/rvv/base/pr110299-2.h: New test.
* gcc.target/riscv/rvv/base/pr110299-3.c: New test.
* gcc.target/riscv/rvv/base/pr110299-3.h: New test.
* gcc.target/riscv/rvv/base/pr110299-4.c: New test.
* gcc.target/riscv/rvv/base/pr110299-4.h: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  |  16 +-
gcc/config/riscv/vector-iterators.md  |  62 -
gcc/config/riscv/vector.md| 243 --
.../gcc.target/riscv/rvv/base/pr110299-1.c|   7 +
.../gcc.target/riscv/rvv/base/pr110299-1.h|   9 +
.../gcc.target/riscv/rvv/base/pr110299-2.c|   8 +
.../gcc.target/riscv/rvv/base/pr110299-2.h|  17 ++
.../gcc.target/riscv/rvv/base/pr110299-3.c|   7 +
.../gcc.target/riscv/rvv/base/pr110299-3.h|  17 ++
.../gcc.target/riscv/rvv/base/pr110299-4.c|   8 +
.../gcc.target/riscv/rvv/base/pr110299-4.h|  17 ++
11 files changed, 253 insertions(+), 158 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-1.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-2.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-3.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-4.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-4.h
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 27545113996..c6c53dc13a5 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -1396,16 +1396,8 @@ public:
   rtx expand (function_expander &e) const override
   {
-machine_mode mode = e.vector_mode ();
-machine_mode ret_mode = e.ret_mode ();
-
-/* TODO: we will use ret_mode after all types of PR110265 are addressed.  */
-if (GET_MODE_INNER (mode) != GET_MODE_INNER (ret_mode))
-  return e.use_exact_insn (
- code_for_pred_reduc (CODE, e.vector_mode (), e.vector_mode ()));
- 

[PATCH] RISC-V: Fix iterator requirement

2023-06-18 Thread Li Xu
VWF is defined under TARGET_MIN_VLEN >= 128.
VWEXTF: zvfh/zvfhmin depends on the Zve32f extension.

gcc/ChangeLog:

* config/riscv/vector-iterators.md: Fix requirement
---
 gcc/config/riscv/vector-iterators.md | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index 8c71c9e22cc..bc3cde58612 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -330,18 +330,20 @@
 ])
 
 (define_mode_iterator VWF [
-  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
   (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
-  (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
-  (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
-  (VNx1SF "TARGET_MIN_VLEN < 128") VNx2SF VNx4SF VNx8SF (VNx16SF "TARGET_MIN_VLEN > 32") (VNx32SF "TARGET_MIN_VLEN >= 128")
+  (VNx32HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx64HF "TARGET_VECTOR_ELEN_FP_16")
+  VNx2SF VNx4SF VNx8SF VNx16SF VNx32SF
 ])
 
 (define_mode_iterator VWF_ZVE64 [
-  VNx1HF VNx2HF VNx4HF VNx8HF VNx16HF VNx32HF
+  (VNx1HF "TARGET_VECTOR_ELEN_FP_16") (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx4HF "TARGET_VECTOR_ELEN_FP_16") (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx32HF "TARGET_VECTOR_ELEN_FP_16")
   VNx1SF VNx2SF VNx4SF VNx8SF VNx16SF
 ])
 
@@ -503,12 +505,12 @@
 ])
 
 (define_mode_iterator VWEXTF [
-  (VNx1SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
-  (VNx2SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_VECTOR_ELEN_FP_32")
-  (VNx4SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_VECTOR_ELEN_FP_32")
-  (VNx8SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_VECTOR_ELEN_FP_32")
-  (VNx16SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32")
-  (VNx32SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
+  (VNx1SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
+  (VNx2SF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx4SF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx8SF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx16SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
+  (VNx32SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
 
   (VNx1DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN < 128")
   (VNx2DF "TARGET_VECTOR_ELEN_FP_64")
-- 
2.17.1



Re: [PATCH 2/2] xtensa: constantsynth: Add new 2-insns synthesis pattern

2023-06-18 Thread Max Filippov via Gcc-patches
On Sun, Jun 18, 2023 at 12:10 AM Takayuki 'January June' Suwa wrote:
>
> This patch adds a new 2-instructions constant synthesis pattern:
>
> -  A non-negative square value whose root fits into a signed 12-bit:
> => "MOVI(.N) Ax, simm12" + "MULL Ax, Ax, Ax"
>
> Due to the execution cost of the integer multiply instruction (MULL), this
> synthesis works only when the 32-bit Integer Multiply Option is configured
> and optimize for size is specified.
>
> gcc/ChangeLog:
>
> * config/xtensa/xtensa.cc (xtensa_constantsynth_2insn):
> Add new pattern for the abovementioned case.
> ---
>  gcc/config/xtensa/xtensa.cc | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
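As an aside, the eligibility condition the quoted description implies can be sketched as a standalone predicate. This is an assumption-laden model, not the actual xtensa_constantsynth_2insn code; in particular, treating a root of 2048 as loadable via "MOVI Ax, -2048" is my reading of the simm12 range:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// True if VALUE could be synthesized as "MOVI(.N) Ax, r" + "MULL Ax, Ax, Ax":
// a non-negative perfect square whose root fits in a signed 12-bit
// immediate.  Roots 0..2047 fit directly; a root of 2048 can be loaded as
// -2048, since (-2048)^2 == 2048^2.  Sketch only, not the GCC code.
bool square_synth_ok (int32_t value)
{
  if (value < 0)
    return false;
  int32_t r = (int32_t) std::lround (std::sqrt ((double) value));
  // Correct any floating-point rounding at the boundary.
  while (r > 0 && (int64_t) r * r > value)
    --r;
  while ((int64_t) (r + 1) * (r + 1) <= value)
    ++r;
  return (int64_t) r * r == value && r <= 2048;
}
```

The cost gating (32-bit Integer Multiply Option configured, and optimizing for size) would sit in the caller and is omitted here.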

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.

-- 
Thanks.
-- Max


Re: [PATCH 1/2] xtensa: Remove TARGET_MEMORY_MOVE_COST hook

2023-06-18 Thread Max Filippov via Gcc-patches
On Sun, Jun 18, 2023 at 12:10 AM Takayuki 'January June' Suwa wrote:
>
> It used to always return a constant 4, which is same as the default
> behavior, but doesn't take into account the effects of secondary
> reloads.
>
> Therefore, the implementation of this target hook is removed.
>
> gcc/ChangeLog:
>
> * config/xtensa/xtensa.cc
> (TARGET_MEMORY_MOVE_COST, xtensa_memory_move_cost): Remove.
> ---
>  gcc/config/xtensa/xtensa.cc | 13 -
>  1 file changed, 13 deletions(-)

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.

-- 
Thanks.
-- Max


Re: [PATCH] rs6000, fix vec_replace_unaligned builtin arguments

2023-06-18 Thread Kewen.Lin via Gcc-patches
Hi Carl,

on 2023/6/16 00:00, Carl Love wrote:
> On Tue, 2023-06-13 at 11:24 +0800, Kewen.Lin wrote:
>> Hi Carl,
>>
>> on 2023/5/31 04:41, Carl Love wrote:
>>> GCC maintainers:
>>>
>>> The following patch fixes the first argument in the builtin definition
>>> and the corresponding test cases.  Initially, the builtin specification
>>> was wrong due to a cut and paste error.  The documentation was fixed in:
>>>
>>>
>>>commit 8cb748a31cd8c7ac9c88b6abc38ce077dd462a7a
>>>Author: Bill Schmidt 
>>>Date:   Fri Feb 4 13:26:44 2022 -0600
>>>
>>>rs6000: Clean up ISA 3.1 documentation [PR100808]
>>>
>>>Due to a pasto error in the documentation, vec_replace_unaligned was
>>>implemented with the same function prototypes as vec_replace_elt.  It was
>>>intended that vec_replace_unaligned always specify output vectors as having
>>>type vector unsigned char, to emphasize that elements are potentially
>>>misaligned by this built-in function.  This patch corrects the
>>>misimplementation.
>>>
>>>2022-02-04  Bill Schmidt  
>>>
>>>gcc/
>>>PR target/100808
>>>* doc/extend.texi (Basic PowerPC Built-in Functions Available on ISA
>>>3.1): Provide consistent type names.  Remove unnecessary semicolons.
>>>Fix bad line breaks.
>>>
>>
>> Wrong referred commit, should be
>> ed3fea09b18f67e757b5768b42cb6e816626f1db.
>> The above commit used the wrong commit log.
> 
> Fixed the commit reference as noted.
> 
>>
>>> This patch fixes the arguments in the definitions and updates the
>>> testcases accordingly.  Additionally, a few minor spacing issues are
>>> fixed.
>>>
>>> The patch has been tested on Power 10 with no regressions.  Please let
>>> me know if the patch is acceptable for mainline.  Thanks.
>>>
>>>  Carl 
>>>
>>> --
>>> rs6000, fix vec_replace_unaligned builtin arguments
>>>
>>> The first argument of the vec_replace_unaligned builtin should always be
>>> unsinged char, as specified in gcc/doc/extend.texi.
>>
>> s/unsinged/unsigned/
> 
> Fixed.
> 
>>
>>> This patch fixes the buitin definitions and updates the testcases to use
>>
>> s/buitin/builtin/
> 
> Fixed.
> 
>>
>>> the correct arguments.
>>>
>>> gcc/ChangeLog:
>>> * config/rs6000/rs6000-overload.def (__builtin_vec_replace_un):
>>> Fix first argument type.
>>>
>>> gcc/testsuite/ChangeLog:
>>> * gcc.target/powerpc/ver-replace-word-runnable.c
>>> (vec_replace_unaligned) Fix first argument type.
>>> (vresult_uchar): Fix expected   results.
>>
>> Nit: unexpected tab.
> 
> Fixed.
> 
>>
>>> (vec_replace_unaligned): Update for loop to check uchar
>>> results.
>>> Remove extra spaces in if statements.
>>> Insert missing spaces in for statements.
>>> (dg-final): Update expected instruction counts.
>>> ---
>>>  gcc/config/rs6000/rs6000-overload.def |  12 +-
>>>  .../powerpc/vec-replace-word-runnable.c   | 157 ++----
>>>  2 files changed, 92 insertions(+), 77 deletions(-)
>>>
>>> diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
>>> index c582490c084..26dc662b8fb 100644
>>> --- a/gcc/config/rs6000/rs6000-overload.def
>>> +++ b/gcc/config/rs6000/rs6000-overload.def
>>> @@ -3059,17 +3059,17 @@
>>>  VREPLACE_ELT_V2DF
>>>  
>>>  [VEC_REPLACE_UN, vec_replace_unaligned, __builtin_vec_replace_un]
>>> -  vuc __builtin_vec_replace_un (vui, unsigned int, const int);
>>> +  vuc __builtin_vec_replace_un (vuc, unsigned int, const int);
>>>  VREPLACE_UN_UV4SI
>>> -  vuc __builtin_vec_replace_un (vsi, signed int, const int);
>>> +  vuc __builtin_vec_replace_un (vuc, signed int, const int);
>>>  VREPLACE_UN_V4SI
>>> -  vuc __builtin_vec_replace_un (vull, unsigned long long, const int);
>>> +  vuc __builtin_vec_replace_un (vuc, unsigned long long, const int);
>>>  VREPLACE_UN_UV2DI
>>> -  vuc __builtin_vec_replace_un (vsll, signed long long, const int);
>>> +  vuc __builtin_vec_replace_un (vuc, signed long long, const int);
>>>  VREPLACE_UN_V2DI
>>> -  vuc __builtin_vec_replace_un (vf, float, const int);
>>> +  vuc __builtin_vec_replace_un (vuc, float, const int);
>>>  VREPLACE_UN_V4SF
>>> -  vuc __builtin_vec_replace_un (vd, double, const int);
>>> +  vuc __builtin_vec_replace_un (vuc, double, const int);
>>>  VREPLACE_UN_V2DF
>>
>> Looks good; since the given element can be replaced without being aligned,
>> the given vector type doesn't need to match the given element, with
>> the potential implication that it can be misaligned.
>>
>>>  
>>>  [VEC_REVB, vec_revb, __builtin_vec_revb]
>>> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
>>> index 27318822871..66b0ef58996 100644
>>> --- 

Re: ping^^: [PATCH] rs6000: Enable const_anchor for 'addi'

2023-06-18 Thread Jiufu Guo via Gcc-patches


Hi!

David Edelsohn  writes:

> On Tue, May 30, 2023 at 11:00 PM Jiufu Guo  wrote:
>
>  Gentle ping...
>
>  Jiufu Guo via Gcc-patches  writes:
>
>  > Gentle ping...
>  >
>  > Jiufu Guo via Gcc-patches  writes:
>  >
>  >> Hi,
>  >>
>  >> I'm thinking that we may enable this patch for stage1, so ping it.
>  >> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603530.html
>  >>
>  >> BR,
>  >> Jeff (Jiufu)
>  >>
>  >> Jiufu Guo  writes:
>  >>
>  >>> Hi,
>  >>>
>  >>> There is a functionality called const_anchor in cse.cc.  This const_anchor
>  >>> supports generating new constants by adding small gaps/offsets to
>  >>> existing constants.  For example:
>  >>>
>  >>> void __attribute__ ((noinline)) foo (long long *a)
>  >>> {
>  >>>   *a++ = 0x2351847027482577LL;
>  >>>   *a++ = 0x2351847027482578LL;
>  >>> }
>  >>> The second constant (0x2351847027482578LL) can be computed by adding '1'
>  >>> to the first constant (0x2351847027482577LL).
>  >>> This is profitable if more than one instruction is needed to build the
>  >>> second constant.
>  >>>
>  >>> * For rs6000, we can enable this functionality, as the instruction
>  >>> 'addi' is just for this when gap is smaller than 0x8000.
>  >>>
>  >>> * Besides enabling TARGET_CONST_ANCHOR on rs6000, this patch also fixed
>  >>> one issue. The issue is:
>  >>> "gcc_assert (SCALAR_INT_MODE_P (mode))" is a requirement for the function
>  >>> "try_const_anchors".
>  >>>
>  >>> * One potential side effect of this patch:
>  >>> Comparing with
>  >>> "r101=0x2351847027482577LL
>  >>> ...
>  >>> r201=0x2351847027482578LL"
>  >>> The new r201 will be "r201=r101+1", and then r101 will live longer,
>  >>> and would increase pressure when allocating registers.
>  >>> But I feel, this would be acceptable for this const_anchor feature.
>  >>>
>  >>> * With this patch, I checked the performance change on SPEC2017, while,
>  >>> and the performance is not aggressive, since this functionality is not
>  >>> hit on any hot path. There is runtime variation/noise (e.g. on
>  >>> povray_r/xalancbmk_r/xz_r) that is not caused by the patch.
>  >>>
>  >>> With this patch, I also checked the changes in object files (from
>  >>> GCC bootstrap and SPEC), the significant changes are the improvement
>  >>> that: "addi" vs. "2 or more insns: lis+or.."; it also exposes some
>  >>> other optimization opportunities, like combine/jump2. While the
>  >>> code to store/load one more register is also occurring in few cases,
>  >>> but it does not impact overall performance.
>  >>>
>  >>> * To refine this patch, some history discussions are referenced:
>  >>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33699
>  >>> https://gcc.gnu.org/pipermail/gcc-patches/2009-April/260421.html
>  >>> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566744.html
>  >>>
>  >>>
>  >>> Bootstrap and regtest pass on ppc64 and ppc64le for this patch.
>  >>> Is this ok for trunk?
>
> Hi, Jiufu
>
> Thanks for developing this patch and your persistence.
>
> The rs6000.cc part of the patch (TARGET_CONST_ANCHOR) is okay for
> Stage 1.  This is approved.

Pushed as r14-1919-g41f42d120c4a66.  Thanks!

BR,
Jeff (Jiufu Guo)

>
> I don't have the authority to approve the change to cse_insn.  Is the 
> cse_insn change a prerequisite?  Will the rs6000 change break or produce wrong
> code without the cse change?  The second part of the patch should be posted 
> separately to the mailing list, with a cc for appropriate maintainers,
> because most maintainers will not be following this specific thread to 
> approve the other part of the patch.
>
> Thanks, David
>  
>  >>>
>  >>>
>  >>> BR,
>  >>> Jeff (Jiufu)
>  >>>
>  >>> gcc/ChangeLog:
>  >>>
>  >>> * config/rs6000/rs6000.cc (TARGET_CONST_ANCHOR): New define.
>  >>> * cse.cc (cse_insn): Add guard condition.
>  >>>
>  >>> gcc/testsuite/ChangeLog:
>  >>>
>  >>> * gcc.target/powerpc/const_anchors.c: New test.
>  >>> * gcc.target/powerpc/try_const_anchors_ice.c: New test.
>  >>>
>  >>> ---
>  >>>  gcc/config/rs6000/rs6000.cc   |  4 
>  >>>  gcc/cse.cc|  3 ++-
>  >>>  .../gcc.target/powerpc/const_anchors.c| 20 +++
>  >>>  .../powerpc/try_const_anchors_ice.c   | 16 +++
>  >>>  4 files changed, 42 insertions(+), 1 deletion(-)
>  >>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/const_anchors.c
>  >>>  create mode 100644 
> gcc/testsuite/gcc.target/powerpc/try_const_anchors_ice.c
>  >>>
>  >>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>  >>> index d2743f7bce6..80cded6dec1 100644
>  >>> --- a/gcc/config/rs6000/rs6000.cc
>  >>> +++ b/gcc/config/rs6000/rs6000.cc
>  >>> @@ -1760,6 +1760,10 @@ static const struct attribute_spec 
> rs6000_attribute_table[] =
>  >>>  
>  >>>  #undef TARGET_UPDATE_IPA_FN_TARGET_INFO
>  >>>  #define 

Re: [PATCH] Check SCALAR_INT_MODE_P in try_const_anchors

2023-06-18 Thread Jiufu Guo via Gcc-patches


Hi,

Richard Biener  writes:

> On Fri, 16 Jun 2023, Jiufu Guo wrote:
>
>> Hi,
>> 
>> The const_anchor in cse.cc supports integer constants only.
>> There is a "gcc_assert (SCALAR_INT_MODE_P (mode))" in
>> try_const_anchors.
>> 
>> In the latest code, some non-integer modes are used with const int.
>> For examples:
>> "set (mem/c:BLK (xx) (const_int 0 [0])" occur in md files of
>> rs6000, i386, arm, and pa. For this, the mode may be BLKmode.
>> Pattern "(set (strict_low_part (xx)) (const_int xx))" could
>> be generated in a few ports. For this, the mode may be VOIDmode.
>> 
>> So, modes other than SCALAR_INT_MODE need to be avoided in
>> try_const_anchors.
>> 
>> Some discussions in the previous thread:
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621097.html
>> 
>> Bootstrap passes on ppc64{,le} and x86_64.
>> Is this ok for trunk?
>
> OK.

Thanks a lot! Committed via r14-1918-gc0bd79300e8fad.

BR,
Jeff (Jiufu Guo)

>
> Richard.
>
>> 
>> BR,
>> Jeff (Jiufu Guo)
>> 
>> gcc/ChangeLog:
>> 
>>  * cse.cc (try_const_anchors): Check SCALAR_INT_MODE.
>> 
>> ---
>>  gcc/cse.cc | 5 ++---
>>  1 file changed, 2 insertions(+), 3 deletions(-)
>> 
>> diff --git a/gcc/cse.cc b/gcc/cse.cc
>> index 2bb63ac4105..ddb76fd281d 100644
>> --- a/gcc/cse.cc
>> +++ b/gcc/cse.cc
>> @@ -1312,11 +1312,10 @@ try_const_anchors (rtx src_const, machine_mode mode)
>>rtx lower_exp = NULL_RTX, upper_exp = NULL_RTX;
>>unsigned lower_old, upper_old;
>>  
>> -  /* CONST_INT is used for CC modes, but we should leave those alone.  */
>> -  if (GET_MODE_CLASS (mode) == MODE_CC)
>> +  /* CONST_INT may be in various modes, avoid non-scalar-int mode. */
>> +  if (!SCALAR_INT_MODE_P (mode))
>>  return NULL_RTX;
>>  
>> -  gcc_assert (SCALAR_INT_MODE_P (mode));
>>if (!compute_const_anchors (src_const, &lower_base, &lower_offs,
>>   &upper_base, &upper_offs))
>>  return NULL_RTX;
>> 
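For intuition, the anchoring scheme that try_const_anchors builds on splits a constant into a round "anchor" plus a small offset. A rough standalone model follows; the granularity parameter stands in for targetm.const_anchor (e.g. 0x8000 on rs6000, matching the 'addi' range), and the real compute_const_anchors operates on rtx constants rather than plain integers:

```cpp
#include <cassert>
#include <cstdint>

struct Anchors
{
  int64_t lower, lower_offs;  // n == lower + lower_offs
  int64_t upper, upper_offs;  // n == upper + upper_offs (upper_offs <= 0)
};

// Round N down/up to multiples of ANCHOR (a power of two) so N can be
// rebuilt from a register holding a nearby constant with one
// small-immediate add.  Sketch only.
Anchors const_anchors (int64_t n, int64_t anchor)
{
  Anchors a;
  a.lower = n & ~(anchor - 1);   // round down to a multiple of 'anchor'
  a.lower_offs = n - a.lower;
  a.upper = a.lower + anchor;    // next multiple up
  a.upper_offs = n - a.upper;    // negative offset
  return a;
}
```

With anchor 0x8000, the two constants in the quoted foo() share the same lower anchor, so the second can be formed from the first with a single add of 1.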


RE: [PATCH v2] x86: make VPTERNLOG* usable on less than 512-bit operands with just AVX512F

2023-06-18 Thread Liu, Hongtao via Gcc-patches


> -Original Message-
> From: Jan Beulich 
> Sent: Friday, June 16, 2023 2:22 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kirill Yukhin ; Liu, Hongtao
> 
> Subject: [PATCH v2] x86: make VPTERNLOG* usable on less than 512-bit
> operands with just AVX512F
> 
> There's no reason to constrain this to AVX512VL, unless instructed so by
> -mprefer-vector-width=, as the wider operation is unusable for more narrow
> operands only when the possible memory source is a non-broadcast one.
> This way even the scalar copysign3 can benefit from the operation
> being a single-insn one (leaving aside moves which the compiler decides to
> insert for unclear reasons, and leaving aside the fact that
> bcst_mem_operand() is too restrictive for broadcast to be embedded right
> into VPTERNLOG*).
> 
> Along with this also request value duplication in ix86_expand_copysign()'s
> call to ix86_build_signbit_mask(), eliminating excess space allocation
> in .rodata.*, filled with zeros which are never read.
> 
> gcc/
> 
>   * config/i386/i386-expand.cc (ix86_expand_copysign): Request
>   value duplication by ix86_build_signbit_mask() when AVX512F and
>   not HFmode.
>   * config/i386/sse.md (*_vternlog_all): Convert to
>   2-alternative form. Adjust "mode" attribute. Add "enabled"
>   attribute.
>   (*_vpternlog_1): Also permit when
> TARGET_AVX512F
>   && !TARGET_PREFER_AVX256.
>   (*_vpternlog_2): Likewise.
>   (*_vpternlog_3): Likewise.
> ---
> I guess the underlying pattern, going along the lines of what
> one_cmpl2 uses, can be applied
> elsewhere as well.
> 
> HFmode could use embedded broadcast too for copysign and alike, but that
> would need to be V2HF -> V8HF (for which I don't think there are any existing
> patterns).
> ---
> v2: Respect -mprefer-vector-width=.
> 
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -2266,7 +2266,7 @@ ix86_expand_copysign (rtx operands[])
>else
>  dest = NULL_RTX;
>op1 = lowpart_subreg (vmode, force_reg (mode, operands[2]), mode);
> -  mask = ix86_build_signbit_mask (vmode, 0, 0);
> +  mask = ix86_build_signbit_mask (vmode, TARGET_AVX512F && mode !=
> + HFmode, 0);
> 
>if (CONST_DOUBLE_P (operands[1]))
>  {
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -12597,11 +12597,11 @@
> (set_attr "mode" "")])
> 
>  (define_insn "*_vternlog_all"
> -  [(set (match_operand:V 0 "register_operand" "=v")
> +  [(set (match_operand:V 0 "register_operand" "=v,v")
>   (unspec:V
> -   [(match_operand:V 1 "register_operand" "0")
> -(match_operand:V 2 "register_operand" "v")
> -(match_operand:V 3 "bcst_vector_operand" "vmBr")
> +   [(match_operand:V 1 "register_operand" "0,0")
> +(match_operand:V 2 "register_operand" "v,v")
> +(match_operand:V 3 "bcst_vector_operand" "vBr,m")
>  (match_operand:SI 4 "const_0_to_255_operand")]
> UNSPEC_VTERNLOG))]
>"TARGET_AVX512F
Change condition to  == 64 || TARGET_AVX512VL || (TARGET_AVX512F && 
!TARGET_PREFER_AVX256)
Also please add a testcase for the TARGET_AVX512F && !TARGET_PREFER_AVX256 case.
> @@ -12609,10 +12609,22 @@
> it's not real AVX512FP16 instruction.  */
>&& (GET_MODE_SIZE (GET_MODE_INNER (mode)) >= 4
>   || GET_CODE (operands[3]) != VEC_DUPLICATE)"
> -  "vpternlog\t{%4, %3, %2, %0|%0, %2, %3, %4}"
> +{
> +  if (TARGET_AVX512VL)
> +return "vpternlog\t{%4, %3, %2, %0|%0, %2, %3, %4}";
> +  else
> +return "vpternlog\t{%4, %g3, %g2, %g0|%g0, %g2, %g3,
> +%4}"; }
>[(set_attr "type" "sselog")
> (set_attr "prefix" "evex")
> -   (set_attr "mode" "")])
> +   (set (attr "mode")
> +(if_then_else (match_test "TARGET_AVX512VL")
> +   (const_string "")
> +   (const_string "XI")))
> +   (set (attr "enabled")
> + (if_then_else (eq_attr "alternative" "1")
> +   (symbol_ref " == 64 || TARGET_AVX512VL")
> +   (const_string "*")))])
> 
>  ;; There must be lots of other combinations like  ;; @@ -12641,7 +12653,8
> @@
> (any_logic2:V
>   (match_operand:V 3 "regmem_or_bitnot_regmem_operand")
>   (match_operand:V 4 "regmem_or_bitnot_regmem_operand"]
> -  "( == 64 || TARGET_AVX512VL)
> +  "( == 64 || TARGET_AVX512VL
> +|| (TARGET_AVX512F && !TARGET_PREFER_AVX256))
> && ix86_pre_reload_split ()
> && (rtx_equal_p (STRIP_UNARY (operands[1]),
>   STRIP_UNARY (operands[4]))
> @@ -12725,7 +12738,8 @@
> (match_operand:V 2 "regmem_or_bitnot_regmem_operand"))
>   (match_operand:V 3 "regmem_or_bitnot_regmem_operand"))
> (match_operand:V 4 "regmem_or_bitnot_regmem_operand")))]
> -  "( == 64 || TARGET_AVX512VL)
> +  "( == 64 || TARGET_AVX512VL
> +|| (TARGET_AVX512F && !TARGET_PREFER_AVX256))
> && ix86_pre_reload_split ()
> && (rtx_equal_p (STRIP_UNARY (operands[1]),
>   STRIP_UNARY (operands[4]))
> @@ -12808,7 

RE: [PATCH v2] x86: correct and improve "*vec_dupv2di"

2023-06-18 Thread Liu, Hongtao via Gcc-patches


> -Original Message-
> From: Jan Beulich 
> Sent: Friday, June 16, 2023 2:20 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; Kirill Yukhin
> 
> Subject: [PATCH v2] x86: correct and improve "*vec_dupv2di"
> 
> The input constraint for the %vmovddup alternative was wrong, as the upper
> 16 XMM registers require AVX512VL to be used with this insn. To
> compensate, introduce a new alternative permitting all 32 registers, by
> broadcasting to the full 512 bits in that case if AVX512VL is not available.
> 
> gcc/
> 
>   * config/i386/sse.md (vec_dupv2di): Correct %vmovddup input
>   constraint. Add new AVX512F alternative.
Could you add a testcase for that.
Ok with the testcase.
> ---
> Strictly speaking the new alternative could be enabled from AVX2 onwards,
> but vmovddup can frequently be a shorter encoding (VEX2 vs VEX3).
> 
> It was suggested that the previously flawed %vmovddup alternative could
> use "xm" as source constraint. But then its destination would better also use
> "x", I think?
> ---
> v2: Use "* return ..." form. Set "mode" to XI for new alternative
> without AVX512VL.
> 
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -26033,19 +26033,35 @@
>  (symbol_ref "true")))])
> 
>  (define_insn "*vec_dupv2di"
> -  [(set (match_operand:V2DI 0 "register_operand" "=x,v,v,x")
> +  [(set (match_operand:V2DI 0 "register_operand" "=x,v,v,v,x")
>   (vec_duplicate:V2DI
> -   (match_operand:DI 1 "nonimmediate_operand" " 0,Yv,vm,0")))]
> +   (match_operand:DI 1 "nonimmediate_operand" "
> 0,Yv,vm,Yvm,0")))]
>"TARGET_SSE"
>"@
> punpcklqdq\t%0, %0
> vpunpcklqdq\t{%d1, %0|%0, %d1}
> +   * return TARGET_AVX512VL ? \"vpbroadcastq\t{%1, %0|%0, %1}\" :
> + \"vpbroadcastq\t{%1, %g0|%g0, %1}\";
> %vmovddup\t{%1, %0|%0, %1}
> movlhps\t%0, %0"
> -  [(set_attr "isa" "sse2_noavx,avx,sse3,noavx")
> -   (set_attr "type" "sselog1,sselog1,sselog1,ssemov")
> -   (set_attr "prefix" "orig,maybe_evex,maybe_vex,orig")
> -   (set_attr "mode" "TI,TI,DF,V4SF")])
> +  [(set_attr "isa" "sse2_noavx,avx,avx512f,sse3,noavx")
> +   (set_attr "type" "sselog1,sselog1,ssemov,sselog1,ssemov")
> +   (set_attr "prefix" "orig,maybe_evex,evex,maybe_vex,orig")
> +   (set (attr "mode")
> + (cond [(and (eq_attr "alternative" "2")
> + (match_test "!TARGET_AVX512VL"))
> +  (const_string "XI")
> +(eq_attr "alternative" "3")
> +  (const_string "DF")
> +(eq_attr "alternative" "4")
> +  (const_string "V4SF")
> +   ]
> +   (const_string "TI")))
> +   (set (attr "enabled")
> + (if_then_else
> +   (eq_attr "alternative" "2")
> +   (symbol_ref "TARGET_AVX512VL
> +|| (TARGET_AVX512F && !TARGET_PREFER_AVX256)")
> +   (const_string "*")))])
> 
>  (define_insn "avx2_vbroadcasti128_"
>[(set (match_operand:VI_256 0 "register_operand" "=x,v,v")


RE: [PATCH v1] RISC-V: Bugfix for RVV widenning reduction in ZVE32/64

2023-06-18 Thread Li, Pan2 via Gcc-patches
Thanks Juzhe, will not send a V2 since it is only a commit log change.

Pan

From: 钟居哲 
Sent: Monday, June 19, 2023 6:02 AM
To: Li, Pan2 ; gcc-patches 
Cc: rdapp.gcc ; Jeff Law ; Li, Pan2 
; Wang, Yanzhang ; kito.cheng 

Subject: Re: [PATCH v1] RISC-V: Bugfix for RVV widenning reduction in ZVE32/64

Add target into changelog:
PR target/110299

Otherwise, LGTM.

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-06-18 23:13
To: gcc-patches
CC: juzhe.zhong; 
rdapp.gcc; 
jeffreyalaw; pan2.li; 
yanzhang.wang; 
kito.cheng
Subject: [PATCH v1] RISC-V: Bugfix for RVV widenning reduction in ZVE32/64
From: Pan Li <pan2...@intel.com>

The RVV widening reduction has 3 different patterns for zve128+, zve64
and zve32. They take the same iterator with different attributes.
However, we need the generated function code_for_reduc (code, mode1, mode2).
The implementation of code_for_reduc may look like below.

code_for_reduc (code, mode1, mode2)
{
  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
return CODE_FOR_pred_reduc_maxvnx1hfvnx16hf; // ZVE128+

  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
return CODE_FOR_pred_reduc_maxvnx1hfvnx8hf;  // ZVE64

  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
return CODE_FOR_pred_reduc_maxvnx1hfvnx4hf;  // ZVE32
}

Thus there is a problem here. For example, on zve32 we will have
code_for_reduc (max, VNx1HF, VNx1HF), which logically returns the code of
ZVE128+ instead of ZVE32.

This patch merges the 3 patterns into one pattern, and passes both the
input vector mode and the return vector mode to code_for_reduc. For example,
ZVE32 will use code_for_reduc (max, VNx1HF, VNx2HF), so the correct code for
ZVE32 is returned as expected.

Please note both GCC 13 and 14 are impacted by this issue.

Signed-off-by: Pan Li <pan2...@intel.com>
Co-Authored-by: Juzhe-Zhong <juzhe.zh...@rivai.ai>

PR 110299

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Adjust expand for
modes.
* config/riscv/vector-iterators.md: Remove VWLMUL1, VWLMUL1_ZVE64,
VWLMUL1_ZVE32.
* config/riscv/vector.md
(@pred_widen_reduc_plus): Removed.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): New pattern.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr110299-1.c: New test.
* gcc.target/riscv/rvv/base/pr110299-1.h: New test.
* gcc.target/riscv/rvv/base/pr110299-2.c: New test.
* gcc.target/riscv/rvv/base/pr110299-2.h: New test.
* gcc.target/riscv/rvv/base/pr110299-3.c: New test.
* gcc.target/riscv/rvv/base/pr110299-3.h: New test.
* gcc.target/riscv/rvv/base/pr110299-4.c: New test.
* gcc.target/riscv/rvv/base/pr110299-4.h: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  |  16 +-
gcc/config/riscv/vector-iterators.md  |  62 -
gcc/config/riscv/vector.md| 243 --
.../gcc.target/riscv/rvv/base/pr110299-1.c|   7 +
.../gcc.target/riscv/rvv/base/pr110299-1.h|   9 +
.../gcc.target/riscv/rvv/base/pr110299-2.c|   8 +
.../gcc.target/riscv/rvv/base/pr110299-2.h|  17 ++
.../gcc.target/riscv/rvv/base/pr110299-3.c|   7 +
.../gcc.target/riscv/rvv/base/pr110299-3.h|  17 ++
.../gcc.target/riscv/rvv/base/pr110299-4.c|   8 +
.../gcc.target/riscv/rvv/base/pr110299-4.h|  17 ++
11 files changed, 253 insertions(+), 158 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-1.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-2.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-3.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-4.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-4.h

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 27545113996..c6c53dc13a5 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -1396,16 +1396,8 @@ public:
   rtx expand (function_expander ) const override
   {
-machine_mode mode = e.vector_mode ();
-machine_mode ret_mode = e.ret_mode ();
-
-/* TODO: we will use ret_mode after all types of PR110265 are addressed.  
*/
-if (GET_MODE_INNER (mode) != GET_MODE_INNER 

[PATCH, rs6000] Generate mfvsrwz for all platforms and remove redundant zero extend [PR106769]

2023-06-18 Thread HAO CHEN GUI via Gcc-patches
Hi,
  This patch modifies the vsx extract expander to generate mfvsrwz/stxsiwx
for all platforms when the mode is V4SI and the index of the extracted element
is 1 for BE and 2 for LE. This patch also adds an insn pattern for mfvsrwz
which helps eliminate redundant zero extends.

  Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.

Thanks
Gui Haochen


ChangeLog
rs6000: Generate mfvsrwz for all platforms and remove redundant zero extend

mfvsrwz has lower latency than xxextractuw, so it should be generated
even with p9 vector enabled when possible.  The instruction also
zero-extends its result, so a combine pattern is needed to eliminate
redundant zero extend instructions.

gcc/
PR target/106769
* config/rs6000/vsx.md (expand vsx_extract_): Skip calling
gen_vsx_extract__p9 when it can be implemented by
mfvsrwz/stxsiwx.
(*vsx_extract__di_p9): Not generate the insn when it can
be generated by mfvsrwz.
(mfvsrwz): New insn pattern.
(*vsx_extract_si): Rename to...
(vsx_extract_si): ..., remove redundant insn condition and
generate the insn on p9 when it can be implemented by
mfvsrwz/stxsiwx.  Add a dup alternative for simple vector moving.
Remove reload_completed from split condition as it's unnecessary.
Remove unnecessary checking from preparation statements.  Set
type and length attributes for each alternative.

gcc/testsuite/
PR target/106769
* gcc.target/powerpc/pr106769.h: New.
* gcc.target/powerpc/pr106769-p8.c: New.
* gcc.target/powerpc/pr106769-p9.c: New.

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0a34ceebeb5..09b0f83db86 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -3728,7 +3728,9 @@ (define_expand  "vsx_extract_"
   "VECTOR_MEM_VSX_P (mode) && TARGET_DIRECT_MOVE_64BIT"
 {
   /* If we have ISA 3.0, we can do a xxextractuw/vextractu{b,h}.  */
-  if (TARGET_P9_VECTOR)
+  if (TARGET_P9_VECTOR
+  && (mode != V4SImode
+ || INTVAL (operands[2]) != (BYTES_BIG_ENDIAN ? 1 : 2)))
 {
   emit_insn (gen_vsx_extract__p9 (operands[0], operands[1],
operands[2]));
@@ -3798,7 +3800,9 @@ (define_insn_and_split "*vsx_extract__di_p9"
  (match_operand:VSX_EXTRACT_I 1 "gpc_reg_operand" "v,")
  (parallel [(match_operand:QI 2 "const_int_operand" "n,n")]
(clobber (match_scratch:SI 3 "=r,X"))]
-  "VECTOR_MEM_VSX_P (mode) && TARGET_VEXTRACTUB"
+  "TARGET_VEXTRACTUB
+   && (mode != V4SImode
+   || INTVAL (operands[2]) != (BYTES_BIG_ENDIAN ? 1 : 2))"
   "#"
   "&& reload_completed"
   [(parallel [(set (match_dup 4)
@@ -3830,58 +3834,67 @@ (define_insn_and_split "*vsx_extract__store_p9"
(set (match_dup 0)
(match_dup 3))])

-(define_insn_and_split  "*vsx_extract_si"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,wa,Z")
+(define_insn "mfvsrwz"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (zero_extend:DI
+ (vec_select:SI
+   (match_operand:V4SI 1 "vsx_register_operand" "wa")
+   (parallel [(match_operand:QI 2 "const_int_operand" "n")]
+   (clobber (match_scratch:V4SI 3 "=v"))]
+  "TARGET_DIRECT_MOVE_64BIT
+   && INTVAL (operands[2]) == (BYTES_BIG_ENDIAN ? 1 : 2)"
+  "mfvsrwz %0,%x1"
+  [(set_attr "type" "mfvsr")
+   (set_attr "isa" "p8v")])
+
+(define_insn_and_split  "vsx_extract_si"
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,wa,Z,wa")
(vec_select:SI
-(match_operand:V4SI 1 "gpc_reg_operand" "v,v,v")
-(parallel [(match_operand:QI 2 "const_0_to_3_operand" "n,n,n")])))
-   (clobber (match_scratch:V4SI 3 "=v,v,v"))]
-  "VECTOR_MEM_VSX_P (V4SImode) && TARGET_DIRECT_MOVE_64BIT && 
!TARGET_P9_VECTOR"
-  "#"
-  "&& reload_completed"
+(match_operand:V4SI 1 "gpc_reg_operand" "v,v,v,0")
+(parallel [(match_operand:QI 2 "const_0_to_3_operand" "n,n,n,n")])))
+   (clobber (match_scratch:V4SI 3 "=v,v,v,v"))]
+  "TARGET_DIRECT_MOVE_64BIT
+   && (!TARGET_P9_VECTOR || INTVAL (operands[2]) == (BYTES_BIG_ENDIAN ? 1 : 
2))"
+{
+   if (which_alternative == 0)
+ return "mfvsrwz %0,%x1";
+
+   if (which_alternative == 1)
+ return "xxlor %x0,%x1,%x1";
+
+   if (which_alternative == 2)
+ return "stxsiwx %x1,%y0";
+
+   return ASM_COMMENT_START " vec_extract to same register";
+}
+  "&& INTVAL (operands[2]) != (BYTES_BIG_ENDIAN ? 1 : 2)"
   [(const_int 0)]
 {
   rtx dest = operands[0];
   rtx src = operands[1];
   rtx element = operands[2];
-  rtx vec_tmp = operands[3];
-  int value;
+  rtx vec_tmp;
+
+  if (GET_CODE (operands[3]) == SCRATCH)
+vec_tmp = gen_reg_rtx (V4SImode);
+  else
+vec_tmp = operands[3];

   /* Adjust index for LE element ordering, the below minuend 3 is computed by
  GET_MODE_NUNITS (V4SImode) - 1.  */
   if (!BYTES_BIG_ENDIAN)
 element = GEN_INT (3 - INTVAL (element));

-  /* If the value is 

Re: [PATCH V3 1/4] rs6000: build constant via li;rotldi

2023-06-18 Thread Jiufu Guo via Gcc-patches


Hi!

Segher Boessenkool  writes:

> Hi!
>
> On Fri, Jun 16, 2023 at 04:34:12PM +0800, Jiufu Guo wrote:
>> +/* Check if value C can be built by 2 instructions: one is 'li', another is
>> +   rotldi.
>> +
>> +   If so, *SHIFT is set to the shift operand of rotldi(rldicl), and *MASK
>> +   is set to -1, and return true.  Return false otherwise.  */
>
> Don't say "is set to -1", the point of having this is so you say "is set
> to the "li" value".  Just like you describe what SHIFT is for.
Yes, thanks!
>
>> +static bool
>> +can_be_built_by_li_and_rotldi (HOST_WIDE_INT c, int *shift,
>> +   HOST_WIDE_INT *mask)
>> +{
>> +  int n;
>
> Put shis later, like:
Thanks!
>
>> +  /* Check if C can be rotated to a positive or negative value
>> +  which 'li' instruction is able to load.  */
>   int n;
>> +  if (can_be_rotated_to_lowbits (c, 15, )
>> +  || can_be_rotated_to_lowbits (~c, 15, ))
>> +{
>> +  *mask = HOST_WIDE_INT_M1;
>> +  *shift = HOST_BITS_PER_WIDE_INT - n;
>> +  return true;
>> +}
>
> It is tricky to see ~c will always work, since what is really done is -c
> instead.  Can you just use that here?

Some explanation:
A negative 'li' value looks like:
0b11..11xxx: there are 49 leading '1's, and the other 15 trailing bits can
be 0 or 1.  Under the '~' operation, those 49 '1's become 49 '0's.
After the value is rotated, there are still 49 '1's (the xxx bits may
also end up at the head/tail), and applying '~' to the rotated value
still yields 49 '0's.

So, if a value has 49 successive '1's (possibly wrapping across
head/tail), it can be rotated into the low 15 bits after the '~' operation.

Using the '-' operation would not be enough, since '-x = ~x + 1' at the
bit level.  See the below case 'li_rotldi_3': 0x8531LL
(0x8531 rotated left by 32 bits).
The '~c' value is 0x7ace, which can be rotated from 0x7ace (~0x8531).
But '-c' is 0x7ace0001, and this value does not work.

>
>> @@ -10266,15 +10291,14 @@ static void
>>  rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
>>  {
>>rtx temp;
>> +  int shift;
>> +  HOST_WIDE_INT mask;
>>HOST_WIDE_INT ud1, ud2, ud3, ud4;
>>  
>>ud1 = c & 0x;
>> -  c = c >> 16;
>> -  ud2 = c & 0x;
>> -  c = c >> 16;
>> -  ud3 = c & 0x;
>> -  c = c >> 16;
>> -  ud4 = c & 0x;
>> +  ud2 = (c >> 16) & 0x;
>> +  ud3 = (c >> 32) & 0x;
>> +  ud4 = (c >> 48) & 0x;
>>  
>>if ((ud4 == 0x && ud3 == 0x && ud2 == 0x && (ud1 & 0x8000))
>>|| (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000)))
>> @@ -10305,6 +10329,17 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT 
>> c)
>>emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
>>   GEN_INT ((ud2 ^ 0x) << 16)));
>>  }
>> +  else if (can_be_built_by_li_and_rotldi (c, , ))
>> +{
>> +  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
>> +  unsigned HOST_WIDE_INT imm = (c | ~mask);
>> +  imm = (imm >> shift) | (imm << (HOST_BITS_PER_WIDE_INT - shift));
>> +
>> +  emit_move_insn (temp, GEN_INT (imm));
>> +  if (shift != 0)
>> +temp = gen_rtx_ROTATE (DImode, temp, GEN_INT (shift));
>> +  emit_move_insn (dest, temp);
>> +}
>
> If you would rewrite so it isn't such a run-on thing with "else if",
> instead using early outs, or even some factoring, you could declare the
> variable used only in a tiny scope in that tiny scope instead.

Yes! Early returning is better for a lot of cases.  I would like
to have a refactor patch.

>
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
>> @@ -0,0 +1,54 @@
>> +/* { dg-do run } */
>> +/* { dg-options "-O2 -save-temps" } */
>> +/* { dg-require-effective-target has_arch_ppc64 } */
>
> Please put a tiny comment here saying what this test is *for*?  The file
> name is a bit of hint already, but you can indicate much more in one or
> two lines :-)

Oh, yes, thanks for point out this!

>
> With those adjustments, okay for trunk.  Thanks!
>
> (If -c doesn't work, it needs more explanation).

Sure, some words as above.

BR,
Jeff (Jiufu Guo)

>
>
> Segher


[PATCH] RISC-V: Add VLS modes for GNU vectors

2023-06-18 Thread Juzhe-Zhong
This patch is a proposal and is **NOT** ready to push, since after this
patch the total number of machine modes will exceed 255, which creates an ICE
in LTO:
  internal compiler error: in bp_pack_int_in_range, at data-streamer.h:290

We need to add VLS modes for the following reasons:
1. Enhance GNU vectors codegen:
   For example:
 typedef int32_t vnx8si __attribute__ ((vector_size (32)));

 __attribute__ ((noipa)) void
 f_vnx8si (int32_t * in, int32_t * out)
 {
   vnx8si v = *(vnx8si*)in;
   *(vnx8si *) out = v;
 } 

compile option: --param=riscv-autovec-preference=scalable
before this patch:
f_vnx8si:
ld  a2,0(a0)
ld  a3,8(a0)
ld  a4,16(a0)
ld  a5,24(a0)
addisp,sp,-32
sd  a2,0(a1)
sd  a3,8(a1)
sd  a4,16(a1)
sd  a5,24(a1)
addisp,sp,32
jr  ra

After this patch:
   f_vnx8si:
vsetivlizero,8,e32,m2,ta,ma
vle32.v v2,0(a0)
vse32.v v2,0(a1)
ret

2. Enhance VLA SLP:
void
f (uint8_t *restrict a, uint8_t *restrict b, uint8_t *restrict c)
{
  for (int i = 0; i < 100; ++i)
{
  a[i * 8] = b[i * 8] + c[i * 8];
  a[i * 8 + 1] = b[i * 8] + c[i * 8 + 1];
  a[i * 8 + 2] = b[i * 8 + 2] + c[i * 8 + 2];
  a[i * 8 + 3] = b[i * 8 + 2] + c[i * 8 + 3];
  a[i * 8 + 4] = b[i * 8 + 4] + c[i * 8 + 4];
  a[i * 8 + 5] = b[i * 8 + 4] + c[i * 8 + 5];
  a[i * 8 + 6] = b[i * 8 + 6] + c[i * 8 + 6];
  a[i * 8 + 7] = b[i * 8 + 6] + c[i * 8 + 7];
}
}


..
Loop body:
 ...
 vrgatherei16.vv...
 ...

Tail:
 lbu a4,792(a1)
lbu a5,792(a2)
addwa5,a5,a4
sb  a5,792(a0)
lbu a5,793(a2)
addwa5,a5,a4
sb  a5,793(a0)
lbu a4,794(a1)
lbu a5,794(a2)
addwa5,a5,a4
sb  a5,794(a0)
lbu a5,795(a2)
addwa5,a5,a4
sb  a5,795(a0)
lbu a4,796(a1)
lbu a5,796(a2)
addwa5,a5,a4
sb  a5,796(a0)
lbu a5,797(a2)
addwa5,a5,a4
sb  a5,797(a0)
lbu a4,798(a1)
lbu a5,798(a2)
addwa5,a5,a4
sb  a5,798(a0)
lbu a5,799(a2)
addwa5,a5,a4
sb  a5,799(a0)
ret

The tail elements need VLS modes to vectorize like ARM SVE:

f:
mov x3, 0
cntbx5
mov x4, 792
whilelo p7.b, xzr, x4
.L2:
ld1bz31.b, p7/z, [x1, x3]
ld1bz30.b, p7/z, [x2, x3]
trn1z31.b, z31.b, z31.b
add z31.b, z31.b, z30.b
st1bz31.b, p7, [x0, x3]
add x3, x3, x5
whilelo p7.b, x3, x4
b.any   .L2
Tail:
ldr b31, [x1, 792]
ldr b27, [x1, 794]
ldr b28, [x1, 796]
dup v31.8b, v31.b[0]
ldr b29, [x1, 798]
ldr d30, [x2, 792]
ins v31.b[2], v27.b[0]
ins v31.b[3], v27.b[0]
ins v31.b[4], v28.b[0]
ins v31.b[5], v28.b[0]
ins v31.b[6], v29.b[0]
ins v31.b[7], v29.b[0]
add v31.8b, v30.8b, v31.8b
str d31, [x0, 792]
ret

Notice ARM SVE uses Advanced SIMD (Neon) modes to vectorize the tail.

gcc/ChangeLog:

* config/riscv/riscv-modes.def (VECTOR_BOOL_MODE): Add VLS modes for 
GNU vectors.
(ADJUST_ALIGNMENT): Ditto.
(ADJUST_BYTESIZE): Ditto.

(ADJUST_PRECISION): Ditto.
(VECTOR_MODES): Ditto.
* config/riscv/riscv-protos.h (riscv_v_ext_vls_mode_p): Ditto.
(get_regno_alignment): Ditto.
* config/riscv/riscv-v.cc (INCLUDE_ALGORITHM): Ditto.
(const_vlmax_p): Ditto.
(legitimize_move): Ditto.
(get_vlmul): Ditto.
(get_regno_alignment): Ditto.
(get_ratio): Ditto.
(get_vector_mode): Ditto.
* config/riscv/riscv-vector-switch.def (VLS_ENTRY): Ditto.
* config/riscv/riscv.cc (riscv_v_ext_vls_mode_p): Ditto.
(VLS_ENTRY): Ditto.
(riscv_v_ext_mode_p): Ditto.
(riscv_hard_regno_nregs): Ditto.
(riscv_hard_regno_mode_ok): Ditto.
* config/riscv/riscv.md: Ditto.
* config/riscv/vector-iterators.md: Ditto.
* config/riscv/vector.md: Ditto.
* config/riscv/autovec-vls.md: New file.

---
 gcc/config/riscv/autovec-vls.md  | 102 +++
 gcc/config/riscv/riscv-modes.def |  72 +
 gcc/config/riscv/riscv-protos.h  |   2 +
 gcc/config/riscv/riscv-v.cc  | 122 ++-
 gcc/config/riscv/riscv-vector-switch.def |  62 
 gcc/config/riscv/riscv.cc|  45 ++---
 gcc/config/riscv/riscv.md|   6 +-
 gcc/config/riscv/vector-iterators.md |  57 +++
 gcc/config/riscv/vector.md   | 113 +++--
 9 files 

Re: [PATCH v1] RISC-V: Bugfix for RVV widenning reduction in ZVE32/64

2023-06-18 Thread 钟居哲
Add target into changelog:
PR target/110299

Otherwise, LGTM.


juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-06-18 23:13
To: gcc-patches
CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Bugfix for RVV widenning reduction in ZVE32/64
From: Pan Li 
 
The RVV widening reduction has 3 different patterns for zve128+, zve64
and zve32. They take the same iterator with different attributes.
However, we need the generated function code_for_reduc (code, mode1, mode2).
The implementation of code_for_reduc may look like below.
 
code_for_reduc (code, mode1, mode2)
{
  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
return CODE_FOR_pred_reduc_maxvnx1hfvnx16hf; // ZVE128+
 
  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
return CODE_FOR_pred_reduc_maxvnx1hfvnx8hf;  // ZVE64
 
  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
return CODE_FOR_pred_reduc_maxvnx1hfvnx4hf;  // ZVE32
}
 
Thus there is a problem here. For example, on zve32 we will have
code_for_reduc (max, VNx1HF, VNx1HF), which logically returns the code of
ZVE128+ instead of ZVE32.
 
This patch merges the 3 patterns into one pattern, and passes both the
input vector mode and the return vector mode to code_for_reduc. For example,
ZVE32 will use code_for_reduc (max, VNx1HF, VNx2HF), so the correct code for
ZVE32 is returned as expected.
 
Please note both GCC 13 and 14 are impacted by this issue.
 
Signed-off-by: Pan Li 
Co-Authored-by: Juzhe-Zhong 
 
PR 110299
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-bases.cc: Adjust expand for
modes.
* config/riscv/vector-iterators.md: Remove VWLMUL1, VWLMUL1_ZVE64,
VWLMUL1_ZVE32.
* config/riscv/vector.md
(@pred_widen_reduc_plus): Removed.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): New pattern.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): Ditto.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/pr110299-1.c: New test.
* gcc.target/riscv/rvv/base/pr110299-1.h: New test.
* gcc.target/riscv/rvv/base/pr110299-2.c: New test.
* gcc.target/riscv/rvv/base/pr110299-2.h: New test.
* gcc.target/riscv/rvv/base/pr110299-3.c: New test.
* gcc.target/riscv/rvv/base/pr110299-3.h: New test.
* gcc.target/riscv/rvv/base/pr110299-4.c: New test.
* gcc.target/riscv/rvv/base/pr110299-4.h: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  |  16 +-
gcc/config/riscv/vector-iterators.md  |  62 -
gcc/config/riscv/vector.md| 243 --
.../gcc.target/riscv/rvv/base/pr110299-1.c|   7 +
.../gcc.target/riscv/rvv/base/pr110299-1.h|   9 +
.../gcc.target/riscv/rvv/base/pr110299-2.c|   8 +
.../gcc.target/riscv/rvv/base/pr110299-2.h|  17 ++
.../gcc.target/riscv/rvv/base/pr110299-3.c|   7 +
.../gcc.target/riscv/rvv/base/pr110299-3.h|  17 ++
.../gcc.target/riscv/rvv/base/pr110299-4.c|   8 +
.../gcc.target/riscv/rvv/base/pr110299-4.h|  17 ++
11 files changed, 253 insertions(+), 158 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-1.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-2.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-3.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-4.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-4.h
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 27545113996..c6c53dc13a5 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -1396,16 +1396,8 @@ public:
   rtx expand (function_expander ) const override
   {
-machine_mode mode = e.vector_mode ();
-machine_mode ret_mode = e.ret_mode ();
-
-/* TODO: we will use ret_mode after all types of PR110265 are addressed.  
*/
-if (GET_MODE_INNER (mode) != GET_MODE_INNER (ret_mode))
-  return e.use_exact_insn (
- code_for_pred_reduc (CODE, e.vector_mode (), e.vector_mode ()));
-else
-  return e.use_exact_insn (
- code_for_pred_reduc (CODE, e.vector_mode (), e.ret_mode ()));
+return e.use_exact_insn (
+  code_for_pred_reduc (CODE, e.vector_mode (), e.ret_mode ()));
   }
};
@@ -1420,7 +1412,7 @@ public:
   {
 return e.use_exact_insn (code_for_pred_widen_reduc_plus (UNSPEC,
 e.vector_mode (),
-  e.vector_mode ()));
+  e.ret_mode ()));
   }
};
@@ -1449,7 +1441,7 @@ public:
   {
 return e.use_exact_insn (code_for_pred_widen_reduc_plus (UNSPEC,
 e.vector_mode (),
-  e.vector_mode 

Re: Extend fnsummary to predict SRA oppurtunities

2023-06-18 Thread Jan Hubicka via Gcc-patches
Hi,
as noticed by Jeff, this patch also triggers a warning in one of the LTO
testcases.  The testcase is reduced and the warning seems legitimate,
triggered by extra inlining, so I have just silenced it.

Honza

gcc/testsuite/ChangeLog:

* gcc.dg/lto/20091013-1_0.c: Disable stringop-overread warning.

diff --git a/gcc/testsuite/gcc.dg/lto/20091013-1_0.c 
b/gcc/testsuite/gcc.dg/lto/20091013-1_0.c
index afceb2436cd..7737e252b99 100644
--- a/gcc/testsuite/gcc.dg/lto/20091013-1_0.c
+++ b/gcc/testsuite/gcc.dg/lto/20091013-1_0.c
@@ -2,7 +2,7 @@
 /* { dg-require-effective-target fpic } */
 /* { dg-require-effective-target ptr_eq_long } */
 /* { dg-lto-options {{-fPIC -r -nostdlib -flto} {-fPIC -r -nostdlib -O2 
-flto}} } */
-/* { dg-extra-ld-options "-flinker-output=nolto-rel" } */
+/* { dg-extra-ld-options "-flinker-output=nolto-rel -Wno-stringop-overread" } 
*/
 
 void * HeapAlloc(void*,unsigned int,unsigned long);
 


Extend fnsummary to predict SRA oppurtunities

2023-06-18 Thread Jan Hubicka via Gcc-patches
Hi,
this patch extends ipa-fnsummary to anticipate statements that will be removed
by SRA.  This is done by looking for calls passing addresses of automatic
variables.  In the function body we look for dereferences of pointers to such
variables and mark them with the new not_sra_candidate condition.

This is just a first step and is overly optimistic.  We do not try to prove
that a given automatic variable will not be SRAed even after inlining.  We
also optimistically assume that the transformation will always happen.  I will
restrict this in a followup patch, but I think it is useful to gather some
data on how much code is affected by this.

This is motivated by PR109849 where we fail to fully inline push_back.
The patch alone does not solve the problem even for -O3, but improves
analysis in this case.

Bootstrapped/regtested x86_64-linux, commited.

gcc/ChangeLog:

PR tree-optimization/109849
* ipa-fnsummary.cc (evaluate_conditions_for_known_args): Add new 
parameter
ES; handle ipa_predicate::not_sra_candidate.
(evaluate_properties_for_edge): Pass es to
evaluate_conditions_for_known_args.
(ipa_fn_summary_t::duplicate): Handle sra candidates.
(dump_ipa_call_summary): Dump points_to_possible_sra_candidate.
(load_or_store_of_ptr_parameter): New function.
(points_to_possible_sra_candidate_p): New function.
(analyze_function_body): Initialize points_to_possible_sra_candidate;
determine sra predicates.
(estimate_ipcp_clone_size_and_time): Update call of
evaluate_conditions_for_known_args.
(remap_edge_params): Update points_to_possible_sra_candidate.
(read_ipa_call_summary): Stream points_to_possible_sra_candidate
(write_ipa_call_summary): Likewise.
* ipa-predicate.cc (ipa_predicate::add_clause): Handle 
not_sra_candidate.
(dump_condition): Dump it.
* ipa-predicate.h (struct inline_param_summary): Add
points_to_possible_sra_candidate.

gcc/testsuite/ChangeLog:

PR tree-optimization/109849
* g++.dg/ipa/devirt-45.C: Update template.

diff --git a/gcc/ipa-fnsummary.cc b/gcc/ipa-fnsummary.cc
index a301396fb3f..a5f5a50c8a5 100644
--- a/gcc/ipa-fnsummary.cc
+++ b/gcc/ipa-fnsummary.cc
@@ -371,7 +371,8 @@ evaluate_conditions_for_known_args (struct cgraph_node 
*node,
bool inline_p,
ipa_auto_call_arg_values *avals,
clause_t *ret_clause,
-   clause_t *ret_nonspec_clause)
+   clause_t *ret_nonspec_clause,
+   ipa_call_summary *es)
 {
   clause_t clause = inline_p ? 0 : 1 << ipa_predicate::not_inlined_condition;
   clause_t nonspec_clause = 1 << ipa_predicate::not_inlined_condition;
@@ -386,6 +387,17 @@ evaluate_conditions_for_known_args (struct cgraph_node 
*node,
   int j;
   struct expr_eval_op *op;
 
+  if (c->code == ipa_predicate::not_sra_candidate)
+   {
+ if (!inline_p
+ || !es
+ || (int)es->param.length () <= c->operand_num
+ || !es->param[c->operand_num].points_to_possible_sra_candidate)
+   clause |= 1 << (i + ipa_predicate::first_dynamic_condition);
+ nonspec_clause |= 1 << (i + ipa_predicate::first_dynamic_condition);
+ continue;
+   }
+
   if (c->agg_contents)
{
  if (c->code == ipa_predicate::changed
@@ -592,6 +604,7 @@ evaluate_properties_for_edge (struct cgraph_edge *e, bool 
inline_p,
   struct cgraph_node *callee = e->callee->ultimate_alias_target ();
   class ipa_fn_summary *info = ipa_fn_summaries->get (callee);
   class ipa_edge_args *args;
+  class ipa_call_summary *es = NULL;
 
   if (clause_ptr)
 *clause_ptr = inline_p ? 0 : 1 << ipa_predicate::not_inlined_condition;
@@ -603,8 +616,8 @@ evaluate_properties_for_edge (struct cgraph_edge *e, bool 
inline_p,
 {
   struct cgraph_node *caller;
   class ipa_node_params *caller_parms_info, *callee_pi = NULL;
-  class ipa_call_summary *es = ipa_call_summaries->get (e);
   int i, count = ipa_get_cs_argument_count (args);
+  es = ipa_call_summaries->get (e);
 
   if (count)
{
@@ -720,7 +733,7 @@ evaluate_properties_for_edge (struct cgraph_edge *e, bool 
inline_p,
 }
 
   evaluate_conditions_for_known_args (callee, inline_p, avals, clause_ptr,
- nonspec_clause_ptr);
+ nonspec_clause_ptr, es);
 }
 
 
@@ -847,6 +860,7 @@ ipa_fn_summary_t::duplicate (cgraph_node *src,
  _truths,
  /* We are going to specialize,
 so ignore nonspec truths.  */
+ NULL,
  NULL);
 
   

[COMMITTED] RTL: Change return type of predicate and callback functions from int to bool

2023-06-18 Thread Uros Bizjak via Gcc-patches
gcc/ChangeLog:

* rtl.h (*rtx_equal_p_callback_function):
Change return type from int to bool.
(rtx_equal_p): Ditto.
(*hash_rtx_callback_function): Ditto.
* rtl.cc (rtx_equal_p): Change return type from int to bool
and adjust function body accordingly.
* early-remat.cc (scratch_equal): Ditto.
* sel-sched-ir.cc (skip_unspecs_callback): Ditto.
(hash_with_unspec_callback): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/early-remat.cc b/gcc/early-remat.cc
index 93cef60c790..700cb65d1e9 100644
--- a/gcc/early-remat.cc
+++ b/gcc/early-remat.cc
@@ -508,16 +508,16 @@ early_remat *early_remat::er;
This allows us to compare two copies of a pattern, even though their
SCRATCHes are always distinct.  */
 
-static int
+static bool
 scratch_equal (const_rtx *x, const_rtx *y, rtx *nx, rtx *ny)
 {
   if (GET_CODE (*x) == SCRATCH && GET_CODE (*y) == SCRATCH)
 {
   *nx = const0_rtx;
   *ny = const0_rtx;
-  return 1;
+  return true;
 }
-  return 0;
+  return false;
 }
 
 /* Hash callback functions for remat_candidate.  */
diff --git a/gcc/rtl.cc b/gcc/rtl.cc
index 0c004947751..635410242fa 100644
--- a/gcc/rtl.cc
+++ b/gcc/rtl.cc
@@ -412,13 +412,13 @@ int currently_expanding_to_rtl;
 
 
 
-/* Return 1 if X and Y are identical-looking rtx's.
+/* Return true if X and Y are identical-looking rtx's.
This is the Lisp function EQUAL for rtx arguments.
 
Call CB on each pair of rtx if CB is not NULL.
When the callback returns true, we continue with the new pair.  */
 
-int
+bool
 rtx_equal_p (const_rtx x, const_rtx y, rtx_equal_p_callback_function cb)
 {
   int i;
@@ -428,9 +428,9 @@ rtx_equal_p (const_rtx x, const_rtx y, 
rtx_equal_p_callback_function cb)
   rtx nx, ny;
 
   if (x == y)
-return 1;
+return true;
   if (x == 0 || y == 0)
-return 0;
+return false;
 
   /* Invoke the callback first.  */
   if (cb != NULL
@@ -440,17 +440,17 @@ rtx_equal_p (const_rtx x, const_rtx y, 
rtx_equal_p_callback_function cb)
   code = GET_CODE (x);
   /* Rtx's of different codes cannot be equal.  */
   if (code != GET_CODE (y))
-return 0;
+return false;
 
   /* (MULT:SI x y) and (MULT:HI x y) are NOT equivalent.
  (REG:SI x) and (REG:HI x) are NOT equivalent.  */
 
   if (GET_MODE (x) != GET_MODE (y))
-return 0;
+return false;
 
   /* MEMs referring to different address space are not equivalent.  */
   if (code == MEM && MEM_ADDR_SPACE (x) != MEM_ADDR_SPACE (y))
-return 0;
+return false;
 
   /* Some RTL can be compared nonrecursively.  */
   switch (code)
@@ -468,7 +468,7 @@ rtx_equal_p (const_rtx x, const_rtx y, 
rtx_equal_p_callback_function cb)
 case VALUE:
 case SCRATCH:
 CASE_CONST_UNIQUE:
-  return 0;
+  return false;
 
 case CONST_VECTOR:
   if (!same_vector_encodings_p (x, y))
@@ -500,7 +500,7 @@ rtx_equal_p (const_rtx x, const_rtx y, 
rtx_equal_p_callback_function cb)
{
case 'w':
  if (XWINT (x, i) != XWINT (y, i))
-   return 0;
+   return false;
  break;
 
case 'n':
@@ -513,30 +513,30 @@ rtx_equal_p (const_rtx x, const_rtx y, 
rtx_equal_p_callback_function cb)
  && XINT (x, i) == XINT (y, i))
break;
 #endif
- return 0;
+ return false;
}
  break;
 
case 'p':
  if (maybe_ne (SUBREG_BYTE (x), SUBREG_BYTE (y)))
-   return 0;
+   return false;
  break;
 
case 'V':
case 'E':
  /* Two vectors must have the same length.  */
  if (XVECLEN (x, i) != XVECLEN (y, i))
-   return 0;
+   return false;
 
  /* And the corresponding elements must match.  */
  for (j = 0; j < XVECLEN (x, i); j++)
-   if (rtx_equal_p (XVECEXP (x, i, j), XVECEXP (y, i, j), cb) == 0)
- return 0;
+   if (!rtx_equal_p (XVECEXP (x, i, j), XVECEXP (y, i, j), cb))
+ return false;
  break;
 
case 'e':
- if (rtx_equal_p (XEXP (x, i), XEXP (y, i), cb) == 0)
-   return 0;
+ if (!rtx_equal_p (XEXP (x, i), XEXP (y, i), cb))
+   return false;
  break;
 
case 'S':
@@ -544,7 +544,7 @@ rtx_equal_p (const_rtx x, const_rtx y, 
rtx_equal_p_callback_function cb)
  if ((XSTR (x, i) || XSTR (y, i))
  && (! XSTR (x, i) || ! XSTR (y, i)
  || strcmp (XSTR (x, i), XSTR (y, i
-   return 0;
+   return false;
  break;
 
case 'u':
@@ -562,7 +562,7 @@ rtx_equal_p (const_rtx x, const_rtx y, 
rtx_equal_p_callback_function cb)
  gcc_unreachable ();
}
 }
-  return 1;
+  return true;
 }
 
 /* Return true if all elements of VEC are equal.  */
diff --git a/gcc/rtl.h b/gcc/rtl.h
index 3995216b58b..f66744b18e3 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -3009,10 +3009,10 @@ extern rtx 

Re: [pushed] c/c++: use positive tone in missing header notes [PR84890]

2023-06-18 Thread Richard Sandiford via Gcc-patches
David Malcolm via Gcc-patches  writes:
> Quoting "How a computer should talk to people" (as quoted
> in "Concepts Error Messages for Humans"):
>
> "Various negative tones or actions are unfriendly: being manipulative,
> not giving a second chance, talking down, using fashionable slang,
> blaming. We must not seem to blame the person. We should avoid suggesting
> that the person is inadequate. Phrases like "you forgot" may seem
> harmless, but what if a computer said this to you four or five times in
> two minutes? Anyway, the person may disagree, so why risk offense?"
>
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> Pushed to trunk as r14-1798-g7474c46cf2d371.

Thanks for doing this.  I agree the new wording is better FWIW.

Not sure if I prefer it for the same reason as above, but IMO impersonal
error messages are actually more friendly (by being neutral and not
questioning the user).

What do you think about doing something similar for “did you mean ...?”
suggestions?  They're useful suggestions to have, but I find the success
rate isn't that high, and asking a direct question when the suggestion
is obviously nonsensical has a slightly confrontational feel.

Not a strong feeling, just a question. :)

Richard

> gcc/c-family/ChangeLog:
>   PR c/84890
>   * known-headers.cc
>   (suggest_missing_header::~suggest_missing_header): Reword note to
>   avoid negative tone of "forgetting".
>
> gcc/cp/ChangeLog:
>   PR c/84890
>   * name-lookup.cc (missing_std_header::~missing_std_header): Reword
>   note to avoid negative tone of "forgetting".
>
> gcc/testsuite/ChangeLog:
>   PR c/84890
>   * g++.dg/cpp2a/srcloc3.C: Update expected message.
>   * g++.dg/lookup/missing-std-include-2.C: Likewise.
>   * g++.dg/lookup/missing-std-include-3.C: Likewise.
>   * g++.dg/lookup/missing-std-include-6.C: Likewise.
>   * g++.dg/lookup/missing-std-include.C: Likewise.
>   * g++.dg/spellcheck-inttypes.C: Likewise.
>   * g++.dg/spellcheck-stdint.C: Likewise.
>   * g++.dg/spellcheck-stdlib.C: Likewise.
>   * gcc.dg/spellcheck-inttypes.c: Likewise.
>   * gcc.dg/spellcheck-stdbool.c: Likewise.
>   * gcc.dg/spellcheck-stdint.c: Likewise.
>   * gcc.dg/spellcheck-stdlib.c: Likewise.
> ---
>  gcc/c-family/known-headers.cc |  2 +-
>  gcc/cp/name-lookup.cc |  2 +-
>  gcc/testsuite/g++.dg/cpp2a/srcloc3.C  |  2 +-
>  .../g++.dg/lookup/missing-std-include-2.C |  8 +--
>  .../g++.dg/lookup/missing-std-include-3.C |  2 +-
>  .../g++.dg/lookup/missing-std-include-6.C |  4 +-
>  .../g++.dg/lookup/missing-std-include.C   | 16 +++---
>  gcc/testsuite/g++.dg/spellcheck-inttypes.C| 54 +--
>  gcc/testsuite/g++.dg/spellcheck-stdint.C  | 40 +++---
>  gcc/testsuite/g++.dg/spellcheck-stdlib.C  | 28 +-
>  gcc/testsuite/gcc.dg/spellcheck-inttypes.c| 52 +-
>  gcc/testsuite/gcc.dg/spellcheck-stdbool.c |  6 +--
>  gcc/testsuite/gcc.dg/spellcheck-stdint.c  | 40 +++---
>  gcc/testsuite/gcc.dg/spellcheck-stdlib.c  | 34 ++--
>  14 files changed, 145 insertions(+), 145 deletions(-)
>
> diff --git a/gcc/c-family/known-headers.cc b/gcc/c-family/known-headers.cc
> index de92cfd6f3c..3484c867ca0 100644
> --- a/gcc/c-family/known-headers.cc
> +++ b/gcc/c-family/known-headers.cc
> @@ -320,6 +320,6 @@ suggest_missing_header::~suggest_missing_header ()
>maybe_add_include_fixit (, m_header_hint, true);
>inform (,
> "%qs is defined in header %qs;"
> -   " did you forget to %<#include %s%>?",
> +   " this is probably fixable by adding %<#include %s%>",
> m_name_str, m_header_hint, m_header_hint);
>  }
> diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
> index eb5c333b5ea..6ac58a35b56 100644
> --- a/gcc/cp/name-lookup.cc
> +++ b/gcc/cp/name-lookup.cc
> @@ -6760,7 +6760,7 @@ class missing_std_header : public deferred_diagnostic
>   maybe_add_include_fixit (, header, true);
>   inform (,
>   "% is defined in header %qs;"
> - " did you forget to %<#include %s%>?",
> + " this is probably fixable by adding %<#include %s%>",
>   m_name_str, header, header);
>}
>  else
> diff --git a/gcc/testsuite/g++.dg/cpp2a/srcloc3.C 
> b/gcc/testsuite/g++.dg/cpp2a/srcloc3.C
> index 324e03cd548..c843e07fd4f 100644
> --- a/gcc/testsuite/g++.dg/cpp2a/srcloc3.C
> +++ b/gcc/testsuite/g++.dg/cpp2a/srcloc3.C
> @@ -1,5 +1,5 @@
>  // { dg-do compile { target c++20 } }
>  
>  auto x = __builtin_source_location ();   // { dg-error 
> "'source_location' is not a member of 'std'" }
> -// { dg-message "std::source_location' is defined in header 
> ''; did you forget to '#include '" "" { 
> target *-*-* } .-1 }
> +// { dg-message "std::source_location' is defined in header 
> ''; this is probably fixable by adding '#include 
> '" "" { target 

[libstdc++] Improve _M_check_len

2023-06-18 Thread Jan Hubicka via Gcc-patches
Hi,
_M_check_len is used in vector reallocations.  It computes __n + __s, but also
checks for the case that (__n + __s) * sizeof (Tp) would overflow ptrdiff_t.
Since __s is the size of an already allocated memory block, if __n is not too
large this can never happen on 64-bit systems: memory is simply not that large.
This patch adds __builtin_constant_p checks for this case.  This shrinks the
size of the fully inlined push_back function, which is critical for loops that
use a std::vector as a stack.

With the patches to optimize std::max and to handle SRA candidates, we now
fully inline push_back at -O3 (though not at -O2); however, there are still
quite a few silly things in the generated code, for example:

  //  _78 is original size of the allocated vector.

  _76 = stack$_M_end_of_storage_177 - _142;
  _77 = _76 /[ex] 8; 
  _78 = (long unsigned int) _77;
  _79 = MAX_EXPR <_78, 1>;   
  _80 = _78 + _79; // this is result of _M_check_len doubling the allocated 
vector size.
  if (_80 != 0)// result will always be non-zero.  
goto ; [54.67%]
  else
goto ; [45.33%]
  
   [local count: 30795011]:
  if (_80 > 1152921504606846975)  // doubling successfully allocated memory
will never get so large.
goto ; [10.00%]
  else
goto ; [90.00%]
  
   [local count: 3079501]:
  if (_80 > 2305843009213693951)  // I wonder if we really want to have two 
different throws
goto ; [50.00%]
  else 
goto ; [50.00%]
  
   [local count: 1539750]:
  std::__throw_bad_array_new_length ();
  
   [local count: 1539750]:
  std::__throw_bad_alloc ();
  
   [local count: 27715510]:
  _108 = _80 * 8;
  _109 = operator new (_108);

Maybe we want to add an assumption that the result of the function is never
greater than max_size to get rid of the two checks above.  However, this
would still be recognized only after inlining and would continue to confuse
the inliner heuristics.

Bootstrapped/regtested on x86_64-linux.  I am not too familiar with libstdc++
internals, so I would welcome comments and ideas.

libstdc++-v3/ChangeLog:

PR tree-optimization/110287
* include/bits/stl_vector.h: Optimize _M_check_len for constantly sized
types and allocations.

diff --git a/libstdc++-v3/include/bits/stl_vector.h 
b/libstdc++-v3/include/bits/stl_vector.h
index 70ced3d101f..3ad59fe3e2b 100644
--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -1895,11 +1895,22 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   size_type
   _M_check_len(size_type __n, const char* __s) const
   {
-   if (max_size() - size() < __n)
- __throw_length_error(__N(__s));
+   // On 64bit systems vectors of small sizes can not
+   // reach overflow by growing by small sizes; before
+   // this happens, we will run out of memory.
+   if (__builtin_constant_p (sizeof (_Tp))
+   && __builtin_constant_p (__n)
+   && sizeof (ptrdiff_t) >= 8
+   && __n < max_size () / 2)
+ return size() + (std::max)(size(), __n);
+   else
+ {
+   if (max_size() - size() < __n)
+ __throw_length_error(__N(__s));
 
-   const size_type __len = size() + (std::max)(size(), __n);
-   return (__len < size() || __len > max_size()) ? max_size() : __len;
+   const size_type __len = size() + (std::max)(size(), __n);
+   return (__len < size() || __len > max_size()) ? max_size() : __len;
+ }
   }
 
   // Called by constructors to check initial size.


Re: [PATCH 0/2] RISC-V: New pass to optimize calculation of offsets for memory operations.

2023-06-18 Thread Jeff Law via Gcc-patches




On 6/15/23 09:30, Manolis Tsamis wrote:




Thanks for reporting. I also noticed this while reworking the
implementation for v2 and I have fixed it among other things.

Sounds good.  I stumbled across another problem while testing V2.

GEN_INT can create a non-canonical integer constant (and one might
legitimately wonder if we should eliminate GEN_INT).  The specific case
I ran into was something like 0xfffffff0 for an SImode value on a 64-bit
host.  That should have been the sign-extended 0xfffffffffffffff0 to be
canonical.


The right way to handle this these days is with gen_int_mode.  You
should replace the two calls to GEN_INT with gen_int_mode (new_offset, mode).


Still testing the new variant...

jeff


[committed] Fix arc assumption that insns are not re-recognized

2023-06-18 Thread Jeff Law via Gcc-patches
Testing the V2 version of Manolis's fold-mem-offsets patch exposed a 
minor bug in the arc backend.


The movsf_insn pattern has constraints which allow storing certain 
constants to memory.  reload/lra will target those alternatives under 
the right circumstances.  However the insn's condition requires that one 
of the two operands must be a register.


Thus if a pass were to force re-recognition of the pattern we can get an 
unrecognized insn failure.


This patch adjusts the conditions to more closely match movsi_insn. 
More specifically it allows storing a constant into a limited set of 
memory operands (as defined by the Usc constraint).  movqi_insn has the 
same core problem and gets the same solution.



Committed after the tester validated there are no regressions with this 
patch installed for arc-elf.


Jeff

commit 0f9bb3e7a4aab95fd449f60b5f891ed9a6e5f352
Author: Jeff Law 
Date:   Sun Jun 18 11:25:12 2023 -0600

Fix arc assumption that insns are not re-recognized

Testing the V2 version of Manolis's fold-mem-offsets patch exposed a minor 
bug
in the arc backend.

The movsf_insn pattern has constraints which allow storing certain constants
to memory.  reload/lra will target those alternatives under the right
circumstances.  However the insn's condition requires that one of the two
operands must be a register.

Thus if a pass were to force re-recognition of the pattern we can get an
unrecognized insn failure.

This patch adjusts the conditions to more closely match movsi_insn.  More
specifically it allows storing a constant into a limited set of memory
operands (as defined by the Usc constraint).  movqi_insn has the same
core problem and gets the same solution.

Committed after the tester validated there are no regressions

gcc/
* config/arc/arc.md (movqi_insn): Allow certain constants to
be stored into memory in the pattern's condition.
(movsf_insn): Similarly.

diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index c51ce173350..1f122d9507f 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -705,6 +705,9 @@ (define_insn "*movqi_insn"
(match_operand:QI 1 "move_src_operand"  "rL,rP,q,P,hCm1,cL, 
I,?Rac,i,?i,T,q,Usd,Ucm,m,?Rac,c,?Rac,Cm3,i"))]
   "register_operand (operands[0], QImode)
|| register_operand (operands[1], QImode)
+   || (CONSTANT_P (operands[1])
+   && (!satisfies_constraint_I (operands[1]) || !optimize_size)
+   && satisfies_constraint_Usc (operands[0]))
|| (satisfies_constraint_Cm3 (operands[1])
&& memory_operand (operands[0], QImode))"
   "@
@@ -1363,7 +1366,10 @@ (define_insn "*movsf_insn"
   [(set (match_operand:SF 0 "move_dest_operand"   "=h,h,   r,r,  q,S,Usc,r,m")
(match_operand:SF 1 "move_src_operand"  "hCfZ,E,rCfZ,E,Uts,q,  E,m,r"))]
   "register_operand (operands[0], SFmode)
-   || register_operand (operands[1], SFmode)"
+   || register_operand (operands[1], SFmode)
+   || (CONSTANT_P (operands[1])
+   && (!satisfies_constraint_I (operands[1]) || !optimize_size)
+   && satisfies_constraint_Usc (operands[0]))"
   "@
mov%?\\t%0,%1
mov%?\\t%0,%1 ; %A1


Optimize std::max early

2023-06-18 Thread Jan Hubicka via Gcc-patches
Hi,
we currently produce very bad code on loops using std::vector as a stack, since
we fail to inline push_back which in turn prevents SRA and we fail to optimize
out some store-to-load pairs (PR109849).

I looked into why this function is not inlined, given that clang does inline
it.  We currently estimate it at 66 instructions, while the inline limits are
15 at -O2 and 30 at -O3.  Clang has a similar estimate, but still decides to
inline at -O2.

I looked into the reason why the body is so large, and one problem I spotted
is the way std::max is implemented: it takes and returns references to the
values.

  const T& max( const T& a, const T& b );

This makes it necessary to store the values to memory and load them later
(std::max is used by the code computing the new size of the vector on resize).
Two stores, a conditional, and a load account for 8 instructions, while a
MAX_EXPR counts as 1 and has a much better chance of folding with the
surrounding code.

We optimize this to MAX_EXPR, but only during late optimizations.  I think this
is a common enough coding pattern that we ought to make it transparent to
early opts and IPA.  The following is the easiest fix: it simply adds the
phiprop pass, which turns a PHI of address values into a PHI of values, so
that later FRE can propagate values across memory, phiopt can discover the
MAX_EXPR pattern, and DSE can remove the memory stores.

Bootstrapped/regtested on x86_64-linux; does this look like a reasonable thing
to do?

Looking into how expensive the pass is, I think it is very cheap, except that
it computes postdominators and updates SSA even if no patterns are matched.
I will send a patch to avoid that.

gcc/ChangeLog:

PR tree-optimization/109811
PR tree-optimization/109849
* passes.def: Add phiprop to early optimization passes.
* tree-ssa-phiprop.cc: Allow cloning.

gcc/testsuite/ChangeLog:

PR tree-optimization/109811
PR tree-optimization/109849
* gcc.dg/tree-ssa/phiprop-1.c: New test.

diff --git a/gcc/passes.def b/gcc/passes.def
index c9a8f19747b..faa5208b26b 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -88,6 +88,8 @@ along with GCC; see the file COPYING3.  If not see
  /* pass_build_ealias is a dummy pass that ensures that we
 execute TODO_rebuild_alias at this point.  */
  NEXT_PASS (pass_build_ealias);
+ /* Do phiprop before FRE so we optimize std::min and std::max well.  
*/
+ NEXT_PASS (pass_phiprop);
  NEXT_PASS (pass_fre, true /* may_iterate */);
  NEXT_PASS (pass_early_vrp);
  NEXT_PASS (pass_merge_phi);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phiprop-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/phiprop-1.c
new file mode 100644
index 000..9f52c2a7298
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phiprop-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-phiprop1-details -fdump-tree-release_ssa" } */
+int max(int a, int b)
+{
+int *ptr;
+if (a > b)
+  ptr = &a;
+else
+  ptr = &b;
+return *ptr;
+}
+
+/* { dg-final { scan-tree-dump-times "Inserting PHI for result of load" 1 
"phiprop1"} } */
+/* { dg-final { scan-tree-dump-times "MAX_EXPR" 1 "release_ssa"} } */
diff --git a/gcc/tree-ssa-phiprop.cc b/gcc/tree-ssa-phiprop.cc
index 3cb4900b6be..5dc505df420 100644
--- a/gcc/tree-ssa-phiprop.cc
+++ b/gcc/tree-ssa-phiprop.cc
@@ -476,6 +476,7 @@ public:
   {}
 
   /* opt_pass methods: */
+  opt_pass * clone () final override { return new pass_phiprop (m_ctxt); }
   bool gate (function *) final override { return flag_tree_phiprop; }
   unsigned int execute (function *) final override;
 


[PATCH v1] RISC-V: Bugfix for RVV widening reduction in ZVE32/64

2023-06-18 Thread Pan Li via Gcc-patches
From: Pan Li 

The RVV widening reduction has 3 different patterns for zve128+, zve64
and zve32.  They use the same iterator with different attributes.
However, we rely on the generated function code_for_reduc (code, mode1, mode2),
whose implementation may look like below.

code_for_reduc (code, mode1, mode2)
{
  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
return CODE_FOR_pred_reduc_maxvnx1hfvnx16hf; // ZVE128+

  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
return CODE_FOR_pred_reduc_maxvnx1hfvnx8hf;  // ZVE64

  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
return CODE_FOR_pred_reduc_maxvnx1hfvnx4hf;  // ZVE32
}

Thus there is a problem here.  For example, for zve32 we will have
code_for_reduc (max, VNx1HF, VNx1HF), which logically returns the code of
the ZVE128+ pattern instead of the ZVE32 one.

This patch merges the 3 patterns into one pattern and passes both the
input vector and the return vector modes to code_for_reduc.  For example,
ZVE32 becomes code_for_reduc (max, VNx1HF, VNx2HF), and the correct code
for ZVE32 is returned as expected.

Please note both GCC 13 and 14 are impacted by this issue.

Signed-off-by: Pan Li 
Co-Authored by: Juzhe-Zhong 

PR 110299

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Adjust expand for
modes.
* config/riscv/vector-iterators.md: Remove VWLMUL1, VWLMUL1_ZVE64,
VWLMUL1_ZVE32.
* config/riscv/vector.md
(@pred_widen_reduc_plus): Removed.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): New pattern.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): Ditto.
(@pred_widen_reduc_plus): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr110299-1.c: New test.
* gcc.target/riscv/rvv/base/pr110299-1.h: New test.
* gcc.target/riscv/rvv/base/pr110299-2.c: New test.
* gcc.target/riscv/rvv/base/pr110299-2.h: New test.
* gcc.target/riscv/rvv/base/pr110299-3.c: New test.
* gcc.target/riscv/rvv/base/pr110299-3.h: New test.
* gcc.target/riscv/rvv/base/pr110299-4.c: New test.
* gcc.target/riscv/rvv/base/pr110299-4.h: New test.
---
 .../riscv/riscv-vector-builtins-bases.cc  |  16 +-
 gcc/config/riscv/vector-iterators.md  |  62 -
 gcc/config/riscv/vector.md| 243 --
 .../gcc.target/riscv/rvv/base/pr110299-1.c|   7 +
 .../gcc.target/riscv/rvv/base/pr110299-1.h|   9 +
 .../gcc.target/riscv/rvv/base/pr110299-2.c|   8 +
 .../gcc.target/riscv/rvv/base/pr110299-2.h|  17 ++
 .../gcc.target/riscv/rvv/base/pr110299-3.c|   7 +
 .../gcc.target/riscv/rvv/base/pr110299-3.h|  17 ++
 .../gcc.target/riscv/rvv/base/pr110299-4.c|   8 +
 .../gcc.target/riscv/rvv/base/pr110299-4.h|  17 ++
 11 files changed, 253 insertions(+), 158 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-1.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-2.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-3.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110299-4.h

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 27545113996..c6c53dc13a5 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -1396,16 +1396,8 @@ public:
 
   rtx expand (function_expander ) const override
   {
-machine_mode mode = e.vector_mode ();
-machine_mode ret_mode = e.ret_mode ();
-
-/* TODO: we will use ret_mode after all types of PR110265 are addressed.  
*/
-if (GET_MODE_INNER (mode) != GET_MODE_INNER (ret_mode))
-  return e.use_exact_insn (
-   code_for_pred_reduc (CODE, e.vector_mode (), e.vector_mode ()));
-else
-  return e.use_exact_insn (
-   code_for_pred_reduc (CODE, e.vector_mode (), e.ret_mode ()));
+return e.use_exact_insn (
+  code_for_pred_reduc (CODE, e.vector_mode (), e.ret_mode ()));
   }
 };
 
@@ -1420,7 +1412,7 @@ public:
   {
 return e.use_exact_insn (code_for_pred_widen_reduc_plus (UNSPEC,
 e.vector_mode (),
-e.vector_mode ()));
+e.ret_mode ()));
   }
 };
 
@@ -1449,7 +1441,7 @@ public:
   {
 return 

[RFC] Workaround LRA reload issue with SUBREGs in SET_DEST.

2023-06-18 Thread Roger Sayle

I was wondering whether I could ask an LRA/reload expert for help with
a better fix for this issue.

For the testcase (from sse2-v1ti-mov-1.c):

typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16)));
uv1ti foo(__int128 x) { return (uv1ti)x; }

we currently generate (with -O2) the suboptimal:

movq    %rdi, %xmm1
movq    %rsi, %xmm2
punpcklqdq  %xmm2, %xmm1
movdqa  %xmm1, %xmm0

Notice that (due to register allocation) the result is calculated in
%xmm1 and in the final structure copied to the result %xmm0.

With the one line change (workaround) below, we generate the better
(optimal) sequence:

movq    %rdi, %xmm0
movq    %rsi, %xmm1
punpcklqdq  %xmm1, %xmm0

The triggering event responsible for the current behaviour is that
combine merges the two instructions:

(insn 12 7 13 2 (set (reg:V2DI 88)
(vec_concat:V2DI (reg:DI 95)
(reg:DI 96))) "sse2-v1ti-mov-1.c":8:10 discrim 1 7238
{vec_concatv2di}
 (expr_list:REG_DEAD (reg:DI 96)
(expr_list:REG_DEAD (reg:DI 95)
(nil

(insn 13 12 17 2 (set (reg:V1TI 82 [  ])
(subreg:V1TI (reg:V2DI 88) 0)) "sse2-v1ti-mov-1.c":8:10 discrim 1
1860 {movv1ti_internal}
 (expr_list:REG_DEAD (reg:V2DI 88)
(nil)))

into the single instruction (with a SUBREG in the SET_DEST):

(insn 13 12 17 2 (set (subreg:V2DI (reg:V1TI 82 [  ]) 0)
(vec_concat:V2DI (reg:DI 95)
(reg:DI 96))) "sse2-v1ti-mov-1.c":8:10 discrim 1 7244
{vec_concatv2di}
 (expr_list:REG_DEAD (reg:DI 95)
(expr_list:REG_DEAD (reg:DI 96)
(nil

Unfortunately, this form is challenging for lra/reload...

 Choosing alt 4 in insn 13:  (0) x  (1) 0  (2) x {vec_concatv2di}
  Creating newreg=98, assigning class SSE_REGS to r98
  Creating newreg=99 from oldreg=96, assigning class SSE_REGS to r99
   13: r98:V2DI=vec_concat(r98:V2DI#0,r99:DI)
  REG_DEAD r96:DI
  REG_DEAD r95:DI
Inserting insn reload before:
   27: clobber r98:V2DI
   28: r98:V2DI#0=r95:DI
   30: r99:DI=r96:DI
Inserting insn reload after:
   29: r82:V1TI#0=r98:V2DI

It's the clobber of r98 (insn 27), generated by the emit_clobber call at
around line 1081 in match_reload in lra-constraints.cc, that's critical:
it causes r82 and r98 to occupy different registers/allocations.  Is there
a way of preventing this clobber/conflict?  Are V2DI and V1TI correctly
annotated as tieable to the same hard register?

This patch works by explicitly checking that the destination in
vec_concatv2di satisfies REG_P, i.e. is not a SUBREG, thereby preventing
the two instructions from being merged by combine.  But clearly this is a
case where lra/reload could be doing better.

Thoughts?


2023-06-18  Roger Sayle  

gcc/ChangeLog
* config/i386/sse.md (vec_concatv2di): Require that the destination
is a REG_P (i.e. a pseudo or hard register, not a SUBREG).


Roger
--

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 70d7410..20a26a0 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -20060,7 +20060,7 @@
  "  0, 0,x ,Yv,0,Yv,0,0,v")
  (match_operand:DI 2 "nonimmediate_operand"
  " rm,rm,rm,rm,x,Yv,x,m,m")))]
-  "TARGET_SSE"
+  "TARGET_SSE && REG_P (operands[0])"
   "@
pinsrq\t{$1, %2, %0|%0, %2, 1}
pinsrq\t{$1, %2, %0|%0, %2, 1}


Re: [PATCH] RISC-V: Add tuple vector mode psABI checking and simplify code

2023-06-18 Thread 钟居哲
Thanks for cleaning up the code for the future ABI support patch.
Let's wait for Jeff's or Robin's comments.

Thanks.


juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-06-18 19:41
To: gcc-patches
CC: juzhe.zhong; yanzhang.wang; kito.cheng; palmer; jeffreyalaw
Subject: [PATCH] RISC-V: Add tuple vector mode psABI checking and simplify code
Hi,
 
This patch does several things:
  1. Adds the missed checking of tuple vector modes.
  2. Extends the scope of the checking to all vector types; previously it
     was only done for scalable vector types.
  3. Simplifies the logic of determining the code of a vector type that will
     lower to vector mode code.
 
Best,
Lehua
 
gcc/ChangeLog:
 
* config/riscv/riscv.cc (riscv_scalable_vector_type_p): Delete.
(riscv_arg_has_vector): Simplify.
(riscv_pass_in_vector_p): Adjust warning message.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/fixed-vlmax-1.c: Add -Wno-psabi option.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-7.c: Ditto.
* gcc.target/riscv/rvv/base/pr110119-1.c: Ditto.
* gcc.target/riscv/rvv/base/pr110119-2.c: Ditto.
* gcc.target/riscv/vector-abi-1.c: Ditto.
* gcc.target/riscv/vector-abi-2.c: Ditto.
* gcc.target/riscv/vector-abi-3.c: Ditto.
* gcc.target/riscv/vector-abi-4.c: Ditto.
* gcc.target/riscv/vector-abi-5.c: Ditto.
* gcc.target/riscv/vector-abi-6.c: Ditto.
* gcc.target/riscv/vector-abi-7.c: New test.
* gcc.target/riscv/vector-abi-8.c: New test.
* gcc.target/riscv/vector-abi-9.c: New test.
 
---
gcc/config/riscv/riscv.cc | 53 ++-
.../riscv/rvv/autovec/fixed-vlmax-1.c |  2 +-
.../riscv/rvv/autovec/vls-vlmax/merge-1.c |  2 +-
.../riscv/rvv/autovec/vls-vlmax/merge-2.c |  2 +-
.../riscv/rvv/autovec/vls-vlmax/merge-3.c |  2 +-
.../riscv/rvv/autovec/vls-vlmax/merge-4.c |  2 +-
.../riscv/rvv/autovec/vls-vlmax/merge-5.c |  2 +-
.../riscv/rvv/autovec/vls-vlmax/merge-6.c |  2 +-
.../riscv/rvv/autovec/vls-vlmax/merge-7.c |  2 +-
.../riscv/rvv/autovec/vls-vlmax/merge_run-1.c |  2 +-
.../riscv/rvv/autovec/vls-vlmax/merge_run-2.c |  2 +-
.../riscv/rvv/autovec/vls-vlmax/merge_run-3.c |  2 +-
.../riscv/rvv/autovec/vls-vlmax/merge_run-4.c |  2 +-
.../riscv/rvv/autovec/vls-vlmax/merge_run-5.c |  2 +-
.../riscv/rvv/autovec/vls-vlmax/merge_run-6.c |  2 +-
.../riscv/rvv/autovec/vls-vlmax/merge_run-7.c |  2 +-
.../riscv/rvv/autovec/vls-vlmax/perm-1.c  |  2 +-
.../riscv/rvv/autovec/vls-vlmax/perm-2.c  |  2 +-
.../riscv/rvv/autovec/vls-vlmax/perm-3.c  |  2 +-
.../riscv/rvv/autovec/vls-vlmax/perm-4.c  |  2 +-
.../riscv/rvv/autovec/vls-vlmax/perm-5.c  |  2 +-
.../riscv/rvv/autovec/vls-vlmax/perm-6.c  |  2 +-
.../riscv/rvv/autovec/vls-vlmax/perm-7.c  |  2 +-
.../riscv/rvv/autovec/vls-vlmax/perm_run-1.c  |  2 +-
.../riscv/rvv/autovec/vls-vlmax/perm_run-2.c  |  2 +-
.../riscv/rvv/autovec/vls-vlmax/perm_run-3.c  |  2 +-
.../riscv/rvv/autovec/vls-vlmax/perm_run-4.c  |  2 +-
.../riscv/rvv/autovec/vls-vlmax/perm_run-5.c  |  2 +-
.../riscv/rvv/autovec/vls-vlmax/perm_run-6.c  |  2 +-
.../riscv/rvv/autovec/vls-vlmax/perm_run-7.c  |  2 +-
.../gcc.target/riscv/rvv/base/pr110119-1.c|  2 +-
.../gcc.target/riscv/rvv/base/pr110119-2.c|  2 +-
gcc/testsuite/gcc.target/riscv/vector-abi-1.c |  2 +-
gcc/testsuite/gcc.target/riscv/vector-abi-2.c |  2 +-

Re: [PATCH v2] RISC-V: Bugfix for RVV float reduction in ZVE32/64

2023-06-18 Thread 钟居哲
Thanks for fixing it for me.
LGTM now.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-06-18 10:57
To: gcc-patches
CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v2] RISC-V: Bugfix for RVV float reduction in ZVE32/64
From: Pan Li 
 
The rvv integer reduction has 3 different patterns for zve128+, zve64
and zve32.  They take the same iterator with different attributes.
However, we need the generated function code_for_reduc (code, mode1, mode2).
The implementation of code_for_reduc may look like the following.
 
code_for_reduc (code, mode1, mode2)
{
  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
return CODE_FOR_pred_reduc_maxvnx1hfvnx16hf; // ZVE128+
 
  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
return CODE_FOR_pred_reduc_maxvnx1hfvnx8hf;  // ZVE64
 
  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
return CODE_FOR_pred_reduc_maxvnx1hfvnx4hf;  // ZVE32
}
 
Thus there is a problem here.  For example, with zve32 we will have
code_for_reduc (max, VNx1HF, VNx1HF), which logically returns the code
for ZVE128+ instead of ZVE32.
 
This patch merges the 3 patterns into one pattern, and passes both the
input_vector and the ret_vector to code_for_reduc.  For example, for ZVE32
the call becomes code_for_reduc (max, VNx1HF, VNx2HF), and the correct
ZVE32 code is returned as expected.
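The dispatch collision can be modeled with a small self-contained C sketch. The enum values and returned codes below are hypothetical stand-ins for GCC's mode enums and insn codes (loosely following the CODE_FOR names in the pseudocode above), not the real internals; the point is that keying the lookup on both the input mode and the LMUL1 return mode, as this patch does, makes every target variant reachable:

```c
#include <assert.h>

/* Hypothetical stand-ins for GCC's machine modes; not real GCC enums.  */
enum mode { VNx1HF, VNx4HF, VNx8HF, VNx16HF };

/* Broken lookup: all three target variants share the same input mode,
   so the first match (ZVE128+) shadows the ZVE64 and ZVE32 entries.  */
static int
code_for_reduc_input_only (enum mode in)
{
  if (in == VNx1HF) return 128;  /* ZVE128+ pattern  */
  if (in == VNx1HF) return 64;   /* ZVE64: unreachable  */
  if (in == VNx1HF) return 32;   /* ZVE32: unreachable  */
  return -1;
}

/* Fixed lookup: the LMUL1 return mode differs per target, so keying on
   both modes selects the right pattern for each configuration.  */
static int
code_for_reduc (enum mode in, enum mode ret)
{
  if (in == VNx1HF && ret == VNx16HF) return 128;  /* ZVE128+  */
  if (in == VNx1HF && ret == VNx8HF)  return 64;   /* ZVE64  */
  if (in == VNx1HF && ret == VNx4HF)  return 32;   /* ZVE32  */
  return -1;
}
```

With the input-only key, a ZVE32 query can only ever reach the ZVE128+ entry; with the two-mode key, each target gets its own answer.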
 
Please note both GCC 13 and 14 are impacted by this issue.
 
Signed-off-by: Pan Li 
Co-Authored by: Juzhe-Zhong 
 
PR target/110277
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-bases.cc: Adjust expand for
ret_mode.
* config/riscv/vector-iterators.md: Add VHF, VSF, VDF,
VHF_LMUL1, VSF_LMUL1, VDF_LMUL1, and remove unused attr.
* config/riscv/vector.md (@pred_reduc_): Removed.
(@pred_reduc_): Ditto.
(@pred_reduc_): Ditto.
(@pred_reduc_plus): Ditto.
(@pred_reduc_plus): Ditto.
(@pred_reduc_plus): Ditto.
(@pred_reduc_): New pattern.
(@pred_reduc_): Ditto.
(@pred_reduc_): Ditto.
(@pred_reduc_plus): Ditto.
(@pred_reduc_plus): Ditto.
(@pred_reduc_plus): Ditto.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/pr110277-1.c: New test.
* gcc.target/riscv/rvv/base/pr110277-1.h: New test.
* gcc.target/riscv/rvv/base/pr110277-2.c: New test.
* gcc.target/riscv/rvv/base/pr110277-2.h: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  |   5 +-
gcc/config/riscv/vector-iterators.md  | 128 +++---
gcc/config/riscv/vector.md| 363 +++---
.../gcc.target/riscv/rvv/base/pr110277-1.c|   9 +
.../gcc.target/riscv/rvv/base/pr110277-1.h|  33 ++
.../gcc.target/riscv/rvv/base/pr110277-2.c|  11 +
.../gcc.target/riscv/rvv/base/pr110277-2.h|  33 ++
7 files changed, 366 insertions(+), 216 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110277-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110277-1.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110277-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110277-2.h
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 53bd0ed2534..27545113996 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -1400,8 +1400,7 @@ public:
 machine_mode ret_mode = e.ret_mode ();
 /* TODO: we will use ret_mode after all types of PR110265 are addressed.  
*/
-if ((GET_MODE_CLASS (MODE) == MODE_VECTOR_FLOAT)
-   || GET_MODE_INNER (mode) != GET_MODE_INNER (ret_mode))
+if (GET_MODE_INNER (mode) != GET_MODE_INNER (ret_mode))
   return e.use_exact_insn (
code_for_pred_reduc (CODE, e.vector_mode (), e.vector_mode ()));
 else
@@ -1435,7 +1434,7 @@ public:
   rtx expand (function_expander ) const override
   {
 return e.use_exact_insn (
-  code_for_pred_reduc_plus (UNSPEC, e.vector_mode (), e.vector_mode ()));
+  code_for_pred_reduc_plus (UNSPEC, e.vector_mode (), e.ret_mode ()));
   }
};
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index e2c8ade98eb..6169116482a 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -967,6 +967,33 @@ (define_mode_iterator VDI [
   (VNx16DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
])
+(define_mode_iterator VHF [
+  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
+  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
+  (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
+])
+
+(define_mode_iterator VSF [
+  (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
+  (VNx2SF "TARGET_VECTOR_ELEN_FP_32")
+  (VNx4SF "TARGET_VECTOR_ELEN_FP_32")
+  (VNx8SF "TARGET_VECTOR_ELEN_FP_32")
+  (VNx16SF "TARGET_VECTOR_ELEN_FP_32 && 

Re: [pushed][PATCH v3] LoongArch: Avoid non-returning indirect jumps through $ra [PR110136]

2023-06-18 Thread WANG Xuerui

Hi,

On 6/15/23 17:03, Xi Ruoyao wrote:

Xuerui: I guess this makes it sensible to show "ret" instead of "jirl
$zero, $ra, 0" in objdump -d output, but I don't know how to implement
it.  Do you have some idea?


Thanks for the suggestion! Actually I previously made this patch 
series [1], which included just that. But the Loongson maintainers said 
they were working on linker relaxation at the time, so they would have to 
postpone processing it, and I've never had a review since then; it's 
expected to conflict with the relaxation patches, so some rebasing would 
be needed, but IIRC all review comments have been addressed. You can 
take the series if you'd like to ;-)


[1]: https://sourceware.org/pipermail/binutils/2023-February/126088.html



On Thu, 2023-06-15 at 16:27 +0800, Lulu Cheng wrote:

Pushed to trunk and gcc-12 gcc-13.
r14-1866
r13-7448
r12-9698

在 2023/6/15 上午9:30, Lulu Cheng 写道:

Micro-architecture unconditionally treats a "jr $ra" as "return from
subroutine", hence doing "jr $ra" would interfere with both subroutine
return prediction and the more general indirect branch prediction.

Therefore, a problem like PR110136 can cause a significant increase in
the branch misprediction rate and affect performance. The same problem
exists with "indirect_jump".

gcc/ChangeLog:

 * config/loongarch/loongarch.md: Modify the register constraints
 for templates "tablejump" and "indirect_jump" from "r" to "e".

Co-authored-by: Andrew Pinski 
---
v1 -> v2:
    1. Modify the description.
    2. Modify the register constraints of the template
"indirect_jump".
v2 -> v3:
    1. Modify the description.
---
   gcc/config/loongarch/loongarch.md | 8 ++--
   1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md
b/gcc/config/loongarch/loongarch.md
index 816a943d155..b37e070660f 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2895,6 +2895,10 @@ (define_insn "*jump_pic"
   }
     [(set_attr "type" "branch")])
   
+;; Micro-architecture unconditionally treats a "jr $ra" as "return

from subroutine",
+;; non-returning indirect jumps through $ra would interfere with
both subroutine
+;; return prediction and the more general indirect branch
prediction.
+
   (define_expand "indirect_jump"
     [(set (pc) (match_operand 0 "register_operand"))]
     ""
@@ -2905,7 +2909,7 @@ (define_expand "indirect_jump"
   })
   
   (define_insn "@indirect_jump"

-  [(set (pc) (match_operand:P 0 "register_operand" "r"))]
+  [(set (pc) (match_operand:P 0 "register_operand" "e"))]
     ""
     "jr\t%0"
     [(set_attr "type" "jump")
@@ -2928,7 +2932,7 @@ (define_expand "tablejump"
   
   (define_insn "@tablejump"

     [(set (pc)
-   (match_operand:P 0 "register_operand" "r"))
+   (match_operand:P 0 "register_operand" "e"))
  (use (label_ref (match_operand 1 "" "")))]
     ""
     "jr\t%0"


[PATCH] RISC-V: Add tuple vector mode psABI checking and simplify code

2023-06-18 Thread Lehua Ding
Hi,

This patch does several things:
  1. Adds the missing checking of tuple vector modes.
  2. Extends the scope of checking to all vector types; previously it
     was only for scalable vector types.
  3. Simplifies the logic of determining the code of a vector type which
     will be lowered to vector mode code.

Best,
Lehua

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_scalable_vector_type_p): Delete.
(riscv_arg_has_vector): Simplify.
(riscv_pass_in_vector_p): Adjust warning message.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/fixed-vlmax-1.c: Add -Wno-psabi option.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-7.c: Ditto.
* gcc.target/riscv/rvv/base/pr110119-1.c: Ditto.
* gcc.target/riscv/rvv/base/pr110119-2.c: Ditto.
* gcc.target/riscv/vector-abi-1.c: Ditto.
* gcc.target/riscv/vector-abi-2.c: Ditto.
* gcc.target/riscv/vector-abi-3.c: Ditto.
* gcc.target/riscv/vector-abi-4.c: Ditto.
* gcc.target/riscv/vector-abi-5.c: Ditto.
* gcc.target/riscv/vector-abi-6.c: Ditto.
* gcc.target/riscv/vector-abi-7.c: New test.
* gcc.target/riscv/vector-abi-8.c: New test.
* gcc.target/riscv/vector-abi-9.c: New test.

---
 gcc/config/riscv/riscv.cc | 53 ++-
 .../riscv/rvv/autovec/fixed-vlmax-1.c |  2 +-
 .../riscv/rvv/autovec/vls-vlmax/merge-1.c |  2 +-
 .../riscv/rvv/autovec/vls-vlmax/merge-2.c |  2 +-
 .../riscv/rvv/autovec/vls-vlmax/merge-3.c |  2 +-
 .../riscv/rvv/autovec/vls-vlmax/merge-4.c |  2 +-
 .../riscv/rvv/autovec/vls-vlmax/merge-5.c |  2 +-
 .../riscv/rvv/autovec/vls-vlmax/merge-6.c |  2 +-
 .../riscv/rvv/autovec/vls-vlmax/merge-7.c |  2 +-
 .../riscv/rvv/autovec/vls-vlmax/merge_run-1.c |  2 +-
 .../riscv/rvv/autovec/vls-vlmax/merge_run-2.c |  2 +-
 .../riscv/rvv/autovec/vls-vlmax/merge_run-3.c |  2 +-
 .../riscv/rvv/autovec/vls-vlmax/merge_run-4.c |  2 +-
 .../riscv/rvv/autovec/vls-vlmax/merge_run-5.c |  2 +-
 .../riscv/rvv/autovec/vls-vlmax/merge_run-6.c |  2 +-
 .../riscv/rvv/autovec/vls-vlmax/merge_run-7.c |  2 +-
 .../riscv/rvv/autovec/vls-vlmax/perm-1.c  |  2 +-
 .../riscv/rvv/autovec/vls-vlmax/perm-2.c  |  2 +-
 .../riscv/rvv/autovec/vls-vlmax/perm-3.c  |  2 +-
 .../riscv/rvv/autovec/vls-vlmax/perm-4.c  |  2 +-
 .../riscv/rvv/autovec/vls-vlmax/perm-5.c  |  2 +-
 .../riscv/rvv/autovec/vls-vlmax/perm-6.c  |  2 +-
 .../riscv/rvv/autovec/vls-vlmax/perm-7.c  |  2 +-
 .../riscv/rvv/autovec/vls-vlmax/perm_run-1.c  |  2 +-
 .../riscv/rvv/autovec/vls-vlmax/perm_run-2.c  |  2 +-
 .../riscv/rvv/autovec/vls-vlmax/perm_run-3.c  |  2 +-
 .../riscv/rvv/autovec/vls-vlmax/perm_run-4.c  |  2 +-
 .../riscv/rvv/autovec/vls-vlmax/perm_run-5.c  |  2 +-
 .../riscv/rvv/autovec/vls-vlmax/perm_run-6.c  |  2 +-
 .../riscv/rvv/autovec/vls-vlmax/perm_run-7.c  |  2 +-
 .../gcc.target/riscv/rvv/base/pr110119-1.c|  2 +-
 .../gcc.target/riscv/rvv/base/pr110119-2.c|  2 +-
 gcc/testsuite/gcc.target/riscv/vector-abi-1.c |  2 +-
 gcc/testsuite/gcc.target/riscv/vector-abi-2.c |  2 +-
 

Re: [x86 PATCH] Add alternate representation for {and, or, xor}b %ah, %dh.

2023-06-18 Thread Uros Bizjak via Gcc-patches
On Sun, Jun 18, 2023 at 11:35 AM Roger Sayle  wrote:
>
>
> A patch that I'm working on to improve RTL simplifications in the
> middle-end results in the regression of pr78904-1b.c, due to changes in
> the canonical representation of high-byte (%ah, %bh, %ch, %dh) logic.
> This patch avoids/prevents those failures by adding support for the
> alternate representation, duplicating the existing *qi_ext_2
> as *qi_ext_3 (the new version also replacing any_or with
> any_logic to provide *andqi_ext_3 in the same pattern).  Removing
> the original pattern isn't trivial, as it's generated by define_split,
> but this can be investigated after the other pieces are approved.

IIRC, I have added these patterns to please combine, based on what
combine generates for the above mentioned testcases. I believe there
is no canonical representation of high-byte logic, so these patterns
are what was appropriate at the time. So, yes, a canonical
representation is the way to go.

Also, please note PR82524. I have a solution for this: we need a
define and split to perform an additional copy of a non-matched
register (there is one pattern that does that in i386.md, but
considering that these patterns are not that common and are rarely
used, I left the others as they are).
>
> The current representation of this instruction is:
>
> (set (zero_extract:DI (reg/v:DI 87 [ aD.2763 ])
> (const_int 8 [0x8])
> (const_int 8 [0x8]))
> (subreg:DI (xor:QI (subreg:QI (zero_extract:DI (reg:DI 94)
> (const_int 8 [0x8])
> (const_int 8 [0x8])) 0)
> (subreg:QI (zero_extract:DI (reg/v:DI 87 [ aD.2763 ])
> (const_int 8 [0x8])
> (const_int 8 [0x8])) 0)) 0))
>
> after my proposed middle-end improvement, we attempt to recognize:
>
> (set (zero_extract:DI (reg/v:DI 87 [ aD.2763 ])
> (const_int 8 [0x8])
> (const_int 8 [0x8]))
> (zero_extract:DI (xor:DI (reg:DI 94)
> (reg/v:DI 87 [ aD.2763 ]))
> (const_int 8 [0x8])
> (const_int 8 [0x8])))
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?

I would rather commit this fix after the regression happens. The patch
regresses a relatively rarely used simplification, and scan-asm
regressions are not that critical. So, OK for mainline, but after the
regression happens, so it will be clear what the patch fixes.

Thanks,
Uros.

>
>
> 2023-06-18  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386.md (*qi_ext_3): New define_insn.
>
>
> Thanks in advance,
> Roger
> --
>


Re: [x86 PATCH] Refactor new ix86_expand_carry to set the carry flag.

2023-06-18 Thread Uros Bizjak via Gcc-patches
On Sun, Jun 18, 2023 at 1:10 PM Roger Sayle  wrote:
>
>
> This patch refactors the three places in the i386.md backend that we
> set the carry flag into a new ix86_expand_carry helper function, that
> allows Jakub's recently added uaddc5 and usubc5 expanders
> to take advantage of the recently added support for the stc instruction.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-06-18  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386-expand.cc (ix86_expand_carry): New helper
> function for setting the carry flag.
> (ix86_expand_builtin) : Use it here.
> * config/i386/i386-protos.h (ix86_expand_carry): Prototype here.
> * config/i386/i386.md (uaddc5): Use ix86_expand_carry.
> (usubc5): Likewise.

OK.

Thanks,
Uros.

>
> Thanks in advance,
> Roger
> --
>


Re: [x86 PATCH] Standardize shift amount constants as QImode.

2023-06-18 Thread Uros Bizjak via Gcc-patches
On Sun, Jun 18, 2023 at 11:05 AM Roger Sayle  wrote:
>
>
> This clean-up improves consistency within i386.md by using QImode for
> the constant shift count in patterns that specify a mode.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-06-18  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386.md (*concat3_1): Use QImode
> for the immediate constant shift count.
> (*concat3_2): Likewise.
> (*concat3_3): Likewise.
> (*concat3_4): Likewise.
> (*concat3_5): Likewise.
> (*concat3_6): Likewise.

OK.

Thanks,
Uros.

>
>
> Thanks,
> Roger
> --
>
>


[x86 PATCH] Refactor new ix86_expand_carry to set the carry flag.

2023-06-18 Thread Roger Sayle

This patch refactors the three places in the i386.md backend that we
set the carry flag into a new ix86_expand_carry helper function, that
allows Jakub's recently added uaddc5 and usubc5 expanders
to take advantage of the recently added support for the stc instruction.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?
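For readers less familiar with the expanders involved, the add-with-carry semantics that the uaddc expander wires through the carry flag can be sketched in portable C. This is a model of the semantics only; the function name and shape are illustrative, not GCC internals. ix86_expand_carry's job in this scheme is simply to load carry_in into CF, using "stc" when it is a known non-zero constant:

```c
#include <assert.h>
#include <stdint.h>

/* Model of one add-with-carry step: *sum = a + b + carry_in, returning
   the carry-out.  On x86 the hardware does this in one ADC once the
   carry flag holds carry_in.  */
static unsigned char
uaddc64 (uint64_t a, uint64_t b, unsigned char carry_in, uint64_t *sum)
{
  uint64_t t = a + b;
  unsigned char c1 = t < a;       /* carry out of a + b  */
  *sum = t + carry_in;
  unsigned char c2 = *sum < t;    /* carry out of adding carry_in  */
  return c1 | c2;                 /* at most one of c1/c2 can be set  */
}
```

Chaining calls of this form across words is exactly the multi-precision addition that the carry-flag plumbing makes cheap.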


2023-06-18  Roger Sayle  

gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_carry): New helper
function for setting the carry flag.
(ix86_expand_builtin) : Use it here.
* config/i386/i386-protos.h (ix86_expand_carry): Prototype here.
* config/i386/i386.md (uaddc5): Use ix86_expand_carry.
(usubc5): Likewise.

Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index def060a..3d5eca6 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -12644,6 +12644,21 @@ ix86_check_builtin_isa_match (unsigned int fcode,
   return (bisa & isa) == bisa && (bisa2 & isa2) == bisa2;
 }
 
+/* Emit instructions to set the carry flag from ARG.  */
+
+void
+ix86_expand_carry (rtx arg)
+{
+  if (!CONST_INT_P (arg) || arg == const0_rtx)
+{
+  arg = convert_to_mode (QImode, arg, 1);
+  arg = copy_to_mode_reg (QImode, arg);
+  emit_insn (gen_addqi3_cconly_overflow (arg, constm1_rtx));
+}
+  else
+emit_insn (gen_x86_stc ());
+}
+
 /* Expand an expression EXP that calls a built-in function,
with result going to TARGET if that's convenient
(and in mode MODE if that's convenient).
@@ -13975,14 +13990,7 @@ rdseed_step:
   else
{
  /* Generate CF from input operand.  */
- if (!CONST_INT_P (op1))
-   {
- op1 = convert_to_mode (QImode, op1, 1);
- op1 = copy_to_mode_reg (QImode, op1);
- emit_insn (gen_addqi3_cconly_overflow (op1, constm1_rtx));
-   }
- else
-   emit_insn (gen_x86_stc ());
+ ix86_expand_carry (op1);
 
  /* Generate instruction that consumes CF.  */
  op1 = gen_rtx_REG (CCCmode, FLAGS_REG);
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index af01299..27fe73c 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -155,6 +155,7 @@ extern void ix86_expand_sse_movcc (rtx, rtx, rtx, rtx);
 extern void ix86_expand_sse_unpack (rtx, rtx, bool, bool);
 extern void ix86_expand_fp_spaceship (rtx, rtx, rtx);
 extern bool ix86_expand_int_addcc (rtx[]);
+extern void ix86_expand_carry (rtx arg);
 extern rtx_insn *ix86_expand_call (rtx, rtx, rtx, rtx, rtx, bool);
 extern bool ix86_call_use_plt_p (rtx);
 extern void ix86_split_call_vzeroupper (rtx, rtx);
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 93794c1..a50cbc8 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -8601,9 +8601,7 @@
 emit_insn (gen_addcarry_0 (operands[0], operands[2], operands[3]));
   else
 {
-  rtx op4 = copy_to_mode_reg (QImode,
- convert_to_mode (QImode, operands[4], 1));
-  emit_insn (gen_addqi3_cconly_overflow (op4, constm1_rtx));
+  ix86_expand_carry (operands[4]);
   pat = gen_rtx_LTU (mode, cf, const0_rtx);
   pat2 = gen_rtx_LTU (mode, cf, const0_rtx);
   emit_insn (gen_addcarry (operands[0], operands[2], operands[3],
@@ -8634,9 +8632,7 @@
   else
 {
   cf = gen_rtx_REG (CCCmode, FLAGS_REG);
-  rtx op4 = copy_to_mode_reg (QImode,
- convert_to_mode (QImode, operands[4], 1));
-  emit_insn (gen_addqi3_cconly_overflow (op4, constm1_rtx));
+  ix86_expand_carry (operands[4]);
   pat = gen_rtx_LTU (mode, cf, const0_rtx);
   pat2 = gen_rtx_LTU (mode, cf, const0_rtx);
   emit_insn (gen_subborrow (operands[0], operands[2], operands[3],


[PATCH] Improved SUBREG simplifications in simplify-rtx.cc's simplify_subreg.

2023-06-18 Thread Roger Sayle

An x86 backend improvement that I'm working on results in combine
attempting to recognize:

(set (reg:DI 87 [ xD.2846 ])
 (ior:DI (subreg:DI (ashift:TI (zero_extend:TI (reg:DI 92))
   (const_int 64 [0x40])) 0)
 (reg:DI 91)))

where the lowpart SUBREG has difficulty seeing through the (hi<<64)
term to determine that its lowpart must be zero.  Rather than work
around this in the backend, the better fix is to teach simplify-rtx that
lowpart((hi<<64)|lo) -> lo and highpart((hi<<64)|lo) -> hi, so that
all backends benefit.  Reducing the number of places where the
middle-end generates a SUBREG of something other than a REG is a
good thing.
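The two identities being added can be sanity-checked in plain C, using GCC's unsigned __int128 as a stand-in for the TImode value (the helper names are illustrative): the shifted zero-extended high word contributes nothing to the low word, and only hi to the high word.

```c
#include <assert.h>
#include <stdint.h>

/* lowpart and highpart of a 128-bit value, word_mode == 64 bits.  */
static uint64_t lowpart  (unsigned __int128 x) { return (uint64_t) x; }
static uint64_t highpart (unsigned __int128 x) { return (uint64_t) (x >> 64); }

/* (hi << 64) | lo, the shape combine produces above.  */
static unsigned __int128
concat (uint64_t hi, uint64_t lo)
{
  return ((unsigned __int128) hi << 64) | lo;
}
```

So lowpart (concat (hi, lo)) recovers lo and highpart (concat (hi, lo)) recovers hi, which is exactly what the new simplify_subreg cases fold at the RTL level.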

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures, except for pr78904-1b.c, for which a backend
solution has just been proposed.  Ok for mainline?


2023-06-18  Roger Sayle  

gcc/ChangeLog
* simplify-rtx.cc (simplify_subreg):  Optimize lowpart SUBREGs
of ASHIFT to const0_rtx with sufficiently large shift count.
Optimize highpart SUBREGs of ASHIFT as the shift operand when
the shift count is the correct offset.  Optimize SUBREGs of
multi-word logic operations if the SUBREGs of both operands
can be simplified.


Thanks in advance,
Roger
--

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 21b7eb4..6715247 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -7746,6 +7746,38 @@ simplify_context::simplify_subreg (machine_mode 
outermode, rtx op,
return CONST0_RTX (outermode);
 }
 
+  /* Optimize SUBREGS of scalar integral ASHIFT by a valid constant.  */
+  if (GET_CODE (op) == ASHIFT
+  && SCALAR_INT_MODE_P (innermode)
+  && CONST_INT_P (XEXP (op, 1))
+  && INTVAL (XEXP (op, 1)) > 0
+  && known_gt (GET_MODE_BITSIZE (innermode), INTVAL (XEXP (op, 1
+{
+  HOST_WIDE_INT val = INTVAL (XEXP (op, 1));
+  /* A lowpart SUBREG of a ASHIFT by a constant may fold to zero.  */
+  if (known_eq (subreg_lowpart_offset (outermode, innermode), byte)
+ && known_le (GET_MODE_BITSIZE (outermode), val))
+return CONST0_RTX (outermode);
+  /* Optimize the highpart SUBREG of a suitable ASHIFT (ZERO_EXTEND).  */
+  if (GET_CODE (XEXP (op, 0)) == ZERO_EXTEND
+ && GET_MODE (XEXP (XEXP (op, 0), 0)) == outermode
+ && known_eq (GET_MODE_BITSIZE (outermode), val)
+ && known_eq (GET_MODE_BITSIZE (innermode), 2 * val)
+ && known_eq (subreg_highpart_offset (outermode, innermode), byte))
+   return XEXP (XEXP (op, 0), 0);
+}
+
+  /* Attempt to simplify WORD_MODE SUBREGs of bitwise expressions.  */
+  if (outermode == word_mode
+  && (GET_CODE (op) == IOR || GET_CODE (op) == XOR || GET_CODE (op) == AND)
+  && SCALAR_INT_MODE_P (innermode))
+{
+  rtx op0 = simplify_subreg (outermode, XEXP (op, 0), innermode, byte);
+  rtx op1 = simplify_subreg (outermode, XEXP (op, 1), innermode, byte);
+  if (op0 && op1)
+   return simplify_gen_binary (GET_CODE (op), outermode, op0, op1);
+}
+
   scalar_int_mode int_outermode, int_innermode;
   if (is_a  (outermode, _outermode)
   && is_a  (innermode, _innermode)


[x86 PATCH] Add alternate representation for {and,or,xor}b %ah,%dh.

2023-06-18 Thread Roger Sayle

A patch that I'm working on to improve RTL simplifications in the
middle-end results in the regression of pr78904-1b.c, due to changes in
the canonical representation of high-byte (%ah, %bh, %ch, %dh) logic.
This patch avoids/prevents those failures by adding support for the
alternate representation, duplicating the existing *qi_ext_2
as *qi_ext_3 (the new version also replacing any_or with
any_logic to provide *andqi_ext_3 in the same pattern).  Removing
the original pattern isn't trivial, as it's generated by define_split,
but this can be investigated after the other pieces are approved.

The current representation of this instruction is:

(set (zero_extract:DI (reg/v:DI 87 [ aD.2763 ])
(const_int 8 [0x8])
(const_int 8 [0x8]))
(subreg:DI (xor:QI (subreg:QI (zero_extract:DI (reg:DI 94)
(const_int 8 [0x8])
(const_int 8 [0x8])) 0)
(subreg:QI (zero_extract:DI (reg/v:DI 87 [ aD.2763 ])
(const_int 8 [0x8])
(const_int 8 [0x8])) 0)) 0))

after my proposed middle-end improvement, we attempt to recognize:

(set (zero_extract:DI (reg/v:DI 87 [ aD.2763 ])
(const_int 8 [0x8])
(const_int 8 [0x8]))
(zero_extract:DI (xor:DI (reg:DI 94)
(reg/v:DI 87 [ aD.2763 ]))
(const_int 8 [0x8])
(const_int 8 [0x8])))
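The scalar identity underlying both RTL shapes, namely that bitwise logic commutes with bit-field extraction, can be checked with a small C model (the function names are illustrative, not part of the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Old canonical form: extract the two high bytes first, then XOR.  */
static uint8_t
xor_high_bytes_1 (uint64_t x, uint64_t y)
{
  return (uint8_t) ((x >> 8) & 0xff) ^ (uint8_t) ((y >> 8) & 0xff);
}

/* New form: XOR the full words, then extract bits 8..15.  Because XOR
   operates independently on each bit, the result is identical, which
   is why both RTL shapes can map onto the same "xorb %ah, %dh".  */
static uint8_t
xor_high_bytes_2 (uint64_t x, uint64_t y)
{
  return (uint8_t) (((x ^ y) >> 8) & 0xff);
}
```

The same holds for AND and IOR, matching the any_logic iterator in the new pattern.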

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-06-18  Roger Sayle  

gcc/ChangeLog
* config/i386/i386.md (*qi_ext_3): New define_insn.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 0929115..889405e 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -10848,6 +10848,8 @@
   [(set_attr "type" "alu")
(set_attr "mode" "QI")])
 
+;; *andqi_ext_3 is defined via *qi_ext_3 below.
+
 ;; Convert wide AND instructions with immediate operand to shorter QImode
 ;; equivalents when possible.
 ;; Don't do the splitting with memory operands, since it introduces risk
@@ -11560,6 +11562,26 @@
   [(set_attr "type" "alu")
(set_attr "mode" "QI")])
 
+(define_insn "*qi_ext_3"
+  [(set (zero_extract:SWI248
+ (match_operand 0 "int248_register_operand" "+Q")
+ (const_int 8)
+ (const_int 8))
+   (zero_extract:SWI248
+ (any_logic:SWI248
+   (match_operand 1 "int248_register_operand" "%0")
+   (match_operand 2 "int248_register_operand" "Q"))
+ (const_int 8)
+ (const_int 8)))
+   (clobber (reg:CC FLAGS_REG))]
+  "(!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
+   /* FIXME: without this LRA can't reload this pattern, see PR82524.  */
+   && (rtx_equal_p (operands[0], operands[1])
+   || rtx_equal_p (operands[0], operands[2]))"
+  "{b}\t{%h2, %h0|%h0, %h2}"
+  [(set_attr "type" "alu")
+   (set_attr "mode" "QI")])
+
 ;; Convert wide OR instructions with immediate operand to shorter QImode
 ;; equivalents when possible.
 ;; Don't do the splitting with memory operands, since it introduces risk


[x86 PATCH] Standardize shift amount constants as QImode.

2023-06-18 Thread Roger Sayle

This clean-up improves consistency within i386.md by using QImode for
the constant shift count in patterns that specify a mode.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-06-18  Roger Sayle  

gcc/ChangeLog
* config/i386/i386.md (*concat3_1): Use QImode
for the immediate constant shift count.
(*concat3_2): Likewise.
(*concat3_3): Likewise.
(*concat3_4): Likewise.
(*concat3_5): Likewise.
(*concat3_6): Likewise.


Thanks,
Roger
--


diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 93794c1..b8d2e3a 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -12211,7 +12211,7 @@
   [(set (match_operand: 0 "nonimmediate_operand" "=ro,r")
(any_or_plus:
  (ashift: (match_operand: 1 "register_operand" "r,r")
-   (match_operand: 2 "const_int_operand"))
+   (match_operand:QI 2 "const_int_operand"))
  (zero_extend:
(match_operand:DWIH 3 "nonimmediate_operand" "r,m"]
   "INTVAL (operands[2]) ==  * BITS_PER_UNIT"
@@ -12230,7 +12230,7 @@
  (zero_extend:
(match_operand:DWIH 1 "nonimmediate_operand" "r,m"))
  (ashift: (match_operand: 2 "register_operand" "r,r")
-   (match_operand: 3 "const_int_operand"]
+   (match_operand:QI 3 "const_int_operand"]
   "INTVAL (operands[3]) ==  * BITS_PER_UNIT"
   "#"
   "&& reload_completed"
@@ -12247,7 +12247,7 @@
  (ashift:
(zero_extend:
  (match_operand:DWIH 1 "nonimmediate_operand" "r,m,r,m"))
-   (match_operand: 2 "const_int_operand"))
+   (match_operand:QI 2 "const_int_operand"))
  (zero_extend:
(match_operand:DWIH 3 "nonimmediate_operand" "r,r,m,m"]
   "INTVAL (operands[2]) ==  * BITS_PER_UNIT"
@@ -12267,7 +12267,7 @@
  (ashift:
(zero_extend:
  (match_operand:DWIH 2 "nonimmediate_operand" "r,r,m,m"))
-   (match_operand: 3 "const_int_operand"]
+   (match_operand:QI 3 "const_int_operand"]
   "INTVAL (operands[3]) ==  * BITS_PER_UNIT"
   "#"
   "&& reload_completed"
@@ -12281,7 +12281,7 @@
   [(set (match_operand:DWI 0 "nonimmediate_operand" "=r,o,o")
(any_or_plus:DWI
  (ashift:DWI (match_operand:DWI 1 "register_operand" "r,r,r")
- (match_operand:DWI 2 "const_int_operand"))
+ (match_operand:QI 2 "const_int_operand"))
  (match_operand:DWI 3 "const_scalar_int_operand" "n,n,Wd")))]
   "INTVAL (operands[2]) ==  * BITS_PER_UNIT / 2
&& (mode == DImode
@@ -12313,7 +12313,7 @@
  (ashift:
(zero_extend:
  (match_operand:DWIH 1 "nonimmediate_operand" "r,r,r,m"))
-   (match_operand: 2 "const_int_operand"))
+   (match_operand:QI 2 "const_int_operand"))
  (match_operand: 3 "const_scalar_int_operand" "n,n,Wd,n")))]
   "INTVAL (operands[2]) ==  * BITS_PER_UNIT
&& (mode == DImode


[PATCH 1/2] xtensa: Remove TARGET_MEMORY_MOVE_COST hook

2023-06-18 Thread Takayuki 'January June' Suwa via Gcc-patches
It used to always return a constant 4, which is the same as the default
behavior, but didn't take into account the effects of secondary
reloads.

Therefore, the implementation of this target hook is removed.

gcc/ChangeLog:

* config/xtensa/xtensa.cc
(TARGET_MEMORY_MOVE_COST, xtensa_memory_move_cost): Remove.
---
 gcc/config/xtensa/xtensa.cc | 13 -
 1 file changed, 13 deletions(-)

diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
index 3b5d25b660a..721c99b56a3 100644
--- a/gcc/config/xtensa/xtensa.cc
+++ b/gcc/config/xtensa/xtensa.cc
@@ -131,7 +131,6 @@ static bool xtensa_rtx_costs (rtx, machine_mode, int, int, 
int *, bool);
 static int xtensa_insn_cost (rtx_insn *, bool);
 static int xtensa_register_move_cost (machine_mode, reg_class_t,
  reg_class_t);
-static int xtensa_memory_move_cost (machine_mode, reg_class_t, bool);
 static tree xtensa_build_builtin_va_list (void);
 static bool xtensa_return_in_memory (const_tree, const_tree);
 static tree xtensa_gimplify_va_arg_expr (tree, tree, gimple_seq *,
@@ -213,8 +212,6 @@ static rtx xtensa_delegitimize_address (rtx);
 
 #undef TARGET_REGISTER_MOVE_COST
 #define TARGET_REGISTER_MOVE_COST xtensa_register_move_cost
-#undef TARGET_MEMORY_MOVE_COST
-#define TARGET_MEMORY_MOVE_COST xtensa_memory_move_cost
 #undef TARGET_RTX_COSTS
 #define TARGET_RTX_COSTS xtensa_rtx_costs
 #undef TARGET_INSN_COST
@@ -4356,16 +4353,6 @@ xtensa_register_move_cost (machine_mode mode 
ATTRIBUTE_UNUSED,
 return 10;
 }
 
-/* Worker function for TARGET_MEMORY_MOVE_COST.  */
-
-static int
-xtensa_memory_move_cost (machine_mode mode ATTRIBUTE_UNUSED,
-reg_class_t rclass ATTRIBUTE_UNUSED,
-bool in ATTRIBUTE_UNUSED)
-{
-  return 4;
-}
-
 /* Compute a (partial) cost for rtx X.  Return true if the complete
cost has been computed, and false if subexpressions should be
scanned.  In either case, *TOTAL contains the cost result.  */
-- 
2.30.2


[PATCH 2/2] xtensa: constantsynth: Add new 2-insns synthesis pattern

2023-06-18 Thread Takayuki 'January June' Suwa via Gcc-patches
This patch adds a new 2-instruction constant synthesis pattern:

-  A non-negative square value whose root fits into a signed 12-bit immediate:
=> "MOVI(.N) Ax, simm12" + "MULL Ax, Ax, Ax"

Due to the execution cost of the integer multiply instruction (MULL), this
synthesis is applied only when the 32-bit Integer Multiply Option is
configured and optimizing for size.

gcc/ChangeLog:

* config/xtensa/xtensa.cc (xtensa_constantsynth_2insn):
Add new pattern for the abovementioned case.
---
 gcc/config/xtensa/xtensa.cc | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
index 721c99b56a3..dd35e63c094 100644
--- a/gcc/config/xtensa/xtensa.cc
+++ b/gcc/config/xtensa/xtensa.cc
@@ -58,6 +58,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "insn-attr.h"
 #include "tree-pass.h"
 #include "print-rtl.h"
+#include 
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -1067,7 +1068,7 @@ xtensa_constantsynth_2insn (rtx dst, HOST_WIDE_INT srcval,
 {
   HOST_WIDE_INT imm = INT_MAX;
   rtx x = NULL_RTX;
-  int shift;
+  int shift, sqr;
 
   gcc_assert (REG_P (dst));
 
@@ -1078,7 +1079,6 @@ xtensa_constantsynth_2insn (rtx dst, HOST_WIDE_INT srcval,
   x = gen_lshrsi3 (dst, dst, GEN_INT (32 - shift));
 }
 
-
   shift = ctz_hwi (srcval);
   if ((!x || (TARGET_DENSITY && ! IN_RANGE (imm, -32, 95)))
   && xtensa_simm12b (srcval >> shift))
@@ -1105,6 +1105,14 @@ xtensa_constantsynth_2insn (rtx dst, HOST_WIDE_INT srcval,
   x = gen_addsi3 (dst, dst, GEN_INT (imm1));
 }
 
+  sqr = (int) floorf (sqrtf (srcval));
+  if (TARGET_MUL32 && optimize_size
+  && !x && IN_RANGE (srcval, 0, (2047 * 2047)) && sqr * sqr == srcval)
+{
+  imm = sqr;
+  x = gen_mulsi3 (dst, dst, dst);
+}
+
   if (!x)
 return 0;
 
-- 
2.30.2