[committed] [v2] More logical op simplifications in simplify-rtx.cc

2024-05-25 Thread Jeff Law

This is a revamp of what started as a target specific patch.

Basically xalan (corrected, I originally thought it was perlbench) has a 
bitset implementation with a bit of an oddity.  Specifically setBit will 
clear the bit before it is set:



if (bitToSet < 32)
  {
    fBits1 &= ~mask;
    fBits1 |= mask;
  }
else
  {
    fBits2 &= ~mask;
    fBits2 |= mask;
  }


We can clean this up pretty easily in RTL with a small bit of code in 
simplify-rtx.  While xalan doesn't have other cases, we can synthesize 
tests pretty easily and handle them as well.



It turns out we don't actually have to recognize this stuff at the bit 
level; standard logical identities are sufficient.  For example


(X | Y) & ~Y -> X & ~Y
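
For concreteness, here's a minimal C sketch of the identities involved
(function names are mine, not from the patch); each body should collapse
to the commented form at -O2:

unsigned f1 (unsigned x, unsigned y) { return (x | y) & ~y; } /* -> x & ~y */
unsigned f2 (unsigned a, unsigned b) { return (~a & b) | a; } /* -> a | b */
unsigned f3 (unsigned a, unsigned b) { return (~a & b) ^ a; } /* -> a | b */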



Andrew P. might poke at this at the gimple level.  The type changes 
kind of get in the way in gimple but he's much better at match.pd than I 
am, so if he wants to chase it from the gimple side, I'll fully support 
that.


Bootstrapped and regression tested on x86.  Also run through my tester 
on its embedded targets.


Pushing to the trunk.

jeff

[gcc r15-831] [committed] [v2] More logical op simplifications in simplify-rtx.cc

2024-05-25 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:05daf617ea22e1d818295ed2d037456937e23530

commit r15-831-g05daf617ea22e1d818295ed2d037456937e23530
Author: Jeff Law 
Date:   Sat May 25 12:39:05 2024 -0600

[committed] [v2] More logical op simplifications in simplify-rtx.cc

This is a revamp of what started as a target specific patch.

Basically xalan (corrected, I originally thought it was perlbench) has a
bitset implementation with a bit of an oddity.  Specifically setBit will
clear the bit before it is set:

> if (bitToSet < 32)
>   {
>     fBits1 &= ~mask;
>     fBits1 |= mask;
>   }
> else
>   {
>     fBits2 &= ~mask;
>     fBits2 |= mask;
>   }

We can clean this up pretty easily in RTL with a small bit of code in
simplify-rtx.  While xalan doesn't have other cases, we can synthesize tests
pretty easily and handle them as well.

It turns out we don't actually have to recognize this stuff at the bit
level; standard logical identities are sufficient.  For example

(X | Y) & ~Y -> X & ~Y

Andrew P. might poke at this at the gimple level.  The type changes kind of
get in the way in gimple but he's much better at match.pd than I am, so if
he wants to chase it from the gimple side, I'll fully support that.

Bootstrapped and regression tested on x86.  Also run through my tester on
its embedded targets.

Pushing to the trunk.

gcc/

* simplify-rtx.cc (simplify_context::simplify_binary_operation_1): 
Handle
more logical simplifications.

gcc/testsuite/

* g++.target/riscv/redundant-bitmap-1.C: New test.
* g++.target/riscv/redundant-bitmap-2.C: New test.
* g++.target/riscv/redundant-bitmap-3.C: New test.
* g++.target/riscv/redundant-bitmap-4.C: New test.

Diff:
---
 gcc/simplify-rtx.cc| 29 ++
 .../g++.target/riscv/redundant-bitmap-1.C  | 14 +++
 .../g++.target/riscv/redundant-bitmap-2.C  | 14 +++
 .../g++.target/riscv/redundant-bitmap-3.C  | 14 +++
 .../g++.target/riscv/redundant-bitmap-4.C  | 14 +++
 5 files changed, 85 insertions(+)

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 53f54d1d392..5caf1dfd957 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -3549,6 +3549,12 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
return tem;
}
 
+  /* Convert (ior (and (not A) B) A) into A | B.  */
+  if (GET_CODE (op0) == AND
+ && GET_CODE (XEXP (op0, 0)) == NOT
+ && rtx_equal_p (XEXP (XEXP (op0, 0), 0), op1))
+   return simplify_gen_binary (IOR, mode, XEXP (op0, 1), op1);
+
   tem = simplify_byte_swapping_operation (code, mode, op0, op1);
   if (tem)
return tem;
@@ -3801,6 +3807,12 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
return tem;
}
 
+  /* Convert (xor (and (not A) B) A) into A | B.  */
+  if (GET_CODE (op0) == AND
+ && GET_CODE (XEXP (op0, 0)) == NOT
+ && rtx_equal_p (XEXP (XEXP (op0, 0), 0), op1))
+   return simplify_gen_binary (IOR, mode, XEXP (op0, 1), op1);
+
   tem = simplify_byte_swapping_operation (code, mode, op0, op1);
   if (tem)
return tem;
@@ -4006,6 +4018,23 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
  && rtx_equal_p (op1, XEXP (XEXP (op0, 1), 0)))
return simplify_gen_binary (AND, mode, op1, XEXP (op0, 0));
 
+  /* (and (ior/xor (X Y) (not Y)) -> X & ~Y */
+  if ((GET_CODE (op0) == IOR || GET_CODE (op0) == XOR)
+ && GET_CODE (op1) == NOT
+ && rtx_equal_p (XEXP (op1, 0), XEXP (op0, 1)))
+   return simplify_gen_binary (AND, mode, XEXP (op0, 0),
+   simplify_gen_unary (NOT, mode,
+   XEXP (op1, 0),
+   mode));
+  /* (and (ior/xor (Y X) (not Y)) -> X & ~Y */
+  if ((GET_CODE (op0) == IOR || GET_CODE (op0) == XOR)
+ && GET_CODE (op1) == NOT
+ && rtx_equal_p (XEXP (op1, 0), XEXP (op0, 0)))
+   return simplify_gen_binary (AND, mode, XEXP (op0, 1),
+   simplify_gen_unary (NOT, mode,
+   XEXP (op1, 0),
+   mode));
+
   /* Convert (and (ior A C) (ior B C)) into (ior (and A B) C).  */
   if (GET_CODE (op0) == GET_CODE (op1)
  && (GET_CODE (op0) == AND

Re: [RFC/RFA] [PATCH 04/12] RISC-V: Add CRC built-ins tests for the target ZBC.

2024-05-25 Thread Jeff Law




On 5/24/24 2:41 AM, Mariam Arutunian wrote:

   gcc/testsuite/gcc.target/riscv/

     * crc-builtin-zbc32.c: New file.
     * crc-builtin-zbc64.c: Likewise.

OK once prerequisites are approved.

jeff



Re: [RFC/RFA] [PATCH 12/12] Add tests for CRC detection and generation.

2024-05-25 Thread Jeff Law




On 5/24/24 2:42 AM, Mariam Arutunian wrote:

   gcc/testsuite/gcc.c-torture/compile/

     * crc-11.c: New test.
     * crc-15.c: Likewise.
     * crc-16.c: Likewise.
     * crc-19.c: Likewise.
     * crc-2.c: Likewise.
     * crc-20.c: Likewise.
     * crc-24.c: Likewise.
     * crc-29.c: Likewise.
     * crc-27.c: Likewise.
     * crc-3.c: Likewise.
     * crc-crc32-data24.c: Likewise.
     * crc-from-fedora-packages (1-24).c: Likewise.
     * crc-linux-(1-5).c: Likewise.
     * crc-not-crc-(1-26).c: Likewise.
     * crc-side-instr-(1-17).c: Likewise.

   gcc/testsuite/gcc.c-torture/execute/

     * crc-(1, 4-10, 12-14, 17-18, 21-28).c: New tests.
     * crc-CCIT-data16-xorOutside_InsideFor.c: Likewise.
     * crc-CCIT-data16.c: Likewise.
     * crc-CCIT-data8.c: Likewise.
     * crc-coremark16-data16.c: Likewise.
     * crc-coremark32-data32.c: Likewise.
     * crc-coremark32-data8.c: Likewise.
     * crc-coremark64-data64.c: Likewise.
     * crc-coremark8-data8.c: Likewise.
     * crc-crc32-data16.c: Likewise.
     * crc-crc32-data8.c: Likewise.
     * crc-crc32.c: Likewise.
     * crc-crc64-data32.c: Likewise.
     * crc-crc64-data64.c: Likewise.
     * crc-crc8-data8-loop-xorInFor.c: Likewise.
     * crc-crc8-data8-loop-xorOutsideFor.c: Likewise.
     * crc-crc8-data8-xorOustideFor.c: Likewise.
     * crc-crc8.c: Likewise.

Signed-off-by: Mariam Arutunian

OK once all prerequisites are approved.

jeff



Re: [RFC/RFA] [PATCH 03/12] RISC-V: Add CRC expander to generate faster CRC.

2024-05-25 Thread Jeff Law




On 5/24/24 2:41 AM, Mariam Arutunian wrote:
If the target is ZBC or ZBKC, it uses the clmul instruction for the CRC
calculation.

Otherwise, if the target is ZBKB, it generates a table-based CRC, but
uses the bswap and brev8 instructions to reverse the inputs and the output.
Add new tests to check CRC generation for the ZBC, ZBKC and ZBKB targets.

   gcc/

      * expr.cc (gf2n_poly_long_div_quotient): New function.
      (reflect): Likewise.
      * expr.h (gf2n_poly_long_div_quotient): New function declaration.
      (reflect): Likewise.

   gcc/config/riscv/

      * bitmanip.md (crc_rev4): New expander for reversed CRC.
      (crc4): New expander for bit-forward CRC.
      (SUBX1, ANYI1): New iterators.
      * riscv-protos.h (generate_reflecting_code_using_brev): New function declaration.

      (expand_crc_using_clmul): Likewise.
      (expand_reversed_crc_using_clmul): Likewise.
      * riscv.cc (generate_reflecting_code_using_brev): New function.
      (expand_crc_using_clmul): Likewise.
      (expand_reversed_crc_using_clmul): Likewise.
      * riscv.md (UNSPEC_CRC, UNSPEC_CRC_REV):  New unspecs.

   gcc/testsuite/gcc.target/riscv/

         * crc-1-zbc.c: New test.
         * crc-10-zbc.c: Likewise.
         * crc-12-zbc.c: Likewise.
         * crc-13-zbc.c: Likewise.
         * crc-14-zbc.c: Likewise.
         * crc-17-zbc.c: Likewise.
         * crc-18-zbc.c: Likewise.
         * crc-21-zbc.c: Likewise.
         * crc-22-rv64-zbc.c: Likewise.
         * crc-22-zbkb.c: Likewise.
         * crc-23-zbc.c: Likewise.
         * crc-4-zbc.c: Likewise.
         * crc-5-zbc.c: Likewise.
         * crc-5-zbkb.c: Likewise.
         * crc-6-zbc.c: Likewise.
         * crc-7-zbc.c: Likewise.
         * crc-8-zbc.c: Likewise.
         * crc-8-zbkb.c: Likewise.
         * crc-9-zbc.c: Likewise.
         * crc-CCIT-data16-zbc.c: Likewise.
         * crc-CCIT-data8-zbc.c: Likewise.
         * crc-coremark-16bitdata-zbc.c: Likewise.

Signed-off-by: Mariam Arutunian

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 8769a6b818b..c98d451f404 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -973,3 +973,66 @@
   "TARGET_ZBC"
   "clmulr\t%0,%1,%2"
   [(set_attr "type" "clmul")])
+
+
+;; Iterator for hardware integer modes narrower than XLEN, same as SUBX
+(define_mode_iterator SUBX1 [QI HI (SI "TARGET_64BIT")])
+
+;; Iterator for hardware integer modes narrower than XLEN, same as ANYI
+(define_mode_iterator ANYI1 [QI HI SI (DI "TARGET_64BIT")])
If these iterators are the same as existing ones, let's just use the
existing ones, unless we need both SUBX and SUBX1 in the same pattern or
ANYI/ANYI1.





+
+;; Reversed CRC 8, 16, 32 for TARGET_64
+(define_expand "crc_rev4"
+   ;; return value (calculated CRC)
+  [(set (match_operand:ANYI 0 "register_operand" "=r")
+ ;; initial CRC
+   (unspec:ANYI [(match_operand:ANYI 1 "register_operand" "r")
+ ;; data
+ (match_operand:ANYI1 2 "register_operand" "r")
+ ;; polynomial without leading 1
+ (match_operand:ANYI 3)]
+ UNSPEC_CRC_REV))]
So the preferred formatting for .md files has operands of a given
operator at the same indentation level.  So in this case SET is the
operator, with two operands (destination/source).  Indent the source and
destination at the same level, so:


  [(set (match_operand:ANYI 0 ...0)
(unspec: ANYI ...)

Similarly for the reversed expander.



diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 85df5b7ab49..123695033a6 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -11394,7 +11394,7 @@ riscv_expand_usadd (rtx dest, rtx x, rtx y)
   if (mode == HImode || mode == QImode)
 {
   int shift_bits = GET_MODE_BITSIZE (Xmode)
-   - GET_MODE_BITSIZE (mode).to_constant ();
+  - GET_MODE_BITSIZE (mode).to_constant ();
 
   gcc_assert (shift_bits > 0);

Looks like an unrelated spurious change.  Drop.


 
@@ -11415,6 +11415,188 @@ riscv_expand_usadd (rtx dest, rtx x, rtx y)

   emit_move_insn (dest, gen_lowpart (mode, xmode_dest));
 }
 
+/* Generate instruction sequence
+   which reflects the value of the OP using bswap and brev8 instructions.
+   OP's mode may be less than word_mode, to get the correct number,
+   after reflecting we shift right the value by SHIFT_VAL.
+   E.g. we have 0000 0000 0000 0001, after reflection (target 32-bit) we
+   will get 1000 0000 0000 0000 0000 0000 0000 0000, if we shift-out 16
+   bits, we will get the desired one: 1000 0000 0000 0000.  */
+
+void
+generate_reflecting_code_using_brev (rtx *op, int shift_val)
+{
+
+  riscv_expand_op (BSWAP, word_mode, *op, *op, *op);
+  riscv_expand_op (LSHIFTRT, word_mode, *op, *op,
+  gen_int_mode (shift_val, word_mode));
Formatting nit with the gen_int_mode (...) argument.  It should line up 
with the 

Re: [RFC/RFA] [PATCH 02/12] Add built-ins and tests for bit-forward and bit-reversed CRCs

2024-05-25 Thread Jeff Law




On 5/24/24 2:41 AM, Mariam Arutunian wrote:
This patch introduces new built-in functions to GCC for computing
bit-forward and bit-reversed CRCs.

These builtins aim to provide efficient CRC calculation capabilities.
When the target architecture supports CRC operations (as indicated by
the presence of a CRC optab), the builtins will utilize the expander to
generate CRC code.  In the absence of hardware support, the builtins
default to generating code for a table-based CRC calculation.


The builtins are defined as follows:
__builtin_rev_crc16_data8,
__builtin_rev_crc32_data8, __builtin_rev_crc32_data16, 
__builtin_rev_crc32_data32

__builtin_crc8_data8,
__builtin_crc16_data16, __builtin_crc16_data8,
__builtin_crc32_data8, __builtin_crc32_data16, __builtin_crc32_data32,
__builtin_crc64_data8, __builtin_crc64_data16,  __builtin_crc64_data32, 
__builtin_crc64_data64


Each builtin takes three parameters:
crc: The initial CRC value.
data: The data to be processed.
polynomial: The CRC polynomial without the leading 1.
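
As a usage sketch (the wrapper name is mine, and this assumes the builtins
land with the signature described above; 0x04C11DB7 is the standard CRC-32
generator polynomial, written without its leading 1):

#include <stdint.h>

uint32_t
crc32_step (uint32_t crc, uint8_t data)
{
  /* Arguments: initial CRC, data, polynomial.  */
  return __builtin_crc32_data8 (crc, data, 0x04C11DB7);
}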

To validate the correctness of these builtins, this patch also includes 
additions to the GCC testsuite.
This enhancement allows GCC to offer developers high-performance CRC
computation options that automatically adapt to the capabilities of the
target hardware.

Co-authored-by: Joern Rennecke


Not complete. May continue the work if these built-ins are needed.

gcc/

  * builtin-types.def (BT_FN_UINT8_UINT8_UINT8_CONST_SIZE): Define.
  (BT_FN_UINT16_UINT16_UINT8_CONST_SIZE): Likewise.
  (BT_FN_UINT16_UINT16_UINT16_CONST_SIZE): Likewise.
  (BT_FN_UINT32_UINT32_UINT8_CONST_SIZE): Likewise.
  (BT_FN_UINT32_UINT32_UINT16_CONST_SIZE): Likewise.
  (BT_FN_UINT32_UINT32_UINT32_CONST_SIZE): Likewise.
  (BT_FN_UINT64_UINT64_UINT8_CONST_SIZE): Likewise.
  (BT_FN_UINT64_UINT64_UINT16_CONST_SIZE): Likewise.
  (BT_FN_UINT64_UINT64_UINT32_CONST_SIZE): Likewise.
  (BT_FN_UINT64_UINT64_UINT64_CONST_SIZE): Likewise.
  * builtins.cc (associated_internal_fn): Handle BUILT_IN_CRC8_DATA8,
  BUILT_IN_CRC16_DATA8, BUILT_IN_CRC16_DATA16,
  BUILT_IN_CRC32_DATA8, BUILT_IN_CRC32_DATA16, BUILT_IN_CRC32_DATA32,
  BUILT_IN_CRC64_DATA8, BUILT_IN_CRC64_DATA16, BUILT_IN_CRC64_DATA32,
  BUILT_IN_CRC64_DATA64,
  BUILT_IN_REV_CRC8_DATA8,
  BUILT_IN_REV_CRC16_DATA8, BUILT_IN_REV_CRC16_DATA16,
  BUILT_IN_REV_CRC32_DATA8, BUILT_IN_REV_CRC32_DATA16,
  BUILT_IN_REV_CRC32_DATA32.
  (expand_builtin_crc_table_based): New function.
  (expand_builtin): Handle BUILT_IN_CRC8_DATA8,
  BUILT_IN_CRC16_DATA8, BUILT_IN_CRC16_DATA16,
  BUILT_IN_CRC32_DATA8, BUILT_IN_CRC32_DATA16, BUILT_IN_CRC32_DATA32,
  BUILT_IN_CRC64_DATA8, BUILT_IN_CRC64_DATA16, BUILT_IN_CRC64_DATA32,
  BUILT_IN_CRC64_DATA64,
  BUILT_IN_REV_CRC8_DATA8,
  BUILT_IN_REV_CRC16_DATA8, BUILT_IN_REV_CRC16_DATA16,
  BUILT_IN_REV_CRC32_DATA8, BUILT_IN_REV_CRC32_DATA16,
  BUILT_IN_REV_CRC32_DATA32.
  * builtins.def (BUILT_IN_CRC8_DATA8): New builtin.
  (BUILT_IN_CRC16_DATA8): Likewise.
  (BUILT_IN_CRC16_DATA16): Likewise.
  (BUILT_IN_CRC32_DATA8): Likewise.
  (BUILT_IN_CRC32_DATA16): Likewise.
  (BUILT_IN_CRC32_DATA32): Likewise.
  (BUILT_IN_CRC64_DATA8): Likewise.
  (BUILT_IN_CRC64_DATA16): Likewise.
  (BUILT_IN_CRC64_DATA32): Likewise.
  (BUILT_IN_CRC64_DATA64): Likewise.
  (BUILT_IN_REV_CRC8_DATA8): New builtin.
  (BUILT_IN_REV_CRC16_DATA8): Likewise.
  (BUILT_IN_REV_CRC16_DATA16): Likewise.
  (BUILT_IN_REV_CRC32_DATA8): Likewise.
  (BUILT_IN_REV_CRC32_DATA16): Likewise.
  (BUILT_IN_REV_CRC32_DATA32): Likewise.
  * builtins.h (expand_builtin_crc_table_based): New function declaration.
  * doc/extend.texi (__builtin_rev_crc16_data8,
  __builtin_rev_crc32_data32, __builtin_rev_crc32_data8,
  __builtin_rev_crc32_data16, __builtin_crc8_data8,
  __builtin_crc16_data16, __builtin_crc16_data8,
  __builtin_crc32_data32, __builtin_crc32_data8,
  __builtin_crc32_data16, __builtin_crc64_data64,
  __builtin_crc64_data8, __builtin_crc64_data16,
  __builtin_crc64_data32): Document.

gcc/testsuite/

  * gcc.c-torture/compile/crc-builtin-rev-target32.c
  * gcc.c-torture/compile/crc-builtin-rev-target64.c
  * gcc.c-torture/compile/crc-builtin-target32.c
  * gcc.c-torture/compile/crc-builtin-target64.c

Signed-off-by: Mariam Arutunian



diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index f8d94c4b435..b662de91e49 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2207,7 +2207,24 @@ associated_internal_fn (built_in_function 

Re: [RFC/RFA] [PATCH 01/12] Implement internal functions for efficient CRC computation

2024-05-25 Thread Jeff Law




On 5/24/24 2:41 AM, Mariam Arutunian wrote:
Add two new internal functions (IFN_CRC, IFN_CRC_REV), to provide faster 
CRC generation.

One performs bit-forward and the other bit-reversed CRC computation.
If CRC optabs are supported, they are used for the CRC computation.
Otherwise, table-based CRC is generated.
The supported data and CRC sizes are 8, 16, 32, and 64 bits.
The polynomial is without the leading 1.
A table with 256 elements is used to store precomputed CRCs.
For the reflection of inputs and the output, a simple algorithm involving
SHIFT, AND, and OR operations is used.
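
For reference, here is a rough C sketch of both pieces described above
(helper names are mine): the byte-at-a-time table lookup a bit-forward
CRC32 reduces to, and a reflection built from only SHIFT, AND and OR:

#include <stdint.h>

/* One byte of a bit-forward CRC32 via a 256-element table.  */
uint32_t
crc32_table_step (uint32_t crc, uint8_t data, const uint32_t table[256])
{
  return (crc << 8) ^ table[((crc >> 24) ^ data) & 0xff];
}

/* Reflect a 32-bit value with shifts, ANDs and ORs.  */
uint32_t
reflect32 (uint32_t v)
{
  v = ((v >> 1) & 0x55555555u) | ((v & 0x55555555u) << 1);
  v = ((v >> 2) & 0x33333333u) | ((v & 0x33333333u) << 2);
  v = ((v >> 4) & 0x0f0f0f0fu) | ((v & 0x0f0f0f0fu) << 4);
  v = ((v >> 8) & 0x00ff00ffu) | ((v & 0x00ff00ffu) << 8);
  return (v >> 16) | (v << 16);
}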

Co-authored-by: Joern Rennecke


gcc/

    * doc/md.texi (crc@var{m}@var{n}4,
    crc_rev@var{m}@var{n}4): Document.
    * expr.cc (generate_crc_table): New function.
    (calculate_table_based_CRC): Likewise.
    (expand_crc_table_based): Likewise.
    (gen_common_operation_to_reflect): Likewise.
    (reflect_64_bit_value): Likewise.
    (reflect_32_bit_value): Likewise.
    (reflect_16_bit_value): Likewise.
    (reflect_8_bit_value): Likewise.
    (generate_reflecting_code_standard): Likewise.
    (expand_reversed_crc_table_based): Likewise.
    * expr.h (generate_reflecting_code_standard): New function declaration.
    (expand_crc_table_based): Likewise.
    (expand_reversed_crc_table_based): Likewise.
    * internal-fn.cc: (crc_direct): Define.
    (direct_crc_optab_supported_p): Likewise.
    (expand_crc_optab_fn): New function
    * internal-fn.def (CRC, CRC_REV): New internal functions.
    * optabs.def (crc_optab, crc_rev_optab): New optabs.

Signed-off-by: Mariam Arutunian

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 5730bda80dc..be68ef860f9 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -8557,6 +8557,20 @@ operand 2, greater than operand 2 or is unordered with operand 2.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{crc@var{m}@var{n}4} instruction pattern
+@item @samp{crc@var{m}@var{n}4}
+Calculate a bit-forward CRC using operands 1, 2 and 3,
+then store the result in operand 0.
+Operand 1 is the initial CRC, operand 2 is the data and operand 3 is the
+polynomial without the leading 1.
+Operands 0, 1 and 3 have mode @var{n} and operand 2 has mode @var{m}, where
+both modes are integers.  The size of CRC to be calculated is determined by the
+mode; for example, if @var{n} is 'hi', a CRC16 is calculated.
+
+@cindex @code{crc_rev@var{m}@var{n}4} instruction pattern
+@item @samp{crc_rev@var{m}@var{n}4}
+Similar to @samp{crc@var{m}@var{n}4}, but calculates a bit-reversed CRC.
+
So just to be clear, this is a case where the input (operand 2) may have 
a different mode than the output (operand 0).  That scenario is 
generally discouraged, with a few exceptions (the most common being 
shift counts which are often QImode objects while the 
value-to-be-shifted and the output value are potentially any scalar 
integer mode).


So I don't think this is a problem, just wanted to point it out to 
anyone else that may be looking at this code.




 @end table
 
 @end ifset

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 1baa39b98eb..18368ae6b6c 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -14091,3 +14091,359 @@ int_expr_size (const_tree exp)
 
   return tree_to_shwi (size);

 }
+
+/* Calculate CRC for the initial CRC and given POLYNOMIAL.
+   CRC_BITS is CRC size.  */
+
+static unsigned HOST_WIDE_INT
+calculate_crc (unsigned HOST_WIDE_INT crc,
+ unsigned HOST_WIDE_INT polynomial,
+ unsigned crc_bits)
Just a nit.  Line up the polynomial & crc_bits declarations with the crc 
declaration.




+{
+  crc = crc << (crc_bits - 8);
+  for (int i = 8; i > 0; --i)
+{
+  if ((crc >> (crc_bits - 1)) & 1)
+   crc = (crc << 1) ^ polynomial;
+  else
+   crc <<= 1;
+}
+
+  crc <<=  (sizeof (crc) * BITS_PER_UNIT - crc_bits);
+  crc >>=  (sizeof (crc) * BITS_PER_UNIT - crc_bits);

Another nit.  Just one space after the <<= or >>= operators.



+
+  return crc;
+}
+
+/* Assemble CRC table with 256 elements for the given POLYNOM and CRC_BITS with
+   given ID.
+   ID is the identifier of the table, the name of the table is unique,
+   contains CRC size and the polynomial.
+   POLYNOM is the polynomial used to calculate the CRC table's elements.
+   CRC_BITS is the size of CRC, may be 8, 16, ... . */
+
+rtx
+assemble_crc_table (tree id, unsigned HOST_WIDE_INT polynom, unsigned crc_bits)
+{
+  unsigned table_el_n = 0x100;
+  tree ar = build_array_type (make_unsigned_type (crc_bits),
+ build_index_type (size_int (table_el_n - 1)));

Nit.  Line up build_index_type at the same indentation as make_unsigned_type.

Note that with TREE_READONLY set, there is at least some chance that the 
linker will find identical tables and merge them.  I haven't tested 
this, but I know it happens for other objects in the constant pools.




+  sprintf (buf, "crc_table_for_crc_%u_polynomial_" 

Re: [RFC/RFA][PATCH 00/12] CRC optimization

2024-05-24 Thread Jeff Law




On 5/24/24 2:41 AM, Mariam Arutunian wrote:

Hello!

This patch set detects bitwise CRC implementation loops (with branches) 
in the GIMPLE optimizers and replaces them with more optimal CRC 
implementations in RTL. These patches introduce new internal functions, 
built-in functions, and expanders for CRC generation, leveraging 
hardware instructions where available. Additionally, various tests are 
included to check CRC detection and generation.
Thanks so much for getting this process started.  It's a bit quicker 
than I was ready, but no worries.





2. Architecture-Specific Expanders:

  * Expanders are added for RISC-V, aarch64, and i386 architectures.
  * These expanders generate CRCs using either carry-less
multiplication instructions or direct CRC instructions, based on
the target architecture's capabilities.
Also note for the wider audience, this work can also generate table 
lookup based CRC implementations.  This has proven exceedingly helpful 
during the testing phase as we were able to run this code on a wide 
variety of the embedded targets to shake out target dependencies.


On Ventana's V1 design the clmul variant was a small, but clear winner 
over the table lookup.  Obviously the bitwise implementation found in 
coremark was the worst performing.


On our V2 design clmul outperforms the table lookup by a wide margin, 
largely due to reduced latency of clmul.
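
For anyone following along, the bitwise shape being measured is roughly
this (a representative reflected CRC-16 loop, not CoreMark's exact
source):

uint16_t
crc16_bitwise (uint16_t crc, uint8_t data)
{
  crc ^= data;
  for (int i = 0; i < 8; i++)
    crc = (crc & 1) ? (crc >> 1) ^ 0xA001 : crc >> 1;
  return crc;
}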



Jeff


Re: [PATCH v2] RISC-V: Introduce -mrvv-allow-misalign.

2024-05-24 Thread Jeff Law




On 5/24/24 5:43 PM, Palmer Dabbelt wrote:



I'm only reading Zicclsm as saying both scalar and vector misaligned
accesses are supported, but nothing about the performance.

I think it was in the vector docs.  It didn't say anything about
performance, just a note that scalar & vector behavior could differ.


Either way, the split naming scheme seems clearer to me.  It also avoids 
getting mixed up by the no-scalar-misaligned, yes-vector-misaligned 
systems if they ever show up.


So if Robin's OK with re-spinning things, let's just go that way?
Works for me.  Hopefully he's offline until Monday as it's rather late 
for him :-)  So we'll pick it back up in the Tuesday meeting.


jeff



Re: [PATCH v2] RISC-V: Introduce -mrvv-allow-misalign.

2024-05-24 Thread Jeff Law




On 5/24/24 5:39 PM, Palmer Dabbelt wrote:

On Fri, 24 May 2024 16:31:48 PDT (-0700), jeffreya...@gmail.com wrote:



On 5/24/24 11:14 AM, Palmer Dabbelt wrote:

On Fri, 24 May 2024 09:19:09 PDT (-0700), Robin Dapp wrote:

We should have something in doc/invoke too, this one is going to be
tricky for users.  We'll also have to define how this interacts with
the existing -mstrict-align.


Addressed the rest in the attached v2 which also fixes tests.
I'm really not sure about -mstrict-align.  I would have hoped that using

-mstrict-align we'd never run into any movmisalign situation but that
might be wishful thinking.  Do we need to specify an
interaction, though?  For now the new options disables movmisalign so
if we hit that despite -mstrict-align we'd still not vectorize it.


I think we just need to write it down.  I think there's two ways to
encode this: either we treat scalar and vector as independent, or we
couple them.  If we treat them independently then we end up with four
cases, it's not clear if they're all interesting.  IIUC with this patch
we'd be able to encode

Given the ISA documents them as independent, I think we should follow
suit and allow them to vary independently.


I'm only reading Zicclsm as saying both scalar and vector misaligned 
accesses are supported, but nothing about the performance.
I think it was in the vector docs.  It didn't say anything about 
performance, just a note that scalar & vector behavior could differ.






Seems reasonable to me.  Just having a regular naming scheme for the 
scalar/vector makes it clear what we're doing, and it's not like having 
the extra name for -mscalar-strict-align really costs anything.

That was my thinking -- get the names right should help avoid confusion.

Jeff


Re: [PATCH] RISC-V: Avoid splitting store dataref groups during SLP discovery

2024-05-24 Thread Jeff Law




On 5/23/24 11:52 PM, Richard Biener wrote:



This worked out so I pushed the change.  The gcc.dg/vect/pr97428.c
test is FAILing on RISC-V (it still gets 0 SLP), because of missed
load permutations.  I hope the followup reorg for the load side will
fix this.  It also FAILs gcc.target/riscv/rvv/autovec/struct/struct_vect-4.c
which does excessive assembly scanning on many functions - I'll leave
this for target maintainers to update - there's one or two functions
which we now expect to SLP.
Yea, folks got a bit carried away with the scan body capability. 
Someone will have to follow up behind you and clean this up a bit.


Thanks for checking it against the CI system.  While it's a bit on the
slow side, we are finding it's helping catch real issues and keeping the
testsuite cleaner WRT FAILs.


jeff



Re: [PATCH v2] RISC-V: Introduce -mrvv-allow-misalign.

2024-05-24 Thread Jeff Law




On 5/24/24 11:14 AM, Palmer Dabbelt wrote:

On Fri, 24 May 2024 09:19:09 PDT (-0700), Robin Dapp wrote:

We should have something in doc/invoke too, this one is going to be
tricky for users.  We'll also have to define how this interacts with
the existing -mstrict-align.


Addressed the rest in the attached v2 which also fixes tests.
I'm really not sure about -mstrict-align.  I would have hoped that using
-mstrict-align we'd never run into any movmisalign situation but that
might be wishful thinking.  Do we need to specify an
interaction, though?  For now the new options disables movmisalign so
if we hit that despite -mstrict-align we'd still not vectorize it.


I think we just need to write it down.  I think there's two ways to 
encode this: either we treat scalar and vector as independent, or we 
couple them.  If we treat them independently then we end up with four 
cases, it's not clear if they're all interesting.  IIUC with this patch 
we'd be able to encode
Given the ISA documents them as independent, I think we should follow 
suit and allow them to vary independently.




* -mstrict-align: Both scalar and vector misaligned accesses are 
  unsupported (-mrvv-allow-misalign doesn't matter).  I'm not sure if 
  there's hardware there, but given we have systems that don't support 
  scalar misaligned accesses it seems reasonable to assume they'll also 
  not support vector misaligned accesses.
* -mno-strict-align -mno-rvv-allow-misalign: Scalar misaligned are 
  supported, vector misaligned aren't supported.  This matches our best 
  theory of how the k230 and k1 behave, so it also seems reasonable to 
  support.
* -mno-strict-align -mrvv-allow-misalign: Both scalar and vector 
  misaligned accesses are supported.  This seems reasonable to support 
  as it's how I'd hope big cores end up being designed, though again 
  there's no hardware.
I'd almost lean towards -m[no-]scalar-strict-align and 
-m[no-]vector-strict-align and deprecate -mstrict-align (aliasing it to 
the scalar alignment option).  But I'll go with consensus here.




The fourth case is kind of wacky: scalar misaligned is unsupported, 
vector misaligned is supported.  I'm not really sure why we'd end up 
with a system like that, but HW vendors do wacky things so it's kind of 
hard to predict.
I've worked on one of these :-)  The thinking from the designers was 
unaligned scalar access just wasn't that important, particularly with 
mem* and str* using the vector rather than scalar ops.


jeff





[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [to-be-committed, v2, RISC-V] Use bclri in constant synthesis

2024-05-24 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:a08b5d4d5c6d679a9d65797eaea93aa381ece172

commit a08b5d4d5c6d679a9d65797eaea93aa381ece172
Author: Jeff Law 
Date:   Fri May 24 07:27:00 2024 -0600

[to-be-committed,v2,RISC-V] Use bclri in constant synthesis

Testing with Zbs enabled by default showed a minor logic error.  After
the loop clearing things with bclri, we can only use the sequence if we
were able to clear all the necessary bits.  If any bits are still on,
then the bclr sequence turned out to not be profitable.

--

So this is conceptually similar to how we handled direct generation of
bseti for constant synthesis, but this time for bclr.

In the bclr case, we already have an expander for AND.  So we just
needed to adjust the predicate to accept another class of constant
operands (those with a single bit clear).

With that in place constant synthesis is adjusted so that it counts the
number of bits clear in the high 33 bits of a 64bit word.  If that
number is small relative to the current best cost, then we try to
generate the constant with a lui based sequence for the low half which
implicitly sets the upper 32 bits as well.  Then we bclri one or more of
those upper 33 bits.

So as an example, this code goes from 4 instructions down to 3:

 > unsigned long foo_0xfffbf7ff(void) { return 0xfffbf7ffUL; }

Note the use of 33 bits above.  That's meant to capture cases like this:

 > unsigned long foo_0xfffd77ff(void) { return 0xfffd77ffUL; }

We can use lui+addi+bclri+bclri to synthesize that in 4 instructions
instead of 5.

I'm including a handful of cases covering the two basic ideas above that
were found by the testing code.

And, no, we're not done yet.  I see at least one more notable idiom
missing before exploring zbkb's potential to improve things.

Tested in my tester and waiting on Rivos CI system before moving forward.
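
For reference, each bclri step in the synthesized sequence is modeled as
an AND with a single-bit-clear mask; a sketch (helper name mine):

/* What one bclri performs; also the constant shape the relaxed
   AND predicate now accepts.  */
unsigned long
bclr_bit (unsigned long x, int bit)
{
  return x & ~(1UL << bit);
}
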
gcc/

* config/riscv/predicates.md (arith_operand_or_mode_mask): Renamed to..
(arith_or_mode_mask_or_zbs_operand): New predicate.
* config/riscv/riscv.md (and3): Update predicate for operand 2.
* config/riscv/riscv.cc (riscv_build_integer_1): Use bclri to clear
bits, particularly bits 31..63 when profitable to do so.

gcc/testsuite/

* gcc.target/riscv/synthesis-6.c: New test.

(cherry picked from commit 401994d60ab38ffa9e63f368f0456eb7b08599be)

Diff:
---
 gcc/config/riscv/predicates.md   | 14 ++--
 gcc/config/riscv/riscv.cc| 34 ++
 gcc/config/riscv/riscv.md|  2 +-
 gcc/testsuite/gcc.target/riscv/synthesis-6.c | 95 
 4 files changed, 138 insertions(+), 7 deletions(-)

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 8948fbfc363..0fb5729fdcf 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -27,12 +27,6 @@
   (ior (match_operand 0 "const_arith_operand")
(match_operand 0 "register_operand")))
 
-(define_predicate "arith_operand_or_mode_mask"
-  (ior (match_operand 0 "arith_operand")
-   (and (match_code "const_int")
-(match_test "UINTVAL (op) == GET_MODE_MASK (HImode)
-|| UINTVAL (op) == GET_MODE_MASK (SImode)"
-
 (define_predicate "lui_operand"
   (and (match_code "const_int")
(match_test "LUI_OPERAND (INTVAL (op))")))
@@ -398,6 +392,14 @@
   (and (match_code "const_int")
(match_test "SINGLE_BIT_MASK_OPERAND (~UINTVAL (op))")))
 
+(define_predicate "arith_or_mode_mask_or_zbs_operand"
+  (ior (match_operand 0 "arith_operand")
+   (and (match_test "TARGET_ZBS")
+   (match_operand 0 "not_single_bit_mask_operand"))
+   (and (match_code "const_int")
+   (match_test "UINTVAL (op) == GET_MODE_MASK (HImode)
+|| UINTVAL (op) == GET_MODE_MASK (SImode)"
+
 (define_predicate "const_si_mask_operand"
   (and (match_code "const_int")
(match_test "(INTVAL (op) & (GET_MODE_BITSIZE (SImode) - 1))
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 85df5b7ab49..92935275aaa 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -893,6 +893,40 @@ riscv_build_integer_1 (struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS],
  codes[1].use_uw = false;
  cost = 2;
}
+
+  /* If LUI/ADDI are going to set bits 32..63 and we need a small
+number of them cleared, we might be able to use bclri profitably.
+
+Note we may allow clearing of bit 31 using bc

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Enable vectorization for vect-early-break_124-pr114403.c

2024-05-24 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:4e981cccad14ea3add39f92378da41d203814a60

commit 4e981cccad14ea3add39f92378da41d203814a60
Author: xuli 
Date:   Mon May 20 01:56:47 2024 +

RISC-V: Enable vectorization for vect-early-break_124-pr114403.c

Because "targetm.slow_unaligned_access" is set to true by default
(aka -mtune=rocket) for RISC-V, it causes the __builtin_memcpy with
8 bytes failed to folded into int64 assignment during ccp1.

So adding "-mtune=generic-ooo" to the RISC-V target can vectorize
vect-early-break_124-pr114403.c.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-early-break_124-pr114403.c: Enable vectorization
for the RISC-V target.

(cherry picked from commit ffab721f3c9ecbb9831844d844ad257b69a77993)

Diff:
---
 gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
index 101ae1e0eaa..610b951b262 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
@@ -1,8 +1,9 @@
 /* { dg-add-options vect_early_break } */
 /* { dg-require-effective-target vect_early_break_hw } */
 /* { dg-require-effective-target vect_long_long } */
+/* { dg-additional-options "-mtune=generic-ooo" { target riscv*-*-* } } */
 
-/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { xfail riscv*-*-* } } } */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
 
 #include "tree-vect.h"
 
@@ -74,4 +75,3 @@ int main ()
 
   return 0;
 }
-


Re: Question about optimizing function pointers for direct function calls

2024-05-24 Thread Jeff Law via Gcc




On 5/23/24 9:51 PM, Hanke Zhang via Gcc wrote:

Hi,
I got a question about optimizing function pointers for direct
function calls in C.

Consider the following scenario: one of the fields of a structure is a
function pointer, and all its assignments come from the same function.
Can all its uses be replaced by direct calls to this function? So the
later passes can do more optimizations.
Certainly.  The RTL optimizers have been doing this for ~30 years.  If 
they can statically determine the target of an indirect call, they will 
replace that indirect call with a simple direct call.
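
A tiny sketch of the scenario (names hypothetical):

struct ops { int (*fn) (int); };
static int impl (int x) { return x + 1; }
void init (struct ops *o) { o->fn = impl; }     /* the only assignment */
int use (struct ops *o) { return o->fn (42); }  /* may become impl (42) */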


You're just extending that capability to apply more often, likely by 
doing it at an earlier stage in the pipeline, presumably in one of the 
IPA or gimple optimizers?



Note that by doing this transformation before the gimple->rtl conversion 
step, you don't have to worry about quirky ABIs where direct and 
indirect calls can have different calling conventions.


Jeff


[gcc r15-821] [to-be-committed, v2, RISC-V] Use bclri in constant synthesis

2024-05-24 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:401994d60ab38ffa9e63f368f0456eb7b08599be

commit r15-821-g401994d60ab38ffa9e63f368f0456eb7b08599be
Author: Jeff Law 
Date:   Fri May 24 07:27:00 2024 -0600

[to-be-committed,v2,RISC-V] Use bclri in constant synthesis

Testing with Zbs enabled by default showed a minor logic error.  After
the loop clearing things with bclri, we can only use the sequence if we
were able to clear all the necessary bits.  If any bits are still on,
then the bclr sequence turned out to not be profitable.

--

So this is conceptually similar to how we handled direct generation of
bseti for constant synthesis, but this time for bclr.

In the bclr case, we already have an expander for AND.  So we just
needed to adjust the predicate to accept another class of constant
operands (those with a single bit clear).

With that in place constant synthesis is adjusted so that it counts the
number of bits clear in the high 33 bits of a 64bit word.  If that
number is small relative to the current best cost, then we try to
generate the constant with a lui based sequence for the low half which
implicitly sets the upper 32 bits as well.  Then we bclri one or more of
those upper 33 bits.

So as an example, this code goes from 4 instructions down to 3:

 > unsigned long foo_0xfffbf7ff(void) { return 0xfffbf7ffUL; }

Note the use of 33 bits above.  That's meant to capture cases like this:

 > unsigned long foo_0xfffd77ff(void) { return 0xfffd77ffUL; }

We can use lui+addi+bclri+bclri to synthesize that in 4 instructions
instead of 5.

I'm including a handful of cases covering the two basic ideas above that
were found by the testing code.

And, no, we're not done yet.  I see at least one more notable idiom
missing before exploring zbkb's potential to improve things.

Tested in my tester and waiting on Rivos CI system before moving forward.
gcc/

* config/riscv/predicates.md (arith_operand_or_mode_mask): Renamed to..
(arith_or_mode_mask_or_zbs_operand): New predicate.
* config/riscv/riscv.md (and3): Update predicate for operand 2.
* config/riscv/riscv.cc (riscv_build_integer_1): Use bclri to clear
bits, particularly bits 31..63 when profitable to do so.

gcc/testsuite/

* gcc.target/riscv/synthesis-6.c: New test.

Diff:
---
 gcc/config/riscv/predicates.md   | 14 ++--
 gcc/config/riscv/riscv.cc| 34 ++
 gcc/config/riscv/riscv.md|  2 +-
 gcc/testsuite/gcc.target/riscv/synthesis-6.c | 95 
 4 files changed, 138 insertions(+), 7 deletions(-)

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 8948fbfc363..0fb5729fdcf 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -27,12 +27,6 @@
   (ior (match_operand 0 "const_arith_operand")
(match_operand 0 "register_operand")))
 
-(define_predicate "arith_operand_or_mode_mask"
-  (ior (match_operand 0 "arith_operand")
-   (and (match_code "const_int")
-(match_test "UINTVAL (op) == GET_MODE_MASK (HImode)
-|| UINTVAL (op) == GET_MODE_MASK (SImode)"
-
 (define_predicate "lui_operand"
   (and (match_code "const_int")
(match_test "LUI_OPERAND (INTVAL (op))")))
@@ -398,6 +392,14 @@
   (and (match_code "const_int")
(match_test "SINGLE_BIT_MASK_OPERAND (~UINTVAL (op))")))
 
+(define_predicate "arith_or_mode_mask_or_zbs_operand"
+  (ior (match_operand 0 "arith_operand")
+   (and (match_test "TARGET_ZBS")
+   (match_operand 0 "not_single_bit_mask_operand"))
+   (and (match_code "const_int")
+   (match_test "UINTVAL (op) == GET_MODE_MASK (HImode)
+|| UINTVAL (op) == GET_MODE_MASK (SImode)"
+
 (define_predicate "const_si_mask_operand"
   (and (match_code "const_int")
(match_test "(INTVAL (op) & (GET_MODE_BITSIZE (SImode) - 1))
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 85df5b7ab49..92935275aaa 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -893,6 +893,40 @@ riscv_build_integer_1 (struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS],
  codes[1].use_uw = false;
  cost = 2;
}
+
+  /* If LUI/ADDI are going to set bits 32..63 and we need a small
+number of them cleared, we might be able to use bclri profitably.
+
+Note we may allow clearing of bit 31 using bclri.  There's a class
+of constants with that bit clear where this helps.  

[to-be-committed][v2][RISC-V] Use bclri in constant synthesis

2024-05-23 Thread Jeff Law
Testing with Zbs enabled by default showed a minor logic error.  After 
the loop clearing things with bclri, we can only use the sequence if we 
were able to clear all the necessary bits.  If any bits are still on, 
then the bclr sequence turned out to not be profitable.


--

So this is conceptually similar to how we handled direct generation of
bseti for constant synthesis, but this time for bclr.

In the bclr case, we already have an expander for AND.  So we just
needed to adjust the predicate to accept another class of constant
operands (those with a single bit clear).

With that in place constant synthesis is adjusted so that it counts the
number of bits clear in the high 33 bits of a 64bit word.  If that
number is small relative to the current best cost, then we try to
generate the constant with a lui based sequence for the low half which
implicitly sets the upper 32 bits as well.  Then we bclri one or more of
those upper 33 bits.

So as an example, this code goes from 4 instructions down to 3:

> unsigned long foo_0xfffbf7ff(void) { return 0xfffbf7ffUL; }




Note the use of 33 bits above.  That's meant to capture cases like this:


> unsigned long foo_0xfffd77ff(void) { return 0xfffd77ffUL; }




We can use lui+addi+bclri+bclri to synthesize that in 4 instructions
instead of 5.




I'm including a handful of cases covering the two basic ideas above that
were found by the testing code.

And, no, we're not done yet.  I see at least one more notable idiom
missing before exploring zbkb's potential to improve things.

Tested in my tester and waiting on Rivos CI system before moving forward.
gcc/

* config/riscv/predicates.md (arith_operand_or_mode_mask): Renamed to..
(arith_or_mode_mask_or_zbs_operand): New predicate.
* config/riscv/riscv.md (and3): Update predicate for operand 2.
* config/riscv/riscv.cc (riscv_build_integer_1): Use bclri to clear
bits, particularly bits 31..63 when profitable to do so.

gcc/testsuite/

* gcc.target/riscv/synthesis-6.c: New test.

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 8948fbfc363..c1c693c7617 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -27,12 +27,6 @@ (define_predicate "arith_operand"
   (ior (match_operand 0 "const_arith_operand")
(match_operand 0 "register_operand")))
 
-(define_predicate "arith_operand_or_mode_mask"
-  (ior (match_operand 0 "arith_operand")
-   (and (match_code "const_int")
-(match_test "UINTVAL (op) == GET_MODE_MASK (HImode)
-|| UINTVAL (op) == GET_MODE_MASK (SImode)"
-
 (define_predicate "lui_operand"
   (and (match_code "const_int")
(match_test "LUI_OPERAND (INTVAL (op))")))
@@ -398,6 +392,14 @@ (define_predicate "not_single_bit_mask_operand"
   (and (match_code "const_int")
(match_test "SINGLE_BIT_MASK_OPERAND (~UINTVAL (op))")))
 
+(define_predicate "arith_or_mode_mask_or_zbs_operand"
+  (ior (match_operand 0 "arith_operand")
+   (and (match_test "TARGET_ZBS")
+   (match_operand 0 "not_single_bit_mask_operand"))
+   (and (match_code "const_int")
+(match_test "UINTVAL (op) == GET_MODE_MASK (HImode)
+|| UINTVAL (op) == GET_MODE_MASK (SImode)"
+
 (define_predicate "const_si_mask_operand"
   (and (match_code "const_int")
(match_test "(INTVAL (op) & (GET_MODE_BITSIZE (SImode) - 1))
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 85df5b7ab49..3b32b515fac 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -893,6 +893,40 @@ riscv_build_integer_1 (struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS],
  codes[1].use_uw = false;
  cost = 2;
}
+
+  /* If LUI/ADDI are going to set bits 32..63 and we need a small
+number of them cleared, we might be able to use bclri profitably. 
+
+Note we may allow clearing of bit 31 using bclri.  There's a class
+of constants with that bit clear where this helps.  */
+  else if (TARGET_64BIT
+  && TARGET_ZBS
+  && (32 - popcount_hwi (value & HOST_WIDE_INT_C (0x8000))) + 1 < cost)
+   {
+ /* Turn on all those upper bits and synthesize the result.  */
+ HOST_WIDE_INT nval = value | HOST_WIDE_INT_C (0x8000);
+ alt_cost = riscv_build_integer_1 (alt_codes, nval, mode);
+
+ /* Now iterate over the bits we want to clear until the cost is
+too high or we're done.  */
+ nval = value ^ HOST_WIDE_INT_C (-1);
+ nval &= HOST_WIDE_INT_C (~0x7fff);
+ while (nval && alt_cost < cost)
+   {
+ HOST_WIDE_INT bit = ctz_hwi (nval);
+ alt_codes[alt_cost].code = AND;
+ alt_codes[alt_cost].value = ~(1UL << bit);
+ alt_codes[alt_cost].use_uw = false;
+ 

[to-be-committed] [RISC-V] Use bclri in constant synthesis

2024-05-23 Thread Jeff Law
So this is conceptually similar to how we handled direct generation of 
bseti for constant synthesis, but this time for bclr.


In the bclr case, we already have an expander for AND.  So we just 
needed to adjust the predicate to accept another class of constant 
operands (those with a single bit clear).


With that in place constant synthesis is adjusted so that it counts the 
number of bits clear in the high 33 bits of a 64bit word.  If that 
number is small relative to the current best cost, then we try to 
generate the constant with a lui based sequence for the low half which 
implicitly sets the upper 32 bits as well.  Then we bclri one or more of 
those upper 33 bits.


So as an example, this code goes from 4 instructions down to 3:


unsigned long foo_0xfffbf7ff(void) { return 0xfffbf7ffUL; }




Note the use of 33 bits above.  That's meant to capture cases like this:



unsigned long foo_0xfffd77ff(void) { return 0xfffd77ffUL; }




We can use lui+addi+bclri+bclri to synthesize that in 4 instructions 
instead of 5.





I'm including a handful of cases covering the two basic ideas above that 
were found by the testing code.


And, no, we're not done yet.  I see at least one more notable idiom 
missing before exploring zbkb's potential to improve things.


Tested in my tester and waiting on Rivos CI system before moving forward.

jeff


gcc/

* config/riscv/predicates.md (arith_operand_or_mode_mask): Renamed to..
(arith_or_mode_mask_or_zbs_operand): New predicate.
* config/riscv/riscv.md (and3): Update predicate for operand 2.
* config/riscv/riscv.cc (riscv_build_integer_1): Use bclri to clear
bits, particularly bits 31..63 when profitable to do so.

gcc/testsuite/

* gcc.target/riscv/synthesis-6.c: New test.

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 8948fbfc363..c1c693c7617 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -27,12 +27,6 @@ (define_predicate "arith_operand"
   (ior (match_operand 0 "const_arith_operand")
(match_operand 0 "register_operand")))
 
-(define_predicate "arith_operand_or_mode_mask"
-  (ior (match_operand 0 "arith_operand")
-   (and (match_code "const_int")
-(match_test "UINTVAL (op) == GET_MODE_MASK (HImode)
-|| UINTVAL (op) == GET_MODE_MASK (SImode)"
-
 (define_predicate "lui_operand"
   (and (match_code "const_int")
(match_test "LUI_OPERAND (INTVAL (op))")))
@@ -398,6 +392,14 @@ (define_predicate "not_single_bit_mask_operand"
   (and (match_code "const_int")
(match_test "SINGLE_BIT_MASK_OPERAND (~UINTVAL (op))")))
 
+(define_predicate "arith_or_mode_mask_or_zbs_operand"
+  (ior (match_operand 0 "arith_operand")
+   (and (match_test "TARGET_ZBS")
+   (match_operand 0 "not_single_bit_mask_operand"))
+   (and (match_code "const_int")
+(match_test "UINTVAL (op) == GET_MODE_MASK (HImode)
+|| UINTVAL (op) == GET_MODE_MASK (SImode)"
+
 (define_predicate "const_si_mask_operand"
   (and (match_code "const_int")
(match_test "(INTVAL (op) & (GET_MODE_BITSIZE (SImode) - 1))
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 85df5b7ab49..3b32b515fac 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -893,6 +893,40 @@ riscv_build_integer_1 (struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS],
  codes[1].use_uw = false;
  cost = 2;
}
+
+  /* If LUI/ADDI are going to set bits 32..63 and we need a small
+number of them cleared, we might be able to use bclri profitably. 
+
+Note we may allow clearing of bit 31 using bclri.  There's a class
+of constants with that bit clear where this helps.  */
+  else if (TARGET_64BIT
+  && TARGET_ZBS
+  && (32 - popcount_hwi (value & HOST_WIDE_INT_C (0x8000))) + 1 < cost)
+   {
+ /* Turn on all those upper bits and synthesize the result.  */
+ HOST_WIDE_INT nval = value | HOST_WIDE_INT_C (0x8000);
+ alt_cost = riscv_build_integer_1 (alt_codes, nval, mode);
+
+ /* Now iterate over the bits we want to clear until the cost is
+too high or we're done.  */
+ nval = value ^ HOST_WIDE_INT_C (-1);
+ nval &= HOST_WIDE_INT_C (~0x7fff);
+ while (nval && alt_cost < cost)
+   {
+ HOST_WIDE_INT bit = ctz_hwi (nval);
+ alt_codes[alt_cost].code = AND;
+ alt_codes[alt_cost].value = ~(1UL << bit);
+ alt_codes[alt_cost].use_uw = false;
+ alt_cost++;
+ nval &= ~(1UL << bit);
+   }
+
+ if (alt_cost <= cost)
+   {
+ memcpy (codes, alt_codes, sizeof (alt_codes));
+ cost = alt_cost;
+   }
+   }
 }
 
   if (cost > 2 && 

Re: [PATCH v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD

2024-05-23 Thread Jeff Law




On 5/23/24 6:14 AM, Richard Biener wrote:

On Thu, May 23, 2024 at 1:08 PM Li, Pan2  wrote:


I had a try at converting the PHI from Part-A to Part-B, aka PHI to
_2 = phi_cond ? _1 : 255.
And then we can do the matching on COND_EXPR in the underlying widen-mul pass.

Unfortunately, I met some ICE from verify_gimple_phi in the sccopy1 pass =>
sat_add.c:66:1: internal compiler error: tree check: expected class ‘type’,
have ‘exceptional’ (error_mark) in useless_type_conversion_p, at
gimple-expr.cc:86


Likely you have released _2, more comments below on your previous mail.
You can be sure by calling debug_tree () on the SSA_NAME node in 
question.  If it reports "in-free-list", then that's definitive that the 
SSA_NAME was released back to the SSA_NAME manager.  If that SSA_NAME is 
still in the IL, then that's very bad.
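
For context, the branch form of unsigned SAT_ADD under discussion is
roughly (a sketch; __builtin_add_overflow returns true on overflow):

#include <stdint.h>

uint8_t
sat_addu8 (uint8_t x, uint8_t y)
{
  uint8_t sum;
  /* The PHI in question: _2 = phi_cond ? _1 : 255.  */
  return __builtin_add_overflow (x, y, &sum) ? 255 : sum;
}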


jeff



Re: RISC-V: Fix round_32.c test on RV32

2024-05-22 Thread Jeff Law




On 5/22/24 12:15 PM, Palmer Dabbelt wrote:

On Wed, 22 May 2024 11:01:16 PDT (-0700), jeffreya...@gmail.com wrote:



On 5/22/24 6:47 AM, Jivan Hakobyan wrote:

After commit 8367c996e55b2, several checks in the round_32.c test started to
fail.
The reason is that we prevent rounding DF->SI->DF on RV32 and instead of
a conversion sequence we get calls to the appropriate library functions.


gcc/testsuite/ChangeLog:
         * testsuite/gcc.target/riscv/round_32.c: Fixed test

I wonder if this test even makes sense for rv32 anymore given we can't
do a DF->DI as a single instruction and DF->SI is going to give
incorrect results.  So the underlying optimization to improve those
rounding cases just doesn't apply to DF mode objects for rv32.

Thoughts?


Unless I'm missing something, we should still be able to do the float 
roundings on rv32?
I initially thought that as well.  The problem is we don't have a DF->DI 
conversion instruction for rv32.  We can't use DF->SI as the range of 
representable values is wrong.
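
To make the range issue concrete (a sketch; the round trip is only valid
when the value fits the integer mode):

double
round_via_int (double d)
{
  /* Fine when |d| < 2^31; beyond that SImode cannot represent the
     intermediate, so the DF->SI->DF shortcut is wrong on rv32 and a
     DF->DI intermediate would be needed.  */
  return (double) (int) d;
}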





I think with Zfa we'd also have testable sequences for the double/double 
and float/float roundings, which could be useful to test.  I'm not 
entirely sure there, though, as I always get a bit lost in which FP 
rounding flavors map down.
Zfa is a different story as it has instructions with the proper 
semantics ;-)  We'd just emit those new instructions and wouldn't have 
to worry about the initial range test.





I'd also kicked off some runs trying to promote these to executable
tests.  IIRC it was just DG stuff (maybe just adding a `dg-do run`?)
but I don't know where I stashed the results...

Not a bad idea, particularly if we test the border cases.
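
Something like this would turn one of the cases into an executable
border-case test (a sketch, not an actual testsuite file):

/* { dg-do run } */
extern void abort (void);
int
main ()
{
  volatile double d = 2147483648.0;  /* 2^31, just outside SImode.  */
  if (__builtin_round (d) != 2147483648.0)
    abort ();
  return 0;
}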

jeff



Re: RISC-V: Fix round_32.c test on RV32

2024-05-22 Thread Jeff Law




On 5/22/24 6:47 AM, Jivan Hakobyan wrote:
After commit 8367c996e55b2, several checks in the round_32.c test started to
fail.

The reason is that we prevent rounding DF->SI->DF on RV32 and instead of
a conversion sequence we get calls to the appropriate library functions.


gcc/testsuite/ChangeLog:
         * testsuite/gcc.target/riscv/round_32.c: Fixed test
I wonder if this test even makes sense for rv32 anymore given we can't 
do a DF->DI as a single instruction and DF->SI is going to give 
incorrect results.  So the underlying optimization to improve those 
rounding cases just doesn't apply to DF mode objects for rv32.


Thoughts?
Jeff



Re: [PATCH] Fix PR rtl-optimization/115038

2024-05-22 Thread Jeff Law




On 5/20/24 1:13 AM, Eric Botcazou wrote:

Hi,

this is a regression present on the mainline and 14 branch in the form of an
ICE in seh_cfa_offset from config/i386/winnt.cc on the attached C++ testcase
compiled with -O2 -fno-omit-frame-pointer.

The problem comes directly from the -ffold-mem-offsets pass messing with
the prologue and the frame-related instructions, which is a no-no with SEH, so
the fix simply disconnects the pass in these circumstances, the question being
whether this should be done unconditionally as in the fix or only with SEH.

Tested on x86-64/Linux, OK for the mainline and 14 branch?


2024-05-20  Eric Botcazou  

PR rtl-optimization/115038
* fold-mem-offsets.cc (fold_offsets): Return 0 if the defining
instruction of the register is frame related.


2024-05-20  Eric Botcazou  

* g++.dg/opt/fmo1.C: New test.
lol.  I missed that you had already submitted this when I made my 
comment in the PR.


OK for the trunk and gcc-14 branch.

Jeff


Re: [PATCH 4/4] Testsuite updates

2024-05-22 Thread Jeff Law




On 5/22/24 4:58 AM, Richard Biener wrote:



RISC-V CI didn't trigger (not sure what magic is required).  Both
ARM and AArch64 show that the "Vectorizing stmts using SLP" scans are a bit
fragile because we sometimes cancel SLP because we want to use
load/store-lanes.

The RISC-V tag on the subject line is the trigger.

Jeff


Re: [PATCH] [tree-optimization/110279] fix testcase pr110279-1.c

2024-05-22 Thread Jeff Law




On 5/22/24 5:46 AM, Di Zhao OS wrote:

The test case is for targets that support FMA.  Previously
the "target" selector was missing from the dg-final command.

Tested on x86_64-pc-linux-gnu.

Thanks
Di Zhao

gcc/testsuite/ChangeLog:

 * gcc.dg/pr110279-1.c: Add target selector.
Rather than list targets explicitly in the test, wouldn't it be better 
to have a common routine that could be used in other cases where we have 
a test that requires FMA?


So something similar to check_effective_target_scalar_all_fma?


Jeff


Re: [PATCH v1 2/2] RISC-V: Add test cases for __builtin_add_overflow branchless unsigned SAT_ADD

2024-05-21 Thread Jeff Law




On 5/19/24 12:37 AM, pan2...@intel.com wrote:

From: Pan Li 

After supporting branchless __builtin_add_overflow unsigned SAT_ADD in
the middle end, add more test cases to cover the functionality.

The below test suites are passed.
* The rv64gcv fully regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add __builtin_add_overflow test
macro.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-5.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-6.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-7.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-8.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-5.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-6.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-7.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-8.c: New test.
* gcc.target/riscv/sat_u_add-5.c: New test.
* gcc.target/riscv/sat_u_add-6.c: New test.
* gcc.target/riscv/sat_u_add-7.c: New test.
* gcc.target/riscv/sat_u_add-8.c: New test.
* gcc.target/riscv/sat_u_add-run-5.c: New test.
* gcc.target/riscv/sat_u_add-run-6.c: New test.
* gcc.target/riscv/sat_u_add-run-7.c: New test.
* gcc.target/riscv/sat_u_add-run-8.c: New test.

OK
jeff



[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: avoid LUI based const mat in alloca epilogue expansion

2024-05-21 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:106d603005c774ad619103bae3b653c94b80bf9c

commit 106d603005c774ad619103bae3b653c94b80bf9c
Author: Vineet Gupta 
Date:   Wed Mar 6 15:44:27 2024 -0800

RISC-V: avoid LUI based const mat in alloca epilogue expansion

This is continuing on the prev patch in function epilogue expansion.
Broken out for ease of review.

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_expand_epilogue): Handle offset
being sum of two S12.

Tested-by: Patrick O'Neill  # pre-commit-CI #1569
Signed-off-by: Vineet Gupta 
(cherry picked from commit 9926c40a902edbc665919d508ef0c36f362f9c41)

Diff:
---
 gcc/config/riscv/riscv.cc | 33 ++---
 1 file changed, 26 insertions(+), 7 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 2ecbcf1d0af..85df5b7ab49 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -8111,7 +8111,10 @@ riscv_expand_epilogue (int style)
   need_barrier_p = false;
 
   poly_int64 adjust_offset = -frame->hard_frame_pointer_offset;
+  rtx dwarf_adj = gen_int_mode (adjust_offset, Pmode);
   rtx adjust = NULL_RTX;
+  bool sum_of_two_s12 = false;
+  HOST_WIDE_INT one, two;
 
   if (!adjust_offset.is_constant ())
{
@@ -8123,14 +8126,23 @@ riscv_expand_epilogue (int style)
}
   else
{
- if (!SMALL_OPERAND (adjust_offset.to_constant ()))
+ HOST_WIDE_INT adj_off_value = adjust_offset.to_constant ();
+ if (SMALL_OPERAND (adj_off_value))
+   {
+ adjust = GEN_INT (adj_off_value);
+   }
+ else if (SUM_OF_TWO_S12_ALGN (adj_off_value))
+   {
+ riscv_split_sum_of_two_s12 (adj_off_value, &one, &two);
+ dwarf_adj = adjust = GEN_INT (one);
+ sum_of_two_s12 = true;
+   }
+ else
{
  riscv_emit_move (RISCV_PROLOGUE_TEMP (Pmode),
-  GEN_INT (adjust_offset.to_constant ()));
+  GEN_INT (adj_off_value));
  adjust = RISCV_PROLOGUE_TEMP (Pmode);
}
- else
-   adjust = GEN_INT (adjust_offset.to_constant ());
}
 
   insn = emit_insn (
@@ -8138,14 +8150,21 @@ riscv_expand_epilogue (int style)
  adjust));
 
   rtx dwarf = NULL_RTX;
-  rtx cfa_adjust_value = gen_rtx_PLUS (
-  Pmode, hard_frame_pointer_rtx,
-  gen_int_mode (-frame->hard_frame_pointer_offset, 
Pmode));
+  rtx cfa_adjust_value = gen_rtx_PLUS (Pmode, hard_frame_pointer_rtx,
+  dwarf_adj);
   rtx cfa_adjust_rtx = gen_rtx_SET (stack_pointer_rtx, cfa_adjust_value);
   dwarf = alloc_reg_note (REG_CFA_ADJUST_CFA, cfa_adjust_rtx, dwarf);
+
   RTX_FRAME_RELATED_P (insn) = 1;
 
   REG_NOTES (insn) = dwarf;
+
+  if (sum_of_two_s12)
+   {
+ insn = emit_insn (gen_add3_insn (stack_pointer_rtx, stack_pointer_rtx,
+   GEN_INT (two)));
+ RTX_FRAME_RELATED_P (insn) = 1;
+   }
 }
 
   if (use_restore_libcall || use_multi_pop)


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: avoid LUI based const mat in prologue/epilogue expansion [PR/105733]

2024-05-21 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:259f9f2c67458b594fec9eac9df0ddb8a5a27867

commit 259f9f2c67458b594fec9eac9df0ddb8a5a27867
Author: Vineet Gupta 
Date:   Mon May 13 11:46:03 2024 -0700

RISC-V: avoid LUI based const mat in prologue/epilogue expansion [PR/105733]

If the constant used for stack offset can be expressed as sum of two S12
values, the constant need not be materialized (in a reg) and instead the
two S12 bits can be added to instructions involved with frame pointer.
This avoids burning a register and more importantly can often get down
to be 2 insn vs. 3.
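
Conceptually the split looks like this (an illustrative sketch, not the
riscv.cc implementation; the real riscv_split_sum_of_two_s12 also deals
with alignment):

  #include <stdint.h>

  /* S12 covers [-2048, 2047], so any value in [-4096, 4094] can be
     expressed as a sum of two S12 addends.  Returns nonzero on
     success.  */
  static int
  split_sum_of_two_s12 (int32_t val, int32_t *first, int32_t *second)
  {
    if (val < -4096 || val > 4094)
      return 0;
    *first = val > 2047 ? 2047 : (val < -2048 ? -2048 : val);
    *second = val - *first;
    return 1;
  }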

The prev patches to generally avoid LUI based const materialization didn't
fix this PR and need this directed fix in function prologue/epilogue
expansion.

This fix doesn't move the needle for SPEC at all, but it is still a
win considering gcc generates one insn fewer than llvm for the test ;-)

gcc-13.1 release    |  gcc 230823        |                   |
                    |  g6619b3d4c15c     |  This patch       |  clang/llvm
---------------------------------------------------------------------------------
li   t0,-4096       | li   t0,-4096      | addi sp,sp,-2048  | addi sp,sp,-2048
addi t0,t0,2016     | addi t0,t0,2032    | add  sp,sp,-16    | addi sp,sp,-32
li   a4,4096        | add  sp,sp,t0      | add  a5,sp,a0     | add  a1,sp,16
add  sp,sp,t0       | addi a5,sp,-2032   | sb   zero,0(a5)   | add  a0,a0,a1
li   a5,-4096       | add  a0,a5,a0      | addi sp,sp,2032   | sb   zero,0(a0)
addi a4,a4,-2032    | li   t0,4096       | addi sp,sp,32     | addi sp,sp,2032
add  a4,a4,a5       | sb   zero,2032(a0) | ret               | addi sp,sp,48
addi a5,sp,16       | addi t0,t0,-2032   |                   | ret
add  a5,a4,a5       | add  sp,sp,t0      |                   |
add  a0,a5,a0       | ret                |                   |
li   t0,4096        |                    |                   |
sd   a5,8(sp)       |                    |                   |
sb   zero,2032(a0)  |                    |                   |
addi t0,t0,-2016    |                    |                   |
add  sp,sp,t0       |                    |                   |
ret                 |                    |                   |

gcc/ChangeLog:
PR target/105733
* config/riscv/riscv.h: New macros for with aligned offsets.
* config/riscv/riscv.cc (riscv_split_sum_of_two_s12): New
function to split a sum of two s12 values into constituents.
(riscv_expand_prologue): Handle offset being sum of two S12.
(riscv_expand_epilogue): Ditto.
* config/riscv/riscv-protos.h (riscv_split_sum_of_two_s12): New.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr105733.c: New Test.
* gcc.target/riscv/rvv/autovec/vls/spill-1.c: Adjust to not
expect LUI 4096.
* gcc.target/riscv/rvv/autovec/vls/spill-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-7.c: Ditto.

Tested-by: Edwin Lu  # pre-commit-CI #1568
Signed-off-by: Vineet Gupta 
(cherry picked from commit f9cfc192ed0127edb7e79818917dd2859fce4d44)

Diff:
---
 gcc/config/riscv/riscv-protos.h|  2 +
 gcc/config/riscv/riscv.cc  | 54 --
 gcc/config/riscv/riscv.h   |  7 +++
 gcc/testsuite/gcc.target/riscv/pr105733.c  | 15 ++
 .../gcc.target/riscv/rvv/autovec/vls/spill-1.c |  4 +-
 .../gcc.target/riscv/rvv/autovec/vls/spill-2.c |  4 +-
 .../gcc.target/riscv/rvv/autovec/vls/spill-3.c |  4 +-
 .../gcc.target/riscv/rvv/autovec/vls/spill-4.c |  4 +-
 .../gcc.target/riscv/rvv/autovec/vls/spill-5.c |  4 +-
 .../gcc.target/riscv/rvv/autovec/vls/spill-6.c |  4 +-
 .../gcc.target/riscv/rvv/autovec/vls/spill-7.c |  4 +-
 11 files changed, 89 insertions(+), 17 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index c64aae18deb..0704968561b 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -167,6 +167,8 @@ extern void riscv_subword_address (rtx, rtx *, rtx *, rtx 
*, rtx *);
 extern void riscv_lshift_subword (machine_mode, rtx, rtx, rtx *);
 extern enum memmodel riscv_union_memmodels (enum memmodel, enum memmodel);
 extern bool riscv_reg_frame_related (rtx);
+extern void riscv_split_sum_of_two_s12 (HOST_WIDE_INT, HOST_WIDE_INT *,
+   HOST_WIDE_INT *);
 
 /* Routines implemented in riscv-c.cc.  */
 void riscv_cpu_cpp_builtins (cpp_reader *);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index d0c22058b8c..2ecbcf1d0af 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4075,6 +4075,32 @@ riscv_split_doubleword_move (rtx dest, rtx src)
riscv_emit_move 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] Regenerate riscv.opt.urls and i386.opt.urls

2024-05-21 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:97fb62e5969841287c275bc12b80fd950a38061b

commit 97fb62e5969841287c275bc12b80fd950a38061b
Author: Mark Wielaard 
Date:   Mon May 20 13:13:02 2024 +0200

Regenerate riscv.opt.urls and i386.opt.urls

risc-v added an -mfence-tso option. i386 removed Xeon Phi ISA support
options. But the opt.urls files weren't regenerated.

Fixes: a6114c2a6911 ("RISC-V: Implement -m{,no}fence-tso")
Fixes: e1a7e2c54d52 ("i386: Remove Xeon Phi ISA support")

gcc/ChangeLog:

* config/riscv/riscv.opt.urls: Regenerate.
* config/i386/i386.opt.urls: Likewise.

(cherry picked from commit 591bc70139d898c06b1d605ff4fed591ffd2e2e7)

Diff:
---
 gcc/config/i386/i386.opt.urls   | 15 ---
 gcc/config/riscv/riscv.opt.urls |  3 +++
 2 files changed, 3 insertions(+), 15 deletions(-)

diff --git a/gcc/config/i386/i386.opt.urls b/gcc/config/i386/i386.opt.urls
index 81c5bb9a927..40e8a844936 100644
--- a/gcc/config/i386/i386.opt.urls
+++ b/gcc/config/i386/i386.opt.urls
@@ -238,12 +238,6 @@ UrlSuffix(gcc/x86-Options.html#index-mavx2)
 mavx512f
 UrlSuffix(gcc/x86-Options.html#index-mavx512f)
 
-mavx512pf
-UrlSuffix(gcc/x86-Options.html#index-mavx512pf)
-
-mavx512er
-UrlSuffix(gcc/x86-Options.html#index-mavx512er)
-
 mavx512cd
 UrlSuffix(gcc/x86-Options.html#index-mavx512cd)
 
@@ -262,12 +256,6 @@ UrlSuffix(gcc/x86-Options.html#index-mavx512ifma)
 mavx512vbmi
 UrlSuffix(gcc/x86-Options.html#index-mavx512vbmi)
 
-mavx5124fmaps
-UrlSuffix(gcc/x86-Options.html#index-mavx5124fmaps)
-
-mavx5124vnniw
-UrlSuffix(gcc/x86-Options.html#index-mavx5124vnniw)
-
 mavx512vpopcntdq
 UrlSuffix(gcc/x86-Options.html#index-mavx512vpopcntdq)
 
@@ -409,9 +397,6 @@ UrlSuffix(gcc/x86-Options.html#index-mrdrnd)
 mf16c
 UrlSuffix(gcc/x86-Options.html#index-mf16c)
 
-mprefetchwt1
-UrlSuffix(gcc/x86-Options.html#index-mprefetchwt1)
-
 mfentry
 UrlSuffix(gcc/x86-Options.html#index-mfentry)
 
diff --git a/gcc/config/riscv/riscv.opt.urls b/gcc/config/riscv/riscv.opt.urls
index 2f01ae5d627..e02ef3ee3dd 100644
--- a/gcc/config/riscv/riscv.opt.urls
+++ b/gcc/config/riscv/riscv.opt.urls
@@ -91,3 +91,6 @@ UrlSuffix(gcc/RISC-V-Options.html#index-minline-strlen)
 
 ; skipping UrlSuffix for 'mtls-dialect=' due to finding no URLs
 
+mfence-tso
+UrlSuffix(gcc/RISC-V-Options.html#index-mfence-tso)
+


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] DSE: Fix ICE after allow vector type in get_stored_val

2024-05-21 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:5ef90118a30e49ce73f48a6f3c94129374290b5c

commit 5ef90118a30e49ce73f48a6f3c94129374290b5c
Author: Pan Li 
Date:   Tue Apr 30 09:42:39 2024 +0800

DSE: Fix ICE after allow vector type in get_stored_val

We previously allowed vector types in get_stored_val when the read is
less than or equal to the store.  Unfortunately, validate_subreg
treats a vector type whose size is less than a vector register as
invalid, and we then ICE here.

This patch fixes that by filtering out the invalid type size and making
sure the subreg is valid for both read_mode and store_mode before
performing the real gen_lowpart.

The below test suites are passed for this patch:

* The x86 bootstrap test.
* The x86 regression test.
* The riscv rv64gcv regression test.
* The riscv rv64gc regression test.
* The aarch64 regression test.

gcc/ChangeLog:

* dse.cc (get_stored_val): Make sure read_mode/write_mode
is valid subreg before gen_lowpart.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/bug-6.c: New test.

Signed-off-by: Pan Li 
(cherry picked from commit 88b3f83238087cbe2aa2c51c6054796856f2fb94)

Diff:
---
 gcc/dse.cc  |  4 +++-
 gcc/testsuite/gcc.target/riscv/rvv/base/bug-6.c | 22 ++
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/gcc/dse.cc b/gcc/dse.cc
index edc7a1dfecf..1596da91da0 100644
--- a/gcc/dse.cc
+++ b/gcc/dse.cc
@@ -1946,7 +1946,9 @@ get_stored_val (store_info *store_info, machine_mode 
read_mode,
 copy_rtx (store_info->const_rhs));
   else if (VECTOR_MODE_P (read_mode) && VECTOR_MODE_P (store_mode)
 && known_le (GET_MODE_BITSIZE (read_mode), GET_MODE_BITSIZE (store_mode))
-&& targetm.modes_tieable_p (read_mode, store_mode))
+&& targetm.modes_tieable_p (read_mode, store_mode)
+&& validate_subreg (read_mode, store_mode, copy_rtx (store_info->rhs),
+   subreg_lowpart_offset (read_mode, store_mode)))
 read_reg = gen_lowpart (read_mode, copy_rtx (store_info->rhs));
   else
 read_reg = extract_low_bits (read_mode, store_mode,
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/bug-6.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-6.c
new file mode 100644
index 000..5bb00b8f587
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-6.c
@@ -0,0 +1,22 @@
+/* Test that we do not have ice when compile */
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize" } */
+
+struct A { float x, y; };
+struct B { struct A u; };
+
+extern void bar (struct A *);
+
+float
+f3 (struct B *x, int y)
+{
+  struct A p = {1.0f, 2.0f};
+  struct A *q = &x[y].u;
+
+  __builtin_memcpy (&q->x, &p.x, sizeof (float));
+  __builtin_memcpy (&q->y, &p.y, sizeof (float));
+
+  bar (&p);
+
+  return x[y].u.x + x[y].u.y;
+}


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [to-be-committed][RISC-V][PR target/115142] Do not create invalid shift-add insn

2024-05-21 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:08aaf0da2e4cb4e36df0471e532ddf1acc873e79

commit 08aaf0da2e4cb4e36df0471e532ddf1acc873e79
Author: Jeff Law 
Date:   Sun May 19 09:56:16 2024 -0600

[to-be-committed][RISC-V][PR target/115142] Do not create invalid 
shift-add insn

The circumstances which triggered this weren't something that should appear
in the wild (-ftree-ter, without optimization enabled).  So I wasn't planning
to backport.  Obviously if it shows up in another context we can revisit that
decision.

I've run this through my rv32gcv and rv64gc tester.  Waiting on the CI 
system before committing.

PR target/115142
gcc/

* config/riscv/riscv.cc (mem_shadd_or_shadd_rtx_p): Make sure
shifted argument is a register.

gcc/testsuite

* gcc.target/riscv/pr115142.c: New test.

(cherry picked from commit e1ce9c37ed68136a99d44c8301990c184ba41849)

Diff:
---
 gcc/config/riscv/riscv.cc |  1 +
 gcc/testsuite/gcc.target/riscv/pr115142.c | 10 ++
 2 files changed, 11 insertions(+)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 7a34b4be873..d0c22058b8c 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2465,6 +2465,7 @@ mem_shadd_or_shadd_rtx_p (rtx x)
 {
   return ((GET_CODE (x) == ASHIFT
   || GET_CODE (x) == MULT)
+ && register_operand (XEXP (x, 0), GET_MODE (x))
  && CONST_INT_P (XEXP (x, 1))
  && ((GET_CODE (x) == ASHIFT && IN_RANGE (INTVAL (XEXP (x, 1)), 1, 3))
  || (GET_CODE (x) == MULT
diff --git a/gcc/testsuite/gcc.target/riscv/pr115142.c 
b/gcc/testsuite/gcc.target/riscv/pr115142.c
new file mode 100644
index 000..40ba49dfa20
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr115142.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -ftree-ter" } */
+
+long a;
+char b;
+void e() {
+  char f[8][1];
+  b = f[a][a];
+}
+


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Implement -m{,no}fence-tso

2024-05-21 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:1b074bdb09654ddd7d0d10ed31133f58df0d656e

commit 1b074bdb09654ddd7d0d10ed31133f58df0d656e
Author: Palmer Dabbelt 
Date:   Sat May 18 15:15:09 2024 -0600

RISC-V: Implement -m{,no}fence-tso

Some processors from T-Head don't implement the `fence.tso` instruction
natively and instead trap to firmware.  This breaks some users who
haven't yet updated the firmware and one could imagine it breaking users
who are trying to build firmware if they're using the C memory model.

So just add an option to disable emitting it, in a similar fashion to
how we allow users to forbid other instructions.

Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1070959
---
I've just smoke tested this one, but

void func(void) { __atomic_thread_fence(__ATOMIC_ACQ_REL); }

generates `fence.tso` without the argument and `fence rw,rw` with
`-mno-fence-tso`, so it seems to be at least mostly there.  I figured
I'd just send it up for comments before putting together the DG bits:
it's kind of a pain to carry around these workarounds for unimplemented
instructions, but it's in HW so there's not much we can do about that.

gcc/ChangeLog:

* config/riscv/riscv.opt: Add -mno-fence-tso.
* config/riscv/sync-rvwmo.md (mem_thread_fence_rvwmo): Respect
-mno-fence-tso.
* doc/invoke.texi (RISC-V): Document -mno-fence-tso.

(cherry picked from commit a6114c2a691112f9cf5b072c21685d2e43c76d81)

Diff:
---
 gcc/config/riscv/riscv.opt | 4 
 gcc/config/riscv/sync-rvwmo.md | 2 +-
 gcc/doc/invoke.texi| 8 
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index d209ac896fd..87f58332016 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -624,3 +624,7 @@ Enum(tls_type) String(desc) Value(TLS_DESCRIPTORS)
 mtls-dialect=
 Target RejectNegative Joined Enum(tls_type) Var(riscv_tls_dialect) 
Init(TLS_TRADITIONAL) Save
 Specify TLS dialect.
+
+mfence-tso
+Target Var(TARGET_FENCE_TSO) Init(1)
+Specifies whether the fence.tso instruction should be used.
diff --git a/gcc/config/riscv/sync-rvwmo.md b/gcc/config/riscv/sync-rvwmo.md
index d4fd26069f7..e639a1e2392 100644
--- a/gcc/config/riscv/sync-rvwmo.md
+++ b/gcc/config/riscv/sync-rvwmo.md
@@ -33,7 +33,7 @@
 if (model == MEMMODEL_SEQ_CST)
return "fence\trw,rw";
 else if (model == MEMMODEL_ACQ_REL)
-   return "fence.tso";
+   return TARGET_FENCE_TSO ? "fence.tso" : "fence\trw,rw";
 else if (model == MEMMODEL_ACQUIRE)
return "fence\tr,rw";
 else if (model == MEMMODEL_RELEASE)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index dc4c5a3189d..1d48d57bcc7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1243,6 +1243,7 @@ See RS/6000 and PowerPC Options.
 -mplt  -mno-plt
 -mabi=@var{ABI-string}
 -mfdiv  -mno-fdiv
+-mfence-tso  -mno-fence-tso
 -mdiv  -mno-div
 -misa-spec=@var{ISA-spec-string}
 -march=@var{ISA-string}
@@ -30378,6 +30379,13 @@ Do or don't use hardware floating-point divide and 
square root instructions.
 This requires the F or D extensions for floating-point registers.  The default
 is to use them if the specified architecture has these instructions.
 
+@opindex mfence-tso
+@item -mfence-tso
+@itemx -mno-fence-tso
+Do or don't use the @samp{fence.tso} instruction, which is unimplemented on
+some processors (including those from T-Head).  If the @samp{fence.tso}
+instruction is not available then a stronger fence will be used instead.
+
 @opindex mdiv
 @item -mdiv
 @itemx -mno-div


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [to-be-committed, RISC-V] Improve some shift-add sequences

2024-05-21 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:03f61ba899a4e1025284ee0de2390363694190cc

commit 03f61ba899a4e1025284ee0de2390363694190cc
Author: Jeff Law 
Date:   Sat May 18 15:08:07 2024 -0600

[to-be-committed,RISC-V] Improve some shift-add sequences

So this is a minor fix/improvement for shift-add sequences.  This was
supposed to help xz in a minor way IIRC.

Combine may present us with (x + C2') << C1 which was canonicalized from
(x << C1) + C2.

Depending on the precise values of C2 and C2' one form may be better
than the other.  We can (somewhat awkwardly) use riscv_const_insns to
test for which sequence would be preferred.
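
A worked example of the cost difference, using the constant from the
new test (my arithmetic, for illustration):

  /* ((x - 0xD800) << 10) can be computed two ways:
       (x + C2') << C1 with C2' = -0xD800: C2' needs lui+addi (its
         low 12 bits are nonzero), then add and sll -- 4 insns.
       (x << C1) + C2 with C2 = -0xD800 << 10 = -0x3600000: the low
         12 bits of C2 are zero, so a single lui, then sll and add
         -- 3 insns.  */
  long
  compose (long high)
  {
    return (high - 0xD800) << 10;
  }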

Tested on Ventana's CI system as well as my own.  Waiting on CI results
from Rivos's tester before moving forward.

Jeff
gcc/
* config/riscv/riscv.md: Add new patterns to allow selection
between (x << C1) + C2 vs (x + C2') << C1 depending on the
cost C2 vs C2'.

gcc/testsuite

* gcc.target/riscv/shift-add-1.c: New test.

(cherry picked from commit 3c9c52a1c0fa7af22f769a2116b28a0b7ea18129)

Diff:
---
 gcc/config/riscv/riscv.md| 56 
 gcc/testsuite/gcc.target/riscv/shift-add-1.c | 21 +++
 2 files changed, 77 insertions(+)

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index ff4557c1325..78c16adee98 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -4162,6 +4162,62 @@
   }
 )
 
+;; These are forms of (x << C1) + C2, potentially canonicalized from
+;; ((x + C2') << C1.  Depending on the cost to load C2 vs C2' we may
+;; want to go ahead and recognize this form as C2 may be cheaper to
+;; synthesize than C2'.
+;;
+;; It might be better to refactor riscv_const_insns a bit so that we
+;; can have an API that passes integer values around rather than
+;; constructing a lot of garbage RTL.
+;;
+;; The mvconst_internal pattern in effect requires this pattern to
+;; also be a define_insn_and_split due to insn count costing when
+;; splitting in combine.
+(define_insn_and_split ""
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(plus:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
+			    (match_operand 2 "const_int_operand" "n"))
+		 (match_operand 3 "const_int_operand" "n")))
+   (clobber (match_scratch:DI 4 "=&r"))]
+  "(TARGET_64BIT
+    && riscv_const_insns (operands[3])
+    && ((riscv_const_insns (operands[3])
+	 < riscv_const_insns (GEN_INT (INTVAL (operands[3]) >> INTVAL (operands[2]))))
+	|| riscv_const_insns (GEN_INT (INTVAL (operands[3]) >> INTVAL (operands[2]))) == 0))"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (ashift:DI (match_dup 1) (match_dup 2)))
+   (set (match_dup 4) (match_dup 3))
+   (set (match_dup 0) (plus:DI (match_dup 0) (match_dup 4)))]
+  ""
+  [(set_attr "type" "arith")])
+
+(define_insn_and_split ""
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(sign_extend:DI (plus:SI (ashift:SI
+				   (match_operand:SI 1 "register_operand" "r")
+				   (match_operand 2 "const_int_operand" "n"))
+				 (match_operand 3 "const_int_operand" "n"))))
+   (clobber (match_scratch:DI 4 "=&r"))]
+  "(TARGET_64BIT
+    && riscv_const_insns (operands[3])
+    && ((riscv_const_insns (operands[3])
+	 < riscv_const_insns (GEN_INT (INTVAL (operands[3]) >> INTVAL (operands[2]))))
+	|| riscv_const_insns (GEN_INT (INTVAL (operands[3]) >> INTVAL (operands[2]))) == 0))"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (ashift:DI (match_dup 1) (match_dup 2)))
+   (set (match_dup 4) (match_dup 3))
+   (set (match_dup 0) (sign_extend:DI (plus:SI (match_dup 5) (match_dup 6))))]
+  "{
+     operands[1] = gen_lowpart (DImode, operands[1]);
+     operands[5] = gen_lowpart (SImode, operands[0]);
+     operands[6] = gen_lowpart (SImode, operands[4]);
+   }"
+  [(set_attr "type" "arith")])
+
+
 (include "bitmanip.md")
 (include "crypto.md")
 (include "sync.md")
diff --git a/gcc/testsuite/gcc.target/riscv/shift-add-1.c 
b/gcc/testsuite/gcc.target/riscv/shift-add-1.c
new file mode 100644
index 000..d98875c3271
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shift-add-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zba_zbb_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+
+int composeFromSurrogate(const unsigned short high) {
+
+return  ((high - 0xD800) << 10) ;
+}
+
+
+long composeFromSurrogate_2(const unsigned long high) {
+
+return  ((high - 0xD800) << 10) ;
+}
+
+
+/* { dg-final { scan-assembler-times "\tli\t" 2 } } */
+/* { dg-final { scan-assembler-times "\tslli\t" 2 } } */
+/* { dg-final { scan-assembler-times "\taddw\t" 1 } } */
+/* { dg-final { scan-assembler-times "\tadd\t" 1 } } */
+


Re: [PATCH v1 2/2] RISC-V: Add test cases for branch form unsigned SAT_ADD

2024-05-21 Thread Jeff Law




On 5/20/24 5:01 AM, pan2...@intel.com wrote:

From: Pan Li 

After supporting the branch form of unsigned SAT_ADD in the
middle end, add more test cases to cover the functionality.

The below test suites are passed.
* The rv64gcv fully regression test.
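
For reference, one branch-form shape of the unsigned SAT_ADD idiom in C
(my sketch; the testsuite's sat_arith.h macros are the authoritative
versions):

  #include <limits.h>

  unsigned
  sat_add_u32 (unsigned a, unsigned b)
  {
    unsigned sum = a + b;
    if (sum < a)          /* unsigned wraparound => overflow */
      return UINT_MAX;    /* saturate */
    return sum;
  }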

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add branch form test macro.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-10.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-11.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-12.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-9.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-9.c: New test.
* gcc.target/riscv/sat_u_add-10.c: New test.
* gcc.target/riscv/sat_u_add-11.c: New test.
* gcc.target/riscv/sat_u_add-12.c: New test.
* gcc.target/riscv/sat_u_add-9.c: New test.
* gcc.target/riscv/sat_u_add-run-10.c: New test.
* gcc.target/riscv/sat_u_add-run-11.c: New test.
* gcc.target/riscv/sat_u_add-run-12.c: New test.
* gcc.target/riscv/sat_u_add-run-9.c: New test.


OK

jeff



Re: [PATCH v3 2/2] RISC-V: avoid LUI based const mat in alloca epilogue expansion

2024-05-21 Thread Jeff Law




On 5/20/24 5:32 PM, Vineet Gupta wrote:

This is testsuite clean; however, there's a dwarf quirk which I want to
run by the experts.  The test that was tripping CI has the following
fragment:

Before patch                 |  After Patch
-----------------------------------------------------------
li   t0,-4096                |  addi sp,s0,-2048
addi t0,t0,560               |  .cfi_def_cfa 2, 2048      <- #1
add  sp,s0,t0                |  addi sp,sp,-1488
.cfi_def_cfa 2, 3536         |  .cfi_def_cfa_offset 3536  <- #2
addi sp,sp,1504              |  addi sp,sp,1504
.cfi_def_cfa_offset 2032     |  .cfi_def_cfa_offset 2032  <- #3

The dwarf insns #1 and #3 seem OK; however, #2 seems dubious to me.

---

This is continuing on the prev patch in function epilogue expansion.

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_expand_epilogue): Handle offset
being sum of two S12.

OK.
jeff



Re: [PATCH v3 1/2] RISC-V: avoid LUI based const mat in prologue/epilogue expansion [PR/105733]

2024-05-21 Thread Jeff Law




On 5/20/24 5:32 PM, Vineet Gupta wrote:

Changes since v2:
   - Broke out the hunk corresponding to alloca in epilogue expansion into
     a separate patch.
---

If the constant used for stack offset can be expressed as sum of two S12
values, the constant need not be materialized (in a reg) and instead the
two S12 bits can be added to instructions involved with frame pointer.
This avoids burning a register and more importantly can often get down
to be 2 insn vs. 3.

The prev patches to generally avoid LUI based const materialization didn't
fix this PR and need this directed fix in function prologue/epilogue
expansion.

This fix doesn't move the needle for SPEC at all, but it is still a
win considering gcc generates one insn fewer than llvm for the test ;-)

gcc-13.1 release    |  gcc 230823        |                   |
                    |  g6619b3d4c15c     |  This patch       |  clang/llvm
---------------------------------------------------------------------------------
li   t0,-4096       | li   t0,-4096      | addi sp,sp,-2048  | addi sp,sp,-2048
addi t0,t0,2016     | addi t0,t0,2032    | add  sp,sp,-16    | addi sp,sp,-32
li   a4,4096        | add  sp,sp,t0      | add  a5,sp,a0     | add  a1,sp,16
add  sp,sp,t0       | addi a5,sp,-2032   | sb   zero,0(a5)   | add  a0,a0,a1
li   a5,-4096       | add  a0,a5,a0      | addi sp,sp,2032   | sb   zero,0(a0)
addi a4,a4,-2032    | li   t0,4096       | addi sp,sp,32     | addi sp,sp,2032
add  a4,a4,a5       | sb   zero,2032(a0) | ret               | addi sp,sp,48
addi a5,sp,16       | addi t0,t0,-2032   |                   | ret
add  a5,a4,a5       | add  sp,sp,t0      |                   |
add  a0,a5,a0       | ret                |                   |
li   t0,4096        |                    |                   |
sd   a5,8(sp)       |                    |                   |
sb   zero,2032(a0)  |                    |                   |
addi t0,t0,-2016    |                    |                   |
add  sp,sp,t0       |                    |                   |
ret                 |                    |                   |

gcc/ChangeLog:
PR target/105733
* config/riscv/riscv.h: New macros for with aligned offsets.
* config/riscv/riscv.cc (riscv_split_sum_of_two_s12): New
function to split a sum of two s12 values into constituents.
(riscv_expand_prologue): Handle offset being sum of two S12.
(riscv_expand_epilogue): Ditto.
* config/riscv/riscv-protos.h (riscv_split_sum_of_two_s12): New.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr105733.c: New Test.
* gcc.target/riscv/rvv/autovec/vls/spill-1.c: Adjust to not
expect LUI 4096.
* gcc.target/riscv/rvv/autovec/vls/spill-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-7.c: Ditto.

OK
Jeff



Re: [PATCH v1 2/2] RISC-V: Add test cases for __builtin_add_overflow branch form unsigned SAT_ADD

2024-05-21 Thread Jeff Law




On 5/21/24 4:53 AM, pan2...@intel.com wrote:

From: Pan Li 

After supporting the __builtin_add_overflow branch form of unsigned SAT_ADD
in the middle end, add more test cases to cover the functionality.

The below test suites are passed.
* The rv64gcv fully regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test macro for
branch __builtin_add_overflow form.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-13.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-14.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-15.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-16.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-13.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-14.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-15.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-16.c: New test.
* gcc.target/riscv/sat_u_add-13.c: New test.
* gcc.target/riscv/sat_u_add-14.c: New test.
* gcc.target/riscv/sat_u_add-15.c: New test.
* gcc.target/riscv/sat_u_add-16.c: New test.
* gcc.target/riscv/sat_u_add-run-13.c: New test.
* gcc.target/riscv/sat_u_add-run-14.c: New test.
* gcc.target/riscv/sat_u_add-run-15.c: New test.
* gcc.target/riscv/sat_u_add-run-16.c: New test.

OK
jeff



Re: [committed] PATCH for Re: Stepping down as maintainer for ARC and Epiphany

2024-05-21 Thread Jeff Law




On 5/21/24 8:02 AM, Paul Koning wrote:




On May 21, 2024, at 9:57 AM, Jeff Law  wrote:



On 5/21/24 12:05 AM, Richard Biener via Gcc wrote:

On Mon, May 20, 2024 at 4:45 PM Gerald Pfeifer  wrote:


On Wed, 5 Jul 2023, Joern Rennecke wrote:

I haven't worked with these targets in years and can't really do
sensible maintenance or reviews of patches for them. I am currently
working on optimizations for other ports like RISC-V.


I noticed MAINTAINERS was not updated, so pushed the patch below.

That leaves the epiphany port unmaintained.  Should we automatically add such
ports to the list of obsoleted ports?

Given that epiphany has randomly failed tests for the last 3+ years due to bugs 
in its patterns, yes, it really needs to be deprecated.

I tried to fix the worst of the offenders in epiphany.md a few years back and 
gave up.  Essentially seemingly innocent changes in the RTL will cause reload 
to occasionally not see a path to get constraints satisfied.  So a test which 
passes today will flip to failing tomorrow while some other set of tests will 
go the other way.


Does LRA make that issue go away, or does it not help?
LRA didn't trivially work on epiphany.  I didn't care enough about the 
port to try and make it LRA compatible.


jeff



Re: [committed] PATCH for Re: Stepping down as maintainer for ARC and Epiphany

2024-05-21 Thread Jeff Law




On 5/21/24 12:05 AM, Richard Biener via Gcc wrote:

On Mon, May 20, 2024 at 4:45 PM Gerald Pfeifer  wrote:


On Wed, 5 Jul 2023, Joern Rennecke wrote:

I haven't worked with these targets in years and can't really do
sensible maintenance or reviews of patches for them. I am currently
working on optimizations for other ports like RISC-V.


I noticed MAINTAINERS was not updated, so pushed the patch below.


That leaves the epiphany port unmaintained.  Should we automatically add such
ports to the list of obsoleted ports?
Given that epiphany has randomly failed tests for the last 3+ years due 
to bugs in its patterns, yes, it really needs to be deprecated.


I tried to fix the worst of the offenders in epiphany.md a few years 
back and gave up.  Essentially seemingly innocent changes in the RTL 
will cause reload to occasionally not see a path to get constraints 
satisfied.  So a test which passes today will flip to failing tomorrow 
while some other set of tests will go the other way.




jeff



Re: [PATCH v3 2/2] RISC-V: avoid LUI based const mat in alloca epilogue expansion

2024-05-20 Thread Jeff Law




On 5/20/24 5:32 PM, Vineet Gupta wrote:

This is testsuite clean; however, there's a dwarf quirk which I want to
run by the experts.  The test that was tripping CI has the following
fragment:

Before patch                 |  After Patch
-----------------------------------------------------------
li   t0,-4096                |  addi sp,s0,-2048
addi t0,t0,560               |  .cfi_def_cfa 2, 2048      <- #1
add  sp,s0,t0                |  addi sp,sp,-1488
.cfi_def_cfa 2, 3536         |  .cfi_def_cfa_offset 3536  <- #2
addi sp,sp,1504              |  addi sp,sp,1504
.cfi_def_cfa_offset 2032     |  .cfi_def_cfa_offset 2032  <- #3

The dwarf insns #1 and #3 seem OK; however, #2 seems dubious to me.
What about it seems dubious?  We need a CFA adjustment on each insn that 
modifies the stack pointer so that we can unwind at any arbitrary point.


The first adjustment says the prior frame is at sp + 2048.  Then it's at 
sp + 3536.  Then after the final insn the prior frame is at sp + 2032.


Jeff


Re: [to-be-committed][RISC-V] Eliminate redundant bitmanip operation

2024-05-19 Thread Jeff Law




On 5/19/24 1:59 PM, Andrew Pinski wrote:

On Sun, May 19, 2024 at 10:58 AM Jeff Law  wrote:


perl has some internal bitmap code.  One of its implementation
properties is that if you ask it to set a bit, the bit is first cleared.


Unfortunately this is fairly hard to see in gimple/match due to type
changes in the IL.  But it is easy to see in the code we get from
combine.  So we just match the relevant cases.



So looking into this from a gimple point of view, we can see the issue
on x86_64 if you explicitly use `unsigned char`.
We have:
```
   c_8 = (unsigned char) _1;
   _2 = *a_10(D);
   c.0_3 = (signed char) _1;
   _4 = ~c.0_3;
   _12 = (unsigned char) _4;
```
So for this, we could push the no_op cast from `signed char` to
`unsigned char` past the `bit_not` and I think it will fix the issue
on the gimple level.
So something like:
```
/* Push no_op conversion past the bit_not expression if it was single use. */
(simplify
  (convert (bit_not:s @0))
  (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
   (bit_not (convert @0))))
```
I'm not sure where the best place to put the conversion would be in 
gimple.  I bet there's times when we want the conversion at the outer 
level and other times at the inner level.  Just not sure it's going to 
be clear cut, with either solution likely causing regressions somewhere.


What we can (and probably should) do is put this simplification into 
simplify-rtx.  It's target independent and shouldn't be hard to capture 
there.


Jeff



[to-be-committed][RISC-V] Eliminate redundant bitmanip operation

2024-05-19 Thread Jeff Law
perl has some internal bitmap code.  One of its implementation 
properties is that if you ask it to set a bit, the bit is first cleared.



Unfortunately this is fairly hard to see in gimple/match due to type 
changes in the IL.  But it is easy to see in the code we get from 
combine.  So we just match the relevant cases.
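
In plain C the idiom looks roughly like this (my sketch; the new tests
use a char reference flavor of the same pattern):

  unsigned long
  set_bit (unsigned long a, unsigned b)
  {
    a &= ~(1UL << b);   /* bclr -- redundant, immediately overwritten */
    a |= 1UL << b;      /* bset */
    return a;
  }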




Regression tested in Ventana's CI system as well as my own.  Waiting on 
the Rivos CI system before moving forward.




Jeff

gcc/

* config/riscv/bitmanip.md: Add patterns for setting a just
cleared bit or clearing a just set bit.
* config/riscv/riscv.cc (riscv_rtx_costs): Cost that RTL
properly.

gcc/testsuite

* gcc.target/riscv/redundant-bitmap-1.c: New test.
* gcc.target/riscv/redundant-bitmap-2.c: New test.
* gcc.target/riscv/redundant-bitmap-3.c: New test.

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 8769a6b818b..9d4247ec8b9 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -877,6 +877,29 @@ (define_insn_and_split ""
}"
   [(set_attr "type" "bitmanip")])
 
+;; In theory these might be better handled with match.pd patterns, but
+;; the type changes tend to make it ugly, at least for the perl testcases
+(define_insn ""
+  [(set (match_operand:X 0 "register_operand" "=r")
+	(ior:X (and:X (rotate:X (const_int -2)
+				(match_operand:QI 1 "register_operand" "r"))
+		      (match_operand:X 2 "register_operand" "r"))
+	       (ashift:X (const_int 1)
+			 (match_operand:QI 3 "register_operand" "r"))))]
+  "TARGET_ZBS && rtx_equal_p (operands[1], operands[3])"
+  "bset\t%0,%2,%1"
+  [(set_attr "type" "bitmanip")])
+
+(define_insn ""
+  [(set (match_operand:X 0 "register_operand" "=r")
+	(and:X (any_or:X (ashift:X (const_int 1)
+				   (match_operand:QI 1 "register_operand" "r"))
+			 (match_operand:X 2 "register_operand" "r"))
+	       (rotate:X (const_int -2)
+			 (match_operand:QI 3 "register_operand" "r"))))]
+  "TARGET_ZBS && rtx_equal_p (operands[1], operands[3])"
+  "bclr\t%0,%2,%1"
+  [(set_attr "type" "bitmanip")])
+
 ;; IF_THEN_ELSE: test for 2 bits of opposite polarity
 (define_insn_and_split "*branch_mask_twobits_equals_singlebit"
   [(set (pc)
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index b0a14a2a82d..78a4a1cd554 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3712,6 +3712,22 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
  return true;
}
 
+  /* Special case for bset followed by bclr.  */
+  if (GET_CODE (x) == AND
+ && (GET_CODE (XEXP (x, 0)) == IOR
+ || GET_CODE (XEXP (x, 0)) == XOR)
+ && GET_CODE (XEXP (XEXP (x, 0), 0)) == ASHIFT
+ && XEXP (XEXP (XEXP (x, 0), 0), 0) == CONST1_RTX (word_mode)
+ && GET_CODE (XEXP (x, 1)) == ROTATE
+ && CONST_INT_P (XEXP (XEXP (x, 1), 0))
+ && INTVAL (XEXP (XEXP (x, 1), 0)) == -2
+ && rtx_equal_p (XEXP (XEXP (XEXP (x, 0), 0), 1),
+		 (XEXP (XEXP (x, 1), 1))))
+   {
+ *total = COSTS_N_INSNS (1);
+ return true;
+   }
+
   gcc_fallthrough ();
 case IOR:
 case XOR:
@@ -3734,6 +3750,21 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
  return true;
}
 
+  /* Special case for bclr followed by bset.  */
+  if (GET_CODE (x) == IOR
+ && GET_CODE (XEXP (x, 0)) == AND
+ && GET_CODE (XEXP (XEXP (x, 0), 0)) == ROTATE
+ && CONST_INT_P (XEXP (XEXP (XEXP (x, 0), 0), 0))
+ && INTVAL (XEXP (XEXP (XEXP (x, 0), 0), 0)) == -2
+ && GET_CODE (XEXP (x, 1)) == ASHIFT
+ && XEXP (XEXP (x, 1), 0) == CONST1_RTX (word_mode)
+ && rtx_equal_p (XEXP (XEXP (XEXP (x, 0), 0), 1),
+		 (XEXP (XEXP (x, 1), 1))))
+   {
+ *total = COSTS_N_INSNS (1);
+ return true;
+   }
+
   /* Double-word operations use two single-word operations.  */
   *total = riscv_binary_cost (x, 1, 2);
   return false;
diff --git a/gcc/testsuite/g++.target/riscv/redundant-bitmap-1.C 
b/gcc/testsuite/g++.target/riscv/redundant-bitmap-1.C
new file mode 100644
index 000..85be608bdc8
--- /dev/null
+++ b/gcc/testsuite/g++.target/riscv/redundant-bitmap-1.C
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+
+void setBit(char &a, int b) {
+    char c = 0x1UL << b;
+    a &= ~c;
+    a |= c;
+}
+
+/* { dg-final { scan-assembler-not "bclr\t" } } */
+
diff --git a/gcc/testsuite/g++.target/riscv/redundant-bitmap-2.C 
b/gcc/testsuite/g++.target/riscv/redundant-bitmap-2.C
new file mode 100644
index 000..9060eb1d769
--- /dev/null
+++ 

Re: [PATCH v4] DSE: Fix ICE after allow vector type in get_stored_val

2024-05-19 Thread Jeff Law




On 5/2/24 7:51 PM, pan2...@intel.com wrote:

From: Pan Li 

We previously allowed vector types in get_stored_val when the read is
less than or equal to the store.  Unfortunately, validate_subreg
treats a vector type whose size is less than a vector register as
invalid, and we then ICE here.

This patch fixes that by filtering out the invalid type size and making
sure the subreg is valid for both read_mode and store_mode before
performing the real gen_lowpart.

The below test suites are passed for this patch:

* The x86 bootstrap test.
* The x86 regression test.
* The riscv rv64gcv regression test.
* The riscv rv64gc regression test.
* The aarch64 regression test.

gcc/ChangeLog:

* dse.cc (get_stored_val): Make sure read_mode/write_mode
is valid subreg before gen_lowpart.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/bug-6.c: New test.
OK for the trunk.  Let's let it simmer on the trunk for a while before 
we consider backporting.


jeff



[gcc r15-652] [to-be-committed][RISC-V][PR target/115142] Do not create invalid shift-add insn

2024-05-19 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:e1ce9c37ed68136a99d44c8301990c184ba41849

commit r15-652-ge1ce9c37ed68136a99d44c8301990c184ba41849
Author: Jeff Law 
Date:   Sun May 19 09:56:16 2024 -0600

[to-be-committed][RISC-V][PR target/115142] Do not create invalid 
shift-add insn

The circumstances which triggered this weren't something that should appear
in the wild (-ftree-ter, without optimization enabled).  So I wasn't planning
to backport.  Obviously if it shows up in another context we can revisit that
decision.

I've run this through my rv32gcv and rv64gc tester.  Waiting on the CI 
system before committing.

PR target/115142
gcc/

* config/riscv/riscv.cc (mem_shadd_or_shadd_rtx_p): Make sure
shifted argument is a register.

gcc/testsuite

* gcc.target/riscv/pr115142.c: New test.

Diff:
---
 gcc/config/riscv/riscv.cc |  1 +
 gcc/testsuite/gcc.target/riscv/pr115142.c | 10 ++
 2 files changed, 11 insertions(+)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 7a34b4be873f..d0c22058b8c3 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2465,6 +2465,7 @@ mem_shadd_or_shadd_rtx_p (rtx x)
 {
   return ((GET_CODE (x) == ASHIFT
   || GET_CODE (x) == MULT)
+ && register_operand (XEXP (x, 0), GET_MODE (x))
  && CONST_INT_P (XEXP (x, 1))
  && ((GET_CODE (x) == ASHIFT && IN_RANGE (INTVAL (XEXP (x, 1)), 1, 3))
  || (GET_CODE (x) == MULT
diff --git a/gcc/testsuite/gcc.target/riscv/pr115142.c 
b/gcc/testsuite/gcc.target/riscv/pr115142.c
new file mode 100644
index ..40ba49dfa20b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr115142.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -ftree-ter" } */
+
+long a;
+char b;
+void e() {
+  char f[8][1];
+  b = f[a][a];
+}
+


Re: [PATCH] Add widening expansion of MULT_HIGHPART_EXPR for integral modes

2024-05-19 Thread Jeff Law




On 5/19/24 3:40 AM, Eric Botcazou wrote:

Hi,


Just noticed that this patch may result in an ICE when building libc++ for the
riscv port, details as below.  Please note that not every configuration can
reproduce this issue; feel free to ping me if you cannot reproduce it.
CC'ing more riscv port people for awareness.


Sorry for the breakage, fixed thus, applied as obvious.


* optabs-query.cc (can_mult_highpart_p): Test for the existence of
a wider mode instead of requiring it.
I had basically the same patch here, but hadn't run it through the 
bootstrap & regression test yesterday.


Thanks for taking care of it!

jeff


[to-be-committed][RISC-V][PR target/115142] Do not create invalid shift-add insn

2024-05-18 Thread Jeff Law

Repost, this time with the RISC-V tag so it's picked up by the CI system.

This fixes a minor bug that showed up in the CI system, presumably with 
fuzz testing.


Under the right circumstances, we could end up trying to emit a shift-add 
style sequence where the to-be-shifted operand was not a register.  This 
naturally leads to an unrecognized insn.


The circumstances which triggered this weren't something that should 
appear in the wild (-ftree-ter, without optimization enabled).  So I 
wasn't planning to backport.  Obviously if it shows up in another 
context we can revisit that decision.


PR target/115142
gcc/

* config/riscv/riscv.cc (mem_shadd_or_shadd_rtx_p): Make sure
shifted argument is a register.

gcc/testsuite

* gcc.target/riscv/pr115142.c: New test.

I've run this through my rv32gcv and rv64gc tester.  Waiting on the CI 
system before committing.


jeff
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 7a34b4be873..d0c22058b8c 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2465,6 +2465,7 @@ mem_shadd_or_shadd_rtx_p (rtx x)
 {
   return ((GET_CODE (x) == ASHIFT
   || GET_CODE (x) == MULT)
+ && register_operand (XEXP (x, 0), GET_MODE (x))
  && CONST_INT_P (XEXP (x, 1))
  && ((GET_CODE (x) == ASHIFT && IN_RANGE (INTVAL (XEXP (x, 1)), 1, 3))
  || (GET_CODE (x) == MULT
diff --git a/gcc/testsuite/gcc.target/riscv/pr115142.c 
b/gcc/testsuite/gcc.target/riscv/pr115142.c
new file mode 100644
index 000..40ba49dfa20
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr115142.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -ftree-ter" } */
+
+long a;
+char b;
+void e() {
+  char f[8][1];
+  b = f[a][a];
+}
+


[to-be-committed][PR target/115142] Do not create invalid shift-add insn

2024-05-18 Thread Jeff Law
This fixes a minor bug that showed up in the CI system, presumably with 
fuzz testing.


Under the right circumstances, we could end up trying to emit a shift-add 
style sequence where the to-be-shifted operand was not a register.  This 
naturally leads to an unrecognized insn.


The circumstances which triggered this weren't something that should 
appear in the wild (-ftree-ter, without optimization enabled).  So I 
wasn't planning to backport.  Obviously if it shows up in another 
context we can revisit that decision.


PR target/115142
gcc/

* config/riscv/riscv.cc (mem_shadd_or_shadd_rtx_p): Make sure
shifted argument is a register.

gcc/testsuite

* gcc.target/riscv/pr115142.c: New test.

I've run this through my rv32gcv and rv64gc tester.  Waiting on the CI 
system before committing.


jeff

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 7a34b4be873..d0c22058b8c 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2465,6 +2465,7 @@ mem_shadd_or_shadd_rtx_p (rtx x)
 {
   return ((GET_CODE (x) == ASHIFT
   || GET_CODE (x) == MULT)
+ && register_operand (XEXP (x, 0), GET_MODE (x))
  && CONST_INT_P (XEXP (x, 1))
  && ((GET_CODE (x) == ASHIFT && IN_RANGE (INTVAL (XEXP (x, 1)), 1, 3))
  || (GET_CODE (x) == MULT
diff --git a/gcc/testsuite/gcc.target/riscv/pr115142.c 
b/gcc/testsuite/gcc.target/riscv/pr115142.c
new file mode 100644
index 000..40ba49dfa20
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr115142.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -ftree-ter" } */
+
+long a;
+char b;
+void e() {
+  char f[8][1];
+  b = f[a][a];
+}
+


[gcc r15-647] RISC-V: Implement -m{,no}fence-tso

2024-05-18 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:a6114c2a691112f9cf5b072c21685d2e43c76d81

commit r15-647-ga6114c2a691112f9cf5b072c21685d2e43c76d81
Author: Palmer Dabbelt 
Date:   Sat May 18 15:15:09 2024 -0600

RISC-V: Implement -m{,no}fence-tso

Some processors from T-Head don't implement the `fence.tso` instruction
natively and instead trap to firmware.  This breaks some users who
haven't yet updated the firmware and one could imagine it breaking users
who are trying to build firmware if they're using the C memory model.

So just add an option to disable emitting it, in a similar fashion to
how we allow users to forbid other instructions.

Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1070959
---
I've just smoke tested this one, but

void func(void) { __atomic_thread_fence(__ATOMIC_ACQ_REL); }

generates `fence.tso` without the argument and `fence rw,rw` with
`-mno-fence-tso`, so it seems to be at least mostly there.  I figured
I'd just send it up for comments before putting together the DG bits:
it's kind of a pain to carry around these workarounds for unimplemented
instructions, but it's in HW so there's not much we can do about that.

gcc/ChangeLog:

* config/riscv/riscv.opt: Add -mno-fence-tso.
* config/riscv/sync-rvwmo.md (mem_thread_fence_rvwmo): Respect
-mno-fence-tso.
* doc/invoke.texi (RISC-V): Document -mno-fence-tso.

Diff:
---
 gcc/config/riscv/riscv.opt | 4 
 gcc/config/riscv/sync-rvwmo.md | 2 +-
 gcc/doc/invoke.texi| 8 
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index d209ac896fde..87f583320168 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -624,3 +624,7 @@ Enum(tls_type) String(desc) Value(TLS_DESCRIPTORS)
 mtls-dialect=
 Target RejectNegative Joined Enum(tls_type) Var(riscv_tls_dialect) 
Init(TLS_TRADITIONAL) Save
 Specify TLS dialect.
+
+mfence-tso
+Target Var(TARGET_FENCE_TSO) Init(1)
+Specifies whether the fence.tso instruction should be used.
diff --git a/gcc/config/riscv/sync-rvwmo.md b/gcc/config/riscv/sync-rvwmo.md
index d4fd26069f74..e639a1e23924 100644
--- a/gcc/config/riscv/sync-rvwmo.md
+++ b/gcc/config/riscv/sync-rvwmo.md
@@ -33,7 +33,7 @@
 if (model == MEMMODEL_SEQ_CST)
return "fence\trw,rw";
 else if (model == MEMMODEL_ACQ_REL)
-   return "fence.tso";
+   return TARGET_FENCE_TSO ? "fence.tso" : "fence\trw,rw";
 else if (model == MEMMODEL_ACQUIRE)
return "fence\tr,rw";
 else if (model == MEMMODEL_RELEASE)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index b9408ecc9188..70e8004a71b2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1244,6 +1244,7 @@ See RS/6000 and PowerPC Options.
 -mplt  -mno-plt
 -mabi=@var{ABI-string}
 -mfdiv  -mno-fdiv
+-mfence-tso  -mno-fence-tso
 -mdiv  -mno-div
 -misa-spec=@var{ISA-spec-string}
 -march=@var{ISA-string}
@@ -30436,6 +30437,13 @@ Do or don't use hardware floating-point divide and 
square root instructions.
 This requires the F or D extensions for floating-point registers.  The default
 is to use them if the specified architecture has these instructions.
 
+@opindex mfence-tso
+@item -mfence-tso
+@itemx -mno-fence-tso
+Do or don't use the @samp{fence.tso} instruction, which is unimplemented on
+some processors (including those from T-Head).  If the @samp{fence.tso}
+instruction is not available then a stronger fence will be used instead.
+
 @opindex mdiv
 @item -mdiv
 @itemx -mno-div


[gcc r13-8777] [committed] Fix RISC-V missing stack tie

2024-05-18 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:162c441c9462d073c53dde87258898795bf28a5c

commit r13-8777-g162c441c9462d073c53dde87258898795bf28a5c
Author: Jeff Law 
Date:   Thu Mar 21 20:41:59 2024 -0600

[committed] Fix RISC-V missing stack tie

As some of you know, Raphael has been working on stack-clash support for the
RISC-V port.  A little while ago Florian reached out to us with an issue 
where
glibc was failing its smoke test due to referencing an unallocated stack 
slot.

Without diving into the code in detail I (incorrectly) concluded it was a
problem with the fallback of using Ada's stack-check paths due to not having
stack-clash support.

Once enough stack-clash bits were ready I had Raphael review the code 
generated
for Florian's test and we concluded that the original case from Florian was 
just
wrong irrespective of stack clash/stack check.  While Raphael's stack-clash
work will indirectly fix Florian's case, it really should also work without
stack-clash.

In particular this code was called out by valgrind:

> 0003cb5e <__GI___realpath>:
> __GI___realpath():
>3cb5e:   81010113addisp,sp,-2032
>3cb62:   7d313423sd  s3,1992(sp)
>3cb66:   79fdlui s3,0xf
>3cb68:   7e813023sd  s0,2016(sp)
>3cb6c:   7c913c23sd  s1,2008(sp)
>3cb70:   7f010413addis0,sp,2032
>3cb74:   35098793addia5,s3,848 # 
f350 <__libc_initial+0xffe8946a>
>3cb78:   74fdlui s1,0xf
>3cb7a:   008789b3add s3,a5,s0
>3cb7e:   f9048793addia5,s1,-112 # 
ef90 <__libc_initial+0xffe890aa>
>3cb82:   008784b3add s1,a5,s0
>3cb86:   77fdlui a5,0xf
>3cb88:   7d413023sd  s4,1984(sp)
>3cb8c:   7b513c23sd  s5,1976(sp)
>3cb90:   7e113423sd  ra,2024(sp)
>3cb94:   7d213823sd  s2,2000(sp)
>3cb98:   7b613823sd  s6,1968(sp)
>3cb9c:   7b713423sd  s7,1960(sp)
>3cba0:   7b813023sd  s8,1952(sp)
>3cba4:   79913c23sd  s9,1944(sp)
>3cba8:   79a13823sd  s10,1936(sp)
>3cbac:   79b13423sd  s11,1928(sp)
>3cbb0:   34878793addia5,a5,840 # 
f348 <__libc_initial+0xffe89462>
>3cbb4:   4713li  a4,1024
>3cbb8:   00132a17auipc   s4,0x132
>3cbbc:   ae0a3a03ld  s4,-1312(s4) # 16e698 
<__stack_chk_guard>
>3cbc0:   01098893addia7,s3,16
>3cbc4:   42098693addia3,s3,1056
>3cbc8:   b8040a93addis5,s0,-1152
>3cbcc:   97a2add a5,a5,s0
>3cbce:   000a3603ld  a2,0(s4)
>3cbd2:   f8c43423sd  a2,-120(s0)
>3cbd6:   4601li  a2,0
>3cbd8:   3d14b023sd  a7,960(s1)
>3cbdc:   3ce4b423sd  a4,968(s1)
>3cbe0:   7cd4b823sd  a3,2000(s1)
>3cbe4:   7ce4bc23sd  a4,2008(s1)
>3cbe8:   b7543823sd  s5,-1168(s0)
>3cbec:   b6e43c23sd  a4,-1160(s0)
>3cbf0:   e38csd  a1,0(a5)
>3cbf2:   b0010113addisp,sp,-1280
In particular note the store at 0x3cbd8.  That's hitting (s1 + 960). If you
chase the values around, you'll find it's a bit more than 1k into 
unallocated
stack space.  It's also worth noting the final stack adjustment at 0x3cbf2.

While I haven't reproduced Florian's code exactly, I was able to get 
reasonably
close and verify my suspicion that everything was fine before sched2 and
incorrect after sched2.  It was also obvious at that point what had gone 
wrong
-- we were missing a stack tie after the final stack pointer adjustment.

This patch adds the missing stack tie.
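
For readers unfamiliar with the idiom: a stack tie is a blockage insn
that makes the stack pointer update look like it writes all of memory,
so sched2 cannot move frame accesses across it.  A rough sketch (my
illustration, not the committed code) of how such a tie is emitted,
where UNSPEC_TIE is the backend's unspec number for this:

/* Emit (set (mem:BLK (scratch)) (unspec:BLK [sp fp] UNSPEC_TIE)) so the
   scheduler treats the final stack adjustment as a memory barrier.  */
static void
emit_stack_tie_sketch (rtx sp, rtx fp)
{
  rtx mem = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
  rtx tie = gen_rtx_UNSPEC (BLKmode, gen_rtvec (2, sp, fp), UNSPEC_TIE);
  emit_insn (gen_rtx_SET (mem, tie));
}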

While not technically a regression, I shudder at the thought of chasing one 
of
these issues down again in the wild.  Been there, done that.

Regression tested on rv64gc.  Verified the scheduler no l

[gcc r15-646] [to-be-committed, RISC-V] Improve some shift-add sequences

2024-05-18 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:3c9c52a1c0fa7af22f769a2116b28a0b7ea18129

commit r15-646-g3c9c52a1c0fa7af22f769a2116b28a0b7ea18129
Author: Jeff Law 
Date:   Sat May 18 15:08:07 2024 -0600

[to-be-committed,RISC-V] Improve some shift-add sequences

So this is a minor fix/improvement for shift-add sequences.  This was
supposed to help xz in a minor way IIRC.

Combine may present us with (x + C2') << C1 which was canonicalized from
(x << C1) + C2.

Depending on the precise values of C2 and C2' one form may be better
than the other.  We can (somewhat awkwardly) use riscv_const_insns to
test for which sequence would be preferred.
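
A worked example (mine, mirroring the new shift-add-1.c test) of why
the two forms can differ in cost:

unsigned long
compose_from_surrogate (unsigned long high)
{
  /* (high - 0xD800) << 10  ==  (high << 10) + ((-0xD800) << 10)
                            ==  (high << 10) - 0x3600000.  */
  return (high - 0xD800) << 10;
}

Synthesizing 0xD800 takes two instructions (lui+addi), while -0x3600000
has its low 12 bits clear and loads with a single lui, so shifting
first and adding the shifted constant afterwards is the cheaper
sequence here.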

Tested on Ventana's CI system as well as my own.  Waiting on CI results
from Rivos's tester before moving forward.

Jeff
gcc/
* config/riscv/riscv.md: Add new patterns to allow selection
between (x << C1) + C2 vs (x + C2') << C1 depending on the
cost C2 vs C2'.

gcc/testsuite

* gcc.target/riscv/shift-add-1.c: New test.

Diff:
---
 gcc/config/riscv/riscv.md| 56 
 gcc/testsuite/gcc.target/riscv/shift-add-1.c | 21 +++
 2 files changed, 77 insertions(+)

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index ff4557c1325f..78c16adee980 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -4162,6 +4162,62 @@
   }
 )
 
+;; These are forms of (x << C1) + C2, potentially canonicalized from
+;; (x + C2') << C1.  Depending on the cost to load C2 vs C2' we may
+;; want to go ahead and recognize this form as C2 may be cheaper to
+;; synthesize than C2'.
+;;
+;; It might be better to refactor riscv_const_insns a bit so that we
+;; can have an API that passes integer values around rather than
+;; constructing a lot of garbage RTL.
+;;
+;; The mvconst_internal pattern in effect requires this pattern to
+;; also be a define_insn_and_split due to insn count costing when
+;; splitting in combine.
+(define_insn_and_split ""
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (plus:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
+   (match_operand 2 "const_int_operand" "n"))
+(match_operand 3 "const_int_operand" "n")))
+   (clobber (match_scratch:DI 4 "="))]
+  "(TARGET_64BIT
+&& riscv_const_insns (operands[3])
+&& ((riscv_const_insns (operands[3])
+< riscv_const_insns (GEN_INT (INTVAL (operands[3]) >> INTVAL 
(operands[2]
+   || riscv_const_insns (GEN_INT (INTVAL (operands[3]) >> INTVAL 
(operands[2]))) == 0))"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (ashift:DI (match_dup 1) (match_dup 2)))
+   (set (match_dup 4) (match_dup 3))
+   (set (match_dup 0) (plus:DI (match_dup 0) (match_dup 4)))]
+  ""
+  [(set_attr "type" "arith")])
+
+(define_insn_and_split ""
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (sign_extend:DI (plus:SI (ashift:SI
+  (match_operand:SI 1 "register_operand" "r")
+  (match_operand 2 "const_int_operand" "n"))
+(match_operand 3 "const_int_operand" "n"
+   (clobber (match_scratch:DI 4 "="))]
+  "(TARGET_64BIT
+&& riscv_const_insns (operands[3])
+&& ((riscv_const_insns (operands[3])
+< riscv_const_insns (GEN_INT (INTVAL (operands[3]) >> INTVAL 
(operands[2]
+   || riscv_const_insns (GEN_INT (INTVAL (operands[3]) >> INTVAL 
(operands[2]))) == 0))"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (ashift:DI (match_dup 1) (match_dup 2)))
+   (set (match_dup 4) (match_dup 3))
+   (set (match_dup 0) (sign_extend:DI (plus:SI (match_dup 5) (match_dup 6]
+  "{
+ operands[1] = gen_lowpart (DImode, operands[1]);
+ operands[5] = gen_lowpart (SImode, operands[0]);
+ operands[6] = gen_lowpart (SImode, operands[4]);
+   }"
+  [(set_attr "type" "arith")])
+
+
 (include "bitmanip.md")
 (include "crypto.md")
 (include "sync.md")
diff --git a/gcc/testsuite/gcc.target/riscv/shift-add-1.c 
b/gcc/testsuite/gcc.target/riscv/shift-add-1.c
new file mode 100644
index ..d98875c32716
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shift-add-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zba_zbb_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+
+int composeFromSurrogate(const unsigned short high) {
+
+return  ((high - 0xD800) << 10) ;
+}
+
+
+long composeFromSurrogate_2(const unsigned long high) {
+
+return  ((high - 0xD800) << 10) ;
+}
+
+
+/* { dg-final { scan-assembler-times "\tli\t" 2 } } */
+/* { dg-final { scan-assembler-times "\tslli\t" 2 } } */
+/* { dg-final { scan-assembler-times "\taddw\t" 1 } } */
+/* { dg-final { scan-assembler-times "\tadd\t" 1 } } */
+


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Fix "Nan-box the result of movbf on soft-bf16"

2024-05-18 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:a5445260bd42d74aabe6c11d6207d113aafe2c8c

commit a5445260bd42d74aabe6c11d6207d113aafe2c8c
Author: Xiao Zeng 
Date:   Wed May 15 16:23:16 2024 +0800

RISC-V: Fix "Nan-box the result of movbf on soft-bf16"

1 According to unpriv-isa spec:


  1.1 "FMV.H.X moves the half-precision value encoded in IEEE 754-2008
  standard encoding from the lower 16 bits of integer register rs1
  to the floating-point register rd, NaN-boxing the result."
  1.2 "FMV.W.X moves the single-precision value encoded in IEEE 754-2008
  standard encoding from the lower 32 bits of integer register rs1
  to the floating-point register rd. The bits are not modified in the
  transfer, and in particular, the payloads of non-canonical NaNs are 
preserved."

2 When (!TARGET_ZFHMIN == true && TARGET_HARD_FLOAT == true), an instruction
needs to be added to complete the NaN-boxing, as done in
"RISC-V: Nan-box the result of movhf on soft-fp16":



3 Consider the "RISC-V: Nan-box the result of movbf on soft-bf16" in:


It ignores that both hf16 and bf16 are 16-bit floating-point types.

4 zfbfmin -> zfhmin in:
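
For reference, a scalar sketch (my illustration, not part of the patch)
of what the mask/ior sequence in riscv_legitimize_move computes before
the fmv.w.x:

#include <stdint.h>

/* NaN-box a 16-bit payload (HFmode or BFmode) into a 32-bit register
   image: the upper bits must be all ones, matching the
   (const_int -65536) mask in the emitted sequence.  */
static inline uint32_t
nan_box_16 (uint16_t bits)
{
  return UINT32_C (0xFFFF0000) | bits;
}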



gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_move): Optimize movbf
with Nan-boxing value.
* config/riscv/riscv.md (*movhf_softfloat_boxing): Expand movbf
with Nan-boxing value.
(*mov<mode>_softfloat_boxing): Ditto.
(*movbf_softfloat_boxing): Delete abandoned pattern.

(cherry picked from commit 7422e050f33dd9ee7dcd5a72c80b4e11d61995ce)

Diff:
---
 gcc/config/riscv/riscv.cc | 15 ++-
 gcc/config/riscv/riscv.md | 19 +--
 2 files changed, 11 insertions(+), 23 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 2be04ec6bc5e..7a34b4be873f 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3192,13 +3192,12 @@ riscv_legitimize_move (machine_mode mode, rtx dest, rtx 
src)
  (set (reg:SI/DI mask) (const_int -65536)
  (set (reg:SI/DI temp) (zero_extend:SI/DI (subreg:HI (reg:HF/BF src) 0)))
  (set (reg:SI/DI temp) (ior:SI/DI (reg:SI/DI mask) (reg:SI/DI temp)))
- (set (reg:HF/BF dest) (unspec:HF/BF[ (reg:SI/DI temp) ]
-   UNSPEC_FMV_SFP16_X/UNSPEC_FMV_SBF16_X))
- */
+ (set (reg:HF/BF dest) (unspec:HF/BF[ (reg:SI/DI temp) ] 
UNSPEC_FMV_FP16_X))
+  */
 
   if (TARGET_HARD_FLOAT
-  && ((!TARGET_ZFHMIN && mode == HFmode)
- || (!TARGET_ZFBFMIN && mode == BFmode))
+  && !TARGET_ZFHMIN
+  && (mode == HFmode || mode == BFmode)
   && REG_P (dest) && FP_REG_P (REGNO (dest))
   && REG_P (src) && !FP_REG_P (REGNO (src))
   && can_create_pseudo_p ())
@@ -3213,10 +3212,8 @@ riscv_legitimize_move (machine_mode mode, rtx dest, rtx 
src)
   else
emit_insn (gen_iordi3 (temp, mask, temp));
 
-  riscv_emit_move (dest,
-  gen_rtx_UNSPEC (mode, gen_rtvec (1, temp),
-  mode == HFmode ? UNSPEC_FMV_SFP16_X
- : UNSPEC_FMV_SBF16_X));
+  riscv_emit_move (dest, gen_rtx_UNSPEC (mode, gen_rtvec (1, temp),
+UNSPEC_FMV_FP16_X));
 
   return true;
 }
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 04f54cedad94..ff4557c1325f 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -87,8 +87,7 @@
   UNSPEC_STRLEN
 
   ;; Workaround for HFmode and BFmode without hardware extension
-  UNSPEC_FMV_SFP16_X
-  UNSPEC_FMV_SBF16_X
+  UNSPEC_FMV_FP16_X
 
   ;; XTheadFmv moves
   UNSPEC_XTHEADFMV
@@ -1959,23 +1958,15 @@
(set_attr "type" "fmove,move,load,store,mtc,mfc")
(set_attr "mode" "")])
 
-(define_insn "*movhf_softfloat_boxing"
-  [(set (match_operand:HF 0 "register_operand""=f")
-(unspec:HF [(match_operand:X 1 "register_operand" " r")] 
UNSPEC_FMV_SFP16_X))]
+(define_insn "*mov_softfloat_boxing"
+  [(set (match_operand:HFBF 0 "register_operand"   "=f")
+   (unspec:HFBF [(match_operand:X 1 "register_operand" " r")]
+UNSPEC_FMV_FP16_X))]
   "!TARGET_ZFHMIN"
   "fmv.w.x\t%0,%1"
   [(set_attr "type" "fmove")
(set_attr "mode" "SF")])
 
-(define_insn "*movbf_softfloat_boxing"
-  [(set (match_operand:BF 0 "register_operand"   "=f")
-   (unspec:BF [(match_operand:X 1 "register_operand" " r")]
-UNSPEC_FMV_SBF16_X))]
-  "!TARGET_ZFBFMIN"

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Modify _Bfloat16 to __bf16

2024-05-18 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:af9118f721e8d586049ff4a60ff7bc5507478344

commit af9118f721e8d586049ff4a60ff7bc5507478344
Author: Xiao Zeng 
Date:   Fri May 17 13:48:21 2024 +0800

RISC-V: Modify _Bfloat16 to __bf16

According to the description in:
,
the type representation symbol of BF16 has been corrected.

Kito Cheng pointed out relevant information in the email:


gcc/ChangeLog:

* config/riscv/riscv-builtins.cc (riscv_init_builtin_types):
Modify _Bfloat16 to __bf16.
* config/riscv/riscv.cc (riscv_mangle_type): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/_Bfloat16-nanboxing.c: Move to...
* gcc.target/riscv/__bf16-nanboxing.c: ...here.
* gcc.target/riscv/bf16_arithmetic.c: Modify _Bfloat16 to __bf16.
* gcc.target/riscv/bf16_call.c: Ditto.
* gcc.target/riscv/bf16_comparison.c: Ditto.
* gcc.target/riscv/bf16_float_libcall_convert.c: Ditto.
* gcc.target/riscv/bf16_integer_libcall_convert.c: Ditto.

(cherry picked from commit 6da1d6efde2282e6582c00d1631e7457975ad998)

Diff:
---
 gcc/config/riscv/riscv-builtins.cc   |  6 +++---
 gcc/config/riscv/riscv.cc|  2 +-
 .../riscv/{_Bfloat16-nanboxing.c => __bf16-nanboxing.c}  | 12 ++--
 gcc/testsuite/gcc.target/riscv/bf16_arithmetic.c |  6 +++---
 gcc/testsuite/gcc.target/riscv/bf16_call.c   |  4 ++--
 gcc/testsuite/gcc.target/riscv/bf16_comparison.c |  6 +++---
 gcc/testsuite/gcc.target/riscv/bf16_float_libcall_convert.c  |  2 +-
 .../gcc.target/riscv/bf16_integer_libcall_convert.c  |  2 +-
 8 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/gcc/config/riscv/riscv-builtins.cc 
b/gcc/config/riscv/riscv-builtins.cc
index 4c08834288ac..dc54e1a59b52 100644
--- a/gcc/config/riscv/riscv-builtins.cc
+++ b/gcc/config/riscv/riscv-builtins.cc
@@ -275,7 +275,7 @@ riscv_init_builtin_types (void)
 lang_hooks.types.register_builtin_type (riscv_float16_type_node,
"_Float16");
 
-  /* Provide the _Bfloat16 type and bfloat16_type_node if needed.  */
+  /* Provide the __bf16 type and bfloat16_type_node if needed.  */
   if (!bfloat16_type_node)
 {
   riscv_bfloat16_type_node = make_node (REAL_TYPE);
@@ -286,9 +286,9 @@ riscv_init_builtin_types (void)
   else
 riscv_bfloat16_type_node = bfloat16_type_node;
 
-  if (!maybe_get_identifier ("_Bfloat16"))
+  if (!maybe_get_identifier ("__bf16"))
 lang_hooks.types.register_builtin_type (riscv_bfloat16_type_node,
-   "_Bfloat16");
+   "__bf16");
 }
 
 /* Implement TARGET_INIT_BUILTINS.  */
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 9ac2be87acd2..2be04ec6bc5e 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -10276,7 +10276,7 @@ riscv_asan_shadow_offset (void)
 static const char *
 riscv_mangle_type (const_tree type)
 {
-  /* Half-precision float, _Float16 is "DF16_" and _Bfloat16 is "DF16b".  */
+  /* Half-precision float, _Float16 is "DF16_" and __bf16 is "DF16b".  */
   if (SCALAR_FLOAT_TYPE_P (type) && TYPE_PRECISION (type) == 16)
 {
   if (TYPE_MODE (type) == HFmode)
diff --git a/gcc/testsuite/gcc.target/riscv/_Bfloat16-nanboxing.c 
b/gcc/testsuite/gcc.target/riscv/__bf16-nanboxing.c
similarity index 83%
rename from gcc/testsuite/gcc.target/riscv/_Bfloat16-nanboxing.c
rename to gcc/testsuite/gcc.target/riscv/__bf16-nanboxing.c
index 11a73d222345..a9a586c98b9c 100644
--- a/gcc/testsuite/gcc.target/riscv/_Bfloat16-nanboxing.c
+++ b/gcc/testsuite/gcc.target/riscv/__bf16-nanboxing.c
@@ -1,14 +1,14 @@
 /* { dg-do compile } */
 /* { dg-options "-march=rv64ifd -mabi=lp64d -mcmodel=medlow -O" } */
 
-_Bfloat16 gvar = 9.87654;
+__bf16 gvar = 9.87654;
 union U
 {
   unsigned short i16;
-  _Bfloat16 f16;
+  __bf16 f16;
 };
 
-_Bfloat16
+__bf16
 test1 (unsigned short input)
 {
   union U tmp;
@@ -16,19 +16,19 @@ test1 (unsigned short input)
   return tmp.f16;
 }
 
-_Bfloat16
+__bf16
 test2 ()
 {
   return 1.234f;
 }
 
-_Bfloat16
+__bf16
 test3 ()
 {
   return gvar;
 }
 
-_Bfloat16
+__bf16
 test ()
 {
   return 0.0f;
diff --git a/gcc/testsuite/gcc.target/riscv/bf16_arithmetic.c 
b/gcc/testsuite/gcc.target/riscv/bf16_arithmetic.c
index 9e4850512600..190cc1d574a6 100644
--- a/gcc/testsuite/gcc.target/riscv/bf16_arithmetic.c
+++ b/gcc/testsuite/gcc.target/riscv/bf16_arithmetic.c
@@ -5,9 +5,9 @@
 /* 1) bf -> sf  (call  __extendbfsf2)  */
 /* 2) sf1 [+|-|*|/] sf2 (call  __[add|sub|mul|div]sf3)  */
 /* 3) sf -> bf  (call  __truncsfbf2)  */
-extern _Bfloat16 bf;
-extern _Bfloat16 bf1;

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Add initial cost handling for segment loads/stores.

2024-05-18 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:d6cb9a0d984a6c9ea0b548178a5cf79629be073b

commit d6cb9a0d984a6c9ea0b548178a5cf79629be073b
Author: Robin Dapp 
Date:   Mon Feb 26 13:09:15 2024 +0100

RISC-V: Add initial cost handling for segment loads/stores.

This patch makes segment loads and stores more expensive.  It adds
segment_permute_2 through segment_permute_8 cost fields to the common
vector costs and adds handling to adjust_stmt_cost.
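
For illustration (my example, not from the patch), a loop over
interleaved three-byte records vectorizes as a group-size-3 segment
access (vlseg3e8/vsseg3e8 on RVV), so adjust_stmt_cost now adds
segment_permute_3 for each vector in the group:

void
darken_rgb (unsigned char *rgb, int n)
{
  for (int i = 0; i < n; i++)
    {
      rgb[3 * i + 0] /= 2;
      rgb[3 * i + 1] /= 2;
      rgb[3 * i + 2] /= 2;
    }
}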

gcc/ChangeLog:

* config/riscv/riscv-protos.h (struct common_vector_cost): Add
segment_permute cost.
* config/riscv/riscv-vector-costs.cc (costs::adjust_stmt_cost):
Handle segment loads/stores.
* config/riscv/riscv.cc: Initialize segment_permute_[2-8] to 1.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/pr113112-4.c: Adjust test.

(cherry picked from commit e0b9c8ad7098fb08a25a61fe17d4274dd73e5145)

Diff:
---
 gcc/config/riscv/riscv-protos.h|   9 ++
 gcc/config/riscv/riscv-vector-costs.cc | 163 +++--
 gcc/config/riscv/riscv.cc  |  14 ++
 .../gcc.dg/vect/costmodel/riscv/rvv/pr113112-4.c   |   4 +-
 4 files changed, 146 insertions(+), 44 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 565ead1382a7..004ceb1031b8 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -222,6 +222,15 @@ struct common_vector_cost
   const int gather_load_cost;
   const int scatter_store_cost;
 
+  /* Segment load/store permute cost.  */
+  const int segment_permute_2;
+  const int segment_permute_3;
+  const int segment_permute_4;
+  const int segment_permute_5;
+  const int segment_permute_6;
+  const int segment_permute_7;
+  const int segment_permute_8;
+
   /* Cost of a vector-to-scalar operation.  */
   const int vec_to_scalar_cost;
 
diff --git a/gcc/config/riscv/riscv-vector-costs.cc 
b/gcc/config/riscv/riscv-vector-costs.cc
index 4582b0db4250..0a88e142a934 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -1052,6 +1052,25 @@ costs::better_main_loop_than_p (const vector_costs 
*uncast_other) const
   return vector_costs::better_main_loop_than_p (other);
 }
 
+/* Return the group size, i.e. the number of vectors to be loaded by a
+   segmented load/store instruction.  Return 0 if it is not a segmented
+   load/store.  */
+static int
+segment_loadstore_group_size (enum vect_cost_for_stmt kind,
+ stmt_vec_info stmt_info)
+{
+  if (stmt_info
+  && (kind == vector_load || kind == vector_store)
+  && STMT_VINFO_DATA_REF (stmt_info))
+{
+  stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
+  if (stmt_info
+ && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_LOAD_STORE_LANES)
+   return DR_GROUP_SIZE (stmt_info);
+}
+  return 0;
+}
+
 /* Adjust vectorization cost after calling riscv_builtin_vectorization_cost.
For some statement, we would like to further fine-grain tweak the cost on
top of riscv_builtin_vectorization_cost handling which doesn't have any
@@ -1076,55 +1095,115 @@ costs::adjust_stmt_cost (enum vect_cost_for_stmt kind, 
loop_vec_info loop,
 case vector_load:
 case vector_store:
{
- /* Unit-stride vector loads and stores do not have offset addressing
-as opposed to scalar loads and stores.
-If the address depends on a variable we need an additional
-add/sub for each load/store in the worst case.  */
- if (stmt_info && stmt_info->stmt)
+ if (stmt_info && stmt_info->stmt && STMT_VINFO_DATA_REF (stmt_info))
{
- data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
- class loop *father = stmt_info->stmt->bb->loop_father;
- if (!loop && father && !father->inner && father->superloops)
+ /* Segment loads and stores.  When the group size is > 1
+the vectorizer will add a vector load/store statement for
+each vector in the group.  Here we additionally add permute
+costs for each.  */
+ /* TODO: Indexed and ordered/unordered cost.  */
+ int group_size = segment_loadstore_group_size (kind, stmt_info);
+ if (group_size > 1)
+   {
+ switch (group_size)
+   {
+   case 2:
+ if (riscv_v_ext_vector_mode_p (loop->vector_mode))
+   stmt_cost += costs->vla->segment_permute_2;
+ else
+   stmt_cost += costs->vls->segment_permute_2;
+ break;
+   case 3:
+ if (riscv_v_ext_vector_mode_p (loop->vector_mode))
+   stmt_cost += costs->vla->segment_permute_3;
+ else
+   

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Implement IFN SAT_ADD for both the scalar and vector

2024-05-18 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:db2b829f4d45c6f14724148d1f8b2066290b3371

commit db2b829f4d45c6f14724148d1f8b2066290b3371
Author: Pan Li 
Date:   Fri May 17 18:49:46 2024 +0800

RISC-V: Implement IFN SAT_ADD for both the scalar and vector

The patch implement the SAT_ADD in the riscv backend as the
sample for both the scalar and vector.  Given below vector
as example:

void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  unsigned i;

  for (i = 0; i < n; i++)
out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
}

Before this patch:
vec_sat_add_u64:
  ...
  vsetvli a5,a3,e64,m1,ta,ma
  vle64.v v0,0(a1)
  vle64.v v1,0(a2)
  slli    a4,a5,3
  sub a3,a3,a5
  add a1,a1,a4
  add a2,a2,a4
  vadd.vv v1,v0,v1
  vmsgtu.vv   v0,v0,v1
  vmerge.vim  v1,v1,-1,v0
  vse64.v v1,0(a0)
  ...

After this patch:
vec_sat_add_u64:
  ...
  vsetvli a5,a3,e64,m1,ta,ma
  vle64.v v1,0(a1)
  vle64.v v2,0(a2)
  slli    a4,a5,3
  sub a3,a3,a5
  add a1,a1,a4
  add a2,a2,a4
  vsaddu.vv   v1,v1,v2  <=  Vector Single-Width Saturating Add
  vse64.v v1,0(a0)
  ...

The below test suites are passed for this patch.
* The riscv fully regression tests.
* The aarch64 fully regression tests.
* The x86 bootstrap tests.
* The x86 fully regression tests.

PR target/51492
PR target/112600

gcc/ChangeLog:

* config/riscv/autovec.md (usadd<mode>3): New pattern expand for
the unsigned SAT_ADD in vector mode.
* config/riscv/riscv-protos.h (riscv_expand_usadd): New func decl
to expand usadd<mode>3 pattern.
(expand_vec_usadd): Ditto but for vector.
* config/riscv/riscv-v.cc (emit_vec_saddu): New func impl to emit
the vsadd insn.
(expand_vec_usadd): New func impl to expand usadd<mode>3 for vector.
* config/riscv/riscv.cc (riscv_expand_usadd): New func impl to
expand usadd<mode>3 for scalar.
* config/riscv/riscv.md (usadd<mode>3): New pattern expand for
the unsigned SAT_ADD in scalar mode.
* config/riscv/vector.md: Allow VLS mode for vsaddu.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c: New 
test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c: New 
test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c: New 
test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c: New 
test.
* gcc.target/riscv/sat_arith.h: New test.
* gcc.target/riscv/sat_u_add-1.c: New test.
* gcc.target/riscv/sat_u_add-2.c: New test.
* gcc.target/riscv/sat_u_add-3.c: New test.
* gcc.target/riscv/sat_u_add-4.c: New test.
* gcc.target/riscv/sat_u_add-run-1.c: New test.
* gcc.target/riscv/sat_u_add-run-2.c: New test.
* gcc.target/riscv/sat_u_add-run-3.c: New test.
* gcc.target/riscv/sat_u_add-run-4.c: New test.
* gcc.target/riscv/scalar_sat_binary.h: New test.

Signed-off-by: Pan Li 
(cherry picked from commit 34ed2b4593fa98b613632d0dde30b6ba3e7ecad9)

Diff:
---
 gcc/config/riscv/autovec.md| 17 +
 gcc/config/riscv/riscv-protos.h|  2 +
 gcc/config/riscv/riscv-v.cc| 19 ++
 gcc/config/riscv/riscv.cc  | 55 
 gcc/config/riscv/riscv.md  | 11 
 gcc/config/riscv/vector.md | 12 ++--
 .../riscv/rvv/autovec/binop/vec_sat_binary.h   | 33 ++
 .../riscv/rvv/autovec/binop/vec_sat_u_add-1.c  | 19 ++
 .../riscv/rvv/autovec/binop/vec_sat_u_add-2.c  | 20 ++
 .../riscv/rvv/autovec/binop/vec_sat_u_add-3.c  | 20 ++
 .../riscv/rvv/autovec/binop/vec_sat_u_add-4.c  | 20 ++
 .../riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c  | 75 ++
 .../riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c  | 75 ++
 .../riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c  | 75 ++
 .../riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c  | 75 ++
 gcc/testsuite/gcc.target/riscv/sat_arith.h | 31 +
 gcc/testsuite/gcc.target/riscv/sat_u_add-1.c   | 19 ++
 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] internal-fn: Do not force vcond_mask operands to reg.

2024-05-18 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:17dfc9744f4995d3161eeba104bd86391005769b

commit 17dfc9744f4995d3161eeba104bd86391005769b
Author: Robin Dapp 
Date:   Fri May 10 12:44:44 2024 +0200

internal-fn: Do not force vcond_mask operands to reg.

In order to directly use constants this patch removes force_regs
in the vcond_mask expander.

gcc/ChangeLog:

PR middle-end/113474

* internal-fn.cc (expand_vec_cond_mask_optab_fn):  Remove
force_regs.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr113474.c: New test.

(cherry picked from commit 7ca35f2e430081d6ec91e910002f92d9713350fa)

Diff:
---
 gcc/internal-fn.cc|  3 ---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113474.c | 13 +
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 73045ca8c8c1..9c09026793fa 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3165,9 +3165,6 @@ expand_vec_cond_mask_optab_fn (internal_fn, gcall *stmt, 
convert_optab optab)
   rtx_op1 = expand_normal (op1);
   rtx_op2 = expand_normal (op2);
 
-  mask = force_reg (mask_mode, mask);
-  rtx_op1 = force_reg (mode, rtx_op1);
-
   rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
   create_output_operand (&ops[0], target, mode);
   create_input_operand (&ops[1], rtx_op1, mode);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113474.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113474.c
new file mode 100644
index ..0364bf9f5e38
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113474.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target riscv_v } }  */
+/* { dg-additional-options "-std=c99" }  */
+
+void
+foo (int n, int **a)
+{
+  int b;
+  for (b = 0; b < n; b++)
+for (long e = 8; e > 0; e--)
+  a[b][e] = a[b][e] == 15;
+}
+
+/* { dg-final { scan-assembler "vmerge.vim" } }  */


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Cleanup some temporally files [NFC]

2024-05-18 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:586e678cd18c8d7a72e5f785094d911a098092ff

commit 586e678cd18c8d7a72e5f785094d911a098092ff
Author: Pan Li 
Date:   Fri May 17 07:45:19 2024 +0800

RISC-V: Cleanup some temporally files [NFC]

Just noticed some temporary files under gcc/config/riscv;
deleted them as useless.

* Empty file j.
* Vim swap file.

gcc/ChangeLog:

* config/riscv/.riscv.cc.swo: Removed.
* config/riscv/j: Removed.

Signed-off-by: Pan Li 
(cherry picked from commit d477d683d5c6db90c80d348c795709ae6444ba7a)

Diff:
---
 gcc/config/riscv/.riscv.cc.swo | Bin 417792 -> 0 bytes
 gcc/config/riscv/j |   0
 2 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/gcc/config/riscv/.riscv.cc.swo b/gcc/config/riscv/.riscv.cc.swo
deleted file mode 100644
index 77ed37353bee..
Binary files a/gcc/config/riscv/.riscv.cc.swo and /dev/null differ
diff --git a/gcc/config/riscv/j b/gcc/config/riscv/j
deleted file mode 100644
index e69de29bb2d1..


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Enable vectorizable early exit testsuite

2024-05-18 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:c1ad575242ff3dee66f2775412b1c65efbc2269b

commit c1ad575242ff3dee66f2775412b1c65efbc2269b
Author: Pan Li 
Date:   Thu May 16 10:04:10 2024 +0800

RISC-V: Enable vectorizable early exit testsuite

After supporting vectorizable early exit in RISC-V, we would like to
enable the gcc vect tests for vectorizable early exit.

The vect-early-break_124-pr114403.c test fails to vectorize for now,
because an 8-byte __builtin_memcpy is not yet folded into an int64
assignment during ccp1.  We will improve that first and mark this
test as xfail for RISC-V.
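
A sketch (mine) of the missing fold: an 8-byte __builtin_memcpy between
suitably aligned objects should become a single 64-bit load/store
during ccp1, which is what the xfailed test relies on.

#include <string.h>
#include <stdint.h>

void
copy8 (uint64_t *dst, const uint64_t *src)
{
  memcpy (dst, src, 8);   /* ideally folded to: *dst = *src;  */
}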

The below tests are passed for this patch:
1. The riscv fully regression tests.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-mask-store-1.c: Add pragma novector as it will
have 2 times LOOP VECTORIZED in RISC-V.
* gcc.dg/vect/vect-early-break_124-pr114403.c: Xfail for the
riscv backend.
* lib/target-supports.exp: Add RISC-V backend.

Signed-off-by: Pan Li 
(cherry picked from commit 556e777298dac8574533935000c57335c5232921)

Diff:
---
 gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c  | 2 ++
 gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c | 2 +-
 gcc/testsuite/lib/target-supports.exp | 2 ++
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c 
b/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c
index fdd9032da98a..2f80bf89e5e6 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c
@@ -28,6 +28,8 @@ main ()
 
   if (__builtin_memcmp (x, res, sizeof (x)) != 0)
 abort ();
+
+#pragma GCC novector
   for (int i = 0; i < 32; ++i)
 if (flag[i] != 0 && flag[i] != 1)
   abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
index 51abf245ccb5..101ae1e0eaa1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
@@ -2,7 +2,7 @@
 /* { dg-require-effective-target vect_early_break_hw } */
 /* { dg-require-effective-target vect_long_long } */
 
-/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { xfail riscv*-*-* } } 
} */
 
 #include "tree-vect.h"
 
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 3a55b2a4159c..6c828b73ded3 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4105,6 +4105,7 @@ proc check_effective_target_vect_early_break { } {
|| [check_effective_target_arm_v8_neon_ok]
|| [check_effective_target_sse4]
|| [istarget amdgcn-*-*]
+   || [check_effective_target_riscv_v]
}}]
 }
 
@@ -4120,6 +4121,7 @@ proc check_effective_target_vect_early_break_hw { } {
|| [check_effective_target_arm_v8_neon_hw]
|| [check_sse4_hw_available]
|| [istarget amdgcn-*-*]
+   || [check_effective_target_riscv_v_ok]
}}]
 }


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Implement vectorizable early exit with vcond_mask_len

2024-05-18 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:b1aab03aed7f3d8c9b104b5f596e7e9853b8d5e6

commit b1aab03aed7f3d8c9b104b5f596e7e9853b8d5e6
Author: Pan Li 
Date:   Thu May 16 10:02:40 2024 +0800

RISC-V: Implement vectorizable early exit with vcond_mask_len

After we support the loop lens for the vectorizable early exit, we would
like to implement the feature for the RISC-V target.  Given the below example:

unsigned vect_a[1923];
unsigned vect_b[1923];

void test (unsigned limit, int n)
{
  for (int i = 0; i < n; i++)
{
  vect_b[i] = limit + i;

  if (vect_a[i] > limit)
{
  ret = vect_b[i];
  return ret;
}

  vect_a[i] = limit;
}
}

Before this patch:
  ...
.L8:
  sw    a3,0(a5)
  addiw a0,a0,1
  addi  a4,a4,4
  addi  a5,a5,4
  beq   a1,a0,.L2
.L4:
  sw    a0,0(a4)
  lw    a2,0(a5)
  bleu  a2,a3,.L8
  ret

After this patch:
  ...
.L5:
  vsetvli   a5,a3,e8,mf4,ta,ma
  vmv1r.v   v4,v2
  vsetvli   t4,zero,e32,m1,ta,ma
  vmv.v.x   v1,a5
  vadd.vv   v2,v2,v1
  vsetvli   zero,a5,e32,m1,ta,ma
  vadd.vv   v5,v4,v3
  slli  a6,a5,2
  vle32.v   v1,0(t1)
  vmsltu.vv v1,v3,v1
  vcpop.m   t4,v1
  beq   t4,zero,.L4
  vmv.x.s   a4,v4
.L3:
  ...

The below tests are passed for this patch:
1. The riscv fully regression tests.

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*vcond_mask_len_popcount_<mode>):
New pattern of vcond_mask_len_popcount for vector bool mode.
* config/riscv/autovec.md (vcond_mask_len_<mode>): New pattern of
vcond_mask_len for vector bool mode.
(cbranch<mode>4): New pattern for vector bool mode.
* config/riscv/vector-iterators.md: Add new unspec UNSPEC_SELECT_MASK.
* config/riscv/vector.md (@pred_popcount<VB:mode><P:mode>): Add VLS mode
to popcount pattern.
(@pred_popcount<VB_VLS:mode><P:mode>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/early-break-1.c: New test.
* gcc.target/riscv/rvv/autovec/early-break-2.c: New test.

Signed-off-by: Pan Li 
(cherry picked from commit 6c1de786e53a11150feb16ba990d0d6c6fd910db)

Diff:
---
 gcc/config/riscv/autovec-opt.md| 33 
 gcc/config/riscv/autovec.md| 61 ++
 gcc/config/riscv/vector-iterators.md   |  1 +
 gcc/config/riscv/vector.md | 18 +++
 .../gcc.target/riscv/rvv/autovec/early-break-1.c   | 34 
 .../gcc.target/riscv/rvv/autovec/early-break-2.c   | 37 +
 6 files changed, 175 insertions(+), 9 deletions(-)

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 645dc53d8680..04f85d8e4553 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -1436,3 +1436,36 @@
 DONE;
   }
   [(set_attr "type" "vmalu")])
+
+;; Optimization pattern for early break auto-vectorization
+;; vcond_mask_len (mask, ones, zeros, len, bias) + vlmax popcount
+;; -> non vlmax popcount (mask, len)
+(define_insn_and_split "*vcond_mask_len_popcount_"
+  [(set (match_operand:P 0 "register_operand")
+(popcount:P
+ (unspec:VB_VLS [
+  (unspec:VB_VLS [
+   (match_operand:VB_VLS 1 "register_operand")
+   (match_operand:VB_VLS 2 "const_1_operand")
+   (match_operand:VB_VLS 3 "const_0_operand")
+   (match_operand 4 "autovec_length_operand")
+   (match_operand 5 "const_0_operand")] UNSPEC_SELECT_MASK)
+  (match_operand 6 "autovec_length_operand")
+  (const_int 1)
+  (reg:SI VL_REGNUM)
+  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)))]
+  "TARGET_VECTOR
+   && can_create_pseudo_p ()
+   && riscv_vector::get_vector_mode (Pmode, GET_MODE_NUNITS (<MODE>mode)).exists ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+riscv_vector::emit_nonvlmax_insn (
+   code_for_pred_popcount (<MODE>mode, Pmode),
+   riscv_vector::CPOP_OP,
+   operands, operands[4]);
+DONE;
+  }
+  [(set_attr "type" "vector")]
+)
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index aa1ae0fe075b..1ee3c8052fb4 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2612,3 +2612,64 @@
 DONE;
   }
 )
+
+;; =
+;; == Early break auto-vectorization patterns
+;; =
+
+;; vcond_mask_len (mask, 1s, 0s, len, bias)
+;; => mask[i] = mask[i] && i < len ? 1 : 0
+(define_insn_and_split "vcond_mask_len_"
+  [(set (match_operand:VB 0 "register_operand")
+(unspec: VB [
+ (match_operand:VB 1 "register_operand")
+ (match_operand:VB 2 "const_1_operand")
+ (match_operand:VB 3 "const_0_operand")
+ (match_operand 4 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] Vect: Support loop len in vectorizable early exit

2024-05-18 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:4ec3a6b6022c1853cfd5866dea0324a4002413b2

commit 4ec3a6b6022c1853cfd5866dea0324a4002413b2
Author: Pan Li 
Date:   Thu May 16 09:58:13 2024 +0800

Vect: Support loop len in vectorizable early exit

This patch adds early break auto-vectorization support for targets which
use length on partial vectorization.  Consider the following example:

unsigned vect_a[802];
unsigned vect_b[802];

void test (unsigned x, int n)
{
  for (int i = 0; i < n; i++)
  {
vect_b[i] = x + i;

if (vect_a[i] > x)
  break;

vect_a[i] = x;
  }
}

We use VCOND_MASK_LEN to simulate generating (mask && i < len + bias).
And then the IR of RVV looks like below:

  ...
  _87 = .SELECT_VL (ivtmp_85, POLY_INT_CST [32, 32]);
  _55 = (int) _87;
  ...
  mask_patt_6.13_69 = vect_cst__62 < vect__3.12_67;
  vec_len_mask_72 = .VCOND_MASK_LEN (mask_patt_6.13_69, { -1, ... }, \
{0, ... }, _87, 0);
  if (vec_len_mask_72 != { 0, ... })
goto ; [5.50%]
  else
goto ; [94.50%]
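
A scalar model (my sketch, not part of the patch) of what one lane of
.VCOND_MASK_LEN (mask, { -1, ... }, { 0, ... }, len, bias) computes:

static inline long
vcond_mask_len_lane (long mask_lane, long i, long len, int bias)
{
  /* Lanes at or beyond len + bias select the all-zeros operand, so the
     early-break compare mask is squashed outside the active length.  */
  return (i < len + bias) ? mask_lane : 0;
}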

The below tests are passed for this patch:
1. The riscv fully regression tests.
2. The x86 bootstrap tests.
3. The x86 fully regression tests.

gcc/ChangeLog:

* tree-vect-loop.cc (vect_gen_loop_len_mask): New func to gen
the loop len mask.
* tree-vect-stmts.cc (vectorizable_early_exit): Invoke the
vect_gen_loop_len_mask for 1 or more stmt(s).
* tree-vectorizer.h (vect_gen_loop_len_mask): New func decl
for vect_gen_loop_len_mask.

Signed-off-by: Pan Li 
(cherry picked from commit 57f8a2f67c1536be23231808ab00613ab69193ed)

Diff:
---
 gcc/tree-vect-loop.cc  | 27 +++
 gcc/tree-vect-stmts.cc | 17 +++--
 gcc/tree-vectorizer.h  |  4 
 3 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 29c03c246d45..6ff3ca09dc6a 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -11394,6 +11394,33 @@ vect_get_loop_len (loop_vec_info loop_vinfo, 
gimple_stmt_iterator *gsi,
   return loop_len;
 }
 
+/* Generate the tree for the loop len mask and return it.  Given the lens,
+   nvectors, vectype, index and factor to gen the len mask as below.
+
+   tree len_mask = VCOND_MASK_LEN (compare_mask, ones, zero, len, bias)
+*/
+tree
+vect_gen_loop_len_mask (loop_vec_info loop_vinfo, gimple_stmt_iterator *gsi,
+   gimple_stmt_iterator *cond_gsi, vec_loop_lens *lens,
+   unsigned int nvectors, tree vectype, tree stmt,
+   unsigned int index, unsigned int factor)
+{
+  tree all_one_mask = build_all_ones_cst (vectype);
+  tree all_zero_mask = build_zero_cst (vectype);
+  tree len = vect_get_loop_len (loop_vinfo, gsi, lens, nvectors, vectype, 
index,
+   factor);
+  tree bias = build_int_cst (intQI_type_node,
+LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo));
+  tree len_mask = make_temp_ssa_name (TREE_TYPE (stmt), NULL, "vec_len_mask");
+  gcall *call = gimple_build_call_internal (IFN_VCOND_MASK_LEN, 5, stmt,
+   all_one_mask, all_zero_mask, len,
+   bias);
+  gimple_call_set_lhs (call, len_mask);
+  gsi_insert_before (cond_gsi, call, GSI_SAME_STMT);
+
+  return len_mask;
+}
+
 /* Scale profiling counters by estimation for LOOP which is vectorized
by factor VF.
If FLAT is true, the loop we started with had unrealistically flat
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index f8d8636b139a..d592dff73e33 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12893,7 +12893,9 @@ vectorizable_early_exit (vec_info *vinfo, stmt_vec_info 
stmt_info,
 ncopies = vect_get_num_copies (loop_vinfo, vectype);
 
   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
   bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+  bool len_loop_p = LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo);
 
   /* Now build the new conditional.  Pattern gimple_conds get dropped during
  codegen so we must replace the original insn.  */
@@ -12957,12 +12959,11 @@ vectorizable_early_exit (vec_info *vinfo, 
stmt_vec_info stmt_info,
{
  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
  OPTIMIZE_FOR_SPEED))
-   return false;
+   vect_record_loop_len (loop_vinfo, lens, ncopies, vectype, 1);
  else
vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
}
 
-
   return true;
 }
 
@@ -13015,6 +13016,15 @@ vectorizable_early_exit (vec_info *vinfo, 
stmt_vec_info stmt_info,
  

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

2024-05-18 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:51b69c80a76ba767ed166e93a569a84dae445b23

commit 51b69c80a76ba767ed166e93a569a84dae445b23
Author: Pan Li 
Date:   Wed May 15 10:14:05 2024 +0800

Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

This patch would like to add the middle-end representation for the
saturation add, i.e. set the result of the add to the maximum value on
overflow.  It will take a pattern similar to the one below.

SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))

Take uint8_t as example, we will have:

* SAT_ADD (1, 254)   => 255.
* SAT_ADD (1, 255)   => 255.
* SAT_ADD (2, 255)   => 255.
* SAT_ADD (255, 255) => 255.
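
For reference, the branchless form above computes the same result as
this obvious branchy version (my sketch):

#include <stdint.h>

uint8_t
sat_add_u8 (uint8_t x, uint8_t y)
{
  unsigned sum = (unsigned) x + y;
  return sum > 255 ? 255 : (uint8_t) sum;
}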

Given below example for the unsigned scalar integer uint64_t:

uint64_t sat_add_u64 (uint64_t x, uint64_t y)
{
  return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
}

Before this patch:
uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
  long unsigned int _1;
  _Bool _2;
  long unsigned int _3;
  long unsigned int _4;
  uint64_t _7;
  long unsigned int _10;
  __complex__ long unsigned int _11;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
  _1 = REALPART_EXPR <_11>;
  _10 = IMAGPART_EXPR <_11>;
  _2 = _10 != 0;
  _3 = (long unsigned int) _2;
  _4 = -_3;
  _7 = _1 | _4;
  return _7;
;;succ:   EXIT

}

After this patch:
uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
  uint64_t _7;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
  return _7;
;;succ:   EXIT
}

The below tests are passed for this patch:
1. The riscv fully regression tests.
2. The x86 bootstrap tests.
3. The x86 fully regression tests.

PR target/51492
PR target/112600

gcc/ChangeLog:

* internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
to the return true switch case(s).
* internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
* match.pd: Add unsigned SAT_ADD match(es).
* optabs.def (OPTAB_NL): Remove fixed-point limitation for
us/ssadd.
* tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New
extern func decl generated in match.pd match.
(match_saturation_arith): New func impl to match the saturation 
arith.
(math_opts_dom_walker::after_dom_children): Try match saturation
arith when IOR expr.

Signed-off-by: Pan Li 
(cherry picked from commit 52b0536710ff3f3ace72ab00ce9ef6c630cd1183)

Diff:
---
 gcc/internal-fn.cc|  1 +
 gcc/internal-fn.def   |  2 ++
 gcc/match.pd  | 51 +++
 gcc/optabs.def|  4 ++--
 gcc/tree-ssa-math-opts.cc | 32 +
 5 files changed, 88 insertions(+), 2 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 0a7053c2286c..73045ca8c8c1 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4202,6 +4202,7 @@ commutative_binary_fn_p (internal_fn fn)
 case IFN_UBSAN_CHECK_MUL:
 case IFN_ADD_OVERFLOW:
 case IFN_MUL_OVERFLOW:
+case IFN_SAT_ADD:
 case IFN_VEC_WIDEN_PLUS:
 case IFN_VEC_WIDEN_PLUS_LO:
 case IFN_VEC_WIDEN_PLUS_HI:
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 848bb9dbff3f..25badbb86e56 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -275,6 +275,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST | 
ECF_NOTHROW, first,
 DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW, first,
  smulhrs, umulhrs, binary)
 
+DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd, binary)
+
 DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
 DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
 DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
diff --git a/gcc/match.pd b/gcc/match.pd
index d401e7503e62..aa1e2875c604 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3043,6 +3043,57 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
|| POINTER_TYPE_P (itype))
   && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))
 
+/* Unsigned Saturation Add */
+(match (usadd_left_part_1 @0 @1)
+ (plus:c @0 @1)
+ (if (INTEGRAL_TYPE_P (type)
+  && TYPE_UNSIGNED (TREE_TYPE (@0))
+  && types_match (type, TREE_TYPE (@0))
+  && types_match (type, TREE_TYPE (@1)
+
+(match (usadd_left_part_2 @0 @1)
+ (realpart (IFN_ADD_OVERFLOW:c @0 @1))
+ (if (INTEGRAL_TYPE_P (type)
+  && TYPE_UNSIGNED (TREE_TYPE (@0))
+  && types_match (type, TREE_TYPE (@0))
+  && types_match (type, TREE_TYPE (@1)
+
+(match (usadd_right_part_1 @0 @1)
+ (negate (convert (lt (plus:c @0 @1) @0)))
+ (if (INTEGRAL_TYPE_P 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] Vect: Support new IFN SAT_ADD for unsigned vector int

2024-05-18 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:674362d73e964815cdb700edd9fedbfc34c24c21

commit 674362d73e964815cdb700edd9fedbfc34c24c21
Author: Pan Li 
Date:   Wed May 15 10:14:06 2024 +0800

Vect: Support new IFN SAT_ADD for unsigned vector int

For vectorization, we leverage the existing vect pattern recog to find
the same pattern as in the scalar case and let the vectorizer perform
the rest for the standard name usadd<mode>3 in vector mode.
The riscv vector backend has the insn "Vector Single-Width Saturating
Add and Subtract", which can be leveraged when expanding usadd<mode>3
in vector mode.  For example:

void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  unsigned i;

  for (i = 0; i < n; i++)
out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
}

Before this patch:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]);
  ivtmp_58 = _80 * 8;
  vect__4.7_61 = .MASK_LEN_LOAD (vectp_x.5_59, 64B, { -1, ... }, _80, 0);
  vect__6.10_65 = .MASK_LEN_LOAD (vectp_y.8_63, 64B, { -1, ... }, _80, 0);
  vect__7.11_66 = vect__4.7_61 + vect__6.10_65;
  mask__8.12_67 = vect__4.7_61 > vect__7.11_66;
  vect__12.15_72 = .VCOND_MASK (mask__8.12_67, { 18446744073709551615,
... }, vect__7.11_66);
  .MASK_LEN_STORE (vectp_out.16_74, 64B, { -1, ... }, _80, 0, 
vect__12.15_72);
  vectp_x.5_60 = vectp_x.5_59 + ivtmp_58;
  vectp_y.8_64 = vectp_y.8_63 + ivtmp_58;
  vectp_out.16_75 = vectp_out.16_74 + ivtmp_58;
  ivtmp_79 = ivtmp_78 - _80;
  ...
}

After this patch:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _62 = .SELECT_VL (ivtmp_60, POLY_INT_CST [2, 2]);
  ivtmp_46 = _62 * 8;
  vect__4.7_49 = .MASK_LEN_LOAD (vectp_x.5_47, 64B, { -1, ... }, _62, 0);
  vect__6.10_53 = .MASK_LEN_LOAD (vectp_y.8_51, 64B, { -1, ... }, _62, 0);
  vect__12.11_54 = .SAT_ADD (vect__4.7_49, vect__6.10_53);
  .MASK_LEN_STORE (vectp_out.12_56, 64B, { -1, ... }, _62, 0, 
vect__12.11_54);
  ...
}

The below test suites are passed for this patch.
* The riscv fully regression tests.
* The x86 bootstrap tests.
* The x86 fully regression tests.

PR target/51492
PR target/112600

gcc/ChangeLog:

* tree-vect-patterns.cc (gimple_unsigned_integer_sat_add): New
func decl generated by match.pd match.
(vect_recog_sat_add_pattern): New func impl to recog the pattern
for unsigned SAT_ADD.

Signed-off-by: Pan Li 
(cherry picked from commit d4dee347b3fe1982bab26485ff31cd039c9df010)

Diff:
---
 gcc/tree-vect-patterns.cc | 52 +++
 1 file changed, 52 insertions(+)

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 87c2acff386d..6fd2373644f4 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -4487,6 +4487,57 @@ vect_recog_mult_pattern (vec_info *vinfo,
   return pattern_stmt;
 }
 
+extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
+
+/*
+ * Try to detect saturation add pattern (SAT_ADD), aka below gimple:
+ *   _7 = _4 + _6;
+ *   _8 = _4 > _7;
+ *   _9 = (long unsigned int) _8;
+ *   _10 = -_9;
+ *   _12 = _7 | _10;
+ *
 * And then simplified to
+ *   _12 = .SAT_ADD (_4, _6);
+ */
+
+static gimple *
+vect_recog_sat_add_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
+   tree *type_out)
+{
+  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
+
+  if (!is_gimple_assign (last_stmt))
+return NULL;
+
+  tree res_ops[2];
+  tree lhs = gimple_assign_lhs (last_stmt);
+
+  if (gimple_unsigned_integer_sat_add (lhs, res_ops, NULL))
+{
+  tree itype = TREE_TYPE (res_ops[0]);
+  tree vtype = get_vectype_for_scalar_type (vinfo, itype);
+
+  if (vtype != NULL_TREE
+   && direct_internal_fn_supported_p (IFN_SAT_ADD, vtype,
+  OPTIMIZE_FOR_BOTH))
+   {
+ *type_out = vtype;
+ gcall *call = gimple_build_call_internal (IFN_SAT_ADD, 2, res_ops[0],
+   res_ops[1]);
+
+ gimple_call_set_lhs (call, vect_recog_temp_ssa_var (itype, NULL));
+ gimple_call_set_nothrow (call, /* nothrow_p */ false);
+ gimple_set_location (call, gimple_location (last_stmt));
+
+ vect_pattern_detected ("vect_recog_sat_add_pattern", last_stmt);
+ return call;
+   }
+}
+
+  return NULL;
+}
+
 /* Detect a signed division by a constant that wouldn't be
otherwise vectorized:
 
@@ -6987,6 +7038,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
   { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
   { vect_recog_divmod_pattern, "divmod" },
   { 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: testsuite: Drop march-string in cmpmemsi/cpymemsi tests

2024-05-18 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:faf2f9ed73969d838026027566473bde14db748b

commit faf2f9ed73969d838026027566473bde14db748b
Author: Christoph Müllner 
Date:   Thu May 16 09:53:47 2024 +0200

RISC-V: testsuite: Drop march-string in cmpmemsi/cpymemsi tests

The tests cmpmemsi-1.c and cpymemsi-1.c are execution ("dg-do run")
tests, which do not have any restrictions on the enabled extensions.
Further, no other listed options are required.
Let's drop the options, so that the test can also be executed on
non-f and non-d targets.  However, we need to set options to the
defaults without '-ansi', because the included test file uses the
'asm' keyword, which is not part of ANSI C.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmpmemsi-1.c: Drop options.
* gcc.target/riscv/cpymemsi-1.c: Likewise.

Signed-off-by: Christoph Müllner 
(cherry picked from commit b8b82bb05c10544da05cd0d3d39e6bc3763a8d9f)

Diff:
---
 gcc/testsuite/gcc.target/riscv/cmpmemsi-1.c | 3 +--
 gcc/testsuite/gcc.target/riscv/cpymemsi-1.c | 4 +---
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/cmpmemsi-1.c 
b/gcc/testsuite/gcc.target/riscv/cmpmemsi-1.c
index d7e0bc474073..698f27d89fbf 100644
--- a/gcc/testsuite/gcc.target/riscv/cmpmemsi-1.c
+++ b/gcc/testsuite/gcc.target/riscv/cmpmemsi-1.c
@@ -1,6 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-march=rv32gc_zbb -save-temps -g0 -fno-lto" { target { rv32 } 
} } */
-/* { dg-options "-march=rv64gc_zbb -save-temps -g0 -fno-lto" { target { rv64 } 
} } */
+/* { dg-options "-pedantic-errors" } */
 /* { dg-timeout-factor 2 } */
 
 #include "../../gcc.dg/memcmp-1.c"
diff --git a/gcc/testsuite/gcc.target/riscv/cpymemsi-1.c 
b/gcc/testsuite/gcc.target/riscv/cpymemsi-1.c
index 983b564ccaf7..30e9f119bedc 100644
--- a/gcc/testsuite/gcc.target/riscv/cpymemsi-1.c
+++ b/gcc/testsuite/gcc.target/riscv/cpymemsi-1.c
@@ -1,7 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-march=rv32gc -save-temps -g0 -fno-lto" { target { rv32 } } } 
*/
-/* { dg-options "-march=rv64gc -save-temps -g0 -fno-lto" { target { rv64 } } } 
*/
-/* { dg-additional-options "-DRUN_FRACTION=11" { target simulator } } */
+/* { dg-options "-pedantic-errors" } */
 /* { dg-timeout-factor 2 } */
 
 #include "../../gcc.dg/memcmp-1.c"


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Add Zvfbfwma extension to the -march= option

2024-05-18 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:67195fbc4deac8659d8f65ab922416ac451ae5bb

commit 67195fbc4deac8659d8f65ab922416ac451ae5bb
Author: Xiao Zeng 
Date:   Wed May 15 10:03:40 2024 +0800

RISC-V: Add Zvfbfwma extension to the -march= option

This patch would like to add a new sub-extension (aka Zvfbfwma) to the
-march= option.  It introduces a new data type, BF16.

1 In spec: "Zvfbfwma requires the Zvfbfmin extension and the Zfbfmin 
extension."
  1.1 In Embedded Processor: Zvfbfwma -> Zvfbfmin -> Zve32f
  1.2 In Application Processor: Zvfbfwma -> Zvfbfmin -> V
  1.3 In both scenarios, there are: Zvfbfwma -> Zfbfmin

2 Zvfbfmin's information is in:



3 Zfbfmin's information is in:



4 Depending on different usage scenarios, the Zvfbfwma extension may
depend on 'V' or 'Zve32f'. This patch only implements dependencies in
scenario of Embedded Processor. This is consistent with the processing
strategy in Zvfbfmin. In scenario of Application Processor, it is
necessary to explicitly indicate the dependent 'V' extension.

5 You can locate more information about Zvfbfwma from below spec doc:



gcc/ChangeLog:

* common/config/riscv/riscv-common.cc:
(riscv_implied_info): Add zvfbfwma item.
(riscv_ext_version_table): Ditto.
(riscv_ext_flag_table): Ditto.
* config/riscv/riscv.opt:
(MASK_ZVFBFWMA): New macro.
(TARGET_ZVFBFWMA): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-37.c: New test.
* gcc.target/riscv/arch-38.c: New test.
* gcc.target/riscv/predef-36.c: New test.
* gcc.target/riscv/predef-37.c: New test.

(cherry picked from commit 38dd4e26e07c6be7cf4d169141ee4f3a03f3a09d)

Diff:
---
 gcc/common/config/riscv/riscv-common.cc|  5 
 gcc/config/riscv/riscv.opt |  2 ++
 gcc/testsuite/gcc.target/riscv/arch-37.c   |  5 
 gcc/testsuite/gcc.target/riscv/arch-38.c   |  5 
 gcc/testsuite/gcc.target/riscv/predef-36.c | 48 ++
 gcc/testsuite/gcc.target/riscv/predef-37.c | 48 ++
 6 files changed, 113 insertions(+)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index fb76017ffbc0..88204393fde0 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -162,6 +162,8 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"zfa", "f"},
 
   {"zvfbfmin", "zve32f"},
+  {"zvfbfwma", "zvfbfmin"},
+  {"zvfbfwma", "zfbfmin"},
   {"zvfhmin", "zve32f"},
   {"zvfh", "zve32f"},
   {"zvfh", "zfhmin"},
@@ -336,6 +338,7 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
   {"zfh",   ISA_SPEC_CLASS_NONE, 1, 0},
   {"zfhmin",ISA_SPEC_CLASS_NONE, 1, 0},
   {"zvfbfmin",  ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zvfbfwma",  ISA_SPEC_CLASS_NONE, 1, 0},
   {"zvfhmin",   ISA_SPEC_CLASS_NONE, 1, 0},
   {"zvfh",  ISA_SPEC_CLASS_NONE, 1, 0},
 
@@ -1667,6 +1670,7 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
   {"zve64f",   _options::x_riscv_vector_elen_flags, 
MASK_VECTOR_ELEN_FP_32},
   {"zve64d",   _options::x_riscv_vector_elen_flags, 
MASK_VECTOR_ELEN_FP_64},
   {"zvfbfmin", _options::x_riscv_vector_elen_flags, 
MASK_VECTOR_ELEN_BF_16},
+  {"zvfbfwma", _options::x_riscv_vector_elen_flags, 
MASK_VECTOR_ELEN_BF_16},
   {"zvfhmin",  _options::x_riscv_vector_elen_flags, 
MASK_VECTOR_ELEN_FP_16},
   {"zvfh", _options::x_riscv_vector_elen_flags, 
MASK_VECTOR_ELEN_FP_16},
 
@@ -1704,6 +1708,7 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
   {"zfhmin",_options::x_riscv_zf_subext, MASK_ZFHMIN},
   {"zfh",   _options::x_riscv_zf_subext, MASK_ZFH},
   {"zvfbfmin",  _options::x_riscv_zf_subext, MASK_ZVFBFMIN},
+  {"zvfbfwma",  _options::x_riscv_zf_subext, MASK_ZVFBFWMA},
   {"zvfhmin",   _options::x_riscv_zf_subext, MASK_ZVFHMIN},
   {"zvfh",  _options::x_riscv_zf_subext, MASK_ZVFH},
 
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 1252834aec5b..d209ac896fde 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -401,6 +401,8 @@ Mask(ZFH) Var(riscv_zf_subext)
 
 Mask(ZVFBFMIN) Var(riscv_zf_subext)
 
+Mask(ZVFBFWMA) Var(riscv_zf_subext)
+
 Mask(ZVFHMIN) Var(riscv_zf_subext)
 
 Mask(ZVFH)Var(riscv_zf_subext)
diff --git a/gcc/testsuite/gcc.target/riscv/arch-37.c 
b/gcc/testsuite/gcc.target/riscv/arch-37.c
new file mode 100644
index ..5b19a73c5567
--- /dev/null
+++ 

Re: [PATCH] RISC-V: Fix "Nan-box the result of movbf on soft-bf16"

2024-05-17 Thread Jeff Law




On 5/15/24 7:55 PM, Xiao Zeng wrote:

1 According to unpriv-isa spec:

   1.1 "FMV.H.X moves the half-precision value encoded in IEEE 754-2008
   standard encoding from the lower 16 bits of integer register rs1
   to the floating-point register rd, NaN-boxing the result."
   1.2 "FMV.W.X moves the single-precision value encoded in IEEE 754-2008
   standard encoding from the lower 32 bits of integer register rs1
   to the floating-point register rd. The bits are not modified in the
   transfer, and in particular, the payloads of non-canonical NaNs are 
preserved."

2 When (!TARGET_ZFHMIN == true && TARGET_HARD_FLOAT == true), an instruction
needs to be added to complete the NaN-boxing, as done in
"RISC-V: Nan-box the result of movhf on soft-fp16":


3 Consider the "RISC-V: Nan-box the result of movbf on soft-bf16" in:

It ignores that both hf16 and bf16 are 16-bit floating-point types.

4 zfbfmin -> zfhmin in:


gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_move): Optimize movbf
with Nan-boxing value.
* config/riscv/riscv.md (*movhf_softfloat_boxing): Expand movbf
with Nan-boxing value.
(*mov<mode>_softfloat_boxing): Ditto.
(*movbf_softfloat_boxing): Delete abandoned pattern.
---
  gcc/config/riscv/riscv.cc | 15 +--
  gcc/config/riscv/riscv.md | 19 +--
  2 files changed, 10 insertions(+), 24 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 4067505270e..04513537aad 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3178,13 +3178,10 @@ riscv_legitimize_move (machine_mode mode, rtx dest, rtx 
src)
   (set (reg:SI/DI mask) (const_int -65536)
   (set (reg:SI/DI temp) (zero_extend:SI/DI (subreg:HI (reg:HF/BF src) 0)))
   (set (reg:SI/DI temp) (ior:SI/DI (reg:SI/DI mask) (reg:SI/DI temp)))
- (set (reg:HF/BF dest) (unspec:HF/BF[ (reg:SI/DI temp) ]
-   UNSPEC_FMV_SFP16_X/UNSPEC_FMV_SBF16_X))
- */
+ (set (reg:HF/BF dest) (unspec:HF/BF[ (reg:SI/DI temp) ] 
UNSPEC_FMV_FP16_X))
+  */
  
-  if (TARGET_HARD_FLOAT

-  && ((!TARGET_ZFHMIN && mode == HFmode)
- || (!TARGET_ZFBFMIN && mode == BFmode))
+  if (TARGET_HARD_FLOAT && !TARGET_ZFHMIN && (mode == HFmode || mode == BFmode)
We generally prefer not to mix && and || operators on the same line. 
I'd suggest


if (TARGET_HARD_FLOAT
&& !TARGET_ZFHMIN
&& (mode == HFmode || mode == BFmode)
[ ... ]



@@ -1959,23 +1958,15 @@
 (set_attr "type" "fmove,move,load,store,mtc,mfc")
 (set_attr "mode" "")])
  
-(define_insn "*movhf_softfloat_boxing"

-  [(set (match_operand:HF 0 "register_operand""=f")
-(unspec:HF [(match_operand:X 1 "register_operand" " r")] 
UNSPEC_FMV_SFP16_X))]
+(define_insn "*mov_softfloat_boxing"
+  [(set (match_operand:HFBF 0 "register_operand" "=f")
+(unspec:HFBF [(match_operand:X 1 "register_operand" " r")]
+UNSPEC_FMV_FP16_X))]
"!TARGET_ZFHMIN"
I think the linter complained about having 8 spaces instead of a tab in 
one of the lines above.


With those fixes, this is fine for the trunk.

jeff


Re: [PATCH] RISC-V: Modify _Bfloat16 to __bf16

2024-05-17 Thread Jeff Law




On 5/17/24 2:19 AM, Kito Cheng wrote:

LGTM, thanks for fixing this :)
And just to be clear for Xiao, you can go ahead and commit this patch to 
the trunk.  An ACK from Kito, Juzhe, Palmer, Robin or myself is all you 
need for a change that is isolated to RISC-V code.


jeff



Re: [PATCH] RISC-V: Remove dead perm series code and document.

2024-05-17 Thread Jeff Law




On 5/17/24 9:27 AM, Robin Dapp wrote:

Hi,

with the introduction of shuffle_series_patterns the explicit handler
code for a perm series is dead.  This patch removes it and also adds
a function-level comment to shuffle_series_patterns.

Regtested on rv64gcv_zvfh_zvbb.

Regards
  Robin

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Document.
(shuffle_extract_and_slide1up_patterns): Remove.

OK.

Jeff



Re: [PATCH v1] RISC-V: Cleanup some temporary files [NFC]

2024-05-17 Thread Jeff Law




On 5/16/24 6:12 PM, Li, Pan2 wrote:

Committed, thanks Juzhe.

Thanks for cleaning up my little mess!  Sorry about that.

jeff



Re: [PATCH gcc-13] Fix RISC-V missing stack tie

2024-05-16 Thread Jeff Law




On 5/16/24 12:24 PM, Palmer Dabbelt wrote:



gcc/
* config/riscv/riscv.cc (riscv_expand_prologue): Add missing stack
tie for scalable and final stack adjustment if needed.

Co-authored-by: Raphael Zinsly 

(cherry picked from commit c65046ff2ef0a9a46e59bc0b3369b2d226f6a239)
---
I've only build tested this one, but it's tripping up some of the Fedora
folks here https://bugzilla.redhat.com/show_bug.cgi?id=2242327 so I
figured it's worth backporting.
Yes, that's the original report from Florian that led Raphael and me 
to dive in.  Definitely worth backporting.


jeff



Re: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

2024-05-16 Thread Jeff Law




On 5/16/24 5:58 AM, Richard Biener wrote:

On Thu, May 16, 2024 at 11:35 AM Li, Pan2  wrote:



OK.


Thanks Richard for the help and coaching. To double-check: are you OK with 
this patch only, or with the whole series of SAT middle-end patches?
Thanks again for the review and suggestions.


For the series, the riscv specific part of course needs riscv approval.
Yea, we'll take a look at it.  Tons of stuff to go through, but this is 
definitely on the list.


jeff



Re: [PATCH] tree-optimization/13962 - handle ptr-ptr compares in ptrs_compare_unequal

2024-05-16 Thread Jeff Law




On 5/16/24 6:03 AM, Richard Biener wrote:

Now that we handle pt.null conservatively we can implement the missing
tracking of constant pool entries (aka STRING_CST) and handle
ptr-ptr compares using points-to info in ptrs_compare_unequal.
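
For intuition, the kind of compare this lets the compiler fold looks
roughly like the following (illustrative C based on my reading of the
referenced PRs; the exact cases folded depend on the computed points-to
solution):

#include <stdlib.h>

/* A fresh malloc result cannot point into the constant pool, so once
   const-pool (STRING_CST) entries are tracked in the points-to
   solution, the equality below can fold to 0.  */
int
cmp_against_literal (void)
{
  char *p = malloc (4);
  return p == "abc";
}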

Bootstrapped on x86_64-unknown-linux-gnu, (re-)testing in progress.

Richard.

PR tree-optimization/13962
PR tree-optimization/96564
* tree-ssa-alias.h (pt_solution::const_pool): New flag.
* tree-ssa-alias.cc (ptrs_compare_unequal): Handle pointer-pointer
compares.
(dump_points_to_solution): Dump the const_pool flag, fix guard
of flag dumping.
* gimple-pretty-print.cc (pp_points_to_solution): Likewise.
* tree-ssa-structalias.cc (find_what_var_points_to): Set
the const_pool flag for STRING.
(pt_solution_ior_into): Handle the const_pool flag.
(ipa_escaped_pt): Initialize it.

* gcc.dg/tree-ssa/alias-39.c: New testcase.
* g++.dg/vect/pr68145.cc: Use -fno-tree-pta to keep the UB from
manifesting as an ICE once the transforms no longer vectorize
this testcase.
You might want to test this against 92539 as well.  There's a nonzero 
chance it'll resolve that one.


jeff



[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] Add missing hunk in recent change.

2024-05-15 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:45c5684c8242add5e97a392374dc160a6e68f2f0

commit 45c5684c8242add5e97a392374dc160a6e68f2f0
Author: Jeff Law 
Date:   Wed May 15 17:05:24 2024 -0600

Add missing hunk in recent change.

gcc/
* config/riscv/riscv-string.cc: Add missing hunk from last change.

(cherry picked from commit d7e6fe0f72ad41b8361f927d2796dbc275347297)

Diff:
---
 gcc/config/riscv/riscv-string.cc | 177 +++
 1 file changed, 177 insertions(+)

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index cbb9724d2308..83e7afbd693b 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -627,6 +627,183 @@ riscv_expand_strlen (rtx result, rtx src, rtx 
search_char, rtx align)
   return false;
 }
 
+/* Generate the sequence of load and compares for memcmp using Zbb.
+
+   RESULT is the register where the return value of memcmp will be stored.
+   The source pointers are SRC1 and SRC2 (NBYTES bytes to compare).
+   DATA1 and DATA2 are registers where the data chunks will be stored.
+   DIFF_LABEL is the location of the code that calculates the return value.
+   FINAL_LABEL is the location of the code that comes after the calculation
+   of the return value.  */
+
+static void
+emit_memcmp_scalar_load_and_compare (rtx result, rtx src1, rtx src2,
+unsigned HOST_WIDE_INT nbytes,
+rtx data1, rtx data2,
+rtx diff_label, rtx final_label)
+{
+  const unsigned HOST_WIDE_INT xlen = GET_MODE_SIZE (Xmode);
+  unsigned HOST_WIDE_INT offset = 0;
+
+  while (nbytes > 0)
+{
+  unsigned HOST_WIDE_INT cmp_bytes = xlen < nbytes ? xlen : nbytes;
+  machine_mode load_mode;
+
+  /* Special cases to avoid masking of trailing bytes.  */
+  if (cmp_bytes == 1)
+   load_mode = QImode;
+  else if (cmp_bytes == 2)
+   load_mode = HImode;
+  else if (cmp_bytes == 4)
+   load_mode = SImode;
+  else
+   load_mode = Xmode;
+
+  rtx addr1 = adjust_address (src1, load_mode, offset);
+  do_load (load_mode, data1, addr1);
+  rtx addr2 = adjust_address (src2, load_mode, offset);
+  do_load (load_mode, data2, addr2);
+
+  /* Fast-path for a single byte.  */
+  if (cmp_bytes == 1)
+   {
+ rtx tmp = gen_reg_rtx (Xmode);
+ do_sub3 (tmp, data1, data2);
+ emit_insn (gen_movsi (result, gen_lowpart (SImode, tmp)));
+ emit_jump_insn (gen_jump (final_label));
+ emit_barrier (); /* No fall-through.  */
+ return;
+   }
+
+  /* Shift off trailing bytes in words if needed.  */
+  unsigned int load_bytes = GET_MODE_SIZE (load_mode).to_constant ();
+  if (cmp_bytes < load_bytes)
+   {
+ int shamt = (load_bytes - cmp_bytes) * BITS_PER_UNIT;
+ do_ashl3 (data1, data1, GEN_INT (shamt));
+ do_ashl3 (data2, data2, GEN_INT (shamt));
+   }
+
+  /* Break out if data1 != data2 */
+  rtx cond = gen_rtx_NE (VOIDmode, data1, data2);
+  emit_unlikely_jump_insn (gen_cbranch4 (Pmode, cond, data1,
+data2, diff_label));
+  /* Fall-through on equality.  */
+
+  offset += cmp_bytes;
+  nbytes -= cmp_bytes;
+}
+}
+
+/* memcmp result calculation.
+
+   RESULT is the register where the return value will be stored.
+   The two data chunks are in DATA1 and DATA2.  */
+
+static void
+emit_memcmp_scalar_result_calculation (rtx result, rtx data1, rtx data2)
+{
+  /* Get bytes in big-endian order and compare as words.  */
+  do_bswap2 (data1, data1);
+  do_bswap2 (data2, data2);
+  /* Synthesize (data1 >= data2) ? 1 : -1 in a branchless sequence.  */
+  rtx tmp = gen_reg_rtx (Xmode);
+  emit_insn (gen_slt_3 (LTU, Xmode, Xmode, tmp, data1, data2));
+  do_neg2 (tmp, tmp);
+  do_ior3 (tmp, tmp, const1_rtx);
+  emit_insn (gen_movsi (result, gen_lowpart (SImode, tmp)));
+}
+
+/* Expand memcmp using scalar instructions (incl. Zbb).
+
+   RESULT is the register where the return value will be stored.
+   The source pointers are SRC1 and SRC2 (NBYTES bytes to compare).  */
+
+static bool
+riscv_expand_block_compare_scalar (rtx result, rtx src1, rtx src2, rtx nbytes)
+{
+  const unsigned HOST_WIDE_INT xlen = GET_MODE_SIZE (Xmode);
+
+  if (optimize_function_for_size_p (cfun))
+return false;
+
+  /* We don't support big endian.  */
+  if (BYTES_BIG_ENDIAN)
+return false;
+
+  if (!CONST_INT_P (nbytes))
+return false;
+
+  /* We need the rev (bswap) instruction.  */
+  if (!TARGET_ZBB)
+return false;
+
+  unsigned HOST_WIDE_INT length = UINTVAL (nbytes);
+
+  /* Limit to 12-bits (maximum load-offset).  */
+  if (length > IMM_REACH)
+length = IMM_REACH;
+
+  /* We need xlen-aligned memory.  */
+  unsigned HOST_WIDE_INT align = MIN (MEM_ALIGN (src1), MEM_ALIGN (src2));
+  if (align < (xlen * BITS_PER_

[gcc r15-527] Add missing hunk in recent change.

2024-05-15 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:d7e6fe0f72ad41b8361f927d2796dbc275347297

commit r15-527-gd7e6fe0f72ad41b8361f927d2796dbc275347297
Author: Jeff Law 
Date:   Wed May 15 17:05:24 2024 -0600

Add missing hunk in recent change.

gcc/
* config/riscv/riscv-string.cc: Add missing hunk from last change.

Diff:
---
 gcc/config/riscv/riscv-string.cc | 177 +++
 1 file changed, 177 insertions(+)

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index cbb9724d2308..83e7afbd693b 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -627,6 +627,183 @@ riscv_expand_strlen (rtx result, rtx src, rtx 
search_char, rtx align)
   return false;
 }
 
+/* Generate the sequence of load and compares for memcmp using Zbb.
+
+   RESULT is the register where the return value of memcmp will be stored.
+   The source pointers are SRC1 and SRC2 (NBYTES bytes to compare).
+   DATA1 and DATA2 are registers where the data chunks will be stored.
+   DIFF_LABEL is the location of the code that calculates the return value.
+   FINAL_LABEL is the location of the code that comes after the calculation
+   of the return value.  */
+
+static void
+emit_memcmp_scalar_load_and_compare (rtx result, rtx src1, rtx src2,
+unsigned HOST_WIDE_INT nbytes,
+rtx data1, rtx data2,
+rtx diff_label, rtx final_label)
+{
+  const unsigned HOST_WIDE_INT xlen = GET_MODE_SIZE (Xmode);
+  unsigned HOST_WIDE_INT offset = 0;
+
+  while (nbytes > 0)
+{
+  unsigned HOST_WIDE_INT cmp_bytes = xlen < nbytes ? xlen : nbytes;
+  machine_mode load_mode;
+
+  /* Special cases to avoid masking of trailing bytes.  */
+  if (cmp_bytes == 1)
+   load_mode = QImode;
+  else if (cmp_bytes == 2)
+   load_mode = HImode;
+  else if (cmp_bytes == 4)
+   load_mode = SImode;
+  else
+   load_mode = Xmode;
+
+  rtx addr1 = adjust_address (src1, load_mode, offset);
+  do_load (load_mode, data1, addr1);
+  rtx addr2 = adjust_address (src2, load_mode, offset);
+  do_load (load_mode, data2, addr2);
+
+  /* Fast-path for a single byte.  */
+  if (cmp_bytes == 1)
+   {
+ rtx tmp = gen_reg_rtx (Xmode);
+ do_sub3 (tmp, data1, data2);
+ emit_insn (gen_movsi (result, gen_lowpart (SImode, tmp)));
+ emit_jump_insn (gen_jump (final_label));
+ emit_barrier (); /* No fall-through.  */
+ return;
+   }
+
+  /* Shift off trailing bytes in words if needed.  */
+  unsigned int load_bytes = GET_MODE_SIZE (load_mode).to_constant ();
+  if (cmp_bytes < load_bytes)
+   {
+ int shamt = (load_bytes - cmp_bytes) * BITS_PER_UNIT;
+ do_ashl3 (data1, data1, GEN_INT (shamt));
+ do_ashl3 (data2, data2, GEN_INT (shamt));
+   }
+
+  /* Break out if data1 != data2 */
+  rtx cond = gen_rtx_NE (VOIDmode, data1, data2);
+  emit_unlikely_jump_insn (gen_cbranch4 (Pmode, cond, data1,
+data2, diff_label));
+  /* Fall-through on equality.  */
+
+  offset += cmp_bytes;
+  nbytes -= cmp_bytes;
+}
+}
+
+/* memcmp result calculation.
+
+   RESULT is the register where the return value will be stored.
+   The two data chunks are in DATA1 and DATA2.  */
+
+static void
+emit_memcmp_scalar_result_calculation (rtx result, rtx data1, rtx data2)
+{
+  /* Get bytes in big-endian order and compare as words.  */
+  do_bswap2 (data1, data1);
+  do_bswap2 (data2, data2);
+  /* Synthesize (data1 >= data2) ? 1 : -1 in a branchless sequence.  */
+  rtx tmp = gen_reg_rtx (Xmode);
+  emit_insn (gen_slt_3 (LTU, Xmode, Xmode, tmp, data1, data2));
+  do_neg2 (tmp, tmp);
+  do_ior3 (tmp, tmp, const1_rtx);
+  emit_insn (gen_movsi (result, gen_lowpart (SImode, tmp)));
+}
+
+/* Expand memcmp using scalar instructions (incl. Zbb).
+
+   RESULT is the register where the return value will be stored.
+   The source pointers are SRC1 and SRC2 (NBYTES bytes to compare).  */
+
+static bool
+riscv_expand_block_compare_scalar (rtx result, rtx src1, rtx src2, rtx nbytes)
+{
+  const unsigned HOST_WIDE_INT xlen = GET_MODE_SIZE (Xmode);
+
+  if (optimize_function_for_size_p (cfun))
+return false;
+
+  /* We don't support big endian.  */
+  if (BYTES_BIG_ENDIAN)
+return false;
+
+  if (!CONST_INT_P (nbytes))
+return false;
+
+  /* We need the rev (bswap) instruction.  */
+  if (!TARGET_ZBB)
+return false;
+
+  unsigned HOST_WIDE_INT length = UINTVAL (nbytes);
+
+  /* Limit to 12-bits (maximum load-offset).  */
+  if (length > IMM_REACH)
+length = IMM_REACH;
+
+  /* We need xlen-aligned memory.  */
+  unsigned HOST_WIDE_INT align = MIN (MEM_ALIGN (src1), MEM_ALIGN (src2));
+  if (align < (xlen * BITS_PER_UNIT))
+return false;
+
+  if (length > RISCV_MAX_MOVE_BYTES_STRAIG

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [v2, 2/2] RISC-V: strcmp expansion: Use adjust_address() for address calculation

2024-05-15 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:72e6ff2bcf293116099988ebd367182cba699e9b

commit 72e6ff2bcf293116099988ebd367182cba699e9b
Author: Christoph Müllner 
Date:   Wed May 15 12:19:40 2024 -0600

[v2,2/2] RISC-V: strcmp expansion: Use adjust_address() for address 
calculation

We have an arch-independent routine to generate an address with an offset.
Let's use that instead of doing the calculation in the backend.

gcc/ChangeLog:

* config/riscv/riscv-string.cc 
(emit_strcmp_scalar_load_and_compare):
Use adjust_address() to calculate MEM-PLUS pattern.

(cherry picked from commit 1fbbae1d4ba3618a3da829a6d7e11a1606a583b3)

Diff:
---
 gcc/config/riscv/riscv-string.cc | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 8f3b6f925e01..cbb9724d2308 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -227,8 +227,6 @@ emit_strcmp_scalar_load_and_compare (rtx result, rtx src1, 
rtx src2,
 rtx final_label)
 {
   const unsigned HOST_WIDE_INT xlen = GET_MODE_SIZE (Xmode);
-  rtx src1_addr = force_reg (Pmode, XEXP (src1, 0));
-  rtx src2_addr = force_reg (Pmode, XEXP (src2, 0));
   unsigned HOST_WIDE_INT offset = 0;
 
   rtx testval = gen_reg_rtx (Xmode);
@@ -246,10 +244,10 @@ emit_strcmp_scalar_load_and_compare (rtx result, rtx 
src1, rtx src2,
   else
load_mode = Xmode;
 
-  rtx addr1 = gen_rtx_PLUS (Pmode, src1_addr, GEN_INT (offset));
-  do_load_from_addr (load_mode, data1, addr1, src1);
-  rtx addr2 = gen_rtx_PLUS (Pmode, src2_addr, GEN_INT (offset));
-  do_load_from_addr (load_mode, data2, addr2, src2);
+  rtx addr1 = adjust_address (src1, load_mode, offset);
+  do_load (load_mode, data1, addr1);
+  rtx addr2 = adjust_address (src2, load_mode, offset);
+  do_load (load_mode, data2, addr2);
 
   if (cmp_bytes == 1)
{


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [v2, 1/2] RISC-V: Add cmpmemsi expansion

2024-05-15 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:d57dfea6e051695349fb9f6da1c30899b7f5

commit d57dfea6e051695349fb9f6da1c30899b7f5
Author: Christoph Müllner 
Date:   Wed May 15 12:18:20 2024 -0600

[v2,1/2] RISC-V: Add cmpmemsi expansion

GCC has a generic cmpmemsi expansion via the by-pieces framework,
which shows some room for target-specific optimizations.
E.g. for comparing two aligned memory blocks of 15 bytes
we get the following sequence:

my_mem_cmp_aligned_15:
li  a4,0
j   .L2
.L8:
bgeu a4,a7,.L7
.L2:
add a2,a0,a4
add a3,a1,a4
lbu a5,0(a2)
lbu a6,0(a3)
addi a4,a4,1
li  a7,15 // missed hoisting
subw a5,a5,a6
andi a5,a5,0xff // useless
beq a5,zero,.L8
lbu a0,0(a2) // loading again!
lbu a5,0(a3) // loading again!
subw a0,a0,a5
ret
.L7:
li  a0,0
ret

Diff first byte: 15 insns
Diff second byte: 25 insns
No diff: 25 insns

Possible improvements:
* unroll the loop and use load-with-displacement to avoid offset increments
* load and compare multiple (aligned) bytes at once
* Use the bitmanip/strcmp result calculation (reverse words and
  synthesize (a2 >= a3) ? 1 : -1 in a branchless sequence)

When applying these improvements we get the following sequence:

my_mem_cmp_aligned_15:
ld  a5,0(a0)
ld  a4,0(a1)
bne a5,a4,.L2
ld  a5,8(a0)
ld  a4,8(a1)
slli a5,a5,8
slli a4,a4,8
bne a5,a4,.L2
li  a0,0
.L3:
sext.w  a0,a0
ret
.L2:
rev8 a5,a5
rev8 a4,a4
sltu a5,a5,a4
neg a5,a5
ori a0,a5,1
j   .L3

Diff first byte: 11 insns
Diff second byte: 16 insns
No diff: 11 insns
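
As an aside, the branchless tail under .L2 above can be written in C as
follows (a sketch assuming 64-bit words that have already been
byte-swapped into big-endian order; names are mine):

#include <stdint.h>

/* rev8/sltu/neg/ori in C: called only for unequal words, returns -1
   if data1 < data2 (unsigned), else +1, which is exactly memcmp's
   sign convention after the byte swap.  */
static inline int
branchless_sign (uint64_t data1, uint64_t data2)
{
  uint64_t lt = data1 < data2;   /* sltu: 0 or 1   */
  int64_t r = -(int64_t) lt;     /* neg:  0 or -1  */
  return (int) (r | 1);          /* ori:  1 or -1  */
}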

This patch implements these improvements.

The tests consist of an execution test (similar to
gcc/testsuite/gcc.dg/torture/inline-mem-cmp-1.c) and a few tests
that test the expansion conditions (known length and alignment).

Similar to the cpymemsi expansion this patch does not introduce any
gating for the cmpmemsi expansion (on top of requiring the known length,
alignment and Zbb).

Bootstrapped and SPEC CPU 2017 tested.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_expand_block_compare): New
prototype.
* config/riscv/riscv-string.cc (GEN_EMIT_HELPER2): New helper
for zero_extendhi.
(do_load_from_addr): Add support for HI and SI/64 modes.
(do_load): Add helper for zero-extended loads.
(emit_memcmp_scalar_load_and_compare): New helper to emit memcmp.
(emit_memcmp_scalar_result_calculation): Likewise.
(riscv_expand_block_compare_scalar): Likewise.
(riscv_expand_block_compare): New RISC-V expander for memory 
compare.
* config/riscv/riscv.md (cmpmemsi): New cmpmem expansion.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmpmemsi-1.c: New test.
* gcc.target/riscv/cmpmemsi-2.c: New test.
* gcc.target/riscv/cmpmemsi-3.c: New test.
* gcc.target/riscv/cmpmemsi.c: New test.

(cherry picked from commit 4bf1aa1ab90dd487fadc27c86523ec3562b2d2fe)

Diff:
---
 gcc/config/riscv/riscv-protos.h |  1 +
 gcc/config/riscv/riscv-string.cc| 40 +--
 gcc/config/riscv/riscv.md   | 15 ++
 gcc/testsuite/gcc.target/riscv/cmpmemsi-1.c |  6 
 gcc/testsuite/gcc.target/riscv/cmpmemsi-2.c | 42 
 gcc/testsuite/gcc.target/riscv/cmpmemsi-3.c | 43 +
 gcc/testsuite/gcc.target/riscv/cmpmemsi.c   | 22 +++
 7 files changed, 155 insertions(+), 14 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5c8a52b78a22..565ead1382a7 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -189,6 +189,7 @@ rtl_opt_pass * make_pass_avlprop (gcc::context *ctxt);
 rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
 
 /* Routines implemented in riscv-string.c.  */
+extern bool riscv_expand_block_compare (rtx, rtx, rtx, rtx);
 extern bool riscv_expand_block_move (rtx, rtx, rtx);
 extern bool riscv_expand_block_clear (rtx, rtx);
 
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 96394844bbb6..8f3b6f925e01 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -86,35 +86,47 @@ GEN_EMIT_HELPER2(th_rev) /* do_th_rev2  */
 

Re: [PATCH v2 1/2] RISC-V: Add cmpmemsi expansion

2024-05-15 Thread Jeff Law




On 5/15/24 12:49 AM, Christoph Müllner wrote:

GCC has a generic cmpmemsi expansion via the by-pieces framework,
which shows some room for target-specific optimizations.
E.g. for comparing two aligned memory blocks of 15 bytes
we get the following sequence:

my_mem_cmp_aligned_15:
 li  a4,0
 j   .L2
.L8:
 bgeu a4,a7,.L7
.L2:
 add a2,a0,a4
 add a3,a1,a4
 lbu a5,0(a2)
 lbu a6,0(a3)
 addi a4,a4,1
 li  a7,15 // missed hoisting
 subw a5,a5,a6
 andi a5,a5,0xff // useless
 beq a5,zero,.L8
 lbu a0,0(a2) // loading again!
 lbu a5,0(a3) // loading again!
 subw a0,a0,a5
 ret
.L7:
 li  a0,0
 ret

Diff first byte: 15 insns
Diff second byte: 25 insns
No diff: 25 insns

Possible improvements:
* unroll the loop and use load-with-displacement to avoid offset increments
* load and compare multiple (aligned) bytes at once
* Use the bitmanip/strcmp result calculation (reverse words and
   synthesize (a2 >= a3) ? 1 : -1 in a branchless sequence)

When applying these improvements we get the following sequence:

my_mem_cmp_aligned_15:
 ld  a5,0(a0)
 ld  a4,0(a1)
 bne a5,a4,.L2
 ld  a5,8(a0)
 ld  a4,8(a1)
 slli a5,a5,8
 slli a4,a4,8
 bne a5,a4,.L2
 li  a0,0
.L3:
 sext.w  a0,a0
 ret
.L2:
 rev8 a5,a5
 rev8 a4,a4
 sltu a5,a5,a4
 neg a5,a5
 ori a0,a5,1
 j   .L3

Diff first byte: 11 insns
Diff second byte: 16 insns
No diff: 11 insns

This patch implements these improvements.

The tests consist of an execution test (similar to
gcc/testsuite/gcc.dg/torture/inline-mem-cmp-1.c) and a few tests
that test the expansion conditions (known length and alignment).

Similar to the cpymemsi expansion this patch does not introduce any
gating for the cmpmemsi expansion (on top of requiring the known length,
alignment and Zbb).

Bootstrapped and SPEC CPU 2017 tested.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_expand_block_compare): New
prototype.
* config/riscv/riscv-string.cc (GEN_EMIT_HELPER2): New helper
for zero_extendhi.
(do_load_from_addr): Add support for HI and SI/64 modes.
(do_load): Add helper for zero-extended loads.
(emit_memcmp_scalar_load_and_compare): New helper to emit memcmp.
(emit_memcmp_scalar_result_calculation): Likewise.
(riscv_expand_block_compare_scalar): Likewise.
(riscv_expand_block_compare): New RISC-V expander for memory compare.
* config/riscv/riscv.md (cmpmemsi): New cmpmem expansion.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmpmemsi-1.c: New test.
* gcc.target/riscv/cmpmemsi-2.c: New test.
* gcc.target/riscv/cmpmemsi-3.c: New test.
* gcc.target/riscv/cmpmemsi.c: New test.

[ ... ]
I fixed some of the nits from the linter (whitespace stuff) and pushed 
both patches of this series.


Jeff



[gcc r15-525] [v2, 2/2] RISC-V: strcmp expansion: Use adjust_address() for address calculation

2024-05-15 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:1fbbae1d4ba3618a3da829a6d7e11a1606a583b3

commit r15-525-g1fbbae1d4ba3618a3da829a6d7e11a1606a583b3
Author: Christoph Müllner 
Date:   Wed May 15 12:19:40 2024 -0600

[v2,2/2] RISC-V: strcmp expansion: Use adjust_address() for address 
calculation

We have an arch-independent routine to generate an address with an offset.
Let's use that instead of doing the calculation in the backend.

gcc/ChangeLog:

* config/riscv/riscv-string.cc 
(emit_strcmp_scalar_load_and_compare):
Use adjust_address() to calculate MEM-PLUS pattern.

Diff:
---
 gcc/config/riscv/riscv-string.cc | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 8f3b6f925e01..cbb9724d2308 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -227,8 +227,6 @@ emit_strcmp_scalar_load_and_compare (rtx result, rtx src1, 
rtx src2,
 rtx final_label)
 {
   const unsigned HOST_WIDE_INT xlen = GET_MODE_SIZE (Xmode);
-  rtx src1_addr = force_reg (Pmode, XEXP (src1, 0));
-  rtx src2_addr = force_reg (Pmode, XEXP (src2, 0));
   unsigned HOST_WIDE_INT offset = 0;
 
   rtx testval = gen_reg_rtx (Xmode);
@@ -246,10 +244,10 @@ emit_strcmp_scalar_load_and_compare (rtx result, rtx 
src1, rtx src2,
   else
load_mode = Xmode;
 
-  rtx addr1 = gen_rtx_PLUS (Pmode, src1_addr, GEN_INT (offset));
-  do_load_from_addr (load_mode, data1, addr1, src1);
-  rtx addr2 = gen_rtx_PLUS (Pmode, src2_addr, GEN_INT (offset));
-  do_load_from_addr (load_mode, data2, addr2, src2);
+  rtx addr1 = adjust_address (src1, load_mode, offset);
+  do_load (load_mode, data1, addr1);
+  rtx addr2 = adjust_address (src2, load_mode, offset);
+  do_load (load_mode, data2, addr2);
 
   if (cmp_bytes == 1)
{


[gcc r15-524] [v2,1/2] RISC-V: Add cmpmemsi expansion

2024-05-15 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:4bf1aa1ab90dd487fadc27c86523ec3562b2d2fe

commit r15-524-g4bf1aa1ab90dd487fadc27c86523ec3562b2d2fe
Author: Christoph Müllner 
Date:   Wed May 15 12:18:20 2024 -0600

[v2,1/2] RISC-V: Add cmpmemsi expansion

GCC has a generic cmpmemsi expansion via the by-pieces framework,
which shows some room for target-specific optimizations.
E.g. for comparing two aligned memory blocks of 15 bytes
we get the following sequence:

my_mem_cmp_aligned_15:
li  a4,0
j   .L2
.L8:
bgeu a4,a7,.L7
.L2:
add a2,a0,a4
add a3,a1,a4
lbu a5,0(a2)
lbu a6,0(a3)
addi a4,a4,1
li  a7,15 // missed hoisting
subw a5,a5,a6
andi a5,a5,0xff // useless
beq a5,zero,.L8
lbu a0,0(a2) // loading again!
lbu a5,0(a3) // loading again!
subw a0,a0,a5
ret
.L7:
li  a0,0
ret

Diff first byte: 15 insns
Diff second byte: 25 insns
No diff: 25 insns

Possible improvements:
* unroll the loop and use load-with-displacement to avoid offset increments
* load and compare multiple (aligned) bytes at once
* Use the bitmanip/strcmp result calculation (reverse words and
  synthesize (a2 >= a3) ? 1 : -1 in a branchless sequence)

When applying these improvements we get the following sequence:

my_mem_cmp_aligned_15:
ld  a5,0(a0)
ld  a4,0(a1)
bne a5,a4,.L2
ld  a5,8(a0)
ld  a4,8(a1)
slli a5,a5,8
slli a4,a4,8
bne a5,a4,.L2
li  a0,0
.L3:
sext.w  a0,a0
ret
.L2:
rev8 a5,a5
rev8 a4,a4
sltu a5,a5,a4
neg a5,a5
ori a0,a5,1
j   .L3

Diff first byte: 11 insns
Diff second byte: 16 insns
No diff: 11 insns

This patch implements these improvements.

The tests consist of an execution test (similar to
gcc/testsuite/gcc.dg/torture/inline-mem-cmp-1.c) and a few tests
that test the expansion conditions (known length and alignment).

Similar to the cpymemsi expansion this patch does not introduce any
gating for the cmpmemsi expansion (on top of requiring the known length,
alignment and Zbb).

Bootstrapped and SPEC CPU 2017 tested.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_expand_block_compare): New
prototype.
* config/riscv/riscv-string.cc (GEN_EMIT_HELPER2): New helper
for zero_extendhi.
(do_load_from_addr): Add support for HI and SI/64 modes.
(do_load): Add helper for zero-extended loads.
(emit_memcmp_scalar_load_and_compare): New helper to emit memcmp.
(emit_memcmp_scalar_result_calculation): Likewise.
(riscv_expand_block_compare_scalar): Likewise.
(riscv_expand_block_compare): New RISC-V expander for memory 
compare.
* config/riscv/riscv.md (cmpmemsi): New cmpmem expansion.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmpmemsi-1.c: New test.
* gcc.target/riscv/cmpmemsi-2.c: New test.
* gcc.target/riscv/cmpmemsi-3.c: New test.
* gcc.target/riscv/cmpmemsi.c: New test.

Diff:
---
 gcc/config/riscv/riscv-protos.h |  1 +
 gcc/config/riscv/riscv-string.cc| 40 +--
 gcc/config/riscv/riscv.md   | 15 ++
 gcc/testsuite/gcc.target/riscv/cmpmemsi-1.c |  6 
 gcc/testsuite/gcc.target/riscv/cmpmemsi-2.c | 42 
 gcc/testsuite/gcc.target/riscv/cmpmemsi-3.c | 43 +
 gcc/testsuite/gcc.target/riscv/cmpmemsi.c   | 22 +++
 7 files changed, 155 insertions(+), 14 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5c8a52b78a22..565ead1382a7 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -189,6 +189,7 @@ rtl_opt_pass * make_pass_avlprop (gcc::context *ctxt);
 rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
 
 /* Routines implemented in riscv-string.c.  */
+extern bool riscv_expand_block_compare (rtx, rtx, rtx, rtx);
 extern bool riscv_expand_block_move (rtx, rtx, rtx);
 extern bool riscv_expand_block_clear (rtx, rtx);
 
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 96394844bbb6..8f3b6f925e01 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -86,35 +86,47 @@ GEN_EMIT_HELPER2(th_rev) /* do_th_rev2  */
 GEN_EMIT_HELPER2(th_tstnbz) /* do_th_tstnbz2  */
 GEN_EMIT_HELPER3(xor) /* 

Re: [PATCH] RISC-V: prologue/epilogue expansion code minor changes [NFC]

2024-05-15 Thread Jeff Law




On 5/15/24 12:55 PM, Vineet Gupta wrote:

Saw this little room for improvement in current debugging of
prologue/epilogue expansion code.

---

Use the following pattern consistently
`RTX_FRAME_RELATED_P (gen_insn (insn)) = 1`

vs. calling gen_insn around a priori gen_xxx_insn () calls.

This reduces the weird, inconsistently applied indentation.

And also move the RTX_FRAME_RELATED_P () calls immediately after those
gen_xxx_insn () calls.

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_expand_epilogue): Use pattern
described above.
(riscv_expand_prologue): Ditto.
(riscv_for_each_saved_v_reg): Ditto.

Thanks for cleaning this up.  Just having consistency is helpful.

All this gets scrambled again with stack-clash protection :(  But that's 
just the nature of the beast.


jeff


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Test cbo.zero expansion for rv32

2024-05-15 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:f3d5808070acf09d4ca1da5f5e692be52e3a73a6

commit f3d5808070acf09d4ca1da5f5e692be52e3a73a6
Author: Christoph Müllner 
Date:   Wed May 15 01:34:54 2024 +0200

RISC-V: Test cbo.zero expansion for rv32

We had an issue when expanding via cbo.zero for RV32.
This was fixed upstream, but we don't have a RV32 test.
Therefore, this patch introduces such a test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmo-zicboz-zic64-1.c: Fix for rv32.

Signed-off-by: Christoph Müllner 
(cherry picked from commit 5609d77e683944439fae38323ecabc44a1eb4671)

Diff:
---
 .../gcc.target/riscv/cmo-zicboz-zic64-1.c  | 37 +++---
 1 file changed, 11 insertions(+), 26 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c 
b/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c
index 6d4535287d08..9192b391b11d 100644
--- a/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c
@@ -1,24 +1,9 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gc_zic64b_zicboz -mabi=lp64d" } */
+/* { dg-options "-march=rv32gc_zic64b_zicboz" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc_zic64b_zicboz" { target { rv64 } } } */
 /* { dg-skip-if "" { *-*-* } {"-O0" "-Os" "-Og" "-Oz" "-flto" } } */
-/* { dg-final { check-function-bodies "**" "" } } */
-/* { dg-allow-blank-lines-in-output 1 } */
 
-/*
-**clear_buf_123:
-**...
-**cbo\.zero\t0\(a[0-9]+\)
-**sd\tzero,64\(a[0-9]+\)
-**sd\tzero,72\(a[0-9]+\)
-**sd\tzero,80\(a[0-9]+\)
-**sd\tzero,88\(a[0-9]+\)
-**sd\tzero,96\(a[0-9]+\)
-**sd\tzero,104\(a[0-9]+\)
-**sd\tzero,112\(a[0-9]+\)
-**sh\tzero,120\(a[0-9]+\)
-**sb\tzero,122\(a[0-9]+\)
-**...
-*/
+// 1x cbo.zero, 7x sd (rv64) or 14x sw (rv32), 1x sh, 1x sb
 int
 clear_buf_123 (void *p)
 {
@@ -26,17 +11,17 @@ clear_buf_123 (void *p)
   __builtin_memset (p, 0, 123);
 }
 
-/*
-**clear_buf_128:
-**...
-**cbo\.zero\t0\(a[0-9]+\)
-**addi\ta[0-9]+,a[0-9]+,64
-**cbo\.zero\t0\(a[0-9]+\)
-**...
-*/
+// 2x cbo.zero, 1x addi 64
 int
 clear_buf_128 (void *p)
 {
   p = __builtin_assume_aligned(p, 64);
   __builtin_memset (p, 0, 128);
 }
+
+/* { dg-final { scan-assembler-times "cbo\.zero\t" 3 } } */
+/* { dg-final { scan-assembler-times "addi\ta\[0-9\]+,a\[0-9\]+,64" 1 } } */
+/* { dg-final { scan-assembler-times "sd\t" 7 { target { rv64 } } } } */
+/* { dg-final { scan-assembler-times "sw\t" 14 { target { rv32 } } } } */
+/* { dg-final { scan-assembler-times "sh\t" 1 } } */
+/* { dg-final { scan-assembler-times "sb\t" 1 } } */


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Allow by-pieces to do overlapping accesses in block_move_straight

2024-05-15 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:59e6343f99eb53da07bbd6198f083ce1bbdf20d8

commit 59e6343f99eb53da07bbd6198f083ce1bbdf20d8
Author: Christoph Müllner 
Date:   Mon Apr 29 02:53:20 2024 +0200

RISC-V: Allow by-pieces to do overlapping accesses in block_move_straight

The current implementation of riscv_block_move_straight() emits a couple
of loads/stores with maximum width (e.g. 8-byte for RV64).
The remainder is handed over to move_by_pieces().
The by-pieces framework utilizes target hooks to decide about the emitted
instructions (e.g. unaligned accesses or overlapping accesses).

Since the current implementation will always request less than XLEN bytes
to be handled by the by-pieces infrastructure, it is impossible that
overlapping memory accesses can ever be emitted (the by-pieces code does
not know of any previous instructions that were emitted by the backend).

This patch changes the implementation of riscv_block_move_straight()
such that it utilizes the by-pieces framework if the remaining data
is less than 2*XLEN bytes, which is sufficient to enable overlapping
memory accesses (if the requirements for them are given).
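
To make the effect concrete (illustrative C, not part of the patch): on
RV64 a 15-byte copy can now be emitted as two 8-byte load/store pairs
whose second pair overlaps the first by one byte, instead of an
8/4/2/1-byte tail:

#include <string.h>
#include <stdint.h>

/* Copy 15 bytes via overlapping 8-byte accesses at offsets 0 and 7,
   mirroring the ld/sd pairs in the adjusted cpymem-64-ooo.c
   expectations.  A fixed-size 8-byte memcpy compiles to a single
   load/store when unaligned accesses are fast.  */
static inline void
copy15 (void *dst, const void *src)
{
  uint64_t a, b;
  memcpy (&a, src, 8);                     /* bytes 0..7  */
  memcpy (&b, (const char *) src + 7, 8);  /* bytes 7..14 */
  memcpy (dst, &a, 8);
  memcpy ((char *) dst + 7, &b, 8);
}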

The changes in the expansion can be seen in the adjustments of the
cpymem-NN-ooo test cases. The changes in the cpymem-NN tests are
caused by the different instruction ordering of the code emitted
by the by-pieces infrastructure, which emits alternating load/store
sequences.

gcc/ChangeLog:

* config/riscv/riscv-string.cc (riscv_block_move_straight):
Hand over up to 2xXLEN bytes to move_by_pieces().

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cpymem-32-ooo.c: Adjustments for overlapping
access.
* gcc.target/riscv/cpymem-32.c: Adjustments for code emitted by
by-pieces.
* gcc.target/riscv/cpymem-64-ooo.c: Adjustments for overlapping
access.
* gcc.target/riscv/cpymem-64.c: Adjustments for code emitted by
by-pieces.

Signed-off-by: Christoph Müllner 
(cherry picked from commit ad22c607f3e17f2c6ca45699c1d88adaa618c23c)

Diff:
---
 gcc/config/riscv/riscv-string.cc   |  6 +++---
 gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c | 16 
 gcc/testsuite/gcc.target/riscv/cpymem-32.c | 10 --
 gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c |  8 
 gcc/testsuite/gcc.target/riscv/cpymem-64.c |  9 +++--
 5 files changed, 22 insertions(+), 27 deletions(-)

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index b6cd70323563..96394844bbb6 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -637,18 +637,18 @@ riscv_block_move_straight (rtx dest, rtx src, unsigned 
HOST_WIDE_INT length,
   delta = bits / BITS_PER_UNIT;
 
   /* Allocate a buffer for the temporary registers.  */
-  regs = XALLOCAVEC (rtx, length / delta);
+  regs = XALLOCAVEC (rtx, length / delta - 1);
 
   /* Load as many BITS-sized chunks as possible.  Use a normal load if
  the source has enough alignment, otherwise use left/right pairs.  */
-  for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++)
+  for (offset = 0, i = 0; offset + 2 * delta <= length; offset += delta, i++)
 {
   regs[i] = gen_reg_rtx (mode);
   riscv_emit_move (regs[i], adjust_address (src, mode, offset));
 }
 
   /* Copy the chunks to the destination.  */
-  for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++)
+  for (offset = 0, i = 0; offset + 2 * delta <= length; offset += delta, i++)
 riscv_emit_move (adjust_address (dest, mode, offset), regs[i]);
 
   /* Mop up any left-over bytes.  */
diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c 
b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
index 947d58c30fa3..2a48567353a6 100644
--- a/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
+++ b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
@@ -91,8 +91,8 @@ COPY_ALIGNED_N(11)
 **...
 **sw\t[at][0-9],0\([at][0-9]\)
 **...
-**lbu\t[at][0-9],14\([at][0-9]\)
-**sb\t[at][0-9],14\([at][0-9]\)
+**lw\t[at][0-9],11\([at][0-9]\)
+**sw\t[at][0-9],11\([at][0-9]\)
 **...
 */
 COPY_N(15)
@@ -104,8 +104,8 @@ COPY_N(15)
 **...
 **sw\t[at][0-9],0\([at][0-9]\)
 **...
-**lbu\t[at][0-9],14\([at][0-9]\)
-**sb\t[at][0-9],14\([at][0-9]\)
+**lw\t[at][0-9],11\([at][0-9]\)
+**sw\t[at][0-9],11\([at][0-9]\)
 **...
 */
 COPY_ALIGNED_N(15)
@@ -117,8 +117,8 @@ COPY_ALIGNED_N(15)
 **...
 **sw\t[at][0-9],20\([at][0-9]\)
 **...
-**lbu\t[at][0-9],26\([at][0-9]\)
-**sb\t[at][0-9],26\([at][0-9]\)
+**lw\t[at][0-9],23\([at][0-9]\)
+**sw\t[at][0-9],23\([at][0-9]\)
 **...
 */
 COPY_N(27)
@@ -130,8 +130,8 @@ COPY_N(27)
 **...
 **sw\t[at][0-9],20\([at][0-9]\)
 **...
-**

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: add tests for overlapping mem ops

2024-05-15 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:ad0413b832400aa9e81e20070b3ef6b0a9a6d888

commit ad0413b832400aa9e81e20070b3ef6b0a9a6d888
Author: Christoph Müllner 
Date:   Mon Apr 29 03:06:52 2024 +0200

RISC-V: add tests for overlapping mem ops

A recent patch added the field overlap_op_by_pieces to the struct
riscv_tune_param, which is used by the TARGET_OVERLAP_OP_BY_PIECES_P()
hook. This hook is used by the by-pieces infrastructure to decide
if overlapping memory accesses should be emitted.
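
For reference, the mechanism described above boils down to a per-uarch
flag in the tune structure that the hook simply forwards; a sketch under
my reading of that earlier patch (not the verbatim source):

#include <stdbool.h>

/* Simplified: the real riscv_tune_param has many more fields.  */
struct riscv_tune_param
{
  /* ... existing tuning fields elided ... */
  bool overlap_op_by_pieces;
};

static const struct riscv_tune_param *tune_param;

/* Implement TARGET_OVERLAP_OP_BY_PIECES_P.  */
static bool
riscv_overlap_op_by_pieces (void)
{
  return tune_param->overlap_op_by_pieces;
}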

The changes in the expansion can be seen in the adjustments of the
cpymem test cases. These tests also reveal a limitation in the
RISC-V cpymem expansion that prevents this optimization as only
by-pieces cpymem expansions emit overlapping memory accesses.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cpymem-32-ooo.c: Adjust for overlapping
access.
* gcc.target/riscv/cpymem-64-ooo.c: Likewise.

Signed-off-by: Christoph Müllner 
(cherry picked from commit 5814437b4fcc550697d6e286f49a2f8b108815bf)

Diff:
---
 gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c | 20 +++-
 gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c | 33 ++
 2 files changed, 20 insertions(+), 33 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c 
b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
index 946a773f77a0..947d58c30fa3 100644
--- a/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
+++ b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
@@ -24,9 +24,8 @@ void copy_aligned_##N (void *to, void *from)  \
 **...
 **lw\t[at][0-9],0\([at][0-9]\)
 **sw\t[at][0-9],0\([at][0-9]\)
-**...
-**lbu\t[at][0-9],6\([at][0-9]\)
-**sb\t[at][0-9],6\([at][0-9]\)
+**lw\t[at][0-9],3\([at][0-9]\)
+**sw\t[at][0-9],3\([at][0-9]\)
 **...
 */
 COPY_N(7)
@@ -36,9 +35,8 @@ COPY_N(7)
 **...
 **lw\t[at][0-9],0\([at][0-9]\)
 **sw\t[at][0-9],0\([at][0-9]\)
-**...
-**lbu\t[at][0-9],6\([at][0-9]\)
-**sb\t[at][0-9],6\([at][0-9]\)
+**lw\t[at][0-9],3\([at][0-9]\)
+**sw\t[at][0-9],3\([at][0-9]\)
 **...
 */
 COPY_ALIGNED_N(7)
@@ -66,11 +64,10 @@ COPY_ALIGNED_N(8)
 **...
 **...
 **lw\t[at][0-9],0\([at][0-9]\)
-**...
 **sw\t[at][0-9],0\([at][0-9]\)
 **...
-**lbu\t[at][0-9],10\([at][0-9]\)
-**sb\t[at][0-9],10\([at][0-9]\)
+**lw\t[at][0-9],7\([at][0-9]\)
+**sw\t[at][0-9],7\([at][0-9]\)
 **...
 */
 COPY_N(11)
@@ -79,11 +76,10 @@ COPY_N(11)
 **copy_aligned_11:
 **...
 **lw\t[at][0-9],0\([at][0-9]\)
-**...
 **sw\t[at][0-9],0\([at][0-9]\)
 **...
-**lbu\t[at][0-9],10\([at][0-9]\)
-**sb\t[at][0-9],10\([at][0-9]\)
+**lw\t[at][0-9],7\([at][0-9]\)
+**sw\t[at][0-9],7\([at][0-9]\)
 **...
 */
 COPY_ALIGNED_N(11)
diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c 
b/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c
index 08a927b94835..108748690cd3 100644
--- a/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c
+++ b/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c
@@ -24,9 +24,8 @@ void copy_aligned_##N (void *to, void *from)  \
 **...
 **lw\t[at][0-9],0\([at][0-9]\)
 **sw\t[at][0-9],0\([at][0-9]\)
-**...
-**lbu\t[at][0-9],6\([at][0-9]\)
-**sb\t[at][0-9],6\([at][0-9]\)
+**lw\t[at][0-9],3\([at][0-9]\)
+**sw\t[at][0-9],3\([at][0-9]\)
 **...
 */
 COPY_N(7)
@@ -36,9 +35,8 @@ COPY_N(7)
 **...
 **lw\t[at][0-9],0\([at][0-9]\)
 **sw\t[at][0-9],0\([at][0-9]\)
-**...
-**lbu\t[at][0-9],6\([at][0-9]\)
-**sb\t[at][0-9],6\([at][0-9]\)
+**lw\t[at][0-9],3\([at][0-9]\)
+**sw\t[at][0-9],3\([at][0-9]\)
 **...
 */
 COPY_ALIGNED_N(7)
@@ -66,9 +64,8 @@ COPY_ALIGNED_N(8)
 **...
 **ld\t[at][0-9],0\([at][0-9]\)
 **sd\t[at][0-9],0\([at][0-9]\)
-**...
-**lbu\t[at][0-9],10\([at][0-9]\)
-**sb\t[at][0-9],10\([at][0-9]\)
+**lw\t[at][0-9],7\([at][0-9]\)
+**sw\t[at][0-9],7\([at][0-9]\)
 **...
 */
 COPY_N(11)
@@ -77,11 +74,9 @@ COPY_N(11)
 **copy_aligned_11:
 **...
 **ld\t[at][0-9],0\([at][0-9]\)
-**...
 **sd\t[at][0-9],0\([at][0-9]\)
-**...
-**lbu\t[at][0-9],10\([at][0-9]\)
-**sb\t[at][0-9],10\([at][0-9]\)
+**lw\t[at][0-9],7\([at][0-9]\)
+**sw\t[at][0-9],7\([at][0-9]\)
 **...
 */
 COPY_ALIGNED_N(11)
@@ -90,11 +85,9 @@ COPY_ALIGNED_N(11)
 **copy_15:
 **...
 **ld\t[at][0-9],0\([at][0-9]\)
-**...
 **sd\t[at][0-9],0\([at][0-9]\)
-**...
-**lbu\t[at][0-9],14\([at][0-9]\)
-**sb\t[at][0-9],14\([at][0-9]\)
+**ld\t[at][0-9],7\([at][0-9]\)
+**sd\t[at][0-9],7\([at][0-9]\)
 **...
 */
 COPY_N(15)
@@ -103,11 +96,9 @@ COPY_N(15)
 **copy_aligned_15:
 **...
 **ld\t[at][0-9],0\([at][0-9]\)
-**...
 **sd\t[at][0-9],0\([at][0-9]\)
-**...
-**lbu\t[at][0-9],14\([at][0-9]\)
-**sb\t[at][0-9],14\([at][0-9]\)
+**ld\t[at][0-9],7\([at][0-9]\)
+**

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Add test cases for cpymem expansion

2024-05-15 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:0dcd2d26d0da77af7f173b6c0d79a7f5ea25c642

commit 0dcd2d26d0da77af7f173b6c0d79a7f5ea25c642
Author: Christoph Müllner 
Date:   Wed May 1 16:54:42 2024 +0200

RISC-V: Add test cases for cpymem expansion

We have two mechanisms in the RISC-V backend that expand
the cpymem pattern: a) by-pieces, b) riscv_expand_block_move()
in riscv-string.cc. The by-pieces framework has higher priority
and emits a sequence of up to 15 instructions
(see use_by_pieces_infrastructure_p() for more details).

As a rule-of-thumb, by-pieces emits alternating load/store sequences
and the setmem expansion in the backend emits a sequence of loads
followed by a sequence of stores.

Let's add some test cases to document the current behaviour
and to have tests to identify regressions.

Signed-off-by: Christoph Müllner 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cpymem-32-ooo.c: New test.
* gcc.target/riscv/cpymem-32.c: New test.
* gcc.target/riscv/cpymem-64-ooo.c: New test.
* gcc.target/riscv/cpymem-64.c: New test.

(cherry picked from commit 00029408387e9cc64e135324c22d15cd5a70e946)

Diff:
---
 gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c | 131 +++
 gcc/testsuite/gcc.target/riscv/cpymem-32.c | 138 +
 gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c | 129 +++
 gcc/testsuite/gcc.target/riscv/cpymem-64.c | 138 +
 4 files changed, 536 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c 
b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
new file mode 100644
index ..33fb9891d823
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
@@ -0,0 +1,131 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv32 } */
+/* { dg-options "-march=rv32gc -mabi=ilp32d -mtune=generic-ooo" } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-Os" "-Og" "-Oz" "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+/* { dg-allow-blank-lines-in-output 1 } */
+
+#define COPY_N(N)  \
+void copy_##N (void *to, void *from)   \
+{  \
+  __builtin_memcpy (to, from, N);  \
+}
+
+#define COPY_ALIGNED_N(N)  \
+void copy_aligned_##N (void *to, void *from)   \
+{  \
+  to = __builtin_assume_aligned(to, sizeof(long)); \
+  from = __builtin_assume_aligned(from, sizeof(long)); \
+  __builtin_memcpy (to, from, N);  \
+}
+
+/*
+**copy_7:
+**...
+**lw\t[at][0-9],0\([at][0-9]\)
+**sw\t[at][0-9],0\([at][0-9]\)
+**...
+**lbu\t[at][0-9],6\([at][0-9]\)
+**sb\t[at][0-9],6\([at][0-9]\)
+**...
+*/
+COPY_N(7)
+
+/*
+**copy_aligned_7:
+**...
+**lw\t[at][0-9],0\([at][0-9]\)
+**sw\t[at][0-9],0\([at][0-9]\)
+**...
+**lbu\t[at][0-9],6\([at][0-9]\)
+**sb\t[at][0-9],6\([at][0-9]\)
+**...
+*/
+COPY_ALIGNED_N(7)
+
+/*
+**copy_8:
+**...
+**lw\ta[0-9],0\(a[0-9]\)
+**sw\ta[0-9],0\(a[0-9]\)
+**...
+*/
+COPY_N(8)
+
+/*
+**copy_aligned_8:
+**...
+**lw\ta[0-9],0\(a[0-9]\)
+**sw\ta[0-9],0\(a[0-9]\)
+**...
+*/
+COPY_ALIGNED_N(8)
+
+/*
+**copy_11:
+**...
+**lbu\t[at][0-9],0\([at][0-9]\)
+**...
+**lbu\t[at][0-9],10\([at][0-9]\)
+**...
+**sb\t[at][0-9],0\([at][0-9]\)
+**...
+**sb\t[at][0-9],10\([at][0-9]\)
+**...
+*/
+COPY_N(11)
+
+/*
+**copy_aligned_11:
+**...
+**lw\t[at][0-9],0\([at][0-9]\)
+**...
+**sw\t[at][0-9],0\([at][0-9]\)
+**...
+**lbu\t[at][0-9],10\([at][0-9]\)
+**sb\t[at][0-9],10\([at][0-9]\)
+**...
+*/
+COPY_ALIGNED_N(11)
+
+/*
+**copy_15:
+**...
+**(call|tail)\tmemcpy
+**...
+*/
+COPY_N(15)
+
+/*
+**copy_aligned_15:
+**...
+**lw\t[at][0-9],0\([at][0-9]\)
+**...
+**sw\t[at][0-9],0\([at][0-9]\)
+**...
+**lbu\t[at][0-9],14\([at][0-9]\)
+**sb\t[at][0-9],14\([at][0-9]\)
+**...
+*/
+COPY_ALIGNED_N(15)
+
+/*
+**copy_27:
+**...
+**(call|tail)\tmemcpy
+**...
+*/
+COPY_N(27)
+
+/*
+**copy_aligned_27:
+**...
+**lw\t[at][0-9],20\([at][0-9]\)
+**...
+**sw\t[at][0-9],20\([at][0-9]\)
+**...
+**lbu\t[at][0-9],26\([at][0-9]\)
+**sb\t[at][0-9],26\([at][0-9]\)
+**...
+*/
+COPY_ALIGNED_N(27)
diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-32.c 
b/gcc/testsuite/gcc.target/riscv/cpymem-32.c
new file mode 100644
index ..44ba14a1d51f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cpymem-32.c
@@ -0,0 +1,138 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv32 } */
+/* { dg-options "-march=rv32gc -mabi=ilp32d -mtune=rocket" } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-Os" "-Og" "-Oz" "-flto" } } */
+/* { dg-final { 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Allow unaligned accesses in cpymemsi expansion

2024-05-15 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:69408db9b2b3ede055f4392f9d30be33804eec77

commit 69408db9b2b3ede055f4392f9d30be33804eec77
Author: Christoph Müllner 
Date:   Wed May 1 18:50:38 2024 +0200

RISC-V: Allow unaligned accesses in cpymemsi expansion

The RISC-V cpymemsi expansion is called whenever the by-pieces
infrastructure will not take care of the builtin expansion.
The code emitted by the by-pieces infrastructure may include
unaligned accesses if riscv_slow_unaligned_access_p is false.

The RISC-V cpymemsi expansion is handled via riscv_expand_block_move().
The current implementation of this function does not check
riscv_slow_unaligned_access_p and never emits unaligned accesses.

Since by-pieces emits unaligned accesses, it is reasonable to implement
the same behaviour in the cpymemsi expansion. And that's what this patch
is doing.

The patch checks riscv_slow_unaligned_access_p at the entry and sets
the allowed alignment accordingly. This alignment is then propagated
down to the routines that emit the actual instructions.
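
Concretely (mirroring the adjusted cpymem tests), a 7-byte copy with
unknown pointer alignment may now expand to two overlapping 4-byte
accesses at offsets 0 and 3 on a core with fast unaligned access:

/* Illustrative source; the interesting part is the generated code
   (lw/sw at offsets 0 and 3 instead of a byte-wise tail), not the C.  */
void
copy7 (void *dst, const void *src)
{
  __builtin_memcpy (dst, src, 7);
}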

The changes introduced by this patch can be seen in the adjustments
of the cpymem tests.

gcc/ChangeLog:

* config/riscv/riscv-string.cc (riscv_block_move_straight): Add
parameter align.
(riscv_adjust_block_mem): Replace parameter length by align.
(riscv_block_move_loop): Add parameter align.
(riscv_expand_block_move_scalar): Set alignment properly if the
target has fast unaligned access.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cpymem-32-ooo.c: Adjust for unaligned access.
* gcc.target/riscv/cpymem-64-ooo.c: Likewise.

Signed-off-by: Christoph Müllner 
(cherry picked from commit 04cd8ccaec90405ccf7471252c0e06ba7f5437dc)

Diff:
---
 gcc/config/riscv/riscv-string.cc   | 54 --
 gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c | 20 +++---
 gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c | 14 ++-
 3 files changed, 60 insertions(+), 28 deletions(-)

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index b515f44d17ae..b6cd70323563 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -617,11 +617,13 @@ riscv_expand_strlen (rtx result, rtx src, rtx 
search_char, rtx align)
   return false;
 }
 
-/* Emit straight-line code to move LENGTH bytes from SRC to DEST.
+/* Emit straight-line code to move LENGTH bytes from SRC to DEST
+   with accesses that are ALIGN bytes aligned.
Assume that the areas do not overlap.  */
 
 static void
-riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length)
+riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length,
+  unsigned HOST_WIDE_INT align)
 {
   unsigned HOST_WIDE_INT offset, delta;
   unsigned HOST_WIDE_INT bits;
@@ -629,8 +631,7 @@ riscv_block_move_straight (rtx dest, rtx src, unsigned 
HOST_WIDE_INT length)
   enum machine_mode mode;
   rtx *regs;
 
-  bits = MAX (BITS_PER_UNIT,
- MIN (BITS_PER_WORD, MIN (MEM_ALIGN (src), MEM_ALIGN (dest;
+  bits = MAX (BITS_PER_UNIT, MIN (BITS_PER_WORD, align));
 
   mode = mode_for_size (bits, MODE_INT, 0).require ();
   delta = bits / BITS_PER_UNIT;
@@ -655,21 +656,20 @@ riscv_block_move_straight (rtx dest, rtx src, unsigned 
HOST_WIDE_INT length)
 {
   src = adjust_address (src, BLKmode, offset);
   dest = adjust_address (dest, BLKmode, offset);
-  move_by_pieces (dest, src, length - offset,
- MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), RETURN_BEGIN);
+  move_by_pieces (dest, src, length - offset, align, RETURN_BEGIN);
 }
 }
 
 /* Helper function for doing a loop-based block operation on memory
-   reference MEM.  Each iteration of the loop will operate on LENGTH
-   bytes of MEM.
+   reference MEM.
 
Create a new base register for use within the loop and point it to
the start of MEM.  Create a new memory reference that uses this
-   register.  Store them in *LOOP_REG and *LOOP_MEM respectively.  */
+   register and has an alignment of ALIGN.  Store them in *LOOP_REG
+   and *LOOP_MEM respectively.  */
 
 static void
-riscv_adjust_block_mem (rtx mem, unsigned HOST_WIDE_INT length,
+riscv_adjust_block_mem (rtx mem, unsigned HOST_WIDE_INT align,
rtx *loop_reg, rtx *loop_mem)
 {
   *loop_reg = copy_addr_to_reg (XEXP (mem, 0));
@@ -677,15 +677,17 @@ riscv_adjust_block_mem (rtx mem, unsigned HOST_WIDE_INT 
length,
   /* Although the new mem does not refer to a known location,
  it does keep up to LENGTH bytes of alignment.  */
   *loop_mem = change_address (mem, BLKmode, *loop_reg);
-  set_mem_align (*loop_mem, MIN (MEM_ALIGN (mem), length * BITS_PER_UNIT));
+  set_mem_align (*loop_mem, align);
 }
 
 /* Move LENGTH bytes from SRC 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [committed] Fix rv32 issues with recent zicboz work

2024-05-15 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:75a06302ef660397001d67afc1fb4d22e6da5870

commit 75a06302ef660397001d67afc1fb4d22e6da5870
Author: Jeff Law 
Date:   Tue May 14 22:50:15 2024 -0600

[committed] Fix rv32 issues with recent zicboz work

I should have double-checked the CI system before pushing Christoph's 
patches
for memset-zero.  While I thought I'd checked CI state, I must have been
looking at the wrong patch from Christoph.

Anyway, this fixes the rv32 ICEs and disables one of the tests for rv32.

The test would need a revamp for rv32 as the expected output is all rv64
code using "sd" instructions.  I'm just not invested deeply enough in rv32
to adjust the test to work in that environment, though it should be fairly
trivial to copy the test and provide new expected output if someone cares
enough.

Verified this fixes the rv32 failures in my tester:
> New tests that FAIL (6 tests):
>
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O1  
(internal compiler error: in extract_insn, at recog.cc:2812)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O1  
(test for excess errors)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O2  
(internal compiler error: in extract_insn, at recog.cc:2812)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O2  
(test for excess errors)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O3 -g  
(internal compiler error: in extract_insn, at recog.cc:2812)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O3 -g  
(test for excess errors)

And after the ICE is fixed, these are eliminated by only running the test 
for
rv64:

> New tests that FAIL (3 tests):
>
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O1   
check-function-bodies clear_buf_123
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O2   
check-function-bodies clear_buf_123
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O3 -g  
 check-function-bodies clear_buf_123

gcc/
* config/riscv/riscv-string.cc
(riscv_expand_block_clear_zicboz_zic64b): Handle rv32 correctly.

gcc/testsuite

* gcc.target/riscv/cmo-zicboz-zic64-1.c: Don't run on rv32.

(cherry picked from commit e410ad74e5e4589aeb666aa298b2f933e7b5d9e7)

Diff:
---
 gcc/config/riscv/riscv-string.cc| 5 -
 gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c | 3 +--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 87f5fdee3c14..b515f44d17ae 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -827,7 +827,10 @@ riscv_expand_block_clear_zicboz_zic64b (rtx dest, rtx 
length)
 {
   rtx mem = adjust_address (dest, BLKmode, offset);
   rtx addr = force_reg (Pmode, XEXP (mem, 0));
-  emit_insn (gen_riscv_zero_di (addr));
+  if (TARGET_64BIT)
+   emit_insn (gen_riscv_zero_di (addr));
+  else
+   emit_insn (gen_riscv_zero_si (addr));
   offset += cbo_bytes;
 }
 
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c 
b/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c
index c2d79eb7ae68..6d4535287d08 100644
--- a/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c
@@ -1,6 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gc_zic64b_zicboz" { target { rv64 } } } */
-/* { dg-options "-march=rv32gc_zic64b_zicboz" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc_zic64b_zicboz -mabi=lp64d" } */
 /* { dg-skip-if "" { *-*-* } {"-O0" "-Os" "-Og" "-Oz" "-flto" } } */
 /* { dg-final { check-function-bodies "**" "" } } */
 /* { dg-allow-blank-lines-in-output 1 } */


[to-be-committed][RISC-V] Improve some shift-add sequences

2024-05-15 Thread Jeff Law


So this is a minor fix/improvement for shift-add sequences.  This was 
supposed to help xz in a minor way IIRC.


Combine may present us with (x << C1) + C2, which was canonicalized from 
(x + C2') << C1 (with C2 == C2' << C1).


Depending on the precise values of C2 and C2', one form may be better 
than the other.  We can (somewhat awkwardly) use riscv_const_insns to 
test for which sequence would be preferred.
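
A worked example from the new test (the arithmetic is mine, for
illustration): in composeFromSurrogate, (high - 0xD800) << 10 is
canonicalized to (high << 10) + C2 with

  C2 = -0xD800 << 10 = -0x3600000.

C2 has its low 12 bits clear, so it loads with a single lui, while
C2' = -0xD800 needs a two-instruction li (lui plus addi).  Keeping the
canonical form therefore saves an instruction, which is what the
scan-assembler-times directives in shift-add-1.c check for.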


Tested on Ventana's CI system as well as my own.  Waiting on CI results 
from Rivos's tester before moving forward.


Jeff




gcc/
* config/riscv/riscv.md: Add new patterns to allow selection
between (x << C1) + C2 vs (x + C2') << C1 depending on the
cost of C2 vs C2'.

gcc/testsuite

* gcc.target/riscv/shift-add-1.c: New test.

commit 03933cf8813b28587ceb7f6f66ac03d08c5de58b
Author: Jeff Law 
Date:   Thu Apr 4 13:35:54 2024 -0600

Optimize (x << C1) + C2 after canonicalization from ((x + C2') << C1).

C2 may have a lower cost to synthesize than C2'.  Reassociate to take
advantage of that.

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index ffb09a4109d..69c80bc4a86 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -4416,6 +4416,62 @@ (define_insn_and_split ""
   "{ operands[6] = gen_lowpart (SImode, operands[5]); }"
   [(set_attr "type" "arith")])
 
+;; These are forms of (x << C1) + C2, potentially canonicalized from
+;; ((x + C2') << C1.  Depending on the cost to load C2 vs C2' we may
+;; want to go ahead and recognize this form as C2 may be cheaper to
+;; synthesize than C2'.
+;;
+;; It might be better to refactor riscv_const_insns a bit so that we
+;; can have an API that passes integer values around rather than
+;; constructing a lot of garbage RTL.
+;;
+;; The mvconst_internal pattern in effect requires this pattern to
+;; also be a define_insn_and_split due to insn count costing when
+;; splitting in combine.
+(define_insn_and_split ""
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (plus:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
+   (match_operand 2 "const_int_operand" "n"))
+(match_operand 3 "const_int_operand" "n")))
+   (clobber (match_scratch:DI 4 "="))]
+  "(TARGET_64BIT
+&& riscv_const_insns (operands[3])
+&& ((riscv_const_insns (operands[3])
+< riscv_const_insns (GEN_INT (INTVAL (operands[3]) >> INTVAL 
(operands[2]
+   || riscv_const_insns (GEN_INT (INTVAL (operands[3]) >> INTVAL 
(operands[2]))) == 0))"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (ashift:DI (match_dup 1) (match_dup 2)))
+   (set (match_dup 4) (match_dup 3))
+   (set (match_dup 0) (plus:DI (match_dup 0) (match_dup 4)))]
+  ""
+  [(set_attr "type" "arith")])
+
+(define_insn_and_split ""
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (sign_extend:DI (plus:SI (ashift:SI
+                              (match_operand:SI 1 "register_operand" "r")
+                              (match_operand 2 "const_int_operand" "n"))
+                            (match_operand 3 "const_int_operand" "n"))))
+   (clobber (match_scratch:DI 4 "=&r"))]
+  "(TARGET_64BIT
+    && riscv_const_insns (operands[3])
+    && ((riscv_const_insns (operands[3])
+         < riscv_const_insns (GEN_INT (INTVAL (operands[3]) >> INTVAL (operands[2]))))
+        || riscv_const_insns (GEN_INT (INTVAL (operands[3]) >> INTVAL (operands[2]))) == 0))"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (ashift:DI (match_dup 1) (match_dup 2)))
+   (set (match_dup 4) (match_dup 3))
+   (set (match_dup 0) (sign_extend:DI (plus:SI (match_dup 5) (match_dup 6))))]
+  "{
+ operands[1] = gen_lowpart (DImode, operands[1]);
+ operands[5] = gen_lowpart (SImode, operands[0]);
+ operands[6] = gen_lowpart (SImode, operands[4]);
+   }"
+  [(set_attr "type" "arith")])
+
+
 (include "bitmanip.md")
 (include "crypto.md")
 (include "sync.md")
diff --git a/gcc/testsuite/gcc.target/riscv/shift-add-1.c 
b/gcc/testsuite/gcc.target/riscv/shift-add-1.c
new file mode 100644
index 000..d98875c3271
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shift-add-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zba_zbb_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+
+int composeFromSurrogate(const unsigned short high) {
+
+return  ((high - 0xD800) << 10) ;
+}
+
+
+long composeFromSurrogate_2(const unsigned long high) {
+
+return  ((high - 0xD800) << 10) ;
+}
+
+
+/* { dg-final { scan-assembler-times "\tli\t" 2 } } */
+/* { dg-final { scan-assembler-times "\tslli\t" 2 } } */
+/* { dg-final { scan-assembler-times "\taddw\t" 1 } } */
+/* { dg-final { scan-assembler-times "\tadd\t" 1 } } */
+


Re: [PATCH] RISC-V: Fix cbo.zero expansion for rv32

2024-05-15 Thread Jeff Law




On 5/15/24 12:48 AM, Christoph Müllner wrote:

Emitting a DI pattern won't find a match for rv32; this manifests in
the failing test case gcc.target/riscv/cmo-zicboz-zic64-1.c.
Let's fix this in the expansion and also address the different
code that gets generated for rv32/rv64.

gcc/ChangeLog:

* config/riscv/riscv-string.cc (riscv_expand_block_clear_zicboz_zic64b):
Fix expansion for rv32.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmo-zicboz-zic64-1.c: Fix for rv32.

That's the exact change I made yesterday for the code generator.  Glad to see I 
didn't muck it up :-)  And thanks for fixing the test to have some 
coverage on rv32.


Jeff



Re: [PATCH] RISC-V: Test cbo.zero expansion for rv32

2024-05-15 Thread Jeff Law




On 5/15/24 1:28 AM, Christoph Müllner wrote:

We had an issue when expanding via cbo.zero for RV32.
This was fixed upstream, but we don't have a RV32 test.
Therefore, this patch introduces such a test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmo-zicboz-zic64-1.c: Fix for rv32.

OK.  Thanks!

jeff



[committed] Fix rv32 issues with recent zicboz work

2024-05-14 Thread Jeff Law
I should have double-checked the CI system before pushing Christoph's 
patches for memset-zero.  While I thought I'd checked CI state, I must 
have been looking at the wrong patch from Christoph.


Anyway, this fixes the rv32 ICEs and disables one of the tests for rv32.

The test would need a revamp for rv32 as the expected output is all rv64 
code using "sd" instructions.  I'm just not invested deeply enough in 
rv32 to adjust the test to work in that environment, though it should be 
fairly trivial to copy the test and provide new expected output if 
someone cares enough.





Verified this fixes the rv32 failures in my tester:

New tests that FAIL (6 tests):

unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O1  
(internal compiler error: in extract_insn, at recog.cc:2812)
unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O1  (test 
for excess errors)
unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O2  
(internal compiler error: in extract_insn, at recog.cc:2812)
unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O2  (test 
for excess errors)
unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O3 -g  
(internal compiler error: in extract_insn, at recog.cc:2812)
unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O3 -g  (test 
for excess errors)



And after the ICE is fixed, these are eliminated by only running the 
test for rv64:



New tests that FAIL (3 tests):

unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O1   
check-function-bodies clear_buf_123
unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O2   
check-function-bodies clear_buf_123
unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O3 -g   
check-function-bodies clear_buf_123


Pushed to the trunk.

Jeff

commit e410ad74e5e4589aeb666aa298b2f933e7b5d9e7
Author: Jeff Law 
Date:   Tue May 14 22:50:15 2024 -0600

[committed] Fix rv32 issues with recent zicboz work

I should have double-checked the CI system before pushing Christoph's 
patches
for memset-zero.  While I thought I'd checked CI state, I must have been
looking at the wrong patch from Christoph.

Anyway, this fixes the rv32 ICEs and disables one of the tests for rv32.

The test would need a revamp for rv32 as the expected output is all rv64
code using "sd" instructions.  I'm just not invested deeply enough in rv32
to adjust the test to work in that environment, though it should be fairly
trivial to copy the test and provide new expected output if someone cares
enough.

Verified this fixes the rv32 failures in my tester:
> New tests that FAIL (6 tests):
>
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O1  
(internal compiler error: in extract_insn, at recog.cc:2812)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O1  
(test for excess errors)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O2  
(internal compiler error: in extract_insn, at recog.cc:2812)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O2  
(test for excess errors)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O3 -g  
(internal compiler error: in extract_insn, at recog.cc:2812)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O3 -g  
(test for excess errors)

And after the ICE is fixed, these are eliminated by only running the test 
for
rv64:

> New tests that FAIL (3 tests):
>
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O1   
check-function-bodies clear_buf_123
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O2   
check-function-bodies clear_buf_123
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O3 -g  
 check-function-bodies clear_buf_123

gcc/
* config/riscv/riscv-string.cc
(riscv_expand_block_clear_zicboz_zic64b): Handle rv32 correctly.

gcc/testsuite

* gcc.target/riscv/cmo-zicboz-zic64-1.c: Don't run on rv32.

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 87f5fdee3c1..b515f44d17a 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -827,7 +827,10 @@ riscv_expand_block_clear_zicboz_zic64b (rtx dest, rtx 
length)
 {
   rtx mem = adjust_address (dest, BLKmode, offset);
   rtx addr = force_reg (Pmode, XEXP (mem, 0));
-  emit_insn (gen_riscv_zero_di (addr));
+  if (TARGET_64BIT)
+   emit_insn (gen_riscv_zero_di (addr));
+  else
+   emit_insn (gen_riscv_zero_si (addr));
   offset += cbo_bytes;
 }
 
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c 
b/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c
index c2d

[gcc r15-500] [committed] Fix rv32 issues with recent zicboz work

2024-05-14 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:e410ad74e5e4589aeb666aa298b2f933e7b5d9e7

commit r15-500-ge410ad74e5e4589aeb666aa298b2f933e7b5d9e7
Author: Jeff Law 
Date:   Tue May 14 22:50:15 2024 -0600

[committed] Fix rv32 issues with recent zicboz work

I should have double-checked the CI system before pushing Christoph's 
patches
for memset-zero.  While I thought I'd checked CI state, I must have been
looking at the wrong patch from Christoph.

Anyway, this fixes the rv32 ICEs and disables one of the tests for rv32.

The test would need a revamp for rv32 as the expected output is all rv64
code using "sd" instructions.  I'm just not invested deeply enough in rv32
to adjust the test to work in that environment, though it should be fairly
trivial to copy the test and provide new expected output if someone cares
enough.

Verified this fixes the rv32 failures in my tester:
> New tests that FAIL (6 tests):
>
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O1  
(internal compiler error: in extract_insn, at recog.cc:2812)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O1  
(test for excess errors)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O2  
(internal compiler error: in extract_insn, at recog.cc:2812)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O2  
(test for excess errors)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O3 -g  
(internal compiler error: in extract_insn, at recog.cc:2812)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O3 -g  
(test for excess errors)

And after the ICE is fixed, these are eliminated by only running the test 
for
rv64:

> New tests that FAIL (3 tests):
>
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O1   
check-function-bodies clear_buf_123
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O2   
check-function-bodies clear_buf_123
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O3 -g  
 check-function-bodies clear_buf_123

gcc/
* config/riscv/riscv-string.cc
(riscv_expand_block_clear_zicboz_zic64b): Handle rv32 correctly.

gcc/testsuite

* gcc.target/riscv/cmo-zicboz-zic64-1.c: Don't run on rv32.

Diff:
---
 gcc/config/riscv/riscv-string.cc| 5 -
 gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c | 3 +--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 87f5fdee3c14..b515f44d17ae 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -827,7 +827,10 @@ riscv_expand_block_clear_zicboz_zic64b (rtx dest, rtx 
length)
 {
   rtx mem = adjust_address (dest, BLKmode, offset);
   rtx addr = force_reg (Pmode, XEXP (mem, 0));
-  emit_insn (gen_riscv_zero_di (addr));
+  if (TARGET_64BIT)
+   emit_insn (gen_riscv_zero_di (addr));
+  else
+   emit_insn (gen_riscv_zero_si (addr));
   offset += cbo_bytes;
 }
 
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c 
b/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c
index c2d79eb7ae68..6d4535287d08 100644
--- a/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c
@@ -1,6 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gc_zic64b_zicboz" { target { rv64 } } } */
-/* { dg-options "-march=rv32gc_zic64b_zicboz" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc_zic64b_zicboz -mabi=lp64d" } */
 /* { dg-skip-if "" { *-*-* } {"-O0" "-Os" "-Og" "-Oz" "-flto" } } */
 /* { dg-final { check-function-bodies "**" "" } } */
 /* { dg-allow-blank-lines-in-output 1 } */


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [to-be-committed, RISC-V] Remove redundant AND in shift-add sequence

2024-05-14 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:9de32107d731fbbf15096d065bf706bb9aff94f6

commit 9de32107d731fbbf15096d065bf706bb9aff94f6
Author: Jeff Law 
Date:   Tue May 14 18:17:59 2024 -0600

[to-be-committed,RISC-V] Remove redundant AND in shift-add sequence

So this patch allows us to eliminate a redundant AND in some shift-add
style sequences.  I think the testcase was reduced from xz by the RAU
team, but I'm not highly confident of that.

Specifically the AND is masking off the upper 32 bits of the un-shifted
value and there's an outer SIGN_EXTEND from SI to DI.  However in the
RTL it's working on the post-shifted value, so the constant is left
shifted, so we have to account for that in the pattern's condition.

We can just drop the AND in this case.  So instead we do a 64-bit shift,
then a sign-extending ADD utilizing the low part of that 64-bit shift result.
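
As a hedged illustration (register choices assumed), the sub2 function in 
the new testcase,

  b = (b << 32) >> 31;
  unsigned int x = a + b;

should now assemble to just

  slli	a1,a1,1
  addw	a0,a1,a0

with no srai or explicit masking left over, which is exactly what the 
scan-assembler directives in shift-add-2.c check for.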

This has run through Ventana's CI as well as my own.  I'll wait for it
to run through the larger CI system before pushing.

Jeff

gcc/
* config/riscv/riscv.md: Add pattern for sign extended shift-add
sequence with a masked input.

gcc/testsuite

* gcc.target/riscv/shift-add-2.c: New test.

(cherry picked from commit 32ff344d57d56fddb66c4976b5651345d40b7157)

Diff:
---
 gcc/config/riscv/riscv.md| 25 +
 gcc/testsuite/gcc.target/riscv/shift-add-2.c | 16 
 2 files changed, 41 insertions(+)

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 893040f28541..ee15c63db107 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -4120,6 +4120,31 @@
   [(set_attr "type" "load")
(set (attr "length") (const_int 8))])
 
+;; The AND is redundant here.  It always turns off the high 32 bits and the
+;; low number of bits equal to the shift count.  Those upper 32 bits will be
+;; reset by the SIGN_EXTEND at the end.
+;;
+;; One could argue combine should have realized this and simplified what it
+;; presented to the backend.  But we can obviously cope with what it gave us.
+(define_insn_and_split ""
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (sign_extend:DI
+ (plus:SI (subreg:SI
+(and:DI
+  (ashift:DI (match_operand:DI 1 "register_operand" "r")
+ (match_operand 2 "const_int_operand" "n"))
+  (match_operand 3 "const_int_operand" "n")) 0)
+		  (match_operand:SI 4 "register_operand" "r"))))
+   (clobber (match_scratch:DI 5 "=&r"))]
+  "TARGET_64BIT
+   && (INTVAL (operands[3]) | ((1 << INTVAL (operands[2])) - 1)) == 0xffffffff"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 5) (ashift:DI (match_dup 1) (match_dup 2)))
+   (set (match_dup 0) (sign_extend:DI (plus:SI (match_dup 6) (match_dup 4))))]
+  "{ operands[6] = gen_lowpart (SImode, operands[5]); }"
+  [(set_attr "type" "arith")])
+
 (include "bitmanip.md")
 (include "crypto.md")
 (include "sync.md")
diff --git a/gcc/testsuite/gcc.target/riscv/shift-add-2.c 
b/gcc/testsuite/gcc.target/riscv/shift-add-2.c
new file mode 100644
index ..87439858e59e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shift-add-2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zba_zbb_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+
+int sub2(int a, long long b) {
+  b = (b << 32) >> 31;
+  unsigned int x = a + b;
+  return x;
+}
+
+
+/* { dg-final { scan-assembler-times "\tslli\t" 1 } } */
+/* { dg-final { scan-assembler-times "\taddw\t" 1 } } */
+/* { dg-final { scan-assembler-not "\tsrai\t" } } */
+/* { dg-final { scan-assembler-not "\tsh.add\t" } } */
+


Re: [PATCH] RISC-V: Implement -m{,no}fence-tso

2024-05-14 Thread Jeff Law




On 5/14/24 5:13 PM, Palmer Dabbelt wrote:

Some processors from T-Head don't implement the `fence.tso` instruction
natively and instead trap to firmware.  This breaks some users who
haven't yet updated the firmware and one could imagine it breaking users
who are trying to build firmware if they're using the C memory model.

So just add an option to disable emitting it, in a similar fashion to
how we allow users to forbid other instructions.

gcc/ChangeLog:

* config/riscv/riscv.opt: Add -mno-fence-tso.
* config/riscv/sync-rvwmo.md (mem_thread_fence_rvwmo): Respect
-mno-fence-tso.
* doc/invoke.texi (RISC-V): Document -mno-fence-tso.

Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1070959
---
I've just smoke tested this one, but

 void func(void) { __atomic_thread_fence(__ATOMIC_ACQ_REL); }

generates `fence.tso` without the argument and `fence rw,rw` with
`-mno-fence-tso`, so it seems to be at least mostly there.  I figured
I'd just send it up for comments before putting together the DG bits:
it's kind of a pain to carry around these workarounds for unimplemented
instructions, but it's in HW so there's not much we can do about that.

Seems reasonable.  We might consider adding a comment in the code 
indicating this is for a particular set of T-Head systems.  10 years from 
now when someone else looks at the code they'll know why this is in 
there and they won't have to do the archaeology.


Jeff


[gcc r15-497] [to-be-committed, RISC-V] Remove redundant AND in shift-add sequence

2024-05-14 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:32ff344d57d56fddb66c4976b5651345d40b7157

commit r15-497-g32ff344d57d56fddb66c4976b5651345d40b7157
Author: Jeff Law 
Date:   Tue May 14 18:17:59 2024 -0600

[to-be-committed,RISC-V] Remove redundant AND in shift-add sequence

So this patch allows us to eliminate a redundant AND in some shift-add
style sequences.  I think the testcase was reduced from xz by the RAU
team, but I'm not highly confident of that.

Specifically the AND is masking off the upper 32 bits of the un-shifted
value and there's an outer SIGN_EXTEND from SI to DI.  However in the
RTL it's working on the post-shifted value, so the constant is left
shifted, so we have to account for that in the pattern's condition.

We can just drop the AND in this case.  So instead we do a 64-bit shift,
then a sign-extending ADD utilizing the low part of that 64-bit shift result.

This has run through Ventana's CI as well as my own.  I'll wait for it
to run through the larger CI system before pushing.

Jeff

gcc/
* config/riscv/riscv.md: Add pattern for sign extended shift-add
sequence with a masked input.

gcc/testsuite

* gcc.target/riscv/shift-add-2.c: New test.

Diff:
---
 gcc/config/riscv/riscv.md| 25 +
 gcc/testsuite/gcc.target/riscv/shift-add-2.c | 16 
 2 files changed, 41 insertions(+)

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 893040f28541..ee15c63db107 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -4120,6 +4120,31 @@
   [(set_attr "type" "load")
(set (attr "length") (const_int 8))])
 
+;; The AND is redundant here.  It always turns off the high 32 bits and the
+;; low number of bits equal to the shift count.  Those upper 32 bits will be
+;; reset by the SIGN_EXTEND at the end.
+;;
+;; One could argue combine should have realized this and simplified what it
+;; presented to the backend.  But we can obviously cope with what it gave us.
+(define_insn_and_split ""
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (sign_extend:DI
+ (plus:SI (subreg:SI
+(and:DI
+  (ashift:DI (match_operand:DI 1 "register_operand" "r")
+ (match_operand 2 "const_int_operand" "n"))
+  (match_operand 3 "const_int_operand" "n")) 0)
+		  (match_operand:SI 4 "register_operand" "r"))))
+   (clobber (match_scratch:DI 5 "=&r"))]
+  "TARGET_64BIT
+   && (INTVAL (operands[3]) | ((1 << INTVAL (operands[2])) - 1)) == 0xffffffff"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 5) (ashift:DI (match_dup 1) (match_dup 2)))
+   (set (match_dup 0) (sign_extend:DI (plus:SI (match_dup 6) (match_dup 4))))]
+  "{ operands[6] = gen_lowpart (SImode, operands[5]); }"
+  [(set_attr "type" "arith")])
+
 (include "bitmanip.md")
 (include "crypto.md")
 (include "sync.md")
diff --git a/gcc/testsuite/gcc.target/riscv/shift-add-2.c 
b/gcc/testsuite/gcc.target/riscv/shift-add-2.c
new file mode 100644
index ..87439858e59e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shift-add-2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zba_zbb_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+
+int sub2(int a, long long b) {
+  b = (b << 32) >> 31;
+  unsigned int x = a + b;
+  return x;
+}
+
+
+/* { dg-final { scan-assembler-times "\tslli\t" 1 } } */
+/* { dg-final { scan-assembler-times "\taddw\t" 1 } } */
+/* { dg-final { scan-assembler-not "\tsrai\t" } } */
+/* { dg-final { scan-assembler-not "\tsh.add\t" } } */
+


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: avoid LUI based const materialization ... [part of PR/106265]

2024-05-14 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:de257cc78146b0e518b272de5afc3faa9bbf3669

commit de257cc78146b0e518b272de5afc3faa9bbf3669
Author: Vineet Gupta 
Date:   Mon May 13 11:45:55 2024 -0700

RISC-V: avoid LUI based const materialization ... [part of PR/106265]

... if the constant can be represented as a sum of two S12 values.
The two S12 values could instead be fused with the subsequent ADD insn.
This helps
 - avoid an additional LUI insn
 - side benefits of not clobbering a reg

e.g.
                      | w/o patch         | w/ patch
long                  |                   |
plus(unsigned long i) | li   a5,4096      |
{                     | addi a5,a5,-2032  | addi a0, a0, 2047
    return i + 2064;  | add  a0,a0,a5     | addi a0, a0, 17
}                     | ret               | ret
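
For posterity, a rough sketch (our code, not the patch's predicate) of the 
range check behind "sum of two S12"; each addend lies in [-2048, 2047], and 
the split shown above is 2064 = 2047 + 17:

  /* Hedged sketch: a constant C is representable as the sum of two
     signed 12-bit immediates iff C lies in [-4096, 4094].  */
  static int
  sum_of_two_s12_p (long long c)
  {
    return c >= -4096 && c <= 4094;
  }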

NOTE: In theory not having the constant in a standalone reg might seem
  less CSE friendly, but for the workloads in consideration these
  materializations are from very late LRA reloads, and follow-on GCSE
  is not doing much currently.

The real benefit however is seen in base+offset computation for array
accesses, and especially for stack accesses, which are finalized late in
the optimization pipeline, during LRA register allocation.  Often the
finalized offsets trigger LRA reloads, resulting in mind-boggling
repetition of the exact same insn sequence, including LUI based constant
materialization.

This shaves off 290 billion dynamic instructions (QEMU icounts) in the
SPEC 2017 Cactu benchmark, which is over 10% of the workload.  In the rest
of the suite, an additional 10 billion are shaved, with both gains and
losses in individual workloads, as is usual with compiler changes.

 500.perlbench_r-0 |  1,214,534,029,025 | 1,212,887,959,387 |
 500.perlbench_r-1 |740,383,419,739 |   739,280,308,163 |
 500.perlbench_r-2 |692,074,638,817 |   691,118,734,547 |
 502.gcc_r-0   |190,820,141,435 |   190,857,065,988 |
 502.gcc_r-1   |225,747,660,839 |   225,809,444,357 | <- -0.02%
 502.gcc_r-2   |220,370,089,641 |   220,406,367,876 | <- -0.03%
 502.gcc_r-3   |179,111,460,458 |   179,135,609,723 | <- -0.02%
 502.gcc_r-4   |219,301,546,340 |   219,320,416,956 | <- -0.01%
 503.bwaves_r-0|278,733,324,691 |   278,733,323,575 | <- -0.01%
 503.bwaves_r-1|442,397,521,282 |   442,397,519,616 |
 503.bwaves_r-2|344,112,218,206 |   344,112,216,760 |
 503.bwaves_r-3|417,561,469,153 |   417,561,467,597 |
 505.mcf_r |669,319,257,525 |   669,318,763,084 |
 507.cactuBSSN_r   |  2,852,767,394,456 | 2,564,736,063,742 | <+ 10.10%
 508.namd_r|  1,855,884,342,110 | 1,855,881,110,934 |
 510.parest_r  |  1,654,525,521,053 | 1,654,402,859,174 |
 511.povray_r  |  2,990,146,655,619 | 2,990,060,324,589 |
 519.lbm_r |  1,158,337,294,525 | 1,158,337,294,529 |
 520.omnetpp_r |  1,021,765,791,283 | 1,026,165,661,394 |
 521.wrf_r |  1,715,955,652,503 | 1,714,352,737,385 |
 523.xalancbmk_r   |849,846,008,075 |   849,836,851,752 |
 525.x264_r-0  |277,801,762,763 |   277,488,776,427 |
 525.x264_r-1  |927,281,789,540 |   926,751,516,742 |
 525.x264_r-2  |915,352,631,375 |   914,667,785,953 |
 526.blender_r |  1,652,839,180,887 | 1,653,260,825,512 |
 527.cam4_r|  1,487,053,494,925 | 1,484,526,670,770 |
 531.deepsjeng_r   |  1,641,969,526,837 | 1,642,126,598,866 |
 538.imagick_r |  2,098,016,546,691 | 2,097,997,929,125 |
 541.leela_r   |  1,983,557,323,877 | 1,983,531,314,526 |
 544.nab_r |  1,516,061,611,233 | 1,516,061,407,715 |
 548.exchange2_r   |  2,072,594,330,215 | 2,072,591,648,318 |
 549.fotonik3d_r   |  1,001,499,307,366 | 1,001,478,944,189 |
 554.roms_r|  1,028,799,739,111 | 1,028,780,904,061 |
 557.xz_r-0|363,827,039,684 |   363,057,014,260 |
 557.xz_r-1|906,649,112,601 |   905,928,888,732 |
 557.xz_r-2|509,023,898,187 |   508,140,356,932 |
 997.specrand_fr   |402,535,577 |   403,052,561 |
 999.specrand_ir   |402,535,577 |   403,052,561 |

This should still be considered damage control, as the real/deeper fix
would be to reduce the number of LRA reloads or CSE/anchor those during
LRA constraint sub-pass (re)runs (that's a different problem: PR/114729).

Implementation Details (for posterity)
--
 - The basic idea is to have a splitter, selected via a new predicate for
   a constant that is a possible sum of two S12 values, and to provide the
   transform.  This is however a 2 -> 2 transform, which combine can't
   handle, so we specify it using a define_insn_and_split.

 - the initial loose "i" constraint caused LRA to accept invalid insns thus
  

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [PATCH 3/3] RISC-V: Add memset-zero expansion to cbo.zero

2024-05-14 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:f9a0426cdbd0d1e796cd0a9bcd39d31e3d2df018

commit f9a0426cdbd0d1e796cd0a9bcd39d31e3d2df018
Author: Christoph Müllner 
Date:   Tue May 14 09:21:17 2024 -0600

[PATCH 3/3] RISC-V: Add memset-zero expansion to cbo.zero

The Zicboz extension offers the cbo.zero instruction, which can be used
to clean a memory region corresponding to a cache block.
The Zic64b extension defines the cache block size to 64 byte.
If both extensions are available, it is possible to use cbo.zero
to clear memory, if the alignment and size constraints are met.
This patch implements this.
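
A hedged sketch of source that can take the new path (the struct name and 
alignment attribute are our assumptions; the destination must be at least 
cache-block sized and aligned, and not compiled for size):

  struct block { char b[128]; } __attribute__ ((aligned (64)));

  void
  clear (struct block *p)
  {
    __builtin_memset (p, 0, sizeof *p);  /* candidate for two cbo.zero */
  }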

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_expand_block_clear): New 
prototype.
* config/riscv/riscv-string.cc 
(riscv_expand_block_clear_zicboz_zic64b):
New function to expand a block-clear with cbo.zero.
(riscv_expand_block_clear): New RISC-V block-clear expansion 
function.
* config/riscv/riscv.md (setmem): New setmem expansion.

(cherry picked from commit 54ba8d44bbd703bca6984700b4d6f978890097e2)

Diff:
---
 gcc/config/riscv/riscv-protos.h|  1 +
 gcc/config/riscv/riscv-string.cc   | 59 ++
 gcc/config/riscv/riscv.md  | 24 +
 .../gcc.target/riscv/cmo-zicboz-zic64-1.c  | 43 
 4 files changed, 127 insertions(+)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index e5aebf3fc3d5..255fd6a0de97 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -189,6 +189,7 @@ rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
 
 /* Routines implemented in riscv-string.c.  */
 extern bool riscv_expand_block_move (rtx, rtx, rtx);
+extern bool riscv_expand_block_clear (rtx, rtx);
 
 /* Information about one CPU we know about.  */
 struct riscv_cpu_info {
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 41cb061c746d..87f5fdee3c14 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -794,6 +794,65 @@ riscv_expand_block_move (rtx dest, rtx src, rtx length)
   return false;
 }
 
+/* Expand a block-clear instruction via cbo.zero instructions.  */
+
+static bool
+riscv_expand_block_clear_zicboz_zic64b (rtx dest, rtx length)
+{
+  unsigned HOST_WIDE_INT hwi_length;
+  unsigned HOST_WIDE_INT align;
+  const unsigned HOST_WIDE_INT cbo_bytes = 64;
+
+  gcc_assert (TARGET_ZICBOZ && TARGET_ZIC64B);
+
+  if (!CONST_INT_P (length))
+return false;
+
+  hwi_length = UINTVAL (length);
+  if (hwi_length < cbo_bytes)
+return false;
+
+  align = MEM_ALIGN (dest) / BITS_PER_UNIT;
+  if (align < cbo_bytes)
+return false;
+
+  /* We don't emit loops.  Instead apply move-bytes limitation.  */
+  unsigned HOST_WIDE_INT max_bytes = RISCV_MAX_MOVE_BYTES_STRAIGHT /
+ UNITS_PER_WORD * cbo_bytes;
+  if (hwi_length > max_bytes)
+return false;
+
+  unsigned HOST_WIDE_INT offset = 0;
+  while (offset + cbo_bytes <= hwi_length)
+{
+  rtx mem = adjust_address (dest, BLKmode, offset);
+  rtx addr = force_reg (Pmode, XEXP (mem, 0));
+  emit_insn (gen_riscv_zero_di (addr));
+  offset += cbo_bytes;
+}
+
+  if (offset < hwi_length)
+{
+  rtx mem = adjust_address (dest, BLKmode, offset);
+  clear_by_pieces (mem, hwi_length - offset, align);
+}
+
+  return true;
+}
+
+bool
+riscv_expand_block_clear (rtx dest, rtx length)
+{
+  /* Only use setmem-zero expansion for Zicboz + Zic64b.  */
+  if (!TARGET_ZICBOZ || !TARGET_ZIC64B)
+return false;
+
+  if (optimize_function_for_size_p (cfun))
+return false;
+
+  return riscv_expand_block_clear_zicboz_zic64b (dest, length);
+}
+
 /* --- Vector expanders --- */
 
 namespace riscv_vector {
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 4d6de9925572..c45b1129b0a0 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2608,6 +2608,30 @@
 FAIL;
 })
 
+;; Fill memory with constant byte.
+;; Argument 0 is the destination
+;; Argument 1 is the constant byte
+;; Argument 2 is the length
+;; Argument 3 is the alignment
+
+(define_expand "setmem"
+  [(parallel [(set (match_operand:BLK 0 "memory_operand")
+  (match_operand:QI 2 "const_int_operand"))
+ (use (match_operand:P 1 ""))
+ (use (match_operand:SI 3 "const_int_operand"))])]
+ ""
+ {
+  /* If value to set is not zero, use the library routine.  */
+  if (operands[2] != const0_rtx)
+FAIL;
+
+  if (riscv_expand_block_clear (operands[0], operands[1]))
+DONE;
+  else
+FAIL;
+})
+
+
 ;; Expand in-line code to clear the instruction cache between operand[0] and
 ;; operand[1].
 (define_expand "clear_cache"
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c 
b/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c
new file mode 100644
index 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [PATCH 2/3] RISC-V: testsuite: Make cmo tests LTO safe

2024-05-14 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:0db572dff53572f4c471ec588c7328a33f2cb6ab

commit 0db572dff53572f4c471ec588c7328a33f2cb6ab
Author: Christoph Müllner 
Date:   Tue May 14 09:20:18 2024 -0600

[PATCH 2/3] RISC-V: testsuite: Make cmo tests LTO safe

Let's add '\t' to the instruction match pattern to avoid false positive
matches when compiling with -flto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmo-zicbom-1.c: Add \t to test pattern.
* gcc.target/riscv/cmo-zicbom-2.c: Likewise.
* gcc.target/riscv/cmo-zicbop-1.c: Likewise.
* gcc.target/riscv/cmo-zicbop-2.c: Likewise.
* gcc.target/riscv/cmo-zicboz-1.c: Likewise.
* gcc.target/riscv/cmo-zicboz-2.c: Likewise.

(cherry picked from commit 21855f960141c1811d6a5f6ad3b2065f20d4b353)

Diff:
---
 gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c | 6 +++---
 gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c | 6 +++---
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c | 6 +++---
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c | 6 +++---
 gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c | 2 +-
 gcc/testsuite/gcc.target/riscv/cmo-zicboz-2.c | 2 +-
 6 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c 
b/gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c
index 6341f7874d3e..02c38e201fae 100644
--- a/gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c
@@ -24,6 +24,6 @@ void foo3()
 __builtin_riscv_zicbom_cbo_inval((void*)0x111);
 }
 
-/* { dg-final { scan-assembler-times "cbo.clean" 3 } } */
-/* { dg-final { scan-assembler-times "cbo.flush" 3 } } */
-/* { dg-final { scan-assembler-times "cbo.inval" 3 } } */
+/* { dg-final { scan-assembler-times "cbo.clean\t" 3 } } */
+/* { dg-final { scan-assembler-times "cbo.flush\t" 3 } } */
+/* { dg-final { scan-assembler-times "cbo.inval\t" 3 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c 
b/gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c
index a04f106c8b0e..040b96952bc3 100644
--- a/gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c
@@ -24,6 +24,6 @@ void foo3()
 __builtin_riscv_zicbom_cbo_inval((void*)0x111);
 }
 
-/* { dg-final { scan-assembler-times "cbo.clean" 3 } } */
-/* { dg-final { scan-assembler-times "cbo.flush" 3 } } */
-/* { dg-final { scan-assembler-times "cbo.inval" 3 } } */
+/* { dg-final { scan-assembler-times "cbo.clean\t" 3 } } */
+/* { dg-final { scan-assembler-times "cbo.flush\t" 3 } } */
+/* { dg-final { scan-assembler-times "cbo.inval\t" 3 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c 
b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c
index c5d78c1763d3..97181154d85b 100644
--- a/gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c
@@ -18,6 +18,6 @@ int foo1()
   return __builtin_riscv_zicbop_cbo_prefetchi(1);
 }
 
-/* { dg-final { scan-assembler-times "prefetch.i" 1 } } */
-/* { dg-final { scan-assembler-times "prefetch.r" 4 } } */
-/* { dg-final { scan-assembler-times "prefetch.w" 4 } } */
+/* { dg-final { scan-assembler-times "prefetch.i\t" 1 } } */
+/* { dg-final { scan-assembler-times "prefetch.r\t" 4 } } */
+/* { dg-final { scan-assembler-times "prefetch.w\t" 4 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c 
b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c
index 6576365b39ca..4871a97b21aa 100644
--- a/gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c
@@ -18,6 +18,6 @@ int foo1()
   return __builtin_riscv_zicbop_cbo_prefetchi(1);
 }
 
-/* { dg-final { scan-assembler-times "prefetch.i" 1 } } */
-/* { dg-final { scan-assembler-times "prefetch.r" 4 } } */
-/* { dg-final { scan-assembler-times "prefetch.w" 4 } } */ 
+/* { dg-final { scan-assembler-times "prefetch.i\t" 1 } } */
+/* { dg-final { scan-assembler-times "prefetch.r\t" 4 } } */
+/* { dg-final { scan-assembler-times "prefetch.w\t" 4 } } */ 
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c 
b/gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c
index 5eb78ab94b5a..63b8782bf89e 100644
--- a/gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c
@@ -10,4 +10,4 @@ void foo1()
 __builtin_riscv_zicboz_cbo_zero((void*)0x121);
 }
 
-/* { dg-final { scan-assembler-times "cbo.zero" 3 } } */ 
+/* { dg-final { scan-assembler-times "cbo.zero\t" 3 } } */ 
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicboz-2.c 
b/gcc/testsuite/gcc.target/riscv/cmo-zicboz-2.c
index fdc9c719669c..cc3bd505ec09 100644
--- a/gcc/testsuite/gcc.target/riscv/cmo-zicboz-2.c
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicboz-2.c
@@ -10,4 +10,4 @@ void foo1()
 __builtin_riscv_zicboz_cbo_zero((void*)0x121);
 }
 
-/* { dg-final { scan-assembler-times "cbo.zero" 3 } } */ 
+/* { dg-final { scan-assembler-times "cbo.zero\t" 3 } } */


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [1/3] expr: Export clear_by_pieces()

2024-05-14 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:5b00e29d1833dee69e1146f13a8d8a37dadfa31a

commit 5b00e29d1833dee69e1146f13a8d8a37dadfa31a
Author: Christoph Müllner 
Date:   Tue May 14 09:19:13 2024 -0600

[1/3] expr: Export clear_by_pieces()

Make clear_by_pieces() available to other parts of the compiler,
similar to store_by_pieces().

gcc/ChangeLog:

* expr.cc (clear_by_pieces): Remove static from clear_by_pieces.
* expr.h (clear_by_pieces): Add prototype for clear_by_pieces.

(cherry picked from commit e6e41b68fd805ab126895a20bb9670442b198f62)
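
For reference, the riscv-string.cc hunk in patch 3/3 (later in this digest) 
uses the newly exported function to clear the sub-cache-block tail:

  rtx mem = adjust_address (dest, BLKmode, offset);
  clear_by_pieces (mem, hwi_length - offset, align);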

Diff:
---
 gcc/expr.cc | 6 +-
 gcc/expr.h  | 5 +
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/gcc/expr.cc b/gcc/expr.cc
index d4414e242cb9..eaf86d3d8429 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -85,7 +85,6 @@ static void emit_block_move_via_sized_loop (rtx, rtx, rtx, 
unsigned, unsigned);
 static void emit_block_move_via_oriented_loop (rtx, rtx, rtx, unsigned, 
unsigned);
 static rtx emit_block_cmp_via_loop (rtx, rtx, rtx, tree, rtx, bool,
unsigned, unsigned);
-static void clear_by_pieces (rtx, unsigned HOST_WIDE_INT, unsigned int);
 static rtx_insn *compress_float_constant (rtx, rtx);
 static rtx get_subtarget (rtx);
 static rtx store_field (rtx, poly_int64, poly_int64, poly_uint64, poly_uint64,
@@ -1832,10 +1831,7 @@ store_by_pieces (rtx to, unsigned HOST_WIDE_INT len,
 return to;
 }
 
-/* Generate several move instructions to clear LEN bytes of block TO.  (A MEM
-   rtx with BLKmode).  ALIGN is maximum alignment we can assume.  */
-
-static void
+void
 clear_by_pieces (rtx to, unsigned HOST_WIDE_INT len, unsigned int align)
 {
   if (len == 0)
diff --git a/gcc/expr.h b/gcc/expr.h
index 64956f630297..751815841083 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -245,6 +245,11 @@ extern bool can_store_by_pieces (unsigned HOST_WIDE_INT,
 extern rtx store_by_pieces (rtx, unsigned HOST_WIDE_INT, by_pieces_constfn,
void *, unsigned int, bool, memop_ret);
 
+/* Generate several move instructions to clear LEN bytes of block TO.  (A MEM
+   rtx with BLKmode).  ALIGN is maximum alignment we can assume.  */
+
+extern void clear_by_pieces (rtx, unsigned HOST_WIDE_INT, unsigned int);
+
 /* If can_store_by_pieces passes for worst-case values near MAX_LEN, call
store_by_pieces within conditionals so as to handle variable LEN 
efficiently,
storing VAL, if non-NULL_RTX, or valc instead.  */


Re: [PATCH v2 2/2] RISC-V: avoid LUI based const mat in prologue/epilogue expansion [PR/105733]

2024-05-14 Thread Jeff Law




On 5/14/24 10:36 AM, Vineet Gupta wrote:



On 5/14/24 08:44, Jeff Law wrote:

On 5/14/24 8:51 AM, Patrick O'Neill wrote:

I was able to find the summary info:


Tests that now fail, but worked before (15 tests):
libgomp: libgomp.fortran/simd7.f90   -O0  execution test
libgomp: libgomp.fortran/task2.f90   -O0  execution test
libgomp: libgomp.fortran/vla2.f90   -O0  execution test
libgomp: libgomp.fortran/vla3.f90   -O3 -fomit-frame-pointer -
funroll-loops -fpeel-loops -ftracer -finline-functions execution test
libgomp: libgomp.fortran/vla3.f90   -O3 -g  execution test
libgomp: libgomp.fortran/vla4.f90   -O1  execution test
libgomp: libgomp.fortran/vla4.f90   -O2  execution test
libgomp: libgomp.fortran/vla4.f90   -O3 -fomit-frame-pointer -
funroll-loops -fpeel-loops -ftracer -finline-functions execution test
libgomp: libgomp.fortran/vla4.f90   -O3 -g  execution test
libgomp: libgomp.fortran/vla4.f90   -Os  execution test
libgomp: libgomp.fortran/vla5.f90   -O1  execution test
libgomp: libgomp.fortran/vla5.f90   -O2  execution test
libgomp: libgomp.fortran/vla5.f90   -O3 -fomit-frame-pointer -
funroll-loops -fpeel-loops -ftracer -finline-functions execution test
libgomp: libgomp.fortran/vla5.f90   -O3 -g  execution test
libgomp: libgomp.fortran/vla5.f90   -Os  execution test

So if you could check on those, it'd be appreciated.

I checked rv64gcv linux and those do not currently run in CI.

So I just ran with Vineet's patch in our CI system.  His patch is still
triggering those regressions.  So we need to get that resolved before
that second patch can go in.


And just for reproducibility, what exact --with-arch build is this from?

This run was with "--with-arch=rv64gc_zba_zbb_zbc_zbkb_zbs_zfa_zicond"

I think we likely saw it without zbkb & zfa when we first looked at this 
a few months back.


jeff



Re: [PATCH v2 2/2] RISC-V: avoid LUI based const mat in prologue/epilogue expansion [PR/105733]

2024-05-14 Thread Jeff Law




On 5/14/24 8:51 AM, Patrick O'Neill wrote:





I was able to find the summary info:


Tests that now fail, but worked before (15 tests):
libgomp: libgomp.fortran/simd7.f90   -O0  execution test
libgomp: libgomp.fortran/task2.f90   -O0  execution test
libgomp: libgomp.fortran/vla2.f90   -O0  execution test
libgomp: libgomp.fortran/vla3.f90   -O3 -fomit-frame-pointer - 
funroll-loops -fpeel-loops -ftracer -finline-functions execution test

libgomp: libgomp.fortran/vla3.f90   -O3 -g  execution test
libgomp: libgomp.fortran/vla4.f90   -O1  execution test
libgomp: libgomp.fortran/vla4.f90   -O2  execution test
libgomp: libgomp.fortran/vla4.f90   -O3 -fomit-frame-pointer - 
funroll-loops -fpeel-loops -ftracer -finline-functions execution test

libgomp: libgomp.fortran/vla4.f90   -O3 -g  execution test
libgomp: libgomp.fortran/vla4.f90   -Os  execution test
libgomp: libgomp.fortran/vla5.f90   -O1  execution test
libgomp: libgomp.fortran/vla5.f90   -O2  execution test
libgomp: libgomp.fortran/vla5.f90   -O3 -fomit-frame-pointer - 
funroll-loops -fpeel-loops -ftracer -finline-functions execution test

libgomp: libgomp.fortran/vla5.f90   -O3 -g  execution test
libgomp: libgomp.fortran/vla5.f90   -Os  execution test


So if you could check on those, it'd be appreciated.


I checked rv64gcv linux and those do not currently run in CI.

So I just ran with Vineet's patch in our CI system.  His patch is still 
triggering those regressions.  So we need to get that resolved before 
that second patch can go in.


jeff



Re: [PATCH 1/3] expr: Export clear_by_pieces()

2024-05-14 Thread Jeff Law




On 5/7/24 11:38 PM, Christoph Müllner wrote:

Make clear_by_pieces() available to other parts of the compiler,
similar to store_by_pieces().

gcc/ChangeLog:

* expr.cc (clear_by_pieces): Remove static from clear_by_pieces.
* expr.h (clear_by_pieces): Add prototype for clear_by_pieces.

I'm going to push this series.  It's fully ack'd, tested, and is going to 
interact with Sergei's work on vector variants of relevant patterns.


Jeff

