[to-be-committed][v2][RISC-V] Use bclri in constant synthesis

2024-05-23 Thread Jeff Law
Testing with Zbs enabled by default showed a minor logic error.  After 
the loop clearing things with bclri, we can only use the sequence if we 
were able to clear all the necessary bits.  If any bits are still on, 
then the bclr sequence turned out to not be profitable.


--

So this is conceptually similar to how we handled direct generation of
bseti for constant synthesis, but this time for bclr.

In the bclr case, we already have an expander for AND.  So we just
needed to adjust the predicate to accept another class of constant
operands (those with a single bit clear).

With that in place, constant synthesis is adjusted so that it counts the
number of clear bits in the high 33 bits of a 64-bit word.  If that
number is small relative to the current best cost, then we try to
generate the constant with a lui-based sequence for the low half, which
implicitly sets the upper 32 bits as well.  Then we bclri one or more of
those upper 33 bits.

So as an example, this code goes from 4 instructions down to 3:

unsigned long foo_0xfffbf7ff(void) { return 0xfffbf7ffUL; }
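
(The archive clipped the full 64-bit constant and the generated code here.
As a hedged illustration of the 3-instruction shape, using a constant of my
own choosing with bits 32 and 11 clear, 0xfffffffefffff7ff:)

	lui	a0,0xfffff	# 0xfffffffffffff000 after sign extension
	addi	a0,a0,2047	# 0xfffffffffffff7ff
	bclri	a0,a0,32	# clear bit 32 -> 0xfffffffefffff7ff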




Note the use of 33 bits above.  That's meant to capture cases like this:


unsigned long foo_0xfffd77ff(void) { return 0xfffd77ffUL; }




We can use lui+addi+bclri+bclri to synthesize that in 4 instructions
instead of 5.
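
(Again with a hypothetical analogue, since the constant was clipped in the
archive: for a value with bits 33, 31 and 11 clear, say 0xfffffffd7ffff7ff,
the sketch is:)

	lui	a0,0xfffff	# 0xfffffffffffff000 after sign extension
	addi	a0,a0,2047	# 0xfffffffffffff7ff
	bclri	a0,a0,31	# clear bit 31
	bclri	a0,a0,33	# clear bit 33 -> 0xfffffffd7ffff7ff

If bit 31 were excluded from the clearable range, the bit-31-clear low half
could not be produced by a sign-extending lui+addi pair and the synthesis
would need a fifth instruction.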




I'm including a handful of cases covering the two basic ideas above that
were found by the testing code.

And, no, we're not done yet.  I see at least one more notable idiom
missing before exploring zbkb's potential to improve things.

Tested in my tester and waiting on Rivos CI system before moving forward.
gcc/

* config/riscv/predicates.md (arith_operand_or_mode_mask): Renamed to..
(arith_or_mode_mask_or_zbs_operand): New predicate.
* config/riscv/riscv.md (and<mode>3): Update predicate for operand 2.
* config/riscv/riscv.cc (riscv_build_integer_1): Use bclri to clear
bits, particularly bits 31..63 when profitable to do so.

gcc/testsuite/

* gcc.target/riscv/synthesis-6.c: New test.

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 8948fbfc363..c1c693c7617 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -27,12 +27,6 @@ (define_predicate "arith_operand"
   (ior (match_operand 0 "const_arith_operand")
(match_operand 0 "register_operand")))
 
-(define_predicate "arith_operand_or_mode_mask"
-  (ior (match_operand 0 "arith_operand")
-   (and (match_code "const_int")
-(match_test "UINTVAL (op) == GET_MODE_MASK (HImode)
-|| UINTVAL (op) == GET_MODE_MASK (SImode)"
-
 (define_predicate "lui_operand"
   (and (match_code "const_int")
(match_test "LUI_OPERAND (INTVAL (op))")))
@@ -398,6 +392,14 @@ (define_predicate "not_single_bit_mask_operand"
   (and (match_code "const_int")
(match_test "SINGLE_BIT_MASK_OPERAND (~UINTVAL (op))")))
 
+(define_predicate "arith_or_mode_mask_or_zbs_operand"
+  (ior (match_operand 0 "arith_operand")
+   (and (match_test "TARGET_ZBS")
+   (match_operand 0 "not_single_bit_mask_operand"))
+   (and (match_code "const_int")
+(match_test "UINTVAL (op) == GET_MODE_MASK (HImode)
+|| UINTVAL (op) == GET_MODE_MASK (SImode)"
+
 (define_predicate "const_si_mask_operand"
   (and (match_code "const_int")
(match_test "(INTVAL (op) & (GET_MODE_BITSIZE (SImode) - 1))
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 85df5b7ab49..3b32b515fac 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -893,6 +893,40 @@ riscv_build_integer_1 (struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS],
  codes[1].use_uw = false;
  cost = 2;
}
+
+  /* If LUI/ADDI are going to set bits 32..63 and we need a small
+     number of them cleared, we might be able to use bclri profitably.
+
+     Note we may allow clearing of bit 31 using bclri.  There's a class
+     of constants with that bit clear where this helps.  */
+  else if (TARGET_64BIT
+	   && TARGET_ZBS
+	   && (32 - popcount_hwi (value & HOST_WIDE_INT_C (0xffffffff80000000))) + 1 < cost)
+	{
+	  /* Turn on all those upper bits and synthesize the result.  */
+	  HOST_WIDE_INT nval = value | HOST_WIDE_INT_C (0xffffffff80000000);
+	  alt_cost = riscv_build_integer_1 (alt_codes, nval, mode);
+
+	  /* Now iterate over the bits we want to clear until the cost is
+	     too high or we're done.  */
+	  nval = value ^ HOST_WIDE_INT_C (-1);
+	  nval &= HOST_WIDE_INT_C (~0x7fffffff);
+	  while (nval && alt_cost < cost)
+	    {
+	      HOST_WIDE_INT bit = ctz_hwi (nval);
+	      alt_codes[alt_cost].code = AND;
+	      alt_codes[alt_cost].value = ~(1UL << bit);
+	      alt_codes[alt_cost].use_uw = false;
+ 

[to-be-committed] [RISC-V] Use bclri in constant synthesis

2024-05-23 Thread Jeff Law
So this is conceptually similar to how we handled direct generation of 
bseti for constant synthesis, but this time for bclr.


In the bclr case, we already have an expander for AND.  So we just 
needed to adjust the predicate to accept another class of constant 
operands (those with a single bit clear).


With that in place, constant synthesis is adjusted so that it counts the
number of clear bits in the high 33 bits of a 64-bit word.  If that
number is small relative to the current best cost, then we try to
generate the constant with a lui-based sequence for the low half, which
implicitly sets the upper 32 bits as well.  Then we bclri one or more of
those upper 33 bits.


So as an example, this code goes from 4 instructions down to 3:


unsigned long foo_0xfffbf7ff(void) { return 0xfffbf7ffUL; }




Note the use of 33 bits above.  That's meant to capture cases like this:



unsigned long foo_0xfffd77ff(void) { return 0xfffd77ffUL; }




We can use lui+addi+bclri+bclri to synthesize that in 4 instructions 
instead of 5.





I'm including a handful of cases covering the two basic ideas above that 
were found by the testing code.


And, no, we're not done yet.  I see at least one more notable idiom 
missing before exploring zbkb's potential to improve things.


Tested in my tester and waiting on Rivos CI system before moving forward.

jeff


gcc/

* config/riscv/predicates.md (arith_operand_or_mode_mask): Renamed to..
(arith_or_mode_mask_or_zbs_operand): New predicate.
* config/riscv/riscv.md (and<mode>3): Update predicate for operand 2.
* config/riscv/riscv.cc (riscv_build_integer_1): Use bclri to clear
bits, particularly bits 31..63 when profitable to do so.

gcc/testsuite/

* gcc.target/riscv/synthesis-6.c: New test.

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 8948fbfc363..c1c693c7617 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -27,12 +27,6 @@ (define_predicate "arith_operand"
   (ior (match_operand 0 "const_arith_operand")
(match_operand 0 "register_operand")))
 
-(define_predicate "arith_operand_or_mode_mask"
-  (ior (match_operand 0 "arith_operand")
-   (and (match_code "const_int")
-(match_test "UINTVAL (op) == GET_MODE_MASK (HImode)
-|| UINTVAL (op) == GET_MODE_MASK (SImode)"
-
 (define_predicate "lui_operand"
   (and (match_code "const_int")
(match_test "LUI_OPERAND (INTVAL (op))")))
@@ -398,6 +392,14 @@ (define_predicate "not_single_bit_mask_operand"
   (and (match_code "const_int")
(match_test "SINGLE_BIT_MASK_OPERAND (~UINTVAL (op))")))
 
+(define_predicate "arith_or_mode_mask_or_zbs_operand"
+  (ior (match_operand 0 "arith_operand")
+   (and (match_test "TARGET_ZBS")
+   (match_operand 0 "not_single_bit_mask_operand"))
+   (and (match_code "const_int")
+(match_test "UINTVAL (op) == GET_MODE_MASK (HImode)
+|| UINTVAL (op) == GET_MODE_MASK (SImode)"
+
 (define_predicate "const_si_mask_operand"
   (and (match_code "const_int")
(match_test "(INTVAL (op) & (GET_MODE_BITSIZE (SImode) - 1))
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 85df5b7ab49..3b32b515fac 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -893,6 +893,40 @@ riscv_build_integer_1 (struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS],
  codes[1].use_uw = false;
  cost = 2;
}
+
+  /* If LUI/ADDI are going to set bits 32..63 and we need a small
+     number of them cleared, we might be able to use bclri profitably.
+
+     Note we may allow clearing of bit 31 using bclri.  There's a class
+     of constants with that bit clear where this helps.  */
+  else if (TARGET_64BIT
+	   && TARGET_ZBS
+	   && (32 - popcount_hwi (value & HOST_WIDE_INT_C (0xffffffff80000000))) + 1 < cost)
+	{
+	  /* Turn on all those upper bits and synthesize the result.  */
+	  HOST_WIDE_INT nval = value | HOST_WIDE_INT_C (0xffffffff80000000);
+	  alt_cost = riscv_build_integer_1 (alt_codes, nval, mode);
+
+	  /* Now iterate over the bits we want to clear until the cost is
+	     too high or we're done.  */
+	  nval = value ^ HOST_WIDE_INT_C (-1);
+	  nval &= HOST_WIDE_INT_C (~0x7fffffff);
+	  while (nval && alt_cost < cost)
+	    {
+	      HOST_WIDE_INT bit = ctz_hwi (nval);
+	      alt_codes[alt_cost].code = AND;
+	      alt_codes[alt_cost].value = ~(1UL << bit);
+	      alt_codes[alt_cost].use_uw = false;
+	      alt_cost++;
+	      nval &= ~(1UL << bit);
+	    }
+
+	  if (alt_cost <= cost)
+	    {
+	      memcpy (codes, alt_codes, sizeof (alt_codes));
+	      cost = alt_cost;
+	    }
+	}
 }
 
   if (cost > 2 && 

Re: [PATCH v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD

2024-05-23 Thread Jeff Law




On 5/23/24 6:14 AM, Richard Biener wrote:

On Thu, May 23, 2024 at 1:08 PM Li, Pan2  wrote:


I have a try to convert the PHI from Part-A to Part-B, aka PHI to _2 = phi_cond 
? _1 : 255.
And then we can do the matching on COND_EXPR in the underlying widen-mul pass.

Unfortunately, meet some ICE when verify_gimple_phi in sccopy1 pass =>
sat_add.c:66:1: internal compiler error: tree check: expected class ‘type’, 
have ‘exceptional’ (error_mark) in useless_type_conversion_p, at 
gimple-expr.cc:86


Likely you have released _2, more comments below on your previous mail.
You can be sure by calling debug_tree () on the SSA_NAME node in 
question.  If it reports "in-free-list", then that's definitive that the 
SSA_NAME was released back to the SSA_NAME manager.  If that SSA_NAME is 
still in the IL, then that's very bad.


jeff



Re: RISC-V: Fix round_32.c test on RV32

2024-05-22 Thread Jeff Law




On 5/22/24 12:15 PM, Palmer Dabbelt wrote:

On Wed, 22 May 2024 11:01:16 PDT (-0700), jeffreya...@gmail.com wrote:



On 5/22/24 6:47 AM, Jivan Hakobyan wrote:

After commit 8367c996e55b2, several checks in the round_32.c test started to
fail.
The reason is that we prevent rounding DF->SI->DF on RV32, and instead of
a conversion sequence we get calls to the appropriate library functions.


gcc/testsuite/ChangeLog:
         * testsuite/gcc.target/riscv/round_32.c: Fixed test

I wonder if this test even makes sense for rv32 anymore given we can't
do a DF->DI as a single instruction and DF->SI is going to give
incorrect results.  So the underlying optimization to improve those
rounding cases just doesn't apply to DF mode objects for rv32.

Thoughts?


Unless I'm missing something, we should still be able to do the float 
roundings on rv32?
I initially thought that as well.  The problem is we don't have a DF->DI 
conversion instruction for rv32.  We can't use DF->SI as the range of 
representable values is wrong.
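
A minimal sketch of why the SImode path can't be salvaged (my example, not
from the test):

	double d = 3e9;			/* exactly representable in DF */
	long long ok = (long long) d;	/* 3000000000 fits in DI */
	int bad = (int) d;		/* outside SI's range */

Any DF value outside [INT_MIN, INT_MAX] makes a DF->SI->DF round-trip
disagree with the library call, so the rounding shortcut is unsafe on rv32.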





I think with Zfa we'd also have testable sequences for the double/double 
and float/float roundings, which could be useful to test.  I'm not 
entirely sure there, though, as I always get a bit lost in which FP 
rounding flavors map down.
Zfa is a different story as it has instructions with the proper 
semantics ;-)  We'd just emit those new instructions and wouldn't have 
to worry about the initial range test.





I'd also kicked off some run trying to promote these to executable 
tests.   IIRC it was just DG stuff (maybe just adding a `dg-do run`?) 
but I don't know where I stashed the results...

Not a bad idea, particularly if we test the border cases.

jeff



Re: RISC-V: Fix round_32.c test on RV32

2024-05-22 Thread Jeff Law




On 5/22/24 6:47 AM, Jivan Hakobyan wrote:
After commit 8367c996e55b2, several checks in the round_32.c test started to
fail.

The reason is that we prevent rounding DF->SI->DF on RV32, and instead of
a conversion sequence we get calls to the appropriate library functions.


gcc/testsuite/ChangeLog:
         * testsuite/gcc.target/riscv/round_32.c: Fixed test
I wonder if this test even makes sense for rv32 anymore given we can't 
do a DF->DI as a single instruction and DF->SI is going to give 
incorrect results.  So the underlying optimization to improve those 
rounding cases just doesn't apply to DF mode objects for rv32.


Thoughts?
Jeff



Re: [PATCH] Fix PR rtl-optimization/115038

2024-05-22 Thread Jeff Law




On 5/20/24 1:13 AM, Eric Botcazou wrote:

Hi,

this is a regression present on mainline and 14 branch under the form of an
ICE in seh_cfa_offset from config/i386/winnt.cc on the attached C++ testcase
compiled with -O2 -fno-omit-frame-pointer.

The problem directly comes from the -ffold-mem-offsets pass messing up with
the prologue and the frame-related instructions, which is a no-no with SEH, so
the fix simply disconnects the pass in these circumstances, the question being
whether this should be done unconditionally as in the fix or only with SEH.

Tested on x86-64/Linux, OK for the mainline and 14 branch?


2024-05-20  Eric Botcazou  

PR rtl-optimization/115038
* fold-mem-offsets.cc (fold_offsets): Return 0 if the defining
instruction of the register is frame related.


2024-05-20  Eric Botcazou  

* g++.dg/opt/fmo1.C: New test.
lol.  I missed that you had already submitted this when I made my 
comment in the PR.


OK for the trunk and gcc-14 branch.

Jeff


Re: [PATCH 4/4] Testsuite updates

2024-05-22 Thread Jeff Law




On 5/22/24 4:58 AM, Richard Biener wrote:



RISC-V CI didn't trigger (not sure what magic is required).  Both
ARM and AArch64 show that the "Vectorizing stmts using SLP" checks are a bit
fragile because we sometimes cancel SLP because we want to use
load/store-lanes.

The RISC-V tag on the subject line is the trigger.

Jeff


Re: [PATCH] [tree-optimization/110279] fix testcase pr110279-1.c

2024-05-22 Thread Jeff Law




On 5/22/24 5:46 AM, Di Zhao OS wrote:

The test case is for targets that support FMA.  Previously
the "target" selector was missing in the dg-final command.

Tested on x86_64-pc-linux-gnu.

Thanks
Di Zhao

gcc/testsuite/ChangeLog:

 * gcc.dg/pr110279-1.c: add target selector.
Rather than list targets explicitly in the test, wouldn't it be better 
to have a common routine that could be used in other cases where we have 
a test that requires FMA?


So something similar to check_effective_target_scalar_all_fma?


Jeff


Re: [PATCH v1 2/2] RISC-V: Add test cases for __builtin_add_overflow branchless unsigned SAT_ADD

2024-05-21 Thread Jeff Law




On 5/19/24 12:37 AM, pan2...@intel.com wrote:

From: Pan Li 

After we support branchless __builtin_add_overflow unsigned SAT_ADD in
the middle end, add more test cases to cover the functionality.

The below test suites are passed.
* The rv64gcv fully regression test.
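
For reference, a hedged sketch of the branchless idiom these tests exercise
(my reconstruction, not copied from sat_arith.h):

	unsigned
	sat_add (unsigned a, unsigned b)
	{
	  unsigned sum;
	  unsigned overflow = __builtin_add_overflow (a, b, &sum);
	  return sum | (0u - overflow);	/* all-ones mask on overflow */
	}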

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add __builtin_add_overflow test
macro.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-5.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-6.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-7.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-8.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-5.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-6.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-7.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-8.c: New test.
* gcc.target/riscv/sat_u_add-5.c: New test.
* gcc.target/riscv/sat_u_add-6.c: New test.
* gcc.target/riscv/sat_u_add-7.c: New test.
* gcc.target/riscv/sat_u_add-8.c: New test.
* gcc.target/riscv/sat_u_add-run-5.c: New test.
* gcc.target/riscv/sat_u_add-run-6.c: New test.
* gcc.target/riscv/sat_u_add-run-7.c: New test.
* gcc.target/riscv/sat_u_add-run-8.c: New test.

OK
jeff



Re: [PATCH v1 2/2] RISC-V: Add test cases for branch form unsigned SAT_ADD

2024-05-21 Thread Jeff Law




On 5/20/24 5:01 AM, pan2...@intel.com wrote:

From: Pan Li 

After we support the branch form of unsigned SAT_ADD in the
middle end, add more test cases to cover the functionality.

The below test suites are passed.
* The rv64gcv fully regression test.
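
A hedged sketch of the branch form being covered here (my reconstruction,
not copied from sat_arith.h):

	unsigned
	sat_add (unsigned a, unsigned b)
	{
	  unsigned sum = a + b;
	  return sum >= a ? sum : 0xffffffff;	/* saturate on wraparound */
	}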

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add branch form test macro.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-10.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-11.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-12.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-9.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-9.c: New test.
* gcc.target/riscv/sat_u_add-10.c: New test.
* gcc.target/riscv/sat_u_add-11.c: New test.
* gcc.target/riscv/sat_u_add-12.c: New test.
* gcc.target/riscv/sat_u_add-9.c: New test.
* gcc.target/riscv/sat_u_add-run-10.c: New test.
* gcc.target/riscv/sat_u_add-run-11.c: New test.
* gcc.target/riscv/sat_u_add-run-12.c: New test.
* gcc.target/riscv/sat_u_add-run-9.c: New test.


OK

jeff



Re: [PATCH v3 2/2] RISC-V: avoid LUI based const mat in alloca epilogue expansion

2024-05-21 Thread Jeff Law




On 5/20/24 5:32 PM, Vineet Gupta wrote:

This is testsuite clean however there's a dwarf quirk which I want to
run by the experts. The test that was tripping CI has following
fragment:

Before patch|   After Patch
--
li  t0,-4096|  addi sp,s0,-2048
addit0,t0,560   |  .cfi_def_cfa 2, 2048  <- #1
add sp,s0,t0|  addi sp,sp,-1488
.cfi_def_cfa 2, 3536|  .cfi_def_cfa_offset 3536  <- #2
addisp,sp,1504  |  addi sp,sp,1504
.cfi_def_cfa_offset 2032|  .cfi_def_cfa_offset 2032  <- #3

The dwarf insn #1 and #3 seem ok, however #2 seems dubious to me.

---

This is continuing on the prev patch in function epilogue expansion.

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_expand_epilogue): Handle offset
being sum of two S12.

OK.
jeff



Re: [PATCH v3 1/2] RISC-V: avoid LUI based const mat in prologue/epilogue expansion [PR/105733]

2024-05-21 Thread Jeff Law




On 5/20/24 5:32 PM, Vineet Gupta wrote:

Changes since v2:
   - Broke out the hunk corresponding to alloca in epilogue expansion in
     a separate patch.
---

If the constant used for stack offset can be expressed as sum of two S12
values, the constant need not be materialized (in a reg) and instead the
two S12 bits can be added to instructions involved with frame pointer.
This avoids burning a register and more importantly can often get down
to be 2 insn vs. 3.
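
A hedged sketch of the splitting idea (hypothetical helper, not the exact
GCC implementation): any offset in [-4096, 4094] is the sum of two signed
12-bit immediates, so it can be folded into two addi-style instructions:

	static bool
	split_sum_of_two_s12 (long long off, long long *first, long long *second)
	{
	  if (off < -4096 || off > 4094)
	    return false;
	  *first = off >= 0 ? 2047 : -2048;
	  *second = off - *first;	/* always lands in [-2048, 2047] */
	  return true;
	}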

The previous patches to generally avoid LUI-based const materialization didn't
fix this PR and need this directed fix in function prologue/epilogue
expansion.

This fix doesn't move the needle for SPEC, at all, but it is still a
win considering gcc generates one insn fewer than llvm for the test ;-)

gcc-13.1 release    | gcc 230823        |                   |
                    | g6619b3d4c15c     |  This patch       |  clang/llvm
---------------------------------------------------------------------------
li   t0,-4096       | li   t0,-4096     | addi sp,sp,-2048  | addi sp,sp,-2048
addi t0,t0,2016     | addi t0,t0,2032   | add  sp,sp,-16    | addi sp,sp,-32
li   a4,4096        | add  sp,sp,t0     | add  a5,sp,a0     | add  a1,sp,16
add  sp,sp,t0       | addi a5,sp,-2032  | sb   zero,0(a5)   | add  a0,a0,a1
li   a5,-4096       | add  a0,a5,a0     | addi sp,sp,2032   | sb   zero,0(a0)
addi a4,a4,-2032    | li   t0,4096      | addi sp,sp,32     | addi sp,sp,2032
add  a4,a4,a5       | sb   zero,2032(a0)| ret               | addi sp,sp,48
addi a5,sp,16       | addi t0,t0,-2032  |                   | ret
add  a5,a4,a5       | add  sp,sp,t0     |
add  a0,a5,a0       | ret               |
li   t0,4096        |
sd   a5,8(sp)       |
sb   zero,2032(a0)  |
addi t0,t0,-2016    |
add  sp,sp,t0       |
ret                 |

gcc/ChangeLog:
PR target/105733
* config/riscv/riscv.h: New macros for with aligned offsets.
* config/riscv/riscv.cc (riscv_split_sum_of_two_s12): New
function to split a sum of two s12 values into constituents.
(riscv_expand_prologue): Handle offset being sum of two S12.
(riscv_expand_epilogue): Ditto.
* config/riscv/riscv-protos.h (riscv_split_sum_of_two_s12): New.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr105733.c: New Test.
* gcc.target/riscv/rvv/autovec/vls/spill-1.c: Adjust to not
expect LUI 4096.
* gcc.target/riscv/rvv/autovec/vls/spill-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-7.c: Ditto.

OK
Jeff



Re: [PATCH v1 2/2] RISC-V: Add test cases for __builtin_add_overflow branch form unsigned SAT_ADD

2024-05-21 Thread Jeff Law




On 5/21/24 4:53 AM, pan2...@intel.com wrote:

From: Pan Li 

After we support the __builtin_add_overflow branch form of unsigned SAT_ADD
in the middle end, add more test cases to cover the functionality.

The below test suites are passed.
* The rv64gcv fully regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test macro for
branch __builtin_add_overflow form.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-13.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-14.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-15.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-16.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-13.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-14.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-15.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-16.c: New test.
* gcc.target/riscv/sat_u_add-13.c: New test.
* gcc.target/riscv/sat_u_add-14.c: New test.
* gcc.target/riscv/sat_u_add-15.c: New test.
* gcc.target/riscv/sat_u_add-16.c: New test.
* gcc.target/riscv/sat_u_add-run-13.c: New test.
* gcc.target/riscv/sat_u_add-run-14.c: New test.
* gcc.target/riscv/sat_u_add-run-15.c: New test.
* gcc.target/riscv/sat_u_add-run-16.c: New test.

OK
jeff



Re: [committed] PATCH for Re: Stepping down as maintainer for ARC and Epiphany

2024-05-21 Thread Jeff Law




On 5/21/24 8:02 AM, Paul Koning wrote:




On May 21, 2024, at 9:57 AM, Jeff Law  wrote:



On 5/21/24 12:05 AM, Richard Biener via Gcc wrote:

On Mon, May 20, 2024 at 4:45 PM Gerald Pfeifer  wrote:


On Wed, 5 Jul 2023, Joern Rennecke wrote:

I haven't worked with these targets in years and can't really do
sensible maintenance or reviews of patches for them. I am currently
working on optimizations for other ports like RISC-V.


I noticed MAINTAINERS was not updated, so pushed the patch below.

That leaves the epiphany port unmaintained.  Should we automatically add such
ports to the list of obsoleted ports?

Given that epiphany has randomly failed tests for the last 3+ years due to bugs 
in its patterns, yes, it really needs to be deprecated.

I tried to fix the worst of the offenders in epiphany.md a few years back and 
gave up.  Essentially seemingly innocent changes in the RTL will cause reload 
to occasionally not see a path to get constraints satisfied.  So a test which 
passes today will flip to failing tomorrow while some other set of tests will 
go the other way.


Does LRA make that issue go away, or does it not help?
LRA didn't trivially work on epiphany.  I didn't care enough about the 
port to try and make it LRA compatible.


jeff



Re: [committed] PATCH for Re: Stepping down as maintainer for ARC and Epiphany

2024-05-21 Thread Jeff Law




On 5/21/24 12:05 AM, Richard Biener via Gcc wrote:

On Mon, May 20, 2024 at 4:45 PM Gerald Pfeifer  wrote:


On Wed, 5 Jul 2023, Joern Rennecke wrote:

I haven't worked with these targets in years and can't really do
sensible maintenance or reviews of patches for them. I am currently
working on optimizations for other ports like RISC-V.


I noticed MAINTAINERS was not updated, so pushed the patch below.


That leaves the epiphany port unmaintained.  Should we automatically add such
ports to the list of obsoleted ports?
Given that epiphany has randomly failed tests for the last 3+ years due 
to bugs in its patterns, yes, it really needs to be deprecated.


I tried to fix the worst of the offenders in epiphany.md a few years 
back and gave up.  Essentially seemingly innocent changes in the RTL 
will cause reload to occasionally not see a path to get constraints 
satisfied.  So a test which passes today will flip to failing tomorrow 
while some other set of tests will go the other way.




jeff



Re: [PATCH v3 2/2] RISC-V: avoid LUI based const mat in alloca epilogue expansion

2024-05-20 Thread Jeff Law




On 5/20/24 5:32 PM, Vineet Gupta wrote:

This is testsuite clean however there's a dwarf quirk which I want to
run by the experts. The test that was tripping CI has following
fragment:

Before patch|   After Patch
--
li  t0,-4096|  addi sp,s0,-2048
addit0,t0,560   |  .cfi_def_cfa 2, 2048  <- #1
add sp,s0,t0|  addi sp,sp,-1488
.cfi_def_cfa 2, 3536|  .cfi_def_cfa_offset 3536  <- #2
addisp,sp,1504  |  addi sp,sp,1504
.cfi_def_cfa_offset 2032|  .cfi_def_cfa_offset 2032  <- #3

The dwarf insn #1 and #3 seem ok, however #2 seems dubious to me.
What about it seems dubious?  We need a CFA adjustment on each insn that 
modifies the stack pointer so that we can unwind at any arbitrary point.


The first adjustment says the prior frame is at sp + 2048.  Then it's at 
sp + 3536.  Then after the final insn the prior frame is at sp+2032.
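
Worked through with the frame base at s0: after "addi sp,s0,-2048" the rule
CFA = sp + 2048 evaluates to s0; after "addi sp,sp,-1488" the rule must grow
to sp + 3536 = (s0 - 2048 - 1488) + 3536 = s0; and after "addi sp,sp,1504"
it shrinks to sp + 2032 = s0 again.  Insn #2 is just preserving that
invariant.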


Jeff


Re: [to-be-committed][RISC-V] Eliminate redundant bitmanip operation

2024-05-19 Thread Jeff Law




On 5/19/24 1:59 PM, Andrew Pinski wrote:

On Sun, May 19, 2024 at 10:58 AM Jeff Law  wrote:


perl has some internal bitmap code.  One of its implementation
properties is that if you ask it to set a bit, the bit is first cleared.


Unfortunately this is fairly hard to see in gimple/match due to type
changes in the IL.  But it is easy to see in the code we get from
combine.  So we just match the relevant cases.



So looking into this from a gimple point of view, we can see the issue
on x86_64 if you used explicitly `unsigned char`.
We have:
```
   c_8 = (unsigned char) _1;
   _2 = *a_10(D);
   c.0_3 = (signed char) _1;
   _4 = ~c.0_3;
   _12 = (unsigned char) _4;
```
So for this, we could push the no_op cast from `signed char` to
`unsigned char` past the `bit_not` and I think it will fix the issue
on the gimple level.
So something like:
```
/* Push no_op conversion past the bit_not expression if it was single use. */
(simplify
  (convert (bit_not:s @0))
  (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
   (bit_not (convert @0))))
```
I'm not sure where the best place to put the conversion would be in 
gimple.  I bet there's times when we want the conversion at the outer 
level and others times at the inner level.  Just not sure it's going to 
be clear cut with either solution likely causing regressions somewhere.


What we can (and probably should) do is put this simplification into 
simplify-rtx.  It's target independent and shouldn't be hard to capture 
there.


Jeff



[to-be-committed][RISC-V] Eliminate redundant bitmanip operation

2024-05-19 Thread Jeff Law
perl has some internal bitmap code.  One of its implementation 
properties is that if you ask it to set a bit, the bit is first cleared.



Unfortunately this is fairly hard to see in gimple/match due to type 
changes in the IL.  But it is easy to see in the code we get from 
combine.  So we just match the relevant cases.
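
A minimal sketch of the source-level idiom (my reconstruction of the
perl-like code, not perl's actual implementation):

	void
	set_bit (unsigned long *word, unsigned bit)
	{
	  *word &= ~(1UL << bit);	/* bclr: clear the bit first ...  */
	  *word |= 1UL << bit;		/* bset: ... then set it, so the
					   preceding clear is redundant.  */
	}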




Regression tested in Ventana's CI system as well as my own.  Waiting on 
the Rivos CI system before moving forward.




Jeff

gcc/

* config/riscv/bitmanip.md: Add patterns for setting a just
cleared bit or clearing a just set bit.
* config/riscv/riscv.cc (riscv_rtx_costs): Cost that RTL
properly

gcc/testsuite

* gcc.target/riscv/redundant-bitmap-1.c: New test.
* gcc.target/riscv/redundant-bitmap-2.c: New test.
* gcc.target/riscv/redundant-bitmap-3.c: New test.

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 8769a6b818b..9d4247ec8b9 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -877,6 +877,29 @@ (define_insn_and_split ""
}"
   [(set_attr "type" "bitmanip")])
 
+;; In theory these might be better handled with match.pd patterns, but
+;; the type changes tend to make it ugly, at least for the perl testcases
+(define_insn ""
+  [(set (match_operand:X 0 "register_operand" "=r")
+   (ior:X (and:X (rotate:X (const_int -2)
+   (match_operand:QI 1 "register_operand" "r"))
+ (match_operand:X 2 "register_operand" "r"))
+  (ashift:X (const_int 1) (match_operand:QI 3 "register_operand" 
"r"]
+  "TARGET_ZBS && rtx_equal_p (operands[1], operands[3])"
+  "bset\t%0,%2,%1"
+  [(set_attr "type" "bitmanip")])
+
+(define_insn ""
+  [(set (match_operand:X 0 "register_operand" "=r")
+   (and:X (any_or:X (ashift:X (const_int 1)
+  (match_operand:QI 1 "register_operand" "r"))
+(match_operand:X 2 "register_operand" "r"))
+  (rotate:X (const_int -2)
+(match_operand:QI 3 "register_operand" "r"]
+  "TARGET_ZBS && rtx_equal_p (operands[1], operands[3])"
+  "bclr\t%0,%2,%1"
+  [(set_attr "type" "bitmanip")])
+
 ;; IF_THEN_ELSE: test for 2 bits of opposite polarity
 (define_insn_and_split "*branch_mask_twobits_equals_singlebit"
   [(set (pc)
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index b0a14a2a82d..78a4a1cd554 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3712,6 +3712,22 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
  return true;
}
 
+  /* Special case for bset followed by bclr.  */
+  if (GET_CODE (x) == AND
+ && (GET_CODE (XEXP (x, 0)) == IOR
+ || GET_CODE (XEXP (x, 0)) == XOR)
+ && GET_CODE (XEXP (XEXP (x, 0), 0)) == ASHIFT
+ && XEXP (XEXP (XEXP (x, 0), 0), 0) == CONST1_RTX (word_mode)
+ && GET_CODE (XEXP (x, 1)) == ROTATE
+ && CONST_INT_P (XEXP (XEXP (x, 1), 0))
+ && INTVAL (XEXP (XEXP (x, 1), 0)) == -2
+	  && rtx_equal_p (XEXP (XEXP (XEXP (x, 0), 0), 1),
+			  (XEXP (XEXP (x, 1), 1))))
+   {
+ *total = COSTS_N_INSNS (1);
+ return true;
+   }
+
   gcc_fallthrough ();
 case IOR:
 case XOR:
@@ -3734,6 +3750,21 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
  return true;
}
 
+  /* Special case for bclr followed by bset.  */
+  if (GET_CODE (x) == IOR
+ && GET_CODE (XEXP (x, 0)) == AND
+ && GET_CODE (XEXP (XEXP (x, 0), 0)) == ROTATE
+ && CONST_INT_P (XEXP (XEXP (XEXP (x, 0), 0), 0))
+ && INTVAL (XEXP (XEXP (XEXP (x, 0), 0), 0)) == -2
+ && GET_CODE (XEXP (x, 1)) == ASHIFT
+ && XEXP (XEXP (x, 1), 0) == CONST1_RTX (word_mode)
+	  && rtx_equal_p (XEXP (XEXP (XEXP (x, 0), 0), 1),
+			  (XEXP (XEXP (x, 1), 1))))
+   {
+ *total = COSTS_N_INSNS (1);
+ return true;
+   }
+
   /* Double-word operations use two single-word operations.  */
   *total = riscv_binary_cost (x, 1, 2);
   return false;
diff --git a/gcc/testsuite/g++.target/riscv/redundant-bitmap-1.C 
b/gcc/testsuite/g++.target/riscv/redundant-bitmap-1.C
new file mode 100644
index 000..85be608bdc8
--- /dev/null
+++ b/gcc/testsuite/g++.target/riscv/redundant-bitmap-1.C
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+
+void setBit(char &a, int b) {
+char c = 0x1UL << b;
+a &= ~c;
+a |= c;
+}
+
+/* { dg-final { scan-assembler-not "bclr\t" } } */
+
diff --git a/gcc/testsuite/g++.target/riscv/redundant-bitmap-2.C 
b/gcc/testsuite/g++.target/riscv/redundant-bitmap-2.C
new file mode 100644
index 000..9060eb1d769
--- /dev/null
+++ 

Re: [PATCH v4] DSE: Fix ICE after allow vector type in get_stored_val

2024-05-19 Thread Jeff Law




On 5/2/24 7:51 PM, pan2...@intel.com wrote:

From: Pan Li 

We previously allowed vector types in get_stored_val when the read is less
than or equal to the store.  Unfortunately, validate_subreg treats a vector
type whose size is less than a vector register as invalid.  Then we will
have an ICE here.

This patch would like to fix it by filtering out the invalid type size,
and making sure the subreg is valid for both the read_mode and store_mode
before performing the real gen_lowpart.

The below test suites are passed for this patch:

* The x86 bootstrap test.
* The x86 regression test.
* The riscv rv64gcv regression test.
* The riscv rv64gc regression test.
* The aarch64 regression test.

gcc/ChangeLog:

* dse.cc (get_stored_val): Make sure read_mode/write_mode
is valid subreg before gen_lowpart.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/bug-6.c: New test.
OK for the trunk.  Let's let it simmer on the trunk for a while before 
we consider backporting.


jeff



Re: [PATCH] Add widening expansion of MULT_HIGHPART_EXPR for integral modes

2024-05-19 Thread Jeff Law




On 5/19/24 3:40 AM, Eric Botcazou wrote:

Hi,


Just noticed that this patch may result in an ICE when building libc++ for the
riscv port, details as below.  Please note that not every configuration can
reproduce this issue; feel free to ping me if you cannot reproduce it.
CC'ing more riscv port people for awareness.


Sorry for the breakage, fixed thus, applied as obvious.


* optabs-query.cc (can_mult_highpart_p): Test for the existence of
a wider mode instead of requiring it.
I had basically the same patch here, but hadn't run it through the 
bootstrap & regression test yesterday.


Thanks for taking care of it!

jeff


[to-be-committed][RISC-V][PR target/115142] Do not create invalid shift-add insn

2024-05-18 Thread Jeff Law

Repost, this time with the RISC-V tag so it's picked up by the CI system.

This fixes a minor bug that showed up in the CI system, presumably with 
fuzz testing.


Under the right circumstances, we could end up trying to emit a shift-add 
style sequence where the to-be-shifted operand was not a register.  This 
naturally leads to an unrecognized insn.


The circumstances which triggered this weren't something that should 
appear in the wild (-ftree-ter, without optimization enabled).  So I 
wasn't planning to backport.  Obviously if it shows up in another 
context we can revisit that decision.


PR target/115142
gcc/

* config/riscv/riscv.cc (mem_shadd_or_shadd_rtx_p): Make sure
shifted argument is a register.

gcc/testsuite

* gcc.target/riscv/pr115142.c: New test.

I've run this through my rv32gcv and rv64gc tester.  Waiting on the CI 
system before committing.


jeff
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 7a34b4be873..d0c22058b8c 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2465,6 +2465,7 @@ mem_shadd_or_shadd_rtx_p (rtx x)
 {
   return ((GET_CODE (x) == ASHIFT
   || GET_CODE (x) == MULT)
+ && register_operand (XEXP (x, 0), GET_MODE (x))
  && CONST_INT_P (XEXP (x, 1))
  && ((GET_CODE (x) == ASHIFT && IN_RANGE (INTVAL (XEXP (x, 1)), 1, 3))
  || (GET_CODE (x) == MULT
diff --git a/gcc/testsuite/gcc.target/riscv/pr115142.c 
b/gcc/testsuite/gcc.target/riscv/pr115142.c
new file mode 100644
index 000..40ba49dfa20
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr115142.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -ftree-ter" } */
+
+long a;
+char b;
+void e() {
+  char f[8][1];
+  b = f[a][a];
+}
+


[to-be-committed][PR target/115142] Do not create invalid shift-add insn

2024-05-18 Thread Jeff Law
This fixes a minor bug that showed up in the CI system, presumably with 
fuzz testing.


Under the right circumstances, we could end up trying to emit a shift-add 
style sequence where the to-be-shifted operand was not a register.  This 
naturally leads to an unrecognized insn.


The circumstances which triggered this weren't something that should 
appear in the wild (-ftree-ter, without optimization enabled).  So I 
wasn't planning to backport.  Obviously if it shows up in another 
context we can revisit that decision.


PR target/115142
gcc/

* config/riscv/riscv.cc (mem_shadd_or_shadd_rtx_p): Make sure
shifted argument is a register.

gcc/testsuite

* gcc.target/riscv/pr115142.c: New test.

I've run this through my rv32gcv and rv64gc tester.  Waiting on the CI 
system before committing.


jeffdiff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 7a34b4be873..d0c22058b8c 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2465,6 +2465,7 @@ mem_shadd_or_shadd_rtx_p (rtx x)
 {
   return ((GET_CODE (x) == ASHIFT
   || GET_CODE (x) == MULT)
+ && register_operand (XEXP (x, 0), GET_MODE (x))
  && CONST_INT_P (XEXP (x, 1))
  && ((GET_CODE (x) == ASHIFT && IN_RANGE (INTVAL (XEXP (x, 1)), 1, 3))
  || (GET_CODE (x) == MULT
diff --git a/gcc/testsuite/gcc.target/riscv/pr115142.c 
b/gcc/testsuite/gcc.target/riscv/pr115142.c
new file mode 100644
index 000..40ba49dfa20
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr115142.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -ftree-ter" } */
+
+long a;
+char b;
+void e() {
+  char f[8][1];
+  b = f[a][a];
+}
+


Re: [PATCH] RISC-V: Fix "Nan-box the result of movbf on soft-bf16"

2024-05-17 Thread Jeff Law




On 5/15/24 7:55 PM, Xiao Zeng wrote:

1 According to unpriv-isa spec:

   1.1 "FMV.H.X moves the half-precision value encoded in IEEE 754-2008
   standard encoding from the lower 16 bits of integer register rs1
   to the floating-point register rd, NaN-boxing the result."
   1.2 "FMV.W.X moves the single-precision value encoded in IEEE 754-2008
   standard encoding from the lower 32 bits of integer register rs1
   to the floating-point register rd. The bits are not modified in the
   transfer, and in particular, the payloads of non-canonical NaNs are 
preserved."

2 When (!TARGET_ZFHMIN == true && TARGET_HARD_FLOAT == true), instruction needs
to be added to complete the Nan-box, as done in
"RISC-V: Nan-box the result of movhf on soft-fp16":


3 Consider the "RISC-V: Nan-box the result of movbf on soft-bf16" in:

It ignores that both hf16 and bf16 are 16-bit floating-point formats.

4 zfbfmin -> zfhmin in:


gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_move): Optimize movbf
with Nan-boxing value.
* config/riscv/riscv.md (*movhf_softfloat_boxing): Expand movbf
with Nan-boxing value.
(*mov<mode>_softfloat_boxing): Ditto.
(*movbf_softfloat_boxing): Delete abandoned pattern.
---
  gcc/config/riscv/riscv.cc | 15 +--
  gcc/config/riscv/riscv.md | 19 +--
  2 files changed, 10 insertions(+), 24 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 4067505270e..04513537aad 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3178,13 +3178,10 @@ riscv_legitimize_move (machine_mode mode, rtx dest, rtx 
src)
   (set (reg:SI/DI mask) (const_int -65536)
   (set (reg:SI/DI temp) (zero_extend:SI/DI (subreg:HI (reg:HF/BF src) 0)))
   (set (reg:SI/DI temp) (ior:SI/DI (reg:SI/DI mask) (reg:SI/DI temp)))
- (set (reg:HF/BF dest) (unspec:HF/BF[ (reg:SI/DI temp) ]
-   UNSPEC_FMV_SFP16_X/UNSPEC_FMV_SBF16_X))
- */
+ (set (reg:HF/BF dest) (unspec:HF/BF[ (reg:SI/DI temp) ] 
UNSPEC_FMV_FP16_X))
+  */
  
-  if (TARGET_HARD_FLOAT

-  && ((!TARGET_ZFHMIN && mode == HFmode)
- || (!TARGET_ZFBFMIN && mode == BFmode))
+  if (TARGET_HARD_FLOAT && !TARGET_ZFHMIN && (mode == HFmode || mode == BFmode)
We generally prefer not to mix && and || operators on the same line. 
I'd suggest


if (TARGET_HARD_FLOAT
&& !TARGET_ZFHMIN
&& (mode == HFmode || mode == BFmode)
[ ... ]



@@ -1959,23 +1958,15 @@
 (set_attr "type" "fmove,move,load,store,mtc,mfc")
 (set_attr "mode" "")])
  
-(define_insn "*movhf_softfloat_boxing"

-  [(set (match_operand:HF 0 "register_operand""=f")
-(unspec:HF [(match_operand:X 1 "register_operand" " r")] 
UNSPEC_FMV_SFP16_X))]
+(define_insn "*mov_softfloat_boxing"
+  [(set (match_operand:HFBF 0 "register_operand" "=f")
+(unspec:HFBF [(match_operand:X 1 "register_operand" " r")]
+UNSPEC_FMV_FP16_X))]
"!TARGET_ZFHMIN"
I think the linter complained about having 8 spaces instead of a tab in 
one of the lines above.


With those fixes, this is fine for the trunk.

jeff


Re: [PATCH] RISC-V: Modify _Bfloat16 to __bf16

2024-05-17 Thread Jeff Law




On 5/17/24 2:19 AM, Kito Cheng wrote:

LGTM, thanks for fixing this :)
And just to be clear for Xiao, you can go ahead and commit this patch to 
the trunk.  An ACK from Kito, Juzhe, Palmer, Robin or myself is all you 
need for a change that is isolated to RISC-V code.


jeff



Re: [PATCH] RISC-V: Remove dead perm series code and document.

2024-05-17 Thread Jeff Law




On 5/17/24 9:27 AM, Robin Dapp wrote:

Hi,

with the introduction of shuffle_series_patterns the explicit handler
code for a perm series is dead.  This patch removes it and also adds
a function-level comment to shuffle_series_patterns.

Regtested on rv64gcv_zvfh_zvbb.

Regards
  Robin

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Document.
(shuffle_extract_and_slide1up_patterns): Remove.

OK.

Jeff



Re: [PATCH v1] RISC-V: Cleanup some temporally files [NFC]

2024-05-17 Thread Jeff Law




On 5/16/24 6:12 PM, Li, Pan2 wrote:

Committed, thanks Juzhe.

Thanks for cleaning up my little mess!  Sorry about that.

jeff



Re: [PATCH gcc-13] Fix RISC-V missing stack tie

2024-05-16 Thread Jeff Law




On 5/16/24 12:24 PM, Palmer Dabbelt wrote:



gcc/
* config/riscv/riscv.cc (riscv_expand_prologue): Add missing stack
tie for scalable and final stack adjustment if needed.

Co-authored-by: Raphael Zinsly 

(cherry picked from commit c65046ff2ef0a9a46e59bc0b3369b2d226f6a239)
---
I've only build tested this one, but it's tripping up some of the Fedora
folks here https://bugzilla.redhat.com/show_bug.cgi?id=2242327 so I
figured it's worth backporting.
Yes, that's the the original report from Florian that led Raphael and I 
to dive in.  Definitely worth backporting.


jeff



Re: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

2024-05-16 Thread Jeff Law




On 5/16/24 5:58 AM, Richard Biener wrote:

On Thu, May 16, 2024 at 11:35 AM Li, Pan2  wrote:



OK.


Thanks Richard for help and coaching. To double confirm, are you OK with this 
patch only or for the series patch(es) of SAT middle-end?
Thanks again for reviewing and suggestions.


For the series, the riscv specific part of course needs riscv approval.
Yea, we'll take a look at it.  Tons of stuff to go through, but this is 
definitely on the list.


jeff



Re: [PATCH] tree-optimization/13962 - handle ptr-ptr compares in ptrs_compare_unequal

2024-05-16 Thread Jeff Law




On 5/16/24 6:03 AM, Richard Biener wrote:

Now that we handle pt.null conservatively we can implement the missing
tracking of constant pool entries (aka STRING_CST) and handle
ptr-ptr compares using points-to info in ptrs_compare_unequal.
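
A hedged illustration of the kind of compare this enables (my own example,
not the new testcase): a pointer whose points-to set is only the constant
pool can be proven unequal to a pointer to a distinct, non-escaping object:

	int
	f (void)
	{
	  int x;
	  const char *p = "in the constant pool";
	  return p == (const char *) &x;	/* now foldable to 0 */
	}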

Bootstrapped on x86_64-unknown-linux-gnu, (re-)testing in progress.

Richard.

PR tree-optimization/13962
PR tree-optimization/96564
* tree-ssa-alias.h (pt_solution::const_pool): New flag.
* tree-ssa-alias.cc (ptrs_compare_unequal): Handle pointer-pointer
compares.
(dump_points_to_solution): Dump the const_pool flag, fix guard
of flag dumping.
* gimple-pretty-print.cc (pp_points_to_solution): Likewise.
* tree-ssa-structalias.cc (find_what_var_points_to): Set
the const_pool flag for STRING.
(pt_solution_ior_into): Handle the const_pool flag.
(ipa_escaped_pt): Initialize it.

* gcc.dg/tree-ssa/alias-39.c: New testcase.
* g++.dg/vect/pr68145.cc: Use -fno-tree-pta to avoid UB
to manifest in transforms no longer vectorizing this testcase
for an ICE.
You might want to test this against 92539 as well.  There's a nonzero 
chance it'll resolve that one.


jeff



Re: [PATCH v2 1/2] RISC-V: Add cmpmemsi expansion

2024-05-15 Thread Jeff Law




On 5/15/24 12:49 AM, Christoph Müllner wrote:

GCC has a generic cmpmemsi expansion via the by-pieces framework,
which shows some room for target-specific optimizations.
E.g. for comparing two aligned memory blocks of 15 bytes
we get the following sequence:

my_mem_cmp_aligned_15:
	li	a4,0
	j	.L2
.L8:
	bgeu	a4,a7,.L7
.L2:
	add	a2,a0,a4
	add	a3,a1,a4
	lbu	a5,0(a2)
	lbu	a6,0(a3)
	addi	a4,a4,1
	li	a7,15		// missed hoisting
	subw	a5,a5,a6
	andi	a5,a5,0xff	// useless
	beq	a5,zero,.L8
	lbu	a0,0(a2)	// loading again!
	lbu	a5,0(a3)	// loading again!
	subw	a0,a0,a5
	ret
.L7:
	li	a0,0
	ret

Diff first byte: 15 insns
Diff second byte: 25 insns
No diff: 25 insns

Possible improvements:
* unroll the loop and use load-with-displacement to avoid offset increments
* load and compare multiple (aligned) bytes at once
* Use the bitmanip/strcmp result calculation (reverse words and
   synthesize (a2 >= a3) ? 1 : -1 in a branchless sequence)

When applying these improvements we get the following sequence:

my_mem_cmp_aligned_15:
	ld	a5,0(a0)
	ld	a4,0(a1)
	bne	a5,a4,.L2
	ld	a5,8(a0)
	ld	a4,8(a1)
	slli	a5,a5,8
	slli	a4,a4,8
	bne	a5,a4,.L2
	li	a0,0
.L3:
	sext.w	a0,a0
	ret
.L2:
	rev8	a5,a5
	rev8	a4,a4
	sltu	a5,a5,a4
	neg	a5,a5
	ori	a0,a5,1
	j	.L3

Diff first byte: 11 insns
Diff second byte: 16 insns
No diff: 11 insns

This patch implements these improvements.
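
The branchless result computation at .L2 reads as the following C sketch
(my rendering, not code from the patch; the byte swap normalizes order so
the first differing byte becomes most significant):

	#include <stdint.h>

	static int
	cmp_result (uint64_t chunk_a, uint64_t chunk_b)
	{
	  uint64_t va = __builtin_bswap64 (chunk_a);	/* rev8 */
	  uint64_t vb = __builtin_bswap64 (chunk_b);	/* rev8 */
	  return -(int) (va < vb) | 1;			/* sltu; neg; ori */
	}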

The tests consist of an execution test (similar to
gcc/testsuite/gcc.dg/torture/inline-mem-cmp-1.c) and a few tests
that test the expansion conditions (known length and alignment).

Similar to the cpymemsi expansion this patch does not introduce any
gating for the cmpmemsi expansion (on top of requiring the known length,
alignment and Zbb).

Bootstrapped and SPEC CPU 2017 tested.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_expand_block_compare): New
prototype.
* config/riscv/riscv-string.cc (GEN_EMIT_HELPER2): New helper
for zero_extendhi.
(do_load_from_addr): Add support for HI and SI/64 modes.
(do_load): Add helper for zero-extended loads.
(emit_memcmp_scalar_load_and_compare): New helper to emit memcmp.
(emit_memcmp_scalar_result_calculation): Likewise.
(riscv_expand_block_compare_scalar): Likewise.
(riscv_expand_block_compare): New RISC-V expander for memory compare.
* config/riscv/riscv.md (cmpmemsi): New cmpmem expansion.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmpmemsi-1.c: New test.
* gcc.target/riscv/cmpmemsi-2.c: New test.
* gcc.target/riscv/cmpmemsi-3.c: New test.
* gcc.target/riscv/cmpmemsi.c: New test.

[ ... ]
I fixed some of the nits from the linter (whitespace stuff) and pushed 
both patches of this series.


Jeff



Re: [PATCH] RISC-V: prologue/epilogue expansion code minor changes [NFC]

2024-05-15 Thread Jeff Law




On 5/15/24 12:55 PM, Vineet Gupta wrote:

Saw a little room for improvement while debugging the current
prologue/epilogue expansion code.

---

Use the following pattern consistently
`RTX_FRAME_RELATED_P (gen_insn (insn)) = 1`

vs. calling gen_insn around a priori gen_xxx_insn () calls.

This reduces weird indentations which are done inconsistently.

And also move the RTX_FRAME_RELATED_P () calls immediately after those
gen_xxx_insn () calls.

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_expand_epilogue): Use pattern
described above.
(riscv_expand_prologue): Ditto.
(riscv_for_each_saved_v_reg): Ditto.

Thanks for cleaning this up.  Just having consistency is helpful.

All this gets scrambled again with stack-clash protection :(  But that's 
just the nature of the beast.


jeff


[to-be-committed][RISC-V] Improve some shift-add sequences

2024-05-15 Thread Jeff Law


So this is a minor fix/improvement for shift-add sequences.  This was 
supposed to help xz in a minor way IIRC.


Combine may present us with (x << C1) + C2, which was canonicalized from
(x + C2') << C1.


Depending on the precise values of C2 and C2' one form may be better 
than the other.  We can (somewhat awkwardly) use riscv_const_insns to 
test for which sequence would be preferred.


Tested on Ventana's CI system as well as my own.  Waiting on CI results 
from Rivos's tester before moving forward.


Jeff




gcc/
* config/riscv/riscv.md: Add new patterns to allow selection
between (x << C1) + C2 vs (x + C2') << C1 depending on the
cost C2 vs C2'.

gcc/testsuite

* gcc.target/riscv/shift-add-1.c: New test.

commit 03933cf8813b28587ceb7f6f66ac03d08c5de58b
Author: Jeff Law 
Date:   Thu Apr 4 13:35:54 2024 -0600

Optimize (x << C1) + C2 after canonicalization to ((x + C2') << C1).

C2' may have a lower cost to synthesize than C2.  Reassociate to take
advantage of that.

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index ffb09a4109d..69c80bc4a86 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -4416,6 +4416,62 @@ (define_insn_and_split ""
   "{ operands[6] = gen_lowpart (SImode, operands[5]); }"
   [(set_attr "type" "arith")])
 
+;; These are forms of (x << C1) + C2, potentially canonicalized from
+;; ((x + C2') << C1.  Depending on the cost to load C2 vs C2' we may
+;; want to go ahead and recognize this form as C2 may be cheaper to
+;; synthesize than C2'.
+;;
+;; It might be better to refactor riscv_const_insns a bit so that we
+;; can have an API that passes integer values around rather than
+;; constructing a lot of garbage RTL.
+;;
+;; The mvconst_internal pattern in effect requires this pattern to
+;; also be a define_insn_and_split due to insn count costing when
+;; splitting in combine.
+(define_insn_and_split ""
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (plus:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
+   (match_operand 2 "const_int_operand" "n"))
+(match_operand 3 "const_int_operand" "n")))
+   (clobber (match_scratch:DI 4 "=&r"))]
+  "(TARGET_64BIT
+    && riscv_const_insns (operands[3])
+    && ((riscv_const_insns (operands[3])
+	 < riscv_const_insns (GEN_INT (INTVAL (operands[3]) >> INTVAL (operands[2]))))
+	|| riscv_const_insns (GEN_INT (INTVAL (operands[3]) >> INTVAL (operands[2]))) == 0))"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (ashift:DI (match_dup 1) (match_dup 2)))
+   (set (match_dup 4) (match_dup 3))
+   (set (match_dup 0) (plus:DI (match_dup 0) (match_dup 4)))]
+  ""
+  [(set_attr "type" "arith")])
+
+(define_insn_and_split ""
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (sign_extend:DI (plus:SI (ashift:SI
+			      (match_operand:SI 1 "register_operand" "r")
+			      (match_operand 2 "const_int_operand" "n"))
+			    (match_operand 3 "const_int_operand" "n"))))
+   (clobber (match_scratch:DI 4 "=&r"))]
+  "(TARGET_64BIT
+    && riscv_const_insns (operands[3])
+    && ((riscv_const_insns (operands[3])
+	 < riscv_const_insns (GEN_INT (INTVAL (operands[3]) >> INTVAL (operands[2]))))
+	|| riscv_const_insns (GEN_INT (INTVAL (operands[3]) >> INTVAL (operands[2]))) == 0))"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (ashift:DI (match_dup 1) (match_dup 2)))
+   (set (match_dup 4) (match_dup 3))
+   (set (match_dup 0) (sign_extend:DI (plus:SI (match_dup 5) (match_dup 6))))]
+  "{
+ operands[1] = gen_lowpart (DImode, operands[1]);
+ operands[5] = gen_lowpart (SImode, operands[0]);
+ operands[6] = gen_lowpart (SImode, operands[4]);
+   }"
+  [(set_attr "type" "arith")])
+
+
 (include "bitmanip.md")
 (include "crypto.md")
 (include "sync.md")
diff --git a/gcc/testsuite/gcc.target/riscv/shift-add-1.c 
b/gcc/testsuite/gcc.target/riscv/shift-add-1.c
new file mode 100644
index 000..d98875c3271
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shift-add-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zba_zbb_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+
+int composeFromSurrogate(const unsigned short high) {
+
+return  ((high - 0xD800) << 10) ;
+}
+
+
+long composeFromSurrogate_2(const unsigned long high) {
+
+return  ((high - 0xD800) << 10) ;
+}
+
+
+/* { dg-final { scan-assembler-times "\tli\t" 2 } } */
+/* { dg-final { scan-assembler-times "\tslli\t" 2 } } */
+/* { dg-final { scan-assembler-times "\taddw\t" 1 } } */
+/* { dg-final { scan-assembler-times "\tadd\t" 1 } } */
+


Re: [PATCH] RISC-V: Fix cbo.zero expansion for rv32

2024-05-15 Thread Jeff Law




On 5/15/24 12:48 AM, Christoph Müllner wrote:

Emitting a DI pattern won't find a match for rv32 and manifests in
the failing test case gcc.target/riscv/cmo-zicboz-zic64-1.c.
Let's fix this in the expansion and also address the different
code that gets generated for rv32/rv64.

gcc/ChangeLog:

* config/riscv/riscv-string.cc (riscv_expand_block_clear_zicboz_zic64b):
Fix expansion for rv32.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmo-zicboz-zic64-1.c: Fix for rv32.
The exact change I made yesterday for the code generator.  Glad to see I 
didn't muck it up :-)  And thanks for fixing the test to have some 
coverage on rv32.


Jeff



Re: [PATCH] RISC-V: Test cbo.zero expansion for rv32

2024-05-15 Thread Jeff Law




On 5/15/24 1:28 AM, Christoph Müllner wrote:

We had an issue when expanding via cmo-zero for RV32.
This was fixed upstream, but we don't have a RV32 test.
Therefore, this patch introduces such a test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmo-zicboz-zic64-1.c: Fix for rv32.

OK.  Thanks!

jeff



[committed] Fix rv32 issues with recent zicboz work

2024-05-14 Thread Jeff Law
I should have double-checked the CI system before pushing Christoph's 
patches for memset-zero.  While I thought I'd checked CI state, I must 
have been looking at the wrong patch from Christoph.


Anyway, this fixes the rv32 ICEs and disables one of the tests for rv32.

The test would need a revamp for rv32 as the expected output is all rv64 
code using "sd" instructions.  I'm just not vested deeply enough into 
rv32 to adjust the test to work in that environment though it should be 
fairly trivial to copy the test and provide new expected output if 
someone cares enough.





Verified this fixes the rv32 failures in my tester:

New tests that FAIL (6 tests):

unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O1  
(internal compiler error: in extract_insn, at recog.cc:2812)
unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O1  (test 
for excess errors)
unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O2  
(internal compiler error: in extract_insn, at recog.cc:2812)
unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O2  (test 
for excess errors)
unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O3 -g  
(internal compiler error: in extract_insn, at recog.cc:2812)
unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O3 -g  (test 
for excess errors)



And after the ICE is fixed, these are eliminated by only running the 
test for rv64:



New tests that FAIL (3 tests):

unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O1   
check-function-bodies clear_buf_123
unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O2   
check-function-bodies clear_buf_123
unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O3 -g   
check-function-bodies clear_buf_123


Pushed to the trunk.

Jeff

commit e410ad74e5e4589aeb666aa298b2f933e7b5d9e7
Author: Jeff Law 
Date:   Tue May 14 22:50:15 2024 -0600

[committed] Fix rv32 issues with recent zicboz work

I should have double-checked the CI system before pushing Christoph's 
patches
for memset-zero.  While I thought I'd checked CI state, I must have been
looking at the wrong patch from Christoph.

Anyway, this fixes the rv32 ICEs and disables one of the tests for rv32.

The test would need a revamp for rv32 as the expected output is all rv64 
code
using "sd" instructions.  I'm just not vested deeply enough into rv32 to 
adjust
the test to work in that environment though it should be fairly trivial to 
copy
the test and provide new expected output if someone cares enough.

Verified this fixes the rv32 failures in my tester:
> New tests that FAIL (6 tests):
>
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O1  
(internal compiler error: in extract_insn, at recog.cc:2812)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O1  
(test for excess errors)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O2  
(internal compiler error: in extract_insn, at recog.cc:2812)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O2  
(test for excess errors)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O3 -g  
(internal compiler error: in extract_insn, at recog.cc:2812)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O3 -g  
(test for excess errors)

And after the ICE is fixed, these are eliminated by only running the test 
for
rv64:

> New tests that FAIL (3 tests):
>
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O1   check-function-bodies clear_buf_123
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O2   check-function-bodies clear_buf_123
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O3 -g   check-function-bodies clear_buf_123

gcc/
* config/riscv/riscv-string.cc
(riscv_expand_block_clear_zicboz_zic64b): Handle rv32 correctly.

gcc/testsuite

* gcc.target/riscv/cmo-zicboz-zic64-1.c: Don't run on rv32.

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 87f5fdee3c1..b515f44d17a 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -827,7 +827,10 @@ riscv_expand_block_clear_zicboz_zic64b (rtx dest, rtx length)
 {
   rtx mem = adjust_address (dest, BLKmode, offset);
   rtx addr = force_reg (Pmode, XEXP (mem, 0));
-  emit_insn (gen_riscv_zero_di (addr));
+  if (TARGET_64BIT)
+   emit_insn (gen_riscv_zero_di (addr));
+  else
+   emit_insn (gen_riscv_zero_si (addr));
   offset += cbo_bytes;
 }
 
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c b/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c
index c2d

Re: [PATCH] RISC-V: Implement -m{,no}fence-tso

2024-05-14 Thread Jeff Law




On 5/14/24 5:13 PM, Palmer Dabbelt wrote:

Some processors from T-Head don't implement the `fence.tso` instruction
natively and instead trap to firmware.  This breaks some users who
haven't yet updated the firmware and one could imagine it breaking users
who are trying to build firmware if they're using the C memory model.

So just add an option to disable emitting it, in a similar fashion to
how we allow users to forbid other instructions.

gcc/ChangeLog:

* config/riscv/riscv.opt: Add -mno-fence-tso.
* config/riscv/sync-rvwmo.md (mem_thread_fence_rvwmo): Respect
-mno-fence-tso.
* doc/invoke.texi (RISC-V): Document -mno-fence-tso.

Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1070959
---
I've just smoke tested this one, but

 void func(void) { __atomic_thread_fence(__ATOMIC_ACQ_REL); }

generates `fence.tso` without the argument and `fence rw,rw` with
`-mno-fence-tso`, so it seems to be at least mostly there.  I figured
I'd just send it up for comments before putting together the DG bits:
it's kind of a pain to carry around these workarounds for unimplemented
instructions, but it's in HW so there's not much we can do about that.
Seems reasonable.  We might consider adding a comment in the code 
indicating this is for a particular set of T-Head systems.  10 years from 
now when someone else looks at the code they'll know why this is in 
there and they won't have to do the archaeology.
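
A sketch of what such a comment might look like (illustrative wording, 
mine rather than anything from the patch):

  /* Some T-Head cores trap fence.tso to firmware rather than
     implementing it natively; -mno-fence-tso lets users avoid
     emitting it.  */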


Jeff


Re: [PATCH v2 2/2] RISC-V: avoid LUI based const mat in prologue/epilogue expansion [PR/105733]

2024-05-14 Thread Jeff Law




On 5/14/24 10:36 AM, Vineet Gupta wrote:



On 5/14/24 08:44, Jeff Law wrote:

On 5/14/24 8:51 AM, Patrick O'Neill wrote:

I was able to find the summary info:


Tests that now fail, but worked before (15 tests):
libgomp: libgomp.fortran/simd7.f90   -O0  execution test
libgomp: libgomp.fortran/task2.f90   -O0  execution test
libgomp: libgomp.fortran/vla2.f90   -O0  execution test
libgomp: libgomp.fortran/vla3.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
libgomp: libgomp.fortran/vla3.f90   -O3 -g  execution test
libgomp: libgomp.fortran/vla4.f90   -O1  execution test
libgomp: libgomp.fortran/vla4.f90   -O2  execution test
libgomp: libgomp.fortran/vla4.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
libgomp: libgomp.fortran/vla4.f90   -O3 -g  execution test
libgomp: libgomp.fortran/vla4.f90   -Os  execution test
libgomp: libgomp.fortran/vla5.f90   -O1  execution test
libgomp: libgomp.fortran/vla5.f90   -O2  execution test
libgomp: libgomp.fortran/vla5.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
libgomp: libgomp.fortran/vla5.f90   -O3 -g  execution test
libgomp: libgomp.fortran/vla5.f90   -Os  execution test

So if you could check on those, it'd be appreciated.

I checked rv64gcv linux and those do not currently run in CI.

So just ran with Vineet's patch in our CI system.  His patch is still
triggering those regressions.  So we need to get that resolved before
that second patch can go in.


And just for reproducibility what exact --with-arch build is this from ?

This run was with "--with-arch=rv64gc_zba_zbb_zbc_zbkb_zbs_zfa_zicond"

I think we likely saw it without zbkb & zfa when we first looked at this 
a few months back.


jeff



Re: [PATCH v2 2/2] RISC-V: avoid LUI based const mat in prologue/epilogue expansion [PR/105733]

2024-05-14 Thread Jeff Law




On 5/14/24 8:51 AM, Patrick O'Neill wrote:





I was able to find the summary info:


Tests that now fail, but worked before (15 tests):
libgomp: libgomp.fortran/simd7.f90   -O0  execution test
libgomp: libgomp.fortran/task2.f90   -O0  execution test
libgomp: libgomp.fortran/vla2.f90   -O0  execution test
libgomp: libgomp.fortran/vla3.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
libgomp: libgomp.fortran/vla3.f90   -O3 -g  execution test
libgomp: libgomp.fortran/vla4.f90   -O1  execution test
libgomp: libgomp.fortran/vla4.f90   -O2  execution test
libgomp: libgomp.fortran/vla4.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
libgomp: libgomp.fortran/vla4.f90   -O3 -g  execution test
libgomp: libgomp.fortran/vla4.f90   -Os  execution test
libgomp: libgomp.fortran/vla5.f90   -O1  execution test
libgomp: libgomp.fortran/vla5.f90   -O2  execution test
libgomp: libgomp.fortran/vla5.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
libgomp: libgomp.fortran/vla5.f90   -O3 -g  execution test
libgomp: libgomp.fortran/vla5.f90   -Os  execution test


So if you could check on those, it'd be appreciated.


I checked rv64gcv linux and those do not currently run in CI.
So just ran with Vineet's patch in our CI system.  His patch is still 
triggering those regressions.  So we need to get that resolved before 
that second patch can go in.


jeff



Re: [PATCH 1/3] expr: Export clear_by_pieces()

2024-05-14 Thread Jeff Law




On 5/7/24 11:38 PM, Christoph Müllner wrote:

Make clear_by_pieces() available to other parts of the compiler,
similar to store_by_pieces().

gcc/ChangeLog:

* expr.cc (clear_by_pieces): Remove static from clear_by_pieces.
* expr.h (clear_by_pieces): Add prototype for clear_by_pieces.
I'm going to push this series.  It's fully ack'd, tested and is going to 
interact with Sergei's work on vector variants of relevant patterns.


Jeff


Re: [RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading

2024-05-14 Thread Jeff Law




On 5/14/24 8:57 AM, Qing Zhao wrote:




On May 13, 2024, at 20:14, Kees Cook  wrote:

On Tue, May 14, 2024 at 01:38:49AM +0200, Andrew Pinski wrote:

On Mon, May 13, 2024, 11:41 PM Kees Cook  wrote:

But it makes no sense to warn about:

void sparx5_set (int * ptr, struct nums * sg, int index)
{
   if (index >= 4)
 warn ();
   *ptr = 0;
   *val = sg->vals[index];
   if (index >= 4)
 warn ();
   *ptr = *val;
}

Because at "*val = sg->vals[index];" the actual value range tracking for
index is _still_ [INT_MIN,INT_MAX]. (Only within the "then" side of the
"if" statements is the range tracking [4,INT_MAX].)

However, in the case where jump threading has split the execution flow
and produced a copy of "*val = sg->vals[index];" where the value range
tracking for "index" is now [4,INT_MAX], is the warning valid. But it
is only for that instance. Reporting it for effectively both (there is
only 1 source line for the array indexing) is misleading because there
is nothing the user can do about it -- the compiler created the copy and
then noticed it had a range it could apply to that array index.



"there is nothing the user can do about it" is very much false. They could
change warn call into a noreturn function call instead.  (In the case of
the Linux kernel panic). There are things the user can do to fix the
warning and even get better code generation out of the compilers.


This isn't about warn() not being noreturn. The warn() could be any
function call; the jump threading still happens.


When the program is executed on the “if (index > = 4)” path,  the value of 
“index” is definitely

= 4, when sg->vals[index] is referenced on this path (the case when the routine 
“warn” is NOT noreturn), it’s

definitely an out-of-bounds array access.  So, the compiler’s warning is 
correct. And this warning does catch
a potential issue in the source code that need to be fixed by either of the 
following two solutions:

1. Make the routine “warn” as noreturn and mark it noreturn;
This would be my recommendation.  We're about to execute undefined 
behavior.  I don't see a way to necessarily recover safely here, so I'd 
suggest having warn() not return and mark it appropriately.


That'll have numerous secondary benefits as well.
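
For reference, a minimal sketch of that recommendation (hypothetical 
declaration, not taken from the kernel sources):

  __attribute__ ((noreturn)) extern void warn (void);

Once warn is known not to return, the duplicated out-of-bounds access 
after the call becomes unreachable and that path disappears.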

jeff



Re: [PATCH v2 2/2] RISC-V: avoid LUI based const mat in prologue/epilogue expansion [PR/105733]

2024-05-14 Thread Jeff Law




On 5/14/24 8:51 AM, Patrick O'Neill wrote:


On 5/13/24 20:36, Jeff Law wrote:



On 5/13/24 6:54 PM, Patrick O'Neill wrote:


On 5/13/24 13:28, Jeff Law wrote:



On 5/13/24 12:49 PM, Vineet Gupta wrote:
If the constant used for stack offset can be expressed as a sum of two S12
values, the constant need not be materialized (in a reg) and instead the
two S12 bits can be added to instructions involved with the frame pointer.
This avoids burning a register and more importantly can often get down
to 2 insns vs. 3.

The prev patches to generally avoid LUI based const materialization didn't
fix this PR and need this directed fix in function prologue/epilogue
expansion.

This fix doesn't move the needle for SPEC, at all, but it is still a
win considering gcc generates one insn fewer than llvm for the test ;-)


    gcc-13.1 release    |  gcc 230823        |                   |
                        |  g6619b3d4c15c     |   This patch      |  clang/llvm
---------------------------------------------------------------------------------
li   t0,-4096           | li   t0,-4096      | addi sp,sp,-2048  | addi sp,sp,-2048
addi t0,t0,2016         | addi t0,t0,2032    | add  sp,sp,-16    | addi sp,sp,-32
li   a4,4096            | add  sp,sp,t0      | add  a5,sp,a0     | add  a1,sp,16
add  sp,sp,t0           | addi a5,sp,-2032   | sb   zero,0(a5)   | add  a0,a0,a1
li   a5,-4096           | add  a0,a5,a0      | addi sp,sp,2032   | sb   zero,0(a0)
addi a4,a4,-2032        | li   t0,4096       | addi sp,sp,32     | addi sp,sp,2032
add  a4,a4,a5           | sb   zero,2032(a0) | ret               | addi sp,sp,48
addi a5,sp,16           | addi t0,t0,-2032   |                   | ret
add  a5,a4,a5           | add  sp,sp,t0      |                   |
add  a0,a5,a0           | ret                |                   |
li   t0,4096            |                    |                   |
sd   a5,8(sp)           |                    |                   |
sb   zero,2032(a0)      |                    |                   |
addi t0,t0,-2016        |                    |                   |
add  sp,sp,t0           |                    |                   |
ret                     |                    |                   |

gcc/ChangeLog:
PR target/105733
* config/riscv/riscv.h: New macros for with aligned offsets.
* config/riscv/riscv.cc (riscv_split_sum_of_two_s12): New
function to split a sum of two s12 values into constituents.
(riscv_expand_prologue): Handle offset being sum of two S12.
(riscv_expand_epilogue): Ditto.
* config/riscv/riscv-protos.h (riscv_split_sum_of_two_s12): New.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr105733.c: New Test.
* gcc.target/riscv/rvv/autovec/vls/spill-1.c: Adjust to not
expect LUI 4096.
* gcc.target/riscv/rvv/autovec/vls/spill-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-7.c: Ditto.





@@ -8074,14 +8111,26 @@ riscv_expand_epilogue (int style)
  }
    else
  {
-  if (!SMALL_OPERAND (adjust_offset.to_constant ()))
+  HOST_WIDE_INT adj_off_value = adjust_offset.to_constant ();
+  if (SMALL_OPERAND (adj_off_value))
+    {
+  adjust = GEN_INT (adj_off_value);
+    }
+  else if (SUM_OF_TWO_S12_ALGN (adj_off_value))
+    {
+  HOST_WIDE_INT base, off;
+	  riscv_split_sum_of_two_s12 (adj_off_value, &base, &off);
+	  insn = gen_add3_insn (stack_pointer_rtx, hard_frame_pointer_rtx,
+				GEN_INT (base));
+  RTX_FRAME_RELATED_P (insn) = 1;
+  adjust = GEN_INT (off);
+    }
So this was the hunk that we identified internally as causing 
problems with libgomp's testsuite.  We never fully chased it down as 
this hunk didn't seem terribly important performance wise -- we just 
set it aside.  The thing is it looked basically correct to me.  So 
the failure was certainly unexpected, but it was consistent.


So I think the question is whether or not the CI system runs the 
libgomp testsuite, particularly in the rv64 linux configuration. If 
it does, and it passes, then we're good. I'm still finding my way 
around the configuration, so I don't know if the CI system Edwin & 
Patrick have built tests libgomp or not.


I poked around the .sum files in pre/postcommit and we do run tests 
like:


PASS: c-c++-common/gomp/affinity-2.c  (test for errors, line 45)

I was able to find the summary info:


Tests that now fail, but worked before (15 tests):
libgomp: libgomp.fortran/simd7.f90   -O0  execution test
libgomp: libgomp.fortran/task2.f90   -O0  execution test
libgomp: libgomp.fortran/vla2.f90   -O0  execution test
libgomp: libgomp.fortran/vla3.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
libgomp: libgomp.fortran/vla3.f90   -O3 -g  execution test
libgomp: libgomp.fortran/vla4.f90   -O1  execution test
libgomp: libgomp.fortran/vla4.f90   -O2  execution test
libgomp: libgomp.fortran/vla4.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
libgomp: libgomp.fortran/vla4.f90   -O

[to-be-committed][RISC-V] Remove redundant AND in shift-add sequence

2024-05-14 Thread Jeff Law
So this patch allows us to eliminate a redundant AND in some shift-add 
style sequences.  I think the testcase was reduced from xz by the RAU 
team, but I'm not highly confident of that.


Specifically the AND is masking off the upper 32 bits of the un-shifted 
value and there's an outer SIGN_EXTEND from SI to DI.  However in the 
RTL it's working on the post-shifted value, so the constant is left 
shifted, so we have to account for that in the pattern's condition.


We can just drop the AND in this case.  So instead we do a 64bit shift, 
then a sign extending ADD utilizing the low part of that 64bit shift result.
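
A tiny C-level illustration of the equivalence (my example, not the 
reduced testcase): with a shift count of 2, the left-shifted mask 
0xfffffffc satisfies (0xfffffffc | ((1 << 2) - 1)) == 0xffffffff, so 
the AND only clears bits the narrowing and sign extension discard 
anyway:

  long long f (long long x, int a)
  {
    /* The & clears bits 0..1 (already zero after the shift) and bits
       32..63 (discarded by the 32-bit truncation below).  */
    unsigned long long t = ((unsigned long long) x << 2) & 0xfffffffcULL;
    return (int) ((unsigned int) t + (unsigned int) a);
  }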



This has run through Ventana's CI as well as my own.  I'll wait for it 
to run through the larger CI system before pushing.


Jeff
gcc/
* config/riscv/riscv.md: Add pattern for sign extended shift-add 
sequence with a masked input.

gcc/testsuite

* gcc.target/riscv/shift-add-2.c: New test.

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 4d6de992557..520c0f54150 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -4056,6 +4056,31 @@ (define_insn "*large_load_address"
   [(set_attr "type" "load")
(set (attr "length") (const_int 8))])
 
+;; The AND is redundant here.  It always turns off the high 32 bits and the
+;; low number of bits equal to the shift count.  Those upper 32 bits will be
+;; reset by the SIGN_EXTEND at the end.
+;;
+;; One could argue combine should have realized this and simplified what it
+;; presented to the backend.  But we can obviously cope with what it gave us.
+(define_insn_and_split ""
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (sign_extend:DI
+ (plus:SI (subreg:SI
+(and:DI
+  (ashift:DI (match_operand:DI 1 "register_operand" "r")
+ (match_operand 2 "const_int_operand" "n"))
+  (match_operand 3 "const_int_operand" "n")) 0)
+  (match_operand:SI 4 "register_operand" "r"
+   (clobber (match_scratch:DI 5 "="))]
+  "TARGET_64BIT
   && (INTVAL (operands[3]) | ((1 << INTVAL (operands[2])) - 1)) == 0xffffffff"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 5) (ashift:DI (match_dup 1) (match_dup 2)))
+   (set (match_dup 0) (sign_extend:DI (plus:SI (match_dup 6) (match_dup 4))))]
+  "{ operands[6] = gen_lowpart (SImode, operands[5]); }"
+  [(set_attr "type" "arith")])
+
 (include "bitmanip.md")
 (include "crypto.md")
 (include "sync.md")
diff --git a/gcc/testsuite/gcc.target/riscv/shift-add-2.c b/gcc/testsuite/gcc.target/riscv/shift-add-2.c
new file mode 100644
index 000..87439858e59
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shift-add-2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zba_zbb_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+
+int sub2(int a, long long b) {
+  b = (b << 32) >> 31;
+  unsigned int x = a + b;
+  return x;
+}
+
+
+/* { dg-final { scan-assembler-times "\tslli\t" 1 } } */
+/* { dg-final { scan-assembler-times "\taddw\t" 1 } } */
+/* { dg-final { scan-assembler-not "\tsrai\t" } } */
+/* { dg-final { scan-assembler-not "\tsh.add\t" } } */
+


Re: [PATCH v2 2/2] RISC-V: avoid LUI based const mat in prologue/epilogue expansion [PR/105733]

2024-05-13 Thread Jeff Law




On 5/13/24 6:54 PM, Patrick O'Neill wrote:


On 5/13/24 13:28, Jeff Law wrote:



On 5/13/24 12:49 PM, Vineet Gupta wrote:

If the constant used for stack offset can be expressed as sum of two S12
values, the constant need not be materialized (in a reg) and instead the
two S12 bits can be added to instructions involved with frame pointer.
This avoids burning a register and more importantly can often get down
to be 2 insn vs. 3.

The prev patches to generally avoid LUI based const materialization didn't
fix this PR and need this directed fix in function prologue/epilogue
expansion.

This fix doesn't move the needle for SPEC, at all, but it is still a
win considering gcc generates one insn fewer than llvm for the test ;-)

    gcc-13.1 release    |  gcc 230823        |                   |
                        |  g6619b3d4c15c     |   This patch      |  clang/llvm
---------------------------------------------------------------------------------
li   t0,-4096           | li   t0,-4096      | addi sp,sp,-2048  | addi sp,sp,-2048
addi t0,t0,2016         | addi t0,t0,2032    | add  sp,sp,-16    | addi sp,sp,-32
li   a4,4096            | add  sp,sp,t0      | add  a5,sp,a0     | add  a1,sp,16
add  sp,sp,t0           | addi a5,sp,-2032   | sb   zero,0(a5)   | add  a0,a0,a1
li   a5,-4096           | add  a0,a5,a0      | addi sp,sp,2032   | sb   zero,0(a0)
addi a4,a4,-2032        | li   t0,4096       | addi sp,sp,32     | addi sp,sp,2032
add  a4,a4,a5           | sb   zero,2032(a0) | ret               | addi sp,sp,48
addi a5,sp,16           | addi t0,t0,-2032   |                   | ret
add  a5,a4,a5           | add  sp,sp,t0      |                   |
add  a0,a5,a0           | ret                |                   |
li   t0,4096            |                    |                   |
sd   a5,8(sp)           |                    |                   |
sb   zero,2032(a0)      |                    |                   |
addi t0,t0,-2016        |                    |                   |
add  sp,sp,t0           |                    |                   |
ret                     |                    |                   |

gcc/ChangeLog:
PR target/105733
* config/riscv/riscv.h: New macros for with aligned offsets.
* config/riscv/riscv.cc (riscv_split_sum_of_two_s12): New
function to split a sum of two s12 values into constituents.
(riscv_expand_prologue): Handle offset being sum of two S12.
(riscv_expand_epilogue): Ditto.
* config/riscv/riscv-protos.h (riscv_split_sum_of_two_s12): New.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr105733.c: New Test.
* gcc.target/riscv/rvv/autovec/vls/spill-1.c: Adjust to not
expect LUI 4096.
* gcc.target/riscv/rvv/autovec/vls/spill-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-7.c: Ditto.





@@ -8074,14 +8111,26 @@ riscv_expand_epilogue (int style)
  }
    else
  {
-  if (!SMALL_OPERAND (adjust_offset.to_constant ()))
+  HOST_WIDE_INT adj_off_value = adjust_offset.to_constant ();
+  if (SMALL_OPERAND (adj_off_value))
+    {
+  adjust = GEN_INT (adj_off_value);
+    }
+  else if (SUM_OF_TWO_S12_ALGN (adj_off_value))
+    {
+  HOST_WIDE_INT base, off;
+	  riscv_split_sum_of_two_s12 (adj_off_value, &base, &off);
+	  insn = gen_add3_insn (stack_pointer_rtx, hard_frame_pointer_rtx,
+				GEN_INT (base));
+  RTX_FRAME_RELATED_P (insn) = 1;
+  adjust = GEN_INT (off);
+    }
So this was the hunk that we identified internally as causing problems 
with libgomp's testsuite.  We never fully chased it down as this hunk 
didn't seem terribly important performance wise -- we just set it 
aside.  The thing is it looked basically correct to me.  So the 
failure was certainly unexpected, but it was consistent.


So I think the question is whether or not the CI system runs the 
libgomp testsuite, particularly in the rv64 linux configuration. If it 
does, and it passes, then we're good.  I'm still finding my way around 
the configuration, so I don't know if the CI system Edwin & Patrick 
have built tests libgomp or not.


I poked around the .sum files in pre/postcommit and we do run tests like:

PASS: c-c++-common/gomp/affinity-2.c  (test for errors, line 45)

I was able to find the summary info:


Tests that now fail, but worked before (15 tests):
libgomp: libgomp.fortran/simd7.f90   -O0  execution test
libgomp: libgomp.fortran/task2.f90   -O0  execution test
libgomp: libgomp.fortran/vla2.f90   -O0  execution test
libgomp: libgomp.fortran/vla3.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
libgomp: libgomp.fortran/vla3.f90   -O3 -g  execution test
libgomp: libgomp.fortran/vla4.f90   -O1  execution test
libgomp: libgomp.fortran/vla4.f90   -O2  execution test
libgomp: libgomp.fortran/vla4.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
libgomp: libgomp.fortran/vla4.f90   -O3 -g  execution test
libgomp: libgomp.fortran/vla4.f90   -Os  execution 

Re: [PATCH v2 1/3] RISC-V: movmem for RISCV with V extension

2024-05-13 Thread Jeff Law




On 12/19/23 10:28 PM, Jeff Law wrote:



On 12/19/23 02:53, Sergei Lewis wrote:

gcc/ChangeLog

 * config/riscv/riscv.md (movmem<mode>): Use riscv_vector::expand_block_move,
 if and only if we know the entire operation can be performed using one
 vector load followed by one vector store.

gcc/testsuite/ChangeLog

 PR target/112109
 * gcc.target/riscv/rvv/base/movmem-1.c: New test
So this needs to be regression tested.  Given that it only affects RVV, 
I would suggest testing rv64gcv or rv32gcv.





+(define_expand "movmem<mode>"
+  [(parallel [(set (match_operand:BLK 0 "general_operand")
+   (match_operand:BLK 1 "general_operand"))
+    (use (match_operand:P 2 "const_int_operand"))
+    (use (match_operand:SI 3 "const_int_operand"))])]
+  "TARGET_VECTOR"
+{
+  if ((INTVAL (operands[2]) >= TARGET_MIN_VLEN/8)
+    && (INTVAL (operands[2]) <= TARGET_MIN_VLEN)
+    && riscv_vector::expand_block_move (operands[0], operands[1],
+ operands[2]))
+    DONE;
+  else
+    FAIL;
+})

Just a formatting nit.  A space on each side of the '/' operator above.
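
I.e. something like (sketch of the requested change, not the committed
hunk):

  if ((INTVAL (operands[2]) >= TARGET_MIN_VLEN / 8)
      && (INTVAL (operands[2]) <= TARGET_MIN_VLEN)
      && riscv_vector::expand_block_move (operands[0], operands[1],
					  operands[2]))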
So I've fixed the formatting nit and tested on rv64gc and rv32gcv.  I 
hadn't planned to push it, but muscle memory kicked in and 1/3 has been 
pushed.


I'll be looking at 2/3 and 3/3 tomorrow (or possibly a bit tonight to 
take advantage of overnight CI runs).


jeff



Re: Follow up #1 (was Re: [PATCH v2 1/2] RISC-V: avoid LUI based const materialization ... [part of PR/106265])

2024-05-13 Thread Jeff Law




On 5/13/24 3:13 PM, Vineet Gupta wrote:

On 5/13/24 11:49, Vineet Gupta wrote:

  500.perlbench_r-0 |  1,214,534,029,025 | 1,212,887,959,387 |
  500.perlbench_r-1 |740,383,419,739 |   739,280,308,163 |
  500.perlbench_r-2 |692,074,638,817 |   691,118,734,547 |
  502.gcc_r-0   |190,820,141,435 |   190,857,065,988 |
  502.gcc_r-1   |225,747,660,839 |   225,809,444,357 | <- -0.02%
  502.gcc_r-2   |220,370,089,641 |   220,406,367,876 | <- -0.03%
  502.gcc_r-3   |179,111,460,458 |   179,135,609,723 | <- -0.02%
  502.gcc_r-4   |219,301,546,340 |   219,320,416,956 | <- -0.01%
  503.bwaves_r-0|278,733,324,691 |   278,733,323,575 | <- -0.01%
  503.bwaves_r-1|442,397,521,282 |   442,397,519,616 |
  503.bwaves_r-2|344,112,218,206 |   344,112,216,760 |
  503.bwaves_r-3|417,561,469,153 |   417,561,467,597 |
  505.mcf_r |669,319,257,525 |   669,318,763,084 |
  507.cactuBSSN_r   |  2,852,767,394,456 | 2,564,736,063,742 | <+ 10.10%


The small gcc regression seems like a tooling issue of some sort.
Looking at the topblocks, the insn sequences are exactly the same, only
the counts differ and its not obvious why.
Here's for gcc_r-1.


 > Block 0 @ 0x170ca, 12 insns, 87854493 times, 0.47%:

 000170ca :
    170ca:    7179        add    sp,sp,-48
    170cc:    ec26        sd    s1,24(sp)
    170ce:    e84a        sd    s2,16(sp)
    170d0:    e44e        sd    s3,8(sp)
    170d2:    f406        sd    ra,40(sp)
    170d4:    f022        sd    s0,32(sp)
    170d6:    84aa        mv    s1,a0
    170d8:    03200913      li    s2,50
    170dc:    03d00993      li    s3,61
    170e0:    8526        mv    a0,s1
    170e2:    001cd097      auipc    ra,0x1cd
    170e6:    bac080e7      jalr    -1108(ra) # 1e3c8e
 

 > Block 1 @ 0x706d0a, 3 insns, 274713936 times, 0.37%:
 >  Block 2 @ 0x1e3c8e, 9 insns, 88507109 times, 0.35%:
 ...

 < Block 0 @ 0x170ca, 12 insns, 87869602 times, 0.47%:
 < Block 1 @ 0x706d42, 3 insns, 274608893 times, 0.36%:
 < Block 2 @ 0x1e3c94, 9 insns, 88526354 times, 0.35%:


FWIW, Greg internally has been looking at some of this and found some
issues in the bbv tooling, but I wish all of this was shared/upstreamed
(QEMU bbv plugin) for people to compare notes and not discover/fix the
same issues over and again.
Yea, we all meant to coordinate on those plugins.  The one we've got had 
some problems with hash collisions and when there's a hash collision it 
just produces total junk data.  I chased a few of these down and fixed 
them about a year ago.


The other thing is qemu will split up blocks based on its internal 
notion of a translation page.   So if you're looking at block level data 
you'll stumble over that as well.  This aspect is the most troublesome 
problem I'm aware of right now.






Jeff


Re: [RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading

2024-05-13 Thread Jeff Law




On 5/13/24 1:48 PM, Qing Zhao wrote:

-Warray-bounds is an important option to enable the Linux kernel to keep
array out-of-bounds errors out of the source tree.

However, due to the false positive warnings reported in PR109071
(-Warray-bounds false positive warnings due to code duplication from
jump threading), -Warray-bounds=1 cannot be enabled by default.

Although it's impossible to eliminate all the false positive warnings
from -Warray-bounds=1 (See PR104355 Misleading -Warray-bounds
documentation says "always out of bounds"), we should minimize the
false positive warnings in -Warray-bounds=1.

The root reason for the false positive warnings reported in PR109071 is:

When the jump threading optimization tries to reduce the # of branches
inside the routine, sometimes it needs to duplicate the code and
split it into two conditional paths.  For example:

The original code:

void sparx5_set (int * ptr, struct nums * sg, int index)
{
   if (index >= 4)
 warn ();
   *ptr = 0;
   *val = sg->vals[index];
   if (index >= 4)
 warn ();
   *ptr = *val;

   return;
}

With the thread jump, the above becomes:

void sparx5_set (int * ptr, struct nums * sg, int index)
{
   if (index >= 4)
 {
   warn ();
   *ptr = 0;// Code duplications since "warn" does return;
   *val = sg->vals[index];   // same this line.
// In this path, since it's under the condition
// "index >= 4", the compiler knows the value
// of "index" is larger then 4, therefore the
// out-of-bound warning.
   warn ();
 }
   else
 {
   *ptr = 0;
   *val = sg->vals[index];
 }
   *ptr = *val;
   return;
}

We can see that after the jump threading optimization, the # of branches
inside the routine "sparx5_set" is reduced from 2 to 1; however, due to the
code duplication (which is needed for the correctness of the code), we
got a false positive out-of-bounds warning.

In order to eliminate such false positive out-of-bound warning,

A. Add one more flag for GIMPLE: is_splitted.
B. During the thread jump optimization, when the basic blocks are
duplicated, mark all the STMTs inside the original and duplicated
basic blocks as "is_splitted";
C. Inside the array bound checker, add the following new heuristic:

If
  1. the stmt is duplicated and split into two conditional paths;
  2. the warning level < 2;
  3. the current block is not dominating the exit block
then do not report the warning.

The false positive warnings are moved from -Warray-bounds=1 to
  -Warray-bounds=2 now.

Bootstrapped and regression tested on both x86 and aarch64.  Adjusted
-Warray-bounds-61.c due to the false positive warnings.

Let me know if you have any comments and suggestions.

This sounds horribly wrong.   In the code above, the warning is correct.

Jeff


Re: [PATCH v2 2/2] RISC-V: avoid LUI based const mat in prologue/epilogue expansion [PR/105733]

2024-05-13 Thread Jeff Law




On 5/13/24 12:49 PM, Vineet Gupta wrote:

If the constant used for stack offset can be expressed as sum of two S12
values, the constant need not be materialized (in a reg) and instead the
two S12 bits can be added to instructions involved with frame pointer.
This avoids burning a register and more importantly can often get down
to be 2 insn vs. 3.

The prev patches to generally avoid LUI based const materialization didn't
fix this PR and need this directed fix in function prologue/epilogue
expansion.

This fix doesn't move the needle for SPEC, at all, but it is still a
win considering gcc generates one insn fewer than llvm for the test ;-)

    gcc-13.1 release    |  gcc 230823        |                   |
                        |  g6619b3d4c15c     |   This patch      |  clang/llvm
---------------------------------------------------------------------------------
li   t0,-4096           | li   t0,-4096      | addi sp,sp,-2048  | addi sp,sp,-2048
addi t0,t0,2016         | addi t0,t0,2032    | add  sp,sp,-16    | addi sp,sp,-32
li   a4,4096            | add  sp,sp,t0      | add  a5,sp,a0     | add  a1,sp,16
add  sp,sp,t0           | addi a5,sp,-2032   | sb   zero,0(a5)   | add  a0,a0,a1
li   a5,-4096           | add  a0,a5,a0      | addi sp,sp,2032   | sb   zero,0(a0)
addi a4,a4,-2032        | li   t0,4096       | addi sp,sp,32     | addi sp,sp,2032
add  a4,a4,a5           | sb   zero,2032(a0) | ret               | addi sp,sp,48
addi a5,sp,16           | addi t0,t0,-2032   |                   | ret
add  a5,a4,a5           | add  sp,sp,t0      |                   |
add  a0,a5,a0           | ret                |                   |
li   t0,4096            |                    |                   |
sd   a5,8(sp)           |                    |                   |
sb   zero,2032(a0)      |                    |                   |
addi t0,t0,-2016        |                    |                   |
add  sp,sp,t0           |                    |                   |
ret                     |                    |                   |
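
A minimal C-level reproducer in the spirit of the PR (illustrative 
frame size; the actual pr105733.c testcase may differ):

  void foo (long i)
  {
    /* The frame needs ~4K of adjustment: too big for one S12
       immediate, but expressible as the sum of two
       (e.g. -2048 + -2032).  */
    volatile char buf[4000];
    buf[i] = 0;
  }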

gcc/ChangeLog:
PR target/105733
* config/riscv/riscv.h: New macros for with aligned offsets.
* config/riscv/riscv.cc (riscv_split_sum_of_two_s12): New
function to split a sum of two s12 values into constituents.
(riscv_expand_prologue): Handle offset being sum of two S12.
(riscv_expand_epilogue): Ditto.
* config/riscv/riscv-protos.h (riscv_split_sum_of_two_s12): New.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr105733.c: New Test.
* gcc.target/riscv/rvv/autovec/vls/spill-1.c: Adjust to not
expect LUI 4096.
* gcc.target/riscv/rvv/autovec/vls/spill-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-7.c: Ditto.





@@ -8074,14 +8111,26 @@ riscv_expand_epilogue (int style)
}
else
{
- if (!SMALL_OPERAND (adjust_offset.to_constant ()))
+ HOST_WIDE_INT adj_off_value = adjust_offset.to_constant ();
+ if (SMALL_OPERAND (adj_off_value))
+   {
+ adjust = GEN_INT (adj_off_value);
+   }
+ else if (SUM_OF_TWO_S12_ALGN (adj_off_value))
+   {
+ HOST_WIDE_INT base, off;
+	  riscv_split_sum_of_two_s12 (adj_off_value, &base, &off);
+ insn = gen_add3_insn (stack_pointer_rtx, hard_frame_pointer_rtx,
+   GEN_INT (base));
+ RTX_FRAME_RELATED_P (insn) = 1;
+ adjust = GEN_INT (off);
+   }
So this was the hunk that we identified internally as causing problems 
with libgomp's testsuite.  We never fully chased it down as this hunk 
didn't seem terribly important performance wise -- we just set it aside. 
 The thing is it looked basically correct to me.  So the failure was 
certainly unexpected, but it was consistent.


So I think the question is whether or not the CI system runs the libgomp 
testsuite, particularly in the rv64 linux configuration.  If it does, 
and it passes, then we're good.  I'm still finding my way around the 
configuration, so I don't know if the CI system Edwin & Patrick have 
built tests libgomp or not.


If it isn't run, then we'll need to do a run to test that.  I'm set up 
here to do that if needed.   I can just drop this version into our 
internal tree, trigger an internal CI run and see if it complains :-)


If it does complain, then we know where to start investigations.




Jeff



Re: [PATCH v2 1/2] RISC-V: avoid LUI based const materialization ... [part of PR/106265]

2024-05-13 Thread Jeff Law




On 5/13/24 12:49 PM, Vineet Gupta wrote:

Apologies for the delay in getting this out. Needed to fix one ICE
with the glibc build and a fresh round of testing: both testsuite and SPEC
runs (which are similar to v1 in terms of Cactu gains, but with some more minor
regressions elsewhere in gcc). Again those seem so small that IMHO this
should still go in.

I'll investigate those next, as well as an existing weirdness in glibc tempnam
which I spotted during the debugging.

Changes since v1 [1]
  - Tighten the main condition to avoid stack regs as destination
(to avoid making them potentially unaligned with -2047 addend:
 this might be OK execution/ABI wise, but undesirable/ugly still
 specially when coming from compiler codegen).
  - Ensure that first alternative is always split
  - Remove "&& 1" from split condition. That was tripping up glibc build
with illegal operands `add s0, s0, 2048`.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647877.html

  
+;; Special case of adding a reg and constant if latter is sum of two S12
+;; values (in range -2048 to 2047). Avoid materializing the const and fuse
+;; into the add (with an additional add for 2nd value). Makes a 3 insn
+;; sequence into 2 insn.
+
+(define_insn_and_split "*add<mode>3_const_sum_of_two_s12"
+  [(set (match_operand:P 0 "register_operand" "=r,r")
+	(plus:P (match_operand:P 1 "register_operand" " r,r")
+		(match_operand:P 2 "const_two_s12"    "MiG,r")))]
+  "!riscv_reg_frame_related (operands[0])"
So that !riscv_reg_frame_related is my only concern with this patch. 
It's a destination, so it *may* be OK.


If it were a source operand, then we'd have to worry about cases where 
it was a pseudo with the same value as sp/fp/argp and subsequent copy 
propagation replacing the pseudo with sp/fp/argp causing the insn to no 
longer match.


Similarly if it were a source operand we'd have to worry about cases 
where the pseudo had a registered (or discoverable) equivalence to 
sp/fp/argp plus an offset.  IRA/LRA can replace the use with its 
equivalence in some of those cases which would have potentially caused 
headaches.


But as a destination we really just have to worry about generation in 
the prologue/epilogue and for alloca calls.  Those should be the only 
places that set one of those special registers.  They're constrained 
enough that I think we'll be OK.


I'm very slightly worried about hard register cprop, but I think it 
should be safe these days WRT those special registers in the unlikely 
event it found an opportunity to propagate them.


So a tentative OK.  If we find this tidibit is problematical in the 
future, then what I would suggest is we allow those special registers 
and dial-back the aggressiveness on the range of allowed constants. 
That would allow the first instruction in the sequence to never create a 
mis-aligned sp.  But again, that's only if we need to revisit.


Please wait for CI to report back sane results :-)

Jeff


[to-be-committed][RISC-V] Improve AND with some constants

2024-05-13 Thread Jeff Law


If we have an AND with a constant operand and the constant operand 
requires synthesis, then we may be able to generate more efficient code 
than we do now.


Essentially the need for constant synthesis gives us a budget for 
alternative ways to clear bits, which zext.w can do for bits 32..63 
trivially.  So if we clear 32..63 via zext.w, the constant for the 
remaining bits to clear may be simple enough to use with andi or bclri. 
That will save us an instruction.
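
For instance (hand-written sketch, not actual compiler output), for 
w32 & ~(1U << 30) from the new test:

  zext.w  a0,a0        # clear bits 32..63
  bclri   a0,a0,30     # clear bit 30

vs. synthesizing the 0xbfffffff mask into a register (two or more 
insns) and then ANDing.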


This has been tested in Ventana's CI system as well as my own.  I'll wait 
for the upstream CI tester to report success before committing.


Jeff
gcc/
* config/riscv/bitmanip.md: Add new splitter for AND with
a constant that masks off bits 32..63 and needs synthesis.

gcc/testsuite/

* gcc.target/riscv/zba_zbs_and-1.c: New test.

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 724511b6df3..8769a6b818b 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -843,6 +843,40 @@ (define_insn_and_split "*andi_extrabit"
 }
 [(set_attr "type" "bitmanip")])
 
+;; If we have the ZBA extension, then we can clear the upper half of a 64
+;; bit object with a zext.w.  So if we have AND where the constant would
+;; require synthesis of two or more instructions, but 32->64 sign extension
+;; of the constant is a simm12, then we can use zext.w+andi.  If the adjusted
+;; constant is a single bit constant, then we can use zext.w+bclri
+;;
+;; With the mvconst_internal pattern claiming a single insn to synthesize
+;; constants, this must be a define_insn_and_split.
+(define_insn_and_split ""
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (and:DI (match_operand:DI 1 "register_operand" "r")
+   (match_operand 2 "const_int_operand" "n")))]
+  "TARGET_64BIT
+   && TARGET_ZBA
+   && !paradoxical_subreg_p (operands[1])
+   /* Only profitable if synthesis takes more than one insn.  */
+   && riscv_const_insns (operands[2]) != 1
+   /* We need the upper half to be zero.  */
+   && (INTVAL (operands[2]) & HOST_WIDE_INT_C (0xffffffff00000000)) == 0
+   /* And the adjusted constant must either be something we can
+      implement with andi or bclri.  */
+   && ((SMALL_OPERAND (sext_hwi (INTVAL (operands[2]), 32))
+	|| (TARGET_ZBS && popcount_hwi (INTVAL (operands[2])) == 31))
+       && INTVAL (operands[2]) != 0x7fffffff)"
+  "#"
+  "&& 1"
+  [(set (match_dup 0) (zero_extend:DI (match_dup 3)))
+   (set (match_dup 0) (and:DI (match_dup 0) (match_dup 2)))]
+  "{
+ operands[3] = gen_lowpart (SImode, operands[1]);
+ operands[2] = GEN_INT (sext_hwi (INTVAL (operands[2]), 32));
+   }"
+  [(set_attr "type" "bitmanip")])
+
 ;; IF_THEN_ELSE: test for 2 bits of opposite polarity
 (define_insn_and_split "*branch_mask_twobits_equals_singlebit"
   [(set (pc)
diff --git a/gcc/testsuite/gcc.target/riscv/zba_zbs_and-1.c b/gcc/testsuite/gcc.target/riscv/zba_zbs_and-1.c
new file mode 100644
index 000..23fd769449e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zba_zbs_and-1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zba_zbb_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+
+
+unsigned long long w32mem_1(unsigned long long w32)
+{
+return w32 & ~(1U << 0);
+}
+
+unsigned long long w32mem_2(unsigned long long w32)
+{
+return w32 & ~(1U << 30);
+}
+
+unsigned long long w32mem_3(unsigned long long w32)
+{
+return w32 & ~(1U << 31);
+}
+
+/* If we do synthesis, then we'd see an addi.  */
+/* { dg-final { scan-assembler-not "addi\t" } } */


Re: [PATCH v1] RISC-V: Bugfix ICE for RVV intrinisc vfw on _Float16 scalar

2024-05-13 Thread Jeff Law




On 5/13/24 9:00 AM, Li, Pan2 wrote:

Committed, thanks Juzhe and Kito. Let's wait for a while before backport to 14.

Could you fix the formatting nits caught by the CI linter?

=== ERROR type #1: trailing operator (4 error(s)) ===
gcc/config/riscv/riscv-vector-builtins.cc:4641:39:  if ((exts & 
RVV_REQUIRE_ELEN_FP_16) &&
gcc/config/riscv/riscv-vector-builtins.cc:4651:39:  if ((exts & 
RVV_REQUIRE_ELEN_FP_32) &&
gcc/config/riscv/riscv-vector-builtins.cc:4661:39:  if ((exts & 
RVV_REQUIRE_ELEN_FP_64) &&
gcc/config/riscv/riscv-vector-builtins.cc:4670:36:  if ((exts & 
RVV_REQUIRE_ELEN_64) &&



The "&&" needs to come down to the next line, indented like

if ((exts & RVV_REQUIRE_ELEN_FP_16)
    && !TARGET_VECTOR_...)

Ie, the "&&" indents just inside the first open paren.  It looks like 
all the conditions in validate_instance_type_required_extensions need to 
be fixed in a similar manner.


Given this is NFC, just post it for the archiver.  No need to wait on 
review.


Jeff




[to-be-committed] [RISC-V] Improve single inverted bit extraction - v3

2024-05-12 Thread Jeff Law


The only change from v2 to v3 is testsuite adjustments for the updated 
sequences and fixing the name of the second pattern.


--


So this patch fixes a minor code generation inefficiency that (IIRC) the
RAU team discovered a while ago in spec.

If we want the inverted value of a single bit we can use bext to extract
the bit, then seq to invert the value (if viewed as a 0/1 truth value).

The RTL is fairly convoluted, but it's basically a right shift to get
the bit into position, bitwise-not then masking off all but the low bit.
So it's a 3->2 combine, hidden by the fact that and-not is a
define_insn_and_split, so it actually looks like a 2->2 combine.
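
In instruction terms the transformation is roughly (sketch, not actual 
compiler output):

  # before                   # after
  srl   a0,a1,a2             bext  a0,a1,a2
  not   a0,a0                seqz  a0,a0
  andi  a0,a0,1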

We've run this through Ventana's internal CI (which includes
zba_zbb_zbs) and I've run it in my own tester (rv64gc, rv32gcv).  I'll
wait for the upstream CI to finish with positive results before pushing.

Jeff

gcc/
* config/riscv/bitmanip.md (bextseqzdisi): New patterns.

gcc/testsuite/

* gcc.target/riscv/zbs-bext-2.c: New test.
* gcc.target/riscv/zbs-bext.c: Fix one of the possible expected
sequences.


diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index d76a72d30e0..724511b6df3 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -711,6 +711,49 @@ (define_insn "*bext<mode>"
   "bext\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
 
+;; This is a bext followed by a seqz.  Normally this would be a 3->2 split
+;; But the and-not pattern with a constant operand is a define_insn_and_split,
+;; so this looks like a 2->2 split, which combine rejects.  So implement it
+;; as a define_insn_and_split as well.
+(define_insn_and_split "*bextseqzdisi"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (and:DI
+ (not:DI
+   (subreg:DI
+ (lshiftrt:SI
+   (match_operand:SI 1 "register_operand" "r")
+   (match_operand:QI 2 "register_operand" "r")) 0))
+ (const_int 1)))]
+  "TARGET_64BIT && TARGET_ZBS"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(zero_extract:DI (match_dup 1)
+			 (const_int 1)
+			 (zero_extend:DI (match_dup 2))))
+   (set (match_dup 0) (eq:DI (match_dup 0) (const_int 0)))]
+  "operands[1] = gen_lowpart (word_mode, operands[1]);"
+  [(set_attr "type" "bitmanip")])
+
+(define_insn_and_split "*bextseqz"
+  [(set (match_operand:X 0 "register_operand" "=r")
+   (and:X
+ (not:X
+   (lshiftrt:X
+ (match_operand:X 1 "register_operand" "r")
+ (match_operand:QI 2 "register_operand" "r")))
+ (const_int 1)))]
+  "TARGET_ZBS"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(zero_extract:X (match_dup 1)
+			(const_int 1)
+			(zero_extend:X (match_dup 2))))
+   (set (match_dup 0) (eq:X (match_dup 0) (const_int 0)))]
+  "operands[1] = gen_lowpart (word_mode, operands[1]);"
+  [(set_attr "type" "bitmanip")])
+
 ;; When performing `(a & (1UL << bitno)) ? 0 : -1` the combiner
 ;; usually has the `bitno` typed as X-mode (i.e. no further
 ;; zero-extension is performed around the bitno).
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-bext-2.c b/gcc/testsuite/gcc.target/riscv/zbs-bext-2.c
new file mode 100644
index 000..79f120b2286
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbs-bext-2.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+
+
+_Bool match(const int ch, int fMap) {
+return ((fMap & (1<<(ch))) == 0);
+}
+
+_Bool match2(const int ch, int fMap) {
+return ((fMap & (1UL<<(ch))) == 0);
+}
+
+
+/* { dg-final { scan-assembler-times "bext\t" 2 } } */
+/* { dg-final { scan-assembler-times "seqz\t|xori\t" 2 } } */
+/* { dg-final { scan-assembler-not "sraw\t" } } */
+/* { dg-final { scan-assembler-not "not\t" } } */
+/* { dg-final { scan-assembler-not "andi\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-bext.c b/gcc/testsuite/gcc.target/riscv/zbs-bext.c
index ff75dad6528..0db97f5ab59 100644
--- a/gcc/testsuite/gcc.target/riscv/zbs-bext.c
+++ b/gcc/testsuite/gcc.target/riscv/zbs-bext.c
@@ -38,7 +38,7 @@ long bext64_4(long a, char bitno)
 
 /* { dg-final { scan-assembler-times "bexti\t" 1 } } */
 /* { dg-final { scan-assembler-times "bext\t" 5 } } */
-/* { dg-final { scan-assembler-times "xori\t|snez\t" 1 } } */
+/* { dg-final { scan-assembler-times "xori\t|seqz\t" 1 } } */
 /* { dg-final { scan-assembler-times "addi\t" 1 } } */
 /* { dg-final { scan-assembler-times "neg\t" 1 } } */
 /* { dg-final { scan-assembler-not {\mandi} } } */


[to-be-committed] [RISC-V] Improve single inverted bit extraction - v2

2024-05-12 Thread Jeff Law


So the first version failed CI and after looking at the patch again, I 
think it can be improved.


First, the output pattern might as well go ahead and use the 
zero_extract form.


Second, we should be able to handle cases where all the ops are in 
word_mode as well as when the shift is in a narrow mode.


Third, the testcase should cover additional modes.

Fourth, fix some lint issues with tabs vs spaces.

This has only been lightly tested, so it should be interesting to see 
what CI shows.


Jeff

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index d76a72d30e0..724511b6df3 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -711,6 +711,49 @@ (define_insn "*bext<mode>"
   "bext\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
 
+;; This is a bext followed by a seqz.  Normally this would be a 3->2 split
+;; But the and-not pattern with a constant operand is a define_insn_and_split,
+;; so this looks like a 2->2 split, which combine rejects.  So implement it
+;; as a define_insn_and_split as well.
+(define_insn_and_split "*bextseqzdisi"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (and:DI
+ (not:DI
+   (subreg:DI
+ (lshiftrt:SI
+   (match_operand:SI 1 "register_operand" "r")
+   (match_operand:QI 2 "register_operand" "r")) 0))
+ (const_int 1)))]
+  "TARGET_64BIT && TARGET_ZBS"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(zero_extract:DI (match_dup 1)
+			 (const_int 1)
+			 (zero_extend:DI (match_dup 2))))
+   (set (match_dup 0) (eq:DI (match_dup 0) (const_int 0)))]
+  "operands[1] = gen_lowpart (word_mode, operands[1]);"
+  [(set_attr "type" "bitmanip")])
+
+(define_insn_and_split "*bextseqzdisi"
+  [(set (match_operand:X 0 "register_operand" "=r")
+   (and:X
+ (not:X
+   (lshiftrt:X
+ (match_operand:X 1 "register_operand" "r")
+ (match_operand:QI 2 "register_operand" "r")))
+ (const_int 1)))]
+  "TARGET_ZBS"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(zero_extract:X (match_dup 1)
+			(const_int 1)
+			(zero_extend:X (match_dup 2))))
+   (set (match_dup 0) (eq:X (match_dup 0) (const_int 0)))]
+  "operands[1] = gen_lowpart (word_mode, operands[1]);"
+  [(set_attr "type" "bitmanip")])
+
 ;; When performing `(a & (1UL << bitno)) ? 0 : -1` the combiner
 ;; usually has the `bitno` typed as X-mode (i.e. no further
 ;; zero-extension is performed around the bitno).
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-bext-2.c b/gcc/testsuite/gcc.target/riscv/zbs-bext-2.c
new file mode 100644
index 000..719df442fed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbs-bext-2.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+
+_Bool match(const int ch, int fMap) {
+return ((fMap & (1<<(ch))) == 0);
+}
+
+_Bool match2(const int ch, int fMap) {
+return ((fMap & (1UL<<(ch))) == 0);
+}
+
+
+/* { dg-final { scan-assembler-times "bext\t" 1 } } */
+/* { dg-final { scan-assembler-times "seqz\t" 1 } } */
+/* { dg-final { scan-assembler-not "sraw\t" } } */
+/* { dg-final { scan-assembler-not "not\t" } } */
+/* { dg-final { scan-assembler-not "andi\t" } } */


[to-be-committed] [RISC-V] Improve single inverted bit extraction

2024-05-12 Thread Jeff Law
So the first time I sent this, I attached the wrong patch.  As a result 
the CI system wasn't happy.


The second time I sent the right patch, but I don't see evidence the CI 
system ran the correct patch through.  So I'm just starting over ;-)


--

So this patch fixes a minor code generation inefficiency that (IIRC) the
RAU team discovered a while ago in spec.

If we want the inverted value of a single bit we can use bext to extract
the bit, then seq to invert the value (if viewed as a 0/1 truth value).

The RTL is fairly convoluted, but it's basically a right shift to get
the bit into position, bitwise-not then masking off all but the low bit.
So it's a 3->2 combine, hidden by the fact that and-not is a
define_insn_and_split, so it actually looks like a 2->2 combine.

We've run this through Ventana's internal CI (which includes
zba_zbb_zbs) and I've run it in my own tester (rv64gc, rv32gcv).  I'll
wait for the upstream CI to finish with positive results before pushing.

Jeff

gcc/
	* config/riscv/riscv.cc (riscv_build_integer_1): Recognize cases where
	we can use shNadd to improve constant synthesis.
	(riscv_move_integer): Handle code generation for shNadd.

gcc/testsuite
* gcc.target/riscv/synthesis-1.c: Also count shNadd instructions.
* gcc.target/riscv/synthesis-3.c: New test.

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index d76a72d30e0..cf2fa04d4c4 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -711,6 +711,30 @@ (define_insn "*bext<mode>"
   "bext\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
 
+;; This is a bext followed by a seqz.  Normally this would be a 3->2 split
+;; But the and-not pattern with a constant operand is a define_insn_and_split,
+;; so this looks like a 2->2 split, which combine rejects.  So implement it
+;; as a define_insn_and_split as well.
+(define_insn_and_split "*bextseqzdisi"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (and:DI
+ (not:DI
+   (subreg:DI
+ (lshiftrt:SI
+   (match_operand:SI 1 "register_operand" "r")
+   (match_operand:QI 2 "register_operand" "r")) 0))
+  (const_int 1)))]
+  "TARGET_64BIT && TARGET_ZBS"
+  "#"
+  "&& 1"
+  [(set (match_dup 0) (and:DI (subreg:DI
+   (lshiftrt:SI (match_dup 1)
+(match_dup 2)) 0)
+ (const_int 1)))
+   (set (match_dup 0) (eq:DI (match_dup 0) (const_int 0)))]
+  ""
+  [(set_attr "type" "bitmanip")])
+
 ;; When performing `(a & (1UL << bitno)) ? 0 : -1` the combiner
 ;; usually has the `bitno` typed as X-mode (i.e. no further
 ;; zero-extension is performed around the bitno).
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-bext-2.c b/gcc/testsuite/gcc.target/riscv/zbs-bext-2.c
new file mode 100644
index 000..53f47dc3afe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbs-bext-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+
+_Bool match(const int ch, int fMap) {
+return ((fMap & (1<<(ch))) == 0);
+}
+
+
+/* { dg-final { scan-assembler-times "bext\t" 1 } } */
+/* { dg-final { scan-assembler-times "seqz\t" 1 } } */
+/* { dg-final { scan-assembler-not "sraw\t" } } */
+/* { dg-final { scan-assembler-not "not\t" } } */
+/* { dg-final { scan-assembler-not "andi\t" } } */


[to-be-committed][RISC-V] Improve usage of slli.uw in constant synthesis

2024-05-11 Thread Jeff Law

And an improvement to using slli.uw...

I recently added the ability to use slli.uw in the synthesis path.  That 
code was conditional on the right justified constant being a LUI_OPERAND 
after sign extending from bit 31 to bit 63.


That code is working fine, but could be improved.  Specifically there's 
no reason it shouldn't work for LUI+ADDI under the same circumstances. 
So rather than testing the sign extended, right justified constant is a 
LUI_OPERAND, we can just test that the right justified constant has 
precisely 32 leading zeros.
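
Working one of the new tests by hand (the sequence is my sketch of the 
expected synthesis, not verified compiler output): 0x80180001000 
right-justifies to 0x80180001, which has exactly 32 leading zeros but 
needs LUI+ADDI rather than a lone LUI:

  lui     a0,0x80180      # 0xffffffff80180000 after sign extension
  addi    a0,a0,1         # 0xffffffff80180001
  slli.uw a0,a0,12        # zero-extend the low 32 bits, then shift:
                          # 0x80180001000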



Waiting on CI to finish, expecting to commit after it's successful.

Jeff
gcc/
* config/riscv/riscv.cc (riscv_build_integer_1): Use slli.uw more.

gcc/testsuite
* gcc.target/riscv/synthesis-5.c: New test.

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 049f8f8cb9f..a1e5a014bed 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -819,13 +819,14 @@ riscv_build_integer_1 (struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS],
 				  & ~HOST_WIDE_INT_C (0x80000000)))))
 	shift -= IMM_BITS, x <<= IMM_BITS;
 
-  /* Adjust X if it isn't a LUI operand in isolation, but we can use
-a subsequent "uw" instruction form to mask off the undesirable
-bits.  */
+  /* If X has bits 32..63 clear and bit 31 set, then go ahead and mark
+it as desiring a "uw" operation for the shift.  That way we can have
+LUI+ADDI to generate the constant, then shift it into position
+clearing out the undesirable bits.  */
   if (!LUI_OPERAND (x)
  && TARGET_64BIT
  && TARGET_ZBA
- && LUI_OPERAND (x & ~HOST_WIDE_INT_C (0x80000000UL)))
+ && clz_hwi (x) == 32)
{
  x = sext_hwi (x, 32);
  use_uw = true;
diff --git a/gcc/testsuite/gcc.target/riscv/synthesis-5.c b/gcc/testsuite/gcc.target/riscv/synthesis-5.c
new file mode 100644
index 000..4d81565b563
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/synthesis-5.c
@@ -0,0 +1,294 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+
+/* We aggressively skip as we really just need to test the basic synthesis
+   which shouldn't vary based on the optimization level.  -O1 seems to work
+   and eliminates the usual sources of extraneous dead code that would throw
+   off the counts.  */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O2" "-O3" "-Os" "-Oz" "-flto" } } */
+/* { dg-options "-march=rv64gc_zba_zbb_zbs" } */
+
+/* Rather than test for a specific synthesis of all these constants or
+   having thousands of tests each testing one variant, we just test the
+   total number of instructions.
+
+   This isn't expected to change much and any change is worthy of a look.  */
+/* { dg-final { scan-assembler-times "\\t(add|addi|bseti|li|ret|sh1add|sh2add|sh3add|slli)" 556 } } */
+
+unsigned long foo_0x80180001000(void) { return 0x80180001000UL; }
+
+unsigned long foo_0x80280001000(void) { return 0x80280001000UL; }
+
+unsigned long foo_0x80480001000(void) { return 0x80480001000UL; }
+
+unsigned long foo_0x80880001000(void) { return 0x80880001000UL; }
+
+unsigned long foo_0x81080001000(void) { return 0x81080001000UL; }
+
+unsigned long foo_0x82080001000(void) { return 0x82080001000UL; }
+
+unsigned long foo_0x84080001000(void) { return 0x84080001000UL; }
+
+unsigned long foo_0x88080001000(void) { return 0x88080001000UL; }
+
+unsigned long foo_0x90080001000(void) { return 0x90080001000UL; }
+
+unsigned long foo_0xa0080001000(void) { return 0xa0080001000UL; }
+
+unsigned long foo_0x8031000(void) { return 0x8031000UL; }
+
+unsigned long foo_0x8051000(void) { return 0x8051000UL; }
+
+unsigned long foo_0x8091000(void) { return 0x8091000UL; }
+
+unsigned long foo_0x8111000(void) { return 0x8111000UL; }
+
+unsigned long foo_0x8211000(void) { return 0x8211000UL; }
+
+unsigned long foo_0x8411000(void) { return 0x8411000UL; }
+
+unsigned long foo_0x8811000(void) { return 0x8811000UL; }
+
+unsigned long foo_0x9011000(void) { return 0x9011000UL; }
+
+unsigned long foo_0xa011000(void) { return 0xa011000UL; }
+
+unsigned long foo_0xc011000(void) { return 0xc011000UL; }
+
+unsigned long foo_0x8061000(void) { return 0x8061000UL; }
+
+unsigned long foo_0x80a1000(void) { return 0x80a1000UL; }
+
+unsigned long foo_0x8121000(void) { return 0x8121000UL; }
+
+unsigned long foo_0x8221000(void) { return 0x8221000UL; }
+
+unsigned long foo_0x8421000(void) { return 0x8421000UL; }
+
+unsigned long foo_0x8821000(void) { return 0x8821000UL; }
+
+unsigned long foo_0xa021000(void) { return 0xa021000UL; }
+
+unsigned long foo_0xc021000(void) { return 0xc021000UL; }
+
+unsigned long foo_0x80c1000(void) { return 0x80c1000UL; }
+
+unsigned long foo_0x8141000(void) { return 0x8141000UL; }
+
+unsigned long 

[to-be-committed] RISC-V Fix minor regression in synthesis WRT bseti usage

2024-05-11 Thread Jeff Law
Overnight testing showed a small number of cases where constant 
synthesis was doing something dumb.  Specifically generating more 
instructions than the number of bits set in the constant.


It was a minor goof in the recent bseti code.  In the code to first 
figure out what bits LUI could set, I included one bit outside the space 
LUI operates.  For some dumb reason I kept thinking in terms of 11 low 
bits belonging to addi, but it's actually 12 bits.  The net is what we 
thought should be a single LUI for costing turned into LUI+ADDI.
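
To make the ranges concrete (my annotation, not from the patch): LUI 
materializes imm20 << 12, so the bits it can directly set are 12..31:

  0x7ffff000   /* bits LUI may claim (SImode sign bit excluded) */
  0x00000800   /* bit 11: addi/bseti territory, not LUI's */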


I didn't let the test run to completion, but over the course of 12 hours 
it found 9 cases.  Given we know that the triggers all have 0x800 set, I 
bet we could likely find more, but I doubt it's that critical to cover 
every possible constant that regressed.


This has run in my tester (rv64gc, rv32gcv), but I'll wait for the CI 
tester as it covers the bitmanip extensions much better.



Jeff

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 9c98b1da035..049f8f8cb9f 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -921,12 +921,12 @@ riscv_build_integer_1 (struct riscv_integer_op 
codes[RISCV_MAX_INTEGER_OPS],
 
   /* First handle any bits set by LUI.  Be careful of the
 SImode sign bit!.  */
-  if (value & 0x7ffff800)
+  if (value & 0x7ffff000)
{
  alt_codes[i].code = (i == 0 ? UNKNOWN : IOR);
- alt_codes[i].value = value & 0x7ffff800;
+ alt_codes[i].value = value & 0x7ffff000;
  alt_codes[i].use_uw = false;
- value &= ~0x7ffff800;
+ value &= ~0x7ffff000;
   i++;
}
 
diff --git a/gcc/testsuite/gcc.target/riscv/synthesis-4.c 
b/gcc/testsuite/gcc.target/riscv/synthesis-4.c
new file mode 100644
index 000..328a55b9e6e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/synthesis-4.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* We aggressively skip as we really just need to test the basic synthesis
+   which shouldn't vary based on the optimization level.  -O1 seems to work
+   and eliminates the usual sources of extraneous dead code that would throw
+   off the counts.  */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O2" "-O3" "-Os" "-Oz" "-flto" } } 
*/
+/* { dg-options "-march=rv64gc_zba_zbb_zbs" } */
+
+/* Rather than test for a specific synthesis of all these constants or
+   having thousands of tests each testing one variant, we just test the
+   total number of instructions. 
+
+   This isn't expected to change much and any change is worthy of a look.  */
+/* { dg-final { scan-assembler-times 
"\\t(add|addi|bseti|li|ret|sh1add|sh2add|sh3add|slli)" 45 } } */
+
+
+unsigned long foo_0x640800(void) { return 0x640800UL; }
+
+unsigned long foo_0xc40800(void) { return 0xc40800UL; }
+
+unsigned long foo_0x1840800(void) { return 0x1840800UL; }
+
+unsigned long foo_0x3040800(void) { return 0x3040800UL; }
+
+unsigned long foo_0x6040800(void) { return 0x6040800UL; }
+
+unsigned long foo_0xc040800(void) { return 0xc040800UL; }
+
+unsigned long foo_0x18040800(void) { return 0x18040800UL; }
+
+unsigned long foo_0x30040800(void) { return 0x30040800UL; }
+
+unsigned long foo_0x60040800(void) { return 0x60040800UL; }


Re: [PATCH v2 1/4] Support for CodeView debugging format

2024-05-11 Thread Jeff Law




On 10/30/23 6:28 PM, Mark Harmstone wrote:

This patch and the following add initial support for Microsoft's
CodeView debugging format, as used by MSVC, to mingw targets.

Note that you will need a recent version of binutils for this to be
useful. The best way to view the output is to run Microsoft's
cvdump.exe, found in their microsoft-pdb repo on GitHub, against the
object files.
So I'd hoped to have these wrapped up last year in time for gcc-14, but 
life got in the way.


The patches are fine for the trunk, though they are missing ChangeLog 
entries.  I'll cobble those together and push the series to the trunk.


Thanks for your patience.

jeff



Re: [to-be-committed][RISC-V] Improve extraction of inverted single bit

2024-05-10 Thread Jeff Law



On 5/10/24 4:28 PM, Jeff Law wrote:
So this patch fixes a minor code generation inefficiency that (IIRC) the 
RAU team discovered a while ago in spec.


If we want the inverted value of a single bit we can use bext to extract 
the bit, then seq to invert the value (if viewed as a 0/1 truth value).


The RTL is fairly convoluted, but it's basically a right shift to get 
the bit into position, bitwise-not then masking off all but the low bit. 
  So it's a 3->2 combine, hidden by the fact that and-not is a 
define_insn_and_split, so it actually looks like a 2->2 combine.


We've run this through Ventana's internal CI (which includes 
zba_zbb_zbs) and I've run it in my own tester (rv64gc, rv32gcv).  I'll 
wait for the upstream CI to finish with positive results before pushing.

[ ... ]
Whoops, sent the wrong patch.  The downside of doing work on one system, 
but handling email from another :(


Here's the right patch.



gcc/
* config/riscv/bitmanip.md (*bextseqzdisi): New pattern.

gcc/testsuite/

* gcc.target/riscv/zbs-bext-2.c: New test.

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index d76a72d30e0..cf2fa04d4c4 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -711,6 +711,30 @@ (define_insn "*bext"
   "bext\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
 
+;; This is a bext followed by a seqz.  Normally this would be a 3->2 split
+;; But the and-not pattern with a constant operand is a define_insn_and_split,
+;; so this looks like a 2->2 split, which combine rejects.  So implement it
+;; as a define_insn_and_split as well.
+(define_insn_and_split "*bextseqzdisi"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (and:DI
+ (not:DI
+   (subreg:DI
+ (lshiftrt:SI
+   (match_operand:SI 1 "register_operand" "r")
+   (match_operand:QI 2 "register_operand" "r")) 0))
+  (const_int 1)))]
+  "TARGET_64BIT && TARGET_ZBS"
+  "#"
+  "&& 1"
+  [(set (match_dup 0) (and:DI (subreg:DI
+   (lshiftrt:SI (match_dup 1)
+(match_dup 2)) 0)
+ (const_int 1)))
+   (set (match_dup 0) (eq:DI (match_dup 0) (const_int 0)))]
+  ""
+  [(set_attr "type" "bitmanip")])
+
 ;; When performing `(a & (1UL << bitno)) ? 0 : -1` the combiner
 ;; usually has the `bitno` typed as X-mode (i.e. no further
 ;; zero-extension is performed around the bitno).
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-bext-2.c 
b/gcc/testsuite/gcc.target/riscv/zbs-bext-2.c
new file mode 100644
index 000..53f47dc3afe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbs-bext-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+
+_Bool match(const int ch, int fMap) {
+return ((fMap & (1<<(ch))) == 0);
+}
+
+
+/* { dg-final { scan-assembler-times "bext\t" 1 } } */
+/* { dg-final { scan-assembler-times "seqz\t" 1 } } */
+/* { dg-final { scan-assembler-not "sraw\t" } } */
+/* { dg-final { scan-assembler-not "not\t" } } */
+/* { dg-final { scan-assembler-not "andi\t" } } */


Re: [wwwdocs] Add Cauldron2024

2024-05-10 Thread Jeff Law




On 5/7/24 4:34 AM, Jan Hubicka wrote:

Hi,
this adds Cauldron2024 to main page. OK?

OK, of course.

jeff



Re: [PATCH 4/4] RISC-V: Allow by-pieces to do overlapping accesses in block_move_straight

2024-05-10 Thread Jeff Law




On 5/7/24 11:17 PM, Christoph Müllner wrote:

The current implementation of riscv_block_move_straight() emits a couple
of loads/stores with maximum width (e.g. 8-byte for RV64).
The remainder is handed over to move_by_pieces().
The by-pieces framework utilizes target hooks to decide about the emitted
instructions (e.g. unaligned accesses or overlapping accesses).

Since the current implementation will always request less than XLEN bytes
to be handled by the by-pieces infrastructure, it is impossible that
overlapping memory accesses can ever be emitted (the by-pieces code does
not know of any previous instructions that were emitted by the backend).

This patch changes the implementation of riscv_block_move_straight()
such that it utilizes the by-pieces framework if the remaining data
is less than 2*XLEN bytes, which is sufficient to enable overlapping
memory accesses (if the requirements for them are given).

The changes in the expansion can be seen in the adjustments of the
cpymem-NN-ooo test cases. The changes in the cpymem-NN tests are
caused by the different instruction ordering of the code emitted
by the by-pieces infrastructure, which emits alternating load/store
sequences.
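
As a hedged illustration (not the committed code), once the residual is
handed to the by-pieces framework, a 15-byte copy on rv64 with fast
unaligned access can be done with two overlapping 8-byte accesses:

void copy15 (char *dst, const char *src)
{
  __builtin_memcpy (dst, src, 15);
}

/* Assumed shape of the output (register allocation may differ):
	ld	a4,0(a1)
	ld	a5,7(a1)
	sd	a4,0(a0)
	sd	a5,7(a0)  */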

gcc/ChangeLog:

* config/riscv/riscv-string.cc (riscv_block_move_straight):
Hand over up to 2xXLEN bytes to move_by_pieces().

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cpymem-32-ooo.c: Adjustments for overlapping
access.
* gcc.target/riscv/cpymem-32.c: Adjustments for code emitted by
by-pieces.
* gcc.target/riscv/cpymem-64-ooo.c: Adjustments for overlapping
access.
* gcc.target/riscv/cpymem-64.c: Adjustments for code emitted by
by-pieces.

OK once any prereqs are in.

jeff



Re: [PATCH 3/4] RISC-V: tune: Add setting for overlapping mem ops to tuning struct

2024-05-10 Thread Jeff Law




On 5/7/24 11:17 PM, Christoph Müllner wrote:

This patch adds the field overlap_op_by_pieces to the struct
riscv_tune_param, which is used by the TARGET_OVERLAP_OP_BY_PIECES_P()
hook. This hook is used by the by-pieces infrastructure to decide
if overlapping memory accesses should be emitted.

The new property is set to false in all tune structs except for
generic-ooo.

The changes in the expansion can be seen in the adjustments of the
cpymem test cases. These tests also reveal a limitation in the
RISC-V cpymem expansion that prevents this optimization as only
by-pieces cpymem expansions emit overlapping memory accesses.
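
The wiring is presumably along these lines (a sketch using the names
from the ChangeLog, not the patch itself; tune_param is the existing
global in riscv.cc):

static bool
riscv_overlap_op_by_pieces (void)
{
  return tune_param->overlap_op_by_pieces;
}

#undef TARGET_OVERLAP_OP_BY_PIECES_P
#define TARGET_OVERLAP_OP_BY_PIECES_P riscv_overlap_op_by_pieces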

gcc/ChangeLog:

* config/riscv/riscv.cc (struct riscv_tune_param): New field
overlap_op_by_pieces.
(riscv_overlap_op_by_pieces): New function.
(TARGET_OVERLAP_OP_BY_PIECES_P): Connect to
riscv_overlap_op_by_pieces.
I think these are redundant with the changes I installed earlier this 
week :-)




gcc/testsuite/ChangeLog:

* gcc.target/riscv/cpymem-32-ooo.c: Adjust for overlapping
access.
* gcc.target/riscv/cpymem-64-ooo.c: Likewise.

OK once prereqs are in.

jeff



Re: [PATCH 2/4] RISC-V: Allow unaligned accesses in cpymemsi expansion

2024-05-10 Thread Jeff Law




On 5/7/24 11:17 PM, Christoph Müllner wrote:

The RISC-V cpymemsi expansion is called, whenever the by-pieces
infrastructure will not take care of the builtin expansion.
The by-pieces infrastructure may emit code that includes
unaligned accesses if riscv_slow_unaligned_access_p is false.

The RISC-V cpymemsi expansion is handled via riscv_expand_block_move().
The current implementation of this function does not check
riscv_slow_unaligned_access_p and never emits unaligned accesses.

Since by-pieces emits unaligned accesses, it is reasonable to implement
the same behaviour in the cpymemsi expansion. And that's what this patch
is doing.

The patch checks riscv_slow_unaligned_access_p at the entry and sets
the allowed alignment accordingly. This alignment is then propagated
down to the routines that emit the actual instructions.

The changes introduced by this patch can be seen in the adjustments
of the cpymem tests.

gcc/ChangeLog:

* config/riscv/riscv-string.cc (riscv_block_move_straight): Add
parameter align.
(riscv_adjust_block_mem): Replace parameter length by align.
(riscv_block_move_loop): Add parameter align.
(riscv_expand_block_move_scalar): Set alignment properly if the
target has fast unaligned access.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cpymem-32-ooo.c: Adjust for unaligned access.
* gcc.target/riscv/cpymem-64-ooo.c: Likewise.

Mostly ok.  One concern noted below.




Signed-off-by: Christoph Müllner 
---
  gcc/config/riscv/riscv-string.cc  | 53 +++
  .../gcc.target/riscv/cpymem-32-ooo.c  | 20 +--
  .../gcc.target/riscv/cpymem-64-ooo.c  | 14 -
  3 files changed, 59 insertions(+), 28 deletions(-)

@@ -730,8 +732,16 @@ riscv_expand_block_move_scalar (rtx dest, rtx src, rtx 
length)
unsigned HOST_WIDE_INT hwi_length = UINTVAL (length);
unsigned HOST_WIDE_INT factor, align;
  
-  align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);

-  factor = BITS_PER_WORD / align;
+  if (riscv_slow_unaligned_access_p)
+{
+  align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
+  factor = BITS_PER_WORD / align;
+}
+  else
+{
+  align = hwi_length * BITS_PER_UNIT;
+  factor = 1;
+}
Not sure why you're using hwi_length here.  That's a property of the 
host, not the target.  ISTM you wanted BITS_PER_WORD here to encourage 
word sized moves irrespective of alignment.
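
In other words, the else arm presumably wants to be (a sketch of the
requested change, untested):

  else
    {
      align = BITS_PER_WORD;
      factor = 1;
    }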


OK with that change after a fresh round of testing.

jeff


[to-be-committed][RISC-V] Improve extraction of inverted single bit

2024-05-10 Thread Jeff Law
So this patch fixes a minor code generation inefficiency that (IIRC) the 
RAU team discovered a while ago in spec.


If we want the inverted value of a single bit we can use bext to extract 
the bit, then seq to invert the value (if viewed as a 0/1 truth value).


The RTL is fairly convoluted, but it's basically a right shift to get 
the bit into position, bitwise-not then masking off all but the low bit. 
 So it's a 3->2 combine, hidden by the fact that and-not is a 
define_insn_and_split, so it actually looks like a 2->2 combine.


We've run this through Ventana's internal CI (which includes 
zba_zbb_zbs) and I've run it in my own tester (rv64gc, rv32gcv).  I'll 
wait for the upstream CI to finish with positive results before pushing.


Jeff

gcc/

* config/riscv/riscv.cc (riscv_build_integer_1): Recognize cases where
we can use shNadd to improve constant synthesis.
(riscv_move_integer): Handle code generation for shNadd.

gcc/testsuite
* gcc.target/riscv/synthesis-1.c: Also count shNadd instructions.
* gcc.target/riscv/synthesis-3.c: New test.


diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 2eac67b0ce0..75e828c81a7 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -880,6 +880,37 @@ riscv_build_integer_1 (struct riscv_integer_op 
codes[RISCV_MAX_INTEGER_OPS],
}
 }
 
+  if (cost > 2 && TARGET_64BIT && TARGET_ZBA)
+{
+  if ((value % 9) == 0
+ && (alt_cost = riscv_build_integer_1 (alt_codes, value / 9, mode) + 
1) < cost)
+   {
+  alt_codes[alt_cost - 1].code = FMA;
+  alt_codes[alt_cost - 1].value = 9;
+  alt_codes[alt_cost - 1].use_uw = false;
+  memcpy (codes, alt_codes, sizeof (alt_codes));
+  cost = alt_cost;
+   }
+  if ((value % 5) == 0
+ && (alt_cost = riscv_build_integer_1 (alt_codes, value / 5, mode) + 
1) < cost)
+   {
+  alt_codes[alt_cost - 1].code = FMA;
+  alt_codes[alt_cost - 1].value = 5;
+  alt_codes[alt_cost - 1].use_uw = false;
+  memcpy (codes, alt_codes, sizeof (alt_codes));
+  cost = alt_cost;
+   }
+  if ((value % 3) == 0
+ && (alt_cost = riscv_build_integer_1 (alt_codes, value / 3, mode) + 
1) < cost)
+   {
+  alt_codes[alt_cost - 1].code = FMA;
+  alt_codes[alt_cost - 1].value = 3;
+  alt_codes[alt_cost - 1].use_uw = false;
+  memcpy (codes, alt_codes, sizeof (alt_codes));
+  cost = alt_cost;
+   }
+}
+
   /* Final cases, particularly focused on bseti.  */
   if (cost > 2 && TARGET_ZBS)
 {
@@ -2542,6 +2573,14 @@ riscv_move_integer (rtx temp, rtx dest, HOST_WIDE_INT 
value,
  x = gen_rtx_fmt_ee (AND, mode, x, GEN_INT (value));
  x = riscv_emit_set (t, x);
}
+ else if (codes[i].code == FMA)
+   {
+ HOST_WIDE_INT value = exact_log2 (codes[i].value - 1);
+ rtx ashift = gen_rtx_fmt_ee (ASHIFT, mode, x, GEN_INT (value));
+ x = gen_rtx_fmt_ee (PLUS, mode, ashift, x);
+ rtx t = can_create_pseudo_p () ? gen_reg_rtx (mode) : temp;
+ x = riscv_emit_set (t, x);
+   }
  else
x = gen_rtx_fmt_ee (codes[i].code, mode,
x, GEN_INT (codes[i].value));
diff --git a/gcc/testsuite/gcc.target/riscv/synthesis-1.c 
b/gcc/testsuite/gcc.target/riscv/synthesis-1.c
index 3384e488ade..9176d5f4989 100644
--- a/gcc/testsuite/gcc.target/riscv/synthesis-1.c
+++ b/gcc/testsuite/gcc.target/riscv/synthesis-1.c
@@ -12,7 +12,7 @@
total number of instructions. 
 
This isn't expected to change much and any change is worthy of a look.  */
-/* { dg-final { scan-assembler-times "\\t(add|addi|bseti|li|ret|slli)" 5822 } 
} */
+/* { dg-final { scan-assembler-times 
"\\t(add|addi|bseti|li|ret|sh1add|sh2add|sh3add|slli)" 5822 } } */
 
  unsigned long foo_0x3(void) { return 0x3UL; }
  unsigned long foo_0x5(void) { return 0x5UL; }
diff --git a/gcc/testsuite/gcc.target/riscv/synthesis-3.c 
b/gcc/testsuite/gcc.target/riscv/synthesis-3.c
new file mode 100644
index 000..5d92ac8e309
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/synthesis-3.c
@@ -0,0 +1,81 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* We aggressively skip as we really just need to test the basic synthesis
+   which shouldn't vary based on the optimization level.  -O1 seems to work
+   and eliminates the usual sources of extraneous dead code that would throw
+   off the counts.  */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O2" "-O3" "-Os" "-Oz" "-flto" } } 
*/
+/* { dg-options "-march=rv64gc_zba_zbb_zbs" } */
+
+/* Rather than test for a specific synthesis of all these constants or
+   having thousands of tests each testing one variant, we just test the
+   total number of instructions. 
+
+   This isn't expected to change much and any change is worthy of a 

[RISC-V] Use shNadd for constant synthesis

2024-05-09 Thread Jeff Law

So here's the next idiom to improve constant synthesis.

The basic idea here is to try and use shNadd to generate the constant 
when profitable.


Let's take 0x300000801.  Right now that generates:

li	a0,3145728
addi	a0,a0,1
slli	a0,a0,12
addi	a0,a0,-2047


But we can do better.  The constant is evenly divisible by 9 resulting
in 0x55555639 which doesn't look terribly interesting.  But that
constant can be generated with two instructions, then we can use a 
sh3add to multiply it by 9.  So the updated sequence looks like:


li	a0,1431654400
addi	a0,a0,1593
sh3add	a0,a0,a0
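
As a sketch of why 9 (and 3 and 5) are the magic divisors: shNadd
computes (rs1 << N) + rs2, so with both operands equal it multiplies
by 2^N + 1.  For example (assumed codegen):

unsigned long times9 (unsigned long x)
{
  return (x << 3) + x;	/* expected to emit: sh3add a0,a0,a0 */
}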


This doesn't trigger a whole lot, but I haven't really set up a test to 
explore the most likely space where this might be useful.  The tests 
were found exploring a different class of constant synthesis problems.


If you were to dive into the before/after you'd see that the shNadd 
interacts quite nicely with the recent bseti work.   The joys of recursion.


Probably the most controversial thing in here is using the "FMA" opcode 
to stand in for when we want to use shNadd.  Essentially when we 
synthesize a constant we generate a series of RTL opcodes and constants 
for emission by another routine.   We don't really have a way to say we 
want a shift-add.  But you can think of shift-add as a limited form of 
multiply-accumulate.  It's a bit of a stretch, but not crazy bad IMHO.


Other approaches would be to store our own enum rather than an RTL 
opcode.  Or store an actual generator function rather than any kind of 
opcode.


It wouldn't take much pushback over (ab)using FMA in this manner to get 
me to use our own enums rather than RTL opcodes for this stuff.


Tested on rv64gc and rv32gcv.  Waiting on wider CI run before committing.

Jeff


gcc/

* config/riscv/riscv.cc (riscv_build_integer_1): Recognize cases where
we can use shNadd to improve constant synthesis.
(riscv_move_integer): Handle code generation for shNadd.

gcc/testsuite
* gcc.target/riscv/synthesis-1.c: Also count shNadd instructions.
* gcc.target/riscv/synthesis-3.c: New test.


diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 2eac67b0ce0..75e828c81a7 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -880,6 +880,37 @@ riscv_build_integer_1 (struct riscv_integer_op 
codes[RISCV_MAX_INTEGER_OPS],
}
 }
 
+  if (cost > 2 && TARGET_64BIT && TARGET_ZBA)
+{
+  if ((value % 9) == 0
+ && (alt_cost = riscv_build_integer_1 (alt_codes, value / 9, mode) + 
1) < cost)
+   {
+  alt_codes[alt_cost - 1].code = FMA;
+  alt_codes[alt_cost - 1].value = 9;
+  alt_codes[alt_cost - 1].use_uw = false;
+  memcpy (codes, alt_codes, sizeof (alt_codes));
+  cost = alt_cost;
+   }
+  if ((value % 5) == 0
+ && (alt_cost = riscv_build_integer_1 (alt_codes, value / 5, mode) + 
1) < cost)
+   {
+  alt_codes[alt_cost - 1].code = FMA;
+  alt_codes[alt_cost - 1].value = 5;
+  alt_codes[alt_cost - 1].use_uw = false;
+  memcpy (codes, alt_codes, sizeof (alt_codes));
+  cost = alt_cost;
+   }
+  if ((value % 3) == 0
+ && (alt_cost = riscv_build_integer_1 (alt_codes, value / 3, mode) + 
1) < cost)
+   {
+  alt_codes[alt_cost - 1].code = FMA;
+  alt_codes[alt_cost - 1].value = 3;
+  alt_codes[alt_cost - 1].use_uw = false;
+  memcpy (codes, alt_codes, sizeof (alt_codes));
+  cost = alt_cost;
+   }
+}
+
   /* Final cases, particularly focused on bseti.  */
   if (cost > 2 && TARGET_ZBS)
 {
@@ -2542,6 +2573,14 @@ riscv_move_integer (rtx temp, rtx dest, HOST_WIDE_INT 
value,
  x = gen_rtx_fmt_ee (AND, mode, x, GEN_INT (value));
  x = riscv_emit_set (t, x);
}
+ else if (codes[i].code == FMA)
+   {
+ HOST_WIDE_INT value = exact_log2 (codes[i].value - 1);
+ rtx ashift = gen_rtx_fmt_ee (ASHIFT, mode, x, GEN_INT (value));
+ x = gen_rtx_fmt_ee (PLUS, mode, ashift, x);
+ rtx t = can_create_pseudo_p () ? gen_reg_rtx (mode) : temp;
+ x = riscv_emit_set (t, x);
+   }
  else
x = gen_rtx_fmt_ee (codes[i].code, mode,
x, GEN_INT (codes[i].value));
diff --git a/gcc/testsuite/gcc.target/riscv/synthesis-1.c 
b/gcc/testsuite/gcc.target/riscv/synthesis-1.c
index 3384e488ade..9176d5f4989 100644
--- a/gcc/testsuite/gcc.target/riscv/synthesis-1.c
+++ b/gcc/testsuite/gcc.target/riscv/synthesis-1.c
@@ -12,7 +12,7 @@
total number of instructions. 
 
This isn't expected to change much and any change is worthy of a look.  */
-/* { dg-final { scan-assembler-times "\\t(add|addi|bseti|li|ret|slli)" 5822 } 
} */
+/* { dg-final { scan-assembler-times 

Re: [PATCH 1/4] RISC-V: Add test cases for cpymem expansion

2024-05-09 Thread Jeff Law




On 5/7/24 11:17 PM, Christoph Müllner wrote:

We have two mechanisms in the RISC-V backend that expand
cpymem pattern: a) by-pieces, b) riscv_expand_block_move()
in riscv-string.cc. The by-pieces framework has higher priority
and emits a sequence of up to 15 instructions
(see use_by_pieces_infrastructure_p() for more details).

As a rule-of-thumb, by-pieces emits alternating load/store sequences
and the setmem expansion in the backend emits a sequence of loads
followed by a sequence of stores.

Let's add some test cases to document the current behaviour
and to have tests to identify regressions.

Signed-off-by: Christoph Müllner 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cpymem-32-ooo.c: New test.
* gcc.target/riscv/cpymem-32.c: New test.
* gcc.target/riscv/cpymem-64-ooo.c: New test.
* gcc.target/riscv/cpymem-64.c: New test.

It looks like those function body tests are fairly generic.  So OK.

Jeff



Re: [patch,avr] PR114981: Implement __builtin_powif in assembly

2024-05-09 Thread Jeff Law




On 5/8/24 4:10 AM, Georg-Johann Lay wrote:

__builtin_powif is currently implemented in C,
and this patch implements it (__powisf2) in assembly.

Ok for master?

Johann

--

AVR: target/114981 - Tweak __powisf2

Implement __powisf2 in assembly.

 PR target/114981
libgcc/
 * config/avr/t-avr (LIB2FUNCS_EXCLUDE): Add _powisf2.
 (LIB1ASMFUNCS) [!avrtiny]: Add _powif.
 * config/avr/lib1funcs.S (mov4): New .macro.
 (L_powif, __powisf2) [!avrtiny]: New module and function.

testsuite/
 * gcc.target/avr/pr114981-powif.c: New test.
Trusting you on the implementation, I don't know this anywhere near well 
enough to review it.


OK
Jeff



Re: [PATCH 3/3] RISC-V: Add memset-zero expansion to cbo.zero

2024-05-09 Thread Jeff Law




On 5/7/24 11:38 PM, Christoph Müllner wrote:

The Zicboz extension offers the cbo.zero instruction, which can be used
to clean a memory region corresponding to a cache block.
The Zic64b extension defines the cache block size to 64 byte.
If both extensions are available, it is possible to use cbo.zero
to clear memory, if the alignment and size constraints are met.
This patch implements this.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_expand_block_clear): New prototype.
* config/riscv/riscv-string.cc (riscv_expand_block_clear_zicboz_zic64b):
New function to expand a block-clear with cbo.zero.
(riscv_expand_block_clear): New RISC-V block-clear expansion function.
* config/riscv/riscv.md (setmem): New setmem expansion.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmo-zicboz-zic64-1.c: New test.
Depending on the underlying uarch details cbo.zero may not be nearly as 
useful as it might first appear.  There can be multiple uarch details 
that come into play.  We've done a fair amount of measurement internally 
in this space and while cbo.zero is a win, it's not a huge win.  Point 
being we may need to come back and make this part of the tuning structure
so uarchs can adjust.


--


I know in the cbo memset implementation VRULL provided to Ventana you 
used the trick of allowing overlapping stores to avoid the alignment 
requirements.  I.e. we issue a series of "sd" instructions to ensure we
cross the alignment barrier, then a series of cbo.zero instructions for
the cache lines (possibly overlapping the locations stored by those "sd"
instructions), then handle residuals which may overlap the last cbo.zero
instruction.


I don't think you necessarily have to do that for this patch, but I 
suspect that a similar approach would make this apply much more often in 
practice.
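
A rough sketch of that trick, purely illustrative: it assumes a 64-byte
cache block, n >= 128, binutils syntax for cbo.zero, fast unaligned
stores, and it ignores strict-aliasing niceties.

void clear_with_cbo (char *p, unsigned long n)
{
  char *end = p + n;
  char *blk = (char *) (((unsigned long) p + 63) & -64UL);
  char *last = (char *) ((unsigned long) end & -64UL);
  /* Head: plain zero stores, may overlap into the first cache block.  */
  for (char *q = p; q < blk; q += 8)
    *(unsigned long *) q = 0;
  /* Body: one cbo.zero per whole cache block.  */
  for (; blk < last; blk += 64)
    __asm__ volatile ("cbo.zero (%0)" :: "r" (blk) : "memory");
  /* Tail: plain zero stores, may overlap the last cbo.zero.  */
  for (char *q = end - 8; q + 8 > last && q >= p; q -= 8)
    *(unsigned long *) q = 0;
}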


So, OK for the trunk and consider the unaligned cases as potential 
follow-up enhancements.


Thanks
Jeff


Re: [PATCH 2/3] RISC-V: testsuite: Make cmo tests LTO safe

2024-05-09 Thread Jeff Law




On 5/7/24 11:38 PM, Christoph Müllner wrote:

Let's add '\t' to the instruction match pattern to avoid false positive
matches when compiling with -flto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmo-zicbom-1.c: Add \t to test pattern.
* gcc.target/riscv/cmo-zicbom-2.c: Likewise.
* gcc.target/riscv/cmo-zicbop-1.c: Likewise.
* gcc.target/riscv/cmo-zicbop-2.c: Likewise.
* gcc.target/riscv/cmo-zicboz-1.c: Likewise.
* gcc.target/riscv/cmo-zicboz-2.c: Likewise.

OK
jeff



Re: [PATCH 1/3] expr: Export clear_by_pieces()

2024-05-09 Thread Jeff Law




On 5/7/24 11:38 PM, Christoph Müllner wrote:

Make clear_by_pieces() available to other parts of the compiler,
similar to store_by_pieces().

gcc/ChangeLog:

* expr.cc (clear_by_pieces): Remove static from clear_by_pieces.
* expr.h (clear_by_pieces): Add prototype for clear_by_pieces.

OK
jeff



Re: [PATCH 2/2] RISC-V: Add cmpmemsi expansion

2024-05-09 Thread Jeff Law




On 5/7/24 11:52 PM, Christoph Müllner wrote:

GCC has a generic cmpmemsi expansion via the by-pieces framework,
which shows some room for target-specific optimizations.
E.g. for comparing two aligned memory blocks of 15 bytes
we get the following sequence:

my_mem_cmp_aligned_15:
 li  a4,0
 j   .L2
.L8:
 bgeu a4,a7,.L7
.L2:
 add a2,a0,a4
 add a3,a1,a4
 lbu a5,0(a2)
 lbu a6,0(a3)
 addi a4,a4,1
 li  a7,15 // missed hoisting
 subw a5,a5,a6
 andi a5,a5,0xff // useless
 beq a5,zero,.L8
 lbu a0,0(a2) // loading again!
 lbu a5,0(a3) // loading again!
 subw a0,a0,a5
 ret
.L7:
 li  a0,0
 ret

Diff first byte: 15 insns
Diff second byte: 25 insns
No diff: 25 insns

Possible improvements:
* unroll the loop and use load-with-displacement to avoid offset increments
* load and compare multiple (aligned) bytes at once
* Use the bitmanip/strcmp result calculation (reverse words and
   synthesize (a2 >= a3) ? 1 : -1 in a branchless sequence)

When applying these improvements we get the following sequence:

my_mem_cmp_aligned_15:
 ld  a5,0(a0)
 ld  a4,0(a1)
 bne a5,a4,.L2
 ld  a5,8(a0)
 ld  a4,8(a1)
 slli a5,a5,8
 slli a4,a4,8
 bne a5,a4,.L2
 li  a0,0
.L3:
 sext.w  a0,a0
 ret
.L2:
 rev8 a5,a5
 rev8 a4,a4
 sltu a5,a5,a4
 neg a5,a5
 ori a0,a5,1
 j   .L3

Diff first byte: 11 insns
Diff second byte: 16 insns
No diff: 11 insns

This patch implements this improvements.

The tests consist of a execution test (similar to
gcc/testsuite/gcc.dg/torture/inline-mem-cmp-1.c) and a few tests
that test the expansion conditions (known length and alignment).

Similar to the cpymemsi expansion this patch does not introduce any
gating for the cmpmemsi expansion (on top of requiring the known length,
alignment and Zbb).

Bootstrapped and SPEC CPU 2017 tested.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_expand_block_compare): New
prototype.
* config/riscv/riscv-string.cc (GEN_EMIT_HELPER2): New helper.
(do_load_from_addr): Add support for HI and SI/64 modes.
(emit_memcmp_scalar_load_and_compare): New helper to emit memcmp.
(emit_memcmp_scalar_result_calculation): Likewise.
(riscv_expand_block_compare_scalar): Likewise.
(riscv_expand_block_compare): New RISC-V expander for memory compare.
* config/riscv/riscv.md (cmpmemsi): New cmpmem expansion.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmpmemsi-1.c: New test.
* gcc.target/riscv/cmpmemsi-2.c: New test.
* gcc.target/riscv/cmpmemsi-3.c: New test.
* gcc.target/riscv/cmpmemsi.c: New test.

Signed-off-by: Christoph Müllner 
---
  gcc/config/riscv/riscv-protos.h |   1 +
  gcc/config/riscv/riscv-string.cc| 161 
  gcc/config/riscv/riscv.md   |  15 ++
  gcc/testsuite/gcc.target/riscv/cmpmemsi-1.c |   6 +
  gcc/testsuite/gcc.target/riscv/cmpmemsi-2.c |  42 +
  gcc/testsuite/gcc.target/riscv/cmpmemsi-3.c |  43 ++
  gcc/testsuite/gcc.target/riscv/cmpmemsi.c   |  22 +++
  7 files changed, 290 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/riscv/cmpmemsi-1.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/cmpmemsi-2.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/cmpmemsi-3.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/cmpmemsi.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index e5aebf3fc3d..30ffe30be1d 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -188,6 +188,7 @@ rtl_opt_pass * make_pass_avlprop (gcc::context *ctxt);
  rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
  
  /* Routines implemented in riscv-string.c.  */

+extern bool riscv_expand_block_compare (rtx, rtx, rtx, rtx);
  extern bool riscv_expand_block_move (rtx, rtx, rtx);
  
  /* Information about one CPU we know about.  */

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index b09b51d7526..9d4dc0cb827 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -86,6 +86,7 @@ GEN_EMIT_HELPER2(th_rev) /* do_th_rev2  */
  GEN_EMIT_HELPER2(th_tstnbz) /* do_th_tstnbz2  */
  GEN_EMIT_HELPER3(xor) /* do_xor3  */
  GEN_EMIT_HELPER2(zero_extendqi) /* do_zero_extendqi2  */
+GEN_EMIT_HELPER2(zero_extendhi) /* do_zero_extendhi2  */
  
  #undef GEN_EMIT_HELPER2

  #undef GEN_EMIT_HELPER3
@@ -109,6 +110,10 @@ do_load_from_addr (machine_mode mode, rtx dest, rtx 
addr_reg, rtx addr)
  
if (mode == QImode)

  do_zero_extendqi2 (dest, mem);
+  else if (mode == HImode)
+do_zero_extendhi2 (dest, mem);
+  

Re: [PATCH 1/2] RISC-V: Add tests for cpymemsi expansion

2024-05-08 Thread Jeff Law




On 5/7/24 11:52 PM, Christoph Müllner wrote:

cpymemsi expansion was available for RISC-V since the initial port.
However, there are no tests to detect regressions.
This patch adds such tests.

Three of the tests target the expansion requirements (known length and
alignment). One test reuses an existing memcpy test from the by-pieces
framework (gcc/testsuite/gcc.dg/torture/inline-mem-cpy-1.c).

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cpymemsi-1.c: New test.
* gcc.target/riscv/cpymemsi-2.c: New test.
* gcc.target/riscv/cpymemsi-3.c: New test.
* gcc.target/riscv/cpymemsi.c: New test.

OK
jeff



Re: [PATCH gcc-13-backport] RISCV: Add -m(no)-omit-leaf-frame-pointer support.

2024-05-08 Thread Jeff Law




On 5/8/24 11:32 AM, Palmer Dabbelt wrote:

From: Yanzhang Wang 

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_save_reg_p): Save ra for leaf
when enabling -mno-omit-leaf-frame-pointer
(riscv_option_override): Override omit-frame-pointer.
(riscv_frame_pointer_required): Save s0 for non-leaf function
(TARGET_FRAME_POINTER_REQUIRED): Override defination
* config/riscv/riscv.opt: Add option support.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/omit-frame-pointer-1.c: New test.
* gcc.target/riscv/omit-frame-pointer-2.c: New test.
* gcc.target/riscv/omit-frame-pointer-3.c: New test.
* gcc.target/riscv/omit-frame-pointer-4.c: New test.
* gcc.target/riscv/omit-frame-pointer-test.c: New test.

Signed-off-by: Yanzhang Wang 
(cherry picked from commit 39663298b5934831a0125e12f113ebd83248c3be)
---
I haven't tested this (just an all-gcc build), but I figured I'd just
send it now as it's kind of a grey area for backports: the flag itself
is a new feature, but it also fixes a compatibility issue with the psABI
-- which itself is a grey area, as the psABI change was a retrofit and is
marked as optional.  I'd test it before pushing it, but this is one of
those things where I'm not really sure what the backporting rules
indicate we should do.

There's more discussion on this LKML thread:
https://lore.kernel.org/linux-riscv/527dd4d8-f1e5-4581-b1e3-aa315fea8...@sifive.com/T/#mf15ccc659b7b8b838b88959fbea460210875eb9c

That also has a much smaller fix, but having the whole argument seems
like a nicer user interface to me -- then users who really want
compatibility with the psABI's section on frame records can just ask for
it directly (via the odd spelling `-fno-omit-frame-pointer
-mno-omit-leaf-frame-pointer`, but too late to change that).

Thoughts on this for 13?

Given its target specific, I think we have a lot more leeway here.

I think there was a followup in the space.  defa8681d951




We'd probably also want it all the way back to 11, but I assume that's
going to be the same discussion.

Yea.

You might explicitly run it by Jakub.  But I'm certainly OK with this 
being backported.


jeff


[committed] [RISC-V] Provide splitting guidance to combine to facilitate shNadd.uw generation

2024-05-08 Thread Jeff Law
This fixes a minor code quality issue I found while comparing GCC and 
LLVM.  Essentially we want to do a bit of re-association to generate 
shNadd.uw instructions.


Combine does the right thing and finds all the necessary instructions, 
reassociates the operands, combines constants, etc.  Where it fails is
finding a good split point.  The backend can trivially provide guidance 
on how to split via a define_split pattern.


This has survived both Ventana's internal CI system (rv64gcb) as well as 
my own (rv64gc, rv32gcv).


I'll wait for the external CI system to give the all-clear before pushing.



jeff

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index ad3ad758959..d76a72d30e0 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -184,6 +184,23 @@ (define_insn "*slliuw"
   [(set_attr "type" "bitmanip")
(set_attr "mode" "DI")])
 
+;; Combine will reassociate the operands in the most useful way here.  We
+;; just have to give it guidance on where to split the result to facilitate
+;; shNadd.uw generation.
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+   (plus:DI (plus:DI (and:DI (ashift:DI (match_operand:DI 1 
"register_operand")
+(match_operand:QI 2 
"imm123_operand"))
+ (match_operand 3 
"consecutive_bits32_operand"))
+ (match_operand:DI 4 "register_operand"))
+(match_operand 5 "immediate_operand")))]
+  "TARGET_64BIT && TARGET_ZBA"
+  [(set (match_dup 0)
+   (plus:DI (and:DI (ashift:DI (match_dup 1) (match_dup 2))
+(match_dup 3))
+(match_dup 4)))
+   (set (match_dup 0) (plus:DI (match_dup 0) (match_dup 5)))])
+
 ;; ZBB extension.
 
 (define_expand "clzdi2"
diff --git a/gcc/testsuite/gcc.target/riscv/zba-shadduw.c 
b/gcc/testsuite/gcc.target/riscv/zba-shadduw.c
new file mode 100644
index 000..5b77447e681
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zba-shadduw.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv64gc_zba -mabi=lp64" } */
+
+typedef struct simple_bitmap_def
+{
+  unsigned char *popcount;
+  unsigned int n_bits;
+  unsigned int size;
+  unsigned long elms[1];
+} *sbitmap;
+typedef const struct simple_bitmap_def *const_sbitmap;
+
+typedef unsigned long *sbitmap_ptr;
+typedef const unsigned long *const_sbitmap_ptr;
+static unsigned long sbitmap_elt_popcount (unsigned long);
+
+void
+sbitmap_a_or_b (sbitmap dst, const_sbitmap a, const_sbitmap b)
+{
+  unsigned int i, n = dst->size;
+  sbitmap_ptr dstp = dst->elms;
+  const_sbitmap_ptr ap = a->elms;
+  const_sbitmap_ptr bp = b->elms;
+  unsigned char has_popcount = dst->popcount != ((void *) 0);
+
+  for (i = 0; i < n; i++)
+{
+  const unsigned long tmp = *ap++ | *bp++;
+  *dstp++ = tmp;
+}
+}
+
+
+/* { dg-final { scan-assembler "sh3add.uw" } } */
+/* { dg-final { scan-assembler-not {\mslli.uw} } } */


Re: [PATCH v1 1/1] RISC-V: Nan-box the result of movbf on soft-bf16

2024-05-08 Thread Jeff Law




On 5/7/24 6:38 PM, Xiao Zeng wrote:

1 This patch implements the Nan-box of bf16.

2 Please refer to the Nan-box implementation of hf16 in:


3 The discussion about Nan-box can be found on the website:


4 Below tests passed for this patch:
 * The riscv full regression test.
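
(For context: NaN-boxing means a narrow FP value held in a wider FP
register must have all upper bits set, so e.g. bf16 1.0 (0x3f80) in a
64-bit FP register reads back as the illustrative constant below.)

unsigned long nanboxed_bf16_one = 0xffffffffffff3f80UL;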

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_move): Expand movbf
with Nan-boxing value.
* config/riscv/riscv.md (*movbf_softfloat_boxing): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/_Bfloat16-nanboxing.c: New test.
---
  gcc/config/riscv/riscv.cc | 51 ++-
  gcc/config/riscv/riscv.md | 12 -
  .../gcc.target/riscv/_Bfloat16-nanboxing.c| 38 ++
  3 files changed, 76 insertions(+), 25 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/_Bfloat16-nanboxing.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 545e68566dc..be2cb245733 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3120,35 +3120,38 @@ riscv_legitimize_move (machine_mode mode, rtx dest, rtx 
src)


  
- if (TARGET_HARD_FLOAT

- && !TARGET_ZFHMIN && mode == HFmode
- && REG_P (dest) && FP_REG_P (REGNO (dest))
- && REG_P (src) && !FP_REG_P (REGNO (src))
- && can_create_pseudo_p ())

[ ... ]


+  if (TARGET_HARD_FLOAT
+  && ((!TARGET_ZFHMIN && mode == HFmode)
+ || (!TARGET_ZFBFMIN && mode == BFmode))
+  && REG_P (dest) && FP_REG_P (REGNO (dest)) && REG_P (src)
+  && !FP_REG_P (REGNO (src)) && can_create_pseudo_p ())


So there's a bit of gratuitous rewriting going on here.  I realize you
were fixing formatting problems (thanks!), but I don't see a need to
rewrite the tests starting with REG_P.  I put those back in their
original form with the whitespace fixes.


I'll push the fixed version momentarily.

Thanks again!

jeff




Re: [PATCH v2 4/4] RISC-V: Cover sign-extensions in lshr3_zero_extend_4

2024-05-08 Thread Jeff Law




On 5/8/24 1:36 AM, Christoph Müllner wrote:

The lshr3_zero_extend_4 pattern targets bit extraction
with zero-extension. This pattern represents the canonical form
of zero-extensions of a logical right shift.

The same optimization can be applied to sign-extensions.
Given the two optimizations are so similar, this patch converts
the existing one to also cover the sign-extension case as well.

gcc/ChangeLog:

* config/riscv/iterators.md (ashiftrt): New code attribute
'extract_shift' and adding extractions to optab.
* config/riscv/riscv.md (*lshr3_zero_extend_4): Rename to...
(*3):...this and add support for
sign-extensions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/extend-shift-helpers.h: Add helpers for
sign-extension.
* gcc.target/riscv/sign-extend-rshift-32.c: New test.
* gcc.target/riscv/sign-extend-rshift-64.c: New test.
* gcc.target/riscv/sign-extend-rshift.c: New test.
Oh, I see, you handled the special case with this patch.  Ignore my 
comment on 3/4.  3/4 is fine, as is this patch.


Thanks!

jeff


Re: [PATCH v2 3/4] RISC-V: Add zero_extract support for rv64gc

2024-05-08 Thread Jeff Law




On 5/8/24 1:36 AM, Christoph Müllner wrote:

The combiner attempts to optimize a zero-extension of a logical right shift
using zero_extract. We already utilize this optimization for those cases
that result in a single instruction.  Let's add an insn_and_split
pattern that also matches the generic case, where we can emit an
optimized sequence of a slli/srli.
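
For instance (assumed codegen once the new pattern applies):

unsigned long extract_22_at_5 (unsigned long x)
{
  /* bits [5, 27) of x, zero-extended:
     expect slli a0,a0,37 ; srli a0,a0,42.  */
  return (x >> 5) & 0x3fffff;
}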

Tested with SPEC CPU 2017 (rv64gc).

PR 111501

gcc/ChangeLog:

* config/riscv/riscv.md (*lshr3_zero_extend_4): New
pattern for zero-extraction.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/extend-shift-helpers.h: New test.
* gcc.target/riscv/pr111501.c: New test.
* gcc.target/riscv/zero-extend-rshift-32.c: New test.
* gcc.target/riscv/zero-extend-rshift-64.c: New test.
* gcc.target/riscv/zero-extend-rshift.c: New test.

Doesn't your new pattern still match this one:


;; Canonical form for a zero-extend of a logical right shift.
(define_insn "*lshrsi3_zero_extend_2"   [(set (match_operand:DI   0 
"register_operand" "=r")
(zero_extract:DI (match_operand:DI  1 "register_operand" " r")
 (match_operand 2 "const_int_operand")
 (match_operand 3 "const_int_operand")))]
  "(TARGET_64BIT && (INTVAL (operands[3]) > 0)
&& (INTVAL (operands[2]) + INTVAL (operands[3]) == 32))"
{
  return "srliw\t%0,%1,%3";
}
  [(set_attr "type" "shift")
   (set_attr "mode" "SI")]) 


Meaning that we'll start generating shift-pairs for this special case 
rather than using srliw directly.  I'm pretty sure Lyut and I stumbled 
over this exact problem when evaluating his effort in this space.


?

Jeff


Re: [PATCH v2 2/4] RISC-V: Cover sign-extensions in lshrsi3_zero_extend_2

2024-05-08 Thread Jeff Law




On 5/8/24 1:36 AM, Christoph Müllner wrote:

The pattern lshrsi3_zero_extend_2 extracts the MSB bits of the lower
32-bit word and zero-extends it back to DImode.
This is realized using srliw, which operates on 32-bit registers.

The same optimization can be applied to sign-extensions when emitting
a sraiw instead of the srliw.

Given these two optimizations are so similar, this patch simply
converts the existing one to also cover the sign-extension case as well.
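
A quick illustration of the pair (assumed codegen):

unsigned long zext_top16 (unsigned long x)
{
  return (unsigned int) x >> 16;	/* srliw a0,a0,16 */
}

long sext_top16 (unsigned long x)
{
  return (int) x >> 16;			/* sraiw a0,a0,16 */
}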

gcc/ChangeLog:

* config/riscv/iterators.md (sraiw): New code iterator 'any_extract'.
New code attribute 'extract_sidi_shift'.
* config/riscv/riscv.md (*lshrsi3_zero_extend_2): Rename to...
(*lshrsi3_extend_2):...this and add support for sign-extensions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sign-extend-1.c: Test sraiw 24 and sraiw 16.

OK
jeff



Re: [PATCH v2 1/4] RISC-V: Add test for sraiw-31 special case

2024-05-08 Thread Jeff Law




On 5/8/24 1:36 AM, Christoph Müllner wrote:

We already optimize a sign-extension of a right-shift by 31 in
si3_extend.  Let's add a test for that (similar to
zero-extend-1.c).

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sign-extend-1.c: New test.

OK
jeff



[committed][RISC-V] Turn on overlap_op_by_pieces for generic-ooo tuning

2024-05-07 Thread Jeff Law
Per quick email exchange with Palmer.  Given the triviality, I'm just 
pushing it.


jeff

commit 9f14f1978260148d4d6208dfd73df1858e623758
Author: Jeff Law 
Date:   Tue May 7 15:34:16 2024 -0600

[committed][RISC-V] Turn on overlap_op_by_pieces for generic-ooo tuning

Per quick email exchange with Palmer.  Given the triviality, I'm just 
pushing
it.

gcc/
* config/riscv/riscv.cc (generic_ooo_tune_info): Turn on
overlap_op_by_pieces.

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index a9b57d41184..62207b6b227 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -536,7 +536,7 @@ static const struct riscv_tune_param generic_ooo_tune_info 
= {
   4,   /* fmv_cost */
   false,   /* slow_unaligned_access */
   false,   /* use_divmod_expansion */
-  false,   /* overlap_op_by_pieces */
+  true,/* overlap_op_by_pieces 
*/
   RISCV_FUSE_NOTHING,   /* fusible_ops */
   &generic_vector_cost,		/* vector cost */
 };


Re: [committed] [RISC-V] Allow uarchs to set TARGET_OVERLAP_OP_BY_PIECES_P

2024-05-07 Thread Jeff Law




On 5/7/24 3:24 PM, Palmer Dabbelt wrote:


@@ -529,6 +536,7 @@ static const struct riscv_tune_param generic_ooo_tune_info 
= {
4,  /* fmv_cost */
false,  /* slow_unaligned_access */
false,  /* use_divmod_expansion */
+  false,   /* overlap_op_by_pieces */


IMO we should turn this on for the generic OOO tuning -- the benchmarks
say it's not faster for the T-Head OOO cores, but we were all so
surprised to find that I don't think we even fully trust the benchmarks.
I'd assume OOO cores are faster with the overlapping stores, so we
should just lean into it and let vendors say something if that's the
wrong assumption.
Several factors likely come into play (branch prediction, OOO 
properties, write combining, etc etc).


But sure, I don't think that'd be terribly controversial.  I can go 
ahead and make that change now given its triviality.


Jeff





[committed] [RISC-V] Allow uarchs to set TARGET_OVERLAP_OP_BY_PIECES_P

2024-05-07 Thread Jeff Law

This is almost exclusively work from the VRULL team.

As we've discussed in the Tuesday meeting in the past, we'd like to have 
a knob in the tuning structure to indicate that overlapped stores during 
move_by_pieces expansion of memcpy & friends are acceptable.


This patch adds that capability in our tuning structure.  It's off
for all the uarchs upstream, but we have been using it inside Ventana 
for our uarch with success.  So technically it's NFC upstream, but puts 
in the infrastructure multiple organizations likely need.



Built and tested rv64gc.  Pushing to the trunk shortly.
jeff

commit 300393484dbfa9fd3891174ea47aa3fb41915abc
Author: Christoph Müllner 
Date:   Tue May 7 15:16:21 2024 -0600

[committed] [RISC-V] Allow uarchs to set TARGET_OVERLAP_OP_BY_PIECES_P

This is almost exclusively work from the VRULL team.

As we've discussed in the Tuesday meeting in the past, we'd like to have a 
knob
in the tuning structure to indicate that overlapped stores during
move_by_pieces expansion of memcpy & friends are acceptable.

This patch adds that capability in our tuning structure.  It's off for
all
the uarchs upstream, but we have been using it inside Ventana for our uarch
with success.  So technically it's NFC upstream, but puts in the 
infrastructure
multiple organizations likely need.

gcc/

* config/riscv/riscv.cc (struct riscv_tune_param): Add new
"overlap_op_by_pieces" field.
(rocket_tune_info, sifive_7_tune_info): Set it.
(sifive_p400_tune_info, sifive_p600_tune_info): Likewise.
(thead_c906_tune_info, xiangshan_nanhu_tune_info): Likewise.
(generic_ooo_tune_info, optimize_size_tune_info): Likewise.
(riscv_overlap_op_by_pieces): New function.
(TARGET_OVERLAP_OP_BY_PIECES_P): define.

gcc/testsuite/

* gcc.target/riscv/memcpy-nonoverlapping.c: New test.
* gcc.target/riscv/memset-nonoverlapping.c: New test.

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 545e68566dc..a9b57d41184 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -288,6 +288,7 @@ struct riscv_tune_param
   unsigned short fmv_cost;
   bool slow_unaligned_access;
   bool use_divmod_expansion;
+  bool overlap_op_by_pieces;
   unsigned int fusible_ops;
   const struct cpu_vector_cost *vec_costs;
 };
@@ -427,6 +428,7 @@ static const struct riscv_tune_param rocket_tune_info = {
   8,   /* fmv_cost */
   true,/* 
slow_unaligned_access */
   false,   /* use_divmod_expansion */
+  false,   /* overlap_op_by_pieces */
   RISCV_FUSE_NOTHING,   /* fusible_ops */
   NULL,/* vector cost */
 };
@@ -444,6 +446,7 @@ static const struct riscv_tune_param sifive_7_tune_info = {
   8,   /* fmv_cost */
   true,/* 
slow_unaligned_access */
   false,   /* use_divmod_expansion */
+  false,   /* overlap_op_by_pieces */
   RISCV_FUSE_NOTHING,   /* fusible_ops */
   NULL,/* vector cost */
 };
@@ -461,6 +464,7 @@ static const struct riscv_tune_param sifive_p400_tune_info 
= {
   4,   /* fmv_cost */
   true,/* 
slow_unaligned_access */
   false,   /* use_divmod_expansion */
+  false,   /* overlap_op_by_pieces */
   RISCV_FUSE_LUI_ADDI | RISCV_FUSE_AUIPC_ADDI,  /* fusible_ops */
   &generic_vector_cost,		/* vector cost */
 };
@@ -478,6 +482,7 @@ static const struct riscv_tune_param sifive_p600_tune_info 
= {
   4,   /* fmv_cost */
   true,/* 
slow_unaligned_access */
   false,   /* use_divmod_expansion */
+  false,   /* overlap_op_by_pieces */
   RISCV_FUSE_LUI_ADDI | RISCV_FUSE_AUIPC_ADDI,  /* fusible_ops */
   &generic_vector_cost,		/* vector cost */
 };
@@ -495,6 +500,7 @@ static const struct riscv_tune_param thead_c906_tune_info = 
{
   8,   /* fmv_cost */
   false,/* slow_unaligned_access */
   false,   /* use_divmod_expansion */
+  false,   /* overlap_op_by_pieces */
   RISCV_FUSE_NOTHING,   /* fusible_ops */
   NULL,/* vector cost */
 };
@@ -512,6 +518,7 @@ static 

Re: [PATCH] MATCH: Add some more value_replacement simplifications (a != 0 ? expr : 0) to match

2024-05-07 Thread Jeff Law




On 4/30/24 9:21 PM, Andrew Pinski wrote:

This adds a few more of what is currently done in phiopt's value_replacement
to match. I noticed this when I was hooking up phiopt's value_replacement
code to use match and disabling the old code. But this can be done
independently from the hooking up phiopt's value_replacement as phiopt
is already hooked up for simplified versions already.

/* a != 0 ? a / b : 0  -> a / b iff b is nonzero. */
/* a != 0 ? a * b : 0 -> a * b */
/* a != 0 ? a & b : 0 -> a & b */

We prefer the `cond ? a : 0` forms to allow optimization of `a * cond` which
uses that form.
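
For instance (illustrative):

int f (int a, int b)
{
  return a != 0 ? a * b : 0;	/* folds to a * b, since 0 * b == 0 */
}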

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/114894

gcc/ChangeLog:

* match.pd (`a != 0 ? a / b : 0`): New pattern.
(`a != 0 ? a * b : 0`): New pattern.
(`a != 0 ? a & b : 0`): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-value-5.c: New test.
Is there any need to also handle the reversed conditional with the arms 
swapped?If not, this is fine as-is.  If yes, then fine with the 
obvious generalization.


jeff



Re: [PATCH v3] DCE __cxa_atexit calls where the function is pure/const [PR19661]

2024-05-07 Thread Jeff Law




On 5/4/24 5:58 PM, Andrew Pinski wrote:

In C++ you sometimes have a destructor function which is "empty", for
example with unions or with arrays.  The front-end might not know it is
empty either, so this should be done during optimization.
To implement it I added it to DCE where we mark if a statement is necessary or 
not.
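
An illustrative case (not taken from the patch):

struct S { ~S () { } };	// empty destructor
S global_s;		// the __cxa_atexit registration this forces is dead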

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Changes since v1:
   * v2: Add support for __aeabi_atexit for arm-*eabi. Add extra comments.
 Add cxa_atexit-5.C testcase for -fPIC case.
   * v3: Fix testcases for the __aeabi_atexit (forgot to do in the v2).

PR tree-optimization/19661

gcc/ChangeLog:

* tree-ssa-dce.cc (is_cxa_atexit): New function.
(is_removable_cxa_atexit_call): New function.
(mark_stmt_if_obviously_necessary): Don't mark removable
cxa_at_exit calls.
(mark_all_reaching_defs_necessary_1): Likewise.
(propagate_necessity): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/tree-ssa/cxa_atexit-1.C: New test.
* g++.dg/tree-ssa/cxa_atexit-2.C: New test.
* g++.dg/tree-ssa/cxa_atexit-3.C: New test.
* g++.dg/tree-ssa/cxa_atexit-4.C: New test.
* g++.dg/tree-ssa/cxa_atexit-5.C: New test.
* g++.dg/tree-ssa/cxa_atexit-6.C: New test.

OK
jeff



Re: [patch,avr] PR114975: Better 8-bit parity detection.

2024-05-07 Thread Jeff Law




On 5/7/24 11:23 AM, Georg-Johann Lay wrote:

Add a combine pattern for parity detection.

Ok for master?

Johann

AVR: target/114975 - Add combine-pattern for __parityqi2.

 PR target/114975
gcc/
 * config/avr/avr.md: Add combine pattern for
 8-bit parity detection.

gcc/testsuite/
 * gcc.target/avr/pr114975-parity.c: New test.

OK
jeff



Re: [patch,avr] PR114975: Better 8-bit popcount detection.

2024-05-07 Thread Jeff Law




On 5/7/24 11:25 AM, Georg-Johann Lay wrote:

Add a pattern for better popcount detection.

Ok for master?

Johann

--

AVR: target/114975 - Add combine-pattern for __popcountqi2.

 PR target/114975
gcc/
 * config/avr/avr.md: Add combine pattern for
 8-bit popcount detection.

gcc/testsuite/
 * gcc.target/avr/pr114975-popcount.c: New test.

OK
jeff



Re: [PATCH][risc-v] libstdc++: Preserve signbit of nan when converting float to double [PR113578]

2024-05-07 Thread Jeff Law




On 5/7/24 9:36 AM, Andreas Schwab wrote:

On Mai 07 2024, Jonathan Wakely wrote:


+#ifdef __riscv
+   return _M_insert(__builtin_copysign((double)__f,
+   (double)-__builtin_signbit(__f));


Should this use static_cast?

And it's missing a close paren.
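
For reference, the balanced form would presumably read:

	return _M_insert(__builtin_copysign((double)__f,
					    (double)-__builtin_signbit(__f)));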

jeff


Re: [RFA][RISC-V] [PATCH v2] Enable inlining str* by default

2024-05-07 Thread Jeff Law




On 5/4/24 8:41 AM, Jeff Law wrote:
The CI system caught a latent bug in the inline string comparison code 
that shows up with rv32+zbb.  It was hardcoding 64 when AFAICT it should 
have been using BITS_PER_WORD.


So v2 with that fixed.
So per the discussion in today's call I reviewed a couple of spaces, 
particularly -Os and interactions with vector expansion of these routines.



WRT vector expansion.  We *always* use loops for this stuff right now 
(str[n]cmp, strlen).   Vector expansion of these routines is suppressed 
with -Os enabled, which is good as it's hard to see how the vector loops 
will ever be smaller than a function call.


WRT scalar expansion.  -Os generally turns off scalar expansion as well, 
with the exception of trivial cases involving str[n]cmp with one arg 
being a constant string.


These shouldn't interact at all with Sergei's setmem, clrmem, movmem 
expanders.


If we look to improve the vector expansion case (say by handling cases 
with small counts for strncmp or when one argument to str[n]cmp is a 
constant string) in the future, we'll have to revisit.


Overall conclusion is we should go ahead with the patch.

jeff



Re: [PATCH][risc-v] libstdc++: Preserve signbit of nan when converting float to double [PR113578]

2024-05-07 Thread Jeff Law




On 5/7/24 8:06 AM, Jonathan Wakely wrote:

On Tue, 7 May 2024 at 14:57, Jeff Law wrote:




On 5/7/24 7:49 AM, Jonathan Wakely wrote:

Do we want this change for RISC-V, to fix PR113578?

I haven't tested it on RISC-V, only on x86_64-linux (where it doesn't do
anything).

-- >8 --

libstdc++-v3/ChangeLog:

   PR libstdc++/113578
   * include/std/ostream (operator<<(basic_ostream&, float)):
   Restore signbit after converting to double.

No strong opinion. One could argue that the existence of a
conditional like that inherently implies the generic code is dependent
on specific processor behavior which probably is unwise.  But again, no
strong opinion.


Yes, but I'm not aware of any other processors that lose the signbit
like this, so in practice it's always worked fine to cast the float to
double.
We kicked it around a bit in our meeting today and the thinking is that 
while RISC-V implementation is IEEE 754 compliant, it does differ from 
other implementations.


So do we want to be stuck explaining this corner of IEEE 754 compliance 
to end users?  If not, then we probably want to go with your fix.


Similarly if there's a reasonable chance a standard higher in the 
software stacks mandates the behavior that everyone else has, then we'd 
want to go with your fix as well.


So after further review, I'd lean towards fixing this in libstdc++ by 
whatever means you think is cleanest.


jeff


Re: [PATCH][risc-v] libstdc++: Preserve signbit of nan when converting float to double [PR113578]

2024-05-07 Thread Jeff Law




On 5/7/24 7:49 AM, Jonathan Wakely wrote:

Do we want this change for RISC-V, to fix PR113578?

I haven't tested it on RISC-V, only on x86_64-linux (where it doesn't do
anything).

-- >8 --

libstdc++-v3/ChangeLog:

PR libstdc++/113578
* include/std/ostream (operator<<(basic_ostream&, float)):
Restore signbit after converting to double.
No strong opinion. One could argue that the existence of a 
conditional like that inherently implies the generic code is dependent 
on specific processor behavior which probably is unwise.  But again, no 
strong opinion.


jeff


[RISC-V][V2] Fix incorrect if-then-else nesting of Zbs usage in constant synthesis

2024-05-06 Thread Jeff Law
Reposting without the patch that ignores whitespace.  The CI system 
doesn't like including both patches, that'll generate a failure to apply 
and none of the tests actually get run.


So I managed to goof the if-then-else level of the bseti bits last week. 
 They were supposed to be a last ditch effort to improve the result, 
but ended up inside a conditional where they don't really belong.  I 
almost always use Zba, Zbb and Zbs together, so it slipped by.


So it's NFC if you always test with Zbb and Zbs enabled together.  But 
if you enabled Zbs without Zbb you'd see a failure to use bseti.


Planning to commit once pre-commit CI passes.

jeff

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 6f1c67bf3f7..dddb7f8d673 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -869,50 +869,51 @@ riscv_build_integer_1 (struct riscv_integer_op 
codes[RISCV_MAX_INTEGER_OPS],
  codes[1].use_uw = false;
  cost = 2;
}
-  /* Final cases, particularly focused on bseti.  */
-  else if (cost > 2 && TARGET_ZBS)
-   {
- int i = 0;
+}
 
- /* First handle any bits set by LUI.  Be careful of the
-SImode sign bit!.  */
- if (value & 0x7ffff800)
-   {
- alt_codes[i].code = (i == 0 ? UNKNOWN : IOR);
- alt_codes[i].value = value & 0x7ffff800;
- alt_codes[i].use_uw = false;
- value &= ~0x7ffff800;
- i++;
-   }
+  /* Final cases, particularly focused on bseti.  */
+  if (cost > 2 && TARGET_ZBS)
+{
+  int i = 0;
 
- /* Next, any bits we can handle with addi.  */
- if (value & 0x7ff)
-   {
- alt_codes[i].code = (i == 0 ? UNKNOWN : PLUS);
- alt_codes[i].value = value & 0x7ff;
- alt_codes[i].use_uw = false;
- value &= ~0x7ff;
- i++;
-   }
+  /* First handle any bits set by LUI.  Be careful of the
+SImode sign bit!.  */
+  if (value & 0x7800)
+   {
+ alt_codes[i].code = (i == 0 ? UNKNOWN : IOR);
+ alt_codes[i].value = value & 0x7800;
+ alt_codes[i].use_uw = false;
+ value &= ~0x7800;
+  i++;
+   }
 
- /* And any residuals with bseti.  */
- while (i < cost && value)
-   {
- HOST_WIDE_INT bit = ctz_hwi (value);
- alt_codes[i].code = (i == 0 ? UNKNOWN : IOR);
- alt_codes[i].value = 1UL << bit;
- alt_codes[i].use_uw = false;
- value &= ~(1ULL << bit);
- i++;
-   }
+  /* Next, any bits we can handle with addi.  */
+  if (value & 0x7ff)
+   {
+ alt_codes[i].code = (i == 0 ? UNKNOWN : PLUS);
+ alt_codes[i].value = value & 0x7ff;
+ alt_codes[i].use_uw = false;
+ value &= ~0x7ff;
+ i++;
+   }
 
- /* If LUI+ADDI+BSETI resulted in a more efficient
-sequence, then use it.  */
- if (i < cost)
-   {
- memcpy (codes, alt_codes, sizeof (alt_codes));
- cost = i;
-   }
+  /* And any residuals with bseti.  */
+  while (i < cost && value)
+   {
+ HOST_WIDE_INT bit = ctz_hwi (value);
+ alt_codes[i].code = (i == 0 ? UNKNOWN : IOR);
+ alt_codes[i].value = 1UL << bit;
+ alt_codes[i].use_uw = false;
+ value &= ~(1ULL << bit);
+ i++;
+   }
+
+  /* If LUI+ADDI+BSETI resulted in a more efficient
+sequence, then use it.  */
+  if (i < cost)
+   {
+ memcpy (codes, alt_codes, sizeof (alt_codes));
+ cost = i;
}
 }
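
To make the flow above concrete, here's an illustration of my own (not 
part of the patch) of the three-step LUI+ADDI+BSETI sequence this code 
builds once Zbs is enabled, independent of Zbb:

/* Hypothetical example:
     lui	a0,0x1		# LUI-reachable bits -> 0x1000
     addi	a0,a0,1		# addi bits          -> 0x1001
     bseti	a0,a0,62	# residual bit 62    -> 0x4000000000001001  */
unsigned long foo_synth (void) { return 0x4000000000001001UL; }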
 


Re: [PATCH 1/1] RISC-V: Add Zfbfmin extension to the -march= option

2024-05-06 Thread Jeff Law




On 4/11/24 9:32 PM, Xiao Zeng wrote:

This patch would like to add new sub extension (aka Zfbfmin) to the
-march= option. It introduces a new data type BF16.

1 The Zfbfmin extension depends on 'F' and on the FLH, FSH, FMV.X.H, and
FMV.H.X instructions as defined in the Zfh extension.

2 The Zfhmin extension includes the following instructions from the
Zfh extension: FLH, FSH, FMV.X.H, FMV.H.X, FCVT.S.H, and FCVT.H.S.

3 The Zfhmin extension depends on 'F'.

4 Simply put, just make Zfbfmin dependent on Zfhmin.

Perhaps in the future, we could propose making the FLH, FSH, FMV.X.H, and
FMV.H.X instructions an independent extension to achieve precise dependency
relationships for the Zfbfmin.

You can locate more information about Zfbfmin in the spec doc below.



The below tests passed for this patch:
 * The riscv full regression test.

I wrote a suitable ChangeLog entry and pushed this patch to the trunk.

Thanks,
jeff




Re: [PATCH] RISC-V: Add zero_extract support for rv64gc

2024-05-06 Thread Jeff Law




On 5/6/24 3:42 PM, Vineet Gupta wrote:



On 5/6/24 13:40, Christoph Müllner wrote:

The combiner attempts to optimize a zero-extension of a logical right shift
using zero_extract. We already utilize this optimization for those cases
that result in a single instruction.  Let's add an insn_and_split
pattern that also matches the generic case, where we can emit an
optimized sequence of a slli/srli.
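
For reference, a sketch of the idiom being matched (mine, not from the 
patch): a zero-extension of a logical right shift is zero_extract (x, 
WIDTH, POS), and without a dedicated instruction it becomes two shifts, 
slli by XLEN-WIDTH-POS followed by srli by XLEN-WIDTH.

/* E.g. bits 8..23 of a 64-bit value:
     slli	a0,a0,40
     srli	a0,a0,48  */
unsigned long
extract_bits (unsigned long x)
{
  return (x >> 8) & 0xffff;
}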

...

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index d4676507b45..80cbecb78e8 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2792,6 +2792,36 @@ (define_insn "*lshrsi3_zero_extend_3"
[(set_attr "type" "shift")
 (set_attr "mode" "SI")])
  
+;; Canonical form for a zero-extend of a logical right shift.

+;; Special cases are handled above.
+;; Skip for single-bit extraction (Zbs/XTheadBs) and th.extu (XTheadBb)


Dumb question: why not for Zbs?  Zb[abs] is going to be very common going
forward, so this pattern will end up being unused.
Zbs only handles single bit extractions.  The pattern rejects that case 
allowing the single bit patterns from bitmanip.md and thead.md to match 
them.


Jeff




Re: [NOT CODE REVIEW] [PATCH v3 1/1] [RISC-V] Add support for _Bfloat16

2024-05-06 Thread Jeff Law




On 5/5/24 6:38 PM, Xiao Zeng wrote:

1 At point ,
   BF16 has already been completed "post public review".

2 LLVM has also added support for RISCV BF16 in
    and
   .

3 According to the discussion 
,
   this uses __bf16 and mangles it as DF16b in riscv_mangle_type, like x86.
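
For illustration only (not part of the patch), a sketch of what the new 
type enables; the libcall name in the comment is my assumption based on 
the ChangeLog below:

/* __bf16 as added by this patch.  On soft-float RISC-V the arithmetic
   and conversions go through libgcc's soft-fp routines.  */
__bf16
bf16_add (__bf16 a, __bf16 b)
{
  return a + b;
}

float
bf16_to_float (__bf16 x)
{
  return (float) x;	/* e.g. an __extendbfsf2 libcall */
}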

The below tests passed for this patch:
 * The riscv full regression test.

gcc/ChangeLog:

* config/riscv/iterators.md: New mode iterator HFBF.
* config/riscv/riscv-builtins.cc (riscv_init_builtin_types):
Initialize data type _Bfloat16.
* config/riscv/riscv-modes.def (FLOAT_MODE): New.
(ADJUST_FLOAT_FORMAT): New.
* config/riscv/riscv.cc (riscv_mangle_type): Support for BFmode.
(riscv_scalar_mode_supported_p): Ditto.
(riscv_libgcc_floating_mode_supported_p): Ditto.
(riscv_init_libfuncs): Set the conversion method for BFmode and
HFmode.
(riscv_block_arith_comp_libfuncs_for_mode): Set the arithmetic
and comparison libfuncs for the mode.
* config/riscv/riscv.md (mode" ): Add BF.
(movhf): Support for BFmode.
(mov<mode>): Ditto.
(*movhf_softfloat): Ditto.
(*mov<mode>_softfloat): Ditto.

libgcc/ChangeLog:

* config/riscv/sfp-machine.h (_FP_NANFRAC_B): New.
(_FP_NANSIGN_B): Ditto.
* config/riscv/t-softfp32: Add support for BF16 libfuncs.
* config/riscv/t-softfp64: Ditto.
* soft-fp/floatsibf.c: For si -> bf16.
* soft-fp/floatunsibf.c: For unsi -> bf16.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/bf16_arithmetic.c: New test.
* gcc.target/riscv/bf16_call.c: New test.
* gcc.target/riscv/bf16_comparison.c: New test.
* gcc.target/riscv/bf16_float_libcall_convert.c: New test.
* gcc.target/riscv/bf16_integer_libcall_convert.c: New test.
Given that we were only looking to have the CI system check the 
formatting nit, and that has passed, I've pushed this to the trunk.


jeff



Re: [PATCH] RISC-V: Document -mcmodel=large

2024-05-06 Thread Jeff Law




On 12/20/23 11:13 AM, Jeff Law wrote:



On 12/20/23 11:08, Palmer Dabbelt wrote:

This slipped through the cracks.  Probably also NEWS-worthy.

gcc/ChangeLog:

* doc/invoke.texi (RISC-V): Add -mcmodel=large.

OK.

And yes, I think we're going to need to do a new/changes update for the 
port as a whole as part of the gcc-14 process.

This never got committed as far as I can tell.  So I pushed it.

Jeff


Re: [RFA][RISC-V] Use "uw" forms for constant synthesis

2024-05-06 Thread Jeff Law




On 5/4/24 6:53 PM, Jeff Law wrote:


So another constant synthesis improvement.

In this patch we're looking at cases where we'd like to be able to use 
lui+slli, but can't because of the sign extending nature of lui on 
TARGET_64BIT.  For example: 0x800110020UL.  The trunk currently 
generates 4 instructions for that constant, when it can be done with 3 
(lui+slli.uw+addi).


When Zba is enabled, we can use lui+slli.uw as the slli.uw masks off the 
bits 32..63 before shifting, giving us the precise semantics we want.


I strongly suspect we'll want to do the same for a set of constants with 
lui+add.uw, lui+shNadd.uw, so you'll see the beginnings of generalizing 
support for lui followed by a "uw" instruction.


The new test just covers the set of cases that showed up while exploring 
a particular space of the constant synthesis problem.  It's not meant to 
be exhaustive; there are known gaps (e.g., failure to use shadd when 
profitable).


Tested on rv64gc and rv32gcv.  OK for the trunk assuming it passes CI?

I pushed this after fixing the two over-length lines.

jeff



Re: [PATCH] RISC-V: Add zero_extract support for rv64gc

2024-05-06 Thread Jeff Law




On 5/6/24 2:40 PM, Christoph Müllner wrote:

The combiner attempts to optimize a zero-extension of a logical right shift
using zero_extract. We already utilize this optimization for those cases
that result in a single instruction.  Let's add an insn_and_split
pattern that also matches the generic case, where we can emit an
optimized sequence of a slli/srli.

Tested with SPEC CPU 2017 (rv64gc).

PR 111501

gcc/ChangeLog:

* config/riscv/riscv.md (*lshr<mode>3_zero_extend_4): New
pattern for zero-extraction.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr111501.c: New test.
* gcc.target/riscv/zero-extend-rshift-32.c: New test.
* gcc.target/riscv/zero-extend-rshift-64.c: New test.
* gcc.target/riscv/zero-extend-rshift.c: New test.
So I had Lyut looking in this space as well, mostly because there's a 
desire to avoid the srl+and approach and instead represent this stuff as 
shifts (which are fusible in our uarch).  So I've already got some state...





Signed-off-by: Christoph Müllner 
---
  gcc/config/riscv/riscv.md |  30 +
  gcc/testsuite/gcc.target/riscv/pr111501.c |  32 +
  .../gcc.target/riscv/zero-extend-rshift-32.c  |  37 ++
  .../gcc.target/riscv/zero-extend-rshift-64.c  |  63 ++
  .../gcc.target/riscv/zero-extend-rshift.c | 119 ++
  5 files changed, 281 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/riscv/pr111501.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/zero-extend-rshift-32.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/zero-extend-rshift-64.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/zero-extend-rshift.c

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index d4676507b45..80cbecb78e8 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2792,6 +2792,36 @@ (define_insn "*lshrsi3_zero_extend_3"
[(set_attr "type" "shift")
 (set_attr "mode" "SI")])
  
+;; Canonical form for a zero-extend of a logical right shift.

+;; Special cases are handled above.
+;; Skip for single-bit extraction (Zbs/XTheadBs) and th.extu (XTheadBb)
+(define_insn_and_split "*lshr<mode>3_zero_extend_4"
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+(zero_extract:GPR
+   (match_operand:GPR 1 "register_operand" " r")
+   (match_operand 2 "const_int_operand")
+   (match_operand 3 "const_int_operand")))
+   (clobber (match_scratch:GPR  4 "="))]
+  "!((TARGET_ZBS || TARGET_XTHEADBS) && (INTVAL (operands[2]) == 1))
+   && !TARGET_XTHEADBB"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 4)
+ (ashift:GPR (match_dup 1) (match_dup 2)))
+   (set (match_dup 0)
+ (lshiftrt:GPR (match_dup 4) (match_dup 3)))]
Consider adding support for signed extractions as well.  You just need 
an iterator across zero_extract/sign_extract and suitable selection of 
arithmetic vs logical right shift step.
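
A sketch of the signed analogue (my illustration, assuming the iterator 
approach): the same two-shift sequence, but with an arithmetic final 
shift.

/* Sign-extended bits 8..23 of a 64-bit value:
     slli	a0,a0,40
     srai	a0,a0,48
   Right-shifting a negative value is implementation-defined in C,
   but GCC defines it as an arithmetic shift.  */
long
extract_bits_signed (long x)
{
  return (long) ((unsigned long) x << 40) >> 48;
}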


A nit on the condition: bring the && INTVAL (operands[2]) == 1 down to 
a new line like you've done with !TARGET_XTHEADBB.


You also want to make sure the condition rejects the cases handled by 
this pattern (or merge your pattern with this one):



;; Canonical form for a zero-extend of a logical right shift.
(define_insn "*lshrsi3_zero_extend_2" 
  [(set (match_operand:DI   0 "register_operand" "=r")

(zero_extract:DI (match_operand:DI  1 "register_operand" " r")
 (match_operand 2 "const_int_operand")
 (match_operand 3 "const_int_operand")))]
  "(TARGET_64BIT && (INTVAL (operands[3]) > 0)
&& (INTVAL (operands[2]) + INTVAL (operands[3]) == 32))"
{
  return "srliw\t%0,%1,%3";
}
  [(set_attr "type" "shift")
   (set_attr "mode" "SI")])


So generally going the right direction.  But needs another iteration.

Jeff



Re: [PATCH v2 1/1] [RISC-V] Add support for _Bfloat16

2024-05-06 Thread Jeff Law




On 5/4/24 8:08 PM, Xiao Zeng wrote:



https://github.com/ewlu/gcc-precommit-ci/issues/1412#issuecomment-2031568644

In the future, my patch will strictly adhere to the formatting suggestions 
provided by CI.
No worries.  Even those of us who have been working on the project for 
30+ years still goof this stuff up from time to time.   In fact, it 
complained about one of my patches over the weekend ;-)




With that fixed, this is fine for the trunk.  No need to repost,
go ahead and commit.

Currently, I do not have commit permission. Can I have this permission?


Use this form:

https://sourceware.org/cgi-bin/pdw/ps_form.cgi


And list my email address as your sponsor: j...@ventanamicro.com

I'll go ahead and commit the Bfloat16 patch.  But if you plan on 
contributing regularly, it's definitely easier to have write access.


Jeff


[RISC-V] Fix incorrect if-then-else nesting of Zbs usage in constant synthesis

2024-05-06 Thread Jeff Law
So I managed to goof the if-then-else level of the bseti bits last week. 
They were supposed to be a last-ditch effort to improve the result, 
but ended up inside a conditional where they don't really belong.  I 
almost always use Zba, Zbb and Zbs together, so it slipped by.


So it's NFC if you always test with Zbb and Zbs enabled together.  But 
if you enabled Zbs without Zbb you'd see a failure to use bseti.


Planning to commit once pre-commit CI passes.

I'm attaching the actual patch (P) and a diff with whitespace ignored 
(P2) so it's easier to see what actually changed.


Jeff

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 6f1c67bf3f7..dddb7f8d673 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -869,50 +869,51 @@ riscv_build_integer_1 (struct riscv_integer_op 
codes[RISCV_MAX_INTEGER_OPS],
  codes[1].use_uw = false;
  cost = 2;
}
-  /* Final cases, particularly focused on bseti.  */
-  else if (cost > 2 && TARGET_ZBS)
-   {
- int i = 0;
+}
 
- /* First handle any bits set by LUI.  Be careful of the
-SImode sign bit!.  */
- if (value & 0x7800)
-   {
- alt_codes[i].code = (i == 0 ? UNKNOWN : IOR);
- alt_codes[i].value = value & 0x7800;
- alt_codes[i].use_uw = false;
- value &= ~0x7800;
- i++;
-   }
+  /* Final cases, particularly focused on bseti.  */
+  if (cost > 2 && TARGET_ZBS)
+{
+  int i = 0;
 
- /* Next, any bits we can handle with addi.  */
- if (value & 0x7ff)
-   {
- alt_codes[i].code = (i == 0 ? UNKNOWN : PLUS);
- alt_codes[i].value = value & 0x7ff;
- alt_codes[i].use_uw = false;
- value &= ~0x7ff;
- i++;
-   }
+  /* First handle any bits set by LUI.  Be careful of the
+SImode sign bit!.  */
+  if (value & 0x7800)
+   {
+ alt_codes[i].code = (i == 0 ? UNKNOWN : IOR);
+ alt_codes[i].value = value & 0x7800;
+ alt_codes[i].use_uw = false;
+ value &= ~0x7800;
+  i++;
+   }
 
- /* And any residuals with bseti.  */
- while (i < cost && value)
-   {
- HOST_WIDE_INT bit = ctz_hwi (value);
- alt_codes[i].code = (i == 0 ? UNKNOWN : IOR);
- alt_codes[i].value = 1UL << bit;
- alt_codes[i].use_uw = false;
- value &= ~(1ULL << bit);
- i++;
-   }
+  /* Next, any bits we can handle with addi.  */
+  if (value & 0x7ff)
+   {
+ alt_codes[i].code = (i == 0 ? UNKNOWN : PLUS);
+ alt_codes[i].value = value & 0x7ff;
+ alt_codes[i].use_uw = false;
+ value &= ~0x7ff;
+ i++;
+   }
 
- /* If LUI+ADDI+BSETI resulted in a more efficient
-sequence, then use it.  */
- if (i < cost)
-   {
- memcpy (codes, alt_codes, sizeof (alt_codes));
- cost = i;
-   }
+  /* And any residuals with bseti.  */
+  while (i < cost && value)
+   {
+ HOST_WIDE_INT bit = ctz_hwi (value);
+ alt_codes[i].code = (i == 0 ? UNKNOWN : IOR);
+ alt_codes[i].value = 1UL << bit;
+ alt_codes[i].use_uw = false;
+ value &= ~(1ULL << bit);
+ i++;
+   }
+
+  /* If LUI+ADDI+BSETI resulted in a more efficient
+sequence, then use it.  */
+  if (i < cost)
+   {
+ memcpy (codes, alt_codes, sizeof (alt_codes));
+ cost = i;
}
 }
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 6f1c67bf3f7..dddb7f8d673 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -869,8 +869,10 @@ riscv_build_integer_1 (struct riscv_integer_op 
codes[RISCV_MAX_INTEGER_OPS],
  codes[1].use_uw = false;
  cost = 2;
}
+}
+
   /* Final cases, particularly focused on bseti.  */
-  else if (cost > 2 && TARGET_ZBS)
+  if (cost > 2 && TARGET_ZBS)
 {
   int i = 0;
 
@@ -914,7 +916,6 @@ riscv_build_integer_1 (struct riscv_integer_op 
codes[RISCV_MAX_INTEGER_OPS],
  cost = i;
}
 }
-}
 
   gcc_assert (cost <= RISCV_MAX_INTEGER_OPS);
   return cost;


Re: [RFA][RISC-V] Use "uw" forms for constant synthesis

2024-05-05 Thread Jeff Law




On 5/4/24 6:53 PM, Jeff Law wrote:


So another constant synthesis improvement.

In this patch we're looking at cases where we'd like to be able to use 
lui+slli, but can't because of the sign extending nature of lui on 
TARGET_64BIT.  For example: 0x800110020UL.  The trunk currently 
generates 4 instructions for that constant, when it can be done with 3 
(lui+slli.uw+addi).


When Zba is enabled, we can use lui+slli.uw as the slli.uw masks off the 
bits 32..63 before shifting, giving us the precise semantics we want.


I strongly suspect we'll want to do the same for a set of constants with 
lui+add.uw, lui+shNadd.uw, so you'll see the beginnings of generalizing 
support for lui followed by a "uw" instruction.


The new test just covers the set of cases that showed up while exploring 
a particular space of the constant synthesis problem.  It's not meant to 
be exhaustive; there are known gaps (e.g., failure to use shadd when 
profitable).


Tested on rv64gc and rv32gcv.  OK for the trunk assuming it passes CI?

Assume I'll fix the two overly long lines pointed out by the linter :-)
jeff


[RFA][RISC-V] Use "uw" forms for constant synthesis

2024-05-04 Thread Jeff Law


So another constant synthesis improvement.

In this patch we're looking at cases where we'd like to be able to use 
lui+slli, but can't because of the sign extending nature of lui on 
TARGET_64BIT.  For example: 0x800110020UL.  The trunk currently 
generates 4 instructions for that constant, when it can be done with 3 
(lui+slli.uw+addi).


When Zba is enabled, we can use lui+slli.uw as the slli.uw masks off the 
bits 32..63 before shifting, giving us the precise semantics we want.
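
From the description above, a quick sanity check (my sketch, not the 
committed test) is that with -march=rv64gc_zba this constant now 
synthesizes as lui+slli.uw+addi, i.e. three instructions instead of 
four:

unsigned long foo_0x800110020 (void) { return 0x800110020UL; }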


I strongly suspect we'll want to do the same for a set of constants with 
lui+add.uw, lui+shNadd.uw, so you'll see the beginnings of generalizing 
support for lui followed by a "uw" instruction.


The new test just covers the set of cases that showed up while exploring 
a particular space of the constant synthesis problem.  It's not meant to 
be exhaustive; there are known gaps (e.g., failure to use shadd when 
profitable).


Tested on rv64gc and rv32gcv.  OK for the trunk assuming it passes CI?

Jeff


gcc/

* config/riscv/riscv.cc (riscv_integer_op): Add field tracking if we
want to use a "uw" instruction variant.
(riscv_build_integer_1): Initialize the new field in various places.
Use lui+slli.uw for some constants.
(riscv_move_integer): Handle slli.uw.  

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 44945d47fd6..fd81f69e230 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -249,6 +249,7 @@ struct riscv_arg_info {
where A is an accumulator, each CODE[i] is a binary rtl operation
and each VALUE[i] is a constant integer.  CODE[0] is undefined.  */
 struct riscv_integer_op {
+  bool use_uw;
   enum rtx_code code;
   unsigned HOST_WIDE_INT value;
 };
@@ -734,6 +735,7 @@ riscv_build_integer_1 (struct riscv_integer_op 
codes[RISCV_MAX_INTEGER_OPS],
   /* Simply ADDI or LUI.  */
   codes[0].code = UNKNOWN;
   codes[0].value = value;
+  codes[0].use_uw = false;
   return 1;
 }
   if (TARGET_ZBS && SINGLE_BIT_MASK_OPERAND (value))
@@ -741,6 +743,7 @@ riscv_build_integer_1 (struct riscv_integer_op 
codes[RISCV_MAX_INTEGER_OPS],
   /* Simply BSETI.  */
   codes[0].code = UNKNOWN;
   codes[0].value = value;
+  codes[0].use_uw = false;
 
   /* RISC-V sign-extends all 32bit values that live in a 32bit
 register.  To avoid paradoxes, we thus need to use the
@@ -769,6 +772,7 @@ riscv_build_integer_1 (struct riscv_integer_op 
codes[RISCV_MAX_INTEGER_OPS],
{
  alt_codes[alt_cost-1].code = PLUS;
  alt_codes[alt_cost-1].value = low_part;
+ alt_codes[alt_cost-1].use_uw = false;
  memcpy (codes, alt_codes, sizeof (alt_codes));
  cost = alt_cost;
}
@@ -782,6 +786,7 @@ riscv_build_integer_1 (struct riscv_integer_op 
codes[RISCV_MAX_INTEGER_OPS],
{
  alt_codes[alt_cost-1].code = XOR;
  alt_codes[alt_cost-1].value = low_part;
+ alt_codes[alt_cost-1].use_uw = false;
  memcpy (codes, alt_codes, sizeof (alt_codes));
  cost = alt_cost;
}
@@ -792,17 +797,37 @@ riscv_build_integer_1 (struct riscv_integer_op 
codes[RISCV_MAX_INTEGER_OPS],
 {
   int shift = ctz_hwi (value);
   unsigned HOST_WIDE_INT x = value;
+  bool use_uw = false;
   x = sext_hwi (x >> shift, HOST_BITS_PER_WIDE_INT - shift);
 
   /* Don't eliminate the lower 12 bits if LUI might apply.  */
-  if (shift > IMM_BITS && !SMALL_OPERAND (x) && LUI_OPERAND (x << 
IMM_BITS))
+  if (shift > IMM_BITS
+ && !SMALL_OPERAND (x)
+ && (LUI_OPERAND (x << IMM_BITS)
+ || (TARGET_64BIT
+ && TARGET_ZBA
+ && LUI_OPERAND ((x << IMM_BITS)
+ & ~HOST_WIDE_INT_C (0x8000)
shift -= IMM_BITS, x <<= IMM_BITS;
 
+  /* Adjust X if it isn't a LUI operand in isolation, but we can use
+a subsequent "uw" instruction form to mask off the undesirable
+bits.  */
+  if (!LUI_OPERAND (x)
+ && TARGET_64BIT
+ && TARGET_ZBA
+ && LUI_OPERAND (x & ~HOST_WIDE_INT_C (0x8000UL)))
+   {
+ x = sext_hwi (x, 32);
+ use_uw = true;
+   }
+
   alt_cost = 1 + riscv_build_integer_1 (alt_codes, x, mode);
   if (alt_cost < cost)
{
  alt_codes[alt_cost-1].code = ASHIFT;
  alt_codes[alt_cost-1].value = shift;
+ alt_codes[alt_cost-1].use_uw = use_uw;
  memcpy (codes, alt_codes, sizeof (alt_codes));
  cost = alt_cost;
}
@@ -823,8 +848,10 @@ riscv_build_integer_1 (struct riscv_integer_op 
codes[RISCV_MAX_INTEGER_OPS],
  /* The sign-bit might be zero, so just rotate to be safe.  */
  codes[0].value = (((unsigned HOST_WIDE_INT) value >> trailing_ones)
| (value << (64 - trailing_ones)));
+ codes[0].use_uw = false;
  codes[1].code = ROTATERT;
  codes[1].value = 64 - 

Re: [PATCH v2 1/1] [RISC-V] Add support for _Bfloat16

2024-05-04 Thread Jeff Law




On 4/2/24 3:22 AM, Xiao Zeng wrote:

1 At point ,
   BF16 has already been completed "post public review".

2 LLVM has also added support for RISCV BF16 in
    and
   .

3 According to the discussion 
,
   this uses __bf16 and mangles it as DF16b in riscv_mangle_type, like x86.

The below tests passed for this patch:
 * The riscv full regression test.

gcc/ChangeLog:

* config/riscv/iterators.md: New mode iterator HFBF.
* config/riscv/riscv-builtins.cc (riscv_init_builtin_types):
Initialize data type _Bfloat16.
* config/riscv/riscv-modes.def (FLOAT_MODE): New.
(ADJUST_FLOAT_FORMAT): New.
* config/riscv/riscv.cc (riscv_mangle_type): Support for BFmode.
(riscv_scalar_mode_supported_p): Ditto.
(riscv_libgcc_floating_mode_supported_p): Ditto.
(riscv_init_libfuncs): Set the conversion method for BFmode and
HFmode.
(riscv_block_arith_comp_libfuncs_for_mode): Set the arithmetic
and comparison libfuncs for the mode.
* config/riscv/riscv.md (mode" ): Add BF.
(movhf): Support for BFmode.
(mov<mode>): Ditto.
(*movhf_softfloat): Ditto.
(*mov<mode>_softfloat): Ditto.

libgcc/ChangeLog:

* config/riscv/sfp-machine.h (_FP_NANFRAC_B): New.
(_FP_NANSIGN_B): Ditto.
* config/riscv/t-softfp32: Add support for BF16 libfuncs.
* config/riscv/t-softfp64: Ditto.
* soft-fp/floatsibf.c: For si -> bf16.
* soft-fp/floatunsibf.c: For unsi -> bf16.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/bf16_arithmetic.c: New test.
* gcc.target/riscv/bf16_call.c: New test.
* gcc.target/riscv/bf16_comparison.c: New test.
* gcc.target/riscv/bf16_float_libcall_convert.c: New test.
* gcc.target/riscv/bf16_integer_libcall_convert.c: New test.
Just some nits.  In t-softfp32 and t-softfp64 the code you've added 
should be using tabs, not 8 spaces, as noted by the CI "Lint Status":


https://github.com/ewlu/gcc-precommit-ci/issues/1412#issuecomment-2031568644

With that fixed, this is fine for the trunk.  No need to repost, go 
ahead and commit.


Thanks for your patience,
Jeff

