Re: [gcc-15 0/3] RISC-V improve stack/array access by constant mat tweak

2024-03-18 Thread Jeff Law




On 3/16/24 11:35 AM, Vineet Gupta wrote:

Hi,

This set of patches (for gcc-15) help improve stack/array accesses
by improving constant materialization. Details are in respective
patches.

The first patch is the main change which improves SPEC cactu by 10%.
Just to confirm.  Yup, 10% reduction in icounts and about a 3.5% 
improvement in cycles on our target.  Which is great!


This also makes me wonder if cactu is the benchmark that was sensitive 
to flushing the pending queue in the scheduler.  Jivan's data would tend 
to indicate that is the case as several routines seem to flush the 
pending queue often.  In particular:


ML_BSSN_RHS_Body
ML_BSSN_Advect_Body
ML_BSSN_constraints_Body

All have a high number of dynamic instructions as well as lots of 
flushes of the pending queue.


Vineet, you might want to look and see if cranking up the 
max-pending-list-length parameter helps drive down spilling.   I think 
it's default value is 32 insns.  I've seen it cranked up to 128 and 256 
insns without significant ill effects on compile time.


My recollection (it's been like 3 years) of the key loop was that it had 
a few hundred instructions and we'd flush the pending list about 50 
cycles into the loop as there just wasn't enough issue bandwidth to the 
FP units to dispatch all the FP instructions as their inputs became 
ready.  So you'd be looking for flushes in a big loop.



Jeff




Re: [PATCH, RFC] combine: Don't truncate const operand of AND if it's no benefits

2024-03-18 Thread Jeff Law




On 3/10/24 11:41 PM, HAO CHEN GUI wrote:

Hi,
   This patch tries to fix the problem when a canonical form doesn't benefit
on a specific target. The const operand of AND is and with the nonzero
bits of another operand in combine pass. It's a canonical form, but it's no
benefits for the target which has rotate and mask insns. As the mask is
truncated, it can't match the insn conditions which it originally matches.
For example, the following insn condition checks the sum of two AND masks.
When one of the mask is truncated, the condition breaks.

(define_insn "*rotlsi3_insert_5"
   [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r")
(ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r")
(match_operand:SI 2 "const_int_operand" "n,n"))
(and:SI (match_operand:SI 3 "gpc_reg_operand" "r,0")
(match_operand:SI 4 "const_int_operand" "n,n"]
   "rs6000_is_valid_mask (operands[2], NULL, NULL, SImode)
&& UINTVAL (operands[2]) != 0 && UINTVAL (operands[4]) != 0
&& UINTVAL (operands[2]) + UINTVAL (operands[4]) + 1 == 0"
...

   This patch tries to fix the problem by comparing the rtx cost. If another
operand (varop) is not changed and rtx cost with new mask is not less than
the original one, the mask is restored to original one.

   I'm not sure if comparison of rtx cost here is proper. The outer code is
unknown and I suppose it as "SET". Also the rtx cost might not be accurate.
 From my understanding, the canonical forms should always benefit as it can't
be undo in combine pass. Do we have a perfect solution for this kind of
issues? Looking forward for your advice.

   Another similar issues for canonical forms. Whether the widen mode for
lshiftrt is always good?
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624852.html

Thanks
Gui Haochen

ChangeLog
Combine: Don't truncate const operand of AND if it's no benefits

In combine pass, the canonical form is to turn off all bits in the constant
that are know to already be zero for AND.

   /* Turn off all bits in the constant that are known to already be zero.
  Thus, if the AND isn't needed at all, we will have CONSTOP == NONZERO_BITS
  which is tested below.  */

   constop &= nonzero;

But it doesn't benefit when the target has rotate and mask insert insns.
The AND mask is truncated and lost its information.  Thus it can't match
the insn conditions.  For example, the following insn condition checks
the sum of two AND masks.

(define_insn "*rotlsi3_insert_5"
   [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r")
(ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r")
(match_operand:SI 2 "const_int_operand" "n,n"))
(and:SI (match_operand:SI 3 "gpc_reg_operand" "r,0")
(match_operand:SI 4 "const_int_operand" "n,n"]
   "rs6000_is_valid_mask (operands[2], NULL, NULL, SImode)
&& UINTVAL (operands[2]) != 0 && UINTVAL (operands[4]) != 0
&& UINTVAL (operands[2]) + UINTVAL (operands[4]) + 1 == 0"
...

This patch restores the const operand of AND if the another operand is
not optimized and the truncated const operand doesn't save the rtx cost.

gcc/
* combine.cc (simplify_and_const_int_1): Restore the const operand
of AND if varop is not optimized and the rtx cost of the new const
operand is not reduced.

gcc/testsuite/
* gcc.target/powerpc/rlwimi-0.c: Reduced total number of insns and
adjust the number of rotate and mask insns.
* gcc.target/powerpc/rlwimi-1.c: Likewise.
* gcc.target/powerpc/rlwimi-2.c: Likewise.

It seems like this should defer to gcc-15.

jeff



Re: [PATCH] RISC-V: Allow LICM hoist POLY_INT configuration code sequence

2024-03-18 Thread Jeff Law




On 2/6/24 6:14 AM, Robin Dapp wrote:

The root cause is this following RTL pattern, after fwprop1:

(insn 82 78 84 9 (set (reg:DI 230)
         (sign_extend:DI (minus:SI (subreg/s/v:SI (reg:DI 150 [ niters.10 ]) 0)
                 (subreg:SI (reg:DI 221) 0 13 {subsi3_extended}
      (expr_list:REG_EQUAL (sign_extend:DI (plus:SI (subreg/s/v:SI (reg:DI 150 
[ niters.10 ]) 0)
                 *(const_poly_int:SI [-16, -16])*))
         (nil)))

The highlight *(const_poly_int:SI [-16, -16])*
causes ICE.

This RTL is because:
(insn 69 68 71 8 (set (reg:DI 221)
         (const_poly_int:DI [16, 16])) 208 {*movdi_64bit}
      (nil))
(insn 82 78 84 9 (set (reg:DI 230)
         (sign_extend:DI (minus:SI (subreg/s/v:SI (reg:DI 150 [ niters.10 ]) 0)
                 (subreg:SI (reg:DI 221) 0 13 {subsi3_extended}                
                          > (subreg:SI (const_poly_int:SI [-16, -16])) 
fwprop1 add  (const_poly_int:SI [-16, -16]) reg_equal
      (expr_list:REG_EQUAL (sign_extend:DI (plus:SI (subreg/s/v:SI (reg:DI 150 
[ niters.10 ]) 0)
                 (const_poly_int:SI [-16, -16])))
         (nil)))


I'm seeing a slightly different pattern but that doesn't change
the problem.


(set (reg:SI)  (subreg:SI (DI: poly value))) but it causes ICE that I
mentioned above.


That's indeed a bit more idiomatic and I wouldn't oppose that.

The problem causing the ICE is that we want to simplify a PLUS
with (const_poly_int:SI [16, 16]) and (const_int 0) but the mode
is DImode.  My suspicion is that this is caused by our
addsi3_extended pattern and we fail to deduce the proper mode
for analysis.
Certainly possible.  It didn't even occur to me that a POLY_INT would 
slip through here.




I'm just speculating but maybe that's because we assert that a
plus is of the form simple_reg_p (op0) && CONSTANT_P (op1).
Usually, constants don't have a mode and can just be used.
poly_int_csts do have one and need to be explicitly converted
(kind of).

We can only analyze this zero_extended plus at all since Jeff
added the addsi3_extended handling for loop-iv.   Maybe we could
punt like

diff --git a/gcc/loop-iv.cc b/gcc/loop-iv.cc
index eb7e923a38b..796413c25a3 100644
--- a/gcc/loop-iv.cc
+++ b/gcc/loop-iv.cc
@@ -714,6 +714,9 @@ get_biv_step_1 (df_ref def, scalar_int_mode outer_mode, rtx 
reg,
   if (!simple_reg_p (op0) || !CONSTANT_P (op1))
 return false;
  
+ if (CONST_POLY_INT_P (op1) && GET_MODE (op1) != outer_mode)

+   return false;
+

This helps for your test case but I haven't done any further
testing.  I'd think this is relatively safe because it's only
a missed analysis/optimization in the worst case.
Still, generally, I don't see a reason why we wouldn't be able
to analyze this?
I don't think it would significant hurt anything.  IIRC bit of code was 
to fix a minor regression caused by the backend changes.


I would ACK that patch given the usual testing cycle.

jeff



Re: RISC-V: Use convert instructions instead of calling library functions

2024-03-18 Thread Jeff Law




On 3/18/24 3:09 AM, Jivan Hakobyan wrote:
As RV has round instructions it is reasonable to use them instead of 
calling the library functions.


With my patch for the following C code:
double foo(double a) {
     return ceil(a);
}

GCC generates the following ASM code (before it was tail call)
foo:
         fabs.d  fa4,fa0
         lui     a5,%hi(.LC0)
         fld     fa3,%lo(.LC0)(a5)
         flt.d   a5,fa4,fa3
         beq     a5,zero,.L3
         fcvt.l.d a5,fa0,rup
         fcvt.d.l        fa4,a5
         fsgnj.d fa0,fa4,fa0
.L3:
         ret

.LC0:
         .word   0
         .word   1127219200     // 0x4330


The patch I have evaluated on SPEC2017.
Counted dynamic instructions counts and got the following improvements

510.parest_r       262 m      -
511.povray_r      2.1  b        0.04%
521.wrt_r            269 m       -
526.blender_r    3 b             0.1%
527.cam4_r       15 b           0.6%
538.imagick_r    365 b         7.6%

Overall executed 385 billion fewer instructions which is 0.5%.

A few more notes.

The sequence Jivan is using is derived from LLVM.  The condition in the 
generated code tests for those values were are supposed to pass through 
unaltered.  The condition in the pattern ensures we do something 
sensible WRT FE_INEXACT and mirrors how other ports handle these insns.


Our internal testing shows a benefit well beyond the 7% reduction in 
icounts.  Presumably due to fewer calls, fewer transfers across the 
register files, better scheduling around the call site, etc.


Obviously for Zfa we'll use the more efficient instructions for that 
extension.  But there's no reason to not go forward with this change for 
gcc-15.



Jeff


Re: [PATCH v3] testsuite: Add a test case for negating FP vectors containing zeros

2024-03-18 Thread Jeff Law




On 3/5/24 4:40 AM, Xi Ruoyao wrote:

Recently I've fixed two wrong FP vector negate implementation which
caused wrong sign bits in zeros in targets (r14-8786 and r14-8801).  To
prevent a similar issue from happening again, add a test case.

Tested on x86_64 (with SSE2, AVX, AVX2, and AVX512F), AArch64, MIPS
(with MSA), LoongArch (with LSX and LASX).

gcc/testsuite:

* gcc.dg/vect/vect-neg-zero.c: New test.
---

- v1 -> v2: Remove { dg-do run } which may cause SIGILL.
- v2 -> v3: Add -fno-associative-math to fix an excessive warning on
   arm.

Ok for trunk?

OK
jeff



Re: [PATCH v5 1/1] RISC-V: Add support for XCVbi extension in CV32E40P

2024-03-18 Thread Jeff Law




On 1/22/24 6:30 AM, Mary Bennett wrote:


On 09/01/2024 18:43, Jeff Law wrote:



On 1/8/24 06:14, Mary Bennett wrote:
Spec: github.com/openhwgroup/core-v-sw/blob/master/specifications/ 
corev-builtin-spec.md


Contributors:
   Mary Bennett 
   Nandni Jamnadas 
   Pietra Ferreira 
   Charlie Keaney
   Jessica Mills
   Craig Blackmore 
   Simon Cook 
   Jeremy Bennett 
   Helene Chelin 

gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Create XCVbi extension
  support.
* config/riscv/riscv.opt: Likewise.
* config/riscv/corev.md: Implement cv_branch pattern
  for cv.beqimm and cv.bneimm.
* config/riscv/riscv.md: Add CORE-V branch immediate to RISC-V
  branch instruction pattern.
* config/riscv/constraints.md: Implement constraints
  cv_bi_s5 - signed 5-bit immediate.
* config/riscv/predicates.md: Implement predicate
  const_int5s_operand - signed 5 bit immediate.
* doc/sourcebuild.texi: Add XCVbi documentation.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/cv-bi-beqimm-compile-1.c: New test.
* gcc.target/riscv/cv-bi-beqimm-compile-2.c: New test.
* gcc.target/riscv/cv-bi-bneimm-compile-1.c: New test.
* gcc.target/riscv/cv-bi-bneimm-compile-2.c: New test.
* lib/target-supports.exp: Add proc for XCVbi.
Assuming this has gone through a testing cycle, this is fine for the 
trunk.


Thanks,
jeff


This patch passes regression. Are there any other changes required 
before it can be merged?

I pushed this to the trunk.  Sorry for the delays.

jeff



Re: [PATCH] LoongArch: Fix C23 (...) functions returning large aggregates [PR114175]

2024-03-18 Thread chenglulu



在 2024/3/18 下午5:34, Xi Ruoyao 写道:

We were assuming TYPE_NO_NAMED_ARGS_STDARG_P don't have any named
arguments and there is nothing to advance, but that is not the case
for (...) functions returning by hidden reference which have one such
artificial argument.  This is causing gcc.dg/c23-stdarg-6.c and
gcc.dg/c23-stdarg-8.c to fail.

Fix the issue by checking if arg.type is NULL, as r14-9503 explains.

gcc/ChangeLog:

PR target/114175
* config/loongarch/loongarch.cc
(loongarch_setup_incoming_varargs): Only skip
loongarch_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P
functions if arg.type is NULL.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

  gcc/config/loongarch/loongarch.cc | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 70e31bb831c..57de8ef7d20 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -767,7 +767,8 @@ loongarch_setup_incoming_varargs (cumulative_args_t cum,
   argument.  Advance a local copy of CUM past the last "real" named
   argument, to find out how many registers are left over.  */
local_cum = *get_cumulative_args (cum);

I think it's important to add annotation information here:
    /* where there is no hidden return argument passed, arg.type

 is always NULL.  */

Others LTGM.

Thanks!


-  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)))
+  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))
+  || arg.type != NULL_TREE)
  loongarch_function_arg_advance (pack_cumulative_args (_cum), arg);
  
/* Found out how many registers we need to save.  */




Re: [PATCH V2] RISC-V: Update test expectancies with recent scheduler change

2024-03-18 Thread Jeff Law




On 3/12/24 3:56 PM, Edwin Lu wrote:

Given the recent change with adding the scheduler pipeline descriptions,
many scan-dump failures emerged. Relax the expected assembler output
conditions on the affected tests to reduce noise.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c: Disable scheduling
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c: Ditto
* gcc.target/riscv/rvv/base/pr108185-1.c: Update test expectancies
* gcc.target/riscv/rvv/base/pr108185-2.c: Ditto
* gcc.target/riscv/rvv/base/pr108185-3.c: Ditto
* gcc.target/riscv/rvv/base/pr108185-4.c: Ditto
* gcc.target/riscv/rvv/base/pr108185-5.c: Ditto
* gcc.target/riscv/rvv/base/pr108185-6.c: Ditto
* gcc.target/riscv/rvv/base/pr108185-7.c: Ditto
* gcc.target/riscv/rvv/base/vcreate.c: Disable scheduling and update
test expectancies
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-30.c: Disable scheduling
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-31.c: Ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_single_block-17.c: Update test
expectancies
* gcc.target/riscv/rvv/vsetvl/vlmax_single_block-18.c: Ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-10.c: Ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-11.c: Ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-12.c: Ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-4.c: Ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-5.c: Ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-6.c: Ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-7.c: Ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-8.c: Ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-9.c: Ditto
As we discussed last week.  This should go forward as it brings a better 
degree of stability to these tests.  Looking forward to cleaner 
testresults as my tester has been complaining about this stuff for a 
month now :(



And a note for the future.  Let's take this one:


diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr108185-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr108185-1.c
index 4c6e88e7eed..46d3b5e98d4 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/pr108185-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr108185-1.c
@@ -60,11 +60,11 @@ test_vbool1_then_vbool64(int8_t * restrict in, int8_t * 
restrict out) {
  }
  
  /* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */

-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 2 } } */
+/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 2 } } */
+/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 2 } } */
+/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 2 } } */
+/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 2 } } */
+/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 2 } } */
  /* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 
12 } } */
  /* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 
12 } } */


This shows an example of how uarch information such as instruction 
latency will affect vset counts.  If someone wanted to test that 
pr108185-1 can drive the counts of each of those vsets to since 
instance, they can certainly #include pr108185-1 and provide suitable dg 
directives to set a specific uarch tuning and appropriate dg-final 
directives to ensure just a single instance of each vset occurs.


I'm not expecting you do to this.  Just making a note if someone really 
wants to use those tests to verify a specific set of vsets on a 
particular uarch.



Jeff


Re: [PATCH] RISC-V: Fix C23 (...) functions returning large aggregates [PR114175]

2024-03-18 Thread Jeff Law




On 3/18/24 12:54 PM, Edwin Lu wrote:

We assume that TYPE_NO_NAMED_ARGS_STDARG_P don't have any named arguments and
there is nothing to advance, but that is not the case for (...) functions
returning by hidden reference which have one such artificial argument.
This causes gcc.dg/c23-stdarg-[68].c to fail

Fix the issue by checking if arg.type is NULL as r14-9503-g218d1749612
explains

Tested on linux rv64gcv.

gcc/ChangeLog:

PR target/114175
* config/riscv/riscv.cc (riscv_setup_incoming_varargs): Only skip
riscv_funciton_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P functions
if arg.type is NULL

OK.  Thanks for taking care of this.

Jeff



Re: [PATCH] alpha: Fix alpha_setup_incoming_varargs [PR114175]

2024-03-18 Thread Jeff Law




On 3/18/24 12:40 PM, Jakub Jelinek wrote:

Hi!

Like in the r14-9503 change on x86-64, I think Alpha also needs to
function_arg_advance after the hidden return pointer argument if
any.
At least, the following patch changes the assembly of s1-s6 functions
on the https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647956.html
c23-stdarg-9.c testcase, and eyeballing the assembly for int f8 (...)
the ... args are passed in 16..21 registers and then on the stack,
while for struct S s8 (...) have hidden return pointer passed in 16
register and ... args in 17..21 registers and then on the stack, and
seems without this patch the incoming varargs setup does the wrong thing
(but I can't test on alpha easily).

Many targets seem to be unaffected, e.g. aarch64, arm, s390*, so I'm not
trying to change all targets together because such a change clearly isn't
needed e.g. for targets which use special register for the hidden return
pointer.

Ok for trunk?

2024-03-18  Jakub Jelinek  

PR target/114175
* config/alpha/alpha.cc (alpha_setup_incoming_varargs): Only skip
function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P functions
if arg.type is NULL.
OK.  And I think you should go ahead with any others that you think need 
a similar change.  I certainly trust your judgment.


jeff


Re: [PATCH] RISC-V: Add XiangShan Nanhu microarchitecture.

2024-03-18 Thread Jeff Law




On 2/27/24 1:52 AM, Jiawei wrote:

From: Chen Jiawei 

Co-Authored by: Lin Jiawei 

This patch add XiangShan Nanhu cpu microarchitecture,
Nanhu is a 6-issue, superscalar, out-of-order processor.
More details see: https://xiangshan-doc.readthedocs.io/zh-cn/latest/arch

gcc/ChangeLog:

 * config/riscv/riscv-cores.def (RISCV_TUNE): New def.
 (RISCV_CORE): Ditto.
 * config/riscv/riscv-opts.h (enum
 * riscv_microarchitecture_type): New option.
 * config/riscv/riscv.cc: New def.
 * config/riscv/riscv.md: New include.
 * config/riscv/xiangshan.md: New file.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/mcpu-xiangshan-nanhu.c: New test.
As was discussed last Tuesday, this should be safe, even at this late 
stage in the gcc-14 cycle.


  
+/* Costs to use when optimizing for xiangshan nanhu.  */

+static const struct riscv_tune_param xiangshan_nanhu_tune_info = {
+  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},  /* fp_add */
+  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},  /* fp_mul */
+  {COSTS_N_INSNS (10), COSTS_N_INSNS (20)},/* fp_div */
+  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},  /* int_mul */
+  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)},  /* int_div */
+  6,   /* issue_rate */
+  3,   /* branch_cost */
+  3,   /* memory_cost */
+  3,   /* fmv_cost */
+  true,/* 
slow_unaligned_access */
+  false,   /* use_divmod_expansion */
+  RISCV_FUSE_ZEXTW | RISCV_FUSE_ZEXTH,  /* fusible_ops */
+  NULL,/* vector cost */
Is your integer division really that fast?  The table above essentially 
says that your cpu can do integer division in 6 cycles.



+
+(define_insn_reservation "xiangshan_mul" 3
+  (and (eq_attr "tune" "xiangshan")
+   (eq_attr "type" "imul"))
+  "xs_mdu_rs")
+
+(define_insn_reservation "xiangshan_div" 21
+  (and (eq_attr "tune" "xiangshan")
+   (eq_attr "type" "idiv"))
+  "xs_mdu_rs")

Whereas your pipeline description says it's 21c.

I strongly suspect you want to increase the cost of the int_div in the 
tuning table.  And with a the higher cost you probably want to turn on 
use_divmod_expansion.


I'll also note that your scheduler description also indicates your 
division is fully pipelined.  Is that correct?  if not, you'll want to 
adjust that reservation.





+
+(define_insn_reservation "xiangshan_sfdiv" 11
+  (and (eq_attr "tune" "xiangshan")
+   (eq_attr "type" "fdiv")
+   (eq_attr "mode" "SF"))
+  "xs_fmisc_rs")
+
+(define_insn_reservation "xiangshan_sfsqrt" 17
+  (and (eq_attr "tune" "xiangshan")
+   (eq_attr "type" "fsqrt")
+   (eq_attr "mode" "SF"))
+  "xs_fmisc_rs")
+
+(define_insn_reservation "xiangshan_dfdiv" 21
+  (and (eq_attr "tune" "xiangshan")
+   (eq_attr "type" "fdiv")
+   (eq_attr "mode" "DF"))
+  "xs_fmisc_rs")
+
+(define_insn_reservation "xiangshan_dfsqrt" 37
+  (and (eq_attr "tune" "xiangshan")
+   (eq_attr "type" "fsqrt")
+   (eq_attr "mode" "DF"))
+  "xs_fmisc_rs")
Similarly these say your fpdiv and fpsqrt are fully pipelined.  It's 
certainly possible, but I suspect it's really just an oversight.  Given 
these values you may also want to adjust the cost of an fp division in 
the cost table.



Finally with such high values for for the div/sqrt units, we find that 
the DFA "blows up" causing genattrtab to run for a very long time. We'll 
have to keep an eye on that.


And just to be clear, I think these can be done as a followup patch. I'm 
going to push this patch as-is rather than make any adjustments -- you 
almost certainly know the processor's capabilities better than myself or 
anyone else on this list :-)



Jeff


[PATCH v5] LoongArch: Add support for TLS descriptors

2024-03-18 Thread mengqinggang
Add support for TLS descriptors on normal code model and extreme code model.

Normal code model instruction sequence:
  -mno-explicit-relocs:
la.tls.desc $r4, s
add.d   $r12, $r4, $r2
  -mexplicit-relocs:
pcalau12i   $r4,%desc_pc_hi20(s)
addi.d  $r4,$r4,%desc_pc_lo12(s)
ld.d$r1,$r4,%desc_ld(s)
jirl$r1,$r1,%desc_call(s)
add.d   $r12, $r4, $r2

Extreme code model instruction sequence:
  -mno-explicit-relocs:
la.tls.desc $r4, $r12, s
add.d   $r12, $r4, $r2
  -mexplicit-relocs:
pcalau12i   $r4,%desc_pc_hi20(s)
addi.d  $r12,$r0,%desc_pc_lo12(s)
lu32i.d $r12,%desc64_pc_lo20(s)
lu52i.d $r12,$r12,%desc64_pc_hi12(s)
add.d   $r4,$r4,$r12
ld.d$r1,$r4,%desc_ld(s)
jirl$r1,$r1,%desc_call(s)
add.d   $r12, $r4, $r2

The default is still traditional TLS model, but can be configured with
--with-tls={trad,desc}. The default can change to TLS descriptors once
libc and LLVM support this.

gcc/ChangeLog:

* config.gcc: Add --with-tls option to change TLS flavor.
* config/loongarch/genopts/loongarch.opt.in: Add -mtls-dialect to
configure TLS flavor.
* config/loongarch/loongarch-def.h (struct loongarch_target): Add
tls_dialect.
* config/loongarch/loongarch-driver.cc (la_driver_init): Add tls
flavor.
* config/loongarch/loongarch-opts.cc (loongarch_init_target): Add
tls_dialect.
(loongarch_config_target): Ditto.
(loongarch_update_gcc_opt_status): Ditto.
* config/loongarch/loongarch-opts.h (loongarch_init_target):Ditto.
(TARGET_TLS_DESC): New define.
* config/loongarch/loongarch.cc (loongarch_symbol_insns): Add TLS DESC
instructions sequence length.
(loongarch_legitimize_tls_address): New TLS DESC instruction sequence.
(loongarch_option_override_internal): Add la_opt_tls_dialect.
(loongarch_option_restore): Add la_target.tls_dialect.
* config/loongarch/loongarch.md (@got_load_tls_desc): Normal
code model for TLS DESC.
(got_load_tls_desc_off64): Extreme code model for TLS DESC.
* config/loongarch/loongarch.opt: Regenerated.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/cmodel-extreme-1.c: Add -mtls-dialect=trad.
* gcc.target/loongarch/cmodel-extreme-2.c: Ditto.
* gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c: Ditto.
* gcc.target/loongarch/explicit-relocs-medium-call36-auto-tls-ld-gd.c:
Ditto.
* gcc.target/loongarch/func-call-medium-1.c: Ditto.
* gcc.target/loongarch/func-call-medium-2.c: Ditto.
* gcc.target/loongarch/func-call-medium-3.c: Ditto.
* gcc.target/loongarch/func-call-medium-4.c: Ditto.
* gcc.target/loongarch/tls-extreme-macro.c: Ditto.
* gcc.target/loongarch/tls-gd-noplt.c: Ditto.
* gcc.target/loongarch/explicit-relocs-auto-extreme-tls-desc.c: New 
test.
* gcc.target/loongarch/explicit-relocs-auto-tls-desc.c: New test.
* gcc.target/loongarch/explicit-relocs-extreme-tls-desc.c: New test.
* gcc.target/loongarch/explicit-relocs-tls-desc.c: New test.

Co-authored-by: Lulu Cheng 
Co-authored-by: Xi Ruoyao 
---
Changes v4 -> v5:
- Use (reg:P 4) instead of match_operand in got_load_tls_desc and
  got_load_tls_desc_off64.
- Change instruction sequence to prevent additional white spaces in the output 
asm
  before tabs.

Changes v3 -> v4:
- Add TLS descriptors test cases.

Changes v2 -> v3:
- Set default to traditional TLS model.
- Add support for -mexplicit-relocs and extreme code model.

Changes v1 -> v2:
- Clobber fcc0-fcc7 registers in got_load_tls_desc template.
- Support --with-tls in configure.

v4 link: https://sourceware.org/pipermail/gcc-patches/2024-March/647597.html
v3 link: https://sourceware.org/pipermail/gcc-patches/2024-March/647578.html
v2 link: https://sourceware.org/pipermail/gcc-patches/2024-February/646817.html
v1 link: https://sourceware.org/pipermail/gcc-patches/2023-December/638907.html

 gcc/config.gcc| 19 +-
 gcc/config/loongarch/genopts/loongarch.opt.in | 14 
 gcc/config/loongarch/loongarch-def.h  |  7 ++
 gcc/config/loongarch/loongarch-driver.cc  |  2 +-
 gcc/config/loongarch/loongarch-opts.cc| 12 +++-
 gcc/config/loongarch/loongarch-opts.h |  2 +
 gcc/config/loongarch/loongarch.cc | 47 +
 gcc/config/loongarch/loongarch.md | 68 +++
 gcc/config/loongarch/loongarch.opt| 14 
 .../gcc.target/loongarch/cmodel-extreme-1.c   |  2 +-
 .../gcc.target/loongarch/cmodel-extreme-2.c   |  2 +-
 .../explicit-relocs-auto-extreme-tls-desc.c   | 10 +++
 .../loongarch/explicit-relocs-auto-tls-desc.c | 10 +++
 .../explicit-relocs-auto-tls-ld-gd.c  |  2 +-
 .../explicit-relocs-extreme-tls-desc.c| 16 +
 

Re: [PATCH] c++: explicit inst of template method not generated [PR110323]

2024-03-18 Thread Jason Merrill

On 3/15/24 13:48, Marek Polacek wrote:

On Thu, Mar 14, 2024 at 03:39:04PM -0400, Jason Merrill wrote:

On 3/8/24 12:02, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Consider

constexpr int VAL = 1;
struct foo {
template 
void bar(typename std::conditional::type arg) { }
};
template void foo::bar<1>(int arg);

where we since r11-291 fail to emit the code for the explicit
instantiation.  That's because cp_walk_subtrees/TYPENAME_TYPE now
walks TYPE_CONTEXT ('conditional' here) as well, and in a template
finds the B==VAL template argument.  VAL is constexpr, which implies const,
which in the global scope implies static.  constrain_visibility_for_template
then makes "struct conditional<(B == VAL), int, float>" non-TREE_PUBLIC.
Then symtab_node::needed_p checks TREE_PUBLIC, sees it's 0, and we don't
emit any code.

I thought the fix would be some ODR-esque check to not consider
constexpr variables/fns that are used just for their value.  But
it turned out to be tricky.  For instance, we can't skip
determine_visibility in a template; we can't even skip it for value-dep
expressions.  For example, no-linkage-expr1.C has

using P = struct {}*;
template 
void f(int(*)[((P)0, N)]) {}

where ((P)0, N) is value-dep, but N is not relevant here: we have to
ferret out the anonymous type.  When instantiating, it's already gone.


Hmm, how is that different from the B == VAL case?  In both cases we're
naming an internal entity that gets folded away.

I guess the difference is that B == VAL falls under the special allowance in
https://eel.is/c++draft/basic.def.odr#14.5.1 because it's a constant used as
a prvalue, and therefore is not odr-used under
https://eel.is/c++draft/basic.def.odr#5.2

So I would limit this change to decl_constant_var_p.  Really we should also
be checking that the lvalue-rvalue conversion is applied, but that's more
complicated.


Thanks.  My previous version had it, but it didn't handle

   static constexpr int getval () { return 1; }

   template 
   void baz(typename conditional::type arg) { }

I'd say that "getval()" is one of "manifestly constant-evaluated expressions 
that
are not value-dependent", so it should be treated the same as B == VAL.


But it doesn't satisfy the 14.5 rule that corresponding names need to 
refer to the same entity; since getval names a function, it doesn't get 
the special exemption from that rule that VAL gets.


So this should not be treated the same as B == VAL.


I don't know if this is important to handle.  Do you want me to poke further or
should we just go with decl_constant_var_p and leave it at that for now?


Just decl_constant_var_p.

Jason



Re: [PATCH] Document -fexcess-precision=16.

2024-03-18 Thread Hongtao Liu
On Tue, Mar 19, 2024 at 12:16 AM Joseph Myers  wrote:
>
> On Mon, 18 Mar 2024, liuhongt wrote:
>
> > +If @option{-fexcess-precision=16} is specified, casts and assignments of
> > +@code{_Float16} and @code{bfloat16_t} cause value to be rounded to their
> > +semantic types if they're supported by the target.
>
> Isn't that option about rounding results of all operations, whether or not
> a cast or assignment is involved?  That's certainly what the brief mention
> of this option in extend.texi says, and fits the intent that
> -fexcess-precision=16 corresponds to FLT_EVAL_METHOD == 16.
Yes, how about this.


+If @option{-fexcess-precision=16} is specified, each operation of
+@code{_Float16} and @code{bfloat16_t} causes value to be rounded to their
+semantic types if they're supported by the target.

>
> --
> Joseph S. Myers
> josmy...@redhat.com
>


-- 
BR,
Hongtao


Re: [PATCH v3] c++: Fix handling of no-linkage decls for modules

2024-03-18 Thread Jason Merrill

On 3/16/24 07:23, Nathaniel Shead wrote:

On Mon, Mar 11, 2024 at 02:13:34PM -0400, Jason Merrill wrote:

On 3/8/24 18:18, Nathaniel Shead wrote:

On Fri, Mar 08, 2024 at 10:19:52AM -0500, Jason Merrill wrote:

On 3/7/24 21:55, Nathaniel Shead wrote:

On Mon, Nov 27, 2023 at 03:59:39PM +1100, Nathaniel Shead wrote:

On Thu, Nov 23, 2023 at 03:03:37PM -0500, Nathan Sidwell wrote:

On 11/20/23 04:47, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu. I don't have write
access.

-- >8 --

Block-scope declarations of functions or extern values are not allowed
when attached to a named module. Similarly, class member functions are
not inline if attached to a named module. However, in both these cases
we currently only check if the declaration is within the module purview;
it is possible for such a declaration to occur within the module purview
but not be attached to a named module (e.g. in an 'extern "C++"' block).
This patch makes the required adjustments.



Ah I'd been puzzling over the default inlinedness of  member-fns of
block-scope structs.  Could you augment the testcase to make sure that's
right too?

Something like:

// dg-module-do link
export module Mod;

export auto Get () {
 struct X { void Fn () {} };
 return X();
}


///
import Mod
void Frob () { Get().Fn(); }



I gave this a try and it indeed doesn't work correctly; 'Fn' needs to be
marked 'inline' for this to link (whether or not 'Get' itself is
inline). I've tried tracing the code to work out what's going on but
I've been struggling to work out how all the different flags (e.g.
TREE_PUBLIC, TREE_EXTERNAL, TREE_COMDAT, DECL_NOT_REALLY_EXTERN)
interact, which flags we want to be set where, and where the decision of
what function definitions to emit is actually made.

I did find that doing 'mark_used(decl)' on all member functions in
block-scope structs seems to work however, but I wonder if that's maybe
too aggressive or if there's something else we should be doing?


I got around to looking at this again, here's an updated version of this
patch. Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

(I'm not sure if 'start_preparsed_function' is the right place to be
putting this kind of logic or if it should instead be going in
'grokfndecl', e.g. decl.cc:10761 where the rules for making local
functions have no linkage are initially determined, but I found this
easier to implement: happy to move around though if preferred.)

-- >8 --

Block-scope declarations of functions or extern values are not allowed
when attached to a named module. Similarly, class member functions are
not inline if attached to a named module. However, in both these cases
we currently only check if the declaration is within the module purview;
it is possible for such a declaration to occur within the module purview
but not be attached to a named module (e.g. in an 'extern "C++"' block).
This patch makes the required adjustments.

While implementing this we discovered that block-scope local functions
are not correctly emitted, causing link failures; this patch also
corrects some assumptions here and ensures that they are emitted when
needed.

PR c++/112631

gcc/cp/ChangeLog:

* cp-tree.h (named_module_attach_p): New function.
* decl.cc (start_decl): Check for attachment not purview.
(grokmethod): Likewise.


These changes are OK; the others I want to consider more.


Thanks, I can commit this as a separate commit if you prefer?


Please.



Thanks, committed as r14-9501-gead3075406ece9.


+export auto n_n() {
+  internal();
+  struct X { void f() { internal(); } };
+  return X{};


Hmm, is this not a prohibited exposure?  Seems like X has no linkage because
it's at block scope, and the deduced return type names it.

I know we try to support this "voldemort" pattern, but is that actually
correct?


I had similar doubts, but this is not an especially uncommon pattern in
the wild either. I also asked some other people for their thoughts and
got told:

"no linkage" doesn't mean it's ill-formed to name it in other scopes.
It means a declaration in another scope cannot correspond to it

And that the wording in [basic.link] p2.4 is imprecise. (Apparently they
were going to raise a core issue about this too, I think?)

As for whether it's an exposure, looking at [basic.link] p15, the entity
'X' doesn't actually appear to be TU-local: it doesn't have a name with
internal linkage (no linkage is different) and is not declared or
introduced within the definition of a TU-local entity (n_n is not
TU-local).


Hmm, I think you're right.  And this rule:


-/* DR 757: A type without linkage shall not be used as the type of a
-   variable or function with linkage, unless
-   o the variable or function has extern "C" linkage (7.5 [dcl.link]), or
-   o the variable or function is not used (3.2 [basic.def.odr]) or is
-   defined in the same translation unit.


is no longer part of the 

Re: [gcc-15 2/3] RISC-V: avoid LUI based const mat: keep stack offsets aligned

2024-03-18 Thread Vineet Gupta



On 3/16/24 13:21, Jeff Law wrote:
> |   59944:add s0,sp,2047  <
> |   59948:mv  a2,a0
> |   5994c:mv  a3,a1
> |   59950:mv  a0,sp
> |   59954:li  a4,1
> |   59958:lui a1,0x1
> |   5995c:add s0,s0,1 <---
> |   59960:jal 59a3c
>
> SP here becomes unaligned, even if transitively which is undesirable as
> well as incorrect:
>   - ABI requires stack to be 8 byte aligned
>   - asm code looks weird and unexpected
>   - to the user it might falsely seem like a compiler bug even when not,
> specially when staring at asm for debugging unrelated issue.
> It's not ideal, but I think it's still ABI compliant as-is.  If it 
> wasn't, then I suspect things like virtual origins in Ada couldn't be 
> made ABI compliant.

To be clear are u suggesting ADD sp, sp, 2047 is ABI compliant ?
I'd still like to avoid it as I'm sure someone will complain about it.

>> With the patch, we get following correct code instead:
>>
>> | ..
>> | 59944: add s0,sp,2032
>> | ..
>> | 5995c: add s0,s0,16
> Alternately you could tighten the positive side of the range of the 
> splitter from patch 1/3 so that you could always use 2032 rather than 
> 2047 on the first addi.   ie instead of allowing 2048..4094, allow 
> 2048..4064.

2033..4064 vs. 2048..4094

Yeah I was a bit split about this as well. Since you are OK with either,
I'll keep them as-is and perhaps add this observation to commitlog.

Thx,
-Vineet


Re: [gcc-15 1/3] RISC-V: avoid LUI based const materialization ... [part of PR/106265]

2024-03-18 Thread Vineet Gupta



On 3/16/24 13:28, Jeff Law wrote:
>> Implementation Details (for posterity)
>> --
>>   - basic idea is to have a splitter selected via a new predicate for 
>> constant
>> being possible sum of two S12 and provide the transform.
>> This is however a 2 -> 2 transform which combine can't handle.
>> So we specify it using a define_insn_and_split.
> Right.  For the record it looks like a 2->2 case because of the 
> mvconst_internal define_insn_and_split.
>
> What I was talking about in the Tuesday meeting last week was the desire 
> to rethink that pattern precisely because it drives us to need to 
> implement patterns like yours rather than the more natural 3->2 or 4->3/2.

A few weeks, I did an experimental run removing that splitter isn't
catastrophic for SPEC, granted it is a pretty narrow view of the world :-)
Below is upstream vs. revert of mvconst_internal (for apples:apples just
the revert, none of my new splitter stuff)

 BaselineRevert mvconst_int
1,214,172,747,858 |  1,225,245,908,131 |  -0.91% <--
  740,618,658,160 |    743,873,857,461 |  -0.44% <-
  692,128,469,436 |    695,695,894,602 |  -0.52% <--
  190,811,495,310 |    190,934,386,665 |  -0.06%
  225,751,808,189 |    225,909,747,497 |  -0.07%
  220,364,237,915 |    220,466,640,640 |  -0.05%
  179,157,361,261 |    179,357,613,835 |  -0.11%
  219,313,921,837 |    219,436,712,114 |  -0.06%
  281,344,210,410 |    281,344,197,305 |   0.00%
  446,517,079,816 |    446,517,058,807 |   0.00%
  347,300,137,757 |    347,300,119,942 |   0.00%
  421,496,082,529 |    421,496,063,144 |   0.00%
  669,319,255,911 |    673,977,112,732 |  -0.70% <--
2,852,810,623,629 |  2,852,809,986,901 |   0.00%
1,855,884,349,001 |  1,855,837,810,109 |   0.00%
1,653,853,403,482 |  1,653,714,171,270 |   0.01%
2,974,347,287,057 |  2,970,520,074,342 |   0.13%
1,158,337,609,995 |  1,158,337,607,076 |   0.00%
1,019,181,486,091 |  1,020,576,716,760 |  -0.14%
1,738,961,517,942 |  1,736,024,569,247 |   0.17%
  849,330,280,930 |    848,955,738,855 |   0.04%
  276,584,892,794 |    276,457,202,331 |   0.05%
  913,410,589,335 |    913,407,101,214 |   0.00%
  903,864,384,529 |    903,800,709,452 |   0.01%
1,654,905,086,567 |  1,656,684,048,052 |  -0.11%
1,513,514,546,262 |  1,510,815,435,668 |   0.18%
1,641,980,078,831 |  1,602,410,552,874 |   2.41% <--
2,546,170,307,336 |  2,546,206,790,578 |   0.00%
1,983,551,321,388 |  1,979,562,936,994 |   0.20%
1,516,077,036,742 |  1,515,038,668,667 |   0.07%
2,056,386,745,670 |  2,055,592,903,700 |   0.04%
1,008,224,427,448 |  1,008,027,321,669 |   0.02%
1,169,305,141,789 |  1,169,028,937,430 |   0.02%
  363,827,037,541 |    361,982,880,800 |   0.51% <--
  906,649,110,459 |    909,651,995,900 |  -0.33% <-
  509,023,896,044 |    508,578,027,604 |   0.09%
  403,238,430 |    398,492,922 |   1.18%
  403,238,430 |    398,492,922 |   1.18%
38,917,902,479,830  38,886,374,486,212


Do note however that this takes us back to constant pool way of loading
complex constants (vs. bit fiddling and stitching). The 2.4% gain in
deepsjeng is exactly that and we need to decide what to do about it:
keep it as such or tweak it with a cost model change and/or make this
for everyone or have this per cpu tune - hard to tell whats the right
thing to do here.

P.S. Perhaps obvious but the prerequisite to revert is to tend to the
original linked PRs which will now start failing so that will hopefully
improve some of the above.

> Furthermore, I have a suspicion that there's logicals where we're going 
> to want to do something similar and if we keep the mvconst_internal 
> pattern they're all going to have to be implemented as 2->2s with a 
> define_insn_and_split rather than the more natural 3->2.

Naive question: why is define_split more natural vs. define_insn_and_split.
Is it because of the syntax/implementation or that one is more Combine
"centric" and other is more of a Split* Pass thing (roughly speaking of
course) or something else altogether ?

Would we have to revisit the new splitter (and perhaps audit the
existing define_insn_and_split patterns) if we were to go ahead with
this revert ?
>> +(define_insn_and_split "*add3_const_sum_of_two_s12"
>> +  [(set (match_operand:P 0 "register_operand" "=r,r")
>> +(plus:P (match_operand:P 1 "register_operand" " r,r")
>> +(match_operand:P 2 "const_two_s12"" MiG,r")))]
>> +  ""
>> +  "add %0,%1,%2"
>> +  ""
>> +  [(set (match_dup 0)
>> +(plus:P (match_dup 1) (match_dup 3)))
>> +   (set (match_dup 0)
>> +(plus:P (match_dup 0) (match_dup 4)))]
>> +{
>> +  int val = INTVAL (operands[2]);
>> +  if (SUM_OF_TWO_S12_P (val))
>> +{
>> +   operands[3] = GEN_INT (2047);
>> +   operands[4] = GEN_INT (val - 2047);
>> +}
>> +  else if (SUM_OF_TWO_S12_N (val))
>> +{
>> +   operands[3] = GEN_INT (-2048);
>> +   operands[4] = GEN_INT (val + 2048);
>> +}
>> +  else
>> +  gcc_unreachable ();
>> +}
>> + 

Re: [PATCH] c-c++-common/Wrestrict.c: fix some typos and enable for LLP64

2024-03-18 Thread Jonathan Yong

On 3/17/24 17:38, Mike Stump wrote:

On Feb 15, 2024, at 6:08 AM, Jonathan Yong wrote:


Attached patch OK?


Ok.


Copy/pasted for review convenience.

diff --git a/gcc/testsuite/c-c++-common/Wrestrict.c 
b/gcc/testsuite/c-c++-common/Wrestrict.c
index 4d005a618b3..57a3f67e21e 100644
--- a/gcc/testsuite/c-c++-common/Wrestrict.c
+++ b/gcc/testsuite/c-c++-common/Wrestrict.c
@@ -381,14 +381,14 @@ void test_memcpy_range_exceed (char *d, const char *s)
   T (d + i, s + 1, 3);   /* { dg-warning "accessing 3 bytes at offsets \\\[\[0-9\]+, 
\[0-9\]+] and 1 overlaps 1 byte" "memcpy" } */
  #if __SIZEOF_SIZE_T__ == 8
-  /* Verfiy the offset and size computation is correct.  The overlap
- offset mentioned in the warning plus sthe size of the access must
+  /* Verify the offset and size computation is correct.  The overlap
+ offset mentioned in the warning plus the size of the access must
  not exceed DIFF_MAX.  */
-  T (d, d + i, 5);   /* { dg-warning "accessing 5 bytes at offsets 0 and 
\\\[9223372036854775805, 9223372036854775807] overlaps 3 bytes at offset 9223372036854775802" 
"LP64" { target lp64 } } */
-  T (d + i, d, 5);   /* { dg-warning "accessing 5 bytes at offsets \\\[9223372036854775805, 
9223372036854775807] and 0 overlaps 3 bytes at offset 9223372036854775802" "LP64" { 
target lp64 } } */
+  T (d, d + i, 5);   /* { dg-warning "accessing 5 bytes at offsets 0 and 
\\\[9223372036854775805, 9223372036854775807] overlaps 3 bytes at offset 9223372036854775802" 
"LP64" { target { lp64 || llp64 } } } */
+  T (d + i, d, 5);   /* { dg-warning "accessing 5 bytes at offsets \\\[9223372036854775805, 
9223372036854775807] and 0 overlaps 3 bytes at offset 9223372036854775802" "LP64" { 
target { lp64 || llp64 } } } */
-  T (d, s + i, 5);   /* { dg-warning "accessing 5 bytes at offsets 0 and 
\\\[9223372036854775805, 9223372036854775807] overlaps 3 bytes at offset 9223372036854775802" 
"LP64" { target lp64 } } */
-  T (d + i, s, 5);   /* { dg-warning "accessing 5 bytes at offsets \\\[9223372036854775805, 
9223372036854775807] and 0 overlaps 3 bytes at offset 9223372036854775802" "LP64" { 
target lp64 } } */
+  T (d, s + i, 5);   /* { dg-warning "accessing 5 bytes at offsets 0 and 
\\\[9223372036854775805, 9223372036854775807] overlaps 3 bytes at offset 9223372036854775802" 
"LP64" { target { lp64 || llp64 } } } */
+  T (d + i, s, 5);   /* { dg-warning "accessing 5 bytes at offsets \\\[9223372036854775805, 
9223372036854775807] and 0 overlaps 3 bytes at offset 9223372036854775802" "LP64" { 
target { lp64 || llp64 } } } */
#elif __SIZEOF_SIZE_T__ == 4
   T (d, d + i, 5);   /* { dg-warning "accessing 5 bytes at offsets 0 and \\\[2147483645, 
2147483647] overlaps 3 bytes at offset 2147483642" "ILP32" { target ilp32 } } */
   T (d + i, d, 5);   /* { dg-warning "accessing 5 bytes at offsets \\\[2147483645, 2147483647] and 
0 overlaps 3 bytes at offset 2147483642" "ILP32" { target ilp32 } } 
*/<0001-c-c-common-Wrestrict.c-fix-some-typos-and-enable-for.patch>




Thanks all, pushed to master branch.


Re: [PATCH v2 00/13] Add aarch64-w64-mingw32 target

2024-03-18 Thread Andrew Pinski
On Mon, Mar 18, 2024 at 3:59 PM Fangrui Song  wrote:
>
> On Mon, Mar 18, 2024 at 3:10 PM Evgeny Karpov
>  wrote:
> >
> >
> > Monday, March 18, 2024 2:34 PM
> > Christophe Lyon wrote:
> >
> > > I had a look at the v2 series, and besides a minor comment patch #8, ISTM 
> > > than
> > > all the comments your received about v1 have been addressed, indeed.
> > >
> > > > While unit testing for the x86_64-w64-mingw32 target is still in
> > > > progress, the first 4 patches do not obviously change other targets,
> > > > including aarch64-linux-gnu.
> > > > Could they be merged once stage 1 starts, or could it be done even
> > > > now?
> > >
> > > What would be the benefit of committing only the first 4 patches?
> > > (whether now or when stage 1 reopens)
> > >
> > > Thanks,
> > >
> > > Christophe
> >
> > Work on obtaining regression test results for x86_x64-w64-mingw32 was still 
> > in progress at that moment.
> > The first 4 patches do not obviously change other targets, so it was safe 
> > to merge them.
> > Now, based on the regression test results 
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647967.html,
> > it appears safe to merge the entire series.
> >
> > Regards,
> > Evgeny
>
> LLVM has had an aarch64 mingw ABI support for a long time. Does this
> patch series introduce a different ABI?
> If yes, do you have a summary?
>
> Does the patch need any adaptation on the LLVM side, or should a
> different target triple be picked?
> I have always been wondering what "32" in "x86_x64-w64-mingw32" means.

It was always mingw32, it comes from win32 API interface which dates
to when windows also a 16bit API too.
The API has always been named win32 and Microsoft didn't rename it to
win64 though.

Thanks,
Andrew Pinski

>
> https://github.com/llvm/llvm-project/pull/78908 even introduced the
> first use of the triple "arm64ec-w64-mingw32" into llvm-project.
>
>
> --
> 宋方睿


Re: [PATCH v2 00/13] Add aarch64-w64-mingw32 target

2024-03-18 Thread Fangrui Song
On Mon, Mar 18, 2024 at 3:10 PM Evgeny Karpov
 wrote:
>
>
> Monday, March 18, 2024 2:34 PM
> Christophe Lyon wrote:
>
> > I had a look at the v2 series, and besides a minor comment patch #8, ISTM 
> > than
> > all the comments your received about v1 have been addressed, indeed.
> >
> > > While unit testing for the x86_64-w64-mingw32 target is still in
> > > progress, the first 4 patches do not obviously change other targets,
> > > including aarch64-linux-gnu.
> > > Could they be merged once stage 1 starts, or could it be done even
> > > now?
> >
> > What would be the benefit of committing only the first 4 patches?
> > (whether now or when stage 1 reopens)
> >
> > Thanks,
> >
> > Christophe
>
> Work on obtaining regression test results for x86_x64-w64-mingw32 was still 
> in progress at that moment.
> The first 4 patches do not obviously change other targets, so it was safe to 
> merge them.
> Now, based on the regression test results 
> https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647967.html,
> it appears safe to merge the entire series.
>
> Regards,
> Evgeny

LLVM has had an aarch64 mingw ABI support for a long time. Does this
patch series introduce a different ABI?
If yes, do you have a summary?

Does the patch need any adaptation on the LLVM side, or should a
different target triple be picked?
I have always been wondering what "32" in "x86_x64-w64-mingw32" means.

https://github.com/llvm/llvm-project/pull/78908 even introduced the
first use of the triple "arm64ec-w64-mingw32" into llvm-project.


-- 
宋方睿


[pushed] analyzer: fix ICEs due to sloppy types in bounds-checking [PR110902, PR110928, PR111305, PR111441]

2024-03-18 Thread David Malcolm
Various analyzer ICEs in our bugzilla relate to sloppy use of types
within bounds-checking.

The bounds-checking code works by comparing symbolic *bit* offsets, and
we don't have a good user-facing type that can represent such an offset
(ptrdiff_type_node is for *byte* offsets).

ana::svalue doesn't enforce valid combinations of types for things like
binary operations.  When I added the access diagrams for GCC 14, this
could lead to attempts to generate trees for such svalues, leading to
trees with invalid combinations of types (e.g. PLUS_EXPR or MULT_EXPR of
incompatible types), leading to ICEs inside the tree folding logic.

I tried two approaches to fixing this.

My first approach was to fix the type-handling throughout the
bounds-checking code to use correct types, using size_type_node for
sizes, ptrdiff_type_node for byte offsets, and trying ptrdiff_type_node
for bit offsets.  I implemented this, and it fixed the crashes, but
unfortunately it led to:
(a) numerous false negatives from the bounds-checking code, due to it
becoming unable to be sure that the accessed offset was beyond the valid
bounds, due to the expressions involved gaining complicated sets of
nested casts.
(b) ugly access diagrams full of nested casts (for capacities, gap
measurements, etc)

So my second approach, implemented in this patch, is to accept that we
don't have a tree type for representing bit offsets.  The patch
represents bit offsets using "typeless" symbolic values i.e. ones for
which get_type () is NULL_TREE, and implements enough support for basic
arithemetic as if these are mathematical integers (albeit ones for which
concrete values within an expression must fit within a signed wide int).
Such values can't be converted to tree, so the patch avoids such
conversions, instead implementing a new svalue::maybe_print_for_user for
printing them to a pretty_printer.  The patch uses ptrdiff_type_node for
byte offsets.

Doing so fixes the crashes, whilst appearing to preserve the behavior of
-Wanalyzer-out-of-bounds in my testing.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Successful run of analyzer integration tests on x86_64-pc-linux-gnu.
Pushed to trunk as r14-9527-g1579394c9ecf3d.

gcc/analyzer/ChangeLog:
PR analyzer/110902
PR analyzer/110928
PR analyzer/111305
PR analyzer/111441
* access-diagram.cc: Include "analyzer/analyzer-selftests.h".
(get_access_size_str): Reimplement for conversion of
implmementation of bit_size_expr from tree to const svalue &.  Use
svalue::maybe_print_for_user rather than tree printing routines.
(remove_ssa_names): Make non-static.
(bit_size_expr::get_formatted_str): Rename to...
(bit_size_expr::maybe_get_formatted_str): ...this, adding "model"
param and converting return type to a unique_ptr.  Update for
conversion of implementation of bit_size_expr from tree to
const svalue &.  Use svalue::maybe_print_for_user rather than tree
printing routines.
(bit_size_expr::print): Rename to...
(bit_size_expr::maybe_print_for_user): ...this, adding "model"
param and converting return type to bool.  Update for
conversion of implementation of bit_size_expr from tree to
const svalue &.  Use svalue::maybe_print_for_user rather than tree
printing routines.
(bit_size_expr::maybe_get_as_bytes): Add "mgr" param and convert
return type from tree to const svalue *; reimplement.
(access_range::access_range): Call strip_types when on region_offset
intializations.
(access_range::get_size): Update for conversion of implementation
of bit_size_expr from tree to const svalue &.
(access_operation::get_valid_bits): Pass manager to access_range
ctor.
(access_operation::maybe_get_invalid_before_bits): Likewise.
(access_operation::maybe_get_invalid_after_bits): Likewise.
(boundaries::add): Likewise.
(bit_to_table_map::populate): Add "mgr" param and pass it to
access_range ctor.
(access_diagram_impl::access_diagram_impl): Pass manager to
bit_to_table_map::populate.
(access_diagram_impl::maybe_add_gap): Use svalue rather than tree
for symbolic bit offsets.  Port to new bit_size_expr
representation.
(access_diagram_impl::add_valid_vs_invalid_ruler): Port to new
bit_size_expr representation.
(selftest::assert_eq_typeless_integer): New.
(ASSERT_EQ_TYPELESS_INTEGER): New.
(selftest::test_bit_size_expr_to_bytes): New.
(selftest::analyzer_access_diagram_cc_tests): New.
* access-diagram.h (class bit_size_expr): Reimplement, converting
implementation from tree to const svalue &.
(access_range::access_range): Add "mgr" param.  Call strip_types
on region_offset initializations.
(access_range::get_size): Update decl for 

[pushed] analyzer: support null operands in remove_ssa_names

2024-03-18 Thread David Malcolm
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Successful run of analyzer integration tests on x86_64-pc-linux-gnu.
Pushed to trunk as r14-9526-g3c2827d75ea8fb.

gcc/analyzer/ChangeLog:
* access-diagram.cc (remove_ssa_names): Support operands being
NULL_TREE, such as e.g. for COMPONENT_REF's operand 2.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/access-diagram.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/analyzer/access-diagram.cc b/gcc/analyzer/access-diagram.cc
index 2836308c019..a9c5c899950 100644
--- a/gcc/analyzer/access-diagram.cc
+++ b/gcc/analyzer/access-diagram.cc
@@ -311,7 +311,8 @@ remove_ssa_names (tree expr)
 return SSA_NAME_VAR (expr);
   tree t = copy_node (expr);
   for (int i = 0; i < TREE_OPERAND_LENGTH (expr); i++)
-TREE_OPERAND (t, i) = remove_ssa_names (TREE_OPERAND (expr, i));
+if (TREE_OPERAND (expr, i))
+  TREE_OPERAND (t, i) = remove_ssa_names (TREE_OPERAND (expr, i));
   return t;
 }
 
-- 
2.26.3



[PATCH v2 00/13] Add aarch64-w64-mingw32 target

2024-03-18 Thread Evgeny Karpov


Monday, March 18, 2024 2:34 PM
Christophe Lyon wrote:

> I had a look at the v2 series, and besides a minor comment patch #8, ISTM than
> all the comments your received about v1 have been addressed, indeed.
> 
> > While unit testing for the x86_64-w64-mingw32 target is still in
> > progress, the first 4 patches do not obviously change other targets,
> > including aarch64-linux-gnu.
> > Could they be merged once stage 1 starts, or could it be done even
> > now?
> 
> What would be the benefit of committing only the first 4 patches?
> (whether now or when stage 1 reopens)
> 
> Thanks,
> 
> Christophe

Work on obtaining regression test results for x86_x64-w64-mingw32 was still in 
progress at that moment.
The first 4 patches do not obviously change other targets, so it was safe to 
merge them.
Now, based on the regression test results 
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647967.html,
it appears safe to merge the entire series.

Regards,
Evgeny


Re: _LIBCXX_DEBUG value initialized singular iterators assert failures in std algorithms [PR104316]

2024-03-18 Thread François Dumont

Both committed now.

Just to confirm, those 2 last patches should be backported to gcc-13 
branch, right ?


I might have a try to update version.h but if you want to do it before 
don't hesitate.


François

On 18/03/2024 08:45, Jonathan Wakely wrote:

On Sun, 17 Mar 2024 at 18:14, François Dumont  wrote:

I was a little bit too confident below. After review of all _M_singular
usages I found another necessary fix.

After this one for sure we will be able to define
__cpp_lib_null_iterators even in Debug mode.

  libstdc++: Fix N3344 behavior on _Safe_iterator::_M_can_advance

  We shall be able to advance from a 0 offset a value-initialized
iterator.

  libstdc++-v3/ChangeLog:

  * include/debug/safe_iterator.tcc
(_Safe_iterator<>::_M_can_advance):
  Accept 0 offset advance on value-initialized iterator.
  * testsuite/23_containers/vector/debug/n3644.cc: New test case.

Ok to commit ?


OK, thanks.



François


On 17/03/2024 17:52, François Dumont wrote:

OK for trunk, thanks!

I think this is OK to backport to 13 too.

Maybe after this we can define the __cpp_lib_null_itetators macro for
debug mode?


After this fix of local_iterator I think we can indeed.

In fact the added 11316.cc was already passing for
unordered_set<>::local_iterator but simply because we were missing the
singular check. Both issues solved with this patch.

I found the version.def file to cleanup but no idea how to regenerate
version.h from it so I'll let you do it, ok ?

 libstdc++: Fix _Safe_local_iterator<>::_M_valid_range

 Unordered container local_iterator range shall not contain any
singular
 iterator unless both iterators are value-initialized.

 libstdc++-v3/ChangeLog:

 * include/debug/safe_local_iterator.tcc
 (_Safe_local_iterator::_M_valid_range): Add
_M_value_initialized and
 _M_singular checks.
 * testsuite/23_containers/unordered_set/debug/114316.cc:
New test case.


Ok to commit ?

François


[PATCH v2 08/13] aarch64: Add Cygwin and MinGW environments for AArch64

2024-03-18 Thread Evgeny Karpov
Monday, March 18, 2024 2:27 PM
Christophe Lyon wrote: 

> > +/* Disable SEH and declare the required SEH-related macros that are
> > +still needed for compilation.  */ #undef TARGET_SEH #define
> > +TARGET_SEH 0
> > +
> > +#define SSE_REGNO_P(N) 0
> > +#define GENERAL_REGNO_P(N) 0
> I think you forgot to add a comment to explain the above two lines.
> (it was requested during v1 review)
> 
> Thanks,
> 
> Christophe

Hi Christophe,

Thank you for the review!
The comment regarding SEH and SEH-related macros has been added two lines above.
It may not be obvious, but these macros are needed to emit SEH data in 
mingw/winnt.cc.
This group is separated by an empty line; however, it still relates to 
SEH-related macros.

Regards,
Evgeny


Re: [PATCH v2 00/13] Add aarch64-w64-mingw32 target

2024-03-18 Thread Radek Barton
Hello, everyone.

Currently, we are able to provide results of regression testing for 
`x86_64-w64-mingw32` target with `--enable-languages=c,lto,c++,fortran` running 
in WSL only.

The summarized results, both for the branch with patch set applied and its 
corresponding base branch, show:

517501 expected passes
4537 of expected failures
10828 unexpected failures
180 of unexpected successes
5934 of unresolved testcases
19113 of unsupported tests

which means that 98% of the tests ends in an expected way and we haven't 
detected a single regression between the branches.

The detailed results can be downloaded and reviewed at 
https://github.com/Windows-on-ARM-Experiments/mingw-woarm64-build/actions/runs/8327889403

Best regards,

Radek Bartoň


[PATCH] i386: Unify {general, timode}_scalar_chain::convert_op [PR111822]

2024-03-18 Thread Uros Bizjak
Recent PR111822 fix implemented REG_EH_REGION note copying to a STV converted
preload instruction in general_scalar_chain::convert_op.  However, the same
issue remains in timode_scalar_chain::convert_op.  Instead of copying the
newly introduced code to timode_scalar_chain::convert_op, the patch unifies
both functions to a common function.

PR target/111822

gcc/ChangeLog:

* config/i386/i386-features.cc (smode_convert_cst): New function
to handle SImode, DImode and TImode immediates, generalized from
timode_convert_cst.
(timode_convert_cst): Remove.
(scalar_chain::convert_op): Unify from
general_scalar_chain::convert_op and timode_scalar_chain::convert_op.
(general_scalar_chain::convert_op): Remove.
(timode_scalar_chain::convert_op): Remove.
(timode_scalar_chain::convert_insn): Update the call to
renamed timode_convert_cst.
* config/i386/i386-features.h (class scalar_chain):
Redeclare convert_op as protected class member.
(class general_calar_chain): Remove convert_op.
(class timode_scalar_chain): Ditto.

gcc/testsuite/ChangeLog:

* g++.target/i386/pr111822.C (dg-do): Compile only for ia32 targets.
(dg-options): Add -march=x86-64.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master, will be pushed to gcc-12 and gcc-13 branch.

Uros.
diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index c7d7a965901..e3e004d5526 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -980,14 +980,35 @@ scalar_chain::convert_reg (rtx_insn *insn, rtx dst, rtx 
src)
 REGNO (src), REGNO (dst), INSN_UID (insn));
 }
 
+/* Helper function to convert immediate constant X to vmode.  */
+static rtx
+smode_convert_cst (rtx x, enum machine_mode vmode)
+{
+  /* Prefer all ones vector in case of -1.  */
+  if (constm1_operand (x, GET_MODE (x)))
+return CONSTM1_RTX (vmode);
+
+  unsigned n = GET_MODE_NUNITS (vmode);
+  rtx *v = XALLOCAVEC (rtx, n);
+  v[0] = x;
+  for (unsigned i = 1; i < n; ++i)
+v[i] = const0_rtx;
+  return gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (n, v));
+}
+
 /* Convert operand OP in INSN.  We should handle
memory operands and uninitialized registers.
All other register uses are converted during
registers conversion.  */
 
 void
-general_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
+scalar_chain::convert_op (rtx *op, rtx_insn *insn)
 {
+  rtx tmp;
+
+  if (GET_MODE (*op) == V1TImode)
+return;
+
   *op = copy_rtx_if_shared (*op);
 
   if (GET_CODE (*op) == NOT
@@ -998,20 +1019,21 @@ general_scalar_chain::convert_op (rtx *op, rtx_insn 
*insn)
 }
   else if (MEM_P (*op))
 {
-  rtx_insn* eh_insn, *movabs = NULL;
-  rtx tmp = gen_reg_rtx (GET_MODE (*op));
+  rtx_insn *movabs = NULL;
 
   /* Emit MOVABS to load from a 64-bit absolute address to a GPR.  */
   if (!memory_operand (*op, GET_MODE (*op)))
{
- rtx tmp2 = gen_reg_rtx (GET_MODE (*op));
- movabs = emit_insn_before (gen_rtx_SET (tmp2, *op), insn);
+ tmp = gen_reg_rtx (GET_MODE (*op));
+ movabs = emit_insn_before (gen_rtx_SET (tmp, *op), insn);
 
- *op = tmp2;
+ *op = tmp;
}
 
-  eh_insn
-   = emit_insn_before (gen_rtx_SET (gen_rtx_SUBREG (vmode, tmp, 0),
+  tmp = gen_rtx_SUBREG (vmode, gen_reg_rtx (GET_MODE (*op)), 0);
+
+  rtx_insn *eh_insn
+   = emit_insn_before (gen_rtx_SET (copy_rtx (tmp),
 gen_gpr_to_xmm_move_src (vmode, *op)),
insn);
 
@@ -1028,33 +1050,17 @@ general_scalar_chain::convert_op (rtx *op, rtx_insn 
*insn)
}
}
 
-  *op = gen_rtx_SUBREG (vmode, tmp, 0);
+  *op = tmp;
 
   if (dump_file)
fprintf (dump_file, "  Preloading operand for insn %d into r%d\n",
 INSN_UID (insn), REGNO (tmp));
 }
   else if (REG_P (*op))
+*op = gen_rtx_SUBREG (vmode, *op, 0);
+  else if (CONST_SCALAR_INT_P (*op))
 {
-  *op = gen_rtx_SUBREG (vmode, *op, 0);
-}
-  else if (CONST_INT_P (*op))
-{
-  rtx vec_cst;
-  rtx tmp = gen_rtx_SUBREG (vmode, gen_reg_rtx (smode), 0);
-
-  /* Prefer all ones vector in case of -1.  */
-  if (constm1_operand (*op, GET_MODE (*op)))
-   vec_cst = CONSTM1_RTX (vmode);
-  else
-   {
- unsigned n = GET_MODE_NUNITS (vmode);
- rtx *v = XALLOCAVEC (rtx, n);
- v[0] = *op;
- for (unsigned i = 1; i < n; ++i)
-   v[i] = const0_rtx;
- vec_cst = gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (n, v));
-   }
+  rtx vec_cst = smode_convert_cst (*op, vmode);
 
   if (!standard_sse_constant_p (vec_cst, vmode))
{
@@ -1065,6 +1071,8 @@ general_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
  emit_insn_before (seq, insn);
}
 
+  tmp = gen_rtx_SUBREG (vmode, gen_reg_rtx (smode), 0);
+
   

Re: [PATCH] cpp: new built-in __EXP_COUNTER__

2024-03-18 Thread Jonathan Wakely
On Mon, 18 Mar 2024 at 16:46, Kaz Kylheku  wrote:
>
> On 2024-03-18 00:30, Jonathan Wakely wrote:
> >>+@item __EXP_COUNTER__
> >>+This macro's name means "(macro) expansion counter".
> >>+Outside of macro replacement sequences, it expands to the integer
> >>+token @code{1}.  This make it possible to easily test for the presence
> >>+of this feature using conditional directives such as
> >>+@code{#if __EXP_COUNTER__}.
> >
> > It's a macro, so you can just use '#ifdef __EXP_COUNTER__' to test if
> > it's supported. Is this additional behaviour necessary?
>
> Thanks for looking at the patch.
>
> The macro has to be defined, but it doesn't have to be 1.
>
> Outside a macro body, it could just appear defined
> with an empty value, as if by #define __EXP_COUNTER__.
>
> When I dump the predefined macros of a nearby GCC installation, I
> see very few empty ones:
>
>$ echo | gcc -dM -E - | awk 'NF == 2'
>#define __USER_LABEL_PREFIX__
>#define __REGISTER_PREFIX__
>
> The __EXP_COUNTER__ and __UEXP_COUNTER__ symbols are
> intended for suffix and infix use, so they are roughly in
> a kind of category with those those two.

I don't feel strongly about the value, it seems fine to define it as 1
too. I just thought it was a bit strange to explicitly mention testing
its value to detect support, when #ifdef seems simpler and more
"correct" (as in, it just checks if it's defined, and thevalue is
irrelevant).

>
> I will make that change, and also fix the grammar error
> and remove the conflict-promoting ChangeLog changes.
>



[PATCH] RISC-V: Fix C23 (...) functions returning large aggregates [PR114175]

2024-03-18 Thread Edwin Lu
We assume that TYPE_NO_NAMED_ARGS_STDARG_P don't have any named arguments and
there is nothing to advance, but that is not the case for (...) functions
returning by hidden reference which have one such artificial argument.
This causes gcc.dg/c23-stdarg-[68].c to fail

Fix the issue by checking if arg.type is NULL as r14-9503-g218d1749612
explains

Tested on linux rv64gcv.

gcc/ChangeLog:

PR target/114175
* config/riscv/riscv.cc (riscv_setup_incoming_varargs): Only skip
riscv_funciton_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P functions
if arg.type is NULL
---
 gcc/config/riscv/riscv.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 680c4a728e9..1f5dc33796b 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -5378,7 +5378,8 @@ riscv_setup_incoming_varargs (cumulative_args_t cum,
  argument.  Advance a local copy of CUM past the last "real" named
  argument, to find out how many registers are left over.  */
   local_cum = *get_cumulative_args (cum);
-  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)))
+  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))
+  || arg.type != NULL_TREE)
 riscv_function_arg_advance (pack_cumulative_args (_cum), arg);
 
   /* Found out how many registers we need to save.  */
-- 
2.34.1



[PATCH, committed] Fortran: error recovery in frontend optimization [PR103715]

2024-03-18 Thread Harald Anlauf
Dear all,

I've committed the attached simple & obvious patch for an ICE due to
an invalid read in frontend optimization after regtesting and an OK
by Jerry in the PR.

Pushed: https://gcc.gnu.org/g:3be2b8f475f22c531d6cef1b041c0573b3ea5133

As this PR is marked as a regression, I plan to backport to open
branches.

Thanks,
Harald

From 3be2b8f475f22c531d6cef1b041c0573b3ea5133 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Mon, 18 Mar 2024 19:36:59 +0100
Subject: [PATCH] Fortran: error recovery in frontend optimization [PR103715]

gcc/fortran/ChangeLog:

	PR fortran/103715
	* frontend-passes.cc (check_externals_expr): Prevent invalid read
	in case of mismatch of external subroutine with function.

gcc/testsuite/ChangeLog:

	PR fortran/103715
	* gfortran.dg/pr103715.f90: New test.
---
 gcc/fortran/frontend-passes.cc |  3 +++
 gcc/testsuite/gfortran.dg/pr103715.f90 | 12 
 2 files changed, 15 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/pr103715.f90

diff --git a/gcc/fortran/frontend-passes.cc b/gcc/fortran/frontend-passes.cc
index 06dfa1a3232..3c06018fdbb 100644
--- a/gcc/fortran/frontend-passes.cc
+++ b/gcc/fortran/frontend-passes.cc
@@ -5807,6 +5807,9 @@ check_externals_expr (gfc_expr **ep, int *walk_subtrees ATTRIBUTE_UNUSED,
   if (e->expr_type != EXPR_FUNCTION)
 return 0;

+  if (e->symtree && e->symtree->n.sym->attr.subroutine)
+return 0;
+
   sym = e->value.function.esym;
   if (sym == NULL)
 return 0;
diff --git a/gcc/testsuite/gfortran.dg/pr103715.f90 b/gcc/testsuite/gfortran.dg/pr103715.f90
new file mode 100644
index 000..72c5a31fb21
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr103715.f90
@@ -0,0 +1,12 @@
+! { dg-do compile }
+! PR fortran/103715 - ICE in gfc_find_gsymbol
+!
+! valgrind did report an invalid read in check_externals_procedure
+
+program p
+  select type (y => g()) ! { dg-error "Selector shall be polymorphic" }
+  end select
+  call g()
+end
+
+! { dg-prune-output "already being used as a FUNCTION" }
--
2.35.3



Re: [PATCH, v2] Fortran: fix for absent array argument passed to optional dummy [PR101135]

2024-03-18 Thread Mikael Morin

Le 17/03/2024 à 23:10, Harald Anlauf a écrit :

Hi Mikael,

On 3/17/24 22:04, Mikael Morin wrote:

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index 3673fa40720..a7717a8107e 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -7526,6 +7526,17 @@ gfc_get_dataptr_offset (stmtblock_t *block,
tree parm, tree desc, tree offset,

   /* Set the target data pointer.  */
   offset = gfc_build_addr_expr (gfc_array_dataptr_type (desc), tmp);
+
+  /* Check for optional dummy argument being present.  Arguments of
BIND(C)
+ procedures are excepted here since they are handled
differently.  */
+  if (expr->expr_type == EXPR_VARIABLE
+  && expr->symtree->n.sym->attr.dummy
+  && expr->symtree->n.sym->attr.optional
+  && !is_CFI_desc (NULL, expr))


I think the condition could additionally check the lack of subreferences.
But it's maybe not worth the trouble, and the patch is conservatively
correct as is, so OK.


I have thought about the conditions here for some time and did not
find better ones.  They need to be broad enough to catch the case
in gfortran.dg/missing_optional_dummy_6a.f90 that (according to the
tree-dump) was not properly handled previously and would have triggered
ubsan at some point in the future when someone tried to change that
testcase from currently dg-do compile to dg-do run...


No problem, as said it is conservatively correct.


(After the patch it would pass, but I didn't dare to change the dg-do).
Did it include cases not covered by the new testcase (which was quite 

complete already)?


I have pushed the patch as-is, but feel free to post testcases
not covered (or improperly covered) to narrow this down further...

The case I had in mind would only be a missed optimization, and probably 
not that important, so let's move on.


[PATCH] alpha: Fix alpha_setup_incoming_varargs [PR114175]

2024-03-18 Thread Jakub Jelinek
Hi!

Like in the r14-9503 change on x86-64, I think Alpha also needs to
function_arg_advance after the hidden return pointer argument if
any.
At least, the following patch changes the assembly of s1-s6 functions
on the https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647956.html
c23-stdarg-9.c testcase, and eyeballing the assembly for int f8 (...)
the ... args are passed in 16..21 registers and then on the stack,
while for struct S s8 (...) have hidden return pointer passed in 16
register and ... args in 17..21 registers and then on the stack, and
seems without this patch the incoming varargs setup does the wrong thing
(but I can't test on alpha easily).

Many targets seem to be unaffected, e.g. aarch64, arm, s390*, so I'm not
trying to change all targets together because such a change clearly isn't
needed e.g. for targets which use special register for the hidden return
pointer.

Ok for trunk?

2024-03-18  Jakub Jelinek  

PR target/114175
* config/alpha/alpha.cc (alpha_setup_incoming_varargs): Only skip
function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P functions
if arg.type is NULL.

--- gcc/config/alpha/alpha.cc.jj2024-01-05 15:22:21.762686175 +0100
+++ gcc/config/alpha/alpha.cc   2024-03-18 16:12:14.594761619 +0100
@@ -6090,7 +6090,8 @@ alpha_setup_incoming_varargs (cumulative
 {
   CUMULATIVE_ARGS cum = *get_cumulative_args (pcum);
 
-  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)))
+  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))
+  || arg.type != NULL_TREE)
 /* Skip the current argument.  */
 targetm.calls.function_arg_advance (pack_cumulative_args (), arg);
 

Jakub



[PING^0][PATCH V3 0/2] aarch64: Place target independent and dependent changed and unchanged code in one file.

2024-03-18 Thread Ajit Agarwal
Hello Richard/Alex:

Ping!

Please reply.

Thanks & Regards
Ajit

On 27/02/24 12:33 pm, Ajit Agarwal wrote:
> Hello Richard/Alex:
> 
> This patch has better diff with changed and unchanged code.
> Unchanged code and some of the changed code  will be extracted 
> into target independent headers and sources wherein target
> deoendent code changed and unchanged code would be in target
> dependent file like aarch64-ldp-fusion
> 
> Please review.
> 
> Thanks & Regards
> Ajit
> 
> On 23/02/24 4:41 pm, Ajit Agarwal wrote:
>> Hello Richard/Alex/Segher:
>>
>> This patch adds the changed code for target independent and
>> dependent code for load store fusion.
>>
>> Common infrastructure of load store pair fusion is
>> divided into target independent and target dependent
>> changed code.
>>
>> Target independent code is the Generic code with
>> pure virtual function to interface betwwen target
>> independent and dependent code.
>>
>> Target dependent code is the implementation of pure
>> virtual function for aarch64 target and the call
>> to target independent code.
>>
>> Bootstrapped for aarch64-linux-gnu.
>>
>> Thanks & Regards
>> Ajit
>>
>> aarch64: Place target independent and dependent changed code in one file.
>>
>> Common infrastructure of load store pair fusion is
>> divided into target independent and target dependent
>> changed code.
>>
>> Target independent code is the Generic code with
>> pure virtual function to interface betwwen target
>> independent and dependent code.
>>
>> Target dependent code is the implementation of pure
>> virtual function for aarch64 target and the call
>> to target independent code.
>>
>> 2024-02-23  Ajit Kumar Agarwal  
>>
>> gcc/ChangeLog:
>>
>>  * config/aarch64/aarch64-ldp-fusion.cc: Place target
>>  independent and dependent changed code.
>> ---
>>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 437 ---
>>  1 file changed, 305 insertions(+), 132 deletions(-)
>>
>> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
>> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> index 22ed95eb743..2ef22ff1e96 100644
>> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> @@ -40,10 +40,10 @@
>>  
>>  using namespace rtl_ssa;
>>  
>> -static constexpr HOST_WIDE_INT LDP_IMM_BITS = 7;
>> -static constexpr HOST_WIDE_INT LDP_IMM_SIGN_BIT = (1 << (LDP_IMM_BITS - 1));
>> -static constexpr HOST_WIDE_INT LDP_MAX_IMM = LDP_IMM_SIGN_BIT - 1;
>> -static constexpr HOST_WIDE_INT LDP_MIN_IMM = -LDP_MAX_IMM - 1;
>> +static constexpr HOST_WIDE_INT PAIR_MEM_IMM_BITS = 7;
>> +static constexpr HOST_WIDE_INT PAIR_MEM_IMM_SIGN_BIT = (1 << 
>> (PAIR_MEM_IMM_BITS - 1));
>> +static constexpr HOST_WIDE_INT PAIR_MEM_MAX_IMM = PAIR_MEM_IMM_SIGN_BIT - 1;
>> +static constexpr HOST_WIDE_INT PAIR_MEM_MIN_IMM = -PAIR_MEM_MAX_IMM - 1;
>>  
>>  // We pack these fields (load_p, fpsimd_p, and size) into an integer
>>  // (LFS) which we use as part of the key into the main hash tables.
>> @@ -138,8 +138,18 @@ struct alt_base
>>poly_int64 offset;
>>  };
>>  
>> +// Virtual base class for load/store walkers used in alias analysis.
>> +struct alias_walker
>> +{
>> +  virtual bool conflict_p (int ) const = 0;
>> +  virtual insn_info *insn () const = 0;
>> +  virtual bool valid () const  = 0;
>> +  virtual void advance () = 0;
>> +};
>> +
>> +
>>  // State used by the pass for a given basic block.
>> -struct ldp_bb_info
>> +struct pair_fusion
>>  {
>>using def_hash = nofree_ptr_hash;
>>using expr_key_t = pair_hash>;
>> @@ -161,13 +171,13 @@ struct ldp_bb_info
>>static const size_t obstack_alignment = sizeof (void *);
>>bb_info *m_bb;
>>  
>> -  ldp_bb_info (bb_info *bb) : m_bb (bb), m_emitted_tombstone (false)
>> +  pair_fusion (bb_info *bb) : m_bb (bb), m_emitted_tombstone (false)
>>{
>>  obstack_specify_allocation (_obstack, OBSTACK_CHUNK_SIZE,
>>  obstack_alignment, obstack_chunk_alloc,
>>  obstack_chunk_free);
>>}
>> -  ~ldp_bb_info ()
>> +  ~pair_fusion ()
>>{
>>  obstack_free (_obstack, nullptr);
>>  
>> @@ -177,10 +187,50 @@ struct ldp_bb_info
>>  bitmap_obstack_release (_bitmap_obstack);
>>}
>>}
>> +  void track_access (insn_info *, bool load, rtx mem);
>> +  void transform ();
>> +  void cleanup_tombstones ();
>> +  virtual void set_multiword_subreg (insn_info *i1, insn_info *i2,
>> + bool load_p) = 0;
>> +  virtual rtx gen_load_store_pair (rtx *pats,  rtx writeback,
>> +   bool load_p) = 0;
>> +  void merge_pairs (insn_list_t &, insn_list_t &,
>> +bool load_p, unsigned access_size);
>> +  virtual void transform_for_base (int load_size, access_group ) = 0;
>> +
>> +  bool try_fuse_pair (bool load_p, unsigned access_size,
>> + insn_info *i1, insn_info *i2);
>> +
>> +  bool fuse_pair (bool load_p, unsigned access_size,
>> +  int 

Re: [PATCH] arm: [MVE intrinsics] Fix support for loads [PR target/114323]

2024-03-18 Thread Richard Earnshaw (lists)




On 15/03/2024 20:08, Christophe Lyon wrote:

The testcase in this PR shows that we would load from an uninitialized
location, because the vld1 instrinsics are reported as "const". This
is because function_instance::reads_global_state_p() does not take
CP_READ_MEMORY into account.  Fixing this gives vld1 the "pure"
attribute instead, and solves the problem.

2024-03-15  Christophe Lyon  

PR target/114323
gcc/
* config/arm/arm-mve-builtins.cc
(function_instance::reads_global_state_p): Take CP_READ_MEMORY
into account.

gcc/testsuite/
* gcc.target/arm/mve/pr114323.c: New.


OK.

R.


---
  gcc/config/arm/arm-mve-builtins.cc  |  2 +-
  gcc/testsuite/gcc.target/arm/mve/pr114323.c | 22 +
  2 files changed, 23 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.target/arm/mve/pr114323.c

diff --git a/gcc/config/arm/arm-mve-builtins.cc 
b/gcc/config/arm/arm-mve-builtins.cc
index 2f2c0f4a02a..6a5775c67e5 100644
--- a/gcc/config/arm/arm-mve-builtins.cc
+++ b/gcc/config/arm/arm-mve-builtins.cc
@@ -746,7 +746,7 @@ function_instance::reads_global_state_p () const
if (flags & CP_READ_FPCR)
  return true;
  
-  return false;

+  return flags & CP_READ_MEMORY;
  }
  
  /* Return true if calls to the function could modify some form of

diff --git a/gcc/testsuite/gcc.target/arm/mve/pr114323.c 
b/gcc/testsuite/gcc.target/arm/mve/pr114323.c
new file mode 100644
index 000..bd9127b886a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/pr114323.c
@@ -0,0 +1,22 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_mve_hw } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#include 
+
+__attribute__((noipa))
+uint32x4_t foo (void) {
+  uint32x4_t V0 = vld1q_u32(((const uint32_t[4]){1, 2, 3, 4}));
+  return V0;
+}
+
+int main(void)
+{
+  uint32_t buf[4];
+ vst1q_u32 (buf, foo());
+
+  for (int i = 0; i < 4; i++)
+if (buf[i] != i+1)
+  __builtin_abort ();
+}


Re: [PATCH] rs6000: Fix issue in specifying PTImode as an attribute [PR106895]

2024-03-18 Thread Peter Bergner
On 3/18/24 9:36 AM, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Feb 23, 2024 at 03:04:13PM +0530, jeevitha wrote:
>> PTImode attribute assists in generating even/odd register pairs on 128 bits.
> 
> It is a mode, not an attribute.  Attributes are on declarations, while
> modes are on a much more fundamental level (every value has a mode, in
> GCC!)
> 
>> When the user specifies PTImode as an attribute, it breaks because there is 
>> no
>> internal type to handle this mode . We have created a tree node with dummy 
>> type
>> to handle PTImode.
> 
> You are talking about the mode attribute.  Aha.

Correct.



>> We are not documenting this dummy type since users are not
>> allowed to use this type externally.
> 
> Not sure what this is meant to mean?  What does "allowed to" mean, even?
> We do not forbid people from doing anything (we can discourage them
> though).  Or dso you mean something just doesn't work?

The use case is the Linux kernel will use the mode attribute as shown in
the test case.  The type she created was needed for the code to "work"
internally, but we don't want to advertise it to users, so that's why we're
not documenting it.  Yes, pesky users who go digging through the GCC source
could find it and use it, but we don't want to encourage them. :-)


>>  PR target/106895
>>  * config/rs6000/rs6000.h (enum rs6000_builtin_type_index): Add fields
>>  to hold PTImode type.
> 
> An enum does not have fields.  What do you actually mean?

Yeah, as per your follow-on comment, I think a simple "Add RS6000_BTI_INTPTI."
should suffice.


Peter



[PATCH] rs6000: Fix up setup_incoming_varargs [PR114175]

2024-03-18 Thread Jakub Jelinek
Hi!

The c23-stdarg-8.c test (as well as the new test below added to cover even
more cases) FAIL on powerpc64le-linux and presumably other powerpc* targets
as well.
Like in the r14-9503-g218d174961 change on x86-64 we need to advance
next_cum after the hidden return pointer argument even in case where
there are no user arguments before ... in C23.
The following patch does that.

There is another TYPE_NO_NAMED_ARGS_STDARG_P use later on:
  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))
  && targetm.calls.must_pass_in_stack (arg))
first_reg_offset += rs6000_arg_size (TYPE_MODE (arg.type), arg.type);
but I believe it was added there in r13-3549-g4fe34cdc unnecessarily,
when there is no hidden return pointer argument, arg.type is NULL and
must_pass_in_stack_var_size as well as must_pass_in_stack_var_size_or_pad
return false in that case, and for the TYPE_NO_NAMED_ARGS_STDARG_P
case with hidden return pointer argument that argument should have pointer
type and it is the first argument, so must_pass_in_stack shouldn't be true
for it either.

Bootstrapped/regtested on powerpc64le-linux, bootstrap/regtest on
powerpc64-linux running, ok for trunk?

2024-03-18  Jakub Jelinek  

PR target/114175
* config/rs6000/rs6000-call.cc (setup_incoming_varargs): Only skip
rs6000_function_arg_advance_1 for TYPE_NO_NAMED_ARGS_STDARG_P functions
if arg.type is NULL.

* gcc.dg/c23-stdarg-9.c: New test.

--- gcc/config/rs6000/rs6000-call.cc.jj 2024-01-03 12:01:19.645532834 +0100
+++ gcc/config/rs6000/rs6000-call.cc2024-03-18 11:36:02.376846802 +0100
@@ -2253,7 +2253,8 @@ setup_incoming_varargs (cumulative_args_
 
   /* Skip the last named argument.  */
   next_cum = *get_cumulative_args (cum);
-  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)))
+  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))
+  || arg.type != NULL_TREE)
 rs6000_function_arg_advance_1 (_cum, arg.mode, arg.type, arg.named,
   0);
 
--- gcc/testsuite/gcc.dg/c23-stdarg-9.c.jj  2024-03-18 11:46:17.281200214 
+0100
+++ gcc/testsuite/gcc.dg/c23-stdarg-9.c 2024-03-18 11:46:26.826065998 +0100
@@ -0,0 +1,284 @@
+/* Test C23 variadic functions with no named parameters, or last named
+   parameter with a declaration not allowed in C17.  Execution tests.  */
+/* { dg-do run } */
+/* { dg-options "-O2 -std=c23 -pedantic-errors" } */
+
+#include 
+
+struct S { int a[1024]; };
+
+int
+f1 (...)
+{
+  int r = 0;
+  va_list ap;
+  va_start (ap);
+  r += va_arg (ap, int);
+  va_end (ap);
+  return r;
+}
+
+int
+f2 (...)
+{
+  int r = 0;
+  va_list ap;
+  va_start (ap);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  va_end (ap);
+  return r;
+}
+
+int
+f3 (...)
+{
+  int r = 0;
+  va_list ap;
+  va_start (ap);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  va_end (ap);
+  return r;
+}
+
+int
+f4 (...)
+{
+  int r = 0;
+  va_list ap;
+  va_start (ap);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  va_end (ap);
+  return r;
+}
+
+int
+f5 (...)
+{
+  int r = 0;
+  va_list ap;
+  va_start (ap);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  va_end (ap);
+  return r;
+}
+
+int
+f6 (...)
+{
+  int r = 0;
+  va_list ap;
+  va_start (ap);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  va_end (ap);
+  return r;
+}
+
+int
+f7 (...)
+{
+  int r = 0;
+  va_list ap;
+  va_start (ap);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  va_end (ap);
+  return r;
+}
+
+int
+f8 (...)
+{
+  int r = 0;
+  va_list ap;
+  va_start (ap);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  va_end (ap);
+  return r;
+}
+
+struct S
+s1 (...)
+{
+  int r = 0;
+  va_list ap;
+  va_start (ap);
+  r += va_arg (ap, int);
+  va_end (ap);
+  struct S s = {};
+  s.a[0] = r;
+  return s;
+}
+
+struct S
+s2 (...)
+{
+  int r = 0;
+  va_list ap;
+  va_start (ap);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  va_end (ap);
+  struct S s = {};
+  s.a[0] = r;
+  return s;
+}
+
+struct S
+s3 (...)
+{
+  int r = 0;
+  va_list ap;
+  va_start (ap);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  va_end (ap);
+  struct S s = {};
+  s.a[0] = r;
+  return s;
+}
+
+struct S
+s4 (...)
+{
+  int r = 0;
+  va_list ap;
+  va_start (ap);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  r += va_arg (ap, int);
+  

Re: [PATCH] testsuite: Turn errors back into warnings in arm/acle/cde-mve-error-2.c

2024-03-18 Thread Richard Earnshaw (lists)




On 15/03/2024 15:13, Thiago Jung Bauermann wrote:


Hello,

"Richard Earnshaw (lists)"  writes:


On 13/01/2024 20:46, Thiago Jung Bauermann wrote:

diff --git a/gcc/testsuite/gcc.target/arm/acle/cde-mve-error-2.c 
b/gcc/testsuite/gcc.target/arm/acle/cde-mve-error-2.c
index 5b7774825442..da283a06a54d 100644
--- a/gcc/testsuite/gcc.target/arm/acle/cde-mve-error-2.c
+++ b/gcc/testsuite/gcc.target/arm/acle/cde-mve-error-2.c
@@ -2,6 +2,7 @@

  /* { dg-do assemble } */
  /* { dg-require-effective-target arm_v8_1m_main_cde_mve_fp_ok } */
+/* { dg-options "-fpermissive" } */
  /* { dg-add-options arm_v8_1m_main_cde_mve_fp } */

  /* The error checking files are split since there are three kinds of
@@ -115,73 +116,73 @@ uint8x16_t test_bad_immediates (uint8x16_t n, uint8x16_t 
m, int someval,

/* `imm' is of wrong type.  */
accum += __arm_vcx1q_u8 (0, "");/* { dg-error {argument 
2 to '__builtin_arm_vcx1qv16qi' must be a constant immediate in range \[0-4095\]} } */
-  /* { dg-warning {passing argument 2 of '__builtin_arm_vcx1qv16qi' makes integer from 
pointer without a cast \[-Wint-conversion\]} "" { target *-*-* } 117 } */
+  /* { dg-warning {passing argument 2 of '__builtin_arm_vcx1qv16qi' makes integer from 
pointer without a cast \[-Wint-conversion\]} "" { target *-*-* } 118 } */


Absolute line numbers are a pain, but I think we can use '.-1' (without the 
quotes) in
these cases to minimize the churn.


That worked, thank you for the tip.


If that works, ok with that change.


I took the opportunity to request commit access to the GCC repo so that
I can commit the patch myself. Sorry for the delay. I'll commit it as
soon as I get it.

Thank you for the patch review! I'm including below the updated version.


I pushed this, thanks.

R.



--
Thiago


 From 78e70788da5ed849d7828b0219d3aa8955ad0fea Mon Sep 17 00:00:00 2001
From: Thiago Jung Bauermann 
Date: Sat, 13 Jan 2024 14:28:07 -0300
Subject: [PATCH v2] testsuite: Turn errors back into warnings in
  arm/acle/cde-mve-error-2.c
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Since commit 2c3db94d9fd ("c: Turn int-conversion warnings into
permerrors") the test fails with errors such as:

   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   (test for errors, line 
32)
   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   (test for errors, line 
33)
   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   (test for errors, line 
34)
   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   (test for errors, line 
35)
 ⋮
   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   at line 118 (test for 
warnings, line 117)
   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   (test for errors, line 
119)
   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   at line 120 (test for 
warnings, line 119)
   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   (test for errors, line 
121)
   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   at line 122 (test for 
warnings, line 121)
   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   (test for errors, line 
123)
   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   at line 124 (test for 
warnings, line 123)
   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   (test for errors, line 
125)
 ⋮
   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0  (test for excess errors)

There's a total of 1016 errors.  Here's a sample of the excess errors:

   Excess errors:
   /path/gcc.git/gcc/testsuite/gcc.target/arm/acle/cde-mve-error-2.c:117:31: 
error: passing argument 2 of '__builtin_arm_vcx1qv16qi' makes integer from 
pointer without a cast [-Wint-conversion]
   /path/gcc.git/gcc/testsuite/gcc.target/arm/acle/cde-mve-error-2.c:119:3: 
error: passing argument 3 of '__builtin_arm_vcx1qav16qi' makes integer from 
pointer without a cast [-Wint-conversion]
   /path/gcc.git/gcc/testsuite/gcc.target/arm/acle/cde-mve-error-2.c:121:3: 
error: passing argument 3 of '__builtin_arm_vcx2qv16qi' makes integer from 
pointer without a cast [-Wint-conversion]
   /path/gcc.git/gcc/testsuite/gcc.target/arm/acle/cde-mve-error-2.c:123:3: 
error: passing argument 3 of '__builtin_arm_vcx2qv16qi' makes integer from 
pointer without a cast [-Wint-conversion]

The test expects these messages to be warnings, not errors.  My first try
was to change it to expect them as errors instead.  This didn't work, IIUC
because the error prevents the compiler from continuing processing the file
and thus other errors which are expected by the test don't get emitted.

Therefore, add -fpermissive so that the test behaves as it did previously.
Because of the additional line in the header, the line numbers of the
expected warnings don't match anymore so replace them with ".-1" as
suggested by Richard Earnshaw.

Tested on armv8l-linux-gnueabihf.

gcc/testsuite/ChangeLog:
* gcc.target/arm/acle/cde-mve-error-2.c: Add -fpermissive.
---
  .../gcc.target/arm/acle/cde-mve-error-2.c 

Re: [PATCH, OpenACC 2.7] struct/array reductions for Fortran

2024-03-18 Thread Thomas Schwinge
Hi Chung-Lin!

Thanks for your work here, which I'm beginning to look into (prerequisite
"[PATCH, OpenACC 2.7] Implement reductions for arrays and structs",
first, of course); it'll take me some time.


In non-offloading testing, I noticed for x86_64-pc-linux-gnu '-m32':

+PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O0  (test for excess errors)
+PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O0  execution test
+PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O1  (test for excess errors)
+FAIL: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O1  execution test
+PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O2  (test for excess errors)
+FAIL: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O2  execution test
+PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  (test for excess errors)
+FAIL: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  execution test
+PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O3 -g  (test for excess errors)
+FAIL: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O3 -g  execution test
+PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -Os  (test for excess errors)
+FAIL: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -Os  execution test

With optimizations enabled, it runs into 'STOP 4'.

Per '-Wextra':

[...]/libgomp.oacc-fortran/reduction-13.f90:40:6: Warning: Inequality 
comparison for REAL(4) at (1) [-Wcompare-reals]
[...]/libgomp.oacc-fortran/reduction-13.f90:63:6: Warning: Inequality 
comparison for REAL(4) at (1) [-Wcompare-reals]
[...]/libgomp.oacc-fortran/reduction-13.f90:64:6: Warning: Inequality 
comparison for REAL(8) at (1) [-Wcompare-reals]

Do we need to allow for some epsilon (generally in such test cases), or
is there another problem?

For reference:

On 2024-02-08T22:47:13+0800, Chung-Lin Tang  wrote:
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-13.f90
> @@ -0,0 +1,66 @@
> +! { dg-do run }
> +
> +! record type reductions
> +
> +program reduction_13
> +  implicit none
> +
> +  type t1
> + integer :: i
> + real :: r
> +  end type t1
> +
> +  type t2
> + real :: r
> + integer :: i
> + double precision :: d
> +  end type t2
> +
> +  integer, parameter :: n = 10, ng = 8, nw = 4, vl = 32
> +  integer :: i
> +  type(t1) :: v1, a1
> +  type (t2) :: v2, a2
> +
> +  v1%i = 0
> +  v1%r = 0
> +  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(v1)
> +  !$acc loop reduction (+:v1)
> +  do i = 1, n
> + v1%i = v1%i + 1
> + v1%r = v1%r + 2
> +  end do
> +  !$acc end parallel
> +  a1%i = 0
> +  a1%r = 0
> +  do i = 1, n
> + a1%i = a1%i + 1
> + a1%r = a1%r + 2
> +  end do
> +  if (v1%i .ne. a1%i) STOP 1
> +  if (v1%r .ne. a1%r) STOP 2
> +
> +  v2%i = 1
> +  v2%r = 1
> +  v2%d = 1
> +  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(v2)
> +  !$acc loop reduction (*:v2)
> +  do i = 1, n
> + v2%i = v2%i * 2
> + v2%r = v2%r * 1.1
> + v2%d = v2%d * 1.3
> +  end do
> +  !$acc end parallel
> +  a2%i = 1
> +  a2%r = 1
> +  a2%d = 1
> +  do i = 1, n
> + a2%i = a2%i * 2
> + a2%r = a2%r * 1.1
> + a2%d = a2%d * 1.3
> +  end do
> +
> +  if (v2%i .ne. a2%i) STOP 3
> +  if (v2%r .ne. a2%r) STOP 4
> +  if (v2%d .ne. a2%d) STOP 5
> +
> +end program reduction_13


Grüße
 Thomas


Re: [PATCH] cpp: new built-in __EXP_COUNTER__

2024-03-18 Thread Kaz Kylheku
On 2024-03-18 00:30, Jonathan Wakely wrote:
>>+@item __EXP_COUNTER__
>>+This macro's name means "(macro) expansion counter".
>>+Outside of macro replacement sequences, it expands to the integer
>>+token @code{1}.  This make it possible to easily test for the presence
>>+of this feature using conditional directives such as
>>+@code{#if __EXP_COUNTER__}.
> 
> It's a macro, so you can just use '#ifdef __EXP_COUNTER__' to test if
> it's supported. Is this additional behaviour necessary?

Thanks for looking at the patch.

The macro has to be defined, but it doesn't have to be 1.

Outside a macro body, it could just appear defined
with an empty value, as if by #define __EXP_COUNTER__.

When I dump the predefined macros of a nearby GCC installation, I
see very few empty ones:

   $ echo | gcc -dM -E - | awk 'NF == 2'
   #define __USER_LABEL_PREFIX__
   #define __REGISTER_PREFIX__

The __EXP_COUNTER__ and __UEXP_COUNTER__ symbols are
intended for suffix and infix use, so they are roughly in
a kind of category with those those two.

I will make that change, and also fix the grammar error
and remove the conflict-promoting ChangeLog changes.


Re: [PATCH v6 3/5] Use the .ACCESS_WITH_SIZE in builtin object size.

2024-03-18 Thread Qing Zhao


> On Mar 18, 2024, at 12:30, Siddhesh Poyarekar  wrote:
> 
> On 2024-03-18 12:28, Qing Zhao wrote:
 This should probably bail out if object_size_type & OST_DYNAMIC == 0.
>>> Okay. Will add this.
>> When add this into access_with_size_object_size, I found some old bugs in 
>> early_object_sizes_execute_one, and fixed them as well.
> 
> Would you be able to isolate this fix and post them separately?  I reckon 
> they would be relevant for gcc 14 too.

Yes, that’s a good idea, I can do that.
No specific testing case for it, though. 

thanks.

Qing

> 
> Thanks,
> Sid



Re: [PATCH v6 3/5] Use the .ACCESS_WITH_SIZE in builtin object size.

2024-03-18 Thread Siddhesh Poyarekar

On 2024-03-18 12:28, Qing Zhao wrote:

This should probably bail out if object_size_type & OST_DYNAMIC == 0.

Okay. Will add this.


When add this into access_with_size_object_size, I found some old bugs 
in early_object_sizes_execute_one, and fixed them as well.




Would you be able to isolate this fix and post them separately?  I 
reckon they would be relevant for gcc 14 too.


Thanks,
Sid


Re: [PATCH v6 3/5] Use the .ACCESS_WITH_SIZE in builtin object size.

2024-03-18 Thread Qing Zhao


On Mar 13, 2024, at 15:17, Qing Zhao  wrote:



On Mar 11, 2024, at 13:11, Siddhesh Poyarekar  wrote:



On 2024-02-16 14:47, Qing Zhao wrote:
gcc/ChangeLog:
* tree-object-size.cc (access_with_size_object_size): New function.
(call_object_size): Call the new function.
gcc/testsuite/ChangeLog:
* gcc.dg/builtin-object-size-common.h: Add a new macro EXPECT.
* gcc.dg/flex-array-counted-by-3.c: New test.
* gcc.dg/flex-array-counted-by-4.c: New test.
* gcc.dg/flex-array-counted-by-5.c: New test.
---
 .../gcc.dg/builtin-object-size-common.h   |  11 ++
 .../gcc.dg/flex-array-counted-by-3.c  |  63 +++
 .../gcc.dg/flex-array-counted-by-4.c  | 178 ++
 .../gcc.dg/flex-array-counted-by-5.c  |  48 +
 gcc/tree-object-size.cc   |  59 ++
 5 files changed, 359 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-5.c
diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-common.h 
b/gcc/testsuite/gcc.dg/builtin-object-size-common.h
index 66ff7cdd953a..b677067c6e6b 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-common.h
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-common.h
@@ -30,3 +30,14 @@ unsigned nfails = 0;
   __builtin_abort ();   \
 return 0;   \
   } while (0)
+
+#define EXPECT(p, _v) do {   \
+  size_t v = _v;   \
+  if (p == v)   \
+__builtin_printf ("ok:  %s == %zd\n", #p, p);   \
+  else   \
+{   \
+  __builtin_printf ("WAT: %s == %zd (expected %zd)\n", #p, p, v);   \
+  FAIL ();   \
+}   \
+} while (0);
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c 
b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
new file mode 100644
index ..0066c32ca808
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
@@ -0,0 +1,63 @@
+/* test the attribute counted_by and its usage in
+ * __builtin_dynamic_object_size.  */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "builtin-object-size-common.h"
+
+struct flex {
+  int b;
+  int c[];
+} *array_flex;
+
+struct annotated {
+  int b;
+  int c[] __attribute__ ((counted_by (b)));
+} *array_annotated;
+
+struct nested_annotated {
+  struct {
+union {
+  int b;
+  float f;
+};
+int n;
+  };
+  int c[] __attribute__ ((counted_by (b)));
+} *array_nested_annotated;
+
+void __attribute__((__noinline__)) setup (int normal_count, int attr_count)
+{
+  array_flex
+= (struct flex *)malloc (sizeof (struct flex)
+  + normal_count *  sizeof (int));
+  array_flex->b = normal_count;
+
+  array_annotated
+= (struct annotated *)malloc (sizeof (struct annotated)
+   + attr_count *  sizeof (int));
+  array_annotated->b = attr_count;
+
+  array_nested_annotated
+= (struct nested_annotated *)malloc (sizeof (struct nested_annotated)
+  + attr_count *  sizeof (int));
+  array_nested_annotated->b = attr_count;
+
+  return;
+}
+
+void __attribute__((__noinline__)) test ()
+{
+EXPECT(__builtin_dynamic_object_size(array_flex->c, 1), -1);
+EXPECT(__builtin_dynamic_object_size(array_annotated->c, 1),
+array_annotated->b * sizeof (int));
+EXPECT(__builtin_dynamic_object_size(array_nested_annotated->c, 1),
+array_nested_annotated->b * sizeof (int));
+}
+
+int main(int argc, char *argv[])
+{
+  setup (10,10);
+  test ();
+  DONE ();
+}
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c 
b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
new file mode 100644
index ..3ce7f3545549
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
@@ -0,0 +1,178 @@
+/* test the attribute counted_by and its usage in
+__builtin_dynamic_object_size: what's the correct behavior when the
+allocation size mismatched with the value of counted_by attribute?
+we should always use the latest value that is hold by the counted_by
+field.  */
+/* { dg-do run } */
+/* { dg-options "-O -fstrict-flex-arrays=3" } */
+
+#include "builtin-object-size-common.h"
+
+struct annotated {
+  size_t foo;
+  char others;
+  char array[] __attribute__((counted_by (foo)));
+};
+
+#define noinline __attribute__((__noinline__))
+#define SIZE_BUMP 10
+#define MAX(a, b) ((a) > (b) ? (a) : (b))
+
+/* In general, Due to type casting, the type for the pointee of a pointer
+   does not say anything about the object it points to,
+   So, __builtin_object_size can not directly use the type of the pointee
+   to decide the size of the object the pointer points to.
+
+   there are only two reliable ways:
+   A. observed allocations  (call to the allocation functions in the routine)
+   B. observed accesses (read or write access to the location of the
+ pointer points to)
+
+   that provide information about the type/existence of an object at
+   the corresponding address.
+
+   for A, we use the 

Re: [PATCH] Document -fexcess-precision=16.

2024-03-18 Thread Joseph Myers
On Mon, 18 Mar 2024, liuhongt wrote:

> +If @option{-fexcess-precision=16} is specified, casts and assignments of
> +@code{_Float16} and @code{bfloat16_t} cause value to be rounded to their
> +semantic types if they're supported by the target.

Isn't that option about rounding results of all operations, whether or not 
a cast or assignment is involved?  That's certainly what the brief mention 
of this option in extend.texi says, and fits the intent that 
-fexcess-precision=16 corresponds to FLT_EVAL_METHOD == 16.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-18 Thread Uros Bizjak
On Mon, Mar 18, 2024 at 3:51 PM Segher Boessenkool
 wrote:
>
> On Thu, Mar 07, 2024 at 11:46:54PM +0100, Uros Bizjak wrote:
> > > Can't you just describe the dataflow then, without an unspec?  An unspec
> > > by definition does some (unspecified) operation on the data.
> >
> > Previously, it was defined as:
> >
> >  (define_insn "*pushfl2"
> >[(set (match_operand:W 0 "push_operand" "=<")
> >  (match_operand:W 1 "flags_reg_operand"))]
> >
> > But Wmode, AKA SI/DImode is not CCmode. And as said in my last
> > message, nothing prevents current source code to try to update the CC
> > register here.
>
> So you can use an unspec just to convert the flags reg to an integer?
> To make an integer representation of flags reg contents.

Yes, this is correct. But please note the v3 patch, where the mode
update is made at the correct location. Quote from the patch:

Replace cc_use_loc with the entire new RTX only in case cc_use_loc satisfies
COMPARISON_P predicate.  Otherwise scan the entire cc_use_loc RTX for CC reg
to be updated with a new mode.

> Or is that what we started with here?

The reason for the patch is that when CC reg is used outside
comparison RTX, the combine tries to update CC reg mode where it is
used after the combined instruction. This happens on extremely rare
occasions, but when it happens, combine assumes that it is used
exclusively in the comparison RTX and uses "SUBST (XEXP (*cc_use_loc,
0), ...);". XEXP (*cc_use_loc, 0) will segfault when CC reg is
referred in a simple SET assignment, not only when it is referred in
an UNSPEC. Please note that the comparison RTX is handled a few lines
above, where my patch also fixes the "???" issue.

Uros.


Re: [PATCH v2] RISC-V: Introduce option -mrvv-max-lmul for RVV autovec

2024-03-18 Thread Robin Dapp
LGTM as well.

Regards
 Robin



Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-18 Thread Uros Bizjak
On Mon, Mar 18, 2024 at 3:46 PM Segher Boessenkool
 wrote:
>
> On Thu, Mar 07, 2024 at 11:27:28PM +0100, Uros Bizjak wrote:
> > On Thu, Mar 7, 2024 at 11:07 PM Uros Bizjak  wrote:
> > > > >  (unspec:DI [
> > > > >  (reg:CC 17 flags)
> > > > >  ] UNSPEC_PUSHFL)
> > > >
> > > > But that is invalid RTL?  The only valid use of a CC is written as
> > > > cc-compared-to-0.  It means "the result of something compared to 0,
> > > > stored in that cc reg".
> > > >
> > > > (And you can copy a CC reg around, but that is not a use ;-) )
> >
> > Hm... under this premise, we can also say that any form that is not
> > cc-compared-to-0 is not a use.
>
> Well, no.  All uses of CC are written as comparisons to 0, or are pure
> dataflow.  Anything else is not "not a use" but just invalid.
>
> > Consequently, if it is not a use, then
> > the CC reg should not be updated at its use location, so my v1 patch,
> > where we simply skip the update (but retain the combined insn),
> > actually makes sense.
>
> With more asserts, perhaps.
>
> > In this concrete situation, we don't care about CC register mode in
> > the PUSHFL insn. And we should not change CC reg mode of the use,
> > because any other mode than the generic CCmode won't be recognized by
> > the insn pattern.
>
> You always care about the CC mode, that is why you always write it as
> comparison, so the backend can choose a mode based on what the flag bits
> mean in this context.  For an extreme example look at 390, but on pretty
> much any target both signed and unsigned comparisons use the same flag
> bits, and maybe fp comparisons as well.

After some more thoughts, I came up with v3 [1], where the mode is
updated also in non-comparison use. But instead of a blind SUBST, we
find the correct use location and update the mode accordingly. If the
new mode is not recognized in the use insn, then the whole combination
is cancelled. Since x86 pushfl does not care about CCmode, I have
changed it to handle all CCmodes. Please note, that with this
approach, the backend can choose a mode.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647634.html

> But pushfl does sound like pure dataflow.  Why is this a builtin, is
> it ever a good idea if the user writes stuff the compiler can do a
> better job doing itself, not to mention it is way easier for the
> compiler anyway?  :-)

The mode of the pushfl has to be correct, because it operates on
stack. Unfortunately, the plain move can't do this due to
WORDmode/CCmode mismatch in the rtx, so UNSPEC is necessary. But it IS
a pure dataflow instruction - it is a PUSH.

Uros.


Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-18 Thread Segher Boessenkool
On Thu, Mar 07, 2024 at 11:46:54PM +0100, Uros Bizjak wrote:
> > Can't you just describe the dataflow then, without an unspec?  An unspec
> > by definition does some (unspecified) operation on the data.
> 
> Previously, it was defined as:
> 
>  (define_insn "*pushfl2"
>[(set (match_operand:W 0 "push_operand" "=<")
>  (match_operand:W 1 "flags_reg_operand"))]
> 
> But Wmode, AKA SI/DImode is not CCmode. And as said in my last
> message, nothing prevents current source code to try to update the CC
> register here.

So you can use an unspec just to convert the flags reg to an integer?
To make an integer representation of flags reg contents.

Or is that what we started with here?


Segher


Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-18 Thread Segher Boessenkool
On Thu, Mar 07, 2024 at 11:27:28PM +0100, Uros Bizjak wrote:
> On Thu, Mar 7, 2024 at 11:07 PM Uros Bizjak  wrote:
> > > >  (unspec:DI [
> > > >  (reg:CC 17 flags)
> > > >  ] UNSPEC_PUSHFL)
> > >
> > > But that is invalid RTL?  The only valid use of a CC is written as
> > > cc-compared-to-0.  It means "the result of something compared to 0,
> > > stored in that cc reg".
> > >
> > > (And you can copy a CC reg around, but that is not a use ;-) )
> 
> Hm... under this premise, we can also say that any form that is not
> cc-compared-to-0 is not a use.

Well, no.  All uses of CC are written as comparisons to 0, or are pure
dataflow.  Anything else is not "not a use" but just invalid.

> Consequently, if it is not a use, then
> the CC reg should not be updated at its use location, so my v1 patch,
> where we simply skip the update (but retain the combined insn),
> actually makes sense.

With more asserts, perhaps.

> In this concrete situation, we don't care about CC register mode in
> the PUSHFL insn. And we should not change CC reg mode of the use,
> because any other mode than the generic CCmode won't be recognized by
> the insn pattern.

You always care about the CC mode, that is why you always write it as
comparison, so the backend can choose a mode based on what the flag bits
mean in this context.  For an extreme example look at 390, but on pretty
much any target both signed and unsigned comparisons use the same flag
bits, and maybe fp comparisons as well.

But pushfl does sound like pure dataflow.  Why is this a builtin, is
it ever a good idea if the user writes stuff the compiler can do a
better job doing itself, not to mention it is way easier for the
compiler anyway?  :-)


Segher


Re: [PATCH] rs6000: Fix issue in specifying PTImode as an attribute [PR106895]

2024-03-18 Thread Segher Boessenkool
Hi!

On Fri, Feb 23, 2024 at 03:04:13PM +0530, jeevitha wrote:
> PTImode attribute assists in generating even/odd register pairs on 128 bits.

It is a mode, not an attribute.  Attributes are on declarations, while
modes are on a much more fundamental level (every value has a mode, in
GCC!)

> When the user specifies PTImode as an attribute, it breaks because there is no
> internal type to handle this mode . We have created a tree node with dummy 
> type
> to handle PTImode.

You are talking about the mode attribute.  Aha.

This attribute says the mode of the datum is something specific; it is
the only way a user can specify the mode directly.  Not something you
want to use normally, but it's a nice escape hatch (like here).

> We are not documenting this dummy type since users are not
> allowed to use this type externally.

Not sure what this is meant to mean?  What does "allowed to" mean, even?
We do not forbid people from doing anything (we can discourage them
though).  Or dso you mean something just doesn't work?

> gcc/
>   PR target/106895
>   * config/rs6000/rs6000.h (enum rs6000_builtin_type_index): Add fields
>   to hold PTImode type.

An enum does not have fields.  What do you actually mean?

> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -2304,6 +2304,7 @@ enum rs6000_builtin_type_index
>RS6000_BTI_ptr_vector_quad,
>RS6000_BTI_ptr_long_long,
>RS6000_BTI_ptr_long_long_unsigned,
> +  RS6000_BTI_PTI,

Please call it RS6000_BTI_INTPTI, to be more in line with the naming of
other things.

With that fixed it should be good.  Please repost with a good commit
comment and changelog :-)

Thanks,


Segher


[PATCH] tree-optimization/114375 - disallow SLP discovery of permuted mask loads

2024-03-18 Thread Richard Biener
We cannot currently handle permutations of mask loads in code generation
or permute optimization.  But we simply drop any permutation on the
floor, so the following instead rejects the SLP build rather than
producing wrong-code.  I've also made sure to reject them in
vectorizable_load for completeness.

Bootstrapped and tested on x86_64-unknown-linux-gnu, I'm going to
watch the Linaro CI and push if that shows no regressions.

Thanks,
Richard.

PR tree-optimization/114375
* tree-vect-slp.cc (vect_build_slp_tree_2): Compute the
load permutation for masked loads but reject it when any
such is necessary.
* tree-vect-stmts.cc (vectorizable_load): Reject masked
VMAT_ELEMENTWISE and VMAT_STRIDED_SLP as those are not
supported.

* gcc.dg/vect/vect-pr114375.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/vect-pr114375.c | 44 +++
 gcc/tree-vect-slp.cc  | 39 ++--
 gcc/tree-vect-stmts.cc|  8 +
 3 files changed, 81 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-pr114375.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-pr114375.c 
b/gcc/testsuite/gcc.dg/vect/vect-pr114375.c
new file mode 100644
index 000..1e1cb0123d0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-pr114375.c
@@ -0,0 +1,44 @@
+/* { dg-additional-options "-mavx2" { target avx2_runtime } } */
+
+#include "tree-vect.h"
+
+int a[512];
+int b[512];
+int c[512];
+
+void __attribute__((noipa))
+foo(int * __restrict p)
+{
+  for (int i = 0; i < 64; ++i)
+{
+  int tem = 2, tem2 = 2;
+  if (a[4*i + 1])
+tem = p[4*i];
+  if (a[4*i])
+tem2 = p[4*i + 2];
+  b[2*i] = tem2;
+  b[2*i+1] = tem;
+  if (a[4*i + 2])
+tem = p[4*i + 1];
+  if (a[4*i + 3])
+tem2 = p[4*i + 3];
+  c[2*i] = tem2;
+  c[2*i+1] = tem;
+}
+}
+int main()
+{
+  check_vect ();
+
+  for (int i = 0; i < 512; ++i)
+a[i] = (i >> 1) & 1;
+
+  foo (a);
+
+  if (c[0] != 1 || c[1] != 0 || c[2] != 1 || c[3] != 0
+  || b[0] != 2 || b[1] != 2 || b[2] != 2 || b[3] != 2)
+abort ();
+
+  return 0;
+}
+
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 527b06c9f9c..23f9593191a 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1921,12 +1921,7 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
   if (STMT_VINFO_DATA_REF (stmt_info)
   && DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info)))
 {
-  if (gcall *stmt = dyn_cast  (stmt_info->stmt))
-   gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD)
-   || gimple_call_internal_p (stmt, IFN_GATHER_LOAD)
-   || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD)
-   || gimple_call_internal_p (stmt, IFN_MASK_LEN_GATHER_LOAD));
-  else if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
+  if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
gcc_assert (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info)));
   else
{
@@ -1943,19 +1938,43 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
  load_permutation.create (group_size);
  stmt_vec_info first_stmt_info
= DR_GROUP_FIRST_ELEMENT (SLP_TREE_SCALAR_STMTS (node)[0]);
+ bool any_permute = false;
  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), j, load_info)
{
  int load_place;
  if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
load_place = vect_get_place_in_interleaving_chain
-   (load_info, first_stmt_info);
+   (load_info, first_stmt_info);
  else
load_place = 0;
  gcc_assert (load_place != -1);
- load_permutation.safe_push (load_place);
+ any_permute |= load_place != j;
+ load_permutation.quick_push (load_place);
+   }
+
+ if (gcall *stmt = dyn_cast  (stmt_info->stmt))
+   {
+ gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD)
+ || gimple_call_internal_p (stmt, IFN_GATHER_LOAD)
+ || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD)
+ || gimple_call_internal_p (stmt,
+IFN_MASK_LEN_GATHER_LOAD));
+ load_permutation.release ();
+ /* We cannot handle permuted masked loads, see PR114375.  */
+ if (any_permute
+ || (STMT_VINFO_GROUPED_ACCESS (stmt_info)
+ && DR_GROUP_SIZE (first_stmt_info) != group_size)
+ || STMT_VINFO_STRIDED_P (stmt_info))
+   {
+ matches[0] = false;
+ return NULL;
+   }
+   }
+ else
+   {
+ SLP_TREE_LOAD_PERMUTATION (node) = load_permutation;
+ return node;
}

Re: [PATCH] gcc_update: Add missing generated files

2024-03-18 Thread Joseph Myers
On Fri, 1 Mar 2024, Jonathan Wakely wrote:

> I'm seeing errors for --enable-maintainer-mode builds due to incorrectly
> regenerating these files. They should be touched by gcc_update so they
> aren't regenerated unnecessarily.
> 
> contrib/ChangeLog:
> 
>   * gcc_update: Add more generated files in libcc1, lto-plugin,
>   fixincludes, and libstdc++-v3.

OK.

-- 
Joseph S. Myers
josmy...@redhat.com



[COMMITTED] testsuite: Fix excess errors for new modules testcases on powerpc [PR114320]

2024-03-18 Thread Nathaniel Shead
Tested on powerpc64-unknown-linux-gnu, committed as obvious.

-- >8 --

On some configurations, PowerPC emits -Wpsabi warnings when using IEEE
long doubles on a machine configured with IBM long double by default.
This patch suppresses these warnings for this testcase.

PR testsuite/114320

gcc/testsuite/ChangeLog:

* g++.dg/modules/target-powerpc-1_a.C: Suppress -Wpsabi.
* g++.dg/modules/target-powerpc-1_b.C: Likewise.

Signed-off-by: Nathaniel Shead 
---
 gcc/testsuite/g++.dg/modules/target-powerpc-1_a.C | 2 +-
 gcc/testsuite/g++.dg/modules/target-powerpc-1_b.C | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/g++.dg/modules/target-powerpc-1_a.C 
b/gcc/testsuite/g++.dg/modules/target-powerpc-1_a.C
index 693ed101ed5..01709e0eac0 100644
--- a/gcc/testsuite/g++.dg/modules/target-powerpc-1_a.C
+++ b/gcc/testsuite/g++.dg/modules/target-powerpc-1_a.C
@@ -1,7 +1,7 @@
 // PR c++/98645
 // { dg-do compile { target powerpc*-*-* } }
 // { dg-require-effective-target ppc_float128_sw }
-// { dg-additional-options "-fmodules-ts -mfloat128 -mabi=ieeelongdouble" }
+// { dg-additional-options "-fmodules-ts -mfloat128 -mabi=ieeelongdouble 
-Wno-psabi" }
 
 export module M;
 export __ibm128 i = 0.0;
diff --git a/gcc/testsuite/g++.dg/modules/target-powerpc-1_b.C 
b/gcc/testsuite/g++.dg/modules/target-powerpc-1_b.C
index d6b684b556d..b4209bc1550 100644
--- a/gcc/testsuite/g++.dg/modules/target-powerpc-1_b.C
+++ b/gcc/testsuite/g++.dg/modules/target-powerpc-1_b.C
@@ -1,7 +1,7 @@
 // PR c++/98645
 // { dg-module-do compile { target powerpc*-*-* } }
 // { dg-require-effective-target ppc_float128_sw }
-// { dg-additional-options "-fmodules-ts -mfloat128 -mabi=ieeelongdouble" }
+// { dg-additional-options "-fmodules-ts -mfloat128 -mabi=ieeelongdouble 
-Wno-psabi" }
 
 import M;
 
-- 
2.43.2



Re: [PATCH v2 00/13] Add aarch64-w64-mingw32 target

2024-03-18 Thread Christophe Lyon
On Thu, 7 Mar 2024 at 21:48, Evgeny Karpov  wrote:
>
> Monday, March 4, 2024
> Evgeny Karpov wrote:
>
> >
> > Changes from v1 to v2:
> > Adjust the target name to aarch64-*-mingw* to exclude the big-endian target
> > from support.
> > Exclude 64-bit ISA.
> > Rename enum calling_abi to aarch64_calling_abi.
> > Move AArch64 MS ABI definitions FIXED_REGISTERS,
> > CALL_REALLY_USED_REGISTERS, and STATIC_CHAIN_REGNUM from aarch64.h
> > to aarch64-abi-ms.h.
> > Rename TARGET_ARM64_MS_ABI to TARGET_AARCH64_MS_ABI.
> > Exclude TARGET_64BIT from the aarch64 target.
> > Exclude HAVE_GAS_WEAK.
> > Set HAVE_GAS_ALIGNED_COMM to 1 by default.
> > Use a reference from "x86 Windows Options" to "Cygwin and MinGW
> > Options".
> > Update commit descriptions to follow standard style.
> > Rebase from 4th March 2024.
>
> Hello,
>
> It looks like the initial feedback has been addressed.

I had a look at the v2 series, and besides a minor comment patch #8,
ISTM than all the comments your received about v1 have been addressed,
indeed.

> While unit testing for the x86_64-w64-mingw32 target is still in
> progress, the first 4 patches do not obviously change other
> targets, including aarch64-linux-gnu.
> Could they be merged once stage 1 starts,
> or could it be done even now?

What would be the benefit of committing only the first 4 patches?
(whether now or when stage 1 reopens)

Thanks,

Christophe

> Thanks!
>
> Regards,
> Evgeny
>


Re: [pushed] aarch64: Define out-of-class static constants

2024-03-18 Thread Vaseeharan Vinayagamoorthy
Hi Richard,

I think this patch is breaking the build of aarch64-none-elf and 
aarch64-none-linux-gnu targets, when building with GCC 4.8.
This is not an issue when building with GCC 7.5.

Kind regards,
Vasee


From: Richard Sandiford 
Sent: 06 March 2024 10:06
To: gcc-patches@gcc.gnu.org
Subject: [pushed] aarch64: Define out-of-class static constants

While reworking the aarch64 feature descriptions, I forgot
to add out-of-class definitions of some static constants.
This could lead to a build failure with some compilers.

This was seen with some WIP to increase the number of extensions
beyond 64.  It's latent on trunk though, and a regression from
before the rework.

Tested on aarch64-linux-gnu & pushed.

Richard


gcc/
* config/aarch64/aarch64-feature-deps.h (feature_deps::info): Add
out-of-class definitions of static constants.
---
 gcc/config/aarch64/aarch64-feature-deps.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-feature-deps.h 
b/gcc/config/aarch64/aarch64-feature-deps.h
index a1b81f9070b..3641badb82f 100644
--- a/gcc/config/aarch64/aarch64-feature-deps.h
+++ b/gcc/config/aarch64/aarch64-feature-deps.h
@@ -71,6 +71,9 @@ template struct info;
 static constexpr auto enable = flag | get_enable REQUIRES; \
 static constexpr auto explicit_on = enable | get_enable EXPLICIT_ON; \
   };   \
+  const aarch64_feature_flags info::flag;  \
+  const aarch64_feature_flags info::enable;\
+  const aarch64_feature_flags info::explicit_on; \
   constexpr info IDENT ()  \
   {\
 return info ();\
--
2.25.1



Re: [PATCH v2 08/13] aarch64: Add Cygwin and MinGW environments for AArch64

2024-03-18 Thread Christophe Lyon
Hi!

On Mon, 4 Mar 2024 at 18:44, Evgeny Karpov  wrote:
>
> From: Zac Walker 
> Date: Fri, 1 Mar 2024 10:49:28 +0100
> Subject: [PATCH v2 08/13] aarch64: Add Cygwin and MinGW environments for
>  AArch64
>
> Define Cygwin and MinGW environment such as types, SEH definitions,
> shared libraries, etc.
>
> gcc/ChangeLog:
>
> * config.gcc: Add Cygwin and MinGW difinitions.
> * config/aarch64/aarch64-protos.h
> (mingw_pe_maybe_record_exported_symbol): Declare functions
> which are used in Cygwin and MinGW environment.
> (mingw_pe_section_type_flags): Likewise.
> (mingw_pe_unique_section): Likewise.
> (mingw_pe_encode_section_info): Likewise.
> * config/aarch64/cygming.h: New file.
> ---
>  gcc/config.gcc  |   4 +
>  gcc/config/aarch64/aarch64-protos.h |   5 +
>  gcc/config/aarch64/cygming.h| 175 
>  3 files changed, 184 insertions(+)
>  create mode 100644 gcc/config/aarch64/cygming.h
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 3aca257c322..4471599454b 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -1267,7 +1267,11 @@ aarch64*-*-linux*)
>  aarch64-*-mingw*)
> tm_file="${tm_file} aarch64/aarch64-abi-ms.h"
> tm_file="${tm_file} aarch64/aarch64-coff.h"
> +   tm_file="${tm_file} aarch64/cygming.h"
> +   tm_file="${tm_file} mingw/mingw32.h"
> +   tm_file="${tm_file} mingw/mingw-stdint.h"
> tmake_file="${tmake_file} aarch64/t-aarch64"
> +   target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt.cc"
> case ${enable_threads} in
>   "" | yes | win32)
> thread_file='win32'
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index bd719b992a5..759e1a0f9da 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -1110,6 +1110,11 @@ extern void aarch64_output_patchable_area (unsigned 
> int, bool);
>
>  extern void aarch64_adjust_reg_alloc_order ();
>
> +extern void mingw_pe_maybe_record_exported_symbol (tree, const char *, int);
> +extern unsigned int mingw_pe_section_type_flags (tree, const char *, int);
> +extern void mingw_pe_unique_section (tree, int);
> +extern void mingw_pe_encode_section_info (tree, rtx, int);
> +
>  bool aarch64_optimize_mode_switching (aarch64_mode_entity);
>  void aarch64_restore_za (rtx);
>
> diff --git a/gcc/config/aarch64/cygming.h b/gcc/config/aarch64/cygming.h
> new file mode 100644
> index 000..2f239c42a89
> --- /dev/null
> +++ b/gcc/config/aarch64/cygming.h
> @@ -0,0 +1,175 @@
> +/* Operating system specific defines to be used when targeting GCC for
> +   hosting on Windows32, using a Unix style C library and tools.
> +   Copyright (C) 1995-2024 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify
> +it under the terms of the GNU General Public License as published by
> +the Free Software Foundation; either version 3, or (at your option)
> +any later version.
> +
> +GCC is distributed in the hope that it will be useful,
> +but WITHOUT ANY WARRANTY; without even the implied warranty of
> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +GNU General Public License for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +.  */
> +
> +#ifndef GCC_AARCH64_CYGMING_H
> +#define GCC_AARCH64_CYGMING_H
> +
> +#undef PREFERRED_DEBUGGING_TYPE
> +#define PREFERRED_DEBUGGING_TYPE DINFO_TYPE_NONE
> +
> +#define FASTCALL_PREFIX '@'
> +
> +#define print_reg(rtx, code, file)
> +
> +#define SYMBOL_FLAG_DLLIMPORT 0
> +#define SYMBOL_FLAG_DLLEXPORT 0
> +
> +#define SYMBOL_REF_DLLEXPORT_P(X) \
> +   ((SYMBOL_REF_FLAGS (X) & SYMBOL_FLAG_DLLEXPORT) != 0)
> +
> +/* Disable SEH and declare the required SEH-related macros that are
> +still needed for compilation.  */
> +#undef TARGET_SEH
> +#define TARGET_SEH 0
> +
> +#define SSE_REGNO_P(N) 0
> +#define GENERAL_REGNO_P(N) 0
I think you forgot to add a comment to explain the above two lines.
(it was requested during v1 review)

Thanks,

Christophe

> +#define SEH_MAX_FRAME_SIZE 0
> +
> +#undef DEFAULT_ABI
> +#define DEFAULT_ABI AARCH64_CALLING_ABI_MS
> +
> +#undef TARGET_PECOFF
> +#define TARGET_PECOFF 1
> +
> +#include 
> +#ifdef __MINGW32__
> +#include 
> +#endif
> +
> +extern void mingw_pe_asm_named_section (const char *, unsigned int, tree);
> +extern void mingw_pe_declare_function_type (FILE *file, const char *name,
> +   int pub);
> +
> +#define TARGET_ASM_NAMED_SECTION  mingw_pe_asm_named_section
> +
> +/* Select attributes for named sections.  */
> +#define TARGET_SECTION_TYPE_FLAGS  mingw_pe_section_type_flags
> +
> +#define TARGET_ASM_UNIQUE_SECTION mingw_pe_unique_section
> +#define TARGET_ENCODE_SECTION_INFO  

Re: RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen5 CPU with znver5 scheduler Model

2024-03-18 Thread Jan Hubicka
> Hello,
> 
> Le 22/02/2024 à 19:29, Anbazhagan, Karthiban a écrit :
> (...)
> >  gcc/config/i386/{znver4.md => zn4zn5.md}  | 858 +-
> 
> looks like the patch pushed to master lost the file rename.
> I get a bootstrap failure caused by the missing zn4zn5.md file.
> 
> Can you have a look?
Aha, sorry.
I did reset of the git commit since there was formatting error in
changelog.  I will fix it shortly.

Honza


Re: Frontend access to target features (was Re: [PATCH] libgccjit: Add ability to get CPU features)

2024-03-18 Thread Antoni Boucher

David: Ping.

Le 2024-03-10 à 07 h 05, Iain Buclaw a écrit :

Excerpts from David Malcolm's message of März 5, 2024 4:09 pm:

On Thu, 2023-11-09 at 19:33 -0500, Antoni Boucher wrote:

Hi.
See answers below.

On Thu, 2023-11-09 at 18:04 -0500, David Malcolm wrote:

On Thu, 2023-11-09 at 17:27 -0500, Antoni Boucher wrote:

Hi.
This patch adds support for getting the CPU features in libgccjit
(bug
112466)

There's a TODO in the test:
I'm not sure how to test that gcc_jit_target_info_arch returns
the
correct value since it is dependant on the CPU.
Any idea on how to improve this?

Also, I created a CStringHash to be able to have a
std::unordered_set. Is there any built-in way of
doing
this?


Thanks for the patch.

Some high-level questions:

Is this specifically about detecting capabilities of the host that
libgccjit is currently running on? or how the target was configured
when libgccjit was built?


I'm less sure about this part. I'll need to do more tests.



One of the benefits of libgccjit is that, in theory, we support all
of
the targets that GCC already supports.  Does this patch change
that,
or
is this more about giving client code the ability to determine
capabilities of the specific host being compiled for?


This should not change that. If it does, this is a bug.



I'm nervous about having per-target jit code.  Presumably there's a
reason that we can't reuse existing target logic here - can you
please
describe what the problem is.  I see that the ChangeLog has:


 * config/i386/i386-jit.cc: New file.


where i386-jit.cc has almost 200 lines of nontrivial code.  Where
did
this come from?  Did you base it on existing code in our source
tree,
making modifications to fit the new internal API, or did you write
it
from scratch?  In either case, how onerous would this be for other
targets?


This was mostly copied from the same code done for the Rust and D
frontends.
See this commit and the following:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b1c06fd9723453dd2b2ec306684cb806dc2b4fbb
The equivalent to i386-jit.cc is there:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=22e3557e2d52f129f2bbfdc98688b945dba28dc9


[CCing Iain and Arthur re those patches; for reference, the patch being
discussed is attached to :
https://gcc.gnu.org/pipermail/jit/2024q1/001792.html ]

One of my concerns about this patch is that we seem to be gaining code
that's per-(frontend x config) which seems to be copied and pasted with
a search and replace, which could lead to an M*N explosion.



That's certainly the case with the configure/make rules. Itself I think
is copied originally from the {cpu_type}-protos.h machinery.

It might be worth pointing out that the c-family of front-ends don't
have separate headers because their per-target macros are defined in
{cpu_type}.h directly - for better or worse.


Is there any real difference between the per-config code for the
different frontends, or should there be a general "enumerate all
features of the target" hook that's independent of the frontend? (but
perhaps calls into it).



As far as I understand, the configure parts should all be identical
between tm_p, tm_d, tm_rust, ..., so would benefit from being templated
to aid any other front-ends adding in their own per target hooks.


Am I right in thinking that (rustc with default LLVM backend) has some
set of feature strings that both (rustc with rustc_codegen_gcc) and
gccrs are trying to emulate?  If so, is it presumably a goal that
libgccjit gives identical results to gccrs?  If so, would it be crazy
for libgccjit to consume e.g. config/i386/i386-rust.cc ?


I don't know whether libgccjit can just pull in directly the
implementation of the rust target hooks here.  The per-frontend target
hooks usually also make use of code specific to that front-end -
TARGET_CPU_CPP_BUILTINS and others can't be used by a non-c-family
front-end without adding a plethora of stubs, for example.

Whether or not libgccjit wants to give identical information as as rust
I think is a decision for you as the maintainer of its API.

Iain.


Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.

2024-03-18 Thread Aleksandar Rakic
Here is a patch for 
the GCC bug 109429.





Re: RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen5 CPU with znver5 scheduler Model

2024-03-18 Thread Mikael Morin

Hello,

Le 22/02/2024 à 19:29, Anbazhagan, Karthiban a écrit :
(...)

 gcc/config/i386/{znver4.md => zn4zn5.md}  | 858 +-


looks like the patch pushed to master lost the file rename.
I get a bootstrap failure caused by the missing zn4zn5.md file.

Can you have a look?


Re: [PATCH] i386 [stv]: Handle REG_EH_REGION note [pr111822].

2024-03-18 Thread Hongtao Liu
On Mon, Mar 18, 2024 at 6:59 PM Uros Bizjak  wrote:
>
> On Mon, Mar 18, 2024 at 11:52 AM liuhongt  wrote:
> >
> > Commit r14-9459-g618e34d56cc38e only handles
> > general_scalar_chain::convert_op. The patch also handles
> > timode_scalar_chain::convert_op to avoid potential similar bug.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ok for trunk and backport to releases/gcc-13 branch?
>
> I have the following patch in testing that merges
> {general,timode}_scalar_chain::convert_op, so in addition to less code
> duplication, it will fix the issue for both chains. WDYT?
It would be better for maintenance, I prefer your patch.
>
> Uros.
>
> >
> > gcc/ChangeLog:
> >
> > PR target/111822
> > * config/i386/i386-features.cc
> > (timode_scalar_chain::convert_op): Handle REG_EH_REGION note.
> > ---
> >  gcc/config/i386/i386-features.cc | 20 +---
> >  1 file changed, 17 insertions(+), 3 deletions(-)
> >
> > diff --git a/gcc/config/i386/i386-features.cc 
> > b/gcc/config/i386/i386-features.cc
> > index c7d7a965901..38f57d96df5 100644
> > --- a/gcc/config/i386/i386-features.cc
> > +++ b/gcc/config/i386/i386-features.cc
> > @@ -1794,12 +1794,26 @@ timode_scalar_chain::convert_op (rtx *op, rtx_insn 
> > *insn)
> >  *op = gen_rtx_SUBREG (V1TImode, *op, 0);
> >else if (MEM_P (*op))
> >  {
> > +  rtx_insn* eh_insn;
> >rtx tmp = gen_reg_rtx (V1TImode);
> > -  emit_insn_before (gen_rtx_SET (tmp,
> > -gen_gpr_to_xmm_move_src (V1TImode, 
> > *op)),
> > -   insn);
> > +  eh_insn
> > +   = emit_insn_before (gen_rtx_SET (tmp,
> > +gen_gpr_to_xmm_move_src (V1TImode,
> > + *op)),
> > +   insn);
> >*op = tmp;
> >
> > +  if (cfun->can_throw_non_call_exceptions)
> > +   {
> > + /* Handle REG_EH_REGION note.  */
> > + rtx note = find_reg_note (insn, REG_EH_REGION, NULL_RTX);
> > + if (note)
> > +   {
> > + control_flow_insns.safe_push (eh_insn);
> > + add_reg_note (eh_insn, REG_EH_REGION, XEXP (note, 0));
> > +   }
> > +   }
> > +
> >if (dump_file)
> > fprintf (dump_file, "  Preloading operand for insn %d into r%d\n",
> >  INSN_UID (insn), REGNO (tmp));
> > --
> > 2.31.1
> >



-- 
BR,
Hongtao


Re: [PATCH] i386 [stv]: Handle REG_EH_REGION note [pr111822].

2024-03-18 Thread Uros Bizjak
On Mon, Mar 18, 2024 at 11:52 AM liuhongt  wrote:
>
> Commit r14-9459-g618e34d56cc38e only handles
> general_scalar_chain::convert_op. The patch also handles
> timode_scalar_chain::convert_op to avoid potential similar bug.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk and backport to releases/gcc-13 branch?

I have the following patch in testing that merges
{general,timode}_scalar_chain::convert_op, so in addition to less code
duplication, it will fix the issue for both chains. WDYT?

Uros.

>
> gcc/ChangeLog:
>
> PR target/111822
> * config/i386/i386-features.cc
> (timode_scalar_chain::convert_op): Handle REG_EH_REGION note.
> ---
>  gcc/config/i386/i386-features.cc | 20 +---
>  1 file changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/i386/i386-features.cc 
> b/gcc/config/i386/i386-features.cc
> index c7d7a965901..38f57d96df5 100644
> --- a/gcc/config/i386/i386-features.cc
> +++ b/gcc/config/i386/i386-features.cc
> @@ -1794,12 +1794,26 @@ timode_scalar_chain::convert_op (rtx *op, rtx_insn 
> *insn)
>  *op = gen_rtx_SUBREG (V1TImode, *op, 0);
>else if (MEM_P (*op))
>  {
> +  rtx_insn* eh_insn;
>rtx tmp = gen_reg_rtx (V1TImode);
> -  emit_insn_before (gen_rtx_SET (tmp,
> -gen_gpr_to_xmm_move_src (V1TImode, *op)),
> -   insn);
> +  eh_insn
> +   = emit_insn_before (gen_rtx_SET (tmp,
> +gen_gpr_to_xmm_move_src (V1TImode,
> + *op)),
> +   insn);
>*op = tmp;
>
> +  if (cfun->can_throw_non_call_exceptions)
> +   {
> + /* Handle REG_EH_REGION note.  */
> + rtx note = find_reg_note (insn, REG_EH_REGION, NULL_RTX);
> + if (note)
> +   {
> + control_flow_insns.safe_push (eh_insn);
> + add_reg_note (eh_insn, REG_EH_REGION, XEXP (note, 0));
> +   }
> +   }
> +
>if (dump_file)
> fprintf (dump_file, "  Preloading operand for insn %d into r%d\n",
>  INSN_UID (insn), REGNO (tmp));
> --
> 2.31.1
>
diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index c7d7a965901..6d7ef28e4b1 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -980,14 +980,36 @@ scalar_chain::convert_reg (rtx_insn *insn, rtx dst, rtx 
src)
 REGNO (src), REGNO (dst), INSN_UID (insn));
 }
 
+
+/* Helper function to convert immediate constant X to vmode.  */
+static rtx
+smode_convert_cst (rtx x, enum machine_mode vmode)
+{
+  /* Prefer all ones vector in case of -1.  */
+  if (constm1_operand (x, GET_MODE (x)))
+return  CONSTM1_RTX (vmode);
+
+  unsigned n = GET_MODE_NUNITS (vmode);
+  rtx *v = XALLOCAVEC (rtx, n);
+  v[0] = x;
+  for (unsigned i = 1; i < n; ++i)
+v[i] = const0_rtx;
+  return gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (n, v));
+}
+
 /* Convert operand OP in INSN.  We should handle
memory operands and uninitialized registers.
All other register uses are converted during
registers conversion.  */
 
 void
-general_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
+scalar_chain::convert_op (rtx *op, rtx_insn *insn)
 {
+  rtx tmp;
+
+  if (GET_MODE (*op) == V1TImode)
+return;
+
   *op = copy_rtx_if_shared (*op);
 
   if (GET_CODE (*op) == NOT
@@ -998,20 +1020,21 @@ general_scalar_chain::convert_op (rtx *op, rtx_insn 
*insn)
 }
   else if (MEM_P (*op))
 {
-  rtx_insn* eh_insn, *movabs = NULL;
-  rtx tmp = gen_reg_rtx (GET_MODE (*op));
+  rtx_insn *movabs = NULL;
 
   /* Emit MOVABS to load from a 64-bit absolute address to a GPR.  */
   if (!memory_operand (*op, GET_MODE (*op)))
{
- rtx tmp2 = gen_reg_rtx (GET_MODE (*op));
- movabs = emit_insn_before (gen_rtx_SET (tmp2, *op), insn);
+ tmp = gen_reg_rtx (GET_MODE (*op));
+ movabs = emit_insn_before (gen_rtx_SET (tmp, *op), insn);
 
- *op = tmp2;
+ *op = tmp;
}
 
-  eh_insn
-   = emit_insn_before (gen_rtx_SET (gen_rtx_SUBREG (vmode, tmp, 0),
+  tmp = gen_rtx_SUBREG (vmode, gen_reg_rtx (GET_MODE (*op)), 0);
+
+  rtx_insn *eh_insn
+   = emit_insn_before (gen_rtx_SET (copy_rtx (tmp),
 gen_gpr_to_xmm_move_src (vmode, *op)),
insn);
 
@@ -1028,33 +1051,18 @@ general_scalar_chain::convert_op (rtx *op, rtx_insn 
*insn)
}
}
 
-  *op = gen_rtx_SUBREG (vmode, tmp, 0);
+  *op = tmp;
 
   if (dump_file)
fprintf (dump_file, "  Preloading operand for insn %d into r%d\n",
 INSN_UID (insn), REGNO (tmp));
 }
   else if (REG_P (*op))
+*op = gen_rtx_SUBREG (vmode, *op, 0);
+  else if (CONST_SCALAR_INT_P (*op))
 {
-  *op = gen_rtx_SUBREG (vmode, *op, 

[PATCH] Document -fexcess-precision=16.

2024-03-18 Thread liuhongt
Ok for trunk?

gcc/ChangeLog:

* doc/invoke.texi: Document -fexcess-precision=16.
---
 gcc/doc/invoke.texi | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 85c938d4a14..673420fdd3e 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14930,6 +14930,9 @@ assignments).  This option is enabled by default for C 
or C++ if a strict
 conformance option such as @option{-std=c99} or @option{-std=c++17} is used.
 @option{-ffast-math} enables @option{-fexcess-precision=fast} by default
 regardless of whether a strict conformance option is used.
+If @option{-fexcess-precision=16} is specified, casts and assignments of
+@code{_Float16} and @code{bfloat16_t} cause value to be rounded to their
+semantic types if they're supported by the target.
 
 @opindex mfpmath
 @option{-fexcess-precision=standard} is not implemented for languages
-- 
2.31.1



[PATCH] i386 [stv]: Handle REG_EH_REGION note [pr111822].

2024-03-18 Thread liuhongt
Commit r14-9459-g618e34d56cc38e only handles
general_scalar_chain::convert_op. The patch also handles
timode_scalar_chain::convert_op to avoid potential similar bug.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk and backport to releases/gcc-13 branch?

gcc/ChangeLog:

PR target/111822
* config/i386/i386-features.cc
(timode_scalar_chain::convert_op): Handle REG_EH_REGION note.
---
 gcc/config/i386/i386-features.cc | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index c7d7a965901..38f57d96df5 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -1794,12 +1794,26 @@ timode_scalar_chain::convert_op (rtx *op, rtx_insn 
*insn)
 *op = gen_rtx_SUBREG (V1TImode, *op, 0);
   else if (MEM_P (*op))
 {
+  rtx_insn* eh_insn;
   rtx tmp = gen_reg_rtx (V1TImode);
-  emit_insn_before (gen_rtx_SET (tmp,
-gen_gpr_to_xmm_move_src (V1TImode, *op)),
-   insn);
+  eh_insn
+   = emit_insn_before (gen_rtx_SET (tmp,
+gen_gpr_to_xmm_move_src (V1TImode,
+ *op)),
+   insn);
   *op = tmp;
 
+  if (cfun->can_throw_non_call_exceptions)
+   {
+ /* Handle REG_EH_REGION note.  */
+ rtx note = find_reg_note (insn, REG_EH_REGION, NULL_RTX);
+ if (note)
+   {
+ control_flow_insns.safe_push (eh_insn);
+ add_reg_note (eh_insn, REG_EH_REGION, XEXP (note, 0));
+   }
+   }
+
   if (dump_file)
fprintf (dump_file, "  Preloading operand for insn %d into r%d\n",
 INSN_UID (insn), REGNO (tmp));
-- 
2.31.1



[PATCH] LoongArch: Fix C23 (...) functions returning large aggregates [PR114175]

2024-03-18 Thread Xi Ruoyao
We were assuming TYPE_NO_NAMED_ARGS_STDARG_P don't have any named
arguments and there is nothing to advance, but that is not the case
for (...) functions returning by hidden reference which have one such
artificial argument.  This is causing gcc.dg/c23-stdarg-6.c and
gcc.dg/c23-stdarg-8.c to fail.

Fix the issue by checking if arg.type is NULL, as r14-9503 explains.

gcc/ChangeLog:

PR target/114175
* config/loongarch/loongarch.cc
(loongarch_setup_incoming_varargs): Only skip
loongarch_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P
functions if arg.type is NULL.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 70e31bb831c..57de8ef7d20 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -767,7 +767,8 @@ loongarch_setup_incoming_varargs (cumulative_args_t cum,
  argument.  Advance a local copy of CUM past the last "real" named
  argument, to find out how many registers are left over.  */
   local_cum = *get_cumulative_args (cum);
-  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)))
+  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))
+  || arg.type != NULL_TREE)
 loongarch_function_arg_advance (pack_cumulative_args (_cum), arg);
 
   /* Found out how many registers we need to save.  */
-- 
2.44.0



Re: [PATCH, RFC] combine: Don't truncate const operand of AND if it's no benefits

2024-03-18 Thread HAO CHEN GUI
Hi,
  Gently ping this:
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647533.html

Thanks
Gui Haochen

在 2024/3/11 13:41, HAO CHEN GUI 写道:
> Hi,
>   This patch tries to fix the problem when a canonical form doesn't benefit
> on a specific target. The const operand of AND is and with the nonzero
> bits of another operand in combine pass. It's a canonical form, but it's no
> benefits for the target which has rotate and mask insns. As the mask is
> truncated, it can't match the insn conditions which it originally matches.
> For example, the following insn condition checks the sum of two AND masks.
> When one of the mask is truncated, the condition breaks.
> 
> (define_insn "*rotlsi3_insert_5"
>   [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r")
>   (ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r")
>   (match_operand:SI 2 "const_int_operand" "n,n"))
>   (and:SI (match_operand:SI 3 "gpc_reg_operand" "r,0")
>   (match_operand:SI 4 "const_int_operand" "n,n"]
>   "rs6000_is_valid_mask (operands[2], NULL, NULL, SImode)
>&& UINTVAL (operands[2]) != 0 && UINTVAL (operands[4]) != 0
>&& UINTVAL (operands[2]) + UINTVAL (operands[4]) + 1 == 0"
> ...
> 
>   This patch tries to fix the problem by comparing the rtx cost. If another
> operand (varop) is not changed and rtx cost with new mask is not less than
> the original one, the mask is restored to original one.
> 
>   I'm not sure if comparison of rtx cost here is proper. The outer code is
> unknown and I suppose it as "SET". Also the rtx cost might not be accurate.
> From my understanding, the canonical forms should always benefit as it can't
> be undo in combine pass. Do we have a perfect solution for this kind of
> issues? Looking forward for your advice.
> 
>   Another similar issues for canonical forms. Whether the widen mode for
> lshiftrt is always good?
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624852.html
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> Combine: Don't truncate const operand of AND if it's no benefits
> 
> In combine pass, the canonical form is to turn off all bits in the constant
> that are know to already be zero for AND.
> 
>   /* Turn off all bits in the constant that are known to already be zero.
>  Thus, if the AND isn't needed at all, we will have CONSTOP == 
> NONZERO_BITS
>  which is tested below.  */
> 
>   constop &= nonzero;
> 
> But it doesn't benefit when the target has rotate and mask insert insns.
> The AND mask is truncated and lost its information.  Thus it can't match
> the insn conditions.  For example, the following insn condition checks
> the sum of two AND masks.
> 
> (define_insn "*rotlsi3_insert_5"
>   [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r")
>   (ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r")
>   (match_operand:SI 2 "const_int_operand" "n,n"))
>   (and:SI (match_operand:SI 3 "gpc_reg_operand" "r,0")
>   (match_operand:SI 4 "const_int_operand" "n,n"]
>   "rs6000_is_valid_mask (operands[2], NULL, NULL, SImode)
>&& UINTVAL (operands[2]) != 0 && UINTVAL (operands[4]) != 0
>&& UINTVAL (operands[2]) + UINTVAL (operands[4]) + 1 == 0"
> ...
> 
> This patch restores the const operand of AND if the another operand is
> not optimized and the truncated const operand doesn't save the rtx cost.
> 
> gcc/
>   * combine.cc (simplify_and_const_int_1): Restore the const operand
>   of AND if varop is not optimized and the rtx cost of the new const
>   operand is not reduced.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/rlwimi-0.c: Reduced total number of insns and
>   adjust the number of rotate and mask insns.
>   * gcc.target/powerpc/rlwimi-1.c: Likewise.
>   * gcc.target/powerpc/rlwimi-2.c: Likewise.
> 
> patch.diff
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index a4479f8d836..16ff09ea854 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -10161,8 +10161,23 @@ simplify_and_const_int_1 (scalar_int_mode mode, rtx 
> varop,
>if (constop == nonzero)
>  return varop;
> 
> -  if (varop == orig_varop && constop == orig_constop)
> -return NULL_RTX;
> +  if (varop == orig_varop)
> +{
> +  if (constop == orig_constop)
> + return NULL_RTX;
> +  else
> + {
> +   rtx tmp = simplify_gen_binary (AND, mode, varop,
> +  gen_int_mode (constop, mode));
> +   rtx orig = simplify_gen_binary (AND, mode, varop,
> +   gen_int_mode (orig_constop, mode));
> +   if (set_src_cost (tmp, mode, optimize_this_for_speed_p)
> +   < set_src_cost (orig, mode, optimize_this_for_speed_p))
> + return tmp;
> +   else
> + return NULL_RTX;
> + }
> +}
> 
>/* Otherwise, return an AND.  */
>return simplify_gen_binary (AND, mode, varop, gen_int_mode (constop, 
> mode));
> diff --git 

RISC-V: Use convert instructions instead of calling library functions

2024-03-18 Thread Jivan Hakobyan
As RV has round instructions it is reasonable to use them instead of
calling the library functions.

With my patch for the following C code:
double foo(double a) {
return ceil(a);
}

GCC generates the following ASM code (before it was tail call)
foo:
fabs.d  fa4,fa0
lui a5,%hi(.LC0)
fld fa3,%lo(.LC0)(a5)
flt.d   a5,fa4,fa3
beq a5,zero,.L3
fcvt.l.d a5,fa0,rup
fcvt.d.lfa4,a5
fsgnj.d fa0,fa4,fa0
.L3:
ret

.LC0:
.word   0
.word   1127219200 // 0x4330


The patch I have evaluated on SPEC2017.
Counted dynamic instructions counts and got the following improvements

510.parest_r   262 m  -
511.povray_r  2.1  b0.04%
521.wrt_r269 m   -
526.blender_r3 b 0.1%
527.cam4_r   15 b   0.6%
538.imagick_r365 b 7.6%

Overall executed 385 billion fewer instructions which is 0.5%.


gcc/ChangeLog:
* config/riscv/iterators.md (fix_ops, fix_uns): New iterators for
fix patterns.
(RINT, rint_pattern, rint_rm): Removed.
* config/riscv/riscv-protos.h (get_fp_rounding_coefficient): Add
function declaration.
* config/riscv/riscv-v.cc (get_fp_rounding_coefficient): Turned to
not static
* config/riscv/riscv.md (UNSPEC_LROUND): Removed.
(_truncsi2, lrintsi2): New expanders.
(lsi2,
2):  Likewise.
(_truncsi2_sext,
lrintsi2_sext): Expose generator.
(lsi2_sext): Likewise.
(_truncsi2, lrintsi2): Hide
generator.
(lsi2): Hide generator.
(fix_trunc2,
fixuns_trunc2): Removed.
(l2,
2): Likewise.
(_truncdi2, lrintdi2): New patterns.
(ldi2): Likewise.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/fix.c: New test.
* gcc.target/riscv/round.c: Likewise.
* gcc.target/riscv/round_32.c: Likewise.
* gcc.target/riscv/round_64.c: Likewise.


-- 
With the best regards
Jivan Hakobyan
diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index a7694137685aee97ca249c0e720afdfc62ec33c9..75e119e407a36c273eaa6e5ffab24be42af7a8d7 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -196,6 +196,13 @@
 
 (define_code_iterator bitmanip_rotate [rotate rotatert])
 
+;; These code iterators allow the signed and unsigned fix operations to use
+;; the same template.
+(define_code_iterator fix_ops [fix unsigned_fix])
+
+(define_code_attr fix_uns [(fix "fix") (unsigned_fix "fixuns")])
+
+
 ;; ---
 ;; Code Attributes
 ;; ---
@@ -312,11 +319,6 @@
 ;; Int Iterators.
 ;; ---
 
-;; Iterator and attributes for floating-point rounding instructions.
-(define_int_iterator RINT [UNSPEC_LRINT UNSPEC_LROUND])
-(define_int_attr rint_pattern [(UNSPEC_LRINT "rint") (UNSPEC_LROUND "round")])
-(define_int_attr rint_rm [(UNSPEC_LRINT "dyn") (UNSPEC_LROUND "rmm")])
-
 ;; Iterator and attributes for quiet comparisons.
 (define_int_iterator QUIET_COMPARISON [UNSPEC_FLT_QUIET UNSPEC_FLE_QUIET])
 (define_int_attr quiet_pattern [(UNSPEC_FLT_QUIET "lt") (UNSPEC_FLE_QUIET "le")])
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index b87355938052a3a0ca9107774bb3a683c85b74d9..03486f4c4e3dab733e48a702c4ffbb5865f1884d 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -709,6 +709,7 @@ bool gather_scatter_valid_offset_p (machine_mode);
 HOST_WIDE_INT estimated_poly_value (poly_int64, unsigned int);
 bool whole_reg_to_reg_move_p (rtx *, machine_mode, int);
 bool splat_to_scalar_move_p (rtx *);
+rtx get_fp_rounding_coefficient (machine_mode);
 }
 
 /* We classify builtin types into two classes:
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 967f4e382875dfeee7d5de5ab05a2b537766844e..95a233dea44168dec2a28c94a46004a40e18 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4494,7 +4494,7 @@ vls_mode_valid_p (machine_mode vls_mode)
   All double floating point will be unchanged for ceil if it is
   greater than and equal to 4503599627370496.
  */
-static rtx
+rtx
 get_fp_rounding_coefficient (machine_mode inner_mode)
 {
   REAL_VALUE_TYPE real;
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index b16ed97909c04456ce4fe5234a82c5597549b67d..d4eb440d7baeb3d71c7e58291ce4136da6852246 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -64,7 +64,6 @@
   UNSPEC_ROUNDEVEN
   UNSPEC_NEARBYINT
   UNSPEC_LRINT
-  UNSPEC_LROUND
   UNSPEC_FMIN
   UNSPEC_FMAX
   UNSPEC_FMINM
@@ -1967,21 +1966,48 @@
 ;;
 ;;  
 
-(define_insn "fix_trunc2"
-  [(set (match_operand:GPR  0 "register_operand" "=r")
-	(fix:GPR
+(define_expand "_truncsi2"
+  [(set 

Re: [PATCH] ipa: Fix C++ member ptr indirect inlining (PR 114254, PR 108802)

2024-03-18 Thread Jan Hubicka
> gcc/ChangeLog:
> 
> 2024-03-06  Martin Jambor  
> 
>   PR ipa/108802
>   PR ipa/114254
>   * ipa-prop.cc (ipa_get_stmt_member_ptr_load_param): Fix case looking
>   at COMPONENT_REFs directly from a PARM_DECL, also recognize loads from
>   a pointer parameter.
>   (ipa_analyze_indirect_call_uses): Also recognize loads from a pointer
>   parameter, also recognize the case when pfn pointer is loaded in its
>   own BB.
> 
> gcc/testsuite/ChangeLog:
> 
> 2024-03-06  Martin Jambor  
> 
>   PR ipa/108802
>   PR ipa/114254
>   * g++.dg/ipa/iinline-4.C: New test.
>   * g++.dg/ipa/pr108802.C: Likewise.

OK,
thanks!

Honza


[patch,avr,applied] Tweak xor insn constraints

2024-03-18 Thread Georg-Johann Lay

xor insn allows some more values without the requirement
of a scratch register.  This patch adds new constraint
alternative for such values.  The output function avr_out_bitop
already handles these cases, so no change is needed there.

Johann

--

avr.md - Tweak xor insn constraints.

xor insn can handle some more values without the requirement of a
scratch register.  This patch adds a new constraint alternative for
such values.  The output function avr_out_bitop already handles
these cases, so no change is needed there.

gcc/
* config/avr/constraints.md (CX2, CX3, CX4): New constraints.
* config/avr/avr-protos.h (avr_xor_noclobber_dconst): New proto.
* config/avr/avr.cc (avr_xor_noclobber_dconst): New function.
* config/avr/avr.md (xorhi3, *xorhi3): Add "d,0,CX2,X" alternative.
(xorpsi3, *xorpsi3): Add "d,0,CX3,X" alternative.
(xorsi3, *xorsi3): Add "d,0,CX4,X" alternative.diff --git a/gcc/config/avr/avr-protos.h b/gcc/config/avr/avr-protos.h
index bb680312117..dc23cfbf461 100644
--- a/gcc/config/avr/avr-protos.h
+++ b/gcc/config/avr/avr-protos.h
@@ -101,6 +101,7 @@ extern const char* avr_out_xload (rtx_insn *, rtx*, int*);
 extern const char* avr_out_cpymem (rtx_insn *, rtx*, int*);
 extern const char* avr_out_insert_bits (rtx*, int*);
 extern bool avr_popcount_each_byte (rtx, int, int);
+extern bool avr_xor_noclobber_dconst (rtx, int);
 extern bool avr_has_nibble_0xf (rtx);
 
 extern int extra_constraint_Q (rtx x);
diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index 00fce8da15f..12c59668b4c 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -281,6 +281,31 @@ avr_popcount_each_byte (rtx xval, int n_bytes, int pop_mask)
 }
 
 
+/* Constraint helper function.  XVAL is a CONST_INT.  Return true if we
+   can perform XOR without a clobber reg, provided the operation is on
+   a d-register.  This means each byte is in { 0, 0xff, 0x80 }.  */
+
+bool
+avr_xor_noclobber_dconst (rtx xval, int n_bytes)
+{
+  machine_mode mode = GET_MODE (xval);
+
+  if (VOIDmode == mode)
+mode = SImode;
+
+  for (int i = 0; i < n_bytes; ++i)
+{
+  rtx xval8 = simplify_gen_subreg (QImode, xval, mode, i);
+  unsigned int val8 = UINTVAL (xval8) & GET_MODE_MASK (QImode);
+
+  if (val8 != 0 && val8 != 0xff && val8 != 0x80)
+	return false;
+}
+
+  return true;
+}
+
+
 /* Access some RTX as INT_MODE.  If X is a CONST_FIXED we can get
the bit representation of X by "casting" it to CONST_INT.  */
 
diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index bc408633eb5..97f42be7729 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -4741,10 +4741,10 @@ (define_insn "*xorqi3"
   [(set_attr "length" "1")])
 
 (define_insn_and_split "xorhi3"
-  [(set (match_operand:HI 0 "register_operand"   "=??r,r  ,r")
-(xor:HI (match_operand:HI 1 "register_operand" "%0,0  ,0")
-(match_operand:HI 2 "nonmemory_operand" "r,Cx2,n")))
-   (clobber (match_scratch:QI 3"=X,X  ,"))]
+  [(set (match_operand:HI 0 "register_operand"   "=??r,r  ,d  ,r")
+(xor:HI (match_operand:HI 1 "register_operand" "%0,0  ,0  ,0")
+(match_operand:HI 2 "nonmemory_operand" "r,Cx2,CX2,n")))
+   (clobber (match_scratch:QI 3"=X,X  ,X  ,"))]
   ""
   "#"
   "&& reload_completed"
@@ -4755,10 +4755,10 @@ (define_insn_and_split "xorhi3"
   (clobber (reg:CC REG_CC))])])
 
 (define_insn "*xorhi3"
-  [(set (match_operand:HI 0 "register_operand"   "=??r,r  ,r")
-(xor:HI (match_operand:HI 1 "register_operand" "%0,0  ,0")
-(match_operand:HI 2 "nonmemory_operand" "r,Cx2,n")))
-   (clobber (match_scratch:QI 3"=X,X  ,"))
+  [(set (match_operand:HI 0 "register_operand"   "=??r,r  ,d  ,r")
+(xor:HI (match_operand:HI 1 "register_operand" "%0,0  ,0  ,0")
+(match_operand:HI 2 "nonmemory_operand" "r,Cx2,CX2,n")))
+   (clobber (match_scratch:QI 3"=X,X  ,X  ,"))
(clobber (reg:CC REG_CC))]
   "reload_completed"
   {
@@ -4767,14 +4767,14 @@ (define_insn "*xorhi3"
 
 return avr_out_bitop (insn, operands, NULL);
   }
-  [(set_attr "length" "2,2,4")
-   (set_attr "adjust_len" "*,out_bitop,out_bitop")])
+  [(set_attr "length" "2,2,4,4")
+   (set_attr "adjust_len" "*,out_bitop,out_bitop,out_bitop")])
 
 (define_insn_and_split "xorpsi3"
-  [(set (match_operand:PSI 0 "register_operand""=??r,r  ,r")
-(xor:PSI (match_operand:PSI 1 "register_operand" "%0,0  ,0")
- (match_operand:PSI 2 "nonmemory_operand" "r,Cx3,n")))
-   (clobber (match_scratch:QI 3  "=X,X  ,"))]
+  [(set (match_operand:PSI 0 "register_operand""=??r,r  ,d  ,r")
+(xor:PSI (match_operand:PSI 1 "register_operand" "%0,0  ,0  ,0")
+ (match_operand:PSI 2 "nonmemory_operand" "r,Cx3,CX3,n")))
+   (clobber (match_scratch:QI 3 

Re: [Patch][RFC] GCN: Define ISA archs in gcn-devices.def and use it

2024-03-18 Thread Richard Biener
On Fri, Mar 15, 2024 at 5:36 PM Andrew Stubbs  wrote:
>
> On 15/03/2024 13:56, Tobias Burnus wrote:
> > Hi Andrew,
> >
> > Andrew Stubbs wrote:
> >> This is more-or-less what I was planning to do myself, but as I want
> >> to include all the other features that get parametrized in gcn.cc,
> >> gcn.h, gcn-hsa.h, gcn-opts.h, I hadn't got around to it yet.
> >> Unfortunately, I think the gcn.opt and config.gcc will always need
> >> manually updating, but if that's all it'll be an improvement.
> >
> > Well, for .opt see how nvptx does it – it actually generates an .opt file.
> >
> >> I don't like the idea of including AMDGPU_ISA_UNSUPPORTED;
> >
> > I concur – I was initially thinking of reporting the device name
> > ("Unsupported %s") but I then realized that the agent returns a string
> > while only for GCC generated files (→ eflag) the hexcode is used. Thus,
> > I ended up not using it.
> >
> >> Ultimately, I want to replace many of the conditionals like
> >> "TARGET_CDNA2_PLUS" from the code and replace them with feature flags
> >> derived from a def file, or at least a header file. We've acquired too
> >> many places where there are unsearchable conditionals that need
> >> finding and fixing every time a new device comes along.
> > I was thinking of having more flags, but those where the only ones
> > required for the two files.
> >> I had imagined that this .def file would exist in gcc/config/gcn, but
> >> you've placed it in libgomp maybe it makes sense to have multiple
> >> such files if they contain very different data, but I had imagined one
> >> file and I'm not sure that the compiler definitions live in libgomp.
> >
> > There is already:
> >
> > gcc/config/darwin-c.cc:#include "../../libcpp/internal.h"
> >
> > gcc/config/gcn/gcn-run.cc:#include
> > "../../../libgomp/config/gcn/libgomp-gcn.h"
> >
> > gcc/fortran/cpp.cc:#include "../../libcpp/internal.h"
> >
> > gcc/fortran/trigd_fe.inc:#include "../../libgfortran/intrinsics/trigd.inc"
> >
> > But there is also the reverse:
> >
> > libcpp/lex.cc:#include "../gcc/config/i386/cpuid.h"
> >
> > libgfortran/libgfortran.h:#include "../gcc/fortran/libgfortran.h"
> >
> > lto-plugin/lto-plugin.c:#include "../gcc/lto/common.h"
> >
> > If you add more items, it is probably better to have it under
> > gcc/config/gcn/ - and I really prefer a single file for all.
> >
> > * * *
> >
> > Talking about feature sets: This would be a bit like LLVM (see below)
> > but I think they have a bit too much indirections. But I do concur that
> > we need to consolidate the current support – and hopefully make it
> > easier to keep adding more GPU support; we seem to have already covered
> > a larger chunk :-)
> >
> > I also did wonder whether we should support, e.g., running a gfx1100
> > code (or a gfx11-generic one) on, e.g., a gfx1103 device. Alternatively,
> > we could keep the current check which requires an exact match.
>
> We didn't invent that restriction; the runtime won't let you do it. We
> only have the check because the HSA/ROCr error message is not very
> user-friendly.

Note that I heard/read somewhere that they plan to support a "blend" version
that would allow running a kernel on any gfx11xx or gfx106x versions (supposedly
at some runtime cost).  Guess we'll need to watch the LLVM side of things here
(or the ROCm runtime side of it).

> > BTW: I do note that looking at the feature sets in LLVM that all GFX110x
> > GPUs seem to have common silicon bugs: FeatureMSAALoadDstSelBug and
> > FeatureMADIntraFwdBug, while 1100 and 1102 additionally have the
> > FeatureUserSGPRInit16Bug but 1101 and 1103 don't. — For some reasons,
> > FeatureISAVersion11_Generic only consists of two of those bugs (it
> > doesn't have FeatureMADIntraFwdBug), which doesn't seem to be that
> > consistent. Maybe the workaround has issues elsewhere? If so, a generic
> > -march=gfx11 might be not as useful as one might hope for.
> >
> > * * *
> >
> > If I look at LLVM's
> > https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPU.td
> >  ,
> >
> > they first define several features – like 'FeatureUnalignedScratchAccess'.
> >
> > Then they combine them like in:
> >
> > def FeatureISAVersion11_Common ... [FeatureGFX11, ...
> > FeatureAtomicFaddRtnInsts ...
> >
> > And then they use those to map them to feature sets like:
> >
> > def FeatureISAVersion11_0_Common ...
> > listconcat(FeatureISAVersion11_Common.Features,
> >  [FeatureMSAALoadDstSelBug ...
> >
> > And for gfx1103:
> >
> > def FeatureISAVersion11_0_3 : FeatureSet<
> >!listconcat(FeatureISAVersion11_0_Common.Features,
> >  [])>;
> >
> > The mapping to gfx... names then happens in
> > https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/GCNProcessors.td
> >  such as:
> >
> > def : ProcessorModel<"gfx1103", GFX11SpeedModel,
> >FeatureISAVersion11_0_3.Features
> >  >;
> >
> > Or for the generic one, i.e.:
> >
> > // [gfx1100, gfx1101, gfx1102, gfx1103, gfx1150, gfx1151]
> > def : 

Re: _LIBCXX_DEBUG value initialized singular iterators assert failures in std algorithms [PR104316]

2024-03-18 Thread Jonathan Wakely
On Sun, 17 Mar 2024 at 16:52, François Dumont  wrote:
>
>
> >
> > OK for trunk, thanks!
> >
> > I think this is OK to backport to 13 too.
> >
> > Maybe after this we can define the __cpp_lib_null_itetators macro for
> > debug mode?
> >
> After this fix of local_iterator I think we can indeed.
>
> In fact the added 11316.cc was already passing for
> unordered_set<>::local_iterator but simply because we were missing the
> singular check. Both issues solved with this patch.
>
> I found the version.def file to cleanup but no idea how to regenerate
> version.h from it so I'll let you do it, ok ?

Sure, I can do that. To regenerate it run 'make update-version' in the
libstdc++-v3/include build directory.

>
>  libstdc++: Fix _Safe_local_iterator<>::_M_valid_range
>
>  Unordered container local_iterator range shall not contain any singular
>  iterator unless both iterators are value-initialized.
>
>  libstdc++-v3/ChangeLog:
>
>  * include/debug/safe_local_iterator.tcc
>  (_Safe_local_iterator::_M_valid_range): Add
> _M_value_initialized and
>  _M_singular checks.
>  * testsuite/23_containers/unordered_set/debug/114316.cc:
> New test case.
>
>
> Ok to commit ?

OK.



Re: _LIBCXX_DEBUG value initialized singular iterators assert failures in std algorithms [PR104316]

2024-03-18 Thread Jonathan Wakely
On Sun, 17 Mar 2024 at 18:14, François Dumont  wrote:
>
> I was a little bit too confident below. After review of all _M_singular
> usages I found another necessary fix.
>
> After this one for sure we will be able to define
> __cpp_lib_null_iterators even in Debug mode.
>
>  libstdc++: Fix N3344 behavior on _Safe_iterator::_M_can_advance
>
>  We shall be able to advance from a 0 offset a value-initialized
> iterator.
>
>  libstdc++-v3/ChangeLog:
>
>  * include/debug/safe_iterator.tcc
> (_Safe_iterator<>::_M_can_advance):
>  Accept 0 offset advance on value-initialized iterator.
>  * testsuite/23_containers/vector/debug/n3644.cc: New test case.
>
> Ok to commit ?


OK, thanks.


> François
>
>
> On 17/03/2024 17:52, François Dumont wrote:
> >
> >>
> >> OK for trunk, thanks!
> >>
> >> I think this is OK to backport to 13 too.
> >>
> >> Maybe after this we can define the __cpp_lib_null_itetators macro for
> >> debug mode?
> >>
> > After this fix of local_iterator I think we can indeed.
> >
> > In fact the added 11316.cc was already passing for
> > unordered_set<>::local_iterator but simply because we were missing the
> > singular check. Both issues solved with this patch.
> >
> > I found the version.def file to cleanup but no idea how to regenerate
> > version.h from it so I'll let you do it, ok ?
> >
> > libstdc++: Fix _Safe_local_iterator<>::_M_valid_range
> >
> > Unordered container local_iterator range shall not contain any
> > singular
> > iterator unless both iterators are value-initialized.
> >
> > libstdc++-v3/ChangeLog:
> >
> > * include/debug/safe_local_iterator.tcc
> > (_Safe_local_iterator::_M_valid_range): Add
> > _M_value_initialized and
> > _M_singular checks.
> > * testsuite/23_containers/unordered_set/debug/114316.cc:
> > New test case.
> >
> >
> > Ok to commit ?
> >
> > François



Re: [PATCH] cpp: new built-in __EXP_COUNTER__

2024-03-18 Thread Jonathan Wakely

On 18/03/24 07:30 +, Jonathan Wakely wrote:

On 21/04/22 04:31 -0700, Kaz Kylheku wrote:

libcpp/ChangeLog
2022-04-21  Kaz Kylheku  

This change introduces a pair of related macros
__EXP_COUNTER__ and __UEXP_COUNTER__.  These macros access
integer values which enumerate macro expansions.
They can be used for the purposes of obtaining, unique
identifiers (within the scope of a translation unit), as a
replacement for unreliable hacks based on __LINE__.

Outside of macro expansions, these macros expand to 1,
so they are easy to test for in portable code that needs
to fall back on something, like __LINE__.

* gcc/doc/cpp.texi (__EXP_COUNTER__, __UEXP_COUNTER__):
Special built-in macros documented.

* libcpp/include/cpplib.h (struct cpp_macro): New members of
type long long: exp_number and uexp_number.  These members are
used to attach the current exp_counter value, and the parent
context's copy of it to a new macro expansion.  The special
macros then just access these.
(enum cpp_builtin_type): New enumeration members,
BT_EXP_COUNTER and BT_UEXP_COUNTER.

* libcpp/macro.cc (exp_counter): New static variable for
counting expansions.  There is an existing one,
num_expanded_macros_counter, but that has its own purpose and
is incremented in a specific place.  Plus it uses a narrower
integer type.
(_cpp_builtin_number_text): Change the local variable "number"
from linenum_type to unsigned long long, so it holds at least
64 bit values.  Handle the BT_EXP_COUNTER and BT_UEXP_COUNTER
cases.  These just have to see if there is a current macro, and
retrieve the values from it, otherwise do nothing so that the
default 1 is produced.  In the case of BT_UEXP_COUNTER, if the
value is zero, we don't use it, so 1 emerges.  The sprintf of
the number is adjusted to use the right conversion specifier
for the wider type.  Space is already being reserved for
a 64 bit decimal.
(enter_macro_context): After processing the macro arguments,
if any, we increment exp_counter and attach its new value to
the macro's context structure in the exp_number member.  We
also calculate the macro's uexp_number: the parent context's
exp_number.  This is tricky: we have to chase the previous
macro context.  This works if the macro is object-like, or has
no parameters.  If it has parameters, there is a parameter
context, and so we have to climb one more flight of stairs to
get to the real context.

gcc/testsuite/ChangeLog
2022-04-21  Kaz Kylheku  

* gcc.dg/cpp/expcounter1.c: New test.
* gcc.dg/cpp/expcounter2.c: New test.
* gcc.dg/cpp/expcounter3.c: New test.
* gcc.dg/cpp/expcounter4.c: New test.
* gcc.dg/cpp/expcounter5.c: New test.

Signed-off-by: Kaz Kylheku 
---
gcc/doc/cpp.texi   | 81 ++
gcc/testsuite/ChangeLog|  8 +++
gcc/testsuite/gcc.dg/cpp/expcounter1.c | 16 +
gcc/testsuite/gcc.dg/cpp/expcounter2.c | 21 +++
gcc/testsuite/gcc.dg/cpp/expcounter3.c | 22 +++
gcc/testsuite/gcc.dg/cpp/expcounter4.c | 22 +++
gcc/testsuite/gcc.dg/cpp/expcounter5.c | 28 +
libcpp/ChangeLog   | 49 
libcpp/include/cpplib.h|  8 +++
libcpp/init.cc |  2 +
libcpp/macro.cc| 44 +-
11 files changed, 299 insertions(+), 2 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/cpp/expcounter1.c
create mode 100644 gcc/testsuite/gcc.dg/cpp/expcounter2.c
create mode 100644 gcc/testsuite/gcc.dg/cpp/expcounter3.c
create mode 100644 gcc/testsuite/gcc.dg/cpp/expcounter4.c
create mode 100644 gcc/testsuite/gcc.dg/cpp/expcounter5.c

diff --git a/gcc/doc/cpp.texi b/gcc/doc/cpp.texi
index 90b2767e39a..d52450958d7 100644
--- a/gcc/doc/cpp.texi
+++ b/gcc/doc/cpp.texi
@@ -1941,6 +1941,87 @@ generate unique identifiers.  Care must be 
taken to ensure that

@code{__COUNTER__} is not expanded prior to inclusion of precompiled headers
which use it.  Otherwise, the precompiled headers will not be used.

+@item __EXP_COUNTER__
+This macro's name means "(macro) expansion counter".
+Outside of macro replacement sequences, it expands to the integer
+token @code{1}.  This make it possible to easily test for the presence
+of this feature using conditional directives such as
+@code{#if __EXP_COUNTER__}.


It's a macro, so you can just use '#ifdef __EXP_COUNTER__' to test if
it's supported. Is this additional behaviour necessary?


+When @code{__EXP_COUNTER__} occurs in the replacement token sequence
+of a macro, it expands to positive decimal integer token which


expands to _a_ positive decimal integer token?


+uniquely identifies the 

Re: [PATCH] Predefine __STRICT_ALIGN__ if STRICT_ALIGNMENT

2024-03-18 Thread Richard Biener
On Sun, 17 Mar 2024, YunQiang Su wrote:

> Arm32 predefines __ARM_FEATURE_UNALIGNED if -mno-unaligned-access,
> and RISC-V predefines __riscv_misaligned_avoid, while other ports
> that support -mstrict-align/-mno-unaligned-access don't have such
> macro, and these backend macros are only avaiable for c-family.
> Note: Arm64 always predefine __ARM_FEATURE_UNALIGNED: See #111555.
> 
> Let's add a generic one.

STRICT_ALIGNMENT is supposed to be gone, it doesn't tell the full
truth so exposing it will cause more confusion only.  Nak.

Richard.

> __STRICT_ALIGN__ is used instead of __STRICT_ALIGNMENT__, due to that
> the later is used by some softwares, such as lzo2, syslinux etc.
> 
> gcc
>   * cppbuiltin.cc: Predefine __STRICT_ALIGNMENT__ if
>   STRICT_ALIGNMENT.
> ---
>  gcc/cppbuiltin.cc | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/gcc/cppbuiltin.cc b/gcc/cppbuiltin.cc
> index c4bfc2917dc..d32efdf9a07 100644
> --- a/gcc/cppbuiltin.cc
> +++ b/gcc/cppbuiltin.cc
> @@ -123,6 +123,9 @@ define_builtin_macros_for_compilation_flags (cpp_reader 
> *pfile)
>  
>cpp_define_formatted (pfile, "__FINITE_MATH_ONLY__=%d",
>   flag_finite_math_only);
> +
> +  if (STRICT_ALIGNMENT)
> +cpp_define (pfile, "__STRICT_ALIGNMENT__");
>  }
>  
>  
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] testsuite: Define _POSIX_C_SOURCE for test

2024-03-18 Thread Torbjorn SVENSSON




On 2024-03-17 18:48, Mike Stump wrote:

On Mar 10, 2024, at 10:26 AM, Torbjörn SVENSSON  
wrote:


Ok for trunk?


Ok.


Pushed as basepoints/gcc-14-9513-g58753dba800 to trunk.

Kind regards,
Torbjörn


Re: [PATCH] cpp: new built-in __EXP_COUNTER__

2024-03-18 Thread Jonathan Wakely

On 21/04/22 04:31 -0700, Kaz Kylheku wrote:

libcpp/ChangeLog
2022-04-21  Kaz Kylheku  

This change introduces a pair of related macros
__EXP_COUNTER__ and __UEXP_COUNTER__.  These macros access
integer values which enumerate macro expansions.
They can be used for the purposes of obtaining, unique
identifiers (within the scope of a translation unit), as a
replacement for unreliable hacks based on __LINE__.

Outside of macro expansions, these macros expand to 1,
so they are easy to test for in portable code that needs
to fall back on something, like __LINE__.

* gcc/doc/cpp.texi (__EXP_COUNTER__, __UEXP_COUNTER__):
Special built-in macros documented.

* libcpp/include/cpplib.h (struct cpp_macro): New members of
type long long: exp_number and uexp_number.  These members are
used to attach the current exp_counter value, and the parent
context's copy of it to a new macro expansion.  The special
macros then just access these.
(enum cpp_builtin_type): New enumeration members,
BT_EXP_COUNTER and BT_UEXP_COUNTER.

* libcpp/macro.cc (exp_counter): New static variable for
counting expansions.  There is an existing one,
num_expanded_macros_counter, but that has its own purpose and
is incremented in a specific place.  Plus it uses a narrower
integer type.
(_cpp_builtin_number_text): Change the local variable "number"
from linenum_type to unsigned long long, so it holds at least
64 bit values.  Handle the BT_EXP_COUNTER and BT_UEXP_COUNTER
cases.  These just have to see if there is a current macro, and
retrieve the values from it, otherwise do nothing so that the
default 1 is produced.  In the case of BT_UEXP_COUNTER, if the
value is zero, we don't use it, so 1 emerges.  The sprintf of
the number is adjusted to use the right conversion specifier
for the wider type.  Space is already being reserved for
a 64 bit decimal.
(enter_macro_context): After processing the macro arguments,
if any, we increment exp_counter and attach its new value to
the macro's context structure in the exp_number member.  We
also calculate the macro's uexp_number: the parent context's
exp_number.  This is tricky: we have to chase the previous
macro context.  This works if the macro is object-like, or has
no parameters.  If it has parameters, there is a parameter
context, and so we have to climb one more flight of stairs to
get to the real context.

gcc/testsuite/ChangeLog
2022-04-21  Kaz Kylheku  

* gcc.dg/cpp/expcounter1.c: New test.
* gcc.dg/cpp/expcounter2.c: New test.
* gcc.dg/cpp/expcounter3.c: New test.
* gcc.dg/cpp/expcounter4.c: New test.
* gcc.dg/cpp/expcounter5.c: New test.

Signed-off-by: Kaz Kylheku 
---
gcc/doc/cpp.texi   | 81 ++
gcc/testsuite/ChangeLog|  8 +++
gcc/testsuite/gcc.dg/cpp/expcounter1.c | 16 +
gcc/testsuite/gcc.dg/cpp/expcounter2.c | 21 +++
gcc/testsuite/gcc.dg/cpp/expcounter3.c | 22 +++
gcc/testsuite/gcc.dg/cpp/expcounter4.c | 22 +++
gcc/testsuite/gcc.dg/cpp/expcounter5.c | 28 +
libcpp/ChangeLog   | 49 
libcpp/include/cpplib.h|  8 +++
libcpp/init.cc |  2 +
libcpp/macro.cc| 44 +-
11 files changed, 299 insertions(+), 2 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/cpp/expcounter1.c
create mode 100644 gcc/testsuite/gcc.dg/cpp/expcounter2.c
create mode 100644 gcc/testsuite/gcc.dg/cpp/expcounter3.c
create mode 100644 gcc/testsuite/gcc.dg/cpp/expcounter4.c
create mode 100644 gcc/testsuite/gcc.dg/cpp/expcounter5.c

diff --git a/gcc/doc/cpp.texi b/gcc/doc/cpp.texi
index 90b2767e39a..d52450958d7 100644
--- a/gcc/doc/cpp.texi
+++ b/gcc/doc/cpp.texi
@@ -1941,6 +1941,87 @@ generate unique identifiers.  Care must be 
taken to ensure that

@code{__COUNTER__} is not expanded prior to inclusion of precompiled headers
which use it.  Otherwise, the precompiled headers will not be used.

+@item __EXP_COUNTER__
+This macro's name means "(macro) expansion counter".
+Outside of macro replacement sequences, it expands to the integer
+token @code{1}.  This make it possible to easily test for the presence
+of this feature using conditional directives such as
+@code{#if __EXP_COUNTER__}.


It's a macro, so you can just use '#ifdef __EXP_COUNTER__' to test if
it's supported. Is this additional behaviour necessary?


+When @code{__EXP_COUNTER__} occurs in the replacement token sequence
+of a macro, it expands to positive decimal integer token which


expands to _a_ positive decimal integer token?


+uniquely identifies the expansion, within a translation unit.
+Unlike 

Re: [PATCH v1] RISC-V: Bugfix ICE for __attribute__((target("arch=+v"))

2024-03-18 Thread juzhe.zh...@rivai.ai
I can't review this stuff. Let kito review this.


Thanks.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-03-18 14:05
To: gcc-patches
CC: juzhe.zhong; kito.cheng; yanzhang.wang; Pan Li
Subject: [PATCH v1] RISC-V: Bugfix ICE for __attribute__((target("arch=+v"))
From: Pan Li 
 
This patch would like to fix one ICE for __attribute__((target("arch=+v"))
and likewise extension(s). Given we have sample code as below:
 
void __attribute__((target("arch=+v")))
test_2 (int *a, int *b, int *out, unsigned count)
{
  unsigned i;
  for (i = 0; i < count; i++)
   out[i] = a[i] + b[i];
}
 
It will have ICE when build with -march=rv64gc -O3.
 
test.c: In function ‘test_2’:
test.c:4:1: internal compiler error: Floating point exception
4 | {
  | ^
0x1a5891b crash_signal
.../__RISC-V_BUILD__/../gcc/toplev.cc:319
0x7f0a7884251f ???
./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
0x1f51ba4 riscv_hard_regno_nregs
.../__RISC-V_BUILD__/../gcc/config/riscv/riscv.cc:8143
0x1967bb9 init_reg_modes_target()
.../__RISC-V_BUILD__/../gcc/reginfo.cc:471
0x13fc029 init_emit_regs()
.../__RISC-V_BUILD__/../gcc/emit-rtl.cc:6237
0x1a5b83d target_reinit()
.../__RISC-V_BUILD__/../gcc/toplev.cc:1936
0x35e374d save_target_globals()
.../__RISC-V_BUILD__/../gcc/target-globals.cc:92
0x35e381f save_target_globals_default_opts()
.../__RISC-V_BUILD__/../gcc/target-globals.cc:122
0x1f544cc riscv_save_restore_target_globals(tree_node*)
.../__RISC-V_BUILD__/../gcc/config/riscv/riscv.cc:9138
0x1f55c36 riscv_set_current_function
...
 
There are two reasons for this ICE.
1. The implied extension(s) of v are not well handled and the
   TARGET_MIN_VLEN is 0 which is not reinitialized.  Then the
   size / TARGET_MIN_VLEN will have DivideByZero.
2. The machine modes of the vector types will be vary after
   the v extension is introduced.
 
This patch passed below testsuite:
1. The riscv fully regression test.
 
PR target/114352
 
gcc/ChangeLog:
 
* common/config/riscv/riscv-common.cc
(riscv_subset_list::parse_single_ext): Add implied, combine
and conflict check after parse single extension.
* config/riscv/riscv.cc (riscv_set_current_function):
Reini the machine mode before when set cur function.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/pr114352-1.c: New test.
* gcc.target/riscv/rvv/base/pr114352-2.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/common/config/riscv/riscv-common.cc   | 33 ---
gcc/config/riscv/riscv.cc |  4 ++
.../gcc.target/riscv/rvv/base/pr114352-1.c| 58 +++
.../gcc.target/riscv/rvv/base/pr114352-2.c| 27 +
4 files changed, 115 insertions(+), 7 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114352-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114352-2.c
 
diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 48efef40dfd..d32bf147eca 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -1375,20 +1375,39 @@ riscv_subset_list::parse_single_multiletter_ext (const 
char *p,
const char *
riscv_subset_list::parse_single_ext (const char *p, bool exact_single_p)
{
+  const char *end_of_ext;
+
   switch (p[0])
 {
 case 'x':
-  return parse_single_multiletter_ext (p, "x", "non-standard extension",
-exact_single_p);
+  end_of_ext = parse_single_multiletter_ext (p, "x",
+ "non-standard extension",
+ exact_single_p);
+  break;
 case 'z':
-  return parse_single_multiletter_ext (p, "z", "sub-extension",
-exact_single_p);
+  end_of_ext = parse_single_multiletter_ext (p, "z", "sub-extension",
+ exact_single_p);
+  break;
 case 's':
-  return parse_single_multiletter_ext (p, "s", "supervisor extension",
-exact_single_p);
+  end_of_ext = parse_single_multiletter_ext (p, "s", "supervisor 
extension",
+ exact_single_p);
+  break;
 default:
-  return parse_single_std_ext (p, exact_single_p);
+  end_of_ext = parse_single_std_ext (p, exact_single_p);
+  break;
 }
+
+  /* Make sure the implied or combined extension is included after add
+ a new std extension to subset list.  For exmaple as below,
+
+ void __attribute__((target("arch=+v"))) func () with -march=rv64gc.
+
+ The implied zvl128b and zve64d of the std v should be included.  */
+  handle_implied_ext (p);
+  handle_combine_ext ();
+  check_conflict_ext ();
+
+  return end_of_ext;
}
/* Parsing arch string to subset list, return NULL if parsing failed.  */
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 680c4a728e9..89acb94af10 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -9474,6 +9474,10 @@ riscv_set_current_function (tree decl)
   cl_target_option_restore (_options, _options_set,
TREE_TARGET_OPTION (new_tree));
+  /* The ISA extension can vary based on the 

[PATCH v1] RISC-V: Bugfix ICE for __attribute__((target("arch=+v"))

2024-03-18 Thread pan2 . li
From: Pan Li 

This patch would like to fix one ICE for __attribute__((target("arch=+v"))
and likewise extension(s). Given we have sample code as below:

void __attribute__((target("arch=+v")))
test_2 (int *a, int *b, int *out, unsigned count)
{
  unsigned i;
  for (i = 0; i < count; i++)
   out[i] = a[i] + b[i];
}

It will have ICE when build with -march=rv64gc -O3.

test.c: In function ‘test_2’:
test.c:4:1: internal compiler error: Floating point exception
4 | {
  | ^
0x1a5891b crash_signal
.../__RISC-V_BUILD__/../gcc/toplev.cc:319
0x7f0a7884251f ???
./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
0x1f51ba4 riscv_hard_regno_nregs
.../__RISC-V_BUILD__/../gcc/config/riscv/riscv.cc:8143
0x1967bb9 init_reg_modes_target()
.../__RISC-V_BUILD__/../gcc/reginfo.cc:471
0x13fc029 init_emit_regs()
.../__RISC-V_BUILD__/../gcc/emit-rtl.cc:6237
0x1a5b83d target_reinit()
.../__RISC-V_BUILD__/../gcc/toplev.cc:1936
0x35e374d save_target_globals()
.../__RISC-V_BUILD__/../gcc/target-globals.cc:92
0x35e381f save_target_globals_default_opts()
.../__RISC-V_BUILD__/../gcc/target-globals.cc:122
0x1f544cc riscv_save_restore_target_globals(tree_node*)
.../__RISC-V_BUILD__/../gcc/config/riscv/riscv.cc:9138
0x1f55c36 riscv_set_current_function
...

There are two reasons for this ICE.
1. The implied extension(s) of v are not well handled and the
   TARGET_MIN_VLEN is 0 which is not reinitialized.  Then the
   size / TARGET_MIN_VLEN will have DivideByZero.
2. The machine modes of the vector types will be vary after
   the v extension is introduced.

This patch passed below testsuite:
1. The riscv fully regression test.

PR target/114352

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_subset_list::parse_single_ext): Add implied, combine
and conflict check after parse single extension.
* config/riscv/riscv.cc (riscv_set_current_function):
Reini the machine mode before when set cur function.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr114352-1.c: New test.
* gcc.target/riscv/rvv/base/pr114352-2.c: New test.

Signed-off-by: Pan Li 
---
 gcc/common/config/riscv/riscv-common.cc   | 33 ---
 gcc/config/riscv/riscv.cc |  4 ++
 .../gcc.target/riscv/rvv/base/pr114352-1.c| 58 +++
 .../gcc.target/riscv/rvv/base/pr114352-2.c| 27 +
 4 files changed, 115 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114352-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114352-2.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 48efef40dfd..d32bf147eca 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -1375,20 +1375,39 @@ riscv_subset_list::parse_single_multiletter_ext (const 
char *p,
 const char *
 riscv_subset_list::parse_single_ext (const char *p, bool exact_single_p)
 {
+  const char *end_of_ext;
+
   switch (p[0])
 {
 case 'x':
-  return parse_single_multiletter_ext (p, "x", "non-standard extension",
-  exact_single_p);
+  end_of_ext = parse_single_multiletter_ext (p, "x",
+"non-standard extension",
+exact_single_p);
+  break;
 case 'z':
-  return parse_single_multiletter_ext (p, "z", "sub-extension",
-  exact_single_p);
+  end_of_ext = parse_single_multiletter_ext (p, "z", "sub-extension",
+exact_single_p);
+  break;
 case 's':
-  return parse_single_multiletter_ext (p, "s", "supervisor extension",
-  exact_single_p);
+  end_of_ext = parse_single_multiletter_ext (p, "s", "supervisor 
extension",
+exact_single_p);
+  break;
 default:
-  return parse_single_std_ext (p, exact_single_p);
+  end_of_ext = parse_single_std_ext (p, exact_single_p);
+  break;
 }
+
+  /* Make sure the implied or combined extension is included after add
+ a new std extension to subset list.  For exmaple as below,
+
+ void __attribute__((target("arch=+v"))) func () with -march=rv64gc.
+
+ The implied zvl128b and zve64d of the std v should be included.  */
+  handle_implied_ext (p);
+  handle_combine_ext ();
+  check_conflict_ext ();
+
+  return end_of_ext;
 }
 
 /* Parsing arch string to subset list, return NULL if parsing failed.  */
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 680c4a728e9..89acb94af10 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -9474,6 +9474,10 @@ riscv_set_current_function (tree decl)
   cl_target_option_restore (_options,