Re: [PATCH, rs6000] Call vector load/store with length expand only on 64-bit Power10 [PR96762]

2023-08-30 Thread HAO CHEN GUI via Gcc-patches
Kewen,
  I refined the patch according to your comments, and it passed bootstrap
and regression testing.

  I committed it as
https://gcc.gnu.org/g:946b8967b905257ac9f140225db744c9a6ab91be

Thanks
Gui Haochen

On 2023/8/29 16:55, Kewen.Lin wrote:
> Hi Haochen,
> 
> on 2023/8/29 10:50, HAO CHEN GUI wrote:
>> Hi,
>>   This patch adds "TARGET_64BIT" check when calling vector load/store
>> with length expand in expand_block_move. It matches the expand condition
>> of "lxvl" and "stxvl" defined in vsx.md.
>>
>>   This patch fixes the ICE that occurred with the test case on 32-bit Power10.
>>
>>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
>>
>> Thanks
>> Gui Haochen
>>
>>
>> ChangeLog
>> rs6000: call vector load/store with length expand only on 64-bit Power10
>>
>> gcc/
>>  PR target/96762
>>  * config/rs6000/rs6000-string.cc (expand_block_move): Call vector
>>  load/store with length expand only on 64-bit Power10.
>>
>> gcc/testsuite/
>>  PR target/96762
>>  * gcc.target/powerpc/pr96762.c: New.
>>
>>
>> patch.diff
>> diff --git a/gcc/config/rs6000/rs6000-string.cc 
>> b/gcc/config/rs6000/rs6000-string.cc
>> index cd8ee8c..d1b48c2 100644
>> --- a/gcc/config/rs6000/rs6000-string.cc
>> +++ b/gcc/config/rs6000/rs6000-string.cc
>> @@ -2811,8 +2811,9 @@ expand_block_move (rtx operands[], bool might_overlap)
>>gen_func.mov = gen_vsx_movv2di_64bit;
>>  }
>>else if (TARGET_BLOCK_OPS_UNALIGNED_VSX
>> -   && TARGET_POWER10 && bytes < 16
>> -   && orig_bytes > 16
>> +   /* Only use lxvl/stxvl on 64bit POWER10.  */
>> +   && TARGET_POWER10 && TARGET_64BIT
>> +   && bytes < 16 && orig_bytes > 16
>> && !(bytes == 1 || bytes == 2
>>  || bytes == 4 || bytes == 8)
>> && (align >= 128 || !STRICT_ALIGNMENT))
> 
> Nit: Since you touched this part of the code, could you format it better as well,
> like:
> 
>   else if (TARGET_BLOCK_OPS_UNALIGNED_VSX
>  /* Only use lxvl/stxvl on 64bit POWER10.  */
>  && TARGET_POWER10
>  && TARGET_64BIT
>  && bytes < 16
>  && orig_bytes > 16
>  && !(bytes == 1
>   || bytes == 2
>   || bytes == 4
>   || bytes == 8)
>  && (align >= 128
>  || !STRICT_ALIGNMENT))
> 
> 
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr96762.c 
>> b/gcc/testsuite/gcc.target/powerpc/pr96762.c
>> new file mode 100644
>> index 000..1145dd1
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr96762.c
>> @@ -0,0 +1,11 @@
>> +/* { dg-do compile { target ilp32 } } */
> 
> Nit: we can compile this on lp64, so you can remove the ilp32 restriction,
> ...
> 
>> +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
>> +
> 
> ... but add one comment line to note the initial purpose, like:
> 
> /* Verify there is no ICE on ilp32 env.  */
> 
> or similar.
> 
> Okay for trunk with these nits fixed, thanks!
> 
> BR,
> Kewen
> 
>> +extern void foo (char *);
>> +
>> +void
>> +bar (void)
>> +{
>> +  char zj[] = "";
>> +  foo (zj);
>> +}


Re: [PATCH V5 1/2] Add overflow API for plus minus mult on range

2023-08-30 Thread guojiufu via Gcc-patches

On 2023-08-03 21:18, Andrew MacLeod wrote:

This is OK.



Thanks a lot!  Committed via r14-3582.


BR,
Jeff (Jiufu Guo)



On 8/2/23 22:18, Jiufu Guo wrote:

Hi,

I would like to have a ping on this patch.

BR,
Jeff (Jiufu Guo)


Jiufu Guo  writes:


Hi,

As discussed in previous reviews, adding overflow APIs to range-op
would be useful. Those APIs help to check whether overflow happens
when operating on two 'range's with plus, minus, or mult.
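
For example, a client of the new API could query it roughly as follows (a
minimal sketch for illustration: the handler lookup and the range
construction are assumptions, only overflow_free_p itself comes from this
patch):

/* Sketch: true iff adding any value in [1, 10] to any value in [20, 30]
   of TYPE provably cannot overflow.  */
bool
plus_is_overflow_free (tree type)
{
  int_range<1> lh (build_int_cst (type, 1), build_int_cst (type, 10));
  int_range<1> rh (build_int_cst (type, 20), build_int_cst (type, 30));
  range_op_handler handler (PLUS_EXPR);  /* assumed lookup interface */
  return handler && handler.overflow_free_p (lh, rh);
}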

Previous discussions are here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624067.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624701.html

Bootstrap & regtest pass on ppc64{,le} and x86_64.
Is this patch ok for trunk?

BR,
Jeff (Jiufu Guo)

gcc/ChangeLog:

* range-op-mixed.h (operator_plus::overflow_free_p): New declare.
(operator_minus::overflow_free_p): New declare.
(operator_mult::overflow_free_p): New declare.
* range-op.cc (range_op_handler::overflow_free_p): New function.
(range_operator::overflow_free_p): New default function.
(operator_plus::overflow_free_p): New function.
(operator_minus::overflow_free_p): New function.
(operator_mult::overflow_free_p): New function.
* range-op.h (range_op_handler::overflow_free_p): New declare.
(range_operator::overflow_free_p): New declare.
* value-range.cc (irange::nonnegative_p): New function.
(irange::nonpositive_p): New function.
* value-range.h (irange::nonnegative_p): New declare.
(irange::nonpositive_p): New declare.

---
  gcc/range-op-mixed.h |  11 
  gcc/range-op.cc  | 124 
+++

  gcc/range-op.h   |   5 ++
  gcc/value-range.cc   |  12 +
  gcc/value-range.h|   2 +
  5 files changed, 154 insertions(+)

diff --git a/gcc/range-op-mixed.h b/gcc/range-op-mixed.h
index 6944742ecbc..42157ed9061 100644
--- a/gcc/range-op-mixed.h
+++ b/gcc/range-op-mixed.h
@@ -383,6 +383,10 @@ public:
  relation_kind rel) const final override;
  void update_bitmask (irange &r, const irange &lh,
		       const irange &rh) const final override;
+
+  virtual bool overflow_free_p (const irange &lh, const irange &rh,
+				relation_trio = TRIO_VARYING) const;
+
  private:
  void wi_fold (irange &r, tree type, const wide_int &lh_lb,
		const wide_int &lh_ub, const wide_int &rh_lb,
@@ -446,6 +450,10 @@ public:
relation_kind rel) const final override;
  void update_bitmask (irange &r, const irange &lh,
		       const irange &rh) const final override;
+
+  virtual bool overflow_free_p (const irange &lh, const irange &rh,
+				relation_trio = TRIO_VARYING) const;
+
  private:
  void wi_fold (irange &r, tree type, const wide_int &lh_lb,
		const wide_int &lh_ub, const wide_int &rh_lb,
@@ -525,6 +533,9 @@ public:
	const REAL_VALUE_TYPE &lh_lb, const REAL_VALUE_TYPE &lh_ub,
	const REAL_VALUE_TYPE &rh_lb, const REAL_VALUE_TYPE &rh_ub,
	relation_kind kind) const final override;
+  virtual bool overflow_free_p (const irange &lh, const irange &rh,
+				relation_trio = TRIO_VARYING) const;
+
  };
class operator_addr_expr : public range_operator
diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index cb584314f4c..632b044331b 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -366,6 +366,22 @@ range_op_handler::op1_op2_relation (const vrange 
) const

  }
  }
  +bool
+range_op_handler::overflow_free_p (const vrange &lh,
+				   const vrange &rh,
+				   relation_trio rel) const
+{
+  gcc_checking_assert (m_operator);
+  switch (dispatch_kind (lh, lh, rh))
+{
+  case RO_III:
+	return m_operator->overflow_free_p (as_a <irange> (lh),
+					    as_a <irange> (rh),
+					    rel);
+  default:
+   return false;
+}
+}
// Convert irange bitmasks into a VALUE MASK pair suitable for 
calling CCP.
@@ -688,6 +704,13 @@ range_operator::op1_op2_relation_effect (irange &lhs_range ATTRIBUTE_UNUSED,

return false;
  }
  +bool
+range_operator::overflow_free_p (const irange &, const irange &,
+relation_trio) const
+{
+  return false;
+}
+
  // Apply any known bitmask updates based on this operator.
void
@@ -4311,6 +4334,107 @@ range_op_table::initialize_integral_ops ()
}
  +bool
+operator_plus::overflow_free_p (const irange &lh, const irange &rh,
+				relation_trio) const
+{
+  if (lh.undefined_p () || rh.undefined_p ())
+return false;
+
+  tree type = lh.type ();
+  if (TYPE_OVERFLOW_UNDEFINED (type))
+return true;
+
+  wi::overflow_type ovf;
+  signop sgn = TYPE_SIGN (type);
+  wide_int wmax0 = lh.upper_bound ();
+  wide_int wmax1 = rh.upper_bound ();
+  wi::add (wmax0, wmax1, sgn, &ovf);
+  if 

[committed] arc: Honor SWAP option for lsl16 instruction

2023-08-30 Thread Claudiu Zissulescu via Gcc-patches
The LSL16 instruction is only available when the SWAP (-mswap) option is
turned on.
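
For reference, the pattern being guarded is a plain 16-bit left shift; an
illustrative C snippet (not part of the patch):

/* With -mswap on an ARCv2 target, this should compile to a single
   lsl16 instruction.  */
unsigned int
shift_left_16 (unsigned int x)
{
  return x << 16;
}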

gcc/ChangeLog:

* config/arc/arc.cc (arc_split_mov_const): Use LSL16 only when
SWAP option is enabled.
* config/arc/arc.md (ashlsi2_cnt16): Likewise.

Signed-off-by: Claudiu Zissulescu 
---
 gcc/config/arc/arc.cc | 2 +-
 gcc/config/arc/arc.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/arc/arc.cc b/gcc/config/arc/arc.cc
index 266ba8b00bb..8ee7387286e 100644
--- a/gcc/config/arc/arc.cc
+++ b/gcc/config/arc/arc.cc
@@ -11647,7 +11647,7 @@ arc_split_mov_const (rtx *operands)
 }
 
   /* 3. Check if we can just shift by 16 to fit into the u6 of LSL16.  */
-  if (TARGET_BARREL_SHIFTER && TARGET_V2
+  if (TARGET_SWAP && TARGET_V2
   && ((ival & ~0x3f) == 0))
 {
   shimm = (ival >> 16) & 0x3f;
diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index 1f122d9507f..a4e77a207bf 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -5991,7 +5991,7 @@ (define_insn "*ashlsi2_cnt16"
   [(set (match_operand:SI 0 "register_operand""=r")
(ashift:SI (match_operand:SI 1 "nonmemory_operand" "rL")
   (const_int 16)))]
-  "TARGET_BARREL_SHIFTER && TARGET_V2"
+  "TARGET_SWAP && TARGET_V2"
   "lsl16\\t%0,%1"
   [(set_attr "type" "shift")
(set_attr "iscompact" "false")
-- 
2.30.2



Re: [PATCH v4] LoongArch:Implement 128-bit floating point functions in gcc.

2023-08-30 Thread Xi Ruoyao via Gcc-patches
On Thu, 2023-08-31 at 10:46 +0800, chenxiaolong wrote:
> +;; Implement __builtin_fabs128 function.
> +
> +(define_expand "abstf2"
> +  [(match_operand:TF 0 "register_operand")
> +   (match_operand:TF 1 "register_operand")]
> +  "TARGET_64BIT"
> +{
> +  loongarch_emit_move (operands[0], operands[1]);
> +  emit_insn (gen_abstf_local (operands[0]));
> +  DONE;
> +})
> +
> +(define_insn "abstf_local"
> +  [(set (match_operand:TF 0 "register_operand" "+r")
> +   (abs:TF (match_dup 0)))]
> +  "TARGET_64BIT"
> +{
> +  operands[0] = gen_rtx_REG (DImode, REGNO (operands[0]) + 1);
> +  return "bstrins.d\t%0,$r0,0x3f,0x3f";
> +})

This should be removed because the "generic" expand works fine:

$ cat t.c
_Float128 fabsf128 (_Float128 in)
{
  return __builtin_fabsf128 (in);
}
$ cc t.c -S -O2 -o-
fabsf128:
.LFB0 = .
.cfi_startproc
bstrpick.d  $r5,$r5,62,0
jr  $r1
.cfi_endproc

It does not work with -O0, but -O0 means "not optimized" anyway.

> +;; Implement __builtin_copysignf128 function.
> +
> +(define_insn_and_split "copysigntf3"
> +  [(set (match_operand:TF 0 "register_operand" "=")
> +   (unspec:TF [(match_operand:TF 1 "register_operand" "r")
> +   (match_operand:TF 2 "register_operand" "r")]
> +   UNSPEC_COPYSIGNF128))]
> +  "TARGET_64BIT"
> +  "#"
> +  "reload_completed"
> + [(const_int 0)]
> +{
> +  rtx op0_lo = gen_rtx_REG (DImode,REGNO (operands[0]) + 0);
> +  rtx op0_hi = gen_rtx_REG (DImode,REGNO (operands[0]) + 1);
> +  rtx op1_lo = gen_rtx_REG (DImode,REGNO (operands[1]) + 0);
> +  rtx op1_hi = gen_rtx_REG (DImode,REGNO (operands[1]) + 1);
> +  rtx op2_hi = gen_rtx_REG (DImode,REGNO (operands[2]) + 1);
> +
> +  if (REGNO (operands[1]) == REGNO (operands[2]))
> +    {
> +  loongarch_emit_move (operands[0], operands[1]);
> +  DONE;
> +    }
> +  else
> +    {
> +  loongarch_emit_move (op0_hi, op2_hi);
> +  loongarch_emit_move (op0_lo, op1_lo);
> +  emit_insn (gen_insvdi (op0_hi, GEN_INT (63), GEN_INT (0), op1_hi));
> +  DONE;
> +    }
> +})

Hmm... The generic implementation does not work:

copysignf128:
.LFB0 = .
.cfi_startproc
or  $r12,$r0,$r0
lu52i.d $r12,$r12,0x8000>>52
and $r7,$r7,$r12
bstrpick.d  $r5,$r5,62,0
or  $r5,$r5,$r7
jr  $r1
.cfi_endproc

It's sub-optimal.  But there seems to be a general issue with cases like

int test(int a, int b)
{
  return (a & ~0x10) | (b & 0x10);
}

It's compiled to:

test:
.LFB0 = .
.cfi_startproc
addi.w  $r12,$r0,-17# 0xffef
and $r12,$r12,$r4
andi$r5,$r5,16
or  $r12,$r12,$r5
slli.w  $r4,$r12,0
jr  $r1
.cfi_endproc

But the optimal implementation should be:

bstrpick.w $r4, $r4, 4, 4
bstrins.w  $r5, $r4, 4, 4
or $r5, $r4, $r0

So to me, we should fix the general case instead.  Please hold this part
(you can commit the rest of the patch without the loongarch.md change for
now), and I'll try to fix the general case.

Created https://gcc.gnu.org/PR111252 for tracking the issue.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH v2] RISC-V: Optimize the MASK opt generation

2023-08-30 Thread Feng Wang
This patch rebases the change from "[PATCH] RISC-V: Optimize the MASK opt
generation".  Please see the details at
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg302295.html

gcc/ChangeLog:

* config/riscv/riscv-opts.h (MASK_ZICSR): Delete;
(MASK_ZIFENCEI): Ditto;
(MASK_ZIHINTNTL): Ditto;
(MASK_ZIHINTPAUSE):  Ditto;
(TARGET_ZICSR):  Ditto;
(TARGET_ZIFENCEI):   Ditto;
(TARGET_ZIHINTNTL):  Ditto;
(TARGET_ZIHINTPAUSE):Ditto;
(MASK_ZAWRS):Ditto;
(TARGET_ZAWRS):  Ditto;
(MASK_ZBA):  Ditto;
(MASK_ZBB):  Ditto;
(MASK_ZBC):  Ditto;
(MASK_ZBS):  Ditto;
(TARGET_ZBA):Ditto;
(TARGET_ZBB):Ditto;
(TARGET_ZBC):Ditto;
(TARGET_ZBS):Ditto;
(MASK_ZFINX):Ditto;
(MASK_ZDINX):Ditto;
(MASK_ZHINX):Ditto;
(MASK_ZHINXMIN): Ditto;
(TARGET_ZFINX):  Ditto;
(TARGET_ZDINX):  Ditto;
(TARGET_ZHINX):  Ditto;
(TARGET_ZHINXMIN):   Ditto;
(MASK_ZBKB): Ditto;
(MASK_ZBKC): Ditto;
(MASK_ZBKX): Ditto;
(MASK_ZKNE): Ditto;
(MASK_ZKND): Ditto;
(MASK_ZKNH): Ditto;
(MASK_ZKR):  Ditto;
(MASK_ZKSED):Ditto;
(MASK_ZKSH): Ditto;
(MASK_ZKT):  Ditto;
(TARGET_ZBKB):   Ditto;
(TARGET_ZBKC):   Ditto;
(TARGET_ZBKX):   Ditto;
(TARGET_ZKNE):   Ditto;
(TARGET_ZKND):   Ditto;
(TARGET_ZKNH):   Ditto;
(TARGET_ZKR):Ditto;
(TARGET_ZKSED):  Ditto;
(TARGET_ZKSH):   Ditto;
(TARGET_ZKT):Ditto;
(MASK_ZTSO): Ditto;
(TARGET_ZTSO):   Ditto;
(MASK_VECTOR_ELEN_32):   Ditto;
(MASK_VECTOR_ELEN_64):   Ditto;
(MASK_VECTOR_ELEN_FP_32):Ditto;
(MASK_VECTOR_ELEN_FP_64):Ditto;
(MASK_VECTOR_ELEN_FP_16):Ditto;
(TARGET_VECTOR_ELEN_32): Ditto;
(TARGET_VECTOR_ELEN_64): Ditto;
(TARGET_VECTOR_ELEN_FP_32):Ditto;
(TARGET_VECTOR_ELEN_FP_64):Ditto;
(TARGET_VECTOR_ELEN_FP_16):Ditto;
 (MASK_ZVBB):   Ditto;
(MASK_ZVBC):   Ditto;
(TARGET_ZVBB): Ditto;
(TARGET_ZVBC): Ditto;
(MASK_ZVKG):   Ditto;
(MASK_ZVKNED): Ditto;
(MASK_ZVKNHA): Ditto;
(MASK_ZVKNHB): Ditto;
(MASK_ZVKSED): Ditto;
(MASK_ZVKSH):  Ditto;
(MASK_ZVKN):   Ditto;
(MASK_ZVKNC):  Ditto;
(MASK_ZVKNG):  Ditto;
(MASK_ZVKS):   Ditto;
(MASK_ZVKSC):  Ditto;
(MASK_ZVKSG):  Ditto;
(MASK_ZVKT):   Ditto;
(TARGET_ZVKG): Ditto;
(TARGET_ZVKNED):   Ditto;
(TARGET_ZVKNHA):   Ditto;
(TARGET_ZVKNHB):   Ditto;
(TARGET_ZVKSED):   Ditto;
(TARGET_ZVKSH):Ditto;
(TARGET_ZVKN): Ditto;
(TARGET_ZVKNC):Ditto;
(TARGET_ZVKNG):Ditto;
(TARGET_ZVKS): Ditto;
(TARGET_ZVKSC):Ditto;
(TARGET_ZVKSG):Ditto;
(TARGET_ZVKT): Ditto;
(MASK_ZVL32B): Ditto;
(MASK_ZVL64B): Ditto;
(MASK_ZVL128B):Ditto;
(MASK_ZVL256B):Ditto;
(MASK_ZVL512B):Ditto;
(MASK_ZVL1024B):   Ditto;
(MASK_ZVL2048B):   Ditto;
(MASK_ZVL4096B):   Ditto;
(MASK_ZVL8192B):   Ditto;
(MASK_ZVL16384B):  Ditto;
(MASK_ZVL32768B):  Ditto;
(MASK_ZVL65536B):  Ditto;
(TARGET_ZVL32B):   Ditto;
(TARGET_ZVL64B):   Ditto;
(TARGET_ZVL128B):  Ditto;
(TARGET_ZVL256B):  Ditto;
(TARGET_ZVL512B):  Ditto;
(TARGET_ZVL1024B): Ditto;
(TARGET_ZVL2048B): Ditto;
(TARGET_ZVL4096B): Ditto;
(TARGET_ZVL8192B): Ditto;
(TARGET_ZVL16384B):Ditto;
(TARGET_ZVL32768B):Ditto;
(TARGET_ZVL65536B):Ditto;
(MASK_ZICBOZ): Ditto;
(MASK_ZICBOM): Ditto;
(MASK_ZICBOP): Ditto;
(TARGET_ZICBOZ):   Ditto;
(TARGET_ZICBOM):   Ditto;
(TARGET_ZICBOP):   Ditto;
(MASK_ZICOND): Ditto;
(TARGET_ZICOND):   Ditto;
(MASK_ZFA):Ditto;
(TARGET_ZFA):  Ditto;
(MASK_ZFHMIN): Ditto;
(MASK_ZFH):Ditto;
(MASK_ZVFHMIN):Ditto;
(MASK_ZVFH):   Ditto;
(TARGET_ZFHMIN):   Ditto;

Re: [PING][PATCH] LoongArch: initial ada support on linux

2023-08-30 Thread chenglulu

ping?

On 2023/8/25 1:55 PM, Yujie Yang wrote:

Hi!

I'd like to ping this patch for acknowledgement from the Ada team.

We have successfully compiled a cross-native toolchain with Ada enabled
for loongarch64-linux-gnuf64 (or loongarch64-linux-gnu), and have run the
regtests with the following results:

=== gnat Summary ===

# of expected passes		3376
# of unexpected failures	1
# of expected failures		23
# of unsupported tests		25

FAIL: gnat.dg/prot7.adb (test for excess errors)

=== acats Summary ===
# of expected passes		2325
# of unexpected failures	3

*** FAILURES: c35503d c35503f c4a007a

While the failures are being worked on, we would like to merge this patch
first so we can have basic ada support for debian test-builds.

Sincerely,
Yujie




Re: [PATCH v2 1/4] LoongArch: improved target configuration interface

2023-08-30 Thread Yujie Yang
On Wed, Aug 30, 2023 at 09:36:22PM +, Joseph Myers wrote:
> On Wed, 30 Aug 2023, Yang Yujie wrote:
> 
> > +A suffix @code{[/ARCH][/OPTION]...]} may follow immediately after the ABI
> > +identifier to customize the compiler options for building the given set of
> > +libraries.  @code{ARCH} denotes the architecture name recognized by the
> > +@code{-march=ARCH} compiler option, which acts as a basic target ISA
> > +configuration that can be adjusted using the subsequent @code{OPTION}
> > +suffixes, where each @code{OPTION} is a compiler option itself.
> 
> Since ARCH and OPTION are not literal strings of program source code, you 
> should actually be using @var{arch} and @var{option} for them (and @dots{} 
> instead of ..., since the ... isn't literal source code either).
> 
> This patch series also adds a new configure option --with-strict-align-lib 
> that needs documenting in the corresponding patch.
> 
> -- 
> Joseph S. Myers
> jos...@codesourcery.com

Thanks for the review.  Does the following fix look good?
If so, I will include these in the patchset.

Yujie

---
 gcc/doc/install.texi | 44 +++-
 1 file changed, 23 insertions(+), 21 deletions(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 05a626280b7..3e589080f4e 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1236,7 +1236,7 @@ sysv, aix.
 @itemx --without-multilib-list
 Specify what multilibs to build.  @var{list} is a comma separated list of
 values, possibly consisting of a single value.  Currently only implemented
-for aarch64*-*-*, arm*-*-*, loongarch64-*-*, riscv*-*-*, sh*-*-* and
+for aarch64*-*-*, arm*-*-*, loongarch*-*-*, riscv*-*-*, sh*-*-* and
 x86-64-*-linux*.  The accepted values and meaning for each target is given
 below.
 
@@ -1331,27 +1331,27 @@ the following ABI identifiers: @code{lp64d[/base]} 
@code{lp64f[/base]}
@code{lp64s[/base]} (the @code{/base} suffix may be omitted)
 to enable their respective run-time libraries.
 
-A suffix @code{[/ARCH][/OPTION]...]} may follow immediately after the ABI
-identifier to customize the compiler options for building the given set of
-libraries.  @code{ARCH} denotes the architecture name recognized by the
-@code{-march=ARCH} compiler option, which acts as a basic target ISA
-configuration that can be adjusted using the subsequent @code{OPTION}
-suffixes, where each @code{OPTION} is a compiler option itself.
+A suffix @code{[/@var{arch}][/@var{option}/@dots{}]} may follow immediately
+after the ABI identifier to customize the compiler options for building the
+given set of libraries.  @var{arch} denotes the architecture name recognized
+by the @option{-march=@var{arch}} compiler option, which acts as a basic target
+ISA configuration that can be adjusted using the subsequent @var{option}
+suffixes, where each @var{option} is a compiler option itself.
 
-If none of such suffix is present, the configured value of
-@option{--with-multilib-default} can be used as a common default suffix
-for all library ABI variants.  Otherwise, the default build option
-@code{-march=abi-default} is applied when building the variants without
-a suffix.
+If no such suffix is present for a given multilib variant, the
+configured value of @code{--with-multilib-default} is appended as a default
+suffix.  If @code{--with-multilib-default} is not given, the default build
+option @code{-march=abi-default} is applied when building the variants
+without a suffix.
 
-As a special case, @code{fixed} may be used in the position of @code{ARCH},
-which means use the architecture configured with @option{--with-arch=ARCH},
-or its default value (e.g. @code{loongarch64} for @code{loongarch64-*}
-targets).
+As a special case, @code{fixed} may be used in the position of @var{arch},
+which means using the architecture configured with
+@code{--with-arch=@var{arch}}, or its default value (e.g. @code{loongarch64}
+for @code{loongarch64-*} targets).
 
-If @var{list} is empty or @code{default}, or if @option{--with-multilib-list}
-is not specified, then the default ABI as specified by @option{--with-abi} or
-implied by @option{--target}.
+If @var{list} is empty or @code{default}, or if @code{--with-multilib-list}
+is not specified, then only the default variant of the libraries are built,
+where the default ABI is implied by the configured target triplet.
 
 @item riscv*-*-*
 @var{list} is a single ABI name.  The target architecture must be either
@@ -1414,6 +1414,9 @@ Multiple @code{OPTION}s may appear consecutively while 
@code{ARCH} may only
 appear in the beginning or be omitted (which means @code{-march=abi-default}
 is applied when building the libraries).
 
+@item --with-strict-align-lib
+On LoongArch targets, build all enabled multilibs with @code{-mstrict-align}
+(Not enabled by default).
 
 @item --with-multilib-generator=@var{config}
 Specify what multilibs to build.  @var{config} is a semicolon separated list of
@@ -4539,8 +4542,7 @@ Uses 

[PATCH v4] LoongArch:Implement 128-bit floating point functions in gcc.

2023-08-30 Thread chenxiaolong
Brief version history of patch set:

v1 -> v2:
   According to the GNU coding conventions, adjust the formatting of the
implementation of the functions with the "q" suffix.

v2 -> v3:

   1. On the LoongArch architecture, follow the behavior of the 64-bit
functions and modify the underlying implementation of the
__builtin_{nanq,nansq} functions in libgcc.

   2. Modify the instruction templates to implement the 128-bit
__builtin_{fabsq,copysignq} functions with instructions such as
"bstrins.d" instead of falling back to libgcc, so as to make better use
of the machine's capabilities.

v3 -> v4:

   1. v1, v2, and v3 all implemented the 128-bit floating-point functions
with "q" as the suffix directly, which is the older approach. v4 abandons
that implementation entirely and instead associates the "q"-suffixed
128-bit floating-point functions with the "f128" functions that already
exist in GCC.

   2. Modify the code so that both the "__float128" and "_Float128" types
are supported by GCC.

   3. Associating the "q"-suffixed functions with the "f128" functions
allows the two spellings to produce the same effect, for example
__builtin_{huge_valq/huge_valf128,infq/inff128,nanq/nanf128,nansq/nansf128}.

   4. For the __builtin_{fabsq,copysignq} functions, do not call the new
"f128" implementation; instead use "bstrins" and other instructions in the
machine description file, which reduces the number of assembly instructions
and yields optimal code.

In this implementation, float128_type_node is bound to the type "__float128"
so that the compiler can correctly identify the type of these functions.  The
"q" suffix is associated with the "f128" functions, which makes GCC flexible
enough to support both spellings in user code, implementing functions such as
__builtin_{huge_valq,infq,fabsq,copysignq,nanq,nansq}.  At the same time, the
__builtin_{copysign{q/f128},fabs{q/f128}} functions are optimized using
"bstrins" and other LoongArch instructions so that the compiler generates the
best possible code.
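
As an illustration, with the macro aliases added in loongarch-c.cc both
spellings below resolve to the same f128 built-in (a usage sketch, not part
of the patch; the function name is made up for the example):

/* The "q" spelling expands to the "f128" built-in via the new macros.  */
__float128
fabs_q_alias (__float128 x)
{
  __float128 a = __builtin_fabsq (x);    /* becomes __builtin_fabsf128 */
  __float128 b = __builtin_fabsf128 (x); /* the canonical spelling */
  return a == b ? a : b;                 /* both compute |x| */
}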

gcc/ChangeLog:

* config/loongarch/loongarch-builtins.cc (loongarch_init_builtins):
Associate the __float128 type to float128_type_node so that it can
be recognized by the compiler.
* config/loongarch/loongarch-c.cc (loongarch_cpu_cpp_builtins):
Add the flag "FLOAT128_TYPE" to GCC and associate the functions
with the suffix "q" to "f128".
* config/loongarch/loongarch.md (abstf2): Modify the instruction
template to implement the __builtin_{copysignf128/fabsf128} functions.
(abstf_local): Ditto.
(copysigntf3): Implement the built-in function __builtin_copysignf128().
* doc/extend.texi: Add support for 128-bit floating-point functions on
the LoongArch architecture.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/math-float-128.c: New test.
---
 gcc/config/loongarch/loongarch-builtins.cc|   5 +
 gcc/config/loongarch/loongarch-c.cc   |  11 ++
 gcc/config/loongarch/loongarch.md |  54 
 gcc/doc/extend.texi   |  20 ++-
 .../gcc.target/loongarch/math-float-128.c | 115 ++
 5 files changed, 202 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/math-float-128.c

diff --git a/gcc/config/loongarch/loongarch-builtins.cc 
b/gcc/config/loongarch/loongarch-builtins.cc
index b929f224dfa..58b612bf445 100644
--- a/gcc/config/loongarch/loongarch-builtins.cc
+++ b/gcc/config/loongarch/loongarch-builtins.cc
@@ -256,6 +256,11 @@ loongarch_init_builtins (void)
   unsigned int i;
   tree type;
 
+  /* Register the type float128_type_node as a built-in type and
+ give it an alias "__float128".  */
+  (*lang_hooks.types.register_builtin_type) (float128_type_node,
+   "__float128");
+
   /* Iterate through all of the bdesc arrays, initializing all of the
  builtin functions.  */
   for (i = 0; i < ARRAY_SIZE (loongarch_builtins); i++)
diff --git a/gcc/config/loongarch/loongarch-c.cc 
b/gcc/config/loongarch/loongarch-c.cc
index 67911b78f28..6ffbf748316 100644
--- a/gcc/config/loongarch/loongarch-c.cc
+++ b/gcc/config/loongarch/loongarch-c.cc
@@ -99,6 +99,17 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile)
   else
 builtin_define ("__loongarch_frlen=0");
 
+  /* Add support for FLOAT128_TYPE on the LoongArch architecture.  */
+  builtin_define ("__FLOAT128_TYPE__");
+
+  /* Map the old _Float128 'q' builtins into the new 'f128' builtins.  */
+  builtin_define ("__builtin_fabsq=__builtin_fabsf128");
+  builtin_define ("__builtin_copysignq=__builtin_copysignf128");
+  builtin_define ("__builtin_nanq=__builtin_nanf128");
+  builtin_define ("__builtin_nansq=__builtin_nansf128");
+  builtin_define 

Re: [PATCH] RISC-V: Fix vsetvl pass ICE

2023-08-30 Thread Lehua Ding

Committed to the trunk and backported to GCC 13 one week later.
Thanks Juzhe and Kito.

On 2023/8/31 9:44, Kito Cheng via Gcc-patches wrote:

OK for gcc 13 branch too, the general rule for backport is to wait one
week on trunk to make sure the fix is stable.


On Thu, Aug 31, 2023 at 8:08 AM juzhe.zh...@rivai.ai
 wrote:


Ok for trunk. But not sure whether it's ok for GCC-13.



juzhe.zh...@rivai.ai

From: Lehua Ding
Date: 2023-08-30 17:51
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw
Subject: [PATCH] RISC-V: Fix vsetvl pass ICE
This patch fixes pr111234 (a vsetvl pass ICE) triggered when fusing a
mask-any (ma) vlmax vsetvl_vtype_change_only insn with a mu vsetvl insn.

PR target/111234

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): Remove condition.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr111234.c: New test.

---
gcc/config/riscv/riscv-vsetvl.cc  |  2 +-
.../gcc.target/riscv/rvv/vsetvl/pr111234.c| 19 +++
2 files changed, 20 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111234.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 1386d9250ca..a81bb53a521 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -655,7 +655,7 @@ gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info,
  new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, vl);
else
  {
-  if (vsetvl_insn_p (rinsn) || vlmax_avl_p (info.get_avl ()))
+  if (vsetvl_insn_p (rinsn))
new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, get_vl (rinsn));
else if (INSN_CODE (rinsn) == CODE_FOR_vsetvl_vtype_change_only)
new_pat = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, new_info, NULL_RTX);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111234.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111234.c
new file mode 100644
index 000..ee5eec4a257
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111234.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
+
+#include 
+
+void
+f (vint32m1_t *in, vint64m2_t *out, vbool32_t *m, int b)
+{
+  vint32m1_t va = *in;
+  vbool32_t mask = *m;
+  vint64m2_t vb
+= __riscv_vwadd_vx_i64m2_m (mask, va, 1, __riscv_vsetvlmax_e64m2 ());
+  vint64m2_t vc = __riscv_vadd_vx_i64m2 (vb, 1, __riscv_vsetvlmax_e64m2 ());
+
+  if (b != 0)
+vc = __riscv_vadd_vx_i64m2_mu (mask, vc, vc, 1, __riscv_vsetvlmax_e64m2 
());
+
+  *out = vc;
+}
--
2.36.3



--
Best,
Lehua



Re: [PATCH] RISC-V: Fix vsetvl pass ICE

2023-08-30 Thread Kito Cheng via Gcc-patches
OK for gcc 13 branch too, the general rule for backport is to wait one
week on trunk to make sure the fix is stable.


On Thu, Aug 31, 2023 at 8:08 AM juzhe.zh...@rivai.ai
 wrote:
>
> Ok for trunk. But not sure whether it's ok for GCC-13.
>
>
>
> juzhe.zh...@rivai.ai
>
> From: Lehua Ding
> Date: 2023-08-30 17:51
> To: gcc-patches
> CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw
> Subject: [PATCH] RISC-V: Fix vsetvl pass ICE
> This patch fixes pr111234 (a vsetvl pass ICE) triggered when fusing a
> mask-any (ma) vlmax vsetvl_vtype_change_only insn with a mu vsetvl insn.
>
> PR target/111234
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): Remove condition.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/vsetvl/pr111234.c: New test.
>
> ---
> gcc/config/riscv/riscv-vsetvl.cc  |  2 +-
> .../gcc.target/riscv/rvv/vsetvl/pr111234.c| 19 +++
> 2 files changed, 20 insertions(+), 1 deletion(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111234.c
>
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc 
> b/gcc/config/riscv/riscv-vsetvl.cc
> index 1386d9250ca..a81bb53a521 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -655,7 +655,7 @@ gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info,
>  new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, vl);
>else
>  {
> -  if (vsetvl_insn_p (rinsn) || vlmax_avl_p (info.get_avl ()))
> +  if (vsetvl_insn_p (rinsn))
> new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, get_vl (rinsn));
>else if (INSN_CODE (rinsn) == CODE_FOR_vsetvl_vtype_change_only)
> new_pat = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, new_info, NULL_RTX);
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111234.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111234.c
> new file mode 100644
> index 000..ee5eec4a257
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111234.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
> +
> +#include 
> +
> +void
> +f (vint32m1_t *in, vint64m2_t *out, vbool32_t *m, int b)
> +{
> +  vint32m1_t va = *in;
> +  vbool32_t mask = *m;
> +  vint64m2_t vb
> += __riscv_vwadd_vx_i64m2_m (mask, va, 1, __riscv_vsetvlmax_e64m2 ());
> +  vint64m2_t vc = __riscv_vadd_vx_i64m2 (vb, 1, __riscv_vsetvlmax_e64m2 ());
> +
> +  if (b != 0)
> +vc = __riscv_vadd_vx_i64m2_mu (mask, vc, vc, 1, __riscv_vsetvlmax_e64m2 
> ());
> +
> +  *out = vc;
> +}
> --
> 2.36.3
>


[r14-3571 Regression] FAIL: gcc.target/i386/pr52252-atom.c scan-assembler palignr on Linux/x86_64

2023-08-30 Thread Jiang, Haochen via Gcc-patches
On Linux/x86_64,

caa7a99a052929d5970677c5b639e1fa5166e334 is the first bad commit
commit caa7a99a052929d5970677c5b639e1fa5166e334
Author: Richard Biener 
Date:   Wed Aug 30 11:57:47 2023 +0200

tree-optimization/111228 - combine two VEC_PERM_EXPRs

caused

FAIL: gcc.target/i386/pr52252-atom.c scan-assembler palignr

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-3571/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr52252-atom.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr52252-atom.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(For question about this report, contact me at haochen dot jiang at intel.com.)
(If you meet cascadelake-related problems, disabling AVX512F on the command
line might help.)
(However, please make sure that there are no potential problems with AVX512.)


RE: [EXTERNAL] Check that passes do not forget to define profile

2023-08-30 Thread Eugene Rozenfeld via Gcc-patches
Hi Jan,

These new checks are too strong for AutoFDO. For example, the edge 
probabilities are not guaranteed to be initialized (see 
afdo_calculate_branch_prob).
This currently breaks the autoprofiledbootstrap build.

I suggest removing
cfun->cfg->full_profile = true;
from auto-profile.cc.

Eugene

-Original Message-
From: Gcc-patches  On 
Behalf Of Jan Hubicka via Gcc-patches
Sent: Thursday, August 24, 2023 6:15 AM
To: gcc-patches@gcc.gnu.org
Subject: [EXTERNAL] Check that passes do not forget to define profile

Hi,
this patch extends the verifier to check that all probabilities and counts
are initialized if a profile is supposed to be present.  This is complicated
a bit by the possibility that we inline a !flag_guess_branch_probability
function into a function with a defined profile; in this case we need to
stop the verification.  For this reason I added a flag to the cfg structure
tracking this.
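
Conceptually, the new check in the CFG verifier has roughly this shape (a
simplified sketch of the idea, not the exact committed code; the helper
name is made up):

/* Sketch: once full_profile is set, every count and edge probability
   must be initialized; otherwise the CFG verifier reports an error.  */
static void
verify_full_profile (basic_block bb)
{
  edge e;
  edge_iterator ei;
  if (!cfun->cfg->full_profile)
    return;
  if (!bb->count.initialized_p ())
    error ("count of bb %d not initialized", bb->index);
  FOR_EACH_EDGE (e, ei, bb->succs)
    if (!e->probability.initialized_p ())
      error ("probability of edge %d->%d not initialized",
	     bb->index, e->dest->index);
}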

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

* cfg.h (struct control_flow_graph): New field full_profile.
* auto-profile.cc (afdo_annotate_cfg): Set full_profile to true.
* cfg.cc (init_flow): Set full_profile to false.
* graphite.cc (graphite_transform_loops): Set full_profile to false.
* lto-streamer-in.cc (input_cfg): Initialize full_profile flag.
* predict.cc (pass_profile::execute): Set full_profile to true.
* symtab-thunks.cc (expand_thunk): Set full_profile to true.
* tree-cfg.cc (gimple_verify_flow_info): Verify that profile is full
if full_profile is set.
* tree-inline.cc (initialize_cfun): Initialize full_profile.
(expand_call_inline): Combine full_profile.


diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc index 
e3af3555e75..ff3b763945c 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -1578,6 +1578,7 @@ afdo_annotate_cfg (const stmt_set &promoted_stmts)
 }
   update_max_bb_count ();
   profile_status_for_fn (cfun) = PROFILE_READ;
+  cfun->cfg->full_profile = true;
   if (flag_value_profile_transformations)
 {
   gimple_value_profile_transformations (); diff --git a/gcc/cfg.cc 
b/gcc/cfg.cc index 9eb9916f61a..b7865f14e7f 100644
--- a/gcc/cfg.cc
+++ b/gcc/cfg.cc
@@ -81,6 +81,7 @@ init_flow (struct function *the_fun)
 = ENTRY_BLOCK_PTR_FOR_FN (the_fun);
   the_fun->cfg->edge_flags_allocated = EDGE_ALL_FLAGS;
   the_fun->cfg->bb_flags_allocated = BB_ALL_FLAGS;
+  the_fun->cfg->full_profile = false;
 }
 

 /* Helper function for remove_edge and free_cffg.  Frees edge structure diff 
--git a/gcc/cfg.h b/gcc/cfg.h index a0e944979c8..53e2553012c 100644
--- a/gcc/cfg.h
+++ b/gcc/cfg.h
@@ -78,6 +78,9 @@ struct GTY(()) control_flow_graph {
   /* Dynamically allocated edge/bb flags.  */
   int edge_flags_allocated;
   int bb_flags_allocated;
+
+  /* Set if the profile is computed on every edge and basic block.  */  
+ bool full_profile;
 };
 
 
diff --git a/gcc/graphite.cc b/gcc/graphite.cc index 19f8975ffa2..2b387d5b016 
100644
--- a/gcc/graphite.cc
+++ b/gcc/graphite.cc
@@ -512,6 +512,8 @@ graphite_transform_loops (void)
 
   if (changed)
 {
+  /* FIXME: Graphite does not update profile meaningfully currently.  */
+  cfun->cfg->full_profile = false;
   cleanup_tree_cfg ();
   profile_status_for_fn (cfun) = PROFILE_ABSENT;
   release_recorded_exits (cfun);
diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc index 
0cce14414ca..d3128fcebe4 100644
--- a/gcc/lto-streamer-in.cc
+++ b/gcc/lto-streamer-in.cc
@@ -1030,6 +1030,7 @@ input_cfg (class lto_input_block *ib, class data_in 
*data_in,
   basic_block p_bb;
   unsigned int i;
   int index;
+  bool full_profile = false;
 
   init_empty_tree_cfg_for_function (fn);
 
@@ -1071,6 +1072,8 @@ input_cfg (class lto_input_block *ib, class data_in 
*data_in,
 data_in->location_cache.input_location_and_block (&e->goto_locus,
						   &bp, ib, data_in);
  e->probability = profile_probability::stream_in (ib);
+ if (!e->probability.initialized_p ())
+   full_profile = false;
 
}
 
@@ -1145,6 +1148,7 @@ input_cfg (class lto_input_block *ib, class data_in 
*data_in,
 
   /* Rebuild the loop tree.  */
   flow_loops_find (loops);
+  cfun->cfg->full_profile = full_profile;
 }
 
 
diff --git a/gcc/predict.cc b/gcc/predict.cc index 5a1a561cc24..396746cbfd1 
100644
--- a/gcc/predict.cc
+++ b/gcc/predict.cc
@@ -4131,6 +4131,7 @@ pass_profile::execute (function *fun)
 scev_initialize ();
 
   tree_estimate_probability (false);
+  cfun->cfg->full_profile = true;
 
   if (nb_loops > 1)
 scev_finalize ();
diff --git a/gcc/symtab-thunks.cc b/gcc/symtab-thunks.cc index 
4c04235c41b..23ead0d2138 100644
--- a/gcc/symtab-thunks.cc
+++ b/gcc/symtab-thunks.cc
@@ -648,6 +648,7 @@ expand_thunk (cgraph_node *node, bool output_asm_thunks,
  ? PROFILE_READ : PROFILE_GUESSED;
   /* FIXME: C++ FE should stop setting TREE_ASM_WRITTEN on 

Re: [PATCH] RISC-V: Fix vsetvl pass ICE

2023-08-30 Thread juzhe.zh...@rivai.ai
Ok for trunk. But not sure whether it's ok for GCC-13.



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-08-30 17:51
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw
Subject: [PATCH] RISC-V: Fix vsetvl pass ICE
This patch fixes pr111234 (a vsetvl pass ICE) triggered when fusing a
mask-any (ma) vlmax vsetvl_vtype_change_only insn with a mu vsetvl insn.
 
PR target/111234
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): Remove condition.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/vsetvl/pr111234.c: New test.
 
---
gcc/config/riscv/riscv-vsetvl.cc  |  2 +-
.../gcc.target/riscv/rvv/vsetvl/pr111234.c| 19 +++
2 files changed, 20 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111234.c
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 1386d9250ca..a81bb53a521 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -655,7 +655,7 @@ gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info,
 new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, vl);
   else
 {
-  if (vsetvl_insn_p (rinsn) || vlmax_avl_p (info.get_avl ()))
+  if (vsetvl_insn_p (rinsn))
new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, get_vl (rinsn));
   else if (INSN_CODE (rinsn) == CODE_FOR_vsetvl_vtype_change_only)
new_pat = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, new_info, NULL_RTX);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111234.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111234.c
new file mode 100644
index 000..ee5eec4a257
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111234.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
+
+#include 
+
+void
+f (vint32m1_t *in, vint64m2_t *out, vbool32_t *m, int b)
+{
+  vint32m1_t va = *in;
+  vbool32_t mask = *m;
+  vint64m2_t vb
+= __riscv_vwadd_vx_i64m2_m (mask, va, 1, __riscv_vsetvlmax_e64m2 ());
+  vint64m2_t vc = __riscv_vadd_vx_i64m2 (vb, 1, __riscv_vsetvlmax_e64m2 ());
+
+  if (b != 0)
+vc = __riscv_vadd_vx_i64m2_mu (mask, vc, vc, 1, __riscv_vsetvlmax_e64m2 
());
+
+  *out = vc;
+}
-- 
2.36.3
 


[RFC PATCH v2 1/1] RISC-V: Add support for 'XVentanaCondOps' reusing 'Zicond' support

2023-08-30 Thread Tsukasa OI via Gcc-patches
From: Tsukasa OI 

'XVentanaCondOps' is a vendor extension from Ventana Micro Systems
containing two instructions for conditional move and will be supported on
their Veyron V1 CPU.

And most notably (for historical reasons), 'XVentanaCondOps' and the
standard 'Zicond' extension are functionally equivalent (only encodings and
instruction names are different).

*   czero.eqz == vt.maskc
*   czero.nez == vt.maskcn
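
In C terms, both pairs implement the conditional-zero idiom; an illustrative
sketch of the semantics (not taken from either specification text):

/* czero.eqz rd, rs1, rs2  ==  vt.maskc  rd, rs1, rs2  */
long czero_eqz (long rs1, long rs2) { return rs2 == 0 ? 0 : rs1; }

/* czero.nez rd, rs1, rs2  ==  vt.maskcn rd, rs1, rs2  */
long czero_nez (long rs1, long rs2) { return rs2 != 0 ? 0 : rs1; }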

This commit adds support for the 'XVentanaCondOps' extension by extending
'Zicond' extension support.  With this, we can now reuse the optimization
using the 'Zicond' extension for the 'XVentanaCondOps' extension.

The specification for the 'XVentanaCondOps' extension is based on:


gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_ext_flag_table):
Parse 'XVentanaCondOps' extension.
* config/riscv/riscv-opts.h (MASK_XVENTANACONDOPS): New.
(TARGET_XVENTANACONDOPS): Ditto.
(TARGET_ZICOND_LIKE): New to represent targets with conditional
moves like 'Zicond'.  It includes RV64 + 'XVentanaCondOps'.
* config/riscv/riscv.cc (riscv_rtx_costs): Replace TARGET_ZICOND
with TARGET_ZICOND_LIKE.
(riscv_expand_conditional_move): Ditto.
* config/riscv/riscv.md (movcc): Replace TARGET_ZICOND with
TARGET_ZICOND_LIKE.
* config/riscv/riscv.opt: Add new riscv_xventana_subext.
* config/riscv/zicond.md: Modify description.
(eqz_ventana): New to match corresponding czero instructions.
(nez_ventana): Ditto.
(*czero..): Emit a 'XVentanaCondOps' instruction if
'Zicond' is not available but 'XVentanaCondOps' + RV64 is.
(*czero..): Ditto.
(*czero.eqz..opt1): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xventanacondops-primitiveSemantics.c: New test,
modified from zicond-primitiveSemantics.c.
* gcc.target/riscv/xventanacondops-primitiveSemantics-rv32.c: New
test to make sure that XVentanaCondOps instructions are disabled
on RV32.
* gcc.target/riscv/xventanacondops-xor-01.c: New test, modified
from zicond-xor-01.c.
---
 gcc/common/config/riscv/riscv-common.cc   |  2 +
 gcc/config/riscv/riscv-opts.h |  6 +++
 gcc/config/riscv/riscv.cc |  4 +-
 gcc/config/riscv/riscv.md |  2 +-
 gcc/config/riscv/riscv.opt|  3 ++
 gcc/config/riscv/zicond.md| 52 ---
 .../xventanacondops-primitiveSemantics-rv32.c | 45 
 .../xventanacondops-primitiveSemantics.c  | 48 +
 .../gcc.target/riscv/xventanacondops-xor-01.c | 14 +
 9 files changed, 154 insertions(+), 22 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/xventanacondops-primitiveSemantics-rv32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/xventanacondops-primitiveSemantics.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-xor-01.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index f142212f2edc..9a0a68fe5db3 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -1493,6 +1493,8 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
   {"xtheadmempair", _options::x_riscv_xthead_subext, MASK_XTHEADMEMPAIR},
   {"xtheadsync",_options::x_riscv_xthead_subext, MASK_XTHEADSYNC},
 
+  {"xventanacondops", _options::x_riscv_xventana_subext, 
MASK_XVENTANACONDOPS},
+
   {NULL, NULL, 0}
 };
 
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 5ed69abd214d..a4fb0a0a5946 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -319,6 +319,12 @@ enum riscv_entity
 #define TARGET_XTHEADMEMPAIR ((riscv_xthead_subext & MASK_XTHEADMEMPAIR) != 0)
 #define TARGET_XTHEADSYNC((riscv_xthead_subext & MASK_XTHEADSYNC) != 0)
 
+#define MASK_XVENTANACONDOPS  (1 << 0)
+
+#define TARGET_XVENTANACONDOPS ((riscv_xventana_subext & MASK_XVENTANACONDOPS) 
!= 0)
+
+#define TARGET_ZICOND_LIKE (TARGET_ZICOND || (TARGET_XVENTANACONDOPS && 
TARGET_64BIT))
+
 /* We only enable VLS modes for VLA vectorization since fixed length VLMAX mode
is the highest priority choice and should not conflict with VLS modes.  */
 #define TARGET_VECTOR_VLS  
\
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 5dc303f89c79..89af39a08190 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2744,7 +2744,7 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
  *total = COSTS_N_INSNS (1);
  return true;
}
-  else if (TARGET_ZICOND
+  else if (TARGET_ZICOND_LIKE
   && outer_code == 

[RFC PATCH v2 0/1] RISC-V: Add support for 'XVentanaCondOps' reusing 'Zicond' support

2023-08-30 Thread Tsukasa OI via Gcc-patches
PATCH v1:


Changes: v1 -> v2
*   Removed bogus opt2 pattern as pointed out in:

note that this change is not in the ChangeLog, as it expects the patch
above to be applied first.




Tsukasa OI (1):
  RISC-V: Add support for 'XVentanaCondOps' reusing 'Zicond' support

 gcc/common/config/riscv/riscv-common.cc   |  2 +
 gcc/config/riscv/riscv-opts.h |  6 +++
 gcc/config/riscv/riscv.cc |  4 +-
 gcc/config/riscv/riscv.md |  2 +-
 gcc/config/riscv/riscv.opt|  3 ++
 gcc/config/riscv/zicond.md| 52 ---
 .../xventanacondops-primitiveSemantics-rv32.c | 45 
 .../xventanacondops-primitiveSemantics.c  | 48 +
 .../gcc.target/riscv/xventanacondops-xor-01.c | 14 +
 9 files changed, 154 insertions(+), 22 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/xventanacondops-primitiveSemantics-rv32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/xventanacondops-primitiveSemantics.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-xor-01.c


base-commit: 597b9ec69bca8acb7a3d65641c0a730de8b27ed4
-- 
2.42.0



[PATCH] rs6000: Update instruction counts to match vec_* calls [PR111228]

2023-08-30 Thread Peter Bergner via Gcc-patches
Commit r14-3258-ge7a36e4715c716 increased the amount of folding we perform,
leading to better code.  Update the expected instruction counts to match
the number of associated vec_* built-in calls.

Tested on powerpc64le-linux with no regressions.  Ok for mainline?

Peter

gcc/testsuite/
PR testsuite/111228
* gcc.target/powerpc/fold-vec-logical-ors-char.c: Update instruction
counts to match the number of associated vec_* built-in calls.
* gcc.target/powerpc/fold-vec-logical-ors-int.c: Likewise.
* gcc.target/powerpc/fold-vec-logical-ors-longlong.c: Likewise.
* gcc.target/powerpc/fold-vec-logical-ors-short.c: Likewise.
* gcc.target/powerpc/fold-vec-logical-other-char.c: Likewise.
* gcc.target/powerpc/fold-vec-logical-other-int.c: Likewise.
* gcc.target/powerpc/fold-vec-logical-other-longlong.c: Likewise.
* gcc.target/powerpc/fold-vec-logical-other-short.c: Likewise.

diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-logical-ors-char.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-logical-ors-char.c
index 713fed7824a..7406039d054 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-logical-ors-char.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-logical-ors-char.c
@@ -120,6 +120,6 @@ test6_nor (vector unsigned char x, vector unsigned char y)
   return *foo;
 }
 
-/* { dg-final { scan-assembler-times {\mxxlor\M} 7 } } */
+/* { dg-final { scan-assembler-times {\mxxlor\M} 6 } } */
 /* { dg-final { scan-assembler-times {\mxxlxor\M} 6 } } */
-/* { dg-final { scan-assembler-times {\mxxlnor\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxxlnor\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-logical-ors-int.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-logical-ors-int.c
index 4d1c78f40ec..a7c6366b938 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-logical-ors-int.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-logical-ors-int.c
@@ -119,6 +119,6 @@ test6_nor (vector unsigned int x, vector unsigned int y)
   return *foo;
 }
 
-/* { dg-final { scan-assembler-times {\mxxlor\M} 7 } } */
+/* { dg-final { scan-assembler-times {\mxxlor\M} 6 } } */
 /* { dg-final { scan-assembler-times {\mxxlxor\M} 6 } } */
-/* { dg-final { scan-assembler-times {\mxxlnor\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxxlnor\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-logical-ors-longlong.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-logical-ors-longlong.c
index 27ef09ada80..10c69d3d87b 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-logical-ors-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-logical-ors-longlong.c
@@ -156,6 +156,6 @@ test6_nor (vector unsigned long long x, vector unsigned 
long long y)
 // For simplicity, this test now only targets "powerpc_p8vector_ok" 
environments
 // where the answer is expected to be 6.
 
-/* { dg-final { scan-assembler-times {\mxxlor\M} 9 } } */
+/* { dg-final { scan-assembler-times {\mxxlor\M} 6 } } */
 /* { dg-final { scan-assembler-times {\mxxlxor\M} 6 } } */
-/* { dg-final { scan-assembler-times {\mxxlnor\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mxxlnor\M} 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-logical-ors-short.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-logical-ors-short.c
index f796c5b33a9..8352a7f4dc5 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-logical-ors-short.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-logical-ors-short.c
@@ -119,6 +119,6 @@ test6_nor (vector unsigned short x, vector unsigned short y)
   return *foo;
 }
 
-/* { dg-final { scan-assembler-times {\mxxlor\M} 7 } } */
+/* { dg-final { scan-assembler-times {\mxxlor\M} 6 } } */
 /* { dg-final { scan-assembler-times {\mxxlxor\M} 6 } } */
-/* { dg-final { scan-assembler-times {\mxxlnor\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxxlnor\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-logical-other-char.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-logical-other-char.c
index e74308ccda2..7fe3e0b8e0e 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-logical-other-char.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-logical-other-char.c
@@ -104,5 +104,5 @@ test6_nand (vector unsigned char x, vector unsigned char y)
   return *foo;
 }
 
-/* { dg-final { scan-assembler-times {\mxxlnand\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mxxlnand\M} 6 } } */
 /* { dg-final { scan-assembler-times {\mxxlorc\M} 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-logical-other-int.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-logical-other-int.c
index 57edaad52a8..61d34059b67 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-logical-other-int.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-logical-other-int.c
@@ -104,5 +104,5 @@ test6_nand (vector unsigned int x, vector unsigned int y)
   return *foo;
 }
 
-/* { dg-final { scan-assembler-times {\mxxlnand\M} 3 } } 

Re: RFC: Top level configure: Require a minimum version 6.8 texinfo

2023-08-30 Thread Tom Tromey
> "Eric" == Eric Gallager via Gdb-patches  
> writes:

Eric> Just as a point of reference, but the default makeinfo shipped with
Eric> macOS (/usr/bin/makeinfo) is stuck at version 4.8 due to the whole
Eric> GPL3 transition. The other makeinfos that I have installed are:
[...]

I think brew has a newer one.

However, I also sent a patch to back out what we think is the problem
patch.  Could you try that?  It's on the binutils list.

Tom


[PATCH] MATCH: extend min_value/max_value match to vectors

2023-08-30 Thread Andrew Pinski via Gcc-patches
This simple patch extends the min_value/max_value match to vector integer types.
Using uniform_integer_cst_p makes this easy.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

The testcases pr110915-*.c are the same as pr88784-*.c except using vector
types instead.

PR tree-optimization/110915

gcc/ChangeLog:

* match.pd (min_value, max_value): Extend to vector constants.

gcc/testsuite/ChangeLog:

* gcc.dg/pr110915-1.c: New test.
* gcc.dg/pr110915-10.c: New test.
* gcc.dg/pr110915-11.c: New test.
* gcc.dg/pr110915-12.c: New test.
* gcc.dg/pr110915-2.c: New test.
* gcc.dg/pr110915-3.c: New test.
* gcc.dg/pr110915-4.c: New test.
* gcc.dg/pr110915-5.c: New test.
* gcc.dg/pr110915-6.c: New test.
* gcc.dg/pr110915-7.c: New test.
* gcc.dg/pr110915-8.c: New test.
* gcc.dg/pr110915-9.c: New test.
---
 gcc/match.pd   | 24 ++
 gcc/testsuite/gcc.dg/pr110915-1.c  | 31 
 gcc/testsuite/gcc.dg/pr110915-10.c | 33 ++
 gcc/testsuite/gcc.dg/pr110915-11.c | 31 
 gcc/testsuite/gcc.dg/pr110915-12.c | 31 
 gcc/testsuite/gcc.dg/pr110915-2.c  | 31 
 gcc/testsuite/gcc.dg/pr110915-3.c  | 33 ++
 gcc/testsuite/gcc.dg/pr110915-4.c  | 33 ++
 gcc/testsuite/gcc.dg/pr110915-5.c  | 32 +
 gcc/testsuite/gcc.dg/pr110915-6.c  | 32 +
 gcc/testsuite/gcc.dg/pr110915-7.c  | 32 +
 gcc/testsuite/gcc.dg/pr110915-8.c  | 32 +
 gcc/testsuite/gcc.dg/pr110915-9.c  | 33 ++
 13 files changed, 400 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr110915-1.c
 create mode 100644 gcc/testsuite/gcc.dg/pr110915-10.c
 create mode 100644 gcc/testsuite/gcc.dg/pr110915-11.c
 create mode 100644 gcc/testsuite/gcc.dg/pr110915-12.c
 create mode 100644 gcc/testsuite/gcc.dg/pr110915-2.c
 create mode 100644 gcc/testsuite/gcc.dg/pr110915-3.c
 create mode 100644 gcc/testsuite/gcc.dg/pr110915-4.c
 create mode 100644 gcc/testsuite/gcc.dg/pr110915-5.c
 create mode 100644 gcc/testsuite/gcc.dg/pr110915-6.c
 create mode 100644 gcc/testsuite/gcc.dg/pr110915-7.c
 create mode 100644 gcc/testsuite/gcc.dg/pr110915-8.c
 create mode 100644 gcc/testsuite/gcc.dg/pr110915-9.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 6a7edde5736..c01362ee359 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2750,16 +2750,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   & (bitpos / BITS_PER_UNIT))); }
 
 (match min_value
- INTEGER_CST
- (if ((INTEGRAL_TYPE_P (type)
-   || POINTER_TYPE_P(type))
-  && wi::eq_p (wi::to_wide (t), wi::min_value (type)
+ uniform_integer_cst_p
+ (with {
+   tree int_cst = uniform_integer_cst_p (t);
+   tree inner_type = TREE_TYPE (int_cst);
+  }
+  (if ((INTEGRAL_TYPE_P (inner_type)
+|| POINTER_TYPE_P (inner_type))
+   && wi::eq_p (wi::to_wide (int_cst), wi::min_value (inner_type))
 
 (match max_value
- INTEGER_CST
- (if ((INTEGRAL_TYPE_P (type)
-   || POINTER_TYPE_P(type))
-  && wi::eq_p (wi::to_wide (t), wi::max_value (type)
+ uniform_integer_cst_p
+ (with {
+   tree int_cst = uniform_integer_cst_p (t);
+   tree itype = TREE_TYPE (int_cst);
+  }
+ (if ((INTEGRAL_TYPE_P (itype)
+   || POINTER_TYPE_P (itype))
+  && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))
 
 /* x >  y  &&  x != XXX_MIN  -->  x > y
x >  y  &&  x == XXX_MIN  -->  false . */
diff --git a/gcc/testsuite/gcc.dg/pr110915-1.c 
b/gcc/testsuite/gcc.dg/pr110915-1.c
new file mode 100644
index 000..2e1e871b9a0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr110915-1.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ifcombine" } */
+#define vector __attribute__((vector_size(sizeof(unsigned)*2)))
+
+#include 
+
+vector signed and1(vector unsigned x, vector unsigned y)
+{
+  /* (x > y) & (x != 0)  --> x > y */
+  return (x > y) & (x != 0);
+}
+
+vector signed and2(vector unsigned x, vector unsigned y)
+{
+  /* (x < y) & (x != UINT_MAX)  --> x < y */
+  return (x < y) & (x != UINT_MAX);
+}
+
+vector signed and3(vector signed x, vector signed y)
+{
+  /* (x > y) & (x != INT_MIN)  --> x > y */
+  return (x > y) & (x != INT_MIN);
+}
+
+vector signed and4(vector signed x, vector signed y)
+{
+  /* (x < y) & (x != INT_MAX)  --> x < y */
+  return (x < y) & (x != INT_MAX);
+}
+
+/* { dg-final { scan-tree-dump-not " != " "ifcombine" } } */
diff --git a/gcc/testsuite/gcc.dg/pr110915-10.c 
b/gcc/testsuite/gcc.dg/pr110915-10.c
new file mode 100644
index 000..b0644bf3123
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr110915-10.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { 

Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]

2023-08-30 Thread Eric Feng via Gcc-patches
On Tue, Aug 29, 2023 at 5:14 PM David Malcolm  wrote:
>
> On Tue, 2023-08-29 at 13:28 -0400, Eric Feng wrote:
> > Additionally, by using the old model and the pointer per your
> > suggestion,
> > we are able to find the representative tree and emit a more accurate
> > diagnostic!
> >
> > rc3.c:23:10: warning: expected ‘item’ to have reference count: ‘1’
> > but ob_refcnt field is: ‘2’
> >23 |   return list;
> >   |  ^~~~
> >   ‘create_py_object’: events 1-4
> > |
> > |4 |   PyObject* item = PyLong_FromLong(3);
> > |  |^~
> > |  ||
> > |  |(1) when ‘PyLong_FromLong’ succeeds
> > |5 |   PyObject* list = PyList_New(1);
> > |  |~
> > |  ||
> > |  |(2) when ‘PyList_New’ succeeds
> > |..
> > |   14 |   PyList_Append(list, item);
> > |  |   ~
> > |  |   |
> > |  |   (3) when ‘PyList_Append’ succeeds, moving buffer
> > |..
> > |   23 |   return list;
> > |  |  
> > |  |  |
> > |  |  (4) here
> > |
>
> Excellent, that's a big improvement.
>
> >
> > If a representative tree is not found, I decided we should just bail
> > out
> > of emitting a diagnostic for now, to avoid confusing the user on what
> > the problem is.
>
> Fair enough.
>
> >
> > I've attached the patch for this (on top of the previous one) below.
> > If
> > it also looks good, I can merge it with the last patch and push it in
> > at
> > the same time.
>
> I don't mind either way, but please can you update the tests so that we
> have some automated test coverage that the correct name is being
> printed in the warning.
>
> Thanks
> Dave
>

Sorry — forgot to hit 'reply all' in the previous e-mail. Resending to
preserve our chain on the list:

---

Thanks; pushed to trunk with nits fixed:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=597b9ec69bca8acb7a3d65641c0a730de8b27ed4.

Incidentally, I updated my formatting settings in VSCode, which I've
previously mentioned in passing. In case anyone is interested:

"C_Cpp.clang_format_style": "{ BasedOnStyle: GNU, UseTab: Always,
TabWidth: 8, IndentWidth: 2, BinPackParameters: false,
AlignAfterOpenBracket: Align,
AllowAllParametersOfDeclarationOnNextLine: true }",

This fixes some issues with the indent width and also ensures function
parameters of appropriate length are aligned properly and on a new
line each (like the rest of the analyzer code).
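
For illustration, a declaration formatted with those settings
(hypothetical names, not actual analyzer code):

  static void
  emit_refcnt_warning (const char *function_name,
		       int expected_refcnt,
		       int actual_refcnt);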


[PATCH] RISC-V: zicond: remove bogus opt2 pattern

2023-08-30 Thread Vineet Gupta
This was tripping up gcc.c-torture/execute/pr60003.c at -O1 since the
pattern semantics can't be expressed by zicond instructions.

This involves test code snippet:

  if (a == 0)
return 0;
  else
return x;
}

which is equivalent to:  "x = (a != 0) ? x : a"

and matches define_insn "*czero.nez.<GPR:mode><X:mode>.opt2"

| (insn 41 20 38 3 (set (reg/v:DI 136 [ x ])
|(if_then_else:DI (ne (reg/v:DI 134 [ a ])
|(const_int 0 [0]))
|(reg/v:DI 136 [ x ])
|(reg/v:DI 134 [ a ]))) {*czero.nez.didi.opt2}

The corresponding asm pattern generates
czero.nez x, x, a   ; %0, %2, %1
implying
"x = (a != 0) ? 0 : a"

which is not what the pattern semantics are.

Essentially "(a != 0) ? x : a" cannot be expressed with CZERO.nez
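
A small C model makes the mismatch concrete (illustrative only, not
part of the patch):

  #include <stdint.h>
  #include <assert.h>

  /* Pattern semantics: (a != 0) ? x : a.  */
  static uint64_t pattern (uint64_t a, uint64_t x) { return a != 0 ? x : a; }

  /* czero.nez rd,rs1,rs2:  rd = (rs2 != 0) ? 0 : rs1.  */
  static uint64_t czero_nez (uint64_t rs1, uint64_t rs2)
  { return rs2 != 0 ? 0 : rs1; }

  int main (void)
  {
    /* a = 1, x = 5: the pattern yields 5, czero.nez x,x,a yields 0.  */
    assert (pattern (1, 5) == 5);
    assert (czero_nez (5, 1) == 0);
    return 0;
  }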

As a side note, while correctness prevails, this test still gets a
czero in the end, albeit a different one.

if-convert generates two if_then_else

| (insn 43 20 44 3 (set (reg:DI 143)
|(reg/v:DI 136 [ x ])) "pr60003.c":36:9 179 {*movdi_64bit}
| (insn 44 43 46 3 (set (reg:DI 142)
|(reg/v:DI 134 [ a ])) "pr60003.c":36:9 179 {*movdi_64bit}
|
| (insn 46 44 47 3 (set (reg:DI 145)
|(if_then_else:DI (ne:DI (reg/v:DI 134 [ a ])
|(const_int 0 [0]))
|(const_int 0 [0])
|(reg:DI 142))) "pr60003.c":36:9 14532 {*czero.nez.didi}
|
| (insn 47 46 48 3 (set (reg:DI 144)
|(if_then_else:DI (eq:DI (reg/v:DI 134 [ a ])
|(const_int 0 [0]))
|(const_int 0 [0])
|(reg:DI 143))) "pr60003.c":36:9 14531 {*czero.eqz.didi}

and combine is able to fuse them together

| (insn 38 48 39 3 (set (reg/i:DI 10 a0)
|(if_then_else:DI (eq:DI (reg/v:DI 134 [ a ])
|(const_int 0 [0]))
|(const_int 0 [0])
|(reg:DI 143))) "pr60003.c":40:1 14531 {*czero.eqz.didi}

before fix              after fix
----------              ----------
li        a5,1          li        a0,1
ld        a4,8(sp)      ld        a5,8(sp)
czero.nez a0,a4,a5      czero.eqz a0,a5,a0

The issue only happens at -O1 as at higher optimization levels, the
whole conditional move gets optimized away.

gcc/ChangeLog:
* config/riscv/zicond.md: Remove incorrect op2 pattern.

Fixes: 1d5bc3285e8a ("[committed][RISC-V] Fix 20010221-1.c with zicond")
Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/zicond.md | 10 --
 1 file changed, 10 deletions(-)

diff --git a/gcc/config/riscv/zicond.md b/gcc/config/riscv/zicond.md
index 25f21d33487e..aa5607a9efd8 100644
--- a/gcc/config/riscv/zicond.md
+++ b/gcc/config/riscv/zicond.md
@@ -52,13 +52,3 @@
   "TARGET_ZICOND && rtx_equal_p (operands[1], operands[2])"
   "czero.eqz\t%0,%3,%1"
 )
-
-(define_insn "*czero.nez.<GPR:mode><X:mode>.opt2"
-  [(set (match_operand:GPR 0 "register_operand"   "=r")
-(if_then_else:GPR (ne (match_operand:X 1 "register_operand" "r")
-  (const_int 0))
-  (match_operand:GPR 2 "register_operand" "r")
-  (match_operand:GPR 3 "register_operand" "1")))]
-  "TARGET_ZICOND && rtx_equal_p (operands[1], operands[3])"
-  "czero.nez\t%0,%2,%1"
-)
-- 
2.34.1



Re: Analyzer failure due to missing header

2023-08-30 Thread David Malcolm via Gcc-patches
On Wed, 2023-08-30 at 23:24 +0200, FX Coudert wrote:
> > std::max and std::min, introduced by d99d73c77d1e and 2bad0eeb5573,
> > are not available because <algorithm> is not included.
> 
> I originally thought this was only seen in cross-compilers, but it
> actually broke bootstrap on darwin.
> Attached patch restores it, OK to commit?

LGTM

Thanks
Dave



Re: [PATCH v2 1/4] LoongArch: improved target configuration interface

2023-08-30 Thread Joseph Myers
On Wed, 30 Aug 2023, Yang Yujie wrote:

> +A suffix @code{[/ARCH][/OPTION]...]} may follow immediately after the ABI
> +identifier to customize the compiler options for building the given set of
> +libraries.  @code{ARCH} denotes the architecture name recognized by the
> +@code{-march=ARCH} compiler option, which acts as a basic target ISA
> +configuration that can be adjusted using the subsequent @code{OPTION}
> +suffixes, where each @code{OPTION} is a compiler option itself.

Since ARCH and OPTION are not literal strings of program source code, you 
should actually be using @var{arch} and @var{option} for them (and @dots{} 
instead of ..., since the ... isn't literal source code either).

This patch series also adds a new configure option --with-strict-align-lib 
that needs documenting in the corresponding patch.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Analyzer failure due to missing header

2023-08-30 Thread FX Coudert via Gcc-patches
> std::max and std::min, introduced by d99d73c77d1e and 2bad0eeb5573, are not 
> available because <algorithm> is not included.

I originally thought this was only seen in cross-compilers, but it actually 
broke bootstrap on darwin.
Attached patch restores it, OK to commit?

FX



0001-Analyzer-include-algorithm-header.patch
Description: Binary data


Re: [PATCH] c++: Check for indirect change of active union member in constexpr [PR101631]

2023-08-30 Thread Jason Merrill via Gcc-patches

On 8/29/23 09:35, Nathaniel Shead wrote:

This is an attempt to improve the constexpr machinery's handling of
union lifetime by catching more cases that cause UB. Is this approach
OK?

I'd also like some feedback on a couple of pain points with this
implementation; in particular, is there a good way to detect if a type
has a non-deleted trivial constructor? I've used 'is_trivially_xible' in
this patch, but that also checks for a trivial destructor which by my
reading of [class.union.general]p5 is possibly incorrect. Checking for a
trivial default constructor doesn't seem too hard but I couldn't find a
good way of checking if that constructor is deleted.


I guess the simplest would be

(TYPE_HAS_TRIVIAL_DFLT (t) && locate_ctor (t))

because locate_ctor returns null for a deleted default ctor.  It would 
be good to make this a separate predicate.
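
E.g. something like this (a sketch; the name is invented):

  /* True iff T has a trivial, non-deleted default constructor;
     locate_ctor returns NULL_TREE for a deleted default ctor.  */
  static bool
  type_has_usable_trivial_dflt_p (tree t)
  {
    return TYPE_HAS_TRIVIAL_DFLT (t) && locate_ctor (t) != NULL_TREE;
  }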



I'm also generally unsatisfied with the additional complexity with the
third 'refs' argument in 'cxx_eval_store_expression' being pushed and
popped; would it be better to replace this with a vector of some
specific structure type for the data that needs to be passed on?


Perhaps, but what you have here is fine.  Another possibility would be 
to just have a vec of the refs and extract the index from the ref later 
as needed.


Jason



[committed] pru: Add cstore expansion patterns

2023-08-30 Thread Dimitar Dimitrov
Add cstore patterns for the two specific operations which can be
efficiently expanded using the UMIN instruction:
  X != 0
  X == 0
The rest of the operations are rejected, and left to be expanded
by the common expansion code.
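
For reference, the UMIN trick in plain C (illustrative, not part of
the patch):

  #include <assert.h>

  /* x != 0 is umin (x, 1);  x == 0 is umin (x, 1) ^ 1.  */
  static unsigned umin (unsigned a, unsigned b) { return a < b ? a : b; }

  int main (void)
  {
    for (unsigned x = 0; x < 5; x++)
      {
	assert (umin (x, 1) == (unsigned) (x != 0));
	assert ((umin (x, 1) ^ 1) == (unsigned) (x == 0));
      }
    return 0;
  }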

Reg-tested pru-unknown-elf.  Pushed to trunk.

PR target/106562

gcc/ChangeLog:

* config/pru/predicates.md (const_0_operand): New predicate.
(pru_cstore_comparison_operator): Ditto.
* config/pru/pru.md (cstore4): New pattern.
(cstoredi4): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/pru/pr106562-10.c: New test.
* gcc.target/pru/pr106562-11.c: New test.
* gcc.target/pru/pr106562-5.c: New test.
* gcc.target/pru/pr106562-6.c: New test.
* gcc.target/pru/pr106562-7.c: New test.
* gcc.target/pru/pr106562-8.c: New test.
* gcc.target/pru/pr106562-9.c: New test.

Signed-off-by: Dimitar Dimitrov 
---
 gcc/config/pru/predicates.md   |  8 +++
 gcc/config/pru/pru.md  | 62 ++
 gcc/testsuite/gcc.target/pru/pr106562-10.c |  8 +++
 gcc/testsuite/gcc.target/pru/pr106562-11.c |  8 +++
 gcc/testsuite/gcc.target/pru/pr106562-5.c  |  8 +++
 gcc/testsuite/gcc.target/pru/pr106562-6.c  |  8 +++
 gcc/testsuite/gcc.target/pru/pr106562-7.c  |  8 +++
 gcc/testsuite/gcc.target/pru/pr106562-8.c  |  8 +++
 gcc/testsuite/gcc.target/pru/pr106562-9.c  |  8 +++
 9 files changed, 126 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/pru/pr106562-10.c
 create mode 100644 gcc/testsuite/gcc.target/pru/pr106562-11.c
 create mode 100644 gcc/testsuite/gcc.target/pru/pr106562-5.c
 create mode 100644 gcc/testsuite/gcc.target/pru/pr106562-6.c
 create mode 100644 gcc/testsuite/gcc.target/pru/pr106562-7.c
 create mode 100644 gcc/testsuite/gcc.target/pru/pr106562-8.c
 create mode 100644 gcc/testsuite/gcc.target/pru/pr106562-9.c

diff --git a/gcc/config/pru/predicates.md b/gcc/config/pru/predicates.md
index e4a7fcf259b..faa0dbf9fb4 100644
--- a/gcc/config/pru/predicates.md
+++ b/gcc/config/pru/predicates.md
@@ -22,6 +22,10 @@ (define_predicate "const_1_operand"
   (and (match_code "const_int")
(match_test "INTVAL (op) == 1")))
 
+(define_predicate "const_0_operand"
+  (and (match_code "const_int")
+   (match_test "INTVAL (op) == 0")))
+
 ; Note: Always pass a valid mode!
 (define_predicate "const_ubyte_operand"
   (match_code "const_int")
@@ -49,6 +53,10 @@ (define_predicate "pru_signed_cmp_operator"
 (define_predicate "pru_fp_comparison_operator"
   (match_code "eq,ne,lt,gt,le,ge"))
 
+;; TRUE for comparisons supported by PRU's cstore.
+(define_predicate "pru_cstore_comparison_operator"
+  (match_code "eq,ne,gtu"))
+
 ;; Return true if OP is a constant that contains only one 1 in its
 ;; binary representation.
 (define_predicate "single_one_operand"
diff --git a/gcc/config/pru/pru.md b/gcc/config/pru/pru.md
index 6deb5ecfecb..93ad7b6ad7e 100644
--- a/gcc/config/pru/pru.md
+++ b/gcc/config/pru/pru.md
@@ -1489,6 +1489,68 @@ (define_expand "cbranchdi4"
 gcc_unreachable ();
 })
 
+;; Emit efficient code for two specific cstore cases:
+;;   X == 0
+;;   X != 0
+;;
+;; These can be efficiently compiled on the PRU using the umin
+;; instruction.
+;;
+;; This expansion does not handle "X > 0 unsigned" and "X >= 1 unsigned"
+;; because it is assumed that those would have been replaced with the
+;; canonical "X != 0".
+(define_expand "cstore<mode>4"
+  [(set (match_operand:QISI 0 "register_operand")
+   (match_operator:QISI 1 "pru_cstore_comparison_operator"
+ [(match_operand:QISI 2 "register_operand")
+  (match_operand:QISI 3 "const_0_operand")]))]
+  ""
+{
+  const enum rtx_code op1code = GET_CODE (operands[1]);
+
+  /* Crash if OP1 is GTU.  It would mean that "X > 0 unsigned"
+ had not been canonicalized before calling this expansion.  */
+  gcc_assert (op1code == NE || op1code == EQ);
+  gcc_assert (CONST_INT_P (operands[3]) && INTVAL (operands[3]) == 0);
+
+  if (op1code == NE)
+{
+  emit_insn (gen_umin<mode>3 (operands[0], operands[2], const1_rtx));
+  DONE;
+}
+  else if (op1code == EQ)
+{
+  rtx tmpval = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_umin<mode>3 (tmpval, operands[2], const1_rtx));
+  emit_insn (gen_xor<mode>3 (operands[0], tmpval, const1_rtx));
+  DONE;
+}
+
+  gcc_unreachable ();
+})
+
+(define_expand "cstoredi4"
+  [(set (match_operand:SI 0 "register_operand")
+   (match_operator:SI 1 "pru_cstore_comparison_operator"
+ [(match_operand:DI 2 "register_operand")
+  (match_operand:DI 3 "const_0_operand")]))]
+  ""
+{
+  /* Combining the two SImode suboperands with IOR works only for
+ the currently supported set of cstoresi3 operations.  */
+  const enum rtx_code op1code = GET_CODE (operands[1]);
+  gcc_assert (op1code == NE || op1code == EQ);
+  gcc_assert (CONST_INT_P (operands[3]) && INTVAL (operands[3]) == 0);
+
+  rtx tmpval = gen_reg_rtx (SImode);
+  rtx src_lo = simplify_gen_subreg 

Re: [PATCH] c++: CWG 2359, wrong copy-init with designated init [PR91319]

2023-08-30 Thread Marek Polacek via Gcc-patches
On Tue, Aug 29, 2023 at 04:44:11PM -0400, Jason Merrill wrote:
> On 8/28/23 19:09, Marek Polacek wrote:
> > On Mon, Aug 28, 2023 at 06:27:26PM -0400, Jason Merrill wrote:
> > > On 8/25/23 12:44, Marek Polacek wrote:
> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > 
> > > > -- >8 --
> > > > 
> > > > This CWG clarifies that designated initializers support
> > > > direct-initialization.
> > > > Just be careful what Note 2 in [dcl.init.aggr]/4.2 says: "If the
> > > > initialization is by designated-initializer-clause, its form determines
> > > > whether copy-initialization or direct-initialization is performed."  
> > > > Hence
> > > > this patch sets CONSTRUCTOR_IS_DIRECT_INIT only when we are dealing with
> > > > ".x{}", but not ".x = {}".
> > > > 
> > > > PR c++/91319
> > > > 
> > > > gcc/cp/ChangeLog:
> > > > 
> > > > * parser.cc (cp_parser_initializer_list): Set 
> > > > CONSTRUCTOR_IS_DIRECT_INIT
> > > > when the designated initializer is of the .x{} form.
> > > > 
> > > > gcc/testsuite/ChangeLog:
> > > > 
> > > > * g++.dg/cpp2a/desig30.C: New test.
> > > > ---
> > > >gcc/cp/parser.cc |  6 ++
> > > >gcc/testsuite/g++.dg/cpp2a/desig30.C | 22 ++
> > > >2 files changed, 28 insertions(+)
> > > >create mode 100644 gcc/testsuite/g++.dg/cpp2a/desig30.C
> > > > 
> > > > diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> > > > index eeb22e44fb4..b3d5c65b469 100644
> > > > --- a/gcc/cp/parser.cc
> > > > +++ b/gcc/cp/parser.cc
> > > > @@ -25718,6 +25718,7 @@ cp_parser_initializer_list (cp_parser* parser, 
> > > > bool* non_constant_p,
> > > >  tree designator;
> > > >  tree initializer;
> > > >  bool clause_non_constant_p;
> > > > +  bool direct_p = false;
> > > >  location_t loc = cp_lexer_peek_token (parser->lexer)->location;
> > > >  /* Handle the C++20 syntax, '. id ='.  */
> > > > @@ -25740,6 +25741,8 @@ cp_parser_initializer_list (cp_parser* parser, 
> > > > bool* non_constant_p,
> > > >   if (cp_lexer_next_token_is (parser->lexer, CPP_EQ))
> > > > /* Consume the `='.  */
> > > > cp_lexer_consume_token (parser->lexer);
> > > > + else
> > > > +   direct_p = true;
> > > > }
> > > >  /* Also, if the next token is an identifier and the following 
> > > > one is a
> > > >  colon, we are looking at the GNU designated-initializer
> > > > @@ -25817,6 +25820,9 @@ cp_parser_initializer_list (cp_parser* parser, 
> > > > bool* non_constant_p,
> > > >  if (clause_non_constant_p && non_constant_p)
> > > > *non_constant_p = true;
> > > > +  if (TREE_CODE (initializer) == CONSTRUCTOR)
> > > > +   CONSTRUCTOR_IS_DIRECT_INIT (initializer) |= direct_p;
> > > 
> > > Why |= rather than = ?
> > 
> > CONSTRUCTOR_IS_DIRECT_INIT could already have been set earlier so using
> > = might wrongly clear it.  I saw this in direct-enum-init1.C.
> 
> What is setting it earlier?

cp_parser_functional_cast.  Test:

enum class C {};

template 
void
foo ()
{
  C c = { C{8} };
}

void
test ()
{
  foo<0> ();
}

The template actually matters here because then finish_compound_literal
returns {8} and not just 8 due to:

  /* If we're in a template, return the original compound literal.  */
  if (orig_cl)
return orig_cl;
 
> The patch is OK with a comment explaining that.

Ok, I'll say that CONSTRUCTOR_IS_DIRECT_INIT could have been set in
cp_parser_functional_cast so we must be careful not to clear the flag.
Thanks,

Marek



Re: [PATCH] expmed: Allow extract_bit_field via mem for low-precision modes.

2023-08-30 Thread Richard Sandiford via Gcc-patches
Robin Dapp  writes:
>> But in the VLA case, doesn't it instead have precision 4+4X?
>> The problem then is that we can't tell at compile time which
>> byte that corresponds to.  So...
>
> Yes 4 + 4x.  I keep getting confused with poly modes :)
> In this case we want to extract the bitnum [3 4] = 3 + 4x which
> would be in byte 0 for x = 0 or x = 1 and in byte 1 for x = 2, 3 and
> so on.
>
> Can't we still make that work somehow?  As far as I can tell we're looking
> for the byte range to be accessed.  It's not like we have a precision or
> bitnum of e.g. [3 17] where the access could be anywhere but still a pow2
> fraction of BITS_PER_UNIT.
>
> I'm just having trouble writing that down.
>
> What about something like
>
> int factor = BITS_PER_UNIT / prec.coeffs[0];
> bytenum = force_align_down_and_div (bitnum, prec.coeffs[0]);
> bytenum *= factor;
>
> (or a similar thing done manually without helpers) guarded by the
> proper condition?
> Or do we need something more generic for the factor (i.e. prec.coeffs[0])
> is not enough when we have a precision like [8 16]?  Does that even exist?

It's not just a question of which byte though.  It's also a question
of which bit.

One option would be to code-generate for even X and for odd X, and select
between them at runtime.  But that doesn't scale well to 2+2X and 1+1X.

Otherwise I think we need to treat the bit position as a variable,
with bitpos % 8 and bitpos / 8 being calculated at runtime.
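
Roughly, in C terms (illustration only; x1 is the runtime length factor):

  static void
  split_bitpos (unsigned x1, unsigned *bytepos, unsigned *subbit)
  {
    unsigned bitpos = 3 + 4 * x1;  /* bit [3,4] = 3 + 4*x1 */
    *bytepos = bitpos / 8;         /* BITS_PER_UNIT == 8 assumed */
    *subbit = bitpos % 8;          /* bit within that byte */
  }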

Thanks,
Richard




Re: [PATCH] expmed: Allow extract_bit_field via mem for low-precision modes.

2023-08-30 Thread Robin Dapp via Gcc-patches
> But in the VLA case, doesn't it instead have precision 4+4X?
> The problem then is that we can't tell at compile time which
> byte that corresponds to.  So...

Yes 4 + 4x.  I keep getting confused with poly modes :)
In this case we want to extract the bitnum [3 4] = 3 + 4x which
would be in byte 0 for x = 0 or x = 1 and in byte 1 for x = 2, 3 and
so on.

Can't we still make that work somehow?  As far as I can tell we're looking
for the byte range to be accessed.  It's not like we have a precision or
bitnum of e.g. [3 17] where the access could be anywhere but still a pow2
fraction of BITS_PER_UNIT.

I'm just having trouble writing that down.

What about something like

int factor = BITS_PER_UNIT / prec.coeffs[0];
bytenum = force_align_down_and_div (bitnum, prec.coeffs[0]);
bytenum *= factor;

(or a similar thing done manually without helpers) guarded by the
proper condition?
Or do we need something more generic for the factor (i.e. prec.coeffs[0])
is not enough when we have a precision like [8 16]?  Does that even exist?

Regards
 Robin


Re: [PATCH 6/8] vect: Add vector_mode paramater to simd_clone_usable

2023-08-30 Thread Andre Vieira (lists) via Gcc-patches




On 30/08/2023 14:01, Richard Biener wrote:

On Wed, Aug 30, 2023 at 11:15 AM Andre Vieira (lists) via Gcc-patches
 wrote:


This patch adds a machine_mode parameter to the TARGET_SIMD_CLONE_USABLE
hook to enable rejecting SVE modes when the target architecture does not
support SVE.


How does the graph node of the SIMD clone lack this information?  That is, it
should have information on the types (and thus modes) for all formal arguments
and return values already, no?  At least the target would know how to
instantiate
it if it's not readily available at the point of use.



Yes it does, but those are the modes the simd clone itself uses; it does
not know what vector_mode we are currently vectorizing for. Which is
exactly why we need the vinfo's vector_mode to make sure the simd clone 
and its types are compatible with the vector mode.


In practice, this is to make sure that SVE simd clones are only used in loops
being vectorized for SVE modes. Having said that... I just realized that 
the simdlen check already takes care of that currently...


by simdlen check I mean the one that writes off simdclones that match:
if (!constant_multiple_p (vf, n->simdclone->simdlen, &num_calls)

However, when using -msve-vector-bits this will become an issue, as the 
VF will be constant and we will match NEON simdclones.  This requires 
some further attention though given that we now also reject the use of 
SVE simdclones when using -msve-vector-bits, and I'm not entirely sure 
we should...


I'm going on holidays for 2 weeks now though, so I'll have a look at 
that scenario when I get back. Same with other feedback, didn't expect 
feedback this quickly ;) Thank you!!


Kind regards,
Andre



Re: [pushed] analyzer: fix ICE in text art strings support

2023-08-30 Thread David Malcolm via Gcc-patches
On Wed, 2023-08-30 at 11:52 +0530, Prathamesh Kulkarni wrote:
> On Wed, 30 Aug 2023 at 04:21, David Malcolm 
> wrote:
> > 
> > On Tue, 2023-08-29 at 11:01 +0530, Prathamesh Kulkarni wrote:
> > > On Fri, 25 Aug 2023 at 18:15, David Malcolm via Gcc-patches
> > >  wrote:
> > > > 
> > > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > > > Pushed to trunk as r14-3481-g99a3fcb8ff0bf2.
> > > Hi David,
> > > It seems the new tests FAIL on arm for LTO bootstrap config:
> > > https://ci.linaro.org/job/tcwg_bootstrap_check--master-arm-check_bootstrap_lto-build/263/artifact/artifacts/06-check_regression/fails.sum/*view*/
> > 
> > Sorry about this.
> > 
> > Looking at e.g. the console.log.xz, I just see the status of the
> > failing tests.
> > 
> > Is there an easy way to get at the stderr from the tests without
> > rerunning this?
> > 
> > Otherwise, I'd appreciate help with reproducing this.
> Hi David,
> I have attached make check log for the failing tests.
> To reproduce, I configured and built gcc with following options on
> armv8 machine:
> ../gcc/configure --enable-languages=c,c++,fortran --with-float=hard
> --with-fpu=neon-fp-armv8 --with-mode=thumb --with-arch=armv8-a
> --disable-werror --with-build-config=bootstrap-lto
> make -j$(nproc)

Thanks.

Looks a lot like PR analyzer/110483, which I'm working on now (sorry!)

What's the endianness of the host?


Specifically, the pertinent part of the log is:

FAIL: gcc.dg/analyzer/out-of-bounds-diagram-17.c (test for excess errors)
Excess errors:
   ┌─┬─┬┬┬┐┌─┬─┬─┐
   │ [1] │ [1] │[1] │[1] │[1] ││ [1] │ [1] │ [1] │
   ├─┼─┼┼┼┤├─┼─┼─┤
   │ ' ' │ 'w' │'o' │'r' │'l' ││ 'd' │ '!' │ NUL │
   ├─┴─┴┴┴┴┴─┴─┴─┤
   │  string literal (type: 'char[8]')   │
   └─┘
  │ ││││  │ │ │
  │ ││││  │ │ │
  v vvvv  v v v
  ┌─┬┬┐┌─┐
  │ [0] │  ...   │[9] ││ │
  ├─┴┴┤│after valid range│
  │ 'buf' (type: 'char[10]')  ││ │
  └───┘└─┘
  ├─┬─┤├┬┤
│   │
  ╭─┴╮╭─┴─╮
  │capacity: 10 bytes││overflow of 3 bytes│
  ╰──╯╰───╯

where the issue seems to be all those [1], which are meant to be index
[0], [1], [2], etc.


Dave


Re: [PATCH] c++: disallow constinit on functions [PR111173]

2023-08-30 Thread Jason Merrill via Gcc-patches

On 8/29/23 15:01, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


-- >8 --

[dcl.constinit]/1: The constinit specifier shall be applied only to a
declaration of a variable with static or thread storage duration.

and while we detect

   constinit int fn();

we weren't detecting

   using F = int();
   constinit F f;

PR c++/111173

gcc/cp/ChangeLog:

* decl.cc (grokdeclarator): Disallow constinit on functions.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/constinit19.C: New test.
---
  gcc/cp/decl.cc   | 3 +++
  gcc/testsuite/g++.dg/cpp2a/constinit19.C | 5 +
  2 files changed, 8 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/constinit19.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index bea0ee92106..a0e8a24efc0 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -14639,6 +14639,9 @@ grokdeclarator (const cp_declarator *declarator,
"storage class %<thread_local%> invalid for "
"function %qs", name);
  }
+   else if (constinit_p)
+ error_at (declspecs->locations[ds_constinit],
+   "%<constinit%> specifier invalid for function %qs", name);
  
  if (virt_specifiers)

error ("virt-specifiers in %qs not allowed outside a class "
diff --git a/gcc/testsuite/g++.dg/cpp2a/constinit19.C b/gcc/testsuite/g++.dg/cpp2a/constinit19.C
new file mode 100644
index 000..5be610a18a2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/constinit19.C
@@ -0,0 +1,5 @@
+// PR c++/111173
+// { dg-do compile { target c++20 } }
+
+using Function = int();
+constinit Function f; // { dg-error ".constinit. specifier invalid for function" }

base-commit: fce74ce2535aa3b7648ba82e7e61eb77d0175546




RE: [PATCH] expmed: Allow extract_bit_field via mem for low-precision modes.

2023-08-30 Thread Richard Sandiford via Gcc-patches
[Sorry for any weird MUA issues, don't have access to my usual set-up.]

> when looking at a riscv ICE in vect-live-6.c I noticed that we
> assume that the variable part (coeffs[1] * x1) of the to-be-extracted
> bit number in extract_bit_field_1 is a multiple of BITS_PER_UNIT.
>
> This means that bits_to_bytes_round_down and num_trailing_bits
> cannot handle e.g. extracting from a "VNx4BI"-mode vector which has
> 4-bit precision on riscv.

But in the VLA case, doesn't it instead have precision 4+4X?
The problem then is that we can't tell at compile time which
byte that corresponds to.  So...

> This patch adds a special case for that situation and sets bytenum to
> zero as well as bitnum to its proper value.  It works for the riscv
> case because in all other situations we can align to a byte boundary.
> If x1 were 3 for some reason, however, the above assertion would still
> fail.  I don't think this can happen for riscv as we only ever double
> the number of chunks for larger vector sizes but not sure about the
> general case.
>
> If there's another, correct way to work around feel free to suggest.
>
> Bootstrap/testsuite on aarch64 and x86 is running but I would be
> surprised if there were any changes as riscv is the only target that
> uses modes with precision < 8.
>
> Regards
>  Robin
>
> gcc/ChangeLog:
>
>   * expmed.cc (extract_bit_field_1): Handle bitnum with variable
>   part less than BITS_PER_UNIT.
> ---
>  gcc/expmed.cc | 18 --
>  1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/expmed.cc b/gcc/expmed.cc
> index e22e43c8505..1b0119f9cfc 100644
> --- a/gcc/expmed.cc
> +++ b/gcc/expmed.cc
> @@ -1858,8 +1858,22 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
>   but is useful for things like vector booleans.  */
>if (MEM_P (op0) && !bitnum.is_constant ())
>  {
> -  bytenum = bits_to_bytes_round_down (bitnum);
> -  bitnum = num_trailing_bits (bitnum);
> +  /* bits_to_bytes_round_down tries to align to a byte (BITS_PER_UNIT)
> +  boundary and asserts that bitnum.coeffs[1] % BITS_PER_UNIT == 0.
> +  For modes with precision < BITS_PER_UNIT this fails but we can
> +  still extract from the first byte.  */
> +  poly_uint16 prec = GET_MODE_PRECISION (outermode);
> +  if (prec.coeffs[1] < BITS_PER_UNIT && bitnum.coeffs[1] < BITS_PER_UNIT)
> + {
> +   bytenum = 0;
> +   bitnum = bitnum.coeffs[0] & (BITS_PER_UNIT - 1);

...this doesn't look right.  We can't drop bitnum.coeffs[1] when it's
nonzero, because it says that for some runtime vector sizes, the bit
position might be higher than bitnum.coeffs[0].

Also, it's not possible to access coeffs[1] unconditionally in
target-independent code.

Thanks,
Richard

> + }
> +  else
> + {
> +   bytenum = bits_to_bytes_round_down (bitnum);
> +   bitnum = num_trailing_bits (bitnum);
> + }
> +
>poly_uint64 bytesize = bits_to_bytes_round_up (bitnum + bitsize);
>op0 = adjust_bitfield_address_size (op0, BLKmode, bytenum, bytesize);
>op0_mode = opt_scalar_int_mode ();



Re: RFC: Introduce -fhardened to enable security-related flags

2023-08-30 Thread Richard Biener via Gcc-patches
On Wed, Aug 30, 2023 at 12:51 PM Jakub Jelinek via Gcc-patches
 wrote:
>
> On Tue, Aug 29, 2023 at 03:42:27PM -0400, Marek Polacek via Gcc-patches wrote:
> > +   if (UNLIKELY (flag_hardened)
> > +   && (opt->code == OPT_D || opt->code == OPT_U))
> > + {
> > +   if (!fortify_seen_p)
> > + fortify_seen_p = !strncmp (opt->arg, "_FORTIFY_SOURCE", 15);
>
> Perhaps this should check that the char after it is either '\0' or '=', we
> shouldn't care if user defines or undefines _FORTIFY_SOURCE_WHATEVER macro.
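
Something along these lines would do (untested sketch):

  if (!fortify_seen_p)
    fortify_seen_p
      = (!strncmp (opt->arg, "_FORTIFY_SOURCE", 15)
	 && (opt->arg[15] == '\0' || opt->arg[15] == '='));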
>
> > +   if (!cxx_assert_seen_p)
> > + cxx_assert_seen_p = !strcmp (opt->arg, "_GLIBCXX_ASSERTIONS");
>
> Like we don't care in this case about -D_GLIBCXX_ASSERTIONS42
>
> > + }
> > + }
> > +
> > +  if (flag_hardened)
> > + {
> > +   if (!fortify_seen_p && optimize > 0)
> > + {
> > +   if (TARGET_GLIBC_MAJOR == 2 && TARGET_GLIBC_MINOR >= 35)
> > + cpp_define (parse_in, "_FORTIFY_SOURCE=3");
> > +   else
> > + cpp_define (parse_in, "_FORTIFY_SOURCE=2");
>
> I wonder if it wouldn't be better to enable _FORTIFY_SOURCE=2 by default for
> -fhardened only for targets which actually have such a support in the C
> library.  There is some poor man's _FORTIFY_SOURCE support in libssp,
> but e.g. one has to link with -lssp in that case and compile with
> -isystem `gcc -print-include-filename=include`/ssp .
> For glibc that is >= 2.3.4, https://maskray.me/blog/2022-11-06-fortify-source
> mentions NetBSD support since 2006, newlib since 2017, some Darwin libc,
> bionic (but seems they have only some clang support and dropped GCC
> support) and some third party reimplementation of libssp.
> Or do we just enable it and hope that either it works well or isn't
> supported at all quietly?  E.g. it would certainly break the ssp case
> where -isystem finds ssp headers but -lssp isn't linked in.
>
> > @@ -4976,6 +4993,22 @@ process_command (unsigned int decoded_options_count,
> >  #endif
> >  }
> >
> > +  /* TODO: check if -static -pie works and maybe use it.  */
> > +  if (flag_hardened && !any_link_options_p && !static_p)
> > +{
> > +  save_switch ("-pie", 0, NULL, /*validated=*/true, /*known=*/false);
> > +  /* TODO: check if BIND_NOW/RELRO is supported.  */
> > +  if (true)
> > + {
> > +   /* These are passed straight down to collect2 so we have to break
> > +  it up like this.  */
> > +   add_infile ("-z", "*");
> > +   add_infile ("now", "*");
> > +   add_infile ("-z", "*");
> > +   add_infile ("relro", "*");
>
> As the TODO comment says, to do that we need to check at configure time that
> linker supports -z now and -z relro options.
>
> > @@ -1117,9 +1121,12 @@ finish_options (struct gcc_options *opts, struct 
> > gcc_options *opts_set,
> >  }
> >
> >/* We initialize opts->x_flag_stack_protect to -1 so that targets
> > - can set a default value.  */
> > + can set a default value.  With --enable-default-ssp or -fhardened
> > + the default is -fstack-protector-strong.  */
> >if (opts->x_flag_stack_protect == -1)
> > -opts->x_flag_stack_protect = DEFAULT_FLAG_SSP;
> > +opts->x_flag_stack_protect = (opts->x_flag_hardened
> > +   ? SPCT_FLAG_STRONG
> > +   : DEFAULT_FLAG_SSP);
>
> This needs to be careful, -fstack-protector isn't supported on all targets
> (e.g. ia64) and we don't want toplev.cc warning:
>   /* Targets must be able to place spill slots at lower addresses.  If the
>  target already uses a soft frame pointer, the transition is trivial.  */
>   if (!FRAME_GROWS_DOWNWARD && flag_stack_protect)
> {
>   warning_at (UNKNOWN_LOCATION, 0,
>   "%<-fstack-protector%> not supported for this target");
>   flag_stack_protect = 0;
> }
> to be emitted whenever using -fhardened, it should not be enabled there
> silently (for ia64 Fedora/RHEL gcc actually had a short patch to make it
> work, turn the target into FRAME_GROWS_DOWNWARD one if -fstack-protect* was
> enabled and otherwise keep it !FRAME_GROWS_DOWNWARD).

I'll note that with selectively enabling parts of -fhardened it can
also give a false sense of security when under the hood we ignore half
of the option for one reason or another ...

How does -fhardened reflect into -[gf]record-gcc-switches?  Is it at
least possible to verify the actually enabled bits?

Richard.

> Jakub
>


Re: [PATCH7/8] vect: Add TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM

2023-08-30 Thread Richard Biener via Gcc-patches
On Wed, 30 Aug 2023, Andre Vieira (lists) wrote:

> This patch adds a new target hook to enable us to adapt the types of return
> and parameters of simd clones.  We use this in two ways: the first is to
> make sure we can create valid SVE types, including the SVE type attribute,
> when creating an SVE simd clone, even when the target options do not support
> SVE.  We are following the same behaviour seen with x86 that creates simd
> clones according to the ABI rules when no simdlen is provided, even if that
> simdlen is not supported by the current target options.  Note that this
> doesn't mean the simd clone will be used in auto-vectorization.

You are not documenting the bool parameter of the new hook.

What's wrong with doing the adjustment in TARGET_SIMD_CLONE_ADJUST?

> gcc/ChangeLog:
> 
>   (TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM): Define.
>   * doc/tm.texi (TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM): Document.
>   * doc/tm.texi.in (TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM): New.
>   * omp-simd-clone.cc (simd_adjust_return_type): Call new hook.
>   (simd_clone_adjust_argument_types): Likewise.
>   * target.def (adjust_ret_or_param): New hook.
>   * targhooks.cc (default_simd_clone_adjust_ret_or_param): New.
>   * targhooks.h (default_simd_clone_adjust_ret_or_param): New.
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH 6/8] vect: Add vector_mode paramater to simd_clone_usable

2023-08-30 Thread Richard Biener via Gcc-patches
On Wed, Aug 30, 2023 at 11:15 AM Andre Vieira (lists) via Gcc-patches
 wrote:
>
> This patch adds a machine_mode parameter to the TARGET_SIMD_CLONE_USABLE
> hook to enable rejecting SVE modes when the target architecture does not
> support SVE.

How does the graph node of the SIMD clone lack this information?  That is, it
should have information on the types (and thus modes) for all formal arguments
and return values already, no?  At least the target would know how to
instantiate
it if it's not readily available at the point of use.

> gcc/ChangeLog:
>
> * config/aarch64/aarch64.cc (aarch64_simd_clone_usable): Add mode
> parameter and use to to reject SVE modes when target architecture does
> not support SVE.
> * config/gcn/gcn.cc (gcn_simd_clone_usable): Add unused mode 
> parameter.
> * config/i386/i386.cc (ix86_simd_clone_usable): Likewise.
> * doc/tm.texi (TARGET_SIMD_CLONE_USABLE): Document new parameter.
> * target.def (usable): Add new parameter.
> * tree-vect-stmts.cc (vectorizable_simd_clone_call): Pass vector mode
> to TARGET_SIMD_CLONE_CALL hook.


Re: [PATCH 4/8] vect: don't allow fully masked loops with non-masked simd clones [PR 110485]

2023-08-30 Thread Richard Biener via Gcc-patches
On Wed, 30 Aug 2023, Andre Vieira (lists) wrote:

> When analyzing a loop and choosing a simdclone to use it is possible to choose
> a simdclone that cannot be used 'inbranch' for a loop that can use partial
> vectors.  This may lead to the vectorizer deciding to use partial vectors
> which are not supported for notinbranch simd clones. This patch fixes that by
> disabling the use of partial vectors once a notinbranch simd clone has been
> selected.

OK.

> gcc/ChangeLog:
> 
>   PR tree-optimization/110485
>   * tree-vect-stmts.cc (vectorizable_simd_clone_call): Disable partial
>   vectors usage if a notinbranch simdclone has been selected.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/gomp/pr110485.c: New test.
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [Patch 3/8] vect: Fix vect_get_smallest_scalar_type for simd clones

2023-08-30 Thread Richard Biener via Gcc-patches
On Wed, 30 Aug 2023, Andre Vieira (lists) wrote:

> The vect_get_smallest_scalar_type helper function was using any argument to a
> simd clone call when trying to determine the smallest scalar type that would
> be vectorized.  This included the function pointer type in a MASK_CALL for
> instance, and would result in the wrong type being selected.  Instead this
> patch special cases simd_clone_call's and uses only scalar types of the
> original function that get transformed into vector types.

Looks sensible.

+bool
+simd_clone_call_p (gimple *stmt, cgraph_node **out_node)

you could return the cgraph_node * or NULL here.  Are you going to
use the function elsewhere?  Otherwise put it in the same TU as
the only use please and avoid exporting it.
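
I.e. something like (sketch):

  /* Return the simd clone's cgraph node, or NULL if STMT is not a
     call to a simd clone.  */
  cgraph_node *
  simd_clone_call_p (gimple *stmt);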

Richard.

> gcc/ChangeLog:
> 
>   * tree-vect-data-refs.cc (vect_get_smallest_scalar_type): Special
>   case
>   simd clone calls and only use types that are mapped to vectors.
>   * tree-vect-stmts.cc (simd_clone_call_p): New helper function.
>   * tree-vectorizer.h (simd_clone_call_p): Declare new function.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-simd-clone-16f.c: Remove unnecessary differentiation
>   between targets with different pointer sizes.
>   * gcc.dg/vect/vect-simd-clone-17f.c: Likewise.
>   * gcc.dg/vect/vect-simd-clone-18f.c: Likewise.
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [Patch] OpenMP (C only): omp allocate - handle stack vars, improve diagnostic

2023-08-30 Thread Tobias Burnus

Attached is an incremental patch to add diagnostic for the in-between
allocator issues, i.e.

On 30.08.23 12:47, Tobias Burnus wrote:

omp_allocator_handle_t uninit;
  int var, var2;
  uninit = omp_low_lat_mem_alloc;
  omp_allocator_handle_t late_declared = omp_low_lat_mem_alloc;
#pragma omp allocate(var) allocator(uninit)
#pragma omp allocate(var) allocator(late_declared)


Further comments, remarks and suggestions to this patch - or the base
patch (v2 in previous email) are highly welcome.

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
OpenMP (C only): omp allocate - improve diagnostic regarding the allocator

gcc/c/ChangeLog:

* c-parser.cc (c_parser_omp_allocate): Diagnose when allocator
	is declared or modified between the declaration of a list item
	and the 'omp allocate' directive.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/allocate-5.c: Fix testcase.
* c-c++-common/gomp/allocate-12.c: New test.

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index dcc5de7ad93..dfef61da082 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -19445,6 +19445,44 @@ c_parser_omp_allocate (c_parser *parser)
 		  "%qD not yet supported", var);
 	  continue;
 	}
+  if (allocator
+	  && TREE_CODE (allocator) == VAR_DECL
+	  && c_check_in_current_scope (var))
+	{
+	  if (DECL_SOURCE_LOCATION (allocator) > DECL_SOURCE_LOCATION (var))
+	{
+	  error_at (OMP_CLAUSE_LOCATION (nl),
+			"allocator variable %qD must be declared before %qD",
+			allocator, var);
+	  inform (DECL_SOURCE_LOCATION (allocator), "declared here");
+	  inform (DECL_SOURCE_LOCATION (var), "declared here");
+	}
+	  else
+	   {
+	 gcc_assert (cur_stmt_list
+			 && TREE_CODE (cur_stmt_list) == STATEMENT_LIST);
+	 tree_stmt_iterator l = tsi_last (cur_stmt_list);
+	 while (!tsi_end_p (l))
+	   {
+		 if (EXPR_LOCATION (*l) < DECL_SOURCE_LOCATION (var))
+		   break;
+		 if (TREE_CODE (*l) == MODIFY_EXPR
+		 && TREE_OPERAND (*l, 0) == allocator)
+		   {
+		 error_at (EXPR_LOCATION (*l),
+			   "allocator variable %qD, used in the "
+			   "%<allocate%> directive for %qD, must not be "
+			   "modified between declaration of %qD and its "
+			   "% directive",
+			   "%<allocate%> directive",
+		 inform (DECL_SOURCE_LOCATION (var), "declared here");
+		 inform (OMP_CLAUSE_LOCATION (nl), "used here");
+		 break;
+		  }
+		--l;
+	 }
+	   }
+	}
   DECL_ATTRIBUTES (var) = tree_cons (get_identifier ("omp allocate"),
 	 build_tree_list (allocator, alignment),
 	 DECL_ATTRIBUTES (var));
diff --git a/gcc/testsuite/c-c++-common/gomp/allocate-5.c b/gcc/testsuite/c-c++-common/gomp/allocate-5.c
index de1efc6832d..2ca4786264f 100644
--- a/gcc/testsuite/c-c++-common/gomp/allocate-5.c
+++ b/gcc/testsuite/c-c++-common/gomp/allocate-5.c
@@ -18,9 +18,9 @@ typedef enum omp_allocator_handle_t
 void
 foo ()
 {
+  omp_allocator_handle_t my_allocator = omp_default_mem_alloc;
   int a, b;
   static int c;
-  omp_allocator_handle_t my_allocator;
 #pragma omp allocate (a)  /* { dg-message "sorry, unimplemented: '#pragma omp allocate' not yet supported" "" { target c++ } } */
 #pragma omp allocate (b) allocator(my_allocator)  /* { dg-message "sorry, unimplemented: '#pragma omp allocate' not yet supported" "" { target c++ } } */
 #pragma omp allocate(c) align(32)
diff --git a/gcc/testsuite/c-c++-common/gomp/allocate-12.c b/gcc/testsuite/c-c++-common/gomp/allocate-12.c
new file mode 100644
index 000..38836ef5089
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/allocate-12.c
@@ -0,0 +1,43 @@
+/* TODO: enable for C++ once implemented. */
+/* { dg-do compile { target c } } */
+
+typedef enum omp_allocator_handle_t
+{
+  omp_default_mem_alloc = 1,
+  omp_low_lat_mem_alloc = 5,
+  __omp_allocator_handle_t_max__ = __UINTPTR_MAX__
+} omp_allocator_handle_t;
+
+int
+f ()
+{
+  omp_allocator_handle_t my_allocator;
+  int n = 5;  /* { dg-note "declared here" } */
+  my_allocator = omp_default_mem_alloc;  /* { dg-error "allocator variable 'my_allocator' must not be modified between declaration of 'n' and its 'allocate' directive" } */
+  #pragma omp allocate(n) allocator(my_allocator)  /* { dg-note "used here" } */
+  n = 7;
+  return n;
+}
+
+
+int
+g ()
+{
+  int n = 5;  /* { dg-note "declared here" } */
+  omp_allocator_handle_t my_allocator = omp_low_lat_mem_alloc;  /* { dg-note "declared here" } */
+  #pragma omp allocate(n) allocator(my_allocator)  /* { dg-error "allocator variable 'my_allocator' must be declared before 'n'" } */
+  n = 7;
+  return n;
+}
+
+int
+h ()
+{
+  /* my_allocator uninitialized - but only diagnosed in the ME with -Wuninitialized;
+ see gomp/allocate-10.c.  */
+  omp_allocator_handle_t my_allocator;
+  int n = 5;
+  

RE: [PATCH] test: Adapt slp-26.c check for RVV

2023-08-30 Thread Li, Pan2 via Gcc-patches
Committed, thanks Richard.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Richard Biener via Gcc-patches
Sent: Wednesday, August 30, 2023 8:23 PM
To: Juzhe-Zhong 
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] test: Adapt slp-26.c check for RVV

On Wed, 30 Aug 2023, Juzhe-Zhong wrote:

> Fix FAILs:
> FAIL: gcc.dg/vect/slp-26.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
> "vectorized 0 loops" 1
> FAIL: gcc.dg/vect/slp-26.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
> "vectorizing stmts using SLP" 0
> FAIL: gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorized 0 loops" 1
> FAIL: gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorizing stmts using 
> SLP" 0
> 
> Since RVV is able to vectorize it with VLS modes like amdgcn.

OK

> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/slp-26.c: Adapt for RVV.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/slp-26.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-26.c b/gcc/testsuite/gcc.dg/vect/slp-26.c
> index d398a5acb0c..196981d83c1 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-26.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-26.c
> @@ -47,7 +47,7 @@ int main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target 
> { ! { mips_msa || amdgcn-*-* } } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target 
> { mips_msa || amdgcn-*-* } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" 
> { target { ! { mips_msa || amdgcn-*-* } } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> { target { mips_msa || amdgcn-*-* } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target 
> { ! { mips_msa || { amdgcn-*-* || riscv_vector } } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target 
> { mips_msa || { amdgcn-*-* || riscv_vector } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" 
> { target { ! { mips_msa || { amdgcn-*-* || riscv_vector } } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> { target { mips_msa || { amdgcn-*-* || riscv_vector } } } } } */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


RE: [PATCH] test: Add xfail into slp-reduc-7.c for RVV VLA vectorization

2023-08-30 Thread Li, Pan2 via Gcc-patches
Committed, thanks Richard.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Richard Biener via Gcc-patches
Sent: Wednesday, August 30, 2023 8:23 PM
To: Juzhe-Zhong 
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] test: Add xfail into slp-reduc-7.c for RVV VLA 
vectorization

On Wed, 30 Aug 2023, Juzhe-Zhong wrote:

> Like ARM SVE, add RVV variable length xfail.

OK

> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/slp-reduc-7.c: Add RVV.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/slp-reduc-7.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-reduc-7.c b/gcc/testsuite/gcc.dg/vect/slp-reduc-7.c
> index 7a958f24733..a8528ab53ee 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-reduc-7.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-reduc-7.c
> @@ -57,5 +57,5 @@ int main (void)
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail 
> vect_no_int_add } } } */
>  /* For variable-length SVE, the number of scalar statements in the
> reduction exceeds the number of elements in a 128-bit granule.  */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> { xfail { vect_no_int_add || { aarch64_sve && vect_variable_length } } } } } 
> */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> { xfail { vect_no_int_add || { { aarch64_sve && vect_variable_length } || { 
> riscv_vector && vect_variable_length } } } } } } */
>  /* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 0 "vect" { xfail { 
> aarch64_sve && vect_variable_length } } } } */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [Patch 2/8] parloops: Allow poly nit and bound

2023-08-30 Thread Richard Biener via Gcc-patches
On Wed, 30 Aug 2023, Andre Vieira (lists) wrote:

> Teach parloops how to handle a poly nit and bound ahead of the changes to
> enable non-constant simdlen.

Can you use poly_int_tree_p to combine INTEGER_CST || POLY_INT_CST please?

OK with that change.

> gcc/ChangeLog:
> 
>   * tree-parloops.cc (try_to_transform_to_exit_first_loop_alt): Accept
>   poly NIT and ALT_BOUND.
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH 1/8] parloops: Copy target and optimizations when creating a function clone

2023-08-30 Thread Richard Biener via Gcc-patches
On Wed, 30 Aug 2023, Andre Vieira (lists) wrote:

> 
> SVE simd clones need to be compiled with an SVE target enabled or the
> argument types will not be created properly. To achieve this we need to copy
> DECL_FUNCTION_SPECIFIC_TARGET from the original function declaration to the
> clones.  I decided it was probably also a good idea to copy
> DECL_FUNCTION_SPECIFIC_OPTIMIZATION in case the original function is meant to
> be compiled with specific optimization options.

OK.

> gcc/ChangeLog:
> 
>   * tree-parloops.cc (create_loop_fn): Copy specific target and
>   optimization options to clone.
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] tree-optimization/111228 - combine two VEC_PERM_EXPRs

2023-08-30 Thread Richard Biener via Gcc-patches
On Wed, 30 Aug 2023, Jakub Jelinek wrote:

> On Wed, Aug 30, 2023 at 01:54:46PM +0200, Richard Biener via Gcc-patches 
> wrote:
> > * gcc.dg/tree-ssa/forwprop-42.c: New testcase.
> 
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O -fdump-tree-cddce1" } */
> > +
> > +typedef unsigned long v2di __attribute__((vector_size(16)));
> 
> Shouldn't this be unsigned long long ?  Otherwise it is actually V4SImode
> rather than V2DImode.

Fixed like this.

Richard.

>From 695caedeb1b89ec05c727b2e2aacc2a27aa16c42 Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Wed, 30 Aug 2023 14:24:57 +0200
Subject: [PATCH] tree-optimization/111228 - fix testcase
To: gcc-patches@gcc.gnu.org

* gcc.dg/tree-ssa/forwprop-42.c: Use __UINT64_TYPE__ instead
of unsigned long.
---
 gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c
index f3dbc3e9394..257a05d3ec8 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O -fdump-tree-cddce1" } */
 
-typedef unsigned long v2di __attribute__((vector_size(16)));
+typedef __UINT64_TYPE__ v2di __attribute__((vector_size(16)));
 
 v2di g;
 void test (v2di *v)
-- 
2.35.3



Re: [PATCH] test: Add xfail into slp-reduc-7.c for RVV VLA vectorization

2023-08-30 Thread Richard Biener via Gcc-patches
On Wed, 30 Aug 2023, Juzhe-Zhong wrote:

> Like ARM SVE, add RVV variable length xfail.

OK

> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/slp-reduc-7.c: Add RVV.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/slp-reduc-7.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-reduc-7.c b/gcc/testsuite/gcc.dg/vect/slp-reduc-7.c
> index 7a958f24733..a8528ab53ee 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-reduc-7.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-reduc-7.c
> @@ -57,5 +57,5 @@ int main (void)
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail 
> vect_no_int_add } } } */
>  /* For variable-length SVE, the number of scalar statements in the
> reduction exceeds the number of elements in a 128-bit granule.  */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> { xfail { vect_no_int_add || { aarch64_sve && vect_variable_length } } } } } 
> */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> { xfail { vect_no_int_add || { { aarch64_sve && vect_variable_length } || { 
> riscv_vector && vect_variable_length } } } } } } */
>  /* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 0 "vect" { xfail { 
> aarch64_sve && vect_variable_length } } } } */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] test: Adapt slp-26.c check for RVV

2023-08-30 Thread Richard Biener via Gcc-patches
On Wed, 30 Aug 2023, Juzhe-Zhong wrote:

> Fix FAILs:
> FAIL: gcc.dg/vect/slp-26.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
> "vectorized 0 loops" 1
> FAIL: gcc.dg/vect/slp-26.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
> "vectorizing stmts using SLP" 0
> FAIL: gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorized 0 loops" 1
> FAIL: gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorizing stmts using 
> SLP" 0
> 
> Since RVV is able to vectorize it with VLS modes like amdgcn.

OK

> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/slp-26.c: Adapt for RVV.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/slp-26.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-26.c b/gcc/testsuite/gcc.dg/vect/slp-26.c
> index d398a5acb0c..196981d83c1 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-26.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-26.c
> @@ -47,7 +47,7 @@ int main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target 
> { ! { mips_msa || amdgcn-*-* } } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target 
> { mips_msa || amdgcn-*-* } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" 
> { target { ! { mips_msa || amdgcn-*-* } } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> { target { mips_msa || amdgcn-*-* } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target 
> { ! { mips_msa || { amdgcn-*-* || riscv_vector } } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target 
> { mips_msa || { amdgcn-*-* || riscv_vector } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" 
> { target { ! { mips_msa || { amdgcn-*-* || riscv_vector } } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> { target { mips_msa || { amdgcn-*-* || riscv_vector } } } } } */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] expmed: Allow extract_bit_field via mem for low-precision modes.

2023-08-30 Thread Robin Dapp via Gcc-patches
Hi,

when looking at a riscv ICE in vect-live-6.c I noticed that we
assume that the variable part (coeffs[1] * x1) of the to-be-extracted
bit number in extract_bit_field_1 is a multiple of BITS_PER_UNIT.

This means that bits_to_bytes_round_down and num_trailing_bits
cannot handle e.g. extracting from a "VNx4BI"-mode vector which has
4-bit precision on riscv.

This patch adds a special case for that situation and sets bytenum to
zero as well as bitnum to its proper value.  It works for the riscv
case because in all other situations we can align to a byte boundary.
If x1 were 3 for some reason, however, the above assertion would still
fail.  I don't think this can happen for riscv as we only ever double
the number of chunks for larger vector sizes but not sure about the
general case.
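
A worked example for the riscv case (illustration only):

  /* VNx4BI: precision [4,4], i.e. 4 + 4*x1 bits at run time.
     Extracting bit [3,4] = 3 + 4*x1:
       x1 = 0 -> bit  3, byte 0
       x1 = 1 -> bit  7, byte 0
       x1 = 2 -> bit 11, byte 1
     The variable coefficient 4 is not a multiple of BITS_PER_UNIT,
     which is exactly what the assertion rejects.  */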

If there's another, correct way to work around feel free to suggest.

Bootstrap/testsuite on aarch64 and x86 is running but I would be
surprised if there were any changes as riscv is the only target that
uses modes with precision < 8.

Regards
 Robin

gcc/ChangeLog:

* expmed.cc (extract_bit_field_1): Handle bitnum with variable
part less than BITS_PER_UNIT.
---
 gcc/expmed.cc | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index e22e43c8505..1b0119f9cfc 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -1858,8 +1858,22 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
  but is useful for things like vector booleans.  */
   if (MEM_P (op0) && !bitnum.is_constant ())
 {
-  bytenum = bits_to_bytes_round_down (bitnum);
-  bitnum = num_trailing_bits (bitnum);
+  /* bits_to_bytes_round_down tries to align to a byte (BITS_PER_UNIT)
+boundary and asserts that bitnum.coeffs[1] % BITS_PER_UNIT == 0.
+For modes with precision < BITS_PER_UNIT this fails but we can
+still extract from the first byte.  */
+  poly_uint16 prec = GET_MODE_PRECISION (outermode);
+  if (prec.coeffs[1] < BITS_PER_UNIT && bitnum.coeffs[1] < BITS_PER_UNIT)
+   {
+ bytenum = 0;
+ bitnum = bitnum.coeffs[0] & (BITS_PER_UNIT - 1);
+   }
+  else
+   {
+ bytenum = bits_to_bytes_round_down (bitnum);
+ bitnum = num_trailing_bits (bitnum);
+   }
+
   poly_uint64 bytesize = bits_to_bytes_round_up (bitnum + bitsize);
   op0 = adjust_bitfield_address_size (op0, BLKmode, bytenum, bytesize);
   op0_mode = opt_scalar_int_mode ();
-- 
2.41.0



Re: [PATCH] Adjust costing of emulated vectorized gather/scatter

2023-08-30 Thread Richard Biener via Gcc-patches
On Wed, Aug 30, 2023 at 12:38 PM liuhongt via Gcc-patches
 wrote:
>
> r14-332-g24905a4bd1375c adjusts costing of emulated vectorized
> gather/scatter.
> 
> commit 24905a4bd1375ccd99c02510b9f9529015a48315
> Author: Richard Biener 
> Date:   Wed Jan 18 11:04:49 2023 +0100
>
> Adjust costing of emulated vectorized gather/scatter
>
> Emulated gather/scatter behave similar to strided elementwise
> accesses in that they need to decompose the offset vector
> and construct or decompose the data vector so handle them
> the same way, pessimizing the cases with many elements.
> 
>
> But for emulated gather/scatter, the offset vector load/vec_construct has
> already been counted, and in real cases it's probably eliminated by
> later optimizations.
> Also, after decomposing, element loads from contiguous memory could be
> less bounded compared to a normal elementwise load.
> The patch decreases the cost a little bit.
>
> This will enable gather emulation for below loop with VF=8(ymm)
>
> double
> foo (double* a, double* b, unsigned int* c, int n)
> {
>   double sum = 0;
>   for (int i = 0; i != n; i++)
> sum += a[i] * b[c[i]];
>   return sum;
> }
>
> For the above loop, microbenchmark results on ICX show that
> emulated gather with VF=8 is 30% faster than emulated gather with
> VF=4 when the trip count is big enough.
> It brings back ~4% for 510.parest, still a ~5% regression compared to
> the gather instruction due to being throughput bound.
>
> For -march=znver1/2/3/4, the change doesn't enable VF=8 (ymm) for the
> loop; VF remains 4 (xmm) as before (presumably related to their own
> cost models).
>
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/111064
> * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
> Decrease cost a little bit for vec_to_scalar(offset vector) in
> emulated gather.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr111064.c: New test.
> ---
>  gcc/config/i386/i386.cc  | 11 ++++++++++-
>  gcc/testsuite/gcc.target/i386/pr111064.c | 12 ++++++++++++
>  2 files changed, 22 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr111064.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 1bc3f11ff07..337e0f1bfbb 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -24079,7 +24079,16 @@ ix86_vector_costs::add_stmt_cost (int count, 
> vect_cost_for_stmt kind,
>   || STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == 
> VMAT_GATHER_SCATTER))
>  {
>stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
> -  stmt_cost *= (TYPE_VECTOR_SUBPARTS (vectype) + 1);
> +  /* For emulated gather/scatter, offset vector load/vec_construct has
> +     already been counted, and in real cases it's probably eliminated by
> +     a later optimizer.
> +     Also after decomposing, element loads from continuous memory
> +     could be less bounded compared to normal elementwise load.  */
> +  if (kind == vec_to_scalar
> + && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_GATHER_SCATTER)
> +   stmt_cost *= TYPE_VECTOR_SUBPARTS (vectype);

For gather we cost N vector extracts (from the offset vector), N scalar loads
(the actual data loads) and one vec_construct.

For scatter we cost N vector extracts (from the offset vector),
N vector extracts (from the data vector) and N scalar stores.

It was intended to penalize the extracts the same way as vector construction.
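
As a stand-alone sketch of that accounting (the unit costs are
hypothetical placeholders; the real values come from
ix86_builtin_vectorization_cost):

/* Cost of one emulated gather of an n-element vector: n extracts
   from the offset vector, n scalar loads, one vec_construct.  */
static int
emulated_gather_cost (int n, int extract_cost, int load_cost,
		      int construct_cost)
{
  return n * extract_cost + n * load_cost + construct_cost;
}

/* Cost of one emulated scatter: n extracts from the offset vector,
   n extracts from the data vector, n scalar stores.  */
static int
emulated_scatter_cost (int n, int extract_cost, int store_cost)
{
  return 2 * n * extract_cost + n * store_cost;
}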

Your change will adjust all three different decomposition kinds "a bit".
I realize the scaling by (TYPE_VECTOR_SUBPARTS + 1) is kind-of arbitrary,
but so is your adjustment, and I don't see why VMAT_GATHER_SCATTER is
special to it.

So the comment you put before the special-casing doesn't really make
sense to me.

For zen4 costing we currently have

*_11 8 times vec_to_scalar costs 576 in body
*_11 8 times scalar_load costs 96 in body
*_11 1 times vec_construct costs 792 in body

for zmm

*_11 4 times vec_to_scalar costs 80 in body
*_11 4 times scalar_load costs 48 in body
*_11 1 times vec_construct costs 100 in body

for ymm and

*_11 2 times vec_to_scalar costs 24 in body
*_11 2 times scalar_load costs 24 in body
*_11 1 times vec_construct costs 12 in body

for xmm.  Even with your adjustment, if we were to enable cost comparison
between vector sizes I bet we'd choose xmm (you can try by re-ordering the
modes in the ix86_autovectorize_vector_modes hook).  So it feels like a
hack.  If you think that Icelake should enable 4-element vectorized
emulated gather, then we should disable this individual scaling and
possibly instead penalize the case where the number of (emulated) gathers
is too high?

That said, we could count the number of element extracts and inserts
(and maybe [scalar] loads and stores) and at finish_cost time weight them
against the number of "other" operations.

As repeatedly said the current cost 

Re: [PATCH] tree-optimization/111228 - combine two VEC_PERM_EXPRs

2023-08-30 Thread Jakub Jelinek via Gcc-patches
On Wed, Aug 30, 2023 at 01:54:46PM +0200, Richard Biener via Gcc-patches wrote:
>   * gcc.dg/tree-ssa/forwprop-42.c: New testcase.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-cddce1" } */
> +
> +typedef unsigned long v2di __attribute__((vector_size(16)));

Shouldn't this be unsigned long long ?  Otherwise it is actually V4SImode
rather than V2DImode.

> +
> +v2di g;
> +void test (v2di *v)
> +{
> +  v2di lo = v[0];
> +  v2di hi = v[1];
> +  v2di res;
> +  res[1] = hi[1];
> +  res[0] = lo[0];
> +  g = res;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR <\[^>\]*, { 0, 3 }>" 1 
> "cddce1" } } */
> -- 
> 2.35.3

Jakub



[PATCH V6] RISC-V: Enable vec_int testsuite for RVV VLA vectorization

2023-08-30 Thread Juzhe-Zhong
This patch is the final version of enabling the vect_int tests for RVV.

There are still 80+ FAILs, and they can't be fixed by adjusting testcases or
target-supports.exp.

Here is the analysis of **ALL** FAILs:

1. REAL highest priority FAILs:

ICE:
   
FAIL: gcc.dg/vect/vect-live-6.c (internal compiler error: in 
force_align_down_and_div, at poly-int.h:1903)
FAIL: gcc.dg/vect/vect-live-6.c (test for excess errors)
FAIL: gcc.dg/vect/vect-live-6.c -flto -ffat-lto-objects (internal compiler 
error: in force_align_down_and_div, at poly-int.h:1903)
FAIL: gcc.dg/vect/vect-live-6.c -flto -ffat-lto-objects (test for excess errors)

Execution fails:
FAIL: gcc.dg/vect/slp-reduc-7.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/slp-reduc-7.c execution test
FAIL: gcc.dg/vect/vect-alias-check-10.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-10.c execution test
FAIL: gcc.dg/vect/vect-alias-check-11.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-11.c execution test
FAIL: gcc.dg/vect/vect-alias-check-12.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-12.c execution test
FAIL: gcc.dg/vect/vect-alias-check-14.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-alias-check-14.c execution test
FAIL: gcc.dg/vect/vect-double-reduc-5.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-double-reduc-5.c execution test

These FAILs are REAL problem that we need to address first.

2. Missed optimizations due to lacking VLS mode patterns:

FAIL: gcc.dg/vect/pr57705.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 1 loop" 2
FAIL: gcc.dg/vect/pr57705.c scan-tree-dump-times vect "vectorized 1 loop" 2
FAIL: gcc.dg/vect/pr65518.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 0 loops in function" 2
FAIL: gcc.dg/vect/pr65518.c scan-tree-dump-times vect "vectorized 0 loops in 
function" 2
FAIL: gcc.dg/vect/slp-1.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorizing stmts using SLP" 4
FAIL: gcc.dg/vect/slp-1.c scan-tree-dump-times vect "vectorizing stmts using 
SLP" 4
FAIL: gcc.dg/vect/slp-12a.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorizing stmts using SLP" 1
FAIL: gcc.dg/vect/slp-12a.c scan-tree-dump-times vect "vectorizing stmts using 
SLP" 1
FAIL: gcc.dg/vect/slp-16.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorizing stmts using SLP" 2
FAIL: gcc.dg/vect/slp-16.c scan-tree-dump-times vect "vectorizing stmts using 
SLP" 2
FAIL: gcc.dg/vect/slp-34-big-array.c -flto -ffat-lto-objects  
scan-tree-dump-times vect "vectorizing stmts using SLP" 2
FAIL: gcc.dg/vect/slp-34-big-array.c scan-tree-dump-times vect "vectorizing 
stmts using SLP" 2
FAIL: gcc.dg/vect/slp-34.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorizing stmts using SLP" 2
FAIL: gcc.dg/vect/slp-34.c scan-tree-dump-times vect "vectorizing stmts using 
SLP" 2
FAIL: gcc.dg/vect/slp-35.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorizing stmts using SLP" 1
FAIL: gcc.dg/vect/slp-35.c scan-tree-dump-times vect "vectorizing stmts using 
SLP" 1
FAIL: gcc.dg/vect/slp-43.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 1 loops" 13
FAIL: gcc.dg/vect/slp-43.c scan-tree-dump-times vect "vectorized 1 loops" 13
FAIL: gcc.dg/vect/slp-45.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 1 loops" 13
FAIL: gcc.dg/vect/slp-45.c scan-tree-dump-times vect "vectorized 1 loops" 13
FAIL: gcc.dg/vect/slp-47.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorizing stmts using SLP" 2
FAIL: gcc.dg/vect/slp-47.c scan-tree-dump-times vect "vectorizing stmts using 
SLP" 2
FAIL: gcc.dg/vect/slp-48.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorizing stmts using SLP" 2
FAIL: gcc.dg/vect/slp-48.c scan-tree-dump-times vect "vectorizing stmts using 
SLP" 2

These testcases need VLS-mode vec_init patterns.

FAIL: gcc.dg/vect/vect-bic-bitmask-12.c -flto -ffat-lto-objects  scan-tree-dump 
dce7 "<=\\s*.+{ 255,.+}"
FAIL: gcc.dg/vect/vect-bic-bitmask-12.c scan-tree-dump dce7 "<=\\s*.+{ 255,.+}"
FAIL: gcc.dg/vect/vect-bic-bitmask-23.c -flto -ffat-lto-objects  scan-tree-dump 
dce7 "<=\\s*.+{ 255, 15, 1, 65535 }"
FAIL: gcc.dg/vect/vect-bic-bitmask-23.c scan-tree-dump dce7 "<=\\s*.+{ 255, 15, 
1, 65535 }"

These testcases need VLS-mode VCOND_MASK and vec_cmp patterns.

3. Maybe bogus dump check FAILs:

FAIL: gcc.dg/vect/vect-multitypes-11.c -flto -ffat-lto-objects  
scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/vect-multitypes-11.c scan-tree-dump-times vect "vectorized 1 
loops" 1
FAIL: gcc.dg/vect/vect-outer-4c-big-array.c -flto -ffat-lto-objects  
scan-tree-dump-times vect "zero step in outer loop." 1
FAIL: gcc.dg/vect/vect-outer-4c-big-array.c scan-tree-dump-times vect "zero 
step in outer loop." 1
FAIL: gcc.dg/vect/vect-reduc-dot-s16a.c -flto -ffat-lto-objects  
scan-tree-dump-times vect "vect_recog_dot_prod_pattern: 

[PATCH] tree-optimization/111228 - combine two VEC_PERM_EXPRs

2023-08-30 Thread Richard Biener via Gcc-patches
The following adds simplification of two VEC_PERM_EXPRs where
the later one replaces all elements from either the first or the
second input of the earlier permute.  This allows a three input
permute to be simplified to a two input one.
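
As a concrete illustration with four lanes (the selectors are made up
for the example, not taken from the patch):

  c = VEC_PERM_EXPR <a, b, { 0, 5, 2, 7 }>;
  d = VEC_PERM_EXPR <e, c, { 0, 4, 2, 6 }>;

Lanes 1 and 3 of d come from c, and those are c[0] = a[0] and
c[2] = a[2].  Every element taken from c thus comes from a alone, so
the pair simplifies to the two-input permute

  d = VEC_PERM_EXPR <e, a, { 0, 4, 2, 6 }>;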

I'm following the existing two-input simplification case and only
allow non-VLA permutes.  The three now-existing cases and the
single case in tree-ssa-forwprop.cc somehow ask for merging;
I'm not doing that as part of this change, though.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/111228
* match.pd ((vec_perm (vec_perm ..) @5 ..) -> (vec_perm @x @5 ..)):
New simplifications.

* gcc.dg/tree-ssa/forwprop-42.c: New testcase.
---
 gcc/match.pd| 141 +++-
 gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c |  17 +++
 2 files changed, 155 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 47d2733211a..6a7edde5736 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -8993,10 +8993,10 @@ and,
 
 
 /* Merge
-   c = VEC_PERM_EXPR <a, b, VCST0>;
-   d = VEC_PERM_EXPR <c, c, VCST1>;
+     c = VEC_PERM_EXPR <a, b, VCST0>;
+     d = VEC_PERM_EXPR <c, c, VCST1>;
    to
-   d = VEC_PERM_EXPR <a, b, NEW_VCST>;  */
+     d = VEC_PERM_EXPR <a, b, NEW_VCST>;  */
 
 (simplify
  (vec_perm (vec_perm@0 @1 @2 VECTOR_CST@3) @0 VECTOR_CST@4)
@@ -9038,6 +9038,141 @@ and,
  (if (op0)
   (vec_perm @1 @2 { op0; })))
 
+/* Merge
+     c = VEC_PERM_EXPR <a, b, VCST0>;
+     d = VEC_PERM_EXPR <e, c, VCST1>;
+   to
+     d = VEC_PERM_EXPR <e, a or b, NEW_VCST>;
+   when all elements from a or b are replaced by the later
+   permutation.  */
+
+(simplify
+ (vec_perm @5 (vec_perm@0 @1 @2 VECTOR_CST@3) VECTOR_CST@4)
+ (if (TYPE_VECTOR_SUBPARTS (type).is_constant ())
+  (with
+   {
+ machine_mode result_mode = TYPE_MODE (type);
+ machine_mode op_mode = TYPE_MODE (TREE_TYPE (@1));
+ int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant ();
+ vec_perm_builder builder0;
+ vec_perm_builder builder1;
+ vec_perm_builder builder2 (nelts, nelts, 2);
+   }
+   (if (tree_to_vec_perm_builder (&builder0, @3)
+       && tree_to_vec_perm_builder (&builder1, @4))
+(with
+ {
+   vec_perm_indices sel0 (builder0, 2, nelts);
+   vec_perm_indices sel1 (builder1, 2, nelts);
+   bool use_1 = false, use_2 = false;
+
+   for (int i = 0; i < nelts; i++)
+ {
+  if (known_lt ((poly_uint64)sel1[i], sel1.nelts_per_input ()))
+builder2.quick_push (sel1[i]);
+  else
+{
+  poly_uint64 j = sel0[(sel1[i] - sel1.nelts_per_input ())
+   .to_constant ()];
+  if (known_lt (j, sel0.nelts_per_input ()))
+use_1 = true;
+  else
+{
+  use_2 = true;
+  j -= sel0.nelts_per_input ();
+}
+  builder2.quick_push (j + sel1.nelts_per_input ());
+}
+}
+ }
+ (if (use_1 ^ use_2)
+  (with
+   {
+vec_perm_indices sel2 (builder2, 2, nelts);
+tree op0 = NULL_TREE;
+/* If the new VEC_PERM_EXPR can't be handled but both
+   original VEC_PERM_EXPRs can, punt.
+   If one or both of the original VEC_PERM_EXPRs can't be
+   handled and the new one can't be either, don't increase
+   number of VEC_PERM_EXPRs that can't be handled.  */
+if (can_vec_perm_const_p (result_mode, op_mode, sel2, false)
+|| (single_use (@0)
+? (!can_vec_perm_const_p (result_mode, op_mode, sel0, false)
+   || !can_vec_perm_const_p (result_mode, op_mode, sel1, 
false))
+: !can_vec_perm_const_p (result_mode, op_mode, sel1, false)))
+  op0 = vec_perm_indices_to_tree (TREE_TYPE (@4), sel2);
+   }
+   (if (op0)
+   (switch
+(if (use_1)
+ (vec_perm @5 @1 { op0; }))
+(if (use_2)
+ (vec_perm @5 @2 { op0; })))
+
+/* And the case with swapped outer permute sources.  */
+
+(simplify
+ (vec_perm (vec_perm@0 @1 @2 VECTOR_CST@3) @5 VECTOR_CST@4)
+ (if (TYPE_VECTOR_SUBPARTS (type).is_constant ())
+  (with
+   {
+ machine_mode result_mode = TYPE_MODE (type);
+ machine_mode op_mode = TYPE_MODE (TREE_TYPE (@1));
+ int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant ();
+ vec_perm_builder builder0;
+ vec_perm_builder builder1;
+ vec_perm_builder builder2 (nelts, nelts, 2);
+   }
+   (if (tree_to_vec_perm_builder (&builder0, @3)
+       && tree_to_vec_perm_builder (&builder1, @4))
+(with
+ {
+   vec_perm_indices sel0 (builder0, 2, nelts);
+   vec_perm_indices sel1 (builder1, 2, nelts);
+   bool use_1 = false, use_2 = false;
+
+   for (int i = 0; i < nelts; i++)
+ {
+  if (known_ge ((poly_uint64)sel1[i], sel1.nelts_per_input ()))
+builder2.quick_push (sel1[i]);
+  else
+{
+  poly_uint64 j = 

[PATCH] RISC-V: Refactor and clean emit_{vlmax, nonvlmax}_xxx functions

2023-08-30 Thread Lehua Ding
Hi,

This patch refactors the code of the emit_{vlmax,nonvlmax}_xxx functions.
These functions are used to generate RVV insns. There are currently 31
such functions and a few duplicates. The reason so many functions are
needed is that there are many types of RVV instructions: there are
patterns that don't have a mask operand, patterns that don't have a merge
operand, patterns that don't need a tail policy operand, etc.

Previously there was the insn_type enum, but its value was just used
to indicate how many operands were passed in by the caller. The rest of
the operand information is scattered throughout these functions.
For example, emit_vlmax_fp_insn indicates that a rounding mode operand
of FRM_DYN should also be passed, and emit_vlmax_merge_insn means that
there is no mask operand or mask policy operand.

I introduced a new enum insn_flags to indicate some properties of these
RVV patterns. These insn_flags are then used to define the insn_type enum.
For example, for the definition of WIDEN_TERNARY_OP:

  WIDEN_TERNARY_OP = HAS_DEST_P | HAS_MASK_P | USE_ALL_TRUES_MASK_P
   | TDEFAULT_POLICY_P | MDEFAULT_POLICY_P | TERNARY_OP_P,

These flags mean the RVV pattern has no merge operand; they only apply
to vwmacc instructions. After defining the desired insn_type, all the
emit_{vlmax,nonvlmax}_xxx functions are unified into three functions:

  emit_vlmax_insn (icode, insn_flags, ops);
  emit_nonvlmax_insn (icode, insn_flags, ops, vl);
  emit_vlmax_insn_lra (icode, insn_flags, ops, vl);

Then users can select the appropriate insn_type and the appropriate emit_xxx
function for RVV pattern generation as needed.
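
As a stand-alone illustration of how such flag bits compose into an
insn_type (the bit values and the printout are made up; only the
WIDEN_TERNARY_OP composition is taken from the patch):

#include <stdio.h>

/* Hypothetical bit assignments; the patch defines the real ones in
   riscv-protos.h.  */
enum insn_flags
{
  HAS_DEST_P           = 1 << 0,
  HAS_MASK_P           = 1 << 1,
  USE_ALL_TRUES_MASK_P = 1 << 2,
  HAS_MERGE_P          = 1 << 3,
  TDEFAULT_POLICY_P    = 1 << 4,
  MDEFAULT_POLICY_P    = 1 << 5,
  TERNARY_OP_P         = 1 << 6,
};

enum insn_type
{
  /* As in the patch: note the absent HAS_MERGE_P, i.e. no merge
     operand.  */
  WIDEN_TERNARY_OP = HAS_DEST_P | HAS_MASK_P | USE_ALL_TRUES_MASK_P
		     | TDEFAULT_POLICY_P | MDEFAULT_POLICY_P
		     | TERNARY_OP_P,
};

int
main (void)
{
  unsigned f = WIDEN_TERNARY_OP;
  /* An emit helper can derive the operand list from the flags instead
     of baking it into 31 separate functions.  */
  printf ("mask operand: %s, merge operand: %s\n",
	  (f & HAS_MASK_P) ? "yes" : "no",
	  (f & HAS_MERGE_P) ? "yes" : "no");
  return 0;
}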

Best,
Lehua

gcc/ChangeLog:

* config/riscv/autovec-opt.md: Adjust.
* config/riscv/autovec-vls.md: Ditto.
* config/riscv/autovec.md: Ditto.
* config/riscv/riscv-protos.h (enum insn_type): Add insn_type.
(enum insn_flags): Add insn flags.
(emit_vlmax_insn): Adjust.
(emit_vlmax_fp_insn): Delete.
(emit_vlmax_ternary_insn): Delete.
(emit_vlmax_fp_ternary_insn): Delete.
(emit_nonvlmax_insn): Adjust.
(emit_vlmax_slide_insn): Delete.
(emit_nonvlmax_slide_tu_insn): Delete.
(emit_vlmax_merge_insn): Delete.
(emit_vlmax_cmp_insn): Delete.
(emit_vlmax_cmp_mu_insn): Delete.
(emit_vlmax_masked_mu_insn): Delete.
(emit_scalar_move_insn): Delete.
(emit_nonvlmax_integer_move_insn): Delete.
(emit_vlmax_insn_lra): Add.
* config/riscv/riscv-v.cc (get_mask_mode_from_insn_flags): New.
(emit_vlmax_insn): Adjust.
(emit_nonvlmax_insn): Adjust.
(emit_vlmax_insn_lra): Add.
(emit_vlmax_fp_insn): Delete.
(emit_vlmax_ternary_insn): Delete.
(emit_vlmax_fp_ternary_insn): Delete.
(emit_vlmax_slide_insn): Delete.
(emit_nonvlmax_slide_tu_insn): Delete.
(emit_nonvlmax_slide_insn): Delete.
(emit_vlmax_merge_insn): Delete.
(emit_vlmax_cmp_insn): Delete.
(emit_vlmax_cmp_mu_insn): Delete.
(emit_vlmax_masked_insn): Delete.
(emit_nonvlmax_masked_insn): Delete.
(emit_vlmax_masked_store_insn): Delete.
(emit_nonvlmax_masked_store_insn): Delete.
(emit_vlmax_masked_mu_insn): Delete.
(emit_vlmax_masked_fp_mu_insn): Delete.
(emit_nonvlmax_tu_insn): Delete.
(emit_nonvlmax_fp_tu_insn): Delete.
(emit_nonvlmax_tumu_insn): Delete.
(emit_nonvlmax_fp_tumu_insn): Delete.
(emit_scalar_move_insn): Delete.
(emit_cpop_insn): Delete.
(emit_vlmax_integer_move_insn): Delete.
(emit_nonvlmax_integer_move_insn): Delete.
(emit_vlmax_gather_insn): Delete.
(emit_vlmax_masked_gather_mu_insn): Delete.
(emit_vlmax_compress_insn): Delete.
(emit_nonvlmax_compress_insn): Delete.
(emit_vlmax_reduction_insn): Delete.
(emit_vlmax_fp_reduction_insn): Delete.
(emit_nonvlmax_fp_reduction_insn): Delete.
(expand_vec_series): Adjust.
(expand_const_vector): Adjust.
(legitimize_move): Adjust.
(sew64_scalar_helper): Adjust.
(expand_tuple_move): Adjust.
(expand_vector_init_insert_elems): Adjust.
(expand_vector_init_merge_repeating_sequence): Adjust.
(expand_vec_cmp): Adjust.
(expand_vec_cmp_float): Adjust.
(expand_vec_perm): Adjust.
(shuffle_merge_patterns): Adjust.
(shuffle_compress_patterns): Adjust.
(shuffle_decompress_patterns): Adjust.
(expand_load_store): Adjust.
(expand_cond_len_op): Adjust.
(expand_cond_len_unop): Adjust.
(expand_cond_len_binop): Adjust.
(expand_gather_scatter): Adjust.
(expand_cond_len_ternop): Adjust.
(expand_reduction): Adjust.
(expand_lanes_load_store): Adjust.
(expand_fold_extract_last): Adjust.
* config/riscv/riscv.cc (vector_zero_call_used_regs): 

[PATCH] test: Add xfail into slp-reduc-7.c for RVV VLA vectorization

2023-08-30 Thread Juzhe-Zhong
Like ARM SVE, add a variable-length xfail for RVV.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-reduc-7.c: Add RVV.

---
 gcc/testsuite/gcc.dg/vect/slp-reduc-7.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-reduc-7.c 
b/gcc/testsuite/gcc.dg/vect/slp-reduc-7.c
index 7a958f24733..a8528ab53ee 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-reduc-7.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-reduc-7.c
@@ -57,5 +57,5 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail 
vect_no_int_add } } } */
 /* For variable-length SVE, the number of scalar statements in the
reduction exceeds the number of elements in a 128-bit granule.  */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { 
xfail { vect_no_int_add || { aarch64_sve && vect_variable_length } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { 
xfail { vect_no_int_add || { { aarch64_sve && vect_variable_length } || { 
riscv_vector && vect_variable_length } } } } } } */
 /* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 0 "vect" { xfail { 
aarch64_sve && vect_variable_length } } } } */
-- 
2.36.3



[PATCH] test: Adapt slp-26.c check for RVV

2023-08-30 Thread Juzhe-Zhong
Fix FAILs:
FAIL: gcc.dg/vect/slp-26.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-26.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorizing stmts using SLP" 0
FAIL: gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorizing stmts using 
SLP" 0

Since RVV is able to vectorize it with VLS modes, like amdgcn does.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-26.c: Adapt for RVV.

---
 gcc/testsuite/gcc.dg/vect/slp-26.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-26.c 
b/gcc/testsuite/gcc.dg/vect/slp-26.c
index d398a5acb0c..196981d83c1 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-26.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-26.c
@@ -47,7 +47,7 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { 
! { mips_msa || amdgcn-*-* } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { 
mips_msa || amdgcn-*-* } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { 
target { ! { mips_msa || amdgcn-*-* } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { 
target { mips_msa || amdgcn-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { 
! { mips_msa || { amdgcn-*-* || riscv_vector } } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { 
mips_msa || { amdgcn-*-* || riscv_vector } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { 
target { ! { mips_msa || { amdgcn-*-* || riscv_vector } } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { 
target { mips_msa || { amdgcn-*-* || riscv_vector } } } } } */
-- 
2.36.3



Re: [Patch] OpenMP (C only): omp allocate - handle stack vars, improve diagnostic

2023-08-30 Thread Jakub Jelinek via Gcc-patches
On Wed, Aug 30, 2023 at 12:47:42PM +0200, Tobias Burnus wrote:
> > For switches, there is the case of the switch jumping across declaration
> > of an automatic var which is not initialized/constructed (I think in that
> > case there is normally no warning/error and happens a lot in the wild
> > including GCC sources) but perhaps one could treat those cases with
> > #pragma omp allocate as if they are actually constructed (though, I'd still
> > raise it at OpenMP F2F),
> Can you open an OpenMP spec issue?

https://github.com/OpenMP/spec/issues/3676

Feel free to amend it.

Jakub



Re: RFC: Introduce -fhardened to enable security-related flags

2023-08-30 Thread Jakub Jelinek via Gcc-patches
On Tue, Aug 29, 2023 at 03:42:27PM -0400, Marek Polacek via Gcc-patches wrote:
> +   if (UNLIKELY (flag_hardened)
> +   && (opt->code == OPT_D || opt->code == OPT_U))
> + {
> +   if (!fortify_seen_p)
> + fortify_seen_p = !strncmp (opt->arg, "_FORTIFY_SOURCE", 15);

Perhaps this should check that the char after it is either '\0' or '='; we
shouldn't care if the user defines or undefines a _FORTIFY_SOURCE_WHATEVER
macro.

> +   if (!cxx_assert_seen_p)
> + cxx_assert_seen_p = !strcmp (opt->arg, "_GLIBCXX_ASSERTIONS");

Like we don't care in this case about -D_GLIBCXX_ASSERTIONS42
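
A minimal sketch of the suggested check (the helper name is made up):

#include <string.h>
#include <stdbool.h>

/* Accept "_FORTIFY_SOURCE" and "_FORTIFY_SOURCE=...", but not
   "_FORTIFY_SOURCE_WHATEVER".  */
static bool
fortify_source_macro_p (const char *arg)
{
  const size_t len = strlen ("_FORTIFY_SOURCE");
  return strncmp (arg, "_FORTIFY_SOURCE", len) == 0
	 && (arg[len] == '\0' || arg[len] == '=');
}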

> + }
> + }
> +
> +  if (flag_hardened)
> + {
> +   if (!fortify_seen_p && optimize > 0)
> + {
> +   if (TARGET_GLIBC_MAJOR == 2 && TARGET_GLIBC_MINOR >= 35)
> + cpp_define (parse_in, "_FORTIFY_SOURCE=3");
> +   else
> + cpp_define (parse_in, "_FORTIFY_SOURCE=2");

I wonder if it wouldn't be better to enable _FORTIFY_SOURCE=2 by default for
-fhardened only for targets which actually have such support in the C
library.  There is some poor man's _FORTIFY_SOURCE support in libssp,
but e.g. one has to link with -lssp in that case and compile with
-isystem `gcc -print-include-filename=include`/ssp .
For glibc that is >= 2.3.4, https://maskray.me/blog/2022-11-06-fortify-source
mentions NetBSD support since 2006, newlib since 2017, some Darwin libc,
bionic (but seems they have only some clang support and dropped GCC
support) and some third party reimplementation of libssp.
Or do we just enable it and hope that either it works well or isn't
supported at all quietly?  E.g. it would certainly break the ssp case
where -isystem finds ssp headers but -lssp isn't linked in.

> @@ -4976,6 +4993,22 @@ process_command (unsigned int decoded_options_count,
>  #endif
>  }
>  
> +  /* TODO: check if -static -pie works and maybe use it.  */
> +  if (flag_hardened && !any_link_options_p && !static_p)
> +{
> +  save_switch ("-pie", 0, NULL, /*validated=*/true, /*known=*/false);
> +  /* TODO: check if BIND_NOW/RELRO is supported.  */
> +  if (true)
> + {
> +   /* These are passed straight down to collect2 so we have to break
> +  it up like this.  */
> +   add_infile ("-z", "*");
> +   add_infile ("now", "*");
> +   add_infile ("-z", "*");
> +   add_infile ("relro", "*");

As the TODO comment says, to do that we need to check at configure time that
linker supports -z now and -z relro options.

> @@ -1117,9 +1121,12 @@ finish_options (struct gcc_options *opts, struct 
> gcc_options *opts_set,
>  }
>  
>/* We initialize opts->x_flag_stack_protect to -1 so that targets
> - can set a default value.  */
> + can set a default value.  With --enable-default-ssp or -fhardened
> + the default is -fstack-protector-strong.  */
>if (opts->x_flag_stack_protect == -1)
> -opts->x_flag_stack_protect = DEFAULT_FLAG_SSP;
> +opts->x_flag_stack_protect = (opts->x_flag_hardened
> +   ? SPCT_FLAG_STRONG
> +   : DEFAULT_FLAG_SSP);

This needs to be careful: -fstack-protector isn't supported on all targets
(e.g. ia64), and we don't want the toplev.cc warning:
  /* Targets must be able to place spill slots at lower addresses.  If the
 target already uses a soft frame pointer, the transition is trivial.  */
  if (!FRAME_GROWS_DOWNWARD && flag_stack_protect)
{
  warning_at (UNKNOWN_LOCATION, 0,
  "%<-fstack-protector%> not supported for this target");
  flag_stack_protect = 0;
}
to be emitted whenever using -fhardened; it should not be enabled there
silently (for ia64, Fedora/RHEL gcc actually had a short patch to make it
work: turn the target into a FRAME_GROWS_DOWNWARD one if -fstack-protect*
was enabled and otherwise keep it !FRAME_GROWS_DOWNWARD).

Jakub



Re: [PATCH] RISC-V: Remove movmisalign pattern for VLA modes

2023-08-30 Thread Lehua Ding

Committed, thanks Jeff.

On 2023/8/29 21:48, Jeff Law via Gcc-patches wrote:



On 8/29/23 03:39, Juzhe-Zhong wrote:

This patch fixed this bunch of failures in "vect" testsuite:
FAIL: gcc.dg/vect/pr63341-1.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr63341-1.c execution test
FAIL: gcc.dg/vect/pr63341-2.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr63341-2.c execution test
FAIL: gcc.dg/vect/pr94994.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr94994.c execution test
FAIL: gcc.dg/vect/vect-align-1.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-align-1.c execution test
FAIL: gcc.dg/vect/vect-align-2.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-align-2.c execution test

Spike report:
z   ra 000100f4 sp 003ffb30 gp 
00012cc8
tp  t0 000102d4 t1 000f t2 

s0  s1  a0 000101a6 a1 
0008
a2 0010 a3 00012401 a4 00012480 a5 
0020
a6 001f a7 00d6 s2  s3 

s4  s5  s6  s7 

s8  s9  sA  sB 

t3  t4  t5  t6 


pc 000101ec va/inst 0206dc07 sr 80026620
Load access fault!

(spike)
core   0: 0x00010204 (0x02065087) vle16.v v1, (a2)
core   0: exception trap_load_address_misaligned, epc 0x00010204
core   0:   tval 0x00012c81
(spike) reg 0 a2
0x00012c81

According to the RVV ISA, we can't use "vle16.v" if the address is only
byte-aligned.


This issue is caused by this GIMPLE IR:

vect__1.15_17 = .MASK_LEN_LOAD (vectp_t.13_15, 8B, { -1, ... }, _24, 0);

For partial vectorization, the "8B" alignment here is incorrect.


After this patch, the loop is no longer vectorized:

sll a5,a4,0x1
add a5,a5,a1
lhu a3,64(a5)
lbu a5,66(a5)
addw    a4,a4,1
srl a3,a3,0x8
sll a5,a5,0x8
or  a5,a5,a3
sh  a5,0(a2)
add a2,a2,2
bne a4,a0,101f8 

I will re-enable auto-vectorization with another approach in a
follow-up patch.


gcc/ChangeLog:

* config/riscv/autovec.md (movmisalign): Delete.

OK.
jeff


--
Best,
Lehua



Re: [Patch] OpenMP (C only): omp allocate - handle stack vars, improve diagnostic

2023-08-30 Thread Tobias Burnus

Revised patch included - addresses part of the issues:
* gimplify.cc: Fix placement of GOMP_alloc by really checking for
  DECL_EXPR (experimented with it before but settled
  for a different pattern)
* c/ Add it to has_jump_unsafe_decl similar to VLA
  + added msg to the switch/goto error handling.
  + new c-c++-common/gomp/allocate-11.c testcase for it

But not all discussion points are solved, yet. Namely,
how to diagnose:

  omp_allocator_handle_t uninit;
  int var, var2;
  uninit = omp_low_lat_mem_alloc;
  omp_allocator_handle_t late_declared = omp_low_lat_mem_alloc;
#pragma omp allocate(var) allocator(uninit)
#pragma omp allocate(var) allocator(late_declared)

(Currently, it is only diagnosed with -Wuninitialized (or -Wextra).)

Otherwise, I think all is covered, but I might have missed something.

Regarding:

On 29.08.23 19:14, Jakub Jelinek wrote:


What about
   int n = 5;
   omp_allocator_handle_t my_allocator = omp_low_lat_mem_alloc;
   #pragma omp allocate(n) allocator(my_allocator)
?  What we do in that case?  Is that invalid because my_allocator
isn't in scope when n is defined?


IMHO yes - as we can always construct cases with loops in the
execution-order dependence.

The current implementation handles it as VLA,
i.e. warning (VLA vs. allocator):

foo.c:5:3: warning: ‘m’ is used uninitialized [-Wuninitialized]
5 |   int A[m];
  |   ^~~

foo.c:7:7: warning: ‘my_allocator’ is used uninitialized [-Wuninitialized]

7 |   int n = 5;

As this is more surprising than the VLA case, it is probably worth an
error and a more explicit message. The question is how best to detect
that the allocator is either declared after the to-be-omp-allocated
variable's declaration, or that it is declared before but assigned
only after that declaration (whether before the 'omp allocate' or not).


I wonder how best to check for this. During parsing, we can check for
TREE_USED and whether an initializer exists, for DECL_EXPR and for
c_check_in_current_scope, but this is not sufficient. We could use
DECL_SOURCE_LOCATION () with '<' comparisons, but I am not sure how it
plays with #line and #include - nor will it catch:
  int var;
  allocator = ...
  #pragma omp allocate(var) allocator(allocator)
where 'allocator' is TREE_USED and declared before 'var' but still wrong.
The 'is used uninitialized' warning is done deep in the middle end, but I
don't think we want to add some flag_openmp always-run special case there.

[Admittedly, we do not need to catch all issues and -Wuninitialized, enabled
by -Wextra, will also find this usage. Still, we should try to diagnose the
most common issues.]



Well, in this case the warning is there just because the patch chose to put
it at the start of the BIND_EXPR rather than right before DECL_EXPR.


Granted; the way the code before the DECL_EXPR was found wasn't ideal; I
now changed it back to what I previously had, where for C++ I also need
to handle a cleanup_point_expr around the DECL_EXPR.


For switches, there is the case of the switch jumping across declaration
of an automatic var which is not initialized/constructed (I think in that
case there is normally no warning/error and happens a lot in the wild
including GCC sources) but perhaps one could treat those cases with
#pragma omp allocate as if they are actually constructed (though, I'd still
raise it at OpenMP F2F),

Can you open an OpenMP spec issue?

and another case with switch which doesn't even do
that.
Consider
   switch (i)
 {
 case 42:
   bar ();
   break;
 case 51:
   int j = 5;
   use ();
   break;
 }
This is valid for both C and C++, one doesn't jump across any initialization
in there.  Yet if the j allocation is done at the start of the BIND_EXPR, it
will jump across that initialization.


With the previously posted patch (and also the current one), that yields:

:
D.2900 = __builtin_GOMP_alloc (0, 4, 0B);
*D.2900 = 5;

which looks fine to me.

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
OpenMP (C only): omp allocate - handle stack vars, improve diagnostic

The 'allocate' directive can be used for both stack and static variables.
While the parser in C and C++ was pre-existing, it missed several
diagnostics, which this commit adds - for now only for C.
Additionally, it stopped with a sorry after parsing.

For C only, the sorry is now restricted to static variables; the stack
variable declarations are now tagged with the 'omp allocate' attribute, and
in gimplify_bind_expr the GOMP_alloc/GOMP_free allocation will now be
added.

Follow-up: add the same parser additions for C++ and update the
testcases, and add Fortran support, where parsing support also exists
and diagnostic updates are likewise required.

gcc/c/ChangeLog:

	* c-parser.cc 

[PATCH] Adjust costing of emulated vectorized gather/scatter

2023-08-30 Thread liuhongt via Gcc-patches
r14-332-g24905a4bd1375c adjusts costing of emulated vectorized
gather/scatter.

commit 24905a4bd1375ccd99c02510b9f9529015a48315
Author: Richard Biener 
Date:   Wed Jan 18 11:04:49 2023 +0100

Adjust costing of emulated vectorized gather/scatter

Emulated gather/scatter behave similar to strided elementwise
accesses in that they need to decompose the offset vector
and construct or decompose the data vector so handle them
the same way, pessimizing the cases with may elements.


But for emulated gather/scatter, the offset vector load/vec_construct has
already been counted, and in real cases it's probably eliminated by a
later optimizer.
Also, after decomposing, element loads from continuous memory could be
less bounded compared to a normal elementwise load.
The patch decreases the cost a little bit.

This will enable gather emulation for the loop below with VF=8 (ymm)

double
foo (double* a, double* b, unsigned int* c, int n)
{
  double sum = 0;
  for (int i = 0; i != n; i++)
sum += a[i] * b[c[i]];
  return sum;
}

For the above loop, microbenchmark results on ICX show that
emulated gather with VF=8 is 30% faster than emulated gather with
VF=4 when the trip count is big enough.
It brings back ~4% for 510.parest, still a ~5% regression compared to
the gather instruction due to being throughput bound.

For -march=znver1/2/3/4, the change doesn't enable VF=8 (ymm) for the
loop; VF remains 4 (xmm) as before (presumably related to their own
cost models).


Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

PR target/111064
* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
Decrease cost a little bit for vec_to_scalar(offset vector) in
emulated gather.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr111064.c: New test.
---
 gcc/config/i386/i386.cc  | 11 ++++++++++-
 gcc/testsuite/gcc.target/i386/pr111064.c | 12 ++++++++++++
 2 files changed, 22 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr111064.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 1bc3f11ff07..337e0f1bfbb 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -24079,7 +24079,16 @@ ix86_vector_costs::add_stmt_cost (int count, 
vect_cost_for_stmt kind,
  || STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_GATHER_SCATTER))
 {
   stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
-  stmt_cost *= (TYPE_VECTOR_SUBPARTS (vectype) + 1);
+  /* For emulated gather/scatter, offset vector load/vec_construct has
+     already been counted, and in real cases it's probably eliminated by
+     a later optimizer.
+     Also after decomposing, element loads from continuous memory
+     could be less bounded compared to normal elementwise load.  */
+  if (kind == vec_to_scalar
+ && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_GATHER_SCATTER)
+   stmt_cost *= TYPE_VECTOR_SUBPARTS (vectype);
+  else
+   stmt_cost *= (TYPE_VECTOR_SUBPARTS (vectype) + 1);
 }
   else if ((kind == vec_construct || kind == scalar_to_vec)
   && node
diff --git a/gcc/testsuite/gcc.target/i386/pr111064.c 
b/gcc/testsuite/gcc.target/i386/pr111064.c
new file mode 100644
index 000..aa2589bd36f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr111064.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -march=icelake-server -mno-gather" } */
+/* { dg-final { scan-assembler-times {(?n)vfmadd[123]*pd.*ymm} 2 { target { ! 
ia32 } } } }  */
+
+double
+foo (double* a, double* b, unsigned int* c, int n)
+{
+  double sum = 0;
+  for (int i = 0; i != n; i++)
+sum += a[i] * b[c[i]];
+  return sum;
+}
-- 
2.31.1



RE: [PATCH] test: Add xfail for riscv_vector

2023-08-30 Thread Li, Pan2 via Gcc-patches
Committed, thanks Richard.

Pan

-----Original Message-----
From: Gcc-patches  On Behalf 
Of Richard Biener via Gcc-patches
Sent: Wednesday, August 30, 2023 4:36 PM
To: Juzhe-Zhong 
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] test: Add xfail for riscv_vector

On Wed, 30 Aug 2023, Juzhe-Zhong wrote:

> Like ARM SVE, when we enable scalable vectorization for RVV,
> we can't do constant fold for these yet for both ARM SVE and RVV.
> 
> 
> Ok for trunk ?

OK.

> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/pr88598-1.c: Add riscv_vector.
>   * gcc.dg/vect/pr88598-2.c: Ditto.
>   * gcc.dg/vect/pr88598-3.c: Ditto.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/pr88598-1.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/pr88598-2.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/pr88598-3.c | 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/pr88598-1.c 
> b/gcc/testsuite/gcc.dg/vect/pr88598-1.c
> index e25c6c04543..ddcebb067ea 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr88598-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr88598-1.c
> @@ -51,4 +51,4 @@ main ()
>  
>  /* ??? We need more constant folding for this to work with fully-masked
> loops.  */
> -/* { dg-final { scan-tree-dump-not {REDUC_PLUS} "optimized" { xfail 
> aarch64_sve } } } */
> +/* { dg-final { scan-tree-dump-not {REDUC_PLUS} "optimized" { xfail { 
> aarch64_sve || riscv_vector } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/pr88598-2.c 
> b/gcc/testsuite/gcc.dg/vect/pr88598-2.c
> index f4c41bd8e58..ef5ea8a1a86 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr88598-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr88598-2.c
> @@ -51,4 +51,4 @@ main ()
>  
>  /* ??? We need more constant folding for this to work with fully-masked
> loops.  */
> -/* { dg-final { scan-tree-dump-not {REDUC_PLUS} "optimized" { xfail 
> aarch64_sve } } } */
> +/* { dg-final { scan-tree-dump-not {REDUC_PLUS} "optimized" { xfail { 
> aarch64_sve || riscv_vector } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/pr88598-3.c 
> b/gcc/testsuite/gcc.dg/vect/pr88598-3.c
> index 0fc23bf0ee7..75b8d024a95 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr88598-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr88598-3.c
> @@ -51,4 +51,4 @@ main ()
>  
>  /* ??? We need more constant folding for this to work with fully-masked
> loops.  */
> -/* { dg-final { scan-tree-dump-not {REDUC_PLUS} "optimized" { xfail 
> aarch64_sve } } } */
> +/* { dg-final { scan-tree-dump-not {REDUC_PLUS} "optimized" { xfail { 
> aarch64_sve || riscv_vector } } } } */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


RE: [PATCH] test: Fix XPASS of RVV

2023-08-30 Thread Li, Pan2 via Gcc-patches
Committed, thanks Richard.

Pan

-----Original Message-----
From: Gcc-patches  On Behalf 
Of Richard Biener via Gcc-patches
Sent: Wednesday, August 30, 2023 6:24 PM
To: Juzhe-Zhong 
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] test: Fix XPASS of RVV

On Wed, 30 Aug 2023, Juzhe-Zhong wrote:

> XPASS: gcc.dg/vect/vect-outer-4e.c -flto -ffat-lto-objects  
> scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
> XPASS: gcc.dg/vect/vect-outer-4e.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED" 1
> XPASS: gcc.dg/vect/vect-outer-4f.c -flto -ffat-lto-objects  
> scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
> XPASS: gcc.dg/vect/vect-outer-4f.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED" 1
> XPASS: gcc.dg/vect/vect-outer-4g.c -flto -ffat-lto-objects  
> scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
> XPASS: gcc.dg/vect/vect-outer-4g.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED" 1
> XPASS: gcc.dg/vect/vect-outer-4k.c -flto -ffat-lto-objects  
> scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
> XPASS: gcc.dg/vect/vect-outer-4k.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED" 1
> XPASS: gcc.dg/vect/vect-outer-4l.c -flto -ffat-lto-objects  
> scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
> XPASS: gcc.dg/vect/vect-outer-4l.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED" 1
> 
> Like ARM SVE, Fix these XPASS for RVV.

OK.

> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-double-reduc-5.c: Add riscv.
>   * gcc.dg/vect/vect-outer-4e.c: Ditto.
>   * gcc.dg/vect/vect-outer-4f.c: Ditto.
>   * gcc.dg/vect/vect-outer-4g.c: Ditto.
>   * gcc.dg/vect/vect-outer-4k.c: Ditto.
>   * gcc.dg/vect/vect-outer-4l.c: Ditto.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/vect-outer-4e.c   | 2 +-
>  gcc/testsuite/gcc.dg/vect/vect-outer-4f.c   | 2 +-
>  gcc/testsuite/gcc.dg/vect/vect-outer-4g.c   | 2 +-
>  gcc/testsuite/gcc.dg/vect/vect-outer-4k.c   | 2 +-
>  gcc/testsuite/gcc.dg/vect/vect-outer-4l.c   | 2 +-
>  6 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c 
> b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c
> index 7465eae1c47..b990405745e 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c
> @@ -53,5 +53,5 @@ int main ()
>  
>  /* Vectorization of loops with multiple types and double reduction is not 
> supported yet.  */   
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { 
> xfail { ! aarch64*-*-* } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { 
> xfail { ! { aarch64*-*-* riscv*-*-* } } } } } */
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-4e.c 
> b/gcc/testsuite/gcc.dg/vect/vect-outer-4e.c
> index e65a092f5bf..cc9e96f5d58 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-4e.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-4e.c
> @@ -23,4 +23,4 @@ foo (){
>return;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { 
> xfail { ! aarch64*-*-* } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { 
> xfail { ! { aarch64*-*-* riscv*-*-* } } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-4f.c 
> b/gcc/testsuite/gcc.dg/vect/vect-outer-4f.c
> index a88014a2fbf..c903dc9bfea 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-4f.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-4f.c
> @@ -65,4 +65,4 @@ int main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { 
> xfail { ! aarch64*-*-* } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { 
> xfail { ! { aarch64*-*-* riscv*-*-* } } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-4g.c 
> b/gcc/testsuite/gcc.dg/vect/vect-outer-4g.c
> index a88014a2fbf..c903dc9bfea 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-4g.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-4g.c
> @@ -65,4 +65,4 @@ int main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { 
> xfail { ! aarch64*-*-* } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { 
> xfail { ! { aarch64*-*-* riscv*-*-* } } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-4k.c 
> b/gcc/testsuite/gcc.dg/vect/vect-outer-4k.c
> index a88014a2fbf..c903dc9bfea 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-4k.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-4k.c
> @@ -65,4 +65,4 @@ int main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { 
> xfail { ! aarch64*-*-* } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { 
> xfail { ! { aarch64*-*-* riscv*-*-* } } } } } */
> diff --git 

Re: [PATCH v2 0/2] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets

2023-08-30 Thread Manolis Tsamis
On Tue, Jul 18, 2023 at 9:38 PM Richard Sandiford
 wrote:
>
> Manolis Tsamis  writes:
> > On Tue, Jul 18, 2023 at 1:12 AM Richard Sandiford
> >  wrote:
> >>
> >> Manolis Tsamis  writes:
> >> > noce_convert_multiple_sets has been introduced and extended over time to 
> >> > handle
> >> > if conversion for blocks with multiple sets. Currently this is focused on
> >> > register moves and rejects any sort of arithmetic operations.
> >> >
> >> > This series is an extension to allow more sequences to take part in if
> >> > conversion. The first patch is a required change to emit correct code 
> >> > and the
> >> > second patch whitelists a larger number of operations through
> >> > bb_ok_for_noce_convert_multiple_sets.
> >> >
> >> > For targets that have a rich selection of conditional instructions,
> >> > like aarch64, I have seen an ~5x increase of profitable if conversions 
> >> > for
> >> > multiple set blocks in SPEC benchmarks. Also tested with a wide variety 
> >> > of
> >> > benchmarks and I have not seen performance regressions on either x64 / 
> >> > aarch64.
> >>
> >> Interesting results.  Are you free to say which target you used for 
> >> aarch64?
> >>
> >> If I've understood the cost heuristics correctly, we'll allow a 
> >> "predictable"
> >> branch to be replaced by up to 5 simple conditional instructions and an
> >> "unpredictable" branch to be replaced by up to 10 simple conditional
> >> instructions.  That seems pretty high.  And I'm not sure how well we
> >> guess predictability in the absence of real profile information.
> >>
> >> So my gut instinct was that the limitations of the current code might
> >> be saving us from overly generous limits.  It sounds from your results
> >> like that might not be the case though.
> >>
> >> Still, if it does turn out to be the case in future, I agree we should
> >> fix the costs rather than hamstring the code.
> >>
> >
> > My writing may have been confusing, but with "~5x increase of
> > profitable if conversions" I just meant that ifcvt considers these
> > profitable, not that they actually are when executed on particular
> > hardware.
>
> Yeah, sorry, I'd read that part as measuring the number of if-converisons.
> But...
>
> > But at the same time I haven't yet seen any obvious performance
> > regressions in some benchmarks that I have ran.
>
> ...it was a pleasant surprise that doing so much more if-conversion
> didn't make things noticeably worse. :)
>
> > In any case it could be interesting to microbenchmark branches vs
> > conditional instructions and see how sane these numbers are.
>
> I think for this we really do need the real workload, since it's
> hard to measure realistic branch mispredict penalties with a
> microbenchmark.
>
Yes indeed. I'm still trying to properly analyze the effects of this change.
I'll share when I have something interesting on the benchmarks side.

> > [...]
> >> (2) Don't you also need to update the "rewiring" mechanism, to cope
> >> with cases where the then block has something like:
> >>
> >>   if (a == 0) {
> >>     a = b op c;   ->   a' = a == 0 ? b op c : a;
> >>     d = a op b;   ->   d = a == 0 ? a' op b : d;
> >>   }                    a = a'
> >>
> >> At the moment the code only handles regs and subregs, whereas but IIUC
> >> it should now iterate over all the regs in the SET_SRC.  And I suppose
> >> that creates the need for multiple possible rewirings in the same insn,
> >> so that it isn't a simple insn -> index mapping any more.
> >>
> >
> > Indeed, I believe this current patch cannot properly handle these. I
> > will create testcases for this and see what changes need to be done in
> > the next iteration so that correct code is generated.
>
> Perhaps we should change the way that the rewiring is done.
> At the moment, need_cmov_or_rewire detects the renumbering
> ahead of time.  But it might be easier to:
>
> - have noce_convert_multiple_sets_1 keep track of which
>   SET_DESTs it has replaced with temporaries.
>
> - for each subsequent instruction, go through that list in order
>   and use insn_propagation (from recog.h) to apply each replacement.
>
> That might be simpler, and should also be more robust, since the
> insn_propagation routines return false on failure.
>
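
A minimal sketch of that scheme (it assumes the insn_propagation
interface declared in recog.h; the pair list and the loop are
illustrative, not the committed code):

/* Apply earlier (target -> temporary) replacements, in order, to a
   later insn; fail if any propagation fails.  */
struct replacement { rtx target, temporary; };

static bool
apply_replacements (rtx_insn *insn, const vec<replacement> &replacements)
{
  for (const replacement &r : replacements)
    {
      insn_propagation prop (insn, r.target, r.temporary);
      if (!prop.apply_to_pattern (&PATTERN (insn)))
	return false;
    }
  return true;
}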
Thanks, I've tried various designs with these ideas, including moving
the rewiring code to noce_convert_multiple_sets_1, but there was
always some issue.
For example, we cannot remove the need_cmov_or_rewire function because
the part that calculates need_no_cmov needs to iterate once before the
conversion; otherwise it wouldn't work.
Also, noce_convert_multiple_sets_1 is run twice each time, so doing the
new, somewhat expensive rewiring logic twice felt like a regression
without making things much easier.

In the end I opted to keep need_cmov_or_rewire (albeit renamed) and
introduce a new struct noce_multiple_sets_info, which I think made
things much nicer.
That includes 

Re: [PATCH] test: Fix XPASS of RVV

2023-08-30 Thread Richard Biener via Gcc-patches
On Wed, 30 Aug 2023, Juzhe-Zhong wrote:

> XPASS: gcc.dg/vect/vect-outer-4e.c -flto -ffat-lto-objects  
> scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
> XPASS: gcc.dg/vect/vect-outer-4e.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED" 1
> XPASS: gcc.dg/vect/vect-outer-4f.c -flto -ffat-lto-objects  
> scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
> XPASS: gcc.dg/vect/vect-outer-4f.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED" 1
> XPASS: gcc.dg/vect/vect-outer-4g.c -flto -ffat-lto-objects  
> scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
> XPASS: gcc.dg/vect/vect-outer-4g.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED" 1
> XPASS: gcc.dg/vect/vect-outer-4k.c -flto -ffat-lto-objects  
> scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
> XPASS: gcc.dg/vect/vect-outer-4k.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED" 1
> XPASS: gcc.dg/vect/vect-outer-4l.c -flto -ffat-lto-objects  
> scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
> XPASS: gcc.dg/vect/vect-outer-4l.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED" 1
> 
> Like ARM SVE, Fix these XPASS for RVV.

OK.

> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-double-reduc-5.c: Add riscv.
>   * gcc.dg/vect/vect-outer-4e.c: Ditto.
>   * gcc.dg/vect/vect-outer-4f.c: Ditto.
>   * gcc.dg/vect/vect-outer-4g.c: Ditto.
>   * gcc.dg/vect/vect-outer-4k.c: Ditto.
>   * gcc.dg/vect/vect-outer-4l.c: Ditto.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/vect-outer-4e.c   | 2 +-
>  gcc/testsuite/gcc.dg/vect/vect-outer-4f.c   | 2 +-
>  gcc/testsuite/gcc.dg/vect/vect-outer-4g.c   | 2 +-
>  gcc/testsuite/gcc.dg/vect/vect-outer-4k.c   | 2 +-
>  gcc/testsuite/gcc.dg/vect/vect-outer-4l.c   | 2 +-
>  6 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c 
> b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c
> index 7465eae1c47..b990405745e 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c
> @@ -53,5 +53,5 @@ int main ()
>  
>  /* Vectorization of loops with multiple types and double reduction is not 
> supported yet.  */   
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { 
> xfail { ! aarch64*-*-* } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { 
> xfail { ! { aarch64*-*-* riscv*-*-* } } } } } */
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-4e.c 
> b/gcc/testsuite/gcc.dg/vect/vect-outer-4e.c
> index e65a092f5bf..cc9e96f5d58 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-4e.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-4e.c
> @@ -23,4 +23,4 @@ foo (){
>return;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { 
> xfail { ! aarch64*-*-* } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { 
> xfail { ! { aarch64*-*-* riscv*-*-* } } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-4f.c 
> b/gcc/testsuite/gcc.dg/vect/vect-outer-4f.c
> index a88014a2fbf..c903dc9bfea 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-4f.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-4f.c
> @@ -65,4 +65,4 @@ int main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { 
> xfail { ! aarch64*-*-* } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { 
> xfail { ! { aarch64*-*-* riscv*-*-* } } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-4g.c 
> b/gcc/testsuite/gcc.dg/vect/vect-outer-4g.c
> index a88014a2fbf..c903dc9bfea 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-4g.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-4g.c
> @@ -65,4 +65,4 @@ int main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { 
> xfail { ! aarch64*-*-* } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { 
> xfail { ! { aarch64*-*-* riscv*-*-* } } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-4k.c 
> b/gcc/testsuite/gcc.dg/vect/vect-outer-4k.c
> index a88014a2fbf..c903dc9bfea 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-4k.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-4k.c
> @@ -65,4 +65,4 @@ int main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { 
> xfail { ! aarch64*-*-* } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { 
> xfail { ! { aarch64*-*-* riscv*-*-* } } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-4l.c 
> b/gcc/testsuite/gcc.dg/vect/vect-outer-4l.c
> index 4f95c652ee3..a63b9332afa 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-4l.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-4l.c
> @@ -65,4 +65,4 @@ int main (void)
>return 0;
>  }
>  
> -/* 

[PATCH v3 3/4] ifcvt: Handle multiple rewired regs and refactor noce_convert_multiple_sets

2023-08-30 Thread Manolis Tsamis
The existing implementation of need_cmov_or_rewire and
noce_convert_multiple_sets_1 assumes that sets are either REG or SUBREG.
This commit enhances them so they can handle/rewire arbitrary set statements.

To do that, a new helper struct noce_multiple_sets_info is introduced, which
is used by noce_convert_multiple_sets and its helper functions. This results
in cleaner function signatures, improved efficiency (a number of vecs and
hash set/maps are replaced with a single vec of structs) and simplicity.
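
For reference, a sketch of the shape of that struct (the field list is
inferred from the uses visible in the hunks below; the actual definition
lives in ifcvt.h and may differ):

/* Per-insn bookkeeping for noce_convert_multiple_sets (sketch).  */
struct noce_multiple_sets_info
{
  /* The true target of the conditional move (the original SET_DEST).  */
  rtx target;
  /* The temporary introduced so register overlap need not be
     considered.  */
  rtx temporary;
  /* The original, unmodified insn.  */
  rtx_insn *unmodified_insn;
  /* Whether this set really needs a conditional move.  */
  bool need_cmov;
};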

gcc/ChangeLog:

* ifcvt.cc (need_cmov_or_rewire): Renamed to init_noce_multiple_sets_info.
(init_noce_multiple_sets_info): Initialize noce_multiple_sets_info.
(noce_convert_multiple_sets_1): Use noce_multiple_sets_info and handle
rewiring of multiple registers.
(noce_convert_multiple_sets): Updated to use noce_multiple_sets_info.
* ifcvt.h (struct noce_multiple_sets_info): Introduce new struct
noce_multiple_sets_info to store info for noce_convert_multiple_sets.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/ifcvt_multiple_sets_rewire.c: New test.

Signed-off-by: Manolis Tsamis 
---

(no changes since v1)

 gcc/ifcvt.cc  | 255 --
 gcc/ifcvt.h   |  16 ++
 .../aarch64/ifcvt_multiple_sets_rewire.c  |  20 ++
 3 files changed, 149 insertions(+), 142 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_rewire.c

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index efe8ab1577a..ecc0cbabef9 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -98,14 +98,10 @@ static bool dead_or_predicable (basic_block, basic_block, 
basic_block,
edge, bool);
 static void noce_emit_move_insn (rtx, rtx);
 static rtx_insn *block_has_only_trap (basic_block);
-static void need_cmov_or_rewire (basic_block, hash_set<rtx_insn *> *,
-				 hash_map<rtx, rtx> *);
+static void init_noce_multiple_sets_info (basic_block,
+					  auto_delete_vec<noce_multiple_sets_info> &);
 static bool noce_convert_multiple_sets_1 (struct noce_if_info *,
-					  hash_set<rtx_insn *> *,
-					  hash_map<rtx, rtx> *,
-					  auto_vec<rtx> *,
-					  auto_vec<rtx> *,
-					  auto_vec<rtx_insn *> *, int *);
+					  auto_delete_vec<noce_multiple_sets_info> &, int *);
 
 /* Count the number of non-jump active insns in BB.  */
 
@@ -3270,24 +3266,13 @@ noce_convert_multiple_sets (struct noce_if_info 
*if_info)
   rtx x = XEXP (cond, 0);
   rtx y = XEXP (cond, 1);
 
-  /* The true targets for a conditional move.  */
-  auto_vec<rtx> targets;
-  /* The temporaries introduced to allow us to not consider register
- overlap.  */
-  auto_vec<rtx> temporaries;
-  /* The insns we've emitted.  */
-  auto_vec<rtx_insn *> unmodified_insns;
-
-  hash_set<rtx_insn *> need_no_cmov;
-  hash_map<rtx, rtx> rewired_src;
-
-  need_cmov_or_rewire (then_bb, &need_no_cmov, &rewired_src);
+  auto_delete_vec<noce_multiple_sets_info> insn_info;
+  init_noce_multiple_sets_info (then_bb, insn_info);
 
   int last_needs_comparison = -1;
 
   bool ok = noce_convert_multiple_sets_1
-    (if_info, &need_no_cmov, &rewired_src, &targets, &temporaries,
-     &unmodified_insns, &last_needs_comparison);
+    (if_info, insn_info, &last_needs_comparison);
   if (!ok)
   return false;
 
@@ -3302,8 +3287,7 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
   end_sequence ();
   start_sequence ();
   ok = noce_convert_multiple_sets_1
-	(if_info, &need_no_cmov, &rewired_src, &targets, &temporaries,
-	 &unmodified_insns, &last_needs_comparison);
+	(if_info, insn_info, &last_needs_comparison);
   /* Actually we should not fail anymore if we reached here,
 but better still check.  */
   if (!ok)
@@ -3312,12 +3296,12 @@ noce_convert_multiple_sets (struct noce_if_info 
*if_info)
 
   /* We must have seen some sort of insn to insert, otherwise we were
  given an empty BB to convert, and we can't handle that.  */
-  gcc_assert (!unmodified_insns.is_empty ());
+  gcc_assert (!insn_info.is_empty ());
 
   /* Now fixup the assignments.  */
-  for (unsigned i = 0; i < targets.length (); i++)
-if (targets[i] != temporaries[i])
-  noce_emit_move_insn (targets[i], temporaries[i]);
+  for (unsigned i = 0; i < insn_info.length (); i++)
+if (insn_info[i]->target != insn_info[i]->temporary)
+  noce_emit_move_insn (insn_info[i]->target, insn_info[i]->temporary);
 
   /* Actually emit the sequence if it isn't too expensive.  */
   rtx_insn *seq = get_insns ();
@@ -3332,10 +3316,10 @@ noce_convert_multiple_sets (struct noce_if_info 
*if_info)
 set_used_flags (insn);
 
   /* Mark all our temporaries and targets as used.  */
-  for (unsigned i = 0; i < targets.length (); i++)
+  for (unsigned i = 0; i < insn_info.length (); i++)
 {
-  set_used_flags (temporaries[i]);
-  set_used_flags (targets[i]);
+  set_used_flags (insn_info[i]->temporary);
+  set_used_flags (insn_info[i]->target);
 }
 
   set_used_flags (cond);
@@ -3354,7 +3338,7 @@ 

[PATCH v3 4/4] ifcvt: Remove obsolete code for subreg handling in noce_convert_multiple_sets

2023-08-30 Thread Manolis Tsamis
This code used to handle register replacement issues with SUBREG before
simplify_replace_rtx was introduced. This should not be needed anymore as
new_val has the correct mode and that should be preserved by
simplify_replace_rtx.

gcc/ChangeLog:

* ifcvt.cc (noce_convert_multiple_sets_1): Remove old code.

Signed-off-by: Manolis Tsamis 
---

(no changes since v1)

 gcc/ifcvt.cc | 38 --
 1 file changed, 38 deletions(-)

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index ecc0cbabef9..3b4b873612c 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -3449,44 +3449,6 @@ noce_convert_multiple_sets_1 (struct noce_if_info 
*if_info,
   if (if_info->then_else_reversed)
std::swap (old_val, new_val);
 
-
-  /* We allow simple lowpart register subreg SET sources in
-bb_ok_for_noce_convert_multiple_sets.  Be careful when processing
-sequences like:
-(set (reg:SI r1) (reg:SI r2))
-(set (reg:HI r3) (subreg:HI (r1)))
-For the second insn new_val or old_val (r1 in this example) will be
-taken from the temporaries and have the wider mode which will not
-match with the mode of the other source of the conditional move, so
-we'll end up trying to emit r4:HI = cond ? (r1:SI) : (r3:HI).
-Wrap the two cmove operands into subregs if appropriate to prevent
-that.  */
-
-  if (!CONSTANT_P (new_val)
- && GET_MODE (new_val) != GET_MODE (temp))
-   {
- machine_mode src_mode = GET_MODE (new_val);
- machine_mode dst_mode = GET_MODE (temp);
- if (!partial_subreg_p (dst_mode, src_mode))
-   {
- end_sequence ();
- return false;
-   }
- new_val = lowpart_subreg (dst_mode, new_val, src_mode);
-   }
-  if (!CONSTANT_P (old_val)
- && GET_MODE (old_val) != GET_MODE (temp))
-   {
- machine_mode src_mode = GET_MODE (old_val);
- machine_mode dst_mode = GET_MODE (temp);
- if (!partial_subreg_p (dst_mode, src_mode))
-   {
- end_sequence ();
- return false;
-   }
- old_val = lowpart_subreg (dst_mode, old_val, src_mode);
-   }
-
   /* We have identified swap-style idioms before.  A normal
 set will need to be a cmov while the first instruction of a swap-style
 idiom can be a regular move.  This helps with costing.  */
-- 
2.34.1



[PATCH v3 2/4] ifcvt: Allow more operations in multiple set if conversion

2023-08-30 Thread Manolis Tsamis
Currently the operations allowed for if conversion of a basic block with
multiple sets are few, namely REG, SUBREG and CONST_INT (as controlled by
bb_ok_for_noce_convert_multiple_sets).

This commit allows more operations (arithmetic, compare, etc) to participate
in if conversion. The target's profitability hook and ifcvt's costing are
expected to reject sequences that are unprofitable.

This is especially useful for targets which provide a rich selection of
conditional instructions (like aarch64 which has cinc, csneg, csinv, ccmp, ...)
which are currently not used in basic blocks with more than a single set.
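
As a hedged illustration (mine, not quoted from the new test, which the
archive truncates), a block like the following can now be considered:

  int
  foo (int a, int b, int c, int d)
  {
    int x = c, y = d;
    if (a > b)
      {
        x = c + 1;	/* PLUS: previously rejected, now allowed.  */
        y = ~d;		/* NOT: likewise.  */
      }
    return x + y;
  }

On aarch64 a block like this can become cmp plus csinc/csinv instead of a
branch.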

gcc/ChangeLog:

* ifcvt.cc (try_emit_cmove_seq): Modify comments.
(noce_convert_multiple_sets_1): Modify comments.
(bb_ok_for_noce_convert_multiple_sets): Allow more operations.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/ifcvt_multiple_sets_arithm.c: New test.

Signed-off-by: Manolis Tsamis 
---

Changes in v3:
- Add SCALAR_INT_MODE_P check in bb_ok_for_noce_convert_multiple_sets.
- Allow rewiring of multiple regs.
- Refactor code with noce_multiple_sets_info.
- Remove old code for subregs.

 gcc/ifcvt.cc  | 63 ++-
 .../aarch64/ifcvt_multiple_sets_arithm.c  | 79 +++
 2 files changed, 123 insertions(+), 19 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 3273aeca125..efe8ab1577a 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -3215,13 +3215,13 @@ try_emit_cmove_seq (struct noce_if_info *if_info, rtx 
temp,
 /* We have something like:
 
  if (x > y)
-   { i = a; j = b; k = c; }
+   { i = EXPR_A; j = EXPR_B; k = EXPR_C; }
 
Make it:
 
- tmp_i = (x > y) ? a : i;
- tmp_j = (x > y) ? b : j;
- tmp_k = (x > y) ? c : k;
+ tmp_i = (x > y) ? EXPR_A : i;
+ tmp_j = (x > y) ? EXPR_B : j;
+ tmp_k = (x > y) ? EXPR_C : k;
  i = tmp_i;
  j = tmp_j;
  k = tmp_k;
@@ -3637,11 +3637,10 @@ noce_convert_multiple_sets_1 (struct noce_if_info 
*if_info,
 
 
 
-/* Return true iff basic block TEST_BB is comprised of only
-   (SET (REG) (REG)) insns suitable for conversion to a series
-   of conditional moves.  Also check that we have more than one set
-   (other routines can handle a single set better than we would), and
-   fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  While going
+/* Return true iff basic block TEST_BB is suitable for conversion to a
+   series of conditional moves.  Also check that we have more than one
+   set (other routines can handle a single set better than we would),
+   and fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  While going
through the insns store the sum of their potential costs in COST.  */
 
 static bool
@@ -3667,20 +3666,46 @@ bb_ok_for_noce_convert_multiple_sets (basic_block 
test_bb, unsigned *cost)
   rtx dest = SET_DEST (set);
   rtx src = SET_SRC (set);
 
-  /* We can possibly relax this, but for now only handle REG to REG
-(including subreg) moves.  This avoids any issues that might come
-from introducing loads/stores that might violate data-race-freedom
-guarantees.  */
-  if (!REG_P (dest))
+  /* Do not handle anything involving memory loads/stores since it might
+violate data-race-freedom guarantees.  */
+  if (!REG_P (dest) || contains_mem_rtx_p (src))
+   return false;
+
+  if (!SCALAR_INT_MODE_P (GET_MODE (src)))
return false;
 
-  if (!((REG_P (src) || CONSTANT_P (src))
-   || (GET_CODE (src) == SUBREG && REG_P (SUBREG_REG (src))
- && subreg_lowpart_p (src))))
+  /* Allow a wide range of operations and let the costing function decide
+if the conversion is worth it later.  */
+  enum rtx_code code = GET_CODE (src);
+  if (!(CONSTANT_P (src)
+   || code == REG
+   || code == SUBREG
+   || code == ZERO_EXTEND
+   || code == SIGN_EXTEND
+   || code == NOT
+   || code == NEG
+   || code == PLUS
+   || code == MINUS
+   || code == AND
+   || code == IOR
+   || code == MULT
+   || code == ASHIFT
+   || code == ASHIFTRT
+   || code == NE
+   || code == EQ
+   || code == GE
+   || code == GT
+   || code == LE
+   || code == LT
+   || code == GEU
+   || code == GTU
+   || code == LEU
+   || code == LTU
+   || code == COMPARE))
return false;
 
-  /* Destination must be appropriate for a conditional write.  */
-  if (!noce_operand_ok (dest))
+  /* Destination and source must be appropriate.  */
+  if (!noce_operand_ok (dest) || !noce_operand_ok (src))
return false;
 
   /* We must be able to conditionally move in this mode.  */
diff --git 

[PATCH v3 1/4] ifcvt: handle sequences that clobber flags in noce_convert_multiple_sets

2023-08-30 Thread Manolis Tsamis
This is an extension of what was done in PR106590.

Currently if a sequence generated in noce_convert_multiple_sets clobbers the
condition rtx (cc_cmp or rev_cc_cmp) then only seq1 is used afterwards
(sequences that emit the comparison itself). Since this applies only from the
next iteration onwards, it assumes that the sequences generated (in particular
seq2) don't clobber the condition rtx before using it in the if_then_else,
which is only true in specific cases (currently only register/subregister moves
are allowed).

This patch changes this so it also tests if seq2 clobbers cc_cmp/rev_cc_cmp in
the current iteration. This makes it possible to include arithmetic operations
in noce_convert_multiple_sets.
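
To illustrate (my own sketch, not taken from the patch): suppose cc_cmp is
(eq (reg:CC cc) (const_int 0)) and, with arithmetic now allowed as a cmov
operand, the backend expands one conditional set into something like:

  (set (reg:SI tmp) (plus:SI (reg:SI a) (const_int 1)))  ;; may clobber cc
  (set (reg:SI dst) (if_then_else:SI (eq (reg:CC cc) (const_int 0))
                                     (reg:SI tmp) (reg:SI dst)))

Here the sequence reads the flags after clobbering them itself, within the
same iteration, which a check that only inspects earlier iterations misses.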

gcc/ChangeLog:

* ifcvt.cc (check_for_cc_cmp_clobbers): Use modified_in_p instead.
(noce_convert_multiple_sets_1): Don't use seq2 if it clobbers cc_cmp.

Signed-off-by: Manolis Tsamis 
---

(no changes since v1)

 gcc/ifcvt.cc | 49 +++--
 1 file changed, 19 insertions(+), 30 deletions(-)

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index a0af553b9ff..3273aeca125 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -3375,20 +3375,6 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
   return true;
 }
 
-/* Helper function for noce_convert_multiple_sets_1.  If store to
-   DEST can affect P[0] or P[1], clear P[0].  Called via note_stores.  */
-
-static void
-check_for_cc_cmp_clobbers (rtx dest, const_rtx, void *p0)
-{
-  rtx *p = (rtx *) p0;
-  if (p[0] == NULL_RTX)
-return;
-  if (reg_overlap_mentioned_p (dest, p[0])
-  || (p[1] && reg_overlap_mentioned_p (dest, p[1])))
-p[0] = NULL_RTX;
-}
-
 /* This goes through all relevant insns of IF_INFO->then_bb and tries to
create conditional moves.  In case a simple move sufficis the insn
should be listed in NEED_NO_CMOV.  The rewired-src cases should be
@@ -3552,9 +3538,17 @@ noce_convert_multiple_sets_1 (struct noce_if_info 
*if_info,
 creating an additional compare for each.  If successful, costing
 is easier and this sequence is usually preferred.  */
   if (cc_cmp)
-   seq2 = try_emit_cmove_seq (if_info, temp, cond,
-  new_val, old_val, need_cmov,
-  &cost, &temp_dest2, cc_cmp, rev_cc_cmp);
+   {
+ seq2 = try_emit_cmove_seq (if_info, temp, cond,
+new_val, old_val, need_cmov,
+&cost, &temp_dest2, cc_cmp, rev_cc_cmp);
+
+ /* The if_then_else in SEQ2 may be affected when cc_cmp/rev_cc_cmp is
+clobbered.  We can't safely use the sequence in this case.  */
+ if (seq2 && (modified_in_p (cc_cmp, seq2)
+ || (rev_cc_cmp && modified_in_p (rev_cc_cmp, seq2))))
+   seq2 = NULL;
+   }
 
   /* The backend might have created a sequence that uses the
 condition.  Check this.  */
@@ -3609,21 +3603,16 @@ noce_convert_multiple_sets_1 (struct noce_if_info 
*if_info,
  return false;
}
 
-  if (cc_cmp)
+  if (cc_cmp && seq == seq1)
{
- /* Check if SEQ can clobber registers mentioned in
-cc_cmp and/or rev_cc_cmp.  If yes, we need to use
-only seq1 from that point on.  */
- rtx cc_cmp_pair[2] = { cc_cmp, rev_cc_cmp };
- for (walk = seq; walk; walk = NEXT_INSN (walk))
+ /* Check if SEQ can clobber registers mentioned in cc_cmp/rev_cc_cmp.
+If yes, we need to use only seq1 from that point on.
+Only check when we use seq1 since we have already tested seq2.  */
+ if (modified_in_p (cc_cmp, seq)
+ || (rev_cc_cmp && modified_in_p (rev_cc_cmp, seq)))
{
- note_stores (walk, check_for_cc_cmp_clobbers, cc_cmp_pair);
- if (cc_cmp_pair[0] == NULL_RTX)
-   {
- cc_cmp = NULL_RTX;
- rev_cc_cmp = NULL_RTX;
- break;
-   }
+ cc_cmp = NULL_RTX;
+ rev_cc_cmp = NULL_RTX;
}
}
 
-- 
2.34.1



[PATCH v3 0/4] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets

2023-08-30 Thread Manolis Tsamis


noce_convert_multiple_sets has been introduced and extended over time to handle
if conversion for blocks with multiple sets. Currently this is focused on
register moves and rejects any sort of arithmetic operations.

This series is an extension to allow more sequences to take part in if
conversion. The first patch is a required change to emit correct code and the
second patch whitelists a larger number of operations through
bb_ok_for_noce_convert_multiple_sets. The third patch adds support to rewire
multiple registers in noce_convert_multiple_sets_1 and refactors the code with
a new helper info struct. The fourth patch removes some old code that should
not be needed anymore.

For targets that have a rich selection of conditional instructions,
like aarch64, I have seen an ~5x increase in profitable if conversions for
multiple-set blocks in SPEC benchmarks. Also tested with a wide variety of
benchmarks and I have not seen performance regressions on either x64 / aarch64.

Some samples that previously resulted in a branch but now better use these
instructions can be seen in the provided test cases.

Bootstrapped and tested on AArch64 and x86-64.


Changes in v3:
- Add SCALAR_INT_MODE_P check in bb_ok_for_noce_convert_multiple_sets.
- Allow rewiring of multiple regs.
- Refactor code with noce_multiple_sets_info.
- Remove old code for subregs.

Manolis Tsamis (4):
  ifcvt: handle sequences that clobber flags in
noce_convert_multiple_sets
  ifcvt: Allow more operations in multiple set if conversion
  ifcvt: Handle multiple rewired regs and refactor
noce_convert_multiple_sets
  ifcvt: Remove obsolete code for subreg handling in
noce_convert_multiple_sets

 gcc/ifcvt.cc  | 403 --
 gcc/ifcvt.h   |  16 +
 .../aarch64/ifcvt_multiple_sets_arithm.c  |  79 
 .../aarch64/ifcvt_multiple_sets_rewire.c  |  20 +
 4 files changed, 290 insertions(+), 228 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_rewire.c

-- 
2.34.1



Re: [PATCH 0/3] [RISC-V] support zcmp extension

2023-08-30 Thread Kito Cheng via Gcc-patches
Passed regression testing without introducing any new failures, pushed to trunk :)

On Tue, Aug 29, 2023 at 4:39 PM Fei Gao  wrote:
>
> Fei Gao (3):
>   [RISC-V] support cm.push cm.pop cm.popret in zcmp
>   [RISC-V] support cm.popretz in zcmp
>   [RISC-V] support cm.mva01s cm.mvsa01 in zcmp
>
>  gcc/config/riscv/iterators.md |   15 +
>  gcc/config/riscv/peephole.md  |   28 +
>  gcc/config/riscv/predicates.md|  107 ++
>  gcc/config/riscv/riscv-protos.h   |2 +
>  gcc/config/riscv/riscv.cc |  499 +-
>  gcc/config/riscv/riscv.h  |   25 +
>  gcc/config/riscv/riscv.md |4 +
>  gcc/config/riscv/zc.md| 1457 +
>  gcc/testsuite/gcc.target/riscv/cm_mv_rv32.c   |   23 +
>  gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c   |  269 +++
>  gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c   |  269 +++
>  .../gcc.target/riscv/zcmp_push_fpr.c  |   34 +
>  .../gcc.target/riscv/zcmp_stack_alignment.c   |   24 +
>  13 files changed, 2705 insertions(+), 51 deletions(-)
>  create mode 100644 gcc/config/riscv/zc.md
>  create mode 100644 gcc/testsuite/gcc.target/riscv/cm_mv_rv32.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_push_fpr.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_stack_alignment.c
>
> --
> 2.17.1
>


Re: [PATCH v2 3/4] LoongArch: add new configure option --with-strict-align-lib

2023-08-30 Thread Yujie Yang
On Wed, Aug 30, 2023 at 04:22:13PM +0800, Xi Ruoyao wrote:
> On Wed, 2023-08-30 at 14:51 +0800, Yujie Yang wrote:
> > > > LoongArch processors may not support memory accesses without natural
> > > > alignments.  Building libraries with -mstrict-align may help with
> > > > toolchain binary compatibility and performance on these implementations
> > > > (e.g. Loongson 2K1000LA).
> > > > 
> > > > No significant performance degradation is observed on current mainstream
> > > > LoongArch processors when the option is enabled.
> > > > 
> > > > gcc/ChangeLog:
> > > > 
> > > > * config.gcc: Use -mstrict-align for building libraries
> > > > if --with-strict-align-lib is given.
> > > 
> > > Isn't this equivalent to --with-default-multilib=mno-strict-align now?
> > > 
> > > And I still believe the easiest way for 2K1000LA is adding -march=la264
> > > support so the user can simply configure with --with-arch=la264.
> > 
> > Not exactly -- Options given in --with-multilib-default= will not be applied
> > to multilib variants that have build options specified in 
> > --with-multilib-list,
> > but --with-strict-align-lib is always effective.
> > 
> > e.g. for the following configuration:
> > 
> >   --with-multilib-default=mstrict-align
> >   --with-multilib-list=lp64d/la464,lp64s
> > 
> > The library build options would be:
> > 
> >   base/lp64d variant: -mabi=lp64d -march=la464 (no -mstrict-align appended)
> >   base/lp64s variant: -mabi=lp64s -march=abi-default -mstrict-align
> > 
> > Sure, you can do it with --with-arch=la264. It's just a convenient
> > switch that we can use for building generic toolchains.
> 
> If you want a generic toolchain, it should default to -mstrict-align as
> well.  Or it will still do unexpected thing for cases like:
> 
> struct foo { char x; int y; } __attribute__ ((packed));
> 
> int get (struct foo *foo) { return foo->y; }
> 
> So it should be --with-strict-align (it should make the *compiler*
> default to -mstrict-align).  But then it seems --with-arch=la264 is just
> easier...

By "generic" I mean: when you enable "-march=la264"/"-march=la464"
and link statically, you get a binary that's good for running on
LA264/LA464 cores, respectively.  It's more of a cross-toolchain case.



[PATCH] test: Fix XPASS of RVV

2023-08-30 Thread Juzhe-Zhong
XPASS: gcc.dg/vect/vect-outer-4e.c -flto -ffat-lto-objects  
scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4e.c scan-tree-dump-times vect "OUTER LOOP 
VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4f.c -flto -ffat-lto-objects  
scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4f.c scan-tree-dump-times vect "OUTER LOOP 
VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4g.c -flto -ffat-lto-objects  
scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4g.c scan-tree-dump-times vect "OUTER LOOP 
VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4k.c -flto -ffat-lto-objects  
scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4k.c scan-tree-dump-times vect "OUTER LOOP 
VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4l.c -flto -ffat-lto-objects  
scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4l.c scan-tree-dump-times vect "OUTER LOOP 
VECTORIZED" 1

Like ARM SVE, Fix these XPASS for RVV.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-double-reduc-5.c: Add riscv.
* gcc.dg/vect/vect-outer-4e.c: Ditto.
* gcc.dg/vect/vect-outer-4f.c: Ditto.
* gcc.dg/vect/vect-outer-4g.c: Ditto.
* gcc.dg/vect/vect-outer-4k.c: Ditto.
* gcc.dg/vect/vect-outer-4l.c: Ditto.

---
 gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-outer-4e.c   | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-outer-4f.c   | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-outer-4g.c   | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-outer-4k.c   | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-outer-4l.c   | 2 +-
 6 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c 
b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c
index 7465eae1c47..b990405745e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c
@@ -53,5 +53,5 @@ int main ()
 
 /* Vectorization of loops with multiple types and double reduction is not 
supported yet.  */   
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail 
{ ! aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail 
{ ! { aarch64*-*-* riscv*-*-* } } } } } */
   
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-4e.c 
b/gcc/testsuite/gcc.dg/vect/vect-outer-4e.c
index e65a092f5bf..cc9e96f5d58 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-4e.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-4e.c
@@ -23,4 +23,4 @@ foo (){
   return;
 }
 
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail 
{ ! aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail 
{ ! { aarch64*-*-* riscv*-*-* } } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-4f.c 
b/gcc/testsuite/gcc.dg/vect/vect-outer-4f.c
index a88014a2fbf..c903dc9bfea 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-4f.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-4f.c
@@ -65,4 +65,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail 
{ ! aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail 
{ ! { aarch64*-*-* riscv*-*-* } } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-4g.c 
b/gcc/testsuite/gcc.dg/vect/vect-outer-4g.c
index a88014a2fbf..c903dc9bfea 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-4g.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-4g.c
@@ -65,4 +65,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail 
{ ! aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail 
{ ! { aarch64*-*-* riscv*-*-* } } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-4k.c 
b/gcc/testsuite/gcc.dg/vect/vect-outer-4k.c
index a88014a2fbf..c903dc9bfea 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-4k.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-4k.c
@@ -65,4 +65,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail 
{ ! aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail 
{ ! { aarch64*-*-* riscv*-*-* } } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-4l.c 
b/gcc/testsuite/gcc.dg/vect/vect-outer-4l.c
index 4f95c652ee3..a63b9332afa 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-4l.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-4l.c
@@ -65,4 +65,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail 
{ ! aarch64*-*-* } } } }*/
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail 
{ ! { aarch64*-*-* riscv*-*-* } } } } }*/
-- 
2.36.3



[PATCH] RISC-V: Fix vsetvl pass ICE

2023-08-30 Thread Lehua Ding
This patch fixes PR111234 (an ICE in the vsetvl pass) that occurred when
fusing a mask-any vlmax vsetvl_vtype_change_only insn with a mu vsetvl insn.

PR target/111234

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): Remove condition.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr111234.c: New test.

---
 gcc/config/riscv/riscv-vsetvl.cc  |  2 +-
 .../gcc.target/riscv/rvv/vsetvl/pr111234.c| 19 +++
 2 files changed, 20 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111234.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 1386d9250ca..a81bb53a521 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -655,7 +655,7 @@ gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info 
&info,
 new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, vl);
   else
 {
-  if (vsetvl_insn_p (rinsn) || vlmax_avl_p (info.get_avl ()))
+  if (vsetvl_insn_p (rinsn))
new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, get_vl (rinsn));
   else if (INSN_CODE (rinsn) == CODE_FOR_vsetvl_vtype_change_only)
new_pat = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, new_info, NULL_RTX);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111234.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111234.c
new file mode 100644
index 00000000000..ee5eec4a257
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111234.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
+
+#include <riscv_vector.h>
+
+void
+f (vint32m1_t *in, vint64m2_t *out, vbool32_t *m, int b)
+{
+  vint32m1_t va = *in;
+  vbool32_t mask = *m;
+  vint64m2_t vb
+= __riscv_vwadd_vx_i64m2_m (mask, va, 1, __riscv_vsetvlmax_e64m2 ());
+  vint64m2_t vc = __riscv_vadd_vx_i64m2 (vb, 1, __riscv_vsetvlmax_e64m2 ());
+
+  if (b != 0)
+vc = __riscv_vadd_vx_i64m2_mu (mask, vc, vc, 1, __riscv_vsetvlmax_e64m2 
());
+
+  *out = vc;
+}
-- 
2.36.3



RE: [PATCH] [tree-optimization/110279] swap operands in reassoc to reduce cross backedge FMA

2023-08-30 Thread Di Zhao OS via Gcc-patches
Hello Richard,

> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, August 29, 2023 7:11 PM
> To: Di Zhao OS 
> Cc: Jeff Law ; Martin Jambor ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: [PATCH] [tree-optimization/110279] swap operands in reassoc to
> reduce cross backedge FMA
> 
> On Tue, Aug 29, 2023 at 10:59 AM Di Zhao OS
>  wrote:
> >
> > Hi,
> >
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Tuesday, August 29, 2023 4:09 PM
> > > To: Di Zhao OS 
> > > Cc: Jeff Law ; Martin Jambor ;
> gcc-
> > > patc...@gcc.gnu.org
> > > Subject: Re: [PATCH] [tree-optimization/110279] swap operands in reassoc
> to
> > > reduce cross backedge FMA
> > >
> > > On Tue, Aug 29, 2023 at 9:49 AM Di Zhao OS
> > >  wrote:
> > > >
> > > > Hi,
> > > >
> > > > > -Original Message-
> > > > > From: Richard Biener 
> > > > > Sent: Tuesday, August 29, 2023 3:41 PM
> > > > > To: Jeff Law ; Martin Jambor 
> > > > > Cc: Di Zhao OS ; gcc-
> patc...@gcc.gnu.org
> > > > > Subject: Re: [PATCH] [tree-optimization/110279] swap operands in
> reassoc
> > > to
> > > > > reduce cross backedge FMA
> > > > >
> > > > > On Tue, Aug 29, 2023 at 1:23 AM Jeff Law via Gcc-patches
> > > > >  wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > On 8/28/23 02:17, Di Zhao OS via Gcc-patches wrote:
> > > > > > > This patch tries to fix the 2% regression in 510.parest_r on
> > > > > > > ampere1 in the tracker. (Previous discussion is here:
> > > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624893.html)
> > > > > > >
> > > > > > > 1. Add testcases for the problem. For an op list in the form of
> > > > > > > "acc = a * b + c * d + acc", currently reassociation doesn't
> > > > > > > Swap the operands so that more FMAs can be generated.
> > > > > > > After widening_mul the result looks like:
> > > > > > >
> > > > > > > _1 = .FMA(a, b, acc_0);
> > > > > > > acc_1 = .FMA(c, d, _1);
> > > > > > >
> > > > > > > While previously (before the "Handle FMA friendly..." patch),
> > > > > > > widening_mul's result was like:
> > > > > > >
> > > > > > > _1 = a * b;
> > > > > > > _2 = .FMA (c, d, _1);
> > > > > > > acc_1 = acc_0 + _2;
> > > > >
> > > > > How can we execute the multiply and the FMA in parallel?  They
> > > > > depend on each other.  Or is it the uarch can handle dependence
> > > > > on the add operand but only when it is with a multiplication and
> > > > > not a FMA in some better ways?  (I'd doubt so much complexity)
> > > > >
> > > > > Can you explain in more detail how the uarch executes one vs. the
> > > > > other case?
> >
> > Here's my understanding after consulting our hardware team. For the
> > second case, the uarch of some out-of-order processors can calculate
> > "_2" of several loops at the same time, since there's no dependency
> > among different iterations. While for the first case the next iteration
> > has to wait for the current iteration to finish, so "acc_0"'s value is
> > known. I assume it is also the case in some i386 processors, since I
> > saw the patch "Deferring FMA transformations in tight loops" also
> > changed corresponding files.
> 
> That should be true for all kind of operations, no?  Thus it means
> reassoc should in general associate cross-iteration accumulation
Yes I think both are true.

> last?  Historically we associated those first because that's how the
> vectorizer liked to see them, but I think that's no longer necessary.
> 
> It should be achievable by properly biasing the operand during
> rank computation (don't we already do that?).

The issue is related to the following code (handling cases with
three operands left):
  /* When there are three operands left, we want
 to make sure the ones that get the double
 binary op are chosen wisely.  */
  int len = ops.length ();
  if (len >= 3 && !has_fma)
swap_ops_for_binary_stmt (ops, len - 3);

  new_lhs = rewrite_expr_tree (stmt, rhs_code, 0, ops,
   powi_result != NULL
   || negate_result,
   len != orig_len);

Originally (before the "Handle FMA friendly..." patch), for the
tiny example, the two multiplications would be placed first by
swap_ops_for_binary_stmt and rewrite_expr_tree, according to
ranks. Currently, to preserve more FMAs,
swap_ops_for_binary_stmt won't be called, so the result would
be MULT_EXPRs and PLUS_EXPRs interleaved with each other (which
is mostly fine if these are not in such tight loops).

What this patch tries to do can be summarized as: when a cross-
backedge dependency is detected (and the uarch doesn't like it),
fall back to the "old" way, since we don't want the loop-carried
FMA dependency anyway. (I hope I'm understanding the question
correctly.)
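
A minimal example of the shape in question (mine, for illustration):

  double
  f (const double *a, const double *b,
     const double *c, const double *d, int n)
  {
    double acc = 0.0;
    for (int i = 0; i < n; i++)
      acc += a[i] * b[i] + c[i] * d[i];	/* acc = a*b + c*d + acc */
    return acc;
  }

With two chained FMAs the loop-carried path through acc is two FMA
latencies per iteration; with mul + FMA + add, the mul and the FMA are
independent of acc and only the final add stays on the carried path.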

> 
> > > > >
> > > > > > > If the code fragment is in a loop, some architecture can execute
> > > > > > > the latter in parallel, so the performance can be much faster
> than
> > 

[PATCH 8/8] aarch64: Add SVE support for simd clones [PR 96342]

2023-08-30 Thread Andre Vieira (lists) via Gcc-patches
This patch finalizes adding support for the generation of SVE simd
clones when no simdlen is provided, following the ABI rules where the
widest data type determines the minimum number of elements in a
length-agnostic vector.
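
For example (my sketch; the mangling follows the AArch64 vector function
ABI as I read it, with 's' for SVE, 'M' for masked and 'x' for the
length-agnostic simdlen -- it is not quoted from the patch):

  #pragma omp declare simd
  double foo (double x);

With no simdlen clause, the widest type is double (64 bits), so the SVE
clone processes one lane per 64 bits of vector length and would be
mangled along the lines of _ZGVsMxv_foo.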


gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (add_sve_type_attribute): 
Declare.

* config/aarch64/aarch64-sve-builtins.cc (add_sve_type_attribute): Make
visibility global.
* config/aarch64/aarch64.cc (aarch64_fntype_abi): Ensure SVE ABI is
chosen over SIMD ABI if a SVE type is used in return or arguments.
(aarch64_simd_clone_compute_vecsize_and_simdlen): Create VLA simd clone
when no simdlen is provided, according to ABI rules.
(aarch64_simd_clone_adjust): Add '+sve' attribute to SVE simd clones.
(aarch64_simd_clone_adjust_ret_or_param): New.
(TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM): Define.
* omp-simd-clone.cc (simd_clone_mangle): Print 'x' for VLA simdlen.
(simd_clone_adjust): Adapt safelen check to be compatible with VLA
simdlen.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/declare-variant-14.c: Adapt aarch64 scan.
* gfortran.dg/gomp/declare-variant-14.f90: Likewise.
* gcc.target/aarch64/declare-simd-1.c: Remove warning checks where no
longer necessary.
	* gcc.target/aarch64/declare-simd-2.c: Add SVE clone scan.

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 
70303d6fd953e0c397b9138ede8858c2db2e53db..d7888c95a4999fad1a4c55d5cd2287c2040302c8
 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1001,6 +1001,8 @@ namespace aarch64_sve {
 #ifdef GCC_TARGET_H
   bool verify_type_context (location_t, type_context_kind, const_tree, bool);
 #endif
+ void add_sve_type_attribute (tree, unsigned int, unsigned int,
+ const char *, const char *);
 }
 
 extern void aarch64_split_combinev16qi (rtx operands[3]);
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc 
b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 
161a14edde7c9fb1b13b146cf50463e2d78db264..6f99c438d10daa91b7e3b623c995489f1a8a0f4c
 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -569,14 +569,16 @@ static bool reported_missing_registers_p;
 /* Record that TYPE is an ABI-defined SVE type that contains NUM_ZR SVE vectors
and NUM_PR SVE predicates.  MANGLED_NAME, if nonnull, is the ABI-defined
   mangling of the type.  ACLE_NAME is the <arm_sve.h> name of the type.  */
-static void
+void
 add_sve_type_attribute (tree type, unsigned int num_zr, unsigned int num_pr,
const char *mangled_name, const char *acle_name)
 {
   tree mangled_name_tree
 = (mangled_name ? get_identifier (mangled_name) : NULL_TREE);
+  tree acle_name_tree
+= (acle_name ? get_identifier (acle_name) : NULL_TREE);
 
-  tree value = tree_cons (NULL_TREE, get_identifier (acle_name), NULL_TREE);
+  tree value = tree_cons (NULL_TREE, acle_name_tree, NULL_TREE);
   value = tree_cons (NULL_TREE, mangled_name_tree, value);
   value = tree_cons (NULL_TREE, size_int (num_pr), value);
   value = tree_cons (NULL_TREE, size_int (num_zr), value);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
a13d3fba05f9f9d2989b36c681bc77d71e943e0d..492acb9ce081866162faa8dfca777e4cb943797f
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -4034,13 +4034,13 @@ aarch64_takes_arguments_in_sve_regs_p (const_tree 
fntype)
 static const predefined_function_abi &
 aarch64_fntype_abi (const_tree fntype)
 {
-  if (lookup_attribute ("aarch64_vector_pcs", TYPE_ATTRIBUTES (fntype)))
-return aarch64_simd_abi ();
-
   if (aarch64_returns_value_in_sve_regs_p (fntype)
   || aarch64_takes_arguments_in_sve_regs_p (fntype))
 return aarch64_sve_abi ();
 
+  if (lookup_attribute ("aarch64_vector_pcs", TYPE_ATTRIBUTES (fntype)))
+return aarch64_simd_abi ();
+
   return default_function_abi;
 }
 
@@ -27327,7 +27327,7 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct 
cgraph_node *node,
int num, bool explicit_p)
 {
   tree t, ret_type;
-  unsigned int nds_elt_bits;
+  unsigned int nds_elt_bits, wds_elt_bits;
   int count;
   unsigned HOST_WIDE_INT const_simdlen;
   poly_uint64 vec_bits;
@@ -27374,10 +27374,14 @@ aarch64_simd_clone_compute_vecsize_and_simdlen 
(struct cgraph_node *node,
   if (TREE_CODE (ret_type) != VOID_TYPE)
 {
   nds_elt_bits = lane_size (SIMD_CLONE_ARG_TYPE_VECTOR, ret_type);
+  wds_elt_bits = nds_elt_bits;
   vec_elts.safe_push (std::make_pair (ret_type, nds_elt_bits));
 }
   else
-nds_elt_bits = POINTER_SIZE;
+{
+  nds_elt_bits = POINTER_SIZE;
+  wds_elt_bits = 0;
+}
 
   int i;
   tree type_arg_types = TYPE_ARG_TYPES (TREE_TYPE (node->decl));
@@ -27385,30 +27389,36 @@ 

[PATCH 7/8] vect: Add TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM

2023-08-30 Thread Andre Vieira (lists) via Gcc-patches
This patch adds a new target hook to enable us to adapt the types of the
return value and parameters of simd clones.  We use this in two ways: the
first is to make sure we can create valid SVE types, including the
SVE type attribute, when creating an SVE simd clone, even when the target
options do not support SVE.  We follow the same behaviour seen
with x86, which creates simd clones according to the ABI rules when no
simdlen is provided, even if that simdlen is not supported by the 
current target options.  Note that this doesn't mean the simd clone will 
be used in auto-vectorization.
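
A minimal sketch of what a target implementation of the hook could look
like (an assumption for illustration only; the real aarch64 implementation
is in the follow-up patch):

  /* Illustrative no-op hook: return TYPE unchanged.  */
  static tree
  example_simd_clone_adjust_ret_or_param (struct cgraph_node *node,
					  tree type, bool is_mask)
  {
    (void) node; (void) is_mask;
    return type;
  }
  #define TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM \
    example_simd_clone_adjust_ret_or_param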


gcc/ChangeLog:

(TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM): Define.
* doc/tm.texi (TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM): Document.
* doc/tm.texi.in (TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM): New.
* omp-simd-clone.cc (simd_adjust_return_type): Call new hook.
(simd_clone_adjust_argument_types): Likewise.
* target.def (adjust_ret_or_param): New hook.
* targhooks.cc (default_simd_clone_adjust_ret_or_param): New.
* targhooks.h (default_simd_clone_adjust_ret_or_param): New.diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 
bde22e562ebb9069122eb3b142ab8f4a4ae56a3a..b80c09ec36d51f1bb55b14229f46207fb4457223
 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6343,6 +6343,9 @@ non-negative number if it is usable.  In that case, the 
smaller the number is,
 the more desirable it is to use it.
 @end deftypefn
 
+@deftypefn {Target Hook} tree TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM (struct 
cgraph_node *@var{}, @var{tree}, @var{bool})
+If defined, this hook should adjust the type of the return or parameter
+@var{type} to be used by the simd clone @var{node}.
 @end deftypefn
 
 @deftypefn {Target Hook} int TARGET_SIMT_VF (void)
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 
4ac96dc357d35e0e57bb43a41d1b1a4f66d05946..7496a32d84f7c422fe7ea88215ee72f3c354a3f4
 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4211,6 +4211,8 @@ address;  but often a machine-dependent strategy can 
generate better code.
 
 @hook TARGET_SIMD_CLONE_USABLE
 
+@hook TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM
+
 @hook TARGET_SIMT_VF
 
 @hook TARGET_OMP_DEVICE_KIND_ARCH_ISA
diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index 
ef0b9b48c7212900023bc0eaebca5e1f9389db77..c2fd4d3be878e56b6394e34097d2de826a0ba1ff
 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -736,6 +736,7 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
   t = build_array_type_nelts (t, exact_div (node->simdclone->simdlen,
veclen));
 }
+  t = targetm.simd_clone.adjust_ret_or_param (node, t, false);
   TREE_TYPE (TREE_TYPE (fndecl)) = t;
   if (!node->definition)
 return NULL_TREE;
@@ -748,6 +749,7 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
 
   tree atype = build_array_type_nelts (orig_rettype,
   node->simdclone->simdlen);
+  atype = targetm.simd_clone.adjust_ret_or_param (node, atype, false);
   if (maybe_ne (veclen, node->simdclone->simdlen))
 return build1 (VIEW_CONVERT_EXPR, atype, t);
 
@@ -880,6 +882,8 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
   ? IDENTIFIER_POINTER (DECL_NAME (parm))
   : NULL, parm_type, sc->simdlen);
}
+  adj.type = targetm.simd_clone.adjust_ret_or_param (node, adj.type,
+false);
   vec_safe_push (new_params, adj);
 }
 
@@ -912,6 +916,8 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
adj.type = build_vector_type (pointer_sized_int_node, veclen);
   else
adj.type = build_vector_type (base_type, veclen);
+  adj.type = targetm.simd_clone.adjust_ret_or_param (node, adj.type,
+true);
   vec_safe_push (new_params, adj);
 
   k = vector_unroll_factor (sc->simdlen, veclen);
@@ -937,6 +943,7 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
sc->args[i].simd_array = NULL_TREE;
}
   sc->args[i].orig_type = base_type;
+  sc->args[i].vector_type = adj.type;
   sc->args[i].arg_type = SIMD_CLONE_ARG_TYPE_MASK;
   sc->args[i].vector_type = adj.type;
 }
diff --git a/gcc/target.def b/gcc/target.def
index 
6a0cbc454526ee29011451b570354bf234a4eabd..665083ce035da03b40b15f23684ccdacce33c9d3
 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1650,6 +1650,13 @@ non-negative number if it is usable.  In that case, the 
smaller the number is,\n
 the more desirable it is to use it.",
 int, (struct cgraph_node *, machine_mode), NULL)
 
+DEFHOOK
+(adjust_ret_or_param,
+"If defined, this hook should adjust the type of the return or parameter\n\
+@var{type} to be used by the simd clone @var{node}.",
+tree, (struct cgraph_node *, tree, 

Re: [PATCH 6/8] vect: Add vector_mode paramater to simd_clone_usable

2023-08-30 Thread Andre Vieira (lists) via Gcc-patches

Forgot to CC this one to maintainers...

On 30/08/2023 10:14, Andre Vieira (lists) via Gcc-patches wrote:
This patch adds a machine_mode parameter to the TARGET_SIMD_CLONE_USABLE 
hook to enable rejecting SVE modes when the target architecture does not 
support SVE.


gcc/ChangeLog:

 * config/aarch64/aarch64.cc (aarch64_simd_clone_usable): Add mode
	parameter and use it to reject SVE modes when the target architecture does
 not support SVE.
 * config/gcn/gcn.cc (gcn_simd_clone_usable): Add unused mode 
parameter.

 * config/i386/i386.cc (ix86_simd_clone_usable): Likewise.
 * doc/tm.texi (TARGET_SIMD_CLONE_USABLE): Document new parameter.
 * target.def (usable): Add new parameter.
 * tree-vect-stmts.cc (vectorizable_simd_clone_call): Pass vector mode
	to the TARGET_SIMD_CLONE_USABLE hook.


[PATCH 6/8] vect: Add vector_mode paramater to simd_clone_usable

2023-08-30 Thread Andre Vieira (lists) via Gcc-patches
This patch adds a machine_mode parameter to the TARGET_SIMD_CLONE_USABLE 
hook to enable rejecting SVE modes when the target architecture does not 
support SVE.


gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_simd_clone_usable): Add mode
	parameter and use it to reject SVE modes when the target architecture does
not support SVE.
* config/gcn/gcn.cc (gcn_simd_clone_usable): Add unused mode parameter.
* config/i386/i386.cc (ix86_simd_clone_usable): Likewise.
* doc/tm.texi (TARGET_SIMD_CLONE_USABLE): Document new parameter.
* target.def (usable): Add new parameter.
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Pass vector mode
	to the TARGET_SIMD_CLONE_USABLE hook.

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
5fb4c863d875871d6de865e72ce360506a3694d2..a13d3fba05f9f9d2989b36c681bc77d71e943e0d
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -27498,12 +27498,18 @@ aarch64_simd_clone_adjust (struct cgraph_node *node)
 /* Implement TARGET_SIMD_CLONE_USABLE.  */
 
 static int
-aarch64_simd_clone_usable (struct cgraph_node *node)
+aarch64_simd_clone_usable (struct cgraph_node *node, machine_mode vector_mode)
 {
   switch (node->simdclone->vecsize_mangle)
 {
 case 'n':
-  if (!TARGET_SIMD)
+  if (!TARGET_SIMD
+ || aarch64_sve_mode_p (vector_mode))
+   return -1;
+  return 0;
+case 's':
+  if (!TARGET_SVE
+ || !aarch64_sve_mode_p (vector_mode))
return -1;
   return 0;
 default:
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 
02f4dedec4214b1eea9e6f5057ed57d7e0db316a..252676273f06500c99df6ae251f0406c618df891
 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -5599,7 +5599,8 @@ gcn_simd_clone_adjust (struct cgraph_node *ARG_UNUSED 
(node))
 /* Implement TARGET_SIMD_CLONE_USABLE.  */
 
 static int
-gcn_simd_clone_usable (struct cgraph_node *ARG_UNUSED (node))
+gcn_simd_clone_usable (struct cgraph_node *ARG_UNUSED (node),
+  machine_mode ARG_UNUSED (mode))
 {
   /* We don't need to do anything here because
  gcn_simd_clone_compute_vecsize_and_simdlen currently only returns one
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 
5d57726e22cea8bcaa8ac8b1b25ac420193f39bb..84f0d5a7cb679e6be92001f59802276635506e97
 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -24379,7 +24379,8 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct 
cgraph_node *node,
slightly less desirable, etc.).  */
 
 static int
-ix86_simd_clone_usable (struct cgraph_node *node)
+ix86_simd_clone_usable (struct cgraph_node *node,
+   machine_mode mode ATTRIBUTE_UNUSED)
 {
   switch (node->simdclone->vecsize_mangle)
 {
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 
95ba56e05ae4a0f11639cc4a21d6736c53ad5ef1..bde22e562ebb9069122eb3b142ab8f4a4ae56a3a
 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6336,11 +6336,13 @@ This hook should add implicit 
@code{attribute(target("..."))} attribute
 to SIMD clone @var{node} if needed.
 @end deftypefn
 
-@deftypefn {Target Hook} int TARGET_SIMD_CLONE_USABLE (struct cgraph_node 
*@var{})
+@deftypefn {Target Hook} int TARGET_SIMD_CLONE_USABLE (struct cgraph_node 
*@var{}, @var{machine_mode})
 This hook should return -1 if SIMD clone @var{node} shouldn't be used
-in vectorized loops in current function, or non-negative number if it is
-usable.  In that case, the smaller the number is, the more desirable it is
-to use it.
+in vectorized loops being vectorized with mode @var{m} in current function, or
+non-negative number if it is usable.  In that case, the smaller the number is,
+the more desirable it is to use it.
+@end deftypefn
+
 @end deftypefn
 
 @deftypefn {Target Hook} int TARGET_SIMT_VF (void)
diff --git a/gcc/target.def b/gcc/target.def
index 
7d684296c17897b4ceecb31c5de1ae8665a8228e..6a0cbc454526ee29011451b570354bf234a4eabd
 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1645,10 +1645,11 @@ void, (struct cgraph_node *), NULL)
 DEFHOOK
 (usable,
 "This hook should return -1 if SIMD clone @var{node} shouldn't be used\n\
-in vectorized loops in current function, or non-negative number if it is\n\
-usable.  In that case, the smaller the number is, the more desirable it is\n\
-to use it.",
-int, (struct cgraph_node *), NULL)
+in vectorized loops being vectorized with mode @var{m} in current function, 
or\n\
+non-negative number if it is usable.  In that case, the smaller the number 
is,\n\
+the more desirable it is to use it.",
+int, (struct cgraph_node *, machine_mode), NULL)
+
 
 HOOK_VECTOR_END (simd_clone)
 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 
7217f36a250d549b955c874d7c7644d94982b0b5..dc2fc20ef9fe777132308c9e33f7731d62717466
 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4195,7 +4195,7 @@ 

[PATCH 5/8] vect: Use inbranch simdclones in masked loops

2023-08-30 Thread Andre Vieira (lists) via Gcc-patches
This patch enables the compiler to use inbranch simdclones when 
generating masked loops in autovectorization.
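
A small example of the situation this enables (mine, for illustration):

  #pragma omp declare simd inbranch
  extern int foo (int x);

  void
  bar (int *restrict a, int n)
  {
  #pragma omp simd
    for (int i = 0; i < n; i++)
      a[i] = foo (a[i]);
  }

In a fully masked loop the loop mask can now be passed directly as the
inbranch clone's mask argument.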


gcc/ChangeLog:

* omp-simd-clone.cc (simd_clone_adjust_argument_types): Make function
compatible with mask parameters in clone.
* tree-vect-stmts.cc (vect_convert): New helper function.
(vect_build_all_ones_mask): Allow vector boolean typed masks.
(vectorizable_simd_clone_call): Enable the use of masked clones in
	fully masked loops.

diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index 
a42643400ddcf10961633448b49d4caafb999f12..ef0b9b48c7212900023bc0eaebca5e1f9389db77
 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -807,8 +807,14 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 {
   ipa_adjusted_param adj;
   memset (&adj, 0, sizeof (adj));
-  tree parm = args[i];
-  tree parm_type = node->definition ? TREE_TYPE (parm) : parm;
+  tree parm = NULL_TREE;
+  tree parm_type = NULL_TREE;
+  if(i < args.length())
+   {
+ parm = args[i];
+ parm_type = node->definition ? TREE_TYPE (parm) : parm;
+   }
+
   adj.base_index = i;
   adj.prev_clone_index = i;
 
@@ -1547,7 +1553,7 @@ simd_clone_adjust (struct cgraph_node *node)
  mask = gimple_assign_lhs (g);
  g = gimple_build_assign (make_ssa_name (TREE_TYPE (mask)),
   BIT_AND_EXPR, mask,
-  build_int_cst (TREE_TYPE (mask), 1));
+  build_one_cst (TREE_TYPE (mask)));
  gsi_insert_after (, g, GSI_CONTINUE_LINKING);
  mask = gimple_assign_lhs (g);
}
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 
664c3b5f7ca48fdb49383fb8a97f407465574479..7217f36a250d549b955c874d7c7644d94982b0b5
 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1723,6 +1723,20 @@ check_load_store_for_partial_vectors (loop_vec_info 
loop_vinfo, tree vectype,
 }
 }
 
+/* Return SSA name of the result of the conversion of OPERAND into type TYPE.
+   The conversion statement is inserted at GSI.  */
+
+static tree
+vect_convert (vec_info *vinfo, stmt_vec_info stmt_info, tree type, tree 
operand,
+ gimple_stmt_iterator *gsi)
+{
+  operand = build1 (VIEW_CONVERT_EXPR, type, operand);
+  gassign *new_stmt = gimple_build_assign (make_ssa_name (type),
+  operand);
+  vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
+  return gimple_get_lhs (new_stmt);
+}
+
 /* Return the mask input to a masked load or store.  VEC_MASK is the vectorized
form of the scalar mask condition and LOOP_MASK, if nonnull, is the mask
that needs to be applied to all loads and stores in a vectorized loop.
@@ -2666,7 +2680,8 @@ vect_build_all_ones_mask (vec_info *vinfo,
 {
   if (TREE_CODE (masktype) == INTEGER_TYPE)
 return build_int_cst (masktype, -1);
-  else if (TREE_CODE (TREE_TYPE (masktype)) == INTEGER_TYPE)
+  else if (VECTOR_BOOLEAN_TYPE_P (masktype)
+  || TREE_CODE (TREE_TYPE (masktype)) == INTEGER_TYPE)
 {
   tree mask = build_int_cst (TREE_TYPE (masktype), -1);
   mask = build_vector_from_val (masktype, mask);
@@ -4018,7 +4033,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
stmt_vec_info stmt_info,
   size_t i, nargs;
   tree lhs, rtype, ratype;
   vec<constructor_elt, va_gc> *ret_ctor_elts = NULL;
-  int arg_offset = 0;
+  int masked_call_offset = 0;
 
   /* Is STMT a vectorizable call?   */
   gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt);
@@ -4033,7 +4048,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
stmt_vec_info stmt_info,
   gcc_checking_assert (TREE_CODE (fndecl) == ADDR_EXPR);
   fndecl = TREE_OPERAND (fndecl, 0);
   gcc_checking_assert (TREE_CODE (fndecl) == FUNCTION_DECL);
-  arg_offset = 1;
+  masked_call_offset = 1;
 }
   if (fndecl == NULL_TREE)
 return false;
@@ -4065,7 +4080,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
stmt_vec_info stmt_info,
 return false;
 
   /* Process function arguments.  */
-  nargs = gimple_call_num_args (stmt) - arg_offset;
+  nargs = gimple_call_num_args (stmt) - masked_call_offset;
 
   /* Bail out if the function has zero arguments.  */
   if (nargs == 0)
@@ -4083,7 +4098,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
stmt_vec_info stmt_info,
   thisarginfo.op = NULL_TREE;
   thisarginfo.simd_lane_linear = false;
 
-  op = gimple_call_arg (stmt, i + arg_offset);
+  op = gimple_call_arg (stmt, i + masked_call_offset);
   if (!vect_is_simple_use (op, vinfo, &thisarginfo.dt,
   &thisarginfo.vectype)
  || thisarginfo.dt == vect_uninitialized_def)
@@ -4161,14 +4176,6 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
stmt_vec_info stmt_info,
 }
 
   poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
-  if (!vf.is_constant ())
-{
-  if (dump_enabled_p ())
-   dump_printf_loc 

[PATCH 4/8] vect: don't allow fully masked loops with non-masked simd clones [PR 110485]

2023-08-30 Thread Andre Vieira (lists) via Gcc-patches
When analyzing a loop and choosing a simdclone to use it is possible to 
choose a simdclone that cannot be used 'inbranch' for a loop that can 
use partial vectors.  This may lead to the vectorizer deciding to use 
partial vectors which are not supported for notinbranch simd clones. 
This patch fixes that by disabling the use of partial vectors once a 
notinbranch simd clone has been selected.


gcc/ChangeLog:

PR tree-optimization/110485
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Disable partial
vectors usage if a notinbranch simdclone has been selected.

gcc/testsuite/ChangeLog:

	* gcc.dg/gomp/pr110485.c: New test.

diff --git a/gcc/testsuite/gcc.dg/gomp/pr110485.c 
b/gcc/testsuite/gcc.dg/gomp/pr110485.c
new file mode 100644
index 0000000000000000000000000000000000000000..ba6817a127f40246071e32ccebf692cc4d121d15
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/pr110485.c
@@ -0,0 +1,19 @@
+/* PR 110485 */
+/* { dg-do compile } */
+/* { dg-additional-options "-Ofast -fdump-tree-vect-details" } */
+/* { dg-additional-options "-march=znver4 --param=vect-partial-vector-usage=1" 
{ target x86_64-*-* } } */
+#pragma omp declare simd notinbranch uniform(p)
+extern double __attribute__ ((const)) bar (double a, double p);
+
+double a[1024];
+double b[1024];
+
+void foo (int n)
+{
+  #pragma omp simd
+  for (int i = 0; i < n; ++i)
+a[i] = bar (b[i], 71.2);
+}
+
+/* { dg-final { scan-tree-dump-not "MASK_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump "can't use a fully-masked loop because a 
non-masked simd clone was selected." "vect" { target x86_64-*-* } } } */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 
35207de7acb410358220dbe8d1af82215b5091bf..664c3b5f7ca48fdb49383fb8a97f407465574479
 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4349,6 +4349,17 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
stmt_vec_info stmt_info,
   ? boolean_true_node : boolean_false_node;
STMT_VINFO_SIMD_CLONE_INFO (stmt_info).safe_push (sll);
  }
+
+  if (!bestn->simdclone->inbranch)
+   {
+ if (dump_enabled_p ()
+ && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+   dump_printf_loc (MSG_NOTE, vect_location,
+"can't use a fully-masked loop because a"
+" non-masked simd clone was selected.\n");
+ LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
+   }
+
   STMT_VINFO_TYPE (stmt_info) = call_simd_clone_vec_info_type;
   DUMP_VECT_SCOPE ("vectorizable_simd_clone_call");
 /*  vect_model_simple_cost (vinfo, stmt_info, ncopies,


[Patch 3/8] vect: Fix vect_get_smallest_scalar_type for simd clones

2023-08-30 Thread Andre Vieira (lists) via Gcc-patches
The vect_get_smallest_scalar_type helper function was using any argument
to a simd clone call when trying to determine the smallest scalar type
that would be vectorized.  This included the function pointer type in a
MASK_CALL for instance, and would result in the wrong type being
selected.  Instead, this patch special-cases simd clone calls and uses
only the scalar types of the original function that get transformed into
vector types.
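
For instance (an illustrative rendering, not quoted from a dump), for a
masked call like:

  _1 = .MASK_CALL (foo, a_5(D), mask_7);

the first operand is the address of foo, and its pointer type must not be
taken as the smallest scalar type of the statement.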


gcc/ChangeLog:

	* tree-vect-data-refs.cc (vect_get_smallest_scalar_type): Special case
simd clone calls and only use types that are mapped to vectors.
* tree-vect-stmts.cc (simd_clone_call_p): New helper function.
* tree-vectorizer.h (simd_clone_call_p): Declare new function.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-simd-clone-16f.c: Remove unnecessary differentiation
between targets with different pointer sizes.
* gcc.dg/vect/vect-simd-clone-17f.c: Likewise.
	* gcc.dg/vect/vect-simd-clone-18f.c: Likewise.

diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c 
b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c
index 
574698d3e133ecb8700e698fa42a6b05dd6b8a18..7cd29e894d0502a59fadfe67db2db383133022d3
 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c
@@ -7,9 +7,8 @@
 #include "vect-simd-clone-16.c"
 
 /* Ensure the the in-branch simd clones are used on targets that support them.
-   Some targets use pairs of vectors and do twice the calls.  */
-/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" 
{ target { ! { { i?86-*-* x86_64-*-* } && { ! lp64 } } } } } } */
-/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 4 "vect" 
{ target { { i?86*-*-* x86_64-*-* } && { ! lp64 } } } } } */
+ */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" 
} } */
 
 /* The LTO test produces two dump files and we scan the wrong one.  */
 /* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c 
b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c
index 
8bb6d19301a67a3eebce522daaf7d54d88f708d7..177521dc44531479fca1f1a1a0f2010f30fa3fb5
 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c
@@ -7,9 +7,8 @@
 #include "vect-simd-clone-17.c"
 
 /* Ensure the the in-branch simd clones are used on targets that support them.
-   Some targets use pairs of vectors and do twice the calls.  */
-/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" 
{ target { ! { { i?86-*-* x86_64-*-* } && { ! lp64 } } } } } } */
-/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 4 "vect" 
{ target { { i?86*-*-* x86_64-*-* } && { ! lp64 } } } } } */
+ */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" 
} } */
 
 /* The LTO test produces two dump files and we scan the wrong one.  */
 /* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c 
b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c
index 
d34f23f4db8e9c237558cc22fe66b7e02b9e6c20..4dd51381d73c0c7c8ec812f24e5054df038059c5
 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c
@@ -7,9 +7,8 @@
 #include "vect-simd-clone-18.c"
 
 /* Ensure the the in-branch simd clones are used on targets that support them.
-   Some targets use pairs of vectors and do twice the calls.  */
-/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" 
{ target { ! { { i?86-*-* x86_64-*-* } && { ! lp64 } } } } } } */
-/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 4 "vect" 
{ target { { i?86*-*-* x86_64-*-* } && { ! lp64 } } } } } */
+ */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" 
} } */
 
 /* The LTO test produces two dump files and we scan the wrong one.  */
 /* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 
a3570c45b5209281ac18c1220c3b95398487f389..1bdbea232afc6facddac23269ee3da033eb1ed50
 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -119,6 +119,7 @@ tree
 vect_get_smallest_scalar_type (stmt_vec_info stmt_info, tree scalar_type)
 {
   HOST_WIDE_INT lhs, rhs;
+  cgraph_node *node;
 
   /* During the analysis phase, this function is called on arbitrary
  statements that might not have scalar results.  */
@@ -145,6 +146,23 @@ vect_get_smallest_scalar_type (stmt_vec_info stmt_info, 
tree scalar_type)
scalar_type = rhs_type;
}
 }
+  else if (simd_clone_call_p (stmt_info->stmt, &node))
+{
+  auto clone = node->simd_clones->simdclone;
+  for (unsigned int i = 0; i < clone->nargs; ++i)
+   {
+ if (clone->args[i].arg_type == 

[Patch 2/8] parloops: Allow poly nit and bound

2023-08-30 Thread Andre Vieira (lists) via Gcc-patches
Teach parloops how to handle a poly nit and bound ahead of the changes 
to enable non-constant simdlen.
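
As a hedged aside for readers less familiar with poly ints (the
coefficients below are invented): with scalable vectors the iteration
count can be a POLY_INT_CST such as [4,4], i.e. 4 + 4*x where x is the
runtime vector-length multiple.  Folding nit + 1 on such a constant
yields [5,4], still a POLY_INT_CST rather than an INTEGER_CST, which is
why both codes must now be accepted:

  /* Condensed sketch of the relaxed shape, assuming GCC's tree API.  */
  if (TREE_CODE (nit) == INTEGER_CST || TREE_CODE (nit) == POLY_INT_CST)
    gcc_assert (TREE_CODE (alt_bound) == INTEGER_CST
                || TREE_CODE (alt_bound) == POLY_INT_CST);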


gcc/ChangeLog:

* tree-parloops.cc (try_transform_to_exit_first_loop_alt): Accept
poly NIT and ALT_BOUND.

diff --git a/gcc/tree-parloops.cc b/gcc/tree-parloops.cc
index 
a35f3d5023b06e5ef96eb4222488fcb34dd7bd45..cf713e53d712fb5ad050e274f373adba5a90c5a7
 100644
--- a/gcc/tree-parloops.cc
+++ b/gcc/tree-parloops.cc
@@ -2531,14 +2531,16 @@ try_transform_to_exit_first_loop_alt (class loop *loop,
   tree nit_type = TREE_TYPE (nit);
 
   /* Figure out whether nit + 1 overflows.  */
-  if (TREE_CODE (nit) == INTEGER_CST)
+  if (TREE_CODE (nit) == INTEGER_CST
+  || TREE_CODE (nit) == POLY_INT_CST)
 {
   if (!tree_int_cst_equal (nit, TYPE_MAX_VALUE (nit_type)))
{
  alt_bound = fold_build2_loc (UNKNOWN_LOCATION, PLUS_EXPR, nit_type,
   nit, build_one_cst (nit_type));
 
- gcc_assert (TREE_CODE (alt_bound) == INTEGER_CST);
+ gcc_assert (TREE_CODE (alt_bound) == INTEGER_CST
+ || TREE_CODE (alt_bound) == POLY_INT_CST);
  transform_to_exit_first_loop_alt (loop, reduction_list, alt_bound);
  return true;
}


Re: RFC: Introduce -fhardened to enable security-related flags

2023-08-30 Thread Xi Ruoyao via Gcc-patches
On Tue, 2023-08-29 at 15:42 -0400, Marek Polacek via Gcc-patches wrote:
> + if (UNLIKELY (flag_hardened)
> + && (opt->code == OPT_D || opt->code == OPT_U))
> +   {
> + if (!fortify_seen_p)
> +   fortify_seen_p = !strncmp (opt->arg, "_FORTIFY_SOURCE", 15);
> + if (!cxx_assert_seen_p)
> +   cxx_assert_seen_p = !strcmp (opt->arg, "_GLIBCXX_ASSERTIONS");

It looks like there is some minor logic issue here: the first strncmp
will mistakenly match "-D_FORTIFY_SOURCE_FAKE", and the second strcmp
will not match "-D_GLIBCXX_ASSERTIONS=1".

> +   }
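
Something along these lines might be tighter (an illustrative helper,
untested, name invented):

static bool
matches_macro_p (const char *arg, const char *name, size_t namelen)
{
  /* Accept "NAME" and "NAME=...", but reject "NAME_SUFFIX".  */
  return strncmp (arg, name, namelen) == 0
         && (arg[namelen] == '\0' || arg[namelen] == '=');
}

so that e.g. matches_macro_p (opt->arg, "_FORTIFY_SOURCE", 15) accepts
both -D_FORTIFY_SOURCE and -D_FORTIFY_SOURCE=3 without matching
-D_FORTIFY_SOURCE_FAKE, and likewise for "_GLIBCXX_ASSERTIONS" (19).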

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH 1/8] parloops: Copy target and optimizations when creating a function clone

2023-08-30 Thread Andre Vieira (lists) via Gcc-patches


SVE simd clones require to be compiled with a SVE target enabled or the 
argument types will not be created properly. To achieve this we need to 
copy DECL_FUNCTION_SPECIFIC_TARGET from the original function 
declaration to the clones.  I decided it was probably also a good idea 
to copy DECL_FUNCTION_SPECIFIC_OPTIMIZATION in case the original 
function is meant to be compiled with specific optimization options.


gcc/ChangeLog:

* tree-parloops.cc (create_loop_fn): Copy specific target and
optimization options to clone.

diff --git a/gcc/tree-parloops.cc b/gcc/tree-parloops.cc
index 
e495bbd65270bdf90bae2c4a2b52777522352a77..a35f3d5023b06e5ef96eb4222488fcb34dd7bd45
 100644
--- a/gcc/tree-parloops.cc
+++ b/gcc/tree-parloops.cc
@@ -2203,6 +2203,11 @@ create_loop_fn (location_t loc)
   DECL_CONTEXT (t) = decl;
   TREE_USED (t) = 1;
   DECL_ARGUMENTS (decl) = t;
+  DECL_FUNCTION_SPECIFIC_TARGET (decl)
+= DECL_FUNCTION_SPECIFIC_TARGET (act_cfun->decl);
+  DECL_FUNCTION_SPECIFIC_OPTIMIZATION (decl)
+= DECL_FUNCTION_SPECIFIC_OPTIMIZATION (act_cfun->decl);
+
 
   allocate_struct_function (decl, false);
 


aarch64, vect, omp: Add SVE support for simd clones [PR 96342]

2023-08-30 Thread Andre Vieira (lists) via Gcc-patches

Hi,

This patch series aims to implement support for SVE simd clones when not 
specifying a 'simdlen' clause for AArch64. This patch depends on my 
earlier patch: '[PATCH] aarch64: enable mixed-types for aarch64 simdclones'.


Bootstrapped and regression tested the series on 
aarch64-unknown-linux-gnu and x86_64-pc-linux-gnu. I also tried building 
the patches separately, but that was before some further clean-up 
restructuring, so will do that again prior to pushing.


Andre Vieira (8):

parloops: Copy target and optimizations when creating a function clone
parloops: Allow poly nit and bound
vect: Fix vect_get_smallest_scalar_type for simd clones
vect: don't allow fully masked loops with non-masked simd clones [PR 110485]
vect: Use inbranch simdclones in masked loops
vect: Add vector_mode paramater to simd_clone_usable
vect: Add TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM
aarch64: Add SVE support for simd clones [PR 96342]


[PATCH] Refactor vector HF/BF mode iterators and patterns.

2023-08-30 Thread liuhongt via Gcc-patches
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready push to trunk.

gcc/ChangeLog:

* config/i386/sse.md (_blendm): Merge
VF_AVX512HFBFVL into VI12HFBF_AVX512VL.
(VF_AVX512HFBF16): Renamed to VHFBF.
(VF_AVX512FP16VL): Renamed to VHF_AVX512VL.
(VF_AVX512FP16): Removed.
(div3): Adjust VF_AVX512FP16VL to VHF_AVX512VL.
(avx512fp16_rcp2): Ditto.
(rsqrt2): Ditto.
(_rsqrt2): Ditto.
(vcond): Ditto.
(vcond): Ditto.
(_fmaddc__mask1): Ditto.
(_fmaddc__maskz): Ditto.
(_fcmaddc__mask1): Ditto.
(_fcmaddc__maskz): Ditto.
(cmla4): Ditto.
(fma__fadd_fmul): Ditto.
(fma__fadd_fcmul): Ditto.
(fma___fma_zero): Ditto.
(fma__fmaddc_bcst): Ditto.
(fma__fcmaddc_bcst): Ditto.
(___mask): Ditto.
(cmul3): Ditto.
(__):
Ditto.
(vec_unpacks_lo_): Ditto.
(vec_unpacks_hi_): Ditto.
(vec_unpack_fix_trunc_lo_): Ditto.
(vec_unpack_fix_trunc_lo_): Ditto.
(*vec_extract_0): Ditto.
(*_cmp3): Extend to V48H_AVX512VL.
---
 gcc/config/i386/sse.md | 238 +++--
 1 file changed, 108 insertions(+), 130 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 192e746fda3..e282d978a01 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -459,18 +459,10 @@ (define_mode_iterator VF2_AVX512VL
 (define_mode_iterator VF1_AVX512VL
   [V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")])
 
-(define_mode_iterator VF_AVX512FP16
-  [V32HF V16HF V8HF])
+(define_mode_iterator VHFBF
+  [V32HF V16HF V8HF V32BF V16BF V8BF])
 
-(define_mode_iterator VF_AVX512HFBF16
-  [(V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16")
-   (V8HF "TARGET_AVX512FP16") V32BF V16BF V8BF])
-
-(define_mode_iterator VF_AVX512HFBFVL
-  [V32HF (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")
-   V32BF (V16BF "TARGET_AVX512VL") (V8BF "TARGET_AVX512VL")])
-
-(define_mode_iterator VF_AVX512FP16VL
+(define_mode_iterator VHF_AVX512VL
   [V32HF (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")])
 
 ;; All vector integer modes
@@ -1624,29 +1616,15 @@ (define_insn "_blendm"
(set_attr "mode" "")])
 
 (define_insn "_blendm"
-  [(set (match_operand:VI12_AVX512VL 0 "register_operand" "=v,v")
-   (vec_merge:VI12_AVX512VL
- (match_operand:VI12_AVX512VL 2 "nonimmediate_operand" "vm,vm")
- (match_operand:VI12_AVX512VL 1 "nonimm_or_0_operand" "0C,v")
- (match_operand: 3 "register_operand" "Yk,Yk")))]
-  "TARGET_AVX512BW"
-  "@
-vmovdqu\t{%2, %0%{%3%}%N1|%0%{%3%}%N1, %2}
-vpblendm\t{%2, %1, %0%{%3%}|%0%{%3%}, %1, %2}"
-  [(set_attr "type" "ssemov")
-   (set_attr "prefix" "evex")
-   (set_attr "mode" "")])
-
-(define_insn "_blendm"
-  [(set (match_operand:VF_AVX512HFBFVL 0 "register_operand" "=v,v")
-   (vec_merge:VF_AVX512HFBFVL
- (match_operand:VF_AVX512HFBFVL 2 "nonimmediate_operand" "vm,vm")
- (match_operand:VF_AVX512HFBFVL 1 "nonimm_or_0_operand" "0C,v")
+  [(set (match_operand:VI12HFBF_AVX512VL 0 "register_operand" "=v,v")
+   (vec_merge:VI12HFBF_AVX512VL
+ (match_operand:VI12HFBF_AVX512VL 2 "nonimmediate_operand" "vm,vm")
+ (match_operand:VI12HFBF_AVX512VL 1 "nonimm_or_0_operand" "0C,v")
  (match_operand: 3 "register_operand" "Yk,Yk")))]
   "TARGET_AVX512BW"
   "@
 vmovdqu\t{%2, %0%{%3%}%N1|%0%{%3%}%N1, %2}
-vpblendmw\t{%2, %1, %0%{%3%}|%0%{%3%}, %1, %2}"
+vpblendm\t{%2, %1, %0%{%3%}|%0%{%3%}, %1, %2}"
   [(set_attr "type" "ssemov")
(set_attr "prefix" "evex")
(set_attr "mode" "")])
@@ -2448,10 +2426,10 @@ (define_expand "div3"
   "TARGET_SSE2")
 
 (define_expand "div3"
-  [(set (match_operand:VF_AVX512FP16VL 0 "register_operand")
-   (div:VF_AVX512FP16VL
- (match_operand:VF_AVX512FP16VL 1 "register_operand")
- (match_operand:VF_AVX512FP16VL 2 "vector_operand")))]
+  [(set (match_operand:VHF_AVX512VL 0 "register_operand")
+   (div:VHF_AVX512VL
+ (match_operand:VHF_AVX512VL 1 "register_operand")
+ (match_operand:VHF_AVX512VL 2 "vector_operand")))]
   "TARGET_AVX512FP16"
 {
   /* Transform HF vector div to vector mul/rcp.  */
@@ -2568,9 +2546,9 @@ (define_insn "*sse_vmrcpv4sf2"
(set_attr "mode" "SF")])
 
 (define_insn "avx512fp16_rcp2"
-  [(set (match_operand:VF_AVX512FP16VL 0 "register_operand" "=v")
-   (unspec:VF_AVX512FP16VL
- [(match_operand:VF_AVX512FP16VL 1 "nonimmediate_operand" "vm")]
+  [(set (match_operand:VHF_AVX512VL 0 "register_operand" "=v")
+   (unspec:VHF_AVX512VL
+ [(match_operand:VHF_AVX512VL 1 "nonimmediate_operand" "vm")]
  UNSPEC_RCP))]
   "TARGET_AVX512FP16"
   "vrcpph\t{%1, %0|%0, %1}"
@@ -2731,9 +2709,9 @@ (define_expand "rsqrt2"
 })
 
 (define_expand "rsqrt2"
-  [(set (match_operand:VF_AVX512FP16VL 0 "register_operand")
-   (unspec:VF_AVX512FP16VL
- 

Re: RFC: Introduce -fhardened to enable security-related flags

2023-08-30 Thread Martin Uecker
> Improving the security of software has been a major trend in the recent
> years.  Fortunately, GCC offers a wide variety of flags that enable extra
> hardening.  These flags aren't enabled by default, though.  And since
> there are a lot of hardening flags, with more to come, it's been difficult
> to keep on top of them; more so for the users of GCC who ought not to be
> expected to keep track of all the new options.
> 
> To alleviate some of the problems I mentioned, we thought it would
> be useful to provide a new umbrella option that enables a reasonable set
> of hardening flags.  What's "reasonable" in this context is not easy to
> pin down.  Surely, there must be no ABI impact, the option cannot cause
> severe performance issues, and, I suspect, it should not cause build
> errors by enabling stricter compile-time errors (such as, -Wimplicit-int,
> -Wint-conversion).  Including a controversial option in -fhardened
> would likely cause that users would not use -fhardened at all.  It's
> roughly akin to -Wall or -O2 -- those also enable a reasonable set of
> options, and evolve over time, and are not kept in sync with other
> compilers.
> 
> Currently, -fhardened enables:
> 
>   -D_FORTIFY_SOURCE=3 (or =2 for older glibcs)
>   -D_GLIBCXX_ASSERTIONS
>   -ftrivial-auto-var-init=zero
>   -fPIE  -pie  -Wl,-z,relro,-z,now
>   -fstack-protector-strong
>   -fstack-clash-protection
>   -fcf-protection=full (x86 GNU/Linux only)
> 
> -fsanitize=undefined is specifically not enabled.  -fstrict-flex-arrays is
> also liable to break a lot of code so I didn't include it.
> 
> Appended is a proof-of-concept patch.  It doesn't implement --help=hardened
> yet.  A fairly crucial point is that -fhardened will not override options
> that were specified on the command line (before or after -fhardened).  For
> example,
>  
>  -D_FORTIFY_SOURCE=1 -fhardened
> 
> means that _FORTIFY_SOURCE=1 will be used.  Similarly,
> 
>   -fhardened -fstack-protector
> 
> will not enable -fstack-protector-strong.
> 
> Thoughts?

I think this is a great idea!  Considering that it is difficult to
decide what should be activated and what not, and that the baseline
should not cause compile errors, I wonder whether there should be
higher levels similar to -O1,2,3?

Although it would be nice to have a one-letter or very short
option similar to -O2 or -Wall, maybe this is not possible
because all short ones are already taken.  Of course,
"-fhardened" would already be a huge improvement over the
current situation.

Martin




Re: [PATCH] tree-ssa-strlen: Fix up handling of conditionally zero memcpy [PR110914]

2023-08-30 Thread Richard Biener via Gcc-patches
On Wed, 30 Aug 2023, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase is miscompiled since r279392 aka 
> r10-5451-gef29b12cfbb4979
> The strlen pass has adjust_last_stmt function, which performs mainly strcat
> or strcat-like optimizations (say strcpy (x, "abcd"); strcat (x, p);
> or equivalent memcpy (x, "abcd", strlen ("abcd") + 1); char *q = strchr (x, 
> 0);
> memcpy (x, p, strlen (p)); etc. where the first stmt stores '\0' character
> at the end but next immediately overwrites it and so the first memcpy can be
> adjusted to store 1 fewer bytes.  handle_builtin_memcpy called this function
> in two spots, the first one guarded like:
>   if (olddsi != NULL
>   && tree_fits_uhwi_p (len)
>   && !integer_zerop (len))
> adjust_last_stmt (olddsi, stmt, false);
> i.e. only for constant non-zero length.  The other spot can call it even
> for non-constant length but in that case we punt before that if that length
> isn't length of some string + 1, so again non-zero.
> The r279392 change I assume wanted to add some warning stuff and changed it
> like
>if (olddsi != NULL
> -  && tree_fits_uhwi_p (len)
>&& !integer_zerop (len))
> -adjust_last_stmt (olddsi, stmt, false);
> +{
> +  maybe_warn_overflow (stmt, len, rvals, olddsi, false, true);
> +  adjust_last_stmt (olddsi, stmt, false);
> +}
> While maybe_warn_overflow possibly handles non-constant length fine,
> adjust_last_stmt really relies on length to be non-zero, which
> !integer_zerop (len) alone doesn't guarantee.  While we could for
> len being SSA_NAME ask the ranger or tree_expr_nonzero_p, I think
> adjust_last_stmt will not benefit from it much, so the following patch
> just restores the above condition/previous behavior for the adjust_last_stmt
> call only.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2023-08-30  Jakub Jelinek  
> 
>   PR tree-optimization/110914
>   * tree-ssa-strlen.cc (strlen_pass::handle_builtin_memcpy): Don't call
>   adjust_last_stmt unless len is known constant.
> 
>   * gcc.c-torture/execute/pr110914.c: New test.
> 
> --- gcc/tree-ssa-strlen.cc.jj 2023-04-27 10:17:46.406486796 +0200
> +++ gcc/tree-ssa-strlen.cc2023-08-29 18:13:38.189327203 +0200
> @@ -3340,7 +3340,8 @@ strlen_pass::handle_builtin_memcpy (buil
>&& !integer_zerop (len))
>  {
>maybe_warn_overflow (stmt, false, len, olddsi, false, true);
> -  adjust_last_stmt (olddsi, stmt, false);
> +  if (tree_fits_uhwi_p (len))
> + adjust_last_stmt (olddsi, stmt, false);
>  }
>  
>int idx = get_stridx (src, stmt);
> --- gcc/testsuite/gcc.c-torture/execute/pr110914.c.jj 2023-08-29 
> 18:38:33.305699206 +0200
> +++ gcc/testsuite/gcc.c-torture/execute/pr110914.c  2023-08-29 
> 18:38:18.678901007 +0200
> @@ -0,0 +1,22 @@
> +/* PR tree-optimization/110914 */
> +
> +__attribute__ ((noipa)) int
> +foo (const char *s, unsigned long l)
> +{
> +  unsigned char r = 0;
> +  __builtin_memcpy (&r, s, l != 0);
> +  return r;
> +}
> +
> +int
> +main ()
> +{
> +  const char *p = "123456";
> +  int a = foo (p, __builtin_strlen (p) - 5);
> +  int b = foo (p, __builtin_strlen (p) - 6);
> +  if (a != '1')
> +__builtin_abort ();
> +  if (b != 0)
> +__builtin_abort ();
> +  return 0;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] tree-ssa-strlen: Fix up handling of conditionally zero memcpy [PR110914]

2023-08-30 Thread Jakub Jelinek via Gcc-patches
Hi!

The following testcase is miscompiled since r279392 aka 
r10-5451-gef29b12cfbb4979
The strlen pass has adjust_last_stmt function, which performs mainly strcat
or strcat-like optimizations (say strcpy (x, "abcd"); strcat (x, p);
or equivalent memcpy (x, "abcd", strlen ("abcd") + 1); char *q = strchr (x, 0);
memcpy (x, p, strlen (p)); etc. where the first stmt stores '\0' character
at the end but next immediately overwrites it and so the first memcpy can be
adjusted to store 1 fewer bytes.  handle_builtin_memcpy called this function
in two spots, the first one guarded like:
  if (olddsi != NULL
  && tree_fits_uhwi_p (len)
  && !integer_zerop (len))
adjust_last_stmt (olddsi, stmt, false);
i.e. only for constant non-zero length.  The other spot can call it even
for non-constant length but in that case we punt before that if that length
isn't length of some string + 1, so again non-zero.
The r279392 change I assume wanted to add some warning stuff and changed it
like
   if (olddsi != NULL
-  && tree_fits_uhwi_p (len)
   && !integer_zerop (len))
-adjust_last_stmt (olddsi, stmt, false);
+{
+  maybe_warn_overflow (stmt, len, rvals, olddsi, false, true);
+  adjust_last_stmt (olddsi, stmt, false);
+}
While maybe_warn_overflow possibly handles non-constant length fine,
adjust_last_stmt really relies on length to be non-zero, which
!integer_zerop (len) alone doesn't guarantee.  While we could for
len being SSA_NAME ask the ranger or tree_expr_nonzero_p, I think
adjust_last_stmt will not benefit from it much, so the following patch
just restores the above condition/previous behavior for the adjust_last_stmt
call only.
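
For illustration, the kind of rewrite adjust_last_stmt performs (a
hedged sketch, names invented):

extern const char *p;
void
sketch (char *x)
{
  __builtin_memcpy (x, "abcd", 5);   /* stores "abcd" and the '\0'  */
  __builtin_memcpy (x + 4, p, __builtin_strlen (p) + 1);
  /* The '\0' at x[4] is always overwritten here because the second
     length is known non-zero, so the first copy can shrink to 4 bytes.
     With a possibly-zero length, as in the PR, that shrink is invalid.  */
}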

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-08-30  Jakub Jelinek  

PR tree-optimization/110914
* tree-ssa-strlen.cc (strlen_pass::handle_builtin_memcpy): Don't call
adjust_last_stmt unless len is known constant.

* gcc.c-torture/execute/pr110914.c: New test.

--- gcc/tree-ssa-strlen.cc.jj   2023-04-27 10:17:46.406486796 +0200
+++ gcc/tree-ssa-strlen.cc  2023-08-29 18:13:38.189327203 +0200
@@ -3340,7 +3340,8 @@ strlen_pass::handle_builtin_memcpy (buil
   && !integer_zerop (len))
 {
   maybe_warn_overflow (stmt, false, len, olddsi, false, true);
-  adjust_last_stmt (olddsi, stmt, false);
+  if (tree_fits_uhwi_p (len))
+   adjust_last_stmt (olddsi, stmt, false);
 }
 
   int idx = get_stridx (src, stmt);
--- gcc/testsuite/gcc.c-torture/execute/pr110914.c.jj   2023-08-29 
18:38:33.305699206 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr110914.c  2023-08-29 
18:38:18.678901007 +0200
@@ -0,0 +1,22 @@
+/* PR tree-optimization/110914 */
+
+__attribute__ ((noipa)) int
+foo (const char *s, unsigned long l)
+{
+  unsigned char r = 0;
+  __builtin_memcpy (&r, s, l != 0);
+  return r;
+}
+
+int
+main ()
+{
+  const char *p = "123456";
+  int a = foo (p, __builtin_strlen (p) - 5);
+  int b = foo (p, __builtin_strlen (p) - 6);
+  if (a != '1')
+__builtin_abort ();
+  if (b != 0)
+__builtin_abort ();
+  return 0;
+}

Jakub



Re: [PATCH] test: Add xfail for riscv_vector

2023-08-30 Thread Richard Biener via Gcc-patches
On Wed, 30 Aug 2023, Juzhe-Zhong wrote:

> Like ARM SVE, when we enable scalable vectorization for RVV,
> we can't do constant fold for these yet for both ARM SVE and RVV.
> 
> 
> Ok for trunk ?

OK.

> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/pr88598-1.c: Add riscv_vector.
>   * gcc.dg/vect/pr88598-2.c: Ditto.
>   * gcc.dg/vect/pr88598-3.c: Ditto.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/pr88598-1.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/pr88598-2.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/pr88598-3.c | 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/pr88598-1.c 
> b/gcc/testsuite/gcc.dg/vect/pr88598-1.c
> index e25c6c04543..ddcebb067ea 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr88598-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr88598-1.c
> @@ -51,4 +51,4 @@ main ()
>  
>  /* ??? We need more constant folding for this to work with fully-masked
> loops.  */
> -/* { dg-final { scan-tree-dump-not {REDUC_PLUS} "optimized" { xfail 
> aarch64_sve } } } */
> +/* { dg-final { scan-tree-dump-not {REDUC_PLUS} "optimized" { xfail { 
> aarch64_sve || riscv_vector } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/pr88598-2.c 
> b/gcc/testsuite/gcc.dg/vect/pr88598-2.c
> index f4c41bd8e58..ef5ea8a1a86 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr88598-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr88598-2.c
> @@ -51,4 +51,4 @@ main ()
>  
>  /* ??? We need more constant folding for this to work with fully-masked
> loops.  */
> -/* { dg-final { scan-tree-dump-not {REDUC_PLUS} "optimized" { xfail 
> aarch64_sve } } } */
> +/* { dg-final { scan-tree-dump-not {REDUC_PLUS} "optimized" { xfail { 
> aarch64_sve || riscv_vector } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/pr88598-3.c 
> b/gcc/testsuite/gcc.dg/vect/pr88598-3.c
> index 0fc23bf0ee7..75b8d024a95 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr88598-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr88598-3.c
> @@ -51,4 +51,4 @@ main ()
>  
>  /* ??? We need more constant folding for this to work with fully-masked
> loops.  */
> -/* { dg-final { scan-tree-dump-not {REDUC_PLUS} "optimized" { xfail 
> aarch64_sve } } } */
> +/* { dg-final { scan-tree-dump-not {REDUC_PLUS} "optimized" { xfail { 
> aarch64_sve || riscv_vector } } } } */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] test: Add xfail for riscv_vector

2023-08-30 Thread Juzhe-Zhong
Like ARM SVE, when we enable scalable vectorization for RVV,
we can't do constant fold for these yet for both ARM SVE and RVV.


Ok for trunk ?

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr88598-1.c: Add riscv_vector.
* gcc.dg/vect/pr88598-2.c: Ditto.
* gcc.dg/vect/pr88598-3.c: Ditto.

---
 gcc/testsuite/gcc.dg/vect/pr88598-1.c | 2 +-
 gcc/testsuite/gcc.dg/vect/pr88598-2.c | 2 +-
 gcc/testsuite/gcc.dg/vect/pr88598-3.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr88598-1.c 
b/gcc/testsuite/gcc.dg/vect/pr88598-1.c
index e25c6c04543..ddcebb067ea 100644
--- a/gcc/testsuite/gcc.dg/vect/pr88598-1.c
+++ b/gcc/testsuite/gcc.dg/vect/pr88598-1.c
@@ -51,4 +51,4 @@ main ()
 
 /* ??? We need more constant folding for this to work with fully-masked
loops.  */
-/* { dg-final { scan-tree-dump-not {REDUC_PLUS} "optimized" { xfail 
aarch64_sve } } } */
+/* { dg-final { scan-tree-dump-not {REDUC_PLUS} "optimized" { xfail { 
aarch64_sve || riscv_vector } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr88598-2.c 
b/gcc/testsuite/gcc.dg/vect/pr88598-2.c
index f4c41bd8e58..ef5ea8a1a86 100644
--- a/gcc/testsuite/gcc.dg/vect/pr88598-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr88598-2.c
@@ -51,4 +51,4 @@ main ()
 
 /* ??? We need more constant folding for this to work with fully-masked
loops.  */
-/* { dg-final { scan-tree-dump-not {REDUC_PLUS} "optimized" { xfail 
aarch64_sve } } } */
+/* { dg-final { scan-tree-dump-not {REDUC_PLUS} "optimized" { xfail { 
aarch64_sve || riscv_vector } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr88598-3.c 
b/gcc/testsuite/gcc.dg/vect/pr88598-3.c
index 0fc23bf0ee7..75b8d024a95 100644
--- a/gcc/testsuite/gcc.dg/vect/pr88598-3.c
+++ b/gcc/testsuite/gcc.dg/vect/pr88598-3.c
@@ -51,4 +51,4 @@ main ()
 
 /* ??? We need more constant folding for this to work with fully-masked
loops.  */
-/* { dg-final { scan-tree-dump-not {REDUC_PLUS} "optimized" { xfail 
aarch64_sve } } } */
+/* { dg-final { scan-tree-dump-not {REDUC_PLUS} "optimized" { xfail { 
aarch64_sve || riscv_vector } } } } */
-- 
2.36.3



Re: [PATCH] fortran: Restore interface to its previous state on error [PR48776]

2023-08-30 Thread Mikael Morin via Gcc-patches

On 28/08/2023 21:17, Harald Anlauf via Fortran wrote:

Hi Mikael,

On 8/27/23 21:22, Mikael Morin via Gcc-patches wrote:

Hello,

this fixes an old error-recovery bug.
Tested on x86_64-pc-linux-gnu.

OK for master?


I have only a minor comment:

+/* Free the leading members of the gfc_interface linked list given in INTR
+   up to the END element (exclusive: the END element is not freed).
+   If END is not nullptr, it is assumed that END is in the linked list starting
+   with INTR.  */
+
+static void
+free_interface_elements_until (gfc_interface *intr, gfc_interface *end)
+{
+  gfc_interface *next;
+
+  for (; intr != end; intr = next)


Would it make sense to add a protection for intr == NULL, i.e.:

+  for (; intr && intr != end; intr = next)

Just to prevent a NULL pointer dereference in case the
chain is corrupted or something else went wrong.

This would happen in the case END is not a member of the INTR linked 
list.  In that case, the most forgiving would be not freeing any memory 
and just returning.  But it would require walking the list a second time 
to determine before proceeding if END is present, and let's not do work 
that is expected to be useless.


I will just do the change as you suggest.


Otherwise it looks good to me.

It appears that your patch similarly fixes PR107923.  :-)


Good news. :-)
I will double check that none of the testcases there remain unfixed and 
close as duplicate.


I don't know how you manage to make your way through the hundreds of 
open PRs by the way.


Thanks for the review.


Thanks for the patch!

Harald






Re: [PATCH v2 3/4] LoongArch: add new configure option --with-strict-align-lib

2023-08-30 Thread Xi Ruoyao via Gcc-patches
On Wed, 2023-08-30 at 14:51 +0800, Yujie Yang wrote:
> > > LoongArch processors may not support memory accesses without natural
> > > alignments.  Building libraries with -mstrict-align may help with
> > toolchain binary compatibility and performance on these implementations
> > > (e.g. Loongson 2K1000LA).
> > > 
> > No significant performance degradation is observed on current mainstream
> > > LoongArch processors when the option is enabled.
> > > 
> > > gcc/ChangeLog:
> > > 
> > > * config.gcc: use -mstrict-align for building libraries
> > > if --with-strict-align-lib is given.
> > 
> > Isn't this equivalent to --with-default-multilib=mno-strict-align now?
> > 
> > And I still believe the easiest way for 2K1000LA is adding -march=la264
> > support so the user can simply configure with --with-arch=la264.
> 
> Not exactly -- Options given in --with-multilib-default= will not be applied
> to multilib variants that have build options specified in 
> --with-multilib-list,
> but --with-strict-align-lib is always effective.
> 
> e.g. for the following configuration:
> 
>   --with-multilib-default=mstrict-align
>   --with-multilib-list=lp64d/la464,lp64s
> 
> The library build options would be:
> 
>   base/lp64d variant: -mabi=lp64d -march=la464 (no -mstrict-align appended)
>   base/lp64s variant: -mabi=lp64s -march=abi-default -mstrict-align
> 
> Sure, you can do it with --with-arch=la264. It's just a convenient
> switch that we can use for building generic toolchains.

If you want a generic toolchain, it should default to -mstrict-align as
well.  Or it will still do unexpected thing for cases like:

struct foo { char x; int y; } __attribute__ ((packed));

int get (struct foo *foo) { return foo->y; }

So it should be --with-strict-align (it should make the *compiler*
default to -mstrict-align).  But then it seems --with-arch=la264 is just
easier...

Or maybe we should add -march=la64-baseline (or another name?) as the
"bottom line" of a LA64 CPU.  Currently the definition of -
march=loongarch64 includes unaligned access and 64-bit FP support, so
IMO we should have a baseline definition if we need to support something
"below" loongarch64.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH V2] Emit funcall external declarations only if actually used.

2023-08-30 Thread Jose E. Marchesi via Gcc-patches


ping

> [Differences from V1:
> - Prototype for call_from_call_insn moved before comment block.
> - Reuse the `call' flag for SYMBOL_REF_LIBCALL.
> - Fallback to check REG_CALL_DECL in non-direct calls.
> - New test to check correct behavior for non-direct calls.]
>
> There are many places in GCC where alternative local sequences are
> tried in order to determine what is the cheapest or best alternative
> to use in the current target.  When any of these sequences involve a
> libcall, the current implementation of emit_library_call_value_1
> introduces a side-effect consisting of emitting an external declaration
> for the funcall (such as __divdi3), which is thus emitted even if the
> sequence that does the libcall is not retained.
>
> This is problematic in targets such as BPF, because the kernel loader
> chokes on the spurious symbol __divdi3 and makes the resulting BPF
> object unloadable.  Note that BPF objects are not linked before being
> loaded.
>
> This patch changes emit_library_call_value_1 to mark the target
> SYMBOL_REF as a libcall.  Then, the emission of the external
> declaration is done in the first loop of final.cc:shorten_branches.
> This happens only if the corresponding sequence has been kept.
>
> Regtested in x86_64-linux-gnu.
> Tested with host x86_64-linux-gnu with target bpf-unknown-none.
>
> gcc/ChangeLog
>
>   * rtl.h (SYMBOL_REF_LIBCALL): Define.
>   * calls.cc (emit_library_call_value_1): Do not emit external
>   libcall declaration here.
>   * final.cc (shorten_branches): Do it here.
>
> gcc/testsuite/ChangeLog
>
>   * gcc.target/bpf/divmod-libcall-1.c: New test.
>   * gcc.target/bpf/divmod-libcall-2.c: Likewise.
>   * gcc.c-torture/compile/libcall-2.c: Likewise.
> ---
>  gcc/calls.cc  |  9 +++---
>  gcc/final.cc  | 30 +++
>  gcc/rtl.h |  5 
>  .../gcc.c-torture/compile/libcall-2.c |  8 +
>  .../gcc.target/bpf/divmod-libcall-1.c | 19 
>  .../gcc.target/bpf/divmod-libcall-2.c | 16 ++
>  6 files changed, 83 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/libcall-2.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c
>
> diff --git a/gcc/calls.cc b/gcc/calls.cc
> index 1f3a6d5c450..219ea599b16 100644
> --- a/gcc/calls.cc
> +++ b/gcc/calls.cc
> @@ -4388,9 +4388,10 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx 
> value,
>   || argvec[i].partial != 0)
>update_stack_alignment_for_call (&argvec[i].locate);
>  
> -  /* If this machine requires an external definition for library
> - functions, write one out.  */
> -  assemble_external_libcall (fun);
> +  /* Mark the emitted target as a libcall.  This will be used by final
> + in order to emit an external symbol declaration if the libcall is
> + ever used.  */
> +  SYMBOL_REF_LIBCALL (fun) = 1;
>  
>original_args_size = args_size;
>args_size.constant = (aligned_upper_bound (args_size.constant
> @@ -4735,7 +4736,7 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx 
> value,
>  valreg,
>  old_inhibit_defer_pop + 1, call_fusage, flags, args_so_far);
>  
> -  if (flag_ipa_ra)
> +  if (flag_ipa_ra || SYMBOL_REF_LIBCALL (orgfun))
>  {
>rtx datum = orgfun;
>gcc_assert (GET_CODE (datum) == SYMBOL_REF);
> diff --git a/gcc/final.cc b/gcc/final.cc
> index dd3e22547ac..2041e43fdd1 100644
> --- a/gcc/final.cc
> +++ b/gcc/final.cc
> @@ -804,6 +804,8 @@ make_pass_compute_alignments (gcc::context *ctxt)
>  }
>  
>  
> +static rtx call_from_call_insn (rtx_call_insn *insn);
> +
>  /* Make a pass over all insns and compute their actual lengths by shortening
> any branches of variable length if possible.  */
>  
> @@ -850,6 +852,34 @@ shorten_branches (rtx_insn *first)
>for (insn = get_insns (), i = 1; insn; insn = NEXT_INSN (insn))
>  {
>INSN_SHUID (insn) = i++;
> +
> +  /* If this is a `call' instruction implementing a libcall, and
> + this machine requires an external definition for library
> + functions, write one out.  */
> +  if (CALL_P (insn))
> +{
> +  rtx x;
> +
> +  if ((x = call_from_call_insn (dyn_cast <rtx_call_insn *> (insn)))
> +  && (x = XEXP (x, 0))
> +  && MEM_P (x)
> +  && (x = XEXP (x, 0))
> +  && SYMBOL_REF_P (x)
> +  && SYMBOL_REF_LIBCALL (x))
> +{
> +  /* Direct call.  */
> +  assemble_external_libcall (x);
> +}
> +  else if ((x = find_reg_note (insn, REG_CALL_DECL, NULL_RTX))
> +   && (x = XEXP (x, 0)))
> +{
> +  /* Indirect call with REG_CALL_DECL note.  */
> +  gcc_assert (SYMBOL_REF_P (x));
> +  if 

Re: [PATCH] store-merging: Fix up >= 64 bit insertion [PR111015]

2023-08-30 Thread Richard Biener via Gcc-patches
On Wed, 30 Aug 2023, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase shows that we mishandle bit insertion for
> info->bitsize >= 64.  The problem is in using unsigned HOST_WIDE_INT
> shift + subtraction + build_int_cst to compute mask, the shift invokes
> UB at compile time for info->bitsize 64 and larger and e.g. on the testcase
> with info->bitsize 70 happens to compute mask of 0x3f rather than
> 0x3f'ffffffff'ffffffff.
> 
> The patch fixes that by using wide_int wi::mask + wide_int_to_tree, so it
> handles masks in any precision (up to WIDE_INT_MAX_PRECISION ;) ).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and
> backports?

OK.

Thanks,
Richard.

> 2023-08-30  Jakub Jelinek  
> 
>   PR tree-optimization/111015
>   * gimple-ssa-store-merging.cc
>   (imm_store_chain_info::output_merged_store): Use wi::mask and
>   wide_int_to_tree instead of unsigned HOST_WIDE_INT shift and
>   build_int_cst to build BIT_AND_EXPR mask.
> 
>   * gcc.dg/pr111015.c: New test.
> 
> --- gcc/gimple-ssa-store-merging.cc.jj  2023-07-11 13:40:39.049448058 
> +0200
> +++ gcc/gimple-ssa-store-merging.cc   2023-08-29 16:13:12.808434272 +0200
> @@ -4687,12 +4687,13 @@ imm_store_chain_info::output_merged_stor
>   }
> else if ((BYTES_BIG_ENDIAN ? start_gap : end_gap) > 0)
>   {
> -   const unsigned HOST_WIDE_INT imask
> - = (HOST_WIDE_INT_1U << info->bitsize) - 1;
> +   wide_int imask
> + = wi::mask (info->bitsize, false,
> + TYPE_PRECISION (TREE_TYPE (tem)));
> tem = gimple_build (, loc,
> BIT_AND_EXPR, TREE_TYPE (tem), tem,
> -   build_int_cst (TREE_TYPE (tem),
> -  imask));
> +   wide_int_to_tree (TREE_TYPE (tem),
> + imask));
>   }
> const HOST_WIDE_INT shift
>   = (BYTES_BIG_ENDIAN ? end_gap : start_gap);
> --- gcc/testsuite/gcc.dg/pr111015.c.jj  2023-08-29 16:06:38.526938204 
> +0200
> +++ gcc/testsuite/gcc.dg/pr111015.c   2023-08-29 16:19:03.702536015 +0200
> @@ -0,0 +1,28 @@
> +/* PR tree-optimization/111015 */
> +/* { dg-do run { target int128 } } */
> +/* { dg-options "-O2" } */
> +
> +struct S { unsigned a : 4, b : 4; unsigned __int128 c : 70; } d;
> +
> +__attribute__((noipa)) void
> +foo (unsigned __int128 x, unsigned char y, unsigned char z)
> +{
> +  d.a = y;
> +  d.b = z;
> +  d.c = x;
> +}
> +
> +int
> +main ()
> +{
> +  foo (-1, 12, 5);
> +  if (d.a != 12
> +  || d.b != 5
> +  || d.c != (-1ULL | (((unsigned __int128) 0x3f) << 64)))
> +__builtin_abort ();
> +  foo (0x123456789abcdef0ULL | (((unsigned __int128) 26) << 64), 7, 11);
> +  if (d.a != 7
> +  || d.b != 11
> +  || d.c != (0x123456789abcdef0ULL | (((unsigned __int128) 26) << 64)))
> +__builtin_abort ();
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH V5] RISC-V: Enable vec_int testsuite for RVV VLA vectorization

2023-08-30 Thread Juzhe-Zhong
Add vect_strided and vect_widen so that the following failures are removed:
FAIL: gcc.dg/vect/vect-reduc-pattern-1c-big-array.c -flto -ffat-lto-objects  
scan-tree-dump-times vect "vectorized 1 loops" 0
FAIL: gcc.dg/vect/vect-reduc-pattern-1c-big-array.c scan-tree-dump-times vect 
"vectorized 1 loops" 0
FAIL: gcc.dg/vect/vect-reduc-pattern-1c.c -flto -ffat-lto-objects  
scan-tree-dump-times vect "vectorized 1 loops" 0
FAIL: gcc.dg/vect/vect-reduc-pattern-1c.c scan-tree-dump-times vect "vectorized 
1 loops" 0
FAIL: gcc.dg/vect/wrapv-vect-reduc-pattern-2c.c scan-tree-dump-times vect 
"vectorized 1 loops" 0
FAIL: gcc.dg/vect/slp-19a.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-19a.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorizing stmts using SLP" 0
FAIL: gcc.dg/vect/slp-19a.c scan-tree-dump-times vect "vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-19a.c scan-tree-dump-times vect "vectorizing stmts using 
SLP" 0
FAIL: gcc.dg/vect/slp-19b.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-19b.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorizing stmts using SLP" 0
FAIL: gcc.dg/vect/slp-19b.c scan-tree-dump-times vect "vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-19b.c scan-tree-dump-times vect "vectorizing stmts using 
SLP" 0
FAIL: gcc.dg/vect/slp-21.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 1 loops" 1
FAIL: gcc.dg/vect/slp-21.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorizing stmts using SLP" 0
FAIL: gcc.dg/vect/slp-21.c scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/slp-21.c scan-tree-dump-times vect "vectorizing stmts using 
SLP" 0
FAIL: gcc.dg/vect/slp-23.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 1 loops" 1
FAIL: gcc.dg/vect/slp-23.c scan-tree-dump-times vect "vectorized 1 loops" 1
XPASS: gcc.dg/vect/slp-reduc-6.c -flto -ffat-lto-objects  scan-tree-dump-times 
vect "vectorized 1 loops" 2
XPASS: gcc.dg/vect/slp-reduc-6.c scan-tree-dump-times vect "vectorized 1 loops" 
2
XPASS: gcc.dg/vect/vect-10-big-array.c -flto -ffat-lto-objects  
scan-tree-dump-times vect "vectorized 1 loops" 1
XPASS: gcc.dg/vect/vect-10-big-array.c scan-tree-dump-times vect "vectorized 1 
loops" 1
XPASS: gcc.dg/vect/vect-10.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 1 loops" 1
XPASS: gcc.dg/vect/vect-10.c scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/vect-98-big-array.c -flto -ffat-lto-objects  
scan-tree-dump-times vect "vectorized 1 loops" 0
FAIL: gcc.dg/vect/vect-98-big-array.c scan-tree-dump-times vect "vectorized 1 
loops" 0
FAIL: gcc.dg/vect/vect-98.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 1 loops" 0
FAIL: gcc.dg/vect/vect-98.c scan-tree-dump-times vect "vectorized 1 loops" 0
FAIL: gcc.dg/vect/vect-strided-store-u32-i2.c -flto -ffat-lto-objects  
scan-tree-dump-times vect "vectorized 0 loops" 1
FAIL: gcc.dg/vect/vect-strided-store-u32-i2.c scan-tree-dump-times vect 
"vectorized 0 loops" 1
FAIL: gcc.dg/vect/vect-vfa-03.c -flto -ffat-lto-objects  scan-tree-dump-times 
vect "vectorized 1 loops" 0
FAIL: gcc.dg/vect/vect-vfa-03.c scan-tree-dump-times vect "vectorized 1 loops" 0

With the patch, the failure count drops to 236 (originally 270):

XPASS: gcc.dg/vect/no-scevccp-outer-16.c scan-tree-dump-times vect "OUTER LOOP 
VECTORIZED." 1
XPASS: gcc.dg/vect/no-scevccp-outer-17.c scan-tree-dump-times vect "OUTER LOOP 
VECTORIZED." 1
XPASS: gcc.dg/vect/no-scevccp-outer-19.c scan-tree-dump-times vect "OUTER LOOP 
VECTORIZED." 1
XPASS: gcc.dg/vect/no-scevccp-outer-21.c scan-tree-dump-times vect "OUTER LOOP 
VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-7.c scan-tree-dump-times vect 
"vect_recog_widen_mult_pattern: detected" 1
FAIL: gcc.dg/vect/no-section-anchors-vect-31.c scan-tree-dump-times vect 
"Alignment of access forced using peeling" 2
FAIL: gcc.dg/vect/no-section-anchors-vect-64.c scan-tree-dump-times vect 
"Alignment of access forced using peeling" 2
FAIL: gcc.dg/vect/no-vfa-vect-101.c scan-tree-dump-times vect "can't determine 
dependence" 1
FAIL: gcc.dg/vect/no-vfa-vect-102.c scan-tree-dump-times vect "possible 
dependence between data-refs" 1
FAIL: gcc.dg/vect/no-vfa-vect-102a.c scan-tree-dump-times vect "possible 
dependence between data-refs" 1
FAIL: gcc.dg/vect/no-vfa-vect-37.c scan-tree-dump-times vect "can't determine 
dependence" 2
FAIL: gcc.dg/vect/pr57705.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 1 loop" 2
FAIL: gcc.dg/vect/pr57705.c scan-tree-dump-times vect "vectorized 1 loop" 2
FAIL: gcc.dg/vect/pr65310.c -flto -ffat-lto-objects  scan-tree-dump vect "can't 
force alignment"
FAIL: gcc.dg/vect/pr65310.c -flto -ffat-lto-objects  scan-tree-dump-not vect 
"misalign = 0"
FAIL: gcc.dg/vect/pr65310.c scan-tree-dump vect "can't force alignment"
FAIL: gcc.dg/vect/pr65310.c scan-tree-dump-not vect "misalign = 0"
FAIL: 

[PATCH] store-merging: Fix up >= 64 bit insertion [PR111015]

2023-08-30 Thread Jakub Jelinek via Gcc-patches
Hi!

The following testcase shows that we mishandle bit insertion for
info->bitsize >= 64.  The problem is in using unsigned HOST_WIDE_INT
shift + subtraction + build_int_cst to compute mask, the shift invokes
UB at compile time for info->bitsize 64 and larger and e.g. on the testcase
with info->bitsize 70 happens to compute mask of 0x3f rather than
0x3f'ffffffff'ffffffff.

The patch fixes that by using wide_int wi::mask + wide_int_to_tree, so it
handles masks in any precision (up to WIDE_INT_MAX_PRECISION ;) ).
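
Concretely (a hedged illustration assuming GCC's wide-int API and the
testcase's 70-bit bit-field):

  /* Old: shifting a 64-bit HOST_WIDE_INT by 70 is UB; in practice the
     count wrapped to 70 % 64 == 6, giving the mask 0x3f.  */
  unsigned HOST_WIDE_INT imask = (HOST_WIDE_INT_1U << 70) - 1;
  /* New: wi::mask computes in the precision of the type (128 here) and
     yields the intended 70-bit mask 0x3f'ffffffff'ffffffff.  */
  wide_int m = wi::mask (70, false, 128);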

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and
backports?

2023-08-30  Jakub Jelinek  

PR tree-optimization/111015
* gimple-ssa-store-merging.cc
(imm_store_chain_info::output_merged_store): Use wi::mask and
wide_int_to_tree instead of unsigned HOST_WIDE_INT shift and
build_int_cst to build BIT_AND_EXPR mask.

* gcc.dg/pr111015.c: New test.

--- gcc/gimple-ssa-store-merging.cc.jj  2023-07-11 13:40:39.049448058 +0200
+++ gcc/gimple-ssa-store-merging.cc 2023-08-29 16:13:12.808434272 +0200
@@ -4687,12 +4687,13 @@ imm_store_chain_info::output_merged_stor
}
  else if ((BYTES_BIG_ENDIAN ? start_gap : end_gap) > 0)
{
- const unsigned HOST_WIDE_INT imask
-   = (HOST_WIDE_INT_1U << info->bitsize) - 1;
+ wide_int imask
+   = wi::mask (info->bitsize, false,
+   TYPE_PRECISION (TREE_TYPE (tem)));
  tem = gimple_build (, loc,
  BIT_AND_EXPR, TREE_TYPE (tem), tem,
- build_int_cst (TREE_TYPE (tem),
-imask));
+ wide_int_to_tree (TREE_TYPE (tem),
+   imask));
}
  const HOST_WIDE_INT shift
= (BYTES_BIG_ENDIAN ? end_gap : start_gap);
--- gcc/testsuite/gcc.dg/pr111015.c.jj  2023-08-29 16:06:38.526938204 +0200
+++ gcc/testsuite/gcc.dg/pr111015.c 2023-08-29 16:19:03.702536015 +0200
@@ -0,0 +1,28 @@
+/* PR tree-optimization/111015 */
+/* { dg-do run { target int128 } } */
+/* { dg-options "-O2" } */
+
+struct S { unsigned a : 4, b : 4; unsigned __int128 c : 70; } d;
+
+__attribute__((noipa)) void
+foo (unsigned __int128 x, unsigned char y, unsigned char z)
+{
+  d.a = y;
+  d.b = z;
+  d.c = x;
+}
+
+int
+main ()
+{
+  foo (-1, 12, 5);
+  if (d.a != 12
+  || d.b != 5
+  || d.c != (-1ULL | (((unsigned __int128) 0x3f) << 64)))
+__builtin_abort ();
+  foo (0x123456789abcdef0ULL | (((unsigned __int128) 26) << 64), 7, 11);
+  if (d.a != 7
+  || d.b != 11
+  || d.c != (0x123456789abcdef0ULL | (((unsigned __int128) 26) << 64)))
+__builtin_abort ();
+}

Jakub



[PATCH V4 2/2] rs6000: use mtvsrws to move sf from si p9

2023-08-30 Thread Jiufu Guo via Gcc-patches
Hi,

As mentioned in PR108338, on p9, we could use mtvsrws to implement
the bitcast from SI to SF (or lowpart DI to SF).

For code:
  *(long long*)buff = di;
  float f = *(float*)(buff);

"sldi 9,3,32 ; mtvsrd 1,9 ; xscvspdpn 1,1" is generated.
A better one would be "mtvsrws 1,3 ; xscvspdpn 1,1".

Compare with previous patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623533.html
"highpart DI-->SF" is put to a seperate patch.

Pass bootstrap and regression on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu Guo)

gcc/ChangeLog:

* config/rs6000/rs6000.md (movsf_from_si): Update to generate mtvsrws
for P9.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr108338.c: Updated to check mtvsrws for p9.

---
 gcc/config/rs6000/rs6000.md | 25 -
 gcc/testsuite/gcc.target/powerpc/pr108338.c |  6 +++--
 2 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 
8c92cbf976de915136ad5dba24e69a363d21438d..c03e677bca79e8fb1acb276d07d0acfae009f6d8
 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -8280,13 +8280,26 @@ (define_insn_and_split "movsf_from_si"
 {
   rtx op0 = operands[0];
   rtx op1 = operands[1];
-  rtx op2 = operands[2];
-  rtx op1_di = gen_rtx_REG (DImode, REGNO (op1));
 
-  /* Move SF value to upper 32-bits for xscvspdpn.  */
-  emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
-  emit_insn (gen_p8_mtvsrd_sf (op0, op2));
-  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
+  /* Move lowpart 32-bits from register for SFmode.  */
+  if (TARGET_P9_VECTOR)
+{
+  /* Using mtvsrws;xscvspdpn.  */
+  rtx op0_v = gen_rtx_REG (V4SImode, REGNO (op0));
+  emit_insn (gen_vsx_splat_v4si (op0_v, op1));
+  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
+}
+  else
+{
+  rtx op2 = operands[2];
+  rtx op1_di = gen_rtx_REG (DImode, REGNO (op1));
+
+  /* Using ashl;mtvsrd;xscvspdpn.  */
+  emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
+  emit_insn (gen_p8_mtvsrd_sf (op0, op2));
+  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
+}
+
   DONE;
 }
   [(set_attr "length"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr108338.c 
b/gcc/testsuite/gcc.target/powerpc/pr108338.c
index 
6db65595343c2407fc32f68f5f52a1f7196c371d..0565e5254ed0a8cc579cf505a3f865426dcf62ae
 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr108338.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr108338.c
@@ -19,9 +19,11 @@ float  __attribute__ ((noipa)) sf_from_di_off4 (long long l)
 
 /* Under lp64, parameter 'l' is in one DI reg, then bitcast sub DI to SF. */
 /* { dg-final { scan-assembler-times {\mxscvspdpn\M} 2 { target { lp64 && 
has_arch_pwr8 } } } } */
-/* { dg-final { scan-assembler-times {\mmtvsrd\M} 2 { target { lp64 && 
has_arch_pwr8 } } } } */
+/* { dg-final { scan-assembler-times {\mmtvsrd\M} 2 { target { lp64 && { 
has_arch_pwr8 && { ! has_arch_pwr9 } } } } } } */
+/* { dg-final { scan-assembler-times {\msldi\M} 1 { target { lp64 && { 
has_arch_pwr8 && { ! has_arch_pwr9 } } } } } } */
+/* { dg-final { scan-assembler-times {\mmtvsrd\M} 1 { target { lp64 && 
has_arch_pwr9 } } } } */
+/* { dg-final { scan-assembler-times {\mmtvsrws\M} 1 { target { lp64 && 
has_arch_pwr9 } } } } */
 /* { dg-final { scan-assembler-times {\mrldicr\M} 1 { target { lp64 && 
has_arch_pwr8 } } } } */
-/* { dg-final { scan-assembler-times {\msldi\M} 1 { target { lp64 && 
has_arch_pwr8 } } } } */
 
 union di_sf_sf
 {
-- 
2.25.1



[PATCH V4 1/2] rs6000: optimize moving to sf from highpart di

2023-08-30 Thread Jiufu Guo via Gcc-patches
Hi,

Currently, we have the pattern "movsf_from_si2", which tries
to support moving the high part of a DI to SF.

The pattern looks like: XX:SF=bitcast:SF(subreg(YY:DI>>32),0)
It only accepts the "ashiftrt" for ">>", but "lshiftrt" is also ok.
And the offset of "subreg" is hard code 0, which only works for LE.

"movsf_from_si2" is updated to cover BE for "subreg", and cover
the logical shift for ":DI>>32".
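
For illustration, a hedged C sketch of where the logical-shift form
comes from (function name invented):

float
sf_from_high_di (unsigned long long l)
{
  unsigned int hi = l >> 32;   /* unsigned source => lshiftrt:DI  */
  float f;
  __builtin_memcpy (&f, &hi, sizeof f);
  return f;                    /* the same SF bitcast the pattern matches  */
}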

Pass bootstrap and regression on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu Guo)

PR target/108338

gcc/ChangeLog:

* config/rs6000/predicates.md (lowpart_subreg_operator): New
define_predicate.
* config/rs6000/rs6000.md (any_rshift): New code_iterator.
(movsf_from_si2): Rename to ...
(movsf_from_si2_): ... this.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr108338.c: New test.

---
 gcc/config/rs6000/predicates.md |  5 +++
 gcc/config/rs6000/rs6000.md | 11 +++---
 gcc/testsuite/gcc.target/powerpc/pr108338.c | 40 +
 3 files changed, 51 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108338.c

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 
3552d908e9d149a30993e3e6568466de537336be..e25b3b4864f681d47e9d5c2eb88bcde0aea6d17b
 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -2098,3 +2098,8 @@ (define_predicate "macho_pic_address"
   else
 return false;
 })
+
+(define_predicate "lowpart_subreg_operator"
+  (and (match_code "subreg")
+   (match_test "subreg_lowpart_offset (mode, GET_MODE (SUBREG_REG (op)))
+   == SUBREG_BYTE (op)")))
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 
1a9a7b1a47918f39fc91038607f21a8ba9a2e740..8c92cbf976de915136ad5dba24e69a363d21438d
 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -8299,18 +8299,19 @@ (define_insn_and_split "movsf_from_si"
"*,  *, p9v,   p8v,   *, *,
 p8v,p8v,   p8v,   *")])
 
+(define_code_iterator any_rshift [ashiftrt lshiftrt])
+
 ;; For extracting high part element from DImode register like:
 ;; {%1:SF=unspec[r122:DI>>0x20#0] 86;clobber scratch;}
 ;; split it before reload with "and mask" to avoid generating shift right
 ;; 32 bit then shift left 32 bit.
-(define_insn_and_split "movsf_from_si2"
+(define_insn_and_split "movsf_from_si2_"
   [(set (match_operand:SF 0 "gpc_reg_operand" "=wa")
(unspec:SF
-[(subreg:SI
-  (ashiftrt:DI
+[(match_operator:SI 3 "lowpart_subreg_operator"
+  [(any_rshift:DI
(match_operand:DI 1 "input_operand" "r")
-   (const_int 32))
-  0)]
+   (const_int 32))])]
 UNSPEC_SF_FROM_SI))
   (clobber (match_scratch:DI 2 "=r"))]
   "TARGET_NO_SF_SUBREG"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr108338.c 
b/gcc/testsuite/gcc.target/powerpc/pr108338.c
new file mode 100644
index 
..6db65595343c2407fc32f68f5f52a1f7196c371d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr108338.c
@@ -0,0 +1,40 @@
+// { dg-do run }
+// { dg-options "-O2 -save-temps" }
+
+float __attribute__ ((noipa)) sf_from_di_off0 (long long l)
+{
+  char buff[16];
+  *(long long*)buff = l;
+  float f = *(float*)(buff);
+  return f;
+}
+
+float  __attribute__ ((noipa)) sf_from_di_off4 (long long l)
+{
+  char buff[16];
+  *(long long*)buff = l;
+  float f = *(float*)(buff + 4);
+  return f; 
+}
+
+/* Under lp64, parameter 'l' is in one DI reg, then bitcast sub DI to SF. */
+/* { dg-final { scan-assembler-times {\mxscvspdpn\M} 2 { target { lp64 && 
has_arch_pwr8 } } } } */
+/* { dg-final { scan-assembler-times {\mmtvsrd\M} 2 { target { lp64 && 
has_arch_pwr8 } } } } */
+/* { dg-final { scan-assembler-times {\mrldicr\M} 1 { target { lp64 && 
has_arch_pwr8 } } } } */
+/* { dg-final { scan-assembler-times {\msldi\M} 1 { target { lp64 && 
has_arch_pwr8 } } } } */
+
+union di_sf_sf
+{
+  struct {float f1; float f2;};
+  long long l;
+};
+
+int main()
+{
+  union di_sf_sf v;
+  v.f1 = 1.0f;
+  v.f2 = 2.0f;
+  if (sf_from_di_off0 (v.l) != 1.0f || sf_from_di_off4 (v.l) != 2.0f )
+__builtin_abort ();
+  return 0;
+}
-- 
2.25.1



Re: [PATCH] middle-end: Apply MASK_LEN_LOAD_LANES/MASK_LEN_STORE_LANES to ivopts/alias

2023-08-30 Thread Lehua Ding

Committed, thanks Richard.

On 2023/8/30 15:25, Richard Biener via Gcc-patches wrote:

On Wed, 30 Aug 2023, Juzhe-Zhong wrote:


Like MASK_LOAD_LANES/MASK_STORE_LANES, add MASK_LEN_ variant.

Bootstrap and Regression on X86 passed.

Ok for trunk?


OK.


gcc/ChangeLog:

* tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Add MASK_LEN_ variant.
(call_may_clobber_ref_p_1): Ditto.
* tree-ssa-loop-ivopts.cc (get_mem_type_for_internal_fn): Ditto.
(get_alias_ptr_type_for_ptr_address): Ditto.

---
  gcc/tree-ssa-alias.cc   | 3 +++
  gcc/tree-ssa-loop-ivopts.cc | 4 
  2 files changed, 7 insertions(+)

diff --git a/gcc/tree-ssa-alias.cc b/gcc/tree-ssa-alias.cc
index cf38fe506a8..373940b5f6c 100644
--- a/gcc/tree-ssa-alias.cc
+++ b/gcc/tree-ssa-alias.cc
@@ -2818,11 +2818,13 @@ ref_maybe_used_by_call_p_1 (gcall *call, ao_ref *ref, 
bool tbaa_p)
case IFN_MASK_LEN_STORE:
return false;
case IFN_MASK_STORE_LANES:
+  case IFN_MASK_LEN_STORE_LANES:
goto process_args;
case IFN_MASK_LOAD:
case IFN_LEN_LOAD:
case IFN_MASK_LEN_LOAD:
case IFN_MASK_LOAD_LANES:
+  case IFN_MASK_LEN_LOAD_LANES:
{
  ao_ref rhs_ref;
  tree lhs = gimple_call_lhs (call);
@@ -3072,6 +3074,7 @@ call_may_clobber_ref_p_1 (gcall *call, ao_ref *ref, bool 
tbaa_p)
case IFN_LEN_STORE:
case IFN_MASK_LEN_STORE:
case IFN_MASK_STORE_LANES:
+  case IFN_MASK_LEN_STORE_LANES:
{
  tree rhs = gimple_call_arg (call,
  internal_fn_stored_value_index (fn));
diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index d208d9dbd4d..3d3f28f7f3b 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -2441,6 +2441,7 @@ get_mem_type_for_internal_fn (gcall *call, tree *op_p)
  {
  case IFN_MASK_LOAD:
  case IFN_MASK_LOAD_LANES:
+case IFN_MASK_LEN_LOAD_LANES:
  case IFN_LEN_LOAD:
  case IFN_MASK_LEN_LOAD:
if (op_p == gimple_call_arg_ptr (call, 0))
@@ -2449,6 +2450,7 @@ get_mem_type_for_internal_fn (gcall *call, tree *op_p)
  
  case IFN_MASK_STORE:

  case IFN_MASK_STORE_LANES:
+case IFN_MASK_LEN_STORE_LANES:
  case IFN_LEN_STORE:
  case IFN_MASK_LEN_STORE:
{
@@ -7573,6 +7575,8 @@ get_alias_ptr_type_for_ptr_address (iv_use *use)
  case IFN_MASK_STORE:
  case IFN_MASK_LOAD_LANES:
  case IFN_MASK_STORE_LANES:
+case IFN_MASK_LEN_LOAD_LANES:
+case IFN_MASK_LEN_STORE_LANES:
  case IFN_LEN_LOAD:
  case IFN_LEN_STORE:
  case IFN_MASK_LEN_LOAD:





--
Best,
Lehua


Re: [PATCH] middle-end: Apply MASK_LEN_LOAD_LANES/MASK_LEN_STORE_LANES to ivopts/alias

2023-08-30 Thread Richard Biener via Gcc-patches
On Wed, 30 Aug 2023, Juzhe-Zhong wrote:

> Like MASK_LOAD_LANES/MASK_STORE_LANES, add MASK_LEN_ variant.
> 
> Bootstrap and Regression on X86 passed.
> 
> Ok for trunk?

OK.

> gcc/ChangeLog:
> 
>   * tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Add MASK_LEN_ variant.
>   (call_may_clobber_ref_p_1): Ditto.
>   * tree-ssa-loop-ivopts.cc (get_mem_type_for_internal_fn): Ditto.
>   (get_alias_ptr_type_for_ptr_address): Ditto.
> 
> ---
>  gcc/tree-ssa-alias.cc   | 3 +++
>  gcc/tree-ssa-loop-ivopts.cc | 4 
>  2 files changed, 7 insertions(+)
> 
> diff --git a/gcc/tree-ssa-alias.cc b/gcc/tree-ssa-alias.cc
> index cf38fe506a8..373940b5f6c 100644
> --- a/gcc/tree-ssa-alias.cc
> +++ b/gcc/tree-ssa-alias.cc
> @@ -2818,11 +2818,13 @@ ref_maybe_used_by_call_p_1 (gcall *call, ao_ref *ref, 
> bool tbaa_p)
>case IFN_MASK_LEN_STORE:
>   return false;
>case IFN_MASK_STORE_LANES:
> +  case IFN_MASK_LEN_STORE_LANES:
>   goto process_args;
>case IFN_MASK_LOAD:
>case IFN_LEN_LOAD:
>case IFN_MASK_LEN_LOAD:
>case IFN_MASK_LOAD_LANES:
> +  case IFN_MASK_LEN_LOAD_LANES:
>   {
> ao_ref rhs_ref;
> tree lhs = gimple_call_lhs (call);
> @@ -3072,6 +3074,7 @@ call_may_clobber_ref_p_1 (gcall *call, ao_ref *ref, 
> bool tbaa_p)
>case IFN_LEN_STORE:
>case IFN_MASK_LEN_STORE:
>case IFN_MASK_STORE_LANES:
> +  case IFN_MASK_LEN_STORE_LANES:
>   {
> tree rhs = gimple_call_arg (call,
> internal_fn_stored_value_index (fn));
> diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> index d208d9dbd4d..3d3f28f7f3b 100644
> --- a/gcc/tree-ssa-loop-ivopts.cc
> +++ b/gcc/tree-ssa-loop-ivopts.cc
> @@ -2441,6 +2441,7 @@ get_mem_type_for_internal_fn (gcall *call, tree *op_p)
>  {
>  case IFN_MASK_LOAD:
>  case IFN_MASK_LOAD_LANES:
> +case IFN_MASK_LEN_LOAD_LANES:
>  case IFN_LEN_LOAD:
>  case IFN_MASK_LEN_LOAD:
>if (op_p == gimple_call_arg_ptr (call, 0))
> @@ -2449,6 +2450,7 @@ get_mem_type_for_internal_fn (gcall *call, tree *op_p)
>  
>  case IFN_MASK_STORE:
>  case IFN_MASK_STORE_LANES:
> +case IFN_MASK_LEN_STORE_LANES:
>  case IFN_LEN_STORE:
>  case IFN_MASK_LEN_STORE:
>{
> @@ -7573,6 +7575,8 @@ get_alias_ptr_type_for_ptr_address (iv_use *use)
>  case IFN_MASK_STORE:
>  case IFN_MASK_LOAD_LANES:
>  case IFN_MASK_STORE_LANES:
> +case IFN_MASK_LEN_LOAD_LANES:
> +case IFN_MASK_LEN_STORE_LANES:
>  case IFN_LEN_LOAD:
>  case IFN_LEN_STORE:
>  case IFN_MASK_LEN_LOAD:
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH v2 3/4] LoongArch: add new configure option --with-strict-align-lib

2023-08-30 Thread Yujie Yang
> > LoongArch processors may not support memory accesses without natural
> > alignments.  Building libraries with -mstrict-align may help with
> > toolchain binary compatibility and performance on these implementations
> > (e.g. Loongson 2K1000LA).
> > 
> > No significant performance degradation is observed on current mainstream
> > LoongArch processors when the option is enabled.
> > 
> > gcc/ChangeLog:
> > 
> > * config.gcc: use -mstrict-align for building libraries
> > if --with-strict-align-lib is given.
> 
> Isn't this equivalent to --with-default-multilib=mno-strict-align now?
> 
> And I still believe the easiest way for 2K1000LA is adding -march=la264
> support so the user can simply configure with --with-arch=la264.

Not exactly -- Options given in --with-multilib-default= will not be applied
to multilib variants that have build options specified in --with-multilib-list,
but --with-strict-align-lib is always effective.

e.g. for the following configuration:

  --with-multilib-default=mstrict-align
  --with-multilib-list=lp64d/la464,lp64s

The library build options would be:

  base/lp64d variant: -mabi=lp64d -march=la464 (no -mstrict-align appended)
  base/lp64s variant: -mabi=lp64s -march=abi-default -mstrict-align

Sure, you can do it with --with-arch=la264. It's just a convenient
switch that we can use for building generic toolchains.


