Re: [PATCH] Check hard_regno_mode_ok before setting lowest memory move cost for the mode with different reg classes.

2023-04-18 Thread Hongtao Liu via Gcc-patches
On Thu, Apr 6, 2023 at 1:07 PM Liu, Hongtao via Gcc-patches
 wrote:
>
>
>
> > -Original Message-
> > From: Vladimir Makarov 
> > Sent: Wednesday, April 5, 2023 8:59 PM
> > To: Jeff Law ; Liu, Hongtao
> > ; gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH] Check hard_regno_mode_ok before setting lowest
> > memory move cost for the mode with different reg classes.
> >
> >
> > On 4/4/23 21:29, Jeff Law wrote:
> > >
> > >
> > > On 4/3/23 23:13, liuhongt via Gcc-patches wrote:
> > >> There's a potential performance issue when the backend returns an
> > >> unreasonable value for a mode that can never be allocated to the
> > >> reg class.
> > >>
> > >> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > >> OK for trunk (or GCC14 stage1)?
> > >>
> > >> gcc/ChangeLog:
> > >>
> > >> PR rtl-optimization/109351
> > >> * ira.cc (setup_class_subset_and_memory_move_costs): Check
> > >> hard_regno_mode_ok before setting lowest memory move cost for
> > >> the mode with different reg classes.
> > > Not a regression *and* changing register allocation.  This seems like
> > > it should defer to gcc-14.
> > >
> > Yes, I agree.  It should wait for gcc-14, especially as we are close to the
> > release.  Also, testing only on x86-64 is not enough for such changes
> > (although I tried ppc64le and did not find any problems).
> >
> > Cost-related patches for RA frequently result in new testsuite failures on
> > some targets, even if the change seems obvious and is expected to improve
> > the generated code.
> >
> > Target-dependent code sometimes defines the costs correctly only for some
> > possible cases, and making RA less dependent on this pitfall is good.  So I
> > think the patch moves us in the right direction.
> >
> > The patch is OK with me to commit to trunk after the gcc-13 release, provided
> > arm64 testing shows no GCC testsuite regressions.
> Bootstrapped and regtested on aarch64-unknown-linux-gnu.
> Waiting for GCC14.
Committed.
> >
> > Thank you for working on this issue.
> >
>


-- 
BR,
Hongtao
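A toy model may make the IRA problem concrete. It is not IRA's code: the tables `mem_cost` and `class_can_hold_mode` and the function `lowest_mem_move_cost` are hypothetical, and the sketch only assumes that the pass keeps the lowest memory move cost seen across register classes for a mode. Without a hard_regno_mode_ok-style check, a bogus cheap cost reported for a class that can never hold the mode wins the minimum.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

enum { N_CLASSES = 3, N_MODES = 2 };

/* Hypothetical backend-reported memory move costs, indexed [class][mode].
   Class 2 reports an unrealistically cheap cost for mode 1.  */
static const int mem_cost[N_CLASSES][N_MODES] = {
  { 4, 6 },
  { 4, 8 },
  { 4, 1 },
};

/* Hypothetical stand-in for hard_regno_mode_ok: class 2 has no
   register that can hold mode 1.  */
static const bool class_can_hold_mode[N_CLASSES][N_MODES] = {
  { true, true },
  { true, true },
  { true, false },
};

/* Lowest memory move cost for MODE over all classes.  When CHECK_MODE_OK
   is true, classes that can never hold MODE are skipped.  */
static int
lowest_mem_move_cost (int mode, bool check_mode_ok)
{
  assert (mode >= 0 && mode < N_MODES);
  int best = INT_MAX;
  for (int cl = 0; cl < N_CLASSES; cl++)
    {
      if (check_mode_ok && !class_can_hold_mode[cl][mode])
        continue;
      if (mem_cost[cl][mode] < best)
        best = mem_cost[cl][mode];
    }
  return best;
}
```

Here `lowest_mem_move_cost (1, false)` yields the bogus 1, while `lowest_mem_move_cost (1, true)` yields 6, the cheapest cost among classes that can actually hold mode 1.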


Re: [PATCH] RISC-V: Fix bug reported by PR109535

2023-04-18 Thread Jeff Law via Gcc-patches




On 4/18/23 19:29, juzhe.zh...@rivai.ai wrote:

I tried refers_to_regno_p.
It cannot work for us since it just returns true or false depending on
whether the "rtx" contains the regno.
Use refers_to_regno_p instead of the equality comparison for the REGNO.
So you're still going to have count_regno_occurrences; you're just
changing the test it uses so that it works for modes which potentially
span multiple hard registers.

Note that you'll want to pass in AVL rather than REGNO (avl).  When you
call refers_to_regno_p it'll look something like


unsigned int tmp = REGNO (avl);
machine_mode mode = GET_MODE (avl);

if (REG_P (recog_data.operand[i])
    && refers_to_regno_p (tmp, tmp + hard_regno_nregs (tmp, mode),
                          recog_data.operand[i], NULL))

Or something like that.  I'm assuming AVL is a hard register at this 
point.  If it could be a pseudo the code will be slightly different.
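The reason a plain REGNO equality test breaks for values spanning several hard registers can be sketched with a toy range check. The helpers below are hypothetical, not GCC API; they only assume the convention `refers_to_regno_p` uses, namely that a value in NREGS hard registers starting at REGNO occupies the half-open range [REGNO, REGNO + NREGS).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: does a value in hard regs [regno, regno + nregs)
   overlap another value in [other, other + other_nregs)?  */
static bool
hard_reg_ranges_overlap_p (unsigned regno, unsigned nregs,
                           unsigned other, unsigned other_nregs)
{
  assert (nregs > 0 && other_nregs > 0);
  return regno < other + other_nregs && other < regno + nregs;
}

/* What an equality test on the first register number effectively checks.  */
static bool
same_first_regno_p (unsigned regno, unsigned other)
{
  return regno == other;
}
```

For example, a two-register value starting at register 10 overlaps an operand in register 11, which an equality test on the first register number misses.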


I'm still not sure all this stuff is handling SUBREGs properly either. 
Though if it's only checked after reload, we should be OK as we should 
have simplified the subreg away.




Jeff





Re: [PATCH v2] doc: Document order of define_peephole2 scanning

2023-04-18 Thread Hans-Peter Nilsson via Gcc-patches
> From: Hans-Peter Nilsson 
> Date: Wed, 19 Apr 2023 05:15:27 +0200

> Approvers: pdf output reviewed.  Ok to commit?

Patch retracted, at least temporarily.  My "understanding"
may be clouded by looking at an actual bug.  Sigh.

brgds, H-P


Re: [PATCH] RISC-V: Update multilib-generator to handle V

2023-04-18 Thread Kito Cheng via Gcc-patches
I wrote a preliminary version of that; would you mind giving it a try?

The basic idea is to select multilib only by ABI, so that we don't
need to bother with endless multilib reuse cases...
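The ABI-only selection idea can be illustrated with a small standalone sketch. `struct multilib_entry`, `select_multilib_by_abi` and the library paths in the example are hypothetical simplifications, not the structures in riscv-common.cc: the first table entry whose ABI string matches exactly wins, and the -march string is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified multilib table entry.  */
struct multilib_entry
{
  const char *path;
  const char *arch;
  const char *abi;
};

/* Return the path of the first multilib whose ABI matches ABI exactly,
   ignoring the arch string, or NULL if none matches.  */
static const char *
select_multilib_by_abi (const char *abi,
                        const struct multilib_entry *libs, size_t n)
{
  assert (abi != NULL);
  for (size_t i = 0; i < n; i++)
    if (strcmp (abi, libs[i].abi) == 0)
      return libs[i].path;
  return NULL;
}
```

With a table containing ilp32d and lp64d entries, `-mabi=lp64d` selects the lp64d library however the `-march` string is spelled, which avoids having to match every spelling of the arch string.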

On Wed, Apr 19, 2023 at 9:38 AM Kito Cheng  wrote:
>
> OK, thanks, I know what the problem is now: I tried rv64 but not
> rv32.  I have another fix in mind and will post it soon.
>
> On Wed, Apr 19, 2023 at 9:29 AM Palmer Dabbelt  wrote:
> >
> > On Tue, 18 Apr 2023 18:26:18 PDT (-0700), Kito Cheng wrote:
> > > And which -march/-mabi combination did you use that hit the issue?
> > >
> > > On Wed, Apr 19, 2023 at 8:51 AM Palmer Dabbelt  
> > > wrote:
> > >>
> > >> On Tue, 18 Apr 2023 17:47:31 PDT (-0700), Kito Cheng wrote:
> > >> > Would you mind sharing your GCC configure options and the options you tried?
> > >>
> > >> Just riscv-gnu-toolchain with "--enable-multilib --enable-linux".
> > >>
> > >> > On Wed, Apr 19, 2023 at 4:01 AM Palmer Dabbelt  
> > >> > wrote:
> > >> >>
> > >> >> On Tue, 18 Apr 2023 08:44:24 PDT (-0700), gcc-patches@gcc.gnu.org 
> > >> >> wrote:
> > >> >> >> Yep, if I drop the non-canonicial strings via
> > >> >> >>
> > >> >> >> diff --git a/gcc/config/riscv/multilib-generator b/gcc/config/riscv/multilib-generator
> > >> >> >> index 58b7198b243..a63a4d69c18 100755
> > >> >> >> --- a/gcc/config/riscv/multilib-generator
> > >> >> >> +++ b/gcc/config/riscv/multilib-generator
> > >> >> >> @@ -174,7 +174,7 @@ for cmodel in cmodels:
> > >> >> >>  ext_combs = expand_combination(ext)
> > >> >> >>  alts = sum([[x] + [x + y for y in ext_combs] for x in [arch] + extra], [])
> > >> >> >>  alts = filter(lambda x: len(x) != 0, alts)
> > >> >> >> -alts = alts + list(map(lambda a : arch_canonicalize(a, args.misa_spec), alts))
> > >> >> >> +alts = list(map(lambda a : arch_canonicalize(a, args.misa_spec), alts))
> > >> >> >>
> > >> >> >>  # Drop duplicated entry.
> > >> >> >>  alts = unique(alts)
> > >> >> >>
> > >> >> >> then I can't link `-march=rv32imafdcv`, I need
> > >> >> >> `-march=rv32imacv_zicsr_zve32f_zve32x_zve64x_zvl128b_zvl32b_zvl64b`.  That's
> > >> >> >> kind of a headache for users to type in.
> > >> >> >
> > >> >> > Yes, that's a headache for users, but arch string canonicalization is
> > >> >> > hidden in the process, so users can still just use rv32imafdcv at
> > >> >> > compile time and in the multilib config.
> > >> >> >
> > >> >> > The driver and the multilib-generator script (with arch_canonicalize)
> > >> >> > handle that headache in the background.
> > >> >>
> > >> >> Sorry, I'm not exactly sure what you're trying to say.  I just rebuilt
> > >> >> GCC with this patch (and t-linux-multilib regenerated from it), and it's
> > >> >> not resolving multilibs for the short names.
> >
> > `-march=rv32imafdcv` is the broken one,
> > `-march=rv32imacv_zicsr_zve32f_zve32x_zve64x_zvl128b_zvl32b_zvl64b`
> > resolves multilibs (there's a bit more above).
From a92c0cb2ce6fa58939331549bfb9e8110ec86a11 Mon Sep 17 00:00:00 2001
From: Kito Cheng 
Date: Wed, 19 Apr 2023 11:54:42 +0800
Subject: [PATCH] RISC-V: Handle multi-lib path correctly for linux [DRAFT]

---
 gcc/common/config/riscv/riscv-common.cc | 118 
 gcc/config/riscv/linux.h|  13 ++-
 2 files changed, 90 insertions(+), 41 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 2fc0f8bffc1..f40b1b617c2 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -1597,6 +1597,73 @@ riscv_check_conds (
   return match_score + ok_count * 100;
 }
 
+static const char *
+riscv_select_multilib_by_abi (
+  const std::string &riscv_current_arch_str,
+  const std::string &riscv_current_abi_str,
+  const riscv_subset_list *subset_list,
+  const struct switchstr *switches,
+  int n_switches,
+  const std::vector<riscv_multi_lib_info_t> &multilib_infos
+	)
+{
+  for (size_t i = 0; i < multilib_infos.size (); ++i)
+    if (riscv_current_abi_str == multilib_infos[i].abi_str)
+      return xstrdup (multilib_infos[i].path.c_str ());
+
+  return NULL;
+}
+
+
+static const char *
+riscv_select_multilib (
+  const std::string &riscv_current_arch_str,
+  const std::string &riscv_current_abi_str,
+  const riscv_subset_list *subset_list,
+  const struct switchstr *switches,
+  int n_switches,
+  const std::vector<riscv_multi_lib_info_t> &multilib_infos
+	)
+{
+  int match_score = 0;
+  int max_match_score = 0;
+  int best_match_multi_lib = -1;
+  /* Try to decision which set we should used.  */
+  /* We have 3 level decision tree here, ABI, check input arch/ABI must
+ be superset of multi-lib arch, and other rest option checking.  */
+  for (size_t i = 0; i < multilib_infos.size (); ++i)
+{
+  /* Check ABI is same first.  */
+  if (riscv_current_abi_str != multilib_infos[i].abi_str)
+	continue;
+
+  /* Found a potential compatible 

[PATCH] RISC-V: Allow VMS{Compare} (V1, V1) shortcut optimization

2023-04-18 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch tries to adjust the RISC-V Vector RTL for the generic
shortcut optimization of RVV integer compares.  It covers the compare
operators eq, ne, ltu, lt, leu, le, gtu, gt, geu and ge.

Assume we have the test code below:
vbool1_t test_shortcut_for_riscv_vmslt_case_0(vint8m8_t v1, size_t vl) {
  return __riscv_vmslt_vv_i8m8_b1(v1, v1, vl);
}

Before this patch:
vsetvli  zero,a2,e8,m8,ta,ma
vl8re8.v v24,0(a1)
vmslt.vv v8,v24,v24
vsetvli  a5,zero,e8,m8,ta,ma
vsm.v    v8,0(a0)
ret

After this patch:
vsetvli zero,a2,e8,mf8,ta,ma
vmclr.m v24
vsetvli zero,a5,e8,mf8,ta,ma
vsm.v   v24,0(a0)
ret

However, some cases in the test file cannot be optimized yet.  We
will send separate patches to handle them.
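The shortcut is sound because comparing a vector with itself gives the same answer for every element. The scalar C model below is a hypothetical illustration, not part of the patch: `cmp_elem` gives the per-element semantics of each listed operator, and `self_compare_is_constant` checks that `x OP x` yields one fixed mask bit for every element, false for ne, lt, ltu, gt and gtu (hence vmclr.m in the vmslt example) and true for eq, le, leu, ge and geu (an all-ones mask).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum cmp_op
{
  CMP_EQ, CMP_NE, CMP_LTU, CMP_LT, CMP_LEU,
  CMP_LE, CMP_GTU, CMP_GT, CMP_GEU, CMP_GE
};

/* Scalar semantics of one element of an integer vector compare.  */
static bool
cmp_elem (enum cmp_op op, long a, long b)
{
  unsigned long ua = (unsigned long) a, ub = (unsigned long) b;
  switch (op)
    {
    case CMP_EQ:  return a == b;
    case CMP_NE:  return a != b;
    case CMP_LTU: return ua < ub;
    case CMP_LT:  return a < b;
    case CMP_LEU: return ua <= ub;
    case CMP_LE:  return a <= b;
    case CMP_GTU: return ua > ub;
    case CMP_GT:  return a > b;
    case CMP_GEU: return ua >= ub;
    case CMP_GE:  return a >= b;
    }
  assert (!"unknown compare operator");
  return false;
}

/* Compare V with itself element by element (like vmslt.vv v8,v24,v24)
   and report whether every mask bit equals EXPECTED.  */
static bool
self_compare_is_constant (enum cmp_op op, const long *v, size_t vl,
                          bool expected)
{
  for (size_t i = 0; i < vl; i++)
    if (cmp_elem (op, v[i], v[i]) != expected)
      return false;
  return true;
}
```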

gcc/ChangeLog:

* config/riscv/riscv-v.cc (emit_pred_op):
* config/riscv/riscv-vector-builtins-bases.cc:
* config/riscv/vector.md:

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c: New test.

Signed-off-by: Pan Li 
Co-authored-by: Ju-Zhe Zhong 
---
 gcc/config/riscv/riscv-v.cc   |  15 +-
 .../riscv/riscv-vector-builtins-bases.cc  |   6 +-
 gcc/config/riscv/vector.md|  14 +-
 .../rvv/base/integer_compare_insn_shortcut.c  | 291 ++
 4 files changed, 319 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 392f5d02e17..c3881920812 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -71,12 +71,23 @@ public:
 add_input_operand (RVV_VUNDEF (mode), mode);
   }
   void add_policy_operand (enum tail_policy vta, enum mask_policy vma)
+  {
+add_tail_policy_operand (vta);
+add_mask_policy_operand (vma);
+  }
+
+  void add_tail_policy_operand (enum tail_policy vta)
   {
 rtx tail_policy_rtx = gen_int_mode (vta, Pmode);
-rtx mask_policy_rtx = gen_int_mode (vma, Pmode);
 add_input_operand (tail_policy_rtx, Pmode);
+  }
+
+  void add_mask_policy_operand (enum mask_policy vma)
+  {
+rtx mask_policy_rtx = gen_int_mode (vma, Pmode);
 add_input_operand (mask_policy_rtx, Pmode);
   }
+
   void add_avl_type_operand (avl_type type)
   {
 add_input_operand (gen_int_mode (type, Pmode), Pmode);
@@ -206,6 +217,8 @@ emit_pred_op (unsigned icode, rtx mask, rtx dest, rtx src, rtx len,
 
   if (GET_MODE_CLASS (mode) != MODE_VECTOR_BOOL)
     e.add_policy_operand (get_prefer_tail_policy (), get_prefer_mask_policy ());
+  else
+    e.add_tail_policy_operand (get_prefer_tail_policy ());
 
   if (vlmax_p)
 e.add_avl_type_operand (avl_type::VLMAX);
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 52467bbc961..7c6064a5a24 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -756,7 +756,7 @@ template
 class mask_logic : public function_base
 {
 public:
-  bool apply_tail_policy_p () const override { return false; }
+  bool apply_tail_policy_p () const override { return true; }
   bool apply_mask_policy_p () const override { return false; }
 
   rtx expand (function_expander ) const override
@@ -768,7 +768,7 @@ template
 class mask_nlogic : public function_base
 {
 public:
-  bool apply_tail_policy_p () const override { return false; }
+  bool apply_tail_policy_p () const override { return true; }
   bool apply_mask_policy_p () const override { return false; }
 
   rtx expand (function_expander ) const override
@@ -780,7 +780,7 @@ template
 class mask_notlogic : public function_base
 {
 public:
-  bool apply_tail_policy_p () const override { return false; }
+  bool apply_tail_policy_p () const override { return true; }
   bool apply_mask_policy_p () const override { return false; }
 
   rtx expand (function_expander ) const override
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 0ecca98f20c..6819363b9ff 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -1032,6 +1032,7 @@ (define_insn_and_split "@pred_mov"
[(match_operand:VB 1 "vector_all_trues_mask_operand" "Wc1, Wc1, Wc1, Wc1, Wc1")
 (match_operand 4 "vector_length_operand" " rK,  rK,  rK,  rK,  rK")
 (match_operand 5 "const_int_operand" "  i,   i,   i,   i,   i")
+(match_operand 6 "const_int_operand" "  i,   i,   i,   i,   i")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
  (match_operand:VB 3 "vector_move_operand"  "  m,  vr,  vr, Wc0, Wc1")
@@ -4113,7 +4114,8 @@ (define_expand "@pred_ge_scalar"
   if (satisfies_constraint_Wc1 (operands[1]))
emit_insn (
  gen_pred_mov (mode, operands[0], CONSTM1_RTX (mode), undef,
-   CONSTM1_RTX (mode), operands[6], 

[PATCH v2] doc: Document order of define_peephole2 scanning

2023-04-18 Thread Hans-Peter Nilsson via Gcc-patches
> From: Hans-Peter Nilsson 
> Date: Tue, 18 Apr 2023 20:44:12 +0200
> 
> > From: Paul Koning 
> 
> > Date: Tue, 18 Apr 2023 14:32:07 -0400
> > 
> > I'm not sure about the meaning of part of this.
> > "...resumes at the last generated insn."  Does that mean:

[...]

> (Neither...)

[...]

> Sorry, your confusement confuses me.  I just don't see how
> to confuse last with first or matched with generated. :)

It's 4:30am and things appear much clearer, in particular
wrt. confusion.  Hopefully the version below is clearer.
Here's also the example from 35 lines up in md.texi:

(define_peephole2
  [(match_scratch:SI 4 "r")
   (set (match_operand:SI 0 "" "") (match_operand:SI 1 "" ""))
   (set (match_operand:SI 2 "" "") (match_dup 1))
   (match_dup 4)
   (set (match_operand:SI 3 "" "") (match_dup 1))]
  "/* @r{determine 1 does not overlap 0 and 2} */"
  [(set (match_dup 4) (match_dup 1))
   (set (match_dup 0) (match_dup 4))
   (set (match_dup 2) (match_dup 4))
   (set (match_dup 3) (match_dup 4))]
  "")

Approvers: pdf output reviewed.  Ok to commit?

All: thoughts on making define_peephole2 work "as expected",
i.e. "backtracking" so the replacement buffer ends with the first
generated replacement insn?  It might be simpler to restart at
the beginning of the BB, but I'm scared of overly long BBs.
Does anyone have statistics on the sizes of BBs in terms of
number of insns?

-- >8 --
I was a bit surprised when my define_peephole2 didn't match, but
it was because it was expected to partially match the generated
output of a previous define_peephole2.  I had assumed that the
algorithm backed-up the size of the match-buffer, thereby
exposing newly created opportunities with context to all
define_peephole2's.  While things can change in that direction,
let's start with documenting the current state.

* doc/md.texi (define_peephole2): Document order of scanning.
---
 gcc/doc/md.texi | 8 
 1 file changed, 8 insertions(+)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 07bf8bdebffb..2ce043e6edc2 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -9362,6 +9362,14 @@ If the preparation falls through (invokes neither 
@code{DONE} nor
 @code{FAIL}), then the @code{define_peephole2} uses the replacement
 template.
 
+Insns are scanned in forward order from beginning to end for each basic
+block.  Matches are attempted in order of appearance in the @file{md}
+file.  After a successful replacement, scanning for further
+opportunities for @code{define_peephole2} resumes with the last
+generated replacement insn as the first insn to be matched.  For the
+example above, the first insn that can be matched by another
+@code{define_peephole2} is @code{(set (match_dup 3) (match_dup 4))}.
+
 @end ifset
 @ifset INTERNALS
 @node Insn Attributes
-- 
2.30.2
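The scanning order described in the proposed md.texi text can be modelled with a toy C program. This is a hypothetical sketch, not the peephole2 pass: insns are single characters, and the `rules` table and `peephole_scan` are invented for illustration. It encodes the order exactly as the v2 wording states it; its author later retracted the patch, so the real pass may behave differently. Rules are tried in table order at each position, and after a replacement scanning resumes at the last generated insn, so an opportunity that needs an insn before that point is missed.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical rewrite rules; each character stands for one insn.  */
struct rule
{
  const char *match;
  const char *repl;
};

/* Tried in table order at each position, like define_peephole2s
   in md-file order.  */
static const struct rule rules[] = {
  { "ab", "xc" },
  { "cd", "y" },
  { "ex", "z" },
};

/* Scan INSNS forward, applying the first matching rule at each position.
   After a replacement, resume at the last generated insn.  */
static void
peephole_scan (char *insns)
{
  size_t pos = 0;
  while (insns[pos] != '\0')
    {
      int matched = 0;
      for (size_t r = 0; r < sizeof rules / sizeof rules[0]; r++)
        {
          size_t mlen = strlen (rules[r].match);
          size_t rlen = strlen (rules[r].repl);
          if (strncmp (insns + pos, rules[r].match, mlen) != 0)
            continue;
          /* Replacements are never empty and never longer than the
             match, so the buffer never grows.  */
          assert (rlen >= 1 && rlen <= mlen);
          memmove (insns + pos + rlen, insns + pos + mlen,
                   strlen (insns + pos + mlen) + 1);
          memcpy (insns + pos, rules[r].repl, rlen);
          /* The last generated insn is the next one to be matched.  */
          pos += rlen - 1;
          matched = 1;
          break;
        }
      if (!matched)
        pos++;
    }
}
```

With the input "eabd", rule "ab" fires first, rule "cd" then matches starting at the generated "c", and the result is "exy"; the "ex" rule never fires because the "e" lies before the resume point.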



Re: [GCC14 PATCH] LoongArch: Improve cpymemsi expansion [PR109465]

2023-04-18 Thread Lulu Cheng



On 2023/4/12 8:16 PM, Xi Ruoyao wrote:

We had been generating really bad block move sequences, which kernel
developers who tried __builtin_memcpy recently complained about.  To
improve it:

1. Take advantage of -mno-strict-align.  When it is set, set the mode
size to UNITS_PER_WORD regardless of the alignment.
2. Halve the mode size when (block size) % (mode size) != 0, instead of
falling back to ld.bu/st.b at once.
3. Limit the length of the block move sequence by the number of
instructions, not the size of the block.  When -mstrict-align is set and
the block is not aligned, the old size limit for the straight-line
implementation (64 bytes) was definitely too large (we don't have 64
registers anyway).
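Item 2 and the new `num_reg` computation in the quoted hunk can be illustrated with a small standalone sketch. `plan_block_move` and `count_block_moves` are hypothetical helpers that mirror the halving loop and the register count, not the actual RTL emission; they assume the widest access size DELTA is a power of two, as integer mode sizes are.

```c
#include <assert.h>
#include <stddef.h>

/* Fill SIZES with the access sizes (in bytes) used to copy LENGTH bytes,
   starting with DELTA-byte accesses and halving the size for the tail.
   Returns the number of accesses.  */
static size_t
plan_block_move (long length, long delta, long *sizes, size_t max_sizes)
{
  /* DELTA is assumed to be a power of two, like machine mode sizes.  */
  assert (delta > 0 && (delta & (delta - 1)) == 0);
  size_t n = 0;
  long offs = 0;
  for (long cur = delta; offs < length; cur /= 2)
    for (; offs + cur <= length; offs += cur)
      {
        assert (n < max_sizes);
        sizes[n++] = cur;
      }
  return n;
}

/* Closed-form count, as in the num_reg computation: full DELTA-sized
   chunks plus one access per set bit of LENGTH below DELTA.  This only
   matches the plan when DELTA is a power of two.  */
static long
count_block_moves (long length, long delta)
{
  assert (delta > 0 && (delta & (delta - 1)) == 0);
  long num = length / delta;
  for (long cur = delta / 2; cur != 0; cur /= 2)
    num += !!(length & cur);
  return num;
}
```

For a 15-byte copy with 8-byte words this plans accesses of 8, 4, 2 and 1 bytes (four registers), where the old code would have done one 8-byte move and handed the remaining 7 bytes to move_by_pieces.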

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for GCC 14?


/* snip */



  static void
-loongarch_block_move_straight (rtx dest, rtx src, HOST_WIDE_INT length)
+loongarch_block_move_straight (rtx dest, rtx src, HOST_WIDE_INT length,
+  HOST_WIDE_INT delta)
  {
-  HOST_WIDE_INT offset, delta;
-  unsigned HOST_WIDE_INT bits;
+  HOST_WIDE_INT offs, delta_cur;
int i;
machine_mode mode;
rtx *regs;
  
-  bits = MIN (BITS_PER_WORD, MIN (MEM_ALIGN (src), MEM_ALIGN (dest)));

-
-  mode = int_mode_for_size (bits, 0).require ();
-  delta = bits / BITS_PER_UNIT;
+  HOST_WIDE_INT num_reg = length / delta;


I think a comment is needed here; without chasing through the code, it
is not easy to understand.


Otherwise LGTM!

Thanks!


+  for (delta_cur = delta / 2; delta_cur != 0; delta_cur /= 2)
+num_reg += !!(length & delta_cur);
  
/* Allocate a buffer for the temporary registers.  */

-  regs = XALLOCAVEC (rtx, length / delta);
+  regs = XALLOCAVEC (rtx, num_reg);
  
-  /* Load as many BITS-sized chunks as possible.  Use a normal load if

- the source has enough alignment, otherwise use left/right pairs.  */
-  for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++)
+  for (delta_cur = delta, i = 0, offs = 0; offs < length; delta_cur /= 2)
  {
-  regs[i] = gen_reg_rtx (mode);
-  loongarch_emit_move (regs[i], adjust_address (src, mode, offset));
-}
+  mode = int_mode_for_size (delta_cur * BITS_PER_UNIT, 0).require ();
  
-  for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++)

-loongarch_emit_move (adjust_address (dest, mode, offset), regs[i]);
+  for (; offs + delta_cur <= length; offs += delta_cur, i++)
+   {
+ regs[i] = gen_reg_rtx (mode);
+ loongarch_emit_move (regs[i], adjust_address (src, mode, offs));
+   }
+}
  
-  /* Mop up any left-over bytes.  */

-  if (offset < length)
+  for (delta_cur = delta, i = 0, offs = 0; offs < length; delta_cur /= 2)
  {
-  src = adjust_address (src, BLKmode, offset);
-  dest = adjust_address (dest, BLKmode, offset);
-  move_by_pieces (dest, src, length - offset,
- MIN (MEM_ALIGN (src), MEM_ALIGN (dest)),
- (enum memop_ret) 0);
+  mode = int_mode_for_size (delta_cur * BITS_PER_UNIT, 0).require ();
+
+  for (; offs + delta_cur <= length; offs += delta_cur, i++)
+   loongarch_emit_move (adjust_address (dest, mode, offs), regs[i]);
  }
  }
  
@@ -4523,10 +4520,11 @@ loongarch_adjust_block_mem (rtx mem, HOST_WIDE_INT length, rtx *loop_reg,
  
  static void

  loongarch_block_move_loop (rtx dest, rtx src, HOST_WIDE_INT length,
-  HOST_WIDE_INT bytes_per_iter)
+  HOST_WIDE_INT align)
  {
rtx_code_label *label;
rtx src_reg, dest_reg, final_src, test;
+  HOST_WIDE_INT bytes_per_iter = align * LARCH_MAX_MOVE_OPS_PER_LOOP_ITER;
HOST_WIDE_INT leftover;
  
leftover = length % bytes_per_iter;

@@ -4546,7 +4544,7 @@ loongarch_block_move_loop (rtx dest, rtx src, HOST_WIDE_INT length,
emit_label (label);
  
/* Emit the loop body.  */

-  loongarch_block_move_straight (dest, src, bytes_per_iter);
+  loongarch_block_move_straight (dest, src, bytes_per_iter, align);
  
/* Move on to the next block.  */

loongarch_emit_move (src_reg,
@@ -4563,7 +4561,7 @@ loongarch_block_move_loop (rtx dest, rtx src, HOST_WIDE_INT length,
  
/* Mop up any left-over bytes.  */

if (leftover)
-loongarch_block_move_straight (dest, src, leftover);
+loongarch_block_move_straight (dest, src, leftover, align);
else
  /* Temporary fix for PR79150.  */
  emit_insn (gen_nop ());
@@ -4573,25 +4571,32 @@ loongarch_block_move_loop (rtx dest, rtx src, HOST_WIDE_INT length,
 memory reference SRC to memory reference DEST.  */
  
  bool

-loongarch_expand_block_move (rtx dest, rtx src, rtx length)
+loongarch_expand_block_move (rtx dest, rtx src, rtx r_length, rtx r_align)
  {
-  int max_move_bytes = LARCH_MAX_MOVE_BYTES_STRAIGHT;
+  if (!CONST_INT_P (r_length))
+return false;
+
+  HOST_WIDE_INT length = INTVAL (r_length);
+  if (length > loongarch_max_inline_memcpy_size)
+   

Re: [PATCH] PR testsuite/106879 FAIL: gcc.dg/vect/bb-slp-layout-19.c on powerpc64

2023-04-18 Thread Kewen.Lin via Gcc-patches
Hi Jeff,

on 2023/4/19 10:03, Jiufu Guo wrote:
> Hi,
> 
> On P7, option -mno-allow-movmisalign is added during testing, which
> prevents slp happen on the case.
> 
> Like Like PR65484 and PR87306, this patch use vect_hw_misalig to guard
  Dup like...  ~~ missing the last character n.

> the case on powerpc targets.
> 
> Tested on ppc64{le,} and x86_64.
> Is this ok for trunk?
> 
> BR,
> Jeff (Jiufu)
> 
> gcc/testsuite/ChangeLog:
> 
>   PR testsuite/106879
>   * gcc.dg/vect/bb-slp-layout-19.c: Modify to guard the check with
>   vect_hw_misalig on POWERs.
...   ~ Same here.

OK for trunk with these nits fixed, thanks!

BR,
Kewen

> 
> ---
>  gcc/testsuite/gcc.dg/vect/bb-slp-layout-19.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-layout-19.c b/gcc/testsuite/gcc.dg/vect/bb-slp-layout-19.c
> index f075a83a25b..faf98e8d3c0 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-layout-19.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-layout-19.c
> @@ -31,4 +31,9 @@ void f()
>e[3] = b3;
>  }
> 
> -/* { dg-final { scan-tree-dump-times "add new stmt: \[^\\n\\r\]* = VEC_PERM_EXPR" 3 "slp1" { target { vect_int_mult && vect_perm } } } } */
> +/* On older powerpc hardware (POWER7 and earlier), the default flag
> +   -mno-allow-movmisalign prevents vectorization.  On POWER8 and later,
> +   when vect_hw_misalign is true, vectorization occurs.  For other
> +   targets, ! vect_no_align is a sufficient test.  */
> +
> +/* { dg-final { scan-tree-dump-times "add new stmt: \[^\\n\\r\]* = VEC_PERM_EXPR" 3 "slp1" { target { { vect_int_mult && vect_perm } && { { ! powerpc*-*-* } || { vect_hw_misalign } } } } } } */



[PATCH v5] gcov: Fix "do-while" structure in case statement leads to incorrect code coverage [PR93680]

2023-04-18 Thread Xionghu Luo via Gcc-patches
v5: Refine patch and send this for gcc14 stage1.

v4: Address comments.
 4.1. Handle GIMPLE_GOTO and GIMPLE_ASM.
 4.2. Fix failure of limit-caselabels.c (labels on same line),
 pointer_array_1.f90 (unused labels) etc.

v3: Add compute_target_labels and call it at the start of make_blocks_1.
v2: Check whether two loci are on the same line.

Start a new basic block if two labels have different locations, for
test coverage.
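A much-simplified model of the two-pass idea, using hypothetical types in place of GIMPLE: first collect every label that some jump refers to (compute_target_labels does this for GIMPLE_COND, GIMPLE_SWITCH, GIMPLE_GOTO and GIMPLE_ASM), then, when one label directly follows another, give it its own block only if it is a jump target on a different line. The second step is an assumption about the intent; the exact condition in stmt_starts_bb_p may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_LABELS = 16 };

enum stmt_kind { STMT_LABEL, STMT_JUMP, STMT_OTHER };

/* Hypothetical statement: a label definition (LABEL_ID), a jump to up to
   two labels (TARGETS), or anything else.  LINE is the source line.  */
struct stmt
{
  enum stmt_kind kind;
  int label_id;
  int targets[2];
  int n_targets;
  int line;
};

/* First pass: mark every label that some jump refers to.  */
static void
collect_target_labels (const struct stmt *seq, size_t n,
                       bool is_target[MAX_LABELS])
{
  for (size_t i = 0; i < n; i++)
    if (seq[i].kind == STMT_JUMP)
      for (int t = 0; t < seq[i].n_targets; t++)
        {
          int id = seq[i].targets[t];
          assert (id >= 0 && id < MAX_LABELS);
          is_target[id] = true;
        }
}

/* Second pass: label S directly follows label PREV.  Give S its own
   block only if it is a jump target on a different source line, so its
   counts are not merged with PREV's.  */
static bool
label_starts_new_block (const struct stmt *s, const struct stmt *prev,
                        const bool is_target[MAX_LABELS])
{
  assert (s->kind == STMT_LABEL && prev->kind == STMT_LABEL);
  return is_target[s->label_id] && s->line != prev->line;
}
```

In a sequence where a jump targets labels 1 and 2 on lines 3 and 4, label 2 gets its own block, so its execution count is no longer merged with label 1's; a following label that is never jumped to stays merged.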

Regression testing passes on x86_64-linux-gnu and aarch64-linux-gnu.
OK for master?

2023-03-22  Xionghu Luo  
Richard Biener  

gcc/ChangeLog:

PR gcov/93680
* tree-cfg.cc (stmt_starts_bb_p): Check whether the label is in
target_labels.
(compute_target_labels): New function.
(make_blocks_1): Call compute_target_labels.
(same_line_p): Return true if two loci are both
UNKNOWN_LOCATION.

gcc/testsuite/ChangeLog:

PR gcov/93680
* g++.dg/gcov/gcov-1.C: Correct counts.
* gcc.misc-tests/gcov-4.c: Likewise.
* gcc.misc-tests/gcov-pr85332.c: Likewise.
* lib/gcov.exp: Also clean gcda if fail.
* gcc.misc-tests/gcov-pr93680.c: New test.

Signed-off-by: Xionghu Luo 
---
 gcc/tree-cfg.cc | 245 +---
 gcc/testsuite/g++.dg/gcov/gcov-1.C  |   2 +-
 gcc/testsuite/gcc.misc-tests/gcov-4.c   |   2 +-
 gcc/testsuite/gcc.misc-tests/gcov-pr85332.c |   2 +-
 gcc/testsuite/gcc.misc-tests/gcov-pr93680.c |  24 ++
 gcc/testsuite/lib/gcov.exp  |   4 +-
 6 files changed, 189 insertions(+), 90 deletions(-)
 create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-pr93680.c

diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index a9fcc7fd050..9dca30af397 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -164,7 +164,7 @@ static edge gimple_redirect_edge_and_branch (edge, basic_block);
 static edge gimple_try_redirect_by_replacing_jump (edge, basic_block);
 
 /* Various helpers.  */
-static inline bool stmt_starts_bb_p (gimple *, gimple *);
+static inline bool stmt_starts_bb_p (gimple *, gimple *, hash_set<tree> *);
 static int gimple_verify_flow_info (void);
 static void gimple_make_forwarder_block (edge);
 static gimple *first_non_label_stmt (basic_block);
@@ -521,6 +521,68 @@ gimple_call_initialize_ctrl_altering (gimple *stmt)
 gimple_call_set_ctrl_altering (stmt, false);
 }
 
+/* Compute target labels to save useful labels.  */
+static void
+compute_target_labels (gimple_seq seq, hash_set<tree> *target_labels)
+{
+  gimple *stmt = NULL;
+  gimple_stmt_iterator j = gsi_start (seq);
+
+  while (!gsi_end_p (j))
+  {
+  stmt = gsi_stmt (j);
+
+  switch (gimple_code (stmt))
+  {
+   case GIMPLE_COND:
+ {
+   gcond *cstmt = as_a <gcond *> (stmt);
+   tree true_label = gimple_cond_true_label (cstmt);
+   tree false_label = gimple_cond_false_label (cstmt);
+   target_labels->add (true_label);
+   target_labels->add (false_label);
+ }
+ break;
+   case GIMPLE_SWITCH:
+ {
+   gswitch *gstmt = as_a <gswitch *> (stmt);
+   size_t i, n = gimple_switch_num_labels (gstmt);
+   tree elt, label;
+   for (i = 0; i < n; i++)
+   {
+ elt = gimple_switch_label (gstmt, i);
+ label = CASE_LABEL (elt);
+ target_labels->add (label);
+   }
+ }
+ break;
+   case GIMPLE_GOTO:
+ if (!computed_goto_p (stmt))
+   {
+ tree dest = gimple_goto_dest (stmt);
+ target_labels->add (dest);
+   }
+ break;
+   case GIMPLE_ASM:
+ {
+   gasm *asm_stmt = as_a <gasm *> (stmt);
+   int i, n = gimple_asm_nlabels (asm_stmt);
+   for (i = 0; i < n; ++i)
+   {
+ tree cons = gimple_asm_label_op (asm_stmt, i);
+ target_labels->add (cons);
+   }
+ }
+ break;
+
+   default:
+ break;
+  }
+
+  gsi_next (&j);
+  }
+}
+
 
 /* Insert SEQ after BB and build a flowgraph.  */
 
@@ -532,6 +594,10 @@ make_blocks_1 (gimple_seq seq, basic_block bb)
   gimple *prev_stmt = NULL;
   bool start_new_block = true;
   bool first_stmt_of_seq = true;
+  hash_set<tree> target_labels;
+
+  if (!optimize)
+    compute_target_labels (seq, &target_labels);
 
   while (!gsi_end_p (i))
 {
@@ -553,7 +619,7 @@ make_blocks_1 (gimple_seq seq, basic_block bb)
   /* If the statement starts a new basic block or if we have determined
 in a previous pass that we need to create a new block for STMT, do
 so now.  */
-  if (start_new_block || stmt_starts_bb_p (stmt, prev_stmt))
+  if (start_new_block || stmt_starts_bb_p (stmt, prev_stmt, &target_labels))
{
  if (!first_stmt_of_seq)
	gsi_split_seq_before (&i, &seq);
@@ -854,102 +920,102 @@ make_edges_bb (basic_block bb, struct omp_region **pcur_region, int *pomp_index)
   int ret = 0;
 
   if (!last)
-return ret;
-
-  switch 

Re: [PATCH-1, rs6000] xfail float128 comparison test case that fails on powerpc64 [PR108728]

2023-04-18 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

on 2023/4/18 11:13, HAO CHEN GUI wrote:
> Hi,
>   This patch xfails a float128 comparison test case on powerpc64
> that fails due to a longstanding issue with floating-point
> compares.
> 
>   See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58684 for more
> information.
> 
>   The case is xfailed when instructions of float128 hardware are
> generated. When software simulation is used, the case should pass.

IMHO we should make the comments here (commit log) clearer, like:
when float128 hardware is supported (-mfloat128-hardware takes
effect), xscmpuqp is generated for the comparison, which is unexpected.

"When software simulation is used, the case should pass" is not quite
right: I would interpret the use of __lekf2 as software simulation, and
we have to xfail this too when the _hw version is used at runtime.
Please make it clearer.

> 
>   The patch passed regression test on Power Linux platforms.
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: xfail float128 comparison test case that fails on powerpc64.
> 
> This patch xfails a float128 comparison test case on powerpc64 that
> fails due to a longstanding issue with floating-point compares.
> 
> See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58684 for more information.
> 
> gcc/testsuite/
>   PR target/108728
>   * gcc.dg/torture/float128-cmp-invalid.c: Add xfail.
> 
> patch.diff
> diff --git a/gcc/testsuite/gcc.dg/torture/float128-cmp-invalid.c b/gcc/testsuite/gcc.dg/torture/float128-cmp-invalid.c
> index 1f675efdd61..7b520d1f9f1 100644
> --- a/gcc/testsuite/gcc.dg/torture/float128-cmp-invalid.c
> +++ b/gcc/testsuite/gcc.dg/torture/float128-cmp-invalid.c
> @@ -1,5 +1,5 @@
>  /* Test for "invalid" exceptions from __float128 comparisons.  */
> -/* { dg-do run } */
> +/* { dg-do run { xfail { ppc_float128_hw || { ppc_cpu_supports_hw && p9vector_hw } } } } */

This change looks good to me, though personally I prefer dg-xfail-run-if
as we can specify one associated comment with it. :)

BR,
Kewen

>  /* { dg-options "" } */
>  /* { dg-require-effective-target __float128 } */
>  /* { dg-require-effective-target base_quadfloat_support } */



RE: [PATCH] i386: Share AES xmm intrin with VAES

2023-04-18 Thread Liu, Hongtao via Gcc-patches


> -Original Message-
> From: Jiang, Haochen 
> Sent: Wednesday, April 19, 2023 10:41 AM
> To: Hongtao Liu 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> ubiz...@gmail.com
> Subject: RE: [PATCH] i386: Share AES xmm intrin with VAES
> 
> > > a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index
> > > 33e281901cf..e7d565a8389 100644
> > > --- a/gcc/config/i386/sse.md
> > > +++ b/gcc/config/i386/sse.md
> > > @@ -25107,67 +25107,71 @@
> > >
> > > 
> > > ;;
> > > ;;
> > >
> > >  (define_insn "aesenc"
> > > -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > > -   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> > > -  (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> > > +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> > > +   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> > > +  (match_operand:V2DI 2 "vector_operand"
> > > + "xBm,xm,vm")]
> > >   UNSPEC_AESENC))]
> > > -  "TARGET_AES"
> > > +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
> > >"@
> > > aesenc\t{%2, %0|%0, %2}
> > > +   vaesenc\t{%2, %1, %0|%0, %1, %2}
> > > vaesenc\t{%2, %1, %0|%0, %1, %2}"
> > > -  [(set_attr "isa" "noavx,avx")
> > > +  [(set_attr "isa" "noavx,aes,avx512vl")
> > Shouldn't it be vaes_avx512vl and then remove " || (TARGET_VAES &&
> > TARGET_AVX512VL)" from condition.
> 
> Since VAES should not imply AES, we need that "|| (TARGET_VAES &&
> TARGET_AVX512VL)"
> 
> And there is no need to add vaes_avx512vl, since the last alternative will only
> be hit when there is no AES.  When there is no AES, the pattern needs both VAES
> and AVX512VL, otherwise it cannot be used at all.  avx512vl here is just a
> placeholder.
Ok, I see, then LGTM.
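Haochen's argument can be checked with a toy model of alternative enabling. It assumes each `isa` value maps directly to the corresponding flag (noavx to !TARGET_AVX, aes to TARGET_AES, avx512vl to TARGET_AVX512VL) and that AVX-512VL implies AVX; `struct isa_flags` and both functions are hypothetical, not i386 backend code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of ISA flags relevant to the aesenc pattern.  */
struct isa_flags
{
  bool avx, aes, vaes, avx512vl;
};

/* Insn condition from the patch:
   TARGET_AES || (TARGET_VAES && TARGET_AVX512VL).  */
static bool
pattern_enabled (struct isa_flags f)
{
  return f.aes || (f.vaes && f.avx512vl);
}

/* Alternatives 0..2 carry isa values noavx, aes and avx512vl.  */
static bool
alternative_enabled (int alt, struct isa_flags f)
{
  assert (alt >= 0 && alt < 3);
  if (!pattern_enabled (f))
    return false;
  switch (alt)
    {
    case 0:
      return !f.avx;
    case 1:
      return f.aes;
    default:
      return f.avx512vl;
    }
}
```

With VAES and AVX512VL but no AES, only the third alternative is enabled; with AES and AVX but no AVX512VL, only the second is, matching the reasoning above.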
> 
> BRs,
> Haochen
> 
> > Similarly for the patterns below.
> > Others LGTM.
> > > (set_attr "type" "sselog1")
> > > (set_attr "prefix_extra" "1")
> > > -   (set_attr "prefix" "orig,vex")
> > > -   (set_attr "btver2_decode" "double,double")
> > > +   (set_attr "prefix" "orig,vex,evex")
> > > +   (set_attr "btver2_decode" "double,double,double")
> > > (set_attr "mode" "TI")])
> > >
> > >  (define_insn "aesenclast"
> > > -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > > -   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> > > -  (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> > > +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> > > +   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> > > +  (match_operand:V2DI 2 "vector_operand"
> > > + "xBm,xm,vm")]
> > >   UNSPEC_AESENCLAST))]
> > > -  "TARGET_AES"
> > > +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
> > >"@
> > > aesenclast\t{%2, %0|%0, %2}
> > > +   vaesenclast\t{%2, %1, %0|%0, %1, %2}
> > > vaesenclast\t{%2, %1, %0|%0, %1, %2}"
> > > -  [(set_attr "isa" "noavx,avx")
> > > +  [(set_attr "isa" "noavx,aes,avx512vl")
> > > (set_attr "type" "sselog1")
> > > (set_attr "prefix_extra" "1")
> > > -   (set_attr "prefix" "orig,vex")
> > > -   (set_attr "btver2_decode" "double,double")
> > > +   (set_attr "prefix" "orig,vex,evex")
> > > +   (set_attr "btver2_decode" "double,double,double")
> > > (set_attr "mode" "TI")])
> > >
> > >  (define_insn "aesdec"
> > > -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > > -   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> > > -  (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> > > +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> > > +   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> > > +  (match_operand:V2DI 2 "vector_operand"
> > > + "xBm,xm,vm")]
> > >   UNSPEC_AESDEC))]
> > > -  "TARGET_AES"
> > > +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
> > >"@
> > > aesdec\t{%2, %0|%0, %2}
> > > +   vaesdec\t{%2, %1, %0|%0, %1, %2}
> > > vaesdec\t{%2, %1, %0|%0, %1, %2}"
> > > -  [(set_attr "isa" "noavx,avx")
> > > +  [(set_attr "isa" "noavx,aes,avx512vl")
> > > (set_attr "type" "sselog1")
> > > (set_attr "prefix_extra" "1")
> > > -   (set_attr "prefix" "orig,vex")
> > > -   (set_attr "btver2_decode" "double,double")
> > > +   (set_attr "prefix" "orig,vex,evex")
> > > +   (set_attr "btver2_decode" "double,double,double")
> > > (set_attr "mode" "TI")])
> > >
> > >  (define_insn "aesdeclast"
> > > -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > > -   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> > > -  (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> > > +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> > > +   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> > > +  

Re: [PATCH-2, rs6000] Add ppc_cpu_supports_hw into proc is-effective-target-keyword [PR108728]

2023-04-18 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

on 2023/4/18 11:13, HAO CHEN GUI via Gcc-patches wrote:
> Hi,
>   This patch adds ppc_cpu_supports_hw to the explicit name checks in
> proc is-effective-target-keyword, so that ppc_cpu_supports_hw can be used
> as a target selector in test directives.

I think this is the prerequisite for the one "xfail float128 comparison
test case that fails on powerpc64", so this should be PATCH 1/2 and that
one would be 2/2.  For the subject, maybe something like "testsuite: Make
ppc_cpu_supports_hw as effective target keyword [PR108728]" can be clearer.

OK for trunk, thanks.

BR,
Kewen

> 
>   The patch passed regression testing on Power Linux platforms.
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: Add ppc_cpu_supports_hw into proc is-effective-target-keyword.
> 
> gcc/testsuite/
>   PR target/108728
>   * lib/target-supports.exp (is-effective-target-keyword): Add
>   ppc_cpu_supports_hw.
> 
> 
> patch.diff
> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> index 1d6cc6f8d88..e65b447663f 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -9170,6 +9170,7 @@ proc is-effective-target-keyword { arg } {
> "named_sections" { return 1 }
> "gc_sections"{ return 1 }
> "cxa_atexit" { return 1 }
> +   "ppc_cpu_supports_hw" { return 1 }
> default  { return 0 }
>   }
>  }


Re: [PATCH 13-backport 0/3] RISC-V Testsuite Fixes

2023-04-18 Thread Palmer Dabbelt

On Mon, 17 Apr 2023 23:10:03 PDT (-0700), richard.guent...@gmail.com wrote:
> On Mon, Apr 17, 2023 at 8:22 PM Palmer Dabbelt  wrote:
>>
>> These had been approved for trunk, but I hadn't gotten around to
>> committing them before the branch.  They're on trunk now.  They're all
>> pretty trivial test suite fixes.
>>
>> OK for 13?
>
> Yes
>
>> (Also I'm not sure if we're supposed to be using `git cherry-pick -x`)
>
> And yes.

Thanks.  Committed.


RE: [PATCH] i386: Share AES xmm intrin with VAES

2023-04-18 Thread Jiang, Haochen via Gcc-patches
> > a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index
> > 33e281901cf..e7d565a8389 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -25107,67 +25107,71 @@
> >
> > ;;
> > ;;
> >
> >  (define_insn "aesenc"
> > -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > -   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> > -  (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> > +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> > +   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> > +  (match_operand:V2DI 2 "vector_operand"
> > + "xBm,xm,vm")]
> >   UNSPEC_AESENC))]
> > -  "TARGET_AES"
> > +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
> >"@
> > aesenc\t{%2, %0|%0, %2}
> > +   vaesenc\t{%2, %1, %0|%0, %1, %2}
> > vaesenc\t{%2, %1, %0|%0, %1, %2}"
> > -  [(set_attr "isa" "noavx,avx")
> > +  [(set_attr "isa" "noavx,aes,avx512vl")
> Shouldn't it be vaes_avx512vl and then remove " || (TARGET_VAES &&
> TARGET_AVX512VL)" from condition.

Since VAES should not imply AES, we need that "|| (TARGET_VAES && 
TARGET_AVX512VL)"

There is also no need to add vaes_avx512vl, since the last alternative is only
hit when AES is unavailable. In that case the pattern needs both VAES and
AVX512VL, or it cannot be used at all, so avx512vl here acts as a placeholder.

BRs,
Haochen

> Similar for below patterns.
> Others LGTM.
> > (set_attr "type" "sselog1")
> > (set_attr "prefix_extra" "1")
> > -   (set_attr "prefix" "orig,vex")
> > -   (set_attr "btver2_decode" "double,double")
> > +   (set_attr "prefix" "orig,vex,evex")
> > +   (set_attr "btver2_decode" "double,double,double")
> > (set_attr "mode" "TI")])
> >
> >  (define_insn "aesenclast"
> > -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > -   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> > -  (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> > +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> > +   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> > +  (match_operand:V2DI 2 "vector_operand"
> > + "xBm,xm,vm")]
> >   UNSPEC_AESENCLAST))]
> > -  "TARGET_AES"
> > +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
> >"@
> > aesenclast\t{%2, %0|%0, %2}
> > +   vaesenclast\t{%2, %1, %0|%0, %1, %2}
> > vaesenclast\t{%2, %1, %0|%0, %1, %2}"
> > -  [(set_attr "isa" "noavx,avx")
> > +  [(set_attr "isa" "noavx,aes,avx512vl")
> > (set_attr "type" "sselog1")
> > (set_attr "prefix_extra" "1")
> > -   (set_attr "prefix" "orig,vex")
> > -   (set_attr "btver2_decode" "double,double")
> > +   (set_attr "prefix" "orig,vex,evex")
> > +   (set_attr "btver2_decode" "double,double,double")
> > (set_attr "mode" "TI")])
> >
> >  (define_insn "aesdec"
> > -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > -   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> > -  (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> > +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> > +   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> > +  (match_operand:V2DI 2 "vector_operand"
> > + "xBm,xm,vm")]
> >   UNSPEC_AESDEC))]
> > -  "TARGET_AES"
> > +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
> >"@
> > aesdec\t{%2, %0|%0, %2}
> > +   vaesdec\t{%2, %1, %0|%0, %1, %2}
> > vaesdec\t{%2, %1, %0|%0, %1, %2}"
> > -  [(set_attr "isa" "noavx,avx")
> > +  [(set_attr "isa" "noavx,aes,avx512vl")
> > (set_attr "type" "sselog1")
> > (set_attr "prefix_extra" "1")
> > -   (set_attr "prefix" "orig,vex")
> > -   (set_attr "btver2_decode" "double,double")
> > +   (set_attr "prefix" "orig,vex,evex")
> > +   (set_attr "btver2_decode" "double,double,double")
> > (set_attr "mode" "TI")])
> >
> >  (define_insn "aesdeclast"
> > -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > -   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> > -  (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> > +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> > +   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> > +  (match_operand:V2DI 2 "vector_operand"
> > + "xBm,xm,vm")]
> >   UNSPEC_AESDECLAST))]
> > -  "TARGET_AES"
> > +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
> >"@
> > aesdeclast\t{%2, %0|%0, %2}
> > +   vaesdeclast\t{%2, %1, %0|%0, %1, %2}
> > vaesdeclast\t{%2, %1, %0|%0, %1, %2}"
> > -  [(set_attr "isa" "noavx,avx")
> > +  [(set_attr "isa" "noavx,aes,avx512vl")
> > (set_attr "type" "sselog1")
> > (set_attr "prefix_extra" "1")
> > -   (set_attr 

Re: [PATCH] i386: Share AES xmm intrin with VAES

2023-04-18 Thread Hongtao Liu via Gcc-patches
On Tue, Apr 18, 2023 at 3:19 PM Haochen Jiang via Gcc-patches
 wrote:
>
> Hi all,
>
> Currently in GCC, the 128-bit intrinsics for the instructions vaes{enc,dec}{last,}
> are under the AES ISA. Because there is no dependency between the ISA sets AES
> and VAES, the 128-bit intrinsics are not available when we use the compiler
> flags -mvaes -mavx512vl, and there is no other way to use them, although
> according to the Intel SDM they should be.
>
> Although VAES aims to be a VEX/EVEX promotion of AES, it covers only part
> of it. Therefore, we share the AES xmm intrinsics with VAES.
>
> Also, since -mvaes indicates that we can use VEX encoding for ymm registers,
> VAES should imply AVX.
>
> Tested on x86_64-pc-linux-gnu. Ok for trunk?
>
> BRs,
> Haochen
>
> gcc/ChangeLog:
>
> * common/config/i386/i386-common.cc
> (OPTION_MASK_ISA2_AVX_UNSET): Add OPTION_MASK_ISA2_VAES_UNSET.
> (ix86_handle_option): Set AVX flag for VAES.
> * config/i386/i386-builtins.cc (ix86_init_mmx_sse_builtins):
> Add OPTION_MASK_ISA2_VAES_UNSET.
> (def_builtin): Share builtin between AES and VAES.
> * config/i386/i386-expand.cc (ix86_check_builtin_isa_match):
> Ditto.
> * config/i386/i386.md (aes): New isa attribute.
> * config/i386/sse.md (aesenc): Add pattern for VAES with xmm.
> (aesenclast): Ditto.
> (aesdec): Ditto.
> (aesdeclast): Ditto.
> * config/i386/vaesintrin.h: Remove redundant avx target push.
> * config/i386/wmmintrin.h (_mm_aesdec_si128): Change to macro.
> (_mm_aesdeclast_si128): Ditto.
> (_mm_aesenc_si128): Ditto.
> (_mm_aesenclast_si128): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx512fvl-vaes-1.c: Add VAES xmm test.
> * gcc.target/i386/pr84335.c: Modify error message.
> ---
>  gcc/common/config/i386/i386-common.cc |  5 +-
>  gcc/config/i386/i386-builtins.cc  | 21 ---
>  gcc/config/i386/i386-expand.cc|  1 +
>  gcc/config/i386/i386.md   |  3 +-
>  gcc/config/i386/sse.md| 60 ++-
>  gcc/config/i386/vaesintrin.h  |  4 +-
>  gcc/config/i386/wmmintrin.h   | 29 +++--
>  .../gcc.target/i386/avx512fvl-vaes-1.c| 11 
>  gcc/testsuite/gcc.target/i386/pr84335.c   |  4 +-
>  9 files changed, 75 insertions(+), 63 deletions(-)
>
> diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
> index c7954da8e34..bf126f14073 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -348,7 +348,8 @@ along with GCC; see the file COPYING3.  If not see
> | OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET)
>  #define OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET \
>OPTION_MASK_ISA2_SSE_UNSET
> -#define OPTION_MASK_ISA2_AVX_UNSET OPTION_MASK_ISA2_AVX2_UNSET
> +#define OPTION_MASK_ISA2_AVX_UNSET \
> +  (OPTION_MASK_ISA2_AVX2_UNSET | OPTION_MASK_ISA2_VAES_UNSET)
>  #define OPTION_MASK_ISA2_SSE4_2_UNSET OPTION_MASK_ISA2_AVX_UNSET
>  #define OPTION_MASK_ISA2_SSE4_1_UNSET OPTION_MASK_ISA2_SSE4_2_UNSET
>  #define OPTION_MASK_ISA2_SSE4_UNSET OPTION_MASK_ISA2_SSE4_1_UNSET
> @@ -685,6 +686,8 @@ ix86_handle_option (struct gcc_options *opts,
> {
>   opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_VAES_SET;
>   opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_VAES_SET;
> + opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX_SET;
> + opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX_SET;
> }
>else
> {
> diff --git a/gcc/config/i386/i386-builtins.cc b/gcc/config/i386/i386-builtins.cc
> index fc0c82b156e..28f404da288 100644
> --- a/gcc/config/i386/i386-builtins.cc
> +++ b/gcc/config/i386/i386-builtins.cc
> @@ -279,14 +279,15 @@ def_builtin (HOST_WIDE_INT mask, HOST_WIDE_INT mask2,
>if (((mask2 == 0 || (mask2 & ix86_isa_flags2) != 0)
>&& (mask == 0 || (mask & ix86_isa_flags) != 0))
>   || ((mask & OPTION_MASK_ISA_MMX) != 0 && TARGET_MMX_WITH_SSE)
> - /* "Unified" builtin used by either AVXVNNI/AVXIFMA intrinsics
> -or AVX512VNNIVL/AVX512IFMAVL non-mask intrinsics should be
> -defined whenever avxvnni/avxifma or avx512vnni/avxifma &&
> -avx512vl exist.  */
> + /* "Unified" builtin used by either AVXVNNI/AVXIFMA/AES intrinsics
> +or AVX512VNNIVL/AVX512IFMAVL/VAESVL non-mask intrinsics should be
> +defined whenever avxvnni/avxifma/aes or avx512vnni/avx512ifma/vaes
> +&& avx512vl exist.  */
>   || (mask2 == OPTION_MASK_ISA2_AVXVNNI)
>   || (mask2 == OPTION_MASK_ISA2_AVXIFMA)
>   || (mask2 == (OPTION_MASK_ISA2_AVXNECONVERT
> | OPTION_MASK_ISA2_AVX512BF16))
> + || ((mask2 & OPTION_MASK_ISA2_VAES) != 0)
>   || (lang_hooks.builtin_function
>   == 
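As a concrete illustration of what the 128-bit AES intrinsics discussed in this thread compute, here is a minimal sketch of a single `_mm_aesenc_si128` round. It assumes an x86 host and GCC or Clang; `aes_round` and `check_aesenc_zero` are hypothetical helper names, and the round is compiled with `__attribute__((target("aes")))` rather than through the `-mvaes -mavx512vl` path this patch adds. With an all-zero state and round key, SubBytes maps every byte to 0x63 and MixColumns leaves a column of four equal bytes unchanged, so the expected result is sixteen 0x63 bytes.

```c
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* One AES encryption round: SubBytes, ShiftRows, MixColumns, then XOR
   with the round key.  target("aes") enables the intrinsic for this
   function only, so the file itself needs no -maes.  */
static __attribute__((target("aes"))) void
aes_round(const uint8_t state[16], const uint8_t key[16], uint8_t out[16])
{
  __m128i s = _mm_loadu_si128((const __m128i *) state);
  __m128i k = _mm_loadu_si128((const __m128i *) key);
  _mm_storeu_si128((__m128i *) out, _mm_aesenc_si128(s, k));
}
#endif

/* Returns 1 if aesenc(0, 0) yields sixteen 0x63 bytes, 0 if it does not,
   and -1 if AES-NI is unavailable (or the host is not x86).  */
int check_aesenc_zero(void)
{
#if defined(__x86_64__) || defined(__i386__)
  if (!__builtin_cpu_supports("aes"))
    return -1;
  uint8_t zero[16] = {0}, out[16], expect[16];
  memset(expect, 0x63, sizeof expect);
  aes_round(zero, zero, out);
  return memcmp(out, expect, sizeof out) == 0;
#else
  return -1;
#endif
}
```

With the patch applied, the same helper should presumably also compile when declared with `target("vaes,avx512vl")` instead of `target("aes")`; that is an assumption about the patched compiler and is not exercised here.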

Re: [PATCH] i386: Use macro to wrap up share builtin exceptions in builtin isa check

2023-04-18 Thread Hongtao Liu via Gcc-patches
On Tue, Apr 18, 2023 at 2:57 PM Haochen Jiang via Gcc-patches
 wrote:
>
> Hi all,
>
> Currently in i386, several ISAs share builtins with each other, which is
> handled in ix86_check_builtin_isa_match with if clauses.
>
> The patterns for these clauses are quite similar, so rewriting them as a
> macro will make them friendlier for developers.
>
> This patch adds that macro. Tested on x86_64-pc-linux-gnu. Ok for trunk?
Ok.
>
> BRs,
> Haochen
>
> gcc/ChangeLog:
>
> * config/i386/i386-expand.cc
> (ix86_check_builtin_isa_match): Correct wrong comments.
> Add a new macro SHARE_BUILTIN and refactor the current if
> clauses to macro.
> ---
>  gcc/config/i386/i386-expand.cc | 72 --
>  1 file changed, 24 insertions(+), 48 deletions(-)
>
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index 0d817fc3f3b..54d5dfae677 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -12588,6 +12588,7 @@ ix86_check_builtin_isa_match (unsigned int fcode,
>HOST_WIDE_INT isa2 = ix86_isa_flags2;
>HOST_WIDE_INT bisa = ix86_builtins_isa[fcode].isa;
>HOST_WIDE_INT bisa2 = ix86_builtins_isa[fcode].isa2;
> +  HOST_WIDE_INT tmp_isa = isa, tmp_isa2 = isa2;
>/* The general case is we require all the ISAs specified in bisa{,2}
>   to be enabled.
>   The exceptions are:
> @@ -12596,60 +12597,35 @@ ix86_check_builtin_isa_match (unsigned int fcode,
>   OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4
>   (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL) or
> OPTION_MASK_ISA2_AVXVNNI
> - (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512IFMA) or
> + (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512VL) or
> OPTION_MASK_ISA2_AVXIFMA
> - (OPTION_MASK_ISA_AVXNECONVERT | OPTION_MASK_ISA2_AVX512BF16) or
> + (OPTION_MASK_ISA_AVX512VL | OPTION_MASK_ISA2_AVX512BF16) or
> OPTION_MASK_ISA2_AVXNECONVERT
>   where for each such pair it is sufficient if either of the ISAs is
>   enabled, plus if it is ored with other options also those others.
>   OPTION_MASK_ISA_MMX in bisa is satisfied also if TARGET_MMX_WITH_SSE.  */
> -  if (((bisa & (OPTION_MASK_ISA_SSE | OPTION_MASK_ISA_3DNOW_A))
> -   == (OPTION_MASK_ISA_SSE | OPTION_MASK_ISA_3DNOW_A))
> -  && (isa & (OPTION_MASK_ISA_SSE | OPTION_MASK_ISA_3DNOW_A)) != 0)
> -isa |= (OPTION_MASK_ISA_SSE | OPTION_MASK_ISA_3DNOW_A);
>
> -  if (((bisa & (OPTION_MASK_ISA_SSE4_2 | OPTION_MASK_ISA_CRC32))
> -   == (OPTION_MASK_ISA_SSE4_2 | OPTION_MASK_ISA_CRC32))
> -  && (isa & (OPTION_MASK_ISA_SSE4_2 | OPTION_MASK_ISA_CRC32)) != 0)
> -isa |= (OPTION_MASK_ISA_SSE4_2 | OPTION_MASK_ISA_CRC32);
> -
> -  if (((bisa & (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4))
> -   == (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4))
> -  && (isa & (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4)) != 0)
> -isa |= (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4);
> -
> -  if bisa & (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL))
> -   == (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL))
> -   || (bisa2 & OPTION_MASK_ISA2_AVXVNNI) != 0)
> -  && (((isa & (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL))
> -  == (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL))
> - || (isa2 & OPTION_MASK_ISA2_AVXVNNI) != 0))
> -{
> -  isa |= OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL;
> -  isa2 |= OPTION_MASK_ISA2_AVXVNNI;
> -}
> -
> -  if bisa & (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512VL))
> -   == (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512VL))
> -   || (bisa2 & OPTION_MASK_ISA2_AVXIFMA) != 0)
> -  && (((isa & (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512VL))
> -  == (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512VL))
> - || (isa2 & OPTION_MASK_ISA2_AVXIFMA) != 0))
> -{
> -  isa |= OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512VL;
> -  isa2 |= OPTION_MASK_ISA2_AVXIFMA;
> -}
> -
> -  if bisa & OPTION_MASK_ISA_AVX512VL) != 0
> -&& (bisa2 & OPTION_MASK_ISA2_AVX512BF16) != 0)
> -   && (bisa2 & OPTION_MASK_ISA2_AVXNECONVERT) != 0)
> -   && (((isa & OPTION_MASK_ISA_AVX512VL) != 0
> -   && (isa2 & OPTION_MASK_ISA2_AVX512BF16) != 0)
> -  || (isa2 & OPTION_MASK_ISA2_AVXNECONVERT) != 0))
> -{
> -  isa |= OPTION_MASK_ISA_AVX512VL;
> -  isa2 |= OPTION_MASK_ISA2_AVXNECONVERT | OPTION_MASK_ISA2_AVX512BF16;
> -}
> +#define SHARE_BUILTIN(A1, A2, B1, B2) \
> +  if bisa & (A1)) == (A1) && (bisa2 & (A2)) == (A2)) \
> +   && ((bisa & (B1)) == (B1) && (bisa2 & (B2)) == (B2))) \
> +  && (((isa & (A1)) == (A1) && (isa2 & (A2)) == (A2)) \
> + || ((isa & (B1)) == (B1) && (isa2 & (B2)) == (B2 \
> +{ \
> +  tmp_isa |= (A1) | (B1); \
> +  tmp_isa2 |= (A2) | (B2); \
> +  
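The rule that `SHARE_BUILTIN` encodes can be modelled in isolation. The sketch below is a simplified, hypothetical version: it uses a single 32-bit mask instead of GCC's separate `isa`/`isa2` words, made-up `ISA_*` bit values, and only one shared pair (AVX512VNNI+AVX512VL versus AVXVNNI). If a builtin's requirement names both alternatives and the enabled set fully covers either one, the bits of both are treated as enabled before the ordinary "all required bits present" test.

```c
#include <stdint.h>

/* Hypothetical ISA bits for illustration only; the real values are the
   OPTION_MASK_ISA_* / OPTION_MASK_ISA2_* masks in GCC.  */
enum
{
  ISA_AVX512VNNI = 1u << 0,
  ISA_AVX512VL   = 1u << 1,
  ISA_AVXVNNI    = 1u << 2,
  ISA_SSE4_2     = 1u << 3
};

/* If BISA requires both alternative sets A and B, and ISA fully covers
   at least one of them, return ISA with every bit of A and B set.  */
static uint32_t
share_builtin(uint32_t bisa, uint32_t isa, uint32_t a, uint32_t b)
{
  if ((bisa & a) == a && (bisa & b) == b
      && ((isa & a) == a || (isa & b) == b))
    isa |= a | b;
  return isa;
}

/* Nonzero if a builtin requiring BISA may be used when ISA is enabled.  */
int
builtin_isa_match(uint32_t bisa, uint32_t isa)
{
  isa = share_builtin(bisa, isa, ISA_AVX512VNNI | ISA_AVX512VL, ISA_AVXVNNI);
  return (bisa & isa) == bisa;
}
```

For example, a builtin tagged with all three VNNI-related bits is accepted with either `ISA_AVXVNNI` alone or `ISA_AVX512VNNI | ISA_AVX512VL`, but not with `ISA_AVX512VNNI` alone.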

RE: [PATCH] Re-arrange sections of i386 cpuid

2023-04-18 Thread Liu, Hongtao via Gcc-patches



> -Original Message-
> From: Mo, Zewei 
> Sent: Wednesday, April 19, 2023 10:03 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com
> Subject: [PATCH] Re-arrange sections of i386 cpuid
> 
> Re-order i386 cpuid based on the order of CPUID.
> 
> gcc/ChangeLog:
> 
> * config/i386/cpuid.h: Open a new section for Extended Features
>   Leaf (%eax == 7, %ecx == 0) and Extended Features Sub-leaf (%eax == 7,
>   %ecx == 1).
Ok.
> ---
>  gcc/config/i386/cpuid.h | 35 +++
>  1 file changed, 19 insertions(+), 16 deletions(-)
> 
> diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
> index be162dd8c78..971781c2b91 100644
> --- a/gcc/config/i386/cpuid.h
> +++ b/gcc/config/i386/cpuid.h
> @@ -24,15 +24,6 @@
>  #ifndef _CPUID_H_INCLUDED
>  #define _CPUID_H_INCLUDED
> 
> -/* %eax */
> -#define bit_RAOINT   (1 << 3)
> -#define bit_AVXVNNI  (1 << 4)
> -#define bit_AVX512BF16   (1 << 5)
> -#define bit_CMPCCXADD(1 << 7)
> -#define bit_AMX_FP16 (1 << 21)
> -#define bit_HRESET   (1 << 22)
> -#define bit_AVXIFMA  (1 << 23)
> -
>  /* %ecx */
>  #define bit_SSE3 (1 << 0)
>  #define bit_PCLMUL   (1 << 1)
> @@ -52,10 +43,7 @@
>  #define bit_RDRND(1 << 30)
> 
>  /* %edx */
> -#define bit_AVXVNNIINT8 (1 << 4)
> -#define bit_AVXNECONVERT (1 << 5)
>  #define bit_CMPXCHG8B(1 << 8)
> -#define bit_PREFETCHI(1 << 14)
>  #define bit_CMOV (1 << 15)
>  #define bit_MMX  (1 << 23)
>  #define bit_FXSAVE   (1 << 24)
> @@ -84,7 +72,7 @@
>  #define bit_CLZERO   (1 << 0)
>  #define bit_WBNOINVD (1 << 9)
> 
> -/* Extended Features (%eax == 7) */
> +/* Extended Features Leaf (%eax == 7, %ecx == 0) */
>  /* %ebx */
>  #define bit_FSGSBASE (1 << 0)
>  #define bit_SGX (1 << 2)
> @@ -132,9 +120,9 @@
>  #define bit_AVX5124VNNIW (1 << 2)
>  #define bit_AVX5124FMAPS (1 << 3)
>  #define bit_AVX512VP2INTERSECT   (1 << 8)
> -#define bit_AVX512FP16   (1 << 23)
> -#define bit_IBT  (1 << 20)
> -#define bit_UINTR (1 << 5)
> +#define bit_AVX512FP16   (1 << 23)
> +#define bit_IBT (1 << 20)
> +#define bit_UINTR   (1 << 5)
>  #define bit_PCONFIG  (1 << 18)
>  #define bit_SERIALIZE(1 << 14)
>  #define bit_TSXLDTRK(1 << 16)
> @@ -142,6 +130,21 @@
>  #define bit_AMX_TILE(1 << 24)
>  #define bit_AMX_INT8(1 << 25)
> 
> +/* Extended Features Sub-leaf (%eax == 7, %ecx == 1) */
> +/* %eax */
> +#define bit_RAOINT  (1 << 3)
> +#define bit_AVXVNNI (1 << 4)
> +#define bit_AVX512BF16  (1 << 5)
> +#define bit_CMPCCXADD   (1 << 7)
> +#define bit_AMX_FP16(1 << 21)
> +#define bit_HRESET  (1 << 22)
> +#define bit_AVXIFMA (1 << 23)
> +
> +/* %edx */
> +#define bit_AVXVNNIINT8 (1 << 4)
> +#define bit_AVXNECONVERT (1 << 5)
> +#define bit_PREFETCHI (1 << 14)
> +
>  /* Extended State Enumeration Sub-leaf (%eax == 0xd, %ecx == 1) */
>  #define bit_XSAVEOPT (1 << 0)
>  #define bit_XSAVEC   (1 << 1)
> --
> 2.31.1



Re: [PATCH v5] gcc: Drop obsolete INCLUDE_PTHREAD_H

2023-04-18 Thread Sam James via Gcc-patches

Jeff Law  writes:

> On 4/2/23 15:33, Sam James wrote:
>> gcc/ChangeLog:
>>  * system.h: Drop unused INCLUDE_PTHREAD_H.
> Thanks.  I've pushed this to the trunk.

Cheers Jeff!

> jeff

best,
sam




[PATCH] Re-arrange sections of i386 cpuid

2023-04-18 Thread Mo, Zewei via Gcc-patches
Re-order i386 cpuid based on the order of CPUID.

gcc/ChangeLog:

* config/i386/cpuid.h: Open a new section for Extended Features
Leaf (%eax == 7, %ecx == 0) and Extended Features Sub-leaf (%eax == 7,
%ecx == 1).
---
 gcc/config/i386/cpuid.h | 35 +++
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index be162dd8c78..971781c2b91 100644
--- a/gcc/config/i386/cpuid.h
+++ b/gcc/config/i386/cpuid.h
@@ -24,15 +24,6 @@
 #ifndef _CPUID_H_INCLUDED
 #define _CPUID_H_INCLUDED
 
-/* %eax */
-#define bit_RAOINT (1 << 3)
-#define bit_AVXVNNI(1 << 4)
-#define bit_AVX512BF16 (1 << 5)
-#define bit_CMPCCXADD  (1 << 7)
-#define bit_AMX_FP16   (1 << 21)
-#define bit_HRESET (1 << 22)
-#define bit_AVXIFMA(1 << 23)
-
 /* %ecx */
 #define bit_SSE3   (1 << 0)
 #define bit_PCLMUL (1 << 1)
@@ -52,10 +43,7 @@
 #define bit_RDRND  (1 << 30)
 
 /* %edx */
-#define bit_AVXVNNIINT8 (1 << 4)
-#define bit_AVXNECONVERT (1 << 5)
 #define bit_CMPXCHG8B  (1 << 8)
-#define bit_PREFETCHI  (1 << 14)
 #define bit_CMOV   (1 << 15)
 #define bit_MMX(1 << 23)
 #define bit_FXSAVE (1 << 24)
@@ -84,7 +72,7 @@
 #define bit_CLZERO (1 << 0)
 #define bit_WBNOINVD   (1 << 9)
 
-/* Extended Features (%eax == 7) */
+/* Extended Features Leaf (%eax == 7, %ecx == 0) */
 /* %ebx */
 #define bit_FSGSBASE   (1 << 0)
 #define bit_SGX (1 << 2)
@@ -132,9 +120,9 @@
 #define bit_AVX5124VNNIW (1 << 2)
 #define bit_AVX5124FMAPS (1 << 3)
 #define bit_AVX512VP2INTERSECT (1 << 8)
-#define bit_AVX512FP16   (1 << 23)
-#define bit_IBT(1 << 20)
-#define bit_UINTR (1 << 5)
+#define bit_AVX512FP16 (1 << 23)
+#define bit_IBT (1 << 20)
+#define bit_UINTR   (1 << 5)
 #define bit_PCONFIG(1 << 18)
 #define bit_SERIALIZE  (1 << 14)
 #define bit_TSXLDTRK(1 << 16)
@@ -142,6 +130,21 @@
 #define bit_AMX_TILE(1 << 24)
 #define bit_AMX_INT8(1 << 25)
 
+/* Extended Features Sub-leaf (%eax == 7, %ecx == 1) */
+/* %eax */
+#define bit_RAOINT  (1 << 3)
+#define bit_AVXVNNI (1 << 4)
+#define bit_AVX512BF16  (1 << 5)
+#define bit_CMPCCXADD   (1 << 7)
+#define bit_AMX_FP16(1 << 21)
+#define bit_HRESET  (1 << 22)
+#define bit_AVXIFMA (1 << 23)
+
+/* %edx */
+#define bit_AVXVNNIINT8 (1 << 4)
+#define bit_AVXNECONVERT (1 << 5)
+#define bit_PREFETCHI (1 << 14)
+
 /* Extended State Enumeration Sub-leaf (%eax == 0xd, %ecx == 1) */
 #define bit_XSAVEOPT   (1 << 0)
 #define bit_XSAVEC (1 << 1)
-- 
2.31.1
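The new section header matters because the bits moved into it are only meaningful when CPUID is executed with %eax == 7 and %ecx == 1. A minimal sketch of querying one of them follows, assuming GCC's or Clang's `<cpuid.h>` on x86, which provides `__get_cpuid_count`. `cpu_has_avxvnni` is a hypothetical helper, and the `#ifndef` fallback copies the bit position from the header in this patch for compilers whose `cpuid.h` predates `bit_AVXVNNI`.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* AVX-VNNI is reported in %eax of CPUID leaf 7, sub-leaf 1.  */
#ifndef bit_AVXVNNI
#define bit_AVXVNNI (1 << 4)
#endif

/* Returns true if the CPU advertises AVX-VNNI.  This reads only the CPUID
   bit; it does not check that the OS has enabled AVX state (XGETBV).  */
bool cpu_has_avxvnni(void)
{
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax, ebx, ecx, edx;

  /* Sub-leaf 0 of leaf 7 reports the highest valid sub-leaf in %eax;
     __get_cpuid_count returns 0 if leaf 7 itself is unsupported.  */
  if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) || eax < 1)
    return false;
  if (!__get_cpuid_count(7, 1, &eax, &ebx, &ecx, &edx))
    return false;
  return (eax & bit_AVXVNNI) != 0;
#else
  return false;
#endif
}
```

Reading the same bit with %ecx == 0 would test an unrelated feature, which is the confusion the old single "%eax" section at the top of the file invited.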



[PATCH] PR testsuite/106879 FAIL: gcc.dg/vect/bb-slp-layout-19.c on powerpc64

2023-04-18 Thread Jiufu Guo via Gcc-patches
Hi,

On P7, the option -mno-allow-movmisalign is added during testing, which
prevents SLP from happening in this case.

Like PR65484 and PR87306, this patch uses vect_hw_misalign to guard
the case on powerpc targets.

Tested on ppc64{le,} and x86_64.
Is this ok for trunk?

BR,
Jeff (Jiufu)

gcc/testsuite/ChangeLog:

PR testsuite/106879
* gcc.dg/vect/bb-slp-layout-19.c: Modify to guard the check with
vect_hw_misalign on POWER.

---
 gcc/testsuite/gcc.dg/vect/bb-slp-layout-19.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-layout-19.c b/gcc/testsuite/gcc.dg/vect/bb-slp-layout-19.c
index f075a83a25b..faf98e8d3c0 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-layout-19.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-layout-19.c
@@ -31,4 +31,9 @@ void f()
   e[3] = b3;
 }
 
-/* { dg-final { scan-tree-dump-times "add new stmt: \[^\\n\\r\]* = VEC_PERM_EXPR" 3 "slp1" { target { vect_int_mult && vect_perm } } } } */
+/* On older powerpc hardware (POWER7 and earlier), the default flag
+   -mno-allow-movmisalign prevents vectorization.  On POWER8 and later,
+   when vect_hw_misalign is true, vectorization occurs.  For other
+   targets, ! vect_no_align is a sufficient test.  */
+
+/* { dg-final { scan-tree-dump-times "add new stmt: \[^\\n\\r\]* = VEC_PERM_EXPR" 3 "slp1" { target { { vect_int_mult && vect_perm } && { { ! powerpc*-*-* } || { vect_hw_misalign } } } } } } */
-- 
2.31.1



Re: [PATCH] i386: Add PCLMUL dependency for VPCLMULQDQ

2023-04-18 Thread Hongtao Liu via Gcc-patches
On Wed, Apr 19, 2023 at 9:54 AM Hongtao Liu  wrote:
>
> On Tue, Apr 18, 2023 at 3:18 PM Haochen Jiang via Gcc-patches
>  wrote:
> >
> > Hi all,
> >
> > Currently in GCC, the 128-bit intrinsic for the instruction vpclmulqdq is
> > under the PCLMUL ISA. Because there is no dependency between the ISA sets
> > PCLMUL and VPCLMULQDQ, the 128-bit intrinsic is not available when we just
> > use the compiler flag -mvpclmulqdq, although according to the Intel SDM it
> > should be.
> >
> > Since VPCLMULQDQ is a VEX/EVEX promotion for PCLMUL, it is natural to
> > add dependency between them.
> >
> > Also, with -mvpclmulqdq, we can use ymm under VEX encoding, so
> > VPCLMULQDQ should imply AVX.
> >
> > Tested on x86_64-pc-linux-gnu. Ok for trunk?
> >
> > BRs,
> > Haochen
> >
> > gcc/ChangeLog:
> >
> > * common/config/i386/i386-common.cc
> > (OPTION_MASK_ISA_VPCLMULQDQ_SET):
> > Add OPTION_MASK_ISA_PCLMUL_SET and OPTION_MASK_ISA_AVX_SET.
> > (OPTION_MASK_ISA_AVX_UNSET):
> > Add OPTION_MASK_ISA_VPCLMULQDQ_UNSET.
> > (OPTION_MASK_ISA_PCLMUL_UNSET): Ditto.
> > * config/i386/i386.md (vpclmulqdqvl): New.
> > * config/i386/sse.md (pclmulqdq): Add evex encoding.
> > * config/i386/vpclmulqdqintrin.h: Remove redundant avx target
> > push.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/vpclmulqdq.c: Add compile test for xmm.
> > ---
> >  gcc/common/config/i386/i386-common.cc  |  9 ++---
> >  gcc/config/i386/i386.md|  4 +++-
> >  gcc/config/i386/sse.md | 11 ++-
> >  gcc/config/i386/vpclmulqdqintrin.h |  4 ++--
> >  gcc/testsuite/gcc.target/i386/vpclmulqdq.c |  3 +++
> >  5 files changed, 20 insertions(+), 11 deletions(-)
> >
> > diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
> > index 315db854862..c7954da8e34 100644
> > --- a/gcc/common/config/i386/i386-common.cc
> > +++ b/gcc/common/config/i386/i386-common.cc
> > @@ -171,7 +171,9 @@ along with GCC; see the file COPYING3.  If not see
> >  #define OPTION_MASK_ISA_GFNI_SET OPTION_MASK_ISA_GFNI
> >  #define OPTION_MASK_ISA_SHSTK_SET OPTION_MASK_ISA_SHSTK
> >  #define OPTION_MASK_ISA2_VAES_SET OPTION_MASK_ISA2_VAES
> > -#define OPTION_MASK_ISA_VPCLMULQDQ_SET OPTION_MASK_ISA_VPCLMULQDQ
> > +#define OPTION_MASK_ISA_VPCLMULQDQ_SET \
> > +  (OPTION_MASK_ISA_VPCLMULQDQ | OPTION_MASK_ISA_PCLMUL_SET \
> > +   | OPTION_MASK_ISA_AVX_SET)
> >  #define OPTION_MASK_ISA_MOVDIRI_SET OPTION_MASK_ISA_MOVDIRI
> >  #define OPTION_MASK_ISA2_MOVDIR64B_SET OPTION_MASK_ISA2_MOVDIR64B
> >  #define OPTION_MASK_ISA2_WAITPKG_SET OPTION_MASK_ISA2_WAITPKG
> > @@ -211,7 +213,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #define OPTION_MASK_ISA_AVX_UNSET \
> >(OPTION_MASK_ISA_AVX | OPTION_MASK_ISA_FMA_UNSET \
> > | OPTION_MASK_ISA_FMA4_UNSET | OPTION_MASK_ISA_F16C_UNSET \
> > -   | OPTION_MASK_ISA_AVX2_UNSET )
> > +   | OPTION_MASK_ISA_AVX2_UNSET | OPTION_MASK_ISA_VPCLMULQDQ_UNSET)
> >  #define OPTION_MASK_ISA_FMA_UNSET OPTION_MASK_ISA_FMA
> >  #define OPTION_MASK_ISA_FXSR_UNSET OPTION_MASK_ISA_FXSR
> >  #define OPTION_MASK_ISA_XSAVE_UNSET \
> > @@ -314,7 +316,8 @@ along with GCC; see the file COPYING3.  If not see
> >
> >  #define OPTION_MASK_ISA_AES_UNSET OPTION_MASK_ISA_AES
> >  #define OPTION_MASK_ISA_SHA_UNSET OPTION_MASK_ISA_SHA
> > -#define OPTION_MASK_ISA_PCLMUL_UNSET OPTION_MASK_ISA_PCLMUL
> > +#define OPTION_MASK_ISA_PCLMUL_UNSET \
> > +  (OPTION_MASK_ISA_PCLMUL | OPTION_MASK_ISA_VPCLMULQDQ_UNSET)
> >  #define OPTION_MASK_ISA_ABM_UNSET OPTION_MASK_ISA_ABM
> >  #define OPTION_MASK_ISA2_PCONFIG_UNSET OPTION_MASK_ISA2_PCONFIG
> >  #define OPTION_MASK_ISA2_WBNOINVD_UNSET OPTION_MASK_ISA2_WBNOINVD
> > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > index ed689b044c3..acc994226e7 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -841,7 +841,7 @@
> > 
> > avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,
> > avx512bw,noavx512bw,avx512dq,noavx512dq,fma_or_avx512vl,
> > 
> > avx512vl,noavx512vl,avxvnni,avx512vnnivl,avx512fp16,avxifma,
> > -   avx512ifmavl,avxneconvert,avx512bf16vl"
> > +   avx512ifmavl,avxneconvert,avx512bf16vl,vpclmulqdqvl"
> >(const_string "base"))
> >
> >  ;; Define instruction set of MMX instructions
> > @@ -903,6 +903,8 @@
> >  (eq_attr "isa" "avxneconvert") (symbol_ref "TARGET_AVXNECONVERT")
> >  (eq_attr "isa" "avx512bf16vl")
> >(symbol_ref "TARGET_AVX512BF16 && TARGET_AVX512VL")
> > +(eq_attr "isa" "vpclmulqdqvl")
> > +  (symbol_ref "TARGET_VPCLMULQDQ && TARGET_AVX512VL")
> >
> >  (eq_attr "mmx_isa" "native")
> >(symbol_ref "!TARGET_MMX_WITH_SSE")
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index 26812ab6106..33e281901cf 100644
> > --- 
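To make the PCLMUL/VPCLMULQDQ discussion concrete, here is a minimal sketch of what the 128-bit intrinsic `_mm_clmulepi64_si128` computes: a 64x64 to 128-bit carry-less product. `clmul64`, `clmul64_hw` and `check_clmul_hw` are hypothetical helper names. The portable reference runs anywhere; the hardware path assumes x86-64 with GCC or Clang and uses `target("pclmul")`, so it does not depend on the `-mvpclmulqdq` implication this patch introduces.

```c
#include <stdint.h>

#if defined(__x86_64__)
#include <immintrin.h>
#endif

/* Portable carry-less multiply: schoolbook multiplication in which
   partial products are combined with XOR, so no carries propagate.  */
void clmul64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
  uint64_t h = 0, l = 0;
  for (int i = 0; i < 64; i++)
    if ((b >> i) & 1)
      {
        l ^= a << i;
        if (i != 0)
          h ^= a >> (64 - i);   /* bits shifted out of the low half */
      }
  *hi = h;
  *lo = l;
}

#if defined(__x86_64__)
/* Same product via the intrinsic; immediate 0x00 selects the low
   quadword of each operand.  */
static __attribute__((target("pclmul"))) void
clmul64_hw(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
  __m128i r = _mm_clmulepi64_si128(_mm_cvtsi64_si128((long long) a),
                                   _mm_cvtsi64_si128((long long) b), 0x00);
  *lo = (uint64_t) _mm_cvtsi128_si64(r);
  *hi = (uint64_t) _mm_cvtsi128_si64(_mm_unpackhi_epi64(r, r));
}
#endif

/* 1 if hardware and reference agree, 0 if not, -1 if PCLMUL is
   unavailable or the host is not x86-64.  */
int check_clmul_hw(uint64_t a, uint64_t b)
{
#if defined(__x86_64__)
  uint64_t h1, l1, h2, l2;
  if (!__builtin_cpu_supports("pclmul"))
    return -1;
  clmul64(a, b, &h1, &l1);
  clmul64_hw(a, b, &h2, &l2);
  return h1 == h2 && l1 == l2;
#else
  (void) a;
  (void) b;
  return -1;
#endif
}
```

For example, `clmul64(3, 3, ...)` yields 5 rather than 9, because (x + 1)^2 = x^2 + 1 over GF(2).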

Re: [PATCH] i386: Add PCLMUL dependency for VPCLMULQDQ

2023-04-18 Thread Hongtao Liu via Gcc-patches
On Tue, Apr 18, 2023 at 3:18 PM Haochen Jiang via Gcc-patches
 wrote:
>
> Hi all,
>
> Currently in GCC, the 128-bit intrinsic for the instruction vpclmulqdq is
> under the PCLMUL ISA. Because there is no dependency between the ISA sets
> PCLMUL and VPCLMULQDQ, the 128-bit intrinsic is not available when we just
> use the compiler flag -mvpclmulqdq, although according to the Intel SDM it
> should be.
>
> Since VPCLMULQDQ is a VEX/EVEX promotion for PCLMUL, it is natural to
> add dependency between them.
>
> Also, with -mvpclmulqdq, we can use ymm under VEX encoding, so
> VPCLMULQDQ should imply AVX.
>
> Tested on x86_64-pc-linux-gnu. Ok for trunk?
>
> BRs,
> Haochen
>
> gcc/ChangeLog:
>
> * common/config/i386/i386-common.cc
> (OPTION_MASK_ISA_VPCLMULQDQ_SET):
> Add OPTION_MASK_ISA_PCLMUL_SET and OPTION_MASK_ISA_AVX_SET.
> (OPTION_MASK_ISA_AVX_UNSET):
> Add OPTION_MASK_ISA_VPCLMULQDQ_UNSET.
> (OPTION_MASK_ISA_PCLMUL_UNSET): Ditto.
> * config/i386/i386.md (vpclmulqdqvl): New.
> * config/i386/sse.md (pclmulqdq): Add evex encoding.
> * config/i386/vpclmulqdqintrin.h: Remove redundant avx target
> push.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/vpclmulqdq.c: Add compile test for xmm.
> ---
>  gcc/common/config/i386/i386-common.cc  |  9 ++---
>  gcc/config/i386/i386.md|  4 +++-
>  gcc/config/i386/sse.md | 11 ++-
>  gcc/config/i386/vpclmulqdqintrin.h |  4 ++--
>  gcc/testsuite/gcc.target/i386/vpclmulqdq.c |  3 +++
>  5 files changed, 20 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
> index 315db854862..c7954da8e34 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -171,7 +171,9 @@ along with GCC; see the file COPYING3.  If not see
>  #define OPTION_MASK_ISA_GFNI_SET OPTION_MASK_ISA_GFNI
>  #define OPTION_MASK_ISA_SHSTK_SET OPTION_MASK_ISA_SHSTK
>  #define OPTION_MASK_ISA2_VAES_SET OPTION_MASK_ISA2_VAES
> -#define OPTION_MASK_ISA_VPCLMULQDQ_SET OPTION_MASK_ISA_VPCLMULQDQ
> +#define OPTION_MASK_ISA_VPCLMULQDQ_SET \
> +  (OPTION_MASK_ISA_VPCLMULQDQ | OPTION_MASK_ISA_PCLMUL_SET \
> +   | OPTION_MASK_ISA_AVX_SET)
>  #define OPTION_MASK_ISA_MOVDIRI_SET OPTION_MASK_ISA_MOVDIRI
>  #define OPTION_MASK_ISA2_MOVDIR64B_SET OPTION_MASK_ISA2_MOVDIR64B
>  #define OPTION_MASK_ISA2_WAITPKG_SET OPTION_MASK_ISA2_WAITPKG
> @@ -211,7 +213,7 @@ along with GCC; see the file COPYING3.  If not see
>  #define OPTION_MASK_ISA_AVX_UNSET \
>(OPTION_MASK_ISA_AVX | OPTION_MASK_ISA_FMA_UNSET \
> | OPTION_MASK_ISA_FMA4_UNSET | OPTION_MASK_ISA_F16C_UNSET \
> -   | OPTION_MASK_ISA_AVX2_UNSET )
> +   | OPTION_MASK_ISA_AVX2_UNSET | OPTION_MASK_ISA_VPCLMULQDQ_UNSET)
>  #define OPTION_MASK_ISA_FMA_UNSET OPTION_MASK_ISA_FMA
>  #define OPTION_MASK_ISA_FXSR_UNSET OPTION_MASK_ISA_FXSR
>  #define OPTION_MASK_ISA_XSAVE_UNSET \
> @@ -314,7 +316,8 @@ along with GCC; see the file COPYING3.  If not see
>
>  #define OPTION_MASK_ISA_AES_UNSET OPTION_MASK_ISA_AES
>  #define OPTION_MASK_ISA_SHA_UNSET OPTION_MASK_ISA_SHA
> -#define OPTION_MASK_ISA_PCLMUL_UNSET OPTION_MASK_ISA_PCLMUL
> +#define OPTION_MASK_ISA_PCLMUL_UNSET \
> +  (OPTION_MASK_ISA_PCLMUL | OPTION_MASK_ISA_VPCLMULQDQ_UNSET)
>  #define OPTION_MASK_ISA_ABM_UNSET OPTION_MASK_ISA_ABM
>  #define OPTION_MASK_ISA2_PCONFIG_UNSET OPTION_MASK_ISA2_PCONFIG
>  #define OPTION_MASK_ISA2_WBNOINVD_UNSET OPTION_MASK_ISA2_WBNOINVD
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index ed689b044c3..acc994226e7 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -841,7 +841,7 @@
> avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,
> avx512bw,noavx512bw,avx512dq,noavx512dq,fma_or_avx512vl,
> avx512vl,noavx512vl,avxvnni,avx512vnnivl,avx512fp16,avxifma,
> -   avx512ifmavl,avxneconvert,avx512bf16vl"
> +   avx512ifmavl,avxneconvert,avx512bf16vl,vpclmulqdqvl"
>(const_string "base"))
>
>  ;; Define instruction set of MMX instructions
> @@ -903,6 +903,8 @@
>  (eq_attr "isa" "avxneconvert") (symbol_ref "TARGET_AVXNECONVERT")
>  (eq_attr "isa" "avx512bf16vl")
>(symbol_ref "TARGET_AVX512BF16 && TARGET_AVX512VL")
> +(eq_attr "isa" "vpclmulqdqvl")
> +  (symbol_ref "TARGET_VPCLMULQDQ && TARGET_AVX512VL")
>
>  (eq_attr "mmx_isa" "native")
>(symbol_ref "!TARGET_MMX_WITH_SSE")
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 26812ab6106..33e281901cf 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -25195,20 +25195,21 @@
> (set_attr "mode" "TI")])
>
>  (define_insn "pclmulqdq"
> -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> -   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> -  

Re: [PATCH] i386: Fix vpblendm{b,w} intrins and insns

2023-04-18 Thread Hongtao Liu via Gcc-patches
On Tue, Apr 18, 2023 at 3:15 PM Haochen Jiang via Gcc-patches
 wrote:
>
> Hi all,
>
> For vpblendm{b,w}, the intrinsics do not actually have constant parameters.
> Therefore, there is no need for them to be wrapped in __OPTIMIZE__.
>
> Also, we should check TARGET_AVX512VL for 128/256 bit vectors in patterns.
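The `__OPTIMIZE__` split in the intrinsic headers exists for intrinsics whose operand must reach the builtin as an immediate: without optimization, the always-inline function body would not see a constant, so a macro form is used instead. The blend intrinsics take their mask as an ordinary run-time value, as this scalar model of the documented `_mm_mask_blend_epi16` semantics shows (lane i takes `__W[i]` when mask bit i is set, otherwise `__A[i]`). The function name is hypothetical; it models the semantics, not GCC's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of _mm_mask_blend_epi16 (8 x 16-bit lanes): lane i comes
   from w when bit i of the mask is set, otherwise from a.  The mask is an
   ordinary run-time value, not an immediate.  */
static void
mask_blend_epi16_model (uint8_t mask, const int16_t a[8],
                        const int16_t w[8], int16_t out[8])
{
  for (int i = 0; i < 8; i++)
    out[i] = ((mask >> i) & 1) ? w[i] : a[i];
}
```

Since nothing here has to be folded to a constant at compile time, the plain inline-function form also works at -O0.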
>
> This patch makes the fixes mentioned above. Tested on x86_64-pc-linux-gnu.
> Ok for trunk?
Ok.
>
> BRs,
> Haochen
>
> gcc/ChangeLog:
>
> * config/i386/avx512vlbwintrin.h
> (_mm_mask_blend_epi16): Remove __OPTIMIZE__ wrapper.
> (_mm_mask_blend_epi8): Ditto.
> (_mm256_mask_blend_epi16): Ditto.
> (_mm256_mask_blend_epi8): Ditto.
> * config/i386/avx512vlintrin.h
> (_mm256_mask_blend_pd): Ditto.
> (_mm256_mask_blend_ps): Ditto.
> (_mm256_mask_blend_epi64): Ditto.
> (_mm256_mask_blend_epi32): Ditto.
> (_mm_mask_blend_pd): Ditto.
> (_mm_mask_blend_ps): Ditto.
> (_mm_mask_blend_epi64): Ditto.
> (_mm_mask_blend_epi32): Ditto.
> * config/i386/sse.md (VF_AVX512BWHFBF16): Removed.
> (VF_AVX512HFBFVL): Move it before the first usage.
> (_blendm): Change iterator from VF_AVX512BWHFBF16
> to VF_AVX512HFBFVL.
> ---
>  gcc/config/i386/avx512vlbwintrin.h |  92 ++-
>  gcc/config/i386/avx512vlintrin.h   | 184 +++--
>  gcc/config/i386/sse.md |  17 ++-
>  3 files changed, 115 insertions(+), 178 deletions(-)
>
> diff --git a/gcc/config/i386/avx512vlbwintrin.h b/gcc/config/i386/avx512vlbwintrin.h
> index 0232783a362..9d2aba2a8ff 100644
> --- a/gcc/config/i386/avx512vlbwintrin.h
> +++ b/gcc/config/i386/avx512vlbwintrin.h
> @@ -257,6 +257,42 @@ _mm_maskz_loadu_epi8 (__mmask16 __U, void const *__P)
>  (__mmask16) __U);
>  }
>
> +extern __inline __m128i
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_mask_blend_epi16 (__mmask8 __U, __m128i __A, __m128i __W)
> +{
> +  return (__m128i) __builtin_ia32_blendmw_128_mask ((__v8hi) __A,
> +   (__v8hi) __W,
> +   (__mmask8) __U);
> +}
> +
> +extern __inline __m128i
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_mask_blend_epi8 (__mmask16 __U, __m128i __A, __m128i __W)
> +{
> +  return (__m128i) __builtin_ia32_blendmb_128_mask ((__v16qi) __A,
> +   (__v16qi) __W,
> +   (__mmask16) __U);
> +}
> +
> +extern __inline __m256i
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_mask_blend_epi16 (__mmask16 __U, __m256i __A, __m256i __W)
> +{
> +  return (__m256i) __builtin_ia32_blendmw_256_mask ((__v16hi) __A,
> +   (__v16hi) __W,
> +   (__mmask16) __U);
> +}
> +
> +extern __inline __m256i
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_mask_blend_epi8 (__mmask32 __U, __m256i __A, __m256i __W)
> +{
> +  return (__m256i) __builtin_ia32_blendmb_256_mask ((__v32qi) __A,
> +   (__v32qi) __W,
> +   (__mmask32) __U);
> +}
> +
>  extern __inline __m128i
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm256_cvtepi16_epi8 (__m256i __A)
> @@ -1442,42 +1478,6 @@ _mm_maskz_dbsad_epu8 (__mmask8 __U, __m128i __A, __m128i __B,
> (__mmask8) __U);
>  }
>
> -extern __inline __m128i
> -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm_mask_blend_epi16 (__mmask8 __U, __m128i __A, __m128i __W)
> -{
> -  return (__m128i) __builtin_ia32_blendmw_128_mask ((__v8hi) __A,
> -   (__v8hi) __W,
> -   (__mmask8) __U);
> -}
> -
> -extern __inline __m128i
> -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm_mask_blend_epi8 (__mmask16 __U, __m128i __A, __m128i __W)
> -{
> -  return (__m128i) __builtin_ia32_blendmb_128_mask ((__v16qi) __A,
> -   (__v16qi) __W,
> -   (__mmask16) __U);
> -}
> -
> -extern __inline __m256i
> -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm256_mask_blend_epi16 (__mmask16 __U, __m256i __A, __m256i __W)
> -{
> -  return (__m256i) __builtin_ia32_blendmw_256_mask ((__v16hi) __A,
> -   (__v16hi) __W,
> -   (__mmask16) __U);
> -}
> -
> -extern __inline __m256i
> -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> 

Re: [PATCH] i386: Add reduce_*_ep[i|u][8|16] series intrinsics

2023-04-18 Thread Hongtao Liu via Gcc-patches
On Tue, Apr 18, 2023 at 3:13 PM Hu, Lin1 via Gcc-patches
 wrote:
>
> More details: the Intrinsics Guide adds these 128/256-bit intrinsics, as shown here:
> https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=reduce__expand=5814.
>
> So we intend to enable these intrinsics for GCC-14.
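The masked and unmasked reductions differ only in how inactive lanes are treated. A scalar reference model like the one below can help when writing run-time tests. It assumes the documented semantics in which masked-off lanes contribute nothing, modeled here by substituting the operation's identity element, and that 16-bit addition wraps. The halving loop mirrors the log2(n) shuffle-and-combine shape a vector implementation typically uses; all helper names are hypothetical and this is not the code in avx2intrin.h.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t (*op16_fn) (int16_t, int16_t);

/* Lane operations; addition wraps modulo 2^16 like a vector lane does.  */
static int16_t op_add (int16_t x, int16_t y) { return (int16_t) (uint16_t) (x + y); }
static int16_t op_max (int16_t x, int16_t y) { return x > y ? x : y; }

/* Halving reduction over n lanes (n a power of two, n <= 32): each step
   folds the upper half into the lower half, log2(n) steps in total.  */
static int16_t
reduce_epi16_model (const int16_t *v, int n, op16_fn op)
{
  int16_t tmp[32];
  assert (n > 0 && n <= 32 && (n & (n - 1)) == 0);
  for (int i = 0; i < n; i++)
    tmp[i] = v[i];
  for (int half = n / 2; half > 0; half /= 2)
    for (int i = 0; i < half; i++)
      tmp[i] = op (tmp[i], tmp[i + half]);
  return tmp[0];
}

/* Masked form: inactive lanes are replaced by the operation's identity
   (0 for add, INT16_MIN for signed max) before reducing.  */
static int16_t
mask_reduce_epi16_model (uint32_t mask, const int16_t *v, int n,
                         op16_fn op, int16_t identity)
{
  int16_t tmp[32];
  assert (n > 0 && n <= 32);
  for (int i = 0; i < n; i++)
    tmp[i] = ((mask >> i) & 1) ? v[i] : identity;
  return reduce_epi16_model (tmp, n, op);
}
```

The same model covers the 8-bit variants by swapping the element type and the identity values.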
>
> -Original Message-
> From: Gcc-patches  On Behalf Of Hu, Lin1 via Gcc-patches
> Sent: Tuesday, April 18, 2023 3:03 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com
> Subject: [PATCH] i386: Add reduce_*_ep[i|u][8|16] series intrinsics
>
> Hi all,
>
> The patch aims to support the reduce_*_ep[i|u][8|16] series of intrinsics,
> and has been tested on x86_64-pc-linux-gnu. OK for trunk?
Ok.
>
> BRs,
> Lin
>
> gcc/ChangeLog:
>
> * config/i386/avx2intrin.h
> (_MM_REDUCE_OPERATOR_BASIC_EPI16): New macro.
> (_MM_REDUCE_OPERATOR_MAX_MIN_EP16): Ditto.
> (_MM256_REDUCE_OPERATOR_BASIC_EPI16): Ditto.
> (_MM256_REDUCE_OPERATOR_MAX_MIN_EP16): Ditto.
> (_MM_REDUCE_OPERATOR_BASIC_EPI8): Ditto.
> (_MM_REDUCE_OPERATOR_MAX_MIN_EP8): Ditto.
> (_MM256_REDUCE_OPERATOR_BASIC_EPI8): Ditto.
> (_MM256_REDUCE_OPERATOR_MAX_MIN_EP8): Ditto.
> (_mm_reduce_add_epi16): New intrinsics.
> (_mm_reduce_mul_epi16): Ditto.
> (_mm_reduce_and_epi16): Ditto.
> (_mm_reduce_or_epi16): Ditto.
> (_mm_reduce_max_epi16): Ditto.
> (_mm_reduce_max_epu16): Ditto.
> (_mm_reduce_min_epi16): Ditto.
> (_mm_reduce_min_epu16): Ditto.
> (_mm256_reduce_add_epi16): Ditto.
> (_mm256_reduce_mul_epi16): Ditto.
> (_mm256_reduce_and_epi16): Ditto.
> (_mm256_reduce_or_epi16): Ditto.
> (_mm256_reduce_max_epi16): Ditto.
> (_mm256_reduce_max_epu16): Ditto.
> (_mm256_reduce_min_epi16): Ditto.
> (_mm256_reduce_min_epu16): Ditto.
> (_mm_reduce_add_epi8): Ditto.
> (_mm_reduce_mul_epi8): Ditto.
> (_mm_reduce_and_epi8): Ditto.
> (_mm_reduce_or_epi8): Ditto.
> (_mm_reduce_max_epi8): Ditto.
> (_mm_reduce_max_epu8): Ditto.
> (_mm_reduce_min_epi8): Ditto.
> (_mm_reduce_min_epu8): Ditto.
> (_mm256_reduce_add_epi8): Ditto.
> (_mm256_reduce_mul_epi8): Ditto.
> (_mm256_reduce_and_epi8): Ditto.
> (_mm256_reduce_or_epi8): Ditto.
> (_mm256_reduce_max_epi8): Ditto.
> (_mm256_reduce_max_epu8): Ditto.
> (_mm256_reduce_min_epi8): Ditto.
> (_mm256_reduce_min_epu8): Ditto.
> * config/i386/avx512vlbwintrin.h:
> (_mm_mask_reduce_add_epi16): Ditto.
> (_mm_mask_reduce_mul_epi16): Ditto.
> (_mm_mask_reduce_and_epi16): Ditto.
> (_mm_mask_reduce_or_epi16): Ditto.
> (_mm_mask_reduce_max_epi16): Ditto.
> (_mm_mask_reduce_max_epu16): Ditto.
> (_mm_mask_reduce_min_epi16): Ditto.
> (_mm_mask_reduce_min_epu16): Ditto.
> (_mm256_mask_reduce_add_epi16): Ditto.
> (_mm256_mask_reduce_mul_epi16): Ditto.
> (_mm256_mask_reduce_and_epi16): Ditto.
> (_mm256_mask_reduce_or_epi16): Ditto.
> (_mm256_mask_reduce_max_epi16): Ditto.
> (_mm256_mask_reduce_max_epu16): Ditto.
> (_mm256_mask_reduce_min_epi16): Ditto.
> (_mm256_mask_reduce_min_epu16): Ditto.
> (_mm_mask_reduce_add_epi8): Ditto.
> (_mm_mask_reduce_mul_epi8): Ditto.
> (_mm_mask_reduce_and_epi8): Ditto.
> (_mm_mask_reduce_or_epi8): Ditto.
> (_mm_mask_reduce_max_epi8): Ditto.
> (_mm_mask_reduce_max_epu8): Ditto.
> (_mm_mask_reduce_min_epi8): Ditto.
> (_mm_mask_reduce_min_epu8): Ditto.
> (_mm256_mask_reduce_add_epi8): Ditto.
> (_mm256_mask_reduce_mul_epi8): Ditto.
> (_mm256_mask_reduce_and_epi8): Ditto.
> (_mm256_mask_reduce_or_epi8): Ditto.
> (_mm256_mask_reduce_max_epi8): Ditto.
> (_mm256_mask_reduce_max_epu8): Ditto.
> (_mm256_mask_reduce_min_epi8): Ditto.
> (_mm256_mask_reduce_min_epu8): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx512vlbw-reduce-op-1.c: New test.
> ---
>  gcc/config/i386/avx2intrin.h  | 347 ++
>  gcc/config/i386/avx512vlbwintrin.h| 256 +
>  .../gcc.target/i386/avx512vlbw-reduce-op-1.c  | 206 +++
>  3 files changed, 809 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512vlbw-reduce-op-1.c
>
> diff --git a/gcc/config/i386/avx2intrin.h b/gcc/config/i386/avx2intrin.h 
> index 1b9c8169a96..9b8c13b7233 100644
> --- a/gcc/config/i386/avx2intrin.h
> +++ b/gcc/config/i386/avx2intrin.h
> @@ -1915,6 +1915,353 @@ _mm256_mask_i64gather_epi32 (__m128i __src, int const *__base,
>(int) (SCALE))
>  #endif  /* __OPTIMIZE__ */
>
> +#define _MM_REDUCE_OPERATOR_BASIC_EPI16(op) \
> +  __v8hi __T1 = 

Re: [PATCH] i386: Optimize vshuf{i, f}{32x4, 64x2} ymm and vperm{i, f}128 ymm

2023-04-18 Thread Hongtao Liu via Gcc-patches
On Tue, Apr 18, 2023 at 2:52 PM Hu, Lin1 via Gcc-patches
 wrote:
>
> Hi, all
>
> The patch aims to optimize vshuf{i,f}{32x4,64x2} ymm and vperm{i,f}128 ymm.
> It has been regtested on x86_64-pc-linux-gnu. OK for trunk?
Ok.
>
> Thanks.
> Lin
>
> vshuf{i,f}{32x4,64x2} ymm and vperm{i,f}128 ymm take 3 cycles.
> We can optimize them to vblendps or vmovaps when there is no cross-lane movement.
>
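The immediate checks added to the vperm2i128 output code can be sanity-checked against a lane-level model of the instruction. In this illustrative sketch (not GCC code; `ymm_lanes`, `classify` and `apply_cheap` are hypothetical names), a 256-bit register is reduced to two 128-bit lane tags. `vperm2i128_model` follows the imm8 layout: bits 1:0 and 5:4 select the source lane for the low and high halves, bits 3 and 7 zero a half, and bits 2 and 6 are ignored, which is why the patch masks with `0xbb`.

```c
#include <assert.h>
#include <stdint.h>

/* A ymm register reduced to two 128-bit lanes, each represented by a tag.  */
typedef struct { uint64_t lo, hi; } ymm_lanes;

/* 2-bit selector: 0/1 = src1 lo/hi, 2/3 = src2 lo/hi.  */
static uint64_t
pick_lane (unsigned sel, ymm_lanes s1, ymm_lanes s2)
{
  switch (sel & 3)
    {
    case 0: return s1.lo;
    case 1: return s1.hi;
    case 2: return s2.lo;
    default: return s2.hi;
    }
}

/* Bits 1:0 and 5:4 pick the low and high result lanes; bits 3 and 7
   zero them; bits 2 and 6 are ignored.  */
static ymm_lanes
vperm2i128_model (unsigned imm, ymm_lanes s1, ymm_lanes s2)
{
  ymm_lanes r;
  r.lo = (imm & 0x08) ? 0 : pick_lane (imm, s1, s2);
  r.hi = (imm & 0x80) ? 0 : pick_lane (imm >> 4, s1, s2);
  return r;
}

enum cheap_form { KEEP_VPERM, COPY_SRC1, COPY_SRC2, BLEND_15, BLEND_240 };

/* Mirrors the (mask & 0xbb) checks in the patch.  */
static enum cheap_form
classify (unsigned imm)
{
  switch (imm & 0xbb)
    {
    case 0x10: return COPY_SRC1;  /* 16: src1 lanes in order -> vmovaps %1 */
    case 0x32: return COPY_SRC2;  /* 50: src2 lanes in order -> vmovaps %2 */
    case 0x12: return BLEND_15;   /* 18: lo from src2, hi from src1 */
    case 0x30: return BLEND_240;  /* 48: lo from src1, hi from src2 */
    default:   return KEEP_VPERM;
    }
}

/* Lanes produced by the cheaper replacement instruction.  */
static ymm_lanes
apply_cheap (enum cheap_form f, ymm_lanes s1, ymm_lanes s2)
{
  ymm_lanes r = s1;
  if (f == COPY_SRC2)
    r = s2;
  else if (f == BLEND_15)
    r.lo = s2.lo;
  else if (f == BLEND_240)
    r.hi = s2.hi;
  return r;
}
```

The vshuf64x2/vshuf32x4 ymm case with mask 2 (low lane from the first source, high lane from the second) has the same lane pattern as `0x30` here, which is why both map to `vblendps $240`.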
> gcc/ChangeLog:
>
> * config/i386/sse.md: Modify insn vperm{i,f}
> and vshuf{i,f}.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx512vl-vshuff32x4-1.c: Modify test.
> * gcc.target/i386/avx512vl-vshuff64x2-1.c: Ditto.
> * gcc.target/i386/avx512vl-vshufi32x4-1.c: Ditto.
> * gcc.target/i386/avx512vl-vshufi64x2-1.c: Ditto.
> * gcc.target/i386/opt-vperm-vshuf-1.c: New test.
> * gcc.target/i386/opt-vperm-vshuf-2.c: Ditto.
> * gcc.target/i386/opt-vperm-vshuf-3.c: Ditto.
> ---
>  gcc/config/i386/sse.md| 36 --
>  .../gcc.target/i386/avx512vl-vshuff32x4-1.c   |  2 +-
>  .../gcc.target/i386/avx512vl-vshuff64x2-1.c   |  2 +-
>  .../gcc.target/i386/avx512vl-vshufi32x4-1.c   |  2 +-
>  .../gcc.target/i386/avx512vl-vshufi64x2-1.c   |  2 +-
>  .../gcc.target/i386/opt-vperm-vshuf-1.c   | 51 ++
>  .../gcc.target/i386/opt-vperm-vshuf-2.c   | 68 +++
>  .../gcc.target/i386/opt-vperm-vshuf-3.c   | 63 +
>  8 files changed, 218 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/opt-vperm-vshuf-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/opt-vperm-vshuf-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/opt-vperm-vshuf-3.c
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 513960e8f33..5b6b2427460 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -18437,6 +18437,8 @@
>mask = INTVAL (operands[3]) / 2;
>mask |= (INTVAL (operands[5]) - 4) / 2 << 1;
>operands[3] = GEN_INT (mask);
> +  if (INTVAL (operands[3]) == 2 && !)
> +return "vblendps\t{$240, %2, %1, %0|%0, %1, %2, 240}";
>return "vshuf64x2\t{%3, %2, %1, %0|%0, %1, %2, %3}";
>  }
>[(set_attr "type" "sselog")
> @@ -18595,6 +18597,9 @@
>mask |= (INTVAL (operands[7]) - 8) / 4 << 1;
>operands[3] = GEN_INT (mask);
>
> +  if (INTVAL (operands[3]) == 2 && !)
> +return "vblendps\t{$240, %2, %1, %0|%0, %1, %2, 240}";
> +
>return "vshuf32x4\t{%3, %2, %1, %0|%0, %1, %2, %3}";
>  }
>[(set_attr "type" "sselog")
> @@ -25663,7 +25668,28 @@
>(match_operand:SI 3 "const_0_to_255_operand")]
>   UNSPEC_VPERMTI))]
>"TARGET_AVX2"
> -  "vperm2i128\t{%3, %2, %1, %0|%0, %1, %2, %3}"
> +  {
> +int mask = INTVAL (operands[3]);
> +if ((mask & 0xbb) == 16)
> +  {
> +   if (rtx_equal_p (operands[0], operands[1]))
> + return "";
> +   else
> + return "vmovaps\t{%1, %0|%0, %1}";
> +  }
> +if ((mask & 0xbb) == 50)
> +  {
> +   if (rtx_equal_p (operands[0], operands[2]))
> + return "";
> +   else
> + return "vmovaps\t{%2, %0|%0, %2}";
> +  }
> +if ((mask & 0xbb) == 18)
> +  return "vblendps\t{$15, %2, %1, %0|%0, %1, %2, 15}";
> +if ((mask & 0xbb) == 48)
> +  return "vblendps\t{$240, %2, %1, %0|%0, %1, %2, 240}";
> +return "vperm2i128\t{%3, %2, %1, %0|%0, %1, %2, %3}";
> +  }
>[(set_attr "type" "sselog")
> (set_attr "prefix" "vex")
> (set_attr "mode" "OI")])
> @@ -26226,9 +26252,11 @@
> && avx_vperm2f128_parallel (operands[3], mode)"
>  {
>int mask = avx_vperm2f128_parallel (operands[3], mode) - 1;
> -  if (mask == 0x12)
> -return "vinsert\t{$0, %x2, %1, %0|%0, %1, %x2, 0}";
> -  if (mask == 0x20)
> +  if ((mask & 0xbb) == 0x12)
> +return "vblendps\t{$15, %2, %1, %0|%0, %1, %2, 15}";
> +  if ((mask & 0xbb) == 0x30)
> +return "vblendps\t{$240, %2, %1, %0|%0, %1, %2, 240}";
> +  if ((mask & 0xbb) == 0x20)
>  return "vinsert\t{$1, %x2, %1, %0|%0, %1, %x2, 1}";
>operands[3] = GEN_INT (mask);
>return "vperm2\t{%3, %2, %1, %0|%0, %1, %2, %3}";
> diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vshuff32x4-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-vshuff32x4-1.c
> index 6c2fb2f184a..02aecf4edce 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512vl-vshuff32x4-1.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512vl-vshuff32x4-1.c
> @@ -12,7 +12,7 @@ volatile __mmask8 m;
>  void extern
>  avx512vl_test (void)
>  {
> -  x = _mm256_shuffle_f32x4 (x, x, 2);
> +  x = _mm256_shuffle_f32x4 (x, x, 3);
>x = _mm256_mask_shuffle_f32x4 (x, m, x, x, 2);
>x = _mm256_maskz_shuffle_f32x4 (m, x, x, 2);
>  }
> diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vshuff64x2-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-vshuff64x2-1.c
> index 1191b400134..563ded5d9df 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512vl-vshuff64x2-1.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512vl-vshuff64x2-1.c
> @@ 

Re: [PATCH 2/2] i386: Add AVX512BW dependency to AVX512VBMI2

2023-04-18 Thread Hongtao Liu via Gcc-patches
On Tue, Apr 18, 2023 at 3:07 PM Haochen Jiang via Gcc-patches
 wrote:
>
> gcc/ChangeLog:
>
> * common/config/i386/i386-common.cc
> (OPTION_MASK_ISA_AVX512VBMI2_SET): Change OPTION_MASK_ISA_AVX512F_SET
> to OPTION_MASK_ISA_AVX512BW_SET.
> (OPTION_MASK_ISA_AVX512F_UNSET):
> Remove OPTION_MASK_ISA_AVX512VBMI2_UNSET.
> (OPTION_MASK_ISA_AVX512BW_UNSET):
> Add OPTION_MASK_ISA_AVX512VBMI2_UNSET.
> * config/i386/avx512vbmi2intrin.h: Do not push avx512bw.
> * config/i386/avx512vbmi2vlintrin.h: Ditto.
> * config/i386/i386-builtin.def: Remove OPTION_MASK_ISA_AVX512BW.
> * config/i386/sse.md (VI12_AVX512VLBW): Removed.
> (VI12_VI48F_AVX512VLBW): Rename to VI12_VI48F_AVX512VL.
> (compress_mask): Change iterator from VI12_AVX512VLBW to
> VI12_AVX512VL.
> (compressstore_mask): Ditto.
> (expand_mask): Ditto.
> (expand_maskz): Ditto.
> (*expand_mask): Change iterator from VI12_VI48F_AVX512VLBW to
> VI12_VI48F_AVX512VL.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx512bw-pr100267-1.c: Remove avx512f and avx512bw.
> * gcc.target/i386/avx512bw-pr100267-b-2.c: Ditto.
> * gcc.target/i386/avx512bw-pr100267-d-2.c: Ditto.
> * gcc.target/i386/avx512bw-pr100267-q-2.c: Ditto.
> * gcc.target/i386/avx512bw-pr100267-w-2.c: Ditto.
> * gcc.target/i386/avx512f-vpcompressb-1.c: Ditto.
> * gcc.target/i386/avx512f-vpcompressb-2.c: Ditto.
> * gcc.target/i386/avx512f-vpcompressw-1.c: Ditto.
> * gcc.target/i386/avx512f-vpcompressw-2.c: Ditto.
> * gcc.target/i386/avx512f-vpexpandb-1.c: Ditto.
> * gcc.target/i386/avx512f-vpexpandb-2.c: Ditto.
> * gcc.target/i386/avx512f-vpexpandw-1.c: Ditto.
> * gcc.target/i386/avx512f-vpexpandw-2.c: Ditto.
> * gcc.target/i386/avx512f-vpshld-1.c: Ditto.
> * gcc.target/i386/avx512f-vpshldd-2.c: Ditto.
> * gcc.target/i386/avx512f-vpshldq-2.c: Ditto.
> * gcc.target/i386/avx512f-vpshldv-1.c: Ditto.
> * gcc.target/i386/avx512f-vpshldvd-2.c: Ditto.
> * gcc.target/i386/avx512f-vpshldvq-2.c: Ditto.
> * gcc.target/i386/avx512f-vpshldvw-2.c: Ditto.
> * gcc.target/i386/avx512f-vpshrdd-2.c: Ditto.
> * gcc.target/i386/avx512f-vpshrdq-2.c: Ditto.
> * gcc.target/i386/avx512f-vpshrdv-1.c: Ditto.
> * gcc.target/i386/avx512f-vpshrdvd-2.c: Ditto.
> * gcc.target/i386/avx512f-vpshrdvq-2.c: Ditto.
> * gcc.target/i386/avx512f-vpshrdvw-2.c: Ditto.
> * gcc.target/i386/avx512f-vpshrdw-2.c: Ditto.
> * gcc.target/i386/avx512vbmi2-vpshld-1.c: Ditto.
> * gcc.target/i386/avx512vbmi2-vpshrd-1.c: Ditto.
> * gcc.target/i386/avx512vl-vpcompressb-1.c: Ditto.
> * gcc.target/i386/avx512vl-vpcompressb-2.c: Ditto.
> * gcc.target/i386/avx512vl-vpcompressw-2.c: Ditto.
> * gcc.target/i386/avx512vl-vpexpandb-1.c: Ditto.
> * gcc.target/i386/avx512vl-vpexpandb-2.c: Ditto.
> * gcc.target/i386/avx512vl-vpexpandw-1.c: Ditto.
> * gcc.target/i386/avx512vl-vpexpandw-2.c: Ditto.
> * gcc.target/i386/avx512vl-vpshldd-2.c: Ditto.
> * gcc.target/i386/avx512vl-vpshldq-2.c: Ditto.
> * gcc.target/i386/avx512vl-vpshldv-1.c: Ditto.
> * gcc.target/i386/avx512vl-vpshldvd-2.c: Ditto.
> * gcc.target/i386/avx512vl-vpshldvq-2.c: Ditto.
> * gcc.target/i386/avx512vl-vpshldvw-2.c: Ditto.
> * gcc.target/i386/avx512vl-vpshrdd-2.c: Ditto.
> * gcc.target/i386/avx512vl-vpshrdq-2.c: Ditto.
> * gcc.target/i386/avx512vl-vpshrdv-1.c: Ditto.
> * gcc.target/i386/avx512vl-vpshrdvd-2.c: Ditto.
> * gcc.target/i386/avx512vl-vpshrdvq-2.c: Ditto.
> * gcc.target/i386/avx512vl-vpshrdvw-2.c: Ditto.
> * gcc.target/i386/avx512vl-vpshrdw-2.c: Ditto.
> * gcc.target/i386/avx512vlbw-pr100267-1.c: Ditto.
> * gcc.target/i386/avx512vlbw-pr100267-b-2.c: Ditto.
> * gcc.target/i386/avx512vlbw-pr100267-w-2.c: Ditto.
Ok.
> ---
>  gcc/common/config/i386/i386-common.cc |  5 +-
>  gcc/config/i386/avx512vbmi2intrin.h   | 18 ++-
>  gcc/config/i386/avx512vbmi2vlintrin.h | 21 ++--
>  gcc/config/i386/i386-builtin.def  | 48 -
>  gcc/config/i386/sse.md| 51 ---
>  .../gcc.target/i386/avx512bw-pr100267-1.c |  2 +-
>  .../gcc.target/i386/avx512bw-pr100267-b-2.c   |  3 +-
>  .../gcc.target/i386/avx512bw-pr100267-d-2.c   |  3 +-
>  .../gcc.target/i386/avx512bw-pr100267-q-2.c   |  3 +-
>  .../gcc.target/i386/avx512bw-pr100267-w-2.c   |  3 +-
>  .../gcc.target/i386/avx512f-vpcompressb-1.c   |  2 +-
>  .../gcc.target/i386/avx512f-vpcompressb-2.c   |  3 +-
>  .../gcc.target/i386/avx512f-vpcompressw-1.c   |  2 +-
>  .../gcc.target/i386/avx512f-vpcompressw-2.c   |  3 +-

Re: [PATCH 1/2] i386: Add AVX512BW dependency to AVX512BITALG

2023-04-18 Thread Hongtao Liu via Gcc-patches
On Tue, Apr 18, 2023 at 3:07 PM Haochen Jiang via Gcc-patches
 wrote:
>
> gcc/ChangeLog:
>
> * common/config/i386/i386-common.cc
> (OPTION_MASK_ISA_AVX512BITALG_SET):
> Change OPTION_MASK_ISA_AVX512F_SET
> to OPTION_MASK_ISA_AVX512BW_SET.
> (OPTION_MASK_ISA_AVX512F_UNSET):
> Remove OPTION_MASK_ISA_AVX512BITALG_UNSET.
> (OPTION_MASK_ISA_AVX512BW_UNSET):
> Add OPTION_MASK_ISA_AVX512BITALG_UNSET.
> * config/i386/avx512bitalgintrin.h: Do not push avx512bw.
> * config/i386/i386-builtin.def:
> Remove redundant OPTION_MASK_ISA_AVX512BW.
> * config/i386/sse.md (VI1_AVX512VLBW): Removed.
> (avx512vl_vpshufbitqmb):
> Change the iterator from VI1_AVX512VLBW to VI1_AVX512VL.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx512bitalg-vpopcntb-1.c:
> Remove avx512bw.
> * gcc.target/i386/avx512bitalg-vpopcntb.c: Ditto.
> * gcc.target/i386/avx512bitalg-vpopcntbvl.c: Ditto.
> * gcc.target/i386/avx512bitalg-vpopcntw-1.c: Ditto.
> * gcc.target/i386/avx512bitalg-vpopcntw.c: Ditto.
> * gcc.target/i386/avx512bitalg-vpopcntwvl.c: Ditto.
> * gcc.target/i386/avx512bitalg-vpshufbitqmb-1.c: Ditto.
> * gcc.target/i386/avx512bitalg-vpshufbitqmb.c: Ditto.
> * gcc.target/i386/avx512bitalgvl-vpopcntb-1.c: Ditto.
> * gcc.target/i386/avx512bitalgvl-vpopcntw-1.c: Ditto.
> * gcc.target/i386/avx512bitalgvl-vpshufbitqmb-1.c: Ditto.
> * gcc.target/i386/pr93696-1.c: Ditto.
> * gcc.target/i386/pr93696-2.c: Ditto.
Ok.
> ---
>  gcc/common/config/i386/i386-common.cc |  8 ++--
>  gcc/config/i386/avx512bitalgintrin.h  | 39 ---
>  gcc/config/i386/i386-builtin.def  | 10 ++---
>  gcc/config/i386/sse.md|  8 +---
>  .../gcc.target/i386/avx512bitalg-vpopcntb-1.c |  3 +-
>  .../gcc.target/i386/avx512bitalg-vpopcntb.c   |  2 +-
>  .../gcc.target/i386/avx512bitalg-vpopcntbvl.c |  2 +-
>  .../gcc.target/i386/avx512bitalg-vpopcntw-1.c |  3 +-
>  .../gcc.target/i386/avx512bitalg-vpopcntw.c   |  2 +-
>  .../gcc.target/i386/avx512bitalg-vpopcntwvl.c |  2 +-
>  .../i386/avx512bitalg-vpshufbitqmb-1.c|  2 +-
>  .../i386/avx512bitalg-vpshufbitqmb.c  |  2 +-
>  .../i386/avx512bitalgvl-vpopcntb-1.c  |  3 +-
>  .../i386/avx512bitalgvl-vpopcntw-1.c  |  3 +-
>  .../i386/avx512bitalgvl-vpshufbitqmb-1.c  |  2 +-
>  gcc/testsuite/gcc.target/i386/pr93696-1.c |  2 +-
>  gcc/testsuite/gcc.target/i386/pr93696-2.c |  2 +-
>  17 files changed, 32 insertions(+), 63 deletions(-)
>
> diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
> index d90c558311b..f78fc0a60e2 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -91,7 +91,7 @@ along with GCC; see the file COPYING3.  If not see
>  #define OPTION_MASK_ISA_AVX512VPOPCNTDQ_SET \
>(OPTION_MASK_ISA_AVX512VPOPCNTDQ | OPTION_MASK_ISA_AVX512F_SET)
>  #define OPTION_MASK_ISA_AVX512BITALG_SET \
> -  (OPTION_MASK_ISA_AVX512BITALG | OPTION_MASK_ISA_AVX512F_SET)
> +  (OPTION_MASK_ISA_AVX512BITALG | OPTION_MASK_ISA_AVX512BW_SET)
>  #define OPTION_MASK_ISA2_AVX512BF16_SET OPTION_MASK_ISA2_AVX512BF16
>  #define OPTION_MASK_ISA_RTM_SET OPTION_MASK_ISA_RTM
>  #define OPTION_MASK_ISA_PRFCHW_SET OPTION_MASK_ISA_PRFCHW
> @@ -234,14 +234,14 @@ along with GCC; see the file COPYING3.  If not see
> | OPTION_MASK_ISA_AVX512VL_UNSET | OPTION_MASK_ISA_AVX512IFMA_UNSET \
> | OPTION_MASK_ISA_AVX512VBMI2_UNSET \
> | OPTION_MASK_ISA_AVX512VNNI_UNSET \
> -   | OPTION_MASK_ISA_AVX512VPOPCNTDQ_UNSET \
> -   | OPTION_MASK_ISA_AVX512BITALG_UNSET)
> +   | OPTION_MASK_ISA_AVX512VPOPCNTDQ_UNSET)
>  #define OPTION_MASK_ISA_AVX512CD_UNSET OPTION_MASK_ISA_AVX512CD
>  #define OPTION_MASK_ISA_AVX512PF_UNSET OPTION_MASK_ISA_AVX512PF
>  #define OPTION_MASK_ISA_AVX512ER_UNSET OPTION_MASK_ISA_AVX512ER
>  #define OPTION_MASK_ISA_AVX512DQ_UNSET OPTION_MASK_ISA_AVX512DQ
>  #define OPTION_MASK_ISA_AVX512BW_UNSET \
> -  (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VBMI_UNSET)
> +  (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VBMI_UNSET \
> +   | OPTION_MASK_ISA_AVX512BITALG_UNSET)
>  #define OPTION_MASK_ISA_AVX512VL_UNSET OPTION_MASK_ISA_AVX512VL
>  #define OPTION_MASK_ISA_AVX512IFMA_UNSET OPTION_MASK_ISA_AVX512IFMA
>  #define OPTION_MASK_ISA2_AVXIFMA_UNSET OPTION_MASK_ISA2_AVXIFMA
> diff --git a/gcc/config/i386/avx512bitalgintrin.h b/gcc/config/i386/avx512bitalgintrin.h
> index aa6d652938a..a1c7be109a9 100644
> --- a/gcc/config/i386/avx512bitalgintrin.h
> +++ b/gcc/config/i386/avx512bitalgintrin.h
> @@ -48,17 +48,6 @@ _mm512_popcnt_epi16 (__m512i __A)
>return (__m512i) __builtin_ia32_vpopcountw_v32hi ((__v32hi) __A);
>  }
>
> -#ifdef __DISABLE_AVX512BITALG__
> -#undef __DISABLE_AVX512BITALG__
> -#pragma GCC pop_options
> 

Re: [PATCH] RISC-V: Update multilib-generator to handle V

2023-04-18 Thread Kito Cheng via Gcc-patches
OK, thanks, I now know what the problem is: I tried rv64 but didn't try
rv32. I have another fix in mind and will post it soon.

On Wed, Apr 19, 2023 at 9:29 AM Palmer Dabbelt  wrote:
>
> On Tue, 18 Apr 2023 18:26:18 PDT (-0700), Kito Cheng wrote:
> > And which -march and -mabi did you use that hit the issue?
> >
> > On Wed, Apr 19, 2023 at 8:51 AM Palmer Dabbelt  wrote:
> >>
> >> On Tue, 18 Apr 2023 17:47:31 PDT (-0700), Kito Cheng wrote:
> >> > Do you mind sharing your GCC configure options and the options you tried?
> >>
> >> Just riscv-gnu-toolchain with "--enable-multilib --enable-linux".
> >>
> >> > On Wed, Apr 19, 2023 at 4:01 AM Palmer Dabbelt  wrote:
> >> >>
> >> >> On Tue, 18 Apr 2023 08:44:24 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
> >> >> >> Yep, if I drop the non-canonical strings via
> >> >> >>
> >> >> >> diff --git a/gcc/config/riscv/multilib-generator b/gcc/config/riscv/multilib-generator
> >> >> >> index 58b7198b243..a63a4d69c18 100755
> >> >> >> --- a/gcc/config/riscv/multilib-generator
> >> >> >> +++ b/gcc/config/riscv/multilib-generator
> >> >> >> @@ -174,7 +174,7 @@ for cmodel in cmodels:
> >> >> >>  ext_combs = expand_combination(ext)
> >> >> >>  alts = sum([[x] + [x + y for y in ext_combs] for x in [arch] + extra], [])
> >> >> >>  alts = filter(lambda x: len(x) != 0, alts)
> >> >> >> -alts = alts + list(map(lambda a : arch_canonicalize(a, args.misa_spec), alts))
> >> >> >> +alts = list(map(lambda a : arch_canonicalize(a, args.misa_spec), alts))
> >> >> >>
> >> >> >>  # Drop duplicated entry.
> >> >> >>  alts = unique(alts)
> >> >> >>
> >> >> >> then I can't link `-march=rv32imafdcv`, I need
> >> >> >> `-march=rv32imacv_zicsr_zve32f_zve32x_zve64x_zvl128b_zvl32b_zvl64b`.  That's
> >> >> >> kind of a headache for users to type in.
> >> >> >
> >> >> > Yes, that's a headache for users, but arch string canonicalization is
> >> >> > hidden in the process,
> >> >> > so the user could still just use rv32imafdcv at compile time and in the
> >> >> > multilib config.
> >> >> >
> >> >> > The driver and the multilib-generator script (with arch_canonicalize)
> >> >> > will handle those headaches in the background.
> >> >>
> >> >> Sorry, I'm not exactly sure what you're trying to say.  I just rebuilt
> >> >> GCC with this patch (and t-linux-multilib regenerated from it), it's not
> >> >> resolving multilibs for the short names.
>
> `-march=rv32imafdcv` is the broken one,
> `-march=rv32imacv_zicsr_zve32f_zve32x_zve64x_zvl128b_zvl32b_zvl64b`
> resolves multilibs (there's a bit more above).


Re: Re: [PATCH] RISC-V: Fix bug reported by PR109535

2023-04-18 Thread juzhe.zh...@rivai.ai
That is, when the "AVL" is a register and appears only once, we eliminate the "AVL" operand
in its uses.
If it appears more than once, we don't eliminate the "AVL" operand in its uses.

Consider this case:
vint8m1_t b = __riscv_vadd_vx_i8m1 (a, vl, vl);

Here the "vl" variable not only serves as the "AVL" used by the vsetvli,
but also serves as the scalar operand of the vadd.vx operation.
In this case, we cannot eliminate the operand "vl".

However, consider vint8m1_t b = __riscv_vadd_vx_i8m1 (a, x, vl);
Here the "vl" operand only serves as the "AVL", which is already used by the
preceding vsetvli instruction, so the operand is no longer needed by the
"vadd.vx" instruction, and I removed this operand and its dependency.

Feel free to give me more comments. Thanks.
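The "eliminate only if the AVL appears once" rule depends on what counts as an appearance. A minimal sketch, using a hypothetical toy register model rather than real RTL, contrasts counting identical references with counting any reference whose hard-register range overlaps the AVL's, which is the distinction behind the point about GPR hard registers in different modes. SUBREGs (relevant before reload) are not modeled, and this is not the fix in the patch.

```c
#include <assert.h>

/* Hypothetical machine modes for the toy model.  */
enum toy_mode { TOY_QI, TOY_DI };

/* Toy stand-in for a hard-register rtx: its mode, first hard register and
   the number of consecutive hard registers it occupies.  Real GCC code
   would obtain these via GET_MODE, REGNO and REG_NREGS.  */
struct toy_reg { enum toy_mode mode; unsigned regno; unsigned nregs; };

/* Counts references identical to X (same mode and regno), roughly what
   an rtx_equal_p-based count sees.  */
static int
count_identical (const struct toy_reg *refs, int n, struct toy_reg x)
{
  int count = 0;
  for (int i = 0; i < n; i++)
    if (refs[i].mode == x.mode && refs[i].regno == x.regno)
      count++;
  return count;
}

/* Counts references whose hard-register range overlaps X's range, so
   (reg:QI s0) and (reg:DI s0) both count, and on rv32 a DImode value in
   s0 (occupying s0 and s1) counts as a reference to s1.  */
static int
count_overlapping (const struct toy_reg *refs, int n, struct toy_reg x)
{
  int count = 0;
  for (int i = 0; i < n; i++)
    if (refs[i].regno < x.regno + x.nregs
        && x.regno < refs[i].regno + refs[i].nregs)
      count++;
  return count;
}
```

With the overlap count, the `(reg:QI s0)` plus `(reg:DI s0)` pattern from the report yields 2, so the AVL dependency would be kept.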


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-04-19 09:11
To: juzhe.zh...@rivai.ai; kito.cheng; Richard Biener
CC: gcc-patches; palmer
Subject: Re: [PATCH] RISC-V: Fix bug reported by PR109535
 
 
On 4/18/23 19:04, juzhe.zh...@rivai.ai wrote:
> The bug reported by the google/highway project:
> (set(..)
> (reg:QI s0)
> (reg:DI s0))
> 
> The "avl" operand rtx  = (reg:DI s0)
> count_occurrences returns 1, however the actual number of regno occurrences should be 2.
> In this case, the VSETVL pass eliminates the use of (reg:DI s0) and then
> triggers an assertion failure in RTL_SSA.
> Instead, we should not eliminate the "s0" dependency.
So these are not vector hard registers, but GPR hard registers.  That means
you have to worry about even more things.  Consider the case on rv32 when
you ask to count (reg:QI s1) and there is a reference to (reg:DI s0).
 
Prior to reload you also have to worry about SUBREGs.
 
 
You probably need to be using refers_to_regno_p or something similar.
 
jeff
 


Re: Re: [PATCH] RISC-V: Fix bug reported by PR109535

2023-04-18 Thread juzhe.zh...@rivai.ai
I tried refers_to_regno_p.
It does not work for us, since it only returns true or false depending on whether the "rtx" contains
the regno.

In our situation, we remove the "AVL" dependency when it appears once in the "rtx";
otherwise, we don't eliminate the "AVL" dependency.
Would you mind giving me more suggestions?

Thanks


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-04-19 09:11
To: juzhe.zh...@rivai.ai; kito.cheng; Richard Biener
CC: gcc-patches; palmer
Subject: Re: [PATCH] RISC-V: Fix bug reported by PR109535
 
 
On 4/18/23 19:04, juzhe.zh...@rivai.ai wrote:
> The bug reported by the google/highway project:
> (set(..)
> (reg:QI s0)
> (reg:DI s0))
> 
> The "avl" operand rtx  = (reg:DI s0)
> count_occurrences returns 1, however the actual number of regno occurrences should be 2.
> In this case, the VSETVL pass eliminates the use of (reg:DI s0) and then
> triggers an assertion failure in RTL_SSA.
> Instead, we should not eliminate the "s0" dependency.
So these are not vector hard registers, but GPR hard registers.  That means
you have to worry about even more things.  Consider the case on rv32 when
you ask to count (reg:QI s1) and there is a reference to (reg:DI s0).
 
Prior to reload you also have to worry about SUBREGs.
 
 
You probably need to be using refers_to_regno_p or something similar.
 
jeff
 


Re: [PATCH] RISC-V: Update multilib-generator to handle V

2023-04-18 Thread Palmer Dabbelt

On Tue, 18 Apr 2023 18:26:18 PDT (-0700), Kito Cheng wrote:

And which -march and -mabi did you use that hit the issue?

On Wed, Apr 19, 2023 at 8:51 AM Palmer Dabbelt  wrote:


On Tue, 18 Apr 2023 17:47:31 PDT (-0700), Kito Cheng wrote:
> Do you mind sharing your GCC configure options and the options you tried?

Just riscv-gnu-toolchain with "--enable-multilib --enable-linux".

> On Wed, Apr 19, 2023 at 4:01 AM Palmer Dabbelt  wrote:
>>
>> On Tue, 18 Apr 2023 08:44:24 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
>> >> Yep, if I drop the non-canonical strings via
>> >>
>> >> diff --git a/gcc/config/riscv/multilib-generator b/gcc/config/riscv/multilib-generator
>> >> index 58b7198b243..a63a4d69c18 100755
>> >> --- a/gcc/config/riscv/multilib-generator
>> >> +++ b/gcc/config/riscv/multilib-generator
>> >> @@ -174,7 +174,7 @@ for cmodel in cmodels:
>> >>  ext_combs = expand_combination(ext)
>> >>  alts = sum([[x] + [x + y for y in ext_combs] for x in [arch] + extra], [])
>> >>  alts = filter(lambda x: len(x) != 0, alts)
>> >> -alts = alts + list(map(lambda a : arch_canonicalize(a, args.misa_spec), alts))
>> >> +alts = list(map(lambda a : arch_canonicalize(a, args.misa_spec), alts))
>> >>
>> >>  # Drop duplicated entry.
>> >>  alts = unique(alts)
>> >>
>> >> then I can't link `-march=rv32imafdcv`, I need
>> >> `-march=rv32imacv_zicsr_zve32f_zve32x_zve64x_zvl128b_zvl32b_zvl64b`.  That's
>> >> kind of a headache for users to type in.
>> >
>> > Yes, that's a headache for users, but arch string canonicalization is
>> > hidden in the process,
>> > so the user could still just use rv32imafdcv at compile time and
>> > multi-lib config.
>> >
>> > And the driver and multilib-generator (with arch_canonicalize) script
>> > will handle that headache in the background.
>>
>> Sorry, I'm not exactly sure what you're trying to say.  I just rebuilt
>> GCC with this patch (and t-linux-multilib regenerated from it), it's not
>> resolving multilibs for the short names.


`-march=rv32imafdcv` is the broken one, 
`-march=rv32imacv_zicsr_zve32f_zve32x_zve64x_zvl128b_zvl32b_zvl64b` 
resolves multilibs (there's a bit more above).


Re: [PATCH] RISC-V: Update multilib-generator to handle V

2023-04-18 Thread Kito Cheng via Gcc-patches
And which -march and -mabi did you use that hit the issue?

On Wed, Apr 19, 2023 at 8:51 AM Palmer Dabbelt  wrote:
>
> On Tue, 18 Apr 2023 17:47:31 PDT (-0700), Kito Cheng wrote:
> > Would you mind sharing your GCC configure options and the options you tried?
>
> Just riscv-gnu-toolchain with "--enable-multilib --enable-linux".
>
> > On Wed, Apr 19, 2023 at 4:01 AM Palmer Dabbelt  wrote:
> >>
> >> On Tue, 18 Apr 2023 08:44:24 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
> >> >> Yep, if I drop the non-canonical strings via
> >> >>
> >> >> diff --git a/gcc/config/riscv/multilib-generator 
> >> >> b/gcc/config/riscv/multilib-generator
> >> >> index 58b7198b243..a63a4d69c18 100755
> >> >> --- a/gcc/config/riscv/multilib-generator
> >> >> +++ b/gcc/config/riscv/multilib-generator
> >> >> @@ -174,7 +174,7 @@ for cmodel in cmodels:
> >> >>  ext_combs = expand_combination(ext)
> >> >>  alts = sum([[x] + [x + y for y in ext_combs] for x in [arch] + 
> >> >> extra], [])
> >> >>  alts = filter(lambda x: len(x) != 0, alts)
> >> >> -alts = alts + list(map(lambda a : arch_canonicalize(a, 
> >> >> args.misa_spec), alts))
> >> >> +alts = list(map(lambda a : arch_canonicalize(a, 
> >> >> args.misa_spec), alts))
> >> >>
> >> >>  # Drop duplicated entry.
> >> >>  alts = unique(alts)
> >> >>
> >> >> then I can't link `-march=rv32imafdcv`, I need
> >> >> `-march=rv32imacv_zicsr_zve32f_zve32x_zve64x_zvl128b_zvl32b_zvl64b`.  
> >> >> That's
> >> >> kind of a headache for users to type in.
> >> >
> >> > Yes, that's a headache for users, but arch string canonicalization is
> >> > hidden in the process,
> >> > so the user could still just use rv32imafdcv at compile time and
> >> > multi-lib config.
> >> >
> >> > And the driver and multilib-generator (with arch_canonicalize) script
> >> > will handle that headache in the background.
> >>
> >> Sorry, I'm not exactly sure what you're trying to say.  I just rebuilt
> >> GCC with this patch (and t-linux-multilib regenerated from it), it's not
> >> resolving multilibs for the short names.


Re: [PATCH v4 05/10] RISC-V:autovec: Add autovectorization patterns for binary integer operations

2023-04-18 Thread Kito Cheng via Gcc-patches
> diff --git a/gcc/config/riscv/vector-iterators.md 
> b/gcc/config/riscv/vector-iterators.md
> index 70ad85b661b..7fae87968d7 100644
> --- a/gcc/config/riscv/vector-iterators.md
> +++ b/gcc/config/riscv/vector-iterators.md
> @@ -34,6 +34,8 @@
>UNSPEC_VMULHU
>UNSPEC_VMULHSU
>
> +  UNSPEC_VADD
> +  UNSPEC_VSUB

Defined but unused?

>UNSPEC_VADC
>UNSPEC_VSBC
>UNSPEC_VMADC
> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
> index 0ecca98f20c..2ac5b744503 100644
> --- a/gcc/config/riscv/vector.md
> +++ b/gcc/config/riscv/vector.md
> @@ -26,8 +26,6 @@
>  ;; - Auto-vectorization (TBD)
>  ;; - Combine optimization (TBD)
>
> -(include "vector-iterators.md")
> -

Why remove this?


Re: [PATCH v4 03/10] RISC-V:autovec: Add auto-vectorization support functions

2023-04-18 Thread Kito Cheng via Gcc-patches
> @@ -118,6 +120,41 @@ const_vec_all_same_in_range_p (rtx x, HOST_WIDE_INT 
> minval,
>   && IN_RANGE (INTVAL (elt), minval, maxval));
>  }
>
> +/* Return the vlmul field for a specific machine mode.  */
> +unsigned int
> +riscv_classify_vlmul_field (enum machine_mode mode)

This is not implemented correctly for the current type system.

> @@ -176,6 +213,64 @@ calculate_ratio (unsigned int sew, enum vlmul_type vlmul)
>return ratio;
>  }
>
> +/* Implement TARGET_VECTORIZE_PREFERRED_SIMD_MODE for RVV.  */
> +
> +machine_mode
> +riscv_vector_preferred_simd_mode (scalar_mode mode, unsigned vf)

`vf` is kind of misleading, it should be `LMUL` or something like that.

> +{
> +  if (!TARGET_VECTOR)
> +return word_mode;
> +
> +  switch (mode)
> +{
> +case E_QImode:
> +  return vf == 1   ? VNx8QImode
> +: vf == 2 ? VNx16QImode
> +: vf == 4 ? VNx32QImode
> +  : VNx64QImode;

I would prefer to keep only the LMUL=1 / vf=1 case for this patch set,
so maybe drop the vf parameter for now and add it back when
we implement it later.

> +/* Return true if it is a RVV tuple mode.  */
> +bool
> +riscv_tuple_mode_p (machine_mode mode ATTRIBUTE_UNUSED)

just drop this for now.

> +/* Return nf for a machine mode.  */
> +int
> +riscv_classify_nf (machine_mode mode)

Drop this, add that when we implement tuple type.

> +
> +/* Return vlmul register size for a machine mode.  */
> +int
> +riscv_vlmul_regsize (machine_mode mode)

Get the mode size and calculate the result with UNITS_PER_V_REG instead,
like exact_div (GET_MODE_SIZE (mode), UNITS_PER_V_REG).to_constant ()
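A minimal sketch of that size-based computation, assuming both sizes are taken at one fixed VLEN so they are plain integers (in GCC both are poly_int values and `exact_div` works on them directly). `vlmul_regsize` is a hypothetical stand-alone helper, not the patch's code; unlike a bare `exact_div`, it maps mask modes and modes smaller than one register (fractional LMUL) to a single register, matching the `return 1` cases of the quoted switch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of vector registers a mode occupies.
// MODE_SIZE and UNITS_PER_V_REG are byte sizes taken at the same VLEN.
int vlmul_regsize(std::int64_t mode_size, std::int64_t units_per_v_reg,
                  bool is_mask_mode) {
  assert(mode_size > 0 && units_per_v_reg > 0);
  // Mask modes and fractional-LMUL modes fit in a single register.
  if (is_mask_mode || mode_size <= units_per_v_reg)
    return 1;
  // LMUL > 1 modes span a whole number of registers, so the division
  // must be exact (the role exact_div plays in GCC).
  assert(mode_size % units_per_v_reg == 0);
  return static_cast<int>(mode_size / units_per_v_reg);
}
```

With 16-byte registers (VLEN = 128), a 64-byte mode yields 4, i.e. LMUL = 4, without a per-field switch.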

> +{
> +  if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
> +return 1;
> +  switch (riscv_classify_vlmul_field (mode))
> +{
> +case VLMUL_FIELD_001:
> +  return 2;
> +case VLMUL_FIELD_010:
> +  return 4;
> +case VLMUL_FIELD_011:
> +  return 8;
> +case VLMUL_FIELD_100:
> +  gcc_unreachable ();
> +default:
> +  return 1;
> +}
> +}
> +
> +/* Return true if it is a RVV mask mode.  */
> +bool
> +riscv_vector_mask_mode_p (machine_mode mode)
> +{
> +  return (mode == VNx1BImode || mode == VNx2BImode || mode == VNx4BImode
> + || mode == VNx8BImode || mode == VNx16BImode || mode == VNx32BImode
> + || mode == VNx64BImode);
> +}
> +
> +/* Implement TARGET_VECTORIZE_GET_MASK_MODE for RVV.  */
> +
> +opt_machine_mode
> +riscv_vector_get_mask_mode (machine_mode mode)
> +{
> +  machine_mode mask_mode;
> +  int nf = 1;
> +  if (riscv_tuple_mode_p (mode))
> +nf = riscv_classify_nf (mode);

Drop the nf stuff.


Re: [PATCH] RISC-V: Fix bug reported by PR109535

2023-04-18 Thread Jeff Law via Gcc-patches




On 4/18/23 19:04, juzhe.zh...@rivai.ai wrote:

The bug issue reported by google/highway project:
(set(..)
        (reg:QI s0)
(reg:DI s0))

The "avl" operand rtx  = (reg:DI s0)
count_occurrences returns 1, but the actual number of regno occurrences should be 2.
In this case, the VSETVL pass eliminates the use of (reg:DI s0) and then 
triggers an assertion failure in RTL_SSA.

Instead, we should not eliminate "s0" dependency.
So these are not vector hard registers, but GPR hard registers.  Meaning 
you have to worry about even more things.  Consider the case on rv32 when 
you ask to count (reg:QI s1) and there is a reference to (reg:DI s0).
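To make the rv32 case concrete: a DImode value in 32-bit GPRs occupies two consecutive hard registers, so (reg:DI s0) also covers s1 (x8 and x9). Below is a sketch of the overlap test that a regno-based count would need, modelling each reference as a half-open range of register numbers. `HardRegRef` and `regs_overlap` are illustrative names only; real GCC code would rely on its own helpers such as `END_REGNO` or `refers_to_regno_p` instead.

```cpp
#include <cassert>

// Illustrative model of a hard register reference: it covers the
// register numbers [regno, regno + nregs).
struct HardRegRef {
  unsigned regno;  // first hard register (s0 is x8, s1 is x9)
  unsigned nregs;  // registers spanned, e.g. 2 for DImode GPRs on rv32
};

// True if the two references share at least one hard register.
bool regs_overlap(const HardRegRef &a, const HardRegRef &b) {
  assert(a.nregs > 0 && b.nregs > 0);
  return a.regno < b.regno + b.nregs && b.regno < a.regno + a.nregs;
}
```

Comparing register numbers alone would say (reg:QI s1) and (reg:DI s0) are unrelated; the range test reports the overlap.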


Prior to reload you also have to worry about SUBREGs.


You probably need to be using refers_to_regno_p or something similar.

jeff


Re: [PATCH v4 04/10] RISC-V:autovec: Add target vectorization hooks

2023-04-18 Thread Kito Cheng via Gcc-patches
> +/* Implement TARGET_ESTIMATED_POLY_VALUE.
> +   Look into the tuning structure for an estimate.
> +   KIND specifies the type of requested estimate: min, max or likely.
> +   For cores with a known RVV width all three estimates are the same.
> +   For generic RVV tuning we want to distinguish the maximum estimate from
> +   the minimum and likely ones.
> +   The likely estimate is the same as the minimum in that case to give a
> +   conservative behavior of auto-vectorizing with RVV when it is a win
> +   even for 128-bit RVV.
> +   When RVV width information is available VAL.coeffs[1] is multiplied by
> +   the number of VQ chunks over the initial Advanced SIMD 128 bits.  */
> +
> +static HOST_WIDE_INT
> +riscv_estimated_poly_value (poly_int64 val,
> +   poly_value_estimate_kind kind = POLY_VALUE_LIKELY)
> +{
> +  unsigned int width_source = BITS_PER_RISCV_VECTOR.is_constant ()
> +? (unsigned int) BITS_PER_RISCV_VECTOR.to_constant ()
> +: (unsigned int) RVV_SCALABLE;

It can only be RVV_SCALABLE for now, so I would prefer to keep just
that switch for now.

And add assert (!BITS_PER_RISCV_VECTOR.is_constant ());

> +
> +  /* If there is no core-specific information then the minimum and likely
> + values are based on 128-bit vectors and the maximum is based on
> + the architectural maximum of 2048 bits.  */

The maximum is 65,536 bits per the vector spec.
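For reference, with the estimate used in the quoted code, value = coeffs[0] + coeffs[1] * (VLEN - 128) / 128, the 2048-bit maximum is where the factor 15 comes from ((2048 - 128) / 128 = 15); a 65,536-bit maximum makes it (65536 - 128) / 128 = 511. A standalone sketch of that arithmetic for the scalable case only, with hypothetical names and a plain two-coefficient struct in place of GCC's poly_int64:

```cpp
#include <cassert>
#include <cstdint>

enum class Estimate { Min, Likely, Max };

// Stand-in for a degree-1 poly_int64: value = c0 + c1 * x.
struct Poly2 {
  std::int64_t c0, c1;
};

constexpr std::int64_t kMinVlenBits = 128;    // minimum VLEN assumed by the estimate
constexpr std::int64_t kMaxVlenBits = 65536;  // architectural maximum per the RVV spec

// Value of V on a core whose VLEN is VLEN_BITS.
std::int64_t value_at_vlen(Poly2 v, std::int64_t vlen_bits) {
  assert(vlen_bits >= kMinVlenBits && vlen_bits % kMinVlenBits == 0);
  return v.c0 + v.c1 * ((vlen_bits - kMinVlenBits) / kMinVlenBits);
}

// Scalable case: min and likely assume the minimum VLEN, max the largest.
std::int64_t estimated_poly_value(Poly2 v, Estimate kind) {
  switch (kind) {
  case Estimate::Min:
  case Estimate::Likely:
    return v.c0;
  case Estimate::Max:
    return value_at_vlen(v, kMaxVlenBits);
  }
  return v.c0;
}
```

For an element count of 2 + 2x with 64-bit elements, the maximum estimate is 2 + 2 * 511 = 1024, which is 65536 / 64 elements.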

> +  if (width_source == RVV_SCALABLE)
> +switch (kind)
> +  {
> +  case POLY_VALUE_MIN:
> +  case POLY_VALUE_LIKELY:
> +   return val.coeffs[0];
> +
> +  case POLY_VALUE_MAX:
> +   return val.coeffs[0] + val.coeffs[1] * 15;
> +  }
> +
> +  /* Allow BITS_PER_RISCV_VECTOR to be a bitmask of different VL, treating 
> the
> + lowest as likely.  This could be made more general if future -mtune
> + options need it to be.  */
> +  if (kind == POLY_VALUE_MAX)
> +width_source = 1 << floor_log2 (width_source);
> +  else
> +width_source = least_bit_hwi (width_source);
> +
> +  /* If the core provides width information, use that.  */
> +  HOST_WIDE_INT over_128 = width_source - 128;
> +  return val.coeffs[0] + val.coeffs[1] * over_128 / 128;
> +}
> +
> +/* Implement TARGET_VECTORIZE_PREFERRED_SIMD_MODE.  */
> +
> +static machine_mode
> +riscv_preferred_simd_mode (scalar_mode mode)
> +{
> +  machine_mode vmode =
> +riscv_vector::riscv_vector_preferred_simd_mode (mode,
> +   
> riscv_vectorization_factor);
> +  if (VECTOR_MODE_P (vmode))
> +return vmode;
> +
> +  return word_mode;
> +}
> +
> +/* Implement TARGET_AUTOVECTORIZE_VECTOR_MODES for RVV.  */
> +static unsigned int
> +riscv_autovectorize_vector_modes (vector_modes *modes, bool)
> +{
> +  if (!TARGET_VECTOR)
> +return 0;
> +
> +  if (riscv_vectorization_factor == RVV_LMUL1)
> +{
> +  modes->safe_push (VNx16QImode);
> +  modes->safe_push (VNx8QImode);
> +  modes->safe_push (VNx4QImode);
> +  modes->safe_push (VNx2QImode);
> +}

Keep only the LMUL1 case for now.


Re: Re: [PATCH] RISC-V: Fix bug reported by PR109535

2023-04-18 Thread juzhe.zh...@rivai.ai
The bug issue reported by google/highway project:
(set(..)
   (reg:QI s0)
(reg:DI s0))

The "avl" operand rtx  = (reg:DI s0)
count_occurrences returns 1, but the actual number of regno occurrences should be 2.
In this case, the VSETVL pass eliminates the use of (reg:DI s0) and then triggers 
an assertion failure in RTL_SSA.
Instead, we should not eliminate "s0" dependency.
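The mismatch can be reproduced in a toy model where an operand is just (mode, regno). Counting operands that match the AVL rtx in both mode and register, which is how the report describes count_occurrences behaving here, finds one use of (reg:DI s0); counting by register number finds both uses of s0. This is only an illustration with hypothetical names. As Jeff points out in the thread, a real fix must also handle multi-register overlap and SUBREGs, which this model ignores.

```cpp
#include <cassert>
#include <vector>

// Toy model of a register operand: the access mode plus the hard register
// number (s0 is x8 in the RISC-V GPR numbering).
enum class Mode { QI, DI };
struct RegOperand {
  Mode mode;
  unsigned regno;
};

// Counts operands identical to X in both mode and register, so it treats
// (reg:QI s0) and (reg:DI s0) as different things.
int count_identical(const std::vector<RegOperand> &ops, const RegOperand &x) {
  int n = 0;
  for (const RegOperand &op : ops)
    if (op.mode == x.mode && op.regno == x.regno)
      ++n;
  return n;
}

// Counts operands that use register REGNO in any mode, the idea behind
// the proposed count_regno_occurrences.
int count_by_regno(const std::vector<RegOperand> &ops, unsigned regno) {
  int n = 0;
  for (const RegOperand &op : ops)
    if (op.regno == regno)
      ++n;
  return n;
}
```

For the operands {(reg:QI s0), (reg:DI s0)}, the first count is 1 and the second is 2, so only the regno-based count keeps the "s0" dependency.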

Thanks


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-04-19 08:56
To: Kito Cheng; juzhe.zhong; Richard Biener
CC: gcc-patches; palmer
Subject: Re: [PATCH] RISC-V: Fix bug reported by PR109535
 
 
On 4/18/23 18:18, Kito Cheng wrote:
> Hi Richard, Jeff:
> 
> Is it possible to backport this to GCC 13? highway is one of our
> important users of the RISC-V vector support, and it is built in some
> distros, so we believe this bug fix is important to backport.
I want to see an explanation of why count_occurrences isn't doing what you 
want.
 
jeff
 


Re: [PATCH] testsuite: fix scan-tree-dump patterns [PR83904, PR100297]

2023-04-18 Thread Jerry D via Gcc-patches

On 4/18/23 12:39 PM, Harald Anlauf via Fortran wrote:

Dear all,

the attached patch adjusts the scan-tree-dump patterns of the
reported testcases, which were likely run in a location such
that a path in an error message showing up in the tree dump could
have accidentally matched "free" or "data", respectively.

For the testcase gfortran.dg/reshape_8.f90 I checked with a
failing gfortran-11 that the pattern is appropriate.

OK for mainline?

Thanks,
Harald


Yes, OK

Thanks


Re: [PATCH] RISC-V: Fix bug reported by PR109535

2023-04-18 Thread Jeff Law via Gcc-patches




On 4/18/23 18:18, Kito Cheng wrote:

Hi Richard, Jeff:

Is it possible to backport this to GCC 13? highway is one of our
important users of the RISC-V vector support, and it is built in some
distros, so we believe this bug fix is important to backport.
I want to see an explanation of why count_occurrences isn't doing what you 
want.


jeff


Re: [PATCH v4 01/10] RISC-V: Add new predicates and function prototypes

2023-04-18 Thread Kito Cheng via Gcc-patches
Could you please move the new function declarations and new code to
the patch where they are being used?

> +/* RVV vector register sizes.  */
> +enum riscv_vector_bits_enum
> +{
> +  RVV_SCALABLE,
> +  RVV_NOT_IMPLEMENTED = RVV_SCALABLE,
> +  RVV_64 = 64,
> +  RVV_128 = 128,
> +  RVV_256 = 256,
> +  RVV_512 = 512,
> +  RVV_1024 = 1024,
> +  RVV_2048 = 2048,
> +  RVV_4096 = 4096,
> +  RVV_8192 = 8192,
> +  RVV_16384 = 16384,
> +  RVV_32768 = 32768,
> +  RVV_65536 = 65536
> +};

I think this is not necessary for the VLA vectorizer?

> +Enum
> +Name(riscv_vector_lmul) Type(enum riscv_vector_lmul_enum)
> +The possible vectorization factor:
> +
> +EnumValue
> +Enum(riscv_vector_lmul) String(1) Value(RVV_LMUL1)
> +
> +EnumValue
> +Enum(riscv_vector_lmul) String(2) Value(RVV_LMUL2)
> +
> +EnumValue
> +Enum(riscv_vector_lmul) String(4) Value(RVV_LMUL4)
> +
> +EnumValue
> +Enum(riscv_vector_lmul) String(8) Value(RVV_LMUL8)

I would like to introduce this option later; it's used for fine tuning,
and the VLA vectorizer should be able to work without this tuning option.

> +mriscv-vector-lmul=
> +Target RejectNegative Joined Enum(riscv_vector_lmul) Var(riscv_vector_lmul) 
> Init(RVV_LMUL1)
> +-mriscv-vector-lmul= Set the vf using lmul in auto-vectorization.
> +

Same question for this


Re: [PATCH] RISC-V: Update multilib-generator to handle V

2023-04-18 Thread Palmer Dabbelt

On Tue, 18 Apr 2023 17:47:31 PDT (-0700), Kito Cheng wrote:

Would you mind sharing your GCC configure options and the options you tried?


Just riscv-gnu-toolchain with "--enable-multilib --enable-linux".


On Wed, Apr 19, 2023 at 4:01 AM Palmer Dabbelt  wrote:


On Tue, 18 Apr 2023 08:44:24 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
>> Yep, if I drop the non-canonical strings via
>>
>> diff --git a/gcc/config/riscv/multilib-generator b/gcc/config/riscv/multilib-generator
>> index 58b7198b243..a63a4d69c18 100755
>> --- a/gcc/config/riscv/multilib-generator
>> +++ b/gcc/config/riscv/multilib-generator
>> @@ -174,7 +174,7 @@ for cmodel in cmodels:
>>  ext_combs = expand_combination(ext)
>>  alts = sum([[x] + [x + y for y in ext_combs] for x in [arch] + extra], [])
>>  alts = filter(lambda x: len(x) != 0, alts)
>> -alts = alts + list(map(lambda a : arch_canonicalize(a, args.misa_spec), alts))
>> +alts = list(map(lambda a : arch_canonicalize(a, args.misa_spec), alts))
>>
>>  # Drop duplicated entry.
>>  alts = unique(alts)
>>
>> then I can't link `-march=rv32imafdcv`, I need
>> `-march=rv32imacv_zicsr_zve32f_zve32x_zve64x_zvl128b_zvl32b_zvl64b`.  That's
>> kind of a headache for users to type in.
>
> Yes, that's a headache for users, but arch string canonicalization is
> hidden in the process,
> so the user could still just use rv32imafdcv at compile time and
> multi-lib config.
>
> And the driver and multilib-generator (with arch_canonicalize) script
> will handle that headache in the background.

Sorry, I'm not exactly sure what you're trying to say.  I just rebuilt
GCC with this patch (and t-linux-multilib regenerated from it), it's not
resolving multilibs for the short names.
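The two positions in this exchange can be modelled with a toy lookup: the generated table either keeps each user-facing string next to its canonical form (the current `alts + list(map(...))`) or keeps only canonical strings (the proposed change), and the driver either canonicalizes -march before matching or it does not. This is a sketch with a hypothetical one-entry `canonicalize` table standing in for `arch_canonicalize`, using the two strings quoted above; it is not how GCC's driver is implemented.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for arch_canonicalize(): a fixed lookup table.
std::string canonicalize(const std::string &arch) {
  static const std::map<std::string, std::string> table = {
      {"rv32imafdcv",
       "rv32imacv_zicsr_zve32f_zve32x_zve64x_zvl128b_zvl32b_zvl64b"}};
  auto it = table.find(arch);
  return it == table.end() ? arch : it->second;
}

// Builds the set of -march strings a multilib answers to.  A std::set also
// drops duplicates, like unique() in multilib-generator.
std::set<std::string> build_alts(const std::vector<std::string> &alts,
                                 bool keep_originals) {
  std::set<std::string> out;
  for (const std::string &a : alts) {
    if (a.empty())
      continue;  // mirrors filter(lambda x: len(x) != 0, alts)
    if (keep_originals)
      out.insert(a);
    out.insert(canonicalize(a));
  }
  return out;
}

// Does -march=MARCH select this multilib?
bool resolves(const std::set<std::string> &alts, const std::string &march,
              bool driver_canonicalizes) {
  const std::string key = driver_canonicalizes ? canonicalize(march) : march;
  return alts.count(key) != 0;
}
```

In this model a canonical-only table resolves the short name only when the driver canonicalizes -march before matching; whether GCC's driver does so in this configuration is exactly what the thread is trying to establish.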


Re: [PATCH] RISC-V: Update multilib-generator to handle V

2023-04-18 Thread Kito Cheng via Gcc-patches
Would you mind sharing your GCC configure options and the options you tried?

On Wed, Apr 19, 2023 at 4:01 AM Palmer Dabbelt  wrote:
>
> On Tue, 18 Apr 2023 08:44:24 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
> >> Yep, if I drop the non-canonical strings via
> >>
> >> diff --git a/gcc/config/riscv/multilib-generator 
> >> b/gcc/config/riscv/multilib-generator
> >> index 58b7198b243..a63a4d69c18 100755
> >> --- a/gcc/config/riscv/multilib-generator
> >> +++ b/gcc/config/riscv/multilib-generator
> >> @@ -174,7 +174,7 @@ for cmodel in cmodels:
> >>  ext_combs = expand_combination(ext)
> >>  alts = sum([[x] + [x + y for y in ext_combs] for x in [arch] + 
> >> extra], [])
> >>  alts = filter(lambda x: len(x) != 0, alts)
> >> -alts = alts + list(map(lambda a : arch_canonicalize(a, 
> >> args.misa_spec), alts))
> >> +alts = list(map(lambda a : arch_canonicalize(a, args.misa_spec), 
> >> alts))
> >>
> >>  # Drop duplicated entry.
> >>  alts = unique(alts)
> >>
> >> then I can't link `-march=rv32imafdcv`, I need
> >> `-march=rv32imacv_zicsr_zve32f_zve32x_zve64x_zvl128b_zvl32b_zvl64b`.  
> >> That's
> >> kind of a headache for users to type in.
> >
> > Yes, that's a headache for users, but arch string canonicalization is
> > hidden in the process,
> > so the user could still just use rv32imafdcv at compile time and
> > multi-lib config.
> >
> > And the driver and multilib-generator (with arch_canonicalize) script
> > will handle that headache in the background.
>
> Sorry, I'm not exactly sure what you're trying to say.  I just rebuilt
> GCC with this patch (and t-linux-multilib regenerated from it), it's not
> resolving multilibs for the short names.


Re: [PATCH] RISC-V: Fix bug reported by PR109535

2023-04-18 Thread Kito Cheng via Gcc-patches
Hi Richard, Jeff:

Is it possible to backport this to GCC 13? highway is one of our
important users of the RISC-V vector support, and it is built in some
distros, so we believe this bug fix is important to backport.

Thanks

Hi Ju-Zhe:

Thanks for the update.

On Wed, Apr 19, 2023 at 7:25 AM  wrote:
>
> From: Ju-Zhe Zhong 
>
> Fix a bug reported by google/highway, which uses the RVV intrinsics:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109535
>
> PR 109535
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vsetvl.cc (count_regno_occurrences): New 
> function.
> (pass_vsetvl::cleanup_insns): Fix bug.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/pr109535.c: New test.
>
> ---
>  gcc/config/riscv/riscv-vsetvl.cc  | 15 ++-
>  .../gcc.target/riscv/rvv/base/pr109535.c  | 11 +++
>  2 files changed, 25 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr109535.c
>
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc 
> b/gcc/config/riscv/riscv-vsetvl.cc
> index 1b66e3b9eeb..b570b003a1e 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -1592,6 +1592,19 @@ backward_propagate_worthwhile_p (const basic_block 
> cfg_bb,
>return true;
>  }
>
> +/* Count the number of REGNO in RINSN.  */
> +static int
> +count_regno_occurrences (rtx_insn *rinsn, unsigned int regno)
> +{
> +  int count = 0;
> +  extract_insn (rinsn);
> +  for (int i = 0; i < recog_data.n_operands; i++)
> +if (REG_P (recog_data.operand[i])
> +   && REGNO (recog_data.operand[i]) == regno)
> +  count++;
> +  return count;
> +}
> +
>  avl_info::avl_info (const avl_info &other)
>  {
>m_value = other.get_value ();
> @@ -3924,7 +3937,7 @@ pass_vsetvl::cleanup_insns (void) const
>   if (!has_vl_op (rinsn) || !REG_P (get_vl (rinsn)))
> continue;
>   rtx avl = get_vl (rinsn);
> - if (count_occurrences (PATTERN (rinsn), avl, 0) == 1)
> + if (count_regno_occurrences (rinsn, REGNO (avl)) == 1)
> {
>   /* Get the list of uses for the new instruction.  */
>   auto attempt = crtl->ssa->new_change_attempt ();
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr109535.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/pr109535.c
> new file mode 100644
> index 000..7582fe9c392
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr109535.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -march=rv32gcv -mabi=ilp32d" } */
> +
> +#include "riscv_vector.h"
> +
> +void foo(void *in1, void *in2, void *in3, void *out, size_t vl) {
> +  vint8m1_t a = __riscv_vle8_v_i8m1(in1, vl);
> +  vint8m1_t b = __riscv_vadd_vx_i8m1 (a, vl, vl);
> +  __riscv_vse8_v_i8m1(out, b, vl);
> +}
> +
> --
> 2.36.1
>


[PATCH] i386: Add new pattern for zero-extend cmov

2023-04-18 Thread Andrew Pinski via Gcc-patches
After a phiopt change, I got a failure of cmov9.c.
The RTL IR has the zero_extend on the outside of
the if_then_else rather than on each arm. Both
forms are considered canonical, as mentioned in
PR 66588.

This fixes the failure I got and also adds a testcase
which fails even without my phiopt patch but passes
with this patch.
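Both RTL shapes compute the same value, which is why both can be matched to the same instruction: zero-extending the selected 32-bit value is the same as selecting between the zero-extended inputs. A small C++ check of that identity follows (function names are illustrative). On x86-64 the 32-bit cmov into %k0 supplies the zero extension implicitly, because a 32-bit register write clears the upper 32 bits.

```cpp
#include <cassert>
#include <cstdint>

// (zero_extend:DI (if_then_else:SI cond b c))
std::uint64_t zext_outside(bool cond, std::uint32_t b, std::uint32_t c) {
  return static_cast<std::uint64_t>(cond ? b : c);
}

// (if_then_else:DI cond (zero_extend:DI b) (zero_extend:DI c))
std::uint64_t zext_inside(bool cond, std::uint32_t b, std::uint32_t c) {
  return cond ? static_cast<std::uint64_t>(b) : static_cast<std::uint64_t>(c);
}

// Asserts that the two canonical forms agree for the given inputs.
void check_forms_agree(bool cond, std::uint32_t b, std::uint32_t c) {
  assert(zext_outside(cond, b, c) == zext_inside(cond, b, c));
}
```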

OK? Bootstrapped and tested on x86_64-linux-gnu with
no regressions.

gcc/ChangeLog:

* config/i386/i386.md (*movsicc_noc_zext_1): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/i386/cmov10.c: New test.
* gcc.target/i386/cmov11.c: New test.
---
 gcc/config/i386/i386.md| 16 
 gcc/testsuite/gcc.target/i386/cmov10.c | 10 ++
 gcc/testsuite/gcc.target/i386/cmov11.c | 10 ++
 3 files changed, 36 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/cmov10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/cmov11.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 1419ea4cff3..10f15b1e8a8 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -21959,6 +21959,22 @@ (define_insn "*movsicc_noc_zext"
   [(set_attr "type" "icmov")
(set_attr "mode" "SI")])
 
+(define_insn "*movsicc_noc_zext_1"
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r")
+   (zero_extend:DI
+ (if_then_else:SI (match_operator 1 "ix86_comparison_operator"
+[(reg FLAGS_REG) (const_int 0)])
+(match_operand:SI 2 "nonimmediate_operand" "rm,0")
+(match_operand:SI 3 "nonimmediate_operand" "0,rm"))))]
+  "TARGET_64BIT
+   && TARGET_CMOVE && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "@
+   cmov%O2%C1\t{%2, %k0|%k0, %2}
+   cmov%O2%c1\t{%3, %k0|%k0, %3}"
+  [(set_attr "type" "icmov")
+   (set_attr "mode" "SI")])
+
+
 ;; Don't do conditional moves with memory inputs.  This splitter helps
 ;; register starved x86_32 by forcing inputs into registers before reload.
 (define_split
diff --git a/gcc/testsuite/gcc.target/i386/cmov10.c b/gcc/testsuite/gcc.target/i386/cmov10.c
new file mode 100644
index 000..9ba23b191fb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/cmov10.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -dp" } */
+/* { dg-final { scan-assembler-not "zero_extendsidi" } } */
+
+
+void foo (unsigned long long *d, int a, unsigned int b, unsigned int c)
+{
+  *d = a ? b : c;
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/cmov11.c b/gcc/testsuite/gcc.target/i386/cmov11.c
new file mode 100644
index 000..ba8a5e692b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/cmov11.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -dp" } */
+/* { dg-final { scan-assembler-not "zero_extendsidi" } } */
+
+unsigned long long foo (int a, unsigned b, unsigned  c)
+{
+  unsigned t = a ? b : c;
+  return t;
+}
+
-- 
2.31.1



[PATCH] RISC-V: Fix bug reported by PR109535

2023-04-18 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Fix a bug reported by google/highway, which uses the RVV intrinsics:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109535

PR 109535

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (count_regno_occurrences): New function.
(pass_vsetvl::cleanup_insns): Fix bug.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr109535.c: New test.

---
 gcc/config/riscv/riscv-vsetvl.cc  | 15 ++-
 .../gcc.target/riscv/rvv/base/pr109535.c  | 11 +++
 2 files changed, 25 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr109535.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 1b66e3b9eeb..b570b003a1e 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -1592,6 +1592,19 @@ backward_propagate_worthwhile_p (const basic_block cfg_bb,
   return true;
 }
 
+/* Count the number of REGNO in RINSN.  */
+static int
+count_regno_occurrences (rtx_insn *rinsn, unsigned int regno)
+{
+  int count = 0;
+  extract_insn (rinsn);
+  for (int i = 0; i < recog_data.n_operands; i++)
+if (REG_P (recog_data.operand[i])
+   && REGNO (recog_data.operand[i]) == regno)
+  count++;
+  return count;
+}
+
 avl_info::avl_info (const avl_info &other)
 {
   m_value = other.get_value ();
@@ -3924,7 +3937,7 @@ pass_vsetvl::cleanup_insns (void) const
  if (!has_vl_op (rinsn) || !REG_P (get_vl (rinsn)))
continue;
  rtx avl = get_vl (rinsn);
- if (count_occurrences (PATTERN (rinsn), avl, 0) == 1)
+ if (count_regno_occurrences (rinsn, REGNO (avl)) == 1)
{
  /* Get the list of uses for the new instruction.  */
  auto attempt = crtl->ssa->new_change_attempt ();
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr109535.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr109535.c
new file mode 100644
index 000..7582fe9c392
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr109535.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=rv32gcv -mabi=ilp32d" } */
+
+#include "riscv_vector.h"
+
+void foo(void *in1, void *in2, void *in3, void *out, size_t vl) {
+  vint8m1_t a = __riscv_vle8_v_i8m1(in1, vl);
+  vint8m1_t b = __riscv_vadd_vx_i8m1 (a, vl, vl);
+  __riscv_vse8_v_i8m1(out, b, vl);
+}
+
-- 
2.36.1



Re: [PATCH v4 07/10] vect: Verify that GET_MODE_NUNITS is a multiple of 2.

2023-04-18 Thread Michael Collison

Juzhe and Kito,

Thank you for the clarification.

On 4/18/23 18:48, juzhe.zh...@rivai.ai wrote:

Yes, as Kito said,
we won't enable VNx1DImode in auto-vectorization, so it's pointless
to fix it here.
We dynamically adjust the minimum vector length for different '-march'
values according to the RVV ISA specification.

So we strongly suggest dropping this fix.

Thanks.

juzhe.zh...@rivai.ai

*From:* Kito Cheng 
*Date:* 2023-04-19 02:21
*To:* Richard Biener ; Jeff Law
; Palmer Dabbelt

*CC:* Michael Collison ; gcc-patches
; 钟居哲 
*Subject:* Re: [PATCH v4 07/10] vect: Verify that GET_MODE_NUNITS
is a multiple of 2.
Some more background about RVV:
RISC-V provides different VLEN configurations through different ISA
extensions like `zve32x`, `zve64x` and `v`:
zve32x only guarantees that the minimal VLEN is 32 bits,
zve64x guarantees that the minimal VLEN is 64 bits,
and v guarantees that the minimal VLEN is 128 bits.
Current status (without that patch):
Zve32x: the mode for one vector register is VNx1SImode, and VNx1DImode
is an invalid mode
- one vector register can hold 1 + 1x SImode elements where x is 0~n, so it
might hold just one SI
Zve64x: the mode for one vector register is VNx1DImode or VNx2SImode
- one vector register can hold 1 + 1x DImode elements where x is 0~n, so it
might hold just one DI
- one vector register can hold 2 + 2x SImode elements where x is 0~n, so it
might hold just two SI
So what I want to say here is that, in theory, it is really NOT safe to
assume VNx1DImode holds at least two DI.
However, the `v` extension guarantees that the minimal VLEN is 128 bits.
We are trying to introduce another type/mode mapping for this
configuration:
v: the mode for one vector register is VNx2DImode or VNx4SImode
- one vector register can hold 2 + 2x DImode elements where x is 0~n, so it
will hold at least two DI
- one vector register can hold 4 + 4x SImode elements where x is 0~n, so it
will hold at least four SI
So GET_MODE_NUNITS for a single vector register with DI mode will
become 2 (VNx2DImode) whenever that is possible, which is a more
precise way to model the vector extension for RISC-V.
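The arithmetic behind this can be sketched with element counts written as c0 + c1·x for an unknown x ≥ 0. Such a count is a multiple of 2 for every x exactly when both coefficients are even (x = 0 forces c0, and x = 1 then forces c1), so 1 + 1x (VNx1DImode) fails the check the vect patch adds, while 2 + 2x (VNx2DImode) passes. The struct and function names below are illustrative, not GCC's poly_int API.

```cpp
#include <cassert>

// Element count of a scalable mode: c0 + c1 * x, with x >= 0 unknown.
struct PolyNunits {
  long c0, c1;
};

// True if the count is a multiple of FACTOR for every x >= 0.
bool multiple_for_all_x(PolyNunits n, long factor) {
  assert(factor > 0);
  return n.c0 % factor == 0 && n.c1 % factor == 0;
}

// Guaranteed elements per vector register for a minimum VLEN and SEW.
long min_elements(long min_vlen_bits, long sew_bits) {
  assert(sew_bits > 0 && min_vlen_bits % sew_bits == 0);
  return min_vlen_bits / sew_bits;
}
```

With zve64x (minimum VLEN 64) one register is only guaranteed a single 64-bit element, while v (minimum VLEN 128) guarantees two, matching the table above.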
On Tue, Apr 18, 2023 at 10:28 PM Kito Cheng 
wrote:
>
> Wait, VNx1DImode can really evaluate to just one element with
> -march=rv64g_zve64x,
>
> I think this should just be fixed in the backend by this patch:
>
>

https://patchwork.ozlabs.org/project/gcc/patch/20230414014518.15458-1-juzhe.zh...@rivai.ai/
>
> On Tue, Apr 18, 2023 at 2:12 PM Richard Biener via Gcc-patches
>  wrote:
> >
> > On Mon, Apr 17, 2023 at 8:42 PM Michael Collison
 wrote:
> > >
> > > While working on autovectorizing for the RISCV port I encountered an issue
> > > where can_duplicate_and_interleave_p assumes that GET_MODE_NUNITS is
> > > evenly divisible by two. The RISC-V target has vector modes (e.g. VNx1DImode),
> > > where GET_MODE_NUNITS is equal to one.
> > >
> > > Tested on RISCV and x86_64-linux-gnu. Okay?
> >
> > OK.
> >
> > > 2023-03-09  Michael Collison 
> > >
> > > * tree-vect-slp.cc (can_duplicate_and_interleave_p):
> > > Check that GET_MODE_NUNITS is a multiple of 2.
> > > ---
> > >  gcc/tree-vect-slp.cc | 7 +--
> > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > > index d73deaecce0..a64fe454e19 100644
> > > --- a/gcc/tree-vect-slp.cc
> > > +++ b/gcc/tree-vect-slp.cc
> > > @@ -423,10 +423,13 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned int count,
> > > (GET_MODE_BITSIZE (int_mode), 1);
> > >   tree vector_type
> > > = get_vectype_for_scalar_type (vinfo, int_type, count);
> > > + poly_int64 half_nelts;
> > >   if (vector_type
> > >   && VECTOR_MODE_P (TYPE_MODE (vector_type))
> > >   && known_eq (GET_MODE_SIZE (TYPE_MODE (vector_type)),
> > > -  GET_MODE_SIZE (base_vector_mode)))
> > > +  GET_MODE_SIZE (base_vector_mode))
> > > + && multiple_p (GET_MODE_NUNITS (TYPE_MODE (vector_type)),
> > > +    2, &half_nelts))
> > > {
> > >   /* Try fusing consecutive sequences of COUNT / NVECTORS elements
> > >  together into elements of type INT_TYPE and using the result
> > > @@ -434,7 +437,7 @@ can_duplicate_and_interleave_p (vec_info

[committed] libstdc++: Adjust uses of null pointer constants in docs

2023-04-18 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* doc/xml/manual/extensions.xml: Fix example to declare and
qualify std::free, and use NULL instead of 0.
* doc/html/manual/ext_demangling.html: Regenerate.
* libsupc++/cxxabi.h: Adjust doxygen comments.
---
 libstdc++-v3/doc/html/manual/ext_demangling.html | 8 +++-
 libstdc++-v3/doc/xml/manual/extensions.xml   | 8 +++-
 libstdc++-v3/libsupc++/cxxabi.h  | 4 ++--
 3 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/libstdc++-v3/doc/html/manual/ext_demangling.html b/libstdc++-v3/doc/html/manual/ext_demangling.html
index 028ec71d8c8..1e7cdda8326 100644
--- a/libstdc++-v3/doc/html/manual/ext_demangling.html
+++ b/libstdc++-v3/doc/html/manual/ext_demangling.html
@@ -26,6 +26,7 @@
   
 #include <exception>
 #include <iostream>
+#include <cstdlib>
 #include <cxxabi.h>
 
 struct empty { };
@@ -33,7 +34,6 @@ struct empty { };
 template <typename T, int N>
   struct bar { };
 
-
 int main()
 {
   int status;
@@ -43,11 +43,9 @@ int main()
   bar<empty,17>  u;
   const std::type_info  &ti = typeid(u);
 
-  realname = abi::__cxa_demangle(ti.name(), 0, 0, &status);
+  realname = abi::__cxa_demangle(ti.name(), NULL, NULL, &status);
   std::cout << ti.name() << "\t= " << realname << "\t: " << status << '\n';
-  free(realname);
-
-  return 0;
+  std::free(realname);
 }

  This prints
diff --git a/libstdc++-v3/doc/xml/manual/extensions.xml b/libstdc++-v3/doc/xml/manual/extensions.xml
index 196b55d8347..daa98f5cba7 100644
--- a/libstdc++-v3/doc/xml/manual/extensions.xml
+++ b/libstdc++-v3/doc/xml/manual/extensions.xml
@@ -521,6 +521,7 @@ get_temporary_buffer(5, (int*)0);

 #include <exception>
 #include <iostream>
+#include <cstdlib>
 #include <cxxabi.h>
 
 struct empty { };
@@ -528,7 +529,6 @@ struct empty { };
 template<typename T, int N>
   struct bar { };
 
-
 int main()
 {
   int status;
@@ -538,11 +538,9 @@ int main()
   bar<empty,17>  u;
   const std::type_info  &ti = typeid(u);
 
-  realname = abi::__cxa_demangle(ti.name(), 0, 0, &status);
+  realname = abi::__cxa_demangle(ti.name(), NULL, NULL, &status);
   std::cout << ti.name() << "\t=> " << realname << "\t: " << status << '\n';
-  free(realname);
-
-  return 0;
+  std::free(realname);
 }


diff --git a/libstdc++-v3/libsupc++/cxxabi.h b/libstdc++-v3/libsupc++/cxxabi.h
index 10179bc0a0d..ac0637b0343 100644
--- a/libstdc++-v3/libsupc++/cxxabi.h
+++ b/libstdc++-v3/libsupc++/cxxabi.h
@@ -169,7 +169,7 @@ namespace __cxxabiv1
*  @param __output_buffer A region of memory, allocated with
*  malloc, of @a *__length bytes, into which the demangled name is
*  stored.  If @a __output_buffer is not long enough, it is
-   *  expanded using realloc.  @a __output_buffer may instead be NULL;
+   *  expanded using realloc.  @a __output_buffer may instead be null;
*  in that case, the demangled name is placed in a region of memory
*  allocated with malloc.
*
@@ -184,7 +184,7 @@ namespace __cxxabiv1
*  -3: One of the arguments is invalid.
*
*  @return A pointer to the start of the NUL-terminated demangled
-   *  name, or NULL if the demangling fails.  The caller is
+   *  name, or a null pointer if the demangling fails.  The caller is
*  responsible for deallocating this memory using @c free.
*
*  The demangling is performed using the C++ ABI mangling rules,
-- 
2.40.0
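A standalone sketch of the calling convention the updated cxxabi.h comment describes: passing null pointers for both the output buffer and the length makes `abi::__cxa_demangle` allocate the result with malloc, and the caller releases it with `std::free`. The `demangle` wrapper is a hypothetical helper, not part of libstdc++, and the sample mangled name assumes the Itanium C++ ABI that GCC uses.

```cpp
#include <cassert>
#include <cstdlib>   // std::free, NULL
#include <string>
#include <cxxabi.h>  // abi::__cxa_demangle (Itanium C++ ABI)

// Demangle NAME into a std::string.  With null output buffer and length,
// __cxa_demangle mallocs a fresh buffer that must be released with
// std::free.  STATUS receives 0 on success or a negative error code; on
// failure the mangled name is returned unchanged.
std::string demangle (const char* name, int& status)
{
  assert (name != NULL);
  char* buf = abi::__cxa_demangle (name, NULL, NULL, &status);
  if (buf == NULL)
    return name;
  std::string result (buf);
  std::free (buf);
  return result;
}
```

For instance, `demangle("_Z3fooi", st)` should yield `foo(int)` with `st` set to 0. Copying into a `std::string` keeps the malloc'd buffer confined to one place, so it can neither leak nor be released with the wrong deallocator.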



Re: [PATCH v4 05/10] RISC-V:autovec: Add autovectorization patterns for binary integer operations

2023-04-18 Thread Jeff Law via Gcc-patches




On 4/17/23 12:36, Michael Collison wrote:

2023-03-02  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv.md (riscv_vector_preferred_simd_mode): Include
vector-iterators.md.
* config/riscv/vector-auto.md: New file containing
autovectorization patterns.
* config/riscv/vector-iterators.md (UNSPEC_VADD/UNSPEC_VSUB):
New unspecs for autovectorization patterns.
* config/riscv/vector.md: Remove include of vector-iterators.md
and include vector-auto.md.
So the basic idea here appears to be to have a define_expand with the 
well-known names (for the optab interface) generate RTL that is 
subsequently matched by the intrinsics that Juzhe has already defined 
and integrated.


That seems like a reasonable model to start with and get the basic 
functionality in place.  I'm all for focusing on that basic 
functionality first.




diff --git a/gcc/config/riscv/vector-auto.md b/gcc/config/riscv/vector-auto.md
new file mode 100644
index 000..dc62f9af705
--- /dev/null
+++ b/gcc/config/riscv/vector-auto.md
So basically vector-auto.md provides the interface to utilize the 
builtins found in vector.md.  Given the size of vector.md I can 
certainly see the desire to separate that out.




+
+
+;; -
+;;  [INT] Addition
Just a note.  This patch actually wires up plus, minus, and, ior, xor, 
ashift, ashiftrt and lshiftrt.  So it's quite a bit more than just 
addition.  So updating the comments is probably warranted.




+;; -
+;; Includes:
+;; - vadd.vv
+;; - vadd.vx
+;; - vadd.vi
+;; -
+
+(define_expand "3"
+  [(set (match_operand:VI 0 "register_operand")
+   (any_int_binop:VI (match_operand:VI 1 "register_operand")
+ (match_operand:VI 2 "register_operand")))]
+  "TARGET_VECTOR"
+{
+  using namespace riscv_vector;
+
+  rtx merge = RVV_VUNDEF (mode);
+  rtx vl = gen_reg_rtx (Pmode);
+  emit_vlmax_vsetvl (mode, vl);
+  rtx mask_policy = get_mask_policy_no_pred();
+  rtx tail_policy = get_tail_policy_no_pred();
+  rtx mask = CONSTM1_RTX(mode);
+  rtx vlmax_avl_p = get_avl_type_rtx(NONVLMAX);
+
+  emit_insn(gen_pred_(operands[0], mask, merge, operands[1], 
operands[2],
+   vl, tail_policy, mask_policy, vlmax_avl_p));
Just nits.  Make sure to put a space before the open paren of an 
argument list, even when the argument list is empty.  Similarly for the 
other expander in here.  And update the comment.  You may not want to 
list every instruction handled by the expander.  Your call, though 
clearly if you're going to include them, the list ought to be reasonably 
complete.


No objections to this code.  It obviously depends on some bits earlier 
in the patchset which I still need to look at, but I wanted to look at 
this one first as it shows the basic formula for how to wire up the 
basic vector patterns.


Please wait for the prereqs to get reviewed before installing on the trunk.

jeff


Re: Re: [PATCH] RISC-V: Fix PR109535

2023-04-18 Thread juzhe.zhong
>> ChangeLog should reference the bug number, like this:
>> PR target/109535
>> Seems like this ought to be static. Though it's not clear why
>> count_occurrences didn't do what you needed.  Can you explain why
>> count_occurrences was insufficient for your needs?

I have addressed the comments and will resend the patch, referencing the PR
number and making the function "static".
count_occurrences does not work here because we want to count occurrences of
the register number (regno) rather than occurrences of the rtx.

The bug was reported by the google/highway project:
(set(..)
   (reg:QI s0)
(reg:DI s0))

The "avl" operand rtx is (reg:DI s0).
count_occurrences returns 1, but the register number actually occurs twice.
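The difference can be sketched on a toy expression tree. `Node`, `Mode`, `count_rtx` and `count_regno` below are hypothetical stand-ins written for illustration (they are not GCC's `rtx`, `count_occurrences`, or the patch's `count_regno_occurrences`), and register number 8 merely plays the role of s0. An rtx-equality count treats `(reg:QI s0)` and `(reg:DI s0)` as different expressions, while a regno count sees the same hard register twice.

```cpp
#include <cassert>
#include <vector>

enum class Mode { QI, DI };

// Hypothetical toy RTL node: either a register leaf or an operation
// with sub-expressions.
struct Node
{
  bool is_reg = false;
  Mode mode = Mode::DI;
  unsigned regno = 0;
  std::vector<const Node*> kids;
};

// Rtx-equality count: a leaf matches only if it is the same register
// expression, i.e. both the register number and the mode agree.
int count_rtx (const Node* x, const Node* find)
{
  assert (find->is_reg);
  if (x->is_reg)
    return x->mode == find->mode && x->regno == find->regno;
  int n = 0;
  for (const Node* k : x->kids)
    n += count_rtx (k, find);
  return n;
}

// Regno count: any register with the given number matches, whatever
// mode it is referenced in.
int count_regno (const Node* x, unsigned regno)
{
  if (x->is_reg)
    return x->regno == regno;
  int n = 0;
  for (const Node* k : x->kids)
    n += count_regno (k, regno);
  return n;
}
```

With a set whose operands are `(reg:QI 8)` and `(reg:DI 8)`, `count_rtx` finds the DImode register once while `count_regno` finds register 8 twice, matching the 1-versus-2 discrepancy described above.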

Thanks


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-04-19 03:00
To: juzhe.zhong; gcc-patches
CC: kito.cheng; palmer
Subject: Re: [PATCH] RISC-V: Fix PR109535
 
 
On 4/17/23 20:03, juzhe.zh...@rivai.ai wrote:
> From: Ju-Zhe Zhong 
> 
> gcc/ChangeLog:
> 
>  * config/riscv/riscv-vsetvl.cc (count_regno_occurrences): New 
> function.
>  (pass_vsetvl::cleanup_insns): Fix bug.
ChangeLog should reference the bug number, like this:
 
PR target/109535
 
 
> 
> ---
>   gcc/config/riscv/riscv-vsetvl.cc | 15 ++-
>   1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc 
> b/gcc/config/riscv/riscv-vsetvl.cc
> index 1b66e3b9eeb..43e2cf08377 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -1592,6 +1592,19 @@ backward_propagate_worthwhile_p (const basic_block 
> cfg_bb,
> return true;
>   }
>   
> +/* Count the number of REGNO in RINSN.  */
> +int
> +count_regno_occurrences (rtx_insn *rinsn, unsigned int regno)
Seems like this ought to be static. Though it's not clear why 
count_occurrences didn't do what you needed.  Can you explain why 
count_occurrences was insufficient for your needs?
 
 
 
 
Jeff
 


Re: Re: [PATCH v4 07/10] vect: Verify that GET_MODE_NUNITS is a multiple of 2.

2023-04-18 Thread juzhe.zhong
Yes, as Kito said.
We won't enable VNx1DImode in auto-vectorization, so there is no point in
fixing it here.
We dynamically adjust the minimum vector length for different '-march' values
according to the RVV ISA specification.
So we strongly suggest dropping this fix.

Thanks.


juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-04-19 02:21
To: Richard Biener; Jeff Law; Palmer Dabbelt
CC: Michael Collison; gcc-patches; 钟居哲
Subject: Re: [PATCH v4 07/10] vect: Verify that GET_MODE_NUNITS is a multiple 
of 2.
A bit more background about RVV:
 
RISC-V provides different VLEN configurations through different ISA
extensions such as `zve32x`, `zve64x` and `v`:
zve32x only guarantees a minimum VLEN of 32 bits,
zve64x guarantees a minimum VLEN of 64 bits,
and v guarantees a minimum VLEN of 128 bits.
 
Current status (without that patch):
 
Zve32x: the mode for one vector register is VNx1SImode, and VNx1DImode
is an invalid mode
- one vector register can hold 1 + 1x SImode elements, where x is 0~n, so it
might hold just one SI
 
Zve64x: the mode for one vector register is VNx1DImode or VNx2SImode
- one vector register can hold 1 + 1x DImode elements, where x is 0~n, so it
might hold just one DI
- one vector register can hold 2 + 2x SImode elements, where x is 0~n, so it
might hold just two SI
 
So what I want to say here is that, in theory, it is really NOT safe to
assume that VNx1DImode holds more than one DI.
 
However, the `v` extension guarantees a minimum VLEN of 128 bits.
 
We are trying to introduce another type/mode mapping for this configuration:
 
v: the mode for one vector register is VNx2DImode or VNx4SImode
- one vector register can hold 2 + 2x DImode elements, where x is 0~n, so it
will hold at least two DI
- one vector register can hold 4 + 4x SImode elements, where x is 0~n, so it
will hold at least four SI
 
So GET_MODE_NUNITS for a single vector register with DI elements will
become 2 (VNx2DImode) where the ISA actually guarantees it, which is a more
precise way to model the RISC-V vector extension.
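The per-extension guarantees above reduce to simple arithmetic: the guaranteed element count per register is the minimum VLEN divided by the element width, which is the constant N in a VNxN mode (the register may hold N + N*x). A sketch assuming a single register (LMUL = 1); `min_vlen_bits` and `min_elts_per_reg` are hypothetical helpers, not GCC functions, and the `v` case reflects the mapping described as being introduced rather than the current one.

```cpp
#include <cassert>
#include <string>

// Minimum VLEN in bits guaranteed by each extension, as listed above.
unsigned min_vlen_bits (const std::string& ext)
{
  if (ext == "zve32x")
    return 32;
  if (ext == "zve64x")
    return 64;
  if (ext == "v")
    return 128;
  assert (false && "unknown extension");
  return 0;
}

// Number of ELT_BITS-wide elements one vector register is guaranteed to
// hold: the constant part N of a VNxN mode.
unsigned min_elts_per_reg (const std::string& ext, unsigned elt_bits)
{
  assert (elt_bits != 0);
  return min_vlen_bits (ext) / elt_bits;
}
```

Under `zve64x`, 64-bit elements give 1 (so VNx1DImode may really hold a single DI), under `v` they give 2, and under `zve32x` the result is 0, consistent with VNx1DImode being invalid there.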
 
 
 
On Tue, Apr 18, 2023 at 10:28 PM Kito Cheng  wrote:
>
> Wait, VNx1DImode can really evaluate to just one element with
> -march=rv64g_zve64x.
>
> I think this should just be fixed in the backend by this patch:
>
> https://patchwork.ozlabs.org/project/gcc/patch/20230414014518.15458-1-juzhe.zh...@rivai.ai/
>
> On Tue, Apr 18, 2023 at 2:12 PM Richard Biener via Gcc-patches
>  wrote:
> >
> > On Mon, Apr 17, 2023 at 8:42 PM Michael Collison  
> > wrote:
> > >
> > > While working on autovectorizing for the RISCV port I encountered an issue
> > > where can_duplicate_and_interleave_p assumes that GET_MODE_NUNITS is
> > > evenly divisible by two. The RISC-V target has vector modes (e.g.
> > > VNx1DImode) where GET_MODE_NUNITS is equal to one.
> > >
> > > Tested on RISCV and x86_64-linux-gnu. Okay?
> >
> > OK.
> >
> > > 2023-03-09  Michael Collison  
> > >
> > > * tree-vect-slp.cc (can_duplicate_and_interleave_p):
> > > Check that GET_MODE_NUNITS is a multiple of 2.
> > > ---
> > >  gcc/tree-vect-slp.cc | 7 +--
> > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > > index d73deaecce0..a64fe454e19 100644
> > > --- a/gcc/tree-vect-slp.cc
> > > +++ b/gcc/tree-vect-slp.cc
> > > @@ -423,10 +423,13 @@ can_duplicate_and_interleave_p (vec_info *vinfo, 
> > > unsigned int count,
> > > (GET_MODE_BITSIZE (int_mode), 1);
> > >   tree vector_type
> > > = get_vectype_for_scalar_type (vinfo, int_type, count);
> > > + poly_int64 half_nelts;
> > >   if (vector_type
> > >   && VECTOR_MODE_P (TYPE_MODE (vector_type))
> > >   && known_eq (GET_MODE_SIZE (TYPE_MODE (vector_type)),
> > > -  GET_MODE_SIZE (base_vector_mode)))
> > > +  GET_MODE_SIZE (base_vector_mode))
> > > + && multiple_p (GET_MODE_NUNITS (TYPE_MODE (vector_type)),
> > > +2, &half_nelts))
> > > {
> > >   /* Try fusing consecutive sequences of COUNT / NVECTORS 
> > > elements
> > >  together into elements of type INT_TYPE and using the 
> > > result
> > > @@ -434,7 +437,7 @@ can_duplicate_and_interleave_p (vec_info *vinfo, 
> > > unsigned int count,
> > >   poly_uint64 nelts = GET_MODE_NUNITS (TYPE_MODE 
> > > (vector_type));
> > >   vec_perm_builder sel1 (nelts, 2, 3);
> > >   vec_perm_builder sel2 (nelts, 2, 3);
> > > - poly_int64 half_nelts = exact_div (nelts, 2);
> > > +
> > >   for (unsigned int i = 0; i < 3; ++i)
> > > {
> > >   sel1.quick_push (i);
> > > --
> > > 2.34.1
> > >
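The patch replaces the unconditional `exact_div (nelts, 2)`, which assumes the division is exact, with a `multiple_p (..., 2, &half_nelts)` guard. Because RVV element counts are poly_ints of the form c0 + c1*x, the test has to hold for every runtime x. A toy sketch of that check; `Poly` and this `multiple_p` are simplified stand-ins written for illustration, not GCC's poly_int templates.

```cpp
#include <cassert>

// Toy degree-1 poly_int: the value is c0 + c1 * x, where the runtime
// parameter x (0, 1, 2, ...) is unknown at compile time.
struct Poly
{
  long c0;
  long c1;
};

// Scalar-divisor form of multiple_p: A is a multiple of B for every x only
// if each coefficient is divisible by B; the quotient is then formed
// coefficient-wise and stored in *Q.
bool multiple_p (Poly a, long b, Poly* q)
{
  assert (b != 0);
  if (a.c0 % b != 0 || a.c1 % b != 0)
    return false;
  q->c0 = a.c0 / b;
  q->c1 = a.c1 / b;
  return true;
}
```

A VNx2DI-style count (2 + 2x) halves cleanly to 1 + 1x, whereas a VNx1DI-style count (1 + 1x) is rejected, because at x = 0 the register holds a single element.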
 


[PATCH] aarch64: Add the cost model for Neoverse N1

2023-04-18 Thread Evandro Menezes via Gcc-patches
This patch adds the cost model for Neoverse N1, based on the information from 
the "Arm Neoverse N1 Software Optimization Guide".

-- 
Evandro Menezes



gcc/ChangeLog:

   * config/aarch64/aarch64-cores.def: Use the Neoverse N1 cost model.
   * config/aarch64/aarch64.cc
   (cortexa76_tunings): Rename variable.
   (neoversen1_addrcost_table): New variable.
   (neoversen1_vector_cost): Likewise.
   (neoversen1_regmove_cost): Likewise.
   (neoversen1_advsimd_vector_cost): Likewise.
   (neoversen1_scalar_issue_info): Likewise.
   (neoversen1_advsimd_issue_info): Likewise.
   (neoversen1_vec_issue_info): Likewise.
   (neoversen1_vector_cost): Likewise.
   (neoversen1_tunings): Likewise.
   * config/arm/aarch-cost-tables.h
   (neoversen1_extra_costs): New variable.

Signed-off-by: Evandro Menezes 
---
gcc/config/aarch64/aarch64-cores.def |  20 ++--
gcc/config/aarch64/aarch64.cc| 155 ---
gcc/config/arm/aarch-cost-tables.h   | 107 ++
3 files changed, 259 insertions(+), 23 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 2ec88c98400..e352e4077b1 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -105,17 +105,17 @@ AARCH64_CORE("thunderx2t99",  thunderx2t99,  
thunderx2t99, V8_1A,  (CRYPTO), thu
/* ARM ('A') cores. */
AARCH64_CORE("cortex-a55",  cortexa55, cortexa53, V8_2A,  (F16, RCPC, DOTPROD), 
cortexa53, 0x41, 0xd05, -1)
AARCH64_CORE("cortex-a75",  cortexa75, cortexa57, V8_2A,  (F16, RCPC, DOTPROD), 
cortexa73, 0x41, 0xd0a, -1)
-AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, V8_2A,  (F16, RCPC, 
DOTPROD), neoversen1, 0x41, 0xd0b, -1)
-AARCH64_CORE("cortex-a76ae",  cortexa76ae, cortexa57, V8_2A,  (F16, RCPC, 
DOTPROD, SSBS), neoversen1, 0x41, 0xd0e, -1)
-AARCH64_CORE("cortex-a77",  cortexa77, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS), neoversen1, 0x41, 0xd0d, -1)
-AARCH64_CORE("cortex-a78",  cortexa78, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS, PROFILE), neoversen1, 0x41, 0xd41, -1)
-AARCH64_CORE("cortex-a78ae",  cortexa78ae, cortexa57, V8_2A,  (F16, RCPC, 
DOTPROD, SSBS, PROFILE), neoversen1, 0x41, 0xd42, -1)
-AARCH64_CORE("cortex-a78c",  cortexa78c, cortexa57, V8_2A,  (F16, RCPC, 
DOTPROD, SSBS, PROFILE, FLAGM, PAUTH), neoversen1, 0x41, 0xd4b, -1)
+AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, V8_2A,  (F16, RCPC, 
DOTPROD), cortexa76, 0x41, 0xd0b, -1)
+AARCH64_CORE("cortex-a76ae",  cortexa76ae, cortexa57, V8_2A,  (F16, RCPC, 
DOTPROD, SSBS), cortexa76, 0x41, 0xd0e, -1)
+AARCH64_CORE("cortex-a77",  cortexa77, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS), cortexa76, 0x41, 0xd0d, -1)
+AARCH64_CORE("cortex-a78",  cortexa78, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS, PROFILE), cortexa76, 0x41, 0xd41, -1)
+AARCH64_CORE("cortex-a78ae",  cortexa78ae, cortexa57, V8_2A,  (F16, RCPC, 
DOTPROD, SSBS, PROFILE), cortexa76, 0x41, 0xd42, -1)
+AARCH64_CORE("cortex-a78c",  cortexa78c, cortexa57, V8_2A,  (F16, RCPC, 
DOTPROD, SSBS, PROFILE, FLAGM, PAUTH), cortexa76, 0x41, 0xd4b, -1)
AARCH64_CORE("cortex-a65",  cortexa65, cortexa53, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS), cortexa73, 0x41, 0xd06, -1)
AARCH64_CORE("cortex-a65ae",  cortexa65ae, cortexa53, V8_2A,  (F16, RCPC, 
DOTPROD, SSBS), cortexa73, 0x41, 0xd43, -1)
-AARCH64_CORE("cortex-x1",  cortexx1, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS, PROFILE), neoversen1, 0x41, 0xd44, -1)
-AARCH64_CORE("cortex-x1c",  cortexx1c, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS, PROFILE, PAUTH), neoversen1, 0x41, 0xd4c, -1)
-AARCH64_CORE("ares",  ares, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, PROFILE), 
neoversen1, 0x41, 0xd0c, -1)
+AARCH64_CORE("cortex-x1",  cortexx1, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS, PROFILE), cortexa76, 0x41, 0xd44, -1)
+AARCH64_CORE("cortex-x1c",  cortexx1c, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS, PROFILE, PAUTH), cortexa76, 0x41, 0xd4c, -1)
+AARCH64_CORE("ares",  ares, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, PROFILE), 
cortexa76, 0x41, 0xd0c, -1)
AARCH64_CORE("neoverse-n1",  neoversen1, cortexa57, V8_2A,  (F16, RCPC, 
DOTPROD, PROFILE), neoversen1, 0x41, 0xd0c, -1)
AARCH64_CORE("neoverse-e1",  neoversee1, cortexa53, V8_2A,  (F16, RCPC, 
DOTPROD, SSBS), cortexa73, 0x41, 0xd4a, -1)

@@ -160,7 +160,7 @@ AARCH64_CORE("cortex-a73.cortex-a53",  cortexa73cortexa53, 
cortexa53, V8A,  (CRC
/* ARM DynamIQ big.LITTLE configurations.  */

AARCH64_CORE("cortex-a75.cortex-a55",  cortexa75cortexa55, cortexa53, V8_2A,  
(F16, RCPC, DOTPROD), cortexa73, 0x41, AARCH64_BIG_LITTLE (0xd0a, 0xd05), -1)
-AARCH64_CORE("cortex-a76.cortex-a55",  cortexa76cortexa55, cortexa53, V8_2A,  
(F16, RCPC, DOTPROD), neoversen1, 0x41, AARCH64_BIG_LITTLE (0xd0b, 0xd05), -1)
+AARCH64_CORE("cortex-a76.cortex-a55",  cortexa76cortexa55, cortexa53, V8_2A,  
(F16, RCPC, DOTPROD), 

[PATCH v7] RISCV: Inline subword atomic ops

2023-04-18 Thread Patrick O'Neill
RISC-V has no support for subword atomic operations; code currently
generates libatomic library calls.

This patch changes the default behavior to inline subword atomic calls 
(using the same logic as the existing library call).
Behavior can be specified using the -minline-atomics and
-mno-inline-atomics command line flags.

gcc/libgcc/config/riscv/atomic.c has the same logic implemented in asm.
This will need to stay for backwards compatibility and the
-mno-inline-atomics flag.
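Conceptually, inlining a subword atomic means operating on the aligned 32-bit word that contains the byte or halfword, changing only the masked bits and retrying if another hart modified the word in between. A portable C++ sketch of that idea for an 8-bit fetch-and-add, assuming a compare-and-swap loop in place of the load-reserved/store-conditional loop RISC-V code would use; `subword_fetch_add` and its `byte_offset` parameter are hypothetical, and `byte_offset` selects bits by value (offset 0 is the least significant byte, which is the lowest address on little-endian RISC-V).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Emulate an 8-bit atomic fetch-and-add using 32-bit compare-and-swap on
// the containing word: mask out the target byte, add within the byte,
// splice it back, and retry if the word changed underneath us.
std::uint8_t subword_fetch_add (std::atomic<std::uint32_t>& word,
                                unsigned byte_offset, std::uint8_t val)
{
  assert (byte_offset < 4);
  const unsigned shift = byte_offset * 8;
  const std::uint32_t mask = std::uint32_t{0xFF} << shift;
  std::uint32_t old_word = word.load ();
  std::uint32_t new_word;
  do
    {
      std::uint32_t old_byte = (old_word & mask) >> shift;
      std::uint32_t sum = (old_byte + val) & 0xFF;  // wrap within the byte
      new_word = (old_word & ~mask) | (sum << shift);
    }
  // On failure, compare_exchange_weak reloads old_word with the current value.
  while (!word.compare_exchange_weak (old_word, new_word));
  return static_cast<std::uint8_t> ((old_word & mask) >> shift);
}
```

Starting from `0x11223344`, adding `0x10` at offset 1 returns the old byte `0x33` and leaves `0x11224344`; a carry out of the byte is discarded instead of spilling into its neighbour.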

2023-04-18 Patrick O'Neill 

PR target/104338
* riscv-protos.h: Add helper function stubs.
* riscv.cc: Add helper functions for subword masking.
* riscv.opt: Add command-line flag.
* sync.md: Add masking logic and inline asm for fetch_and_op,
fetch_and_nand, CAS, and exchange ops.
* invoke.texi: Add blurb regarding command-line flag.
* inline-atomics-1.c: New test.
* inline-atomics-2.c: Likewise.
* inline-atomics-3.c: Likewise.
* inline-atomics-4.c: Likewise.
* inline-atomics-5.c: Likewise.
* inline-atomics-6.c: Likewise.
* inline-atomics-7.c: Likewise.
* inline-atomics-8.c: Likewise.
* atomic.c: Add reference to duplicate logic.

Signed-off-by: Patrick O'Neill 
Signed-off-by: Palmer Dabbelt 
---
Comment from Jeff Law that gives more context to this patch:
"So for others who may be interested.  The motivation here is that for a 
sub-word atomic we currently have to explicitly link in libatomic or we 
get undefined symbols.

This is particularly problematical for the distros because we're one of 
the few (only?) architectures supported by the distros that require 
linking in libatomic for these cases.  The distros don't want to adjust 
each affected package and be stuck carrying that change forward or 
negotiating with all the relevant upstreams.  The distros might tackle 
this problem by porting this patch into their compiler tree which has 
its own set of problems with long term maintenance.

The net is from a usability standpoint it's best if we get this problem 
addressed and backported to our gcc-13 RISC-V coordination branch.

We had held this up pending resolution of some other issues in the 
atomics space.  In retrospect that might have been a mistake."
  https://inbox.sourceware.org/gcc-patches/87y1mpb57m@igel.home/
---
v6: 
https://inbox.sourceware.org/gcc-patches/20230418163913.2429812-1-patr...@rivosinc.com/

Addressed Jeff Law's comments.
  https://inbox.sourceware.org/gcc-patches/87y1mpb57m@igel.home/

Changes:
- Simplified define_expand expressions/removed unneeded register
  constraints
- Improve comment describing riscv_subword_address
- Use #include "inline-atomics-1.c" in inline-atomics-2.c
- Use rtx addr_mask variable to describe the use of the -4 magic number.
- Misc. formatting/define_expand comments

No new failures on trunk.
---
The mapping implemented here matches Libatomic. That mapping changes if
"Implement ISA Manual Table A.6 Mappings" is merged. Depending on which
patch is merged first, I will update the other to make sure the
correct mapping is emitted.
  https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615748.html
---
 gcc/config/riscv/riscv-protos.h   |   2 +
 gcc/config/riscv/riscv.cc |  49 ++
 gcc/config/riscv/riscv.opt|   4 +
 gcc/config/riscv/sync.md  | 301 +
 gcc/doc/invoke.texi   |  10 +-
 .../gcc.target/riscv/inline-atomics-1.c   |  18 +
 .../gcc.target/riscv/inline-atomics-2.c   |   9 +
 .../gcc.target/riscv/inline-atomics-3.c   | 569 ++
 .../gcc.target/riscv/inline-atomics-4.c   | 566 +
 .../gcc.target/riscv/inline-atomics-5.c   |  87 +++
 .../gcc.target/riscv/inline-atomics-6.c   |  87 +++
 .../gcc.target/riscv/inline-atomics-7.c   |  69 +++
 .../gcc.target/riscv/inline-atomics-8.c   |  69 +++
 libgcc/config/riscv/atomic.c  |   2 +
 14 files changed, 1841 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-8.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5244e8dcbf0..02b33e02020 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -79,6 +79,8 @@ extern void riscv_reinit (void);
 extern poly_uint64 riscv_regmode_natural_size (machine_mode);
 extern bool 

[PATCH] aarch64: Add the scheduling model for Neoverse N1

2023-04-18 Thread Evandro Menezes via Gcc-patches
This patch adds the scheduling model for Neoverse N1, based on the information 
from the "Arm Neoverse N1 Software Optimization Guide".

-- 
Evandro Menezes



gcc/ChangeLog:

* config/aarch64/aarch64-cores.def: Use the Neoverse N1 scheduling 
model.
* config/aarch64/aarch64.md: Include `neoverse-n1.md`.
* config/aarch64/neoverse-n1.md: New file.

Signed-off-by: Evandro Menezes 
---
 gcc/config/aarch64/aarch64-cores.def |   2 +-
 gcc/config/aarch64/aarch64.md|   1 +
 gcc/config/aarch64/neoverse-n1.md| 711 +++
 3 files changed, 713 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/aarch64/neoverse-n1.md

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index e352e4077b1..cc842c4e22c 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -116,7 +116,7 @@ AARCH64_CORE("cortex-a65ae",  cortexa65ae, cortexa53, 
V8_2A,  (F16, RCPC, DOTPRO
 AARCH64_CORE("cortex-x1",  cortexx1, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS, PROFILE), cortexa76, 0x41, 0xd44, -1)
 AARCH64_CORE("cortex-x1c",  cortexx1c, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS, PROFILE, PAUTH), cortexa76, 0x41, 0xd4c, -1)
 AARCH64_CORE("ares",  ares, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, PROFILE), 
cortexa76, 0x41, 0xd0c, -1)
-AARCH64_CORE("neoverse-n1",  neoversen1, cortexa57, V8_2A,  (F16, RCPC, 
DOTPROD, PROFILE), neoversen1, 0x41, 0xd0c, -1)
+AARCH64_CORE("neoverse-n1",  neoversen1, neoversen1, V8_2A,  (F16, RCPC, 
DOTPROD, PROFILE), neoversen1, 0x41, 0xd0c, -1)
 AARCH64_CORE("neoverse-e1",  neoversee1, cortexa53, V8_2A,  (F16, RCPC, 
DOTPROD, SSBS), cortexa73, 0x41, 0xd4a, -1)
 
 /* Cavium ('C') cores. */
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 022eef80bc1..6cb9e31259b 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -471,6 +471,7 @@
 (include "../arm/cortex-a57.md")
 (include "../arm/exynos-m1.md")
 (include "falkor.md")
+(include "neoverse-n1.md")
 (include "saphira.md")
 (include "thunderx.md")
 (include "../arm/xgene1.md")
diff --git a/gcc/config/aarch64/neoverse-n1.md 
b/gcc/config/aarch64/neoverse-n1.md
new file mode 100644
index 000..d66fa10c330
--- /dev/null
+++ b/gcc/config/aarch64/neoverse-n1.md
@@ -0,0 +1,711 @@
+;; Arm Neoverse N1 pipeline description
+;; (Based on the "Arm Neoverse N1 Software Optimization Guide")
+;;
+;; Copyright (C) 2014-2023 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+
+;; The Neoverse N1 core is modelled as a multiple issue pipeline that has
+;; the following functional units.
+
+(define_automaton "neoverse_n1")
+
+;; 1 - Two pipelines for integer operations: SX1, SX2.
+
+(define_cpu_unit "neon1_sx1_issue" "neoverse_n1")
+(define_reservation "neon1_sx1" "neon1_sx1_issue")
+
+(define_cpu_unit "neon1_sx2_issue" "neoverse_n1")
+(define_reservation "neon1_sx2" "neon1_sx2_issue")
+
+;; 2 - One pipeline for complex integer operations: MX.
+
+(define_cpu_unit "neon1_mx_issue"
+"neoverse_n1")
+(define_reservation "neon1_mx" "neon1_mx_issue")
+(define_reservation "neon1_m_block" "neon1_mx_issue")
+
+;; 3 - Two asymmetric pipelines for Neon and FP operations: CX1, CX2.
+(define_automaton "neoverse_n1_cx")
+
+(define_cpu_unit "neon1_cx1_issue"
+"neoverse_n1_cx")
+(define_cpu_unit "neon1_cx2_issue"
+"neoverse_n1_cx")
+
+(define_reservation "neon1_cx1" "neon1_cx1_issue")
+(define_reservation "neon1_cx2" "neon1_cx2_issue")
+(define_reservation "neon1_v0_block" "neon1_cx1_issue")
+
+;; 4 - One pipeline for branch operations: BX.
+
+(define_cpu_unit "neon1_bx_issue" "neoverse_n1")
+(define_reservation "neon1_bx" "neon1_bx_issue")
+
+;; 5 - Two pipelines for load and store operations: LS1, LS2.
+
+(define_cpu_unit "neon1_ls1_issue" "neoverse_n1")
+(define_reservation "neon1_ls1" "neon1_ls1_issue")
+
+(define_cpu_unit "neon1_ls2_issue" "neoverse_n1")
+(define_reservation "neon1_ls2" "neon1_ls2_issue")
+
+;; Block all issue queues.
+
+(define_reservation "neon1_block" "neon1_sx1_issue + neon1_sx2_issue
+ + neon1_mx_issue
+ + neon1_cx1_issue + 

Re: [PATCH v5] RISCV: Inline subword atomic ops

2023-04-18 Thread Jeff Law via Gcc-patches




On 4/18/23 14:48, Patrick O'Neill wrote:

On 4/18/23 09:59, Jeff Law wrote:

On 4/18/23 08:28, Patrick O'Neill wrote:
...

+  rtx addr = force_reg (Pmode, XEXP (mem, 0));
+
+  rtx aligned_addr = gen_reg_rtx (Pmode);
+  emit_move_insn (aligned_addr,  gen_rtx_AND (Pmode, addr,
+  gen_int_mode (-4, Pmode)));
So rather than -4 as a magic number, GET_MODE_MASK would be better. 
That may result in needing to rewrap this code.  I'd bring the 
gen_rtx_AND down on a new line, aligned with aligned_addr.
IIUC GET_MODE_MASK generates masks like 0xFF for QI (for example). It 
doesn't have the granularity to generate 0x3 (whose bitwise NOT gives 
-4). I searched the GCC internals docs but couldn't find a function that 
does address alignment masks.

Yea, yea.  Big "duh" on my side.
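In two's complement, -4 is ~3, the mask that clears the low two address bits, so it is an alignment mask rather than a mode mask like GET_MODE_MASK. A small C++ sketch of the address arithmetic, assuming little-endian byte order and a naturally aligned subword that never straddles two words; `SubwordLoc` and `locate_subword` are hypothetical names, not helpers from riscv.cc.

```cpp
#include <cassert>
#include <cstdint>

// Where a byte or halfword lives inside its enclosing aligned 32-bit word.
struct SubwordLoc
{
  std::uintptr_t aligned;  // address of the containing word
  unsigned shift;          // bit position of the subword (little-endian)
};

SubwordLoc locate_subword (std::uintptr_t addr)
{
  // Converting -4 to an unsigned type wraps modulo 2^N, giving ~3.
  static_assert (static_cast<std::uintptr_t> (-4) == ~std::uintptr_t{3},
                 "-4 and ~3 are the same alignment mask");
  SubwordLoc loc;
  loc.aligned = addr & ~std::uintptr_t{3};
  loc.shift = static_cast<unsigned> (addr & 3) * 8;
  assert ((loc.aligned & 3) == 0);
  return loc;
}
```

For a byte at address `0x1003`, the containing word starts at `0x1000` and the byte occupies bits 24 to 31, which is the shift the masking code needs.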

Presumably using SImode is intentional here rather than wanting to use 
word_mode which would be SImode for rv32 and DImode for rv64?  I'm 
going to work based on that assumption, but if it isn't there's more 
work to do to generalize this code.
It's been a year but IIRC it was just simpler to implement (and to me it 
didn't make sense to use 64 bits for a subword op).

Is there a benefit in using 64 bit instructions when computing subwords?
Given that rv64 should have 32bit load/stores, I don't offhand see any 
advantage.




+
+(define_expand "atomic_fetch_nand"
+  [(set (match_operand:SHORT 0 "register_operand" "=&r")
+    (match_operand:SHORT 1 "memory_operand" "+A"))
+   (set (match_dup 1)
+    (unspec_volatile:SHORT
+  [(not:SHORT (and:SHORT (match_dup 1)
+ (match_operand:SHORT 2 "reg_or_0_operand" "rJ")))
+   (match_operand:SI 3 "const_int_operand")] ;; model
+ UNSPEC_SYNC_OLD_OP_SUBWORD))]
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
Just a note, constraints aren't necessary for a define_expand. They 
don't hurt anything though.  They do document expectations, but then 
you have to maintain them over time.  I'm OK leaving them, mostly 
wanted to make sure you're aware they aren't strictly necessary for a 
define_expand.
I wasn't aware, thanks for pointing it out! - you're referring to the 
"TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC", (not the register 
constraints) right?
I was referring to the register constraints like "=&r".  They're ignored 
on define_expand constructors.  A define_expand generates RTL that will 
be matched later by a define_insn.


The "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC" is usually referred 
to as the insn condition.




Thanks for reviewing!

NP.  Looking forward to V6 which I expect will be ready for inclusion.

jeff


Re: [PATCH v2] Add -gcodeview option

2023-04-18 Thread Jeff Law via Gcc-patches




On 4/18/23 14:32, Jason Merrill wrote:

On 4/18/23 15:57, Jeff Law via Gcc-patches wrote:



On 11/20/22 09:54, Mark Harmstone wrote:

On 20/11/22 16:43, Jeff Law wrote:


On 10/26/22 21:38, Mark Harmstone wrote:

Changed to double dashes as per
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604287.html.


What value is there in providing this option now?  IIUC we don't 
have any of the bits yet to actually produce PDB records.   It seems 
to me like this ought to be patch 1/n of a patch to produce PDB 
debug symbols.


This isn't useless, as ld will create symbols for the mangled names 
even without the .debug$S and .debug$T sections being present.


Sorry this didn't get resolved for gcc-13.  The good news is I have 
committed your V2 patch into the trunk for gcc-14.


FYI I've removed the stray obsolete @gol that broke building the docs.

Thanks.  I didn't realize those were obsolete.

jeff


Re: [PATCH v5] RISCV: Inline subword atomic ops

2023-04-18 Thread Patrick O'Neill

On 4/18/23 09:59, Jeff Law wrote:

On 4/18/23 08:28, Patrick O'Neill wrote:
...

+  rtx addr = force_reg (Pmode, XEXP (mem, 0));
+
+  rtx aligned_addr = gen_reg_rtx (Pmode);
+  emit_move_insn (aligned_addr,  gen_rtx_AND (Pmode, addr,
+  gen_int_mode (-4, Pmode)));
So rather than -4 as a magic number, GET_MODE_MASK would be better. 
That may result in needing to rewrap this code.  I'd bring the 
gen_rtx_AND down on a new line, aligned with aligned_addr.
IIUC GET_MODE_MASK generates masks like 0xFF for QI (for example). It 
doesn't have the granularity to generate 0x3 (whose bitwise NOT gives 
-4). I searched the GCC internals docs but couldn't find a function that 
does address alignment masks.
Presumably using SImode is intentional here rather than wanting to use 
word_mode which would be SImode for rv32 and DImode for rv64?  I'm 
going to work based on that assumption, but if it isn't there's more 
work to do to generalize this code.
It's been a year but IIRC it was just simpler to implement (and to me it 
didn't make sense to use 64 bits for a subword op).

Is there a benefit in using 64 bit instructions when computing subwords?

+
+(define_expand "atomic_fetch_nand"
+  [(set (match_operand:SHORT 0 "register_operand" "=&r")
+    (match_operand:SHORT 1 "memory_operand" "+A"))
+   (set (match_dup 1)
+    (unspec_volatile:SHORT
+  [(not:SHORT (and:SHORT (match_dup 1)
+ (match_operand:SHORT 2 "reg_or_0_operand" "rJ")))
+   (match_operand:SI 3 "const_int_operand")] ;; model
+ UNSPEC_SYNC_OLD_OP_SUBWORD))]
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
Just a note, constraints aren't necessary for a define_expand. They 
don't hurt anything though.  They do document expectations, but then 
you have to maintain them over time.  I'm OK leaving them, mostly 
wanted to make sure you're aware they aren't strictly necessary for a 
define_expand.
I wasn't aware, thanks for pointing it out! - you're referring to the 
"TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC", (not the register 
constraints) right?

...

Thanks for reviewing!
Patrick


[PATCH v2] expansion: make layout of x_shift*cost[][][] more efficient

2023-04-18 Thread Vineet Gupta
While debugging expmed.[ch] for PR/108987, I saw that some of the cost arrays
have a less-than-ideal layout:

   x_shift*cost[0..63][speed][modes]

We would want speed to be the first index, since a typical compile will have
it fixed, followed by mode and then the shift values.

There should be no functional change in compiler semantics; the compiler may
only execute slightly faster due to better locality of the shift values for a
given speed and mode, and the layout is also a bit more intuitive when
debugging.
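The locality argument can be checked with plain index arithmetic. A sketch comparing the flat element offset of one cost entry under the old layout (`x_shift_cost[bits]` of a struct assumed to hold `cost[speed][mode]`) and the new `[speed][mode][bits]` layout; the sizes are made-up placeholders for 2, NUM_MODE_IPV_INT and MAX_BITS_PER_WORD, not GCC's real values.

```cpp
#include <cassert>
#include <cstddef>

// Placeholder sizes for illustration only.
constexpr std::size_t kSpeeds = 2;  // stands in for the size/speed index
constexpr std::size_t kModes = 4;   // stands in for NUM_MODE_IPV_INT
constexpr std::size_t kBits = 64;   // stands in for MAX_BITS_PER_WORD

// Old layout: array over bits of a struct holding cost[speed][mode].
std::size_t old_offset (std::size_t speed, std::size_t mode, std::size_t bits)
{
  assert (speed < kSpeeds && mode < kModes && bits < kBits);
  return (bits * kSpeeds + speed) * kModes + mode;
}

// New layout: cost[speed][mode][bits].
std::size_t new_offset (std::size_t speed, std::size_t mode, std::size_t bits)
{
  assert (speed < kSpeeds && mode < kModes && bits < kBits);
  return (speed * kModes + mode) * kBits + bits;
}
```

With speed and mode fixed, consecutive shift counts are one element apart in the new layout but `kSpeeds * kModes` elements apart in the old one, which is the locality improvement the patch describes.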

gcc/Changelog:

* expmed.h (x_shift*_cost): convert to int [speed][mode][shift].
(shift*_cost_ptr ()): Access x_shift*_cost array directly.

Signed-off-by: Vineet Gupta 
---
Changes since v1:
   - Post a non stale version of patch
---
 gcc/expmed.h | 27 +--
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/gcc/expmed.h b/gcc/expmed.h
index c747a0da1637..22ae1d2d0743 100644
--- a/gcc/expmed.h
+++ b/gcc/expmed.h
@@ -161,15 +161,14 @@ struct target_expmed {
   struct expmed_op_cheap x_sdiv_pow2_cheap;
   struct expmed_op_cheap x_smod_pow2_cheap;
 
-  /* Cost of various pieces of RTL.  Note that some of these are indexed by
- shift count and some by mode.  */
+  /* Cost of various pieces of RTL.  */
   int x_zero_cost[2];
   struct expmed_op_costs x_add_cost;
   struct expmed_op_costs x_neg_cost;
-  struct expmed_op_costs x_shift_cost[MAX_BITS_PER_WORD];
-  struct expmed_op_costs x_shiftadd_cost[MAX_BITS_PER_WORD];
-  struct expmed_op_costs x_shiftsub0_cost[MAX_BITS_PER_WORD];
-  struct expmed_op_costs x_shiftsub1_cost[MAX_BITS_PER_WORD];
+  int x_shift_cost[2][NUM_MODE_IPV_INT][MAX_BITS_PER_WORD];
+  int x_shiftadd_cost[2][NUM_MODE_IPV_INT][MAX_BITS_PER_WORD];
+  int x_shiftsub0_cost[2][NUM_MODE_IPV_INT][MAX_BITS_PER_WORD];
+  int x_shiftsub1_cost[2][NUM_MODE_IPV_INT][MAX_BITS_PER_WORD];
   struct expmed_op_costs x_mul_cost;
   struct expmed_op_costs x_sdiv_cost;
   struct expmed_op_costs x_udiv_cost;
@@ -395,8 +394,8 @@ neg_cost (bool speed, machine_mode mode)
 inline int *
 shift_cost_ptr (bool speed, machine_mode mode, int bits)
 {
-  return expmed_op_cost_ptr (&this_target_expmed->x_shift_cost[bits],
-speed, mode);
+  int midx = expmed_mode_index (mode);
+  return &this_target_expmed->x_shift_cost[speed][midx][bits];
 }
 
 /* Set the COST of doing a shift in MODE by BITS when optimizing for SPEED.  */
@@ -421,8 +420,8 @@ shift_cost (bool speed, machine_mode mode, int bits)
 inline int *
 shiftadd_cost_ptr (bool speed, machine_mode mode, int bits)
 {
-  return expmed_op_cost_ptr (&this_target_expmed->x_shiftadd_cost[bits],
-speed, mode);
+  int midx = expmed_mode_index (mode);
+  return &this_target_expmed->x_shiftadd_cost[speed][midx][bits];
 }
 
 /* Set the COST of doing a shift in MODE by BITS followed by an add when
@@ -448,8 +447,8 @@ shiftadd_cost (bool speed, machine_mode mode, int bits)
 inline int *
 shiftsub0_cost_ptr (bool speed, machine_mode mode, int bits)
 {
-  return expmed_op_cost_ptr (&this_target_expmed->x_shiftsub0_cost[bits],
-speed, mode);
+  int midx = expmed_mode_index (mode);
+  return &this_target_expmed->x_shiftsub0_cost[speed][midx][bits];
 }
 
 /* Set the COST of doing a shift in MODE by BITS and then subtracting a
@@ -475,8 +474,8 @@ shiftsub0_cost (bool speed, machine_mode mode, int bits)
 inline int *
 shiftsub1_cost_ptr (bool speed, machine_mode mode, int bits)
 {
-  return expmed_op_cost_ptr (&this_target_expmed->x_shiftsub1_cost[bits],
-speed, mode);
+  int midx = expmed_mode_index (mode);
+  return &this_target_expmed->x_shiftsub1_cost[speed][midx][bits];
 }
 
 /* Set the COST of subtracting a shift in MODE by BITS from a value when
-- 
2.34.1
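To make the locality argument concrete, here is a small sketch comparing the two index orders. The sizes `kNumModes` and `kMaxBits` are hypothetical stand-ins for `NUM_MODE_IPV_INT` and `MAX_BITS_PER_WORD`, and the structs only mimic the shape of `target_expmed`; they are not GCC code. With `[shift][speed][mode]`, consecutive shift counts for a fixed speed and mode are `2 * kNumModes` ints apart, whereas with `[speed][mode][shift]` they are adjacent.

```cpp
#include <cstddef>

// Hypothetical stand-ins for NUM_MODE_IPV_INT and MAX_BITS_PER_WORD.
constexpr int kNumModes = 4;
constexpr int kMaxBits = 64;

// Old shape: indexed [shift][speed][mode].
struct OldLayout { int cost[kMaxBits][2][kNumModes]; };
// New shape from the patch: indexed [speed][mode][shift].
struct NewLayout { int cost[2][kNumModes][kMaxBits]; };

// Byte distance between two int slots of the same object.
std::ptrdiff_t byte_distance(const int *a, const int *b) {
  return reinterpret_cast<const char *>(b) - reinterpret_cast<const char *>(a);
}

// Accessor in the style of shift_cost_ptr: returns the address of the slot
// so callers can both read and set the cost.
int *new_shift_cost_ptr(NewLayout &t, bool speed, int midx, int bits) {
  return &t.cost[speed][midx][bits];
}
```

For an `OldLayout o`, `byte_distance(&o.cost[0][1][2], &o.cost[1][1][2])` is `2 * kNumModes * sizeof(int)`; the corresponding distance in a `NewLayout` is just `sizeof(int)`.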



Re: [PATCH] Add inchash support for vrange.

2023-04-18 Thread Jakub Jelinek via Gcc-patches
On Tue, Apr 18, 2023 at 01:33:47PM +0200, Aldy Hernandez wrote:
> > > +  const irange &r = as_a <irange> (v);
> > > +  if (r.varying_p ())
> > > + hstate.add_int (VR_VARYING);
> > > +  else
> > > + hstate.add_int (VR_RANGE);
> > 
> > Shouldn't this also
> >hstate.add_int (r.num_pairs ());
> > ?
> > Or is that unnecessary because different number of add_wide_int
> > calls will likely result in different hashes then?
> 
> That was my thinking, and we could save one write.
> 
> I can add the num_pairs() if you prefer.  I don't have a strong opinion.

Me neither.  Let's go with your version then.

Jakub
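The point about `num_pairs ()` can be illustrated with a toy incremental hash. This is a sketch under stated assumptions: `ToyHash` is a hypothetical FNV-1a accumulator standing in for `inchash::hash` (not its real algorithm), and the tag value 1 plays the role of `VR_RANGE`. Because each pair contributes a fixed number of `add_int` calls, ranges with different numbers of pairs already feed different-length streams into the hash; a count prefix mainly matters when the range is one field among several hashed back to back.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical FNV-1a accumulator standing in for inchash::hash.
struct ToyHash {
  std::uint64_t h = 14695981039346656037ull;  // FNV-1a 64-bit offset basis
  void add_int(std::uint64_t v) {
    for (int i = 0; i < 8; ++i) {
      h ^= (v >> (8 * i)) & 0xffu;
      h *= 1099511628211ull;  // FNV-1a 64-bit prime
    }
  }
};

using PairList = std::vector<std::pair<long, long>>;

// Hashes a list of [lb, ub] pairs, optionally prefixed by the pair count.
std::uint64_t hash_pairs(const PairList &r, bool add_count) {
  ToyHash hs;
  hs.add_int(1);  // hypothetical tag playing the role of VR_RANGE
  if (add_count)
    hs.add_int(r.size());
  for (const auto &p : r) {
    hs.add_int(static_cast<std::uint64_t>(p.first));
    hs.add_int(static_cast<std::uint64_t>(p.second));
  }
  return hs.h;
}
```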



Re: [PATCH] dse: Use SUBREG_REG for copy_to_mode_reg in DSE replace_read for WORD_REGISTER_OPERATIONS targets [PR109040]

2023-04-18 Thread Jakub Jelinek via Gcc-patches
On Tue, Apr 18, 2023 at 07:35:44AM -0600, Jeff Law wrote:
> 
> 
> On 4/18/23 03:06, Jakub Jelinek wrote:
> > Hi!
> > 
> > While we've agreed this is not the right fix for the PR109040 bug,
> > the patch clearly improves generated code (at least on the testcase from the
> > PR), so I'd like to propose this as optimization heuristics improvement
> > for GCC 14.
> > 
> > Ok for trunk?
> > 
> > 2023-04-18  Jakub Jelinek  
> > 
> > PR target/109040
> > * dse.cc (replace_read): If read_reg is a SUBREG of a word mode
> > REG, for WORD_REGISTER_OPERATIONS copy SUBREG_REG of it into
> > a new REG rather than the SUBREG.
> Doesn't the new behavior need to be conditional on can_create_pseudos_p,
> since the call to copy_to_mode_reg can ultimately call gen_reg_rtx?

Why?  copy_to_mode_reg was used before as well and it unconditionally does
  rtx temp = gen_reg_rtx (mode);
as the first thing in the function.
So, if replace_read is used during post-RA DSE instance, it would already
ICE before.
All the patch changes is that in some cases it will do copy_to_mode_reg
on the SUBREG_REG and create a new SUBREG, instead of doing copy_to_mode_reg
on the original.
I think
  /* No place to keep the value after ra.  */
  && !reload_completed
in record_store prevents recording the rhs values in DSE2 and so
replace_read should never trigger there.

Jakub



Re: [PATCH v2] Add -gcodeview option

2023-04-18 Thread Jason Merrill via Gcc-patches

On 4/18/23 15:57, Jeff Law via Gcc-patches wrote:



On 11/20/22 09:54, Mark Harmstone wrote:

On 20/11/22 16:43, Jeff Law wrote:


On 10/26/22 21:38, Mark Harmstone wrote:

Changed to double dashes as per
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604287.html.


What value is there in providing this option now?  IIUC we don't have 
any of the bits yet to actually produce PDB records.   It seems to me 
like this ought to be patch 1/n of a patch to produce PDB debug symbols.


This isn't useless, as ld will create symbols for the mangled names 
even without the .debug$S and .debug$T sections being present.


Sorry this didn't get resolved for gcc-13.  The good news is I have 
committed your V2 patch into the trunk for gcc-14.


FYI I've removed the stray obsolete @gol that broke building the docs.

Jason



Re: [PATCH v6] RISCV: Inline subword atomic ops

2023-04-18 Thread Palmer Dabbelt

On Tue, 18 Apr 2023 09:39:13 PDT (-0700), Patrick O'Neill wrote:

RISC-V has no support for subword atomic operations; code currently
generates libatomic library calls.

This patch changes the default behavior to inline subword atomic calls
(using the same logic as the existing library call).
Behavior can be specified using the -minline-atomics and
-mno-inline-atomics command line flags.

gcc/libgcc/config/riscv/atomic.c has the same logic implemented in asm.
This will need to stay for backwards compatibility and the
-mno-inline-atomics flag.
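The masking technique can be sketched in portable C++: locate the aligned 32-bit word containing the subword, derive its bit shift and mask, then run a compare-and-swap loop on the whole word that only changes the masked bits. This is an illustration, not the patch's RTL or libgcc's `atomic.c`. It assumes a little-endian target (as RISC-V is) and a 1- or 2-byte subword that does not straddle a word boundary; `subword_address` and `fetch_nand_u16` are hypothetical names loosely modelled on `riscv_subword_address`.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical result of the address computation for a subword access.
struct SubwordAddr {
  std::uintptr_t aligned;  // address rounded down to a 4-byte word
  unsigned shift;          // bit offset of the subword inside that word
  std::uint32_t mask;      // bits belonging to the subword
  std::uint32_t not_mask;  // bits belonging to the rest of the word
};

// Assumes little-endian byte order and size_bytes of 1 or 2.
SubwordAddr subword_address(std::uintptr_t addr, unsigned size_bytes) {
  SubwordAddr r;
  r.aligned = addr & ~static_cast<std::uintptr_t>(3);
  r.shift = static_cast<unsigned>(addr & 3) * 8;
  const std::uint32_t ones = size_bytes == 1 ? 0xffu : 0xffffu;
  r.mask = ones << r.shift;
  r.not_mask = ~r.mask;
  return r;
}

// Emulates a 16-bit atomic fetch_nand with a 32-bit CAS loop; shift is the
// bit offset of the halfword (0 or 16). Returns the previous halfword.
std::uint16_t fetch_nand_u16(std::atomic<std::uint32_t> &word, unsigned shift,
                             std::uint16_t val) {
  const std::uint32_t mask = 0xffffu << shift;
  const std::uint32_t v = static_cast<std::uint32_t>(val) << shift;
  std::uint32_t old = word.load(std::memory_order_relaxed);
  std::uint32_t desired;
  do {
    // NAND only the subword's bits and keep the neighbouring bytes intact.
    const std::uint32_t nand = ~(old & v) & mask;
    desired = (old & ~mask) | nand;
    // On failure, compare_exchange_weak reloads the current value into old.
  } while (!word.compare_exchange_weak(old, desired, std::memory_order_seq_cst,
                                       std::memory_order_relaxed));
  return static_cast<std::uint16_t>((old & mask) >> shift);
}
```

For example, with a word holding `0x12345678`, `fetch_nand_u16(w, 16, 0x00ff)` returns the old upper halfword `0x1234` and leaves `0xffcb5678` in the word, so the lower halfword is untouched.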

2023-04-18 Patrick O'Neill 

PR target/104338
* riscv-protos.h: Add helper function stubs.
* riscv.cc: Add helper functions for subword masking.
* riscv.opt: Add command-line flag.
* sync.md: Add masking logic and inline asm for fetch_and_op,
fetch_and_nand, CAS, and exchange ops.
* invoke.texi: Add blurb regarding command-line flag.
* inline-atomics-1.c: New test.
* inline-atomics-2.c: Likewise.
* inline-atomics-3.c: Likewise.
* inline-atomics-4.c: Likewise.
* inline-atomics-5.c: Likewise.
* inline-atomics-6.c: Likewise.
* inline-atomics-7.c: Likewise.
* inline-atomics-8.c: Likewise.
* atomic.c: Add reference to duplicate logic.

Signed-off-by: Patrick O'Neill 
Signed-off-by: Palmer Dabbelt 
---
v5: 
https://inbox.sourceware.org/gcc-patches/20230418142858.2424851-1-patr...@rivosinc.com/

Addressed Andreas Schwab's comments about the flags/documentation.
  https://inbox.sourceware.org/gcc-patches/87y1mpb57m@igel.home/

No new failures on trunk.


Looks like Jeff had some comments as well.

IMO we should be targeting this for gcc-13: it's enough of a headache 
for distros that they'll likely backport it anyway, so we might as well 
just take on the pain ourselves.


Since Jeff and Kito have chimed in on the code, I'll let them have some
time to look; I wrote some of it, so I'm OK with it.



---
The mapping implemented here matches Libatomic. That mapping changes if
"Implement ISA Manual Table A.6 Mappings" is merged. Depending on which
patch is merged first, I will update the other to make sure the
correct mapping is emitted.
  https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615748.html
---
 gcc/config/riscv/riscv-protos.h   |   2 +
 gcc/config/riscv/riscv.cc |  50 ++
 gcc/config/riscv/riscv.opt|   4 +
 gcc/config/riscv/sync.md  | 314 ++
 gcc/doc/invoke.texi   |  10 +-
 .../gcc.target/riscv/inline-atomics-1.c   |  18 +
 .../gcc.target/riscv/inline-atomics-2.c   |  19 +
 .../gcc.target/riscv/inline-atomics-3.c   | 569 ++
 .../gcc.target/riscv/inline-atomics-4.c   | 566 +
 .../gcc.target/riscv/inline-atomics-5.c   |  87 +++
 .../gcc.target/riscv/inline-atomics-6.c   |  87 +++
 .../gcc.target/riscv/inline-atomics-7.c   |  69 +++
 .../gcc.target/riscv/inline-atomics-8.c   |  69 +++
 libgcc/config/riscv/atomic.c  |   2 +
 14 files changed, 1865 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-8.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5244e8dcbf0..02b33e02020 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -79,6 +79,8 @@ extern void riscv_reinit (void);
 extern poly_uint64 riscv_regmode_natural_size (machine_mode);
 extern bool riscv_v_ext_vector_mode_p (machine_mode);
 extern bool riscv_shamt_matches_mask_p (int, HOST_WIDE_INT);
+extern void riscv_subword_address (rtx, rtx *, rtx *, rtx *, rtx *);
+extern void riscv_lshift_subword (machine_mode, rtx, rtx, rtx *);

 /* Routines implemented in riscv-c.cc.  */
 void riscv_cpu_cpp_builtins (cpp_reader *);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index e4937d1af25..fa0247be22f 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7143,6 +7143,56 @@ riscv_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
& ~zeroed_hardregs);
 }

+/* Helper function for extracting a subword from memory.  */
+
+void
+riscv_subword_address (rtx mem, rtx *aligned_mem, rtx *shift, rtx *mask,
+  rtx *not_mask)
+{
+  /* Align the memory address to a word.  */
+  rtx addr = force_reg (Pmode, XEXP 

Re: [PATCH] ifcvt.cc: Prevent excessive if-conversion for conditional moves

2023-04-18 Thread Jeff Law via Gcc-patches




On 1/10/23 21:20, Takayuki 'January June' Suwa via Gcc-patches wrote:

Currently, cond_move_process_if_block() does the conversion without
balancing the cost of the converted sequence with the original one, but
this should be checked by calling targetm.noce_conversion_profitable_p().

Doing so provides a way, based on the target-specific cost estimate, to
prevent unwanted size growth due to excessive conditional moves when
optimizing for size.

When optimizing for speed, default_noce_conversion_profitable_p() allows
plenty of headroom, so this patch has little impact.

Also, if the target-specific cost estimate is accurate or allows for
margins, the impact should be similarly small.

gcc/ChangeLog:

* ifcvt.cc (cond_move_process_if_block):
Consider the result of targetm.noce_conversion_profitable_p()
when replacing the original sequence with the converted one.

Thanks.  I pushed this to the trunk.

Jeff
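The idea behind the change can be reduced to a toy cost model. The function below is purely hypothetical and is not GCC's `default_noce_conversion_profitable_p` (the real hook inspects the converted insn sequence); it only shows why comparing the two costs blocks growth when optimizing for size while still leaving headroom when optimizing for speed.

```cpp
// Hypothetical model: accept the if-converted sequence only when it costs no
// more than the original, plus an allowance that applies to speed only.
bool toy_conversion_profitable(int original_cost, int converted_cost,
                               bool optimize_for_size, int speed_headroom) {
  if (optimize_for_size)
    return converted_cost <= original_cost;  // no size growth allowed
  return converted_cost <= original_cost + speed_headroom;
}
```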


Re: [PATCH] RISC-V: Update multilib-generator to handle V

2023-04-18 Thread Palmer Dabbelt

On Tue, 18 Apr 2023 08:44:24 PDT (-0700), gcc-patches@gcc.gnu.org wrote:

Yep, if I drop the non-canonical strings via

diff --git a/gcc/config/riscv/multilib-generator b/gcc/config/riscv/multilib-generator
index 58b7198b243..a63a4d69c18 100755
--- a/gcc/config/riscv/multilib-generator
+++ b/gcc/config/riscv/multilib-generator
@@ -174,7 +174,7 @@ for cmodel in cmodels:
 ext_combs = expand_combination(ext)
 alts = sum([[x] + [x + y for y in ext_combs] for x in [arch] + extra], [])
 alts = filter(lambda x: len(x) != 0, alts)
-alts = alts + list(map(lambda a : arch_canonicalize(a, args.misa_spec), alts))
+alts = list(map(lambda a : arch_canonicalize(a, args.misa_spec), alts))

 # Drop duplicated entry.
 alts = unique(alts)

then I can't link `-march=rv32imafdcv`, I need
`-march=rv32imacv_zicsr_zve32f_zve32x_zve64x_zvl128b_zvl32b_zvl64b`.  That's
kind of a headache for users to type in.


Yes, that's a headache for users, but arch string canonicalization is
hidden in the process, so users can still just use rv32imafdcv at
compile time and in the multilib config.

The driver and the multilib-generator script (with arch_canonicalize)
will handle that headache in the background.


Sorry, I'm not exactly sure what you're trying to say.  I just rebuilt 
GCC with this patch (and t-linux-multilib regenerated from it), and it's 
not resolving multilibs for the short names.


Re: [PATCH v2] Add -gcodeview option

2023-04-18 Thread Jeff Law via Gcc-patches




On 11/20/22 09:54, Mark Harmstone wrote:

On 20/11/22 16:43, Jeff Law wrote:


On 10/26/22 21:38, Mark Harmstone wrote:

Changed to double dashes as per
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604287.html.


What value is there in providing this option now?  IIUC we don't have 
any of the bits yet to actually produce PDB records.   It seems to me 
like this ought to be patch 1/n of a patch to produce PDB debug symbols.


This isn't useless, as ld will create symbols for the mangled names even 
without the .debug$S and .debug$T sections being present.
Sorry this didn't get resolved for gcc-13.  The good news is I have 
committed your V2 patch into the trunk for gcc-14.


Thanks for your patience,
jeff


Re: [PATCH] install.texi: Document --enable-decimal-float for AArch64

2023-04-18 Thread Richard Sandiford via Gcc-patches
Christophe Lyon via Gcc-patches  writes:
> When I committed the patches to enable support for DFP on AArch64, I
> forgot to update the installation documentation.
>
> This patch adds AArch64 as needed (same as i386/x86_64).
>
> OK for trunk and gcc-13?

OK for both, thanks.

Richard

> 2023-04-17  Christophe Lyon  
>
>   gcc/
>   * doc/install.texi (enable-decimal-float): Add AArch64.
> ---
>  gcc/doc/install.texi | 15 ---
>  1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
> index 15aef1394f4..b13bc122513 100644
> --- a/gcc/doc/install.texi
> +++ b/gcc/doc/install.texi
> @@ -2178,13 +2178,14 @@ forward to maintain the port.
>  @itemx --enable-decimal-float=dpd
>  @itemx --disable-decimal-float
>  Enable (or disable) support for the C decimal floating point extension
> -that is in the IEEE 754-2008 standard.  This is enabled by default only
> -on PowerPC, i386, and x86_64 GNU/Linux systems.  Other systems may also
> -support it, but require the user to specifically enable it.  You can
> -optionally control which decimal floating point format is used (either
> -@samp{bid} or @samp{dpd}).  The @samp{bid} (binary integer decimal)
> -format is default on i386 and x86_64 systems, and the @samp{dpd}
> -(densely packed decimal) format is default on PowerPC systems.
> +that is in the IEEE 754-2008 standard.  This is enabled by default
> +only on AArch64, PowerPC, i386, and x86_64 GNU/Linux systems.  Other
> +systems may also support it, but require the user to specifically
> +enable it.  You can optionally control which decimal floating point
> +format is used (either @samp{bid} or @samp{dpd}).  The @samp{bid}
> +(binary integer decimal) format is default on AArch64, i386 and x86_64
> +systems, and the @samp{dpd} (densely packed decimal) format is default
> +on PowerPC systems.
>  
>  @item --enable-fixed-point
>  @itemx --disable-fixed-point


[PATCH] testsuite: fix scan-tree-dump patterns [PR83904,PR100297]

2023-04-18 Thread Harald Anlauf via Gcc-patches
Dear all,

the attached patch adjusts the scan-tree-dump patterns of the
reported testcases, which were likely run from a location where
a path in an error message appearing in the tree dump could
accidentally match "free" or "data", respectively.

For the testcase gfortran.dg/reshape_8.f90 I checked with a
failing gfortran-11 that the pattern is appropriate.

OK for mainline?

Thanks,
Harald

From ad7ea82929f65ef34a13dea5a0fe23d567f220e8 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Tue, 18 Apr 2023 21:24:20 +0200
Subject: [PATCH] testsuite: fix scan-tree-dump patterns [PR83904,PR100297]

Adjust scan-tree-dump patterns so that they do not accidentally match a
valid path.

gcc/testsuite/ChangeLog:

	PR testsuite/83904
	PR fortran/100297
	* gfortran.dg/allocatable_function_1.f90: Use "__builtin_free "
	instead of the naive "free".
	* gfortran.dg/reshape_8.f90: Extend pattern from a simple "data".
---
 gcc/testsuite/gfortran.dg/allocatable_function_1.f90 | 2 +-
 gcc/testsuite/gfortran.dg/reshape_8.f90  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/allocatable_function_1.f90 b/gcc/testsuite/gfortran.dg/allocatable_function_1.f90
index f96ebc499e8..e38953bd777 100644
--- a/gcc/testsuite/gfortran.dg/allocatable_function_1.f90
+++ b/gcc/testsuite/gfortran.dg/allocatable_function_1.f90
@@ -107,4 +107,4 @@ contains
 end function bar

 end program alloc_fun
-! { dg-final { scan-tree-dump-times "free" 10 "original" } }
+! { dg-final { scan-tree-dump-times "__builtin_free " 10 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/reshape_8.f90 b/gcc/testsuite/gfortran.dg/reshape_8.f90
index 01799ac5c19..56812124cb8 100644
--- a/gcc/testsuite/gfortran.dg/reshape_8.f90
+++ b/gcc/testsuite/gfortran.dg/reshape_8.f90
@@ -11,4 +11,4 @@ program test
   a = reshape([1,2,3,4], [2,0])
   print *, a
 end
-! { dg-final { scan-tree-dump-times "data" 4 "original" } }
+! { dg-final { scan-tree-dump-not "data..0. =" "original" } }
--
2.35.3
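The failure mode is easy to reproduce with any regex engine: an unanchored pattern such as `free` also matches inside a directory name that happens to appear in the dump. The sketch below uses `std::regex` rather than the Tcl regexps that DejaGnu actually uses, `count_matches` is a hypothetical helper, and the dump text in the usage is made up for illustration.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Counts non-overlapping matches of pattern in text, roughly what a
// scan-tree-dump-times check counts in the dump file.
std::ptrdiff_t count_matches(const std::string &text,
                             const std::string &pattern) {
  const std::regex re(pattern);
  return std::distance(std::sregex_iterator(text.begin(), text.end(), re),
                       std::sregex_iterator());
}
```

Given a dump line calling `__builtin_free` and an error message mentioning `/home/freeman/t.f90`, the pattern `free` counts two matches while `__builtin_free ` counts only the real call.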



Re: [PATCH v5] gcc: Drop obsolete INCLUDE_PTHREAD_H

2023-04-18 Thread Jeff Law via Gcc-patches




On 4/2/23 15:33, Sam James wrote:

gcc/ChangeLog:
* system.h: Drop unused INCLUDE_PTHREAD_H.

Thanks.  I've pushed this to the trunk.
jeff


Re: [PATCH] PHIOPT: Move tree_ssa_cs_elim into pass_cselim::execute.

2023-04-18 Thread Jeff Law via Gcc-patches




On 4/18/23 13:21, Andrew Pinski via Gcc-patches wrote:

This moves around the code for tree_ssa_cs_elim slightly
improving code readability and removing declarations that
are no longer needed.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Remove declaration.
(make_pass_phiopt): Make execute out of line.
(tree_ssa_cs_elim): Move code into ...
(pass_cselim::execute): here.

OK
jeff


Re: [PATCH] c++: Define built-in for std::tuple_element [PR100157]

2023-04-18 Thread Patrick Palka via Gcc-patches
On Tue, 18 Apr 2023, Jason Merrill wrote:

> On 4/11/23 10:21, Patrick Palka wrote:
> > On Thu, 26 Jan 2023, Jason Merrill wrote:
> > 
> > > On 1/25/23 15:35, Patrick Palka wrote:
> > > > On Tue, 17 Jan 2023, Jason Merrill wrote:
> > > > 
> > > > > On 1/9/23 14:25, Patrick Palka via Gcc-patches wrote:
> > > > > > On Mon, 9 Jan 2023, Patrick Palka wrote:
> > > > > > 
> > > > > > > On Wed, 5 Oct 2022, Patrick Palka wrote:
> > > > > > > 
> > > > > > > > On Thu, 7 Jul 2022, Jonathan Wakely via Gcc-patches wrote:
> > > > > > > > 
> > > > > > > > > This adds a new built-in to replace the recursive class
> > > > > > > > > template instantiations done by traits such as
> > > > > > > > > std::tuple_element and std::variant_alternative. The purpose
> > > > > > > > > is to select the Nth type from a list of types, e.g.
> > > > > > > > > __builtin_type_pack_element(1, char, int, float) is int.
> > > > > > > > > 
> > > > > > > > > For a pathological example tuple_element_t<1000, tuple<2000
> > > > > > > > > types...>> the compilation time is reduced by more than 90%
> > > > > > > > > and the memory used by the compiler is reduced by 97%. In
> > > > > > > > > realistic examples the gains will be much smaller, but still
> > > > > > > > > relevant.
> > > > > > > > > 
> > > > > > > > > Clang has a similar built-in, __type_pack_element, but that's
> > > > > > > > > a "magic template" built-in using <> syntax, which GCC doesn't
> > > > > > > > > support. So this provides an equivalent feature, but as a
> > > > > > > > > built-in function using parens instead of <>. I don't really
> > > > > > > > > like the name "type pack element" (it gives you an element
> > > > > > > > > from a pack of types) but the semi-consistency with Clang
> > > > > > > > > seems like a reasonable argument in favour of keeping the
> > > > > > > > > name. I'd be open to alternative names though, e.g.
> > > > > > > > > __builtin_nth_type or __builtin_type_at_index.
> > > > > > > > 
> > > > > > > > Rather than giving the trait a different name from
> > > > > > > > __type_pack_element, I wonder if we could just special case
> > > > > > > > cp_parser_trait to expect <> instead of parens for this trait?
> > > > > > > > 
> > > > > > > > Btw the frontend recently got a generic TRAIT_TYPE tree code,
> > > > > > > > which gets rid of much of the boilerplate of adding a new
> > > > > > > > type-yielding built-in trait, see e.g. cp-trait.def.
> > > > > > > 
> > > > > > > Here's a tested patch based on Jonathan's original patch that
> > > > > > > implements the built-in in terms of TRAIT_TYPE, names it
> > > > > > > __type_pack_element instead of __builtin_type_pack_element, and
> > > > > > > treats invocations of it like a template-id instead of a call (to
> > > > > > > match Clang).
> > > > > > > 
> > > > > > > -- >8 --
> > > > > > > 
> > > > > > > Subject: [PATCH] c++: Define built-in for std::tuple_element [PR100157]
> > > > > > > 
> > > > > > > This adds a new built-in to replace the recursive class template
> > > > > > > instantiations done by traits such as std::tuple_element and
> > > > > > > std::variant_alternative.  The purpose is to select the Nth type
> > > > > > > from a list of types, e.g. __type_pack_element<1, char, int,
> > > > > > > float> is int.  We implement it as a special kind of TRAIT_TYPE.
> > > > > > > 
> > > > > > > For a pathological example tuple_element_t<1000, tuple<2000
> > > > > > > types...>> the compilation time is reduced by more than 90% and
> > > > > > > the memory used by the compiler is reduced by 97%.  In realistic
> > > > > > > examples the gains will be much smaller, but still relevant.
> > > > > > > 
> > > > > > > Unlike the other built-in traits, __type_pack_element uses
> > > > > > > template-id syntax instead of call syntax and is SFINAE-enabled,
> > > > > > > matching Clang's implementation.  And like the other built-in
> > > > > > > traits, it's not mangleable so we can't use it directly in
> > > > > > > function signatures.
> > > > > > > 
> > > > > > > Some caveats:
> > > > > > > 
> > > > > > >  * Clang's version of the built-in seems to act like a "magic
> > > > > > >    template" that can e.g. be used as a template template
> > > > > > >    argument.  For simplicity we implement it in a more ad-hoc way.
> > > > > > >  * Our parsing of 

[PATCH] PHIOPT: Move tree_ssa_cs_elim into pass_cselim::execute.

2023-04-18 Thread Andrew Pinski via Gcc-patches
This moves around the code for tree_ssa_cs_elim slightly
improving code readability and removing declarations that
are no longer needed.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Remove declaration.
(make_pass_phiopt): Make execute out of line.
(tree_ssa_cs_elim): Move code into ...
(pass_cselim::execute): here.
---
 gcc/tree-ssa-phiopt.cc | 118 -
 1 file changed, 57 insertions(+), 61 deletions(-)

diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index 616b5778602..945507be11e 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -55,7 +55,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-propagate.h"
 #include "tree-ssa-dce.h"
 
-static unsigned int tree_ssa_phiopt_worker (bool, bool, bool);
 static bool two_value_replacement (basic_block, basic_block, edge, gphi *,
   tree, tree);
 static bool match_simplify_replacement (basic_block, basic_block,
@@ -78,62 +77,6 @@ static hash_set * get_non_trapping ();
 static void hoist_adjacent_loads (basic_block, basic_block,
  basic_block, basic_block);
 
-/* This pass tries to transform conditional stores into unconditional
-   ones, enabling further simplifications with the simpler then and else
-   blocks.  In particular it replaces this:
-
- bb0:
-   if (cond) goto bb2; else goto bb1;
- bb1:
-   *p = RHS;
- bb2:
-
-   with
-
- bb0:
-   if (cond) goto bb1; else goto bb2;
- bb1:
-   condtmp' = *p;
- bb2:
-   condtmp = PHI 
-   *p = condtmp;
-
-   This transformation can only be done under several constraints,
-   documented below.  It also replaces:
-
- bb0:
-   if (cond) goto bb2; else goto bb1;
- bb1:
-   *p = RHS1;
-   goto bb3;
- bb2:
-   *p = RHS2;
- bb3:
-
-   with
-
- bb0:
-   if (cond) goto bb3; else goto bb1;
- bb1:
- bb3:
-   condtmp = PHI 
-   *p = condtmp;  */
-
-static unsigned int
-tree_ssa_cs_elim (void)
-{
-  unsigned todo;
-  /* ???  We are not interested in loop related info, but the following
- will create it, ICEing as we didn't init loops with pre-headers.
- An interfacing issue of find_data_references_in_bb.  */
-  loop_optimizer_init (LOOPS_NORMAL);
-  scev_initialize ();
-  todo = tree_ssa_phiopt_worker (true, false, false);
-  scev_finalize ();
-  loop_optimizer_finalize ();
-  return todo;
-}
-
 /* Return the singleton PHI in the SEQ of PHIs for edges E0 and E1. */
 
 static gphi *
@@ -4278,6 +4221,47 @@ make_pass_phiopt (gcc::context *ctxt)
   return new pass_phiopt (ctxt);
 }
 
+/* This pass tries to transform conditional stores into unconditional
+   ones, enabling further simplifications with the simpler then and else
+   blocks.  In particular it replaces this:
+
+ bb0:
+   if (cond) goto bb2; else goto bb1;
+ bb1:
+   *p = RHS;
+ bb2:
+
+   with
+
+ bb0:
+   if (cond) goto bb1; else goto bb2;
+ bb1:
+   condtmp' = *p;
+ bb2:
+   condtmp = PHI 
+   *p = condtmp;
+
+   This transformation can only be done under several constraints,
+   documented below.  It also replaces:
+
+ bb0:
+   if (cond) goto bb2; else goto bb1;
+ bb1:
+   *p = RHS1;
+   goto bb3;
+ bb2:
+   *p = RHS2;
+ bb3:
+
+   with
+
+ bb0:
+   if (cond) goto bb3; else goto bb1;
+ bb1:
+ bb3:
+   condtmp = PHI 
+   *p = condtmp;  */
+
 namespace {
 
 const pass_data pass_data_cselim =
@@ -4302,10 +4286,7 @@ public:
 
   /* opt_pass methods: */
   bool gate (function *) final override { return flag_tree_cselim; }
-  unsigned int execute (function *) final override
-  {
-return tree_ssa_cs_elim ();
-  }
+  unsigned int execute (function *) final override;
 
 }; // class pass_cselim
 
@@ -4316,3 +4297,18 @@ make_pass_cselim (gcc::context *ctxt)
 {
   return new pass_cselim (ctxt);
 }
+
+unsigned int
+pass_cselim::execute (function *)
+{
+  unsigned todo;
+  /* ???  We are not interested in loop related info, but the following
+ will create it, ICEing as we didn't init loops with pre-headers.
+ An interfacing issue of find_data_references_in_bb.  */
+  loop_optimizer_init (LOOPS_NORMAL);
+  scev_initialize ();
+  todo = tree_ssa_phiopt_worker (true, false, false);
+  scev_finalize ();
+  loop_optimizer_finalize ();
+  return todo;
+}
-- 
2.39.1
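At the source level, the two transformations described in the moved comment roughly correspond to the pairs of functions below. This is a hedged sketch for intuition only: the pass works on GIMPLE, and it is valid only under the constraints the comment refers to (for example, `*p` must be known not to trap), which these plain C++ functions do not check. All function names are hypothetical.

```cpp
// First transformation, before: *p is written only when cond is false.
void conditional_store(bool cond, int *p, int rhs) {
  if (!cond)
    *p = rhs;
}

// After: the value is selected first and the store becomes unconditional.
void unconditional_store(bool cond, int *p, int rhs) {
  int condtmp = cond ? *p : rhs;
  *p = condtmp;
}

// Second transformation, before: each arm stores a different value.
void two_stores(bool cond, int *p, int rhs1, int rhs2) {
  if (cond)
    *p = rhs2;
  else
    *p = rhs1;
}

// After: a single store of the selected value.
void sunk_store(bool cond, int *p, int rhs1, int rhs2) {
  *p = cond ? rhs2 : rhs1;
}
```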



Re: [PATCH] c++: Define built-in for std::tuple_element [PR100157]

2023-04-18 Thread Jason Merrill via Gcc-patches

On 4/11/23 10:21, Patrick Palka wrote:

On Thu, 26 Jan 2023, Jason Merrill wrote:


On 1/25/23 15:35, Patrick Palka wrote:

On Tue, 17 Jan 2023, Jason Merrill wrote:


On 1/9/23 14:25, Patrick Palka via Gcc-patches wrote:

On Mon, 9 Jan 2023, Patrick Palka wrote:


On Wed, 5 Oct 2022, Patrick Palka wrote:


On Thu, 7 Jul 2022, Jonathan Wakely via Gcc-patches wrote:


This adds a new built-in to replace the recursive class template
instantiations done by traits such as std::tuple_element and
std::variant_alternative. The purpose is to select the Nth type from a
list of types, e.g. __builtin_type_pack_element(1, char, int, float) is
int.

For a pathological example tuple_element_t<1000, tuple<2000 types...>>
the compilation time is reduced by more than 90% and the memory used by
the compiler is reduced by 97%. In realistic examples the gains will be
much smaller, but still relevant.

Clang has a similar built-in, __type_pack_element, but that's a "magic
template" built-in using <> syntax, which GCC doesn't support. So this
provides an equivalent feature, but as a built-in function using parens
instead of <>. I don't really like the name "type pack element" (it
gives you an element from a pack of types) but the semi-consistency with
Clang seems like a reasonable argument in favour of keeping the name.
I'd be open to alternative names though, e.g. __builtin_nth_type or
__builtin_type_at_index.


Rather than giving the trait a different name from __type_pack_element,
I wonder if we could just special case cp_parser_trait to expect <>
instead of parens for this trait?

Btw the frontend recently got a generic TRAIT_TYPE tree code, which gets
rid of much of the boilerplate of adding a new type-yielding built-in
trait, see e.g. cp-trait.def.


Here's a tested patch based on Jonathan's original patch that implements
the built-in in terms of TRAIT_TYPE, names it __type_pack_element
instead of __builtin_type_pack_element, and treats invocations of it
like a template-id instead of a call (to match Clang).

-- >8 --

Subject: [PATCH] c++: Define built-in for std::tuple_element [PR100157]

This adds a new built-in to replace the recursive class template
instantiations done by traits such as std::tuple_element and
std::variant_alternative.  The purpose is to select the Nth type from a
list of types, e.g. __type_pack_element<1, char, int, float> is int.
We implement it as a special kind of TRAIT_TYPE.

For a pathological example tuple_element_t<1000, tuple<2000 types...>>
the compilation time is reduced by more than 90% and the memory used by
the compiler is reduced by 97%.  In realistic examples the gains will be
much smaller, but still relevant.

Unlike the other built-in traits, __type_pack_element uses template-id
syntax instead of call syntax and is SFINAE-enabled, matching Clang's
implementation.  And like the other built-in traits, it's not mangleable
so we can't use it directly in function signatures.

Some caveats:

 * Clang's version of the built-in seems to act like a "magic template"
   that can e.g. be used as a template template argument.  For simplicity
   we implement it in a more ad-hoc way.
 * Our parsing of the <>'s in __type_pack_element<...> is currently
   rudimentary and doesn't try to disambiguate a trailing >> vs > >
   as cp_parser_enclosed_template_argument_list does.


Hmm, this latter caveat turns out to be inconvenient (for code such as
type_pack_element3.C) and admits an easy workaround inspired by what
cp_parser_enclosed_template_argument_list does.

v2: Consider the >> in __type_pack_element<0, int, char>> to be two >'s.
   Handle non-type TRAIT_TYPE_TYPE1 in strip_typedefs (for sake of
   CPTK_TYPE_PACK_ELEMENT).


Why not use cp_parser_enclosed_template_argument_list directly?


If we used cp_parser_enclosed_template_argument_list we would then need
to convert the returned TREE_VEC into a TREE_LIST and also diagnose
argument kind mismatches (i.e. verify the first argument is an
expression and the rest are types).  It seemed like more complexity
overall than just duplicating the >> splitting logic, but I can do that
if you prefer?


I think I would prefer that, parser stuff can be pretty subtle.

Instead of turning the TREE_VEC into a TREE_LIST, we could handle TREE_VEC as
a trait operand?


Sorry for the late follow up...  Here's an updated patch that uses
cp_parser_enclosed_template_argument_list instead of copying the parsing
logic from there.  I put off handling TREE_VEC as a trait operand for
now.  We could convert all variadic traits to use TREE_VEC instead of
TREE_LIST at once in a followup patch.

Bootstrapped and regtested on x86_64-pc-linux-gnu, also tested against
libc++'s tuple/variant impl for good measure (which uses
__type_pack_element when available).

-- >8 --

Subject: [PATCH] c++: Define built-in for std::tuple_element [PR100157]

This adds a new built-in to replace the recursive class template
instantiations done by traits such as 

Re: [PATCH] RISC-V: Fix PR109535

2023-04-18 Thread Kito Cheng via Gcc-patches
Hi Jeff, Ju-Zhe:

Just to let you know, I am running creduce on this testcase to reduce
its size; it's really too big...

On Wed, Apr 19, 2023 at 3:00 AM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 4/17/23 20:03, juzhe.zh...@rivai.ai wrote:
> > From: Ju-Zhe Zhong 
> >
> > gcc/ChangeLog:
> >
> >  * config/riscv/riscv-vsetvl.cc (count_regno_occurrences): New function.
> >  (pass_vsetvl::cleanup_insns): Fix bug.
> ChangeLog should reference the bug number, like this:
>
> PR target/109535
>
>
> >
> > ---
> >   gcc/config/riscv/riscv-vsetvl.cc | 15 ++-
> >   1 file changed, 14 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
> > index 1b66e3b9eeb..43e2cf08377 100644
> > --- a/gcc/config/riscv/riscv-vsetvl.cc
> > +++ b/gcc/config/riscv/riscv-vsetvl.cc
> > @@ -1592,6 +1592,19 @@ backward_propagate_worthwhile_p (const basic_block cfg_bb,
> > return true;
> >   }
> >
> > +/* Count the number of REGNO in RINSN.  */
> > +int
> > +count_regno_occurrences (rtx_insn *rinsn, unsigned int regno)
> Seems like this ought to be static. Though it's not clear why
> count_occurrences didn't do what you needed.  Can you explain why
> count_occurrences was insufficient for your needs?
>
>
>
>
> Jeff


Re: [EXTERNAL] Re: [PATCH] Fix autoprofiledbootstrap build

2023-04-18 Thread Jeff Law via Gcc-patches




On 3/14/23 15:21, Eugene Rozenfeld wrote:

 From 1808fe371ab5618b7c0ce22c0dbecdaf593e516d Mon Sep 17 00:00:00 2001
From: Eugene Rozenfeld
Date: Mon, 21 Nov 2022 13:33:38 -0800
Subject: [PATCH] Fix autoprofiledbootstrap build

1. Fix gcov version
2. Merge perf data collected when compiling the compiler and runtime libraries
3. Fix documentation typo

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:

* Makefile.in: Define PROFILE_MERGER
* Makefile.tpl: Define PROFILE_MERGER
* c/Make-lang.in: Merge perf data collected when compiling cc1 and 
runtime libraries
* cp/Make-lang.in: Merge perf data collected when compiling cc1plus and 
runtime libraries
* lto/Make-lang.in: Merge perf data collected when compiling lto1 and 
runtime libraries
* doc/install.texi: Fix documentation typo

OK
jeff


Re: [PATCH] riscv: relax splitter restrictions for creating pseudos

2023-04-18 Thread Vineet Gupta




On 4/18/23 11:36, Jeff Law wrote:



On 4/18/23 08:36, Vineet Gupta wrote:

[partial addressing of PR/109279]

RISC-V splitters have restrictions against creating pseudos due to a combine
limitation. And despite this being a split-during-combine limitation,
all split passes take the hit due to the way define*_split are used in gcc.

With the original combine issue fixed by 61bee6aed2 ("combine: Don't
record for UNDO_MODE pointers into regno_reg_rtx array [PR104985]")
the RV splitters can now be relaxed.

This improves the codegen in general. e.g.

long long f(void) { return 0x0101010101010101ull; }

Before

li    a0,0x0101
addi    a0,0x0101
slli    a0,a0,16
addi    a0,a0,0x0101
slli    a0,a0,16
addi    a0,a0,0x0101
ret

With patch

li    a5,0x0101
addi    a5,a5,0x0101
mv    a0,a5
slli    a5,a5,32
add    a0,a5,a0
ret

This is testsuite clean, no regression w/ patch.

                   = Summary of gcc testsuite =
                            | # of unexpected case / # of unique unexpected case
                            |      gcc |      g++ | gfortran |
 rv64imafdc/  lp64d/ medlow |    2 / 2 |    1 / 1 |    6 / 1 |
   rv64imac/   lp64/ medlow |    3 / 3 |    1 / 1 |   43 / 8 |
 rv32imafdc/ ilp32d/ medlow |    1 / 1 |    3 / 2 |    6 / 1 |
   rv32imac/  ilp32/ medlow |    1 / 1 |    3 / 2 |   43 / 8 |


This came up as part of IRC chat on PR/109279 and was suggested by
Andrew Pinski.

Signed-off-by: Vineet Gupta 
---
  gcc/config/riscv/riscv-protos.h |  4 +--
  gcc/config/riscv/riscv.cc   | 46 +
  gcc/config/riscv/riscv.md   |  8 +++---
  3 files changed, 24 insertions(+), 34 deletions(-)

This looks fine, except that you don't have a ChangeLog.


Oops sorry, yeah realized that right after pressing send. I did post a 
v2 with the changelog.


It also looks like you don't have write permissions in the repository 
(not listed in the MAINTAINERS file).  We might as well fix the 
latter. You can then add a ChangeLog and push this yourself.


That would be awesome. Thx.


Start with this form:

https://sourceware.org/cgi-bin/pdw/ps_form.cgi

Go ahead and list me as approving your request.


Done.

Thx again.
-Vineet


Re: [PATCH] RISC-V: Fix PR109535

2023-04-18 Thread Jeff Law via Gcc-patches




On 4/17/23 20:03, juzhe.zh...@rivai.ai wrote:

From: Ju-Zhe Zhong 

gcc/ChangeLog:

 * config/riscv/riscv-vsetvl.cc (count_regno_occurrences): New function.
 (pass_vsetvl::cleanup_insns): Fix bug.

ChangeLog should reference the bug number, like this:

PR target/109535




---
  gcc/config/riscv/riscv-vsetvl.cc | 15 ++-
  1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 1b66e3b9eeb..43e2cf08377 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -1592,6 +1592,19 @@ backward_propagate_worthwhile_p (const basic_block cfg_bb,
return true;
  }
  
+/* Count the number of REGNO in RINSN.  */
+int
+count_regno_occurrences (rtx_insn *rinsn, unsigned int regno)
Seems like this ought to be static. Though it's not clear why 
count_occurrences didn't do what you needed.  Can you explain why 
count_occurrences was insufficient for your needs?





Jeff


Re: [PATCH] doc: Document order of define_peephole2 scanning

2023-04-18 Thread Hans-Peter Nilsson via Gcc-patches
> From: Paul Koning 

> Date: Tue, 18 Apr 2023 14:32:07 -0400
> 
> I'm not sure about the meaning of part of this.
> "...resumes at the last generated insn."  Does that mean:

(Neither...)
 
> 1. If a match is found at some insn, the replacement
> defined by the matching define_peephole2 is performed, and
> then the scan resumes at the first of the insns produced
> by the replacement.

This was what I expected.  If it had been this, I wouldn't
have suggested the doc update.  But it isn't: no, it's the
*last produced one*.  If you look at the referenced example
(unfortunately outside of the diff context) it should all be
clear.

> or
> 
> 2. If a match is found at some insn, the replacement
> defined by the matching define_peephole2 is performed, and
> then the scan resumes at the insn immediately following
> the ones just matched.

No, from the last of the replacement insns.

> "Last generated" seems to fit option 1,

Sorry, your confusement confuses me.  I just don't see how
to confuse last with first or matched with generated. :)

> but I'm not sure
> if that's what you meant.  Maybe you could add some words
> to say more explicitly which it is.

I'm referring to an example on the same pdf page.

But perhaps s/resumes at the last generated insn/resumes at
the last insn in the replacement sequence/ would help?

brgds, H-P


> 
>   paul
> 
> > On Apr 18, 2023, at 1:55 PM, Hans-Peter Nilsson via Gcc-patches 
> >  wrote:
> > 
> > Generated pdf inspected.  Ok to commit?
> > 
> > Thoughts on fixing the IMHO wart to also expose all
> > replacements to all define_peephole2?  Looks feasible
> > (famous last words), but then again I haven't checked the
> > history yet.
> > 
> > -- >8 --
> > I was a bit surprised when my define_peephole2 didn't match,
> > but it was because it was expected to partially match the
> > generated output of a previous define_peephole2.  I had
> > assumed that the algorithm exposed newly created opportunities
> > to all define_peephole2's.  While things can change in that
> > direction, let's start with documenting the current state.
> > 
> > * doc/md.texi (define_peephole2): Document order of scanning.
> > ---
> > gcc/doc/md.texi | 8 
> > 1 file changed, 8 insertions(+)
> > 
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index 07bf8bdebffb..0f9e32d2c648 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -9362,6 +9362,14 @@ If the preparation falls through (invokes neither @code{DONE} nor
> > @code{FAIL}), then the @code{define_peephole2} uses the replacement
> > template.
> > 
> > +Insns are scanned in forward order from beginning to end for each basic
> > +block, but the basic blocks are scanned in reverse order of appearance
> > +in a function.  After a successful replacement, scanning for further
> > +opportunities for @code{define_peephole2} matches, resumes at the last
> > +generated insn.  I.e. for the example above, the first insn that can be
> > +matched by another @code{define_peephole2}, is @code{(set (match_dup 3)
> > +(match_dup 4))}.
> > +
> > @end ifset
> > @ifset INTERNALS
> > @node Insn Attributes
> > -- 
> > 2.30.2
> > 
> 


Re: [PATCH v3] vect: Verify that GET_MODE_UNITS is greater than one for vect_grouped_store_supported

2023-04-18 Thread Jeff Law via Gcc-patches




On 4/17/23 10:38, Kevin Lee wrote:

This patch properly guards gcc_assert (multiple_p (m_full_nelts,
m_npatterns)) in vec_perm_indices indices (sel, 2, nelt) for VNx1 vectors.

Based on the feedback from Richard Biener and Richard Sandiford,
multiple_p has been used instead of maybe_lt to compare nelt with the
minimum size 2.

Bootstrap and testing done on x86_64-pc-linux-gnu. Would this be ok for trunk?

Patch V1: https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614463.html
Patch V2: https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614700.html
  
Kevin Lee 

gcc/ChangeLog:

* tree-vect-data-refs.cc (vect_grouped_store_supported): Add new
condition.

I fixed up the indentation and pushed this to the trunk.

jeff


Re: [PATCH] riscv: relax splitter restrictions for creating pseudos

2023-04-18 Thread Jeff Law via Gcc-patches




On 4/18/23 08:36, Vineet Gupta wrote:

[partial addressing of PR/109279]

RISC-V splitters have restrictions against creating pseudos due to a combine
limitation. And despite this being a split-during-combine limitation,
all split passes take the hit due to the way define*_split are used in gcc.

With the original combine issue fixed by 61bee6aed2 ("combine: Don't
record for UNDO_MODE pointers into regno_reg_rtx array [PR104985]")
the RV splitters can now be relaxed.

This improves the codegen in general. e.g.

long long f(void) { return 0x0101010101010101ull; }

Before

li      a0,0x0101
addi    a0,0x0101
slli    a0,a0,16
addi    a0,a0,0x0101
slli    a0,a0,16
addi    a0,a0,0x0101
ret

With patch

li      a5,0x0101
addi    a5,a5,0x0101
mv      a0,a5
slli    a5,a5,32
add     a0,a5,a0
ret

This is testsuite clean, no regression w/ patch.

                   = Summary of gcc testsuite =
                            | # of unexpected case / # of unique unexpected case
                            |      gcc |      g++ | gfortran |
 rv64imafdc/  lp64d/ medlow |    2 / 2 |    1 / 1 |    6 / 1 |
   rv64imac/   lp64/ medlow |    3 / 3 |    1 / 1 |   43 / 8 |
 rv32imafdc/ ilp32d/ medlow |    1 / 1 |    3 / 2 |    6 / 1 |
   rv32imac/  ilp32/ medlow |    1 / 1 |    3 / 2 |   43 / 8 |

This came up as part of IRC chat on PR/109279 and was suggested by
Andrew Pinski.

Signed-off-by: Vineet Gupta 
---
  gcc/config/riscv/riscv-protos.h |  4 +--
  gcc/config/riscv/riscv.cc   | 46 +
  gcc/config/riscv/riscv.md   |  8 +++---
  3 files changed, 24 insertions(+), 34 deletions(-)
This looks fine, except that you don't have a ChangeLog.  It also looks 
like you don't have write permissions in the repository (not listed in 
the MAINTAINERS file).  We might as well fix the latter. You can then 
add a ChangeLog and push this yourself.


Start with this form:

https://sourceware.org/cgi-bin/pdw/ps_form.cgi

Go ahead and list me as approving your request.

Thanks,
jeff


Re: [PATCH] doc: Document order of define_peephole2 scanning

2023-04-18 Thread Paul Koning via Gcc-patches
I'm not sure about the meaning of part of this.  "...resumes at the last 
generated insn."  Does that mean:

1. If a match is found at some insn, the replacement defined by the matching 
define_peephole2 is performed, and then the scan resumes at the first of the 
insns produced by the replacement.

or

2. If a match is found at some insn, the replacement defined by the matching 
define_peephole2 is performed, and then the scan resumes at the insn 
immediately following the ones just matched.

"Last generated" seems to fit option 1, but I'm not sure if that's what you 
meant.  Maybe you could add some words to say more explicitly which it is.

paul

> On Apr 18, 2023, at 1:55 PM, Hans-Peter Nilsson via Gcc-patches 
>  wrote:
> 
> Generated pdf inspected.  Ok to commit?
> 
> Thoughts on fixing the IMHO wart to also expose all
> replacements to all define_peephole2?  Looks feasible
> (famous last words), but then again I haven't checked the
> history yet.
> 
> -- >8 --
> I was a bit surprised when my define_peephole2 didn't match,
> but it was because it was expected to partially match the
> generated output of a previous define_peephole2.  I had
> assumed that the algorithm exposed newly created opportunities
> to all define_peephole2's.  While things can change in that
> direction, let's start with documenting the current state.
> 
>   * doc/md.texi (define_peephole2): Document order of scanning.
> ---
> gcc/doc/md.texi | 8 
> 1 file changed, 8 insertions(+)
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 07bf8bdebffb..0f9e32d2c648 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -9362,6 +9362,14 @@ If the preparation falls through (invokes neither @code{DONE} nor
> @code{FAIL}), then the @code{define_peephole2} uses the replacement
> template.
> 
> +Insns are scanned in forward order from beginning to end for each basic
> +block, but the basic blocks are scanned in reverse order of appearance
> +in a function.  After a successful replacement, scanning for further
> +opportunities for @code{define_peephole2} matches, resumes at the last
> +generated insn.  I.e. for the example above, the first insn that can be
> +matched by another @code{define_peephole2}, is @code{(set (match_dup 3)
> +(match_dup 4))}.
> +
> @end ifset
> @ifset INTERNALS
> @node Insn Attributes
> -- 
> 2.30.2
> 



Re: [PATCH] RISC-V: add TARGET_ZBKB to the condition of bswapsi2, bswapdi2 and rotr3 patterns

2023-04-18 Thread Jeff Law via Gcc-patches




On 4/10/23 04:56, Lin Sinan wrote:

From: Sinan Lin 

Tell gcc that zbkb has these two standard patterns, to enable some
optimizations, e.g. 1) the rrotate_expr could match rotrm3 during expand;
2) hook up __builtin_bswap64 with `rev8` on zbkb64.

Thanks.  I pushed this to the trunk.

jeff


Re: [PATCH v4 07/10] vect: Verify that GET_MODE_NUNITS is a multiple of 2.

2023-04-18 Thread Kito Cheng via Gcc-patches
Some more background about RVV:

RISC-V provides different VLEN configurations through different ISA
extensions like `zve32x`, `zve64x` and `v`:
zve32x just guarantees the minimal VLEN is 32 bits,
zve64x guarantees the minimal VLEN is 64 bits,
and v guarantees the minimal VLEN is 128 bits.

Current status (without that patch):

Zve32x: The mode for one vector register is VNx1SImode, and VNx1DImode
is an invalid mode
 - one vector register could hold 1 + 1x SImode where x is 0~n, so it
might hold just one SI

Zve64x: The mode for one vector register is VNx1DImode or VNx2SImode
 - one vector register could hold 1 + 1x DImode where x is 0~n, so it
might hold just one DI
 - one vector register could hold 2 + 2x SImode where x is 0~n, so it
might hold just two SI

So what I want to say here is that, in theory, it is really NOT safe to
assume VNx1DImode holds at least two DI.

However `v` extension guarantees the minimal VLEN is 128 bits.

We are trying to introduce another type/mode mapping for this configuration:

v: The mode for one vector register is VNx2DImode or VNx4SImode
 - one vector register could hold 2 + 2x DImode where x is 0~n, so it
will hold at least two DI
 - one vector register could hold 4 + 4x SImode where x is 0~n, so it
will hold at least four SI

So GET_MODE_NUNITS for a single vector register with DI mode will
become 2 (VNx2DImode) where possible, which is a more
precise way to model the vector extension for RISC-V.



On Tue, Apr 18, 2023 at 10:28 PM Kito Cheng  wrote:
>
> Wait, VNx1DImode can really evaluate to just one element with
> -march=rv64g_zve64x.
>
> I think this should just be fixed in the backend by this patch:
>
> https://patchwork.ozlabs.org/project/gcc/patch/20230414014518.15458-1-juzhe.zh...@rivai.ai/
>
> On Tue, Apr 18, 2023 at 2:12 PM Richard Biener via Gcc-patches
>  wrote:
> >
> > On Mon, Apr 17, 2023 at 8:42 PM Michael Collison  
> > wrote:
> > >
> > > While working on autovectorizing for the RISCV port I encountered an issue
> > > where can_duplicate_and_interleave_p assumes that GET_MODE_NUNITS is
> > > evenly divisible by two. The RISC-V target has vector modes (e.g.
> > > VNx1DImode), where GET_MODE_NUNITS is equal to one.
> > >
> > > Tested on RISCV and x86_64-linux-gnu. Okay?
> >
> > OK.
> >
> > > 2023-03-09  Michael Collison  
> > >
> > > * tree-vect-slp.cc (can_duplicate_and_interleave_p):
> > > Check that GET_MODE_NUNITS is a multiple of 2.
> > > ---
> > >  gcc/tree-vect-slp.cc | 7 +--
> > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > > index d73deaecce0..a64fe454e19 100644
> > > --- a/gcc/tree-vect-slp.cc
> > > +++ b/gcc/tree-vect-slp.cc
> > > @@ -423,10 +423,13 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned int count,
> > > (GET_MODE_BITSIZE (int_mode), 1);
> > >   tree vector_type
> > > = get_vectype_for_scalar_type (vinfo, int_type, count);
> > > + poly_int64 half_nelts;
> > >   if (vector_type
> > >   && VECTOR_MODE_P (TYPE_MODE (vector_type))
> > >   && known_eq (GET_MODE_SIZE (TYPE_MODE (vector_type)),
> > > -  GET_MODE_SIZE (base_vector_mode)))
> > > +  GET_MODE_SIZE (base_vector_mode))
> > > + && multiple_p (GET_MODE_NUNITS (TYPE_MODE (vector_type)),
> > > +2, &half_nelts))
> > > {
> > >   /* Try fusing consecutive sequences of COUNT / NVECTORS elements
> > >  together into elements of type INT_TYPE and using the result
> > > @@ -434,7 +437,7 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned int count,
> > >   poly_uint64 nelts = GET_MODE_NUNITS (TYPE_MODE (vector_type));
> > >   vec_perm_builder sel1 (nelts, 2, 3);
> > >   vec_perm_builder sel2 (nelts, 2, 3);
> > > - poly_int64 half_nelts = exact_div (nelts, 2);
> > > +
> > >   for (unsigned int i = 0; i < 3; ++i)
> > > {
> > >   sel1.quick_push (i);
> > > --
> > > 2.34.1
> > >


[PATCH] expansion: make layout of x_shift*cost[][][] more efficient

2023-04-18 Thread Vineet Gupta
While debugging expmed.[ch] for PR/108987, I saw that some of the cost arrays
have a less than ideal layout:

   x_shift*cost[0..63][speed][modes]

We would want speed to be the first index since a typical compile will have
that fixed, followed by mode and then the shift values.

This should be a non-functional change from a compiler-semantics point of
view, except for executing slightly faster due to better locality of the
shift values for a given speed and mode. It is also a bit more intuitive
when debugging.

gcc/ChangeLog:

* expmed.h (x_shift*_cost): convert to int [speed][mode][shift].
(shift*_cost_ptr ()): Access x_shift*_cost array directly.

Signed-off-by: Vineet Gupta 
---
 gcc/expmed.h | 27 +--
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/gcc/expmed.h b/gcc/expmed.h
index c747a0da1637..d032beaef550 100644
--- a/gcc/expmed.h
+++ b/gcc/expmed.h
@@ -161,15 +161,14 @@ struct target_expmed {
   struct expmed_op_cheap x_sdiv_pow2_cheap;
   struct expmed_op_cheap x_smod_pow2_cheap;
 
-  /* Cost of various pieces of RTL.  Note that some of these are indexed by
- shift count and some by mode.  */
+  /* Cost of various pieces of RTL.  */
   int x_zero_cost[2];
   struct expmed_op_costs x_add_cost;
   struct expmed_op_costs x_neg_cost;
-  struct expmed_op_costs x_shift_cost[MAX_BITS_PER_WORD];
-  struct expmed_op_costs x_shiftadd_cost[MAX_BITS_PER_WORD];
-  struct expmed_op_costs x_shiftsub0_cost[MAX_BITS_PER_WORD];
-  struct expmed_op_costs x_shiftsub1_cost[MAX_BITS_PER_WORD];
+  int x_shift_cost[2][NUM_MODE_IPV_INT][MAX_BITS_PER_WORD];
+  int x_shiftadd_cost[2][NUM_MODE_IPV_INT][MAX_BITS_PER_WORD];
+  int x_shiftsub0_cost[2][NUM_MODE_IPV_INT][MAX_BITS_PER_WORD];
+  int x_shiftsub1_cost[2][NUM_MODE_IPV_INT][MAX_BITS_PER_WORD];
   struct expmed_op_costs x_mul_cost;
   struct expmed_op_costs x_sdiv_cost;
   struct expmed_op_costs x_udiv_cost;
@@ -395,8 +394,8 @@ neg_cost (bool speed, machine_mode mode)
 inline int *
 shift_cost_ptr (bool speed, machine_mode mode, int bits)
 {
-  return expmed_op_cost_ptr (&this_target_expmed->x_shift_cost[bits],
-speed, mode);
+  int midx = expmed_mode_index (mode);
+  return &this_target_expmed->x_shift_cost[speed][midx][bits];
 }
 
 /* Set the COST of doing a shift in MODE by BITS when optimizing for SPEED.  */
@@ -421,8 +420,8 @@ shift_cost (bool speed, machine_mode mode, int bits)
 inline int *
 shiftadd_cost_ptr (bool speed, machine_mode mode, int bits)
 {
-  return expmed_op_cost_ptr (&this_target_expmed->x_shiftadd_cost[bits],
-speed, mode);
+  int midx = expmed_mode_index (mode);
+  return &this_target_expmed->x_shiftadd_cost[speed][midx][bits];
 }
 
 /* Set the COST of doing a shift in MODE by BITS followed by an add when
@@ -448,8 +447,8 @@ shiftadd_cost (bool speed, machine_mode mode, int bits)
 inline int *
 shiftsub0_cost_ptr (bool speed, machine_mode mode, int bits)
 {
-  return expmed_op_cost_ptr (&this_target_expmed->x_shiftsub0_cost[bits],
-speed, mode);
+  int midx = expmed_mode_index (mode);
+  return &this_target_expmed->x_shiftsub0_cost[speed][midx][bits];
 }
 
 /* Set the COST of doing a shift in MODE by BITS and then subtracting a
@@ -475,8 +474,8 @@ shiftsub0_cost (bool speed, machine_mode mode, int bits)
 inline int *
 shiftsub1_cost_ptr (bool speed, machine_mode mode, int bits)
 {
-  return expmed_op_cost_ptr (&this_target_expmed->x_shiftsub1_cost[bits],
-speed, mode);
+  int midx = expmed_mode_index (mode);
+  return &this_target_expmed->x_shiftsub1_cost[speed][midx][bits];
 }
 
 /* Set the COST of subtracting a shift in MODE by BITS from a value when
-- 
2.34.1



Re: [PATCH v4 09/10] This patch adds a guard for VNx1 vectors that are present in ports like riscv.

2023-04-18 Thread Michael Collison

Thanks Kito I will look into this.


On 4/18/23 10:26, Kito Cheng wrote:

I would prefer to drop this patch from this patch series since I believe
https://patchwork.ozlabs.org/project/gcc/patch/20230414014518.15458-1-juzhe.zh...@rivai.ai/
is the right fix for this issue.

On Tue, Apr 18, 2023 at 2:40 AM Michael Collison  wrote:

From: Kevin Lee 

Kevin Lee 
gcc/ChangeLog:

 * tree-vect-data-refs.cc (vect_grouped_store_supported): Add new
condition
---
  gcc/tree-vect-data-refs.cc | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 8daf7bd7dd3..df393ba723d 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -5399,6 +5399,8 @@ vect_grouped_store_supported (tree vectype, unsigned HOST_WIDE_INT count)
   poly_uint64 nelt = GET_MODE_NUNITS (mode);

   /* The encoding has 2 interleaved stepped patterns.  */
+if(!multiple_p (nelt, 2))
+  return false;
   vec_perm_builder sel (nelt, 2, 3);
   sel.quick_grow (6);
   for (i = 0; i < 3; i++)
--
2.34.1



[PATCH] doc: Document order of define_peephole2 scanning

2023-04-18 Thread Hans-Peter Nilsson via Gcc-patches
Generated pdf inspected.  Ok to commit?

Thoughts on fixing the IMHO wart to also expose all
replacements to all define_peephole2?  Looks feasible
(famous last words), but then again I haven't checked the
history yet.

-- >8 --
I was a bit surprised when my define_peephole2 didn't match,
but it was because it was expected to partially match the
generated output of a previous define_peephole2.  I had
assumed that the algorithm exposed newly created opportunities
to all define_peephole2's.  While things can change in that
direction, let's start with documenting the current state.

* doc/md.texi (define_peephole2): Document order of scanning.
---
 gcc/doc/md.texi | 8 
 1 file changed, 8 insertions(+)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 07bf8bdebffb..0f9e32d2c648 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -9362,6 +9362,14 @@ If the preparation falls through (invokes neither @code{DONE} nor
 @code{FAIL}), then the @code{define_peephole2} uses the replacement
 template.
 
+Insns are scanned in forward order from beginning to end for each basic
+block, but the basic blocks are scanned in reverse order of appearance
+in a function.  After a successful replacement, scanning for further
+opportunities for @code{define_peephole2} matches, resumes at the last
+generated insn.  I.e. for the example above, the first insn that can be
+matched by another @code{define_peephole2}, is @code{(set (match_dup 3)
+(match_dup 4))}.
+
 @end ifset
 @ifset INTERNALS
 @node Insn Attributes
-- 
2.30.2



Re: [PATCH] Docs: Add doc for RISC-V vector intrinsics

2023-04-18 Thread Kito Cheng via Gcc-patches
committed to trunk and gcc 13

On Tue, Apr 18, 2023 at 9:29 PM Jeff Law  wrote:
>
>
>
> On 4/18/23 04:16, Kito Cheng via Gcc-patches wrote:
> > Document which version of the RISC-V vector intrinsics has been
> > implemented in GCC.
> >
> > gcc/ChangeLog:
> >
> >   * doc/extend.texi (Target Builtins): Add RISC-V Vector
> >   Intrinsics.
> >   (RISC-V Vector Intrinsics): Document which version of the
> >   RISC-V vector intrinsics GCC implements, and its reference.
> >
> > OK for 13?
> OK for 13 and the trunk.
>
> jeff


Re: [PATCH] Introduce VIRTUAL_REGISTER_P and VIRTUAL_REGISTER_NUM_P predicates

2023-04-18 Thread Jakub Jelinek via Gcc-patches
On Mon, Apr 17, 2023 at 11:27:28PM +0200, Uros Bizjak via Gcc-patches wrote:
> --- a/gcc/rtl.h
> +++ b/gcc/rtl.h
> @@ -1972,6 +1972,13 @@ set_regno_raw (rtx x, unsigned int regno, unsigned int nregs)
>  /* 1 if the given register number REG_NO corresponds to a hard register.  */
>  #define HARD_REGISTER_NUM_P(REG_NO) ((REG_NO) < FIRST_PSEUDO_REGISTER)
>  
> +/* 1 if the given register REG corresponds to a virtual register.  */
> +#define VIRTUAL_REGISTER_P(REG) (VIRTUAL_REGISTER_NUM_P (REGNO (REG)))
> +
> +/* 1 if the given register number REG_NO corresponds to a virtual register.  */
> +#define VIRTUAL_REGISTER_NUM_P(REG_NO) \
> +  (IN_RANGE (REG_NO, FIRST_VIRTUAL_REGISTER, LAST_VIRTUAL_REGISTER))

Why the ()s around both definitions?
IN_RANGE adds its own and anything on top of that is just superfluous.

Jakub



Re: [PATCH] Introduce VIRTUAL_REGISTER_P and VIRTUAL_REGISTER_NUM_P predicates

2023-04-18 Thread Jeff Law via Gcc-patches




On 4/17/23 15:27, Uros Bizjak via Gcc-patches wrote:

These two predicates are similar to existing HARD_REGISTER_P and
HARD_REGISTER_NUM_P predicates and return 1 if the given register
corresponds to a virtual register.

gcc/ChangeLog:

 * rtl.h (VIRTUAL_REGISTER_P): New predicate.
 (VIRTUAL_REGISTER_NUM_P): Ditto.
 (REGNO_PTR_FRAME_P): Use VIRTUAL_REGISTER_NUM_P predicate.
 * expr.cc (force_operand): Use VIRTUAL_REGISTER_P predicate.
 * function.cc (instantiate_decl_rtl): Ditto.
 * rtlanal.cc (rtx_addr_can_trap_p_1): Ditto.
 (nonzero_address_p): Ditto.
 (refers_to_regno_p): Use VIRTUAL_REGISTER_NUM_P predicate.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

OK for master?

OK.

Jeff


[PATCH] i386: Improve permutations with INSERTPS instruction [PR94908]

2023-04-18 Thread Uros Bizjak via Gcc-patches
INSERTPS can select any element from src and insert it into any position
of the dest.  For SSE4.1 targets, the compiler can generate e.g.

insertps $64, %xmm0, %xmm1

to insert element 1 of %xmm0 into element 0 of %xmm1.

gcc/ChangeLog:

PR target/94908
* config/i386/i386-builtin.def (__builtin_ia32_insertps128):
Use CODE_FOR_sse4_1_insertps_v4sf.
* config/i386/i386-expand.cc (expand_vec_perm_insertps): New.
(expand_vec_perm_1): Call expand_vec_perm_insertps.
* config/i386/i386.md ("unspec"): Declare UNSPEC_INSERTPS here.
* config/i386/mmx.md (mmxscalarmode): New mode attribute.
(@sse4_1_insertps_): New insn pattern.
* config/i386/sse.md (@sse4_1_insertps_): Macroize insn
pattern from sse4_1_insertps using VI4F_128 mode iterator.

gcc/testsuite/ChangeLog:

PR target/94908
* gcc.target/i386/pr94908.c: New test.
* gcc.target/i386/sse4_1-insertps-5.c: New test.
* gcc.target/i386/vperm-v4sf-2-sse4.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 6dae6972d81..f7cf105ae69 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -942,7 +942,7 @@ BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_blendvpd, "__builtin_ia32_blen
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_nothing, "__builtin_ia32_blendvps", IX86_BUILTIN_BLENDVPS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_dppd, "__builtin_ia32_dppd", IX86_BUILTIN_DPPD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_dpps, "__builtin_ia32_dpps", IX86_BUILTIN_DPPS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
-BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_insertps, "__builtin_ia32_insertps128", IX86_BUILTIN_INSERTPS128, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
+BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_insertps_v4sf, "__builtin_ia32_insertps128", IX86_BUILTIN_INSERTPS128, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_mpsadbw, "__builtin_ia32_mpsadbw128", IX86_BUILTIN_MPSADBW128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_INT)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_nothing, "__builtin_ia32_pblendvb128", IX86_BUILTIN_PBLENDVB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_V16QI)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_pblendw, "__builtin_ia32_pblendw128", IX86_BUILTIN_PBLENDW128, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_INT)
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 0d817fc3f3b..9fa549c4c3b 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -18985,6 +18985,78 @@ expand_vec_perm_movs (struct expand_vec_perm_d *d)
   return true;
 }
 
+/* A subroutine of ix86_expand_vec_perm_const_1.  Try to implement D
+   using insertps.  */
+static bool
+expand_vec_perm_insertps (struct expand_vec_perm_d *d)
+{
+  machine_mode vmode = d->vmode;
+  unsigned i, cnt_s, nelt = d->nelt;
+  int cnt_d = -1;
+  rtx src, dst;
+
+  if (d->one_operand_p)
+    return false;
+
+  if (!(TARGET_SSE4_1
+        && (vmode == V4SFmode || vmode == V4SImode
+            || (TARGET_MMX_WITH_SSE
+                && (vmode == V2SFmode || vmode == V2SImode)))))
+    return false;
+
+  for (i = 0; i < nelt; ++i)
+    {
+      if (d->perm[i] == i)
+        continue;
+      if (cnt_d != -1)
+        {
+          cnt_d = -1;
+          break;
+        }
+      cnt_d = i;
+    }
+
+  if (cnt_d == -1)
+    {
+      for (i = 0; i < nelt; ++i)
+        {
+          if (d->perm[i] == i + nelt)
+            continue;
+          if (cnt_d != -1)
+            return false;
+          cnt_d = i;
+        }
+
+      if (cnt_d == -1)
+        return false;
+    }
+
+  if (d->testing_p)
+    return true;
+
+  gcc_assert (cnt_d != -1);
+
+  cnt_s = d->perm[cnt_d];
+  if (cnt_s < nelt)
+    {
+      src = d->op0;
+      dst = d->op1;
+    }
+  else
+    {
+      cnt_s -= nelt;
+      src = d->op1;
+      dst = d->op0;
+    }
+  gcc_assert (cnt_s < nelt);
+
+  rtx x = gen_sse4_1_insertps (vmode, d->target, dst, src,
+                               GEN_INT (cnt_s << 6 | cnt_d << 4));
+  emit_insn (x);
+
+  return true;
+}
+
+
 /* A subroutine of ix86_expand_vec_perm_const_1.  Try to implement D
in terms of blendp[sd] / pblendw / pblendvb / vpblendd.  */
 
@@ -19918,6 +19990,10 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
   if (expand_vec_perm_movs (d))
     return true;
 
+  /* Try the SSE4.1 insertps instruction.  */
+  if (expand_vec_perm_insertps (d))
+    return true;
+
   /* Try the fully general two operand permute.  */
   if (expand_vselect_vconcat (d->target, d->op0, d->op1, d->perm, nelt,
                               d->testing_p))
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index ed689b044c3..1419ea4cff3 100644
--- a/gcc/config/i386/i386.md
+++ 

Re: [PATCH v5] RISCV: Inline subword atomic ops

2023-04-18 Thread Jeff Law via Gcc-patches




On 4/18/23 08:28, Patrick O'Neill wrote:

RISC-V has no support for subword atomic operations; code currently
generates libatomic library calls.

This patch changes the default behavior to inline subword atomic calls
(using the same logic as the existing library call).
Behavior can be specified using the -minline-atomics and
-mno-inline-atomics command line flags.

gcc/libgcc/config/riscv/atomic.c has the same logic implemented in asm.
This will need to stay for backwards compatibility and the
-mno-inline-atomics flag.

2023-04-18 Patrick O'Neill 

PR target/104338
* riscv-protos.h: Add helper function stubs.
* riscv.cc: Add helper functions for subword masking.
* riscv.opt: Add command-line flag.
* sync.md: Add masking logic and inline asm for fetch_and_op,
fetch_and_nand, CAS, and exchange ops.
* invoke.texi: Add blurb regarding command-line flag.
* inline-atomics-1.c: New test.
* inline-atomics-2.c: Likewise.
* inline-atomics-3.c: Likewise.
* inline-atomics-4.c: Likewise.
* inline-atomics-5.c: Likewise.
* inline-atomics-6.c: Likewise.
* inline-atomics-7.c: Likewise.
* inline-atomics-8.c: Likewise.
* atomic.c: Add reference to duplicate logic.
For others who may be interested: the motivation here is that for a 
sub-word atomic we currently have to explicitly link in libatomic or we 
get undefined symbols.


This is particularly problematic for the distros because we're one of 
the few (only?) architectures supported by the distros that require 
linking in libatomic for these cases.  The distros don't want to adjust 
each affected package and be stuck carrying that change forward or 
negotiating with all the relevant upstreams.  The distros might tackle 
this problem by porting this patch into their compiler tree, which has 
its own set of problems with long-term maintenance.


The net is that, from a usability standpoint, it's best if we get this 
problem addressed and backported to our gcc-13 RISC-V coordination branch.


We had held this up pending resolution of some other issues in the 
atomics space.  In retrospect that might have been a mistake.


So with that background...  Here we go...

  
+/* Helper function for extracting a subword from memory.  */

+
+void
+riscv_subword_address (rtx mem, rtx *aligned_mem, rtx *shift, rtx *mask,
+  rtx *not_mask)
So I'd expand on that comment.  The idea is we would like someone 
working in the backend to be able to read the function comment and have 
a reasonable sense of what the function does as well as the inputs and 
return value.  So perhaps something like this:


/* Given memory reference MEM, expand code to compute the aligned
   memory address, shift and mask values and store them into
   *ALIGNED_MEM, *SHIFT, *MASK and *NOT_MASK.  */

Or something like that.



+{
+  /* Align the memory addess to a word.  */

s/addess/address/



+  rtx addr = force_reg (Pmode, XEXP (mem, 0));
+
+  rtx aligned_addr = gen_reg_rtx (Pmode);
+  emit_move_insn (aligned_addr,  gen_rtx_AND (Pmode, addr,
+ gen_int_mode (-4, Pmode)));
So rather than -4 as a magic number, GET_MODE_MASK would be better. 
That may result in needing to rewrap this code.  I'd bring the 
gen_rtx_AND down on a new line, aligned with aligned_addr.


Presumably using SImode is intentional here rather than wanting to use 
word_mode which would be SImode for rv32 and DImode for rv64?  I'm going 
to work based on that assumption, but if it isn't there's more work to 
do to generalize this code.





+
+  /* Calculate the shift amount.  */
+  emit_move_insn (*shift, gen_rtx_AND (SImode, gen_lowpart (SImode, addr),
+  gen_int_mode (3, SImode)));
+  emit_move_insn (*shift, gen_rtx_ASHIFT (SImode, *shift,
+ gen_int_mode(3, SImode)));



Formatting nit: there should be a space before the open paren in the 
gen_int_mode call on that last line above.  This minor goof shows up in 
various places; please review the patch as a whole looking for similar nits.





+
+  /* Calculate the mask.  */
+  int unshifted_mask;
+  if (GET_MODE (mem) == QImode)
+    unshifted_mask = 0xFF;
+  else
+    unshifted_mask = 0xFFFF;
Can you just use GET_MODE_MASK here?  That should simplify this to 
something like


unshifted_mask = GET_MODE_MASK (GET_MODE (mem));









+
+(define_expand "atomic_fetch_nand<mode>"
+  [(set (match_operand:SHORT 0 "register_operand" "=&r")
+        (match_operand:SHORT 1 "memory_operand" "+A"))
+   (set (match_dup 1)
+        (unspec_volatile:SHORT
+          [(not:SHORT (and:SHORT (match_dup 1)
+                                 (match_operand:SHORT 2 "reg_or_0_operand" "rJ")))
+           (match_operand:SI 3 "const_int_operand")] ;; model
+         UNSPEC_SYNC_OLD_OP_SUBWORD))]
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
Just a note, constraints aren't 

RE: [PATCH 1/3]middle-end match.pd: don't emit label if not needed

2023-04-18 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, April 18, 2023 11:38 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; rguent...@suse.de;
> j...@ventanamicro.com
> Subject: Re: [PATCH 1/3]middle-end match.pd: don't emit label if not needed
> 
> On Tue, Apr 18, 2023 at 12:21 PM Tamar Christina via Gcc-patches  patc...@gcc.gnu.org> wrote:
> >
> > Hi All,
> >
> > This is a small QoL codegen improvement for match.pd to not emit
> > labels when they are not needed.  The codegen is nice and there is a
> > small (but consistent) improvement in compile time.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> 
> OK.  Btw - how many labels does this remove? (just wc -l the generated files?)

Not terribly many anymore; it's about 160 lines.  However, benchmarking
shows a consistent 2-5% speedup in compile time (I take the geomean of about
100 compiles).

Regards,
Tamar
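The change tracks whether any generated check actually emits a `goto` to the per-pattern fail label, and writes the label definition only in that case. A miniature C sketch of the same idea follows; `gen_simplify` and `appendf` are hypothetical, and the emitted text only approximates what genmatch produces.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Append formatted text to OUT (capacity N), truncating safely.  */
static void
appendf (char *out, size_t n, const char *fmt, ...)
{
  size_t len = strlen (out);
  if (len + 1 >= n)
    return;
  va_list ap;
  va_start (ap, fmt);
  vsnprintf (out + len, n - len, fmt, ap);
  va_end (ap);
}

/* Generate the body of one simplification into OUT (N must be >= 1).
   Each check that jumps to the fail label sets NEEDS_LABEL; the label
   itself is only written if at least one goto refers to it.  */
static void
gen_simplify (char *out, size_t n, bool check_side_effects,
              bool use_dbg_cnt, unsigned label_id)
{
  char fail_label[32];
  snprintf (fail_label, sizeof fail_label, "next_after_fail%u", label_id);
  bool needs_label = false;

  out[0] = '\0';
  if (check_side_effects)
    {
      appendf (out, n, "if (TREE_SIDE_EFFECTS (_p0)) goto %s;\n", fail_label);
      needs_label = true;
    }
  if (use_dbg_cnt)
    {
      appendf (out, n, "if (UNLIKELY (!dbg_cnt (match))) goto %s;\n",
               fail_label);
      needs_label = true;
    }
  appendf (out, n, "return true;\n");
  if (needs_label)
    appendf (out, n, "%s:;\n", fail_label);
}
```

Setting the flag at each emission site, rather than computing it up front, keeps the bookkeeping next to the code that creates the dependency on the label, which is also how the patch structures it.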

> 
> Richard.
> 
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR bootstrap/84402
> > * genmatch.cc (dt_simplify::gen_1): Only emit labels if used.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
> > index 4fab4135347c43d95546a7df0bb1c4d074937288..638606b2502f640e59527fc5a0b23fa3bedd0cee 100644
> > --- a/gcc/genmatch.cc
> > +++ b/gcc/genmatch.cc
> > @@ -3352,6 +3352,7 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, operand *result)
> >char local_fail_label[256];
> >snprintf (local_fail_label, 256, "next_after_fail%u", ++fail_label_cnt);
> >fail_label = local_fail_label;
> > +  bool needs_label = false;
> >
> >/* Analyze captures and perform early-outs on the incoming arguments
> >   that cover cases we cannot handle.  */
> > @@ -3366,6 +3367,7 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, operand *result)
> > fprintf_indent (f, indent,
> > "if (TREE_SIDE_EFFECTS (_p%d)) goto %s;\n",
> > i, fail_label);
> > +   needs_label = true;
> > if (verbose >= 1)
> >   warning_at (as_a <expr *> (s->match)->ops[i]->location,
> >   "forcing toplevel operand to have no "
> > @@ -3381,6 +3383,7 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, operand *result)
> > fprintf_indent (f, indent,
> > "if (TREE_SIDE_EFFECTS (captures[%d])) "
> > "goto %s;\n", i, fail_label);
> > +   needs_label = true;
> > if (verbose >= 1)
> >   warning_at (cinfo.info[i].c->location,
> >   "forcing captured operand to have no "
> > @@ -3423,7 +3426,10 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, operand *result)
> >  }
> >
> >if (s->kind == simplify::SIMPLIFY)
> > -fprintf_indent (f, indent, "if (UNLIKELY (!dbg_cnt (match))) goto %s;\n", fail_label);
> > +{
> > +  fprintf_indent (f, indent, "if (UNLIKELY (!dbg_cnt (match))) goto %s;\n", fail_label);
> > +  needs_label = true;
> > +}
> >
> >fprintf_indent (f, indent, "if (UNLIKELY (dump_file && (dump_flags & TDF_FOLDING))) "
> >"fprintf (dump_file, \"%s ",
> > @@ -3496,9 +3502,12 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, operand *result)
> >   "res_op->resimplify (%s, valueize);\n",
> >   !e->force_leaf ? "lseq" : "NULL");
> >   if (e->force_leaf)
> > -   fprintf_indent (f, indent,
> > -   "if (!maybe_push_res_to_seq (res_op, NULL)) "
> > -   "goto %s;\n", fail_label);
> > +   {
> > + fprintf_indent (f, indent,
> > + "if (!maybe_push_res_to_seq (res_op, NULL)) "
> > + "goto %s;\n", fail_label);
> > + needs_label = true;
> > +   }
> > }
> > }
> >else if (result->type == operand::OP_CAPTURE
> > @@ -3554,9 +3563,12 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, operand *result)
> >   continue;
> > if (cinfo.info[i].result_use_count
> > > cinfo.info[i].match_use_count)
> > - fprintf_indent (f, indent,
> > - "if (! tree_invariant_p (captures[%d])) "
> > - "goto %s;\n", i, fail_label);
> > + {
> > +   fprintf_indent (f, indent,
> > +   "if (! tree_invariant_p (captures[%d])) "
> > +   "goto %s;\n", i, fail_label);
> > +   needs_label = true;
> > + }
> >   }
> >   for (unsigned j = 

[PATCH] Fix pointer sharing in Value_Range constructor.

2023-04-18 Thread Aldy Hernandez via Gcc-patches
I will push this when a final round of testing finishes on x86-64 Linux.

gcc/ChangeLog:

* value-range.h (Value_Range::Value_Range): Avoid pointer sharing.
---
 gcc/value-range.h | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/gcc/value-range.h b/gcc/value-range.h
index 0eeea79b322..f97596cdd14 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -583,7 +583,18 @@ Value_Range::Value_Range (tree min, tree max, value_range_kind kind)
 inline
 Value_Range::Value_Range (const Value_Range &r)
 {
-  m_vrange = r.m_vrange;
+  if (r.m_vrange == &r.m_irange)
+    {
+      m_irange = r.m_irange;
+      m_vrange = &m_irange;
+    }
+  else if (r.m_vrange == &r.m_frange)
+    {
+      m_frange = r.m_frange;
+      m_vrange = &m_frange;
+    }
+  else
+    m_vrange = &m_unsupported;
 }
 
 // Initialize object so it is possible to store temporaries of TYPE
-- 
2.39.2
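The bug fixed here is that the copy constructor copied m_vrange, a pointer into the *source* object's embedded storage, so the copy silently aliased the original. The same pattern and its fix can be sketched in C; the struct and function names below are hypothetical and much simpler than GCC's C++ Value_Range class.

```c
#include <stdbool.h>

/* Hypothetical C analogue of Value_Range: a wrapper that embeds storage
   for each kind of range and keeps a pointer to whichever is active.  */
struct int_rng { long lo, hi; };
struct flt_rng { double lo, hi; };

struct value_rng
{
  struct int_rng ir;
  struct flt_rng fr;
  int unsupported;   /* placeholder storage for "no supported type" */
  void *active;      /* points at ir, fr or unsupported in THIS object */
};

/* Copy SRC into DST.  A plain struct assignment would leave DST->active
   pointing into SRC's storage; instead copy the active member and
   re-point ACTIVE at DST's own copy.  */
static void
value_rng_copy (struct value_rng *dst, const struct value_rng *src)
{
  if (src->active == (const void *) &src->ir)
    {
      dst->ir = src->ir;
      dst->active = &dst->ir;
    }
  else if (src->active == (const void *) &src->fr)
    {
      dst->fr = src->fr;
      dst->active = &dst->fr;
    }
  else
    dst->active = &dst->unsupported;
}
```

A naive `struct value_rng b = a;` reproduces the original bug: `b.active` still points at `a.ir`.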



[PATCH v6] RISCV: Inline subword atomic ops

2023-04-18 Thread Patrick O'Neill
RISC-V has no support for subword atomic operations; code currently
generates libatomic library calls.

This patch changes the default behavior to inline subword atomic calls 
(using the same logic as the existing library call).
Behavior can be specified using the -minline-atomics and
-mno-inline-atomics command line flags.

gcc/libgcc/config/riscv/atomic.c has the same logic implemented in asm.
This will need to stay for backwards compatibility and the
-mno-inline-atomics flag.
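The masking scheme behind the inline expansion (align the address down to a 32-bit word, shift = (address & 3) * 8, mask = mode mask << shift, then update only the masked bits of the containing word) can be sketched in portable C11. This is an illustration under stated assumptions, not the patch's code: it indexes into an array of 32-bit words instead of using raw addresses, assumes little-endian byte numbering as on RISC-V, and uses a C11 compare-and-swap loop where the patch emits an LR/SC sequence. The helper names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Location of a sub-word field inside an array of 32-bit words.
   Little-endian: byte offset 0 is bits 7:0 of its word.  */
struct subword_loc
{
  size_t word;     /* index of the containing 32-bit word */
  unsigned shift;  /* bit position of the field within that word */
  uint32_t mask;   /* field bits set, all other bits clear */
};

static struct subword_loc
subword_address (size_t byte_off, unsigned size)
{
  struct subword_loc loc;
  loc.word = byte_off / 4;                    /* (byte_off & -4) / 4 */
  loc.shift = (unsigned) (byte_off & 3) * 8;
  loc.mask = (size == 1 ? 0xFFu : 0xFFFFu) << loc.shift;
  return loc;
}

/* Atomic 16-bit fetch-and-add built on a 32-bit CAS loop.  BYTE_OFF must
   be 2-byte aligned so that the field never straddles two words.  */
static uint16_t
fetch_add_u16 (_Atomic uint32_t *words, size_t byte_off, uint16_t val)
{
  struct subword_loc loc = subword_address (byte_off, 2);
  _Atomic uint32_t *w = &words[loc.word];
  uint32_t old = atomic_load (w);
  uint32_t desired;
  do
    {
      uint32_t field = (old & loc.mask) >> loc.shift;
      uint32_t sum = (field + val) & 0xFFFFu;
      /* Keep the neighbouring bytes (~mask), splice in the new field.  */
      desired = (old & ~loc.mask) | (sum << loc.shift);
    }
  while (!atomic_compare_exchange_weak (w, &old, desired));
  return (uint16_t) ((old & loc.mask) >> loc.shift);
}
```

Only the bits under the mask change; the `~mask` term preserves the neighbouring bytes, which is presumably the role of *NOT_MASK in riscv_subword_address.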

2023-04-18 Patrick O'Neill 

PR target/104338
* riscv-protos.h: Add helper function stubs.
* riscv.cc: Add helper functions for subword masking.
* riscv.opt: Add command-line flag.
* sync.md: Add masking logic and inline asm for fetch_and_op,
fetch_and_nand, CAS, and exchange ops.
* invoke.texi: Add blurb regarding command-line flag.
* inline-atomics-1.c: New test.
* inline-atomics-2.c: Likewise.
* inline-atomics-3.c: Likewise.
* inline-atomics-4.c: Likewise.
* inline-atomics-5.c: Likewise.
* inline-atomics-6.c: Likewise.
* inline-atomics-7.c: Likewise.
* inline-atomics-8.c: Likewise.
* atomic.c: Add reference to duplicate logic.

Signed-off-by: Patrick O'Neill 
Signed-off-by: Palmer Dabbelt 
---
v5: 
https://inbox.sourceware.org/gcc-patches/20230418142858.2424851-1-patr...@rivosinc.com/

Addressed Andreas Schwab's comments about the flags/documentation.
  https://inbox.sourceware.org/gcc-patches/87y1mpb57m@igel.home/

No new failures on trunk.
---
The mapping implemented here matches Libatomic. That mapping changes if
"Implement ISA Manual Table A.6 Mappings" is merged. Depending on which
patch is merged first, I will update the other to make sure the
correct mapping is emitted.
  https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615748.html
---
 gcc/config/riscv/riscv-protos.h   |   2 +
 gcc/config/riscv/riscv.cc |  50 ++
 gcc/config/riscv/riscv.opt|   4 +
 gcc/config/riscv/sync.md  | 314 ++
 gcc/doc/invoke.texi   |  10 +-
 .../gcc.target/riscv/inline-atomics-1.c   |  18 +
 .../gcc.target/riscv/inline-atomics-2.c   |  19 +
 .../gcc.target/riscv/inline-atomics-3.c   | 569 ++
 .../gcc.target/riscv/inline-atomics-4.c   | 566 +
 .../gcc.target/riscv/inline-atomics-5.c   |  87 +++
 .../gcc.target/riscv/inline-atomics-6.c   |  87 +++
 .../gcc.target/riscv/inline-atomics-7.c   |  69 +++
 .../gcc.target/riscv/inline-atomics-8.c   |  69 +++
 libgcc/config/riscv/atomic.c  |   2 +
 14 files changed, 1865 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-8.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5244e8dcbf0..02b33e02020 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -79,6 +79,8 @@ extern void riscv_reinit (void);
 extern poly_uint64 riscv_regmode_natural_size (machine_mode);
 extern bool riscv_v_ext_vector_mode_p (machine_mode);
 extern bool riscv_shamt_matches_mask_p (int, HOST_WIDE_INT);
+extern void riscv_subword_address (rtx, rtx *, rtx *, rtx *, rtx *);
+extern void riscv_lshift_subword (machine_mode, rtx, rtx, rtx *);
 
 /* Routines implemented in riscv-c.cc.  */
 void riscv_cpu_cpp_builtins (cpp_reader *);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index e4937d1af25..fa0247be22f 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7143,6 +7143,56 @@ riscv_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
& ~zeroed_hardregs);
 }
 
+/* Helper function for extracting a subword from memory.  */
+
+void
+riscv_subword_address (rtx mem, rtx *aligned_mem, rtx *shift, rtx *mask,
+  rtx *not_mask)
+{
+  /* Align the memory addess to a word.  */
+  rtx addr = force_reg (Pmode, XEXP (mem, 0));
+
+  rtx aligned_addr = gen_reg_rtx (Pmode);
+  emit_move_insn (aligned_addr,  gen_rtx_AND (Pmode, addr,
+ gen_int_mode (-4, Pmode)));
+
+  *aligned_mem = change_address (mem, SImode, aligned_addr);
+
+  /* Calculate the shift amount.  */
+  emit_move_insn (*shift, gen_rtx_AND (SImode, gen_lowpart (SImode, addr),
+  

Re: [committed] libstdc++: Export global iostreams with GLIBCXX_3.4.31 symver [PR108969]

2023-04-18 Thread Jonathan Wakely via Gcc-patches
On Tue, 18 Apr 2023 at 17:13, Jonathan Wakely  wrote:
>
> On Tue, 18 Apr 2023 at 16:59, Jonathan Wakely via Libstdc++
>  wrote:
> > diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
> > index 4ae63094eb7..7c015524b62 100644
> > --- a/libstdc++-v3/config/abi/pre/gnu.ver
> > +++ b/libstdc++-v3/config/abi/pre/gnu.ver
> > @@ -2512,6 +2512,20 @@ GLIBCXX_3.4.31 {
> >  _ZNKSt12__shared_ptrINSt10filesystem28recursive_directory_iterator10_Dir_stackELN9__gnu_cxx12_Lock_policyE[012]EEcvbEv;
> >  _ZNKSt12__shared_ptrINSt10filesystem7__cxx1128recursive_directory_iterator10_Dir_stackELN9__gnu_cxx12_Lock_policyE[012]EEcvbEv;
> >
> > +#if defined(_GLIBCXX_SYMVER_GNU) && defined(_GLIBCXX_SHARED) \
> > +&& defined(_GLIBCXX_HAVE_AS_SYMVER_DIRECTIVE) \
> > +&& defined(_GLIBCXX_HAVE_SYMVER_SYMBOL_RENAMING_RUNTIME_SUPPORT)
>
> I've just noticed that these will never be defined, because this
> linker script is preprocessed with config.h not c++config.h, and so it
> should be:
>
> #if defined(_GLIBCXX_SYMVER_GNU) && defined(_GLIBCXX_SHARED) \
> && defined(HAVE_AS_SYMVER_DIRECTIVE) \
> && defined(HAVE_SYMVER_SYMBOL_RENAMING_RUNTIME_SUPPORT)
>
>
> The condition will never be true as currently written (which doesn't
> actually seem to matter, but Jakub says that re-exporting the symbols
> as the new version is needed in the linker script for older linkers).

Fixed with this patch, pushed to trunk and gcc-13 (after Jakub's approval).

Tested x86_64-linux and sparc-solaris2.11
commit 6067ae4557a3a7e5b08359e78a29b8a9d5dfedce
Author: Jonathan Wakely 
Date:   Tue Apr 18 17:22:40 2023

libstdc++: Fix preprocessor condition in linker script [PR108969]

The linker script is preprocessed with $(top_builddir)/config.h not the
include/$target/bits/c++config.h version, which means that configure
macros do not have the _GLIBCXX_ prefix yet.

The _GLIBCXX_SYMVER_GNU and _GLIBCXX_SHARED checks are redundant,
because the gnu.ver file is only used for _GLIBCXX_SYMVER_GNU and the
linker script is only used for the shared library. Remove those.

libstdc++-v3/ChangeLog:

PR libstdc++/108969
* config/abi/pre/gnu.ver: Fix preprocessor condition.

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index 7c015524b62..311a5056e72 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -2512,9 +2512,8 @@ GLIBCXX_3.4.31 {
 _ZNKSt12__shared_ptrINSt10filesystem28recursive_directory_iterator10_Dir_stackELN9__gnu_cxx12_Lock_policyE[012]EEcvbEv;
 _ZNKSt12__shared_ptrINSt10filesystem7__cxx1128recursive_directory_iterator10_Dir_stackELN9__gnu_cxx12_Lock_policyE[012]EEcvbEv;
 
-#if defined(_GLIBCXX_SYMVER_GNU) && defined(_GLIBCXX_SHARED) \
-&& defined(_GLIBCXX_HAVE_AS_SYMVER_DIRECTIVE) \
-&& defined(_GLIBCXX_HAVE_SYMVER_SYMBOL_RENAMING_RUNTIME_SUPPORT)
+#if defined(HAVE_AS_SYMVER_DIRECTIVE) \
+&& defined(HAVE_SYMVER_SYMBOL_RENAMING_RUNTIME_SUPPORT)
 # See src/c++98/globals_io.cc
 _ZSt3cin;
 _ZSt4cout;


[PATCH] RFC: New compact syntax for insn and insn_split in Machine Descriptions

2023-04-18 Thread Tamar Christina via Gcc-patches
Hi All,

This patch adds support for a compact syntax for specifying constraints in
instruction patterns. Credit for the idea goes to Richard Earnshaw.

I am sending this RFC to get feedback on its inclusion in GCC 14.
With this new syntax we want a clean break from the current limitations to make
something that is hopefully easier to use and maintain.

The motivation for this compact syntax is that it is often quite hard to
correlate the entries in the constraints list, the attributes and the
instruction list.

One has to count entries by hand, which is tedious.  Additionally, changing a
single alternative in the insn changes multiple lines in a diff, making it
harder to see what's going on.

This new syntax takes into account many of the common things that are done in MD
files.  It is also worth saying that this version is intended to deal with the
common case of string-based alternatives.  For C chunks we have some ideas,
but those are not addressed here.

It's easiest to explain with an example:

normal syntax:

(define_insn_and_split "*movsi_aarch64"
  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r, r,w, m, m,  r,  r,  r, w,r,w, w")
        (match_operand:SI 1 "aarch64_mov_operand"  " r,r,k,M,n,Usv,m,m,rZ,w,Usw,Usa,Ush,rZ,w,w,Ds"))]
  "(register_operand (operands[0], SImode)
|| aarch64_reg_or_zero (operands[1], SImode))"
  "@
   mov\\t%w0, %w1
   mov\\t%w0, %w1
   mov\\t%w0, %w1
   mov\\t%w0, %1
   #
   * return aarch64_output_sve_cnt_immediate (\"cnt\", \"%x0\", operands[1]);
   ldr\\t%w0, %1
   ldr\\t%s0, %1
   str\\t%w1, %0
   str\\t%s1, %0
   adrp\\t%x0, %A1\;ldr\\t%w0, [%x0, %L1]
   adr\\t%x0, %c1
   adrp\\t%x0, %A1
   fmov\\t%s0, %w1
   fmov\\t%w0, %s1
   fmov\\t%s0, %s1
   * return aarch64_output_scalar_simd_mov_immediate (operands[1], SImode);"
  "CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), SImode)
&& REG_P (operands[0]) && GP_REGNUM_P (REGNO (operands[0]))"
   [(const_int 0)]
   "{
   aarch64_expand_mov_immediate (operands[0], operands[1]);
   DONE;
}"
  ;; The "mov_imm" type for CNT is just a placeholder.
  [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm,load_4,
                     load_4,store_4,store_4,load_4,adr,adr,f_mcr,f_mrc,fmov,neon_move")
   (set_attr "arch"   "*,*,*,*,*,sve,*,fp,*,fp,*,*,*,fp,fp,fp,simd")
   (set_attr "length" "4,4,4,4,*,  4,4, 4,4, 4,8,4,4, 4, 4, 4,   4")
]
)

New syntax:

(define_insn_and_split "*movsi_aarch64"
  [(set (match_operand:SI 0 "nonimmediate_operand")
(match_operand:SI 1 "aarch64_mov_operand"))]
  "(register_operand (operands[0], SImode)
|| aarch64_reg_or_zero (operands[1], SImode))"
  "@@ (cons: 0 1; attrs: type arch length)
   [=r, r  ; mov_reg  , *   , 4] mov\t%w0, %w1
   [k , r  ; mov_reg  , *   , 4] ^
   [r , k  ; mov_reg  , *   , 4] ^
   [r , M  ; mov_imm  , *   , 4] mov\t%w0, %1
   [r , n  ; mov_imm  , *   , *] #
   [r , Usv; mov_imm  , sve , 4] << aarch64_output_sve_cnt_immediate ('cnt', '%x0', operands[1]);
   [r , m  ; load_4   , *   , 4] ldr\t%w0, %1
   [w , m  ; load_4   , fp  , 4] ldr\t%s0, %1
   [m , rZ ; store_4  , *   , 4] str\t%w1, %0
   [m , w  ; store_4  , fp  , 4] str\t%s1, %0
   [r , Usw; load_4   , *   , 8] adrp\t%x0, %A1;ldr\t%w0, [%x0, %L1]
   [r , Usa; adr  , *   , 4] adr\t%x0, %c1
   [r , Ush; adr  , *   , 4] adrp\t%x0, %A1
   [w , rZ ; f_mcr, fp  , 4] fmov\t%s0, %w1
   [r , w  ; f_mrc, fp  , 4] fmov\t%w0, %s1
   [w , w  ; fmov , fp  , 4] fmov\t%s0, %s1
   [w , Ds ; neon_move, simd, 4] << aarch64_output_scalar_simd_mov_immediate (operands[1], SImode);"
  "CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), SImode)
&& REG_P (operands[0]) && GP_REGNUM_P (REGNO (operands[0]))"
  [(const_int 0)]
  {
aarch64_expand_mov_immediate (operands[0], operands[1]);
DONE;
  }
  ;; The "mov_imm" type for CNT is just a placeholder.
)
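Each alternative row of the new syntax can be split mechanically into constraints, attribute values and an output template. The C sketch below assumes a layout with both a `cons:` and an `attrs:` section, so everything before the `;` inside the brackets is treated as constraints. `parse_alt_row` is a hypothetical helper, not the gensupport implementation; it uses fixed-size buffers and passes the `^` and `<<` forms from the example through as plain template text.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_FIELDS 8
#define FIELD_LEN 32

/* One parsed alternative: constraints, attribute values, template.  */
struct alt_row
{
  int n_cons, n_attrs;
  char cons[MAX_FIELDS][FIELD_LEN];
  char attrs[MAX_FIELDS][FIELD_LEN];
  char tmpl[128];
};

/* Split the comma-separated list in [S, E) into OUT, dropping all
   whitespace.  Returns the number of fields, or -1 on overflow.  */
static int
split_fields (const char *s, const char *e, char out[][FIELD_LEN])
{
  int n = 0;
  size_t len = 0;
  out[0][0] = '\0';
  for (const char *p = s; p < e; ++p)
    {
      if (*p == ',')
        {
          if (++n >= MAX_FIELDS)
            return -1;
          len = 0;
          out[n][0] = '\0';
        }
      else if (!isspace ((unsigned char) *p))
        {
          if (len + 1 >= FIELD_LEN)
            return -1;
          out[n][len++] = *p;
          out[n][len] = '\0';
        }
    }
  return n + 1;
}

/* Parse "  [c0, c1 ; a0, a1, a2] template" into ROW.  The ';' is only
   looked for inside the brackets, so templates may contain ';'.  */
static int
parse_alt_row (const char *line, struct alt_row *row)
{
  while (isspace ((unsigned char) *line))
    ++line;
  if (*line != '[')
    return -1;
  const char *close = strchr (line, ']');
  if (!close)
    return -1;
  const char *semi = memchr (line, ';', (size_t) (close - line));
  row->n_cons = split_fields (line + 1, semi ? semi : close, row->cons);
  row->n_attrs = semi ? split_fields (semi + 1, close, row->attrs) : 0;
  if (row->n_cons < 0 || row->n_attrs < 0)
    return -1;
  const char *t = close + 1;
  while (isspace ((unsigned char) *t))
    ++t;
  snprintf (row->tmpl, sizeof row->tmpl, "%s", t);
  return 0;
}
```

Because each row carries its own constraints and attributes, editing one alternative touches exactly one line, which is the diff-readability benefit described above.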

The patch contains some more rewritten examples for both Arm and AArch64.  I
have included them for examples in this RFC but the final version posted in
GCC 14 will have these split out.

The main syntax rules are as follows (See docs for full rules):
  - Template must start with "@@" to use the new syntax.
  - "@@" is followed by a layout in parentheses which is "cons:" followed by
a list of match_operand/match_scratch IDs, then a semicolon, then the
same for attributes ("attrs:"). Both sections are optional (so you can
use only cons, or only attrs, or both), and cons must come before attrs
if present.
  - Each alternative begins with any amount of whitespace.
  - Following the whitespace is a comma-separated list of constraints and/or
attributes within brackets [], with sections separated by a semicolon.
  - Following the closing ']' is any amount of whitespace, and then the actual
asm output.
  - Spaces are allowed in the list (they will simply be removed).
  - All alternatives should be specified: a blank list should be
"[,,]", 

[COMMITTED] Add GTY support for vrange.

2023-04-18 Thread Aldy Hernandez via Gcc-patches
IPA currently puts *some* iranges in GC memory.  When I contribute
support for generic ranges in IPA, we'll need to change this to
vrange.  This patch adds GTY support for both vrange and frange.

gcc/ChangeLog:

* value-range.cc (gt_ggc_mx): New.
(gt_pch_nx): New.
* value-range.h (class vrange): Add GTY marker.
(class frange): Same.
(gt_ggc_mx): Remove.
(gt_pch_nx): Remove.
---
 gcc/value-range.cc | 85 ++
 gcc/value-range.h  | 51 
 2 files changed, 99 insertions(+), 37 deletions(-)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 3b3102bc6d0..17f4e1b9f59 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -3252,6 +3252,91 @@ vrp_operand_equal_p (const_tree val1, const_tree val2)
   return true;
 }
 
+void
+gt_ggc_mx (irange *x)
+{
+  for (unsigned i = 0; i < x->m_num_ranges; ++i)
+    {
+      gt_ggc_mx (x->m_base[i * 2]);
+      gt_ggc_mx (x->m_base[i * 2 + 1]);
+    }
+  if (x->m_nonzero_mask)
+    gt_ggc_mx (x->m_nonzero_mask);
+}
+
+void
+gt_pch_nx (irange *x)
+{
+  for (unsigned i = 0; i < x->m_num_ranges; ++i)
+    {
+      gt_pch_nx (x->m_base[i * 2]);
+      gt_pch_nx (x->m_base[i * 2 + 1]);
+    }
+  if (x->m_nonzero_mask)
+    gt_pch_nx (x->m_nonzero_mask);
+}
+
+void
+gt_pch_nx (irange *x, gt_pointer_operator op, void *cookie)
+{
+  for (unsigned i = 0; i < x->m_num_ranges; ++i)
+    {
+      op (&x->m_base[i * 2], NULL, cookie);
+      op (&x->m_base[i * 2 + 1], NULL, cookie);
+    }
+  if (x->m_nonzero_mask)
+    op (&x->m_nonzero_mask, NULL, cookie);
+}
+
+void
+gt_ggc_mx (frange *x)
+{
+  gt_ggc_mx (x->m_type);
+}
+
+void
+gt_pch_nx (frange *x)
+{
+  gt_pch_nx (x->m_type);
+}
+
+void
+gt_pch_nx (frange *x, gt_pointer_operator op, void *cookie)
+{
+  op (&x->m_type, NULL, cookie);
+}
+
+void
+gt_ggc_mx (vrange *x)
+{
+  if (is_a <irange> (*x))
+    return gt_ggc_mx ((irange *) x);
+  if (is_a <frange> (*x))
+    return gt_ggc_mx ((frange *) x);
+  gcc_unreachable ();
+}
+
+void
+gt_pch_nx (vrange *x)
+{
+  if (is_a <irange> (*x))
+    return gt_pch_nx ((irange *) x);
+  if (is_a <frange> (*x))
+    return gt_pch_nx ((frange *) x);
+  gcc_unreachable ();
+}
+
+void
+gt_pch_nx (vrange *x, gt_pointer_operator op, void *cookie)
+{
+  if (is_a <irange> (*x))
+    gt_pch_nx ((irange *) x, op, cookie);
+  else if (is_a <frange> (*x))
+    gt_pch_nx ((frange *) x, op, cookie);
+  else
+    gcc_unreachable ();
+}
+
 // ?? These stubs are for ipa-prop.cc which use a value_range in a
 // hash_traits.  hash-traits.h defines an extern of gt_ggc_mx (T &)
 // instead of picking up the gt_ggc_mx (T *) version.
diff --git a/gcc/value-range.h b/gcc/value-range.h
index 5545cce5024..0eeea79b322 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -72,7 +72,7 @@ enum value_range_discriminator
 // if (f.supports_type_p (type)) ...
 //}
 
-class vrange
+class GTY((user)) vrange
 {
   template <typename T> friend bool is_a (vrange &);
   friend class Value_Range;
@@ -329,10 +329,13 @@ nan_state::neg_p () const
 // The representation is a type with a couple of endpoints, unioned
 // with the set of { -NAN, +Nan }.
 
-class frange : public vrange
+class GTY((user)) frange : public vrange
 {
   friend class frange_storage_slot;
   friend class vrange_printer;
+  friend void gt_ggc_mx (frange *);
+  friend void gt_pch_nx (frange *);
+  friend void gt_pch_nx (frange *, gt_pointer_operator, void *);
 public:
   frange ();
   frange (const frange &);
@@ -827,41 +830,15 @@ range_includes_zero_p (const irange *vr)
   return vr->may_contain_p (build_zero_cst (vr->type ()));
 }
 
-inline void
-gt_ggc_mx (irange *x)
-{
-  for (unsigned i = 0; i < x->m_num_ranges; ++i)
-    {
-      gt_ggc_mx (x->m_base[i * 2]);
-      gt_ggc_mx (x->m_base[i * 2 + 1]);
-    }
-  if (x->m_nonzero_mask)
-    gt_ggc_mx (x->m_nonzero_mask);
-}
-
-inline void
-gt_pch_nx (irange *x)
-{
-  for (unsigned i = 0; i < x->m_num_ranges; ++i)
-    {
-      gt_pch_nx (x->m_base[i * 2]);
-      gt_pch_nx (x->m_base[i * 2 + 1]);
-    }
-  if (x->m_nonzero_mask)
-    gt_pch_nx (x->m_nonzero_mask);
-}
-
-inline void
-gt_pch_nx (irange *x, gt_pointer_operator op, void *cookie)
-{
-  for (unsigned i = 0; i < x->m_num_ranges; ++i)
-    {
-      op (&x->m_base[i * 2], NULL, cookie);
-      op (&x->m_base[i * 2 + 1], NULL, cookie);
-    }
-  if (x->m_nonzero_mask)
-    op (&x->m_nonzero_mask, NULL, cookie);
-}
+extern void gt_ggc_mx (vrange *);
+extern void gt_pch_nx (vrange *);
+extern void gt_pch_nx (vrange *, gt_pointer_operator, void *);
+extern void gt_ggc_mx (irange *);
+extern void gt_pch_nx (irange *);
+extern void gt_pch_nx (irange *, gt_pointer_operator, void *);
+extern void gt_ggc_mx (frange *);
+extern void gt_pch_nx (frange *);
+extern void gt_pch_nx (frange *, gt_pointer_operator, void *);
 
 template
 inline void
-- 
2.39.2
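The new gt_ggc_mx (vrange *) and gt_pch_nx (vrange *) entry points dispatch on the dynamic kind of the range (is_a <irange> / is_a <frange>) and treat anything else as unreachable. A heavily simplified C sketch of that dispatch follows; the node and range structs and the `mark_*` functions are hypothetical stand-ins, not GCC's GC API.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for trees and ranges.  */
struct node { int marked; };

enum range_kind { KIND_INT, KIND_FLOAT };

struct range_base { enum range_kind kind; };

struct int_range
{
  struct range_base base;     /* first member, so the casts below are valid */
  unsigned num_ranges;        /* pairs used in bounds[], at most 2 here */
  struct node *bounds[4];     /* [lo0, hi0, lo1, hi1] */
  struct node *nonzero_mask;  /* may be NULL */
};

struct float_range
{
  struct range_base base;
  struct node *type;
};

static void
mark_node (struct node *n)
{
  if (n)
    n->marked = 1;
}

/* Mark only the bounds that are in use, plus the optional mask.  */
static void
mark_int_range (struct int_range *r)
{
  for (unsigned i = 0; i < r->num_ranges; ++i)
    {
      mark_node (r->bounds[i * 2]);
      mark_node (r->bounds[i * 2 + 1]);
    }
  if (r->nonzero_mask)
    mark_node (r->nonzero_mask);
}

static void
mark_float_range (struct float_range *r)
{
  mark_node (r->type);
}

/* Dispatch on the discriminator; an unknown kind is a bug.  */
static void
mark_range (struct range_base *r)
{
  switch (r->kind)
    {
    case KIND_INT:
      mark_int_range ((struct int_range *) r);
      return;
    case KIND_FLOAT:
      mark_float_range ((struct float_range *) r);
      return;
    }
  abort ();  /* counterpart of gcc_unreachable () */
}
```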



Re: [PATCH] tree-optimization/109304 - properly handle instrumented aliases

2023-04-18 Thread Jan Hubicka via Gcc-patches
> > 
> > I do not think LTO is of any help here.  You can always call a non-LTO
> > const function from the outside world, and that function can end up
> > calling back into an instrumented const function in your unit, which
> > effectively makes the external const function non-const.
> 
> Hmm, true.
> 
> > > 
> > > That said, when there's a definition of say strlen in a TU and
> > > that's instrumented we do want to drop pure from calls but if
> > > not then we shouldn't worry.
> > > 
> > > Without LTO we'd still run into coverage issues but at least
> > > with LTO we shouldn't ICE?
> > 
> > I am not sure I see your point here...
> > We could avoid demoting builtins to avoid ICEs and accept coverage
> > mismatches, but how does LTO make a difference?
> 
> At least we get more functions local, but yes, we can still trigger
> the issue.
> 
> So what's the solution?  All functions that are not leaf or possibly
> instrumented have to be called as if they were not pure/const,
> including builtins?  As we've said we're going to ICE quite a bit
> when const/pure builtins suddenly are no longer const/pure.

Yep, I can't think of any easier solution than handling all functions as
not pure/const as soon as something instrumented is ever inlined into a
given function.  For builtins this is fun indeed.  We can special-case
those that are always expanded inline, at least...

Honza
> 
> Richard.


Re: [committed] libstdc++: Export global iostreams with GLIBCXX_3.4.31 symver [PR108969]

2023-04-18 Thread Jonathan Wakely via Gcc-patches
On Tue, 18 Apr 2023 at 16:59, Jonathan Wakely via Libstdc++
 wrote:
> diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
> index 4ae63094eb7..7c015524b62 100644
> --- a/libstdc++-v3/config/abi/pre/gnu.ver
> +++ b/libstdc++-v3/config/abi/pre/gnu.ver
> @@ -2512,6 +2512,20 @@ GLIBCXX_3.4.31 {
>  _ZNKSt12__shared_ptrINSt10filesystem28recursive_directory_iterator10_Dir_stackELN9__gnu_cxx12_Lock_policyE[012]EEcvbEv;
>  _ZNKSt12__shared_ptrINSt10filesystem7__cxx1128recursive_directory_iterator10_Dir_stackELN9__gnu_cxx12_Lock_policyE[012]EEcvbEv;
>
> +#if defined(_GLIBCXX_SYMVER_GNU) && defined(_GLIBCXX_SHARED) \
> +&& defined(_GLIBCXX_HAVE_AS_SYMVER_DIRECTIVE) \
> +&& defined(_GLIBCXX_HAVE_SYMVER_SYMBOL_RENAMING_RUNTIME_SUPPORT)

I've just noticed that these will never be defined, because this
linker script is preprocessed with config.h not c++config.h, and so it
should be:

#if defined(_GLIBCXX_SYMVER_GNU) && defined(_GLIBCXX_SHARED) \
&& defined(HAVE_AS_SYMVER_DIRECTIVE) \
&& defined(HAVE_SYMVER_SYMBOL_RENAMING_RUNTIME_SUPPORT)


The condition will never be true as currently written (which doesn't
actually seem to matter, but Jakub says that re-exporting the symbols
as the new version is needed in the linker script for older linkers).


> +# See src/c++98/globals_io.cc
> +_ZSt3cin;
> +_ZSt4cout;
> +_ZSt4cerr;
> +_ZSt4clog;
> +_ZSt4wcin;
> +_ZSt5wcout;
> +_ZSt5wcerr;
> +_ZSt5wclog;
> +#endif
> +
>  } GLIBCXX_3.4.30;
>
>  # Symbols in the support library (libsupc++) have their own tag.



Re: [PATCH v3] constraint: fix relaxed memory and repeated constraint handling

2023-04-18 Thread Richard Sandiford via Gcc-patches
Jeff Law via Gcc-patches  writes:
> On 4/18/23 07:02, Richard Sandiford via Gcc-patches wrote:
>> "Victor L. Do Nascimento"  writes:
>>> The function `constrain_operands' lacked the logic to consider relaxed
>>> memory constraints when "traditional" memory constraints were not
>>> satisfied, creating potential issues as observed during the reload
>>> compilation pass.
>>>
>>> In addition, it was observed that while `constrain_operands' chooses
>>> to disregard constraints when more than one alternative is provided,
>>> e.g. "m,r" using CONSTRAINT__UNKNOWN, it has no checks in place to
>>> determine whether the multiple constraints in a given string are in
>>> fact repetitions of the same constraint and should thus in fact be
>>> treated as a single constraint, as ought to be the case for something
>>> like "m,m".
>>>
>>> Both of these issues are dealt with here, thus ensuring that we get
>>> appropriate pattern matching.
>>>
>>> Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?
>>>
>>> Victor
>>>
>>> gcc/
>>> * lra-constraints.cc (constraint_unique): New.
>>> (process_address_1): Apply constraint_unique test.
>>> * recog.cc (constrain_operands): Allow relaxed memory
>>> constraints.
>> 
>> OK, thanks.
> Does Victor have write access?  If not you should probably cover the 
> commit for him.

Ah, right, thanks.  I've pushed it now.

> If Victor is going to be making regular contributions, 
> then we should probably get him write access going forward.

Yeah.

Richard
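The repeated-alternative check described in Victor's patch, where "m,m" is treated as a single constraint but "m,r" is not, can be illustrated with a short C++17 sketch. The helper name `alternatives_identical` is hypothetical. It is not the `constraint_unique` function the patch adds to lra-constraints.cc, and it compares each comma-separated alternative as an opaque string rather than parsing constraint letters and modifiers the way GCC does.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns true when every comma-separated
// alternative in `constraints` is spelled exactly like the first one.
// Simplification: modifiers such as '=' or '&' and multi-letter
// constraints are not interpreted; alternatives are compared as text.
constexpr bool alternatives_identical(std::string_view constraints) {
  std::size_t comma = constraints.find(',');
  const std::string_view first = constraints.substr(0, comma);
  while (comma != std::string_view::npos) {
    // Drop the alternative just examined plus its separator.
    constraints.remove_prefix(comma + 1);
    comma = constraints.find(',');
    if (constraints.substr(0, comma) != first)
      return false;
  }
  return true;
}
```

For example, `alternatives_identical("m,m")` is true, while `alternatives_identical("m,r")` and `alternatives_identical("m,m,r")` are false.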


Re: [PATCH] reload: Handle generating reloads that also clobbers flags

2023-04-18 Thread Eric Botcazou via Gcc-patches
> That "supposed to" is only *one* possible implementation.
> The one in CRIS - and I believe the preferred one; one I
> should advocate more - is to *always* expose clobbering of
> the flags.

Yes, both approaches are acceptable IMO and should work.

-- 
Eric Botcazou




[committed] libstdc++: Export global iostreams with GLIBCXX_3.4.31 symver [PR108969]

2023-04-18 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux, sparc-solaris2.11. Pushed to trunk and gcc-13.

-- >8 --

Since GCC 13 the global iostream objects are only initialized once in
libstdc++, and not by a std::ios::Init object in every translation unit
that includes <iostream>. To avoid using uninitialized streams defined
in an older libstdc++.so, translation units using the global iostreams
should depend on the GLIBCXX_3.4.31 symver.

Define std::cin as std::__io::cin and then export it as
std::cin@@GLIBCXX_3.4.31 so that references to std::cin bind to the new
symver. Also export it as @GLIBCXX_3.4 for backwards compatibility.
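The effect is visible in the regenerated baseline_symbols.txt files in this patch. Each stream object now has a default `@@GLIBCXX_3.4.31` entry, which new links bind to, plus a non-default `@GLIBCXX_3.4` entry kept for existing binaries. The following hedged C++17 sketch shows how to read those lines. `BaselineObject` and `parse_object_line` are hypothetical names and are not part of libstdc++'s ABI-checking tooling. The sketch assumes the `OBJECT:SIZE:NAME@VERSION` / `OBJECT:SIZE:NAME@@VERSION` layout used in these files and ignores FUNC and other entry kinds.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical parsed form of one OBJECT line, e.g.
// "OBJECT:272:_ZSt4cerr@@GLIBCXX_3.4.31".
struct BaselineObject {
  std::string_view size;     // object size in bytes, kept as text
  std::string_view name;     // mangled symbol name
  std::string_view version;  // version node, e.g. "GLIBCXX_3.4.31"
  bool is_default;           // true for "@@" (default), false for "@"
};

// Returns std::nullopt for lines that are not OBJECT entries.
constexpr std::optional<BaselineObject>
parse_object_line(std::string_view line) {
  constexpr std::string_view prefix = "OBJECT:";
  if (line.substr(0, prefix.size()) != prefix)
    return std::nullopt;
  line.remove_prefix(prefix.size());

  const std::size_t colon = line.find(':');
  if (colon == std::string_view::npos)
    return std::nullopt;
  const std::string_view size = line.substr(0, colon);
  const std::string_view rest = line.substr(colon + 1);

  // Mangled names contain no '@', so the first '@' starts the version tag.
  const std::size_t at = rest.find('@');
  if (at == std::string_view::npos)
    return std::nullopt;
  const bool is_default = rest.substr(at, 2) == "@@";
  const std::string_view version = rest.substr(at + (is_default ? 2 : 1));
  return BaselineObject{size, rest.substr(0, at), version, is_default};
}
```

Parsing `OBJECT:272:_ZSt4cerr@@GLIBCXX_3.4.31` yields a default entry for `_ZSt4cerr` in `GLIBCXX_3.4.31`. Parsing `OBJECT:272:_ZSt4cerr@GLIBCXX_3.4` yields the non-default compatibility entry for the same object.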

libstdc++-v3/ChangeLog:

PR libstdc++/108969
* src/Makefile.am: Move globals_io.cc to here.
* src/Makefile.in: Regenerate.
* src/c++98/Makefile.am: Remove globals_io.cc from here.
* src/c++98/Makefile.in: Regenerate.
* src/c++98/globals_io.cc [_GLIBCXX_SYMVER_GNU] (cin): Adjust
symbol name and then export with GLIBCXX_3.4.31 symver.
(cout, cerr, clog, wcin, wcout, wcerr, wclog): Likewise.
* config/abi/post/aarch64-linux-gnu/baseline_symbols.txt:
Regenerate.
* config/abi/post/i486-linux-gnu/baseline_symbols.txt:
Regenerate.
* config/abi/post/m68k-linux-gnu/baseline_symbols.txt:
Regenerate.
* config/abi/post/powerpc64-linux-gnu/baseline_symbols.txt:
Regenerate.
* config/abi/post/riscv64-linux-gnu/baseline_symbols.txt:
Regenerate.
* config/abi/post/x86_64-linux-gnu/32/baseline_symbols.txt:
Regenerate.
* config/abi/post/s390x-linux-gnu/baseline_symbols.txt:
Regenerate.
* config/abi/post/x86_64-linux-gnu/baseline_symbols.txt:
Regenerate.
* config/abi/pre/gnu.ver: Add iostream objects to new symver.
---
 .../aarch64-linux-gnu/baseline_symbols.txt| 24 +-
 .../post/i486-linux-gnu/baseline_symbols.txt  | 24 +-
 .../post/m68k-linux-gnu/baseline_symbols.txt  | 24 +-
 .../powerpc64-linux-gnu/baseline_symbols.txt  | 24 +-
 .../riscv64-linux-gnu/baseline_symbols.txt| 24 +-
 .../post/s390x-linux-gnu/baseline_symbols.txt | 24 +-
 .../x86_64-linux-gnu/32/baseline_symbols.txt  | 24 +-
 .../x86_64-linux-gnu/baseline_symbols.txt | 24 +-
 libstdc++-v3/config/abi/pre/gnu.ver   | 14 
 libstdc++-v3/src/Makefile.am  |  1 +
 libstdc++-v3/src/Makefile.in  |  3 +-
 libstdc++-v3/src/c++98/Makefile.am|  1 -
 libstdc++-v3/src/c++98/Makefile.in| 16 --
 libstdc++-v3/src/c++98/globals_io.cc  | 32 ++-
 14 files changed, 176 insertions(+), 83 deletions(-)

diff --git 
a/libstdc++-v3/config/abi/post/aarch64-linux-gnu/baseline_symbols.txt 
b/libstdc++-v3/config/abi/post/aarch64-linux-gnu/baseline_symbols.txt
index 9be3453d6ed..d2cf0d41ab5 100644
--- a/libstdc++-v3/config/abi/post/aarch64-linux-gnu/baseline_symbols.txt
+++ b/libstdc++-v3/config/abi/post/aarch64-linux-gnu/baseline_symbols.txt
@@ -5258,15 +5258,23 @@ OBJECT:25:_ZTSNSt7__cxx118numpunctIcEE@@GLIBCXX_3.4.21
 OBJECT:25:_ZTSNSt7__cxx118numpunctIwEE@@GLIBCXX_3.4.21
 OBJECT:25:_ZTSSt20bad_array_new_length@@CXXABI_1.3.8
 OBJECT:26:_ZTSNSt3pmr15memory_resourceE@@GLIBCXX_3.4.28
-OBJECT:272:_ZSt4cerr@@GLIBCXX_3.4
-OBJECT:272:_ZSt4clog@@GLIBCXX_3.4
-OBJECT:272:_ZSt4cout@@GLIBCXX_3.4
-OBJECT:272:_ZSt5wcerr@@GLIBCXX_3.4
-OBJECT:272:_ZSt5wclog@@GLIBCXX_3.4
-OBJECT:272:_ZSt5wcout@@GLIBCXX_3.4
+OBJECT:272:_ZSt4cerr@@GLIBCXX_3.4.31
+OBJECT:272:_ZSt4cerr@GLIBCXX_3.4
+OBJECT:272:_ZSt4clog@@GLIBCXX_3.4.31
+OBJECT:272:_ZSt4clog@GLIBCXX_3.4
+OBJECT:272:_ZSt4cout@@GLIBCXX_3.4.31
+OBJECT:272:_ZSt4cout@GLIBCXX_3.4
+OBJECT:272:_ZSt5wcerr@@GLIBCXX_3.4.31
+OBJECT:272:_ZSt5wcerr@GLIBCXX_3.4
+OBJECT:272:_ZSt5wclog@@GLIBCXX_3.4.31
+OBJECT:272:_ZSt5wclog@GLIBCXX_3.4
+OBJECT:272:_ZSt5wcout@@GLIBCXX_3.4.31
+OBJECT:272:_ZSt5wcout@GLIBCXX_3.4
 OBJECT:27:_ZTSSt19__codecvt_utf8_baseIwE@@GLIBCXX_3.4.21
-OBJECT:280:_ZSt3cin@@GLIBCXX_3.4
-OBJECT:280:_ZSt4wcin@@GLIBCXX_3.4
+OBJECT:280:_ZSt3cin@@GLIBCXX_3.4.31
+OBJECT:280:_ZSt3cin@GLIBCXX_3.4
+OBJECT:280:_ZSt4wcin@@GLIBCXX_3.4.31
+OBJECT:280:_ZSt4wcin@GLIBCXX_3.4
 OBJECT:28:_ZTSSt19__codecvt_utf8_baseIDiE@@GLIBCXX_3.4.21
 OBJECT:28:_ZTSSt19__codecvt_utf8_baseIDsE@@GLIBCXX_3.4.21
 OBJECT:28:_ZTSSt20__codecvt_utf16_baseIwE@@GLIBCXX_3.4.21
diff --git a/libstdc++-v3/config/abi/post/i486-linux-gnu/baseline_symbols.txt 
b/libstdc++-v3/config/abi/post/i486-linux-gnu/baseline_symbols.txt
index ed8966b9c7b..35436370a58 100644
--- a/libstdc++-v3/config/abi/post/i486-linux-gnu/baseline_symbols.txt
+++ b/libstdc++-v3/config/abi/post/i486-linux-gnu/baseline_symbols.txt
@@ -4769,14 +4769,22 @@ OBJECT:13:_ZTSSt9exception@@GLIBCXX_3.4
 OBJECT:13:_ZTSSt9strstream@@GLIBCXX_3.4
 OBJECT:13:_ZTSSt9time_base@@GLIBCXX_3.4
 OBJECT:13:_ZTSSt9type_info@@GLIBCXX_3.4
-OBJECT:140:_ZSt4cerr@@GLIBCXX_3.4
