date:20231027

> I suggest adapting the code as follows:
> 
> unsigned int element_size = GET_MODE_SIZE (mode).to_constant ();
> poly_int64 nunits = exact_div (BYTES_PER_RISCV_VECTOR *TARGET_MAX_LMUL, 
> element_size);
> if (!get_vector_mode(mode, nunits).exists())
>   gcc_unreachable ();

Actually I was initially considering using lmul = m8 here,
unconditionally, but the param is probably the more intuitive choice.

Attached v2 with that included.  Also moved the riscv test to
autovec/builtin/ so we can add the other builtins as well.

> Also, this patch reminds me we are missing some more similar builtin
> functions which can use RVV:
> 
> strlen, strcpy, strcmp...etc

Yes we should still have them but I'd rather not work on that right
now.  How about I open a PR for it so we can still add them in stage 3?
Their impact is pretty localized and the risk should be low.
Kito, Palmer, Jeff - would that be acceptable?

Regards
 Robin

gcc/ChangeLog:

* config/riscv/autovec.md (rawmemchr): New expander.
* config/riscv/riscv-protos.h (enum insn_type): Define.
(expand_rawmemchr): New function.
* config/riscv/riscv-v.cc (expand_rawmemchr): Add vectorized
rawmemchr.
* internal-fn.cc (expand_RAWMEMCHR): Fix typo.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ldist-rawmemchr-1.c: Add riscv.
* gcc.dg/tree-ssa/ldist-rawmemchr-2.c: Ditto.
* gcc.target/riscv/rvv/rvv.exp: Add builtin directory.
* gcc.target/riscv/rvv/autovec/builtin/rawmemchr-1.c: New test.
---
 gcc/config/riscv/autovec.md   | 13 +++
 gcc/config/riscv/riscv-protos.h   |  1 +
 gcc/config/riscv/riscv-v.cc   | 89 +
 gcc/internal-fn.cc|  2 +-
 .../gcc.dg/tree-ssa/ldist-rawmemchr-1.c   |  8 +-
 .../gcc.dg/tree-ssa/ldist-rawmemchr-2.c   |  8 +-
 .../riscv/rvv/autovec/builtin/rawmemchr-1.c   | 99 +++
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|  2 +
 8 files changed, 213 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/rawmemchr-1.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 1ddc1993120..4f13494afdb 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2397,3 +2397,16 @@ (define_expand "lfloor2"
 DONE;
   }
 )
+
+;; Implement rawmemchr[qi|si|hi].
+(define_expand "rawmemchr"
+  [(match_operand  0 "register_operand")
+   (match_operand  1 "memory_operand")
+   (match_operand:ANYI 2 "const_int_operand")]
+  "TARGET_VECTOR"
+  {
+riscv_vector::expand_rawmemchr(mode, operands[0], operands[1],
+  operands[2]);
+DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 843a81b0e86..7f148ed95fe 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -526,6 +526,7 @@ void expand_cond_unop (unsigned, rtx *);
 void expand_cond_binop (unsigned, rtx *);
 void expand_cond_ternop (unsigned, rtx *);
 void expand_popcount (rtx *);
+void expand_rawmemchr (machine_mode, rtx, rtx, rtx);
 
 /* Rounding mode bitfield for fixed point VXRM.  */
 enum fixed_point_rounding_mode
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 3fe8125801b..0f664553cf4 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -2215,6 +2215,95 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in)
   return true;
 }
 
+/* Implement rawmemchr using vector instructions.
+   It can be assumed that the needle is in the haystack, otherwise the
+   behavior is undefined.  */
+
+void
+expand_rawmemchr (machine_mode mode, rtx dst, rtx src, rtx pat)
+{
+  /*
+rawmemchr:
+loop:
+   vsetvli a1, zero, e[8,16,32,64], m1, ta, ma
+   vle[8,16,32,64]ff.v v8, (a0)  # Load.
+   csrr a1, vl  # Get number of bytes read.
+   vmseq.vx v0, v8, pat # v0 = (v8 == {pat, pat, ...})
+   vfirst.m a2, v0  # Find first hit.
+   add a0, a0, a1   # Bump pointer.
+   bltz a2, loop# Not found?
+
+   sub a0, a0, a1   # Go back by a1.
+   shll a2, a2, [0,1,2,3]   # Shift to get byte offset.
+   add a0, a0, a2   # Add the offset.
+
+   ret
+  */
+  gcc_assert (TARGET_VECTOR);
+
+  unsigned int isize = GET_MODE_SIZE (mode).to_constant ();
+  int lmul = riscv_autovec_lmul == RVV_DYNAMIC ? RVV_M8 : riscv_autovec_lmul;
+  poly_int64 nunits = exact_div (BYTES_PER_RISCV_VECTOR * lmul, isize);
+
+  machine_mode vmode;
+  if (!get_vector_mode (GET_MODE_INNER (mode), nunits).exists (&vmode))
+gcc_unreachable ();
+
+  machine_mode mask_mode = get_mask_mode (vmode);
+
+  rtx cnt = gen_reg_rtx (Pmode);
+  rtx end = gen_reg_rtx (Pmode);
+  rtx vec = gen_reg_rtx (vmode);
+  rtx mask = gen_reg_rtx (mask_mode);
+
+  /* After finding the first vector element
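For readers following the assembly comment in the patch above, here is a minimal scalar sketch of the same loop structure. It is not part of the patch: the function name rawmemchr_model and the chunk parameter (a stand-in for the vl value that the fault-only-first load would return) are assumptions made for illustration. As in the patch, the needle is assumed to be present in the haystack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the vector loop: scan CHUNK elements at a time,
   bump the offset, and on a hit go back by CHUNK and add the index
   of the first match.  CHUNK is a hypothetical stand-in for vl.  */
static const uint8_t *
rawmemchr_model (const uint8_t *src, uint8_t pat, size_t chunk)
{
  size_t off = 0;

  assert (src != NULL && chunk > 0);
  for (;;)
    {
      /* Models vmseq.vx + vfirst.m: index of the first match in this
         chunk, or CHUNK if there is none.  The scan stops at the first
         match, so nothing past the needle is read.  */
      size_t i = 0;
      while (i < chunk && src[off + i] != pat)
        i++;
      bool found = i < chunk;

      off += chunk;                      /* Bump pointer.  */
      if (found)
        return src + (off - chunk + i);  /* Go back, add the offset.  */
    }
}
```

For byte elements the shift in the assembly is by 0, which is why the model adds the index directly.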

[PATCH] RISC-V: Fix wrong tune parameters on int_div

2023-10-27 Thread juzhe.zh...@rivai.ai

LGTM from my side.

The original integer division COST seems too low.

Hi, Jeff and Kito. Could you take a look at this patch?

Thanks.



juzhe.zh...@rivai.ai

Re: [PATCH] RISC-V: Add rawmemchr expander.

> I notice we have expand_block_move
> in riscv-v.cc
> 
> Maybe we should move it into riscv-string.cc ?

Yes I will also move that one.

Regards
 Robin

[PATCH, expand] Checking available optabs for scalar modes in by pieces operations

2023-10-27 Thread HAO CHEN GUI

Hi,
  This patch checks available optabs for scalar modes used in by
pieces operations. It fixes the regression cases caused by the previous
patch. Now both scalar and vector modes are examined with the same
approach.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen


ChangeLog
Expand: Checking available optabs for scalar modes in by pieces operations

The former patch (f08ca5903c7) examines the scalar modes by target
hook scalar_mode_supported_p.  It causes some i386 regression cases
as XImode and OImode are not enabled in i386 target function.  This
patch examines the scalar mode by checking if the corresponding optabs
are available for the mode.

gcc/
PR target/111449
* expr.cc (qi_vector_mode_supported_p): Rename to...
(by_pieces_mode_supported_p): ...this, and extends it to do
the checking for both scalar and vector mode.
(widest_fixed_size_mode_for_size): Call
by_pieces_mode_supported_p to examine the mode.
(op_by_pieces_d::smallest_fixed_size_mode_for_size): Likewise.

patch.diff
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 7aac575eff8..2af9fcbed18 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -1000,18 +1000,21 @@ can_use_qi_vectors (by_pieces_operation op)
 /* Return true if optabs exists for the mode and certain by pieces
operations.  */
 static bool
-qi_vector_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
+by_pieces_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
 {
+  if (optab_handler (mov_optab, mode) == CODE_FOR_nothing)
+return false;
+
   if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
-  && optab_handler (vec_duplicate_optab, mode) != CODE_FOR_nothing)
-return true;
+  && VECTOR_MODE_P (mode)
+  && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing)
+return false;

   if (op == COMPARE_BY_PIECES
-  && optab_handler (mov_optab, mode) != CODE_FOR_nothing
-  && can_compare_p (EQ, mode, ccp_jump))
-return true;
+  && !can_compare_p (EQ, mode, ccp_jump))
+return false;

-  return false;
+  return true;
 }

 /* Return the widest mode that can be used to perform part of an
@@ -1035,7 +1038,7 @@ widest_fixed_size_mode_for_size (unsigned int size, 
by_pieces_operation op)
  {
if (GET_MODE_SIZE (candidate) >= size)
  break;
-   if (qi_vector_mode_supported_p (candidate, op))
+   if (by_pieces_mode_supported_p (candidate, op))
  result = candidate;
  }

@@ -1049,7 +1052,7 @@ widest_fixed_size_mode_for_size (unsigned int size, 
by_pieces_operation op)
 {
   mode = tmode.require ();
   if (GET_MODE_SIZE (mode) < size
- && targetm.scalar_mode_supported_p (mode))
+ && by_pieces_mode_supported_p (mode, op))
   result = mode;
 }

@@ -1454,7 +1457,7 @@ op_by_pieces_d::smallest_fixed_size_mode_for_size 
(unsigned int size)
  break;

if (GET_MODE_SIZE (candidate) >= size
-   && qi_vector_mode_supported_p (candidate, m_op))
+   && by_pieces_mode_supported_p (candidate, m_op))
  return candidate;
  }
 }
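To make the new control flow of by_pieces_mode_supported_p easier to follow, here is a minimal standalone model of it. This is only a sketch: struct mode_caps, its boolean fields and mode_supported_model are hypothetical stand-ins for the optab_handler, VECTOR_MODE_P and can_compare_p queries in the diff above, and the enum is a local one rather than GCC's by_pieces_operation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in for the operation kinds used in the diff.  */
enum by_pieces_op
{
  MOVE_BY_PIECES,
  CLEAR_BY_PIECES,
  SET_BY_PIECES,
  COMPARE_BY_PIECES
};

/* Hypothetical capabilities of one mode; each flag models one query
   from the patch.  */
struct mode_caps
{
  bool is_vector;          /* VECTOR_MODE_P (mode) */
  bool has_mov;            /* mov_optab handler exists */
  bool has_vec_duplicate;  /* vec_duplicate_optab handler exists */
  bool can_compare_eq;     /* can_compare_p (EQ, mode, ccp_jump) */
};

/* Same reject-early structure as the patched function: every mode
   needs a move pattern, vector modes additionally need vec_duplicate
   for set/clear, and compares need an EQ comparison.  */
static bool
mode_supported_model (const struct mode_caps *m, enum by_pieces_op op)
{
  assert (m != NULL);

  if (!m->has_mov)
    return false;

  if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
      && m->is_vector
      && !m->has_vec_duplicate)
    return false;

  if (op == COMPARE_BY_PIECES && !m->can_compare_eq)
    return false;

  return true;
}
```

In this model a scalar mode without a move pattern is rejected for every operation, which corresponds to the XImode/OImode situation on i386 described in the ChangeLog text.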

Re: [PATCH] testsuite, Darwin: Add support for Mach-O function body scans.

2023-10-27 Thread Iain Sandoe

Hi Richard,

> On 26 Oct 2023, at 21:00, Iain Sandoe  wrote:

>> On 26 Oct 2023, at 20:49, Richard Sandiford  
>> wrote:
>> 
>> Iain Sandoe  writes:
>>> This was written before Thomas' modification to the ELF-handling to allow
>>> a config-based change for target details.  I did consider updating this
>>> to try and use that scheme, but I think that it would sit a little
>>> awkwardly, since there are some differences in the start-up scanning for
>>> Mach-O.  I would say that in all probability we could improve things but
>>> I'd like to put this forward as a well-tested initial implementation.
>> 
>> Sorry, I would prefer to extend the existing function instead.
>> E.g. there's already some divergence between the Mach-O version
>> and the default version, in that the Mach-O version doesn't print
>> verbose messages.  I also don't think that the current default code
>> is so watertight that it'll never need to be updated in future.
> 
> Fair enough, will explore what can be done (as I recall last I looked the
> primary difference was in the initial start-up scan).

I’ve done this as attached.

For the record, when doing it, it gave rise to the same misgivings that led
to the separate implementation before.

 * as we add formats and uncover asm oddities, they all need to be handled
   in one set of code; IMO it could become quite convoluted.

 * now making a change to the MACH-O code means I have to check I did not
   inadvertently break ELF (and likewise, in theory, an ELF change should check
   MACH-O, but many folks do/can not do that).

Maybe there’s some half-way-house where code can usefully be shared without
those down-sides.

Anyway, to make progress, is the revised version OK for trunk? (tested on
aarch64-linux and aarch64-darwin).
thanks
Iain

0001-testsuite-Darwin-Add-support-for-Mach-O-function-bod.patch
Description: Binary data

Re: [PATCH] MATCH: Simplify `(X &| B) CMP X` if possible [PR 101590]




> On 26.10.2023 at 23:10, Andrew Pinski wrote:
> 
> From: Andrew Pinski 
> 
> I noticed we were missing these simplifications so let's add them.
> 
> This adds the following simplifications:
> U & N <= U  -> true
> U & N >  U  -> false
> When U is known to be non-negative.
> 
> When N is also known to be non-negative, this is also true:
> U | N <  U  -> false
> U | N >= U  -> true
> 
> When N is a negative integer, the result flips and we get:
> U | N <  U  -> true
> U | N >= U  -> false
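As a quick sanity check of the identities quoted above, the following sketch tests them exhaustively over the value range of an 8-bit signed type. It is not taken from the patch or its testcases; the function name check_bitcmp_identities and the choice of range are assumptions made for illustration, and it relies on two's complement representation of negative values.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if the quoted identities hold for every non-negative U
   in [0, 127] and every N in [-128, 127].  */
static bool
check_bitcmp_identities (void)
{
  for (int u = 0; u <= 127; u++)
    for (int n = -128; n <= 127; n++)
      {
        /* U & N <= U is always true, U & N > U is always false.  */
        if (!((u & n) <= u) || (u & n) > u)
          return false;

        if (n >= 0)
          {
            /* N non-negative: U | N < U is false, U | N >= U is true.  */
            if ((u | n) < u || !((u | n) >= u))
              return false;
          }
        else
          {
            /* N negative: the result flips, since U | N is negative.  */
            if (!((u | n) < u) || (u | n) >= u)
              return false;
          }
      }
  return true;
}
```

The AND case works because clearing bits of a non-negative value can only make it smaller; the OR case with a negative N works because the sign bit of the result is set.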

I think bit-CCP should get this, does ranger also figure this out (iirc it 
tracks nonzero bits?)

Your testcases suggest this doesn’t happen, can you figure out why CCP doesn’t 
optimize this and maybe file a bug?

> We could extend this later on to be the case where we know N
> is nonconstant but is known to be negative.
> 
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Ok

Richard 

>PR tree-optimization/101590
>PR tree-optimization/94884
> 
> gcc/ChangeLog:
> 
>* match.pd (`(X BIT_OP Y) CMP X`): New pattern.
> 
> gcc/testsuite/ChangeLog:
> 
>* gcc.dg/tree-ssa/bitcmp-1.c: New test.
>* gcc.dg/tree-ssa/bitcmp-2.c: New test.
>* gcc.dg/tree-ssa/bitcmp-3.c: New test.
>* gcc.dg/tree-ssa/bitcmp-4.c: New test.
>* gcc.dg/tree-ssa/bitcmp-5.c: New test.
>* gcc.dg/tree-ssa/bitcmp-6.c: New test.
> ---
> gcc/match.pd | 24 +
> gcc/testsuite/gcc.dg/tree-ssa/bitcmp-1.c | 20 +++
> gcc/testsuite/gcc.dg/tree-ssa/bitcmp-2.c | 20 +++
> gcc/testsuite/gcc.dg/tree-ssa/bitcmp-3.c | 21 
> gcc/testsuite/gcc.dg/tree-ssa/bitcmp-4.c | 36 
> gcc/testsuite/gcc.dg/tree-ssa/bitcmp-5.c | 43 
> gcc/testsuite/gcc.dg/tree-ssa/bitcmp-6.c | 41 ++
> 7 files changed, 205 insertions(+)
> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitcmp-1.c
> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitcmp-2.c
> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitcmp-3.c
> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitcmp-4.c
> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitcmp-5.c
> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitcmp-6.c
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 5f6aeb07ac0..7d651a6582d 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -2707,6 +2707,30 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (TREE_INT_CST_LOW (@1) & 1)
>{ constant_boolean_node (cmp == NE_EXPR, type); })))
> 
> +/*
> +   U & N <= U  -> true
> +   U & N >  U  -> false
> +   U needs to be non-negative.
> +
> +   U | N <  U  -> false
> +   U | N >= U  -> true
> +   U and N needs to be non-negative
> +
> +   U | N <  U  -> true
> +   U | N >= U  -> false
> +   U needs to be non-negative and N needs to be a negative constant.
> +   */
> +(for cmp   (lt  ge  le  gt )
> + bitop (bit_ior bit_ior bit_and bit_and)
> + (simplify
> +  (cmp:c (bitop:c tree_expr_nonnegative_p@0 @1) @0)
> +  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
> +   (if (bitop == BIT_AND_EXPR || tree_expr_nonnegative_p (@1))
> +{ constant_boolean_node (cmp == GE_EXPR || cmp == LE_EXPR, type); }
> +/* The sign is opposite now so the comparison is swapped around. */
> +(if (TREE_CODE (@1) == INTEGER_CST && wi::neg_p (wi::to_wide (@1)))
> + { constant_boolean_node (cmp == LT_EXPR, type); })
> +
> /* Arguments on which one can call get_nonzero_bits to get the bits
>possibly set.  */
> (match with_possible_nonzero_bits
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitcmp-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/bitcmp-1.c
> new file mode 100644
> index 000..f3d515bb2d6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/bitcmp-1.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* PR tree-optimization/101590 */
> +
> +int f_and_le(unsigned len) {
> +  const unsigned N = 4;
> +  unsigned newlen = len & -N;
> +  return newlen <= len; // return 1
> +}
> +int f_or_ge(unsigned len) {
> +  const unsigned N = 4;
> +  unsigned newlen = len | -N;
> +  return newlen >= len; // return 1
> +}
> +
> +/* { dg-final { scan-tree-dump-not " <= " "optimized" } } */
> +/* { dg-final { scan-tree-dump-not " >= " "optimized" } } */
> +/* { dg-final { scan-tree-dump-not " & "  "optimized" } } */
> +/* { dg-final { scan-tree-dump-not " \\\| " "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitcmp-2.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/bitcmp-2.c
> new file mode 100644
> index 000..d0031d9ecb8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/bitcmp-2.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* PR tree-optimization/101590 */
> +
> +int f_and_gt(unsigned len) {
> +  const unsigned N = 4;
> +  unsigned newlen = len & -N;
> +  return newlen > len; // return 0

Re: [PATCH] MATCH: Simplify `(X &| B) CMP X` if possible [PR 101590]

2023-10-27 Thread Andrew Pinski

On Thu, Oct 26, 2023 at 11:56 PM Richard Biener
 wrote:
>
>
>
> > On 26.10.2023 at 23:10, Andrew Pinski wrote:
> >
> > From: Andrew Pinski 
> >
> > I noticed we were missing these simplifications so let's add them.
> >
> > This adds the following simplifications:
> > U & N <= U  -> true
> > U & N >  U  -> false
> > When U is known to be non-negative.
> >
> > When N is also known to be non-negative, this is also true:
> > U | N <  U  -> false
> > U | N >= U  -> true
> >
> > When N is a negative integer, the result flips and we get:
> > U | N <  U  -> true
> > U | N >= U  -> false
>
> I think bit-CCP should get this, does ranger also figure this out (iirc it 
> tracks nonzero bits?)
>
> Your testcases suggest this doesn’t happen, can you figure out why CCP 
> doesn’t optimize this and maybe file a bug?

CCP and ranger are able to figure this out when N is a negative constant;
otherwise they are not. I only added this case to the testcase/match pattern
because I originally messed it up while working on the patch and noticed
different answers.

Thanks,
Andrew

>
> > We could extend this later on to be the case where we know N
> > is nonconstant but is known to be negative.
> >
> > Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> Ok
>
> Richard
>
> >PR tree-optimization/101590
> >PR tree-optimization/94884
> >
> > gcc/ChangeLog:
> >
> >* match.pd (`(X BIT_OP Y) CMP X`): New pattern.
> >
> > gcc/testsuite/ChangeLog:
> >
> >* gcc.dg/tree-ssa/bitcmp-1.c: New test.
> >* gcc.dg/tree-ssa/bitcmp-2.c: New test.
> >* gcc.dg/tree-ssa/bitcmp-3.c: New test.
> >* gcc.dg/tree-ssa/bitcmp-4.c: New test.
> >* gcc.dg/tree-ssa/bitcmp-5.c: New test.
> >* gcc.dg/tree-ssa/bitcmp-6.c: New test.
> > ---
> > gcc/match.pd | 24 +
> > gcc/testsuite/gcc.dg/tree-ssa/bitcmp-1.c | 20 +++
> > gcc/testsuite/gcc.dg/tree-ssa/bitcmp-2.c | 20 +++
> > gcc/testsuite/gcc.dg/tree-ssa/bitcmp-3.c | 21 
> > gcc/testsuite/gcc.dg/tree-ssa/bitcmp-4.c | 36 
> > gcc/testsuite/gcc.dg/tree-ssa/bitcmp-5.c | 43 
> > gcc/testsuite/gcc.dg/tree-ssa/bitcmp-6.c | 41 ++
> > 7 files changed, 205 insertions(+)
> > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitcmp-1.c
> > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitcmp-2.c
> > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitcmp-3.c
> > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitcmp-4.c
> > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitcmp-5.c
> > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitcmp-6.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 5f6aeb07ac0..7d651a6582d 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -2707,6 +2707,30 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >   (if (TREE_INT_CST_LOW (@1) & 1)
> >{ constant_boolean_node (cmp == NE_EXPR, type); })))
> >
> > +/*
> > +   U & N <= U  -> true
> > +   U & N >  U  -> false
> > +   U needs to be non-negative.
> > +
> > +   U | N <  U  -> false
> > +   U | N >= U  -> true
> > +   U and N needs to be non-negative
> > +
> > +   U | N <  U  -> true
> > +   U | N >= U  -> false
> > +   U needs to be non-negative and N needs to be a negative constant.
> > +   */
> > +(for cmp   (lt  ge  le  gt )
> > + bitop (bit_ior bit_ior bit_and bit_and)
> > + (simplify
> > +  (cmp:c (bitop:c tree_expr_nonnegative_p@0 @1) @0)
> > +  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
> > +   (if (bitop == BIT_AND_EXPR || tree_expr_nonnegative_p (@1))
> > +{ constant_boolean_node (cmp == GE_EXPR || cmp == LE_EXPR, type); }
> > +/* The sign is opposite now so the comparison is swapped around. */
> > +(if (TREE_CODE (@1) == INTEGER_CST && wi::neg_p (wi::to_wide (@1)))
> > + { constant_boolean_node (cmp == LT_EXPR, type); })
> > +
> > /* Arguments on which one can call get_nonzero_bits to get the bits
> >possibly set.  */
> > (match with_possible_nonzero_bits
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitcmp-1.c 
> > b/gcc/testsuite/gcc.dg/tree-ssa/bitcmp-1.c
> > new file mode 100644
> > index 000..f3d515bb2d6
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/bitcmp-1.c
> > @@ -0,0 +1,20 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > +/* PR tree-optimization/101590 */
> > +
> > +int f_and_le(unsigned len) {
> > +  const unsigned N = 4;
> > +  unsigned newlen = len & -N;
> > +  return newlen <= len; // return 1
> > +}
> > +int f_or_ge(unsigned len) {
> > +  const unsigned N = 4;
> > +  unsigned newlen = len | -N;
> > +  return newlen >= len; // return 1
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-not " <= " "optimized" } } */
> > +/* { dg-final { scan-tree-dump-not " >= " "optimized" } } */
> > +/* { dg-final { scan-tree-dump-not " & "  "optimized" } } */
> > +/* { dg-final { scan-tree-dump-not " \\\| " "optimized" } } */
> > +/* { dg-final {

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-27 Thread Martin Uecker

On Thursday, 26.10.2023 at 19:57 +, Qing Zhao wrote:
> I guess that what Kees wanted, the "fill the array without knowing the actual
> final size" code pattern, as follows:
> 
> > >   struct foo *f;
> > >   char *p;
> > >   int i;
> > > 
> > >   f = alloc(maximum_possible);
> > >   f->count = 0;
> > >   p = f->buf;
> > > 
> > >   for (i = 0; data_is_available() && i < maximum_possible; i++) {
> > >   f->count ++;
> > >   p[i] = next_data_item();
> > >   }
> 
> actually is a dynamic array, or more accurately, a bounded-size dynamic array
> (but not a dynamically allocated array as we have discussed so far):
> 
> https://en.wikipedia.org/wiki/Dynamic_array
> 
> Such a dynamic array is also called a growable array or resizable array; its
> size can be changed during its lifetime.
> 
> For VLA or FAM, I believe that they are both dynamically allocated arrays, i.e.,
> even though the size is not known at compilation time, the size
> will be fixed after the array is allocated.
> 
> I am not sure whether C has support for such dynamic arrays? Or whether it’s
> easy to provide dynamic array support in C?

It is possible to support dynamic arrays in C even with
good checking, but not safely using the pattern above
where you derive a pointer which you later use independently.

While we could track the connection to the original struct,
the necessary synchronization between the counter and the
access to the buffer is difficult.  I do not see how this
could be supported with reasonable effort and cost.
 

But with this restriction in mind, we can do a lot in C.
For example, see my experimental (!) container library
which has a vector type.
https://github.com/uecker/noplate/blob/main/test.c
You can get an array view for the vector (which then
also can decay to a pointer), so it interoperates nicely
with C but you can get good bounds checking.


But once you derive a pointer and pass it on, it gets
difficult.  If you want safety, you simply have to
avoid this in code.

What we could potentially do is add restrictions so 
that the access to buf always has to go via x->buf 
or you get at least a warning.
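To illustrate the kind of restriction meant here, below is a minimal sketch in which every store goes through the struct and consults the current count. It is purely illustrative: struct foo mirrors the example in this thread, while foo_alloc and foo_store are hypothetical helpers and not part of any proposed attribute or of the noplate library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct foo
{
  size_t count;  /* number of valid elements in buf */
  char buf[];    /* flexible array member */
};

/* Allocate room for CAPACITY elements; count starts at zero.  */
static struct foo *
foo_alloc (size_t capacity)
{
  struct foo *f = malloc (sizeof *f + capacity);
  if (f != NULL)
    f->count = 0;
  return f;
}

/* Store C at index I, always going via f->buf and checking against the
   count at the time of the access.  Returns 0 on success, -1 if the
   index is out of bounds.  */
static int
foo_store (struct foo *f, size_t i, char c)
{
  assert (f != NULL);
  if (i >= f->count)
    return -1;
  f->buf[i] = c;
  return 0;
}
```

Because no pointer to buf escapes, there is no separate access whose ordering relative to stores to count would need to be tracked.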

Martin




> 
> Qing
> 
> 
> > On Oct 26, 2023, at 12:45 PM, Martin Uecker  wrote:
> > 
> > On Thursday, 26.10.2023 at 09:13 -0700, Kees Cook wrote:
> > > On Thu, Oct 26, 2023 at 10:15:10AM +0200, Martin Uecker wrote:
> > > > but not this:
> > > > 
> > > > x->count = 11;
> > > > char *p = &x->buf;
> > > > x->count = 1;
> > > > p[10] = 1; // !
> > > 
> > > This seems fine to me -- it's how I'd expect it to work: "10" is beyond
> > > "1".
> > 
> > Note that the store would be allowed.
> > 
> > > 
> > > > (because the pointer is passed around the
> > > > store to the counter)
> > > > 
> > > > and also here the second store is then irrelevant
> > > > for the access:
> > > > 
> > > > x->count = 10;
> > > > char* p = &x->buf;
> > > > ...
> > > > x->count = 1; // somewhere else
> > > > 
> > > > p[9] = 1; // ok, because count matters when buf was accessed.
> > > 
> > > This is less great, but I can understand why it happens. "p" loses the
> > > association with "x". It'd be nice if "p" had a way to retain that it
> > > was just an alias for x->buf, so future p access would check count.
> > 
> > The problem is not to discover that p is an alias to x->buf, 
> > but that it seems difficult to make sure that stores to 
> > x->count are not reordered relative to the final access to
> > p[i] you want to check, so that you then get the right value.
> > 
> > > 
> > > But this appears to be an existing limitation in other areas where an
> > > assignment will cause the loss of object association. (I've run into
> > > this before.) It's just more surprising in the above example because in
> > > the past the loss of association would cause __bdos() to revert back to
> > > "SIZE_MAX" results ("I don't know the size") rather than an "outdated"
> > > size, which may get us into unexpected places...
> > > 
> > > > IMHO this makes sense also from the user side and
> > > > are the desirable semantics we discussed before.
> > > > 
> > > > But can you take a look at this?
> > > > 
> > > > 
> > > > This should simulate it fairly well:
> > > > https://godbolt.org/z/xq89aM7Gr
> > > > 
> > > > (the call to the noinline function would go away,
> > > > but not necessarily its impact on optimization)
> > > 
> > > Yeah, this example should be a very rare situation: a leaf function is
> > > changing the characteristics of the struct but returning a buffer within
> > > it to the caller. The more likely glitch would be from:
> > > 
> > > int main()
> > > {
> > >   struct foo *f = foo_alloc(7);
> > >   char *p = FAM_ACCESS(f, size, buf);
> > > 
> > >   printf("%ld\n", __builtin_dynamic_object_size(p, 0));
> > >   test1(f); // or just "f->count = 10;" no function call needed
> > >   printf("%ld\n", __builtin_dynamic_object_size(p, 0));
> > > 
> > >   return 0;
> > > }
> > > 
> > > which reports:
> > > 7
> > > 7
> > > 
>

Re: [PATCH] RISC-V: Fix wrong tune parameters on int_div

> @@ -346,7 +346,7 @@ static const struct riscv_tune_param rocket_tune_info = {
>{COSTS_N_INSNS (4), COSTS_N_INSNS (5)},/* fp_mul */
>{COSTS_N_INSNS (20), COSTS_N_INSNS (20)},  /* fp_div */
>{COSTS_N_INSNS (4), COSTS_N_INSNS (4)},/* int_mul */
> -  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)},/* int_div */
> +  {COSTS_N_INSNS (33), COSTS_N_INSNS (65)},  /* int_div */
>1, /* issue_rate */
>3, /* branch_cost */
>5, /* memory_cost */
> @@ -361,7 +361,7 @@ static const struct riscv_tune_param sifive_7_tune_info = 
> {
>{COSTS_N_INSNS (4), COSTS_N_INSNS (5)},/* fp_mul */
>{COSTS_N_INSNS (20), COSTS_N_INSNS (20)},  /* fp_div */
>{COSTS_N_INSNS (4), COSTS_N_INSNS (4)},/* int_mul */
> -  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)},/* int_div */
> +  {COSTS_N_INSNS (33), COSTS_N_INSNS (65)},  /* int_div */
>2, /* issue_rate */
>4, /* branch_cost */
>3, /* memory_cost */
> @@ -376,7 +376,7 @@ static const struct riscv_tune_param thead_c906_tune_info 
> = {
>{COSTS_N_INSNS (4), COSTS_N_INSNS (5)}, /* fp_mul */
>{COSTS_N_INSNS (20), COSTS_N_INSNS (20)}, /* fp_div */
>{COSTS_N_INSNS (4), COSTS_N_INSNS (4)}, /* int_mul */
> -  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)}, /* int_div */
> +  {COSTS_N_INSNS (18), COSTS_N_INSNS (34)}, /* int_div */
>1,/* issue_rate */
>3,/* branch_cost */
>5,/* memory_cost */

Instruction costs don't really correspond to latencies even though
sometimes they are used as if they were.  I'm a bit wary of using
e.g. 65 which would disparage each use of an integer division inside
a sequence.

Could you check which costs we need in order to still emit your wanted
sequence?  Maybe we can use values a bit lower than yours and still
get the proper code.  Where is the decision being made actually?
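To make the question concrete, here is a small sketch of how one could reason about the lowest int_div cost that still favours a replacement sequence. COSTS_N_INSNS is reproduced with the same definition as in GCC's rtl.h; prefer_replacement and min_div_insns_for_replacement are hypothetical helpers that only assume a plain "cheaper wins" comparison, which may not match where the decision is actually made.

```c
#include <assert.h>
#include <stdbool.h>

/* Same definition as in GCC's rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical decision: replace the division when the alternative
   sequence is strictly cheaper.  */
static bool
prefer_replacement (int div_cost, int seq_cost)
{
  assert (div_cost >= 0 && seq_cost >= 0);
  return seq_cost < div_cost;
}

/* Smallest division cost, in instructions, at which a sequence of
   SEQ_INSNS instructions would be preferred under that rule.  */
static int
min_div_insns_for_replacement (int seq_insns)
{
  int n = 1;
  while (!prefer_replacement (COSTS_N_INSNS (n), COSTS_N_INSNS (seq_insns)))
    n++;
  return n;
}
```

Under this assumption, a cost only slightly above the length of the wanted sequence would already be enough, so values well below 33 or 65 might suffice.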

Regards
 Robin

Re: [PATCH] Improve memcmpeq for 512-bit vector with vpcmpeq + kortest.




> On 27.10.2023 at 07:50, liuhongt wrote:
> 
> When 2 vectors are equal, kmask is allones and kortest will set CF,
> else CF will be cleared.
> 
> So CF bit can be used to check for the result of the comparison.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?

Is that also profitable for 256-bit aka AVX10?
Is there a jump on carry in case the result feeds control flow rather than a
value, and is using ktest better then (does combine figure this out)?
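For reference, the flag behaviour described in the quoted patch can be modelled in scalar code as follows. This is only a sketch of the semantics, not of the generated code: cmpeq_mask16, kortest_carry and blocks_equal_64 are hypothetical names, with the first modelling vpcmpeqd on sixteen 32-bit lanes and the second the carry flag of kortestw.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Models vpcmpeqd on 16 dword lanes: bit i is set when a[i] == b[i].  */
static uint16_t
cmpeq_mask16 (const uint32_t a[16], const uint32_t b[16])
{
  uint16_t mask = 0;
  for (int i = 0; i < 16; i++)
    if (a[i] == b[i])
      mask |= (uint16_t) (1u << i);
  return mask;
}

/* Models the carry flag of kortestw: set when the OR of both mask
   operands is all ones.  */
static bool
kortest_carry (uint16_t k1, uint16_t k2)
{
  return (uint16_t) (k1 | k2) == 0xFFFF;
}

/* Equality of two 64-byte blocks, read off the modelled carry flag.  */
static bool
blocks_equal_64 (const void *p, const void *q)
{
  uint32_t a[16], b[16];

  assert (p != NULL && q != NULL);
  memcpy (a, p, sizeof a);
  memcpy (b, q, sizeof b);

  uint16_t k = cmpeq_mask16 (a, b);
  return kortest_carry (k, k);
}
```

A single differing byte clears one mask bit, so the OR is no longer all ones and the modelled CF is cleared.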

> Before:
>vmovdqu (%rsi), %ymm0
>vpxorq  (%rdi), %ymm0, %ymm0
>vptest  %ymm0, %ymm0
>jne .L2
>vmovdqu 32(%rsi), %ymm0
>vpxorq  32(%rdi), %ymm0, %ymm0
>vptest  %ymm0, %ymm0
>je  .L5
> .L2:
>movl$1, %eax
>xorl$1, %eax
>vzeroupper
>ret
> 
> After:
>vmovdqu64   (%rsi), %zmm0
>xorl%eax, %eax
>vpcmpeqd(%rdi), %zmm0, %k0
>kortestw%k0, %k0
>setc%al
>vzeroupper
>ret
> 
> gcc/ChangeLog:
> 
>PR target/104610
>* config/i386/i386-expand.cc (ix86_expand_branch): Handle
>512-bit vector with vpcmpeq + kortest.
>* config/i386/i386.md (cbranchxi4): New expander.
>* config/i386/sse.md: (cbranch4): Extend to V16SImode
>and V8DImode.
> 
> gcc/testsuite/ChangeLog:
> 
>* gcc.target/i386/pr104610-2.c: New test.
> ---
> gcc/config/i386/i386-expand.cc | 55 +++---
> gcc/config/i386/i386.md| 16 +++
> gcc/config/i386/sse.md | 36 +++---
> gcc/testsuite/gcc.target/i386/pr104610-2.c | 14 ++
> 4 files changed, 99 insertions(+), 22 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/i386/pr104610-2.c
> 
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index 1eae9d7c78c..c664cb61e80 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -2411,30 +2411,53 @@ ix86_expand_branch (enum rtx_code code, rtx op0, rtx 
> op1, rtx label)
>   rtx tmp;
> 
>   /* Handle special case - vector comparsion with boolean result, transform
> - it using ptest instruction.  */
> + it using ptest instruction or vpcmpeq + kortest.  */
>   if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
>   || (mode == TImode && !TARGET_64BIT)
> -  || mode == OImode)
> +  || mode == OImode
> +  || GET_MODE_SIZE (mode) == 64)
> {
> -  rtx flag = gen_rtx_REG (CCZmode, FLAGS_REG);
> -  machine_mode p_mode = GET_MODE_SIZE (mode) == 32 ? V4DImode : V2DImode;
> +  unsigned msize = GET_MODE_SIZE (mode);
> +  machine_mode p_mode
> += msize == 64 ? V16SImode : msize == 32 ? V4DImode : V2DImode;
> +  /* kortest set CF when result is 0x (op0 == op1).  */
> +  rtx flag = gen_rtx_REG (msize == 64 ? CCCmode : CCZmode, FLAGS_REG);
> 
>   gcc_assert (code == EQ || code == NE);
> 
> -  if (GET_MODE_CLASS (mode) != MODE_VECTOR_INT)
> +  /* Using vpcmpeq zmm zmm k + kortest for 512-bit vectors.  */
> +  if (msize == 64)
>{
> -  op0 = lowpart_subreg (p_mode, force_reg (mode, op0), mode);
> -  op1 = lowpart_subreg (p_mode, force_reg (mode, op1), mode);
> -  mode = p_mode;
> +  if (mode != V16SImode)
> +{
> +  op0 = lowpart_subreg (p_mode, force_reg (mode, op0), mode);
> +  op1 = lowpart_subreg (p_mode, force_reg (mode, op1), mode);
> +}
> +
> +  tmp = gen_reg_rtx (HImode);
> +  emit_insn (gen_avx512f_cmpv16si3 (tmp, op0, op1, GEN_INT (0)));
> +  emit_insn (gen_kortesthi_ccc (tmp, tmp));
> +}
> +  /* Using ptest for 128/256-bit vectors.  */
> +  else
> +{
> +  if (GET_MODE_CLASS (mode) != MODE_VECTOR_INT)
> +{
> +  op0 = lowpart_subreg (p_mode, force_reg (mode, op0), mode);
> +  op1 = lowpart_subreg (p_mode, force_reg (mode, op1), mode);
> +  mode = p_mode;
> +}
> +
> +  /* Generate XOR since we can't check that one operand is zero
> + vector.  */
> +  tmp = gen_reg_rtx (mode);
> +  emit_insn (gen_rtx_SET (tmp, gen_rtx_XOR (mode, op0, op1)));
> +  tmp = gen_lowpart (p_mode, tmp);
> +  emit_insn (gen_rtx_SET (gen_rtx_REG (CCZmode, FLAGS_REG),
> +  gen_rtx_UNSPEC (CCZmode,
> +  gen_rtvec (2, tmp, tmp),
> +  UNSPEC_PTEST)));
>}
> -  /* Generate XOR since we can't check that one operand is zero vector.  
> */
> -  tmp = gen_reg_rtx (mode);
> -  emit_insn (gen_rtx_SET (tmp, gen_rtx_XOR (mode, op0, op1)));
> -  tmp = gen_lowpart (p_mode, tmp);
> -  emit_insn (gen_rtx_SET (gen_rtx_REG (CCZmode, FLAGS_REG),
> -  gen_rtx_UNSPEC (CCZmode,
> -  gen_rtvec (2, tmp, tmp),
> -  UNSPEC_PTEST)));
>   tmp = gen_rtx_fmt_ee (code, VOIDmode, flag, const0_rtx);
>   tmp =

Re: [PATCH] RISC-V: Add rawmemchr expander.

> Could you put it into riscv-string.cc rather than riscv-v.cc? I would
> like to put those builtin function expanders together if possible.
> riscv-string.cc might be a little bit confusing, but it's all included in
> string.h

Ok, sure.  Will commit the adjusted patch if no further comments.

Regards
 Robin

Re: [PATCH v2 2/4] libgrust: Add libproc_macro and build system

2023-10-27 Thread Iain Sandoe



> On 26 Oct 2023, at 09:21, Thomas Schwinge  wrote:
> 
> Hi!
> 
> First, I've pushed into GCC upstream Git branch devel/rust/libgrust-v2
> the "v2" libgrust changes as posted by Arthur, so that people can easily
> test this before it getting into Git master branch.
> 
> I'll myself later try this for GCN and nvptx targets -- in their current
> form where they don't support C++ (standard library), and in my hacky WIP
> trees where C++ (standard library) is supported to some extent.  (This
> should, roughly, match C++ functionality (not) provided by a number of
> other GCC "embedded" targets.)

On Darwin, it works for later systems without multilibs, but fails to build
multilibs.

So:

With the patch below, bootstrap succeeded on x86_64-darwin17 and produced a
correct architecture multilib.  Of course, there is no way to test this at the
moment; I'd suggest that the next step might be something small in
functionality that allows at least one test to be wired up.

^^^ This is "lightly tested", of course; as I cycle through other versions of
the OS, I will see how it pans out.

Do you want me to make a PR with this change against upstream?

Iain



0001-libgrust-enable-multilib.patch
Description: Binary data

> 
> 
> Then:
> 
> On 2023-10-25T13:06:46+0200, Arthur Cohen  wrote:
>> From: Pierre-Emmanuel Patry 
>> 
>> Add some dummy files in libproc_macro along with its build system.
> 
> I've not reviewed the build system in detail, just had a very quick look.
> 
> Three instances of 'librust'; should be 'libgrust':
> 
>configure.ac:AC_INIT([libgrust], version-unused,,librust)
> 
>configure.ac:AC_MSG_NOTICE([librust has been configured.])
> 
>Makefile.am:"TARGET_LIB_PATH_librust=$(TARGET_LIB_PATH_librust)" \
> 
> Compared to libgomp (which I'm reasonably familiar with), I found missing
> in 'libgrust' at 'configure'-level:
> 
>  --enable-multilib       build many library versions (default)
> 
>  --disable-werror        disable building with -Werror
> 
>  --enable-symvers=STYLE  enables symbol versioning of the shared library
>                          [default=yes]
> 
>  --enable-cet            enable Intel CET in target libraries
>                          [default=auto]
> 
>  --with-gcc-major-version-only
>                          use only GCC major number in filesystem paths
> 
> I can't tell off-hand whether all these are important, however.
> 
> Additionally, the new one that's being discussed in
> 
> 'Update libgrust for upstream GCC commit 
> 6a6d3817afa02bbcd2388c8e005da6faf88932f1 "Config,Darwin: Allow for 
> configuring Darwin to use embedded runpath"'.
> 
> 
> Regards
> Thomas
> 
> 
>> libgrust/Changelog:
>> 
>>  * Makefile.am: New file.
>>  * configure.ac: New file.
>>  * libproc_macro/Makefile.am: New file.
>>  * libproc_macro/proc_macro.cc: New file.
>>  * libproc_macro/proc_macro.h: New file.
>> 
>> Signed-off-by: Pierre-Emmanuel Patry 
>> ---
>> libgrust/Makefile.am |  68 
>> libgrust/configure.ac| 113 +++
>> libgrust/libproc_macro/Makefile.am   |  58 ++
>> libgrust/libproc_macro/proc_macro.cc |   7 ++
>> libgrust/libproc_macro/proc_macro.h  |   7 ++
>> 5 files changed, 253 insertions(+)
>> create mode 100644 libgrust/Makefile.am
>> create mode 100644 libgrust/configure.ac
>> create mode 100644 libgrust/libproc_macro/Makefile.am
>> create mode 100644 libgrust/libproc_macro/proc_macro.cc
>> create mode 100644 libgrust/libproc_macro/proc_macro.h
>> 
>> diff --git a/libgrust/Makefile.am b/libgrust/Makefile.am
>> new file mode 100644
>> index 000..8e5274922c5
>> --- /dev/null
>> +++ b/libgrust/Makefile.am
>> @@ -0,0 +1,68 @@
>> +AUTOMAKE_OPTIONS = 1.8 foreign
>> +
>> +SUFFIXES = .c .rs .def .o .lo .a
>> +
>> +ACLOCAL_AMFLAGS = -I . -I .. -I ../config
>> +
>> +AM_CFLAGS = -I $(srcdir)/../libgcc -I $(MULTIBUILDTOP)../../gcc/include
>> +
>> +TOP_GCCDIR := $(shell cd $(top_srcdir) && cd .. && pwd)
>> +
>> +GCC_DIR = $(TOP_GCCDIR)/gcc
>> +RUST_SRC = $(GCC_DIR)/rust
>> +
>> +toolexeclibdir=@toolexeclibdir@
>> +toolexecdir=@toolexecdir@
>> +
>> +SUBDIRS = libproc_macro
>> +
>> +RUST_BUILDDIR := $(shell pwd)
>> +
>> +# Work around what appears to be a GNU make bug handling MAKEFLAGS
>> +# values defined in terms of make variables, as is the case for CC and
>> +# friends when we are called from the top level Makefile.
>> +AM_MAKEFLAGS = \
>> +"GCC_DIR=$(GCC_DIR)" \
>> +"RUST_SRC=$(RUST_SRC)" \
>> + "AR_FLAGS=$(AR_FLAGS)" \
>> + "CC_FOR_BUILD=$(CC_FOR_BUILD)" \
>> + "CC_FOR_TARGET=$(CC_FOR_TARGET)" \
>> + "RUST_FOR_TARGET=$(RUST_FOR_TARGET)" \
>> + "CFLAGS=$(CFLAGS)" \
>> + "CXXFLAGS=$(CXXFLAGS)" \
>> + "CFLAGS_FOR_BUILD=$(CFLAGS_FOR_BUILD)" \
>> + "CFLAGS_FOR_TARGET=$(CFLAGS_FOR_TARGET)" \
>> + "INSTALL=$(INSTALL)" \
>> + "INSTALL_DATA=$(INSTALL_DATA)" \
>> +

Only build host libgrust if the Rust language is enabled (was: [PATCH v2 3/4] build: Add libgrust as compilation modules)

Hi!

On 2023-10-25T23:40:40+0200, I wrote:
> On 2023-10-25T13:06:48+0200, Arthur Cohen  wrote:
>> From: Pierre-Emmanuel Patry 
>>
>> Define the libgrust directory as a host compilation module as well as
>> for targets.
>
> I don't see a response to Richard's comments:
> .
> Re "doesn't build libgrust if [Rust is not enabled]", I suppose (but have
> not checked) this works for the *target* libgrust module via
> 'gcc/rust/config-lang.in:target_libs' requesting 'target-libgrust' only
> if the Rust language is enabled?

That aspect appears to work: testing a GCC configuration without the Rust
language enabled, target libgrust doesn't get built, but...

> I don't know what enables/disables the
> *host* libgrust build?

... indeed, host libgrust still does get built.

The attached "Only build host libgrust if the Rust language is enabled"
ought to address this.


Regards
 Thomas


From 403e6bf5349f8a22e4dc7b74ea80acb55e4f5133 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 27 Oct 2023 11:59:19 +0200
Subject: [PATCH] Only build host libgrust if the Rust language is enabled

	gcc/
	* rust/config-lang.in (lang_dirs): Set to 'libgrust'.
---
 gcc/rust/config-lang.in | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/rust/config-lang.in b/gcc/rust/config-lang.in
index 8f071dcb0bf0..aee6f0eb468f 100644
--- a/gcc/rust/config-lang.in
+++ b/gcc/rust/config-lang.in
@@ -30,5 +30,6 @@ compilers="rust1\$(exeext)"
 build_by_default="no"
 
 target_libs="target-libffi target-libbacktrace target-libgrust"
+lang_dirs=libgrust
 
 gtfiles="\$(srcdir)/rust/rust-lang.cc"
-- 
2.40.1

Re: [PATCH] Improve memcmpeq for 512-bit vector with vpcmpeq + kortest.

2023-10-27 Thread Hongtao Liu

On Fri, Oct 27, 2023 at 3:21 PM Hongtao Liu  wrote:
>
> On Fri, Oct 27, 2023 at 2:49 PM Richard Biener
>  wrote:
> >
> >
> >
> > > On 27.10.2023 at 07:50, liuhongt wrote:
> > >
> > > When 2 vectors are equal, kmask is allones and kortest will set CF,
> > > else CF will be cleared.
> > >
> > > So CF bit can be used to check for the result of the comparison.
> > >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > Ok for trunk?
> >
> > Is that also profitable for 256bit aka AVX10?
> Yes, it's also available for both 128-bit and 256-bit with AVX10, from
> performance perspective it's better.
> AVX10:
>   vpcmp + kortest
>  vs
> AVX2:
>  vpxor + vptest
>
>  vptest is more expensive than vpcmp + kortest
>
> > Is there a jump on carry in case the result feeds control flow rather than 
> > a value and is using ktest better then (does combine figure this out?)
> There are JC and JNC, there're many pattern matches for ptest which
> can't be automatically adjusted to kortest by combining, backend needs
> to manually transform them.
> That's why my patch only handles 64-bit vectors(to avoid regressing
I mean 64 bytes.
> those pattern match stuff).
>
> >
> > > Before:
> > >vmovdqu (%rsi), %ymm0
> > >vpxorq  (%rdi), %ymm0, %ymm0
> > >vptest  %ymm0, %ymm0
> > >jne .L2
> > >vmovdqu 32(%rsi), %ymm0
> > >vpxorq  32(%rdi), %ymm0, %ymm0
> > >vptest  %ymm0, %ymm0
> > >je  .L5
> > > .L2:
> > >movl$1, %eax
> > >xorl$1, %eax
> > >vzeroupper
> > >ret
> > >
> > > After:
> > >vmovdqu64   (%rsi), %zmm0
> > >xorl%eax, %eax
> > >vpcmpeqd(%rdi), %zmm0, %k0
> > >kortestw%k0, %k0
> > >setc%al
> > >vzeroupper
> > >ret
> > >
> > > gcc/ChangeLog:
> > >
> > >PR target/104610
> > >* config/i386/i386-expand.cc (ix86_expand_branch): Handle
> > >512-bit vector with vpcmpeq + kortest.
> > >* config/i386/i386.md (cbranchxi4): New expander.
> > >* config/i386/sse.md: (cbranch4): Extend to V16SImode
> > >and V8DImode.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >* gcc.target/i386/pr104610-2.c: New test.
> > > ---
> > > gcc/config/i386/i386-expand.cc | 55 +++---
> > > gcc/config/i386/i386.md| 16 +++
> > > gcc/config/i386/sse.md | 36 +++---
> > > gcc/testsuite/gcc.target/i386/pr104610-2.c | 14 ++
> > > 4 files changed, 99 insertions(+), 22 deletions(-)
> > > create mode 100644 gcc/testsuite/gcc.target/i386/pr104610-2.c
> > >
> > > diff --git a/gcc/config/i386/i386-expand.cc 
> > > b/gcc/config/i386/i386-expand.cc
> > > index 1eae9d7c78c..c664cb61e80 100644
> > > --- a/gcc/config/i386/i386-expand.cc
> > > +++ b/gcc/config/i386/i386-expand.cc
> > > @@ -2411,30 +2411,53 @@ ix86_expand_branch (enum rtx_code code, rtx op0, 
> > > rtx op1, rtx label)
> > >   rtx tmp;
> > >
> > >   /* Handle special case - vector comparsion with boolean result, 
> > > transform
> > > - it using ptest instruction.  */
> > > + it using ptest instruction or vpcmpeq + kortest.  */
> > >   if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
> > >   || (mode == TImode && !TARGET_64BIT)
> > > -  || mode == OImode)
> > > +  || mode == OImode
> > > +  || GET_MODE_SIZE (mode) == 64)
> > > {
> > > -  rtx flag = gen_rtx_REG (CCZmode, FLAGS_REG);
> > > -  machine_mode p_mode = GET_MODE_SIZE (mode) == 32 ? V4DImode : 
> > > V2DImode;
> > > +  unsigned msize = GET_MODE_SIZE (mode);
> > > +  machine_mode p_mode
> > > += msize == 64 ? V16SImode : msize == 32 ? V4DImode : V2DImode;
> > > +  /* kortest set CF when result is 0x (op0 == op1).  */
> > > +  rtx flag = gen_rtx_REG (msize == 64 ? CCCmode : CCZmode, 
> > > FLAGS_REG);
> > >
> > >   gcc_assert (code == EQ || code == NE);
> > >
> > > -  if (GET_MODE_CLASS (mode) != MODE_VECTOR_INT)
> > > +  /* Using vpcmpeq zmm zmm k + kortest for 512-bit vectors.  */
> > > +  if (msize == 64)
> > >{
> > > -  op0 = lowpart_subreg (p_mode, force_reg (mode, op0), mode);
> > > -  op1 = lowpart_subreg (p_mode, force_reg (mode, op1), mode);
> > > -  mode = p_mode;
> > > +  if (mode != V16SImode)
> > > +{
> > > +  op0 = lowpart_subreg (p_mode, force_reg (mode, op0), mode);
> > > +  op1 = lowpart_subreg (p_mode, force_reg (mode, op1), mode);
> > > +}
> > > +
> > > +  tmp = gen_reg_rtx (HImode);
> > > +  emit_insn (gen_avx512f_cmpv16si3 (tmp, op0, op1, GEN_INT (0)));
> > > +  emit_insn (gen_kortesthi_ccc (tmp, tmp));
> > > +}
> > > +  /* Using ptest for 128/256-bit vectors.  */
> > > +  else
> > > +{
> > > +  if (GET_MODE_CLASS (mode) != MODE_VECTOR_INT)
> > > +{
> > > +  op0 = lowpart_subreg (p_mode, force_reg (mode, op0), mode);
> > > +  op1 =

Re: Re: [PATCH] RISC-V: Add rawmemchr expander.

2023-10-27 Thread juzhe.zh...@rivai.ai

I notice we have expand_block_move in riscv-v.cc.

Maybe we should move it into riscv-string.cc?



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-10-27 15:51
To: Kito Cheng; juzhe.zh...@rivai.ai
CC: rdapp.gcc; gcc-patches; palmer; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Add rawmemchr expander.
> Could you put it into riscv-string.cc rather than riscv-v.cc? I would
> like to keep those builtin function expanders together if possible.
> The name riscv-string.cc might be a little confusing, but it's all
> included in string.h.
 
Ok, sure.  Will commit the adjusted patch if no further comments.
 
Regards
Robin

Re: [PATCH] RISC-V: Add rawmemchr expander.

Attached v3 that I'd commit.

Regards
 Robin
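
For readers following along: the rawmemchr loop that the expander's comment
describes in the v2 patch (load a chunk, compare against the pattern, find
the first hit, otherwise bump the pointer) can be modelled in scalar C.
This is only a sketch for the byte case; model_vfirst_eq and
model_rawmemchr are hypothetical names, and the chunk length vl stands in
for the vector length the hardware would return.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of vmseq.vx + vfirst.m on one chunk: index of the first element
   equal to PAT, or -1 when there is none.  */
long
model_vfirst_eq (const uint8_t *chunk, size_t vl, uint8_t pat)
{
  for (size_t i = 0; i < vl; i++)
    if (chunk[i] == pat)
      return (long) i;
  return -1;
}

/* Chunked rawmemchr for bytes.  Like rawmemchr, it assumes PAT occurs in
   the haystack; VL must be nonzero.  The scalar scan stops at the first
   hit, so it never reads past the match.  The sequence in the patch
   comment bumps the pointer first and then steps back; returning before
   the bump gives the same result.  */
const uint8_t *
model_rawmemchr (const uint8_t *src, uint8_t pat, size_t vl)
{
  for (;;)
    {
      long hit = model_vfirst_eq (src, vl, pat);
      if (hit >= 0)
        return src + hit;       /* Chunk base plus element offset.  */
      src += vl;                /* Not found: move to the next chunk.  */
    }
}
```

For wider elements the element index would additionally be scaled to a byte
offset, which is what the shift in the patch comment does.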

From 246b986a8ea2332ced7a094dd68d35d84dcbbc04 Mon Sep 17 00:00:00 2001
From: Robin Dapp 
Date: Tue, 24 Oct 2023 10:33:15 +0200
Subject: [PATCH v3] RISC-V: Add rawmemchr expander.

This patch adds a vectorized rawmemchr expander.  It also moves the
vectorized expand_block_move to riscv-string.cc.

gcc/ChangeLog:

* config/riscv/autovec.md (rawmemchr): New expander.
* config/riscv/riscv-protos.h (gen_no_side_effects_vsetvl_rtx):
Define.
(expand_rawmemchr): Define.
* config/riscv/riscv-v.cc (force_vector_length_operand): Remove
static.
(expand_block_move): Move from here...
* config/riscv/riscv-string.cc (expand_block_move): ...to here.
(expand_rawmemchr): Add vectorized expander.
* internal-fn.cc (expand_RAWMEMCHR): Fix typo.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-prof/peel-2.c: Add
-fno-tree-loop-distribute-patterns.
* gcc.dg/tree-ssa/ldist-rawmemchr-1.c: Add riscv.
* gcc.dg/tree-ssa/ldist-rawmemchr-2.c: Ditto.
* gcc.target/riscv/rvv/rvv.exp: Add builtin directory.
* gcc.target/riscv/rvv/autovec/builtin/rawmemchr-1.c: New test.
---
 gcc/config/riscv/autovec.md   |  13 +
 gcc/config/riscv/riscv-protos.h   |   2 +
 gcc/config/riscv/riscv-string.cc  | 302 ++
 gcc/config/riscv/riscv-v.cc   | 202 +---
 gcc/internal-fn.cc|   2 +-
 gcc/testsuite/gcc.dg/tree-prof/peel-2.c   |   2 +-
 .../gcc.dg/tree-ssa/ldist-rawmemchr-1.c   |   8 +-
 .../gcc.dg/tree-ssa/ldist-rawmemchr-2.c   |   8 +-
 .../riscv/rvv/autovec/builtin/rawmemchr-1.c   |  99 ++
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   2 +
 10 files changed, 429 insertions(+), 211 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/rawmemchr-1.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 1ddc1993120..4f13494afdb 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2397,3 +2397,16 @@ (define_expand "lfloor2"
 DONE;
   }
 )
+
+;; Implement rawmemchr[qi|si|hi].
+(define_expand "rawmemchr"
+  [(match_operand  0 "register_operand")
+   (match_operand  1 "memory_operand")
+   (match_operand:ANYI 2 "const_int_operand")]
+  "TARGET_VECTOR"
+  {
+riscv_vector::expand_rawmemchr(mode, operands[0], operands[1],
+  operands[2]);
+DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 843a81b0e86..44189ec8139 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -495,6 +495,7 @@ void expand_vec_lfloor (rtx, rtx, machine_mode, machine_mode);
 bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
  bool, void (*)(rtx *, rtx));
 rtx gen_scalar_move_mask (machine_mode);
+rtx gen_no_side_effects_vsetvl_rtx (machine_mode, rtx, rtx);
 
 /* RVV vector register sizes.
TODO: Currently, we only add RVV_32/RVV_64/RVV_128, we may need to
@@ -526,6 +527,7 @@ void expand_cond_unop (unsigned, rtx *);
 void expand_cond_binop (unsigned, rtx *);
 void expand_cond_ternop (unsigned, rtx *);
 void expand_popcount (rtx *);
+void expand_rawmemchr (machine_mode, rtx, rtx, rtx);
 
 /* Rounding mode bitfield for fixed point VXRM.  */
 enum fixed_point_rounding_mode
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 0b4606aa7b2..75c2acaf9b1 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -36,6 +36,9 @@
 #include "target.h"
 #include "predict.h"
 #include "optabs.h"
+#include "riscv-protos.h"
+#include "recog.h"
+#include "tm-constrs.h"
 
 /* Emit proper instruction depending on mode of dest.  */
 
@@ -747,3 +750,302 @@ riscv_expand_block_move (rtx dest, rtx src, rtx length)
 }
   return false;
 }
+
+/* --- Vector expanders --- */
+
+namespace riscv_vector {
+
+/* Used by cpymemsi in riscv.md .  */
+
+bool
+expand_block_move (rtx dst_in, rtx src_in, rtx length_in)
+{
+  /*
+memcpy:
+   mv a3, a0   # Copy destination
+loop:
+   vsetvli t0, a2, e8, m8, ta, ma  # Vectors of 8b
+   vle8.v v0, (a1) # Load bytes
+   add a1, a1, t0  # Bump pointer
+   sub a2, a2, t0  # Decrement count
+   vse8.v v0, (a3) # Store bytes
+   add a3, a3, t0  # Bump pointer
+   bnez a2, loop   # Any more?
+   ret # Return
+  */
+  if (!TARGET_VECTOR)
+return false;
+  HOST_WIDE_INT potential_ew
+= (MIN (MIN (MEM_ALIGN (src_in), MEM_ALIGN (dst_in)), BITS_PER_WORD)
+   / BITS_PER_UNIT);
+  machine_mode vmode = VOIDmode;
+  bool need_loop = true;
+  bool size_p = optimize_function_for_size_p (cfun);
+  rtx src,

Re: [PATCH, expand] Checking available optabs for scalar modes in by pieces operations

2023-10-27 Thread Richard Sandiford

HAO CHEN GUI  writes:
> Hi,
>   This patch checks available optabs for scalar modes used in by
> pieces operations. It fixes the regression cases caused by previous
> patch. Now both scalar and vector modes are examined by the same
> approach.
>
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for trunk?
>
> Thanks
> Gui Haochen
>
>
> ChangeLog
> Expand: Checking available optabs for scalar modes in by pieces operations
>
> The former patch (f08ca5903c7) examines the scalar modes by target
> hook scalar_mode_supported_p.  It causes some i386 regression cases
> as XImode and OImode are not enabled in i386 target function.  This
> patch examines the scalar mode by checking if the corresponding optabs
> are available for the mode.
>
> gcc/
>   PR target/111449
>   * expr.cc (qi_vector_mode_supported_p): Rename to...
>   (by_pieces_mode_supported_p): ...this, and extends it to do
>   the checking for both scalar and vector mode.
>   (widest_fixed_size_mode_for_size): Call
>   by_pieces_mode_supported_p to examine the mode.
>   (op_by_pieces_d::smallest_fixed_size_mode_for_size): Likewise.

OK, thanks.

Richard
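
As a reading aid, the reworked predicate in the quoted patch reduces to
three early-out checks.  A minimal sketch in plain C follows; struct
model_mode and its flags are hypothetical stand-ins for the optab_handler
and can_compare_p queries, not GCC types.

```c
#include <stdbool.h>

enum model_op { MODEL_MOVE, MODEL_SET, MODEL_CLEAR, MODEL_COMPARE };

/* Hypothetical per-mode capability flags standing in for the
   optab_handler and can_compare_p queries.  */
struct model_mode
{
  bool is_vector;
  bool has_mov;
  bool has_vec_duplicate;
  bool can_compare_eq;
};

/* Mirrors the shape of by_pieces_mode_supported_p after the patch:
   reject on the first missing capability, accept otherwise.  */
bool
model_by_pieces_mode_supported_p (const struct model_mode *m,
                                  enum model_op op)
{
  /* Every by-pieces operation needs a move pattern.  */
  if (!m->has_mov)
    return false;

  /* Set/clear with a vector mode also needs vec_duplicate.  */
  if ((op == MODEL_SET || op == MODEL_CLEAR)
      && m->is_vector && !m->has_vec_duplicate)
    return false;

  /* Compare needs an EQ comparison usable for a jump.  */
  if (op == MODEL_COMPARE && !m->can_compare_eq)
    return false;

  return true;
}
```

With this shape, a scalar mode without a move pattern is rejected for every
operation, which is how the patch description says the i386 regressions
are avoided.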

> patch.diff
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index 7aac575eff8..2af9fcbed18 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -1000,18 +1000,21 @@ can_use_qi_vectors (by_pieces_operation op)
>  /* Return true if optabs exists for the mode and certain by pieces
> operations.  */
>  static bool
> -qi_vector_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
> +by_pieces_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
>  {
> +  if (optab_handler (mov_optab, mode) == CODE_FOR_nothing)
> +return false;
> +
>if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
> -  && optab_handler (vec_duplicate_optab, mode) != CODE_FOR_nothing)
> -return true;
> +  && VECTOR_MODE_P (mode)
> +  && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing)
> +return false;
>
>if (op == COMPARE_BY_PIECES
> -  && optab_handler (mov_optab, mode) != CODE_FOR_nothing
> -  && can_compare_p (EQ, mode, ccp_jump))
> -return true;
> +  && !can_compare_p (EQ, mode, ccp_jump))
> +return false;
>
> -  return false;
> +  return true;
>  }
>
>  /* Return the widest mode that can be used to perform part of an
> @@ -1035,7 +1038,7 @@ widest_fixed_size_mode_for_size (unsigned int size, by_pieces_operation op)
> {
>   if (GET_MODE_SIZE (candidate) >= size)
> break;
> - if (qi_vector_mode_supported_p (candidate, op))
> + if (by_pieces_mode_supported_p (candidate, op))
> result = candidate;
> }
>
> @@ -1049,7 +1052,7 @@ widest_fixed_size_mode_for_size (unsigned int size, by_pieces_operation op)
>  {
>mode = tmode.require ();
>if (GET_MODE_SIZE (mode) < size
> -   && targetm.scalar_mode_supported_p (mode))
> +   && by_pieces_mode_supported_p (mode, op))
>result = mode;
>  }
>
> @@ -1454,7 +1457,7 @@ op_by_pieces_d::smallest_fixed_size_mode_for_size (unsigned int size)
> break;
>
>   if (GET_MODE_SIZE (candidate) >= size
> - && qi_vector_mode_supported_p (candidate, m_op))
> + && by_pieces_mode_supported_p (candidate, m_op))
> return candidate;
> }
>  }

Re: [PATCH] Improve memcmpeq for 512-bit vector with vpcmpeq + kortest.

2023-10-27 Thread Hongtao Liu

On Fri, Oct 27, 2023 at 2:49 PM Richard Biener
 wrote:
>
>
>
> > On 27.10.2023 at 07:50, liuhongt wrote:
> >
> > When 2 vectors are equal, kmask is allones and kortest will set CF,
> > else CF will be cleared.
> >
> > So CF bit can be used to check for the result of the comparison.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ok for trunk?
>
> Is that also profitable for 256bit aka AVX10?
Yes, it's also available for both 128-bit and 256-bit vectors with AVX10,
and it's better from a performance perspective.
AVX10:
  vpcmp + kortest
 vs
AVX2:
 vpxor + vptest

 vptest is more expensive than vpcmp + kortest
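
To make the flag logic concrete, here is a scalar model in plain C of what
the 512-bit sequence computes.  It is a sketch, not the emitted code: the
model_* names are hypothetical, and the carry-flag rule assumed is the
documented kortestw behaviour (CF is set when the OR of the two masks is
all ones).

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of vpcmpeqd on 512-bit operands: bit i of the result is
   set when 32-bit lane i of A equals lane i of B.  */
uint16_t
model_vpcmpeqd (const void *a, const void *b)
{
  uint16_t mask = 0;
  for (int i = 0; i < 16; i++)
    {
      uint32_t x, y;
      memcpy (&x, (const char *) a + 4 * i, sizeof x);
      memcpy (&y, (const char *) b + 4 * i, sizeof y);
      if (x == y)
        mask |= (uint16_t) (1u << i);
    }
  return mask;
}

/* Scalar model of the carry flag after kortestw K1, K2: CF is set when
   the OR of both masks is all ones.  */
int
model_kortestw_cf (uint16_t k1, uint16_t k2)
{
  return (uint16_t) (k1 | k2) == 0xFFFF;
}

/* 64-byte equality test: returns 1 when both blocks are equal, which is
   the value setc would produce in the "After" sequence.  */
int
model_memcmpeq64 (const void *a, const void *b)
{
  uint16_t k = model_vpcmpeqd (a, b);
  return model_kortestw_cf (k, k);
}
```

A single mismatching lane clears one mask bit, so the OR is no longer all
ones and the modelled CF is 0.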

> Is there a jump on carry in case the result feeds control flow rather than a 
> value and is using ktest better then (does combine figure this out?)
There are JC and JNC.  There are many pattern matches for ptest which
can't be automatically adjusted to kortest by combine; the backend needs
to transform them manually.
That's why my patch only handles 64-byte vectors (to avoid regressing
those pattern matches).

>
> > Before:
> >vmovdqu (%rsi), %ymm0
> >vpxorq  (%rdi), %ymm0, %ymm0
> >vptest  %ymm0, %ymm0
> >jne .L2
> >vmovdqu 32(%rsi), %ymm0
> >vpxorq  32(%rdi), %ymm0, %ymm0
> >vptest  %ymm0, %ymm0
> >je  .L5
> > .L2:
> >movl$1, %eax
> >xorl$1, %eax
> >vzeroupper
> >ret
> >
> > After:
> >vmovdqu64   (%rsi), %zmm0
> >xorl%eax, %eax
> >vpcmpeqd(%rdi), %zmm0, %k0
> >kortestw%k0, %k0
> >setc%al
> >vzeroupper
> >ret
> >
> > gcc/ChangeLog:
> >
> >PR target/104610
> >* config/i386/i386-expand.cc (ix86_expand_branch): Handle
> >512-bit vector with vpcmpeq + kortest.
> >* config/i386/i386.md (cbranchxi4): New expander.
> >* config/i386/sse.md: (cbranch4): Extend to V16SImode
> >and V8DImode.
> >
> > gcc/testsuite/ChangeLog:
> >
> >* gcc.target/i386/pr104610-2.c: New test.
> > ---
> > gcc/config/i386/i386-expand.cc | 55 +++---
> > gcc/config/i386/i386.md| 16 +++
> > gcc/config/i386/sse.md | 36 +++---
> > gcc/testsuite/gcc.target/i386/pr104610-2.c | 14 ++
> > 4 files changed, 99 insertions(+), 22 deletions(-)
> > create mode 100644 gcc/testsuite/gcc.target/i386/pr104610-2.c
> >
> > diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> > index 1eae9d7c78c..c664cb61e80 100644
> > --- a/gcc/config/i386/i386-expand.cc
> > +++ b/gcc/config/i386/i386-expand.cc
> > @@ -2411,30 +2411,53 @@ ix86_expand_branch (enum rtx_code code, rtx op0, 
> > rtx op1, rtx label)
> >   rtx tmp;
> >
> >   /* Handle special case - vector comparsion with boolean result, transform
> > - it using ptest instruction.  */
> > + it using ptest instruction or vpcmpeq + kortest.  */
> >   if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
> >   || (mode == TImode && !TARGET_64BIT)
> > -  || mode == OImode)
> > +  || mode == OImode
> > +  || GET_MODE_SIZE (mode) == 64)
> > {
> > -  rtx flag = gen_rtx_REG (CCZmode, FLAGS_REG);
> > -  machine_mode p_mode = GET_MODE_SIZE (mode) == 32 ? V4DImode : 
> > V2DImode;
> > +  unsigned msize = GET_MODE_SIZE (mode);
> > +  machine_mode p_mode
> > += msize == 64 ? V16SImode : msize == 32 ? V4DImode : V2DImode;
> > +  /* kortest set CF when result is 0x (op0 == op1).  */
> > +  rtx flag = gen_rtx_REG (msize == 64 ? CCCmode : CCZmode, FLAGS_REG);
> >
> >   gcc_assert (code == EQ || code == NE);
> >
> > -  if (GET_MODE_CLASS (mode) != MODE_VECTOR_INT)
> > +  /* Using vpcmpeq zmm zmm k + kortest for 512-bit vectors.  */
> > +  if (msize == 64)
> >{
> > -  op0 = lowpart_subreg (p_mode, force_reg (mode, op0), mode);
> > -  op1 = lowpart_subreg (p_mode, force_reg (mode, op1), mode);
> > -  mode = p_mode;
> > +  if (mode != V16SImode)
> > +{
> > +  op0 = lowpart_subreg (p_mode, force_reg (mode, op0), mode);
> > +  op1 = lowpart_subreg (p_mode, force_reg (mode, op1), mode);
> > +}
> > +
> > +  tmp = gen_reg_rtx (HImode);
> > +  emit_insn (gen_avx512f_cmpv16si3 (tmp, op0, op1, GEN_INT (0)));
> > +  emit_insn (gen_kortesthi_ccc (tmp, tmp));
> > +}
> > +  /* Using ptest for 128/256-bit vectors.  */
> > +  else
> > +{
> > +  if (GET_MODE_CLASS (mode) != MODE_VECTOR_INT)
> > +{
> > +  op0 = lowpart_subreg (p_mode, force_reg (mode, op0), mode);
> > +  op1 = lowpart_subreg (p_mode, force_reg (mode, op1), mode);
> > +  mode = p_mode;
> > +}
> > +
> > +  /* Generate XOR since we can't check that one operand is zero
> > + vector.  */
> > +  tmp = gen_reg_rtx (mode);
> > +  emit_insn (gen_rtx_SET (tmp, gen_rtx_XOR (mode, op0, op1)));
> > +  tmp = gen_lowpart (p_mode, tmp);
> > +

Re: Re: [PATCH] RISC-V: Add rawmemchr expander.

2023-10-27 Thread juzhe.zh...@rivai.ai

LGTM. Thanks.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-10-27 15:38
To: 钟居哲; gcc-patches; palmer; kito.cheng; Jeff Law
CC: rdapp.gcc
Subject: Re: [PATCH] RISC-V: Add rawmemchr expander.
> I suggest adapting the code as follows:
> 
> unsigned int element_size = GET_MODE_SIZE (mode).to_constant ();
> poly_int64 nunits = exact_div (BYTES_PER_RISCV_VECTOR * TARGET_MAX_LMUL,
>                                element_size);
> if (!get_vector_mode (mode, nunits).exists ())
>   gcc_unreachable ();
 
Actually I was initially considering using lmul = m8 here,
unconditionally, but the param is probably the more intuitive choice.
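
The element-count arithmetic in the suggestion above can be sketched for
one fixed vector length.  This is an illustration only: model_nunits is a
hypothetical helper, and vector_bytes stands in for BYTES_PER_RISCV_VECTOR,
which in GCC is a poly_int rather than a compile-time constant.

```c
#include <stddef.h>

/* Number of elements in an LMUL-grouped vector at one fixed vector
   length.  Returns 0 when the division is not exact or ELEMENT_SIZE is
   zero.  */
size_t
model_nunits (size_t vector_bytes, unsigned lmul, size_t element_size)
{
  size_t total = vector_bytes * lmul;
  if (element_size == 0 || total % element_size != 0)
    return 0;
  return total / element_size;
}
```

For example, with 16-byte vector registers and lmul = 8, a byte search
covers 128 elements per iteration and a 4-byte search covers 32.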
 
Attached v2 with that included.  Also moved the riscv test to
autovec/builtin/ so we can add the other builtins as well.
 
> Also, this patch reminds me we are missing some more similar builtin
> functions which can use RVV:
> 
> strlen, strcpy, strcmp...etc
 
Yes we should still have them but I'd rather not work on that right
now.  How about I open a PR for it so we can still add them in stage 3?
Their impact is pretty localized and the risk should be low.
Kito, Palmer, Jeff - would that be acceptable?
 
Regards
Robin
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (rawmemchr): New expander.
* config/riscv/riscv-protos.h (enum insn_type): Define.
(expand_rawmemchr): New function.
* config/riscv/riscv-v.cc (expand_rawmemchr): Add vectorized
rawmemchr.
* internal-fn.cc (expand_RAWMEMCHR): Fix typo.
 
gcc/testsuite/ChangeLog:
 
* gcc.dg/tree-ssa/ldist-rawmemchr-1.c: Add riscv.
* gcc.dg/tree-ssa/ldist-rawmemchr-2.c: Ditto.
* gcc.target/riscv/rvv/rvv.exp: Add builtin directory.
* gcc.target/riscv/rvv/autovec/builtin/rawmemchr-1.c: New test.
---
gcc/config/riscv/autovec.md   | 13 +++
gcc/config/riscv/riscv-protos.h   |  1 +
gcc/config/riscv/riscv-v.cc   | 89 +
gcc/internal-fn.cc|  2 +-
.../gcc.dg/tree-ssa/ldist-rawmemchr-1.c   |  8 +-
.../gcc.dg/tree-ssa/ldist-rawmemchr-2.c   |  8 +-
.../riscv/rvv/autovec/builtin/rawmemchr-1.c   | 99 +++
gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|  2 +
8 files changed, 213 insertions(+), 9 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/rawmemchr-1.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 1ddc1993120..4f13494afdb 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2397,3 +2397,16 @@ (define_expand "lfloor2"
 DONE;
   }
)
+
+;; Implement rawmemchr[qi|si|hi].
+(define_expand "rawmemchr"
+  [(match_operand  0 "register_operand")
+   (match_operand  1 "memory_operand")
+   (match_operand:ANYI 2 "const_int_operand")]
+  "TARGET_VECTOR"
+  {
+riscv_vector::expand_rawmemchr(mode, operands[0], operands[1],
+operands[2]);
+DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 843a81b0e86..7f148ed95fe 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -526,6 +526,7 @@ void expand_cond_unop (unsigned, rtx *);
void expand_cond_binop (unsigned, rtx *);
void expand_cond_ternop (unsigned, rtx *);
void expand_popcount (rtx *);
+void expand_rawmemchr (machine_mode, rtx, rtx, rtx);
/* Rounding mode bitfield for fixed point VXRM.  */
enum fixed_point_rounding_mode
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 3fe8125801b..0f664553cf4 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -2215,6 +2215,95 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in)
   return true;
}
+/* Implement rawmemchr using vector instructions.
+   It can be assumed that the needle is in the haystack, otherwise the
+   behavior is undefined.  */
+
+void
+expand_rawmemchr (machine_mode mode, rtx dst, rtx src, rtx pat)
+{
+  /*
+rawmemchr:
+loop:
+   vsetvli a1, zero, e[8,16,32,64], m1, ta, ma
+   vle[8,16,32,64]ff.v v8, (a0)  # Load.
+   csrr a1, vl  # Get number of bytes read.
+   vmseq.vx v0, v8, pat  # v0 = (v8 == {pat, pat, ...})
+   vfirst.m a2, v0  # Find first hit.
+   add a0, a0, a1  # Bump pointer.
+   bltz a2, loop  # Not found?
+
+   sub a0, a0, a1  # Go back by a1.
+   shll a2, a2, [0,1,2,3]  # Shift to get byte offset.
+   add a0, a0, a2  # Add the offset.
+
+   ret
+  */
+  gcc_assert (TARGET_VECTOR);
+
+  unsigned int isize = GET_MODE_SIZE (mode).to_constant ();
+  int lmul = riscv_autovec_lmul == RVV_DYNAMIC ? RVV_M8 : riscv_autovec_lmul;
+  poly_int64 nunits = exact_div (BYTES_PER_RISCV_VECTOR * lmul, isize);
+
+  machine_mode vmode;
+  if (!get_vector_mode (GET_MODE_INNER (mode), nunits).exists ())
+gcc_unreachable ();
+
+  machine_mode mask_mode = get_mask_mode (vmode);
+
+  rtx cnt = gen_reg_rtx (Pmode);
+  rtx end = gen_reg_rtx (Pmode);
+  rtx vec = gen_reg_rtx (vmode);
+  rtx mask = gen_reg_rtx (mask_mode);
+
+  /* After finding the first vector

Re: [PATCH] Improve memcmpeq for 512-bit vector with vpcmpeq + kortest.




> On 27.10.2023 at 09:13, Hongtao Liu wrote:
> 
> On Fri, Oct 27, 2023 at 2:49 PM Richard Biener
>  wrote:
>> 
>> 
>> 
>>> On 27.10.2023 at 07:50, liuhongt wrote:
>>> 
>>> When 2 vectors are equal, kmask is allones and kortest will set CF,
>>> else CF will be cleared.
>>> 
>>> So CF bit can be used to check for the result of the comparison.
>>> 
>>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
>>> Ok for trunk?
>> 
>> Is that also profitable for 256bit aka AVX10?
> Yes, it's also available for both 128-bit and 256-bit with AVX10, from
> performance perspective it's better.
> AVX10:
>  vpcmp + kortest
> vs
> AVX2:
> vpxor + vptest
> 
> vptest is more expensive than vpcmp + kortest
> 
>> Is there a jump on carry in case the result feeds control flow rather than a 
>> value and is using ktest better then (does combine figure this out?)
> There are JC and JNC, there're many pattern matches for ptest which
> can't be automatically adjusted to kortest by combining, backend needs
> to manually transform them.
> That's why my patch only handles 64-bit vectors(to avoid regressing
> those pattern match stuff).

Ah, I see.  That’s exactly what I was wondering.

Richard 

> 
>> 
>>> Before:
>>>   vmovdqu (%rsi), %ymm0
>>>   vpxorq  (%rdi), %ymm0, %ymm0
>>>   vptest  %ymm0, %ymm0
>>>   jne .L2
>>>   vmovdqu 32(%rsi), %ymm0
>>>   vpxorq  32(%rdi), %ymm0, %ymm0
>>>   vptest  %ymm0, %ymm0
>>>   je  .L5
>>> .L2:
>>>   movl$1, %eax
>>>   xorl$1, %eax
>>>   vzeroupper
>>>   ret
>>> 
>>> After:
>>>   vmovdqu64   (%rsi), %zmm0
>>>   xorl%eax, %eax
>>>   vpcmpeqd(%rdi), %zmm0, %k0
>>>   kortestw%k0, %k0
>>>   setc%al
>>>   vzeroupper
>>>   ret
>>> 
>>> gcc/ChangeLog:
>>> 
>>>   PR target/104610
>>>   * config/i386/i386-expand.cc (ix86_expand_branch): Handle
>>>   512-bit vector with vpcmpeq + kortest.
>>>   * config/i386/i386.md (cbranchxi4): New expander.
>>>   * config/i386/sse.md: (cbranch4): Extend to V16SImode
>>>   and V8DImode.
>>> 
>>> gcc/testsuite/ChangeLog:
>>> 
>>>   * gcc.target/i386/pr104610-2.c: New test.
>>> ---
>>> gcc/config/i386/i386-expand.cc | 55 +++---
>>> gcc/config/i386/i386.md| 16 +++
>>> gcc/config/i386/sse.md | 36 +++---
>>> gcc/testsuite/gcc.target/i386/pr104610-2.c | 14 ++
>>> 4 files changed, 99 insertions(+), 22 deletions(-)
>>> create mode 100644 gcc/testsuite/gcc.target/i386/pr104610-2.c
>>> 
>>> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
>>> index 1eae9d7c78c..c664cb61e80 100644
>>> --- a/gcc/config/i386/i386-expand.cc
>>> +++ b/gcc/config/i386/i386-expand.cc
>>> @@ -2411,30 +2411,53 @@ ix86_expand_branch (enum rtx_code code, rtx op0, 
>>> rtx op1, rtx label)
>>>  rtx tmp;
>>> 
>>>  /* Handle special case - vector comparsion with boolean result, transform
>>> - it using ptest instruction.  */
>>> + it using ptest instruction or vpcmpeq + kortest.  */
>>>  if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
>>>  || (mode == TImode && !TARGET_64BIT)
>>> -  || mode == OImode)
>>> +  || mode == OImode
>>> +  || GET_MODE_SIZE (mode) == 64)
>>>{
>>> -  rtx flag = gen_rtx_REG (CCZmode, FLAGS_REG);
>>> -  machine_mode p_mode = GET_MODE_SIZE (mode) == 32 ? V4DImode : 
>>> V2DImode;
>>> +  unsigned msize = GET_MODE_SIZE (mode);
>>> +  machine_mode p_mode
>>> += msize == 64 ? V16SImode : msize == 32 ? V4DImode : V2DImode;
>>> +  /* kortest set CF when result is 0xFFFF (op0 == op1).  */
>>> +  rtx flag = gen_rtx_REG (msize == 64 ? CCCmode : CCZmode, FLAGS_REG);
>>> 
>>>  gcc_assert (code == EQ || code == NE);
>>> 
>>> -  if (GET_MODE_CLASS (mode) != MODE_VECTOR_INT)
>>> +  /* Using vpcmpeq zmm zmm k + kortest for 512-bit vectors.  */
>>> +  if (msize == 64)
>>>   {
>>> -  op0 = lowpart_subreg (p_mode, force_reg (mode, op0), mode);
>>> -  op1 = lowpart_subreg (p_mode, force_reg (mode, op1), mode);
>>> -  mode = p_mode;
>>> +  if (mode != V16SImode)
>>> +{
>>> +  op0 = lowpart_subreg (p_mode, force_reg (mode, op0), mode);
>>> +  op1 = lowpart_subreg (p_mode, force_reg (mode, op1), mode);
>>> +}
>>> +
>>> +  tmp = gen_reg_rtx (HImode);
>>> +  emit_insn (gen_avx512f_cmpv16si3 (tmp, op0, op1, GEN_INT (0)));
>>> +  emit_insn (gen_kortesthi_ccc (tmp, tmp));
>>> +}
>>> +  /* Using ptest for 128/256-bit vectors.  */
>>> +  else
>>> +{
>>> +  if (GET_MODE_CLASS (mode) != MODE_VECTOR_INT)
>>> +{
>>> +  op0 = lowpart_subreg (p_mode, force_reg (mode, op0), mode);
>>> +  op1 = lowpart_subreg (p_mode, force_reg (mode, op1), mode);
>>> +  mode = p_mode;
>>> +}
>>> +
>>> +  /* Generate XOR since we can't check that one operand is zero
>>> + vector.  */
>>> +  tmp =

Re: [PATCH] testsuite, aarch64: Normalise options to aarch64.exp.

2023-10-27 Thread Iain Sandoe

Hi Andrew,

> On 26 Oct 2023, at 20:00, Andrew Pinski  wrote:
> 
> On Thu, Oct 26, 2023 at 11:58 AM Iain Sandoe  wrote:
>> 
>> tested on cfarm185 (aarch64-linux-gnu, xgene1) and with the aarch64
>> Darwin prototype.  It is possible that some initial fallout could occur
>> on some test setups (where the default has been catered for in some
>> way) - but that should stabilize.  OK for trunk?
> 
> This fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93619 I think.

Actually, it does not - the aarch64-with-arch-dg-options () function explicitly
excludes overriding mcpu/march/tune options provided by the test-case so
that you still see:

Excess errors:
cc1: warning: switch '-mcpu=octeontx' conflicts with '-march=armv8.4-a' switch

Iain

Re: [PATCH] RISC-V: Fix wrong tune parameters on int_div





On 10/27/23 01:37, juzhe.zh...@rivai.ai wrote:

> LGTM from my side.
>
> The original integer division COST seems too low.

Almost certainly, though there may be good reasons why it was initially
set so low.  I'm generally hesitant to change things like that without
either someone with knowledge of the code/uarch stepping in with a
recommendation or some kind of analysis showing they're wrong.

> Hi, Jeff and Kito. Could you take a look at this patch?

It's on the list.

jeff

Re: [PATCH V2 5/7] aarch64: Implement system register r/w arm ACLE intrinsic functions

2023-10-27 Thread Victor Do Nascimento





On 10/27/23 14:18, Alex Coplan wrote:

On 26/10/2023 16:23, Richard Sandiford wrote:

Victor Do Nascimento  writes:

On 10/18/23 21:39, Richard Sandiford wrote:

Victor Do Nascimento  writes:

Implement the aarch64 intrinsics for reading and writing system
registers with the following signatures:

uint32_t __arm_rsr(const char *special_register);
uint64_t __arm_rsr64(const char *special_register);
void* __arm_rsrp(const char *special_register);
float __arm_rsrf(const char *special_register);
double __arm_rsrf64(const char *special_register);
void __arm_wsr(const char *special_register, uint32_t value);
void __arm_wsr64(const char *special_register, uint64_t value);
void __arm_wsrp(const char *special_register, const void *value);
void __arm_wsrf(const char *special_register, float value);
void __arm_wsrf64(const char *special_register, double value);

gcc/ChangeLog:

* gcc/config/aarch64/aarch64-builtins.cc (enum aarch64_builtins):
Add enums for new builtins.
(aarch64_init_rwsr_builtins): New.
(aarch64_general_init_builtins): Call aarch64_init_rwsr_builtins.
(aarch64_expand_rwsr_builtin):  New.
(aarch64_general_expand_builtin): Call aarch64_expand_rwsr_builtin.
* gcc/config/aarch64/aarch64.md (read_sysregdi): New insn_and_split.
(write_sysregdi): Likewise.
* gcc/config/aarch64/arm_acle.h (__arm_rsr): New.
(__arm_rsrp): Likewise.
(__arm_rsr64): Likewise.
(__arm_rsrf): Likewise.
(__arm_rsrf64): Likewise.
(__arm_wsr): Likewise.
(__arm_wsrp): Likewise.
(__arm_wsr64): Likewise.
(__arm_wsrf): Likewise.
(__arm_wsrf64): Likewise.

gcc/testsuite/ChangeLog:

* gcc/testsuite/gcc.target/aarch64/acle/rwsr.c: New.
* gcc/testsuite/gcc.target/aarch64/acle/rwsr-1.c: Likewise.
---
   gcc/config/aarch64/aarch64-builtins.cc| 200 ++
   gcc/config/aarch64/aarch64.md |  17 ++
   gcc/config/aarch64/arm_acle.h |  30 +++
   .../gcc.target/aarch64/acle/rwsr-1.c  |  20 ++
   gcc/testsuite/gcc.target/aarch64/acle/rwsr.c  | 144 +
   5 files changed, 411 insertions(+)
   create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-1.c
   create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 04f59fd9a54..d8bb2a989a5 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -808,6 +808,17 @@ enum aarch64_builtins
 AARCH64_RBIT,
 AARCH64_RBITL,
 AARCH64_RBITLL,
+  /* System register builtins.  */
+  AARCH64_RSR,
+  AARCH64_RSRP,
+  AARCH64_RSR64,
+  AARCH64_RSRF,
+  AARCH64_RSRF64,
+  AARCH64_WSR,
+  AARCH64_WSRP,
+  AARCH64_WSR64,
+  AARCH64_WSRF,
+  AARCH64_WSRF64,
 AARCH64_BUILTIN_MAX
   };
   
@@ -1798,6 +1809,65 @@ aarch64_init_rng_builtins (void)

   AARCH64_BUILTIN_RNG_RNDRRS);
   }
   
+/* Add builtins for reading system register.  */

+static void
+aarch64_init_rwsr_builtins (void)
+{
+  tree fntype = NULL;
+  tree const_char_ptr_type
+= build_pointer_type (build_type_variant (char_type_node, true, false));
+
+#define AARCH64_INIT_RWSR_BUILTINS_DECL(F, N, T) \
+  aarch64_builtin_decls[AARCH64_##F] \
+= aarch64_general_add_builtin ("__builtin_aarch64_"#N, T, AARCH64_##F);
+
+  fntype
+= build_function_type_list (uint32_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSR, rsr, fntype);
+
+  fntype
+= build_function_type_list (ptr_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRP, rsrp, fntype);
+
+  fntype
+= build_function_type_list (uint64_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSR64, rsr64, fntype);
+
+  fntype
+= build_function_type_list (float_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRF, rsrf, fntype);
+
+  fntype
+= build_function_type_list (double_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRF64, rsrf64, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   uint32_type_node, NULL);
+
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSR, wsr, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   const_ptr_type_node, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSRP, wsrp, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   uint64_type_node, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSR64, wsr64, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   float_type_node,

Re: [pushed] [RA]: Modify cost calculation for dealing with pseudo equivalences

2023-10-27 Thread Christophe Lyon

Hi Vladimir,

On Thu, 26 Oct 2023 at 16:00, Vladimir Makarov  wrote:
>
> This is the second attempt to improve RA cost calculation for pseudos
> with equivalences.  The patch explanation is in the log message.
>
> The patch was successfully bootstrapped and tested on x86-64, aarch64,
> and ppc64le.  The patch was also benchmarked on x86-64 spec2017.
> specfp2017 performance did not change, specint2017 improved by 0.3%.
>

As reported by our CI, this patch causes a regression on arm:
FAIL: gcc.target/arm/eliminate.c scan-assembler-times r0,[\\t ]*sp 3


For this testcase, we used to generate:
str lr, [sp, #-4]!
sub sp, sp, #12
add r0, sp, #4
bl  bar
add r0, sp, #4
bl  bar
add r0, sp, #4
bl  bar
add sp, sp, #12
ldr lr, [sp], #4
bx  lr

After your patch, we generate:
push{r4, lr}
sub sp, sp, #8
add r4, sp, #4
mov r0, r4
bl  bar
mov r0, r4
bl  bar
mov r0, r4
bl  bar
add sp, sp, #8
pop {r4, lr}
bx  lr

which uses 1 more register and 1 more instruction.

Shall I file a bugzilla report for this?

Thanks,

Christophe

[pushed] [RA]: Add cost calculation for reg equivalence invariants

2023-10-27 Thread Vladimir Makarov

The following patch fixes one aarch64 GCC test failure resulting from my 
previous patch dealing with reg equivalences.


The patch was successfully bootstrapped and tested on x86-64, aarch64, 
ppc64le.


commit 9b03e1d20c00dca215b787a5e959db473325b660
Author: Vladimir N. Makarov 
Date:   Fri Oct 27 08:28:24 2023 -0400

[RA]: Add cost calculation for reg equivalence invariants

My recent patch improving cost calculation for pseudos with equivalence
resulted in failure of gcc.target/arm/eliminate.c on aarch64.  This patch
fixes this failure.

gcc/ChangeLog:

* ira-costs.cc: (get_equiv_regno, calculate_equiv_gains):
Process reg equivalence invariants.

diff --git a/gcc/ira-costs.cc b/gcc/ira-costs.cc
index a59d45a6e24..c4086807076 100644
--- a/gcc/ira-costs.cc
+++ b/gcc/ira-costs.cc
@@ -1784,6 +1784,7 @@ get_equiv_regno (rtx x, int , rtx )
 }
   if (REG_P (x)
   && (ira_reg_equiv[REGNO (x)].memory != NULL
+	  || ira_reg_equiv[REGNO (x)].invariant != NULL
 	  || ira_reg_equiv[REGNO (x)].constant != NULL))
 {
   regno = REGNO (x);
@@ -1826,6 +1827,7 @@ calculate_equiv_gains (void)
   for (regno = max_reg_num () - 1; regno >= FIRST_PSEUDO_REGISTER; regno--)
 if (ira_reg_equiv[regno].init_insns != NULL
 	&& (ira_reg_equiv[regno].memory != NULL
+	|| ira_reg_equiv[regno].invariant != NULL
 	|| (ira_reg_equiv[regno].constant != NULL
 		/* Ignore complicated constants which probably will be placed
 		   in memory:  */
@@ -1876,6 +1878,8 @@ calculate_equiv_gains (void)
 
 	  if (subst == NULL)
 	subst = ira_reg_equiv[regno].constant;
+	  if (subst == NULL)
+	subst = ira_reg_equiv[regno].invariant;
 	  ira_assert (subst != NULL);
 	  mode = PSEUDO_REGNO_MODE (regno);
 	  ira_init_register_move_cost_if_necessary (mode);
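
The last hunk above extends a fallback chain: when choosing what to
substitute for a pseudo, the constant equivalence is tried first and the
invariant only if no constant exists.  A minimal sketch of that selection
order follows.  The struct and function names are hypothetical, plain
strings stand in for rtx values, and starting the chain from the memory
equivalence is an assumption based on the surrounding conditions, not
something the hunk itself shows.

```c
#include <stddef.h>

/* Hypothetical stand-in for one ira_reg_equiv entry; the real fields
   hold rtx values, strings are used here only for illustration.  */
struct reg_equiv
{
  const char *memory;
  const char *constant;
  const char *invariant;
};

/* Return the substitution for a pseudo: memory first (assumed), then
   constant, then invariant.  NULL means no equivalence is recorded.  */
const char *
pick_equiv_subst (const struct reg_equiv *e)
{
  const char *subst = e->memory;
  if (subst == NULL)
    subst = e->constant;
  if (subst == NULL)
    subst = e->invariant;
  return subst;
}
```

With this order, a pseudo that only has an invariant equivalence (for
example a frame address) no longer reaches the assertion with a NULL
substitution.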

Re: [pushed] [RA]: Modify cost calculation for dealing with pseudo equivalences

2023-10-27 Thread Christophe Lyon

On Fri, 27 Oct 2023 at 16:19, Vladimir Makarov  wrote:
>
>
> On 10/27/23 09:56, Christophe Lyon wrote:
> > Hi Vladimir,
> >
> > On Thu, 26 Oct 2023 at 16:00, Vladimir Makarov  
> > wrote:
> >> This is the second attempt to improve RA cost calculation for pseudos
> >> with equivalences.  The patch explanation is in the log message.
> >>
> >> The patch was successfully bootstrapped and tested on x86-64, aarch64,
> >> and ppc64le.  The patch was also benchmarked on x86-64 spec2017.
> >> specfp2017 performance did not change, specint2017 improved by 0.3%.
> >>
> > As reported by our CI, this patch causes a regression on arm:
> > FAIL: gcc.target/arm/eliminate.c scan-assembler-times r0,[\\t ]*sp 3
> >
> >
> > For this testcase, we used to generate:
> >  str lr, [sp, #-4]!
> >  sub sp, sp, #12
> >  add r0, sp, #4
> >  bl  bar
> >  add r0, sp, #4
> >  bl  bar
> >  add r0, sp, #4
> >  bl  bar
> >  add sp, sp, #12
> >  ldr lr, [sp], #4
> >  bx  lr
> >
> > After your patch, we generate:
> >  push{r4, lr}
> >  sub sp, sp, #8
> >  add r4, sp, #4
> >  mov r0, r4
> >  bl  bar
> >  mov r0, r4
> >  bl  bar
> >  mov r0, r4
> >  bl  bar
> >  add sp, sp, #8
> >  pop {r4, lr}
> >  bx  lr
> >
> > which uses 1 more register and 1 more instruction.
> >
> > Shall I file a bugzilla report for this?
> >
> I started to work on this right after I got the message (yesterday).  I
> already have a patch and am going to commit it within the hour.  So there
> is no need to file the PR.
>
Great, thanks for the quick fix!

[PATCH 2/2 v2] arm: move the switch tables for Arm to the RO data section

2023-10-27 Thread Richard Ball

v2: Formatting and nits fixed.

Follow up patch to arm: Use deltas for Arm switch tables
This patch moves the switch tables for Arm from the .text section
into the .rodata section.
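
For readers unfamiliar with switch tables, the kind of source that can
produce one is a dense switch over consecutive case values, as in the
sketch below.  The function name and cases are made up for illustration,
and whether a table is actually emitted (and in which section) depends
on the target, the optimization level and the options in use.

```c
/* A dense switch over consecutive values.  Compilers may lower this to
   an indexed jump table instead of a chain of compares; that table is
   what the patch moves out of the code section on Arm.  */
int
dispatch (int op, int a, int b)
{
  switch (op)
    {
    case 0: return a + b;
    case 1: return a - b;
    case 2: return a * b;
    case 3: return a & b;
    case 4: return a | b;
    case 5: return a ^ b;
    default: return 0;
    }
}
```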

gcc/ChangeLog:

* config/arm/aout.h: Change to use the Lrtx label.
* config/arm/arm.h (CASE_VECTOR_PC_RELATIVE): Remove arm targets
from (!target_pure_code) condition.
(ADDR_VEC_ALIGN): Add align for tables in rodata section.
* config/arm/arm.cc (arm_output_casesi): Alter the function to include
.Lrtx label and remove adr instructions.
* config/arm/arm.md
(arm_casesi_internal): Use force_reg to generate ldr instructions that
would otherwise be out of range, and change rtl to accommodate force 
reg.
Additionally remove unnecessary register temp.
(casesi): Remove pure code check for Arm.
* config/arm/elf.h (JUMP_TABLES_IN_TEXT_SECTION): Remove arm
targets from JUMP_TABLES_IN_TEXT_SECTION definition.

gcc/testsuite/ChangeLog:

* gcc.target/arm/arm-switchstatement.c: Alter the tests to
change adr instruction to ldr.

diff --git a/gcc/config/arm/aout.h b/gcc/config/arm/aout.h
index 6a4c8da5f6d5a1695518f42830b9d045888eeed6..49896bb962081a5ee4b5328029813c681c489a9e 100644
--- a/gcc/config/arm/aout.h
+++ b/gcc/config/arm/aout.h
@@ -187,16 +187,16 @@
  switch (GET_MODE (body))  \
{   \
case E_QImode:  \
- asm_fprintf (STREAM, "\t.byte\t(%LL%d-%LL%d-4)/4\n",  \
+ asm_fprintf (STREAM, "\t.byte\t(%LL%d-%LLrtx%d-4)/4\n",   \
   VALUE, REL); \
  break;\
case E_HImode:  \
- asm_fprintf (STREAM, "\t.2byte\t(%LL%d-%LL%d-4)/4\n", \
+ asm_fprintf (STREAM, "\t.2byte\t(%LL%d-%LLrtx%d-4)/4\n",  \
   VALUE, REL); \
  break;\
case E_SImode:  \
  if (flag_pic) \
-   asm_fprintf (STREAM, "\t.word\t%LL%d-%LL%d-4\n",\
+   asm_fprintf (STREAM, "\t.word\t%LL%d-%LLrtx%d-4\n", \
 VALUE, REL);   \
  else  \
asm_fprintf (STREAM, "\t.word\t%LL%d\n", VALUE);\
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 3063e3489094f04ecf03a52952c185d4a75da645..a9c2752c0ea5ecd4597ded254e9426753ac0a098 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -2092,10 +2092,11 @@ enum arm_auto_incmodes
for the index in the tablejump instruction.  */
 #define CASE_VECTOR_MODE Pmode
 
-#define CASE_VECTOR_PC_RELATIVE ((TARGET_ARM || TARGET_THUMB2  \
- || (TARGET_THUMB1 \
- && (optimize_size || flag_pic)))  \
-&& (!target_pure_code))
+#define CASE_VECTOR_PC_RELATIVE \
+   (TARGET_ARM \
+    || (!target_pure_code \
+        && (TARGET_THUMB2 \
+            || (TARGET_THUMB1 && (optimize_size || flag_pic)))))
 
 
 #define CASE_VECTOR_SHORTEN_MODE(min, max, body)   \
@@ -2301,8 +2302,14 @@ extern int making_const_table;
asm_fprintf (STREAM, "\tpop {%r}\n", REGNO);\
 } while (0)
 
-#define ADDR_VEC_ALIGN(JUMPTABLE)  \
-  ((TARGET_THUMB && GET_MODE (PATTERN (JUMPTABLE)) == SImode) ? 2 : 0)
+/* If the switch table is in the code segment, additional alignment is
+   needed for Thumb SImode tables.  Otherwise, tables in RO data have
+   natural alignment.  */
+#define ADDR_VEC_ALIGN(TABLE)  \
+  (JUMP_TABLES_IN_TEXT_SECTION \
+   ? ((TARGET_THUMB && GET_MODE (PATTERN (TABLE)) == SImode) ? 2 : 0)  \
+   : (exact_log2 (GET_MODE_ALIGNMENT (GET_MODE (PATTERN (TABLE)))  \
+ / BITS_PER_UNIT)))
 
 /* Alignment for case labels comes from ADDR_VEC_ALIGN; avoid the
default alignment from elfos.h.  */
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 4e5e6997ed555372683e01b2aff5c25265f4e50c..620ef7bfb2f3af9b8de576359a6157190c439aad 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -30469,44 +30469,55 @@ arm_output_iwmmxt_tinsr (rtx *operands)
 const char *

Re: [PATCH] C99 testsuite readiness: Some unverified test case reductions





On 10/20/23 13:05, Florian Weimer wrote:

gcc/testsuite/

* gcc.c-torture/compile/2412-2.c (f): Call
__builtin_strlen instead of strlen.
* gcc.c-torture/compile/2427-1.c (FindNearestPowerOf2):
Declare.
* gcc.c-torture/compile/2802-1.c (bar): Call
__builtin_memcpy instead of memcpy.
* gcc.c-torture/compile/20010525-1.c (kind_varread): Likewise.
* gcc.c-torture/compile/20010706-1.c (foo): Add missing int
return type.
* gcc.c-torture/compile/20020314-1.c (add_output_space_event)
(del_tux_atom, add_req_to_workqueue): Declare.
* gcc.c-torture/compile/20020701-1.c (f): Call
__builtin_memcpy instead of memcpy.
* gcc.c-torture/compile/20021015-2.c (f): Call __builtin_bcmp
instead of bcmp.
* gcc.c-torture/compile/20030110-1.c (inb): Declare.
* gcc.c-torture/compile/20030314-1.c (bar): Add missing
void return type.
* gcc.c-torture/compile/20030405-1.c (bar): Add missing int
return type.
* gcc.c-torture/compile/20030416-1.c (bar): Declare.
(main): Add missing int return type.
* gcc.c-torture/compile/20030503-1.c (bar): Declare.
* gcc.c-torture/compile/20030530-1.c: (bar): Declare.
* gcc.c-torture/compile/20031031-2.c (foo, bar, baz): Declare.
* gcc.c-torture/compile/20040101-1.c (test16): Call
__builtin_printf instead of printf.
* gcc.c-torture/compile/20040124-1.c (f2, f3): Declare.
* gcc.c-torture/compile/20040304-1.c (macarg): Declare.
* gcc.c-torture/compile/20040705-1.c (f): Call
__builtin_memcpy instead of memcpy.
* gcc.c-torture/compile/20040908-1.c (bar): Declare.
* gcc.c-torture/compile/20050510-1.c (dont_remove): Declare.
* gcc.c-torture/compile/20051228-1.c (bar): Declare.
* gcc.c-torture/compile/20060109-1.c (cpp_interpret_string):
Declare.
(int_c_lex, cb_ident): Add missing void return type.
(cb_ident): Define as static.
* gcc.c-torture/compile/20060202-1.c (sarray_get): Declare.
* gcc.c-torture/compile/20070129.c (regcurly)
(reguni): Declare.
* gcc.c-torture/compile/20070529-1.c (__fswab16): Declare.
* gcc.c-torture/compile/20070529-2.c (kmem_free): Declare.
* gcc.c-torture/compile/20070605-1.c (quantize_fs_dither):
Add missing void return type.
* gcc.c-torture/compile/20071107-1.c
(settings_install_property_parser): Declare.
* gcc.c-torture/compile/20090907-1.c (load_waveform): Call
__builtin_abort instead of abort.
* gcc.c-torture/compile/20100907.c (t): Add missing void
types.
* gcc.c-torture/compile/20120524-1.c (build_packet): Call
__builtin_memcpy instead of memcpy.
* gcc.c-torture/compile/20120830-2.c
(ubidi_writeReordered_49): Add missing void return type.
* gcc.c-torture/compile/20121010-1.c (read_long): Add missing
int return type.
* gcc.c-torture/compile/920301-1.c (f, g): Add missing void
types.
* gcc.c-torture/compile/920409-1.c (x): Likewise.
* gcc.c-torture/compile/920410-1.c (main): Add missing int
return type.  Call __builtin_printf instead of printf.
* gcc.c-torture/compile/920410-2.c (joe): Add missing void
types.
* gcc.c-torture/compile/920411-2.c (x): Likewise.
* gcc.c-torture/compile/920413-1.c (f): Add missing int return
type.
* gcc.c-torture/compile/920428-3.c (x): Add missing int types.
* gcc.c-torture/compile/920428-4.c (x): Add missing void
return type and int parameter type.
* gcc.c-torture/compile/920501-10.c (x): Add missing int
types.
* gcc.c-torture/compile/920501-12.c (x, a, b, A, B): Likewise.
* gcc.c-torture/compile/920501-17.c (x): Add missing void
types.
* gcc.c-torture/compile/920501-19.c (y): Likewise.
* gcc.c-torture/compile/920501-22.c (x): Likewise.
* gcc.c-torture/compile/920501-3.c (x): Likewise.
* gcc.c-torture/compile/920501-4.c (foo): Likewise.
* gcc.c-torture/compile/920529-1.c (f): Call __builtin_abort
instead of abort.
* gcc.c-torture/compile/920615-1.c (f): Add missing void
types.
* gcc.c-torture/compile/920623-1.c (g): Likewise.
* gcc.c-torture/compile/920624-1.c (f): Likewise.
* gcc.c-torture/compile/920711-1.c (f): Add missing int types.
* gcc.c-torture/compile/920729-1.c (f): Add missing void
types.
* gcc.c-torture/compile/920806-1.c (f): Likewise.
* gcc.c-torture/compile/920821-2.c (f): Likewise.
* gcc.c-torture/compile/920825-1.c (f): Likewise.
* gcc.c-torture/compile/920825-2.c (f, g): Add missing void
return type.
* gcc.c-torture/compile/920826-1.c (f): Likewise.

Re: [PATCH] wwwdocs: gcc-14: mark amdgcn fiji deprecated

2023-10-27 Thread Andrew Stubbs


On 22/10/2023 13:24, Gerald Pfeifer wrote:

> Hi Andrew,
>
> On Fri, 20 Oct 2023, Andrew Stubbs wrote:
>>> Additionally, I wonder whether "Fiji" should be changed to "Fiji
>>> (gfx803)" in the first line and whether the "," should be removed in
>>> "The ... configuration ... , and no longer includes".
>>
>> Fair enough, how's this version? (I like the comma, even if it is optional.)
>
> it's definitely fine. I do have a recommendation and a question, though
> feel free to go about them as you prefer.
>
>> +  The default device architecture is now gfx900 (Vega).
>
> How about starting with this as the first sub-item, as a "positive",
> then follow with the deprecation?
>
>> +
>> +  
>> +The Fiji (gfx803) device support is now deprecated and will be removed from
>
> Could this be "Fiji (gfx803) device support" without the article?


Thank you for your suggestions; I have committed the attached.

Andrew

gcc-14: mark amdgcn fiji deprecated


diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index c817dde4..a20499e9 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -178,6 +178,21 @@ a work-in-progress.
 
 
 
+AMD Radeon (GCN)
+
+
+  The default device architecture is now gfx900 (Vega).
+  
+Fiji (gfx803) device support is now deprecated and will be removed from
+a future release.  The default compiler configuration no longer uses Fiji
+as the default device, and no longer includes the Fiji libraries.  Both can
+be restored by configuring with
+https://gcc.gnu.org/install/specific.html#amdgcn-x-amdhsa;>
+  --with-arch=fiji
+.
+  
+
+

Re: [PATCH] Improve tree_expr_nonnegative_p by using the ranger [PR111959]

On Thu, Oct 26, 2023 at 8:30 PM Andrew Pinski  wrote:
>
> On Thu, Oct 26, 2023 at 2:29 AM Richard Biener
>  wrote:
> >
> > On Wed, Oct 25, 2023 at 5:51 AM Andrew Pinski  wrote:
> > >
> > > I noticed we were missing optimizing `a / (1 << b)` when
> > > we know that a is nonnegative but only due to ranger information.
> > > This adds the use of the global ranger to tree_single_nonnegative_warnv_p
> > > for SSA_NAME.
> > > I didn't extend tree_single_nonnegative_warnv_p to use the ranger for 
> > > floating
> > > point nor to use the local ranger since I am not 100% sure it is safe 
> > > where
> > > all of the uses tree_expr_nonnegative_p would be safe.
> > >
> > > Note pr80776-1.c testcase fails again due to vrp's bad handling of setting
> > > global ranges from __builtin_unreachable. It just happened to be optimized
> > > before due to global ranges not being used as much.
> > >
> > > Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> > >
> > > PR tree-optimization/111959
> > >
> > > gcc/ChangeLog:
> > >
> > > * fold-const.cc (tree_single_nonnegative_warnv_p): Use
> > > the global range to see if the SSA_NAME was nonnegative.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.dg/tree-ssa/forwprop-42.c: New test.
> > > * gcc.dg/pr80776-1.c: xfail and update comment.
> > > ---
> > >  gcc/fold-const.cc   | 36 +++--
> > >  gcc/testsuite/gcc.dg/pr80776-1.c|  8 ++---
> > >  gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c | 15 +
> > >  3 files changed, 46 insertions(+), 13 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c
> > >
> > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > > index 40767736389..2a2a90230f5 100644
> > > --- a/gcc/fold-const.cc
> > > +++ b/gcc/fold-const.cc
> > > @@ -15047,15 +15047,33 @@ tree_single_nonnegative_warnv_p (tree t, bool 
> > > *strict_overflow_p, int depth)
> > >return RECURSE (TREE_OPERAND (t, 1)) && RECURSE (TREE_OPERAND (t, 
> > > 2));
> > >
> > >  case SSA_NAME:
> > > -  /* Limit the depth of recursion to avoid quadratic behavior.
> > > -This is expected to catch almost all occurrences in practice.
> > > -If this code misses important cases that unbounded recursion
> > > -would not, passes that need this information could be revised
> > > -to provide it through dataflow propagation.  */
> > > -  return (!name_registered_for_update_p (t)
> > > - && depth < param_max_ssa_name_query_depth
> > > - && gimple_stmt_nonnegative_warnv_p (SSA_NAME_DEF_STMT (t),
> > > - strict_overflow_p, 
> > > depth));
> > > +  {
> > > +   /* For integral types, querry the global range if possible. */
> >
> > query
> >
> > > +   if (INTEGRAL_TYPE_P (TREE_TYPE (t)))
> > > + {
> > > +   value_range vr;
> > > +   if (get_global_range_query ()->range_of_expr (vr, t)
> > > +   && !vr.varying_p () && !vr.undefined_p ())
> > > + {
> > > +   /* If the range is nonnegative, return true. */
> > > +   if (vr.nonnegative_p ())
> > > + return true;
> > > +
> > > +   /* If the range is non-positive, then return false. */
> > > +   if (vr.nonpositive_p ())
> > > + return false;
> >
> > That's testing for <= 0, nonnegative for >= 0.  This means when
> > vr.nonpositive_p () the value could still be zero (and nonnegative),
> > possibly be figured out by the recursion below.
> >
> > Since we don't have negative_p () do we want to test
> > nonpositive_p () && nonzero_p () instead?
>
> I was thinking about that when I was writing the patch.
> If the ranger figured out the value was zero, nonnegative_p would have
> returned true.
> So while yes nonpositive_p() would return true but we already checked
> nonnegative_p beforehand and the nonzero_p would not matter.
> Now the question is if after nonnegative_p we check if the range could
> contain 0 still is that worth the recursion.

Yes, that was the point.

> The whole idea of
> returning false was to remove the need from recursion as much.

Well, specifically the exact zero case is something the code is reasonably
good at.  If we want to remove recursion as much as possible we'd
remove it completely.

Maybe you can do some statistics how many hits the recursion yields
after the vr.nonnegative_p () out (but without the nonpositive_p one)?
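
The point about nonpositive_p () can be made concrete with a small
sketch.  It uses a hypothetical closed range type rather than GCC's
value_range, and a three-way result so that "cannot tell from the range"
stays distinct from "known negative".  Under these assumptions, a range
such as [-5, 0] is nonpositive but still contains zero, so only a range
that also excludes zero justifies answering false without recursion.

```c
#include <stdbool.h>

/* Hypothetical closed integer range [lo, hi]; not GCC's value_range.  */
struct int_range
{
  long lo;
  long hi;
};

static inline bool
range_nonnegative (struct int_range r)
{
  return r.lo >= 0;
}

static inline bool
range_nonpositive (struct int_range r)
{
  return r.hi <= 0;
}

/* Return 1 if every value is >= 0, 0 if every value is < 0, and -1 if
   the range alone cannot decide (the caller would fall back to the
   recursive analysis).  */
int
classify_nonnegative (struct int_range r)
{
  if (range_nonnegative (r))
    return 1;
  /* Nonpositive and excluding zero means strictly negative.  */
  if (range_nonpositive (r) && r.hi != 0)
    return 0;
  return -1;
}
```

Here [0, 0] classifies as nonnegative, [-5, -1] as negative, and both
[-5, 0] and [-3, 7] are left undecided.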

Richard.

> Thanks,
> Andrew
>
>
> >
> > OK with that change.
> >
> > Richard.
> >
> > > + }
> > > + }
> > > +   /* Limit the depth of recursion to avoid quadratic behavior.
> > > +  This is expected to catch almost all occurrences in practice.
> > > +  If this code misses important cases that unbounded recursion
> > > +  would not, passes that need this information could be

Re: [PATCH v2 2/4] libgrust: Add libproc_macro and build system

Hi!

Short Friday afternoon status update:

On 2023-10-27T08:51:12+0100, Iain Sandoe  wrote:
>> On 26 Oct 2023, at 09:21, Thomas Schwinge  wrote:
>> First, I've pushed into GCC upstream Git branch devel/rust/libgrust-v2
>> the "v2" libgrust changes as posted by Arthur, so that people can easily
>> test this before it getting into Git master branch.
>>
>> I'll myself later try this for GCN and nvptx targets -- in their current
>> form where they don't support C++ (standard library)

Indeed, this currently fails to build:

[...]
make[3]: Entering directory 
`[...]/build-gcc/amdgcn-amdhsa/libgrust/libproc_macro'
[...]
libtool: compile:  [...]/build-gcc/./gcc/xg++ -B[...]/build-gcc/./gcc/ 
-nostdinc++ -funconfigured-libstdc++-v3 [...] -c 
[...]/source-gcc/libgrust/libproc_macro/proc_macro.cc
xg++: error: unrecognized command-line option ‘-funconfigured-libstdc++-v3’
make[3]: *** [proc_macro.lo] Error 1
make[3]: Leaving directory 
`[...]/build-gcc/amdgcn-amdhsa/libgrust/libproc_macro'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `[...]/build-gcc/amdgcn-amdhsa/libgrust'
make[1]: *** [all-target-libgrust] Error 2
make[1]: Leaving directory `[...]/build-gcc'
make: *** [all] Error 2

("error: unrecognized command-line option ‘-funconfigured-libstdc++-v3’"
indeed is the expected outcome if libstdc++ is not available, as I
understand.)

Same for nvptx-none target.

We need two things: (a) make sure that target libgrust build depends on
target libstdc++, and (b) disable target libgrust if target libstdc++ is
not available (and, later, gracefully handle that situation in the Rust
front end).

As far as I remember, patches exist for (a), and Arthur is going to
integrate/re-submit those.  Arthur, before re-submission, feel free to
first cherry-pick and push these into the GCC upstream Git branch
devel/rust/libgrust-v2, so that I can re-test.  I'm not sure about (b),
whether that fell out of the (a) changes, too?  I can otherwise look into
that later.

>> and in my hacky WIP
>> trees where C++ (standard library) is supported to some extent.

This does build -- but the default multilib only, as Iain already
reported:

>> (This
>> should, roughly, match C++ functionality (not) provided by a number of
>> other GCC "embedded" targets.)
>
> on Darwin, it works for later systems without multilibs, but fails to build 
> multilibs.

I see that, too.

> —— so….
>
> With the patch below bootstrap succeeded on x86_64-darwin17 and produced a
> correct architecture multilib.

Confirmed, thanks!

> Of course, there is no way to test this at the moment - I’d suggest
> that the next step might be something small in functionality that can allow 
> at least one
> test to be wired up.

ACK.

> ^^^ this is “lightly tested” of course, as I cycle through other versions of 
> the OS will see
> how it pans out.
>
> Do you want me to make a PR with this change against upstream?

Yes, please.  (But no hurry.)


Grüße
 Thomas


> From 027bc2c5255a6f1b75592e896dd99fac55bfb9b8 Mon Sep 17 00:00:00 2001
> From: Iain Sandoe 
> Date: Thu, 26 Oct 2023 23:19:36 +0100
> Subject: [PATCH] libgrust: enable multilib
>
> Most of this change is the regenerated files, the multilib config macro
> was already present, but commented out.
>
> libgrust/ChangeLog:
>
>   * Makefile.in:
>   * aclocal.m4: Regenerate.
>   * configure: Regenerate.
>   * configure.ac: Uncomment AM_ENABLE_MULTILIB.
>   * libproc_macro/Makefile.in:
>
> Signed-off-by: Iain Sandoe 
> ---
>  libgrust/Makefile.in   |  2 +
>  libgrust/aclocal.m4|  1 +
>  libgrust/configure | 68 --
>  libgrust/configure.ac  |  2 +-
>  libgrust/libproc_macro/Makefile.in |  2 +
>  5 files changed, 71 insertions(+), 4 deletions(-)
>
> diff --git a/libgrust/Makefile.in b/libgrust/Makefile.in
> index bc9b6cc227a..2dc39adff24 100644
> --- a/libgrust/Makefile.in
> +++ b/libgrust/Makefile.in
> @@ -93,6 +93,7 @@ ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
>  am__aclocal_m4_deps = $(top_srcdir)/../config/acx.m4 \
>   $(top_srcdir)/../config/depstand.m4 \
>   $(top_srcdir)/../config/lead-dot.m4 \
> + $(top_srcdir)/../config/multi.m4 \
>   $(top_srcdir)/../config/no-executables.m4 \
>   $(top_srcdir)/../config/override.m4 \
>   $(top_srcdir)/../config/toolexeclibdir.m4 \
> @@ -275,6 +276,7 @@ localedir = @localedir@
>  localstatedir = @localstatedir@
>  mandir = @mandir@
>  mkdir_p = @mkdir_p@
> +multi_basedir = @multi_basedir@
>  oldincludedir = @oldincludedir@
>  pdfdir = @pdfdir@
>  prefix = @prefix@
> diff --git a/libgrust/aclocal.m4 b/libgrust/aclocal.m4
> index 1bd42c34d74..5d808f05afa 100644
> --- a/libgrust/aclocal.m4
> +++ b/libgrust/aclocal.m4
> @@ -1250,6 +1250,7 @@ AC_SUBST([am__untar])
>  m4_include([../config/acx.m4])
>  m4_include([../config/depstand.m4])
>  m4_include([../config/lead-dot.m4])
>

Re: [PATCH, OpenACC 2.7] Connect readonly modifier to points-to analysis

Hi!

Richard, as the original author of 'SSA_NAME_POINTS_TO_READONLY_MEMORY':
2018 commit 6214d5c7e7470bdd5ecbeae668c2522551bfebbc (Subversion r263958)
"Move const_parm trick to generic code"; 'gcc/tree.h':

/* Nonzero if this SSA_NAME is known to point to memory that may not
   be written to.  This is set for default defs of function parameters
   that have a corresponding r or R specification in the functions
   fn spec attribute.  This is used by alias analysis.  */
#define SSA_NAME_POINTS_TO_READONLY_MEMORY(NODE) \
SSA_NAME_CHECK (NODE)->base.deprecated_flag

..., may I ask you to please help review the following patch
(full-quoted)?

For context: this patch here ("second patch") depends on a first patch:

"[PATCH, OpenACC 2.7] readonly modifier support in front-ends".  That one
is still under review/rework; so you're not able to apply this second
patch here.

In a nutshell: a 'readonly' modifier has been added to the OpenACC
'copyin' clause (copy host to device memory, don't copy back at end of
region):

| If the optional 'readonly' modifier appears, then the implementation may 
assume that the data
| referenced by _var-list_ is never written to within the applicable region.

That is, for example (untested):

#pragma acc routine
void escape(int *);

int x[32] = [...];
#pragma acc parallel copyin(readonly: x)
{
  int a1 = x[3];
  escape(x);
  int a2 = x[3]; // Per 'readonly', don't need to reload 'x[3]' here.
  //x[22] = 0; // Invalid -- but no diagnostic mandated.
}

What Chung-Lin's first patch does is mark the OMP clause for 'x' (not the
'x' decl itself!) as 'readonly', via a new 'OMP_CLAUSE_MAP_READONLY'
flag.
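
To make the intended effect concrete, here is a hand-written sketch in plain C (not OpenACC, and not what the patch itself emits).  It only illustrates the transformation that a 'readonly' guarantee would permit: the second load of 'x[3]' can reuse the first.  The names 'sum_with_reload', 'sum_without_reload' and the counting 'escape' stub are made up for illustration.

```c
/* Hypothetical stand-in for the 'escape' routine: it receives the
   pointer but, as 'readonly' promises, never writes through it.  */
static int escape_calls;
static void escape(const int *p) { (void)p; escape_calls++; }

/* What the source says: load, call, load again.  Without extra
   knowledge the compiler must assume 'escape' may have changed x[3].  */
static int sum_with_reload(int *x)
{
    int a1 = x[3];
    escape(x);
    int a2 = x[3];      /* reloaded after the call */
    return a1 + a2;
}

/* What a 'readonly' guarantee would allow: reuse the first load.  */
static int sum_without_reload(int *x)
{
    int a1 = x[3];
    escape(x);
    int a2 = a1;        /* no second load */
    return a1 + a2;
}
```

Both functions return the same value as long as 'escape' really does not write to the array, which is exactly the assumption the modifier lets the implementation make.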

The actual optimization then is done in this second patch.  Chung-Lin
found that he could use 'SSA_NAME_POINTS_TO_READONLY_MEMORY' for that.
I don't have much experience with most of the following generic code, so
would appreciate a helping hand, whether that conceptually makes sense as
well as from the implementation point of view:

On 2023-07-25T23:52:06+0800, Chung-Lin Tang via Gcc-patches wrote:
> On 2023/7/11 2:33 AM, Chung-Lin Tang via Gcc-patches wrote:
>> As we discussed earlier, the work for actually linking this to middle-end
>> points-to analysis is a somewhat non-trivial issue. This first patch allows
>> the language feature to be used in OpenACC directives first (with no effect 
>> for now).
>> The middle-end changes are probably going to be a later patch.
>
> This second patch tries to link the readonly modifier to points-to analysis.
>
> There already exists SSA_NAME_POINTS_TO_READONLY_MEMORY and its support in the
> alias oracle routines in tree-ssa-alias.cc, so basically what this patch does
> is try to make the variables holding the array section base pointers have this
> flag set.
>
> There is another OMP_CLAUSE_MAP_POINTS_TO_READONLY flag set by front-ends on the
> associated pointer clauses if OMP_CLAUSE_MAP_READONLY is set.
> Also a DECL_POINTS_TO_READONLY flag is set for VAR_DECLs when creating the tmp
> vars carrying these receiver references on the offloaded side. These
> eventually get translated to SSA_NAME_POINTS_TO_READONLY_MEMORY.

> This still doesn't always work as expected in terms of optimization:
> struct pointer fields and Fortran arrays (kind of like C structs) which have
> several accesses to create the pointer access on the receive/offloaded side,
> and SRA appears to not work on these sequences, so gets in the way of much
> redundancy elimination.

Do I understand correctly that this is left as future work?  Please add the test
cases you have, XFAILed in some reasonable way.

> Currently have one testcase where we can demonstrate 'readonly' can avoid
> a clobber by function call.

:-)

> --- a/gcc/c/c-typeck.cc
> +++ b/gcc/c/c-typeck.cc
> @@ -14258,6 +14258,8 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort)
>   OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_ATTACH_DETACH);
>else
>   OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_FIRSTPRIVATE_POINTER);
> +  if (OMP_CLAUSE_MAP_READONLY (c))
> + OMP_CLAUSE_MAP_POINTS_TO_READONLY (c2) = 1;
>OMP_CLAUSE_MAP_IMPLICIT (c2) = OMP_CLAUSE_MAP_IMPLICIT (c);
>if (OMP_CLAUSE_MAP_KIND (c2) != GOMP_MAP_FIRSTPRIVATE_POINTER
> && !c_mark_addressable (t))

> --- a/gcc/cp/semantics.cc
> +++ b/gcc/cp/semantics.cc
> @@ -5872,6 +5872,8 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort)
>   }
> else
>   OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_FIRSTPRIVATE_POINTER);
> +   if (OMP_CLAUSE_MAP_READONLY (c))
> + OMP_CLAUSE_MAP_POINTS_TO_READONLY (c2) = 1;
> OMP_CLAUSE_MAP_IMPLICIT (c2) = OMP_CLAUSE_MAP_IMPLICIT (c);
> if (OMP_CLAUSE_MAP_KIND (c2) != GOMP_MAP_FIRSTPRIVATE_POINTER
> && !cxx_mark_addressable (t))

> --- a/gcc/fortran/trans-openmp.cc
> +++ b/gcc/fortran/trans-openmp.cc
> @@ -2524,6 +2524,8 @@

[PATCH] testsuite: Force use of -c when precompiling headers

2023-10-27 Thread Christophe Lyon

In some configurations of our validation setup, we always call the
compiler with -Wl,-rpath=XXX, which instructs the driver to invoke the
linker if none of -c, -S or -E is used.

This happens to be the case in the PCH tests, where dg-flags-pch sets
dg-do-what-default to precompile.

This works most of the time: in the absence of any linker option, the
compiler defaults to generating a precompiled header (with such an
option, the linker is invoked and complains because it cannot find 'main').

This small patch forces the use of '-c' when generating the .gch file,
which is sufficient not to invoke the linker.

Arguably, this could be seen as a dejagnu bug: in gcc-dg-test-1 (in
gcc-dg.exp), we set compile_type to "precompiled_header", which is not
one of the supported values in dejagnu's default_target_compile (in
target.exp).

2023-10-27  Christophe Lyon  

gcc/testsuite/
* lib/dg-pch.exp (dg-flags-pch): Add -c when generating the
precompiled header.
---
 gcc/testsuite/lib/dg-pch.exp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/dg-pch.exp b/gcc/testsuite/lib/dg-pch.exp
index b6fefaa0286..ae8ce3bf1e1 100644
--- a/gcc/testsuite/lib/dg-pch.exp
+++ b/gcc/testsuite/lib/dg-pch.exp
@@ -95,7 +95,7 @@ proc dg-flags-pch { subdir test otherflags options suffix } {
set dg-do-what-default precompile
catch { file_on_host delete "$bname$suffix" }
gcc_copy_files "[file rootname $test]${suffix}s" "$bname$suffix"
-   dg-test -keep-output "./$bname$suffix" "$otherflags $flags" ""
+   dg-test -keep-output "./$bname$suffix" "$otherflags $flags -c" ""
 
# For the rest, the default is to compile to .s.
set dg-do-what-default compile
-- 
2.34.1

Re: [PATCH] recog: Fix propagation into ASM_OPERANDS

2023-10-27 Thread Richard Sandiford

Jeff Law  writes:
> On 10/24/23 04:15, Richard Sandiford wrote:
>> An inline asm with multiple output operands is represented as a
>> parallel set in which the SET_SRCs are the same (shared) ASM_OPERANDS.
>> insn_propagation didn't account for this, and instead propagated
>> into each ASM_OPERANDS individually.  This meant that it could
>> apply a substitution X->Y to Y itself, which (a) could create
>> circularity and (b) would be semantically wrong in any case,
>> since Y might use a different value of X.
>> 
>> This patch checks explicitly for parallels involving ASM_OPERANDS,
>> just like combine does.
>> 
>> Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?
>> 
>> Richard
>> 
>> 
>> gcc/
>>  * recog.cc (insn_propagation::apply_to_pattern_1): Handle shared
>>  ASM_OPERANDS.
> As the combine comment says "Ug".

Aye :)  Thanks for the reviews.

Richard

>  OK for the trunk.
>
> jeff

Re: [PATCH] RISC-V: Fix wrong tune parameters on int_div





On 10/27/23 01:49, Robin Dapp wrote:

@@ -346,7 +346,7 @@ static const struct riscv_tune_param rocket_tune_info = {
{COSTS_N_INSNS (4), COSTS_N_INSNS (5)}, /* fp_mul */
{COSTS_N_INSNS (20), COSTS_N_INSNS (20)},   /* fp_div */
{COSTS_N_INSNS (4), COSTS_N_INSNS (4)}, /* int_mul */
-  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)},  /* int_div */
+  {COSTS_N_INSNS (33), COSTS_N_INSNS (65)},/* int_div */
1,  /* issue_rate */
3,  /* branch_cost */
5,  /* memory_cost */
@@ -361,7 +361,7 @@ static const struct riscv_tune_param sifive_7_tune_info = {
{COSTS_N_INSNS (4), COSTS_N_INSNS (5)}, /* fp_mul */
{COSTS_N_INSNS (20), COSTS_N_INSNS (20)},   /* fp_div */
{COSTS_N_INSNS (4), COSTS_N_INSNS (4)}, /* int_mul */
-  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)},  /* int_div */
+  {COSTS_N_INSNS (33), COSTS_N_INSNS (65)},/* int_div */
2,  /* issue_rate */
4,  /* branch_cost */
3,  /* memory_cost */
@@ -376,7 +376,7 @@ static const struct riscv_tune_param thead_c906_tune_info = {
{COSTS_N_INSNS (4), COSTS_N_INSNS (5)}, /* fp_mul */
{COSTS_N_INSNS (20), COSTS_N_INSNS (20)}, /* fp_div */
{COSTS_N_INSNS (4), COSTS_N_INSNS (4)}, /* int_mul */
-  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)}, /* int_div */
+  {COSTS_N_INSNS (18), COSTS_N_INSNS (34)}, /* int_div */
1,/* issue_rate */
3,/* branch_cost */
5,/* memory_cost */


Instruction costs don't really correspond to latencies even though
sometimes they are used as if they were.  I'm a bit wary of using
e.g. 65 which would disparage each use of an integer division inside
a sequence.

Could you check which costs we need in order to still emit your wanted
sequence?  Maybe we can use values a bit lower than yours and still
get the proper code.  Where is the decision being made actually?

The main use of costing of a div/mod instruction is to guide the
reciprocal division code when dividing by a constant.  In that context
we're comparing costs against a sequence of multiplies, shifts, add/sub
insns which are almost always costed by their latency.  So using latency
for division is a reasonable place to start.
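
As a small illustration of the kind of sequence being compared against, here is a sketch in C of dividing a 32-bit unsigned value by the constant 3 with a multiply and a shift.  The function names are made up; the magic constant is ceil(2^33 / 3).  Whether the compiler picks this form or a real divide is what the int_div cost influences.

```c
#include <stdint.h>

/* Divide a 32-bit unsigned value by 3 without a divide instruction:
   multiply by ceil(2^33 / 3) = 0xAAAAAAAB and keep the high bits.  */
static uint32_t div3_by_multiply(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xAAAAAAABu) >> 33);
}

/* Reference version that leaves the choice to the compiler.  */
static uint32_t div3_by_divide(uint32_t x)
{
    return x / 3u;
}
```

If int_div is costed lower than the multiply plus shift, the plain divide looks cheaper to the expander; with a latency-like cost the multiply form tends to win.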


The other thing that might be worth investigating for those processors 
would be to set "use_divmod_expansion" in the cost structure.  I've 
heard talk of fusing div/mod into divmod, though I'm not aware of any 
part implementing that fusion (from a prior life, that would seem to 
require a 2nd output port on the integer unit which could be highly 
undesirable).  Anyway, this could be a followup item for Yangyu if it 
looks profitable.


jeff

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)



> On Oct 27, 2023, at 3:21 AM, Martin Uecker  wrote:
> 
> Am Donnerstag, dem 26.10.2023 um 19:57 + schrieb Qing Zhao:
>> I guess that what Kees wanted, ""fill the array without knowing the actual 
>> final size" code pattern”, as following:
>> 
struct foo *f;
char *p;
int i;
 
f = alloc(maximum_possible);
f->count = 0;
p = f->buf;
 
for (i; data_is_available() && i < maximum_possible; i++) {
f->count ++;
p[i] = next_data_item();
}
>> 
>> actually is a dynamic array, or more accurately, Bounded-size dynamic array: 
>> ( but not a dynamic allocated array as we discussed so far)
>> 
>> https://en.wikipedia.org/wiki/Dynamic_array
>> 
>> This dynamic array, also is called growable array, or resizable array, whose 
>> size can 
>> be changed during the lifetime. 
>> 
>> For VLA or FAM, I believe that they are both dynamic allocated array, i.e, 
>> even though the size is not know at the compilation time, but the size
>> will be fixed after the array is allocated. 
>> 
>> I am not sure whether C has support to such Dynamic array? Or whether it’s 
>> easy to provide dynamic array support in C?
> 
> It is possible to support dynamic arrays in C even with
> good checking, but not safely using the pattern above
> where you derive a pointer which you later use independently.
> 
> While we could track the connection to the original struct,
> the necessary synchronization between the counter and the
> access to the buffer is difficult.  I do not see how this
> could be supported with reasonable effort and cost.
> 
> 
> But with this restriction in mind, we can do a lot in C.
> For example, see my experimental (!) container library
> which has vector type.
> https://github.com/uecker/noplate/blob/main/test.c
> You can get an array view for the vector (which then
> also can decay to a pointer), so it interoperates nicely
> with C but you can get good bounds checking.
> 
> 
> But once you derive a pointer and pass it on, it gets
> difficult.  But if you want safety, you just have to
> simply avoid this in code.

So, for the following modified code: (without the additional pointer “p”)

struct foo
{
 size_t count;
 char buf[] __attribute__((counted_by(count)));
};

struct foo *f;
int i;  

f = alloc(maximum_possible);
f->count = 0;

for (i; data_is_available() && i < maximum_possible; i++) {
  f->count ++;  
  f->buf[i] = next_data_item();
}   

The support for dynamic array should be possible? 


> 
> What we could potentially do is add restrictions so 
> that the access to buf always has to go via x->buf 
> or you get at least a warning.

Are the following two restrictions to the user enough:

1. The access to buf should always go via x->buf, 
no assignment to another independent pointer 
and access buf through this new pointer.
2.  The user needs to keep the synchronization between
  the counter and the access to the buffer all the time.
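
A sketch of the modified loop that follows both restrictions, as a complete C fragment.  The 'fill' helper, its parameters and the 'COUNTED_BY' wrapper macro are made up for illustration; the attribute is only applied when the compiler reports that it knows 'counted_by', and the loop index is initialized here.

```c
#include <stddef.h>
#include <stdlib.h>

/* Use the attribute only where the compiler knows it.  */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define COUNTED_BY(member) __attribute__((counted_by(member)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(member)
#endif

struct foo {
    size_t count;
    char buf[] COUNTED_BY(count);
};

/* Hypothetical helper: copy at most 'capacity' of the 'available'
   bytes from 'src' into a freshly allocated struct foo.  */
static struct foo *fill(const char *src, size_t available, size_t capacity)
{
    struct foo *f = malloc(sizeof *f + capacity);
    if (f == NULL)
        return NULL;
    f->count = 0;
    for (size_t i = 0; i < available && i < capacity; i++) {
        f->count++;            /* grow the bound first ...        */
        f->buf[i] = src[i];    /* ... then store, via f->buf only */
    }
    return f;
}
```

The counter is always updated before the store, and no separate pointer to the buffer is ever created.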


Qing
> 
> Martin
> 
> 
> 
> 
>> 
>> Qing
>> 
>> 
>>> On Oct 26, 2023, at 12:45 PM, Martin Uecker  wrote:
>>> 
>>> Am Donnerstag, dem 26.10.2023 um 09:13 -0700 schrieb Kees Cook:
 On Thu, Oct 26, 2023 at 10:15:10AM +0200, Martin Uecker wrote:
> but not this:
> 
>>> 
>>> x->count = 11;
> char *p = &x->buf;
> x->count = 1;
> p[10] = 1; // !
 
 This seems fine to me -- it's how I'd expect it to work: "10" is beyond
 "1".
>>> 
>>> Note that the store would be allowed.
>>> 
 
> (because the pointer is passed around the
> store to the counter)
> 
> and also here the second store is then irrelevant
> for the access:
> 
> x->count = 10;
> char* p = &x->buf;
> ...
> x->count = 1; // somewhere else
> 
> p[9] = 1; // ok, because count mattered when buf was accessed.
 
 This is less great, but I can understand why it happens. "p" loses the
 association with "x". It'd be nice if "p" had a way to retain that it
 was just an alias for x->buf, so future p access would check count.
>>> 
>>> The problem is not to discover that p is an alias to x->buf, 
>>> but that it seems difficult to make sure that stores to 
>>> x->count are not reordered relative to the final access to
>>> p[i] you want to check, so that you then get the right value.
>>> 
 
 But this appears to be an existing limitation in other areas where an
 assignment will cause the loss of object association. (I've run into
 this before.) It's just more surprising in the above example because in
 the past the loss of association would cause __bdos() to revert back to
 "SIZE_MAX" results ("I don't know the size") rather than an "outdated"
 size, which may get us into unexpected places...
 
> IMHO this makes sense also from the user side and
> are the desirable semantics we discussed before.
> 
> But can you take a look at this?
> 
> 
> This should simulate it

Re: [PATCH] [x86_64]: Zhaoxin yongfeng enablement

2023-10-27 Thread Uros Bizjak

On Fri, Oct 27, 2023 at 12:20 PM mayshao  wrote:
>
> On 2023/10/26 17:34, Uros Bizjak wrote:
> > On Wed, Oct 25, 2023 at 8:43 AM mayshao  wrote:
> >>
> >> Hi all:
> >>  This patch enables -march/-mtune=yongfeng, costs and tunings are set 
> >> according to the characteristics of the processor. We add a new md file to 
> >> describe yongfeng processor.
> >>
> >>  Bootstrapped /regtested X86_64.
> >>
> >>  Ok for trunk?
> >> BR
> >> Mayshao
> >> gcc/ChangeLog:
> >>
> >>  * common/config/i386/cpuinfo.h (get_zhaoxin_cpu): Recognize 
> >> yongfeng.
> >>  * common/config/i386/i386-common.cc: Add yongfeng.
> >>  * common/config/i386/i386-cpuinfo.h (enum processor_subtypes): 
> >> Add ZHAOXIN_FAM7H_YONGFENG.
> >>  * config.gcc: Add yongfeng.
> >>  * config/i386/driver-i386.cc (host_detect_local_cpu): Let 
> >> -march=native
> >>  recognize yongfeng processors.
> >>  * config/i386/i386-c.cc (ix86_target_macros_internal): Add 
> >> yongfeng.
> >>  * config/i386/i386-options.cc (m_YONGFENG): New definition.
> >>  (m_ZHAOXIN): Ditto.
> >>  * config/i386/i386.h (enum processor_type): Add 
> >> PROCESSOR_YONGFENG.
> >>  * config/i386/i386.md: Add yongfeng.
> >>  * config/i386/lujiazui.md: Fix typo.
> >>  * config/i386/x86-tune-costs.h (struct processor_costs): Add 
> >> yongfeng costs.
> >>  * config/i386/x86-tune-sched.cc (ix86_issue_rate): Add yongfeng.
> >>  (ix86_adjust_cost): Ditto.
> >>  * config/i386/x86-tune.def (X86_TUNE_SCHEDULE): Replace 
> >> m_LUJIAZUI by m_ZHAOXIN.
> >>  (X86_TUNE_PARTIAL_REG_DEPENDENCY): Ditto.
> >>  (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY): Ditto.
> >>  (X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): Ditto.
> >>  (X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Ditto.
> >>  (X86_TUNE_MOVX): Ditto.
> >>  (X86_TUNE_MEMORY_MISMATCH_STALL): Ditto.
> >>  (X86_TUNE_FUSE_CMP_AND_BRANCH_32): Ditto.
> >>  (X86_TUNE_FUSE_CMP_AND_BRANCH_64): Ditto.
> >>  (X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS): Ditto.
> >>  (X86_TUNE_FUSE_ALU_AND_BRANCH): Ditto.
> >>  (X86_TUNE_ACCUMULATE_OUTGOING_ARGS): Ditto.
> >>  (X86_TUNE_USE_LEAVE): Ditto.
> >>  (X86_TUNE_PUSH_MEMORY): Ditto.
> >>  (X86_TUNE_LCP_STALL): Ditto.
> >>  (X86_TUNE_INTEGER_DFMODE_MOVES): Ditto.
> >>  (X86_TUNE_OPT_AGU): Ditto.
> >>  (X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB): Ditto.
> >>  (X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Ditto.
> >>  (X86_TUNE_USE_SAHF): Ditto.
> >>  (X86_TUNE_USE_BT): Ditto.
> >>  (X86_TUNE_AVOID_FALSE_DEP_FOR_BMI): Ditto.
> >>  (X86_TUNE_ONE_IF_CONV_INSN): Ditto.
> >>  (X86_TUNE_AVOID_MFENCE): Ditto.
> >>  (X86_TUNE_EXPAND_ABS): Ditto.
> >>  (X86_TUNE_USE_SIMODE_FIOP): Ditto.
> >>  (X86_TUNE_USE_FFREEP): Ditto.
> >>  (X86_TUNE_EXT_80387_CONSTANTS): Ditto.
> >>  (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Ditto.
> >>  (X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Ditto.
> >>  (X86_TUNE_SSE_TYPELESS_STORES): Ditto.
> >>  (X86_TUNE_SSE_LOAD0_BY_PXOR): Ditto.
> >>  (X86_TUNE_USE_GATHER_2PARTS): Add m_YONGFENG.
> >>  (X86_TUNE_USE_GATHER_4PARTS): Ditto.
> >>  (X86_TUNE_USE_GATHER_8PARTS): Ditto.
> >>  (X86_TUNE_AVOID_128FMA_CHAINS): Ditto.
> >>  * doc/extend.texi: Add details about yongfeng.
> >>  * doc/invoke.texi: Ditto.
> >>  * config/i386/yongfeng.md: New file for describing yongfeng
> >> processor.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>  * g++.target/i386/mv32.C: Handle new march.
> >>  * gcc.target/i386/funcspec-56.inc: Ditto.
> >
> > LGTM.
> >
> > There are a couple of comments that need to be fixed, please see inline.
> >
> > BTW: A couple of days ago, I added a new tuning flag [1]. I
> > considered Zhaoxin cores a modern core, but please review the new flag
> > anyway.
> >
> > [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634280.html
> >
> > Thanks,
> > Uros.
> >
> Hi Uros:
>    Thanks for your review. I have fixed the errors that you commented on,
> please review the attached patch again.
>    I have reviewed the new tuning flag [1]. When a write of 64 bits or
> less is followed by a read of a smaller size which is fully contained
> in the write address range, regardless of alignment, Zhaoxin
> processors will do store forwarding.

The patch is OK.

Thanks,
Uros.

Re: [PATCH v2] VECT: Remove the type size restriction of vectorizer

On Fri, Oct 27, 2023 at 4:17 AM Li, Pan2  wrote:
>
> Thanks Richard S for comments.
>
> > In other words, I don't think simply removing the test from the vectoriser
> > is correct.  It needs to be replaced by something more selective.
>
> Does it mean we need to check whether the internal function allows different
> modes/sizes here?
>
> For example, standard name lrintmn2 (m, n mode) is allowed here, while rintm2
> (only m mode) isn't.

We need to check whether the "size" of the LHS is somehow participating in the
optab query.  I think the direct_internal_fn_info type0/type1 members say that
when they are -1.  If neither is -1 (nor -2), then the LHS has to match one of
the arguments (if there are two different ones, I'm not sure which we'd pick).

So patch-wise the existing check can probably be skipped when
vectorizable_internal_function returns an IFN, but that function should have
the very same check when vectype_out isn't participating in the optab
selection.
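
A rough model of that test in standalone C, to show the shape of the check.  This is not GCC code: 'fn_info_model' is a simplified, hypothetical stand-in for direct_internal_fn_info, and the meaning assumed here is -1 for "mode comes from the return value", N >= 0 for "mode comes from argument N", and -2 for "no direct optab".

```c
#include <stdbool.h>

/* Simplified, hypothetical model of GCC's direct_internal_fn_info:
   each field says where one of the optab modes comes from.
   -1: the return value (LHS); N >= 0: argument N; -2: no direct optab.  */
struct fn_info_model {
    int type0;
    int type1;
};

/* True if the LHS type takes part in selecting the optab, in which
   case input and output sizes may legitimately differ.  */
static bool lhs_selects_optab(struct fn_info_model info)
{
    return info.type0 == -1 || info.type1 == -1;
}
```

Under this model a two-mode function in the style of lrintmn2 would be described as { -1, 0 } and a one-mode function in the style of rintm2 as { 0, 0 }; those values are assumptions made for the illustration.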

Richard.

>
> Pan
>
> -Original Message-
> From: Richard Sandiford 
> Sent: Friday, October 27, 2023 1:47 AM
> To: Richard Biener 
> Cc: Li, Pan2 ; gcc-patches@gcc.gnu.org; 
> juzhe.zh...@rivai.ai; Wang, Yanzhang ; 
> kito.ch...@gmail.com; Liu, Hongtao 
> Subject: Re: [PATCH v2] VECT: Remove the type size restriction of vectorizer
>
> Richard Biener  writes:
> >> Am 26.10.2023 um 13:59 schrieb Li, Pan2 :
> >>
> >> Thanks Richard for comments.
> >>
> >>> Can you explain why this is necessary?  In particular what is lhs_rtx
> >>> mode vs ops[0].value mode?
> >>
> >> For testcase gcc.target/aarch64/sve/popcount_1.c, the RTL is listed
> >> below.
> >>
> >> The lhs_rtx is (reg:VNx2SI 98 [ vect__5.36 ]).
> >> The ops[0].value is (reg:VNx2DI 104).
> >>
> >> Removing the restriction makes the vector RTL enter expand_fn_using_insn
> >> and of course hit the INTEGER_P assertion.
> >
> > But I think this shows we mis-selected the optab; a convert_move is
> > certainly not correct unconditionally here (the target might not support
> > that)
>
> Agreed.  Allowing TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out)
> makes sense if the called function allows the input and output modes
> to vary.  That's true for internal functions that eventually map to
> two-mode optabs.  But we can't remove the condition for calls to
> other functions, at least not without some fix-ups.
>
> ISTM that the problem being hit is the one described by the removed
> comment.
>
> In other words, I don't think simply removing the test from the vectoriser
> is correct.  It needs to be replaced by something more selective.
>
> Thanks,
> Richard
>
> >> Pan
> >>
> >> -Original Message-
> >> From: Richard Biener 
> >> Sent: Thursday, October 26, 2023 4:38 PM
> >> To: Li, Pan2 
> >> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 
> >> ; kito.ch...@gmail.com; Liu, Hongtao 
> >> ; Richard Sandiford 
> >> Subject: Re: [PATCH v2] VECT: Remove the type size restriction of 
> >> vectorizer
> >>
> >>> On Thu, Oct 26, 2023 at 4:18 AM  wrote:
> >>>
> >>> From: Pan Li 
> >>>
> >>> Update in v2:
> >>>
> >>> * Fix one ICE of type assertion.
> >>> * Adjust some test cases for aarch64 sve and riscv vector.
> >>>
> >>> Original log:
> >>>
> >>> vectorizable_call has a restriction on the size of the data type:
> >>> DF to DI is allowed but SF to DI isn't. You may see the message below
> >>> when trying to vectorize a function call like lrintf.
> >>>
> >>> void
> >>> test_lrintf (long *out, float *in, unsigned count)
> >>> {
> >>>  for (unsigned i = 0; i < count; i++)
> >>>out[i] = __builtin_lrintf (in[i]);
> >>> }
> >>>
> >>> lrintf.c:5:26: missed: couldn't vectorize loop
> >>> lrintf.c:5:26: missed: not vectorized: unsupported data-type
> >>>
> >>> Then the standard name pattern like lrintmn2 cannot work for different
> >>> data type size like SF => DI. This patch would like to remove this data
> >>> type size check and unblock the standard name like lrintmn2.
> >>>
> >>> The below test are passed for this patch.
> >>>
> >>> * The x86 bootstrap and regression test.
> >>> * The aarch64 regression test.
> >>> * The risc-v regression tests.
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>>* internal-fn.cc (expand_fn_using_insn): Add vector int assertion.
> >>>* tree-vect-stmts.cc (vectorizable_call): Remove size check.
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>>* gcc.target/aarch64/sve/clrsb_1.c: Adjust checker.
> >>>* gcc.target/aarch64/sve/clz_1.c: Ditto.
> >>>* gcc.target/aarch64/sve/popcount_1.c: Ditto.
> >>>* gcc.target/riscv/rvv/autovec/unop/popcount.c: Ditto.
> >>>
> >>> Signed-off-by: Pan Li 
> >>> ---
> >>> gcc/internal-fn.cc  |  3 ++-
> >>> gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c  |  3 +--
> >>> gcc/testsuite/gcc.target/aarch64/sve/clz_1.c|  3 +--
> >>> gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c   |  3 +--
> >>> .../gcc.target/riscv/rvv/autovec/unop/popcount.c|  2 +-
> >>>

Re: [PATCH] RISC-V: Add rawmemchr expander.





On 10/27/23 01:38, Robin Dapp wrote:

Suggested adapt codes as follows:

unsigned int element_size = GET_MODE_SIZE (mode).to_constant ();
poly_int64 nunits = exact_div (BYTES_PER_RISCV_VECTOR * TARGET_MAX_LMUL,
                               element_size);
if (!get_vector_mode (mode, nunits).exists ())
  gcc_unreachable ();


Actually I was initially considering using lmul = m8 here,
unconditionally, but the param is probably the more intuitive choice.
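
For a feel of the numbers involved, the arithmetic in the suggested snippet can be sketched with plain integers.  This is only an illustration: in GCC BYTES_PER_RISCV_VECTOR is a poly_int and exact_div requires an exact division, whereas the hypothetical 'units_per_group' below takes a fixed vector size as an assumption.

```c
/* Number of elements in a vector register group: bytes per vector
   register times the group multiplier (LMUL), divided by the element
   size in bytes.  Plain-integer model; the division is assumed exact.  */
static unsigned units_per_group(unsigned bytes_per_vector, unsigned lmul,
                                unsigned element_size)
{
    return bytes_per_vector * lmul / element_size;
}
```

Assuming 16-byte vector registers, LMUL 8 gives 128 byte elements or 32 four-byte elements per group, while LMUL 1 gives 16 byte elements.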

Attached v2 with that included.  Also moved the riscv test to
autovec/builtin/ so we can add the other builtins as well.


Also, this patch reminds me that we are missing some more similar builtin
functions which can use RVV:

strlen, strcpy, strcmp...etc


Yes we should still have them but I'd rather not work on that right
now.  How about I open a PR for it so we can still add them in stage 3?
Their impact is pretty localized and the risk should be low.
Kito, Palmer, Jeff - would that be acceptable?

I'd definitely like to see them get included.  Those routines often have 
efficient and relatively simple vector implementations.


I'd open the PR, mostly so we don't lose track of them.  Whether or not 
to include after stage1 closes would be done on an individual review 
basis -- the deeper we get into stage3/stage4 the higher the bar would be.


What I'm keen to avoid is lots of new work going in after stage1 closes.

Jeff

Re: [PATCH] recog: Fix propagation into ASM_OPERANDS





On 10/24/23 04:15, Richard Sandiford wrote:

An inline asm with multiple output operands is represented as a
parallel set in which the SET_SRCs are the same (shared) ASM_OPERANDS.
insn_propagation didn't account for this, and instead propagated
into each ASM_OPERANDS individually.  This meant that it could
apply a substitution X->Y to Y itself, which (a) could create
circularity and (b) would be semantically wrong in any case,
since Y might use a different value of X.

This patch checks explicitly for parallels involving ASM_OPERANDS,
just like combine does.
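
For readers who have not met one, here is a minimal GNU C sketch of an inline asm with two output operands, which is the kind of statement the description above is about.  The 'copy_pair' function is made up for illustration; the empty template and matching constraints keep it target-independent.

```c
/* An asm with two outputs.  The template is empty, so no target
   instruction is emitted; the matching constraints "0" and "1" tie
   each input to the register of the corresponding output.  */
static void copy_pair(int x, int y, int *a, int *b)
{
    int out0, out1;
    __asm__ ("" : "=r" (out0), "=r" (out1) : "0" (x), "1" (y));
    *a = out0;
    *b = out1;
}
```

Both outputs are set by the same asm, so a substitution applied to it has to be applied once to the shared operands rather than once per output.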

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

Richard


gcc/
* recog.cc (insn_propagation::apply_to_pattern_1): Handle shared
ASM_OPERANDS.

As the combine comment says "Ug".  OK for the trunk.

jeff

Re: [PATCH] RISC-V: Add rawmemchr expander.

> It seems that you didn't commit it yet.
> 
> A nit comment:
> 
> +  int lmul = riscv_autovec_lmul == RVV_DYNAMIC ? RVV_M8 : riscv_autovec_lmul;
> 
> I think you could use TARGET_MAX_LMUL

No didn't commit yet, testsuite was still running.

OK, added it, will commit later.

Regards
 Robin

Re: [pushed] [RA]: Modify cost calculation for dealing with pseudo equivalences

2023-10-27 Thread Vladimir Makarov




On 10/27/23 09:56, Christophe Lyon wrote:

Hi Vladimir,

On Thu, 26 Oct 2023 at 16:00, Vladimir Makarov  wrote:

This is the second attempt to improve RA cost calculation for pseudos
with equivalences.  The patch explanation is in the log message.

The patch was successfully bootstrapped and tested on x86-64, aarch64,
and ppc64le.  The patch was also benchmarked on x86-64 spec2017.
specfp2017 performance did not change, specint2017 improved by 0.3%.


As reported by our CI, this patch causes a regression on arm:
FAIL: gcc.target/arm/eliminate.c scan-assembler-times r0,[\\t ]*sp 3


For this testcase, we used to generate:
 str lr, [sp, #-4]!
 sub sp, sp, #12
 add r0, sp, #4
 bl  bar
 add r0, sp, #4
 bl  bar
 add r0, sp, #4
 bl  bar
 add sp, sp, #12
 ldr lr, [sp], #4
 bx  lr

After your patch, we generate:
 push {r4, lr}
 sub sp, sp, #8
 add r4, sp, #4
 mov r0, r4
 bl  bar
 mov r0, r4
 bl  bar
 mov r0, r4
 bl  bar
 add sp, sp, #8
 pop {r4, lr}
 bx  lr

which uses 1 more register and 1 more instruction.

Shall I file a bugzilla report for this?

I started to work on this right after I got the message (yesterday).  I 
already have a patch and am going to commit it within the hour.  So there
is no need to file a PR.

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-27 Thread Martin Uecker

Am Freitag, dem 27.10.2023 um 14:32 + schrieb Qing Zhao:
> 
> > On Oct 27, 2023, at 3:21 AM, Martin Uecker  wrote:
> > 
> > Am Donnerstag, dem 26.10.2023 um 19:57 + schrieb Qing Zhao:
> > > I guess that what Kees wanted, ""fill the array without knowing the 
> > > actual final size" code pattern”, as following:
> > > 
> > > > >   struct foo *f;
> > > > >   char *p;
> > > > >   int i;
> > > > > 
> > > > >   f = alloc(maximum_possible);
> > > > >   f->count = 0;
> > > > >   p = f->buf;
> > > > > 
> > > > >   for (i; data_is_available() && i < maximum_possible; i++) {
> > > > >   f->count ++;
> > > > >   p[i] = next_data_item();
> > > > >   }
> > > 
> > > actually is a dynamic array, or more accurately, Bounded-size dynamic 
> > > array: ( but not a dynamic allocated array as we discussed so far)
> > > 
> > > https://en.wikipedia.org/wiki/Dynamic_array
> > > 
> > > This dynamic array, also is called growable array, or resizable array, 
> > > whose size can 
> > > be changed during the lifetime. 
> > > 
> > > For VLA or FAM, I believe that they are both dynamic allocated array, 
> > > i.e, even though the size is not know at the compilation time, but the 
> > > size
> > > will be fixed after the array is allocated. 
> > > 
> > > I am not sure whether C has support to such Dynamic array? Or whether 
> > > it’s easy to provide dynamic array support in C?
> > 
> > It is possible to support dynamic arrays in C even with
> > good checking, but not safely using the pattern above
> > where you derive a pointer which you later use independently.
> > 
> > While we could track the connection to the original struct,
> > the necessary synchronization between the counter and the
> > access to the buffer is difficult.  I do not see how this
> > could be supported with reasonable effort and cost.
> > 
> > 
> > But with this restriction in mind, we can do a lot in C.
> > For example, see my experimental (!) container library
> > which has vector type.
> > https://github.com/uecker/noplate/blob/main/test.c
> > You can get an array view for the vector (which then
> > also can decay to a pointer), so it interoperates nicely
> > with C but you can get good bounds checking.
> > 
> > 
> > But once you derive a pointer and pass it on, it gets
> > difficult.  But if you want safety, you just have to
> > simply avoid this in code.
> 
> So, for the following modified code: (without the additional pointer “p”)
> 
> struct foo
> {
>  size_t count;
>  char buf[] __attribute__((counted_by(count)));
> };
> 
> struct foo *f;
> int i;  
> 
> f = alloc(maximum_possible);
> f->count = 0;
> 
> for (i; data_is_available() && i < maximum_possible; i++) {
>   f->count ++;  
>   f->buf[i] = next_data_item();
> }   
> 
> The support for dynamic array should be possible? 

With the design we discussed this should work, because
with __builtin_with_access (or whatever) it reads:

f = alloc(maximum_possible);
f->count = 0;

for (i; data_is_available() && i < maximum_possible; i++) {
  f->count ++;  
  __builtin_with_access(f->buf, f->count)[i] = next_data_item();
}   
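
The point of that form is that the bound is read at the moment of each access.  A sketch in ordinary C that imitates this with an explicit check is below; 'foo_alloc' and 'checked_store' are hypothetical helpers for illustration only and say nothing about how the builtin would actually be implemented.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct foo {
    size_t count;
    char buf[];
};

/* Hypothetical allocator: room for 'capacity' bytes, count starts at 0.  */
static struct foo *foo_alloc(size_t capacity)
{
    struct foo *f = malloc(sizeof *f + capacity);
    if (f != NULL)
        f->count = 0;
    return f;
}

/* Hypothetical stand-in for the checked access: the bound is read from
   f->count at the moment of the store, not when a pointer was formed.  */
static bool checked_store(struct foo *f, size_t i, char value)
{
    if (i >= f->count)
        return false;   /* out of bounds: refuse instead of writing */
    f->buf[i] = value;
    return true;
}
```

Because the check rereads f->count every time, incrementing the counter just before the store is enough to make the store pass.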

> 
> 
> > 
> > What we could potentially do is add restrictions so 
> > that the access to buf always has to go via x->buf 
> > or you get at least a warning.
> 
> Are the following two restrictions to the user enough:
> 
> 1. The access to buf should always go via x->buf, 
> no assignment to another independent pointer 
> and access buf through this new pointer.

Yes, maybe. One could also try to be smarter.

For example, one could warn only when &f->buf is
assigned to another pointer and one of the
following conditions is fulfilled:

- the pointer escapes from the local context 

- there is a store to f->counter in the
local context that does not dominate &f->buf.

Then Kees' example would work too in most cases.

But I would probably wait until we have some
initial experience with this feature.

Martin

> 2.  User need to keep the synchronization between
>   the counter and the access to the buffer all the time.



> 
>

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)



> On Oct 27, 2023, at 10:53 AM, Martin Uecker  wrote:
> 
> Am Freitag, dem 27.10.2023 um 14:32 + schrieb Qing Zhao:
>> 
>>> On Oct 27, 2023, at 3:21 AM, Martin Uecker  wrote:
>>> 
>>> Am Donnerstag, dem 26.10.2023 um 19:57 + schrieb Qing Zhao:
>>>> I guess that what Kees wanted, "fill the array without knowing the actual 
>>>> final size" code pattern, as following:
>>>> 
>>>>>>  struct foo *f;
>>>>>>  char *p;
>>>>>>  int i;
>>>>>> 
>>>>>>  f = alloc(maximum_possible);
>>>>>>  f->count = 0;
>>>>>>  p = f->buf;
>>>>>> 
>>>>>>  for (i; data_is_available() && i < maximum_possible; i++) {
>>>>>>  f->count ++;
>>>>>>  p[i] = next_data_item();
>>>>>>  }
>>>> 
>>>> actually is a dynamic array, or more accurately, Bounded-size dynamic 
>>>> array: ( but not a dynamic allocated array as we discussed so far)
>>>> 
>>>> https://en.wikipedia.org/wiki/Dynamic_array
>>>> 
>>>> This dynamic array, also is called growable array, or resizable array, 
>>>> whose size can 
>>>> be changed during the lifetime. 
>>>> 
>>>> For VLA or FAM, I believe that they are both dynamic allocated array, i.e, 
>>>> even though the size is not know at the compilation time, but the size
>>>> will be fixed after the array is allocated. 
>>>> 
>>>> I am not sure whether C has support to such Dynamic array? Or whether it’s 
>>>> easy to provide dynamic array support in C?
>>> 
>>> It is possible to support dynamic arrays in C even with
>>> good checking, but not safely using the pattern above
>>> where you derive a pointer which you later use independently.
>>> 
>>> While we could track the connection to the original struct,
>>> the necessary synchronization between the counter and the
>>> access to the buffer is difficult.  I do not see how this
>>> could be supported with reasonable effort and cost.
>>> 
>>> 
>>> But with this restriction in mind, we can do a lot in C.
>>> For example, see my experimental (!) container library
>>> which has vector type.
>>> https://github.com/uecker/noplate/blob/main/test.c
>>> You can get an array view for the vector (which then
>>> also can decay to a pointer), so it interoperates nicely
>>> with C but you can get good bounds checking.
>>> 
>>> 
>>> But once you derive a pointer and pass it on, it gets
>>> difficult.  But if you want safety, you just have to 
>>> to simply avoid this in code. 
>> 
>> So, for the following modified code: (without the additional pointer “p”)
>> 
>> struct foo
>> {
>> size_t count;
>> char buf[] __attribute__((counted_by(count)));
>> };
>> 
>> struct foo *f;
>> int i;  
>> 
>> f = alloc(maximum_possible);
>> f->count = 0;
>> 
>> for (i; data_is_available() && i < maximum_possible; i++) {
>>  f->count ++;  
>>  f->buf[i] = next_data_item();
>> }   
>> 
>> The support for dynamic array should be possible? 
> 
> With the design we discussed this should work because
> __builtin_with_access (or whatever) it reads:
> 
> f = alloc(maximum_possible);
> f->count = 0;
> 
> for (i; data_is_available() && i < maximum_possible; i++) {
>  f->count ++;  
>  __builtin_with_access(f->buf, f->count)[i] = next_data_item();
> }   
> 

Yes, with the data flow, the access should see the latest value of f->count. 
>> 
>> 
>>> 
>>> What we could potentially do is add restrictions so 
>>> that the access to buf always has to go via x->buf 
>>> or you get at least a warning.
>> 
>> Are the following two restrictions to the user enough:
>> 
>> 1. The access to buf should always go via x->buf, 
>>no assignment to another independent pointer 
>>and access buf through this new pointer.
> 
> Yes, maybe. One could also try to be smarter.
> 
> For example, one could warn only when &f->buf is
> assigned to another pointer and one of the
> following conditions is fulfilled:
> 
> - the pointer escapes from the local context 
> 
> - there is a store to f->counter in the
> local context that does not dominate &f->buf.
> 
> Then Kees' example would work too in most cases.

I guess that we might need to come up with a list of concrete restrictions on
the user, and list these restrictions in the user documentation.

Since dynamic array support is quite important to the kernel (is this
true, Kees?), we might need to include such support in our design from the
beginning.

> 
> But I would probably wait until we have some
> initial experience with this feature.

You mean after we have an initial implementation of the “builtin_with_size”?
Yes, at this moment, I think that the “builtin_with_size” approach is the best
one. Some details just need more thought before the real implementation.  -:)

Qing
> 
> Martin
> 
>> 2.  User need to keep the synchronization between
>>  the counter and the access to the buffer all the time.
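
For reference, the alias-free variant discussed in this thread can be written
as a small self-contained C sketch.  The helper name fill_foo, its parameters
and the COUNTED_BY fallback macro are assumptions made for illustration only;
the attribute was still being designed when these mails were written, so the
macro expands to nothing where the compiler does not provide it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumption: use counted_by only where the compiler knows it.  */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define COUNTED_BY(m) __attribute__((counted_by(m)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(m)
#endif

struct foo
{
  size_t count;
  char buf[] COUNTED_BY(count);
};

/* Hypothetical helper: store up to MAX of the N bytes at SRC.
   Every store goes through f->buf, never through a derived pointer,
   and the counter is raised before the element is written.  */
static struct foo *
fill_foo (const char *src, size_t n, size_t max)
{
  struct foo *f = malloc (sizeof (*f) + max);
  if (!f)
    return NULL;
  f->count = 0;

  for (size_t i = 0; i < n && i < max; i++)
    {
      f->count++;          /* Grow the bound first ...  */
      f->buf[i] = src[i];  /* ... then access via f->buf.  */
    }
  return f;
}
```

Keeping the increment directly in front of the store is what lets a checker
read the current f->count at the point of the access.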

Re: [ARC PATCH] Improved SImode shifts and rotates with -mswap.





On 10/27/23 08:22, Roger Sayle wrote:


This patch improves the code generated by the ARC back-end for CPUs
without a barrel shifter but with -mswap.  The -mswap option provides
a SWAP instruction that implements SImode rotations by 16, but also
logical shift instructions (left and right) by 16 bits.  Clearly these
are also useful building blocks for implementing shifts by 17, 18, etc.
which would otherwise require a loop.

As a representative example:
int shl20 (int x) { return x << 20; }

GCC with -O2 -mcpu=em -mswap would previously generate:

shl20:  mov lp_count,10
 lp  2f
 add r0,r0,r0
 add r0,r0,r0
2:  # end single insn loop
 j_s [blink]

with this patch we now generate:

shl20:  mov_s   r2,0;3
 lsl16   r0,r0
 add3    r0,r2,r0
 j_s.d   [blink]
 asl_s r0,r0

Although both are four instructions (excluding the j_s),
the original takes ~22 cycles, and replacement ~4 cycles.


Tested with a cross-compiler to arc-linux hosted on x86_64,
with no new (compile-only) regressions from make -k check.
Ok for mainline if this passes Claudiu's nightly testing?

Not a review, just a comment.

The H8 has a ton of shift synthesis.  If you're looking for inspiration 
to improve this stuff further for ARC, it might be worth a looksie.



Jeff

[pushed] c++: add testcase verifying non-dep new-expr checking

N.B. we currently don't diagnose 'new A(1)' below ultimately because
when in a template context our valid ctor call checking only happens for
type_build_ctor_call types.

-- >8 --

gcc/testsuite/ChangeLog:

* g++.dg/template/new14.C: New test.
---
 gcc/testsuite/g++.dg/template/new14.C | 20 
 1 file changed, 20 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/template/new14.C

diff --git a/gcc/testsuite/g++.dg/template/new14.C 
b/gcc/testsuite/g++.dg/template/new14.C
new file mode 100644
index 000..8c0efe47ae2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/new14.C
@@ -0,0 +1,20 @@
+// Verify we check new-expressions ahead of time.
+
+struct A { };
+struct B { B(int); };
+struct C { void* operator new(__SIZE_TYPE__, int); };
+
+template
+void f() {
+  new A(1); // { dg-error "no match" "" { xfail *-*-* } }
+  new B(1, 2); // { dg-error "no match" }
+  new B; // { dg-error "no match" }
+  new C; // { dg-error "no match" }
+}
+
+
+template
+void g() {
+  new int[__SIZE_MAX__]; // { dg-error "exceeds maximum" }
+  new int[__SIZE_MAX__ / sizeof(int)]; // { dg-error "exceeds maximum" }
+}
-- 
2.42.0.482.g2e8e77cbac

Re: [PATCH V2 5/7] aarch64: Implement system register r/w arm ACLE intrinsic functions

2023-10-27 Thread Alex Coplan

On 26/10/2023 16:23, Richard Sandiford wrote:
> Victor Do Nascimento  writes:
> > On 10/18/23 21:39, Richard Sandiford wrote:
> >> Victor Do Nascimento  writes:
> >>> Implement the aarch64 intrinsics for reading and writing system
> >>> registers with the following signatures:
> >>>
> >>>   uint32_t __arm_rsr(const char *special_register);
> >>>   uint64_t __arm_rsr64(const char *special_register);
> >>>   void* __arm_rsrp(const char *special_register);
> >>>   float __arm_rsrf(const char *special_register);
> >>>   double __arm_rsrf64(const char *special_register);
> >>>   void __arm_wsr(const char *special_register, uint32_t value);
> >>>   void __arm_wsr64(const char *special_register, uint64_t value);
> >>>   void __arm_wsrp(const char *special_register, const void *value);
> >>>   void __arm_wsrf(const char *special_register, float value);
> >>>   void __arm_wsrf64(const char *special_register, double value);
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>>   * gcc/config/aarch64/aarch64-builtins.cc (enum aarch64_builtins):
> >>>   Add enums for new builtins.
> >>>   (aarch64_init_rwsr_builtins): New.
> >>>   (aarch64_general_init_builtins): Call aarch64_init_rwsr_builtins.
> >>>   (aarch64_expand_rwsr_builtin):  New.
> >>>   (aarch64_general_expand_builtin): Call aarch64_expand_rwsr_builtin.
> >>>   * gcc/config/aarch64/aarch64.md (read_sysregdi): New insn_and_split.
> >>>   (write_sysregdi): Likewise.
> >>>   * gcc/config/aarch64/arm_acle.h (__arm_rsr): New.
> >>>   (__arm_rsrp): Likewise.
> >>>   (__arm_rsr64): Likewise.
> >>>   (__arm_rsrf): Likewise.
> >>>   (__arm_rsrf64): Likewise.
> >>>   (__arm_wsr): Likewise.
> >>>   (__arm_wsrp): Likewise.
> >>>   (__arm_wsr64): Likewise.
> >>>   (__arm_wsrf): Likewise.
> >>>   (__arm_wsrf64): Likewise.
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>>   * gcc/testsuite/gcc.target/aarch64/acle/rwsr.c: New.
> >>>   * gcc/testsuite/gcc.target/aarch64/acle/rwsr-1.c: Likewise.
> >>> ---
> >>>   gcc/config/aarch64/aarch64-builtins.cc| 200 ++
> >>>   gcc/config/aarch64/aarch64.md |  17 ++
> >>>   gcc/config/aarch64/arm_acle.h |  30 +++
> >>>   .../gcc.target/aarch64/acle/rwsr-1.c  |  20 ++
> >>>   gcc/testsuite/gcc.target/aarch64/acle/rwsr.c  | 144 +
> >>>   5 files changed, 411 insertions(+)
> >>>   create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-1.c
> >>>   create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr.c
> >>>
> >>> diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
> >>> b/gcc/config/aarch64/aarch64-builtins.cc
> >>> index 04f59fd9a54..d8bb2a989a5 100644
> >>> --- a/gcc/config/aarch64/aarch64-builtins.cc
> >>> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> >>> @@ -808,6 +808,17 @@ enum aarch64_builtins
> >>> AARCH64_RBIT,
> >>> AARCH64_RBITL,
> >>> AARCH64_RBITLL,
> >>> +  /* System register builtins.  */
> >>> +  AARCH64_RSR,
> >>> +  AARCH64_RSRP,
> >>> +  AARCH64_RSR64,
> >>> +  AARCH64_RSRF,
> >>> +  AARCH64_RSRF64,
> >>> +  AARCH64_WSR,
> >>> +  AARCH64_WSRP,
> >>> +  AARCH64_WSR64,
> >>> +  AARCH64_WSRF,
> >>> +  AARCH64_WSRF64,
> >>> AARCH64_BUILTIN_MAX
> >>>   };
> >>>   
> >>> @@ -1798,6 +1809,65 @@ aarch64_init_rng_builtins (void)
> >>>  AARCH64_BUILTIN_RNG_RNDRRS);
> >>>   }
> >>>   
> >>> +/* Add builtins for reading system register.  */
> >>> +static void
> >>> +aarch64_init_rwsr_builtins (void)
> >>> +{
> >>> +  tree fntype = NULL;
> >>> +  tree const_char_ptr_type
> >>> += build_pointer_type (build_type_variant (char_type_node, true, 
> >>> false));
> >>> +
> >>> +#define AARCH64_INIT_RWSR_BUILTINS_DECL(F, N, T) \
> >>> +  aarch64_builtin_decls[AARCH64_##F] \
> >>> += aarch64_general_add_builtin ("__builtin_aarch64_"#N, T, 
> >>> AARCH64_##F);
> >>> +
> >>> +  fntype
> >>> += build_function_type_list (uint32_type_node, const_char_ptr_type, 
> >>> NULL);
> >>> +  AARCH64_INIT_RWSR_BUILTINS_DECL (RSR, rsr, fntype);
> >>> +
> >>> +  fntype
> >>> += build_function_type_list (ptr_type_node, const_char_ptr_type, 
> >>> NULL);
> >>> +  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRP, rsrp, fntype);
> >>> +
> >>> +  fntype
> >>> += build_function_type_list (uint64_type_node, const_char_ptr_type, 
> >>> NULL);
> >>> +  AARCH64_INIT_RWSR_BUILTINS_DECL (RSR64, rsr64, fntype);
> >>> +
> >>> +  fntype
> >>> += build_function_type_list (float_type_node, const_char_ptr_type, 
> >>> NULL);
> >>> +  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRF, rsrf, fntype);
> >>> +
> >>> +  fntype
> >>> += build_function_type_list (double_type_node, const_char_ptr_type, 
> >>> NULL);
> >>> +  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRF64, rsrf64, fntype);
> >>> +
> >>> +  fntype
> >>> += build_function_type_list (void_type_node, const_char_ptr_type,
> >>> + uint32_type_node, NULL);
> >>> +
> >>> +  AARCH64_INIT_RWSR_BUILTINS_DECL (WSR, wsr, fntype);
> >>> +
> >>> +  fntype
> >>> +=

Re: hardcfr: support checking at abnormal edges [PR111943]

On Thu, Oct 26, 2023 at 5:44 PM Alexandre Oliva  wrote:
>
>
> Control flow redundancy may choose abnormal edges for early checking,
> but that breaks because we can't insert checks on such edges.
>
> Introduce conditional checking on the dest block of abnormal edges,
> and leave it for the optimizer to drop the conditional.
>
> Also, oops, I noticed the new files went in with an incorrect copyright
> notice, that this patch fixes.
>
> Regstrapped on x86_64-linux-gnu.  Ok to install?

OK.


>
> for  gcc/ChangeLog
>
> PR tree-optimization/111943
> * gimple-harden-control-flow.cc: Adjust copyright year.
> (rt_bb_visited): Add vfalse and vtrue data members.
> Zero-initialize them in the ctor.
> (rt_bb_visited::insert_exit_check_on_edge): Upon encountering
> abnormal edges, insert initializers for vfalse and vtrue on
> entry, and insert the check sequence guarded by a conditional
> in the dest block.
>
> for  libgcc/ChangeLog
>
> * hardcfr.c: Adjust copyright year.
>
> for  gcc/testsuite/ChangeLog
>
> PR tree-optimization/111943
> * gcc.dg/harden-cfr-pr111943.c: New.
> ---
>  gcc/gimple-harden-control-flow.cc  |   78 
> +++-
>  gcc/testsuite/gcc.dg/harden-cfr-pr111943.c |   33 
>  libgcc/hardcfr.c   |2 -
>  3 files changed, 109 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/harden-cfr-pr111943.c
>
> diff --git a/gcc/gimple-harden-control-flow.cc 
> b/gcc/gimple-harden-control-flow.cc
> index 3711b25d09123..77c140178060e 100644
> --- a/gcc/gimple-harden-control-flow.cc
> +++ b/gcc/gimple-harden-control-flow.cc
> @@ -1,5 +1,5 @@
>  /* Control flow redundancy hardening.
> -   Copyright (C) 2022 Free Software Foundation, Inc.
> +   Copyright (C) 2022-2023 Free Software Foundation, Inc.
> Contributed by Alexandre Oliva .
>
>  This file is part of GCC.
> @@ -460,6 +460,10 @@ class rt_bb_visited
>   at the end of a block's predecessors or successors list.  */
>tree ckfail, ckpart, ckinv, ckblk;
>
> +  /* If we need to deal with abnormal edges, we insert SSA_NAMEs for
> + boolean true and false.  */
> +  tree vfalse, vtrue;
> +
>/* Convert a block index N to a block vindex, the index used to
>   identify it in the VISITED array.  Check that it's in range:
>   neither ENTRY nor EXIT, but maybe one-past-the-end, to compute
> @@ -596,7 +600,8 @@ public:
>/* Prepare to add control flow redundancy testing to CFUN.  */
>rt_bb_visited (int checkpoints)
>  : nblocks (n_basic_blocks_for_fn (cfun)),
> -  vword_type (NULL), ckseq (NULL), rtcfg (NULL)
> +  vword_type (NULL), ckseq (NULL), rtcfg (NULL),
> +  vfalse (NULL), vtrue (NULL)
>{
>  /* If we've already added a declaration for the builtin checker,
> extract vword_type and vword_bits from its declaration.  */
> @@ -703,7 +708,74 @@ public:
>/* Insert SEQ on E.  */
>void insert_exit_check_on_edge (gimple_seq seq, edge e)
>{
> -gsi_insert_seq_on_edge_immediate (e, seq);
> +if (!(e->flags & EDGE_ABNORMAL))
> +  {
> +   gsi_insert_seq_on_edge_immediate (e, seq);
> +   return;
> +  }
> +
> +/* Initialize SSA boolean constants for use in abnormal PHIs.  */
> +if (!vfalse)
> +  {
> +   vfalse = make_ssa_name (boolean_type_node);
> +   vtrue = make_ssa_name (boolean_type_node);
> +
> +   gimple_seq vft_seq = NULL;
> +   gassign *vfalse_init = gimple_build_assign (vfalse, 
> boolean_false_node);
> +   gimple_seq_add_stmt (&vft_seq, vfalse_init);
> +   gassign *vtrue_init = gimple_build_assign (vtrue, boolean_true_node);
> +   gimple_seq_add_stmt (&vft_seq, vtrue_init);
> +
> +   gsi_insert_seq_on_edge_immediate (single_succ_edge
> + (ENTRY_BLOCK_PTR_FOR_FN (cfun)),
> + vft_seq);
> +  }
> +
> +/* We can't insert on abnormal edges, but we can arrange for SEQ
> +   to execute conditionally at dest.  Add a PHI boolean with TRUE
> +   from E and FALSE from other preds, split the whole block, add a
> +   test for the PHI to run a new block with SEQ or skip straight
> +   to the original block.  If there are multiple incoming abnormal
> +   edges, we'll do this multiple times.  ??? Unless there are
> +   multiple abnormal edges with different postcheck status, we
> +   could split the block and redirect other edges, rearranging the
> +   PHI nodes.  Optimizers already know how to do this, so we can
> +   keep things simple here.  */
> +basic_block bb = e->dest;
> +basic_block bb_postcheck = split_block_after_labels (bb)->dest;
> +
> +basic_block bb_check = create_empty_bb (e->dest);
> +bb_check->count = e->count ();
> +if (dom_info_available_p (CDI_DOMINATORS))
> +  set_immediate_dominator (CDI_DOMINATORS, bb_check, bb);
> +

[ARC PATCH] Improved SImode shifts and rotates with -mswap.

2023-10-27 Thread Roger Sayle


This patch improves the code generated by the ARC back-end for CPUs
without a barrel shifter but with -mswap.  The -mswap option provides
a SWAP instruction that implements SImode rotations by 16, but also
logical shift instructions (left and right) by 16 bits.  Clearly these
are also useful building blocks for implementing shifts by 17, 18, etc.
which would otherwise require a loop.

As a representative example:
int shl20 (int x) { return x << 20; }

GCC with -O2 -mcpu=em -mswap would previously generate:

shl20:  mov lp_count,10
lp  2f
add r0,r0,r0
add r0,r0,r0
2:  # end single insn loop
j_s [blink]

with this patch we now generate:

shl20:  mov_s   r2,0;3
lsl16   r0,r0
add3    r0,r2,r0
j_s.d   [blink]
asl_s r0,r0

Although both are four instructions (excluding the j_s),
the original takes ~22 cycles, and replacement ~4 cycles.
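
To make the arithmetic behind the two sequences explicit, here is a small C
model of them.  It is only a sketch: the helper functions model the
instructions as I read them from the listings above (lsl16 shifts left by 16,
add3 adds its second source shifted left by 3, one-operand asl shifts left by
1), and unsigned arithmetic is used so that the model itself has no overflow
issues.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed model of lsl16: logical shift left by 16 bits.  */
static uint32_t lsl16 (uint32_t x) { return x << 16; }

/* Assumed model of add3 a,b,c: a = b + (c << 3).  */
static uint32_t add3 (uint32_t b, uint32_t c) { return b + (c << 3); }

/* Assumed model of asl_s r0,r0: shift left by one bit.  */
static uint32_t asl1 (uint32_t x) { return x << 1; }

/* New sequence: 16 + 3 + 1 = 20 bits in three ALU operations.  */
static uint32_t
shl20_swap (uint32_t x)
{
  uint32_t r2 = 0;          /* mov_s r2,0  */
  uint32_t r0 = lsl16 (x);  /* lsl16 r0,r0 */
  r0 = add3 (r2, r0);       /* add3 r0,r2,r0 */
  return asl1 (r0);         /* asl_s r0,r0 */
}

/* Old sequence: a zero-overhead loop of 10 iterations, 2 adds each.  */
static uint32_t
shl20_loop (uint32_t x)
{
  for (int lp_count = 10; lp_count > 0; lp_count--)
    {
      x += x;
      x += x;
    }
  return x;
}
```

Both functions return x << 20; the loop version executes 20 additions, which
is where the cycle difference quoted above comes from.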


Tested with a cross-compiler to arc-linux hosted on x86_64,
with no new (compile-only) regressions from make -k check.
Ok for mainline if this passes Claudiu's nightly testing?


2023-10-27  Roger Sayle  

gcc/ChangeLog
* config/arc/arc.cc (arc_split_ashl): Use lsl16 on TARGET_SWAP.
(arc_split_ashr): Use swap and sign-extend on TARGET_SWAP.
(arc_split_lshr): Use lsr16 on TARGET_SWAP.
(arc_split_rotl): Use swap on TARGET_SWAP.
(arc_split_rotr): Likewise.
* config/arc/arc.md (ANY_ROTATE): New code iterator.
(si2_cnt16): New define_insn for alternate form of
swap instruction on TARGET_SWAP.
(ashlsi2_cnt16): Rename from *ashlsi16_cnt16 and move earlier.
(lshrsi2_cnt16): New define_insn for LSR16 instruction.
(*ashlsi2_cnt16): See above.

gcc/testsuite/ChangeLog
* gcc.target/arc/lsl16-1.c: New test case.
* gcc.target/arc/lsr16-1.c: Likewise.
* gcc.target/arc/swap-1.c: Likewise.
* gcc.target/arc/swap-2.c: Likewise.


Thanks in advance,
Roger
--

diff --git a/gcc/config/arc/arc.cc b/gcc/config/arc/arc.cc
index 353ac69..e98692a 100644
--- a/gcc/config/arc/arc.cc
+++ b/gcc/config/arc/arc.cc
@@ -4256,6 +4256,17 @@ arc_split_ashl (rtx *operands)
}
  return;
}
+  else if (n >= 16 && n <= 22 && TARGET_SWAP && TARGET_V2)
+   {
+ emit_insn (gen_ashlsi2_cnt16 (operands[0], operands[1]));
+ if (n > 16)
+   {
+ operands[1] = operands[0];
+ operands[2] = GEN_INT (n - 16);
+ arc_split_ashl (operands);
+   }
+ return;
+   }
   else if (n >= 29)
{
  if (n < 31)
@@ -4300,6 +4311,15 @@ arc_split_ashr (rtx *operands)
emit_move_insn (operands[0], operands[1]);
  return;
}
+  else if (n >= 16 && n <= 18 && TARGET_SWAP)
+   {
+ emit_insn (gen_rotrsi2_cnt16 (operands[0], operands[1]));
+ emit_insn (gen_extendhisi2 (operands[0],
+ gen_lowpart (HImode, operands[0])));
+ while (--n >= 16)
+   emit_insn (gen_ashrsi3_cnt1 (operands[0], operands[0]));
+ return;
+   }
   else if (n == 30)
{
  rtx tmp = gen_reg_rtx (SImode);
@@ -4339,6 +4359,13 @@ arc_split_lshr (rtx *operands)
emit_move_insn (operands[0], operands[1]);
  return;
}
+  else if (n >= 16 && n <= 19 && TARGET_SWAP && TARGET_V2)
+   {
+ emit_insn (gen_lshrsi2_cnt16 (operands[0], operands[1]));
+ while (--n >= 16)
+   emit_insn (gen_lshrsi3_cnt1 (operands[0], operands[0]));
+ return;
+   }
   else if (n == 30)
{
  rtx tmp = gen_reg_rtx (SImode);
@@ -4385,6 +4412,19 @@ arc_split_rotl (rtx *operands)
emit_insn (gen_rotrsi3_cnt1 (operands[0], operands[0]));
  return;
}
+  else if (n >= 13 && n <= 16 && TARGET_SWAP)
+   {
+ emit_insn (gen_rotlsi2_cnt16 (operands[0], operands[1]));
+ while (++n <= 16)
+   emit_insn (gen_rotrsi3_cnt1 (operands[0], operands[0]));
+ return;
+   }
+  else if (n == 17 && TARGET_SWAP)
+   {
+ emit_insn (gen_rotlsi2_cnt16 (operands[0], operands[1]));
+ emit_insn (gen_rotlsi3_cnt1 (operands[0], operands[0]));
+ return;
+   }
   else if (n >= 16 || n == 12 || n == 14)
{
  emit_insn (gen_rotrsi3_loop (operands[0], operands[1],
@@ -4415,6 +4455,19 @@ arc_split_rotr (rtx *operands)
emit_move_insn (operands[0], operands[1]);
  return;
}
+  else if (n == 15 && TARGET_SWAP)
+   {
+ emit_insn (gen_rotrsi2_cnt16 (operands[0], operands[1]));
+ emit_insn (gen_rotlsi3_cnt1 (operands[0], operands[0]));
+ return;
+   }
+  else if (n >= 16 && n <= 19 && TARGET_SWAP)
+   {
+ emit_insn (gen_rotrsi2_cnt16 (operands[0], operands[1]));
+ while (--n >= 16)
+

Re: [PATCH] RISC-V: Fix wrong tune parameters on int_div

On 10/26/23 12:50, Yangyu Chen wrote:

This patch fixes an issue with the cost on "int_div" in various RISC-V
tune parameters including those for Rocket, SiFive U7 series, and T-Head
C906. This incorrect cost value interferes with the optimization process.
For example, it prevents the optimization of division by a constant to a
more efficient method known as Barrett reduction. This lack of
optimization negatively affects the performance of these systems.

The integer div cost of the Rocket and SiFive U7 is taken from the
Rocket-Chip Divider source code[1] with BigCore configuration[2]. It shows
the divUnroll unchanged which is 1 by default. Thus, the maximum int_div
cycles should be the dataWidth + 1, which is 33 for 32-bit and 65 for
64-bit.

As for C906, the divider takes 2 cycles to start[3], and it produces a 2-bit
result each cycle[4]. Thus, the maximum int_div cycles should be the
dataWidth / 2 + 2, which is 18 for 32-bit and 34 for 64-bit.
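
The two worst-case estimates can be checked with a few lines of C.  The
function names are made up for this sketch, and the formulas are simply the
ones stated in the paragraphs above (one result bit per cycle plus one cycle
for Rocket with divUnroll = 1, two result bits per cycle plus two start-up
cycles for C906), not something read back from the RTL.

```c
#include <assert.h>

/* Rocket / SiFive U7 estimate from the text: dataWidth + 1.  */
static unsigned
rocket_int_div_cycles (unsigned data_width)
{
  return data_width + 1;
}

/* C906 estimate from the text: dataWidth / 2 + 2.  */
static unsigned
c906_int_div_cycles (unsigned data_width)
{
  return data_width / 2 + 2;
}
```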

I also tested the performance on VisionFive2, which has a quad-core SiFive U74.
I wrote a simple C program to do 1e8 divisions by the constant 6 in int32. The
result shows it takes 1.998s using div, and 0.420s using Barrett reduction
to replace div with mul, which is 4.75x faster.
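
As an illustration of what replacing div with mul means, the following sketch
divides by the constant 6 using a multiplication by a precomputed reciprocal
and a shift.  It is not the benchmark program mentioned above: it uses
unsigned 32-bit values to keep the example short (the measurement was on
int32), and the function name is made up.  The constant 0xAAAAAAAB is
ceil(2^33 / 3), so shifting the 64-bit product right by 33 divides by 3 and
one more bit divides by 6.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned x / 6 without a divide instruction:
   (x * ceil(2^33 / 3)) >> 34.  */
static uint32_t
div6_by_mul (uint32_t x)
{
  return (uint32_t) (((uint64_t) x * 0xAAAAAAABu) >> 34);
}
```

With a low enough int_div cost the compiler keeps the div instruction; once
the cost reflects the tens of cycles discussed above, a multiply-and-shift
sequence of this kind becomes the cheaper choice.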

[1]
https://github.com/chipsalliance/rocket-chip/blob/v1.6/src/main/scala/rocket/Multiplier.scala#L40
[2]
https://github.com/chipsalliance/rocket-chip/blob/v1.6/src/main/scala/subsystem/Configs.scala#L97
[3]
https://github.com/T-head-Semi/openc906/blob/af5614d72de7e5a4b8609c427d2e20af1deb21c4/C906_RTL_FACTORY/gen_rtl/iu/rtl/aq_iu_div.v#L267
[4]
https://github.com/T-head-Semi/openc906/blob/af5614d72de7e5a4b8609c427d2e20af1deb21c4/C906_RTL_FACTORY/gen_rtl/iu/rtl/aq_iu_div_shift2_kernel.v#L93

gcc/ChangeLog:

* config/riscv/riscv.cc: Fix wrong tune parameters on int_div

Signed-off-by: Yangyu Chen

I adjusted the ChangeLog entry and pushed this to the trunk.

As a follow-up you should look at the tuning parameter which controls
using the divmod expansion. If these uarchs don't do div+mod fusion,
then I would strongly suspect that they'll benefit from using the
divmod expansion path when we need a/b and a%b.
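
A rough C picture of the idea, as I understand it: when both a/b and a%b are
needed, the remainder can be derived from the quotient with a multiply and a
subtract instead of issuing a second long-latency divide.  The struct and
function names are invented for this sketch, and what the divmod expansion
actually emits for a given core is an assumption here.

```c
#include <assert.h>
#include <stdint.h>

struct divmod_result
{
  int32_t quot;
  int32_t rem;
};

/* One division, then rem = a - quot * b.  B must be non-zero and
   the pair (INT32_MIN, -1) is excluded, as for plain a / b.  */
static struct divmod_result
divmod_one_div (int32_t a, int32_t b)
{
  struct divmod_result r;
  r.quot = a / b;
  r.rem = a - r.quot * b;
  return r;
}
```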

Jeff

Re: [committed] amdgcn: add -march=gfx1030 EXPERIMENTAL

2023-10-27 Thread Andrew Stubbs


On 20/10/2023 12:51, Andrew Stubbs wrote:
I've committed this patch that allows building binaries for AMD gfx1030 
GPUs. I can't actually test it, however, so somebody else will have to 
debug it (or wait for me to get my hands on a device). Richi reports 
that it does not execute correctly, as is.


The patch introduced a bug returning exit values that affected all the 
targets. I've now committed a patch to fix the issue.


Andrew

amdgcn: Fix bug in gfx1030 support patch

The previous patch to add gfx1030 support introduced an issue with passing
exit codes from kernels run under gcn-run (offload kernels were unaffected).

gcc/ChangeLog:

PR target/112088
* config/gcn/gcn.cc (gcn_expand_epilogue): Fix kernel epilogue register
conflict.

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 6f85f55803c..6a2aaefceca 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -3615,13 +3615,11 @@ gcn_expand_epilogue (void)
   set_mem_addr_space (retptr_mem, ADDR_SPACE_SCALAR_FLAT);
   emit_move_insn (kernarg_reg, retptr_mem);
 
-  rtx retval_addr = gen_rtx_REG (DImode, FIRST_VPARM_REG);
+  rtx retval_addr = gen_rtx_REG (DImode, FIRST_VPARM_REG + 2);
   emit_move_insn (retval_addr, kernarg_reg);
   rtx retval_mem = gen_rtx_MEM (SImode, retval_addr);
-  rtx scalar_retval = gen_rtx_REG (SImode, FIRST_VPARM_REG + 2);
   set_mem_addr_space (retval_mem, ADDR_SPACE_FLAT);
-  emit_move_insn (scalar_retval, gen_rtx_REG (SImode, RETURN_VALUE_REG));
-  emit_move_insn (retval_mem, scalar_retval);
+  emit_move_insn (retval_mem, gen_rtx_REG (SImode, RETURN_VALUE_REG));
 }
 
   emit_jump_insn (gen_gcn_return ());

Re: [PATCH] RISC-V: Fix wrong tune parameters on int_div





On 10/27/23 11:39, Andrew Waterman wrote:

On Fri, Oct 27, 2023 at 6:44 AM Jeff Law  wrote:




On 10/27/23 01:37, juzhe.zh...@rivai.ai wrote:

LGTM from my side.

The original integer division COST seems too low.

Almost certainly, though there may be good reasons why it was initially
set so low.  I'm generally hesitant to change things like that without
either someone with knowledge of the code/uarch stepping in with a
recommendation or some kind of analysis showing they're wrong.


It is probably just a bug, as the cores I'm familiar with mentioned in
the original post do benefit from converting division by constant into
multiplication.  (We should make sure we continue emitting a division
instruction when optimizing for size, though.)

Yes.  When optimizing for size we have a special tune info which
explicitly turns off the divmod expansion.


jeff

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

Okay, thanks for the explanation.
We will keep this in mind.

Qing

> On Oct 27, 2023, at 1:19 PM, Kees Cook  wrote:
> 
> On Fri, Oct 27, 2023 at 03:10:22PM +, Qing Zhao wrote:
>> Since  the dynamic array support is quite important to the kernel (is this 
>> true, Kees? ),
>> We might need to include such support into our design in the beginning. 
> 
> tl;dr: We don't need "dynamic array support" in the 1st version of 
> __counted_by
> 
> I'm not sure it's as strong as "quite important", but it is a code
> pattern that exists. The vast majority of FAM usage is run-time fixed,
> in the sense that the allocation matches the usage. Only sometimes do we
> over-allocate and then slowly fill it up like I've shown.
> 
> So really my thoughts on this are to bring light to the usage pattern
> in the hopes that we don't make it an impossible thing to do. And if
> it's a limitation of the initial version of __counted_by, the kernel can
> still use it: it will just need to use __counted_by strictly for
> allocation sizes, not "usage" size:
> 
> struct foo {
>   int allocated;
>   int used;
>   int array[] __counted_by(allocated); // would be nice to use "used"
> };
> 
>   struct foo *p;
> 
>   p = alloc(sizeof(*p) + sizeof(*p->array) * max_items);
>   p->allocated = max_items;
>   p->used = 0;
> 
>   while (data_available())
>   p->array[++p->used] = next_datum();
> 
> With this, we'll still catch p->array accesses beyond "allocated",
> but other code in the kernel won't catch "invalid data" accesses for
> p->array beyond "used". (i.e. we still have memory corruption protection,
> just not logic error protection.)
> 
> We can deal with aliasing in the future if we want to expand to catching
> logic errors.
> 
> I should note that we don't get logic error protection from things like
> ARM's Memory Tagging Extension either -- it only tracks allocation size
> (and is very expensive to change as the "used" part of an allocation
> grows), so this isn't an unreasonable condition for __counted_by to
> require as well.
> 
> -- 
> Kees Cook
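
A compilable version of the allocated/used pattern described above might look
as follows.  This is only a sketch: the COUNTED_BY fallback macro and the
items_alloc and items_push helpers are invented for the example, and unlike
the snippet in the mail it uses a post-increment so that slot 0 is filled
first and refuses to store once the allocation is full.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumption: use counted_by only where the compiler knows it.  */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define COUNTED_BY(m) __attribute__((counted_by(m)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(m)
#endif

struct items
{
  int allocated;                      /* Fixed at allocation time.  */
  int used;                           /* Grows as data arrives.  */
  int array[] COUNTED_BY(allocated);  /* Bound is the allocation.  */
};

/* Hypothetical helper: allocate room for MAX_ITEMS elements.  */
static struct items *
items_alloc (int max_items)
{
  struct items *p
    = malloc (sizeof (*p) + sizeof (*p->array) * (size_t) max_items);
  if (!p)
    return NULL;
  p->allocated = max_items;
  p->used = 0;
  return p;
}

/* Hypothetical helper: append VALUE, return 0 when full.  */
static int
items_push (struct items *p, int value)
{
  if (p->used >= p->allocated)
    return 0;
  p->array[p->used++] = value;
  return 1;
}
```

Here the attribute only guards against accesses beyond "allocated"; keeping
reads below "used" remains the caller's job, which matches the trade-off
described in the mail.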

Re: [PATCH v9 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces

2023-10-27 Thread Bernhard Reutner-Fischer

On Wed, 25 Oct 2023 16:41:07 +0530
Ajit Agarwal  wrote:

> On 25/10/23 2:19 am, Vineet Gupta wrote:
> > On 10/24/23 13:36, rep.dot@gmail.com wrote:  
> >> As said, I don't see why the below was not cleaned up before the V1 
> >> submission.
> >> Iff it breaks when manually CSEing, I'm curious why?  
> >>>> The function below looks identical in v12 of the patch.
> >>>> Why didn't you use common subexpressions?
> >>>> ba  
> >>> Using CSE here breaks aarch64 regressions hence I have reverted it back
> >>> not to use CSE,  
> >> Just for my own education, can you please paste your patch perusing common 
> >> subexpressions and an assembly diff of the failing versus working aarch64 
> >> testcase, along how you configured that failing (cross-?)compiler and the 
> >> command-line of a typical testcase that broke when manually CSEing the 
> >> function below?  
> > 
> > I was meaning to ask this before, but what exactly is the CSE issue, 
> > manually or whatever.

If nothing else it would hopefully improve the readability.

> >   
> Here is the abi interface where I CSE'D and got a mail from automated 
> regressions run that aarch64
> test fails.

We already concluded that this failure was obviously a hiccup on the
testers, no problem.

> +static inline bool
> +abi_extension_candidate_return_reg_p (int regno)
> +{
> +  return targetm.calls.function_value_regno_p (regno);
> +}

But i was referring to abi_extension_candidate_p :)

your v13 looks like this:

+static bool
+abi_extension_candidate_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  machine_mode dst_mode = GET_MODE (SET_DEST (set));
+  rtx orig_src = XEXP (SET_SRC (set), 0);
+
+  if (!FUNCTION_ARG_REGNO_P (REGNO (orig_src))
+  || abi_extension_candidate_return_reg_p (REGNO (orig_src)))
+return false;
+
+  /* Return FALSE if mode of destination and source is same.  */
+  if (dst_mode == GET_MODE (orig_src))
+return false;
+
+  machine_mode mode = GET_MODE (XEXP (SET_SRC (set), 0));
+  bool promote_p = abi_target_promote_function_mode (mode);
+
+  /* Return FALSE if promote is false and REGNO of source and destination
+ is different.  */
+  if (!promote_p && REGNO (SET_DEST (set)) != REGNO (orig_src))
+return false;
+
+  return true;
+}

and i suppose it would be easier to read if phrased something like

static bool
abi_extension_candidate_p (rtx_insn *insn)
{
  rtx set = single_set (insn);
  rtx orig_src = XEXP (SET_SRC (set), 0);
  unsigned int src_regno = REGNO (orig_src);

  /* Not a function argument reg or is a function values return reg.  */
  if (!FUNCTION_ARG_REGNO_P (src_regno)
  || abi_extension_candidate_return_reg_p (src_regno))
return false;

  rtx dst = SET_DEST (set);
  machine_mode src_mode = GET_MODE (orig_src);

  /* Return FALSE if mode of destination and source is the same.  */
  if (GET_MODE (dst) == src_mode)
return false;

  /* Return FALSE if the FIX THE COMMENT and REGNO of source and destination
 is different.  */
  if (!abi_target_promote_function_mode_p (src_mode)
  && REGNO (dst) != src_regno)
return false;

  return true;
}

so no, that's not exactly better.

Maybe just do what the function comment says (i did not check the "not
promoted" part, but you get the idea):

^L

/* Return TRUE if
   reg source operand is argument register and not return register,
   mode of source and destination operand are different,
   if not promoted REGNO of source and destination operand are the same.  */
static bool
abi_extension_candidate_p (rtx_insn *insn)
{
  rtx set = single_set (insn);
  rtx orig_src = XEXP (SET_SRC (set), 0);

  if (FUNCTION_ARG_REGNO_P (REGNO (orig_src))
  && !abi_extension_candidate_return_reg_p (REGNO (orig_src))
  && GET_MODE (SET_DEST (set)) != GET_MODE (orig_src)
  && abi_target_promote_function_mode_p (GET_MODE (orig_src))
  && REGNO (SET_DEST (set)) == REGNO (orig_src))
return true;

  return false;
}

I think this is much easier to actually read (and that's why good
function comments are important). In the end it's not important and
just personal preference.
Either way, I did not check the plausibility of the logic therein.
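
Since the "not promoted" part was explicitly left unchecked, here is a tiny
standalone C sketch of just that last condition, with the GCC types replaced
by plain booleans and register numbers.  The function names are made up.  It
shows that the early-return test from v13 corresponds to
"promoted OR same regno" when folded into a single expression, which is not
the same predicate as "promoted AND same regno".

```c
#include <assert.h>
#include <stdbool.h>

/* Early-return form, mirroring the last check in v13.  */
static bool
last_check_early_return (bool promote_p, unsigned dst_regno,
                         unsigned src_regno)
{
  if (!promote_p && dst_regno != src_regno)
    return false;
  return true;
}

/* The same check as one expression (De Morgan).  */
static bool
last_check_single_expr (bool promote_p, unsigned dst_regno,
                        unsigned src_regno)
{
  return promote_p || dst_regno == src_regno;
}

/* The conjunction used in the condensed variant above.  */
static bool
last_check_conjunction (bool promote_p, unsigned dst_regno,
                        unsigned src_regno)
{
  return promote_p && dst_regno == src_regno;
}
```

For example, with promotion and two different register numbers the first two
functions accept and the third rejects, so the condensed form would need
"||" (or a reworded comment) to keep the v13 behaviour.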

> 
> 
> I have not done any assembly diff as myself have not cross compiled with 
> aarch64.

fair enough.

[PATCH] RISC-V: Make stack_save_restore_2 more robust

2023-10-27 Thread Patrick O'Neill

GCC recently changed to emit __riscv_restore_5 which causes this
testcase to fail.
This patch updates the regex to be more robust to change by accepting
any number after __riscv_save_ and __riscv_restore_.
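
The effect of the regex change can be tried outside the testsuite with POSIX
extended regular expressions.  This is only an approximation: the testsuite
matches function bodies with Tcl regexps, and the helper name and the
explicit ^...$ anchors below are assumptions of this sketch.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if TEXT matches the POSIX ERE PATTERN.  */
static bool
line_matches (const char *pattern, const char *text)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool ok = regexec (&re, text, 0, NULL, 0) == 0;
  regfree (&re);
  return ok;
}
```

With this helper, "^__riscv_restore_(3|4)$" rejects "__riscv_restore_5"
while "^__riscv_restore_[0-9]+$" accepts it, along with any other numeric
suffix.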

gcc/testsuite/ChangeLog:

* gcc.target/riscv/stack_save_restore_2.c: Accept any number
after __riscv_save_ and __riscv_restore_.

Signed-off-by: Patrick O'Neill 
---
Tested using glibc rv64gc on r14-4980-g2672c60917d.
---
 gcc/testsuite/gcc.target/riscv/stack_save_restore_2.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/stack_save_restore_2.c 
b/gcc/testsuite/gcc.target/riscv/stack_save_restore_2.c
index 4c549cb11ae..5f0389243b1 100644
--- a/gcc/testsuite/gcc.target/riscv/stack_save_restore_2.c
+++ b/gcc/testsuite/gcc.target/riscv/stack_save_restore_2.c
@@ -7,7 +7,7 @@ float getf();
 
 /*
 ** bar:
-** call t0,__riscv_save_(3|4)
+** call t0,__riscv_save_[0-9]+
 ** addi sp,sp,-[0-9]+
 ** ...
 ** li  t0,-[0-9]+
@@ -17,7 +17,7 @@ float getf();
 ** add sp,sp,t0
 ** ...
 ** addi sp,sp,[0-9]+
-** tail __riscv_restore_(3|4)
+** tail __riscv_restore_[0-9]+
 */
 int bar()
 {
-- 
2.34.1

Re: [PATCH] genemit: Split insn-emit.cc into ten files.

After working with Sam off-list (thanks) I managed to get hppa to
build.  Initially it looked as if hppa just had a very small number of
instruction patterns so we wouldn't generate all 10 output files.
However, the actual issue (which only shows up with a low pattern
count) was that we counted all patterns rather than only those that
are actually output.  The wrong pattern count led us to stop writing
output files prematurely.
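
To illustrate the counting issue, here is a minimal sketch of writing
patterns sequentially into a fixed number of files.  This is not the
genemit code; Pattern, emitted and partition are made-up names, and the
rounding scheme is only an assumption.  The point is that the per-file
quota must be derived from the number of patterns that are really
output.  Deriving it from the total makes the quota too large, and with
few patterns the later files are never reached.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a machine-description pattern.
struct Pattern {
  int id;
  bool emitted;  // false for patterns that produce no output
};

// Distribute the emitted patterns sequentially over NFILES buckets.
// Returns one vector of pattern ids per output file.
std::vector<std::vector<int>>
partition (const std::vector<Pattern> &patterns, std::size_t nfiles)
{
  std::vector<std::vector<int>> files (nfiles);
  if (nfiles == 0)
    return files;

  // Count only what will really be written.
  std::size_t count = 0;
  for (const Pattern &p : patterns)
    if (p.emitted)
      ++count;

  // Patterns per file, rounded up, and at least one.
  std::size_t quota = (count + nfiles - 1) / nfiles;
  if (quota == 0)
    quota = 1;

  // Fill file 0 first, then file 1, and so on.
  std::size_t written = 0;
  for (const Pattern &p : patterns)
    if (p.emitted)
      files[written++ / quota].push_back (p.id);

  return files;
}
```

With six patterns of which four are output and two files, the quota is
two per file.  Had the quota been computed from all six, the second file
would receive only one pattern, and with more files some would stay
empty.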

With that corrected, hppa "just works" until I hit linker errors
due to relocations - most likely unrelated:

bin/ld: unwind-dw2-fde-dip_s.o(.data.rel.ro+0): cannot handle
R_PARISC_FPTR64 for __pthread_key_create@@GLIBC_2.34

Attached is v3 that has been bootstrapped and tested on x86 and power10,
aarch64 bootstrap was ok, testsuite is still running.  A riscv build and
testsuite run was successful as well.

Regards
 Robin

From 248744c328440bff9cc339d2bf622852cbaac343 Mon Sep 17 00:00:00 2001
From: Robin Dapp 
Date: Thu, 12 Oct 2023 11:23:26 +0200
Subject: [PATCH v3] genemit: Split insn-emit.cc into several partitions.

On riscv insn-emit.cc has grown to over 1.2 million lines of code and
compiling it takes considerable time.
Therefore, this patch adjusts genemit to create several partitions
(insn-emit-1.cc to insn-emit-n.cc).  The available patterns are
written to the given files in a sequential fashion.

Similar to match.pd a configure option --with-insnemit-partitions=num
is introduced that makes the number of partitions configurable.

gcc/ChangeLog:

PR bootstrap/84402
PR target/111600

* Makefile.in: Handle split insn-emit.cc.
* configure: Regenerate.
* configure.ac: Add --with-insnemit-partitions.
* genemit.cc (output_peephole2_scratches): Print to file instead
of stdout.
(print_code): Ditto.
(gen_rtx_scratch): Ditto.
(gen_exp): Ditto.
(gen_emit_seq): Ditto.
(emit_c_code): Ditto.
(gen_insn): Ditto.
(gen_expand): Ditto.
(gen_split): Ditto.
(output_add_clobbers): Ditto.
(output_added_clobbers_hard_reg_p): Ditto.
(print_overload_arguments): Ditto.
(print_overload_test): Ditto.
(handle_overloaded_code_for): Ditto.
(handle_overloaded_gen): Ditto.
(print_header): New function.
(handle_arg): New function.
(main): Split output into 10 files.
* gensupport.cc (count_patterns): New function.
* gensupport.h (count_patterns): Define.
* read-md.cc (md_reader::print_md_ptr_loc): Add file argument.
* read-md.h (class md_reader): Change definition.
---
 gcc/Makefile.in   |  36 ++-
 gcc/configure |  24 +-
 gcc/configure.ac  |  13 ++
 gcc/genemit.cc| 542 +-
 gcc/gensupport.cc |  55 +
 gcc/gensupport.h  |   1 +
 gcc/read-md.cc|   4 +-
 gcc/read-md.h |   2 +-
 8 files changed, 422 insertions(+), 255 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 91d6bfbea4d..d8bfad8de15 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -236,6 +236,13 @@ GIMPLE_MATCH_PD_SEQ_O = $(patsubst %, gimple-match-%.o, 
$(MATCH_SPLITS_SEQ))
 GENERIC_MATCH_PD_SEQ_SRC = $(patsubst %, generic-match-%.cc, 
$(MATCH_SPLITS_SEQ))
 GENERIC_MATCH_PD_SEQ_O = $(patsubst %, generic-match-%.o, $(MATCH_SPLITS_SEQ))
 
+# The number of splits to be made for the insn-emit files.
+NUM_INSNEMIT_SPLITS = @DEFAULT_INSNEMIT_PARTITIONS@
+INSNEMIT_SPLITS_SEQ = $(wordlist 1,$(NUM_INSNEMIT_SPLITS),$(one_to_))
+INSNEMIT_SEQ_SRC = $(patsubst %, insn-emit-%.cc, $(INSNEMIT_SPLITS_SEQ))
+INSNEMIT_SEQ_TMP = $(patsubst %, tmp-emit-%.cc, $(INSNEMIT_SPLITS_SEQ))
+INSNEMIT_SEQ_O = $(patsubst %, insn-emit-%.o, $(INSNEMIT_SPLITS_SEQ))
+
 # These files are to have specific diagnostics suppressed, or are not to
 # be subject to -Werror:
 # flex output may yield harmless "no previous prototype" warnings
@@ -1356,7 +1363,7 @@ OBJS = \
insn-attrtab.o \
insn-automata.o \
insn-dfatab.o \
-   insn-emit.o \
+   $(INSNEMIT_SEQ_O) \
insn-extract.o \
insn-latencytab.o \
insn-modes.o \
@@ -1857,7 +1864,8 @@ TREECHECKING = @TREECHECKING@
 FULL_DRIVER_NAME=$(target_noncanonical)-gcc-$(version)$(exeext)
 
 MOSTLYCLEANFILES = insn-flags.h insn-config.h insn-codes.h \
- insn-output.cc insn-recog.cc insn-emit.cc insn-extract.cc insn-peep.cc \
+ insn-output.cc insn-recog.cc $(INSNEMIT_SEQ_SRC) \
+ insn-extract.cc insn-peep.cc \
  insn-attr.h insn-attr-common.h insn-attrtab.cc insn-dfatab.cc \
  insn-latencytab.cc insn-opinit.cc insn-opinit.h insn-preds.cc 
insn-constants.h \
  tm-preds.h tm-constrs.h checksum-options $(GIMPLE_MATCH_PD_SEQ_SRC) \
@@ -2489,11 +2497,11 @@ $(common_out_object_file): $(common_out_file)
 # and compile them.
 
 .PRECIOUS: insn-config.h insn-flags.h insn-codes.h insn-constants.h \
-  insn-emit.cc insn-recog.cc insn-extract.cc insn-output.cc insn-peep.cc \
-  insn-attr.h

[PATCH v3 3/3] c++: note other candidates when diagnosing deletedness

With the previous two patches in place, we can now extend our
deletedness diagnostic to note the other considered candidates, e.g.:

  deleted16.C: In function 'int main()':
  deleted16.C:10:4: error: use of deleted function 'void f(int)'
     10 |   f(0);
        |   ~^~~
  deleted16.C:5:6: note: declared here
      5 | void f(int) = delete;
        |      ^
  deleted16.C:5:6: note: candidate: 'void f(int)' (deleted)
  deleted16.C:6:6: note: candidate: 'void f(...)'
      6 | void f(...);
        |      ^
  deleted16.C:7:6: note: candidate: 'void f(int, int)'
      7 | void f(int, int);
        |      ^
  deleted16.C:7:6: note:   candidate expects 2 arguments, 1 provided

These notes are controlled by a new command line flag -fnote-all-cands,
which also controls whether we note ignored candidates more generally.

gcc/ChangeLog:

* doc/invoke.texi (C++ Dialect Options): Document -fnote-all-cands.

gcc/c-family/ChangeLog:

* c.opt: Add -fnote-all-cands.

gcc/cp/ChangeLog:

* call.cc (print_z_candidates): Only print ignored candidates
when -fnote-all-cands is set.
(build_over_call): When diagnosing deletedness, call
print_z_candidates if -fnote-all-cands is set.

gcc/testsuite/ChangeLog:

* g++.dg/overload/error6.C: Pass -fnote-all-cands.
* g++.dg/cpp0x/deleted16.C: New test.
---
 gcc/c-family/c.opt |  4 
 gcc/cp/call.cc |  8 +++-
 gcc/doc/invoke.texi|  5 +
 gcc/testsuite/g++.dg/cpp0x/deleted16.C | 25 +
 gcc/testsuite/g++.dg/overload/error6.C |  1 +
 5 files changed, 42 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/deleted16.C

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 44b9c862c14..a76f73cc661 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -2006,6 +2006,10 @@ fnil-receivers
 ObjC ObjC++ Var(flag_nil_receivers) Init(1)
 Assume that receivers of Objective-C messages may be nil.
 
+fnote-all-cands
+C++ ObjC++ Var(flag_note_all_cands)
+Note all candidates during overload resolution failure.
+
 flocal-ivars
 ObjC ObjC++ Var(flag_local_ivars) Init(1)
 Allow access to instance variables as if they were local declarations within 
instance method implementations.
diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 81cc029dddb..7ace0e65096 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -4090,6 +4090,8 @@ print_z_candidates (location_t loc, struct z_candidate 
*candidates,
 {
   if (only_viable_p.is_true () && candidates->viable != 1)
break;
+  if (ignored_candidate_p (candidates) && !flag_note_all_cands)
+   break;
   print_z_candidate (loc, N_("candidate:"), candidates);
 }
 }
@@ -9933,7 +9935,11 @@ build_over_call (struct z_candidate *cand, int flags, 
tsubst_flags_t complain)
   if (DECL_DELETED_FN (fn))
 {
   if (complain & tf_error)
-   mark_used (fn);
+   {
+ mark_used (fn);
+ if (cand->next && flag_note_all_cands)
+   print_z_candidates (input_location, cand, /*only_viable_p=*/false);
+   }
   return error_mark_node;
 }
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5a9284d635c..ac82299416c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -3479,6 +3479,11 @@ Disable built-in declarations of functions that are not 
mandated by
 ANSI/ISO C@.  These include @code{ffs}, @code{alloca}, @code{_exit},
 @code{index}, @code{bzero}, @code{conjf}, and other related functions.
 
+@opindex fnote-all-cands
+@item -fnote-all-cands
+Permit the C++ front end to note all candidates during overload resolution
+failure, including when a deleted function is selected.
+
 @opindex fnothrow-opt
 @item -fnothrow-opt
 Treat a @code{throw()} exception specification as if it were a
diff --git a/gcc/testsuite/g++.dg/cpp0x/deleted16.C 
b/gcc/testsuite/g++.dg/cpp0x/deleted16.C
new file mode 100644
index 000..506caae76b6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/deleted16.C
@@ -0,0 +1,25 @@
+// Verify -fnote-all-cands causes us to note other candidates when a deleted
+// function is selected by overload resolution.
+// { dg-do compile { target c++11 } }
+// { dg-additional-options "-fnote-all-cands" }
+
+void f(int) = delete; // { dg-message "declared here|candidate" }
+void f(...); // { dg-message "candidate" }
+void f(int, int); // { dg-message "candidate" }
+
+// An example where the perfect candidate optimization causes us
+// to ignore function templates.
+void g(int) = delete; // { dg-message "declared here|candidate" }
+template<class T> void g(T); // { dg-message "candidate" }
+
+// An example where we have a strictly viable candidate and
+// an incompletely considered bad candidate.
+template<class T> void h(T, T) = delete; // { dg-message "declared here|candidate" }
+void h(int*, int) = delete; // { dg-message "candidate" }
+
+int main() {
+  f(0); // { dg-error "deleted" }
+  g(0); // { dg-error

[PATCH v3 2/3] c++: remember candidates that we ignored

During overload resolution, we sometimes outright ignore a function in
the overload set and leave no trace of it in the candidates list, for
example when we find a perfect non-template candidate we discard all
function templates, or when the callee is a template-id we discard all
non-template functions.  We should still however make note of these
non-viable functions when diagnosing overload resolution failure, but
that's not possible if they're not present in the returned candidates
list.

To that end, this patch reworks add_candidates to add such ignored
functions to the list.  The new rr_ignored rejection reason is somewhat
of a catch-all; we could perhaps split it up into more specific rejection
reasons, but I leave that as future work.
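
As an illustration of the two situations mentioned above, here is a
small sketch in plain standard C++.  It is not taken from the patch or
its tests, and the names pick, call_plain and call_template_id are made
up.

```cpp
#include <cassert>

// Non-template overload: an exact match for an int argument.
int pick (int) { return 1; }

// Function template overload with the same name.
template<class T> int pick (T) { return 2; }

// Plain call with an int argument: the non-template is a perfect match
// and wins, so the template does not matter for the result.
int call_plain () { return pick (0); }

// The callee is a template-id, so only the function template can be
// meant and the non-template overload is left out.
int call_template_id () { return pick<int> (0); }
```

call_plain returns 1 and call_template_id returns 2.  In both calls one
of the two overloads plays no role in the outcome, which is the kind of
function the description above refers to as ignored.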

gcc/cp/ChangeLog:

* call.cc (enum rejection_reason_code): Add rr_ignored.
(add_ignored_candidate): Define.
(ignored_candidate_p): Define.
(add_template_candidate_real): Do add_ignored_candidate
instead of returning NULL.
(splice_viable): Put ignored (non-viable) candidates last.
(print_z_candidate): Handle ignored candidates.
(build_new_function_call): Refine shortcut that calls
cp_build_function_call_vec now that non-templates can
appear in the candidate list for a template-id call.
(add_candidates): Replace 'bad_fns' overload with 'bad_cands'
candidate list.  When not considering a candidate, add it
to the list as an ignored candidate.  Add all 'bad_cands'
to the overload set as well.

gcc/testsuite/ChangeLog:

* g++.dg/diagnostic/param-type-mismatch-2.C: Rename template
function test_7 that accidentally (perhaps) shares the same
name as its non-template callee.
* g++.dg/overload/error6.C: New test.
---
 gcc/cp/call.cc| 150 +-
 .../g++.dg/diagnostic/param-type-mismatch-2.C |  20 +--
 gcc/testsuite/g++.dg/overload/error6.C|   9 ++
 3 files changed, 133 insertions(+), 46 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/overload/error6.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 5d175b93a47..81cc029dddb 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -441,7 +441,8 @@ enum rejection_reason_code {
   rr_template_unification,
   rr_invalid_copy,
   rr_inherited_ctor,
-  rr_constraint_failure
+  rr_constraint_failure,
+  rr_ignored,
 };
 
 struct conversion_info {
@@ -2224,6 +2225,35 @@ add_candidate (struct z_candidate **candidates,
   return cand;
 }
 
+/* FN is a function from the overload set that we outright didn't even
+   consider (for some reason); add it to the list as a non-viable "ignored"
+   candidate.  */
+
+static z_candidate *
+add_ignored_candidate (z_candidate **candidates, tree fn)
+{
+  /* No need to dynamically allocate these.  */
+  static const rejection_reason reason_ignored = { rr_ignored, {} };
+
+  struct z_candidate *cand = (struct z_candidate *)
+conversion_obstack_alloc (sizeof (struct z_candidate));
+
+  cand->fn = fn;
+  cand->reason = const_cast<rejection_reason *> (&reason_ignored);
+  cand->next = *candidates;
+  *candidates = cand;
+
+  return cand;
+}
+
+/* True iff CAND is a candidate added by add_ignored_candidate.  */
+
+static bool
+ignored_candidate_p (const z_candidate *cand)
+{
+  return cand->reason && cand->reason->code == rr_ignored;
+}
+
 /* Return the number of remaining arguments in the parameter list
beginning with ARG.  */
 
@@ -3471,7 +3501,7 @@ add_template_candidate_real (struct z_candidate 
**candidates, tree tmpl,
 }
 
   if (len < skip_without_in_chrg)
-return NULL;
+return add_ignored_candidate (candidates, tmpl);
 
   if (DECL_CONSTRUCTOR_P (tmpl) && nargs == 2
   && same_type_ignoring_top_level_qualifiers_p (TREE_TYPE (first_arg),
@@ -3609,7 +3639,7 @@ add_template_candidate_real (struct z_candidate 
**candidates, tree tmpl,
   if (((flags & (LOOKUP_ONLYCONVERTING|LOOKUP_LIST_INIT_CTOR))
== LOOKUP_ONLYCONVERTING)
   && DECL_NONCONVERTING_P (fn))
-return NULL;
+return add_ignored_candidate (candidates, fn);
 
   if (DECL_CONSTRUCTOR_P (fn) && nargs == 2)
 {
@@ -3724,6 +3754,9 @@ splice_viable (struct z_candidate *cands,
   z_candidate *non_viable = nullptr;
   z_candidate **non_viable_tail = &non_viable;
 
+  z_candidate *non_viable_ignored = nullptr;
+  z_candidate **non_viable_ignored_tail = &non_viable_ignored;
+
   /* Be strict inside templates, since build_over_call won't actually
  do the conversions to get pedwarns.  */
   if (processing_template_decl)
@@ -3742,6 +3775,7 @@ splice_viable (struct z_candidate *cands,
 its viability.  */
   auto& tail = (cand->viable == 1 ? strictly_viable_tail
: cand->viable == -1 ? non_strictly_viable_tail
+   : ignored_candidate_p (cand) ? non_viable_ignored_tail
: non_viable_tail);
   *tail = cand;
   tail = &cand->next;
@@ -3751,7 +3785,8 @@ splice_viable (struct

[PATCH v3 1/3] c++: sort candidates according to viability

New in patch 1/3:
  * consistently use "non-viable" instead of "unviable"
throughout
  * make 'champ' and 'challenger' in 'tourney' be z_candidate**
to simplify moving 'champ' to the front of the list.  drive-by
cleanups in tourney, including renaming 'champ_compared_to_predecessor'
to 'previous_worse_champ' for clarity.
New in patch 2/3:
  * consistently use "non-viable" instead of "unviable" throughout
New in patch 3/3:
  * introduce new -fnote-all-cands flag that controls noting other
candidates when diagnosing deletedness, and also controls
noting "ignored" candidates in general.

-- >8 --

This patch:

  * changes splice_viable to move the non-viable candidates to the end
of the list instead of removing them outright
  * makes tourney move the best candidate to the front of the candidate
list
  * adjusts print_z_candidates to preserve our behavior of printing only
viable candidates when diagnosing ambiguity
  * adds a parameter to print_z_candidates to control this default behavior
(the follow-up patch will want to print all candidates when diagnosing
deletedness)

Thus after this patch we have access to the entire candidate list through
the best viable candidate.
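
The "sort instead of remove" idea can be sketched on a plain singly
linked list as follows.  This is only an illustration under simplified
assumptions, not the call.cc code: Cand and sort_by_viability are
made-up names, and the viability encoding (1, -1, 0) is assumed here for
the example.

```cpp
#include <cassert>

// Hypothetical stand-in for GCC's z_candidate.
struct Cand {
  int id;
  int viable;  // 1: strictly viable, -1: non-strictly viable, 0: non-viable
  Cand *next;
};

// Reorder the list: strictly viable first, then non-strictly viable,
// then non-viable.  Relative order within each group is preserved and
// no candidate is dropped.  Returns the new head.
Cand *
sort_by_viability (Cand *cands)
{
  Cand *strict = nullptr, *loose = nullptr, *bad = nullptr;
  Cand **strict_tail = &strict, **loose_tail = &loose, **bad_tail = &bad;

  while (cands)
    {
      Cand *cand = cands;
      cands = cands->next;
      cand->next = nullptr;

      // Pick the tail pointer of the group this candidate belongs to.
      Cand **&tail = (cand->viable == 1 ? strict_tail
                      : cand->viable == -1 ? loose_tail
                      : bad_tail);
      *tail = cand;
      tail = &cand->next;
    }

  // Chain the three groups together, last link first, so that an empty
  // middle group is skipped correctly.
  *loose_tail = bad;
  *strict_tail = loose;
  return strict;
}
```

Because nothing is unlinked, a caller holding the head of the sorted
list can still walk to every candidate, including the non-viable ones.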

This change also happens to fix diagnostics for the below testcase where
we currently neglect to note the third candidate, since the presence of
the two unordered non-strictly viable candidates causes splice_viable to
prematurely get rid of the non-viable third candidate.

gcc/cp/ChangeLog:

* call.cc: Include "tristate.h".
(splice_viable): Sort the candidate list according to viability.
Don't remove non-viable candidates from the list.
(print_z_candidates): Add defaulted only_viable_p parameter.
By default only print non-viable candidates if there is no
viable candidate.
(tourney): Make 'candidates' parameter a reference.  Ignore
non-viable candidates.  Move the true champ to the front
of the candidates list, and update 'candidates' to point to
the front.  Drive-by cleanups, including renaming
'champ_compared_to_predecessor' to 'previous_worse_champ'.

gcc/testsuite/ChangeLog:

* g++.dg/overload/error5.C: New test.
---
 gcc/cp/call.cc | 181 ++---
 gcc/testsuite/g++.dg/overload/error5.C |  12 ++
 2 files changed, 117 insertions(+), 76 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/overload/error5.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 2eb54b5b6ed..5d175b93a47 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "attribs.h"
 #include "decl.h"
 #include "gcc-rich-location.h"
+#include "tristate.h"
 
 /* The various kinds of conversion.  */
 
@@ -160,7 +161,7 @@ static struct obstack conversion_obstack;
 static bool conversion_obstack_initialized;
 struct rejection_reason;
 
-static struct z_candidate * tourney (struct z_candidate *, tsubst_flags_t);
+static struct z_candidate * tourney (struct z_candidate *&, tsubst_flags_t);
 static int equal_functions (tree, tree);
 static int joust (struct z_candidate *, struct z_candidate *, bool,
  tsubst_flags_t);
@@ -176,7 +177,8 @@ static void op_error (const op_location_t &, enum 
tree_code, enum tree_code,
 static struct z_candidate *build_user_type_conversion_1 (tree, tree, int,
 tsubst_flags_t);
 static void print_z_candidate (location_t, const char *, struct z_candidate *);
-static void print_z_candidates (location_t, struct z_candidate *);
+static void print_z_candidates (location_t, struct z_candidate *,
+   tristate = tristate::unknown ());
 static tree build_this (tree);
 static struct z_candidate *splice_viable (struct z_candidate *, bool, bool *);
 static bool any_strictly_viable (struct z_candidate *);
@@ -3700,68 +3702,60 @@ add_template_conv_candidate (struct z_candidate 
**candidates, tree tmpl,
 }
 
 /* The CANDS are the set of candidates that were considered for
-   overload resolution.  Return the set of viable candidates, or CANDS
-   if none are viable.  If any of the candidates were viable, set
+   overload resolution.  Sort CANDS so that the strictly viable
+   candidates appear first, followed by non-strictly viable candidates,
+   followed by non-viable candidates.  Returns the first candidate
+   in this sorted list.  If any of the candidates were viable, set
*ANY_VIABLE_P to true.  STRICT_P is true if a candidate should be
-   considered viable only if it is strictly viable.  */
+   considered viable only if it is strictly viable when setting
+   *ANY_VIABLE_P.  */
 
 static struct z_candidate*
 splice_viable (struct z_candidate *cands,
   bool strict_p,
   bool *any_viable_p)
 {
-  struct z_candidate *viable;
-  struct z_candidate **last_viable;
-  struct z_candidate

Re: [PATCH] RISC-V: Fix wrong tune parameters on int_div

2023-10-27 Thread Andrew Waterman

On Fri, Oct 27, 2023 at 6:44 AM Jeff Law  wrote:
>
>
>
> On 10/27/23 01:37, juzhe.zh...@rivai.ai wrote:
> > LGTM from my side.
> >
> > The original integer division COST seems too low.
> Almost certainly, though there may be good reasons why it was initially
> set so low.  I'm generally hesitant to change things like that without
> either someone with knowledge of the code/uarch stepping in with a
> recommendation or some kind of analysis showing they're wrong.

It is probably just a bug, as the cores I'm familiar with mentioned in
the original post do benefit from converting division by constant into
multiplication.  (We should make sure we continue emitting a division
instruction when optimizing for size, though.)
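
For readers unfamiliar with the transformation, a minimal sketch of
division by a constant done with a multiply and a shift, here unsigned
32-bit division by 10.  The function name div10 is made up; the constant
0xCCCCCCCD is ceil(2^35 / 10).  Whether a compiler picks this sequence
depends on how expensive it believes division is relative to
multiplication, which is what the tune parameter expresses.

```cpp
#include <cassert>
#include <cstdint>

// x / 10 for any 32-bit unsigned x, without a division instruction:
// multiply by ceil(2^35 / 10) in 64 bits and keep the bits above 2^35.
std::uint32_t
div10 (std::uint32_t x)
{
  std::uint64_t product = static_cast<std::uint64_t> (x) * 0xCCCCCCCDull;
  return static_cast<std::uint32_t> (product >> 35);
}
```

When optimizing for size, a single division instruction is shorter than
loading the constant, multiplying and shifting, hence the remark above.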

>
> >
> > Hi, Jeff and Kito. Could take a look at this patch ?
> It's on the list.
>
> jeff

[pushed] [RA]: Fixing i686 bootstrap failure because of pushing the equivalence patch

2023-10-27 Thread Vladimir Makarov

The following patch fixes the i686 bootstrap failure caused by my recent
patch:


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112107

commit 7361b49d7fa3624cb3f1f825a22607d9d31986e5
Author: Vladimir N. Makarov 
Date:   Fri Oct 27 14:50:40 2023 -0400

[RA]: Fixing i686 bootstrap failure because of pushing the equivalence patch

With my recent patch improving cost calculation for pseudos with
equivalences, GCC may generate different code with and without debug
info, and as a result bootstrap fails on i686.  This patch fixes the
bug.

gcc/ChangeLog:

PR rtl-optimization/112107
* ira-costs.cc (calculate_equiv_gains): Use NONDEBUG_INSN_P
instead of INSN_P.

diff --git a/gcc/ira-costs.cc b/gcc/ira-costs.cc
index c4086807076..50f80779025 100644
--- a/gcc/ira-costs.cc
+++ b/gcc/ira-costs.cc
@@ -1871,7 +1871,8 @@ calculate_equiv_gains (void)
 	= ira_bb_nodes[bb->index].parent->regno_allocno_map;
   FOR_BB_INSNS (bb, insn)
 	{
-	  if (!INSN_P (insn) || !get_equiv_regno (PATTERN (insn), regno, subreg)
+	  if (!NONDEBUG_INSN_P (insn)
+	  || !get_equiv_regno (PATTERN (insn), regno, subreg)
 	  || !bitmap_bit_p (&equiv_pseudos, regno))
 	continue;
 	  rtx subst = ira_reg_equiv[regno].memory;

Re: [PATCH v4] bpf: Improvements in CO-RE builtins implementation.

2023-10-27 Thread Cupertino Miranda



Hi David,

David Faust writes:

> On 10/26/23 08:08, Cupertino Miranda wrote:
>>
>> Changes from v1:
>>  - Fixed Davids remarks on initial patch.
>>  - Fixed mistake with deleted '*'.
>>
>> Changes from v2:
>>  - Reversed return value for bpf_const_not_ok_for_debug_p function.
>
> Hmm..
>
>> +static bool
>> +bpf_const_not_ok_for_debug_p (rtx p)
>> +{
>> +  if (GET_CODE (p) == UNSPEC
>> +  && XINT (p, 1) == UNSPEC_CORE_RELOC)
>> +return false;
>> +
>> +  return true;
>> +}
>> +
>> +#undef TARGET_CONST_NOT_OK_FOR_DEBUG_P
>> +#define TARGET_CONST_NOT_OK_FOR_DEBUG_P bpf_const_not_ok_for_debug_p
>
>  -- Target Hook: bool TARGET_CONST_NOT_OK_FOR_DEBUG_P (rtx X)
>  This hook should return true if X should not be emitted into debug
>  sections.
>
> As written now, won't this cause all ordinary (non-UNSPEC_CORE_RELOC)
> consts to get rejected for debug? ("regular" debug i.e. DWARF, not to
> be confused with the BTF.ext holding CO-RE relocs).
>
> I see other targets implementing the hook returning true only in
> specific cases and false otherwise.  The implementation in v1 makes
> more sense to me.  Could you explain why flip the return value?
It turns out that defining this hook is not the proper solution.
I am trying a different approach which I believe is better.

Thanks,
Cupertino

>>
>> Changes from v3:
>>  - Fixed ICE in two bpf-next tests:
>>  -  if (!wi->is_lhs)
>>  -   core_mark_as_access_index (gimple_get_lhs (wi->stmt));
>>  +  tree lhs;
>>  +  if (!wi->is_lhs
>>  + && (lhs = gimple_get_lhs (wi->stmt)) != NULL_TREE)
>>  +   core_mark_as_access_index (lhs);
>>

[wwwdocs] Get newlib via git in simtest-howto.html

2023-10-27 Thread Roger Sayle


A minor tweak to the documentation, to use git rather than cvs to obtain
the latest version of newlib.  Ok for mainline?


2023-10-27  Roger Sayle  

* htdocs/simtest-howto.html: Use git to obtain newlib.

Cheers,
Roger
--

diff --git a/htdocs/simtest-howto.html b/htdocs/simtest-howto.html
index 2e54476b..d9c027fd 100644
--- a/htdocs/simtest-howto.html
+++ b/htdocs/simtest-howto.html
@@ -59,9 +59,7 @@ contrib/gcc_update --touch
 
 
 cd ${TOP}
-cvs -d :pserver:anon...@sourceware.org:/cvs/src login
-# You will be prompted for a password; reply with "anoncvs".
-cvs -d :pserver:anon...@sourceware.org:/cvs/src co newlib
+git clone https://sourceware.org/git/newlib-cygwin.git newlib
 
 
 Check out the sim and binutils tree:

[PATCH v2] aarch64: SVE/NEON Bridging intrinsics

2023-10-27 Thread Richard Ball

ACLE has added intrinsics to bridge between SVE and Neon.

The NEON_SVE Bridge adds intrinsics that allow conversions between NEON and
SVE vectors.

This patch adds support to GCC for the following 3 intrinsics:
svset_neonq, svget_neonq and svdup_neonq

gcc/ChangeLog:

* config.gcc: Adds new header to config.
* config/aarch64/aarch64-builtins.cc (enum aarch64_type_qualifiers):
Moved to header file.
(ENTRY): Likewise.
(enum aarch64_simd_type): Likewise.
(struct aarch64_simd_type_info): Make extern.
(GTY): Likewise.
* config/aarch64/aarch64-c.cc (aarch64_pragma_aarch64):
Defines pragma for arm_neon_sve_bridge.h.
* config/aarch64/aarch64-protos.h: New function.
* config/aarch64/aarch64-sve-builtins-base.h: New intrinsics.
* config/aarch64/aarch64-sve-builtins-base.cc
(class svget_neonq_impl): New intrinsic implementation.
(class svset_neonq_impl): Likewise.
(class svdup_neonq_impl): Likewise.
(NEON_SVE_BRIDGE_FUNCTION): New intrinsics.
* config/aarch64/aarch64-sve-builtins-functions.h
(NEON_SVE_BRIDGE_FUNCTION): Defines macro for NEON_SVE_BRIDGE
functions.
* config/aarch64/aarch64-sve-builtins-shapes.h: New shapes.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(parse_element_type): Add NEON element types.
(parse_type): Likewise.
(struct get_neonq_def): Defines function shape for get_neonq.
(struct set_neonq_def): Defines function shape for set_neonq.
(struct dup_neonq_def): Defines function shape for dup_neonq.
* config/aarch64/aarch64-sve-builtins.cc (DEF_SVE_TYPE_SUFFIX):
(DEF_SVE_NEON_TYPE_SUFFIX): Defines 
macro for NEON_SVE_BRIDGE type suffixes.
(DEF_NEON_SVE_FUNCTION): Defines 
macro for NEON_SVE_BRIDGE functions.
(function_resolver::infer_neon128_vector_type): Infers type suffix
for overloaded functions.
(init_neon_sve_builtins): Initialise neon_sve_bridge_builtins for LTO.
(handle_arm_neon_sve_bridge_h): Handles #pragma arm_neon_sve_bridge.h.
* config/aarch64/aarch64-sve-builtins.def
(DEF_SVE_NEON_TYPE_SUFFIX): Macro for handling neon_sve type suffixes.
(bf16): Replace entry with neon-sve entry.
(f16): Likewise.
(f32): Likewise.
(f64): Likewise.
(s8): Likewise.
(s16): Likewise.
(s32): Likewise.
(s64): Likewise.
(u8): Likewise.
(u16): Likewise.
(u32): Likewise.
(u64): Likewise.
* config/aarch64/aarch64-sve-builtins.h
(GCC_AARCH64_SVE_BUILTINS_H): Include aarch64-builtins.h.
(ENTRY): Add aarch64_simd_type definition.
(enum aarch64_simd_type): Add neon information to type_suffix_info.
(struct type_suffix_info): New function.
* config/aarch64/aarch64-sve.md
(@aarch64_sve_get_neonq_): New intrinsic insn for big endian.
(@aarch64_sve_set_neonq_): Likewise.
(@aarch64_sve_dup_neonq_): Likewise.
* config/aarch64/aarch64.cc (aarch64_init_builtins):
Add call to init_neon_sve_builtins.
* config/aarch64/iterators.md: Add UNSPEC_SET_NEONQ.
* config/aarch64/aarch64-builtins.h: New file.
* config/aarch64/aarch64-neon-sve-bridge-builtins.def: New file.
* config/aarch64/arm_neon_sve_bridge.h: New file.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Add include 
arm_neon_sve_bridge header file
* gcc.dg/torture/neon-sve-bridge.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_bf16.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_f16.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_f32.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_f64.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_s16.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_s32.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_s64.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_s8.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u16.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u32.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u64.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u8.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_bf16.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_f16.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_f32.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_f64.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_s16.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_s32.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_s64.c: New test.
*

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-27 Thread Kees Cook

On Fri, Oct 27, 2023 at 03:10:22PM +, Qing Zhao wrote:
> Since  the dynamic array support is quite important to the kernel (is this 
> true, Kees? ),
> We might need to include such support into our design in the beginning. 

tl;dr: We don't need "dynamic array support" in the 1st version of __counted_by

I'm not sure it's as strong as "quite important", but it is a code
pattern that exists. The vast majority of FAM usage is run-time fixed,
in the sense that the allocation matches the usage. Only sometimes do we
over-allocate and then slowly fill it up like I've shown.

So really my thoughts on this are to bring light to the usage pattern
in the hopes that we don't make it an impossible thing to do. And if
it's a limitation of the initial version of __counted_by, the kernel can
still use it: it will just need to use __counted_by strictly for
allocation sizes, not "usage" size:

struct foo {
	int allocated;
	int used;
	int array[] __counted_by(allocated); // would be nice to use "used"
};

struct foo *p;

p = alloc(sizeof(*p) + sizeof(*p->array) * max_items);
p->allocated = max_items;
p->used = 0;

while (data_available())
	p->array[p->used++] = next_datum();

With this, we'll still catch p->array accesses beyond "allocated",
but other code in the kernel won't catch "invalid data" accesses for
p->array beyond "used". (i.e. we still have memory corruption protection,
just not logic error protection.)

We can deal with aliasing in the future if we want to expand to catching
logic errors.

I should note that we don't get logic error protection from things like
ARM's Memory Tagging Extension either -- it only tracks allocation size
(and is very expensive to change as the "used" part of an allocation
grows), so this isn't an unreasonable condition for __counted_by to
require as well.

-- 
Kees Cook

Re: [committed] RISC-V: Make stack_save_restore tests more robust

2023-10-27 Thread Patrick O'Neill



On 8/25/23 15:36, Jeff Law wrote:
Spurred by Jivan's patch and a desire for cleaner testresults, I went 
ahead and made the stack_save_restore tests independent of the precise 
stack size by using a regexp.


Pushed to the trunk.

Jeff


Hi Jeff, A recent change that I'm still bisecting [1] caused 
stack_save_restore_2.c to start failing.


Debug log:

Executing on host: 
/github/patrick-postcommit-runner-1/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/xgcc
 
-B/github/patrick-postcommit-runner-1/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/
  
/github/patrick-postcommit-runner-1/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/stack_save_restore_2.c
  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output
-O0  -march=rv32imafc -mabi=ilp32f -msave-restore -O2 -fno-schedule-insns 
-fno-schedule-insns2 -fno-unroll-loops -fno-peel-loops -fno-lto -S   -o 
stack_save_restore_2.s(timeout = 600)
spawn -ignore SIGHUP 
/github/patrick-postcommit-runner-1/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/xgcc
 
-B/github/patrick-postcommit-runner-1/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/
 
/github/patrick-postcommit-runner-1/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/stack_save_restore_2.c
 -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -O0 
-march=rv32imafc -mabi=ilp32f -msave-restore -O2 -fno-schedule-insns 
-fno-schedule-insns2 -fno-unroll-loops -fno-peel-loops -fno-lto -S -o 
stack_save_restore_2.s
PASS: gcc.target/riscv/stack_save_restore_2.c   -O0  (test for excess errors)
body: \tcall  t0,__riscv_save_(3|4)
\taddi  sp,sp,-[0-9]+
.*\tli  t0,-[0-9]+
\tadd   sp,sp,t0
.*\tli  t0,[0-9]+
\tadd   sp,sp,t0
.*\taddi    sp,sp,[0-9]+
\ttail  __riscv_restore_(3|4)

against: call    t0,__riscv_save_5
addi    sp,sp,-2016
fsw fs0,2012(sp)
fsw fs1,2008(sp)
fsw fs2,2004(sp)
fsw fs3,2000(sp)
fsw fs4,1996(sp)
li  t0,-12288
add sp,sp,t0
call    getf
fmv.s   fs1,fa0
call    getf
fmv.s   fs4,fa0
call    getf
fmv.s   fs3,fa0
call    getf
fmv.s   fs2,fa0
li  s0,0
fmv.s.x fs0,zero
lui a5,%hi(.LC0)
lw  s2,%lo(.LC0)(a5)
lw  s3,%lo(.LC0+4)(a5)
addi    s4,sp,1984
li  s1,4096
addi    s1,s1,-528
call    my_getchar
call    __floatsidf
mv  a2,s2
mv  a3,s3
call    __muldf3
call    __truncdfsf2
slli    a5,s0,2
add a5,s4,a5
fsw fa0,-1984(a5)
flw fa5,-1984(a5)
fadd.s  fs0,fs0,fa5
addi    s0,s0,1
bne s0,s1,.L2
fadd.s  fa5,fs1,fs0
fadd.s  fa5,fa5,fs4
fadd.s  fa5,fa5,fs3
fadd.s  fa5,fa5,fs2
fcvt.w.s a0,fa5,rtz
li  t0,12288
add sp,sp,t0
flw fs0,2012(sp)
flw fs1,2008(sp)
flw fs2,2004(sp)
flw fs3,2000(sp)
flw fs4,1996(sp)
addi    sp,sp,2016
tail    __riscv_restore_5

FAIL: gcc.target/riscv/stack_save_restore_2.c   -O0   check-function-bodies bar

It looks like the issue is that your regex matches
__riscv_save_(3|4) where now gcc emits __riscv_restore_5.

Would it be OK to update the regex to also accept 5 (& are we going to 
bump into this again)?
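
As a rough illustration of why the alternation is fragile, here is a small sketch using POSIX extended regular expressions. This is an assumption-laden stand-in: check-function-bodies uses Tcl regexps, not `regcomp`, and the helper name `line_matches` is hypothetical. It only shows that `(3|4)` rejects `__riscv_save_5` while a digit class accepts it.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if `line` matches the POSIX ERE `pattern`. */
bool line_matches(const char *pattern, const char *line)
{
    regex_t re;

    /* REG_NOSUB: only a yes/no answer is needed, no capture groups. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;

    bool ok = (regexec(&re, line, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

With this helper, `line_matches("__riscv_save_(3|4)$", "call    t0,__riscv_save_5")` is false, whereas the patterns `__riscv_save_(3|4|5)$` and `__riscv_save_[0-9]+$` both match that line; the digit class also keeps matching if the register count changes again.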


diff --git a/gcc/testsuite/gcc.target/riscv/stack_save_restore_2.c 
b/gcc/testsuite/gcc.target/riscv/stack_save_restore_2.c

index 4c549cb11ae..bc95736cf8e 100644
--- a/gcc/testsuite/gcc.target/riscv/stack_save_restore_2.c
+++ b/gcc/testsuite/gcc.target/riscv/stack_save_restore_2.c
@@ -7,7 +7,7 @@ float getf();

 /*
 ** bar:
-** call    t0,__riscv_save_(3|4)
+** call    t0,__riscv_save_(3|4|5)
 ** addi    sp,sp,-[0-9]+
 ** ...
 ** li  t0,-[0-9]+
@@ -17,7 +17,7 @@ float getf();
 ** add sp,sp,t0
 ** ...
 ** addi    sp,sp,[0-9]+
-** tail    __riscv_restore_(3|4)
+** tail    __riscv_restore_(3|4|5)
 */
 int bar()
 {

If we're going to run into this again, it might make sense to allow a 
wider range of numbers (up to 9):


diff --git a/gcc/testsuite/gcc.target/riscv/stack_save_restore_2.c 
b/gcc/testsuite/gcc.target/riscv/stack_save_restore_2.c

index 4c549cb11ae..1d5b950130e 100644
--- a/gcc/testsuite/gcc.target/riscv/stack_save_restore_2.c
+++ b/gcc/testsuite/gcc.target/riscv/stack_save_restore_2.c
@@ -7,7 +7,7 @@ float getf();

 /*
 ** bar:
-** call    t0,__riscv_save_(3|4)
+** call    t0,__riscv_save_([3-9])
 ** addi    sp,sp,-[0-9]+
 ** ...
 ** li  t0,-[0-9]+
@@ -17,7 +17,7 @@

Re: [PATCH] RISC-V: Fix wrong tune parameters on int_div

2023-10-27 Thread Andrew Waterman

On Fri, Oct 27, 2023 at 6:55 AM Jeff Law  wrote:
>
>
>
> On 10/27/23 01:49, Robin Dapp wrote:
> >> @@ -346,7 +346,7 @@ static const struct riscv_tune_param rocket_tune_info 
> >> = {
> >> {COSTS_N_INSNS (4), COSTS_N_INSNS (5)},  /* fp_mul */
> >> {COSTS_N_INSNS (20), COSTS_N_INSNS (20)},/* fp_div */
> >> {COSTS_N_INSNS (4), COSTS_N_INSNS (4)},  /* int_mul */
> >> -  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)},   /* int_div */
> >> +  {COSTS_N_INSNS (33), COSTS_N_INSNS (65)}, /* int_div */
> >> 1,   /* issue_rate */
> >> 3,   /* branch_cost */
> >> 5,   /* memory_cost */
> >> @@ -361,7 +361,7 @@ static const struct riscv_tune_param 
> >> sifive_7_tune_info = {
> >> {COSTS_N_INSNS (4), COSTS_N_INSNS (5)},  /* fp_mul */
> >> {COSTS_N_INSNS (20), COSTS_N_INSNS (20)},/* fp_div */
> >> {COSTS_N_INSNS (4), COSTS_N_INSNS (4)},  /* int_mul */
> >> -  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)},   /* int_div */
> >> +  {COSTS_N_INSNS (33), COSTS_N_INSNS (65)}, /* int_div */
> >> 2,   /* issue_rate */
> >> 4,   /* branch_cost */
> >> 3,   /* memory_cost */
> >> @@ -376,7 +376,7 @@ static const struct riscv_tune_param 
> >> thead_c906_tune_info = {
> >> {COSTS_N_INSNS (4), COSTS_N_INSNS (5)}, /* fp_mul */
> >> {COSTS_N_INSNS (20), COSTS_N_INSNS (20)}, /* fp_div */
> >> {COSTS_N_INSNS (4), COSTS_N_INSNS (4)}, /* int_mul */
> >> -  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)}, /* int_div */
> >> +  {COSTS_N_INSNS (18), COSTS_N_INSNS (34)}, /* int_div */
> >> 1,/* issue_rate */
> >> 3,/* branch_cost */
> >> 5,/* memory_cost */
> >
> > Instruction costs don't really correspond to latencies even though
> > sometimes they are used as if they were.  I'm a bit wary of using
> > e.g. 65 which would disparage each use of an integer division inside
> > a sequence.
> >
> > Could you check which costs we need in order to still emit your wanted
> > sequence?  Maybe we can use values a bit lower than yours and still
> > get the proper code.  Where is the decision being made actually?
> The main use of costing of a div/mod instruction is to guide the
> reciprocal division code when dividing by a constant.  In that context
> we're comparing costs against a sequence of multiplies, shifts, add/sub
> insns which are almost always costed by their latency.  So using latency
> for division is a reasonable place to start.
>
> The other thing that might be worth investigating for those processors
> would be to set "use_divmod_expansion" in the cost structure.  I've
> heard talk of fusing div/mod into divmod, though I'm not aware of any
> part implementing that fusion

I'm also unaware of existing implementations that fuse these
operations; div + mul + sub is probably best for most uarches...

> (from a prior life, that would seem to
> require a 2nd output port on the integer unit which could be highly
> undesirable).

...but it can be done more cheaply than this, so I wouldn't foreclose
on the possibility.  Nevertheless, future work, as you say.

> Anyway, this could be a followup item for Yangyu if it
> looks profitable.
>
> jeff
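
For readers unfamiliar with the reciprocal-division trade-off discussed above, the following is a hedged sketch of the two code shapes the int_div cost arbitrates between, for an unsigned 32-bit division by the constant 10. The function names are hypothetical, and the magic constant is the standard ceil(2^35 / 10); which form GCC actually picks depends on the tune costs and is not shown here.

```c
#include <stdint.h>

/* Form 1: let the hardware divider do the work (one long-latency div). */
uint32_t div10_hw(uint32_t x)
{
    return x / 10u;
}

/* Form 2: multiply by a precomputed reciprocal and shift.
   0xCCCCCCCD == ceil(2^35 / 10), which is exact for every 32-bit x. */
uint32_t div10_mul(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

/* Remainder from the quotient: the "div + mul + sub" shape, here with
   the multiply-based quotient instead of a second divide. */
uint32_t mod10_mul(uint32_t x)
{
    return x - div10_mul(x) * 10u;
}
```

If int_div is costed cheaper than the multiply-and-shift sequence, the first form wins even on cores where the divider takes tens of cycles, which is the situation the patch aims to correct.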

Re: [PATCH] RISC-V: Make stack_save_restore_2 more robust





On 10/27/23 11:56, Patrick O'Neill wrote:

GCC recently changed to emit __riscv_restore_5 which causes this
testcase to fail.
This patch updates the regex to be more robust to change by accepting
any number after __riscv_save_ and __riscv_restore_.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/stack_save_restore_2.c: Accept any number
after __riscv_save_ and __riscv_restore_.

OK
jeff

Re: [PATCH htdocs v3] bugs: Mention -D_GLIBCXX_ASSERTIONS and -D_GLIBCXX_DEBUG

2023-10-27 Thread Jonathan Wakely

On Thu, 26 Oct 2023 at 23:40, Gerald Pfeifer  wrote:
>
> On Thu, 26 Oct 2023, Sam James wrote:
> > These options both enable more checking within the C++ standard library
> > and can expose errors in submitted code.
>
> This is a good addition, thank you! I was going to approve/push, but it's
> probably better for Jonathan to give the final okay.

LGTM, I'll push it.

>
> Just one question:
>
> > +... If either of these fail, this is a strong indicator
> > +of an error in your code.
>
> What does "fails" mean in this context? Are we looking at build failures?
> Run-time failures?

Run-time. Defining those macros should not cause build failures,
although if they do that's probably a bug in the user code too.

Re: [PATCH v2] gcc.c-torture/execute/builtins/fputs.c: fputs_unlocked prototype





On 10/23/23 02:37, Florian Weimer wrote:

Current glibc headers only declare fputs_unlocked for _GNU_SOURCE,
so define it to obtain an official prototype.

Add a fallback prototype declaration for other systems that do not
have fputs_unlocked.  This seems to be the most straightforward approach
to avoid an implicit function declaration, without reducing test
coverage and introducing ongoing maintenance requirements (e.g.,
FreeBSD added fputs_unlocked support fairly recently).

gcc/testsuite/

* gcc.c-torture/execute/builtins/fputs.c (_GNU_SOURCE):
Define.
(fputs_unlocked): Declare.

This approach is fine too.  OK.
jeff

Re: [PATCH v1] RISC-V: Remove unnecessary asm check for vec cvt





On 10/23/23 03:54, pan2...@intel.com wrote:

From: Pan Li 

The vsetvl asm check is unnecessary for the vector convert tests. We
should focus on the constraint and leave the vsetvl test to the
vsetvl pass.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/cvt-0.c: Remove the vsetvl
asm check from func body.
* gcc.target/riscv/rvv/autovec/unop/cvt-1.c: Ditto.

OK
jeff

Re: [PATCH] A new copy propagation and PHI elimination pass





On 10/20/23 07:52, Filip Kastl wrote:

On Fri 2023-10-20 15:50:25, Filip Kastl wrote:

Bootstrapped and tested* on x86_64-pc-linux-gnu.

* One testcase (pr79691.c) did regress. However that is because the test is
dependent on a certain variable not being copy propagated. I will go into more
detail about this in a reply to this mail.


This testcase checks for the string '= 9' being present in the tree-optimized
gimple dump ({ dg-final { scan-tree-dump " = 9;" "optimized" } }). This is what
the relevant place in the dump looks like without my patch:

int f4 (int i)
{
   int _6;

[local count: 1073741824]:
   _6 = 9;
   return _6;

}

Note that '= 9' is indeed present but there is an opportunity for copy
propagation. With my patch, the copy propagation happens:

int f4 (int i)
{
   int _6;

[local count: 1073741824]:
   return 9;

}

Which means no '= 9' is present and therefore the test fails.

What should I do? I don't suppose that changing the testcase to search for just
'9' would be wise since the dump may contain other '9's. I could change it to
search for 'return 9'. That would make it dependent on some copy propagation
being run late enough. However it is currently dependent on *no* copy
propagation being run late in the compilation. Also, if the test would search
for 'return 9', it would search for the most optimized version of the function
f4.

Or maybe searching for '9;' would work.
So in general you have to go back and try to assess the original intent 
of the test.  Once you have the original intent, the path forward is 
often clear.


In this specific case the source is:
+/* Verify -fprintf-return-value results used for constant propagation.  */
+int f4 (int i)
+{
+  int n1 = __builtin_snprintf (0, 0, "%i", 1234);
+  int n2 = __builtin_snprintf (0, 0, "%i", 12345);
+  return n1 + n2;
+}

And the intent of the test is to verify that we get constants from the 
snprintf calls and that they in turn simplify to a constant.


That is certainly still the case after your patch, just the form of the 
output is different (the constant is propagated further).  So I think 
testing for "return 9" would be the right approach here.


jeff
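
A sketch of how the adjusted test could look, under stated assumptions: the function body is the one quoted above, but the dg-do and dg-options lines are guesses at a plausible header rather than the test's actual directives, and only the scan pattern reflects the suggestion to look for "return 9".

```c
/* { dg-do compile } */
/* { dg-options "-O2 -fprintf-return-value -fdump-tree-optimized" } */

/* Verify -fprintf-return-value results used for constant propagation.  */
int f4 (int i)
{
  /* "1234" has 4 characters and "12345" has 5, so the sum folds to 9.  */
  int n1 = __builtin_snprintf (0, 0, "%i", 1234);
  int n2 = __builtin_snprintf (0, 0, "%i", 12345);
  return n1 + n2;
}

/* Match the fully propagated form rather than the "_6 = 9;" temporary.  */
/* { dg-final { scan-tree-dump "return 9;" "optimized" } } */
```

Scanning for the final form ties the test to the intent (the calls fold to a constant) rather than to whether a late copy propagation happened to run.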

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

About where we should insert the new __builtin_with_access_and_size:

> On Oct 26, 2023, at 2:54 PM, Qing Zhao  wrote:
> 
> 
> 
>> On Oct 26, 2023, at 10:05 AM, Richard Biener  
>> wrote:
>> 
>> 
>> 
>>> Am 26.10.2023 um 12:14 schrieb Martin Uecker :
>>> 
>>> Am Donnerstag, dem 26.10.2023 um 11:20 +0200 schrieb Martin Uecker:
> Am Donnerstag, dem 26.10.2023 um 10:45 +0200 schrieb Richard Biener:
> On Wed, Oct 25, 2023 at 8:16 PM Martin Uecker  wrote:
>> 
>> Am Mittwoch, dem 25.10.2023 um 13:13 +0200 schrieb Richard Biener:
>>> 
 Am 25.10.2023 um 12:47 schrieb Martin Uecker :
 
 Am Mittwoch, dem 25.10.2023 um 06:25 -0400 schrieb Siddhesh Poyarekar:
>> On 2023-10-25 04:16, Martin Uecker wrote:
>> Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:
>>> 
 Am 24.10.2023 um 22:38 schrieb Martin Uecker :
 
 Am Dienstag, dem 24.10.2023 um 20:30 + schrieb Qing Zhao:
> Hi, Sid,
> 
> I really appreciate your example and detailed explanation. Very
> helpful.
> I think that this example is an excellent example to show
> (almost) all the issues we need to consider.
> 
> I slightly modified this example to make it compilable and
> runnable, as follows:
> (I still cannot make the incorrect reordering or DSE
> happen; anyway, the potential for reordering is there…)
> 
> 1 #include 
> 2 struct A
> 3 {
> 4  size_t size;
> 5  char buf[] __attribute__((counted_by(size)));
> 6 };
> 7
> 8 static size_t
> 9 get_size_from (void *ptr)
> 10 {
> 11  return __builtin_dynamic_object_size (ptr, 1);
> 12 }
> 13
> 14 void
> 15 foo (size_t sz)
> 16 {
> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * 
> sizeof(char));
> 18  obj->size = sz;
> 19  obj->buf[0] = 2;
> 20  __builtin_printf ("%d\n", get_size_from (obj->buf));
> 21  return;
> 22 }
> 23
> 24 int main ()
> 25 {
> 26  foo (20);
> 27  return 0;
> 28 }
> 
> 
> 
> 
>>> When it’s set I suppose.  Turn
>>> 
>>> X.l = n;
>>> 
>>> Into
>>> 
>>> X.l = __builtin_with_size (x.buf, n);
>> 
>> It would turn
>> 
>> some_variable = (&) x.buf
>> 
>> into
>> 
>> some_variable = __builtin_with_size ( (&) x.buf, x.len)
>> 
>> 
>> So the later access to x.buf and not the initialization
>> of a member of the struct (which is too early).
>> 
> 
> Hmm, so with Qing's example above, are you suggesting the 
> transformation
> be to foo like so:
> 
> 14 void
> 15 foo (size_t sz)
> 16 {
> 16.5  void * _1;
> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * 
> sizeof(char));
> 18  obj->size = sz;
> 19  obj->buf[0] = 2;
> 19.5  _1 = __builtin_with_size (obj->buf, obj->size);
> 20  __builtin_printf ("%d\n", get_size_from (_1));
> 21  return;
> 22 }
> 
> If yes then this could indeed work.  I think I got thrown off by the
> reference to __bdos.
 
 Yes. I think it is important to evaluate the size at the
 access to buf and not at the allocation, because the point is to
 recover it from the size member even when the compiler can't
 see the original allocation.
>>> 
>>> But if the access is through a pointer without the attribute visible
>>> even the Frontend cannot recover?
>> 
>> Yes, if the access is using a struct-with-FAM without the attribute
>> the FE would not insert the builtin.  BDOS could potentially
>> still see the original allocation but if it doesn't, then there is
>> no information.
>> 
>>> We’d need to force type correctness and give up on indirecting
>>> through an int * when it can refer to two different container types.
>>> The best we can do I think is mark allocation sites and hope for
>>> some basic code hygiene (not clobbering size or array pointer
>>> through pointers without the appropriately attributed type)
>> 
>> I do not fully understand what you are referring to.
> 
> struct A { int n; int data[n]; };
> struct B { long n; int data[n]; };
> 
> int *p = flag ? a->data : b->data;
> 
> access *p;
> 
> Since we need to allow

[committed] amdgcn: silence warnings

2023-10-27 Thread Andrew Stubbs

This trivial patch adds the "operands" keyword to the condition in a 
couple of patterns that cause warnings about "missing" mode specifiers.


With the iterators, there were a large number of warnings about these 
cases that have now been silenced.


Andrew

amdgcn: silence warnings

The operands really should be VOIDmode, so the warnings are false.

gcc/ChangeLog:

* config/gcn/gcn-valu.md
(vec_extract_nop): Mention "operands" in
condition to silence the warnings.
(vec_extract_nop): Likewise.
* config/gcn/gcn.md (*movti_insn): Likewise.

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index c128c819c89..39c1dc5b7b4 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -948,7 +948,8 @@ (define_insn "vec_extract_nop"
  (match_operand:V_1REG 1 "register_operand"   " 0,v")
  (match_operand 2 "ascending_zero_int_parallel" "")))]
   "MODE_VF (mode) < MODE_VF (mode)
-   && mode == mode"
+   && mode == mode
+   /* This comment silences a warning for operands[2]. */"
   "@
   ; in-place extract %0
   v_mov_b32\t%L0, %L1"
@@ -961,7 +962,8 @@ (define_insn "vec_extract_nop"
  (match_operand:V_2REG 1 "register_operand"   " 0,v")
  (match_operand 2 "ascending_zero_int_parallel" "")))]
   "MODE_VF (mode) < MODE_VF (mode)
-   && mode == mode"
+   && mode == mode
+   /* This comment silences a warning for operands[2]. */"
   "@
   ; in-place extract %0
   v_mov_b32\t%L0, %L1\;v_mov_b32\t%H0, %H1"
diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index a3d8beefd6d..e6a9ac60b57 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -694,7 +694,7 @@ (define_insn_and_split "*movti_insn"
 
 (define_insn "prologue_use"
   [(unspec_volatile [(match_operand 0 "register_operand")] 
UNSPECV_PROLOGUE_USE)]
-  ""
+  "1 /* This comment silences a warning for operands[2]. */"
   ""
   [(set_attr "length" "0")])

Re: [committed] RISC-V: Make stack_save_restore tests more robust





On 10/27/23 11:34, Patrick O'Neill wrote:


On 8/25/23 15:36, Jeff Law wrote:
Spurred by Jivan's patch and a desire for cleaner test results, I went
ahead and made the stack_save_restore tests independent of the precise
stack size by using a regexp.


Pushed to the trunk.

Jeff


Hi Jeff, A recent change that I'm still bisecting [1] caused 
stack_save_restore_2.c to start failing.


Debug log:

Executing on host: 
/github/patrick-postcommit-runner-1/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/xgcc
 
-B/github/patrick-postcommit-runner-1/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/
  
/github/patrick-postcommit-runner-1/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/stack_save_restore_2.c
  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output
-O0  -march=rv32imafc -mabi=ilp32f -msave-restore -O2 -fno-schedule-insns 
-fno-schedule-insns2 -fno-unroll-loops -fno-peel-loops -fno-lto -S   -o 
stack_save_restore_2.s(timeout = 600)
spawn -ignore SIGHUP 
/github/patrick-postcommit-runner-1/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/xgcc
 
-B/github/patrick-postcommit-runner-1/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/
 
/github/patrick-postcommit-runner-1/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/stack_save_restore_2.c
 -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -O0 
-march=rv32imafc -mabi=ilp32f -msave-restore -O2 -fno-schedule-insns 
-fno-schedule-insns2 -fno-unroll-loops -fno-peel-loops -fno-lto -S -o 
stack_save_restore_2.s
PASS: gcc.target/riscv/stack_save_restore_2.c   -O0  (test for excess errors)
body: \tcall  t0,__riscv_save_(3|4)
\taddi  sp,sp,-[0-9]+
.*\tli  t0,-[0-9]+
\tadd   sp,sp,t0
.*\tli  t0,[0-9]+
\tadd   sp,sp,t0
.*\taddi    sp,sp,[0-9]+
\ttail  __riscv_restore_(3|4)

against: call    t0,__riscv_save_5
addi    sp,sp,-2016
fsw fs0,2012(sp)
fsw fs1,2008(sp)
fsw fs2,2004(sp)
fsw fs3,2000(sp)
fsw fs4,1996(sp)
li  t0,-12288
add sp,sp,t0
call    getf
fmv.s   fs1,fa0
call    getf
fmv.s   fs4,fa0
call    getf
fmv.s   fs3,fa0
call    getf
fmv.s   fs2,fa0
li  s0,0
fmv.s.x fs0,zero
lui a5,%hi(.LC0)
lw  s2,%lo(.LC0)(a5)
lw  s3,%lo(.LC0+4)(a5)
addi    s4,sp,1984
li  s1,4096
addi    s1,s1,-528
call    my_getchar
call    __floatsidf
mv  a2,s2
mv  a3,s3
call    __muldf3
call    __truncdfsf2
slli    a5,s0,2
add a5,s4,a5
fsw fa0,-1984(a5)
flw fa5,-1984(a5)
fadd.s  fs0,fs0,fa5
addi    s0,s0,1
bne s0,s1,.L2
fadd.s  fa5,fs1,fs0
fadd.s  fa5,fa5,fs4
fadd.s  fa5,fa5,fs3
fadd.s  fa5,fa5,fs2
fcvt.w.s a0,fa5,rtz
li  t0,12288
add sp,sp,t0
flw fs0,2012(sp)
flw fs1,2008(sp)
flw fs2,2004(sp)
flw fs3,2000(sp)
flw fs4,1996(sp)
addi    sp,sp,2016
tail    __riscv_restore_5

FAIL: gcc.target/riscv/stack_save_restore_2.c   -O0   check-function-bodies bar

It looks like the issue is that your regex matches
__riscv_save_(3|4) where now gcc emits __riscv_restore_5.

Would it be OK to update the regex to also accept 5 (& are we going to 
bump into this again)?
Thanks for looking at this -- my tester flagged them yesterday as well 
and I hadn't dug into them yet:



Tests that now fail, but worked before (11 tests):

gcc.target/riscv/rv32i_zcmp.c   -Os   check-function-bodies test1
gcc.target/riscv/rv32i_zcmp.c   -Os   check-function-bodies test2_step1_0_size
gcc.target/riscv/rv32i_zcmp.c   -Os   check-function-bodies test3
gcc.target/riscv/stack_save_restore_2.c   -O0   check-function-bodies bar
gcc.target/riscv/stack_save_restore_2.c   -O1   check-function-bodies bar
gcc.target/riscv/stack_save_restore_2.c   -O2   check-function-bodies bar
gcc.target/riscv/stack_save_restore_2.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none   check-function-bodies bar
gcc.target/riscv/stack_save_restore_2.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects   check-function-bodies bar
gcc.target/riscv/stack_save_restore_2.c   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions   check-function-bodies 
bar
gcc.target/riscv/stack_save_restore_2.c   -O3 -g   check-function-bodies bar
gcc.target/riscv/stack_save_restore_2.c   -Os   check-function-bodies bar




Yes, I think accepting more cases here is quite reasonable.  In fact, 
you

Re: [PATCH] RISC-V: Make stack_save_restore_2 more robust

2023-10-27 Thread Patrick O'Neill




On 10/27/23 11:02, Jeff Law wrote:



On 10/27/23 11:56, Patrick O'Neill wrote:

GCC recently changed to emit __riscv_restore_5 which causes this
testcase to fail.
This patch updates the regex to be more robust to change by accepting
any number after __riscv_save_ and __riscv_restore_.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/stack_save_restore_2.c: Accept any number
after __riscv_save_ and __riscv_restore_.

OK
jeff


Committed

Patrick

[PATCH] RISC-V: Fix bugs of handling scalar of SEW64 vx instruction in RV32

2023-10-27 Thread Juzhe-Zhong

sew64_scalar_helper handles the SEW64 vx instruction patterns on RV32 systems.
According to the RVV ISA, we cannot directly use a SEW64 vx instruction on an
RV32 system, since an RV32 general-purpose register is only 32 bits wide.

Consider the following case:

vsetvl e64m1
vadd.vx v,v,x

will be transformed by sew64_scalar_helper into:

vsetvl e64m1
sw
sw
vlse v
vadd.vv

This bug was reported by Robin:
(insn 143 179 230 9 (set (reg:SI 15 a5 [234])
(unspec:SI [
(const_int 64 [0x40])
] UNSPEC_VLMAX)) 751 {vlmax_avlsi}
 (expr_list:REG_EQUIV (unspec:SI [
(const_int 64 [0x40])
] UNSPEC_VLMAX)
(nil)))
(insn 230 143 78 9 (parallel [
(set (reg:SI 66 vl)
(unspec:SI [
(reg:SI 15 a5 [234])
(const_int 64 [0x40])
(const_int 0 [0])
] UNSPEC_VSETVL))
(set (reg:SI 67 vtype)
(unspec:SI [
(const_int 64 [0x40])
(const_int 0 [0])
(const_int 1 [0x1]) repeated x2
] UNSPEC_VSETVL))
]) "bug.c":14:14 discrim 1 1469 {vsetvl_discard_resultsi}
 (nil))
(insn 78 230 84 9 (set (reg:RVVM1DI 102 v6 [203])
(if_then_else:RVVM1DI (unspec:RVVMF64BI [
(const_vector:RVVMF64BI repeat [
(const_int 1 [0x1])
])
(const_int 0 [0])
(const_int 2 [0x2]) repeated x2
(const_int 0 [0])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(vec_duplicate:RVVM1DI (mem/u/c:DI (reg/f:SI 29 t4 [230]) [0  S8 
A64]))
(unspec:RVVM1DI [
(reg:SI 0 zero)
] UNSPEC_VUNDEF))) "bug.c":14:14 discrim 1 1872 
{*pred_broadcastrvvm1di}
 (expr_list:REG_DEAD (reg/f:SI 29 t4 [230])
(nil)))

The root cause is that we missed VLMAX handling, since this code was written
a long time ago (callers were always intrinsics code, so there was no VLMAX
situation).

The following failures are all fixed by this patch:

FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test

gcc/ChangeLog:

* config/riscv/riscv-protos.h (sew64_scalar_helper): Fix bug.
* config/riscv/riscv-v.cc (sew64_scalar_helper): Ditto.
* config/riscv/vector.md: Ditto.

---
 gcc/config/riscv/riscv-protos.h |  2 +-
 gcc/config/riscv/riscv-v.cc |  8 +++--
 gcc/config/riscv/vector.md  | 54 ++---
 3 files changed, 43 insertions(+), 21 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 2926d5d50d5..150b61bb5b5 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -490,7 +490,7 @@ void expand_vec_lceil (rtx, rtx, machine_mode, 
machine_mode);
 void expand_vec_lfloor (rtx, rtx, machine_mode, machine_mode);
 #endif
 bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
- bool, void (*)(rtx *, rtx));
+ bool, void (*)(rtx *, rtx), enum avl_type);
 rtx gen_scalar_move_mask (machine_mode);
 rtx gen_no_side_effects_vsetvl_rtx (machine_mode, rtx, rtx);
 
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 53991cc1090..ee631404b44 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1641,7 +1641,7 @@ has_vi_variant_p (rtx_code code, rtx x)
 bool
 sew64_scalar_helper (rtx *operands, rtx *scalar_op, rtx vl,
 machine_mode vector_mode, bool has_vi_variant_p,
-void (*emit_vector_func) (rtx *, rtx))
+void (*emit_vector_func) (rtx *, rtx), enum avl_type type)
 {
   machine_mode scalar_mode = GET_MODE_INNER (vector_mode);
   if (has_vi_variant_p)
@@ -1671,7 +1671,11 @@ sew64_scalar_helper (rtx *operands, rtx *scalar_op, rtx 
vl,
 
   rtx tmp = gen_reg_rtx (vector_mode);
   rtx ops[] = {tmp, *scalar_op};
-  emit_nonvlmax_insn (code_for_pred_broadcast (vector_mode), UNARY_OP, ops, 
vl);
+  if (type == VLMAX)
+emit_vlmax_insn (code_for_pred_broadcast (vector_mode), UNARY_OP, ops);
+  else
+emit_nonvlmax_insn (code_for_pred_broadcast (vector_mode), UNARY_OP, ops,
+   vl);
   emit_vector_func (operands, tmp);
 
   return true;
diff --git a/gcc/config/riscv/vector.md

Re: [PATCH v3 3/3] c++: note other candidates when diagnosing deletedness

On Fri, 27 Oct 2023, Jason Merrill wrote:

> On 10/27/23 15:55, Patrick Palka wrote:
> > With the previous two patches in place, we can now extend our
> > deletedness diagnostic to note the other considered candidates, e.g.:
> > 
> >deleted16.C: In function 'int main()':
> >deleted16.C:10:4: error: use of deleted function 'void f(int)'
> >   10 |   f(0);
> >  |   ~^~~
> >deleted16.C:5:6: note: declared here
> >5 | void f(int) = delete;
> >  |  ^
> >deleted16.C:5:6: note: candidate: 'void f(int)' (deleted)
> >deleted16.C:6:6: note: candidate: 'void f(...)'
> >6 | void f(...);
> >  |  ^
> >deleted16.C:7:6: note: candidate: 'void f(int, int)'
> >7 | void f(int, int);
> >  |  ^
> >deleted16.C:7:6: note:   candidate expects 2 arguments, 1 provided
> > 
> > These notes are controlled by a new command line flag -fnote-all-cands,
> > which also controls whether we note ignored candidates more generally.
> > 
> > gcc/ChangeLog:
> > 
> > * doc/invoke.texi (C++ Dialect Options): Document -fnote-all-cands.
> > 
> > gcc/c-family/ChangeLog:
> > 
> > * c.opt: Add -fnote-all-cands.
> > 
> > gcc/cp/ChangeLog:
> > 
> > * call.cc (print_z_candidates): Only print ignored candidates
> > when -fnote-all-cands is set.
> > (build_over_call): When diagnosing deletedness, call
> > print_z_candidates if -fnote-all-cands is set.
> 
> My suggestion was also to suggest using the flag in cases where it would make
> a difference, e.g.
> 
> note: some candidates omitted, use '-fnote-all-cands' to display them

Ah thanks, fixed.  That'll help a lot with discoverability of the flag.

> 
> Maybe "-fdiagnostics-all-candidates"?

Nice, that's a better name indeed :)

How does the following look?  Full bootstrap/regtest in progress.

Here's the output of e.g. deleted16a.C.  I think I'd prefer to not print
the source line when emitting the suggestion, but I don't know how to do
that properly (aside from e.g. emitting the note at UNKNOWN_LOCATION).

In file included from gcc/testsuite/g++.dg/cpp0x/deleted16a.C:4:
gcc/testsuite/g++.dg/cpp0x/deleted16.C: In function ‘int main()’:
gcc/testsuite/g++.dg/cpp0x/deleted16.C:21:4: error: use of deleted function 
‘void f(int)’
   21 |   f(0); // { dg-error "deleted" }
  |   ~^~~
gcc/testsuite/g++.dg/cpp0x/deleted16.C:6:6: note: declared here
6 | void f(int) = delete; // { dg-message "declared here" }
  |  ^
gcc/testsuite/g++.dg/cpp0x/deleted16.C:21:4: note: use 
‘-fdiagnostics-all-candidates’ to display considered candidates
   21 |   f(0); // { dg-error "deleted" }
  |   ~^~~
gcc/testsuite/g++.dg/cpp0x/deleted16.C:22:4: error: use of deleted function 
‘void g(int)’
   22 |   g(0); // { dg-error "deleted" }
  |   ~^~~
gcc/testsuite/g++.dg/cpp0x/deleted16.C:12:6: note: declared here
   12 | void g(int) = delete; // { dg-message "declared here" }
  |  ^
gcc/testsuite/g++.dg/cpp0x/deleted16.C:22:4: note: use 
‘-fdiagnostics-all-candidates’ to display considered candidates
   22 |   g(0); // { dg-error "deleted" }
  |   ~^~~
gcc/testsuite/g++.dg/cpp0x/deleted16.C:23:4: error: use of deleted function 
‘void h(T, T) [with T = int]’
   23 |   h(1, 1); // { dg-error "deleted" }
  |   ~^~
gcc/testsuite/g++.dg/cpp0x/deleted16.C:17:24: note: declared here
   17 | template void h(T, T) = delete; // { dg-message "declared 
here|candidate" }
  |^
gcc/testsuite/g++.dg/cpp0x/deleted16.C:23:4: note: use 
‘-fdiagnostics-all-candidates’ to display considered candidates
   23 |   h(1, 1); // { dg-error "deleted" }
  |   ~^~

-- >8 --


Subject: [PATCH 3/3] c++: note other candidates when diagnosing deletedness

With the previous two patches in place, we can now extend our
deletedness diagnostic to note the other considered candidates, e.g.:

  deleted16.C: In function 'int main()':
  deleted16.C:10:4: error: use of deleted function 'void f(int)'
 10 |   f(0);
|   ~^~~
  deleted16.C:5:6: note: declared here
  5 | void f(int) = delete;
|  ^
  deleted16.C:5:6: note: candidate: 'void f(int)' (deleted)
  deleted16.C:6:6: note: candidate: 'void f(...)'
  6 | void f(...);
|  ^
  deleted16.C:7:6: note: candidate: 'void f(int, int)'
  7 | void f(int, int);
|  ^
  deleted16.C:7:6: note:   candidate expects 2 arguments, 1 provided

These notes are controlled by a new command line flag
-fdiagnostics-all-candidates which also controls whether we note
ignored candidates more generally.

gcc/ChangeLog:

* doc/invoke.texi (C++ Dialect Options): Document
-fdiagnostics-all-candidates.

gcc/c-family/ChangeLog:

* c.opt: Add -fdiagnostics-all-candidates.

gcc/cp/ChangeLog:

* call.cc (print_z_candidates): Only print ignored candidates
when -fdiagnostics-all-candidates is set, otherwise suggest
the flag.
(build_over_call): When

Re: [PATCH v3 3/3] c++: note other candidates when diagnosing deletedness

On Fri, 27 Oct 2023, Patrick Palka wrote:

> On Fri, 27 Oct 2023, Jason Merrill wrote:
> 
> > On 10/27/23 15:55, Patrick Palka wrote:
> > > With the previous two patches in place, we can now extend our
> > > deletedness diagnostic to note the other considered candidates, e.g.:
> > > 
> > >deleted16.C: In function 'int main()':
> > >deleted16.C:10:4: error: use of deleted function 'void f(int)'
> > >   10 |   f(0);
> > >  |   ~^~~
> > >deleted16.C:5:6: note: declared here
> > >5 | void f(int) = delete;
> > >  |  ^
> > >deleted16.C:5:6: note: candidate: 'void f(int)' (deleted)
> > >deleted16.C:6:6: note: candidate: 'void f(...)'
> > >6 | void f(...);
> > >  |  ^
> > >deleted16.C:7:6: note: candidate: 'void f(int, int)'
> > >7 | void f(int, int);
> > >  |  ^
> > >deleted16.C:7:6: note:   candidate expects 2 arguments, 1 provided
> > > 
> > > These notes are controlled by a new command line flag -fnote-all-cands,
> > > which also controls whether we note ignored candidates more generally.
> > > 
> > > gcc/ChangeLog:
> > > 
> > >   * doc/invoke.texi (C++ Dialect Options): Document -fnote-all-cands.

It just occurred to me that despite this flag being C++ specific, it
probably should be documented under "Diagnostic Message Formatting Options",
like -fdiagnostics-show-template-tree is.

> > > 
> > > gcc/c-family/ChangeLog:
> > > 
> > >   * c.opt: Add -fnote-all-cands.
> > > 
> > > gcc/cp/ChangeLog:
> > > 
> > >   * call.cc (print_z_candidates): Only print ignored candidates
> > >   when -fnote-all-cands is set.
> > >   (build_over_call): When diagnosing deletedness, call
> > >   print_z_candidates if -fnote-all-cands is set.
> > 
> > My suggestion was also to suggest using the flag in cases where it would 
> > make
> > a difference, e.g.
> > 
> > note: some candidates omitted, use '-fnote-all-cands' to display them
> 
> Ah thanks, fixed.  That'll help a lot with discoverability of the flag.
> 
> > 
> > Maybe "-fdiagnostics-all-candidates"?
> 
> Nice, that's a better name indeed :)
> 
> How does the following look?  Full bootstrap/regtest in progress.
> 
> Here's the output of e.g. deleted16a.C.  I think I'd prefer to not print
> the source line when emitting the suggestion, but I don't know how to do
> that properly (aside from e.g. emitting the note at UNKNOWN_LOCATION).
> 
> In file included from gcc/testsuite/g++.dg/cpp0x/deleted16a.C:4:
> gcc/testsuite/g++.dg/cpp0x/deleted16.C: In function ‘int main()’:
> gcc/testsuite/g++.dg/cpp0x/deleted16.C:21:4: error: use of deleted function 
> ‘void f(int)’
>21 |   f(0); // { dg-error "deleted" }
>   |   ~^~~
> gcc/testsuite/g++.dg/cpp0x/deleted16.C:6:6: note: declared here
> 6 | void f(int) = delete; // { dg-message "declared here" }
>   |  ^
> gcc/testsuite/g++.dg/cpp0x/deleted16.C:21:4: note: use 
> ‘-fdiagnostics-all-candidates’ to display considered candidates
>21 |   f(0); // { dg-error "deleted" }
>   |   ~^~~
> gcc/testsuite/g++.dg/cpp0x/deleted16.C:22:4: error: use of deleted function 
> ‘void g(int)’
>22 |   g(0); // { dg-error "deleted" }
>   |   ~^~~
> gcc/testsuite/g++.dg/cpp0x/deleted16.C:12:6: note: declared here
>12 | void g(int) = delete; // { dg-message "declared here" }
>   |  ^
> gcc/testsuite/g++.dg/cpp0x/deleted16.C:22:4: note: use 
> ‘-fdiagnostics-all-candidates’ to display considered candidates
>22 |   g(0); // { dg-error "deleted" }
>   |   ~^~~
> gcc/testsuite/g++.dg/cpp0x/deleted16.C:23:4: error: use of deleted function 
> ‘void h(T, T) [with T = int]’
>23 |   h(1, 1); // { dg-error "deleted" }
>   |   ~^~
> gcc/testsuite/g++.dg/cpp0x/deleted16.C:17:24: note: declared here
>17 | template void h(T, T) = delete; // { dg-message "declared 
> here|candidate" }
>   |^
> gcc/testsuite/g++.dg/cpp0x/deleted16.C:23:4: note: use 
> ‘-fdiagnostics-all-candidates’ to display considered candidates
>23 |   h(1, 1); // { dg-error "deleted" }
>   |   ~^~
> 
> -- >8 --
> 
> 
> Subject: [PATCH 3/3] c++: note other candidates when diagnosing deletedness
> 
> With the previous two patches in place, we can now extend our
> deletedness diagnostic to note the other considered candidates, e.g.:
> 
>   deleted16.C: In function 'int main()':
>   deleted16.C:10:4: error: use of deleted function 'void f(int)'
>  10 |   f(0);
> |   ~^~~
>   deleted16.C:5:6: note: declared here
>   5 | void f(int) = delete;
> |  ^
>   deleted16.C:5:6: note: candidate: 'void f(int)' (deleted)
>   deleted16.C:6:6: note: candidate: 'void f(...)'
>   6 | void f(...);
> |  ^
>   deleted16.C:7:6: note: candidate: 'void f(int, int)'
>   7 | void f(int, int);
> |  ^
>   deleted16.C:7:6: note:   candidate expects 2 arguments, 1 provided
> 
> These notes are controlled by a new command line flag
> -fdiagnostics-all-candidates

Re: [PATCH 1/3] [V6] [RISC-V] support cm.push cm.pop cm.popret in zcmp





On 10/27/23 14:31, Patrick O'Neill wrote:

Hi Fei,

A recent change to GCC [1] updated the registers in the cm.push and 
cm.pop insns for these testcases:


FAIL: gcc.target/riscv/rv32i_zcmp.c -Os check-function-bodies test1
FAIL: gcc.target/riscv/rv32i_zcmp.c -Os check-function-bodies test2_step1_0_size
FAIL: gcc.target/riscv/rv32i_zcmp.c -Os check-function-bodies test3

[ ... ]
Actually [1-9] looks better upon further review.

jeff

Re: [PATCH 2/3] build: Add libgrust as compilation modules

Hi!

To close the loop here:

On 2023-09-27T00:25:16+0200, I wrote:
> On 2023-09-20T13:59:53+0200, Arthur Cohen  wrote:
>> From: Pierre-Emmanuel Patry 
>>
>> Define the libgrust directory as a host compilation module as well as
>> for targets.
>
>> --- a/Makefile.def
>> +++ b/Makefile.def
>> @@ -149,6 +149,7 @@ host_modules= { module= libcc1; 
>> extra_configure_flags=--enable-shared; };
>>  host_modules= { module= gotools; };
>>  host_modules= { module= libctf; bootstrap=true; };
>>  host_modules= { module= libsframe; bootstrap=true; };
>> +host_modules= { module= libgrust; };
>>
>>  target_modules = { module= libstdc++-v3;
>>  bootstrap=true;
>> @@ -192,6 +193,7 @@ target_modules = { module= libgm2; lib_path=.libs; };
>>  target_modules = { module= libgomp; bootstrap= true; lib_path=.libs; };
>>  target_modules = { module= libitm; lib_path=.libs; };
>>  target_modules = { module= libatomic; bootstrap=true; lib_path=.libs; };
>> +target_modules = { module= libgrust; };
>>
>>  // These are (some of) the make targets to be done in each subdirectory.
>>  // Not all; these are the ones which don't have special options.
>
> Maybe just I am confused, but to make sure that the build doesn't break
> for different GCC configurations

Indeed, as discussed in

"[PATCH v2 2/4] libgrust: Add libproc_macro and build system".

> don't we also directly need to
> incorporate here a few GCC/Rust master branch follow-on commits, like:
>
>   - commit 171ea4e2b3e202067c50f9c206974fbe1da691c0 "fixup: Fix bootstrap 
> build"
>   - commit 61cbe201029658c32e5c360823b9a1a17d21b03c "fixup: Fix missing build 
> dependency"

I've not yet run into the need for these two.  Let's please leave these
out of the upstream submission for now, until we understand what exactly
these are necessary for.

However:

>   - commit 6a8b207b9ef7f9038e0cae7766117428783825d8 "libgrust: Add dependency 
> to libstdc++"

... this one definitely is necessary right now; see discussion in

"Disable target libgrust if we're not building target libstdc++".


And:

> (Not sure if all of these are necessary and/or if that's the complete
> list; haven't looked up the corresponding GCC/Rust GitHub PRs.)
>
>> --- a/gcc/rust/config-lang.in
>> +++ b/gcc/rust/config-lang.in
>
>> +target_libs="target-libffi target-libbacktrace target-libgrust"
>
> Please don't add back 'target-libffi' and 'target-libbacktrace' here;
> just 'target-libgrust'.  (As is present in GCC/Rust master branch, and
> per commit 7411eca498beb13729cc2acec77e68250940aa81
> "Rust: Don't depend on unused 'target-libffi', 'target-libbacktrace'".)

... that change is necessary, too.


Grüße
 Thomas
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955

Re: [PATCH v3 3/3] c++: note other candidates when diagnosing deletedness

2023-10-27 Thread Jason Merrill


On 10/27/23 15:55, Patrick Palka wrote:

With the previous two patches in place, we can now extend our
deletedness diagnostic to note the other considered candidates, e.g.:

   deleted16.C: In function 'int main()':
   deleted16.C:10:4: error: use of deleted function 'void f(int)'
  10 |   f(0);
 |   ~^~~
   deleted16.C:5:6: note: declared here
   5 | void f(int) = delete;
 |  ^
   deleted16.C:5:6: note: candidate: 'void f(int)' (deleted)
   deleted16.C:6:6: note: candidate: 'void f(...)'
   6 | void f(...);
 |  ^
   deleted16.C:7:6: note: candidate: 'void f(int, int)'
   7 | void f(int, int);
 |  ^
   deleted16.C:7:6: note:   candidate expects 2 arguments, 1 provided

These notes are controlled by a new command line flag -fnote-all-cands,
which also controls whether we note ignored candidates more generally.

gcc/ChangeLog:

* doc/invoke.texi (C++ Dialect Options): Document -fnote-all-cands.

gcc/c-family/ChangeLog:

* c.opt: Add -fnote-all-cands.

gcc/cp/ChangeLog:

* call.cc (print_z_candidates): Only print ignored candidates
when -fnote-all-cands is set.
(build_over_call): When diagnosing deletedness, call
print_z_candidates if -fnote-all-cands is set.


My suggestion was also to suggest using the flag in cases where it would 
make a difference, e.g.


note: some candidates omitted, use '-fnote-all-cands' to display them

Maybe "-fdiagnostics-all-candidates"?

Jason

Re: [PATCH 1/3] [V6] [RISC-V] support cm.push cm.pop cm.popret in zcmp





On 10/27/23 14:31, Patrick O'Neill wrote:

Hi Fei,

A recent change to GCC [1] updated the registers in the cm.push and 
cm.pop insns for these testcases:


FAIL: gcc.target/riscv/rv32i_zcmp.c -Os check-function-bodies test1
FAIL: gcc.target/riscv/rv32i_zcmp.c -Os check-function-bodies test2_step1_0_size
FAIL: gcc.target/riscv/rv32i_zcmp.c -Os check-function-bodies test3


Debug log:

Executing on host: 
/github/patrick-postcommit-runner-1/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/xgcc
 
-B/github/patrick-postcommit-runner-1/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/
  
/github/patrick-postcommit-runner-1/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c
  -march=rv64gcv -mabi=lp64d -mcmodel=medlow   -fdiagnostics-plain-output
-Os   -Os -march=rv32imaf_zca_zcmp -mabi=ilp32f -mcmodel=medlow -S   -o 
rv32i_zcmp.s(timeout = 600)
spawn -ignore SIGHUP 
/github/patrick-postcommit-runner-1/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/xgcc
 
-B/github/patrick-postcommit-runner-1/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/
 
/github/patrick-postcommit-runner-1/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c
 -march=rv64gcv -mabi=lp64d -mcmodel=medlow -fdiagnostics-plain-output -Os -Os 
-march=rv32imaf_zca_zcmp -mabi=ilp32f -mcmodel=medlow -S -o rv32i_zcmp.s
PASS: gcc.target/riscv/rv32i_zcmp.c   -Os  (test for excess errors)
body: .*\tcm.push   {ra, s0-s4}, -80
.*\tcm.popret   {ra, s0-s4}, 80
.*
against:lui a5,%hi(.LC0)
li  t0,-16384
cm.push {ra, s0-s6}, -80
addit0,t0,816
fsw fs0,44(sp)
lw  s2,%lo(.LC0)(a5)
lw  s3,%lo(.LC0+4)(a5)
fmv.s.x fs0,zero
li  a5,4096
add sp,sp,t0
addia5,a5,-784
li  s1,4096
li  s0,0
addis5,sp,-784
add s4,sp,a5
addis1,s1,-976
callmy_getchar
add s6,s5,s0
sb  a0,784(s6)
callmy_getchar
call__floatsidf
mv  a2,s2
mv  a3,s3
call__muldf3
call__truncdfsf2
sllia5,s0,2
add a5,s4,a5
fsw fa0,-192(a5)
addis0,s0,1
lbu a4,784(s6)
fcvt.s.wfa5,a4
flw fa4,-192(a5)
fadd.s  fa5,fa5,fa4
fadd.s  fs0,fs0,fa5
bne s0,s1,.L2
li  t0,16384
addit0,t0,-816
add sp,sp,t0
fcvt.w.s a0,fs0,rtz
flw fs0,44(sp)
cm.popret   {ra, s0-s6}, 80

FAIL: gcc.target/riscv/rv32i_zcmp.c   -Os   check-function-bodies test1

Would it be OK if we made the regex accept any s[0-9] register (or 
would it be better if it was [1-9])?

Proposed change:

I'd think your proposed change would be fine.




[1] It was one of these commits:
https://github.com/gcc-mirror/gcc/compare/a4ca869144cecc595d1af8b21e51f588e2f2...4d49685d671e4e604b2b873ada65aaac89348794

Almost certainly the register allocator change.

Jeff
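
To illustrate the relaxation discussed in this message, here is a small
sketch using C++ std::regex.  It is not the proposed testsuite change
(check-function-bodies uses Tcl regular expressions, and the actual diff is
not shown in this thread); matches_cm_push and self_check are hypothetical
names, and the pattern only mirrors the "s0-s[1-9]" idea.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if LINE contains a cm.push of ra plus s0 through
// some single-digit s register other than s0, with a stack adjustment of -80.
inline bool matches_cm_push(const std::string &line) {
  static const std::regex pattern(R"(cm\.push\s+\{ra, s0-s[1-9]\}, -80)");
  return std::regex_search(line, pattern);
}

// Self-check against the two register lists that appear in the debug log.
inline void self_check() {
  assert(matches_cm_push("\tcm.push\t{ra, s0-s4}, -80"));   // old expectation
  assert(matches_cm_push("\tcm.push\t{ra, s0-s6}, -80"));   // current output
  assert(!matches_cm_push("\tcm.push\t{ra, s0-s0}, -80"));  // rejected by [1-9]
}
```

With [0-9] instead of [1-9] the last line would also match, which is the
only difference between the two alternatives in this sketch.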

Re: [PATCH v3 1/2] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-10-27 Thread waffl3x

I've been under the weather so I took a few days break, I honestly was
also very reluctant to pick it back up. The current problem is what I
like to call "not friendly", but I am back at it now.

> > I don't understand what this means exactly, under what circumstances
> > would  find the member function. Oh, I guess while in the body of
> > its class, I hadn't considered that. Is that what you're referring to?
>
>
> Right:
>
> struct A {
> void g(this A&);
> A() {
> ::g; // ok
>  // same error as for an implicit object member function
> }
> };

I fully get this now, I threw together a test for it so this case
doesn't get forgotten about. Unfortunately though, I am concerned that
the approach I was going to take to fix the crash would have the
incorrect behavior for this.

Here is what I added to cp_build_addr_expr_1 with context included.
```
case OFFSET_REF:
offset_ref:
  /* Turn a reference to a non-static data member into a
 pointer-to-member.  */
  {
tree type;
tree t;

gcc_assert (PTRMEM_OK_P (arg));

t = TREE_OPERAND (arg, 1);
if (TYPE_REF_P (TREE_TYPE (t)))
  {
if (complain & tf_error)
  error_at (loc,
"cannot create pointer to reference member %qD", t);
return error_mark_node;
  }
/* -- Waffl3x additions start -- */
/* Exception for non-overloaded explicit object member function.  */
if (TREE_CODE (TREE_TYPE (t)) == FUNCTION_TYPE)
  return build1 (ADDR_EXPR, unknown_type_node, arg);
/* -- Waffl3x additions end -- */

/* Forming a pointer-to-member is a use of non-pure-virtual fns.  */
if (TREE_CODE (t) == FUNCTION_DECL
&& !DECL_PURE_VIRTUAL_P (t)
&& !mark_used (t, complain) && !(complain & tf_error))
  return error_mark_node;
```

I had hoped this naive solution would work just fine, but unfortunately
the following code fails to compile with an error.

```
struct S {
void f(this S&) {}
};
int main() {
void (*a)(S&) = ::f;
}
```
normal_static.C: In function ‘int main()’:
normal_static.C:13:25: error: cannot convert ‘S::f’ from type ‘void(S&)’ to 
type ‘void (*)(S&)’
   13 | void (*a)(S&) = ::f;
  | ^

So clearly it isn't going to be that easy. I went up and down looking
at how the single static case is handled, and tried to read the code in
build_ptrmem_type and build_ptrmemfunc_type but I had a hard time
figuring it out.
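
For comparison, a short sketch of the types involved, assuming nothing beyond
standard C++.  The address of a static member function is a plain function
pointer, while the address of an implicit object member function is a
pointer to member; under P0847 the explicit object case is meant to behave
like the static one.  The struct and member names are made up for
illustration, and the last part is guarded by the feature-test macro so it
is only compiled where the feature is available.

```cpp
#include <type_traits>

struct S {
  static void sf(S &) {}  // static member function
  void mf() {}            // implicit object member function
};

// Static member: &S::sf is an ordinary pointer to function.
static_assert(std::is_same_v<decltype(&S::sf), void (*)(S &)>,
              "static member yields a plain function pointer");

// Implicit object member: &S::mf is a pointer to member function.
static_assert(std::is_same_v<decltype(&S::mf), void (S::*)()>,
              "non-static member yields a pointer to member");

#ifdef __cpp_explicit_this_parameter
// Explicit object member function: the result should again be a plain
// pointer to function, which is what the failing testcase expects.
struct X {
  void f(this X &) {}
};
static_assert(std::is_same_v<decltype(&X::f), void (*)(X &)>,
              "explicit object member yields a plain function pointer");
#endif
```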

The good news is, this problem was difficult enough that it made me
pick a proper diff tool with regex support instead of using a silly web
browser tool and pasting things into it. Or worse, pasting them into a
tool and doing replacement and then pasting them into the silly web
browser tool. I have been forced to improve my workflow thanks to this
head scratcher. So it's not all for naught.

Back on topic, it's not really the optimization returning a baselink
that is causing the original crash. It's just the assert in
build_ptrmem_type failing when a FUNCTION_TYPE is reaching it. The
optimization did throw me for a loop when I was comparing how my
previous version (that incorrectly set the lang_decl_fn ::
static_function flag) was handling things. Looking back, I think I
explained myself and the methodology I was using to investigate really
poorly, I apologize for the confusion I might have caused :).

To state it plainly, it seems to me that the arg parameter being passed
into cp_build_addr_expr_1 for explicit object member functions is
(mostly) pretty much correct and what we would want.

So the whole thing with the baselink optimization was really just a red
herring that I was following. Now that I have a better understanding of
what's happening leading up to and in cp_build_addr_expr_1 I don't
think it's relevant at all for this problem. With that said, I am
questioning again if the optimization that returns a baselink node is
the right way to do things. So this is something I'm going to put down
into my little notes text file to investigate at a later time, and
forget about it for the moment as it shouldn't be causing any friction
for us here.

Anyway, as I alluded to above, if I can't figure out the right way to
solve this problem in a decent amount of time I think I'm going to
leave it for now. I'll come back to it once other higher priority
things are fixed or finished. And hopefully someone more familiar with
this area of the code will have a better idea of what we need to do to
handle this case in a non-intrusive manner.

That wraps up my current status on this specifically. But while
investigating it I uncovered a few things that I feel are important to
discuss/note.

I wanted to change DECL_NONSTATIC_MEMBER_FUNCTION_P to include explicit
object member functions, but it had some problems when I made the
modification. I also noticed that it's used in cp-objcp-common.cc so
would making changes to it be a bad

Disable target libgrust if we're not building target libstdc++ (was: [PATCH v2 2/4] libgrust: Add libproc_macro and build system)

Hi!

Short Friday evening status update:

On 2023-10-27T16:20:34+0200, I wrote:
> Short Friday afternoon status update:
>
> On 2023-10-27T08:51:12+0100, Iain Sandoe  wrote:
>>> On 26 Oct 2023, at 09:21, Thomas Schwinge  wrote:
>>> First, I've pushed into GCC upstream Git branch devel/rust/libgrust-v2
>>> the "v2" libgrust changes as posted by Arthur, so that people can easily
>>> test this before it getting into Git master branch.
>>>
>>> I'll myself later try this for GCN and nvptx targets -- in their current
>>> form where they don't support C++ (standard library)
>
> Indeed, this currently fails to build:
>
> [...]
> make[3]: Entering directory 
> `[...]/build-gcc/amdgcn-amdhsa/libgrust/libproc_macro'
> [...]
> libtool: compile:  [...]/build-gcc/./gcc/xg++ -B[...]/build-gcc/./gcc/ 
> -nostdinc++ -funconfigured-libstdc++-v3 [...] -c 
> [...]/source-gcc/libgrust/libproc_macro/proc_macro.cc
> xg++: error: unrecognized command-line option 
> ‘-funconfigured-libstdc++-v3’
> make[3]: *** [proc_macro.lo] Error 1
> make[3]: Leaving directory 
> `[...]/build-gcc/amdgcn-amdhsa/libgrust/libproc_macro'
> make[2]: *** [all-recursive] Error 1
> make[2]: Leaving directory `[...]/build-gcc/amdgcn-amdhsa/libgrust'
> make[1]: *** [all-target-libgrust] Error 2
> make[1]: Leaving directory `[...]/build-gcc'
> make: *** [all] Error 2
>
> ("error: unrecognized command-line option ‘-funconfigured-libstdc++-v3’"
> indeed is the expected outcome if libstdc++ is not available, as I
> understand.)
>
> Same for nvptx-none target.
>
> We need two things: (a) make sure that target libgrust build depends on
> target libstdc++, and (b) disable target libgrust if target libstdc++ is
> not available (and, later, gracefully handle that situation in the Rust
> front end).
>
> As far as I remember, patches exist for (a), and Arthur is going to
> integrate/re-submit those.

In fact, for (a), it seems that we just need this one
GCC/Rust commit 6a8b207b9ef7f9038e0cae7766117428783825d8
"libgrust: Add dependency to libstdc++"; see

"Add libstdc++ dependency to libgrust".

> Arthur, before re-submission, feel free to
> first cherry-pick and push these into the GCC upstream Git branch
> devel/rust/libgrust-v2, so that I can re-test.

> I'm not sure about (b),
> whether that fell out of the (a) changes, too?  I can otherwise look into
> that later.

..., which I've now done.  Indeed that is still broken.  We need, if I
understand this correctly, the attached
"Disable target libgrust if we're not building target libstdc++" to
address that issue.

Pierre-Emmanuel: In this case (that is we cannot build target libgrust
because we're not building target libstdc++), do we also disable host
libgrust, or do we still build that one?  (This can be settled later.)


>>> and in my hacky WIP
>>> trees where C++ (standard library) is supported to some extent.
>
> This does build

..., but only if target libstdc++ already happens to have been built.  If
not, you'll run into funny libtool errors like:

[...]
libtool: compile: unrecognized option 
`-B[...]/build-gcc/amdgcn-amdhsa/libstdc++-v3/src/.libs'
libtool: compile: Try `libtool --help' for more information.
make[3]: *** [proc_macro.lo] Error 1
make[3]: Leaving directory 
`[...]/build-gcc/amdgcn-amdhsa/libgrust/libproc_macro'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `[...]/build-gcc/amdgcn-amdhsa/libgrust'
make[1]: *** [all-target-libgrust] Error 2
[...]

(This translates into: target libstdc++ has not (yet) been built;
'[...]/build-gcc/amdgcn-amdhsa/libstdc++-v3/src/.libs' is not (yet)
available.)

Need "libgrust: Add dependency to libstdc++" (see above) to make this
work reliably.


Grüße
 Thomas


> -- but the default multilib only, as Iain already
> reported:
>
>>> (This
>>> should, roughly, match C++ functionality (not) provided by a number of
>>> other GCC "embedded" targets.)
>>
>> on Darwin, it works for later systems without multilibs, but fails to build 
>> multilibs.
>
> I see that, too.
>
>> —— so….
>>
>> With the patch below bootstrap succeeded on x86_64-darwin17 and produced a 
>> correct
>> architecture multilib.
>
> Confirmed, thanks!
>
>> Of course, there is no way to test this at the moment - I’d suggest
>> that the next step might be something small in functionality that can allow 
>> at least one
>> test to be wired up.
>
> ACK.
>
>> ^^^ this is “lightly tested” of course, as I cycle through other versions of 
>> the OS will see
>> how it pans out.
>>
>> Do you want me to make a PR with this change against upstream?
>
> Yes, please.  (But no hurry.)
>
>
> Grüße
>  Thomas
>
>
>> From 027bc2c5255a6f1b75592e896dd99fac55bfb9b8 Mon Sep 17 00:00:00 2001
>> From: Iain Sandoe 
>> Date: Thu, 26 Oct 2023 23:19:36 +0100
>> Subject: [PATCH] libgrust: enable multilib
>>
>> Most of this change is the regenerated files, the multilib config macro
>> was already

Re: [PATCH v9 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces

2023-10-27 Thread Vineet Gupta





On 10/27/23 10:16, Bernhard Reutner-Fischer wrote:

On Wed, 25 Oct 2023 16:41:07 +0530
Ajit Agarwal  wrote:


On 25/10/23 2:19 am, Vineet Gupta wrote:

On 10/24/23 13:36, rep.dot@gmail.com wrote:

As said, I don't see why the below was not cleaned up before the V1 submission.
Iff it breaks when manually CSEing, I'm curious why?

The function below looks identical in v12 of the patch.
Why didn't you use common subexpressions?
ba

Using CSE here breaks aarch64 regressions hence I have reverted it back
not to use CSE,

Just for my own education, can you please paste your patch perusing common 
subexpressions and an assembly diff of the failing versus working aarch64 
testcase, along how you configured that failing (cross-?)compiler and the 
command-line of a typical testcase that broke when manually CSEing the function 
below?

I was meaning to ask this before, but what exactly is the CSE issue, manually 
or whatever.

If nothing else it would hopefully improve the readability.

   

Here is the abi interface where I CSE'D and got a mail from automated 
regressions run that aarch64
test fails.

We already concluded that this failure was obviously a hiccup on the
testers, no problem.


+static inline bool
+abi_extension_candidate_return_reg_p (int regno)
+{
+  return targetm.calls.function_value_regno_p (regno);
+}

But i was referring to abi_extension_candidate_p :)

your v13 looks like this:

+static bool
+abi_extension_candidate_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  machine_mode dst_mode = GET_MODE (SET_DEST (set));
+  rtx orig_src = XEXP (SET_SRC (set), 0);
+
+  if (!FUNCTION_ARG_REGNO_P (REGNO (orig_src))
+  || abi_extension_candidate_return_reg_p (REGNO (orig_src)))
+return false;
+
+  /* Return FALSE if mode of destination and source is same.  */
+  if (dst_mode == GET_MODE (orig_src))
+return false;
+
+  machine_mode mode = GET_MODE (XEXP (SET_SRC (set), 0));
+  bool promote_p = abi_target_promote_function_mode (mode);
+
+  /* Return FALSE if promote is false and REGNO of source and destination
+ is different.  */
+  if (!promote_p && REGNO (SET_DEST (set)) != REGNO (orig_src))
+return false;
+
+  return true;
+}

and i suppose it would be easier to read if phrased something like

static bool
abi_extension_candidate_p (rtx_insn *insn)
{
   rtx set = single_set (insn);
   rtx orig_src = XEXP (SET_SRC (set), 0);
   unsigned int src_regno = REGNO (orig_src);

   /* Not a function argument reg or is a function values return reg.  */
   if (!FUNCTION_ARG_REGNO_P (src_regno)
   || abi_extension_candidate_return_reg_p (src_regno))
 return false;

   rtx dst = SET_DEST (set);
   machine_mode src_mode = GET_MODE (orig_src);

   /* Return FALSE if mode of destination and source is the same.  */
   if (GET_MODE (dst) == src_mode)
 return false;

   /* Return FALSE if the FIX THE COMMENT and REGNO of source and destination
  is different.  */
   if (!abi_target_promote_function_mode_p (src_mode)
   && REGNO (dst) != src_regno)
 return false;

   return true;
}

so no, that's not exactly better.

Maybe just do what the function comment says (i did not check the "not
promoted" part, but you get the idea):

^L

/* Return TRUE if
reg source operand is argument register and not return register,
mode of source and destination operand are different,
if not promoted REGNO of source and destination operand are the same.  */
static bool
abi_extension_candidate_p (rtx_insn *insn)
{
   rtx set = single_set (insn);
   rtx orig_src = XEXP (SET_SRC (set), 0);

   if (FUNCTION_ARG_REGNO_P (REGNO (orig_src))
   && !abi_extension_candidate_return_reg_p (REGNO (orig_src))
   && GET_MODE (SET_DEST (set)) != GET_MODE (orig_src)
   && abi_target_promote_function_mode_p (GET_MODE (orig_src))
   && REGNO (SET_DEST (set)) == REGNO (orig_src))
 return true;

   return false;
}
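
As a sanity check on the "not promoted" part, the conditions can be modelled
with plain booleans.  This is only a sketch: the Cand struct and its field
names are hypothetical stand-ins for the macro and hook results, not GCC
API.  It shows that the v13 early-return form is equivalent to a single
conjunction whose last term is "promoted || same_regno".

```cpp
// Hypothetical boolean model of the checks in abi_extension_candidate_p.
struct Cand {
  bool arg_reg;     // FUNCTION_ARG_REGNO_P (REGNO (orig_src))
  bool ret_reg;     // abi_extension_candidate_return_reg_p (REGNO (orig_src))
  bool same_mode;   // destination mode == source mode
  bool promoted;    // abi_target_promote_function_mode (source mode)
  bool same_regno;  // REGNO (destination) == REGNO (source)
};

// The v13 shape: a chain of early returns.
constexpr bool early_return_form(Cand c) {
  if (!c.arg_reg || c.ret_reg)
    return false;
  if (c.same_mode)
    return false;
  if (!c.promoted && !c.same_regno)
    return false;
  return true;
}

// The same logic as one condition.
constexpr bool single_condition_form(Cand c) {
  return c.arg_reg && !c.ret_reg && !c.same_mode
         && (c.promoted || c.same_regno);
}

// Exhaustively compare both forms over all 32 input combinations.
constexpr bool forms_agree() {
  for (unsigned bits = 0; bits < 32; ++bits) {
    Cand c{(bits & 1u) != 0, (bits & 2u) != 0, (bits & 4u) != 0,
           (bits & 8u) != 0, (bits & 16u) != 0};
    if (early_return_form(c) != single_condition_form(c))
      return false;
  }
  return true;
}

static_assert(forms_agree(), "both formulations accept the same inputs");
```

A conjunction that instead requires both "promoted" and "same_regno" would
reject the promoted case with differing register numbers, which the
early-return form accepts.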


This may have been my doing as I asked to split out the logic as some of 
the conditions merit more commentary.

e.g. why does the mode need to be same
But granted this is the usual coding style in gcc and the extra comments 
could still be added before the big if


-Vineet



I think this is much easier to actually read (and that's why good
function comments are important). In the end it's not important and
just personal preference.
Either way, I did not check the plausibility of the logic therein.



I have not done any assembly diff as I myself have not cross-compiled for 
aarch64.

fair enough.

Re: [PATCH] testsuite, Darwin: Add support for Mach-O function body scans.

2023-10-27 Thread Andrew Pinski

On Fri, Oct 27, 2023 at 4:00 AM Iain Sandoe  wrote:
>
> Hi Richard,
>
> > On 26 Oct 2023, at 21:00, Iain Sandoe  wrote:
>
> >> On 26 Oct 2023, at 20:49, Richard Sandiford  
> >> wrote:
> >>
> >> Iain Sandoe  writes:
> >>> This was written before Thomas' modification to the ELF-handling to allow
> >>> a config-based change for target details.  I did consider updating this
> >>> to try and use that scheme, but I think that it would sit a little
> >>> awkwardly, since there are some differences in the start-up scanning for
> >>> Mach-O.  I would say that in all probability we could improve things but
> >>> I'd like to put this forward as a well-tested initial implementation.
> >>
> >> Sorry, I would prefer to extend the existing function instead.
> >> E.g. there's already some divergence between the Mach-O version
> >> and the default version, in that the Mach-O version doesn't print
> >> verbose messages.  I also don't think that the current default code
> >> is so watertight that it'll never need to be updated in future.
> >
> > Fair enough, will explore what can be done (as I recall last I looked the
> > primary difference was in the initial start-up scan).
>
> I’ve done this as attached.
>
> For the record, when doing it, it gave rise to the same misgivings that led
> to the separate implementation before.
>
>  * as we add formats and uncover asm oddities, they all need to be handled
>in one set of code, IMO it could become quite convoluted.
>
>  * now making a change to the MACH-O code, means I have to check I did not
>inadvertently break ELF (and likewise, in theory, an ELF change should 
> check
>MACH-O, but many folks do/can not do that).
>
> Maybe there’s some half-way-house where code can usefully be shared without
> those down-sides.

There is already gcc.test-framework which seems like a good place to
put a test for both formats so when someone changes the function, they
could run that testsuite to make sure it is still working for the
other format.
(Note I am not saying you should add it as part of this patch but it
seems like that would be the perfect place for it.)

Thanks,
Andrew

>
> Anyway, to make progress, is the revised version OK for trunk? (tested on
> aarch64-linux and aarch64-darwin).
> thanks
> Iain
>
>
>

Re: [pushed] c++: fix tourney logic

On Fri, 27 Oct 2023, Patrick Palka wrote:

> On Fri, 27 Oct 2023, Patrick Palka wrote:
> 
> > On Fri, 20 Oct 2023, Jason Merrill wrote:
> > 
> > > Tested x86_64-pc-linux-gnu, applying to trunk.  Patrick, sorry I didn't 
> > > apply
> > > this sooner.
> > > 
> > > -- 8< --
> > > 
> > > In r13-3766 I changed the logic at the end of tourney to avoid redundant
> > > comparisons, but the change also meant skipping any less-good matches
> > > between the champ_compared_to_predecessor candidate and champ itself.
> > > 
> > > This should not be a correctness issue, since we believe that joust is a
> > > partial order.  But it can lead to missed warnings, as in this testcase.
> > 
> > I suppose this rules out optimizing tourney via transitivity when in
> > a non-SFINAE context since it'd cause missed warnings such as these.
> > But maybe we'd still want to optimize the second pass via transitivity
> > in a SFINAE context?
> 
> Eh, maybe it's not worth it either way..  According to some quick
> experiments, making the second pass in tourney assume transitivity by
> going up to the most recent tie even in non-SFINAE contexts reduces the
> total number of non-trivial calls to tourney by about 5%.  Doing the

total number of non-trivial calls to joust, rather

> same in only SFINAE contexts reduces the number of calls by less than 1%.
> 
> > 
> > > 
> > > gcc/cp/ChangeLog:
> > > 
> > >   * call.cc (tourney): Only skip champ_compared_to_predecessor.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >   * g++.dg/warn/Wsign-promo1.C: New test.
> > > ---
> > >  gcc/cp/call.cc   |  5 +++--
> > >  gcc/testsuite/g++.dg/warn/Wsign-promo1.C | 15 +++
> > >  2 files changed, 18 insertions(+), 2 deletions(-)
> > >  create mode 100644 gcc/testsuite/g++.dg/warn/Wsign-promo1.C
> > > 
> > > diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
> > > index 657eca93d23..a49fde949d5 100644
> > > --- a/gcc/cp/call.cc
> > > +++ b/gcc/cp/call.cc
> > > @@ -13227,10 +13227,11 @@ tourney (struct z_candidate *candidates, 
> > > tsubst_flags_t complain)
> > >   been compared to.  */
> > >  
> > >for (challenger = candidates;
> > > -   challenger != champ
> > > -  && challenger != champ_compared_to_predecessor;
> > > +   challenger != champ;
> > > challenger = challenger->next)
> > >  {
> > > +  if (challenger == champ_compared_to_predecessor)
> > > + continue;
> > >fate = joust (champ, challenger, 0, complain);
> > >if (fate != 1)
> > >   return NULL;
> > > diff --git a/gcc/testsuite/g++.dg/warn/Wsign-promo1.C 
> > > b/gcc/testsuite/g++.dg/warn/Wsign-promo1.C
> > > new file mode 100644
> > > index 000..51b76eee735
> > > --- /dev/null
> > > +++ b/gcc/testsuite/g++.dg/warn/Wsign-promo1.C
> > > @@ -0,0 +1,15 @@
> > > +// Check that we get joust warnings from comparing the final champ to a
> > > +// candidate between it and the previous champ.
> > > +
> > > +// { dg-additional-options -Wsign-promo }
> > > +
> > > +struct A { A(int); };
> > > +
> > > +enum E { e };
> > > +
> > > +int f(int, A);
> > > +int f(unsigned, A);
> > > +int f(int, int);
> > > +
> > > +int i = f(e, 42);// { dg-warning "passing 'E'" }
> > > +// { dg-warning "in call to 'int f" "" { target *-*-* } .-1 }
> > > 
> > > base-commit: 084addf8a700fab9222d4127ab8524920d0ca481
> > > -- 
> > > 2.39.3
> > > 
> > > 
> > 
>
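
The effect of the loop change in the quoted patch can be illustrated with a
simplified model.  This is a sketch only: candidates are represented by
their list positions, and visited_old/visited_new are hypothetical helpers
that count how many challengers would be passed to joust in the second pass.

```cpp
#include <cstddef>

// Old loop condition: stop at champ or at the already-compared candidate,
// whichever comes first.
constexpr std::size_t visited_old(std::size_t champ, std::size_t skip) {
  std::size_t n = 0;
  for (std::size_t c = 0; c != champ && c != skip; ++c)
    ++n;
  return n;
}

// New loop: walk all the way up to champ and skip only the one candidate
// that champ has already been compared to.
constexpr std::size_t visited_new(std::size_t champ, std::size_t skip) {
  std::size_t n = 0;
  for (std::size_t c = 0; c != champ; ++c) {
    if (c == skip)
      continue;
    ++n;
  }
  return n;
}

// With champ at position 3 and the skipped candidate at position 1, the old
// loop never reaches position 2; the new loop does.
static_assert(visited_old(3, 1) == 1, "old loop stops early");
static_assert(visited_new(3, 1) == 2, "new loop still compares position 2");
```

When the skipped candidate sits immediately before champ, both loops visit
the same set, so the difference only shows up when other candidates lie
between the two.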

Re: [pushed] c++: fix tourney logic

On Fri, 27 Oct 2023, Patrick Palka wrote:

> On Fri, 20 Oct 2023, Jason Merrill wrote:
> 
> > Tested x86_64-pc-linux-gnu, applying to trunk.  Patrick, sorry I didn't 
> > apply
> > this sooner.
> > 
> > -- 8< --
> > 
> > In r13-3766 I changed the logic at the end of tourney to avoid redundant
> > comparisons, but the change also meant skipping any less-good matches
> > between the champ_compared_to_predecessor candidate and champ itself.
> > 
> > This should not be a correctness issue, since we believe that joust is a
> > partial order.  But it can lead to missed warnings, as in this testcase.
> 
> I suppose this rules out optimizing tourney via transitivity when in
> a non-SFINAE context since it'd cause missed warnings such as these.
> But maybe we'd still want to optimize the second pass via transitivity
> in a SFINAE context?

Eh, maybe it's not worth it either way.  According to some quick
experiments, making the second pass in tourney assume transitivity by
going up to the most recent tie even in non-SFINAE contexts reduces the
total number of non-trivial calls to tourney by about 5%.  Doing the
same in only SFINAE contexts reduces the number of calls by less than 1%.
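
To make the loop change under discussion concrete, here is a minimal
standalone sketch.  It is not GCC code: Candidate, joust_calls and the
rank-based joust below are hypothetical stand-ins for z_candidate and the
real joust.  It contrasts the second-pass loop before and after the fix
and counts how many comparisons each one performs.

```cpp
#include <cassert>

// Hypothetical stand-in for z_candidate: a rank plus a "next" link.
struct Candidate {
  int rank;         // in this toy model, the higher rank wins
  Candidate *next;
};

// Counts calls to the toy joust so the two loops can be compared.
static int joust_calls = 0;

// Toy joust: 1 if a is better, -1 if b is better, 0 for a tie.
int joust(const Candidate *a, const Candidate *b) {
  ++joust_calls;
  return (a->rank > b->rank) - (a->rank < b->rank);
}

// Second pass before the fix: the loop stops as soon as it reaches
// champ_compared_to_predecessor, so later candidates are never jousted.
bool confirm_champ_old(Candidate *candidates, Candidate *champ,
                       Candidate *champ_compared_to_predecessor) {
  assert(champ != nullptr);
  for (Candidate *challenger = candidates;
       challenger != champ && challenger != champ_compared_to_predecessor;
       challenger = challenger->next)
    if (joust(champ, challenger) != 1)
      return false;
  return true;
}

// Second pass after the fix: only the one candidate that champ was
// already compared to is skipped; everything else before champ is jousted.
bool confirm_champ(Candidate *candidates, Candidate *champ,
                   Candidate *champ_compared_to_predecessor) {
  assert(champ != nullptr);
  for (Candidate *challenger = candidates; challenger != champ;
       challenger = challenger->next) {
    if (challenger == champ_compared_to_predecessor)
      continue;
    if (joust(champ, challenger) != 1)
      return false;
  }
  return true;
}
```

With candidates a -> b -> c -> d, champ d and champ_compared_to_predecessor
b, the old loop only jousts d against a, while the fixed loop also jousts
it against c, the candidate sitting between the previous champ and the
final one.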

> 
> > 
> > gcc/cp/ChangeLog:
> > 
> > * call.cc (tourney): Only skip champ_compared_to_predecessor.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/warn/Wsign-promo1.C: New test.
> > ---
> >  gcc/cp/call.cc   |  5 +++--
> >  gcc/testsuite/g++.dg/warn/Wsign-promo1.C | 15 +++
> >  2 files changed, 18 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.dg/warn/Wsign-promo1.C
> > 
> > diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
> > index 657eca93d23..a49fde949d5 100644
> > --- a/gcc/cp/call.cc
> > +++ b/gcc/cp/call.cc
> > @@ -13227,10 +13227,11 @@ tourney (struct z_candidate *candidates, 
> > tsubst_flags_t complain)
> >   been compared to.  */
> >  
> >for (challenger = candidates;
> > -   challenger != champ
> > -&& challenger != champ_compared_to_predecessor;
> > +   challenger != champ;
> > challenger = challenger->next)
> >  {
> > +  if (challenger == champ_compared_to_predecessor)
> > +   continue;
> >fate = joust (champ, challenger, 0, complain);
> >if (fate != 1)
> > return NULL;
> > diff --git a/gcc/testsuite/g++.dg/warn/Wsign-promo1.C 
> > b/gcc/testsuite/g++.dg/warn/Wsign-promo1.C
> > new file mode 100644
> > index 000..51b76eee735
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/warn/Wsign-promo1.C
> > @@ -0,0 +1,15 @@
> > +// Check that we get joust warnings from comparing the final champ to a
> > +// candidate between it and the previous champ.
> > +
> > +// { dg-additional-options -Wsign-promo }
> > +
> > +struct A { A(int); };
> > +
> > +enum E { e };
> > +
> > +int f(int, A);
> > +int f(unsigned, A);
> > +int f(int, int);
> > +
> > +int i = f(e, 42);  // { dg-warning "passing 'E'" }
> > +// { dg-warning "in call to 'int f" "" { target *-*-* } .-1 }
> > 
> > base-commit: 084addf8a700fab9222d4127ab8524920d0ca481
> > -- 
> > 2.39.3
> > 
> > 
>

Re: [PATCH v3 1/3] c++: sort candidates according to viability

2023-10-27 Thread Jason Merrill


On 10/27/23 15:55, Patrick Palka wrote:

New in patch 1/3:
   * consistently use "non-viable" instead of "unviable"
 throughout
   * make 'champ' and 'challenger' in 'tourney' be z_candidate**
 to simplify moving 'champ' to the front of the list.  drive-by
 cleanups in tourney, including renaming 'champ_compared_to_predecessor'
 to 'previous_worse_champ' for clarity.
New in patch 2/3:
   * consistently use "non-viable" instead of "unviable" throughout
New in patch 3/3:
   * introduce new -fnote-all-cands flag that controls noting other
 candidates when diagnosing deletedness, and also controls
 noting "ignored" candidates in general.

-- >8 --

This patch:

   * changes splice_viable to move the non-viable candidates to the end
 of the list instead of removing them outright
   * makes tourney move the best candidate to the front of the candidate
 list
   * adjusts print_z_candidates to preserve our behavior of printing only
 viable candidates when diagnosing ambiguity
   * adds a parameter to print_z_candidates to control this default behavior
 (the follow-up patch will want to print all candidates when diagnosing
 deletedness)

Thus after this patch we have access to the entire candidate list through
the best viable candidate.

This change also happens to fix diagnostics for the below testcase where
we currently neglect to note the third candidate, since the presence of
the two unordered non-strictly viable candidates causes splice_viable to
prematurely get rid of the non-viable third candidate.
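
The list manipulation described above can be sketched in isolation.  This
is only an illustration under simplifying assumptions: Cand is a
hypothetical stand-in for z_candidate, viability is reduced to a bool, and
sort_by_viability / move_to_front are made-up names rather than the
functions in call.cc.

```cpp
#include <cassert>

// Hypothetical, simplified candidate node.
struct Cand {
  int id;
  bool viable;
  Cand *next;
};

// Stable partition of a singly linked list: viable candidates first,
// non-viable ones after them, relative order preserved within each group.
// Nothing is dropped from the list.  Returns the new head.
Cand *sort_by_viability(Cand *head) {
  Cand *viable_head = nullptr, **viable_tail = &viable_head;
  Cand *rest_head = nullptr, **rest_tail = &rest_head;
  for (Cand *c = head; c != nullptr;) {
    Cand *next = c->next;
    c->next = nullptr;
    if (c->viable) {
      *viable_tail = c;
      viable_tail = &c->next;
    } else {
      *rest_tail = c;
      rest_tail = &c->next;
    }
    c = next;
  }
  *viable_tail = rest_head;  // append the non-viable group
  return viable_head;
}

// Unlink champ and relink it at the front, so the whole list stays
// reachable through the best candidate.  Returns the new head.
Cand *move_to_front(Cand *head, Cand *champ) {
  assert(champ != nullptr);
  if (head == champ)
    return head;
  for (Cand *c = head; c != nullptr; c = c->next)
    if (c->next == champ) {
      c->next = champ->next;
      champ->next = head;
      return champ;
    }
  return head;  // champ was not in the list; leave it unchanged
}
```

For a list 1 (non-viable), 2 (viable), 3 (non-viable), 4 (viable), sorting
yields 2, 4, 1, 3, and moving candidate 4 to the front then yields
4, 2, 1, 3.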

gcc/cp/ChangeLog:

* call.cc: Include "tristate.h".
(splice_viable): Sort the candidate list according to viability.
Don't remove non-viable candidates from the list.
(print_z_candidates): Add defaulted only_viable_p parameter.
By default only print non-viable candidates if there is no
viable candidate.
(tourney): Make 'candidates' parameter a reference.


Why, when all the callers use the return value?

OK without that change.

Jason

Re: [PATCH] c++: Implement C++26 P1854R4 - Making non-encodable string literals ill-formed [PR110341]

2023-10-27 Thread Jason Merrill


On 8/25/23 16:49, Jakub Jelinek wrote:

Hi!

This paper, voted in as a DR, makes some multi-character literals ill-formed.
'abcd' stays valid, but e.g. 'á' is newly invalid in UTF-8 exec charset
while valid e.g. in ISO-8859-1, because it is a single character which needs
2 bytes to be encoded.

The following patch does that by checking (only pedantically, especially
because it is a DR), whenever we'd emit a -Wmultichar warning because the
character constant has more than one byte in it, whether the number of bytes
in the narrow string matches the number of bytes in CPP_STRING32 divided by
the char32_t size in bytes.  If it does, it is a normal multi-character
literal constant and is diagnosed normally with -Wmultichar; if the number
of bytes is larger, at least one of the c-chars in the sequence was encoded
as 2+ bytes.
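
The byte-count comparison can be illustrated outside libcpp.  The sketch
below is an assumption-laden stand-in, not the patch itself: it assumes a
UTF-8 execution charset and an already-interpreted string with no escape
sequences, and count_code_points / has_multibyte_char are hypothetical
names.  The number of code points plays the role of "bytes in CPP_STRING32
divided by the char32_t size".

```cpp
#include <cstddef>
#include <string>

// Count UTF-8 code points: every byte that is not a continuation byte
// (bit pattern 10xxxxxx) starts a new code point.
std::size_t count_code_points(const std::string &utf8) {
  std::size_t n = 0;
  for (unsigned char byte : utf8)
    if ((byte & 0xC0) != 0x80)
      ++n;
  return n;
}

// If the narrow string has more bytes than code points, at least one
// character needed 2+ bytes, so it cannot be a plain multi-character
// literal made of single-byte characters.
bool has_multibyte_char(const std::string &utf8) {
  return utf8.size() > count_code_points(utf8);
}
```

Under these assumptions "abcd" has 4 bytes and 4 code points, so it stays
an ordinary multi-character literal, while the two-byte UTF-8 encoding of
'á' (0xC3 0xA1) has 2 bytes but only 1 code point.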

Now, doing it this way has 2 drawbacks.  First, some of the diagnostics which
don't result in cpp_interpret_string_1 failures can be printed twice, once
when calling cpp_interpret_string_1 for CPP_CHAR, once for CPP_STRING32.
Second, functionally I think it must work 100% correctly if the host source
character set is UTF-8 (because all valid UTF-8 chars are encodable in
UTF-32), but it might not work for some control codes in UTF-EBCDIC if
that is the source character set (though I don't know if we really actually
support it, e.g. Linux iconv certainly doesn't).
All we actually need is to count the number of c-chars in the literal; an
alternative would be to write a custom character counter which would quietly
interpret/skip over + count escape sequences and decode UTF-8 characters
in between those escape sequences.  But we'd need to have something similar
also for UTF-EBCDIC if it works at all, and from what I've looked, we don't
have anything like that implemented in libcpp nor anywhere else in GCC.
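
For reference, the alternative of a custom counter might look roughly like
the sketch below.  It is purely illustrative and makes several assumptions:
the source is UTF-8, the literal body is well-formed, and only the classic
escape forms (simple, octal, \x, \u, \U) occur; the delimited C++23 forms
such as \x{...} or \N{...} are not handled.  count_c_chars is a
hypothetical name, not an existing libcpp function.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Count c-chars in the body of a character literal (the text between the
// quotes), skipping over escape sequences and UTF-8 continuation bytes.
std::size_t count_c_chars(const std::string &body) {
  std::size_t count = 0;
  std::size_t i = 0;
  while (i < body.size()) {
    if (body[i] == '\\' && i + 1 < body.size()) {
      char kind = body[i + 1];
      i += 2;  // the backslash and the character that follows it
      if (kind == 'x') {
        // \x takes as many hex digits as follow.
        while (i < body.size()
               && std::isxdigit(static_cast<unsigned char>(body[i])))
          ++i;
      } else if (kind == 'u') {
        i += 4;  // \uXXXX
      } else if (kind == 'U') {
        i += 8;  // \UXXXXXXXX
      } else if (kind >= '0' && kind <= '7') {
        // Octal escapes have at most three digits; one is consumed already.
        for (int k = 0; k < 2 && i < body.size()
                        && body[i] >= '0' && body[i] <= '7'; ++k)
          ++i;
      }
      // Simple escapes such as \n need nothing beyond the two characters.
    } else {
      // One UTF-8 character: a lead byte plus its continuation bytes.
      ++i;
      while (i < body.size()
             && (static_cast<unsigned char>(body[i]) & 0xC0) == 0x80)
        ++i;
    }
    ++count;
  }
  return count;
}
```

A counter like this avoids interpreting the token a second time, which is
what causes the duplicated diagnostics mentioned above, at the cost of
duplicating escape-sequence knowledge; a UTF-EBCDIC source charset would
need its own decoding branch.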

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
Or ok with some tweaks to avoid the second round of diagnostics from
cpp_interpret_string_1/convert_escape?  Or reimplement that second time and
count manually?

2023-08-25  Jakub Jelinek  

PR c++/110341
libcpp/
* charset.cc: Implement C++ 26 P1854R4 - Making non-encodable string
literals ill-formed.
(narrow_str_to_charconst): Change last type from cpp_ttype to
const cpp_token *.  For C++ if pedantic and i > 1 in CPP_CHAR
interpret token also as CPP_STRING32 and if number of characters
in the CPP_STRING32 is larger than number of bytes in CPP_CHAR,
pedwarn on it.
(cpp_interpret_charconst): Adjust narrow_str_to_charconst caller.
gcc/testsuite/
* g++.dg/cpp26/literals1.C: New test.
* g++.dg/cpp26/literals2.C: New test.
* g++.dg/cpp23/wchar-multi1.C (c, d): Expect an error rather than
warning.

--- gcc/testsuite/g++.dg/cpp26/literals1.C.jj   2023-08-25 17:23:06.662878355 +0200
+++ gcc/testsuite/g++.dg/cpp26/literals1.C  2023-08-25 17:37:03.085132304 +0200
@@ -0,0 +1,65 @@
+// C++26 P1854R4 - Making non-encodable string literals ill-formed
+// { dg-do compile { target c++11 } }
+// { dg-require-effective-target int32 }
+// { dg-options "-pedantic-errors -finput-charset=UTF-8 -fexec-charset=UTF-8" }
+
+int d = '';   // { dg-error "character too 
large for character literal type" }

...

+char16_t m = u''; // { dg-error "character 
constant too long for its type" }


Why are these different diagnostics?  Why doesn't the first line already 
hit the existing diagnostic that the second gets?


Both could be clearer that the problem is that the single source 
character can't be encoded as a single execution character.


Jason

Re: [pushed] c++: fix tourney logic