Re: [PATCH] v2: small _BitInt tweaks

2023-09-19 Thread Richard Biener via Gcc-patches
On Tue, 19 Sep 2023, Jakub Jelinek wrote:

> Hi!
> 
> On Tue, Sep 12, 2023 at 05:27:30PM +0000, Joseph Myers wrote:
> > On Tue, 12 Sep 2023, Jakub Jelinek via Gcc-patches wrote:
> > 
> > > And by ensuring we never create 1-bit signed BITINT_TYPE e.g. the backends
> > > don't need to worry about them.
> > > 
> > > But I admit I don't feel strongly about that.
> > > 
> > > Joseph, what do you think about this?
> > 
> > I think it's appropriate to avoid 1-bit signed BITINT_TYPE consistently.
> 
> Here is a patch which does that.  In addition to the previously changed two
> hunks it also adds a checking assertion that we don't create
> signed _BitInt(0), unsigned _BitInt(0) or signed _BitInt(1) types.
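As a quick illustration of the invariant the new assert enforces (my own
example, assuming a C23 compiler with _BitInt support): an unsigned _BitInt
needs at least one value bit, and a signed one additionally needs the sign
bit, so the smallest valid widths differ by one.

    unsigned _BitInt(1) u1;  /* OK: one value bit.  */
    signed _BitInt(2)   s2;  /* OK: sign bit plus one value bit.  */
    /* signed _BitInt(1) s1;  -- rejected: a lone sign bit leaves no value
       bits, which is what "precision >= 1 + !unsignedp" rules out.  */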
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2023-09-18  Jakub Jelinek  
> 
> gcc/
>   * tree.cc (build_bitint_type): Assert precision is not 0, or
>   for signed types 1.
>   (signed_or_unsigned_type_for): Return INTEGER_TYPE for signed variant
>   of unsigned _BitInt(1).
> gcc/c-family/
>   * c-common.cc (c_common_signed_or_unsigned_type): Return INTEGER_TYPE
>   for signed variant of unsigned _BitInt(1).
> 
> --- gcc/tree.cc.jj 2023-09-11 17:01:17.612714178 +0200
> +++ gcc/tree.cc   2023-09-18 12:36:37.598912717 +0200
> @@ -7179,6 +7179,8 @@ build_bitint_type (unsigned HOST_WIDE_IN
>  {
>tree itype, ret;
>  
> +  gcc_checking_assert (precision >= 1 + !unsignedp);
> +
>if (unsignedp)
>  unsignedp = MAX_INT_CACHED_PREC + 1;
>  
> @@ -11096,7 +11098,7 @@ signed_or_unsigned_type_for (int unsigne
>else
>  return NULL_TREE;
>  
> -  if (TREE_CODE (type) == BITINT_TYPE)
> +  if (TREE_CODE (type) == BITINT_TYPE && (unsignedp || bits > 1))
>  return build_bitint_type (bits, unsignedp);
>return build_nonstandard_integer_type (bits, unsignedp);
>  }
> --- gcc/c-family/c-common.cc.jj   2023-09-11 17:01:17.517715431 +0200
> +++ gcc/c-family/c-common.cc  2023-09-18 12:35:06.829126858 +0200
> @@ -2739,7 +2739,9 @@ c_common_signed_or_unsigned_type (int un
>|| TYPE_UNSIGNED (type) == unsignedp)
>  return type;
>  
> -  if (TREE_CODE (type) == BITINT_TYPE)
> +  if (TREE_CODE (type) == BITINT_TYPE
> +  /* signed _BitInt(1) is invalid, avoid creating that.  */
> +  && (unsignedp || TYPE_PRECISION (type) > 1))
>  return build_bitint_type (TYPE_PRECISION (type), unsignedp);
>  
>  #define TYPE_OK(node)
> \
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: LTO: Get rid of 'lto_mode_identity_table' (was: Machine Mode ICE in RISC-V when LTO)

2023-09-18 Thread Richard Biener via Gcc-patches
On Mon, Sep 18, 2023 at 4:46 PM Thomas Schwinge  wrote:
>
> Hi!
>
> On 2023-09-15T15:33:59+0200, Robin Dapp  wrote:
> > is there anything we can do to assist from the riscv side in order to help
> > with this?  I haven't really been involved with it but was wondering
> > what's missing.  If I understand correctly Thomas has a major cleanup
> > operation in plan
>
> Not really major, but indeed non-trivial -- but WIP already.  ;-)
>
> > but might not get to it soon.
>
> Right.
>
> > The fix he proposed
> > helps for the riscv case, however, even without the rework?
>
> Right, and no harm done for my work.
>
> > If so, I'd kindly ping Jakub to check if the fix is reasonable.
>
> I'll push the attached "LTO: Get rid of 'lto_mode_identity_table'"
> mid-week, unless any objections are raised.

OK.

Richard.

>
> Regards
>  Thomas
>
>


Re: [PATCH] MATCH: Avoid recursive zero_one_valued_p for conversions

2023-09-18 Thread Richard Biener via Gcc-patches
On Sun, Sep 17, 2023 at 3:45 AM Andrew Pinski via Gcc-patches
 wrote:
>
> So when VN finds a name which has a nop conversion, it says
> both names are equivalent to each other and the valueization
> function for one will return the other. This normally does not
> cause any issues as there are no recursive matches. But after
> r14-4038-gb975c0dc3be285, one was added, so we would
> recurse infinitely on the match and never finish.
> This fixes the issue (and adds a comment in match.pd) by
> handling just one level for converts instead of matching
> recursively.
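To see why that recursion cannot terminate, here is a toy model (my sketch;
the integer "names" and the valueize callback are stand-ins, not GCC's
actual API). VN records the two names as equivalent, so a match that
recurses through valueized operands bounces between them forever:

    #include <stdio.h>

    /* Two "SSA names", 1 and 2, that VN considers equivalent:
       valueizing either one hands back the other.  */
    static int valueize (int name) { return name == 1 ? 2 : 1; }

    /* The old pattern recursed on the valueized operand, so it never
       reached a defining statement.  The depth cap only exists so this
       demo halts.  */
    static int zero_one_valued_p (int name, int depth)
    {
      if (depth > 4)
        {
          puts ("1 -> 2 -> 1 -> ... : infinite recursion");
          return 0;
        }
      return zero_one_valued_p (valueize (name), depth + 1);
    }

    int main (void) { return zero_one_valued_p (1, 0); }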
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

> Note the testcase was reduced from tree-ssa-loop-niter.cc and then
> changed slightly into C rather than C++, but it still needs exceptions
> turned on to get the IR in which VN produces this equivalence
> relationship. Early inlining also had to be turned off to force put to
> be inlined later.
>
> PR tree-optimization/111435
>
> gcc/ChangeLog:
>
> * match.pd (zero_one_valued_p): Don't do recursion
> on converts.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.c-torture/compile/pr111435-1.c: New test.
> ---
>  gcc/match.pd   |  8 +++-
>  .../gcc.c-torture/compile/pr111435-1.c | 18 ++
>  2 files changed, 25 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr111435-1.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 97405e6a5c3..887665633d4 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -2188,8 +2188,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>
>  /* A conversion from an zero_one_valued_p is still a [0,1].
> This is useful when the range of a variable is not known */
> +/* Note this match can't be recursive because of the way VN handles
> +   nop conversions as equivalent, which would recurse between them. */
>  (match zero_one_valued_p
> - (convert@0 zero_one_valued_p))
> + (convert@0 @1)
> + (if (INTEGRAL_TYPE_P (TREE_TYPE (@1))
> +  && (TYPE_UNSIGNED (TREE_TYPE (@1))
> + || TYPE_PRECISION (TREE_TYPE (@1)) > 1)
> +  && wi::leu_p (tree_nonzero_bits (@1), 1))))
>
>  /* Transform { 0 or 1 } * { 0 or 1 } into { 0 or 1 } & { 0 or 1 }.  */
>  (simplify
> diff --git a/gcc/testsuite/gcc.c-torture/compile/pr111435-1.c 
> b/gcc/testsuite/gcc.c-torture/compile/pr111435-1.c
> new file mode 100644
> index 000..afa84dd59dd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/pr111435-1.c
> @@ -0,0 +1,18 @@
> +/* { dg-options "-fexceptions -fno-early-inlining" } */
> +/* { dg-require-effective-target exceptions } */
> +
> +void find_slot_with_hash(const int *);
> +
> +void put(const int *k, const int *) {
> +find_slot_with_hash(k);
> +}
> +unsigned len();
> +int *address();
> +void h(int header, int **bounds) {
> +  if (!*bounds)
> +return;
> +  unsigned t = *bounds ? len() : 0;
> +  int queue_index = t;
> +  address()[(unsigned)queue_index] = 0;
> +  put(&header, &queue_index);
> +}
> --
> 2.31.1
>


Re: [PATCH] MATCH: Make zero_one_valued_p fully non-recursive

2023-09-18 Thread Richard Biener via Gcc-patches
On Sun, Sep 17, 2023 at 11:41 PM Andrew Pinski via Gcc-patches
 wrote:
>
> So it turns out VN can't handle any kind of recursion for match. In this
> case we have `b = a & -1`, we try to match a as being zero_one_valued_p,
> and VN returns b as the value, so we just go into an infinite loop at
> this point.
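For illustration, a C-level sketch of the trigger (mine, not from the
mail): `a & -1` folds to `a`, so VN records `b` as equivalent to `a`;
matching `a` as zero_one_valued_p then valueizes back to `b`, whose
definition is again the bit_and, and the old recursive match loops.

    int f (int a)
    {
      int b = a & -1;  /* folds to a, so VN makes b equivalent to a */
      return b & 1;    /* matching zero_one_valued_p here bounced
                          between a and b forever */
    }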

Huh, interesting.  Must be because we return an available expression for
the b, a & -1 equivalency class.  Otherwise I'd have expected you get 'a'.

>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

Richard.

> Note genmatch should warn (or error out) if this gets detected, so I filed PR
> 111446,
> which I will be looking into next week or the week after so we don't run into
> this issue again.
>
> PR tree-optimization/111442
>
> gcc/ChangeLog:
>
> * match.pd (zero_one_valued_p): Have the bit_and match not be
> recursive.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.c-torture/compile/pr111442-1.c: New test.
> ---
>  gcc/match.pd |  5 -
>  gcc/testsuite/gcc.c-torture/compile/pr111442-1.c | 13 +
>  2 files changed, 17 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr111442-1.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 887665633d4..773c3810f51 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -2183,8 +2183,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>
>  /* (a&1) is always [0,1] too. This is useful again when
> the range is not known. */
> +/* Note this can't be recursive due to the way VN handles equivalents;
> +   it would cause an infinite recursion. */
>  (match zero_one_valued_p
> - (bit_and:c@0 @1 zero_one_valued_p))
> + (bit_and:c@0 @1 integer_onep)
> + (if (INTEGRAL_TYPE_P (type))))
>
>  /* A conversion from an zero_one_valued_p is still a [0,1].
> This is useful when the range of a variable is not known */
> diff --git a/gcc/testsuite/gcc.c-torture/compile/pr111442-1.c 
> b/gcc/testsuite/gcc.c-torture/compile/pr111442-1.c
> new file mode 100644
> index 000..5814ee938de
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/pr111442-1.c
> @@ -0,0 +1,13 @@
> +
> +int *a, b;
> +int main() {
> +  int d = 1, e;
> +  if (d)
> +e = a ? 0 % 0 : 0;
> +  if (d)
> +a = &b;
> +  d = -1;
> +  b = d & e;
> +  b = 2 * e ^ 1;
> +  return 0;
> +}
> --
> 2.31.1
>


Re: [PATCH/RFC 08/10] aarch64: Don't use CEIL for vector_store in aarch64_stp_sequence_cost

2023-09-18 Thread Richard Biener via Gcc-patches
On Mon, Sep 18, 2023 at 10:41 AM Richard Sandiford
 wrote:
>
> Kewen Lin  writes:
> > This costing adjustment patch series exposes one issue in
> > aarch64 specific costing adjustment for STP sequence.  It
> > causes the below test cases to fail:
> >
> >   - gcc/testsuite/gcc.target/aarch64/ldp_stp_15.c
> >   - gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c
> >   - gcc/testsuite/gcc.target/aarch64/ldp_stp_17.c
> >   - gcc/testsuite/gcc.target/aarch64/ldp_stp_18.c
> >
> > Take the below function extracted from ldp_stp_15.c as
> > example:
> >
> > void
> > dup_8_int32_t (int32_t *x, int32_t val)
> > {
> > for (int i = 0; i < 8; ++i)
> >   x[i] = val;
> > }
> >
> > Without my patch series, during slp1 it gets:
> >
> >   val_8(D) 2 times unaligned_store (misalign -1) costs 2 in body
> >   node 0x10008c85e38 1 times scalar_to_vec costs 1 in prologue
> >
> > then the final vector cost is 3.
> >
> > With my patch series, during slp1 it gets:
> >
> >   val_8(D) 1 times unaligned_store (misalign -1) costs 1 in body
> >   val_8(D) 1 times unaligned_store (misalign -1) costs 1 in body
> >   node 0x10004cc5d88 1 times scalar_to_vec costs 1 in prologue
> >
> > but the final vector cost is 17.  The unaligned_store count is
> > actually unchanged, but the final vector costs become different;
> > that's because the aarch64 special handling below produces the
> > different costs:
> >
> >   /* Apply the heuristic described above m_stp_sequence_cost.  */
> >   if (m_stp_sequence_cost != ~0U)
> > {
> >   uint64_t cost = aarch64_stp_sequence_cost (count, kind,
> >stmt_info, vectype);
> >   m_stp_sequence_cost = MIN (m_stp_sequence_cost + cost, ~0U);
> > }
> >
> > For the former, since the count is 2, function
> > aarch64_stp_sequence_cost returns 2 as "CEIL (count, 2) * 2".
> > While for the latter, it's separated into two calls with
> > count 1, aarch64_stp_sequence_cost returns 2 each time,
> > so it returns 4 in total.
> >
> > For this case, the stmt with scalar_to_vec also contributes
> > 4 to m_stp_sequence_cost, so the final m_stp_sequence_cost
> > values are 6 (2+4) vs. 8 (4+4).
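The arithmetic above can be reproduced with a stand-alone model of just the
CEIL-based formula described (my sketch, not the backend code):

    #include <stdio.h>

    #define CEIL(x, y) (((x) + (y) - 1) / (y))

    /* Model of the unaligned_store case: CEIL (count, 2) * 2.  */
    static unsigned stp_cost (unsigned count) { return CEIL (count, 2) * 2; }

    int main (void)
    {
      printf ("one call with count 2:  %u\n", stp_cost (2));   /* 2 */
      printf ("two calls with count 1: %u\n",
              stp_cost (1) + stp_cost (1));                    /* 4 */
      return 0;
    }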
> >
> > Considering scalar_costs->m_stp_sequence_cost is 8 and below
> > checking and re-assigning:
> >
> >   else if (m_stp_sequence_cost >= scalar_costs->m_stp_sequence_cost)
> > m_costs[vect_body] = 2 * scalar_costs->total_cost ();
> >
> > For the former, the vector body cost isn't changed; but
> > for the latter, the vector body cost is double the scalar
> > cost, which is 8 for this case, so it becomes 16, which is
> > bigger than what we expect.
> >
> > I'm not sure why it adopts CEIL for the return value for
> > case unaligned_store in function aarch64_stp_sequence_cost,
> > but I tried to modify it with "return count;" (as it can
> > get back to previous cost), there are no failures exposed
> > in regression testing.  I expected that if the previous
> > unaligned_store count is even, this adjustment doesn't
> > change anything; if it's odd, the adjustment may reduce
> > it by one, but I'd guess such cases would be few.  Besides, as
> > the comments for m_stp_sequence_cost note, the current
> > handling seems temporary, so maybe a tweak like this can be
> > accepted; I posted this RFC/PATCH to request comments on
> > whether this one line change is acceptable.
>
> It's unfortunate that doing this didn't show up a regression.
> I guess it's not a change we explicitly added tests to guard against.
>
> But the point of the condition is to estimate how many single stores
> (STRs) and how many paired stores (STPs) would be generated.  As far
> as this heuristic goes, STP (storing two values) is as cheap as STR
> (storing only one value).  So the point of the CEIL is to count 1 store
> as having equal cost to 2, 3 as having equal cost to 4, etc.
>
> For a heuristic like that, costing a vector stmt once with count 2
> is different from costing 2 vector stmts with count 1.  The former
> makes it obvious that the 2 vector stmts are associated with the
> same scalar stmt, and are highly likely to be consecutive.  The latter
> (costing 2 stmts with count 1) could also happen for unrelated stmts.
>
> ISTM that costing once with count N provides strictly more information
> to targets than costing N time with count 1.  Is there no way we can
> keep the current behaviour?  E.g. rather than costing a stmt immediately
> within a loop, could we just increment a counter and cost once at the end?

I suppose we can.  But isn't the logic currently (or before the series) cheated
for variable-strided stores with ncopies > 1?  That is, while it sounds like
a reasonable heuristic, you can't really rely on this as the vectorizer doesn't
currently provide the info on whether two vector loads/stores are adjacent.

Making sure we only pass count > 1 for adjacent loads/stores would be possible,
though.  We should document this with comments.

Richard.

>
> Thanks,
> Richard
>
> > gcc/ChangeLog:
> >
> >   * 

Re: [PATCH] tree-optimization/111294 - backwards threader PHI costing

2023-09-18 Thread Richard Biener via Gcc-patches
On Mon, 18 Sep 2023, Jakub Jelinek wrote:

> On Thu, Sep 14, 2023 at 01:23:13PM +0000, Richard Biener via Gcc-patches 
> wrote:
> > diff --git a/libgomp/team.c b/libgomp/team.c
> > index 54dfca8080a..e5a86de1dd0 100644
> > --- a/libgomp/team.c
> > +++ b/libgomp/team.c
> > @@ -756,8 +756,9 @@ gomp_team_start (void (*fn) (void *), void *data, 
> > unsigned nthreads,
> >attr = &thread_attr;
> >  }
> >  
> > -  start_data = gomp_alloca (sizeof (struct gomp_thread_start_data)
> > -   * (nthreads - i));
> > +  if (i < nthreads)
> > +start_data = gomp_alloca (sizeof (struct gomp_thread_start_data)
> > + * (nthreads - i));
> >  
> >/* Launch new threads.  */
> >for (; i < nthreads; ++i)
> 
> Wouldn't just
>   if (i >= nthreads)
> __builtin_unreachable ();
> do the trick as well?

I'll check and adjust to that if possible.

Richard.

> I'd prefer not to add further runtime checks here if possible.
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] Trivial typo fix in variadic

2023-09-18 Thread Richard Biener via Gcc-patches
On Sun, Sep 17, 2023 at 9:47 PM Marc Poulhiès via Gcc-patches
 wrote:
>
> Fix all occurrences of varadic, except for Rust (will be part of another
> change).

OK.

> gcc/ChangeLog:
>
> * config/nvptx/nvptx.h (struct machine_function): Fix typo in 
> variadic.
> * config/nvptx/nvptx.cc (nvptx_function_arg_advance): Adjust to use 
> fixed name.
> (nvptx_declare_function_name): Likewise.
> (nvptx_call_args): Likewise.
> (nvptx_expand_call): Likewise.
>
> gcc/cp/ChangeLog:
>
> * lambda.cc (compare_lambda_sig): Fix typo in variadic.
>
> libcpp/ChangeLog:
>
> * macro.cc (parse_params): Fix typo in variadic.
> (create_iso_definition): Likewise.
>
> Signed-off-by: Marc Poulhiès 
> ---
>
> Hi,
>
> I came across this trivial typo and fixed it.
>
> The compiler still builds correctly.
> I've bootstrapped x86_64-linux.
> As I don't really know how to setup nvptx correctly (and not sure
> this trivial fix warrants learning the full setup...), I've simply
> built the compiler for nvptx-none.
>
> Ok for master?
>
>  gcc/config/nvptx/nvptx.cc | 14 +++---
>  gcc/config/nvptx/nvptx.h  |  4 ++--
>  gcc/cp/lambda.cc  |  2 +-
>  libcpp/macro.cc   | 20 ++--
>  4 files changed, 20 insertions(+), 20 deletions(-)
>
> diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
> index edef39fb5e1..0de42408841 100644
> --- a/gcc/config/nvptx/nvptx.cc
> +++ b/gcc/config/nvptx/nvptx.cc
> @@ -720,7 +720,7 @@ nvptx_function_arg_advance (cumulative_args_t cum_v, 
> const function_arg_info &)
>
>  /* Implement TARGET_FUNCTION_ARG_BOUNDARY.
>
> -   For nvptx This is only used for varadic args.  The type has already
> +   For nvptx This is only used for variadic args.  The type has already
> been promoted and/or converted to invisible reference.  */
>
>  static unsigned
> @@ -1548,7 +1548,7 @@ nvptx_declare_function_name (FILE *file, const char 
> *name, const_tree decl)
>if (!TARGET_SOFT_STACK)
>  {
>/* Declare a local var for outgoing varargs.  */
> -  if (cfun->machine->has_varadic)
> +  if (cfun->machine->has_variadic)
> init_frame (file, STACK_POINTER_REGNUM,
> UNITS_PER_WORD, crtl->outgoing_args_size);
>
> @@ -1558,7 +1558,7 @@ nvptx_declare_function_name (FILE *file, const char 
> *name, const_tree decl)
> init_frame (file, FRAME_POINTER_REGNUM, alignment,
> ROUND_UP (sz, GET_MODE_SIZE (DImode)));
>  }
> -  else if (need_frameptr || cfun->machine->has_varadic || cfun->calls_alloca
> +  else if (need_frameptr || cfun->machine->has_variadic || cfun->calls_alloca
>|| (cfun->machine->has_simtreg && !crtl->is_leaf))
>  init_softstack_frame (file, alignment, sz);
>
> @@ -1795,13 +1795,13 @@ nvptx_call_args (rtx arg, tree fntype)
>if (!cfun->machine->doing_call)
>  {
>cfun->machine->doing_call = true;
> -  cfun->machine->is_varadic = false;
> +  cfun->machine->is_variadic = false;
>cfun->machine->num_args = 0;
>
>if (fntype && stdarg_p (fntype))
> {
> - cfun->machine->is_varadic = true;
> - cfun->machine->has_varadic = true;
> + cfun->machine->is_variadic = true;
> + cfun->machine->has_variadic = true;
>   cfun->machine->num_args++;
> }
>  }
> @@ -1871,7 +1871,7 @@ nvptx_expand_call (rtx retval, rtx address)
>  }
>
>unsigned nargs = cfun->machine->num_args;
> -  if (cfun->machine->is_varadic)
> +  if (cfun->machine->is_variadic)
>  {
>varargs = gen_reg_rtx (Pmode);
>emit_move_insn (varargs, stack_pointer_rtx);
> diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
> index 129427e5654..666021283c2 100644
> --- a/gcc/config/nvptx/nvptx.h
> +++ b/gcc/config/nvptx/nvptx.h
> @@ -209,8 +209,8 @@ struct GTY(()) machine_function
>  {
>rtx_expr_list *call_args;  /* Arg list for the current call.  */
>bool doing_call; /* Within a CALL_ARGS ... CALL_ARGS_END sequence.  */
> -  bool is_varadic;  /* This call is varadic  */
> -  bool has_varadic;  /* Current function has a varadic call.  */
> +  bool is_variadic;  /* This call is variadic  */
> +  bool has_variadic;  /* Current function has a variadic call.  */
>bool has_chain; /* Current function has outgoing static chain.  */
>bool has_softstack; /* Current function has a soft stack frame.  */
>bool has_simtreg; /* Current function has an OpenMP SIMD region.  */
> diff --git a/gcc/cp/lambda.cc b/gcc/cp/lambda.cc
> index a359bc6ee8d..34d0190a89b 100644
> --- a/gcc/cp/lambda.cc
> +++ b/gcc/cp/lambda.cc
> @@ -1619,7 +1619,7 @@ compare_lambda_sig (tree fn_a, tree fn_b)
>  {
>if (!args_a || !args_b)
> return false;
> -  // This check also deals with differing varadicness
> +  // This check also deals with differing variadicness
>if (!same_type_p (TREE_VALUE (args_a), TREE_VALUE (args_b)))
> 

Re: [PATCH] MATCH: Add simplifications of `(a == CST) & a`

2023-09-18 Thread Richard Biener via Gcc-patches
On Sat, Sep 16, 2023 at 6:00 PM Andrew Pinski via Gcc-patches
 wrote:
>
> `(a == CST) & a` can be simplified to either `a == CST`
> or 0, depending on the first bit of the CST.
> This is an extension of the existing pattern for `X & !X` and allows
> us to remove the 2 xfails on gcc.dg/binop-notand1a.c and
> gcc.dg/binop-notand4a.c.
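A worked instance of the rule (my example): on the only path where the AND
can be nonzero, a equals CST, so the result reduces to the comparison
masked by CST's low bit.

    /* a == 5: result is 5 & 1 == 1; otherwise a & 0 == 0.
       So this is just (a == 5).  */
    int odd_cst (int a)  { int b = a == 5; return a & b; }

    /* a == 2: result is 2 & 1 == 0; otherwise 0.  So this is 0.  */
    int even_cst (int a) { int b = a == 2; return a & b; }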
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

> PR tree-optimization/111431
>
> gcc/ChangeLog:
>
> * match.pd (`(a == CST) & a`): New pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/binop-notand1a.c: Remove xfail.
> * gcc.dg/binop-notand4a.c: Likewise.
> * gcc.c-torture/execute/pr111431-1.c: New test.
> * gcc.dg/binop-andeq1.c: New test.
> * gcc.dg/binop-andeq2.c: New test.
> * gcc.dg/binop-notand7.c: New test.
> * gcc.dg/binop-notand7a.c: New test.
> ---
>  gcc/match.pd  |  8 
>  .../gcc.c-torture/execute/pr111431-1.c| 39 +++
>  gcc/testsuite/gcc.dg/binop-andeq1.c   | 12 ++
>  gcc/testsuite/gcc.dg/binop-andeq2.c   | 14 +++
>  gcc/testsuite/gcc.dg/binop-notand1a.c |  4 +-
>  gcc/testsuite/gcc.dg/binop-notand4a.c |  4 +-
>  gcc/testsuite/gcc.dg/binop-notand7.c  | 12 ++
>  gcc/testsuite/gcc.dg/binop-notand7a.c | 12 ++
>  8 files changed, 99 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr111431-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/binop-andeq1.c
>  create mode 100644 gcc/testsuite/gcc.dg/binop-andeq2.c
>  create mode 100644 gcc/testsuite/gcc.dg/binop-notand7.c
>  create mode 100644 gcc/testsuite/gcc.dg/binop-notand7a.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index ebb50ee0581..65960a1701e 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5172,6 +5172,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   )
>  )
>
> +/* `(a == CST) & a` can be simplified to `0` or `(a == CST)` depending
> +   on the first bit of the CST.  */
> +(simplify
> + (bit_and:c (convert@2 (eq @0 INTEGER_CST@1)) (convert? @0))
> + (if ((wi::to_wide (@1) & 1) != 0)
> +  @2
> +  { build_zero_cst (type); }))
> +
>  /* Optimize
> # x_5 in range [cst1, cst2] where cst2 = cst1 + 1
> x_5 ? cstN ? cst4 : cst3
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr111431-1.c 
> b/gcc/testsuite/gcc.c-torture/execute/pr111431-1.c
> new file mode 100644
> index 000..a96dbadf2b5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr111431-1.c
> @@ -0,0 +1,39 @@
> +int
> +foo (int a)
> +{
> +  int b = a == 0;
> +  return (a & b);
> +}
> +
> +#define function(vol,cst) \
> +__attribute__((noipa)) \
> +_Bool func_##cst##_##vol(vol int a) \
> +{ \
> +  vol int b = a == cst; \
> +  return (a & b); \
> +}
> +
> +#define funcdefs(cst) \
> +function(,cst) \
> +function(volatile,cst)
> +
> +#define funcs(f) \
> +f(0) \
> +f(1) \
> +f(5)
> +
> +funcs(funcdefs)
> +
> +#define test(cst) \
> +do { \
> + if(func_##cst##_(a) != func_##cst##_volatile(a))\
> +   __builtin_abort(); \
> +} while(0);
> +int main(void)
> +{
> +  for(int a = -10; a <= 10; a++)
> +   {
> + funcs(test)
> +   }
> +}
> +
> diff --git a/gcc/testsuite/gcc.dg/binop-andeq1.c 
> b/gcc/testsuite/gcc.dg/binop-andeq1.c
> new file mode 100644
> index 000..2a92b8f95df
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/binop-andeq1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* PR tree-optimization/111431 */
> +
> +int
> +foo (int a)
> +{
> +  int b = a == 2;
> +  return (a & b);
> +}
> +
> +/* { dg-final { scan-tree-dump-times "return 0" 1 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/binop-andeq2.c 
> b/gcc/testsuite/gcc.dg/binop-andeq2.c
> new file mode 100644
> index 000..895262fc17e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/binop-andeq2.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* PR tree-optimization/111431 */
> +
> +int
> +foo (int a)
> +{
> +  int b = a == 1025;
> +  return (a & b);
> +}
> +
> +/* { dg-final { scan-tree-dump-not "return 0"  "optimized" } } */
> +/* { dg-final { scan-tree-dump-not " & "  "optimized" } } */
> +/* { dg-final { scan-tree-dump-times " == 1025;" 1  "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/binop-notand1a.c 
> b/gcc/testsuite/gcc.dg/binop-notand1a.c
> index c7e932b2638..d94685eb4ce 100644
> --- a/gcc/testsuite/gcc.dg/binop-notand1a.c
> +++ b/gcc/testsuite/gcc.dg/binop-notand1a.c
> @@ -7,6 +7,4 @@ foo (char a, unsigned short b)
>return (a & !a) | (b & !b);
>  }
>
> -/* As long as comparisons aren't boolified and casts from boolean-types
> -   aren't preserved, the folding of  X & !X to zero fails.  */
> -/* { dg-final { scan-tree-dump-times "return 0" 1 "optimized" { xfail *-*-* 
> } } } */
> +/* { dg-final { scan-tree-dump-times "return 0" 1 "optimized"  } } */
> diff 

Re: [PATCH] MATCH: Add simplifications for `(a * zero_one) ==/!= CST`

2023-09-18 Thread Richard Biener via Gcc-patches
On Sat, Sep 16, 2023 at 7:50 AM Andrew Pinski via Gcc-patches
 wrote:
>
> Transforming `(a * b@[0,1]) != 0` into `((cast)b) & a != 0`

that isn't strictly a simplification (one more op), and your
alternate transform is even worse in this regard.

> will produce better code as a lot of the time b is defined
> by a comparison.

what if not?  How does it simplify then?

> Also, since we canonicalize `a & -zero_one` into `a * zero_one`, we
> start to lose information when doing comparisons against 0.
> In the case of PR 110992, we lose that `a != 0` on the branch

How so?  Ranger should be happy with both forms, no?

> and then don't do a jump threading like we should.
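To make the case analysis concrete, a worked instance (my example,
following the comment in the hunk below): with b restricted to [0,1],
a * b is either 0 or a, which splits the comparison on whether CST is
zero.

    /* CST != 0, cmp == ne: true when b == 0 (0 != 5) or when a != 5,
       i.e. (a != 5) | !b.  */
    int ne5 (int a, _Bool b) { return (a * b) != 5; }

    /* CST == 0, cmp == eq: true when b == 0 or when a == 0,
       i.e. (a == 0) | !b.  */
    int eq0 (int a, _Bool b) { return (a * b) == 0; }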
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> PR tree-optimization/110992
>
> gcc/ChangeLog:
>
> * match.pd (`a * zero_one !=/== CST`): New pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/vrp116.c: Update test to avoid the
> extra comparison.
> * gcc.c-torture/execute/pr110992-1.c: New test.
> * gcc.dg/tree-ssa/pr110992-1.c: New test.
> * gcc.dg/tree-ssa/pr110992-2.c: New test.
> ---
>  gcc/match.pd  | 15 +++
>  .../gcc.c-torture/execute/pr110992-1.c| 43 +++
>  gcc/testsuite/gcc.dg/tree-ssa/pr110992-1.c| 21 +
>  gcc/testsuite/gcc.dg/tree-ssa/pr110992-2.c| 17 
>  gcc/testsuite/gcc.dg/tree-ssa/vrp116.c|  2 +-
>  5 files changed, 97 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr110992-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr110992-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr110992-2.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 39c9c81966a..97405e6a5c3 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -2197,6 +2197,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (INTEGRAL_TYPE_P (type))
>(bit_and @0 @1)))
>
> +/* (a * b@[0,1]) == CST
> + ->
> +   CST == 0 ? (a == CST | b == 0) : (a == CST & b != 0)
> +   (a * b@[0,1]) != CST
> + ->
> +   CST != 0 ? (a != CST | b == 0) : (a != CST & b != 0)  */
> +(for cmp (ne eq)
> + (simplify
> +  (cmp (mult:cs @0 zero_one_valued_p@1) INTEGER_CST@2)
> +  (if ((cmp == EQ_EXPR) ^ (wi::to_wide (@2) != 0))
> +   (bit_ior
> +(cmp @0 @2)
> +(convert (bit_xor @1 { build_one_cst (TREE_TYPE (@1)); })))
> +   (bit_and (cmp @0 @2) (convert @1)))))
> +
>  (for cmp (tcc_comparison)
>   icmp (inverted_tcc_comparison)
>   /* Fold (((a < b) & c) | ((a >= b) & d)) into (a < b ? c : d) & 1.  */
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr110992-1.c 
> b/gcc/testsuite/gcc.c-torture/execute/pr110992-1.c
> new file mode 100644
> index 000..edb7eb75ef2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr110992-1.c
> @@ -0,0 +1,43 @@
> +#define CST 5
> +#define OP !=
> +#define op_eq ==
> +#define op_ne !=
> +
> +#define function(vol,op, cst) \
> +__attribute__((noipa)) \
> +_Bool func_##op##_##cst##_##vol(vol int a, vol _Bool b) \
> +{ \
> +  vol int d = (a * b); \
> +  return d op_##op cst; \
> +}
> +
> +#define funcdefs(op,cst) \
> +function(,op,cst) \
> +function(volatile,op,cst)
> +
> +#define funcs(f) \
> +f(eq,0) \
> +f(eq,1) \
> +f(eq,5) \
> +f(ne,0) \
> +f(ne,1) \
> +f(ne,5)
> +
> +funcs(funcdefs)
> +
> +#define test(op,cst) \
> +do { \
> + if(func_##op##_##cst##_(a,b) != func_##op##_##cst##_volatile(a,b))\
> +   __builtin_abort(); \
> +} while(0);
> +
> +int main(void)
> +{
> +for(int a = -10; a <= 10; a++)
> +{
> +_Bool b = 0;
> +funcs(test)
> +b = 1;
> +funcs(test)
> +}
> +}
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr110992-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr110992-1.c
> new file mode 100644
> index 000..825fd63f84c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr110992-1.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +static unsigned b;
> +static short c = 4;
> +void foo(void);
> +static short(a)(short d, short g) { return d * g; }
> +void e();
> +static char f() {
> +  b = 0;
> +  return 0;
> +}
> +int main() {
> +  int h = b;
> +  if ((short)(a(c && e, 65535) & h)) {
> +foo();
> +h || f();
> +  }
> +}
> +
> +/* There should be no calls to foo left. */
> +/* { dg-final { scan-tree-dump-not " foo " "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr110992-2.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr110992-2.c
> new file mode 100644
> index 000..6082949a218
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr110992-2.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +static unsigned b;
> +static short c = 4;
> +void foo(void);
> +int main() {
> +  int h = b;
> +  int d = c != 0;
> +  if (h*d) {
> +foo();
> +if (!h) b = 20;
> +  }
> +}
> +

Re: [PATCH] gcc: Introduce -fhardened

2023-09-18 Thread Richard Biener via Gcc-patches
On Fri, Sep 15, 2023 at 5:09 PM Marek Polacek via Gcc-patches
 wrote:
>
> Bootstrapped/regtested on x86_64-pc-linux-gnu, powerpc64le-unknown-linux-gnu,
> and aarch64-unknown-linux-gnu; ok for trunk?
>
> -- >8 --
> In 
> I proposed -fhardened, a new umbrella option that enables a reasonable set
> of hardening flags.  The read of the room seems to be that the option
> would be useful.  So here's a patch implementing that option.
>
> Currently, -fhardened enables:
>
>   -D_FORTIFY_SOURCE=3 (or =2 for older glibcs)
>   -D_GLIBCXX_ASSERTIONS
>   -ftrivial-auto-var-init=pattern
>   -fPIE  -pie  -Wl,-z,relro,-z,now
>   -fstack-protector-strong
>   -fstack-clash-protection
>   -fcf-protection=full (x86 GNU/Linux only)
>
> -fhardened will not override options that were specified on the command line
> (before or after -fhardened).  For example,
>
>  -D_FORTIFY_SOURCE=1 -fhardened
>
> means that _FORTIFY_SOURCE=1 will be used.  Similarly,
>
>   -fhardened -fstack-protector
>
> will not enable -fstack-protector-strong.
>
> In DW_AT_producer it is reflected only as -fhardened; it doesn't expand
> to anything.  I think we need a better way to show what it actually
> enables.

I do think we need to find a solution here to solve asserting compliance.
Maybe we can have -Whardened that will diagnose any altering of
-fhardened by other options on the command-line or by missed target
implementations?  People might for example use -fstack-protector
but don't really want to make protection lower than requested with -fhardened.

Any such conflict is much less apparent than when you use the
flags -fhardened composes.

Richard.

>
> gcc/c-family/ChangeLog:
>
> * c-opts.cc (c_finish_options): Maybe cpp_define _FORTIFY_SOURCE
> and _GLIBCXX_ASSERTIONS.
>
> gcc/ChangeLog:
>
> * common.opt (fhardened): New option.
> * config.in: Regenerate.
> * config/bpf/bpf.cc: Include "opts.h".
> (bpf_option_override): If flag_stack_protector_set_by_fhardened_p, do
> not inform that -fstack-protector does not work.
> * config/i386/i386-options.cc (ix86_option_override_internal): When
> -fhardened, maybe enable -fcf-protection=full.
> * configure: Regenerate.
> * configure.ac: Check if the linker supports '-z now' and '-z relro'.
> * doc/invoke.texi: Document -fhardened.
> * gcc.cc (driver_handle_option): Remember if any link options or 
> -static
> were specified on the command line.
> (process_command): When -fhardened, maybe enable -pie and
> -Wl,-z,relro,-z,now.
> * opts.cc (flag_stack_protector_set_by_fhardened_p): New global.
> (finish_options): When -fhardened, enable
> -ftrivial-auto-var-init=pattern and -fstack-protector-strong.
> (print_help_hardened): New.
> (print_help): Call it.
> * toplev.cc (process_options): When -fhardened, enable
> -fstack-clash-protection.  If flag_stack_protector_set_by_fhardened_p,
> do not warn that -fstack-protector not supported for this target.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.misc-tests/help.exp: Test -fhardened.
> * c-c++-common/fhardened-1.S: New test.
> * c-c++-common/fhardened-1.c: New test.
> * c-c++-common/fhardened-10.c: New test.
> * c-c++-common/fhardened-11.c: New test.
> * c-c++-common/fhardened-12.c: New test.
> * c-c++-common/fhardened-13.c: New test.
> * c-c++-common/fhardened-14.c: New test.
> * c-c++-common/fhardened-2.c: New test.
> * c-c++-common/fhardened-3.c: New test.
> * c-c++-common/fhardened-5.c: New test.
> * c-c++-common/fhardened-6.c: New test.
> * c-c++-common/fhardened-7.c: New test.
> * c-c++-common/fhardened-8.c: New test.
> * c-c++-common/fhardened-9.c: New test.
> * gcc.target/i386/cf_check-6.c: New test.
> ---
>  gcc/c-family/c-opts.cc | 29 
>  gcc/common.opt |  4 ++
>  gcc/config.in  | 12 +
>  gcc/config/bpf/bpf.cc  |  8 ++--
>  gcc/config/i386/i386-options.cc| 11 -
>  gcc/configure  | 50 +++-
>  gcc/configure.ac   | 42 -
>  gcc/doc/invoke.texi| 29 +++-
>  gcc/gcc.cc | 39 +++-
>  gcc/opts.cc| 53 --
>  gcc/opts.h |  1 +
>  gcc/testsuite/c-c++-common/fhardened-1.S   |  6 +++
>  gcc/testsuite/c-c++-common/fhardened-1.c   | 14 ++
>  gcc/testsuite/c-c++-common/fhardened-10.c  | 10 
>  gcc/testsuite/c-c++-common/fhardened-11.c  | 10 
>  gcc/testsuite/c-c++-common/fhardened-12.c  | 11 +
>  

Re: [PATCH] [RFC] New early __builtin_unreachable processing.

2023-09-18 Thread Richard Biener via Gcc-patches
On Fri, Sep 15, 2023 at 4:45 PM Andrew MacLeod  wrote:
>
> I've been looking at __builtin_unreachable () regressions.  The
> fundamental problem seems to be a lack of consistent expectation for
> when we remove it earlier than the final pass of VRP.  After looking
> through them, I think this provides a sensible approach.
>
> Ranger is pretty good at providing ranges in blocks dominated by the
> __builtin_unreachable branch, so removing it isn't quite as critical as
> it once was.  It's also pretty good at identifying what in the block can
> be affected by the branch.
>
> This patch provide an alternate removal algorithm for earlier passes.
> it looks at *all* the exports from the block, and if the branch
> dominates every use of all the exports, AND none of those values access
> memory, VRP will remove the unreachable call, rewrite the branch, update
> all the values globally, and finally perform the simple DCE on the
> branch's ssa-name.   This is kind of what it did before, but it wasn't
> as stringent on the requirements.
>
> The memory access check is required because there are a couple of test
> cases for PRE in which there is a series of instructions leading to an
> unreachable call, and none of those ssa names are ever used in the IL
> again. The whole chunk is dead, and we update globals, however
> pointlessly.  However, one of the ssa_names loads from memory, and a later
> pass commons this value with a later load, and then the unreachable
> call provides additional information about the later load.  This is
> evident in tree-ssa/ssa-pre-34.c.  The only way I see to avoid this
> situation is to not remove the unreachable if there is a load feeding it.
>
> What this does is a more sophisticated version of what DOM does in
> all_uses_feed_or_dominated_by_stmt.  The feeding instructions don't have
> to be single use, but they do have to be dominated by the branch or be
> single use within the branch's block.
>
> If there are multiple uses in the same block as the branch, this does
> not remove the unreachable call.  If we could be sure there are no
> intervening calls or side effects, it would be allowable, but this is a
> more expensive checking operation.  Ranger gets the ranges right anyway,
> so with more passes using ranger, I'm not sure we'd see much benefit from
> the additional analysis.  It could always be added later.
>
> This fixes at least 110249 and 110080 (and probably others).  The only
> regression is 93917 for which I changed the testcase to adjust
> expectations:
>
> // PR 93917
> void f1(int n)
> {
>if(n<0)
>  __builtin_unreachable();
>f3(n);
> }
>
> void f2(int*n)
> {
>if(*n<0)
>  __builtin_unreachable();
>f3 (*n);
> }
>
> We were removing both unreachable calls in VRP1, but only updating the
> global values in the first case, meaning we lose information.   With the
> change in semantics, we only update the global in the first case, but we
> leave the unreachable call in the second case now (due to the load from
> memory).  Ranger still calculates the contextual range correctly as [0,
> +INF] in the second case, it just doesn't set the global value until
> VRP2 when it is removed.
>
> Does this seem reasonable?

I wonder how this addresses the fundamental issue we always faced
in that when we apply the range this range info in itself allows the
branch to the __builtin_unreachable () to be statically determined,
so when the first VRP pass sets the range the next pass evaluating
the condition will remove it (and the guarded __builtin_unreachable ()).

In principle there's nothing wrong with that if we don't lose the range
info during optimizations, but that unfortunately happens more often
than wanted and with the __builtin_unreachable () gone we've lost
the ability to re-compute them.
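A small sketch (mine) of that sequence of events:

    void use (int);

    void f (int n)
    {
      if (n < 0)
        __builtin_unreachable ();  /* VRP records n in [0, INT_MAX].  */
      /* With that global range applied, "n < 0" folds to false, so the
         next pass evaluating the condition deletes both the guard and
         the unreachable call; if the range info is later lost, nothing
         remains from which to recompute it.  */
      use (n);
    }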

I think it's good to explicitly remove the branch at the point we want
rather than relying on the "next" visitor to pick up the global range.

As I read the patch we now remove __builtin_unreachable () explicitly
as soon as possible but don't really address the fundamental issue
in any way?

> Bootstraps on x86_64-pc-linux-gnu with no regressions.  OK?
>
> Andrew
>
>


Re: Question on -fwrapv and -fwrapv-pointer

2023-09-15 Thread Richard Biener via Gcc-patches



> On 15.09.2023 at 17:37, Qing Zhao wrote:
> 
> 
> 
>>> On Sep 15, 2023, at 11:29 AM, Richard Biener  
>>> wrote:
>>> 
>>> 
>>> 
 On 15.09.2023 at 17:25, Qing Zhao wrote:
>>> 
>>> 
>>> 
 On Sep 15, 2023, at 8:41 AM, Arsen Arsenović  wrote:
 
 
 Qing Zhao  writes:
 
> Even though unsigned integer overflow is well defined, it might be
> unintentional; shall we warn the user about this?
 
 This would be better addressed by providing operators or functions that
 do overflow checking in the language, so that they can be explicitly
 used where overflow is unexpected.
>>> 
>>> Yes, that will be very helpful to prevent unexpected overflow in the
>>> program in general.
>>> However, this will mainly benefit new code.
>>> 
>>> For existing C code, especially large applications, we still need to
>>> identify all the places
>>> where the overflow is unexpected, and fix them.
>>> 
>>> One good example is linux kernel. 
>>> 
 One could easily imagine a scenario
 where overflow is not expected in some region of code but is in the
 larger application.
>>> 
>>> Yes, that's exactly the same situation the Linux kernel faces now: the
>>> unexpected overflow and
>>> expected wrap-around are mixed together inside one module.
>>> It's hard to detect the unexpected overflow in such a situation with
>>> the current GCC.
>> 
>> But that’s hardly GCCs fault nor can GCC fix that in any way.  Only the 
>> programmer can distinguish both cases.
> 
> Right, compiler cannot fix this. 
> But can provide some tools to help the user to detect this more conveniently. 
> 
> Right now, GCC provides two sets of options for different types:
> 
> A. Turn the overflow into expected wrap-around (remove UB);
> B. Detect overflow;
> 
>                 A                    B
>                 remove UB            -fsanitize=…
> signed          -fwrapv              signed-integer-overflow
> pointer         -fwrapv-pointer      pointer-overflow (broken in Clang)
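A one-function sketch (my example) of how the two columns treat the same
operation:

    /* Column A: with -fwrapv this reliably wraps, INT_MAX + 1 == INT_MIN.
       Column B: with -fsanitize=signed-integer-overflow the same addition
       is instead reported as an overflow at run time.  */
    int add (int a, int b) { return a + b; }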
> 
> However, the options in A and B exclude each other.  They cannot be mixed
> together for a single file.
> 
> What's requested from the kernel is:
> 
> the compiler needs to provide functionality that can mix these two together
> for a single file,
> 
> i.e., apply A (convert UB to defined wrap-around behavior) only to part of the
> program, and then add -fsanitize=*overflow to detect all other
> unexpected overflows in the program.
> 
> This is currently missing from GCC, I guess?

How can GCC know which part of the program wants wrapping and which sanitizing?

Richard 

> Qing
> 
> 
> 
> 
> 
>> 
>> Richard 
>> 
>>> Thanks.
>>> 
>>> Qing
 -- 
 Arsen Arsenović
> 


Re: Question on -fwrapv and -fwrapv-pointer

2023-09-15 Thread Richard Biener via Gcc-patches



> On 15.09.2023 at 17:25, Qing Zhao wrote:
> 
> 
> 
>> On Sep 15, 2023, at 8:41 AM, Arsen Arsenović  wrote:
>> 
>> 
>> Qing Zhao  writes:
>> 
>>> Even though unsigned integer overflow is well defined, it might be
>>> unintentional; shall we warn the user about this?
>> 
>> This would be better addressed by providing operators or functions that
>> do overflow checking in the language, so that they can be explicitly
>> used where overflow is unexpected.
> 
> Yes, that will be very helpful to prevent unexpected overflow in the program
> in general.
> However, this will mainly benefit new code.
> 
> For existing C code, especially large applications, we still need to
> identify all the places
> where the overflow is unexpected, and fix them.
> 
> One good example is linux kernel. 
> 
>> One could easily imagine a scenario
>> where overflow is not expected in some region of code but is in the
>> larger application.
> 
> Yes, that's exactly the same situation the Linux kernel faces now: the
> unexpected overflow and
> expected wrap-around are mixed together inside one module.
> It's hard to detect the unexpected overflow in such a situation with the
> current GCC.

But that’s hardly GCCs fault nor can GCC fix that in any way.  Only the 
programmer can distinguish both cases.

Richard 

> Thanks.
> 
> Qing
>> -- 
>> Arsen Arsenović
> 


Re: [WIP] Re-introduce 'TREE_USED' in tree streaming

2023-09-15 Thread Richard Biener via Gcc-patches
On Fri, Sep 15, 2023 at 3:05 PM Richard Biener
 wrote:
>
> On Fri, Sep 15, 2023 at 3:01 PM Thomas Schwinge  
> wrote:
> >
> > Hi!
> >
> > On 2023-09-15T12:11:44+0200, Richard Biener via Gcc-patches 
> >  wrote:
> > > On Fri, Sep 15, 2023 at 11:20 AM Thomas Schwinge
> > >  wrote:
> > >> Now, that was another quirky debug session: in
> > >> 'gcc/omp-low.cc:create_omp_child_function' we clearly do set
> > >> 'TREE_USED (t) = 1;' for '.omp_data_i', which ends up as formal parameter
> > >> for outlined '[...]._omp_fn.[...]' functions, pointing to the "OMP blob".
> > >> Yet, in offloading compilation, I only ever got '!TREE_USED' for the
> > >> formal parameter '.omp_data_i'.  This greatly disturbs a nvptx back end
> > >> expand-time transformation that I have implemented, that's active
> > >> 'if (!TREE_USED ([formal parameter]))'.
> > >>
> > >> After checking along all the host-side OMP handling, eventually (in
> > >> hindsight: "obvious"...) I found that, "simply", we're not streaming
> > >> 'TREE_USED'!  With that changed (see attached
> > >> "Re-introduce 'TREE_USED' in tree streaming"; no visible changes in
> > >> x86_64-pc-linux-gnu and powerpc64le-unknown-linux-gnu 'make check'), my
> > >> issue was quickly addressed -- if not for the question *why* 'TREE_USED'
> > >> isn't streamed (..., and apparently, that's a problem only for my
> > >> case..?), and then I found that it's *intentionally been removed*
> > >> in one-decade-old commit ee03e71d472a3f73cbc1a132a284309f36565972
> > >> (Subversion r200151) "Re-write LTO type merging again, do tree merging".
> > >>
> > >> At this point, I need help: is this OK to re-introduce unconditionally,
> > >> or in some conditionalized form (but, "ugh..."), or be done differently
> > >> altogether in the nvptx back end (is 'TREE_USED' considered "stale" at
> > >> some point in the compilation pipeline?), or do we need some logic in
> > >> tree stream read-in (?) to achieve the same thing that removing
> > >> 'TREE_USED' streaming apparently did achieve, or yet something else?
> > >> Indeed, from a quick look, most use of 'TREE_USED' seems to be "early",
> > >> but I saw no reason that it couldn't be used "late", either?
> > >
> > > TREE_USED is considered stale, it doesn't reflect reality and is used with
> > > different semantics throughout the pass pipeline
> >
> > Aha, thanks.  Any suggestion about how to update 'gcc/tree.h:TREE_USED',
> > for next time, to detail at which stages the properties indicated there
> > are meaningful?  (..., and we shall also add some such comment in the two
> > tree streamer functions.)
> >
> > > so it doesn't make much sense
> > > to stream it also because it will needlessly cause divergence between TUs
> > > during tree merging.
> >
> > Right, that's what I'd assumed from quickly skimming the 2013 discussion.
> >
> > > So we definitely do not want to stream TREE_USED for
> > > every tree.
> > >
> > > Why would you guard anything late on TREE_USED?  If you want to know
> > > whether a formal parameter is "used" (used in code generation?  used in 
> > > the
> > > source?) you have to compute this property.  As you can see using 
> > > TREE_USED
> > > is fragile.
> >
> > The issue is: for function call outgoing/incoming arguments, the nvptx
> > back end has (to use) a mechanism different from usual targets.  For the
> > latter, the incoming arguments are readily available in registers or on
> > the stack, without requiring emission of any setup instructions.  For
> > nvptx, we have to generate boilerplate code for every function incoming
> > argument, to load the argument value into a local register.  (The latter
> > are then, at least for '-O0', spilled to and restored from the stack
> > frame, before the first actual use -- if there's any use at all.)
> >
> > This generates some bulky PTX code, which goes so far that we run into
> > timeout or OOM-killed 'ptxas' for 'gcc.c-torture/compile/limits-fndefn.c'
> > at '-O0', for example, where we've got half a million lines of
> > boilerplate PTX code.  That one certainly is a rogue test case, but I
> > then found that if I conditionalize emission of that incoming argument
> > setup code on 'TREE_USED' of the respective elemen

Re: [WIP] Re-introduce 'TREE_USED' in tree streaming

2023-09-15 Thread Richard Biener via Gcc-patches
On Fri, Sep 15, 2023 at 3:01 PM Thomas Schwinge  wrote:
>
> Hi!
>
> On 2023-09-15T12:11:44+0200, Richard Biener via Gcc-patches 
>  wrote:
> > On Fri, Sep 15, 2023 at 11:20 AM Thomas Schwinge
> >  wrote:
> >> Now, that was another quirky debug session: in
> >> 'gcc/omp-low.cc:create_omp_child_function' we clearly do set
> >> 'TREE_USED (t) = 1;' for '.omp_data_i', which ends up as formal parameter
> >> for outlined '[...]._omp_fn.[...]' functions, pointing to the "OMP blob".
> >> Yet, in offloading compilation, I only ever got '!TREE_USED' for the
> >> formal parameter '.omp_data_i'.  This greatly disturbs a nvptx back end
> >> expand-time transformation that I have implemented, that's active
> >> 'if (!TREE_USED ([formal parameter]))'.
> >>
> >> After checking along all the host-side OMP handling, eventually (in
> >> hindsight: "obvious"...) I found that, "simply", we're not streaming
> >> 'TREE_USED'!  With that changed (see attached
> >> "Re-introduce 'TREE_USED' in tree streaming"; no visible changes in
> >> x86_64-pc-linux-gnu and powerpc64le-unknown-linux-gnu 'make check'), my
> >> issue was quickly addressed -- if not for the question *why* 'TREE_USED'
> >> isn't streamed (..., and apparently, that's a problem only for my
> >> case..?), and then I found that it's *intentionally been removed*
> >> in one-decade-old commit ee03e71d472a3f73cbc1a132a284309f36565972
> >> (Subversion r200151) "Re-write LTO type merging again, do tree merging".
> >>
> >> At this point, I need help: is this OK to re-introduce unconditionally,
> >> or in some conditionalized form (but, "ugh..."), or be done differently
> >> altogether in the nvptx back end (is 'TREE_USED' considered "stale" at
> >> some point in the compilation pipeline?), or do we need some logic in
> >> tree stream read-in (?) to achieve the same thing that removing
> >> 'TREE_USED' streaming apparently did achieve, or yet something else?
> >> Indeed, from a quick look, most use of 'TREE_USED' seems to be "early",
> >> but I saw no reason that it couldn't be used "late", either?
> >
> > TREE_USED is considered stale, it doesn't reflect reality and is used with
> > different semantics throughout the pass pipeline
>
> Aha, thanks.  Any suggestion about how to update 'gcc/tree.h:TREE_USED',
> for next time, to detail at which stages the properties indicated there
> are meaningful?  (..., and we shall also add some such comment in the two
> tree streamer functions.)
>
> > so it doesn't make much sense
> > to stream it also because it will needlessly cause divergence between TUs
> > during tree merging.
>
> Right, that's what I'd assumed from quickly skimming the 2013 discussion.
>
> > So we definitely do not want to stream TREE_USED for
> > every tree.
> >
> > Why would you guard anything late on TREE_USED?  If you want to know
> > whether a formal parameter is "used" (used in code generation?  used in the
> > source?) you have to compute this property.  As you can see using TREE_USED
> > is fragile.
>
> The issue is: for function call outgoing/incoming arguments, the nvptx
> back end has (to use) a mechanism different from usual targets.  For the
> latter, the incoming arguments are readily available in registers or on
> the stack, without requiring emission of any setup instructions.  For
> nvptx, we have to generate boilerplate code for every function incoming
> argument, to load the argument value into a local register.  (The latter
> are then, at least for '-O0', spilled to and restored from the stack
> frame, before the first actual use -- if there's any use at all.)
>
> This generates some bulky PTX code, which goes so far that we run into
> timeout or OOM-killed 'ptxas' for 'gcc.c-torture/compile/limits-fndefn.c'
> at '-O0', for example, where we've got half a million lines of
> boilerplate PTX code.  That one certainly is a rogue test case, but I
> then found that if I conditionalize emission of that incoming argument
> setup code on 'TREE_USED' of the respective element of the chain of
> 'DECL_ARGUMENTS', then I do get the desired behavior: zero-instructions
> 'limits-fndefn.S'.  So this "late" use of 'TREE_USED' does work -- just
> that, as discussed, 'TREE_USED' isn't available in the offloading
> setting.  ;-)
>
> I'll look into computing "unused" locally, before/for nvptx expand time.
> (To make the '-O0' case work, I figure this has to happen early, instead
> of later 

[PATCH][RFC] middle-end/106811 - document GENERIC/GIMPLE undefined behavior

2023-09-15 Thread Richard Biener via Gcc-patches
The following attempts to provide a set of conditions GENERIC/GIMPLE
considers invoking undefined behavior, leaning on the C standard's
Annex J, so as to provide portability guidance to language frontend
developers.

I've both tried to remember cases where we exploit undefined behavior
and gone over C2x Annex J to catch more stuff.  I'd be grateful
if people could point out obvious omissions or cases where the
wording isn't clear.  I plan to check/amend the individual operator
documentation as well, but not everything fits there.

I've put this into generic.texi because it applies to GENERIC as
the frontend interface.  All constraints apply to GIMPLE as well.
I plan to add a section to gimple.texi as to how to deal with
undefined behavior.

As said, every comment is welcome.
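For instance, the first bullet of the new subsection (see the patch below)
corresponds to this familiar source-level situation (my example):

    /* Undefined at x == INT_MAX unless -fwrapv is in effect.  */
    int f (int x) { return x + 1; }

    /* Always defined: unsigned arithmetic wraps modulo 2**N.  */
    unsigned g (unsigned x) { return x + 1; }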

For testing I've built doc and inspected the resulting pdf.

PR middle-end/106811
* doc/generic.texi: Add portability section with
subsection on undefined behavior.
---
 gcc/doc/generic.texi | 87 
 1 file changed, 87 insertions(+)

diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
index 6534c354b7a..0969f881146 100644
--- a/gcc/doc/generic.texi
+++ b/gcc/doc/generic.texi
@@ -43,6 +43,7 @@ seems inelegant.
 * Functions::  Function bodies, linkage, and other aspects.
 * Language-dependent trees::Topics and trees specific to language front 
ends.
 * C and C++ Trees::Trees specific to C and C++.
+* Portability issues::  Portability summary for languages.
 @end menu
 
 @c -
@@ -3733,3 +3734,89 @@ In either case, the expression is void.
 
 
 @end table
+
+
+@node Portability issues
+@section Portability issues
+
+This section summarizes portability issues when translating source languages
+to GENERIC.  Everything written here also applies to GIMPLE.  This section
+heavily relies on interpretation according to the C standard.
+
+@menu
+* Undefined behavior::  Undefined behavior.
+@end menu
+
+@node Undefined behavior
+@subsection Undefined behavior
+
+The following is a list of circumstances that invoke undefined behavior.
+
+@itemize @bullet
+@item
+When the result of negation, addition, subtraction or division of two signed
+integers or signed integer vectors not subject to @option{-fwrapv} cannot be
+represented in the type.
+
+@item
+The value of the second operand of any of the division or modulo operators
+is zero.
+
+@item
+When incrementing or decrementing a pointer not subject to
+@option{-fwrapv-pointer} wraps around zero.
+
+@item
+An expression is shifted by a negative number or by an amount greater
+than or equal to the width of the shifted operand.
+
+@item
+Pointers that do not point to the same object are compared using
+relational operators.
+
+@item
+An object which has been modified is accessed through a restrict-qualified
+pointer and another pointer that are not both based on the same object.
+
+@item
+The @} that terminates a function is reached, and the value of the function
+call is used by the caller.
+
+@item
+When program execution reaches __builtin_unreachable.
+
+@item
+When an object has its stored value accessed by an lvalue that
+does not have one of the following types:
+@itemize @minus
+@item
+a (qualified) type compatible with the effective type of the object
+@item
+a type that is the (qualified) signed or unsigned type corresponding to
+the effective type of the object
+@item
+a character type, a ref-all qualified type or a type subject to
+@option{-fno-strict-aliasing}
+@item
+a pointer to void with the same level of indirection as the accessed
+pointer object
+@end itemize
+
+@item
+Addition or subtraction of a pointer into, or just beyond, an object
+and an integer type produces a result that does not point into, or just
+beyond when not dereferenced, the same object.
+
+@item
+Pointers that do not point into, or just beyond, the same object are
+subtracted.
+
+@item
+When a pointer not pointing to actual storage is dereferenced.
+
+@item
+An array subscript is out of range, even if an object is apparently accessible
+with the given subscript (as in the lvalue expression a[1][7] given the
+declaration int a[4][5]).
+
+@end itemize
-- 
2.35.3


Re: [WIP] Re-introduce 'TREE_USED' in tree streaming

2023-09-15 Thread Richard Biener via Gcc-patches
On Fri, Sep 15, 2023 at 11:20 AM Thomas Schwinge
 wrote:
>
> Hi!
>
> Now, that was another quirky debug session: in
> 'gcc/omp-low.cc:create_omp_child_function' we clearly do set
> 'TREE_USED (t) = 1;' for '.omp_data_i', which ends up as formal parameter
> for outlined '[...]._omp_fn.[...]' functions, pointing to the "OMP blob".
> Yet, in offloading compilation, I only ever got '!TREE_USED' for the
> formal parameter '.omp_data_i'.  This greatly disturbs a nvptx back end
> expand-time transformation that I have implemented, that's active
> 'if (!TREE_USED ([formal parameter]))'.
>
> After checking along all the host-side OMP handling, eventually (in
> hindsight: "obvious"...) I found that, "simply", we're not streaming
> 'TREE_USED'!  With that changed (see attached
> "Re-introduce 'TREE_USED' in tree streaming"; no visible changes in
> x86_64-pc-linux-gnu and powerpc64le-unknown-linux-gnu 'make check'), my
> issue was quickly addressed -- if not for the question *why* 'TREE_USED'
> isn't streamed (..., and apparently, that's a problem only for my
> case..?), and then I found that it's *intentionally been removed*
> in one-decade-old commit ee03e71d472a3f73cbc1a132a284309f36565972
> (Subversion r200151) "Re-write LTO type merging again, do tree merging".
>
> At this point, I need help: is this OK to re-introduce unconditionally,
> or in some conditionalized form (but, "ugh..."), or be done differently
> altogether in the nvptx back end (is 'TREE_USED' considered "stale" at
> some point in the compilation pipeline?), or do we need some logic in
> tree stream read-in (?) to achieve the same thing that removing
> 'TREE_USED' streaming apparently did achieve, or yet something else?
> Indeed, from a quick look, most use of 'TREE_USED' seems to be "early",
> but I saw no reason that it couldn't be used "late", either?

TREE_USED is considered stale; it doesn't reflect reality and is used with
different semantics throughout the pass pipeline, so it doesn't make much sense
to stream it, also because it would needlessly cause divergence between TUs
during tree merging.  So we definitely do not want to stream TREE_USED for
every tree.

Why would you guard anything late on TREE_USED?  If you want to know
whether a formal parameter is "used" (used in code generation?  used in the
source?) you have to compute this property.  As you can see, using TREE_USED
is fragile.
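
As a minimal sketch of computing such a property late (my illustration,
assuming SSA form; parm_maybe_used_p is a made-up name, not an existing
GCC function):

  /* Decide whether a PARM_DECL is referenced by looking at its SSA
     default definition instead of trusting TREE_USED.  A real version
     would also need to handle address-taken parameters, which may have
     no SSA default def but still be used.  */
  static bool
  parm_maybe_used_p (struct function *fun, tree parm)
  {
    tree ddef = ssa_default_def (fun, parm);  /* from tree-dfa.h */
    return ddef != NULL_TREE && !has_zero_uses (ddef);
  }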

> Original discussion "not streaming and comparing TREE_USED":
> 
> "[RFC] Re-write LTO type merging again, do tree merging", continued
> 
> "Re-write LTO type merging again, do tree merging".
>
>
> In 2013, offloading compilation was just around the corner --
> 
> "Summary of the Accelerator BOF at Cauldron" -- and you easily could've
> foreseen this issue, no?  ;-P
>
>
> Grüße
>  Thomas
>
>
> -
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
> München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
> Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
> München, HRB 106955


Re: [PATCH] test: Block SLP check of slp-34.c for vect_strided5

2023-09-15 Thread Richard Biener via Gcc-patches
On Fri, 15 Sep 2023, Juzhe-Zhong wrote:

> Since RISC-V uses vsseg5, which is vect_store_lanes with stride 5,
> this test failed on RISC-V.

OK.

> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/slp-34.c: Block check for vect_strided5.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/slp-34.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-34.c 
> b/gcc/testsuite/gcc.dg/vect/slp-34.c
> index 41832d7f519..53b8284d084 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-34.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-34.c
> @@ -57,5 +57,5 @@ int main (void)
>  }
>  
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect"  
> } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" 
> { target {! vect_strided5 } } } } */
>
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] test: Block SLP check of slp-35.c for vect_strided5

2023-09-15 Thread Richard Biener via Gcc-patches
On Fri, 15 Sep 2023, Juzhe-Zhong wrote:

> gcc/testsuite/ChangeLog:

OK.

>   * gcc.dg/vect/slp-35.c: Block SLP check for vect_strided5 targets.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/slp-35.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-35.c 
> b/gcc/testsuite/gcc.dg/vect/slp-35.c
> index 5e9f6739e1f..2c9d168e096 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-35.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-35.c
> @@ -68,5 +68,5 @@ int main (void)
>  }
>  
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> { target {! vect_strided5 } } } } */
>
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] test: Block vect_strided5 for slp-34-big-array.c SLP check

2023-09-15 Thread Richard Biener via Gcc-patches
On Fri, 15 Sep 2023, Juzhe-Zhong wrote:

> It failed on RISC-V since it uses vect_store_lanes with array size 5.

OK.

> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/slp-34-big-array.c: Block SLP check for vect_strided5.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/slp-34-big-array.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-34-big-array.c 
> b/gcc/testsuite/gcc.dg/vect/slp-34-big-array.c
> index 0baaff7dc6e..db0e440639e 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-34-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-34-big-array.c
> @@ -63,5 +63,5 @@ int main (void)
>  }
>  
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect"  
> } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" 
> { target {! vect_strided5 } } } } */
>  
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] test: Isolate slp-1.c check of target supports vect_strided5

2023-09-15 Thread Richard Biener via Gcc-patches
On Fri, 15 Sep 2023, Juzhe-Zhong wrote:

> This test failed on RISC-V:
> FAIL: gcc.dg/vect/slp-1.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
> "vectorizing stmts using SLP" 4
> FAIL: gcc.dg/vect/slp-1.c scan-tree-dump-times vect "vectorizing stmts using 
> SLP" 4
> 
> Because this loop:
>   /* SLP with unrolling by 8.  */
>   for (i = 0; i < N; i++)
> {
>   out[i*5] = 8;
>   out[i*5 + 1] = 7;
>   out[i*5 + 2] = 81;
>   out[i*5 + 3] = 28;
>   out[i*5 + 4] = 18;
> }
> 
> is using vect_load_lanes with array size = 5
> instead of SLP.
> 
> When we adjust the COST of the LANES load/store, it will use SLP.

OK.

> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/slp-1.c: Add vect_strided5.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/slp-1.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-1.c 
> b/gcc/testsuite/gcc.dg/vect/slp-1.c
> index 82e4f6469fb..d4a13f12df6 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-1.c
> @@ -122,5 +122,5 @@ int main (void)
>  }
>  
>  /* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect"  } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" 
> } } */
> -  
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" 
> { target {! vect_strided5 } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" 
> { target vect_strided5 } } } */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] test: Block slp-16.c check for target support vect_strided6

2023-09-15 Thread Richard Biener via Gcc-patches
On Fri, 15 Sep 2023, Juzhe-Zhong wrote:

> This testcase FAILs on RISC-V because RISC-V supports vect_load_lanes
> with array size 6.
> FAIL: gcc.dg/vect/slp-16.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
> "vectorizing stmts using SLP" 2
> FAIL: gcc.dg/vect/slp-16.c scan-tree-dump-times vect "vectorizing stmts using 
> SLP" 2
> 
> Since it uses vlseg6 (vect_load_lanes with array size = 6).

OK.

> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/slp-16.c: Block vect_strided6.
>   * lib/target-supports.exp: Add strided type.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/slp-16.c| 2 +-
>  gcc/testsuite/lib/target-supports.exp | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-16.c 
> b/gcc/testsuite/gcc.dg/vect/slp-16.c
> index d053a64276d..44ba730bda8 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-16.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-16.c
> @@ -67,5 +67,5 @@ int main (void)
>  }
>  
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target 
> vect_int_mult } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" 
> { target vect_int_mult } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" 
> { target { vect_int_mult && {! vect_strided6 } } } } } */
>
> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index edaa010258f..2de41cef2f6 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -8621,7 +8621,7 @@ proc check_effective_target_vect_interleave { } {
>&& [check_effective_target_s390_vx]) }}]
>  }
>  
> -foreach N {2 3 4 8} {
> +foreach N {2 3 4 5 6 7 8} {
>  eval [string map [list N $N] {
>   # Return 1 if the target supports 2-vector interleaving
>   proc check_effective_target_vect_stridedN { } {
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] test: Remove XPASS for RISCV

2023-09-15 Thread Richard Biener via Gcc-patches
On Fri, 15 Sep 2023, Juzhe-Zhong wrote:

> Like ARM SVE, this test causes these XPASSes:
> XPASS: gcc.dg/Wstringop-overflow-47.c pr97027 (test for warnings, line 72)
> XPASS: gcc.dg/Wstringop-overflow-47.c pr97027 (test for warnings, line 77)
> XPASS: gcc.dg/Wstringop-overflow-47.c pr97027 note (test for warnings, line 
> 68)
> 
> on RISC-V

OK.

> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/Wstringop-overflow-47.c: Add riscv.
> 
> ---
>  gcc/testsuite/gcc.dg/Wstringop-overflow-47.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/Wstringop-overflow-47.c 
> b/gcc/testsuite/gcc.dg/Wstringop-overflow-47.c
> index 968f6ee4ad4..883921b097f 100644
> --- a/gcc/testsuite/gcc.dg/Wstringop-overflow-47.c
> +++ b/gcc/testsuite/gcc.dg/Wstringop-overflow-47.c
> @@ -65,15 +65,15 @@ void warn_i16_64 (int16_t i)
> like x86_64 it's a series of BIT_FIELD_REFs.  The overflow by
> the former is detected but the latter is not yet.  */
>  
> - extern char warn_a64[64];   // { dg-message "at offset (1|128) into 
> destination object 'warn_a64' of size (63|64)" "pr97027 note" { xfail { ! 
> aarch64-*-* } } }
> + extern char warn_a64[64];   // { dg-message "at offset (1|128) into 
> destination object 'warn_a64' of size (63|64)" "pr97027 note" { xfail { ! { 
> aarch64-*-* riscv*-*-* } } } }
>  
>void *p = warn_a64 + 1;
>I16_64 *q = (I16_64*)p;
> -  *q = (I16_64){ i }; // { dg-warning "writing (1 byte|64 bytes) 
> into a region of size (0|63)" "pr97027" { xfail { ! aarch64-*-* } } }
> +  *q = (I16_64){ i }; // { dg-warning "writing (1 byte|64 bytes) 
> into a region of size (0|63)" "pr97027" { xfail { ! { aarch64-*-* riscv*-*-* 
> } } } }
>  
>char a64[64];
>p = a64 + 1;
>q = (I16_64*)p;
> -  *q = (I16_64){ i }; // { dg-warning "writing (1 byte|64 bytes) 
> into a region of size (0|63)" "pr97027" { xfail { ! aarch64-*-* } } }
> +  *q = (I16_64){ i }; // { dg-warning "writing (1 byte|64 bytes) 
> into a region of size (0|63)" "pr97027" { xfail { ! { aarch64-*-* riscv*-*-* 
> } } } }
>sink (p);
>  }
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] use local range for one more pattern in match.pd

2023-09-15 Thread Richard Biener via Gcc-patches
On Fri, 15 Sep 2023, Jiufu Guo wrote:

> Hi,
> 
> Through "get_global_range_query", SSA_NAME_RANGE_INFO can be queried.
> With "get_range_query", more context-aware range info can be obtained.
> And looking at the implementation of "get_range_query", it returns
> the global range if there is no local function info.
> 
> ATTRIBUTE_RETURNS_NONNULL inline range_query *
> get_range_query (const struct function *fun)
> {
>   return (fun && fun->x_range_query) ? fun->x_range_query : &global_ranges;
> }
> 
> So, using "get_range_query" would cover more cases.
> For example, the test case of "pr111303.c".
> 
> Bootstrapped and regression tested on ppc64{,le} and x86_64.
> Is this ok for trunk?

OK.

> 
> BR,
> Jeff (Jiufu Guo)
> 
> 
>   PR middle-end/111303
> 
> gcc/ChangeLog:
> 
>   * match.pd ((t * 2) / 2): Update pattern.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/tree-ssa/pr111303.c: New test.
> 
> ---
>  gcc/match.pd |  4 ++--
>  gcc/testsuite/gcc.dg/tree-ssa/pr111303.c | 15 +++
>  2 files changed, 17 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr111303.c
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 693638f8ca0..6bd72ff4d69 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -931,8 +931,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> bool overflowed = true;
> value_range vr0, vr1;
> if (INTEGRAL_TYPE_P (type)
> -&& get_global_range_query ()->range_of_expr (vr0, @0)
> -&& get_global_range_query ()->range_of_expr (vr1, @1)
> +&& get_range_query (cfun)->range_of_expr (vr0, @0)
> +&& get_range_query (cfun)->range_of_expr (vr1, @1)
>  && !vr0.varying_p () && !vr0.undefined_p ()
>  && !vr1.varying_p () && !vr1.undefined_p ())
>{
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr111303.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr111303.c
> new file mode 100644
> index 000..b703fe4546d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr111303.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +typedef unsigned int INT;
> +
> +INT
> +foo (INT x, INT y)
> +{
> +  if (x > 100 || y > 100)
> +return x;
> +  return (x * y) / y;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "return x_..D." 1 "optimized"} } */
> +/* { dg-final { scan-tree-dump-times " / " 0 "optimized"} } */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] MATCH: Improve zero_one_valued_p for cases without range information

2023-09-15 Thread Richard Biener via Gcc-patches
On Fri, Sep 15, 2023 at 3:09 AM Andrew Pinski via Gcc-patches
 wrote:
>
> I noticed we sometimes lose range information in forwprop due to a few
> match-and-simplify patterns optimizing away casts.  So the easiest way
> to handle these cases is to add a match for zero_one_valued_p which matches
> a cast from another zero_one_valued_p.
> This also adds the case of `x & zero_one_valued_p` as being zero_one_valued_p,
> which allows catching more cases too.
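
A small example of the kind of code this now catches (my illustration,
not taken from the patch):

  /* Both multiplicands are recognized as zero_one_valued_p through the
     new bit_and match, so the multiplication folds to
     (m & 1) & (n & 1) even when no range information is available.  */
  int f (int m, int n)
  {
    return (m & 1) * (n & 1);
  }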
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

I wonder if it would make a difference if we'd enable ranger unconditionally
in forwprop (maybe with -O2+); currently it gets enabled only sometimes.

Richard.

> gcc/ChangeLog:
>
> * match.pd (zero_one_valued_p): Match a cast from a zero_one_valued_p.
> Also match `a & zero_one_valued_p` too.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/bool-13.c: Update testcase as we now do
> the MIN/MAX during forwprop1.
> ---
>  gcc/match.pd| 10 ++
>  gcc/testsuite/gcc.dg/tree-ssa/bool-13.c | 15 +--
>  2 files changed, 15 insertions(+), 10 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 97db0eb5f25..39c9c81966a 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -2181,6 +2181,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>&& (TYPE_UNSIGNED (type)
>   || TYPE_PRECISION (type) > 1
>
> +/* (a&1) is always [0,1] too. This is useful again when
> +   the range is not known. */
> +(match zero_one_valued_p
> + (bit_and:c@0 @1 zero_one_valued_p))
> +
> +/* A conversion from a zero_one_valued_p is still a [0,1].
> +   This is useful when the range of a variable is not known.  */
> +(match zero_one_valued_p
> + (convert@0 zero_one_valued_p))
> +
>  /* Transform { 0 or 1 } * { 0 or 1 } into { 0 or 1 } & { 0 or 1 }.  */
>  (simplify
>   (mult zero_one_valued_p@0 zero_one_valued_p@1)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bool-13.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/bool-13.c
> index 438f15a484a..de8c99a7727 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/bool-13.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/bool-13.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O1 -fdump-tree-optimized -fdump-tree-original 
> -fdump-tree-phiopt1 -fdump-tree-forwprop2" } */
> +/* { dg-options "-O1 -fdump-tree-optimized -fdump-tree-original 
> -fdump-tree-forwprop1 -fdump-tree-forwprop2" } */
>  #define bool _Bool
>  int maxbool(bool ab, bool bb)
>  {
> @@ -22,15 +22,10 @@ int minbool(bool ab, bool bb)
>  /* { dg-final { scan-tree-dump-times "MIN_EXPR" 1 "original" } } */
>  /* { dg-final { scan-tree-dump-times "if " 0 "original" } } */
>
> -/* PHI-OPT1 should have kept it as min/max. */
> -/* { dg-final { scan-tree-dump-times "MAX_EXPR" 1 "phiopt1" } } */
> -/* { dg-final { scan-tree-dump-times "MIN_EXPR" 1 "phiopt1" } } */
> -/* { dg-final { scan-tree-dump-times "if " 0 "phiopt1" } } */
> -
> -/* Forwprop2 (after ccp) will convert it into &\| */
> -/* { dg-final { scan-tree-dump-times "MAX_EXPR" 0 "forwprop2" } } */
> -/* { dg-final { scan-tree-dump-times "MIN_EXPR" 0 "forwprop2" } } */
> -/* { dg-final { scan-tree-dump-times "if " 0 "forwprop2" } } */
> +/* Forwprop1 will convert it into &\| as we can detect that the arguments 
> are one_zero. */
> +/* { dg-final { scan-tree-dump-times "MAX_EXPR" 0 "forwprop1" } } */
> +/* { dg-final { scan-tree-dump-times "MIN_EXPR" 0 "forwprop1" } } */
> +/* { dg-final { scan-tree-dump-times "if " 0 "forwprop1" } } */
>
>  /* By optimize there should be no min/max nor if  */
>  /* { dg-final { scan-tree-dump-times "MAX_EXPR" 0 "optimized" } } */
> --
> 2.31.1
>


Re: [PATCH] MATCH: Fix `(1 >> X) != 0` pattern for vector types

2023-09-15 Thread Richard Biener via Gcc-patches
On Thu, Sep 14, 2023 at 7:49 PM Andrew Pinski via Gcc-patches
 wrote:
>
> I had missed that integer_onep can match vector types with a uniform
> constant of `1`.
> This means the shift amount could have a scalar type, and then doing a
> comparison against `0` would be an invalid transformation.
> This fixes the problem by adding a check on the type of the integer_onep
> to make sure it is an INTEGRAL_TYPE_P (which does not match a vector type).
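
For reference, the scalar case the pattern is still allowed to handle
(my illustration):

  /* For scalar operands, 1 >> x is nonzero only for x == 0, so the
     comparison folds away.  */
  int f (unsigned x)
  {
    return (1 >> x) != 0;  /* folds to x == 0 */
  }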
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

> PR tree-optimization/111414
>
> gcc/ChangeLog:
>
> * match.pd (`(1 >> X) != 0`): Check to see if
> the integer_onep was an integral type (not a vector type).
>
> gcc/testsuite/ChangeLog:
>
> * gcc.c-torture/compile/pr111414-1.c: New test.
> ---
>  gcc/match.pd |  5 +++--
>  gcc/testsuite/gcc.c-torture/compile/pr111414-1.c | 13 +
>  2 files changed, 16 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr111414-1.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 07ffd831132..97db0eb5f25 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -4206,8 +4206,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   /* `(1 >> X) != 0` -> `X == 0` */
>   /* `(1 >> X) == 0` -> `X != 0` */
>   (simplify
> -  (cmp (rshift integer_onep @0) integer_zerop)
> -   (icmp @0 { build_zero_cst (TREE_TYPE (@0)); })))
> +  (cmp (rshift integer_onep@1 @0) integer_zerop)
> +   (if (INTEGRAL_TYPE_P (TREE_TYPE (@1)))
> +(icmp @0 { build_zero_cst (TREE_TYPE (@0)); }
>
>  /* (CST1 << A) == CST2 -> A == ctz (CST2) - ctz (CST1)
> (CST1 << A) != CST2 -> A != ctz (CST2) - ctz (CST1)
> diff --git a/gcc/testsuite/gcc.c-torture/compile/pr111414-1.c 
> b/gcc/testsuite/gcc.c-torture/compile/pr111414-1.c
> new file mode 100644
> index 000..13fbdae7230
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/pr111414-1.c
> @@ -0,0 +1,13 @@
> +int a, b, c, d, e, f, g;
> +int h(int i) { return b >= 2 ?: i >> b; }
> +void j() {
> +  int k;
> +  int *l = 
> +  for (; d; d++) {
> +g = h(0 != j);
> +f = g >> a;
> +k = f << 7;
> +e = k > 5 ? k : 0;
> +*l ^= e;
> +  }
> +}
> --
> 2.31.1
>


Re: [PATCH] tree optimization/111407--SSA corruption due to widening_mul opt

2023-09-15 Thread Richard Biener via Gcc-patches
On Thu, Sep 14, 2023 at 3:25 PM Qing Zhao via Gcc-patches
 wrote:
>
> on conflict across an abnormal edge
>
> This is a bug in tree-ssa-math-opts.cc: when applying the widening-mul
> optimization, the compiler needs to check whether the operand is in an
> ABNORMAL PHI; if so, we should avoid the transformation.
>
> Bootstrapped and regression tested on both aarch64 and x86, no issues.
>
> Okay for committing?

OK.

> thanks.
>
> Qing
>
> =
>
> PR tree-optimization/111407
>
> gcc/ChangeLog:
>
> * tree-ssa-math-opts.cc (convert_mult_to_widen): Avoid the transform
> when one of the operands is subject to abnormal coalescing.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/pr111407.c: New test.
> ---
>  gcc/testsuite/gcc.dg/pr111407.c | 21 +
>  gcc/tree-ssa-math-opts.cc   |  8 
>  2 files changed, 29 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/pr111407.c
>
> diff --git a/gcc/testsuite/gcc.dg/pr111407.c b/gcc/testsuite/gcc.dg/pr111407.c
> new file mode 100644
> index 000..a171074753f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr111407.c
> @@ -0,0 +1,21 @@
> +/* PR tree-optimization/111407*/
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +enum { SEND_TOFILE } __sigsetjmp();
> +void fclose();
> +void foldergets();
> +void sendpart_stats(int *p1, int a1, int b1) {
> + int *a = p1;
> + fclose();
> + p1 = 0;
> + long t = b1;
> + if (__sigsetjmp()) {
> +   {
> + long t1 = a1;
> + a1+=1;
> + fclose(a1*(long)t1);
> +   }
> + }
> + if (p1)
> +   fclose();
> +}
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index 3db69ad5733..51c14d6bad9 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -2755,6 +2755,14 @@ convert_mult_to_widen (gimple *stmt, 
> gimple_stmt_iterator *gsi)
>if (!is_widening_mult_p (stmt, , , , ))
>  return false;
>
> +  /* If any one of rhs1 and rhs2 is subject to abnormal coalescing,
> + avoid the transform.  */
> +  if ((TREE_CODE (rhs1) == SSA_NAME
> +   && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rhs1))
> +  || (TREE_CODE (rhs2) == SSA_NAME
> + && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rhs2)))
> +return false;
> +
>to_mode = SCALAR_INT_TYPE_MODE (type);
>from_mode = SCALAR_INT_TYPE_MODE (type1);
>if (to_mode == from_mode)
> --
> 2.31.1
>


Re: Question on -fwrapv and -fwrapv-pointer

2023-09-14 Thread Richard Biener via Gcc-patches



> Am 14.09.2023 um 17:01 schrieb Qing Zhao :
> 
> Thanks for the info.
> 
>> On Sep 14, 2023, at 10:06 AM, Richard Biener  
>> wrote:
>> 
>>> On Thu, Sep 14, 2023 at 3:42 PM Qing Zhao via Gcc-patches
>>>  wrote:
>>> 
>>> Hi,
>>> 
>>> I have several questions on these options:
>>> 
>>> 1. Are pointers treated as signed integers in general? (I thought that 
>>> pointers are addresses in memory and should be treated as unsigned 
>>> integers…)
>>> 2. If yes, why?
>>> 3. Why a separate option for pointers, -fwrapv-pointer, in addition to 
>>> -fwrapv, if they are treated as signed integers?
>> 
>> Pointers are unsigned, they might sign-extend to Pmode though.
> If they are unsigned, why are they sign-extended to Pmode? Is there any
> special reason for this? 

Some targets require this.  See POINTERS_EXTEND_UNSIGNED

> In other words, can we consistently treat pointers as unsigned? 

We do, but on GIMPLE it doesn’t matter.

>> -fwrapv-pointer is to enable wrapping over zero,
> 
> If we always treat pointers as unsigned, then we don’t need 
> -fwrapv-pointer anymore, right? 

No, the naming is just "bad".

> 
>> we don't have many places using this, ISTR kernel folks requested to
>> disable specific folding - digging in history
>> might reveal the case/PR.
> 
> Do you mean that this -fwrapv-pointer was introduced for the kernel?

I think it was introduced when removing the separate -fstrict-overflow flag;
since that also covered some pointer transforms, the -fwrapv-pointer flag was
introduced.

> 
> I will try to dig a little bit here.
> 
> thanks.
> 
> Qing
>> 
>> Richard.
>> 
>>> Thanks for your help.
>>> 
>>> Qing
>>> 
> 


Re: Question on -fwrapv and -fwrapv-pointer

2023-09-14 Thread Richard Biener via Gcc-patches
On Thu, Sep 14, 2023 at 3:42 PM Qing Zhao via Gcc-patches
 wrote:
>
> Hi,
>
> I have several questions on these options:
>
> 1. Are pointers treated as signed integers in general? (I thought that 
> pointers are addresses in memory and should be treated as unsigned integers…)
> 2. If yes, why?
> 3. Why a separate option for pointers, -fwrapv-pointer, in addition to -fwrapv, 
> if they are treated as signed integers?

Pointers are unsigned, they might sign-extend to Pmode though.
-fwrapv-pointer is to enable wrapping over zero,
we don't have many places using this, ISTR kernel folks requested to
disable specific folding - digging in history
might reveal the case/PR.

Richard.

> Thanks for your help.
>
> Qing
>


[PATCH] tree-optimization/111294 - backwards threader PHI costing

2023-09-14 Thread Richard Biener via Gcc-patches
This revives an earlier patch, since the problematic code applying
extra costs to PHIs in copied blocks, which we couldn't make any sense
of, prevents a required threading in this case.  Instead of coming up
with another artificial costing, the following simply removes those
bits.

As with all threading changes, this requires a plethora of testsuite
adjustments, but only the last three are unfortunate, as is the
libgomp team.c adjustment, which is required to avoid a bogus -Werror
diagnostic during bootstrap.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Any objections?

Thanks,
Richard.

PR tree-optimization/111294
gcc/
* tree-ssa-threadbackward.cc (back_threader_profitability::m_name):
Remove
(back_threader::find_paths_to_names): Adjust.
(back_threader::maybe_thread_block): Likewise.
(back_threader_profitability::possibly_profitable_path_p): Remove
code applying extra costs to copies PHIs.

libgomp/
* team.c (gomp_team_start): Guard gomp_alloca to avoid false
positive alloc-size diagnostic.

gcc/testsuite/
* gcc.dg/tree-ssa/pr111294.c: New test.
* gcc.dg/tree-ssa/phi_on_compare-4.c: Adjust.
* gcc.dg/tree-ssa/pr59597.c: Likewise.
* gcc.dg/tree-ssa/pr61839_2.c: Likewise.
* gcc.dg/tree-ssa/ssa-sink-18.c: Likewise.
* g++.dg/warn/Wstringop-overflow-4.C: XFAIL subtest on ilp32.
* gcc.dg/uninit-pred-9_b.c: XFAIL subtest everywhere.
* gcc.dg/vect/vect-117.c: Make scan for not Invalid sum
conditional on lp64.
---
 .../g++.dg/warn/Wstringop-overflow-4.C|  4 +-
 .../gcc.dg/tree-ssa/phi_on_compare-4.c|  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr111294.c  | 32 ++
 gcc/testsuite/gcc.dg/tree-ssa/pr59597.c   |  8 +--
 gcc/testsuite/gcc.dg/tree-ssa/pr61839_2.c |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-18.c   |  6 +-
 gcc/testsuite/gcc.dg/uninit-pred-9_b.c|  2 +-
 gcc/testsuite/gcc.dg/vect/vect-117.c  |  2 +-
 gcc/tree-ssa-threadbackward.cc| 60 ++-
 libgomp/team.c|  5 +-
 10 files changed, 57 insertions(+), 70 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr111294.c

diff --git a/gcc/testsuite/g++.dg/warn/Wstringop-overflow-4.C 
b/gcc/testsuite/g++.dg/warn/Wstringop-overflow-4.C
index faad5bed074..275ecac01b5 100644
--- a/gcc/testsuite/g++.dg/warn/Wstringop-overflow-4.C
+++ b/gcc/testsuite/g++.dg/warn/Wstringop-overflow-4.C
@@ -151,7 +151,9 @@ void test_strcpy_new_int16_t (size_t n, const size_t vals[])
as size_t as a result of threading.  See PR 101688 comment #2.  */
 T (S (1), new int16_t[r_0_imax]);
 
-  T (S (2), new int16_t[r_0_imax + 1]);
+  /* Similar to PR 101688, the following can result in a bogus warning because
+ of threading.  */
+  T (S (2), new int16_t[r_0_imax + 1]); // { dg-bogus "into a region of size" 
"" { xfail { ilp32 } } }
   T (S (9), new int16_t[r_0_imax * 2 + 1]);
 
   int r_1_imax = SR (1, INT_MAX);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-4.c 
b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-4.c
index 1e09f89af9f..6240d1cdd6d 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-Ofast -fdump-tree-dom2" } */
+/* { dg-options "-Ofast -fdump-tree-threadfull1-stats" } */
 
 void g (int);
 void g1 (int);
@@ -37,4 +37,4 @@ f (long a, long b, long c, long d, int x)
   g (c + d);
 }
 
-/* { dg-final { scan-tree-dump-times "Removing basic block" 1 "dom2" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 2" "threadfull1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr111294.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr111294.c
new file mode 100644
index 000..9ad912bad0b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr111294.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-Os -fdump-tree-optimized" } */
+
+void foo(void);
+static short a;
+static int b, c, d;
+static int *e, *f = 
+static int **g = 
+static unsigned char h;
+static short(i)(short j, int k) { return j > k ?: j; }
+static char l() {
+if (a) return b;
+return c;
+}
+int main() {
+b = 0;
+for (; b < 5; ++b)
+;
+h = l();
+if (a ^ 3 >= i(h, 11))
+a = 0;
+else {
+*g = f;
+if (e ==  & b) {
+__builtin_unreachable();
+} else
+foo();
+;
+}
+}
+
+/* { dg-final { scan-tree-dump-not "foo" "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c
index 0f66aae87bb..26c81d9dbb7 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-Ofast -fdisable-tree-cunrolli 
-fdump-tree-threadfull1-details" } */
+/* { dg-options "-Ofast 

[PATCH] tree-optimization/111294 - better DCE after forwprop

2023-09-14 Thread Richard Biener via Gcc-patches
The following adds more aggressive DCE to forwprop to clean up dead
stmts when folding a stmt leaves some operands unused.  The patch
uses simple_dce_from_worklist for this purpose, queueing original
operands before substitution and folding, but only if we folded the
stmt.
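
Reduced to a sketch, the usage pattern is (simplified from the patch;
the final call is below the part of the diff quoted here):

  /* Record SSA names whose uses are about to be rewritten ...  */
  auto_bitmap simple_dce_worklist;
  bitmap_set_bit (simple_dce_worklist, SSA_NAME_VERSION (use));
  /* ... and after substitution and folding, remove definitions that
     thereby became dead.  */
  simple_dce_from_worklist (simple_dce_worklist);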

This removes one dead stmt biasing threading costs in a later pass
but it doesn't resolve the optimization issue in the PR yet.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/111294
* tree-ssa-forwprop.cc (pass_forwprop::execute): Track
operands that eventually become dead and use simple_dce_from_worklist
to remove their definitions if they did so.

* gcc.dg/tree-ssa/evrp10.c: Adjust.
* gcc.dg/tree-ssa/evrp6.c: Likewise.
* gcc.dg/tree-ssa/forwprop-31.c: Likewise.
* gcc.dg/tree-ssa/neg-cast-3.c: Likewise.
---
 gcc/testsuite/gcc.dg/tree-ssa/evrp10.c  |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/evrp6.c   |  5 ++--
 gcc/testsuite/gcc.dg/tree-ssa/forwprop-31.c |  3 +--
 gcc/testsuite/gcc.dg/tree-ssa/neg-cast-3.c  |  4 +--
 gcc/tree-ssa-forwprop.cc| 27 +
 5 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/evrp10.c 
b/gcc/testsuite/gcc.dg/tree-ssa/evrp10.c
index 6ca00e4adaa..776c80c684f 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/evrp10.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/evrp10.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-evrp" }*/
+/* { dg-options "-O2 -fdump-tree-evrp -fno-tree-forwprop" }*/
 
 typedef __INT32_TYPE__ int32_t;
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/evrp6.c 
b/gcc/testsuite/gcc.dg/tree-ssa/evrp6.c
index aaeec68866e..0f9561b6a72 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/evrp6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/evrp6.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-evrp-details" } */
+/* { dg-options "-O2 -fdump-tree-evrp-details -fdump-tree-mergephi1" } */
 
 extern void abort (void);
 
@@ -18,4 +18,5 @@ foo (int k, int j)
 
   return j;
 }
-/* { dg-final { scan-tree-dump "\\\[12, \\+INF" "evrp" } } */
+/* { dg-final { scan-tree-dump "\\\[11, \\+INF" "evrp" } } */
+/* { dg-final { scan-tree-dump-not "abort" "mergephi1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-31.c 
b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-31.c
index edf80264884..40cc86383fa 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-31.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-31.c
@@ -9,6 +9,5 @@ int foo (int x)
   return w - z; /* becomes 0 */
 }
 
-/* Only z = x + 1 is retained.  */
-/* { dg-final { scan-tree-dump-times " = " 1 "forwprop1" } } */
+/* { dg-final { scan-tree-dump-times " = " 0 "forwprop1" } } */
 /* { dg-final { scan-tree-dump "return 0;" "forwprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/neg-cast-3.c 
b/gcc/testsuite/gcc.dg/tree-ssa/neg-cast-3.c
index 7b23ca85d1f..61b89403a93 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/neg-cast-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/neg-cast-3.c
@@ -10,6 +10,4 @@ unsigned f(_Bool a)
 }
 
 /* There should be no cast to int at all. */
-/* Forwprop1 does not remove all of the statements. */
-/* { dg-final { scan-tree-dump-not "\\\(int\\\)" "forwprop1" { xfail *-*-* } } 
} */
-/* { dg-final { scan-tree-dump-not "\\\(int\\\)" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "\\\(int\\\)" "forwprop1" } } */
diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index 94ca47a9726..d4e9202a2d4 100644
--- a/gcc/tree-ssa-forwprop.cc
+++ b/gcc/tree-ssa-forwprop.cc
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cgraph.h"
 #include "tree-ssa.h"
 #include "gimple-range.h"
+#include "tree-ssa-dce.h"
 
 /* This pass propagates the RHS of assignment statements into use
sites of the LHS of the assignment.  It's basically a specialized
@@ -3502,8 +3503,9 @@ pass_forwprop::execute (function *fun)
 |= EDGE_EXECUTABLE;
   auto_vec to_fixup;
   auto_vec to_remove;
+  auto_bitmap simple_dce_worklist;
+  auto_bitmap need_ab_cleanup;
   to_purge = BITMAP_ALLOC (NULL);
-  bitmap need_ab_cleanup = BITMAP_ALLOC (NULL);
   for (int i = 0; i < postorder_num; ++i)
 {
   gimple_stmt_iterator gsi;
@@ -3902,10 +3904,14 @@ pass_forwprop::execute (function *fun)
{
  tree use = USE_FROM_PTR (usep);
  tree val = fwprop_ssa_val (use);
- if (val && val != use && may_propagate_copy (use, val))
+ if (val && val != use)
{
- propagate_value (usep, val);
- substituted_p = true;
+ bitmap_set_bit (simple_dce_worklist, SSA_NAME_VERSION (use));
+ if (may_propagate_copy (use, val))
+   {
+ propagate_value (usep, val);
+ substituted_p = true;
+   }
}
}
  if (substituted_p

Re: [PATCH] core: Support heap-based trampolines

2023-09-14 Thread Richard Biener via Gcc-patches
On Wed, Sep 6, 2023 at 5:44 PM FX Coudert  wrote:
>
> Hi,
>
> ping**2 on the revised patch, for Richard or another global reviewer. So far 
> all review feedback is that it’s a step forward, and it’s been widely used 
> for both aarch64-darwin and x86_64-darwin distributions for almost three 
> years now.
>
> OK to commit?

I just noticed that -ftrampoline-impl isn't marked Optimize, thus it's not
streamed with LTO.  How does mixing
different -ftrampoline-impl for different LTO TUs behave?  How does
mis-specifying -ftrampoline-impl
at LTO link time compared to compile-time behave?  Is the state fully
reflected during pre-IPA compilation
and the flag not needed after that?  It appears so, but did you check?

OK if that's a non-issue.

Thanks,
Richard.

> FX
>
>
>
> > Le 5 août 2023 à 16:20, FX Coudert  a écrit :
> >
> > Hi Richard,
> >
> > Thanks for your feedback. Here is an amended version of the patch, taking 
> > into consideration your requests and the following discussion. There is no 
> > configure option for the libgcc part, and the documentation is amended. The 
> > patch is split into three commits for core, target and libgcc.
> >
> > Currently regtesting on x86_64 linux and darwin (it was fine before I split 
> > up into three commits, so I’m re-testing to make sure I didn’t screw 
> > anything up).
> >
> > OK to commit?
> > FX
>


Re: [PATCHSET] Reintroduce targetrustm hooks

2023-09-14 Thread Richard Biener via Gcc-patches
On Wed, Sep 13, 2023 at 10:14 PM Iain Buclaw via Gcc-patches
 wrote:
>
> Excerpts from Arthur Cohen's message of September 7, 2023 3:41 pm:
> > Alright, was not expecting to mess up this patchset so bad so here we go:
> >
> > This patchset reintroduces proper targetrustm hooks without the old
> > problematic mess of macros we had, which had been removed for the first
> > merge of gccrs upstream.
> >
> > Tested on x86-64 GNU Linux, and has also been present in our development
> > repository for a long time - added by this pull-request from Iain [1]
> > which was merged in October 2022.
> >
> > Ok for trunk?
> >
> > [PATCH 01/14] rust: Add skeleton support and documentation for
> > [PATCH 02/14] rust: Reintroduce TARGET_RUST_CPU_INFO hook
> > [PATCH 03/14] rust: Reintroduce TARGET_RUST_OS_INFO hook
> > [PATCH 04/14] rust: Implement TARGET_RUST_CPU_INFO for i[34567]86-*-*
> > [PATCH 05/14] rust: Implement TARGET_RUST_OS_INFO for *-*-darwin*
> > [PATCH 06/14] rust: Implement TARGET_RUST_OS_INFO for *-*-freebsd*
> > [PATCH 07/14] rust: Implement TARGET_RUST_OS_INFO for *-*-netbsd*
> > [PATCH 08/14] rust: Implement TARGET_RUST_OS_INFO for *-*-openbsd*
> > [PATCH 09/14] rust: Implement TARGET_RUST_OS_INFO for *-*-solaris2*.
> > [PATCH 10/14] rust: Implement TARGET_RUST_OS_INFO for *-*-dragonfly*
> > [PATCH 11/14] rust: Implement TARGET_RUST_OS_INFO for *-*-vxworks*
> > [PATCH 12/14] rust: Implement TARGET_RUST_OS_INFO for *-*-fuchsia*.
> > [PATCH 13/14] rust: Implement TARGET_RUST_OS_INFO for
> > [PATCH 14/14] rust: Implement TARGET_RUST_OS_INFO for *-*-*linux*.
> >
>
> Thanks for eventually getting round to this.
>
> As the co-author of this patch series, I'm not going to look at it.
>
> FWIW, these being Rust-specific target changes isolated to just
> Rust-specific files, you should have the autonomy to commit without
> needing any request for review - at least this is my understanding when
> have made D-specific target changes in the past that have not touched
> common back-end headers.
>
> I'll let someone else confirm and check over the shared parts touched by
> the patch however.

I confirm.  I briefly went over the shared parts and they look OK.

Thanks,
Richard.

> For reviewers, this is pretty much a mirror of the D front-end's CPU and
> OS-specific target hooks (D has built-in version identifiers, not
> built-in attributes, but both Rust and D are otherwise the same in the
> kind of information exposed by them).
>
> > [1]: https://github.com/Rust-GCC/gccrs/pull/1543
> >
>
> The other GitHub pull request that added these is here.
>
> https://github.com/Rust-GCC/gccrs/pull/1596
>
> Regards,
> Iain.


Re: gcc-patches From rewriting mailman settings (Was: [Linaro-TCWG-CI] gcc patch #75674: FAIL: 68 regressions)

2023-09-14 Thread Richard Biener via Gcc-patches
On Tue, Sep 12, 2023 at 5:00 PM Mark Wielaard  wrote:
>
> Hi Maxim,
>
> Adding Jeff to CC who is the official gcc-patches mailinglist admin.
>
> On Tue, 2023-09-12 at 11:08 +0400, Maxim Kuvyrkov wrote:
> > Normally, notifications from Linaro TCWG precommit CI are sent only to
> > patch author and patch submitter.  In this case the sender was rewritten
> > to "Benjamin Priour via Gcc-patches ",
> > which was detected by Patchwork [1] as patch submitter.
>
> BTW. Really looking forward to your talk at Cauldron about this!
>
> > Is "From:" re-write on gcc-patches@ mailing list a side-effect of [2]?
> > I see that some, but not all messages to gcc-patches@ have their
> > "From:" re-written.
> >
> > Also, do you know if re-write of "From:" on gcc-patches@ is expected?
>
> Yes, it is expected for emails that come from domains with a dmarc
> policy. That is because the current settings of the gcc-patches
> mailinglist might slightly alter the message or headers in a way that
> invalidates the DKIM signature. Without From rewriting those messages
> would be bounced by recipients that check the dmarc policy/dkim
> signature.
>
> As you noticed the glibc hackers have recently worked together with the
> sourceware overseers to upgrade mailman and alter the postfix and the
> libc-alpha mailinglist setting so it doesn't require From rewriting
> anymore (the message and header aren't altered anymore to invalidate
> the DKIM signatures).
>
> We (Jeff or anyone else with mailman admin privs) could use the same
> settings for gcc-patches. The settings that need to be set are in that
> bug:
>
> - subject_prefix (general): (empty)
> - from_is_list (general): No
> - anonymous_list (general): No
> - first_strip_reply_to (general): No
> - reply_goes_to_list (general): Poster
> - reply_to_address (general): (empty)
> - include_sender_header (general): No
> - drop_cc (general): No
> - msg_header (nondigest): (empty)
> - msg_footer (nondigest): (empty)
> - scrub_nondigest (nondigest): No
> - dmarc_moderation_action (privacy): Accept
> - filter_content (contentfilter): No
>
> The only visible change (apart from no more From rewriting) is that
> HTML multi-parts aren't scrubbed anymore (that would be a message
> altering issue). The html part is still scrubbed from the
> inbox.sourceware.org archive, so b4 works just fine. But I don't know
> what patchwork.sourceware.org does with HTML attachments. Of course
> people really shouldn't send HTML attachments to gcc-patches, so maybe
> this is no real problem.

Ick (to the HTML part).  I wonder if we can use From rewriting for those,
still stripping the HTML part?  Maybe we can also go back to rejecting
mails that are not text/plain ...

> Let me know if you want Jeff (or me or one of the other overseers) make
> the above changes to the gcc-patches mailman settings.
>
> Cheers,
>
> Mark
>
> > [1] https://patchwork.sourceware.org/project/gcc/list/
> > [2] https://sourceware.org/bugzilla/show_bug.cgi?id=29713
>


Re: [PATCH] MATCH: Support `(a != (CST+1)) & (a > CST)` optimizations

2023-09-14 Thread Richard Biener via Gcc-patches
On Thu, Sep 14, 2023 at 7:34 AM Andrew Pinski via Gcc-patches
 wrote:
>
> Even though this is done via reassociation, match can support
> these with a simple change to detect that the difference is just
> one.  This allows optimizing these earlier, even during phiopt,
> for example.
>
> This patch adds the following cases:
> (a != (CST+1)) & (a > CST) -> a > (CST+1)
> (a != (CST-1)) & (a < CST) -> a < (CST-1)
> (a == (CST-1)) | (a >= CST) -> a >= (CST-1)
> (a == (CST+1)) | (a <= CST) -> a <= (CST+1)
>
> Canonicalizations of comparisons cause this case to show up more.
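
A concrete instance of the first case, with CST == 3 (my example, not
from the patch):

  int f (int a)
  {
    return a != 4 && a > 3;  /* now folds to a > 4 */
  }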
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

> PR tree-optimization/106164
>
> gcc/ChangeLog:
>
> * match.pd (`(X CMP1 CST1) AND/IOR (X CMP2 CST2)`):
> Expand to support constants that are off by one.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/pr21643.c: Update test now that match does
> the combining of the comparisons.
> * gcc.dg/tree-ssa/cmpbit-5.c: New test.
> * gcc.dg/tree-ssa/phi-opt-35.c: New test.
> ---
>  gcc/match.pd   | 44 ++-
>  gcc/testsuite/gcc.dg/pr21643.c |  6 ++-
>  gcc/testsuite/gcc.dg/tree-ssa/cmpbit-5.c   | 51 ++
>  gcc/testsuite/gcc.dg/tree-ssa/phi-opt-35.c | 13 ++
>  4 files changed, 111 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/cmpbit-5.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-35.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 7ecf5568599..07ffd831132 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -2970,10 +2970,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> && operand_equal_p (@1, @2)))
>  (with
>   {
> +  bool one_before = false;
> +  bool one_after = false;
>int cmp = 0;
>if (TREE_CODE (@1) == INTEGER_CST
>   && TREE_CODE (@2) == INTEGER_CST)
> -   cmp = tree_int_cst_compare (@1, @2);
> +   {
> + cmp = tree_int_cst_compare (@1, @2);
> + if (cmp < 0
> + && wi::to_wide (@1) == wi::to_wide (@2) - 1)
> +   one_before = true;
> + if (cmp > 0
> + && wi::to_wide (@1) == wi::to_wide (@2) + 1)
> +   one_after = true;
> +   }
>bool val;
>switch (code2)
>  {
> @@ -2998,6 +3008,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> && code2 == LE_EXPR
>&& cmp == 0)
> (lt @0 @1))
> +  /* (a != (b+1)) & (a > b) -> a > (b+1) */
> +  (if (code1 == NE_EXPR
> +   && code2 == GT_EXPR
> +  && one_after)
> +   (gt @0 @1))
> +  /* (a != (b-1)) & (a < b) -> a < (b-1) */
> +  (if (code1 == NE_EXPR
> +   && code2 == LT_EXPR
> +  && one_before)
> +   (lt @0 @1))
>   )
>  )
> )
> @@ -3069,10 +3089,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> && operand_equal_p (@1, @2)))
>  (with
>   {
> +  bool one_before = false;
> +  bool one_after = false;
>int cmp = 0;
>if (TREE_CODE (@1) == INTEGER_CST
>   && TREE_CODE (@2) == INTEGER_CST)
> -   cmp = tree_int_cst_compare (@1, @2);
> +   {
> + cmp = tree_int_cst_compare (@1, @2);
> + if (cmp < 0
> + && wi::to_wide (@1) == wi::to_wide (@2) - 1)
> +   one_before = true;
> + if (cmp > 0
> + && wi::to_wide (@1) == wi::to_wide (@2) + 1)
> +   one_after = true;
> +   }
>bool val;
>switch (code2)
> {
> @@ -3097,6 +3127,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> && code2 == LT_EXPR
>&& cmp == 0)
> (le @0 @1))
> +  /* (a == (b-1)) | (a >= b) -> a >= (b-1) */
> +  (if (code1 == EQ_EXPR
> +   && code2 == GE_EXPR
> +  && one_before)
> +   (ge @0 @1))
> +  /* (a == (b+1)) | (a <= b) -> a <= (b+1) */
> +  (if (code1 == EQ_EXPR
> +   && code2 == LE_EXPR
> +  && one_after)
> +   (le @0 @1))
>   )
>  )
> )
> diff --git a/gcc/testsuite/gcc.dg/pr21643.c b/gcc/testsuite/gcc.dg/pr21643.c
> index 4e7f93d351a..42517b5af1e 100644
> --- a/gcc/testsuite/gcc.dg/pr21643.c
> +++ b/gcc/testsuite/gcc.dg/pr21643.c
> @@ -86,4 +86,8 @@ f9 (unsigned char c)
>return 1;
>  }
>
> -/* { dg-final { scan-tree-dump-times "Optimizing range tests c_\[0-9\]*.D. 
> -.0, 31. and -.32, 32.\[\n\r\]* into" 6 "reassoc1" } }  */
> +/* Note with match being able to simplify this, optimizing range tests is no 
> longer needed here. */
> +/* Equivalence: _7 | _2 -> c_5(D) <= 32 */
> +/* old test: dg-final  scan-tree-dump-times "Optimizing range tests 
> c_\[0-9\]*.D. -.0, 31. and -.32, 32.\[\n\r\]* into" 6 "reassoc1"   */
> +/* { dg-final { scan-tree-dump-times "Equivalence: _\[0-9\]+ \\\| _\[0-9\]+ 
> -> c_\[0-9\]+.D. <= 32" 5 "reassoc1" } }  */
> +/* { dg-final { scan-tree-dump-times "Equivalence: _\[0-9\]+ \& _\[0-9\]+ -> 

Re: [PATCH] Improve error message for if with an else part while in switch

2023-09-14 Thread Richard Biener via Gcc-patches
On Thu, Sep 14, 2023 at 12:30 AM Andrew Pinski via Gcc-patches
 wrote:
>
> While writing some match.pd code, I was trying to figure
> out why I was getting an `expected ), got (` error message
> while writing an if statement with an else clause.  Inside a
> switch, an if statement cannot have an else clause, so
> it would be better to have a decent error message saying that
> explicitly.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu.

OK.

Richard.

> gcc/ChangeLog:
>
> * genmatch.cc (parser::parse_result): For an else clause
> of an if statement inside a switch, error out explicitly.
> ---
>  gcc/genmatch.cc | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
> index a1925a747a7..03d325efdf6 100644
> --- a/gcc/genmatch.cc
> +++ b/gcc/genmatch.cc
> @@ -4891,6 +4891,8 @@ parser::parse_result (operand *result, predicate_id 
> *matcher)
> ife->trueexpr = parse_result (result, matcher);
>   else
> ife->trueexpr = parse_op ();
> + if (peek ()->type == CPP_OPEN_PAREN)
> +   fatal_at (peek(), "if inside switch cannot have an else");
>   eat_token (CPP_CLOSE_PAREN);
> }
>   else
> --
> 2.31.1
>


[PATCH] tree-optimization/111387 - BB SLP and irreducible regions

2023-09-13 Thread Richard Biener via Gcc-patches
When we split an irreducible region for BB vectorization analysis,
the defensive handling of external backedge defs in
vect_get_and_check_slp_defs doesn't work since that relies on
dominance info to identify a backedge.  The testcase also shows
we are iterating over the function in a sub-optimal way, which is
why we split the irreducible region in the first place.  The fix
is to mark backedges and use EDGE_DFS_BACK to identify them, and
to use the region RPO compute which can produce an RPO order keeping
cycles in a better order (and as a side effect marks backedges).

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/111387
* tree-vect-slp.cc (vect_get_and_check_slp_defs): Check
EDGE_DFS_BACK when doing BB vectorization.
(vect_slp_function): Use rev_post_order_and_mark_dfs_back_seme
to compute RPO and mark backedges.

* gcc.dg/torture/pr111387.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr111387.c | 34 +
 gcc/tree-vect-slp.cc| 16 +---
 2 files changed, 46 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr111387.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr111387.c 
b/gcc/testsuite/gcc.dg/torture/pr111387.c
new file mode 100644
index 000..e14eeef6e4a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr111387.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-ftree-vectorize -fno-vect-cost-model" } */
+
+struct {
+  unsigned a;
+  unsigned b;
+} c;
+int d, e, f, g, h;
+int main()
+{
+  if (c.b && g && g > 7)
+goto i;
+ j:
+  if (c.a) {
+int k = 0;
+unsigned l = c.b;
+if (0) {
+m:
+  k = l = c.b;
+}
+c.a = k;
+c.b = l;
+  }
+  if (0) {
+  i:
+goto m;
+  }
+  if (d)
+goto j;
+  for (f = 5; f; f--)
+if (h)
+  e = 0;
+  return 0;
+}
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 0cf6e02285e..a3e54ebf62a 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -628,9 +628,13 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned 
char swap,
{
  oprnd = gimple_arg (stmt_info->stmt, opno);
  if (gphi *stmt = dyn_cast  (stmt_info->stmt))
-   backedge = dominated_by_p (CDI_DOMINATORS,
-  gimple_phi_arg_edge (stmt, opno)->src,
-  gimple_bb (stmt_info->stmt));
+   {
+ edge e = gimple_phi_arg_edge (stmt, opno);
+ backedge = (is_a  (vinfo)
+ ? e->flags & EDGE_DFS_BACK
+ : dominated_by_p (CDI_DOMINATORS, e->src,
+   gimple_bb (stmt_info->stmt)));
+   }
}
   if (TREE_CODE (oprnd) == VIEW_CONVERT_EXPR)
oprnd = TREE_OPERAND (oprnd, 0);
@@ -7771,7 +7775,11 @@ vect_slp_function (function *fun)
 {
   bool r = false;
   int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (fun));
-  unsigned n = pre_and_rev_post_order_compute_fn (fun, NULL, rpo, false);
+  auto_bitmap exit_bbs;
+  bitmap_set_bit (exit_bbs, EXIT_BLOCK);
+  edge entry = single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (fun));
+  unsigned n = rev_post_order_and_mark_dfs_back_seme (fun, entry, exit_bbs,
+ true, rpo, NULL);
 
   /* For the moment split the function into pieces to avoid making
  the iteration on the vector mode moot.  Split at points we know
-- 
2.35.3


[PATCH] tree-optimization/111397 - missed copy propagation involving abnormal dest

2023-09-13 Thread Richard Biener via Gcc-patches
The following extends the previous enhancement to copy propagation
involving abnormals.  We can easily replace abnormal uses by
non-abnormal uses and only need to preserve the abnormals in PHI
arguments flowing in from abnormal edges.  This changes the
may_propagate_copy argument indicating we are not propagating into a
PHI node to instead indicate whether we know we are not propagating
into a PHI argument from an abnormal PHI.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/111397
* tree-ssa-propagate.cc (may_propagate_copy): Change optional
argument to specify whether the PHI destination doesn't flow in
from an abnormal PHI.
(propagate_value): Adjust.
* tree-ssa-forwprop.cc (pass_forwprop::execute): Indicate abnormal
PHI dest.
* tree-ssa-sccvn.cc (eliminate_dom_walker::before_dom_children):
Likewise.
(process_bb): Likewise.

* gcc.dg/uninit-pr111397.c: New testcase.
---
 gcc/testsuite/gcc.dg/uninit-pr111397.c | 15 +++
 gcc/tree-ssa-forwprop.cc   |  2 +-
 gcc/tree-ssa-propagate.cc  | 20 +---
 gcc/tree-ssa-sccvn.cc  |  5 +++--
 4 files changed, 32 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/uninit-pr111397.c

diff --git a/gcc/testsuite/gcc.dg/uninit-pr111397.c 
b/gcc/testsuite/gcc.dg/uninit-pr111397.c
new file mode 100644
index 000..ec12f9d642a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/uninit-pr111397.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Wuninitialized" } */
+
+int globalVar = 1;
+int __attribute__ ((__returns_twice__)) test_setjmpex(void *context);
+
+void testfn()
+{
+  int localVar = globalVar;
+  while (!localVar) {
+  test_setjmpex(__builtin_frame_address (0)); // { dg-bogus 
"uninitialized" }
+  if (globalVar)
+   break;
+  }
+}
diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index 047f9237dd4..94ca47a9726 100644
--- a/gcc/tree-ssa-forwprop.cc
+++ b/gcc/tree-ssa-forwprop.cc
@@ -4070,7 +4070,7 @@ pass_forwprop::execute (function *fun)
  continue;
tree val = fwprop_ssa_val (arg);
if (val != arg
-   && may_propagate_copy (arg, val))
+   && may_propagate_copy (arg, val, !(e->flags & EDGE_ABNORMAL)))
  propagate_value (use_p, val);
  }
 
diff --git a/gcc/tree-ssa-propagate.cc b/gcc/tree-ssa-propagate.cc
index cb68b419b8c..a29c49328ad 100644
--- a/gcc/tree-ssa-propagate.cc
+++ b/gcc/tree-ssa-propagate.cc
@@ -1032,11 +1032,12 @@ substitute_and_fold_engine::substitute_and_fold 
(basic_block block)
 
 
 /* Return true if we may propagate ORIG into DEST, false otherwise.
-   If DEST_NOT_PHI_ARG_P is true then assume the propagation does
-   not happen into a PHI argument which relaxes some constraints.  */
+   If DEST_NOT_ABNORMAL_PHI_EDGE_P is true then assume the propagation does
+   not happen into a PHI argument which flows in from an abnormal edge
+   which relaxes some constraints.  */
 
 bool
-may_propagate_copy (tree dest, tree orig, bool dest_not_phi_arg_p)
+may_propagate_copy (tree dest, tree orig, bool dest_not_abnormal_phi_edge_p)
 {
   tree type_d = TREE_TYPE (dest);
   tree type_o = TREE_TYPE (orig);
@@ -1056,9 +1057,9 @@ may_propagate_copy (tree dest, tree orig, bool 
dest_not_phi_arg_p)
   && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (orig))
 return false;
   /* Similarly if DEST flows in from an abnormal edge then the copy cannot be
- propagated.  If we know we do not propagate into a PHI argument this
+ propagated.  If we know we do not propagate into such a PHI argument this
  does not apply.  */
-  else if (!dest_not_phi_arg_p
+  else if (!dest_not_abnormal_phi_edge_p
   && TREE_CODE (dest) == SSA_NAME
   && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (dest))
 return false;
@@ -1162,8 +1163,13 @@ void
 propagate_value (use_operand_p op_p, tree val)
 {
   if (flag_checking)
-gcc_assert (may_propagate_copy (USE_FROM_PTR (op_p), val,
-   !is_a  (USE_STMT (op_p;
+{
+  bool ab = (is_a  (USE_STMT (op_p))
+&& (gimple_phi_arg_edge (as_a  (USE_STMT (op_p)),
+ PHI_ARG_INDEX_FROM_USE (op_p))
+->flags & EDGE_ABNORMAL));
+  gcc_assert (may_propagate_copy (USE_FROM_PTR (op_p), val, !ab));
+}
   replace_exp (op_p, val);
 }
 
diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index d9487be302b..1eaf5f6a363 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -7399,7 +7399,8 @@ eliminate_dom_walker::before_dom_children (basic_block b)
  || virtual_operand_p (arg))
continue;
  tree sprime = eliminate_avail (b, arg);
- if (sprime && may_propagate_copy (arg, sprime))
+ if (sprime && may_propagate_copy (arg, sprime,
+

Re: [PATCH] MATCH: Simplify `(X % Y) < Y` pattern.

2023-09-13 Thread Richard Biener via Gcc-patches
On Wed, Sep 13, 2023 at 12:11 AM Andrew Pinski via Gcc-patches
 wrote:
>
> This merges the two patterns that catch
> `(X % Y) < Y` and `Y > (X % Y)` into one by
> using :c on the comparison operator.
> It does not change any code generation or
> anything else; it is just to allow for better
> maintainability of this pattern.
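
For reference, what the (unchanged) simplification does (my example):

  /* For unsigned operands x % y < y always holds whenever x % y is
     well defined, so both forms fold to a constant.  */
  unsigned f (unsigned x, unsigned y)
  {
    return (x % y) < y;  /* folds to 1 */
  }
  unsigned g (unsigned x, unsigned y)
  {
    return y > (x % y);  /* same, now caught by the one :c pattern */
  }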
>
> OK? Bootstrapped and tested on x86_64-linux-gnu.

OK.

> gcc/ChangeLog:
>
> * match.pd (`Y > (X % Y)`): Merge
> into ...
> (`(X % Y) < Y`): Pattern by adding `:c`
> on the comparison.
> ---
>  gcc/match.pd | 7 +--
>  1 file changed, 1 insertion(+), 6 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 39c7ea1088f..24fd29863fb 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1483,14 +1483,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* X % Y is smaller than Y.  */
>  (for cmp (lt ge)
>   (simplify
> -  (cmp (trunc_mod @0 @1) @1)
> +  (cmp:c (trunc_mod @0 @1) @1)
>(if (TYPE_UNSIGNED (TREE_TYPE (@0)))
> { constant_boolean_node (cmp == LT_EXPR, type); })))
> -(for cmp (gt le)
> - (simplify
> -  (cmp @1 (trunc_mod @0 @1))
> -  (if (TYPE_UNSIGNED (TREE_TYPE (@0)))
> -   { constant_boolean_node (cmp == GT_EXPR, type); })))
>
>  /* x | ~0 -> ~0  */
>  (simplify
> --
> 2.31.1
>


Re: [PATCH v2 08/11] Native complex ops: Add explicit vector of complex

2023-09-13 Thread Richard Biener via Gcc-patches
On Tue, Sep 12, 2023 at 7:26 PM Joseph Myers  wrote:
>
> On Tue, 12 Sep 2023, Sylvain Noiry via Gcc-patches wrote:
>
> > Summary:
> > Allow the creation and usage of built-in vectors of complex
> > in C, using __attribute__ ((vector_size ()))
>
> If you're adding a new language feature like this, you need to update
> extend.texi to explain the valid uses of the attribute for complex types,
> and (under "Vector Extensions") the valid uses of the resulting vectors.
> You also need to add testcases to the testsuite for such vectors - both
> execution tests covering valid uses of the vectors, and tests that invalid
> declarations or uses of such vectors (uses with any operator, or other
> operand to such operator, that aren't valid) are properly rejected - go
> through all cases of operators, with one or two complex vector operands,
> of the same or different types, and with different choices for what type
> the other operand might be when one has complex vector type, and make sure
> they are all properly tested and do have the desired and documented
> semantics.
>
> If the intended semantics are the same for C and C++, the tests should be
> c-c++-common tests.  Any cases where the intended semantics are different
> will need separate tests for each language or appropriately conditional
> test assertions in c-c++-common.

And to add - in other related discussions we always rejected adding vector types
of composite types.  I realize that if the hardware supports vector complex
arithmetic instructions this might be the first truly good reason to allow these.

Richard.

> --
> Joseph S. Myers
> jos...@codesourcery.com
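As a sketch of the feature under discussion (hypothetical spelling,
extrapolated from the patch summary; the exact semantics are what the
requested documentation and tests would pin down):

  typedef _Complex float vcf2
    __attribute__ ((vector_size (2 * sizeof (_Complex float))));

  vcf2 add (vcf2 a, vcf2 b) { return a + b; }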


Re: [PATCH 2/2] MATCH: Move `X <= MAX(X, Y)` before `MIN (X, C1) < C2` pattern

2023-09-13 Thread Richard Biener via Gcc-patches
On Tue, Sep 12, 2023 at 5:41 PM Andrew Pinski via Gcc-patches
 wrote:
>
> Matching C1 as C2 here decreases how many other simplifications
> need to happen to get the final answer.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu.

OK

Richard.

> gcc/ChangeLog:
>
> * match.pd (`X <= MAX(X, Y)`):
> Move before `MIN (X, C1) < C2` pattern.
> ---
>  gcc/match.pd | 15 ---
>  1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 36e3da4841b..34b67df784e 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3931,13 +3931,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (if (wi::lt_p (wi::to_wide (@1), wi::to_wide (@2),
>   TYPE_SIGN (TREE_TYPE (@0
>  (cmp @0 @2)
> -/* MIN (X, C1) < C2 -> X < C2 || C1 < C2  */
> -(for minmax (min min max max min min max max)
> - cmp(lt  le  gt  ge  gt  ge  lt  le )
> - comb   (bit_ior bit_ior bit_ior bit_ior bit_and bit_and bit_and bit_and)
> - (simplify
> -  (cmp (minmax @0 INTEGER_CST@1) INTEGER_CST@2)
> -  (comb (cmp @0 @2) (cmp @1 @2
>
>  /* X <= MAX(X, Y) -> true
> X > MAX(X, Y) -> false
> @@ -3949,6 +3942,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(cmp:c @0 (minmax:c @0 @1))
>{ constant_boolean_node (cmp == GE_EXPR || cmp == LE_EXPR, type); } ))
>
> +/* MIN (X, C1) < C2 -> X < C2 || C1 < C2  */
> +(for minmax (min min max max min min max max)
> + cmp(lt  le  gt  ge  gt  ge  lt  le )
> + comb   (bit_ior bit_ior bit_ior bit_ior bit_and bit_and bit_and bit_and)
> + (simplify
> +  (cmp (minmax @0 INTEGER_CST@1) INTEGER_CST@2)
> +  (comb (cmp @0 @2) (cmp @1 @2
> +
>  /* Undo fancy ways of writing max/min or other ?: expressions, like
> a - ((a - b) & -(a < b))  and  a - (a - b) * (a < b) into (a < b) ? b : a.
> People normally use ?: and that is what we actually try to optimize.  */
> --
> 2.31.1
>
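A sketch of a case the reordering helps, with C1 == C2 so the
`X <= MAX(X, Y)` family can finish in one step:

  int f (int x)
  {
    return (x < 5 ? x : 5) <= 5;   /* MIN (x, 5) <= 5 */
  }

With the moved pattern tried first this folds directly to 1, instead of
first becoming `x <= 5 || 5 <= 5` and relying on later simplifications.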


Re: [PATCH 1/2] MATCH: [PR111364] Add some more minmax cmp operand simplifications

2023-09-13 Thread Richard Biener via Gcc-patches
On Tue, Sep 12, 2023 at 5:31 PM Andrew Pinski via Gcc-patches
 wrote:
>
> This adds a few more minmax cmp operand simplifications which were missed 
> before.
> `MIN(a,b) < a` -> `a > b`
> `MIN(a,b) >= a` -> `a <= b`
> `MAX(a,b) > a` -> `a < b`
> `MAX(a,b) <= a` -> `a >= b`
>
> OK? Bootstrapped and tested on x86_64-linux-gnu.

OK.  I wonder if any of these are also valid for FP types?

> Note gcc.dg/pr96708-negative.c needed to updated to remove the
> check for MIN/MAX as they have been optimized (correctly) away.
>
> PR tree-optimization/111364
>
> gcc/ChangeLog:
>
> * match.pd (`MIN (X, Y) == X`): Extend
> to min/lt, min/ge, max/gt, max/le.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.c-torture/execute/minmaxcmp-1.c: New test.
> * gcc.dg/tree-ssa/minmaxcmp-2.c: New test.
> * gcc.dg/pr96708-negative.c: Update testcase.
> * gcc.dg/pr96708-positive.c: Add comment about `return 0`.
> ---
>  gcc/match.pd  |  8 +--
>  .../gcc.c-torture/execute/minmaxcmp-1.c   | 51 +++
>  gcc/testsuite/gcc.dg/pr96708-negative.c   |  4 +-
>  gcc/testsuite/gcc.dg/pr96708-positive.c   |  1 +
>  gcc/testsuite/gcc.dg/tree-ssa/minmaxcmp-2.c   | 30 +++
>  5 files changed, 89 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/minmaxcmp-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/minmaxcmp-2.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 51985c1bad4..36e3da4841b 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3902,9 +3902,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(maxmin @0 (bit_not @1
>
>  /* MIN (X, Y) == X -> X <= Y  */
> -(for minmax (min min max max)
> - cmp(eq  ne  eq  ne )
> - out(le  gt  ge  lt )
> +/* MIN (X, Y) < X -> X > Y  */
> +/* MIN (X, Y) >= X -> X <= Y  */
> +(for minmax (min min min min max max max max)
> + cmp(eq  ne  lt  ge  eq  ne  gt  le )
> + out(le  gt  gt  le  ge  lt  lt  ge )
>   (simplify
>(cmp:c (minmax:c @0 @1) @0)
>(if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0)))
> diff --git a/gcc/testsuite/gcc.c-torture/execute/minmaxcmp-1.c 
> b/gcc/testsuite/gcc.c-torture/execute/minmaxcmp-1.c
> new file mode 100644
> index 000..6705a053768
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/minmaxcmp-1.c
> @@ -0,0 +1,51 @@
> +#define func(vol, op1, op2)\
> +_Bool op1##_##op2##_##vol (int a, int b)   \
> +{  \
> + vol int x = op_##op1(a, b);   \
> + return op_##op2(x, a);\
> +}
> +
> +#define op_lt(a, b) ((a) < (b))
> +#define op_le(a, b) ((a) <= (b))
> +#define op_eq(a, b) ((a) == (b))
> +#define op_ne(a, b) ((a) != (b))
> +#define op_gt(a, b) ((a) > (b))
> +#define op_ge(a, b) ((a) >= (b))
> +#define op_min(a, b) ((a) < (b) ? (a) : (b))
> +#define op_max(a, b) ((a) > (b) ? (a) : (b))
> +
> +
> +#define funcs(a) \
> + a(min,lt) \
> + a(max,lt) \
> + a(min,gt) \
> + a(max,gt) \
> + a(min,le) \
> + a(max,le) \
> + a(min,ge) \
> + a(max,ge) \
> + a(min,ne) \
> + a(max,ne) \
> + a(min,eq) \
> + a(max,eq)
> +
> +#define funcs1(a,b) \
> +func(,a,b) \
> +func(volatile,a,b)
> +
> +funcs(funcs1)
> +
> +#define test(op1,op2)   \
> +do {\
> +  if (op1##_##op2##_(x,y) != op1##_##op2##_volatile(x,y))   \
> +__builtin_abort();  \
> +} while(0);
> +
> +int main()
> +{
> +  for(int x = -10; x < 10; x++)
> +for(int y = -10; y < 10; y++)
> +{
> +funcs(test)
> +}
> +}
> diff --git a/gcc/testsuite/gcc.dg/pr96708-negative.c 
> b/gcc/testsuite/gcc.dg/pr96708-negative.c
> index 91964d3b971..c9c1aa85558 100644
> --- a/gcc/testsuite/gcc.dg/pr96708-negative.c
> +++ b/gcc/testsuite/gcc.dg/pr96708-negative.c
> @@ -42,7 +42,7 @@ int main()
>  return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "MAX_EXPR" 2 "optimized" } } */
> -/* { dg-final { scan-tree-dump-times "MIN_EXPR" 2 "optimized" } } */
> +/* Even though test[1-4] originally has MIN/MAX, those can be optimized away
> +   into just comparing a and b arguments. */
>  /* { dg-final { scan-tree-dump-times "return 0;" 1 "optimized" } } */
>  /* { dg-final { scan-tree-dump-not { "return 1;" } "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/pr96708-positive.c 
> b/gcc/testsuite/gcc.dg/pr96708-positive.c
> index 65af85344b6..12c5fedfd30 100644
> --- a/gcc/testsuite/gcc.dg/pr96708-positive.c
> +++ b/gcc/testsuite/gcc.dg/pr96708-positive.c
> @@ -42,6 +42,7 @@ int main()
>  return 0;
>  }
>
> +/* Note main has one `return 0`. */
>  /* { dg-final { scan-tree-dump-times "return 0;" 3 "optimized" } } */
>  /* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
>  /* { dg-final { scan-tree-dump-not { "MAX_EXPR" } "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmaxcmp-2.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/minmaxcmp-2.c
> new file 
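For intuition, a case split on the first new identity,
MIN(a,b) < a  <=>  a > b: if a <= b then MIN(a,b) = a and both sides are
false; if a > b then MIN(a,b) = b < a and both sides are true.  A minimal
function the extended pattern now folds:

  int f (int a, int b)
  {
    return (a < b ? a : b) < a;   /* becomes a > b */
  }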

Re: [PATCH] MATCH: Simplify (a CMP1 b) ^ (a CMP2 b)

2023-09-12 Thread Richard Biener via Gcc-patches
On Tue, Sep 12, 2023 at 6:22 AM Andrew Pinski via Gcc-patches
 wrote:
>
> This adds the missing optimizations here.
> Note we don't need to match where CMP1 and CMP2 are complements of each
> other as that is already handled elsewhere.
>
> I added a new executable testcase to make sure we optimize it correctly
> as I had originally messed up one of the entries for the resulting
> comparison to make sure they were 100% correct.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

> PR tree-optimization/107881
>
> gcc/ChangeLog:
>
> * match.pd (`(a CMP1 b) ^ (a CMP2 b)`): New pattern.
> (`(a CMP1 b) == (a CMP2 b)`): New pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.c-torture/execute/pr107881-1.c: New test.
> * gcc.dg/tree-ssa/cmpeq-4.c: New test.
> * gcc.dg/tree-ssa/cmpxor-1.c: New test.
> ---
>  gcc/match.pd  |  20 +++
>  .../gcc.c-torture/execute/pr107881-1.c| 115 ++
>  gcc/testsuite/gcc.dg/tree-ssa/cmpeq-4.c   |  51 
>  gcc/testsuite/gcc.dg/tree-ssa/cmpxor-1.c  |  51 
>  4 files changed, 237 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr107881-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/cmpeq-4.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/cmpxor-1.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index e96e385c6fa..39c7ea1088f 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3154,6 +3154,26 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>{ constant_boolean_node (true, type); })
>   ))
>
> +/* Optimize (a CMP b) ^ (a CMP b)  */
> +/* Optimize (a CMP b) != (a CMP b)  */
> +(for op (bit_xor ne)
> + (for cmp1 (lt lt lt le le le)
> +  cmp2 (gt eq ne ge eq ne)
> +  rcmp (ne le gt ne lt ge)
> +  (simplify
> +   (op:c (cmp1:c @0 @1) (cmp2:c @0 @1))
> +   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) || POINTER_TYPE_P (TREE_TYPE (@0)))
> +(rcmp @0 @1)
> +
> +/* Optimize (a CMP b) == (a CMP b)  */
> +(for cmp1 (lt lt lt le le le)
> + cmp2 (gt eq ne ge eq ne)
> + rcmp (eq gt le eq ge lt)
> + (simplify
> +  (eq:c (cmp1:c @0 @1) (cmp2:c @0 @1))
> +  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) || POINTER_TYPE_P (TREE_TYPE (@0)))
> +(rcmp @0 @1
> +
>  /* We can't reassociate at all for saturating types.  */
>  (if (!TYPE_SATURATING (type))
>
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr107881-1.c 
> b/gcc/testsuite/gcc.c-torture/execute/pr107881-1.c
> new file mode 100644
> index 000..063ec4c2797
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr107881-1.c
> @@ -0,0 +1,115 @@
> +#define func(vol, op1, op2, op3)   \
> +_Bool op1##_##op2##_##op3##_##vol (int a, int b)   \
> +{  \
> + vol _Bool x = op_##op1(a, b); \
> + vol _Bool y = op_##op2(a, b); \
> + return op_##op3(x, y);\
> +}
> +
> +#define op_lt(a, b) ((a) < (b))
> +#define op_le(a, b) ((a) <= (b))
> +#define op_eq(a, b) ((a) == (b))
> +#define op_ne(a, b) ((a) != (b))
> +#define op_gt(a, b) ((a) > (b))
> +#define op_ge(a, b) ((a) >= (b))
> +#define op_xor(a, b) ((a) ^ (b))
> +
> +
> +#define funcs(a) \
> + a(lt,lt,ne) \
> + a(lt,lt,eq) \
> + a(lt,lt,xor) \
> + a(lt,le,ne) \
> + a(lt,le,eq) \
> + a(lt,le,xor) \
> + a(lt,gt,ne) \
> + a(lt,gt,eq) \
> + a(lt,gt,xor) \
> + a(lt,ge,ne) \
> + a(lt,ge,eq) \
> + a(lt,ge,xor) \
> + a(lt,eq,ne) \
> + a(lt,eq,eq) \
> + a(lt,eq,xor) \
> + a(lt,ne,ne) \
> + a(lt,ne,eq) \
> + a(lt,ne,xor) \
> +  \
> + a(le,lt,ne) \
> + a(le,lt,eq) \
> + a(le,lt,xor) \
> + a(le,le,ne) \
> + a(le,le,eq) \
> + a(le,le,xor) \
> + a(le,gt,ne) \
> + a(le,gt,eq) \
> + a(le,gt,xor) \
> + a(le,ge,ne) \
> + a(le,ge,eq) \
> + a(le,ge,xor) \
> + a(le,eq,ne) \
> + a(le,eq,eq) \
> + a(le,eq,xor) \
> + a(le,ne,ne) \
> + a(le,ne,eq) \
> + a(le,ne,xor)  \
> + \
> + a(gt,lt,ne) \
> + a(gt,lt,eq) \
> + a(gt,lt,xor) \
> + a(gt,le,ne) \
> + a(gt,le,eq) \
> + a(gt,le,xor) \
> + a(gt,gt,ne) \
> + a(gt,gt,eq) \
> + a(gt,gt,xor) \
> + a(gt,ge,ne) \
> + a(gt,ge,eq) \
> + a(gt,ge,xor) \
> + a(gt,eq,ne) \
> + a(gt,eq,eq) \
> + a(gt,eq,xor) \
> + a(gt,ne,ne) \
> + a(gt,ne,eq) \
> + a(gt,ne,xor) \
> +  \
> + a(ge,lt,ne) \
> + a(ge,lt,eq) \
> + a(ge,lt,xor) \
> + a(ge,le,ne) \
> + a(ge,le,eq) \
> + a(ge,le,xor) \
> + a(ge,gt,ne) \
> + a(ge,gt,eq) \
> + a(ge,gt,xor) \
> + a(ge,ge,ne) \
> + a(ge,ge,eq) \
> + a(ge,ge,xor) \
> + a(ge,eq,ne) \
> + a(ge,eq,eq) \
> + a(ge,eq,xor) \
> + a(ge,ne,ne) \
> + a(ge,ne,eq) \
> + a(ge,ne,xor)
> +
> +#define funcs1(a,b,c) \
> +func(,a,b,c) \
> +func(volatile,a,b,c)
> +
> +funcs(funcs1)
> +
> +#define test(op1,op2,op3)  \
> +do {   \
> +  if (op1##_##op2##_##op3##_(x,y)  \
> +  != op1##_##op2##_##op3##_volatile(x,y))  \
> +__builtin_abort(); \
> +} while(0);
> +
> +int main()
> +{
> +  for(int x = 
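For reference, one row of the new tables (cmp1 = lt, cmp2 = gt, op = xor,
rcmp = ne) as a minimal sketch:

  _Bool f (int a, int b)
  {
    return (a < b) ^ (a > b);   /* folds to a != b */
  }

Checking the three orderings: a < b gives 1 ^ 0, a > b gives 0 ^ 1, and
a == b gives 0 ^ 0 — exactly a != b.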

Re: [PATCH] small _BitInt tweaks

2023-09-12 Thread Richard Biener via Gcc-patches
On Mon, 11 Sep 2023, Jakub Jelinek wrote:

> Hi!
> 
> When discussing PR111369 with Andrew Pinski, I've realized that
> I haven't added BITINT_TYPE handling to range_check_type.  Right now
> (unsigned) max + 1 == (unsigned) min for signed _BitInt, so I think we
> don't need to do the extra hops for BITINT_TYPE (though possibly we don't
> need them for INTEGER_TYPE either in the two's complement world, and we don't
> support anything else, though I really don't know if Ada or some other
> FEs don't create weird INTEGER_TYPEs).
> And, also I think it is undesirable when being asked for signed_type_for
> of unsigned _BitInt(1) (which is valid) to get signed _BitInt(1) (which is
> invalid, the standard only allows signed _BitInt(2) and larger), so the
> patch returns 1-bit signed INTEGER_TYPE for those cases.

I think the last bit is a bit surprising - do the frontends use
signed_or_unsigned_type_for and would they be confused if getting
back an INTEGER_TYPE here?

The range_check_type bits are OK.  For the tree.cc part I think
the middle-end can just handle signed 1-bit BITINT fine?

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2023-09-11  Jakub Jelinek  
> 
> gcc/
>   * tree.cc (signed_or_unsigned_type_for): Return INTEGER_TYPE for
>   signed variant of unsigned _BitInt(1).
>   * fold-const.cc (range_check_type): Handle BITINT_TYPE like
>   OFFSET_TYPE.
> gcc/c-family/
>   * c-common.cc (c_common_signed_or_unsigned_type): Return INTEGER_TYPE
>   for signed variant of unsigned _BitInt(1).
> 
> --- gcc/tree.cc.jj2023-09-06 17:50:30.707589026 +0200
> +++ gcc/tree.cc   2023-09-11 16:24:58.749625569 +0200
> @@ -11096,7 +11096,7 @@ signed_or_unsigned_type_for (int unsigne
>else
>  return NULL_TREE;
>  
> -  if (TREE_CODE (type) == BITINT_TYPE)
> +  if (TREE_CODE (type) == BITINT_TYPE && (unsignedp || bits > 1))
>  return build_bitint_type (bits, unsignedp);
>return build_nonstandard_integer_type (bits, unsignedp);
>  }
> --- gcc/c-family/c-common.cc.jj   2023-09-06 17:34:24.467254960 +0200
> +++ gcc/c-family/c-common.cc  2023-09-11 16:24:07.873300311 +0200
> @@ -2739,7 +2739,9 @@ c_common_signed_or_unsigned_type (int un
>|| TYPE_UNSIGNED (type) == unsignedp)
>  return type;
>  
> -  if (TREE_CODE (type) == BITINT_TYPE)
> +  if (TREE_CODE (type) == BITINT_TYPE
> +  /* signed _BitInt(1) is invalid, avoid creating that.  */
> +  && (unsignedp || TYPE_PRECISION (type) > 1))
>  return build_bitint_type (TYPE_PRECISION (type), unsignedp);
>  
>  #define TYPE_OK(node)
> \
> --- gcc/fold-const.cc.jj  2023-09-11 11:05:47.473728473 +0200
> +++ gcc/fold-const.cc 2023-09-11 16:28:06.052141516 +0200
> @@ -5565,7 +5565,12 @@ range_check_type (tree etype)
>else
>   return NULL_TREE;
>  }
> -  else if (POINTER_TYPE_P (etype) || TREE_CODE (etype) == OFFSET_TYPE)
> +  else if (POINTER_TYPE_P (etype)
> +|| TREE_CODE (etype) == OFFSET_TYPE
> +/* Right now all BITINT_TYPEs satisfy
> +   (unsigned) max + 1 == (unsigned) min, so no need to verify
> +   that like for INTEGER_TYPEs.  */
> +|| TREE_CODE (etype) == BITINT_TYPE)
>  etype = unsigned_type_for (etype);
>return etype;
>  }
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
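For reference, the constraint behind the special-casing, sketched in C23
terms:

  unsigned _BitInt(1) u;   /* valid: a single value bit */
  signed _BitInt(2) s2;    /* valid: smallest signed _BitInt */
  signed _BitInt(1) s1;    /* invalid: signed _BitInt needs at least 2 bits */

which is why asking for the signed variant of unsigned _BitInt(1) has to
return a 1-bit INTEGER_TYPE instead.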


Re: [PATCH] sccvn: Avoid ICEs on _BitInt load BIT_AND_EXPR mask [PR111338]

2023-09-12 Thread Richard Biener via Gcc-patches
On Mon, 11 Sep 2023, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase ICEs, because vn_walk_cb_data::push_partial_def
> uses a fixed size buffer (64 target bytes) for its
> construction/deconstruction of partial stores and fails if larger precision
> than that is needed, and the PR93582 changes assert push_partial_def
> succeeds (and check the various other conditions much earlier when seeing
> the BIT_AND_EXPR statement, like CHAR_BIT == 8, BITS_PER_UNIT == 8,
> BYTES_BIG_ENDIAN == WORDS_BIG_ENDIAN, etc.).  So, just removing the assert
> and allowing it fail there doesn't really work and ICEs later on.
> 
> The following patch moves the bufsize out of the method and tests it
> together with the other checks.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> BTW, perhaps we could increase the bufsize as well or in addition to
> increasing it make the buffer allocated using XALLOCAVEC, but still I think
> it is useful to have some upper bound and so I think this patch is useful
> even in that case.

Yeah, the size is chosen to match the largest vector mode we currently
have.

Richard.

> 2023-09-11  Jakub Jelinek  
> 
>   PR middle-end/111338
>   * tree-ssa-sccvn.cc (struct vn_walk_cb_data): Add bufsize non-static
>   data member.
>   (vn_walk_cb_data::push_partial_def): Remove bufsize variable.
>   (visit_nary_op): Avoid the BIT_AND_EXPR with constant rhs2
>   optimization if type's precision is too large for
>   vn_walk_cb_data::bufsize.
> 
>   * gcc.dg/bitint-37.c: New test.
> 
> --- gcc/tree-ssa-sccvn.cc.jj  2023-09-06 17:28:24.232977433 +0200
> +++ gcc/tree-ssa-sccvn.cc 2023-09-08 13:22:27.928158846 +0200
> @@ -1903,6 +1903,7 @@ struct vn_walk_cb_data
>alias_set_type first_base_set;
>splay_tree known_ranges;
>obstack ranges_obstack;
> +  static constexpr HOST_WIDE_INT bufsize = 64;
>  };
>  
>  vn_walk_cb_data::~vn_walk_cb_data ()
> @@ -1973,7 +1974,6 @@ vn_walk_cb_data::push_partial_def (pd_da
>  HOST_WIDE_INT offseti,
>  HOST_WIDE_INT maxsizei)
>  {
> -  const HOST_WIDE_INT bufsize = 64;
>/* We're using a fixed buffer for encoding so fail early if the object
>   we want to interpret is bigger.  */
>if (maxsizei > bufsize * BITS_PER_UNIT
> @@ -5414,6 +5414,7 @@ visit_nary_op (tree lhs, gassign *stmt)
> && CHAR_BIT == 8
> && BITS_PER_UNIT == 8
> && BYTES_BIG_ENDIAN == WORDS_BIG_ENDIAN
> +   && TYPE_PRECISION (type) <= vn_walk_cb_data::bufsize * BITS_PER_UNIT
> && !integer_all_onesp (gimple_assign_rhs2 (stmt))
> && !integer_zerop (gimple_assign_rhs2 (stmt)))
>   {
> --- gcc/testsuite/gcc.dg/bitint-37.c.jj   2023-09-08 13:27:51.676882523 +0200
> +++ gcc/testsuite/gcc.dg/bitint-37.c  2023-09-08 13:27:22.460268614 +0200
> @@ -0,0 +1,11 @@
> +/* PR middle-end/111338 */
> +/* { dg-do compile { target bitint575 } } */
> +/* { dg-options "-O1" } */
> +
> +_BitInt(575) e;
> +
> +_BitInt(575)
> +foo (void)
> +{
> +  return e & 1;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
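(For the testcase above: 575 bits exceed bufsize * BITS_PER_UNIT = 512 bits,
so the new TYPE_PRECISION guard is what keeps visit_nary_op away from the
fixed encoding buffer.)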


Re: [PATCH] match: Don't sink comparisons into vec_cond operands.

2023-09-12 Thread Richard Biener via Gcc-patches
On Fri, Sep 8, 2023 at 7:55 PM Robin Dapp via Gcc-patches
 wrote:
>
> Hi,
>
> on riscv gcc.dg/pr70252.c ICEs at gimple-isel.cc:283.  This is because
> we created the gimple statement
>
>   mask__7.36_170 = VEC_COND_EXPR  }>;
>
> during vrp2.
>
> What happens is that, starting with
>   maskdest = (vec_cond mask1 1 0) >= (vec_cond mask2 1 0)
> we fold to
>   maskdest = mask1 >= (vec_cond (mask2 1 0))
> and then sink the "mask1 >=" into the vec_cond so we end up with
>   maskdest = vec_cond (mask2 ? mask1 : 0),
> i.e. a vec_cond with a mask "data mode".

I don't see how the patterns change the modes involved in the vec_cond
nor how they change the condition.

> In gimple-isel, when the target does not provide a vcond_mask
> implementation for that (which none does) we fail the assertion that the
> mask mode be MODE_VECTOR_INT.
>
> To prevent this, this patch restricts the match.pd sinking pattern to
> non-mask types.  I was also thinking about restricting the type of
> the operands, wondering if that would be less intrusive.

If you can show what vec_cond is supported before the transform
(with types/modes shown) and what vec_cond is not, after the transform
then those patterns need to be adjusted to check for the support of
the target operation.  I'll note that we have many patterns like

(simplify
 (vec_cond (vec_cond:s @0 @3 integer_zerop) @1 @2)
 (if (optimize_vectors_before_lowering_p () && types_match (@0, @3))
  (vec_cond (bit_and @0 @3) @1 @2)))

which check optimize_vectors_before_lowering_p () (but even then if
the new vec_cond isn't supported by the taget but the original ones
are we get sub-optimal code).

Richard.

> Bootstrapped and regression-tested on x86 and aarch64.
>
> Regards
>  Robin
>
> gcc/ChangeLog:
>
> PR target/111337
> * match.pd: Do not sink comparisons into vec_conds when the type
> is a vector mask.
> ---
>  gcc/match.pd | 24 +++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 8c24dae71cd..db3e698f471 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -4856,7 +4856,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(vec_cond @0 (view_convert! @1) (view_convert! @2
>
>  /* Sink binary operation to branches, but only if we can fold it.  */
> -(for op (tcc_comparison plus minus mult bit_and bit_ior bit_xor
> +(for op (plus minus mult bit_and bit_ior bit_xor
>  lshift rshift rdiv trunc_div ceil_div floor_div round_div
>  trunc_mod ceil_mod floor_mod round_mod min max)
>  /* (c ? a : b) op (c ? d : e)  -->  c ? (a op d) : (b op e) */
> @@ -4872,6 +4872,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(op @3 (vec_cond:s @0 @1 @2))
>(vec_cond @0 (op! @3 @1) (op! @3 @2
>
> +/* Comparison sinks might be folded into vector masks which could
> +   end up as "data" operand of a vec_cond
> +   e.g. (vec_cond @0 (mask1) (...)).
> +   gimple-isel does not handle such cases if the target does not provide
> +   a vcond_mask.  Therefore, restrict the operands to non-mask classes.  */
> +(for op (tcc_comparison)
> +/* (c ? a : b) op (c ? d : e)  -->  c ? (a op d) : (b op e) */
> + (simplify
> +  (op (vec_cond:s @0 @1 @2) (vec_cond:s @0 @3 @4))
> +  (if (GET_MODE_CLASS (TYPE_MODE (type)) != MODE_VECTOR_BOOL)
> +(vec_cond @0 (op! @1 @3) (op! @2 @4
> +
> +/* (c ? a : b) op d  -->  c ? (a op d) : (b op d) */
> + (simplify
> +  (op (vec_cond:s @0 @1 @2) @3)
> +  (if (GET_MODE_CLASS (TYPE_MODE (type)) != MODE_VECTOR_BOOL)
> +(vec_cond @0 (op! @1 @3) (op! @2 @3
> + (simplify
> +  (op @3 (vec_cond:s @0 @1 @2))
> +  (if (GET_MODE_CLASS (TYPE_MODE (type)) != MODE_VECTOR_BOOL)
> +(vec_cond @0 (op! @3 @1) (op! @3 @2)
> +
>  #if GIMPLE
>  (match (nop_atomic_bit_test_and_p @0 @1 @4)
>   (bit_and (convert?@4 (ATOMIC_FETCH_OR_XOR_N @2 INTEGER_CST@0 @3))
> --
> 2.41.0
>


Re: [PATCH] math-opts: Add dbgcounter for FMA formation

2023-09-12 Thread Richard Biener via Gcc-patches
On Thu, Sep 7, 2023 at 6:47 PM Martin Jambor  wrote:
>
> Hello,
>
> This patch is a simple addition of a debug counter to FMA formation in
> tree-ssa-math-opts.cc.  Given that issues with FMAs do occasionally
> pop up, it seems genuinely useful.
>
> I simply added an if right after the initial checks in
> convert_mult_to_fma even though when FMA formation deferring is
> active (i.e. when targeting Zen CPUs) this would interact with it (and
> at this moment lead to producing all deferred candidates), so when
> using the dbg counter to find a harmful set of FMAs, it is probably
> best to also set param_avoid_fma_max_bits to zero.  I could not find a
> better place which would not also make the code unnecessarily more
> complicated.
>
> Bootstrapped and tested on x86_64-linux.  OK for master?

OK.

> Thanks,
>
> Martin
>
>
>
> gcc/ChangeLog:
>
> 2023-09-06  Martin Jambor  
>
> * dbgcnt.def (form_fma): New.
> * tree-ssa-math-opts.cc: Include dbgcnt.h.
> (convert_mult_to_fma): Bail out if the debug counter say so.
> ---
>  gcc/dbgcnt.def| 1 +
>  gcc/tree-ssa-math-opts.cc | 4 
>  2 files changed, 5 insertions(+)
>
> diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
> index 9e2f1d857b4..871cbf75d93 100644
> --- a/gcc/dbgcnt.def
> +++ b/gcc/dbgcnt.def
> @@ -162,6 +162,7 @@ DEBUG_COUNTER (dom_unreachable_edges)
>  DEBUG_COUNTER (dse)
>  DEBUG_COUNTER (dse1)
>  DEBUG_COUNTER (dse2)
> +DEBUG_COUNTER (form_fma)
>  DEBUG_COUNTER (gcse2_delete)
>  DEBUG_COUNTER (gimple_unroll)
>  DEBUG_COUNTER (global_alloc_at_func)
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index 95c22694368..3db69ad5733 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -116,6 +116,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "targhooks.h"
>  #include "domwalk.h"
>  #include "tree-ssa-math-opts.h"
> +#include "dbgcnt.h"
>
>  /* This structure represents one basic block that either computes a
> division, or is a common dominator for basic block that compute a
> @@ -3366,6 +3367,9 @@ convert_mult_to_fma (gimple *mul_stmt, tree op1, tree op2,
>&& !has_single_use (mul_result))
>  return false;
>
> +  if (!dbg_cnt (form_fma))
> +return false;
> +
>/* Make sure that the multiplication statement becomes dead after
>   the transformation, thus that all uses are transformed to FMAs.
>   This means we assume that an FMA operation has the same cost
> --
> 2.41.0
>


Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems

2023-09-12 Thread Richard Biener via Gcc-patches
On Thu, Sep 7, 2023 at 2:32 PM Segher Boessenkool
 wrote:
>
> On Thu, Sep 07, 2023 at 02:23:00PM +0300, Dan Carpenter wrote:
> > On Thu, Sep 07, 2023 at 06:04:09AM -0500, Segher Boessenkool wrote:
> > > On Thu, Sep 07, 2023 at 12:48:25PM +0300, Dan Carpenter via Gcc-patches 
> > > wrote:
> > > > I started to hunt
> > > > down all the Makefile which add a -Werror but there are a lot and
> > > > eventually I got bored and gave up.
> > >
> > > I have a patch stack for that, since 2014 or so.  I build Linux with
> > > unreleased GCC versions all the time, so pretty much any new warning is
> > > fatal if you unwisely use -Werror.
> > >
> > > > Someone should patch GCC so there it checks an environment variable to
> > > > ignore -Werror.  Somethine like this?
> > >
> > > No.  You should patch your program, instead.
> >
> > There are 2930 Makefiles in the kernel source.
>
> Yes.  And you need patches to about thirty.  Or a bit more, if you want
> to do it more cleanly.  This isn't a guess.
>
> > > One easy way is to add a
> > > -Wno-error at the end of your command lines.  Or even just -w if you
> > > want or need a bigger hammer.
> >
> > I tried that.  Some of the Makefiles check an environemnt variable as
> > well if you want to turn off -Werror.  It's not a complete solution at
> > all.  I have no idea what a complete solution looks like because I gave
> > up.
>
> A solution can not involve changing the compiler.  That is just saying
> the kernel doesn't know how to fix its own problems, so let's give the
> compiler some more unnecessary problems.

You can change the compiler by replacing it with a script that appends
-Wno-error, for example.

> > > Or nicer, put it all in Kconfig, like powerpc already has for example.
> > > There is a CONFIG_WERROR as well, so maybe use that in all places?
> >
> > That's a good idea but I'm trying to compile old kernels and not the
> > current kernel.
>
> You can patch older kernels, too, you know :-)
>
> If you need to not make any changes to your source code for some crazy
> reason (political perhaps?), just use a shell script or shell function
> instead of invoking the compiler driver directly?
>
>
> Segher


Re: [PATCH] Tweak language choice in config-list.mk

2023-09-12 Thread Richard Biener via Gcc-patches
On Thu, Sep 7, 2023 at 11:30 AM Richard Sandiford via Gcc-patches
 wrote:
>
> When I tried to use config-list.mk, the build for every triple except
> the build machine's failed for m2.  This is because, unlike other
> languages, m2 builds target objects during all-gcc.  The build will
> therefore fail unless you have access to an appropriate binutils
> (or an equivalent).  That's quite a big ask for over 100 targets. :)
>
> This patch therefore makes m2 an optional inclusion.
>
> Doing that wasn't entirely straightforward though.  The current
> configure line includes "--enable-languages=all,...", which means
> that the "..." can only force languages to be added that otherwise
> wouldn't have been.  (I.e. the only effect of the "..." is to
> override configure autodetection.)
>
> The choice of all,ada and:
>
>   # Make sure you have a recent enough gcc (with ada support) in your path so
>   # that --enable-werror-always will work.
>
> make it clear that lack of GNAT should be a build failure rather than
> silently ignored.  This predates the D frontend, which requires GDC
> in the same way that Ada requires GNAT.  I don't know of a reason
> why D should be treated differently.
>
> The patch therefore expands the "all" into a specific list of
> languages.
>
> That in turn meant that Fortran had to be handled specially,
> since bpf and mmix don't support Fortran.
>
> Perhaps there's an argument that m2 shouldn't build target objects
> during all-gcc,

Yes, I think that's unfortunate - can you open a bugreport for this?

> but (a) it works for practical usage and (b) the
> patch is an easy workaround.  I'd be happy for the patch to be
> reverted if the build system changes.
>
> OK to install?

OK.

> Richard
>
>
> gcc/
> * contrib/config-list.mk (OPT_IN_LANGUAGES): New variable.
> ($(LIST)): Replace --enable-languages=all with a specifc list.
> Disable fortran on bpf and mmix.  Enable the languages in
> OPT_IN_LANGUAGES.
> ---
>  contrib/config-list.mk | 17 ++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/contrib/config-list.mk b/contrib/config-list.mk
> index e570b13c71b..50ecb014bc0 100644
> --- a/contrib/config-list.mk
> +++ b/contrib/config-list.mk
> @@ -12,6 +12,11 @@ TEST=all-gcc
>  # supply an absolute path.
>  GCC_SRC_DIR=../../gcc
>
> +# Define this to ,m2 if you want to build Modula-2.  Modula-2 builds target
> +# objects during all-gcc, so it can only be included if you've installed
> +# binutils (or an equivalent) for each target.
> +OPT_IN_LANGUAGES=
> +
>  # Use -j / -l make arguments and nice to assure a smooth resource-efficient
>  # load on the build machine, e.g. for 24 cores:
>  # svn co svn://gcc.gnu.org/svn/gcc/branches/foo-branch gcc
> @@ -126,17 +131,23 @@ $(LIST): make-log-dir
> TGT=`echo $@ | awk 'BEGIN { FS = "OPT" }; { print $$1 }'` &&\
> TGT=`$(GCC_SRC_DIR)/config.sub $$TGT` &&\
> case $$TGT in\
> -   *-*-darwin* | *-*-cygwin* | *-*-mingw* | *-*-aix* | bpf-*-*)\
> +   bpf-*-*)\
> ADDITIONAL_LANGUAGES="";\
> ;;\
> -   *)\
> +   *-*-darwin* | *-*-cygwin* | *-*-mingw* | *-*-aix* | bpf-*-*)\
> +   ADDITIONAL_LANGUAGES=",fortran";\
> +   ;;\
> +   mmix-*-*)\
> ADDITIONAL_LANGUAGES=",go";\
> ;;\
> +   *)\
> +   ADDITIONAL_LANGUAGES=",fortran,go";\
> +   ;;\
> esac &&\
> $(GCC_SRC_DIR)/configure\
> --target=$(subst SCRIPTS,`pwd`/../scripts/,$(subst OPT,$(empty) -,$@))  \
> --enable-werror-always ${host_options}\
> -
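A usage note (hypothetical invocation, following the -j/-l advice in the
file's own header): running make -f ../gcc/contrib/config-list.mk
OPT_IN_LANGUAGES=,m2 opts Modula-2 back in once suitable binutils are
installed for each target.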

Re: [PATCH] Checking undefined_p before using the vr

2023-09-12 Thread Richard Biener via Gcc-patches
On Thu, 7 Sep 2023, Jiufu Guo wrote:

> Hi,
> 
> As discussed in PR111303:
> 
> For pattern "(X + C) / N": "(div (plus@3 @0 INTEGER_CST@1) INTEGER_CST@2)",
> Even if "X" has value-range and "X + C" does not overflow, "@3" may still
> be undefined. Like below example:
> 
> _3 = _2 + -5;
> if (0 != 0)
>   goto <bb 3>; [34.00%]
> else
>   goto <bb 4>; [66.00%]
> ;;  succ:   3
> ;;  4
> 
> ;; basic block 3, loop depth 0
> ;;  pred:   2
> _5 = _3 / 5; 
> ;;  succ:   4
> 
> The whole pattern "(_2 + -5 ) / 5" is in "bb 3", but "bb 3" would be
> unreachable (because "if (0 != 0)" is always false).
> And "get_range_query (cfun)->range_of_expr (vr3, @3)" is checked in
> "bb 3", "range_of_expr" gets an "undefined vr3". Where "@3" is "_5".
> 
> So, before using "vr3", it would be safe to check "!vr3.undefined_p ()".
> 
> Bootstrap & regtest pass on ppc64{,le} and x86_64.
> Is this ok for trunk?

OK, but I wonder why ->range_of_expr () doesn't return false for
undefined_p ()?  While "undefined" technically means we can treat
it as nonnegative_p (or not, maybe but maybe not both), we seem to
not want to do that.  So why expose it at all to ranger users
(yes, internally we in some places want to handle undefined).

Richard.

> BR,
> Jeff (Jiufu Guo)
> 
>   PR middle-end/111303
> 
> gcc/ChangeLog:
> 
>   * match.pd ((X - N * M) / N): Add undefined_p checking.
>   (X + N * M) / N): Likewise.
>   ((X + C) div_rshift N): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/pr111303.c: New test.
> 
> ---
>  gcc/match.pd|  3 +++
>  gcc/testsuite/gcc.dg/pr111303.c | 11 +++
>  2 files changed, 14 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/pr111303.c
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 801edb128f9..e2583ca7960 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -975,6 +975,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> /* "X+(N*M)" doesn't overflow.  */
> && range_op_handler (PLUS_EXPR).overflow_free_p (vr0, vr3)
> && get_range_query (cfun)->range_of_expr (vr4, @4)
> +   && !vr4.undefined_p ()
> /* "X+N*M" is not with opposite sign as "X".  */
> && (TYPE_UNSIGNED (type)
>  || (vr0.nonnegative_p () && vr4.nonnegative_p ())
> @@ -995,6 +996,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> /* "X - (N*M)" doesn't overflow.  */
> && range_op_handler (MINUS_EXPR).overflow_free_p (vr0, vr3)
> && get_range_query (cfun)->range_of_expr (vr4, @4)
> +   && !vr4.undefined_p ()
> /* "X-N*M" is not with opposite sign as "X".  */
> && (TYPE_UNSIGNED (type)
>  || (vr0.nonnegative_p () && vr4.nonnegative_p ())
> @@ -1025,6 +1027,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> /* "X+C" doesn't overflow.  */
> && range_op_handler (PLUS_EXPR).overflow_free_p (vr0, vr1)
> && get_range_query (cfun)->range_of_expr (vr3, @3)
> +   && !vr3.undefined_p ()
> /* "X+C" and "X" are not of opposite sign.  */
> && (TYPE_UNSIGNED (type)
> || (vr0.nonnegative_p () && vr3.nonnegative_p ())
> diff --git a/gcc/testsuite/gcc.dg/pr111303.c b/gcc/testsuite/gcc.dg/pr111303.c
> new file mode 100644
> index 000..eaabe55c105
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr111303.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +/* Make sure no ICE. */
> +unsigned char a;
> +int b(int c) {
> +  if (c >= 5000)
> +return c / 5;
> +}
> +void d() { b(a - 5); }
> +int main() {}
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] ggc, jit: forcibly clear GTY roots in jit

2023-09-12 Thread Richard Biener via Gcc-patches
On Wed, Sep 6, 2023 at 3:41 PM David Malcolm via Gcc-patches
 wrote:
>
> As part of Antoyo's work on supporting LTO in rustc_codegen_gcc, he
> noticed an ICE inside libgccjit when compiling certain rust files.
>
> Debugging libgccjit showed that outdated information from a previous
> in-memory compile was referring to ad-hoc locations in the previous
> compile's line_table.
>
> The issue turned out to be the function decls in internal_fn_fnspec_array
> from the previous compile keeping alive the symtab nodes for these
> functions, and from this finding other functions in the previous
> compile, walking their CFGs, and finding ad-hoc data pointers in an edge
> with a location_t using ad-hoc data from the previous line_table
> instance, and thus a use-after-free ICE attempting to use this ad-hoc
> data.
>
> Previously in toplev::finalize we've fixed global state "piecemeal" by
> calling out to individual source_name_cc_finalize functions.  However,
> it occurred to me that we have run-time information on where the
> GTY-marked pointers are.
>
> Hence this patch takes something of a "big hammer" approach by adding a
> new ggc_common_finalize that walks the GC roots, zeroing all of the
> pointers.  I stepped through this in the debugger and observed that, in
> particular, this correctly zeroes the internal_fn_fnspec_array at the end
> of a libgccjit compile.  Antoyo reports that this fixes the ICE for him.
> Doing so uncovered an ICE with libgccjit in dwarf2cfi.cc due to reuse of
> global variables from the previous compile, which this patch also fixes.
>
> I noticed that in ggc_mark_roots when clearing deletable roots we only
> clear the initial element in each gcc_root_tab_t.  This looks like a
> latent bug to me, which the patch fixes.  That said, there don't seem to
> be any deletable roots where the number of elements != 1.
>
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
>
> OK for trunk?

OK.

Thanks,
Richard.

> Thanks
> Dave
>
> gcc/ChangeLog:
> * dwarf2cfi.cc (dwarf2cfi_cc_finalize): New.
> * dwarf2out.h (dwarf2cfi_cc_finalize): New decl.
> * ggc-common.cc (ggc_mark_roots): Multiply by rti->nelt when
> clearing the deletable gcc_root_tab_t.
> (ggc_common_finalize): New.
> * ggc.h (ggc_common_finalize): New decl.
> * toplev.cc (toplev::finalize): Call dwarf2cfi_cc_finalize and
> ggc_common_finalize.
> ---
>  gcc/dwarf2cfi.cc  |  9 +
>  gcc/dwarf2out.h   |  1 +
>  gcc/ggc-common.cc | 23 ++-
>  gcc/ggc.h |  2 ++
>  gcc/toplev.cc |  3 +++
>  5 files changed, 37 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/dwarf2cfi.cc b/gcc/dwarf2cfi.cc
> index ddc728f4ad00..f1777c0a4cf1 100644
> --- a/gcc/dwarf2cfi.cc
> +++ b/gcc/dwarf2cfi.cc
> @@ -3822,4 +3822,13 @@ make_pass_dwarf2_frame (gcc::context *ctxt)
>return new pass_dwarf2_frame (ctxt);
>  }
>
> +void dwarf2cfi_cc_finalize ()
> +{
> +  add_cfi_insn = NULL;
> +  add_cfi_vec = NULL;
> +  cur_trace = NULL;
> +  cur_row = NULL;
> +  cur_cfa = NULL;
> +}
> +
>  #include "gt-dwarf2cfi.h"
> diff --git a/gcc/dwarf2out.h b/gcc/dwarf2out.h
> index 870b56a6a372..61a996050ff9 100644
> --- a/gcc/dwarf2out.h
> +++ b/gcc/dwarf2out.h
> @@ -419,6 +419,7 @@ struct fixed_point_type_info
>  } scale_factor;
>  };
>
> +void dwarf2cfi_cc_finalize (void);
>  void dwarf2out_cc_finalize (void);
>
>  /* Some DWARF internals are exposed for the needs of DWARF-based debug
> diff --git a/gcc/ggc-common.cc b/gcc/ggc-common.cc
> index bed7a9d4d021..95803fa95a17 100644
> --- a/gcc/ggc-common.cc
> +++ b/gcc/ggc-common.cc
> @@ -86,7 +86,7 @@ ggc_mark_roots (void)
>
>for (rt = gt_ggc_deletable_rtab; *rt; rt++)
>  for (rti = *rt; rti->base != NULL; rti++)
> -  memset (rti->base, 0, rti->stride);
> +  memset (rti->base, 0, rti->stride * rti->nelt);
>
>for (rt = gt_ggc_rtab; *rt; rt++)
>  ggc_mark_root_tab (*rt);
> @@ -1293,3 +1293,24 @@ report_heap_memory_use ()
>  SIZE_AMOUNT (MALLINFO_FN ().arena));
>  #endif
>  }
> +
> +/* Forcibly clear all GTY roots.  */
> +
> +void
> +ggc_common_finalize ()
> +{
> +  const struct ggc_root_tab *const *rt;
> +  const_ggc_root_tab_t rti;
> +
> +  for (rt = gt_ggc_deletable_rtab; *rt; rt++)
> +for (rti = *rt; rti->base != NULL; rti++)
> +  memset (rti->base, 0, rti->stride * rti->nelt);
> +
> +  for (rt = gt_ggc_rtab; *rt; rt++)
> +for (rti = *rt; rti->base != NULL; rti++)
> +  memset (rti->base, 0, rti->stride * rti->nelt);
> +
> +  for (rt = gt_pch_scalar_rtab; *rt; rt++)
> +for (rti = *rt; rti->base != NULL; rti++)
> +  memset (rti->base, 0, rti->stride * rti->nelt);
> +}
> diff --git a/gcc/ggc.h b/gcc/ggc.h
> index 34108e2f0061..3280314f8481 100644
> --- a/gcc/ggc.h
> +++ b/gcc/ggc.h
> @@ -368,4 +368,6 @@ inline void gt_ggc_mx (unsigned long int) { }
>  inline void gt_ggc_mx (long long int) { }
>  inline void gt_ggc_mx (unsigned long long int) { }
>
> +extern void 
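To illustrate the latent bug: for a deletable array root such as
(hypothetical declaration)

  static GTY((deletable)) tree cache[32];

gengtype emits a root-table entry with nelt == 32 and stride ==
sizeof (tree), so the old memset of rti->stride bytes cleared only cache[0];
multiplying by rti->nelt clears the whole array.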

Re: [PATCH] ssa_name_has_boolean_range vs signed-boolean:31 types

2023-09-12 Thread Richard Biener via Gcc-patches
On Sat, Sep 2, 2023 at 4:33 AM Andrew Pinski via Gcc-patches
 wrote:
>
> This turns out to be a latent bug in ssa_name_has_boolean_range
> where it would return true for all boolean types but all of the
> uses of ssa_name_has_boolean_range was expecting 0/1 as the range
> rather than [-1,0].
> So when I fixed vector lower to do all comparisons in boolean_type
> rather than still in the signed-boolean:31 type (to fix a different issue),
> the pattern in match for `-(type)!A -> (type)A - 1.` would assume A (which
> was signed-boolean:31) had a range of [0,1] which broke down and sometimes
> gave us -1/-2 as values rather than what we were expecting of -1/0.
>
> This was the simplest patch I found while testing.
>
> We have another way of matching [0,1] range which we could use instead
> of ssa_name_has_boolean_range except that uses only the global ranges
> rather than the local range (during VRP).
> I tried to clean this up slightly by using gimple_match_zero_one_valuedp
> inside ssa_name_has_boolean_range but that failed because due to using
> only the global ranges. I then tried to change get_nonzero_bits to use
> the local ranges at the optimization time but that failed also because
> we would remove branches to __builtin_unreachable during evrp and lose
> information as we don't set the global ranges during evrp.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu.

I guess the name of 'ssa_name_has_boolean_range' is unfortunate here.

We also lack documenting BOOLEAN_TYPE with [-1,0], neither tree.def
nor generic.texi elaborate on those.  build_nonstandard_boolean_type
simply calls fixup_signed_type which will end up setting MIN/MAX value
to [INT_MIN, INT_MAX].

Iff ssa_name_has_boolean_range really checks for zero_one we should
maybe rename it.

Iff _all_ signed BOOLEAN_TYPE have a true value of -1 (signed:8 can
very well represent [0, 1] as well) then we should document that.  (No,
I don't think we want TYPE_MIN/MAX_VALUE to specify this)

At some point the middle-end was very conservative and only considered
unsigned BOOLEAN_TYPE with 1 bit precision to have a [0,1] range.

I think that a more general 'boolean range' (not [0, 1]) query is only
possible if we hand in context.

The patch is definitely correct - not all BOOLEAN_TYPE types have a [0, 1]
range, thus OK.

Does Ada have signed booleans that are BOOLEAN_TYPE but do _not_
have [-1, 0] as range?  I think documenting [0, 1] for (single-bit precision?)
unsigned BOOLEAN_TYPE and [-1, 1] for signed BOOLEAN_TYPE would
be conservative.

Thanks,
Richard.

> PR 110817
>
> gcc/ChangeLog:
>
> * tree-ssanames.cc (ssa_name_has_boolean_range): Remove the
> check for boolean type as they don't have "[0,1]" range.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.c-torture/execute/pr110817-1.c: New test.
> * gcc.c-torture/execute/pr110817-2.c: New test.
> * gcc.c-torture/execute/pr110817-3.c: New test.
> ---
>  gcc/testsuite/gcc.c-torture/execute/pr110817-1.c | 13 +
>  gcc/testsuite/gcc.c-torture/execute/pr110817-2.c | 16 
>  gcc/testsuite/gcc.c-torture/execute/pr110817-3.c | 14 ++
>  gcc/tree-ssanames.cc |  4 
>  4 files changed, 43 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr110817-1.c
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr110817-2.c
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr110817-3.c
>
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr110817-1.c 
> b/gcc/testsuite/gcc.c-torture/execute/pr110817-1.c
> new file mode 100644
> index 000..1d33fa9a207
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr110817-1.c
> @@ -0,0 +1,13 @@
> +typedef unsigned long __attribute__((__vector_size__ (8))) V;
> +
> +
> +V c;
> +
> +int
> +main (void)
> +{
> +  V v = ~((V) { } <=0);
> +  if (v[0])
> +__builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr110817-2.c 
> b/gcc/testsuite/gcc.c-torture/execute/pr110817-2.c
> new file mode 100644
> index 000..1f759178425
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr110817-2.c
> @@ -0,0 +1,16 @@
> +
> +typedef unsigned char u8;
> +typedef unsigned __attribute__((__vector_size__ (8))) V;
> +
> +V v;
> +unsigned char c;
> +
> +int
> +main (void)
> +{
> +  V x = (v > 0) > (v != c);
> + // V x = foo ();
> +  if (x[0] || x[1])
> +__builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr110817-3.c 
> b/gcc/testsuite/gcc.c-torture/execute/pr110817-3.c
> new file mode 100644
> index 000..36f09c88dd9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr110817-3.c
> @@ -0,0 +1,14 @@
> +typedef unsigned __attribute__((__vector_size__ (1*sizeof(unsigned)))) V;
> +
> +V v;
> +unsigned char c;
> +
> +int
> +main (void)
> +{
> +  V x = (v > 0) > (v != c);
> +  volatile signed int t = x[0];
> +  if (t)
> +
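(A quick case check of the miscompile described above, for
`-(type)!A -> (type)A - 1` with A in [-1,0] rather than [0,1]: A = 0 gives
-!A = -1 and A - 1 = -1, which agree; but A = -1 gives -!A = 0 while
A - 1 = -2 — the -1/-2 values mentioned in the description, where -1/0 was
expected.)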

Re: [PATCH] Improve rewrite_to_defined_overflow for lhs already the correct type

2023-09-12 Thread Richard Biener via Gcc-patches
On Sun, Sep 3, 2023 at 6:19 PM Andrew Pinski via Gcc-patches
 wrote:
>
> This improves rewrite_to_defined_overflow slightly if we already
> have an unsigned type. The only place where this seems to show up
> is ifcombine. It removes one extra statement which gets added and
> then later on removed.

What specific case is that?  It sounds like we call the function when
it isn't needed?  I also think that refactoring to a special case when
the LHS type already is OK will result in better code in the end.

Richard.

> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> gcc/ChangeLog:
>
> PR tree-optimization/111276
> * gimple-fold.cc (rewrite_to_defined_overflow): Don't
> add a new lhs if we already have the unsigned type.
> ---
>  gcc/gimple-fold.cc | 17 +++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index fd01810581a..2fcafeada37 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -8721,10 +8721,19 @@ rewrite_to_defined_overflow (gimple *stmt, bool in_place /* = false */)
> op = gimple_convert (, type, op);
> gimple_set_op (stmt, i, op);
>}
> -  gimple_assign_set_lhs (stmt, make_ssa_name (type, stmt));
> +  bool needs_cast_back = false;
> +  if (!useless_type_conversion_p (type, TREE_TYPE (lhs)))
> +{
> +  gimple_assign_set_lhs (stmt, make_ssa_name (type, stmt));
> +  needs_cast_back = true;
> +}
> +
>if (gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR)
>  gimple_assign_set_rhs_code (stmt, PLUS_EXPR);
> -  gimple_set_modified (stmt, true);
> +
> +  if (needs_cast_back || stmts)
> +gimple_set_modified (stmt, true);
> +
>if (in_place)
>  {
>gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
> @@ -8734,6 +8743,10 @@ rewrite_to_defined_overflow (gimple *stmt, bool in_place /* = false */)
>  }
>else
>  gimple_seq_add_stmt (, stmt);
> +
> +  if (!needs_cast_back)
> +return stmts;
> +
>gimple *cvt = gimple_build_assign (lhs, NOP_EXPR, gimple_assign_lhs 
> (stmt));
>if (in_place)
>  {
> --
> 2.31.1
>


Re: [PATCH V6] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-09-01 Thread Richard Biener via Gcc-patches
On Fri, 1 Sep 2023, Jiufu Guo wrote:

> Hi,
> 
> Integer expression "(X - N * M) / N" can be optimized to "X / N - M" with
> the below conditions:
> 1. There is no wrap/overflow/underflow.
>wrap/overflow/underflow breaks the arithmetic operation.
> 2. "X - N * M" and "X" are not of opposite sign.
>Here, the operation "/" would be "trunc_div", the fractional part is
>discarded towards zero. If "X - N * M" and "X" are in different signs,
>then trunc_div discards the fractional parts (of /N) in different
>directions.
> 
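(A concrete case for condition 2: X = 3, N = 5, M = 1.  (3 - 5*1) / 5 =
-2 / 5 = 0 under trunc_div, while 3 / 5 - 1 = -1; X and X - N*M have
opposite signs, so the two truncations discard their fractional parts in
opposite directions.)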
> Compare the previous version:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624801.html
> This patch adds comments and updates the pattern on "(t + C)" to be more
> tight.
> 
> Bootstrap & regtest pass on ppc64{,le} and x86_64.
> Is this patch ok for trunk?
> 
> BR,
> Jeff (Jiufu Guo)
> 
>   PR tree-optimization/108757
> 
> gcc/ChangeLog:
> 
>   * match.pd ((X - N * M) / N): New pattern.
>   ((X + N * M) / N): New pattern.
>   ((X + C) div_rshift N): New pattern.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/pr108757-1.c: New test.
>   * gcc.dg/pr108757-2.c: New test.
>   * gcc.dg/pr108757.h: New test.
> 
> ---
>  gcc/match.pd  |  78 ++
>  gcc/testsuite/gcc.dg/pr108757-1.c |  18 +++
>  gcc/testsuite/gcc.dg/pr108757-2.c |  19 +++
>  gcc/testsuite/gcc.dg/pr108757.h   | 233 ++
>  4 files changed, 348 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/pr108757-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr108757-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr108757.h
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 
> fa598d5ca2e470f9cc3b82469e77d743b12f107e..863bc7299cdefc622a7806a4d32e37268c50d453
>  100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -959,6 +959,84 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  #endif
> 
>  
> +#if GIMPLE
> +(for div (trunc_div exact_div)
> + /* Simplify (X + M*N) / N -> X / N + M.  */
> + (simplify
> +  (div (plus:c@4 @0 (mult:c@3 @1 @2)) @2)
> +  (with {value_range vr0, vr1, vr2, vr3, vr4;}
> +  (if (INTEGRAL_TYPE_P (type)
> +   && get_range_query (cfun)->range_of_expr (vr1, @1)
> +   && get_range_query (cfun)->range_of_expr (vr2, @2)
> +   /* "N*M" doesn't overflow.  */
> +   && range_op_handler (MULT_EXPR).overflow_free_p (vr1, vr2)
> +   && get_range_query (cfun)->range_of_expr (vr0, @0)
> +   && get_range_query (cfun)->range_of_expr (vr3, @3)
> +   /* "X+(N*M)" doesn't overflow.  */
> +   && range_op_handler (PLUS_EXPR).overflow_free_p (vr0, vr3)
> +   && get_range_query (cfun)->range_of_expr (vr4, @4)
> +   /* "X+N*M" is not with opposite sign as "X".  */
> +   && (TYPE_UNSIGNED (type)
> +|| (vr0.nonnegative_p () && vr4.nonnegative_p ())
> +|| (vr0.nonpositive_p () && vr4.nonpositive_p (
> +  (plus (div @0 @2) @1
> +
> + /* Simplify (X - M*N) / N -> X / N - M.  */
> + (simplify
> +  (div (minus@4 @0 (mult:c@3 @1 @2)) @2)
> +  (with {value_range vr0, vr1, vr2, vr3, vr4;}
> +  (if (INTEGRAL_TYPE_P (type)
> +   && get_range_query (cfun)->range_of_expr (vr1, @1)
> +   && get_range_query (cfun)->range_of_expr (vr2, @2)
> +   /* "N * M" doesn't overflow.  */
> +   && range_op_handler (MULT_EXPR).overflow_free_p (vr1, vr2)
> +   && get_range_query (cfun)->range_of_expr (vr0, @0)
> +   && get_range_query (cfun)->range_of_expr (vr3, @3)
> +   /* "X - (N*M)" doesn't overflow.  */
> +   && range_op_handler (MINUS_EXPR).overflow_free_p (vr0, vr3)
> +   && get_range_query (cfun)->range_of_expr (vr4, @4)
> +   /* "X-N*M" is not with opposite sign as "X".  */
> +   && (TYPE_UNSIGNED (type)
> +|| (vr0.nonnegative_p () && vr4.nonnegative_p ())
> +|| (vr0.nonpositive_p () && vr4.nonpositive_p (
> +  (minus (div @0 @2) @1)
> +
> +/* Simplify
> +   (X + C) / N -> X / N + C / N where C is multiple of N.
> +   (X + C) >> N -> X >> N + C>>N if low N bits of C is 0.  */
> +(for op (trunc_div exact_div rshift)
> + (simplify
> +  (op (plus@3 @0 INTEGER_CST@1) INTEGER_CST@2)
> +   (with
> +{
> +  wide_int c = wi::to_wide (@1);
> +  wide_int n = wi::to_wide (@2);
> +  bool shift = op == RSHIFT_EXPR;
> +  #define plus_op1(v) (shift ? wi::rshift (v, n, TYPE_SIGN (type)) \
> +  : wi::div_trunc (v, n, TYPE_SIGN (type)))
> +  #define exact_mod(v) (shift ? wi::ctz (v) >= n.to_shwi () \
> +   : wi::multiple_of_p (v, n, TYPE_SIGN (type)))

please indent these full left

> +  value_range vr0, vr1, vr3;
> +}
> +(if (INTEGRAL_TYPE_P (type)
> +  && get_range_query (cfun)->range_of_expr (vr0, @0))
> + (if (exact_mod (c)
> +   && get_range_query (cfun)->range_of_expr (vr1, @1)
> +   /* "X+C" doesn't overflow.  */
> +   && range_op_handler (PLUS_EXPR).overflow_free_p (vr0, vr1)
> +   && get_range_query 

Re: [PATCH v3] tree-optimization/110279- Check for nested FMA in reassoc

2023-09-01 Thread Richard Biener via Gcc-patches
On Wed, Aug 9, 2023 at 6:53 PM Di Zhao OS  wrote:
>
> Hi,
>
> The previous version of this patch tries to solve two problems
> at the same time. For better clarity, I'll separate them and
> only deal with the "nested" FMA in this version. I plan to
> propose another patch in avoiding bad shaped FMA (deferring FMA).
>
> Other changes:
>
> 1. Added new testcases for the "nested" FMA issue. For the
>following code:
>
> tmp1 = a + c * c + d * d + x * y;
> tmp2 = x * tmp1;
> result += (a + c + d + tmp2);
>
>, when "tmp1 = ..." is not rewritten, tmp1 will be result of
>an FMA, and there will be a list of consecutive FMAs:
>
> _1 = .FMA (c, c, a_39);
> _2 = .FMA (d, d, _1);
> tmp1 = .FMA (x, y, _2);
> _3 = .FMA (tmp1, x, d);
> ...
>
>If "tmp1 = ..." is rewritten to parallel, tmp1 will be result
>of a PLUS_EXPR between FMAs:
>
> _1 = .FMA (c, c, a_39);
> _2 = x * y;
> _3 = .FMA (d, d, _2);
>  tmp1 = _3 + _1;
>  _4 = .FMA (tmp1, x, d);
> ...
>
>It seems the register pressure of the latter is higher than
>the former.

Yes, that's a natural consequence of rewriting to parallel.

>On the test machines we have (including Ampere1,
>Neoverse-n1 and Intel Xeon), with "tmp1 = ..." is rewritten to
>parallel, the run time all increased significantly. In
>contrast, when "tmp1" is not the 1st or 2nd operand of another
>FMA (pr110279-1.c), rewriting it results in better performance.
>(I'll also append the testcases in the bug tracker.)
>
> 2. Enhanced checking for nested FMA by: 1) Modified
>convert_mult_to_fma so it can return multiple LHS.  2) Check
>NEGATE_EXPRs for nested FMA.
>
> (I think maybe this can be further refined by enabling rewriting
> to parallel for very long op lists.)

So again, what you do applies to all operations, not just FMA.
Consider

  tmp1 = a + c + d + y;
  tmp2 = x + tmp1;
  result += (a + c + d + tmp2);
  foo (tmp2);

where I just removed all multiplications.  Since re-assoc works
on isolated single-use chains it will rewrite the tmp2 chain
to parallel and it will rewrite the result chain to parallel, in
the end this results in reassoc-width for 'result' not being honored
because we don't see that at 'tmp2' it will fork again.  OTOH
the other 'result' arms end, so eventually just two (for reassoc
width 2) arms are "active" at any point.

That said - isn't the issue that we "overcommit" reassoc-width
this way because we apply it locally instead of globally
(of course also ignoring every other chain of instructions
reassoc isn't interestedin)?

Unfortunately we work backwards when processing chains,
if we processed leaf chains first we could record the
association width applied to the chain at its new root and
honor that when such root ends up in the oplist of a consuming
chain.  But as we work backwards we'd have to record
the reassoc width used in the leafs of the associated
chain.  So if those become roots of other chains we can
then still honor that.

Would it work to attack the problem this way?  For
parallel rewritten chains record the width used?
Similar to operand_rank we could use a hash-map
from SSA leaf to width it appears in.  When we rewrite
a chain with such leaf as root we can then subtract
the incoming chain width from reassoc-width to lower
the width its tail?

Richard.

> Bootstrapped and regression tested on x86_64-linux-gnu.
>
> Thanks,
> Di Zhao
>
> 
>
> PR tree-optimization/110279
>
> gcc/ChangeLog:
>
> * tree-ssa-math-opts.cc (convert_mult_to_fma_1): Added
> new parameter collect_lhs.
> (struct fma_transformation_info): Moved to header.
> (class fma_deferring_state): Moved to header.
> (convert_mult_to_fma): Added new parameter collect_lhs.
> * tree-ssa-math-opts.h (struct fma_transformation_info):
> (class fma_deferring_state): Moved from .cc.
> (convert_mult_to_fma): Moved from .cc.
> * tree-ssa-reassoc.cc (enum fma_state): Defined enum to
> describe the state of FMA candidates for a list of
> operands.
> (rewrite_expr_tree_parallel): Changed boolean parameter
> to enum type.
> (has_nested_fma_p): New function to check for nested FMA
> on given multiplication statement.
> (rank_ops_for_fma): Return enum fma_state.
> (reassociate_bb): Avoid rewriting to parallel if nested
> FMAs are found.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/pr110279-1.c: New test.
> * gcc.dg/pr110279-2.c: New test.


Re: [PATCH] MATCH: `(nop_convert)-a` into -(nop_convert)a if the negate is single use and a is known not to be signed min value

2023-09-01 Thread Richard Biener via Gcc-patches
On Fri, Sep 1, 2023 at 4:27 AM Andrew Pinski via Gcc-patches
 wrote:
>
> This pushes the conversion further down the chain, which allows
> optimizing away more conversions in many cases.

But when building (T1)(T2)-x it will make simplifying (T1)(T2) more difficult
as we'd need a

(convert (negate (convert ...)))

pattern for that?  So I'm not convinced this is the correct approach to the
cases you want to optimize?  The testcases actually are of the
form (T1)-(T2)x so hoisting the other way around would have worked as well
(if the outer convert would have been folded).

Are there any existing cases where we push/pull (nop) conversions around
unary operations?

Should we pay the price and simply have patterns for
(convert (unary (convert ...)))?

[how nice is the RTL world without signedness of operands but
signed/unsigned operation variants ...]

Richard.

> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> PR tree-optimization/107765
> PR tree-optimization/107137
>
> gcc/ChangeLog:
>
> * match.pd (`(nop_convert)-a`): New pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/neg-cast-1.c: New test.
> * gcc.dg/tree-ssa/neg-cast-2.c: New test.
> * gcc.dg/tree-ssa/neg-cast-3.c: New test.
> ---
>  gcc/match.pd   | 31 ++
>  gcc/testsuite/gcc.dg/tree-ssa/neg-cast-1.c | 17 
>  gcc/testsuite/gcc.dg/tree-ssa/neg-cast-2.c | 20 ++
>  gcc/testsuite/gcc.dg/tree-ssa/neg-cast-3.c | 15 +++
>  4 files changed, 83 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/neg-cast-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/neg-cast-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/neg-cast-3.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 487a7e38719..3cff9b03d92 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -959,6 +959,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  #endif
> 
>
> +/* (nop_cast)-var -> -(nop_cast)(var)
> +   if -var is known to not overflow; that is, var is not
> +   the signed integer MIN.  */
> +(simplify
> + (convert (negate:s @0))
> + (if (INTEGRAL_TYPE_P (type)
> +  && tree_nop_conversion_p (type, TREE_TYPE (@0)))
> +  (with {
> +/* If the top bit is not set, there is no overflow happening. */
> +bool contains_signed_min = !wi::ges_p (tree_nonzero_bits (@0), 0);
> +#if GIMPLE
> +int_range_max vr;
> +if (contains_signed_min
> +&& TREE_CODE (@0) == SSA_NAME
> +   && get_range_query (cfun)->range_of_expr (vr, @0)
> +   && !vr.undefined_p ())
> +  {
> +tree stype = signed_type_for (type);
> +   auto minvalue = wi::min_value (stype);
> +   int_range_max valid_range (TREE_TYPE (@0), minvalue, minvalue);
> +   vr.intersect (valid_range);
> +   /* If the range does not include min value,
> +  then we can do this change around. */
> +   if (vr.undefined_p ())
> + contains_signed_min = false;
> +  }
> +#endif
> +   }
> +   (if (!contains_signed_min)
> +(negate (convert @0))
> +
>  (for op (negate abs)
>   /* Simplify cos(-x) and cos(|x|) -> cos(x).  Similarly for cosh.  */
>   (for coss (COS COSH)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/neg-cast-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/neg-cast-1.c
> new file mode 100644
> index 000..7ddf40aca29
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/neg-cast-1.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-evrp" } */
> +/* PR tree-optimization/107765 */
> +
> +#include 
> +
> +int a(int input)
> +{
> +if (input == INT_MIN) __builtin_unreachable();
> +unsigned t = input;
> +int tt =  -t;
> +return tt == -input;
> +}
> +
> +/* Should be able to optimize this down to just `return 1;` during evrp. */
> +/* { dg-final { scan-tree-dump "return 1;" "evrp" } } */
> +/* { dg-final { scan-tree-dump-not " - " "evrp" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/neg-cast-2.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/neg-cast-2.c
> new file mode 100644
> index 000..ce49079e235
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/neg-cast-2.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fdump-tree-fre3 -fdump-tree-optimized" } */
> +/* part of PR tree-optimization/108397 */
> +
> +long long
> +foo (unsigned char o)
> +{
> +  unsigned long long t1 = -(long long) (o == 0);
> +  unsigned long long t2 = -(long long) (t1 != 0);
> +  unsigned long long t3 = -(long long) (t1 <= t2);
> +  return t3;
> +}
> +
> +/* Should be able to optimize this down to just `return -1;` during fre3. */
> +/* { dg-final { scan-tree-dump "return -1;" "fre3" } } */
> +/* FRE does not remove all dead statements */
> +/* { dg-final { scan-tree-dump-not " - " "fre3" { xfail *-*-* } } } */
> +
> +/* { dg-final { scan-tree-dump "return -1;" "optimized" } } */
> +/* { dg-final { scan-tree-dump-not " - " "optimized" 

Re: [RFC] gimple ssa: SCCP - A new PHI optimization pass

2023-09-01 Thread Richard Biener via Gcc-patches
On Fri, 1 Sep 2023, Filip Kastl wrote:

> > That's interesting.  Your placement at
> > 
> >   NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
> >   NEXT_PASS (pass_phiopt, true /* early_p */);
> > + NEXT_PASS (pass_sccp);
> > 
> > and
> > 
> >NEXT_PASS (pass_tsan);
> >NEXT_PASS (pass_dse, true /* use DR analysis */);
> >NEXT_PASS (pass_dce);
> > +  NEXT_PASS (pass_sccp);
> > 
> > isn't immediately after the "best" existing pass we have to
> > remove dead PHIs which is pass_cd_dce.  phiopt might leave
> > dead PHIs around and the second instance runs long after the
> > last CD-DCE.
> > 
> > So I wonder if your pass just detects unnecessary PHIs we'd have
> > removed by other means and what survives until RTL expansion is
> > what we should count?
> > 
> > Can you adjust your original early placement to right after
> > the cd-dce pass and for the late placement turn the dce pass
> > before it into cd-dce and re-do your measurements?
> 
> So I did this
> 
>   NEXT_PASS (pass_dse);
>   NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
>   NEXT_PASS (pass_sccp);
>   NEXT_PASS (pass_phiopt, true /* early_p */);
>   NEXT_PASS (pass_tail_recursion); 
> 
> and this
> 
>   NEXT_PASS (pass_dse, true /* use DR analysis */);
>   NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
>   NEXT_PASS (pass_sccp);
>   /* Pass group that runs when 1) enabled, 2) there are loops
> 
> and got these results:
> 
> 500.perlbench_r
> Started with (1) 30318
> Ended with (1) 26219
> Removed PHI % (1) 13.52002110957187149600
> Started with (2) 39043
> Ended with (2) 38941
> Removed PHI % (2) .26125041620777092000
> 
> 502.gcc_r
> Started with (1) 148361
> Ended with (1) 140464
> Removed PHI % (1) 5.32282742769326170700
> Started with (2) 216209
> Ended with (2) 215367
> Removed PHI % (2) .38943799749316633500
> 
> 505.mcf_r
> Started with (1) 342
> Ended with (1) 304
> Removed PHI % (1) 11.1200
> Started with (2) 437
> Ended with (2) 433
> Removed PHI % (2) .91533180778032036700
>  
> 523.xalancbmk_r
> Started with (1) 62995
> Ended with (1) 58289 
> Removed PHI % (1) 7.47043416144138423700
> Started with (2) 134026
> Ended with (2) 133193
> Removed PHI % (2) .62152119737961291100
>   
> 531.deepsjeng_r
> Started with (1) 1402
> Ended with (1) 1264
> Removed PHI % (1) 9.84308131241084165500
> Started with (2) 1928
> Ended with (2) 1920
> Removed PHI % (2) .41493775933609958600
> 
> 541.leela_r
> Started with (1) 3398
> Ended with (1) 3060
> Removed PHI % (1) 9.94702766333137139500
> Started with (2) 4473
> Ended with (2) 4453
> Removed PHI % (2) .44712720769058797300
> 
> 557.xz_r
> Started with (1) 47
> Ended with (1) 44
> Removed PHI % (1) 6.38297872340425532000
> Started with (2) 43
> Ended with (2) 43
> Removed PHI % (2) 0
> 
> These measurements don't differ very much from the previous. It seems to me
> that phiopt does output some redundant PHIs but the vast majority of the
> eliminated PHIs are generated in earlier passes and cd_dce isn't able to get
> rid of them.
> 
> A noteworthy point might be that most of the eliminated PHIs are 
> actually
> trivial PHIs. I consider a PHI to be trivial if it only references itself or
> one other SSA name.

Ah.  The early pass numbers are certainly interesting - can you elaborate
on the last bit?  We have for example loop-closed PHI nodes like

_1 = PHI <_2>

and there are non-trivial degenerate PHIs like

_1 = PHI <_2, _2>

those are generally removed by value-numbering (FRE, DOM and PRE) and SSA 
propagation (CCP and copyprop); they are not "dead" so CD-DCE doesn't
remove them.

But we do have passes removing these kinds of PHIs.

The issue with the early pass is likely that we have

  NEXT_PASS (pass_fre, true /* may_iterate */);
^^
would eliminate these kinds of PHIs

  NEXT_PASS (pass_early_vrp);
^^
rewrites into loop-closed SSA, adding many such PHIs

  NEXT_PASS (pass_merge_phi);
  NEXT_PASS (pass_dse);
  NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);

and until here there's no pass eliding the LC SSA PHIs.

You could add a pass_copy_prop after early_vrp, the later sccp
pass shouldn't run into this issue I think, so it must be other
passes adding such kinds of PHIs.
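
I.e. as an experiment, something like the following in passes.def
(untested):

   NEXT_PASS (pass_fre, true /* may_iterate */);
   NEXT_PASS (pass_early_vrp);
+  NEXT_PASS (pass_copy_prop);
   NEXT_PASS (pass_merge_phi);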

Maybe you can count single-argument PHIs, degenerate multi-arg PHIs
and "other" PHIs separately as you remove them?


> Here is a comparison of the newest measurements (sccp after cd_dce) with the
> previous ones (sccp after phiopt and dce):
> 
> 500.perlbench_r
>  
> Started with (1-PREV) 30287
> Started with (1-NEW) 30318
>  
> Ended with (1-PREV) 26188
> Ended with (1-NEW) 26219
>  
> Removed PHI % (1-PREV) 13.53385941162875161000
> Removed PHI % (1-NEW) 

Re: [PATCH] MATCH [PR19832]: Optimize some `(a != b) ? a OP b : c`

2023-09-01 Thread Richard Biener via Gcc-patches
On Thu, Aug 31, 2023 at 7:25 PM Andrew Pinski via Gcc-patches
 wrote:
>
> This patch adds the following match patterns to optimize these:
>  /* (a != b) ? (a - b) : 0 -> (a - b) */
>  /* (a != b) ? (a ^ b) : 0 -> (a ^ b) */
>  /* (a != b) ? (a & b) : a -> (a & b) */
>  /* (a != b) ? (a | b) : a -> (a | b) */
>  /* (a != b) ? min(a,b) : a -> min(a,b) */
>  /* (a != b) ? max(a,b) : a -> max(a,b) */
>  /* (a != b) ? (a * b) : (a * a) -> (a * b) */
>  /* (a != b) ? (a + b) : (a + a) -> (a + b) */
>  /* (a != b) ? (a + b) : (2 * a) -> (a + b) */
> Note currently only integer types (include vector types)
> are handled. Floating point types can be added later on.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu.

OK.

> The first pattern still shows up in GCC in cse.c's preferable
> function, which was the original motivation for this patch.
>
> PR tree-optimization/19832
>
> gcc/ChangeLog:
>
> * match.pd: Add pattern to optimize
> `(a != b) ? a OP b : c`.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/opt/vectcond-1.C: New test.
> * gcc.dg/tree-ssa/phi-opt-same-1.c: New test.
> ---
>  gcc/match.pd  | 31 ++
>  gcc/testsuite/g++.dg/opt/vectcond-1.C | 57 ++
>  .../gcc.dg/tree-ssa/phi-opt-same-1.c  | 60 +++
>  3 files changed, 148 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/opt/vectcond-1.C
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-same-1.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index c01362ee359..487a7e38719 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5261,6 +5261,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (convert @c0
>  #endif
>
> +(for cnd (cond vec_cond)
> + /* (a != b) ? (a - b) : 0 -> (a - b) */
> + (simplify
> +  (cnd (ne:c @0 @1) (minus@2 @0 @1) integer_zerop)
> +  @2)
> + /* (a != b) ? (a ^ b) : 0 -> (a ^ b) */
> + (simplify
> +  (cnd (ne:c @0 @1) (bit_xor:c@2 @0 @1) integer_zerop)
> +  @2)
> + /* (a != b) ? (a & b) : a -> (a & b) */
> + /* (a != b) ? (a | b) : a -> (a | b) */
> + /* (a != b) ? min(a,b) : a -> min(a,b) */
> + /* (a != b) ? max(a,b) : a -> max(a,b) */
> + (for op (bit_and bit_ior min max)
> +  (simplify
> +   (cnd (ne:c @0 @1) (op:c@2 @0 @1) @0)
> +   @2))
> + /* (a != b) ? (a * b) : (a * a) -> (a * b) */
> + /* (a != b) ? (a + b) : (a + a) -> (a + b) */
> + (for op (mult plus)
> +  (simplify
> +   (cnd (ne:c @0 @1) (op@2 @0 @1) (op @0 @0))
> +   (if (ANY_INTEGRAL_TYPE_P (type))
> +@2)))
> + /* (a != b) ? (a + b) : (2 * a) -> (a + b) */
> + (simplify
> +  (cnd (ne:c @0 @1) (plus@2 @0 @1) (mult @0 uniform_integer_cst_p@3))
> +  (if (wi::to_wide (uniform_integer_cst_p (@3)) == 2)
> +   @2))
> +)
> +
>  /* These were part of minmax phiopt.  */
>  /* Optimize (a CMP b) ? minmax<a, c> : minmax<b, c>
>     to minmax<min/max<a, b>, c>  */
> diff --git a/gcc/testsuite/g++.dg/opt/vectcond-1.C 
> b/gcc/testsuite/g++.dg/opt/vectcond-1.C
> new file mode 100644
> index 000..3877ad11414
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/opt/vectcond-1.C
> @@ -0,0 +1,57 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-ccp1 -fdump-tree-optimized" } */
> +/* This is the vector version of these optimizations. */
> +/* PR tree-optimization/19832 */
> +
> +#define vector __attribute__((vector_size(sizeof(unsigned)*2)))
> +
> +static inline vector int max_(vector int a, vector int b)
> +{
> +   return (a > b)? a : b;
> +}
> +static inline vector int min_(vector int a, vector int b)
> +{
> +  return (a < b) ? a : b;
> +}
> +
> +vector int f_minus(vector int a, vector int b)
> +{
> +  return (a != b) ? a - b : (a - a);
> +}
> +vector int f_xor(vector int a, vector int b)
> +{
> +  return (a != b) ? a ^ b : (a ^ a);
> +}
> +
> +vector int f_ior(vector int a, vector int b)
> +{
> +  return (a != b) ? a | b : (a | a);
> +}
> +vector int f_and(vector int a, vector int b)
> +{
> +  return (a != b) ? a & b : (a & a);
> +}
> +vector int f_max(vector int a, vector int b)
> +{
> +  return (a != b) ? max_(a, b) : max_(a, a);
> +}
> +vector int f_min(vector int a, vector int b)
> +{
> +  return (a != b) ? min_(a, b) : min_(a, a);
> +}
> +vector int f_mult(vector int a, vector int b)
> +{
> +  return (a != b) ? a * b : (a * a);
> +}
> +vector int f_plus(vector int a, vector int b)
> +{
> +  return (a != b) ? a + b : (a + a);
> +}
> +vector int f_plus_alt(vector int a, vector int b)
> +{
> +  return (a != b) ? a + b : (a * 2);
> +}
> +
> +/* All of the above function's VEC_COND_EXPR should have been optimized 
> away. */
> +/* { dg-final { scan-tree-dump-not "VEC_COND_EXPR " "ccp1" } } */
> +/* { dg-final { scan-tree-dump-not "VEC_COND_EXPR " "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-same-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-same-1.c
> new file mode 100644
> index 000..24e757b9b9f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-same-1.c
> @@ -0,0 +1,60 @@
> +/* { dg-do 

Re: [RFC] gimple ssa: SCCP - A new PHI optimization pass

2023-09-01 Thread Richard Biener via Gcc-patches
On Thu, 31 Aug 2023, Andrew Pinski wrote:

> On Thu, Aug 31, 2023 at 5:15?AM Richard Biener via Gcc-patches
>  wrote:
> >
> > On Thu, 31 Aug 2023, Filip Kastl wrote:
> >
> > > > The most obvious places would be right after SSA construction and 
> > > > before RTL expansion.
> > > > Can you provide measurements for those positions?
> > >
> > > The algorithm should only remove PHIs that break SSA form minimality. 
> > > Since
> > > GCC's SSA construction already produces minimal SSA form, the algorithm 
> > > isn't
> > > expected to remove any PHIs if run right after the construction. I even
> > > measured it and indeed -- no PHIs got removed (except for 502.gcc_r, 
> > > where the
> > > algorithm managed to remove exactly 1 PHI, which is weird).
> > >
> > > I tried putting the pass before pass_expand. There aren't a lot of PHIs to
> > > remove at that point, but there still are some.
> >
> > That's interesting.  Your placement at
> >
> >   NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
> >   NEXT_PASS (pass_phiopt, true /* early_p */);
> > + NEXT_PASS (pass_sccp);
> >
> > and
> >
> >NEXT_PASS (pass_tsan);
> >NEXT_PASS (pass_dse, true /* use DR analysis */);
> >NEXT_PASS (pass_dce);
> > +  NEXT_PASS (pass_sccp);
> >
> > isn't immediately after the "best" existing pass we have to
> > remove dead PHIs which is pass_cd_dce.  phiopt might leave
> > dead PHIs around and the second instance runs long after the
> > last CD-DCE.
> 
> Actually the last phiopt is run before last pass_cd_dce:

I meant the second instance of pass_sccp, not phiopt.

Richard.


[PATCH] middle-end/111253 - partly revert r11-6508-gabb1b6058c09a7

2023-08-31 Thread Richard Biener via Gcc-patches
The following keeps dumping SSA def stmt RHS during diagnostic
reporting only for gimple_assign_single_p defs which means
memory loads.  This avoids diagnostics containing PHI nodes
like

  warning: 'realloc' called on pointer '*_42 = PHI .t_mem_caches' with nonzero offset 40

instead getting back the previous behavior:

  warning: 'realloc' called on pointer '*.t_mem_caches' with nonzero 
offset 40

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

PR middle-end/111253
gcc/c-family/
* c-pretty-print.cc (c_pretty_printer::primary_expression):
Only dump gimple_assign_single_p SSA def RHS.

* gcc.dg/Wfree-nonheap-object-7.c: New testcase.
---
 gcc/c-family/c-pretty-print.cc|  7 -
 gcc/testsuite/gcc.dg/Wfree-nonheap-object-7.c | 26 +++
 2 files changed, 32 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/Wfree-nonheap-object-7.c

diff --git a/gcc/c-family/c-pretty-print.cc b/gcc/c-family/c-pretty-print.cc
index 7536a7c471f..679aa766fe0 100644
--- a/gcc/c-family/c-pretty-print.cc
+++ b/gcc/c-family/c-pretty-print.cc
@@ -33,6 +33,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "options.h"
 #include "internal-fn.h"
+#include "function.h"
+#include "basic-block.h"
+#include "gimple.h"
 
 /* The pretty-printer code is primarily designed to closely follow
(GNU) C and C++ grammars.  That is to be contrasted with spaghetti
@@ -1380,12 +1383,14 @@ c_pretty_printer::primary_expression (tree e)
  else
primary_expression (var);
}
-  else
+  else if (gimple_assign_single_p (SSA_NAME_DEF_STMT (e)))
{
  /* Print only the right side of the GIMPLE assignment.  */
  gimple *def_stmt = SSA_NAME_DEF_STMT (e);
  pp_gimple_stmt_1 (this, def_stmt, 0, TDF_RHS_ONLY);
}
+  else
+   expression (e);
   break;
 
 default:
diff --git a/gcc/testsuite/gcc.dg/Wfree-nonheap-object-7.c 
b/gcc/testsuite/gcc.dg/Wfree-nonheap-object-7.c
new file mode 100644
index 000..6116bfa4d8e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Wfree-nonheap-object-7.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Wfree-nonheap-object" } */
+
+struct local_caches *get_local_caches_lcs;
+void *calloc(long, long);
+void *realloc();
+
+struct local_caches {
+  int *t_mem_caches;
+};
+
+struct local_caches *get_local_caches() {
+  if (get_local_caches_lcs)
+return get_local_caches_lcs;
+  get_local_caches_lcs = calloc(1, 0);
+  return get_local_caches_lcs;
+}
+
+void libtrace_ocache_free() {
+  struct local_caches lcs = *get_local_caches(), __trans_tmp_1 = lcs;
+  {
+struct local_caches *lcs = &__trans_tmp_1;
+lcs->t_mem_caches += 10;
+__trans_tmp_1.t_mem_caches = realloc(__trans_tmp_1.t_mem_caches, 
sizeof(int)); // { dg-warning "called on pointer (?:(?!PHI).)*nonzero offset" }
+  }
+}
-- 
2.35.3


Re: [PATCH] [tree-optimization/110279] swap operands in reassoc to reduce cross backedge FMA

2023-08-31 Thread Richard Biener via Gcc-patches
On Wed, Aug 30, 2023 at 11:33 AM Di Zhao OS
 wrote:
>
> Hello Richard,
>
> > -Original Message-
> > From: Richard Biener 
> > Sent: Tuesday, August 29, 2023 7:11 PM
> > To: Di Zhao OS 
> > Cc: Jeff Law ; Martin Jambor ; gcc-
> > patc...@gcc.gnu.org
> > Subject: Re: [PATCH] [tree-optimization/110279] swap operands in reassoc to
> > reduce cross backedge FMA
> >
> > On Tue, Aug 29, 2023 at 10:59 AM Di Zhao OS
> >  wrote:
> > >
> > > Hi,
> > >
> > > > -Original Message-
> > > > From: Richard Biener 
> > > > Sent: Tuesday, August 29, 2023 4:09 PM
> > > > To: Di Zhao OS 
> > > > Cc: Jeff Law ; Martin Jambor ;
> > gcc-
> > > > patc...@gcc.gnu.org
> > > > Subject: Re: [PATCH] [tree-optimization/110279] swap operands in reassoc
> > to
> > > > reduce cross backedge FMA
> > > >
> > > > On Tue, Aug 29, 2023 at 9:49 AM Di Zhao OS
> > > >  wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > > -Original Message-
> > > > > > From: Richard Biener 
> > > > > > Sent: Tuesday, August 29, 2023 3:41 PM
> > > > > > To: Jeff Law ; Martin Jambor 
> > > > > > 
> > > > > > Cc: Di Zhao OS ; gcc-
> > patc...@gcc.gnu.org
> > > > > > Subject: Re: [PATCH] [tree-optimization/110279] swap operands in
> > reassoc
> > > > to
> > > > > > reduce cross backedge FMA
> > > > > >
> > > > > > On Tue, Aug 29, 2023 at 1:23 AM Jeff Law via Gcc-patches
> > > > > >  wrote:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On 8/28/23 02:17, Di Zhao OS via Gcc-patches wrote:
> > > > > > > > This patch tries to fix the 2% regression in 510.parest_r on
> > > > > > > > ampere1 in the tracker. (Previous discussion is here:
> > > > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624893.html)
> > > > > > > >
> > > > > > > > 1. Add testcases for the problem. For an op list in the form of
> > > > > > > > "acc = a * b + c * d + acc", currently reassociation doesn't
> > > > > > > > Swap the operands so that more FMAs can be generated.
> > > > > > > > After widening_mul the result looks like:
> > > > > > > >
> > > > > > > > _1 = .FMA(a, b, acc_0);
> > > > > > > > acc_1 = .FMA(c, d, _1);
> > > > > > > >
> > > > > > > > While previously (before the "Handle FMA friendly..." patch),
> > > > > > > > widening_mul's result was like:
> > > > > > > >
> > > > > > > > _1 = a * b;
> > > > > > > > _2 = .FMA (c, d, _1);
> > > > > > > > acc_1 = acc_0 + _2;
> > > > > >
> > > > > > How can we execute the multiply and the FMA in parallel?  They
> > > > > > depend on each other.  Or is it that the uarch can handle the
> > > > > > dependence on the add operand, but only when it comes from a
> > > > > > multiplication and not an FMA, in some better way?  (I'd doubt so much complexity)
> > > > > >
> > > > > > Can you explain in more detail how the uarch executes one vs. the
> > > > > > other case?
> > >
> > > Here's my understanding after consulting our hardware team. For the
> > > second case, the uarch of some out-of-order processors can calculate
> > > "_2" of several iterations at the same time, since there's no dependency
> > > among different iterations. While for the first case, the next iteration
> > > has to wait for the current iteration to finish, so that "acc_0"'s value
> > > is known.
> > > saw the patch "Deferring FMA transformations in tight loops" also
> > > changed corresponding files.
> >
> > That should be true for all kinds of operations, no?  Thus it means
> > reassoc should in general associate cross-iteration accumulation
> Yes I think both are true.
>
> > last?  Historically we associated those first because that's how the
> > vectorizer liked to see them, but I think that's no longer necessary.
> >
> > It should be achievable by properly biasing the operand during
> > rank computation (don't we already do that?).
>
> The issue is related with the following codes (handling cases with
> three operands left):
>   /* When there are three operands left, we want
>  to make sure the ones that get the double
>  binary op are chosen wisely.  */
>   int len = ops.length ();
>   if (len >= 3 && !has_fma)
> swap_ops_for_binary_stmt (ops, len - 3);
>
>   new_lhs = rewrite_expr_tree (stmt, rhs_code, 0, ops,
>powi_result != NULL
>|| negate_result,
>len != orig_len);
>
> Originally (before the "Handle FMA friendly..." patch), for the
> tiny example, the 2 multiplications would be placed first by
> swap_ops_for_binary_stmt and rewrite_expr_tree, according to
> ranks. While currently, to preserve more FMAs,
> swap_ops_for_binary_stmt won't be called, so the result would
> be MULT_EXPRs and PLUS_EXPRs interleaved with each other (which
> is mostly fine if these are not in such tight loops).
>
> What this patch tries to do can be summarized as: when cross
> backedge dependency is detected (and the uarch doesn't like it),
> better fall back to 

Re: [PATCH] MATCH: extend min_value/max_value match to vectors

2023-08-31 Thread Richard Biener via Gcc-patches
On Thu, Aug 31, 2023 at 12:27 AM Andrew Pinski via Gcc-patches
 wrote:
>
> This simple patch extends the min_value/max_value match to vector integer 
> types.
> Using uniform_integer_cst_p makes this easy.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> The testcases pr110915-*.c are the same as pr88784-*.c except using vector
> types instead.

OK.

> PR tree-optimization/110915
>
> gcc/ChangeLog:
>
> * match.pd (min_value, max_value): Extend to vector constants.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/pr110915-1.c: New test.
> * gcc.dg/pr110915-10.c: New test.
> * gcc.dg/pr110915-11.c: New test.
> * gcc.dg/pr110915-12.c: New test.
> * gcc.dg/pr110915-2.c: New test.
> * gcc.dg/pr110915-3.c: New test.
> * gcc.dg/pr110915-4.c: New test.
> * gcc.dg/pr110915-5.c: New test.
> * gcc.dg/pr110915-6.c: New test.
> * gcc.dg/pr110915-7.c: New test.
> * gcc.dg/pr110915-8.c: New test.
> * gcc.dg/pr110915-9.c: New test.
> ---
>  gcc/match.pd   | 24 ++
>  gcc/testsuite/gcc.dg/pr110915-1.c  | 31 
>  gcc/testsuite/gcc.dg/pr110915-10.c | 33 ++
>  gcc/testsuite/gcc.dg/pr110915-11.c | 31 
>  gcc/testsuite/gcc.dg/pr110915-12.c | 31 
>  gcc/testsuite/gcc.dg/pr110915-2.c  | 31 
>  gcc/testsuite/gcc.dg/pr110915-3.c  | 33 ++
>  gcc/testsuite/gcc.dg/pr110915-4.c  | 33 ++
>  gcc/testsuite/gcc.dg/pr110915-5.c  | 32 +
>  gcc/testsuite/gcc.dg/pr110915-6.c  | 32 +
>  gcc/testsuite/gcc.dg/pr110915-7.c  | 32 +
>  gcc/testsuite/gcc.dg/pr110915-8.c  | 32 +
>  gcc/testsuite/gcc.dg/pr110915-9.c  | 33 ++
>  13 files changed, 400 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr110915-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr110915-10.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr110915-11.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr110915-12.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr110915-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr110915-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr110915-4.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr110915-5.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr110915-6.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr110915-7.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr110915-8.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr110915-9.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 6a7edde5736..c01362ee359 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -2750,16 +2750,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>& (bitpos / BITS_PER_UNIT))); }
>
>  (match min_value
> - INTEGER_CST
> - (if ((INTEGRAL_TYPE_P (type)
> -   || POINTER_TYPE_P(type))
> -  && wi::eq_p (wi::to_wide (t), wi::min_value (type)
> + uniform_integer_cst_p
> + (with {
> +   tree int_cst = uniform_integer_cst_p (t);
> +   tree inner_type = TREE_TYPE (int_cst);
> +  }
> +  (if ((INTEGRAL_TYPE_P (inner_type)
> +|| POINTER_TYPE_P (inner_type))
> +   && wi::eq_p (wi::to_wide (int_cst), wi::min_value (inner_type))
>
>  (match max_value
> - INTEGER_CST
> - (if ((INTEGRAL_TYPE_P (type)
> -   || POINTER_TYPE_P(type))
> -  && wi::eq_p (wi::to_wide (t), wi::max_value (type)
> + uniform_integer_cst_p
> + (with {
> +   tree int_cst = uniform_integer_cst_p (t);
> +   tree itype = TREE_TYPE (int_cst);
> +  }
> + (if ((INTEGRAL_TYPE_P (itype)
> +   || POINTER_TYPE_P (itype))
> +  && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))
>
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
> x >  y  &&  x == XXX_MIN  -->  false . */
> diff --git a/gcc/testsuite/gcc.dg/pr110915-1.c 
> b/gcc/testsuite/gcc.dg/pr110915-1.c
> new file mode 100644
> index 000..2e1e871b9a0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr110915-1.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-ifcombine" } */
> +#define vector __attribute__((vector_size(sizeof(unsigned)*2)))
> +
> +#include 
> +
> +vector signed and1(vector unsigned x, vector unsigned y)
> +{
> +  /* (x > y) & (x != 0)  --> x > y */
> +  return (x > y) & (x != 0);
> +}
> +
> +vector signed and2(vector unsigned x, vector unsigned y)
> +{
> +  /* (x < y) & (x != UINT_MAX)  --> x < y */
> +  return (x < y) & (x != UINT_MAX);
> +}
> +
> +vector signed and3(vector signed x, vector signed y)
> +{
> +  /* (x > y) & (x != INT_MIN)  --> x > y */
> +  return (x > y) & (x != INT_MIN);
> +}
> +
> +vector signed and4(vector signed x, vector signed y)
> +{
> +  /* (x < y) & (x != INT_MAX)  --> x < y */
> +  return (x < y) & (x != 

Re: [RFC] gimple ssa: SCCP - A new PHI optimization pass

2023-08-31 Thread Richard Biener via Gcc-patches
On Thu, 31 Aug 2023, Filip Kastl wrote:

> > The most obvious places would be right after SSA construction and before 
> > RTL expansion.
> > Can you provide measurements for those positions?
> 
> The algorithm should only remove PHIs that break SSA form minimality. Since
> GCC's SSA construction already produces minimal SSA form, the algorithm isn't
> expected to remove any PHIs if run right after the construction. I even
> measured it and indeed -- no PHIs got removed (except for 502.gcc_r, where the
> algorithm managed to remove exactly 1 PHI, which is weird). 
> 
> I tried putting the pass before pass_expand. There aren't a lot of PHIs to
> remove at that point, but there still are some.

That's interesting.  Your placement at

  NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
  NEXT_PASS (pass_phiopt, true /* early_p */);
+ NEXT_PASS (pass_sccp);

and

   NEXT_PASS (pass_tsan);
   NEXT_PASS (pass_dse, true /* use DR analysis */);
   NEXT_PASS (pass_dce);
+  NEXT_PASS (pass_sccp);

isn't immediately after the "best" existing pass we have to
remove dead PHIs which is pass_cd_dce.  phiopt might leave
dead PHIs around and the second instance runs long after the
last CD-DCE.

So I wonder if your pass just detects unnecessary PHIs we'd have
removed by other means and what survives until RTL expansion is
what we should count?

Can you adjust your original early placement to right after
the cd-dce pass and for the late placement turn the dce pass
before it into cd-dce and re-do your measurements?
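
I.e. something like this for the late placement (untested):

   NEXT_PASS (pass_tsan);
   NEXT_PASS (pass_dse, true /* use DR analysis */);
-  NEXT_PASS (pass_dce);
+  NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
   NEXT_PASS (pass_sccp);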

> 500.perlbench_r
> Started with 43111
> Ended with 42942
> Removed PHI % .39201131961680313700
> 
> 502.gcc_r
> Started with 141392
> Ended with 140455
> Removed PHI % .66269661649881181400
> 
> 505.mcf_r
> Started with 482
> Ended with 478
> Removed PHI % .82987551867219917100
> 
> 523.xalancbmk_r
> Started with 136040
> Ended with 135629
> Removed PHI % .30211702440458688700
> 
> 531.deepsjeng_r
> Started with 2150
> Ended with 2148
> Removed PHI % .09302325581395348900
> 
> 541.leela_r
> Started with 4664
> Ended with 4650
> Removed PHI % .30017152658662092700
> 
> 557.xz_r
> Started with 43
> Ended with 43
> Removed PHI % 0
> 
> > Can the pass somehow be used as part of propagations like during value 
> > numbering?
> 
> I don't think that the pass could be used as a part of different optimizations
> since it works on the whole CFG (except for copy propagation as I noted in the
> RFC). I'm adding Honza to Cc. He'll have more insight into this.
> 
> > Could the new file be called gimple-ssa-sccp.cc or something similar?
> 
> Certainly. Though I'm not sure; wouldn't tree-ssa-sccp.cc be more
> appropriate?
> 
> I'm thinking about naming the pass 'scc-copy' and the file
> 'tree-ssa-scc-copy.cc'.
> 
> > Removing some PHIs is nice, but it would be also interesting to know what
> > are the effects on generated code size and/or performance.
> > And also if it has any effects on debug information coverage.
> 
> Regarding performance: I ran some benchmarks on a Zen3 machine with -O3 with
> and without the new pass. *I got ~2% speedup for 505.mcf_r and 541.leela_r.
> Here are the full results. What do you think? Should I run more benchmarks? Or
> benchmark multiple times? Or run the benchmarks on different machines?*
> 
> 500.perlbench_r
> Without SCCP: 244.151807s
> With SCCP: 242.448438s
> -0.7025695913124297%
> 
> 502.gcc_r
> Without SCCP: 211.029606s
> With SCCP: 211.614523s
> +0.27640683243653763%
> 
> 505.mcf_r
> Without SCCP: 298.782621s
> With SCCP: 291.671468s
> -2.438069465197046%
> 
> 523.xalancbmk_r
> Without SCCP: 189.940639s
> With SCCP: 189.876261s
> -0.03390523894928332%
> 
> 531.deepsjeng_r
> Without SCCP: 250.63648s
> With SCCP: 250.988624s
> +0.1403027732444051%
> 
> 541.leela_r
> Without SCCP: 346.066278s
> With SCCP: 339.692987s
> -1.8761915152519792%
> 
> Regarding size: The pass doesn't seem to significantly reduce or increase the
> size of the result binary. The differences were at most ~0.1%.
> 
> Regarding debug info coverage: I didn't notice any additional guality 
> testcases
> failing after I applied the patch. *Is there any other way how I should check
> debug info coverage?*
> 
> 
> Filip K
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] testsuite/vect: Make match patterns more accurate.

2023-08-31 Thread Richard Biener via Gcc-patches
On Thu, 31 Aug 2023, Robin Dapp wrote:

> Hi,
> 
> on some targets we fail to vectorize with the first type the vectorizer
> tries but succeed with the second.  This patch changes several regex
> patterns to reflect that behavior.
> 
> Before we would look for a single occurrence of e.g.
> "vect_recog_dot_prod_pattern" but would possible find two (one for each
> attempted mode).  The new pattern tries to match sequences where we
> first have a "vect_recog_dot_prod_pattern" and a "succeeded" afterwards
> while making sure there is no "failed" or "Re-trying" in between.
> 
> I realized we already only do scan-tree-dump instead of
> scan-tree-dump-times in some related testcases, probably for the same
> reason but I didn't touch them for now.
> 
> Testsuite unchanged on x86, aarch64 and Power10.

LGTM.

Thanks for discovering the required TCL regex magic.

Richard.

> Regards
>  Robin
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-reduc-dot-s16a.c: Adjust regex pattern.
>   * gcc.dg/vect/vect-reduc-dot-s8a.c: Ditto.
>   * gcc.dg/vect/vect-reduc-dot-s8b.c: Ditto.
>   * gcc.dg/vect/vect-reduc-dot-u16a.c: Ditto.
>   * gcc.dg/vect/vect-reduc-dot-u16b.c: Ditto.
>   * gcc.dg/vect/vect-reduc-dot-u8a.c: Ditto.
>   * gcc.dg/vect/vect-reduc-dot-u8b.c: Ditto.
>   * gcc.dg/vect/vect-reduc-pattern-1a.c: Ditto.
>   * gcc.dg/vect/vect-reduc-pattern-1b-big-array.c: Ditto.
>   * gcc.dg/vect/vect-reduc-pattern-1c-big-array.c: Ditto.
>   * gcc.dg/vect/vect-reduc-pattern-2a.c: Ditto.
>   * gcc.dg/vect/vect-reduc-pattern-2b-big-array.c: Ditto.
>   * gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c: Ditto.
> ---
>  gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s16a.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c  | 4 ++--
>  gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8b.c  | 4 ++--
>  gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u16a.c | 5 +++--
>  gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u16b.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u8a.c  | 2 +-
>  gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u8b.c  | 2 +-
>  gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-1a.c   | 2 +-
>  gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-1b-big-array.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-1c-big-array.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-2a.c   | 2 +-
>  gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-2b-big-array.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c| 4 ++--
>  13 files changed, 18 insertions(+), 17 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s16a.c 
> b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s16a.c
> index ffbc9706901..d826828e3d6 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s16a.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s16a.c
> @@ -51,7 +51,7 @@ main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 
> 1 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: 
> detected(?:(?!failed)(?!Re-trying).)*succeeded" 1 "vect" } } */
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target 
> vect_sdot_hi } } } */
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target 
> vect_widen_mult_hi_to_si } } } */
>  
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c 
> b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c
> index 05e343ad782..4e1e0b234f4 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c
> @@ -55,8 +55,8 @@ int main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 
> 1 "vect" } } */
> -/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: 
> detected" 1 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: 
> detected(?:(?!failed)(?!Re-trying).)*succeeded" 1 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: 
> detected(?:(?!failed)(?!Re-trying).)*succeeded" 1 "vect" } } */
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target 
> vect_sdot_qi } } } */
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target 
> { vect_widen_mult_qi_to_hi && vect_widen_sum_hi_to_si } } } } */
>  
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8b.c 
> b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8b.c
> index 82c648cc73c..cb88ad5b639 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8b.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8b.c
> @@ -53,8 +53,8 @@ int main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 
> 1 "vect" { xfail *-*-* } } } */
> -/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: 
> 

[PATCH] Fix gcc.dg/tree-ssa/forwprop-42.c

2023-08-31 Thread Richard Biener via Gcc-patches
The testcase requires hardware support for V2DImode vectors because
otherwise we do not rewrite inserts via BIT_FIELD_REF to
BIT_INSERT_EXPR.  There's no effective target for this so the
following makes the testcase x86 specific, requiring and enabling SSE2.

Pushed.

* gcc.dg/tree-ssa/forwprop-42.c: Move ...
* gcc.target/i386/pr111228.c: ... here.  Enable SSE2.
---
 .../tree-ssa/forwprop-42.c => gcc.target/i386/pr111228.c}  | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
 rename gcc/testsuite/{gcc.dg/tree-ssa/forwprop-42.c => 
gcc.target/i386/pr111228.c} (76%)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c 
b/gcc/testsuite/gcc.target/i386/pr111228.c
similarity index 76%
rename from gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c
rename to gcc/testsuite/gcc.target/i386/pr111228.c
index 257a05d3ec8..f0c3f9b77bf 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c
+++ b/gcc/testsuite/gcc.target/i386/pr111228.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O -fdump-tree-cddce1" } */
+/* { dg-additional-options "-msse2" { target sse2 } } */
 
 typedef __UINT64_TYPE__ v2di __attribute__((vector_size(16)));
 
@@ -14,4 +15,4 @@ void test (v2di *v)
   g = res;
 }
 
-/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR <\[^>\]*, { 0, 3 }>" 1 
"cddce1" } } */
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR <\[^>\]*, { 0, 3 }>" 1 
"cddce1" { target sse2 } } } */
-- 
2.35.3


Re: [PATCH 11/13] [APX EGPR] Handle legacy insns that only support GPR16 (3/5)

2023-08-31 Thread Richard Biener via Gcc-patches
On Thu, Aug 31, 2023 at 11:26 AM Richard Biener
 wrote:
>
> On Thu, Aug 31, 2023 at 10:25 AM Hongyu Wang via Gcc-patches
>  wrote:
> >
> > From: Kong Lingling 
> >
> Disable EGPR usage for the legacy insns below in opcode map2/3 that have
> a vex but no evex counterpart.
> >
> > insn list:
> > 1. phminposuw/vphminposuw
> > 2. ptest/vptest
> > 3. roundps/vroundps, roundpd/vroundpd,
> >roundss/vroundss, roundsd/vroundsd
> > 4. pcmpestri/vpcmpestri, pcmpestrm/vpcmpestrm
> > 5. pcmpistri/vpcmpistri, pcmpistrm/vpcmpistrm
>
> How are GPRs involved in the above?  Or did I misunderstand something?

Following up myself - for the memory operand alternatives I guess.  How about
simply disabling the memory alternatives when EGPR is active?  Wouldn't
that simplify the initial patchset a lot?  Re-enabling them when
deemed important could be done as a followup then?
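
As a sketch of what I mean (the attribute name is made up and this is
untested; i386 already computes "enabled" from the "isa" attribute, so
it would really have to be folded into that machinery):

(define_attr "apx_noegpr" "false,true" (const_string "false"))
(define_attr "enabled" ""
  (if_then_else (eq_attr "apx_noegpr" "true")
                (symbol_ref "!TARGET_APX_EGPR")
                (const_int 1)))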

Richard.

> > 6. aesimc/vaesimc, aeskeygenassist/vaeskeygenassist
> >
> > gcc/ChangeLog:
> >
> > * config/i386/i386-protos.h (x86_evex_reg_mentioned_p): New
> > prototype.
> > * config/i386/i386.cc (x86_evex_reg_mentioned_p): New
> > function.
> > * config/i386/i386.md (sse4_1_round2): Set attr gpr32 0
> > and constraint Bt/BM to all non-evex alternatives, adjust
> > alternative outputs if evex reg is mentioned.
> > * config/i386/sse.md (_ptest): Set attr gpr32 0
> > and constraint Bt/BM to all non-evex alternatives.
> > (ptesttf2): Likewise.
> (_round): Likewise.
> (sse4_1_round): Likewise.
> > (sse4_2_pcmpestri): Likewise.
> > (sse4_2_pcmpestrm): Likewise.
> > (sse4_2_pcmpestr_cconly): Likewise.
> > (sse4_2_pcmpistr): Likewise.
> > (sse4_2_pcmpistri): Likewise.
> > (sse4_2_pcmpistrm): Likewise.
> > (sse4_2_pcmpistr_cconly): Likewise.
> > (aesimc): Likewise.
> > (aeskeygenassist): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/apx-legacy-insn-check-norex2.c: Add intrinsic
> > tests.
> > ---
> >  gcc/config/i386/i386-protos.h |  1 +
> >  gcc/config/i386/i386.cc   | 13 +++
> >  gcc/config/i386/i386.md   |  3 +-
> >  gcc/config/i386/sse.md| 93 +--
> >  .../i386/apx-legacy-insn-check-norex2.c   | 55 ++-
> >  5 files changed, 132 insertions(+), 33 deletions(-)
> >
> > diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
> > index 78eb3e0f584..bbb219e3039 100644
> > --- a/gcc/config/i386/i386-protos.h
> > +++ b/gcc/config/i386/i386-protos.h
> > @@ -65,6 +65,7 @@ extern bool extended_reg_mentioned_p (rtx);
> >  extern bool x86_extended_QIreg_mentioned_p (rtx_insn *);
> >  extern bool x86_extended_reg_mentioned_p (rtx);
> >  extern bool x86_extended_rex2reg_mentioned_p (rtx);
> > +extern bool x86_evex_reg_mentioned_p (rtx [], int);
> >  extern bool x86_maybe_negate_const_int (rtx *, machine_mode);
> >  extern machine_mode ix86_cc_mode (enum rtx_code, rtx, rtx);
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index f5d642948bc..ec93c5bab97 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -22936,6 +22936,19 @@ x86_extended_rex2reg_mentioned_p (rtx insn)
> >return false;
> >  }
> >
> > +/* Return true when the rtx operands mention a register that must be
> > +   encoded using the evex prefix.  */
> > +bool
> > +x86_evex_reg_mentioned_p (rtx operands[], int nops)
> > +{
> > +  int i;
> > +  for (i = 0; i < nops; i++)
> > +if (EXT_REX_SSE_REG_P (operands[i])
> > +   || x86_extended_rex2reg_mentioned_p (operands[i]))
> > +  return true;
> > +  return false;
> > +}
> > +
> >  /* If profitable, negate (without causing overflow) integer constant
> > of mode MODE at location LOC.  Return true in this case.  */
> >  bool
> > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > index 83ad01b43c1..4c305e72389 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -21603,7 +21603,7 @@ (define_expand "significand2"
> >  (define_insn "sse4_1_round2"
> >[(set (match_operand:MODEFH 0 "register_operand" "=x,x,x,v,v")
> > (unspec:MODEFH
> > - [(match_operand:MODEFH 1 "nonimmediate_operand" "0,x,m,v,m")
> > + [(match_operand:MODEFH 1 "nonimmediate_operand" "0,x,Bt,v,m")
> >(match_operand:SI 2 "const_0_to_15_operand")]
> >   UNSPEC_ROUND))]
> >"TARGET_SSE4_1"
> > @@ -21616,6 +21616,7 @@ (define_insn "sse4_1_round2"
> >[(set_attr "type" "ssecvt")
> > (set_attr "prefix_extra" "1,1,1,*,*")
> > (set_attr "length_immediate" "1")
> > +   (set_attr "gpr32" "1,1,0,1,1")
> > (set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,evex,evex")
> > (set_attr "isa" "noavx512f,noavx512f,noavx512f,avx512f,avx512f")
> > (set_attr "avx_partial_xmm_update" "false,false,true,false,true")
> > diff 

Re: [PATCH 11/13] [APX EGPR] Handle legacy insns that only support GPR16 (3/5)

2023-08-31 Thread Richard Biener via Gcc-patches
On Thu, Aug 31, 2023 at 10:25 AM Hongyu Wang via Gcc-patches
 wrote:
>
> From: Kong Lingling 
>
> Disable EGPR usage for the legacy insns below in opcode map2/3 that have
> a vex but no evex counterpart.
>
> insn list:
> 1. phminposuw/vphminposuw
> 2. ptest/vptest
> 3. roundps/vroundps, roundpd/vroundpd,
>roundss/vroundss, roundsd/vroundsd
> 4. pcmpestri/vpcmpestri, pcmpestrm/vpcmpestrm
> 5. pcmpistri/vpcmpistri, pcmpistrm/vpcmpistrm

How are GPRs involved in the above?  Or did I misunderstand something?

> 6. aesimc/vaesimc, aeskeygenassist/vaeskeygenassist
>
> gcc/ChangeLog:
>
> * config/i386/i386-protos.h (x86_evex_reg_mentioned_p): New
> prototype.
> * config/i386/i386.cc (x86_evex_reg_mentioned_p): New
> function.
> * config/i386/i386.md (sse4_1_round2): Set attr gpr32 0
> and constraint Bt/BM to all non-evex alternatives, adjust
> alternative outputs if evex reg is mentioned.
> * config/i386/sse.md (_ptest): Set attr gpr32 0
> and constraint Bt/BM to all non-evex alternatives.
> (ptesttf2): Likewise.
> (_round): Likewise.
> (sse4_1_round): Likewise.
> (sse4_2_pcmpestri): Likewise.
> (sse4_2_pcmpestrm): Likewise.
> (sse4_2_pcmpestr_cconly): Likewise.
> (sse4_2_pcmpistr): Likewise.
> (sse4_2_pcmpistri): Likewise.
> (sse4_2_pcmpistrm): Likewise.
> (sse4_2_pcmpistr_cconly): Likewise.
> (aesimc): Likewise.
> (aeskeygenassist): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/apx-legacy-insn-check-norex2.c: Add intrinsic
> tests.
> ---
>  gcc/config/i386/i386-protos.h |  1 +
>  gcc/config/i386/i386.cc   | 13 +++
>  gcc/config/i386/i386.md   |  3 +-
>  gcc/config/i386/sse.md| 93 +--
>  .../i386/apx-legacy-insn-check-norex2.c   | 55 ++-
>  5 files changed, 132 insertions(+), 33 deletions(-)
>
> diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
> index 78eb3e0f584..bbb219e3039 100644
> --- a/gcc/config/i386/i386-protos.h
> +++ b/gcc/config/i386/i386-protos.h
> @@ -65,6 +65,7 @@ extern bool extended_reg_mentioned_p (rtx);
>  extern bool x86_extended_QIreg_mentioned_p (rtx_insn *);
>  extern bool x86_extended_reg_mentioned_p (rtx);
>  extern bool x86_extended_rex2reg_mentioned_p (rtx);
> +extern bool x86_evex_reg_mentioned_p (rtx [], int);
>  extern bool x86_maybe_negate_const_int (rtx *, machine_mode);
>  extern machine_mode ix86_cc_mode (enum rtx_code, rtx, rtx);
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index f5d642948bc..ec93c5bab97 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -22936,6 +22936,19 @@ x86_extended_rex2reg_mentioned_p (rtx insn)
>return false;
>  }
>
> +/* Return true when the rtx operands mention a register that must be
> +   encoded using the evex prefix.  */
> +bool
> +x86_evex_reg_mentioned_p (rtx operands[], int nops)
> +{
> +  int i;
> +  for (i = 0; i < nops; i++)
> +if (EXT_REX_SSE_REG_P (operands[i])
> +   || x86_extended_rex2reg_mentioned_p (operands[i]))
> +  return true;
> +  return false;
> +}
> +
>  /* If profitable, negate (without causing overflow) integer constant
> of mode MODE at location LOC.  Return true in this case.  */
>  bool
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 83ad01b43c1..4c305e72389 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -21603,7 +21603,7 @@ (define_expand "significand2"
>  (define_insn "sse4_1_round2"
>[(set (match_operand:MODEFH 0 "register_operand" "=x,x,x,v,v")
> (unspec:MODEFH
> - [(match_operand:MODEFH 1 "nonimmediate_operand" "0,x,m,v,m")
> + [(match_operand:MODEFH 1 "nonimmediate_operand" "0,x,Bt,v,m")
>(match_operand:SI 2 "const_0_to_15_operand")]
>   UNSPEC_ROUND))]
>"TARGET_SSE4_1"
> @@ -21616,6 +21616,7 @@ (define_insn "sse4_1_round2"
>[(set_attr "type" "ssecvt")
> (set_attr "prefix_extra" "1,1,1,*,*")
> (set_attr "length_immediate" "1")
> +   (set_attr "gpr32" "1,1,0,1,1")
> (set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,evex,evex")
> (set_attr "isa" "noavx512f,noavx512f,noavx512f,avx512f,avx512f")
> (set_attr "avx_partial_xmm_update" "false,false,true,false,true")
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 05963de9219..456713b991a 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -22617,11 +22617,12 @@ (define_insn "avx2_pblendd"
>
>  (define_insn "sse4_1_phminposuw"
>[(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,x")
> -   (unspec:V8HI [(match_operand:V8HI 1 "vector_operand" "YrBm,*xBm,xm")]
> +   (unspec:V8HI [(match_operand:V8HI 1 "vector_operand" "YrBT,*xBT,xBt")]
>  UNSPEC_PHMINPOSUW))]
>"TARGET_SSE4_1"
>"%vphminposuw\t{%1, 

Re: [PATCH 00/13] [RFC] Support Intel APX EGPR

2023-08-31 Thread Richard Biener via Gcc-patches
On Thu, Aug 31, 2023 at 10:22 AM Hongyu Wang via Gcc-patches
 wrote:
>
> Intel Advanced Performance Extension (APX) has been released in [1].
> It contains several extensions such as extended 16 general purpose registers
> (EGPRs), push2/pop2, new data destination (NDD), conditional compare
> (CCMP/CTEST) combined with suppress flags write version of common instructions
> (NF). This RFC focused on EGPR implementation in GCC.
>
> APX introduces a REX2 prefix to help represent EGPR for several legacy/SSE
> instructions. For the remaining ones, it promotes some of them using evex
> prefix for EGPR.  The main issue in APX is that not all legacy/sse/vex
> instructions support EGPR. For example, instructions in legacy opcode map2/3
> cannot use REX2 prefix since there is only 1 bit in REX2 to indicate map0/1
> instructions, e.g., pinsrd. Also, for most vector extensions, EGPR is 
> supported
> in their evex forms but not vex forms, which means the mnemonics with no evex
> forms also cannot use EGPR, e.g., vphaddw.
>
> Such a limitation brings some challenges with the current GCC infrastructure.
> Generally, we use constraints to guide register allocation behavior. For
> register operand, it is easy to add a new constraint to certain insn and limit
> it to legacy or REX registers. But for memory operand, if we only use
> constraint to limit base/index register choice, reload has no backoff when
> process_address allocates any EGPRs to base/index reg, and then any
> post-reload pass would get an ICE from the constraint.

How realistic would it be to simply disable instructions not supporting EGPR?
I hope there are alternatives that would be available in actual APX
implementations?
Otherwise this design limitation doesn't shed a very positive light on
the designers ...

How sure are we that actual implementations with APX will appear (just
remembering SSE5...)?
I'm quite sure it's not going to be 2024, so would it be realistic to
postpone APX work to next stage1, targeting GCC 15 only?

> Here is what we did to address the issue:
>
> Middle-end:
> -   Add rtx_insn parameter to base_reg_class, reuse the
> MODE_CODE_BASE_REG_CLASS macro with rtx_insn parameter.
> -   Add index_reg_class like base_reg_class, which calls a new 
> INSN_INDEX_REG_CLASS
> macro with rtx_insn parameter.
> -   In process_address_1, add rtx_insn parameter to call sites of
> base_reg_class, replace usage of INDEX_REG_CLASS with index_reg_class
> taking an rtx_insn parameter.
>
> Back-end:
> -   Extend GENERAL_REG_CLASS, INDEX_REG_CLASS and their supersets with
> corresponding regno checks for EGPRs.
> -   Add GENERAL_GPR16/INDEX_GPR16 class for old 16 GPRs.
> -   Whole component is controlled under -mapxf/TARGET_APX_EGPR. If it is
> not enabled, clear r16-r31 in accessible_reg_set.
> -   New register_constraint “h” and memory_constraint “Bt” that disallows
> EGPRs in operand.
> -   New asm_gpr32 flag option to enable/disable gpr32 for inline asm,
>   disabled by default.
> -   If asm_gpr32 is disabled, replace constraints “r” to “h”, and
> “m/memory” to “Bt”.
> -   Extra insn attribute gpr32, value 0 indicates the alternative cannot
> use EGPRs.
> -   Add target functions for base_reg_class and index_reg_class that call a
> helper function to verify if insn can use EGPR in its memory_operand.
> -   In the helper function, the verify process works as follows (a rough
> sketch appears after this list):
> 1. Returns true if APX_EGPR disabled or insn is null.
> 2. If the insn is inline asm, returns asm_gpr32 flag.
> 3. Returns false for unrecognizable insn.
> 4. Save recog_data and which_alternative, extract the insn, and restore 
> them
> before return.
> 5. Loop through all enabled alternatives; if one of the enabled
> alternatives has attr_gpr32 0, return false, otherwise return true.
> -   For insn alternatives that cannot use gpr32 in register_operand, use h
> constraint instead of r.
> -   For insn alternatives that cannot use gpr32 in memory operand, use Bt
> constraint instead of m, and set corresponding attr_gpr32 to 0.
> -   Split output template with %v if the sse version of mnemonic cannot 
> use
> gpr32.
> -   For insn alternatives that cannot use gpr32 in memory operand, 
> classify
> the isa attribute and split alternatives to noavx, avx_noavx512f etc., so
> the helper function can properly loop through the available enabled mask.
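>
> A rough sketch of that helper follows (the function and flag names
> here are made up and the code is untested; the real patch may differ):
>
> static bool
> ix86_insn_mem_ok_for_egpr_p (rtx_insn *insn)
> {
>   /* 1. EGPRs are fine if the feature is off or there is no insn.  */
>   if (!TARGET_APX_EGPR || insn == NULL)
>     return true;
>   /* 2. Inline asm is governed by the (assumed) asm_gpr32 flag.  */
>   if (asm_noperands (PATTERN (insn)) >= 0)
>     return ix86_asm_gpr32;
>   /* 3. Unrecognizable insns cannot use EGPRs.  */
>   if (recog_memoized (insn) < 0)
>     return false;
>   /* 4. Save recognition state, extract, restore before returning.  */
>   recog_data_d saved_data = recog_data;
>   int saved_alt = which_alternative;
>   extract_insn_cached (insn);
>   /* 5. Any enabled alternative with gpr32 == 0 disallows EGPRs.  */
>   bool ok = true;
>   alternative_mask enabled = get_enabled_alternatives (insn);
>   for (int i = 0; i < recog_data.n_alternatives; i++)
>     if (enabled & ALTERNATIVE_BIT (i))
>       {
>         which_alternative = i;
>         if (!get_attr_gpr32 (insn))
>           {
>             ok = false;
>             break;
>           }
>       }
>   recog_data = saved_data;
>   which_alternative = saved_alt;
>   return ok;
> }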
>
> Specifically for inline asm, we currently just map “r/m/memory” constraints as
> an example. Eventually we will support the entire mapping of all common
> constraints if the mapping method is accepted.
>
> Also, for vex instructions, currently we assume EGPR is supported if they
> have an evex counterpart, since any APX enabled machine will have AVX10
> support for all the evex encodings. We just disabled those mnemonics that
> don't support EGPR.
> So EGPR will be allowed under -mavx2 -mapxf for many vex mnemonics.
>
> We haven’t disabled EGPR for 

Re: [PATCH] Adjust costing of emulated vectorized gather/scatter

2023-08-31 Thread Richard Biener via Gcc-patches
On Thu, Aug 31, 2023 at 10:06 AM Hongtao Liu  wrote:
>
> On Wed, Aug 30, 2023 at 8:18 PM Richard Biener via Gcc-patches
>  wrote:
> >
> > On Wed, Aug 30, 2023 at 12:38 PM liuhongt via Gcc-patches
> >  wrote:
> > >
> > > r14-332-g24905a4bd1375c adjusts costing of emulated vectorized
> > > gather/scatter.
> > > 
> > > commit 24905a4bd1375ccd99c02510b9f9529015a48315
> > > Author: Richard Biener 
> > > Date:   Wed Jan 18 11:04:49 2023 +0100
> > >
> > > Adjust costing of emulated vectorized gather/scatter
> > >
> > > Emulated gather/scatter behave similar to strided elementwise
> > > accesses in that they need to decompose the offset vector
> > > and construct or decompose the data vector so handle them
> > > the same way, pessimizing the cases with may elements.
> > > 
> > >
> > > But for emulated gather/scatter, the offset vector load/vec_construct has
> > > already been counted, and in real cases it's probably eliminated by
> > > the later optimizer.
> > > Also, after decomposing, element loads from contiguous memory could be
> > > less bounded compared to normal elementwise loads.
> > > The patch decreases the cost a little bit.
> > >
> > > This will enable gather emulation for below loop with VF=8(ymm)
> > >
> > > double
> > > foo (double* a, double* b, unsigned int* c, int n)
> > > {
> > >   double sum = 0;
> > >   for (int i = 0; i != n; i++)
> > > sum += a[i] * b[c[i]];
> > >   return sum;
> > > }
> > >
> > > For the above loop, microbenchmark results on ICX show that
> > > emulated gather with VF=8 is 30% faster than emulated gather with
> > > VF=4 when the tripcount is big enough.
> > > It brings back ~4% for 510.parest, still a ~5% regression compared to
> > > the gather instruction due to being throughput bound.
> > >
> > > For -march=znver1/2/3/4, the change doesn't enable VF=8(ymm) for the
> > > loop, VF remains 4(xmm) as before(guess related to their own cost
> > > model).
> > >
> > >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > Ok for trunk?
> > >
> > > gcc/ChangeLog:
> > >
> > > PR target/111064
> > > * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
> > > Decrease cost a little bit for vec_to_scalar(offset vector) in
> > > emulated gather.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/i386/pr111064.c: New test.
> > > ---
> > >  gcc/config/i386/i386.cc  | 11 ++-
> > >  gcc/testsuite/gcc.target/i386/pr111064.c | 12 
> > >  2 files changed, 22 insertions(+), 1 deletion(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr111064.c
> > >
> > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > index 1bc3f11ff07..337e0f1bfbb 100644
> > > --- a/gcc/config/i386/i386.cc
> > > +++ b/gcc/config/i386/i386.cc
> > > @@ -24079,7 +24079,16 @@ ix86_vector_costs::add_stmt_cost (int count, 
> > > vect_cost_for_stmt kind,
> > >   || STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == 
> > > VMAT_GATHER_SCATTER))
> > >  {
> > >stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, 
> > > misalign);
> > > -  stmt_cost *= (TYPE_VECTOR_SUBPARTS (vectype) + 1);
> > > +  /* For emulated gather/scatter, offset vector load/vec_construct 
> > > has
> > > +already been counted and in real case, it's probably eliminated 
> > > by
> > > +later optimizer.
> > > +Also after decomposing, element loads from continuous memory
> > > +could be less bounded compared to normal elementwise load.  */
> > > +  if (kind == vec_to_scalar
> > > + && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == 
> > > VMAT_GATHER_SCATTER)
> > > +   stmt_cost *= TYPE_VECTOR_SUBPARTS (vectype);
> >
> > For gather we cost N vector extracts (from the offset vector), N scalar 
> > loads
> > (the actual data loads) and one vec_construct.
> >
> > For scatter we cost N vector extracts (from the offset vector),
> > N vector extracts (from the data vector) and N scalar stores.
> >
> > It was intended to penalize the extracts the same way as vector construction.
> >
> > Your change

Re: [r14-3571 Regression] FAIL: gcc.target/i386/pr52252-atom.c scan-assembler palignr on Linux/x86_64

2023-08-31 Thread Richard Biener via Gcc-patches
On Thu, 31 Aug 2023, Jiang, Haochen wrote:

> On Linux/x86_64,
> 
> caa7a99a052929d5970677c5b639e1fa5166e334 is the first bad commit
> commit caa7a99a052929d5970677c5b639e1fa5166e334
> Author: Richard Biener 
> Date:   Wed Aug 30 11:57:47 2023 +0200
> 
> tree-optimization/111228 - combine two VEC_PERM_EXPRs
> 
> caused
> 
> FAIL: gcc.target/i386/pr52252-atom.c scan-assembler palignr
> 
> with GCC configured with
> 
> ../../gcc/configure 
> --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-3571/usr 
> --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
> --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
> --enable-libmpx x86_64-linux --disable-bootstrap
> 
> To reproduce:
> 
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="i386.exp=gcc.target/i386/pr52252-atom.c 
> --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="i386.exp=gcc.target/i386/pr52252-atom.c 
> --target_board='unix{-m64\ -march=cascadelake}'"
> 
> (For question about this report, contact me at haochen dot jiang at 
> intel.com.)
> (If you meet problems related to cascadelake, disabling AVX512F on the
> command line might help.)
> (However, please make sure that there are no potential problems with AVX512.)

We are eliding 6 permutations on the testcase with the change.  The
testcase in question, gcc.target/i386/pr52252-atom.c, doesn't expect
to have AVX512 enabled.

I wonder why -mtune=slm doesn't enable -mprefer-vector-width=128;
when I add that, the testcase passes again.


I'm pushing the following, tested on x86_64-unknown-linux-gnu.

Richard

From 2d24c1715a096cd069e1627864cdcbba908c807c Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Thu, 31 Aug 2023 09:06:24 +0200
Subject: [PATCH] Adjust gcc.target/i386/pr52252-{atom,core}.c
To: gcc-patches@gcc.gnu.org

The following adjusts the testcases to force 128-bit vectorization
to make them more robust when, for example, adding -march=cascadelake

* gcc.target/i386/pr52252-atom.c: Add -mprefer-vector-width=128.
* gcc.target/i386/pr52252-core.c: Likewise.
---
 gcc/testsuite/gcc.target/i386/pr52252-atom.c | 2 +-
 gcc/testsuite/gcc.target/i386/pr52252-core.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr52252-atom.c 
b/gcc/testsuite/gcc.target/i386/pr52252-atom.c
index ee604f2189a..11f94411575 100644
--- a/gcc/testsuite/gcc.target/i386/pr52252-atom.c
+++ b/gcc/testsuite/gcc.target/i386/pr52252-atom.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -mssse3 -mtune=slm" } */
+/* { dg-options "-O2 -ftree-vectorize -mssse3 -mtune=slm 
-mprefer-vector-width=128" } */
 #define byte unsigned char
 
 void
diff --git a/gcc/testsuite/gcc.target/i386/pr52252-core.c 
b/gcc/testsuite/gcc.target/i386/pr52252-core.c
index 65d62cfa365..897026b0997 100644
--- a/gcc/testsuite/gcc.target/i386/pr52252-core.c
+++ b/gcc/testsuite/gcc.target/i386/pr52252-core.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -mssse3 -mtune=corei7" } */
+/* { dg-options "-O2 -ftree-vectorize -mssse3 -mtune=corei7 
-mprefer-vector-width=128" } */
 #define byte unsigned char
 
 void
-- 
2.35.3



Re: [PATCH 6/8] vect: Add vector_mode paramater to simd_clone_usable

2023-08-31 Thread Richard Biener via Gcc-patches
On Wed, Aug 30, 2023 at 5:02 PM Andre Vieira (lists)
 wrote:
>
>
>
> On 30/08/2023 14:01, Richard Biener wrote:
> > On Wed, Aug 30, 2023 at 11:15 AM Andre Vieira (lists) via Gcc-patches
> >  wrote:
> >>
> >> This patch adds a machine_mode parameter to the TARGET_SIMD_CLONE_USABLE
> >> hook to enable rejecting SVE modes when the target architecture does not
> >> support SVE.
> >
> > How does the graph node of the SIMD clone lack this information?  That is, 
> > it
> > should have information on the types (and thus modes) for all formal 
> > arguments
> > and return values already, no?  At least the target would know how to
> > instantiate
> > it if it's not readily available at the point of use.
> >
>
> Yes it does, but that's the modes the simd clone itself uses; it does
> not know what vector_mode we are currently vectorizing for. Which is
> exactly why we need the vinfo's vector_mode to make sure the simd clone
> and its types are compatible with the vector mode.
>
> In practice, this makes sure that SVE simd clones are only used in loops
> being vectorized for SVE modes. Having said that... I just realized that
> the simdlen check already takes care of that currently...
>
> by simdlen check I mean the one that writes off simdclones that match:
>  if (!constant_multiple_p (vf, n->simdclone->simdlen, &num_calls)
>
> However, when using -msve-vector-bits this will become an issue, as the
> VF will be constant and we will match NEON simdclones.  This requires
> some further attention though given that we now also reject the use of
> SVE simdclones when using -msve-vector-bits, and I'm not entirely sure
> we should...

Hmm, but vectorizable_simdclone should check for compatible types here
and if they are compatible why should we reject them?  Are -msve-vector-bits
"SVE" modes different from "NEON" modes?  I suppose not, because otherwise
the type compatibility check would say incompatible.

> I'm going on holidays for 2 weeks now though, so I'll have a look at
> that scenario when I get back. Same with other feedback, didn't expect
> feedback this quickly ;) Thank you!!
>
> Kind regards,
> Andre
>


Re: RFC: Introduce -fhardened to enable security-related flags

2023-08-30 Thread Richard Biener via Gcc-patches
On Wed, Aug 30, 2023 at 12:51 PM Jakub Jelinek via Gcc-patches
 wrote:
>
> On Tue, Aug 29, 2023 at 03:42:27PM -0400, Marek Polacek via Gcc-patches wrote:
> > +   if (UNLIKELY (flag_hardened)
> > +   && (opt->code == OPT_D || opt->code == OPT_U))
> > + {
> > +   if (!fortify_seen_p)
> > + fortify_seen_p = !strncmp (opt->arg, "_FORTIFY_SOURCE", 15);
>
> Perhaps this should check that the char after it is either '\0' or '=', we
> shouldn't care if user defines or undefines _FORTIFY_SOURCE_WHATEVER macro.
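A minimal sketch of the suggested check (hand-written for illustration;
is_fortify_macro is a made-up helper name, not code from the patch):

#include <string.h>

/* Accept "_FORTIFY_SOURCE" only when followed by '\0' or '=', so that
   -D_FORTIFY_SOURCE_WHATEVER is not treated as a fortify setting.  */
static int
is_fortify_macro (const char *arg)
{
  if (strncmp (arg, "_FORTIFY_SOURCE", 15) != 0)
    return 0;
  return arg[15] == '\0' || arg[15] == '=';
}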
>
> > +   if (!cxx_assert_seen_p)
> > + cxx_assert_seen_p = !strcmp (opt->arg, "_GLIBCXX_ASSERTIONS");
>
> Like we don't care in this case about -D_GLIBCXX_ASSERTIONS42
>
> > + }
> > + }
> > +
> > +  if (flag_hardened)
> > + {
> > +   if (!fortify_seen_p && optimize > 0)
> > + {
> > +   if (TARGET_GLIBC_MAJOR == 2 && TARGET_GLIBC_MINOR >= 35)
> > + cpp_define (parse_in, "_FORTIFY_SOURCE=3");
> > +   else
> > + cpp_define (parse_in, "_FORTIFY_SOURCE=2");
>
> I wonder if it wouldn't be better to enable _FORTIFY_SOURCE=2 by default for
> -fhardened only for targets which actually have such a support in the C
> library.  There is some poor man's _FORTIFY_SOURCE support in libssp,
> but e.g. one has to link with -lssp in that case and compile with
> -isystem `gcc -print-include-filename=include`/ssp .
> For glibc that is >= 2.3.4, https://maskray.me/blog/2022-11-06-fortify-source
> mentions NetBSD support since 2006, newlib since 2017, some Darwin libc,
> bionic (but seems they have only some clang support and dropped GCC
> support) and some third party reimplementation of libssp.
> Or do we just enable it and hope that either it works well or isn't
> supported at all quietly?  E.g. it would certainly break the ssp case
> where -isystem finds ssp headers but -lssp isn't linked in.
>
> > @@ -4976,6 +4993,22 @@ process_command (unsigned int decoded_options_count,
> >  #endif
> >  }
> >
> > +  /* TODO: check if -static -pie works and maybe use it.  */
> > +  if (flag_hardened && !any_link_options_p && !static_p)
> > +{
> > +  save_switch ("-pie", 0, NULL, /*validated=*/true, /*known=*/false);
> > +  /* TODO: check if BIND_NOW/RELRO is supported.  */
> > +  if (true)
> > + {
> > +   /* These are passed straight down to collect2 so we have to break
> > +  it up like this.  */
> > +   add_infile ("-z", "*");
> > +   add_infile ("now", "*");
> > +   add_infile ("-z", "*");
> > +   add_infile ("relro", "*");
>
> As the TODO comment says, to do that we need to check at configure time that
> linker supports -z now and -z relro options.
>
> > @@ -1117,9 +1121,12 @@ finish_options (struct gcc_options *opts, struct 
> > gcc_options *opts_set,
> >  }
> >
> >/* We initialize opts->x_flag_stack_protect to -1 so that targets
> > - can set a default value.  */
> > + can set a default value.  With --enable-default-ssp or -fhardened
> > + the default is -fstack-protector-strong.  */
> >if (opts->x_flag_stack_protect == -1)
> > -opts->x_flag_stack_protect = DEFAULT_FLAG_SSP;
> > +opts->x_flag_stack_protect = (opts->x_flag_hardened
> > +   ? SPCT_FLAG_STRONG
> > +   : DEFAULT_FLAG_SSP);
>
> This needs to be careful, -fstack-protector isn't supported on all targets
> (e.g. ia64) and we don't want toplev.cc warning:
>   /* Targets must be able to place spill slots at lower addresses.  If the
>  target already uses a soft frame pointer, the transition is trivial.  */
>   if (!FRAME_GROWS_DOWNWARD && flag_stack_protect)
> {
>   warning_at (UNKNOWN_LOCATION, 0,
>   "%<-fstack-protector%> not supported for this target");
>   flag_stack_protect = 0;
> }
> to be emitted whenever using -fhardened, it should not be enabled there
> silently (for ia64 Fedora/RHEL gcc actually had a short patch to make it
> work, turn the target into FRAME_GROWS_DOWNWARD one if -fstack-protect* was
> enabled and otherwise keep it !FRAME_GROWS_DOWNWARD).

I'll note that selectively enabling parts of -fhardened can also give a
false sense of safety when, under the hood, we ignore half of the option
for one reason or another ...

How does -fhardened reflect into -[gf]record-gcc-switches?  Is it at
least possible to verify the actually enabled bits?

Richard.

> Jakub
>


Re: [PATCH7/8] vect: Add TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM

2023-08-30 Thread Richard Biener via Gcc-patches
On Wed, 30 Aug 2023, Andre Vieira (lists) wrote:

> This patch adds a new target hook to enable us to adapt the types of return
> and parameters of simd clones.  We use this in two ways, the first one is to
> make sure we can create valid SVE types, including the SVE type attribute,
> when creating a SVE simd clone, even when the target options do not support
> SVE.  We are following the same behaviour seen with x86 that creates simd
> clones according to the ABI rules when no simdlen is provided, even if that
> simdlen is not supported by the current target options.  Note that this
> doesn't mean the simd clone will be used in auto-vectorization.

You are not documenting the bool parameter of the new hook.

What's wrong with doing the adjustment in TARGET_SIMD_CLONE_ADJUST?

> gcc/ChangeLog:
> 
>   (TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM): Define.
>   * doc/tm.texi (TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM): Document.
>   * doc/tm.texi.in (TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM): New.
>   * omp-simd-clone.cc (simd_adjust_return_type): Call new hook.
>   (simd_clone_adjust_argument_types): Likewise.
>   * target.def (adjust_ret_or_param): New hook.
>   * targhooks.cc (default_simd_clone_adjust_ret_or_param): New.
>   * targhooks.h (default_simd_clone_adjust_ret_or_param): New.
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH 6/8] vect: Add vector_mode paramater to simd_clone_usable

2023-08-30 Thread Richard Biener via Gcc-patches
On Wed, Aug 30, 2023 at 11:15 AM Andre Vieira (lists) via Gcc-patches
 wrote:
>
> This patch adds a machine_mode parameter to the TARGET_SIMD_CLONE_USABLE
> hook to enable rejecting SVE modes when the target architecture does not
> support SVE.

How does the graph node of the SIMD clone lack this information?  That is, it
should have information on the types (and thus modes) for all formal arguments
and return values already, no?  At least the target would know how to
instantiate
it if it's not readily available at the point of use.

> gcc/ChangeLog:
>
> * config/aarch64/aarch64.cc (aarch64_simd_clone_usable): Add mode
> parameter and use it to reject SVE modes when the target architecture does
> not support SVE.
> * config/gcn/gcn.cc (gcn_simd_clone_usable): Add unused mode 
> parameter.
> * config/i386/i386.cc (ix86_simd_clone_usable): Likewise.
> * doc/tm.texi (TARGET_SIMD_CLONE_USABLE): Document new parameter.
> * target.def (usable): Add new parameter.
> * tree-vect-stmts.cc (vectorizable_simd_clone_call): Pass vector mode
> to TARGET_SIMD_CLONE_CALL hook.


Re: [PATCH 4/8] vect: don't allow fully masked loops with non-masked simd clones [PR 110485]

2023-08-30 Thread Richard Biener via Gcc-patches
On Wed, 30 Aug 2023, Andre Vieira (lists) wrote:

> When analyzing a loop and choosing a simdclone to use it is possible to choose
> a simdclone that cannot be used 'inbranch' for a loop that can use partial
> vectors.  This may lead to the vectorizer deciding to use partial vectors
> which are not supported for notinbranch simd clones. This patch fixes that by
> disabling the use of partial vectors once a notinbranch simd clone has been
> selected.

OK.

> gcc/ChangeLog:
> 
>   PR tree-optimization/110485
>   * tree-vect-stmts.cc (vectorizable_simd_clone_call): Disable partial
>   vectors usage if a notinbranch simdclone has been selected.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/gomp/pr110485.c: New test.
> 
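For context on the notinbranch restriction, a hand-written sketch (not
the PR's testcase): a notinbranch clone takes no mask argument, so a
loop that calls it cannot be executed under a partial-vector loop mask.

#pragma omp declare simd notinbranch
int f (int x);

void
g (int *restrict out, int *restrict in, int n)
{
  for (int i = 0; i < n; i++)
    /* Unconditional call: only a notinbranch clone applies, so the
       vectorized loop must not use partial vectors.  */
    out[i] = f (in[i]);
}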

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [Patch 3/8] vect: Fix vect_get_smallest_scalar_type for simd clones

2023-08-30 Thread Richard Biener via Gcc-patches
On Wed, 30 Aug 2023, Andre Vieira (lists) wrote:

> The vect_get_smallest_scalar_type helper function was using any argument to a
> simd clone call when trying to determine the smallest scalar type that would
> be vectorized.  This included the function pointer type in a MASK_CALL for
> instance, and would result in the wrong type being selected.  Instead this
> patch special cases simd_clone_call's and uses only scalar types of the
> original function that get transformed into vector types.

Looks sensible.

+bool
+simd_clone_call_p (gimple *stmt, cgraph_node **out_node)

you could return the cgraph_node * or NULL here.  Are you going to
use the function elsewhere?  Otherwise put it in the same TU as
the only use please and avoid exporting it.

Richard.

> gcc/ChangeLog:
> 
>   * tree-vect-data-refs.cc (vect_get_smallest_scalar_type): Special
>   case simd clone calls and only use types that are mapped to vectors.
>   * tree-vect-stmts.cc (simd_clone_call_p): New helper function.
>   * tree-vectorizer.h (simd_clone_call_p): Declare new function.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-simd-clone-16f.c: Remove unnecessary differentiation
>   between targets with different pointer sizes.
>   * gcc.dg/vect/vect-simd-clone-17f.c: Likewise.
>   * gcc.dg/vect/vect-simd-clone-18f.c: Likewise.
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [Patch 2/8] parloops: Allow poly nit and bound

2023-08-30 Thread Richard Biener via Gcc-patches
On Wed, 30 Aug 2023, Andre Vieira (lists) wrote:

> Teach parloops how to handle a poly nit and bound ahead of the changes to
> enable non-constant simdlen.

Can you use poly_int_tree_p to combine INTEGER_CST || POLY_INT_CST please?

OK with that change.

> gcc/ChangeLog:
> 
>   * tree-parloops.cc (try_to_transform_to_exit_first_loop_alt): Accept
>   poly NIT and ALT_BOUND.
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH 1/8] parloops: Copy target and optimizations when creating a function clone

2023-08-30 Thread Richard Biener via Gcc-patches
On Wed, 30 Aug 2023, Andre Vieira (lists) wrote:

> 
> SVE simd clones need to be compiled with an SVE target enabled or the
> argument types will not be created properly. To achieve this we need to copy
> DECL_FUNCTION_SPECIFIC_TARGET from the original function declaration to the
> clones.  I decided it was probably also a good idea to copy
> DECL_FUNCTION_SPECIFIC_OPTIMIZATION in case the original function is meant to
> be compiled with specific optimization options.

OK.

> gcc/ChangeLog:
> 
>   * tree-parloops.cc (create_loop_fn): Copy specific target and
>   optimization options to clone.
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] tree-optimization/111228 - combine two VEC_PERM_EXPRs

2023-08-30 Thread Richard Biener via Gcc-patches
On Wed, 30 Aug 2023, Jakub Jelinek wrote:

> On Wed, Aug 30, 2023 at 01:54:46PM +0200, Richard Biener via Gcc-patches 
> wrote:
> > * gcc.dg/tree-ssa/forwprop-42.c: New testcase.
> 
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O -fdump-tree-cddce1" } */
> > +
> > +typedef unsigned long v2di __attribute__((vector_size(16)));
> 
> Shouldn't this be unsigned long long ?  Otherwise it is actually V4SImode
> rather than V2DImode.

Fixed like this.

Richard.
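For the record, a hand-written illustration of the point: on an ILP32
target unsigned long is 32 bits, so a 16-byte vector of it has four
elements, while __UINT64_TYPE__ yields the intended two-element vector
everywhere.

typedef unsigned long   long_v __attribute__ ((vector_size (16))); /* V4SI with -m32 */
typedef __UINT64_TYPE__ v2di   __attribute__ ((vector_size (16))); /* V2DI always */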

From 695caedeb1b89ec05c727b2e2aacc2a27aa16c42 Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Wed, 30 Aug 2023 14:24:57 +0200
Subject: [PATCH] tree-optimization/111228 - fix testcase
To: gcc-patches@gcc.gnu.org

* gcc.dg/tree-ssa/forwprop-42.c: Use __UINT64_TYPE__ instead
of unsigned long.
---
 gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c 
b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c
index f3dbc3e9394..257a05d3ec8 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O -fdump-tree-cddce1" } */
 
-typedef unsigned long v2di __attribute__((vector_size(16)));
+typedef __UINT64_TYPE__ v2di __attribute__((vector_size(16)));
 
 v2di g;
 void test (v2di *v)
-- 
2.35.3



Re: [PATCH] test: Add xfail into slp-reduc-7.c for RVV VLA vectorization

2023-08-30 Thread Richard Biener via Gcc-patches
On Wed, 30 Aug 2023, Juzhe-Zhong wrote:

> Like ARM SVE, add RVV variable length xfail.

OK

> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/slp-reduc-7.c: Add RVV.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/slp-reduc-7.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-reduc-7.c 
> b/gcc/testsuite/gcc.dg/vect/slp-reduc-7.c
> index 7a958f24733..a8528ab53ee 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-reduc-7.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-reduc-7.c
> @@ -57,5 +57,5 @@ int main (void)
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail 
> vect_no_int_add } } } */
>  /* For variable-length SVE, the number of scalar statements in the
> reduction exceeds the number of elements in a 128-bit granule.  */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> { xfail { vect_no_int_add || { aarch64_sve && vect_variable_length } } } } } 
> */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> { xfail { vect_no_int_add || { { aarch64_sve && vect_variable_length } || { 
> riscv_vector && vect_variable_length } } } } } } */
>  /* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 0 "vect" { xfail { 
> aarch64_sve && vect_variable_length } } } } */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] test: Adapt slp-26.c check for RVV

2023-08-30 Thread Richard Biener via Gcc-patches
On Wed, 30 Aug 2023, Juzhe-Zhong wrote:

> Fix FAILs:
> FAIL: gcc.dg/vect/slp-26.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
> "vectorized 0 loops" 1
> FAIL: gcc.dg/vect/slp-26.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
> "vectorizing stmts using SLP" 0
> FAIL: gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorized 0 loops" 1
> FAIL: gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorizing stmts using 
> SLP" 0
> 
> Since RVV is able to vectorize it with VLS modes, like amdgcn does.

OK

> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/slp-26.c: Adapt for RVV.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/slp-26.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-26.c 
> b/gcc/testsuite/gcc.dg/vect/slp-26.c
> index d398a5acb0c..196981d83c1 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-26.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-26.c
> @@ -47,7 +47,7 @@ int main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target 
> { ! { mips_msa || amdgcn-*-* } } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target 
> { mips_msa || amdgcn-*-* } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" 
> { target { ! { mips_msa || amdgcn-*-* } } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> { target { mips_msa || amdgcn-*-* } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target 
> { ! { mips_msa || { amdgcn-*-* || riscv_vector } } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target 
> { mips_msa || { amdgcn-*-* || riscv_vector } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" 
> { target { ! { mips_msa || { amdgcn-*-* || riscv_vector } } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> { target { mips_msa || { amdgcn-*-* || riscv_vector } } } } } */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] Adjust costing of emulated vectorized gather/scatter

2023-08-30 Thread Richard Biener via Gcc-patches
On Wed, Aug 30, 2023 at 12:38 PM liuhongt via Gcc-patches
 wrote:
>
> r14-332-g24905a4bd1375c adjusts costing of emulated vectorized
> gather/scatter.
> 
> commit 24905a4bd1375ccd99c02510b9f9529015a48315
> Author: Richard Biener 
> Date:   Wed Jan 18 11:04:49 2023 +0100
>
> Adjust costing of emulated vectorized gather/scatter
>
> Emulated gather/scatter behave similar to strided elementwise
> accesses in that they need to decompose the offset vector
> and construct or decompose the data vector so handle them
> the same way, pessimizing the cases with many elements.
> 
>
> But for emulated gather/scatter, offset vector load/vec_construct has
> already been counted, and in real cases it's probably eliminated by
> a later optimizer.
> Also after decomposing, element loads from continuous memory could be
> less bounded compared to normal elementwise load.
> The patch decreases the cost a little bit.
>
> This will enable gather emulation for below loop with VF=8(ymm)
>
> double
> foo (double* a, double* b, unsigned int* c, int n)
> {
>   double sum = 0;
>   for (int i = 0; i != n; i++)
> sum += a[i] * b[c[i]];
>   return sum;
> }
>
> For the upper loop, microbenchmark result shows on ICX,
> emulated gather with VF=8 is 30% faster than emulated gather with
> VF=4 when tripcount is big enough.
> It brings back ~4% for 510.parest, still a ~5% regression compared to
> the gather instruction due to being throughput bound.
>
> For -march=znver1/2/3/4, the change doesn't enable VF=8(ymm) for the
> loop, VF remains 4(xmm) as before(guess related to their own cost
> model).
>
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/111064
> * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
> Decrease cost a little bit for vec_to_scalar(offset vector) in
> emulated gather.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr111064.c: New test.
> ---
>  gcc/config/i386/i386.cc  | 11 ++-
>  gcc/testsuite/gcc.target/i386/pr111064.c | 12 
>  2 files changed, 22 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr111064.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 1bc3f11ff07..337e0f1bfbb 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -24079,7 +24079,16 @@ ix86_vector_costs::add_stmt_cost (int count, 
> vect_cost_for_stmt kind,
>   || STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == 
> VMAT_GATHER_SCATTER))
>  {
>stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
> -  stmt_cost *= (TYPE_VECTOR_SUBPARTS (vectype) + 1);
> +  /* For emulated gather/scatter, offset vector load/vec_construct has
> +already been counted and in real case, it's probably eliminated by
> +later optimizer.
> +Also after decomposing, element loads from continuous memory
> +could be less bounded compared to normal elementwise load.  */
> +  if (kind == vec_to_scalar
> + && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_GATHER_SCATTER)
> +   stmt_cost *= TYPE_VECTOR_SUBPARTS (vectype);

For gather we cost N vector extracts (from the offset vector), N scalar loads
(the actual data loads) and one vec_construct.

For scatter we cost N vector extracts (from the offset vector),
N vector extracts (from the data vector) and N scalar stores.

It was intended to penalize the extracts the same way as vector construction.
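As a hand-written sketch (not GCC-generated code) of what the emulated
gather decomposes into at VF = 4, matching the three cost kinds above:

typedef double   v4df __attribute__ ((vector_size (32)));
typedef unsigned v4si __attribute__ ((vector_size (16)));

static v4df
emulated_gather (const double *b, v4si offs)
{
  /* N vec_to_scalar: extract each offset from the offset vector.  */
  unsigned o0 = offs[0], o1 = offs[1], o2 = offs[2], o3 = offs[3];
  /* N scalar_load: the actual data loads.  */
  double d0 = b[o0], d1 = b[o1], d2 = b[o2], d3 = b[o3];
  /* One vec_construct: rebuild the data vector.  */
  return (v4df) { d0, d1, d2, d3 };
}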

Your change will adjust all three different decomposition kinds "a bit".
I realize the scaling by (TYPE_VECTOR_SUBPARTS + 1) is kind-of arbitrary,
but so is your adjustment, and I don't see why VMAT_GATHER_SCATTER is
special to your adjustment.

So the comment you put before the special-casing doesn't really make
sense to me.

For zen4 costing we currently have

*_11 8 times vec_to_scalar costs 576 in body
*_11 8 times scalar_load costs 96 in body
*_11 1 times vec_construct costs 792 in body

for zmm

*_11 4 times vec_to_scalar costs 80 in body
*_11 4 times scalar_load costs 48 in body
*_11 1 times vec_construct costs 100 in body

for ymm and

*_11 2 times vec_to_scalar costs 24 in body
*_11 2 times scalar_load costs 24 in body
*_11 1 times vec_construct costs 12 in body

for xmm.  Even with your adjustment, if we were to enable cost comparison
between vector sizes I bet we'd choose xmm (you can try by re-ordering the
modes in the ix86_autovectorize_vector_modes hook).  So it feels like a
hack.  If you think that Icelake should enable 4-element vectorized
emulated gather, then we should disable this individual scaling and
possibly instead penalize when the number of (emulated) gathers is too
high?

That said, we could count the number of element extracts and inserts
(and maybe [scalar] loads and stores) and at finish_cost time weight them
against the number of "other" operations.

As repeatedly said the current cost 

[PATCH] tree-optimization/111228 - combine two VEC_PERM_EXPRs

2023-08-30 Thread Richard Biener via Gcc-patches
The following adds simplification of two VEC_PERM_EXPRs where
the later one replaces all elements from either the first or the
second input of the earlier permute.  This allows a three input
permute to be simplified to a two input one.

I'm following the existing two input simplification case and only
allow non-VLA permutes.  The now existing three cases and the
single case in tree-ssa-forwprop.cc somehow ask for merging;
I'm not doing that as part of this change, though.
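As a hand-written illustration (not taken from the patch) of the kind
of input the new patterns target:

typedef unsigned long long v2di __attribute__ ((vector_size (16)));

v2di
f (v2di a, v2di b, v2di x)
{
  /* c takes element 0 from a and element 1 from b ...  */
  v2di c = __builtin_shufflevector (a, b, 0, 3);
  /* ... but the outer permute keeps only the element that came from b,
     so every element from a is replaced and the two permutes can be
     combined into a single VEC_PERM_EXPR on x and b.  */
  return __builtin_shufflevector (x, c, 0, 3);
}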

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/111228
* match.pd ((vec_perm (vec_perm ..) @5 ..) -> (vec_perm @x @5 ..)):
New simplifications.

* gcc.dg/tree-ssa/forwprop-42.c: New testcase.
---
 gcc/match.pd| 141 +++-
 gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c |  17 +++
 2 files changed, 155 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 47d2733211a..6a7edde5736 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -8993,10 +8993,10 @@ and,
 
 
 /* Merge
-   c = VEC_PERM_EXPR <a, b, VCST0>;
-   d = VEC_PERM_EXPR <c, c, VCST1>;
+     c = VEC_PERM_EXPR <a, b, VCST0>;
+     d = VEC_PERM_EXPR <c, c, VCST1>;
    to
-   d = VEC_PERM_EXPR <a, b, NEW_VCST>;  */
+     d = VEC_PERM_EXPR <a, b, NEW_VCST>;  */
 
 (simplify
  (vec_perm (vec_perm@0 @1 @2 VECTOR_CST@3) @0 VECTOR_CST@4)
@@ -9038,6 +9038,141 @@ and,
  (if (op0)
   (vec_perm @1 @2 { op0; })))
 
+/* Merge
+     c = VEC_PERM_EXPR <a, b, VCST0>;
+     d = VEC_PERM_EXPR <x, c, VCST1>;
+   to
+     d = VEC_PERM_EXPR <x, {a,b}, NEW_VCST>;
+   when all elements from a or b are replaced by the later
+   permutation.  */
+
+(simplify
+ (vec_perm @5 (vec_perm@0 @1 @2 VECTOR_CST@3) VECTOR_CST@4)
+ (if (TYPE_VECTOR_SUBPARTS (type).is_constant ())
+  (with
+   {
+ machine_mode result_mode = TYPE_MODE (type);
+ machine_mode op_mode = TYPE_MODE (TREE_TYPE (@1));
+ int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant ();
+ vec_perm_builder builder0;
+ vec_perm_builder builder1;
+ vec_perm_builder builder2 (nelts, nelts, 2);
+   }
+   (if (tree_to_vec_perm_builder (&builder0, @3)
+   && tree_to_vec_perm_builder (&builder1, @4))
+(with
+ {
+   vec_perm_indices sel0 (builder0, 2, nelts);
+   vec_perm_indices sel1 (builder1, 2, nelts);
+   bool use_1 = false, use_2 = false;
+
+   for (int i = 0; i < nelts; i++)
+ {
+  if (known_lt ((poly_uint64)sel1[i], sel1.nelts_per_input ()))
+builder2.quick_push (sel1[i]);
+  else
+{
+  poly_uint64 j = sel0[(sel1[i] - sel1.nelts_per_input ())
+   .to_constant ()];
+  if (known_lt (j, sel0.nelts_per_input ()))
+use_1 = true;
+  else
+{
+  use_2 = true;
+  j -= sel0.nelts_per_input ();
+}
+  builder2.quick_push (j + sel1.nelts_per_input ());
+}
+}
+ }
+ (if (use_1 ^ use_2)
+  (with
+   {
+vec_perm_indices sel2 (builder2, 2, nelts);
+tree op0 = NULL_TREE;
+/* If the new VEC_PERM_EXPR can't be handled but both
+   original VEC_PERM_EXPRs can, punt.
+   If one or both of the original VEC_PERM_EXPRs can't be
+   handled and the new one can't be either, don't increase
+   number of VEC_PERM_EXPRs that can't be handled.  */
+if (can_vec_perm_const_p (result_mode, op_mode, sel2, false)
+|| (single_use (@0)
+? (!can_vec_perm_const_p (result_mode, op_mode, sel0, false)
+   || !can_vec_perm_const_p (result_mode, op_mode, sel1, 
false))
+: !can_vec_perm_const_p (result_mode, op_mode, sel1, false)))
+  op0 = vec_perm_indices_to_tree (TREE_TYPE (@4), sel2);
+   }
+   (if (op0)
+   (switch
+(if (use_1)
+ (vec_perm @5 @1 { op0; }))
+(if (use_2)
+ (vec_perm @5 @2 { op0; })))
+
+/* And the case with swapped outer permute sources.  */
+
+(simplify
+ (vec_perm (vec_perm@0 @1 @2 VECTOR_CST@3) @5 VECTOR_CST@4)
+ (if (TYPE_VECTOR_SUBPARTS (type).is_constant ())
+  (with
+   {
+ machine_mode result_mode = TYPE_MODE (type);
+ machine_mode op_mode = TYPE_MODE (TREE_TYPE (@1));
+ int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant ();
+ vec_perm_builder builder0;
+ vec_perm_builder builder1;
+ vec_perm_builder builder2 (nelts, nelts, 2);
+   }
+   (if (tree_to_vec_perm_builder (&builder0, @3)
+   && tree_to_vec_perm_builder (&builder1, @4))
+(with
+ {
+   vec_perm_indices sel0 (builder0, 2, nelts);
+   vec_perm_indices sel1 (builder1, 2, nelts);
+   bool use_1 = false, use_2 = false;
+
+   for (int i = 0; i < nelts; i++)
+ {
+  if (known_ge ((poly_uint64)sel1[i], sel1.nelts_per_input ()))
+builder2.quick_push (sel1[i]);
+  else
+{
+  poly_uint64 j = 

Re: [PATCH] test: Fix XPASS of RVV

2023-08-30 Thread Richard Biener via Gcc-patches
On Wed, 30 Aug 2023, Juzhe-Zhong wrote:

> XPASS: gcc.dg/vect/vect-outer-4e.c -flto -ffat-lto-objects  
> scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
> XPASS: gcc.dg/vect/vect-outer-4e.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED" 1
> XPASS: gcc.dg/vect/vect-outer-4f.c -flto -ffat-lto-objects  
> scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
> XPASS: gcc.dg/vect/vect-outer-4f.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED" 1
> XPASS: gcc.dg/vect/vect-outer-4g.c -flto -ffat-lto-objects  
> scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
> XPASS: gcc.dg/vect/vect-outer-4g.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED" 1
> XPASS: gcc.dg/vect/vect-outer-4k.c -flto -ffat-lto-objects  
> scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
> XPASS: gcc.dg/vect/vect-outer-4k.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED" 1
> XPASS: gcc.dg/vect/vect-outer-4l.c -flto -ffat-lto-objects  
> scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
> XPASS: gcc.dg/vect/vect-outer-4l.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED" 1
> 
> Like ARM SVE, fix these XPASSes for RVV.

OK.

> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-double-reduc-5.c: Add riscv.
>   * gcc.dg/vect/vect-outer-4e.c: Ditto.
>   * gcc.dg/vect/vect-outer-4f.c: Ditto.
>   * gcc.dg/vect/vect-outer-4g.c: Ditto.
>   * gcc.dg/vect/vect-outer-4k.c: Ditto.
>   * gcc.dg/vect/vect-outer-4l.c: Ditto.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/vect-outer-4e.c   | 2 +-
>  gcc/testsuite/gcc.dg/vect/vect-outer-4f.c   | 2 +-
>  gcc/testsuite/gcc.dg/vect/vect-outer-4g.c   | 2 +-
>  gcc/testsuite/gcc.dg/vect/vect-outer-4k.c   | 2 +-
>  gcc/testsuite/gcc.dg/vect/vect-outer-4l.c   | 2 +-
>  6 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c 
> b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c
> index 7465eae1c47..b990405745e 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c
> @@ -53,5 +53,5 @@ int main ()
>  
>  /* Vectorization of loops with multiple types and double reduction is not 
> supported yet.  */   
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { 
> xfail { ! aarch64*-*-* } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { 
> xfail { ! { aarch64*-*-* riscv*-*-* } } } } } */
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-4e.c 
> b/gcc/testsuite/gcc.dg/vect/vect-outer-4e.c
> index e65a092f5bf..cc9e96f5d58 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-4e.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-4e.c
> @@ -23,4 +23,4 @@ foo (){
>return;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { 
> xfail { ! aarch64*-*-* } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { 
> xfail { ! { aarch64*-*-* riscv*-*-* } } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-4f.c 
> b/gcc/testsuite/gcc.dg/vect/vect-outer-4f.c
> index a88014a2fbf..c903dc9bfea 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-4f.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-4f.c
> @@ -65,4 +65,4 @@ int main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { 
> xfail { ! aarch64*-*-* } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { 
> xfail { ! { aarch64*-*-* riscv*-*-* } } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-4g.c 
> b/gcc/testsuite/gcc.dg/vect/vect-outer-4g.c
> index a88014a2fbf..c903dc9bfea 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-4g.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-4g.c
> @@ -65,4 +65,4 @@ int main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { 
> xfail { ! aarch64*-*-* } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { 
> xfail { ! { aarch64*-*-* riscv*-*-* } } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-4k.c 
> b/gcc/testsuite/gcc.dg/vect/vect-outer-4k.c
> index a88014a2fbf..c903dc9bfea 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-4k.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-4k.c
> @@ -65,4 +65,4 @@ int main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { 
> xfail { ! aarch64*-*-* } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { 
> xfail { ! { aarch64*-*-* riscv*-*-* } } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-4l.c 
> b/gcc/testsuite/gcc.dg/vect/vect-outer-4l.c
> index 4f95c652ee3..a63b9332afa 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-4l.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-4l.c
> @@ -65,4 +65,4 @@ int main (void)
>return 0;
>  }
>  
> -/* 

Re: [PATCH] tree-ssa-strlen: Fix up handling of conditionally zero memcpy [PR110914]

2023-08-30 Thread Richard Biener via Gcc-patches
On Wed, 30 Aug 2023, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase is miscompiled since r279392 aka 
> r10-5451-gef29b12cfbb4979
> The strlen pass has adjust_last_stmt function, which performs mainly strcat
> or strcat-like optimizations (say strcpy (x, "abcd"); strcat (x, p);
> or equivalent memcpy (x, "abcd", strlen ("abcd") + 1); char *q = strchr (x, 
> 0);
> memcpy (q, p, strlen (p) + 1); etc., where the first stmt stores the '\0'
> character at the end but the next one immediately overwrites it, and so
> the first memcpy can be adjusted to store one byte fewer.
> handle_builtin_memcpy called this function
> in two spots, the first one guarded like:
>   if (olddsi != NULL
>   && tree_fits_uhwi_p (len)
>   && !integer_zerop (len))
> adjust_last_stmt (olddsi, stmt, false);
> i.e. only for constant non-zero length.  The other spot can call it even
> for non-constant length but in that case we punt before that if that length
> isn't length of some string + 1, so again non-zero.
> The r279392 change I assume wanted to add some warning stuff and changed it
> like
>if (olddsi != NULL
> -  && tree_fits_uhwi_p (len)
>&& !integer_zerop (len))
> -adjust_last_stmt (olddsi, stmt, false);
> +{
> +  maybe_warn_overflow (stmt, len, rvals, olddsi, false, true);
> +  adjust_last_stmt (olddsi, stmt, false);
> +}
> While maybe_warn_overflow possibly handles non-constant length fine,
> adjust_last_stmt really relies on the length being non-zero, which
> !integer_zerop (len) alone doesn't guarantee.  While we could, for
> len being an SSA_NAME, ask the ranger or tree_expr_nonzero_p, I think
> adjust_last_stmt will not benefit from it much, so the following patch
> just restores the above condition/previous behavior for the adjust_last_stmt
> call only.
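For readers unfamiliar with adjust_last_stmt, a hand-written sketch of
the strcat-like pattern it optimizes (illustrative only):

char x[32];

void
f (const char *p)
{
  /* Stores "abcd" plus the terminating '\0' ...  */
  __builtin_memcpy (x, "abcd", 5);
  /* ... but the next store begins exactly at that '\0', so the first
     memcpy can be shrunk to copy only 4 bytes -- which is only valid
     when the second copy is known to store at least one byte.  */
  __builtin_memcpy (x + 4, p, __builtin_strlen (p) + 1);
}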
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2023-08-30  Jakub Jelinek  
> 
>   PR tree-optimization/110914
>   * tree-ssa-strlen.cc (strlen_pass::handle_builtin_memcpy): Don't call
>   adjust_last_stmt unless len is known constant.
> 
>   * gcc.c-torture/execute/pr110914.c: New test.
> 
> --- gcc/tree-ssa-strlen.cc.jj 2023-04-27 10:17:46.406486796 +0200
> +++ gcc/tree-ssa-strlen.cc2023-08-29 18:13:38.189327203 +0200
> @@ -3340,7 +3340,8 @@ strlen_pass::handle_builtin_memcpy (buil
>&& !integer_zerop (len))
>  {
>maybe_warn_overflow (stmt, false, len, olddsi, false, true);
> -  adjust_last_stmt (olddsi, stmt, false);
> +  if (tree_fits_uhwi_p (len))
> + adjust_last_stmt (olddsi, stmt, false);
>  }
>  
>int idx = get_stridx (src, stmt);
> --- gcc/testsuite/gcc.c-torture/execute/pr110914.c.jj 2023-08-29 
> 18:38:33.305699206 +0200
> +++ gcc/testsuite/gcc.c-torture/execute/pr110914.c2023-08-29 
> 18:38:18.678901007 +0200
> @@ -0,0 +1,22 @@
> +/* PR tree-optimization/110914 */
> +
> +__attribute__ ((noipa)) int
> +foo (const char *s, unsigned long l)
> +{
> +  unsigned char r = 0;
> +  __builtin_memcpy (, s, l != 0);
> +  return r;
> +}
> +
> +int
> +main ()
> +{
> +  const char *p = "123456";
> +  int a = foo (p, __builtin_strlen (p) - 5);
> +  int b = foo (p, __builtin_strlen (p) - 6);
> +  if (a != '1')
> +__builtin_abort ();
> +  if (b != 0)
> +__builtin_abort ();
> +  return 0;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] test: Add xfail for riscv_vector

2023-08-30 Thread Richard Biener via Gcc-patches
On Wed, 30 Aug 2023, Juzhe-Zhong wrote:

> Like ARM SVE, when we enable scalable vectorization for RVV
> we can't constant-fold these yet.
> 
> 
> Ok for trunk ?

OK.

> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/pr88598-1.c: Add riscv_vector.
>   * gcc.dg/vect/pr88598-2.c: Ditto.
>   * gcc.dg/vect/pr88598-3.c: Ditto.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/pr88598-1.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/pr88598-2.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/pr88598-3.c | 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/pr88598-1.c 
> b/gcc/testsuite/gcc.dg/vect/pr88598-1.c
> index e25c6c04543..ddcebb067ea 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr88598-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr88598-1.c
> @@ -51,4 +51,4 @@ main ()
>  
>  /* ??? We need more constant folding for this to work with fully-masked
> loops.  */
> -/* { dg-final { scan-tree-dump-not {REDUC_PLUS} "optimized" { xfail 
> aarch64_sve } } } */
> +/* { dg-final { scan-tree-dump-not {REDUC_PLUS} "optimized" { xfail { 
> aarch64_sve || riscv_vector } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/pr88598-2.c 
> b/gcc/testsuite/gcc.dg/vect/pr88598-2.c
> index f4c41bd8e58..ef5ea8a1a86 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr88598-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr88598-2.c
> @@ -51,4 +51,4 @@ main ()
>  
>  /* ??? We need more constant folding for this to work with fully-masked
> loops.  */
> -/* { dg-final { scan-tree-dump-not {REDUC_PLUS} "optimized" { xfail 
> aarch64_sve } } } */
> +/* { dg-final { scan-tree-dump-not {REDUC_PLUS} "optimized" { xfail { 
> aarch64_sve || riscv_vector } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/pr88598-3.c 
> b/gcc/testsuite/gcc.dg/vect/pr88598-3.c
> index 0fc23bf0ee7..75b8d024a95 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr88598-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr88598-3.c
> @@ -51,4 +51,4 @@ main ()
>  
>  /* ??? We need more constant folding for this to work with fully-masked
> loops.  */
> -/* { dg-final { scan-tree-dump-not {REDUC_PLUS} "optimized" { xfail 
> aarch64_sve } } } */
> +/* { dg-final { scan-tree-dump-not {REDUC_PLUS} "optimized" { xfail { 
> aarch64_sve || riscv_vector } } } } */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] store-merging: Fix up >= 64 bit insertion [PR111015]

2023-08-30 Thread Richard Biener via Gcc-patches
On Wed, 30 Aug 2023, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase shows that we mishandle bit insertion for
> info->bitsize >= 64.  The problem is in using unsigned HOST_WIDE_INT
> shift + subtraction + build_int_cst to compute mask, the shift invokes
> UB at compile time for info->bitsize 64 and larger and e.g. on the testcase
> with info->bitsize happens to compute mask of 0x3f rather than
> 0x3f''.
> 
> The patch fixes that by using wide_int wi::mask + wide_int_to_tree, so it
> handles masks in any precision (up to WIDE_INT_MAX_PRECISION ;) ).
> 
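A stand-alone illustration of the underlying pitfall (hand-written; the
pass computes the mask on HOST_WIDE_INT, shown here with unsigned long
long):

unsigned long long
mask_of (unsigned bitsize)
{
  /* Undefined behaviour once bitsize >= 64: the shift count reaches the
     width of the promoted operand.  wi::mask avoids this by working in
     the precision of the destination type.  */
  return (1ULL << bitsize) - 1;
}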
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and
> backports?

OK.

Thanks,
Richard.

> 2023-08-30  Jakub Jelinek  
> 
>   PR tree-optimization/111015
>   * gimple-ssa-store-merging.cc
>   (imm_store_chain_info::output_merged_store): Use wi::mask and
>   wide_int_to_tree instead of unsigned HOST_WIDE_INT shift and
>   build_int_cst to build BIT_AND_EXPR mask.
> 
>   * gcc.dg/pr111015.c: New test.
> 
> --- gcc/gimple-ssa-store-merging.cc.jj2023-07-11 13:40:39.049448058 
> +0200
> +++ gcc/gimple-ssa-store-merging.cc   2023-08-29 16:13:12.808434272 +0200
> @@ -4687,12 +4687,13 @@ imm_store_chain_info::output_merged_stor
>   }
> else if ((BYTES_BIG_ENDIAN ? start_gap : end_gap) > 0)
>   {
> -   const unsigned HOST_WIDE_INT imask
> - = (HOST_WIDE_INT_1U << info->bitsize) - 1;
> +   wide_int imask
> + = wi::mask (info->bitsize, false,
> + TYPE_PRECISION (TREE_TYPE (tem)));
> tem = gimple_build (, loc,
> BIT_AND_EXPR, TREE_TYPE (tem), tem,
> -   build_int_cst (TREE_TYPE (tem),
> -  imask));
> +   wide_int_to_tree (TREE_TYPE (tem),
> + imask));
>   }
> const HOST_WIDE_INT shift
>   = (BYTES_BIG_ENDIAN ? end_gap : start_gap);
> --- gcc/testsuite/gcc.dg/pr111015.c.jj2023-08-29 16:06:38.526938204 
> +0200
> +++ gcc/testsuite/gcc.dg/pr111015.c   2023-08-29 16:19:03.702536015 +0200
> @@ -0,0 +1,28 @@
> +/* PR tree-optimization/111015 */
> +/* { dg-do run { target int128 } } */
> +/* { dg-options "-O2" } */
> +
> +struct S { unsigned a : 4, b : 4; unsigned __int128 c : 70; } d;
> +
> +__attribute__((noipa)) void
> +foo (unsigned __int128 x, unsigned char y, unsigned char z)
> +{
> +  d.a = y;
> +  d.b = z;
> +  d.c = x;
> +}
> +
> +int
> +main ()
> +{
> +  foo (-1, 12, 5);
> +  if (d.a != 12
> +  || d.b != 5
> +  || d.c != (-1ULL | (((unsigned __int128) 0x3f) << 64)))
> +__builtin_abort ();
> +  foo (0x123456789abcdef0ULL | (((unsigned __int128) 26) << 64), 7, 11);
> +  if (d.a != 7
> +  || d.b != 11
> +  || d.c != (0x123456789abcdef0ULL | (((unsigned __int128) 26) << 64)))
> +__builtin_abort ();
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] middle-end: Apply MASK_LEN_LOAD_LANES/MASK_LEN_STORE_LANES to ivopts/alias

2023-08-30 Thread Richard Biener via Gcc-patches
On Wed, 30 Aug 2023, Juzhe-Zhong wrote:

> Like MASK_LOAD_LANES/MASK_STORE_LANES, add MASK_LEN_ variant.
> 
> Bootstrap and Regression on X86 passed.
> 
> Ok for trunk?

OK.
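For background, a hand-written sketch of the kind of loop that uses
load-lanes; the MASK_LEN_ variants add both a mask and a length operand
for partially executed iterations:

void
f (int *restrict out, const int *restrict in, int n)
{
  for (int i = 0; i < n; i++)
    /* The interleaved accesses in[2*i] and in[2*i+1] are typically
       implemented with a group load (load-lanes) instruction.  */
    out[i] = in[2 * i] + in[2 * i + 1];
}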

> gcc/ChangeLog:
> 
>   * tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Add MASK_LEN_ variant.
>   (call_may_clobber_ref_p_1): Ditto.
>   * tree-ssa-loop-ivopts.cc (get_mem_type_for_internal_fn): Ditto.
>   (get_alias_ptr_type_for_ptr_address): Ditto.
> 
> ---
>  gcc/tree-ssa-alias.cc   | 3 +++
>  gcc/tree-ssa-loop-ivopts.cc | 4 
>  2 files changed, 7 insertions(+)
> 
> diff --git a/gcc/tree-ssa-alias.cc b/gcc/tree-ssa-alias.cc
> index cf38fe506a8..373940b5f6c 100644
> --- a/gcc/tree-ssa-alias.cc
> +++ b/gcc/tree-ssa-alias.cc
> @@ -2818,11 +2818,13 @@ ref_maybe_used_by_call_p_1 (gcall *call, ao_ref *ref, 
> bool tbaa_p)
>case IFN_MASK_LEN_STORE:
>   return false;
>case IFN_MASK_STORE_LANES:
> +  case IFN_MASK_LEN_STORE_LANES:
>   goto process_args;
>case IFN_MASK_LOAD:
>case IFN_LEN_LOAD:
>case IFN_MASK_LEN_LOAD:
>case IFN_MASK_LOAD_LANES:
> +  case IFN_MASK_LEN_LOAD_LANES:
>   {
> ao_ref rhs_ref;
> tree lhs = gimple_call_lhs (call);
> @@ -3072,6 +3074,7 @@ call_may_clobber_ref_p_1 (gcall *call, ao_ref *ref, 
> bool tbaa_p)
>case IFN_LEN_STORE:
>case IFN_MASK_LEN_STORE:
>case IFN_MASK_STORE_LANES:
> +  case IFN_MASK_LEN_STORE_LANES:
>   {
> tree rhs = gimple_call_arg (call,
> internal_fn_stored_value_index (fn));
> diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> index d208d9dbd4d..3d3f28f7f3b 100644
> --- a/gcc/tree-ssa-loop-ivopts.cc
> +++ b/gcc/tree-ssa-loop-ivopts.cc
> @@ -2441,6 +2441,7 @@ get_mem_type_for_internal_fn (gcall *call, tree *op_p)
>  {
>  case IFN_MASK_LOAD:
>  case IFN_MASK_LOAD_LANES:
> +case IFN_MASK_LEN_LOAD_LANES:
>  case IFN_LEN_LOAD:
>  case IFN_MASK_LEN_LOAD:
>if (op_p == gimple_call_arg_ptr (call, 0))
> @@ -2449,6 +2450,7 @@ get_mem_type_for_internal_fn (gcall *call, tree *op_p)
>  
>  case IFN_MASK_STORE:
>  case IFN_MASK_STORE_LANES:
> +case IFN_MASK_LEN_STORE_LANES:
>  case IFN_LEN_STORE:
>  case IFN_MASK_LEN_STORE:
>{
> @@ -7573,6 +7575,8 @@ get_alias_ptr_type_for_ptr_address (iv_use *use)
>  case IFN_MASK_STORE:
>  case IFN_MASK_LOAD_LANES:
>  case IFN_MASK_STORE_LANES:
> +case IFN_MASK_LEN_LOAD_LANES:
> +case IFN_MASK_LEN_STORE_LANES:
>  case IFN_LEN_LOAD:
>  case IFN_LEN_STORE:
>  case IFN_MASK_LEN_LOAD:
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] IFCOMBINE: Remove outer condition for two same conditionals

2023-08-29 Thread Richard Biener via Gcc-patches
On Mon, Aug 28, 2023 at 12:58 AM Andrew Pinski via Gcc-patches
 wrote:
>
> This adds a simple case to remove an outer condition if the two inner
> conditionals are the same and lead to the same location.
> This can show up due to jump threading or inlining, or because someone
> wrote code like this.
>
> ifcombine-1.c shows the simple case that this is supposed to solve.
> Even though PRE can handle some cases, ifcombine is earlier and even runs
> at -O1.
>
> Note in the case of the PR here, it comes from jump threading.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> gcc/ChangeLog:
>
> PR tree-optimization/110891
> * tree-ssa-ifcombine.cc (ifcombine_bb_same): New function.
> (tree_ssa_ifcombine_bb): Call ifcombine_bb_same.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/110891
> * gcc.dg/tree-ssa/ifcombine-1.c: New test.
> * gcc.dg/tree-ssa/pr110891-1.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/ifcombine-1.c |  27 ++
>  gcc/testsuite/gcc.dg/tree-ssa/pr110891-1.c  |  53 +++
>  gcc/tree-ssa-ifcombine.cc   | 100 
>  3 files changed, 180 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifcombine-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr110891-1.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifcombine-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ifcombine-1.c
> new file mode 100644
> index 000..02d08efef87
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifcombine-1.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fdump-tree-optimized -fdump-tree-ifcombine" } */
> +
> +int g();
> +int h();
> +
> +int j, l;
> +
> +int f(int a, int *b)
> +{
> +if (a == 0)
> +{
> +if (b == ) goto L9; else goto L7;
> +}
> +else
> +{
> +if (b == ) goto L9; else goto L7;
> +}
> +L7: return g();
> +L9: return h();
> +}
> +
> +/* ifcombine can optimize away the outer most if here. */
> +/* { dg-final { scan-tree-dump-times "optimized away the test from bb " 1 
> "ifcombine" } } */
> +/* We should have remove the outer if and one of the inner ifs; leaving us 
> with one if. */
> +/* { dg-final { scan-tree-dump-times "if " 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "goto " 3 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr110891-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr110891-1.c
> new file mode 100644
> index 000..320d8823077
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr110891-1.c
> @@ -0,0 +1,53 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +void foo(void);
> +static int a, c = 7, d, o, q;
> +static int *b = , *f, *j = , *n = , *ae;
> +static short e, m;
> +static short *i = 
> +static char r;
> +void __assert_fail(char *, char *, int, const char *) 
> __attribute__((__noreturn__));
> +static const short g();
> +static void h();
> +static int *k(int) {
> +(*i)++;
> +*j ^= *b;
> +return 
> +}
> +static void l(unsigned p) {
> +int *aa = 
> +h();
> +o = 5 ^ g() && p;
> +if (f ==  || f ==  || f == )
> +;
> +else {
> +foo();
> +__assert_fail("", "", 3, __PRETTY_FUNCTION__);
> +}
> +*aa ^= *n;
> +if (*aa)
> +if (!(((p) >= 0) && ((p) <= 0))) {
> +__builtin_unreachable();
> +}
> +k(p);
> +}
> +static const short g() { return q; }
> +static void h() {
> +unsigned ag = c;
> +d = ag > r ? ag : 0;
> +ae = k(c);
> +f = ae;
> +if (ae ==  || ae ==  || ae == )
> +;
> +else
> +__assert_fail("", "", 4, __PRETTY_FUNCTION__);
> +}
> +int main() {
> +l(a);
> +m || (*b |= 64);
> +*b &= 5;
> +}
> +
> +/* We should be able to optimize away foo. */
> +/* { dg-final { scan-tree-dump-not "foo " "optimized" } } */
> diff --git a/gcc/tree-ssa-ifcombine.cc b/gcc/tree-ssa-ifcombine.cc
> index 46b076804f4..f79545b9a0b 100644
> --- a/gcc/tree-ssa-ifcombine.cc
> +++ b/gcc/tree-ssa-ifcombine.cc
> @@ -666,6 +666,103 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool 
> inner_inv,
>return false;
>  }
>
> +/* Function to remove an outer condition if two inner basic blocks have
> the same condition and are both otherwise empty.  */
> +
> +static bool
> +ifcombine_bb_same (basic_block cond_bb, basic_block outer_cond_bb,
> +  basic_block then_bb, basic_block else_bb)
> +{
> +  basic_block inner_cond_bbt = nullptr, inner_cond_bbf = nullptr;
> +
> +  /* See if the outer condition is a condition.  */
> +  if (!recognize_if_then_else (outer_cond_bb, &inner_cond_bbt,
> +  &inner_cond_bbf))
> +return false;
> +  basic_block other_cond_bb;
> +  if (cond_bb == inner_cond_bbt)
> +other_cond_bb = inner_cond_bbf;
> +  else
> +other_cond_bb = inner_cond_bbt;
> +
> +  /* The other bb has to have a single predecessor too. */
> +  if (!single_pred_p 

Re: [PATCH] [tree-optimization/110279] swap operands in reassoc to reduce cross backedge FMA

2023-08-29 Thread Richard Biener via Gcc-patches
On Tue, Aug 29, 2023 at 10:59 AM Di Zhao OS
 wrote:
>
> Hi,
>
> > -Original Message-
> > From: Richard Biener 
> > Sent: Tuesday, August 29, 2023 4:09 PM
> > To: Di Zhao OS 
> > Cc: Jeff Law ; Martin Jambor ; gcc-
> > patc...@gcc.gnu.org
> > Subject: Re: [PATCH] [tree-optimization/110279] swap operands in reassoc to
> > reduce cross backedge FMA
> >
> > On Tue, Aug 29, 2023 at 9:49 AM Di Zhao OS
> >  wrote:
> > >
> > > Hi,
> > >
> > > > -Original Message-
> > > > From: Richard Biener 
> > > > Sent: Tuesday, August 29, 2023 3:41 PM
> > > > To: Jeff Law ; Martin Jambor 
> > > > Cc: Di Zhao OS ; gcc-patches@gcc.gnu.org
> > > > Subject: Re: [PATCH] [tree-optimization/110279] swap operands in reassoc
> > to
> > > > reduce cross backedge FMA
> > > >
> > > > On Tue, Aug 29, 2023 at 1:23 AM Jeff Law via Gcc-patches
> > > >  wrote:
> > > > >
> > > > >
> > > > >
> > > > > On 8/28/23 02:17, Di Zhao OS via Gcc-patches wrote:
> > > > > > This patch tries to fix the 2% regression in 510.parest_r on
> > > > > > ampere1 in the tracker. (Previous discussion is here:
> > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624893.html)
> > > > > >
> > > > > > 1. Add testcases for the problem. For an op list in the form of
> > > > > > "acc = a * b + c * d + acc", currently reassociation doesn't
> > > > > > Swap the operands so that more FMAs can be generated.
> > > > > > After widening_mul the result looks like:
> > > > > >
> > > > > > _1 = .FMA(a, b, acc_0);
> > > > > > acc_1 = .FMA(c, d, _1);
> > > > > >
> > > > > > While previously (before the "Handle FMA friendly..." patch),
> > > > > > widening_mul's result was like:
> > > > > >
> > > > > > _1 = a * b;
> > > > > > _2 = .FMA (c, d, _1);
> > > > > > acc_1 = acc_0 + _2;
> > > >
> > > > How can we execute the multiply and the FMA in parallel?  They
> > > > depend on each other.  Or is it that the uarch can handle dependence
> > > > on the add operand in some better way, but only when it is with a
> > > > multiplication and not an FMA?  (I'd doubt so much complexity)
> > > >
> > > > Can you explain in more detail how the uarch executes one vs. the
> > > > other case?
>
> Here's my understanding after consulting our hardware team. For the
> second case, the uarch of some out-of-order processors can calculate
> "_2" for several iterations at the same time, since there's no dependency
> among different iterations. While for the first case the next iteration
> has to wait for the current iteration to finish, so "acc_0"'s value is
> known. I assume it is also the case in some i386 processors, since I
> saw the patch "Deferring FMA transformations in tight loops" also
> changed corresponding files.
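
For concreteness, the kind of loop under discussion looks roughly like
this (an illustrative sketch, not taken from the patch or its testcase):

  /* acc = a*b + c*d + acc: if the final accumulation stays a plain
     add, the multiplies/FMA of iteration i+1 need not wait for the
     accumulator of iteration i to retire.  */
  double
  dot2 (const double *a, const double *b,
        const double *c, const double *d, int n)
  {
    double acc = 0.0;
    for (int i = 0; i < n; i++)
      acc = a[i] * b[i] + c[i] * d[i] + acc;
    return acc;
  }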

That should be true for all kinds of operations, no?  Thus it means
reassoc should in general associate cross-iteration accumulation
last?  Historically we associated those first because that's how the
vectorizer liked to see them, but I think that's no longer necessary.

It should be achievable by properly biasing the operand during
rank computation (don't we already do that?).
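
A minimal sketch of such biasing (loop_carried_phi_p and PHI_LOOP_BIAS
are illustrative names here, not the pass's actual interface; reassoc
has related logic around phi_rank):

  /* Sketch: give an operand defined by a loop-carried PHI a higher
     rank so it sorts last and the accumulation is associated last.  */
  static int64_t
  bias_rank_for_accumulator (tree op, int64_t rank)
  {
    if (TREE_CODE (op) != SSA_NAME)
      return rank;
    gimple *def = SSA_NAME_DEF_STMT (op);
    if (gimple_code (def) == GIMPLE_PHI && loop_carried_phi_p (def))
      return rank + PHI_LOOP_BIAS;
    return rank;
  }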

> > > >
> > > > > > If the code fragment is in a loop, some architectures can execute
> > > > > > the latter in parallel, so the performance can be much faster than
> > > > > > the former. For the small testcase, the performance gap is over
> > > > > > 10% on both ampere1 and neoverse-n1. So the point here is to avoid
> > > > > > turning the last statement into FMA, and keep it a PLUS_EXPR as
> > > > > > much as possible. (If we are rewriting the op list into parallel,
> > > > > > no special treatment is needed, since the last statement after
> > > > > > rewrite_expr_tree_parallel will be PLUS_EXPR anyway.)
> > > > > >
> > > > > > 2. Function result_feeds_back_from_phi_p is to check for cross
> > > > > > backedge dependency. Added new enum fma_state to describe the
> > > > > > state of FMA candidates.
> > > > > >
> > > > > > With this patch, there's a 3% improvement in 510.parest_r 1-copy
> > > > > > run on ampere1. The compile options are:
> > > > > > "-Ofast -mcpu=ampere1 -flto --param avoid-fma-max-bits=512".
> > > > > >
> > > > > > Best regards,
> > > > > > Di Zhao
> > > > > >
> > > > > > 
> > > > > >
> > > > > >  PR tree-optimization/110279
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > >  * tree-ssa-reassoc.cc (enum fma_state): New enum to
> > > > > >  describe the state of FMA candidates for an op list.
> > > > > >  (rewrite_expr_tree_parallel): Changed boolean
> > > > > >  parameter to enum type.
> > > > > >  (result_feeds_back_from_phi_p): New function to check
> > > > > >  for cross backedge dependency.
> > > > > >  (rank_ops_for_fma): Return enum fma_state. Added new
> > > > > >  parameter.
> > > > > >  (reassociate_bb): If there's backedge dependency in an
> > > > > >  op list, swap the operands before rewrite_expr_tree.
> > > > > >
> > > > > > 

Re: [RFC] > WIDE_INT_MAX_PREC support in wide-int

2023-08-29 Thread Richard Biener via Gcc-patches
On Mon, 28 Aug 2023, Jakub Jelinek wrote:

> Hi!
> 
> While the _BitInt series isn't committed yet, I had a quick look at
> lifting the current lowest limitation on maximum _BitInt precision,
> that wide_int can only support precisions up to WIDE_INT_MAX_PRECISION - 1.
> 
> Note, other limits if that is lifted are INTEGER_CST currently using 3
> unsigned char members and so being able to only hold up to 255 * 64 = 16320
> bit numbers and then TYPE_PRECISION being 16-bit, so limiting us to 65535
> bits.  The INTEGER_CST limit could be dealt with by dropping the
> int_length.offset "cache" and making int_length.extended and
> int_length.unextended members unsigned short rather than unsigned char.
> 
> The following so far just compile tested patch changes wide_int_storage
> to be a union, for precisions up to WIDE_INT_MAX_PRECISION inclusive it
> will work as before (just being no longer trivially copyable type and
> having an inline destructor), while larger precision instead use a pointer
> to heap allocated array.
> For wide_int this is fairly easy (of course, I'd need to see what the
> patch does to gcc code size and compile time performance, some
> growth/slowdown is certain), but I'd like to brainstorm on
> widest_int/widest2_int.
> 
> Currently it is a constant precision storage with WIDE_INT_MAX_PRECISION
> precision (widest2_int twice that), so memory layout-wise on at least 64-bit
> hosts identical to wide_int, just that it doesn't have a precision member and so
> 32 bits smaller on 32-bit hosts.  It is used in lots of places.
> 
> I think the most common is what is done e.g. in tree_int_cst* comparisons
> and similarly, using wi::to_widest () to just compare INTEGER_CSTs.
> That case actually doesn't even use wide_int but widest_extended_tree
> as storage, unless stored into widest_int in between (that happens in
> various spots as well).  For comparisons, it would be fine if
> widest_int_storage/widest_extended_tree storages had a dynamic precision,
> WIDE_INT_MAX_PRECISION for most of the cases (if only
> precision < WIDE_INT_MAX_PRECISION is involved), otherwise the needed
> precision (e.g. for binary ops) which would be what we say have in
> INTEGER_CST or some type, rounded up to whole multiples of HOST_WIDE_INTs
> and if unsigned with multiple of HOST_WIDE_INT precision, have another
> HWI to make it always sign-extended.
> 
> Another common case is how e.g. tree-ssa-ccp.cc uses them, that is mostly
> for bitwise ops and so I think the above would be just fine for that case.
> 
> Another case is how tree-ssa-loop-niter.cc uses it, I think for such a usage
> it really wants something widest, perhaps we could just try to punt for
> _BitInt(N) for N >= WIDE_INT_MAX_PRECISION in there, so that we never care
> about bits beyond that limit?

I'll note tree-ssa-loop-niter.cc also uses GMP in some cases, widest_int
is really trying to be poor-man's GMP by limiting the maximum precision.

> Some passes only use widest_int after the bitint lowering spot, we don't
> really need to care about those.
> 
> I think another possibility could be to make widest_int_storage etc. always
> pretend it has 65536 bit precision or something similarly large and make the
> decision on whether inline array or pointer is used in the storage be done
> using len.  Unfortunately, set_len method is usually called after filling
> the array, not before it (it even sign-extends some cases, so it has to be
> done that late).
> 
> Or for e.g. binary ops compute widest_int precision based on the 2 (for
> binary) or 1 (for unary) operand's .len involved?
> 
> Thoughts on this?

The simplest way would probably be to keep widest_int at 
WIDE_INT_MAX_PRECISION like we have now and assert that this is
enough at ::to_widest time (we probably do already).  And then
declare uses with more precision need to use GMP.

Not sure if that's not also a viable way for wide_int - we're
only losing optimization here, no?

Richard.

> Note, the wide-int.cc change is just to show it does something, it would be
> a waste to put that into self-test when _BitInt can support such sizes.
> 
> 2023-08-28  Jakub Jelinek  
> 
>   * wide-int.h (wide_int_storage): Replace val member with a union of
>   val and valp.  Declare destructor.
>   (wide_int_storage::wide_int_storage): Initialize precision to 0
>   in default ctor.  Allocate u.valp if needed in copy ctor.
>   (wide_int_storage::~wide_int_storage): New.
>   (wide_int_storage::operator =): Delete and/or allocate u.valp if
>   needed.
>   (wide_int_storage::get_val, wide_int_storage::write_val): Return
>   u.valp for precision > WIDE_INT_MAX_PRECISION, otherwise u.val.
>   (wide_int_storage::set_len): Use write_val instead of accessing
>   val directly.
>   (wide_int_storage::create): Allocate u.valp if needed.
>   * value-range.h (irange::maybe_resize): Use a loop instead of
>   memcpy.
>   * wide-int.cc (wide_int_cc_tests): Add a test for 4096 bit 
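
For illustration, a minimal sketch of the union-based storage the
ChangeLog above describes (allocation and copying omitted; the val/valp
members follow the ChangeLog, the rest is assumed):

  /* Sketch: inline array up to WIDE_INT_MAX_PRECISION, heap array
     beyond it, selected by the precision member.  */
  class wide_int_storage
  {
    unsigned int precision;
    union
    {
      HOST_WIDE_INT val[WIDE_INT_MAX_ELTS];
      HOST_WIDE_INT *valp;  /* Used when precision > WIDE_INT_MAX_PRECISION.  */
    } u;
  public:
    HOST_WIDE_INT *write_val ()
    { return precision > WIDE_INT_MAX_PRECISION ? u.valp : u.val; }
    ~wide_int_storage ()
    {
      if (precision > WIDE_INT_MAX_PRECISION)
        XDELETEVEC (u.valp);
    }
  };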

Re: Bind RTL to a TREE expr (Re: [Bug target/111166])

2023-08-29 Thread Richard Biener via Gcc-patches
On Tue, 29 Aug 2023, Jiufu Guo wrote:

> 
> Hi Richard,
> 
> Thanks a lot for your quick reply!
> 
> Richard Biener  writes:
> 
> > On Tue, 29 Aug 2023, Jiufu Guo wrote:
> >
> >> 
> >> Hi All!
> >> 
> >> "rguenth at gcc dot gnu.org"  writes:
> >> 
> >> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111166
> >> ...
> >> >
> >> >
> >> > At RTL expansion time we store to D.2865 where it's DECL_RTL is r82:TI so
> >> > we can hardly fix it there.  Only a later pass could figure each of the
> >> > insns fully define the reg.
> >> >
> >> > Jiufu Guo is working to improve what we choose for DECL_RTL, but for
> >> > incoming params / outgoing return.  This is a case where we could,
> >> > with -fno-tree-vectorize, improve DECL_RTL for an automatic var and
> >> > choose not TImode but something like a (concat:TI reg:DI reg:DI).
> >> 
> >> Here is the patch about improving the parameters and returns in
> >> registers.
> >> https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628213.html
> >> 
> >> I have a question about how to bind an RTL to a TREE expression.
> >> In this patch, a map TREE->RTL is used. But it would be better if
> >> there was a faster way.
> >> 
> >> We have DECL_RTL/INCOMING_RTL, but they can only be bound to
> >> DECL(or PARM). In the above patch, the TREE can be an EXPR
> >> (e.g. COMPONENT_REF/ARRAY_REF).
> >> 
> >> Is there a way to achieve this? Thanks for suggestions!
> >
> > No, but we don't need to bind RTL to COMPONENT_REF and friends,
> > what we want to change is the DECL_RTL of the underlying DECL.
> 
> In the above patch, the scalarized rtx for the access of the
> parameter/returns are computed at the time when parameters
> are set up.  And record "scalarized rtx" and "access expression".
> When expanding an expression, the patch queries the scalarized rtx.
> 
> +  rtx x = get_scalar_rtx_for_aggregate_expr (exp);
> +  if (x)
> +return x;
> 
> I'm reading "don't need to bind RTL to COMPONENT_REF and friends"
> and "change is the DECL_RTL of the underlying DECL."
> This may be doable. The method would be:
> 1. When the incoming/outgoing registers are determined, we can
>   check if the parameter/return can be scalarized, **then bind
>   the registers to DECL_RTL of the parm/ret**.
> 2. When expanding the expression (e.g. COMPONENT_REF), compute the
>   scalarized rtx from DECL_RTL of the param/return.
>   In expand_expr_real_1:
>   case COMPONENT_REF: ... case ARRAY_REF: if base is parm...
> 
> Is my understanding correct?

Yes, that's how it works today.  The target computes DECL_RTL
for the parameter (could be a BLKmode memory), expansion
of references first expands the base and gets DECL_RTL
and then extracts the piece as analyzed via extract_bit_field
or more direct means.

As said in the review attempt sent out just now the complication
is allowing more "complex" DECL_RTL, say a set of possibly
different sized pseudos rather than a single pseudo or MEM.
There's support for CONCAT already (for _Complex), some
rough support for PARALLEL (not sure what it actually supports).
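
For a 16-byte automatic variable that would be a sketch along these
lines (decl is a placeholder; only existing RTL constructors are used):

  /* Instead of a single TImode pseudo, make DECL_RTL a concatenation
     of two DImode pseudos so each half can be defined independently.  */
  rtx lo = gen_reg_rtx (DImode);
  rtx hi = gen_reg_rtx (DImode);
  SET_DECL_RTL (decl, gen_rtx_CONCAT (TImode, lo, hi));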

Richard.

> BR,
> Jeff (Jiufu Guo)
> 
> >
> > Richard.
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH V1 1/2] light expander sra v0

2023-08-29 Thread Richard Biener via Gcc-patches
On Wed, 23 Aug 2023, Jiufu Guo wrote:

> 
> Hi,
> 
> I just updated the patch.  We could review this one.
> 
> Compare with previous patch:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627287.html
> This version:
> * Supports bitfield access from one register.
> * Allow return scalar registers cleaned via constructor.
> 
> Bootstrapped and regtested on x86_64-redhat-linux, and
> powerpc64{,le}-linux-gnu.
> 
> Is it ok for trunk?

Some comments inline - not a full review (and sorry for the delay).

> 
>   PR target/65421
>   PR target/69143
> 
> gcc/ChangeLog:
> 
>   * cfgexpand.cc (extract_bit_field): Extern declare.
>   (struct access): New class.
>   (struct expand_sra): New class.
>   (expand_sra::build_access): New member function.
>   (expand_sra::visit_base): Likewise.
>   (expand_sra::analyze_default_stmt): Likewise.
>   (expand_sra::analyze_assign): Likewise.
>   (expand_sra::add_sra_candidate): Likewise.
>   (expand_sra::collect_sra_candidates): Likewise.
>   (expand_sra::valid_scalariable_accesses): Likewise.
>   (expand_sra::prepare_expander_sra): Likewise.
>   (expand_sra::expand_sra): Class constructor.
>   (expand_sra::~expand_sra): Class destructor.
>   (expand_sra::get_scalarized_rtx): New member function.
>   (extract_one_reg): New function.
>   (extract_bitfield): New function.
>   (expand_sra::scalarize_access): New member function.
>   (expand_sra::scalarize_accesses): New member function.
>   (get_scalar_rtx_for_aggregate_expr): New function.
>   (set_scalar_rtx_for_aggregate_access): New function.
>   (set_scalar_rtx_for_returns): New function.
>   (expand_return): Call get_scalar_rtx_for_aggregate_expr.
>   (expand_debug_expr): Call get_scalar_rtx_for_aggregate_expr.
>   (pass_expand::execute): Update to use the expand_sra.
>   * expr.cc (get_scalar_rtx_for_aggregate_expr): Extern declare.
>   (expand_assignment): Call get_scalar_rtx_for_aggregate_expr.
>   (expand_expr_real): Call get_scalar_rtx_for_aggregate_expr.
>   * function.cc (set_scalar_rtx_for_aggregate_access):  Extern declare.
>   (set_scalar_rtx_for_returns): Extern declare.
>   (assign_parm_setup_block): Call set_scalar_rtx_for_aggregate_access.
>   (assign_parms): Call set_scalar_rtx_for_aggregate_access. 
>   (expand_function_start): Call set_scalar_rtx_for_returns.
>   * tree-sra.h (struct base_access): New class.
>   (struct default_analyzer): New class.
>   (scan_function): New function template.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.target/powerpc/pr102024.C: Updated.
>   * gcc.target/powerpc/pr108073.c: New test.
>   * gcc.target/powerpc/pr65421-1.c: New test.
>   * gcc.target/powerpc/pr65421-2.c: New test.
> 
> ---
>  gcc/cfgexpand.cc | 474 ++-
>  gcc/expr.cc  |  29 +-
>  gcc/function.cc  |  28 +-
>  gcc/tree-sra.h   |  77 +++
>  gcc/testsuite/g++.target/powerpc/pr102024.C  |   2 +-
>  gcc/testsuite/gcc.target/powerpc/pr108073.c  |  29 ++
>  gcc/testsuite/gcc.target/powerpc/pr65421-1.c |   6 +
>  gcc/testsuite/gcc.target/powerpc/pr65421-2.c |  32 ++
>  8 files changed, 668 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-2.c
> 
> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> index 
> edf292cfbe95ac2711faee7769e839cb4edb0dd3..385b6c781aa2805e7ca40293a0ae84f87e23e0b6
>  100644
> --- a/gcc/cfgexpand.cc
> +++ b/gcc/cfgexpand.cc
> @@ -74,6 +74,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "output.h"
>  #include "builtins.h"
>  #include "opts.h"
> +#include "tree-sra.h"
>  
>  /* Some systems use __main in a way incompatible with its use in gcc, in 
> these
> cases use the macros NAME__MAIN to give a quoted symbol and SYMBOL__MAIN 
> to
> @@ -97,6 +98,468 @@ static bool defer_stack_allocation (tree, bool);
>  
>  static void record_alignment_for_reg_var (unsigned int);
>  
> +extern rtx extract_bit_field (rtx, poly_uint64, poly_uint64, int, rtx,
> +   machine_mode, machine_mode, bool, rtx *);

belongs in some header

> +
> +/* For light SRA in the expander for parameters and returns.  */
> +struct access : public base_access
> +{
> +  /* The rtx for the access: link to incoming/returning register(s).  */
> +  rtx rtx_val;
> +};
> +
> +typedef struct access *access_p;
> +
> +struct expand_sra : public default_analyzer
> +{

Both 'base_access' and 'default_analyzer' need a more specific
name I think.  Just throwing in two names here,
'sra_access_base' and 'sra_default_analyzer'

> +  expand_sra ();
> +  ~expand_sra ();
> +
> +  /* Now use default APIs, no actions for
> + 

Re: [PATCH] tree-ssa-math-opts: Improve uaddc/usubc pattern matching [PR111209]

2023-08-29 Thread Richard Biener via Gcc-patches
On Tue, 29 Aug 2023, Jakub Jelinek wrote:

> Hi!
> 
> The uaddc/usubc usual matching is of the .{ADD,SUB}_OVERFLOW pair in the
> middle, which adds/subtracts carry-in (from lower limbs) and computes
> carry-out (to higher limbs).  Before optimizations (unless user writes
> it intentionally that way already), all the steps look the same, but
> optimizations simplify the handling of the least significant limb
> (one which adds/subtracts 0 carry-in) to just a single
> .{ADD,SUB}_OVERFLOW and the handling of the most significant limb
> if the computed carry-out is ignored to normal addition/subtraction
> of multiple operands.
> Now, match_uaddc_usubc has code to turn that least significant
> .{ADD,SUB}_OVERFLOW call into .U{ADD,SUB}C call with 0 carry-in if
> a more significant limb above it is matched into .U{ADD,SUB}C; this
> isn't necessary for functionality, as .ADD_OVERFLOW (x, y) is
> functionally equal to .UADDC (x, y, 0) (provided the types of operands
> are the same and result is complex type with that type element), and
> it also has code to match the most significant limb with ignored carry-out
> (in that case one pattern match turns both the penultimate limb pair of
> .{ADD,SUB}_OVERFLOW into .U{ADD,SUB}C and the addition/subtraction
> of the 4 values (2 carries) into another .U{ADD,SUB}C).
> 
> As the following patch shows, what we weren't handling is the case when
> one uses either the __builtin_{add,sub}c builtins or hand written forms
> thereof (either __builtin_*_overflow or even that written by hand) for
> just 2 limbs, where the least significant has 0 carry-in and the most
> significant ignores carry-out.  The following patch matches that, e.g.
>   _16 = .ADD_OVERFLOW (_1, _2);
>   _17 = REALPART_EXPR <_16>;
>   _18 = IMAGPART_EXPR <_16>;
>   _15 = _3 + _4;
>   _12 = _15 + _18;
> into
>   _16 = .UADDC (_1, _2, 0);
>   _17 = REALPART_EXPR <_16>;
>   _18 = IMAGPART_EXPR <_16>;
>   _19 = .UADDC (_3, _4, _18);
>   _12 = IMAGPART_EXPR <_19>;
> so that we can emit better code.
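
A source-level shape that now matches would be something like this
sketch (using the standard __builtin_add_overflow; not from the
testcase):

  typedef unsigned long long u64;

  /* Two limbs: zero carry-in on the low limb, carry-out of the high
     limb ignored.  With the patch both additions become .UADDC.  */
  void
  add2 (u64 r[2], const u64 a[2], const u64 b[2])
  {
    u64 lo;
    u64 carry = __builtin_add_overflow (a[0], b[0], &lo);
    r[0] = lo;
    r[1] = a[1] + b[1] + carry;
  }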
> 
> As the 2 later comments show, we must do that carefully, because the
> pass walks the IL from first to last stmt in a bb and we must avoid
> pattern matching this way something that should be matched on a later
> instruction differently.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2023-08-29  Jakub Jelinek  
> 
>   PR middle-end/79173
>   PR middle-end/111209
>   * tree-ssa-math-opts.cc (match_uaddc_usubc): Match also
>   just 2 limb uaddc/usubc with 0 carry-in on lower limb and ignored
>   carry-out on higher limb.  Don't match it though if it could be
>   matched later on 4 argument addition/subtraction.
> 
>   * gcc.target/i386/pr79173-12.c: New test.
> 
> --- gcc/tree-ssa-math-opts.cc.jj  2023-08-08 15:55:09.498122557 +0200
> +++ gcc/tree-ssa-math-opts.cc 2023-08-28 20:51:31.893886862 +0200
> @@ -4641,8 +4641,135 @@ match_uaddc_usubc (gimple_stmt_iterator
>  __imag__ of something, verify it is .UADDC/.USUBC.  */
>   tree rhs1 = gimple_assign_rhs1 (im);
>   gimple *ovf = SSA_NAME_DEF_STMT (TREE_OPERAND (rhs1, 0));
> + tree ovf_lhs = NULL_TREE;
> + tree ovf_arg1 = NULL_TREE, ovf_arg2 = NULL_TREE;
>   if (gimple_call_internal_p (ovf, code == PLUS_EXPR
> -  ? IFN_UADDC : IFN_USUBC)
> +  ? IFN_ADD_OVERFLOW
> +  : IFN_SUB_OVERFLOW))
> +   {
> + /* Or verify it is .ADD_OVERFLOW/.SUB_OVERFLOW.
> +This is for the case of 2 chained .UADDC/.USUBC,
> +where the first one uses 0 carry-in and the second
> +one ignores the carry-out.
> +So, something like:
> +_16 = .ADD_OVERFLOW (_1, _2);
> +_17 = REALPART_EXPR <_16>;
> +_18 = IMAGPART_EXPR <_16>;
> +_15 = _3 + _4;
> +_12 = _15 + _18;
> +where the first 3 statements come from the lower
> +limb addition and the last 2 from the higher limb
> +which ignores carry-out.  */
> + ovf_lhs = gimple_call_lhs (ovf);
> + tree ovf_lhs_type = TREE_TYPE (TREE_TYPE (ovf_lhs));
> + ovf_arg1 = gimple_call_arg (ovf, 0);
> + ovf_arg2 = gimple_call_arg (ovf, 1);
> + /* In that case we need to punt if the types don't
> +match.  */
> + if (!types_compatible_p (type, ovf_lhs_type)
> + || !types_compatible_p (type, TREE_TYPE (ovf_arg1))
> + || !types_compatible_p (type,
> + 

Re: [PATCH] [tree-optimization/110279] swap operands in reassoc to reduce cross backedge FMA

2023-08-29 Thread Richard Biener via Gcc-patches
On Tue, Aug 29, 2023 at 9:49 AM Di Zhao OS
 wrote:
>
> Hi,
>
> > -Original Message-
> > From: Richard Biener 
> > Sent: Tuesday, August 29, 2023 3:41 PM
> > To: Jeff Law ; Martin Jambor 
> > Cc: Di Zhao OS ; gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH] [tree-optimization/110279] swap operands in reassoc to
> > reduce cross backedge FMA
> >
> > On Tue, Aug 29, 2023 at 1:23 AM Jeff Law via Gcc-patches
> >  wrote:
> > >
> > >
> > >
> > > On 8/28/23 02:17, Di Zhao OS via Gcc-patches wrote:
> > > > This patch tries to fix the 2% regression in 510.parest_r on
> > > > ampere1 in the tracker. (Previous discussion is here:
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624893.html)
> > > >
> > > > 1. Add testcases for the problem. For an op list in the form of
> > > > "acc = a * b + c * d + acc", currently reassociation doesn't
> > > > swap the operands so that more FMAs can be generated.
> > > > After widening_mul the result looks like:
> > > >
> > > > _1 = .FMA(a, b, acc_0);
> > > > acc_1 = .FMA(c, d, _1);
> > > >
> > > > While previously (before the "Handle FMA friendly..." patch),
> > > > widening_mul's result was like:
> > > >
> > > > _1 = a * b;
> > > > _2 = .FMA (c, d, _1);
> > > > acc_1 = acc_0 + _2;
> >
> > How can we execute the multiply and the FMA in parallel?  They
> > depend on each other.  Or is it the uarch can handle dependence
> > on the add operand but only when it is with a multiplication and
> > not a FMA in some better ways?  (I'd doubt so much complexity)
> >
> > Can you explain in more detail how the uarch executes one vs. the
> > other case?
> >
> > > > If the code fragment is in a loop, some architectures can execute
> > > > the latter in parallel, so the performance can be much faster than
> > > > the former. For the small testcase, the performance gap is over
> > > > 10% on both ampere1 and neoverse-n1. So the point here is to avoid
> > > > turning the last statement into FMA, and keep it a PLUS_EXPR as
> > > > much as possible. (If we are rewriting the op list into parallel,
> > > > no special treatment is needed, since the last statement after
> > > > rewrite_expr_tree_parallel will be PLUS_EXPR anyway.)
> > > >
> > > > 2. Function result_feeds_back_from_phi_p is to check for cross
> > > > backedge dependency. Added new enum fma_state to describe the
> > > > state of FMA candidates.
> > > >
> > > > With this patch, there's a 3% improvement in 510.parest_r 1-copy
> > > > run on ampere1. The compile options are:
> > > > "-Ofast -mcpu=ampere1 -flto --param avoid-fma-max-bits=512".
> > > >
> > > > Best regards,
> > > > Di Zhao
> > > >
> > > > 
> > > >
> > > >  PR tree-optimization/110279
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >  * tree-ssa-reassoc.cc (enum fma_state): New enum to
> > > >  describe the state of FMA candidates for an op list.
> > > >  (rewrite_expr_tree_parallel): Changed boolean
> > > >  parameter to enum type.
> > > >  (result_feeds_back_from_phi_p): New function to check
> > > >  for cross backedge dependency.
> > > >  (rank_ops_for_fma): Return enum fma_state. Added new
> > > >  parameter.
> > > >  (reassociate_bb): If there's backedge dependency in an
> > > >  op list, swap the operands before rewrite_expr_tree.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >  * gcc.dg/pr110279.c: New test.
> > > Not a review, but more of a question -- isn't this transformation's
> > > profitability uarch sensitive.  ie, just because it's bad for a set of
> > > aarch64 uarches, doesn't mean it's bad everywhere.
> > >
> > > And in general we shy away from trying to adjust gimple code based on
> > > uarch preferences.
> > >
> > > It seems the right place to do this is gimple->rtl expansion.
> >
> > Another comment is that FMA forming has this deferring code which I
> > think deals exactly with this kind of thing?  CCing Martin who did this
> > work based on AMD uarchs also not wanting cross-loop dependences
> > on FMAs (or so).  In particular I see
> >
> >   if (fma_state.m_deferring_p
> >   && fma_state.m_initial_phi)
> > {
> >   gcc_checking_assert (fma_state.m_last_result);
>   if (!last_fma_candidate_feeds_initial_phi (&fma_state,
>  &m_last_result_set))
> cancel_fma_deferring (&fma_state);
> >
> > and I think code to avoid FMAs in other/related cases should be here
> > as well, like avoid forming back-to-back FMAs.
>
> The changes in this patch are controlled by "param_avoid_fma_max_bits", so
> I think it should only affect architectures with similar behavior. (The
> parameter was added in a previous patch "Deferring FMA transformations
> in tight loops", which seems to be dealing with the same issue.)

That's what I said.  Is the pipeline behavior properly modeled by the
scheduler description?  Your patch seems to not only affect loops
but the FMA forming case 

Re: [PATCH] [tree-optimization/110279] swap operands in reassoc to reduce cross backedge FMA

2023-08-29 Thread Richard Biener via Gcc-patches
On Tue, Aug 29, 2023 at 1:23 AM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 8/28/23 02:17, Di Zhao OS via Gcc-patches wrote:
> > This patch tries to fix the 2% regression in 510.parest_r on
> > ampere1 in the tracker. (Previous discussion is here:
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624893.html)
> >
> > 1. Add testcases for the problem. For an op list in the form of
> > "acc = a * b + c * d + acc", currently reassociation doesn't
> > swap the operands so that more FMAs can be generated.
> > After widening_mul the result looks like:
> >
> > _1 = .FMA(a, b, acc_0);
> > acc_1 = .FMA(c, d, _1);
> >
> > While previously (before the "Handle FMA friendly..." patch),
> > widening_mul's result was like:
> >
> > _1 = a * b;
> > _2 = .FMA (c, d, _1);
> > acc_1 = acc_0 + _2;

How can we execute the multiply and the FMA in parallel?  They
depend on each other.  Or is it the uarch can handle dependence
on the add operand but only when it is with a multiplication and
not a FMA in some better ways?  (I'd doubt so much complexity)

Can you explain in more detail how the uarch executes one vs. the
other case?

> > If the code fragment is in a loop, some architectures can execute
> > the latter in parallel, so the performance can be much faster than
> > the former. For the small testcase, the performance gap is over
> > 10% on both ampere1 and neoverse-n1. So the point here is to avoid
> > turning the last statement into FMA, and keep it a PLUS_EXPR as
> > much as possible. (If we are rewriting the op list into parallel,
> > no special treatment is needed, since the last statement after
> > rewrite_expr_tree_parallel will be PLUS_EXPR anyway.)
> >
> > 2. Function result_feeds_back_from_phi_p is to check for cross
> > backedge dependency. Added new enum fma_state to describe the
> > state of FMA candidates.
> >
> > With this patch, there's a 3% improvement in 510.parest_r 1-copy
> > run on ampere1. The compile options are:
> > "-Ofast -mcpu=ampere1 -flto --param avoid-fma-max-bits=512".
> >
> > Best regards,
> > Di Zhao
> >
> > 
> >
> >  PR tree-optimization/110279
> >
> > gcc/ChangeLog:
> >
> >  * tree-ssa-reassoc.cc (enum fma_state): New enum to
> >  describe the state of FMA candidates for an op list.
> >  (rewrite_expr_tree_parallel): Changed boolean
> >  parameter to enum type.
> >  (result_feeds_back_from_phi_p): New function to check
> >  for cross backedge dependency.
> >  (rank_ops_for_fma): Return enum fma_state. Added new
> >  parameter.
> >  (reassociate_bb): If there's backedge dependency in an
> >  op list, swap the operands before rewrite_expr_tree.
> >
> > gcc/testsuite/ChangeLog:
> >
> >  * gcc.dg/pr110279.c: New test.
> Not a review, but more of a question -- isn't this transformation's
> profitability uarch sensitive.  ie, just because it's bad for a set of
> aarch64 uarches, doesn't mean it's bad everywhere.
>
> And in general we shy away from trying to adjust gimple code based on
> uarch preferences.
>
> It seems the right place to do this is gimple->rtl expansion.

Another comment is that FMA forming has this deferring code which I
think deals exactly with this kind of thing?  CCing Martin who did this
work based on AMD uarchs also not wanting cross-loop dependences
on FMAs (or so).  In particular I see

  if (fma_state.m_deferring_p
  && fma_state.m_initial_phi)
{
  gcc_checking_assert (fma_state.m_last_result);
  if (!last_fma_candidate_feeds_initial_phi (&fma_state,
 &m_last_result_set))
 cancel_fma_deferring (&fma_state);

and I think code to avoid FMAs in other/related cases should be here
as well, like avoid forming back-to-back FMAs.

Richard.

> Jeff


Re: [PATCH] MATCH: Move `(x | y) & (~x ^ y)` over to use bitwise_inverted_equal_p

2023-08-29 Thread Richard Biener via Gcc-patches
On Mon, Aug 28, 2023 at 10:15 PM Andrew Pinski via Gcc-patches
 wrote:
>
> This moves the match pattern `(x | y) & (~x ^ y)` over to use 
> bitwise_inverted_equal_p.
> This now also allows optimizing comparisons and also catches the missed `(~x 
> | y) & (x ^ y)`
> transformation into `~x & y`.
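
(Per-bit sanity check of the new form: for a bit x = 0,
(~x | y) & (x ^ y) = 1 & y = y = ~x & y; for x = 1 it is
y & ~y = 0 = ~x & y.)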
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK

> gcc/ChangeLog:
>
> PR tree-optimization/111147
> * match.pd (`(x | y) & (~x ^ y)`): Use bitwise_inverted_equal_p
> instead of matching bit_not.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/111147
> * gcc.dg/tree-ssa/cmpbit-4.c: New test.
> ---
>  gcc/match.pd |  7 +++-
>  gcc/testsuite/gcc.dg/tree-ssa/cmpbit-4.c | 47 
>  2 files changed, 52 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/cmpbit-4.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index e6bdc3149b6..47d2733211a 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1616,8 +1616,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>
>  /* (x | y) & (~x ^ y) -> x & y */
>  (simplify
> - (bit_and:c (bit_ior:c @0 @1) (bit_xor:c @1 (bit_not @0)))
> - (bit_and @0 @1))
> + (bit_and:c (bit_ior:c @0 @1) (bit_xor:c @1 @2))
> + (with { bool wascmp; }
> +  (if (bitwise_inverted_equal_p (@0, @2, wascmp)
> +   && (!wascmp || element_precision (type) == 1))
> +   (bit_and @0 @1
>
>  /* (~x | y) & (x | ~y) -> ~(x ^ y) */
>  (simplify
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cmpbit-4.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/cmpbit-4.c
> new file mode 100644
> index 000..cdba5d623af
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/cmpbit-4.c
> @@ -0,0 +1,47 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized-raw" } */
> +
> +int g(int x, int y)
> +{
> +  int xp = ~x;
> +  return (x | y) & (xp ^ y); // x & y
> +}
> +int g0(int x, int y)
> +{
> +  int xp = ~x;
> +  return (xp | y) & (x ^ y); // ~x & y
> +}
> +
> +_Bool gb(_Bool x, _Bool y)
> +{
> +  _Bool xp = !x;
> +  return (x | y) & (xp ^ y); // x & y
> +}
> +_Bool gb0(_Bool x, _Bool y)
> +{
> +  _Bool xp = !x;
> +  return (xp | y) & (x ^ y); // !x & y
> +}
> +
> +
> +_Bool gbi(int a, int b)
> +{
> +  _Bool x = a < 2;
> +  _Bool y = b < 3;
> +  _Bool xp = !x;
> +  return (x | y) & (xp ^ y); // x & y
> +}
> +_Bool gbi0(int a, int b)
> +{
> +  _Bool x = a < 2;
> +  _Bool y = b < 3;
> +  _Bool xp = !x;
> +  return (xp | y) & (x ^ y); // !x & y
> +}
> +
> +/* All of these should be optimized to `x & y` or `~x & y` */
> +/* { dg-final { scan-tree-dump-times "le_expr, " 3 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "gt_expr, " 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-not "bit_xor_expr, " "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "bit_and_expr, " 6 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "bit_not_expr, " 2 "optimized" } } */
> --
> 2.31.1
>


Re: [PATCH] vect test: Remove xfail for riscv

2023-08-29 Thread Richard Biener via Gcc-patches
On Tue, 29 Aug 2023, Juzhe-Zhong wrote:

> We are planning to enable the "vect" testsuite with scalable vector 
> auto-vectorization.
> 
> This case XPASS:
> XPASS: gcc.dg/vect/no-scevccp-outer-12.c scan-tree-dump-times vect "OUTER 
> LOOP VECTORIZED." 1
> 
> like ARM SVE.

OK

> ---
>  gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c 
> b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c
> index e9ec4ca0da3..c2d3031bc0c 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c
> @@ -47,4 +47,4 @@ int main (void)
>  }
>  
>  /* Until we support multiple types in the inner loop  */
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { 
> xfail { ! aarch64*-*-* } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { 
> xfail { ! { aarch64*-*-* riscv*-*-* } } } } } */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: Bind RTL to a TREE expr (Re: [Bug target/111166])

2023-08-29 Thread Richard Biener via Gcc-patches
On Tue, 29 Aug 2023, Jiufu Guo wrote:

> 
> Hi All!
> 
> "rguenth at gcc dot gnu.org"  writes:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111166
> ...
> >
> >
> > At RTL expansion time we store to D.2865 where it's DECL_RTL is r82:TI so
> > we can hardly fix it there.  Only a later pass could figure each of the
> > insns fully define the reg.
> >
> > Jiufu Guo is working to improve what we choose for DECL_RTL, but for
> > incoming params / outgoing return.  This is a case where we could,
> > with -fno-tree-vectorize, improve DECL_RTL for an automatic var and
> > choose not TImode but something like a (concat:TI reg:DI reg:DI).
> 
> Here is the patch about improving the parameters and returns in
> registers.
> https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628213.html
> 
> I have a question about how to bind an RTL to a TREE expression.
> In this patch, a map TREE->RTL is used. But it would be better if
> there was a faster way.
> 
> We have DECL_RTL/INCOMING_RTL, but they can only be bound to
> DECL(or PARM). In the above patch, the TREE can be an EXPR
> (e.g. COMPONENT_REF/ARRAY_REF).
> 
> Is there a way to achieve this? Thanks for suggestions!

No, but we don't need to bind RTL to COMPONENT_REF and friends,
what we want to change is the DECL_RTL of the underlying DECL.

Richard.


Re: Ping^^ [PATCH V5 2/2] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-08-28 Thread Richard Biener via Gcc-patches
On Wed, 23 Aug 2023, guojiufu wrote:

> Hi,
> 
> I would like to have a gentle ping...
> 
> BR,
> Jeff (Jiufu Guo)
> 
> On 2023-08-07 10:45, guojiufu via Gcc-patches wrote:
> > Hi,
> > 
> > Gentle ping...
> > 
> > On 2023-07-18 22:05, Jiufu Guo wrote:
> >> Hi,
> >> 
> >> Integer expression "(X - N * M) / N" can be optimized to "X / N - M"
> >> if there is no wrap/overflow/underflow and "X - N * M" has the same
> >> sign with "X".
> >> 
> >> Compare the previous version:
> >> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624067.html
> >> - APIs: overflow, nonnegative_p and nonpositive_p are moved close
> >>   to value range.
> >> - Use above APIs in match.pd.
> >> 
> >> Bootstrap & regtest pass on ppc64{,le} and x86_64.
> >> Is this patch ok for trunk?
> >> 
> >> BR,
> >> Jeff (Jiufu Guo)
> >> 
> >>  PR tree-optimization/108757
> >> 
> >> gcc/ChangeLog:
> >> 
> >>  * match.pd ((X - N * M) / N): New pattern.
> >>  ((X + N * M) / N): New pattern.
> >>  ((X + C) div_rshift N): New pattern.
> >> 
> >> gcc/testsuite/ChangeLog:
> >> 
> >>  * gcc.dg/pr108757-1.c: New test.
> >>  * gcc.dg/pr108757-2.c: New test.
> >>  * gcc.dg/pr108757.h: New test.
> >> 
> >> ---
> >>  gcc/match.pd  |  85 +++
> >>  gcc/testsuite/gcc.dg/pr108757-1.c |  18 +++
> >>  gcc/testsuite/gcc.dg/pr108757-2.c |  19 +++
> >>  gcc/testsuite/gcc.dg/pr108757.h   | 233 
> >> ++
> >>  4 files changed, 355 insertions(+)
> >>  create mode 100644 gcc/testsuite/gcc.dg/pr108757-1.c
> >>  create mode 100644 gcc/testsuite/gcc.dg/pr108757-2.c
> >>  create mode 100644 gcc/testsuite/gcc.dg/pr108757.h
> >> 
> >> diff --git a/gcc/match.pd b/gcc/match.pd
> >> index 8543f777a28..39dbb0567dc 100644
> >> --- a/gcc/match.pd
> >> +++ b/gcc/match.pd
> >> @@ -942,6 +942,91 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >>  #endif
> >> 
> >> 
> >> +#if GIMPLE
> >> +(for div (trunc_div exact_div)
> >> + /* Simplify (t + M*N) / N -> t / N + M.  */
> >> + (simplify
> >> +  (div (plus:c@4 @0 (mult:c@3 @1 @2)) @2)

The :c on the plus isn't necessary?

> >> +  (with {value_range vr0, vr1, vr2, vr3, vr4;}
> >> +  (if (INTEGRAL_TYPE_P (type)
> >> +   && get_range_query (cfun)->range_of_expr (vr1, @1)
> >> +   && get_range_query (cfun)->range_of_expr (vr2, @2)
> >> +   && range_op_handler (MULT_EXPR).overflow_free_p (vr1, vr2)

the multiplication doesn't overflow

> >> +   && get_range_query (cfun)->range_of_expr (vr0, @0)
> >> +   && get_range_query (cfun)->range_of_expr (vr3, @3)
> >> +   && range_op_handler (PLUS_EXPR).overflow_free_p (vr0, vr3)

the add doesn't overflow

> >> +   && get_range_query (cfun)->range_of_expr (vr4, @4)
> >> +   && (TYPE_UNSIGNED (type)
> >> + || (vr0.nonnegative_p () && vr4.nonnegative_p ())
> >> + || (vr0.nonpositive_p () && vr4.nonpositive_p (

I don't know what this checks - the add result and the add first
argument are not of opposite sign.  Huh.  At least this part
needs an explaining comment.

Sorry if we hashed this out before, but you can see I forgot
and it's not obvious.
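
(A concrete case of why some same-sign condition is needed with
truncating division: t = -1, M = 1, N = 2 gives (t + M*N) / N
= 1/2 = 0, but t/N + M = 0 + 1 = 1; here t and t + M*N have
opposite signs.)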

> >> +  (plus (div @0 @2) @1
> >> +
> >> + /* Simplify (t - M*N) / N -> t / N - M.  */
> >> + (simplify
> >> +  (div (minus@4 @0 (mult:c@3 @1 @2)) @2)
> >> +  (with {value_range vr0, vr1, vr2, vr3, vr4;}
> >> +  (if (INTEGRAL_TYPE_P (type)
> >> +   && get_range_query (cfun)->range_of_expr (vr1, @1)
> >> +   && get_range_query (cfun)->range_of_expr (vr2, @2)
> >> +   && range_op_handler (MULT_EXPR).overflow_free_p (vr1, vr2)
> >> +   && get_range_query (cfun)->range_of_expr (vr0, @0)
> >> +   && get_range_query (cfun)->range_of_expr (vr3, @3)
> >> +   && range_op_handler (MINUS_EXPR).overflow_free_p (vr0, vr3)
> >> +   && get_range_query (cfun)->range_of_expr (vr4, @4)
> >> +   && (TYPE_UNSIGNED (type)
> >> + || (vr0.nonnegative_p () && vr4.nonnegative_p ())
> >> + || (vr0.nonpositive_p () && vr4.nonpositive_p (
> >> +  (minus (div @0 @2) @1)

looks like exactly the same - if you use a

 (for addsub (plus minus)

you should be able to do range_op_handler (addsub).

> >> +
> >> +/* Simplify
> >> +   (t + C) / N -> t / N + C / N where C is multiple of N.
> >> +   (t + C) >> N -> t >> N + C >> N if the low N bits of C are 0.  */
> >> +(for op (trunc_div exact_div rshift)
> >> + (simplify
> >> +  (op (plus@3 @0 INTEGER_CST@1) INTEGER_CST@2)
> >> +   (with
> >> +{
> >> +  wide_int c = wi::to_wide (@1);
> >> +  wide_int n = wi::to_wide (@2);
> >> +  bool is_rshift = op == RSHIFT_EXPR;
> >> +  bool neg_c = false;
> >> +  bool ok = false;
> >> +  value_range vr0;
> >> +  if (INTEGRAL_TYPE_P (type)
> >> +&& get_range_query (cfun)->range_of_expr (vr0, @0))
> >> +{
> >> +ok = is_rshift ? wi::ctz (c) >= n.to_shwi ()
> >> +   : wi::multiple_of_p (c, n, TYPE_SIGN (type));
> >> +value_range vr1, vr3;
> >> +ok = ok && get_range_query (cfun)->range_of_expr (vr1, 
