Re: [PATCH] git-backport: support renamed .cc files in commit message.

2022-01-13 Thread Bernhard Reutner-Fischer via Gcc-patches
On Wed, 12 Jan 2022 16:54:46 +0100
Martin Liška  wrote:

> +def replace_file_in_changelog(lines, filename):
> +    if not filename.endswith('.cc'):
> +        return
> +
> +    # consider all components of a path: gcc/ipa-icf.cc
> +    while filename:
> +        for i, line in enumerate(lines):
> +            if filename in line:
> +                line = line.replace(filename, filename[:-1])
> +                lines[i] = line
> +                return
> +        parts = filename.split('/')
> +        if len(parts) == 1:
> +            return
> +        filename = '/'.join(parts[1:])
> +

I think you mean os.sep instead of the hardcoded slash.
But I'd use os.path.split and os.path.join

thanks,


Re: [PATCH] cprop_hardreg: Workaround for narrow mode != lowpart targets

2022-01-13 Thread Richard Biener via Gcc-patches
On Thu, Jan 13, 2022 at 6:12 PM Andreas Krebbel via Gcc-patches
 wrote:
>
> The cprop_hardreg pass is built around the assumption that accessing a
> register in a narrower mode is the same as accessing the lowpart of
> the register.  This unfortunately is not true for vector registers on
> IBM Z. This caused a miscompile of LLVM with GCC 8.5. The problem
> could not be reproduced with upstream GCC unfortunately but we have to
> assume that it is latent there. The right fix would require
> substantial changes to the cprop pass and is certainly something we
> would want for our platform. But since this would not be acceptable
> for older GCCs I'll go with what Vladimir proposed in the RedHat BZ
> and introduce a hopefully temporary and undocumented target hook to
> disable that specific transformation in regcprop.c.
>
> Here the RedHat BZ for reference:
> https://bugzilla.redhat.com/show_bug.cgi?id=2028609

Can the gist of this bug be put into the GCC bugzilla so the rev can
refer to it?  Can we have a testcase even?

I'm not quite understanding the problem but is it that, say,

 (subreg:DI (reg:V2DI ..) 0)

isn't the same as

 (lowpart:DI (reg:V2DI ...) 0)

?  The regcprop code looks more like asking whether the larger reg
is a composition of multiple other hardregs and will return the specific
hardreg corresponding to the lowpart - so like if on s390 the vector
registers overlap with some other regset.  But then doing the actual
accesses via the other regset regs doesn't actually work?  Isn't the
backend then lying to us (aka the mode_change_ok returns the
wrong answer)?
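
A toy C model, not GCC code, may help picture the distinction being asked about. It assumes a 16-byte big-endian register in which a narrow DImode access lands in the leftmost 8 bytes; that placement is an assumption made for illustration only, and the names vreg, load_be64, lowpart_di and narrow_di are hypothetical.

```c
#include <stdint.h>

/* Toy 16-byte register; bytes[0] is the most significant byte.  */
struct vreg
{
  uint8_t bytes[16];
};

/* Assemble 8 consecutive bytes, most significant first.  */
static inline uint64_t
load_be64 (const uint8_t *p)
{
  uint64_t v = 0;
  for (int i = 0; i < 8; i++)
    v = (v << 8) | p[i];
  return v;
}

/* Lowpart of the 128-bit value: the least significant half, bytes 8..15.  */
static inline uint64_t
lowpart_di (const struct vreg *r)
{
  return load_be64 (r->bytes + 8);
}

/* ASSUMPTION: a narrow-mode access reads the leftmost lane, bytes 0..7.  */
static inline uint64_t
narrow_di (const struct vreg *r)
{
  return load_be64 (r->bytes);
}
```

Under that assumption the two accessors read different halves of the register, which is the kind of mismatch a pass would hit if it treated them as interchangeable.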

What does the stage1 fix, aka "rewrite" of cprop, look like?  How can we
be sure this hack isn't still present 10 years from now?

Thanks,
Richard.

> Bootstrapped and regression-tested on s390x.
>
> Ok?
>
> gcc/ChangeLog:
>
> * target.def (narrow_mode_refers_low_part_p): Add new target hook.
> * config/s390/s390.c (s390_narrow_mode_refers_low_part_p):
> Implement new target hook for IBM Z.
> (TARGET_NARROW_MODE_REFERS_LOW_PART_P): New macro.
> * regcprop.c (maybe_mode_change): Disable transformation depending
> on the new target hook.
> ---
>  gcc/config/s390/s390.c | 14 ++
>  gcc/regcprop.c |  3 ++-
>  gcc/target.def | 12 +++-
>  3 files changed, 27 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
> index 056002e4a4a..aafc6d63be6 100644
> --- a/gcc/config/s390/s390.c
> +++ b/gcc/config/s390/s390.c
> @@ -10488,6 +10488,18 @@ s390_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
>return false;
>  }
>
> +/* Implement TARGET_NARROW_MODE_REFERS_LOW_PART_P.  */
> +
> +static bool
> +s390_narrow_mode_refers_low_part_p (unsigned int regno)
> +{
> +  if (reg_classes_intersect_p (VEC_REGS, REGNO_REG_CLASS (regno)))
> +return false;
> +
> +  return true;
> +}
> +
> +
>  /* Implement TARGET_MODES_TIEABLE_P.  */
>
>  static bool
> @@ -17472,6 +17484,8 @@ s390_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0, rtx op1,
>  #undef TARGET_VECTORIZE_VEC_PERM_CONST
>  #define TARGET_VECTORIZE_VEC_PERM_CONST s390_vectorize_vec_perm_const
>
> +#undef TARGET_NARROW_MODE_REFERS_LOW_PART_P
> +#define TARGET_NARROW_MODE_REFERS_LOW_PART_P s390_narrow_mode_refers_low_part_p
>
>  struct gcc_target targetm = TARGET_INITIALIZER;
>
> diff --git a/gcc/regcprop.c b/gcc/regcprop.c
> index 1a9bcf0a1ad..aaf94ad9b51 100644
> --- a/gcc/regcprop.c
> +++ b/gcc/regcprop.c
> @@ -426,7 +426,8 @@ maybe_mode_change (machine_mode orig_mode, machine_mode copy_mode,
>
>if (orig_mode == new_mode)
>  return gen_raw_REG (new_mode, regno);
> -  else if (mode_change_ok (orig_mode, new_mode, regno))
> +  else if (mode_change_ok (orig_mode, new_mode, regno)
> +  && targetm.narrow_mode_refers_low_part_p (regno))
>  {
>int copy_nregs = hard_regno_nregs (copy_regno, copy_mode);
>int use_nregs = hard_regno_nregs (copy_regno, new_mode);
> diff --git a/gcc/target.def b/gcc/target.def
> index 8fd2533e90a..598eea501ff 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -5446,6 +5446,16 @@ value that the middle-end intended.",
>   bool, (machine_mode from, machine_mode to, reg_class_t rclass),
>   hook_bool_mode_mode_reg_class_t_true)
>
> +/* This hook is used to work around a problem in regcprop. Hardcoded
> +assumptions currently prevent it from working correctly for targets
> +where the low part of a multi-word register doesn't align to accessing
> +the register with a narrower mode.  */
> +DEFHOOK_UNDOC
> +(narrow_mode_refers_low_part_p,
> +"",
> +bool, (unsigned int regno),
> +hook_bool_unit_true)
> +
>  /* Change pseudo allocno class calculated by IRA.  */
>  DEFHOOK
>  (ira_change_pseudo_allocno_class,
> @@ -5949,7 +5959,7 @@ register if floating point arithmetic is not being 
> done.  As long as the\n\
>  floating registers are not in class @code{GENERAL_REGS}, they will 

Re: [PATCH] Loop unswitching: support gswitch statements.

2022-01-13 Thread Richard Biener via Gcc-patches
On Thu, Jan 13, 2022 at 5:01 PM Martin Liška  wrote:
>
> On 1/6/22 17:30, Martin Liška wrote:
> > I really welcome that, I've pushed devel/loop-unswitch-support-switches
> > branch with first changes you pointed out. Feel free to play with the
> > branch.
>
> Hello.
>
> I've just pushed a revision to the branch that introduced a top-level comment.
> Feel free to play with the branch once you have spare cycles and we can
> return to it next stage1.

Thanks, will do!

Richard.

>
> Cheers,
> Martin


Re: [PATCH] forwprop, v2: Canonicalize atomic fetch_op op x to op_fetch or vice versa [PR98737]

2022-01-13 Thread Richard Biener via Gcc-patches
On Thu, 13 Jan 2022, Jakub Jelinek wrote:

> On Thu, Jan 13, 2022 at 04:07:20PM +0100, Richard Biener wrote:
> > I'm mostly concerned about the replace_uses_by use.  forwprop
> > will go over newly emitted stmts and thus the hypothetical added
> > 
> > lhs2 = d;
> > 
> > record the copy and schedule the stmt for removal, substituting 'd'
> > in each use as it goes along the function and folding them.  It's
> > a bit iffy (and maybe has unintended side-effects in odd cases)
> > to trample around and fold stuff behind that flows back.
> > 
> > I'd always vote to simplify the folding code so it's easier to
> > maintain and not micro-optimize there since it's not going to be
> > a hot part of the compiler.
> 
> Ok.  So like this?
> 
> 2022-01-13  Jakub Jelinek  
> 
>   PR target/98737
>   * tree-ssa-forwprop.c (simplify_builtin_call): Canonicalize
>   __atomic_fetch_op (p, x, y) op x into __atomic_op_fetch (p, x, y)
>   and __atomic_op_fetch (p, x, y) iop x into
>   __atomic_fetch_op (p, x, y).
> 
>   * gcc.dg/tree-ssa/pr98737-1.c: New test.
>   * gcc.dg/tree-ssa/pr98737-2.c: New test.
> 
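
The two canonicalizations named in the ChangeLog above can be illustrated at the source level. This is only a sketch of the equivalences for the add case, assuming the GCC/Clang __atomic builtins and a sequentially consistent memory order; the wrapper function names are hypothetical and are not part of the patch.

```c
#include <stdint.h>

/* fetch_op followed by "op x": the caller wants the new value.  */
static inline uint32_t
new_via_fetch_add (uint32_t *p, uint32_t x)
{
  return __atomic_fetch_add (p, x, __ATOMIC_SEQ_CST) + x;
}

/* Equivalent single call: op_fetch already returns the new value.  */
static inline uint32_t
new_via_add_fetch (uint32_t *p, uint32_t x)
{
  return __atomic_add_fetch (p, x, __ATOMIC_SEQ_CST);
}

/* op_fetch followed by the inverse op ("iop x"): the caller wants the
   old value.  */
static inline uint32_t
old_via_add_fetch (uint32_t *p, uint32_t x)
{
  return __atomic_add_fetch (p, x, __ATOMIC_SEQ_CST) - x;
}

/* Equivalent single call: fetch_op already returns the old value.  */
static inline uint32_t
old_via_fetch_add (uint32_t *p, uint32_t x)
{
  return __atomic_fetch_add (p, x, __ATOMIC_SEQ_CST);
}
```

Each pair updates memory identically and returns the same value, so rewriting the two-step form into the single call is a pure simplification at this level.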
> --- gcc/tree-ssa-forwprop.c.jj   2022-01-11 23:11:23.467275019 +0100
> +++ gcc/tree-ssa-forwprop.c   2022-01-13 18:09:50.318625915 +0100
> @@ -1241,12 +1241,19 @@ constant_pointer_difference (tree p1, tr
> memset (p + 4, ' ', 3);
> into
> memcpy (p, "abcd   ", 7);
> -   call if the latter can be stored by pieces during expansion.  */
> +   call if the latter can be stored by pieces during expansion.
> +
> +   Also canonicalize __atomic_fetch_op (p, x, y) op x
> +   to __atomic_op_fetch (p, x, y) or
> +   __atomic_op_fetch (p, x, y) iop x
> +   to __atomic_fetch_op (p, x, y) when possible (also __sync).  */
>  
>  static bool
>  simplify_builtin_call (gimple_stmt_iterator *gsi_p, tree callee2)
>  {
>gimple *stmt1, *stmt2 = gsi_stmt (*gsi_p);
> +  enum built_in_function other_atomic = END_BUILTINS;
> +  enum tree_code atomic_op = ERROR_MARK;
>tree vuse = gimple_vuse (stmt2);
>if (vuse == NULL)
>  return false;
> @@ -1448,6 +1455,290 @@ simplify_builtin_call (gimple_stmt_itera
>   }
>   }
>break;
> +
> +    #define CASE_ATOMIC(NAME, OTHER, OP) \
> +    case BUILT_IN_##NAME##_1: \
> +    case BUILT_IN_##NAME##_2: \
> +    case BUILT_IN_##NAME##_4: \
> +    case BUILT_IN_##NAME##_8: \
> +    case BUILT_IN_##NAME##_16: \
> +      atomic_op = OP; \
> +      other_atomic \
> +        = (enum built_in_function) (BUILT_IN_##OTHER##_1 \
> +                                    + (DECL_FUNCTION_CODE (callee2) \
> +                                       - BUILT_IN_##NAME##_1)); \
> +      goto handle_atomic_fetch_op;
> +
> +CASE_ATOMIC (ATOMIC_FETCH_ADD, ATOMIC_ADD_FETCH, PLUS_EXPR)
> +CASE_ATOMIC (ATOMIC_FETCH_SUB, ATOMIC_SUB_FETCH, MINUS_EXPR)
> +CASE_ATOMIC (ATOMIC_FETCH_AND, ATOMIC_AND_FETCH, BIT_AND_EXPR)
> +CASE_ATOMIC (ATOMIC_FETCH_XOR, ATOMIC_XOR_FETCH, BIT_XOR_EXPR)
> +CASE_ATOMIC (ATOMIC_FETCH_OR, ATOMIC_OR_FETCH, BIT_IOR_EXPR)
> +
> +CASE_ATOMIC (SYNC_FETCH_AND_ADD, SYNC_ADD_AND_FETCH, PLUS_EXPR)
> +CASE_ATOMIC (SYNC_FETCH_AND_SUB, SYNC_SUB_AND_FETCH, MINUS_EXPR)
> +CASE_ATOMIC (SYNC_FETCH_AND_AND, SYNC_AND_AND_FETCH, BIT_AND_EXPR)
> +CASE_ATOMIC (SYNC_FETCH_AND_XOR, SYNC_XOR_AND_FETCH, BIT_XOR_EXPR)
> +CASE_ATOMIC (SYNC_FETCH_AND_OR, SYNC_OR_AND_FETCH, BIT_IOR_EXPR)
> +
> +CASE_ATOMIC (ATOMIC_ADD_FETCH, ATOMIC_FETCH_ADD, MINUS_EXPR)
> +CASE_ATOMIC (ATOMIC_SUB_FETCH, ATOMIC_FETCH_SUB, PLUS_EXPR)
> +CASE_ATOMIC (ATOMIC_XOR_FETCH, ATOMIC_FETCH_XOR, BIT_XOR_EXPR)
> +
> +CASE_ATOMIC (SYNC_ADD_AND_FETCH, SYNC_FETCH_AND_ADD, MINUS_EXPR)
> +CASE_ATOMIC (SYNC_SUB_AND_FETCH, SYNC_FETCH_AND_SUB, PLUS_EXPR)
> +CASE_ATOMIC (SYNC_XOR_AND_FETCH, SYNC_FETCH_AND_XOR, BIT_XOR_EXPR)
> +
> +#undef CASE_ATOMIC
> +
> +handle_atomic_fetch_op:
> +  if (gimple_call_num_args (stmt2) >= 2 && gimple_call_lhs (stmt2))
> + {
> +   tree lhs2 = gimple_call_lhs (stmt2), lhsc = lhs2;
> +   tree arg = gimple_call_arg (stmt2, 1);
> +   gimple *use_stmt, *cast_stmt = NULL;
> +   use_operand_p use_p;
> +   tree ndecl = builtin_decl_explicit (other_atomic);
> +
> +   if (ndecl == NULL_TREE || !single_imm_use (lhs2, &use_p, &use_stmt))
> + break;
> +
> +   if (gimple_assign_cast_p (use_stmt))
> + {
> +   cast_stmt = use_stmt;
> +   lhsc = gimple_assign_lhs (cast_stmt);
> +   if (lhsc == NULL_TREE
> +   || !INTEGRAL_TYPE_P (TREE_TYPE (lhsc))
> +   || 

Re: [vect] PR103997: Fix epilogue mode skipping

2022-01-13 Thread Richard Biener via Gcc-patches
On Thu, 13 Jan 2022, Andre Vieira (lists) wrote:

> 
> On 13/01/2022 14:25, Richard Biener wrote:
> > On Thu, 13 Jan 2022, Andre Vieira (lists) wrote:
> >
> >> On 13/01/2022 12:36, Richard Biener wrote:
> >>> On Thu, 13 Jan 2022, Andre Vieira (lists) wrote:
> >>>
>  This time to the list too (sorry for double email)
> 
>  Hi,
> 
>  The original patch '[vect] Re-analyze all modes for epilogues' skipped
>  modes that should not be skipped, since it used the vector mode provided
>  by autovectorize_vector_modes to derive the minimum VF required for it.
>  However, those modes should only really be used to dictate vector size,
>  so instead this patch looks for the mode in 'used_vector_modes' with the
>  largest element size, and constructs a vector mode with the same size as
>  the current vector_modes[mode_i]. Since we are using the largest element
>  size, the NUNITS for this mode is the smallest possible VF required for
>  an epilogue with this mode, and it should thus skip only the modes we
>  are certain cannot be used.
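
The arithmetic behind "largest element size gives the smallest NUNITS" can be sketched outside GCC. The helper below is hypothetical and works on plain byte counts rather than machine modes; it starts from a 1-byte element, mirroring the QImode starting point suggested later in this thread.

```c
#include <stddef.h>

/* Fewest lanes a vector of VECTOR_BYTES can have, given the element
   sizes (in bytes) seen in use.  The largest element yields the
   smallest lane count.  */
unsigned
min_nunits (unsigned vector_bytes, const unsigned *elem_bytes, size_t n)
{
  unsigned max_elem = 1; /* Start from a 1-byte element.  */
  for (size_t i = 0; i < n; i++)
    if (elem_bytes[i] > max_elem)
      max_elem = elem_bytes[i];
  return vector_bytes / max_elem;
}
```

For a 16-byte vector with 1-, 4- and 8-byte elements in use this gives 2 lanes, so any skip decision based on that count is the conservative one.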
> 
>  Passes bootstrap and regression on x86_64 and aarch64.
> >>> Clearly
> >>>
> >>> + /* To make sure we are conservative as to what modes we skip, we
> >>> +should check the smallest possible NUNITS which would be
> >>> +derived from the mode in USED_VECTOR_MODES with the largest
> >>> +element size.  */
> >>> + scalar_mode max_elsize_mode = GET_MODE_INNER
> >>> (vector_modes[mode_i]);
> >>> + for (vec_info::mode_set::iterator i =
> >>> +   first_loop_vinfo->used_vector_modes.begin ();
> >>> + i != first_loop_vinfo->used_vector_modes.end (); ++i)
> >>> +   {
> >>> + if (VECTOR_MODE_P (*i)
> >>> + && GET_MODE_SIZE (GET_MODE_INNER (*i))
> >>> + > GET_MODE_SIZE (max_elsize_mode))
> >>> +   max_elsize_mode = GET_MODE_INNER (*i);
> >>> +   }
> >>>
> >>> can be done once before iterating over the modes for the epilogue.
> >> True, I'll start with QImode instead of the inner of vector_modes[mode_i]
> >> too, since we can't guarantee the mode is a VECTOR_MODE_P, and it is
> >> actually better too, since we can't possibly guarantee the element size
> >> of the USED_VECTOR_MODES is smaller than that of the first vector mode...
> >>
> >>> Richard maybe knows whether we should take care to look at the
> >>> size of the vector mode as well since related_vector_mode when
> >>> passed 0 as nunits produces a vector mode with the same size
> >>> as vector_modes[mode_i] but not all used_vector_modes may be
> >>> of the same size
> >> I suspect that should be fine though, since if we use the largest element
> >> size of all used_vector_modes then that should give us the least possible
> >> number of NUNITS and thus only conservatively skip. That said, that does
> >> assume that no vector mode used may be larger than the size of the loop's
> >> vector_mode. Can I assume that?
> > No idea, but I would lean towards a no ;)  I think the loop's vector_mode
> > doesn't have to match vector_modes[mode_i] either, does it?  At least
> > autodetected_vector_mode will be not QImode based.
> The mode doesn't, but both vector modes surely have to be the same vector size;
> I'm not referring to the element size here.
> What I was trying to ask was whether all vector modes in used_vector_modes had
> the same vector size as the loop's vector mode (and the vector_modes[mode_i] it
> originated from).

Definitely not I think.

> >>> (and you probably also want to exclude
> >>> VECTOR_BOOLEAN_TYPE_P from the search?)
> >> Yeah I think so too, thanks!
> >>
> >> I keep going back to thinking (as I brought up in the bugzilla ticket),
> >> maybe we ought to only skip if the NUNITS of the vector mode with the same
> >> vector size as vector_modes[mode_i] is larger than first_info_vf, or just
> >> don't skip at all...
> > The question is how much work we do before realizing the chosen mode
> > cannot be used because there's not enough iterations?  Maybe we can
> > improve there easily?
> IIUC the VF can change depending on whether we decide to use SLP, so really we
> can only check it after we have determined whether or not to use SLP, so
> either:
> * When SLP fully succeeds, so somewhere between the last 'goto again;' and
> return success, but there is very little left to do there
> * When SLP fails: here we could save on some work.

Hmm, yeah.  Guess it's quite expensive then in the end, so it's worth
avoiding useless work.  I do wonder whether we could cache
analysis failures (and VFs in case of success but worse cost) of the
main loop analysis.

> > Also for targets that for the main loop do not perform cost
> > comparison (like x86) but have lots of vector modes the previous
> > mode of operation really made sense (start at next_mode_i or

Re: [PATCH] [i386] GLC tuning: Break false dependency for dest register.

2022-01-13 Thread Hongyu Wang via Gcc-patches
> > No, the approach is wrong. You have to solve output clearing on RTL
> > level, please look at how e.g. tzcnt false dep is solved:
>
> Actually we have considered such an approach before, but we found we need
> to break the original define_insn to remove the mask/rounding subst,
> since define_split does not support subst, and that would add 6 more
> define_insn_and_split and 4 define_insn patterns for each instruction. We
> think such an approach would introduce too much redundant code.
>
> Do you think the code size increment is acceptable?

Also, 100+ more patterns would increase the maintenance effort. If we split
them at the epilogue_completed stage, there seems to be little difference
from putting it in the output template...

Hongyu Wang  wrote on Fri, Jan 14, 2022 at 13:38:
>
> > No, the approach is wrong. You have to solve output clearing on RTL
> > level, please look at how e.g. tzcnt false dep is solved:
>
> Actually we have considered such an approach before, but we found we need
> to break the original define_insn to remove the mask/rounding subst,
> since define_split does not support subst, and that would add 6 more
> define_insn_and_split and 4 define_insn patterns for each instruction. We
> think such an approach would introduce too much redundant code.
>
> Do you think the code size increment is acceptable?
>
> Uros Bizjak via Gcc-patches  wrote on Thu, Jan 13, 2022 at 15:42:
> >
> > On Thu, Jan 13, 2022 at 8:28 AM Hongyu Wang  wrote:
> > >
> > > From: wwwhhhyyy 
> > >
> > > Hi,
> > >
> > > For GoldenCove micro-architecture, force insert zero-idiom in asm
> > > template to break false dependency of dest register for several insns.
> > >
> > > The related insns are:
> > >
> > > VPERM/D/Q/PS/PD
> > > VRANGEPD/PS/SD/SS
> > > VGETMANTSS/SD/SH
> > > VGETMANTPS/PD - mem version only
> > > VPMULLQ
> > > VFMULCSH/PH
> > > VFCMULCSH/PH
> > >
> > > Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}
> > >
> > > Ok for master?
> >
> > No, the approach is wrong. You have to solve output clearing on RTL
> > level, please look at how e.g. tzcnt false dep is solved:
> >
> >   [(set (reg:CCC FLAGS_REG)
> > (compare:CCC (match_operand:SWI48 1 "nonimmediate_operand" "rm")
> >  (const_int 0)))
> >(set (match_operand:SWI48 0 "register_operand" "=r")
> > (ctz:SWI48 (match_dup 1)))]
> >   "TARGET_BMI"
> >   "tzcnt{}\t{%1, %0|%0, %1}";
> >   "&& TARGET_AVOID_FALSE_DEP_FOR_BMI && epilogue_completed
> >&& optimize_function_for_speed_p (cfun)
> >&& !reg_mentioned_p (operands[0], operands[1])"
> >   [(parallel
> > [(set (reg:CCC FLAGS_REG)
> >   (compare:CCC (match_dup 1) (const_int 0)))
> >  (set (match_dup 0)
> >   (ctz:SWI48 (match_dup 1)))
> >  (unspec [(match_dup 0)] UNSPEC_INSN_FALSE_DEP)])]
> >   "ix86_expand_clear (operands[0]);"
> >   [(set_attr "type" "alu1")
> >(set_attr "prefix_0f" "1")
> >(set_attr "prefix_rep" "1")
> >(set_attr "btver2_decode" "double")
> >(set_attr "mode" "")])
> >
> > For TARGET_AVOID_FALSE_DEP_FOR_BMI, we split at epilogue_completed, when
> > insn registers are stable, and use ix86_expand_clear to clear the output
> > operand. Please also note how the final insn is tagged with
> > UNSPEC_INSN_FALSE_DEP to prevent combine from recognizing it too early.
> >
> > Uros.
> >
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/i386/i386.h (TARGET_DEST_FALSE_DEPENDENCY): New macro.
> > > * config/i386/i386.md (dest_false_dep): New define_attr.
> > > * config/i386/sse.md 
> > > (__):
> > > Insert zero-idiom in output template when attr enabled, set new 
> > > attribute to
> > > true for non-mask/maskz insn.
> > > 
> > > (avx512fp16_sh_v8hf):
> > > Likewise.
> > > (avx512dq_mul3): Likewise.
> > > (_permvar): Likewise.
> > > (avx2_perm_1): Likewise.
> > > (avx512f_perm_1): Likewise.
> > > (avx512dq_rangep): Likewise.
> > > 
> > > (avx512dq_ranges):
> > > Likewise.
> > > (_getmant): Likewise.
> > > 
> > > (avx512f_vgetmant):
> > > Likewise.
> > > * config/i386/subst.md (mask3_dest_false_dep_attr): New 
> > > subst_attr.
> > > (mask4_dest_false_dep_attr): Likewise.
> > > (mask6_dest_false_dep_attr): Likewise.
> > > (mask10_dest_false_dep_attr): Likewise.
> > > (maskc_dest_false_dep_attr): Likewise.
> > > (mask_scalar4_dest_false_dep_attr): Likewise.
> > > (mask_scalarc_dest_false_dep_attr): Likewise.
> > > * config/i386/x86-tune.def (X86_TUNE_DEST_FALSE_DEPENDENCY): New
> > > DEF_TUNE enabled for m_SAPPHIRERAPIDS and m_ALDERLAKE
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/i386/avx2-dest-false-dependency.c: New test.
> > > * gcc.target/i386/avx512dq-dest-false-dependency.c: Ditto.
> > > * gcc.target/i386/avx512f-dest-false-dependency.c: Ditto.
> > > * gcc.target/i386/avx512fp16-dest-false-dependency.c: Ditto.
> > > * 

Re: [PATCH] RISC-V: Document the degree of position independence that medany affords

2022-01-13 Thread Christoph Müllner via Gcc-patches
On Fri, Jan 14, 2022 at 4:42 AM Palmer Dabbelt  wrote:
>
> The code generated by -mcmodel=medany is defined to be
> position-independent, but is not guarnteed to function correctly when
> linked into position-independent executables or libraries.  See the
> recent discussion at the psABI specification [1] for more details.
>
> [1]: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/issues/245
>
> gcc/ChangeLog:
>
> * doc/invoke.texi: Document the degree of position independence
> that -mcmodel=medany affords.
>
> Signed-off-by: Palmer Dabbelt 
> ---
>  gcc/doc/invoke.texi | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 5504971ea81..eaba12bb61f 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -27568,6 +27568,10 @@ Generate code for the medium-any code model. The 
> program and its statically
>  defined symbols must be within any single 2 GiB address range. Programs can 
> be
>  statically or dynamically linked.
>
> +The code generated by the medium-any code model is position-independent, but 
> is
> +not guarnteed to function correctly when linked into position-independent
> +executables or libraries.

Typo: guarnteed -> guaranteed

I think it would be more helpful from a user perspective if there were a hint
to the solution (i.e. use -fPIC). What about something like this:
"""
The code generated by the medium-any code model is position-independent.
However, to link such code into position-independent executables or libraries,
the corresponding flags to enable position-independent code generation
still need to be provided (e.g. -fPIC or -fPIE).
"""

>  @item -mexplicit-relocs
>  @itemx -mno-exlicit-relocs
>  Use or do not use assembler relocation operators when dealing with symbolic
> --
> 2.32.0


Re: [PATCH] [i386] GLC tuning: Break false dependency for dest register.

2022-01-13 Thread Hongyu Wang via Gcc-patches
> No, the approach is wrong. You have to solve output clearing on RTL
> level, please look at how e.g. tzcnt false dep is solved:

Actually we have considered such an approach before, but we found we need
to break the original define_insn to remove the mask/rounding subst,
since define_split does not support subst, and that would add 6 more
define_insn_and_split and 4 define_insn patterns for each instruction. We
think such an approach would introduce too much redundant code.

Do you think the code size increment is acceptable?

Uros Bizjak via Gcc-patches  wrote on Thu, Jan 13, 2022 at 15:42:
>
> On Thu, Jan 13, 2022 at 8:28 AM Hongyu Wang  wrote:
> >
> > From: wwwhhhyyy 
> >
> > Hi,
> >
> > For GoldenCove micro-architecture, force insert zero-idiom in asm
> > template to break false dependency of dest register for several insns.
> >
> > The related insns are:
> >
> > VPERM/D/Q/PS/PD
> > VRANGEPD/PS/SD/SS
> > VGETMANTSS/SD/SH
> > VGETMANTPS/PD - mem version only
> > VPMULLQ
> > VFMULCSH/PH
> > VFCMULCSH/PH
> >
> > Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}
> >
> > Ok for master?
>
> No, the approach is wrong. You have to solve output clearing on RTL
> level, please look at how e.g. tzcnt false dep is solved:
>
>   [(set (reg:CCC FLAGS_REG)
> (compare:CCC (match_operand:SWI48 1 "nonimmediate_operand" "rm")
>  (const_int 0)))
>(set (match_operand:SWI48 0 "register_operand" "=r")
> (ctz:SWI48 (match_dup 1)))]
>   "TARGET_BMI"
>   "tzcnt{}\t{%1, %0|%0, %1}";
>   "&& TARGET_AVOID_FALSE_DEP_FOR_BMI && epilogue_completed
>&& optimize_function_for_speed_p (cfun)
>&& !reg_mentioned_p (operands[0], operands[1])"
>   [(parallel
> [(set (reg:CCC FLAGS_REG)
>   (compare:CCC (match_dup 1) (const_int 0)))
>  (set (match_dup 0)
>   (ctz:SWI48 (match_dup 1)))
>  (unspec [(match_dup 0)] UNSPEC_INSN_FALSE_DEP)])]
>   "ix86_expand_clear (operands[0]);"
>   [(set_attr "type" "alu1")
>(set_attr "prefix_0f" "1")
>(set_attr "prefix_rep" "1")
>(set_attr "btver2_decode" "double")
>(set_attr "mode" "")])
>
> For TARGET_AVOID_FALSE_DEP_FOR_BMI, we split at epilogue_completed, when
> insn registers are stable, and use ix86_expand_clear to clear the output
> operand. Please also note how the final insn is tagged with
> UNSPEC_INSN_FALSE_DEP to prevent combine from recognizing it too early.
>
> Uros.
>
> >
> > gcc/ChangeLog:
> >
> > * config/i386/i386.h (TARGET_DEST_FALSE_DEPENDENCY): New macro.
> > * config/i386/i386.md (dest_false_dep): New define_attr.
> > * config/i386/sse.md 
> > (__):
> > Insert zero-idiom in output template when attr enabled, set new 
> > attribute to
> > true for non-mask/maskz insn.
> > 
> > (avx512fp16_sh_v8hf):
> > Likewise.
> > (avx512dq_mul3): Likewise.
> > (_permvar): Likewise.
> > (avx2_perm_1): Likewise.
> > (avx512f_perm_1): Likewise.
> > (avx512dq_rangep): Likewise.
> > 
> > (avx512dq_ranges):
> > Likewise.
> > (_getmant): Likewise.
> > 
> > (avx512f_vgetmant):
> > Likewise.
> > * config/i386/subst.md (mask3_dest_false_dep_attr): New subst_attr.
> > (mask4_dest_false_dep_attr): Likewise.
> > (mask6_dest_false_dep_attr): Likewise.
> > (mask10_dest_false_dep_attr): Likewise.
> > (maskc_dest_false_dep_attr): Likewise.
> > (mask_scalar4_dest_false_dep_attr): Likewise.
> > (mask_scalarc_dest_false_dep_attr): Likewise.
> > * config/i386/x86-tune.def (X86_TUNE_DEST_FALSE_DEPENDENCY): New
> > DEF_TUNE enabled for m_SAPPHIRERAPIDS and m_ALDERLAKE
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/avx2-dest-false-dependency.c: New test.
> > * gcc.target/i386/avx512dq-dest-false-dependency.c: Ditto.
> > * gcc.target/i386/avx512f-dest-false-dependency.c: Ditto.
> > * gcc.target/i386/avx512fp16-dest-false-dependency.c: Ditto.
> > * gcc.target/i386/avx512fp16vl-dest-false-dependency.c: Ditto.
> > * gcc.target/i386/avx512vl-dest-false-dependency.c: Ditto.
> > ---
> >  gcc/config/i386/i386.h|   2 +
> >  gcc/config/i386/i386.md   |   4 +
> >  gcc/config/i386/sse.md| 142 +++---
> >  gcc/config/i386/subst.md  |   7 +
> >  gcc/config/i386/x86-tune.def  |   5 +
> >  .../i386/avx2-dest-false-dependency.c |  24 +++
> >  .../i386/avx512dq-dest-false-dependency.c |  73 +
> >  .../i386/avx512f-dest-false-dependency.c  | 102 +
> >  .../i386/avx512fp16-dest-false-dependency.c   |  45 ++
> >  .../i386/avx512fp16vl-dest-false-dependency.c |  24 +++
> >  .../i386/avx512vl-dest-false-dependency.c |  76 ++
> >  11 files changed, 486 insertions(+), 18 deletions(-)
> >  create mode 100644 
> > gcc/testsuite/gcc.target/i386/avx2-dest-false-dependency.c
> >  create mode 100644 
> 

Re: [PATCH] disable aggressive_loop_optimizations until niter ready

2022-01-13 Thread Jiufu Guo via Gcc-patches
Richard Biener  writes:

> On Thu, 13 Jan 2022, guojiufu wrote:
>
>> On 2022-01-03 22:30, Richard Biener wrote:
>> > On Wed, 22 Dec 2021, Jiufu Guo wrote:
>> > 
>> >> Hi,
>> >> ...
>> >> 
>> >> Bootstrap and regtest pass on ppc64* and x86_64.  Is this ok for trunk?
>> > 
>> > So this is an optimality fix, not a correctness one?  I suppose the
>> > estimates are computed/used from scev_probably_wraps_p via
>> > loop_exits_before_overflow and ultimately chrec_convert.
>> > 
>> > We have a call cycle here,
>> > 
>> > estimate_numbers_of_iterations -> number_of_latch_executions ->
>> > ... -> estimate_numbers_of_iterations
>> > 
>> > where the first estimate_numbers_of_iterations will make sure
>> > the later call will immediately return.
>> 
>> Hi Richard,
>> Thanks for your comments! And sorry for the late reply.
>> 
>> In estimate_numbers_of_iterations, there is a guard to make sure
>> the second call to estimate_numbers_of_iterations returns
>> immediately.
>> 
>> Exactly as you said, it relates to scev_probably_wraps_p calls
>> loop_exits_before_overflow.
>> 
>> The issue is: the first call to estimate_numbers_of_iterations
>> may be inside number_of_latch_executions.
>> 
>> > 
>> > I'm not sure what your patch tries to do - it seems to tackle
>> > the case where we enter the cycle via number_of_latch_executions?
>> > Why do we get "non-final" values?  idx_infer_loop_bounds resorts
>> 
>> Right, when the call cycle starts from number_of_latch_executions,
>> the issue may occur:
>> 
>> number_of_latch_executions(*1st call)->..->
>> analyze_scalar_evolution(IVs 1st) ->..follow_ssa_edge_expr..->
>> loop_exits_before_overflow->
>> estimate_numbers_of_iterations (*1st call)->
>> number_of_latch_executions(*2nd call)->..->
>> analyze_scalar_evolution(IVs 2nd)->..loop_exits_before_overflow->
>> estimate_numbers_of_iterations(*2nd call)
>> 
>> The second call to estimate_numbers_of_iterations returns quickly.
>> And then, in the first call to estimate_numbers_of_iterations,
>> infer_loop_bounds_from_undefined is invoked.
>> 
>> And the function "infer_loop_bounds_from_undefined" instantiates/analyzes
>> the SCEV for each SSA name in the loop.
>> *Here the issue occurs*: these SCEVs are based on the interim IV's
>> SCEV, which comes from "analyze_scalar_evolution(IVs 2nd)",
>> and those IV's SCEVs will be overridden by the upper-level
>> "analyze_scalar_evolution(IVs 1st)".
>
> OK, so indeed analyze_scalar_evolution is not protected against
> recursive invocation on the same SSA name (though it definitely
> doesn't expect to do that).  We could fix that by pre-seeding
> the cache conservatively in analyze_scalar_evolution or by
> not overwriting the cached result of the recursive invocation.
>
> But to re-iterate an unanswered question, is this a correctness issue
> or an optimization issue?

Hi Richard,

Thanks for your time and patience in reviewing this!

I would say it is an optimization issue for the current code;
it does not fix a known error.

The patch could help compile time.  Another benefit: this patch
would be able to improve some SCEVs if the SCEV is cached in
infer_loop_bounds_from_undefined under a call stack where the IV's
SCEV is still being analyzed.

Yes, in the analyze_scalar_evolution call chain, it may recurse on
the same SSA name.
While the outer-level analyze_scalar_evolution 'may' get a better
chrec (POLYNOMIAL_CHREC), the inner one may get another SCEV (e.g. a
conversion).  I'm even wondering whether this recursion is intended :).
It may help to handle the chicken-and-egg issue (wrap vs. niter).
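
The "pre-seed the cache conservatively" idea raised in the quoted review can be sketched in isolation. This is a generic re-entrancy guard on a toy dependency graph, not GCC's SCEV code; struct node, analyze, NOT_COMPUTED and DONT_KNOW are hypothetical names chosen for illustration.

```c
#include <stddef.h>

enum { NOT_COMPUTED = -2, DONT_KNOW = -1 };

struct node
{
  int cached;           /* NOT_COMPUTED, DONT_KNOW, or a result >= 0.  */
  struct node *depends; /* May lead back to this node (a cycle).  */
  int base;
};

/* Analyze N, using whatever is known about the node it depends on.  */
int
analyze (struct node *n)
{
  if (n->cached != NOT_COMPUTED)
    return n->cached;

  /* Pre-seed with the conservative answer so that a recursive
     re-entry on the same node returns immediately.  */
  n->cached = DONT_KNOW;

  int result = n->base;
  if (n->depends)
    {
      int inner = analyze (n->depends);
      if (inner != DONT_KNOW)
        result += inner;
    }

  n->cached = result;
  return result;
}
```

In a two-node cycle the inner node ends up caching a result derived from the outer node's interim DONT_KNOW, which is the same shape as the stale interim SCEV concern discussed in this thread: the guard stops the recursion, but it does not by itself refresh results computed during it.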

>
>> To handle this issue, disabling flag_aggressive_loop_optimizations
>> inside number_of_latch_executions is one method.
>> To avoid the issue in other cases, e.g. the call cycle starts from
>> number_of_iterations_exit or number_of_iterations_exit_assumptions,
>> this patch disables flag_aggressive_loop_optimizations inside
>> number_of_iterations_exit_assumptions.
>
> But disabling flag_aggressive_loop_optimizations is a very
> non-intuitive way of avoiding recursive calls.  I'd rather
> avoid those in a similar way estimate_numbers_of_iterations does,
> for example with
>
> diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
> index 61d72c278a1..cc1e510b6c2 100644
> --- a/gcc/tree-scalar-evolution.c
> +++ b/gcc/tree-scalar-evolution.c
> @@ -2807,7 +2807,7 @@ number_of_latch_executions (class loop *loop)
>if (dump_file && (dump_flags & TDF_SCEV))
>  fprintf (dump_file, "(number_of_iterations_in_loop = \n");
>  
> -  res = chrec_dont_know;
> +  loop->nb_iterations = res = chrec_dont_know;
>exit = single_exit (loop);
>  
>if (exit && number_of_iterations_exit (loop, exit, _desc, false))
>
> though this doesn't seem to improve the SCEV analysis with your
> testcase.  Alternatively one could more consciously compute an
> "estimated" estimate like with
>
> diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
> index 61d72c278a1..8529c44d574 100644
> --- a/gcc/tree-scalar-evolution.c
> +++ 

Re: [PATCH] [i386] Fix ICE of unrecognizable insn. [PR target/104001]

2022-01-13 Thread Hongtao Liu via Gcc-patches
Here's the patch I'm going to check in; it is pre-approved in the PR.

On Thu, Jan 13, 2022 at 11:59 PM liuhongt  wrote:
>
> For define_insn_and_split "*xor2andn":
>
> 1. Refine predicate of operands[0] from nonimmediate_operand to
> register_operand.
> 2. Remove TARGET_AVX512BW from condition to avoid kmov when TARGET_BMI
> is not available.
> 3. Force_reg operands[2].
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/104001
> PR target/94790
> * config/i386/i386.md (*xor2andn): Refine predicate of
> operands[0] from nonimmediate_operand to
> register_operand, remove TARGET_AVX512BW from condition,
> force_reg operands[2].
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr104001.c: New test.
> ---
>  gcc/config/i386/i386.md  |  6 +++---
>  gcc/testsuite/gcc.target/i386/pr104001.c | 21 +
>  2 files changed, 24 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr104001.c
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 9937643a273..7bd4f24aa07 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -10455,7 +10455,7 @@ (define_insn_and_split "*xordi_1_btc"
>
>  ;; PR target/94790: Optimize a ^ ((a ^ b) & mask) to (~mask & a) | (b & mask)
>  (define_insn_and_split "*xor2andn"
> -  [(set (match_operand:SWI248 0 "nonimmediate_operand")
> +  [(set (match_operand:SWI248 0 "register_operand")
> (xor:SWI248
>   (and:SWI248
> (xor:SWI248
> @@ -10464,8 +10464,7 @@ (define_insn_and_split "*xor2andn"
> (match_operand:SWI248 3 "nonimmediate_operand"))
>   (match_dup 1)))
>  (clobber (reg:CC FLAGS_REG))]
> -  "(TARGET_BMI || TARGET_AVX512BW)
> -   && ix86_pre_reload_split ()"
> +  "TARGET_BMI && ix86_pre_reload_split ()"
>"#"
>"&& 1"
>[(parallel [(set (match_dup 4)
> @@ -10486,6 +10485,7 @@ (define_insn_and_split "*xor2andn"
>   (clobber (reg:CC FLAGS_REG))])]
>  {
>operands[1] = force_reg (mode, operands[1]);
> +  operands[2] = force_reg (mode, operands[2]);
>operands[3] = force_reg (mode, operands[3]);
>operands[4] = gen_reg_rtx (mode);
>operands[5] = gen_reg_rtx (mode);
> diff --git a/gcc/testsuite/gcc.target/i386/pr104001.c 
> b/gcc/testsuite/gcc.target/i386/pr104001.c
> new file mode 100644
> index 000..bd85aa7145e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr104001.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-not "kandn" } } */
> +/* { dg-final { scan-assembler-times "andn" 1 } } */
> +
> +int b, c, d;
> +int r;
> +
> +void
> +__attribute__((target("bmi")))
> +foo ()
> +{
> +  r = ((b & ~d) | (c & d));
> +}
> +
> +void
> +__attribute__((target("avx512bw")))
> +bar ()
> +{
> +  r = ((b & ~d) | (c & d));
> +}
> --
> 2.18.1
>


-- 
BR,
Hongtao
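
For readers following the pattern comment in the patch above, here is a small
sketch of the bitwise identity it relies on: a ^ ((a ^ b) & mask) picks
bits of b where mask is set and bits of a elsewhere, which is the same as
(~mask & a) | (b & mask).  The function names are hypothetical and only
illustrate the arithmetic, not the machine description.

```cpp
#include <cstdint>

// Form matched by the pattern: a ^ ((a ^ b) & mask).
constexpr std::uint32_t select_xor(std::uint32_t a, std::uint32_t b,
                                   std::uint32_t mask) {
  return a ^ ((a ^ b) & mask);
}

// Form the split produces: (~mask & a) | (b & mask).
constexpr std::uint32_t select_andn(std::uint32_t a, std::uint32_t b,
                                    std::uint32_t mask) {
  return (~mask & a) | (b & mask);
}
```

The second form maps onto an and-not plus an and and an or, which is why the
condition in the patch ties the split to TARGET_BMI.
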
From 04ef9a899217f334f729a75a1907b505c4c29451 Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Thu, 13 Jan 2022 22:51:49 +0800
Subject: [PATCH] [i386] Fix ICE of unrecognizable insn. [PR target/104001]

For define_insn_and_split "*xor2andn":

1. Refine predicate of operands[0] from nonimmediate_operand to
register_operand.
2. Remove TARGET_AVX512BW from condition to avoid kmov when TARGET_BMI
is not available.

gcc/ChangeLog:

	PR target/104001
	PR target/94790
	PR target/104014
	* config/i386/i386.md (*xor2andn): Refine predicate of
	operands[0] from nonimmediate_operand to
	register_operand, remove TARGET_AVX512BW from condition.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr104001.c: New test.
---
 gcc/config/i386/i386.md  | 11 +--
 gcc/testsuite/gcc.target/i386/pr104001.c | 21 +
 2 files changed, 26 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr104001.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 9937643a273..a631630add6 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -10453,9 +10453,9 @@ (define_insn_and_split "*xordi_1_btc"
(set_attr "znver1_decode" "double")
(set_attr "mode" "DI")])
 
-;; PR target/94790: Optimize a ^ ((a ^ b) & mask) to (~mask & a) | (b & mask)
+;; Optimize a ^ ((a ^ b) & mask) to (~mask & a) | (b & mask)
 (define_insn_and_split "*xor2andn"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand")
+  [(set (match_operand:SWI248 0 "register_operand")
 	(xor:SWI248
 	  (and:SWI248
 	(xor:SWI248
@@ -10464,8 +10464,7 @@ (define_insn_and_split "*xor2andn"
 	(match_operand:SWI248 3 "nonimmediate_operand"))
 	  (match_dup 1)))
 (clobber (reg:CC FLAGS_REG))]
-  "(TARGET_BMI || TARGET_AVX512BW)
-   && ix86_pre_reload_split ()"
+  "TARGET_BMI && ix86_pre_reload_split ()"
   "#"
   "&& 1"
   [(parallel [(set (match_dup 4)
@@ -10476,8 +10475,8 @@ (define_insn_and_split "*xor2andn"
 	  (clobber (reg:CC 

Re: [PATCH] tree-optimization/104009: Conservative underflow estimate in object size

2022-01-13 Thread Jakub Jelinek via Gcc-patches
On Fri, Jan 14, 2022 at 09:01:09AM +0530, Siddhesh Poyarekar wrote:
> Restrict negative offset computation only to dynamic object sizes, where
> size expressions are accurate and not a maximum/minimum estimate and in
> cases where negative offsets definitely mean an underflow, e.g. in
> MEM_REF of the whole object with negative offset in addr_object_size.
> 
> This ends up missing some cases where __builtin_object_size could have
> come up with more precise results, so tests have been adjusted to
> reflect that.
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/104009
>   * tree-object-size.c (compute_builtin_object_size): Bail out on
>   negative offset.
>   (plus_stmt_object_size): Return maximum of wholesize and minimum
>   of 0 for negative offset.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/104009
>   * gcc.dg/builtin-object-size-1.c (test10): New test.
>   * gcc.dg/builtin-object-size-3.c (test10): Likewise.
>   (test9): Expect zero size for negative offsets.
>   * gcc.dg/builtin-object-size-4.c (test8): Likewise.
>   * gcc.dg/builtin-object-size-5.c (test7): Drop test for
>   __builtin_object_size.

Ok.

Jakub



[PATCH] RISC-V: Document the degree of position independence that medany affords

2022-01-13 Thread Palmer Dabbelt
The code generated by -mcmodel=medany is defined to be
position-independent, but is not guaranteed to function correctly when
linked into position-independent executables or libraries.  See the
recent discussion at the psABI specification [1] for more details.

[1]: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/issues/245

gcc/ChangeLog:

* doc/invoke.texi: Document the degree of position independence
that -mcmodel=medany affords.

Signed-off-by: Palmer Dabbelt 
---
 gcc/doc/invoke.texi | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5504971ea81..eaba12bb61f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -27568,6 +27568,10 @@ Generate code for the medium-any code model. The program and its statically
 defined symbols must be within any single 2 GiB address range. Programs can be
 statically or dynamically linked.
 
+The code generated by the medium-any code model is position-independent, but is
+not guaranteed to function correctly when linked into position-independent
+executables or libraries.
+
 @item -mexplicit-relocs
 @itemx -mno-exlicit-relocs
 Use or do not use assembler relocation operators when dealing with symbolic
-- 
2.32.0



[PATCH] tree-optimization/104009: Conservative underflow estimate in object size

2022-01-13 Thread Siddhesh Poyarekar
Restrict negative offset computation only to dynamic object sizes, where
size expressions are accurate and not a maximum/minimum estimate and in
cases where negative offsets definitely mean an underflow, e.g. in
MEM_REF of the whole object with negative offset in addr_object_size.

This ends up missing some cases where __builtin_object_size could have
come up with more precise results, so tests have been adjusted to
reflect that.
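
A rough model of the conservative estimate described above, written as a
sketch rather than as GCC's implementation.  For a negative offset it
follows the ChangeLog wording (whole size for a maximum estimate, zero for a
minimum estimate); the handling of non-negative offsets is an assumption
added to make the example complete.  Estimate and conservative_remaining
are hypothetical names.

```cpp
#include <cstddef>

enum class Estimate { Maximum, Minimum };

// Hypothetical model: how many bytes may remain at `offset` into an
// object of `wholesize` bytes when only a static estimate is available.
std::size_t conservative_remaining(std::size_t wholesize, long offset,
                                   Estimate kind) {
  if (offset < 0) {
    // Negative offset: do not try to be precise.  The maximum estimate
    // falls back to the whole object, the minimum estimate to zero.
    return kind == Estimate::Maximum ? wholesize : 0;
  }
  // Assumption: a non-negative offset simply shrinks the remaining size,
  // saturating at zero once it runs past the end.
  std::size_t off = static_cast<std::size_t>(offset);
  return off <= wholesize ? wholesize - off : 0;
}
```

This matches the expectations in the adjusted tests for the static case:
sizeof (buf) for mode 0 and 0 for mode 2 when the pointer is p - 3.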

gcc/ChangeLog:

PR tree-optimization/104009
* tree-object-size.c (compute_builtin_object_size): Bail out on
negative offset.
(plus_stmt_object_size): Return maximum of wholesize and minimum
of 0 for negative offset.

gcc/testsuite/ChangeLog:

PR tree-optimization/104009
* gcc.dg/builtin-object-size-1.c (test10): New test.
* gcc.dg/builtin-object-size-3.c (test10): Likewise.
(test9): Expect zero size for negative offsets.
* gcc.dg/builtin-object-size-4.c (test8): Likewise.
* gcc.dg/builtin-object-size-5.c (test7): Drop test for
__builtin_object_size.

Signed-off-by: Siddhesh Poyarekar 
---
Testing:
- bootstrap build+test for x86_64
- build+test for i686
- bootstrap build --with-build-config=bootstrap-ubsan

 gcc/testsuite/gcc.dg/builtin-object-size-1.c | 27 
 gcc/testsuite/gcc.dg/builtin-object-size-3.c | 34 +---
 gcc/testsuite/gcc.dg/builtin-object-size-4.c |  6 ++--
 gcc/testsuite/gcc.dg/builtin-object-size-5.c |  2 ++
 gcc/tree-object-size.c   | 15 +++--
 5 files changed, 75 insertions(+), 9 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-1.c b/gcc/testsuite/gcc.dg/builtin-object-size-1.c
index 161f426ec0b..b772e2da9b9 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-1.c
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-1.c
@@ -603,6 +603,32 @@ test9 (unsigned cond)
 #endif
 }
 
+void
+__attribute__ ((noinline))
+test10 (void)
+{
+  static char buf[255];
+  unsigned int i, len = sizeof (buf);
+  char *p = buf;
+
+  for (i = 0 ; i < sizeof (buf) ; i++)
+{
+  if (len < 2)
+   {
+#ifdef __builtin_object_size
+ if (__builtin_object_size (p - 3, 0) != sizeof (buf) - i + 3)
+   abort ();
+#else
+ if (__builtin_object_size (p - 3, 0) != sizeof (buf))
+   abort ();
+#endif
+ break;
+   }
+  p++;
+  len--;
+}
+}
+
 int
 main (void)
 {
@@ -617,5 +643,6 @@ main (void)
   test7 ();
   test8 ();
   test9 (1);
+  test10 ();
   exit (0);
 }
diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-3.c b/gcc/testsuite/gcc.dg/builtin-object-size-3.c
index db31171a8bd..44a99189776 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-3.c
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-3.c
@@ -581,7 +581,7 @@ test9 (unsigned cond)
   if (__builtin_object_size ([-4], 2) != (cond ? 6 : 10))
 abort ();
 #else
-  if (__builtin_object_size ([-4], 2) != 6)
+  if (__builtin_object_size ([-4], 2) != 0)
 abort ();
 #endif
 
@@ -592,7 +592,7 @@ test9 (unsigned cond)
   if (__builtin_object_size (p, 2) != ((cond ? 2 : 6) + cond))
 abort ();
 #else
-  if (__builtin_object_size (p, 2) != 2)
+  if (__builtin_object_size (p, 2) != 0)
 abort ();
 #endif
 
@@ -605,12 +605,37 @@ test9 (unsigned cond)
   != sizeof (y) - __builtin_offsetof (struct A, c) - 8 + cond)
 abort ();
 #else
-  if (__builtin_object_size (p, 2)
-  != sizeof (y) - __builtin_offsetof (struct A, c) - 8)
+  if (__builtin_object_size (p, 2) != 0)
 abort ();
 #endif
 }
 
+void
+__attribute__ ((noinline))
+test10 (void)
+{
+  static char buf[255];
+  unsigned int i, len = sizeof (buf);
+  char *p = buf;
+
+  for (i = 0 ; i < sizeof (buf) ; i++)
+{
+  if (len < 2)
+   {
+#ifdef __builtin_object_size
+ if (__builtin_object_size (p - 3, 2) != sizeof (buf) - i + 3)
+   abort ();
+#else
+ if (__builtin_object_size (p - 3, 2) != 0)
+   abort ();
+#endif
+ break;
+   }
+  p++;
+  len--;
+}
+}
+
 int
 main (void)
 {
@@ -625,5 +650,6 @@ main (void)
   test7 ();
   test8 ();
   test9 (1);
+  test10 ();
   exit (0);
 }
diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-4.c b/gcc/testsuite/gcc.dg/builtin-object-size-4.c
index f644890dd14..b9fddfed036 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-4.c
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-4.c
@@ -489,7 +489,7 @@ test8 (unsigned cond)
   if (__builtin_object_size ([-4], 3) != (cond ? 6 : 10))
 abort ();
 #else
-  if (__builtin_object_size ([-4], 3) != 6)
+  if (__builtin_object_size ([-4], 3) != 0)
 abort ();
 #endif
 
@@ -500,7 +500,7 @@ test8 (unsigned cond)
   if (__builtin_object_size (p, 3) != ((cond ? 2 : 6) + cond))
 abort ();
 #else
-  if (__builtin_object_size (p, 3) != 2)
+  if (__builtin_object_size (p, 3) != 0)
 abort ();
 #endif
 
@@ -512,7 +512,7 @@ test8 (unsigned cond)
   if (__builtin_object_size (p, 3) != sizeof (y.c) - 8 + 

Re: [PATCH] PR fortran/103782 - [9/10/11/12 Regression] internal error occurs when overloading intrinsic

2022-01-13 Thread Jerry D via Gcc-patches

On 1/13/22 12:56 PM, Harald Anlauf via Fortran wrote:
> Dear all,
>
> there was a regression handling overloaded elemental intrinsics,
> leading to an ICE on valid code.  Reported by Urban Jost.
>
> The logic for when we need to scalarize a call to an intrinsic
> seems to have been broken during the 9-release.  The attached
> patch fixes the ICE and seems to work on the extended testcase
> as well as regtests fine on x86_64-pc-linux-gnu.
>
> OK for mainline?  Backport to affected branches?

Looks good to me. Go for mainline and backport.

Regards,

Jerry


[committed] Add __attribute__ ((tainted_args))

2022-01-13 Thread David Malcolm via Gcc-patches
On Thu, 2022-01-13 at 14:08 -0500, Jason Merrill wrote:
> On 1/12/22 10:33, David Malcolm wrote:
> > On Tue, 2022-01-11 at 23:36 -0500, Jason Merrill wrote:
> > > On 1/10/22 16:36, David Malcolm via Gcc-patches wrote:
> > > > On Thu, 2022-01-06 at 09:08 -0500, David Malcolm wrote:
> > > > > On Sat, 2021-11-13 at 15:37 -0500, David Malcolm wrote:
> > > > > > This patch adds a new __attribute__ ((tainted)) to the
> > > > > > C/C++
> > > > > > frontends.
> > > > > 
> > > > > > Ping for GCC C/C++ maintainers for review of the C/C++ FE
> > > > > parts of
> > > > > this
> > > > > patch (attribute registration, documentation, the name of the
> > > > > attribute, etc).
> > > > > 
> > > > > (I believe it's independent of the rest of the patch kit, in
> > > > > that
> > > > > it
> > > > > could go into trunk without needing the prior patches)
> > > > > 
> > > > > Thanks
> > > > > Dave
> > > > 
> > > > Getting close to end of stage 3 for GCC 12, so pinging this
> > > > patch
> > > > again...
> > > > 
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584376.html
> > > 
> > > The c-family change is OK.
> > 
> > Thanks.
> > 
> > I'm retesting the patch now, but it now seems to me that
> >__attribute__((tainted_args))
> > would lead to more readable code than:
> >__attribute__((tainted))
> > 
> > in that the name "tainted_args" better conveys the idea that all
> > arguments are under attacker-control (as opposed to the body of the
> > function or the function pointer being under attacker-control).
> > 
> > Looking at
> >   
> > https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html
> > we already have some attributes with underscores in their names.
> > 
> > Does this sound good?
> 
> Makes sense to me.

Thanks.

I updated the patch to use the name "tainted_args" for the attribute,
and there were a few other changes needed due to splitting it out from
the rest of the kit.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk for gcc 12 as b31cec9c22b8dfa40baefd4c2dd774477e8e04c5.

The following is what I committed, for reference:

This patch adds a new __attribute__ ((tainted_args)) to the C/C++ frontends.

It can be used on function decls: the analyzer will treat as tainted
all parameters to the function and all buffers pointed to by parameters
to the function.  Adding this in one place to the Linux kernel's
__SYSCALL_DEFINEx macro allows the analyzer to treat all syscalls as
having tainted inputs.  This gives some coverage of system calls without
needing to "teach" the analyzer about "__user" - an example of the use
of this can be seen in CVE-2011-2210, where given:

 SYSCALL_DEFINE5(osf_getsysinfo, unsigned long, op, void __user *, buffer,
 unsigned long, nbytes, int __user *, start, void __user *, arg)

the analyzer will treat the nbytes param as under attacker control, and
can complain accordingly:

taint-CVE-2011-2210-1.c: In function 'sys_osf_getsysinfo':
taint-CVE-2011-2210-1.c:69:21: warning: use of attacker-controlled value
  'nbytes' as size without upper-bounds checking [CWE-129]
  [-Wanalyzer-tainted-size]
   69 | if (copy_to_user(buffer, hwrpb, nbytes) != 0)
  | ^~~

Additionally, the patch allows the attribute to be used on field decls:
specifically function pointers.  Any function used as an initializer
for such a field gets treated as being called with tainted arguments.
An example can be seen in CVE-2020-13143, where adding
__attribute__((tainted_args)) to the "store" callback of
configfs_attribute:

  struct configfs_attribute {
/* [...snip...] */
ssize_t (*store)(struct config_item *, const char *, size_t)
  __attribute__((tainted_args));
/* [...snip...] */
  };

allows the analyzer to see:

 CONFIGFS_ATTR(gadget_dev_desc_, UDC);

and treat gadget_dev_desc_UDC_store as having tainted arguments, so that
it complains:

taint-CVE-2020-13143-1.c: In function 'gadget_dev_desc_UDC_store':
taint-CVE-2020-13143-1.c:33:17: warning: use of attacker-controlled value
  'len + 18446744073709551615' as offset without upper-bounds checking
  [CWE-823] [-Wanalyzer-tainted-offset]
   33 | if (name[len - 1] == '\n')
  | ^

As before this currently still needs -fanalyzer-checker=taint (in
addition to -fanalyzer).
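
As a sketch of how the attribute might be adopted in code that also has to
build with compilers that lack it, the declaration below guards it behind
__has_attribute.  The TAINTED_ARGS macro and the copy_request function
are hypothetical; only the attribute name comes from this patch.

```cpp
#include <cstddef>
#include <cstring>

// Expand to the attribute only where the compiler reports support for it.
#if defined(__has_attribute)
#  if __has_attribute(tainted_args)
#    define TAINTED_ARGS __attribute__((tainted_args))
#  endif
#endif
#ifndef TAINTED_ARGS
#  define TAINTED_ARGS
#endif

// Hypothetical entry point whose arguments all come from an untrusted
// caller, so every parameter is marked as attacker-controlled.
TAINTED_ARGS
int copy_request(char *dst, std::size_t dst_size, const char *src,
                 std::size_t nbytes);

int copy_request(char *dst, std::size_t dst_size, const char *src,
                 std::size_t nbytes) {
  // The upper-bounds check is what the taint checker looks for before
  // nbytes is used as a size.
  if (nbytes > dst_size)
    return -1;
  std::memcpy(dst, src, nbytes);
  return 0;
}
```

Removing the bounds check is the kind of change the warnings quoted above are
meant to catch, given -fanalyzer together with -fanalyzer-checker=taint.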

gcc/analyzer/ChangeLog:
* engine.cc: Include "stringpool.h", "attribs.h", and
"tree-dfa.h".
(mark_params_as_tainted): New.
(class tainted_args_function_custom_event): New.
(class tainted_args_function_info): New.
(exploded_graph::add_function_entry): Handle functions with
"tainted_args" attribute.
(class tainted_args_field_custom_event): New.
(class tainted_args_callback_custom_event): New.
(class tainted_args_call_info): New.
(add_tainted_args_callback): New.
(add_any_callbacks): New.

Re: [PATCH] Fix -Wformat-diag for rs6000 target.

2022-01-13 Thread Segher Boessenkool
Hi!

On Wed, Jan 12, 2022 at 10:02:36AM +0100, Martin Liška wrote:
> -  error ("%qs requires ISA 3.0 IEEE 128-bit floating point", name);
> +  error ("%qs requires ISA 3.0 IEEE 128-bit floating-point", name);

This change is incorrect.  Floating point is not an adjective here.

It is probably best to just rewrite this text, of course.  Maybe
"%qs requires quad-precision floating-point arithmetic" or something
like that (that it was first introduced in Power ISA 3.0 is dated
already, for example).

> -  error ("%<%s%> not supported with %<-msoft-float%>",
> +  error ("%qs not supported with %<-msoft-float%>",

This change is trivial and obviously correct.  Please just commit such
changes, *not* mixed in with anything else though.  (And if you aren't
sure something is trivial or obviously correct, it is not!)

> -  return scalar_extract_exp (source);/* { dg-error "requires ISA 3.0 
> IEEE 
> 128-bit floating point" } */
> +  return scalar_extract_exp (source);/* { dg-error "requires ISA 3.0 
> IEEE 
> 128-bit floating-point" } */

Your patch is malformed.  Please fix your mailer.


Segher


Re: [PATCH] c++: Avoid some -Wreturn-type false positives with const{expr,eval} if [PR103991]

2022-01-13 Thread Jason Merrill via Gcc-patches

On 1/13/22 16:23, Jakub Jelinek wrote:
> On Thu, Jan 13, 2022 at 04:09:22PM -0500, Jason Merrill wrote:
> > > The changes done to genericize_if_stmt in order to improve
> > > -Wunreachable-code* warning (which Richi didn't actually commit
> > > for GCC 12) are I think fine for normal ifs, but for constexpr if
> > > and consteval if we have two competing warnings.
> > > The problem is that we replace the non-taken clause (then or else)
> > > with void_node and keep the if (cond) { something } else {}
> > > or if (cond) {} else { something }; in the IL.
> > > This helps -Wunreachable-code*, if something can't fallthru but the
> > > non-taken clause can, we don't warn about code after it because it
> > > is still (in theory) reachable.
> > > But if the non-taken branch can't fallthru, we can get false positive
> > > -Wreturn-type warnings (which are enabled by default) if there is
> > > nothing after the if and the taken branch can't fallthru either.
> >
> > Perhaps we should replace the non-taken clause with __builtin_unreachable()
> > instead of void_node?
>
> It depends.  If the non-taken clause doesn't exist, is empty or otherwise
> can fallthru, then using void_node for it is what we want.
> If it exists and can't fallthru, then __builtin_unreachable() is one
> possibility, but for all purpose
>   if (1)
>     something
>   else
>     __builtin_unreachable();
> is equivalent to genericization of it as
>   something
> and
>   if (0)
>     __builtin_unreachable();
>   else
>     something
> too.
> The main problem is what to do for the consteval if that throws away
> the non-taken clause too early, whether we can do block_may_fallthru
> already where we throw it away or not.  If we can do that, we could
> as right now clear the non-taken clause if it can fallthru and otherwise
> either set some flag on the IF_STMT or set the non-taken clause to
> __builtin_unreachable or endless empty loop etc., ideally something
> as cheap as possible.
>
> > And/or block_may_fallthru could handle INTEGER_CST op0?
>
> That is what I'm doing for consteval if in the patch because the info
> whether the non-taken clause can fallthru is lost.
> We can't do that for normal if, because the non-taken clause could
> have labels in it to which something jumps.
> But, block_may_fallthru isn't actually what is used for the -Wreturn-type
> warning, I think we warn only at cfg creation.

Fair enough.  The patch is OK.

Jason
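
For context, the shape of code this thread is about looks roughly like the
sketch below: a constexpr if where both arms return and nothing follows the
statement.  storage_class is a hypothetical function, and the sketch makes
no claim about which compiler versions warn on it.

```cpp
// Both arms return, so control never reaches the closing brace, yet only
// one arm survives after the condition is evaluated at compile time.
template <typename T>
int storage_class() {
  if constexpr (sizeof(T) > 4)
    return 8;
  else
    return 4;
}
```

If the discarded arm is replaced by an empty statement that can fall through,
the function appears to reach its end without returning, which is how a
-Wreturn-type false positive could arise.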



Re: [PATCH RFA] diagnostic: avoid repeating include path

2022-01-13 Thread David Malcolm via Gcc-patches
On Thu, 2022-01-13 at 17:08 -0500, Jason Merrill wrote:
> When a sequence of diagnostic messages bounces back and forth
> repeatedly
> between two includes, as with
> 
>  #include 
>  std::map m ("123", "456");
> 
> The output is quite a bit longer than necessary because we dump the
> include
> path each time it changes.  I'd think we could print the include path
> once
> for each header file, and then expect that the user can look earlier
> in the
> output if they're wondering.
> 
> Tested x86_64-pc-linux-gnu, OK for trunk?
> 
> gcc/ChangeLog:
> 
> * diagnostic.c (includes_seen): New.
> (diagnostic_report_current_module): Use it.
> ---
>  gcc/diagnostic.c | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
> index 58139427d01..e56441a2dbf 100644
> --- a/gcc/diagnostic.c
> +++ b/gcc/diagnostic.c
> @@ -700,6 +700,16 @@ set_last_module (diagnostic_context *context,
> const line_map_ordinary *map)
>    context->last_module = map;
>  }
>  
> +/* Only dump the "In file included from..." stack once for each
> file.  */
> +
> +static bool
> +includes_seen (const line_map_ordinary *map)
> +{
> +  using hset = hash_set;
> +  static hset *set = new hset;
> +  return set->add (map);
> +}

Overall, I like the idea, but...

- the patch works at the level of line_map_ordinary instances, rather
than header files.  There are various ways in which a single header
file can have multiple line maps e.g. due to very long lines, or
including another file, etc.  I think it makes sense to do it at the
per-file level, assuming we aren't in a horrible situation where a
header is being included repeatedly, with different effects.  So maybe
this ought to look at what include directive led to this map, i.e.
looking at the ord_map->included_from field, and having a
hash_set ?

- there's no test coverage, but it's probably not feasible to write
DejaGnu tests for this, given the way prune.exp's prune_gcc_output
strips these strings.  Maybe a dg directive to selectively disable the
pertinent pruning operations in prune_gcc_output???  Gah...

- global state is a pet peeve of mine; can the above state be put
inside the diagnostic_context instead?   (perhaps via a pointer to a
wrapper class to avoid requiring all users of diagnostic.h to include
hash-set.h?).
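
A minimal sketch of the last two suggestions combined, assuming a simplified
context type rather than the real diagnostic_context: the "already shown"
set lives in the context object instead of a function-local static, and it
is keyed per header rather than per line map.  report_context and
should_print_include_path are hypothetical names, and keying on the file
name is an assumption made for illustration.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical stand-in for the diagnostic context; the state is owned
// by the context, so two contexts do not share it.
struct report_context {
  std::unordered_set<std::string> headers_with_path_shown;
};

// Returns true only the first time `header` is reported in `ctx`,
// which is when the "In file included from..." stack would be printed.
bool should_print_include_path(report_context &ctx,
                               const std::string &header) {
  return ctx.headers_with_path_shown.insert(header).second;
}
```

Note that the return value here has the opposite sense to includes_seen in
the patch, where the caller tests the negated result.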

Hope this is constructive
Dave

> +
>  void
>  diagnostic_report_current_module (diagnostic_context *context,
> location_t where)
>  {
> @@ -721,7 +731,7 @@ diagnostic_report_current_module
> (diagnostic_context *context, location_t where)
>    if (map && last_module_changed_p (context, map))
>  {
>    set_last_module (context, map);
> -  if (! MAIN_FILE_P (map))
> +  if (! MAIN_FILE_P (map) && !includes_seen (map))
> {
>   bool first = true, need_inc = true, was_module =
> MAP_MODULE_P (map);
>   expanded_location s = {};
> 
> base-commit: b8ffa71e4271ae562c2d315b9b24c4979bbf8227
> prerequisite-patch-id: e45065ef320968d982923dd44da7bed07e3326ef




[PATCH RFA] diagnostic: avoid repeating include path

2022-01-13 Thread Jason Merrill via Gcc-patches
When a sequence of diagnostic messages bounces back and forth repeatedly
between two includes, as with

 #include 
 std::map m ("123", "456");

The output is quite a bit longer than necessary because we dump the include
path each time it changes.  I'd think we could print the include path once
for each header file, and then expect that the user can look earlier in the
output if they're wondering.

Tested x86_64-pc-linux-gnu, OK for trunk?

gcc/ChangeLog:

* diagnostic.c (includes_seen): New.
(diagnostic_report_current_module): Use it.
---
 gcc/diagnostic.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
index 58139427d01..e56441a2dbf 100644
--- a/gcc/diagnostic.c
+++ b/gcc/diagnostic.c
@@ -700,6 +700,16 @@ set_last_module (diagnostic_context *context, const line_map_ordinary *map)
   context->last_module = map;
 }
 
+/* Only dump the "In file included from..." stack once for each file.  */
+
+static bool
+includes_seen (const line_map_ordinary *map)
+{
+  using hset = hash_set;
+  static hset *set = new hset;
+  return set->add (map);
+}
+
 void
 diagnostic_report_current_module (diagnostic_context *context, location_t where)
 {
@@ -721,7 +731,7 @@ diagnostic_report_current_module (diagnostic_context *context, location_t where)
   if (map && last_module_changed_p (context, map))
 {
   set_last_module (context, map);
-  if (! MAIN_FILE_P (map))
+  if (! MAIN_FILE_P (map) && !includes_seen (map))
{
  bool first = true, need_inc = true, was_module = MAP_MODULE_P (map);
  expanded_location s = {};

base-commit: b8ffa71e4271ae562c2d315b9b24c4979bbf8227
prerequisite-patch-id: e45065ef320968d982923dd44da7bed07e3326ef
-- 
2.27.0



Re: [PATCH] cprop_hardreg: Workaround for narrow mode != lowpart targets

2022-01-13 Thread Andreas Krebbel via Gcc-patches
On 1/13/22 18:11, Andreas Krebbel via Gcc-patches wrote:
...
> @@ -5949,7 +5959,7 @@ register if floating point arithmetic is not being 
> done.  As long as the\n\
>  floating registers are not in class @code{GENERAL_REGS}, they will not\n\
>  be used unless some pattern's constraint asks for one.",
>   bool, (unsigned int regno, machine_mode mode),
> - hook_bool_uint_mode_true)
> + hook_bool_uint_true)
>  
>  DEFHOOK
>  (modes_tieable_p,

That hunk was a copy and paste bug and does not belong to the patch.

Andreas


[patch, libgfortran, power-ieee128] Add multiple defaults for GFORTRAN_CONVERT_UNIT

2022-01-13 Thread Thomas Koenig via Gcc-patches

Hello world,

with this patch, it is now possible to specify both the
endianness and the REAL(KIND=16) format using the
environment variable GFORTRAN_CONVERT_UNIT.  The following
now works:

tkoenig@gcc-fortran:~/Tst$ cat write_env.f90
program main
  real(kind=16) :: x
  character (len=30) :: conv
  x = 1/3._16
  open (10,file="out.dat",status="replace",access="stream",form="unformatted")

  inquire(10,convert=conv)
  print *,conv
  write (10) 1/3._16
end program main
tkoenig@gcc-fortran:~/Tst$ gfortran -g -static-libgfortran write_env.f90
tkoenig@gcc-fortran:~/Tst$ GFORTRAN_CONVERT_UNIT="little_endian;r16_ibm" && ./a.out

 LITTLE_ENDIAN,R16_IBM
tkoenig@gcc-fortran:~/Tst$ GFORTRAN_CONVERT_UNIT="little_endian;r16_ieee" && ./a.out

 LITTLE_ENDIAN,R16_IEEE
tkoenig@gcc-fortran:~/Tst$ GFORTRAN_CONVERT_UNIT="big_endian;r16_ieee" && ./a.out

 BIG_ENDIAN,R16_IEEE
tkoenig@gcc-fortran:~/Tst$ GFORTRAN_CONVERT_UNIT="big_endian;r16_ibm" && ./a.out

 BIG_ENDIAN,R16_IBM

Since the branch has been pushed to trunk, I don't think we need
it any more (or do we?), so OK for trunk?

Best regards

Thomas

Allow for multiple defaults in endianness and r16 in GFORTRAN_CONVERT_UNIT.

With this patch, it is possible to specify multiple defaults in the
GFORTRAN_CONVERT_UNIT environment variable so that, for example, R16_IEEE
and BIG_ENDIAN can be specified together.
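
To illustrate the idea of accumulating several defaults, here is a simplified
sketch, not the libgfortran parser: it splits a specification on ';' and ORs
one flag per recognized token into the result.  The CONVERT_* values and
parse_defaults are hypothetical, the real GFC_CONVERT_* constants differ,
and the unit exception list, case-insensitive matching and conflict checks
are not modelled.

```cpp
#include <sstream>
#include <string>

// Hypothetical flag values chosen so that they can be combined with |.
enum : unsigned {
  CONVERT_NONE = 0,
  CONVERT_LITTLE = 1u << 0,
  CONVERT_BIG = 1u << 1,
  CONVERT_R16_IEEE = 1u << 2,
  CONVERT_R16_IBM = 1u << 3,
  CONVERT_ERROR = ~0u
};

// Parse a ';'-separated list of defaults such as "big_endian;r16_ieee".
unsigned parse_defaults(const std::string &spec) {
  unsigned def = CONVERT_NONE;
  std::istringstream in(spec);
  std::string tok;
  while (std::getline(in, tok, ';')) {
    unsigned flag;
    if (tok == "little_endian")
      flag = CONVERT_LITTLE;
    else if (tok == "big_endian")
      flag = CONVERT_BIG;
    else if (tok == "r16_ieee")
      flag = CONVERT_R16_IEEE;
    else if (tok == "r16_ibm")
      flag = CONVERT_R16_IBM;
    else
      return CONVERT_ERROR;  // unknown token: reject the whole string
    def |= flag;             // accumulate instead of overwriting
  }
  return def;
}
```

Because CONVERT_NONE is zero in this sketch, a plain |= is enough; the patch
tests for GFC_CONVERT_NONE explicitly before combining values.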


libgfortran/ChangeLog:

* runtime/environ.c: Allow for multiple default values so that
separate default specifications for IBM long double format and
endianness are possible.

diff --git a/libgfortran/runtime/environ.c b/libgfortran/runtime/environ.c
index 3d60950234d..a53c64965b6 100644
--- a/libgfortran/runtime/environ.c
+++ b/libgfortran/runtime/environ.c
@@ -499,78 +499,79 @@ do_parse (void)
 
   unit_count = 0;
 
-  start = p;
-
   /* Parse the string.  First, let's look for a default.  */
-  tok = next_token ();
   endian = 0;
-
-  switch (tok)
+  while (1)
 {
-case NATIVE:
-  endian = GFC_CONVERT_NATIVE;
-  break;
+  start = p;
+  tok = next_token ();
+  switch (tok)
+	{
+	case NATIVE:
+	  endian = GFC_CONVERT_NATIVE;
+	  break;
 
-case SWAP:
-  endian = GFC_CONVERT_SWAP;
-  break;
+	case SWAP:
+	  endian = GFC_CONVERT_SWAP;
+	  break;
 
-case BIG:
-  endian = GFC_CONVERT_BIG;
-  break;
+	case BIG:
+	  endian = GFC_CONVERT_BIG;
+	  break;
 
-case LITTLE:
-  endian = GFC_CONVERT_LITTLE;
-  break;
+	case LITTLE:
+	  endian = GFC_CONVERT_LITTLE;
+	  break;
 
 #ifdef HAVE_GFC_REAL_17
-case R16_IEEE:
-  endian = GFC_CONVERT_R16_IEEE;
-  break;
+	case R16_IEEE:
+	  endian = GFC_CONVERT_R16_IEEE;
+	  break;
 
-case R16_IBM:
-  endian = GFC_CONVERT_R16_IBM;
-  break;
+	case R16_IBM:
+	  endian = GFC_CONVERT_R16_IBM;
+	  break;
 #endif
-case INTEGER:
-  /* A leading digit means that we are looking at an exception.
-	 Reset the position to the beginning, and continue processing
-	 at the exception list.  */
-  p = start;
-  goto exceptions;
-  break;
+	case INTEGER:
+	  /* A leading digit means that we are looking at an exception.
+	 Reset the position to the beginning, and continue processing
+	 at the exception list.  */
+	  p = start;
+	  goto exceptions;
+	  break;
 
-case END:
-  goto end;
-  break;
+	case END:
+	  goto end;
+	  break;
 
-default:
-  goto error;
-  break;
+	default:
+	  goto error;
+	  break;
 }
 
-  tok = next_token ();
-  switch (tok)
-{
-case ';':
-  def = endian;
-  break;
+  tok = next_token ();
+  switch (tok)
+	{
+	case ';':
+	  def = def == GFC_CONVERT_NONE ? endian : def | endian;
+	  break;
 
-case ':':
-  /* This isn't a default after all.  Reset the position to the
-	 beginning, and continue processing at the exception list.  */
-  p = start;
-  goto exceptions;
-  break;
+	case ':':
+	  /* This isn't a default after all.  Reset the position to the
+	 beginning, and continue processing at the exception list.  */
+	  p = start;
+	  goto exceptions;
+	  break;
 
-case END:
-  def = endian;
-  goto end;
-  break;
+	case END:
+	  def = def == GFC_CONVERT_NONE ? endian : def | endian;
+	  goto end;
+	  break;
 
-default:
-  goto error;
-  break;
+	default:
+	  goto error;
+	  break;
+	}
 }
 
  exceptions:


Re: Document current '-Wuninitialized'/'-Wmaybe-uninitialized' diagnostics for OpenACC test cases

2022-01-13 Thread Thomas Schwinge
Hi Martin!

On 2022-01-13T09:06:16-0700, Martin Sebor  wrote:
> On 1/13/22 03:55, Thomas Schwinge wrote:
>> This has fallen out of (unfinished...) work earlier in the year: pushed
>> to master branch commit 4bd8b1e881f0c26a5103cd1919809b3d63b60ef2
>> "Document current '-Wuninitialized'/'-Wmaybe-uninitialized' diagnostics
>> for OpenACC test cases".
>
> Thanks for the heads up.  If any of these are recent regressions
> (either the false negatives or the false positives) it would be
> helpful to isolate them to a few representative test cases.
> The warning itself hasn't changed much in GCC 12 but regressions
> in it could be due to the jump threading changes that it tends to
> be sensitive to.

Ah, sorry for the ambiguity -- I don't think any of these are recent
regressions.


Grüße
 Thomas


Re: [PATCH] c++: Avoid some -Wreturn-type false positives with const{expr,eval} if [PR103991]

2022-01-13 Thread Jakub Jelinek via Gcc-patches
On Thu, Jan 13, 2022 at 04:09:22PM -0500, Jason Merrill wrote:
> > The changes done to genericize_if_stmt in order to improve
> > -Wunreachable-code* warning (which Richi didn't actually commit
> > for GCC 12) are I think fine for normal ifs, but for constexpr if
> > and consteval if we have two competing warnings.
> > The problem is that we replace the non-taken clause (then or else)
> > with void_node and keep the if (cond) { something } else {}
> > or if (cond) {} else { something }; in the IL.
> > This helps -Wunreachable-code*, if something can't fallthru but the
> > non-taken clause can, we don't warn about code after it because it
> > is still (in theory) reachable.
> > But if the non-taken branch can't fallthru, we can get false positive
> > -Wreturn-type warnings (which are enabled by default) if there is
> > nothing after the if and the taken branch can't fallthru either.
> 
> Perhaps we should replace the non-taken clause with __builtin_unreachable()
> instead of void_node?

It depends.  If the non-taken clause doesn't exist, is empty or otherwise
can fallthru, then using void_node for it is what we want.
If it exists and can't fallthru, then __builtin_unreachable() is one
possibility, but for all purposes
  if (1)
something
  else
__builtin_unreachable();
is equivalent to genericization of it as
  something
and
  if (0)
__builtin_unreachable();
  else
something
too.
The main problem is what to do for the consteval if that throws away
the non-taken clause too early, whether we can do block_may_fallthru
already where we throw it away or not.  If we can do that, we could
as right now clear the non-taken clause if it can fallthru and otherwise
either set some flag on the IF_STMT or set the non-taken clause to
__builtin_unreachable or endless empty loop etc., ideally something
as cheap as possible.
 
> And/or block_may_fallthru could handle INTEGER_CST op0?

That is what I'm doing for consteval if in the patch because the info
whether the non-taken clause can fallthru is lost.
We can't do that for normal if, because the non-taken clause could
have labels in it to which something jumps.
But, block_may_fallthru isn't actually what is used for the -Wreturn-type
warning, I think we warn only at cfg creation.

Jakub



Re: [PATCH] c++: Reject in constant evaluation address comparisons of start of one var and end of another [PR89074]

2022-01-13 Thread Jason Merrill via Gcc-patches

On 1/6/22 04:24, Jakub Jelinek wrote:


The following testcase used to be incorrectly accepted.  The match.pd
optimization that uses address_compare punts on folding comparison
of start of one object and end of another one only when those addresses
are cast to integral types, when the comparison is done on pointer types
it assumes undefined behavior and decides to fold the comparison such
that the addresses don't compare equal even when at runtime they
could be equal.
But C++ says it is undefined behavior and so during constant evaluation
we should reject those, so this patch adds !folding_initializer &&
check to that spot.
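
For illustration only, here is a minimal sketch of the distinction described above. It is not the committed testcase, and the function names are made up. It shows pointer comparisons that remain usable in constant evaluation; the one the patch is meant to reject appears only in a comment.

```cpp
// Hypothetical illustration; the names are not taken from the patch.
constexpr bool
same_array ()
{
  int a[] = { 1, 2 };
  // Pointers into the same array, including one past the end, compare
  // in a well-defined way.
  return a + 1 == &a[1] && a + 0 != a + 2;
}

constexpr bool
distinct_starts ()
{
  int a[] = { 1, 2 };
  int b[] = { 3, 4 };
  // Neither pointer is one past the end of its object, so the two
  // addresses are known to differ.
  return a + 0 != b + 0 && a + 1 != b + 1;
  // By contrast, 'a + 2 == b + 0' (end of one object against the start
  // of another) is the case the patch intends to reject.
}

static_assert (same_array (), "same-array comparisons are constant");
static_assert (distinct_starts (), "distinct objects compare unequal");
```

Both functions need C++14 or later because of the local arrays inside a constexpr function.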

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


Note, address_compare has some special cases, e.g. it assumes that
static vars are never adjacent to automatic vars, which is the case
for the usual layout where automatic vars are on the stack and after
.rodata/.data sections there is heap:
   /* Assume that automatic variables can't be adjacent to global
  variables.  */
   else if (is_global_var (base0) != is_global_var (base1))
 ;
Is it ok that during constant evaluation we don't treat those as undefined
behavior, or shall that be with !folding_initializer && too?


I guess that's undefined as well.


Another special case is:
   if ((DECL_P (base0) && TREE_CODE (base1) == STRING_CST)
|| (TREE_CODE (base0) == STRING_CST && DECL_P (base1))
|| (TREE_CODE (base0) == STRING_CST
&& TREE_CODE (base1) == STRING_CST
&& ioff0 >= 0 && ioff1 >= 0
&& ioff0 < TREE_STRING_LENGTH (base0)
&& ioff1 < TREE_STRING_LENGTH (base1)
   /* This is a too conservative test that the STRING_CSTs
  will not end up being string-merged.  */
&& strncmp (TREE_STRING_POINTER (base0) + ioff0,
TREE_STRING_POINTER (base1) + ioff1,
MIN (TREE_STRING_LENGTH (base0) - ioff0,
 TREE_STRING_LENGTH (base1) - ioff1)) != 0))
 ;
   else if (!DECL_P (base0) || !DECL_P (base1))
 return 2;
Here we similarly assume that vars aren't adjacent to string literals
or vice versa.  Do we need to stick !folding_initializer && to those
DECL_P vs. STRING_CST cases?


Seems so.


Though, because of the return 2; for
non-DECL_P that would mean rejecting comparisons like  == &"foobar"[3]
etc. which ought to be fine, no?  So perhaps we need to watch for
decls. vs. STRING_CSTs like for DECLs whether the address is at the start
or at the end of the string literal or somewhere in between (at least
for folding_initializer)?


Agreed.


And yet another chapter but probably unsolvable is comparison of
string literal addresses.  I think pedantically in C++
&"foo"[0] == &"foo"[0] is undefined behavior, different occurrences of
the same string literals might still not be merged in some implementations.


I disagree; it's unspecified whether string literals are merged, but I 
think the comparison result is well specified depending on that 
implementation behavior.



But constexpr const char *s = "foo"; [0] == [0] should be well defined,
and we aren't tracking anywhere whether the string literal was the same one
or different (and I think other compilers don't track that either).

2022-01-06  Jakub Jelinek  

PR c++/89074
* fold-const.c (address_compare): Punt on comparison of address of
one object with address of end of another object if
folding_initializer.

* g++.dg/cpp1y/constexpr-89074-1.C: New test.

--- gcc/fold-const.c.jj 2022-01-05 20:30:08.731806756 +0100
+++ gcc/fold-const.c2022-01-05 20:34:52.277822349 +0100
@@ -16627,7 +16627,7 @@ address_compare (tree_code code, tree ty
/* If this is a pointer comparison, ignore for now even
   valid equalities where one pointer is the offset zero
   of one object and the other to one past end of another one.  */
-  else if (!INTEGRAL_TYPE_P (type))
+  else if (!folding_initializer && !INTEGRAL_TYPE_P (type))
  ;
/* Assume that automatic variables can't be adjacent to global
   variables.  */
--- gcc/testsuite/g++.dg/cpp1y/constexpr-89074-1.C.jj   2022-01-05 20:43:03.696917484 +0100
+++ gcc/testsuite/g++.dg/cpp1y/constexpr-89074-1.C  2022-01-05 20:42:12.676634044 +0100
@@ -0,0 +1,28 @@
+// PR c++/89074
+// { dg-do compile { target c++14 } }
+
+constexpr bool
+foo ()
+{
+  int a[] = { 1, 2 };
+  int b[] = { 3, 4 };
+
+  if ([0] == [0])
+return false;
+
+  if ([1] == [0])
+return false;
+
+  if ([1] == [1])
+return false;
+
+  if ([2] == [1])
+return false;
+
+  if ([2] == [0])  // { dg-error "is not a constant expression" }
+return false;
+
+  return true;
+}
+
+constexpr bool a = foo ();




Re: [PATCH] c++: Avoid some -Wreturn-type false positives with const{expr,eval} if [PR103991]

2022-01-13 Thread Jason Merrill via Gcc-patches

On 1/13/22 04:39, Jakub Jelinek wrote:

Hi!

The changes done to genericize_if_stmt in order to improve
-Wunreachable-code* warning (which Richi didn't actually commit
for GCC 12) are I think fine for normal ifs, but for constexpr if
and consteval if we have two competing warnings.
The problem is that we replace the non-taken clause (then or else)
with void_node and keep the if (cond) { something } else {}
or if (cond) {} else { something }; in the IL.
This helps -Wunreachable-code*, if something can't fallthru but the
non-taken clause can, we don't warn about code after it because it
is still (in theory) reachable.
But if the non-taken branch can't fallthru, we can get false positive
-Wreturn-type warnings (which are enabled by default) if there is
nothing after the if and the taken branch can't fallthru either.
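
As a concrete picture of the situation described above, here is a minimal sketch in the spirit of the PR's testcase. The destructor is given a body here so the example links on its own, which is an assumption of this sketch rather than something the testcase does.

```cpp
// Sketch only: both clauses of the constexpr if return, so control can
// never reach the closing brace of foo, yet the non-taken clause is what
// the -Wreturn-type logic has to reason about.
struct S { ~S () {} };

int
foo ()
{
  S s;  // a local with a non-trivial destructor, as in the PR
  if constexpr (true)
    return 0;
  else
    return 1;
}
```

This needs C++17 for `if constexpr`; a warning about control reaching the end of `foo` would be a false positive.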


Perhaps we should replace the non-taken clause with 
__builtin_unreachable() instead of void_node?


And/or block_may_fallthru could handle INTEGER_CST op0?


One possibility to fix this is revert at least temporarily
to the previous behavior for constexpr and consteval if, yes, we
can get false positive -Wunreachable-code* warnings but the warning
isn't present in GCC 12.
The patch below implements that for constexpr if which throws its
clauses very early (either during parsing or during instantiation),
and for consteval if it decides based on block_may_fallthru on the
non-taken (for constant evaluation only) clause - if the non-taken
branch may fallthru, it does what you did in genericize_if_stmt
for consteval if, if it can't fallthru, it uses the older way
of pretending there wasn't an if and just replacing it with the
taken clause.  There are some false positive risks with this though,
block_may_fallthru is optimistic and doesn't handle some statements
at all (like FOR_STMT, WHILE_STMT, DO_STMT - of course handling those
is quite hard).
For constexpr if (but perhaps for GCC 13?) we could try to
block_may_fallthru before we throw it away and remember it in some
flag on the IF_STMT, but am not sure how dangerous would it be to call
it on the discarded stmts.  Or if it is too dangerous e.g. just
remember whether the discarded block of consteval if wasn't present
or was empty, in that case assume fallthru, and otherwise assume
it can't fallthru (-Wunreachable-code possible false positives).
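
The fall-through rule described above can be sketched with a toy statement tree. Everything here (`Stmt`, `make_if`, the integer encoding of the condition) is hypothetical and much simpler than GCC's trees; it only models the idea that a constexpr if with a constant condition is judged by its taken clause alone.

```cpp
#include <utility>
#include <vector>

// Hypothetical toy statement tree; not GCC's tree representation.
struct Stmt
{
  enum Kind { Expr, Return, If };
  Kind kind = Expr;
  bool is_constexpr_if = false;   // models IF_STMT_CONSTEXPR_P
  int const_cond = -1;            // 1 = true, 0 = false, -1 = not constant
  std::vector<Stmt> then_clause;  // an empty vector models a missing clause
  std::vector<Stmt> else_clause;
};

bool block_may_fallthru (const std::vector<Stmt> &block);

bool
stmt_may_fallthru (const Stmt &s)
{
  switch (s.kind)
    {
    case Stmt::Return:
      return false;
    case Stmt::If:
      // For a constexpr if with a known condition, only the taken clause
      // matters.
      if (s.is_constexpr_if && s.const_cond == 1)
	return block_may_fallthru (s.then_clause);
      if (s.is_constexpr_if && s.const_cond == 0)
	return block_may_fallthru (s.else_clause);
      // Otherwise either clause falling through is enough.
      return (block_may_fallthru (s.then_clause)
	      || block_may_fallthru (s.else_clause));
    default:
      return true;
    }
}

bool
block_may_fallthru (const std::vector<Stmt> &block)
{
  // An empty block falls through; otherwise the last statement decides.
  return block.empty () || stmt_may_fallthru (block.back ());
}

Stmt make_expr () { return Stmt (); }

Stmt
make_return ()
{
  Stmt s;
  s.kind = Stmt::Return;
  return s;
}

Stmt
make_if (bool is_constexpr, int cond, std::vector<Stmt> t, std::vector<Stmt> e)
{
  Stmt s;
  s.kind = Stmt::If;
  s.is_constexpr_if = is_constexpr;
  s.const_cond = cond;
  s.then_clause = std::move (t);
  s.else_clause = std::move (e);
  return s;
}
```

With this model, `if constexpr (true) return; else expr;` cannot fall through, while the same statement as an ordinary if can, because its else clause may.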

Bootstrapped/regtested on x86_64-linux and i686-linux, if needed,
I can also test the safer variant with just
   if (IF_STMT_CONSTEVAL_P (stmt))
 stmt = else_;
for consteval if.

2022-01-13  Jakub Jelinek  

PR c++/103991
* cp-objcp-common.c (cxx_block_may_fallthru) : For
IF_STMT_CONSTEXPR_P with constant false or true condition only
check if the taken clause may fall through.
* cp-gimplify.c (genericize_if_stmt): For consteval if, revert
to r12-5638^ behavior if then_ block can't fall through.  For
constexpr if, revert to r12-5638^ behavior.

* g++.dg/warn/Wreturn-type-13.C: New test.

--- gcc/cp/cp-objcp-common.c.jj 2022-01-11 23:11:22.091294356 +0100
+++ gcc/cp/cp-objcp-common.c2022-01-12 17:57:18.232202275 +0100
@@ -313,6 +313,13 @@ cxx_block_may_fallthru (const_tree stmt)
return false;
  
  case IF_STMT:

+  if (IF_STMT_CONSTEXPR_P (stmt))
+   {
+ if (integer_nonzerop (IF_COND (stmt)))
+   return block_may_fallthru (THEN_CLAUSE (stmt));
+ if (integer_zerop (IF_COND (stmt)))
+   return block_may_fallthru (ELSE_CLAUSE (stmt));
+   }
if (block_may_fallthru (THEN_CLAUSE (stmt)))
return true;
return block_may_fallthru (ELSE_CLAUSE (stmt));
--- gcc/cp/cp-gimplify.c.jj 2022-01-11 23:11:22.090294370 +0100
+++ gcc/cp/cp-gimplify.c2022-01-12 21:22:17.585212804 +0100
@@ -166,8 +166,15 @@ genericize_if_stmt (tree *stmt_p)
   can contain unfolded immediate function calls, we have to discard
   the then_ block regardless of whether else_ has side-effects or not.  */
if (IF_STMT_CONSTEVAL_P (stmt))
-stmt = build3 (COND_EXPR, void_type_node, boolean_false_node,
-  void_node, else_);
+{
+  if (block_may_fallthru (then_))
+   stmt = build3 (COND_EXPR, void_type_node, boolean_false_node,
+  void_node, else_);
+  else
+   stmt = else_;
+}
+  else if (IF_STMT_CONSTEXPR_P (stmt))
+stmt = integer_nonzerop (cond) ? then_ : else_;
else
  stmt = build3 (COND_EXPR, void_type_node, cond, then_, else_);
protected_set_expr_location_if_unset (stmt, locus);
--- gcc/testsuite/g++.dg/warn/Wreturn-type-13.C.jj  2022-01-12 21:21:36.567794238 +0100
+++ gcc/testsuite/g++.dg/warn/Wreturn-type-13.C 2022-01-12 21:20:48.487475787 +0100
@@ -0,0 +1,35 @@
+// PR c++/103991
+// { dg-do compile { target c++17 } }
+
+struct S { ~S(); };
+int
+foo ()
+{
+  S s;
+  if constexpr (true)
+return 0;
+  else
+return 1;
+}  // { dg-bogus "control reaches end of non-void 

Re: [PATCH] c++: error message for dependent template members [PR70417]

2022-01-13 Thread Jason Merrill via Gcc-patches

On 12/9/21 10:51, Jason Merrill wrote:

On 12/4/21 12:23, Anthony Sharp wrote:

Hi Jason,

Hope you are well. Apologies for not coming back sooner.

 >I'd put it just above the definition of saved_token_sentinel in parser.c.


Sounds good, done.

 >Maybe cp_parser_require_end_of_template_parameter_list?  Either way is fine.


Even better, have changed it.

 >Hmm, good point; operators that are member functions must be non-static,
 >so we couldn't be doing a comparison of the address of the function.

In that case I have set it to return early there.

 >So declarator_p should be true there.  I'll fix that.

Thank you.

 >> +  if (next_token->keyword == RID_TEMPLATE)
 >> +    {
 >> +      /* But at least make sure it's properly formed (e.g. see PR19397).  */
 >> +      if (cp_lexer_peek_nth_token (parser->lexer, 2)->type == CPP_NAME)
 >> +       return 1;
 >> +
 >> +      return -1;
 >> +    }
 >> +
 >> +  /* Could be a ~ referencing the destructor of a class template.  */
 >> +  if (next_token->type == CPP_COMPL)
 >> +    {
 >> +      /* It could only be a template.  */
 >> +      if (cp_lexer_peek_nth_token (parser->lexer, 2)->type == CPP_NAME)
 >> +       return 1;
 >> +
 >> +      return -1;
 >> +    }
 >
 >Why don't these check for the < ?

I think perhaps I could have named the function better; instead of
next_token_begins_template_id, how's about next_token_begins_template_name?

That's all I intended to check for.


You're using it to control whether we try to parse a template-id, and 
it's used to initialize variables named looks_like_template_id, so I 
think better to keep the old name.



In the first case, something like "->template some_name" will always be
intended as a template, so no need to check for the <. If there were default
template arguments you could also validly omit the <> completely, I think
(could be wrong).


Or if the template arguments can be deduced, yes:

template  struct A
{
   template  void f(U u);
};

template  void g(A a)
{
   a->template f(42);
}

But 'f' is still not a template-id.
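
For readers following along, here is a small compilable sketch of a dependent member template call of the kind under discussion. It is not the example from the message above; it uses `.` access and an explicit template argument, and the names are illustrative only.

```cpp
// Illustrative sketch; not taken from the patch or its tests.
template <class T>
struct A
{
  template <class U>
  U f (U u) const { return u; }
};

template <class T>
int
g (const A<T> &a)
{
  // 'a' has a dependent type, so the 'template' keyword is needed for
  // 'f<int>' to be parsed as a template-id.  Without it, the '<' would be
  // taken as a less-than operator.
  return a.template f<int> (42);
}
```

Dropping the keyword in `g` gives the kind of code the proposed warning is meant to diagnose more helpfully.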

...

Actually, it occurs to me that you might be better off handling this in 
cp_parser_template_name, something like the below, to avoid the complex 
duplicate logic in the id-expression handling.


Note that in this patch I'm using "any_object_scope" as a proxy for "I'm 
parsing an expression", since !is_declaration doesn't work for that; as 
a result, this doesn't handle the static member function template case. 
For that you'd probably still need to pass down a flag so that 
cp_parser_template_name knows it's being called from 
cp_parser_id_expression.


Your patch has a false positive on

template  struct A { };
template  void f()
{
   A();
};

which my patch checks in_template_argument_list_p to avoid, though 
checking any_object_scope also currently avoids it.


What do you think?


I decided that it made more sense to keep the check in 
cp_parser_id_expression like you had it, but I moved it to the end to 
simplify the logic.  Here's what I'm applying, thanks!

From 1978f05716133b934de0fca7c3d64089b62e3e78 Mon Sep 17 00:00:00 2001
From: Anthony Sharp 
Date: Sat, 4 Dec 2021 17:23:22 +
Subject: [PATCH] c++: warning for dependent template members [PR70417]
To: gcc-patches@gcc.gnu.org

Add a helpful warning message for when the user forgets to
include the "template" keyword after ., -> or :: when
accessing a member in a dependent context, where the member is a
template.

	PR c++/70417

gcc/c-family/ChangeLog:

	* c.opt: Added -Wmissing-template-keyword.

gcc/cp/ChangeLog:

	* parser.c (cp_parser_id_expression): Handle
	-Wmissing-template-keyword.
	(struct saved_token_sentinel): Add modes to control what happens
	on destruction.
	(cp_parser_statement): Adjust.
	(cp_parser_skip_entire_template_parameter_list): New function that
	skips an entire template parameter list.
	(cp_parser_require_end_of_template_parameter_list): Rename old
	cp_parser_skip_to_end_of_template_parameter_list.
	(cp_parser_skip_to_end_of_template_parameter_list): Refactor to be
	called from one of the above two functions.
	(cp_parser_lambda_declarator_opt)
	(cp_parser_explicit_template_declaration)
	(cp_parser_enclosed_template_argument_list): Adjust.

gcc/ChangeLog:

	* doc/invoke.texi: Documentation for Wmissing-template-keyword.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/variadic-mem_fn2.C: Catch warning about missing
	template keyword.
	* g++.dg/template/dependent-name17.C: New test.
	* g++.dg/template/dependent-name18.C: New test.

Co-authored-by: Jason Merrill 
---
 gcc/doc/invoke.texi   |  33 
 gcc/c-family/c.opt|   4 +
 gcc/cp/parser.c   | 178 +-
 gcc/testsuite/g++.dg/cpp0x/variadic-mem_fn2.C |   1 +
 .../g++.dg/template/dependent-name17.C|  49 +
 .../g++.dg/template/dependent-name18.C|   5 +
 6 files changed, 223 insertions(+), 47 deletions(-)
 create mode 100644 

[PATCH] PR fortran/103782 - [9/10/11/12 Regression] internal error occurs when overloading intrinsic

2022-01-13 Thread Harald Anlauf via Gcc-patches
Dear all,

there was a regression handling overloaded elemental intrinsics,
leading to an ICE on valid code.  Reported by Urban Jost.

The logic for when we need to scalarize a call to an intrinsic
seems to have been broken during the 9-release.  The attached
patch fixes the ICE and seems to work on the extended testcase
as well as regtests fine on x86_64-pc-linux-gnu.

OK for mainline?  Backport to affected branches?

Thanks,
Harald

From 5b914bef991528aebfe9734b4e7af7bae039e66a Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Thu, 13 Jan 2022 21:50:45 +0100
Subject: [PATCH] Fortran: fix ICE overloading elemental intrinsics

gcc/fortran/ChangeLog:

	PR fortran/103782
	* expr.c (gfc_simplify_expr): Adjust logic for when to scalarize a
	call of an intrinsic which may have been overloaded.

gcc/testsuite/ChangeLog:

	PR fortran/103782
	* gfortran.dg/overload_4.f90: New test.
---
 gcc/fortran/expr.c   |  5 ++---
 gcc/testsuite/gfortran.dg/overload_4.f90 | 27 
 2 files changed, 29 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/overload_4.f90

diff --git a/gcc/fortran/expr.c b/gcc/fortran/expr.c
index a87686d8217..20b88a8ef56 100644
--- a/gcc/fortran/expr.c
+++ b/gcc/fortran/expr.c
@@ -2219,10 +2219,9 @@ gfc_simplify_expr (gfc_expr *p, int type)
 	  && gfc_intrinsic_func_interface (p, 1) == MATCH_ERROR)
 	return false;

-  if (p->expr_type == EXPR_FUNCTION)
+  if (p->symtree && (p->value.function.isym || p->ts.type == BT_UNKNOWN))
 	{
-	  if (p->symtree)
-	isym = gfc_find_function (p->symtree->n.sym->name);
+	  isym = gfc_find_function (p->symtree->n.sym->name);
 	  if (isym && isym->elemental)
 	scalarize_intrinsic_call (p, false);
 	}
diff --git a/gcc/testsuite/gfortran.dg/overload_4.f90 b/gcc/testsuite/gfortran.dg/overload_4.f90
new file mode 100644
index 000..43207e358ba
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/overload_4.f90
@@ -0,0 +1,27 @@
+! { dg-do run }
+! { dg-additional-options "-Wno-intrinsic-shadow" }
+! PR fortran/103782 - ICE overloading an intrinsic like dble or real
+! Contributed by Urban Jost
+
+program runtest
+  implicit none
+  interface dble
+ procedure to_double
+  end interface dble
+  interface real
+ procedure floor ! not really FLOOR...
+  end interface real
+  if (any (dble ([10.0d0,20.0d0]) - [10.0d0,20.0d0] /= 0.d0)) stop 1
+  if (any (real ([1.5,2.5])   - [1.5,2.5]   /= 0.0 )) stop 2
+contains
+  elemental function to_double (valuein) result(d_out)
+doubleprecision,intent(in) :: valuein
+doubleprecision:: d_out
+d_out=valuein
+  end function to_double
+  elemental function floor (valuein) result(d_out) ! not really FLOOR...
+real, intent(in) :: valuein
+real :: d_out
+d_out=valuein
+  end function floor
+end program runtest
--
2.31.1



Re: [COMIITTED] Testsuite: Make dependence on -fdelete-null-pointer-checks explicit

2022-01-13 Thread Jonathan Wakely via Gcc-patches

On 10/01/22 11:45 +, Jonathan Wakely wrote:

CC libstdc++ and Jakub.

On 08/01/22 23:22 -0700, Sandra Loosemore wrote:

I've checked in these tweaks for various testcases that fail on
nios2-elf without an explicit -fdelete-null-pointer-checks option.  This
target is configured to build with that optimization off by default.

-Sandra

commit 04c69d0e61c0f98a010d77a79ab749d5f0aa6b67
Author: Sandra Loosemore 
Date:   Sat Jan 8 22:02:13 2022 -0800

  Testsuite: Make dependence on -fdelete-null-pointer-checks explicit

  nios2-elf target defaults to -fno-delete-null-pointer-checks, breaking
  tests that implicitly depend on that optimization.  Add the option
  explicitly on these tests.

  2022-01-08  Sandra Loosemore  

gcc/testsuite/
* g++.dg/cpp0x/constexpr-compare1.C: Add explicit
-fdelete-null-pointer-checks option.
* g++.dg/cpp0x/constexpr-compare2.C: Likewise.
* g++.dg/cpp0x/constexpr-typeid2.C: Likewise.
* g++.dg/cpp1y/constexpr-94716.C: Likewise.
* g++.dg/cpp1z/constexpr-compare1.C: Likewise.
* g++.dg/cpp1z/constexpr-if36.C: Likewise.
* gcc.dg/init-compare-1.c: Likewise.

libstdc++-v3/
* testsuite/18_support/type_info/constexpr.cc: Add explicit
-fdelete-null-pointer-checks option.


This test should not be doing anything with null pointers. Instead of
working around the error on nios2-elf, I think the front-end needs
fixing.

Maybe something is not being folded early enough for the constexpr
evaluation to work. Jakub?

$ g++ -std=gnu++23 ~/src/gcc/gcc/libstdc++-v3/testsuite/18_support/type_info/constexpr.cc -c -fno-delete-null-pointer-checks
/home/jwakely/src/gcc/gcc/libstdc++-v3/testsuite/18_support/type_info/constexpr.cc:49:22: error: non-constant condition for static assertion
  49 | static_assert( test01() );
 |~~^~
In file included from /home/jwakely/src/gcc/gcc/libstdc++-v3/testsuite/18_support/type_info/constexpr.cc:5:
/home/jwakely/src/gcc/gcc/libstdc++-v3/testsuite/18_support/type_info/constexpr.cc:49:22:   in 'constexpr' expansion of 'test01()'
/home/jwakely/gcc/12/include/c++/12.0.0/typeinfo:196:19: error: '(((const std::type_info*)(& _ZTIi)) == ((const std::type_info*)(& _ZTIl)))' is not a constant expression
 196 |   return this == &__arg;
 |  ~^


This is now https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104016



[PATCH] i386: Introduce V2QImode vectorized shifts [PR103861]

2022-01-13 Thread Uros Bizjak via Gcc-patches
Add V2QImode shift operations and split them to synthesized
double HI/LO QImode operations with integer registers.

Also robustify arithmetic split patterns.
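
As a rough picture of what "split into low and high QImode operations" means, here is a scalar sketch that shifts the two 8-bit lanes of a 16-bit value independently. The function names are made up, the shift count is assumed to be in the range 0..7, and the arithmetic right shift relies on the usual two's-complement behaviour of signed shifts; none of this is the machine description itself.

```cpp
#include <cstdint>

// Hypothetical scalar model of a two-lane (V2QI-like) shift.
using u8 = std::uint8_t;
using u16 = std::uint16_t;

constexpr u8 lo_lane (u16 v) { return static_cast<u8> (v & 0xFF); }
constexpr u8 hi_lane (u16 v) { return static_cast<u8> (v >> 8); }

constexpr u16
pack (u8 lo, u8 hi)
{
  return static_cast<u16> (lo | (hi << 8));
}

// Shift each lane left; bits shifted out of a lane are dropped.
constexpr u16
v2qi_shl (u16 v, unsigned n)
{
  return pack (static_cast<u8> (lo_lane (v) << n),
	       static_cast<u8> (hi_lane (v) << n));
}

// Logical right shift of each lane.
constexpr u16
v2qi_lshr (u16 v, unsigned n)
{
  return pack (static_cast<u8> (lo_lane (v) >> n),
	       static_cast<u8> (hi_lane (v) >> n));
}

// Arithmetic right shift of each lane (sign bit of the lane is replicated).
constexpr u16
v2qi_ashr (u16 v, unsigned n)
{
  return pack (static_cast<u8> (static_cast<std::int8_t> (lo_lane (v)) >> n),
	       static_cast<u8> (static_cast<std::int8_t> (hi_lane (v)) >> n));
}
```

The point of the per-lane form is that a plain 16-bit shift would let bits leak from one byte into the other.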

2022-01-13  Uroš Bizjak  

gcc/ChangeLog:

PR target/103861
* config/i386/i386.md (*ashlqi_ext_2): New insn pattern.
(*qi_ext_2): Ditto.
* config/i386/mmx.md (v2qi):
New insn_and_split pattern.

gcc/testsuite/ChangeLog:

PR target/103861
* gcc.target/i386/pr103861.c (shl,ashr,lshr): New tests.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index bcaaa4993b1..c2acb1dbd90 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -12413,6 +12413,54 @@
(const_string "*")))
(set_attr "mode" "")])
 
+(define_insn "*ashlqi_ext_2"
+  [(set (zero_extract:SWI248
+ (match_operand:SWI248 0 "register_operand" "+Q")
+ (const_int 8)
+ (const_int 8))
+   (subreg:SWI248
+ (ashift:QI
+   (subreg:QI
+ (zero_extract:SWI248
+   (match_operand:SWI248 1 "register_operand" "0")
+   (const_int 8)
+   (const_int 8)) 0)
+   (match_operand:QI 2 "nonmemory_operand" "cI")) 0))
+  (clobber (reg:CC FLAGS_REG))]
+  "/* FIXME: without this LRA can't reload this pattern, see PR82524.  */
+   rtx_equal_p (operands[0], operands[1])"
+{
+  switch (get_attr_type (insn))
+{
+case TYPE_ALU:
+  gcc_assert (operands[2] == const1_rtx);
+  return "add{b}\t%h0, %h0";
+
+default:
+  if (operands[2] == const1_rtx
+ && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+   return "sal{b}\t%h0";
+  else
+   return "sal{b}\t{%2, %h0|%h0, %2}";
+}
+}
+  [(set (attr "type")
+ (cond [(and (match_test "TARGET_DOUBLE_WITH_ADD")
+(match_operand 2 "const1_operand"))
+ (const_string "alu")
+  ]
+  (const_string "ishift")))
+   (set (attr "length_immediate")
+ (if_then_else
+   (ior (eq_attr "type" "alu")
+   (and (eq_attr "type" "ishift")
+(and (match_operand 2 "const1_operand")
+ (ior (match_test "TARGET_SHIFT1")
+  (match_test "optimize_function_for_size_p 
(cfun)")
+   (const_string "0")
+   (const_string "*")))
+   (set_attr "mode" "QI")])
+
 ;; See comment above `ashl3' about how this works.
 
 (define_expand "3"
@@ -13143,6 +13191,39 @@
(const_string "0")
(const_string "*")))
(set_attr "mode" "")])
+
+(define_insn "*qi_ext_2"
+  [(set (zero_extract:SWI248
+ (match_operand:SWI248 0 "register_operand" "+Q")
+ (const_int 8)
+ (const_int 8))
+   (subreg:SWI248
+ (any_shiftrt:QI
+   (subreg:QI
+ (zero_extract:SWI248
+   (match_operand:SWI248 1 "register_operand" "0")
+   (const_int 8)
+   (const_int 8)) 0)
+   (match_operand:QI 2 "nonmemory_operand" "cI")) 0))
+  (clobber (reg:CC FLAGS_REG))]
+  "/* FIXME: without this LRA can't reload this pattern, see PR82524.  */
+   rtx_equal_p (operands[0], operands[1])"
+{
+  if (operands[2] == const1_rtx
+  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+return "{b}\t%h0";
+  else
+return "{b}\t{%2, %h0|%h0, %2}";
+}
+  [(set_attr "type" "ishift")
+   (set (attr "length_immediate")
+ (if_then_else
+   (and (match_operand 2 "const1_operand")
+   (ior (match_test "TARGET_SHIFT1")
+(match_test "optimize_function_for_size_p (cfun)")))
+   (const_string "0")
+   (const_string "*")))
+   (set_attr "mode" "QI")])
 
 ;; Rotate instructions
 
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 3d99a5e851b..782da220f98 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1657,7 +1657,8 @@
 (neg:V2QI
  (match_operand:V2QI 1 "general_reg_operand")))
(clobber (reg:CC FLAGS_REG))]
-  "reload_completed"
+  "(!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
+   && reload_completed"
   [(parallel
  [(set (strict_low_part (match_dup 0))
   (neg:QI (match_dup 1)))
@@ -1683,7 +1684,8 @@
 (neg:V2QI
  (match_operand:V2QI 1 "sse_reg_operand")))
(clobber (reg:CC FLAGS_REG))]
-  "reload_completed"
+  "(!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
+   && TARGET_SSE2 && reload_completed"
   [(set (match_dup 0) (match_dup 2))
(set (match_dup 0)
(minus:V16QI (match_dup 0) (match_dup 1)))]
@@ -1757,7 +1759,8 @@
  (match_operand:V2QI 1 "general_reg_operand")
  (match_operand:V2QI 2 "general_reg_operand")))
(clobber (reg:CC FLAGS_REG))]
-  "reload_completed"
+  "(!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
+   && reload_completed"
   [(parallel
  [(set (strict_low_part (match_dup 0))
   

Re: PING^2 (C/C++): Re: [PATCH 6/6] Add __attribute__ ((tainted))

2022-01-13 Thread Jason Merrill via Gcc-patches

On 1/12/22 10:33, David Malcolm wrote:

On Tue, 2022-01-11 at 23:36 -0500, Jason Merrill wrote:

On 1/10/22 16:36, David Malcolm via Gcc-patches wrote:

On Thu, 2022-01-06 at 09:08 -0500, David Malcolm wrote:

On Sat, 2021-11-13 at 15:37 -0500, David Malcolm wrote:

This patch adds a new __attribute__ ((tainted)) to the C/C++
frontends.


Ping for GCC C/C++ maintainers for review of the C/C++ FE parts of this
patch (attribute registration, documentation, the name of the
attribute, etc).

(I believe it's independent of the rest of the patch kit, in that it
could go into trunk without needing the prior patches)

Thanks
Dave


Getting close to end of stage 3 for GCC 12, so pinging this patch
again...

https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584376.html


The c-family change is OK.


Thanks.

I'm retesting the patch now, but it now seems to me that
   __attribute__((tainted_args))
would lead to more readable code than:
   __attribute__((tainted))

in that the name "tainted_args" better conveys the idea that all
arguments are under attacker-control (as opposed to the body of the
function or the function pointer being under attacker-control).

Looking at
   https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html
we already have some attributes with underscores in their names.

Does this sound good?


Makes sense to me.




Thanks
Dave






It can be used on function decls: the analyzer will treat as tainted
all parameters to the function and all buffers pointed to by parameters
to the function.  Adding this in one place to the Linux kernel's
__SYSCALL_DEFINEx macro allows the analyzer to treat all syscalls as
having tainted inputs.  This gives additional testing beyond e.g. __user
pointers added by earlier patches - an example of the use of this can be
seen in CVE-2011-2210, where given:

   SYSCALL_DEFINE5(osf_getsysinfo, unsigned long, op, void __user *, buffer,
                   unsigned long, nbytes, int __user *, start, void __user *, arg)

the analyzer will treat the nbytes param as under attacker control, and
can complain accordingly:

taint-CVE-2011-2210-1.c: In function ‘sys_osf_getsysinfo’:
taint-CVE-2011-2210-1.c:69:21: warning: use of attacker-controlled value
    ‘nbytes’ as size without upper-bounds checking [CWE-129] [-Wanalyzer-tainted-size]
     69 | if (copy_to_user(buffer, hwrpb, nbytes) != 0)
    | ^~~

Additionally, the patch allows the attribute to be used on field decls:
specifically function pointers.  Any function used as an initializer
for such a field gets treated as tainted.  An example can be seen in
CVE-2020-13143, where adding __attribute__((tainted)) to the "store"
callback of configfs_attribute:

    struct configfs_attribute {
       /* [...snip...] */
       ssize_t (*store)(struct config_item *, const char *, size_t)
         __attribute__((tainted));
       /* [...snip...] */
    };

allows the analyzer to see:

   CONFIGFS_ATTR(gadget_dev_desc_, UDC);

and treat gadget_dev_desc_UDC_store as tainted, so that it complains:

taint-CVE-2020-13143-1.c: In function ‘gadget_dev_desc_UDC_store’:
taint-CVE-2020-13143-1.c:33:17: warning: use of attacker-controlled value
    ‘len + 18446744073709551615’ as offset without upper-bounds checking [CWE-823] [-Wanalyzer-tainted-offset]
     33 | if (name[len - 1] == '\n')
    | ^

Similarly, the attribute could be used on the ioctl callback field,
USB device callbacks, network-handling callbacks etc.  This potentially
gives a lot of test coverage with relatively little code annotation, and
without necessarily needing link-time analysis (which -fanalyzer can
only do at present on trivial examples).

I believe this is the first time we've had an attribute on a field.
If that's an issue, I could prepare a version of the patch that
merely allowed it on functions themselves.

As before this currently still needs -fanalyzer-checker=taint (in
addition to -fanalyzer).
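
To make the function-level use concrete, here is a small sketch of an annotated function that performs the upper-bounds check the analyzer is looking for. It assumes the `tainted_args` spelling proposed earlier in this thread and guards it with `__has_attribute`, so the macro expands to nothing on compilers that do not know the attribute. `TAINTED_ARGS`, `table` and `copy_checked` are hypothetical names, not part of the patch.

```cpp
#include <cstddef>
#include <cstring>

// Assumption: the attribute ends up spelled 'tainted_args'.  Where it is
// unavailable, TAINTED_ARGS expands to nothing.
#if defined(__has_attribute)
# if __has_attribute(tainted_args)
#  define TAINTED_ARGS __attribute__((tainted_args))
# endif
#endif
#ifndef TAINTED_ARGS
# define TAINTED_ARGS
#endif

static char table[64];

// Copies up to nbytes bytes from 'table' into dst and returns the number
// of bytes copied.  nbytes is treated as attacker-controlled, so it is
// clamped before being used as a size.
TAINTED_ARGS std::size_t
copy_checked (char *dst, std::size_t nbytes)
{
  if (nbytes > sizeof table)
    nbytes = sizeof table;  // the upper-bounds check
  std::memcpy (dst, table, nbytes);
  return nbytes;
}
```

Removing the clamp gives the shape of code that the warnings quoted above are meant to flag.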

gcc/analyzer/ChangeLog:
  * engine.cc: Include "stringpool.h", "attribs.h", and
  "tree-dfa.h".
  (mark_params_as_tainted): New.
  (class tainted_function_custom_event): New.
  (class tainted_function_info): New.
  (exploded_graph::add_function_entry): Handle functions with
  "tainted" attribute.
  (class tainted_field_custom_event): New.
  (class tainted_callback_custom_event): New.
  (class tainted_call_info): New.
  (add_tainted_callback): New.
  (add_any_callbacks): New.
  (exploded_graph::build_initial_worklist): Find callbacks that are
  reachable from global initializers, calling add_any_callbacks on
  them.

gcc/c-family/ChangeLog:
  * c-attribs.c (c_common_attribute_table): Add "tainted".
  (handle_tainted_attribute): New.

gcc/ChangeLog:
  * doc/extend.texi (Function Attributes): Note that

Ping^4: [PATCH, rs6000 V2] rotate and mask constants [PR94393]

2022-01-13 Thread Pat Haugen via Gcc-patches
Ping.

On 11/22/21 1:38 PM, Pat Haugen via Gcc-patches wrote:
> Updated version of the patch. Changes made from original are updated 
> commentary to hopefully aid readability, no functional changes.
> 
> 
> Implement more two insn constants.  rotate_and_mask_constant covers
> 64-bit constants that can be formed by rotating a 16-bit signed
> constant, rotating a 16-bit signed constant masked on left or right
> (rldicl and rldicr), rotating a 16-bit signed constant masked by
> rldic, and unusual "lis; rldicl" and "lis; rldicr" patterns.  All the
> values possible for DImode rs6000_is_valid_and_mask are covered.
> 
> Bootstrapped and regression tested on powerpc64(32/64) and powerpc64le.
> Ok for master?
> 
> -Pat
> 
> 
> 2021-11-22  Alan Modra  
>   Pat Haugen  
> 
>   PR 94393
> gcc/
>   * config/rs6000/rs6000.c (rotate_di, is_rotate_positive_constant,
>   is_rotate_negative_constant, rotate_and_mask_constant): New functions.
>   (num_insns_constant_multi, rs6000_emit_set_long_const): Use it here.
>   * config/rs6000/rs6000.md (*movdi_internal64+1 splitter): Delete.
> gcc/testsuite/
>   * gcc.target/powerpc/rot_cst.h,
>   * gcc.target/powerpc/rot_cst1.c,
>   * gcc.target/powerpc/rot_cst2.c: New tests.
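
The first case in the quoted description (a 64-bit constant that is a rotation of a sign-extended 16-bit value) can be illustrated with a small brute-force check.  This is only a sketch of the idea, not the rs6000 implementation: rotl64 and is_rotated_s16 are hypothetical names, and the masked variants (rldicl, rldicr, rldic) are not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate V left by N bits (N is taken modulo 64).  */
static uint64_t
rotl64 (uint64_t v, unsigned n)
{
  n &= 63;
  return n ? (v << n) | (v >> (64 - n)) : v;
}

/* Return true if C equals a sign-extended 16-bit immediate rotated
   left by some amount; store one such pair in *IMM and *ROT.
   Hypothetical helper, brute force over all 64 rotations.  */
static bool
is_rotated_s16 (uint64_t c, int16_t *imm, unsigned *rot)
{
  for (unsigned r = 0; r < 64; r++)
    {
      /* Undo a left rotation by R, i.e. rotate right by R.  */
      int64_t s = (int64_t) rotl64 (c, 64 - r);
      if (s >= INT16_MIN && s <= INT16_MAX)
        {
          *imm = (int16_t) s;
          *rot = r;
          return true;
        }
    }
  return false;
}
```

When the check succeeds, the constant can conceptually be built in two steps: load the 16-bit immediate, then rotate it by the returned amount.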



[PATCH] i386: Cleanup V2QI arithmetic instructions

2022-01-13 Thread Uros Bizjak via Gcc-patches
2022-01-13  Uroš Bizjak  

gcc/ChangeLog:

* config/i386/mmx.md (negv2qi): Disparage GPR alternative a bit.
Disable for TARGET_PARTIAL_REG_STALL unless optimizing for size.
(negv2qi splitters): Use lowpart_subreg instead of
gen_lowpart to create subreg.
(v2qi3): Disparage GPR alternative a bit.
Disable for TARGET_PARTIAL_REG_STALL unless optimizing for size.
(v2qi3 splitters): Use lowpart_subreg instead of
gen_lowpart to create subreg.
* config/i386/i386.md (*subqi_ext_2): Move.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 9937643a273..bcaaa4993b1 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -6905,6 +6905,30 @@
   [(set_attr "type" "alu")
(set_attr "mode" "SI")])
 
+(define_insn "*subqi_ext_2"
+  [(set (zero_extract:SWI248
+ (match_operand:SWI248 0 "register_operand" "+Q")
+ (const_int 8)
+ (const_int 8))
+   (subreg:SWI248
+ (minus:QI
+   (subreg:QI
+ (zero_extract:SWI248
+   (match_operand:SWI248 1 "register_operand" "0")
+   (const_int 8)
+   (const_int 8)) 0)
+   (subreg:QI
+ (zero_extract:SWI248
+   (match_operand:SWI248 2 "register_operand" "Q")
+   (const_int 8)
+   (const_int 8)) 0)) 0))
+  (clobber (reg:CC FLAGS_REG))]
+  "/* FIXME: without this LRA can't reload this pattern, see PR82524.  */
+   rtx_equal_p (operands[0], operands[1])"
+  "sub{b}\t{%h2, %h0|%h0, %h2}"
+  [(set_attr "type" "alu")
+   (set_attr "mode" "QI")])
+
 ;; Subtract with jump on overflow.
 (define_expand "subv4"
   [(parallel [(set (reg:CCO FLAGS_REG)
@@ -6932,30 +6956,6 @@
 operands[4] = gen_rtx_SIGN_EXTEND (mode, operands[2]);
 })
 
-(define_insn "*subqi_ext_2"
-  [(set (zero_extract:SWI248
- (match_operand:SWI248 0 "register_operand" "+Q")
- (const_int 8)
- (const_int 8))
-   (subreg:SWI248
- (minus:QI
-   (subreg:QI
- (zero_extract:SWI248
-   (match_operand:SWI248 1 "register_operand" "0")
-   (const_int 8)
-   (const_int 8)) 0)
-   (subreg:QI
- (zero_extract:SWI248
-   (match_operand:SWI248 2 "register_operand" "Q")
-   (const_int 8)
-   (const_int 8)) 0)) 0))
-  (clobber (reg:CC FLAGS_REG))]
-  "/* FIXME: without this LRA can't reload this pattern, see PR82524.  */
-   rtx_equal_p (operands[0], operands[1])"
-  "sub{b}\t{%h2, %h0|%h0, %h2}"
-  [(set_attr "type" "alu")
-   (set_attr "mode" "QI")])
-
 (define_insn "*subv4"
   [(set (reg:CCO FLAGS_REG)
(eq:CCO (minus:
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 295a132bc46..3d99a5e851b 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1633,12 +1633,20 @@
   "TARGET_MMX_WITH_SSE"
   "operands[2] = force_reg (<MODE>mode, CONST0_RTX (<MODE>mode));")
 
+(define_expand "neg<mode>2"
+  [(set (match_operand:VI_32 0 "register_operand")
+   (minus:VI_32
+ (match_dup 2)
+ (match_operand:VI_32 1 "register_operand")))]
+  "TARGET_SSE2"
+  "operands[2] = force_reg (<MODE>mode, CONST0_RTX (<MODE>mode));")
+
 (define_insn "negv2qi2"
   [(set (match_operand:V2QI 0 "register_operand" "=?Q,&Yw")
 (neg:V2QI
  (match_operand:V2QI 1 "register_operand" "0,Yw")))
(clobber (reg:CC FLAGS_REG))]
-  ""
+  "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
   "#"
   [(set_attr "isa" "*,sse2")
(set_attr "type" "multi")
@@ -1664,10 +1672,10 @@
  (const_int 8)) 0)) 0))
   (clobber (reg:CC FLAGS_REG))])]
 {
-  operands[3] = gen_lowpart (HImode, operands[1]);
-  operands[2] = gen_lowpart (HImode, operands[0]);
-  operands[1] = gen_lowpart (QImode, operands[1]);
-  operands[0] = gen_lowpart (QImode, operands[0]);
+  operands[3] = lowpart_subreg (HImode, operands[1], V2QImode);
+  operands[2] = lowpart_subreg (HImode, operands[0], V2QImode);
+  operands[1] = lowpart_subreg (QImode, operands[1], V2QImode);
+  operands[0] = lowpart_subreg (QImode, operands[0], V2QImode);
 })
 
 (define_split
@@ -1678,11 +1686,11 @@
   "reload_completed"
   [(set (match_dup 0) (match_dup 2))
(set (match_dup 0)
-   (minus:V4QI (match_dup 0) (match_dup 1)))]
+   (minus:V16QI (match_dup 0) (match_dup 1)))]
 {
-  operands[2] = CONST0_RTX (V4QImode);
-  operands[1] = gen_lowpart (V4QImode, operands[1]);
-  operands[0] = gen_lowpart (V4QImode, operands[0]);
+  operands[2] = CONST0_RTX (V16QImode);
+  operands[1] = lowpart_subreg (V16QImode, operands[1], V2QImode);
+  operands[0] = lowpart_subreg (V16QImode, operands[0], V2QImode);
 })
 
 (define_expand "mmx_3"
@@ -1718,14 +1726,6 @@
(set_attr "type" "mmxadd,sseadd,sseadd")
(set_attr "mode" "DI,TI,TI")])
 
-(define_expand "neg<mode>2"
-  [(set (match_operand:VI_32 0 "register_operand")

Patch ping (Re: [PATCH] c++: Reject in constant evaluation address comparisons of start of one var and end of another [PR89074])

2022-01-13 Thread Jakub Jelinek via Gcc-patches
Hi!

I'd like to ping this patch:

> 2022-01-06  Jakub Jelinek  
> 
>   PR c++/89074
>   * fold-const.c (address_compare): Punt on comparison of address of
>   one object with address of end of another object if
>   folding_initializer.
> 
>   * g++.dg/cpp1y/constexpr-89074-1.C: New test.

Thanks.

Jakub



[PATCH] forwprop, v2: Canonicalize atomic fetch_op op x to op_fetch or vice versa [PR98737]

2022-01-13 Thread Jakub Jelinek via Gcc-patches
On Thu, Jan 13, 2022 at 04:07:20PM +0100, Richard Biener wrote:
> I'm mostly concerned about the replace_uses_by use.  forwprop
> will go over newly emitted stmts and thus the hypothetical added
> 
> lhs2 = d;
> 
> record the copy and schedule the stmt for removal, substituting 'd'
> in each use as it goes along the function and folding them.  It's
> a bit iffy (and maybe has unintended side-effects in odd cases)
> to trample around and fold stuff behind that flows back.
> 
> I'd always vote to simplify the folding code so it's easier to
> maintain and not micro-optimize there since it's not going to be
> a hot part of the compiler.

Ok.  So like this?

2022-01-13  Jakub Jelinek  

PR target/98737
* tree-ssa-forwprop.c (simplify_builtin_call): Canonicalize
__atomic_fetch_op (p, x, y) op x into __atomic_op_fetch (p, x, y)
and __atomic_op_fetch (p, x, y) iop x into
__atomic_fetch_op (p, x, y).

* gcc.dg/tree-ssa/pr98737-1.c: New test.
* gcc.dg/tree-ssa/pr98737-2.c: New test.
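
The two directions of the canonicalization in the ChangeLog above can be seen in plain C with the GCC __atomic builtins.  The sketch below only demonstrates that each pair of functions computes the same value; the function names are made up for illustration, and it says nothing about when forwprop actually applies the transformation.

```c
#include <stdint.h>

/* __atomic_fetch_op (p, x, y) op x: the old value is fetched and the
   operation is then redone on the result.  */
static uint32_t
fetch_add_then_add (uint32_t *p, uint32_t x)
{
  return __atomic_fetch_add (p, x, __ATOMIC_SEQ_CST) + x;
}

/* Equivalent single call: __atomic_op_fetch (p, x, y).  */
static uint32_t
add_fetch (uint32_t *p, uint32_t x)
{
  return __atomic_add_fetch (p, x, __ATOMIC_SEQ_CST);
}

/* __atomic_op_fetch (p, x, y) iop x: the new value is fetched and the
   inverse operation recovers the old one (xor is its own inverse).  */
static uint32_t
xor_fetch_then_xor (uint32_t *p, uint32_t x)
{
  return __atomic_xor_fetch (p, x, __ATOMIC_SEQ_CST) ^ x;
}

/* Equivalent single call: __atomic_fetch_op (p, x, y).  */
static uint32_t
fetch_xor (uint32_t *p, uint32_t x)
{
  return __atomic_fetch_xor (p, x, __ATOMIC_SEQ_CST);
}
```

Only add, sub and xor have an inverse, which matches the second group of CASE_ATOMIC entries in the patch listing just those three operations.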

--- gcc/tree-ssa-forwprop.c.jj  2022-01-11 23:11:23.467275019 +0100
+++ gcc/tree-ssa-forwprop.c 2022-01-13 18:09:50.318625915 +0100
@@ -1241,12 +1241,19 @@ constant_pointer_difference (tree p1, tr
memset (p + 4, ' ', 3);
into
memcpy (p, "abcd   ", 7);
-   call if the latter can be stored by pieces during expansion.  */
+   call if the latter can be stored by pieces during expansion.
+
+   Also canonicalize __atomic_fetch_op (p, x, y) op x
+   to __atomic_op_fetch (p, x, y) or
+   __atomic_op_fetch (p, x, y) iop x
+   to __atomic_fetch_op (p, x, y) when possible (also __sync).  */
 
 static bool
 simplify_builtin_call (gimple_stmt_iterator *gsi_p, tree callee2)
 {
   gimple *stmt1, *stmt2 = gsi_stmt (*gsi_p);
+  enum built_in_function other_atomic = END_BUILTINS;
+  enum tree_code atomic_op = ERROR_MARK;
   tree vuse = gimple_vuse (stmt2);
   if (vuse == NULL)
 return false;
@@ -1448,6 +1455,290 @@ simplify_builtin_call (gimple_stmt_itera
}
}
   break;
+
+ #define CASE_ATOMIC(NAME, OTHER, OP) \
+case BUILT_IN_##NAME##_1:  \
+case BUILT_IN_##NAME##_2:  \
+case BUILT_IN_##NAME##_4:  \
+case BUILT_IN_##NAME##_8:  \
+case BUILT_IN_##NAME##_16: \
+  atomic_op = OP;  \
+  other_atomic \
+   = (enum built_in_function) (BUILT_IN_##OTHER##_1\
+   + (DECL_FUNCTION_CODE (callee2) \
+  - BUILT_IN_##NAME##_1)); \
+  goto handle_atomic_fetch_op;
+
+CASE_ATOMIC (ATOMIC_FETCH_ADD, ATOMIC_ADD_FETCH, PLUS_EXPR)
+CASE_ATOMIC (ATOMIC_FETCH_SUB, ATOMIC_SUB_FETCH, MINUS_EXPR)
+CASE_ATOMIC (ATOMIC_FETCH_AND, ATOMIC_AND_FETCH, BIT_AND_EXPR)
+CASE_ATOMIC (ATOMIC_FETCH_XOR, ATOMIC_XOR_FETCH, BIT_XOR_EXPR)
+CASE_ATOMIC (ATOMIC_FETCH_OR, ATOMIC_OR_FETCH, BIT_IOR_EXPR)
+
+CASE_ATOMIC (SYNC_FETCH_AND_ADD, SYNC_ADD_AND_FETCH, PLUS_EXPR)
+CASE_ATOMIC (SYNC_FETCH_AND_SUB, SYNC_SUB_AND_FETCH, MINUS_EXPR)
+CASE_ATOMIC (SYNC_FETCH_AND_AND, SYNC_AND_AND_FETCH, BIT_AND_EXPR)
+CASE_ATOMIC (SYNC_FETCH_AND_XOR, SYNC_XOR_AND_FETCH, BIT_XOR_EXPR)
+CASE_ATOMIC (SYNC_FETCH_AND_OR, SYNC_OR_AND_FETCH, BIT_IOR_EXPR)
+
+CASE_ATOMIC (ATOMIC_ADD_FETCH, ATOMIC_FETCH_ADD, MINUS_EXPR)
+CASE_ATOMIC (ATOMIC_SUB_FETCH, ATOMIC_FETCH_SUB, PLUS_EXPR)
+CASE_ATOMIC (ATOMIC_XOR_FETCH, ATOMIC_FETCH_XOR, BIT_XOR_EXPR)
+
+CASE_ATOMIC (SYNC_ADD_AND_FETCH, SYNC_FETCH_AND_ADD, MINUS_EXPR)
+CASE_ATOMIC (SYNC_SUB_AND_FETCH, SYNC_FETCH_AND_SUB, PLUS_EXPR)
+CASE_ATOMIC (SYNC_XOR_AND_FETCH, SYNC_FETCH_AND_XOR, BIT_XOR_EXPR)
+
+#undef CASE_ATOMIC
+
+handle_atomic_fetch_op:
+  if (gimple_call_num_args (stmt2) >= 2 && gimple_call_lhs (stmt2))
+   {
+ tree lhs2 = gimple_call_lhs (stmt2), lhsc = lhs2;
+ tree arg = gimple_call_arg (stmt2, 1);
+ gimple *use_stmt, *cast_stmt = NULL;
+ use_operand_p use_p;
+ tree ndecl = builtin_decl_explicit (other_atomic);
+
+ if (ndecl == NULL_TREE || !single_imm_use (lhs2, &use_p, &use_stmt))
+   break;
+
+ if (gimple_assign_cast_p (use_stmt))
+   {
+ cast_stmt = use_stmt;
+ lhsc = gimple_assign_lhs (cast_stmt);
+ if (lhsc == NULL_TREE
+ || !INTEGRAL_TYPE_P (TREE_TYPE (lhsc))
+ || (TYPE_PRECISION (TREE_TYPE (lhsc))
+ != TYPE_PRECISION (TREE_TYPE (lhs2)))
+ || !single_imm_use (lhsc, &use_p, &use_stmt))
+   {
+ use_stmt = cast_stmt;
+ cast_stmt = NULL;
+ lhsc = lhs2;

[PATCH] cprop_hardreg: Workaround for narrow mode != lowpart targets

2022-01-13 Thread Andreas Krebbel via Gcc-patches
The cprop_hardreg pass is built around the assumption that accessing a
register in a narrower mode is the same as accessing the lowpart of
the register.  This unfortunately is not true for vector registers on
IBM Z. This caused a miscompile of LLVM with GCC 8.5. The problem
could not be reproduced with upstream GCC unfortunately but we have to
assume that it is latent there. The right fix would require
substantial changes to the cprop pass and is certainly something we
would want for our platform. But since this would not be acceptable
for older GCCs I'll go with what Vladimir proposed in the RedHat BZ
and introduce a hopefully temporary and undocumented target hook to
disable that specific transformation in regcprop.c.

Here is the Red Hat BZ for reference:
https://bugzilla.redhat.com/show_bug.cgi?id=2028609
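
As a toy illustration of the assumption being discussed, the sketch below models a 128-bit register as two 64-bit halves.  Everything in it is hypothetical: the type and function names are invented, and the choice that a narrow access returns the most significant half on the affected target is an assumption made for the example, not something taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a 128-bit hard register as two 64-bit halves.  */
typedef struct { uint64_t high, low; } reg128;

/* What the pass assumes a 64-bit access means: the low part.  */
static uint64_t
lowpart64 (reg128 r)
{
  return r.low;
}

/* What a 64-bit access yields on the modelled target.  When
   NARROW_IS_LOWPART is false the access is assumed to return the
   high half instead.  */
static uint64_t
narrow_access64 (reg128 r, bool narrow_is_lowpart)
{
  return narrow_is_lowpart ? r.low : r.high;
}

/* Shape of the patched condition: the mode change is only allowed
   when the generic check passes and the target confirms that a
   narrow access refers to the low part.  */
static bool
can_reuse_in_narrow_mode (bool mode_change_ok, bool narrow_is_lowpart)
{
  return mode_change_ok && narrow_is_lowpart;
}
```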

Bootstrapped and regression-tested on s390x.

Ok?

gcc/ChangeLog:

* target.def (narrow_mode_refers_low_part_p): Add new target hook.
* config/s390/s390.c (s390_narrow_mode_refers_low_part_p):
Implement new target hook for IBM Z.
(TARGET_NARROW_MODE_REFERS_LOW_PART_P): New macro.
* regcprop.c (maybe_mode_change): Disable transformation depending
on the new target hook.
---
 gcc/config/s390/s390.c | 14 ++
 gcc/regcprop.c |  3 ++-
 gcc/target.def | 12 +++-
 3 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 056002e4a4a..aafc6d63be6 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -10488,6 +10488,18 @@ s390_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
   return false;
 }
 
+/* Implement TARGET_NARROW_MODE_REFERS_LOW_PART_P.  */
+
+static bool
+s390_narrow_mode_refers_low_part_p (unsigned int regno)
+{
+  if (reg_classes_intersect_p (VEC_REGS, REGNO_REG_CLASS (regno)))
+return false;
+
+  return true;
+}
+
+
 /* Implement TARGET_MODES_TIEABLE_P.  */
 
 static bool
@@ -17472,6 +17484,8 @@ s390_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0, rtx op1,
 #undef TARGET_VECTORIZE_VEC_PERM_CONST
 #define TARGET_VECTORIZE_VEC_PERM_CONST s390_vectorize_vec_perm_const
 
+#undef TARGET_NARROW_MODE_REFERS_LOW_PART_P
+#define TARGET_NARROW_MODE_REFERS_LOW_PART_P s390_narrow_mode_refers_low_part_p
 
 struct gcc_target targetm = TARGET_INITIALIZER;
 
diff --git a/gcc/regcprop.c b/gcc/regcprop.c
index 1a9bcf0a1ad..aaf94ad9b51 100644
--- a/gcc/regcprop.c
+++ b/gcc/regcprop.c
@@ -426,7 +426,8 @@ maybe_mode_change (machine_mode orig_mode, machine_mode copy_mode,
 
   if (orig_mode == new_mode)
 return gen_raw_REG (new_mode, regno);
-  else if (mode_change_ok (orig_mode, new_mode, regno))
+  else if (mode_change_ok (orig_mode, new_mode, regno)
+  && targetm.narrow_mode_refers_low_part_p (regno))
 {
   int copy_nregs = hard_regno_nregs (copy_regno, copy_mode);
   int use_nregs = hard_regno_nregs (copy_regno, new_mode);
diff --git a/gcc/target.def b/gcc/target.def
index 8fd2533e90a..598eea501ff 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -5446,6 +5446,16 @@ value that the middle-end intended.",
  bool, (machine_mode from, machine_mode to, reg_class_t rclass),
  hook_bool_mode_mode_reg_class_t_true)
 
+/* This hook is used to work around a problem in regcprop. Hardcoded
+assumptions currently prevent it from working correctly for targets
+where the low part of a multi-word register doesn't align to accessing
+the register with a narrower mode.  */
+DEFHOOK_UNDOC
+(narrow_mode_refers_low_part_p,
+"",
+bool, (unsigned int regno),
+hook_bool_unit_true)
+
 /* Change pseudo allocno class calculated by IRA.  */
 DEFHOOK
 (ira_change_pseudo_allocno_class,
@@ -5949,7 +5959,7 @@ register if floating point arithmetic is not being done.  As long as the\n\
 floating registers are not in class @code{GENERAL_REGS}, they will not\n\
 be used unless some pattern's constraint asks for one.",
  bool, (unsigned int regno, machine_mode mode),
- hook_bool_uint_mode_true)
+ hook_bool_uint_true)
 
 DEFHOOK
 (modes_tieable_p,
-- 
2.33.1



[PATCH v9] rtl: builtins: (not just) rs6000: Add builtins for fegetround, feclearexcept and feraiseexcept [PR94193]

2022-01-13 Thread Raoni Fassina Firmino via Gcc-patches
Changes since v8[8]:
  - Refactored and expanded builtin-feclearexcept-feraiseexcept-2.c
testcase:
+ Use a macro to avoid extended repetition of the core test code.
+ Expanded the test code to check builtins return code.
+ Added more tests to test all valid (standard) exceptions input
  combinations.
+ Updated the header comment to explain why the input must be
  passed as constants.

Changes since v7[7]:
  - Fixed an array indexing bug on fegetround testcase.
  - Fixed typos and spelling mistakes spread throughout the added comments.
  - Reworded header comment/description for fegetround expander.
  - Fixed changelog in the commit message.

This is an update to v8, based on the same review from Segher, to
expand the test coverage for feclearexcept and feraiseexcept (that is
also why I am keeping the v8 changelog here, since v8 had no reviews).
Two things to point out: 1) a macro is used there instead of a
function because, unfortunately, the builtins (for rs6000) only expand
when the input is a constant; and, for the same reason, 2) I wanted to
simplify the way all input combinations are tested, but I could not
think of a way to do it without macro magic that would be far less
readable than listing all combinations by hand.

Tested on top of master (02a8a01bf396e009bfc31e1104c315fd403b4cca)
on the following platforms with no regression:
  - powerpc64le-linux-gnu (Power 9)
  - powerpc64le-linux-gnu (Power 8)
  - powerpc64-linux-gnu (Power 9, with 32 and 64 bits tests)

Documentation changes tested on x86_64-redhat-linux.

==

I'm repeating the "changelog" from past versions here for convenience:

Changes since v6[6] and v5[5]:
  - Based this version on the v5 one.
  - Reworked all builtins back to the way they are in v5 and added the
following changes:
+ Added a test to target libc, only expanding with glibc as the
  target libc.
+ Updated all three expanders header comment to reflect the added
  behavior (fegetround got a full header as it had none).
+ Added extra documentation for the builtins on doc/extend.texi,
  similar to v6 version, but only the introductory paragraph,
  without a dedicated entry for each, since now their behavior and
  signature match the C99 ones.
  - Changed the description for the return operand in the RTL template
of the fegetround expander.  Using "(set )", the same way as
rs6000_mffsl expander (this change was taken from v6).
  - Updated the commit message mentioning the target libc restriction
and updated changelog.

Changes since v5[5]:
  - Reworked all builtins to accept the FE_* macros as parameters and
so be agnostic to libc implementations.  Largely based on
fpclassify.  To that end, some additional files are changed:
+ Change the argument list for the builtins declarations in
  builtins.def
+ Added new types in builtin-types.def to use in the builtins
  declarations.
+ Added extra documentation for the builtins on doc/extend.texi,
  similar to fpclassify.
  - Updated doc/md.texi documentation with the new optab behaviors.
  - Updated comments to the expanders and expand handlers to try to
explain what is going on.
  - Changed the description for the return operand in the RTL template
of the fegetround expander.  Using "(set )", the same way as
rs6000_mffsl expander.
  - Updated testcases with helper macros with the new argument list.

Changes since v4[4]:
  - Fixed more spelling and code style.
  - Added more clarification to the comments for the feraiseexcept and
feclearexcept expanders;

Changes since v3[3]:
  - Fixed fegetround bug on powerpc64 (big endian) that Segher
spotted;

Changes since v2[2]:
  - Added documentation for the new optabs;
  - Remove use of non portable __builtin_clz;
  - Changed feclearexcept and feraiseexcept to accept all 4 valid
flags at the same time and added more test for that case;
  - Extended feclearexcept and feraiseexcept testcases to match
accepting multiple flags;
  - Fixed builtin-feclearexcept-feraiseexcept-2.c testcase comparison
after feclearexcept tests;
  - Updated commit message to reflect change in feclearexcept and
feraiseexcept from the glibc counterpart;
  - Fixed English spelling and typos;
  - Fixed code-style;
  - Changed subject line tag to make clear it is not just rs6000 code.

Changes since v1[1]:
  - Fixed English spelling;
  - Fixed code-style;
  - Changed match operand predicate in feclearexcept and feraiseexcept;
  - Changed testcase options;
  - Minor changes in test code to be C90 compatible;
  - Other minor changes suggested by Segher;
  - Changed subject line tag (not sure if I tagged correctly or should
include optabs: also)

[1] https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552024.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553297.html
[3] https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557109.html
[4] 

Re: [PATCH] tree-optimization/83072 - Allow more precision when querying from fold_const.

2022-01-13 Thread Andrew MacLeod via Gcc-patches

On 1/13/22 10:13, Richard Biener wrote:

> On Thu, Jan 13, 2022 at 2:59 PM Andrew MacLeod via Gcc-patches
>  wrote:
> >
> > This patch actually addresses a few PRs.
> >
> > The root PR was 97909.   Ranger context functionality was added to
> > fold_const back in early November
> > (https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583216.html)
> >
> > The other 2 PRs mentioned (83072 and 83073) partially worked after this,
> > but the original patch did not change the result of the query in
> > expr_not_equal_to () to a multi-range object.
> >
> > This patch simply changes the value_range variable in that routine to an
> > int_range<5> so we can pick up more precision. This in turn allows us to
> > capture all the tests as expected.
> >
> > Bootstrapped on x86_64-pc-linux-gnu with no regressions.
> >
> > OK for trunk?
>
> OK (though I wonder why not use int_range_max?)

No good reason.  Initially it was just because I wasn't familiar with
what call chains might end up here, but really, I guess it doesn't matter.


I can change it to int_range_max before committing it.

Andrew
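
To show why a multi-range object gives "more precision" for a not-equal query, here is a small stand-alone model.  It does not use GCC's value_range or int_range classes; subrange, known_not_equal and single_range_hull are hypothetical names for the illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

/* One closed interval [lo, hi].  */
typedef struct { long lo, hi; } subrange;

/* True if V lies outside every one of the N sub-ranges, i.e. the
   value is provably not equal to V.  */
static bool
known_not_equal (const subrange *parts, size_t n, long v)
{
  for (size_t i = 0; i < n; i++)
    if (parts[i].lo <= v && v <= parts[i].hi)
      return false;
  return true;
}

/* Collapse N >= 1 sub-ranges into the single interval that covers
   them all, as a single-pair range would have to.  */
static subrange
single_range_hull (const subrange *parts, size_t n)
{
  subrange h = parts[0];
  for (size_t i = 1; i < n; i++)
    {
      if (parts[i].lo < h.lo)
        h.lo = parts[i].lo;
      if (parts[i].hi > h.hi)
        h.hi = parts[i].hi;
    }
  return h;
}
```

With the sub-ranges [0, 4] and [6, 10], the multi-range form proves the value is not 5, while the collapsed interval [0, 10] cannot.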



[committed] libgfortran: Fix Solaris version file creation [PR104006]

2022-01-13 Thread Jakub Jelinek via Gcc-patches
Hi!

I forgot to change the gfortran.map-sun goal to gfortran.ver-sun
when changing other spots for the preprocessed version file.

Fixed thusly, committed to trunk as obvious.

2022-01-13  Jakub Jelinek  

PR libfortran/104006
* Makefile.am (gfortran.map-sun): Rename target to ...
(gfortran.ver-sun): ... this.
* Makefile.in: Regenerated.

--- libgfortran/Makefile.am.jj  2022-01-13 17:43:58.685553296 +0100
+++ libgfortran/Makefile.am 2022-01-13 17:44:40.503962317 +0100
@@ -23,7 +23,7 @@ endif
 if LIBGFOR_USE_SYMVER_SUN
 version_arg = -Wl,-M,gfortran.ver-sun
 version_dep = gfortran.ver-sun gfortran.ver
-gfortran.map-sun : gfortran.ver \
+gfortran.ver-sun : gfortran.ver \
$(top_srcdir)/../contrib/make_sunver.pl \
$(libgfortran_la_OBJECTS) $(libgfortran_la_LIBADD)
perl $(top_srcdir)/../contrib/make_sunver.pl \
--- libgfortran/Makefile.in.jj  2022-01-13 17:43:58.687553267 +0100
+++ libgfortran/Makefile.in 2022-01-13 17:44:51.468807363 +0100
@@ -719,7 +719,6 @@ pdfdir = @pdfdir@
 prefix = @prefix@
 program_transform_name = @program_transform_name@
 psdir = @psdir@
-runstatedir = @runstatedir@
 sbindir = @sbindir@
 sharedstatedir = @sharedstatedir@
 srcdir = @srcdir@
@@ -7616,7 +7615,7 @@ uninstall-am: uninstall-cafexeclibLTLIBR
 @LIBGFOR_USE_SYMVER_TRUE@gfortran.ver: $(srcdir)/gfortran.map kinds.inc
 @LIBGFOR_USE_SYMVER_TRUE@  $(EGREP) -v '#(#| |$$)' $< | \
 @LIBGFOR_USE_SYMVER_TRUE@$(PREPROCESS) -P -include config.h -include kinds.inc - > $@ || (rm -f $@ ; exit 1)
-@LIBGFOR_USE_SYMVER_SUN_TRUE@@LIBGFOR_USE_SYMVER_TRUE@gfortran.map-sun : gfortran.ver \
+@LIBGFOR_USE_SYMVER_SUN_TRUE@@LIBGFOR_USE_SYMVER_TRUE@gfortran.ver-sun : gfortran.ver \
 @LIBGFOR_USE_SYMVER_SUN_TRUE@@LIBGFOR_USE_SYMVER_TRUE@ $(top_srcdir)/../contrib/make_sunver.pl \
 @LIBGFOR_USE_SYMVER_SUN_TRUE@@LIBGFOR_USE_SYMVER_TRUE@ $(libgfortran_la_OBJECTS) $(libgfortran_la_LIBADD)
 @LIBGFOR_USE_SYMVER_SUN_TRUE@@LIBGFOR_USE_SYMVER_TRUE@ perl $(top_srcdir)/../contrib/make_sunver.pl \

Jakub



Re: [PATCH] rs6000: Fix constraint v with rs6000_constraints[RS6000_CONSTRAINT_v]

2022-01-13 Thread David Edelsohn via Gcc-patches
On Thu, Jan 13, 2022 at 7:28 AM Kewen.Lin  wrote:
>
> on 2022/1/13 上午11:56, Kewen.Lin via Gcc-patches wrote:
> > on 2022/1/13 上午11:44, David Edelsohn wrote:
> >> On Wed, Jan 12, 2022 at 10:38 PM Kewen.Lin  wrote:
> >>>
> >>> Hi David,
> >>>
> >>> on 2022/1/13 上午11:07, David Edelsohn wrote:
>  On Wed, Jan 12, 2022 at 8:56 PM Kewen.Lin  wrote:
> >
> > Hi,
> >
> > This patch is to fix register constraint v with
> > rs6000_constraints[RS6000_CONSTRAINT_v] instead of ALTIVEC_REGS,
> > just like some other existing register constraints with
> > RS6000_CONSTRAINT_*.
> >
> > I happened to see this and hope it's not intentional and just
> > got neglected.
> >
> > Bootstrapped and regtested on powerpc64le-linux-gnu P9 and
> > powerpc64-linux-gnu P8.
> >
> > Is it ok for trunk?
> 
>  Why do you want to make this change?
> 
>  rs6000_constraints[RS6000_CONSTRAINT_v] = ALTIVEC_REGS;
> 
>  but all of the patterns that use a "v" constraint are (or should be)
>  protected by TARGET_ALTIVEC, or some final condition that only is
>  active for TARGET_ALTIVEC.  The other constraints are conditionally
>  set because they can be used in a pattern with multiple alternatives
>  where the pattern itself is active but some of the constraints
>  correspond to NO_REGS when some instruction variants for VSX is not
>  enabled.
> 
> >>>
> >>> Good point!  Thanks for the explanation.
> >>>
>  The change isn't wrong, but it doesn't correct a bug and provides no
>  additional benefit nor clarity that I can see.
> 
> >>>
> >>> The original intention is to make it consistent with the other existing
> >>> register constraints with RS6000_CONSTRAINT_*, otherwise it looks a bit
> >>> weird (like was neglected).  After you clarified above, 
> >>> RS6000_CONSTRAINT_v
> >>> seems useless at all in the current framework.  Do you prefer to remove
> >>> it to avoid any confusions instead?
> >>
> >> It's used in the reg_class, so there may be some heuristic in the GCC
> >> register allocator that cares about the number of registers available
> >> for the target.  rs6000_constraints[RS6000_CONSTRAINT_v] is defined
> >> conditionally, so it seems best to leave it as is.
> >>
> >
> > I may miss something, but I didn't find it's used for the above purposes.
> > If it's best to leave it as is, the proposed patch seems to offer better
> > readability.
>
> Two more inputs for maintainers' decision:
>
> 1) the original proposed patch fixed one "bug" that is:
>
> In function rs6000_debug_reg_global, it tries to print the register class
> for the register constraint:
>
>   fprintf (stderr,
>"\n"
>"d  reg_class = %s\n"
>"f  reg_class = %s\n"
>"v  reg_class = %s\n"
>"wa reg_class = %s\n"
>...
>"\n",
>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_d]],
>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_f]],
>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_v]],
>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wa]],
>...
>
> It uses rs6000_constraints[RS6000_CONSTRAINT_v] which is conditionally
> set here:
>
>   /* Add conditional constraints based on various options, to allow us to
>  collapse multiple insn patterns.  */
>   if (TARGET_ALTIVEC)
> rs6000_constraints[RS6000_CONSTRAINT_v] = ALTIVEC_REGS;
>
> But the actual register class for register constraint is hardcoded as
> ALTIVEC_REGS rather than rs6000_constraints[RS6000_CONSTRAINT_v].

I agree that the information is inaccurate, but it is informal
debugging output.  And if Altivec is disabled, the value of the
constraint is irrelevant / garbage.

>
> 2) Bootstrapped and tested one below patch to remove all the code using
> RS6000_CONSTRAINT_v on powerpc64le-linux-gnu P10 and P9,
> powerpc64-linux-gnu P8 and P7 with no regressions.
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 37f07fe5358..3652629c5d0 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -2320,7 +2320,6 @@ rs6000_debug_reg_global (void)
>"\n"
>"d  reg_class = %s\n"
>"f  reg_class = %s\n"
> -  "v  reg_class = %s\n"
>"wa reg_class = %s\n"
>"we reg_class = %s\n"
>"wr reg_class = %s\n"
> @@ -2329,7 +2328,6 @@ rs6000_debug_reg_global (void)
>"\n",
>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_d]],
>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_f]],
> -  reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_v]],
>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wa]],
>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_we]],
>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wr]],
> @@ -2984,11 +2982,6 @@ 

[PATCH] i386: Add 16-bit vector modes to xop_pcmov [PR104003]

2022-01-13 Thread Uros Bizjak via Gcc-patches
2022-01-13  Uroš Bizjak  

gcc/ChangeLog:

PR target/104003
* config/i386/mmx.md (*xop_pcmov_): Use VI_16_32 mode iterator.

gcc/testsuite/ChangeLog:

PR target/104003
* g++.target/i386/pr103861-1-sse4.C: New test.
* g++.target/i386/pr103861-1-xop.C: Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 8a8142c8a09..295a132bc46 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -2704,11 +2704,11 @@
   [(set_attr "type" "sse4arg")])
 
 (define_insn "*xop_pcmov_"
-  [(set (match_operand:VI_32 0 "register_operand" "=x")
-(if_then_else:VI_32
-  (match_operand:VI_32 3 "register_operand" "x")
-  (match_operand:VI_32 1 "register_operand" "x")
-  (match_operand:VI_32 2 "register_operand" "x")))]
+  [(set (match_operand:VI_16_32 0 "register_operand" "=x")
+(if_then_else:VI_16_32
+  (match_operand:VI_16_32 3 "register_operand" "x")
+  (match_operand:VI_16_32 1 "register_operand" "x")
+  (match_operand:VI_16_32 2 "register_operand" "x")))]
   "TARGET_XOP"
   "vpcmov\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "sse4arg")])
diff --git a/gcc/testsuite/g++.target/i386/pr103861-1-sse4.C b/gcc/testsuite/g++.target/i386/pr103861-1-sse4.C
new file mode 100644
index 000..a07b3ad111d
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr103861-1-sse4.C
@@ -0,0 +1,5 @@
+/* PR target/103861 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse4" } */
+
+#include "pr103861-1.C"
diff --git a/gcc/testsuite/g++.target/i386/pr103861-1-xop.C b/gcc/testsuite/g++.target/i386/pr103861-1-xop.C
new file mode 100644
index 000..d65542dc57f
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr103861-1-xop.C
@@ -0,0 +1,5 @@
+/* PR target/103861 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mxop" } */
+
+#include "pr103861-1.C"


Re: Document current '-Wuninitialized'/'-Wmaybe-uninitialized' diagnostics for OpenACC test cases

2022-01-13 Thread Martin Sebor via Gcc-patches

On 1/13/22 03:55, Thomas Schwinge wrote:

> Hi!
>
> This has fallen out of (unfinished...) work earlier in the year: pushed
> to master branch commit 4bd8b1e881f0c26a5103cd1919809b3d63b60ef2
> "Document current '-Wuninitialized'/'-Wmaybe-uninitialized' diagnostics
> for OpenACC test cases".


Thanks for the heads up.  If any of these are recent regressions
(either the false negatives or the false positives) it would be
helpful to isolate them to a few representative test cases.
The warning itself hasn't changed much in GCC 12 but regressions
in it could be due to the jump threading changes that it tends to
be sensitive to.

Martin




> Regards
>  Thomas






Re: [PATCH] Fix -Wformat-diag for rs6000 target.

2022-01-13 Thread Martin Liška

On 1/13/22 13:55, Richard Sandiford wrote:

> Like you say in the linked message, we could add an explicit noun too.
> But the change seems OK as-is to me.


May I consider it as an approval of the suggested patch?

Thanks,
Martin


Re: [PATCH] Loop unswitching: support gswitch statements.

2022-01-13 Thread Martin Liška

On 1/6/22 17:30, Martin Liška wrote:

> I really welcome that, I've pushed devel/loop-unswitch-support-switches
> branch with first changes you pointed out. Feel free playing with the branch.


Hello.

I've just pushed a revision to the branch that introduces a top-level comment.
Feel free to play with the branch once you have spare cycles and we can
return to it next stage1.

Cheers,
Martin


[PATCH] [i386] Fix ICE of unrecognizable insn. [PR target/104001]

2022-01-13 Thread liuhongt via Gcc-patches
For define_insn_and_split "*xor2andn":

1. Refine predicate of operands[0] from nonimmediate_operand to
register_operand.
2. Remove TARGET_AVX512BW from condition to avoid kmov when TARGET_BMI
is not available.
3. Force_reg operands[2].

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
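
For reference, the identity that the "*xor2andn" pattern relies on, a ^ ((a ^ b) & mask) == (~mask & a) | (b & mask), can be checked directly in C.  The function names below are invented for this sketch; it only demonstrates the bitwise equivalence, not the RTL split.

```c
/* Bit select written with two xors: where a mask bit is 1 the result
   bit is a ^ (a ^ b) = b, where it is 0 the result bit is a.  */
static unsigned
merge_xor (unsigned a, unsigned b, unsigned mask)
{
  return a ^ ((a ^ b) & mask);
}

/* The same select written as and-not plus and, which is the form an
   andn instruction can implement.  */
static unsigned
merge_andn (unsigned a, unsigned b, unsigned mask)
{
  return (~mask & a) | (b & mask);
}
```

For example, with a = 0xF0F0, b = 0x0F0F and mask = 0x00FF both forms give 0xF00F: the high byte comes from a and the low byte from b.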

gcc/ChangeLog:

PR target/104001
PR target/94790
* config/i386/i386.md (*xor2andn): Refine predicate of
operands[0] from nonimmediate_operand to
register_operand, remove TARGET_AVX512BW from condition,
force_reg operands[2].

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr104001.c: New test.
---
 gcc/config/i386/i386.md  |  6 +++---
 gcc/testsuite/gcc.target/i386/pr104001.c | 21 +
 2 files changed, 24 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr104001.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 9937643a273..7bd4f24aa07 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -10455,7 +10455,7 @@ (define_insn_and_split "*xordi_1_btc"
 
 ;; PR target/94790: Optimize a ^ ((a ^ b) & mask) to (~mask & a) | (b & mask)
 (define_insn_and_split "*xor2andn"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand")
+  [(set (match_operand:SWI248 0 "register_operand")
(xor:SWI248
  (and:SWI248
(xor:SWI248
@@ -10464,8 +10464,7 @@ (define_insn_and_split "*xor2andn"
(match_operand:SWI248 3 "nonimmediate_operand"))
  (match_dup 1)))
 (clobber (reg:CC FLAGS_REG))]
-  "(TARGET_BMI || TARGET_AVX512BW)
-   && ix86_pre_reload_split ()"
+  "TARGET_BMI && ix86_pre_reload_split ()"
   "#"
   "&& 1"
   [(parallel [(set (match_dup 4)
@@ -10486,6 +10485,7 @@ (define_insn_and_split "*xor2andn"
  (clobber (reg:CC FLAGS_REG))])]
 {
   operands[1] = force_reg (mode, operands[1]);
+  operands[2] = force_reg (mode, operands[2]);
   operands[3] = force_reg (mode, operands[3]);
   operands[4] = gen_reg_rtx (mode);
   operands[5] = gen_reg_rtx (mode);
diff --git a/gcc/testsuite/gcc.target/i386/pr104001.c 
b/gcc/testsuite/gcc.target/i386/pr104001.c
new file mode 100644
index 000..bd85aa7145e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr104001.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "kandn" } } */
+/* { dg-final { scan-assembler-times "andn" 1 } } */
+
+int b, c, d;
+int r;
+
+void
+__attribute__((target("bmi")))
+foo ()
+{
+  r = ((b & ~d) | (c & d));
+}
+
+void
+__attribute__((target("avx512bw")))
+bar ()
+{
+  r = ((b & ~d) | (c & d));
+}
-- 
2.18.1
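
For readers following along, the identity quoted in the pattern comment, a ^ ((a ^ b) & mask) == (~mask & a) | (b & mask), can be spot-checked in plain C. This is only an illustrative sketch; the function names xor_form, andn_form and forms_agree are made up here and are not part of the patch or of GCC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Form matched by the pattern: a ^ ((a ^ b) & mask).  */
uint32_t
xor_form (uint32_t a, uint32_t b, uint32_t mask)
{
  return a ^ ((a ^ b) & mask);
}

/* Form named in the pattern comment: (~mask & a) | (b & mask).  */
uint32_t
andn_form (uint32_t a, uint32_t b, uint32_t mask)
{
  return (~mask & a) | (b & mask);
}

/* Return 1 if both forms agree for every combination of the samples.  */
int
forms_agree (void)
{
  static const uint32_t s[] = { 0u, 1u, 0xffu, 0x12345678u,
                                0xdeadbeefu, 0xffffffffu };
  const size_t n = sizeof s / sizeof s[0];

  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      for (size_t k = 0; k < n; k++)
        if (xor_form (s[i], s[j], s[k]) != andn_form (s[i], s[j], s[k]))
          return 0;
  return 1;
}
```

Per bit, both forms select b where the mask bit is set and a where it is clear, which is why the split can use an and-not of the mask.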



Re: [PATCH] Fix -Wformat-diag for rs6000 target.

2022-01-13 Thread Martin Sebor via Gcc-patches

On 1/13/22 05:55, Richard Sandiford wrote:

Martin Sebor via Gcc-patches  writes:

On 1/12/22 02:02, Martin Liška wrote:

Hello.

We've had -Wformat-diag for some time and I think we should start using it
with -Werror for GCC bootstrap. The following patch removes the last instances
of the warning for the rs6000 target.

Ready to be installed?
Thanks,
Martin


gcc/ChangeLog:

  * config/rs6000/rs6000-call.c (rs6000_invalid_builtin): Wrap
  keywords and use %qs instead of %<%s%>.
  (rs6000_expand_builtin): Likewise.

gcc/testsuite/ChangeLog:

  * gcc.target/powerpc/bfp/scalar-extract-exp-5.c: Adjust scans in
  testcases.
  * gcc.target/powerpc/bfp/scalar-extract-sig-5.c: Likewise.
  * gcc.target/powerpc/bfp/scalar-insert-exp-11.c: Likewise.
---
   gcc/config/rs6000/rs6000-call.c   | 8 
   .../gcc.target/powerpc/bfp/scalar-extract-exp-5.c | 2 +-
   .../gcc.target/powerpc/bfp/scalar-extract-sig-5.c | 2 +-
   .../gcc.target/powerpc/bfp/scalar-insert-exp-11.c | 2 +-
   4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-call.c
b/gcc/config/rs6000/rs6000-call.c
index c78b8b08c40..becdad73812 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -3307,7 +3307,7 @@ rs6000_invalid_builtin (enum rs6000_gen_builtins
fncode)
    "-mvsx");
     break;
   case ENB_IEEE128_HW:
-  error ("%qs requires ISA 3.0 IEEE 128-bit floating point", name);
+  error ("%qs requires ISA 3.0 IEEE 128-bit floating-point", name);


The instances of the warning where floating point is at the end
of a message aren't correct.  The warning should be relaxed to
allow unhyphenated floating point as a noun (as discussed briefly
last March:
https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566881.html)


Wouldn't it be fair to say that “floating point” in the message above is
really an adjective modifying an implicit noun?  The floating (decimal)
point doesn't itself have 128 bits.

Like you say in the linked message, we could add an explicit noun too.
But the change seems OK as-is to me.


I agree you could say that too.  I didn't mean what I said as
an objection to the change but more as an observation that it
shouldn't be necessary (and an acknowledgment that I haven't
yet done what I said I'd do).

Martin



Thanks,
Richard




Re: [PATCH] Fix -Wformat-diag for ARM target.

2022-01-13 Thread Martin Liška

On 1/13/22 16:37, Richard Earnshaw wrote:

>   "range [0-%d] enabled with %<+cdecp<N>%>"


Great, this works.

So I'm going to push the commit.

Martin



> The other changes look OK.




Re: [PATCH] Fix -Wformat-diag for ARM target.

2022-01-13 Thread Jakub Jelinek via Gcc-patches
On Thu, Jan 13, 2022 at 03:37:31PM +, Richard Earnshaw via Gcc-patches 
wrote:
> I'm not sure about this hunk.  It changes a literal '<'...'>' into quotes.
> The text is trying to say you substitute <N> with a digit in the range
> shown.  Closer would be:
> 
>  "range [0-%d] enabled with %<+cdecp<N>%>"

Then perhaps it should be %<+cdecp%>N ?  <N> in between quotes suggests
literal <N>.

Jakub



Re: [vect] PR103997: Fix epilogue mode skipping

2022-01-13 Thread Andre Vieira (lists) via Gcc-patches



On 13/01/2022 14:25, Richard Biener wrote:

On Thu, 13 Jan 2022, Andre Vieira (lists) wrote:


On 13/01/2022 12:36, Richard Biener wrote:

On Thu, 13 Jan 2022, Andre Vieira (lists) wrote:


This time to the list too (sorry for double email)

Hi,

The original patch '[vect] Re-analyze all modes for epilogues', skipped
modes
that should not be skipped since it used the vector mode provided by
autovectorize_vector_modes to derive the minimum VF required for it.
However,
those modes should only really be used to dictate vector size, so instead
this
patch looks for the mode in 'used_vector_modes' with the largest element
size,
and constructs a vector mode with the same size as the current
vector_modes[mode_i]. Since we are using the largest element size the
NUNITs
for this mode is the smallest possible VF required for an epilogue with
this
mode and should thus skip only the modes we are certain can not be used.

Passes bootstrap and regression on x86_64 and aarch64.

Clearly

+ /* To make sure we are conservative as to what modes we skip, we
+should use check the smallest possible NUNITS which would be
+derived from the mode in USED_VECTOR_MODES with the largest
+element size.  */
+ scalar_mode max_elsize_mode = GET_MODE_INNER
(vector_modes[mode_i]);
+ for (vec_info::mode_set::iterator i =
+   first_loop_vinfo->used_vector_modes.begin ();
+ i != first_loop_vinfo->used_vector_modes.end (); ++i)
+   {
+ if (VECTOR_MODE_P (*i)
+ && GET_MODE_SIZE (GET_MODE_INNER (*i))
+ > GET_MODE_SIZE (max_elsize_mode))
+   max_elsize_mode = GET_MODE_INNER (*i);
+   }

can be done once before iterating over the modes for the epilogue.

True, I'll start with QImode instead of the inner of vector_modes[mode_i] too
since we can't guarantee the mode is a VECTOR_MODE_P and it is actually better
too since we can't possibly guarantee the element size of the
USED_VECTOR_MODES is smaller than that of the first vector mode...


Richard maybe knows whether we should take care to look at the
size of the vector mode as well since related_vector_mode when
passed 0 as nunits produces a vector mode with the same size
as vector_modes[mode_i] but not all used_vector_modes may be
of the same size

I suspect that should be fine though, since if we use the largest element size
of all used_vector_modes then that should give us the least possible number
of NUNITS and thus only conservatively skip. That said, that does assume that
no vector mode used may be larger than the size of the loop's vector_mode. Can
I assume that?

No idea, but I would lean towards a no ;)  I think the loop's vector_mode
doesn't have to match vector_modes[mode_i] either, does it?  At least
autodetected_vector_mode will be not QImode based.
The mode doesn't but both vector modes have to be the same vector size 
surely, I'm not referring to the element size here.
What I was trying to ask was whether all vector modes in 
used_vector_modes had the same vector size as the loop's vector mode (and 
the vector_modes[mode_i] it originated from).



(and you probably also want to exclude
VECTOR_BOOLEAN_TYPE_P from the search?)

Yeah I think so too, thanks!

I keep going back to thinking (as I brought up in the bugzilla ticket), maybe
we ought to only skip if the NUNITS of the vector mode with the same vector
size as vector_modes[mode_i] is larger than first_info_vf, or just don't skip
at all...

The question is how much work we do before realizing the chosen mode
cannot be used because there's not enough iterations?  Maybe we can
improve there easily?
IIUC the VF can change depending on whether we decide to use SLP, so 
really we can only check it after we have determined whether or not to 
use SLP, so either:
* When SLP fully succeeds, so somewhere between the last 'goto again;' 
and return success, but there is very little left to do there

* When SLP fails: here we could save on some work.



Also for targets that for the main loop do not perform cost
comparison (like x86) but have lots of vector modes the previous
mode of operation really made sense (start at next_mode_i or
mode_i when unrolling).
Are you hinting at maybe creating different paths here based on some 
target configurable thing? Could be something we ask vector_costs?
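
As a side note for readers, the "largest element size gives the smallest NUNITS" argument from the patch description can be illustrated with plain arithmetic. The sketch below is not GCC code: min_nunits and example_elem_bytes are hypothetical names, sizes are in bytes, and it assumes all modes share one vector size, which is exactly the open question raised in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sample: element sizes (in bytes) found among the used
   vector modes, e.g. 1-, 2- and 4-byte elements.  */
const unsigned example_elem_bytes[3] = { 1, 2, 4 };

/* Return the number of units of a vector of VECTOR_BYTES bytes built
   from the largest of the N element sizes.  Starting from a 1-byte
   element mirrors the "start with QImode" idea in the reply above.  */
unsigned
min_nunits (unsigned vector_bytes, const unsigned *elem_bytes, size_t n)
{
  unsigned max_elem = 1;

  for (size_t i = 0; i < n; i++)
    if (elem_bytes[i] > max_elem)
      max_elem = elem_bytes[i];
  return vector_bytes / max_elem;
}
```

With a 16-byte vector and elements of up to 4 bytes this yields 4 units, the smallest unit count any of those element sizes can produce, so a skip decision based on it stays conservative.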




Re: [PATCH] Fix -Wformat-diag for ARM target.

2022-01-13 Thread Richard Earnshaw via Gcc-patches




On 12/01/2022 12:59, Martin Liška wrote:

Hello.

We've had -Wformat-diag for some time and I think we should start using it
with -Werror for GCC bootstrap. The following patch removes the last instances
of the warning for the ARM target.





> diff --git a/gcc/config/arm/arm-builtins.c 
b/gcc/config/arm/arm-builtins.c

> index 9c645722230..ab5c469b1ba 100644
> --- a/gcc/config/arm/arm-builtins.c
> +++ b/gcc/config/arm/arm-builtins.c
> @@ -3013,7 +3013,7 @@ constant_arg:
> else
>   error_at (EXPR_LOCATION (exp),
> "coproc must be a constant immediate in "
> -  "range [0-%d] enabled with +cdecp",
> +  "range [0-%d] enabled with +cdecp%",
> ARM_CDE_CONST_COPROC);
>   }
> else

I'm not sure about this hunk.  It changes a literal '<'...'>' into 
quotes.  The text is trying to say you substitute <N> with a digit in 
the range shown.  Closer would be:


 "range [0-%d] enabled with %<+cdecp<N>%>"

The other changes look OK.

R.


Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

 * common/config/arm/arm-common.c (arm_target_mode): Wrap
 keywords with %<, %> and remove trailing punctuation char.
 (arm_canon_arch_option_1): Likewise.
 (arm_asm_auto_mfpu): Likewise.
 * config/arm/arm-builtins.c (arm_expand_builtin): Likewise.
 * config/arm/arm.c (arm_options_perform_arch_sanity_checks): Likewise.
 (use_vfp_abi): Likewise.
 (aapcs_vfp_is_call_or_return_candidate): Likewise.
 (arm_handle_cmse_nonsecure_entry): Likewise.
 (arm_handle_cmse_nonsecure_call): Likewise.
 (thumb1_md_asm_adjust): Likewise.
---
  gcc/common/config/arm/arm-common.c | 12 +++
  gcc/config/arm/arm-builtins.c  | 50 +++---
  gcc/config/arm/arm.c   | 12 +++
  3 files changed, 37 insertions(+), 37 deletions(-)

diff --git a/gcc/common/config/arm/arm-common.c 
b/gcc/common/config/arm/arm-common.c

index e7e19400263..6a898d8554b 100644
--- a/gcc/common/config/arm/arm-common.c
+++ b/gcc/common/config/arm/arm-common.c
@@ -286,7 +286,7 @@ arm_target_mode (int argc, const char **argv)

    if (argc % 2 != 0)
  fatal_error (input_location,
- "%%:target_mode_check takes an even number of parameters");
+ "%%:% takes an even number of parameters");

    while (argc)
  {
@@ -295,8 +295,8 @@ arm_target_mode (int argc, const char **argv)
    else if (strcmp (argv[0], "cpu") == 0)
  cpu = argv[1];
    else
-    fatal_error (input_location,
- "unrecognized option passed to %%:target_mode_check");
+    fatal_error (input_location, "unrecognized option passed to %%:"
+ "%>");
    argc -= 2;
    argv += 2;
  }
@@ -662,7 +662,7 @@ arm_canon_arch_option_1 (int argc, const char 
**argv, bool arch_for_multilib)


    if (argc & 1)
  fatal_error (input_location,
- "%%:canon_for_mlib takes 1 or more pairs of parameters");
+ "%%:% takes 1 or more pairs of parameters");

    while (argc)
  {
@@ -676,7 +676,7 @@ arm_canon_arch_option_1 (int argc, const char 
**argv, bool arch_for_multilib)

  abi = argv[1];
    else
  fatal_error (input_location,
- "unrecognized operand to %%:canon_for_mlib");
+ "unrecognized operand to %%:%");

    argc -= 2;
    argv += 2;
@@ -1032,7 +1032,7 @@ arm_asm_auto_mfpu (int argc, const char **argv)
  arch = argv[1];
    else
  fatal_error (input_location,
- "unrecognized operand to %%:asm_auto_mfpu");
+ "unrecognized operand to %%:%");
    argc -= 2;
    argv += 2;
  }
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 9c645722230..ab5c469b1ba 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -3013,7 +3013,7 @@ constant_arg:
    else
  error_at (EXPR_LOCATION (exp),
    "coproc must be a constant immediate in "
-  "range [0-%d] enabled with +cdecp<N>",
+  "range [0-%d] enabled with +cdecp%<N%>",
    ARM_CDE_CONST_COPROC);
  }
    else
@@ -3860,60 +3860,60 @@ arm_expand_builtin (tree exp,
    && (imm < 0 || imm > 32))
  {
    if (fcode == ARM_BUILTIN_WRORHI)
-    error ("the range of count should be in 0 to 32.  please check 
the intrinsic _mm_rori_pi16 in code.");
+    error ("the range of count should be in 0 to 32; please check 
the intrinsic %<_mm_rori_pi16%> in code");

    else if (fcode == ARM_BUILTIN_WRORWI)
-    error ("the range of count should be in 0 to 32.  please check 
the intrinsic _mm_rori_pi32 in code.");
+    error ("the range of count should be in 0 to 32; please check 
the intrinsic %<_mm_rori_pi32%> in code");

    else if (fcode == ARM_BUILTIN_WRORH)
-    error ("the 

Re: [PATCH] rs6000: Use known constant for GET_MODE_NUNITS and similar

2022-01-13 Thread David Edelsohn via Gcc-patches
On Thu, Jan 13, 2022 at 7:40 AM Kewen.Lin  wrote:
>
> Hi David,
>
> on 2022/1/13 上午11:12, David Edelsohn wrote:
> > On Wed, Jan 12, 2022 at 8:56 PM Kewen.Lin  wrote:
> >>
> >> Hi,
> >>
> >> This patch is to clean up some codes with GET_MODE_UNIT_SIZE or
> >> GET_MODE_NUNITS, which can use known constant instead.
> >
> > I'll let Segher decide, but often the additional code is useful
> > self-documentation instead of magic constants.  Or at least the change
> > requires comments documenting the derivation of the constants
> > currently described by the code itself.
> >
>
> Thanks for the comments, I added some comments as suggested, also removed
> the whole "altivec_vreveti2" since I noticed it's useless, it's not used
> by any built-in functions and even unused in the commit db042e1603db50573.
>
> The updated version has been tested as before.

As we have discussed offline, the comments need to be clarified and expanded.

And the removal of altivec_vreveti2 should be confirmed with Carl
Love, who added the pattern less than a year ago. There may be another
patch planning to use it.

Thanks, David

>
> BR,
> Kewen
> -
> gcc/ChangeLog:
>
> * config/rs6000/altivec.md (altivec_vreveti2): Remove.
> * config/rs6000/vsx.md (*vsx_extract_si, 
> *vsx_extract_si_float_df,
> *vsx_extract_si_float_, *vsx_insert_extract_v4sf_p9): Use
> known constant values to simplify code.
> ---
>  gcc/config/rs6000/altivec.md | 25 -
>  gcc/config/rs6000/vsx.md | 12 
>  2 files changed, 8 insertions(+), 29 deletions(-)
>
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index c2312cc1e0f..b7f056f8c60 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -3950,31 +3950,6 @@ (define_expand "altivec_negv4sf2"
>DONE;
>  })
>
> -;; Vector reverse elements
> -(define_expand "altivec_vreveti2"
> -  [(set (match_operand:TI 0 "register_operand" "=v")
> -   (unspec:TI [(match_operand:TI 1 "register_operand" "v")]
> - UNSPEC_VREVEV))]
> -  "TARGET_ALTIVEC"
> -{
> -  int i, j, size, num_elements;
> -  rtvec v = rtvec_alloc (16);
> -  rtx mask = gen_reg_rtx (V16QImode);
> -
> -  size = GET_MODE_UNIT_SIZE (TImode);
> -  num_elements = GET_MODE_NUNITS (TImode);
> -
> -  for (j = 0; j < num_elements; j++)
> -for (i = 0; i < size; i++)
> -  RTVEC_ELT (v, i + j * size)
> -   = GEN_INT (i + (num_elements - 1 - j) * size);
> -
> -  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> -  emit_insn (gen_altivec_vperm_ti (operands[0], operands[1],
> -operands[1], mask));
> -  DONE;
> -})
> -
>  ;; Vector reverse elements for V16QI V8HI V4SI V4SF
>  (define_expand "altivec_vreve2"
>[(set (match_operand:VEC_K 0 "register_operand" "=v")
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 802db0d112b..d246410880d 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -3854,8 +3854,9 @@ (define_insn_and_split  "*vsx_extract_si"
>rtx vec_tmp = operands[3];
>int value;
>
> +  /* Adjust index for LE element ordering.  */
>if (!BYTES_BIG_ENDIAN)
> -element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element));
> +element = GEN_INT (3 - INTVAL (element));
>
>/* If the value is in the correct position, we can avoid doing the VSPLT
>   instruction.  */
> @@ -4230,8 +4231,9 @@ (define_insn_and_split "*vsx_extract_si_float_df"
>rtx v4si_tmp = operands[3];
>int value;
>
> +  /* Adjust index for LE element ordering.  */
>if (!BYTES_BIG_ENDIAN)
> -element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element));
> +element = GEN_INT (3 - INTVAL (element));
>
>/* If the value is in the correct position, we can avoid doing the VSPLT
>   instruction.  */
> @@ -4273,8 +4275,9 @@ (define_insn_and_split 
> "*vsx_extract_si_float_"
>rtx df_tmp = operands[4];
>int value;
>
> +  /* Adjust index for LE element ordering.  */
>if (!BYTES_BIG_ENDIAN)
> -element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element));
> +element = GEN_INT (3 - INTVAL (element));
>
>/* If the value is in the correct position, we can avoid doing the VSPLT
>   instruction.  */
> @@ -4466,8 +4469,9 @@ (define_insn "*vsx_insert_extract_v4sf_p9"
>  {
>int ele = INTVAL (operands[4]);
>
> +  /* Adjust index for LE element ordering.  */
>if (!BYTES_BIG_ENDIAN)
> -ele = GET_MODE_NUNITS (V4SFmode) - 1 - ele;
> +ele = 3 - ele;
>
>operands[4] = GEN_INT (GET_MODE_SIZE (SFmode) * ele);
>return "xxinsertw %x0,%x2,%4";
> --
> 2.27.0
>

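For readers wondering where the constant 3 in the quoted patch comes from: V4SImode and V4SFmode hold four elements, so GET_MODE_NUNITS (...) - 1 is 3. The sketch below restates the little-endian index adjustment in plain C; the names V4_NUNITS, SF_BYTES, adjust_element_index and element_byte_offset are illustrative and do not appear in the patch.

```c
#include <assert.h>

/* Four 4-byte elements per vector; NUNITS - 1 is the constant 3.  */
enum { V4_NUNITS = 4, SF_BYTES = 4 };

/* Map an element index to the one used for little-endian element
   ordering; with BYTES_BIG_ENDIAN nonzero the index is used as is.  */
int
adjust_element_index (int element, int bytes_big_endian)
{
  return bytes_big_endian ? element : V4_NUNITS - 1 - element;
}

/* Byte offset, computed as GET_MODE_SIZE (SFmode) * ele in the patch.  */
int
element_byte_offset (int element, int bytes_big_endian)
{
  return SF_BYTES * adjust_element_index (element, bytes_big_endian);
}
```

Applying the adjustment twice returns the original index, which is a quick sanity check that 3 - element is the intended mirror.
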

Re: [PATCH] tree-optimization/83072 - Allow more precision when querying from fold_const.

2022-01-13 Thread Richard Biener via Gcc-patches
On Thu, Jan 13, 2022 at 2:59 PM Andrew MacLeod via Gcc-patches
 wrote:
>
> This patch actually addresses a few PRs.
>
> The root PR was 97909.   Ranger context functionality was added to
> fold_const back in early November
> (https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583216.html)
>
> The other 2 PRs mentioned (83072 and 83073) partially worked after this,
> but the original patch did not change the result of the query in
> expr_not_equal_to () to a multi-range object.
>
> This patch simply changes the value_range variable in that routine to an
> int_range<5> so we can pick up more precision. This in turn allows us to
> capture all the tests as expected.
>
> Bootstrapped on x86_64-pc-linux-gnu with no regressions.
>
> OK for trunk?

OK (though I wonder why not use int_range_max?)

Thanks,
Richard.

>
> Andrew


Re: [PATCH] tree-optimization/96707 - Add relation to unsigned right shift.

2022-01-13 Thread Richard Biener via Gcc-patches
On Thu, Jan 13, 2022 at 2:58 PM Andrew MacLeod via Gcc-patches
 wrote:
>
> A quick addition to range ops for
>
> LHS = OP1 >> OP2
>
> if OP1 and OP2 are both >= 0,   then we can register the relation  LHS
> <= OP1   and all the expected good things happen.
>
> Bootstrapped on x86_64-pc-linux-gnu with no regressions.
>
> OK for trunk?

OK.
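
For illustration, the relation being registered (LHS <= OP1 for LHS = OP1 >> OP2 when both operands are nonnegative) can be checked on concrete values in plain C. This is only a sketch with a made-up helper name, rshift_le_op1; it says nothing about how range-ops records the relation.

```c
#include <assert.h>
#include <limits.h>

/* Return 1 if LHS = OP1 >> OP2 satisfies LHS <= OP1, 0 if it does not,
   and -1 if the operands fall outside the case discussed (a negative
   operand, or a shift count not smaller than the width of int).  */
int
rshift_le_op1 (int op1, int op2)
{
  if (op1 < 0 || op2 < 0 || op2 >= (int) (sizeof (int) * CHAR_BIT))
    return -1;

  int lhs = op1 >> op2;
  return lhs <= op1;
}
```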

>
> Andrew


Re: [PATCH] forwprop: Canonicalize atomic fetch_op op x to op_fetch or vice versa [PR98737]

2022-01-13 Thread Richard Biener via Gcc-patches
On Thu, 13 Jan 2022, Jakub Jelinek wrote:

> On Thu, Jan 13, 2022 at 02:49:47PM +0100, Richard Biener wrote:
> > > +   tree d = build_debug_expr_decl (type);
> > > +   gdebug *g
> > > + = gimple_build_debug_bind (d, build2 (rcode, type,
> > > +   new_lhs, arg),
> > > +stmt2);
> > > +   gsi_insert_after (, g, GSI_NEW_STMT);
> > > +   replace_uses_by (lhs2, d);
> > 
> > I wonder if you can leave a lhs2 = d; in the IL instead of using
> > replace_uses_by which will process imm uses and fold stmts while
> > we're going to do that anyway in the caller?  That would IMHO
> > be better here.
> 
> I'd need to emit them always for reversible ops and when the
> atomic call can't be last, regardless of whether it is needed or not,
> just so that next DCE would remove those up and emit those debug stmts,
> because otherwise that could result in -fcompare-debug failures
> (at least with -fno-tree-dce -fno-tree-whatever ...).
> And
> + tree narg = build_debug_expr_decl (type);
> + gdebug *g
> +   = gimple_build_debug_bind (narg,
> +  fold_convert (type, arg),
> +  stmt2);
> isn't that much more code compared to
> gimple *g = gimple_build_assign (lhs2, NOP_EXPR, arg);
> Or would you like it to be emitted always, i.e.
> if (atomic_op != BIT_AND_EXPR
>&& atomic_op != BIT_IOR_EXPR
>/* With -fnon-call-exceptions if we can't
>   add stmts after the call easily.  */
>&& !stmt_ends_bb_p (stmt2))
>   {
> tree type = TREE_TYPE (lhs2);
> if (TREE_CODE (arg) == INTEGER_CST)
>   arg = fold_convert (type, arg);
> else if (!useless_type_conversion_p (type, TREE_TYPE (arg)))
>   {
> tree narg = make_ssa_name (type);
> gimple *g = gimple_build_assign (narg, NOP_EXPR, arg);
> gsi_insert_after (, g, GSI_NEW_STMT);
> arg = narg;
>   }
> enum tree_code rcode;
> switch (atomic_op)
>   {
>   case PLUS_EXPR: rcode = MINUS_EXPR; break;
>   case MINUS_EXPR: rcode = PLUS_EXPR; break;
>   case BIT_XOR_EXPR: rcode = atomic_op; break;
>   default: gcc_unreachable ();
>   }
> tree d = build_debug_expr_decl (type);
> gimple *g = gimple_build_assign (lhs2, rcode, new_lhs, arg);
> gsi_insert_after (, g, GSI_NEW_STMT);
> lhs2 = NULL_TREE;
>   }
> in between
> update_stmt (use_stmt);
> and
> imm_use_iterator iter;
> and then do the
>  FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs2)
>if (use_stmt != cast_stmt)
> with resetting only if (lhs2)
> and similarly release_ssa_name (lhs2) only if (lhs2)?
> I think the usual case is that we emit debug exprs right away,
> not emit something that we want to DCE.
> 
> +   if (atomic_op == BIT_AND_EXPR
> +   || atomic_op == BIT_IOR_EXPR
> +   /* Or with -fnon-call-exceptions if we can't
> +  add debug stmts after the call.  */
> +   || stmt_ends_bb_p (stmt2))
> 
> 
> But now that you mention it, I think I don't handle right the
> case where lhs2 has no debug uses but there is a cast_stmt that has debug
> uses for its lhs.  We'd need to add_debug_temp in that case too and
> add a debug temp.

I'm mostly concerned about the replace_uses_by use.  forwprop
will go over newly emitted stmts and thus the hypothetical added

lhs2 = d;

record the copy and schedule the stmt for removal, substituting 'd'
in each use as it goes along the function and folding them.  It's
a bit iffy (and maybe has unintended side-effects in odd cases)
to trample around and fold stuff behind that flows back.

I'd always vote to simplify the folding code so it's easier to
maintain and not micro-optimize there since it's not going to be
a hot part of the compiler.

Richard.
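
As background for the "reversible ops" switch quoted above (PLUS_EXPR paired with MINUS_EXPR, BIT_XOR_EXPR with itself), here is a small plain C sketch of why the old value can be recomputed from the new one for those three operations. The enum and the names apply_op and recover_old are hypothetical; unsigned arithmetic is used so that wraparound is well defined.

```c
#include <assert.h>

enum atomic_op { OP_PLUS, OP_MINUS, OP_XOR };

/* New value after the operation (what an op_fetch style call yields).  */
unsigned
apply_op (enum atomic_op op, unsigned old_val, unsigned arg)
{
  switch (op)
    {
    case OP_PLUS:
      return old_val + arg;
    case OP_MINUS:
      return old_val - arg;
    default:
      return old_val ^ arg;
    }
}

/* Previous value (what a fetch_op style call yields), recomputed from
   the new value by applying the reverse operation.  */
unsigned
recover_old (enum atomic_op op, unsigned new_val, unsigned arg)
{
  switch (op)
    {
    case OP_PLUS:
      return new_val - arg;
    case OP_MINUS:
      return new_val + arg;
    default:
      return new_val ^ arg;
    }
}
```

AND and OR lose information about the old value, which matches their being excluded from the reversible path in the quoted code.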


[PATCH v3 15/15] arm: Fix constraint check for V8HI in mve_vector_mem_operand

2022-01-13 Thread Christophe Lyon via Gcc-patches
When compiling gcc.target/arm/mve/intrinsics/mve_immediates_1_n.c with
-mthumb -mfloat-abi=hard -march=armv8.1-m.main+mve.fp+fp.dp, the compiler
crashes because:
error: insn does not satisfy its constraints:
(insn 28 14 17 2 (set (reg:V8HI 16 s0 [orig:249 u16 ] [249])
(mem/c:V8HI (pre_modify:SI (reg/f:SI 12 ip [248])
(plus:SI (reg/f:SI 12 ip [248])
(const_int 32 [0x20]))) [1 u16+0 S16 A64])) 
"arm_mve.h":17113:10 3011 {*mve_movv8hi}
(expr_list:REG_INC (reg/f:SI 12 ip [248])
  (nil)))
during RTL pass: reload

We are trying to generate:
vldrh.16 q3, [ip], #14
but the constraint check fails because ip is not a low reg.

This patch replaces LAST_LO_REGNUM by LAST_ARM_REGNUM in
mve_vector_mem_operand and avoids the ICE.

2022-01-13  Christophe Lyon  

gcc/
* config/arm/arm.c (mve_vector_mem_operand): Fix handling of V8HI.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 7d56fa71806..5edca248fb7 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -13479,7 +13479,7 @@ mve_vector_mem_operand (machine_mode mode, rtx op, bool 
strict)
  case E_V4HImode:
  case E_V4HFmode:
if (val % 2 == 0 && abs (val) <= 254)
- return reg_no <= LAST_LO_REGNUM
+ return reg_no <= LAST_ARM_REGNUM
|| reg_no >= FIRST_PSEUDO_REGISTER;
return FALSE;
  case E_V4SImode:
-- 
2.25.1
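
To make the effect of the one-line change easier to see, the two parts of the condition in the hunk (the offset test and the base register test) are restated below as standalone C. This is a sketch, not the GCC function: halfword_offset_ok and base_reg_ok are made-up names, register 12 is ip as in the RTL dump above, and the limits passed in the usage note are assumed values chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Offset part of the check in the hunk: even and within +/-254.  */
int
halfword_offset_ok (int val)
{
  return val % 2 == 0 && abs (val) <= 254;
}

/* Base register part: a hard register numbered up to LAST_ALLOWED,
   or a pseudo register (numbered FIRST_PSEUDO or above).  */
int
base_reg_ok (int reg_no, int last_allowed, int first_pseudo)
{
  return reg_no <= last_allowed || reg_no >= first_pseudo;
}
```

Assuming a low-register limit of 7 and a core-register limit of 15, base_reg_ok (12, 7, 100) rejects ip while base_reg_ok (12, 15, 100) accepts it, which is the difference the patch relies on.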



[PATCH v3 14/15] arm: Add VPR_REG to ALL_REGS

2022-01-13 Thread Christophe Lyon via Gcc-patches
VPR_REG should be part of ALL_REGS, this patch fixes this omission.

2022-01-13  Christophe Lyon  

gcc/
* config/arm/arm.h (REG_CLASS_CONTENTS): Add VPR_REG to ALL_REGS.

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 2416fb5ef64..ea9fb16b9b1 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1347,7 +1347,7 @@ enum reg_class
   { 0x, 0x, 0x, 0x0080 }, /* AFP_REG */\
   { 0x, 0x, 0x, 0x0400 }, /* VPR_REG.  */  \
   { 0x5FFF, 0x, 0x, 0x0400 }, /* GENERAL_AND_VPR_REGS. 
 */ \
-  { 0x7FFF, 0x, 0x, 0x000F }  /* ALL_REGS.  */ \
+  { 0x7FFF, 0x, 0x, 0x040F }  /* ALL_REGS.  */ \
 }
 
 #define FP_SYSREGS \
-- 
2.25.1
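
A quick way to see the omission is to treat each register class entry as a bit mask and test whether VPR_REG is a subset of ALL_REGS. The sketch below uses only the last mask word, with the trailing hex digits as they appear in the hunk above (the other digits are cut off in this archive copy); mask_is_subset is a hypothetical helper, not a GCC function.

```c
#include <assert.h>

/* Return 1 if every register bit set in CLS is also set in ALL.  */
int
mask_is_subset (unsigned cls, unsigned all)
{
  return (cls & all) == cls;
}
```

With the old last word of ALL_REGS (0x000F) the VPR_REG bit (0x0400) is not covered; with the new value (0x040F) it is.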



[PATCH v3 13/15] arm: Convert more MVE/CDE builtins to predicate qualifiers

2022-01-13 Thread Christophe Lyon via Gcc-patches
This patch covers a few non-load/store builtins where we do not use
the  iterator and thus we cannot use .

2022-01-13  Christophe Lyon  

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.c (CX_UNARY_UNONE_QUALIFIERS): Use
predicate.
(CX_BINARY_UNONE_QUALIFIERS): Likewise.
(CX_TERNARY_UNONE_QUALIFIERS): Likewise.
(TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS): Delete.
(QUADOP_NONE_NONE_NONE_NONE_UNONE_QUALIFIERS): Delete.
(QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE_QUALIFIERS): Delete.
* config/arm/arm_mve_builtins.def: Use predicated qualifiers.
* config/arm/mve.md: Use VxBI instead of HI.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 73678a00398..f9437752a22 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -295,7 +295,7 @@ static enum arm_type_qualifiers
 arm_cx_unary_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_immediate, qualifier_none,
   qualifier_unsigned_immediate,
-  qualifier_unsigned };
+  qualifier_predicate };
 #define CX_UNARY_UNONE_QUALIFIERS (arm_cx_unary_unone_qualifiers)
 
 /* T (immediate, T, T, unsigned immediate).  */
@@ -304,7 +304,7 @@ arm_cx_binary_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_immediate,
   qualifier_none, qualifier_none,
   qualifier_unsigned_immediate,
-  qualifier_unsigned };
+  qualifier_predicate };
 #define CX_BINARY_UNONE_QUALIFIERS (arm_cx_binary_unone_qualifiers)
 
 /* T (immediate, T, T, T, unsigned immediate).  */
@@ -313,7 +313,7 @@ arm_cx_ternary_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_immediate,
   qualifier_none, qualifier_none, qualifier_none,
   qualifier_unsigned_immediate,
-  qualifier_unsigned };
+  qualifier_predicate };
 #define CX_TERNARY_UNONE_QUALIFIERS (arm_cx_ternary_unone_qualifiers)
 
 /* The first argument (return type) of a store should be void type,
@@ -509,12 +509,6 @@ 
arm_ternop_none_none_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define TERNOP_NONE_NONE_NONE_IMM_QUALIFIERS \
   (arm_ternop_none_none_none_imm_qualifiers)
 
-static enum arm_type_qualifiers
-arm_ternop_none_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_none, qualifier_none, qualifier_none, qualifier_unsigned };
-#define TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS \
-  (arm_ternop_none_none_none_unone_qualifiers)
-
 static enum arm_type_qualifiers
 arm_ternop_none_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none, qualifier_predicate };
@@ -567,13 +561,6 @@ 
arm_quadop_unone_unone_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define QUADOP_UNONE_UNONE_NONE_NONE_PRED_QUALIFIERS \
   (arm_quadop_unone_unone_none_none_pred_qualifiers)
 
-static enum arm_type_qualifiers
-arm_quadop_none_none_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_none, qualifier_none, qualifier_none, qualifier_none,
-qualifier_unsigned };
-#define QUADOP_NONE_NONE_NONE_NONE_UNONE_QUALIFIERS \
-  (arm_quadop_none_none_none_none_unone_qualifiers)
-
 static enum arm_type_qualifiers
 arm_quadop_none_none_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none, qualifier_none,
@@ -588,13 +575,6 @@ 
arm_quadop_none_none_none_imm_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define QUADOP_NONE_NONE_NONE_IMM_PRED_QUALIFIERS \
   (arm_quadop_none_none_none_imm_pred_qualifiers)
 
-static enum arm_type_qualifiers
-arm_quadop_unone_unone_unone_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
-qualifier_unsigned, qualifier_unsigned };
-#define QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE_QUALIFIERS \
-  (arm_quadop_unone_unone_unone_unone_unone_qualifiers)
-
 static enum arm_type_qualifiers
 arm_quadop_unone_unone_unone_unone_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
diff --git a/gcc/config/arm/arm_mve_builtins.def 
b/gcc/config/arm/arm_mve_builtins.def
index 7db6d47867e..1c8ee34f5cb 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -87,8 +87,8 @@ VAR4 (BINOP_UNONE_UNONE_UNONE, vcreateq_u, v16qi, v8hi, v4si, 
v2di)
 VAR4 (BINOP_NONE_UNONE_UNONE, vcreateq_s, v16qi, v8hi, v4si, v2di)
 VAR3 (BINOP_UNONE_UNONE_IMM, vshrq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_IMM, vshrq_n_s, v16qi, v8hi, v4si)
-VAR1 (BINOP_NONE_NONE_UNONE, vaddlvq_p_s, v4si)
-VAR1 (BINOP_UNONE_UNONE_UNONE, vaddlvq_p_u, v4si)
+VAR1 (BINOP_NONE_NONE_PRED, vaddlvq_p_s, v4si)
+VAR1 (BINOP_UNONE_UNONE_PRED, vaddlvq_p_u, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpneq_, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_NONE, vshlq_s, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_NONE, vshlq_u, v16qi, v8hi, v4si)
@@ -465,20 +465,20 @@ VAR2 

[PATCH v3 12/15] arm: Convert more load/store MVE builtins to predicate qualifiers

2022-01-13 Thread Christophe Lyon via Gcc-patches
This patch covers a few builtins where we do not use the 
iterator and thus we cannot use .

For v2di instructions, we keep the HI mode for predicates.

2022-01-13  Christophe Lyon  

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.c (STRSBS_P_QUALIFIERS): Use predicate
qualifier.
(STRSBU_P_QUALIFIERS): Likewise.
(LDRGBS_Z_QUALIFIERS): Likewise.
(LDRGBU_Z_QUALIFIERS): Likewise.
(LDRGBWBXU_Z_QUALIFIERS): Likewise.
(LDRGBWBS_Z_QUALIFIERS): Likewise.
(LDRGBWBU_Z_QUALIFIERS): Likewise.
(STRSBWBS_P_QUALIFIERS): Likewise.
(STRSBWBU_P_QUALIFIERS): Likewise.
* config/arm/mve.md: Use VxBI instead of HI.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 0b063b5f037..73678a00398 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -689,13 +689,13 @@ arm_strss_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_strsbs_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_void, qualifier_unsigned, qualifier_immediate,
-  qualifier_none, qualifier_unsigned};
+  qualifier_none, qualifier_predicate};
 #define STRSBS_P_QUALIFIERS (arm_strsbs_p_qualifiers)
 
 static enum arm_type_qualifiers
 arm_strsbu_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_void, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned, qualifier_unsigned};
+  qualifier_unsigned, qualifier_predicate};
 #define STRSBU_P_QUALIFIERS (arm_strsbu_p_qualifiers)
 
 static enum arm_type_qualifiers
@@ -731,13 +731,13 @@ arm_ldrgbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_ldrgbs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned};
+  qualifier_predicate};
 #define LDRGBS_Z_QUALIFIERS (arm_ldrgbs_z_qualifiers)
 
 static enum arm_type_qualifiers
 arm_ldrgbu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned};
+  qualifier_predicate};
 #define LDRGBU_Z_QUALIFIERS (arm_ldrgbu_z_qualifiers)
 
 static enum arm_type_qualifiers
@@ -777,7 +777,7 @@ arm_ldrgbwbxu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_ldrgbwbxu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned};
+  qualifier_predicate};
 #define LDRGBWBXU_Z_QUALIFIERS (arm_ldrgbwbxu_z_qualifiers)
 
 static enum arm_type_qualifiers
@@ -793,13 +793,13 @@ arm_ldrgbwbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_ldrgbwbs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned};
+  qualifier_predicate};
 #define LDRGBWBS_Z_QUALIFIERS (arm_ldrgbwbs_z_qualifiers)
 
 static enum arm_type_qualifiers
 arm_ldrgbwbu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned};
+  qualifier_predicate};
 #define LDRGBWBU_Z_QUALIFIERS (arm_ldrgbwbu_z_qualifiers)
 
 static enum arm_type_qualifiers
@@ -815,13 +815,13 @@ arm_strsbwbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_strsbwbs_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_const,
-  qualifier_none, qualifier_unsigned};
+  qualifier_none, qualifier_predicate};
 #define STRSBWBS_P_QUALIFIERS (arm_strsbwbs_p_qualifiers)
 
 static enum arm_type_qualifiers
 arm_strsbwbu_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_const,
-  qualifier_unsigned, qualifier_unsigned};
+  qualifier_unsigned, qualifier_predicate};
 #define STRSBWBU_P_QUALIFIERS (arm_strsbwbu_p_qualifiers)
 
 static enum arm_type_qualifiers
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index a8087815c22..9633b7187f6 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -7282,7 +7282,7 @@ (define_insn "mve_vstrwq_scatter_base_p_v4si"
[(match_operand:V4SI 0 "s_register_operand" "w")
 (match_operand:SI 1 "immediate_operand" "i")
 (match_operand:V4SI 2 "s_register_operand" "w")
-(match_operand:HI 3 "vpr_register_operand" "Up")]
+(match_operand:V4BI 3 "vpr_register_operand" "Up")]
 VSTRWSBQ))
   ]
   "TARGET_HAVE_MVE"
@@ -7371,7 +7371,7 @@ (define_insn "mve_vldrwq_gather_base_z_v4si"
   [(set (match_operand:V4SI 0 "s_register_operand" "=")
(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")
  (match_operand:SI 2 "immediate_operand" "i")
- (match_operand:HI 3 "vpr_register_operand" "Up")]
+ (match_operand:V4BI 3 "vpr_register_operand" "Up")]
 VLDRWGBQ))
   ]
   "TARGET_HAVE_MVE"
@@ 

[PATCH v3 10/15] arm: Convert remaining MVE vcmp builtins to predicate qualifiers

2022-01-13 Thread Christophe Lyon via Gcc-patches
This is mostly a mechanical change, only tested by the intrinsics
expansion tests.

2022-01-13  Christophe Lyon  

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.c (BINOP_UNONE_NONE_NONE_QUALIFIERS):
Delete.
(TERNOP_UNONE_NONE_NONE_UNONE_QUALIFIERS): Change to ...
(TERNOP_PRED_NONE_NONE_PRED_QUALIFIERS): ... this.
(TERNOP_PRED_UNONE_UNONE_PRED_QUALIFIERS): New.
* config/arm/arm_mve_builtins.def (vcmp*q_n_, vcmp*q_m_f): Use new
predicated qualifiers.
* config/arm/mve.md (mve_vcmpq_n_)
(mve_vcmp*q_m_f): Use MVE_VPRED instead of HI.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 36d71ab1a13..9cc192ddb9a 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -438,12 +438,6 @@ arm_binop_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define BINOP_NONE_NONE_UNONE_QUALIFIERS \
   (arm_binop_none_none_unone_qualifiers)
 
-static enum arm_type_qualifiers
-arm_binop_unone_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_unsigned, qualifier_none, qualifier_none };
-#define BINOP_UNONE_NONE_NONE_QUALIFIERS \
-  (arm_binop_unone_none_none_qualifiers)
-
 static enum arm_type_qualifiers
 arm_binop_pred_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_predicate, qualifier_none, qualifier_none };
@@ -504,10 +498,10 @@ 
arm_ternop_unone_unone_imm_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   (arm_ternop_unone_unone_imm_unone_qualifiers)
 
 static enum arm_type_qualifiers
-arm_ternop_unone_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_unsigned, qualifier_none, qualifier_none, qualifier_unsigned };
-#define TERNOP_UNONE_NONE_NONE_UNONE_QUALIFIERS \
-  (arm_ternop_unone_none_none_unone_qualifiers)
+arm_ternop_pred_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_predicate, qualifier_none, qualifier_none, qualifier_predicate 
};
+#define TERNOP_PRED_NONE_NONE_PRED_QUALIFIERS \
+  (arm_ternop_pred_none_none_pred_qualifiers)
 
 static enum arm_type_qualifiers
 arm_ternop_none_none_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
@@ -553,6 +547,13 @@ 
arm_ternop_unone_unone_unone_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define TERNOP_UNONE_UNONE_UNONE_PRED_QUALIFIERS \
   (arm_ternop_unone_unone_unone_pred_qualifiers)
 
+static enum arm_type_qualifiers
+arm_ternop_pred_unone_unone_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_predicate, qualifier_unsigned, qualifier_unsigned,
+qualifier_predicate };
+#define TERNOP_PRED_UNONE_UNONE_PRED_QUALIFIERS \
+  (arm_ternop_pred_unone_unone_pred_qualifiers)
+
 static enum arm_type_qualifiers
 arm_ternop_none_none_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none, qualifier_none };
diff --git a/gcc/config/arm/arm_mve_builtins.def 
b/gcc/config/arm/arm_mve_builtins.def
index 44b41eab4c5..b7ebbcab87f 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -118,9 +118,9 @@ VAR3 (BINOP_UNONE_UNONE_UNONE, vhaddq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vhaddq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, veorq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_UNONE_UNONE, vcmphiq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_UNONE_UNONE, vcmphiq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_UNONE_UNONE, vcmphiq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_UNONE_UNONE, vcmpcsq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_UNONE_UNONE, vcmpcsq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_UNONE_UNONE, vcmpcsq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vbicq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vandq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vaddvq_p_u, v16qi, v8hi, v4si)
@@ -142,17 +142,17 @@ VAR3 (BINOP_UNONE_UNONE_NONE, vbrsrq_n_u, v16qi, v8hi, 
v4si)
 VAR3 (BINOP_UNONE_UNONE_IMM, vshlq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_IMM, vrshrq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_IMM, vqshlq_n_u, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpneq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpneq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpltq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpltq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpltq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpleq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpleq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpleq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpgtq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpgtq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpgtq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpgeq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpgeq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpgeq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpeqq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, 

[PATCH v3 09/15] arm: Fix vcond_mask expander for MVE (PR target/100757)

2022-01-13 Thread Christophe Lyon via Gcc-patches
The problem in this PR is that we call VPSEL with a mask of vector
type instead of HImode. This happens because operand 3 in vcond_mask
is the pre-computed vector comparison and has vector type.

This patch fixes it by implementing TARGET_VECTORIZE_GET_MASK_MODE,
returning the appropriate VxBI mode when targeting MVE.  In turn, this
implies implementing vec_cmp,
vec_cmpu and vcond_mask_, and we can
move vec_cmp, vec_cmpu and
vcond_mask_ back to neon.md since they are not
used by MVE anymore.  The new * patterns listed above are
implemented in mve.md since they are only valid for MVE. However this
may make maintenance/comparison more painful than having all of them
in vec-common.md.

In the process, we can get rid of the recently added vcond_mve
parameter of arm_expand_vector_compare.

Compared to neon.md's vcond_mask_ before my "arm:
Auto-vectorization for MVE: vcmp" patch (r12-834), it keeps the VDQWH
iterator added in r12-835 (to have V4HF/V8HF support), as well as the
(! || flag_unsafe_math_optimizations) condition which
was not present before r12-834 although SF modes were enabled by VDQW
(I think this was a bug).

Using TARGET_VECTORIZE_GET_MASK_MODE has the advantage that we no
longer need to generate vpsel with vectors of 0 and 1: the masks are
now merged via scalar 'ands' instructions operating on 16-bit masks
after converting the boolean vectors.
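
As a rough illustration of the mask merging described above, here is a
scalar C model of the 8-iteration loop. It is only a sketch: the
model_* names are hypothetical and not part of the patch, and it assumes
that a true 32-bit lane sets four consecutive bits of the 16-bit
predicate.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of a 4 x 32-bit "equal" compare: each lane
   that compares equal sets its four bits in the 16-bit predicate.  */
uint16_t
model_vcmp_eq_f32 (const float *a, float value)
{
  uint16_t pred = 0;
  for (unsigned lane = 0; lane < 4; lane++)
    if (a[lane] == value)
      pred |= (uint16_t) (0xFu << (4 * lane));
  return pred;
}

/* Hypothetical scalar model of a predicated select: a lane takes IF_TRUE
   when its predicate bits are set, IF_FALSE otherwise.  */
void
model_vpsel_f32 (float *out, float if_true, float if_false, uint16_t pred)
{
  for (unsigned lane = 0; lane < 4; lane++)
    out[lane] = ((pred >> (4 * lane)) & 1u) ? if_true : if_false;
}

/* Model of fn1 with 8 iterations: the two compare results are merged
   with a scalar AND on the 16-bit masks, then one select picks 4.0f or
   5.0f per lane, and the result is reduced with a max.  */
float
model_fn1 (const float *a)
{
  assert (a != 0);
  uint16_t merged = model_vcmp_eq_f32 (a, 2.0f)
		    & model_vcmp_eq_f32 (a + 4, 2.0f);
  float sel[4];
  model_vpsel_f32 (sel, 4.0f, 5.0f, merged);

  float c = sel[0];
  for (unsigned lane = 1; lane < 4; lane++)
    if (sel[lane] > c)
      c = sel[lane];
  return c;
}
```

In this model no vector of 0/1 values is ever built: the only
intermediate state is the pair of 16-bit masks.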

In addition, this patch fixes a problem in arm_expand_vcond() where
the result would be a vector of 0 or 1 instead of operand 1 or 2.

Since we want to skip gcc.dg/signbit-2.c for MVE, we also add a new
arm_mve effective target.

Reducing the number of iterations in pr100757-3.c from 32 to 8, we
generate the code below:

float a[32];
float fn1(int d) {
  float c = 4.0f;
  for (int b = 0; b < 8; b++)
if (a[b] != 2.0f)
  c = 5.0f;
  return c;
}

fn1:
ldr         r3, .L3+48
vldr.64     d4, .L3          // q2=(2.0,2.0,2.0,2.0)
vldr.64     d5, .L3+8
vldrw.32    q0, [r3]         // q0=a(0..3)
adds        r3, r3, #16
vcmp.f32    eq, q0, q2       // cmp a(0..3) == (2.0,2.0,2.0,2.0)
vldrw.32    q1, [r3]         // q1=a(4..7)
vmrs        r3, P0
vcmp.f32    eq, q1, q2       // cmp a(4..7) == (2.0,2.0,2.0,2.0)
vmrs        r2, P0  @ movhi
ands        r3, r3, r2       // r3=select(a(0..3)) & select(a(4..7))
vldr.64     d4, .L3+16       // q2=(5.0,5.0,5.0,5.0)
vldr.64     d5, .L3+24
vmsr        P0, r3
vldr.64     d6, .L3+32       // q3=(4.0,4.0,4.0,4.0)
vldr.64     d7, .L3+40
vpsel       q3, q3, q2       // q3=vcond_mask(4.0,5.0)
vmov.32     r2, q3[1]        // keep the scalar max
vmov.32     r0, q3[3]
vmov.32     r3, q3[2]
vmov.f32    s11, s12
vmov        s15, r2
vmov        s14, r3
vmaxnm.f32  s15, s11, s15
vmaxnm.f32  s15, s15, s14
vmov        s14, r0
vmaxnm.f32  s15, s15, s14
vmov        r0, s15
bx          lr
.L4:
.align  3
.L3:
.word   1073741824  // 2.0f
.word   1073741824
.word   1073741824
.word   1073741824
.word   1084227584  // 5.0f
.word   1084227584
.word   1084227584
.word   1084227584
.word   1082130432  // 4.0f
.word   1082130432
.word   1082130432
.word   1082130432

2022-01-13  Christophe Lyon  

PR target/100757
gcc/
* config/arm/arm-protos.h (arm_get_mask_mode): New prototype.
(arm_expand_vector_compare): Update prototype.
* config/arm/arm.c (TARGET_VECTORIZE_GET_MASK_MODE): New.
(arm_vector_mode_supported_p): Add support for VxBI modes.
(arm_expand_vector_compare): Remove useless generation of vpsel.
(arm_expand_vcond): Fix select operands.
(arm_get_mask_mode): New.
* config/arm/mve.md (vec_cmp): New.
(vec_cmpu): New.
(vcond_mask_): New.
* config/arm/vec-common.md (vec_cmp)
(vec_cmpu): Move to ...
* config/arm/neon.md (vec_cmp)
(vec_cmpu): ... here
and disable for MVE.
* doc/sourcebuild.texi (arm_mve): Document new effective-target.

gcc/testsuite/
* gcc.dg/signbit-2.c: Skip when targeting ARM/MVE.
* lib/target-supports.exp (check_effective_target_arm_mve): New.

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index b978adf2038..a84613104b1 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -202,6 +202,7 @@ extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, 
tree, rtx, tree);
 extern bool arm_pad_reg_upward (machine_mode, tree, int);
 #endif
 extern int arm_apply_result_size (void);
+extern opt_machine_mode arm_get_mask_mode (machine_mode mode);
 
 #endif /* RTX_CODE */
 
@@ -378,7 +379,7 @@ extern void arm_emit_coreregs_64bit_shift (enum rtx_code, 
rtx, rtx, rtx, rtx,
 extern bool arm_fusion_enabled_p 

[PATCH v3 08/15] arm: Implement auto-vectorized MVE comparisons with vectors of boolean predicates

2022-01-13 Thread Christophe Lyon via Gcc-patches
We make use of qualifier_predicate to describe MVE builtin
prototypes, restricting this to the auto-vectorizable vcmp* and vpsel
builtins, as they are exercised by the tests added earlier in the series.

Special handling is needed for mve_vpselq because it has a v2di
variant, which has no natural VPR.P0 representation: we keep HImode
for it.

The vector_compare expansion code is updated to use the right VxBI
mode instead of HI for the result.

We extend the existing thumb2_movhi_vfp and thumb2_movhi_fp16 patterns
to use the new MVE_7_HI iterator which covers HI and the new VxBI
modes, in conjunction with the new DB constraint for a constant vector
of booleans.

2022-01-13  Christophe Lyon 
Richard Sandiford  

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.c (BINOP_PRED_UNONE_UNONE_QUALIFIERS)
(BINOP_PRED_NONE_NONE_QUALIFIERS)
(TERNOP_NONE_NONE_NONE_PRED_QUALIFIERS)
(TERNOP_UNONE_UNONE_UNONE_PRED_QUALIFIERS): New.
* config/arm/arm-protos.h (mve_const_bool_vec_to_hi): New.
* config/arm/arm.c (arm_hard_regno_mode_ok): Handle new VxBI
modes.
(arm_mode_to_pred_mode): New.
(arm_expand_vector_compare): Use the right VxBI mode instead of
HI.
(arm_expand_vcond): Likewise.
(simd_valid_immediate): Handle MODE_VECTOR_BOOL.
(mve_const_bool_vec_to_hi): New.
(neon_make_constant): Call mve_const_bool_vec_to_hi when needed.
* config/arm/arm_mve_builtins.def (vcmpneq_, vcmphiq_, vcmpcsq_)
(vcmpltq_, vcmpleq_, vcmpgtq_, vcmpgeq_, vcmpeqq_, vcmpneq_f)
(vcmpltq_f, vcmpleq_f, vcmpgtq_f, vcmpgeq_f, vcmpeqq_f, vpselq_u)
(vpselq_s, vpselq_f): Use new predicated qualifiers.
* config/arm/constraints.md (DB): New.
* config/arm/iterators.md (MVE_7, MVE_7_HI): New mode iterators.
(MVE_VPRED, MVE_vpred): New attribute iterators.
* config/arm/mve.md (@mve_vcmpq_)
(@mve_vcmpq_f, @mve_vpselq_)
(@mve_vpselq_f): Use MVE_VPRED instead of HI.
(@mve_vpselq_v2di): Define separately.
(mov): New expander for VxBI modes.
* config/arm/vfp.md (thumb2_movhi_vfp, thumb2_movhi_fp16): Use
MVE_7_HI iterator and add support for DB constraint.

gcc/testsuite/
PR target/100757
PR target/101325
* gcc.dg/rtl/arm/mve-vxbi.c: New test.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 2ccfa37c302..36d71ab1a13 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -420,6 +420,12 @@ 
arm_binop_unone_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define BINOP_UNONE_UNONE_UNONE_QUALIFIERS \
   (arm_binop_unone_unone_unone_qualifiers)
 
+static enum arm_type_qualifiers
+arm_binop_pred_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_predicate, qualifier_unsigned, qualifier_unsigned };
+#define BINOP_PRED_UNONE_UNONE_QUALIFIERS \
+  (arm_binop_pred_unone_unone_qualifiers)
+
 static enum arm_type_qualifiers
 arm_binop_unone_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_none, qualifier_immediate };
@@ -438,6 +444,12 @@ arm_binop_unone_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define BINOP_UNONE_NONE_NONE_QUALIFIERS \
   (arm_binop_unone_none_none_qualifiers)
 
+static enum arm_type_qualifiers
+arm_binop_pred_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_predicate, qualifier_none, qualifier_none };
+#define BINOP_PRED_NONE_NONE_QUALIFIERS \
+  (arm_binop_pred_none_none_qualifiers)
+
 static enum arm_type_qualifiers
 arm_binop_unone_unone_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_none };
@@ -509,6 +521,12 @@ 
arm_ternop_none_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS \
   (arm_ternop_none_none_none_unone_qualifiers)
 
+static enum arm_type_qualifiers
+arm_ternop_none_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none, qualifier_predicate };
+#define TERNOP_NONE_NONE_NONE_PRED_QUALIFIERS \
+  (arm_ternop_none_none_none_pred_qualifiers)
+
 static enum arm_type_qualifiers
 arm_ternop_none_none_imm_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_immediate, qualifier_unsigned 
};
@@ -528,6 +546,13 @@ 
arm_ternop_unone_unone_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define TERNOP_UNONE_UNONE_UNONE_UNONE_QUALIFIERS \
   (arm_ternop_unone_unone_unone_unone_qualifiers)
 
+static enum arm_type_qualifiers
+arm_ternop_unone_unone_unone_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
+qualifier_predicate };
+#define TERNOP_UNONE_UNONE_UNONE_PRED_QUALIFIERS \
+  (arm_ternop_unone_unone_unone_pred_qualifiers)
+
 static enum arm_type_qualifiers
 

[PATCH v3 07/15] arm: Implement MVE predicates as vectors of booleans

2022-01-13 Thread Christophe Lyon via Gcc-patches
This patch implements support for vectors of booleans to support MVE
predicates, instead of HImode.  Since the ABI mandates pred16_t (aka
uint16_t) to represent predicates in intrinsics prototypes, we
introduce a new "predicate" type qualifier so that we can map the
relevant builtins' HImode arguments and return values to the appropriate
vector of booleans (VxBI).

We have to update test_vector_ops_duplicate, because it iterates using
an offset in bytes, where we would need to iterate in bits: we stop
iterating when we reach the end of the vector of booleans.

In addition, we have to fix the underlying definition of vectors of
booleans because ARM/MVE needs a different representation than
AArch64/SVE. With ARM/MVE the 'true' bit is duplicated over the
element size, so that a true element of V4BI is represented by
'0b'.  This patch updates the aarch64 definition of VNx*BI as
needed.
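
The duplicated-bit layout can be sketched in plain C as follows. This is
an illustration only: expand_lane_flags is a hypothetical helper, not
something from the patch, and it assumes a 16-bit predicate with one bit
per byte of a 128-bit vector, so that a lane of N bytes owns N
consecutive bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: expand one flag bit per lane into a 16-bit
   MVE-style predicate in which every true lane sets all the bits that
   cover its bytes.  NLANES must be 4, 8 or 16.  */
uint16_t
expand_lane_flags (unsigned lane_flags, unsigned nlanes)
{
  assert (nlanes == 4 || nlanes == 8 || nlanes == 16);

  unsigned bits_per_lane = 16 / nlanes;
  /* All-ones pattern for a single true lane, e.g. 0xF for 4 lanes.  */
  uint16_t lane_true = (uint16_t) ((1u << bits_per_lane) - 1);

  uint16_t pred = 0;
  for (unsigned i = 0; i < nlanes; i++)
    if (lane_flags & (1u << i))
      pred |= (uint16_t) (lane_true << (i * bits_per_lane));
  return pred;
}
```

With four lanes, a single true lane 0 yields 0x000F under this model,
whereas a one-bit-per-element layout would yield 0x0001.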

2022-01-13  Christophe Lyon  
Richard Sandiford  

gcc/
PR target/100757
PR target/101325
* config/aarch64/aarch64-modes.def (VNx16BI, VNx8BI, VNx4BI,
VNx2BI): Update definition.
* config/arm/arm-builtins.c (arm_init_simd_builtin_types): Add new
simd types.
(arm_init_builtin): Map predicate vectors arguments to HImode.
(arm_expand_builtin_args): Move HImode predicate arguments to VxBI
rtx. Move return value to HImode rtx.
* config/arm/arm-builtins.h (arm_type_qualifiers): Add 
qualifier_predicate.
* config/arm/arm-modes.def (B2I, B4I, V16BI, V8BI, V4BI): New modes.
* config/arm/arm-simd-builtin-types.def (Pred1x16_t,
Pred2x8_t,Pred4x4_t): New.
* emit-rtl.c (init_emit_once): Handle all boolean modes.
* genmodes.c (mode_data): Add boolean field.
(blank_mode): Initialize it.
(make_complex_modes): Fix handling of boolean modes.
(make_vector_modes): Likewise.
(VECTOR_BOOL_MODE): Use new COMPONENT parameter.
(make_vector_bool_mode): Likewise.
(BOOL_MODE): New.
(make_bool_mode): New.
(emit_insn_modes_h): Fix generation of boolean modes.
(emit_class_narrowest_mode): Likewise.
* machmode.def: Use new BOOL_MODE instead of FRACTIONAL_INT_MODE
to define BImode.
* rtx-vector-builder.c (rtx_vector_builder::find_cached_value):
Fix handling of constm1_rtx for VECTOR_BOOL.
* simplify-rtx.c (native_encode_rtx): Fix support for VECTOR_BOOL.
(native_decode_vector_rtx): Likewise.
(test_vector_ops_duplicate): Skip vec_merge test
with vectors of booleans.
* varasm.c (output_constant_pool_2): Likewise.

diff --git a/gcc/config/aarch64/aarch64-modes.def 
b/gcc/config/aarch64/aarch64-modes.def
index 976bf9b42be..8f399225a80 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -47,10 +47,10 @@ ADJUST_FLOAT_FORMAT (HF, _half_format);
 
 /* Vector modes.  */
 
-VECTOR_BOOL_MODE (VNx16BI, 16, 2);
-VECTOR_BOOL_MODE (VNx8BI, 8, 2);
-VECTOR_BOOL_MODE (VNx4BI, 4, 2);
-VECTOR_BOOL_MODE (VNx2BI, 2, 2);
+VECTOR_BOOL_MODE (VNx16BI, 16, BI, 2);
+VECTOR_BOOL_MODE (VNx8BI, 8, BI, 2);
+VECTOR_BOOL_MODE (VNx4BI, 4, BI, 2);
+VECTOR_BOOL_MODE (VNx2BI, 2, BI, 2);
 
 ADJUST_NUNITS (VNx16BI, aarch64_sve_vg * 8);
 ADJUST_NUNITS (VNx8BI, aarch64_sve_vg * 4);
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 9c645722230..2ccfa37c302 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -1548,6 +1548,13 @@ arm_init_simd_builtin_types (void)
   arm_simd_types[Bfloat16x4_t].eltype = arm_bf16_type_node;
   arm_simd_types[Bfloat16x8_t].eltype = arm_bf16_type_node;
 
+  if (TARGET_HAVE_MVE)
+{
+  arm_simd_types[Pred1x16_t].eltype = unsigned_intHI_type_node;
+  arm_simd_types[Pred2x8_t].eltype = unsigned_intHI_type_node;
+  arm_simd_types[Pred4x4_t].eltype = unsigned_intHI_type_node;
+}
+
   for (i = 0; i < nelts; i++)
 {
   tree eltype = arm_simd_types[i].eltype;
@@ -1695,6 +1702,11 @@ arm_init_builtin (unsigned int fcode, arm_builtin_datum 
*d,
   if (qualifiers & qualifier_map_mode)
op_mode = d->mode;
 
+  /* MVE Predicates use HImode as mandated by the ABI: pred16_t is unsigned
+short.  */
+  if (qualifiers & qualifier_predicate)
+   op_mode = HImode;
+
   /* For pointers, we want a pointer to the basic type
 of the vector.  */
   if (qualifiers & qualifier_pointer && VECTOR_MODE_P (op_mode))
@@ -2939,6 +2951,11 @@ arm_expand_builtin_args (rtx target, machine_mode 
map_mode, int fcode,
case ARG_BUILTIN_COPY_TO_REG:
  if (POINTER_TYPE_P (TREE_TYPE (arg[argc])))
op[argc] = convert_memory_address (Pmode, op[argc]);
+
+ /* MVE uses mve_pred16_t (aka HImode) for vectors of predicates.  
*/
+ if (GET_MODE_CLASS (mode[argc]) == MODE_VECTOR_BOOL)
+   op[argc] = 

[PATCH v3 06/15] arm: Fix mve_vmvnq_n_ argument mode

2022-01-13 Thread Christophe Lyon via Gcc-patches
The vmvnq_n* intrinsics have [u]int[16|32]_t arguments, so use the
V_elem iterator instead of HI in mve_vmvnq_n_.

2022-01-13  Christophe Lyon  

gcc/
* config/arm/mve.md (mve_vmvnq_n_): Use V_elem mode
for operand 1.

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 171dd384133..5c3b34dce3a 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -617,7 +617,7 @@ (define_insn "mve_vcvtaq_"
 (define_insn "mve_vmvnq_n_"
   [
(set (match_operand:MVE_5 0 "s_register_operand" "=w")
-   (unspec:MVE_5 [(match_operand:HI 1 "immediate_operand" "i")]
+   (unspec:MVE_5 [(match_operand: 1 "immediate_operand" "i")]
 VMVNQ_N))
   ]
   "TARGET_HAVE_MVE"
-- 
2.25.1



[PATCH v3 05/15] arm: Add support for VPR_REG in arm_class_likely_spilled_p

2022-01-13 Thread Christophe Lyon via Gcc-patches
VPR_REG is the only register in its class, so it should be handled by
TARGET_CLASS_LIKELY_SPILLED_P, which is achieved by calling
default_class_likely_spilled_p.  No test fails without this patch, but
it seems it should be implemented.
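
To make the effect of the fallback concrete, here is a small
self-contained model in C. It is a sketch under assumptions: the enum,
the size table and the model_* functions are hypothetical stand-ins, and
it assumes the default hook treats a class as likely spilled exactly
when the class contains a single register.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced set of register classes for illustration.  */
enum model_reg_class
{
  MODEL_GENERAL_REGS,
  MODEL_CC_REG,
  MODEL_VPR_REG,
  MODEL_NUM_CLASSES
};

/* Illustrative class sizes; only "exactly one register" matters here.  */
static const unsigned model_reg_class_size[MODEL_NUM_CLASSES] = { 14, 1, 1 };

/* Assumed behaviour of the default hook: single-register classes are
   likely to be spilled.  */
bool
model_default_class_likely_spilled_p (enum model_reg_class rclass)
{
  assert (rclass < MODEL_NUM_CLASSES);
  return model_reg_class_size[rclass] == 1;
}

/* Model of the patched target hook: keep the explicit CC_REG case and
   defer everything else to the default instead of returning false.  */
bool
model_arm_class_likely_spilled_p (enum model_reg_class rclass)
{
  if (rclass == MODEL_CC_REG)
    return true;
  return model_default_class_likely_spilled_p (rclass);
}
```

In this model the VPR class is reported as likely spilled without being
named explicitly in the target hook.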

2022-01-13  Christophe Lyon  

gcc/
* config/arm/arm.c (arm_class_likely_spilled_p): Handle VPR_REG.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c3559ca8703..64a8f2dc7de 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -29317,7 +29317,7 @@ arm_class_likely_spilled_p (reg_class_t rclass)
   || rclass  == CC_REG)
 return true;
 
-  return false;
+  return default_class_likely_spilled_p (rclass);
 }
 
 /* Implements target hook small_register_classes_for_mode_p.  */
-- 
2.25.1



[PATCH v3 04/15] arm: Add GENERAL_AND_VPR_REGS regclass

2022-01-13 Thread Christophe Lyon via Gcc-patches
At some point during the development of this patch series, it appeared
that in some cases the register allocator wants “VPR or general”
rather than “VPR or general or FP” (which is the same thing as
ALL_REGS).  The series does not seem to require this anymore, but it
seems to be a good thing to do anyway, to give the register allocator
more freedom.

CLASS_MAX_NREGS and arm_hard_regno_nregs need adjustment to avoid a
regression in gcc.dg/stack-usage-1.c when compiled with -mthumb
-mfloat-abi=hard -march=armv8.1-m.main+mve.fp+fp.dp.
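
The VPR adjustment boils down to a ceiling division of the mode size by
two bytes, since the predicate register holds 16 bits. A minimal sketch
of that arithmetic, with MODEL_CEIL and model_vpr_nregs as hypothetical
names standing in for the real macros:

```c
#include <assert.h>

/* Ceiling division, written the usual way for positive integers.  */
#define MODEL_CEIL(x, y) (((x) + (y) - 1) / (y))

/* Hypothetical helper: number of 2-byte predicate registers needed to
   hold a value of MODE_SIZE_BYTES bytes.  */
unsigned
model_vpr_nregs (unsigned mode_size_bytes)
{
  assert (mode_size_bytes > 0);
  return MODEL_CEIL (mode_size_bytes, 2u);
}
```

A 2-byte mode such as HImode therefore needs one register in this model,
while a 4-byte mode would need two.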

2022-01-13  Christophe Lyon  

gcc/
* config/arm/arm.h (reg_class): Add GENERAL_AND_VPR_REGS.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise.
(CLASS_MAX_NREGS): Handle VPR.
* config/arm/arm.c (arm_hard_regno_nregs): Handle VPR.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index bb75921f32d..c3559ca8703 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -25287,6 +25287,9 @@ thumb2_asm_output_opcode (FILE * stream)
 static unsigned int
 arm_hard_regno_nregs (unsigned int regno, machine_mode mode)
 {
+  if (IS_VPR_REGNUM (regno))
+return CEIL (GET_MODE_SIZE (mode), 2);
+
   if (TARGET_32BIT
   && regno > PC_REGNUM
   && regno != FRAME_POINTER_REGNUM
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index dacce2b7f08..2416fb5ef64 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1287,6 +1287,7 @@ enum reg_class
   SFP_REG,
   AFP_REG,
   VPR_REG,
+  GENERAL_AND_VPR_REGS,
   ALL_REGS,
   LIM_REG_CLASSES
 };
@@ -1316,6 +1317,7 @@ enum reg_class
   "SFP_REG",   \
   "AFP_REG",   \
   "VPR_REG",   \
+  "GENERAL_AND_VPR_REGS", \
   "ALL_REGS"   \
 }
 
@@ -1344,6 +1346,7 @@ enum reg_class
   { 0x, 0x, 0x, 0x0040 }, /* SFP_REG */\
   { 0x, 0x, 0x, 0x0080 }, /* AFP_REG */\
   { 0x, 0x, 0x, 0x0400 }, /* VPR_REG.  */  \
+  { 0x5FFF, 0x, 0x, 0x0400 }, /* GENERAL_AND_VPR_REGS. 
 */ \
   { 0x7FFF, 0x, 0x, 0x000F }  /* ALL_REGS.  */ \
 }
 
@@ -1453,7 +1456,9 @@ extern const char *fp_sysreg_names[NB_FP_SYSREGS];
ARM regs are UNITS_PER_WORD bits.  
FIXME: Is this true for iWMMX?  */
 #define CLASS_MAX_NREGS(CLASS, MODE)  \
-  (ARM_NUM_REGS (MODE))
+  (CLASS == VPR_REG) \
+  ? CEIL (GET_MODE_SIZE (MODE), 2)\
+  : (ARM_NUM_REGS (MODE))
 
 /* If defined, gives a class of registers that cannot be used as the
operand of a SUBREG that changes the mode of the object illegally.  */
-- 
2.25.1



[PATCH v3 03/15] arm: Add tests for PR target/101325

2022-01-13 Thread Christophe Lyon via Gcc-patches
These tests are derived from the one provided in the PR: there is a
compile-only test because I did not have access to anything that could
execute MVE code until recently.
I have been able to add an executable test since QEMU supports MVE.

Instead of adding arm_v8_1m_mve_hw, I update arm_mve_hw so that it
uses add_options_for_arm_v8_1m_mve_fp, like arm_neon_hw does.  This
ensures arm_mve_hw passes even if the toolchain does not generate MVE
code by default.

2022-01-13  Christophe Lyon  

gcc/testsuite/
PR target/101325
* gcc.target/arm/simd/pr101325.c: New.
* gcc.target/arm/simd/pr101325-2.c: New.
* lib/target-supports.exp (check_effective_target_arm_mve_hw): Use
add_options_for_arm_v8_1m_mve_fp.

diff --git a/gcc/testsuite/gcc.target/arm/simd/pr101325-2.c 
b/gcc/testsuite/gcc.target/arm/simd/pr101325-2.c
new file mode 100644
index 000..355f6473a00
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr101325-2.c
@@ -0,0 +1,19 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_mve_hw } */
+/* { dg-options "-O3" } */
+/* { dg-add-options arm_v8_1m_mve } */
+
+#include 
+
+
+__attribute((noipa))
+unsigned foo(int8x16_t v, int8x16_t w)
+{
+  return vcmpeqq (v, w);
+}
+
+int main(void)
+{
+  if (foo (vdupq_n_s8(0), vdupq_n_s8(0)) != 0xU)
+__builtin_abort ();
+}
diff --git a/gcc/testsuite/gcc.target/arm/simd/pr101325.c 
b/gcc/testsuite/gcc.target/arm/simd/pr101325.c
new file mode 100644
index 000..4cb2513da87
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr101325.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O3" } */
+
+#include 
+
+unsigned foo(int8x16_t v, int8x16_t w)
+{
+  return vcmpeqq (v, w);
+}
+/* { dg-final { scan-assembler {\tvcmp.i8  eq} } } */
+/* { dg-final { scan-assembler {\tvmrs\tr[0-9]+, P0} } } */
+/* { dg-final { scan-assembler {\tuxth} } } */
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index b4bf2e6b495..0fe1e1e077a 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5037,6 +5037,7 @@ proc check_effective_target_arm_cmse_hw { } {
}
 } "-mcmse"]
 }
+
 # Return 1 if the target supports executing MVE instructions, 0
 # otherwise.
 
@@ -5052,7 +5053,7 @@ proc check_effective_target_arm_mve_hw {} {
   : "0" (a), "r" (b));
  return (a != 2);
}
-} ""]
+} [add_options_for_arm_v8_1m_mve_fp ""]]
 }
 
 # Return 1 if this is an ARM target where ARMv8-M Security Extensions with
-- 
2.25.1



[PATCH v3 02/15] arm: Add tests for PR target/100757

2022-01-13 Thread Christophe Lyon via Gcc-patches
These tests currently trigger an ICE which is fixed later in the patch
series.

The pr100757*.c testcases are derived from
gcc.c-torture/compile/20160205-1.c, forcing the use of MVE, and using
various types and return values different from 0 and 1 to avoid
commonalization with boolean masks.  In addition, since we should not
need these masks, the tests make sure they are not present.

2022-01-13  Christophe Lyon  

gcc/testsuite/
PR target/100757
* gcc.target/arm/simd/pr100757-2.c: New.
* gcc.target/arm/simd/pr100757-3.c: New.
* gcc.target/arm/simd/pr100757-4.c: New.
* gcc.target/arm/simd/pr100757.c: New.

diff --git a/gcc/testsuite/gcc.target/arm/simd/pr100757-2.c 
b/gcc/testsuite/gcc.target/arm/simd/pr100757-2.c
new file mode 100644
index 000..c2262b4d81e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr100757-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
+/* Derived from gcc.c-torture/compile/20160205-1.c.  */
+
+float a[32];
+int fn1(int d) {
+  int c = 4;
+  for (int b = 0; b < 32; b++)
+if (a[b] != 2.0f)
+  c = 5;
+  return c;
+}
+
+/* { dg-final { scan-assembler-times {\t.word\t1073741824\n} 4 } } */ /* 
Constant 2.0f.  */
+/* { dg-final { scan-assembler-times {\t.word\t4\n} 4 } } */ /* Initial value 
for c.  */
+/* { dg-final { scan-assembler-times {\t.word\t5\n} 4 } } */ /* Possible value 
for c.  */
+/* { dg-final { scan-assembler-not {\t.word\t1\n} } } */ /* 'true' mask.  */
+/* { dg-final { scan-assembler-not {\t.word\t0\n} } } */ /* 'false' mask.  */
diff --git a/gcc/testsuite/gcc.target/arm/simd/pr100757-3.c 
b/gcc/testsuite/gcc.target/arm/simd/pr100757-3.c
new file mode 100644
index 000..e604555c04c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr100757-3.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
+/* Copied from gcc.c-torture/compile/20160205-1.c.  */
+
+float a[32];
+float fn1(int d) {
+  float c = 4.0f;
+  for (int b = 0; b < 32; b++)
+if (a[b] != 2.0f)
+  c = 5.0f;
+  return c;
+}
+
+/* { dg-final { scan-assembler-times {\t.word\t1073741824\n} 4 } } */ /* 
Constant 2.0f.  */
+/* { dg-final { scan-assembler-times {\t.word\t1084227584\n} 4 } } */ /* 
Possible value for c (5.0).  */
+/* { dg-final { scan-assembler-times {\t.word\t1082130432\n} 4 } } */ /* 
Initial value for c (4.0).  */
+/* { dg-final { scan-assembler-not {\t.word\t1\n} } } */ /* 'true' mask.  */
+/* { dg-final { scan-assembler-not {\t.word\t0\n} } } */ /* 'false' mask.  */
diff --git a/gcc/testsuite/gcc.target/arm/simd/pr100757-4.c b/gcc/testsuite/gcc.target/arm/simd/pr100757-4.c
new file mode 100644
index 000..c12040c517f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr100757-4.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O3" } */
+/* Derived from gcc.c-torture/compile/20160205-1.c.  */
+
+unsigned int a[32];
+int fn1(int d) {
+  int c = 2;
+  for (int b = 0; b < 32; b++)
+if (a[b])
+  c = 3;
+  return c;
+}
+
+/* { dg-final { scan-assembler-times {\t.word\t0\n} 4 } } */ /* 'false' mask.  */
+/* { dg-final { scan-assembler-not {\t.word\t1\n} } } */ /* 'true' mask.  */
+/* { dg-final { scan-assembler-times {\t.word\t2\n} 4 } } */ /* Initial value for c.  */
+/* { dg-final { scan-assembler-times {\t.word\t3\n} 4 } } */ /* Possible value for c.  */
diff --git a/gcc/testsuite/gcc.target/arm/simd/pr100757.c b/gcc/testsuite/gcc.target/arm/simd/pr100757.c
new file mode 100644
index 000..41d6e4e2d7a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr100757.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O3" } */
+/* Derived from gcc.c-torture/compile/20160205-1.c.  */
+
+int a[32];
+int fn1(int d) {
+  int c = 2;
+  for (int b = 0; b < 32; b++)
+if (a[b])
+  c = 3;
+  return c;
+}
+
+/* { dg-final { scan-assembler-times {\t.word\t0\n} 4 } } */ /* 'false' mask.  */
+/* { dg-final { scan-assembler-not {\t.word\t1\n} } } */ /* 'true' mask.  */
+/* { dg-final { scan-assembler-times {\t.word\t2\n} 4 } } */ /* Initial value for c.  */
+/* { dg-final { scan-assembler-times {\t.word\t3\n} 4 } } */ /* Possible value for c.  */
-- 
2.25.1



[PATCH v3 01/15] arm: Add new tests for comparison vectorization with Neon and MVE

2022-01-13 Thread Christophe Lyon via Gcc-patches
This patch mainly adds Neon tests similar to existing MVE ones,
to make sure we do not break Neon when fixing MVE.

mve-vcmp-f32-2.c is similar to mve-vcmp-f32.c but uses a conditional
with 2.0f and 3.0f constants to help scan-assembler-times.
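
The `.word` values that these tests scan for are the IEEE 754 single-precision bit patterns of the constants, written in decimal. A minimal sketch of that correspondence, assuming a 32-bit IEEE 754 `float`; the helper names `float_bits` and `select_2_or_3` are illustrative and are not part of the patch:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the raw 32-bit pattern of a float, i.e. the number the
   assembler output shows in a .word directive.  Assumes IEEE 754
   binary32 (hypothetical helper, not from the patch).  */
static uint32_t
float_bits (float f)
{
  uint32_t u;
  assert (sizeof f == sizeof u);
  memcpy (&u, &f, sizeof u);
  return u;
}

/* Scalar model of one lane of the test loop: the comparison result
   selects between two easily recognizable constants.  */
static float
select_2_or_3 (int cond)
{
  return cond ? 2.0f : 3.0f;
}
```

Here `float_bits (2.0f)` is 1073741824 and `float_bits (3.0f)` is 1077936128, which are the two patterns mve-vcmp-f32-2.c counts.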

2022-01-13  Christophe Lyon 

gcc/testsuite/
* gcc.target/arm/simd/mve-vcmp-f32-2.c: New.
* gcc.target/arm/simd/neon-compare-1.c: New.
* gcc.target/arm/simd/neon-compare-2.c: New.
* gcc.target/arm/simd/neon-compare-3.c: New.
* gcc.target/arm/simd/neon-compare-scalar-1.c: New.
* gcc.target/arm/simd/neon-vcmp-f16.c: New.
* gcc.target/arm/simd/neon-vcmp-f32-2.c: New.
* gcc.target/arm/simd/neon-vcmp-f32-3.c: New.
* gcc.target/arm/simd/neon-vcmp-f32.c: New.
* gcc.target/arm/simd/neon-vcmp.c: New.

diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c
new file mode 100644
index 000..917a95bf141
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c
@@ -0,0 +1,32 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
+
+#include 
+
+#define NB 4
+
+#define FUNC(OP, NAME) \
+  void test_ ## NAME ##_f (float * __restrict__ dest, float *a, float *b) { \
+int i; \
+for (i=0; i, vcmpgt)
+FUNC(>=, vcmpge)
+
+/* { dg-final { scan-assembler-times {\tvcmp.f32\teq, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tne, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tlt, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tle, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tgt, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tge, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\t.word\t1073741824\n} 24 } } */ /* Constant 2.0f.  */
+/* { dg-final { scan-assembler-times {\t.word\t1077936128\n} 24 } } */ /* Constant 3.0f.  */
diff --git a/gcc/testsuite/gcc.target/arm/simd/neon-compare-1.c b/gcc/testsuite/gcc.target/arm/simd/neon-compare-1.c
new file mode 100644
index 000..2e0222a71f2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/neon-compare-1.c
@@ -0,0 +1,78 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-add-options arm_neon } */
+/* { dg-additional-options "-O3" } */
+
+#include "mve-compare-1.c"
+
+/* 64-bit vectors.  */
+/* vmvn is used by 'ne' comparisons: 3 sizes * 2 (signed/unsigned) * 2
+   (register/zero) = 12.  */
+/* { dg-final { scan-assembler-times {\tvmvn\td[0-9]+, d[0-9]+\n} 12 } } */
+
+/* { 8 bits } x { eq, ne, lt, le, gt, ge }. */
+/* ne uses eq, lt/le only apply to comparison with zero, they use gt/ge
+   otherwise.  */
+/* { dg-final { scan-assembler-times {\tvceq.i8\td[0-9]+, d[0-9]+, d[0-9]+\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvceq.i8\td[0-9]+, d[0-9]+, #0\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvclt.s8\td[0-9]+, d[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcle.s8\td[0-9]+, d[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s8\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s8\td[0-9]+, d[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s8\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s8\td[0-9]+, d[0-9]+, #0\n} 1 } } */
+
+/* { 16 bits } x { eq, ne, lt, le, gt, ge }. */
+/* { dg-final { scan-assembler-times {\tvceq.i16\td[0-9]+, d[0-9]+, d[0-9]+\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvceq.i16\td[0-9]+, d[0-9]+, #0\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvclt.s16\td[0-9]+, d[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcle.s16\td[0-9]+, d[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s16\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s16\td[0-9]+, d[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s16\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s16\td[0-9]+, d[0-9]+, #0\n} 1 } } */
+
+/* { 32 bits } x { eq, ne, lt, le, gt, ge }. */
+/* { dg-final { scan-assembler-times {\tvceq.i32\td[0-9]+, d[0-9]+, d[0-9]+\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvceq.i32\td[0-9]+, d[0-9]+, #0\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvclt.s32\td[0-9]+, d[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcle.s32\td[0-9]+, d[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s32\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */
+/* { 

[PATCH v3 00/15] ARM/MVE use vectors of boolean for predicates

2022-01-13 Thread Christophe Lyon via Gcc-patches


This is v3 of this patch series, fixing issues I discovered before
committing v2 (which had been approved).

Thanks a lot to Richard Sandiford for his help.

The changes v2 -> v3 are:

Patch 4: Fix arm_hard_regno_nregs and CLASS_MAX_NREGS to support VPR.

Patch 7: Changes to the underlying representation of vectors of
booleans to account for the different expectations between AArch64/SVE
and Arm/MVE.

Patch 8: Re-use and extend existing thumb2_movhi* patterns instead of
duplicating them in mve_mov. This requires the introduction of a
new constraint to match a constant vector of booleans. Add a new RTL
test.

Patch 9: Introduce check_effective_target_arm_mve and skip
gcc.dg/signbit-2.c, because with MVE there is no fallback architecture
unlike SVE or AVX512.

Patch 12: Update fewer load/store MVE builtins.  HI mode is kept for
vpr_register_operand in the following patterns:
mve_vldrdq_gather_base_z_v2di,
mve_vldrdq_gather_offset_z_v2di,
mve_vldrdq_gather_shifted_offset_z_v2di,
mve_vstrdq_scatter_base_p_v2di,
mve_vstrdq_scatter_offset_p_v2di,
mve_vstrdq_scatter_offset_p_v2di_insn,
mve_vstrdq_scatter_shifted_offset_p_v2di,
mve_vstrdq_scatter_shifted_offset_p_v2di_insn,
mve_vstrdq_scatter_base_wb_p_v2di,
mve_vldrdq_gather_base_wb_z_v2di,
mve_vldrdq_gather_base_nowb_z_v2di,
mve_vldrdq_gather_base_wb_z_v2di_insn.

Patch 13: No need to update
gcc.target/arm/acle/cde-mve-full-assembly.c anymore since we re-use
the mov pattern that emits '@ movhi' in the assembly.

Patch 15: This is a new patch to fix a problem I noticed during this
v2->v3 update.



I'll squash patch 2 with patch 9 and patch 3 with patch 8.

Original text:

This patch series addresses PR 100757 and 101325 by representing
vectors of predicates (MVE VPR.P0 register) as vectors of booleans
rather than using HImode.
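
As a rough illustration of the two views of a predicate: VPR.P0 holds 16 bits, one per byte of a 128-bit vector, so a 32-bit lane is covered by four consecutive bits. The sketch below converts a four-lane vector of booleans into that 16-bit form. The function name `bools_to_p0` and the array-based representation are assumptions made for illustration; they are not GCC's internal encoding of the new modes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LANES 4          /* 32-bit elements in a 128-bit vector.  */
#define BITS_PER_LANE 4  /* One predicate bit per byte of the lane.  */

/* Expand a per-lane boolean vector into a 16-bit mask laid out like
   VPR.P0 (hypothetical helper, for illustration only).  */
static uint16_t
bools_to_p0 (const bool lanes[LANES])
{
  uint16_t mask = 0;
  assert (lanes != 0);
  for (int i = 0; i < LANES; i++)
    if (lanes[i])
      mask |= (uint16_t) (0xFu << (i * BITS_PER_LANE));
  return mask;
}
```

With HImode the compiler only sees the 16-bit integer on the right-hand side of this conversion; with a vector of booleans it also knows how many lanes there are and which bits belong to each lane.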

As this implies a lot of mostly mechanical changes, I have tried to
split the patches in a way that should help reviewers, but the split
is a bit artificial.

Patches 1-3 add new tests.

Patches 4-6 are small independent improvements.

Patch 7 implements the predicate qualifier, but does not change any
builtin yet.

Patch 8 is the first of the two main patches, and uses the new
qualifier to describe the vcmp and vpsel builtins that are useful for
auto-vectorization of comparisons.

Patch 9 is the second main patch, which fixes the vcond_mask expander.

Patches 10-13 convert almost all the remaining builtins with HI
operands to use the predicate qualifier.  After these, there are still
a few builtins with HI operands left, about which I am not sure: vctp,
vpnot, load-gather and store-scatter with v2di operands.  In fact,
patches 11/12 update some STR/LDR qualifiers in a way that breaks
these v2di builtins although existing tests still pass.

Christophe Lyon (15):
  arm: Add new tests for comparison vectorization with Neon and MVE
  arm: Add tests for PR target/100757
  arm: Add tests for PR target/101325
  arm: Add GENERAL_AND_VPR_REGS regclass
  arm: Add support for VPR_REG in arm_class_likely_spilled_p
  arm: Fix mve_vmvnq_n_ argument mode
  arm: Implement MVE predicates as vectors of booleans
  arm: Implement auto-vectorized MVE comparisons with vectors of boolean
predicates
  arm: Fix vcond_mask expander for MVE (PR target/100757)
  arm: Convert remaining MVE vcmp builtins to predicate qualifiers
  arm: Convert more MVE builtins to predicate qualifiers
  arm: Convert more load/store MVE builtins to predicate qualifiers
  arm: Convert more MVE/CDE builtins to predicate qualifiers
  arm: Add VPR_REG to ALL_REGS
  arm: Fix constraint check for V8HI in mve_vector_mem_operand

 gcc/config/aarch64/aarch64-modes.def  |   8 +-
 gcc/config/arm/arm-builtins.c | 224 +++--
 gcc/config/arm/arm-builtins.h |   4 +-
 gcc/config/arm/arm-modes.def  |   8 +
 gcc/config/arm/arm-protos.h   |   4 +-
 gcc/config/arm/arm-simd-builtin-types.def |   4 +
 gcc/config/arm/arm.c  | 169 ++--
 gcc/config/arm/arm.h  |   9 +-
 gcc/config/arm/arm_mve_builtins.def   | 746 
 gcc/config/arm/constraints.md |   6 +
 gcc/config/arm/iterators.md   |   6 +
 gcc/config/arm/mve.md | 795 ++
 gcc/config/arm/neon.md|  39 +
 gcc/config/arm/vec-common.md  |  52 --
 gcc/config/arm/vfp.md |  34 +-
 gcc/doc/sourcebuild.texi  |   4 +
 gcc/emit-rtl.c|  20 +-
 gcc/genmodes.c|  81 +-
 gcc/machmode.def  |   2 +-
 gcc/rtx-vector-builder.c  |   4 +-
 gcc/simplify-rtx.c|  34 +-
 gcc/testsuite/gcc.dg/signbit-2.c  |   1 +
 .../gcc.target/arm/simd/mve-vcmp-f32-2.c  |  32 +
 .../gcc.target/arm/simd/neon-compare-1.c  |  78 ++
 

[PATCH 5/5] [gfortran] Lower allocate directive (OpenMP 5.0).

2022-01-13 Thread Hafiz Abid Qadeer
This patch looks for the malloc/free calls generated by an allocate
statement that is associated with an allocate directive and replaces them
with calls to GOMP_alloc and GOMP_free.
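
The effect of this lowering can be sketched in plain C. This is only an illustration under stated assumptions: `sketch_gomp_alloc` and `sketch_gomp_free` are hypothetical stand-ins so the example builds without libgomp, their (alignment, size, allocator) argument order reflects my understanding of the libgomp entry points, and the allocator handle is ignored. An allocator of 0 corresponds to the `integer_zero_node` default used in the patch when the directive names no allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for GOMP_alloc; ignores the allocator.  */
static void *
sketch_gomp_alloc (size_t alignment, size_t size, uintptr_t allocator)
{
  (void) allocator;
  /* Alignment must be a nonzero power of two.  */
  assert (alignment != 0 && (alignment & (alignment - 1)) == 0);
  /* Some C libraries reject alignments smaller than a pointer.  */
  if (alignment < sizeof (void *))
    alignment = sizeof (void *);
  /* aligned_alloc wants the size to be a multiple of the alignment.  */
  size_t rounded = (size + alignment - 1) / alignment * alignment;
  return aligned_alloc (alignment, rounded);
}

/* Hypothetical stand-in for GOMP_free; ignores the allocator.  */
static void
sketch_gomp_free (void *ptr, uintptr_t allocator)
{
  (void) allocator;
  free (ptr);
}

/* Before lowering: what the allocate statement expands to.  */
static int *
alloc_before (size_t n)
{
  return malloc (n * sizeof (int));
}

/* After lowering: the same request routed through the allocator,
   with the alignment of the variable passed explicitly.  */
static int *
alloc_after (size_t n, uintptr_t allocator)
{
  return sketch_gomp_alloc (_Alignof (int), n * sizeof (int), allocator);
}
```

Memory obtained from `alloc_after` must be released with `sketch_gomp_free`, which mirrors why the patch rewrites the free call in the cleanup block as well as the malloc call.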

gcc/ChangeLog:

* omp-low.c (scan_sharing_clauses): Handle OMP_CLAUSE_ALLOCATOR.
(scan_omp_allocate): New.
(scan_omp_1_stmt): Call it.
(lower_omp_allocate): New function.
(lower_omp_1): Call it.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/allocate-6.f90: Add tests.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/allocate-1.c: New test.
* testsuite/libgomp.fortran/allocate-2.f90: New test.
---
 gcc/omp-low.c | 125 ++
 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 |   9 ++
 .../testsuite/libgomp.fortran/allocate-1.c|   7 +
 .../testsuite/libgomp.fortran/allocate-2.f90  |  49 +++
 4 files changed, 190 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.fortran/allocate-1.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/allocate-2.f90

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index f2237428de1..8a0ae3932b9 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -1684,6 +1684,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
case OMP_CLAUSE_FINALIZE:
case OMP_CLAUSE_TASK_REDUCTION:
case OMP_CLAUSE_ALLOCATE:
+   case OMP_CLAUSE_ALLOCATOR:
  break;
 
case OMP_CLAUSE_ALIGNED:
@@ -1892,6 +1893,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
case OMP_CLAUSE_FINALIZE:
case OMP_CLAUSE_FILTER:
case OMP_CLAUSE__CONDTEMP_:
+   case OMP_CLAUSE_ALLOCATOR:
  break;
 
case OMP_CLAUSE__CACHE_:
@@ -2962,6 +2964,16 @@ scan_omp_simd_scan (gimple_stmt_iterator *gsi, gomp_for *stmt,
   maybe_lookup_ctx (new_stmt)->for_simd_scan_phase = true;
 }
 
+/* Scan an OpenMP allocate directive.  */
+
+static void
+scan_omp_allocate (gomp_allocate *stmt, omp_context *outer_ctx)
+{
+  omp_context *ctx;
+  ctx = new_omp_context (stmt, outer_ctx);
+  scan_sharing_clauses (gimple_omp_allocate_clauses (stmt), ctx);
+}
+
 /* Scan an OpenMP sections directive.  */
 
 static void
@@ -4247,6 +4259,9 @@ scan_omp_1_stmt (gimple_stmt_iterator *gsi, bool *handled_ops_p,
insert_decl_map (&ctx->cb, var, var);
   }
   break;
+case GIMPLE_OMP_ALLOCATE:
+  scan_omp_allocate (as_a <gomp_allocate *> (stmt), ctx);
+  break;
 default:
   *handled_ops_p = false;
   break;
@@ -8680,6 +8695,111 @@ lower_omp_single_simple (gomp_single *single_stmt, gimple_seq *pre_p)
   gimple_seq_add_stmt (pre_p, gimple_build_label (flabel));
 }
 
+static void
+lower_omp_allocate (gimple_stmt_iterator *gsi_p, omp_context *)
+{
+  gomp_allocate *st = as_a <gomp_allocate *> (gsi_stmt (*gsi_p));
+  tree clauses = gimple_omp_allocate_clauses (st);
+  int kind = gimple_omp_allocate_kind (st);
+  gcc_assert (kind == GF_OMP_ALLOCATE_KIND_ALLOCATE
+ || kind == GF_OMP_ALLOCATE_KIND_FREE);
+  bool allocate = (kind == GF_OMP_ALLOCATE_KIND_ALLOCATE);
+
+  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+{
+  if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_ALLOCATOR)
+   continue;
+  tree var = OMP_ALLOCATE_DECL (c);
+
+  gimple_stmt_iterator gsi = *gsi_p;
+  for (gsi_next (&gsi); !gsi_end_p (gsi); gsi_next (&gsi))
+   {
+ gimple *stmt = gsi_stmt (gsi);
+
+ if (gimple_code (stmt) != GIMPLE_CALL
+ || (allocate && gimple_call_fndecl (stmt)
+ != builtin_decl_explicit (BUILT_IN_MALLOC))
+ || (!allocate && gimple_call_fndecl (stmt)
+ != builtin_decl_explicit (BUILT_IN_FREE)))
+   continue;
+ const gcall *gs = as_a <const gcall *> (stmt);
+ tree allocator = OMP_ALLOCATE_ALLOCATOR (c)
+  ? OMP_ALLOCATE_ALLOCATOR (c)
+  : integer_zero_node;
+ if (allocate)
+   {
+ tree lhs = gimple_call_lhs (gs);
+ if (lhs && TREE_CODE (lhs) == SSA_NAME)
+   {
+ gimple_stmt_iterator gsi2 = gsi;
+ gsi_next (&gsi2);
+ gimple *assign = gsi_stmt (gsi2);
+ if (gimple_code (assign) == GIMPLE_ASSIGN)
+   {
+ lhs = gimple_assign_lhs (as_a <gassign *> (assign));
+ if (lhs == NULL_TREE
+ || TREE_CODE (lhs) != COMPONENT_REF)
+   continue;
+ lhs = TREE_OPERAND (lhs, 0);
+   }
+   }
+
+ if (lhs == var)
+   {
+ unsigned HOST_WIDE_INT ialign = 0;
+ tree align;
+ if (TYPE_P (var))
+   ialign = TYPE_ALIGN_UNIT (var);
+ else
+   ialign = DECL_ALIGN_UNIT (var);
+ align = build_int_cst (size_type_node, ialign);
+ tree repl = builtin_decl_explicit (BUILT_IN_GOMP_ALLOC);
+ 

[PATCH 4/5] [gfortran] Gimplify allocate directive (OpenMP 5.0).

2022-01-13 Thread Hafiz Abid Qadeer
gcc/ChangeLog:

* doc/gimple.texi: Describe GIMPLE_OMP_ALLOCATE.
* gimple-pretty-print.c (dump_gimple_omp_allocate): New function.
(pp_gimple_stmt_1): Call it.
* gimple.c (gimple_build_omp_allocate): New function.
* gimple.def (GIMPLE_OMP_ALLOCATE): New node.
* gimple.h (enum gf_mask): Add GF_OMP_ALLOCATE_KIND_MASK,
GF_OMP_ALLOCATE_KIND_ALLOCATE and GF_OMP_ALLOCATE_KIND_FREE.
(struct gomp_allocate): New.
(is_a_helper <gomp_allocate *>::test): New.
(is_a_helper <const gomp_allocate *>::test): New.
(gimple_build_omp_allocate): Declare.
(gimple_omp_subcode): Replace GIMPLE_OMP_TEAMS with
GIMPLE_OMP_ALLOCATE.
(gimple_omp_allocate_set_clauses): New.
(gimple_omp_allocate_set_kind): Likewise.
(gimple_omp_allocate_clauses): Likewise.
(gimple_omp_allocate_kind): Likewise.
(CASE_GIMPLE_OMP): Add GIMPLE_OMP_ALLOCATE.
* gimplify.c (gimplify_omp_allocate): New.
(gimplify_expr): Call it.
* gsstruct.def (GSS_OMP_ALLOCATE): Define.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/allocate-6.f90: Add tests.
---
 gcc/doc/gimple.texi   | 38 +++-
 gcc/gimple-pretty-print.c | 37 
 gcc/gimple.c  | 10 
 gcc/gimple.def|  6 ++
 gcc/gimple.h  | 60 ++-
 gcc/gimplify.c| 19 ++
 gcc/gsstruct.def  |  1 +
 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 |  4 +-
 8 files changed, 171 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
index 65ef63d6ee9..60a4d2c17ca 100644
--- a/gcc/doc/gimple.texi
+++ b/gcc/doc/gimple.texi
@@ -420,6 +420,9 @@ kinds, along with their relationships to @code{GSS_} values (layouts) and
  + gomp_continue
  |layout: GSS_OMP_CONTINUE, code: GIMPLE_OMP_CONTINUE
  |
+ + gomp_allocate
+ |layout: GSS_OMP_ALLOCATE, code: GIMPLE_OMP_ALLOCATE
+ |
  + gomp_atomic_load
  |layout: GSS_OMP_ATOMIC_LOAD, code: GIMPLE_OMP_ATOMIC_LOAD
  |
@@ -454,6 +457,7 @@ The following table briefly describes the GIMPLE instruction set.
 @item @code{GIMPLE_GOTO}   @tab x  @tab x
 @item @code{GIMPLE_LABEL}  @tab x  @tab x
 @item @code{GIMPLE_NOP}@tab x  @tab x
+@item @code{GIMPLE_OMP_ALLOCATE}   @tab x  @tab x
 @item @code{GIMPLE_OMP_ATOMIC_LOAD}@tab x  @tab x
 @item @code{GIMPLE_OMP_ATOMIC_STORE}   @tab x  @tab x
 @item @code{GIMPLE_OMP_CONTINUE}   @tab x  @tab x
@@ -1029,6 +1033,7 @@ Return a deep copy of statement @code{STMT}.
 * @code{GIMPLE_LABEL}::
 * @code{GIMPLE_GOTO}::
 * @code{GIMPLE_NOP}::
+* @code{GIMPLE_OMP_ALLOCATE}::
 * @code{GIMPLE_OMP_ATOMIC_LOAD}::
 * @code{GIMPLE_OMP_ATOMIC_STORE}::
 * @code{GIMPLE_OMP_CONTINUE}::
@@ -1729,6 +1734,38 @@ Build a @code{GIMPLE_NOP} statement.
 Returns @code{TRUE} if statement @code{G} is a @code{GIMPLE_NOP}.
 @end deftypefn
 
+@node @code{GIMPLE_OMP_ALLOCATE}
+@subsection @code{GIMPLE_OMP_ALLOCATE}
+@cindex @code{GIMPLE_OMP_ALLOCATE}
+
+@deftypefn {GIMPLE function} gomp_allocate *gimple_build_omp_allocate ( @
+tree clauses, int kind)
+Build a @code{GIMPLE_OMP_ALLOCATE} statement.  @code{CLAUSES} is the clauses
+associated with this node.  @code{KIND} is the enumeration value
+@code{GF_OMP_ALLOCATE_KIND_ALLOCATE} if this directive allocates memory
+or @code{GF_OMP_ALLOCATE_KIND_FREE} if it de-allocates.
+@end deftypefn
+
+@deftypefn {GIMPLE function} void gimple_omp_allocate_set_clauses ( @
+gomp_allocate *g, tree clauses)
+Set the @code{CLAUSES} for a @code{GIMPLE_OMP_ALLOCATE}.
+@end deftypefn
+
+@deftypefn {GIMPLE function} tree gimple_omp_allocate_clauses ( @
+const gomp_allocate *g)
+Get the @code{CLAUSES} of a @code{GIMPLE_OMP_ALLOCATE}.
+@end deftypefn
+
+@deftypefn {GIMPLE function} void gimple_omp_allocate_set_kind ( @
+gomp_allocate *g, int kind)
+Set the @code{KIND} for a @code{GIMPLE_OMP_ALLOCATE}.
+@end deftypefn
+
+@deftypefn {GIMPLE function} int gimple_omp_allocate_kind ( @
+const gomp_allocate *g)
+Get the @code{KIND} of a @code{GIMPLE_OMP_ALLOCATE}.
+@end deftypefn
+
 @node @code{GIMPLE_OMP_ATOMIC_LOAD}
 @subsection @code{GIMPLE_OMP_ATOMIC_LOAD}
 @cindex @code{GIMPLE_OMP_ATOMIC_LOAD}
@@ -1760,7 +1797,6 @@ const gomp_atomic_load *g)
 Get the @code{RHS} of an atomic set.
 @end deftypefn
 
-
 @node @code{GIMPLE_OMP_ATOMIC_STORE}
 @subsection @code{GIMPLE_OMP_ATOMIC_STORE}
 @cindex @code{GIMPLE_OMP_ATOMIC_STORE}
diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index ebd87b20a0a..bb961a900df 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -1967,6 +1967,38 @@ dump_gimple_omp_critical (pretty_printer 

[PATCH 3/5] [gfortran] Handle cleanup of omp allocated variables (OpenMP 5.0).

2022-01-13 Thread Hafiz Abid Qadeer
Currently we only handle an omp allocate directive that is associated
with an allocate statement.  This statement results in malloc and free calls.
The malloc calls are easy to find because they are in the same block as the
allocate directive, but the free calls are emitted in a separate cleanup
block.  To help later passes find them, an allocate directive with kind=free
is generated in the cleanup block.  The normal allocate directive is given
kind=allocate.
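
The variables that need this cleanup are collected on a per-procedure list that keeps both a head and a tail pointer (omp_allocated and omp_allocated_end in the patch). A minimal sketch of that append pattern follows; `struct name_node` and `struct proc` are simplified stand-ins for gfc_omp_namelist and gfc_symbol, and `append_allocated` is a hypothetical helper, not a function from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for gfc_omp_namelist.  */
struct name_node
{
  const char *name;
  struct name_node *next;
};

/* Simplified stand-in for the procedure's gfc_symbol.  */
struct proc
{
  struct name_node *omp_allocated;      /* Head of the list.  */
  struct name_node *omp_allocated_end;  /* Last node, for O(1) append.  */
};

/* Append N to the cleanup list of P, keeping insertion order.  */
static void
append_allocated (struct proc *p, struct name_node *n)
{
  assert (p != NULL && n != NULL);
  n->next = NULL;
  if (p->omp_allocated == NULL)
    p->omp_allocated = p->omp_allocated_end = n;
  else
    {
      p->omp_allocated_end->next = n;
      p->omp_allocated_end = n;
    }
}
```

Keeping the tail pointer avoids walking the list on every append, and the head is what later gets handed to the kind=free directive as its clause list.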

gcc/fortran/ChangeLog:

* gfortran.h (struct gfc_symbol): Declare new members
omp_allocated and omp_allocated_end.
* openmp.c (gfc_match_omp_allocate): Set new_st.resolved_sym to
NULL.
(prepare_omp_allocated_var_list_for_cleanup): New function.
(gfc_resolve_omp_allocate): Call it.
* trans-decl.c (gfc_trans_deferred_vars): Process omp_allocated.
* trans-openmp.c (gfc_trans_omp_allocate): Set kind for the stmt
generated for allocate directive.

gcc/ChangeLog:

* tree-core.h (struct tree_base): Add comments.
* tree-pretty-print.c (dump_generic_node): Handle allocate directive
kind.
* tree.h (OMP_ALLOCATE_KIND_ALLOCATE): New define.
(OMP_ALLOCATE_KIND_FREE): Likewise.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/allocate-6.f90: Test kind of allocate directive.
---
 gcc/fortran/gfortran.h|  1 +
 gcc/fortran/openmp.c  | 30 +++
 gcc/fortran/trans-decl.c  | 20 +
 gcc/fortran/trans-openmp.c|  6 
 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 |  3 +-
 gcc/tree-core.h   |  6 
 gcc/tree-pretty-print.c   |  4 +++
 gcc/tree.h|  4 +++
 8 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 79a43a2fdf0..6a43847d31f 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1820,6 +1820,7 @@ typedef struct gfc_symbol
   gfc_array_spec *as;
   struct gfc_symbol *result;   /* function result symbol */
   gfc_component *components;   /* Derived type components */
+  gfc_omp_namelist *omp_allocated, *omp_allocated_end;
 
   /* Defined only for Cray pointees; points to their pointer.  */
   struct gfc_symbol *cp_pointer;
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index ee7c39980bb..f11812b0b12 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -5818,6 +5818,7 @@ gfc_match_omp_allocate (void)
 
   new_st.op = EXEC_OMP_ALLOCATE;
   new_st.ext.omp_clauses = c;
+  new_st.resolved_sym = NULL;
   gfc_free_expr (allocator);
   return MATCH_YES;
 }
@@ -9049,6 +9050,34 @@ gfc_resolve_oacc_routines (gfc_namespace *ns)
 }
 }
 
+static void
+prepare_omp_allocated_var_list_for_cleanup (gfc_omp_namelist *cn, locus loc)
+{
+  gfc_symbol *proc = cn->sym->ns->proc_name;
+  gfc_omp_namelist *p, *n;
+
+  for (n = cn; n; n = n->next)
+{
+  if (n->sym->attr.allocatable && !n->sym->attr.save
+ && !n->sym->attr.result && !proc->attr.is_main_program)
+   {
+ p = gfc_get_omp_namelist ();
+ p->sym = n->sym;
+ p->expr = gfc_copy_expr (n->expr);
+ p->where = loc;
+ p->next = NULL;
+ if (proc->omp_allocated == NULL)
+   proc->omp_allocated_end = proc->omp_allocated = p;
+ else
+   {
+ proc->omp_allocated_end->next = p;
+ proc->omp_allocated_end = p;
+   }
+
+   }
+}
+}
+
 static void
 check_allocate_directive_restrictions (gfc_symbol *sym, gfc_expr *omp_al,
   gfc_namespace *ns, locus loc)
@@ -9179,6 +9208,7 @@ gfc_resolve_omp_allocate (gfc_code *code, gfc_namespace *ns)
 code->loc);
}
 }
+  prepare_omp_allocated_var_list_for_cleanup (cn, code->loc);
 }
 
 
diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index 066fb3a5f61..e5c9bf413e7 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -4583,6 +4583,26 @@ gfc_trans_deferred_vars (gfc_symbol * proc_sym, gfc_wrapped_block * block)
  }
 }
 
+  /* Generate a dummy allocate pragma with free kind so that cleanup
+ of those variables which were allocated using the allocate statement
+ associated with an allocate clause happens correctly.  */
+
+  if (proc_sym->omp_allocated)
+{
+  gfc_clear_new_st ();
+  new_st.op = EXEC_OMP_ALLOCATE;
+  gfc_omp_clauses *c = gfc_get_omp_clauses ();
+  c->lists[OMP_LIST_ALLOCATOR] = proc_sym->omp_allocated;
+  new_st.ext.omp_clauses = c;
+  /* This is just a hacky way to convey to handler that we are
+dealing with cleanup here.  Saves us from using another field
+for it.  */
+  new_st.resolved_sym = proc_sym->omp_allocated->sym;
+  gfc_add_init_cleanup (block, NULL,
+ 

[PATCH 2/5] [gfortran] Translate allocate directive (OpenMP 5.0).

2022-01-13 Thread Hafiz Abid Qadeer
gcc/fortran/ChangeLog:

* trans-openmp.c (gfc_trans_omp_clauses): Handle OMP_LIST_ALLOCATOR.
(gfc_trans_omp_allocate): New function.
(gfc_trans_omp_directive): Handle EXEC_OMP_ALLOCATE.

gcc/ChangeLog:

* tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE_ALLOCATOR.
(dump_generic_node): Handle OMP_ALLOCATE.
* tree.def (OMP_ALLOCATE): New.
* tree.h (OMP_ALLOCATE_CLAUSES): Likewise.
(OMP_ALLOCATE_DECL): Likewise.
(OMP_ALLOCATE_ALLOCATOR): Likewise.
* tree.c (omp_clause_num_ops): Add entry for OMP_CLAUSE_ALLOCATOR.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/allocate-6.f90: New test.
---
 gcc/fortran/trans-openmp.c| 44 
 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 | 72 +++
 gcc/tree-core.h   |  3 +
 gcc/tree-pretty-print.c   | 19 +
 gcc/tree.c|  1 +
 gcc/tree.def  |  4 ++
 gcc/tree.h| 11 +++
 7 files changed, 154 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 9661c77f905..cb389f40370 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -2649,6 +2649,28 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
  }
  }
  break;
+   case OMP_LIST_ALLOCATOR:
+ for (; n != NULL; n = n->next)
+   if (n->sym->attr.referenced)
+ {
+   tree t = gfc_trans_omp_variable (n->sym, false);
+   if (t != error_mark_node)
+ {
+   tree node = build_omp_clause (input_location,
+ OMP_CLAUSE_ALLOCATOR);
+   OMP_ALLOCATE_DECL (node) = t;
+   if (n->expr)
+ {
+   tree allocator_;
+   gfc_init_se (&se, NULL);
+   gfc_conv_expr (&se, n->expr);
+   allocator_ = gfc_evaluate_now (se.expr, block);
+   OMP_ALLOCATE_ALLOCATOR (node) = allocator_;
+ }
+   omp_clauses = gfc_trans_add_clause (node, omp_clauses);
+ }
+ }
+ break;
case OMP_LIST_LINEAR:
  {
gfc_expr *last_step_expr = NULL;
@@ -4888,6 +4910,26 @@ gfc_trans_omp_atomic (gfc_code *code)
   return gfc_finish_block (&block);
 }
 
+static tree
+gfc_trans_omp_allocate (gfc_code *code)
+{
+  stmtblock_t block;
+  tree stmt;
+
+  gfc_omp_clauses *clauses = code->ext.omp_clauses;
+  gcc_assert (clauses);
+
+  gfc_start_block (&block);
+  stmt = make_node (OMP_ALLOCATE);
+  TREE_TYPE (stmt) = void_type_node;
+  OMP_ALLOCATE_CLAUSES (stmt) = gfc_trans_omp_clauses (&block, clauses,
+  code->loc, false,
+  true);
+  gfc_add_expr_to_block (&block, stmt);
+  gfc_merge_block_scope (&block);
+  return gfc_finish_block (&block);
+}
+
 static tree
 gfc_trans_omp_barrier (void)
 {
@@ -7280,6 +7322,8 @@ gfc_trans_omp_directive (gfc_code *code)
 {
   switch (code->op)
 {
+case EXEC_OMP_ALLOCATE:
+  return gfc_trans_omp_allocate (code);
 case EXEC_OMP_ATOMIC:
   return gfc_trans_omp_atomic (code);
 case EXEC_OMP_BARRIER:
diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90
new file mode 100644
index 000..2de2b52ee44
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90
@@ -0,0 +1,72 @@
+! { dg-do compile }
+! { dg-additional-options "-fdump-tree-original" }
+
+module omp_lib_kinds
+  use iso_c_binding, only: c_int, c_intptr_t
+  implicit none
+  private :: c_int, c_intptr_t
+  integer, parameter :: omp_allocator_handle_kind = c_intptr_t
+
+  integer (kind=omp_allocator_handle_kind), &
+ parameter :: omp_null_allocator = 0
+  integer (kind=omp_allocator_handle_kind), &
+ parameter :: omp_default_mem_alloc = 1
+  integer (kind=omp_allocator_handle_kind), &
+ parameter :: omp_large_cap_mem_alloc = 2
+  integer (kind=omp_allocator_handle_kind), &
+ parameter :: omp_const_mem_alloc = 3
+  integer (kind=omp_allocator_handle_kind), &
+ parameter :: omp_high_bw_mem_alloc = 4
+  integer (kind=omp_allocator_handle_kind), &
+ parameter :: omp_low_lat_mem_alloc = 5
+  integer (kind=omp_allocator_handle_kind), &
+ parameter :: omp_cgroup_mem_alloc = 6
+  integer (kind=omp_allocator_handle_kind), &
+ parameter :: omp_pteam_mem_alloc = 7
+  integer (kind=omp_allocator_handle_kind), &
+ parameter :: omp_thread_mem_alloc = 8
+end module
+
+
+subroutine foo(x, y, al)
+  use omp_lib_kinds
+  implicit none
+  
+type :: my_type
+  integer :: i
+  integer :: j
+  real :: x

[PATCH 1/5] [gfortran] Add parsing support for allocate directive (OpenMP 5.0).

2022-01-13 Thread Hafiz Abid Qadeer
Currently we only make use of this directive when it is associated
with an allocate statement.

gcc/fortran/ChangeLog:

* dump-parse-tree.c (show_omp_node): Handle EXEC_OMP_ALLOCATE.
(show_code_node): Likewise.
* gfortran.h (enum gfc_statement): Add ST_OMP_ALLOCATE.
(OMP_LIST_ALLOCATOR): New enum value.
(enum gfc_exec_op): Add EXEC_OMP_ALLOCATE.
* match.h (gfc_match_omp_allocate): New function.
* openmp.c (enum omp_mask1): Add OMP_CLAUSE_ALLOCATOR.
(OMP_ALLOCATE_CLAUSES): New define.
(gfc_match_omp_allocate): New function.
(resolve_omp_clauses): Add ALLOCATOR in clause_names.
(omp_code_to_statement): Handle EXEC_OMP_ALLOCATE.
(EMPTY_VAR_LIST): New define.
(check_allocate_directive_restrictions): New function.
(gfc_resolve_omp_allocate): Likewise.
(gfc_resolve_omp_directive): Handle EXEC_OMP_ALLOCATE.
* parse.c (decode_omp_directive): Handle ST_OMP_ALLOCATE.
(next_statement): Likewise.
(gfc_ascii_statement): Likewise.
* resolve.c (gfc_resolve_code): Handle EXEC_OMP_ALLOCATE.
* st.c (gfc_free_statement): Likewise.
* trans.c (trans_code): Likewise.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/allocate-4.f90: New test.
* gfortran.dg/gomp/allocate-5.f90: New test.
---
 gcc/fortran/dump-parse-tree.c |   3 +
 gcc/fortran/gfortran.h|   4 +-
 gcc/fortran/match.h   |   1 +
 gcc/fortran/openmp.c  | 199 +-
 gcc/fortran/parse.c   |  10 +-
 gcc/fortran/resolve.c |   1 +
 gcc/fortran/st.c  |   1 +
 gcc/fortran/trans.c   |   1 +
 gcc/testsuite/gfortran.dg/gomp/allocate-4.f90 | 112 ++
 gcc/testsuite/gfortran.dg/gomp/allocate-5.f90 |  73 +++
 10 files changed, 400 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-4.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-5.f90

diff --git a/gcc/fortran/dump-parse-tree.c b/gcc/fortran/dump-parse-tree.c
index 7459f4b89a9..38fef42150a 100644
--- a/gcc/fortran/dump-parse-tree.c
+++ b/gcc/fortran/dump-parse-tree.c
@@ -1993,6 +1993,7 @@ show_omp_node (int level, gfc_code *c)
 case EXEC_OACC_CACHE: name = "CACHE"; is_oacc = true; break;
 case EXEC_OACC_ENTER_DATA: name = "ENTER DATA"; is_oacc = true; break;
 case EXEC_OACC_EXIT_DATA: name = "EXIT DATA"; is_oacc = true; break;
+case EXEC_OMP_ALLOCATE: name = "ALLOCATE"; break;
 case EXEC_OMP_ATOMIC: name = "ATOMIC"; break;
 case EXEC_OMP_BARRIER: name = "BARRIER"; break;
 case EXEC_OMP_CANCEL: name = "CANCEL"; break;
@@ -2194,6 +2195,7 @@ show_omp_node (int level, gfc_code *c)
   || c->op == EXEC_OMP_TARGET_UPDATE || c->op == EXEC_OMP_TARGET_ENTER_DATA
   || c->op == EXEC_OMP_TARGET_EXIT_DATA || c->op == EXEC_OMP_SCAN
   || c->op == EXEC_OMP_DEPOBJ || c->op == EXEC_OMP_ERROR
+  || c->op == EXEC_OMP_ALLOCATE
   || (c->op == EXEC_OMP_ORDERED && c->block == NULL))
 return;
   if (c->op == EXEC_OMP_SECTIONS || c->op == EXEC_OMP_PARALLEL_SECTIONS)
@@ -3314,6 +3316,7 @@ show_code_node (int level, gfc_code *c)
 case EXEC_OACC_CACHE:
 case EXEC_OACC_ENTER_DATA:
 case EXEC_OACC_EXIT_DATA:
+case EXEC_OMP_ALLOCATE:
 case EXEC_OMP_ATOMIC:
 case EXEC_OMP_CANCEL:
 case EXEC_OMP_CANCELLATION_POINT:
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 3b791a4f6be..79a43a2fdf0 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -259,7 +259,7 @@ enum gfc_statement
   ST_OACC_CACHE, ST_OACC_KERNELS_LOOP, ST_OACC_END_KERNELS_LOOP,
   ST_OACC_SERIAL_LOOP, ST_OACC_END_SERIAL_LOOP, ST_OACC_SERIAL,
   ST_OACC_END_SERIAL, ST_OACC_ENTER_DATA, ST_OACC_EXIT_DATA, ST_OACC_ROUTINE,
-  ST_OACC_ATOMIC, ST_OACC_END_ATOMIC,
+  ST_OACC_ATOMIC, ST_OACC_END_ATOMIC, ST_OMP_ALLOCATE,
   ST_OMP_ATOMIC, ST_OMP_BARRIER, ST_OMP_CRITICAL, ST_OMP_END_ATOMIC,
   ST_OMP_END_CRITICAL, ST_OMP_END_DO, ST_OMP_END_MASTER, ST_OMP_END_ORDERED,
   ST_OMP_END_PARALLEL, ST_OMP_END_PARALLEL_DO, ST_OMP_END_PARALLEL_SECTIONS,
@@ -1392,6 +1392,7 @@ enum
   OMP_LIST_USE_DEVICE_PTR,
   OMP_LIST_USE_DEVICE_ADDR,
   OMP_LIST_NONTEMPORAL,
+  OMP_LIST_ALLOCATOR,
   OMP_LIST_NUM
 };
 
@@ -2893,6 +2894,7 @@ enum gfc_exec_op
   EXEC_OACC_DATA, EXEC_OACC_HOST_DATA, EXEC_OACC_LOOP, EXEC_OACC_UPDATE,
   EXEC_OACC_WAIT, EXEC_OACC_CACHE, EXEC_OACC_ENTER_DATA, EXEC_OACC_EXIT_DATA,
   EXEC_OACC_ATOMIC, EXEC_OACC_DECLARE,
+  EXEC_OMP_ALLOCATE,
   EXEC_OMP_CRITICAL, EXEC_OMP_DO, EXEC_OMP_FLUSH, EXEC_OMP_MASTER,
   EXEC_OMP_ORDERED, EXEC_OMP_PARALLEL, EXEC_OMP_PARALLEL_DO,
   EXEC_OMP_PARALLEL_SECTIONS, EXEC_OMP_PARALLEL_WORKSHARE,
diff --git a/gcc/fortran/match.h b/gcc/fortran/match.h
index 65ee3b6cb41..9f0449eda0e 100644
--- a/gcc/fortran/match.h
+++ 

[PATCH 0/5] [gfortran] Support for allocate directive (OpenMP 5.0)

2022-01-13 Thread Hafiz Abid Qadeer
This patch series adds initial support for the allocate directive in
gfortran.  Although every allocate directive is parsed, only those
associated with an allocate statement are translated.  The lowering
replaces the implicitly generated malloc/free calls of the allocate
statement with GOMP_alloc and GOMP_free calls.

Hafiz Abid Qadeer (5):
  [gfortran] Add parsing support for allocate directive (OpenMP 5.0).
  [gfortran] Translate allocate directive (OpenMP 5.0).
  [gfortran] Handle cleanup of omp allocated variables (OpenMP 5.0).
  Gimplify allocate directive (OpenMP 5.0).
  Lower allocate directive  (OpenMP 5.0).

 gcc/doc/gimple.texi   |  38 ++-
 gcc/fortran/dump-parse-tree.c |   3 +
 gcc/fortran/gfortran.h|   5 +-
 gcc/fortran/match.h   |   1 +
 gcc/fortran/openmp.c  | 229 +-
 gcc/fortran/parse.c   |  10 +-
 gcc/fortran/resolve.c |   1 +
 gcc/fortran/st.c  |   1 +
 gcc/fortran/trans-decl.c  |  20 ++
 gcc/fortran/trans-openmp.c|  50 
 gcc/fortran/trans.c   |   1 +
 gcc/gimple-pretty-print.c |  37 +++
 gcc/gimple.c  |  10 +
 gcc/gimple.def|   6 +
 gcc/gimple.h  |  60 -
 gcc/gimplify.c|  19 ++
 gcc/gsstruct.def  |   1 +
 gcc/omp-low.c | 125 ++
 gcc/testsuite/gfortran.dg/gomp/allocate-4.f90 | 112 +
 gcc/testsuite/gfortran.dg/gomp/allocate-5.f90 |  73 ++
 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 |  84 +++
 gcc/tree-core.h   |   9 +
 gcc/tree-pretty-print.c   |  23 ++
 gcc/tree.c|   1 +
 gcc/tree.def  |   4 +
 gcc/tree.h|  15 ++
 .../testsuite/libgomp.fortran/allocate-1.c|   7 +
 .../testsuite/libgomp.fortran/allocate-2.f90  |  49 
 28 files changed, 986 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-4.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-5.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/allocate-1.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/allocate-2.f90

-- 
2.25.1



Re: [PATCH] c/104002 - shufflevector variable indexing

2022-01-13 Thread Jakub Jelinek via Gcc-patches
On Thu, Jan 13, 2022 at 03:12:03PM +0100, Richard Biener wrote:
> Variable indexing of a __builtin_shufflevector result is broken because
> we fail to properly mark the TARGET_EXPR decl as addressable.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
> 
> Thanks,
> Richard.
> 
> 2022-01-13  Richard Biener  
> 
>   PR c/104002
> gcc/c-family/
>   * c-common.c (c_common_mark_addressable_vec): Handle TARGET_EXPR.
> 
> gcc/testsuite/
>   * c-c++-common/builtin-shufflevector-3.c: Move ...
>   * c-c++-common/torture/builtin-shufflevector-3.c: ... here.

LGTM.

Jakub



Re: [PATCH] forwprop: Canonicalize atomic fetch_op op x to op_fetch or vice versa [PR98737]

2022-01-13 Thread Jakub Jelinek via Gcc-patches
On Thu, Jan 13, 2022 at 02:49:47PM +0100, Richard Biener wrote:
> > + tree d = build_debug_expr_decl (type);
> > + gdebug *g
> > +   = gimple_build_debug_bind (d, build2 (rcode, type,
> > + new_lhs, arg),
> > +  stmt2);
> > + gsi_insert_after (, g, GSI_NEW_STMT);
> > + replace_uses_by (lhs2, d);
> 
> I wonder if you can leave a lhs2 = d; in the IL instead of using
> replace_uses_by which will process imm uses and fold stmts while
> we're going to do that anyway in the caller?  That would IMHO
> be better here.

I'd need to emit them always for reversible ops and when the
atomic call can't be last, regardless of whether it is needed or not,
just so that the next DCE would remove them and emit those debug stmts,
because otherwise that could result in -fcompare-debug failures
(at least with -fno-tree-dce -fno-tree-whatever ...).
And
+ tree narg = build_debug_expr_decl (type);
+ gdebug *g
+   = gimple_build_debug_bind (narg,
+  fold_convert (type, arg),
+  stmt2);
isn't that much more code compared to
  gimple *g = gimple_build_assign (lhs2, NOP_EXPR, arg);
Or would you like it to be emitted always, i.e.
  if (atomic_op != BIT_AND_EXPR
 && atomic_op != BIT_IOR_EXPR
 /* With -fnon-call-exceptions if we can't
add stmts after the call easily.  */
 && !stmt_ends_bb_p (stmt2))
{
  tree type = TREE_TYPE (lhs2);
  if (TREE_CODE (arg) == INTEGER_CST)
arg = fold_convert (type, arg);
  else if (!useless_type_conversion_p (type, TREE_TYPE (arg)))
{
  tree narg = make_ssa_name (type);
  gimple *g = gimple_build_assign (narg, NOP_EXPR, arg);
  gsi_insert_after (, g, GSI_NEW_STMT);
  arg = narg;
}
  enum tree_code rcode;
  switch (atomic_op)
{
case PLUS_EXPR: rcode = MINUS_EXPR; break;
case MINUS_EXPR: rcode = PLUS_EXPR; break;
case BIT_XOR_EXPR: rcode = atomic_op; break;
default: gcc_unreachable ();
}
  tree d = build_debug_expr_decl (type);
  gimple *g = gimple_build_assign (lhs2, rcode, new_lhs, arg);
  gsi_insert_after (, g, GSI_NEW_STMT);
  lhs2 = NULL_TREE;
}
in between
  update_stmt (use_stmt);
and
  imm_use_iterator iter;
and then do the
 FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs2)
   if (use_stmt != cast_stmt)
with resetting only if (lhs2)
and similarly release_ssa_name (lhs2) only if (lhs2)?
I think the usual case is that we emit debug exprs right away,
not emit something that we want to DCE.

+   if (atomic_op == BIT_AND_EXPR
+   || atomic_op == BIT_IOR_EXPR
+   /* Or with -fnon-call-exceptions if we can't
+  add debug stmts after the call.  */
+   || stmt_ends_bb_p (stmt2))


But now that you mention it, I don't think I correctly handle the
case where lhs2 has no debug uses but there is a cast_stmt whose lhs
has debug uses.  We'd need to add_debug_temp in that case too and
add a debug temp.
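
For anyone following the reversible vs. non-reversible distinction, here is a small C sketch (not part of the patch; the function names are made up for illustration) that checks the relationship the rcode switch relies on: for +, - and ^ the fetch_op result can be recovered from the op_fetch result with the reverse operation, while & can map two different old values to the same new value.

```c
#include <stdint.h>

/* Returns 1 if the op_fetch and fetch_op builtins agree once the
   reverse operation is applied, for +, - and ^.  Unsigned arithmetic
   is used so wraparound is well defined.  */
int
reverse_ops_agree (uint32_t start, uint32_t x)
{
  uint32_t v, old_val, new_val;

  /* PLUS_EXPR: the reverse operation is MINUS_EXPR.  */
  v = start;
  old_val = __atomic_fetch_add (&v, x, __ATOMIC_SEQ_CST);
  v = start;
  new_val = __atomic_add_fetch (&v, x, __ATOMIC_SEQ_CST);
  if (old_val + x != new_val || new_val - x != old_val)
    return 0;

  /* MINUS_EXPR: the reverse operation is PLUS_EXPR.  */
  v = start;
  old_val = __atomic_fetch_sub (&v, x, __ATOMIC_SEQ_CST);
  v = start;
  new_val = __atomic_sub_fetch (&v, x, __ATOMIC_SEQ_CST);
  if (old_val - x != new_val || new_val + x != old_val)
    return 0;

  /* BIT_XOR_EXPR: xor is its own reverse.  */
  v = start;
  old_val = __atomic_fetch_xor (&v, x, __ATOMIC_SEQ_CST);
  v = start;
  new_val = __atomic_xor_fetch (&v, x, __ATOMIC_SEQ_CST);
  if ((old_val ^ x) != new_val || (new_val ^ x) != old_val)
    return 0;

  return 1;
}

/* Returns 1 if two different old values give the same result under
   an atomic AND, i.e. the old value cannot be recovered.  */
int
and_is_not_reversible (void)
{
  uint32_t a = 0xffu, b = 0x0fu;
  uint32_t ra = __atomic_and_fetch (&a, 0x0fu, __ATOMIC_SEQ_CST);
  uint32_t rb = __atomic_and_fetch (&b, 0x0fu, __ATOMIC_SEQ_CST);
  return ra == rb;
}
```

Calling reverse_ops_agree (5, 3) exercises all three reversible ops on one pair of values; it says nothing about how the debug temps themselves should be emitted.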

Jakub



Re: [vect] PR103997: Fix epilogue mode skipping

2022-01-13 Thread Richard Biener via Gcc-patches
On Thu, 13 Jan 2022, Andre Vieira (lists) wrote:

> 
> On 13/01/2022 12:36, Richard Biener wrote:
> > On Thu, 13 Jan 2022, Andre Vieira (lists) wrote:
> >
> >> This time to the list too (sorry for double email)
> >>
> >> Hi,
> >>
> >> The original patch '[vect] Re-analyze all modes for epilogues', skipped
> >> modes
> >> that should not be skipped since it used the vector mode provided by
> >> autovectorize_vector_modes to derive the minimum VF required for it.
> >> However,
> >> those modes should only really be used to dictate vector size, so instead
> >> this
> >> patch looks for the mode in 'used_vector_modes' with the largest element
> >> size,
> >> and constructs a vector mode with the same size as the current
> >> vector_modes[mode_i]. Since we are using the largest element size the
> >> NUNITs
> >> for this mode is the smallest possible VF required for an epilogue with
> >> this
> >> mode and should thus skip only the modes we are certain can not be used.
> >>
> >> Passes bootstrap and regression on x86_64 and aarch64.
> > Clearly
> >
> > + /* To make sure we are conservative as to what modes we skip, we
> > +should use check the smallest possible NUNITS which would be
> > +derived from the mode in USED_VECTOR_MODES with the largest
> > +element size.  */
> > + scalar_mode max_elsize_mode = GET_MODE_INNER (vector_modes[mode_i]);
> > + for (vec_info::mode_set::iterator i =
> > +   first_loop_vinfo->used_vector_modes.begin ();
> > + i != first_loop_vinfo->used_vector_modes.end (); ++i)
> > +   {
> > + if (VECTOR_MODE_P (*i)
> > + && GET_MODE_SIZE (GET_MODE_INNER (*i))
> > + > GET_MODE_SIZE (max_elsize_mode))
> > +   max_elsize_mode = GET_MODE_INNER (*i);
> > +   }
> >
> > can be done once before iterating over the modes for the epilogue.
> True, I'll start with QImode instead of the inner of vector_modes[mode_i] too
> since we can't guarantee the mode is a VECTOR_MODE_P and it is actually better
> too since we can't possibly guarantee the element size of the
> USED_VECTOR_MODES is smaller than that of the first vector mode...
> 
> > Richard maybe knows whether we should take care to look at the
> > size of the vector mode as well since related_vector_mode when
> > passed 0 as nunits produces a vector mode with the same size
> > as vector_modes[mode_i] but not all used_vector_modes may be
> > of the same size
> I suspect that should be fine though, since if we use the largest element size
> of all used_vector_modes then that should give us the least possible number
> of NUNITS and thus only conservatively skip. That said, that does assume that
> no vector mode used may be larger than the size of the loop's vector_mode. Can
> I assume that?

No idea, but I would lean towards a no ;)  I think the loop's vector_mode
doesn't have to match vector_modes[mode_i] either, does it?  At least
autodetected_vector_mode will not be QImode based.
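
To make the NUNITS argument concrete, a rough C sketch of the arithmetic only (this is not vectorizer code; the function names and the skip condition are assumptions for illustration).  It assumes a mode is skipped when even its smallest possible NUNITS is not below the main loop's VF, and it starts the search from a 1-byte element, as suggested with QImode.

```c
#include <stddef.h>

/* Number of elements of ELEM_BYTES each in a VECTOR_BYTES vector.  */
unsigned
nunits_for (unsigned vector_bytes, unsigned elem_bytes)
{
  return vector_bytes / elem_bytes;
}

/* Smallest NUNITS for a vector of VECTOR_BYTES, given the element
   sizes that were actually used.  The largest element size yields the
   smallest element count.  The search starts at 1 byte.  */
unsigned
min_nunits (unsigned vector_bytes, const unsigned *elem_bytes, size_t n)
{
  unsigned max_elem = 1;
  for (size_t i = 0; i < n; i++)
    if (elem_bytes[i] > max_elem)
      max_elem = elem_bytes[i];
  return nunits_for (vector_bytes, max_elem);
}

/* Assumed skip rule: skip the mode only if its smallest possible
   NUNITS is already >= the VF of the main loop.  */
int
can_skip_mode (unsigned vector_bytes, const unsigned *elem_bytes,
	       size_t n, unsigned first_vf)
{
  return min_nunits (vector_bytes, elem_bytes, n) >= first_vf;
}
```

With element sizes {1, 4, 8} and a 16-byte vector the smallest NUNITS is 2, so the mode is kept for a main loop VF of 4.  The sketch also shows the concern above: it only makes sense if vector_bytes is the right size to pair with every used element size.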

> >
> > (and you probably also want to exclude
> > VECTOR_BOOLEAN_TYPE_P from the search?)
> Yeah I think so too, thanks!
> 
> I keep going back to thinking (as I brought up in the bugzilla ticket), maybe
> we ought to only skip if the NUNITS of the vector mode with the same vector
> size as vector_modes[mode_i] is larger than first_info_vf, or just don't skip
> at all...

The question is how much work we do before realizing the chosen mode
cannot be used because there are not enough iterations?  Maybe we can
improve there easily?

Also for targets that for the main loop do not perform cost
comparison (like x86) but have lots of vector modes the previous
mode of operation really made sense (start at next_mode_i or
mode_i when unrolling).


[PATCH] c/104002 - shufflevector variable indexing

2022-01-13 Thread Richard Biener via Gcc-patches
Variable indexing of a __builtin_shufflevector result is broken because
we fail to properly mark the TARGET_EXPR decl as addressable.

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

2022-01-13  Richard Biener  

PR c/104002
gcc/c-family/
* c-common.c (c_common_mark_addressable_vec): Handle TARGET_EXPR.

gcc/testsuite/
* c-c++-common/builtin-shufflevector-3.c: Move ...
* c-c++-common/torture/builtin-shufflevector-3.c: ... here.
---
 gcc/c-family/c-common.c  | 5 -
 .../c-c++-common/{ => torture}/builtin-shufflevector-3.c | 0
 2 files changed, 4 insertions(+), 1 deletion(-)
 rename gcc/testsuite/c-c++-common/{ => torture}/builtin-shufflevector-3.c (100%)

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 4a6a4edb763..a34f32f51a4 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -6989,12 +6989,15 @@ c_common_mark_addressable_vec (tree t)
 }
   if (!VAR_P (t)
   && TREE_CODE (t) != PARM_DECL
-  && TREE_CODE (t) != COMPOUND_LITERAL_EXPR)
+  && TREE_CODE (t) != COMPOUND_LITERAL_EXPR
+  && TREE_CODE (t) != TARGET_EXPR)
 return;
   if (!VAR_P (t) || !DECL_HARD_REGISTER (t))
 TREE_ADDRESSABLE (t) = 1;
   if (TREE_CODE (t) == COMPOUND_LITERAL_EXPR)
 TREE_ADDRESSABLE (COMPOUND_LITERAL_EXPR_DECL (t)) = 1;
+  else if (TREE_CODE (t) == TARGET_EXPR)
+TREE_ADDRESSABLE (TARGET_EXPR_SLOT (t)) = 1;
 }
 
 
diff --git a/gcc/testsuite/c-c++-common/builtin-shufflevector-3.c b/gcc/testsuite/c-c++-common/torture/builtin-shufflevector-3.c
similarity index 100%
rename from gcc/testsuite/c-c++-common/builtin-shufflevector-3.c
rename to gcc/testsuite/c-c++-common/torture/builtin-shufflevector-3.c
-- 
2.31.1


[PATCH] tree-optimization/83072 - Allow more precision when querying from fold_const.

2022-01-13 Thread Andrew MacLeod via Gcc-patches

This patch actually addresses a few PRs.

The root PR was 97909.   Ranger context functionality was added to 
fold_const back in early November 
(https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583216.html)


The other 2 PRs mentioned (83072 and 83073) partially worked after this, 
but the original patch did not change the result of the query in 
expr_not_equal_to () to a multi-range object.


This patch simply changes the value_range variable in that routine to an 
int_range<5> so we can pick up more precision. This in turn allows us to 
capture all the tests as expected.
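
As a toy illustration of why more sub-ranges help a "not equal to" query, here is a C sketch.  It is not GCC's int_range implementation; the struct and function names are invented, and MAX_PAIRS only mirrors the 5 in int_range<5>.  A single enclosing range covering [-10, 10] cannot rule out 0, whereas the two sub-ranges [-10, -1] and [1, 10] can.

```c
#include <stddef.h>

#define MAX_PAIRS 5   /* illustration only */

struct subrange { long lo, hi; };

struct multi_range
{
  struct subrange pairs[MAX_PAIRS];
  size_t n;   /* number of pairs in use, at least 1 for the hull */
};

/* Returns 1 if V lies in any of the sub-ranges of R.  */
int
multi_range_contains (const struct multi_range *r, long v)
{
  for (size_t i = 0; i < r->n; i++)
    if (r->pairs[i].lo <= v && v <= r->pairs[i].hi)
      return 1;
  return 0;
}

/* The single range that covers every sub-range of R, which is all a
   one-pair representation could record.  Requires r->n >= 1.  */
struct subrange
multi_range_hull (const struct multi_range *r)
{
  struct subrange hull = r->pairs[0];
  for (size_t i = 1; i < r->n; i++)
    {
      if (r->pairs[i].lo < hull.lo)
	hull.lo = r->pairs[i].lo;
      if (r->pairs[i].hi > hull.hi)
	hull.hi = r->pairs[i].hi;
    }
  return hull;
}
```

A query such as "can this value be 0?" answers "no" on the multi-range but "maybe" on the hull, which is the kind of precision loss the description above refers to.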


Bootstrapped on x86_64-pc-linux-gnu with no regressions.

OK for trunk?

Andrew
From 329626a426d21dfe484053f7b6ac4f2d0c14fa0e Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 12 Jan 2022 13:31:08 -0500
Subject: [PATCH 2/2] Allow more precision when querying from fold_const.

fold_const::expr_not_equal_to queries for a current range, but still uses
the old value_range class.  This is causing it to miss opportunities when
ranger can provide something better.

	PR tree-optimization/83072
	PR tree-optimization/83073
	PR tree-optimization/97909
	gcc/
	* fold-const.c (expr_not_equal_to): Use a multi-range class.

	gcc/testsuite/
	* gcc.dg/pr83072-2.c: New.
	* gcc.dg/pr83073.c: New.
---
 gcc/fold-const.c |  2 +-
 gcc/testsuite/gcc.dg/pr83072-2.c | 18 ++
 gcc/testsuite/gcc.dg/pr83073.c   | 10 ++
 3 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr83072-2.c
 create mode 100644 gcc/testsuite/gcc.dg/pr83073.c

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 397fa9a03a1..7945b8d9eda 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -10734,7 +10734,7 @@ tree_expr_nonzero_p (tree t)
 bool
 expr_not_equal_to (tree t, const wide_int &w)
 {
-  value_range vr;
+  int_range<5> vr;
   switch (TREE_CODE (t))
 {
 case INTEGER_CST:
diff --git a/gcc/testsuite/gcc.dg/pr83072-2.c b/gcc/testsuite/gcc.dg/pr83072-2.c
new file mode 100644
index 000..f495f2582c4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr83072-2.c
@@ -0,0 +1,18 @@
+/* { dg-do compile} */
+/* { dg-options "-O2 -fdump-tree-evrp-details" } */
+
+int f1(int a, int b, int c){
+  if(c==0)__builtin_unreachable();
+  a *= c;
+  b *= c;
+  return a == b;
+}
+
+int f2(int a, int b, int c){
+  c |= 1;
+  a *= c;
+  b *= c;
+  return a == b;
+}
+
+/* { dg-final { scan-tree-dump-times "gimple_simplified to" 2 "evrp" } }  */
diff --git a/gcc/testsuite/gcc.dg/pr83073.c b/gcc/testsuite/gcc.dg/pr83073.c
new file mode 100644
index 000..1168ae822a4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr83073.c
@@ -0,0 +1,10 @@
+/* { dg-do compile} */
+/* { dg-options "-O2 -fdump-tree-evrp-details -fno-tree-fre -fno-tree-ccp -fno-tree-forwprop" } */
+
+int f(int x)
+{
+x = x|1;
+return x & 1;
+}
+
+/* { dg-final { scan-tree-dump "gimple_simplified to.* = 1" "evrp" } }  */
-- 
2.17.2



[PATCH] tree-optimization/96707 - Add relation to unsigned right shift.

2022-01-13 Thread Andrew MacLeod via Gcc-patches

A quick addition to range ops for

LHS = OP1 >> OP2

If OP1 and OP2 are both >= 0, then we can register the relation
LHS <= OP1, and all the expected good things happen.
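
The relation is easy to sanity-check by brute force.  A small C sketch follows (independent of range-op; the function name is made up) that tries every in-range shift count for unsigned x up to a chosen bound.

```c
#include <limits.h>

/* Returns 1 if (x >> y) <= x holds for every x in [0, max_x] and
   every valid shift count y, 0 if a counterexample is found.  */
int
rshift_never_exceeds_op1 (unsigned max_x)
{
  const unsigned width = sizeof (unsigned) * CHAR_BIT;

  for (unsigned x = 0; ; x++)
    {
      for (unsigned y = 0; y < width; y++)
	if ((x >> y) > x)
	  return 0;
      /* Stop here instead of testing x <= max_x in the loop header,
	 so that max_x == UINT_MAX does not loop forever.  */
      if (x == max_x)
	break;
    }
  return 1;
}
```

The >= 0 condition on OP1 matters for signed types: with an arithmetic shift, -8 >> 1 gives -4, which is greater than -8.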


Bootstrapped on x86_64-pc-linux-gnu with no regressions.

OK for trunk?

Andrew
From c34dab537d6f54b66b430f5980cde278fa033904 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 12 Jan 2022 13:28:55 -0500
Subject: [PATCH 1/2] Add relation to unsigned right shift.

If the first operand and the shift value of a right shift operation are both
>= 0, then we know the LHS of the operation is <= the first operand.

	PR tree-optimization/96707
	gcc/
	* range-op.cc (operator_rshift::lhs_op1_relation): New.
	gcc/testsuite/
	* g++.dg/pr96707.C: New.
---
 gcc/range-op.cc| 16 
 gcc/testsuite/g++.dg/pr96707.C | 10 ++
 2 files changed, 26 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/pr96707.C

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index a4f6e9eba29..19bdf30911a 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -1941,9 +1941,25 @@ public:
 			  const irange ,
 			  const irange ,
 			  relation_kind rel = VREL_NONE) const;
+  virtual enum tree_code lhs_op1_relation (const irange &lhs,
+	   const irange &op1,
+	   const irange &op2) const;
 } op_rshift;
 
 
+enum tree_code
+operator_rshift::lhs_op1_relation (const irange &lhs ATTRIBUTE_UNUSED,
+   const irange &op1,
+   const irange &op2) const
+{
+  // If both operands range are >= 0, then the LHS <= op1.
+  if (!op1.undefined_p () && !op2.undefined_p ()
+  && wi::ge_p (op1.lower_bound (), 0, TYPE_SIGN (op1.type ()))
+  && wi::ge_p (op2.lower_bound (), 0, TYPE_SIGN (op2.type ())))
+return LE_EXPR;
+  return VREL_NONE;
+}
+
 bool
 operator_lshift::fold_range (irange , tree type,
 			 const irange ,
diff --git a/gcc/testsuite/g++.dg/pr96707.C b/gcc/testsuite/g++.dg/pr96707.C
new file mode 100644
index 000..2653fe3d043
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr96707.C
@@ -0,0 +1,10 @@
+/* { dg-do compile} */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+bool f(unsigned x, unsigned y)
+{
+return (x >> y) <= x;
+}
+
+/* { dg-final { scan-tree-dump "return 1" "evrp" } }  */
+
-- 
2.17.2



Re: [PATCH] libgomp, openmp: pinned memory

2022-01-13 Thread Andrew Stubbs

On 05/01/2022 17:07, Andrew Stubbs wrote:
> I don't believe 64KB will be anything like enough for any real HPC
> application. Is it really worth optimizing for this case?
>
> Anyway, I'm working on an implementation using mmap instead of malloc
> for pinned allocations. I figure that will simplify the unpin algorithm
> (because it'll be munmap) and optimize for large allocations such as I
> imagine HPC applications will use. It won't fix the ulimit issue.


Here's my new patch.

This version is intended to apply on top of the latest version of my 
low-latency allocator patch, although the dependency is mostly textual.


Pinned memory is allocated via mmap + mlock, and allocation fails 
(returns NULL) if the lock fails and there's no fallback configured.


This means that large allocations will now be page aligned and therefore 
pin the smallest number of pages for the size requested, and that the 
memory will be unpinned automatically when freed via munmap, or moved 
via mremap.
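
In outline, the allocation scheme looks something like the following C sketch.  This is a simplified illustration, not the code in config/linux/allocator.c; pinned_alloc and pinned_free are hypothetical names, and there is no fallback handling.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Allocate SIZE bytes of page-aligned memory and lock it into RAM.
   Returns NULL if either the mapping or the lock fails.  */
void *
pinned_alloc (size_t size)
{
  void *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
		  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return NULL;

  if (mlock (p, size) != 0)
    {
      /* Locking failed (for example, RLIMIT_MEMLOCK is too low):
	 give the pages back and report failure.  */
      munmap (p, size);
      return NULL;
    }
  return p;
}

/* Release memory obtained from pinned_alloc.  Unmapping also drops
   the lock, so no separate munlock call is needed.  */
void
pinned_free (void *p, size_t size)
{
  if (p != NULL)
    munmap (p, size);
}
```

Because each allocation is its own mapping, no page is shared with another allocation, which is what makes the unpin-on-free safe.  The cost is at least one whole page per allocation.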


Obviously this is not ideal for allocations much smaller than one page. 
If that turns out to be a problem in the real world then we can add a 
special case fairly straightforwardly, and incur the extra page 
tracking expense in those cases only, or maybe implement our own 
pinned-memory heap (something like already proposed for low-latency 
memory, perhaps).


Also new is a realloc implementation that works better when reallocation 
fails. This is confirmed by the new testcases.


OK for stage 1?

Thanks

Andrew

libgomp: pinned memory

Implement the OpenMP pinned memory trait on Linux hosts using the mlock
syscall.  Pinned allocations are performed using mmap, not malloc, to ensure
that they can be unpinned safely when freed.

libgomp/ChangeLog:

* allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
(xmlock): New function.
(omp_init_allocator): Don't disallow the pinned trait.
(omp_aligned_alloc): Add pinning to all MEMSPACE_* calls.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
(omp_free): Likewise.
* config/linux/allocator.c: New file.
* config/nvptx/allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
* testsuite/libgomp.c/alloc-pinned-1.c: New test.
* testsuite/libgomp.c/alloc-pinned-2.c: New test.
* testsuite/libgomp.c/alloc-pinned-3.c: New test.
* testsuite/libgomp.c/alloc-pinned-4.c: New test.

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index 1cc7486fc4c..5ab161b6314 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -36,16 +36,20 @@
 
 /* These macros may be overridden in config//allocator.c.  */
 #ifndef MEMSPACE_ALLOC
-#define MEMSPACE_ALLOC(MEMSPACE, SIZE) malloc (SIZE)
+#define MEMSPACE_ALLOC(MEMSPACE, SIZE, PIN) \
+  (PIN ? NULL : malloc (SIZE))
 #endif
 #ifndef MEMSPACE_CALLOC
-#define MEMSPACE_CALLOC(MEMSPACE, SIZE) calloc (1, SIZE)
+#define MEMSPACE_CALLOC(MEMSPACE, SIZE, PIN) \
+  (PIN ? NULL : calloc (1, SIZE))
 #endif
 #ifndef MEMSPACE_REALLOC
-#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE) realloc (ADDR, SIZE)
+#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE, OLDPIN, PIN) \
+  ((PIN) || (OLDPIN) ? NULL : realloc (ADDR, SIZE))
 #endif
 #ifndef MEMSPACE_FREE
-#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) free (ADDR)
+#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE, PIN) \
+  (PIN ? NULL : free (ADDR))
 #endif
 
 /* Map the predefined allocators to the correct memory space.
@@ -208,7 +212,7 @@ omp_init_allocator (omp_memspace_handle_t memspace, int ntraits,
 data.alignment = sizeof (void *);
 
   /* No support for these so far (for hbw will use memkind).  */
-  if (data.pinned || data.memspace == omp_high_bw_mem_space)
+  if (data.memspace == omp_high_bw_mem_space)
 return omp_null_allocator;
 
   ret = gomp_malloc (sizeof (struct omp_allocator_data));
@@ -309,7 +313,8 @@ retry:
   allocator_data->used_pool_size = used_pool_size;
   gomp_mutex_unlock (&allocator_data->lock);
 #endif
-  ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size);
+  ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size,
+   allocator_data->pinned);
   if (ptr == NULL)
{
 #ifdef HAVE_SYNC_BUILTINS
@@ -329,7 +334,8 @@ retry:
= (allocator_data
   ? allocator_data->memspace
   : predefined_alloc_mapping[allocator]);
-  ptr = MEMSPACE_ALLOC (memspace, new_size);
+  ptr = MEMSPACE_ALLOC (memspace, new_size,
+   allocator_data && allocator_data->pinned);
   if (ptr == NULL)
goto fail;
 }
@@ -356,9 +362,9 @@ fail:
 {
 case omp_atv_default_mem_fb:
   if ((new_alignment > sizeof (void *) && new_alignment > alignment)
- || (allocator_data
- && 

Re: [vect] PR103997: Fix epilogue mode skipping

2022-01-13 Thread Andre Vieira (lists) via Gcc-patches



On 13/01/2022 12:36, Richard Biener wrote:
> On Thu, 13 Jan 2022, Andre Vieira (lists) wrote:
>
> > This time to the list too (sorry for double email)
> >
> > Hi,
> >
> > The original patch '[vect] Re-analyze all modes for epilogues', skipped modes
> > that should not be skipped since it used the vector mode provided by
> > autovectorize_vector_modes to derive the minimum VF required for it. However,
> > those modes should only really be used to dictate vector size, so instead this
> > patch looks for the mode in 'used_vector_modes' with the largest element size,
> > and constructs a vector mode with the same size as the current
> > vector_modes[mode_i]. Since we are using the largest element size the NUNITs
> > for this mode is the smallest possible VF required for an epilogue with this
> > mode and should thus skip only the modes we are certain can not be used.
> >
> > Passes bootstrap and regression on x86_64 and aarch64.
> Clearly
>
> + /* To make sure we are conservative as to what modes we skip, we
> +should use check the smallest possible NUNITS which would be
> +derived from the mode in USED_VECTOR_MODES with the largest
> +element size.  */
> + scalar_mode max_elsize_mode = GET_MODE_INNER (vector_modes[mode_i]);
> + for (vec_info::mode_set::iterator i =
> +   first_loop_vinfo->used_vector_modes.begin ();
> + i != first_loop_vinfo->used_vector_modes.end (); ++i)
> +   {
> + if (VECTOR_MODE_P (*i)
> + && GET_MODE_SIZE (GET_MODE_INNER (*i))
> + > GET_MODE_SIZE (max_elsize_mode))
> +   max_elsize_mode = GET_MODE_INNER (*i);
> +   }
>
> can be done once before iterating over the modes for the epilogue.

True, I'll start with QImode instead of the inner of
vector_modes[mode_i] too since we can't guarantee the mode is a
VECTOR_MODE_P and it is actually better too since we can't possibly
guarantee the element size of the USED_VECTOR_MODES is smaller than that
of the first vector mode...

> Richard maybe knows whether we should take care to look at the
> size of the vector mode as well since related_vector_mode when
> passed 0 as nunits produces a vector mode with the same size
> as vector_modes[mode_i] but not all used_vector_modes may be
> of the same size

I suspect that should be fine though, since if we use the largest
element size of all used_vector_modes then that should give us the
least possible number of NUNITS and thus only conservatively skip. That
said, that does assume that no vector mode used may be larger than the
size of the loop's vector_mode. Can I assume that?

> (and you probably also want to exclude
> VECTOR_BOOLEAN_TYPE_P from the search?)

Yeah I think so too, thanks!

I keep going back to thinking (as I brought up in the bugzilla ticket),
maybe we ought to only skip if the NUNITS of the vector mode with the
same vector size as vector_modes[mode_i] is larger than first_info_vf,
or just don't skip at all...




Re: [PATCH] forwprop: Canonicalize atomic fetch_op op x to op_fetch or vice versa [PR98737]

2022-01-13 Thread Richard Biener via Gcc-patches
On Thu, 13 Jan 2022, Jakub Jelinek wrote:

> Hi!
> 
> When writing the PR98737 fix, I've handled just the case where people
> use __atomic_op_fetch (p, x, y) etc.
> But some people actually use the other builtins, like
> __atomic_fetch_op (p, x, y) op x.
> The following patch canonicalizes the latter to the former and vice versa
> when possible if the result of the builtin is a single use and if
> that use is a cast with same precision, also that cast's lhs has a single
> use.
> For all ops of +, -, &, | and ^ we can do those
> __atomic_fetch_op (p, x, y) op x -> __atomic_op_fetch (p, x, y)
> (and __sync too) opts, but cases of INTEGER_CST and SSA_NAME x
> behave differently.  For INTEGER_CST, typically - x is
> canonicalized to + (-x), while for SSA_NAME we need to handle various
> casts, which sometimes happen on the second argument of the builtin
> (there can be even two subsequent casts for char/short due to the
> promotions we do) and there can be a cast on the argument of op too.
> And all ops but - are commutative.
> For the other direction, i.e.
> __atomic_op_fetch (p, x, y) rop x -> __atomic_fetch_op (p, x, y)
> we can't handle op of & and |, those aren't reversible, for
> op + rop is -, for - rop is + and for ^ rop is ^, otherwise the same
> stuff as above applies.
> And, there is another case, we canonicalize
> x - y == 0 (or != 0) and x ^ y == 0 (or != 0) to x == y (or x != y)
> and for constant y x + y == 0 (or != 0) to x == -y (or != -y),
> so the patch also virtually undoes those canonicalizations, because
> e.g. for the earlier PR98737 patch but even generally, it is better
> if a result of atomic op fetch is compared against 0 than doing
> atomic fetch op and compare it to some variable or non-zero constant.
> As for debug info, for non-reversible operations (& and |) the patch
> resets debug stmts if there are any, for -fnon-call-exceptions too
> (didn't want to include debug temps right before all uses), but
> otherwise it emits the reverse operation from the result as a debug
> temp and uses that in debug stmts.
> 
> On the emitted assembly for the testcases which are fairly large,
> I see substantial decreases of the *.s size:
> -rw-rw-r--. 1 jakub jakub 116897 Jan 13 09:58 pr98737-1.svanilla
> -rw-rw-r--. 1 jakub jakub  93861 Jan 13 09:57 pr98737-1.spatched
> -rw-rw-r--. 1 jakub jakub  70257 Jan 13 09:57 pr98737-2.svanilla
> -rw-rw-r--. 1 jakub jakub  67537 Jan 13 09:57 pr98737-2.spatched
> There are some functions where due to RA we get one more instruction
> than previously, but most of them are smaller even when not hitting
> the PR98737 previous patch's optimizations.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2022-01-13  Jakub Jelinek  
> 
>   PR target/98737
>   * tree-ssa-forwprop.c (simplify_builtin_call): Canonicalize
>   __atomic_fetch_op (p, x, y) op x into __atomic_op_fetch (p, x, y)
>   and __atomic_op_fetch (p, x, y) iop x into
>   __atomic_fetch_op (p, x, y).
> 
>   * gcc.dg/tree-ssa/pr98737-1.c: New test.
>   * gcc.dg/tree-ssa/pr98737-2.c: New test.
> 
> --- gcc/tree-ssa-forwprop.c.jj2022-01-11 23:11:23.467275019 +0100
> +++ gcc/tree-ssa-forwprop.c   2022-01-12 22:12:24.666522743 +0100
> @@ -1241,12 +1241,19 @@ constant_pointer_difference (tree p1, tr
> memset (p + 4, ' ', 3);
> into
> memcpy (p, "abcd   ", 7);
> -   call if the latter can be stored by pieces during expansion.  */
> +   call if the latter can be stored by pieces during expansion.
> +
> +   Also canonicalize __atomic_fetch_op (p, x, y) op x
> +   to __atomic_op_fetch (p, x, y) or
> +   __atomic_op_fetch (p, x, y) iop x
> +   to __atomic_fetch_op (p, x, y) when possible (also __sync).  */
>  
>  static bool
>  simplify_builtin_call (gimple_stmt_iterator *gsi_p, tree callee2)
>  {
>gimple *stmt1, *stmt2 = gsi_stmt (*gsi_p);
> +  enum built_in_function other_atomic = END_BUILTINS;
> +  enum tree_code atomic_op = ERROR_MARK;
>tree vuse = gimple_vuse (stmt2);
>if (vuse == NULL)
>  return false;
> @@ -1448,6 +1455,300 @@ simplify_builtin_call (gimple_stmt_itera
>   }
>   }
>break;
> +
> + #define CASE_ATOMIC(NAME, OTHER, OP) \
> +case BUILT_IN_##NAME##_1:
> \
> +case BUILT_IN_##NAME##_2:
> \
> +case BUILT_IN_##NAME##_4:
> \
> +case BUILT_IN_##NAME##_8:
> \
> +case BUILT_IN_##NAME##_16:   
> \
> +  atomic_op = OP;
> \
> +  other_atomic   \
> + = (enum built_in_function) (BUILT_IN_##OTHER##_1\
> + + (DECL_FUNCTION_CODE (callee2) \
> +- 

Re: [PATCH] disable aggressive_loop_optimizations until niter ready

2022-01-13 Thread Richard Biener via Gcc-patches
On Thu, 13 Jan 2022, guojiufu wrote:

> On 2022-01-03 22:30, Richard Biener wrote:
> > On Wed, 22 Dec 2021, Jiufu Guo wrote:
> > 
> >> Hi,
> >> 
> >> Normally, estimate_numbers_of_iterations gets/calculates niter first,
> >> and then invokes infer_loop_bounds_from_undefined.  In some cases, however,
> >> after a few call stacks, estimate_numbers_of_iterations is invoked before
> >> niter is ready (e.g. before number_of_latch_executions returns).
> >> 
> >> e.g. number_of_latch_executions->...follow_ssa_edge_expr-->
> >>   --> estimate_numbers_of_iterations --> 
> >> infer_loop_bounds_from_undefined.
> >> 
> >> Since niter is still not computed, the call to infer_loop_bounds_from_undefined
> >> may not get the final result.
> >> To avoid calling infer_loop_bounds_from_undefined with interim state,
> >> and to avoid it generating interim data while niter is being computed,
> >> we could disable flag_aggressive_loop_optimizations.
> >> Bootstrap and regtest pass on ppc64* and x86_64.  Is this ok for trunk?
> > 
> > So this is an optimality fix, not a correctness one?  I suppose the
> > estimates are computed/used from scev_probably_wraps_p via
> > loop_exits_before_overflow and ultimately chrec_convert.
> > 
> > We have a call cycle here,
> > 
> > estimate_numbers_of_iterations -> number_of_latch_executions ->
> > ... -> estimate_numbers_of_iterations
> > 
> > where the first estimate_numbers_of_iterations will make sure
> > the later call will immediately return.
> 
> Hi Richard,
> Thanks for your comments! And sorry for the late reply.
> 
> In estimate_numbers_of_iterations, there is a guard to make sure
> the second call to estimate_numbers_of_iterations returns
> immediately.
> 
> Exactly as you said, it relates to scev_probably_wraps_p calls
> loop_exits_before_overflow.
> 
> The issue is: the first call to estimate_numbers_of_iterations
> may be inside number_of_latch_executions.
> 
> > 
> > I'm not sure what your patch tries to do - it seems to tackle
> > the case where we enter the cycle via number_of_latch_executions?
> > Why do we get "non-final" values?  idx_infer_loop_bounds resorts
> 
> Right, when the call cycle starts from number_of_latch_executions,
> the issue may occur:
> 
> number_of_latch_executions(*1st call)->..->
> analyze_scalar_evolution(IVs 1st) ->..follow_ssa_edge_expr..->
> loop_exits_before_overflow->
> estimate_numbers_of_iterations (*1st call)->
> number_of_latch_executions(*2nd call)->..->
> analyze_scalar_evolution(IVs 2nd)->..loop_exits_before_overflow->
> estimate_numbers_of_iterations(*2nd call)
> 
> The second call to estimate_numbers_of_iterations returns quickly.
> And then, in the first call to estimate_numbers_of_iterations,
> infer_loop_bounds_from_undefined is invoked.
> 
> And the function "infer_loop_bounds_from_undefined" instantiates/analyzes
> the SCEV for each SSA name in the loop.
> *Here the issue occurs*: these SCEVs are based on the interim IVs'
> SCEVs, which come from "analyze_scalar_evolution(IVs 2nd)",
> and those IVs' SCEVs will be overridden by the upper-level
> "analyze_scalar_evolution(IVs 1st)".

OK, so indeed analyze_scalar_evolution is not protected against
recursive invocation on the same SSA name (though it definitely
doesn't expect to do that).  We could fix that by pre-seeding
the cache conservatively in analyze_scalar_evolution or by
not overwriting the cached result of the recursive invocation.
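
To make the "pre-seed the cache conservatively" idea concrete, here is a
minimal Python sketch of a memoized analysis that can re-enter itself for
the same key.  It is only a toy model, not GCC code: DONT_KNOW stands in
for chrec_dont_know, and analyze, count_known and the deps table are
hypothetical names made up for the illustration.

```python
# Toy model of a memoized analysis that may be re-entered for the same key.
# DONT_KNOW plays the role of chrec_dont_know; all names are hypothetical.
DONT_KNOW = object()


def analyze(name, deps, cache, compute):
    """Return the result for `name`, pre-seeding the cache so a recursive
    request for the same name sees DONT_KNOW instead of recursing again."""
    if name in cache:
        return cache[name]
    cache[name] = DONT_KNOW          # conservative pre-seed
    inputs = [analyze(dep, deps, cache, compute) for dep in deps.get(name, [])]
    cache[name] = compute(name, inputs)
    return cache[name]


def count_known(name, inputs):
    # Example "analysis": count how many inputs were fully resolved.
    return sum(1 for value in inputs if value is not DONT_KNOW)


# "iv" depends on "niter", which depends on "iv" again (a call cycle).
deps = {"iv": ["niter"], "niter": ["iv"]}
cache = {}
result = analyze("iv", deps, cache, count_known)
```

In this model the inner "niter" result is computed while "iv" is still the
placeholder, so it is based on interim state; the outer "iv" result is then
computed from it.  That mirrors the concern raised above, under the stated
assumptions.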

But to re-iterate an unanswered question, is this a correctness issue
or an optimization issue?

> To handle this issue, disabling flag_aggressive_loop_optimizations
> inside number_of_latch_executions is one method.
> To avoid the issue in other cases, e.g. the call cycle starts from
> number_of_iterations_exit or number_of_iterations_exit_assumptions,
> this patch disables flag_aggressive_loop_optimizations inside
> number_of_iterations_exit_assumptions.

But disabling flag_aggressive_loop_optimizations is a very
non-intuitive way of avoiding recursive calls.  I'd rather
avoid those in a similar way estimate_numbers_of_iterations does,
for example with

diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 61d72c278a1..cc1e510b6c2 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -2807,7 +2807,7 @@ number_of_latch_executions (class loop *loop)
   if (dump_file && (dump_flags & TDF_SCEV))
 fprintf (dump_file, "(number_of_iterations_in_loop = \n");
 
-  res = chrec_dont_know;
+  loop->nb_iterations = res = chrec_dont_know;
   exit = single_exit (loop);
 
   if (exit && number_of_iterations_exit (loop, exit, &niter_desc, false))

though this doesn't seem to improve the SCEV analysis with your
testcase.  Alternatively one could more consciously compute an
"estimated" estimate like with

diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 61d72c278a1..8529c44d574 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -2802,6 +2802,19 @@ 

Re: [PATCH] Fix -Wformat-diag for rs6000 target.

2022-01-13 Thread Richard Sandiford via Gcc-patches
Martin Sebor via Gcc-patches  writes:
> On 1/12/22 02:02, Martin Liška wrote:
>> Hello.
>> 
>> We've had -Wformat-diag for some time and I think we should start using it
>> in -Werror for GCC bootstrap. The following patch removes the last instances
>> of the warning for the rs6000 target.
>> 
>> Ready to be installed?
>> Thanks,
>> Martin
>> 
>> 
>> gcc/ChangeLog:
>> 
>>  * config/rs6000/rs6000-call.c (rs6000_invalid_builtin): Wrap
>>  keywords and use %qs instead of %<%s%>.
>>  (rs6000_expand_builtin): Likewise.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>  * gcc.target/powerpc/bfp/scalar-extract-exp-5.c: Adjust scans in
>>  testcases.
>>  * gcc.target/powerpc/bfp/scalar-extract-sig-5.c: Likewise.
>>  * gcc.target/powerpc/bfp/scalar-insert-exp-11.c: Likewise.
>> ---
>>   gcc/config/rs6000/rs6000-call.c   | 8 
>>   .../gcc.target/powerpc/bfp/scalar-extract-exp-5.c | 2 +-
>>   .../gcc.target/powerpc/bfp/scalar-extract-sig-5.c | 2 +-
>>   .../gcc.target/powerpc/bfp/scalar-insert-exp-11.c | 2 +-
>>   4 files changed, 7 insertions(+), 7 deletions(-)
>> 
>> diff --git a/gcc/config/rs6000/rs6000-call.c 
>> b/gcc/config/rs6000/rs6000-call.c
>> index c78b8b08c40..becdad73812 100644
>> --- a/gcc/config/rs6000/rs6000-call.c
>> +++ b/gcc/config/rs6000/rs6000-call.c
>> @@ -3307,7 +3307,7 @@ rs6000_invalid_builtin (enum rs6000_gen_builtins 
>> fncode)
>>    "-mvsx");
>>     break;
>>   case ENB_IEEE128_HW:
>> -  error ("%qs requires ISA 3.0 IEEE 128-bit floating point", name);
>> +  error ("%qs requires ISA 3.0 IEEE 128-bit floating-point", name);
>
> The instances of the warning where floating point is at the end
> of a message aren't correct.  The warning should be relaxed to
> allow unhyphenated floating point as a noun (as discussed briefly
> last March:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566881.html)

Wouldn't it be fair to say that “floating point” in the message above is
really an adjective modifying an implicit noun?  The floating (decimal)
point doesn't itself have 128 bits.

Like you say in the linked message, we could add an explicit noun too.
But the change seems OK as-is to me.

Thanks,
Richard


Re: [PATCH] [gfortran] Add support for allocate clause (OpenMP 5.0).

2022-01-13 Thread Jakub Jelinek via Gcc-patches
On Tue, Jan 11, 2022 at 10:31:54PM +, Hafiz Abid Qadeer wrote:
> +   gfc_omp_namelist *n;
> +   for (n = *head; n; n = n->next)

Better
  for (gfc_omp_namelist *n = *head; n; n = n->next)
as we are in C++ and n isn't used after the loop.

> +  /* non-composite constructs.  */

Capital N

Ok for trunk with these nits fixed, no need to repost.

Jakub



Re: [PATCH] rs6000: Use known constant for GET_MODE_NUNITS and similar

2022-01-13 Thread Kewen.Lin via Gcc-patches
Hi David,

on 2022/1/13 上午11:12, David Edelsohn wrote:
> On Wed, Jan 12, 2022 at 8:56 PM Kewen.Lin  wrote:
>>
>> Hi,
>>
>> This patch cleans up some code that uses GET_MODE_UNIT_SIZE or
>> GET_MODE_NUNITS where a known constant can be used instead.
> 
> I'll let Segher decide, but often the additional code is useful
> self-documentation instead of magic constants.  Or at least the change
> requires comments documenting the derivation of the constants
> currently described by the code itself.
> 

Thanks for the comments.  I added some comments as suggested, and also removed
the whole "altivec_vreveti2" expander since I noticed it's useless: it's not used
by any built-in function, and it was unused even in commit db042e1603db50573.
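
As a rough illustration of the arithmetic involved, the following Python
sketch models the two index computations touched by the patch.  The
function names are hypothetical.  It assumes V4SI/V4SF have four elements
(so nunits - 1 is 3) and that TImode counts as a single 16-byte unit.

```python
def le_element_index(element, nunits=4):
    """Mirror an element index for little-endian ordering: nunits - 1 - element."""
    return nunits - 1 - element


def reverse_mask(unit_size, num_elements):
    """Byte selector built the same way as the loops in the removed expander."""
    mask = [0] * (unit_size * num_elements)
    for j in range(num_elements):
        for i in range(unit_size):
            mask[i + j * unit_size] = i + (num_elements - 1 - j) * unit_size
    return mask


v4si_indices = [le_element_index(e) for e in range(4)]   # 3 - element
timode_mask = reverse_mask(16, 1)                        # one 16-byte unit
v4si_mask = reverse_mask(4, 4)                           # four 4-byte units
```

If the single-unit assumption for TImode holds, the selector is the
identity permutation (bytes 0..15 in order), which would be consistent
with the expander doing no useful work.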

The updated version has been tested as before.

BR,
Kewen
-
gcc/ChangeLog:

* config/rs6000/altivec.md (altivec_vreveti2): Remove.
* config/rs6000/vsx.md (*vsx_extract_si, *vsx_extract_si_float_df,
*vsx_extract_si_float_, *vsx_insert_extract_v4sf_p9): Use
known constant values to simplify code.
---
 gcc/config/rs6000/altivec.md | 25 -
 gcc/config/rs6000/vsx.md | 12 
 2 files changed, 8 insertions(+), 29 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index c2312cc1e0f..b7f056f8c60 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -3950,31 +3950,6 @@ (define_expand "altivec_negv4sf2"
   DONE;
 })

-;; Vector reverse elements
-(define_expand "altivec_vreveti2"
-  [(set (match_operand:TI 0 "register_operand" "=v")
-   (unspec:TI [(match_operand:TI 1 "register_operand" "v")]
- UNSPEC_VREVEV))]
-  "TARGET_ALTIVEC"
-{
-  int i, j, size, num_elements;
-  rtvec v = rtvec_alloc (16);
-  rtx mask = gen_reg_rtx (V16QImode);
-
-  size = GET_MODE_UNIT_SIZE (TImode);
-  num_elements = GET_MODE_NUNITS (TImode);
-
-  for (j = 0; j < num_elements; j++)
-for (i = 0; i < size; i++)
-  RTVEC_ELT (v, i + j * size)
-   = GEN_INT (i + (num_elements - 1 - j) * size);
-
-  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
-  emit_insn (gen_altivec_vperm_ti (operands[0], operands[1],
-operands[1], mask));
-  DONE;
-})
-
 ;; Vector reverse elements for V16QI V8HI V4SI V4SF
 (define_expand "altivec_vreve2"
   [(set (match_operand:VEC_K 0 "register_operand" "=v")
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 802db0d112b..d246410880d 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -3854,8 +3854,9 @@ (define_insn_and_split  "*vsx_extract_si"
   rtx vec_tmp = operands[3];
   int value;

+  /* Adjust index for LE element ordering.  */
   if (!BYTES_BIG_ENDIAN)
-element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element));
+element = GEN_INT (3 - INTVAL (element));

   /* If the value is in the correct position, we can avoid doing the VSPLT
  instruction.  */
@@ -4230,8 +4231,9 @@ (define_insn_and_split "*vsx_extract_si_float_df"
   rtx v4si_tmp = operands[3];
   int value;

+  /* Adjust index for LE element ordering.  */
   if (!BYTES_BIG_ENDIAN)
-element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element));
+element = GEN_INT (3 - INTVAL (element));

   /* If the value is in the correct position, we can avoid doing the VSPLT
  instruction.  */
@@ -4273,8 +4275,9 @@ (define_insn_and_split "*vsx_extract_si_float_"
   rtx df_tmp = operands[4];
   int value;

+  /* Adjust index for LE element ordering.  */
   if (!BYTES_BIG_ENDIAN)
-element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element));
+element = GEN_INT (3 - INTVAL (element));

   /* If the value is in the correct position, we can avoid doing the VSPLT
  instruction.  */
@@ -4466,8 +4469,9 @@ (define_insn "*vsx_insert_extract_v4sf_p9"
 {
   int ele = INTVAL (operands[4]);

+  /* Adjust index for LE element ordering.  */
   if (!BYTES_BIG_ENDIAN)
-ele = GET_MODE_NUNITS (V4SFmode) - 1 - ele;
+ele = 3 - ele;

   operands[4] = GEN_INT (GET_MODE_SIZE (SFmode) * ele);
   return "xxinsertw %x0,%x2,%4";
--
2.27.0



Re: [vect] PR103997: Fix epilogue mode skipping

2022-01-13 Thread Richard Biener via Gcc-patches
On Thu, 13 Jan 2022, Andre Vieira (lists) wrote:

> This time to the list too (sorry for double email)
> 
> Hi,
> 
> The original patch '[vect] Re-analyze all modes for epilogues', skipped modes
> that should not be skipped since it used the vector mode provided by
> autovectorize_vector_modes to derive the minimum VF required for it. However,
> those modes should only really be used to dictate vector size, so instead this
> patch looks for the mode in 'used_vector_modes' with the largest element size,
> and constructs a vector mode with the same size as the current
> vector_modes[mode_i]. Since we are using the largest element size the NUNITs
> for this mode is the smallest possible VF required for an epilogue with this
> mode and should thus skip only the modes we are certain can not be used.
> 
> Passes bootstrap and regression on x86_64 and aarch64.

Clearly

+ /* To make sure we are conservative as to what modes we skip, we
+should use check the smallest possible NUNITS which would be
+derived from the mode in USED_VECTOR_MODES with the largest
+element size.  */
+ scalar_mode max_elsize_mode = GET_MODE_INNER (vector_modes[mode_i]);
+ for (vec_info::mode_set::iterator i =
+   first_loop_vinfo->used_vector_modes.begin ();
+ i != first_loop_vinfo->used_vector_modes.end (); ++i)
+   {
+ if (VECTOR_MODE_P (*i)
+ && GET_MODE_SIZE (GET_MODE_INNER (*i))
+ > GET_MODE_SIZE (max_elsize_mode))
+   max_elsize_mode = GET_MODE_INNER (*i);
+   }

can be done once before iterating over the modes for the epilogue.
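
A small Python sketch of that hoisting, as a toy model rather than the
vectorizer's code: modes are reduced to (is_vector, element_size) pairs,
and largest_element_size and min_epilogue_vf are hypothetical helpers.
The widest used element size is computed once and then combined with each
candidate mode's own element size inside the loop.

```python
def largest_element_size(used_modes):
    """used_modes: iterable of (is_vector, element_size_in_bytes) pairs."""
    sizes = [size for is_vector, size in used_modes if is_vector]
    return max(sizes, default=0)


def min_epilogue_vf(vector_size_bytes, own_element_size, max_used_element_size):
    """Smallest NUNITS for a vector of this size: divide by the widest element."""
    element_size = max(own_element_size, max_used_element_size)
    return vector_size_bytes // element_size


used = [(True, 1), (True, 4), (False, 8)]   # the scalar 8-byte entry is ignored
widest = largest_element_size(used)          # computed once, before the loop
vfs = [min_epilogue_vf(size, 1, widest) for size in (16, 8)]
```

The sketch does not model the open questions raised below (vector modes of
different sizes, or excluding boolean vector modes).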

Richard maybe knows whether we should take care to look at the
size of the vector mode as well since related_vector_mode when
passed 0 as nunits produces a vector mode with the same size
as vector_modes[mode_i] but not all used_vector_modes may be
of the same size (and you probably also want to exclude
VECTOR_BOOLEAN_TYPE_P from the search?)

Thanks,
Richard.

> gcc/ChangeLog:
> 
>     PR 103997
>     * tree-vect-loop.c (vect_analyze_loop): Fix mode skipping for 
> epilogue
>     vectorization.
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)


Re: [PATCH] rs6000: Fix constraint v with rs6000_constraints[RS6000_CONSTRAINT_v]

2022-01-13 Thread Kewen.Lin via Gcc-patches
on 2022/1/13 上午11:56, Kewen.Lin via Gcc-patches wrote:
> on 2022/1/13 上午11:44, David Edelsohn wrote:
>> On Wed, Jan 12, 2022 at 10:38 PM Kewen.Lin  wrote:
>>>
>>> Hi David,
>>>
>>> on 2022/1/13 上午11:07, David Edelsohn wrote:
>>>> On Wed, Jan 12, 2022 at 8:56 PM Kewen.Lin  wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> This patch is to fix register constraint v with
>>>>> rs6000_constraints[RS6000_CONSTRAINT_v] instead of ALTIVEC_REGS,
>>>>> just like some other existing register constraints with
>>>>> RS6000_CONSTRAINT_*.
>>>>>
>>>>> I happened to see this and hope it's not intentional and just
>>>>> got neglected.
>>>>>
>>>>> Bootstrapped and regtested on powerpc64le-linux-gnu P9 and
>>>>> powerpc64-linux-gnu P8.
>>>>>
>>>>> Is it ok for trunk?
>>>>
>>>> Why do you want to make this change?
>>>>
>>>> rs6000_constraints[RS6000_CONSTRAINT_v] = ALTIVEC_REGS;
>>>>
>>>> but all of the patterns that use a "v" constraint are (or should be)
>>>> protected by TARGET_ALTIVEC, or some final condition that only is
>>>> active for TARGET_ALTIVEC.  The other constraints are conditionally
>>>> set because they can be used in a pattern with multiple alternatives
>>>> where the pattern itself is active but some of the constraints
>>>> correspond to NO_REGS when some instruction variants for VSX are not
>>>> enabled.
>>>>
>>>
>>> Good point!  Thanks for the explanation.
>>>
>>>> The change isn't wrong, but it doesn't correct a bug and provides no
>>>> additional benefit nor clarity that I can see.
>>>>
>>>
>>> The original intention was to make it consistent with the other existing
>>> register constraints with RS6000_CONSTRAINT_*, otherwise it looks a bit
>>> weird (as if it had been neglected).  After your clarification above,
>>> RS6000_CONSTRAINT_v seems to be of no use at all in the current framework.
>>> Would you prefer to remove it instead, to avoid any confusion?
>>
>> It's used in the reg_class, so there may be some heuristic in the GCC
>> register allocator that cares about the number of registers available
>> for the target.  rs6000_constraints[RS6000_CONSTRAINT_v] is defined
>> conditionally, so it seems best to leave it as is.
>>
> 
> I may be missing something, but I couldn't find it being used for the above
> purposes.  If it's best to leave it as is, the proposed patch seems to offer
> better readability.

Two more inputs for maintainers' decision:

1) the original proposed patch fixed one "bug" that is:

In function rs6000_debug_reg_global, it tries to print the register class
for the register constraint:

  fprintf (stderr,
   "\n"
   "d  reg_class = %s\n"
   "f  reg_class = %s\n"
   "v  reg_class = %s\n"
   "wa reg_class = %s\n"
   ...
   "\n",
   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_d]],
   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_f]],
   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_v]],
   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wa]],
   ...

It uses rs6000_constraints[RS6000_CONSTRAINT_v] which is conditionally
set here:

  /* Add conditional constraints based on various options, to allow us to
 collapse multiple insn patterns.  */
  if (TARGET_ALTIVEC)
rs6000_constraints[RS6000_CONSTRAINT_v] = ALTIVEC_REGS;

But the actual register class for register constraint is hardcoded as
ALTIVEC_REGS rather than rs6000_constraints[RS6000_CONSTRAINT_v].
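
The mismatch can be pictured with a tiny Python model.  It is not the
rs6000 code: debug_dump_class and hardcoded_class are hypothetical names,
and it assumes an unset table entry reads back as NO_REGS.

```python
def debug_dump_class(constraints, target_altivec):
    """Class a dump would print for 'v' when it reads the conditionally
    filled constraint table."""
    table = dict(constraints)
    if target_altivec:
        table["v"] = "ALTIVEC_REGS"
    return table.get("v", "NO_REGS")


def hardcoded_class():
    """Class used by a constraint definition hard-wired to ALTIVEC_REGS."""
    return "ALTIVEC_REGS"


with_altivec = debug_dump_class({}, True)
without_altivec = debug_dump_class({}, False)
```

Under those assumptions the two views agree when AltiVec is enabled and
differ only when it is not, which is the case where the dump and the
actual constraint would disagree.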

2) I bootstrapped and tested the patch below, which removes all the code
using RS6000_CONSTRAINT_v, on powerpc64le-linux-gnu P10 and P9 and on
powerpc64-linux-gnu P8 and P7, with no regressions.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 37f07fe5358..3652629c5d0 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -2320,7 +2320,6 @@ rs6000_debug_reg_global (void)
   "\n"
   "d  reg_class = %s\n"
   "f  reg_class = %s\n"
-  "v  reg_class = %s\n"
   "wa reg_class = %s\n"
   "we reg_class = %s\n"
   "wr reg_class = %s\n"
@@ -2329,7 +2328,6 @@ rs6000_debug_reg_global (void)
   "\n",
   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_d]],
   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_f]],
-  reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_v]],
   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wa]],
   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_we]],
   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wr]],
@@ -2984,11 +2982,6 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
   if (TARGET_VSX)
 rs6000_constraints[RS6000_CONSTRAINT_wa] = VSX_REGS;

-  /* Add conditional constraints based on various options, to allow us to
- collapse multiple insn patterns.  */
-  if (TARGET_ALTIVEC)
-rs6000_constraints[RS6000_CONSTRAINT_v] = ALTIVEC_REGS;
-
   if (TARGET_POWERPC64)
 {
   rs6000_constraints[RS6000_CONSTRAINT_wr] = GENERAL_REGS;
diff --git 

Re: [committed] libgomp/testsuite: Improve omp_get_device_num() tests

2022-01-13 Thread Thomas Schwinge
Hi!

On 2022-01-04T15:12:58+0100, Tobias Burnus  wrote:
> This commit r12-6209 now makes the testcases iterate over all devices
> (including the initial/host device).
>
> Hence, with multiple non-host devices and this test, the error had been
> found before ... ;-)

Yay for test cases!  :-)

... but we now run into issues if Intel MIC (emulated) offloading is
(additionally) enabled, because that one still doesn't properly implement
device-side 'omp_get_device_num'.  ;-)

Thus pushed to master branch
commit d97364aab1af361275b87713154c366ce2b9029a
"Improve Intel MIC offloading XFAILing for 'omp_get_device_num'", see
attached.

(It wasn't obvious to me how to implement that; very incomplete
"[WIP] Intel MIC 'omp_get_device_num'" attached, not planning on working
on this any further.)


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From d97364aab1af361275b87713154c366ce2b9029a Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 5 Jan 2022 19:52:25 +0100
Subject: [PATCH] Improve Intel MIC offloading XFAILing for
 'omp_get_device_num'

After recent commit be661959a6b6d8f9c3c8608a746789e7b2ec3ca4
"libgomp/testsuite: Improve omp_get_device_num() tests", we're now iterating
over all OpenMP target devices.  Intel MIC (emulated) offloading still doesn't
properly implement device-side 'omp_get_device_num', and we thus regress:

PASS: libgomp.c/../libgomp.c-c++-common/target-45.c (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/target-45.c execution test

PASS: libgomp.c++/../libgomp.c-c++-common/target-45.c (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.c++/../libgomp.c-c++-common/target-45.c execution test

PASS: libgomp.fortran/target10.f90   -O0  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -O0  execution test
PASS: libgomp.fortran/target10.f90   -O1  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -O1  execution test
PASS: libgomp.fortran/target10.f90   -O2  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -O2  execution test
PASS: libgomp.fortran/target10.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
PASS: libgomp.fortran/target10.f90   -O3 -g  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -O3 -g  execution test
PASS: libgomp.fortran/target10.f90   -Os  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -Os  execution test

Improve the XFAILing added in commit bb75b22aba254e8ff144db27b1c8b4804bad73bb
"Allow matching Intel MIC in OpenMP 'declare variant'" for the case that *any*
Intel MIC offload device is available.

	libgomp/
	* testsuite/libgomp.c-c++-common/on_device_arch.h
	(any_device_arch, any_device_arch_intel_mic): New.
	* testsuite/lib/libgomp.exp
	(check_effective_target_offload_device_any_intel_mic): New.
	* testsuite/libgomp.c-c++-common/target-45.c: Use it.
	* testsuite/libgomp.fortran/target10.f90: Likewise.
---
 libgomp/testsuite/lib/libgomp.exp | 12 +-
 .../libgomp.c-c++-common/on_device_arch.h | 23 +++
 .../libgomp.c-c++-common/target-45.c  |  2 +-
 .../testsuite/libgomp.fortran/target10.f90|  2 +-
 4 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index 57fb6b068f3..8c5ecfff0ac 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -451,7 +451,6 @@ proc check_effective_target_openacc_nvidia_accel_selected { } {
 # Return 1 if using Intel MIC offload device.
 proc check_effective_target_offload_device_intel_mic { } {
 return [check_runtime_nocache offload_device_intel_mic {
-  #include 
   #include "testsuite/libgomp.c-c++-common/on_device_arch.h"
   int main ()
 	{
@@ -460,6 +459,17 @@ proc check_effective_target_offload_device_intel_mic { } {
 } ]
 }
 
+# Return 1 if any Intel MIC offload device is available.
+proc check_effective_target_offload_device_any_intel_mic { } {
+return [check_runtime_nocache offload_device_any_intel_mic {
+  #include "testsuite/libgomp.c-c++-common/on_device_arch.h"
+  int main ()
+	{
+	  return !any_device_arch_intel_mic ();
+	}
+} ]
+}
+
 # Return 1 if the OpenACC 'host' device type is selected.
 
 proc check_effective_target_openacc_host_selected { } {
diff --git a/libgomp/testsuite/libgomp.c-c++-common/on_device_arch.h b/libgomp/testsuite/libgomp.c-c++-common/on_device_arch.h
index 

Re: [PATCH] Mass rename of C++ .c files to .cc suffix

2022-01-13 Thread Martin Jambor
On Thu, Jan 13 2022, Jakub Jelinek via Gcc-patches wrote:
> On Thu, Jan 13, 2022 at 12:20:57PM +0100, Martin Liška wrote:
>> On 1/13/22 12:14, Richard Biener wrote:
>> > But please make sure all intermediate revs will still build.
>> 
>> That's not possible :) I don't think it's a good idea to mix .cc renaming
>> and changes in those files.
>
> I think it is possible, but would require more work.
> Comments in the files don't matter for sure, and in the Makefiles we
> could do (just one random file can be checked):
> ifeq (,$(wildcard $(srcdir)/expr.cc))
> what we used to do
> else
> what we want to do newly
> endif
> A commit that changes the Makefiles that way comes first, then
> the renaming commit, then a commit that removes those ifeq ... else
> and endif lines.
>

I would expect that the problematic case is only when you modify a file
that you also rename.  Is there any such file where we do more than
adjust comments, where the contents modifications are essential for
bootstrap too?

I would expect that modifications in Makefiles, configure-scripts etc
could go in the same commit as the renames and these could be then
followed up with comments adjustments and similar.

But it would be more work, so I guess just using git bisect skip if
bisection ever lands in the middle of this is acceptable in this special
case too.

Martin



Merge 'c-c++-common/goacc/routine-6.c' into 'c-c++-common/goacc/routine-5.c', and document current C/C++ difference (was: [PATCH] openacc: Fix up C++ #pragma acc routine handling [PR101731])

2022-01-13 Thread Thomas Schwinge
Hi!

On 2021-11-22T16:02:31+0100, Jakub Jelinek via Gcc-patches 
 wrote:
> On Mon, Nov 22, 2021 at 03:49:42PM +0100, Thomas Schwinge wrote:
>> Then, regarding the user-visible behavior:
>>
>> > +#pragma acc routine  /* { dg-error "not immediately followed by a single 
>> > function declaration or definition" "" { target c++ } } */
>> > +int foo (int bar ());
>>
>> So in C++ we now refuse, but in C we do accept this.  I suppose I shall
>> look into making C behave the same way -- unless there is a reason for
>> the different behavior?  And/or, is it actually useful to allow such
>> nested usage?  Per its associated clauses, an OpenACC 'routine' directive
>> really is meant to apply to one function only, in contrast to OpenMP
>> 'target declare'.  But the question is whether we should raise an error
>> for the example above, or whether the 'routine' shall just apply to 'foo'
>> but not 'bar', but without an error diagnostic?
>
> All I've verified is that our OpenMP code handles it the same way,

Thanks for the explanation.

Pushed to master branch commit 67fdcc8835665b5bc13652205e815e498d65c5a1
"Merge 'c-c++-common/goacc/routine-6.c' into
'c-c++-common/goacc/routine-5.c', and document current C/C++ difference",
see attached.


Grüße
 Thomas


> i.e.
> #pragma omp declare simd
> int foo (int bar ());
> is accepted in C and rejected in C++.
> I guess one question is to check if it is in both languages actually
> the same thing.  If we want to accept it in C++ and let the pragma
> apply only to the outer declaration, I guess we'd need to temporarily
> set to NULL parser->omp_declare_simd and parser->oacc_routine while
> parsing the parameters of a function declaration or definition.
> At least OpenMP is fairly fuzzy here, the reason we error on
> #pragma omp declare simd
> int foo (), i;
> has been mainly some discussions in the lang committee and the fact
> that it talks about a single declaration, not all affected declarations.
> Whether int foo (int bar ()); should be in that light treated as two
> function declarations or one with another one nested in it and irrelevant
> for it is unclear.
>
>   Jakub


>From 67fdcc8835665b5bc13652205e815e498d65c5a1 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 22 Nov 2021 16:09:09 +0100
Subject: [PATCH] Merge 'c-c++-common/goacc/routine-6.c' into
 'c-c++-common/goacc/routine-5.c', and document current C/C++ difference

	gcc/testsuite/
	* c-c++-common/goacc/routine-6.c: Merge into...
	* c-c++-common/goacc/routine-5.c: ... this, and document current
	C/C++ difference.
---
 gcc/testsuite/c-c++-common/goacc/routine-5.c | 8 
 gcc/testsuite/c-c++-common/goacc/routine-6.c | 4 
 2 files changed, 8 insertions(+), 4 deletions(-)
 delete mode 100644 gcc/testsuite/c-c++-common/goacc/routine-6.c

diff --git a/gcc/testsuite/c-c++-common/goacc/routine-5.c b/gcc/testsuite/c-c++-common/goacc/routine-5.c
index e3fbd6573b8..94678f2bf5b 100644
--- a/gcc/testsuite/c-c++-common/goacc/routine-5.c
+++ b/gcc/testsuite/c-c++-common/goacc/routine-5.c
@@ -94,6 +94,14 @@ typedef struct c_2 c_2;
 #pragma acc routine /* { dg-error ".#pragma acc routine. not immediately followed by function declaration or definition" } */
 struct d_2 {} d_2;
 
+/* PR c++/101731 */
+/* Regarding the current C/C++ difference, see
+   .  */
+#pragma acc routine /* { dg-error "not immediately followed by a single function declaration or definition" "" { target c++ } } */
+int pr101731_foo (int pr101731_bar ());
+#pragma acc routine (pr101731_foo) vector /* { dg-error "has already been marked with an OpenACC 'routine' directive" "" { target c } } */
+#pragma acc routine (pr101731_bar) vector /* { dg-error "'pr101731_bar' has not been declared" } */
+
 #pragma acc routine /* { dg-error ".#pragma acc routine. not immediately followed by function declaration or definition" } */
 #pragma acc routine
 int fn4 (void);
diff --git a/gcc/testsuite/c-c++-common/goacc/routine-6.c b/gcc/testsuite/c-c++-common/goacc/routine-6.c
deleted file mode 100644
index 0a231a015a7..000
--- a/gcc/testsuite/c-c++-common/goacc/routine-6.c
+++ /dev/null
@@ -1,4 +0,0 @@
-/* PR c++/101731 */
-
-#pragma acc routine	/* { dg-error "not immediately followed by a single function declaration or definition" "" { target c++ } } */
-int foo (int bar ());
-- 
2.34.1



Re: [PATCH] PR fortran/67804 - ICE on data initialization of type(character) with wrong data

2022-01-13 Thread Mikael Morin

On 12/01/2022 at 21:29, Harald Anlauf via Fortran wrote:
> Dear Fortranners,
>
> the attached patch improves error recovery after an invalid
> structure constructor has been detected in a DATA statement.
>
> Testcase by Gerhard.
>
> Regtested on x86_64-pc-linux-gnu.  OK for mainline?
>
> This should be a rather safe patch which I would like to
> backport to 11-branch after a suitable waiting period.


OK; thanks.


Re: [PATCH] Mass rename of C++ .c files to .cc suffix

2022-01-13 Thread Jakub Jelinek via Gcc-patches
On Thu, Jan 13, 2022 at 12:20:57PM +0100, Martin Liška wrote:
> On 1/13/22 12:14, Richard Biener wrote:
> > But please make sure all intermediate revs will still build.
> 
> That's not possible :) I don't think it's a good idea to mix .cc renaming
> and changes in those files.

I think it is possible, but would require more work.
Comments in the files don't matter for sure, and in the Makefiles we
could do (just one random file can be checked):
ifeq (,$(wildcard $(srcdir)/expr.cc))
what we used to do
else
what we want to do newly
endif
A commit that changes the Makefiles that way comes first, then
the renaming commit, then a commit that removes those ifeq ... else
and endif lines.
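
The selection logic of that transitional conditional can be modelled in a
few lines of Python.  This is only a sketch of the idea, not part of any
build system: pick_rule is a hypothetical helper, and probing a single
file (expr.cc) follows the "just one random file" remark above.

```python
import os
import tempfile


def pick_rule(srcdir, probe="expr.cc"):
    """Return "new" when the renamed probe file exists in srcdir, else "old",
    like testing whether $(wildcard $(srcdir)/expr.cc) is empty."""
    return "new" if os.path.exists(os.path.join(srcdir, probe)) else "old"


with tempfile.TemporaryDirectory() as srcdir:
    before = pick_rule(srcdir)                           # no expr.cc yet
    open(os.path.join(srcdir, "expr.cc"), "w").close()   # the rename lands
    after = pick_rule(srcdir)
```

With such a check in place first, a tree should build both before and
after the renaming commit, and the check can be dropped afterwards.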

Jakub



Re: [PATCH] disable aggressive_loop_optimizations until niter ready

2022-01-13 Thread guojiufu via Gcc-patches

On 2022-01-03 22:30, Richard Biener wrote:
> On Wed, 22 Dec 2021, Jiufu Guo wrote:
>
>> Hi,
>>
>> Normally, estimate_numbers_of_iterations gets/calculates niter first,
>> and then invokes infer_loop_bounds_from_undefined. But in some cases,
>> after a few call stacks, estimate_numbers_of_iterations is invoked before
>> niter is ready (e.g. before number_of_latch_executions returns).
>>
>> e.g. number_of_latch_executions->...follow_ssa_edge_expr-->
>>   --> estimate_numbers_of_iterations -->
>> infer_loop_bounds_from_undefined.
>>
>> Since niter is still not computed, the call to infer_loop_bounds_from_undefined
>> may not get the final result.
>> To avoid infer_loop_bounds_from_undefined being called with interim state
>> and generating interim data while niter is being computed, we could
>> disable flag_aggressive_loop_optimizations.
>>
>> Bootstrap and regtest pass on ppc64* and x86_64.  Is this ok for trunk?
>
> So this is an optimality fix, not a correctness one?  I suppose the
> estimates are computed/used from scev_probably_wraps_p via
> loop_exits_before_overflow and ultimately chrec_convert.
>
> We have a call cycle here,
>
> estimate_numbers_of_iterations -> number_of_latch_executions ->
> ... -> estimate_numbers_of_iterations
>
> where the first estimate_numbers_of_iterations will make sure
> the later call will immediately return.


Hi Richard,
Thanks for your comments! And sorry for the late reply.

In estimate_numbers_of_iterations, there is a guard to make sure
the second call to estimate_numbers_of_iterations returns
immediately.

Exactly as you said, it relates to scev_probably_wraps_p calls
loop_exits_before_overflow.

The issue is: the first call to estimate_numbers_of_iterations
may be inside number_of_latch_executions.



> I'm not sure what your patch tries to do - it seems to tackle
> the case where we enter the cycle via number_of_latch_executions?
> Why do we get "non-final" values?  idx_infer_loop_bounds resorts


Right, when the call cycle starts from number_of_latch_execution,
the issue may occur:

number_of_latch_executions(*1st call)->..->
analyze_scalar_evolution(IVs 1st) ->..follow_ssa_edge_expr..->
loop_exits_before_overflow->
estimate_numbers_of_iterations (*1st call)->
number_of_latch_executions(*2nd call)->..->
analyze_scalar_evolution(IVs 2nd)->..loop_exits_before_overflow-> 
estimate_numbers_of_iterations(*2nd call)


The second call to estimate_numbers_of_iterations returns quickly.
Then, in the first call to estimate_numbers_of_iterations,
infer_loop_bounds_from_undefined is invoked.

The function "infer_loop_bounds_from_undefined" instantiates/analyzes
the SCEV for each SSA name in the loop.
*Here the issue occurs*: these SCEVs are based on the interim IV
SCEVs that come from "analyze_scalar_evolution(IVs 2nd)",
and those IV SCEVs will be overridden by the upper-level
"analyze_scalar_evolution(IVs 1st)".

To handle this issue, disabling flag_aggressive_loop_optimizations
inside number_of_latch_executions is one method.
To avoid the issue in other cases, e.g. when the call cycle starts from
number_of_iterations_exit or number_of_iterations_exit_assumptions,
this patch disables flag_aggressive_loop_optimizations inside
number_of_iterations_exit_assumptions.
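
The save/restore idea described above can be sketched in isolation. This is only a minimal illustration under stated assumptions, not the GCC code: `aggressive_flag`, `estimate_step` and `compute_niter` are hypothetical stand-ins for flag_aggressive_loop_optimizations and the niter routines. What it tries to show is that the flag is cleared for the duration of the computation and restored on every exit path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for flag_aggressive_loop_optimizations.  */
static int aggressive_flag = 1;

/* Records whether the "aggressive" analysis ran during the last call.  */
static bool aggressive_analysis_ran = false;

/* Hypothetical helper that may be reached while the iteration count
   is still being computed.  It only does its work if the flag is set.  */
static void
estimate_step (void)
{
  if (aggressive_flag)
    aggressive_analysis_ran = true;
}

/* Compute a (fake) iteration count with the flag temporarily cleared.
   Negative bounds are treated as "not analyzable" in this sketch.  */
static bool
compute_niter (int bound, int *niter)
{
  assert (niter != NULL);

  int saved_flag = aggressive_flag;
  bool ok;

  aggressive_flag = 0;
  aggressive_analysis_ran = false;

  estimate_step ();
  if (bound < 0)
    ok = false;			/* Failure path.  */
  else
    {
      *niter = bound;
      ok = true;
    }

  /* Single restore point shared by both paths.  */
  aggressive_flag = saved_flag;
  return ok;
}
```

Funnelling all paths through one restore point is a design choice of this sketch; the patch instead restores the flag before each individual return.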

Thanks again.

BR,
Jiufu


to SCEV and thus may recurse again - to me it would be more
logical to try avoid recursing in number_of_latch_executions by
setting ->nb_iterations to something early, maybe chrec_dont_know,
to signal we're using something we're just trying to compute.

Richard.
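
The alternative suggested above, publishing an "unknown" value early so that a re-entrant query returns immediately, might look roughly like the following. It is a toy sketch under stated assumptions: `struct toy_loop`, `NITER_UNSET`, `NITER_UNKNOWN`, `toy_analyze` and `toy_number_of_latch_executions` are hypothetical names, with `NITER_UNKNOWN` playing the role of chrec_dont_know.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel playing the role of chrec_dont_know.  */
#define NITER_UNKNOWN (-1L)
/* Marks a cache slot that has not been computed yet.  */
#define NITER_UNSET (-2L)

struct toy_loop
{
  long nb_iterations;	   /* Cached result, NITER_UNSET until computed.  */
  long bound;		   /* Input the fake analysis works from.  */
  int expensive_entries;   /* How often the expensive path was entered.  */
};

static long toy_number_of_latch_executions (struct toy_loop *loop);

/* Stand-in for an analysis that may call back into the niter query.  */
static long
toy_analyze (struct toy_loop *loop)
{
  long inner = toy_number_of_latch_executions (loop);
  /* A re-entrant query sees the sentinel and must not rely on it.  */
  return inner == NITER_UNKNOWN ? loop->bound : inner;
}

static long
toy_number_of_latch_executions (struct toy_loop *loop)
{
  assert (loop != NULL);

  /* Either a final value or the "being computed" sentinel.  */
  if (loop->nb_iterations != NITER_UNSET)
    return loop->nb_iterations;

  /* Publish the sentinel before doing anything that may recurse.  */
  loop->nb_iterations = NITER_UNKNOWN;
  loop->expensive_entries++;
  loop->nb_iterations = toy_analyze (loop);
  return loop->nb_iterations;
}
```

With a loop initialised as `{ NITER_UNSET, 7, 0 }`, the first query enters the expensive path once, the nested query returns the sentinel at once, and later queries are served from the cache.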


BR,
Jiufu

gcc/ChangeLog:

* tree-ssa-loop-niter.c (number_of_iterations_exit_assumptions):
Disable/restore flag_aggressive_loop_optimizations.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/scev-16.c: New test.

---
 gcc/tree-ssa-loop-niter.c   | 23 +++
 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c | 20 
 2 files changed, 39 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c

diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 06954e437f5..51bb501019e 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -2534,18 +2534,31 @@ number_of_iterations_exit_assumptions (class loop *loop, edge exit,
   && !POINTER_TYPE_P (type))
 return false;

+  /* Before niter is calculated, avoid to analyze interim state. */
+  int old_aggressive_loop_optimizations = flag_aggressive_loop_optimizations;
+  flag_aggressive_loop_optimizations = 0;
+
   tree iv0_niters = NULL_TREE;
   if (!simple_iv_with_niters (loop, loop_containing_stmt (stmt),
  op0, &iv0, safe ? &iv0_niters : NULL, false))
-return number_of_iterations_popcount (loop, exit, code, niter);
+{
+  bool res = number_of_iterations_popcount (loop, exit, code, niter);
+  flag_aggressive_loop_optimizations = old_aggressive_loop_optimizations;
+  return res;
+}
   tree iv1_niters = NULL_TREE;
   if (!simple_iv_with_niters (loop, 

Re: [PATCH] Mass rename of C++ .c files to .cc suffix

2022-01-13 Thread Martin Liška

On 1/13/22 12:14, Richard Biener wrote:

But please make sure all intermediate revs will still build.


That's not possible :) I don't think it's a good idea to mix the .cc renaming
with changes in those files.

Martin


Re: [PATCH] Mass rename of C++ .c files to .cc suffix

2022-01-13 Thread Richard Biener via Gcc-patches
On Thu, Jan 13, 2022 at 11:59 AM Martin Liška  wrote:
>
> On 1/13/22 11:47, Martin Jambor wrote:
> > Hi,
> >
> > On Tue, Jan 11 2022, Martin Liška wrote:
> >> Hello.
> >>
> >> I've got a patch series that does the renaming. It consists of 2 automatic
> >> scripts ([1] and [2]) that were run as:
> >>
> >> $ gcc-renaming-candidates.py gcc --rename && git commit -a -m 'Rename files.' && rename-gcc.py . -vv && git commit -a -m 'Automatic renaming'
> >>
> >> The first script does the renaming (with a couple of exceptions that are
> >> really C files) and saves the names of the renamed files to a file. That
> >> file is then loaded and references to the renamed files are replaced in
> >> most of the GCC files ([2]). It basically replaces \b${old_filename}\b
> >> with ${old_filename}c (with some exceptions). That corresponds to
> >> patches #1 and #2, which are quite huge.
> >>
> >> The last piece is the manual changes needed for Makefile.in, configure.ac
> >> and so on.
> >>
> >> The git branch can be seen here:
> >> https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=log;h=refs/users/marxin/heads/cc-renaming
> >>
> >> and pulled with:
> >> $ git fetch refs/users/marxin/heads/cc-renaming
> >> $ git co FETCH_HEAD
> >>
> >
> > Thanks for the effort!  I looked at the branch and liked what I saw.
>
> Thanks.
>
> > Perhaps only a small nit about the commit message of the 2nd commit
> > ("Automatic renaming of .c files to .cc.") which confused me.  It does
> > not actually rename any files so I would change it to "change references
> > to .c files to .cc files" or something like that.
>
> Sure, I'm going to update the commit message.
>
> >
> > But I assume the branch will need to be committed squashed anyway, so
> > commit message worries might be a bit premature.
>
> No, I would like to commit it as 3 separate commits for these reasons:
> - git renaming with 100% match should guarantee git would fully work with
> merging and stuff like that
> - I would like to distinguish manual changes from those that are only a
> mechanical replacement.

But please make sure all intermediate revs will still build.

Richard.

> Cheers,
> Martin
>
> >
> > I am looking forward to seeing it in trunk.
> >
> > Martin
>


Re: [PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator

2022-01-13 Thread Andrew Stubbs
Updated patch: this version fixes some missed cases of malloc in the
realloc implementation. It also reworks the unused variable workarounds
so that they work better with my reworked pinned memory patches, which I
have not posted yet.


Andrew

libgomp, nvptx: low-latency memory allocator

This patch adds support for allocating low-latency ".shared" memory on
NVPTX GPU device, via the omp_low_lat_mem_space and omp_alloc.  The memory
can be allocated, reallocated, and freed using a basic but fast algorithm,
is thread safe and the size of the low-latency heap can be configured using
the GOMP_NVPTX_LOWLAT_POOL environment variable.

The use of the PTX dynamic_smem_size feature means that the minimum version
requirement is now bumped to 4.1 (still old at this point).

libgomp/ChangeLog:

* allocator.c (MEMSPACE_ALLOC): New macro.
(MEMSPACE_CALLOC): New macro.
(MEMSPACE_REALLOC): New macro.
(MEMSPACE_FREE): New macro.
(dynamic_smem_size): New constants.
(omp_alloc): Use MEMSPACE_ALLOC.
Implement fall-backs for predefined allocators.
(omp_free): Use MEMSPACE_FREE.
(omp_calloc): Use MEMSPACE_CALLOC.
Implement fall-backs for predefined allocators.
(omp_realloc): Use MEMSPACE_REALLOC and MEMSPACE_ALLOC.
Implement fall-backs for predefined allocators.
* config/nvptx/team.c (__nvptx_lowlat_heap_root): New variable.
(__nvptx_lowlat_pool): New asm variable.
(gomp_nvptx_main): Initialize the low-latency heap.
* plugin/plugin-nvptx.c (lowlat_pool_size): New variable.
(GOMP_OFFLOAD_init_device): Read the GOMP_NVPTX_LOWLAT_POOL envvar.
(GOMP_OFFLOAD_run): Apply lowlat_pool_size.
* config/nvptx/allocator.c: New file.
* testsuite/libgomp.c/allocators-1.c: New test.
* testsuite/libgomp.c/allocators-2.c: New test.
* testsuite/libgomp.c/allocators-3.c: New test.
* testsuite/libgomp.c/allocators-4.c: New test.
* testsuite/libgomp.c/allocators-5.c: New test.
* testsuite/libgomp.c/allocators-6.c: New test.

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index 07a5645f4cc..1cc7486fc4c 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -34,6 +34,34 @@
 
 #define omp_max_predefined_alloc omp_thread_mem_alloc
 
+/* These macros may be overridden in config//allocator.c.  */
+#ifndef MEMSPACE_ALLOC
+#define MEMSPACE_ALLOC(MEMSPACE, SIZE) malloc (SIZE)
+#endif
+#ifndef MEMSPACE_CALLOC
+#define MEMSPACE_CALLOC(MEMSPACE, SIZE) calloc (1, SIZE)
+#endif
+#ifndef MEMSPACE_REALLOC
+#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE) realloc (ADDR, SIZE)
+#endif
+#ifndef MEMSPACE_FREE
+#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) free (ADDR)
+#endif
+
+/* Map the predefined allocators to the correct memory space.
+   The index to this table is the omp_allocator_handle_t enum value.  */
+static const omp_memspace_handle_t predefined_alloc_mapping[] = {
+  omp_default_mem_space,   /* omp_null_allocator. */
+  omp_default_mem_space,   /* omp_default_mem_alloc. */
+  omp_large_cap_mem_space, /* omp_large_cap_mem_alloc. */
+  omp_default_mem_space,   /* omp_const_mem_alloc. */
+  omp_high_bw_mem_space,   /* omp_high_bw_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_low_lat_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_cgroup_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_pteam_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_thread_mem_alloc. */
+};
+
 struct omp_allocator_data
 {
   omp_memspace_handle_t memspace;
@@ -281,7 +309,7 @@ retry:
   allocator_data->used_pool_size = used_pool_size;
   gomp_mutex_unlock (&allocator_data->lock);
 #endif
-  ptr = malloc (new_size);
+  ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size);
   if (ptr == NULL)
{
 #ifdef HAVE_SYNC_BUILTINS
@@ -297,7 +325,11 @@ retry:
 }
   else
 {
-  ptr = malloc (new_size);
+  omp_memspace_handle_t memspace __attribute__((unused))
+   = (allocator_data
+  ? allocator_data->memspace
+  : predefined_alloc_mapping[allocator]);
+  ptr = MEMSPACE_ALLOC (memspace, new_size);
   if (ptr == NULL)
goto fail;
 }
@@ -315,32 +347,35 @@ retry:
   return ret;
 
 fail:
-  if (allocator_data)
+  int fallback = (allocator_data
+ ? allocator_data->fallback
+ : allocator == omp_default_mem_alloc
+ ? omp_atv_null_fb
+ : omp_atv_default_mem_fb);
+  switch (fallback)
 {
-  switch (allocator_data->fallback)
+case omp_atv_default_mem_fb:
+  if ((new_alignment > sizeof (void *) && new_alignment > alignment)
+ || (allocator_data
+ && allocator_data->pool_size < ~(uintptr_t) 0)
+ || !allocator_data)
{
-   case omp_atv_default_mem_fb:
- if ((new_alignment > sizeof (void *) && new_alignment > alignment)
- || (allocator_data
- && allocator_data->pool_size < 

[ANNOUNCEMENT] Mass rename of C++ .c files to .cc suffix is going to happen on Jan 17 evening UTC TZ

2022-01-13 Thread Martin Liška

Hello.

Based on the discussion with release managers, the change is going to happen
after stage4 begins.

Martin


Re: [PATCH] [12/11/10] Fix invalid format warnings on Windows

2022-01-13 Thread Tomas Kalibera via Gcc-patches

On 1/13/22 10:40 AM, Martin Liška wrote:

[...]
Apart from that, I support the patch (I cannot approve it). Note we're
now approaching stage4 and this is definitely stage1 material (opens
after GCC 12.1.0 gets released).


Thanks, Martin, I've updated the patch following your suggestions.

Cheers
Tomas




Cheers,
Martin

>From 4db4e6b35be5793902d8820d2c8e4d1f1cbba80d Mon Sep 17 00:00:00 2001
From: Tomas Kalibera 
Date: Thu, 13 Jan 2022 05:25:32 -0500
Subject: [PATCH] c-family: Let stdio.h override built in printf format
 [PR95130,PR92292]

Mingw32 targets use ms_printf format for printf, but mingw-w64 when
configured for UCRT uses gnu_format (via stdio.h).  GCC then checks both
formats, which means that one cannot print a 64-bit integer without a
warning.  All these lines issue a warning:

  printf("Hello %"PRIu64"\n", x);
  printf("Hello %I64u\n", x);
  printf("Hello %llu\n", x);

because each of them violates one of the formats.  Also, one gets a warning
twice if the format string violates both formats.

Fixed by disabling the built in format in case there are additional ones.
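
The selection rule can be modelled outside GCC as follows. This is a simplified sketch under stated assumptions, not the code in c-format.c: `struct toy_attr`, `is_format` and `formats_to_check` are hypothetical, attributes are a plain linked list instead of a TREE_CHAIN, and the built-in format is assumed to come first in the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified attribute node.  */
struct toy_attr
{
  const char *name;	   /* E.g. "format" or "nonnull".  */
  const char *archetype;   /* E.g. "ms_printf" or "gnu_printf".  */
  struct toy_attr *next;
};

static bool
is_format (const struct toy_attr *a)
{
  return strcmp (a->name, "format") == 0;
}

/* Store the archetypes that would be checked into OUT (at most MAX)
   and return how many were stored.  For a built-in function, the first
   format attribute is skipped if another format attribute follows.  */
static size_t
formats_to_check (const struct toy_attr *attrs, bool is_builtin,
		  const char **out, size_t max)
{
  assert (out != NULL || max == 0);

  size_t n = 0;
  bool skipped_default = false;

  for (const struct toy_attr *a = attrs; a; a = a->next)
    {
      if (!is_format (a))
	continue;

      if (!skipped_default && is_builtin)
	{
	  /* Look for a later format attribute, e.g. one from stdio.h.  */
	  for (const struct toy_attr *aa = a->next; aa; aa = aa->next)
	    if (is_format (aa))
	      {
		skipped_default = true;
		break;
	      }
	  if (skipped_default)
	    continue;
	}

      if (n < max)
	out[n++] = a->archetype;
    }
  return n;
}
```

For a built-in function with the list ms_printf, gnu_printf, only gnu_printf is reported; a non-built-in function with the same list reports both, and a single format attribute is always reported.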

gcc/c-family/ChangeLog:

	PR c/95130
	PR c/92292

	* c-common.c (check_function_arguments): Pass also function
	  declaration to check_function_format.

	* c-common.h (check_function_format): Extra argument - function
	  declaration.

	* c-format.c (check_function_format): For builtin functions with a
	  built in format and at least one more, do not check the first one.
---
 gcc/c-family/c-common.c |  2 +-
 gcc/c-family/c-common.h |  2 +-
 gcc/c-family/c-format.c | 32 ++--
 3 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 4a6a4edb763..00fc734d28e 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -6064,7 +6064,7 @@ check_function_arguments (location_t loc, const_tree fndecl, const_tree fntype,
   /* Check for errors in format strings.  */
 
   if (warn_format || warn_suggest_attribute_format)
-check_function_format (fntype, TYPE_ATTRIBUTES (fntype), nargs, argarray,
+check_function_format (fndecl, fntype, TYPE_ATTRIBUTES (fntype), nargs, argarray,
 			   arglocs);
 
   if (warn_format)
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 8b7bf35e888..ee370eafbbc 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -856,7 +856,7 @@ extern void check_function_arguments_recurse (void (*)
 	  unsigned HOST_WIDE_INT);
 extern bool check_builtin_function_arguments (location_t, vec<location_t>,
 	  tree, tree, int, tree *);
-extern void check_function_format (const_tree, tree, int, tree *,
+extern void check_function_format (const_tree, const_tree, tree, int, tree *,
    vec<location_t> *);
 extern bool attribute_fallthrough_p (tree);
 extern tree handle_format_attribute (tree *, tree, tree, int, bool *);
diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index afa77810a5c..bc2abee5146 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -1160,12 +1160,13 @@ decode_format_type (const char *s, bool *is_raw /* = NULL */)
attribute themselves.  */
 
 void
-check_function_format (const_tree fntype, tree attrs, int nargs,
+check_function_format (const_tree fndecl, const_tree fntype, tree attrs, int nargs,
 		   tree *argarray, vec<location_t> *arglocs)
 {
-  tree a;
+  tree a, aa;
 
   tree atname = get_identifier ("format");
+  bool skipped_default_format = false;
 
   /* See if this function has any format attributes.  */
   for (a = attrs; a; a = TREE_CHAIN (a))
@@ -1176,6 +1177,33 @@ check_function_format (const_tree fntype, tree attrs, int nargs,
 	  function_format_info info;
 	  decode_format_attr (fntype, atname, TREE_VALUE (a), &info,
 			  /*validated=*/true);
+
+	  /* Mingw32 targets have traditionally used ms_printf format for the
+	 printf function, and this format is built in GCC. But nowadays,
+	 if mingw-w64 is configured to target UCRT, the printf function
+	 uses the gnu_printf format (specified in the stdio.h header). This
+	 causes GCC to check both formats, which means that there is no way
+	 to e.g. print a long long unsigned without a warning (ms_printf
+	 warns for %llu and gnu_printf warns for %I64u). Also, GCC would warn
+	 twice about the same issue when both formats are violated, e.g.
+	 for %lu used to print long long unsigned.
+
+	 Hence, if there are multiple format specifiers, we skip the first
+	 one. See PR 95130, PR 92292.  */
+
+	  if (!skipped_default_format && fndecl)
+	{
+	  for(aa = TREE_CHAIN (a); aa; aa = TREE_CHAIN(aa))
+		if (is_attribute_p ("format", get_attribute_name (aa)) &&
+		fndecl && fndecl_built_in_p (fndecl, BUILT_IN_NORMAL))
+		  {
+			skipped_default_format = true;
+			break;
+		  }
+	  if (skipped_default_format)
+		continue;
+	}
+
 	  if (warn_format)
 	{
 	  /* FIXME: Rewrite all the internal functions in this file
-- 
2.25.1


