[PATCH 5/5] xtensa: Eliminate [DS]Cmode hard register clobber that is immediately followed by a whole-register overwrite

2022-06-13 Thread Takayuki 'January June' Suwa via Gcc-patches
RTL expansion of an assignment to a [DS]Cmode hard register includes an
obstructive register clobber.

The simplest example:

double _Complex test(double _Complex c) {
  return c;
}

will be converted to:

(set (reg:DF 42 [ c ]) (reg:DF 2 a2))
(set (reg:DF 43 [ c+8 ]) (reg:DF 4 a4))
(clobber (reg:DC 2 a2))
(set (reg:DF 2 a2) (reg:DF 42 [ c ]))
(set (reg:DF 4 a4) (reg:DF 43 [ c+8 ]))
(use (reg:DC 2 a2))
(return)

and then finally:

test:
mov a8, a2
mov a9, a3
mov a6, a4
mov a7, a5
mov a2, a8
mov a3, a9
mov a4, a6
mov a5, a7
ret

As you can see, every one of these moves is redundant.

This patch eliminates such clobbers so that the optimizer can prune away
the wasted move instructions:

test:
ret

gcc/ChangeLog:

* config/xtensa/xtensa.md (DSC): New split pattern and mode iterator.
---
 gcc/config/xtensa/xtensa.md | 28 
 1 file changed, 28 insertions(+)

diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index e6f5594762f..3b3be5f8436 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -86,6 +86,10 @@
 ;; This code iterator is for *shlrd and its variants.
 (define_code_iterator ior_op [ior plus])
 
+;; This mode iterator allows the DC and SC patterns to be defined from
+;; the same template.
+(define_mode_iterator DSC [DC SC])
+
 
 ;; Attributes.
 
@@ -2755,3 +2759,27 @@
   operands[6] = gen_rtx_MEM (SFmode, XEXP (operands[6], 0));
   operands[7] = gen_rtx_MEM (SFmode, XEXP (operands[7], 0));
 })
+
+(define_split
+  [(clobber (match_operand:DSC 0 "register_operand"))]
+  "GP_REG_P (REGNO (operands[0]))"
+  [(const_int 0)]
+{
+  unsigned int regno = REGNO (operands[0]);
+  machine_mode inner_mode = GET_MODE_INNER (<MODE>mode);
+  rtx_insn *insn;
+  rtx x;
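+  /* Only delete the clobber when the two insns that immediately follow
+     fully overwrite both halves of the clobbered register.  */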
+  if (! ((insn = next_nonnote_nondebug_insn (curr_insn))
+         && NONJUMP_INSN_P (insn)
+         && GET_CODE (x = PATTERN (insn)) == SET
+         && REG_P (x = XEXP (x, 0))
+         && GET_MODE (x) == inner_mode
+         && REGNO (x) == regno
+         && (insn = next_nonnote_nondebug_insn (insn))
+         && NONJUMP_INSN_P (insn)
+         && GET_CODE (x = PATTERN (insn)) == SET
+         && REG_P (x = XEXP (x, 0))
+         && GET_MODE (x) == inner_mode
+         && REGNO (x) == regno + REG_NREGS (operands[0]) / 2))
+    FAIL;
+})
-- 
2.20.1


[PATCH 4/5] xtensa: Eliminate unwanted reg-reg moves during DFmode input reloads

2022-06-13 Thread Takayuki 'January June' Suwa via Gcc-patches
When spilled DFmode registers are reloaded in, they are first loaded into a
pair of SImode regs and then copied from those regs.  Such unwanted reg-reg
moves do not seem to be eliminated at the "cprop_hardreg" stage, although
output reloads have no such problem.

Luckily it is easy to resolve such inefficiencies with a peephole2 pattern.
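
For illustration (my sketch, not from the original mail; the register choices
are invented), the new peephole2 rewrites an input-reload sequence such as:

(set (reg:SI a10) (mem/c:SI (plus:SI (reg:SI a1) (const_int 16))))
(set (reg:SI a11) (mem/c:SI (plus:SI (reg:SI a1) (const_int 20))))
(set (reg:SF f0) (reg:SF a10))
(set (reg:SF f1) (reg:SF a11))

into a pair of direct SFmode loads, provided a10 and a11 are dead afterwards:

(set (reg:SF f0) (mem/c:SF (plus:SI (reg:SI a1) (const_int 16))))
(set (reg:SF f1) (mem/c:SF (plus:SI (reg:SI a1) (const_int 20))))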

gcc/ChangeLog:

* config/xtensa/predicates.md (reload_operand):
New predicate.
* config/xtensa/xtensa.md: New peephole2 pattern.
---
 gcc/config/xtensa/predicates.md | 13 +
 gcc/config/xtensa/xtensa.md | 31 +++
 2 files changed, 44 insertions(+)

diff --git a/gcc/config/xtensa/predicates.md b/gcc/config/xtensa/predicates.md
index d63a6cf034c..edd13ae41b9 100644
--- a/gcc/config/xtensa/predicates.md
+++ b/gcc/config/xtensa/predicates.md
@@ -165,6 +165,19 @@
   (and (match_code "const_int")
(match_test "xtensa_mem_offset (INTVAL (op), SFmode)")))
 
+(define_predicate "reload_operand"
+  (match_code "mem")
+{
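+  /* Match a MEM addressed off the stack pointer (a1), either directly or
+     with a constant offset, i.e. a stack slot as used by reload.  */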
+  const_rtx addr = XEXP (op, 0);
+  if (REG_P (addr))
+    return REGNO (addr) == A1_REG;
+  if (GET_CODE (addr) == PLUS)
+    return REG_P (XEXP (addr, 0))
+           && REGNO (XEXP (addr, 0)) == A1_REG
+           && CONST_INT_P (XEXP (addr, 1));
+  return false;
+})
+
 (define_predicate "branch_operator"
   (match_code "eq,ne,lt,ge"))
 
diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index 9588a829136..e6f5594762f 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -2724,3 +2724,34 @@
(if_then_else (match_test "TARGET_DENSITY")
  (const_int 5)
  (const_int 6)))])
+
+(define_peephole2
+  [(set (match_operand:SI 0 "register_operand")
+   (match_operand:SI 6 "reload_operand"))
+   (set (match_operand:SI 1 "register_operand")
+   (match_operand:SI 7 "reload_operand"))
+   (set (match_operand:SF 2 "register_operand")
+   (match_operand:SF 4 "register_operand"))
+   (set (match_operand:SF 3 "register_operand")
+   (match_operand:SF 5 "register_operand"))]
+  "REGNO (operands[0]) == REGNO (operands[4])
+   && REGNO (operands[1]) == REGNO (operands[5])
+   && peep2_reg_dead_p (4, operands[0])
+   && peep2_reg_dead_p (4, operands[1])"
+  [(set (match_dup 2)
+   (match_dup 6))
+   (set (match_dup 3)
+   (match_dup 7))]
+{
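+  /* Bail out unless operands 0..3 are four distinct registers.  */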
+  uint32_t check = 0;
+  int i;
+  for (i = 0; i <= 3; ++i)
+    {
+      uint32_t mask = (uint32_t)1 << REGNO (operands[i]);
+      if (check & mask)
+        FAIL;
+      check |= mask;
+    }
+  operands[6] = gen_rtx_MEM (SFmode, XEXP (operands[6], 0));
+  operands[7] = gen_rtx_MEM (SFmode, XEXP (operands[7], 0));
+})
-- 
2.20.1


[PATCH 2/5] xtensa: Add support for sibling call optimization

2022-06-13 Thread Takayuki 'January June' Suwa via Gcc-patches
This patch introduces support for sibling call optimization when the call0
ABI is in effect.
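
For example (a hypothetical testcase of mine, not from the patch):

extern int foo (int);

int bar (int x)
{
  /* Tail call: with the call0 ABI and optimization enabled, this can now
     be emitted as a plain jump (e.g. "j.l foo, a9" or "jx") instead of a
     call followed by a return sequence.  */
  return foo (x + 1);
}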

gcc/ChangeLog:

* config/xtensa/xtensa-protos.h (xtensa_prepare_expand_call,
xtensa_emit_sibcall): New prototypes.
(xtensa_expand_epilogue): Add new argument that specifies whether
or not it is a sibling call.
* config/xtensa/xtensa.cc (TARGET_FUNCTION_OK_FOR_SIBCALL):
New macro definition.
(xtensa_prepare_expand_call): New function in order to share
the common code.
(xtensa_emit_sibcall, xtensa_function_ok_for_sibcall):
New functions.
(xtensa_expand_epilogue): Add new argument sibcall_p and use it
for sibling call handling.
* config/xtensa/xtensa.md (call, call_value):
Use xtensa_prepare_expand_call.
(call_internal, call_value_internal):
Add a condition so that they are disabled for sibling calls.
(sibcall, sibcall_value, sibcall_epilogue): New expansions.
(sibcall_internal, sibcall_value_internal): New insn patterns.

gcc/testsuite/ChangeLog:

* gcc.target/xtensa/sibcalls.c: New.
---
 gcc/config/xtensa/xtensa-protos.h  |  4 +-
 gcc/config/xtensa/xtensa.cc| 63 ++--
 gcc/config/xtensa/xtensa.md| 68 +-
 gcc/testsuite/gcc.target/xtensa/sibcalls.c | 15 +
 4 files changed, 130 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/xtensa/sibcalls.c

diff --git a/gcc/config/xtensa/xtensa-protos.h b/gcc/config/xtensa/xtensa-protos.h
index 168ad70710b..e020a332b03 100644
--- a/gcc/config/xtensa/xtensa-protos.h
+++ b/gcc/config/xtensa/xtensa-protos.h
@@ -53,7 +53,9 @@ extern void xtensa_expand_atomic (enum rtx_code, rtx, rtx, rtx, bool);
 extern void xtensa_emit_loop_end (rtx_insn *, rtx *);
 extern char *xtensa_emit_branch (bool, rtx *);
 extern char *xtensa_emit_movcc (bool, bool, bool, rtx *);
+extern void xtensa_prepare_expand_call (int, rtx *);
 extern char *xtensa_emit_call (int, rtx *);
+extern char *xtensa_emit_sibcall (int, rtx *);
 extern bool xtensa_tls_referenced_p (rtx);
 extern enum rtx_code xtensa_shlrd_which_direction (rtx, rtx);
 
@@ -73,7 +75,7 @@ extern int xtensa_dbx_register_number (int);
 extern long compute_frame_size (poly_int64);
 extern bool xtensa_use_return_instruction_p (void);
 extern void xtensa_expand_prologue (void);
-extern void xtensa_expand_epilogue (void);
+extern void xtensa_expand_epilogue (bool);
 extern void order_regs_for_local_alloc (void);
 extern enum reg_class xtensa_regno_to_class (int regno);
 extern HOST_WIDE_INT xtensa_initial_elimination_offset (int from, int to);
diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
index 58b6eb0b711..b97f37ac956 100644
--- a/gcc/config/xtensa/xtensa.cc
+++ b/gcc/config/xtensa/xtensa.cc
@@ -189,7 +189,7 @@ static bool xtensa_can_eliminate (const int from ATTRIBUTE_UNUSED,
  const int to);
 static HOST_WIDE_INT xtensa_starting_frame_offset (void);
 static unsigned HOST_WIDE_INT xtensa_asan_shadow_offset (void);
-
+static bool xtensa_function_ok_for_sibcall (tree, tree);
 static rtx xtensa_delegitimize_address (rtx);
 
 
@@ -347,6 +347,9 @@ static rtx xtensa_delegitimize_address (rtx);
 #undef TARGET_DELEGITIMIZE_ADDRESS
 #define TARGET_DELEGITIMIZE_ADDRESS xtensa_delegitimize_address
 
+#undef TARGET_FUNCTION_OK_FOR_SIBCALL
+#define TARGET_FUNCTION_OK_FOR_SIBCALL xtensa_function_ok_for_sibcall
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 
@@ -2127,6 +2130,20 @@ xtensa_emit_movcc (bool inverted, bool isfp, bool isbool, rtx *operands)
 }
 
 
+void
+xtensa_prepare_expand_call (int callop, rtx *operands)
+{
+  rtx addr = XEXP (operands[callop], 0);
+
+  if (flag_pic && SYMBOL_REF_P (addr)
+      && (!SYMBOL_REF_LOCAL_P (addr) || SYMBOL_REF_EXTERNAL_P (addr)))
+    addr = gen_sym_PLT (addr);
+
+  if (!call_insn_operand (addr, VOIDmode))
+    XEXP (operands[callop], 0) = copy_to_mode_reg (Pmode, addr);
+}
+
+
 char *
 xtensa_emit_call (int callop, rtx *operands)
 {
@@ -2145,6 +2162,24 @@ xtensa_emit_call (int callop, rtx *operands)
 }
 
 
+char *
+xtensa_emit_sibcall (int callop, rtx *operands)
+{
+  static char result[64];
+  rtx tgt = operands[callop];
+
+  if (GET_CODE (tgt) == CONST_INT)
+    sprintf (result, "j.l\t" HOST_WIDE_INT_PRINT_HEX ", a9",
+             INTVAL (tgt));
+  else if (register_operand (tgt, VOIDmode))
+    sprintf (result, "jx\t%%%d", callop);
+  else
+    sprintf (result, "j.l\t%%%d, a9", callop);
+
+  return result;
+}
+
+
 bool
 xtensa_legitimate_address_p (machine_mode mode, rtx addr, bool strict)
 {
@@ -3270,7 +3305,7 @@ xtensa_expand_prologue (void)
 }
 
 void
-xtensa_expand_epilogue (void)
+xtensa_expand_epilogue (bool sibcall_p)
 {
   if (!TARGET_WINDOWED_ABI)
 {
@@ -3304,10 +3339,13 @@ xtensa_expand_epilogue (void)
  if (xtensa_call_save_reg(regno))
{
  rtx x = gen_rtx_PLUS 

[PATCH 3/5] xtensa: Add some dedicated patterns that correspond to GIMPLE canonicalizations

2022-06-13 Thread Takayuki 'January June' Suwa via Gcc-patches
This patch offers better RTL representations than the straightforward
derivations from some tree optimizers' canonicalized forms (both cases are
illustrated after the list below).

- rounding up to even, such as '(x + (x & 1))', is canonicalized to
  '((x + 1) & -2)', but the former is one instruction shorter than the
  latter in the Xtensa ISA.
- signed greater than or equal to zero as a logical value, '((signed)x >= 0)',
  is canonicalized to '((unsigned)(x ^ -1) >> 31)', but the equivalent
  '(((signed)x >> 31) + 1)' is one instruction shorter.
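
For example (my illustration, not part of the original mail), the C sources
that give rise to these canonicalized forms:

int round_up_to_even (int x)
{
  return x + (x & 1);   /* canonicalized by GIMPLE to (x + 1) & -2 */
}

int signed_ge_zero (int x)
{
  return x >= 0;        /* canonicalized to ((unsigned)(x ^ -1)) >> 31 */
}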

gcc/ChangeLog:

* config/xtensa/xtensa.md (*round_up_to_even):
New insn-and-split pattern.
(*signed_ge_zero): Ditto.
---
 gcc/config/xtensa/xtensa.md | 45 +
 1 file changed, 45 insertions(+)

diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index 181f935e3c3..9588a829136 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -2679,3 +2679,48 @@
   xtensa_expand_atomic (<CODE>, operands[0], operands[1], operands[2], true);
   DONE;
 })
+
+(define_insn_and_split "*round_up_to_even"
+  [(set (match_operand:SI 0 "register_operand" "=a")
+   (and:SI (plus:SI (match_operand:SI 1 "register_operand" "r")
+(const_int 1))
+   (const_int -2)))]
+  ""
+  "#"
+  "can_create_pseudo_p ()"
+  [(set (match_dup 2)
+   (and:SI (match_dup 1)
+   (const_int 1)))
+   (set (match_dup 0)
+   (plus:SI (match_dup 2)
+(match_dup 1)))]
+{
+  operands[2] = gen_reg_rtx (SImode);
+}
+  [(set_attr "type""arith")
+   (set_attr "mode""SI")
+   (set (attr "length")
+   (if_then_else (match_test "TARGET_DENSITY")
+ (const_int 5)
+ (const_int 6)))])
+
+(define_insn_and_split "*signed_ge_zero"
+  [(set (match_operand:SI 0 "register_operand" "=a")
+   (ge:SI (match_operand:SI 1 "register_operand" "r")
+  (const_int 0)))]
+  ""
+  "#"
+  ""
+  [(set (match_dup 0)
+   (ashiftrt:SI (match_dup 1)
+(const_int 31)))
+   (set (match_dup 0)
+   (plus:SI (match_dup 0)
+(const_int 1)))]
+  ""
+  [(set_attr "type""arith")
+   (set_attr "mode""SI")
+   (set (attr "length")
+   (if_then_else (match_test "TARGET_DENSITY")
+ (const_int 5)
+ (const_int 6)))])
-- 
2.20.1


[PATCH 1/5] xtensa: Document new -mextra-l32r-costs= Xtensa-specific option

2022-06-13 Thread Takayuki 'January June' Suwa via Gcc-patches
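
A brief usage sketch (my example, not part of the original posting; the
toolchain prefix is hypothetical):

  xtensa-linux-gcc -O2 -mextra-l32r-costs=3 -c foo.c

A nonzero value tells the speed-oriented cost model that each L32R literal
load costs that many extra cycles, biasing constant materialization toward
synthesizing constants with a couple of arithmetic instructions instead.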


gcc/ChangeLog:
* doc/invoke.texi: Document -mextra-l32r-costs= option.
---
 gcc/doc/invoke.texi | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index b6c0305f198..2a3c638de40 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1473,7 +1473,8 @@ See RS/6000 and PowerPC Options.
 -mauto-litpools  -mno-auto-litpools @gol
 -mtarget-align  -mno-target-align @gol
 -mlongcalls  -mno-longcalls @gol
--mabi=@var{abi-type}}
+-mabi=@var{abi-type} @gol
+-mextra-l32r-costs=@var{cycles}}
 
 @emph{zSeries Options}
 See S/390 and zSeries Options.
@@ -33526,6 +33527,14 @@ by 8 registers on entry so that its arguments are found in registers
 pointer.  Register window is rotated 8 registers back upon return.
 When this version of the ABI is enabled the C preprocessor symbol
 @code{__XTENSA_WINDOWED_ABI__} is defined.
+
+@item -mextra-l32r-costs=@var{n}
+@opindex mextra-l32r-costs
+Specify an extra cost of instruction RAM/ROM access for @code{L32R}
+instructions, in clock cycles.  When optimizing for speed, this affects
+whether a constant is loaded from the literal pool using @code{L32R} or
+synthesized from a smaller one with a couple of arithmetic instructions.
+The default value is 0.
 @end table
 
 @node zSeries Options
-- 
2.20.1


Re: PING Re: [PATCH 07/10] value-relation.h: add 'final' and 'override' to relation_oracle vfunc impls

2022-06-13 Thread David Malcolm via Gcc-patches
On Mon, 2022-06-13 at 20:45 -0400, Aldy Hernandez wrote:
> Final implies we can't further derive from the derived class, right?

"final" on a vfunc implies that nothing overrides that vfunc, but you
could still have further derived classes.

You can think of it as "nothing further overrides this vfunc", as both
a hint to the human reader, and an optimization hint to the compiler.  

You can always just remove the "final" if you want to override it in
the future (unless the override is happening in a plugin, I suppose).
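
A minimal sketch (mine, not from the original thread):

  struct oracle { virtual void dump () const; };
  struct dom_like : oracle {
    void dump () const final override;  // nothing further overrides dump...
  };
  struct path_like : dom_like { };      // ...but deriving further is still OK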

> If so
> we may want just override.

"override" is indeed probably more useful, in that it documents to
compilers and human readers that you intend this to override a vfunc.

FWIW I wrote the patch by using both "final" and "override", and then
dropping the "final" everywhere I needed to to get it to compile.

Dave


> 
> Andrew, what are your thoughts?
> 
> Thanks for doing this.
> 
> Aldy
> 
> On Mon, Jun 13, 2022, 14:28 David Malcolm  wrote:
> 
> > Ping re this patch:
> >   https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595438.html
> > 
> > OK for trunk?
> > 
> > Thanks
> > Dave
> > 
> > 
> > On Mon, 2022-05-23 at 15:28 -0400, David Malcolm wrote:
> > > gcc/ChangeLog:
> > >     * value-relation.h: Add "final" and "override" to
> > > relation_oracle
> > >     vfunc implementations as appropriate.
> > > 
> > > Signed-off-by: David Malcolm 
> > > ---
> > >  gcc/value-relation.h | 38 +-
> > >  1 file changed, 21 insertions(+), 17 deletions(-)
> > > 
> > > diff --git a/gcc/value-relation.h b/gcc/value-relation.h
> > > index 19762d8ce2b..478729be0bf 100644
> > > --- a/gcc/value-relation.h
> > > +++ b/gcc/value-relation.h
> > > @@ -130,14 +130,15 @@ public:
> > >    equiv_oracle ();
> > >    ~equiv_oracle ();
> > > 
> > > -  const_bitmap equiv_set (tree ssa, basic_block bb);
> > > +  const_bitmap equiv_set (tree ssa, basic_block bb) final
> > > override;
> > >    void register_relation (basic_block bb, relation_kind k, tree
> > > ssa1,
> > > - tree ssa2);
> > > + tree ssa2) override;
> > > 
> > > -  relation_kind query_relation (basic_block, tree, tree);
> > > -  relation_kind query_relation (basic_block, const_bitmap,
> > > const_bitmap);
> > > -  void dump (FILE *f, basic_block bb) const;
> > > -  void dump (FILE *f) const;
> > > +  relation_kind query_relation (basic_block, tree, tree) override;
> > > +  relation_kind query_relation (basic_block, const_bitmap,
> > > const_bitmap)
> > > +    override;
> > > +  void dump (FILE *f, basic_block bb) const override;
> > > +  void dump (FILE *f) const override;
> > > 
> > >  protected:
> > >    bitmap_obstack m_bitmaps;
> > > @@ -185,14 +186,16 @@ public:
> > >    dom_oracle ();
> > >    ~dom_oracle ();
> > > 
> > > -  void register_relation (basic_block bb, relation_kind k, tree
> > > op1,
> > > tree op2);
> > > +  void register_relation (basic_block bb, relation_kind k, tree
> > > op1,
> > > tree op2)
> > > +    final override;
> > > 
> > > -  relation_kind query_relation (basic_block bb, tree ssa1, tree
> > > ssa2);
> > > +  relation_kind query_relation (basic_block bb, tree ssa1, tree
> > > ssa2)
> > > +    final override;
> > >    relation_kind query_relation (basic_block bb, const_bitmap b1,
> > > -  const_bitmap b2);
> > > +   const_bitmap b2) final override;
> > > 
> > > -  void dump (FILE *f, basic_block bb) const;
> > > -  void dump (FILE *f) const;
> > > +  void dump (FILE *f, basic_block bb) const final override;
> > > +  void dump (FILE *f) const final override;
> > >  private:
> > >    bitmap m_tmp, m_tmp2;
> > >    bitmap m_relation_set;  // Index by ssa-name. True if a relation
> > > exists
> > > @@ -229,15 +232,16 @@ class path_oracle : public relation_oracle
> > >  public:
> > >    path_oracle (relation_oracle *oracle = NULL);
> > >    ~path_oracle ();
> > > -  const_bitmap equiv_set (tree, basic_block);
> > > -  void register_relation (basic_block, relation_kind, tree, tree);
> > > +  const_bitmap equiv_set (tree, basic_block) final override;
> > > +  void register_relation (basic_block, relation_kind, tree, tree)
> > > final override;
> > >    void killing_def (tree);
> > > -  relation_kind query_relation (basic_block, tree, tree);
> > > -  relation_kind query_relation (basic_block, const_bitmap,
> > > const_bitmap);
> > > +  relation_kind query_relation (basic_block, tree, tree) final
> > > override;
> > > +  relation_kind query_relation (basic_block, const_bitmap,
> > > const_bitmap)
> > > +    final override;
> > >    void reset_path ();
> > >    void set_root_oracle (relation_oracle *oracle) { m_root =
> > > oracle; }
> > > -  void dump (FILE *, basic_block) const;
> > > -  void dump (FILE *) const;
> > > +  void dump (FILE *, basic_block) const final override;
> > > +  void dump (FILE *) const final override;
> > >  private:
> > >    void register_equiv (basic_block 

Re: PING Re: [PATCH 07/10] value-relation.h: add 'final' and 'override' to relation_oracle vfunc impls

2022-06-13 Thread Aldy Hernandez via Gcc-patches
Final implies we can't further derive from the derived class, right? If so
we may want just override.

Andrew, what are your thoughts?

Thanks for doing this.

Aldy

On Mon, Jun 13, 2022, 14:28 David Malcolm  wrote:

> Ping re this patch:
>   https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595438.html
>
> OK for trunk?
>
> Thanks
> Dave
>
>
> On Mon, 2022-05-23 at 15:28 -0400, David Malcolm wrote:
> > gcc/ChangeLog:
> > * value-relation.h: Add "final" and "override" to
> > relation_oracle
> > vfunc implementations as appropriate.
> >
> > Signed-off-by: David Malcolm 
> > ---
> >  gcc/value-relation.h | 38 +-
> >  1 file changed, 21 insertions(+), 17 deletions(-)
> >
> > diff --git a/gcc/value-relation.h b/gcc/value-relation.h
> > index 19762d8ce2b..478729be0bf 100644
> > --- a/gcc/value-relation.h
> > +++ b/gcc/value-relation.h
> > @@ -130,14 +130,15 @@ public:
> >equiv_oracle ();
> >~equiv_oracle ();
> >
> > -  const_bitmap equiv_set (tree ssa, basic_block bb);
> > +  const_bitmap equiv_set (tree ssa, basic_block bb) final override;
> >void register_relation (basic_block bb, relation_kind k, tree ssa1,
> > - tree ssa2);
> > + tree ssa2) override;
> >
> > -  relation_kind query_relation (basic_block, tree, tree);
> > -  relation_kind query_relation (basic_block, const_bitmap,
> > const_bitmap);
> > -  void dump (FILE *f, basic_block bb) const;
> > -  void dump (FILE *f) const;
> > +  relation_kind query_relation (basic_block, tree, tree) override;
> > +  relation_kind query_relation (basic_block, const_bitmap,
> > const_bitmap)
> > +override;
> > +  void dump (FILE *f, basic_block bb) const override;
> > +  void dump (FILE *f) const override;
> >
> >  protected:
> >bitmap_obstack m_bitmaps;
> > @@ -185,14 +186,16 @@ public:
> >dom_oracle ();
> >~dom_oracle ();
> >
> > -  void register_relation (basic_block bb, relation_kind k, tree op1,
> > tree op2);
> > +  void register_relation (basic_block bb, relation_kind k, tree op1,
> > tree op2)
> > +final override;
> >
> > -  relation_kind query_relation (basic_block bb, tree ssa1, tree ssa2);
> > +  relation_kind query_relation (basic_block bb, tree ssa1, tree ssa2)
> > +final override;
> >relation_kind query_relation (basic_block bb, const_bitmap b1,
> > -  const_bitmap b2);
> > +   const_bitmap b2) final override;
> >
> > -  void dump (FILE *f, basic_block bb) const;
> > -  void dump (FILE *f) const;
> > +  void dump (FILE *f, basic_block bb) const final override;
> > +  void dump (FILE *f) const final override;
> >  private:
> >bitmap m_tmp, m_tmp2;
> >bitmap m_relation_set;  // Index by ssa-name. True if a relation
> > exists
> > @@ -229,15 +232,16 @@ class path_oracle : public relation_oracle
> >  public:
> >path_oracle (relation_oracle *oracle = NULL);
> >~path_oracle ();
> > -  const_bitmap equiv_set (tree, basic_block);
> > -  void register_relation (basic_block, relation_kind, tree, tree);
> > +  const_bitmap equiv_set (tree, basic_block) final override;
> > +  void register_relation (basic_block, relation_kind, tree, tree)
> > final override;
> >void killing_def (tree);
> > -  relation_kind query_relation (basic_block, tree, tree);
> > -  relation_kind query_relation (basic_block, const_bitmap,
> > const_bitmap);
> > +  relation_kind query_relation (basic_block, tree, tree) final
> > override;
> > +  relation_kind query_relation (basic_block, const_bitmap,
> > const_bitmap)
> > +final override;
> >void reset_path ();
> >void set_root_oracle (relation_oracle *oracle) { m_root = oracle; }
> > -  void dump (FILE *, basic_block) const;
> > -  void dump (FILE *) const;
> > +  void dump (FILE *, basic_block) const final override;
> > +  void dump (FILE *) const final override;
> >  private:
> >void register_equiv (basic_block bb, tree ssa1, tree ssa2);
> >equiv_chain m_equiv;
>
>
>


Re: [PATCH v2 4/4] xtensa: Optimize bitwise AND operation with some specific forms of constants

2022-06-13 Thread Max Filippov via Gcc-patches
On Mon, Jun 13, 2022 at 9:39 AM Takayuki 'January June' Suwa
 wrote:
>
> On 2022/06/13 12:49, Max Filippov wrote:
> > Hi Suwa-san,
> hi!
>
> > This change produces a bunch of regression test failures in big-endian
> > configuration:
> bad news X(
> that point is what i was a little worried about...
>
> > E.g. for the test gcc.c-torture/execute/struct-ini-2.c
> > the following assembly code is generated now:
> > and the following code was generated before this change:
> -   .literal .LC1, -4096
> -   l32ra10, .LC1
> -   and a10, a8, a10
> +   extui   a10, a8, 16, 4  // wrong! must be 12, 4
> +   sllia10, a10, 12
> and of course, '(zero_extract)' is endianness-sensitive.
> (ref. 14.11 Bit-Fields, gcc-internals)
>
> all the patches that I previously posted do not match or emit
> '(zero_extract)', except for this case.
>
> ===
> This patch offers several insn-and-split patterns for bitwise AND with
> register and constant that can be represented as:
>
> i.   1's least significant N bits and the others 0's (17 <= N <= 31)
> ii.  1's most significant N bits and the others 0's (12 <= N <= 31)
> iii. M 1's sequence of bits and trailing N 0's bits, that cannot fit into a
> "MOVI Ax, simm12" instruction (1 <= M <= 16, 1 <= N <= 30)
>
> It also offers shortcuts for conditional branches when each of the
> abovementioned operations is (not) equal to zero.
>
> gcc/ChangeLog:
>
> * config/xtensa/predicates.md (shifted_mask_operand):
> New predicate.
> * config/xtensa/xtensa.md (*andsi3_const_pow2_minus_one):
> New insn-and-split pattern.
> (*andsi3_const_negative_pow2, *andsi3_const_shifted_mask,
> *masktrue_const_pow2_minus_one, *masktrue_const_negative_pow2,
> *masktrue_const_shifted_mask): Ditto.
> ---
>  gcc/config/xtensa/predicates.md |  10 ++
>  gcc/config/xtensa/xtensa.md | 179 
>  2 files changed, 189 insertions(+)

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.

-- 
Thanks.
-- Max


Re: [PATCH 3/4] xtensa: Make use of BALL/BNALL instructions

2022-06-13 Thread Max Filippov via Gcc-patches
On Sat, Jun 11, 2022 at 11:43 PM Takayuki 'January June' Suwa
 wrote:
>
> In the Xtensa ISA, there is no single machine instruction that calculates
> unary bitwise negation, but a few similar fused instructions exist:
>
>"BALL  Ax, Ay, label"  // if ((~Ax & Ay) == 0) goto label;
>"BNALL Ax, Ay, label"  // if ((~Ax & Ay) != 0) goto label;
>
> These instructions have never been emitted before, but there seems to be
> no reason not to make use of them.
>
> gcc/ChangeLog:
>
> * config/xtensa/xtensa.md (*masktrue_bitcmpl): New insn pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/xtensa/BALL-BNALL.c: New.
> ---
>   gcc/config/xtensa/xtensa.md  | 21 +
>   gcc/testsuite/gcc.target/xtensa/BALL-BNALL.c | 33 
>   2 files changed, 54 insertions(+)
>   create mode 100644 gcc/testsuite/gcc.target/xtensa/BALL-BNALL.c

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.

-- 
Thanks.
-- Max


Re: [PATCH 2/4] xtensa: Simplify conditional branch/move insn patterns

2022-06-13 Thread Max Filippov via Gcc-patches
On Sat, Jun 11, 2022 at 11:43 PM Takayuki 'January June' Suwa
 wrote:
>
> No need to describe the "false side" conditional insn patterns anymore.
>
> gcc/ChangeLog:
>
> * config/xtensa/xtensa-protos.h (xtensa_emit_branch):
> Remove the first argument.
> (xtensa_emit_bit_branch): Remove it because now called only from the
> output statement of *bittrue insn pattern.
> * config/xtensa/xtensa.cc (gen_int_relational): Remove the last
> argument 'p_invert', and make so that the condition is reversed by
> itself as needed.
> (xtensa_expand_conditional_branch): Share the common path, and remove
> condition inversion code.
> (xtensa_emit_branch, xtensa_emit_movcc): Simplify by removing the
> "false side" pattern.
> (xtensa_emit_bit_branch): Remove it because of the abovementioned
> reason, and move the function body to *bittrue insn pattern.
> * config/xtensa/xtensa.md (*bittrue): Transplant the output
> statement from removed xtensa_emit_bit_branch().
> (*bfalse, *ubfalse, *bitfalse, *maskfalse): Remove the "false side"
> insn patterns.
> ---
>   gcc/config/xtensa/xtensa-protos.h |   3 +-
>   gcc/config/xtensa/xtensa.cc   | 111 ++--
>   gcc/config/xtensa/xtensa.md   | 117 --
>   3 files changed, 70 insertions(+), 161 deletions(-)

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.

-- 
Thanks.
-- Max


Re: [PATCH v2 1/4] xtensa: Improve shift operations more

2022-06-13 Thread Max Filippov via Gcc-patches
On Mon, Jun 13, 2022 at 9:39 AM Takayuki 'January June' Suwa
 wrote:
>
> Changes from v1:
>   (*shift_per_byte_omit_AND_1): changed to be split as early as possible
>
>
> ===
> This patch introduces funnel shifter utilization, and rearranges existing
> "per-byte shift" insn patterns.
>
> gcc/ChangeLog:
>
> * config/xtensa/predicates.md (logical_shift_operator,
> xtensa_shift_per_byte_operator): New predicates.
> * config/xtensa/xtensa-protos.h (xtensa_shlrd_which_direction):
> New prototype.
> * config/xtensa/xtensa.cc (xtensa_shlrd_which_direction):
> New helper function for funnel shift patterns.
> * config/xtensa/xtensa.md (ior_op): New code iterator.
> (*ashlsi3_1): Replace with new split pattern.
> (*shift_per_byte): Unify *ashlsi3_3x, *ashrsi3_3x and *lshrsi3_3x.
> (*shift_per_byte_omit_AND_0, *shift_per_byte_omit_AND_1):
> New insn-and-split patterns that redirect to *xtensa_shift_per_byte,
> in order to omit unnecessary bitwise AND operation.
> (*shlrd_reg_, *shlrd_const_, *shlrd_per_byte_,
> *shlrd_per_byte__omit_AND):
> New insn patterns for funnel shifts.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/xtensa/funnel_shifter.c: New.
> ---
>  gcc/config/xtensa/predicates.md   |   6 +
>  gcc/config/xtensa/xtensa-protos.h |   1 +
>  gcc/config/xtensa/xtensa.cc   |  14 ++
>  gcc/config/xtensa/xtensa.md   | 213 ++
>  .../gcc.target/xtensa/funnel_shifter.c|  17 ++
>  5 files changed, 213 insertions(+), 38 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/xtensa/funnel_shifter.c

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.

-- 
Thanks.
-- Max


[PING] cpp: new built-in __EXP_COUNTER__

2022-06-13 Thread Kaz Kylheku via Gcc-patches
Pinging this item:

https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593473.html

Thanks.


Re: [committed] d: Merge upstream dmd 821ed393d, druntime 454471d8, phobos 1206fc94f.

2022-06-13 Thread Iain Buclaw via Gcc-patches
Excerpts from Rainer Orth's message of June 13, 2022 8:58 pm:
> Hi Iain,
> 
>> This patches merges the D front-end with upstream dmd 821ed393d, and the
>> standard library with upstream druntime 454471d8 and phobos 1206fc94f.
> 
>> libphobos/ChangeLog:
>>
>>  * libdruntime/MERGE: Merge upstream druntime 454471d8.
>>  * libdruntime/Makefile.am (DRUNTIME_DSOURCES): Add
>>  core/sync/package.d.
>>  * libdruntime/Makefile.in: Regenerate.
>>  * src/MERGE: Merge upstream phobos 1206fc94f.
> 
> unfortunately, you missed core/sync/package.d here, breaking the build:
> 
> make[5]: *** No rule to make target 'core/sync/package.d', needed by 
> 'core/sync/package.lo'.  Stop.
> 

Hi Rainer,

Really sorry for that, I've checked in the core.sync package module in
r13-1077.

Iain.

---
 libphobos/libdruntime/core/sync/package.d | 20 
 1 file changed, 20 insertions(+)

diff --git a/libphobos/libdruntime/core/sync/package.d b/libphobos/libdruntime/core/sync/package.d
new file mode 100644
index 000..fe389a0475d
--- /dev/null
+++ b/libphobos/libdruntime/core/sync/package.d
@@ -0,0 +1,20 @@
+/**
+ * Provides thread synchronization tools such as mutexes, semaphores and barriers.
+ *
+ * License: Distributed under the
+ *      $(LINK2 http://www.boost.org/LICENSE_1_0.txt, Boost Software License 1.0).
+ *    (See accompanying file LICENSE)
+ * Authors:   Sean Kelly, Rainer Schuetze
+ * Source:    $(DRUNTIMESRC core/sync/package.d)
+ */
+
+module core.sync;
+
+public import core.sync.barrier;
+public import core.sync.condition;
+public import core.sync.config;
+public import core.sync.event;
+public import core.sync.exception;
+public import core.sync.mutex;
+public import core.sync.rwmutex;
+public import core.sync.semaphore;




Re: [PATCH] RISC-V: Reset the length to the default of 4 for FP comparisons

2022-06-13 Thread Maciej W. Rozycki
On Thu, 9 Jun 2022, Kito Cheng wrote:

> LGTM, *f_quiet4_default and *f_quiet4_snan have set
> their own length, and the only user of this setting is *cstore4,
> but apparently the length is 4 for that, not 8.

 I have committed this change now, thank you for your review.

  Maciej


Re: [PATCH RFA] ubsan: -Wreturn-type and ubsan trap-on-error

2022-06-13 Thread Jakub Jelinek via Gcc-patches
On Mon, Jun 13, 2022 at 03:38:23PM -0400, Jason Merrill via Gcc-patches wrote:
> I noticed that -fsanitize=undefined -fsanitize-undefined-trap-on-error was
> omitting the usual -Wreturn-type warning for control flowing off the end of
> a function.  This was because the warning code was looking for calls either
> to __builtin_unreachable or the UBSan function, but these flags produce a
> call to __builtin_trap instead.
> 
> Tested x86_64-pc-linux-gnu, OK for trunk?
> 
> gcc/c-family/ChangeLog:
> 
>   * c-ubsan.cc (ubsan_instrument_return): Use BUILTINS_LOCATION.
> 
> gcc/ChangeLog:
> 
>   * tree-cfg.cc (pass_warn_function_return::execute): Also check
>   BUILT_IN_TRAP.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/ubsan/return-8.C: New test.

LGTM.

Jakub



[PATCH RFA] ubsan: default to trap on unreachable at -O0 and -Og [PR104642]

2022-06-13 Thread Jason Merrill via Gcc-patches
When not optimizing, we can't do anything useful with unreachability in
terms of code performance, so we might as well improve debugging by turning
__builtin_unreachable into a trap.  In the PR richi suggested introducing an
-funreachable-traps flag for this, but this functionality is already
implemented as -fsanitize=unreachable -fsanitize-undefined-trap-on-error, we
just need to set those flags by default.

I think it also makes sense to do this when we're explicitly optimizing for
the debugging experience.

I then needed to make options-save handle -fsanitize and
-fsanitize-undefined-trap-on-error; since -fsanitize has custom parsing,
that meant handling it explicitly in the awk scripts.  I also noticed we
weren't setting it in opts_set.

Do we still want -funreachable-traps as an alias (or separate flag) for this
behavior that doesn't mention the sanitizer?

Tested x86_64-pc-linux-gnu, OK for trunk?

PR c++/104642

gcc/ChangeLog:

* doc/invoke.texi (-fsanitize-undefined-trap-on-error):
On by default at -O0 and -Og, implying -fsanitize=unreachable,return.
* opts.cc (finish_options): At -O0 trap on unreachable code.
(common_handle_option): Set opts_set->x_flag_sanitize.
* common.opt (fsanitize-undefined-trap-on-error): Add
Optimization tag.
* optc-save-gen.awk, opth-gen.awk: Include flag_sanitize.

gcc/testsuite/ChangeLog:

* g++.dg/ubsan/return-8a.C: New test.
---
 gcc/doc/invoke.texi|  4 
 gcc/common.opt |  2 +-
 gcc/opts.cc| 10 ++
 gcc/testsuite/g++.dg/ubsan/return-8a.C | 17 +
 gcc/optc-save-gen.awk  |  8 ++--
 gcc/opth-gen.awk   |  3 ++-
 6 files changed, 40 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ubsan/return-8a.C

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 174bc09e5cf..446b0691305 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -16100,6 +16100,10 @@ a @code{libubsan} library routine.  The advantage of this is that the
 @code{libubsan} library is not needed and is not linked in, so this
 is usable even in freestanding environments.
 
+If @option{-fsanitize} has not been specified, this option implies
+@option{-fsanitize=unreachable,return}, and is enabled by default at
+@option{-O0} and @option{-Og}.
+
 @item -fsanitize-coverage=trace-pc
 @opindex fsanitize-coverage=trace-pc
 Enable coverage-guided fuzzing code instrumentation.
diff --git a/gcc/common.opt b/gcc/common.opt
index 7ca0cceed82..90e3e84913b 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1109,7 +1109,7 @@ fsanitize-address-use-after-scope
 Common Driver Var(flag_sanitize_address_use_after_scope) Init(0)
 
 fsanitize-undefined-trap-on-error
-Common Driver Var(flag_sanitize_undefined_trap_on_error) Init(0)
+Common Driver Optimization Var(flag_sanitize_undefined_trap_on_error) Init(0)
 Use trap instead of a library function for undefined behavior sanitization.
 
 fasynchronous-unwind-tables
diff --git a/gcc/opts.cc b/gcc/opts.cc
index bf06a55456a..3699eabb599 100644
--- a/gcc/opts.cc
+++ b/gcc/opts.cc
@@ -1122,6 +1122,15 @@ finish_options (struct gcc_options *opts, struct gcc_options *opts_set,
   opts->x_flag_no_inline = 1;
 }
 
+  /* At -O0 or -Og, turn __builtin_unreachable into a trap.  */
+  if ((!opts->x_optimize || opts->x_optimize_debug)
+      && !opts_set->x_flag_sanitize)
+    SET_OPTION_IF_UNSET (opts, opts_set,
+                         flag_sanitize_undefined_trap_on_error, true);
+  if (opts->x_flag_sanitize_undefined_trap_on_error)
+    SET_OPTION_IF_UNSET (opts, opts_set, flag_sanitize,
+                         SANITIZE_UNREACHABLE|SANITIZE_RETURN);
+
   /* Pipelining of outer loops is only possible when general pipelining
  capabilities are requested.  */
   if (!opts->x_flag_sel_sched_pipelining)
@@ -2613,6 +2622,7 @@ common_handle_option (struct gcc_options *opts,
   break;
 
 case OPT_fsanitize_:
+  opts_set->x_flag_sanitize = true;
   opts->x_flag_sanitize
     = parse_sanitizer_options (arg, loc, code,
                                opts->x_flag_sanitize, value, true);
diff --git a/gcc/testsuite/g++.dg/ubsan/return-8a.C b/gcc/testsuite/g++.dg/ubsan/return-8a.C
new file mode 100644
index 000..9b2265c4bb0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ubsan/return-8a.C
@@ -0,0 +1,17 @@
+// PR c++/104642
+
+// At -O0 we default to
+//  -fsanitize=unreachable,return -fsanitize-undefined-trap-on-error
+// so the below should abort at runtime.
+
+// { dg-do run }
+// { dg-shouldfail { *-*-* } }
+// { dg-additional-options "-O0" }
+
+bool b;
+
+int f() {
+  if (b) return 42;
+}  // { dg-warning "-Wreturn-type" }
+
+int main() { f(); }
diff --git a/gcc/optc-save-gen.awk b/gcc/optc-save-gen.awk
index 233d1fbb637..38c02bcc2cf 100644
--- a/gcc/optc-save-gen.awk
+++ b/gcc/optc-save-gen.awk
@@ -84,7 

[PATCH RFA] ubsan: -Wreturn-type and ubsan trap-on-error

2022-06-13 Thread Jason Merrill via Gcc-patches
I noticed that -fsanitize=undefined -fsanitize-undefined-trap-on-error was
omitting the usual -Wreturn-type warning for control flowing off the end of
a function.  This was because the warning code was looking for calls either
to __builtin_unreachable or the UBSan function, but these flags produce a
call to __builtin_trap instead.

Tested x86_64-pc-linux-gnu, OK for trunk?

gcc/c-family/ChangeLog:

* c-ubsan.cc (ubsan_instrument_return): Use BUILTINS_LOCATION.

gcc/ChangeLog:

* tree-cfg.cc (pass_warn_function_return::execute): Also check
BUILT_IN_TRAP.

gcc/testsuite/ChangeLog:

* g++.dg/ubsan/return-8.C: New test.
---
 gcc/c-family/c-ubsan.cc   | 4 +++-
 gcc/testsuite/g++.dg/ubsan/return-8.C | 9 +
 gcc/tree-cfg.cc   | 5 +++--
 3 files changed, 15 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ubsan/return-8.C

diff --git a/gcc/c-family/c-ubsan.cc b/gcc/c-family/c-ubsan.cc
index 48f948745f8..a2cd8fb3262 100644
--- a/gcc/c-family/c-ubsan.cc
+++ b/gcc/c-family/c-ubsan.cc
@@ -308,7 +308,9 @@ tree
 ubsan_instrument_return (location_t loc)
 {
   if (flag_sanitize_undefined_trap_on_error)
-    return build_call_expr_loc (loc, builtin_decl_explicit (BUILT_IN_TRAP), 0);
+    return build_call_expr_loc
+      /* pass_warn_function_return checks for BUILTINS_LOCATION.  */
+      (BUILTINS_LOCATION, builtin_decl_explicit (BUILT_IN_TRAP), 0);
 
   tree data = ubsan_create_data ("__ubsan_missing_return_data", 1, &loc,
                                  NULL_TREE, NULL_TREE);
diff --git a/gcc/testsuite/g++.dg/ubsan/return-8.C b/gcc/testsuite/g++.dg/ubsan/return-8.C
new file mode 100644
index 000..354c96098d2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ubsan/return-8.C
@@ -0,0 +1,9 @@
+// { dg-additional-options "-fsanitize=undefined -fsanitize-undefined-trap-on-error" }
+
+bool b;
+
+int f() {
+  if (b) return 42;
+}  // { dg-warning "-Wreturn-type" }
+
+int main() { f(); }
diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index 9e5d84a9805..c67c278dad0 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -9543,7 +9543,7 @@ pass_warn_function_return::execute (function *fun)
}
   /* The C++ FE turns fallthrough from the end of non-void function
 into __builtin_unreachable () call with BUILTINS_LOCATION.
-Recognize those too.  */
+Recognize those as well as calls from ubsan_instrument_return.  */
   basic_block bb;
   if (!warning_suppressed_p (fun->decl, OPT_Wreturn_type))
FOR_EACH_BB_FN (bb, fun)
@@ -9555,7 +9555,8 @@ pass_warn_function_return::execute (function *fun)
          if (last
              && ((LOCATION_LOCUS (gimple_location (last))
                   == BUILTINS_LOCATION
-                  && gimple_call_builtin_p (last, BUILT_IN_UNREACHABLE))
+                  && (gimple_call_builtin_p (last, BUILT_IN_UNREACHABLE)
+                      || gimple_call_builtin_p (last, BUILT_IN_TRAP)))
                  || gimple_call_builtin_p (last, ubsan_missing_ret)))
{
  gimple_stmt_iterator gsi = gsi_for_stmt (last);

base-commit: 13ea4a6e830da1f245136601e636dec62e74d1a7
-- 
2.27.0



Re: [PATCH] libstdc++: Rename __null_terminated to avoid collision with Apple SDK

2022-06-13 Thread Jonathan Wakely via Gcc-patches
On Fri, 10 Jun 2022 at 22:42, Mark Mentovai wrote:
>
> Thanks, Jonathan. I am, in fact, so certifying.

Great, thanks.

>
> I do believe that bringing up support for new OS versions is in scope for 
> open branches, and it makes sense to merge, particularly for a trivial and 
> uncontentious patch like this one.

Agreed. I've pushed it to trunk for now and will follow up with the
backports soon.

Thanks again for the patch.



[committed] libstdc++: Use type_identity_t for non-deducible std::atomic_xxx args

2022-06-13 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.

-- >8 --

This is LWG 3220 which is about to become Tentatively Ready.

libstdc++-v3/ChangeLog:

* include/std/atomic (__atomic_val_t): Use __type_identity_t
instead of atomic::value_type, as per LWG 3220.
* testsuite/29_atomics/atomic/lwg3220.cc: New test.
---
 libstdc++-v3/include/std/atomic |  4 +++-
 libstdc++-v3/testsuite/29_atomics/atomic/lwg3220.cc | 13 +
 2 files changed, 16 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/testsuite/29_atomics/atomic/lwg3220.cc

diff --git a/libstdc++-v3/include/std/atomic b/libstdc++-v3/include/std/atomic
index 1c6acfa36d0..70055b8fa83 100644
--- a/libstdc++-v3/include/std/atomic
+++ b/libstdc++-v3/include/std/atomic
@@ -1244,8 +1244,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { atomic_flag_clear_explicit(__a, memory_order_seq_cst); }
 
   /// @cond undocumented
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 3220. P0558 broke conforming C++14 uses of atomic shared_ptr
   template<typename _Tp>
-    using __atomic_val_t = typename atomic<_Tp>::value_type;
+    using __atomic_val_t = __type_identity_t<_Tp>;
   template<typename _Tp>
     using __atomic_diff_t = typename atomic<_Tp>::difference_type;
   /// @endcond
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic/lwg3220.cc b/libstdc++-v3/testsuite/29_atomics/atomic/lwg3220.cc
new file mode 100644
index 000..d2ff6cf9fe3
--- /dev/null
+++ b/libstdc++-v3/testsuite/29_atomics/atomic/lwg3220.cc
@@ -0,0 +1,13 @@
+// { dg-do compile { target c++11 } }
+// DR 3220. P0558 broke conforming C++14 uses of atomic shared_ptr
+
+#include <memory>
+#include <atomic>
+
+struct Abstract { virtual void test() = 0; };
+struct Concrete : Abstract { virtual void test() override {} };
+
+int main() {
+  std::shared_ptr<Abstract> ptr;
+  std::atomic_store(&ptr, std::make_shared<Concrete>());
+}
-- 
2.34.3



Re: [committed] d: Merge upstream dmd 821ed393d, druntime 454471d8, phobos 1206fc94f.

2022-06-13 Thread Rainer Orth
Hi Iain,

> This patches merges the D front-end with upstream dmd 821ed393d, and the
> standard library with upstream druntime 454471d8 and phobos 1206fc94f.

> libphobos/ChangeLog:
>
>   * libdruntime/MERGE: Merge upstream druntime 454471d8.
>   * libdruntime/Makefile.am (DRUNTIME_DSOURCES): Add
>   core/sync/package.d.
>   * libdruntime/Makefile.in: Regenerate.
>   * src/MERGE: Merge upstream phobos 1206fc94f.

unfortunately, you missed core/sync/package.d here, breaking the build:

make[5]: *** No rule to make target 'core/sync/package.d', needed by 'core/sync/package.lo'.  Stop.

I copied the file from upstream libdruntime to fix it.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


PING Re: [PATCH 10/10] Add 'final' and 'override' in various places

2022-06-13 Thread David Malcolm via Gcc-patches
Ping re this patch:
  https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595443.html

OK for trunk?

Thanks
Dave

On Mon, 2022-05-23 at 15:28 -0400, David Malcolm wrote:
> gcc/cp/ChangeLog:
> * cxx-pretty-print.h: Add "final" and "override" to various
> vfunc
> implementations, removing redundant "virtual" as appropriate.
> * module.cc: Likewise.
> 
> gcc/ChangeLog:
> * genmatch.cc: Add "final" and "override" to various vfunc
> implementations, removing redundant "virtual" as appropriate.
> * gensupport.cc: Likewise.
> * gimple-range-cache.h: Likewise.
> * ipa-icf-gimple.h: Likewise.
> * ipa-icf.h: Likewise.
> * read-md.h: Likewise.
> * read-rtl-function.cc: Likewise.
> * tree-ssa-loop-ch.cc: Likewise.
> * tree-ssa-sccvn.cc: Likewise.
> 
> gcc/lto/ChangeLog:
> * lto-dump.cc: Add "final" and "override" to various vfunc
> implementations, removing redundant "virtual" as appropriate.
> 
> Signed-off-by: David Malcolm 
> ---
>  gcc/cp/cxx-pretty-print.h | 38 +++---
>  gcc/cp/module.cc  |  4 ++--
>  gcc/genmatch.cc   | 22 +++---
>  gcc/gensupport.cc |  2 +-
>  gcc/gimple-range-cache.h  |  4 ++--
>  gcc/ipa-icf-gimple.h  |  6 --
>  gcc/ipa-icf.h | 36 
>  gcc/lto/lto-dump.cc   |  8 
>  gcc/read-md.h |  2 +-
>  gcc/read-rtl-function.cc  |  6 +++---
>  gcc/tree-ssa-loop-ch.cc   |  4 ++--
>  gcc/tree-ssa-sccvn.cc |  4 ++--
>  12 files changed, 71 insertions(+), 65 deletions(-)
> 
> diff --git a/gcc/cp/cxx-pretty-print.h b/gcc/cp/cxx-pretty-print.h
> index 5080f70a8e4..593bd91d4f7 100644
> --- a/gcc/cp/cxx-pretty-print.h
> +++ b/gcc/cp/cxx-pretty-print.h
> @@ -36,25 +36,25 @@ public:
>  
>    pretty_printer *clone () const override;
>  
> -  void constant (tree);
> -  void id_expression (tree);
> -  void primary_expression (tree);
> -  void postfix_expression (tree);
> -  void unary_expression (tree);
> -  void multiplicative_expression (tree);
> -  void conditional_expression (tree);
> -  void assignment_expression (tree);
> -  void expression (tree);
> -  void type_id (tree);
> -  void statement (tree);
> -  void declaration (tree);
> -  void declaration_specifiers (tree);
> -  void simple_type_specifier (tree);
> -  void function_specifier (tree);
> -  void declarator (tree);
> -  void direct_declarator (tree);
> -  void abstract_declarator (tree);
> -  void direct_abstract_declarator (tree);
> +  void constant (tree) final override;
> +  void id_expression (tree) final override;
> +  void primary_expression (tree) final override;
> +  void postfix_expression (tree) final override;
> +  void unary_expression (tree) final override;
> +  void multiplicative_expression (tree) final override;
> +  void conditional_expression (tree) final override;
> +  void assignment_expression (tree) final override;
> +  void expression (tree) final override;
> +  void type_id (tree) final override;
> +  void statement (tree) final override;
> +  void declaration (tree) final override;
> +  void declaration_specifiers (tree) final override;
> +  void simple_type_specifier (tree) final override;
> +  void function_specifier (tree) final override;
> +  void declarator (tree) final override;
> +  void direct_declarator (tree) final override;
> +  void abstract_declarator (tree) final override;
> +  void direct_abstract_declarator (tree) final override;
>  
>    /* This is the enclosing scope of the entity being pretty-printed. 
> */
>    tree enclosing_scope;
> diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> index d1dc73724d1..e93151c98c2 100644
> --- a/gcc/cp/module.cc
> +++ b/gcc/cp/module.cc
> @@ -1483,10 +1483,10 @@ private:
>  
>  protected:
>    using allocator::grow;
> -  virtual char *grow (char *, unsigned needed);
> +  char *grow (char *, unsigned needed) final override;
>  #if MAPPED_WRITING
>    using allocator::shrink;
> -  virtual void shrink (char *);
> +  void shrink (char *) final override;
>  #endif
>  
>  public:
> diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
> index 2b84b849330..a0b22c50ae3 100644
> --- a/gcc/genmatch.cc
> +++ b/gcc/genmatch.cc
> @@ -723,9 +723,9 @@ public:
>    bool force_leaf;
>    /* If non-zero, the group for optional handling.  */
>    unsigned char opt_grp;
> -  virtual void gen_transform (FILE *f, int, const char *, bool, int,
> - const char *, capture_info *,
> - dt_operand ** = 0, int = 0);
> +  void gen_transform (FILE *f, int, const char *, bool, int,
> + const char *, capture_info *,
> + dt_operand ** = 0, int = 0) override;
>  };
>  
>  /* An operator that is represented by native C code.  This is always
> @@ -757,9 +757,9 @@ public:
>    unsigned nr_stmts;
>    /* The identifier replacement vector.  */
>    vec 

PING: Re: [PATCH 08/10] i386: add 'final' and 'override' to scalar_chain vfunc impls

2022-06-13 Thread David Malcolm via Gcc-patches
Ping for this patch:
  https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595440.html

OK for trunk?

Thanks
Dave

On Mon, 2022-05-23 at 15:28 -0400, David Malcolm wrote:
> gcc/ChangeLog:
> * config/i386/i386-features.h: Add "final" and "override" to
> scalar_chain vfunc implementations as appropriate.
> 
> Signed-off-by: David Malcolm 
> ---
>  gcc/config/i386/i386-features.h | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/gcc/config/i386/i386-features.h b/gcc/config/i386/i386-features.h
> index 5c307607ae5..f46a6d95b74 100644
> --- a/gcc/config/i386/i386-features.h
> +++ b/gcc/config/i386/i386-features.h
> @@ -169,18 +169,18 @@ class general_scalar_chain : public scalar_chain
>   public:
>    general_scalar_chain (enum machine_mode smode_, enum machine_mode
> vmode_);
>    ~general_scalar_chain ();
> -  int compute_convert_gain ();
> +  int compute_convert_gain () final override;
>   private:
>    hash_map defs_map;
>    bitmap insns_conv;
>    unsigned n_sse_to_integer;
>    unsigned n_integer_to_sse;
> -  void mark_dual_mode_def (df_ref def);
> -  void convert_insn (rtx_insn *insn);
> +  void mark_dual_mode_def (df_ref def) final override;
> +  void convert_insn (rtx_insn *insn) final override;
>    void convert_op (rtx *op, rtx_insn *insn);
>    void convert_reg (rtx_insn *insn, rtx dst, rtx src);
>    void make_vector_copies (rtx_insn *, rtx);
> -  void convert_registers ();
> +  void convert_registers () final override;
>    int vector_const_cost (rtx exp);
>  };
>  
> @@ -190,14 +190,14 @@ class timode_scalar_chain : public scalar_chain
>    timode_scalar_chain () : scalar_chain (TImode, V1TImode) {}
>  
>    /* Convert from TImode to V1TImode is always faster.  */
> -  int compute_convert_gain () { return 1; }
> +  int compute_convert_gain () final override { return 1; }
>  
>   private:
> -  void mark_dual_mode_def (df_ref def);
> +  void mark_dual_mode_def (df_ref def) final override;
>    void fix_debug_reg_uses (rtx reg);
> -  void convert_insn (rtx_insn *insn);
> +  void convert_insn (rtx_insn *insn) final override;
>    /* We don't convert registers to difference size.  */
> -  void convert_registers () {}
> +  void convert_registers () final override {}
>  };
>  
>  } // anon namespace




PING Re: [PATCH 07/10] value-relation.h: add 'final' and 'override' to relation_oracle vfunc impls

2022-06-13 Thread David Malcolm via Gcc-patches
Ping re this patch:
  https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595438.html

OK for trunk?

Thanks
Dave


On Mon, 2022-05-23 at 15:28 -0400, David Malcolm wrote:
> gcc/ChangeLog:
> * value-relation.h: Add "final" and "override" to
> relation_oracle
> vfunc implementations as appropriate.
> 
> Signed-off-by: David Malcolm 
> ---
>  gcc/value-relation.h | 38 +-
>  1 file changed, 21 insertions(+), 17 deletions(-)
> 
> diff --git a/gcc/value-relation.h b/gcc/value-relation.h
> index 19762d8ce2b..478729be0bf 100644
> --- a/gcc/value-relation.h
> +++ b/gcc/value-relation.h
> @@ -130,14 +130,15 @@ public:
>    equiv_oracle ();
>    ~equiv_oracle ();
>  
> -  const_bitmap equiv_set (tree ssa, basic_block bb);
> +  const_bitmap equiv_set (tree ssa, basic_block bb) final override;
>    void register_relation (basic_block bb, relation_kind k, tree ssa1,
> - tree ssa2);
> + tree ssa2) override;
>  
> -  relation_kind query_relation (basic_block, tree, tree);
> -  relation_kind query_relation (basic_block, const_bitmap,
> const_bitmap);
> -  void dump (FILE *f, basic_block bb) const;
> -  void dump (FILE *f) const;
> +  relation_kind query_relation (basic_block, tree, tree) override;
> +  relation_kind query_relation (basic_block, const_bitmap,
> const_bitmap)
> +    override;
> +  void dump (FILE *f, basic_block bb) const override;
> +  void dump (FILE *f) const override;
>  
>  protected:
>    bitmap_obstack m_bitmaps;
> @@ -185,14 +186,16 @@ public:
>    dom_oracle ();
>    ~dom_oracle ();
>  
> -  void register_relation (basic_block bb, relation_kind k, tree op1,
> tree op2);
> +  void register_relation (basic_block bb, relation_kind k, tree op1,
> tree op2)
> +    final override;
>  
> -  relation_kind query_relation (basic_block bb, tree ssa1, tree ssa2);
> +  relation_kind query_relation (basic_block bb, tree ssa1, tree ssa2)
> +    final override;
>    relation_kind query_relation (basic_block bb, const_bitmap b1,
> -  const_bitmap b2);
> +   const_bitmap b2) final override;
>  
> -  void dump (FILE *f, basic_block bb) const;
> -  void dump (FILE *f) const;
> +  void dump (FILE *f, basic_block bb) const final override;
> +  void dump (FILE *f) const final override;
>  private:
>    bitmap m_tmp, m_tmp2;
>    bitmap m_relation_set;  // Index by ssa-name. True if a relation
> exists
> @@ -229,15 +232,16 @@ class path_oracle : public relation_oracle
>  public:
>    path_oracle (relation_oracle *oracle = NULL);
>    ~path_oracle ();
> -  const_bitmap equiv_set (tree, basic_block);
> -  void register_relation (basic_block, relation_kind, tree, tree);
> +  const_bitmap equiv_set (tree, basic_block) final override;
> +  void register_relation (basic_block, relation_kind, tree, tree)
> final override;
>    void killing_def (tree);
> -  relation_kind query_relation (basic_block, tree, tree);
> -  relation_kind query_relation (basic_block, const_bitmap,
> const_bitmap);
> +  relation_kind query_relation (basic_block, tree, tree) final
> override;
> +  relation_kind query_relation (basic_block, const_bitmap,
> const_bitmap)
> +    final override;
>    void reset_path ();
>    void set_root_oracle (relation_oracle *oracle) { m_root = oracle; }
> -  void dump (FILE *, basic_block) const;
> -  void dump (FILE *) const;
> +  void dump (FILE *, basic_block) const final override;
> +  void dump (FILE *) const final override;
>  private:
>    void register_equiv (basic_block bb, tree ssa1, tree ssa2);
>    equiv_chain m_equiv;




PING Re: [PATCH 04/10] tree-switch-conversion.h: use final/override for cluster vfunc impls

2022-06-13 Thread David Malcolm via Gcc-patches
Ping for this patch:
  https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595435.html

OK for trunk?

Thanks
Dave

On Mon, 2022-05-23 at 15:28 -0400, David Malcolm wrote:
> gcc/ChangeLog:
> * tree-switch-conversion.h: Add "final" and "override" to
> cluster
> vfunc implementations as appropriate.
> 
> Signed-off-by: David Malcolm 
> ---
>  gcc/tree-switch-conversion.h | 32 +---
>  1 file changed, 17 insertions(+), 15 deletions(-)
> 
> diff --git a/gcc/tree-switch-conversion.h b/gcc/tree-switch-
> conversion.h
> index 2b677d9f7e9..d22515eb296 100644
> --- a/gcc/tree-switch-conversion.h
> +++ b/gcc/tree-switch-conversion.h
> @@ -130,19 +130,19 @@ public:
>    {}
>  
>    cluster_type
> -  get_type ()
> +  get_type () final override
>    {
>  return SIMPLE_CASE;
>    }
>  
>    tree
> -  get_low ()
> +  get_low () final override
>    {
>  return m_low;
>    }
>  
>    tree
> -  get_high ()
> +  get_high () final override
>    {
>  return m_high;
>    }
> @@ -153,13 +153,13 @@ public:
>    }
>  
>    void
> -  debug ()
> +  debug () final override
>    {
>  dump (stderr);
>    }
>  
>    void
> -  dump (FILE *f, bool details ATTRIBUTE_UNUSED = false)
> +  dump (FILE *f, bool details ATTRIBUTE_UNUSED = false) final override
>    {
>  PRINT_CASE (f, get_low ());
>  if (get_low () != get_high ())
> @@ -170,12 +170,12 @@ public:
>  fprintf (f, " ");
>    }
>  
> -  void emit (tree, tree, tree, basic_block, location_t)
> +  void emit (tree, tree, tree, basic_block, location_t) final override
>    {
>  gcc_unreachable ();
>    }
>  
> -  bool is_single_value_p ()
> +  bool is_single_value_p () final override
>    {
>  return tree_int_cst_equal (get_low (), get_high ());
>    }
> @@ -224,24 +224,24 @@ public:
>    ~group_cluster ();
>  
>    tree
> -  get_low ()
> +  get_low () final override
>    {
>  return m_cases[0]->get_low ();
>    }
>  
>    tree
> -  get_high ()
> +  get_high () final override
>    {
>  return m_cases[m_cases.length () - 1]->get_high ();
>    }
>  
>    void
> -  debug ()
> +  debug () final override
>    {
>  dump (stderr);
>    }
>  
> -  void dump (FILE *f, bool details = false);
> +  void dump (FILE *f, bool details = false) final override;
>  
>    /* List of simple clusters handled by the group.  */
>    vec m_cases;
> @@ -261,13 +261,14 @@ public:
>    {}
>  
>    cluster_type
> -  get_type ()
> +  get_type () final override
>    {
>  return JUMP_TABLE;
>    }
>  
>    void emit (tree index_expr, tree index_type,
> -    tree default_label_expr, basic_block default_bb,
> location_t loc);
> +    tree default_label_expr, basic_block default_bb,
> location_t loc)
> +    final override;
>  
>    /* Find jump tables of given CLUSTERS, where all members of the
> vector
>   are of type simple_cluster.  New clusters are returned.  */
> @@ -366,7 +367,7 @@ public:
>    {}
>  
>    cluster_type
> -  get_type ()
> +  get_type () final override
>    {
>  return BIT_TEST;
>    }
> @@ -388,7 +389,8 @@ public:
>  There *MUST* be max_case_bit_tests or less unique case
>  node targets.  */
>    void emit (tree index_expr, tree index_type,
> -    tree default_label_expr, basic_block default_bb,
> location_t loc);
> +    tree default_label_expr, basic_block default_bb,
> location_t loc)
> + final override;
>  
>    /* Find bit tests of given CLUSTERS, where all members of the vector
>   are of type simple_cluster.  New clusters are returned.  */
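
(A self-contained sketch -- simplified types, not code from the patch --
of what the annotations buy: if a base-class virtual later changes
signature, every marked override becomes a hard error instead of
silently turning into an unrelated virtual function.)

struct cluster
{
  virtual ~cluster () {}
  virtual int get_type () = 0;
};

struct simple_cluster : cluster
{
  /* Checked against the base declaration at compile time.  */
  int get_type () final override { return 1; }
};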




PING: Re: [PATCH 03/10] expr.cc: use final/override on op_by_pieces_d vfuncs

2022-06-13 Thread David Malcolm via Gcc-patches
Ping on this patch:
  https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595434.html

OK for trunk?

Thanks
Dave

On Mon, 2022-05-23 at 15:28 -0400, David Malcolm wrote:
> gcc/ChangeLog:
> * expr.cc: Add "final" and "override" to op_by_pieces_d vfunc
> implementations as appropriate.
> 
> Signed-off-by: David Malcolm 
> ---
>  gcc/expr.cc | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index 7197996cec7..ce58728862a 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -1357,8 +1357,8 @@ op_by_pieces_d::run ()
>  class move_by_pieces_d : public op_by_pieces_d
>  {
>    insn_gen_fn m_gen_fun;
> -  void generate (rtx, rtx, machine_mode);
> -  bool prepare_mode (machine_mode, unsigned int);
> +  void generate (rtx, rtx, machine_mode) final override;
> +  bool prepare_mode (machine_mode, unsigned int) final override;
>  
>   public:
>    move_by_pieces_d (rtx to, rtx from, unsigned HOST_WIDE_INT len,
> @@ -1453,8 +1453,8 @@ move_by_pieces (rtx to, rtx from, unsigned
> HOST_WIDE_INT len,
>  class store_by_pieces_d : public op_by_pieces_d
>  {
>    insn_gen_fn m_gen_fun;
> -  void generate (rtx, rtx, machine_mode);
> -  bool prepare_mode (machine_mode, unsigned int);
> +  void generate (rtx, rtx, machine_mode) final override;
> +  bool prepare_mode (machine_mode, unsigned int) final override;
>  
>   public:
>    store_by_pieces_d (rtx to, by_pieces_constfn cfn, void *cfn_data,
> @@ -1650,9 +1650,9 @@ class compare_by_pieces_d : public op_by_pieces_d
>    rtx m_accumulator;
>    int m_count, m_batch;
>  
> -  void generate (rtx, rtx, machine_mode);
> -  bool prepare_mode (machine_mode, unsigned int);
> -  void finish_mode (machine_mode);
> +  void generate (rtx, rtx, machine_mode) final override;
> +  bool prepare_mode (machine_mode, unsigned int) final override;
> +  void finish_mode (machine_mode) final override;
>   public:
>    compare_by_pieces_d (rtx op0, rtx op1, by_pieces_constfn op1_cfn,
>    void *op1_cfn_data, HOST_WIDE_INT len, int
> align,




PING Re: [PATCH 02/10] Add 'final' and 'override' on dom_walker vfunc impls

2022-06-13 Thread David Malcolm via Gcc-patches
Ping for this patch:
  https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595436.html

OK for trunk?

Thanks
Dave

On Mon, 2022-05-23 at 15:28 -0400, David Malcolm wrote:
> gcc/ChangeLog:
> * compare-elim.cc: Add "final" and "override" to dom_walker
> vfunc
> implementations, removing redundant "virtual" as appropriate.
> * gimple-ssa-strength-reduction.cc: Likewise.
> * ipa-prop.cc: Likewise.
> * rtl-ssa/blocks.cc: Likewise.
> * tree-into-ssa.cc: Likewise.
> * tree-ssa-dom.cc: Likewise.
> * tree-ssa-math-opts.cc: Likewise.
> * tree-ssa-phiopt.cc: Likewise.
> * tree-ssa-propagate.cc: Likewise.
> * tree-ssa-sccvn.cc: Likewise.
> * tree-ssa-strlen.cc: Likewise.
> * tree-ssa-uncprop.cc: Likewise.
> 
> Signed-off-by: David Malcolm 
> ---
>  gcc/compare-elim.cc  |  2 +-
>  gcc/gimple-ssa-strength-reduction.cc |  2 +-
>  gcc/ipa-prop.cc  |  4 ++--
>  gcc/rtl-ssa/blocks.cc    |  4 ++--
>  gcc/tree-into-ssa.cc | 10 +-
>  gcc/tree-ssa-dom.cc  |  4 ++--
>  gcc/tree-ssa-math-opts.cc    |  2 +-
>  gcc/tree-ssa-phiopt.cc   |  4 ++--
>  gcc/tree-ssa-propagate.cc    |  4 ++--
>  gcc/tree-ssa-sccvn.cc    |  4 ++--
>  gcc/tree-ssa-strlen.cc   |  4 ++--
>  gcc/tree-ssa-uncprop.cc  |  4 ++--
>  12 files changed, 24 insertions(+), 24 deletions(-)
> 
> diff --git a/gcc/compare-elim.cc b/gcc/compare-elim.cc
> index e869d9d3249..4a23202f8ff 100644
> --- a/gcc/compare-elim.cc
> +++ b/gcc/compare-elim.cc
> @@ -283,7 +283,7 @@ public:
>    find_comparison_dom_walker (cdi_direction direction)
>  : dom_walker (direction) {}
>  
> -  virtual edge before_dom_children (basic_block);
> +  edge before_dom_children (basic_block) final override;
>  };
>  
>  /* Return true if conforming COMPARE with EH_NOTE is redundant with
> comparison
> diff --git a/gcc/gimple-ssa-strength-reduction.cc b/gcc/gimple-ssa-
> strength-reduction.cc
> index 2b559e96fc8..fb2bb9f4e74 100644
> --- a/gcc/gimple-ssa-strength-reduction.cc
> +++ b/gcc/gimple-ssa-strength-reduction.cc
> @@ -1729,7 +1729,7 @@ class find_candidates_dom_walker : public
> dom_walker
>  public:
>    find_candidates_dom_walker (cdi_direction direction)
>  : dom_walker (direction) {}
> -  virtual edge before_dom_children (basic_block);
> +  edge before_dom_children (basic_block) final override;
>  };
>  
>  /* Find strength-reduction candidates in block BB.  */
> diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
> index c6c745f84a0..03f0ba2ec75 100644
> --- a/gcc/ipa-prop.cc
> +++ b/gcc/ipa-prop.cc
> @@ -3004,7 +3004,7 @@ public:
>    analysis_dom_walker (struct ipa_func_body_info *fbi)
>  : dom_walker (CDI_DOMINATORS), m_fbi (fbi) {}
>  
> -  virtual edge before_dom_children (basic_block);
> +  edge before_dom_children (basic_block) final override;
>  
>  private:
>    struct ipa_func_body_info *m_fbi;
> @@ -5653,7 +5653,7 @@ public:
>  : dom_walker (CDI_DOMINATORS), m_fbi (fbi), m_descriptors (descs),
>    m_aggval (av), m_something_changed (sc) {}
>  
> -  virtual edge before_dom_children (basic_block);
> +  edge before_dom_children (basic_block) final override;
>    bool cleanup_eh ()
>  { return gimple_purge_all_dead_eh_edges (m_need_eh_cleanup); }
>  
> diff --git a/gcc/rtl-ssa/blocks.cc b/gcc/rtl-ssa/blocks.cc
> index 959fad8f829..6b03dd03747 100644
> --- a/gcc/rtl-ssa/blocks.cc
> +++ b/gcc/rtl-ssa/blocks.cc
> @@ -85,8 +85,8 @@ class function_info::bb_walker : public dom_walker
>  {
>  public:
>    bb_walker (function_info *, build_info &);
> -  virtual edge before_dom_children (basic_block);
> -  virtual void after_dom_children (basic_block);
> +  edge before_dom_children (basic_block) final override;
> +  void after_dom_children (basic_block) final override;
>  
>  private:
>    // Information about the function we're building.
> diff --git a/gcc/tree-into-ssa.cc b/gcc/tree-into-ssa.cc
> index 46df57ae0e1..9631d8c6556 100644
> --- a/gcc/tree-into-ssa.cc
> +++ b/gcc/tree-into-ssa.cc
> @@ -1462,8 +1462,8 @@ public:
>    rewrite_dom_walker (cdi_direction direction)
>  : dom_walker (direction, ALL_BLOCKS, NULL) {}
>  
> -  virtual edge before_dom_children (basic_block);
> -  virtual void after_dom_children (basic_block);
> +  edge before_dom_children (basic_block) final override;
> +  void after_dom_children (basic_block) final override;
>  };
>  
>  /* SSA Rewriting Step 1.  Initialization, create a block local stack
> @@ -2148,8 +2148,8 @@ public:
>    rewrite_update_dom_walker (cdi_direction direction)
>  : dom_walker (direction, ALL_BLOCKS, NULL) {}
>  
> -  virtual edge before_dom_children (basic_block);
> -  virtual void after_dom_children (basic_block);
> +  edge before_dom_children (basic_block) final override;
> +  void after_dom_children (basic_block) final override;
>  };
>  
>  /* Initialization 

Re: [PATCH]middle-end Use subregs to expand COMPLEX_EXPR to set the lowpart.

2022-06-13 Thread Jeff Law via Gcc-patches




On 6/13/2022 4:19 AM, Tamar Christina wrote:

-Original Message-
From: Gcc-patches  On Behalf Of Jeff Law via
Gcc-patches
Sent: Sunday, June 12, 2022 6:27 PM
To: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH]middle-end Use subregs to expand COMPLEX_EXPR to
set the lowpart.



On 6/9/2022 1:52 AM, Tamar Christina via Gcc-patches wrote:

Hi All,

When lowering COMPLEX_EXPR we currently emit two VEC_EXTRACTs.  One
for the lowpart and one for the highpart.

The problem with this is that in RTL the lvalue of the RTX is the only
thing tying the two instructions together.

This means that e.g. combine is unable to try to combine the two
instructions for setting the lowpart and highpart.

For ISAs that have bit extract instructions we can eliminate one of
the extracts if, and only if we're setting the entire complex number.

This change alters the expand code, when we're setting the entire
complex number, to generate a subreg for the lowpart instead of a
vec_extract.

This allows us to optimize sequences such as:

Just a note.  I regularly see subregs significantly interfere with optimization,
particularly register allocation.  So be aware that subregs can often get in the
way of generating good code.  When changing something to use subregs I
like to run real benchmarks rather than working with code snippets.



_Complex int f(int a, int b) {
  _Complex int t = a + b * 1i;
  return t;
}

from:

f:
bfi x2, x0, 0, 32
bfi x2, x1, 32, 32
mov x0, x2
ret

into:

f:
bfi x0, x1, 32, 32
ret

I have also confirmed the codegen for x86_64 did not change.

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* emit-rtl.cc (validate_subreg): Accept subregs of complex modes.
* expr.cc (emit_move_complex_parts): Emit subreg of lowpart if
possible.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/complex-init.C: New test.

OK.

On a related topic, any thoughts on keeping complex objects as complex
types/modes through gimple and into at least parts of the RTL pipeline?

The way complex arithmetic instructions work on our chip is going to be
extremely tough to utilize in GCC -- we really need to keep the complex
types/arithmetic up through RTL generation at the least.  Ideally we'd even
expose complex modes all the way to final.  Is that something y'all could
benefit from as well?  Have y'all poked at this problem at all?

> Not extensively, but right now the big advantage of lowering them early
> is for auto-vec.  Lowering them early allows you to e.g. realize that
> you only need the imaginary part of the number etc.  For
In the case where you only operate on part of the object, then 
decomposing early and using standard vops and fops on the components 
absolutely makes sense.  That we already do.


The case I want to handle is when we're operating on both the real 
and imaginary parts at the same time.  I've got instructions to do that, 
but where they find their operands is, umm, inconvenient -- a complex 
add ends up looking like a 2-element vector add.
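
(A sketch of the shape of code in question, using the same GNU _Complex
extension as the example earlier in the thread:)

_Complex float cadd (_Complex float a, _Complex float b)
{
  return a + b;  /* one complex-add insn on such a target; once lowered
                    early, it becomes two scalar adds that RTL would
                    have to re-fuse into a 2-element vector add.  */
}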




> auto-vec it also means we treat them as just any other loads/stores.

> I think LLVM keeps them as complex expressions much longer and they've
> been having a harder time implementing some of the complex arith stuff
> we did in GCC 11.

Good to know.  Thanks.

jeff


Re: [PATCH]middle-end Use subregs to expand COMPLEX_EXPR to set the lowpart.

2022-06-13 Thread Jeff Law via Gcc-patches




On 6/13/2022 5:54 AM, Richard Biener wrote:

On Sun, Jun 12, 2022 at 7:27 PM Jeff Law via Gcc-patches
 wrote:
[...]

>> On a related topic, any thoughts on keeping complex objects as complex
>> types/modes through gimple and into at least parts of the RTL pipeline?
>>
>> The way complex arithmetic instructions work on our chip is going to be
>> extremely tough to utilize in GCC -- we really need to keep the complex
>> types/arithmetic up through RTL generation at the least.  Ideally we'd
>> even expose complex modes all the way to final.  Is that something
>> y'all could benefit from as well?  Have y'all poked at this problem at all?

> Since you are going to need to "recover" complex operations from people
> open-coding them (both fortran and C and also C++ with std::complex) it
> should be less work to just do that ;)  I think that complex modes and types
> exist solely for ABI purposes.
I don't see any reasonable way to do that.  Without going into all the 
details, our complex ops work on low elements within a vector 
register.   Trying to recover them after gimple->rtl expansion would be 
similar to trying to SLP vectorize on RTL, something I'm not keen to chase.


It would be a hell of a lot easier to leave the complex ops as complex 
ops to the expanders, then make the decision to use the complex 
instructions or decompose into components.


jeff


[PATCH v2 1/4] xtensa: Improve shift operations more

2022-06-13 Thread Takayuki 'January June' Suwa via Gcc-patches
Changes from v1:
  (*shift_per_byte_omit_AND_1): changed to be split as early as possible


===
This patch introduces funnel shifter utilization, and rearranges existing
"per-byte shift" insn patterns.

gcc/ChangeLog:

* config/xtensa/predicates.md (logical_shift_operator,
xtensa_shift_per_byte_operator): New predicates.
* config/xtensa/xtensa-protos.h (xtensa_shlrd_which_direction):
New prototype.
* config/xtensa/xtensa.cc (xtensa_shlrd_which_direction):
New helper function for funnel shift patterns.
* config/xtensa/xtensa.md (ior_op): New code iterator.
(*ashlsi3_1): Replace with new split pattern.
(*shift_per_byte): Unify *ashlsi3_3x, *ashrsi3_3x and *lshrsi3_3x.
(*shift_per_byte_omit_AND_0, *shift_per_byte_omit_AND_1):
New insn-and-split patterns that redirect to *xtensa_shift_per_byte,
in order to omit unnecessary bitwise AND operation.
(*shlrd_reg_<code>, *shlrd_const_<code>, *shlrd_per_byte_<code>,
*shlrd_per_byte_<code>_omit_AND):
New insn patterns for funnel shifts.

gcc/testsuite/ChangeLog:

* gcc.target/xtensa/funnel_shifter.c: New.
---
 gcc/config/xtensa/predicates.md   |   6 +
 gcc/config/xtensa/xtensa-protos.h |   1 +
 gcc/config/xtensa/xtensa.cc   |  14 ++
 gcc/config/xtensa/xtensa.md   | 213 ++
 .../gcc.target/xtensa/funnel_shifter.c|  17 ++
 5 files changed, 213 insertions(+), 38 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/xtensa/funnel_shifter.c

diff --git a/gcc/config/xtensa/predicates.md b/gcc/config/xtensa/predicates.md
index a912e6d8bb2..bcc83ada0ae 100644
--- a/gcc/config/xtensa/predicates.md
+++ b/gcc/config/xtensa/predicates.md
@@ -164,9 +164,15 @@
 (define_predicate "boolean_operator"
   (match_code "eq,ne"))
 
+(define_predicate "logical_shift_operator"
+  (match_code "ashift,lshiftrt"))
+
 (define_predicate "xtensa_cstoresi_operator"
   (match_code "eq,ne,gt,ge,lt,le"))
 
+(define_predicate "xtensa_shift_per_byte_operator"
+  (match_code "ashift,ashiftrt,lshiftrt"))
+
 (define_predicate "tls_symbol_operand"
   (and (match_code "symbol_ref")
(match_test "SYMBOL_REF_TLS_MODEL (op) != 0")))
diff --git a/gcc/config/xtensa/xtensa-protos.h 
b/gcc/config/xtensa/xtensa-protos.h
index c2fd750cd3a..2c08ed4992d 100644
--- a/gcc/config/xtensa/xtensa-protos.h
+++ b/gcc/config/xtensa/xtensa-protos.h
@@ -56,6 +56,7 @@ extern char *xtensa_emit_bit_branch (bool, bool, rtx *);
 extern char *xtensa_emit_movcc (bool, bool, bool, rtx *);
 extern char *xtensa_emit_call (int, rtx *);
 extern bool xtensa_tls_referenced_p (rtx);
+extern enum rtx_code xtensa_shlrd_which_direction (rtx, rtx);
 
 #ifdef TREE_CODE
 extern void init_cumulative_args (CUMULATIVE_ARGS *, int);
diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
index 3477e983592..df78af66714 100644
--- a/gcc/config/xtensa/xtensa.cc
+++ b/gcc/config/xtensa/xtensa.cc
@@ -2403,6 +2403,20 @@ xtensa_tls_referenced_p (rtx x)
 }
 
 
+/* Helper function for "*shlrd_..." patterns.  */
+
+enum rtx_code
+xtensa_shlrd_which_direction (rtx op0, rtx op1)
+{
+  if (GET_CODE (op0) == ASHIFT && GET_CODE (op1) == LSHIFTRT)
+return ASHIFT; /* shld  */
+  if (GET_CODE (op0) == LSHIFTRT && GET_CODE (op1) == ASHIFT)
+return LSHIFTRT;   /* shrd  */
+
+  return UNKNOWN;
+}
+
+
 /* Implement TARGET_CANNOT_FORCE_CONST_MEM.  */
 
 static bool
diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index d806d43d129..cd7ded073eb 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -83,6 +83,9 @@
 ;; the same template.
 (define_mode_iterator HQI [HI QI])
 
+;; This code iterator is for *shlrd and its variants.
+(define_code_iterator ior_op [ior plus])
+
 
 ;; Attributes.
 
@@ -1267,16 +1270,6 @@
   operands[1] = xtensa_copy_incoming_a7 (operands[1]);
 })
 
-(define_insn "*ashlsi3_1"
-  [(set (match_operand:SI 0 "register_operand" "=a")
-   (ashift:SI (match_operand:SI 1 "register_operand" "r")
-  (const_int 1)))]
-  "TARGET_DENSITY"
-  "add.n\t%0, %1, %1"
-  [(set_attr "type""arith")
-   (set_attr "mode""SI")
-   (set_attr "length"  "2")])
-
 (define_insn "ashlsi3_internal"
   [(set (match_operand:SI 0 "register_operand" "=a,a")
(ashift:SI (match_operand:SI 1 "register_operand" "r,r")
@@ -1289,16 +1282,14 @@
(set_attr "mode""SI")
(set_attr "length"  "3,6")])
 
-(define_insn "*ashlsi3_3x"
-  [(set (match_operand:SI 0 "register_operand" "=a")
-   (ashift:SI (match_operand:SI 1 "register_operand" "r")
-  (ashift:SI (match_operand:SI 2 "register_operand" "r")
- (const_int 3))))]
-  ""
-  "ssa8b\t%2\;sll\t%0, %1"
-  [(set_attr "type""arith")
-   (set_attr "mode""SI")
-   (set_attr "length"  "6")])
+(define_split
+  [(set (match_operand:SI 0 "register_operand")
+   (ashift:SI (match_operand:SI 1 

[PATCH v2 4/4] xtensa: Optimize bitwise AND operation with some specific forms of constants

2022-06-13 Thread Takayuki 'January June' Suwa via Gcc-patches
On 2022/06/13 12:49, Max Filippov wrote:
> Hi Suwa-san,
hi!

> This change produces a bunch of regression test failures in big-endian
> configuration:
bad news X(
that point is what i was a little worried about...

> E.g. for the test gcc.c-torture/execute/struct-ini-2.c
> the following assembly code is generated now:
> and the following code was generated before this change:
-   .literal .LC1, -4096
-   l32ra10, .LC1
-   and a10, a8, a10
+   extui   a10, a8, 16, 4  // wrong! must be 12, 4
+   sllia10, a10, 12
and of course, '(zero_extract)' is endianness-sensitive.
(ref. 14.11 Bit-Fields, gcc-internals)

all the patches that I previously posted do not match or emit '(zero_extract)', 
except for this case.

===
This patch offers several insn-and-split patterns for bitwise AND with
register and constant that can be represented as:

i.   1's least significant N bits and the others 0's (17 <= N <= 31)
ii.  1's most significant N bits and the others 0's (12 <= N <= 31)
iii. M 1's sequence of bits and trailing N 0's bits, that cannot fit into a
"MOVI Ax, simm12" instruction (1 <= M <= 16, 1 <= N <= 30)

And also offers shortcuts for conditional branch if each of the abovementioned
operations is (not) equal to zero.
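
(For reference, the shift-pair equivalences the new splits rely on, in
scalar form -- a sketch with constants chosen to fall inside the stated
ranges, not code from the patch:)

unsigned case_i (unsigned a)   /* a & 0x0001FFFF, i.e. N = 17 */
{
  return (a << 15) >> 15;      /* slli + srli, each by 32 - N */
}

unsigned case_ii (unsigned a)  /* a & 0xFFFFF000, i.e. N = 20 */
{
  return (a >> 12) << 12;      /* srli + slli, each by 32 - N */
}

Either pair replaces a literal-pool load of the mask plus an AND.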

gcc/ChangeLog:

* config/xtensa/predicates.md (shifted_mask_operand):
New predicate.
* config/xtensa/xtensa.md (*andsi3_const_pow2_minus_one):
New insn-and-split pattern.
(*andsi3_const_negative_pow2, *andsi3_const_shifted_mask,
*masktrue_const_pow2_minus_one, *masktrue_const_negative_pow2,
*masktrue_const_shifted_mask): Ditto.
---
 gcc/config/xtensa/predicates.md |  10 ++
 gcc/config/xtensa/xtensa.md | 179 
 2 files changed, 189 insertions(+)

diff --git a/gcc/config/xtensa/predicates.md b/gcc/config/xtensa/predicates.md
index bcc83ada0ae..d63a6cf034c 100644
--- a/gcc/config/xtensa/predicates.md
+++ b/gcc/config/xtensa/predicates.md
@@ -52,6 +52,16 @@
(match_test "xtensa_mask_immediate (INTVAL (op))"))
(match_operand 0 "register_operand")))
 
+(define_predicate "shifted_mask_operand"
+  (match_code "const_int")
+{
+  HOST_WIDE_INT mask = INTVAL (op);
+  int shift = ctz_hwi (mask);
+
+  return IN_RANGE (shift, 1, 31)
+&& xtensa_mask_immediate ((uint32_t)mask >> shift);
+})
+
 (define_predicate "extui_fldsz_operand"
   (and (match_code "const_int")
(match_test "IN_RANGE (INTVAL (op), 1, 16)")))
diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index a4477e2207e..5d0f346b01a 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -645,6 +645,83 @@
(set_attr "mode""SI")
(set_attr "length"  "6")])
 
+(define_insn_and_split "*andsi3_const_pow2_minus_one"
+  [(set (match_operand:SI 0 "register_operand" "=a")
+   (and:SI (match_operand:SI 1 "register_operand" "r")
+   (match_operand:SI 2 "const_int_operand" "i")))]
+  "IN_RANGE (exact_log2 (INTVAL (operands[2]) + 1), 17, 31)"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (ashift:SI (match_dup 1)
+  (match_dup 2)))
+   (set (match_dup 0)
+   (lshiftrt:SI (match_dup 0)
+(match_dup 2)))]
+{
+  operands[2] = GEN_INT (32 - floor_log2 (INTVAL (operands[2]) + 1));
+}
+  [(set_attr "type""arith")
+   (set_attr "mode""SI")
+   (set (attr "length")
+   (if_then_else (match_test "TARGET_DENSITY
+  && INTVAL (operands[2]) == 0x7FFFFFFF")
+ (const_int 5)
+ (const_int 6)))])
+
+(define_insn_and_split "*andsi3_const_negative_pow2"
+  [(set (match_operand:SI 0 "register_operand" "=a")
+   (and:SI (match_operand:SI 1 "register_operand" "r")
+   (match_operand:SI 2 "const_int_operand" "i")))]
+  "IN_RANGE (exact_log2 (-INTVAL (operands[2])), 12, 31)"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (lshiftrt:SI (match_dup 1)
+(match_dup 2)))
+   (set (match_dup 0)
+   (ashift:SI (match_dup 0)
+  (match_dup 2)))]
+{
+  operands[2] = GEN_INT (floor_log2 (-INTVAL (operands[2])));
+}
+  [(set_attr "type""arith")
+   (set_attr "mode""SI")
+   (set_attr "length"  "6")])
+
+(define_insn_and_split "*andsi3_const_shifted_mask"
+  [(set (match_operand:SI 0 "register_operand" "=a")
+   (and:SI (match_operand:SI 1 "register_operand" "r")
+   (match_operand:SI 2 "shifted_mask_operand" "i")))]
+  "! xtensa_simm12b (INTVAL (operands[2]))"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (zero_extract:SI (match_dup 1)
+(match_dup 3)
+(match_dup 4)))
+   (set (match_dup 0)
+   (ashift:SI (match_dup 0)
+  (match_dup 2)))]
+{
+  HOST_WIDE_INT mask = INTVAL (operands[2]);
+  int shift = ctz_hwi (mask);
+  int mask_size = floor_log2 (((uint32_t)mask >> shift) + 1);
+  int mask_pos = shift;
+  

Re: [PATCH] Add -fextra-libc-function=memcmpeq for __memcmpeq

2022-06-13 Thread Richard Biener via Gcc-patches



> Am 13.06.2022 um 16:36 schrieb H.J. Lu :
> 
> On Mon, Jun 13, 2022 at 3:11 AM Richard Biener
>  wrote:
>> 
>>> On Tue, Jun 7, 2022 at 9:02 PM H.J. Lu via Gcc-patches
>>>  wrote:
>>> 
>>> Add -fextra-libc-function=memcmpeq to map
>>> 
>>> extern int __memcmpeq (const void *, const void *, size_t);
>>> 
>>> which was added to GLIBC 2.35, to __builtin_memcmp_eq.
>> 
>> Humm.  Can't we instead use the presence of a declaration
>> of __memcmpeq with a GNU standard dialect as this instead of
>> adding a weird -fextra-libc-function= option?  Maybe that's even
>> reasonable with a non-GNU dialect standard in effect since
>> __ prefixed names are in the implementation namespace?
> 
> But not all source files include <string.h> and GCC may generate
> memcmp directly.  How should we handle these cases?

Not.  Similar to vectorized math functions.
I think it’s not worth optimizing for this case.

Richard.
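
(For context, the kind of call affected -- an illustration, not from the
patch: the result is only tested for equality with zero, so GCC already
funnels it through the internal BUILT_IN_MEMCMP_EQ stub, which the
proposed option would let expand to glibc 2.35's __memcmpeq:)

#include <string.h>

int same_key (const char *a, const char *b)
{
  return memcmp (a, b, 16) == 0;
}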

> 
>> Richard.
>> 
>>> gcc/
>>> 
>>>* builtins.cc: Include "opts.h".
>>>(expand_builtin): Generate BUILT_IN_MEMCMP_EQ if __memcmpeq is
>>>available.
>>>* builtins.def (BUILT_IN___MEMCMPEQ): New.
>>>* common.opt: Add -fextra-libc-function=.
>>>* opts.cc (extra_libc_functions): New.
>>>(parse_extra_libc_function): New function.
>>>(common_handle_option): Handle -fextra-libc-function=.
>>>* opts.h (extra_libc_function_list): New.
>>>(extra_libc_functions): Likewise.
>>>* doc/invoke.texi: Document -fextra-libc-function=memcmpeq.
>>> 
>>> gcc/testsuite/
>>> 
>>>* c-c++-common/memcmpeq-1.c: New test.
>>>* c-c++-common/memcmpeq-2.c: Likewise.
>>>* c-c++-common/memcmpeq-3.c: Likewise.
>>>* c-c++-common/memcmpeq-4.c: Likewise.
>>>* c-c++-common/memcmpeq-5.c: Likewise.
>>>* c-c++-common/memcmpeq-6.c: Likewise.
>>>* c-c++-common/memcmpeq-7.c: Likewise.
>>> ---
>>> gcc/builtins.cc |  5 -
>>> gcc/builtins.def|  4 
>>> gcc/common.opt  |  4 
>>> gcc/doc/invoke.texi |  6 ++
>>> gcc/opts.cc | 23 +++
>>> gcc/opts.h  |  7 +++
>>> gcc/testsuite/c-c++-common/memcmpeq-1.c | 11 +++
>>> gcc/testsuite/c-c++-common/memcmpeq-2.c | 11 +++
>>> gcc/testsuite/c-c++-common/memcmpeq-3.c | 11 +++
>>> gcc/testsuite/c-c++-common/memcmpeq-4.c | 11 +++
>>> gcc/testsuite/c-c++-common/memcmpeq-5.c | 11 +++
>>> gcc/testsuite/c-c++-common/memcmpeq-6.c | 11 +++
>>> gcc/testsuite/c-c++-common/memcmpeq-7.c | 11 +++
>>> 13 files changed, 125 insertions(+), 1 deletion(-)
>>> create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-1.c
>>> create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-2.c
>>> create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-3.c
>>> create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-4.c
>>> create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-5.c
>>> create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-6.c
>>> create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-7.c
>>> 
>>> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
>>> index b9d89b409b8..22269318e8c 100644
>>> --- a/gcc/builtins.cc
>>> +++ b/gcc/builtins.cc
>>> @@ -81,6 +81,7 @@ along with GCC; see the file COPYING3.  If not see
>>> #include "demangle.h"
>>> #include "gimple-range.h"
>>> #include "pointer-query.h"
>>> +#include "opts.h"
>>> 
>>> struct target_builtins default_target_builtins;
>>> #if SWITCHABLE_TARGET
>>> @@ -7410,7 +7411,9 @@ expand_builtin (tree exp, rtx target, rtx subtarget, 
>>> machine_mode mode,
>>>return target;
>>>   if (fcode == BUILT_IN_MEMCMP_EQ)
>>>{
>>> - tree newdecl = builtin_decl_explicit (BUILT_IN_MEMCMP);
>>> + tree newdecl = builtin_decl_explicit
>>> +   (extra_libc_functions.has_memcmpeq
>>> +? BUILT_IN___MEMCMPEQ : BUILT_IN_MEMCMP);
>>>  TREE_OPERAND (exp, 1) = build_fold_addr_expr (newdecl);
>>>}
>>>   break;
>>> diff --git a/gcc/builtins.def b/gcc/builtins.def
>>> index 005976f34e9..eb8d33b16e9 100644
>>> --- a/gcc/builtins.def
>>> +++ b/gcc/builtins.def
>>> @@ -965,6 +965,10 @@ DEF_BUILTIN_STUB (BUILT_IN_ALLOCA_WITH_ALIGN_AND_MAX, 
>>> "__builtin_alloca_with_ali
>>>equality with zero.  */
>>> DEF_BUILTIN_STUB (BUILT_IN_MEMCMP_EQ, "__builtin_memcmp_eq")
>>> 
>>> +/* Similar to BUILT_IN_MEMCMP_EQ, but is mapped to __memcmpeq only with
>>> +   -fextra-libc-function=memcmpeq.  */
>>> +DEF_EXT_LIB_BUILTIN (BUILT_IN___MEMCMPEQ, "__memcmpeq", 
>>> BT_FN_INT_CONST_PTR_CONST_PTR_SIZE, ATTR_PURE_NOTHROW_NONNULL_LEAF)
>>> +
>>> /* An internal version of strcmp/strncmp, used when the result is only
>>>tested for equality with zero.  */
>>> DEF_BUILTIN_STUB (BUILT_IN_STRCMP_EQ, "__builtin_strcmp_eq")
>>> diff --git a/gcc/common.opt b/gcc/common.opt
>>> index 

Re: [PATCH] Do not erase warning data in gimple_set_location

2022-06-13 Thread Martin Sebor via Gcc-patches

On 6/13/22 05:15, Richard Biener wrote:

On Fri, Jun 10, 2022 at 12:58 PM Eric Botcazou via Gcc-patches
 wrote:


Hi,

gimple_set_location is mostly invoked on newly built GIMPLE statements, so
their location is UNKNOWN_LOCATION and setting it will clobber the warning
data of the passed location, if any.
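
(A sketch of the guard this implies, paraphrased from the description
above rather than copied from the attached patch:)

static inline void
gimple_set_location (gimple *g, location_t location)
{
  /* Only migrate warning-suppression data when the statement already
     has a real location; otherwise copy_warning would erase the data
     attached to LOCATION.  */
  if (gimple_location (g) != UNKNOWN_LOCATION)
    copy_warning (location, gimple_location (g));
  g->location = location;
}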


Hmm, I think instead of special-casing UNKNOWN_LOCATION
what gimple_set_location should probably do is either not copy
warnings at all or union them.  Btw, gimple_set_location also
removes a previously set BLOCK (but gimple_set_block preserves
the location locus and diagnostic override).

So I'd be tempted to axe the copy_warning () completely here.  Martin,
there were
probably cases that warranted it - do you remember anything specific here?


Nothing specific, just that the assumption behind the warning group
design was that a location must exist in order to suppress a warning
(a location is one of the first things that's set early on by the FE
and it makes little sense to issue a warning without one).

There was, and in all likelihood still is, code that sets TREE_NO_WARNING
or gimple_no_warning on new trees/statements before setting their
location.  That interferes with the design when the new tree or
statement is meant to be a replacement of another.  I fixed a few
cases like that to set the location first but didn't have a way
of finding all such instances.  My expectation was to over time
change GCC to make sure a location would always be set before
the no-warning bit, and asserting that on every call to these
routines.  Adding tests like in the patch below goes in the opposite
direction and effectively papers over the problem.  I can't think
of a way to make the suppression work completely reliably without
ensuring that a location is always set before suppressing
a warning.

Martin



Thanks,
Richard.


Tested on x86-64/Linux, OK for mainline and 12 branch?


2022-06-10  Eric Botcazou  

 * gimple.h (gimple_set_location): Do not copy warning data from
 the previous location when it is UNKNOWN_LOCATION.


2022-06-10  Eric Botcazou  

testsuite/
 * c-c++-common/nonnull-1.c: Remove XFAIL for C++.

--
Eric Botcazou






[PATCH] i386: Return true for (SUBREG (MEM....)) in register_no_elim_operand [PR105927]

2022-06-13 Thread Uros Bizjak via Gcc-patches
Under certain conditions register_operand predicate also allows
subregs of memory operands.  When RTL checking is enabled, these
will fail with REGNO (op).

Allow subregs of memory operands, these are guaranteed
to be reloaded to a register.

2022-06-13  Uroš Bizjak  

gcc/ChangeLog:

PR target/105927
* config/i386/predicates.md (register_no_elim_operand):
Return true for subreg of a memory operand.

gcc/testsuite/ChangeLog:

PR target/105927
* gcc.target/i386/pr105927.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 848a79a8d16..128144f1050 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -672,6 +672,12 @@ (define_predicate "register_no_elim_operand"
 {
   if (SUBREG_P (op))
 op = SUBREG_REG (op);
+
+  /* Before reload, we can allow (SUBREG (MEM...)) as a register operand
+ because it is guaranteed to be reloaded into one.  */
+  if (MEM_P (op))
+return true;
+
   return !(op == arg_pointer_rtx
   || op == frame_pointer_rtx
   || IN_RANGE (REGNO (op),
@@ -685,6 +691,7 @@ (define_predicate "index_register_operand"
 {
   if (SUBREG_P (op))
 op = SUBREG_REG (op);
+
   if (reload_completed)
 return REG_OK_FOR_INDEX_STRICT_P (op);
   else
diff --git a/gcc/testsuite/gcc.target/i386/pr105927.c 
b/gcc/testsuite/gcc.target/i386/pr105927.c
new file mode 100644
index 000..602461806fb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr105927.c
@@ -0,0 +1,18 @@
+/* PR target/105927 */
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O1 -fno-tree-dce -mtune=k6-3 -msse2" } */
+
+typedef _Float16 __attribute__((__vector_size__(4))) U;
+typedef _Float16 __attribute__((__vector_size__(2))) V;
+typedef short __attribute__((__vector_size__(4))) W;
+V v;
+U u;
+
+extern void bar(W i);
+
+void
+foo(void)
+{
+  U x = __builtin_shufflevector(v, u, 2, 0);
+  bar(x >= 0);
+}


[committed] d: Match function declarations of gcc built-ins from any module.

2022-06-13 Thread Iain Buclaw via Gcc-patches
This patch changes the `Compiler::onParseModule' hook in the D front-end
to scan for declarations of recognised gcc built-ins from any module.
Previously, only the `core.stdc' package was scanned.

In addition to matching of the symbol, any user-applied `@attributes' or
`pragma(mangle)' name will be applied to the built-in decl as well.
Because there would now be no control over where built-in declarations
are coming from, the warning option `-Wbuiltin-declaration-mismatch' has
been implemented in the D front-end too.

Bootstrapped and regression tested on x86_64-linux-gnu/-m32/-mx32, and
committed to mainline.

Regards,
Iain.

---
gcc/d/ChangeLog:

* d-builtins.cc: Include builtins.h.
(gcc_builtins_libfuncs): Remove.
(strip_type_modifiers): New function.
(matches_builtin_type): New function.
(covariant_with_builtin_type_p): New function.
(maybe_set_builtin_1): Set front-end built-in if identifier matches
gcc built-in name.  Apply user-specified attributes and assembler name
overrides to the built-in.  Warn about built-in declaration mismatches.
(d_builtin_function): Set IDENTIFIER_DECL_TREE of built-in functions.
* d-compiler.cc (Compiler::onParseModule): Scan all modules for any
identifiers that match built-in function names.
* lang.opt (Wbuiltin-declaration-mismatch): New option.

gcc/testsuite/ChangeLog:

* gdc.dg/Wbuiltin_declaration_mismatch.d: New test.
* gdc.dg/builtins.d: New test.
---
 gcc/d/d-builtins.cc   | 136 --
 gcc/d/d-compiler.cc   |  40 +++---
 gcc/d/lang.opt|   4 +
 .../gdc.dg/Wbuiltin_declaration_mismatch.d|  37 +
 gcc/testsuite/gdc.dg/builtins.d   |  17 +++
 5 files changed, 203 insertions(+), 31 deletions(-)
 create mode 100644 gcc/testsuite/gdc.dg/Wbuiltin_declaration_mismatch.d
 create mode 100644 gcc/testsuite/gdc.dg/builtins.d

diff --git a/gcc/d/d-builtins.cc b/gcc/d/d-builtins.cc
index cd9748c1de1..c2ef0c836e1 100644
--- a/gcc/d/d-builtins.cc
+++ b/gcc/d/d-builtins.cc
@@ -37,6 +37,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "common/common-target.h"
 #include "stringpool.h"
 #include "stor-layout.h"
+#include "builtins.h"
 
 #include "d-tree.h"
 #include "d-frontend.h"
@@ -44,7 +45,6 @@ along with GCC; see the file COPYING3.  If not see
 
 
 static GTY(()) vec <tree, va_gc> *gcc_builtins_functions = NULL;
-static GTY(()) vec <tree, va_gc> *gcc_builtins_libfuncs = NULL;
 static GTY(()) vec <tree, va_gc> *gcc_builtins_types = NULL;
 
 /* Record built-in types and their associated decls for re-use when
@@ -672,6 +672,87 @@ d_build_builtins_module (Module *m)
   m->members->push (LinkDeclaration::create (Loc (), LINK::c, members));
 }
 
+/* Remove all type modifiers from TYPE, returning the naked type.  */
+
+static Type *
+strip_type_modifiers (Type *type)
+{
+  if (type->ty == TY::Tpointer)
+{
+  Type *tnext = strip_type_modifiers (type->nextOf ());
+  return tnext->pointerTo ();
+}
+
+  return type->castMod (0);
+}
+
+/* Returns true if types T1 and T2 representing return types or types of
+   function arguments are close enough to be considered interchangeable.  */
+
+static bool
+matches_builtin_type (Type *t1, Type *t2)
+{
+  Type *tb1 = strip_type_modifiers (t1);
+  Type *tb2 = strip_type_modifiers (t2);
+
+  if (same_type_p (t1, t2))
+return true;
+
+  if (((tb1->isTypePointer () && tb2->isTypePointer ())
+   || (tb1->isTypeVector () && tb2->isTypeVector ()))
+  && tb1->implicitConvTo (tb2) != MATCH::nomatch)
+return true;
+
+  if (tb1->isintegral () == tb2->isintegral ()
+  && tb1->size () == tb2->size ())
+return true;
+
+  return false;
+}
+
+/* Check whether the declared function type T1 is covariant with the built-in
+   function type T2.  Returns true if they are covariant.  */
+
+static bool
+covariant_with_builtin_type_p (Type *t1, Type *t2)
+{
+  /* Check whether the declared function matches the built-in.  */
+  if (same_type_p (t1, t2) || t1->covariant (t2) == Covariant::yes)
+return true;
+
+  /* May not be covariant because of D attributes applied on t1.
+ Strip them all off and compare again.  */
+  TypeFunction *tf1 = t1->isTypeFunction ();
+  TypeFunction *tf2 = t2->isTypeFunction ();
+
+  /* Check for obvious reasons why types may be distinct.  */
+  if (tf1 == NULL || tf2 == NULL
+  || tf1->isref () != tf2->isref ()
+  || tf1->parameterList.varargs != tf2->parameterList.varargs
+  || tf1->parameterList.length () != tf2->parameterList.length ())
+return false;
+
+  /* Check return type and each parameter type for mismatch.  */
+  if (!matches_builtin_type (tf1->next, tf2->next))
+return false;
+
+  const size_t nparams = tf1->parameterList.length ();
+  for (size_t i = 0; i < nparams; i++)
+{
+  Parameter *fparam1 = tf1->parameterList[i];
+  Parameter *fparam2 = 

RE: [PATCH 1/2]middle-end Support optimized division by pow2 bitmask

2022-06-13 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Monday, June 13, 2022 12:48 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford
> 
> Subject: RE: [PATCH 1/2]middle-end Support optimized division by pow2
> bitmask
> 
> On Mon, 13 Jun 2022, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Monday, June 13, 2022 10:39 AM
> > > To: Tamar Christina 
> > > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford
> > > 
> > > Subject: Re: [PATCH 1/2]middle-end Support optimized division by
> > > pow2 bitmask
> > >
> > > On Mon, 13 Jun 2022, Richard Biener wrote:
> > >
> > > > On Thu, 9 Jun 2022, Tamar Christina wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > In plenty of image and video processing code it's common to
> > > > > modify pixel values by a widening operation and then scale them
> > > > > back into range
> > > by dividing by 255.
> > > > >
> > > > > This patch adds an optab to allow us to emit an optimized
> > > > > sequence when doing an unsigned division that is equivalent to:
> > > > >
> > > > >x = y / (2 ^ (bitsize (y)/2) - 1)
> > > > >
> > > > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > > > x86_64-pc-linux-gnu and no issues.
> > > > >
> > > > > Ok for master?
> > > >
> > > > Looking at 2/2 it seems that this is the wrong way to attack the
> > > > problem.  The ISA doesn't have such instruction so adding an optab
> > > > looks premature.  I suppose that there's no unsigned vector
> > > > integer division and thus we open-code that in a different way?
> > > > Isn't the correct thing then to fixup that open-coding if it is more
> efficient?
> > >
> >
> > The problem is that even if you fixup the open-coding it would need to
> > be something target specific? The sequence of instructions we generate
> > doesn't have a GIMPLE representation.  So whatever is generated I'd have
> > to fixup in RTL then.
> 
> What's the operation that doesn't have a GIMPLE representation?

For NEON use two operations:
1. Add High narrowing lowpart, essentially doing (a +w b) >>.n bitsize(a)/2
Where the + widens and the >> narrows.  So you give it two shorts, get a byte.
2. Add widening add of lowpart so basically lowpart (a +w b)

For SVE2 we use a different sequence, we use two back-to-back sequences of:
1. Add narrow high part (bottom).  In SVE the Top and Bottom instructions select
   Even and odd elements of the vector rather than "top half" and "bottom half".

   So this instruction does: add each vector element of the first source
   vector to the corresponding vector element of the second source vector,
   and place the most significant half of the result in the even-numbered
   half-width destination elements, while setting the odd-numbered elements
   to zero.

So there's an explicit permute in there.  The instructions are sufficiently
different that there wouldn't be a single GIMPLE representation.
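
(For reference, the scalar identity that makes such two-instruction
sequences possible -- a standalone sketch, not from the patch; exact
for x in [0, 0xFFFE], which covers any product of two bytes:)

#include <stdint.h>
#include <assert.h>

static unsigned div255 (uint16_t x)
{
  return (x + (x >> 8) + 1u) >> 8;  /* widening add, then "high half" */
}

int main (void)
{
  for (uint32_t x = 0; x <= 0xFFFEu; ++x)
    assert (div255 (x) == x / 255);
  return 0;
}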

> 
> I think for costing you could resort to the *_cost functions as used by
> synth_mult and friends.
> 
> > The problem with this is that it seemed fragile. We generate from the
> > Vectorizer:
> >
> >   vect__3.8_35 = MEM  [(uint8_t *)_21];
> >   vect_patt_28.9_37 = WIDEN_MULT_LO_EXPR <vect__3.8_35, vect_cst__36>;
> >   vect_patt_28.9_38 = WIDEN_MULT_HI_EXPR <vect__3.8_35, vect_cst__36>;
> >   vect_patt_19.10_40 = vect_patt_28.9_37 h* { 32897, 32897, 32897, 32897, 32897, 32897, 32897, 32897 };
> >   vect_patt_19.10_41 = vect_patt_28.9_38 h* { 32897, 32897, 32897, 32897, 32897, 32897, 32897, 32897 };
> >   vect_patt_25.11_42 = vect_patt_19.10_40 >> 7;
> >   vect_patt_25.11_43 = vect_patt_19.10_41 >> 7;
> >   vect_patt_11.12_44 = VEC_PACK_TRUNC_EXPR <vect_patt_25.11_42, vect_patt_25.11_43>;
> >
> > and if the magic constants change then we miss the optimization. I
> > could rewrite the open coding to use shifts alone, but that might be a
> regression for some uarches I would imagine.
> 
> OK, so you do have a highpart multiply.  I suppose the pattern is too deep to
> be recognized by combine?  What's the RTL good vs. bad before combine of
> one of the expressions?

Yeah combine only tried 2-3 instructions, but to use these sequences we have to
match the entire chain as the instructions do the narrowing themselves.
So the RTL for the bad case before combine is

(insn 39 37 42 4 (set (reg:V4SI 119)
(mult:V4SI (zero_extend:V4SI (vec_select:V4HI (reg:V8HI 116 [ vect_patt_28.9D.3754 ])
(parallel:V8HI [
(const_int 4 [0x4])
(const_int 5 [0x5])
(const_int 6 [0x6])
(const_int 7 [0x7])
])))
(zero_extend:V4SI (vec_select:V4HI (reg:V8HI 118)
(parallel:V8HI [
(const_int 4 [0x4])
(const_int 5 [0x5])
(const_int 6 [0x6])
(const_int 7 [0x7])
 

Re: [PATCH] Add -fextra-libc-function=memcmpeq for __memcmpeq

2022-06-13 Thread H.J. Lu via Gcc-patches
On Mon, Jun 13, 2022 at 3:11 AM Richard Biener
 wrote:
>
> On Tue, Jun 7, 2022 at 9:02 PM H.J. Lu via Gcc-patches
>  wrote:
> >
> > Add -fextra-libc-function=memcmpeq to map
> >
> > extern int __memcmpeq (const void *, const void *, size_t);
> >
> > which was added to GLIBC 2.35, to __builtin_memcmp_eq.
>
> Humm.  Can't we instead use the presence of a declaration
> of __memcmpeq with a GNU standard dialect as this instead of
> adding a weird -fextra-libc-function= option?  Maybe that's even
> reasonable with a non-GNU dialect standard in effect since
> __ prefixed names are in the implementation namespace?

But not all source files include <string.h> and GCC may generate
memcmp directly.  How should we handle these cases?

> Richard.
>
> > gcc/
> >
> > * builtins.cc: Include "opts.h".
> > (expand_builtin): Generate BUILT_IN_MEMCMP_EQ if __memcmpeq is
> > available.
> > * builtins.def (BUILT_IN___MEMCMPEQ): New.
> > * common.opt: Add -fextra-libc-function=.
> > * opts.cc (extra_libc_functions): New.
> > (parse_extra_libc_function): New function.
> > (common_handle_option): Handle -fextra-libc-function=.
> > * opts.h (extra_libc_function_list): New.
> > (extra_libc_functions): Likewise.
> > * doc/invoke.texi: Document -fextra-libc-function=memcmpeq.
> >
> > gcc/testsuite/
> >
> > * c-c++-common/memcmpeq-1.c: New test.
> > * c-c++-common/memcmpeq-2.c: Likewise.
> > * c-c++-common/memcmpeq-3.c: Likewise.
> > * c-c++-common/memcmpeq-4.c: Likewise.
> > * c-c++-common/memcmpeq-5.c: Likewise.
> > * c-c++-common/memcmpeq-6.c: Likewise.
> > * c-c++-common/memcmpeq-7.c: Likewise.
> > ---
> >  gcc/builtins.cc |  5 -
> >  gcc/builtins.def|  4 
> >  gcc/common.opt  |  4 
> >  gcc/doc/invoke.texi |  6 ++
> >  gcc/opts.cc | 23 +++
> >  gcc/opts.h  |  7 +++
> >  gcc/testsuite/c-c++-common/memcmpeq-1.c | 11 +++
> >  gcc/testsuite/c-c++-common/memcmpeq-2.c | 11 +++
> >  gcc/testsuite/c-c++-common/memcmpeq-3.c | 11 +++
> >  gcc/testsuite/c-c++-common/memcmpeq-4.c | 11 +++
> >  gcc/testsuite/c-c++-common/memcmpeq-5.c | 11 +++
> >  gcc/testsuite/c-c++-common/memcmpeq-6.c | 11 +++
> >  gcc/testsuite/c-c++-common/memcmpeq-7.c | 11 +++
> >  13 files changed, 125 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-1.c
> >  create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-2.c
> >  create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-3.c
> >  create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-4.c
> >  create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-5.c
> >  create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-6.c
> >  create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-7.c
> >
> > diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> > index b9d89b409b8..22269318e8c 100644
> > --- a/gcc/builtins.cc
> > +++ b/gcc/builtins.cc
> > @@ -81,6 +81,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "demangle.h"
> >  #include "gimple-range.h"
> >  #include "pointer-query.h"
> > +#include "opts.h"
> >
> >  struct target_builtins default_target_builtins;
> >  #if SWITCHABLE_TARGET
> > @@ -7410,7 +7411,9 @@ expand_builtin (tree exp, rtx target, rtx subtarget, 
> > machine_mode mode,
> > return target;
> >if (fcode == BUILT_IN_MEMCMP_EQ)
> > {
> > - tree newdecl = builtin_decl_explicit (BUILT_IN_MEMCMP);
> > + tree newdecl = builtin_decl_explicit
> > +   (extra_libc_functions.has_memcmpeq
> > +? BUILT_IN___MEMCMPEQ : BUILT_IN_MEMCMP);
> >   TREE_OPERAND (exp, 1) = build_fold_addr_expr (newdecl);
> > }
> >break;
> > diff --git a/gcc/builtins.def b/gcc/builtins.def
> > index 005976f34e9..eb8d33b16e9 100644
> > --- a/gcc/builtins.def
> > +++ b/gcc/builtins.def
> > @@ -965,6 +965,10 @@ DEF_BUILTIN_STUB (BUILT_IN_ALLOCA_WITH_ALIGN_AND_MAX, 
> > "__builtin_alloca_with_ali
> > equality with zero.  */
> >  DEF_BUILTIN_STUB (BUILT_IN_MEMCMP_EQ, "__builtin_memcmp_eq")
> >
> > +/* Similar to BUILT_IN_MEMCMP_EQ, but is mapped to __memcmpeq only with
> > +   -fextra-libc-function=memcmpeq.  */
> > +DEF_EXT_LIB_BUILTIN (BUILT_IN___MEMCMPEQ, "__memcmpeq", 
> > BT_FN_INT_CONST_PTR_CONST_PTR_SIZE, ATTR_PURE_NOTHROW_NONNULL_LEAF)
> > +
> >  /* An internal version of strcmp/strncmp, used when the result is only
> > tested for equality with zero.  */
> >  DEF_BUILTIN_STUB (BUILT_IN_STRCMP_EQ, "__builtin_strcmp_eq")
> > diff --git a/gcc/common.opt b/gcc/common.opt
> > index 7ca0cceed82..7a7631682b0 100644
> > --- a/gcc/common.opt
> > +++ b/gcc/common.opt
> > @@ -1587,6 +1587,10 @@ Enum(excess_precision) String(standard) 
> > 

[RFA configure parts] aarch64: Make cc1 handle --with options

2022-06-13 Thread Richard Sandiford via Gcc-patches
On aarch64, --with-arch, --with-cpu and --with-tune only have an
effect on the driver, so “./xgcc -B./ -O3” can give significantly
different results from “./cc1 -O3”.  --with-arch did have a limited
effect on ./cc1 in previous releases, although it didn't work
entirely correctly.

Being of a lazy persuasion, I've got used to ./cc1 selecting SVE for
--with-arch=armv8.2-a+sve without having to supply an explicit -march,
so this patch makes ./cc1 emulate the relevant OPTION_DEFAULT_SPECS.
It relies on Wilco's earlier clean-ups.

The patch makes config.gcc define WITH_FOO_STRING macros for each
supported --with-foo option.  This could be done only in aarch64-
specific code, but I thought it could be useful on other targets
too (and can be safely ignored otherwise).  There didn't seem to
be any existing and potentially clashing uses of macros with this
style of name.
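
(Concretely -- an illustration, not taken from the patch -- configuring
with --with-arch=armv8.2-a+sve and no --with-cpu or --with-tune would
make config.gcc add the equivalent of:)

#define WITH_ARCH_STRING "armv8.2-a+sve"
#define WITH_CPU_STRING nullptr
#define WITH_TUNE_STRING nullptr

(to tm_defines, which the new code in aarch64_override_options then
consults in place of OPTION_DEFAULT_SPECS.)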

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK for the configure
bits?

Richard


gcc/
* config.gcc: Define WITH_FOO_STRING macros for each supported
--with-foo option.
* config/aarch64/aarch64.cc (aarch64_override_options): Emulate
OPTION_DEFAULT_SPECS.
* config/aarch64/aarch64.h (OPTION_DEFAULT_SPECS): Reference the above.
---
 gcc/config.gcc| 14 ++
 gcc/config/aarch64/aarch64.cc |  8 
 gcc/config/aarch64/aarch64.h  |  5 -
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index cdbefb5b4f5..e039230431c 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -5865,6 +5865,20 @@ else
configure_default_options="{ ${t} }"
 fi
 
+for option in $supported_defaults
+do
+   lc_option=`echo $option | sed s/-/_/g`
+   uc_option=`echo $lc_option | tr a-z A-Z`
+   eval "val=\$with_$lc_option"
+   if test -n "$val"
+   then
+   val="\\\"$val\\\""
+   else
+   val=nullptr
+   fi
+   tm_defines="$tm_defines WITH_${uc_option}_STRING=$val"
+done
+
 if test "$target_cpu_default2" != ""
 then
if test "$target_cpu_default" != ""
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index d21e041eccb..0bc700b81ad 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -18109,6 +18109,14 @@ aarch64_override_options (void)
   if (aarch64_branch_protection_string)
 aarch64_validate_mbranch_protection (aarch64_branch_protection_string);
 
+  /* Emulate OPTION_DEFAULT_SPECS.  */
+  if (!aarch64_arch_string && !aarch64_cpu_string)
+aarch64_arch_string = WITH_ARCH_STRING;
+  if (!aarch64_arch_string && !aarch64_cpu_string)
+aarch64_cpu_string = WITH_CPU_STRING;
+  if (!aarch64_cpu_string && !aarch64_tune_string)
+aarch64_tune_string = WITH_TUNE_STRING;
+
   /* -mcpu=CPU is shorthand for -march=ARCH_FOR_CPU, -mtune=CPU.
  If either of -march or -mtune is given, they override their
  respective component of -mcpu.  */
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 80cfe4b7407..3122dbd7098 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -1267,7 +1267,10 @@ extern enum aarch64_code_model aarch64_cmodel;
 /* Support for configure-time --with-arch, --with-cpu and --with-tune.
--with-arch and --with-cpu are ignored if either -mcpu or -march is used.
--with-tune is ignored if either -mtune or -mcpu is used (but is not
-   affected by -march).  */
+   affected by -march).
+
+   There is corresponding code in aarch64_override_options that emulates
+   this behavior when cc1 etc. are invoked directly.  */
 #define OPTION_DEFAULT_SPECS   \
   {"arch", "%{!march=*:%{!mcpu=*:-march=%(VALUE)}}" }, \
   {"cpu",  "%{!march=*:%{!mcpu=*:-mcpu=%(VALUE)}}" },   \
-- 
2.25.1



Re: [PATCH][WIP] have configure probe prefix for gmp/mpfr/mpc [PR44425]

2022-06-13 Thread Eric Gallager via Gcc-patches
On Mon, Jun 13, 2022 at 7:02 AM Richard Biener
 wrote:
>
> On Thu, Jun 2, 2022 at 5:54 PM Eric Gallager via Gcc-patches
>  wrote:
> >
> > So, I'm working on fixing PR bootstrap/44425, and have this patch to
> > have the top-level configure script check in the value passed to
> > `--prefix=` when looking for gmp/mpfr/mpc. It "works" (in that
> > configuring with just `--prefix=` and none of
> > `--with-gmp=`/`--with-mpfr=`/`--with-mpc=` now works where it failed
> > before), but unfortunately it results in a bunch of duplicated
> > `-I`/`-L` flags stuck in ${gmplibs} and ${gmpinc}... is that
> > acceptable or should I try another approach?
>
> --prefix is documented as to be used for installing (architecture
> independent) files,
> I'm not sure it is a good idea to probe that for gmp/mpfr/mpc installs used 
> for
> the host.
>
> Richard.
>
> > Eric

So... I guess that means we should close bug 44425 as INVALID or
WONTFIX then? https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44425


Re: [PATCH, OpenMP, v4] Implement uses_allocators clause for target regions

2022-06-13 Thread Jakub Jelinek via Gcc-patches
On Mon, Jun 13, 2022 at 09:29:34PM +0800, Chung-Lin Tang wrote:
> > I was hoping you'd drop all this.
> > See https://gcc.gnu.org/r13-1002
> > for implementation (both C and C++ FE) of something very similar,
> > the only difference there is that in the case of linear clause, it is
> > looking for
> > val
> > ref
> > uval
> > step ( whatever )
> > followed by , or )
> > (anod ref and uval not in C FE),
> > while you are looking for
> > memspace ( whatever )
> > traits ( whatever )
> > followed by : or by , (in case of , repeat).
> > But in both cases you can actually use the same parser APIs
> > for raw token pre-parsing to just compute if it is the modifier
> > syntax or not, set bool has_modifiers based on that (when you
> > come over probably valid syntax followed by CPP_COLON).
> 
> The linear clause doesn't have the legacy 'allocator1(t1), allocator2(t2), 
> ...' requirement,
> and c_parser_omp_variable_list doesn't seem to support this pattern.

True, but I don't see why it is relevant.

> Also, the way c_parser_omp_clause_linear is implemented doesn't support the 
> requirement
> you mentioned earlier of allowing the use of "memspace", "traits" as the 
> allocator name when
> it's actually not a modifier.

No, it is exactly the same thing.
As you can see e.g. in the testsuite coverage I've committed in the linear
patch, in the linear clause case after : either one uses a modifier syntax,
or everything after the : is the step expression (assignment-expression in
C/C++).  There is parsing ambiguity and the spec says that it is resolved
to the modifier syntax in that case.
Say if I have:
constexpr int step (int x) { return x; }
and use
linear (a, b, c : step (1))
then it is the modifier syntax (incompatible change from 5.1), while
linear (a, b, c : step (1) + 0)
linear (a, b, c : (step (1)))
linear (a, b, c : 0 + step (1))
etc. have step expressions.  The spec wording is such that it doesn't even
have to be discovered by strictly tentative parsing (like in GCC the C++ and
Fortran FEs do but C FE doesn't), modifier syntax wins if one sees the
modifiers with balanced () after it if it is complex, followed by , or ) as
a terminator of a single modifier.
The first raw token walk in the patch is just a fast "tentative" parsing
check whether it is modifier syntax or not, if it is, then we just parse it
as modifiers, otherwise parse it as expression.

The uses_allocator is similar, although in that case it actually isn't
a parsing ambiguity, just that we need arbitrary number of tokens look-ahead
to decide.  We need to check if the tokens after uses_allocators (
look like one or more complex modifiers (with the 2 modifier names and just
a balanced ()s after them - in the uses_allocators case currently all
supported modifiers are complex), if yes and it is followed by : token,
it is the modifiers syntax, otherwise it is not.
So say:
#include <omp.h>
void foo (void)
{
  omp_allocator_handle_t traits, x;
  const omp_alloctrait_t my_traits[] = { ... };
  #pragma omp target uses_allocators (traits (my_traits) : x)
  ;
  #pragma omp target uses_allocators (traits (my_traits), x (my_traits))
  ;
  #pragma omp target uses_allocators (traits (my_traits), 
omp_high_mem_bw_mem_alloc)
  ;
  #pragma omp target uses_allocators (traits (my_traits))
  ;
}
All the clauses above start with the same tokens, but depending on what
follows we need to either parse it as the modifier syntax (the first
directive) or as the compatibility syntax (the rest).

Which is why I was suggesting to do this quick raw token parsing check
if it is the modifier syntax or not.
If it is, parse modifiers and : and then you know to expect a single
allocator without ()s after it (e.g. you could use
c_parser_omp_variable_list etc. and just verify afterwards the list
has a single entry in it).
If it is not, it might still be old or new syntax, the latter only if
the list contains a single var and not followed by ()s and sure, you need
to write a parsing loop for that.  It isn't the same thing as the modifier
loop though, modifiers start with a keyword, the allocator list with
a identifier for the variable.

For uses_allocators, we can then even simplify when we almost finish
OpenMP 6.0 support, if the old style syntax uses_allocators are gone
by then, we could decide if it is a modifier syntax or not just by
looking at first 2 tokens, whether the first token is allowed modifier
keyword and whether it is followed by (, then we could commit to
modifier parsing right away.  And the loop to do the ()s parsing
can go too...

Jakub



Re: [PATCH, OpenMP, v4] Implement uses_allocators clause for target regions

2022-06-13 Thread Chung-Lin Tang

On 2022/6/9 8:22 PM, Jakub Jelinek wrote:

+   OpenMP 5.2:
+
+   uses_allocators ( modifier : allocator-list )

Please drop the -list above.


+   uses_allocators ( modifier , modifier : allocator-list )

and here too.


Thanks for catching.


+  struct item_tok
+  {
+location_t loc;
+tree id;
+item_tok (void) : loc (UNKNOWN_LOCATION), id (NULL_TREE) {}
+  };
+  struct item { item_tok name, arg; };
+  auto_vec<item> *modifiers = NULL, *allocators = NULL;
+  auto_vec<item> *cur_list = new auto_vec<item> (4);

I was hoping you'd drop all this.
See https://gcc.gnu.org/r13-1002
for implementation (both C and C++ FE) of something very similar,
the only difference there is that in the case of linear clause, it is
looking for
val
ref
uval
step ( whatever )
followed by , or )
(anod ref and uval not in C FE),
while you are looking for
memspace ( whatever )
traits ( whatever )
followed by : or by , (in case of , repeat).
But in both cases you can actually use the same parser APIs
for raw token pre-parsing to just compute if it is the modifier
syntax or not, set bool has_modifiers based on that (when you
come over probably valid syntax followed by CPP_COLON).


The linear clause doesn't have the legacy 'allocator1(t1), allocator2(t2), ...' 
requirement,
and c_parser_omp_variable_list doesn't seem to support this pattern.

Also, the way c_parser_omp_clause_linear is implemented doesn't support the 
requirement
you mentioned earlier of allowing the use of "memspace", "traits" as the 
allocator name when
it's actually not a modifier.

I have merged the v4 patch with the syntax comments updated as above to 
devel/omp/gcc-11.

Thanks,
Chung-Lin



[PATCH 1/2] riscv-cores.def: Fix description of RISCV_CORE() macro

2022-06-13 Thread Christoph Muellner
From: Christoph Müllner 

The current description of RISCV_CORE() does not match the
implementation. This commit provides a fix for that.

gcc/ChangeLog:

* config/riscv/riscv-cores.def: Fix comment.

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/riscv-cores.def | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/gcc/config/riscv/riscv-cores.def b/gcc/config/riscv/riscv-cores.def
index ecb5e213d98..60bcadbb034 100644
--- a/gcc/config/riscv/riscv-cores.def
+++ b/gcc/config/riscv/riscv-cores.def
@@ -21,15 +21,13 @@
 
Before using #include to read this file, define a macro:
 
-  RISCV_CORE(CORE_NAME, ARCH, MICRO_ARCH, TUNE_INFO)
+  RISCV_CORE(CORE_NAME, ARCH, TUNE_INFO)
 
The CORE_NAME is the name of the core, represented as a string.
-   The ARCH is the default arch of the core, represented as a string,
-   can be NULL if no default arch.
-   The MICRO_ARCH is the name of the core for which scheduling decisions
-   will be made, represented as an identifier.
-   The TUNE_INFO is the detail cost model for this core, represented as an
-   identifier, reference to riscv-tunes.def.  */
+   The ARCH is a string describing the supported RISC-V ISA (e.g. "rv32i"
+   or "rv64gc_zifencei").
+   The TUNE_INFO is a string that references the detail tuning information
+   for this core (refer to riscv_tune_info_table for possible values).  */
 
 RISCV_CORE("sifive-e20",  "rv32imc","rocket")
 RISCV_CORE("sifive-e21",  "rv32imac",   "rocket")
-- 
2.35.3
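
For illustration, a consumer of riscv-cores.def defines the macro before
including the file.  A minimal sketch follows; the entry type is an
assumption made for the example, not GCC's actual declaration
(riscv-cores.def itself does the #undef RISCV_CORE at the end):

struct riscv_core_entry
{
  const char *name;   /* CORE_NAME, e.g. "sifive-e20" */
  const char *arch;   /* ARCH, e.g. "rv32imc" */
  const char *tune;   /* TUNE_INFO, e.g. "rocket" */
};

static const struct riscv_core_entry riscv_core_table[] =
{
#define RISCV_CORE(CORE_NAME, ARCH, TUNE_INFO) \
  { CORE_NAME, ARCH, TUNE_INFO },
#include "riscv-cores.def"
};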



[PATCH 2/2] riscv-cores.def: Add Allwinner D1 core

2022-06-13 Thread Christoph Muellner
From: Christoph Müllner 

This adds Allwinner's D1 to the list of known cores.
The Allwinner D1 includes a single-core XuanTie C906 and has been available
for quite some time.  Note that the tuning struct for the C906
is already part of GCC.

gcc/ChangeLog:

* config/riscv/riscv-cores.def (RISCV_CORE): Add "allwinner-d1".

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/riscv-cores.def | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/riscv/riscv-cores.def b/gcc/config/riscv/riscv-cores.def
index 60bcadbb034..dd97ece376f 100644
--- a/gcc/config/riscv/riscv-cores.def
+++ b/gcc/config/riscv/riscv-cores.def
@@ -44,4 +44,6 @@ RISCV_CORE("sifive-s76",  "rv64imafdc", "sifive-7-series")
 RISCV_CORE("sifive-u54",  "rv64imafdc", "sifive-5-series")
 RISCV_CORE("sifive-u74",  "rv64imafdc", "sifive-7-series")
 
+RISCV_CORE("thead-c906",  "rv64imafdc", "thead-c906")
+
 #undef RISCV_CORE
-- 
2.35.3



[committed] d: Improve TypeInfo errors when compiling in -fno-rtti mode

2022-06-13 Thread Iain Buclaw via Gcc-patches
Hi,

The existing TypeInfo errors can be cryptic.  This patch alters the
diagnostic to include which expression is requiring `object.TypeInfo'.

Bootstrapped and regression tested on x86_64-linux-gnu, and backported
to releases/gcc-12 branch.

Regards,
Iain.

---
gcc/d/ChangeLog:

* d-tree.h (check_typeinfo_type): Add Expression* parameter.
(build_typeinfo): Likewise.  Declare new override.
* expr.cc (ExprVisitor): Call build_typeinfo with Expression*.
* typeinfo.cc (check_typeinfo_type): Include expression in the
diagnostic message.
(build_typeinfo): New override.

gcc/testsuite/ChangeLog:

* gdc.dg/rtti1.d: New test.
---
 gcc/d/d-tree.h   |  5 +++--
 gcc/d/expr.cc| 36 ++--
 gcc/d/typeinfo.cc| 34 --
 gcc/testsuite/gdc.dg/rtti1.d | 18 ++
 4 files changed, 63 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/gdc.dg/rtti1.d

diff --git a/gcc/d/d-tree.h b/gcc/d/d-tree.h
index d93d02c2954..5f6b9d61001 100644
--- a/gcc/d/d-tree.h
+++ b/gcc/d/d-tree.h
@@ -671,8 +671,9 @@ extern tree layout_classinfo (ClassDeclaration *);
 extern unsigned base_vtable_offset (ClassDeclaration *, BaseClass *);
 extern tree get_typeinfo_decl (TypeInfoDeclaration *);
 extern tree get_classinfo_decl (ClassDeclaration *);
-extern void check_typeinfo_type (const Loc &, Scope *);
-extern tree build_typeinfo (const Loc &, Type *);
+extern void check_typeinfo_type (const Loc &, Scope *, Expression * = NULL);
+extern tree build_typeinfo (const Loc &, Type *, Expression * = NULL);
+extern tree build_typeinfo (Expression *, Type *);
 extern void create_typeinfo (Type *, Module *);
 extern void create_tinfo_types (Module *);
 extern void layout_cpp_typeinfo (ClassDeclaration *);
diff --git a/gcc/d/expr.cc b/gcc/d/expr.cc
index 179f9a5f4fd..fba397bed35 100644
--- a/gcc/d/expr.cc
+++ b/gcc/d/expr.cc
@@ -427,7 +427,7 @@ public:
tree result = build_libcall (LIBCALL_ADEQ2, e->type, 3,
 d_array_convert (e->e1),
 d_array_convert (e->e2),
-build_typeinfo (e->loc, t1array));
+build_typeinfo (e, t1array));
 
if (e->op == EXP::notEqual)
  result = build1 (TRUTH_NOT_EXPR, build_ctype (e->type), result);
@@ -450,7 +450,7 @@ public:
   {
/* Use _aaEqual() for associative arrays.  */
tree result = build_libcall (LIBCALL_AAEQUAL, e->type, 3,
-build_typeinfo (e->loc, tb1),
+build_typeinfo (e, tb1),
 build_expr (e->e1),
 build_expr (e->e2));
 
@@ -484,7 +484,7 @@ public:
 /* Build a call to _aaInX().  */
 this->result_ = build_libcall (LIBCALL_AAINX, e->type, 3,
   build_expr (e->e2),
-  build_typeinfo (e->loc, tkey),
+  build_typeinfo (e, tkey),
   build_address (key));
   }
 
@@ -728,13 +728,13 @@ public:
   size_int (ndims), build_address (var));
 
result = build_libcall (LIBCALL_ARRAYCATNTX, e->type, 2,
-   build_typeinfo (e->loc, e->type), arrs);
+   build_typeinfo (e, e->type), arrs);
   }
 else
   {
/* Handle single concatenation (a ~ b).  */
result = build_libcall (LIBCALL_ARRAYCATT, e->type, 3,
-   build_typeinfo (e->loc, e->type),
+   build_typeinfo (e, e->type),
d_array_convert (etype, e->e1),
d_array_convert (etype, e->e2));
   }
@@ -903,7 +903,7 @@ public:
 
/* So we can call postblits on const/immutable objects.  */
Type *tm = etype->unSharedOf ()->mutableOf ();
-   tree ti = build_typeinfo (e->loc, tm);
+   tree ti = build_typeinfo (e, tm);
 
/* Generate: _d_arraysetassign (t1.ptr, , t1.length, ti);  */
result = build_libcall (LIBCALL_ARRAYSETASSIGN, Type::tvoid, 4,
@@ -977,7 +977,7 @@ public:
  {
/* Generate: _d_arrayassign(ti, from, to);  */
this->result_ = build_libcall (LIBCALL_ARRAYASSIGN, e->type, 3,
-  build_typeinfo (e->loc, etype),
+  build_typeinfo (e, etype),
   d_array_convert (e->e2),
   d_array_convert (e->e1));
  }
@@ -1122,7 +1122,7 @@ public:
Type *arrtype = (e->type->ty == TY::Tsarray)

[committed] openmp: Conforming device numbers and omp_{initial,invalid}_device

2022-06-13 Thread Jakub Jelinek via Gcc-patches
Hi!

OpenMP 5.2 changed once more what device numbers are allowed.
In 5.1, valid device numbers were [0, omp_get_num_devices()].
5.2 also makes -1 valid (calling it omp_initial_device), which is equivalent
in behavior to the omp_get_num_devices() number but has the advantage that it
is a constant.  It also introduces omp_invalid_device, which is likewise
a constant with an implementation defined value < -1.  That value should
act like sNaN: any time any device construct (GOMP_target*) or OpenMP runtime
API routine is asked for such a device, the program is terminated.
And if OMP_TARGET_OFFLOAD=mandatory, all non-conforming device numbers (which
is everything outside [-1, omp_get_num_devices()] other than
omp_invalid_device) must be treated like omp_invalid_device.

For device constructs, we have a compatibility problem, we've historically
used 2 magic negative values to mean something special.
GOMP_DEVICE_ICV (-1) means device clause wasn't present, pick the
 omp_get_default_device () number
GOMP_DEVICE_FALLBACK (-2) means the host device (this is used e.g. for
  #pragma omp target if (cond)
  where if cond is false, we pass -2
But 5.2 requires that omp_initial_device is -1 (there were discussions
about it; an advantage of -1 is that one can iterate over the
[-1, omp_get_num_devices()-1] range to get all devices, starting with
the host/initial one).
And also, if user passes -2, unless it is omp_invalid_device, we need to
treat it like non-conforming with OMP_TARGET_OFFLOAD=mandatory.

So, the patch does on the compiler side some number remapping,
user_device_num >= -2U ? user_device_num - 1 : user_device_num.
This remapping is done at compile time if device clause has constant
argument, otherwise at runtime, and means that for user -1 (omp_initial_device)
we pass -2 to GOMP_* in the runtime library where it treats it like host
fallback, while -2 is remapped to -3 (one of the non-conforming device numbers,
for those it doesn't matter which one is which).
omp_invalid_device is then -4.
For the OpenMP device runtime APIs, no remapping is done.

This patch doesn't deal with the initial default-device-var for
OMP_TARGET_OFFLOAD=mandatory; the spec says that the initial ICV value
for that should in that case depend on whether there are any offloading
devices or not (if not, should be omp_invalid_device), but that means
we can't determine the number of devices lazily (and let libraries have the
possibility to register their offloading data etc.).
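
For clarity, here is the remapping described above written out as plain C
(an illustrative sketch; the helper name is made up):

/* -1 (omp_initial_device) -> -2 (host fallback in the runtime),
   -2                      -> -3 (one of the non-conforming numbers),
   everything else, including omp_invalid_device (-4), unchanged.  */
static int
remap_user_device_num (int user_device_num)
{
  if ((unsigned int) user_device_num >= -2U)
    return user_device_num - 1;
  return user_device_num;
}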

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2022-06-13  Jakub Jelinek  

gcc/
* omp-expand.cc (expand_omp_target): Remap user provided
device clause arguments, -1 to -2 and -2 to -3, either
at compile time if constant, or at runtime.
include/
* gomp-constants.h (GOMP_DEVICE_INVALID): Define.
libgomp/
* omp.h.in (omp_initial_device, omp_invalid_device): New enumerators.
* omp_lib.f90.in (omp_initial_device, omp_invalid_device): New
parameters.
* omp_lib.h.in (omp_initial_device, omp_invalid_device): Likewise.
* target.c (resolve_device): Add remapped argument, handle
GOMP_DEVICE_ICV only if remapped is true (and clear remapped),
for negative values, treat GOMP_DEVICE_FALLBACK as fallback only
if remapped, otherwise treat omp_initial_device that way.  For
omp_invalid_device, always emit gomp_fatal, even when
OMP_TARGET_OFFLOAD isn't mandatory.
(GOMP_target, GOMP_target_ext, GOMP_target_data, GOMP_target_data_ext,
GOMP_target_update, GOMP_target_update_ext,
GOMP_target_enter_exit_data): Pass true as remapped argument to
resolve_device.
(omp_target_alloc, omp_target_free, omp_target_is_present,
omp_target_memcpy_check, omp_target_associate_ptr,
omp_target_disassociate_ptr, omp_get_mapped_ptr,
omp_target_is_accessible): Pass false as remapped argument to
resolve_device.  Treat omp_initial_device the same as
gomp_get_num_devices ().  Don't bypass resolve_device calls if
device_num is negative.
(omp_pause_resource): Treat omp_initial_device the same as
gomp_get_num_devices ().  Call resolve_device.
* icv-device.c (omp_set_default_device): Always set to device_num
even when it is negative.
* libgomp.texi: Document that Conforming device numbers,
omp_initial_device and omp_invalid_device is implemented.
* testsuite/libgomp.c/target-41.c (main): Add test with
omp_initial_device.
* testsuite/libgomp.c/target-45.c: New test.
* testsuite/libgomp.c/target-46.c: New test.
* testsuite/libgomp.c/target-47.c: New test.
* testsuite/libgomp.c-c++-common/target-is-accessible-1.c (main): Add
test with omp_initial_device.  Use -5 instead of -1 for negative value
test.
* 

Re: [PATCH]middle-end Use subregs to expand COMPLEX_EXPR to set the lowpart.

2022-06-13 Thread Richard Biener via Gcc-patches
On Sun, Jun 12, 2022 at 7:27 PM Jeff Law via Gcc-patches
 wrote:
[...]
> On a related topic, any thoughts on keeping complex objects as complex
> types/modes through gimple and into at least parts of the RTL pipeline?
>
> The way complex arithmetic instructions work on our chip is going to be
> extremely tough to utilize in GCC -- we really need to keep the complex
> types/arithmetic up through RTL generation at the least.  Ideally we'd
> even expose complex modes all the way to final.  Is that something
> y'all could benefit from as well?  Have y'all poked at this problem at all?

Since you are going to need to "recover" complex operations from people
open-coding them (both fortran and C and also C++ with std::complex) it
should be less work to just do that ;)  I think that complex modes and types
exist solely for ABI purposes.

Richard.

> jeff
>


RE: [PATCH]AArch64 relax predicate on load structure load instructions

2022-06-13 Thread Richard Biener via Gcc-patches
On Mon, 13 Jun 2022, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Monday, June 13, 2022 9:38 AM
> > To: Richard Sandiford 
> > Cc: Tamar Christina ; gcc-patches@gcc.gnu.org;
> > nd ; Richard Earnshaw ;
> > Marcus Shawcroft ; Kyrylo Tkachov
> > ; ro...@eyesopen.com
> > Subject: Re: [PATCH]AArch64 relax predicate on load structure load
> > instructions
> > 
> > On Mon, 13 Jun 2022, Richard Sandiford wrote:
> > 
> > > Richard Biener  writes:
> > > > On Wed, 8 Jun 2022, Richard Sandiford wrote:
> > > >> Tamar Christina  writes:
> > > >> >> -Original Message-
> > > >> >> From: Richard Sandiford 
> > > >> >> Sent: Wednesday, June 8, 2022 11:31 AM
> > > >> >> To: Tamar Christina 
> > > >> >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> > > >> >> ; Marcus Shawcroft
> > > >> >> ; Kyrylo Tkachov
> > > >> >> 
> > > >> >> Subject: Re: [PATCH]AArch64 relax predicate on load structure
> > > >> >> load instructions
> > > >> >>
> > > >> >> Tamar Christina  writes:
> > > >> >> > Hi All,
> > > >> >> >
> > > >> >> > At some point in time we started lowering the ld1r instructions in
> > gimple.
> > > >> >> >
> > > >> >> > That is:
> > > >> >> >
> > > >> >> > uint8x8_t f1(const uint8_t *in) {
> > > >> >> > return vld1_dup_u8(&in[1]); }
> > > >> >> >
> > > >> >> > generates at gimple:
> > > >> >> >
> > > >> >> >   _3 = MEM[(const uint8_t *)in_1(D) + 1B];
> > > >> >> >   _4 = {_3, _3, _3, _3, _3, _3, _3, _3};
> > > >> >> >
> > > >> >> > Which is good, but we then generate:
> > > >> >> >
> > > >> >> > f1:
> > > >> >> >   ldr b0, [x0, 1]
> > > >> >> >   dup v0.8b, v0.b[0]
> > > >> >> >   ret
> > > >> >> >
> > > >> >> > instead of ld1r.
> > > >> >> >
> > > >> >> > The reason for this is because the load instructions have a
> > > >> >> > too restrictive predicate on them which causes combine not to
> > > >> >> > be able to combine the instructions due to the predicate only
> > > >> >> > accepting simple
> > > >> >> addressing modes.
> > > >> >> >
> > > >> >> > This patch relaxes the predicate to accept any memory operand
> > > >> >> > and relies on LRA to legitimize the address when it needs to
> > > >> >> > as the constraint still only allows the simple addressing
> > > >> >> > mode.  Reload is always able to legitimize to these.
> > > >> >> >
> > > >> >> > Secondly since we are now actually generating more ld1r it
> > > >> >> > became clear that the lane instructions suffer from a similar 
> > > >> >> > issue.
> > > >> >> >
> > > >> >> > i.e.
> > > >> >> >
> > > >> >> > float32x4_t f2(const float32_t *in, float32x4_t a) {
> > > >> >> > float32x4_t dup = vld1q_dup_f32(&in[1]);
> > > >> >> > return vfmaq_laneq_f32 (a, a, dup, 1); }
> > > >> >> >
> > > >> >> > would generate ld1r + vector fmla instead of ldr + lane fmla.
> > > >> >> >
> > > >> >> > The reason for this is similar to the ld1r issue.  The
> > > >> >> > predicate is too restrictive in only accepting register operands
> > > >> >> > but
> > not memory.
> > > >> >> >
> > > >> >> > This relaxes it to accept register and/or memory while leaving
> > > >> >> > the constraint to only accept registers.  This will have LRA
> > > >> >> > generate a reload if needed forcing the memory to registers
> > > >> >> > using the standard
> > > >> >> patterns.
> > > >> >> >
> > > >> >> > These two changes allow combine and reload to generate the
> > > >> >> > right
> > > >> >> sequences.
> > > >> >> >
> > > >> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > >> >>
> > > >> >> This is going against the general direction of travel, which is
> > > >> >> to make the instruction's predicates and conditions enforce the
> > > >> >> constraints as much as possible (making optimistic assumptions
> > about pseudo registers).
> > > >> >>
> > > >> >> The RA *can* deal with things like:
> > > >> >>
> > > >> >>   (match_operand:M N "general_operand" "r")
> > > >> >>
> > > >> >> but it's best avoided, for a few reasons:
> > > >> >>
> > > >> >> (1) The fix-up will be done in LRA, so IRA will not see the 
> > > >> >> temporary
> > > >> >> registers.  This can make the allocation of those temporaries
> > > >> >> suboptimal but (more importantly) it might require other
> > > >> >> previously-allocated registers to be spilled late due to the
> > > >> >> unexpected increase in register pressure.
> > > >> >>
> > > >> >> (2) It ends up hiding instructions from the pre-RA optimisers.
> > > >> >>
> > > >> >> (3) It can also prevent combine opportunities (as well as create
> > them),
> > > >> >> unless the loose predicates in an insn I are propagated to all
> > > >> >> patterns that might result from combining I with something else.
> > > >> >>
> > > >> >> It sounds like the first problem (not generating ld1r) could be
> > > >> >> fixed by (a) combining aarch64_simd_dup and
> > > >> >> *aarch64_simd_ld1r, so that the register and memory
> > > >> >> alternatives are in the same pattern and 
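
For reference, the loose-predicate/tight-constraint shape discussed above
looks roughly like this in a machine description.  This is a generic sketch,
not an actual aarch64 pattern:

;; Sketch only: the predicate optimistically accepts any general operand,
;; while the constraint keeps the register requirement, so LRA has to
;; emit a reload when the operand turns out to be memory or a constant.
(define_insn "*loose_pred_sketch"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (match_operand:SI 1 "general_operand" "r"))]
  ""
  "mov\t%0, %1")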

RE: [PATCH 1/2]middle-end Support optimized division by pow2 bitmask

2022-06-13 Thread Richard Biener via Gcc-patches
On Mon, 13 Jun 2022, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Monday, June 13, 2022 10:39 AM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford
> > 
> > Subject: Re: [PATCH 1/2]middle-end Support optimized division by pow2
> > bitmask
> > 
> > On Mon, 13 Jun 2022, Richard Biener wrote:
> > 
> > > On Thu, 9 Jun 2022, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > In plenty of image and video processing code it's common to modify
> > > > pixel values by a widening operation and then scale them back into range
> > by dividing by 255.
> > > >
> > > > This patch adds an optab to allow us to emit an optimized sequence
> > > > when doing an unsigned division that is equivalent to:
> > > >
> > > >x = y / (2^(bitsize(y)/2) - 1)
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > > x86_64-pc-linux-gnu and no issues.
> > > >
> > > > Ok for master?
> > >
> > > Looking at 2/2 it seems that this is the wrong way to attack the
> > > problem.  The ISA doesn't have such instruction so adding an optab
> > > looks premature.  I suppose that there's no unsigned vector integer
> > > division and thus we open-code that in a different way?  Isn't the
> > > correct thing then to fixup that open-coding if it is more efficient?
> > 
> 
> The problem is that even if you fix up the open-coding it would need to
> be something target specific, no?  The sequence of instructions we generate
> doesn't have a GIMPLE representation, so whatever is generated I'd have to
> fix up in RTL then.

What's the operation that doesn't have a GIMPLE representation?

I think for costing you could resort to the *_cost functions as used
by synth_mult and friends.

> The problem with this is that it seemed fragile. We generate from the
> Vectorizer:
> 
>   vect__3.8_35 = MEM  [(uint8_t *)_21];
>   vect_patt_28.9_37 = WIDEN_MULT_LO_EXPR ;
>   vect_patt_28.9_38 = WIDEN_MULT_HI_EXPR ;
>   vect_patt_19.10_40 = vect_patt_28.9_37 h* { 32897, 32897, 32897, 32897, 
> 32897, 32897, 32897, 32897 };
>   vect_patt_19.10_41 = vect_patt_28.9_38 h* { 32897, 32897, 32897, 32897, 
> 32897, 32897, 32897, 32897 };
>   vect_patt_25.11_42 = vect_patt_19.10_40 >> 7;
>   vect_patt_25.11_43 = vect_patt_19.10_41 >> 7;
> vect_patt_11.12_44 = VEC_PACK_TRUNC_EXPR <vect_patt_25.11_42, vect_patt_25.11_43>;
> 
> and if the magic constants change then we miss the optimization.  I could
> rewrite the open coding to use shifts alone, but that might be a regression
> for some uarches I would imagine.
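
For reference, the scalar operation this sequence vectorizes is simply the
following (illustrative):

#include <stdint.h>

/* Scale a widened pixel value back into the 8-bit range; this is the
   unsigned division by 2^(bitsize(y)/2) - 1, here 255, that the patch
   recognizes.  */
uint8_t
scale (uint16_t y)
{
  return (uint8_t) (y / 255);
}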

OK, so you do have a highpart multiply.  I suppose the pattern is too deep
to be recognized by combine?  What's the RTL good vs. bad before combine
of one of the expressions?

> > Btw, on x86 we use
> > 
> > t.c:3:21: note:   replacing earlier pattern patt_25 = patt_28 / 255;
> > t.c:3:21: note:   with patt_25 = patt_19 >> 7;
> > t.c:3:21: note:   extra pattern stmt: patt_19 = patt_28 h* 32897;
> > 
> > which translates to
> > 
> > vpmulhuw%ymm4, %ymm0, %ymm0
> > vpmulhuw%ymm4, %ymm1, %ymm1
> > vpsrlw  $7, %ymm0, %ymm0
> > vpsrlw  $7, %ymm1, %ymm1
> > 
> > there's odd
> > 
> > vpand   %ymm0, %ymm3, %ymm0
> > vpand   %ymm1, %ymm3, %ymm1
> > 
> > before (%ymm3 is all 0x00ff)
> > 
> > vpackuswb   %ymm1, %ymm0, %ymm0
> > 
> > that's not visible in GIMPLE.  I guess aarch64 lacks a highpart multiply 
> > here?
> > In any case, it seems that generic division expansion could be improved
> > here? (choose_multiplier?)
> 
> We do generate a multiply highpart here, but the patch avoids multiplies
> and shifts entirely by creative use of the ISA.  Another reason I went for
> an optab is costing: the chosen operations are significantly cheaper on all
> Arm uarches than shifts and multiplies.
> 
> This means we get vectorization in some cases where the cost model would
> otherwise correctly say it's too expensive to vectorize, particularly
> around double precision.
> 
> Thanks,
> Tamar
> 
> > 
> > Richard.
> > 
> > > Richard.
> > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * internal-fn.def (DIV_POW2_BITMASK): New.
> > > > * optabs.def (udiv_pow2_bitmask_optab): New.
> > > > * doc/md.texi: Document it.
> > > > * tree-vect-patterns.cc (vect_recog_divmod_pattern): Recognize
> > pattern.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * gcc.dg/vect/vect-div-bitmask-1.c: New test.
> > > > * gcc.dg/vect/vect-div-bitmask-2.c: New test.
> > > > * gcc.dg/vect/vect-div-bitmask-3.c: New test.
> > > > * gcc.dg/vect/vect-div-bitmask.h: New file.
> > > >
> > > > --- inline copy of patch --
> > > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index
> > > >
> > f3619c505c025f158c2bc64756531877378b22e1..784c49d7d24cef7619e4d613f7
> > > > b4f6e945866c38 100644
> > > > --- a/gcc/doc/md.texi
> > > > +++ b/gcc/doc/md.texi
> > > > @@ -5588,6 

Re: [PATCH] vect: Move suggested_unroll_factor applying [PR105940]

2022-06-13 Thread Richard Biener via Gcc-patches
On Mon, Jun 13, 2022 at 12:02 PM Kewen.Lin  wrote:
>
> Hi,
>
> As PR105940 shows, when the rs6000 port tries to assign
> m_suggested_unroll_factor by 4 or so, there will be an ICE
> on the statement below:
>
>   exact_div (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
>  loop_vinfo->suggested_unroll_factor);
>
> In function vect_analyze_loop_2, the current placement of the
> suggested_unroll_factor application can't guarantee it's applied in all
> cases.  As the case shows, the vectorizer can retry with SLP forced off;
> the vf is then reset from saved_vectorization_factor, which hasn't had
> suggested_unroll_factor applied.  It means we can end up with a vf that
> neglects suggested_unroll_factor.  I think this deviates from the design;
> we should move the applying of suggested_unroll_factor after start_over.
>
> Bootstrapped and regtested on x86_64-redhat-linux,
> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>
> Is it ok for trunk?

OK (I think the GCC 12 branch is also affected).

Richard.

>
> BR,
> Kewen
> -
> PR tree-optimization/105940
>
> gcc/ChangeLog:
>
> * tree-vect-loop.cc (vect_analyze_loop_2): Move the place of
> applying suggested_unroll_factor after start_over.
> ---
>  gcc/tree-vect-loop.cc | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 896218f23ea..af955d26f8d 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -2393,15 +2393,15 @@ vect_analyze_loop_2 (loop_vec_info loop_vinfo, bool &fatal,
>   set of rgroups.  */
>gcc_assert (LOOP_VINFO_MASKS (loop_vinfo).is_empty ());
>
> +  /* This is the point where we can re-start analysis with SLP forced off.  
> */
> +start_over:
> +
>/* Apply the suggested unrolling factor, this was determined by the backend
>   during finish_cost the first time we ran the analyzis for this
>   vector mode.  */
>if (loop_vinfo->suggested_unroll_factor > 1)
>  LOOP_VINFO_VECT_FACTOR (loop_vinfo) *= 
> loop_vinfo->suggested_unroll_factor;
>
> -  /* This is the point where we can re-start analysis with SLP forced off.  
> */
> -start_over:
> -
>/* Now the vectorization factor is final.  */
>poly_uint64 vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>gcc_assert (known_ne (vectorization_factor, 0U));
> --
> 2.27.0


Re: [PATCH] PR middle-end/105874: Use EXPAND_MEMORY to fix ada bootstrap.

2022-06-13 Thread Eric Botcazou via Gcc-patches
> +   /* If tem is a VAR_DECL, we need a memory reference.  */
> +   enum expand_modifier tem_modifier = modifier;
> +   if (tem_modifier == EXPAND_SUM)
> + tem_modifier = EXPAND_NORMAL;
> +   if (TREE_CODE (tem) == VAR_DECL)
> + tem_modifier = EXPAND_MEMORY;
> 
> that changes EXPAND_WRITE to EXPAND_MEMORY for VAR_DECL
> for example - what's 'modifier' in the problematic case?

My understanding is that it was EXPAND_NORMAL.

> I do not understand how 'VAR_DECL' is special here btw. - it seems to be a
> condition making sure the new optimization doesn't trigger rather than a
> condition that will always require memory?

It may indeed be too big a hammer.

Roger, would it be sufficient to use EXPAND_MEMORY only when must_force_mem,
computed a few lines below, is true?
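
Concretely, that suggestion would turn the hunk above into something like
this (a sketch; in the real code the computation of must_force_mem would
have to be moved before this point):

/* Expand TEM as a memory reference only if we must force it to
   memory anyway.  */
enum expand_modifier tem_modifier = modifier;
if (tem_modifier == EXPAND_SUM)
  tem_modifier = EXPAND_NORMAL;
if (must_force_mem)
  tem_modifier = EXPAND_MEMORY;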

-- 
Eric Botcazou




Re: [PATCH] Introduce -finstrument-functions-once

2022-06-13 Thread Richard Biener via Gcc-patches
On Mon, Jun 13, 2022 at 11:46 AM Eric Botcazou  wrote:
>
> > So that also applies to
> >
> > "... and the second profiling function is called before the exit
> > +corresponding to this first entry"
> >
> > specifically "corresponding to this first entry"?   As if the second
> > entry exits first will that call the second profiling function or will
> > it really be the thread that called the first profiling function
> > (what happens when that thread terminates before calling the second
> > profiling function? (***)).  Consider re-wording this slightly.
>
> The calls are always paired, i.e. if a thread calls the first function, then
> it will call the second function; I can indeed state it explicitly in the doc.
>
> > +  /* If -finstrument-functions-once is specified, generate:
> > +
> > +  static volatile bool F.0 = true;
> > +  bool tmp_first;
> >
> > is there any good reason to make F.0 volatile?  That doesn't prevent
> > races.
>
> No, it does not, but it guarantees a single read and thus the pairing.
>
> > Any reason to make F.0 initialized to true rather than false (bss init?)
>
> None, changed.
>
> > (***) looking at the implementation the second profiling function
> > can end up being never called when the thread calling the first
> > profiling function does not exit the function.  So I wonder if
> > the "optimization"(?) not re-reading F.0 makes sense (it also
> > requires to keep the value of F.0 live across the whole function)
>
> It's for the pairing.  The value should be spilled onto the stack if need be,
> so you'd get at most 2 loads, as if you re-read the variable.
>
> Revised patch attached.

OK.

Thanks,
Richard.

>
> --
> Eric Botcazou
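
For reference, the once semantics discussed above corresponds roughly to the
following hand-written C.  This is an approximation of the generated
instrumentation, not the actual expansion, and it shows the original
true-initialized flag variant:

void __cyg_profile_func_enter (void *this_fn, void *call_site);
void __cyg_profile_func_exit (void *this_fn, void *call_site);

static volatile _Bool first = 1;  /* the F.0 variable */

void
foo (void)
{
  /* Read the flag exactly once, so enter/exit stay paired for the thread
     that saw it set, even though other threads may clear it concurrently
     (the race is benign).  */
  _Bool tmp_first = first;
  if (tmp_first)
    {
      first = 0;
      __cyg_profile_func_enter ((void *) &foo, __builtin_return_address (0));
    }
  /* ... function body ...  */
  if (tmp_first)
    __cyg_profile_func_exit ((void *) &foo, __builtin_return_address (0));
}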


c++: Separate late stage module writing

2022-06-13 Thread Nathan Sidwell via Gcc-patches


This moves some module writing into a newly added write_end function,
which is called after writing initializers.

Thus paving the way to eliminate calls to NOP initializer fns.

nathan

--
Nathan SidwellFrom 6303eee4b92e8509409503a3abebde8bd50f0f05 Mon Sep 17 00:00:00 2001
From: Nathan Sidwell 
Date: Thu, 9 Jun 2022 08:48:25 -0700
Subject: [PATCH] c++: Separate late stage module writing

This moves some module writing into a newly added write_end function,
which is called after writing initializers.

	gcc/cp/
	* module.cc (module_state::write): Separate to ...
	(module_state::write_begin, module_state::write_end): ...
	these.
	(module_state::write_readme): Drop extensions parameter.
	(struct module_processing_cookie): Add more fields.
	(finish_module_processing): Adjust state writing call.
	(late_finish_module): Call write_end.
---
 gcc/cp/module.cc | 47 ++-
 1 file changed, 30 insertions(+), 17 deletions(-)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 51d774ae608..e7ce40ef464 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -3523,7 +3523,10 @@ class GTY((chain_next ("%h.parent"), for_user)) module_state {
 
  public:
   /* Read and write module.  */
-  void write (elf_out *to, cpp_reader *);
+  void write_begin (elf_out *to, cpp_reader *,
+		module_state_config &, unsigned &);
+  void write_end (elf_out *to, cpp_reader *,
+		  module_state_config &, unsigned &);
   bool read_initial (cpp_reader *);
   bool read_preprocessor (bool);
   bool read_language (bool);
@@ -3545,8 +3548,7 @@ class GTY((chain_next ("%h.parent"), for_user)) module_state {
 
  private:
   /* The README, for human consumption.  */
-  void write_readme (elf_out *to, cpp_reader *,
-		 const char *dialect, unsigned extensions);
+  void write_readme (elf_out *to, cpp_reader *, const char *dialect);
   void write_env (elf_out *to);
 
  private:
@@ -13954,8 +13956,7 @@ module_state::announce (const char *what) const
  readelf -pgnu.c++.README $(module).gcm */
 
 void
-module_state::write_readme (elf_out *to, cpp_reader *reader,
-			const char *dialect, unsigned extensions)
+module_state::write_readme (elf_out *to, cpp_reader *reader, const char *dialect)
 {
   bytes_out readme (to);
 
@@ -17560,7 +17561,8 @@ ool_cmp (const void *a_, const void *b_)
 */
 
 void
-module_state::write (elf_out *to, cpp_reader *reader)
+module_state::write_begin (elf_out *to, cpp_reader *reader,
+			   module_state_config &config, unsigned &crc)
 {
   /* Figure out remapped module numbers, which might elide
  partitions.  */
@@ -17656,8 +17658,6 @@ module_state::write (elf_out *to, cpp_reader *reader)
 }
   ool->qsort (ool_cmp);
 
-  unsigned crc = 0;
-  module_state_config config;
   location_map_info map_info = write_prepare_maps ();
   unsigned counts[MSC_HWM];
 
@@ -17811,28 +17811,35 @@ module_state::write (elf_out *to, cpp_reader *reader)
   unsigned clusters = counts[MSC_sec_hwm] - counts[MSC_sec_lwm];
   dump () && dump ("Wrote %u clusters, average %u bytes/cluster",
 		   clusters, (bytes + clusters / 2) / (clusters + !clusters));
+  trees_out::instrument ();
 
  write_counts (to, counts, &crc);
 
-  /* And finish up.  */
-  write_config (to, config, crc);
-
   spaces.release ();
   sccs.release ();
 
   vec_free (ool);
 
-  /* Human-readable info.  */
-  write_readme (to, reader, config.dialect_str, extensions);
-
   // FIXME:QOI:  Have a command line switch to control more detailed
   // information (which might leak data you do not want to leak).
   // Perhaps (some of) the write_readme contents should also be
   // so-controlled.
   if (false)
 write_env (to);
+}
+
+// Finish module writing after we've emitted all dynamic initializers. 
+
+void
+module_state::write_end (elf_out *to, cpp_reader *reader,
+			 module_state_config &config, unsigned &crc)
+{
+  /* And finish up.  */
+  write_config (to, config, crc);
+
+  /* Human-readable info.  */
+  write_readme (to, reader, config.dialect_str);
 
-  trees_out::instrument ();
   dump () && dump ("Wrote %u sections", to->get_section_limit ());
 }
 
@@ -19855,15 +19862,18 @@ maybe_check_all_macros (cpp_reader *reader)
 }
 
 // State propagated from finish_module_processing to fini_modules
+
 struct module_processing_cookie
 {
   elf_out out;
+  module_state_config config;
   char *cmi_name;
   char *tmp_name;
+  unsigned crc;
   bool began;
 
   module_processing_cookie (char *cmi, char *tmp, int fd, int e)
-: out (fd, e), cmi_name (cmi), tmp_name (tmp), began (false)
+: out (fd, e), cmi_name (cmi), tmp_name (tmp), crc (0), began (false)
   {
   }
   ~module_processing_cookie ()
@@ -19941,7 +19951,7 @@ finish_module_processing (cpp_reader *reader)
 	  auto loc = input_location;
 	  /* So crashes finger-point the module decl.  */
 	  input_location = state->loc;
-	  state->write (&cookie->out, reader);
+	  state->write_begin (&cookie->out, reader, cookie->config, cookie->crc);
 	  input_location = loc;
 	}
 
@@ -19977,6 +19987,9 @@ 

Re: [PATCH] Do not erase warning data in gimple_set_location

2022-06-13 Thread Richard Biener via Gcc-patches
On Fri, Jun 10, 2022 at 12:58 PM Eric Botcazou via Gcc-patches
 wrote:
>
> Hi,
>
> gimple_set_location is mostly invoked on newly built GIMPLE statements, so
> their location is UNKNOWN_LOCATION and setting it will clobber the warning
> data of the passed location, if any.

Hmm, I think instead of special-casing UNKNOWN_LOCATION
what gimple_set_location should probably do is either not copy
warnings at all or union them.  Btw, gimple_set_location also
removes a previously set BLOCK (but gimple_set_block preserves
the location locus and diagnostic override).

So I'd be tempted to axe the copy_warning () completely here.  Martin, there
were probably cases that warranted it - do you remember anything specific
here?
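
The variant being suggested would look roughly like this (a sketch based on
the current inline function's shape in gimple.h):

/* Set the location without copying warning-suppression data, so that an
   UNKNOWN_LOCATION on a newly built statement cannot clobber the warning
   data carried by LOCATION.  */
static inline void
gimple_set_location (gimple *g, location_t location)
{
  /* Deliberately no copy_warning here; either drop the warning data or
     union it, as discussed above.  */
  g->location = location;
}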

Thanks,
Richard.

> Tested on x86-64/Linux, OK for mainline and 12 branch?
>
>
> 2022-06-10  Eric Botcazou  
>
> * gimple.h (gimple_set_location): Do not copy warning data from
> the previous location when it is UNKNOWN_LOCATION.
>
>
> 2022-06-10  Eric Botcazou  
>
> testsuite/
> * c-c++-common/nonnull-1.c: Remove XFAIL for C++.
>
> --
> Eric Botcazou


[committed] d: Merge upstream dmd 821ed393d, druntime 454471d8, phobos 1206fc94f.

2022-06-13 Thread Iain Buclaw via Gcc-patches
Hi,

This patches merges the D front-end with upstream dmd 821ed393d, and the
standard library with upstream druntime 454471d8 and phobos 1206fc94f.

D front-end changes:

- Import latest bug fixes to mainline.

D runtime changes:

- Fix duplicate Elf64_Dyn definitions on Solaris.
- _d_newThrowable has been converted to a template.

Phobos changes:

- Import latest bug fixes to mainline.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd 821ed393d.
* expr.cc (ExprVisitor::visit (NewExp *)): Remove handled of
allocating `@nogc' throwable object.
* runtime.def (NEWTHROW): Remove.

Bootstrapped and regression tested on x86-64-linux-gnu/-m32/-mx32, and
committed to mainline.

Regards,
Iain.

---
libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime 454471d8.
* libdruntime/Makefile.am (DRUNTIME_DSOURCES): Add
core/sync/package.d.
* libdruntime/Makefile.in: Regenerate.
* src/MERGE: Merge upstream phobos 1206fc94f.
---
 gcc/d/dmd/MERGE   |   2 +-
 gcc/d/dmd/attrib.d|   6 +-
 gcc/d/dmd/cparse.d|  88 +++-
 gcc/d/dmd/cppmangle.d |   2 +-
 gcc/d/dmd/dcast.d |   4 +-
 gcc/d/dmd/dinterpret.d|  38 +-
 gcc/d/dmd/dscope.d|  17 +-
 gcc/d/dmd/dsymbol.d   |  18 +-
 gcc/d/dmd/dsymbol.h   |   2 -
 gcc/d/dmd/dsymbolsem.d|   7 -
 gcc/d/dmd/dtemplate.d |   8 +-
 gcc/d/dmd/expressionsem.d | 100 +++-
 gcc/d/dmd/func.d  |   4 +-
 gcc/d/dmd/id.d|   1 +
 gcc/d/dmd/mtype.d |  13 +-
 gcc/d/dmd/mtype.h |  11 +
 gcc/d/dmd/parse.d |   2 +-
 gcc/d/dmd/scope.h |   1 +
 gcc/d/dmd/statement.d |   2 +-
 gcc/d/dmd/statementsem.d  |  25 +-
 gcc/d/dmd/typesem.d   | 471 +-
 gcc/d/dmd/typinf.d|   9 +-
 gcc/d/expr.cc |  10 +-
 gcc/d/runtime.def |   1 -
 .../gdc.test/compilable/imports/defines.c |  28 ++
 gcc/testsuite/gdc.test/compilable/nogc.d  |   9 +
 gcc/testsuite/gdc.test/compilable/test22626.d |  23 +
 gcc/testsuite/gdc.test/compilable/test23076.d |  38 ++
 gcc/testsuite/gdc.test/compilable/test23142.d |  19 +
 gcc/testsuite/gdc.test/compilable/test23174.d |  58 +++
 .../gdc.test/compilable/testdefines.d |  14 +
 .../gdc.test/compilable/testdip1008.d |  19 +
 .../fail_compilation/mixin_template.d |  10 +
 .../gdc.test/fail_compilation/noreturn.d  |  18 +
 .../gdc.test/fail_compilation/template_decl.d |   9 +
 .../gdc.test/fail_compilation/test21477.d |  16 +
 .../gdc.test/fail_compilation/test23159.d |  22 +
 .../gdc.test/fail_compilation/traits.d|  18 +
 libphobos/libdruntime/MERGE   |   2 +-
 libphobos/libdruntime/Makefile.am |   6 +-
 libphobos/libdruntime/Makefile.in |  30 +-
 libphobos/libdruntime/core/attribute.d|  18 +-
 .../core/internal/array/equality.d|  33 +-
 libphobos/libdruntime/core/lifetime.d |  40 +-
 libphobos/libdruntime/core/stdcpp/xutility.d  |  15 +-
 .../core/sys/dragonflybsd/sys/elf32.d |  10 +
 .../core/sys/dragonflybsd/sys/elf64.d |  10 +
 libphobos/libdruntime/core/sys/elf/package.d  |  20 -
 .../libdruntime/core/sys/freebsd/sys/elf32.d  |  10 +
 .../libdruntime/core/sys/freebsd/sys/elf64.d  |  10 +
 libphobos/libdruntime/core/sys/linux/elf.d|  20 +
 .../libdruntime/core/sys/netbsd/sys/elf32.d   |  10 +
 .../libdruntime/core/sys/netbsd/sys/elf64.d   |  10 +
 .../libdruntime/core/sys/openbsd/sys/elf32.d  |  10 +
 .../libdruntime/core/sys/openbsd/sys/elf64.d  |  10 +
 .../core/sys/solaris/sys/elftypes.d   |  18 +-
 .../libdruntime/core/sys/solaris/sys/link.d   | 235 +
 .../libdruntime/core/thread/threadbase.d  |   2 +-
 libphobos/libdruntime/rt/ehalloc.d|  45 --
 libphobos/src/MERGE   |   2 +-
 libphobos/src/std/mmfile.d|  10 +-
 libphobos/src/std/sumtype.d   |  22 +-
 62 files changed, 1029 insertions(+), 712 deletions(-)
 create mode 100644 gcc/testsuite/gdc.test/compilable/imports/defines.c
 create mode 100644 gcc/testsuite/gdc.test/compilable/test22626.d
 create mode 100644 gcc/testsuite/gdc.test/compilable/test23076.d
 create mode 100644 gcc/testsuite/gdc.test/compilable/test23142.d
 create mode 100644 gcc/testsuite/gdc.test/compilable/test23174.d
 create mode 100644 gcc/testsuite/gdc.test/compilable/testdefines.d
 create mode 100644 

Re: [PATCH][WIP] have configure probe prefix for gmp/mpfr/mpc [PR44425]

2022-06-13 Thread Richard Biener via Gcc-patches
On Thu, Jun 2, 2022 at 5:54 PM Eric Gallager via Gcc-patches
 wrote:
>
> So, I'm working on fixing PR bootstrap/44425, and have this patch to
> have the top-level configure script check in the value passed to
> `--prefix=` when looking for gmp/mpfr/mpc. It "works" (in that
> configuring with just `--prefix=` and none of
> `--with-gmp=`/`--with-mpfr=`/`--with-mpc=` now works where it failed
> before), but unfortunately it results in a bunch of duplicated
> `-I`/`-L` flags stuck in ${gmplibs} and ${gmpinc}... is that
> acceptable or should I try another approach?

--prefix is documented to be used for installing (architecture-independent)
files; I'm not sure it is a good idea to probe it for the gmp/mpfr/mpc
installs used for the host.

Richard.

> Eric


Re: [PATCH] aarch64: Lower vcombine to GIMPLE

2022-06-13 Thread Richard Biener via Gcc-patches
On Tue, Jun 7, 2022 at 7:24 PM Andrew Carlotti via Gcc-patches
 wrote:
>
> Hi all,
>
> This lowers vcombine intrinsics to a GIMPLE vector constructor, which enables 
> better optimisation during GIMPLE passes.
>
> Bootstrapped and tested on aarch64-none-linux-gnu, and tested for 
> aarch64_be-none-linux-gnu via cross-compilation.
>
>
> gcc/
>
> * config/aarch64/aarch64-builtins.c
> (aarch64_general_gimple_fold_builtin): Add combine.
>
> gcc/testsuite/
>
> * gcc.target/aarch64/advsimd-intrinsics/combine.c:
> New test.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
> b/gcc/config/aarch64/aarch64-builtins.cc
> index 
> 5217dbdb2ac78bba0a669d22af6d769d1fe91a3d..9d52fb8c5a48c9b743defb340a85fb20a1c8f014
>  100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -2827,6 +2827,18 @@ aarch64_general_gimple_fold_builtin (unsigned int 
> fcode, gcall *stmt,
> gimple_call_set_lhs (new_stmt, gimple_call_lhs (stmt));
> break;
>
> + BUILTIN_VDC (BINOP, combine, 0, AUTO_FP)
> + BUILTIN_VD_I (BINOPU, combine, 0, NONE)
> + BUILTIN_VDC_P (BINOPP, combine, 0, NONE)
> +   {
> + if (BYTES_BIG_ENDIAN)
> +   std::swap(args[0], args[1]);
> + tree ret_type = TREE_TYPE (gimple_call_lhs (stmt));
> + tree ctor = build_constructor_va (ret_type, 2, NULL_TREE, args[0], 
> NULL_TREE, args[1]);
> + new_stmt = gimple_build_assign (gimple_call_lhs (stmt), ctor);

the LHS might be NULL (that seems to be a general issue in this
function); on x86 we simply leave the builtin alone.
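
That is, something along these lines at the top of the case (sketch):

/* Bail out like the x86 folder does when the call has no LHS.  */
tree lhs = gimple_call_lhs (stmt);
if (!lhs)
  break;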

> +   }
> +   break;
> +
>   /*lower store and load neon builtins to gimple.  */
>   BUILTIN_VALL_F16 (LOAD1, ld1, 0, LOAD)
>   BUILTIN_VDQ_I (LOAD1_U, ld1, 0, LOAD)
> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/combine.c 
> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/combine.c
> new file mode 100644
> index 
> ..d08faf7a4a160a1e83428ed9b270731bbf7b8c8a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/combine.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile { target { aarch64*-*-* } } } */
> +/* { dg-final { check-function-bodies "**" "" {-O[^0]} } } */
> +/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
> +
> +#include <arm_neon.h>
> +
> +/*
> +** foo:
> +**	umov	w0, v1\.s\[1\]
> +** ret
> +*/
> +
> +int32_t foo (int32x2_t a, int32x2_t b)
> +{
> +  int32x4_t c = vcombine_s32(a, b);
> +  return vgetq_lane_s32(c, 3);
> +}
> +


Re: [PATCH] PR middle-end/105874: Use EXPAND_MEMORY to fix ada bootstrap.

2022-06-13 Thread Richard Biener via Gcc-patches
On Wed, Jun 8, 2022 at 4:14 PM Eric Botcazou via Gcc-patches
 wrote:
>
> > The fix is to ensure that we call expand_expr with EXPAND_MEMORY
> > when processing the VAR_DECL's returned by get_inner_reference.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check (with no new failures), but also with
> > --enable-languages="ada" where it allows the bootstrap to finish,
> > and with no unexpected failures in the acats and gnat testsuites.
> > Ok for mainline?
>
> Yes, thanks (modulo the nit in the ChangeLog).

+   /* If tem is a VAR_DECL, we need a memory reference.  */
+   enum expand_modifier tem_modifier = modifier;
+   if (tem_modifier == EXPAND_SUM)
+ tem_modifier = EXPAND_NORMAL;
+   if (TREE_CODE (tem) == VAR_DECL)
+ tem_modifier = EXPAND_MEMORY;

that changes EXPAND_WRITE to EXPAND_MEMORY for VAR_DECL
for example - what's 'modifier' in the problematic case?  I do not understand
how 'VAR_DECL' is special here btw. - it seems to be a condition making
sure the new optimization doesn't trigger rather than a condition that will
always require memory?

Richard.

> > 2022-06-08  Roger Sayle  
> >
> > gcc/ChangeLog
> > PR middle-end/105874
> > * gcc/expr.cc (expand_expr_real_1) :  New local
> > variable tem_modifier for calculating the expand_modifier enum to
> > use for expanding tem.  If tem is a VAR_DECL, use EXPAND_MEMORY.
>
> gcc/ prefix to be stripped
>
> --
> Eric Botcazou
>
>


RE: [PATCH]middle-end Use subregs to expand COMPLEX_EXPR to set the lowpart.

2022-06-13 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Gcc-patches  bounces+tamar.christina=arm@gcc.gnu.org> On Behalf Of Jeff Law via
> Gcc-patches
> Sent: Sunday, June 12, 2022 6:27 PM
> To: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH]middle-end Use subregs to expand COMPLEX_EXPR to
> set the lowpart.
> 
> 
> 
> On 6/9/2022 1:52 AM, Tamar Christina via Gcc-patches wrote:
> > Hi All,
> >
> > When lowering COMPLEX_EXPR we currently emit two VEC_EXTRACTs.
> One
> > for the lowpart and one for the highpart.
> >
> > The problem with this is that in RTL the lvalue of the RTX is the only
> > thing tying the two instructions together.
> >
> > This means that e.g. combine is unable to try to combine the two
> > instructions for setting the lowpart and highpart.
> >
> > For ISAs that have bit extract instructions we can eliminate one of
> > the extracts if, and only if we're setting the entire complex number.
> >
> > This change changes the expand code when we're setting the entire
> > complex number to generate a subreg for the lowpart instead of a
> vec_extract.
> >
> > This allows us to optimize sequences such as:
> Just a note.  I regularly see subregs significantly interfere with 
> optimization,
> particularly register allocation.  So be aware that subregs can often get in 
> the
> way of generating good code.  When changing something to use subregs I
> like to run real benchmarks rather than working with code snippets.
> 
> 
> >
> > _Complex int f(int a, int b) {
> >  _Complex int t = a + b * 1i;
> >  return t;
> > }
> >
> > from:
> >
> > f:
> > bfi x2, x0, 0, 32
> > bfi x2, x1, 32, 32
> > mov x0, x2
> > ret
> >
> > into:
> >
> > f:
> > bfi x0, x1, 32, 32
> > ret
> >
> > I have also confirmed the codegen for x86_64 did not change.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * emit-rtl.cc (validate_subreg): Accept subregs of complex modes.
> > * expr.cc (emit_move_complex_parts): Emit subreg of lowpart if
> possible.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * g++.target/aarch64/complex-init.C: New test.
> OK.
> 
> On a related topic, any thoughts on keeping complex objects as complex
> types/modes through gimple and into at least parts of the RTL pipeline?
> 
> The way complex arithmetic instructions work on our chip is going to be
> extremely tough to utilize in GCC -- we really need to keep the complex
> types/arithmetic up through RTL generation at the least.  Ideally we'd even
> expose complex modes all the way to final.  Is that something y'all could
> benefit from as well?  Have y'all poked at this problem at all?

Not extensively, but right now the big advantage of lowering them early is for
auto-vec.  Lowering them early allows you to e.g. realize that you only need
the imaginary part of the number etc.  For auto-vec it also means we treat
them as just any other loads/stores.

I think LLVM keeps them as complex expressions much longer and they've been
having a harder time implementing some of the complex arith stuff we did in
GCC 11.

Regards,
Tamar

> 
> jeff



Re: [PATCH 1/9] dwarf: add dw_get_die_parent function

2022-06-13 Thread Richard Biener via Gcc-patches
On Tue, Jun 7, 2022 at 11:44 PM David Faust via Gcc-patches
 wrote:

OK

> gcc/
>
> * dwarf2out.cc (dw_get_die_parent): New function.
> * dwarf2out.h (dw_get_die_parent): Declare it here.
> ---
>  gcc/dwarf2out.cc | 8 
>  gcc/dwarf2out.h  | 1 +
>  2 files changed, 9 insertions(+)
>
> diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
> index 29f32ec6939..9c61026bb34 100644
> --- a/gcc/dwarf2out.cc
> +++ b/gcc/dwarf2out.cc
> @@ -5235,6 +5235,14 @@ dw_get_die_sib (dw_die_ref die)
>return die->die_sib;
>  }
>
> +/* Return a reference to the parent of a given DIE.  */
> +
> +dw_die_ref
> +dw_get_die_parent (dw_die_ref die)
> +{
> +  return die->die_parent;
> +}
> +
>  /* Add an address constant attribute value to a DIE.  When using
> dwarf_split_debug_info, address attributes in dies destined for the
> final executable should be direct references--setting the parameter
> diff --git a/gcc/dwarf2out.h b/gcc/dwarf2out.h
> index 656ef94afde..e6962fb4848 100644
> --- a/gcc/dwarf2out.h
> +++ b/gcc/dwarf2out.h
> @@ -455,6 +455,7 @@ extern dw_die_ref lookup_type_die (tree);
>
>  extern dw_die_ref dw_get_die_child (dw_die_ref);
>  extern dw_die_ref dw_get_die_sib (dw_die_ref);
> +extern dw_die_ref dw_get_die_parent (dw_die_ref);
>  extern enum dwarf_tag dw_get_die_tag (dw_die_ref);
>
>  /* Data about a single source file.  */
> --
> 2.36.1
>


Re: [PATCH] Add -fextra-libc-function=memcmpeq for __memcmpeq

2022-06-13 Thread Richard Biener via Gcc-patches
On Tue, Jun 7, 2022 at 9:02 PM H.J. Lu via Gcc-patches
 wrote:
>
> Add -fextra-libc-function=memcmpeq to map
>
> extern int __memcmpeq (const void *, const void *, size_t);
>
> which was added to GLIBC 2.35, to __builtin_memcmp_eq.

Hmm.  Can't we instead use the presence of a declaration
of __memcmpeq with a GNU standard dialect for this, instead of
adding a weird -fextra-libc-function= option?  Maybe that's even
reasonable with a non-GNU dialect standard in effect, since
__-prefixed names are in the implementation namespace?
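
For context, the mapping targets equality-only uses of memcmp, e.g.
(illustrative):

#include <string.h>

/* Only the truth value of memcmp is used, so GCC internally selects
   BUILT_IN_MEMCMP_EQ here; the patch (or, per the suggestion above, a
   visible __memcmpeq declaration) lets that expand to a call to glibc
   2.35's __memcmpeq instead of memcmp.  */
int
keys_equal (const void *a, const void *b, size_t n)
{
  return memcmp (a, b, n) == 0;
}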

Richard.

> gcc/
>
> * builtins.cc: Include "opts.h".
> (expand_builtin): Generate BUILT_IN_MEMCMP_EQ if __memcmpeq is
> available.
> * builtins.def (BUILT_IN___MEMCMPEQ): New.
> * common.opt: Add -fextra-libc-function=.
> * opts.cc (extra_libc_functions): New.
> (parse_extra_libc_function): New function.
> (common_handle_option): Handle -fextra-libc-function=.
> * opts.h (extra_libc_function_list): New.
> (extra_libc_functions): Likewise.
> * doc/invoke.texi: Document -fextra-libc-function=memcmpeq.
>
> gcc/testsuite/
>
> * c-c++-common/memcmpeq-1.c: New test.
> * c-c++-common/memcmpeq-2.c: Likewise.
> * c-c++-common/memcmpeq-3.c: Likewise.
> * c-c++-common/memcmpeq-4.c: Likewise.
> * c-c++-common/memcmpeq-5.c: Likewise.
> * c-c++-common/memcmpeq-6.c: Likewise.
> * c-c++-common/memcmpeq-7.c: Likewise.
> ---
>  gcc/builtins.cc |  5 -
>  gcc/builtins.def|  4 
>  gcc/common.opt  |  4 
>  gcc/doc/invoke.texi |  6 ++
>  gcc/opts.cc | 23 +++
>  gcc/opts.h  |  7 +++
>  gcc/testsuite/c-c++-common/memcmpeq-1.c | 11 +++
>  gcc/testsuite/c-c++-common/memcmpeq-2.c | 11 +++
>  gcc/testsuite/c-c++-common/memcmpeq-3.c | 11 +++
>  gcc/testsuite/c-c++-common/memcmpeq-4.c | 11 +++
>  gcc/testsuite/c-c++-common/memcmpeq-5.c | 11 +++
>  gcc/testsuite/c-c++-common/memcmpeq-6.c | 11 +++
>  gcc/testsuite/c-c++-common/memcmpeq-7.c | 11 +++
>  13 files changed, 125 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-1.c
>  create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-2.c
>  create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-3.c
>  create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-4.c
>  create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-5.c
>  create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-6.c
>  create mode 100644 gcc/testsuite/c-c++-common/memcmpeq-7.c
>
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index b9d89b409b8..22269318e8c 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -81,6 +81,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "demangle.h"
>  #include "gimple-range.h"
>  #include "pointer-query.h"
> +#include "opts.h"
>
>  struct target_builtins default_target_builtins;
>  #if SWITCHABLE_TARGET
> @@ -7410,7 +7411,9 @@ expand_builtin (tree exp, rtx target, rtx subtarget, 
> machine_mode mode,
> return target;
>if (fcode == BUILT_IN_MEMCMP_EQ)
> {
> - tree newdecl = builtin_decl_explicit (BUILT_IN_MEMCMP);
> + tree newdecl = builtin_decl_explicit
> +   (extra_libc_functions.has_memcmpeq
> +? BUILT_IN___MEMCMPEQ : BUILT_IN_MEMCMP);
>   TREE_OPERAND (exp, 1) = build_fold_addr_expr (newdecl);
> }
>break;
> diff --git a/gcc/builtins.def b/gcc/builtins.def
> index 005976f34e9..eb8d33b16e9 100644
> --- a/gcc/builtins.def
> +++ b/gcc/builtins.def
> @@ -965,6 +965,10 @@ DEF_BUILTIN_STUB (BUILT_IN_ALLOCA_WITH_ALIGN_AND_MAX, 
> "__builtin_alloca_with_ali
> equality with zero.  */
>  DEF_BUILTIN_STUB (BUILT_IN_MEMCMP_EQ, "__builtin_memcmp_eq")
>
> +/* Similar to BUILT_IN_MEMCMP_EQ, but is mapped to __memcmpeq only with
> +   -fextra-libc-function=memcmpeq.  */
> +DEF_EXT_LIB_BUILTIN (BUILT_IN___MEMCMPEQ, "__memcmpeq", 
> BT_FN_INT_CONST_PTR_CONST_PTR_SIZE, ATTR_PURE_NOTHROW_NONNULL_LEAF)
> +
>  /* An internal version of strcmp/strncmp, used when the result is only
> tested for equality with zero.  */
>  DEF_BUILTIN_STUB (BUILT_IN_STRCMP_EQ, "__builtin_strcmp_eq")
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 7ca0cceed82..7a7631682b0 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1587,6 +1587,10 @@ Enum(excess_precision) String(standard) 
> Value(EXCESS_PRECISION_STANDARD)
>  EnumValue
>  Enum(excess_precision) String(16) Value(EXCESS_PRECISION_FLOAT16)
>
> +fextra-libc-function=
> +Common Driver Joined
> +Specify the extra function in the C library.
> +
>  ; Whether we permit the extended set of values for FLT_EVAL_METHOD
>  ; introduced in ISO/IEC TS 18661-3, or limit ourselves to those in C99/C11.
>  

RE: [PATCH 1/2]middle-end Support optimized division by pow2 bitmask

2022-06-13 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Monday, June 13, 2022 10:39 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford
> 
> Subject: Re: [PATCH 1/2]middle-end Support optimized division by pow2
> bitmask
> 
> On Mon, 13 Jun 2022, Richard Biener wrote:
> 
> > On Thu, 9 Jun 2022, Tamar Christina wrote:
> >
> > > Hi All,
> > >
> > > In plenty of image and video processing code it's common to modify
> > > pixel values by a widening operation and then scale them back into range
> by dividing by 255.
> > >
> > > This patch adds an optab to allow us to emit an optimized sequence
> > > when doing an unsigned division that is equivalent to:
> > >
> > >x = y / (2^(bitsize(y)/2) - 1)
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > x86_64-pc-linux-gnu and no issues.
> > >
> > > Ok for master?
> >
> > Looking at 2/2 it seems that this is the wrong way to attack the
> > problem.  The ISA doesn't have such instruction so adding an optab
> > looks premature.  I suppose that there's no unsigned vector integer
> > division and thus we open-code that in a different way?  Isn't the
> > correct thing then to fixup that open-coding if it is more efficient?
> 

The problem is that even if you fix up the open-coding it would need to
be something target specific, no?  The sequence of instructions we generate
doesn't have a GIMPLE representation, so whatever is generated I'd have to
fix up in RTL then.

The problem with this is that it seemed fragile. We generate from the
Vectorizer:

  vect__3.8_35 = MEM  [(uint8_t *)_21];
  vect_patt_28.9_37 = WIDEN_MULT_LO_EXPR ;
  vect_patt_28.9_38 = WIDEN_MULT_HI_EXPR ;
  vect_patt_19.10_40 = vect_patt_28.9_37 h* { 32897, 32897, 32897, 32897, 
32897, 32897, 32897, 32897 };
  vect_patt_19.10_41 = vect_patt_28.9_38 h* { 32897, 32897, 32897, 32897, 
32897, 32897, 32897, 32897 };
  vect_patt_25.11_42 = vect_patt_19.10_40 >> 7;
  vect_patt_25.11_43 = vect_patt_19.10_41 >> 7;
  vect_patt_11.12_44 = VEC_PACK_TRUNC_EXPR <vect_patt_25.11_42, vect_patt_25.11_43>;

and if the magic constants change then we miss the optimization.  I could
rewrite the open coding to use shifts alone, but that might be a regression
for some uarches I would imagine.
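
For the record, the h* 32897 and >> 7 sequence in the dump above is the
standard multiplicative inverse for dividing a 16-bit value by 255; a quick
self-contained check (illustrative):

#include <assert.h>
#include <stdint.h>

int
main (void)
{
  for (uint32_t y = 0; y <= 0xffff; y++)
    {
      /* h* 32897 yields the high 16 bits of the 32-bit product; the
         additional >> 7 completes the shift by 16 + 7 = 23.  */
      uint16_t hi = (uint16_t) ((y * 32897u) >> 16);
      assert ((uint32_t) (hi >> 7) == y / 255);
    }
  return 0;
}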

> Btw, on x86 we use
> 
> t.c:3:21: note:   replacing earlier pattern patt_25 = patt_28 / 255;
> t.c:3:21: note:   with patt_25 = patt_19 >> 7;
> t.c:3:21: note:   extra pattern stmt: patt_19 = patt_28 h* 32897;
> 
> which translates to
> 
> vpmulhuw%ymm4, %ymm0, %ymm0
> vpmulhuw%ymm4, %ymm1, %ymm1
> vpsrlw  $7, %ymm0, %ymm0
> vpsrlw  $7, %ymm1, %ymm1
> 
> there's odd
> 
> vpand   %ymm0, %ymm3, %ymm0
> vpand   %ymm1, %ymm3, %ymm1
> 
> before (%ymm3 is all 0x00ff)
> 
> vpackuswb   %ymm1, %ymm0, %ymm0
> 
> that's not visible in GIMPLE.  I guess aarch64 lacks a highpart multiply here?
> In any case, it seems that generic division expansion could be improved
> here? (choose_multiplier?)

We do generate a multiply highpart here, but the patch avoids multiplies and
shifts entirely by creative use of the ISA.  Another reason I went for an
optab is costing: the chosen operations are significantly cheaper on all Arm
uarches than shifts and multiplies.

This means we get vectorization in some cases where the cost model would
otherwise correctly say it's too expensive to vectorize, particularly around
double precision.

Thanks,
Tamar

> 
> Richard.
> 
> > Richard.
> >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >   * internal-fn.def (DIV_POW2_BITMASK): New.
> > >   * optabs.def (udiv_pow2_bitmask_optab): New.
> > >   * doc/md.texi: Document it.
> > >   * tree-vect-patterns.cc (vect_recog_divmod_pattern): Recognize
> pattern.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * gcc.dg/vect/vect-div-bitmask-1.c: New test.
> > >   * gcc.dg/vect/vect-div-bitmask-2.c: New test.
> > >   * gcc.dg/vect/vect-div-bitmask-3.c: New test.
> > >   * gcc.dg/vect/vect-div-bitmask.h: New file.
> > >
> > > --- inline copy of patch --
> > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index
> > >
> f3619c505c025f158c2bc64756531877378b22e1..784c49d7d24cef7619e4d613f7
> > > b4f6e945866c38 100644
> > > --- a/gcc/doc/md.texi
> > > +++ b/gcc/doc/md.texi
> > > @@ -5588,6 +5588,18 @@ signed op0, op1;
> > >  op0 = op1 / (1 << imm);
> > >  @end smallexample
> > >
> > > +@cindex @code{udiv_pow2_bitmask@var{m2}} instruction pattern
> > > +@item @samp{udiv_pow2_bitmask@var{m2}}
> > > +@cindex @code{udiv_pow2_bitmask@var{m2}} instruction pattern
> > > +@itemx @samp{udiv_pow2_bitmask@var{m2}}
> > > +Unsigned vector division by an immediate that is equivalent to
> > > +@samp{2^(bitsize(m) / 2) - 1}.
> > > +@smallexample
> > > +unsigned short op0, op1;
> > > +@dots{}
> > > +op0 = op1 / 0xffU;
> > > +@end smallexample
> > > +
> > >  @cindex @code{vec_shl_insert_@var{m}} instruction pattern  @item
> > > @samp{vec_shl_insert_@var{m}}  Shift the elements in vector input
> > > operand 1 

[PATCH] vect: Move suggested_unroll_factor applying [PR105940]

2022-06-13 Thread Kewen.Lin via Gcc-patches
Hi,

As PR105940 shows, when the rs6000 port tries to assign
m_suggested_unroll_factor by 4 or so, there will be an ICE
on the statement below:

  exact_div (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
 loop_vinfo->suggested_unroll_factor);

In function vect_analyze_loop_2, the current place where
suggested_unroll_factor is applied cannot guarantee that it
is applied in all cases.  As the case shows, the vectorizer
can retry with SLP forced off; the vf is then reset from
saved_vectorization_factor, which never had
suggested_unroll_factor applied to it.  So we can end up
with a vf that ignores suggested_unroll_factor.  I think
this is unintended; we should move the applying of
suggested_unroll_factor to after start_over.
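
To see why exact_div ICEs here: it asserts that the division is exact.  A
toy model of the scenario (the concrete numbers are hypothetical; the real
function operates on poly_uint64):

#include <assert.h>

static unsigned
exact_div (unsigned a, unsigned b)
{
  assert (a % b == 0);  /* analogue of the ICE: division must be exact */
  return a / b;
}

int
main (void)
{
  unsigned vf = 2;  /* vf restored from saved_vectorization_factor,
                       suggested_unroll_factor never applied to it */
  unsigned uf = 4;  /* m_suggested_unroll_factor from the backend */
  exact_div (vf, uf);  /* fires: 2 % 4 != 0 */
  return 0;
}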

Bootstrapped and regtested on x86_64-redhat-linux,
aarch64-linux-gnu and powerpc64{,le}-linux-gnu.

Is it ok for trunk?

BR,
Kewen
-
PR tree-optimization/105940

gcc/ChangeLog:

* tree-vect-loop.cc (vect_analyze_loop_2): Move the place of
applying suggested_unroll_factor after start_over.
---
 gcc/tree-vect-loop.cc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 896218f23ea..af955d26f8d 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -2393,15 +2393,15 @@ vect_analyze_loop_2 (loop_vec_info loop_vinfo, bool &fatal,
  set of rgroups.  */
   gcc_assert (LOOP_VINFO_MASKS (loop_vinfo).is_empty ());

+  /* This is the point where we can re-start analysis with SLP forced off.  */
+start_over:
+
   /* Apply the suggested unrolling factor, this was determined by the backend
   during finish_cost the first time we ran the analysis for this
  vector mode.  */
   if (loop_vinfo->suggested_unroll_factor > 1)
 LOOP_VINFO_VECT_FACTOR (loop_vinfo) *= loop_vinfo->suggested_unroll_factor;

-  /* This is the point where we can re-start analysis with SLP forced off.  */
-start_over:
-
   /* Now the vectorization factor is final.  */
   poly_uint64 vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   gcc_assert (known_ne (vectorization_factor, 0U));
--
2.27.0


Re: [PATCH] Simplify vec_unpack of uniform_vector_p constructors in match.pd.

2022-06-13 Thread Richard Biener via Gcc-patches
On Mon, Jun 6, 2022 at 11:06 AM Richard Sandiford wrote:
>
> Richard Biener  writes:
> > On Sat, May 21, 2022 at 5:31 PM Roger Sayle wrote:
> >> This patch simplifies vec_unpack_hi_expr/vec_unpack_lo_expr of a uniform
> >> constructor or vec_duplicate operand.  The motivation is from PR 105621
> >> where after optimization, we're left with:
> >>
> >>   vect_cst__21 = {c_8(D), c_8(D), c_8(D), c_8(D)};
> >>   vect_iftmp.7_4 = [vec_unpack_hi_expr] vect_cst__21;
> >>
> >> It turns out that there are no constant folding/simplification patterns
> >> in match.pd, but the above can be simplified further to the equivalent:
> >>
> >>   _20 = (long int) c_8(D);
> >>   vect_iftmp.7_4 = [vec_duplicate_expr] _20;
> >>
> >> which on x86-64 results in one less instruction, replacing pshufd $0
> >> then punpckhqdq, with punpcklqdq.  This transformation is also useful
> >> for helping CSE to spot that unpack_hi and unpack_lo are equivalent.
> >>
> >> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> >> and make -k check with no new failures.  Ok for mainline?
> >
> > I think we need a way to query whether the target can do a 
> > VEC_DUPLICATE_EXPR.
> > Currently we only ever have them for VL vectors and expand via
> > expand_vector_broadcast which eventually simply gives up when there's no
> > vec_duplicate or vec_init optabs suitable.
> >
> > IIRC with the VEC_PERM extension we should be able to handle
> > VEC_DUPLICATE via VEC_PERM?  (but we don't yet accept a scalar
> > input, just V1?)
>
> Yeah, should be possible.  Not sure whether it would really help though.
> A VEC_PERM_EXPR with only one scalar argument could only have one sensible
> permute mask[*], so there'd be a bit of false generality.
>
> Maybe allowing scalar arguments would be more useful for 2 distinct
> scalar arguments, but then I guess the question is: why stop at 2?
> So if we go down the route of accepting scalars, it might be more
> consistent to make VEC_PERM_EXPR support any number of operands
> and use it as a replacement for CONSTRUCTOR as well.

The discussion was hijacked by the '[PATCH]AArch64 relax predicate on load
structure load instructions' thread, btw.

Roger - your eyesopen.com mail bounces, can you fix your MAINTAINERS
entry please?

Richard.

> Thanks,
> Richard
>
> [*] At least until we support “don't care” elements.  However, like I
> mentioned before, I'd personally prefer a “don't care” mask to be
> a separate operand, rather than treating something like -1 as a
> special value.  Special values like that don't really fit the
> current encoding scheme for VL constants, but a separate mask would.
>
> A separate don't-care mask would also work for variable permute masks.
> >
> > I see most targets except sparc have picked up vec_duplicate, but still
> > we'd need to check the specific mode.  I think we can disregard
> > vec_init checking and only apply the transforms when vec_duplicate
> > is available.
> >
> > Richard.
> >
> >>
> >> 2022-05-21  Roger Sayle  
> >>
> >> gcc/ChangeLog
> >> * match.pd (simplify vec_unpack_hi): Simplify VEC_UNPACK_*_EXPR
> >> of uniform vector constructors and vec_duplicate.
> >>
> >> gcc/testsuite/ChangeLog
> >> * g++.dg/vect/pr105621.cc: New test case.
> >>
> >>
> >> Thanks in advance,
> >> Roger
> >> --
> >>


RE: [PATCH]AArch64 relax predicate on load structure load instructions

2022-06-13 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Monday, June 13, 2022 9:38 AM
> To: Richard Sandiford 
> Cc: Tamar Christina ; gcc-patches@gcc.gnu.org;
> nd ; Richard Earnshaw ;
> Marcus Shawcroft ; Kyrylo Tkachov
> ; ro...@eyesopen.com
> Subject: Re: [PATCH]AArch64 relax predicate on load structure load
> instructions
> 
> On Mon, 13 Jun 2022, Richard Sandiford wrote:
> 
> > Richard Biener  writes:
> > > On Wed, 8 Jun 2022, Richard Sandiford wrote:
> > >> Tamar Christina  writes:
> > >> >> -Original Message-
> > >> >> From: Richard Sandiford 
> > >> >> Sent: Wednesday, June 8, 2022 11:31 AM
> > >> >> To: Tamar Christina 
> > >> >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> > >> >> ; Marcus Shawcroft
> > >> >> ; Kyrylo Tkachov
> > >> >> 
> > >> >> Subject: Re: [PATCH]AArch64 relax predicate on load structure
> > >> >> load instructions
> > >> >>
> > >> >> Tamar Christina  writes:
> > >> >> > Hi All,
> > >> >> >
> > >> >> > At some point in time we started lowering the ld1r instructions in
> gimple.
> > >> >> >
> > >> >> > That is:
> > >> >> >
> > >> >> > uint8x8_t f1(const uint8_t *in) {
> > >> >> > return vld1_dup_u8(&in[1]); }
> > >> >> >
> > >> >> > generates at gimple:
> > >> >> >
> > >> >> >   _3 = MEM[(const uint8_t *)in_1(D) + 1B];
> > >> >> >   _4 = {_3, _3, _3, _3, _3, _3, _3, _3};
> > >> >> >
> > >> >> > Which is good, but we then generate:
> > >> >> >
> > >> >> > f1:
> > >> >> > ldr b0, [x0, 1]
> > >> >> > dup v0.8b, v0.b[0]
> > >> >> > ret
> > >> >> >
> > >> >> > instead of ld1r.
> > >> >> >
> > >> >> > The reason for this is because the load instructions have a
> > >> >> > too restrictive predicate on them which causes combine not to
> > >> >> > be able to combine the instructions due to the predicate only
> > >> >> > accepting simple
> > >> >> addressing modes.
> > >> >> >
> > >> >> > This patch relaxes the predicate to accept any memory operand
> > >> >> > and relies on LRA to legitimize the address when it needs to
> > >> >> > as the constraint still only allows the simple addressing
> > >> >> > mode.  Reload is always able to legitimize to these.
> > >> >> >
> > >> >> > Secondly since we are now actually generating more ld1r it
> > >> >> > became clear that the lane instructions suffer from a similar issue.
> > >> >> >
> > >> >> > i.e.
> > >> >> >
> > >> >> > float32x4_t f2(const float32_t *in, float32x4_t a) {
> > >> >> > float32x4_t dup = vld1q_dup_f32(&in[1]);
> > >> >> > return vfmaq_laneq_f32 (a, a, dup, 1); }
> > >> >> >
> > >> >> > would generate ld1r + vector fmla instead of ldr + lane fmla.
> > >> >> >
> > >> >> > The reason for this is similar to the ld1r issue.  The
> > >> >> > predicate is too restrictive in only accepting register operands but
> not memory.
> > >> >> >
> > >> >> > This relaxes it to accept register and/or memory while leaving
> > >> >> > the constraint to only accept registers.  This will have LRA
> > >> >> > generate a reload if needed forcing the memory to registers
> > >> >> > using the standard
> > >> >> patterns.
> > >> >> >
> > >> >> > These two changes allow combine and reload to generate the
> > >> >> > right
> > >> >> sequences.
> > >> >> >
> > >> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >> >>
> > >> >> This is going against the general direction of travel, which is
> > >> >> to make the instruction's predicates and conditions enforce the
> > >> >> constraints as much as possible (making optimistic assumptions
> about pseudo registers).
> > >> >>
> > >> >> The RA *can* deal with things like:
> > >> >>
> > >> >>   (match_operand:M N "general_operand" "r")
> > >> >>
> > >> >> but it's best avoided, for a few reasons:
> > >> >>
> > >> >> (1) The fix-up will be done in LRA, so IRA will not see the temporary
> > >> >> registers.  This can make the allocation of those temporaries
> > >> >> suboptimal but (more importantly) it might require other
> > >> >> previously-allocated registers to be spilled late due to the
> > >> >> unexpected increase in register pressure.
> > >> >>
> > >> >> (2) It ends up hiding instructions from the pre-RA optimisers.
> > >> >>
> > >> >> (3) It can also prevent combine opportunities (as well as create
> them),
> > >> >> unless the loose predicates in an insn I are propagated to all
> > >> >> patterns that might result from combining I with something else.
> > >> >>
> > >> >> It sounds like the first problem (not generating ld1r) could be
> > >> >> fixed by (a) combining aarch64_simd_dup and
> > >> >> *aarch64_simd_ld1r, so that the register and memory
> > >> >> alternatives are in the same pattern and (b) using the merged
> instruction(s) to implement the vec_duplicate optab.
> > >> >> Target-independent code should then make the address satisfy the
> > >> >> predicate, simplifying the address where necessary.
> > >> >>
> > >> >
> > >> > I think I am likely missing something here. I would assume that
> > >> > you 

Re: [PATCH] Introduce -finstrument-functions-once

2022-06-13 Thread Eric Botcazou via Gcc-patches
> So that also applies to
> 
> "... and the second profiling function is called before the exit
> +corresponding to this first entry"
> 
> specifically "corresponding to this first entry"?  If the second
> entry exits first, will that call the second profiling function, or will
> it really be the thread that called the first profiling function?
> (What happens when that thread terminates before calling the second
> profiling function? (***))  Consider re-wording this slightly.

The calls are always paired, i.e. if a thread calls the first function, then 
it will call the second function; I can indeed state it explicitly in the doc.

> +  /* If -finstrument-functions-once is specified, generate:
> +
> +  static volatile bool F.0 = true;
> +  bool tmp_first;
> 
> is there any good reason to make F.0 volatile?  That doesn't prevent
> races.

No, it does not, but it guarantees a single read and thus the pairing.

> Any reason to make F.0 initialized to true rather than false (bss init?)

None, changed.

> (***) looking at the implementation the second profiling function
> can end up being never called when the thread calling the first
> profiling function does not exit the function.  So I wonder if
> the "optimization"(?) of not re-reading F.0 makes sense (it also
> requires keeping the value of F.0 live across the whole function)

It's for the pairing.  The value should be spilled onto the stack if need be, 
so you'd get at most 2 loads, just as if you re-read the variable.
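
In C terms the generated instrumentation amounts to roughly the following
(a hand-written sketch of the intent, with the flag zero-initialized as
agreed above; F_0 and tmp_first mirror the names in the patch comment and
are not real identifiers):

extern void __cyg_profile_func_enter (void *, void *);
extern void __cyg_profile_func_exit (void *, void *);

static volatile _Bool F_0;  /* the once flag, now in .bss */

void
instrumented (void)
{
  _Bool tmp_first = !F_0;  /* single volatile read, kept live */
  if (tmp_first)
    {
      F_0 = 1;
      __cyg_profile_func_enter ((void *) instrumented,
                                __builtin_return_address (0));
    }
  /* ... function body ... */
  if (tmp_first)  /* paired with the entry call, unless we never return */
    __cyg_profile_func_exit ((void *) instrumented,
                             __builtin_return_address (0));
}

Because the read and the write of F_0 are not atomic, two threads can both
see tmp_first as true; hence the "at least once per process, at most once
per thread" guarantee described in the documentation.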

Revised patch attached.

-- 
Eric Botcazou

diff --git a/gcc/common.opt b/gcc/common.opt
index 7ca0cceed82..8e961f16b0e 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1890,9 +1890,13 @@ EnumValue
 Enum(cf_protection_level) String(none) Value(CF_NONE)
 
 finstrument-functions
-Common Var(flag_instrument_function_entry_exit)
+Common Var(flag_instrument_function_entry_exit,1)
 Instrument function entry and exit with profiling calls.
 
+finstrument-functions-once
+Common Var(flag_instrument_function_entry_exit,2)
+Instrument function entry and exit with profiling calls invoked once.
+
 finstrument-functions-exclude-function-list=
 Common RejectNegative Joined
 -finstrument-functions-exclude-function-list=name,...	Do not instrument listed functions.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 174bc09e5cf..b6c0305f198 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -618,7 +618,7 @@ Objective-C and Objective-C++ Dialects}.
 -fno-stack-limit  -fsplit-stack @gol
 -fvtable-verify=@r{[}std@r{|}preinit@r{|}none@r{]} @gol
 -fvtv-counts  -fvtv-debug @gol
--finstrument-functions @gol
+-finstrument-functions  -finstrument-functions-once @gol
 -finstrument-functions-exclude-function-list=@var{sym},@var{sym},@dots{} @gol
 -finstrument-functions-exclude-file-list=@var{file},@var{file},@dots{}} @gol
 -fprofile-prefix-map=@var{old}=@var{new}
@@ -16395,6 +16395,22 @@ cannot safely be called (perhaps signal handlers, if the profiling
 routines generate output or allocate memory).
 @xref{Common Function Attributes}.
 
+@item -finstrument-functions-once
+@opindex -finstrument-functions-once
+This is similar to @option{-finstrument-functions}, but the profiling
+functions are called only once per instrumented function, i.e. the first
+profiling function is called after the first entry into the instrumented
+function and the second profiling function is called before the exit
+corresponding to this first entry.
+
+The definition of @code{once} for the purpose of this option is a little
+vague because the implementation is not protected against data races.
+As a result, the implementation only guarantees that the profiling
+functions are called at @emph{least} once per process and at @emph{most}
+once per thread, but the calls are always paired, that is to say, if a
+thread calls the first function, then it will call the second function,
+unless it never reaches the exit of the instrumented function.
+
 @item -finstrument-functions-exclude-file-list=@var{file},@var{file},@dots{}
 @opindex finstrument-functions-exclude-file-list
 
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index cd1796643d7..04990ad91a6 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -16586,6 +16586,51 @@ flag_instrument_functions_exclude_p (tree fndecl)
   return false;
 }
 
+/* Build a call to the instrumentation function FNCODE and add it to SEQ.
+   If COND_VAR is not NULL, it is a boolean variable guarding the call to
+   the instrumentation function.  If STMT is not NULL, it is a statement
+   to be executed just before the call to the instrumentation function.  */
+
+static void
+build_instrumentation_call (gimple_seq *seq, enum built_in_function fncode,
+			tree cond_var, gimple *stmt)
+{
+  /* The instrumentation hooks aren't going to call the instrumented
+ function and the address they receive is expected to be matchable
+ against symbol addresses.  Make sure we don't create a trampoline,
+ in case the current function 

Re: [COMMITTED] Implement vrange::supports_type_p.

2022-06-13 Thread Richard Biener via Gcc-patches
On Fri, Jun 3, 2022 at 9:31 AM Aldy Hernandez  wrote:
>
> [I have conservatively assumed that both the loop-ch and loop-unswitch
> passes, which also use the ranger, only support integers and pointers.
> If the goal is to handle other types as well, irange::supports_p()
> should be Value_Range::supports_type_p(), and any uses of
> int_range_max should be converted to Value_Range.  I can help in the
> conversion if you'd like.]

Both should also support float ranges, so yes - if you can send simple
patches I'll review them.

> As discussed, this patch disambiguates the use of supports_type_p
> throughout, as what ranger supports is a totally different question
> than what a given range variant (irange, frange, etc) supports.
>
> Unfortunately we need both a static method and a virtual method, and
> they can't be named the same.  The uses are documented in the vrange
> class:
>
> +// To query what types ranger and the entire ecosystem can support,
> +// use Value_Range::supports_type_p(tree type).  This is a static
> +// method available independently of any vrange object.
> +//
> +// To query what a given vrange variant can support, use:
> +//irange::supports_p ()
> +//frange::supports_p ()
> +//etc
> +//
> +// To query what a range object can support, use:
> +//void foo (vrange &v, irange &i, frange &f)
> +//{
> +// if (v.supports_type_p (type)) ...
> +// if (i.supports_type_p (type)) ...
> +// if (f.supports_type_p (type)) ...
> +//}
>
> The value_range_equiv::supports_p() method can be used to determine
> what legacy VRP supports, as irange::supports_p() will no longer be
> applicable in the evrp analyzer code base once irange and prange are
> split.
>
> Tested on x86-64 Linux.
>
> gcc/ChangeLog:
>
> * gimple-range-edge.cc (gimple_outgoing_range_stmt_p): Adjust for
> an object level supports_type_p for irange and a static
> Value_Range::supports_type_p.
> * gimple-range-fold.cc (fold_using_range::range_of_range_op): Same.
> (fold_using_range::range_of_address): Same.
> (fold_using_range::range_of_builtin_call): Same.
> * gimple-range-fold.h (gimple_range_type): Same.
> (gimple_range_ssa_p): Same.
> * gimple-range-path.cc (path_range_query::internal_range_of_expr):
> Same.
> (path_range_query::range_of_stmt): Same.
> (path_range_query::add_to_imports): Same.
> * gimple-range.cc (gimple_ranger::range_on_edge): Same.
> (gimple_ranger::export_global_ranges): Same.
> * gimple-ssa-evrp-analyze.cc
> (evrp_range_analyzer::record_ranges_from_phis):  Same.
> * range-op.cc (range_operator::wi_fold): Same.
> (range_operator::fold_range): Same.
> * tree-ssa-loop-ch.cc (entry_loop_condition_is_static): Same.
> * tree-ssa-loop-unswitch.cc (struct unswitch_predicate): Same.
> (evaluate_control_stmt_using_entry_checks): Same.
> * tree-ssa-threadedge.cc
> (hybrid_jt_simplifier::compute_ranges_from_state): Same.
> * tree-vrp.cc (supported_types_p): Same.
> * value-query.cc (range_query::value_of_expr): Same.
> (range_query::value_on_edge): Same.
> (range_query::value_of_stmt): Same.
> (range_query::get_tree_range): Same.
> (get_range_global): Same.
> (global_range_query::range_of_expr): Same.
> * value-range-equiv.h (class value_range_equiv): Same.
> * value-range.cc (irange::supports_type_p): Same.
> (unsupported_range::supports_type_p): Same.
> * value-range.h (enum value_range_discriminator): Same.
> (Value_Range::init): Same.
> (Value_Range::supports_type_p): Same.
> (irange::supports_type_p): Same.
> (irange::supports_p): Same.
> (vrange::supports_type_p): Same.
> (vrange_allocator::alloc_vrange): Same.
> ---
>  gcc/gimple-range-edge.cc   |  3 +--
>  gcc/gimple-range-fold.cc   |  5 ++--
>  gcc/gimple-range-fold.h|  4 +--
>  gcc/gimple-range-path.cc   |  6 ++---
>  gcc/gimple-range.cc|  4 +--
>  gcc/gimple-ssa-evrp-analyze.cc |  2 +-
>  gcc/range-op.cc|  4 +--
>  gcc/tree-ssa-loop-ch.cc|  2 +-
>  gcc/tree-ssa-loop-unswitch.cc  |  6 ++---
>  gcc/tree-ssa-threadedge.cc |  2 +-
>  gcc/tree-vrp.cc|  4 +--
>  gcc/value-query.cc | 16 ++--
>  gcc/value-range-equiv.h|  4 +++
>  gcc/value-range.cc | 12 +
>  gcc/value-range.h  | 45 +-
>  15 files changed, 76 insertions(+), 43 deletions(-)
>
> diff --git a/gcc/gimple-range-edge.cc b/gcc/gimple-range-edge.cc
> index 6fe33408f7e..03a804ac2be 100644
> --- a/gcc/gimple-range-edge.cc
> +++ b/gcc/gimple-range-edge.cc
> @@ -44,8 +44,7 @@ gimple_outgoing_range_stmt_p (basic_block bb)
>gimple *s = gsi_stmt (gsi);
>   if (is_a <gcond *> (s) && range_op_handler (s))
> 

Re: [PATCH 1/2]middle-end Support optimized division by pow2 bitmask

2022-06-13 Thread Richard Biener via Gcc-patches
On Mon, 13 Jun 2022, Richard Biener wrote:

> On Thu, 9 Jun 2022, Tamar Christina wrote:
> 
> > Hi All,
> > 
> > In plenty of image and video processing code it's common to modify pixel 
> > values
> > by a widening operation and then scale them back into range by dividing by 
> > 255.
> > 
> > This patch adds an optab to allow us to emit an optimized sequence when 
> > doing
> > an unsigned division that is equivalent to:
> > 
> >    x = y / (2 ^ (bitsize (y) / 2) - 1)
> > 
> > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > and no issues.
> > 
> > Ok for master?
> 
> Looking at 2/2 it seems that this is the wrong way to attack the
> problem.  The ISA doesn't have such instruction so adding an optab
> looks premature.  I suppose that there's no unsigned vector integer
> division and thus we open-code that in a different way?  Isn't the
> correct thing then to fixup that open-coding if it is more efficient?

Btw, on x86 we use

t.c:3:21: note:   replacing earlier pattern patt_25 = patt_28 / 255;
t.c:3:21: note:   with patt_25 = patt_19 >> 7;
t.c:3:21: note:   extra pattern stmt: patt_19 = patt_28 h* 32897;

which translates to

vpmulhuw%ymm4, %ymm0, %ymm0
vpmulhuw%ymm4, %ymm1, %ymm1
vpsrlw  $7, %ymm0, %ymm0
vpsrlw  $7, %ymm1, %ymm1

there's odd

vpand   %ymm0, %ymm3, %ymm0
vpand   %ymm1, %ymm3, %ymm1

before (%ymm3 is all 0x00ff)

vpackuswb   %ymm1, %ymm0, %ymm0

that's not visible in GIMPLE.  I guess aarch64 lacks a highpart
multiply here?  In any case, it seems that generic division expansion
could be improved here? (choose_multiplier?)

Richard.

> Richard.
> 
> > Thanks,
> > Tamar
> > 
> > gcc/ChangeLog:
> > 
> > * internal-fn.def (DIV_POW2_BITMASK): New.
> > * optabs.def (udiv_pow2_bitmask_optab): New.
> > * doc/md.texi: Document it.
> > * tree-vect-patterns.cc (vect_recog_divmod_pattern): Recognize pattern.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.dg/vect/vect-div-bitmask-1.c: New test.
> > * gcc.dg/vect/vect-div-bitmask-2.c: New test.
> > * gcc.dg/vect/vect-div-bitmask-3.c: New test.
> > * gcc.dg/vect/vect-div-bitmask.h: New file.
> > 
> > --- inline copy of patch -- 
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index 
> > f3619c505c025f158c2bc64756531877378b22e1..784c49d7d24cef7619e4d613f7b4f6e945866c38
> >  100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -5588,6 +5588,18 @@ signed op0, op1;
> >  op0 = op1 / (1 << imm);
> >  @end smallexample
> >  
> > +@cindex @code{udiv_pow2_bitmask@var{m2}} instruction pattern
> > +@item @samp{udiv_pow2_bitmask@var{m2}}
> > +@cindex @code{udiv_pow2_bitmask@var{m2}} instruction pattern
> > +@itemx @samp{udiv_pow2_bitmask@var{m2}}
> > +Unsigned vector division by an immediate that is equivalent to
> > +@samp{2^(bitsize(m) / 2) - 1}.
> > +@smallexample
> > +unsigned short op0, op1;
> > +@dots{}
> > +op0 = op1 / 0xffU;
> > +@end smallexample
> > +
> >  @cindex @code{vec_shl_insert_@var{m}} instruction pattern
> >  @item @samp{vec_shl_insert_@var{m}}
> >  Shift the elements in vector input operand 1 left one element (i.e.@:
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index 
> > d2d550d358606022b1cb44fa842f06e0be507bc3..a3e3cc1520f77683ebf6256898f916ed45de475f
> >  100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -159,6 +159,8 @@ DEF_INTERNAL_OPTAB_FN (VEC_SHL_INSERT, ECF_CONST | 
> > ECF_NOTHROW,
> >vec_shl_insert, binary)
> >  
> >  DEF_INTERNAL_OPTAB_FN (DIV_POW2, ECF_CONST | ECF_NOTHROW, sdiv_pow2, 
> > binary)
> > +DEF_INTERNAL_OPTAB_FN (DIV_POW2_BITMASK, ECF_CONST | ECF_NOTHROW,
> > +  udiv_pow2_bitmask, unary)
> >  
> >  DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary)
> >  DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary)
> > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > index 
> > 801310ebaa7d469520809bb7efed6820f8eb866b..3f0ac05ef5ad5aed8d6ca391f4eed71b0494e17f
> >  100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -372,6 +372,7 @@ OPTAB_D (smulhrs_optab, "smulhrs$a3")
> >  OPTAB_D (umulhs_optab, "umulhs$a3")
> >  OPTAB_D (umulhrs_optab, "umulhrs$a3")
> >  OPTAB_D (sdiv_pow2_optab, "sdiv_pow2$a3")
> > +OPTAB_D (udiv_pow2_bitmask_optab, "udiv_pow2_bitmask$a2")
> >  OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a")
> >  OPTAB_D (vec_pack_ssat_optab, "vec_pack_ssat_$a")
> >  OPTAB_D (vec_pack_trunc_optab, "vec_pack_trunc_$a")
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-1.c 
> > b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-1.c
> > new file mode 100644
> > index 
> > ..a7ea3cce4764239c5d281a8f0bead1f6a452de3f
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-1.c
> > @@ -0,0 +1,25 @@
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +#include <stdint.h>
> > +#include "tree-vect.h"
> > +
> > +#define N 50
> > 

Re: [PATCH 1/2]middle-end Support optimized division by pow2 bitmask

2022-06-13 Thread Richard Biener via Gcc-patches
On Thu, 9 Jun 2022, Tamar Christina wrote:

> Hi All,
> 
> In plenty of image and video processing code it's common to modify pixel 
> values
> by a widening operation and then scale them back into range by dividing by 
> 255.
> 
> This patch adds an optab to allow us to emit an optimized sequence when doing
> an unsigned division that is equivalent to:
> 
>    x = y / (2 ^ (bitsize (y) / 2) - 1)
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> and no issues.
> 
> Ok for master?

Looking at 2/2 it seems that this is the wrong way to attack the
problem.  The ISA doesn't have such an instruction, so adding an optab
looks premature.  I suppose that there's no unsigned vector integer
division and thus we open-code that in a different way?  Isn't the
correct thing then to fixup that open-coding if it is more efficient?

Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * internal-fn.def (DIV_POW2_BITMASK): New.
>   * optabs.def (udiv_pow2_bitmask_optab): New.
>   * doc/md.texi: Document it.
>   * tree-vect-patterns.cc (vect_recog_divmod_pattern): Recognize pattern.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-div-bitmask-1.c: New test.
>   * gcc.dg/vect/vect-div-bitmask-2.c: New test.
>   * gcc.dg/vect/vect-div-bitmask-3.c: New test.
>   * gcc.dg/vect/vect-div-bitmask.h: New file.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 
> f3619c505c025f158c2bc64756531877378b22e1..784c49d7d24cef7619e4d613f7b4f6e945866c38
>  100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5588,6 +5588,18 @@ signed op0, op1;
>  op0 = op1 / (1 << imm);
>  @end smallexample
>  
> +@cindex @code{udiv_pow2_bitmask@var{m2}} instruction pattern
> +@item @samp{udiv_pow2_bitmask@var{m2}}
> +@cindex @code{udiv_pow2_bitmask@var{m2}} instruction pattern
> +@itemx @samp{udiv_pow2_bitmask@var{m2}}
> +Unsigned vector division by an immediate that is equivalent to
> +@samp{2^(bitsize(m) / 2) - 1}.
> +@smallexample
> +unsigned short op0, op1;
> +@dots{}
> +op0 = op1 / 0xffU;
> +@end smallexample
> +
>  @cindex @code{vec_shl_insert_@var{m}} instruction pattern
>  @item @samp{vec_shl_insert_@var{m}}
>  Shift the elements in vector input operand 1 left one element (i.e.@:
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 
> d2d550d358606022b1cb44fa842f06e0be507bc3..a3e3cc1520f77683ebf6256898f916ed45de475f
>  100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -159,6 +159,8 @@ DEF_INTERNAL_OPTAB_FN (VEC_SHL_INSERT, ECF_CONST | 
> ECF_NOTHROW,
>  vec_shl_insert, binary)
>  
>  DEF_INTERNAL_OPTAB_FN (DIV_POW2, ECF_CONST | ECF_NOTHROW, sdiv_pow2, binary)
> +DEF_INTERNAL_OPTAB_FN (DIV_POW2_BITMASK, ECF_CONST | ECF_NOTHROW,
> +udiv_pow2_bitmask, unary)
>  
>  DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary)
>  DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 
> 801310ebaa7d469520809bb7efed6820f8eb866b..3f0ac05ef5ad5aed8d6ca391f4eed71b0494e17f
>  100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -372,6 +372,7 @@ OPTAB_D (smulhrs_optab, "smulhrs$a3")
>  OPTAB_D (umulhs_optab, "umulhs$a3")
>  OPTAB_D (umulhrs_optab, "umulhrs$a3")
>  OPTAB_D (sdiv_pow2_optab, "sdiv_pow2$a3")
> +OPTAB_D (udiv_pow2_bitmask_optab, "udiv_pow2_bitmask$a2")
>  OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a")
>  OPTAB_D (vec_pack_ssat_optab, "vec_pack_ssat_$a")
>  OPTAB_D (vec_pack_trunc_optab, "vec_pack_trunc_$a")
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-1.c 
> b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-1.c
> new file mode 100644
> index 
> ..a7ea3cce4764239c5d281a8f0bead1f6a452de3f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-1.c
> @@ -0,0 +1,25 @@
> +/* { dg-require-effective-target vect_int } */
> +
> +#include <stdint.h>
> +#include "tree-vect.h"
> +
> +#define N 50
> +#define TYPE uint8_t 
> +
> +__attribute__((noipa, noinline, optimize("O1")))
> +void fun1(TYPE* restrict pixel, TYPE level, int n)
> +{
> +  for (int i = 0; i < n; i+=1)
> +pixel[i] = (pixel[i] * level) / 0xff;
> +}
> +
> +__attribute__((noipa, noinline, optimize("O3")))
> +void fun2(TYPE* restrict pixel, TYPE level, int n)
> +{
> +  for (int i = 0; i < n; i+=1)
> +pixel[i] = (pixel[i] * level) / 0xff;
> +}
> +
> +#include "vect-div-bitmask.h"
> +
> +/* { dg-final { scan-tree-dump "vect_recog_divmod_pattern: detected" "vect" 
> } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-2.c 
> b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-2.c
> new file mode 100644
> index 
> ..009e16e1b36497e5724410d9843f1ce122b26dda
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-2.c
> @@ -0,0 +1,25 @@
> +/* { dg-require-effective-target vect_int } */
> +
> +#include <stdint.h>
> +#include "tree-vect.h"
> +
> +#define N 50
> 

Re: [ping][vect-patterns] Refactor widen_plus/widen_minus as internal_fns

2022-06-13 Thread Richard Biener via Gcc-patches
On Tue, 7 Jun 2022, Richard Sandiford wrote:

> Joel Hutton  writes:
> >> > Patches attached. They already incorporated the .cc rename, now
> >> > rebased to be after the change to tree.h
> >>
> >> @@ -1412,8 +1412,7 @@ vect_recog_widen_op_pattern (vec_info *vinfo,
> >>2, oprnd, half_type, unprom, vectype);
> >>
> >>tree var = vect_recog_temp_ssa_var (itype, NULL);
> >> -  gimple *pattern_stmt = gimple_build_assign (var, wide_code,
> >> - oprnd[0], oprnd[1]);
> >> +  gimple *pattern_stmt = gimple_build (var, wide_code, oprnd[0],
> >> oprnd[1]);
> >>
> >>
> >> you should be able to do without the new gimple_build overload
> >> by using
> >>
> >>gimple_seq stmts = NULL;
> >>    gimple_build (&stmts, wide_code, itype, oprnd[0], oprnd[1]);
> >>gimple *pattern_stmt = gimple_seq_last_stmt (stmts);
> >>
> >> because 'gimple_build' is an existing API.
> >
> > Done.
> >
> > The gimple_build overload was at the request of Richard Sandiford, I assume 
> > removing it is ok with you Richard S?
> > From Richard Sandiford:
> >> For example, I think we should hide this inside a new:
> >>
> >>   gimple_build (var, wide_code, oprnd[0], oprnd[1]);
> >>
> >> that works directly on code_helper, similarly to the new code_helper
> >> gimple_build interfaces.
> 
> I thought the potential problem with the above is that gimple_build
> is a folding interface, so in principle it's allowed to return an
> existing SSA_NAME set by an existing statement (or even a constant).
> I think in this context we do need to force a new statement to be
> created.

Yes, that's due to how we use vect_finish_stmt_generation (only?).
It might be useful to add an overload that takes a gimple_seq
instead of a single gimple * for the vectorized stmt and leave
all the magic to that.  Now - we have the additional issue
that we have STMT_VINFO_VEC_STMTS instead of STMT_VINFO_VEC_DEFS
(in the end we'll only ever need the defs, never the stmts I think).
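
To make the folding pitfall above concrete, a fragment assuming the
GCC-internal gimple_build API (illustrative only; 'type' and 'x' stand for
a tree type and an SSA name already in scope):

  gimple_seq stmts = NULL;
  /* x + 0 simplifies, so 'res' is just 'x' and nothing is appended...  */
  tree res = gimple_build (&stmts, PLUS_EXPR, type, x, build_zero_cst (type));
  /* ...and gimple_seq_last_stmt (stmts) is a null pointer here, while
     pattern recognition needs a freshly created statement.  */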

I do think that we eventually want to 'enhance' the gimple.h
non-folding stmt building API, unfortunately I took the 'gimple_build'
name for the folding one, so alternatively we can unify assign/call
with gimple_build_assign_or_call (...).  I don't really like the
idea of having folding and non-folding APIs being overloads :/
Maybe the non-folding API should be CTORs (guess GTY won't like
that) or static member functions:

gimple *gimple::build (tree, code_helper, tree, tree);

and in the long run the gimple_build API should be (for some uses?)
off a class as well, like instead of

  gimple_seq seq = NULL;
  op = gimple_build (&seq, ...);

do

  gimple_builder b (location); // location defaulted to UNKNOWN
  op = b.build (...);


So - writing the above I somewhat like the idea of static
member functions in 'gimple' (yes, at the root of the class
hierarchy, definitely not at gimple_statement_with_memory_ops_base,
not sure if we want gassign::build for assigns
and only the code_helper 'overloads' at the class root).

Richard.

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstraße 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


RE: [ping][vect-patterns] Refactor widen_plus/widen_minus as internal_fns

2022-06-13 Thread Richard Biener via Gcc-patches
On Tue, 7 Jun 2022, Joel Hutton wrote:

> Thanks Richard,
> 
> > I thought the potential problem with the above is that gimple_build is a
> > folding interface, so in principle it's allowed to return an existing 
> > SSA_NAME
> > set by an existing statement (or even a constant).
> > I think in this context we do need to force a new statement to be created.
> 
> Before I make any changes, I'd like to check we're all on the same page.
> 
> richi, are you ok with the gimple_build function, perhaps with a 
> different name if you are concerned with overloading? we could use 
> gimple_ch_build or gimple_code_helper_build?

We can go with a private vect_gimple_build function until we sort out
the API issue to unblock Tamar (I'll reply to Richard's reply with further
thoughts on this)

> Similarly are you ok with the use of gimple_extract_op? I would lean towards 
> using it as it is cleaner, but I don't have strong feelings.

I don't like using gimple_extract_op here, I think I outlined a variant
that is even shorter.

Richard.

> Joel
> 
> > -Original Message-
> > From: Richard Sandiford 
> > Sent: 07 June 2022 09:18
> > To: Joel Hutton 
> > Cc: Richard Biener ; gcc-patches@gcc.gnu.org
> > Subject: Re: [ping][vect-patterns] Refactor widen_plus/widen_minus as
> > internal_fns
> > 
> > Joel Hutton  writes:
> > >> > Patches attached. They already incorporated the .cc rename, now
> > >> > rebased to be after the change to tree.h
> > >>
> > >> @@ -1412,8 +1412,7 @@ vect_recog_widen_op_pattern (vec_info *vinfo,
> > >>2, oprnd, half_type, unprom, vectype);
> > >>
> > >>tree var = vect_recog_temp_ssa_var (itype, NULL);
> > >> -  gimple *pattern_stmt = gimple_build_assign (var, wide_code,
> > >> - oprnd[0], oprnd[1]);
> > >> +  gimple *pattern_stmt = gimple_build (var, wide_code, oprnd[0],
> > >> oprnd[1]);
> > >>
> > >>
> > >> you should be able to do without the new gimple_build overload by
> > >> using
> > >>
> > >>gimple_seq stmts = NULL;
> > >>    gimple_build (&stmts, wide_code, itype, oprnd[0], oprnd[1]);
> > >>gimple *pattern_stmt = gimple_seq_last_stmt (stmts);
> > >>
> > >> because 'gimple_build' is an existing API.
> > >
> > > Done.
> > >
> > > The gimple_build overload was at the request of Richard Sandiford, I
> > assume removing it is ok with you Richard S?
> > > From Richard Sandiford:
> > >> For example, I think we should hide this inside a new:
> > >>
> > >>   gimple_build (var, wide_code, oprnd[0], oprnd[1]);
> > >>
> > >> that works directly on code_helper, similarly to the new code_helper
> > >> gimple_build interfaces.
> > 
> > I thought the potential problem with the above is that gimple_build is a
> > folding interface, so in principle it's allowed to return an existing 
> > SSA_NAME
> > set by an existing statement (or even a constant).
> > I think in this context we do need to force a new statement to be created.
> > 
> > Of course, the hope is that there wouldn't still be such folding 
> > opportunities
> > at this stage, but I don't think it's guaranteed (especially with options
> > fuzzing).
> > 
Since I was mentioned :-) ...
> > 
> > Could you run the patch through contrib/check_GNU_style.py?
> > There seem to be a few long lines.
> > 
> > > +  if (res_op.code.is_tree_code ())
> > 
> > Do you need this is_tree_code ()?  These comparisons…
> > 
> > > +  {
> > > +  widen_arith = (code == WIDEN_PLUS_EXPR
> > > +  || code == WIDEN_MINUS_EXPR
> > > +  || code == WIDEN_MULT_EXPR
> > > +  || code == WIDEN_LSHIFT_EXPR);
> > 
> > …ought to be safe unconditionally.
> > 
> > > + }
> > > +  else
> > > +  widen_arith = false;
> > > +
> > > +  if (!widen_arith
> > > +  && !CONVERT_EXPR_CODE_P (code)
> > > +  && code != FIX_TRUNC_EXPR
> > > +  && code != FLOAT_EXPR)
> > > +return false;
> > >
> > >/* Check types of lhs and rhs.  */
> > > -  scalar_dest = gimple_assign_lhs (stmt);
> > > +  scalar_dest = gimple_get_lhs (stmt);
> > >lhs_type = TREE_TYPE (scalar_dest);
> > >vectype_out = STMT_VINFO_VECTYPE (stmt_info);
> > >
> > > @@ -4938,10 +4951,14 @@ vectorizable_conversion (vec_info *vinfo,
> > >
> > >if (op_type == binary_op)
> > >  {
> > > -  gcc_assert (code == WIDEN_MULT_EXPR || code ==
> > WIDEN_LSHIFT_EXPR
> > > -   || code == WIDEN_PLUS_EXPR || code ==
> > WIDEN_MINUS_EXPR);
> > > +  gcc_assert (code == WIDEN_MULT_EXPR
> > > +   || code == WIDEN_LSHIFT_EXPR
> > > +   || code == WIDEN_PLUS_EXPR
> > > +   || code == WIDEN_MINUS_EXPR);
> > >
> > > -  op1 = gimple_assign_rhs2 (stmt);
> > > +
> > > +  op1 = is_gimple_assign (stmt) ? gimple_assign_rhs2 (stmt) :
> > > +  gimple_call_arg (stmt, 0);
> > >tree vectype1_in;
> > >if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 1,
> > >  , _op1, [1], _in)) […] @@
> > -12181,7
> 

Re: [PATCH]middle-end Use subregs to expand COMPLEX_EXPR to set the lowpart.

2022-06-13 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi All,
>
> When lowering COMPLEX_EXPR we currently emit two VEC_EXTRACTs: one for the
> lowpart and one for the highpart.
>
> The problem with this is that in RTL the lvalue of the RTX is the only thing
> tying the two instructions together.
>
> This means that e.g. combine is unable to try to combine the two instructions
> for setting the lowpart and highpart.
>
> For ISAs that have bit extract instructions we can eliminate one of the
> extracts if, and only if, we're setting the entire complex number.
>
> This change changes the expand code when we're setting the entire complex 
> number
> to generate a subreg for the lowpart instead of a vec_extract.
>
> This allows us to optimize sequences such as:
>
> _Complex int f(int a, int b) {
> _Complex int t = a + b * 1i;
> return t;
> }
>
> from:
>
> f:
>   bfi x2, x0, 0, 32
>   bfi x2, x1, 32, 32
>   mov x0, x2
>   ret
>
> into:
>
> f:
>   bfi x0, x1, 32, 32
>   ret
>
> I have also confirmed the codegen for x86_64 did not change.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> and no issues.
>
> Ok for master?

I'm not sure this is endian-safe.  For big-endian it's the imaginary
part that can be written as a subreg.  The real part might be the
high part of a register.

Maybe a more general way to handle this would be to add (yet another)
parameter to store_bit_field that indicates that the current value
of the structure is undefined.  That would also be useful in at
least one other caller (from calls.cc).  write_complex_part could
then have a similar parameter, true for the first write and false
for the second.

Thanks,
Richard

>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * emit-rtl.cc (validate_subreg): Accept subregs of complex modes.
>   * expr.cc (emit_move_complex_parts): Emit subreg of lowpart if possible.
>
> gcc/testsuite/ChangeLog:
>
>   * g++.target/aarch64/complex-init.C: New test.
>
> --- inline copy of patch -- 
> diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
> index 
> f4404d7abe33b565358b7f609a91114c75ecf4e7..15ffca2ffe986bca56c1fae9381bd33f5d6b012d
>  100644
> --- a/gcc/emit-rtl.cc
> +++ b/gcc/emit-rtl.cc
> @@ -947,9 +947,11 @@ validate_subreg (machine_mode omode, machine_mode imode,
>  && GET_MODE_INNER (omode) == GET_MODE_INNER (imode))
>  ;
>/* Subregs involving floating point modes are not allowed to
> - change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
> + change size unless it's an insert into a complex mode.
> + Therefore (subreg:DI (reg:DF) 0) and (subreg:CS (reg:SF) 0) are fine, 
> but
>   (subreg:SI (reg:DF) 0) isn't.  */
> -  else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
> +  else if ((FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
> +&& !COMPLEX_MODE_P (omode))
>  {
>if (! (known_eq (isize, osize)
>/* LRA can use subreg to store a floating point value in
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index 
> 5f7142b975ada2cd8b00663d35ba1e0004b8e28d..fce672c236fdbc4d40adb6e2614c234c02a61933
>  100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -3740,7 +3740,17 @@ emit_move_complex_parts (rtx x, rtx y)
>&& REG_P (x) && !reg_overlap_mentioned_p (x, y))
>  emit_clobber (x);
>  
> -  write_complex_part (x, read_complex_part (y, false), false);
> +  /* If we're writing the entire value using a concat into a register
> + then emit the lower part as a simple mov followed by an insert
> + into the top part.  */
> +  if (GET_CODE (y) == CONCAT && !reload_completed && REG_P (x))
> +{
> +  rtx val = XEXP (y, false);
> +  rtx dest = lowpart_subreg (GET_MODE (val), x, GET_MODE (x));
> +  emit_move_insn (dest, val);
> +}
> +  else
> +write_complex_part (x, read_complex_part (y, false), false);
>write_complex_part (x, read_complex_part (y, true), true);
>  
>return get_last_insn ();
> diff --git a/gcc/testsuite/g++.target/aarch64/complex-init.C 
> b/gcc/testsuite/g++.target/aarch64/complex-init.C
> new file mode 100644
> index 
> ..497cc4bca3e2c59da95c871ceb5cc96216fc302d
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/aarch64/complex-init.C
> @@ -0,0 +1,40 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
> +
> +/*
> +** _Z1fii:
> +** ...
> +**   bfi x0, x1, 32, 32
> +**   ret
> +** ...
> +*/
> +_Complex int f(int a, int b) {
> +_Complex int t = a + b * 1i;
> +return t;
> +}
> +
> +/*
> +** _Z2f2ii:
> +** ...
> +**   bfi x0, x1, 32, 32
> +**   ret
> +** ...
> +*/
> +_Complex int f2(int a, int b) {
> +_Complex int t = {a, b};
> +return t;
> +}
> +
> +/*
> +** _Z12f_convolutedii:
> +** ...
> +**   bfi x0, x1, 32, 32
> +**   ret
> +** ...
> +*/
> +_Complex int f_convoluted(int a, int b) {
> +_Complex int t = (_Complex int)a;
> +__imag__ t 

Re: [PATCH]AArch64 relax predicate on load structure load instructions

2022-06-13 Thread Richard Biener via Gcc-patches
On Mon, 13 Jun 2022, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Wed, 8 Jun 2022, Richard Sandiford wrote:
> >> Tamar Christina  writes:
> >> >> -Original Message-
> >> >> From: Richard Sandiford 
> >> >> Sent: Wednesday, June 8, 2022 11:31 AM
> >> >> To: Tamar Christina 
> >> >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> >> >> ; Marcus Shawcroft
> >> >> ; Kyrylo Tkachov 
> >> >> Subject: Re: [PATCH]AArch64 relax predicate on load structure load
> >> >> instructions
> >> >> 
> >> >> Tamar Christina  writes:
> >> >> > Hi All,
> >> >> >
> >> >> > At some point in time we started lowering the ld1r instructions in 
> >> >> > gimple.
> >> >> >
> >> >> > That is:
> >> >> >
> >> >> > uint8x8_t f1(const uint8_t *in) {
> >> >> > return vld1_dup_u8(&in[1]);
> >> >> > }
> >> >> >
> >> >> > generates at gimple:
> >> >> >
> >> >> >   _3 = MEM[(const uint8_t *)in_1(D) + 1B];
> >> >> >   _4 = {_3, _3, _3, _3, _3, _3, _3, _3};
> >> >> >
> >> >> > Which is good, but we then generate:
> >> >> >
> >> >> > f1:
> >> >> >   ldr b0, [x0, 1]
> >> >> >   dup v0.8b, v0.b[0]
> >> >> >   ret
> >> >> >
> >> >> > instead of ld1r.
> >> >> >
> >> >> > The reason for this is because the load instructions have a too
> >> >> > restrictive predicate on them which causes combine not to be able to
> >> >> > combine the instructions due to the predicate only accepting simple
> >> >> addressing modes.
> >> >> >
> >> >> > This patch relaxes the predicate to accept any memory operand and
> >> >> > relies on LRA to legitimize the address when it needs to as the
> >> >> > constraint still only allows the simple addressing mode.  Reload is
> >> >> > always able to legitimize to these.
> >> >> >
> >> >> > Secondly since we are now actually generating more ld1r it became
> >> >> > clear that the lane instructions suffer from a similar issue.
> >> >> >
> >> >> > i.e.
> >> >> >
> >> >> > float32x4_t f2(const float32_t *in, float32x4_t a) {
> >> >> > float32x4_t dup = vld1q_dup_f32(&in[1]);
> >> >> > return vfmaq_laneq_f32 (a, a, dup, 1); }
> >> >> >
> >> >> > would generate ld1r + vector fmla instead of ldr + lane fmla.
> >> >> >
> >> >> > The reason for this is similar to the ld1r issue.  The predicate is
> >> >> > too restrictive in only accepting register operands but not memory.
> >> >> >
> >> >> > This relaxes it to accept register and/or memory while leaving the
> >> >> > constraint to only accept registers.  This will have LRA generate a
> >> >> > reload if needed forcing the memory to registers using the standard
> >> >> patterns.
> >> >> >
> >> >> > These two changes allow combine and reload to generate the right
> >> >> sequences.
> >> >> >
> >> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >> >> 
> >> >> This is going against the general direction of travel, which is to make 
> >> >> the
> >> >> instruction's predicates and conditions enforce the constraints as much 
> >> >> as
> >> >> possible (making optimistic assumptions about pseudo registers).
> >> >> 
> >> >> The RA *can* deal with things like:
> >> >> 
> >> >>   (match_operand:M N "general_operand" "r")
> >> >> 
> >> >> but it's best avoided, for a few reasons:
> >> >> 
> >> >> (1) The fix-up will be done in LRA, so IRA will not see the temporary
> >> >> registers.  This can make the allocation of those temporaries
> >> >> suboptimal but (more importantly) it might require other
> >> >> previously-allocated registers to be spilled late due to the
> >> >> unexpected increase in register pressure.
> >> >> 
> >> >> (2) It ends up hiding instructions from the pre-RA optimisers.
> >> >> 
> >> >> (3) It can also prevent combine opportunities (as well as create them),
> >> >> unless the loose predicates in an insn I are propagated to all
> >> >> patterns that might result from combining I with something else.
> >> >> 
> >> >> It sounds like the first problem (not generating ld1r) could be fixed 
> >> >> by (a)
> >> >> combining aarch64_simd_dup and *aarch64_simd_ld1r, so
> >> >> that the register and memory alternatives are in the same pattern and 
> >> >> (b)
> >> >> using the merged instruction(s) to implement the vec_duplicate optab.
> >> >> Target-independent code should then make the address satisfy the
> >> >> predicate, simplifying the address where necessary.
> >> >> 
> >> >
> >> > I think I am likely missing something here.  I would assume that you
> >> > wanted to use the optab to split the addressing off from the mem
> >> > expression so the combined insn matches.
> >> >
> >> > But in that case, why do you need to combine the two instructions?
> >> > I've tried and it doesn't work since the vec_duplicate optab doesn't see 
> >> > the
> >> > mem as op1, because in gimple the mem is not part of the duplicate.
> >> >
> >> > So you still just see:
> >> >
> >>  dbgrtx (ops[1].value)
> >> > (subreg/s/v:QI (reg:SI 92 [ _3 ]) 0)
> >> >
> >> > As the operand as the 

Re: [PATCH]AArch64 relax predicate on load structure load instructions

2022-06-13 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Wed, 8 Jun 2022, Richard Sandiford wrote:
>> Tamar Christina  writes:
>> >> -Original Message-
>> >> From: Richard Sandiford 
>> >> Sent: Wednesday, June 8, 2022 11:31 AM
>> >> To: Tamar Christina 
>> >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> >> ; Marcus Shawcroft
>> >> ; Kyrylo Tkachov 
>> >> Subject: Re: [PATCH]AArch64 relax predicate on load structure load
>> >> instructions
>> >> 
>> >> Tamar Christina  writes:
>> >> > Hi All,
>> >> >
>> >> > At some point in time we started lowering the ld1r instructions in 
>> >> > gimple.
>> >> >
>> >> > That is:
>> >> >
>> >> > uint8x8_t f1(const uint8_t *in) {
>> >> > return vld1_dup_u8(&in[1]);
>> >> > }
>> >> >
>> >> > generates at gimple:
>> >> >
>> >> >   _3 = MEM[(const uint8_t *)in_1(D) + 1B];
>> >> >   _4 = {_3, _3, _3, _3, _3, _3, _3, _3};
>> >> >
>> >> > Which is good, but we then generate:
>> >> >
>> >> > f1:
>> >> > ldr b0, [x0, 1]
>> >> > dup v0.8b, v0.b[0]
>> >> > ret
>> >> >
>> >> > instead of ld1r.
>> >> >
>> >> > The reason for this is because the load instructions have a too
>> >> > restrictive predicate on them which causes combine not to be able to
>> >> > combine the instructions due to the predicate only accepting simple
>> >> addressing modes.
>> >> >
>> >> > This patch relaxes the predicate to accept any memory operand and
>> >> > relies on LRA to legitimize the address when it needs to as the
>> >> > constraint still only allows the simple addressing mode.  Reload is
>> >> > always able to legitimize to these.
>> >> >
>> >> > Secondly since we are now actually generating more ld1r it became
>> >> > clear that the lane instructions suffer from a similar issue.
>> >> >
>> >> > i.e.
>> >> >
>> >> > float32x4_t f2(const float32_t *in, float32x4_t a) {
>> >> > float32x4_t dup = vld1q_dup_f32(&in[1]);
>> >> > return vfmaq_laneq_f32 (a, a, dup, 1); }
>> >> >
>> >> > would generate ld1r + vector fmla instead of ldr + lane fmla.
>> >> >
>> >> > The reason for this is similar to the ld1r issue.  The predicate is
>> >> > too restrictive in only accepting register operands but not memory.
>> >> >
>> >> > This relaxes it to accept register and/or memory while leaving the
>> >> > constraint to only accept registers.  This will have LRA generate a
>> >> > reload if needed forcing the memory to registers using the standard
>> >> patterns.
>> >> >
>> >> > These two changes allow combine and reload to generate the right
>> >> sequences.
>> >> >
>> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>> >> 
>> >> This is going against the general direction of travel, which is to make 
>> >> the
>> >> instruction's predicates and conditions enforce the constraints as much as
>> >> possible (making optimistic assumptions about pseudo registers).
>> >> 
>> >> The RA *can* deal with things like:
>> >> 
>> >>   (match_operand:M N "general_operand" "r")
>> >> 
>> >> but it's best avoided, for a few reasons:
>> >> 
>> >> (1) The fix-up will be done in LRA, so IRA will not see the temporary
>> >> registers.  This can make the allocation of those temporaries
>> >> suboptimal but (more importantly) it might require other
>> >> previously-allocated registers to be spilled late due to the
>> >> unexpected increase in register pressure.
>> >> 
>> >> (2) It ends up hiding instructions from the pre-RA optimisers.
>> >> 
>> >> (3) It can also prevent combine opportunities (as well as create them),
>> >> unless the loose predicates in an insn I are propagated to all
>> >> patterns that might result from combining I with something else.
>> >> 
>> >> It sounds like the first problem (not generating ld1r) could be fixed by 
>> >> (a)
>> >> combining aarch64_simd_dup and *aarch64_simd_ld1r, so
>> >> that the register and memory alternatives are in the same pattern and (b)
>> >> using the merged instruction(s) to implement the vec_duplicate optab.
>> >> Target-independent code should then make the address satisfy the
>> >> predicate, simplifying the address where necessary.
>> >> 
>> >
>> > I think I am likely missing something here. I would assume that you wanted
>> > to use the optab to split the addressing off from the mem expression so the
>> > combined insn matches.
>> >
>> > But in that case, why do you need to combine the two instructions?
>> > I've tried and it doesn't work since the vec_duplicate optab doesn't see 
>> > the
>> > mem as op1, because in gimple the mem is not part of the duplicate.
>> >
>> > So you still just see:
>> >
>>  dbgrtx (ops[1].value)
>> > (subreg/s/v:QI (reg:SI 92 [ _3 ]) 0)
>> >
>> > As the operand as the argument to the dup is just an SSA_NAME.
>> 
>> Ah, yeah, I'd forgotten that fixed-length vec_duplicates would
>> come from a constructor rather than a vec_duplicate_expr, so we don't
>> get the usual benefit of folding single-use mems during expand.
>> 
>> 

Re: [PATCH]AArch64 relax predicate on load structure load instructions

2022-06-13 Thread Richard Biener via Gcc-patches
On Wed, 8 Jun 2022, Richard Sandiford wrote:

> Tamar Christina  writes:
> >> -Original Message-
> >> From: Richard Sandiford 
> >> Sent: Wednesday, June 8, 2022 11:31 AM
> >> To: Tamar Christina 
> >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> >> ; Marcus Shawcroft
> >> ; Kyrylo Tkachov 
> >> Subject: Re: [PATCH]AArch64 relax predicate on load structure load
> >> instructions
> >> 
> >> Tamar Christina  writes:
> >> > Hi All,
> >> >
> >> > At some point in time we started lowering the ld1r instructions in 
> >> > gimple.
> >> >
> >> > That is:
> >> >
> >> > uint8x8_t f1(const uint8_t *in) {
> >> > return vld1_dup_u8(&in[1]);
> >> > }
> >> >
> >> > generates at gimple:
> >> >
> >> >   _3 = MEM[(const uint8_t *)in_1(D) + 1B];
> >> >   _4 = {_3, _3, _3, _3, _3, _3, _3, _3};
> >> >
> >> > Which is good, but we then generate:
> >> >
> >> > f1:
> >> >  ldr b0, [x0, 1]
> >> >  dup v0.8b, v0.b[0]
> >> >  ret
> >> >
> >> > instead of ld1r.
> >> >
> >> > The reason for this is because the load instructions have a too
> >> > restrictive predicate on them which causes combine not to be able to
> >> > combine the instructions due to the predicate only accepting simple
> >> addressing modes.
> >> >
> >> > This patch relaxes the predicate to accept any memory operand and
> >> > relies on LRA to legitimize the address when it needs to as the
> >> > constraint still only allows the simple addressing mode.  Reload is
> >> > always able to legitimize to these.
> >> >
> >> > Secondly since we are now actually generating more ld1r it became
> >> > clear that the lane instructions suffer from a similar issue.
> >> >
> >> > i.e.
> >> >
> >> > float32x4_t f2(const float32_t *in, float32x4_t a) {
> >> > float32x4_t dup = vld1q_dup_f32(&in[1]);
> >> > return vfmaq_laneq_f32 (a, a, dup, 1); }
> >> >
> >> > would generate ld1r + vector fmla instead of ldr + lane fmla.
> >> >
> >> > The reason for this is similar to the ld1r issue.  The predicate is
> >> > too restrictive in only accepting register operands but not memory.
> >> >
> >> > This relaxes it to accept register and/or memory while leaving the
> >> > constraint to only accept registers.  This will have LRA generate a
> >> > reload if needed forcing the memory to registers using the standard
> >> patterns.
> >> >
> >> > These two changes allow combine and reload to generate the right
> >> sequences.
> >> >
> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >> 
> >> This is going against the general direction of travel, which is to make the
> >> instruction's predicates and conditions enforce the constraints as much as
> >> possible (making optimistic assumptions about pseudo registers).
> >> 
> >> The RA *can* deal with things like:
> >> 
> >>   (match_operand:M N "general_operand" "r")
> >> 
> >> but it's best avoided, for a few reasons:
> >> 
> >> (1) The fix-up will be done in LRA, so IRA will not see the temporary
> >> registers.  This can make the allocation of those temporaries
> >> suboptimal but (more importantly) it might require other
> >> previously-allocated registers to be spilled late due to the
> >> unexpected increase in register pressure.
> >> 
> >> (2) It ends up hiding instructions from the pre-RA optimisers.
> >> 
> >> (3) It can also prevent combine opportunities (as well as create them),
> >> unless the loose predicates in an insn I are propagated to all
> >> patterns that might result from combining I with something else.
> >> 
> >> It sounds like the first problem (not generating ld1r) could be fixed by 
> >> (a)
> >> combining aarch64_simd_dup and *aarch64_simd_ld1r, so
> >> that the register and memory alternatives are in the same pattern and (b)
> >> using the merged instruction(s) to implement the vec_duplicate optab.
> >> Target-independent code should then make the address satisfy the
> >> predicate, simplifying the address where necessary.
> >> 
> >
> > I think I am likely missing something here. I would assume that you wanted
> > to use the optab to split the addressing off from the mem expression so the
> > combined insn matches.
> >
> > But in that case, why do you need to combine the two instructions?
> > I've tried and it doesn't work since the vec_duplicate optab doesn't see the
> > mem as op1, because in gimple the mem is not part of the duplicate.
> >
> > So you still just see:
> >
>  dbgrtx (ops[1].value)
> > (subreg/s/v:QI (reg:SI 92 [ _3 ]) 0)
> >
> > As the operand as the argument to the dup is just an SSA_NAME.
> 
> Ah, yeah, I'd forgotten that fixed-length vec_duplicates would
> come from a constructor rather than a vec_duplicate_expr, so we don't
> get the usual benefit of folding single-use mems during expand.
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595362.html
> moves towards using vec_duplicate even for fixed-length vectors.
> If we take that approach, then I suppose a plain constructor
> should be folded to a vec_duplicate where 

Re: [PATCH] Add optional __Bfloat16 support

2022-06-13 Thread Hongtao Liu via Gcc-patches
On Sat, Jun 11, 2022 at 1:46 AM H.J. Lu  wrote:
>
> On Fri, Jun 10, 2022 at 7:44 AM H.J. Lu  wrote:
> >
> > On Fri, Jun 10, 2022 at 2:38 AM Florian Weimer  wrote:
> > >
> > > * liuhongt via Libc-alpha:
> > >
> > > > +\subsubsection{Special Types}
> > > > +
> > > > +The \code{__Bfloat16} type uses an 8-bit exponent and a 7-bit mantissa.
> > > > +It is used for \code{BF16} related intrinsics, it cannot be
> >
> > Please mention that this is an alternate encoding format for 16-bit floating
> > point.  It has the same size and alignment as _Float16.
>
> It also follows the same rules as _Float16 for parameter passing and function
> return.

How about
+\subsubsection{Special Types}
+
+The \code{__Bfloat16} type uses an alternate encoding format for 16-bit
+floating point, with an 8-bit exponent and a 7-bit mantissa.  It has the
+same size, alignment, parameter-passing and return rules as _Float16.
+It is used for \code{BF16} related intrinsics; it cannot be used with
+standard C operators.
+
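
To make the layout concrete, an illustrative scalar sketch (not part of the
proposed ABI wording): bfloat16 is the upper half of the IEEE binary32
encoding, which is where the 8-bit exponent and 7-bit mantissa come from.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Truncating float -> bfloat16 conversion; real conversions normally
   round to nearest-even rather than truncating.  */
static uint16_t
float_to_bf16_trunc (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);  /* 1 sign + 8 exponent + 23 mantissa */
  return (uint16_t) (bits >> 16);   /* keep 1 sign + 8 exponent + 7 mantissa */
}

int
main (void)
{
  printf ("%#x\n", float_to_bf16_trunc (1.0f));  /* prints 0x3f80 */
  return 0;
}
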
>
> > > > +used with standard C operators.
> > >
> > > I think it's not necessary to specify whether the type supports certain
> > > C operators (surely assignment will work?).  If they are added later,
> > > the ABI won't need changing.
> > >
> >
> > If _Bfloat16 becomes a fundamental type, the ABI should be changed to
> > move it together with other scalar types.
> >
> > --
> > H.J.
>
>
>
> --
> H.J.
>
> --
> You received this message because you are subscribed to the Google Groups 
> "X86-64 System V Application Binary Interface" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to x86-64-abi+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/x86-64-abi/CAMe9rOrvrZRbksjQ%2BY8caC%3DYo%3D%3DoFY5%2BmuOGf9UZRh6L7pUQjw%40mail.gmail.com.



-- 
BR,
Hongtao