[PATCH] Check calls before loop unrolling

2020-08-19 Thread guojiufu via Gcc-patches
Hi,

When unroll loops, if there are calls inside the loop, those calls
may raise negative impacts for unrolling.  This patch adds a param
param_max_unrolled_calls, and checks if the number of calls inside
the loop bigger than this param, loop is prevent from unrolling.

This patch is checking the _average_ number of calls which is the
summary of call numbers multiply the possibility of the call maybe
executed.  The _average_ number could be a fraction, to keep the
precision, the param is the threshold number multiply 1.

Bootstrap and regtest pass on powerpc64le.  Is this ok for trunk?

gcc/ChangeLog
2020-08-19  Jiufu Guo   

* params.opt (param_max_unrolled_average_calls_x1): New param.
* cfgloop.h (average_num_loop_calls): New declare.
* cfgloopanal.c (average_num_loop_calls): New function.
* loop-unroll.c (decide_unroll_constant_iteration,
decide_unroll_runtime_iterations,
decide_unroll_stupid): Check average_num_loop_calls and
param_max_unrolled_average_calls_x1.
---
 gcc/cfgloop.h |  2 ++
 gcc/cfgloopanal.c | 25 +
 gcc/loop-unroll.c | 10 ++
 gcc/params.opt|  4 
 4 files changed, 41 insertions(+)

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 18b404e292f..dab933da150 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
 #define GCC_CFGLOOP_H
 
 #include "cfgloopmanip.h"
+#include "sreal.h"
 
 /* Structure to hold decision about unrolling/peeling.  */
 enum lpt_dec
@@ -387,6 +388,7 @@ extern vec get_loop_exit_edges (const class loop *, 
basic_block * = NULL);
 extern edge single_exit (const class loop *);
 extern edge single_likely_exit (class loop *loop, vec);
 extern unsigned num_loop_branches (const class loop *);
+extern sreal average_num_loop_calls (const class loop *);
 
 extern edge loop_preheader_edge (const class loop *);
 extern edge loop_latch_edge (const class loop *);
diff --git a/gcc/cfgloopanal.c b/gcc/cfgloopanal.c
index 0b33e8272a7..a314db4e0c0 100644
--- a/gcc/cfgloopanal.c
+++ b/gcc/cfgloopanal.c
@@ -233,6 +233,31 @@ average_num_loop_insns (const class loop *loop)
   return ret;
 }
 
+/* Count the number of call insns in LOOP.  */
+sreal
+average_num_loop_calls (const class loop *loop)
+{
+  basic_block *bbs;
+  rtx_insn *insn;
+  unsigned int i, bncalls;
+  sreal ncalls = 0;
+
+  bbs = get_loop_body (loop);
+  for (i = 0; i < loop->num_nodes; i++)
+{
+  bncalls = 0;
+  FOR_BB_INSNS (bbs[i], insn)
+   if (CALL_P (insn))
+ bncalls++;
+
+  ncalls += (sreal) bncalls
+   * bbs[i]->count.to_sreal_scale (loop->header->count);
+}
+  free (bbs);
+
+  return ncalls;
+}
+
 /* Returns expected number of iterations of LOOP, according to
measured or guessed profile.
 
diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c
index 693c7768868..56b8fb37d2a 100644
--- a/gcc/loop-unroll.c
+++ b/gcc/loop-unroll.c
@@ -370,6 +370,10 @@ decide_unroll_constant_iterations (class loop *loop, int 
flags)
 nunroll = nunroll_by_av;
   if (nunroll > (unsigned) param_max_unroll_times)
 nunroll = param_max_unroll_times;
+  if (!loop->unroll
+  && (average_num_loop_calls (loop) * (sreal) 1).to_int ()
+  > (unsigned) param_max_unrolled_average_calls_x1)
+nunroll = 0;
 
   if (targetm.loop_unroll_adjust)
 nunroll = targetm.loop_unroll_adjust (nunroll, loop);
@@ -689,6 +693,9 @@ decide_unroll_runtime_iterations (class loop *loop, int 
flags)
 nunroll = nunroll_by_av;
   if (nunroll > (unsigned) param_max_unroll_times)
 nunroll = param_max_unroll_times;
+  if ((average_num_loop_calls (loop) * (sreal) 1).to_int ()
+  > (unsigned) param_max_unrolled_average_calls_x1)
+nunroll = 0;
 
   if (targetm.loop_unroll_adjust)
 nunroll = targetm.loop_unroll_adjust (nunroll, loop);
@@ -1173,6 +1180,9 @@ decide_unroll_stupid (class loop *loop, int flags)
 nunroll = nunroll_by_av;
   if (nunroll > (unsigned) param_max_unroll_times)
 nunroll = param_max_unroll_times;
+  if ((average_num_loop_calls (loop) * (sreal) 1).to_int ()
+  > (unsigned) param_max_unrolled_average_calls_x1)
+nunroll = 0;
 
   if (targetm.loop_unroll_adjust)
 nunroll = targetm.loop_unroll_adjust (nunroll, loop);
diff --git a/gcc/params.opt b/gcc/params.opt
index f39e5d1a012..80605861223 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -634,6 +634,10 @@ The maximum number of unrollings of a single loop.
 Common Joined UInteger Var(param_max_unrolled_insns) Init(200) Param 
Optimization
 The maximum number of instructions to consider to unroll in a loop.
 
+-param=max-unrolled-average-calls-x1=
+Common Joined UInteger Var(param_max_unrolled_average_calls_x1) Init(0) 
Param Optimization
+The maximum number of calls to consider to unroll in a loop on average and 
multiply 1.
+
 -param=max-unswitch-insns=
 Common Joined UInteger 

[PATCH PR96698] aarch64: ICE during GIMPLE pass:vect

2020-08-19 Thread yangyang (ET)
Hi, 

This is a simple fix for PR96698.

For the test case, there are two PHIs in the inner loop in pass_vect

 [local count: 719407024]:
# b_26 = PHI <0(4), b_15(10)>
# c_27 = PHI <0(4), b_26(10)>

c_27 = PHI <0(4), b_26(10)> is detected to be a vectorizable nested 
cycle by vect_is_simple_reduction incorrectly, and # b_26 = PHI <0(4), 
b_15(10)> is marked as the reduc_def of it, resulting in the crash.

This patch adjusts the order of judgements in vect_is_simple_reduction 
to avoid the incorrect detection.

Added one testcase for this. Bootstrap and tested on both aarch64 and 
x86 Linux platform, no new regression witnessed.

Ok for trunk?`

Thanks, 
Yang Yang


+2020-08-20  Yang Yang  
+
+   PR tree-optimization/96698
+   * tree-vect-loop.c (vect_is_simple_reduction): Moves the analysis
+   of phi nodes above the analysis of nested cycle to avoid the
+   incorrect detection.
+

+2020-08-20  Yang Yang  
+
+   PR tree-optimization/96698
+   * gcc.dg/pr96698.c: New test.
+



pr96698-v1.patch
Description: pr96698-v1.patch


Re: [Patch 3/5] rs6000, Add TI to TD (128-bit DFP) and TD to TI support

2020-08-19 Thread Segher Boessenkool
Hi!

On Tue, Aug 11, 2020 at 12:22:59PM -0700, Carl Love wrote:
> +(define_insn "floattitd2"
> +  [(set (match_operand:TD 0 "gpc_reg_operand" "=d")
> + (float:TD (match_operand:TI 1 "gpc_reg_operand" "v")))]
> +  "TARGET_TI_VECTOR_OPS"
> +  "dcffixqq %0,%1"
> +  [(set_attr "type" "dfp")])

I wonder if this should just be TARGET_POWER10 now?  That goes for the
whole series of course.

> +  ;; carll

I don't think we need this comment on trunk ;-)

Looks fine otherwise.  Okay for trunk, modulo whatever we do with
YARGET_TI_VECTOR_OPS.  Thanks!


Segher


[committed] analyzer: fix ICE on vector comparisons [PR96713]

2020-08-19 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as r11-2776-g2f5951bd95e334d611f4be7bbe1a136c580f9c20.

gcc/analyzer/ChangeLog:
PR analyzer/96713
* region-model.cc (region_model::get_gassign_result): For
comparisons, only use eval_condition when the lhs has boolean
type, and use get_or_create_constant_svalue on the boolean
constants directly rather than via get_rvalue.

gcc/testsuite/ChangeLog:
PR analyzer/96713
* gcc.dg/analyzer/pr96713.c: New test.
---
 gcc/analyzer/region-model.cc| 25 -
 gcc/testsuite/gcc.dg/analyzer/pr96713.c |  8 
 2 files changed, 20 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr96713.c

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 8a5e74ebc0e..b8a0f9ffd3d 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -462,24 +462,23 @@ region_model::get_gassign_result (const gassign *assign,
   {
tree rhs2 = gimple_assign_rhs2 (assign);
 
-   // TODO: constraints between svalues
const svalue *rhs1_sval = get_rvalue (rhs1, ctxt);
const svalue *rhs2_sval = get_rvalue (rhs2, ctxt);
 
-   tristate t = eval_condition (rhs1_sval, op, rhs2_sval);
-   if (t.is_known ())
- return get_rvalue (t.is_true ()
-? boolean_true_node
-: boolean_false_node,
-ctxt);
-   else
+   if (TREE_TYPE (lhs) == boolean_type_node)
  {
-   // TODO: symbolic value for binop
-   const svalue *sval_binop
- = m_mgr->get_or_create_binop (TREE_TYPE (lhs), op,
-   rhs1_sval, rhs2_sval);
-   return sval_binop;
+   /* Consider constraints between svalues.  */
+   tristate t = eval_condition (rhs1_sval, op, rhs2_sval);
+   if (t.is_known ())
+ return m_mgr->get_or_create_constant_svalue
+   (t.is_true () ? boolean_true_node : boolean_false_node);
  }
+
+   /* Otherwise, generate a symbolic binary op.  */
+   const svalue *sval_binop
+ = m_mgr->get_or_create_binop (TREE_TYPE (lhs), op,
+   rhs1_sval, rhs2_sval);
+   return sval_binop;
   }
   break;
 
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr96713.c 
b/gcc/testsuite/gcc.dg/analyzer/pr96713.c
new file mode 100644
index 000..fe9cafd73f1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr96713.c
@@ -0,0 +1,8 @@
+typedef int __attribute__ ((vector_size (8))) V;
+
+void
+foo (V d, V e)
+{
+  d <= e;
+  foo ((V){}, (V){});
+}
-- 
2.26.2



Re: [Patch 2/5] rs6000, 128-bit multiply, divide, modulo, shift, compare

2020-08-19 Thread Segher Boessenkool
On Thu, Aug 13, 2020 at 06:46:05PM -0500, will schmidt wrote:
> >  .../gcc.target/powerpc/int_128bit-runnable.c  | 2254 +
> 
> The path into the testsuite subdir looks strange there.

Git abbreviated this.  It is autogenerated (git diffstat), so there is
nothing much you can do about it.  The abbreviation is quite helpful
often (for moves for example), but not always, yup.

> > --- a/gcc/config/rs6000/altivec.h
> > +++ b/gcc/config/rs6000/altivec.h
> > @@ -183,7 +183,7 @@
> >  #define vec_recipdiv __builtin_vec_recipdiv
> >  #define vec_rlmi __builtin_vec_rlmi
> >  #define vec_vrlnm __builtin_vec_rlnm
> > -#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((c)<<8)|(b)))
> > +#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((b)<<8)|(c)))
> 
> per above.   I don't see this change called out.

It looks like an accidental revert.  Good catch :-)

commit e97929e20b2f52e6cfc046c1302324d1b24d95e3
Author: Carl Love 
Date:   Wed Mar 25 18:33:37 2020 -0500

> > +  /* The -mti-vector-ops option requires ISA 3.1 support and -maltivec for
> > + the 128-bit instructions.  Currently, TARGET_POWER10 is sufficient to
> > + enable it by default.  */
> > +  if (TARGET_POWER10)
> > +{
> > +  if (rs6000_isa_flags_explicit & OPTION_MASK_VSX)
> > +   warning(0, ("%<-mno-altivec%> disables -mti-vector-ops (128-bit integer 
> > vector register operations)."));
> > +  else
> > +   rs6000_isa_flags |= OPTION_MASK_TI_VECTOR_OPS;
> > +}
> 
> It seems odd here that -maltivec is explicitly called out here.  That
> should be default on for quite a while at this point.

And the actual check it for -mvsx anyway?  Not sure I follow what this
does at all.

> > @@ -2305,6 +2306,7 @@ extern int frame_pointer_needed;
> >  #define RS6000_BTM_P8_VECTOR   MASK_P8_VECTOR  /* ISA 2.07 vector.  */
> >  #define RS6000_BTM_P9_VECTOR   MASK_P9_VECTOR  /* ISA 3.0 vector.  */
> >  #define RS6000_BTM_P9_MISC MASK_P9_MISC/* ISA 3.0 misc. non-vector */
> > +#define RS6000_BTM_P10_128BIT   MASK_POWER10/* ISA P10 vector.  */
> 
> Should comment be 128-bit something?  (not just P10 vector).

Yeah, or it should be called P10_VECTOR (and it is called ISA 3.1).

> > @@ -2436,6 +2438,7 @@ enum rs6000_builtin_type_index
> >RS6000_BTI_bool_V8HI,  /* __vector __bool short */
> >RS6000_BTI_bool_V4SI,  /* __vector __bool int */
> >RS6000_BTI_bool_V2DI,  /* __vector __bool long */
> > +  RS6000_BTI_bool_V1TI,  /* __vector __bool long */
> 
> Fix comment?

I wonder if the V2DI is correct even (should be "long long"?).

> > +mti-vector-ops
> > +Target Report Mask(TI_VECTOR_OPS) Var(rs6000_isa_flags)
> > +Use integer 128-bit instructions for a future architecture.
> 
> 'future' can probably be adjusted.

Yes :-)

> > \ No newline at end of file
> 
> diff error?

No, the file really should have a newline at the end.  Not all editors
enforce that by default :-(

> > +(define_expand "vector_eqv1ti"
> > +  [(set (match_operand:V1TI 0 "vlogical_operand")
> > +   (eq:V1TI (match_operand:V1TI 1 "vlogical_operand")
> > +(match_operand:V1TI 2 "vlogical_operand")))]
> > +  "TARGET_TI_VECTOR_OPS"
> > +  "")

All the rest of this is in rs6000.md, won't "eqvv1ti3" work already?

> Since it's on all of the clauses, Maybe adjust the dg-require to
> include ppc_native_128bit for the whole test, unless there is more to
> follow.

Good plan :-)

Thanks for all the comments Will!  Carl, could you fix things and resend
please?  It's a rather big patch, we'll have to do it in stages :-/


Segher


Re: [PATCH, rs6000] Add non-relative jump table support on Power Linux

2020-08-19 Thread Segher Boessenkool
Hi!

Sorry this took so long to review.  "I lost track of this patch", what
can I say :-/

On Fri, Aug 14, 2020 at 03:31:05PM +0800, HAO CHEN GUI wrote:
> This patch adds non-relative jump table support on Power Linux. It 
> implements ASM_OUTPUT_ADDR_VEC_ELT and adds four new expansions for 
> non-relative jump table in rs6000.md. It also defines a rs6000 
> option(mrelative-jumptables). If it's set to false, the non-relative 
> jump table is picked up even PIC flag is set.

> diff --git a/gcc/config/rs6000/linux64.h b/gcc/config/rs6000/linux64.h
> index 34776c8421e..626f2201d96 100644
> --- a/gcc/config/rs6000/linux64.h
> +++ b/gcc/config/rs6000/linux64.h
> @@ -324,7 +324,10 @@ extern int dot_symbols;
>  
>  /* Indicate that jump tables go in the text section.  */
>  #undef  JUMP_TABLES_IN_TEXT_SECTION
> -#define JUMP_TABLES_IN_TEXT_SECTION TARGET_64BIT
> +#define JUMP_TABLES_IN_TEXT_SECTION rs6000_relative_jumptables
> +
> +#undef JUMP_TABLES_IN_RELRO
> +#define JUMP_TABLES_IN_RELRO !rs6000_relative_jumptables

The third choice is .rodata.  .data.rel.ro can be considered to be a
sub-case of that, but not the other way around.

> +extern void rs6000_output_addr_vec_elt(FILE *, int);

(space before opening paren)

> +/* Implement TARGET_ASM_GENERATE_PIC_ADDR_DIFF_VEC. Return true if 
> rs6000_relative_jumptables is set. */

The maximum line length is 80 characters.  Two spaces after a full stop.

> +/* Enable non-relative jump tables for linux ELF. */
> +#ifdef TARGET_ELF
> +#if !TARGET_AIX 
> +#undef rs6000_relative_jumptables
> +#define rs6000_relative_jumptables 0
> +#endif
> +#endif

AIX never is ELF, but many other things *are* ELF.  Do all of those
handle .data.rel.ro, including do they have enough OS support for it?

This also changes the default for ELF systems?  Let's not do that (in
this patch, that is).

> +/* Specify the machine mode that this machine uses
> +   for the index in the tablejump instruction.  */
> +#define CASE_VECTOR_MODE \
> +  (TARGET_32BIT || rs6000_relative_jumptables ? SImode : DImode)

#define CASE_VECTOR_MODE (rs6000_relative_jumptables ? SImode : Pmode)

> +/* This is how to output an element of a case-vector 
> +   that is non-relative. */

(no space at end of line; two spaces after full stop, also before */)

> +#define ASM_OUTPUT_ADDR_VEC_ELT(FILE, VALUE) \
> +  rs6000_output_addr_vec_elt((FILE), (VALUE))

Space before first ( please (on the second line, on the first you
cannot -- macro definitions are the only exception).

> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 0aa5265d199..2f28da0ea1f 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -12669,18 +12669,32 @@
>if (rs6000_speculate_indirect_jumps)
>  {
>if (TARGET_32BIT)
> - emit_jump_insn (gen_tablejumpsi (operands[0], operands[1]));
> -  else
> +{
> +  if (rs6000_relative_jumptables)
> +emit_jump_insn (gen_tablejumpsi (operands[0], operands[1]));
> +  else
> + emit_jump_insn (gen_nonrelative_tablejumpsi (operands[0], 
> operands[1]));

Please fix the indent while you're at it (tabs shouldn't follow spaces).
Oh, and line too long.

> +}
> +  else if (rs6000_relative_jumptables)
>   emit_jump_insn (gen_tablejumpdi (operands[0], operands[1]));
> +  else 
> +emit_jump_insn (gen_nonrelative_tablejumpdi (operands[0], 
> operands[1]));

Please put the "relative" case on the outside, and the "32 or 64" if on
the inside.

> +(define_expand "nonrelative_tablejumpsi"

Maybe call it "absolute_tablejumpsi"?  Not sure :-)

> +   "TARGET_32BIT && rs6000_speculate_indirect_jumps && 
> !rs6000_relative_jumptables"

Line too long (more follow, I won't point all out, please check :-) )

> + {
> +  operands[2] = gen_reg_rtx (SImode);
> + })

No spaces before { and } here, those are flush left like in outer braces
in a C function are.

> +(define_expand "nonrelative_tablejumpsi_nospec"
> +  [(set (match_dup 3)
> +(match_operand:SI 0 "gpc_reg_operand" "r"))
> +   (parallel [(set (pc)
> +   (match_dup 3))
> +  (use (label_ref (match_operand 1)))
> +  (clobber (match_operand 2))])]

Please use a leading tab instead of every 8 leading spaces.


I'll try to be quicker at reviewing iterations of this -- there is quite
some way to go, without me slowing things down!


Segher


Re: [PATCH 2/5] C front end support to detect out-of-bounds accesses to array parameters

2020-08-19 Thread Joseph Myers
On Wed, 19 Aug 2020, Martin Sebor via Gcc-patches wrote:

> > I think you need a while loop there, not just an if, to account for the
> > case of multiple consecutive cdk_attrs.  At least the GNU attribute syntax
> > 
> > direct-declarator:
> > [...]
> >   ( gnu-attributes[opt] declarator )
> > 
> > should produce multiple consecutive cdk_attrs for each level of
> > parentheses with attributes inside.
> 
> I had considered a loop but couldn't find a way to trigger what you
> describe (or a test in the testsuite that would do it) so I didn't
> use one.  I saw loops like that in other places but I couldn't get
> even those to uncover such a test case.  Here's what I tried:
> 
>   #define A(N) __attribute__ ((aligned (N), may_alias))
>   int n;
>   void f (int (* A (2) A (4) (* A (2) A (4) (* A (2) A (4) [n])[n])));
> 
> Sequences of consecutive attributes are all chained together.
> 
> I've added the loop here but I have no test for it.  It would be
> good to add one if it really is needed.

The sort of thing I'm thinking of would be, where A is some attribute:

void f (int (A (A (A arg;

(that example doesn't involve an array, but it illustrates the syntax I'd 
expect to produce multiple consecutive cdk_attrs).

-- 
Joseph S. Myers
jos...@codesourcery.com


RE: [PATCH] rs6000, restrict bfloat convert intrinsic to Power 10. Fix BU_P10V macro definitions.

2020-08-19 Thread Carl Love via Gcc-patches
On Wed, 2020-08-19 at 15:16 -0500, Segher Boessenkool wrote:
> On Wed, Aug 19, 2020 at 02:19:12PM -0500, Peter Bergner wrote:
> > On 8/14/20 7:42 PM, Segher Boessenkool wrote:
> > > I think your current code is fine; I hadn't considered Bill's
> > > upcoming
> > > rewrite.  It is more important to make that go smoother than to
> > > fix some
> > > aesthetics right now.
> > > 
> > > Please check if the names for the builtins match the (future)
> > > documentation, and then, approved for trunk.  Thank you!
> > 
> > This is also a bug in GCC 10, so this should be backported too.
> > 
> > Segher, is this ok for Carl to backport to GCC 10 after it has sat
> > on
> > trunk for a few days?
> 
> Yes.  Thanks guys.


The patch was committed today, I left issue 935 open for the moment.  I
will work on a backport patch for GCC 10 and then as long as nothing
bad happens I will push the patch early next week.

   Carl



Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-19 Thread Qing Zhao via Gcc-patches



> On Aug 19, 2020, at 5:57 PM, Segher Boessenkool  
> wrote:
> 
> Hi!
> 
> On Wed, Aug 19, 2020 at 03:05:36PM -0500, Qing Zhao wrote:
>> So, cleaning the scratch registers that are used to pass parameters at 
>> return instructions should 
>> effectively mitigate ROP attack. 
> 
> But that is *very* expensive, in general.  Instead of doing just a
> return instruction (which effectively costs 0 cycles, and is just one
> insn), you now have to zero all call-clobbered register at every return
> (typically many returns per function, and you are talking 10+ registers
> even if only considering the simple integer registers).

Yes, the run-time overhead and also the code-size overhead are major concerns. 
We should minimize the overhead
as much as we can during implementation. However, such overhead cannot be 
completely avoided for the security purpose. 

In order to reduce the overhead for the ROP mitigation, I added 3 new values 
for -fzero-call-used-regs=used-arg-grp|used-arg|arg

For “used-arg-grp”, we only zero the integer registers that are used in the 
routine and can pass parameters; this should provide ROP mitigation
with the minimum overhead. 

For “used-arg”, in addition to “used-arg-grp”, the other registers (for 
example, FP registers) that can pass parameters will be zeroed. But I am not
very sure whether this option is really needed in practical. 

For “arg”, in addition to “used-arg”, all registers that pass parameters will 
be zeroed. Same as “used-arg”, I am not very sure whether we need this option
Or not. 

> 
> Numbers on how expensive this is (for what arch, in code size and in
> execution time) would be useful.  If it is so expensive that no one will
> use it, it helps security at most none at all :-(

CLEAR Linux project has been using a similar patch since GCC 8, the option it 
used is an equivalent to -fzero-call-used-regs=used-gpr.

-fzero-call-used-regs=used-arg-gpr in this new proposal will have smaller 
overhead than the one currently being used in CLEAR Linux.

Victor, do you have any data on the overhead of the option that currently is 
used by CLEAR project?

> 
>> Q1. Which registers should be set to zeros at the return of the function?
>> A. the caller-saved, i.e, call-used, or call-clobbered registers.
>>   For ROP mitigation purpose, only the call-used registers that pass
>> parameters need to be zeroed. 
>>   For register erasure purpose, all the call-used registers might need to
>> be zeroed. we can provide multiple levels to user for controling the runtime
>> overhead. 
> 
> The call-clobbered regs are the only ones you *can* touch.  That does
> not mean you should clear them all (it doesn't help much at all in some
> cases).  Only the backend knows.

I think that for ROP mitigation purpose, we only need to clear the call-used 
(i.e, call-clobbered) registers that are used in the current routine and
can pass parameters. 

But for preventing information leak from callee registers, we might need to 
clear all the call-used registers at return.


> 
>>So, from both run-time performance and code-size aspects, setting the
>> registers to zero is a better approach. 
> 
> From a security perspective, this isn't clear though.  But that is a lot
> of extra research ;-)

There has been quite some discussion on this topic at

https://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html 


From those old discussion, we can see that zero value should be good enough for 
the security purpose (though it’s not perfect).

Qing

> 
> 
> Segher



Re: [PATCH] bb-reorder: Remove a misfiring micro-optimization (PR96475)

2020-08-19 Thread Segher Boessenkool
Hi!

On Wed, Aug 19, 2020 at 07:18:02PM +0100, Richard Sandiford wrote:
> > I have never seen the second case misfiring in practice, only the first
> > one!
> 
> Shucks, I guessed the wrong way round :-)
> 
> I'd argue that the first check isn't a micro-optimisation though.
> It's testing whether the duplication really is a duplication
> (rather than a move).

Yes, by removing it the code is duplicated like in all other cases, but
then the original is removed later (as dead code).  With the check in
place we have it appended to the predecessor block (by insn chain
grafting instead of copying, done by cleanup_cfg).  That is what I call
a micro-optimisation.  Of course it isn't correct in all cases (maybe
those cases didn't happen before, maybe no one noticed; in any case,
this patch makes it more robust, while still keeping it O(N)).

> > The original patch is at
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551560.html
> > btw.  I'll prepare a testcase for that (it doesn't trigger with recent
> > GCC versions though :-/ )
> 
> Hmm, OK.  If the first check is the important one, then I think it'd
> be better to make the max_size stuff conditional on !single_pred_p
> rather than drop the test entirely.

Okay, I'll look at that.  Thanks,


Segher


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-19 Thread Segher Boessenkool
Hi!

On Wed, Aug 19, 2020 at 03:05:36PM -0500, Qing Zhao wrote:
> So, cleaning the scratch registers that are used to pass parameters at return 
> instructions should 
> effectively mitigate ROP attack. 

But that is *very* expensive, in general.  Instead of doing just a
return instruction (which effectively costs 0 cycles, and is just one
insn), you now have to zero all call-clobbered register at every return
(typically many returns per function, and you are talking 10+ registers
even if only considering the simple integer registers).

Numbers on how expensive this is (for what arch, in code size and in
execution time) would be useful.  If it is so expensive that no one will
use it, it helps security at most none at all :-(

> Q1. Which registers should be set to zeros at the return of the function?
> A. the caller-saved, i.e, call-used, or call-clobbered registers.
>For ROP mitigation purpose, only the call-used registers that pass
> parameters need to be zeroed. 
>For register erasure purpose, all the call-used registers might need to
> be zeroed. we can provide multiple levels to user for controling the runtime
> overhead. 

The call-clobbered regs are the only ones you *can* touch.  That does
not mean you should clear them all (it doesn't help much at all in some
cases).  Only the backend knows.

> So, from both run-time performance and code-size aspects, setting the
> registers to zero is a better approach. 

>From a security perspective, this isn't clear though.  But that is a lot
of extra research ;-)


Segher


Re: [PATCH 2/5] C front end support to detect out-of-bounds accesses to array parameters

2020-08-19 Thread Martin Sebor via Gcc-patches

On 8/17/20 4:09 PM, Joseph Myers wrote:

On Thu, 13 Aug 2020, Martin Sebor via Gcc-patches wrote:


* Maybe cdk_pointer is followed by cdk_attrs before cdk_id.  In this case
the code won't return.


I think I see the problem you're pointing out (I just don't see how
to trigger it or test that it doesn't happen).  If the tweak in
the attached update doesn't fix it a test case would be helpful.


I think you need a while loop there, not just an if, to account for the
case of multiple consecutive cdk_attrs.  At least the GNU attribute syntax

direct-declarator:
[...]
  ( gnu-attributes[opt] declarator )

should produce multiple consecutive cdk_attrs for each level of
parentheses with attributes inside.


I had considered a loop but couldn't find a way to trigger what you
describe (or a test in the testsuite that would do it) so I didn't
use one.  I saw loops like that in other places but I couldn't get
even those to uncover such a test case.  Here's what I tried:

  #define A(N) __attribute__ ((aligned (N), may_alias))
  int n;
  void f (int (* A (2) A (4) (* A (2) A (4) (* A (2) A (4) [n])[n])));

Sequences of consecutive attributes are all chained together.

I've added the loop here but I have no test for it.  It would be
good to add one if it really is needed.




* Maybe the code is correct to continue because we're in the case of an
array of pointers (cdk_array follows).  But as I understand it, the intent
is to set up an "arg spec" that describes only the (multidimensional)
array that is the parameter itself - not any array pointed to.  And it
looks to me like, in the case of an array of pointers to arrays, both sets
of array bounds would end up in the spec constructed.


Ideally, I'd like to check even pointers to arrays and so they should
be recorded somewhere.  The middle end code doesn't do any checking
of those yet for out-of-bounds accesses.  It wasn't a goal for
the first iteration so I've tweaked the code to avoid recording them.


Could you expand the comment on get_parm_array_spec to specify exactly
what you think the function should be putting in the returned attribute,
in what order, in cases where there are array declarators (constant,
empty, [*] and VLA) intermixed with other kinds of declarators and the
type from the type specifiers may or may not be an array type itself?
That will provide a basis for subsequent rounds of review of whether the
function is actually behaving as expected.


Done.



As far as I can see, the logic

+  if (TREE_CODE (nelts) == INTEGER_CST)
+   {
+ /* Skip all constant bounds except the most significant one.
+The interior ones are included in the array type.  */
+ if (next && (next->kind == cdk_array || next->kind == cdk_pointer))
+   continue;

will skip constant bounds in an array that's the target of a pointer
declarator, but not any other kind of bounds.  Is that what you intend -
that all the other kind of bounds in pointed-to arrays will be recorded in
this string?


The immediate goal (for the front end) is to detect differences
in the form ([] vs [N]) or value of the most significant bound
in parameters of array types (before they decay to pointers),
and differences in the form ([*] vs [n]) or (the presumed) value
(e.g., [n] vs [n + 1]) of VLA bounds.

So I need to encode every variable bound (specified or unspecified).
For ordinary arrays I want to encode just the form of the most
significant bound.




Then, the code

+  if (pd->kind == cdk_id)
+   {
+ /* Extract the upper bound from a parameter of an array type.  */

also seems misplaced.  If the type specifiers for the parameter are a
typedef for an array type, that array type should be processed *before*
the declarator to get the correct semantics (as if the bounds from those
type specifiers were given in the declarator), not at the end which gets
that type out of order with respect to array declarators.  (Processing
before the declarator also means clearing the results of that processing
if a pointer declarator is encountered at any point, because in that case
the array type in the type specifiers is irrelevant.)


I'm not sure I follow you here.  Can you show me what you mean on
a piece of code?  This test case (which IIUC does what you described)
works as expected:

$ cat q.c && gcc -O2 -S -Wall q.c
typedef int A[7][9];

void f (A[3][5]);


So this is equivalent to A[3][5][7][9].  The c_declarator structures have
the one for the [3] (the top-level bound) inside the one for the [5].
The [5] bound is skipped by the "Skip all constant bounds except the most
significant one." logic.  When the [3] bound is reached, the "break;" at
the end of that processing means the "Extract the upper bound from a
parameter of an array type." never gets executed.  Try replacing the [3]
bound by a VLA bound.  As I read the code, it will end up generating a
spec string that records first the VLA, then the [7], when it should be
first the 9 

Re: [PATCH 3/4] libstdc++: Add floating-point std::to_chars implementation

2020-08-19 Thread Patrick Palka via Gcc-patches
On Wed, 22 Jul 2020, Patrick Palka wrote:

> On Mon, 20 Jul 2020, Patrick Palka wrote:
> 
> > On Mon, 20 Jul 2020, Jonathan Wakely wrote:
> > 
> > > On 20/07/20 08:53 -0400, Patrick Palka via Libstdc++ wrote:
> > > > On Mon, 20 Jul 2020, Jonathan Wakely wrote:
> > > > 
> > > > > On 19/07/20 23:37 -0400, Patrick Palka via Libstdc++ wrote:
> > > > > > On Fri, 17 Jul 2020, Patrick Palka wrote:
> > > > > >
> > > > > > > On Fri, 17 Jul 2020, Patrick Palka wrote:
> > > > > > >
> > > > > > > > On Wed, 15 Jul 2020, Patrick Palka wrote:
> > > > > > > >
> > > > > > > > > On Tue, 14 Jul 2020, Patrick Palka wrote:
> > > > > > > > >
> > > > > > > > > > This implements the floating-point std::to_chars overloads 
> > > > > > > > > > for
> > > > > > > float,
> > > > > > > > > > double and long double.  We use the Ryu library to compute 
> > > > > > > > > > the
> > > > > > > shortest
> > > > > > > > > > round-trippable fixed and scientific forms of a number for
> > > > > float,
> > > > > > > double
> > > > > > > > > > and long double.  We also use Ryu for performing fixed and
> > > > > > > scientific
> > > > > > > > > > formatting of float and double. For formatting long double 
> > > > > > > > > > with
> > > > > an
> > > > > > > > > > explicit precision argument we use a printf fallback.
> > > > > Hexadecimal
> > > > > > > > > > formatting for float, double and long double is implemented 
> > > > > > > > > > from
> > > > > > > > > > scratch.
> > > > > > > > > >
> > > > > > > > > > The supported long double binary formats are float64 (same 
> > > > > > > > > > as
> > > > > > > double),
> > > > > > > > > > float80 (x86 extended precision), float128 and ibm128.
> > > > > > > > > >
> > > > > > > > > > Much of the complexity of the implementation is in 
> > > > > > > > > > computing the
> > > > > > > exact
> > > > > > > > > > output length before handing it off to Ryu (which doesn't do
> > > > > bounds
> > > > > > > > > > checking).  In some cases it's hard to compute the output 
> > > > > > > > > > length
> > > > > > > before
> > > > > > > > > > the fact, so in these cases we instead compute an upper 
> > > > > > > > > > bound on
> > > > > the
> > > > > > > > > > output length and use a sufficiently-sized intermediate 
> > > > > > > > > > buffer
> > > > > (if
> > > > > > > the
> > > > > > > > > > output range is smaller than the upper bound).
> > > > > > > > > >
> > > > > > > > > > Another source of complexity is in the 
> > > > > > > > > > general-with-precision
> > > > > > > formatting
> > > > > > > > > > mode, where we need to do zero-trimming of the string 
> > > > > > > > > > returned
> > > > > by
> > > > > > > Ryu, and
> > > > > > > > > > where we also take care to avoid having to format the 
> > > > > > > > > > string a
> > > > > > > second
> > > > > > > > > > time when the general formatting mode resolves to fixed.
> > > > > > > > > >
> > > > > > > > > > Tested on x86_64-pc-linux-gnu, aarch64-unknown-linux-gnu,
> > > > > > > > > > s390x-ibm-linux-gnu, and powerpc64-unknown-linux-gnu.
> > > > > > > > > >
> > > > > > > > > > libstdc++-v3/ChangeLog:
> > > > > > > > > >
> > > > > > > > > > * acinclude.m4 (libtool_VERSION): Bump to 6:29:0.
> > > > > > > > > > * config/abi/pre/gnu.ver: Add new exports.
> > > > > > > > > > * configure: Regenerate.
> > > > > > > > > > * include/std/charconv (to_chars): Declare the 
> > > > > > > > > > floating-point
> > > > > > > > > > overloads for float, double and long double.
> > > > > > > > > > * src/c++17/Makefile.am (sources): Add 
> > > > > > > > > > floating_to_chars.cc.
> > > > > > > > > > * src/c++17/Makefile.in: Regenerate.
> > > > > > > > > > * src/c++17/floating_to_chars.cc: New file.
> > > > > > > > > > * testsuite/20_util/to_chars/long_double.cc: New test.
> > > > > > > > > > * testsuite/util/testsuite_abi.cc: Add new symbol 
> > > > > > > > > > version.
> > > > > > > > >
> > > > > > > > > Here is v2 of this patch, which fixes a build failure on i386 
> > > > > > > > > due
> > > > > to
> > > > > > > > > __int128 being unavailable, by refactoring the long double 
> > > > > > > > > binary
> > > > > > > format
> > > > > > > > > selection to avoid referring to __int128 when it doesn't 
> > > > > > > > > exist.
> > > > > The
> > > > > > > > > patch also makes the hex formatting for 80-bit long double use
> > > > > > > uint64_t
> > > > > > > > > instead of __int128 since the mantissa has exactly 64 bits in 
> > > > > > > > > this
> > > > > > > case.
> > > > > > > >
> > > > > > > > Here's v3 which just makes some minor stylistic adjustments, and
> > > > > most
> > > > > > > > notably replaces the use of _GLIBCXX_DEBUG with 
> > > > > > > > _GLIBCXX_ASSERTIONS
> > > > > > > > since we just want to enable __glibcxx_assert and not all of 
> > > > > > > > debug
> > > > > mode.
> > > > > > >
> > > > > > > Here's v4, which should now correctly support using  
> > > > > > > with
> > > > > > > -mlong-double-64 on targets with a large default long 

[pushed] c++: Check satisfaction before non-dep convs. [CWG2369]

2020-08-19 Thread Jason Merrill via Gcc-patches
It's very hard to use concepts to protect a template from hard errors due to
unwanted instantiation if constraints aren't checked until after doing all
substitution and checking of non-dependent conversions.

It was pretty straightforward to insert the satisfaction check into the
logic, but I needed to make the 3-parameter version of
satisfy_declaration_constraints call push_tinst_level like the 2-parameter
version already does.  For simplicity, I also made it add any needed outer
template arguments from the TEMPLATE_DECL to the args.

The testsuite changes are mostly because this change causes unsatisfaction
to cause deduction to fail rather than reject the candidate later in
overload resolution.

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

DR 2369
* cp-tree.h (push_tinst_level, push_tinst_level_loc): Declare.
* constraint.cc (satisfy_declaration_constraints):
Use add_outermost_template_args and push_tinst_level.
* pt.c (add_outermost_template_args): Handle getting
a TEMPLATE_DECL as the first argument.
(push_tinst_level, push_tinst_level_loc): No longer static.
(fn_type_unification): Check satisfaction before non-dependent
conversions.

gcc/testsuite/ChangeLog:

DR 2369
* g++.dg/concepts/diagnostic10.C: Adjust expexcted errors.
* g++.dg/concepts/diagnostic13.C: Adjust expexcted errors.
* g++.dg/concepts/diagnostic2.C: Adjust expexcted errors.
* g++.dg/concepts/diagnostic3.C: Adjust expexcted errors.
* g++.dg/concepts/diagnostic4.C: Adjust expexcted errors.
* g++.dg/concepts/diagnostic5.C: Adjust expexcted errors.
* g++.dg/concepts/diagnostic9.C: Adjust expexcted errors.
* g++.dg/concepts/expression2.C: Adjust expexcted errors.
* g++.dg/concepts/fn5.C: Adjust expexcted errors.
* g++.dg/concepts/placeholder5.C: Adjust expexcted errors.
* g++.dg/concepts/pr67595.C: Adjust expexcted errors.
* g++.dg/cpp2a/concepts-pr78752-2.C: Adjust expexcted errors.
* g++.dg/cpp2a/concepts-pr84140.C: Adjust expexcted errors.
* g++.dg/cpp2a/concepts-recursive-sat3.C: Adjust expexcted errors.
* g++.dg/cpp2a/concepts-requires18.C: Adjust expexcted errors.
* g++.dg/cpp2a/concepts-requires19.C: Adjust expexcted errors.
* g++.dg/cpp2a/concepts3.C: Adjust expexcted errors.
* g++.dg/cpp2a/concepts-nondep1.C: New test.
* g++.dg/cpp2a/concepts-nondep1a.C: New test.
---
 gcc/cp/cp-tree.h  |  2 ++
 gcc/cp/constraint.cc  | 12 +--
 gcc/cp/pt.c   | 35 +++
 gcc/testsuite/g++.dg/concepts/diagnostic10.C  |  2 +-
 gcc/testsuite/g++.dg/concepts/diagnostic13.C  |  2 +-
 gcc/testsuite/g++.dg/concepts/diagnostic2.C   |  2 +-
 gcc/testsuite/g++.dg/concepts/diagnostic3.C   |  4 +--
 gcc/testsuite/g++.dg/concepts/diagnostic4.C   |  2 +-
 gcc/testsuite/g++.dg/concepts/diagnostic5.C   |  2 +-
 gcc/testsuite/g++.dg/concepts/diagnostic9.C   |  3 +-
 gcc/testsuite/g++.dg/concepts/expression2.C   |  2 +-
 gcc/testsuite/g++.dg/concepts/fn5.C   |  4 +--
 gcc/testsuite/g++.dg/concepts/placeholder5.C  |  4 +--
 gcc/testsuite/g++.dg/concepts/pr67595.C   |  2 +-
 gcc/testsuite/g++.dg/cpp2a/concepts-nondep1.C | 19 ++
 .../g++.dg/cpp2a/concepts-nondep1a.C  | 20 +++
 .../g++.dg/cpp2a/concepts-pr78752-2.C |  2 +-
 gcc/testsuite/g++.dg/cpp2a/concepts-pr84140.C |  1 -
 .../g++.dg/cpp2a/concepts-recursive-sat3.C|  2 +-
 .../g++.dg/cpp2a/concepts-requires18.C|  4 +--
 .../g++.dg/cpp2a/concepts-requires19.C| 12 +++
 gcc/testsuite/g++.dg/cpp2a/concepts3.C|  6 ++--
 22 files changed, 106 insertions(+), 38 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-nondep1.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-nondep1a.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 44531cd86dc..3f3717a6bb5 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6981,7 +6981,9 @@ extern bool template_parm_object_p
(const_tree);
 extern tree tparm_object_argument  (tree);
 extern bool explicit_class_specialization_p (tree);
 extern bool push_tinst_level(tree);
+extern bool push_tinst_level(tree, tree);
 extern bool push_tinst_level_loc(tree, location_t);
+extern bool push_tinst_level_loc(tree, tree, location_t);
 extern void pop_tinst_level (void);
 extern struct tinst_level *outermost_tinst_level(void);
 extern void init_template_processing   (void);
diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 48d52ec5b7a..7a2f3b9fde0 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -2814,16 +2814,22 @@ satisfy_declaration_constraints (tree t, tree args, 
subst_info info)
   

Re: [EXTERNAL] Re: [PATCH] rs6000, restrict bfloat convert intrinsic to Power 10. Fix BU_P10V macro definitions.

2020-08-19 Thread Segher Boessenkool
On Wed, Aug 19, 2020 at 02:19:12PM -0500, Peter Bergner wrote:
> On 8/14/20 7:42 PM, Segher Boessenkool wrote:
> > I think your current code is fine; I hadn't considered Bill's upcoming
> > rewrite.  It is more important to make that go smoother than to fix some
> > aesthetics right now.
> > 
> > Please check if the names for the builtins match the (future)
> > documentation, and then, approved for trunk.  Thank you!
> 
> This is also a bug in GCC 10, so this should be backported too.
> 
> Segher, is this ok for Carl to backport to GCC 10 after it has sat on
> trunk for a few days?

Yes.  Thanks guys.


Segher


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-19 Thread Qing Zhao via Gcc-patches
Hi,

Based on all the previous discussion and more extensive study on ROP and its 
mitigation techniques these days, I came up with the following
High-level proposal as requested, please take a look and let me know what I 
should change in this high-level design:

> On Aug 6, 2020, at 6:37 PM, Segher Boessenkool  
> wrote:
> 
> Anyway.  This all needs a good description in the user manual (is there?
> I couldn't find any), explaining what exactly it does (user-visible),
> and when you would want to use it, etc.  We need that before we can
> review anything else in this patch sanely.
> 
> 
> Segher

zeroing call-used registers for security purpose

8/19/2020
Qing Zhao
=

**Motivation:
There are two purposes of this patch:

1. ROP mitigation:

ROP (Return-oriented programming, 
https://en.wikipedia.org/wiki/Return-oriented_programming) is 
one of the most popular code reuse attack technique, which executes gadget 
chains to perform malicious tasks.
A gadget is a carefully chosen machine instruction sequence that is already 
present in the machines' memory. 
Each gadget typically ends in a return instruction and is located in a 
subroutine within the existing program 
and/or shared library code.

There are two variations that use gadgets that end with indirect call (COP, 
Call Oriented Programming )
 and jump instruction (JOP, Jump-Oriented Programming). However, performing ROP 
without return 
instructions in reality is difficult because the gadgets of COP and JOP that 
can form a completed chain 
are almost nonexistent. 

As a result, gadgets based on return instructions remain the most popular.

One important feature of ROP attack is (Clean the Scratch Registers:A Way to 
Mitigate Return-Oriented
Programming Attacks https://ieeexplore.ieee.org/document/8445132):
the destination of using gadget chains usually call system functions to perform 
malicious behaviour,
on many of the mordern architectures, the registers would be used to pass 
parameters for those 
system functions.

So, cleaning the scratch registers that are used to pass parameters at return 
instructions should 
effectively mitigate ROP attack. 

2. Register Erasure:

In the SECURE project and GCC (https://gcc.gnu.org/wiki/cauldron2018#secure)

One of the well known security techniques is stack and register erasure. 
Ensuring that on return from a function, no data is left lying on the stack or 
in registers.

As mentioned in the slides 
(https://gmarkall.files.wordpress.com/2018/09/secure_and_gcc.pdf), 
there is a seperate project that tried to resolve the stack erasure problem. 
and the patch for
 stack erasure had been ready to submit. That specific patch does not handle 
register erasure problem. 

So, we will also address the register erasure problem with this patch along 
with the ROP mitigation. 

** Questions and Answers:

Q1. Which registers should be set to zeros at the return of the function?
A. the caller-saved, i.e, call-used, or call-clobbered registers.
   For ROP mitigation purpose, only the call-used registers that pass
parameters need to be zeroed. 
   For register erasure purpose, all the call-used registers might need to
be zeroed. we can provide multiple levels to user for controling the runtime
overhead. 

Q2. Why zeroing the registers other than randomalize them?
A. (From Kees Cook)
In the performance analysis I looked at a while ago, doing the
register-self-xor is extremely fast to run (IIRC the cycle counts on x86
were absolutely tiny), and it's smaller for code size which minimized
the overall image footprint.
a fixed value is a significantly better defensive position to take
for ROP. And specifically zero _tends_ to be the safest choice as it's
less "useful" to be used as a size, index, or pointer. And, no, it is
not perfect, but nothing can be if we're dealing with trying to defend
against arbitrary ROP gadget finding (or uninitialized stack contents,
where the same argument for "zero is best" also holds[1]).

-Kees
([1]https://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html)

So, from both run-time performance and code-size aspects, setting the
registers to zero is a better approach. 

** Proposal:

We will provide a new feature into GCC for the above security purposes. 

Add -fzero-call-used-regs=[skip|rop-mitigation|used-gpr|all-gpr|used|all] 
command-line option
and 
zero_call_used_regs("skip|used-arg-gpr|used-arg|arg|used-gpr|all-gpr|used|all") 
function attribues:

1. -mzero-call-used-regs=skip and zero_call_used_regs("skip")

Don't zero call-used registers upon function return. This is the default 
behavior.

2. -mzero-call-used-regs=used-arg-gpr and 
zero_call_used_regs("used-arg-gpr")

Zero used call-used general purpose registers that are used to pass 
parameters upon function return.

3. -mzero-call-used-regs=used-arg and zero_call_used_regs("used-arg")

Zero used call-used registers that are used to pass parameters 

[committed] [OG10] Re: Re: [Patch] [OpenMP, Fortran] Add structure/derived-type element mapping

2020-08-19 Thread Kwok Cheung Yeung

If I got my tracking right, the og10 commit
4677091db1aa9d2a52e4839812bd73f47cc5e421 "[OpenMP, Fortran] Add
structure/derived-type element mapping" regresses:

[-PASS:-]{+FAIL:+} gfortran.dg/goacc/pr70828.f90   -O   scan-tree-dump-times 
gimple "omp target oacc_data map\\(tofrom:MEM\\[\\(c_char \\*\\)_[0-9]+\\] 
\\[len: _[0-9]+\\]\\) map\\(alloc:data \\[pointer assign, b
ias: _[0-9]+\\]\\)" 1
[-PASS:-]{+FAIL:+} gfortran.dg/goacc/pr70828.f90   -O   scan-tree-dump-times 
gimple "omp target oacc_parallel map\\(force_present:MEM\\[\\(c_char 
\\*\\)D\\.[0-9]+\\] \\[len: D\\.[0-9]+\\]\\) map\\(alloc:data \\[
pointer assign, bias: D\\.[0-9]+\\]\\)" 1
PASS: gfortran.dg/goacc/pr70828.f90   -O  (test for excess errors)


The mapping in the Gimple dump has changed from:

_6 = parm.0.data;
...
#pragma omp target oacc_data map(tofrom:MEM[(c_char *)_6] [len: _5]) 
map(alloc:data [pointer assign, bias: _10])


to:

_6 = parm.0.data;
_7 = (integer(kind=8)) _6
data.2_8 = (integer(kind=8)) 
_9 = _7 - data.2_8;
_10 = _9 / 4;
...
#pragma omp target oacc_data map(tofrom:data[_10] [len: _5]) map(alloc:data 
[pointer assign, bias: _14])


i.e. It is now mapping the array data starting from the offset that parm.0.data 
is at relative to data, rather than from the memory at parm.0.data directly.


This is due to the change in array handling in this part of the patch:

   if (TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE)
-   ptr2 = build_fold_addr_expr (decl);
+   {
+ tree offset;
+ ptr2 = build_fold_addr_expr (decl);
+ offset = fold_build2 (MINUS_EXPR, ptrdiff_type_node, ptr,
+   fold_convert (ptrdiff_type_node, ptr2));
+ offset = build2 (TRUNC_DIV_EXPR, ptrdiff_type_node,
+  offset, fold_convert (ptrdiff_type_node, elemsz));
+ offset = build4_loc (input_location, ARRAY_REF,
+  TREE_TYPE (TREE_TYPE (decl)),
+  decl, offset, NULL_TREE, NULL_TREE);
+ OMP_CLAUSE_DECL (node) = offset;
+   }

As this does not cause a change in program behaviour, I believe it is enough to 
just change the expected output in the testcase. I have committed the attached 
patch to the OG10 branch as 'obvious'.


Kwok
From 0b999612a0205da31e3948f67cc754e9208e85fb Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Wed, 19 Aug 2020 12:50:42 -0700
Subject: [PATCH] Fix gfortran.dg/goacc/pr70828.f90 testcase

Array mapping was changed by the patch '[OpenMP, Fortran] Add
structure/derived-type element mapping'.

2020-08-19  Kwok Cheung Yeung  

gcc/testsuite/
* gfortran.dg/goacc/pr70828.f90: Update expected output in Gimple
dump.
---
 gcc/testsuite/ChangeLog.omp | 5 +
 gcc/testsuite/gfortran.dg/goacc/pr70828.f90 | 4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/ChangeLog.omp b/gcc/testsuite/ChangeLog.omp
index 19395cf..ffc8f63 100644
--- a/gcc/testsuite/ChangeLog.omp
+++ b/gcc/testsuite/ChangeLog.omp
@@ -1,3 +1,8 @@
+2020-08-19  Kwok Cheung Yeung  
+
+   * gfortran.dg/goacc/pr70828.f90: Update expected output in Gimple
+   dump.
+
 2020-08-18  Kwok Cheung Yeung  
 
Backport from mainline
diff --git a/gcc/testsuite/gfortran.dg/goacc/pr70828.f90 
b/gcc/testsuite/gfortran.dg/goacc/pr70828.f90
index 2e58120..fcfe086 100644
--- a/gcc/testsuite/gfortran.dg/goacc/pr70828.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/pr70828.f90
@@ -18,5 +18,5 @@ program test
   !$acc end data
 end program test
 
-! { dg-final { scan-tree-dump-times "omp target oacc_data 
map\\(tofrom:MEM\\\[\\(c_char \\*\\)\_\[0-9\]+\\\] \\\[len: _\[0-9\]+\\\]\\) 
map\\(alloc:data \\\[pointer assign, bias: _\[0-9\]+\\\]\\)" 1 "gimple" } }
-! { dg-final { scan-tree-dump-times "omp target oacc_parallel 
map\\(force_present:MEM\\\[\\(c_char \\*\\)D\\.\[0-9\]+\\\] \\\[len: 
D\\.\[0-9\]+\\\]\\) map\\(alloc:data \\\[pointer assign, bias: 
D\\.\[0-9\]+\\\]\\)" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "omp target oacc_data 
map\\(tofrom:data\\\[\_\[0-9\]+\\\] \\\[len: _\[0-9\]+\\\]\\) map\\(alloc:data 
\\\[pointer assign, bias: _\[0-9\]+\\\]\\)" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "omp target oacc_parallel 
map\\(force_present:data\\\[D\\.\[0-9\]+\\\] \\\[len: D\\.\[0-9\]+\\\]\\) 
map\\(alloc:data \\\[pointer assign, bias: D\\.\[0-9\]+\\\]\\)" 1 "gimple" } }
-- 
2.8.1



Re: [committed] libstdc++: Make __int128 meet integer-class requirements [PR 96042]

2020-08-19 Thread Jonathan Wakely via Gcc-patches

On 19/08/20 17:00 +0100, Jonathan Wakely wrote:

Because __int128 can be used as the difference type for iota_view, we
need to ensure that it meets the requirements of an integer-class type.
The requirements in [iterator.concept.winc] p10 include numeric_limits
being specialized and giving meaningful answers. Currently we only
specialize numeric_limits for non-standard integer types in non-strict
modes.  However, nothing prevents us from defining an explicit
specialization for any implementation-defined type, so it doesn't matter
whether std::is_integral<__int128> is true or not.

This patch ensures that the numeric_limits specializations for signed
and unsigned __int128 are defined whenever __int128 is available. It
also makes the __numeric_traits and __int_limits helpers work for
__int128, via a new __gnu_cxx::__is_integer_nonstrict trait.

libstdc++-v3/ChangeLog:

PR libstdc++/96042
* include/ext/numeric_traits.h (__is_integer_nonstrict): New
trait which is true for 128-bit integers even in strict modes.
(__numeric_traits_integer, __numeric_traits): Use
__is_integer_nonstrict instead of __is_integer.
* include/std/limits [__STRICT_ANSI__ && __SIZEOF_INT128__]
(numeric_limits<__int128>, (numeric_limits):
Define.
* testsuite/std/ranges/iota/96042.cc: New test.


The attached patch is another change needed to support __int128 as an
integer-like type in strict mode.

Tested x86_64-linux, -m32 and -m64. Committed to trunk.

I'll backport this to gcc-10 too.


commit e6e01618e83bcd9eb3a2b27df30ed87106a748b4
Author: Jonathan Wakely 
Date:   Wed Aug 19 20:36:10 2020

libstdc++: Make make-unsigned-like-t<__int128> work [PR 96042]

As well as ensuring that numeric_limits<__int128> is defined, we need to
ensure that make-unsigned-like-t and to-unsigned-like work correctly for
128-bit integers in strict mode. This ensures that a subrange created
from an iota_view's iterator and sentinel can represent its size.

Co-authored-by: Patrick Palka  

libstdc++-v3/ChangeLog:

2020-08-19  Jonathan Wakely  
Patrick Palka  

PR libstdc++/96042
* include/bits/range_access.h (__detail::__to_unsigned_like):
Do not use make_unsigned_t in the return type, as it can
result in an error before the integral constraint is checked.
[__STRICT_ANSI__]: Add overloads for 128-bit integer types.
(__detail::__make_unsigned_like_t): Define as the return type
of __to_unsigned_like.
* testsuite/std/ranges/subrange/96042.cc: New test.

diff --git a/libstdc++-v3/include/bits/range_access.h b/libstdc++-v3/include/bits/range_access.h
index 3eb1f2fd272..bafced31ea8 100644
--- a/libstdc++-v3/include/bits/range_access.h
+++ b/libstdc++-v3/include/bits/range_access.h
@@ -364,13 +364,23 @@ namespace ranges
 { return __max_size_type(__t); }
 
 template
-  constexpr make_unsigned_t<_Tp>
+  constexpr auto
   __to_unsigned_like(_Tp __t) noexcept
-  { return __t; }
+  { return static_cast>(__t); }
 
-template>
+#if defined __STRICT_ANSI__ && defined __SIZEOF_INT128__
+constexpr unsigned __int128
+__to_unsigned_like(__int128 __t) noexcept
+{ return __t; }
+
+constexpr unsigned __int128
+__to_unsigned_like(unsigned __int128 __t) noexcept
+{ return __t; }
+#endif
+
+template
   using __make_unsigned_like_t
-	= conditional_t<_MaxDiff, __max_size_type, make_unsigned_t<_Tp>>;
+	= decltype(__detail::__to_unsigned_like(std::declval<_Tp>()));
 
 // Part of the constraints of ranges::borrowed_range
 template
diff --git a/libstdc++-v3/testsuite/std/ranges/subrange/96042.cc b/libstdc++-v3/testsuite/std/ranges/subrange/96042.cc
new file mode 100644
index 000..5826203f03c
--- /dev/null
+++ b/libstdc++-v3/testsuite/std/ranges/subrange/96042.cc
@@ -0,0 +1,34 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=c++20" }
+// { dg-do compile { target c++2a } }
+
+#include 
+
+constexpr bool
+test01()
+{
+  using I = unsigned long long;
+  // view with a difference type that doesn't fit in long long:
+  std::ranges::iota_view v(0, 

[committed] analyzer: fix ICE converting float to int [PR96699]

2020-08-19 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as r11-2770-g366bd1ac01a5249a463e64234674ad2d174faa9a.

gcc/analyzer/ChangeLog:
PR analyzer/96699
* region-model-manager.cc
(region_model_manager::get_or_create_cast): Use FIX_TRUNC_EXPR for
casting from REAL_TYPE to INTEGER_TYPE.

gcc/testsuite/ChangeLog:
PR analyzer/96699
* gcc.dg/analyzer/pr96699.c: New test.
---
 gcc/analyzer/region-model-manager.cc|  5 +
 gcc/testsuite/gcc.dg/analyzer/pr96699.c | 13 +
 2 files changed, 18 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr96699.c

diff --git a/gcc/analyzer/region-model-manager.cc 
b/gcc/analyzer/region-model-manager.cc
index 4faeaa52a63..07925743ab0 100644
--- a/gcc/analyzer/region-model-manager.cc
+++ b/gcc/analyzer/region-model-manager.cc
@@ -396,6 +396,11 @@ region_model_manager::get_or_create_unaryop (tree type, 
enum tree_code op,
 const svalue *
 region_model_manager::get_or_create_cast (tree type, const svalue *arg)
 {
+  gcc_assert (type);
+  if (arg->get_type ())
+if (TREE_CODE (type) == INTEGER_TYPE
+   && TREE_CODE (arg->get_type ()) == REAL_TYPE)
+  return get_or_create_unaryop (type, FIX_TRUNC_EXPR, arg);
   return get_or_create_unaryop (type, NOP_EXPR, arg);
 }
 
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr96699.c 
b/gcc/testsuite/gcc.dg/analyzer/pr96699.c
new file mode 100644
index 000..c68e45a9401
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr96699.c
@@ -0,0 +1,13 @@
+struct qi {
+  union {
+int hj;
+float sl;
+  };
+};
+
+void
+i2 (struct qi *la)
+{
+  if (la->hj == 0)
+la->sl = 0.0f;
+}
-- 
2.26.2



[committed] analyzer: fix ICE on deref_rvalue on SK_COMPOUND [PR96643]

2020-08-19 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as r11-2772-g23ebfda0e352fa0a92c6b012458ecb65505a135f.

gcc/analyzer/ChangeLog:
PR analyzer/96643
* region-model.cc (region_model::deref_rvalue): Rather than
attempting to handle all svalue kinds in the switch, only cover
the special cases, and move symbolic-region handling to after
the switch, thus implicitly handling the missing case SK_COMPOUND.

gcc/testsuite/ChangeLog:
PR analyzer/96643
* g++.dg/analyzer/pr96643.C: New test.
---
 gcc/analyzer/region-model.cc| 26 +
 gcc/testsuite/g++.dg/analyzer/pr96643.C | 26 +
 2 files changed, 31 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/analyzer/pr96643.C

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 5b08e48e6e5..8a5e74ebc0e 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -1369,7 +1369,7 @@ region_model::deref_rvalue (const svalue *ptr_sval, tree 
ptr_tree,
   switch (ptr_sval->get_kind ())
 {
 default:
-  gcc_unreachable ();
+  break;
 
 case SK_REGION:
   {
@@ -1395,17 +1395,10 @@ region_model::deref_rvalue (const svalue *ptr_sval, 
tree ptr_tree,
  return m_mgr->get_offset_region (parent_region, type, offset);
}
  default:
-   goto create_symbolic_region;
+   break;
  }
   }
-
-case SK_CONSTANT:
-case SK_INITIAL:
-case SK_UNARYOP:
-case SK_SUB:
-case SK_WIDENING:
-case SK_CONJURED:
-  goto create_symbolic_region;
+  break;
 
 case SK_POISONED:
   {
@@ -1425,20 +1418,11 @@ region_model::deref_rvalue (const svalue *ptr_sval, 
tree ptr_tree,
ctxt->warn (new poisoned_value_diagnostic (ptr, pkind));
  }
  }
-   goto create_symbolic_region;
   }
-
-case SK_UNKNOWN:
-  {
-  create_symbolic_region:
-   return m_mgr->get_symbolic_region (ptr_sval);
-  }
-
-case SK_SETJMP:
-  goto create_symbolic_region;
+  break;
 }
 
-  gcc_unreachable ();
+  return m_mgr->get_symbolic_region (ptr_sval);
 }
 
 /* Set the value of the region given by LHS_REG to the value given
diff --git a/gcc/testsuite/g++.dg/analyzer/pr96643.C 
b/gcc/testsuite/g++.dg/analyzer/pr96643.C
new file mode 100644
index 000..2d0a248c73e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/analyzer/pr96643.C
@@ -0,0 +1,26 @@
+/* { dg-additional-options "-O1" } */
+
+int l0;
+
+class qv {
+public:
+  int operator[] (int b1) const { return k2[b1]; }
+
+private:
+  int *k2;
+};
+
+class g0 {
+  qv nf, v6;
+
+  void
+  iq ();
+};
+
+void
+g0::iq ()
+{
+  for (;;)
+if (nf[0] == 0)
+  ++l0;
+}
-- 
2.26.2



[committed] analyzer: fix ICE on folding vector 0 [PR96705]

2020-08-19 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as r11-2771-gfc02b568e2cd3f6a28d4b7c1063bbf8842c89aad.

gcc/analyzer/ChangeLog:
* region-model-manager.cc
PR analyzer/96705
(region_model_manager::maybe_fold_binop): Check that we have an
integral type before calling build_int_cst.

gcc/testsuite/ChangeLog:
PR analyzer/96705
* gcc.dg/analyzer/pr96705.c: New test.
---
 gcc/analyzer/region-model-manager.cc| 4 ++--
 gcc/testsuite/gcc.dg/analyzer/pr96705.c | 9 +
 2 files changed, 11 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr96705.c

diff --git a/gcc/analyzer/region-model-manager.cc 
b/gcc/analyzer/region-model-manager.cc
index 07925743ab0..422c4a95e7b 100644
--- a/gcc/analyzer/region-model-manager.cc
+++ b/gcc/analyzer/region-model-manager.cc
@@ -445,7 +445,7 @@ region_model_manager::maybe_fold_binop (tree type, enum 
tree_code op,
   break;
 case MULT_EXPR:
   /* (VAL * 0).  */
-  if (cst1 && zerop (cst1))
+  if (cst1 && zerop (cst1) && INTEGRAL_TYPE_P (type))
return get_or_create_constant_svalue (build_int_cst (type, 0));
   /* (VAL * 1) -> VAL.  */
   if (cst1 && integer_onep (cst1))
@@ -455,7 +455,7 @@ region_model_manager::maybe_fold_binop (tree type, enum 
tree_code op,
 case TRUTH_AND_EXPR:
   if (cst1)
{
- if (zerop (cst1))
+ if (zerop (cst1) && INTEGRAL_TYPE_P (type))
/* "(ARG0 && 0)" -> "0".  */
return get_or_create_constant_svalue (build_int_cst (type, 0));
  else
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr96705.c 
b/gcc/testsuite/gcc.dg/analyzer/pr96705.c
new file mode 100644
index 000..d7856d20be5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr96705.c
@@ -0,0 +1,9 @@
+int __attribute__ ((vector_size (8))) v;
+int i;
+
+void
+test (void)
+{
+  v &= 0;
+  v *= i;
+}
-- 
2.26.2



Re: [EXTERNAL] Re: [PATCH] rs6000, restrict bfloat convert intrinsic to Power 10. Fix BU_P10V macro definitions.

2020-08-19 Thread Peter Bergner via Gcc-patches
On 8/14/20 7:42 PM, Segher Boessenkool wrote:
> I think your current code is fine; I hadn't considered Bill's upcoming
> rewrite.  It is more important to make that go smoother than to fix some
> aesthetics right now.
> 
> Please check if the names for the builtins match the (future)
> documentation, and then, approved for trunk.  Thank you!

This is also a bug in GCC 10, so this should be backported too.

Segher, is this ok for Carl to backport to GCC 10 after it has sat on
trunk for a few days?

Peter



[PATCH] rs6000: Fix extraneous characters in the documentation

2020-08-19 Thread Tulio Magno Quites Machado Filho via Gcc-patches
Replace them with a whitespace in order to avoid artifacts in the HTML
document.

2020-08-19  Tulio Magno Quites Machado Filho  

gcc/
* doc/extend.texi (PowerPC Built-in Functions): Replace
extraneous characters with whitespace.
---
 gcc/doc/extend.texi | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index bcc251481ca..0c380322280 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21538,10 +21538,10 @@ void amo_stdat_smin (int64_t *, int64_t);
 ISA 3.1 of the PowerPC added new Matrix-Multiply Assist (MMA) instructions.
 GCC provides support for these instructions through the following built-in
 functions which are enabled with the @code{-mmma} option.  The vec_t type
-below is defined to be a normal vector unsigned char type.  The uint2, uint4
+below is defined to be a normal vector unsigned char type.  The uint2, uint4
 and uint8 parameters are 2-bit, 4-bit and 8-bit unsigned integer constants
-respectively.  The compiler will verify that they are constants and that
-their values are within range. 
+respectively.  The compiler will verify that they are constants and that
+their values are within range.
 
 The built-in functions supported are:
 
-- 
2.25.4



Re: [PATCH] bb-reorder: Remove a misfiring micro-optimization (PR96475)

2020-08-19 Thread Richard Sandiford
Segher Boessenkool  writes:
> On Wed, Aug 19, 2020 at 01:10:36PM +0100, Richard Sandiford wrote:
>> Segher Boessenkool  writes:
>> > When the compgotos pass copies the tail of blocks ending in an indirect
>> > jump, there is a micro-optimization to not copy the last one, since the
>> > original block will then just be deleted.  This does not work properly
>> > if cleanup_cfg does not merge all pairs of blocks we expect it to.
>> >
>> >
>> > v2: This also deletes the other use of single_pred_p, which has the same
>> > problem in principle, I just never have triggered it so far.
>> 
>> Could you go into more details?  I thought the reason you wanted to
>> remove the test in the loop was that each successful iteration of the
>> loop removes one incoming edge, meaning that if all previous iterations
>> succeed, the final iteration is bound to see a single predecessor.
>> And there's no guarantee in that case that the jump in the final
>> block will be removed.  I.e. the loop is unnecessarily leaving
>> cfgcleanup to finish off the last step for it.  So I agree that the
>> loop part looks good.  (I assume that was v1, but I missed it, sorry.)
>
> I didn't Cc: you, I should have :-)
>
> Yes, that is part of the problem.  The case where it was noticed however
> was some release of GCC (some 9 release iirc) that did something wrong
> with cfg_cleanup apparenetly, so not everything was threaded optimally,
> making duplicate_computed_gotos run into jumps to jumps and stopping.
> Since it is such an important pass (for performance, for things with
> bigger jump tables or with threaded FSMs etc.), it seemed a good idea to
> make it more robust instead of slightly faster.
>
>> But it looks like the point of the initial test is to avoid “duplicating”
>> the block if there would still only be one copy of the code.  That test
>> still seems reasonable since the case it rejects should be handled by
>> other cfg cleanups.  If for some reason it isn't, there doesn't seem
>> to be any reason to apply the max_size limit for a single predecessor,
>> since no code duplication will take place.  So ISTM that removing the
>> first condition is really a separate thing that should be its own patch.
>
> I have never seen the second case misfiring in practice, only the first
> one!

Shucks, I guessed the wrong way round :-)

I'd argue that the first check isn't a micro-optimisation though.
It's testing whether the duplication really is a duplication
(rather than a move).

The second test does look like a micro-optimisation.

> The original patch is at
> https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551560.html
> btw.  I'll prepare a testcase for that (it doesn't trigger with recent
> GCC versions though :-/ )

Hmm, OK.  If the first check is the important one, then I think it'd
be better to make the max_size stuff conditional on !single_pred_p
rather than drop the test entirely.

Thanks,
Richard


Re: [PATCH] rs6000: Enable more sibcalls when TOC is not preserved

2020-08-19 Thread Segher Boessenkool
Hi!

On Wed, Aug 19, 2020 at 09:40:16AM -0500, Bill Schmidt wrote:
> A function compiled with the PC-relative addressing model does not
> require r2 to contain a TOC pointer, and does not guarantee that r2
> will be preserved for its caller.  Such a function can make sibcalls
> without restriction based on TOC preservation rules.  However, a
> caller that does preserve r2 cannot make a sibcall to a callee that
> does not.

This looks fine.  _Is_ fine even, afaics :-)  Okay for trunk.

Thanks!


Segher


Re: [PATCH 1/2] Add new RTX instruction class FILLER_INSN

2020-08-19 Thread Andrea Corallo
Segher Boessenkool  writes:

> [ Please don't post new patch series as replies to old ]
>
> On Wed, Jul 22, 2020 at 12:02:33PM +0200, Andrea Corallo wrote:
>> This first patch implements the addition of a new RTX instruction class
>> FILLER_INSN, which has been white listed to allow placement of NOPs
>> outside of a basic block.  This is to allow padding after unconditional
>> branches.  This is favorable so that any performance gained from
>> diluting branches is not paid straight back via excessive eating of
>> nops.
>> 
>> It was deemed that a new RTX class was less invasive than modifying
>> behavior in regards to standard UNSPEC nops.
>
> Deemed, by whom?  There are several people very against it, too.  You
> need to modify only one simple behaviour (maybe in a handful of places),
> making a new RTX class for that is excessive.

Hi Segher,

That's understood and agreed.  I haven't posted any new patch on this,
the quoted mail is an old one.  I just wanted to discuss how to proceede
this way with my mail of today.

Thanks for your feedback!

  Andrea


Re: [PATCH] bb-reorder: Remove a misfiring micro-optimization (PR96475)

2020-08-19 Thread Segher Boessenkool
On Wed, Aug 19, 2020 at 01:10:36PM +0100, Richard Sandiford wrote:
> Segher Boessenkool  writes:
> > When the compgotos pass copies the tail of blocks ending in an indirect
> > jump, there is a micro-optimization to not copy the last one, since the
> > original block will then just be deleted.  This does not work properly
> > if cleanup_cfg does not merge all pairs of blocks we expect it to.
> >
> >
> > v2: This also deletes the other use of single_pred_p, which has the same
> > problem in principle, I just never have triggered it so far.
> 
> Could you go into more details?  I thought the reason you wanted to
> remove the test in the loop was that each successful iteration of the
> loop removes one incoming edge, meaning that if all previous iterations
> succeed, the final iteration is bound to see a single predecessor.
> And there's no guarantee in that case that the jump in the final
> block will be removed.  I.e. the loop is unnecessarily leaving
> cfgcleanup to finish off the last step for it.  So I agree that the
> loop part looks good.  (I assume that was v1, but I missed it, sorry.)

I didn't Cc: you, I should have :-)

Yes, that is part of the problem.  The case where it was noticed however
was some release of GCC (some 9 release iirc) that did something wrong
with cfg_cleanup apparenetly, so not everything was threaded optimally,
making duplicate_computed_gotos run into jumps to jumps and stopping.
Since it is such an important pass (for performance, for things with
bigger jump tables or with threaded FSMs etc.), it seemed a good idea to
make it more robust instead of slightly faster.

> But it looks like the point of the initial test is to avoid “duplicating”
> the block if there would still only be one copy of the code.  That test
> still seems reasonable since the case it rejects should be handled by
> other cfg cleanups.  If for some reason it isn't, there doesn't seem
> to be any reason to apply the max_size limit for a single predecessor,
> since no code duplication will take place.  So ISTM that removing the
> first condition is really a separate thing that should be its own patch.

I have never seen the second case misfiring in practice, only the first
one!

The original patch is at
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551560.html
btw.  I'll prepare a testcase for that (it doesn't trigger with recent
GCC versions though :-/ )

Thanks,


Segher


Re: [PATCH 1/2] Add new RTX instruction class FILLER_INSN

2020-08-19 Thread Segher Boessenkool
On Wed, Aug 19, 2020 at 11:13:40AM +0200, Andrea Corallo wrote:
> Segher Boessenkool  writes:
> > So I wonder if this cannot be done with some kind of NOTE, instead?
> 
> I was having a look into reworking this using an insn note as (IIUC)
> suggested.  The idea is appealing but looking into insn-notes.def I've
> found the following comment:
> 
> "We are slowly removing the concept of insn-chain notes from the
> compiler.  Adding new codes to this file is STRONGLY DISCOURAGED.
> If you think you need one, look for other ways to express what you
> mean, such as register notes or bits in the basic-block structure."

That is from 2004.  Since then 9 note types have been removed, but 7
new types added.  (There are 18 insn note types now).

> Would still be justificated in this case to proceed this way?

Yes, it is a lesser evil imho.

> The other
> option would be to add the information into the basic-block or into
> struct rtx_jump_insn.

Or just look at the insn code, define a "filler-nop" insn, allow those
after BBs?  Any reason that wouldn't work?


Segher


Re: [PATCH 1/2] Add new RTX instruction class FILLER_INSN

2020-08-19 Thread Segher Boessenkool
[ Please don't post new patch series as replies to old ]

On Wed, Jul 22, 2020 at 12:02:33PM +0200, Andrea Corallo wrote:
> This first patch implements the addition of a new RTX instruction class
> FILLER_INSN, which has been white listed to allow placement of NOPs
> outside of a basic block.  This is to allow padding after unconditional
> branches.  This is favorable so that any performance gained from
> diluting branches is not paid straight back via excessive eating of
> nops.
> 
> It was deemed that a new RTX class was less invasive than modifying
> behavior in regards to standard UNSPEC nops.

Deemed, by whom?  There are several people very against it, too.  You
need to modify only one simple behaviour (maybe in a handful of places),
making a new RTX class for that is excessive.

>   * cfgbuild.c (inside_basic_block_p): Handle FILLER_INSN.
>   * cfgrtl.c (rtl_verify_bb_layout): Whitelist FILLER_INSN outside
>   basic blocks.
>   * coretypes.h: New rtx class.

coretypes.h is not a new RTX class?  :-)  Maybe:

* coretypes.h (rtx_filler_insn): New struct.

> @@ -3033,7 +3034,20 @@ rtl_verify_bb_layout (void)
> break;
>  
>   default:
> -   fatal_insn ("insn outside basic block", x);
> +   /* Allow nops after branches, via FILLER_INSN.  */
> +   bool fail = true;
> +   subrtx_iterator::array_type array;
> +   FOR_EACH_SUBRTX (iter, array, x, ALL)
> + {
> +   const_rtx rtx = *iter;
> +   if (GET_CODE (rtx) == FILLER_INSN)
> + {
> +   fail = false;
> +   break;
> + }
> + }
> +   if (fail)
> + fatal_insn ("insn outside basic block", x);
>   }
>   }

It wouldn't be hard to allow some existing RTL here.  Maybe something
with CODE_FOR_filler_nop or similar (after you recog () it).

It still allows anything after leading filler insns, btw; you could get
rid of the "fail" variable altogether, just call fatal_insn as soon as
you see some unexpected RTX code.

> +  rtx_insn* i = make_insn_raw (pattern);

rtx_insn *i = ...


Segher


Re: [PATCH] PR libstdc++/71579 assert that type traits are not misused with an incomplete type

2020-08-19 Thread Antony Polukhin via Gcc-patches
ср, 19 авг. 2020 г. в 14:29, Jonathan Wakely :
<...>
> Do we also want to check
> (std::__is_complete_or_unbounded(__type_identity<_ArgTypes>{}) && ...)
> for invoke_result and the is_invocable traits?
>
> We can use a fold expression there, because those traits are not
> defined before C++17.

Good idea. I'll try.

-- 
Best regards,
Antony Polukhin


[committed] libstdc++: Remove deprecated comparison operators for RB trees

2020-08-19 Thread Jonathan Wakely via Gcc-patches
These functions were deprecated in GCC 9.1.0 because they are never used
by the library. This patch removes them for GCC 11.

libstdc++-v3/ChangeLog:

* include/bits/stl_tree.h (operator!=, operator>, operator<=)
(operator>=): Remove deprecated functions.

Tested powerpc64le-linux. Committed to trunk.

commit 5abc821556e141c9b7003877d09d9dd9e9f98ae7
Author: Jonathan Wakely 
Date:   Wed Aug 19 17:04:49 2020

libstdc++: Remove deprecated comparison operators for RB trees

These functions were deprecated in GCC 9.1.0 because they are never used
by the library. This patch removes them for GCC 11.

libstdc++-v3/ChangeLog:

* include/bits/stl_tree.h (operator!=, operator>, operator<=)
(operator>=): Remove deprecated functions.

diff --git a/libstdc++-v3/include/bits/stl_tree.h 
b/libstdc++-v3/include/bits/stl_tree.h
index 21b72cebf2e..c50391d68c3 100644
--- a/libstdc++-v3/include/bits/stl_tree.h
+++ b/libstdc++-v3/include/bits/stl_tree.h
@@ -1632,22 +1632,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
return std::lexicographical_compare(__x.begin(), __x.end(),
__y.begin(), __y.end());
   }
-
-  friend bool _GLIBCXX_DEPRECATED
-  operator!=(const _Rb_tree& __x, const _Rb_tree& __y)
-  { return !(__x == __y); }
-
-  friend bool _GLIBCXX_DEPRECATED
-  operator>(const _Rb_tree& __x, const _Rb_tree& __y)
-  { return __y < __x; }
-
-  friend bool _GLIBCXX_DEPRECATED
-  operator<=(const _Rb_tree& __x, const _Rb_tree& __y)
-  { return !(__y < __x); }
-
-  friend bool _GLIBCXX_DEPRECATED
-  operator>=(const _Rb_tree& __x, const _Rb_tree& __y)
-  { return !(__x < __y); }
 #endif
 };
 


[PATCH v2][GCC] arm: Add +nomve and +nomve.fp options to -mcpu=cortex-m55

2020-08-19 Thread Joe Ramsay
From: Joe Ramsay 

Hi all,

This patch rearranges feature bits for MVE and FP to implement the
following flags for -mcpu=cortex-m55.

  - +nomve:equivalent to armv8.1-m.main+fp.dp+dsp.
  - +nomve.fp: equivalent to armv8.1-m.main+mve+fp.dp (+dsp is implied by +mve).
  - +nofp: equivalent to armv8.1-m.main+mve (+dsp is implied by +mve).
  - +nodsp:equivalent to armv8.1-m.main+fp.dp.

Combinations of the above:

  - +nomve+nofp: equivalent to armv8.1-m.main+dsp.
  - +nodsp+nofp: equivalent to armv8.1-m.main.

Due to MVE and FP sharing vfp_base, some new syntax was required in the CPU
description to implement the concept of 'implied bits'. These are non-named
features added to the ISA late, depending on whether one or more features which
depend on them are present. This means vfp_base can be present when only one of
MVE and FP is removed, but absent when both are removed.

Bootstrapped and tested on arm-none-eabi. OK for master?

Thanks,
Joe

gcc/ChangeLog:

2020-07-31  Joe Ramsay  

* config/arm/arm-cpus.in:
(ALL_FPU_INTERNAL): Remove vfp_base.
(VFPv2): Remove vfp_base.
(MVE): Remove vfp_base.
(vfp_base): Redefine as implied bit dependent on MVE or FP
(cortex-m55): Add flags to disable MVE, MVE FP, FP and DSP extensions.
* config/arm/arm.c (arm_configure_build_target): Add implied bits to 
ISA.
* config/arm/parsecpu.awk:
(gen_isa): Print implied bits and their dependencies to ISA header.
(gen_data): Add parsing for implied feature bits.

gcc/testsuite/ChangeLog:

2020-07-31  Joe Ramsay  

* gcc.target/arm/multilib.exp: Add tests for -mcpu=cortex-m55.
* gcc.target/arm/cortex-m55-nodsp-flag.c: New test.
* gcc.target/arm/cortex-m55-nodsp-nofp-flag.c: New test.
* gcc.target/arm/cortex-m55-nofp-flag.c: New test.
* gcc.target/arm/cortex-m55-nofp-nomve-flag.c: New test.
* gcc.target/arm/cortex-m55-nomve-flag.c: New test.
* gcc.target/arm/cortex-m55-nomve.fp-flag.c: New test.
---
 gcc/config/arm/arm-cpus.in | 26 ---
 gcc/config/arm/arm.c   | 14 ++
 gcc/config/arm/parsecpu.awk| 51 ++
 .../gcc.target/arm/cortex-m55-nodsp-flag-hard.c| 15 +++
 .../gcc.target/arm/cortex-m55-nodsp-flag-softfp.c  | 15 +++
 .../arm/cortex-m55-nodsp-nofp-flag-softfp.c| 15 +++
 .../gcc.target/arm/cortex-m55-nofp-flag-hard.c | 15 +++
 .../gcc.target/arm/cortex-m55-nofp-flag-softfp.c   | 15 +++
 .../arm/cortex-m55-nofp-nomve-flag-softfp.c| 15 +++
 .../gcc.target/arm/cortex-m55-nomve-flag-hard.c| 15 +++
 .../gcc.target/arm/cortex-m55-nomve-flag-softfp.c  | 15 +++
 .../gcc.target/arm/cortex-m55-nomve.fp-flag-hard.c | 15 +++
 .../arm/cortex-m55-nomve.fp-flag-softfp.c  | 15 +++
 gcc/testsuite/gcc.target/arm/multilib.exp  | 16 +++
 14 files changed, 250 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/cortex-m55-nodsp-flag-hard.c
 create mode 100644 gcc/testsuite/gcc.target/arm/cortex-m55-nodsp-flag-softfp.c
 create mode 100644 
gcc/testsuite/gcc.target/arm/cortex-m55-nodsp-nofp-flag-softfp.c
 create mode 100644 gcc/testsuite/gcc.target/arm/cortex-m55-nofp-flag-hard.c
 create mode 100644 gcc/testsuite/gcc.target/arm/cortex-m55-nofp-flag-softfp.c
 create mode 100644 
gcc/testsuite/gcc.target/arm/cortex-m55-nofp-nomve-flag-softfp.c
 create mode 100644 gcc/testsuite/gcc.target/arm/cortex-m55-nomve-flag-hard.c
 create mode 100644 gcc/testsuite/gcc.target/arm/cortex-m55-nomve-flag-softfp.c
 create mode 100644 gcc/testsuite/gcc.target/arm/cortex-m55-nomve.fp-flag-hard.c
 create mode 100644 
gcc/testsuite/gcc.target/arm/cortex-m55-nomve.fp-flag-softfp.c

diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index c98f8ed..5083028 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -135,10 +135,6 @@ define feature armv8_1m_main
 # Floating point and Neon extensions.
 # VFPv1 is not supported in GCC.
 
-# This feature bit is enabled for all VFP, MVE and
-# MVE with floating point extensions.
-define feature vfp_base
-
 # Vector floating point v2.
 define feature vfpv2
 
@@ -251,7 +247,7 @@ define fgroup ALL_SIMD  ALL_SIMD_INTERNAL 
ALL_SIMD_EXTERNAL
 
 # List of all FPU bits to strip out if -mfpu is used to override the
 # default.  fp16 is deliberately missing from this list.
-define fgroup ALL_FPU_INTERNAL vfp_base vfpv2 vfpv3 vfpv4 fpv5 fp16conv fp_dbl 
ALL_SIMD_INTERNAL
+define fgroup ALL_FPU_INTERNAL vfpv2 vfpv3 vfpv4 fpv5 fp16conv fp_dbl 
ALL_SIMD_INTERNAL
 # Similarly, but including fp16 and other extensions that aren't part of
 # -mfpu support.
 define fgroup ALL_FPU_EXTERNAL fp16 bf16
@@ -296,11 +292,11 @@ define fgroup ARMv8r  ARMv8a
 define fgroup ARMv8_1m_main ARMv8m_main armv8_1m_main
 
 # Useful combinations.
-define fgroup VFPv2   

[committed] libstdc++: Make __int128 meet integer-class requirements [PR 96042]

2020-08-19 Thread Jonathan Wakely via Gcc-patches
Because __int128 can be used as the difference type for iota_view, we
need to ensure that it meets the requirements of an integer-class type.
The requirements in [iterator.concept.winc] p10 include numeric_limits
being specialized and giving meaningful answers. Currently we only
specialize numeric_limits for non-standard integer types in non-strict
modes.  However, nothing prevents us from defining an explicit
specialization for any implementation-defined type, so it doesn't matter
whether std::is_integral<__int128> is true or not.

This patch ensures that the numeric_limits specializations for signed
and unsigned __int128 are defined whenever __int128 is available. It
also makes the __numeric_traits and __int_limits helpers work for
__int128, via a new __gnu_cxx::__is_integer_nonstrict trait.

libstdc++-v3/ChangeLog:

PR libstdc++/96042
* include/ext/numeric_traits.h (__is_integer_nonstrict): New
trait which is true for 128-bit integers even in strict modes.
(__numeric_traits_integer, __numeric_traits): Use
__is_integer_nonstrict instead of __is_integer.
* include/std/limits [__STRICT_ANSI__ && __SIZEOF_INT128__]
(numeric_limits<__int128>, (numeric_limits):
Define.
* testsuite/std/ranges/iota/96042.cc: New test.

Tested powerpc64le-linux. Committed to trunk.

I'll backport this to gcc-10 too.


commit 386fd16c551188e20d5b1684b7139e4269f9a739
Author: Jonathan Wakely 
Date:   Wed Aug 19 16:27:25 2020

libstdc++: Make __int128 meet integer-class requirements [PR 96042]

Because __int128 can be used as the difference type for iota_view, we
need to ensure that it meets the requirements of an integer-class type.
The requirements in [iterator.concept.winc] p10 include numeric_limits
being specialized and giving meaningful answers. Currently we only
specialize numeric_limits for non-standard integer types in non-strict
modes.  However, nothing prevents us from defining an explicit
specialization for any implementation-defined type, so it doesn't matter
whether std::is_integral<__int128> is true or not.

This patch ensures that the numeric_limits specializations for signed
and unsigned __int128 are defined whenever __int128 is available. It
also makes the __numeric_traits and __int_limits helpers work for
__int128, via a new __gnu_cxx::__is_integer_nonstrict trait.

libstdc++-v3/ChangeLog:

PR libstdc++/96042
* include/ext/numeric_traits.h (__is_integer_nonstrict): New
trait which is true for 128-bit integers even in strict modes.
(__numeric_traits_integer, __numeric_traits): Use
__is_integer_nonstrict instead of __is_integer.
* include/std/limits [__STRICT_ANSI__ && __SIZEOF_INT128__]
(numeric_limits<__int128>, (numeric_limits):
Define.
* testsuite/std/ranges/iota/96042.cc: New test.

diff --git a/libstdc++-v3/include/ext/numeric_traits.h 
b/libstdc++-v3/include/ext/numeric_traits.h
index 69f286d7be7..585ecc0ba9f 100644
--- a/libstdc++-v3/include/ext/numeric_traits.h
+++ b/libstdc++-v3/include/ext/numeric_traits.h
@@ -51,11 +51,25 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   (__glibcxx_signed(_Tp) ? \
(_Tp)1 << (__glibcxx_digits(_Tp) - 1)) - 1) << 1) + 1) : ~(_Tp)0)
 
+  template
+struct __is_integer_nonstrict
+: public std::__is_integer<_Tp>
+{ };
+
+#if defined __STRICT_ANSI__ && defined __SIZEOF_INT128__
+  // __is_integer<__int128> is false, but we still want to allow it here.
+  template<> struct __is_integer_nonstrict<__int128>
+  { enum { __value = 1 }; typedef std::__true_type __type; };
+
+  template<> struct __is_integer_nonstrict
+  { enum { __value = 1 }; typedef std::__true_type __type; };
+#endif
+
   template
 struct __numeric_traits_integer
 {
 #if __cplusplus >= 201103L
-  static_assert(std::__is_integer<_Value>::__value,
+  static_assert(__is_integer_nonstrict<_Value>::__value,
"invalid specialization");
 #endif
 
@@ -132,7 +146,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
 struct __numeric_traits
-: public __conditional_type::__value,
+: public __conditional_type<__is_integer_nonstrict<_Value>::__value,
__numeric_traits_integer<_Value>,
__numeric_traits_floating<_Value> >::__type
 { };
diff --git a/libstdc++-v3/include/std/limits b/libstdc++-v3/include/std/limits
index f5e403be727..20883ba6403 100644
--- a/libstdc++-v3/include/std/limits
+++ b/libstdc++-v3/include/std/limits
@@ -1477,8 +1477,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
= round_toward_zero;
 };
 
-#if !defined(__STRICT_ANSI__)
-
 #define __INT_N(TYPE, BITSIZE, EXT, UEXT)  \
   template<>   \
 struct numeric_limits   

[committed] i386: Use code_for_ instead of gen_ for parameterized names more.

2020-08-19 Thread Uros Bizjak via Gcc-patches
Some builtins are better expanded to patterns with
parametrized names via code_for_ than gen_ helpers.

No functional changes.

2020-08-19  Uroš Bizjak  

gcc/ChangeLog:

* config/i386/i386-expand.c (ix86_expand_builtin)
[case IX86_BUILTIN_ENQCMD, case IX86_BUILTIN_ENQCMDS]:
Rewrite expansion to use code_for_enqcmd.
[case IX86_BUILTIN_WRSSD, case IX86_BUILTIN_WRSSQ]:
Rewrite expansion to use code_for_wrss.
[case IX86_BUILTIN_WRUSSD, case IX86_BUILTIN_WRUSSD]:
Rewrite expansion to use code_for_wruss.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 68fbe8385b3..61618636cff 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -11433,24 +11433,24 @@ ix86_expand_builtin (tree exp, rtx target, rtx 
subtarget,
}
   else
{
- rtx pat;
+ if (target == 0
+ || !register_operand (target, SImode))
+   target = gen_reg_rtx (SImode);
 
- target = gen_reg_rtx (SImode);
  emit_move_insn (target, const0_rtx);
  target = gen_rtx_SUBREG (QImode, target, 0);
 
- if (fcode == IX86_BUILTIN_ENQCMD)
-   pat = gen_enqcmd (UNSPECV_ENQCMD, Pmode, op0, op1);
- else
-   pat = gen_enqcmd (UNSPECV_ENQCMDS, Pmode, op0, op1);
-
- emit_insn (pat);
-
- emit_insn (gen_rtx_SET (gen_rtx_STRICT_LOW_PART (VOIDmode, target),
- gen_rtx_fmt_ee (EQ, QImode,
- SET_DEST (pat),
- const0_rtx)));
+ int unspecv = (fcode == IX86_BUILTIN_ENQCMD
+? UNSPECV_ENQCMD
+: UNSPECV_ENQCMDS);
+ icode = code_for_enqcmd (unspecv, Pmode);
+ emit_insn (GEN_FCN (icode) (op0, op1));
 
+ emit_insn
+   (gen_rtx_SET (gen_rtx_STRICT_LOW_PART (VOIDmode, target),
+ gen_rtx_fmt_ee (EQ, QImode,
+ gen_rtx_REG (CCZmode, FLAGS_REG),
+ const0_rtx)));
  return SUBREG_REG (target);
}
 
@@ -12839,10 +12839,12 @@ rdseed_step:
}
   op1 = gen_rtx_MEM (mode, op1);
 
-  emit_insn ((fcode == IX86_BUILTIN_WRSSD
- || fcode == IX86_BUILTIN_WRSSQ)
-? gen_wrss (mode, op0, op1)
-: gen_wruss (mode, op0, op1));
+  icode = ((fcode == IX86_BUILTIN_WRSSD
+   || fcode == IX86_BUILTIN_WRSSQ)
+  ? code_for_wrss (mode)
+  : code_for_wruss (mode));
+  emit_insn (GEN_FCN (icode) (op0, op1));
+
   return 0;
 
 default:


[PING][PATCH] correct handling of indices into arrays with elements larger than 1 (PR c++/96511)

2020-08-19 Thread Martin Sebor via Gcc-patches

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551783.html

On 8/11/20 10:19 AM, Martin Sebor wrote:

-Wplacement-new handles array indices and pointer offsets the same:
by adjusting them by the size of the element.  That's correct for
the latter but wrong for the former, causing false positives when
the element size is greater than one.

In addition, the warning doesn't even attempt to handle arrays of
arrays.  I'm not sure if I forgot or if I simply didn't think of
it.

The attached patch corrects these oversights by replacing most
of the -Wplacement-new code with a call to compute_objsize which
handles all this correctly (plus more), and is also better tested.
But even compute_objsize has bugs: it trips up while converting
wide_int to offset_int for some pointer offset ranges.  Since
handling the C++ IL required changes in this area the patch also
fixes that.

For review purposes, the patch affects just the middle end.
The C++ diff pretty much just removes code from the front end.

Tested on x86_64-linux plus by building the latest Glibc and
confirming no new warnings.

Martin




[wwwdocs] Remove stray '>' character

2020-08-19 Thread Jonathan Wakely via Gcc-patches
Pushed to wwwdocs.


commit e8120aa3ecc3dec9c6ddcd3add2326e94fe5fb1e
Author: Jonathan Wakely 
Date:   Wed Aug 19 15:41:44 2020 +0100

Remove stray '>' character

diff --git a/htdocs/bugs/index.html b/htdocs/bugs/index.html
index 66d9138f..a6631d8a 100644
--- a/htdocs/bugs/index.html
+++ b/htdocs/bugs/index.html
@@ -222,7 +222,7 @@ submit it according to our generic 
instructions.
 "[Ada]" tag in the subject.)
 
 Detailed bug reporting instructions when using a precompiled
-header>
+header
 
 If you're encountering a bug when using a precompiled header, the
 first thing to do is to delete the precompiled header, and try running


Re: [PATCH] rs6000: Enable more sibcalls when TOC is not preserved

2020-08-19 Thread Bill Schmidt via Gcc-patches
I failed to mention that this has been bootstrapped and tested on 
powerpc64le-unknown-linux-gnu, with no regressions.  Is this ok for trunk?


Thanks,
Bill

On 8/19/20 9:40 AM, Bill Schmidt via Gcc-patches wrote:

A function compiled with the PC-relative addressing model does not
require r2 to contain a TOC pointer, and does not guarantee that r2
will be preserved for its caller.  Such a function can make sibcalls
without restriction based on TOC preservation rules.  However, a
caller that does preserve r2 cannot make a sibcall to a callee that
does not.

2020-08-19  Bill Schmidt  

gcc/
* config/rs6000/rs6000-logue.c (rs6000_decl_ok_for_sibcall):
Sibcalls are always legal when the caller doesn't preserve r2.

gcc/testsuite/
* gcc.target/powerpc/pcrel-sibcall-1.c: Adjust.
---
  gcc/config/rs6000/rs6000-logue.c  | 30 +--
  .../gcc.target/powerpc/pcrel-sibcall-1.c  | 19 
  2 files changed, 28 insertions(+), 21 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-logue.c b/gcc/config/rs6000/rs6000-logue.c
index 6aad1ff826a..5a2cb7fdf2c 100644
--- a/gcc/config/rs6000/rs6000-logue.c
+++ b/gcc/config/rs6000/rs6000-logue.c
@@ -1080,28 +1080,28 @@ rs6000_decl_ok_for_sibcall (tree decl)

if (DEFAULT_ABI == ABI_AIX || DEFAULT_ABI == ABI_ELFv2)
  {
-  /* Under the AIX or ELFv2 ABIs we can't allow calls to non-local
-functions, because the callee may have a different TOC pointer to
-the caller and there's no way to ensure we restore the TOC when
+  /* A function compiled using the PC-relative addressing model does not
+use a TOC pointer; nor is it guaranteed to preserve the value of
+r2 for its caller's TOC.  Such a function may make sibcalls to any
+function, whether local or external, without restriction based on
+TOC-save/restore rules.  */
+  if (rs6000_pcrel_p (cfun))
+   return true;
+
+  /* Otherwise, under the AIX or ELFv2 ABIs we can't allow sibcalls
+to non-local functions, because the callee may not preserve the
+TOC pointer, and there's no way to ensure we restore the TOC when
 we return.  */
if (!decl || DECL_EXTERNAL (decl) || DECL_WEAK (decl)
  || !(*targetm.binds_local_p) (decl))
return false;

-  /* Similarly, if the caller preserves the TOC pointer and the callee
-doesn't (or vice versa), proper TOC setup or restoration will be
-missed.  For example, suppose A, B, and C are in the same binary
-and A -> B -> C.  A and B preserve the TOC pointer but C does not,
-and B -> C is eligible as a sibcall.  A will call B through its
-local entry point, so A will not restore its TOC itself.  B calls
-C with a sibcall, so it will not restore the TOC.  C does not
-preserve the TOC, so it may clobber r2 with impunity.  Returning
-from C will result in a corrupted TOC for A.  */
-  else if (rs6000_fndecl_pcrel_p (decl) != rs6000_pcrel_p (cfun))
+  /* A local sibcall from a function that preserves the TOC pointer
+to a function that does not is invalid for the same reason.  */
+  if (rs6000_fndecl_pcrel_p (decl))
return false;

-  else
-   return true;
+  return true;
  }

/*  With the secure-plt SYSV ABI we can't make non-local calls when
diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-sibcall-1.c 
b/gcc/testsuite/gcc.target/powerpc/pcrel-sibcall-1.c
index dfcf8183ccd..9197788f98f 100644
--- a/gcc/testsuite/gcc.target/powerpc/pcrel-sibcall-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/pcrel-sibcall-1.c
@@ -3,10 +3,9 @@
  /* { dg-require-effective-target powerpc_elfv2 } */
  /* { dg-require-effective-target power10_ok } */

-/* Test that potential sibcalls are not generated when the caller preserves the
-   TOC and the callee doesn't, or vice versa.  At present, -mcpu=power10 does
-   not enable pc-relative mode.  Enable it here explicitly until it is turned
-   on by default.  */
+/* Test that potential sibcalls are generated when the caller does not
+   preserve the TOC, even for external calls; and that sibcalls are not
+   generated when the caller preserves the TOC but the callee does not.  */

  #pragma GCC target ("cpu=power10,pcrel")
  int x (void) __attribute__((noinline));
@@ -39,12 +38,20 @@ int xx (void)
return 1;
  }

+extern int yy (void);
+
  #pragma GCC target ("cpu=power10,pcrel")
-int notoc_call (void)
+int notoc_sibcall (void)
  {
return xx ();
  }

+int extern_sibcall (void)
+{
+  return yy ();
+}
+
  /* { dg-final { scan-assembler {\mb x@notoc\M} } } */
  /* { dg-final { scan-assembler {\mbl y\M} } } */
-/* { dg-final { scan-assembler {\mbl xx@notoc\M} } } */
+/* { dg-final { scan-assembler {\mb xx@notoc\M} } } */
+/* { dg-final { scan-assembler {\mb yy@notoc\M} } } */


[PATCH] rs6000: Enable more sibcalls when TOC is not preserved

2020-08-19 Thread Bill Schmidt via Gcc-patches
A function compiled with the PC-relative addressing model does not
require r2 to contain a TOC pointer, and does not guarantee that r2
will be preserved for its caller.  Such a function can make sibcalls
without restriction based on TOC preservation rules.  However, a
caller that does preserve r2 cannot make a sibcall to a callee that
does not.

2020-08-19  Bill Schmidt  

gcc/
* config/rs6000/rs6000-logue.c (rs6000_decl_ok_for_sibcall):
Sibcalls are always legal when the caller doesn't preserve r2.

gcc/testsuite/
* gcc.target/powerpc/pcrel-sibcall-1.c: Adjust.
---
 gcc/config/rs6000/rs6000-logue.c  | 30 +--
 .../gcc.target/powerpc/pcrel-sibcall-1.c  | 19 
 2 files changed, 28 insertions(+), 21 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-logue.c b/gcc/config/rs6000/rs6000-logue.c
index 6aad1ff826a..5a2cb7fdf2c 100644
--- a/gcc/config/rs6000/rs6000-logue.c
+++ b/gcc/config/rs6000/rs6000-logue.c
@@ -1080,28 +1080,28 @@ rs6000_decl_ok_for_sibcall (tree decl)
 
   if (DEFAULT_ABI == ABI_AIX || DEFAULT_ABI == ABI_ELFv2)
 {
-  /* Under the AIX or ELFv2 ABIs we can't allow calls to non-local
-functions, because the callee may have a different TOC pointer to
-the caller and there's no way to ensure we restore the TOC when
+  /* A function compiled using the PC-relative addressing model does not
+use a TOC pointer; nor is it guaranteed to preserve the value of
+r2 for its caller's TOC.  Such a function may make sibcalls to any
+function, whether local or external, without restriction based on
+TOC-save/restore rules.  */
+  if (rs6000_pcrel_p (cfun))
+   return true;
+
+  /* Otherwise, under the AIX or ELFv2 ABIs we can't allow sibcalls
+to non-local functions, because the callee may not preserve the
+TOC pointer, and there's no way to ensure we restore the TOC when
 we return.  */
   if (!decl || DECL_EXTERNAL (decl) || DECL_WEAK (decl)
  || !(*targetm.binds_local_p) (decl))
return false;
 
-  /* Similarly, if the caller preserves the TOC pointer and the callee
-doesn't (or vice versa), proper TOC setup or restoration will be
-missed.  For example, suppose A, B, and C are in the same binary
-and A -> B -> C.  A and B preserve the TOC pointer but C does not,
-and B -> C is eligible as a sibcall.  A will call B through its
-local entry point, so A will not restore its TOC itself.  B calls
-C with a sibcall, so it will not restore the TOC.  C does not
-preserve the TOC, so it may clobber r2 with impunity.  Returning
-from C will result in a corrupted TOC for A.  */
-  else if (rs6000_fndecl_pcrel_p (decl) != rs6000_pcrel_p (cfun))
+  /* A local sibcall from a function that preserves the TOC pointer
+to a function that does not is invalid for the same reason.  */
+  if (rs6000_fndecl_pcrel_p (decl))
return false;
 
-  else
-   return true;
+  return true;
 }
 
   /*  With the secure-plt SYSV ABI we can't make non-local calls when
diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-sibcall-1.c 
b/gcc/testsuite/gcc.target/powerpc/pcrel-sibcall-1.c
index dfcf8183ccd..9197788f98f 100644
--- a/gcc/testsuite/gcc.target/powerpc/pcrel-sibcall-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/pcrel-sibcall-1.c
@@ -3,10 +3,9 @@
 /* { dg-require-effective-target powerpc_elfv2 } */
 /* { dg-require-effective-target power10_ok } */
 
-/* Test that potential sibcalls are not generated when the caller preserves the
-   TOC and the callee doesn't, or vice versa.  At present, -mcpu=power10 does
-   not enable pc-relative mode.  Enable it here explicitly until it is turned
-   on by default.  */
+/* Test that potential sibcalls are generated when the caller does not
+   preserve the TOC, even for external calls; and that sibcalls are not
+   generated when the caller preserves the TOC but the callee does not.  */
 
 #pragma GCC target ("cpu=power10,pcrel")
 int x (void) __attribute__((noinline));
@@ -39,12 +38,20 @@ int xx (void)
   return 1;
 }
 
+extern int yy (void);
+
 #pragma GCC target ("cpu=power10,pcrel")
-int notoc_call (void)
+int notoc_sibcall (void)
 {
   return xx ();
 }
 
+int extern_sibcall (void)
+{
+  return yy ();
+}
+
 /* { dg-final { scan-assembler {\mb x@notoc\M} } } */
 /* { dg-final { scan-assembler {\mbl y\M} } } */
-/* { dg-final { scan-assembler {\mbl xx@notoc\M} } } */
+/* { dg-final { scan-assembler {\mb xx@notoc\M} } } */
+/* { dg-final { scan-assembler {\mb yy@notoc\M} } } */
-- 
2.17.1



Re: [committed] libstdc++: Add deprecated attributes to old iostream members

2020-08-19 Thread Jonathan Wakely via Gcc-patches

On 19/08/20 12:29 +0100, Jonathan Wakely wrote:

Back in 2017 I removed these prehistoric members (which were deprecated
since C++98) for C++17 mode. But I didn't add deprecated attributes to
most of them, so users didn't get any warning they would be going away.
Apparently some poor souls do actually use some of these names, and so
now that GCC 11 defaults to -std=gnu++17 some code has stopped
compiling.

This adds deprecated attributes to them, so that C++98/03/11/14 code
will get a warning if it uses them. I'll also backport this to the
release branches so that users can find out about the deprecation before
they start using C++17.

In order to give deprecated warnings even in C++98 mode this patch makes
_GLIBCXX_DEPRECATED work even for C++98, adds _GLIBCXX11_DEPRECATED for
the old meaning of _GLIBCXX_DEPRECATED, and adds new macros such as
_GLIBCXX_DEPRECATED_SUGGEST for suggesting alternatives to deprecated
features.

libstdc++-v3/ChangeLog:

* include/bits/c++config (_GLIBCXX_DEPRECATED): Define for all
standard modes.
(_GLIBCXX_DEPRECATED_SUGGEST): New macro for "use 'foo' instead"
message in deprecated warnings.
(_GLIBCXX11_DEPRECATED, _GLIBCXX11_DEPRECATED_SUGGEST): New
macros for marking features derpecated in C++11.
(_GLIBCXX17_DEPRECATED_SUGGEST, _GLIBCXX20_DEPRECATED_SUGGEST):
New macros.
* include/backward/auto_ptr.h (auto_ptr_ref, auto_ptr):
Use _GLIBCXX11_DEPRECATED instead of _GLIBCXX_DEPRECATED.
(auto_ptr): Use _GLIBCXX11_DEPRECATED_SUGGEST.
* include/backward/binders.h (binder1st, binder2nd): Likewise.
* include/bits/ios_base.h (io_state, open_mode, seek_dir)
(streampos, streamoff): Use _GLIBCXX_DEPRECATED_SUGGEST.
* include/std/streambuf (stossc): Replace C++11 attribute
with _GLIBCXX_DEPRECATED_SUGGEST.
* include/std/type_traits (__is_nullptr_t): Use
_GLIBCXX_DEPRECATED_SUGGEST instead of _GLIBCXX_DEPRECATED.
* testsuite/27_io/types/1.cc: Check for deprecated warnings.
Also check for io_state, open_mode and seek_dir typedefs.


And this adds the new macros to the comment before their definitions.

Tested powerpc64le-linux. Committed to trunk.

commit 1e235788bbfc41f3eec1bb665d8e4bb2c63a1982
Author: Jonathan Wakely 
Date:   Wed Aug 19 12:42:02 2020

libstdc++: Mention new macros in comments

libstdc++-v3/ChangeLog:

* include/bits/c++config (_GLIBCXX_DEPRECATED_SUGGEST)
(_GLIBCXX11_DEPRECATED, _GLIBCXX11_DEPRECATED_SUGGEST)
(_GLIBCXX17_DEPRECATED_SUGGEST, _GLIBCXX20_DEPRECATED_SUGGEST):
Add new macros to comment.

diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index 116faf64441..de28acea6b7 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -77,8 +77,13 @@
 // Macros for deprecated attributes.
 //   _GLIBCXX_USE_DEPRECATED
 //   _GLIBCXX_DEPRECATED
+//   _GLIBCXX_DEPRECATED_SUGGEST( string-literal )
+//   _GLIBCXX11_DEPRECATED
+//   _GLIBCXX11_DEPRECATED_SUGGEST( string-literal )
 //   _GLIBCXX17_DEPRECATED
+//   _GLIBCXX17_DEPRECATED_SUGGEST( string-literal )
 //   _GLIBCXX20_DEPRECATED( string-literal )
+//   _GLIBCXX20_DEPRECATED_SUGGEST( string-literal )
 #ifndef _GLIBCXX_USE_DEPRECATED
 # define _GLIBCXX_USE_DEPRECATED 1
 #endif


Re: [PATCH v2] libgcc: Use `-fasynchronous-unwind-tables' for LIB2_DIVMOD_FUNCS

2020-08-19 Thread Richard Earnshaw
On 19/08/2020 13:54, Maciej W. Rozycki via Gcc-patches wrote:
> On Tue, 18 Aug 2020, Richard Earnshaw wrote:
> 
>>> Complement commit b932f770f70d ("x86_64 frame unwind info"), SVN r46374, 
>>> , and replace 
>>> `-fexceptions -fnon-call-exceptions' with `-fasynchronous-unwind-tables' 
>>> in LIB2_DIVMOD_FUNCS compilation flags so as to provide unwind tables 
>>> for the affected functions while not pulling the unwinder proper, which 
>>> is not required here.
>>>
>>> Remove the ARM overrides accordingly, retaining the hook infrastructure 
>>> however, and make the ARM test case a generic one.
> [...]
>> From a quick glance, I'm not convinced this is right for Arm, since the
>> Arm unwind format does not support anything other than call-based
>> exceptions.  How did you test it?
> 
>  Surely no ARM target has been verified as I have no access to such 
> configurations; I will appreciate if either you or someone else suitably 
> equipped does that for the sake of cross-target code unification (= fewer 
> special cases to maintain).  This has been regression-tested with RISC-V 
> and x86-64 targets, as noted in the two submissions.
> 
>  Are you trying to say that `-fasynchronous-unwind-tables' has no effect 
> on ARM?  This code does not throw exceptions, so any unwinding would only 
> happen in contexts such as in GDB poking at this code or from a signal 
> handler such as SIGALRM or SIGFPE (if ARM does ever send the latter signal 
> for integer division operations; I don't know offhand).  The GCC option is 
> generic and is supposed to fully support such use cases regardless of the 
> target chosen, so shouldn't the ARM backend be wired appropriately so as 
> to use whatever unwind format is required to handle the use cases, 
> regardless of whether the minimal format usually used is supported by the 
> psABI or not?
> 
>  There is indeed a documented provision for not supporting the option: 
> "Generate unwind table in DWARF format, if supported by target machine." 
> however I infer that refers to the support of the DWARF format as a whole 
> rather than specifically minimal unwind tables, that is if the DWARF 
> format is supported (as opposed to say stabs or mdebug only), then the 
> option shall generate an unwind table in that format.
> 
>  That said I'm of course happy to keep the ARM overrides if you consider 
> them still necessary in the context of the generic change made.  Let me 
> know what you prefer, and if required, I will submit v3 with the ARM 
> pieces removed.
> 
>   Maciej
> 

So you've made a change to the Arm target, but not tested it.  And
what's more didn't even bother to mention that fact.

If you make changes, you need to test them, particularly when there are
likely to be target-specific implications.  If you can't test yourself
then you need to make that very clear in your submission.

There are Arm targets in the testfarm, so it's not really an excuse for
not doing testing.

R.


PING [PATCH] x86: Inline strncmp only with -minline-all-stringops

2020-08-19 Thread H.J. Lu via Gcc-patches
On Wed, Jul 15, 2020 at 10:42:27AM -0700, H.J. Lu wrote:
> Expand strncmp to "repz cmpsb" only with -minline-all-stringops since
> "repz cmpsb" can be much slower than strncmp function implemented with
> vector instructions, see
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
> 
> gcc/
> 
>   PR target/95458
>   * config/i386/i386-expand.c (ix86_expand_cmpstrn_or_cmpmem):
>   Return false for -mno-inline-all-stringops.
> 
> gcc/testsuite/
> 
>   PR target/95458
>   * gcc.target/i386/pr95458-1.c: New test.
>   * gcc.target/i386/pr95458-2.c: Likewise.

Expand strncmp to "repz cmpsb" only with -minline-all-stringops since
"repz cmpsb" can be much slower than strncmp function implemented with
vector instructions, see

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052

gcc/

PR target/95458
* config/i386/i386.md (cmpstrnsi): Expand only with
TARGET_INLINE_ALL_STRINGOPS.

gcc/testsuite/

PR target/95458
* gcc.target/i386/pr95458-1.c: New test.
* gcc.target/i386/pr95458-2.c: Likewise.
---
 gcc/config/i386/i386.md   |  8 +++-
 gcc/testsuite/gcc.target/i386/pr95458-1.c | 11 +++
 gcc/testsuite/gcc.target/i386/pr95458-2.c |  7 +++
 3 files changed, 25 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr95458-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr95458-2.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index fb677e17817..f3fbed81c4a 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -18007,7 +18007,13 @@ (define_expand "cmpstrnsi"
 {
   rtx addr1, addr2, countreg, align, out;
 
-  if (optimize_insn_for_size_p () && !TARGET_INLINE_ALL_STRINGOPS)
+  /* Expand strncmp only with -minline-all-stringops since
+ "repz cmpsb" can be much slower than strncmp functions
+ implemented with vector instructions, see
+
+ https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
+   */
+  if (!TARGET_INLINE_ALL_STRINGOPS)
 FAIL;
 
   /* Can't use this if the user has appropriated ecx, esi or edi.  */
diff --git a/gcc/testsuite/gcc.target/i386/pr95458-1.c 
b/gcc/testsuite/gcc.target/i386/pr95458-1.c
new file mode 100644
index 000..231a4787dce
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr95458-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -minline-all-stringops" } */
+
+int
+func (char *d, unsigned int l)
+{
+  return __builtin_strncmp (d, "foo", l) ? 1 : 2;
+}
+
+/* { dg-final { scan-assembler-not "call\[\\t \]*_?strncmp" } } */
+/* { dg-final { scan-assembler "cmpsb" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr95458-2.c 
b/gcc/testsuite/gcc.target/i386/pr95458-2.c
new file mode 100644
index 000..1a620444770
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr95458-2.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-inline-all-stringops" } */
+
+#include "pr95458-1.c"
+
+/* { dg-final { scan-assembler "call\[\\t \]*_?strncmp" } } */
+/* { dg-final { scan-assembler-not "cmpsb" } } */
-- 
2.26.2



PING [PATCH] x86: Add cmpmemsi for -minline-all-stringops

2020-08-19 Thread H.J. Lu via Gcc-patches
On Tue, May 19, 2020 at 5:14 AM H.J. Lu  wrote:
>
> On Tue, May 19, 2020 at 1:48 AM Uros Bizjak  wrote:
> >
> > On Sun, May 17, 2020 at 7:06 PM H.J. Lu  wrote:
> > >
> > > Duplicate the cmpstrn pattern for cmpmem.  The only difference is that
> > > the length argument of cmpmem is guaranteed to be less than or equal to
> > > lengths of 2 memory areas.  Since "repz cmpsb" can be much slower than
> > > memcmp function implemented with vector instruction, see
> > >
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
> > >
> > > expand cmpmem to "repz cmpsb" only with -mgeneral-regs-only.
> >
> > If there is no benefit compared to the library implementation, then
> > enable these patterns only when -minline-all-stringops is used.
>
> Fixed.
>
> > Eventually these should be reimplemented with SSE4 string instructions.
> >
> > Honza is the author of the block handling x86 system, I'll leave the
> > review to him.
>
> We used to expand memcmp to "repz cmpsb" via cmpstrnsi.  It was changed
> by
>
> commit 9b0f6f5e511ca512e4faeabc81d2fd3abad9b02f
> Author: Nick Clifton 
> Date:   Fri Aug 12 16:26:11 2011 +
>
> builtins.c (expand_builtin_memcmp): Do not use cmpstrnsi pattern.
>
> * builtins.c (expand_builtin_memcmp): Do not use cmpstrnsi
> pattern.
> * doc/md.texi (cmpstrn): Note that the comparison stops if both
> fetched bytes are zero.
> (cmpstr): Likewise.
> (cmpmem): Note that the comparison does not stop if both of the
> fetched bytes are zero.
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95151
>
> is a regression.
>
> Honza, can you take a look at this?
>

PING:

https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546921.html

-- 
H.J.


Re: [PATCH v2] libgcc: Use `-fasynchronous-unwind-tables' for LIB2_DIVMOD_FUNCS

2020-08-19 Thread Maciej W. Rozycki via Gcc-patches
On Tue, 18 Aug 2020, Richard Earnshaw wrote:

> > Complement commit b932f770f70d ("x86_64 frame unwind info"), SVN r46374, 
> > , and replace 
> > `-fexceptions -fnon-call-exceptions' with `-fasynchronous-unwind-tables' 
> > in LIB2_DIVMOD_FUNCS compilation flags so as to provide unwind tables 
> > for the affected functions while not pulling the unwinder proper, which 
> > is not required here.
> > 
> > Remove the ARM overrides accordingly, retaining the hook infrastructure 
> > however, and make the ARM test case a generic one.
[...]
> From a quick glance, I'm not convinced this is right for Arm, since the
> Arm unwind format does not support anything other than call-based
> exceptions.  How did you test it?

 Surely no ARM target has been verified as I have no access to such 
configurations; I will appreciate if either you or someone else suitably 
equipped does that for the sake of cross-target code unification (= fewer 
special cases to maintain).  This has been regression-tested with RISC-V 
and x86-64 targets, as noted in the two submissions.

 Are you trying to say that `-fasynchronous-unwind-tables' has no effect 
on ARM?  This code does not throw exceptions, so any unwinding would only 
happen in contexts such as in GDB poking at this code or from a signal 
handler such as SIGALRM or SIGFPE (if ARM does ever send the latter signal 
for integer division operations; I don't know offhand).  The GCC option is 
generic and is supposed to fully support such use cases regardless of the 
target chosen, so shouldn't the ARM backend be wired appropriately so as 
to use whatever unwind format is required to handle the use cases, 
regardless of whether the minimal format usually used is supported by the 
psABI or not?

 There is indeed a documented provision for not supporting the option: 
"Generate unwind table in DWARF format, if supported by target machine." 
however I infer that refers to the support of the DWARF format as a whole 
rather than specifically minimal unwind tables, that is if the DWARF 
format is supported (as opposed to say stabs or mdebug only), then the 
option shall generate an unwind table in that format.

 That said I'm of course happy to keep the ARM overrides if you consider 
them still necessary in the context of the generic change made.  Let me 
know what you prefer, and if required, I will submit v3 with the ARM 
pieces removed.

  Maciej


Re: [Patch] Fortran: Add 'device_type' clause to OpenMP's declare target

2020-08-19 Thread Tobias Burnus
Hi Andre,

thanks for the comments.

Am 18.08.20 um 19:33 schrieb Andre Vehreschild:

> + case OMP_DEVICE_TYPE_HOST:
> +   MIO_NAME (ab_attribute) (AB_OMP_DEVICE_TYPE_NOHOST, attr_bits);
> Why also NOHOST here?

Copy and paste error. Well spotted. Thanks!
(I wonder why it didn't show up in the testcase;
probably because I generated the module in the same
translation unit, I'd guess.)

> @@ -426,6 +426,8 @@ build_common_decl (gfc_common_head *com, tree union_type,
> bool is_init) /* If there is no backend_decl for the common block, build it.  
> */
>if (decl == NULL_TREE)
>  {
> +  tree clauses = NULL_TREE;
> Would you mind using "omp_clauses" or the like here?

I thought about this – but due to indentation, I think I
used 'clauses'. But looking again at the patch, this
must have been either 'c' or for some other patch as
"omp_clauses" should work as well.

I will later update the patch for the items.

Tobias



Re: [committed] libstdc++: Add deprecated attributes to old iostream members

2020-08-19 Thread Jonathan Wakely via Gcc-patches

On 19/08/20 12:29 +0100, Jonathan Wakely wrote:

Back in 2017 I removed these prehistoric members (which were deprecated
since C++98) for C++17 mode. But I didn't add deprecated attributes to
most of them, so users didn't get any warning they would be going away.
Apparently some poor souls do actually use some of these names, and so
now that GCC 11 defaults to -std=gnu++17 some code has stopped
compiling.

This adds deprecated attributes to them, so that C++98/03/11/14 code
will get a warning if it uses them. I'll also backport this to the
release branches so that users can find out about the deprecation before
they start using C++17.

In order to give deprecated warnings even in C++98 mode this patch makes
_GLIBCXX_DEPRECATED work even for C++98, adds _GLIBCXX11_DEPRECATED for
the old meaning of _GLIBCXX_DEPRECATED, and adds new macros such as
_GLIBCXX_DEPRECATED_SUGGEST for suggesting alternatives to deprecated
features.

libstdc++-v3/ChangeLog:

* include/bits/c++config (_GLIBCXX_DEPRECATED): Define for all
standard modes.
(_GLIBCXX_DEPRECATED_SUGGEST): New macro for "use 'foo' instead"
message in deprecated warnings.
(_GLIBCXX11_DEPRECATED, _GLIBCXX11_DEPRECATED_SUGGEST): New
macros for marking features derpecated in C++11.


I'll try to remember to fix the "derpecated" typo in the ChangeLog
tomorrow.


(_GLIBCXX17_DEPRECATED_SUGGEST, _GLIBCXX20_DEPRECATED_SUGGEST):
New macros.
* include/backward/auto_ptr.h (auto_ptr_ref, auto_ptr):
Use _GLIBCXX11_DEPRECATED instead of _GLIBCXX_DEPRECATED.
(auto_ptr): Use _GLIBCXX11_DEPRECATED_SUGGEST.
* include/backward/binders.h (binder1st, binder2nd): Likewise.
* include/bits/ios_base.h (io_state, open_mode, seek_dir)
(streampos, streamoff): Use _GLIBCXX_DEPRECATED_SUGGEST.
* include/std/streambuf (stossc): Replace C++11 attribute
with _GLIBCXX_DEPRECATED_SUGGEST.
* include/std/type_traits (__is_nullptr_t): Use
_GLIBCXX_DEPRECATED_SUGGEST instead of _GLIBCXX_DEPRECATED.
* testsuite/27_io/types/1.cc: Check for deprecated warnings.
Also check for io_state, open_mode and seek_dir typedefs.

Tested powerpc64le-linux. Committed to trunk.


The attached patch is the backport I've pushed to the gcc-10 branch,
which only adds the new _GLIBCXX_DEPRECATED_SUGGEST macro and doesn't
enable it for C++98.

The version for gcc-9 is the same but doesn't touch the __is_nullptr_t
trait, because GCC 10.1.0 was the first release to deprecate that.


commit 817ac30a8581f75cc25c4c71ef8623eb52953390
Author: Jonathan Wakely 
Date:   Wed Aug 19 13:41:26 2020

libstdc++: Add deprecated attributes to old iostream members

Back in 2017 I removed these prehistoric members (which were deprecated
since C++98) for C++17 mode. But I didn't add deprecated attributes to
most of them, so users didn't get any warning they would be going away.
Apparently some poor souls do actually use some of these names, and so
now that GCC 11 defaults to -std=gnu++17 some code has stopped
compiling.

This adds deprecated attributes to them, so that C++98/03/11/14 code
will get a warning if it uses them. I'll also backport this to the
release branches so that users can find out about the deprecation before
they start using C++17.

libstdc++-v3/ChangeLog:

* include/bits/c++config (_GLIBCXX_DEPRECATED_SUGGEST): New
macro for "use 'foo' instead" message in deprecated warnings.
* include/bits/ios_base.h (io_state, open_mode, seek_dir)
(streampos, streamoff): Use _GLIBCXX_DEPRECATED_SUGGEST.
* include/std/streambuf (stossc): Replace C++11 attribute
with _GLIBCXX_DEPRECATED_SUGGEST.
* include/std/type_traits (__is_nullptr_t): Use
_GLIBCXX_DEPRECATED_SUGGEST instead of _GLIBCXX_DEPRECATED.
* testsuite/27_io/types/1.cc: Check for deprecated warnings.
Also check for io_state, open_mode and seek_dir typedefs.

(cherry picked from commit eef9bf4ca8d90a1751bc4bff03722ee68999eb8e)

diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index b1fad59d4b3..aa94a681fff 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -77,6 +77,7 @@
 // Macros for deprecated attributes.
 //   _GLIBCXX_USE_DEPRECATED
 //   _GLIBCXX_DEPRECATED
+//   _GLIBCXX_DEPRECATED_SUGGEST
 //   _GLIBCXX17_DEPRECATED
 //   _GLIBCXX20_DEPRECATED( string-literal )
 #ifndef _GLIBCXX_USE_DEPRECATED
@@ -85,8 +86,11 @@
 
 #if defined(__DEPRECATED) && (__cplusplus >= 201103L)
 # define _GLIBCXX_DEPRECATED __attribute__ ((__deprecated__))
+# define _GLIBCXX_DEPRECATED_SUGGEST(ALT) \
+  __attribute__ ((__deprecated__ ("use '" ALT "' instead")))
 #else
 # define _GLIBCXX_DEPRECATED
+# define _GLIBCXX_DEPRECATED_SUGGEST(ALT)
 #endif
 
 #if 

Re: [PATCH] vxworks: Fix GCC selftests for *-wrs-vxworks7-* targets

2020-08-19 Thread Iain Buclaw via Gcc-patches
Excerpts from Olivier Hainque's message of August 18, 2020 2:25 pm:
> Hi Iain,
> 
>> On 18 Aug 2020, at 13:45, Iain Buclaw  wrote:
>> 
>> Attached is the change as per your proposal.
>> 
>>  * config/vxworks.h (VXWORKS_ADDITIONAL_CPP_SPEC): Replace -nostdinc
>>  with -fself-tests.
>> #undef VXWORKS_ADDITIONAL_CPP_SPEC
>> #define VXWORKS_ADDITIONAL_CPP_SPEC \
>> - "%{!nostdinc:  \
>> + "%{!fself-test=*:  \
>> %{isystem*} \
>> %{mrtp: -idirafter %:getenv(VSB_DIR /h) \
>> -idirafter %:getenv(VSB_DIR /share/h)   \
>> @@ -55,7 +60,7 @@ along with GCC; see the file COPYING3.  If not see
>> 
>> #undef VXWORKS_ADDITIONAL_CPP_SPEC
>> #define VXWORKS_ADDITIONAL_CPP_SPEC  \
>> - "%{!nostdinc:  \
>> + "%{!fself-test=*:  \
> 
> Thanks for the updated proposal.
> 
> Sorry, I have been unclear: If I'm reading the spec of
> -nostdinc correctly, I think we should still prevent those
> CPP switches from being added if the option is provided.
> 

Ah, no worries, attached updated patch.

> Can you please amend just this part to prevent the addition
> of the following switches if either -nostdinc or -fself-test
> is provided ?
> 
> Ok, for me with this change, assuming -nostdinc in
> SELFTEST_FLAGS didn't have other uses than the one documented
> in the attached comment (I'm not familiar enough with the
> self-tests to know for sure).
> 

The introducing commit is cf7bb33f4d9532dc7ea2551acbf888341b8f12ce, the
reliance on -nostdinc might have changed in the meantime though.

[Update]
As we have discussed this off the lists though, we agreed to compromise
and leave -nostdinc as it is in SELFTEST_FLAGS.

Iain.

---
gcc/ChangeLog:

* config/vxworks.h (VXWORKS_ADDITIONAL_CPP_SPEC): Don't include
VxWorks header files if -fself-tests is used.
(STARTFILE_PREFIX_SPEC): Avoid using VSB_DIR if -fself-tests is used.
---
diff --git a/gcc/config/vxworks.h b/gcc/config/vxworks.h
index d648d2f23cb..e50260b08e4 100644
--- a/gcc/config/vxworks.h
+++ b/gcc/config/vxworks.h
@@ -36,11 +36,16 @@ along with GCC; see the file COPYING3.  If not see
 
 /* Since we provide a default -isystem, expand -isystem on the command
line early.  */
+
+/* Self-tests may be run in contexts where the VxWorks environment isn't
+   available.  Prevent attempts at designating the location of runtime header
+   files, libraries or startfiles, which would fail on unset environment
+   variables and aren't needed for such tests.  */
 #if TARGET_VXWORKS7
 
 #undef VXWORKS_ADDITIONAL_CPP_SPEC
 #define VXWORKS_ADDITIONAL_CPP_SPEC \
- "%{!nostdinc:  \
+ "%{!nostdinc:%{!fself-test=*:  \
 %{isystem*} \
 %{mrtp: -idirafter %:getenv(VSB_DIR /h) \
 -idirafter %:getenv(VSB_DIR /share/h)   \
@@ -49,19 +54,19 @@ along with GCC; see the file COPYING3.  If not see
   ;:-idirafter %:getenv(VSB_DIR /h) \
 -idirafter %:getenv(VSB_DIR /share/h)   \
 -idirafter %:getenv(VSB_DIR /krnl/h/system) \
--idirafter %:getenv(VSB_DIR /krnl/h/public)}}"
+-idirafter %:getenv(VSB_DIR /krnl/h/public)}}}"
 
 #else /* TARGET_VXWORKS7 */
 
 #undef VXWORKS_ADDITIONAL_CPP_SPEC
 #define VXWORKS_ADDITIONAL_CPP_SPEC\
- "%{!nostdinc: \
+ "%{!nostdinc:%{!fself-test=*: \
 %{isystem*}\
 %{mrtp: -idirafter %:getenv(WIND_USR /h)   \
-idirafter %:getenv(WIND_USR /h/wrn/coreip) \
   ;:-idirafter %:getenv(WIND_BASE /target/h) \
-idirafter %:getenv(WIND_BASE /target/h/wrn/coreip) \
-}}"
+}}}"
 
 #endif
 
@@ -108,7 +113,8 @@ along with GCC; see the file COPYING3.  If not see
 
 #if TARGET_VXWORKS7
 #undef  STARTFILE_PREFIX_SPEC
-#define STARTFILE_PREFIX_SPEC "%:getenv(VSB_DIR /usr/lib/common)"
+#define STARTFILE_PREFIX_SPEC \
+  "%{!fself-test=*:%:getenv(VSB_DIR /usr/lib/common)}"
 #define TLS_SYM "-u __tls__"
 #else
 #define TLS_SYM ""



Re: [PATCH] bb-reorder: Remove a misfiring micro-optimization (PR96475)

2020-08-19 Thread Richard Sandiford
Segher Boessenkool  writes:
> When the compgotos pass copies the tail of blocks ending in an indirect
> jump, there is a micro-optimization to not copy the last one, since the
> original block will then just be deleted.  This does not work properly
> if cleanup_cfg does not merge all pairs of blocks we expect it to.
>
>
> v2: This also deletes the other use of single_pred_p, which has the same
> problem in principle, I just never have triggered it so far.

Could you go into more details?  I thought the reason you wanted to
remove the test in the loop was that each successful iteration of the
loop removes one incoming edge, meaning that if all previous iterations
succeed, the final iteration is bound to see a single predecessor.
And there's no guarantee in that case that the jump in the final
block will be removed.  I.e. the loop is unnecessarily leaving
cfgcleanup to finish off the last step for it.  So I agree that the
loop part looks good.  (I assume that was v1, but I missed it, sorry.)

But it looks like the point of the initial test is to avoid “duplicating”
the block if there would still only be one copy of the code.  That test
still seems reasonable since the case it rejects should be handled by
other cfg cleanups.  If for some reason it isn't, there doesn't seem
to be any reason to apply the max_size limit for a single predecessor,
since no code duplication will take place.  So ISTM that removing the
first condition is really a separate thing that should be its own patch.

Thanks,
Richard

>
> Tested on powerpc64-linux {-m32,-m64} like before.  Is this okay for
> trunk?
>
>
> Segher
>
>
> 2020-08-07  Segher Boessenkool  
>
>   PR rtl-optimization/96475
>   * bb-reorder.c (maybe_duplicate_computed_goto): Remove single_pred_p
>   micro-optimization.
> ---
>  gcc/bb-reorder.c | 10 +++---
>  1 file changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
> index c635010..76e56b5 100644
> --- a/gcc/bb-reorder.c
> +++ b/gcc/bb-reorder.c
> @@ -2680,9 +2680,6 @@ make_pass_reorder_blocks (gcc::context *ctxt)
>  static bool
>  maybe_duplicate_computed_goto (basic_block bb, int max_size)
>  {
> -  if (single_pred_p (bb))
> -return false;
> -
>/* Make sure that the block is small enough.  */
>rtx_insn *insn;
>FOR_BB_INSNS (bb, insn)
> @@ -2700,10 +2697,9 @@ maybe_duplicate_computed_goto (basic_block bb, int 
> max_size)
>  {
>basic_block pred = e->src;
>  
> -  /* Do not duplicate BB into PRED if that is the last predecessor, or if
> -  we cannot merge a copy of BB with PRED.  */
> -  if (single_pred_p (bb)
> -   || !single_succ_p (pred)
> +  /* Do not duplicate BB into PRED if we cannot merge a copy of BB
> +  with PRED.  */
> +  if (!single_succ_p (pred)
> || e->flags & EDGE_COMPLEX
> || pred->index < NUM_FIXED_BLOCKS
> || (JUMP_P (BB_END (pred)) && !simplejump_p (BB_END (pred)))


Re: [PATCH] [AVX512]For vector compare to mask register, UNSPEC is needed instead of comparison operator [PR96243]

2020-08-19 Thread Hongtao Liu via Gcc-patches
ping^1

On Tue, Aug 11, 2020 at 5:43 PM Hongtao Liu  wrote:
>
> Hi:
>   The issue is described in the bugzilla.
>   Bootstrap is ok, regression test for i386/x86-64 backend is ok.
>  Ok for trunk?
>
> ChangeLog
> gcc/
> PR target/96551
> * config/i386/sse.md (vec_unpacku_float_hi_v16si): For vector
> compare to integer mask, don't use gen_rtx_LT , use
> ix86_expand_mask_vec_cmp instead.
> (vec_unpacku_float_hi_v16si): Ditto.
>
> gcc/testsuite
> * gcc.target/i386/pr96551-1.c: New test.
> * gcc.target/i386/pr96551-2.c: New test.
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao


[committed] libstdc++: Add deprecated attributes to old iostream members

2020-08-19 Thread Jonathan Wakely via Gcc-patches
Back in 2017 I removed these prehistoric members (which were deprecated
since C++98) for C++17 mode. But I didn't add deprecated attributes to
most of them, so users didn't get any warning they would be going away.
Apparently some poor souls do actually use some of these names, and so
now that GCC 11 defaults to -std=gnu++17 some code has stopped
compiling.

This adds deprecated attributes to them, so that C++98/03/11/14 code
will get a warning if it uses them. I'll also backport this to the
release branches so that users can find out about the deprecation before
they start using C++17.

In order to give deprecated warnings even in C++98 mode this patch makes
_GLIBCXX_DEPRECATED work even for C++98, adds _GLIBCXX11_DEPRECATED for
the old meaning of _GLIBCXX_DEPRECATED, and adds new macros such as
_GLIBCXX_DEPRECATED_SUGGEST for suggesting alternatives to deprecated
features.

libstdc++-v3/ChangeLog:

* include/bits/c++config (_GLIBCXX_DEPRECATED): Define for all
standard modes.
(_GLIBCXX_DEPRECATED_SUGGEST): New macro for "use 'foo' instead"
message in deprecated warnings.
(_GLIBCXX11_DEPRECATED, _GLIBCXX11_DEPRECATED_SUGGEST): New
macros for marking features derpecated in C++11.
(_GLIBCXX17_DEPRECATED_SUGGEST, _GLIBCXX20_DEPRECATED_SUGGEST):
New macros.
* include/backward/auto_ptr.h (auto_ptr_ref, auto_ptr):
Use _GLIBCXX11_DEPRECATED instead of _GLIBCXX_DEPRECATED.
(auto_ptr): Use _GLIBCXX11_DEPRECATED_SUGGEST.
* include/backward/binders.h (binder1st, binder2nd): Likewise.
* include/bits/ios_base.h (io_state, open_mode, seek_dir)
(streampos, streamoff): Use _GLIBCXX_DEPRECATED_SUGGEST.
* include/std/streambuf (stossc): Replace C++11 attribute
with _GLIBCXX_DEPRECATED_SUGGEST.
* include/std/type_traits (__is_nullptr_t): Use
_GLIBCXX_DEPRECATED_SUGGEST instead of _GLIBCXX_DEPRECATED.
* testsuite/27_io/types/1.cc: Check for deprecated warnings.
Also check for io_state, open_mode and seek_dir typedefs.

Tested powerpc64le-linux. Committed to trunk.

commit eef9bf4ca8d90a1751bc4bff03722ee68999eb8e
Author: Jonathan Wakely 
Date:   Wed Aug 19 12:13:03 2020

libstdc++: Add deprecated attributes to old iostream members

Back in 2017 I removed these prehistoric members (which were deprecated
since C++98) for C++17 mode. But I didn't add deprecated attributes to
most of them, so users didn't get any warning they would be going away.
Apparently some poor souls do actually use some of these names, and so
now that GCC 11 defaults to -std=gnu++17 some code has stopped
compiling.

This adds deprecated attributes to them, so that C++98/03/11/14 code
will get a warning if it uses them. I'll also backport this to the
release branches so that users can find out about the deprecation before
they start using C++17.

In order to give deprecated warnings even in C++98 mode this patch makes
_GLIBCXX_DEPRECATED work even for C++98, adds _GLIBCXX11_DEPRECATED for
the old meaning of _GLIBCXX_DEPRECATED, and adds new macros such as
_GLIBCXX_DEPRECATED_SUGGEST for suggesting alternatives to deprecated
features.

libstdc++-v3/ChangeLog:

* include/bits/c++config (_GLIBCXX_DEPRECATED): Define for all
standard modes.
(_GLIBCXX_DEPRECATED_SUGGEST): New macro for "use 'foo' instead"
message in deprecated warnings.
(_GLIBCXX11_DEPRECATED, _GLIBCXX11_DEPRECATED_SUGGEST): New
macros for marking features derpecated in C++11.
(_GLIBCXX17_DEPRECATED_SUGGEST, _GLIBCXX20_DEPRECATED_SUGGEST):
New macros.
* include/backward/auto_ptr.h (auto_ptr_ref, auto_ptr):
Use _GLIBCXX11_DEPRECATED instead of _GLIBCXX_DEPRECATED.
(auto_ptr): Use _GLIBCXX11_DEPRECATED_SUGGEST.
* include/backward/binders.h (binder1st, binder2nd): Likewise.
* include/bits/ios_base.h (io_state, open_mode, seek_dir)
(streampos, streamoff): Use _GLIBCXX_DEPRECATED_SUGGEST.
* include/std/streambuf (stossc): Replace C++11 attribute
with _GLIBCXX_DEPRECATED_SUGGEST.
* include/std/type_traits (__is_nullptr_t): Use
_GLIBCXX_DEPRECATED_SUGGEST instead of _GLIBCXX_DEPRECATED.
* testsuite/27_io/types/1.cc: Check for deprecated warnings.
Also check for io_state, open_mode and seek_dir typedefs.

diff --git a/libstdc++-v3/include/backward/auto_ptr.h 
b/libstdc++-v3/include/backward/auto_ptr.h
index 85116364cd2..0863a0804d1 100644
--- a/libstdc++-v3/include/backward/auto_ptr.h
+++ b/libstdc++-v3/include/backward/auto_ptr.h
@@ -51,7 +51,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   
   explicit
   auto_ptr_ref(_Tp1* __p): _M_ptr(__p) { }
-} _GLIBCXX_DEPRECATED;
+} _GLIBCXX11_DEPRECATED;
 
 #pragma GCC 

Re: [PATCH] PR libstdc++/71579 assert that type traits are not misused with an incomplete type

2020-08-19 Thread Jonathan Wakely via Gcc-patches

On 12/08/20 11:19 +0300, Antony Polukhin via Libstdc++ wrote:

Fixed patch for type traits hardening

Changelog

2020-08-12  Antony Polukhin  

   PR libstdc/71579
   * include/std/type_traits (invoke_result, is_nothrow_invocable_r)
   Add static_asserts to make sure that the argument of the type
   trait is not misused with incomplete types.
   (is_swappable_with, is_nothrow_swappable_with): Add static_asserts
   to make sure that the first and second arguments of the type trait
   are not misused with incomplete types.
   * testsuite/20_util/invoke_result/incomplete_neg.cc: New test.
   * testsuite/20_util/is_nothrow_invocable/incomplete_neg.cc: New test.
   * testsuite/20_util/is_nothrow_swappable/incomplete_neg.cc: New test.
   * testsuite/20_util/is_nothrow_swappable_with/incomplete_neg.cc: New
   test.
   * testsuite/20_util/is_swappable_with/incomplete_neg.cc: New test.


Thanks, pushed to master.

Do we also want to check
(std::__is_complete_or_unbounded(__type_identity<_ArgTypes>{}) && ...)
for invoke_result and the is_invocable traits?

We can use a fold expression there, because those traits are not
defined before C++17.




Re: [PATCH] Using gen_int_mode instead of GEN_INT to avoid ICE caused by type promotion.

2020-08-19 Thread Hongtao Liu via Gcc-patches
ping ^ 4, it's a very simple fix for ICE.

On Mon, Aug 10, 2020 at 6:00 PM Hongtao Liu  wrote:
>
> Ping^3
>
> On Tue, Aug 4, 2020 at 4:21 PM Hongtao Liu  wrote:
> >
> > ping ^2
> >
> > On Mon, Jul 27, 2020 at 5:31 PM Hongtao Liu  wrote:
> > >
> > > ping
> > >
> > > On Wed, Jul 22, 2020 at 3:57 PM Hongtao Liu  wrote:
> > > >
> > > >   Bootstrap is ok, regression test is ok for i386 backend.
> > > >
> > > > gcc/
> > > > PR target/96262
> > > > * config/i386/i386-expand.c
> > > > (ix86_expand_vec_shift_qihi_constant): Refine.
> > > >
> > > > gcc/testsuite/
> > > > * gcc.target/i386/pr96262-1.c: New test.
> > > >
> > > > ---
> > > >  gcc/config/i386/i386-expand.c |  6 +++---
> > > >  gcc/testsuite/gcc.target/i386/pr96262-1.c | 11 +++
> > > >  2 files changed, 14 insertions(+), 3 deletions(-)
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr96262-1.c
> > > >
> > > > diff --git a/gcc/config/i386/i386-expand.c 
> > > > b/gcc/config/i386/i386-expand.c
> > > > index e194214804b..d57d043106a 100644
> > > > --- a/gcc/config/i386/i386-expand.c
> > > > +++ b/gcc/config/i386/i386-expand.c
> > > > @@ -19537,7 +19537,7 @@ bool
> > > >  ix86_expand_vec_shift_qihi_constant (enum rtx_code code, rtx dest,
> > > > rtx op1, rtx op2)
> > > >  {
> > > >machine_mode qimode, himode;
> > > > -  unsigned int and_constant, xor_constant;
> > > > +  HOST_WIDE_INT and_constant, xor_constant;
> > > >HOST_WIDE_INT shift_amount;
> > > >rtx vec_const_and, vec_const_xor;
> > > >rtx tmp, op1_subreg;
> > > > @@ -19612,7 +19612,7 @@ ix86_expand_vec_shift_qihi_constant (enum
> > > > rtx_code code, rtx dest, rtx op1, rtx
> > > >emit_move_insn (dest, simplify_gen_subreg (qimode, tmp, himode, 0));
> > > >emit_move_insn (vec_const_and,
> > > >   ix86_build_const_vector (qimode, true,
> > > > -  GEN_INT (and_constant)));
> > > > +  gen_int_mode (and_constant,
> > > > QImode)));
> > > >emit_insn (gen_and (dest, dest, vec_const_and));
> > > >
> > > >/* For ASHIFTRT, perform extra operation like
> > > > @@ -19623,7 +19623,7 @@ ix86_expand_vec_shift_qihi_constant (enum
> > > > rtx_code code, rtx dest, rtx op1, rtx
> > > >vec_const_xor = gen_reg_rtx (qimode);
> > > >emit_move_insn (vec_const_xor,
> > > >   ix86_build_const_vector (qimode, true,
> > > > -  GEN_INT (xor_constant)));
> > > > +  gen_int_mode
> > > > (xor_constant, QImode)));
> > > >emit_insn (gen_xor (dest, dest, vec_const_xor));
> > > >emit_insn (gen_sub (dest, dest, vec_const_xor));
> > > >  }
> > > > diff --git a/gcc/testsuite/gcc.target/i386/pr96262-1.c
> > > > b/gcc/testsuite/gcc.target/i386/pr96262-1.c
> > > > new file mode 100644
> > > > index 000..1825388072e
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/pr96262-1.c
> > > > @@ -0,0 +1,11 @@
> > > > +/* PR target/96262 */
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-mavx512bw -O" } */
> > > > +
> > > > +typedef char __attribute__ ((__vector_size__ (64))) V;
> > > > +
> > > > +V
> > > > +foo (V v)
> > > > +{
> > > > +  return ~(v << 1);
> > > > +}
> > > > --
> > > >
> > > > --
> > > > BR,
> > > > Hongtao
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao


Re: [PATCH] PR target/96347: non-delegitimized UNSPEC UNSPEC_TP (19) found in variable location

2020-08-19 Thread Richard Sandiford
Iain Buclaw via Gcc-patches  writes:
> Hi,
>
> On x86, a memory reference reference to a TLS address can be broken out
> and stored in a register, for instance:
>
> movq  %fs:8+testYearsBC@tpoff, %rdx
>
> Subsequently becomes:
>
> pushq %rbp
> leaq  8+testYearsBC@tpoff, %rbp
> // later...
> movq  %fs:0(%rbp), %rdx
>
> When it comes to representing this in Dwarf, mem_loc_descriptor is left
> with the following RTL, which ix86_delegitimize_tls_address is unable to
> handle as the symbol_ref has been swapped out with the temporary.
>
> (plus:DI (unspec:DI [
>   (const_int 0 [0])
>   ] UNSPEC_TP)
>   (reg/f:DI 6 bp [90]))
>
> The UNSPEC_TP expression is checked, ix86_const_not_ok_for_debug_p
> returns false as it only lets through UNSPEC_GOTOFF, and finally
> const_ok_for_output is either not smart enough or does not have the
> information needed to recognize that UNSPEC_TP is a TLS UNSPEC that
> should be ignored.  This results in informational warnings being fired
> under -fchecking.
>
> The entrypoint that triggers this warning to occur is that MEM codes are
> first lowered with mem_loc_descriptor, with tls_mem_loc_descriptor only
> being used if that failed.  This patch switches that around so that TLS
> memory expressions first call tls_mem_loc_descriptor, and only use
> mem_loc_descriptor if that fails (which may result in UNSPEC warnings).
>
> Bootstrapped and regression tested on x86_64-linux-gnu, I do also have a
> sparc64-linux-gnu build at hand, but have not yet got the results to
> check for a before/after comparison.
>
> OK for mainline?
>
> Regards
> Iain
>
> ---
> gcc/ChangeLog:
>
>   PR target/96347
>   * dwarf2out.c (is_tls_mem_loc): New.
>   (mem_loc_descriptor): Call tls_mem_loc_descriptor on TLS memory
>   expressions, fallback to mem_loc_descriptor if unsuccessful.
>   (loc_descriptor): Likewise.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/96347
>   * g++.dg/other/pr96347.C: New test.
> ---
>  gcc/dwarf2out.c  | 30 ++
>  gcc/testsuite/g++.dg/other/pr96347.C | 61 
>  2 files changed, 84 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/other/pr96347.C
>
> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
> index 9deca031fc2..093ceb23c7a 100644
> --- a/gcc/dwarf2out.c
> +++ b/gcc/dwarf2out.c
> @@ -14435,6 +14435,20 @@ is_based_loc (const_rtx rtl)
>  && CONST_INT_P (XEXP (rtl, 1);
>  }
>  
> +/* Return true if this MEM expression is for a TLS variable.  */
> +
> +static bool
> +is_tls_mem_loc (rtx mem)
> +{
> +  tree base;
> +
> +  if (MEM_EXPR (mem) == NULL_TREE || !MEM_OFFSET_KNOWN_P (mem))
> +return false;
> +
> +  base = get_base_address (MEM_EXPR (mem));
> +  return (base && VAR_P (base) && DECL_THREAD_LOCAL_P (base));
> +}
> +
>  /* Try to handle TLS MEMs, for which mem_loc_descriptor on XEXP (mem, 0)
> failed.  */
>  
> @@ -15671,11 +15685,12 @@ mem_loc_descriptor (rtx rtl, machine_mode mode,
> return mem_loc_result;
> }
>}
> -  mem_loc_result = mem_loc_descriptor (XEXP (rtl, 0),
> -get_address_mode (rtl), mode,
> -VAR_INIT_STATUS_INITIALIZED);
> -  if (mem_loc_result == NULL)
> +  if (is_tls_mem_loc (rtl))
>   mem_loc_result = tls_mem_loc_descriptor (rtl);
> +  if (mem_loc_result == NULL)

Is the is_tls_mem_loc part necessary?  It looks like it's just
repeating the first part of tls_mem_loc_descriptor, and so we
could call that unconditionally instead.

I guess this raises the question: if we swapping the order for
TLS so that MEM_EXPR trumps the MEM address, why shouldn't we take
that approach more generally?  I.e. is there any reason to look at
the MEM_EXPR only when DECL_THREAD_LOCAL_P is true for the base address,
rather than for all symbolic base addresses?

Thanks,
Richard


Re: [PATCH] cmpelim: recognize extra clobbers in insns

2020-08-19 Thread Richard Sandiford
Sorry for the slow reply.

Pip Cet via Gcc-patches  writes:
> I'm working on the AVR cc0 -> CCmode conversion (bug#92729). One
> problem is that the cmpelim pass is currently very strict in requiring
> insns of the form
>
> (parallel [(set (reg:SI) (op:SI ... ...))
>(clobber (reg:CC REG_CC))])
>
> when in fact AVR's insns often have the form
>
> (parallel [(set (reg:SI) (op:SI ... ...))
>(clobber (scratch:QI))
>(clobber (reg:CC REG_CC))])
>
> The attached patch relaxes checks in the cmpelim code to recognize
> such insns, and makes it attempt to recognize
>
> (parallel [(set (reg:CC REG_CC) (compare:CC ... ...))
>(set (reg:SI (op:SI ... ...)))
>(clobber (scratch:QI))])
>
> as a new insn for that example. This appears to work.

The idea looks good.  However, I think it'd be better (or at least
more usual) for the define_insns to list the clobbers the other
way around:

(parallel [(set (reg:SI) (op:SI ... ...))
   (clobber (reg:CC REG_CC))
   (clobber (scratch:QI))])

(clobber (scratch…))s generally come last because any rtl optimisation
pass that uses recog can automatically add any necessary
(clobber (scratch…))s.  In contrast, very few passes (probably just
combine) know how to add (clobber (reg:CC REG_CC)) to a pattern that
didn't already have it.  This is because adding a REG_CC clobber
requires the pass to “prove” that REG_CC is dead at that point,
whereas there are no restrictions on adding a scratch clobber.

Thanks,
Richard


Re: [PATCH 1/2] Add new RTX instruction class FILLER_INSN

2020-08-19 Thread Richard Sandiford
Andrea Corallo  writes:
> Segher Boessenkool  writes:
>
>> Hi Andrea,
>>
>> On Wed, Jul 22, 2020 at 12:02:33PM +0200, Andrea Corallo wrote:
>>> This first patch implements the addition of a new RTX instruction class
>>> FILLER_INSN, which has been white listed to allow placement of NOPs
>>> outside of a basic block.  This is to allow padding after unconditional
>>> branches.  This is favorable so that any performance gained from
>>> diluting branches is not paid straight back via excessive eating of
>>> nops.
>>
>>> It was deemed that a new RTX class was less invasive than modifying
>>> behavior in regards to standard UNSPEC nops.
>>
>> So I wonder if this cannot be done with some kind of NOTE, instead?
>>
>
> Hi Segher,
>
> I was having a look into reworking this using an insn note as (IIUC)
> suggested.  The idea is appealing but looking into insn-notes.def I've
> found the following comment:
>
> "We are slowly removing the concept of insn-chain notes from the
> compiler.  Adding new codes to this file is STRONGLY DISCOURAGED.
> If you think you need one, look for other ways to express what you
> mean, such as register notes or bits in the basic-block structure."
>
> Would still be justificated in this case to proceed this way?  The other
> option would be to add the information into the basic-block or into
> struct rtx_jump_insn.
>
> My GCC experience is far from sufficient for having a formed opinion on
> this, I'd probably bet on struct rtx_jump_insn as the better option.

Adding it to the basic block structure wouldn't work because we need
this information to survive until asm output time, and the cfg doesn't
last that long.  (Would be nice if it did, but that's a whole new can
of worms.)

Using REG_NOTES on the jump might be OK.  I guess the note value could
be the length in bytes.  shorten_branches would then need to look for
these notes and add the associated length after adding the length of
the insn itself.  There would then need to be some hook that final.c
can call to emit nops of the given length.

I guess there's also the option of representing this in the same way
as a delayed branch sequence, which is to make the jump insn pattern:

  (sequence [(normal jump insn)
 (delayed insn 1)
 ...])

The members of the sequence are full insns, rather than just patterns.
For this use case, the delayed insns would all be nops.

However, not much is prepared to handle the sequence representation
before the normal pass_machine_reorg position.  (The main dbr pass
itself is pass_delay_slots, but some targets run dbr within
pass_machine_reorg instead.)  There again, it isn't worth doing
layout optimisations earlier than pass_machine_reorg anyway.

Thanks,
Richard


Re: [PATCH] libibery/hashtab: add new functions

2020-08-19 Thread Martin Liška

On 8/18/20 1:38 AM, Ian Lance Taylor wrote:

I guess I'm not sure why either of these belong in libiberty.
htab_insert can be written elsewhere as needed.  And while perhaps
some sort of stats API would be reasonable, I don't think it should be
something that prints values to a FILE.


Understood. I put these functions directly to a gas subdirectory.

Thanks,
Martin


Re: [PATCH] x86: Detect Rocket Lake and Alder Lake

2020-08-19 Thread Kirill Yukhin via Gcc-patches
Hello,

On 16 авг 06:17, H.J. Lu via Gcc-patches wrote:
> From arch/x86/include/asm/intel-family.h on Linux kernel master branch:
> 
>  #define INTEL_FAM6_ROCKETLAKE   0xA7
>  #define INTEL_FAM6_ALDERLAKE0x97
> 
>   * common/config/i386/cpuinfo.h (get_intel_cpu): Detect Rocket
>   Lake and Alder Lake.

Your patch is OK for trunk.

--
K


RE: [PATCH] AArch64: Add if condition in aarch64_function_value [PR96479]

2020-08-19 Thread qiaopeixin
Thanks.

All the best,
Peixin

-Original Message-
From: Richard Sandiford [mailto:richard.sandif...@arm.com] 
Sent: Wednesday, August 19, 2020 5:56 PM
To: qiaopeixin 
Cc: Christophe Lyon ; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] AArch64: Add if condition in aarch64_function_value 
[PR96479]

qiaopeixin  writes:
> Hi Richard,
>
> Thanks for the example.
>
> I remove the whole line "& && TREE_PUBLIC (fndecl)" and passed 
> bootstrap and deja tests. I add your provided example under 
> /gcc.target/aarch64, and the patch is attached.

Thanks, pushed to master.

Richard


Re: [PATCH PR96357][GCC][AArch64]: could not split insn UNSPEC_COND_FSUB with AArch64 SVE

2020-08-19 Thread Richard Sandiford
Przemyslaw Wirkus  writes:
> Hi,
>
> Problem is related to that operand 4 (In original pattern
> *cond_sub_any_const) is no longer the same as operand 1, and so
> the pattern doesn't match the split condition.
>
> Pattern *cond_sub_any_const is being split by this patch into two
> separate patterns:
> * Pattern *cond_sub_relaxed_const now matches const_int
>   SVE_RELAXED_GP operand.
> * Pattern *cond_sub_strict_const now matches const_int
>   SVE_STRICT_GP operand.
> * Remove aarch64_sve_pred_dominates_p condition from both patterns.

Thanks for doing this.

> @@ -5271,6 +5270,43 @@ (define_insn_and_rewrite "*cond_sub_any_const"
>[(set_attr "movprfx" "yes")]
>  )
>  
> +;; Predicated floating-point subtraction from a constant, merging with an
> +;; independent value.

The previous pattern had the same comment.  Maybe add:

  The subtraction predicate and the merge predicate are allowed to be
  different.

to the relaxed one and:

  The subtraction predicate and the merge predicate must be the same.

to this one.

> +(define_insn_and_rewrite "*cond_sub_strict_const"
> +  [(set (match_operand:SVE_FULL_F 0 "register_operand" "=w, w, ?w")
> + (unspec:SVE_FULL_F
> +   [(match_operand: 1 "register_operand" "Upl, Upl, Upl")
> +(unspec:SVE_FULL_F
> +  [(match_dup 1)
> +   (const_int SVE_STRICT_GP)
> +   (match_operand:SVE_FULL_F 2 "aarch64_sve_float_arith_immediate")
> +   (match_operand:SVE_FULL_F 3 "register_operand" "w, w, w")]
> +  UNSPEC_COND_FSUB)
> +(match_operand:SVE_FULL_F 4 "aarch64_simd_reg_or_zero" "Dz, 0, w")]
> +   UNSPEC_SEL))]
> +  "TARGET_SVE
> +   && !rtx_equal_p (operands[3], operands[4])"

Very minor, but the file generally puts conditions on a single line
if they'll fit.  Same for the relaxed version.

> +  "@
> +   movprfx\t%0., %1/z, %3.\;fsubr\t%0., %1/m, 
> %0., #%2
> +   movprfx\t%0., %1/m, %3.\;fsubr\t%0., %1/m, 
> %0., #%2
> +   #"
> +  "&& 1"
> +  {
> +if (reload_completed
> +&& register_operand (operands[4], mode)
> +&& !rtx_equal_p (operands[0], operands[4]))
> +  {
> + emit_insn (gen_vcond_mask_ (operands[0], operands[3],
> +  operands[4], operands[1]));
> + operands[4] = operands[3] = operands[0];
> +  }
> +else if (!rtx_equal_p (operands[1], operands[5]))
> +  operands[5] = copy_rtx (operands[1]);

The last two lines are a hold-over from the relaxed version, where there
were two predicates.  There's no operand 5 in this pattern, so we should
just delete the lines.

Thanks,
Richard


Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions emitted at -O3

2020-08-19 Thread Richard Sandiford
xiezhiheng  writes:
> I add FLAGS for part of intrinsics in aarch64-simd-builtins.def first for a 
> try,
> including all the add/sub arithmetic intrinsics.
>
> Something like faddp intrinsic which only handles floating-point operations,
> both FP and NONE flags are suitable for it because FLAG_FP will be added
> later if the intrinsic handles floating-point operations.  And I prefer FP 
> since
> it would be more clear.

Sounds good to me.

> But for qadd intrinsics, they would modify FPSR register which is a scenario
> I missed before.  And I consider to add an additional flag FLAG_WRITE_FPSR
> to represent it.

I don't think we make any attempt to guarantee that the Q flag is
meaningful after saturating intrinsics.  To do that, we'd need to model
the modification of the flag in the .md patterns too.

So my preference would be to leave this out and just use NONE for the
saturating forms too.

Thanks,
Richard


Re: [PATCH] AArch64: Add if condition in aarch64_function_value [PR96479]

2020-08-19 Thread Richard Sandiford
qiaopeixin  writes:
> Hi Richard,
>
> Thanks for the example.
>
> I remove the whole line "& && TREE_PUBLIC (fndecl)" and passed 
> bootstrap and deja tests. I add your provided example under 
> /gcc.target/aarch64, and the patch is attached.

Thanks, pushed to master.

Richard


Re: [PATCH] libstdc++: Implement integer-class types as defined in [iterator.concept.winc]

2020-08-19 Thread Jonathan Wakely via Gcc-patches

On 11/08/20 11:38 -0400, Patrick Palka via Libstdc++ wrote:

Subject: [PATCH] libstdc++: integer-class types as per [iterator.concept.winc]

This implements signed and unsigned integer-class types, whose width is
one bit larger than the widest supported signed and unsigned integral
type respectively.  In our case this is either __int128 and unsigned
__int128, or long long and unsigned long long.

Internally, the two integer-class types are represented as a largest
supported unsigned integral type plus one extra bit.  The signed
integer-class type is represented in two's complement form with the
extra bit acting as the sign bit.

libstdc++-v3/ChangeLog:

* include/Makefile.am (bits_headers): Add new header
.
* include/Makefile.in: Regenerate.
* include/bits/iterator_concepts.h
(ranges::__detail::__max_diff_type): Remove definition, replace
with forward declaration of class __max_diff_type.
(__detail::__max_size_type): Remove definition, replace with
forward declaration of class __max_size_type.
(__detail::__is_unsigned_int128, __is_signed_int128,
__is_int128): New concepts.


According to GNU changelog conventions you should close the parens at
the end of the first line and reopen them on the next i.e.

(__detail::__is_unsigned_int128, __is_signed_int128)
(__is_int128): New concepts.

See the paragraph beginning "Break long lists of function names" at
https://www.gnu.org/prep/standards/html_node/Style-of-Change-Logs.html#Style-of-Change-Logs

Apart from that, this looks great. Please push, thanks!



RE: [PATCH] AArch64: Add if condition in aarch64_function_value [PR96479]

2020-08-19 Thread qiaopeixin
Hi Richard,

Thanks for the example.

I remove the whole line "& && TREE_PUBLIC (fndecl)" and passed bootstrap 
and deja tests. I add your provided example under /gcc.target/aarch64, and the 
patch is attached.

By the way, the new example also passed deja tests.

All the best,
Peixin

-Original Message-
From: Richard Sandiford [mailto:richard.sandif...@arm.com] 
Sent: Wednesday, August 19, 2020 1:01 AM
To: qiaopeixin 
Cc: Christophe Lyon ; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] AArch64: Add if condition in aarch64_function_value 
[PR96479]

qiaopeixin  writes:
> Hi Richard,
>
> Thanks for the review and explanation.
>
> The previous fix adding if condition of TARGET_FLOAT does crash glibc-2.29.
>
> I checked the past log of writing the function aarch64_init_cumulative_args, 
> and did not find the reason why Alan Lawrence added TREE_PUBLIC (fndecl) as 
> one condition for entering the function type check. Maybe Alan could clarify? 
> I tried to delete TREE_PUBLIC (fndecl), which turns out could solve both the 
> glibc problem and the previous ICE problem. A new fix is made as following, 
> passed bootstrap and deja test. I believe this fix is reasonable, since the 
> function type should be checked no matter if it has external linkage or not.
>
> The function aarch64_init_cumulative_args checks the function types and 
> should catch the error that "-mgeneral-regs-only" is incompatible with the 
> use of SIMD/FP registers. In the test case on PR96479, the function myfunc2 
> returns one vector of 4 integers, while it is defined static type. 
> TREE_PUBLIC (fndecl) is set as false and it prevents from entering if 
> statement and checking function types. I delete "TREE_PUBLIC (fndecl)" so 
> that gcc can catch the error through the function 
> aarch64_init_cumulative_args now. The ICE on PR96479 can report the 
> diagnostic error with this fix. The patch for the fix is attached as 
> following:
>
> diff --git a/gcc/config/aarch64/aarch64.c 
> b/gcc/config/aarch64/aarch64.c index b7f5bc76f1b..9ce83dce131 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -6017,7 +6017,7 @@ aarch64_init_cumulative_args (CUMULATIVE_ARGS 
> *pcum,
>  
>if (!silent_p
>&& !TARGET_FLOAT
> -  && fndecl && TREE_PUBLIC (fndecl)
> +  && fndecl
>&& fntype && fntype != error_mark_node)
>  {
>const_tree type = TREE_TYPE (fntype);

I think the fndecl test is problematic too though.  E.g. consider:

typedef int v4si __attribute__((vector_size(16)));
v4si (*foo) ();
void f (v4si *ptr) { *ptr = foo (); }

which ICEs for me even with the above.

I suggest we just remove the line and see whether anything breaks.

Thanks,
Richard


0001-AArch64-Remove-fndecl-TREE_PUBLIC-fndecl-in-aarch64_.patch
Description: 0001-AArch64-Remove-fndecl-TREE_PUBLIC-fndecl-in-aarch64_.patch


[PATCH] arm: Fix -mpure-code support/-mslow-flash-data for armv8-m.base [PR94538]

2020-08-19 Thread Christophe Lyon via Gcc-patches
armv8-m.base (cortex-m23) has the movt instruction, so we need to
disable the define_split to generate a constant in this case,
otherwise we get incorrect insn constraints as described in PR94538.

We also need to fix the pure-code alternative for thumb1_movsi_insn
because the assembler complains with instructions like
movs r0, #:upper8_15:1234
(Internal error in md_apply_fix)
We now generate movs r0, 4 instead.

2020-08-19  Christophe Lyon  
gcc/ChangeLog:

* config/arm/thumb1.md: Disable set-constant splitter when
TARGET_HAVE_MOVT.
(thumb1_movsi_insn): Fix -mpure-code
alternative.

gcc/testsuite/ChangeLog:

* gcc.target/arm/pure-code/pr94538-1.c: New test.
* gcc.target/arm/pure-code/pr94538-2.c: New test.
---
 gcc/config/arm/thumb1.md   | 66 ++
 gcc/testsuite/gcc.target/arm/pure-code/pr94538-1.c | 13 +
 gcc/testsuite/gcc.target/arm/pure-code/pr94538-2.c | 12 
 3 files changed, 79 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/pr94538-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/pr94538-2.c

diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
index 0ff8190..f0129db 100644
--- a/gcc/config/arm/thumb1.md
+++ b/gcc/config/arm/thumb1.md
@@ -70,6 +70,7 @@ (define_split
   "TARGET_THUMB1
&& arm_disable_literal_pool
&& GET_CODE (operands[1]) == CONST_INT
+   && !TARGET_HAVE_MOVT
&& !satisfies_constraint_I (operands[1])"
   [(clobber (const_int 0))]
   "
@@ -696,18 +697,59 @@ (define_insn "*thumb1_movsi_insn"
   "TARGET_THUMB1
&& (   register_operand (operands[0], SImode)
|| register_operand (operands[1], SImode))"
-  "@
-   movs%0, %1
-   movs%0, %1
-   movw%0, %1
-   #
-   #
-   ldmia\\t%1, {%0}
-   stmia\\t%0, {%1}
-   movs\\t%0, #:upper8_15:%1; lsls\\t%0, #8; adds\\t%0, #:upper0_7:%1; 
lsls\\t%0, #8; adds\\t%0, #:lower8_15:%1; lsls\\t%0, #8; adds\\t%0, 
#:lower0_7:%1
-   ldr\\t%0, %1
-   str\\t%1, %0
-   mov\\t%0, %1"
+{
+  switch (which_alternative)
+{
+  default:
+  case 0: return "movs\t%0, %1";
+  case 1: return "movs\t%0, %1";
+  case 2: return "movw\t%0, %1";
+  case 3: return "#";
+  case 4: return "#";
+  case 5: return "ldmia\t%1, {%0}";
+  case 6: return "stmia\t%0, {%1}";
+  case 7:
+  /* pure-code alternative: build the constant byte by byte,
+instead of loading it from a constant pool.  */
+   {
+ int i;
+ HOST_WIDE_INT op1 = INTVAL (operands[1]);
+ bool mov_done_p = false;
+ rtx ops[2];
+ ops[0] = operands[0];
+
+ /* Emit upper 3 bytes if needed.  */
+ for (i = 0; i < 3; i++)
+   {
+  int byte = (op1 >> (8 * (3 - i))) & 0xff;
+
+ if (byte)
+   {
+ ops[1] = GEN_INT (byte);
+ if (mov_done_p)
+   output_asm_insn ("adds\t%0, %1", ops);
+ else
+   output_asm_insn ("movs\t%0, %1", ops);
+ mov_done_p = true;
+   }
+
+ if (mov_done_p)
+   output_asm_insn ("lsls\t%0, #8", ops);
+   }
+
+ /* Emit lower byte if needed.  */
+ ops[1] = GEN_INT (op1 & 0xff);
+ if (!mov_done_p)
+   output_asm_insn ("movs\t%0, %1", ops);
+ else if (op1 & 0xff)
+   output_asm_insn ("adds\t%0, %1", ops);
+ return "";
+   }
+  case 8: return "ldr\t%0, %1";
+  case 9: return "str\t%1, %0";
+  case 10: return "mov\t%0, %1";
+}
+}
   [(set_attr "length" "2,2,4,4,4,2,2,14,2,2,2")
(set_attr "type" 
"mov_reg,mov_imm,mov_imm,multiple,multiple,load_4,store_4,alu_sreg,load_4,store_4,mov_reg")
(set_attr "pool_range" "*,*,*,*,*,*,*, *,1018,*,*")
diff --git a/gcc/testsuite/gcc.target/arm/pure-code/pr94538-1.c 
b/gcc/testsuite/gcc.target/arm/pure-code/pr94538-1.c
new file mode 100644
index 000..31061d5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pure-code/pr94538-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-skip-if "skip override" { *-*-* } { "-mfloat-abi=hard" } { "" } } */
+/* { dg-options "-mpure-code -mcpu=cortex-m23 -march=armv8-m.base -mthumb 
-mfloat-abi=soft" } */
+
+typedef int __attribute__ ((__vector_size__ (16))) V;
+
+V v;
+
+void
+foo (void)
+{
+  v += (V){4095};
+}
diff --git a/gcc/testsuite/gcc.target/arm/pure-code/pr94538-2.c 
b/gcc/testsuite/gcc.target/arm/pure-code/pr94538-2.c
new file mode 100644
index 000..c1789da
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pure-code/pr94538-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mpure-code" } */
+
+typedef int __attribute__ ((__vector_size__ (16))) V;
+
+V v;
+
+void
+foo (void)
+{
+  v += (V){4095};
+}
-- 
2.7.4



Re: [RISC-V] Add support for AddressSanitizer on RISC-V GCC

2020-08-19 Thread Kito Cheng via Gcc-patches
Hi Andrew:

I am not sure the reason why some targets pick different numbers.
It seems it's not only target dependent but also OS dependent[1].

For RV32, I think using 1<<29 like other 32 bit targets is fine.

[1] 
https://github.com/llvm/llvm-project/blob/master/compiler-rt/lib/asan/asan_mapping.h#L159

Hi Joshua:

Could you update that for RV32, and this patch will be pending until
LLVM accepts the libsanitizer part.

On Wed, Aug 19, 2020 at 4:48 PM Andrew Waterman  wrote:
>
> I'm having trouble understanding why different ports chose their
> various constants--e.g., SPARC uses 1<<29 for 32-bit and 1<<43 for
> 64-bit, whereas x86 uses 1<<29 and 0x7fff8000, respectively.  So I
> can't comment on the choice of the constant 1<<36 for RISC-V.  But
> isn't it a problem that 1<<36 is not a valid Pmode value for ILP32?
>
>
> On Wed, Aug 19, 2020 at 1:02 AM Joshua via Gcc-patches
>  wrote:
> >
> > From: cooper.joshua 
> >
> > gcc/
> >
> > * config/riscv/riscv.c (asan_shadow_offset): Implement the offset 
> > of asan shadow memory for risc-v.
> > (asan_shadow_offset): new macro definition.
> > ---
> >
> >  gcc/config/riscv/riscv.c | 11 +++
> >  1 file changed, 11 insertions(+)
> >
> > diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
> > index 63b0c38..b85b459 100644
> > --- a/gcc/config/riscv/riscv.c
> > +++ b/gcc/config/riscv/riscv.c
> > @@ -5292,6 +5292,14 @@ riscv_gpr_save_operation_p (rtx op)
> >return true;
> >  }
> >
> > +/* Implement TARGET_ASAN_SHADOW_OFFSET.  */
> > +
> > +static unsigned HOST_WIDE_INT
> > +riscv_asan_shadow_offset (void)
> > +{
> > +  return HOST_WIDE_INT_1U << 36;
> > +}
> > +
> >  /* Initialize the GCC target structure.  */
> >  #undef TARGET_ASM_ALIGNED_HI_OP
> >  #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
> > @@ -5475,6 +5483,9 @@ riscv_gpr_save_operation_p (rtx op)
> >  #undef TARGET_NEW_ADDRESS_PROFITABLE_P
> >  #define TARGET_NEW_ADDRESS_PROFITABLE_P riscv_new_address_profitable_p
> >
> > +#undef TARGET_ASAN_SHADOW_OFFSET
> > +#define TARGET_ASAN_SHADOW_OFFSET riscv_asan_shadow_offset
> > +
> >  struct gcc_target targetm = TARGET_INITIALIZER;
> >
> >  #include "gt-riscv.h"
> > --
> > 2.7.4
> >


[PATCH V2] Add pattern for pointer-diff on addresses with same base/offset (PR 94234)

2020-08-19 Thread Feng Xue OS via Gcc-patches
As Richard's comment, this patch is composed to simplify generalized
binary-with-convert pattern like ((T) X) OP ((T) Y). Instead of creating
almost duplicated rules into match.pd, we try to transform it to (T) (X OP Y),
and apply simplification on (X OP Y) in forwprop pass.

Regards,
Feng
---
2020-08-19  Feng Xue  

gcc/
PR tree-optimization/94234
* tree-ssa-forwprop.c (simplify_binary_with_convert): New function.
* (fwprop_ssa_val): Move it before its new caller.

gcc/testsuite/
PR tree-optimization/94234
* gcc.dg/ifcvt-3.c: Modified to suppress forward propagation.
* gcc.dg/tree-ssa/20030807-10.c: Likewise.
* gcc.dg/pr94234-2.c: New test.

> 
> From: Richard Biener 
> Sent: Monday, June 15, 2020 3:41 PM
> To: Feng Xue OS
> Cc: gcc-patches@gcc.gnu.org; Marc Glisse
> Subject: Re: [PATCH] Add pattern for pointer-diff on addresses with same 
> base/offset (PR 94234)
> 
> On Fri, Jun 5, 2020 at 11:20 AM Feng Xue OS  
> wrote:
>>
>>  As Marc suggested, removed the new pointer_diff rule, and add another rule 
>> to fold
>>  convert-add expression. This new rule is:
>> 
>>(T)(A * C) +- (T)(B * C) -> (T) ((A +- B) * C)
>> 
>>  Regards,
>>  Feng
>> 
>>  ---
>> 2020-06-01  Feng Xue  
>> 
>>  gcc/
>>  PR tree-optimization/94234
>>  * match.pd ((T)(A * C) +- (T)(B * C)) -> (T)((A +- B) * C): New
>>  simplification.
>>  * ((PTR_A + OFF) - (PTR_B + OFF)) -> (PTR_A - PTR_B): New
>>  simplification.
>> 
>>  gcc/testsuite/
>>  PR tree-optimization/94234
>>  * gcc.dg/pr94234.c: New test.
>>  ---
>>   gcc/match.pd   | 28 
>>   gcc/testsuite/gcc.dg/pr94234.c | 24 
>>   2 files changed, 52 insertions(+)
>>   create mode 100644 gcc/testsuite/gcc.dg/pr94234.c
>> 
>>  diff --git a/gcc/match.pd b/gcc/match.pd
>>  index 33ee1a920bf..4f340bfe40a 100644
>>  --- a/gcc/match.pd
>>  +++ b/gcc/match.pd
>>  @@ -2515,6 +2515,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>>   && TREE_CODE (@2) == INTEGER_CST
>>   && tree_int_cst_sign_bit (@2) == 0))
>>(minus (convert @1) (convert @2)
>>  +   (simplify
>>  +(pointer_diff (pointer_plus @0 @2) (pointer_plus @1 @2))
>>  + (pointer_diff @0 @1))
> 
> This new pattern is OK.  Please commit it separately.
> 
>>  (simplify
>>   (pointer_diff (pointer_plus @@0 @1) (pointer_plus @0 @2))
>>   /* The second argument of pointer_plus must be interpreted as signed, 
>> and
>>  @@ -2526,6 +2529,31 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>>(minus (convert (view_convert:stype @1))
>>  (convert (view_convert:stype @2)))
>> 
>>  +/* (T)(A * C) +- (T)(B * C) -> (T)((A +- B) * C) and
>>  +   (T)(A * C) +- (T)(A) -> (T)(A * (C +- 1)). */
>>  +(if (INTEGRAL_TYPE_P (type))
>>  + (for plusminus (plus minus)
>>  +  (simplify
>>  +   (plusminus (convert:s (mult:cs @0 @1)) (convert:s (mult:cs @0 @2)))
>>  +   (if (element_precision (type) <= element_precision (TREE_TYPE (@0))
>>  +   && (TYPE_OVERFLOW_UNDEFINED (type) || TYPE_OVERFLOW_WRAPS (type))
>>  +   && TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
>>  +(convert (mult (plusminus @1 @2) @0
>>  +  (simplify
>>  +   (plusminus (convert @0) (convert@2 (mult:c@3 @0 @1)))
>>  +   (if (element_precision (type) <= element_precision (TREE_TYPE (@0))
>>  +   && (TYPE_OVERFLOW_UNDEFINED (type) || TYPE_OVERFLOW_WRAPS (type))
>>  +   && TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0))
>>  +   && single_use (@2) && single_use (@3))
>>  +(convert (mult (plusminus { build_one_cst (TREE_TYPE (@1)); } @1) 
>> @0
>>  +  (simplify
>>  +   (plusminus (convert@2 (mult:c@3 @0 @1)) (convert @0))
>>  +   (if (element_precision (type) <= element_precision (TREE_TYPE (@0))
>>  +   && (TYPE_OVERFLOW_UNDEFINED (type) || TYPE_OVERFLOW_WRAPS (type))
>>  +   && TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0))
>>  +   && single_use (@2) && single_use (@3))
>>  +(convert (mult (plusminus @1 { build_one_cst (TREE_TYPE (@1)); }) 
>> @0))
>>  +
> 
> This shows the limit of pattern matching IMHO.  I'm also not convinced
> it gets the
> overflow cases correct (but I didn't spend too much time here).  Note we have
> similar functionality implemented in fold_plusminus_mult_expr.  IMHO instead
> of doing the above moving fold_plusminus_mult_expr to GIMPLE by executing
> it from inside the forwprop pass would make more sense.  Or finally biting the
> bullet and try to teach reassociation about how to handle signed arithmetic
> with non-wrapping overflow behavior.
> 
> Richard.

From 68bba2edb714f741ef6e7f4a7814869cb99e938c Mon Sep 17 00:00:00 2001
From: Feng Xue 
Date: Mon, 17 Aug 2020 23:00:35 +0800
Subject: [PATCH] Simplify binary-with-convert expression in forwprop pass (PR
 94234)

For expression as ((T) X) OP ((T) Y), try to transform it to (T) (X OP Y),
and apply simplification 

Re: [PATCH 1/2] Add new RTX instruction class FILLER_INSN

2020-08-19 Thread Andrea Corallo
Segher Boessenkool  writes:

> Hi Andrea,
>
> On Wed, Jul 22, 2020 at 12:02:33PM +0200, Andrea Corallo wrote:
>> This first patch implements the addition of a new RTX instruction class
>> FILLER_INSN, which has been white listed to allow placement of NOPs
>> outside of a basic block.  This is to allow padding after unconditional
>> branches.  This is favorable so that any performance gained from
>> diluting branches is not paid straight back via excessive eating of
>> nops.
>
>> It was deemed that a new RTX class was less invasive than modifying
>> behavior in regards to standard UNSPEC nops.
>
> So I wonder if this cannot be done with some kind of NOTE, instead?
>

Hi Segher,

I was having a look into reworking this using an insn note as (IIUC)
suggested.  The idea is appealing but looking into insn-notes.def I've
found the following comment:

"We are slowly removing the concept of insn-chain notes from the
compiler.  Adding new codes to this file is STRONGLY DISCOURAGED.
If you think you need one, look for other ways to express what you
mean, such as register notes or bits in the basic-block structure."

Would still be justificated in this case to proceed this way?  The other
option would be to add the information into the basic-block or into
struct rtx_jump_insn.

My GCC experience is far from sufficient for having a formed opinion on
this, I'd probably bet on struct rtx_jump_insn as the better option.

Any thoughts?

Thanks!

  Andrea


Re: [PATCH] testsuite: require c99 runtime for trigonometric optimisation tests

2020-08-19 Thread Richard Sandiford
Pat Bernardi  writes:
> A number of optimisation that simplify trigonometric expressions are only
> performed when the compiler knows the target has a C99 libm available.
> Since targets like *-elf may not have such a libm, a C99 runtime requirement
> is added to these tests.
>
> Tested on x86-elf and x86_64-elf hosted on x86_64-linux in addition to 
> x86_64-pc-linux-gnu
>
> If approved, I'll need a maintainer to kindly commit on my behalf.
>
> Thanks,
>
> Pat Bernardi
> Senior Software Engineer, AdaCore
>
> 2020-08-18  Pat Bernardi  
>
> gcc/testsuite/ChangeLog
>
>   * gcc.dg/sinatan-2.c: Add dg-require-effective-target c99_runtime.
>   * gcc.dg/sinhovercosh-1.c: Likewise.
>   * gcc.dg/tanhbysinh.c: Likewise.

Thanks, pushed to master.

Richard

> ---
>  gcc/testsuite/gcc.dg/sinatan-2.c  | 1 +
>  gcc/testsuite/gcc.dg/sinhovercosh-1.c | 1 +
>  gcc/testsuite/gcc.dg/tanhbysinh.c | 3 ++-
>  3 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.dg/sinatan-2.c 
> b/gcc/testsuite/gcc.dg/sinatan-2.c
> index 8e7ea3c90fc..64d6d301535 100644
> --- a/gcc/testsuite/gcc.dg/sinatan-2.c
> +++ b/gcc/testsuite/gcc.dg/sinatan-2.c
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-options "-Ofast -fdump-tree-optimized" } */
> +/* { dg-require-effective-target c99_runtime } */
>  
>  extern float sinf (float);
>  extern float cosf (float);
> diff --git a/gcc/testsuite/gcc.dg/sinhovercosh-1.c 
> b/gcc/testsuite/gcc.dg/sinhovercosh-1.c
> index d41093fa6de..564d3c51b3e 100644
> --- a/gcc/testsuite/gcc.dg/sinhovercosh-1.c
> +++ b/gcc/testsuite/gcc.dg/sinhovercosh-1.c
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-options "-Ofast -fdump-tree-optimized" } */
> +/* { dg-require-effective-target c99_runtime } */
>  
>  extern float sinhf (float);
>  extern float coshf (float);
> diff --git a/gcc/testsuite/gcc.dg/tanhbysinh.c 
> b/gcc/testsuite/gcc.dg/tanhbysinh.c
> index fde72c2f93b..9dbe133ec74 100644
> --- a/gcc/testsuite/gcc.dg/tanhbysinh.c
> +++ b/gcc/testsuite/gcc.dg/tanhbysinh.c
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-options "-Ofast -fdump-tree-optimized" } */
> +/* { dg-require-effective-target c99_runtime } */
>  
>  extern float sinhf (float);
>  extern float tanhf (float);
> @@ -37,4 +38,4 @@ tanhbysinhl_ (long double x)
>  /* {dg-final { scan-tree-dump-not "tanhl " "optimized" }} */
>  /* { dg-final { scan-tree-dump "cosh " "optimized" } } */
>  /* { dg-final { scan-tree-dump "coshf " "optimized" } } */
> -/* { dg-final { scan-tree-dump "coshl " "optimized" } } */
> \ No newline at end of file
> +/* { dg-final { scan-tree-dump "coshl " "optimized" } } */


Re: [RISC-V] Add support for AddressSanitizer on RISC-V GCC

2020-08-19 Thread Andrew Waterman
I'm having trouble understanding why different ports chose their
various constants--e.g., SPARC uses 1<<29 for 32-bit and 1<<43 for
64-bit, whereas x86 uses 1<<29 and 0x7fff8000, respectively.  So I
can't comment on the choice of the constant 1<<36 for RISC-V.  But
isn't it a problem that 1<<36 is not a valid Pmode value for ILP32?


On Wed, Aug 19, 2020 at 1:02 AM Joshua via Gcc-patches
 wrote:
>
> From: cooper.joshua 
>
> gcc/
>
> * config/riscv/riscv.c (asan_shadow_offset): Implement the offset of 
> asan shadow memory for risc-v.
> (asan_shadow_offset): new macro definition.
> ---
>
>  gcc/config/riscv/riscv.c | 11 +++
>  1 file changed, 11 insertions(+)
>
> diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
> index 63b0c38..b85b459 100644
> --- a/gcc/config/riscv/riscv.c
> +++ b/gcc/config/riscv/riscv.c
> @@ -5292,6 +5292,14 @@ riscv_gpr_save_operation_p (rtx op)
>return true;
>  }
>
> +/* Implement TARGET_ASAN_SHADOW_OFFSET.  */
> +
> +static unsigned HOST_WIDE_INT
> +riscv_asan_shadow_offset (void)
> +{
> +  return HOST_WIDE_INT_1U << 36;
> +}
> +
>  /* Initialize the GCC target structure.  */
>  #undef TARGET_ASM_ALIGNED_HI_OP
>  #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
> @@ -5475,6 +5483,9 @@ riscv_gpr_save_operation_p (rtx op)
>  #undef TARGET_NEW_ADDRESS_PROFITABLE_P
>  #define TARGET_NEW_ADDRESS_PROFITABLE_P riscv_new_address_profitable_p
>
> +#undef TARGET_ASAN_SHADOW_OFFSET
> +#define TARGET_ASAN_SHADOW_OFFSET riscv_asan_shadow_offset
> +
>  struct gcc_target targetm = TARGET_INITIALIZER;
>
>  #include "gt-riscv.h"
> --
> 2.7.4
>


RE: [PATCH] aarch64: Don't generate invalid zero/sign-extend syntax

2020-08-19 Thread Alex Coplan
Hi Richard,

> -Original Message-
> From: Richard Sandiford 
> Sent: 18 August 2020 09:35
> To: Alex Coplan 
> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw ;
> Marcus Shawcroft ; Kyrylo Tkachov
> 
> Subject: Re: [PATCH] aarch64: Don't generate invalid zero/sign-extend
> syntax
> 
> Alex Coplan  writes:
> > Note that an obvious omission here is that this patch does not touch
> the
> > mult patterns such as *add__mult_. I found
> > that I couldn't hit these patterns with C code since multiplications by
> > powers of two always get turned into shifts by earlier RTL passes. If
> > there's a way to reliably hit these patterns, then perhaps these should
> > be updated as well.
> 
> Hmm.  Feels like we should either update them or delete them.  E.g.:
> 
>   *adds__multp2
>   *subs__multp2
> 
> were added alongside the adds3.c and subs3.c tests that you're updating,
> so if the tests don't/no longer need the multp2 patterns to pass,
> there's a good chance that the patterns are redundant.
> 
> For reasons I never understood, the canonical representation is to use
> (mult …) for powers of 2 inside a (mem …) but shifts outside of (mem …)s.
> So perhaps the patterns were originally for address calculations that had
> been moved outside of a (mem …) and not updated to shifts instead of
> mults.

Thanks for the review, and for clarifying this. I tried removing these
together with the *_mul_imm_ patterns (e.g. *adds_mul_imm_) and
the only failure was gcc/testsuite/gcc.dg/torture/pr34330.c when
compiled with -Os -ftree-vectorize which appears to depend on the
*add_mul_imm_di pattern. Without this pattern, we ICE in LRA on this
input.

In this case, GCC appears to have done exactly what you described: we
have the (plus (mult ...) ...) nodes inside (mem)s prior to register
allocation, and then we end up pulling these out without converting them
to shifts.

Seeing this behaviour (and in particular seeing the ICE) makes me
hesitant to just go ahead and remove the other patterns. That said, I
have a patch to remove the following patterns:

 *adds__multp2
 *subs__multp2
 *add__mult_
 *add__mult_si_uxtw
 *add__multp2
 *add_si_multp2_uxtw
 *add_uxt_multp2
 *add_uxtsi_multp2_uxtw
 *sub__multp2
 *sub_si_multp2_uxtw
 *sub_uxt_multp2
 *sub_uxtsi_multp2_uxtw
 *sub_uxtsi_multp2_uxtw

(together with the predicate aarch64_pwr_imm3 which is only used in
these patterns) and this bootstraps/regtests just fine.

So, I have a couple of questions:

(1) Should it be considered a bug if we pull (plus (mult (power of 2)
   ...) ...) out of a (mem) RTX without re-writing the (mult) as a
   shift?
(2) If not, can we otherwise justify the removal of the patterns here?

I'm happy to go ahead with this if either (1) or (2) are true.

Thanks,
Alex


[RISC-V] Add support for AddressSanitizer on RISC-V GCC

2020-08-19 Thread Joshua via Gcc-patches
From: cooper.joshua 

gcc/

* config/riscv/riscv.c (asan_shadow_offset): Implement the offset of 
asan shadow memory for risc-v.
(asan_shadow_offset): new macro definition.
---

 gcc/config/riscv/riscv.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 63b0c38..b85b459 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -5292,6 +5292,14 @@ riscv_gpr_save_operation_p (rtx op)
   return true;
 }
 
+/* Implement TARGET_ASAN_SHADOW_OFFSET.  */
+
+static unsigned HOST_WIDE_INT
+riscv_asan_shadow_offset (void)
+{
+  return HOST_WIDE_INT_1U << 36;
+}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
@@ -5475,6 +5483,9 @@ riscv_gpr_save_operation_p (rtx op)
 #undef TARGET_NEW_ADDRESS_PROFITABLE_P
 #define TARGET_NEW_ADDRESS_PROFITABLE_P riscv_new_address_profitable_p
 
+#undef TARGET_ASAN_SHADOW_OFFSET
+#define TARGET_ASAN_SHADOW_OFFSET riscv_asan_shadow_offset
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-riscv.h"
-- 
2.7.4



[PATCH] libgccjit: update some comments in libgccjit.c

2020-08-19 Thread Andrea Corallo
Hi all,

just a small patch updating some comments that apparently went out of
sync a while ago adding gcc_jit_context_new_rvalue_from_long.

Okay for trunk?

Thanks

  Andrea

>From 84b94a039d164878bdbf8bfd1a2038960f813c76 Mon Sep 17 00:00:00 2001
From: Andrea Corallo 
Date: Thu, 6 Aug 2020 10:25:40 +0200
Subject: [PATCH] libgccjit: Update comments for
 gcc_jit_context_new_rvalue_from* functions

2020-08-06  Andrea Corallo  

* libgccjit.c:
(gcc_jit_context_new_rvalue_from_int)
(gcc_jit_context_new_rvalue_from_long)
(gcc_jit_context_new_rvalue_from_double)
(gcc_jit_context_new_rvalue_from_ptr): Update function heading
comments.
---
 gcc/jit/libgccjit.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/gcc/jit/libgccjit.c b/gcc/jit/libgccjit.c
index 3d04f6db3af..50130fbbe00 100644
--- a/gcc/jit/libgccjit.c
+++ b/gcc/jit/libgccjit.c
@@ -1188,7 +1188,7 @@ gcc_jit_rvalue_get_type (gcc_jit_rvalue *rvalue)
 /* Public entrypoint.  See description in libgccjit.h.
 
After error-checking, the real work is done by the
-   gcc::jit::recording::context::new_rvalue_from_int method in
+   gcc::jit::recording::context::new_rvalue_from_const  method in
jit-recording.c.  */
 
 gcc_jit_rvalue *
@@ -1204,7 +1204,11 @@ gcc_jit_context_new_rvalue_from_int (gcc_jit_context 
*ctxt,
  ->new_rvalue_from_const  (numeric_type, value));
 }
 
-/* FIXME. */
+/* Public entrypoint.  See description in libgccjit.h.
+
+   After error-checking, the real work is done by the
+   gcc::jit::recording::context::new_rvalue_from_const  method
+   in jit-recording.c.  */
 
 gcc_jit_rvalue *
 gcc_jit_context_new_rvalue_from_long (gcc_jit_context *ctxt,
@@ -1256,7 +1260,7 @@ gcc_jit_context_one (gcc_jit_context *ctxt,
 /* Public entrypoint.  See description in libgccjit.h.
 
After error-checking, the real work is done by the
-   gcc::jit::recording::context::new_rvalue_from_double method in
+   gcc::jit::recording::context::new_rvalue_from_const  method in
jit-recording.c.  */
 
 gcc_jit_rvalue *
@@ -1275,8 +1279,8 @@ gcc_jit_context_new_rvalue_from_double (gcc_jit_context 
*ctxt,
 /* Public entrypoint.  See description in libgccjit.h.
 
After error-checking, the real work is done by the
-   gcc::jit::recording::context::new_rvalue_from_ptr method in
-   jit-recording.c.  */
+   gcc::jit::recording::context::new_rvalue_from_const  method
+   in jit-recording.c.  */
 
 gcc_jit_rvalue *
 gcc_jit_context_new_rvalue_from_ptr (gcc_jit_context *ctxt,
-- 
2.17.1



Re: [PATCH V2] libgccjit: Add new gcc_jit_context_new_blob entry point

2020-08-19 Thread Andrea Corallo
David Malcolm  writes:

> Thanks for the updated patch.  Comments inline below.

Hi Dave,

sorry for the late reply.

> [...]
>
>> diff --git a/gcc/jit/docs/topics/expressions.rst 
>> b/gcc/jit/docs/topics/expressions.rst
>> index d783ceea51a8..7699dcfd27be 100644
>> --- a/gcc/jit/docs/topics/expressions.rst
>> +++ b/gcc/jit/docs/topics/expressions.rst
>> @@ -582,6 +582,27 @@ Global variables
>>referring to it.  Analogous to using an "extern" global from a
>>header file.
>>
>> +.. function:: gcc_jit_lvalue *\
>> +  gcc_jit_global_set_initializer (gcc_jit_lvalue *global,\
>> +  const void *blob,\
>> +  size_t num_bytes)
>> +
>> +   Set an initializer for an object using the memory content pointed
>> +   by ``blob`` for ``num_bytes``.  ``global`` must be an arrays of an
>
> Typo: "arrays" -> "array"

Fixed

>> +   integral type.
>
> Why the non-void return type?  Looking at libgccjit.c I see it returns
> "global" if it succeeds, or NULL if it fails.  Wouldn't it be better to
> simply have void return type, and rely on the normaly error-handling
> mechanisms?
> Or is this inspired by the inline asm patch? (for PR 87291)

The idea is that this way the user could also create the global,
initialize it and directly use it like:

foo (gcc_jit_global_set_initializer (gcc_jit_context_new_global (...), ...))

I left it that way, let me know if you prefer otherwise.

[...]
>
>> --- a/gcc/jit/jit-playback.h
>> +++ b/gcc/jit/jit-playback.h
>> @@ -111,6 +111,15 @@ public:
>>type *type,
>>const char *name);
>>
>> +  lvalue *
>> +  new_global_initialized (location *loc,
>> +  enum gcc_jit_global_kind kind,
>> +  type *type,
>> +  size_t element_size,
>> +  size_t initializer_num_elem,
>> +  const void *initializer,
>> +  const char *name);
>> +
>>template 
>>rvalue *
>>new_rvalue_from_const (type *type,
>> @@ -266,6 +275,14 @@ private:
>>const char * get_path_s_file () const;
>>const char * get_path_so_file () const;
>>
>> +  tree
>> +  global_new_decl (location *loc,
>> +   enum gcc_jit_global_kind kind,
>> +   type *type,
>> +   const char *name);
>> +  lvalue *
>> +  global_finalize_lvalue (tree inner);
>> +
>>  private:
>>
>>/* Functions for implementing "compile".  */
>> diff --git a/gcc/jit/jit-playback.c b/gcc/jit/jit-playback.c
>> index 0fddf04da873..52fc92f5928c 100644
>> --- a/gcc/jit/jit-playback.c
>> +++ b/gcc/jit/jit-playback.c
>> @@ -510,14 +510,14 @@ new_function (location *loc,
>>return func;
>>  }
>>
>> -/* Construct a playback::lvalue instance (wrapping a tree).  */
>> +/* In use by new_global and new_global_initialized.  */
>>
>> -playback::lvalue *
>> +tree
>>  playback::context::
>> -new_global (location *loc,
>> -enum gcc_jit_global_kind kind,
>> -type *type,
>> -const char *name)
>> +global_new_decl (location *loc,
>> + enum gcc_jit_global_kind kind,
>> + type *type,
>> + const char *name)
>>  {
>>gcc_assert (type);
>>gcc_assert (name);
>> @@ -547,6 +547,15 @@ new_global (location *loc,
>>if (loc)
>>  set_tree_location (inner, loc);
>>
>> +  return inner;
>> +}
>> +
>> +/* In use by new_global and new_global_initialized.  */
>> +
>> +playback::lvalue *
>> +playback::context::
>> +global_finalize_lvalue (tree inner)
>> +{
>>varpool_node::get_create (inner);
>>
>>varpool_node::finalize_decl (inner);
>> @@ -556,6 +565,90 @@ new_global (location *loc,
>>return new lvalue (this, inner);
>>  }
>>
>> +/* Construct a playback::lvalue instance (wrapping a tree).  */
>> +
>> +playback::lvalue *
>> +playback::context::
>> +new_global (location *loc,
>> +enum gcc_jit_global_kind kind,
>> +type *type,
>> +const char *name)
>> +{
>> +  tree inner = global_new_decl (loc, kind, type, name);
>> +
>> +  return global_finalize_lvalue (inner);
>> +}
>> +
>> +/* Fill 'constructor_elements' with the memory content of
>> +   'initializer'.  Each element of the initializer is of the size of
>> +   type T.  In use by new_global_initialized.*/
>> +
>> +template
>> +static void
>> +load_blob_in_ctor (vec *_elements,
>> +   size_t num_elem,
>> +   const void *initializer)
>> +{
>> +  /* Loosely based on 'output_init_element' c-typeck.c:9691.  */
>> +  const T *p = (const T *)initializer;
>> +  tree node = make_unsigned_type (BITS_PER_UNIT * sizeof (T));
>> +  for (size_t i = 0; i < num_elem; i++)
>> +{
>> +  constructor_elt celt =
>> +{ build_int_cst (long_unsigned_type_node, i),
>> +  build_int_cst (node, p[i]) };
>> +  vec_safe_push (constructor_elements, celt);
>> +}
>> +}
>> +
>> +/* Construct an initialized 

Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-19 Thread Uros Bizjak via Gcc-patches
On Wed, Aug 19, 2020 at 4:25 AM Hongtao Liu  wrote:
>
> On Mon, Aug 17, 2020 at 6:08 PM Uros Bizjak  wrote:
> >
> > On Fri, Aug 14, 2020 at 10:26 AM Hongtao Liu  wrote:
> > >
> > > Enable operator or/xor/and/andn/not for mask register, kxnor is not
> > > enabled since there's no corresponding instruction for general
> > > registers.
> > >
> > > gcc/
> > > PR target/88808
> > > * config/i386/i386.md: (*movsi_internal): Adjust constraints
> > > for mask registers.
> > > (*movhi_internal): Ditto.
> > > (*movqi_internal): Ditto.
> > > (*anddi_1): Support mask register operations
> > > (*and_1): Ditto.
> > > (*andqi_1): Ditto.
> > > (*andn_1): Ditto.
> > > (*_1): Ditto.
> > > (*qi_1): Ditto.
> > > (*one_cmpl2_1): Ditto.
> > > (*one_cmplsi2_1_zext): Ditto.
> > > (*one_cmplqi2_1): Ditto.
> > >
> > > gcc/testsuite/
> > > * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
> > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> >
> > index 74d207c3711..e8ad79d1b0a 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -2294,7 +2294,7 @@
> >
> >  (define_insn "*movsi_internal"
> >[(set (match_operand:SI 0 "nonimmediate_operand"
> > -"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k")
> > +"=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,k")
> >  (match_operand:SI 1 "general_operand"
> >  "g ,re,C ,*y,m  ,*y,*y,r  ,C ,*v,m ,*v,*v,r  ,*r,*km,*k ,CBC"))]
> >"!(MEM_P (operands[0]) && MEM_P (operands[1]))"
> >
> > I'd rather see *k everywhere, also with *movqi_internal and
> > *movhi_internal patterns. The "*" means that the allocator won't
> > allocate a mask register by default, but it will be used to optimize
> > moves. With the above change, you are risking that during integer
> > register pressure, the register allocator will allocate zero to a mask
> > register, and later "optimize" the move with a direct maskreg-intreg
> > move.
> >
> > The current strategy is that only general registers get allocated for
> > integer modes. Let's keep it this way for now.
> >
>
> Yes,  though it would fail gcc.target/i386/avx512dq-pr88465.c and
> gcc.target/i386/avx512f-pr88465.c, i think it's more reasonable not to
> move zero into mask register directly.

Although it would be nice if the register allocator was smart enough,
the current strategy is to introduce peephole2 patterns to fix these
problems, similar to [1]. These peepholes can be introduced in a
follow-up patch.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551744.html

> > Otherwise, the patchset LGTM, but please test the suggested changes and 
> > repost.
> >
> > BTW: Do you plan to remove mask operations from sse.md? ATM, they are
> > used to distinguish mask operations, generated from builtins from
> > generic operations, so I'd like to keep them for a while. The drawback
> > is, that they are not combined with other operations, but at the end
> > of the day, this is what the programmer asked for by using builtins.
>
> Agree, I prefer to keep them.

Thinking some more about the approach, it looks to me that the optimal
solution is a post-reload splitter that would convert "generic"
patterns to mask operations from sse.md. The mask operations don't set
flags, so we can substantially improve post reload scheduling of these
instructions by removing flags clobber.

So, simply add "#" to relevant alternatives of logic patterns and add
something like:

--cut here--
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 41c6dbfa668..ad49bdc7583 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1470,6 +1470,18 @@
   ]
   (const_string "")))])

+(define_split
+  [(set (match_operand:SWI1248_AVX512BW 0 "mask_reg_operand")
+   (any_logic:SWI1248_AVX512BW
+ (match_operand:SWI1248_AVX512BW 1 "mask_reg_operand")
+ (match_operand:SWI1248_AVX512BW 2 "mask_reg_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_AVX512F && reload_completed"
+  [(parallel
+ [(set (match_dup 0)
+  (any_logic:SWI1248_AVX512BW (match_dup 1) (match_dup 2)))
+  (unspec [(const_int 0)] UNSPEC_MASKOP)])])
+
 (define_insn "kandn"
   [(set (match_operand:SWI1248_AVX512BW 0 "register_operand" "=k")
(and:SWI1248_AVX512BW
--cut here--

and similar for kandn and knot in sse.md. You will have to add
mask_reg_operand predicate, see e.g. sse_reg_operand in predicates.md
for example.

We don't lose anything, because all important transformations,
propagations and simplifications with these patterns happen before
reload.

Uros.


Re: [PATCH 2/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-19 Thread Uros Bizjak via Gcc-patches
On Wed, Aug 19, 2020 at 4:17 AM Hongtao Liu  wrote:
>
> On Wed, Aug 19, 2020 at 10:17 AM Hongtao Liu  wrote:
> >
> > On Mon, Aug 17, 2020 at 5:34 PM Uros Bizjak  wrote:
> > >
> > > On Fri, Aug 14, 2020 at 10:24 AM Hongtao Liu  wrote:
> > > >
> > > >   Enable direct move between masks and gprs in pass_reload with
> > > > consideration of cost model.
> > > >
> > > > Changelog
> > > > gcc/
> > > > * config/i386/i386.c (inline_secondary_memory_needed):
> > > > No memory is needed between mask regs and gpr.
> > > > (ix86_hard_regno_mode_ok): Add condition TARGET_AVX512F for
> > > > mask regno.
> > > > * config/i386/i386.h (enum reg_class): Add INT_MASK_REGS.
> > > > (REG_CLASS_NAMES): Ditto.
> > > > (REG_CLASS_CONTENTS): Ditto.
> > > > * config/i386/i386.md: Exclude mask register in
> > > > define_peephole2 which is available only for gpr.
> > > >
> > > > gcc/testsuites/
> > > > * gcc.target/i386/pr71453-1.c: New tests.
> > > > * gcc.target/i386/pr71453-2.c: Ditto.
> > > > * gcc.target/i386/pr71453-3.c: Ditto.
> > > > * gcc.target/i386/pr71453-4.c: Ditto.
> > >
> > > @@ -18571,9 +18571,7 @@ inline_secondary_memory_needed (machine_mode
> > > mode, reg_class_t class1,
> > >|| MAYBE_SSE_CLASS_P (class1) != SSE_CLASS_P (class1)
> > >|| MAYBE_SSE_CLASS_P (class2) != SSE_CLASS_P (class2)
> > >|| MAYBE_MMX_CLASS_P (class1) != MMX_CLASS_P (class1)
> > > -  || MAYBE_MMX_CLASS_P (class2) != MMX_CLASS_P (class2)
> > > -  || MAYBE_MASK_CLASS_P (class1) != MASK_CLASS_P (class1)
> > > -  || MAYBE_MASK_CLASS_P (class2) != MASK_CLASS_P (class2))
> > > +  || MAYBE_MMX_CLASS_P (class2) != MMX_CLASS_P (class2))
> > >  {
> > >gcc_assert (!strict || lra_in_progress);
> > >return true;
> > >
> > > No, this is still needed, the reason is explained in the comment above
> >
> > Remove my change here.
> >
> > > inline_secondary_memory_needed:
> > >
> > >The function can't work reliably when one of the CLASSES is a class
> > >containing registers from multiple sets.  We avoid this by never 
> > > combining
> > >different sets in a single alternative in the machine description.
> > >Ensure that this constraint holds to avoid unexpected surprises.
> > >
> > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > > index b24a4557871..74d207c3711 100644
> > > --- a/gcc/config/i386/i386.md
> > > +++ b/gcc/config/i386/i386.md
> > > @@ -15051,7 +15051,7 @@
> > > (parallel [(set (reg:CC FLAGS_REG)
> > > (unspec:CC [(match_dup 0)] UNSPEC_PARITY))
> > >(clobber (match_dup 0))])]
> > > -  ""
> > > +  "!MASK_REGNO_P (REGNO (operands[0]))"
> > >[(set (reg:CC FLAGS_REG)
> > >  (unspec:CC [(match_dup 1)] UNSPEC_PARITY))])
> > >
> > > @@ -15072,6 +15072,7 @@
> > > (label_ref (match_operand 5))
> > > (pc)))]
> > >"REGNO (operands[2]) == REGNO (operands[3])
> > > +   && !MASK_REGNO_P (REGNO (operands[1]))
> > > && peep2_reg_dead_p (3, operands[0])
> > > && peep2_reg_dead_p (3, operands[2])
> > > && peep2_regno_dead_p (4, FLAGS_REG)"
> > >
> >
> > Changed for upper two define_peepholes.
> >
> > > Actually, there are several (historic?) peephole2 patterns that assume
> >
> > I didn't find those patterns.
> >
> > I looked through i386.md, there are 3 cases
> > 1. Since mask registers are only used for "mov/zero_extend/bitwise"
> > patterns, peephole2 patterns involving only other patterns are safe to
> > use general_registers.
> > 2. For those peephole2 patterns containing both
> > mov/zero_extend/bitwise and other patterns, implicit conditions such
> > as match_dup used in other patterns will make them the same as case 1.
> > 3. Peephole2 patterns are safe to use mask registers, since they would
> > be eliminated in output patterns.
> >
> > > register_operand means only integer registers. Just change
> > > register_operand to general_reg_operand and eventually
> > > nonimmediate_operand to nonimmediate_gr_operand. Do not put additional
> > > predicates into insn predicate.
> >
> >
> >
> > >
> > > Uros.
> >
> > Update patch.
> >
> > --
> > BR,
> > Hongtao
>
> This patch.

OK, modulo:

+/* { dg-final { scan-assembler-not "%xmm" } } */

It is not clear to me what the testcase is testing here. The scan
string is probably too broad and can generate false positives.

Is it possible to refine the scans?

Uros.


Re: [PATCH 1/4][PR target/88808]Enable bitwise operator for AVX512 masks.

2020-08-19 Thread Uros Bizjak via Gcc-patches
On Wed, Aug 19, 2020 at 3:52 AM Hongtao Liu  wrote:
>
> On Mon, Aug 17, 2020 at 5:20 PM Uros Bizjak  wrote:
> >
> > On Fri, Aug 14, 2020 at 10:22 AM Hongtao Liu  wrote:
> > >
> > > Hi:
> > >   First, since avx512 masks involve both vector isa and general part,
> > > so i add both maintainers to the maillist.
> > >
> > >   I'm doing this in 4 steps:
> > >   1 - Add cost model for operation of mask registers.
> > >   2 - Introduce new cover class INT_MASK_REGS, this will enable direct
> > > move between gpr and mask registers in pass_reload by consideration of
> > > cost model, this is similar as INT_SSE_REGS.
> > >   3 - Tune cost model.
> > >   4 - Enable operator or/xor/and/andn/not for mask register. kxnor is
> > > not enabled since there's no corresponding instruction for general
> > > registers, 64bit mask op is not enabled for 32bit target.
> > > kadd/kshift/ktest are not merged into general versionsadd/ashl/test
> > > since i think it would be odd to use mask register for those
> > > operations.
> > >
> > >   Bootstrap is ok, regression test is ok for i386/x86-64 result.
> > >   There's some improvement for performance of SPEC2017 tested on SKL,
> > > i observe there're many spills from integer to mask registers instead
> > > of memory which is the reason for the improvement.
> >
> > +  if (MASK_CLASS_P (regclass))
> > +{
> > +  int index;
> > +  switch (GET_MODE_SIZE (mode))
> > +{
> > +case 1:
> > +  index = 0;
> > +  break;
> > +case 2:
> > +  index = 1;
> > +  break;
> > +default:
> > +  index = 3;
> >
> > Max index = 2!
> >
>
> Fix typo.
>
> > +  break;
> > +}
> > +
> > +  if (in == 2)
> > +return MAX (ix86_cost->hard_register.mask_load[index],
> > +ix86_cost->hard_register.mask_store[index]);
> > +  return in ? ix86_cost->hard_register.mask_load[2]
> > +: ix86_cost->hard_register.mask_store[2];
> > +}
> >
> > Are DImode loads and stores assumed to cost the same as SImode? A
> > comment would be nice here.
> >
>
> Yes, comment is added.
>
> > Uros.
>
> Update patch.

Patch is OK (no comment on costs, though).

Thanks,
Uros.


[PATCH v2] testsuite: Update some vect cases for partial vectors

2020-08-19 Thread Kewen.Lin via Gcc-patches
Hi Richard,

>> Yeah, the comments were confusing, its intent is to check which targets
>> support partial vectors and which usage to be used.
>>
>> How about to update them like:
>>
>> "Return true if loops using partial vectors are supported and usage kind is
>> 1/2".
> 
> I wasn't really commenting on the comment so much as the intent.
> It should be possible to run the testsuite with:
> 
>   --target_board unix/--param=vect-partial-vector-usage=1
> 
> and get the right results.
> 
>>> E.g. maybe use check_compile to run gcc with “-Q --help=params” and an
>>> arbitrary output type (probably assembly).  Then use “regexp” on the
>>> lines to parse the --param=vect-partial-vector-usage value.  At that
>>> point it would be worth caching the result.
>>
>> Now the default value of this parameter is 2, even for those targets which
>> don't have the supports with partial vectors.  Since we will get the value
>> 2 on those unsupported targets, it looks like we have to set it manually?
> 
> I think that just means we want:
> 
> vect_len_load_store
>   the len_load_store equivalent of vect_fully_masked, i.e. whether
>   the target supports len load/store (regardless of whether the
>   --param enables it)
> 
> vect_partial_vectors
>   (vect_fully_masked || vect_len_load_store) && param != 0
> 
> vect_partial_vectors_usage_1
>   (vect_fully_masked || vect_len_load_store) && param == 1
> 
> vect_partial_vectors_usage_2
>   (vect_fully_masked || vect_len_load_store) && param == 2
> 

Thanks for the clarification!

Sorry for the late update, this new version depends on the --help= patch,
I held it for a while.

Bootstrapped/regtested on powerpc64le-linux-gnu P9, with/without any
explicit usage setting, it also works well with --target_board 
unix/--param=vect-partial-vector-usage=1.

By the way, for the testing on usage 1, it needs the function 
check_effective_target_vect_len_load_store to return true for Power9,
this part will be sent for review separately.

Is it ok for trunk?

BR,
Kewen
-
gcc/ChangeLog:

* doc/sourcebuild.texi (vect_len_load_store,
vect_partial_vectors_usage_1, vect_partial_vectors_usage_2,
vect_partial_vectors): Document.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/bb-slp-pr69907.c: Adjust for partial vector usages.
* gcc.dg/vect/slp-3.c: Likewise.
* gcc.dg/vect/slp-multitypes-11.c: Likewise.
* gcc.dg/vect/slp-perm-1.c: Likewise.
* gcc.dg/vect/slp-perm-5.c: Likewise.
* gcc.dg/vect/slp-perm-6.c: Likewise.
* gcc.dg/vect/slp-perm-7.c: Likewise.
* gcc.dg/vect/slp-perm-8.c: Likewise.
* gcc.dg/vect/slp-perm-9.c: Likewise.
* gcc.dg/vect/vect-version-2.c: Likewise.
* lib/target-supports.exp (check_vect_partial_vector_usage): New
function.
(check_effective_target_vect_len_load_store): Likewise.
(check_effective_target_vect_partial_vectors_usage_1): Likewise.
(check_effective_target_vect_partial_vectors_usage_2): Likewise.
(check_effective_target_vect_partial_vectors): Likewise.
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 9f37ac26241..2290af5812d 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1689,6 +1689,19 @@ Target supports AND, IOR and XOR reduction on vectors.
 
 @item vect_fold_extract_last
 Target supports the @code{fold_extract_last} optab.
+
+@item vect_len_load_store
+Target supports loops using length-based partial vectors.
+
+@item vect_partial_vectors_usage_1
+Target supports loops using partial vectors but only for those loops whose need
+to iterate can be removed.
+
+@item vect_partial_vectors_usage_2
+Target supports loops using partial vectors and for all loops.
+
+@item vect_partial_vectors
+Target supports loops using partial vectors.
 @end table
 
 @subsubsection Thread Local Storage attributes
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c
index fe52d18525a..b348526b62f 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c
@@ -1,5 +1,7 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-O3" } */
+/* Disable for vectorization using partial vectors since it would have only
+   one iteration left, consequently BB vectorization won't happen.  */
+/* { dg-additional-options "-O3 --param=vect-partial-vector-usage=0" } */
 /* { dg-require-effective-target vect_unpack } */
 
 #include "tree-vect.h"
diff --git a/gcc/testsuite/gcc.dg/vect/slp-3.c 
b/gcc/testsuite/gcc.dg/vect/slp-3.c
index 5e40499ff96..46ab584419a 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-3.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-3.c
@@ -141,8 +141,8 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target { 
! vect_fully_masked } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect" { target 
vect_fully_masked } } } */
-/* { dg-final {