date:20191107

Re: [PATCH][committed] Warn about inconsistent OpenACC nested reduction clauses

2019-11-07 Thread Harwath, Frederik

Hi Jakub,

On 06.11.19 14:00, Jakub Jelinek wrote:
> On Wed, Nov 06, 2019 at 01:41:47PM +0100, frede...@codesourcery.com wrote:
>> --- a/gcc/omp-low.c
>> +++ b/gcc/omp-low.c
>> @@ -128,6 +128,12 @@ struct omp_context
>> [...]
>> +  /* A tree_list of the reduction clauses in this context.  */
>> +  tree local_reduction_clauses;
>> +
>> +  /* A tree_list of the reduction clauses in outer contexts.  */
>> +  tree outer_reduction_clauses;
> 
> Could there be acc in the name to make it clear it is OpenACC only?

Yes, will be added.


>> @@ -910,6 +916,8 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
>> [...]
>> +  ctx->local_reduction_clauses = NULL;
>> [...]
>> @@ -925,6 +933,8 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
>> [...]
>> +  ctx->local_reduction_clauses = NULL;
>> +  ctx->outer_reduction_clauses = NULL;
> 
> The = NULL assignments are unnecessary in all 3 cases, ctx is allocated with
> XCNEW.

Ok, will be removed.
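
The point about XCNEW can be sketched in plain C. XCNEW allocates zero-filled storage (it wraps xcalloc), so fields start out as null without explicit assignments. The struct below is a hypothetical miniature, not the real omp_context, and new_ctx is an illustrative helper built on calloc.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature of a context with two list heads.  */
struct ctx {
  void *local_reduction_clauses;
  void *outer_reduction_clauses;
};

/* Zero-filled allocation, the same idea as XCNEW.  Like GCC itself, this
   assumes all-bits-zero is a null pointer on the host.  */
static struct ctx *new_ctx(void)
{
  struct ctx *c = calloc(1, sizeof *c);
  if (c == NULL)
    abort();
  return c;
}
```

Both fields compare equal to NULL right after new_ctx returns, which is why the extra "= NULL" lines add nothing.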

>> @@ -1139,6 +1149,11 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
>>goto do_private;
>>  
>>  case OMP_CLAUSE_REDUCTION:
>> +  if (is_oacc_parallel (ctx) || is_oacc_kernels (ctx))
>> +ctx->local_reduction_clauses
>> +  = tree_cons (NULL, c, ctx->local_reduction_clauses);
> 
> I'm not sure it is a good idea to use a TREE_LIST in this case, vec would be
> more natural, wouldn't it.

Yes.

> Or, wouldn't it be better to do this checking in the gimplifier instead of
> omp-low.c?  There we have splay trees with GOVD_REDUCTION etc. for the
> variables, so it wouldn't be O(#reductions^2) compile time.
> It is true that the gimplifier doesn't record the reduction codes (after
> all, OpenMP has UDRs and so there can be fairly arbitrary reductions).


Right, I have considered moving the implementation somewhere else before.
I am going to look into this, but perhaps we will just keep it where it is
if moving it would make the implementation more complicated.

> Consider million reduction clauses on nested loops.
> If gimplifier is not the right spot, then use a splay tree + vector instead?
> splay tree for the outer ones, vector for the local ones, and put into both
> the clauses, so you can compare reduction code etc.

Sounds like a good idea. I am going to try that.
However, I had not seen the suboptimal data structure choices
of the original patch as a problem, since the case of a million reduction
clauses had not occurred to me.
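
A rough sketch of the "lookup structure for the outer clauses, vector for the local ones" idea, under stated assumptions: a sorted array with bsearch stands in for the splay tree, each variable appears at most once in the outer set, and struct red_clause and count_mismatches are hypothetical names rather than anything in omp-low.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reduction clause: a variable id plus the
   reduction operator code.  */
struct red_clause { int var; int op; };

static int cmp_var(const void *a, const void *b)
{
  const struct red_clause *x = a, *y = b;
  return (x->var > y->var) - (x->var < y->var);
}

/* Count local clauses whose operator differs from the outer clause on
   the same variable.  OUTER is sorted in place so that each lookup is
   O(log n) instead of a linear scan over all outer clauses.  */
static size_t count_mismatches(struct red_clause *outer, size_t n_outer,
                               const struct red_clause *local, size_t n_local)
{
  size_t bad = 0;
  qsort(outer, n_outer, sizeof *outer, cmp_var);
  for (size_t i = 0; i < n_local; i++) {
    const struct red_clause *hit
      = bsearch(&local[i], outer, n_outer, sizeof *outer, cmp_var);
    if (hit != NULL && hit->op != local[i].op)
      bad++;
  }
  return bad;
}
```

With n local and m outer clauses this costs roughly O((n + m) log m) rather than O(n * m), which is the concern behind the "million reduction clauses" remark.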

Thank you for your feedback!

Best regards,
Frederik

Re: [Patch][Fortran] PR91253 fix continuation-line handling with -pre_include

2019-11-07 Thread Jerry DeLisle


On 11/7/19 12:41 PM, Tobias Burnus wrote:

This fixes the gfortran.dg/continuation_6.f testsuite failure with newer
GLIBC.

The continuation line handling assumes that the line number starts at 0 (→ 
continue_line) and then can be incremented, if needed.


The problem came up with -pre_include, which is used with newer GLIBC to provide 
things like "!GCC$ builtin (cos) attributes simd (notinbranch) if('x86_64')".


There, first the math-vector-fortran.h file is loaded, then the actual
file. The 'continue_line' gets incremented for math-vector-fortran.h but nothing
resets it before parsing the actual input file. For the 'include_stmt' function,
the reset happens during parsing; in our case, this knowledge is only in
the line information, and on file change 'continue_line' is not updated/reset.
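
A toy model of the stale-state problem described above, not the gfortran scanner itself. Only 'continue_line' is named in the message; the struct, the continue_count field, the see_continuation helper, and the assumption that a continuation line is counted only when its number exceeds 'continue_line' are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scanner state.  */
struct scan_state {
  const char *file;    /* file of the most recent line */
  int continue_line;   /* highest line number already counted */
  int continue_count;  /* continuation lines counted so far */
};

/* Record a continuation line LINENUM read from FILE.  */
static void see_continuation(struct scan_state *st, const char *file,
                             int linenum)
{
  if (st->file == NULL || strcmp(st->file, file) != 0) {
    st->file = file;
    st->continue_line = 0;  /* the reset on file change */
  }
  if (st->continue_line < linenum)
    st->continue_count++;   /* count each line only once */
  st->continue_line = linenum;
}
```

Without the reset, a header that ends at line 40 would leave continue_line at 40, and continuation lines 2 and 3 of the main file would be skipped by the comparison.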


I think the same issue can occur with #include, especially if one plays with
#line, but I have not actually tested it. Obviously, if one plays around with
#line during a continuation block, this check won't work either. However, it
should fix the most common occurrence.


Additionally, I have removed the ATTRIBUTE_UNUSED from get_file's 'reason' as it 
is used in the linemap_add call.


And I have moved the OpenMP/OpenACC comment before the openmp/openacc condition,
where in my opinion it belongs.


OK for the trunk?

Tobias



Yes, OK, thanks for the patch.

Jerry

Re: [PATCH rs6000]Fix PR92132

2019-11-07 Thread Kewen.Lin

Hi Segher,

on 2019/11/8 8:07 AM, Segher Boessenkool wrote:
> Hi!
> 
>>> Half are pretty simple:
>>>
>>> lt(a,b) = gt(b,a)
>>> gt(a,b) = gt(a,b)
>>> eq(a,b) = eq(a,b)
>>> le(a,b) = ge(b,a)
>>> ge(a,b) = ge(a,b)
>>>
>>> ltgt(a,b) = ge(a,b) ^ ge(b,a)
>>> ord(a,b)  = ge(a,b) | ge(b,a)
>>>
>>> The other half are the negations of those:
>>>
>>> unge(a,b) = ~gt(b,a)
>>> unle(a,b) = ~gt(a,b)
>>> ne(a,b)   = ~eq(a,b)
>>> ungt(a,b) = ~ge(b,a)
>>> unlt(a,b) = ~ge(a,b)
>>>
>>> uneq(a,b) = ~(ge(a,b) ^ ge(b,a))
>>> un(a,b) = ~(ge(a,b) | ge(b,a))
>>
>> Awesome!  Do you suggest refactoring on them?  :)
> 
> I'd do the first five in one pattern (which then swaps two ops and the
> condition in the lt and le case), and the other five in another pattern.
> And the rest in two or four patterns?  Just try it out, see what works
> well.  It helps to do a bunch together in one pattern, but if that then
> turns into special cases for everything, more might be lost than gained.
>

Got it, I'll make a refactoring patch for this part later.
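
The quoted identities can be checked on scalars with a short C sketch. It assumes IEEE semantics for the C comparison operators (every ordered comparison is false when an operand is a NaN); the helper names mirror the thread, while identities_hold and the sample set are illustrative only and say nothing about the vector patterns themselves.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* The three primitive comparisons the thread builds on, plus "unordered".  */
static bool gt(double a, double b) { return a > b; }
static bool ge(double a, double b) { return a >= b; }
static bool eq(double a, double b) { return a == b; }
static bool un(double a, double b) { return isnan(a) || isnan(b); }

/* True when every quoted identity holds for the pair (a, b).  */
static bool identities_hold(double a, double b)
{
  bool u = un(a, b);
  return (a < b)        == gt(b, a)                  /* lt   */
      && (a <= b)       == ge(b, a)                  /* le   */
      && (!u && a != b) == (ge(a, b) ^ ge(b, a))     /* ltgt */
      && (!u)           == (ge(a, b) | ge(b, a))     /* ord  */
      && (u || a >= b)  == !gt(b, a)                 /* unge */
      && (u || a <= b)  == !gt(a, b)                 /* unle */
      && (a != b)       == !eq(a, b)                 /* ne   */
      && (u || a > b)   == !ge(b, a)                 /* ungt */
      && (u || a < b)   == !ge(a, b)                 /* unlt */
      && (u || a == b)  == !(ge(a, b) ^ ge(b, a))    /* uneq */
      && u              == !(ge(a, b) | ge(b, a));   /* un   */
}

/* Run the check over a few ordered values, an infinity and a NaN.  */
static bool identities_hold_on_samples(void)
{
  const double v[] = { -1.0, 0.0, 1.0, INFINITY, NAN };
  for (int i = 0; i < 5; i++)
    for (int j = 0; j < 5; j++)
      if (!identities_hold(v[i], v[j]))
        return false;
  return true;
}
```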

> 
>>> 8 codes, ordered:    never     lt   gt   ltgt eq   le   ge   ordered
>>> 8 codes, unordered:  unordered unlt ungt ne   uneq unle unge always
>>> 8 codes, fast-math:  never     lt   gt   ne   eq   le   ge   always
>>> 8 codes, non-fp:     never     lt   gt   ne   eq   le   ge   always
>>
>> Sorry, I don't quite follow this table.  What's the column heads?
> 
> The first row is the eight possible fp conditions that are not always
> true if unordered is set; the second row is those that *are* always true
> if it is set.  The other two rows (which are the same) are just the eight
> conditions that do not test unordered at all.
> 
> The tricky one is "ne": for FP *with* NaNs, "ne" means "less than, or
> greater than, or unordered", while without NaNs (i.e. -ffast-math) it
> means "less than, or greater than".
> 
> You could write the column heads as
> --/--/--  lt/--/--  --/gt/--  lt/gt/--  --/--/eq  lt/--/eq  --/gt/eq  lt/gt/eq
> if that helps?  Just the eight combinations of the first three flags.
> 

Thanks a lot for the explanation.  It's helpful!
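
One way to read the table, sketched in C: treat each condition as a set of accepted outcomes out of lt, gt, eq and unordered. The bit assignment below is an assumption chosen so that the column index equals the lt/gt/eq bit set; classify, holds, ordered_code and unordered_code are hypothetical helpers.

```c
#include <assert.h>
#include <math.h>

/* One flag per possible outcome of comparing two values.  */
enum { F_LT = 1, F_GT = 2, F_EQ = 4, F_UN = 8 };

/* Exactly one flag describes any pair of doubles.  */
static int classify(double a, double b)
{
  if (isnan(a) || isnan(b))
    return F_UN;
  if (a < b)
    return F_LT;
  if (a > b)
    return F_GT;
  return F_EQ;
}

/* A condition is a set of accepted flags.  */
static int holds(int cond, double a, double b)
{
  return (cond & classify(a, b)) != 0;
}

/* Row 1 of the table: column 0..7 is the lt/gt/eq bit set itself.
   Row 2: the same column with the unordered flag added.  */
static int ordered_code(int column)   { return column; }
static int unordered_code(int column) { return column | F_UN; }
```

Column 3 is "ltgt" in the first row and "ne" in the second, which matches the remark that "ne" with NaNs means less than, greater than, or unordered.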
 

>> +;; For signed integer vectors comparison.
>> +(define_expand "vec_cmp"
>> +  [(set (match_operand:VEC_I 0 "vint_operand")
>> +(match_operator 1 "signed_or_equality_comparison_operator"
>> +  [(match_operand:VEC_I 2 "vint_operand")
>> +   (match_operand:VEC_I 3 "vint_operand")]))]
>> +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
>> +{
>> +  enum rtx_code code = GET_CODE (operands[1]);
>> +  rtx tmp = gen_reg_rtx (mode);
>> +  switch (code)
>> +    {
>> +    case NE:
>> +      emit_insn (gen_vector_eq (operands[0], operands[2], operands[3]));
>> +      emit_insn (gen_one_cmpl2 (operands[0], operands[0]));
>> +      break;
>> +    case EQ:
>> +      emit_insn (gen_vector_eq (operands[0], operands[2], operands[3]));
>> +      break;
>> +    case GE:
>> +      emit_insn (gen_vector_nlt (operands[0], operands[2], operands[3], tmp));
>> +      break;
>> +    case GT:
>> +      emit_insn (gen_vector_gt (operands[0], operands[2], operands[3]));
>> +      break;
>> +    case LE:
>> +      emit_insn (gen_vector_ngt (operands[0], operands[2], operands[3], tmp));
>> +      break;
>> +    case LT:
>> +      emit_insn (gen_vector_gt (operands[0], operands[3], operands[2]));
>> +      break;
>> +    default:
>> +      gcc_unreachable ();
>> +      break;
>> +    }
>> +  DONE;
>> +})
> 
> I would think this can be done easier, but it is alright for now, it can
> be touched up later if we want.
> 
>> +;; For float point vectors comparison.
>> +(define_expand "vec_cmp"
> 
> This, too.
> 
>> +  [(set (match_operand: 0 "vint_operand")
>> + (match_operator 1 "comparison_operator"
> 
> If you make an iterator for this instead, it is simpler code (you can then
> use <code> to do all these cases in one statement).

If my understanding is correct, and based on some earlier attempts, I think we
have to leave these **CASEs** there (at least at the 1st level define_expand
for vec_cmp*), since vec_cmp* doesn't have a <code> field in the pattern name.
The code can only be extracted from operator 1.  I tried to add one dummy
operand to hold <code> but it's impractical.

Sorry, I may be missing something here; I'm happy to make a follow-up patch to
unify these cases if there is a good way to run a code iterator on them.

> 
> But that can be done later.  Okay for trunk.  Thanks!
> 

Many thanks for your time!


BR,
Kewen

Re: [PR47785] COLLECT_AS_OPTIONS

2019-11-07 Thread Kugan Vivekanandarajah

Hi Richard,
Thanks for the review.

On Tue, 5 Nov 2019 at 23:08, Richard Biener  wrote:
>
> On Tue, Nov 5, 2019 at 12:17 AM Kugan Vivekanandarajah
>  wrote:
> >
> > Hi,
> > Thanks for the review.
> >
> > On Tue, 5 Nov 2019 at 03:57, H.J. Lu  wrote:
> > >
> > > On Sun, Nov 3, 2019 at 6:45 PM Kugan Vivekanandarajah
> > >  wrote:
> > > >
> > > > Thanks for the reviews.
> > > >
> > > >
> > > > On Sat, 2 Nov 2019 at 02:49, H.J. Lu  wrote:
> > > > >
> > > > > On Thu, Oct 31, 2019 at 6:33 PM Kugan Vivekanandarajah
> > > > >  wrote:
> > > > > >
> > > > > > On Wed, 30 Oct 2019 at 03:11, H.J. Lu  wrote:
> > > > > > >
> > > > > > > On Sun, Oct 27, 2019 at 6:33 PM Kugan Vivekanandarajah
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > Hi Richard,
> > > > > > > >
> > > > > > > > Thanks for the review.
> > > > > > > >
> > > > > > > > On Wed, 23 Oct 2019 at 23:07, Richard Biener 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Oct 21, 2019 at 10:04 AM Kugan Vivekanandarajah
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Richard,
> > > > > > > > > >
> > > > > > > > > > Thanks for the pointers.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Fri, 11 Oct 2019 at 22:33, Richard Biener 
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Oct 11, 2019 at 6:15 AM Kugan Vivekanandarajah
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > Thanks for the review.
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, 2 Oct 2019 at 20:41, Richard Biener 
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Oct 2, 2019 at 10:39 AM Kugan Vivekanandarajah
> > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > As mentioned in the PR, attached patch adds COLLECT_AS_OPTIONS for
> > > > > > > > > > > > > > passing assembler options specified with -Wa, to the link-time driver.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The proposed solution only works for uniform -Wa options across all
> > > > > > > > > > > > > > TUs. As mentioned by Richard Biener, supporting non-uniform -Wa flags
> > > > > > > > > > > > > > would require either adjusting partitioning according to flags or
> > > > > > > > > > > > > > emitting multiple object files from a single LTRANS CU. We could
> > > > > > > > > > > > > > consider this as a follow up.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Bootstrapped and regression tests on arm-linux-gcc. Is this OK for trunk?
> > > > > > > > > > > > >
> > > > > > > > > > > > > While it works for your simple cases it is unlikely to work in practice since
> > > > > > > > > > > > > your implementation needs the assembler options be present at the link
> > > > > > > > > > > > > command line.  I agree that this might be the way for people to go when
> > > > > > > > > > > > > they face the issue but then it needs to be documented somewhere
> > > > > > > > > > > > > in the manual.
> > > > > > > > > > > > >
> > > > > > > > > > > > > That is, with COLLECT_AS_OPTION (why singular?  I'd expected
> > > > > > > > > > > > > COLLECT_AS_OPTIONS) available to cc1 we could stream this string
> > > > > > > > > > > > > to lto_options and re-materialize it at link time (and diagnose mismatches
> > > > > > > > > > > > > even if we like).
> > > > > > > > > > > > OK. I will try to implement this. So the idea is if we provide
> > > > > > > > > > > > -Wa,options as part of the lto compile, this should be available
> > > > > > > > > > > > during link time. Like in:
> > > > > > > > > > > >
> > > > > > > > > > > > arm-linux-gnueabihf-gcc -march=armv7-a -mthumb -O2 -flto
> > > > > > > > > > > > -Wa,-mimplicit-it=always,-mthumb -c test.c
> > > > > > > > > > > > arm-linux-gnueabihf-gcc  -flto  test.o
> > > > > > > > > > > >
> > > > > > > > > > > > I am not sure where should we stream this. Currently, cl_optimization
> > > > > > > > > > > > has all the optimization flag provided for compiler and it is
> > > > > > > > > > > > autogenerated and all the flags are integer values. Do you have any
> > > > > > > > > > > > preference or example where this should be done.
> > > > > > > > > > >
> > > > > > > > > > > In lto_write_options, I'd simply append the contents of COLLECT_AS_OPTIONS
> > > > > > > > > > >
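
For readers unfamiliar with -Wa: the driver splits the text after "-Wa," at commas and hands each piece to the assembler as a separate option. A minimal C sketch of that splitting follows, using the option string from the example command in the thread. split_wa is a hypothetical helper, not a GCC function, and the sketch does not show how the proposed COLLECT_AS_OPTIONS variable is encoded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split a driver argument of the form "-Wa,opt1,opt2" into assembler
   options.  BUF is modified in place; pointers to the pieces are stored
   in OUT (at most MAX of them).  Returns the number of options found,
   or 0 if BUF is not a -Wa, argument.  */
static size_t split_wa(char *buf, char **out, size_t max)
{
  size_t n = 0;
  if (strncmp(buf, "-Wa,", 4) != 0)
    return 0;
  for (char *tok = strtok(buf + 4, ","); tok != NULL && n < max;
       tok = strtok(NULL, ","))
    out[n++] = tok;
  return n;
}
```

For "-Wa,-mimplicit-it=always,-mthumb" this yields the two assembler options -mimplicit-it=always and -mthumb, which are the options that would have to reach the assembler again at link time.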

Re: [PATCH, rs6000 v2] Make load cost more in vectorization cost for P8/P9

2019-11-07 Thread Kewen.Lin

Hi Segher,

on 2019/11/8 6:36 AM, Segher Boessenkool wrote:
> On Thu, Nov 07, 2019 at 11:22:12AM +0800, Kewen.Lin wrote:
>> One updated patch to enable it everywhere attached.
> 
>> 2019-11-07  Kewen Lin  
>>
>>  * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Make
>>  scalar_load, vector_load, unaligned_load and vector_gather_load cost
>>  more to conform hardware latency and insn cost settings.
> 
>>case unaligned_load:
>>case vector_gather_load:
> 
> ...
> 
>> -  /* Word aligned.  */
>> -  return 22;
> 
>> +/* Word aligned.  */
>> +return 44;
> 
> I don't think it should go up from 22 all the way to 44 (not all insns here
> are loads).  But exact cost doesn't really matter.  Make it 30 perhaps?
> 

Good point, I'll try the cost 33 (the avg. of 22 and 44).

> 44 (as well as 22) are awfully precise numbers for a very imprecise cost
> like this ;-)

Yep!  ;-)

> 
> With either cost, whatever seems reasonable to you and works well in your
> tests: approved for trunk.  Thanks!

Thanks!  I'll kick off regression testing on both BE and LE with the new cost,
then commit it if everything goes well.


BR,
Kewen

Handle removal of old-style function definitions in C2x

2019-11-07 Thread Joseph Myers

C2x removes support for old-style function definitions with identifier
lists, changing () in function definitions to be equivalent to (void)
(while () in declarations that are not definitions still gives an
unprototyped type).

This patch updates GCC accordingly.  The new semantics for () are
implemented for C2x mode (meaning () in function definitions isn't
diagnosed by -Wold-style-definition in that mode).
-Wold-style-definition is enabled by default, and turned into a
pedwarn, for C2x.
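
A small C illustration of the distinction, written so that it compiles in any standard mode. The function names are made up for the example, and the old-style definition is kept in a comment because C2x mode diagnoses it.

```c
#include <assert.h>

/* In C2x this definition is equivalent to "int answer(void)"; in older
   standard versions it defines a function whose type has no prototype.  */
int answer() { return 42; }

/* A prototype in every standard version.  */
int answer_proto(void) { return 42; }

/* An old-style definition with an identifier list, removed in C2x:

     int add(a, b) int a; int b; { return a + b; }

   It is left in a comment so that this file compiles in any mode.  */
```

Both functions are called the same way with no arguments; the difference only shows up in how calls with arguments are treated and in which diagnostics -Wold-style-definition produces.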

Bootstrapped with no regressions on x86_64-pc-linux-gnu.  Applied to 
mainline.

gcc:
2019-11-08  Joseph Myers  

* doc/invoke.texi (-Wold-style-definition): Document () not being
considered an old-style definition for C2x.

gcc/c:
2019-11-08  Joseph Myers  

* c-decl.c (grokparms): Convert () in a function definition to
(void) for C2x.
(store_parm_decls_oldstyle): Pedwarn for C2x.
(store_parm_decls): Update comment about () not generating a
prototype.

gcc/c-family:
2019-11-08  Joseph Myers  

* c.opt (Wold-style-definition): Initialize to -1.
* c-opts.c (c_common_post_options): Set warn_old_style_definition
to flag_isoc2x if not set explicitly.

gcc/testsuite:
2019-11-08  Joseph Myers  

* gcc.dg/c11-old-style-definition-1.c,
gcc.dg/c11-old-style-definition-2.c,
gcc.dg/c2x-old-style-definition-1.c,
gcc.dg/c2x-old-style-definition-2.c,
gcc.dg/c2x-old-style-definition-3.c,
gcc.dg/c2x-old-style-definition-4.c,
gcc.dg/c2x-old-style-definition-5.c,
gcc.dg/c2x-old-style-definition-6.c: New tests.

Index: gcc/c/c-decl.c
===
--- gcc/c/c-decl.c  (revision 277935)
+++ gcc/c/c-decl.c  (working copy)
@@ -7416,6 +7416,13 @@ grokparms (struct c_arg_info *arg_info, bool funcd
   tree parm, type, typelt;
   unsigned int parmno;
 
+  /* In C2X, convert () in a function definition to (void).  */
+  if (flag_isoc2x
+ && funcdef_flag
+ && !arg_types
+ && !arg_info->parms)
+   arg_types = arg_info->types = void_list_node;
+
   /* If there is a parameter of incomplete type in a definition,
 this is an error.  In a declaration this is valid, and a
 struct or union type may be completed later, before any calls
@@ -9261,8 +9268,15 @@ store_parm_decls_oldstyle (tree fndecl, const stru
   hash_set seen_args;
 
   if (!in_system_header_at (input_location))
-warning_at (DECL_SOURCE_LOCATION (fndecl),
-   OPT_Wold_style_definition, "old-style function definition");
+{
+  if (flag_isoc2x)
+   pedwarn (DECL_SOURCE_LOCATION (fndecl),
+OPT_Wold_style_definition, "old-style function definition");
+  else
+   warning_at (DECL_SOURCE_LOCATION (fndecl),
+   OPT_Wold_style_definition,
+   "old-style function definition");
+}
 
   /* Match each formal parameter name with its declaration.  Save each
  decl in the appropriate TREE_PURPOSE slot of the parmids chain.  */
@@ -9578,11 +9592,10 @@ store_parm_decls (void)
   struct c_arg_info *arg_info = current_function_arg_info;
   current_function_arg_info = 0;
 
-  /* True if this definition is written with a prototype.  Note:
- despite C99 6.7.5.3p14, we can *not* treat an empty argument
- list in a function definition as equivalent to (void) -- an
- empty argument list specifies the function has no parameters,
- but only (void) sets up a prototype for future calls.  */
+  /* True if this definition is written with a prototype.  In C2X, an
+ empty argument list was converted to (void) in grokparms; in
+ older C standard versions, it does not give the function a type
+ with a prototype for future calls.  */
   proto = arg_info->types != 0;
 
   if (proto)
Index: gcc/c-family/c-opts.c
===
--- gcc/c-family/c-opts.c   (revision 277935)
+++ gcc/c-family/c-opts.c   (working copy)
@@ -904,6 +904,10 @@ c_common_post_options (const char **pfilename)
   if (warn_implicit_int == -1)
 warn_implicit_int = flag_isoc99;
 
+  /* -Wold-style-definition is enabled by default for C2X.  */
+  if (warn_old_style_definition == -1)
+warn_old_style_definition = flag_isoc2x;
+
   /* -Wshift-overflow is enabled by default in C99 and C++11 modes.  */
   if (warn_shift_overflow == -1)
 warn_shift_overflow = cxx_dialect >= cxx11 || flag_isoc99;
Index: gcc/c-family/c.opt
===
--- gcc/c-family/c.opt  (revision 277935)
+++ gcc/c-family/c.opt  (working copy)
@@ -960,7 +960,7 @@ C ObjC Var(warn_old_style_declaration) Warning Ena
 Warn for obsolescent usage in a declaration.
 
 Wold-style-definition
-C ObjC Var(warn_old_style_definition) Warning
+C ObjC Var(warn_old_style_definition) Init(-1) Warning
 Warn

[committed] pa: Revise memory barriers to use strongly ordered ldcw instruction

2019-11-07 Thread John David Anglin

This change revises the memory barrier patterns to use the ldcw instruction
instead of the sync instruction.  The ldcw instruction performs better and I
have more confidence in it than in sync.

We use a location just above the top of the stack for these operations.  The
stack address is aligned to a 16-byte boundary if the system is not coherent.

I have added two new options.  The first is the -mcoherent-ldcw option.  The
majority of PA 2.0 systems have coherent caches and as a result the coherent
ldcw completer can be used.  In that case, the ldcw address doesn't require
16-byte alignment.  We set the default to -mcoherent-ldcw.

The second option is the -mordered option.  Although all PA 1.x systems have
ordered memory accesses, PA 2.0 systems are weakly ordered.  Since PA 2.0
systems are now prevalent, we set the default to -mno-ordered.  For ordered
systems, we fall back to just a compiler memory barrier.

I believe acquire and release fences can be defined in a similar way using an
ordered load and an ordered store, respectively.
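
For context, a portable C11 sketch of what full, release and acquire fences are used for at the source level. It does not show the PA patterns, and it makes no claim about which instruction the compiler emits for each fence on PA; publish, consume and full_barrier are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>

static int payload;       /* ordinary data being handed over */
static atomic_int ready;  /* flag that announces the data */

/* Writer: make the payload visible before the flag.  */
static void publish(int value)
{
  payload = value;
  atomic_thread_fence(memory_order_release);
  atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader: returns -1 until the flag is seen, then the payload.  */
static int consume(void)
{
  if (!atomic_load_explicit(&ready, memory_order_relaxed))
    return -1;
  atomic_thread_fence(memory_order_acquire);
  return payload;
}

/* A full barrier: orders all earlier loads and stores before later ones.  */
static void full_barrier(void)
{
  atomic_thread_fence(memory_order_seq_cst);
}
```

On a weakly ordered system the two fences in publish and consume are what keep a reader from seeing the flag without the payload; on a strongly ordered system a compiler-only barrier is enough, which is the case -mordered is meant for.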

Tested on hppa-unknown-linux-gnu, hppa2.0w-hp-hpux11.11 and hppa64-hp-hpux11.11.

Committed to trunk.

Dave

2019-11-07  John David Anglin  

* config/pa/pa.md (memory_barrier): Revise to use ldcw barriers.
Enhance comment.
(memory_barrier_coherent, memory_barrier_64, memory_barrier_32): New
insn patterns using ldcw instruction.
(memory_barrier): Remove insn pattern using sync instruction.
* config/pa/pa.opt (coherent-ldcw): New option.
(ordered): New option.

Index: config/pa/pa.md
===
--- config/pa/pa.md (revision 277870)
+++ config/pa/pa.md (working copy)
@@ -10086,23 +10086,55 @@
(set_attr "length" "4,16")])

 ;; PA 2.0 hardware supports out-of-order execution of loads and stores, so
-;; we need a memory barrier to enforce program order for memory references.
-;; Since we want PA 1.x code to be PA 2.0 compatible, we also need the
-;; barrier when generating PA 1.x code.
+;; we need memory barriers to enforce program order for memory references
+;; when the TLB and PSW O bits are not set.  We assume all PA 2.0 systems
+;; are weakly ordered since neither HP-UX or Linux set the PSW O bit.  Since
+;; we want PA 1.x code to be PA 2.0 compatible, we also need barriers when
+;; generating PA 1.x code even though all PA 1.x systems are strongly ordered.

+;; When barriers are needed, we use a strongly ordered ldcw instruction as
+;; the barrier.  Most PA 2.0 targets are cache coherent.  In that case, we
+;; can use the coherent cache control hint and avoid aligning the ldcw
+;; address.  In spite of its description, it is not clear that the sync
+;; instruction works as a barrier.
+
 (define_expand "memory_barrier"
-  [(set (match_dup 0)
-(unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))]
+  [(parallel
+ [(set (match_dup 0) (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))
+  (clobber (match_dup 1))])]
   ""
 {
-  operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  /* We don't need a barrier if the target uses ordered memory references.  */
+  if (TARGET_ORDERED)
+FAIL;
+  operands[1] = gen_reg_rtx (Pmode);
+  operands[0] = gen_rtx_MEM (BLKmode, operands[1]);
   MEM_VOLATILE_P (operands[0]) = 1;
 })

-(define_insn "*memory_barrier"
+(define_insn "*memory_barrier_coherent"
   [(set (match_operand:BLK 0 "" "")
-(unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))]
+(unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))
+   (clobber (match_operand 1 "pmode_register_operand" "=r"))]
+  "TARGET_PA_20 && TARGET_COHERENT_LDCW"
+  "ldcw,co 0(%%sp),%1"
+  [(set_attr "type" "binary")
+   (set_attr "length" "4")])
+
+(define_insn "*memory_barrier_64"
+  [(set (match_operand:BLK 0 "" "")
+(unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))
+(clobber (match_operand 1 "pmode_register_operand" "="))]
+  "TARGET_64BIT"
+  "ldo 15(%%sp),%1\n\tdepd %%r0,63,3,%1\n\tldcw 0(%1),%1"
+  [(set_attr "type" "binary")
+   (set_attr "length" "12")])
+
+(define_insn "*memory_barrier_32"
+  [(set (match_operand:BLK 0 "" "")
+(unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))
+(clobber (match_operand 1 "pmode_register_operand" "="))]
   ""
-  "sync"
+  "ldo 15(%%sp),%1\n\t{dep|depw} %%r0,31,3,%1\n\tldcw 0(%1),%1"
   [(set_attr "type" "binary")
-   (set_attr "length" "4")])
+   (set_attr "length" "12")])
Index: config/pa/pa.opt
===
--- config/pa/pa.opt(revision 277870)
+++ config/pa/pa.opt(working copy)
@@ -45,6 +45,10 @@
 Target Report Mask(CALLER_COPIES)
 Caller copies function arguments passed by hidden reference.

+mcoherent-ldcw
+Target Report Var(TARGET_COHERENT_LDCW) Init(1)
+Use ldcw/ldcd coherent cache-control hint.
+
 mdisable-fpregs
 Target Report Mask(DISABLE_FPREGS)
 Disable FP regs.
@@ -90,6 +94,10 @@
 Target

[PATCH] libsupc++: add <compare> to precompiled header

2019-11-07 Thread Jonathan Wakely


Also process it with Doxygen.

* doc/doxygen/user.cfg.in (INPUT): Add <compare> header.
* include/precompiled/stdc++.h: Include <compare> header.

Tested powerpc64le-linux, committed to trunk.
commit ed86b7c065df6b33dfee86fba3bee089386ac4a8
Author: Jonathan Wakely 
Date:   Thu Nov 7 23:20:51 2019 +

libsupc++: add <compare> to precompiled header

Also process it with Doxygen.

* doc/doxygen/user.cfg.in (INPUT): Add <compare> header.
* include/precompiled/stdc++.h: Include <compare> header.

diff --git a/libstdc++-v3/doc/doxygen/user.cfg.in 
b/libstdc++-v3/doc/doxygen/user.cfg.in
index 42001016721..18994703f0b 100644
--- a/libstdc++-v3/doc/doxygen/user.cfg.in
+++ b/libstdc++-v3/doc/doxygen/user.cfg.in
@@ -787,6 +787,7 @@ WARN_LOGFILE   =
 # Note: If this tag is empty the current directory is searched.
 
 INPUT  = @srcdir@/doc/doxygen/doxygroups.cc \
+ @srcdir@/libsupc++/compare \
  @srcdir@/libsupc++/cxxabi.h \
  @srcdir@/libsupc++/exception \
  @srcdir@/libsupc++/initializer_list \
diff --git a/libstdc++-v3/include/precompiled/stdc++.h 
b/libstdc++-v3/include/precompiled/stdc++.h
index 57c3e2e32ee..118fc8f359a 100644
--- a/libstdc++-v3/include/precompiled/stdc++.h
+++ b/libstdc++-v3/include/precompiled/stdc++.h
@@ -135,7 +135,7 @@
 
 #if __cplusplus > 201703L
 #include 
-// #include <compare>
+#include <compare>
 #include 
 #include 
 #include

[PATCH] libstdc++: define std::common_comparison_category for C++20

2019-11-07 Thread Jonathan Wakely


* libsupc++/compare (common_comparison_category)
(common_comparison_category_t): Define for C++20.
* testsuite/18_support/comparisons/common/1.cc: New test.

Tested powerpc64le-linux, committed to trunk.
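
The selection rule in the patch is done at compile time with a bitmask over the argument types. The same rule can be restated as a runtime C sketch; the enum names and common_category are hypothetical, and the assignment of 2, 4 and 8 to strong, weak and partial is inferred from the masks in the patch's requires-clauses, since the specialization arguments are missing from the archived text.

```c
#include <assert.h>

/* Possible results; CAT_VOID means "no common category".  */
enum cmp_cat { CAT_VOID, CAT_PARTIAL, CAT_WEAK, CAT_STRONG };

/* One bit per kind of argument, mirroring the patch's id values.  */
enum { ID_OTHER = 1, ID_STRONG = 2, ID_WEAK = 4, ID_PARTIAL = 8 };

/* OR the ids together, then pick the weakest category present.  */
static enum cmp_cat common_category(const unsigned *ids, int n)
{
  unsigned bits = 0;
  for (int i = 0; i < n; i++)
    bits |= ids[i];
  if (bits & ID_OTHER)
    return CAT_VOID;      /* some type is not a comparison category */
  if (bits & ID_PARTIAL)
    return CAT_PARTIAL;
  if (bits & ID_WEAK)
    return CAT_WEAK;
  return CAT_STRONG;      /* only strong orderings, or an empty list */
}
```

Because the ids are combined with OR, the result does not depend on the order or the number of arguments, only on which kinds occur.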

commit cb48e7e6d0bbed9eadc34b4318511f3e05ec1b64
Author: Jonathan Wakely 
Date:   Thu Nov 7 21:33:36 2019 +

libstdc++: define std::common_comparison_category for C++20

* libsupc++/compare (common_comparison_category)
(common_comparison_category_t): Define for C++20.
* testsuite/18_support/comparisons/common/1.cc: New test.

diff --git a/libstdc++-v3/libsupc++/compare b/libstdc++-v3/libsupc++/compare
index 84cc3f5c85f..94728e29de8 100644
--- a/libstdc++-v3/libsupc++/compare
+++ b/libstdc++-v3/libsupc++/compare
@@ -385,18 +385,81 @@ namespace std
   is_gteq(partial_ordering __cmp) noexcept
   { return __cmp >= 0; }
 
+#if __cpp_lib_concepts
+  namespace __detail
+  {
+template
+  inline constexpr unsigned __cmp_cat_id = 1;
+template<>
+  inline constexpr unsigned __cmp_cat_id = 2;
+template<>
+  inline constexpr unsigned __cmp_cat_id = 4;
+template<>
+  inline constexpr unsigned __cmp_cat_id = 8;
+
+template
+  constexpr unsigned __cmp_cat_ids()
+  { return (__cmp_cat_id<_Ts> | ...); }
+
+template
+  struct __common_cmp_cat;
+
+// If any Ti is not a comparison category type, U is void.
+template
+  requires ((_Bits & 1) == 1)
+  struct __common_cmp_cat<_Bits> { using type = void; };
+
+// Otherwise, if at least one Ti is std::partial_ordering,
+// U is std::partial_ordering.
+template
+  requires ((_Bits & 0b1001) == 0b1000)
+  struct __common_cmp_cat<_Bits> { using type = partial_ordering; };
+
+// Otherwise, if at least one Ti is std::weak_ordering,
+// U is std::weak_ordering.
+template
+  requires ((_Bits & 0b1101) == 0b0100)
+  struct __common_cmp_cat<_Bits> { using type = weak_ordering; };
+
+// Otherwise, U is std::strong_ordering.
+template<>
+  struct __common_cmp_cat<0b0010> { using type = strong_ordering; };
+  } // namespace __detail
+
   // [cmp.common], common comparison category type
   template
 struct common_comparison_category
 {
-  // using type = TODO
+  using type
+   = __detail::__common_cmp_cat<__detail::__cmp_cat_ids<_Ts...>()>::type;
 };
 
+  // Partial specializations for one and zero argument cases.
+
+  template
+struct common_comparison_category<_Tp>
+{ using type = void; };
+
+  template<>
+struct common_comparison_category
+{ using type = partial_ordering; };
+
+  template<>
+struct common_comparison_category
+{ using type = weak_ordering; };
+
+  template<>
+struct common_comparison_category
+{ using type = strong_ordering; };
+
+  template<>
+struct common_comparison_category<>
+{ using type = strong_ordering; };
+
   template
 using common_comparison_category_t
   = typename common_comparison_category<_Ts...>::type;
 
-#if __cpp_lib_concepts
   namespace __detail
   {
 template
@@ -493,22 +556,22 @@ namespace std
 template
   requires (three_way_comparable_with<_Tp, _Up>
  || __detail::__3way_builtin_ptr_cmp<_Tp, _Up>)
-constexpr auto
-operator()(_Tp&& __t, _Up&& __u) const noexcept
-{
-  if constexpr (__detail::__3way_builtin_ptr_cmp<_Tp, _Up>)
-   {
- auto __pt = static_cast(__t);
- auto __pu = static_cast(__u);
- if (__builtin_is_constant_evaluated())
-   return __pt <=> __pu;
- auto __it = reinterpret_cast<__UINTPTR_TYPE__>(__pt);
- auto __iu = reinterpret_cast<__UINTPTR_TYPE__>(__pu);
- return __it <=> __iu;
-   }
-  else
-   return static_cast<_Tp&&>(__t) <=> static_cast<_Up&&>(__u);
-}
+  constexpr auto
+  operator()(_Tp&& __t, _Up&& __u) const noexcept
+  {
+   if constexpr (__detail::__3way_builtin_ptr_cmp<_Tp, _Up>)
+ {
+   auto __pt = static_cast(__t);
+   auto __pu = static_cast(__u);
+   if (__builtin_is_constant_evaluated())
+ return __pt <=> __pu;
+   auto __it = reinterpret_cast<__UINTPTR_TYPE__>(__pt);
+   auto __iu = reinterpret_cast<__UINTPTR_TYPE__>(__pu);
+   return __it <=> __iu;
+ }
+   else
+ return static_cast<_Tp&&>(__t) <=> static_cast<_Up&&>(__u);
+  }
 
 using is_transparent = void;
   };
diff --git a/libstdc++-v3/testsuite/18_support/comparisons/common/1.cc 
b/libstdc++-v3/testsuite/18_support/comparisons/common/1.cc
new file mode 100644
index 000..015a8acae97
--- /dev/null
+++ b/libstdc++-v3/testsuite/18_support/comparisons/common/1.cc
@@ -0,0 +1,48 @@
+// Copyright (C) 2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms

Re: [PATCH 2/4] MSP430: Disable exception handling by default for C++

2019-11-07 Thread Oleg Endo

On Thu, 2019-11-07 at 21:37 +, Jozef Lawrynowicz wrote:
> The code size bloat added by building C++ programs using libraries containing
> support for exceptions is significant. When using simple constructs such as
> static variables, sometimes many kB from the libraries are unnecessarily
> pulled in.
> 
> So this patch disables exceptions by default for MSP430 when compiling for C++,
> by implicitly passing -fno-exceptions unless -fexceptions is passed.

It is extremely annoying when GCC's default standard behavior differs
across different targets.  And as a consequence, you have to add a load
of workarounds and disable other things, like fiddling with the
testsuite.  It's the same thing as setting "double = float" to get more
"speed" by default.

I would strongly advise against making such non-standard behaviors the
default in the vanilla compiler.  C++ normally has exceptions enabled.
If a user doesn't want them and is willing to deal with all the
consequences, then we already have a mechanism to do that:
 -fno-exceptions

Perhaps it's generally more useful to add a global configure option for
GCC to disable exception handling by default.  Then you can provide a
turn-key toolchain to your customers as well -- just add an option to
the configure line.

Cheers,
Oleg

Re: [PATCH rs6000]Fix PR92132

2019-11-07 Thread Segher Boessenkool

Hi!

On Thu, Nov 07, 2019 at 06:17:53PM +0800, Kewen.Lin wrote:
> on 2019/11/7 7:49 AM, Segher Boessenkool wrote:
> > The expander named "one_cmpl3":
> > 
> > Erm.  2, not 3 :-)

> Ah, sorry, I didn't notice we have one cmpl**3** for what is actually a one
> cmpl**2** expand; a bit surprised.  Done.  Thanks for pointing that out.

Yeah, I suddenly couldn't find it myself either.  Real head-scratcher :-)

> > etc., so you can just delete the expand and rename the insn to the proper
> > name (one_cmpl2).  It sometimes is useful to have an expand like
> > this if there are multiple insns that could implement this, but that is
> > not the case here.
> 
> OK, example like vector_select?  :)

Sure, like that.  There are many examples where you are required to have
just one define_expand, it is called by name after all, but you want to
have different define_insns (for different cpus, say).

> > So we have only gt/ge/eq.
> > 
> > I think the following are optimal (not tested!):
> > 
> > lt(a,b) = gt(b,a)
> yes, this is what I used for that operator.
> 
> > gt(a,b) = gt(a,b)
> > eq(a,b) = eq(a,b)
> > un(a,b) = ~(ge(a,b) | ge(b,a))
> > 
> 
> existing code uses (~ge(a,b) & ~ge(b,a))
> but should be the same.

Yup, it's just ge/ge/nor, whatever way you write it :-)  (RTL requires
you write the expression in your form, with all the NOTs "pushed in").

> > ltgt(a,b) = ge(a,b) ^ ge(b,a)
> 
> existing code uses gt(a,b) | gt(b,a)
> but should be the same.

Yup, computes exactly the same, and exactly the same execution speeds.

Your form might be slightly easier to optimise with (it has no XOR).

> > Half are pretty simple:
> > 
> > lt(a,b) = gt(b,a)
> > gt(a,b) = gt(a,b)
> > eq(a,b) = eq(a,b)
> > le(a,b) = ge(b,a)
> > ge(a,b) = ge(a,b)
> > 
> > ltgt(a,b) = ge(a,b) ^ ge(b,a)
> > ord(a,b)  = ge(a,b) | ge(b,a)
> > 
> > The other half are the negations of those:
> > 
> > unge(a,b) = ~gt(b,a)
> > unle(a,b) = ~gt(a,b)
> > ne(a,b)   = ~eq(a,b)
> > ungt(a,b) = ~ge(b,a)
> > unlt(a,b) = ~ge(a,b)
> > 
> > uneq(a,b) = ~(ge(a,b) ^ ge(b,a))
> > un(a,b) = ~(ge(a,b) | ge(b,a))
> 
> Awesome!  Do you suggest refactoring on them?  :)

I'd do the first five in one pattern (which then swaps two ops and the
condition in the lt and le case), and the other five in another pattern.
And the rest in two or four patterns?  Just try it out, see what works
well.  It helps to do a bunch together in one pattern, but if that then
turns into special cases for everything, more might be lost than gained.
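The identities quoted above can be checked mechanically.  What follows is a
minimal sketch in plain C++, not RTL and not part of the patch: the helper
names mirror the RTL comparison codes, a `double` stands in for one vector
lane, and the reference results come from the standard `<cmath>` comparison
functions.  It assumes the code is built without -ffast-math, so that NaNs
behave as IEEE requires.

```cpp
#include <cmath>
#include <limits>

// The three comparisons assumed to be available directly (gt/ge/eq).
static bool gt(double a, double b) { return a > b; }
static bool ge(double a, double b) { return a >= b; }
static bool eq(double a, double b) { return a == b; }

// Derived conditions, written as in the lists quoted above.
// For bool operands, != is exclusive-or.
static bool lt(double a, double b)   { return gt(b, a); }
static bool le(double a, double b)   { return ge(b, a); }
static bool ltgt(double a, double b) { return ge(a, b) != ge(b, a); }
static bool ord(double a, double b)  { return ge(a, b) || ge(b, a); }
static bool unge(double a, double b) { return !gt(b, a); }
static bool unle(double a, double b) { return !gt(a, b); }
static bool ne(double a, double b)   { return !eq(a, b); }
static bool ungt(double a, double b) { return !ge(b, a); }
static bool unlt(double a, double b) { return !ge(a, b); }
static bool uneq(double a, double b) { return !(ge(a, b) != ge(b, a)); }
static bool un(double a, double b)   { return !(ge(a, b) || ge(b, a)); }

// Compare every derived condition against its definition in terms of the
// four possible outcomes: less, greater, equal, unordered.
bool identities_hold() {
  const double nan = std::numeric_limits<double>::quiet_NaN();
  const double vals[] = {-1.0, 0.0, 1.0, nan};
  for (double a : vals)
    for (double b : vals) {
      const bool u = std::isunordered(a, b);
      const bool l = std::isless(a, b);
      const bool g = std::isgreater(a, b);
      const bool e = !u && a == b;
      const bool ok =
          lt(a, b) == l && le(a, b) == (l || e) &&
          ltgt(a, b) == (l || g) && ord(a, b) == !u &&
          unge(a, b) == (u || g || e) && unle(a, b) == (u || l || e) &&
          ne(a, b) == (u || l || g) && ungt(a, b) == (u || g) &&
          unlt(a, b) == (u || l) && uneq(a, b) == (u || e) &&
          un(a, b) == u &&
          // The alternative forms from the existing code agree as well.
          ltgt(a, b) == (gt(a, b) || gt(b, a)) &&
          un(a, b) == (!ge(a, b) && !ge(b, a));
      if (!ok)
        return false;
    }
  return true;
}
```

The last two checks correspond to the point made earlier in the thread: the
ge/ge/nor form of "un" and the gt-or-gt form of "ltgt" compute the same
results as the forms in the list.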

> > And please remember to test everything with -ffast-math :-)  That is, when
> > flag_finite_math_only is set.  You cannot get unordered results, then,
> > making the optimal sequences different in some cases (and changing what
> > "ne" means!)
> 
> Thanks for the reminder!  On RTL pattern, I think we won't get any un*
> related operators with -ffast-math, so that part on un* expansion
> would be fine?

Yeah, but look what you should do for "ne" :-)

> > 8 codes, ordered:    never     lt   gt   ltgt eq   le   ge   ordered
> > 8 codes, unordered:  unordered unlt ungt ne   uneq unle unge always
> > 8 codes, fast-math:  never     lt   gt   ne   eq   le   ge   always
> > 8 codes, non-fp:     never     lt   gt   ne   eq   le   ge   always
> 
> Sorry, I don't quite follow this table.  What's the column heads?

The first row is the eight possible fp conditions that are not always
true if unordered is set; the second row is those that *are* always true
if it is set.  The other two rows (which are the same) are just the eight
conditions that do not test unordered at all.

The tricky one is "ne": for FP *with* NaNs, "ne" means "less than, or
greater than, or unordered", while without NaNs (i.e. -ffast-math) it
means "less than, or greater than".

You could write the column heads as
--/--/--  lt/--/--  --/gt/--  lt/gt/--  --/--/eq  lt/--/eq  --/gt/eq  lt/gt/eq
if that helps?  Just the eight combinations of the first three flags.
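One way to read the table is to treat each condition as a set of the four
mutually exclusive outcomes of a comparison (lt, gt, eq, unordered).  The
sketch below encodes that as a 4-bit mask.  The bit assignment and all names
are illustrative assumptions, not GCC's internal encoding.

```cpp
#include <cmath>

// One bit per possible outcome of comparing a and b (illustrative encoding).
enum Outcome : unsigned { LT = 1u, GT = 2u, EQ = 4u, UN = 8u };

// Classify a pair of values into exactly one outcome.
inline unsigned outcome(double a, double b) {
  if (std::isunordered(a, b)) return UN;
  if (a < b) return LT;
  if (a > b) return GT;
  return EQ;
}

// A condition is a set of outcomes; it holds if the actual outcome is in it.
inline bool holds(unsigned cond, double a, double b) {
  return (cond & outcome(a, b)) != 0;
}

// Each entry in the second row of the table is the entry above it with the
// UN bit added.
const unsigned LTGT = LT | GT;            // first row, column lt/gt/--
const unsigned NE_IEEE = LT | GT | UN;    // "ne" when NaNs are possible
const unsigned NE_FINITE = LT | GT;       // "ne" without NaNs: same set as ltgt
```

With this encoding the tricky case is visible directly: NE_IEEE and NE_FINITE
differ only in the UN bit, so they disagree exactly when an operand is a NaN.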

> > Yes, it is redundant, the comparison code already says if it is an
> > unsigned comparison.  So this a question about the generic patterns, not
> > your implementation of them :-)
> > 
> > And if it is *one* pattern then handling LTU etc. makes perfect sense.
> 
> Fully agree, but they are separate for now.  :)

Sure :-)

> Thanks again!  I've updated a new version as some comments, you can review
> this one to save your time.  :)


> +;; Return 1 if OP is a signed comparison or an equality operator.
> +(define_predicate "signed_or_equality_comparison_operator"
> +  (ior (match_operand 0 "equality_operator")
> +   (match_operand 0 "signed_comparison_operator")))
> +
> +;; Return 1 if OP is an unsigned comparison or an equality operator.
> +(define_predicate "unsigned_or_equality_comparison_operator"
> +  (ior (match_operand 0 "equality_operator")
> +   (match_operand 0 "unsigned_comparison_operator")))

Hrm.  Unpleasant.

> +(define_expand "vcond_mask_"
> +  [(match_operand:VEC_I 0 "vint_operand")
> +   (match_operand:VEC_I 1 "vint_operand")
> +

[PATCH] rs6000: Remove no longer correct assert

2019-11-07 Thread Segher Boessenkool

After the simplify-rtx patch, we can now be asked about conditions we
wouldn't be asked about before.  This is perfectly fine, except we
have a little over-eager assert.  Remove that one.


I originally had this patch before the simplify-rtx one, but I reordered
them without retesting every step.  And broke the powerpc bootstraps now.
So sorry.

It failed in stage 1 before, it's still building but past there, so I'll
commit this now.


Segher


2019-11-07  Segher Boessenkool  

* config/rs6000/rs6000.c (validate_condition_mode): Don't assert for
valid conditions.

---
 gcc/config/rs6000/rs6000.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index d9d275b..d48157a 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -10250,14 +10250,6 @@ validate_condition_mode (enum rtx_code code, machine_mode mode)
  && code != UNGT && code != UNLT
  && code != UNGE && code != UNLE));
 
-  /* These should never be generated except for
- flag_finite_math_only.  */
-  gcc_assert (mode != CCFPmode
- || flag_finite_math_only
- || (code != LE && code != GE
- && code != UNEQ && code != LTGT
- && code != UNGT && code != UNLT));
-
   /* These are invalid; the information is not there.  */
   gcc_assert (mode != CCEQmode || code == EQ || code == NE);
 }
-- 
1.8.3.1

Expand C2x attribute parsing support and factor out from TM attributes

2019-11-07 Thread Joseph Myers

There is one place in the C parser that already handles a subset of
the C2x [[]] attribute syntax: c_parser_transaction_attributes.

This patch factors C2x attribute parsing out of there, extending it to
cover the full C2x attribute syntax (although currently only called
from that one place in the parser - so this is another piece of
preparation for supporting C2x attributes in the places where C2x says
they are valid, not the patch that actually enables such support).
The new C2X attribute parsing code uses the same representation for
scoped attributes as C++ does, so requiring parse_tm_stmt_attr to
handle the scoped attributes representation (C++ currently
special-cases TM attributes "to avoid the pedwarn in C++98 mode"; in C
I'm using an argument to c_parser_std_attribute_specifier to disable
the pedwarn_c11 call in the TM case).

Parsing of arguments to known attributes is shared by GNU and C2x
attributes.  C2x specifies that unknown attributes are ignored (GCC
practice in such a case is to warn along with ignoring the attribute)
and gives a very general balanced-token-sequence syntax for arguments
to unknown attributes (known ones each have their own syntax which is
a subset of balanced-token-sequence), so support is added for parsing
and ignoring such balanced-token-sequences as arguments of unknown
attributes.
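The balanced-token-sequence rule for unknown attributes can be pictured as
follows.  This is only a sketch under simplifying assumptions: tokens are
modelled as single characters, the function name `skip_balanced` is made up
for illustration, and it is not the c_parser_balanced_token_sequence function
added by the patch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Starting just after an opening '(', skip a balanced token sequence and
// return the index of the matching ')'.  Returns std::string::npos if the
// brackets are mismatched or the input ends first.
inline std::size_t skip_balanced(const std::string &toks, std::size_t pos) {
  std::vector<char> expected;   // closers still expected, innermost last
  expected.push_back(')');
  for (; pos < toks.size(); ++pos) {
    const char c = toks[pos];
    if (c == '(') {
      expected.push_back(')');
    } else if (c == '[') {
      expected.push_back(']');
    } else if (c == '{') {
      expected.push_back('}');
    } else if (c == ')' || c == ']' || c == '}') {
      if (expected.back() != c)
        return std::string::npos;   // mismatched closer
      expected.pop_back();
      if (expected.empty())
        return pos;                 // matched the original opener
    }
    // Any other token is skipped without being interpreted.
  }
  return std::string::npos;         // ran out of input
}
```

The point of the general syntax is that the arguments of an unknown attribute
can be skipped without knowing what they mean, as long as the brackets nest.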

Some limited tests are added of different attribute usages in the TM
attribute case.  The cases that become valid in the TM case include
extra commas inside [[]], and an explicit "gnu" namespace, as the
extra commas have no semantic effect for C2x attributes, while
accepting the "gnu" namespace seems appropriate because the attribute
in question is accepted inside __attribute__ (()), which is considered
equivalent to the "gnu" namespace inside [[]].

Bootstrapped with no regressions on x86_64-pc-linux-gnu.  Applied to mainline.

gcc/c:
2019-11-07  Joseph Myers  

* c-parser.c (c_parser_attribute_arguments): New function.
Factored out of c_parser_gnu_attribute.
(c_parser_gnu_attribute): Use c_parser_attribute_arguments.
(c_parser_balanced_token_sequence, c_parser_std_attribute)
(c_parser_std_attribute_specifier): New functions.
(c_parser_transaction_attributes): Use
c_parser_std_attribute_specifier.

gcc/c-family:
2019-11-07  Joseph Myers  

* c-attribs.c (parse_tm_stmt_attr): Handle scoped attributes.

gcc/testsuite:
2019-11-07  Joseph Myers  

* gcc.dg/tm/attrs-1.c: New test.
* gcc.dg/tm/props-5.c: New test.  Based on props-4.c.

Index: gcc/c/c-parser.c
===
--- gcc/c/c-parser.c(revision 277927)
+++ gcc/c/c-parser.c(working copy)
@@ -4308,6 +4308,70 @@ c_parser_gnu_attribute_any_word (c_parser *parser)
   return attr_name;
 }
 
+/* Parse attribute arguments.  This is a common form of syntax
+   covering all currently valid GNU and standard attributes.
+
+   gnu-attribute-arguments:
+ identifier
+ identifier , nonempty-expr-list
+ expr-list
+
+   where the "identifier" must not be declared as a type.  ??? Why not
+   allow identifiers declared as types to start the arguments?  */
+
+static tree
+c_parser_attribute_arguments (c_parser *parser, bool takes_identifier)
+{
+  vec<tree, va_gc> *expr_list;
+  tree attr_args;
+  /* Parse the attribute contents.  If they start with an
+ identifier which is followed by a comma or close
+ parenthesis, then the arguments start with that
+ identifier; otherwise they are an expression list.
+ In objective-c the identifier may be a classname.  */
+  if (c_parser_next_token_is (parser, CPP_NAME)
+  && (c_parser_peek_token (parser)->id_kind == C_ID_ID
+ || (c_dialect_objc ()
+ && c_parser_peek_token (parser)->id_kind
+ == C_ID_CLASSNAME))
+  && ((c_parser_peek_2nd_token (parser)->type == CPP_COMMA)
+ || (c_parser_peek_2nd_token (parser)->type
+ == CPP_CLOSE_PAREN))
+  && (takes_identifier
+ || (c_dialect_objc ()
+ && c_parser_peek_token (parser)->id_kind
+ == C_ID_CLASSNAME)))
+{
+  tree arg1 = c_parser_peek_token (parser)->value;
+  c_parser_consume_token (parser);
+  if (c_parser_next_token_is (parser, CPP_CLOSE_PAREN))
+   attr_args = build_tree_list (NULL_TREE, arg1);
+  else
+   {
+ tree tree_list;
+ c_parser_consume_token (parser);
+ expr_list = c_parser_expr_list (parser, false, true,
+ NULL, NULL, NULL, NULL);
+ tree_list = build_tree_list_vec (expr_list);
+ attr_args = tree_cons (NULL_TREE, arg1, tree_list);
+ release_tree_vector (expr_list);
+   }
+}
+  else
+{
+  if (c_parser_next_token_is (parser, CPP_CLOSE_PAREN))
+   attr_args = NULL_TREE;
+  else
+   {
+ expr_list = c_parser_expr_list (parser, false, true,
+

Re: [C] Opt out of GNU vector extensions for built-in SVE types

2019-11-07 Thread Joseph Myers

On Thu, 7 Nov 2019, Richard Sandiford wrote:

> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [C++ PATCH] Implement D1959R0, remove weak_equality and strong_equality.

2019-11-07 Thread Jakub Jelinek

On Thu, Nov 07, 2019 at 05:05:16PM +, Jason Merrill wrote:
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp2a/spaceship-scalar1-neg.C
> @@ -0,0 +1,25 @@
> +// { dg-do run { target c++2a } }
> +
> +#include 
> +
> +#define assert(X) do { if (!(X)) __builtin_abort(); } while(0)
> +
> +void f(){}
> +void g(){}
> +
> +int main()
> +{
> +  {
> +const auto v =  <=> // { dg-error "invalid operands" }
...

I've noticed this test being UNRESOLVED (the execution part of it, because
we expect errors during compilation).

Fixed thusly, tested on x86_64-linux, committed to trunk as obvious:

2019-11-08  Jakub Jelinek  

* g++.dg/cpp2a/spaceship-scalar1-neg.C: Change dg-do from run to
compile.

--- gcc/testsuite/g++.dg/cpp2a/spaceship-scalar1-neg.C  (revision 277932)
+++ gcc/testsuite/g++.dg/cpp2a/spaceship-scalar1-neg.C  (working copy)
@@ -1,4 +1,4 @@
-// { dg-do run { target c++2a } }
+// { dg-do compile { target c++2a } }
 
 #include 
 


Jakub

[committed] Fix some typos

2019-11-07 Thread Jakub Jelinek

Hi!

I've noticed a comment typo in build_vec_delete_1 and when grepping
around if the same typo isn't elsewhere, I found a different typo
elsewhere.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
committed to trunk as obvious.

2019-11-08  Jakub Jelinek  

* ipa-utils.c (ipa_merge_profiles): Fix fprintf format string
typo - mistmatch -> mismatch.
* ipa-profile.c (ipa_profile): Likewise.
* ipa-devirt.c (compare_virtual_tables): Fix a comment typo
- mistmatch -> mismatch.
cp/
* init.c (build_vec_delete_1): Fix a comment typo - mist -> must.

--- gcc/ipa-utils.c.jj  2019-10-30 10:49:35.092045700 +0100
+++ gcc/ipa-utils.c 2019-11-07 15:40:21.803010506 +0100
@@ -518,7 +518,7 @@ ipa_merge_profiles (struct cgraph_node *
{
  if (symtab->dump_file)
fprintf (symtab->dump_file,
-"Edge count mistmatch for bb %i.\n",
+"Edge count mismatch for bb %i.\n",
 srcbb->index);
  match = false;
  break;
@@ -531,7 +531,7 @@ ipa_merge_profiles (struct cgraph_node *
{
  if (symtab->dump_file)
fprintf (symtab->dump_file,
-"Succ edge mistmatch for bb %i.\n",
+"Succ edge mismatch for bb %i.\n",
 srce->dest->index);
  match = false;
  break;
--- gcc/ipa-profile.c.jj2019-10-30 10:49:37.170013777 +0100
+++ gcc/ipa-profile.c   2019-11-07 15:40:02.111304179 +0100
@@ -613,7 +613,7 @@ ipa_profile (void)
  if (dump_file)
fprintf (dump_file,
 "Not speculating: "
-"parameter count mistmatch\n");
+"parameter count mismatch\n");
}
  else if (e->indirect_info->polymorphic
   && !opt_for_fn (n->decl, flag_devirtualize)
--- gcc/ipa-devirt.c.jj 2019-11-05 08:40:44.824281200 +0100
+++ gcc/ipa-devirt.c2019-11-07 15:39:43.658579375 +0100
@@ -808,7 +808,7 @@ compare_virtual_tables (varpool_node *pr
  return;
}
 
-  /* And in the last case we have either mistmatch in between two virtual
+  /* And in the last case we have either mismatch in between two virtual
 methods or two virtual table pointers.  */
   auto_diagnostic_group d;
   if (warning_at (DECL_SOURCE_LOCATION
--- gcc/cp/init.c.jj2019-11-02 00:26:48.995846401 +0100
+++ gcc/cp/init.c   2019-11-07 15:39:16.651982139 +0100
@@ -4060,7 +4060,7 @@ build_vec_delete_1 (tree base, tree maxi
   else if (!body)
 body = deallocate_expr;
   else
-/* The delete operator mist be called, even if a destructor
+/* The delete operator must be called, even if a destructor
throws.  */
 body = build2 (TRY_FINALLY_EXPR, void_type_node, body, deallocate_expr);
 

Jakub

[PATCH] libstdc++: make negative count safe with std::for_each_n

2019-11-07 Thread Jonathan Wakely


The Library Working Group have approved a change to std::for_each_n that
requires it to handle negative N gracefully, which we were not doing for
random access iterators.

* include/bits/stl_algo.h (for_each_n): Handle negative count.
* testsuite/25_algorithms/for_each/for_each_n_debug.cc: New test.

Tested powerpc64le-linux, committed to trunk.
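The behaviour being fixed can be illustrated outside the library.  The sketch
below is not the libstdc++ implementation: `for_each_n_sketch` and
`for_each_n_impl` are hypothetical names, and tag dispatch is used here in
place of the `if constexpr` in the real header so that the example does not
depend on C++17.

```cpp
#include <iterator>
#include <list>
#include <utility>
#include <vector>

// Generic path: the counting loop simply does nothing when n <= 0.
template <typename It, typename Size, typename Fn>
It for_each_n_impl(It first, Size n, Fn f, std::input_iterator_tag) {
  for (; n > 0; --n, ++first)
    f(*first);
  return first;
}

// Random-access path: without the guard, first + n with a negative n would
// describe a range that ends before it begins.
template <typename It, typename Size, typename Fn>
It for_each_n_impl(It first, Size n, Fn f, std::random_access_iterator_tag) {
  if (n <= 0)
    return first;
  It last = first + n;
  for (; first != last; ++first)
    f(*first);
  return last;
}

template <typename It, typename Size, typename Fn>
It for_each_n_sketch(It first, Size n, Fn f) {
  typedef typename std::iterator_traits<It>::iterator_category Cat;
  return for_each_n_impl(first, n, std::move(f), Cat());
}

// Usage: count how often the callback runs when the count is negative.
inline int calls_for_negative_count() {
  int calls = 0;
  std::vector<int> v{1, 2, 3};
  std::list<int> l{1, 2, 3};
  for_each_n_sketch(v.begin(), -2, [&calls](int) { ++calls; });
  for_each_n_sketch(l.begin(), -2, [&calls](int) { ++calls; });
  return calls;
}
```

Only the random-access overload needs the explicit check, which matches the
description above: the generic loop was already safe for a negative count.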

commit 8a968a0ad62d5beae36f8ac33c4327d6c5599a33
Author: Jonathan Wakely 
Date:   Thu Nov 7 21:42:52 2019 +

libstdc++: make negative count safe with std::for_each_n

The Library Working Group have approved a change to std::for_each_n that
requires it to handle negative N gracefully, which we were not doing for
random access iterators.

* include/bits/stl_algo.h (for_each_n): Handle negative count.
* testsuite/25_algorithms/for_each/for_each_n_debug.cc: New test.

diff --git a/libstdc++-v3/include/bits/stl_algo.h b/libstdc++-v3/include/bits/stl_algo.h
index 661db0264ea..24322b7188d 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -4011,6 +4011,8 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
   using _Cat = typename iterator_traits<_InputIterator>::iterator_category;
   if constexpr (is_base_of_v<random_access_iterator_tag, _Cat>)
{
+ if (__n2 <= 0)
+   return __first;
  auto __last = __first + __n2;
  std::for_each(__first, __last, std::move(__f));
  return __last;
diff --git a/libstdc++-v3/testsuite/25_algorithms/for_each/for_each_n_debug.cc b/libstdc++-v3/testsuite/25_algorithms/for_each/for_each_n_debug.cc
new file mode 100644
index 000..24c14efbbf9
--- /dev/null
+++ b/libstdc++-v3/testsuite/25_algorithms/for_each/for_each_n_debug.cc
@@ -0,0 +1,44 @@
+// Copyright (C) 2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++17" }
+// { dg-do run { target c++17 } }
+
+#include 
+#include 
+#include 
+
+void
+test01()
+{
+  __gnu_debug::vector v{1, 2, 3};
+  std::for_each_n(v.begin(), -2, [](int){});
+}
+
+void
+test02()
+{
+  __gnu_debug::list l{1, 2, 3};
+  std::for_each_n(l.begin(), -2, [](int){});
+}
+
+int
+main()
+{
+  test01();
+  test02();
+}

Re: [PATCH 2/2] Introduce the gcc option --record-gcc-command-line

2019-11-07 Thread Egeyar Bagcioglu





On 11/7/19 7:57 PM, Segher Boessenkool wrote:
> Hi!
>
> On Thu, Nov 07, 2019 at 06:44:17PM +0100, Egeyar Bagcioglu wrote:
> > On 11/7/19 9:03 AM, Segher Boessenkool wrote:
> > > > +   ASM_OUTPUT_ASCII(asm_out_file, cmdline, cmdline_length);
> > > > +}
> > > > +  cmdline[0] = 0;
> > > > +  ASM_OUTPUT_ASCII(asm_out_file, cmdline, 1);
> > > > +
> > > > +  /* The return value is currently ignored by the caller, but must be 0. */
> > > > +  return 0;
> > > > +}
> > >
> > > A temporary file like this isn't so great.
> >
> > GCC operates with temporary files, doesn't it? What is the concern that
> > is specific to this one? That is the most reasonable way I found to pass
> > the argv of gcc to child processes for saving. Other ways of passing it
> > that I could think of, or the idea of saving it in the driver were
> > actually very bad ideas.
>
> Oh, this is for passing something to another process?  I guess I didn't
> read it closely enough, sorry, never mind.
>
> > > Opening a file as "r" but then
> > > accessing it with "fread" is peculiar, too.
> >
> > I am not sure what you mean here. Is it that you prefer "wb" and "rb"
> > instead of "w" and "r"? I thought it was enough to use a consistent pair.
>
> I'd use fgets or similar, not fread.


Two things made me prefer fread over fgets here:
1) Although I am reading a string, I do not need each read character to 
be checked against newline. I just need to read till end-of-file.
2) fread returns the number of elements read which I later use. If I 
used fgets, I'd need to call strlen or so afterwards to get the string size.


Let me know please if you disagree or if there are advantages / 
disadvantages that I omit.


Regards
Egeyar

Re: introduce -fcallgraph-info option

2019-11-07 Thread Alexandre Oliva

On Nov  7, 2019, Richard Biener  wrote:

> (also raises the question why we have both -dumpbase and -auxbase ...)

https://gcc.gnu.org/ml/gcc-patches/2002-08/msg00294.html

This was before -dumpdir, however.

Here's the current logic for aux_base_name:

-c or -S with -o [odir/]obase.oext: [odir/]obase
otherwise, given input [idir/]ibase.iext: ibase

Whereas the current logic for dump_base_name, once aux_base_name has
been determined as [auxdir/]auxbase, is:

given -dumpbase ddir/dbase: ddir/dbase
otherwise, given -dumpdir ddir and -dumpbase dbase: ddir/dbase
otherwise, given -dumpbase dbase: [auxdir/]dbase
otherwise, given -dumpdir ddir: ddir/ibase.iext
otherwise: [auxdir/]ibase.iext
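As a reading aid, the current rules listed above can be written out as one
function.  This is a sketch of the description in this message, not the code
in the driver or in toplev: the struct, the function and the parameter names
are hypothetical, -dumpdir is modelled as a plain directory name without a
trailing slash, and `output` is only meaningful when compiling with -c or -S.

```cpp
#include <string>

struct Names { std::string aux, dump; };

// Last path component, e.g. "srcdir/foo.c" -> "foo.c".
static std::string base_of(const std::string &p) {
  const std::string::size_type s = p.rfind('/');
  return s == std::string::npos ? p : p.substr(s + 1);
}

// Directory part including the trailing '/', or "" if there is none.
static std::string dir_of(const std::string &p) {
  const std::string::size_type s = p.rfind('/');
  return s == std::string::npos ? std::string() : p.substr(0, s + 1);
}

// Drop a final ".ext" from the last path component, if present.
static std::string strip_ext(const std::string &p) {
  const std::string::size_type d = p.rfind('.');
  const std::string::size_type s = p.rfind('/');
  if (d == std::string::npos || (s != std::string::npos && d < s))
    return p;
  return p.substr(0, d);
}

// Empty strings mean "option not given".
inline Names current_names(const std::string &input, const std::string &output,
                           const std::string &dumpdir,
                           const std::string &dumpbase) {
  Names n;
  // aux_base_name: [odir/]obase with -o, otherwise ibase.
  n.aux = output.empty() ? strip_ext(base_of(input)) : strip_ext(output);
  const std::string auxdir = dir_of(n.aux);

  if (!dumpbase.empty() && dumpbase.find('/') != std::string::npos)
    n.dump = dumpbase;                          // -dumpbase ddir/dbase
  else if (!dumpbase.empty() && !dumpdir.empty())
    n.dump = dumpdir + "/" + dumpbase;          // -dumpdir ddir -dumpbase dbase
  else if (!dumpbase.empty())
    n.dump = auxdir + dumpbase;                 // -dumpbase dbase
  else if (!dumpdir.empty())
    n.dump = dumpdir + "/" + base_of(input);    // -dumpdir ddir
  else
    n.dump = auxdir + base_of(input);           // neither option
  return n;
}
```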

Relevant cases to consider: (aux, dump) for each compilation with
CC='gcc -fstack-usage -fdump-tree-original'

compiling without -o: (ibase, ibase.iext)
ex $CC -c srcdir/foo.c srcdir/x/bar.c
-> foo.o foo.su foo.c.#t.original
 + bar.o bar.su bar.c.#t.original

compiling with -o: ([odir/]obase, [odir/]ibase.iext)
ex $CC -c srcdir/foo.c -o objdir/foobaz.o -Dfoobaz
-> objdir/foobaz.o objdir/foobaz.su objdir/foo.c.#t.original

compiling multiple sources with -dumpbase: (ibase, [ddir/]dbase)
ex $CC -dumpbase outdir/runme.dump -c srcdir/foo.c srcdir/x/bar.c
-> foo.o foo.su outdir/runme.dump.#t.original
 + bar.o bar.su outdir/runme.dump.#t.original (dupe)

compiling and linking with -o: (ibase, ibase.iext)
ex $CC -o outdir/runme srcdir/foo.c srcdir/x/bar.c
-> /tmp/temp().o foo.su foo.c.#t.original
 + /tmp/temp().o bar.su bar.c.#t.original
 + outdir/runme

lto-recompiling and linking with -o: (/tmp/obase.temp().ltrans#.ltrans, odir/obase.ltrans#)
ex $CC -o outdir/runme ltobjdir/foo.o ltobjdir/bar.o -fdump-rtl-expand
-> /tmp/runme.temp().ltrans0.ltrans.o /tmp/runme.temp().ltrans0.ltrans.su
 + outdir/runme.ltrans0.#r.expand
 + outdir/runme

lto-recompiling and linking without -o: (/tmp/temp().ltrans#.ltrans, /tmp/temp().ltrans#.o)
ex $CC ltobjdir/foo.o ltobjdir/bar.o -fdump-rtl-expand
-> /tmp/temp().ltrans0.ltrans.o /tmp/temp().ltrans0.ltrans.su
 + /tmp/temp().ltrans0.#r.expand
 + a.out


If we were to unify auxbase and dumpbase, I'd take the opportunity to
fix the -o objdir/foobaz.o compilation to output dumps named after
objdir/foobaz or objdir/foobaz-foo.c rather than ./foo.c; for
outdir/runme.dump to be used as a prefix for aux and dump names, so that
we wouldn't create and then overwrite outdir/runme.dump, and so that
other compilations of foo.c and bar.c wouldn't overwrite the .su files,
but rather create outdir/runme.dump-{foo,bar}.* dumps and aux files; and
likewise use outdir/runme.ltrans0 or a.out.ltrans0 for the .su and
.expand files.


The logic I suggest involves combining some of the -auxbase and some
of the -dumpbase logic, namely:

In the driver:

compiling a single source idir/ibase.iext:

  -o odir/obase.oext specified: default -dumpdir odir -dumpbase obase.iext
  -o obase.oext specified: default -dumpbase obase.iext
  -o ibase.oext implied: default -dumpbase ibase.iext

compiling multiple sources named as ibase.iext for linking:

  -dumpbase [ddir/]dbase specified: make it -dumpbase [ddir/]dbase-ibase.iext
  -o odir/output specified: default -dumpdir odir -dumpbase output-ibase.iext
  -o output specified: default -dumpbase output-ibase.iext
  -o a.out implied: default -dumpbase a.out-ibase.iext

LTO recompiling:

  same as above, with each ibase.iext set to ltrans#


In the compiler, set dump_base_name to:

Given -dumpbase ddir/dbase: ddir/dbase
otherwise, given -dumpdir ddir and -dumpbase dbase: ddir/dbase
otherwise, given -dumpbase dbase: dbase

and copy aux_base_name from dump_base_name, but if it ends in .iext,
drop the extension.
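The proposed compiler-side rule is small enough to sketch directly.  The
names below are hypothetical, -dumpdir is again modelled as a plain directory
name, and the input extension is passed in explicitly (including the dot)
because the sketch has no other way to know what counts as ".iext".

```cpp
#include <string>

struct Names { std::string aux, dump; };

// dump_base_name comes only from -dumpbase, prefixed by -dumpdir when
// -dumpbase has no directory part of its own.  aux_base_name is a copy of
// it with a trailing ".iext" dropped.
inline Names proposed_names(const std::string &dumpdir,
                            const std::string &dumpbase,
                            const std::string &iext) {
  Names n;
  if (dumpbase.find('/') != std::string::npos || dumpdir.empty())
    n.dump = dumpbase;
  else
    n.dump = dumpdir + "/" + dumpbase;

  n.aux = n.dump;
  if (!iext.empty() && n.aux.size() >= iext.size() &&
      n.aux.compare(n.aux.size() - iext.size(), iext.size(), iext) == 0)
    n.aux.erase(n.aux.size() - iext.size());
  return n;
}
```

In this model the driver is responsible for choosing the defaults for
-dumpdir and -dumpbase, so the compiler no longer needs to look at -o or at
the input name to pick either base name.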

The resulting behavior (aux_base_name, dump_base_name)

compiling without -o: (ibase, ibase.iext)  unchanged
ex $CC -c srcdir/foo.c srcdir/x/bar.c
-> foo.o foo.su foo.c.#t.original
 + bar.o bar.su bar.c.#t.original

compiling with -o: ([odir/]obase, [odir/]obase.iext)
ex $CC -c srcdir/foo.c -o objdir/foobaz.o -Dfoobaz
-> objdir/foobaz.o objdir/foobaz.su objdir/foobaz.c.#t.original

compiling multiple sources with -dumpbase: ([ddir]/dbase, [ddir/]dbase)
ex $CC -dumpbase outdir/runme.dump -c srcdir/foo.c srcdir/x/bar.c
-> foo.o outdir/runme.dump-foo.su outdir/runme.dump-foo.c.#t.original
 + bar.o outdir/runme.dump-bar.su outdir/runme.dump-bar.c.#t.original

compiling and linking with -o: (outdir/runme-ibase, outdir/runme-ibase.iext)
ex $CC -o outdir/runme srcdir/foo.c srcdir/x/bar.c
-> /tmp/temp().o outdir/runme-foo.su outdir/runme-foo.c.#t.original
 + /tmp/temp().o outdir/runme-bar.su outdir/runme-bar.c.#t.original
 + outdir/runme

lto-recompiling and linking with -o: (outdir/runme.ltrans#, outdir/runme.ltrans#)
ex $CC -o outdir/runme ltobjdir/foo.o ltobjdir/bar.o -fdump-rtl-expand
-> /tmp/runme.temp().ltrans0.ltrans.o outdir/runme.ltrans0.su
 + outdir/runme.ltrans0.#r.expand
 + outdir/runme

lto-recompiling and linking without -o: (a.out.ltrans#, a.out.ltrans#)
ex $CC

Re: [PATCH, rs6000 v2] Make load cost more in vectorization cost for P8/P9

2019-11-07 Thread Segher Boessenkool

On Thu, Nov 07, 2019 at 11:22:12AM +0800, Kewen.Lin wrote:
> One updated patch to enable it everywhere attached.

> 2019-11-07  Kewen Lin  
> 
>   * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Make
>   scalar_load, vector_load, unaligned_load and vector_gather_load cost
>   more to conform hardware latency and insn cost settings.

>case unaligned_load:
>case vector_gather_load:

...

> -  /* Word aligned.  */
> -  return 22;

> + /* Word aligned.  */
> + return 44;

I don't think it should go up from 22 all the way to 44 (not all insns here
are loads).  But exact cost doesn't really matter.  Make it 30 perhaps?

44 (as well as 22) are awfully precise numbers for a very imprecise cost
like this ;-)

With either cost, whatever seems reasonable to you and works well in your
tests: approved for trunk.  Thanks!

Segher

Re: [PATCH] simplify-rtx: simplify_logical_relational_operation

2019-11-07 Thread Segher Boessenkool

On Wed, Nov 06, 2019 at 10:46:06AM -0700, Jeff Law wrote:
> BTW, I think there's enough overlap between simplify-rtx and combine
> that if you wanted to maintain simplify-rtx as well that I'd fully
> support it.  Thoughts?

I'd be honoured, thanks for the offer!


Segher

[PATCH 4/4] MSP430: Deprecate -minrt option

2019-11-07 Thread Jozef Lawrynowicz

Support for the MSP430 -minrt option has been removed from Newlib, since all the
associated behaviour is now dynamic. Initialization code run before main is only
included when needed.

This patch removes the final traces of -minrt from GCC.

-minrt used to modify the linking process in the following ways:
* Removing .init and .fini sections, by using a reduced crt0 and excluding crtn.
* Removing crtbegin and crtend (thereby not using crtstuff.c at all).
  + This meant that even if the program had constructors for global or
static objects which must run before main, it would blindly remove them.

These causes of code bloat have been addressed by:
* switching to .{init,fini}_array instead of using .{init,fini} sections
  "Lean" code to run through constructors before main is only included if
  .init_array has contents.
* removing bloat (frame_dummy, *tm_clones*, *do_global_dtors*) from the
  crtstuff.c with the changes in the previous patches
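For readers unfamiliar with .init_array, the idea behind the "lean" startup
code can be sketched as a loop over an array of function pointers.  This is
an illustration only, not the Newlib CRT code: all names are hypothetical,
and in a real ELF toolchain the array bounds would typically come from
linker-provided symbols rather than from an ordinary C++ array.

```cpp
#include <cstddef>

typedef void (*init_fn)();

// Call each constructor entry in [begin, end) in order and return how many
// were called.  With an empty array the loop body never runs, which is the
// case where the startup code has nothing to do.
inline std::size_t run_init_array(init_fn const *begin, init_fn const *end) {
  std::size_t n = 0;
  for (init_fn const *p = begin; p != end; ++p, ++n)
    (*p)();
  return n;
}

// Example entries standing in for compiler-generated constructor stubs.
static int g_constructed = 0;
static void ctor_a() { g_constructed += 1; }
static void ctor_b() { g_constructed += 10; }
static const init_fn example_init_array[] = {ctor_a, ctor_b};
```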

Here are some examples of the total size of different "barebones" C programs to
show that the size previously achieved by -minrt is now matched by default:

program                 | old (with -minrt) | new (without -minrt)
------------------------+-------------------+---------------------
Empty main              | 20                | 20
Looping main            | 14                | 14
Looping main with data  | 94                | 94
Looping main with bss   | 56                | 56

From 6e561b45c118540f06d5828ec386d2dd79c13b62 Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Wed, 6 Nov 2019 18:12:45 +
Subject: [PATCH 4/4] MSP430: Remove -minrt option

gcc/ChangeLog:

2019-11-07  Jozef Lawrynowicz  

	* config/msp430/msp430.h (STARTFILE_SPEC): Remove -minrt rules.
	Use "if, then, else" syntax for specs.
	(ENDFILE_SPEC): Likewise.
	* config/msp430/msp430.opt: Mark -minrt as deprecated.
	* doc/invoke.texi: Remove -minrt documentation.
---
 gcc/config/msp430/msp430.h   | 9 -
 gcc/config/msp430/msp430.opt | 4 ++--
 gcc/doc/invoke.texi  | 9 +
 3 files changed, 7 insertions(+), 15 deletions(-)

diff --git a/gcc/config/msp430/msp430.h b/gcc/config/msp430/msp430.h
index 25944125182..3f013a9d315 100644
--- a/gcc/config/msp430/msp430.h
+++ b/gcc/config/msp430/msp430.h
@@ -45,15 +45,14 @@ extern bool msp430x;
   while (0)
 
 #undef  STARTFILE_SPEC
-#define STARTFILE_SPEC "%{pg:gcrt0.o%s}" \
-  "%{!pg:%{minrt:crt0-minrt.o%s}%{!minrt:crt0.o%s}} " \
-  "%{!minrt:%{fexceptions:crtbegin.o%s}%{!fexceptions:crtbegin_no_eh.o%s}}"
+#define STARTFILE_SPEC "%{pg:gcrt0.o%s; :crt0.o%s} " \
+  "%{fexceptions:crtbegin.o%s; :crtbegin_no_eh.o%s}"
 
 /* -lgcc is included because crtend.o needs __mspabi_func_epilog_1.  */
 #undef  ENDFILE_SPEC
 #define ENDFILE_SPEC \
-  "%{!minrt:%{fexceptions:crtend.o%s}%{!fexceptions:crtend_no_eh.o%s}} "  \
-  "%{minrt:%:if-exists(crtn-minrt.o%s)}%{!minrt:%:if-exists(crtn.o%s)} -lgcc"
+  "%{fexceptions:crtend.o%s; :crtend_no_eh.o%s} "  \
+  "%:if-exists(crtn.o%s) -lgcc"
 
 #define ASM_SPEC "-mP " /* Enable polymorphic instructions.  */ \
   "%{mcpu=*:-mcpu=%*} " /* Pass the CPU type on to the assembler.  */ \
diff --git a/gcc/config/msp430/msp430.opt b/gcc/config/msp430/msp430.opt
index 2db2906ca11..74fdcdf0851 100644
--- a/gcc/config/msp430/msp430.opt
+++ b/gcc/config/msp430/msp430.opt
@@ -38,8 +38,8 @@ mOs
 Target Undocumented Mask(OPT_SPACE)
 
 minrt
-Target Report Mask(MINRT) RejectNegative
-Use a minimum runtime (no static initializers or ctors) for memory-constrained devices.
+Target Undocumented WarnRemoved
+This option is deprecated in GCC10 and has no effect.
 
 HeaderInclude
 config/msp430/msp430-opts.h
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 6829b949b4b..12a360ed6a7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1009,7 +1009,7 @@ Objective-C and Objective-C++ Dialects}.
 -mwarn-mcu @gol
 -mcode-region=  -mdata-region= @gol
 -msilicon-errata=  -msilicon-errata-warn= @gol
--mhwmult=  -minrt}
+-mhwmult=}
 
 @emph{NDS32 Options}
 @gccoptlist{-mbig-endian  -mlittle-endian @gol
@@ -23262,13 +23262,6 @@ The hardware multiply routines disable interrupts whilst running and
 restore the previous interrupt state when they finish.  This makes
 them safe to use inside interrupt handlers as well as in normal code.
 
-@item -minrt
-@opindex minrt
-Enable the use of a minimum runtime environment - no static
-initializers or constructors.  This is intended for memory-constrained
-devices.  The compiler includes special symbols in some objects
-that tell the linker and runtime which code fragments are required.
-
 @item -mcode-region=
 @itemx -mdata-region=
 @opindex mcode-region
-- 
2.17.1

[PATCH 3/4] MSP430: Disable __cxa_atexit

2019-11-07 Thread Jozef Lawrynowicz

The MSP430 target does not need to support dynamic shared objects so
__cxa_atexit does not need to be used - atexit is sufficient.

Newlib atexit is a fine replacement as it also supports registration of more
than 32 functions.

By not using __cxa_atexit, we can define TARGET_LIBGCC_REMOVE_DSO_HANDLE to
remove the definition of __dso_handle from crtstuff.c, saving code size by
removing the necessity to link in functions to initialize global data in
*every* program.

From a0086b73d0e029cab2f65a91e67a2502e4d4 Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Wed, 30 Oct 2019 16:39:52 +
Subject: [PATCH 3/4] MSP430: Disable __cxa_atexit

gcc/ChangeLog:

2019-11-07  Jozef Lawrynowicz  

	* config.gcc (msp430*-*-*): Disable __cxa_atexit by default.
	* config/msp430/msp430.c (msp430_option_override): Emit an error if
	-fuse-cxa-atexit was used when __cxa_atexit was disabled at configure
	time.
	* config/msp430/msp430.h (TARGET_LIBGCC_REMOVE_DSO_HANDLE): Define if
	__cxa_atexit was disabled at configure time.
	
gcc/testsuite/ChangeLog:

2019-11-07  Jozef Lawrynowicz  

	* g++.dg/init/dso_handle1.C: Add dg-require-cxa-atexit. 
	* g++.dg/init/dso_handle2.C: Likewise.
	* g++.dg/other/cxa-atexit1.C: Likewise.
	* lib/target-supports.exp (check_cxa_atexit_available): Add hard-coded
	case for msp430.

---
 gcc/config.gcc   | 7 +++
 gcc/config/msp430/msp430.c   | 9 +
 gcc/config/msp430/msp430.h   | 6 ++
 gcc/testsuite/g++.dg/init/dso_handle1.C  | 1 +
 gcc/testsuite/g++.dg/init/dso_handle2.C  | 1 +
 gcc/testsuite/g++.dg/other/cxa-atexit1.C | 1 +
 gcc/testsuite/lib/target-supports.exp| 3 +++
 7 files changed, 28 insertions(+)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index d74bcbb9856..2e79101cc8f 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2533,6 +2533,13 @@ msp430*-*-*)
 	tmake_file="${tmake_file} msp430/t-msp430"
 	extra_objs="${extra_objs} msp430-devices.o"
 	extra_gcc_objs="driver-msp430.o msp430-devices.o"
+
+	# __cxa_atexit increases code size, and we don't need to support dynamic
+	# shared objects on MSP430, so regular Newlib atexit is a fine
+	# replacement as it also supports registration of more than 32
+	# functions.
+	default_use_cxa_atexit=no
+
 	# Enable .init_array unless it has been explicitly disabled.
 	# The MSP430 EABI mandates the use of .init_array, and the Newlib CRT
 	# code since mid-2019 expects it.
diff --git a/gcc/config/msp430/msp430.c b/gcc/config/msp430/msp430.c
index fe1fcc0db43..ce8d863abd3 100644
--- a/gcc/config/msp430/msp430.c
+++ b/gcc/config/msp430/msp430.c
@@ -284,6 +284,15 @@ msp430_option_override (void)
  possible to build newlib with -Os enabled.  Until now...  */
   if (TARGET_OPT_SPACE && optimize < 3)
 optimize_size = 1;
+
+#if !DEFAULT_USE_CXA_ATEXIT
+  /* By default, we enforce atexit() instead of __cxa_atexit() to save on code
+ size and remove the declaration of __dso_handle from the CRT library.
+ Configuring GCC with --enable-__cxa-atexit re-enables it by defining
+ DEFAULT_USE_CXA_ATEXIT to 1.  */
+  if (flag_use_cxa_atexit)
+error ("%<-fuse-cxa-atexit%> is not supported for msp430-elf");
+#endif
 }
 
 #undef  TARGET_SCALAR_MODE_SUPPORTED_P
diff --git a/gcc/config/msp430/msp430.h b/gcc/config/msp430/msp430.h
index 90ceec0e947..25944125182 100644
--- a/gcc/config/msp430/msp430.h
+++ b/gcc/config/msp430/msp430.h
@@ -509,4 +509,10 @@ typedef struct
 #define ASM_OUTPUT_ALIGNED_DECL_COMMON(FILE, DECL, NAME, SIZE, ALIGN)	\
   msp430_output_aligned_decl_common ((FILE), (DECL), (NAME), (SIZE), (ALIGN))
 
+#if !DEFAULT_USE_CXA_ATEXIT
+/* We're not using __cxa_atexit, so __dso_handle isn't needed.  */
+#undef TARGET_LIBGCC_REMOVE_DSO_HANDLE
+#define TARGET_LIBGCC_REMOVE_DSO_HANDLE
+#endif
+
 #define SYMBOL_FLAG_LOW_MEM (SYMBOL_FLAG_MACH_DEP << 0)
diff --git a/gcc/testsuite/g++.dg/init/dso_handle1.C b/gcc/testsuite/g++.dg/init/dso_handle1.C
index 97f67cad8f4..dc92e22d12a 100644
--- a/gcc/testsuite/g++.dg/init/dso_handle1.C
+++ b/gcc/testsuite/g++.dg/init/dso_handle1.C
@@ -1,6 +1,7 @@
 // PR c++/17042
 // { dg-do assemble }
 /* { dg-require-weak "" } */
+// { dg-require-cxa-atexit "" }
 // { dg-options "-fuse-cxa-atexit" }
 
 struct A
diff --git a/gcc/testsuite/g++.dg/init/dso_handle2.C b/gcc/testsuite/g++.dg/init/dso_handle2.C
index b219dc02611..6e151e50fa7 100644
--- a/gcc/testsuite/g++.dg/init/dso_handle2.C
+++ b/gcc/testsuite/g++.dg/init/dso_handle2.C
@@ -1,4 +1,5 @@
 // PR c++/58846
+// { dg-require-cxa-atexit "" }
 // { dg-options "-fuse-cxa-atexit" }
 
 extern "C" { char* __dso_handle; }
diff --git a/gcc/testsuite/g++.dg/other/cxa-atexit1.C b/gcc/testsuite/g++.dg/other/cxa-atexit1.C
index a51f3340142..d6ab3dc4733 100644
--- a/gcc/testsuite/g++.dg/other/cxa-atexit1.C
+++ b/gcc/testsuite/g++.dg/other/cxa-atexit1.C
@@ -1,4 +1,5 @@
 // { dg-do compile }
+// { dg-require-cxa-atexit "" }
 // { dg-options "-O2 -fuse-cxa-atexit" }
 
 # 1 "cxa-atexit1.C"
diff --git

[PATCH 2/4] MSP430: Disable exception handling by default for C++

2019-11-07 Thread Jozef Lawrynowicz

The code size bloat added by building C++ programs using libraries containing
support for exceptions is significant. When using simple constructs such as
static variables, sometimes many kB from the libraries are unnecessarily
pulled in.

So this patch disables exceptions by default for MSP430 when compiling C++,
by implicitly passing -fno-exceptions unless -fexceptions is passed.

Multilibs have been added for the -fexceptions configuration.
Since building double the multilibs does significantly increase build time,
the patch also adds a configure option --disable-exceptions. This prevents the
-fexceptions multilibs from being built.

There was a lot of fallout from the G++ testsuite caused by disabling exceptions
by default.

I've mitigated some of it by adding dg-prune strings which mark a test as
unsupported if the compiler reports exception handling is disabled. This doesn't
work for some execution tests or tests for warnings/errors/messages.

There's some further mitigation achieved by new functionality which
will pass -fexceptions as a default flag if exceptions are supported but not
enabled by default.
However, for tests with dg-options directives, this gets ignored. So
for these tests (of which there weren't *too* many), I've added -fexceptions to
the dg-options directives in the tests.

As a result of all the above there aren't any DejaGNU regressions.
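
For readers wondering what code looks like under the new default, here is a small sketch (not taken from the patch) that builds the same way with or without -fexceptions. The helper name parse_uint is hypothetical. __cpp_exceptions is the standard feature-test macro, which GCC leaves undefined under -fno-exceptions.

```cpp
#include <cassert>

// True when this translation unit was compiled with exception support.
// __cpp_exceptions is left undefined by GCC under -fno-exceptions.
constexpr bool exceptions_enabled() {
#ifdef __cpp_exceptions
  return true;
#else
  return false;
#endif
}

// Hypothetical helper: parse a string of decimal digits and report
// failure through the return value instead of throwing, so it does not
// depend on the exception-handling runtime.  Overflow is not checked;
// this is only a sketch.
bool parse_uint(const char *text, unsigned &out) {
  assert(text != nullptr);
  if (*text == '\0')
    return false;
  unsigned value = 0;
  for (const char *p = text; *p; ++p) {
    if (*p < '0' || *p > '9')
      return false;
    value = value * 10 + static_cast<unsigned>(*p - '0');
  }
  out = value;
  return true;
}
```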
>From 7844e05172d07443167c3e852cf0b695f043c0eb Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Tue, 29 Oct 2019 15:32:07 +
Subject: [PATCH 2/4] MSP430: Disable exception handling by default for C++

ChangeLog:

2019-11-07  Jozef Lawrynowicz  

	* config-ml.in: Support --disable-exceptions configure flag.

gcc/ChangeLog:

2019-11-07  Jozef Lawrynowicz  

	* config/msp430/msp430.h (STARTFILE_SPEC) [!fexceptions]: Use
	crtbegin_no_eh.o.
	(ENDFILE_SPEC) [!fexceptions]: Use crtend_no_eh.o.
	(CC1PLUS_SPEC): Define.
	* config/msp430/t-msp430: Add -fexceptions multilibs.
	* doc/install.texi: Document --disable-exceptions configure option.
	* doc/invoke.texi: Document that exceptions are disabled by default for
	C++ for msp430-elf.

gcc/testsuite/ChangeLog:

2019-11-07  Jozef Lawrynowicz  

	* g++.dg/cpp1y/sized-dealloc2.C: Add -fexceptions to dg-options.
	* g++.dg/cpp2a/explicit1.C: Likewise.
	* g++.dg/cpp2a/explicit2.C: Likewise.
	* g++.dg/cpp2a/explicit5.C: Likewise.
	* g++.dg/eh/array1.C: Likewise.
	* g++.dg/eh/spec11.C: Likewise.
	* g++.dg/eh/spec6.C: Likewise.
	* g++.dg/ext/vla4.C: Likewise.
	* g++.dg/ipa/pr64612.C: Likewise.
	* g++.dg/other/error32.C: Likewise.
	* g++.dg/torture/pr34850.C: Likewise.
	* g++.dg/tree-ssa/ivopts-3.C: Likewise.
	* g++.dg/tree-ssa/pr33615.C: Likewise.
	* g++.dg/warn/Wcatch-value-1.C: Likewise.
	* g++.dg/warn/Wcatch-value-2.C: Likewise.
	* g++.dg/warn/Wcatch-value-3.C: Likewise.
	* g++.dg/warn/Wstringop-truncation-2.C: Likewise.
	* g++.dg/warn/Wterminate1.C: Likewise.
	* g++.dg/warn/pr83054.C: Likewise.
	* g++.old-deja/g++.other/cond5.C: Likewise.
	* g++.dg/dg.exp: Pass -fexceptions as a default flag if exceptions
	aren't enabled by default.
	* g++.dg/torture/dg-torture.exp: Likewise.
	* g++.old-deja/old-deja.exp:
	* lib/gcc-dg.exp: Add dg-prune messages for when exception handling is
	disabled.
	* lib/target-supports.exp (check_effective_target_exceptions): Check if
	GCC was configured with --disable-exceptions.
	(check_effective_target_exceptions_enabled_by_default): New.

libgcc/ChangeLog:

2019-11-07  Jozef Lawrynowicz  

	* config.host: Add crt{begin,end}_no_eh.o to "extra_parts".
	* config/msp430/t-msp430: Add rules to build crt{begin,end}_no_eh.o.
---
 config-ml.in  | 13 ++
 gcc/config/msp430/msp430.h| 11 +++--
 gcc/config/msp430/t-msp430|  9 +++
 gcc/doc/install.texi  |  3 +++
 gcc/doc/invoke.texi   |  6 +++--
 gcc/testsuite/g++.dg/cpp1y/sized-dealloc2.C   |  2 +-
 gcc/testsuite/g++.dg/cpp2a/explicit1.C|  2 +-
 gcc/testsuite/g++.dg/cpp2a/explicit2.C|  2 +-
 gcc/testsuite/g++.dg/cpp2a/explicit5.C|  2 +-
 gcc/testsuite/g++.dg/dg.exp   |  9 ++-
 gcc/testsuite/g++.dg/eh/array1.C  |  2 +-
 gcc/testsuite/g++.dg/eh/spec11.C  |  2 +-
 gcc/testsuite/g++.dg/eh/spec6.C   |  2 +-
 gcc/testsuite/g++.dg/ext/vla4.C   |  2 +-
 gcc/testsuite/g++.dg/ipa/pr64612.C|  2 +-
 gcc/testsuite/g++.dg/other/error32.C  |  2 +-
 gcc/testsuite/g++.dg/torture/dg-torture.exp   |  9 ++-
 gcc/testsuite/g++.dg/torture/pr34850.C|  2 +-
 gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C  |  2 +-
 gcc/testsuite/g++.dg/tree-ssa/pr33615.C   |  2 +-
 gcc/testsuite/g++.dg/warn/Wcatch-value-1.C|  2 +-
 gcc/testsuite/g++.dg/warn/Wcatch-value-2.C|  2 +-
 gcc/testsuite/g++.dg/warn/Wcatch-value-3.C|  2 +-
 .../g++.dg/warn/Wstringop-truncation-2.C  |  2 +-

[PATCH 1/4] MSP430: Disable TM clone registry by default

2019-11-07 Thread Jozef Lawrynowicz

Given that MSP430 is a resource-constrained embedded target, disabling
transactional memory by default is a good idea to save on code size in
the runtime library.

It can still be enabled by passing --enable-tm-clone-registry (although as far
as I understand the feature is fundamentally incompatible with MSP430 given
reliance on libitm, lack of thread support without an OS and the memory
limitations of the device).
>From 9dfc5fde568c5a4cd29471888bff538943a995b1 Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Tue, 29 Oct 2019 14:49:08 +
Subject: [PATCH 1/4] MSP430: Disable TM clone registry by default

libgcc/ChangeLog:

2019-11-07  Jozef Lawrynowicz  

	* configure: Regenerate.
	* configure.ac (tm-clone-registry): Disable by default for MSP430.
---
 libgcc/configure| 9 +
 libgcc/configure.ac | 8 
 2 files changed, 17 insertions(+)

diff --git a/libgcc/configure b/libgcc/configure
index 117e9c97e57..26d4d68a510 100755
--- a/libgcc/configure
+++ b/libgcc/configure
@@ -4964,6 +4964,15 @@ if test "$enable_tm_clone_registry" = no; then
   use_tm_clone_registry=-DUSE_TM_CLONE_REGISTRY=0
 fi
 
+else
+
+use_tm_clone_registry=
+case $target in
+  msp430*)
+   use_tm_clone_registry=-DUSE_TM_CLONE_REGISTRY=0
+   ;;
+esac
+
 fi
 
 
diff --git a/libgcc/configure.ac b/libgcc/configure.ac
index f63c5e736e5..0f225b84117 100644
--- a/libgcc/configure.ac
+++ b/libgcc/configure.ac
@@ -268,6 +268,14 @@ use_tm_clone_registry=
 if test "$enable_tm_clone_registry" = no; then
   use_tm_clone_registry=-DUSE_TM_CLONE_REGISTRY=0
 fi
+],
+[
+use_tm_clone_registry=
+case $target in
+  msp430*)
+   use_tm_clone_registry=-DUSE_TM_CLONE_REGISTRY=0
+   ;;
+esac
 ])
 AC_SUBST([use_tm_clone_registry])
 
-- 
2.17.1

[PATCH 0/4][MSP430] Tweaks to default configuration to reduce code size

2019-11-07 Thread Jozef Lawrynowicz

When building small programs for MSP430, the impact of the unused
functions pulled in from the CRT libraries is quite noticeable. Most of these
relate to features that will never be used for MSP430 (transactional memory,
supporting shared objects and dynamic linking), or are rarely used (exception
handling).

The following patches change the default configuration for msp430-elf with the
aim of reducing code size by removing these unsupported features.

Related generic changes to GCC have been submitted here:
https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00415.html
(But note that the first patch to disable eh frame registry has been retracted
as it's no longer necessary).

I picked random C and C++ programs from the testsuite to give an
indication of the size reduction:

$ msp430-elf-gcc testsuite/gcc.dg/2108-1.c -Os -msim -ffunction-sections \
-fdata-sections -Wl,-gc-sections
Before:
   text    data     bss     dec     hex filename
    708     242      28     978     3d2 2108-1.exe
After:
   text    data     bss     dec     hex filename
    444     234       2     680     2a8 2108-1.exe

$ msp430-elf-g++ -msim -Os testsuite/g++.dg/abi/covariant5.C \
-ffunction-sections -fdata-sections -Wl,-gc-sections
Before:
   text    data     bss     dec     hex filename
   4090     396      18    4504    1198 covariant5.exe
Before (-fno-exceptions):
   text    data     bss     dec     hex filename
   3912     396      18    4326    10e6 a.out
After:
   text    data     bss     dec     hex filename
   3396     122       2    3520     dc0 covariant5.exe
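
As a cross-check of the figures above, the following sketch recomputes the "dec" column (text + data + bss) and the saving for each case. The type and function names are made up for the example; only the numbers come from the size output.

```cpp
#include <cassert>

// One row of `size` output, as listed in the cover letter.
struct SizeRow {
  unsigned text, data, bss;
  unsigned total() const { return text + data + bss; }  // the "dec" column
};

// Bytes saved going from `before` to `after`.
unsigned bytes_saved(const SizeRow &before, const SizeRow &after) {
  assert(before.total() >= after.total());
  return before.total() - after.total();
}

// Saving in tenths of a percent, using truncating integer arithmetic.
unsigned saved_permille(const SizeRow &before, const SizeRow &after) {
  assert(before.total() != 0);  // a zero-size baseline has no ratio
  return bytes_saved(before, after) * 1000u / before.total();
}

// Figures copied from the size listings.
const SizeRow c_before = {708, 242, 28};
const SizeRow c_after = {444, 234, 2};
const SizeRow cxx_before = {4090, 396, 18};
const SizeRow cxx_before_no_eh = {3912, 396, 18};
const SizeRow cxx_after = {3396, 122, 2};
```

With these inputs the C test case shrinks by 298 bytes (about 30%) and the C++ test case by 984 bytes (about 22%) relative to the old default.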

The writeup for the -minrt patch has some more code size comparisons related to
that option.

Successfully regtested for msp430-elf.

Ok to apply?

Jozef Lawrynowicz (4):
  MSP430: Disable TM clone registry by default
  MSP430: Disable exception handling by default for C++
  MSP430: Disable __cxa_atexit
  MSP430: Remove -minrt option

 config-ml.in  | 13 +
 gcc/config.gcc|  7 +
 gcc/config/msp430/msp430.c|  9 +++
 gcc/config/msp430/msp430.h| 20 +++---
 gcc/config/msp430/msp430.opt  |  4 +--
 gcc/config/msp430/t-msp430|  9 ---
 gcc/doc/install.texi  |  3 +++
 gcc/doc/invoke.texi   | 15 ---
 gcc/testsuite/g++.dg/cpp1y/sized-dealloc2.C   |  2 +-
 gcc/testsuite/g++.dg/cpp2a/explicit1.C|  2 +-
 gcc/testsuite/g++.dg/cpp2a/explicit2.C|  2 +-
 gcc/testsuite/g++.dg/cpp2a/explicit5.C|  2 +-
 gcc/testsuite/g++.dg/dg.exp   |  9 ++-
 gcc/testsuite/g++.dg/eh/array1.C  |  2 +-
 gcc/testsuite/g++.dg/eh/spec11.C  |  2 +-
 gcc/testsuite/g++.dg/eh/spec6.C   |  2 +-
 gcc/testsuite/g++.dg/ext/vla4.C   |  2 +-
 gcc/testsuite/g++.dg/init/dso_handle1.C   |  1 +
 gcc/testsuite/g++.dg/init/dso_handle2.C   |  1 +
 gcc/testsuite/g++.dg/ipa/pr64612.C|  2 +-
 gcc/testsuite/g++.dg/other/cxa-atexit1.C  |  1 +
 gcc/testsuite/g++.dg/other/error32.C  |  2 +-
 gcc/testsuite/g++.dg/torture/dg-torture.exp   |  9 ++-
 gcc/testsuite/g++.dg/torture/pr34850.C|  2 +-
 gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C  |  2 +-
 gcc/testsuite/g++.dg/tree-ssa/pr33615.C   |  2 +-
 gcc/testsuite/g++.dg/warn/Wcatch-value-1.C|  2 +-
 gcc/testsuite/g++.dg/warn/Wcatch-value-2.C|  2 +-
 gcc/testsuite/g++.dg/warn/Wcatch-value-3.C|  2 +-
 .../g++.dg/warn/Wstringop-truncation-2.C  |  2 +-
 gcc/testsuite/g++.dg/warn/Wterminate1.C   |  2 +-
 gcc/testsuite/g++.dg/warn/pr83054.C   |  2 +-
 gcc/testsuite/g++.old-deja/g++.other/cond5.C  |  2 +-
 gcc/testsuite/g++.old-deja/old-deja.exp   |  9 ++-
 gcc/testsuite/lib/gcc-dg.exp  | 10 +++
 gcc/testsuite/lib/target-supports.exp | 27 ---
 libgcc/config.host|  3 ++-
 libgcc/config/msp430/t-msp430 |  6 +
 libgcc/configure  |  9 +++
 libgcc/configure.ac   |  8 ++
 40 files changed, 166 insertions(+), 47 deletions(-)

-- 
2.17.1

[Darwin, X86, testsuite, committed] Fix pr92258.c.

2019-11-07 Thread Iain Sandoe

This test uses -masm=intel, which isn't supported by Darwin, so add the
necessary dg-require-effective-target.

tested on x86_64-darwin16, 
applied as obvious to mainline,
thanks
Iain

gcc/testsuite/ChangeLog:

2019-11-07  Iain Sandoe  

* gcc.target/i386/pr92258.c: Add dg-requires for masm_intel.


diff --git a/gcc/testsuite/gcc.target/i386/pr92258.c 
b/gcc/testsuite/gcc.target/i386/pr92258.c
index 4e78ea3..85b9400 100644
--- a/gcc/testsuite/gcc.target/i386/pr92258.c
+++ b/gcc/testsuite/gcc.target/i386/pr92258.c
@@ -1,5 +1,6 @@
 /* PR target/92258 */
 /* { dg-do compile } */
+/* { dg-require-effective-target masm_intel } */
 /* { dg-options "-masm=intel -msse2" } */
 
 typedef double V __attribute__ ((__vector_size__ (16)));

[Patch][Fortran] PR91253 fix continuation-line handling with -pre_include

2019-11-07 Thread Tobias Burnus

This fixes the gfortran.dg/continuation_6.f testsuite failure seen with 
newer GLIBC.


The continuation line handling assumes that the line number starts at 0 
(→ continue_line) and then can be incremented, if needed.


The problem came up with -fpre-include=, which is used with newer GLIBC to 
provide things like "!GCC$ builtin (cos) attributes simd (notinbranch) 
if('x86_64')".


There, first the file math-vector-fortran.h is loaded, then the 
actual file. The 'continue_line' gets incremented for 
math-vector-fortran.h but nothing resets it before parsing the actual 
input file. For the 'include_stmt' function, the reset happens during 
parsing, while in our case this knowledge is only in the line 
information, and on file change 'continue_line' is not updated/reset.


I think the same issue can occur with #include, especially as one plays 
with #line, but I have not actually tested it. Obviously, if one plays 
around with #line during a continuation block, this check won't work 
either. However, it should fix the most common occurrence.
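
The check described above can be sketched in isolation as follows. This is a simplified stand-in, not the scanner code: reset_continue_line is a hypothetical name, and current_line plays the role of gfc_linebuf_linenum (gfc_current_locus.lb).

```cpp
#include <cassert>

// Returns the value continue_line should have before a new continuation
// block starts on line `current_line` of the file now being read.
int reset_continue_line(int continue_count, int current_line,
                        int continue_line) {
  assert(current_line >= 0);
  // Only adjust at the start of a continuation block, and only when the
  // stored line number is ahead of where the current file actually is,
  // e.g. because a pre-included file advanced it.
  if (continue_count == 0 && continue_line > current_line + 1)
    return current_line + 1;
  return continue_line;
}
```

For example, a stale value of 40 left over from a pre-included header is pulled back to 3 when the main file is at line 2, while a value that is not ahead of the current line is left alone.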


Additionally, I have removed the ATTRIBUTE_UNUSED from get_file's 
'reason' as it is used in the linemap_add call.


And I have moved the OpenMP/OpenACC comment before the openmp/openacc 
if condition, where in my opinion it belongs.


OK for the trunk?

Tobias

2019-11-07  Tobias Burnus   gfc_linebuf_linenum (gfc_current_locus.lb) + 1)
+	continue_line = gfc_linebuf_linenum (gfc_current_locus.lb) + 1;
+
   continue_flag = 1;
   if (c == '!')
 	skip_comment_line ();
@@ -1475,6 +1483,14 @@ restart:
   if (flag_openacc)
 	prev_openacc_flag = openacc_flag;
 
+  /* This can happen if the input file changed or via cpp's #line
+	 without getting reset (e.g. via input_stmt). It also happens
+	 when pre-including files via -fpre-include=.  */
+  if (continue_count == 0
+	  && gfc_current_locus.lb
+	  && continue_line > gfc_linebuf_linenum (gfc_current_locus.lb) + 1)
+	continue_line = gfc_linebuf_linenum (gfc_current_locus.lb) + 1;
+
   continue_flag = 1;
   old_loc = gfc_current_locus;
 
@@ -1943,7 +1959,7 @@ next_char:
the file stack.  */
 
 static gfc_file *
-get_file (const char *name, enum lc_reason reason ATTRIBUTE_UNUSED)
+get_file (const char *name, enum lc_reason reason)
 {
   gfc_file *f;

Re: [C++ PATCH] PR c++/91370 - Implement P1041R4 and P1139R2 - Stronger Unicode reqs

2019-11-07 Thread Jason Merrill


OK.

On 11/7/19 3:38 PM, Jakub Jelinek wrote:

Hi!

GCC does use UTF-16 and UTF-32 for char16_t and char32_t string literals
already, so P1041R4 is I believe already implemented with no changes needed.

While going through P1139R2, I've realized that we weren't handling
"If the value is not representable within 16 bits, the program is ill-formed.
A char16_t literal containing multiple c-chars is ill-formed."
and
"A char32_t literal containing multiple c-chars is ill-formed."
already from C++11 correctly, we were just warning about it, rather than
emitting an error.  This is different from C11, where the standard
makes it implementation-defined what happens.

Furthermore, the C++17:
"If the value is not representable with a single UTF-8 code unit,
the program is ill-formed. A UTF-8 character literal containing multiple
c-chars is ill-formed."
wasn't handled as an error, but instead u8'ab' would be an int with a
warning, similarly u8'\u00c0' etc.  u8 char literals are only in C++17+,
not in C, so no need to worry about C at this point.

And lastly, P1139R2 makes it clear that code points above U+10FFFF are
ill-formed, but that is something Eric already implemented in r276167.
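
A compact illustration of the rules quoted above (a sketch, not one of the new tests). The helper names are hypothetical and only restate the "representable" conditions as range checks. The ill-formed literals are kept in comments so that the snippet still compiles.

```cpp
#include <cassert>

// Well-formed: one code point, representable in the literal's type.
static_assert(u'\u0124' == 0x0124, "fits in one UTF-16 code unit");
static_assert(U'\U0001F600' == 0x1F600, "fits in one UTF-32 code unit");

// Ill-formed under the rules quoted above (previously only a warning):
//   char16_t a = u'ab';          // multiple c-chars
//   char16_t b = u'\U0001F600';  // not representable within 16 bits
//   char32_t c = U'ab';          // multiple c-chars
//   auto     d = u8'\u00c0';     // C++17: needs two UTF-8 code units

// Hypothetical helpers restating the "representable" conditions.
bool fits_utf8_code_unit(char32_t cp) { return cp <= 0x7F; }
bool fits_char16(char32_t cp) { return cp <= 0xFFFF; }
bool is_valid_code_point(char32_t cp) { return cp <= 0x10FFFF; }
```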

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

I believe we can now claim to have both P1041R4 and P1139R2 implemented.

2019-11-07  Jakub Jelinek  

PR c++/91370 - Implement P1041R4 and P1139R2 - Stronger Unicode reqs
* charset.c (narrow_str_to_charconst): Add TYPE argument.  For
CPP_UTF8CHAR diagnose whenever number of chars is > 1, using
CPP_DL_ERROR instead of CPP_DL_WARNING.
(wide_str_to_charconst): For CPP_CHAR16 or CPP_CHAR32, use
CPP_DL_ERROR instead of CPP_DL_WARNING when multiple char16_t
or char32_t chars are needed.
(cpp_interpret_charconst): Adjust narrow_str_to_charconst caller.

* g++.dg/cpp1z/utf8-neg.C: Expect errors rather than -Wmultichar
warnings.
* g++.dg/ext/utf16-4.C: Expect errors rather than warnings.
* g++.dg/ext/utf32-4.C: Likewise.
* g++.dg/cpp2a/ucn2.C: New test.

--- libcpp/charset.c.jj 2019-09-27 10:32:17.127641484 +0200
+++ libcpp/charset.c2019-11-07 13:40:19.616040925 +0100
@@ -1881,10 +1881,11 @@ cpp_interpret_string_notranslate (cpp_re
  /* Subroutine of cpp_interpret_charconst which performs the conversion
 to a number, for narrow strings.  STR is the string structure returned
 by cpp_interpret_string.  PCHARS_SEEN and UNSIGNEDP are as for
-   cpp_interpret_charconst.  */
+   cpp_interpret_charconst.  TYPE is the token type.  */
  static cppchar_t
  narrow_str_to_charconst (cpp_reader *pfile, cpp_string str,
-unsigned int *pchars_seen, int *unsignedp)
+unsigned int *pchars_seen, int *unsignedp,
+enum cpp_ttype type)
  {
size_t width = CPP_OPTION (pfile, char_precision);
size_t max_chars = CPP_OPTION (pfile, int_precision) / width;
@@ -1913,10 +1914,12 @@ narrow_str_to_charconst (cpp_reader *pfi
result = c;
  }
  
+  if (type == CPP_UTF8CHAR)

+max_chars = 1;
if (i > max_chars)
  {
i = max_chars;
-  cpp_error (pfile, CPP_DL_WARNING,
+  cpp_error (pfile, type == CPP_UTF8CHAR ? CPP_DL_ERROR : CPP_DL_WARNING,
 "character constant too long for its type");
  }
else if (i > 1 && CPP_OPTION (pfile, warn_multichar))
@@ -1980,7 +1983,9 @@ wide_str_to_charconst (cpp_reader *pfile
   character exactly fills a wchar_t, so a multi-character wide
   character constant is guaranteed to overflow.  */
if (str.len > nbwc * 2)
-cpp_error (pfile, CPP_DL_WARNING,
+cpp_error (pfile, (CPP_OPTION (pfile, cplusplus)
+  && (type == CPP_CHAR16 || type == CPP_CHAR32))
+ ? CPP_DL_ERROR : CPP_DL_WARNING,
   "character constant too long for its type");
  
/* Truncate the constant to its natural width, and simultaneously

@@ -2038,7 +2043,8 @@ cpp_interpret_charconst (cpp_reader *pfi
  result = wide_str_to_charconst (pfile, str, pchars_seen, unsignedp,
token->type);
else
-result = narrow_str_to_charconst (pfile, str, pchars_seen, unsignedp);
+result = narrow_str_to_charconst (pfile, str, pchars_seen, unsignedp,
+ token->type);
  
if (str.text != token->val.str.text)

  free ((void *)str.text);
--- gcc/testsuite/g++.dg/cpp1z/utf8-neg.C.jj2018-10-22 09:28:06.380657152 
+0200
+++ gcc/testsuite/g++.dg/cpp1z/utf8-neg.C   2019-11-07 14:34:23.929317534 
+0100
@@ -1,6 +1,6 @@
  /* { dg-do compile { target c++17 } } */
  
  const static char c0 = u8'';		// { dg-error "empty character" }

-const static char c1 = u8'ab'; // { dg-warning "multi-character character 
constant" }
-const static char c2 = u8'\u0124'; // { dg-warning "multi-character character 
constant" }
-const

[PATCH] PR libstdc++/92124 on hashtable

2019-11-07 Thread François Dumont

From what I understood from the recent fix, the unordered containers need to 
be updated the same way.


I hope you'll appreciate the usage of rvalue forwarding. Container node 
values are moved as soon as _M_assign is called with an rvalue reference 
to the source container.
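
To make the forwarding idea concrete, here is a standalone sketch using std::vector rather than _Hashtable. fwd_value is a hypothetical analogue of the patch's __fwd_value, written from the description above and not copied from the patch; Tracked and assign_from exist only for the example.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Element type that records how it was constructed.
struct Tracked {
  int copies = 0;
  int moves = 0;
  Tracked() = default;
  Tracked(const Tracked &o) : copies(o.copies + 1), moves(o.moves) {}
  Tracked(Tracked &&o) noexcept : copies(o.copies), moves(o.moves + 1) {}
};

// Hand out `val` as an rvalue only when the container it lives in was
// itself passed as an rvalue; otherwise hand out a const reference.
template <typename Container, typename T>
typename std::conditional<std::is_lvalue_reference<Container>::value,
                          const T &, T &&>::type
fwd_value(Container &&, T &val) noexcept {
  using Result =
      typename std::conditional<std::is_lvalue_reference<Container>::value,
                                const T &, T &&>::type;
  return static_cast<Result>(val);
}

// Copy or move every element of `src` into a fresh vector, depending on
// the value category of `src` itself.
template <typename Container>
std::vector<Tracked> assign_from(Container &&src) {
  std::vector<Tracked> out;
  out.reserve(src.size());  // no reallocation: each element is built once
  for (auto &elem : src)
    out.push_back(fwd_value(std::forward<Container>(src), elem));
  return out;
}
```

Passing an lvalue container copies each element, while passing std::move (container) moves them, without the caller supplying a separate node generator for each case.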


Additionally, this patch removes usages of lambdas in _Hashtable.

If you confirm it, I'll check for the same on _Rb_tree.

    * include/bits/hashtable.h (_Hashtable<>::__alloc_node_gen_t): New
    template alias.
    (_Hashtable<>::__mv_if_value_type_mv_noexcept): New.
    (_Hashtable<>::__fwd_value): New.
    (_Hashtable<>::_M_assign_elements<>): Remove _NodeGenerator template
    parameter.
    (_Hashtable<>::_M_assign<>): Add _Ht template parameter.
    (_Hashtable<>::operator=(const _Hashtable<>&)): Adapt.
    (_Hashtable<>::_M_move_assign): Adapt.
    (_Hashtable<>::_Hashtable(const _Hashtable&)): Adapt.
    (_Hashtable<>::_Hashtable(const _Hashtable&, const allocator_type&)):
    Adapt.
    (_Hashtable<>::_Hashtable(_Hashtable&&, const allocator_type&)):
    Adapt.
    * testsuite/23_containers/unordered_set/92124.cc: New.

Tested under Linux x86_64.

Ok to commit ?

François

diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index ab579a7059e..c2b2219d471 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -255,6 +255,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   using __reuse_or_alloc_node_gen_t =
 	__detail::_ReuseOrAllocNode<__node_alloc_type>;
+  using __alloc_node_gen_t =
+	__detail::_AllocNode<__node_alloc_type>;
 
   // Simple RAII type for managing a node containing an element
   struct _Scoped_node
@@ -280,6 +282,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	__node_type* _M_node;
   };
 
+  template
+	static constexpr
+	typename conditional<__move_if_noexcept_cond::value,
+			 const _Tp&, _Tp&&>::type
+	__mv_if_value_type_mv_noexcept(_Tp& __x) noexcept
+	{ return std::move(__x); }
+
+  template
+	static constexpr
+	typename conditional::value,
+			 value_type&&, const value_type&>::type
+	__fwd_value(_Ht&&, value_type& __val) noexcept
+	{ return std::move(__val); }
+
   // Metaprogramming for picking apart hash caching.
   template
 	using __if_hash_cached = __or_<__not_<__hash_cached>, _Cond>;
@@ -406,13 +422,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   // Assign *this using another _Hashtable instance. Either elements
   // are copy or move depends on the _NodeGenerator.
-  template
+  template
 	void
-	_M_assign_elements(_Ht&&, const _NodeGenerator&);
+	_M_assign_elements(_Ht&&);
 
-  template
+  template
 	void
-	_M_assign(const _Hashtable&, const _NodeGenerator&);
+	_M_assign(_Ht&&, const _NodeGenerator&);
 
   void
   _M_move_assign(_Hashtable&&, true_type);
@@ -1051,11 +1067,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  _M_bucket_count = __ht._M_bucket_count;
 	  _M_element_count = __ht._M_element_count;
 	  _M_rehash_policy = __ht._M_rehash_policy;
+	  __alloc_node_gen_t __alloc_node_gen(*this);
 	  __try
 		{
-		  _M_assign(__ht,
-			[this](const __node_type* __n)
-			{ return this->_M_allocate_node(__n->_M_v()); });
+		  _M_assign(__ht, __alloc_node_gen);
 		}
 	  __catch(...)
 		{
@@ -1070,9 +1085,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	}
 
   // Reuse allocated buckets and nodes.
-  _M_assign_elements(__ht,
-	[](const __reuse_or_alloc_node_gen_t& __roan, const __node_type* __n)
-	{ return __roan(__n->_M_v()); });
+  _M_assign_elements(__ht);
   return *this;
 }
 
@@ -1080,11 +1093,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	   typename _Alloc, typename _ExtractKey, typename _Equal,
 	   typename _H1, typename _H2, typename _Hash, typename _RehashPolicy,
 	   typename _Traits>
-template
+template
   void
   _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
 		 _H1, _H2, _Hash, _RehashPolicy, _Traits>::
-  _M_assign_elements(_Ht&& __ht, const _NodeGenerator& __node_gen)
+  _M_assign_elements(_Ht&& __ht)
   {
 	__bucket_type* __former_buckets = nullptr;
 	std::size_t __former_bucket_count = _M_bucket_count;
@@ -1107,9 +1120,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	_M_rehash_policy = __ht._M_rehash_policy;
 	__reuse_or_alloc_node_gen_t __roan(_M_begin(), *this);
 	_M_before_begin._M_nxt = nullptr;
-	_M_assign(__ht,
-		  [&__node_gen, &__roan](__node_type* __n)
-		  { return __node_gen(__roan, __n); });
+	_M_assign(std::forward<_Ht>(__ht), __roan);
 	if (__former_buckets)
 	  _M_deallocate_buckets(__former_buckets, __former_bucket_count);
 	  }
@@ -1133,11 +1144,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	   typename _Alloc, typename _ExtractKey, typename _Equal,
 	   typename _H1, typename _H2, typename _Hash, typename _RehashPolicy,
 	   typename _Traits>
-template
+template
   void
   _Hashtable<_Key, _Value, _Alloc, _ExtractKey,

[PATCH, rs6000][committed] Fix PR92090: Allow MODE_PARTIAL_INT modes for integer constant input operands.

2019-11-07 Thread Peter Bergner

Before LRA, we have an insn that sets a TImode pseudo with an integer
constant and a following insn that copies that TImode pseudo to a PTImode
pseudo.  During LRA spilling, we generate a new insn that sets a PTImode
pseudo to that constant directly and we ICE because we do not recognize
that as a valid insn.  The fix below fixes the ICE reported in PR92090 by
modifying our input_operand predicate to allow MODE_PARTIAL_INT modes for
integer constant input operands.
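
The effect of the predicate change can be shown with a toy model. The enum below is a hypothetical miniature of GCC's mode classes (the real list is longer), and the two functions only mirror the old and new conditions on the mode class; the CONST_SCALAR_INT_P half of the test is left out.

```cpp
#include <cassert>

// Hypothetical miniature of GCC's mode classes.
enum ModeClass { MODE_INT, MODE_PARTIAL_INT, MODE_FLOAT, MODE_VECTOR_INT };

// Old condition: only full integer modes such as TImode.
bool old_accepts_int_constant(ModeClass cls) { return cls == MODE_INT; }

// New condition, mirroring what SCALAR_INT_MODE_P tests: full or partial
// integer modes, so a PTImode destination now qualifies too.
bool new_accepts_int_constant(ModeClass cls) {
  return cls == MODE_INT || cls == MODE_PARTIAL_INT;
}
```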

This patch (preapproved by Segher) passed bootstrap and regtesting
with no errors.  Committed.

Peter


gcc/
PR other/92090
* config/rs6000/predicates.md (input_operand): Allow MODE_PARTIAL_INT
modes for integer constants.

gcc/testsuite/
PR other/92090
* gcc.target/powerpc/pr92090.c: New test.

Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 277861)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -1047,8 +1047,7 @@ (define_predicate "input_operand"
 return 1;
 
   /* Allow any integer constant.  */
-  if (GET_MODE_CLASS (mode) == MODE_INT
-  && CONST_SCALAR_INT_P (op))
+  if (SCALAR_INT_MODE_P (mode) && CONST_SCALAR_INT_P (op))
 return 1;
 
   /* Allow easy vector constants.  */
Index: gcc/testsuite/gcc.target/powerpc/pr92090.c
===
--- gcc/testsuite/gcc.target/powerpc/pr92090.c  (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr92090.c  (working copy)
@@ -0,0 +1,42 @@
+/* { dg-do compile } */
+/* { dg-options "-mdejagnu-cpu=power8 -Os -mbig" } */
+
+/* Verify that we don't ICE.  */
+
+_Atomic int a;
+_Atomic long double b, c;
+int j;
+void foo (void);
+void bar (int, int, int, int);
+
+void
+bug (void)
+{
+  b = 1;
+  int d, e, f, g;
+  while (a)
+;
+  for (int h = 0; h < 1; h++)
+{
+  double i = b /= 3;
+  foo ();
+  if (i)
+   {
+ if (i == 1)
+   d++;
+ e++;
+ b = 0;
+   }
+  else
+   {
+ if (i == 2)
+   f++;
+ g++;
+ b = 1;
+   }
+}
+  bar (d, e, f, g);
+  c = 1;
+  for (int h; h; h++)
+j = 0;
+}

Re: [PATCH 2/2] Introduce the gcc option --record-gcc-command-line

2019-11-07 Thread Segher Boessenkool

Hi!

On Thu, Nov 07, 2019 at 06:44:17PM +0100, Egeyar Bagcioglu wrote:
> On 11/7/19 9:03 AM, Segher Boessenkool wrote:
> >>+   ASM_OUTPUT_ASCII(asm_out_file, cmdline, cmdline_length);
> >>+}
> >>+  cmdline[0] = 0;
> >>+  ASM_OUTPUT_ASCII(asm_out_file, cmdline, 1);
> >>+
> >>+  /* The return value is currently ignored by the caller, but must be 0. 
> >>*/
> >>+  return 0;
> >>+}
> >A temporary file like this isn't so great.
> 
> GCC operates with temporary files, doesn't it? What is the concern that 
> is specific to this one? That is the most reasonable way I found to pass 
> the argv of gcc to child processes for saving. Other ways of passing it 
> that I could think of, or the idea of saving it in the driver were 
> actually very bad ideas.

Oh, this is for passing something to another process?  I guess I didn't
read it closely enough, sorry, never mind.

> >Opening a file as "r" but then
> >accessing it with "fread" is peculiar, too.
> 
> I am not sure what you mean here. Is it that you prefer "wb" and "rb" 
> instead of "w" and "r"? I thought it was enough to use a consistent pair.

I'd use fgets or similar, not fread.
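
For what it's worth, a minimal sketch of the fgets approach follows. It is not the code from the patch under review: round_trip_via_temp_file is a made-up name and the temporary file comes from tmpfile rather than from the driver. It assumes the saved command line contains no embedded NUL bytes.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Write `cmdline` to an anonymous temporary file, then read it back a
// piece at a time with fgets.  Returns the text read, or "" on failure.
std::string round_trip_via_temp_file(const std::string &cmdline) {
  std::FILE *fp = std::tmpfile();  // removed automatically when closed
  if (!fp)
    return "";
  std::fputs(cmdline.c_str(), fp);
  std::rewind(fp);

  std::string result;
  char buf[64];
  // fgets stops at a newline or when buf is full and always
  // NUL-terminates, so long lines simply arrive in several pieces.
  while (std::fgets(buf, sizeof buf, fp))
    result += buf;
  std::fclose(fp);
  return result;
}
```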


Segher

[PATCH 7/X] [libsanitizer] Add tests

2019-11-07 Thread Matthew Malcomson

Adding hwasan tests.

Frankly, these could be tidied up a little.
I will be tidying them up while getting feedback on the hwasan introduction.


gcc/testsuite/ChangeLog:

2019-11-07  Matthew Malcomson  

* c-c++-common/hwasan/arguments.c: New test.
* c-c++-common/hwasan/halt_on_error-1.c: New test.
* g++.dg/hwasan/rvo-handled.c: New test.
* g++.dg/hwasan/try-catch-0.cpp: New test.
* g++.dg/hwasan/try-catch-1.cpp: New test.
* gcc.dg/hwasan/aligned-alloc.c: New test.
* gcc.dg/hwasan/alloca-array-accessible.c: New test.
* gcc.dg/hwasan/alloca-gets-different-tag.c: New test.
* gcc.dg/hwasan/alloca-outside-caught.c: New test.
* gcc.dg/hwasan/bitfield-1.c: New test.
* gcc.dg/hwasan/bitfield-2.c: New test.
* gcc.dg/hwasan/builtin-special-handling.c: New test.
* gcc.dg/hwasan/check-interface.c: New test.
* gcc.dg/hwasan/hwasan-poison-optimisation.c: New test.
* gcc.dg/hwasan/hwasan-thread-access-parent.c: New test.
* gcc.dg/hwasan/hwasan-thread-basic-failure.c: New test.
* gcc.dg/hwasan/hwasan-thread-clears-stack.c: New test.
* gcc.dg/hwasan/hwasan-thread-success.c: New test.
* gcc.dg/hwasan/hwasan.exp: New file.
* gcc.dg/hwasan/kernel-defaults.c: New test.
* gcc.dg/hwasan/large-aligned-0.c: New test.
* gcc.dg/hwasan/large-aligned-1.c: New test.
* gcc.dg/hwasan/macro-definition.c: New test.
* gcc.dg/hwasan/nested-functions-0.c: New test.
* gcc.dg/hwasan/nested-functions-1.c: New test.
* gcc.dg/hwasan/nested-functions-2.c: New test.
* gcc.dg/hwasan/no-sanitize-attribute.c: New test.
* gcc.dg/hwasan/random-frame-tag.c: New test.
* gcc.dg/hwasan/setjmp-longjmp-0.c: New test.
* gcc.dg/hwasan/setjmp-longjmp-1.c: New test.
* gcc.dg/hwasan/stack-tagging-basic-0.c: New test.
* gcc.dg/hwasan/stack-tagging-basic-1.c: New test.
* gcc.dg/hwasan/stack-tagging-disable.c: New test.
* gcc.dg/hwasan/vararray-outside-caught.c: New test.
* gcc.dg/hwasan/very-large-objects.c: New test.
* lib/hwasan-dg.exp: New file.



### Attachment also inlined for ease of reply###


diff --git a/gcc/testsuite/c-c++-common/hwasan/arguments.c 
b/gcc/testsuite/c-c++-common/hwasan/arguments.c
new file mode 100644
index 
..2d563eb8541694d501b021babd9452fd7fd502a3
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/hwasan/arguments.c
@@ -0,0 +1,7 @@
+/*
+   TODO
+   Somehow test the conflict of arguments
+   -fsanitize=hwaddress -fsanitize=kernel-address
+   -fsanitize=hwaddress -fsanitize=address
+   -fsanitize=hwaddress -fsanitize=thread
+ */
diff --git a/gcc/testsuite/c-c++-common/hwasan/halt_on_error-1.c 
b/gcc/testsuite/c-c++-common/hwasan/halt_on_error-1.c
new file mode 100644
index 
..118191e2e00bd07bd4839888d2fb29baec926c60
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/hwasan/halt_on_error-1.c
@@ -0,0 +1,25 @@
+/* Test recovery mode.  */
+/* { dg-do run } */
+/* { dg-options "-fsanitize-recover=hwaddress" } */
+/* { dg-set-target-env-var HWASAN_OPTIONS "halt_on_error=false" } */
+/* { dg-shouldfail "hwasan" } */
+
+#include 
+
+volatile int ten = 16;
+
+int main() {
+  char x[10];
+  __builtin_memset(x, 0, ten + 1);
+  asm volatile ("" : : : "memory");
+  volatile int res = x[ten];
+  x[ten] = res + 3;
+  res = x[ten];
+  return 0;
+}
+
+/* { dg-output "WRITE of size 17 at 0x\[0-9a-f\]+.*" } */
+/* { dg-output "READ of size 1 at 0x\[0-9a-f\]+.*" } */
+/* { dg-output "WRITE of size 1 at 0x\[0-9a-f\]+.*" } */
+/* { dg-output "READ of size 1 at 0x\[0-9a-f\]+.*" } */
+
diff --git a/gcc/testsuite/g++.dg/hwasan/rvo-handled.c b/gcc/testsuite/g++.dg/hwasan/rvo-handled.c
new file mode 100644
index 
..6e6934a0be1b0ce14c459555168f6a2590a8ec7f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/hwasan/rvo-handled.c
@@ -0,0 +1,56 @@
+/* { dg-do run } */
+/* TODO Ensure this test has enough optimisation to get RVO. */
+
+#define assert(x) if (!(x)) __builtin_abort ()
+
+struct big_struct {
+int left;
+int right;
+void *ptr;
+int big_array[100];
+};
+
+/*
+   Tests for RVO (basically, checking -fsanitize=hwaddress has not broken RVO
+   in any way).
+
+   0) The value is accessible in both functions without a hwasan complaint.
+   1) RVO does happen.
+ */
+
+struct big_struct __attribute__ ((noinline))
+return_on_stack()
+{
+  struct big_struct x;
+  x.left = 100;
+  x.right = 20;
+  x.big_array[10] = 30;
+  x.ptr = 
+  return x;
+}
+
+struct big_struct __attribute__ ((noinline))
+unnamed_return_on_stack()
+{
+  return (struct big_struct){
+  .left = 100,
+  .right = 20,
+  .ptr = __builtin_frame_address (0),
+  .big_array = {0}
+  };
+}
+
+int main()
+{
+  struct big_struct x;
+  x = return_on_stack();
+  /*

[PATCH 5/X] [libsanitizer][mid-end] Introduce stack variable handling for HWASAN

2019-11-07 Thread Matthew Malcomson

Handling stack variables has three features.

1) Ensure HWASAN required alignment for stack variables

When tagging shadow memory, we need to ensure that each tag granule is
only used by one variable at a time.

This is done by ensuring that each tagged variable is aligned to the tag
granule representation size and that an alignment boundary lies between
the end of each variable and the start of any other data stored on the
stack.

This patch ensures that by adding alignment requirements in
`align_local_variable` and forcing all stack variable allocation to be
deferred so that `expand_stack_vars` can ensure the stack pointer is
aligned before allocating any variable for the current frame.
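
The alignment rule above can be illustrated with a small, self-contained
sketch.  This is not the code in `align_local_variable`; GRANULE_SIZE is
assumed to be 16 bytes here, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one shadow tag covers 16 bytes of memory.  */
#define GRANULE_SIZE 16u

/* Round VALUE up to the next multiple of GRANULE_SIZE.  */
static uint64_t
granule_align_up (uint64_t value)
{
  return (value + GRANULE_SIZE - 1) & ~(uint64_t) (GRANULE_SIZE - 1);
}

/* Return nonzero if the objects at [A, A + A_SIZE) and [B, B + B_SIZE)
   touch a common tag granule, i.e. would have to share one tag.  */
static int
share_granule (uint64_t a, uint64_t a_size, uint64_t b, uint64_t b_size)
{
  assert (a_size > 0 && b_size > 0);
  uint64_t a_first = a / GRANULE_SIZE;
  uint64_t a_last = (a + a_size - 1) / GRANULE_SIZE;
  uint64_t b_first = b / GRANULE_SIZE;
  uint64_t b_last = (b + b_size - 1) / GRANULE_SIZE;
  return a_first <= b_last && b_first <= a_last;
}
```

Placing a 10-byte variable at offset 0 and the next object at offset 10
makes them share a granule; placing the next object at
granule_align_up (10) does not.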

2) Put tags into each stack variable pointer

Make sure that every pointer to a stack variable includes a tag of some
sort on it.

The way tagging works is:
  1) For every new stack frame, a random tag is generated.
  2) A base register is formed from the stack pointer value and this
 random tag.
  3) References to stack variables are now formed with RTL describing an
 offset from this base in both tag and value.

The random tag generation is handled by a backend hook.  This hook
decides whether to introduce a random tag or use the stack background
based on the parameter hwasan-random-frame-tag.  Using the stack
background is necessary for testing and bootstrap.  It is necessary
during bootstrap to avoid breaking the `configure` test program for
determining stack direction.

Using the stack background means that every stack frame has the initial
tag of zero and variables are tagged with incrementing tags from 1,
which also makes debugging a bit easier.

The tag offsets are also handled by a backend hook.

This patch also adds some macros defining how the HWASAN shadow memory
is stored and how a tag is stored in a pointer.
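
As a rough illustration of how a tag can live in a pointer, the sketch
below keeps an 8-bit tag in the top byte of a 64-bit address.  The shift
of 56 and all names are assumptions made for this example, not the
values of the HWASAN_* macros in the patch; the wrap-around of the
per-variable tag is likewise assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: 8-bit tag stored in bits 56..63 of the address.  */
#define TAG_SHIFT 56
#define TAG_MASK 0xffull

/* Return ADDR with its top byte replaced by TAG.  */
static uint64_t
with_tag (uint64_t addr, uint8_t tag)
{
  return (addr & ~(TAG_MASK << TAG_SHIFT)) | ((uint64_t) tag << TAG_SHIFT);
}

/* Extract the tag from a tagged address.  */
static uint8_t
tag_of (uint64_t addr)
{
  return (uint8_t) (addr >> TAG_SHIFT);
}

/* Strip the tag, leaving the plain address.  */
static uint64_t
untagged (uint64_t addr)
{
  return addr & ~(TAG_MASK << TAG_SHIFT);
}

/* Tag of the Nth variable in a frame whose base tag is FRAME_TAG.
   Tags are assumed to wrap modulo 256.  */
static uint8_t
nth_variable_tag (uint8_t frame_tag, unsigned n)
{
  return (uint8_t) (frame_tag + n);
}
```

With a frame tag of zero (the "stack background" case described above),
the first variable gets tag 1, the second tag 2, and so on.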

3) For each stack variable, tag and untag the shadow stack on function
   prologue and epilogue.

On entry to each function we tag the relevant shadow stack region for
each stack variable so that it matches the tag added to each pointer
for that variable.

This is the first patch where we use the HWASAN shadow space, so we need
to add in the libhwasan initialisation code that creates this shadow
memory region into the binary we produce.  This instrumentation is done
in `compile_file`.

When exiting a function we need to ensure the shadow stack for this
function has no remaining tag.  Without clearing the shadow stack area
for this stack frame, later function calls could get false positives
when those later function calls check untagged areas (such as parameters
passed on the stack) against a shadow stack area with left-over tag.

Hence we ensure that the entire stack frame is cleared on function exit.
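
The prologue and epilogue behaviour can be modelled with a toy shadow
array.  This is only a sketch: the granule size, the frame size and the
function names are assumptions for illustration, and the real
instrumentation emits RTL rather than calling C helpers like these.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumptions: 16-byte granules and a toy 256-byte stack frame.  */
#define GRANULE 16u
#define FRAME_BYTES 256u

/* One shadow byte (tag) per granule of the frame.  */
static uint8_t shadow[FRAME_BYTES / GRANULE];

/* Prologue side: give the granules covering [OFFSET, OFFSET + SIZE)
   the tag TAG.  Both values must be granule aligned.  */
static void
tag_region (uint32_t offset, uint32_t size, uint8_t tag)
{
  assert (offset % GRANULE == 0 && size % GRANULE == 0);
  assert (offset + size <= FRAME_BYTES);
  memset (shadow + offset / GRANULE, tag, size / GRANULE);
}

/* Epilogue side: clear every tag in the frame so that later calls see
   only the background tag of zero.  */
static void
untag_frame (void)
{
  memset (shadow, 0, sizeof shadow);
}
```

Clearing the whole frame, rather than each variable separately, matches
the reasoning above: no stale tag may remain once the function returns.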

gcc/ChangeLog:

2019-11-07  Matthew Malcomson  

* asan.c (hwasan_record_base): New function.
(hwasan_emit_untag_frame): New.
(hwasan_increment_tag): New function.
(hwasan_with_tag): New function.
(hwasan_tag_init): New function.
(initialize_sanitizer_builtins): Define new builtins.
(ATTR_NOTHROW_LIST): New macro.
(hwasan_current_tag): New.
(hwasan_emit_prologue): New.
(hwasan_create_untagged_base): New.
(hwasan_finish_file): New.
(hwasan_sanitize_stack_p): New.
(memory_tagging_p): New.
* asan.h (hwasan_record_base): New declaration.
(hwasan_emit_untag_frame): New.
(hwasan_increment_tag): New declaration.
(hwasan_with_tag): New declaration.
(hwasan_sanitize_stack_p): New declaration.
(hwasan_tag_init): New declaration.
(memory_tagging_p): New declaration.
(HWASAN_TAG_SIZE): New macro.
(HWASAN_TAG_GRANULE_SIZE): New macro.
(HWASAN_SHIFT): New macro.
(HWASAN_SHIFT_RTX): New macro.
(HWASAN_STACK_BACKGROUND): New macro.
(hwasan_finish_file): New.
(hwasan_current_tag): New.
(hwasan_create_untagged_base): New.
(hwasan_emit_prologue): New.
* cfgexpand.c (struct stack_vars_data): Add information to
record hwasan variable stack offsets.
(expand_stack_vars): Ensure variables are offset from a tagged
base. Record offsets for hwasan. Ensure alignment.
(expand_used_vars): Call function to emit prologue, and get
untagging instructions for function exit.
(align_local_variable): Ensure alignment.
(defer_stack_allocation): Ensure all variables are deferred so
they can be handled by `expand_stack_vars`.
(expand_one_stack_var_at): Account for tags in
variables when using HWASAN.
(expand_one_stack_var_1): Pass new argument to
expand_one_stack_var_at.
(init_vars_expansion): Initialise hwasan internal variables when
starting variable expansion.
* doc/tm.texi (TARGET_MEMTAG_GENTAG): Document.
* doc/tm.texi.in (TARGET_MEMTAG_GENTAG): Document.

[PATCH 6/X] [libsanitizer] Add hwasan pass and associated gimple changes

2019-11-07 Thread Matthew Malcomson

There are three main features to this change:

1) Check pointer tags match address tags.

In the new `hwasan` pass we put HWASAN_CHECK internal functions around
all memory accesses, to check that tags in the pointer being used match
the tag stored in shadow memory for the memory region being used.

These internal functions are expanded into actual checks in the sanopt
pass that happens just before expansion into RTL.

We use the same mechanism that currently inserts ASAN_CHECK internal
functions to insert the new HWASAN_CHECK functions.
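
Conceptually, each check compares the tag in the pointer with the tag
stored in shadow memory for every granule the access touches.  The
following is a simplified model under assumed parameters (16-byte
granules, tag in the top byte, a toy 1 KiB address space); it is not the
expansion produced by the sanopt pass and `access_ok` is a hypothetical
name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: 16-byte granules, tag in bits 56..63.  */
#define GRANULE 16u
#define TAG_SHIFT 56

/* Toy shadow memory covering addresses 0 .. 1023.  */
static uint8_t shadow[64];

/* Return nonzero if an access of SIZE bytes through TAGGED_PTR only
   touches granules whose shadow tag equals the pointer's tag.  */
static int
access_ok (uint64_t tagged_ptr, uint32_t size)
{
  uint8_t ptr_tag = (uint8_t) (tagged_ptr >> TAG_SHIFT);
  uint64_t addr = tagged_ptr & ((1ull << TAG_SHIFT) - 1);

  assert (size > 0 && addr + size <= sizeof shadow * GRANULE);
  for (uint64_t g = addr / GRANULE; g <= (addr + size - 1) / GRANULE; g++)
    if (shadow[g] != ptr_tag)
      return 0;
  return 1;
}
```

An access that runs one byte past a 16-byte tagged object fails in this
model, because the following granule carries a different tag.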

2) Instrument known builtin function calls.

Handle all builtin functions that we know use memory accesses.
This commit uses the machinery added for ASAN to identify builtin
functions that access memory.

The main differences between the approaches for HWASAN and ASAN are:
 - libhwasan intercepts far fewer builtin functions.
 - Alloca needs to be transformed differently (instead of adding
   redzones it needs to tag shadow memory and return a tagged pointer).
 - stack_restore needs to untag the shadow stack between the current
   position and where it's going.
 - `noreturn` functions cannot be handled by simply unpoisoning the
   entire shadow stack -- there is no "always valid" tag.
   (exceptions and things such as longjmp need to be handled in a
   different way).

For hardware implemented checking (such as AArch64's memory tagging
extension) alloca and stack_restore will need to be handled by hooks in
the backend rather than transformation at the gimple level.  This will
allow architecture specific handling of such stack modifications.

3) Introduce HWASAN block-scope poisoning

Here we use exactly the same mechanism as ASAN_MARK to poison/unpoison
variables on entry/exit of a block.

To reuse exactly the same machinery, we keep using the same internal
functions until the SANOPT pass.  This means that all handling of
ASAN_MARK is the same.
The downside is that the naming may be a little confusing; the upside
is that handling of the internal function doesn't have to be duplicated
for a function that behaves exactly the same but has a different name.

gcc/ChangeLog:

2019-11-07  Matthew Malcomson  

* asan.c (handle_builtin_stack_restore): Account for HWASAN.
(handle_builtin_alloca): Account for HWASAN.
(get_mem_refs_of_builtin_call): Special case strlen for HWASAN.
(report_error_func): Assert not HWASAN.
(build_check_stmt): Make HWASAN_CHECK instead of ASAN_CHECK.
(instrument_derefs): HWASAN does not tag globals.
(maybe_instrument_call): Don't instrument `noreturn` functions.
(initialize_sanitizer_builtins): Add new type.
(asan_expand_mark_ifn): Account for HWASAN.
(asan_expand_check_ifn): Assert never called by HWASAN.
(asan_expand_poison_ifn): Account for HWASAN.
(hwasan_instrument): New.
(hwasan_base): New.
(hwasan_emit_untag_frame): Free block-scope-var hash map.
(hwasan_check_func): New.
(hwasan_expand_check_ifn): New.
(hwasan_expand_mark_ifn): New.
(gate_hwasan): New.
(class pass_hwasan): New.
(make_pass_hwasan): New.
(class pass_hwasan_O0): New.
(make_pass_hwasan_O0): New.
* asan.h (hwasan_base): New decl.
(hwasan_expand_check_ifn): New decl.
(hwasan_expand_mark_ifn): New decl.
(gate_hwasan): New decl.
(enum hwasan_mark_flags): New.
(asan_intercepted_p): Always false for hwasan.
(asan_sanitize_use_after_scope): Account for HWASAN.
* builtin-types.def (BT_FN_PTR_CONST_PTR_UINT8): New.
* gimple-pretty-print.c (dump_gimple_call_args): Account for
HWASAN.
* gimplify.c (asan_poison_variable): Account for HWASAN.
(gimplify_function_tree): Remove requirement of
SANITIZE_ADDRESS, requiring asan or hwasan is accounted for in
`asan_sanitize_use_after_scope`.
* internal-fn.c (expand_HWASAN_CHECK): New.
(expand_HWASAN_CHOOSE_TAG): New.
(expand_HWASAN_MARK): New.
* internal-fn.def (HWASAN_CHOOSE_TAG): New.
(HWASAN_CHECK): New.
(HWASAN_MARK): New.
* passes.def: Add hwasan and hwasan_O0 passes.
* sanitizer.def (BUILT_IN_HWASAN_LOAD1): New.
(BUILT_IN_HWASAN_LOAD2): New.
(BUILT_IN_HWASAN_LOAD4): New.
(BUILT_IN_HWASAN_LOAD8): New.
(BUILT_IN_HWASAN_LOAD16): New.
(BUILT_IN_HWASAN_LOADN): New.
(BUILT_IN_HWASAN_STORE1): New.
(BUILT_IN_HWASAN_STORE2): New.
(BUILT_IN_HWASAN_STORE4): New.
(BUILT_IN_HWASAN_STORE8): New.
(BUILT_IN_HWASAN_STORE16): New.
(BUILT_IN_HWASAN_STOREN): New.
(BUILT_IN_HWASAN_LOAD1_NOABORT): New.
(BUILT_IN_HWASAN_LOAD2_NOABORT): New.
(BUILT_IN_HWASAN_LOAD4_NOABORT): New.
(BUILT_IN_HWASAN_LOAD8_NOABORT): New.
(BUILT_IN_HWASAN_LOAD16_NOABORT): New.

[PATCH 4/X] [libsanitizer][options] Add hwasan flags and argument parsing

2019-11-07 Thread Matthew Malcomson

These flags can't be used at the same time as any of the other
sanitizers.
We add an equivalent flag to -static-libasan in -static-libhwasan to
ensure static linking.

The -fsanitize=kernel-hwaddress option is for compiling targeting the
kernel.  This flag has defaults that allow compiling KASAN with tags as
it is currently implemented.
These defaults are that we do not sanitize variables on the stack and
always recover from a detected bug.
Stack tagging in the kernel is a future aim; stack instrumentation has
not yet been enabled for the kernel in clang either
(https://lists.infradead.org/pipermail/linux-arm-kernel/2019-October/687121.html).

We introduce a backend hook `targetm.memtag.can_tag_addresses` that
indicates to the mid-end whether a target has a feature like AArch64 TBI
where the top byte of an address is ignored.
Without this feature hwasan sanitization is not done.

gcc/ChangeLog:

2019-11-07  Matthew Malcomson  

* asan.c (memory_tagging_p): New.
* asan.h (memory_tagging_p): New.
* common.opt (flag_sanitize_recover): Default for kernel
hwaddress.
(static-libhwasan): New cli option.
* config/aarch64/aarch64.c (aarch64_can_tag_addresses): New.
(TARGET_MEMTAG_CAN_TAG_ADDRESSES): New.
* config/gnu-user.h (LIBHWASAN_EARLY_SPEC): hwasan equivalent of
asan command line flags.
* cppbuiltin.c (define_builtin_macros_for_compilation_flags):
Add hwasan equivalent of __SANITIZE_ADDRESS__.
* doc/tm.texi: Document new hook.
* doc/tm.texi.in: Document new hook.
* flag-types.h (enum sanitize_code): New sanitizer values.
* gcc.c (STATIC_LIBHWASAN_LIBS): New macro.
(LIBHWASAN_SPEC): New macro.
(LIBHWASAN_EARLY_SPEC): New macro.
(SANITIZER_EARLY_SPEC): Update to include hwasan.
(SANITIZER_SPEC): Update to include hwasan.
(sanitize_spec_function): Use hwasan options.
* opts.c (finish_options): Describe conflicts between address
sanitizers.
(sanitizer_opts): Introduce new sanitizer flags.
(common_handle_option): Add defaults for kernel sanitizer.
* params.def (PARAM_HWASAN_RANDOM_FRAME_TAG): New.
(PARAM_HWASAN_STACK): New.
* params.h (HWASAN_STACK): New.
(HWASAN_RANDOM_FRAME_TAG): New.
* target.def (HOOK_PREFIX): Add new hook.
* targhooks.c (default_memtag_can_tag_addresses): New.
* toplev.c (process_options): Ensure hwasan only on TBI
architectures.

gcc/c-family/ChangeLog:

2019-11-07  Matthew Malcomson  

* c-attribs.c (handle_no_sanitize_hwaddress_attribute): New
attribute.



### Attachment also inlined for ease of reply###


diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 
1c9f28587fbb2348cc30e302e889a5a22906901a..a5e68061ff956018957b6be137a7b2f2b7353647
 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -54,6 +54,8 @@ static tree handle_cold_attribute (tree *, tree, tree, int, bool *);
 static tree handle_no_sanitize_attribute (tree *, tree, tree, int, bool *);
 static tree handle_no_sanitize_address_attribute (tree *, tree, tree,
  int, bool *);
+static tree handle_no_sanitize_hwaddress_attribute (tree *, tree, tree,
+   int, bool *);
 static tree handle_no_sanitize_thread_attribute (tree *, tree, tree,
 int, bool *);
 static tree handle_no_address_safety_analysis_attribute (tree *, tree, tree,
@@ -412,6 +414,8 @@ const struct attribute_spec c_common_attribute_table[] =
  handle_no_sanitize_attribute, NULL },
   { "no_sanitize_address",0, 0, true, false, false, false,
  handle_no_sanitize_address_attribute, NULL },
+  { "no_sanitize_hwaddress",0, 0, true, false, false, false,
+ handle_no_sanitize_hwaddress_attribute, NULL },
   { "no_sanitize_thread", 0, 0, true, false, false, false,
  handle_no_sanitize_thread_attribute, NULL },
   { "no_sanitize_undefined",  0, 0, true, false, false, false,
@@ -941,6 +945,22 @@ handle_no_sanitize_address_attribute (tree *node, tree name, tree, int,
   return NULL_TREE;
 }
 
+/* Handle a "no_sanitize_hwaddress" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_no_sanitize_hwaddress_attribute (tree *node, tree name, tree, int,
+ bool *no_add_attrs)
+{
+  *no_add_attrs = true;
+  if (TREE_CODE (*node) != FUNCTION_DECL)
+warning (OPT_Wattributes, "%qE attribute ignored", name);
+  else
+add_no_sanitize_value (*node, SANITIZE_HWADDRESS);
+
+  return NULL_TREE;
+}
+
 /* Handle a "no_sanitize_thread" attribute; arguments as in
struct attribute_spec.handler.  */
 
diff --git

[PATCH 1/X] [libsanitizer] Tie the hwasan library into our build system

2019-11-07 Thread Matthew Malcomson

This patch tries to tie libhwasan into the GCC build system in the
same way that the other sanitizer runtime libraries are handled.

libsanitizer/ChangeLog:

2019-11-07  Matthew Malcomson  

* Makefile.am:  Build libhwasan.
* Makefile.in:  Build libhwasan.
* asan/Makefile.in:  Build libhwasan.
* configure:  Build libhwasan.
* configure.ac:  Build libhwasan.
* hwasan/Makefile.am: New file.
* hwasan/Makefile.in: New file.
* hwasan/libtool-version: New file.
* interception/Makefile.in: Build libhwasan.
* libbacktrace/Makefile.in: Build libhwasan.
* libsanitizer.spec.in: Build libhwasan.
* lsan/Makefile.in: Build libhwasan.
* merge.sh: Build libhwasan.
* sanitizer_common/Makefile.in: Build libhwasan.
* tsan/Makefile.in: Build libhwasan.
* ubsan/Makefile.in: Build libhwasan.



### Attachment also inlined for ease of reply###


diff --git a/libsanitizer/Makefile.am b/libsanitizer/Makefile.am
index 
65ed1e712378ef453f820f86c4d3221f9dee5f2c..2a7e8e1debe838719db0f0fad218b2543cc3111b
 100644
--- a/libsanitizer/Makefile.am
+++ b/libsanitizer/Makefile.am
@@ -14,11 +14,12 @@ endif
 if LIBBACKTRACE_SUPPORTED
 SUBDIRS += libbacktrace
 endif
-SUBDIRS += lsan asan ubsan
+SUBDIRS += lsan asan ubsan hwasan
 nodist_saninclude_HEADERS += \
   include/sanitizer/lsan_interface.h \
   include/sanitizer/asan_interface.h \
-  include/sanitizer/tsan_interface.h
+  include/sanitizer/tsan_interface.h \
+  include/sanitizer/hwasan_interface.h
 if TSAN_SUPPORTED
 SUBDIRS += tsan
 endif
diff --git a/libsanitizer/Makefile.in b/libsanitizer/Makefile.in
index 
0d789b3a59d21ea2e5a23057ca3afe15425feec4..36aa952af7e04bc0e4fb94cdcd584d539193d781
 100644
--- a/libsanitizer/Makefile.in
+++ b/libsanitizer/Makefile.in
@@ -92,7 +92,8 @@ target_triplet = @target@
 @SANITIZER_SUPPORTED_TRUE@am__append_1 = include/sanitizer/common_interface_defs.h \
 @SANITIZER_SUPPORTED_TRUE@ include/sanitizer/lsan_interface.h \
 @SANITIZER_SUPPORTED_TRUE@ include/sanitizer/asan_interface.h \
-@SANITIZER_SUPPORTED_TRUE@ include/sanitizer/tsan_interface.h
+@SANITIZER_SUPPORTED_TRUE@ include/sanitizer/tsan_interface.h \
+@SANITIZER_SUPPORTED_TRUE@ include/sanitizer/hwasan_interface.h
 @SANITIZER_SUPPORTED_TRUE@@USING_MAC_INTERPOSE_FALSE@am__append_2 = interception
 @LIBBACKTRACE_SUPPORTED_TRUE@@SANITIZER_SUPPORTED_TRUE@am__append_3 = libbacktrace
 @SANITIZER_SUPPORTED_TRUE@@TSAN_SUPPORTED_TRUE@am__append_4 = tsan
@@ -206,7 +207,7 @@ ETAGS = etags
 CTAGS = ctags
 CSCOPE = cscope
 DIST_SUBDIRS = sanitizer_common interception libbacktrace lsan asan \
-   ubsan tsan
+   ubsan hwasan tsan
 ACLOCAL = @ACLOCAL@
 ALLOC_FILE = @ALLOC_FILE@
 AMTAR = @AMTAR@
@@ -328,6 +329,7 @@ install_sh = @install_sh@
 libdir = @libdir@
 libexecdir = @libexecdir@
 link_libasan = @link_libasan@
+link_libhwasan = @link_libhwasan@
 link_liblsan = @link_liblsan@
 link_libtsan = @link_libtsan@
 link_libubsan = @link_libubsan@
@@ -361,7 +363,7 @@ sanincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/include/sanitizer
 nodist_saninclude_HEADERS = $(am__append_1)
 @SANITIZER_SUPPORTED_TRUE@SUBDIRS = sanitizer_common $(am__append_2) \
 @SANITIZER_SUPPORTED_TRUE@ $(am__append_3) lsan asan ubsan \
-@SANITIZER_SUPPORTED_TRUE@ $(am__append_4)
+@SANITIZER_SUPPORTED_TRUE@ hwasan $(am__append_4)
 gcc_version := $(shell @get_gcc_base_ver@ $(top_srcdir)/../gcc/BASE-VER)
 
 # Work around what appears to be a GNU make bug handling MAKEFLAGS
diff --git a/libsanitizer/asan/Makefile.in b/libsanitizer/asan/Makefile.in
index 
00b6082da5372efd679ddc230f588bbc58161ef6..76689c3b224b1fb04895ae48829eac4b6784cd84
 100644
--- a/libsanitizer/asan/Makefile.in
+++ b/libsanitizer/asan/Makefile.in
@@ -382,6 +382,7 @@ install_sh = @install_sh@
 libdir = @libdir@
 libexecdir = @libexecdir@
 link_libasan = @link_libasan@
+link_libhwasan = @link_libhwasan@
 link_liblsan = @link_liblsan@
 link_libtsan = @link_libtsan@
 link_libubsan = @link_libubsan@
diff --git a/libsanitizer/configure b/libsanitizer/configure
index 
79b5c1eadb59018bca13a33f19f3494c170365ee..ff72af73e6f77aaf93bf39e6799f896851a377dd
 100755
--- a/libsanitizer/configure
+++ b/libsanitizer/configure
@@ -657,6 +657,7 @@ USING_MAC_INTERPOSE_TRUE
 link_liblsan
 link_libubsan
 link_libtsan
+link_libhwasan
 link_libasan
 LSAN_SUPPORTED_FALSE
 LSAN_SUPPORTED_TRUE
@@ -12334,7 +12335,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12337 "configure"
+#line 12338 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -12440,7 +12441,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12443 "configure"
+#line 12444 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -15916,6 +15917,10 @@ fi

[PATCH 3/X] [libsanitizer] Add option to bootstrap using HWASAN

2019-11-07 Thread Matthew Malcomson

This adds an option to configure analogous to --bootstrap-asan.  It allows
bootstrapping GCC using HWASAN.

For the same reasons as for ASAN we have to avoid using the HWASAN
sanitizer when compiling libiberty and the lto-plugin.

Also add a function to query whether -fsanitize=hwaddress has been
passed.

ChangeLog:

2019-08-29  Matthew Malcomson  

* configure: Regenerate.
* configure.ac: Add --bootstrap-hwasan option.

config/ChangeLog:

2019-11-07  Matthew Malcomson  

* bootstrap-hwasan.mk: New file.

libiberty/ChangeLog:

2019-11-07  Matthew Malcomson  

* configure: Regenerate.
* configure.ac: Avoid using sanitizer.

lto-plugin/ChangeLog:

2019-11-07  Matthew Malcomson  

* Makefile.am: Avoid using sanitizer.
* Makefile.in: Regenerate.



### Attachment also inlined for ease of reply###


diff --git a/config/bootstrap-hwasan.mk b/config/bootstrap-hwasan.mk
new file mode 100644
index 
..4f60bed3fd6e98b47a3a38aea6eba2a7c320da25
--- /dev/null
+++ b/config/bootstrap-hwasan.mk
@@ -0,0 +1,8 @@
+# This option enables -fsanitize=hwaddress for stage2 and stage3.
+
+STAGE2_CFLAGS += -fsanitize=hwaddress
+STAGE3_CFLAGS += -fsanitize=hwaddress
+POSTSTAGE1_LDFLAGS += -fsanitize=hwaddress -static-libhwasan \
+ -B$$r/prev-$(TARGET_SUBDIR)/libsanitizer/ \
+ -B$$r/prev-$(TARGET_SUBDIR)/libsanitizer/hwasan/ \
+ -B$$r/prev-$(TARGET_SUBDIR)/libsanitizer/hwasan/.libs
diff --git a/configure b/configure
index 
aec9186b2b0123d3088b69eb1ee541567654953e..6f71b111bd18ec053180beecf83dd4549e83c2b9
 100755
--- a/configure
+++ b/configure
@@ -7270,7 +7270,7 @@ fi
 # or bootstrap-ubsan, bootstrap it.
 if echo " ${target_configdirs} " | grep " libsanitizer " > /dev/null 2>&1; then
   case "$BUILD_CONFIG" in
-*bootstrap-asan* | *bootstrap-ubsan* )
+*bootstrap-hwasan* | *bootstrap-asan* | *bootstrap-ubsan* )
   bootstrap_target_libs=${bootstrap_target_libs}target-libsanitizer,
   bootstrap_fixincludes=yes
   ;;
diff --git a/configure.ac b/configure.ac
index 
b8ce2ad20b9d03e42731252a9ec2a8417c13e566..16bfdf164555dad94c789f17b6a63ba1a2e3e9f4
 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2775,7 +2775,7 @@ fi
 # or bootstrap-ubsan, bootstrap it.
 if echo " ${target_configdirs} " | grep " libsanitizer " > /dev/null 2>&1; then
   case "$BUILD_CONFIG" in
-*bootstrap-asan* | *bootstrap-ubsan* )
+*bootstrap-hwasan* | *bootstrap-asan* | *bootstrap-ubsan* )
   bootstrap_target_libs=${bootstrap_target_libs}target-libsanitizer,
   bootstrap_fixincludes=yes
   ;;
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 
6c9579bfaff955eb43875b404fb7db1a667bf522..427a2f4e56b37e165b72cc166e1acb0732449a8b
 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -2645,6 +2645,11 @@ Some examples of build configurations designed for developers of GCC are:
 Compiles GCC itself using Address Sanitization in order to catch invalid memory
 accesses within the GCC code.
 
+@item @samp{bootstrap-hwasan}
+Compiles GCC itself using HWAddress Sanitization in order to catch invalid
+memory accesses within the GCC code.  This option is only available on AArch64
+targets with a very recent linux kernel (5.4 or later).
+
 @section Building a cross compiler
 
 When building a cross compiler, it is not generally possible to do a
diff --git a/libiberty/configure b/libiberty/configure
index 
7a34dabec32b0b383bd33f07811757335f4dd39c..cb2dd4ff5295598343cc18b3a79a86a778f2261d
 100755
--- a/libiberty/configure
+++ b/libiberty/configure
@@ -5261,6 +5261,7 @@ fi
 NOASANFLAG=
 case " ${CFLAGS} " in
   *\ -fsanitize=address\ *) NOASANFLAG=-fno-sanitize=address ;;
+  *\ -fsanitize=hwaddress\ *) NOASANFLAG=-fno-sanitize=hwaddress ;;
 esac
 
 
diff --git a/libiberty/configure.ac b/libiberty/configure.ac
index 
f1ce76010c9acde79c5dc46686a78b2e2f19244e..043237628b79cbf37d07359b59c5ffe17a7a22ef
 100644
--- a/libiberty/configure.ac
+++ b/libiberty/configure.ac
@@ -240,6 +240,7 @@ AC_SUBST(PICFLAG)
 NOASANFLAG=
 case " ${CFLAGS} " in
   *\ -fsanitize=address\ *) NOASANFLAG=-fno-sanitize=address ;;
+  *\ -fsanitize=hwaddress\ *) NOASANFLAG=-fno-sanitize=hwaddress ;;
 esac
 AC_SUBST(NOASANFLAG)
 
diff --git a/lto-plugin/Makefile.am b/lto-plugin/Makefile.am
index 
28dc21014b2e86988fa88adabd63ce6092e18e02..34aa397d785e3cc9b6975de460d065900364c3ff
 100644
--- a/lto-plugin/Makefile.am
+++ b/lto-plugin/Makefile.am
@@ -11,8 +11,8 @@ AM_CPPFLAGS = -I$(top_srcdir)/../include $(DEFS)
 AM_CFLAGS = @ac_lto_plugin_warn_cflags@
 AM_LDFLAGS = @ac_lto_plugin_ldflags@
 AM_LIBTOOLFLAGS = --tag=disable-static
-override CFLAGS := $(filter-out -fsanitize=address,$(CFLAGS))
-override LDFLAGS := $(filter-out -fsanitize=address,$(LDFLAGS))
+override CFLAGS := $(filter-out -fsanitize=address -fsanitize=hwaddress,$(CFLAGS))
+override LDFLAGS := $(filter-out

[PATCH 2/X] [libsanitizer] Only build libhwasan when targeting AArch64

2019-11-07 Thread Matthew Malcomson

Though the library has limited support for x86, we don't have any
support for generating code targeting x86 so there is no point building
for that target.

libsanitizer/ChangeLog:

2019-11-07  Matthew Malcomson  

* Makefile.am: Condition building hwasan directory.
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Set HWASAN_SUPPORTED based on target
architecture.
* configure.tgt: Likewise.



### Attachment also inlined for ease of reply###


diff --git a/libsanitizer/Makefile.am b/libsanitizer/Makefile.am
index 
2a7e8e1debe838719db0f0fad218b2543cc3111b..065a65e78d49f7689a01ecb64db1f07ca83aa987
 100644
--- a/libsanitizer/Makefile.am
+++ b/libsanitizer/Makefile.am
@@ -14,7 +14,7 @@ endif
 if LIBBACKTRACE_SUPPORTED
 SUBDIRS += libbacktrace
 endif
-SUBDIRS += lsan asan ubsan hwasan
+SUBDIRS += lsan asan ubsan
 nodist_saninclude_HEADERS += \
   include/sanitizer/lsan_interface.h \
   include/sanitizer/asan_interface.h \
@@ -23,6 +23,9 @@ nodist_saninclude_HEADERS += \
 if TSAN_SUPPORTED
 SUBDIRS += tsan
 endif
+if HWASAN_SUPPORTED
+SUBDIRS += hwasan
+endif
 endif
 
 ## May be used by toolexeclibdir.
diff --git a/libsanitizer/Makefile.in b/libsanitizer/Makefile.in
index 
36aa952af7e04bc0e4fb94cdcd584d539193d781..75a99491cb1d4422fd5e2d93cae93eb883ae0963
 100644
--- a/libsanitizer/Makefile.in
+++ b/libsanitizer/Makefile.in
@@ -97,6 +97,7 @@ target_triplet = @target@
 @SANITIZER_SUPPORTED_TRUE@@USING_MAC_INTERPOSE_FALSE@am__append_2 = interception
 @LIBBACKTRACE_SUPPORTED_TRUE@@SANITIZER_SUPPORTED_TRUE@am__append_3 = libbacktrace
 @SANITIZER_SUPPORTED_TRUE@@TSAN_SUPPORTED_TRUE@am__append_4 = tsan
+@HWASAN_SUPPORTED_TRUE@@SANITIZER_SUPPORTED_TRUE@am__append_5 = hwasan
 subdir = .
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
 am__aclocal_m4_deps = $(top_srcdir)/../config/acx.m4 \
@@ -207,7 +208,7 @@ ETAGS = etags
 CTAGS = ctags
 CSCOPE = cscope
 DIST_SUBDIRS = sanitizer_common interception libbacktrace lsan asan \
-   ubsan hwasan tsan
+   ubsan tsan hwasan
 ACLOCAL = @ACLOCAL@
 ALLOC_FILE = @ALLOC_FILE@
 AMTAR = @AMTAR@
@@ -363,7 +364,7 @@ sanincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/include/sanitizer
 nodist_saninclude_HEADERS = $(am__append_1)
 @SANITIZER_SUPPORTED_TRUE@SUBDIRS = sanitizer_common $(am__append_2) \
 @SANITIZER_SUPPORTED_TRUE@ $(am__append_3) lsan asan ubsan \
-@SANITIZER_SUPPORTED_TRUE@ hwasan $(am__append_4)
+@SANITIZER_SUPPORTED_TRUE@ $(am__append_4) $(am__append_5)
 gcc_version := $(shell @get_gcc_base_ver@ $(top_srcdir)/../gcc/BASE-VER)
 
 # Work around what appears to be a GNU make bug handling MAKEFLAGS
diff --git a/libsanitizer/configure b/libsanitizer/configure
index 
ff72af73e6f77aaf93bf39e6799f896851a377dd..4e95194fe3567b1227c4036c2f5bf6540f735975
 100755
--- a/libsanitizer/configure
+++ b/libsanitizer/configure
@@ -659,6 +659,8 @@ link_libubsan
 link_libtsan
 link_libhwasan
 link_libasan
+HWASAN_SUPPORTED_FALSE
+HWASAN_SUPPORTED_TRUE
 LSAN_SUPPORTED_FALSE
 LSAN_SUPPORTED_TRUE
 TSAN_SUPPORTED_FALSE
@@ -12335,7 +12337,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12338 "configure"
+#line 12340 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -12441,7 +12443,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12444 "configure"
+#line 12446 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -15792,6 +15794,7 @@ fi
 # Get target configury.
 unset TSAN_SUPPORTED
 unset LSAN_SUPPORTED
+unset HWASAN_SUPPORTED
 . ${srcdir}/configure.tgt
  if test "x$TSAN_SUPPORTED" = "xyes"; then
   TSAN_SUPPORTED_TRUE=
@@ -15809,6 +15812,14 @@ else
   LSAN_SUPPORTED_FALSE=
 fi
 
+ if test "x$HWASAN_SUPPORTED" = "xyes"; then
+  HWASAN_SUPPORTED_TRUE=
+  HWASAN_SUPPORTED_FALSE='#'
+else
+  HWASAN_SUPPORTED_TRUE='#'
+  HWASAN_SUPPORTED_FALSE=
+fi
+
 
 # Check for functions needed.
 for ac_func in clock_getres clock_gettime clock_settime lstat readlink
@@ -16791,7 +16802,7 @@ ac_config_files="$ac_config_files Makefile libsanitizer.spec libbacktrace/backtr
 ac_config_headers="$ac_config_headers config.h"
 
 
-ac_config_files="$ac_config_files interception/Makefile sanitizer_common/Makefile libbacktrace/Makefile lsan/Makefile asan/Makefile hwasan/Makefile ubsan/Makefile"
+ac_config_files="$ac_config_files interception/Makefile sanitizer_common/Makefile libbacktrace/Makefile lsan/Makefile asan/Makefile ubsan/Makefile"
 
 
 if test "x$TSAN_SUPPORTED" = "xyes"; then
@@ -16799,6 +16810,11 @@ if test "x$TSAN_SUPPORTED" = "xyes"; then
 
 fi
 
+if test "x$HWASAN_SUPPORTED" = "xyes"; then
+  ac_config_files="$ac_config_files hwasan/Makefile"
+
+fi
+
 
 
 
@@ -17059,6 +17075,10 @@ if test -z "${LSAN_SUPPORTED_TRUE}" && test -z "${LSAN_SUPPORTED_FALSE}"; then
   as_fn_error $? "conditional

v2 [PATCH 0/X] Introduce HWASAN sanitizer to GCC

2019-11-07 Thread Matthew Malcomson

I have rebased this series onto Martin Liska's patches that take the most
recent libhwasan from upstream LLVM.
https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00340.html

I've also cleared up some nomenclature (I had previously used the word 'colour'
a few times instead of the word 'tag' and that clashes with other descriptions)
and based the patch series off a more recent GCC revision (r277678).

There's an ongoing discussion on whether to have __SANITIZE_ADDRESS__, or
__SANITIZE_HWADDRESS__, but I'm keeping that discussion to the existing
thread.

Similarly there's still the question around C++ exceptions that I'm keeping to
the existing thread (on the first patch series).


NOTE:
  Unfortunately, there's a bug in the more recent version of GCC I rebased
  onto.
  Hwasan catches this when bootstrapping, which means bootstrapping with hwasan
  fails.
  I'm working on tracking the bug down now, but sending this series upstream
  for visibility while that happens.

  Bugzilla link:
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92410

Entire patch series attached to cover letter.

all-patches.tar.gz
Description: all-patches.tar.gz

Re: [PATCH 13/X] [libsanitizer][options] Add hwasan flags and argument parsing

2019-11-07 Thread Evgenii Stepanov via gcc-patches

Clang has a function level attribute,
  __attribute__((no_sanitize("hwaddress")))
a feature macro
  #if __has_feature(hwaddress_sanitizer)
and a blacklist section
  [hwaddress]
  https://clang.llvm.org/docs/SanitizerSpecialCaseList.html

I think it makes sense for the compiler to err on the side of not losing
information and provide distinct macros for these two sanitizers. If the
kernel does not care about the difference, they can add a simple #ifdef.
They would need to, anyway, because gcc does not have feature macros and
clang does not define __SANITIZE_ADDRESS__.
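
For illustration, a minimal sketch of what such an #ifdef could look like in
user code. It assumes GCC ends up defining __SANITIZE_HWADDRESS__ as proposed
in this thread; MY_ASAN, MY_HWASAN, MY_ANY_ADDRESS_SANITIZER and
sanitizer_name are made-up names for the example, not anything GCC, clang or
the kernel provide.

```c
/* Compilers without __has_feature get a harmless fallback.  */
#ifndef __has_feature
#define __has_feature(x) 0
#endif

/* MY_ASAN and MY_HWASAN are hypothetical project-local names.  */
#if defined(__SANITIZE_ADDRESS__) || __has_feature(address_sanitizer)
#define MY_ASAN 1
#else
#define MY_ASAN 0
#endif

/* __SANITIZE_HWADDRESS__ is the macro proposed in this thread.  */
#if defined(__SANITIZE_HWADDRESS__) || __has_feature(hwaddress_sanitizer)
#define MY_HWASAN 1
#else
#define MY_HWASAN 0
#endif

/* Code that treats both sanitizers alike tests this single macro.  */
#define MY_ANY_ADDRESS_SANITIZER (MY_ASAN || MY_HWASAN)

/* Code that must tell them apart can still do so.  */
const char *sanitizer_name(void)
{
    if (MY_HWASAN)
        return "hwasan";
    if (MY_ASAN)
        return "asan";
    return "none";
}
```

With distinct macros, projects that care about the difference (redzones,
manual poisoning) test MY_ASAN or MY_HWASAN, and everyone else tests the
combined one.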


On Thu, Nov 7, 2019 at 7:51 AM Andrey Konovalov wrote:

> On Thu, Nov 7, 2019 at 1:48 PM Matthew Malcomson wrote:
> >
> > On 05/11/2019 13:11, Andrey Konovalov wrote:
> > > On Tue, Nov 5, 2019 at 12:34 PM Matthew Malcomson wrote:
> > >>
> > >> NOTE:
> > >> --
> > > >> I have defined a new macro of __SANITIZE_HWADDRESS__ that gets
> > > >> automatically defined when compiling with hwasan.  This is analogous
> > > >> to __SANITIZE_ADDRESS__ which is defined when compiling with asan.
> > >>
> > > >> Users in the kernel have expressed an interest in using
> > > >> __SANITIZE_ADDRESS__ for both
> > > >> (https://lists.infradead.org/pipermail/linux-arm-kernel/2019-October/690703.html).
> > >>
> > >> One approach to do this could be to define __SANITIZE_ADDRESS__ with
> > >> different values depending on whether we are compiling with hwasan or
> > >> asan.
> > >>
> > > >> Using __SANITIZE_ADDRESS__ for both means that code like the kernel
> > > >> which wants to treat the two sanitizers as alternate implementations
> > > >> of the same thing gets that automatically.
> > >>
> > > >> My preference is to use __SANITIZE_HWADDRESS__ since that means any
> > > >> existing code will not be predicated on this (and hence I guess less
> > > >> surprises), but would appreciate feedback on this given the point
> > > >> above.
> > >
> > > +Evgenii Stepanov
> > >
> > > (A repost from my answer from the mentioned thread):
> > >
> > >> Similarly, I'm thinking I'll add no_sanitize_hwaddress as the hwasan
> > >> equivalent of no_sanitize_address, which will require an update in the
> > >> kernel given it seems you want KASAN to be used the same whether using
> > >> tags or not.
> > >
> > > We have intentionally reused the same macros to simplify things. Is
> > > there any reason to use separate macros for GCC? Are there places
> > > where we need to use specifically no_sanitize_hwaddress and
> > > __SANITIZE_HWADDRESS__, but not no_sanitize_address and
> > > __SANITIZE_ADDRESS__?
> > >
> > >
> >
> > I've just looked through some open source repositories (via github
> > search) that used the existing __SANITIZE_ADDRESS__ macro.
> >
> > There are a few repos that would want to use a feature macro for hwasan
> > or asan in the exact same way as each other, but of the 31 truly
> > different uses I found, 11 look like they would need to distinguish
> > between hwasan and asan (where 4 uses I found I couldn't easily tell)
> >
> > NOTE
> > - This is a count of unique uses, ignoring those repos which use a file
> > from another repo.
> > - I'm just giving links to the first of the relevant kind that I found,
> > not putting effort into finding the "canonical" source of each
> > repository.
> >
> >
> > Places that need distinction (and their reasons):
> >
> > There are quite a few that use the ASAN_POISON_MEMORY_REGION and
> > ASAN_UNPOISON_MEMORY_REGION macros to poison/unpoison memory themselves.
> >   This abstraction doesn't quite make sense in a hwasan environment, as
> > there is not really a "poisoned/unpoisoned" concept.
> >
> > https://github.com/laurynas-biveinis/unodb
> > https://github.com/darktable-org/rawspeed
> > https://github.com/MariaDB/server
> > https://github.com/ralfbrown/framepac-ng
> > https://github.com/peters/aom
> > https://github.com/pspacek/knot-resolver-docker-fix
> > https://github.com/harikrishnan94/sheap
> >
> >
> > Some use it to record their compilation "type" as `-fsanitize=address`
> > https://github.com/wallix/redemption
> >
> > Or to decide to set the environment variable ASAN_OPTIONS
> > https://github.com/dephonatine/VBox5.2.18
> >
> > Others worry about stack space due to asan's redzones (hwasan has a much
> > smaller stack memory overhead).
> > https://github.com/fastbuild/fastbuild
> > https://github.com/scylladb/seastar
> > (n.b. seastar has a lot more conditioned code that would be the same
> > between asan and hwasan).
> >
> >
> > Each of these needs to know the difference between compiling with asan
> > and hwasan, so I'm confident that having some way to determine that in
> > the source code is a good idea.
> >
> >
> > I also believe there could be code in the wild that would need to
> > distinguish between hwasan and asan where the existence of tags could be
> > problematic:
> >
> > - code already using the top-byte-ignore feature may be able to be used
> > with asan but not hwasan.
> > - Code that makes assumptions about pointer

Re: [PATCH 2/2] Introduce the gcc option --record-gcc-command-line

2019-11-07 Thread Egeyar Bagcioglu


Hello again Segher!

On 11/7/19 9:03 AM, Segher Boessenkool wrote:

Hi!

On Wed, Nov 06, 2019 at 06:21:34PM +0100, Egeyar Bagcioglu wrote:

+static const char *
+record_gcc_command_line_spec_function(int argc ATTRIBUTE_UNUSED, const char **argv)
+{
+  const char *filename = argv[0];
+  FILE *out = fopen (filename, "w");
+  if (out)
+{
+  fputs (_gcc_argv[0], out);
+  for (int i = 1; i < _gcc_argc; i += 1)
+   {
+ fputc (' ', out);
+ fputs (_gcc_argv[i], out);
+   }
+  fclose (out);
+}
+  return filename;
+}

Pet peeve: just use fprintf?


okay.


If there is an error, should this return 0 or indicate error some way?
Making the error silent ("if (out)") seems weird, otherwise -- if it is
on purpose, a comment might help.


It was on purpose so that this does not interfere with the builds. 
However, after re-watching Nick's talk at Cauldron today, where he mentions 
that it is good to fail even in such cases, I am reconsidering whether we 
should error out here. If anyone has a preference on this, I'd like to hear it.
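
To make the two options concrete, here is a rough sketch of a variant that
uses fprintf and reports failure to its caller instead of staying silent. It
is not the code from the patch: write_command_line and its return convention
are hypothetical, and whether the caller should turn -1 into a hard error is
exactly the open question.

```c
#include <stdio.h>

/* Hypothetical helper: write ARGV[0..ARGC-1] to FILENAME, separated by
   single spaces.  Returns 0 on success and -1 on any I/O error.  */
int write_command_line(const char *filename, int argc, const char **argv)
{
    FILE *out = fopen(filename, "w");
    if (!out)
        return -1;

    int ok = 1;
    for (int i = 0; i < argc && ok; i++)
        ok = fprintf(out, "%s%s", i ? " " : "", argv[i]) >= 0;

    /* fclose can fail too, e.g. when buffered data cannot be flushed.  */
    if (fclose(out) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

A caller that wants the current "do not interfere with the build" behaviour
can simply ignore the return value; one that prefers to fail can check it.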



+int
+elf_record_gcc_command_line ()
+{
+  char cmdline[256];
+  section * sec;

section *sec;


right.


+  sec = get_section (targetm.asm_out.record_gcc_switches_section,
+SECTION_DEBUG | SECTION_MERGE
+| SECTION_STRINGS | (SECTION_ENTSIZE & 1),
+NULL);
+  switch_to_section (sec);
+
+  ASM_OUTPUT_ASCII(asm_out_file, version_string, strlen(version_string));
+
+  FILE *out_stream = fopen (gcc_command_line_file, "r");
+  if (out_stream)
+{
+  ASM_OUTPUT_ASCII(asm_out_file, " : ", 3);
+  ssize_t cmdline_length;
+  while ((cmdline_length = fread(cmdline, 1, 256, out_stream)))

fread (
(and many more like it).


okay, I will fix them. Thanks for catching.


+   ASM_OUTPUT_ASCII(asm_out_file, cmdline, cmdline_length);
+}
+  cmdline[0] = 0;
+  ASM_OUTPUT_ASCII(asm_out_file, cmdline, 1);
+
+  /* The return value is currently ignored by the caller, but must be 0.  */
+  return 0;
+}

A temporary file like this isn't so great.


GCC operates with temporary files, doesn't it? What is the concern that 
is specific to this one? It is the most reasonable way I found to pass 
the argv of gcc to child processes for saving. The other ways of passing 
it that I could think of, and the idea of saving it in the driver, were 
actually very bad ideas.



Opening a file as "r" but then
accessing it with "fread" is peculiar, too.


I am not sure what you mean here. Is it that you prefer "wb" and "rb" 
instead of "w" and "r"? I thought it was enough to use a consistent pair.
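
For reference, a small sketch of chunked reading with fread on a stream
opened in binary mode. This is only my reading of what a tidier loop could
look like, not something taken from the review; count_bytes is a hypothetical
helper. Note that fread returns size_t rather than ssize_t, and that "b" makes
no difference on POSIX hosts.

```c
#include <stdio.h>

/* Hypothetical helper: count the bytes in FILENAME by reading it in
   fixed-size chunks.  Returns -1 if the file cannot be opened or read.  */
long count_bytes(const char *filename)
{
    FILE *in = fopen(filename, "rb");   /* "b" is a no-op on POSIX */
    if (!in)
        return -1;

    char chunk[256];
    long total = 0;
    size_t n;                           /* fread returns size_t */
    while ((n = fread(chunk, 1, sizeof chunk, in)) > 0)
        total += (long) n;

    /* A zero return from fread means EOF or error; tell them apart.  */
    int failed = ferror(in);
    fclose(in);
    return failed ? -1 : total;
}
```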



HTH,


It does! Thanks a lot for the review!

Regards
Egeyar

[PATCH] [LRA] Do not use eliminable registers for spilling

2019-11-07 Thread Kwok Cheung Yeung


Hello

On AMD GCN, I encountered the situation described below in the following 
testcases, using the compilation flags '-O2 -ftracer -fsplit-paths':


libgomp.oacc-fortran/reduction-1.f90
libgomp.oacc-fortran/reduction-2.f90
libgomp.oacc-fortran/reduction-3.f90
gcc.c-torture/execute/ieee/pr50310.c

- LRA decides to spill a register to s14 (which is used for the hard 
frame pointer, but is not in use due to the -fomit-frame-pointer implied 
by -O2). The reload dump has:


  Spill r612 into hr14
...
(insn 597 711 712 2 (set (reg:BI 129 scc [612])
(ne:BI (reg:SI 2 s2 [684])
(const_int 0 [0]))) "reduction-1.f90":22:0 23 {cstoresi4}
 (nil))
...
(insn 710 713 598 2 (set (reg:BI 14 s14)
(reg:BI 160 v0 [685])) "reduction-1.f90":22:0 3 {*movbi}
 (nil))

- Later on, LRA decides to allocate s14 to a pseudo:

	 Assigning to 758 (cl=ALL_REGS, orig=758, freq=388, tfirst=758, tfreq=388)...

   Assign 14 to subreg reload r758 (freq=388)
...
(insn 801 786 787 34 (set (reg:BI 14 s14 [758])
(reg:BI 163 v3 [758])) 3 {*movbi}
 (nil))

- But then the next BB reloads the value previously spilled into s14, 
which has been clobbered by the previous instruction:


(insn 733 144 732 9 (set (reg:BI 163 v3 [706])
(reg:BI 14 s14)) 3 {*movbi}
 (nil))

A similar issue has been dealt with in the past in PR83327, which was 
fixed in r258093. However, it does not work here - s14 is not marked as 
conflicting with pseudo 758.


This is because s14 is in eliminable_regset - if 
HARD_FRAME_POINTER_IS_FRAME_POINTER is false, 
ira_setup_eliminable_regset puts HARD_FRAME_POINTER_REGNUM into 
eliminable_regset even if the frame pointer is not needed (Why is this? 
It seems to have been that way since IRA was introduced). At the 
beginning of process_bb_lives (in lra-lives.c), eliminable_regset is 
~ANDed out of hard_regs_live, so even if s14 is in the live-outs of the 
BB, it will be removed from consideration when registering conflicts 
with pseudos in the BB.


(As an aside, the liveness of eliminable spill registers would 
previously have been updated by make_hard_regno_live and 
make_hard_regno_dead, but as of r276440 '[LRA] Don't make eliminable 
registers live (PR91957)' this is no longer the case.)


Given that conflicts with registers in eliminable_regset are not tracked, 
I believe the easiest fix is simply to prevent eliminable registers from 
being used as spill registers.
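
As a toy model of the idea (not the actual lra-spills.c code), the selection
loop can be pictured like this. Hard registers are bits in a 32-bit mask, and
pick_spill_reg and its parameters are hypothetical stand-ins for the real
data structures.

```c
#include <stdint.h>

/* Toy model: return the first register in SPILL_CLASS_REGS that is neither
   eliminable nor in CONFLICT_REGS, or -1 if there is none.  Register
   numbers must be below 32 in this sketch.  */
int pick_spill_reg(const int *spill_class_regs, int n,
                   uint32_t conflict_regs, uint32_t eliminable_regs)
{
    for (int k = 0; k < n; k++) {
        int regno = spill_class_regs[k];
        uint32_t bit = UINT32_C(1) << regno;

        /* Conflicts with eliminable registers are not tracked, so an
           apparently free one cannot be trusted.  */
        if (eliminable_regs & bit)
            continue;
        if (!(conflict_regs & bit))
            return regno;
    }
    return -1;
}
```

Without the eliminable check, register 14 in the example above looks free and
is picked, which mirrors the s14 clobber seen in the dumps.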


Built and tested on AMD GCN with no regressions.

I've bootstrapped it on x86_64, but there is no point testing on it ATM 
as TARGET_SPILL_CLASS was disabled in r237792.


Okay for trunk?

Kwok Yeung


[LRA] Do not use eliminable registers for spilling

2019-11-07  Kwok Cheung Yeung  

gcc/
* lra-spills.c (assign_spill_hard_regs): Do not spill into
registers in eliminable_regset.

diff --git a/gcc/lra-spills.c b/gcc/lra-spills.c
index 0068e52..54f76cc 100644
--- a/gcc/lra-spills.c
+++ b/gcc/lra-spills.c
@@ -283,6 +283,8 @@ assign_spill_hard_regs (int *pseudo_regnos, int n)
   for (k = 0; k < spill_class_size; k++)
{
  hard_regno = ira_class_hard_regs[spill_class][k];
+ if (TEST_HARD_REG_BIT (eliminable_regset, hard_regno))
+   continue;
  if (! overlaps_hard_reg_set_p (conflict_hard_regs, mode, hard_regno))
break;
}

Re: [PATCH 1/2] Introduce dg-require-target-object-format

2019-11-07 Thread Egeyar Bagcioglu

On 11/7/19 6:17 PM, jose.march...@oracle.com wrote:

 On 11/7/19 8:47 AM, Segher Boessenkool wrote:

 > Hi!
 >
 > On Wed, Nov 06, 2019 at 06:21:33PM +0100, Egeyar Bagcioglu wrote:
 >> gcc/testsuite/ChangeLog:
 >> 2019-11-06  Egeyar Bagcioglu  
 >>
 >>  * lib/target-supports-dg.exp: Define dg-require-target-object-format.
 > * lib/target-supports-dg.exp (dg-require-target-object-format): New.

 Right, thanks for the correction!

 >> +proc dg-require-target-object-format { args } {

 >> +if { [gcc_target_object_format] == [lindex $args 1] } {
 >> + return
 >> +}
 > "==" for strings is dangerous.  Use "eq" or "string equal"?

 I see many "string match"es. I will make the change.

 Just as a note, though: Why is it dangerous? In C, for example, this
 would be a pointer comparison and consistently fail. In many other
 languages, it is indeed a string comparison and works well.

 I am asking also because I see "==" for variable vs literal strings in
 gcc/testsuite/lib. As opposed to C-like languages that consistently
 fail them, these seem to work. If you still think this is dangerous,
 I'd love to know why. Also, if so, someone might want to check the
 library.

Because if the string happens to look like a number Tcl will perform
arithmetic comparison instead of lexicographic comparison, i.e. "01" ==
"1" evaluates to true :)

Oh lovely indeed! I am glad I asked.
Thanks Jose!
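
For the archives, a rough C model of the difference. It only handles decimal
integers, whereas Tcl's expr recognises more numeric forms, so treat it as an
approximation; tcl_like_eq and string_equal are made-up names standing in for
Tcl's "==" and "eq".

```c
#include <stdlib.h>
#include <string.h>

/* Return 1 and store the value in *OUT if S is entirely a decimal
   integer, otherwise return 0.  */
static int parse_int(const char *s, long *out)
{
    char *end;
    if (*s == '\0')
        return 0;
    *out = strtol(s, &end, 10);
    return *end == '\0';
}

/* Models "==": numeric comparison when both operands look like numbers,
   string comparison otherwise.  */
int tcl_like_eq(const char *a, const char *b)
{
    long x, y;
    if (parse_int(a, &x) && parse_int(b, &y))
        return x == y;
    return strcmp(a, b) == 0;
}

/* Models "eq": always a string comparison.  */
int string_equal(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

For object format names like "elf" both behave the same, which is why the
existing "==" uses in the library appear to work; the two only diverge once
an operand happens to look numeric.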

Re: [PATCH 1/2] Introduce dg-require-target-object-format

2019-11-07 Thread Jose E. Marchesi

On 11/7/19 8:47 AM, Segher Boessenkool wrote:
> Hi!
>
> On Wed, Nov 06, 2019 at 06:21:33PM +0100, Egeyar Bagcioglu wrote:
>> gcc/testsuite/ChangeLog:
>> 2019-11-06  Egeyar Bagcioglu  
>>
>>  * lib/target-supports-dg.exp: Define dg-require-target-object-format.
> * lib/target-supports-dg.exp (dg-require-target-object-format): New.

Right, thanks for the correction!

>> +proc dg-require-target-object-format { args } {
>> +if { [gcc_target_object_format] == [lindex $args 1] } {
>> +return
>> +}
> "==" for strings is dangerous.  Use "eq" or "string equal"?

I see many "string match"es. I will make the change.

Just as a note, though: Why is it dangerous? In C, for example, this
would be a pointer comparison and consistently fail. In many other
languages, it is indeed a string comparison and works well.

I am asking also because I see "==" for variable vs literal strings in
gcc/testsuite/lib. As opposed to C-like languages that consistently
fail them, these seem to work. If you still think this is dangerous,
I'd love to know why. Also, if so, someone might want to check the
library.

Because if the string happens to look like a number Tcl will perform
arithmetic comparison instead of lexicographic comparison, i.e. "01" ==
"1" evaluates to true :)

Re: [4/6] Optionally pick the cheapest loop_vec_info

2019-11-07 Thread Richard Sandiford

Richard Biener  writes:
> On Wed, Nov 6, 2019 at 3:01 PM Richard Sandiford
>  wrote:
>>
>> Richard Biener  writes:
>> > On Tue, Nov 5, 2019 at 3:29 PM Richard Sandiford
>> >  wrote:
>> >>
>> >> This patch adds a mode in which the vectoriser tries each available
>> >> base vector mode and picks the one with the lowest cost.  For now
>> >> the behaviour is behind a default-off --param, but a later patch
>> >> enables it by default for SVE.
>> >>
>> >> The patch keeps the current behaviour of preferring a VF of
>> >> loop->simdlen over any larger or smaller VF, regardless of costs
>> >> or target preferences.
>> >
>> > Can you avoid using a --param for this?  Instead I'd suggest to
>> > amend the vectorize_modes target hook to return some
>> > flags like VECT_FIRST_MODE_WINS.  We'd eventually want
>> > to make the target able to say do-not-vectorize-epilogues-of-MODE
>> > (I think we may not want to vectorize SSE vectorized loop
>> > epilogues with MMX-with-SSE or GPRs for example).  I guess
>> > for the latter we'd use a new target hook.
>>
>> The reason for using a --param was that I wanted a way of turning
>> this on and off on the command line, so that users can experiment
>> with it if necessary.  E.g. enabling the --param could be a viable
>> alternative to -mprefix-* in some cases.  Disabling it would be
>> a way of working around a bad cost model decision without going
>> all the way to -fno-vect-cost-model.
>>
>> These kinds of --params can become useful workarounds until an
>> optimisation bug is fixed.
>
> I'm arguing that the default depends on the actual ISAs so there isn't
> a one-fits all and given we have OMP SIMD and target cloning for
> multiple ISAs this looks like a wrong approach.  For sure the
> target can use its own switches to override defaults here, or alternatively
> we might want to have a #pragma GCC simdlen mimicking OMP behavior
> here.

I agree there's no one-size-fits-all choice here, but that's true for
other --params too.  The problem with using target switches is that we
have to explain them and to keep accepting them "forever" (or at least
with a long deprecation period).  Whereas the --param was just something
that people could play with or perhaps use to work around problems
temporarily.  It would come with no guarantees attached.  And what the
--param did applied to any targets that support multiple modes,
regardless of what the targets do by default.

All that said, here's a version that returns the bitmask you suggested.
I ended up making the flag select the new behaviour and 0 select the
current behaviour, rather than have a flag for "first mode wins".
Tested as before.
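
In case it helps review, the intended effect of the flag can be pictured with
this toy model. It is not the vectorizer code: struct candidate and
choose_mode are hypothetical, the cost is a single made-up number, and the
simdlen preference is ignored. Only the name VECT_COMPARE_COSTS comes from
the patch.

```c
#include <stddef.h>

#define VECT_COMPARE_COSTS (1u << 0)   /* name taken from the patch */

/* Toy stand-in for one candidate vector mode; not a GCC type.  */
struct candidate {
    int vectorizable;   /* nonzero if the loop can use this mode */
    unsigned cost;      /* made-up cost estimate, lower is better */
};

/* Return the index of the chosen candidate, or -1 if none works.  */
int choose_mode(const struct candidate *c, size_t n, unsigned flags)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!c[i].vectorizable)
            continue;
        if (!(flags & VECT_COMPARE_COSTS))
            return (int) i;             /* first mode that works wins */
        if (best < 0 || c[i].cost < c[best].cost)
            best = (int) i;             /* keep the cheapest so far */
    }
    return best;
}
```

Returning 0 from the hook therefore keeps today's behaviour, and a target
opts in to the cost comparison by setting the flag.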

Thanks,
Richard


2019-11-07  Richard Sandiford  

gcc/
* target.h (VECT_COMPARE_COSTS): New constant.
* target.def (autovectorize_vector_modes): Return a bitmask of flags.
* doc/tm.texi: Regenerate.
* targhooks.h (default_autovectorize_vector_modes): Update accordingly.
* targhooks.c (default_autovectorize_vector_modes): Likewise.
* config/aarch64/aarch64.c (aarch64_autovectorize_vector_modes):
Likewise.
* config/arc/arc.c (arc_autovectorize_vector_modes): Likewise.
* config/arm/arm.c (arm_autovectorize_vector_modes): Likewise.
* config/i386/i386.c (ix86_autovectorize_vector_modes): Likewise.
* config/mips/mips.c (mips_autovectorize_vector_modes): Likewise.
* tree-vectorizer.h (_loop_vec_info::vec_outside_cost)
(_loop_vec_info::vec_inside_cost): New member variables.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize them.
(vect_better_loop_vinfo_p, vect_joust_loop_vinfos): New functions.
(vect_analyze_loop): When autovectorize_vector_modes returns
VECT_COMPARE_COSTS, try vectorizing the loop with each available
vector mode and picking the one with the lowest cost.
(vect_estimate_min_profitable_iters): Record the computed costs
in the loop_vec_info.

Index: gcc/target.h
===
--- gcc/target.h  2019-11-07 15:11:15.831017985 +
+++ gcc/target.h  2019-11-07 16:52:30.037198353 +
@@ -218,6 +218,14 @@ enum omp_device_kind_arch_isa {
   omp_device_isa
 };
 
+/* Flags returned by TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES:
+
+   VECT_COMPARE_COSTS
+   Tells the loop vectorizer to try all the provided modes and
+   pick the one with the lowest cost.  By default the vectorizer
+   will choose the first mode that works.  */
+const unsigned int VECT_COMPARE_COSTS = 1U << 0;
+
 /* The target structure.  This holds all the backend hooks.  */
 #define DEFHOOKPOD(NAME, DOC, TYPE, INIT) TYPE NAME;
 #define DEFHOOK(NAME, DOC, TYPE, PARAMS, INIT) TYPE (* NAME) PARAMS;
Index: gcc/target.def
===
--- gcc/target.def  2019-11-07 15:11:15.819018071 +
+++ gcc/target.def  2019-11-07

[C++ PATCH] Implement D1959R0, remove weak_equality and strong_equality.

2019-11-07 Thread Jason Merrill

Shortly after I finished implementing the previous <=> semantics, the
committee decided to remove the *_equality comparison categories, because
they were largely obsoleted by the earlier change that separated operator==
from its original dependency on operator<=>.

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/
* method.c (enum comp_cat_tag, comp_cat_info): Remove *_equality.
(genericize_spaceship, common_comparison_type): Likewise.
* typeck.c (cp_build_binary_op): Move SPACESHIP_EXPR to be with the
relational operators, exclude other types no longer supported.
libstdc++-v3/
* libsupc++/compare: Remove strong_equality and weak_equality.
---
 gcc/cp/method.c   |  47 +-
 gcc/cp/typeck.c   |  13 +-
 .../g++.dg/cpp2a/spaceship-scalar1-neg.C  |  25 
 .../g++.dg/cpp2a/spaceship-scalar1.C  |  27 
 .../g++.dg/cpp2a/spaceship-scalar1a.C |  12 --
 .../g++.dg/cpp2a/spaceship-scalar3.C  |  18 +--
 libstdc++-v3/libsupc++/compare| 136 +-
 7 files changed, 44 insertions(+), 234 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/spaceship-scalar1-neg.C

diff --git a/gcc/cp/method.c b/gcc/cp/method.c
index c9dd90fcba7..47441c10c52 100644
--- a/gcc/cp/method.c
+++ b/gcc/cp/method.c
@@ -882,8 +882,6 @@ do_build_copy_assign (tree fndecl)
 
 enum comp_cat_tag
 {
-  cc_weak_equality,
-  cc_strong_equality,
   cc_partial_ordering,
   cc_weak_ordering,
   cc_strong_ordering,
@@ -901,8 +899,6 @@ struct comp_cat_info_t
 };
 static const comp_cat_info_t comp_cat_info[cc_last]
 = {
-   { "weak_equality", "equivalent", "nonequivalent" },
-   { "strong_equality", "equal", "nonequal" },
{ "partial_ordering", "equivalent", "greater", "less", "unordered" },
{ "weak_ordering", "equivalent", "greater", "less" },
{ "strong_ordering", "equal", "greater", "less" }
@@ -1028,21 +1024,8 @@ spaceship_comp_cat (tree optype)
 return cc_strong_ordering;
   else if (TREE_CODE (optype) == REAL_TYPE)
 return cc_partial_ordering;
-  else if (TYPE_PTRFN_P (optype) || TYPE_PTRMEM_P (optype)
-  || NULLPTR_TYPE_P (optype))
-return cc_strong_equality;
-  else if (TREE_CODE (optype) == COMPLEX_TYPE)
-{
-  tree intype = optype;
-  while (TREE_CODE (intype) == COMPLEX_TYPE)
-   intype = TREE_TYPE (intype);
-  if (TREE_CODE (intype) == REAL_TYPE)
-   return cc_weak_equality;
-  else
-   return cc_strong_equality;
-}
 
-  /* FIXME should vector <=> produce a vector of one of the above?  */
+  /* ??? should vector <=> produce a vector of one of the above?  */
   gcc_unreachable ();
 }
 
@@ -1065,35 +1048,29 @@ genericize_spaceship (tree type, tree op0, tree op1)
   comp_cat_tag tag = cat_tag_for (type);
   gcc_checking_assert (tag < cc_last);
 
-  tree eq = lookup_comparison_result (tag, type, 0);
-  tree negt = lookup_comparison_result (tag, type, 1);
-
-  if (tag == cc_strong_equality || tag == cc_weak_equality)
-{
-  tree comp = fold_build2 (EQ_EXPR, boolean_type_node, op0, op1);
-  return fold_build3 (COND_EXPR, type, comp, eq, negt);
-}
-
   tree r;
   op0 = save_expr (op0);
   op1 = save_expr (op1);
 
+  tree gt = lookup_comparison_result (tag, type, 1);
+
   if (tag == cc_partial_ordering)
 {
   /* op0 == op1 ? equivalent : op0 < op1 ? less :
 op0 > op1 ? greater : unordered */
   tree uo = lookup_comparison_result (tag, type, 3);
   tree comp = fold_build2 (GT_EXPR, boolean_type_node, op0, op1);
-  r = fold_build3 (COND_EXPR, type, comp, negt, uo);
+  r = fold_build3 (COND_EXPR, type, comp, gt, uo);
 }
   else
 /* op0 == op1 ? equal : op0 < op1 ? less : greater */
-r = negt;
+r = gt;
 
   tree lt = lookup_comparison_result (tag, type, 2);
   tree comp = fold_build2 (LT_EXPR, boolean_type_node, op0, op1);
   r = fold_build3 (COND_EXPR, type, comp, lt, r);
 
+  tree eq = lookup_comparison_result (tag, type, 0);
   comp = fold_build2 (EQ_EXPR, boolean_type_node, op0, op1);
   r = fold_build3 (COND_EXPR, type, comp, eq, r);
 
@@ -1178,18 +1155,6 @@ common_comparison_type (vec )
return void_type_node;
 }
 
-  /* Otherwise, if at least one T i is std::weak_equality, or at least one T i
- is std::strong_equality and at least one T j is std::partial_ordering or
- std::weak_ordering, U is std::weak_equality.  */
-  if (tree t = seen[cc_weak_equality]) return t;
-  if (seen[cc_strong_equality]
-  && (seen[cc_partial_ordering] || seen[cc_weak_ordering]))
-return lookup_comparison_category (cc_weak_equality);
-
-  /* Otherwise, if at least one T i is std::strong_equality, U is
- std::strong_equality.  */
-  if (tree t = seen[cc_strong_equality]) return t;
-
   /* Otherwise, if at least one T i is std::partial_ordering, U is
  std::partial_ordering.  */
   if (tree t = seen[cc_partial_ordering]) return t;
diff --git

Re: [PATCH 2/2][MIPS][RFC] Emit .note.GNU-stack for hard-float linux targets.

2019-11-07 Thread Dragan Mladjenovic

On 01.11.2019. 11:32, Dragan Mladjenovic wrote:
> On 10.08.2019. 00:15, Joseph Myers wrote:
>> On Fri, 9 Aug 2019, Jeff Law wrote:
>>
 2019-08-05  Dragan Mladjenovic  

 * config.in: Regenerated.
 * config/mips/linux.h (NEED_INDICATE_EXEC_STACK): Define to 1
 for TARGET_LIBC_GNUSTACK.
 * configure: Regenerated.
 * configure.ac: Define TARGET_LIBC_GNUSTACK if glibc version is
 found 2.31 or greater.
>>> My only concern here is the configure bits.  So for example, will it do
>>> the right thing if you're cross-compiling to a MIPS linux target?  If
>>> so, how?  If not, do we need to make it a first class configure option
>>> so that it can be specified when building cross MIPS linux toolchains?
>>
>> The key point of using GCC_GLIBC_VERSION_GTE_IFELSE is that (a) it checks
>> the target glibc headers if available when GCC is built and (b) if not
>> available, you can still use --with-glibc-version when configuring
>> GCC, to
>> get the right configuration in a bootstrap compiler built before glibc is
>> built (the latter is necessary on some architectures to get the right
>> stack-protector configuration for bootstrapping glibc, but may be useful
>> in other cases as well).
>>
>> My main concern about this patch is the one I gave in
>>  about what
>> the configuration mechanism should be, on a whole-toolchain level, to say
>> whether you are OK with a requirement for a 4.8 or later kernel.
>>
>
> Sorry for the late reply.
>
> I was waiting to backport [1] to most of the glibc release branches in
> use, but I got sidetracked along the way.
>
> After this patch lands the preferred way to configure gcc would be using
> --with-glibc-version=2.31 and to use said glibc.
> If the user/distribution can live with minimal kernel requirement of 4.8
> the glibc used should be configured with --enable-kernel=4.8.
> I also plan to backport the [1] to limit the opportunity for building
> the possibly broken glibc with the gcc w/ enabled .note.GNU-stack.
>
> This is all tedious and the user has to be aware of all of it to make it
> work, but hopefully over time the distributions will default to
> --with-glibc-version=2.31 and --enable-kernel=4.8. I guess providing the
> detailed NEWS entry for this change would help a bit.
>
> Are there any objections to getting this on the trunk before the end of
> stage1?
>
> [1] https://sourceware.org/ml/libc-alpha/2019-08/msg00639.html
>

Small update and gentle ping. The glibc change was backported all the 
way back to 2.24.

Best regards,
Dragan

Re: [2/6] Don't assign a cost to vectorizable_assignment

2019-11-07 Thread Richard Sandiford

Richard Biener  writes:
> On Wed, Nov 6, 2019 at 4:58 PM Richard Sandiford
>  wrote:
>>
>> Richard Biener  writes:
>> > On Tue, Nov 5, 2019 at 3:27 PM Richard Sandiford
>> >  wrote:
>> >>
>> >> vectorizable_assignment handles true SSA-to-SSA copies (which hopefully
>> >> we don't see in practice) and no-op conversions that are required
>> >> to maintain correct gimple, such as changes between signed and
>> >> unsigned types.  These cases shouldn't generate any code and so
>> >> shouldn't count against either the scalar or vector costs.
>> >>
>> >> Later patches test this, but it seemed worth splitting out.
>> >
>> > Hmm, but you have to adjust vect_compute_single_scalar_iteration_cost and
>> > possibly the SLP cost walk as well, otherwise we're artificially making
>> > those copies cheaper when vectorized.
>>
>> Ah, yeah.  It looks complicated to reproduce the conditions exactly
>> there, so how about just costing 1 copy in vectorizable_assignment
>> to counteract it, and ignore ncopies?
>
> I guess costing a single scalar_stmt ought to make it exactly offset
> the scalar cost?

To summarise what we said on IRC: the problem with that is that we
need to count VF scalar stmts, where VF might be a runtime value.
The follow-on loop costing code copes with variable VF without
relying on vect_vf_for_cost.

Calling vectorizable_assignment from the scalar costing code
seemed like too much of a hack.  And it turns out that we can't
delay the scalar costing until after vect_analyze_stmts because
vect_enhance_data_refs_alignment needs it before then.  Reworking
this whole thing is too much work for GCC 10 at this stage.

So this patch goes with your suggestion of using a test based on
tree_nop_conversion.  To make sure that the scalar and vector costs
stay somewhat consistent, vectorizable_assignment continues to cost
stmts for which the new predicate is false.
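
The consistency argument can be sketched with a toy model in which the
scalar and the vector costing both consult the same predicate. Nothing here
is GCC code: struct int_type, is_nop_conversion and count_costed are
hypothetical, and "same bit width" is a deliberate simplification of what
tree_nop_conversion_p checks.

```c
#include <stddef.h>

/* Toy integer type: just a bit width and a signedness flag.  */
struct int_type { unsigned bits; int is_unsigned; };

/* Simplified predicate: a conversion that keeps the width (for example
   signed to unsigned) is assumed to generate no code.  */
int is_nop_conversion(struct int_type from, struct int_type to)
{
    return from.bits == to.bits;
}

struct conv_stmt { struct int_type from, to; };

/* Charge one unit per conversion that does real work.  Using the same
   predicate on the scalar and the vector side keeps both totals
   comparable.  */
unsigned count_costed(const struct conv_stmt *s, size_t n)
{
    unsigned cost = 0;
    for (size_t i = 0; i < n; i++)
        if (!is_nop_conversion(s[i].from, s[i].to))
            cost++;
    return cost;
}
```

If only one side skipped the no-op statements, the comparison would be
biased towards that side, which is the concern raised earlier in the thread.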

Tested as before.

Thanks,
Richard


2019-11-07  Richard Sandiford  

gcc/
* tree-vectorizer.h (vect_nop_conversion_p): Declare.
* tree-vect-stmts.c (vect_nop_conversion_p): New function.
(vectorizable_assignment): Don't add a cost for nop conversions.
* tree-vect-loop.c (vect_compute_single_scalar_iteration_cost):
Likewise.
* tree-vect-slp.c (vect_bb_slp_scalar_cost): Likewise.

Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   2019-11-07 15:11:22.290972236 +
+++ gcc/tree-vectorizer.h   2019-11-07 16:32:14.817523866 +
@@ -1654,6 +1654,7 @@ extern tree vect_get_vec_def_for_stmt_co
 extern bool vect_transform_stmt (stmt_vec_info, gimple_stmt_iterator *,
 slp_tree, slp_instance);
 extern void vect_remove_stores (stmt_vec_info);
+extern bool vect_nop_conversion_p (stmt_vec_info);
 extern opt_result vect_analyze_stmt (stmt_vec_info, bool *, slp_tree,
 slp_instance, stmt_vector_for_cost *);
 extern void vect_get_load_cost (stmt_vec_info, int, bool,
Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2019-11-07 15:11:50.134775028 +
+++ gcc/tree-vect-stmts.c   2019-11-07 16:32:14.817523866 +
@@ -5284,6 +5284,29 @@ vectorizable_conversion (stmt_vec_info s
   return true;
 }
 
+/* Return true if we can assume from the scalar form of STMT_INFO that
+   neither the scalar nor the vector forms will generate code.  STMT_INFO
+   is known not to involve a data reference.  */
+
+bool
+vect_nop_conversion_p (stmt_vec_info stmt_info)
+{
+  gassign *stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!stmt)
+return false;
+
+  tree lhs = gimple_assign_lhs (stmt);
+  tree_code code = gimple_assign_rhs_code (stmt);
+  tree rhs = gimple_assign_rhs1 (stmt);
+
+  if (code == SSA_NAME || code == VIEW_CONVERT_EXPR)
+return true;
+
+  if (CONVERT_EXPR_CODE_P (code))
+return tree_nop_conversion_p (TREE_TYPE (lhs), TREE_TYPE (rhs));
+
+  return false;
+}
 
 /* Function vectorizable_assignment.
 
@@ -5399,7 +5422,9 @@ vectorizable_assignment (stmt_vec_info s
 {
   STMT_VINFO_TYPE (stmt_info) = assignment_vec_info_type;
   DUMP_VECT_SCOPE ("vectorizable_assignment");
-  vect_model_simple_cost (stmt_info, ncopies, dt, ndts, slp_node, cost_vec);
+  if (!vect_nop_conversion_p (stmt_info))
+   vect_model_simple_cost (stmt_info, ncopies, dt, ndts, slp_node,
+   cost_vec);
   return true;
 }
 
Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c  2019-11-07 15:11:22.290972236 +
+++ gcc/tree-vect-loop.c  2019-11-07 16:32:14.813523891 +
@@ -1126,7 +1126,9 @@ vect_compute_single_scalar_iteration_cos
  else
kind = scalar_store;
 }
-  else
+ else if (vect_nop_conversion_p (stmt_info))
+

Re: [PATCH 1/2] Introduce dg-require-target-object-format

2019-11-07 Thread Egeyar Bagcioglu


Hi Segher!


On 11/7/19 8:47 AM, Segher Boessenkool wrote:

Hi!

On Wed, Nov 06, 2019 at 06:21:33PM +0100, Egeyar Bagcioglu wrote:

gcc/testsuite/ChangeLog:
2019-11-06  Egeyar Bagcioglu  

 * lib/target-supports-dg.exp: Define dg-require-target-object-format.

* lib/target-supports-dg.exp (dg-require-target-object-format): New.


Right, thanks for the correction!


+proc dg-require-target-object-format { args } {
+if { [gcc_target_object_format] == [lindex $args 1] } {
+   return
+}

"==" for strings is dangerous.  Use "eq" or "string equal"?


I see many "string match"es. I will make the change.

Just as a note, though: Why is it dangerous? In C, for example, this 
would be a pointer comparison and consistently fail. In many other 
languages, it is indeed a string comparison and works well.


I am asking also because I see "==" for variable vs literal strings in 
gcc/testsuite/lib. As opposed to C-like languages that consistently fail 
them, these seem to work. If you still think this is dangerous, I'd love 
to know why. Also, if so, someone might want to check the library.


Thanks for the review!

Egeyar

Re: [PATCH 0/2] Introduce a new GCC option, --record-gcc-command-line

2019-11-07 Thread Egeyar Bagcioglu





On 11/7/19 4:13 PM, Nick Clifton wrote:
> Hi Egeyar,
>
> Thanks for including me in this discussion.
>
> >>> This option is similar to -frecord-gcc-switches.
>
> For the record I will also note that there is -fverbose-asm which
> does almost the same thing, but only records the options as comments
> in the assembler.  They are never converted into data in the actual
> object files.

Right.

> It is also worth noting that if your goal is to record how a binary
> was produced, possibly with an eye to reproducibility, then you may
> also need to record some environment variables too.

That is an important point and in fact, such a need might arise. Even in
that case, I would like to keep the options of GCC as modular as
possible so that we can pick and drop as the specific use cases require.
This one is the one that saves the command line for now.

If we implement the aliases you suggested at the end, we can even create
aliases that combine them for the user.

> One thing I found with annobin is that capturing preprocessor options
> (eg -D_FORTIFY_SOURCE) can be quite hard from inside gcc, since often
> they have already been processed and discarded.  I do not know if this
> affects your actual patch though.

Yes, this was one of Martin's points as well. It is not the case for
this patch, though. I have noticed that the current options aim to
capture more than the command line, dive into GCC, and therefore miss or
discard some options given by the user. This patch only stores *argv* as
the driver receives and writes it to the object file blindly. Therefore,
capturing options such as -D_FORTIFY_SOURCE is no special case. Really,
this patch only answers the simple question of "How did you call GCC?".

> Speaking of annobin, I will bang the gcc plugin gong again here and say
> that if your patch is rejected then you might want to consider turning
> it into a plugin instead.  In that way you will not need approval from
> the gcc maintainers.  But of course you will have to maintain and
> publicise the plugin yourself.

Thanks for the suggestion. That will always be in my mind for more
ambitious cases. In the case of this specific 160-line patch though, I
believe it wouldn't bother us to maintain one more small patch in the
GCC packages we distribute. It can be "only at Oracle!". However, for me
this is really a basic functionality. Intel's icc has the most similar
-sox option too. Thinking back on how many times I said "now, how did we
compile this?" in the past, I would like this to be available for all
GCC users too, in the spirit of sharing.

> One other thought occurs to me, which is that if the patch is acceptable,
> or at least the idea of it, then maybe it would be better to amalgamate
> all of the current command line recording options into a single version.
> Eg:
>
>    --frecord-options=[dwarf,assembler,object]
>
> where:
>
>    --frecord-options=dwarf  is a synonym for -grecord-switches
>    --frecord-options=assembler  is a synonym for -fverbose-asm
>    --frecord-options=object is a synonym for your option
>
> The user could supply one or more of the selectors to have the recording
> happen in multiple places.
>
> Just an idea.

This is a very good idea for the user experience! I already pass an
argument to cc1; however, we can always simplify the arguments of the
driver so that these similar functionalities can be called via one
common name plus an option. I really like the idea.

> Cheers
>    Nick



Thanks Nick!

Regards
Egeyar

[patch] follow up on the aarch64 r18 story

2019-11-07 Thread Olivier Hainque

Hello,

About a year ago we discussed various possibilities regarding
an issue with the aarch64 selection of r18 as the static chain,
problematic in environments where the OS uses r18 for "private"
reasons. VxWorks uses this permission to use r18 as a pointer to
the current TCB, and Windows does something similar I think.

I first proposed a change allowing target ports to state that
r18 should be considered "fixed" and switching the static chain
to r11 in this case. I had picked r11 as the next after r9/r10
already used for stack checking.

After a few exchanges, we agreed that we should rather use r9
as the alternate static chain and remap the temporary registers
used for stack-checking to r10/r11.

  https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01592.html

We were also thinking that we should be able to change the
static chain register unconditionally.

Then during the last GNU cauldron, we realized that libffi
currently relies on the knowledge of which register is used for
the static chain, for the support of Go closures.

Unconditionally changing r18 to something else would break that
support, most notably on Linux. Adjusting libffi is possible but
raises a clear compatibility issue.

The attached patch is a renewed proposal to let target OS
ports state their private use of r18, switching the static
chain to r9 and declaring r18 "fixed" in that case. 

In addition to adjusting PROBE_STACK reg values to r10/r11,
the patch also moves the corresponding definitions together
with the other special register number definitions, so we have
a centralized view of the numbering choices.

This solves the immediate issue for VxWorks (which I'd like
to contribute) and seems like an interesting thing to have
in any case, regardless of whether or not we want to more
generally change the static chain someday.

With this change, we have good build and testing results
on both aarch64-elf and aarch64-vxworks7 for a gcc-9 based
compiler, where only vxworks states TARGET_OS_USES_R18.

I have also verified that I could build a native aarch64-linux
mainline compiler with the change and that it still picks r18 as
the static chain.

Is this variant ok to commit ?

Thanks in advance,

Best Regards,

Olivier

2019-11-07  Olivier Hainque  

* config/aarch64/aarch64.h: Define PROBE_STACK_FIRST_REG
and PROBE_STACK_SECOND_REG here, to r10/r11 respectively.
(TARGET_OS_USES_R18): New macro, default value 0 that target
OS configuration files may redefine.
(STATIC_CHAIN_REGNUM): r9 if TARGET_OS_USES_R18, r18 otherwise.
* config/aarch64/aarch64.c (PROBE_STACK_FIRST_REG,
PROBE_STACK_SECOND_REG): Remove definitions.
(aarch64_conditional_register_usage): Preserve r18 if the target
OS uses it, and check that the static chain selection wouldn't
conflict.
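
Since the patch itself is only attached as binary data, here is a rough
C++ sketch of the selection logic that the ChangeLog describes. It is an
illustration under stated assumptions, not the patch: the kR* constants
and both functions are hypothetical stand-ins for the macros named above,
with the x-register numbers used directly.

```cpp
// Hypothetical stand-ins for the register numbers discussed in the mail.
constexpr int kR9 = 9, kR10 = 10, kR11 = 11, kR18 = 18;

// Stack-probe temporaries after the patch: r10/r11 (previously r9/r10).
constexpr int kProbeStackFirstReg = kR10;
constexpr int kProbeStackSecondReg = kR11;

// Models STATIC_CHAIN_REGNUM: r9 when the target OS reserves r18,
// r18 otherwise (the default, which leaves Linux and libffi unaffected).
constexpr int static_chain_regnum(bool target_os_uses_r18)
{
  return target_os_uses_r18 ? kR9 : kR18;
}

// Models the consistency check: the static chain must not be one of the
// probe registers, nor r18 when the OS has claimed it.
constexpr bool static_chain_is_consistent(bool target_os_uses_r18)
{
  return static_chain_regnum(target_os_uses_r18) != kProbeStackFirstReg
         && static_chain_regnum(target_os_uses_r18) != kProbeStackSecondReg
         && !(target_os_uses_r18
              && static_chain_regnum(target_os_uses_r18) == kR18);
}

static_assert(static_chain_is_consistent(false), "default configuration");
static_assert(static_chain_is_consistent(true), "OS-reserved r18");
```

Moving the probe registers to r10/r11 is what frees r9 for the alternate
static chain, which is why both changes travel together.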



aarch64-r18.diff
Description: Binary data

Re: [1/6] Fix vectorizable_conversion costs

2019-11-07 Thread Richard Biener

On November 7, 2019 4:14:14 PM GMT+01:00, Richard Sandiford 
 wrote:
>Richard Biener  writes:
>> On Tue, Nov 5, 2019 at 3:25 PM Richard Sandiford
>>  wrote:
>>>
>>> This patch makes two tweaks to vectorizable_conversion.  The first
>>> is to use "modifier" to distinguish between promotion, demotion,
>>> and neither promotion nor demotion, rather than using a code for
>>> some cases and "modifier" for others.  The second is to take ncopies
>>> into account for the promotion and demotion costs; previously we gave
>>> multiple copies the same cost as a single copy.
>>>
>>> Later patches test this, but it seemed worth splitting out.
>>
>> OK, but does ncopies properly handle unrolling with SLP?
>
>Bah, thanks for catching that.  Here's a fixed version, tested as
>before.

OK. 

Thanks, 
Richard. 

>Richard
>
>
>2019-11-07  Richard Sandiford  
>
>gcc/
>   * tree-vect-stmts.c (vect_model_promotion_demotion_cost): Take the
>   number of ncopies as an additional argument.
>   (vectorizable_conversion): Update call accordingly.  Use "modifier"
>   to check whether a conversion is between vectors with the same
>   numbers of units.
>
>Index: gcc/tree-vect-stmts.c
>===
>--- gcc/tree-vect-stmts.c  2019-11-07 15:11:49.0 +
>+++ gcc/tree-vect-stmts.c  2019-11-07 15:11:50.134775028 +
>@@ -917,26 +917,27 @@ vect_model_simple_cost (stmt_vec_info st
> }
> 
> 
>-/* Model cost for type demotion and promotion operations.  PWR is normally
>-   zero for single-step promotions and demotions.  It will be one if 
>-   two-step promotion/demotion is required, and so on.  Each additional
>+/* Model cost for type demotion and promotion operations.  PWR is
>+   normally zero for single-step promotions and demotions.  It will be
>+   one if two-step promotion/demotion is required, and so on.  NCOPIES
>+   is the number of vector results (and thus number of instructions)
>+   for the narrowest end of the operation chain.  Each additional
>step doubles the number of instructions required.  */
> 
> static void
> vect_model_promotion_demotion_cost (stmt_vec_info stmt_info,
>-  enum vect_def_type *dt, int pwr,
>+  enum vect_def_type *dt,
>+  unsigned int ncopies, int pwr,
>   stmt_vector_for_cost *cost_vec)
> {
>-  int i, tmp;
>+  int i;
>   int inside_cost = 0, prologue_cost = 0;
> 
>   for (i = 0; i < pwr + 1; i++)
> {
>-  tmp = (STMT_VINFO_TYPE (stmt_info) == type_promotion_vec_info_type) ?
>-  (i + 1) : i;
>-  inside_cost += record_stmt_cost (cost_vec, vect_pow2 (tmp),
>- vec_promote_demote, stmt_info, 0,
>- vect_body);
>+  inside_cost += record_stmt_cost (cost_vec, ncopies, vec_promote_demote,
>+ stmt_info, 0, vect_body);
>+  ncopies *= 2;
> }
> 
>   /* FORNOW: Assuming maximum 2 args per stmts.  */
>@@ -4964,7 +4965,7 @@ vectorizable_conversion (stmt_vec_info s
>   if (!vec_stmt)  /* transformation not required.  */
> {
>   DUMP_VECT_SCOPE ("vectorizable_conversion");
>-  if (code == FIX_TRUNC_EXPR || code == FLOAT_EXPR)
>+  if (modifier == NONE)
> {
> STMT_VINFO_TYPE (stmt_info) = type_conversion_vec_info_type;
> vect_model_simple_cost (stmt_info, ncopies, dt, ndts, slp_node,
>@@ -4973,14 +4974,24 @@ vectorizable_conversion (stmt_vec_info s
>   else if (modifier == NARROW)
>   {
> STMT_VINFO_TYPE (stmt_info) = type_demotion_vec_info_type;
>-vect_model_promotion_demotion_cost (stmt_info, dt, multi_step_cvt,
>-cost_vec);
>+/* The final packing step produces one vector result per copy.  */
>+unsigned int nvectors
>+  = (slp_node ? SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) : ncopies);
>+vect_model_promotion_demotion_cost (stmt_info, dt, nvectors,
>+multi_step_cvt, cost_vec);
>   }
>   else
>   {
> STMT_VINFO_TYPE (stmt_info) = type_promotion_vec_info_type;
>-vect_model_promotion_demotion_cost (stmt_info, dt, multi_step_cvt,
>-cost_vec);
>+/* The initial unpacking step produces two vector results
>+   per copy.  MULTI_STEP_CVT is 0 for a single conversion,
>+   so >> MULTI_STEP_CVT divides by 2^(number of steps - 1).  */
>+unsigned int nvectors
>+  = (slp_node
>+ ? SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) >> multi_step_cvt
>+ : ncopies * 2);
>+vect_model_promotion_demotion_cost (stmt_info, dt, nvectors,
>+multi_step_cvt, cost_vec);
>   }
>   interm_types.release ();
>   return

Re: [PATCH 13/X] [libsanitizer][options] Add hwasan flags and argument parsing

2019-11-07 Thread Andrey Konovalov via gcc-patches

On Thu, Nov 7, 2019 at 1:48 PM Matthew Malcomson
 wrote:
>
> On 05/11/2019 13:11, Andrey Konovalov wrote:
> > On Tue, Nov 5, 2019 at 12:34 PM Matthew Malcomson
> >  wrote:
> >>
> >> NOTE:
> >> --
> >> I have defined a new macro of __SANITIZE_HWADDRESS__ that gets
> >> automatically defined when compiling with hwasan.  This is analogous to
> >> __SANITIZE_ADDRESS__ which is defined when compiling with asan.
> >>
> >> Users in the kernel have expressed an interest in using
> >> __SANITIZE_ADDRESS__ for both
> >> (https://lists.infradead.org/pipermail/linux-arm-kernel/2019-October/690703.html).
> >>
> >> One approach to do this could be to define __SANITIZE_ADDRESS__ with
> >> different values depending on whether we are compiling with hwasan or
> >> asan.
> >>
> >> Using __SANITIZE_ADDRESS__ for both means that code like the kernel
> >> which wants to treat the two sanitizers as alternate implementations of
> >> the same thing gets that automatically.
> >>
> >> My preference is to use __SANITIZE_HWADDRESS__ since that means any
> >> existing code will not be predicated on this (and hence I guess less
> >> surprises), but would appreciate feedback on this given the point above.
> >
> > +Evgenii Stepanov
> >
> > (A repost from my answer from the mentioned thread):
> >
> >> Similarly, I'm thinking I'll add no_sanitize_hwaddress as the hwasan
> >> equivalent of no_sanitize_address, which will require an update in the
> >> kernel given it seems you want KASAN to be used the same whether using
> >> tags or not.
> >
> > We have intentionally reused the same macros to simplify things. Is
> > there any reason to use separate macros for GCC? Are there places
> > where we need to use specifically no_sanitize_hwaddress and
> > __SANITIZE_HWADDRESS__, but not no_sanitize_address and
> > __SANITIZE_ADDRESS__?
> >
> >
>
> I've just looked through some open source repositories (via github
> search) that used the existing __SANITIZE_ADDRESS__ macro.
>
> There are a few repos that would want to use a feature macro for hwasan
> or asan in the exact same way as each other, but of the 31 truly
> different uses I found, 11 look like they would need to distinguish
> between hwasan and asan (where 4 uses I found I couldn't easily tell)
>
> NOTE
> - This is a count of unique uses, ignoring those repos which use a file
> from another repo.
> - I'm just giving links to the first of the relevant kind that I found,
> not putting effort into finding the "canonical" source of each repository.
>
>
> Places that need distinction (and their reasons):
>
> There are quite a few that use the ASAN_POISON_MEMORY_REGION and
> ASAN_UNPOISON_MEMORY_REGION macros to poison/unpoison memory themselves.
>   This abstraction doesn't quite make sense in a hwasan environment, as
> there is not really a "poisoned/unpoisoned" concept.
>
> https://github.com/laurynas-biveinis/unodb
> https://github.com/darktable-org/rawspeed
> https://github.com/MariaDB/server
> https://github.com/ralfbrown/framepac-ng
> https://github.com/peters/aom
> https://github.com/pspacek/knot-resolver-docker-fix
> https://github.com/harikrishnan94/sheap
>
>
> Some use it to record their compilation "type" as `-fsanitize=address`
> https://github.com/wallix/redemption
>
> Or to decide to set the environment variable ASAN_OPTIONS
> https://github.com/dephonatine/VBox5.2.18
>
> Others worry about stack space due to asan's redzones (hwasan has a much
> smaller stack memory overhead).
> https://github.com/fastbuild/fastbuild
> https://github.com/scylladb/seastar
> (n.b. seastar has a lot more conditioned code that would be the same
> between asan and hwasan).
>
>
> Each of these needs to know the difference between compiling with asan
> and hwasan, so I'm confident that having some way to determine that in
> the source code is a good idea.
>
>
> I also believe there could be code in the wild that would need to
> distinguish between hwasan and asan where the existence of tags could be
> problematic:
>
> - code already using the top-byte-ignore feature may be able to be used
> with asan but not hwasan.
> - Code that makes assumptions about pointer ordering (e.g. the autoconf
> program that looks for stack growth direction) could break on hwasan but
> not on asan.
> - Code looking for the distance between two objects in memory would need
> to account for tags in pointers.
>
>
> Hence I think this distinction is needed.

Evgenii, how does clang-compiled code distinguish whether it's
being compiled with ASAN or HWASAN?
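
For readers following the macro discussion, the distinction argued for
above could be used roughly as in the C++ sketch below. It assumes the
separate __SANITIZE_HWADDRESS__ macro proposed in this thread exists
alongside __SANITIZE_ADDRESS__; the Sanitizer enum and the helper
functions are hypothetical names introduced only for this illustration.

```cpp
enum class Sanitizer { None, Asan, Hwasan };

// Which sanitizer this translation unit is built with, assuming the
// separate __SANITIZE_HWADDRESS__ macro proposed in this thread.
constexpr Sanitizer active_sanitizer()
{
#if defined(__SANITIZE_HWADDRESS__)
  return Sanitizer::Hwasan;
#elif defined(__SANITIZE_ADDRESS__)
  return Sanitizer::Asan;
#else
  return Sanitizer::None;
#endif
}

// Manual poisoning (ASAN_POISON_MEMORY_REGION and friends) is only
// meaningful for asan; hwasan has no poisoned/unpoisoned notion.
constexpr bool wants_manual_poisoning(Sanitizer s)
{
  return s == Sanitizer::Asan;
}

// Code that treats both as "some address sanitizer" can test for either.
constexpr bool any_address_sanitizer(Sanitizer s)
{
  return s != Sanitizer::None;
}
```

If a single macro were used for both sanitizers instead, the first helper
could not tell them apart without some additional value or macro.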

[C++ PATCH] PR c++/91370 - Implement P1041R4 and P1139R2 - Stronger Unicode reqs

2019-11-07 Thread Jakub Jelinek

Hi!

GCC does use UTF-16 and UTF-32 for char16_t and char32_t string literals
already, so P1041R4 is I believe already implemented with no changes needed.

While going through P1139R2, I've realized that we weren't handling
"If the value is not representable within 16 bits, the program is
ill-formed. A char16_t literal containing multiple c-chars is ill-formed."
and
"A char32_t literal containing multiple c-chars is ill-formed."
already from C++11 correctly, we were just warning about it, rather than
emitting an error.  This is different from C11, where the standard
makes it implementation-defined what happens.

Furthermore, the C++17:
"If the value is not representable with a single UTF-8 code unit,
the program is ill-formed. A UTF-8 character literal containing multiple
c-chars is ill-formed."
wasn't handled as an error, but instead u8'ab' would be an int with a
warning, similarly u8'\u00c0' etc.  u8 char literals are only in C++17+,
not in C, so no need to worry about C at this point.

And lastly, P1139R2 makes it clear that code points above U+10FFFF are
ill-formed, but that is something Eric already implemented in r276167.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

I believe we can now claim to have both P1041R4 and P1139R2 implemented.
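
The representability rules quoted above boil down to simple range checks
on the code point. The C++ sketch below spells them out; the function
names are hypothetical and the sketch deliberately ignores surrogates and
the multiple-c-chars case, so it is not how libcpp performs the check.

```cpp
// True if code point CP is encoded as a single UTF-8 code unit,
// i.e. it lies in the ASCII range (what a u8 character literal needs).
constexpr bool fits_one_utf8_code_unit(char32_t cp)
{
  return cp <= 0x7F;
}

// True if CP is representable within 16 bits (one char16_t).
constexpr bool fits_one_utf16_code_unit(char32_t cp)
{
  return cp <= 0xFFFF;
}

// True if CP is within the Unicode code space.
constexpr bool is_valid_code_point(char32_t cp)
{
  return cp <= 0x10FFFF;
}
```

Under these checks u8'\u00c0' fails the UTF-8 test and u'\U00064321' fails
the 16-bit test, matching the cases the patch now diagnoses as errors.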

2019-11-07  Jakub Jelinek  

PR c++/91370 - Implement P1041R4 and P1139R2 - Stronger Unicode reqs
* charset.c (narrow_str_to_charconst): Add TYPE argument.  For
CPP_UTF8CHAR diagnose whenever number of chars is > 1, using
CPP_DL_ERROR instead of CPP_DL_WARNING.
(wide_str_to_charconst): For CPP_CHAR16 or CPP_CHAR32, use
CPP_DL_ERROR instead of CPP_DL_WARNING when multiple char16_t
or char32_t chars are needed.
(cpp_interpret_charconst): Adjust narrow_str_to_charconst caller.

* g++.dg/cpp1z/utf8-neg.C: Expect errors rather than -Wmultichar
warnings.
* g++.dg/ext/utf16-4.C: Expect errors rather than warnings.
* g++.dg/ext/utf32-4.C: Likewise.
* g++.dg/cpp2a/ucn2.C: New test.

--- libcpp/charset.c.jj 2019-09-27 10:32:17.127641484 +0200
+++ libcpp/charset.c2019-11-07 13:40:19.616040925 +0100
@@ -1881,10 +1881,11 @@ cpp_interpret_string_notranslate (cpp_re
 /* Subroutine of cpp_interpret_charconst which performs the conversion
to a number, for narrow strings.  STR is the string structure returned
by cpp_interpret_string.  PCHARS_SEEN and UNSIGNEDP are as for
-   cpp_interpret_charconst.  */
+   cpp_interpret_charconst.  TYPE is the token type.  */
 static cppchar_t
 narrow_str_to_charconst (cpp_reader *pfile, cpp_string str,
-unsigned int *pchars_seen, int *unsignedp)
+unsigned int *pchars_seen, int *unsignedp,
+enum cpp_ttype type)
 {
   size_t width = CPP_OPTION (pfile, char_precision);
   size_t max_chars = CPP_OPTION (pfile, int_precision) / width;
@@ -1913,10 +1914,12 @@ narrow_str_to_charconst (cpp_reader *pfi
result = c;
 }
 
+  if (type == CPP_UTF8CHAR)
+max_chars = 1;
   if (i > max_chars)
 {
   i = max_chars;
-  cpp_error (pfile, CPP_DL_WARNING,
+  cpp_error (pfile, type == CPP_UTF8CHAR ? CPP_DL_ERROR : CPP_DL_WARNING,
 "character constant too long for its type");
 }
   else if (i > 1 && CPP_OPTION (pfile, warn_multichar))
@@ -1980,7 +1983,9 @@ wide_str_to_charconst (cpp_reader *pfile
  character exactly fills a wchar_t, so a multi-character wide
  character constant is guaranteed to overflow.  */
   if (str.len > nbwc * 2)
-cpp_error (pfile, CPP_DL_WARNING,
+cpp_error (pfile, (CPP_OPTION (pfile, cplusplus)
+  && (type == CPP_CHAR16 || type == CPP_CHAR32))
+ ? CPP_DL_ERROR : CPP_DL_WARNING,
   "character constant too long for its type");
 
   /* Truncate the constant to its natural width, and simultaneously
@@ -2038,7 +2043,8 @@ cpp_interpret_charconst (cpp_reader *pfi
 result = wide_str_to_charconst (pfile, str, pchars_seen, unsignedp,
token->type);
   else
-result = narrow_str_to_charconst (pfile, str, pchars_seen, unsignedp);
+result = narrow_str_to_charconst (pfile, str, pchars_seen, unsignedp,
+ token->type);
 
   if (str.text != token->val.str.text)
 free ((void *)str.text);
--- gcc/testsuite/g++.dg/cpp1z/utf8-neg.C.jj 2018-10-22 09:28:06.380657152 +0200
+++ gcc/testsuite/g++.dg/cpp1z/utf8-neg.C   2019-11-07 14:34:23.929317534 +0100
@@ -1,6 +1,6 @@
 /* { dg-do compile { target c++17 } } */
 
 const static char c0 = u8'';   // { dg-error "empty character" }
-const static char c1 = u8'ab'; // { dg-warning "multi-character character constant" }
-const static char c2 = u8'\u0124'; // { dg-warning "multi-character character constant" }
-const static char c3 = u8'\U00064321';  // { dg-warning "multi-character

[PATCH] Handle gimple_clobber_p stmts in store-merging (PR target/92038)

2019-11-07 Thread Jakub Jelinek

Hi!

The following patch adds handling of clobbers in store-merging.  The intent
is if we have a clobber followed by some stores into the clobbered area,
even if we don't store all the bytes in the area, we can avoid masking, because
the non-stored bytes are undefined and in some cases we can even overwrite
the whole area with the same or smaller number of stores compared to the
original IL.
Clobbers aren't removed from the IL, even if the following stores completely
cover the whole area, as clobbers carry important additional information
that the old value is gone, e.g. for tail call discovery: if the address is
taken before the clobber but not after it, removing the clobbers would disable
tail call optimization.
The patch right now treats the clobbered non-stored bytes as non-masked zero
stores, except that we don't add stores to whole words etc. if there are no
other overlapping stores; I have a separate patch that also computed
defined_mask which contained whether some bytes are just undefined and we
could in theory try different bit patterns in those bytes, but in the end
decided it is too complicated and if needed, could be done as a follow-up.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-11-07  Jakub Jelinek  

PR target/92038
* gimple-ssa-store-merging.c (find_constituent_stores): For return
value only, return non-NULL if there is a single non-clobber
constituent store even if there are constituent clobbers and return
one of clobber constituent stores if all constituent stores are
clobbers.
(split_group): Handle clobbers.
(imm_store_chain_info::output_merged_store): When computing
bzero_first, look after all clobbers at the start.  Don't count
clobber stmts in orig_num_stmts, except if the first orig store is
a clobber covering the whole area and split_stores cover the whole
area, consider equal number of stmts ok.  Punt if split_stores
contains only ->orig stores and their number plus number of original
clobbers is equal to original number of stmts.  For ->orig, look past
clobbers in the constituent stores.
(imm_store_chain_info::output_merged_stores): Don't remove clobber
stmts.
(rhs_valid_for_store_merging_p): Don't return false for clobber stmt
rhs.
(store_valid_for_store_merging_p): Allow clobber stmts.
(verify_clear_bit_region_be): Fix up a thinko in function comment.

* g++.dg/opt/store-merging-1.C: New test.
* g++.dg/opt/store-merging-2.C: New test.
* g++.dg/opt/store-merging-3.C: New test.
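
As a reading aid for the find_constituent_stores entry above, the
return-value rule can be modelled as follows. This is a simplified C++
sketch under stated assumptions: Store is a hypothetical stand-in for
store_immediate_info, the function name is made up, and the filtering by
BITPOS/BITSIZE that the real function performs is left out entirely.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, much simplified stand-in for store_immediate_info.
struct Store
{
  bool is_clobber;
};

// Models only the return-value rule from the ChangeLog entry:
//  - exactly one non-clobber store: return it, ignoring any clobbers;
//  - only clobbers: return one of them (here, the first);
//  - several non-clobber stores, or no stores at all: return nullptr.
const Store *single_constituent_store(const std::vector<Store> &stores)
{
  const Store *first_clobber = nullptr;
  const Store *real_store = nullptr;
  std::size_t n_real = 0;
  for (const Store &s : stores)
    {
      if (s.is_clobber)
        {
          if (!first_clobber)
            first_clobber = &s;
        }
      else
        {
          real_store = &s;
          ++n_real;
        }
    }
  if (n_real == 1)
    return real_store;
  if (n_real == 0)
    return first_clobber;
  return nullptr;
}
```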

--- gcc/gimple-ssa-store-merging.c.jj   2019-11-07 09:50:38.029447052 +0100
+++ gcc/gimple-ssa-store-merging.c  2019-11-07 12:13:15.048531180 +0100
@@ -3110,7 +3110,8 @@ split_store::split_store (unsigned HOST_
 /* Record all stores in GROUP that write to the region starting at BITPOS and
is of size BITSIZE.  Record infos for such statements in STORES if
non-NULL.  The stores in GROUP must be sorted by bitposition.  Return INFO
-   if there is exactly one original store in the range.  */
+   if there is exactly one original store in the range (in that case ignore
+   clobber stmts, unless there are only clobber stmts).  */
 
 static store_immediate_info *
 find_constituent_stores (class merged_store_group *group,
@@ -3146,16 +3147,24 @@ find_constituent_stores (class merged_st
   if (stmt_start >= end)
return ret;
 
+  if (gimple_clobber_p (info->stmt))
+   {
+ if (stores)
+   stores->safe_push (info);
+ if (ret == NULL)
+   ret = info;
+ continue;
+   }
   if (stores)
{
  stores->safe_push (info);
- if (ret)
+ if (ret && !gimple_clobber_p (ret->stmt))
{
  ret = NULL;
  second = true;
}
}
-  else if (ret)
+  else if (ret && !gimple_clobber_p (ret->stmt))
return NULL;
   if (!second)
ret = info;
@@ -3347,13 +3356,17 @@ split_group (merged_store_group *group,
 
   if (bzero_first)
 {
-  first = 1;
+  store_immediate_info *gstore;
+  FOR_EACH_VEC_ELT (group->stores, first, gstore)
+   if (!gimple_clobber_p (gstore->stmt))
+ break;
+  ++first;
   ret = 1;
   if (split_stores)
{
  split_store *store
-   = new split_store (bytepos, group->stores[0]->bitsize, align_base);
- store->orig_stores.safe_push (group->stores[0]);
+   = new split_store (bytepos, gstore->bitsize, align_base);
+ store->orig_stores.safe_push (gstore);
  store->orig = true;
  any_orig = true;
  split_stores->safe_push (store);
@@ -3377,6 +3390,7 @@ split_group (merged_store_group *group,
   unsigned HOST_WIDE_INT align_bitpos
= (try_bitpos - align_base) & (group_align - 1);
   unsigned HOST_WIDE_INT align =

Re: [committed] Remove gimple_call_types_likely_match_p (PR 70929)

2019-11-07 Thread Jakub Jelinek

On Thu, Nov 07, 2019 at 12:00:32PM +0100, Martin Jambor wrote:
> 2019-11-07  Martin Jambor  
> 
>   PR lto/70929
>   * cif-code.def (MISMATCHED_ARGUMENTS): Removed.
>   * cgraph.h (gimple_check_call_matching_types): Remove
>   * cgraph.c (gimple_check_call_args): Likewise.
>   (gimple_check_call_matching_types): Likewise.
>   (symbol_table::create_edge): Do not call
>   gimple_check_call_matching_types.
>   (cgraph_edge::make_direct): Likewise.
>   (cgraph_edge::redirect_call_stmt_to_callee): Likewise.
>   * value-prof.h (check_ic_target): Remove.
>   * value-prof.c (check_ic_target): Remove.
>   (gimple_ic_transform): Do not call check_ic_target.
>   * auto-profile.c (function_instance::find_icall_target_map): Likewise.
>   (afdo_indirect_call): Likewise.
>   * ipa-prop.c (update_indirect_edges_after_inlining): Do not call
>   gimple_check_call_matching_types.
>   * ipa-inline.c (early_inliner): Likewise.
> 
>   testsuite/
>   * g++.dg/lto/pr70929_[01].C: New test.
>   * gcc.dg/winline-10.c: Adjust for the fact that inlining happens.

I'm seeing some regressions on i686-linux from yesterday, and while I haven't
proved it is your patch, it is very likely.
The regressions are:
+FAIL: gcc.dg/cast-function-1.c (internal compiler error)
+FAIL: gcc.dg/cast-function-1.c (test for excess errors)
+FAIL: gdc.test/runnable/nested.d -O2   (internal compiler error)
+UNRESOLVED: gdc.test/runnable/nested.d -O2   compilation failed to produce executable
+FAIL: gdc.test/runnable/nested.d -O2 -frelease   (internal compiler error)
+UNRESOLVED: gdc.test/runnable/nested.d -O2 -frelease   compilation failed to produce executable
+FAIL: gdc.test/runnable/nested.d -O2 -frelease -g   (internal compiler error)
+UNRESOLVED: gdc.test/runnable/nested.d -O2 -frelease -g   compilation failed to produce executable
+FAIL: gdc.test/runnable/nested.d -O2 -frelease -g -shared-libphobos   (internal compiler error)
+UNRESOLVED: gdc.test/runnable/nested.d -O2 -frelease -g -shared-libphobos   compilation failed to produce executable
+FAIL: gdc.test/runnable/nested.d -O2 -frelease -shared-libphobos   (internal compiler error)
+UNRESOLVED: gdc.test/runnable/nested.d -O2 -frelease -shared-libphobos   compilation failed to produce executable
+FAIL: gdc.test/runnable/nested.d -O2 -g   (internal compiler error)
+UNRESOLVED: gdc.test/runnable/nested.d -O2 -g   compilation failed to produce executable
+FAIL: gdc.test/runnable/nested.d -O2 -g -shared-libphobos   (internal compiler error)
+UNRESOLVED: gdc.test/runnable/nested.d -O2 -g -shared-libphobos   compilation failed to produce executable
+FAIL: gdc.test/runnable/nested.d -O2 -shared-libphobos   (internal compiler error)
+UNRESOLVED: gdc.test/runnable/nested.d -O2 -shared-libphobos   compilation failed to produce executable
All are checking ICEs on type mismatches.
Can you please have a look?

Jakub

Re: [1/6] Fix vectorizable_conversion costs

2019-11-07 Thread Richard Sandiford

Richard Biener  writes:
> On Tue, Nov 5, 2019 at 3:25 PM Richard Sandiford
>  wrote:
>>
>> This patch makes two tweaks to vectorizable_conversion.  The first
>> is to use "modifier" to distinguish between promotion, demotion,
>> and neither promotion nor demotion, rather than using a code for
>> some cases and "modifier" for others.  The second is to take ncopies
>> into account for the promotion and demotion costs; previously we gave
>> multiple copies the same cost as a single copy.
>>
>> Later patches test this, but it seemed worth splitting out.
>
> OK, but does ncopies properly handle unrolling with SLP?

Bah, thanks for catching that.  Here's a fixed version, tested as before.

Richard


2019-11-07  Richard Sandiford  

gcc/
* tree-vect-stmts.c (vect_model_promotion_demotion_cost): Take the
number of ncopies as an additional argument.
(vectorizable_conversion): Update call accordingly.  Use "modifier"
to check whether a conversion is between vectors with the same
numbers of units.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2019-11-07 15:11:49.0 +
+++ gcc/tree-vect-stmts.c   2019-11-07 15:11:50.134775028 +
@@ -917,26 +917,27 @@ vect_model_simple_cost (stmt_vec_info st
 }
 
 
-/* Model cost for type demotion and promotion operations.  PWR is normally
-   zero for single-step promotions and demotions.  It will be one if 
-   two-step promotion/demotion is required, and so on.  Each additional
+/* Model cost for type demotion and promotion operations.  PWR is
+   normally zero for single-step promotions and demotions.  It will be
+   one if two-step promotion/demotion is required, and so on.  NCOPIES
+   is the number of vector results (and thus number of instructions)
+   for the narrowest end of the operation chain.  Each additional
step doubles the number of instructions required.  */
 
 static void
 vect_model_promotion_demotion_cost (stmt_vec_info stmt_info,
-   enum vect_def_type *dt, int pwr,
+   enum vect_def_type *dt,
+   unsigned int ncopies, int pwr,
stmt_vector_for_cost *cost_vec)
 {
-  int i, tmp;
+  int i;
   int inside_cost = 0, prologue_cost = 0;
 
   for (i = 0; i < pwr + 1; i++)
 {
-  tmp = (STMT_VINFO_TYPE (stmt_info) == type_promotion_vec_info_type) ?
-   (i + 1) : i;
-  inside_cost += record_stmt_cost (cost_vec, vect_pow2 (tmp),
-  vec_promote_demote, stmt_info, 0,
-  vect_body);
+  inside_cost += record_stmt_cost (cost_vec, ncopies, vec_promote_demote,
+  stmt_info, 0, vect_body);
+  ncopies *= 2;
 }
 
   /* FORNOW: Assuming maximum 2 args per stmts.  */
@@ -4964,7 +4965,7 @@ vectorizable_conversion (stmt_vec_info s
   if (!vec_stmt)   /* transformation not required.  */
 {
   DUMP_VECT_SCOPE ("vectorizable_conversion");
-  if (code == FIX_TRUNC_EXPR || code == FLOAT_EXPR)
+  if (modifier == NONE)
 {
  STMT_VINFO_TYPE (stmt_info) = type_conversion_vec_info_type;
  vect_model_simple_cost (stmt_info, ncopies, dt, ndts, slp_node,
@@ -4973,14 +4974,24 @@ vectorizable_conversion (stmt_vec_info s
   else if (modifier == NARROW)
{
  STMT_VINFO_TYPE (stmt_info) = type_demotion_vec_info_type;
- vect_model_promotion_demotion_cost (stmt_info, dt, multi_step_cvt,
- cost_vec);
+ /* The final packing step produces one vector result per copy.  */
+ unsigned int nvectors
+   = (slp_node ? SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) : ncopies);
+ vect_model_promotion_demotion_cost (stmt_info, dt, nvectors,
+ multi_step_cvt, cost_vec);
}
   else
{
  STMT_VINFO_TYPE (stmt_info) = type_promotion_vec_info_type;
- vect_model_promotion_demotion_cost (stmt_info, dt, multi_step_cvt,
- cost_vec);
+ /* The initial unpacking step produces two vector results
+per copy.  MULTI_STEP_CVT is 0 for a single conversion,
+so >> MULTI_STEP_CVT divides by 2^(number of steps - 1).  */
+ unsigned int nvectors
+   = (slp_node
+  ? SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) >> multi_step_cvt
+  : ncopies * 2);
+ vect_model_promotion_demotion_cost (stmt_info, dt, nvectors,
+ multi_step_cvt, cost_vec);
}
   interm_types.release ();
   return true;
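
To make the effect of the new NCOPIES argument concrete, the loop in
vect_model_promotion_demotion_cost can be reduced to the following
stand-alone C++ sketch. It only counts statements; the real code passes
each count to record_stmt_cost, which applies the target's cost for
vec_promote_demote. The function name is hypothetical.

```cpp
#include <cassert>

// Number of vec_promote_demote statements charged for a conversion with
// PWR + 1 steps, where NCOPIES is the vector count at the narrowest end
// of the chain and every further step doubles that count.
unsigned int promotion_demotion_stmt_count(unsigned int ncopies, int pwr)
{
  assert(pwr >= 0);
  unsigned int total = 0;
  for (int i = 0; i < pwr + 1; i++)
    {
      total += ncopies;
      ncopies *= 2;
    }
  return total;
}
```

The sum is ncopies * (2^(pwr + 1) - 1), so two copies of a two-step
conversion count 2 + 4 = 6 statements, where the old code charged the
same amount regardless of the number of copies.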

Re: [PATCH 0/2] Introduce a new GCC option, --record-gcc-command-line

2019-11-07 Thread Nick Clifton

Hi Egeyar,

Thanks for including me in this discussion.

>>> This option is similar to -frecord-gcc-switches.

For the record I will also note that there is -fverbose-asm which
does almost the same thing, but only records the options as comments
in the assembler.  They are never converted into data in the actual
object files.

It is also worth noting that if your goal is to record how a binary
was produced, possibly with an eye to reproducibility, then you may
also need to record some environment variables too.

One thing I found with annobin is that capturing preprocessor options
(eg -D_FORTIFY_SOURCE) can be quite hard from inside gcc, since often
they have already been processed and discarded.  I do not know if this
affects your actual patch though.

Speaking of annobin, I will bang the gcc plugin gong again here and say
that if your patch is rejected then you might want to consider turning
it into a plugin instead.  In that way you will not need approval from
the gcc maintainers.  But of course you will have to maintain and 
publicise the plugin yourself.

One other thought occurs to me, which is that if the patch is acceptable,
or at least the idea of it, then maybe it would be better to amalgamate
all of the current command line recording options into a single version.
Eg:

  --frecord-options=[dwarf,assembler,object]

where:

  --frecord-options=dwarf      is a synonym for -grecord-gcc-switches
  --frecord-options=assembler  is a synonym for -fverbose-asm
  --frecord-options=object     is a synonym for your option

The user could supply one or more of the selectors to have the recording
happen in multiple places.
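
A minimal sketch of how such a selector list could be parsed into a set of flags. Everything here is hypothetical: the enum, the function name parse_record_options and the error convention are illustrations only, and none of it is taken from GCC's option machinery.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bit flags, one per recording destination.  */
enum record_target
{
  RECORD_DWARF = 1 << 0,
  RECORD_ASSEMBLER = 1 << 1,
  RECORD_OBJECT = 1 << 2
};

/* Parse a comma-separated selector list such as "dwarf,object".
   Return the OR of the matching flags, or -1 if any selector is
   unknown or empty.  */
int
parse_record_options (const char *arg)
{
  static const struct { const char *name; int flag; } table[] = {
    { "dwarf", RECORD_DWARF },
    { "assembler", RECORD_ASSEMBLER },
    { "object", RECORD_OBJECT }
  };
  int mask = 0;

  for (;;)
    {
      /* Length of the current selector, up to the next comma or NUL.  */
      size_t len = strcspn (arg, ",");
      int found = 0;

      for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strlen (table[i].name) == len
            && strncmp (arg, table[i].name, len) == 0)
          {
            mask |= table[i].flag;
            found = 1;
          }

      if (!found)
        return -1;
      if (arg[len] == '\0')
        return mask;
      arg += len + 1;	/* Skip the selector and its comma.  */
    }
}
```

With this shape, "dwarf,object" would enable both destinations at once, which is the "one or more of the selectors" behaviour described above.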

Just an idea.

Cheers
  Nick

Re: [PATCH 0/2] Introduce a new GCC option, --record-gcc-command-line

2019-11-07 Thread Egeyar Bagcioglu



On 11/7/19 10:24 AM, Martin Liška wrote:
> On 11/6/19 6:21 PM, Egeyar Bagcioglu wrote:
>> Hello,
>
> Hello.


Thanks for your detailed reply Martin. You'll find my reply inline. 
Since you added Nick Clifton to your following reply, I am adding him to 
this email too. He is not only the author of annobin, he also submitted 
the -frecord-gcc-switches to GCC. I agree that this discussion can 
benefit from his input.




>> I would like to propose the following patches which introduce a
>> compile option --record-gcc-command-line. When passed to gcc, it
>> saves the command line option into the produced object file. The
>> option makes it trivial to trace back how a file was compiled and by
>> which version of the gcc. It helps with debugging, reproducing bugs
>> and repeating the build process.
>
> I like your motivation, we as SUSE would like to have a similar
> functionality. But the current approach has some limitations that make
> it not usable (will explain later).


I am glad you agree with the motivation. Let me answer below the other 
concerns that you have.


>> This option is similar to -frecord-gcc-switches. However, they have
>> three fundamental differences: Firstly, -frecord-gcc-switches saves
>> the internal state after the argv is processed and passed by the
>> driver. As opposed to that, --record-gcc-command-line saves the
>
> I would not call these fundamental changes; it does something very
> similar to what -frecord-gcc-switches does.


It is very similar; however, I still insist that what I outlined are 
fundamental differences. As I mentioned in my previous email, I built 
binutils as my test-case-project. I attach to this email the output of 
"readelf -p .GCC.command.line ld/ld-new", so that you can see how well 
the output is merged in general. Please take a look. It saves the 
command line *as is* and as one entry per invocation.


For the record, this is just to test and showcase the functionality. 
This patch in fact has nothing to do with binutils.



> Moreover, we also have another option, -grecord-gcc-switches,
> that saves the command line into DWARF.


As Nick also mentioned many times, -grecord-gcc-switches is in DWARF and
this causes a great disadvantage: it gets stripped out. I believe
-grecord-gcc-switches is moot for the sake of this discussion, because
the discussion surrounding the submission of -frecord-gcc-switches makes
it clear that the necessity to keep the compile options in the object
file is already agreed on.



> Plus there's a Red Hat plugin called Annobin:
> https://www.youtube.com/watch?v=uzffr1M-w5M
> https://developers.redhat.com/blog/2018/02/20/annobin-storing-information-binaries/


I am aware of annobin, which is already released as a part of RHEL8. I
think it is much superior to what I am aiming for here. The sole purpose
of this patch is to keep the command line options in the object file. I
believe this is a basic functionality that should be supported by GCC
itself, without requiring a plugin. In other words, I think pushing a
different build of a GCC plugin for each GCC version we use on each
distro (i.e. versions-times-distros many plugin builds) is overkill for
such a basic need.


Those who already use annobin for any of its many use cases, might of 
course prefer it over this functionality. For the rest of the distros 
and the GCC versions, I believe this patch is quite useful and 
extendable for its quite basic purpose.




> The main limitations of the current approach (and probably of the
> suggested patch) are:
> a) it does not print per function options, which can be modified with
> __attribute__ (or pragma):
>
> $ cat demo.c
> int foo()
> {
>   return 1;
> }
>
> #pragma GCC optimize ("-O3")
>
> int bar()
> {
>   return 0;
> }
>
> int main()
> {
>   return 0;
> }


I understand the need here. However, the purpose of this patch is only 
to save the command line options. Your example is a change in the source 
file. Of course, the source file can change. Even the implementation of 
the functions themselves might change. But I believe this is out of the 
scope of this patch, which is the command line.




> b) we as SUSE are switching to LTO (-flto); doing that each LTO LTRANS
> will become one compilation unit and one will see a misleading command
> line invocation:
>
> $ gcc -flto -O2 demo2.c -c
> $ gcc -flto -O3 demo.c -c
> $ gcc demo.o demo2.o -o a.out -frecord-gcc-switches
> ...
> .file    ""
> .section    .GCC.command.line,"MS",@progbits,1
> .ascii    "-mtune=generic"
> .zero    1
> .ascii    "-march=x86-64"
> .zero    1
> .ascii    "-auxbase-strip a.out.ltrans0.ltrans.o"
> .zero    1
> .ascii    "-O3"
> .zero    1
> .ascii    "-fno-openmp"
> .zero    1
> .ascii    "-fno-openacc"
> .zero    1
> .ascii    "-fno-pie"
> .zero    1
> .ascii    "-frecord-gcc-switches"
> .zero    1
> .ascii    "-fltrans"
> .zero    1
> .ascii    "a.out.ltrans0.o"
> .zero    1
> .text
> .type    foo, @function


This is a very interesting case indeed. Thanks for bringing it

Re: [PATCH V3] rs6000: Refine small loop unroll in loop_unroll_adjust hook

2019-11-07 Thread Jiufu Guo

Jiufu Guo  writes:

Hi Segher,

I updated the patch as we discussed.  This patch does not turn on -fweb
or -frename-registers with -funroll-loops for powerpc.

Thanks for review in advance.

gcc/
2019-11-07  Jiufu Guo  

PR tree-optimization/88760
* gcc/config/rs6000/rs6000.opt (-munroll-only-small-loops): New option.
* gcc/common/config/rs6000/rs6000-common.c
(rs6000_option_optimization_table) [OPT_LEVELS_2_PLUS_SPEED_ONLY]:
Turn on -funroll-loops and -munroll-only-small-loops.
[OPT_LEVELS_ALL]: Turn off -fweb and -frename-registers.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Remove
set of PARAM_MAX_UNROLL_TIMES and PARAM_MAX_UNROLLED_INSNS.
Turn off -munroll-only-small-loops for explicit -funroll-loops.
(TARGET_LOOP_UNROLL_ADJUST): Add loop unroll adjust hook.
(rs6000_loop_unroll_adjust): Define it.  Use -munroll-only-small-loops.

gcc/testsuite/
2019-11-07  Jiufu Guo  

PR tree-optimization/88760
* gcc.dg/pr59643.c: Update back to r277550.

Index: gcc/common/config/rs6000/rs6000-common.c
===
--- gcc/common/config/rs6000/rs6000-common.c(revision 277871)
+++ gcc/common/config/rs6000/rs6000-common.c(working copy)
@@ -35,7 +35,14 @@ static const struct default_options rs6000_option_
 { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 },
 /* Enable -fsched-pressure for first pass instruction scheduling.  */
 { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
-{ OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 },
+/* Enable -munroll-only-small-loops with -funroll-loops to unroll small
+loops at -O2 and above by default.   */
+{ OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
+{ OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL, 1 },
+/* -fweb and -frename-registers are useless in general for rs6000,
+   turn them off.  */
+{ OPT_LEVELS_ALL, OPT_fweb, NULL, 0 },
+{ OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },
 { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
 
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 277871)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -1428,6 +1428,9 @@ static const struct attribute_spec rs6000_attribut
 #undef TARGET_VECTORIZE_DESTROY_COST_DATA
 #define TARGET_VECTORIZE_DESTROY_COST_DATA rs6000_destroy_cost_data
 
+#undef TARGET_LOOP_UNROLL_ADJUST
+#define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust
+
 #undef TARGET_INIT_BUILTINS
 #define TARGET_INIT_BUILTINS rs6000_init_builtins
 #undef TARGET_BUILTIN_DECL
@@ -4540,26 +4543,13 @@ rs6000_option_override_internal (bool global_init_
 global_options.x_param_values,
 global_options_set.x_param_values);
 
-  /* unroll very small loops 2 time if no -funroll-loops.  */
-  if (!global_options_set.x_flag_unroll_loops
- && !global_options_set.x_flag_unroll_all_loops)
-   {
- maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2,
-global_options.x_param_values,
-global_options_set.x_param_values);
+  /* Explicit -funroll-loops turns -munroll-only-small-loops off.  */
+  if (((global_options_set.x_flag_unroll_loops && flag_unroll_loops)
+  || (global_options_set.x_flag_unroll_all_loops
+  && flag_unroll_all_loops))
+ && !global_options_set.x_unroll_only_small_loops)
+   unroll_only_small_loops = 0;
 
- maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20,
-global_options.x_param_values,
-global_options_set.x_param_values);
-
- /* If fweb or frename-registers are not specificed in command-line,
-do not turn them on implicitly.  */
- if (!global_options_set.x_flag_web)
-   global_options.x_flag_web = 0;
- if (!global_options_set.x_flag_rename_registers)
-   global_options.x_flag_rename_registers = 0;
-   }
-
   /* If using typedef char *va_list, signal that
 __builtin_va_start (, 0) can be optimized to
 ap = __builtin_next_arg (0).  */
@@ -5101,6 +5091,25 @@ rs6000_destroy_cost_data (void *data)
   free (data);
 }
 
+/*  Implement targetm.loop_unroll_adjust.  */
+
+static unsigned
+rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop)
+{
+   if (unroll_only_small_loops)
+{
+  /* TODO: This is hardcoded to 10 right now.  It can be refined, for
+example we may want to unroll very small loops more times (4 perhaps).
+We also should use a PARAM for this.  */
+  if (loop->ninsns <= 10)
+   return MIN (2, nunroll);
+  else
+   return 0;
+}
+
+  return nunroll;
+}
+
 /* Handler for the Mathematical
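
The decision made by the new hook can be tried out in isolation. The following is only a sketch: struct toy_loop and toy_loop_unroll_adjust are made-up stand-ins, the flag is passed as a parameter instead of being read from the option state, and the threshold of 10 insns and the cap of 2 are simply copied from the patch above.

```c
/* Hypothetical stand-in for the one field of GCC's loop structure
   that the hook reads.  */
struct toy_loop
{
  unsigned ninsns;	/* Number of insns in the loop body.  */
};

/* Mirror of the decision in rs6000_loop_unroll_adjust as posted.
   ONLY_SMALL_LOOPS plays the role of -munroll-only-small-loops.  */
unsigned
toy_loop_unroll_adjust (unsigned nunroll, const struct toy_loop *loop,
			int only_small_loops)
{
  if (only_small_loops)
    {
      /* Small loops are unrolled at most twice; larger loops get 0,
	 as in the patch.  */
      if (loop->ninsns <= 10)
	return nunroll < 2 ? nunroll : 2;	/* MIN (2, nunroll)  */
      return 0;
    }

  /* Without the flag the generic unroll factor is left untouched.  */
  return nunroll;
}
```

For example, a 10-insn loop with a generic factor of 8 is capped at 2, an 11-insn loop gets 0, and with the flag off the factor of 8 is returned unchanged.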

Re: [PATCH][vect] PR 92351: When peeling for alignment make alignment of epilogues unknown

2019-11-07 Thread Andre Vieira (lists)





On 07/11/2019 14:00, Richard Biener wrote:
> On Thu, 7 Nov 2019, Andre Vieira (lists) wrote:
>
>> Hi,
>>
>> PR92351 reports a bug in which a wrongly aligned load is generated for an
>> epilogue of a main loop for which we peeled for alignment.  There is no way to
>> guarantee that epilogue data accesses are aligned when the main loop is
>> peeling for alignment.
>>
>> I also had to split vect-peel-2.c as there were scans there for the number of
>> unaligned accesses that were vectorized, thanks to this change that now
>> depends on whether we are vectorizing the epilogue, which will also contain
>> unaligned accesses.  Since not all targets need to be able to vectorize the
>> epilogue I decided to disable epilogue vectorization for the version in which
>> we scan the dumps and add a version that attempts epilogue vectorization but
>> does not scan the dumps.
>>
>> Bootstrapped and regression tested on x86_64 and aarch64.
>>
>> Is this OK for trunk?
>
> @@ -938,6 +938,18 @@ vect_compute_data_ref_alignment (dr_vec_info *dr_info)
>      = exact_div (vect_calculate_target_alignment (dr_info), BITS_PER_UNIT);
>    DR_TARGET_ALIGNMENT (dr_info) = vector_alignment;
>
> +  /* If the main loop has peeled for alignment we have no way of knowing
> +     whether the data accesses in the epilogues are aligned.  We can't at
> +     compile time answer the question whether we have entered the main loop or
> +     not.  Fixes PR 92351.  */
> +  if (loop_vinfo)
> +    {
> +      loop_vec_info orig_loop_vinfo = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo);
> +      if (orig_loop_vinfo
> +	  && LOOP_VINFO_PEELING_FOR_ALIGNMENT (orig_loop_vinfo) != 0)
> +	return;
> +    }
>
> so I'm not sure this is the correct place to do the fixup.  Isn't the
> above done when analyzing the loops with different vector size/mode?
> So we don't yet know whether we analyze the loop as epilogue or
> not epilogue?  Looks like we at the moment always choose the
> very first loop we analyze successfully as "main" loop?
>
> So, can we do this instead in update_epilogue_loop_vinfo?  There
> we should also know whether we created the jump-around the
> main vect loop.



So we do know we are analyzing it as an epilogue, that is the only case 
orig_loop_vinfo is set.


The reason why we shouldn't do it in update_epilogue_loop_vinfo is that
the target might not know how to vectorize memory accesses for unaligned
memory for the given VF. Or maybe it does, but it is too expensive; I don't
know if we currently check that, though. I do not have an example, but
this is why I believe it would be better to do it during analysis. I
thought it had been you who alerted me to this, but maybe it was 
Sandiford, or maybe I dreamt it up ;)
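
To see why the epilogue's alignment cannot be known at compile time once the main loop peels for alignment, here is a toy model in plain C. It is only a sketch under loud assumptions: the helper names are made up, and the control flow is simplified so that a too-small trip count bypasses both the peeled prologue and the main loop. That is not a description of the vectorizer's actual CFG.

```c
#include <stdint.h>

/* Number of scalar iterations to peel so that ADDR becomes ALIGN-byte
   aligned, for elements of ELEM_SIZE bytes.  Assumes ADDR is a multiple
   of ELEM_SIZE and ALIGN is a multiple of ELEM_SIZE.  */
unsigned
peel_count (uintptr_t addr, unsigned elem_size, unsigned align)
{
  unsigned misalign = addr % align;
  return misalign ? (align - misalign) / elem_size : 0;
}

/* Address of the first element the epilogue touches in this toy model.
   VF is the main loop's vectorization factor, NITERS the scalar trip
   count, which is only known at run time.  */
uintptr_t
epilogue_start (uintptr_t addr, unsigned elem_size, unsigned align,
		unsigned vf, unsigned niters)
{
  unsigned npeel = peel_count (addr, elem_size, align);

  /* Simplifying assumption: too few iterations for peeling plus one
     vector iteration, so the epilogue starts at the original address.  */
  if (niters < npeel + vf)
    return addr;

  unsigned main_iters = (niters - npeel) / vf;
  return addr + (uintptr_t) elem_size * (npeel + main_iters * vf);
}
```

With 4-byte elements at address 0x1004, 16-byte alignment and VF 4, a trip count of 100 leaves the epilogue at a 16-byte aligned address, while a trip count of 5 leaves it at 0x1004, which is misaligned by 4 bytes. The compiler sees one epilogue for both cases, hence the unknown misalignment.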

Re: [PATCH][vect] Disable vectorization of epilogues for loops with SIMDUID set

2019-11-07 Thread Jakub Jelinek

On Thu, Nov 07, 2019 at 02:26:29PM +, Andre Vieira (lists) wrote:
> 2019-11-07  Andre Vieira  
> 
> * tree-vect-loop.c (vect_analyze_loop): Disable epilogue
> vectorization for loops with SIMDUID set.  Enable epilogue
> vectorization for loops with SIMDLEN set after finding a main
> loop with a VF that matches it.
> 
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index dfa087ebb2cf01a5d21da0a921f8b6fc3d691ce9..22550ca2d6c56cce201ea422bfae5472a0d85f3a 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -2455,11 +2455,15 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
>   delete loop_vinfo;
>  
> /* Only vectorize epilogues if PARAM_VECT_EPILOGUES_NOMASK is
> -  enabled, this is not a simd loop and it is the innermost loop.  */
> -   vect_epilogues = (!loop->simdlen
> +  enabled, SIMDUID is not set, it is the innermost loop and we have
> +  either already found the loop's SIMDLEN or there was no SIMDLEN to
> +  begin with.
> +  TODO: Enable epilogue vectorization for loops with SIMDUID set.  */
> +   vect_epilogues = (!simdlen
>   && loop->inner == NULL
>   && PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK)
>   && LOOP_VINFO_PEELING_FOR_NITER (first_loop_vinfo)
> + && !loop->simduid
>   /* For now only allow one epilogue loop.  */
>   && first_loop_vinfo->epilogue_vinfos.is_empty ());

LGTM.

Jakub

Re: [PATCH][vect] Disable vectorization of epilogues for loops with SIMDUID set

2019-11-07 Thread Andre Vieira (lists)


Hi,

I rebased the patch on top of Richard Sandiford's patches. With his fixes
I can now allow vectorization of epilogues after we match simdlen.
This will, however, not turn on epilogue vectorization in cases where we
specify a desired simdlen that is never matched.  This would require
more work, as before simdlen is matched we would need to analyze each
vector_size after creating a "first_loop_vinfo" twice: once as an
epilogue (in case we never match simdlen) and once as a main
loop (in case simdlen would match its VF).  Maybe there is a different
way of doing it but I don't see it right now.

Bootstrapped and regression tested (also ran libgomp tests) for x86_64
and aarch64. Currently libgomp has 5 failures for aarch64; these are all
openacc tests. The first one I looked at is due to a reduction seemingly
performing too many iterations when defining '$acc parallel
vector_length(vl)'. I am looking into it.


Is this OK for trunk?

Cheers,
Andre

gcc/ChangeLog:
2019-11-07  Andre Vieira  

* tree-vect-loop.c (vect_analyze_loop): Disable epilogue
vectorization for loops with SIMDUID set.  Enable epilogue
vectorization for loops with SIMDLEN set after finding a main
loop with a VF that matches it.

On 05/11/2019 07:16, Jakub Jelinek wrote:
> On Tue, Nov 05, 2019 at 08:07:53AM +0100, Richard Biener wrote:
>>> I was using loop->simdlen to detect whether it was a SIMD loop and I don't
>>> believe that was correct, as can be witnessed by the mass failures in libgomp.
>>> My apologies for not running this, didn't think of it!
>>>
>>> I found that these were failing because we do not handle vectorization of
>>> epilogues correctly when SIMDUID is set. For now Jakub and I agreed to disable
>>> epilogue vectorization for loops where SIMDUID is set until we have fixed
>>> this. See further comments inline.
>>>
>>> I bootstrapped it on aarch64 and x86_64, ran libgomp on both.
>>>
>>> This OK for trunk?
>>
>> OK.  Can you remove the simdlen == 0 check as a followup?
>
> Yeah, exactly, I wanted to ask what the point of the simdlen == 0 check is.
> All a non-zero simdlen says is a user assertion that certain inter-loop
> dependencies don't exist.
>
> Jakub

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index dfa087ebb2cf01a5d21da0a921f8b6fc3d691ce9..22550ca2d6c56cce201ea422bfae5472a0d85f3a 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2455,11 +2455,15 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
 	delete loop_vinfo;
 
 	  /* Only vectorize epilogues if PARAM_VECT_EPILOGUES_NOMASK is
-	 enabled, this is not a simd loop and it is the innermost loop.  */
-	  vect_epilogues = (!loop->simdlen
+	 enabled, SIMDUID is not set, it is the innermost loop and we have
+	 either already found the loop's SIMDLEN or there was no SIMDLEN to
+	 begin with.
+	 TODO: Enable epilogue vectorization for loops with SIMDUID set.  */
+	  vect_epilogues = (!simdlen
 			&& loop->inner == NULL
 			&& PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK)
 			&& LOOP_VINFO_PEELING_FOR_NITER (first_loop_vinfo)
+			&& !loop->simduid
 			/* For now only allow one epilogue loop.  */
 			&& first_loop_vinfo->epilogue_vinfos.is_empty ());
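
For reference, the updated condition can be read as a plain predicate. This is a sketch only: struct toy_loop_state and its fields are hypothetical stand-ins for the state consulted in vect_analyze_loop, and the meaning given to simdlen (non-zero while a requested SIMDLEN is still unmatched) is an assumption based on the comment in the hunk above.

```c
#include <stdbool.h>

/* Hypothetical flattened view of the state the condition looks at.  */
struct toy_loop_state
{
  unsigned simdlen;		/* Assumed non-zero while a requested
				   SIMDLEN has not been matched yet.  */
  bool has_inner_loop;		/* loop->inner != NULL  */
  bool epilogues_nomask;	/* PARAM_VECT_EPILOGUES_NOMASK  */
  bool peeling_for_niter;	/* LOOP_VINFO_PEELING_FOR_NITER  */
  bool has_simduid;		/* loop->simduid != NULL  */
  unsigned n_epilogues;		/* Epilogue loop_vinfos recorded so far.  */
};

/* Same conjunction as in the hunk, in the same order.  */
bool
may_vectorize_epilogue (const struct toy_loop_state *s)
{
  return (s->simdlen == 0
	  && !s->has_inner_loop
	  && s->epilogues_nomask
	  && s->peeling_for_niter
	  && !s->has_simduid
	  /* For now only one epilogue loop is allowed.  */
	  && s->n_epilogues == 0);
}
```

The two changes relative to the old condition are visible in the first and fifth terms: a matched (or absent) SIMDLEN no longer blocks epilogues, while any SIMDUID does.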

Re: introduce -fcallgraph-info option

2019-11-07 Thread Richard Biener

On Thu, 7 Nov 2019, Alexandre Oliva wrote:

> On Nov  7, 2019, Richard Biener  wrote:
> 
> > So how's -dumpbase handled?
> 
> It's part of the gcc driver interface, and lto-wrapper passes it to gcc
> for -fltrans compilations.  -auxbase isn't, so one of the alternatives I
> suggested involved our taking auxbase from dumpdir & dumpbase for
> -fltrans compilations.  The other alternatives amounted to exposing
> auxdir or auxbase in the driver interface, so that lto-wrapper could
> specify them explicitly.

Both variants sound reasonable to me (also raises the question why
we have both -dumpbase and -auxbase ...)

Richard.

Re: [PATCH][vect] PR 92351: When peeling for alignment make alignment of epilogues unknown

2019-11-07 Thread Richard Biener

On Thu, 7 Nov 2019, Andre Vieira (lists) wrote:

> Hi,
> 
> PR92351 reports a bug in which a wrongly aligned load is generated for an
> epilogue of a main loop for which we peeled for alignment.  There is no way to
> guarantee that epilogue data accesses are aligned when the main loop is
> peeling for alignment.
> 
> I also had to split vect-peel-2.c as there were scans there for the number of
> unaligned accesses that were vectorized, thanks to this change that now
> depends on whether we are vectorizing the epilogue, which will also contain
> unaligned accesses.  Since not all targets need to be able to vectorize the
> epilogue I decided to disable epilogue vectorization for the version in which
> we scan the dumps and add a version that attempts epilogue vectorization but
> does not scan the dumps.
> 
> Bootstrapped and regression tested on x86_64 and aarch64.
> 
> Is this OK for trunk?

@@ -938,6 +938,18 @@ vect_compute_data_ref_alignment (dr_vec_info *dr_info)
     = exact_div (vect_calculate_target_alignment (dr_info), BITS_PER_UNIT);
   DR_TARGET_ALIGNMENT (dr_info) = vector_alignment;
 
+  /* If the main loop has peeled for alignment we have no way of knowing
+     whether the data accesses in the epilogues are aligned.  We can't at
+     compile time answer the question whether we have entered the main loop or
+     not.  Fixes PR 92351.  */
+  if (loop_vinfo)
+    {
+      loop_vec_info orig_loop_vinfo = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo);
+      if (orig_loop_vinfo
+	  && LOOP_VINFO_PEELING_FOR_ALIGNMENT (orig_loop_vinfo) != 0)
+	return;
+    }

so I'm not sure this is the correct place to do the fixup.  Isn't the
above done when analyzing the loops with different vector size/mode?
So we don't yet know whether we analyze the loop as epilogue or
not epilogue?  Looks like we at the moment always choose the
very first loop we analyze successfully as "main" loop?

So, can we do this instead in update_epilogue_loop_vinfo?  There
we should also know whether we created the jump-around the
main vect loop.

> In the future I would like to look at allowing for misalignment analysis for
> cases in which both the number of iterations and iterations to peel are known
> at compile time, as in that case we shouldn't ever be skipping the main loop
> as we shouldn't be generating it.
> 
> gcc/ChangeLog:
> 2019-11-07  Andre Vieira  
> 
> * tree-vect-data-refs.c (vect_compute_data_ref_alignment): When we are
> peeling the main loop for alignment, make sure to set the misalignment
> of the epilogue's data references to DR_MISALIGNMENT_UNKNOWN.
> 
> gcc/testsuite/ChangeLog:
> 2019-11-07  Andre Vieira  
> 
> * gcc.dg/vect/vect-peel-2.c: Disable epilogue vectorization and
> split the source of this test to...
> * gcc.dg/vect/vect-peel-2-src.c: ... This.
> * gcc.dg/vect/vect-peel-2-epilogues.c: New test.
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Re: [PATCH][AArch64] Implement Armv8.5-A memory tagging (MTE) intrinsics

2019-11-07 Thread Dennis Zhang

Hi Kyrill,

I have rebased the patch on top of current trunk.
For resolve_overloaded, I redefined my memtag overloading function to 
fit the latest resolve_overloaded_builtin interface.

Regression tested again and survived for aarch64-none-linux-gnu.

Cheers
Dennis

The ChangeLog is updated as follows:

gcc/ChangeLog:

2019-11-07  Dennis Zhang  

* config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add
AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
(aarch64_init_memtag_builtins): New.
(AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
(aarch64_general_init_builtins): Call aarch64_init_memtag_builtins.
(aarch64_expand_builtin_memtag): New.
(aarch64_general_expand_builtin): Call aarch64_expand_builtin_memtag.
(AARCH64_BUILTIN_SUBCODE): New macro.
(aarch64_resolve_overloaded_memtag): New.
(aarch64_resolve_overloaded_builtin_general): New hook. Call
aarch64_resolve_overloaded_memtag to handle overloaded MTE builtins.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_MEMORY_TAGGING when enabled.
(aarch64_resolve_overloaded_builtin): Call
aarch64_resolve_overloaded_builtin_general.
* config/aarch64/aarch64-protos.h
(aarch64_resolve_overloaded_builtin_general): New declaration.
* config/aarch64/aarch64.h (AARCH64_ISA_MEMTAG): New macro.
(TARGET_MEMTAG): Likewise.
* config/aarch64/aarch64.md (define_c_enum "unspec"): Add
UNSPEC_GEN_TAG, UNSPEC_GEN_TAG_RND, and UNSPEC_TAG_SPACE.
(irg, gmi, subp, addg, ldg, stg): New instructions.
* config/aarch64/arm_acle.h (__arm_mte_create_random_tag): New macro.
(__arm_mte_exclude_tag, __arm_mte_increment_tag): Likewise.
(__arm_mte_ptrdiff, __arm_mte_set_tag, __arm_mte_get_tag): Likewise.
* config/aarch64/predicates.md (aarch64_memtag_tag_offset): New.
(aarch64_granule16_uimm6, aarch64_granule16_simm9): New.
* config/arm/types.md (memtag): New.
* doc/invoke.texi (-memtag): Update description.

gcc/testsuite/ChangeLog:

2019-11-07  Dennis Zhang  

* gcc.target/aarch64/acle/memtag_1.c: New test.
* gcc.target/aarch64/acle/memtag_2.c: New test.
* gcc.target/aarch64/acle/memtag_3.c: New test.

On 04/11/2019 16:40, Kyrill Tkachov wrote:
> Hi Dennis,
> 
> On 10/17/19 11:03 AM, Dennis Zhang wrote:
>> Hi,
>>
>> Arm Memory Tagging Extension (MTE) is published with Armv8.5-A.
>> It can be used for spatial and temporal memory safety detection and
>> lightweight lock and key system.
>>
>> This patch enables new intrinsics leveraging MTE instructions to
>> implement functionalities of creating tags, setting tags, reading tags,
>> and manipulating tags.
>> The intrinsics are part of Arm ACLE extension:
>> https://developer.arm.com/docs/101028/latest/memory-tagging-intrinsics
>> The MTE ISA specification can be found at
>> https://developer.arm.com/docs/ddi0487/latest chapter D6.
>>
>> Bootstrapped and regtested for aarch64-none-linux-gnu.
>>
>> Please help to check if it's OK for trunk.
>>
> 
> This looks mostly ok to me but for further review this needs to be 
> rebased on top of current trunk as there are some conflicts with the SVE 
> ACLE changes that recently went in. Most conflicts looks trivial to 
> resolve but one that needs more attention is the definition of the 
> TARGET_RESOLVE_OVERLOADED_BUILTIN hook.
> 
> Thanks,
> 
> Kyrill
> 
>> Many Thanks
>> Dennis
>>
>> gcc/ChangeLog:
>>
>> 2019-10-16  Dennis Zhang  
>>
>>     * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add
>>     AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
>>     AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
>>     AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
>>     AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
>>     (aarch64_init_memtag_builtins): New.
>>     (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
>>     (aarch64_general_init_builtins): Call aarch64_init_memtag_builtins.
>>     (aarch64_expand_builtin_memtag): New.
>>     (aarch64_general_expand_builtin): Call aarch64_expand_builtin_memtag.
>>     (AARCH64_BUILTIN_SUBCODE): New macro.
>>     (aarch64_resolve_overloaded_memtag): New.
>>     (aarch64_resolve_overloaded_builtin): New hook. Call
>>     aarch64_resolve_overloaded_memtag to handle overloaded MTE builtins.
>>     * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
>>     __ARM_FEATURE_MEMORY_TAGGING when enabled.
>>     * config/aarch64/aarch64-protos.h (aarch64_resolve_overloaded_builtin):
>>     Add declaration.
>>

Free some more stuff in free_lang_data

2019-11-07 Thread Jan Hubicka

Hi,
as every year, I went through the reasons why types of the same ODR name are
not merged in Firefox streaming.  Here are a few problems I caught.  The
remaining issues I understand are
 1) ODR violations (which is OK of course)
 2) keyed vtables: sometimes the vtable decl is weak and sometimes it is
    external (I have a WIP patch for ipa-devirt to stream BINFO_VTABLE off
    the main stream, but at least for Firefox it does not have any
    dramatic effect on the size of the stream)
 3) differences in attribute lists (some attributes, like aligned, do
    not make sense on incomplete types, but our FEs let us add them
    there, so I think there is no canonical incomplete variant here)
 4) TYPELESS storage differences
 5) if a type is not merged, all types referring to it via
    TYPE_CONTEXT are not merged either.  This still causes propagation
    from one type to another

There are still some cases which I did not track down, but we have only
a couple hundred unmerged types, so the situation seems to be mostly under
control.

There is about 700MB of trees in the global stream for Firefox + 200MB of
in_decl_state vectors and I am not sure if there are easy ways to cut it
down.

Kind              Nodes    Bytes

constructors        12k     308k
vecs                15k    1593k
refs                55k    2702k
binfos              63k    6670k
constants          253k    9760k
random kinds       654k      25M
exprs              744k      23M
identifiers       1217k      47M
decls             1583k     293M
types             1822k     298M

Total             6423k     709M


union_type          4335
real_cst            7244
mem_ref             8429
array_type           10k
enumeral_type        10k
constructor          12k
tree_vec             15k
array_ref            15k
nop_expr             19k
component_ref        30k
pointer_plus_expr    41k
tree_binfo           63k
var_decl             78k
function_type       106k
integer_cst         110k
reference_type      125k
string_cst          132k
type_decl           203k
record_type         335k
field_decl          355k
method_type         583k
pointer_type        642k
tree_list           654k
addr_expr           683k
function_decl       941k
identifier_node    1217k




Bootstrapped/regtested x86_64-linux, OK?

Honza

* tree.c (fld_incomplete_type_of): Clear TYPE_FINAL_P, TYPE_EMPTY_P,
ENUM_IS_OPAQUE and ENUM_IS_SCOPED.
(free_lang_data_in_binfo): Clear TREE_PUBLIC in BINFO.
(free_lang_data_in_type): Clear ENUM_IS_OPAQUE and ENUM_IS_SCOPED.
Index: tree.c
===
--- tree.c  (revision 277796)
+++ tree.c  (working copy)
@@ -5383,9 +5387,15 @@ fld_incomplete_type_of (tree t, class fr
  TYPE_TYPELESS_STORAGE (copy) = 0;
  TYPE_FIELDS (copy) = NULL;
  TYPE_BINFO (copy) = NULL;
+ TYPE_FINAL_P (copy) = 0;
+ TYPE_EMPTY_P (copy) = 0;
}
  else
-   TYPE_VALUES (copy) = NULL;
+   {
+ TYPE_VALUES (copy) = NULL;
+ ENUM_IS_OPAQUE (copy) = 0;
+ ENUM_IS_SCOPED (copy) = 0;
+   }
 
  /* Build copy of TYPE_DECL in TYPE_NAME if necessary.
 This is needed for ODR violation warnings to come out right (we
@@ -5468,6 +5478,7 @@ free_lang_data_in_binfo (tree binfo)
   BINFO_INHERITANCE_CHAIN (binfo) = NULL_TREE;
   BINFO_SUBVTT_INDEX (binfo) = NULL_TREE;
   BINFO_VPTR_FIELD (binfo) = NULL_TREE;
+  TREE_PUBLIC (binfo) = 0;
 
   FOR_EACH_VEC_ELT (*BINFO_BASE_BINFOS (binfo), i, t)
 free_lang_data_in_binfo (t);
@@ -5569,6 +5580,8 @@ free_lang_data_in_type (tree type, class
 {
   if (TREE_CODE (type) == ENUMERAL_TYPE)
{
+ ENUM_IS_OPAQUE (type) = 0;
+ ENUM_IS_SCOPED (type) = 0;
  /* Type values are used only for C++ ODR checking.  Drop them
 for all type variants and non-ODR types.
 For ODR types the data is freed in free_odr_warning_data.  */

Re: [patch][avr] PR92055: Add switches to enable 64-bit [long] double.

2019-11-07 Thread Georg-Johann Lay


On 07.11.19 at 13:49, Martin Liška wrote:
> On 11/7/19 1:39 PM, Georg-Johann Lay wrote:
>> On 07.11.19 at 10:41, Martin Liška wrote:
>>> Hello.
>>>
>>> I've noticed quite some GNU coding style violations with your patch.
>>> Please next time, use something like:
>>>
>>> $ git diff HEAD~ > /tmp/patch && ./contrib/check_GNU_style.py /tmp/patch
>>>
>>> Thanks,
>>> Martin
>>
>> hm, I am actually using GNU style with Emacs...
>>
>> You mean the lines > 80 chars in config.gcc?
>>
>> I assumed that is no issue because there are already quite some lines
>> that don't follow the < 80 rule.
>
> That's fine. I'm mainly talking about:
>
> === ERROR type #1: blocks of 8 spaces should be replaced with tabs (45 error(s)) ===
> gcc/common/config/avr/avr-common.c:78:0:   const struct cl_decoded_option *decoded, location_t loc)
> gcc/common/config/avr/avr-common.c:86:0:{
> gcc/common/config/avr/avr-common.c:88:0:  error_at (loc, "option %<-mdouble=64%> is only available if "
> gcc/common/config/avr/avr-common.c:89:0:    "configured %<--with-double={64|64,32|32,64}%>");
> gcc/common/config/avr/avr-common.c:91:0:  opts->x_avr_long_double = 64;
> gcc/common/config/avr/avr-common.c:92:0:}
> gcc/common/config/avr/avr-common.c:94:0:{
> ...
>
> Martin


My intention was to avoid a mixup of TABs and spaces, because
the avr backend is indented with spaces. So the indentation picks
up the style from the context (just like you would do it in Python
to avoid the dreaded mixing of tabs and spaces). Tabifying the complete
sources is also something which I didn't consider, because that
makes porting much harder...

Johann
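
The check behind "ERROR type #1" is simple enough to illustrate. The following is a sketch, not the code of contrib/check_GNU_style.py: it assumes the rule is "a line must not contain a run of eight consecutive spaces", and the function name first_space_block is made up.

```c
#include <string.h>

/* Return the 0-based column of the first run of eight consecutive
   spaces in LINE, or -1 if there is none.  */
int
first_space_block (const char *line)
{
  const char *hit = strstr (line, "        ");	/* Eight spaces.  */
  return hit ? (int) (hit - line) : -1;
}
```

A line indented with one TAB passes, while the same line indented with eight spaces is reported at column 0, which is why space-only indentation in a file triggers one error per deeply indented line.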

Re: [patch][avr] PR92055: Add switches to enable 64-bit [long] double.

2019-11-07 Thread Martin Liška


On 11/7/19 1:39 PM, Georg-Johann Lay wrote:
> On 07.11.19 at 10:41, Martin Liška wrote:
>> Hello.
>>
>> I've noticed quite some GNU coding style violations with your patch.
>> Please next time, use something like:
>>
>> $ git diff HEAD~ > /tmp/patch && ./contrib/check_GNU_style.py /tmp/patch
>>
>> Thanks,
>> Martin
>
> hm, I am actually using GNU style with Emacs...
>
> You mean the lines > 80 chars in config.gcc?
>
> I assumed that is no issue because there are already quite some lines
> that don't follow the < 80 rule.

That's fine. I'm mainly talking about:

=== ERROR type #1: blocks of 8 spaces should be replaced with tabs (45 error(s)) ===
gcc/common/config/avr/avr-common.c:78:0:   const struct cl_decoded_option *decoded, location_t loc)
gcc/common/config/avr/avr-common.c:86:0:{
gcc/common/config/avr/avr-common.c:88:0:  error_at (loc, "option %<-mdouble=64%> is only available if "
gcc/common/config/avr/avr-common.c:89:0:"configured %<--with-double={64|64,32|32,64}%>");
gcc/common/config/avr/avr-common.c:91:0:  opts->x_avr_long_double = 64;
gcc/common/config/avr/avr-common.c:92:0:}
gcc/common/config/avr/avr-common.c:94:0:{
...

Martin



Johann

Re: [PATCH 13/X] [libsanitizer][options] Add hwasan flags and argument parsing

2019-11-07 Thread Matthew Malcomson

On 05/11/2019 13:11, Andrey Konovalov wrote:
> On Tue, Nov 5, 2019 at 12:34 PM Matthew Malcomson
>  wrote:
>>
>> NOTE:
>> --
>> I have defined a new macro of __SANITIZE_HWADDRESS__ that gets
>> automatically defined when compiling with hwasan.  This is analogous to
>> __SANITIZE_ADDRESS__ which is defined when compiling with asan.
>>
>> Users in the kernel have expressed an interest in using
>> __SANITIZE_ADDRESS__ for both
>> (https://lists.infradead.org/pipermail/linux-arm-kernel/2019-October/690703.html).
>>
>> One approach to do this could be to define __SANITIZE_ADDRESS__ with
>> different values depending on whether we are compiling with hwasan or
>> asan.
>>
>> Using __SANITIZE_ADDRESS__ for both means that code like the kernel
>> which wants to treat the two sanitizers as alternate implementations of
>> the same thing gets that automatically.
>>
>> My preference is to use __SANITIZE_HWADDRESS__ since that means any
>> existing code will not be predicated on this (and hence I guess less
>> surprises), but would appreciate feedback on this given the point above.
> 
> +Evgenii Stepanov
> 
> (A repost from my answer from the mentioned thread):
> 
>> Similarly, I'm thinking I'll add no_sanitize_hwaddress as the hwasan
>> equivalent of no_sanitize_address, which will require an update in the
>> kernel given it seems you want KASAN to be used the same whether using
>> tags or not.
> 
> We have intentionally reused the same macros to simplify things. Is
> there any reason to use separate macros for GCC? Are there places
> where we need to use specifically no_sanitize_hwaddress and
> __SANITIZE_HWADDRESS__, but not no_sanitize_address and
> __SANITIZE_ADDRESS__?
> 
> 

I've just looked through some open source repositories (via github 
search) that used the existing __SANITIZE_ADDRESS__ macro.

There are a few repos that would want to use a feature macro for hwasan 
or asan in exactly the same way, but of the 31 truly different uses I 
found, 11 look like they would need to distinguish between hwasan and 
asan (for 4 further uses I couldn't easily tell).

NOTE
- This is a count of unique uses, ignoring those repos which use a file 
from another repo.
- I'm just giving links to the first of the relevant kind that I found, 
not putting effort into finding the "canonical" source of each repository.


Places that need distinction (and their reasons):

There are quite a few that use the ASAN_POISON_MEMORY_REGION and 
ASAN_UNPOISON_MEMORY_REGION macros to poison/unpoison memory themselves. 
  This abstraction doesn't quite make sense in a hwasan environment, as 
there is not really a "poisoned/unpoisoned" concept.

https://github.com/laurynas-biveinis/unodb
https://github.com/darktable-org/rawspeed
https://github.com/MariaDB/server
https://github.com/ralfbrown/framepac-ng
https://github.com/peters/aom
https://github.com/pspacek/knot-resolver-docker-fix
https://github.com/harikrishnan94/sheap


Some use it to record their compilation "type" as `-fsanitize=address`
https://github.com/wallix/redemption

Or to decide to set the environment variable ASAN_OPTIONS
https://github.com/dephonatine/VBox5.2.18

Others worry about stack space due to asan's redzones (hwasan has a much 
smaller stack memory overhead).
https://github.com/fastbuild/fastbuild
https://github.com/scylladb/seastar
(n.b. seastar has a lot more conditioned code that would be the same 
between asan and hwasan).


Each of these needs to know the difference between compiling with asan 
and hwasan, so I'm confident that having some way to determine that in 
the source code is a good idea.
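
As a rough sketch of what such a source-level distinction could look like: the snippet below assumes the separate __SANITIZE_HWADDRESS__ macro proposed in this thread; the enum and the function name are hypothetical and only serve the illustration.

```c
/* Hypothetical names, used only for this illustration.  */
enum sanitizer_kind
{
  SAN_NONE,
  SAN_ASAN,
  SAN_HWASAN
};

/* Reports which address sanitizer this translation unit was built with.
   __SANITIZE_HWADDRESS__ is the macro proposed in this thread;
   __SANITIZE_ADDRESS__ is the existing asan macro.  */
static enum sanitizer_kind
current_sanitizer (void)
{
#if defined (__SANITIZE_HWADDRESS__)
  return SAN_HWASAN;
#elif defined (__SANITIZE_ADDRESS__)
  return SAN_ASAN;
#else
  return SAN_NONE;
#endif
}
```

Code that treats both sanitizers alike can test for either macro, while code that, for example, sizes stacks for asan redzones can check for SAN_ASAN only.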


I also believe there could be code in the wild that would need to 
distinguish between hwasan and asan where the existence of tags could be 
problematic:

- Code already using the top-byte-ignore feature may be usable with 
asan but not with hwasan.
- Code that makes assumptions about pointer ordering (e.g. the autoconf 
program that looks for stack growth direction) could break on hwasan but 
not on asan.
- Code looking for the distance between two objects in memory would need 
to account for tags in pointers.
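
To make the last point concrete, here is a minimal sketch of a tag-aware distance computation. It assumes 64-bit addresses with the tag stored in the top byte (as with AArch64 top-byte-ignore); the helper names `untag` and `address_distance` are hypothetical.

```c
#include <stdint.h>

/* Assumption: the tag occupies the top 8 bits of a 64-bit address.  */
#define TAG_SHIFT 56

/* Clears the tag byte so that only the address bits remain.  */
static inline uint64_t
untag (uint64_t addr)
{
  return addr & ((UINT64_C (1) << TAG_SHIFT) - 1);
}

/* Signed distance between two addresses, ignoring their tags.  */
static inline int64_t
address_distance (uint64_t a, uint64_t b)
{
  return (int64_t) untag (a) - (int64_t) untag (b);
}
```

Without the masking step, two objects that sit next to each other in memory but carry different tags would appear to be far apart, and their relative order could even flip.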


Hence I think this distinction is needed.

Matthew

Re: [RFC PATCH] Extend the simd function attribute

2019-11-07 Thread Szabolcs Nagy

On 05/11/2019 18:42, Joseph Myers wrote:
> On Tue, 5 Nov 2019, Szabolcs Nagy wrote:
>> GCC currently supports two ways to declare the availability of vector
>> variants of a scalar function:
>>
>>   #pragma omp declare simd
>>   void f (void);
>>
>> and
>>
>>   __attribute__ ((simd))
>>   void f (void);
>>
>> However neither can declare unambiguously a single vector variant, only
>> a set of vector variants which may change in the future as new vector
>> architectures or vector call ABIs are introduced. So these mechanisms
>> are not suitable for libraries which must not declare more than what
>> they provide.
> 
> Rather, these mechanisms must imply a fixed set of variants defined at the 
> time the ABI was set, with any new variants added in future requiring some 
> different attribute / pragma to be specified, to avoid breaking existing 
> libraries and headers.  That's a requirement I made clear in the glibc 
> review of the original libmvec addition, and is why the x86_64 vector ABI 
> 
>  
> says "Compiler implementations must not generate calls to version of other 
> ISAs unless some non-standard pragma or clause is used to declare those 
> other versions are available.".

OK, I guess similar text can be added to the aarch64 vector ABI document,
so that the meaning of omp declare simd is stable.

And then we only need the proposed mechanism to be able to specify a
smaller set of variants, or for future vector ABIs.

Re: [patch][avr] PR92055: Add switches to enable 64-bit [long] double.

2019-11-07 Thread Georg-Johann Lay


Am 07.11.19 um 10:41 schrieb Martin Liška:

Hello.

I've noticed quite some GNU coding style violations with your patch.
Please next time, use something like:

$ git diff HEAD~ > /tmp/patch && ./contrib/check_GNU_style.py /tmp/patch

Thanks,
Martin



hm, I am actually using GNU style with Emacs...

You mean the lines > 80 chars in config.gcc?

I assumed that is no issue because there are already quite some lines 
that don't follow the < 80 rule.


Johann

[PATCH 7/7] Fix test-suite fallout.

2019-11-07 Thread Martin Liska


gcc/testsuite/ChangeLog:

2019-11-06  Martin Liska  

* gcc.dg/completion-3.c: Append = to all expected
results and sort expected output.
* gcc.dg/pr83620.c: Update error message.
* gcc.dg/spellcheck-params-2.c: Likewise.
* gcc.dg/spellcheck-params.c: Likewise.
* gcc.misc-tests/help.exp: Update expected output.
---
 gcc/testsuite/gcc.dg/completion-3.c| 16 
 gcc/testsuite/gcc.dg/pr83620.c |  2 +-
 gcc/testsuite/gcc.dg/spellcheck-params-2.c |  2 +-
 gcc/testsuite/gcc.dg/spellcheck-params.c   |  2 +-
 gcc/testsuite/gcc.misc-tests/help.exp  |  4 +---
 5 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/completion-3.c b/gcc/testsuite/gcc.dg/completion-3.c
index 3c4a89ffdd5..a240993dd45 100644
--- a/gcc/testsuite/gcc.dg/completion-3.c
+++ b/gcc/testsuite/gcc.dg/completion-3.c
@@ -2,12 +2,12 @@
 /* { dg-options "--completion=--param=asan-" } */
 
 /* { dg-begin-multiline-output "" }
---param=asan-stack
---param=asan-instrument-allocas
---param=asan-globals
---param=asan-instrument-writes
---param=asan-instrument-reads
---param=asan-memintrin
---param=asan-use-after-return
---param=asan-instrumentation-with-call-threshold
+--param=asan-globals=
+--param=asan-instrument-allocas=
+--param=asan-instrument-reads=
+--param=asan-instrument-writes=
+--param=asan-instrumentation-with-call-threshold=
+--param=asan-memintrin=
+--param=asan-stack=
+--param=asan-use-after-return=
{ dg-end-multiline-output "" } */
diff --git a/gcc/testsuite/gcc.dg/pr83620.c b/gcc/testsuite/gcc.dg/pr83620.c
index e0e44a3b421..cf6eb91197f 100644
--- a/gcc/testsuite/gcc.dg/pr83620.c
+++ b/gcc/testsuite/gcc.dg/pr83620.c
@@ -1,7 +1,7 @@
 /* PR rtl-optimization/86620 */
 /* { dg-do compile } */
 /* { dg-options "-O2 -flive-range-shrinkage --param=max-sched-ready-insns=0" } */
-/* { dg-error "minimum value of parameter 'max-sched-ready-insns' is 1" "" { target *-*-* } 0 } */
+/* { dg-error "argument to '--param=max-sched-ready-insns=' is not between 1 and 65536" "" { target *-*-* } 0 } */
 
 void
 foo (void)
diff --git a/gcc/testsuite/gcc.dg/spellcheck-params-2.c b/gcc/testsuite/gcc.dg/spellcheck-params-2.c
index 8187de43481..0f56241f48f 100644
--- a/gcc/testsuite/gcc.dg/spellcheck-params-2.c
+++ b/gcc/testsuite/gcc.dg/spellcheck-params-2.c
@@ -1,4 +1,4 @@
 /* { dg-do compile } */
 /* { dg-options "--param does-not-resemble-anything=42" } */
-/* { dg-error "invalid '--param' name 'does-not-resemble-anything'"  "" { target *-*-* } 0 } */
+/* { dg-error "unrecognized command-line option '--param=does-not-resemble-anything=42'"  "" { target *-*-* } 0 } */
 
diff --git a/gcc/testsuite/gcc.dg/spellcheck-params.c b/gcc/testsuite/gcc.dg/spellcheck-params.c
index 01e1343ab9e..4010a5df0d2 100644
--- a/gcc/testsuite/gcc.dg/spellcheck-params.c
+++ b/gcc/testsuite/gcc.dg/spellcheck-params.c
@@ -1,4 +1,4 @@
 /* { dg-do compile } */
 /* { dg-options "--param max-early-inliner-iteration=3" } */
-/* { dg-error "invalid '--param' name 'max-early-inliner-iteration'; did you mean 'max-early-inliner-iterations'?"  "" { target *-*-* } 0 } */
+/* { dg-error "unrecognized command-line option '--param=max-early-inliner-iteration=3'; did you mean '--param=max-early-inliner-iterations='?"  "" { target *-*-* } 0 } */
 
diff --git a/gcc/testsuite/gcc.misc-tests/help.exp b/gcc/testsuite/gcc.misc-tests/help.exp
index 4bb359f698d..60e794b1f46 100644
--- a/gcc/testsuite/gcc.misc-tests/help.exp
+++ b/gcc/testsuite/gcc.misc-tests/help.exp
@@ -64,7 +64,7 @@ check_for_options c "-v --help" "" {are likely to\n  -std} ""
 check_for_options c "--help=optimizers" "-O" "  -g  " ""
 check_for_options c "--help=params" "maximum number of" "-Wunsafe-loop-optimizations" ""
 check_for_options_with_filter c "--help=params" \
-"^The --param option recognizes the following as parameters:$" "" {[^.]$} ""
+"^The following options control parameters:$" "" {[^.]$} ""
 check_for_options c "--help=C" "-ansi" "-gnatO" ""
 check_for_options c {--help=C++} {-std=c\+\+} "-gnatO" ""
 check_for_options c "--help=common" "-dumpbase" "-gnatO" ""
@@ -133,8 +133,6 @@ check_for_options c "--help=warnings,^joined" \
 "^ +-Wtrigraphs" "^ +-Wformat=" ""
 check_for_options c "--help=joined,separate" \
 "^ +-I" "" ""
-check_for_options c "--help=^joined,separate" \
-"^ +--param " "" ""
 check_for_options c "--help=joined,^separate" \
 "^ +--help=" "" ""
 check_for_options c "--help=joined,undocumented" "" "" ""

[PATCH 5/7] Remove last leftover usage of params* files.

2019-11-07 Thread Martin Liska


gcc/ChangeLog:

2019-11-06  Martin Liska  

* common.opt: Remove param_values.
* config/i386/i386-options.c (ix86_valid_target_attribute_p):
Remove finalize_options_struct.
* gcc.c (driver::decode_argv): Do not call global_init_params
and finish_params.
(driver::finalize): Do not call params_c_finalize
and finalize_options_struct.
* opt-suggestions.c (option_proposer::get_completions): Remove
special casing of params.
(option_proposer::find_param_completions): Remove.
(test_completion_partial_match): Update expected output.
* opt-suggestions.h: Remove find_param_completions.
* opts-common.c (add_misspelling_candidates): Add
--param with a space.
* opts.c (handle_param): Remove.
(init_options_struct): Remove init_options_struct and
similar calls.
(finalize_options_struct): Remove.
(common_handle_option): Use SET_OPTION_IF_UNSET.
* opts.h (finalize_options_struct): Remove.
* toplev.c (general_init): Do not call global_init_params.
(toplev::finalize): Do not call params_c_finalize and
finalize_options_struct.
---
 gcc/common.opt |  3 --
 gcc/config/i386/i386-options.c |  2 -
 gcc/gcc.c  |  8 
 gcc/opt-suggestions.c  | 58 ++-
 gcc/opt-suggestions.h  |  5 ---
 gcc/opts-common.c  | 11 ++
 gcc/opts.c | 71 ++
 gcc/opts.h |  1 -
 gcc/toplev.c   |  8 
 9 files changed, 27 insertions(+), 140 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 8c6acabb1fc..26b6c2ce9e1 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -63,9 +63,6 @@ int flag_complex_method = 1
 Variable
 bool flag_warn_unused_result = false
 
-Variable
-int *param_values
-
 ; Nonzero if we should write GIMPLE bytecode for link-time optimization.
 Variable
 int flag_generate_lto
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 1e3280d1bb9..c909f8ea1ed 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -1340,8 +1340,6 @@ ix86_valid_target_attribute_p (tree fndecl,
 	DECL_FUNCTION_SPECIFIC_OPTIMIZATION (fndecl) = new_optimize;
 }
 
-  finalize_options_struct (_options);
-
   return ret;
 }
 
diff --git a/gcc/gcc.c b/gcc/gcc.c
index 159ffe7cb53..539ded01ce6 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -7422,10 +7422,6 @@ driver::expand_at_files (int *argc, char ***argv) const
 void
 driver::decode_argv (int argc, const char **argv)
 {
-  /* Register the language-independent parameters.  */
-  global_init_params ();
-  finish_params ();
-
   init_opts_obstack ();
   init_options_struct (_options, _options_set);
 
@@ -10113,7 +10109,6 @@ void
 driver::finalize ()
 {
   env.restore ();
-  params_c_finalize ();
   diagnostic_finish (global_dc);
 
   is_cpp_driver = 0;
@@ -10134,9 +10129,6 @@ driver::finalize ()
   spec_machine = DEFAULT_TARGET_MACHINE;
   greatest_status = 1;
 
-  finalize_options_struct (_options);
-  finalize_options_struct (_options_set);
-
   obstack_free (, NULL);
   obstack_free (_obstack, NULL); /* in opts.c */
   obstack_free (_obstack, NULL);
diff --git a/gcc/opt-suggestions.c b/gcc/opt-suggestions.c
index 609e60bd20a..01ce331eb0e 100644
--- a/gcc/opt-suggestions.c
+++ b/gcc/opt-suggestions.c
@@ -64,32 +64,17 @@ option_proposer::get_completions (const char *option_prefix,
 
   size_t length = strlen (option_prefix);
 
-  /* Handle OPTION_PREFIX starting with "-param".  */
-  const char *prefix = "-param";
-  if (length >= strlen (prefix)
-  && strstr (option_prefix, prefix) == option_prefix)
-{
-  /* We support both '-param-xyz=123' and '-param xyz=123' */
-  option_prefix += strlen (prefix);
-  char separator = option_prefix[0];
-  option_prefix++;
-  if (separator == ' ' || separator == '=')
-	find_param_completions (separator, option_prefix, results);
-}
-  else
-{
-  /* Lazily populate m_option_suggestions.  */
-  if (!m_option_suggestions)
-	build_option_suggestions (option_prefix);
-  gcc_assert (m_option_suggestions);
+  /* Lazily populate m_option_suggestions.  */
+  if (!m_option_suggestions)
+build_option_suggestions (option_prefix);
+  gcc_assert (m_option_suggestions);
 
-  for (unsigned i = 0; i < m_option_suggestions->length (); i++)
-	{
-	  char *candidate = (*m_option_suggestions)[i];
-	  if (strlen (candidate) >= length
-	  && strstr (candidate, option_prefix) == candidate)
-	results.safe_push (concat ("-", candidate, NULL));
-	}
+  for (unsigned i = 0; i < m_option_suggestions->length (); i++)
+{
+  char *candidate = (*m_option_suggestions)[i];
+  if (strlen (candidate) >= length
+	  && strstr (candidate, option_prefix) == candidate)
+	results.safe_push (concat ("-", candidate, NULL));
 }

[PATCH 6/7] Remove set_default_param_value from documentation.

2019-11-07 Thread Martin Liska


gcc/ChangeLog:

2019-11-06  Martin Liska  

* common/common-target.def:
Do not mention set_default_param_value
and set_param_value.
* doc/tm.texi: Likewise.
---
 gcc/common/common-target.def | 6 ++
 gcc/doc/tm.texi  | 4 ++--
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/gcc/common/common-target.def b/gcc/common/common-target.def
index 41b7e704c2e..48096720e44 100644
--- a/gcc/common/common-target.def
+++ b/gcc/common/common-target.def
@@ -51,15 +51,13 @@ DEFHOOKPOD
 
 DEFHOOK
 (option_default_params,
-"Set target-dependent default values for @option{--param} settings, using\
- calls to @code{set_default_param_value}.",
+"Set target-dependent default values for @option{--param} settings.",
  void, (void),
  hook_void_void)
 
 DEFHOOK
 (option_validate_param,
-"Validate target-dependent value for @option{--param} settings, using\
- calls to @code{set_param_value}.",
+"Validate target-dependent value for @option{--param} settings.",
  bool, (int, int),
  default_option_validate_param)
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index cd9aed9874f..f6bc31bef65 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -759,11 +759,11 @@ Set target-dependent initial values of fields in @var{opts}.
 @end deftypefn
 
 @deftypefn {Common Target Hook} void TARGET_OPTION_DEFAULT_PARAMS (void)
-Set target-dependent default values for @option{--param} settings, using calls to @code{set_default_param_value}.
+Set target-dependent default values for @option{--param} settings.
 @end deftypefn
 
 @deftypefn {Common Target Hook} bool TARGET_OPTION_VALIDATE_PARAM (int, @var{int})
-Validate target-dependent value for @option{--param} settings, using calls to @code{set_param_value}.
+Validate target-dependent value for @option{--param} settings.
 @end deftypefn
 
 @defmac SWITCHABLE_TARGET

[PATCH 2/7] Include new generated gcc/params.opt file.

2019-11-07 Thread Martin Liska


gcc/ChangeLog:

2019-11-06  Martin Liska  

* Makefile.in: Include params.opt.
* flag-types.h (enum parloops_schedule_type): Add
parloops_schedule_type used in params.opt.
* params.opt: New file.
---
 gcc/Makefile.in  |   2 +-
 gcc/flag-types.h |  11 +
 gcc/params.opt   | 967 +++
 3 files changed, 979 insertions(+), 1 deletion(-)
 create mode 100644 gcc/params.opt

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 95f054c4d4f..ed47a346689 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -567,7 +567,7 @@ xm_include_list=@xm_include_list@
 xm_defines=@xm_defines@
 lang_checks=
 lang_checks_parallelized=
-lang_opt_files=@lang_opt_files@ $(srcdir)/c-family/c.opt $(srcdir)/common.opt
+lang_opt_files=@lang_opt_files@ $(srcdir)/c-family/c.opt $(srcdir)/common.opt $(srcdir)/params.opt
 lang_specs_files=@lang_specs_files@
 lang_tree_files=@lang_tree_files@
 target_cpu_default=@target_cpu_default@
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index b23d3a271f1..0c23aadefed 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -370,4 +370,15 @@ enum cf_protection_level
   CF_FULL = CF_BRANCH | CF_RETURN,
   CF_SET = 1 << 2
 };
+
+/* Parloops schedule type.  */
+enum parloops_schedule_type
+{
+  PARLOOPS_SCHEDULE_STATIC = 0,
+  PARLOOPS_SCHEDULE_DYNAMIC,
+  PARLOOPS_SCHEDULE_GUIDED,
+  PARLOOPS_SCHEDULE_AUTO,
+  PARLOOPS_SCHEDULE_RUNTIME
+};
+
 #endif /* ! GCC_FLAG_TYPES_H */
diff --git a/gcc/params.opt b/gcc/params.opt
new file mode 100644
index 000..7f2f8610c40
--- /dev/null
+++ b/gcc/params.opt
@@ -0,0 +1,967 @@
+; Parameter options of the compiler.
+
+; Copyright (C) 2019 Free Software Foundation, Inc.
+;
+; This file is part of GCC.
+;
+; GCC is free software; you can redistribute it and/or modify it under
+; the terms of the GNU General Public License as published by the Free
+; Software Foundation; either version 3, or (at your option) any later
+; version.
+;
+; GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+; WARRANTY; without even the implied warranty of MERCHANTABILITY or
+; FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+; for more details.
+;
+; You should have received a copy of the GNU General Public License
+; along with GCC; see the file COPYING3.  If not see
+; .
+
+; See the GCC internals manual (options.texi) for a description of this file's format.
+
+; Please try to keep this file in ASCII collating order.
+
+-param=align-loop-iterations=
+Common Joined UInteger Var(param_align_loop_iterations) Optimization Init(4) Param
+Loops iterating at least selected number of iterations will get loop alignment.
+
+-param=align-threshold=
+Common Joined UInteger Var(param_align_threshold) Optimization Init(100) IntegerRange(1, 65536) Param
+Select fraction of the maximal frequency of executions of basic block in function given basic block get alignment.
+
+-param=asan-globals=
+Common Joined UInteger Var(param_asan_globals) Init(1) IntegerRange(0, 1) Param
+Enable asan globals protection.
+
+-param=asan-instrument-allocas=
+Common Joined UInteger Var(param_asan_protect_allocas) Init(1) IntegerRange(0, 1) Param
+Enable asan allocas/VLAs protection.
+
+-param=asan-instrument-reads=
+Common Joined UInteger Var(param_asan_instrument_reads) Init(1) IntegerRange(0, 1) Param
+Enable asan load operations protection.
+
+-param=asan-instrument-writes=
+Common Joined UInteger Var(param_asan_instrument_writes) Init(1) IntegerRange(0, 1) Param
+Enable asan store operations protection.
+
+-param=asan-instrumentation-with-call-threshold=
+Common Joined UInteger Var(param_asan_instrumentation_with_call_threshold) Optimization Init(7000) Param
+Use callbacks instead of inline code if number of accesses in function becomes greater or equal to this number.
+
+-param=asan-memintrin=
+Common Joined UInteger Var(param_asan_memintrin) Init(1) IntegerRange(0, 1) Param
+Enable asan builtin functions protection.
+
+-param=asan-stack=
+Common Joined UInteger Var(param_asan_stack) Init(1) IntegerRange(0, 1) Param
+Enable asan stack protection.
+
+-param=asan-use-after-return=
+Common Joined UInteger Var(param_asan_use_after_return) Init(1) IntegerRange(0, 1) Param
+Enable asan detection of use-after-return bugs.
+
+-param=avg-loop-niter=
+Common Joined UInteger Var(param_avg_loop_niter) Optimization Init(10) IntegerRange(1, 65536) Param
+Average number of iterations of a loop.
+
+-param=avoid-fma-max-bits=
+Common Joined UInteger Var(param_avoid_fma_max_bits) Optimization IntegerRange(0, 512) Param
+Maximum number of bits for which we avoid creating FMAs.
+
+-param=builtin-expect-probability=
+Common Joined UInteger Var(param_builtin_expect_probability) Optimization Init(90) IntegerRange(0, 100) Param
+Set the estimated probability in percentage for builtin expect. The default value is 90% probability.
+
+-param=builtin-string-cmp-inline-length=
+Common Joined

[PATCH 0/7] Param conversion to option machinery

2019-11-07 Thread Martin Liska

This email thread is a follow-up to a demo that I posted here:
https://gcc.gnu.org/ml/gcc-patches/2019-10/msg02220.html

The patchset converts the current param infrastructure to
the option machinery. Parts 3 and 4 are quite
mechanical.
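
One user-visible detail of the conversion is that the separate form "--param key=value" is rewritten to the joined form "--param=key=value" before option decoding (see the opts-common.c change in part 1). The following is a standalone sketch of that rewriting step, not the GCC code itself; the function name `join_separate_param` and the caller-provided buffer are assumptions made for the illustration.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: if OPT is exactly "--param" and NEXT is the
   following argument, writes "--param=<NEXT>" into OUT and returns 1.
   Returns 0 if nothing was joined or OUT is too small.  */
static int
join_separate_param (const char *opt, const char *next,
		     char *out, size_t out_size)
{
  if (next == NULL || strcmp (opt, "--param") != 0)
    return 0;

  int n = snprintf (out, out_size, "--param=%s", next);
  return n >= 0 && (size_t) n < out_size;
}
```

An argument that already uses the joined form is left alone, so both spellings end up as the same string for the option machinery.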

I would appreciate feedback on which parameters
should be per function (Optimization keyword) and which should be global.

The patch survives bootstrap and regtests on x86_64-linux-gnu
and ppc64-linux-gnu, and I did a cross build of all target compilers.

Thoughts?
Martin

Martin Liska (7):
  Param to options conversion.
  Include new generated gcc/params.opt file.
  Apply mechanical replacement (generated patch).
  Remove gcc/params.* files.
  Remove last leftover usage of params* files.
  Remove set_default_param_value from documentation.
  Fix test-suite fallout.

 gcc/Makefile.in   |   20 +-
 gcc/asan.c|   19 +-
 gcc/auto-profile.c|3 +-
 gcc/bb-reorder.c  |5 +-
 gcc/builtins.c|3 +-
 gcc/c/gimple-parser.c |3 +-
 gcc/cfgcleanup.c  |5 +-
 gcc/cfgexpand.c   |9 +-
 gcc/cfgloopanal.c |9 +-
 gcc/cgraph.c  |3 +-
 gcc/combine.c |5 +-
 gcc/common.opt|   10 -
 gcc/common/common-target.def  |6 +-
 gcc/common/config/aarch64/aarch64-common.c|   16 +-
 gcc/common/config/gcn/gcn-common.c|1 -
 gcc/common/config/ia64/ia64-common.c  |9 +-
 .../config/powerpcspe/powerpcspe-common.c |3 +-
 gcc/common/config/rs6000/rs6000-common.c  |3 +-
 gcc/common/config/sh/sh-common.c  |3 +-
 gcc/config/aarch64/aarch64.c  |   80 +-
 gcc/config/alpha/alpha.c  |   17 +-
 gcc/config/arm/arm.c  |   44 +-
 gcc/config/avr/avr.c  |1 -
 gcc/config/csky/csky.c|1 -
 gcc/config/i386/i386-builtins.c   |1 -
 gcc/config/i386/i386-expand.c |1 -
 gcc/config/i386/i386-features.c   |1 -
 gcc/config/i386/i386-options.c|   35 +-
 gcc/config/i386/i386.c|   27 +-
 gcc/config/ia64/ia64.c|3 +-
 gcc/config/rs6000/rs6000-logue.c  |5 +-
 gcc/config/rs6000/rs6000.c|   56 +-
 gcc/config/s390/s390.c|   80 +-
 gcc/config/sparc/sparc.c  |   84 +-
 gcc/config/visium/visium.c|7 +-
 gcc/coverage.c|9 +-
 gcc/cp/name-lookup.c  |3 +-
 gcc/cp/typeck.c   |5 +-
 gcc/cprop.c   |1 -
 gcc/cse.c |7 +-
 gcc/cselib.c  |3 +-
 gcc/doc/options.texi  |3 +
 gcc/doc/tm.texi   |4 +-
 gcc/dse.c |3 +-
 gcc/emit-rtl.c|   19 +-
 gcc/explow.c  |3 +-
 gcc/final.c   |5 +-
 gcc/flag-types.h  |   11 +
 gcc/fold-const.c  |   13 +-
 gcc/gcc.c |9 -
 gcc/gcse.c|   17 +-
 gcc/ggc-common.c  |5 +-
 gcc/ggc-page.c|5 +-
 gcc/gimple-loop-interchange.cc|7 +-
 gcc/gimple-loop-jam.c |9 +-
 gcc/gimple-loop-versioning.cc |5 +-
 gcc/gimple-ssa-split-paths.c  |3 +-
 gcc/gimple-ssa-sprintf.c  |1 -
 gcc/gimple-ssa-store-merging.c|9 +-
 gcc/gimple-ssa-strength-reduction.c   |3 +-
 gcc/gimple-ssa-warn-alloca.c  |1 -
 gcc/gimple-ssa-warn-restrict.c|1 -
 gcc/graphite-isl-ast-to-gimple.c  |5 +-
 gcc/graphite-optimize-isl.c   |5 +-
 gcc/graphite-scop-detection.c |5 +-
 gcc/graphite-sese-to-poly.c   |1 -
 gcc/graphite.c|1 -
 gcc/haifa-sched.c |   39 +-
 gcc/hsa-gen.c |3 +-
 gcc/ifcvt.c   |5 +-
 gcc/ipa-cp.c  |   31 +-
 gcc/ipa-fnsummary.c   |   21 +-
 gcc/ipa-inline-analysis.c |3 +-
 gcc/ipa-inline.c  |   78 +-

[PATCH 1/7] Param to options conversion.

2019-11-07 Thread Martin Liska


gcc/ChangeLog:

2019-11-06  Martin Liska  

* common.opt: Remove --param and --param= options.
* opt-functions.awk: Mark CL_PARAMS for options
that have Param keyword.
* opts-common.c (decode_cmdline_options_to_array):
Replace --param key=value with --param=key=value.
* opts.c (print_filtered_help): Remove special
printing of params.
(print_specific_help): Update title for params.
(common_handle_option): Do not handle OPT__param.
* opts.h (SET_OPTION_IF_UNSET): New macro.
* doc/options.texi: Document Param keyword.
---
 gcc/common.opt|  7 ---
 gcc/doc/options.texi  |  3 +++
 gcc/opt-functions.awk |  3 ++-
 gcc/opts-common.c |  9 +
 gcc/opts.c| 38 +-
 gcc/opts.h| 10 ++
 6 files changed, 25 insertions(+), 45 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 12c0083964e..8c6acabb1fc 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -437,13 +437,6 @@ Common Driver Alias(-target-help)
 fversion
 Common Driver Alias(-version)
 
--param
-Common Separate
---param =	Set parameter  to value.  See below for a complete list of parameters.
-
--param=
-Common Joined Alias(-param)
-
 -sysroot
 Driver Separate Alias(-sysroot=)
 
diff --git a/gcc/doc/options.texi b/gcc/doc/options.texi
index b59f4d39aef..c7c70acd526 100644
--- a/gcc/doc/options.texi
+++ b/gcc/doc/options.texi
@@ -475,6 +475,9 @@ affect executable code generation may use this flag instead, so that the
 option is not taken into account in ways that might affect executable
 code generation.
 
+@item Param
+This is an option that is a parameter.
+
 @item Undocumented
 The option is deliberately missing documentation and should not
 be included in the @option{--help} output.
diff --git a/gcc/opt-functions.awk b/gcc/opt-functions.awk
index c1da80c648c..4f02b74e97c 100644
--- a/gcc/opt-functions.awk
+++ b/gcc/opt-functions.awk
@@ -105,7 +105,8 @@ function switch_flags (flags)
 	  test_flag("Undocumented", flags,  " | CL_UNDOCUMENTED") \
 	  test_flag("NoDWARFRecord", flags,  " | CL_NO_DWARF_RECORD") \
 	  test_flag("Warning", flags,  " | CL_WARNING") \
-	  test_flag("(Optimization|PerFunction)", flags,  " | CL_OPTIMIZATION")
+	  test_flag("(Optimization|PerFunction)", flags,  " | CL_OPTIMIZATION") \
+	  test_flag("Param", flags,  " | CL_PARAMS")
 	sub( "^0 \\| ", "", result )
 	return result
 }
diff --git a/gcc/opts-common.c b/gcc/opts-common.c
index b4ec1bd25ac..d55dc93e165 100644
--- a/gcc/opts-common.c
+++ b/gcc/opts-common.c
@@ -961,6 +961,15 @@ decode_cmdline_options_to_array (unsigned int argc, const char **argv,
 	  continue;
 	}
 
+  /* Interpret "--param" "key=name" as "--param=key=name".  */
+  const char *needle = "--param";
+  if (i + 1 < argc && strcmp (opt, needle) == 0)
+	{
+	  const char *replacement
+	= opts_concat (needle, "=", argv[i + 1], NULL);
+	  argv[++i] = replacement;
+	}
+
   n = decode_cmdline_option (argv + i, lang_mask,
  _array[num_decoded_options]);
   num_decoded_options++;
diff --git a/gcc/opts.c b/gcc/opts.c
index f46b468a968..394cbfd1c56 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -1278,38 +1278,6 @@ print_filtered_help (unsigned int include_flags,
   bool displayed = false;
   char new_help[256];
 
-  if (include_flags == CL_PARAMS)
-{
-  for (i = 0; i < LAST_PARAM; i++)
-	{
-	  const char *param = compiler_params[i].option;
-
-	  help = compiler_params[i].help;
-	  if (help == NULL || *help == '\0')
-	{
-	  if (exclude_flags & CL_UNDOCUMENTED)
-		continue;
-	  help = undocumented_msg;
-	}
-
-	  /* Get the translation.  */
-	  help = _(help);
-
-	  if (!opts->x_quiet_flag)
-	{
-	  snprintf (new_help, sizeof (new_help),
-			_("default %d minimum %d maximum %d"),
-			compiler_params[i].default_value,
-			compiler_params[i].min_value,
-			compiler_params[i].max_value);
-	  help = new_help;
-	}
-	  wrap_help (help, param, strlen (param), columns);
-	}
-  putchar ('\n');
-  return;
-}
-
   if (!opts->x_help_printed)
 opts->x_help_printed = XCNEWVAR (char, cl_options_count);
 
@@ -1679,7 +1647,7 @@ print_specific_help (unsigned int include_flags,
 	  description = _("The following options are language-independent");
 	  break;
 	case CL_PARAMS:
-	  description = _("The --param option recognizes the following as parameters");
+	  description = _("The following options control parameters");
 	  break;
 	default:
 	  if (i >= cl_lang_count)
@@ -2241,10 +2209,6 @@ common_handle_option (struct gcc_options *opts,
 
   switch (code)
 {
-case OPT__param:
-  handle_param (opts, opts_set, loc, arg);
-  break;
-
 case OPT__help:
   {
 	unsigned int all_langs_mask = (1U << cl_lang_count) - 1;
diff --git a/gcc/opts.h b/gcc/opts.h
index 47223229388..0de8e4269db 100644
--- a/gcc/opts.h
+++ b/gcc/opts.h
@@ -461,4 +461,14 @@ extern bool

Re: introduce -fcallgraph-info option

2019-11-07 Thread Alexandre Oliva

On Nov  7, 2019, Richard Biener  wrote:

> So how's -dumpbase handled?

It's part of the gcc driver interface, and lto-wrapper passes it to gcc
for -fltrans compilations.  -auxbase isn't, so one of the alternatives I
suggested involved our taking auxbase from dumpdir & dumpbase for
-fltrans compilations.  The other alternatives amounted to exposing
auxdir or auxbase in the driver interface, so that lto-wrapper could
specify them explicitly.

-- 
Alexandre Oliva, freedom fighter   he/him   https://FSFLA.org/blogs/lxo
Free Software Evangelist   Stallman was right, but he's left :(
GNU Toolchain EngineerFSMatrix: It was he who freed the first of us
FSF & FSFLA board memberThe Savior shall return (true);

[PATCH][vect] PR 92351: When peeling for alignment make alignment of epilogues unknown

2019-11-07 Thread Andre Vieira (lists)


Hi,

PR92351 reports a bug in which a wrongly aligned load is generated for 
an epilogue of a main loop for which we peeled for alignment.  There is 
no way to guarantee that epilogue data accesses are aligned when the 
main loop is peeling for alignment.
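
As a rough illustration of the situation (a sketch only, not code from the patch): assume 16-byte vectors, 4-byte ints and 16-byte-aligned array bases, with a loop like "ia[i] = ib[i+2] + ib[i+6]" as in the test below.  The helper names misalign_bytes and peel_iters_for are made up for this example.  Peeling aligns the references that share the chosen misalignment, but a reference with a different misalignment stays misaligned.

```c
#include <stddef.h>

/* Assumed vector size/alignment in bytes (illustrative, target-dependent).  */
#define VEC_BYTES 16

/* Misalignment in bytes of element FIRST_ELEM of an array whose base is
   assumed to be VEC_BYTES-aligned.  */
static unsigned
misalign_bytes (size_t first_elem, size_t elem_size)
{
  return (unsigned) ((first_elem * elem_size) % VEC_BYTES);
}

/* Number of scalar iterations to peel so that an access starting at
   FIRST_ELEM becomes VEC_BYTES-aligned.  */
static unsigned
peel_iters_for (size_t first_elem, size_t elem_size)
{
  unsigned mis = misalign_bytes (first_elem, elem_size);
  return mis == 0 ? 0 : (unsigned) ((VEC_BYTES - mis) / elem_size);
}
```

Under these assumptions peel_iters_for (2, 4) gives the peel count for the ib[i+2] load; after peeling that many iterations the two loads are aligned while the ia[i] store, which started out aligned, is not.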


I also had to split vect-peel-2.c, because it scans for the number of
unaligned accesses that were vectorized.  With this change that number
depends on whether we are vectorizing the epilogue, which will also
contain unaligned accesses.  Since not all targets need to be able to
vectorize the epilogue, I decided to disable epilogue vectorization for
the version in which we scan the dumps and to add a version that
attempts epilogue vectorization but does not scan the dumps.


Bootstrapped and regression tested on x86_64 and aarch64.

Is this OK for trunk?

In the future I would like to look at allowing for misalignment analysis 
for cases in which both the number of iterations and iterations to peel 
are known at compile time, as in that case we shouldn't ever be skipping 
the main loop as we shouldn't be generating it.


gcc/ChangeLog:
2019-11-07  Andre Vieira  

* tree-vect-data-refs.c (vect_compute_data_ref_alignment): When we are
peeling the main loop for alignment, make sure to set the misalignment
of the epilogue's data references to DR_MISALIGNMENT_UNKNOWN.

gcc/testsuite/ChangeLog:
2019-11-07  Andre Vieira  

* gcc.dg/vect/vect-peel-2.c: Disable epilogue vectorization and
split the source of this test to...
* gcc.dg/vect/vect-peel-2-src.c: ... This.
* gcc.dg/vect/vect-peel-2-epilogues.c: New test.
diff --git a/gcc/testsuite/gcc.dg/vect/vect-peel-2-epilogues.c b/gcc/testsuite/gcc.dg/vect/vect-peel-2-epilogues.c
new file mode 100644
index ..c06fa442fafa36855d285d2336e0d69ee9bffe03
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-peel-2-epilogues.c
@@ -0,0 +1,3 @@
+/* { dg-require-effective-target vect_int } */
+
+#include "vect-peel-2-src.c"
diff --git a/gcc/testsuite/gcc.dg/vect/vect-peel-2-src.c b/gcc/testsuite/gcc.dg/vect/vect-peel-2-src.c
new file mode 100644
index ..f6fc134c8705567a628dcd62c053ad6f2ca2904d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-peel-2-src.c
@@ -0,0 +1,48 @@
+#include 
+#include "tree-vect.h"
+
+#define N 128
+
+/* unaligned store.  */
+
+int ib[N+7];
+
+__attribute__ ((noinline))
+int main1 ()
+{
+  int i;
+  int ia[N+1];
+
+  /* The store is aligned and the loads are misaligned with the same 
+ misalignment. Cost model is disabled. If misaligned stores are supported,
+ we peel according to the loads to align them.  */
+  for (i = 0; i <= N; i++)
+{
+  ia[i] = ib[i+2] + ib[i+6];
+}
+
+  /* check results:  */
+  for (i = 1; i <= N; i++)
+{
+  if (ia[i] != ib[i+2] + ib[i+6])
+abort ();
+}
+
+  return 0;
+}
+
+int main (void)
+{ 
+  int i;
+
+  check_vect ();
+
+  for (i = 0; i <= N+6; i++)
+{
+  asm volatile ("" : "+r" (i));
+  ib[i] = i;
+}
+
+  return main1 ();
+}
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-peel-2.c b/gcc/testsuite/gcc.dg/vect/vect-peel-2.c
index b6061c3b8553b67ecdf56367b2f4128d7c0bd342..65e70bd44170c63ce3bc25c6a7ecf426ddcd39b1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-peel-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-peel-2.c
@@ -1,52 +1,8 @@
 /* { dg-require-effective-target vect_int } */
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
 
-#include 
-#include "tree-vect.h"
-
-#define N 128
-
-/* unaligned store.  */
-
-int ib[N+7];
-
-__attribute__ ((noinline))
-int main1 ()
-{
-  int i;
-  int ia[N+1];
-
-  /* The store is aligned and the loads are misaligned with the same 
- misalignment. Cost model is disabled. If misaligned stores are supported,
- we peel according to the loads to align them.  */
-  for (i = 0; i <= N; i++)
-{
-  ia[i] = ib[i+2] + ib[i+6];
-}
-
-  /* check results:  */
-  for (i = 1; i <= N; i++)
-{
-  if (ia[i] != ib[i+2] + ib[i+6])
-abort ();
-}
-
-  return 0;
-}
-
-int main (void)
-{ 
-  int i;
-
-  check_vect ();
-
-  for (i = 0; i <= N+6; i++)
-{
-  asm volatile ("" : "+r" (i));
-  ib[i] = i;
-}
-
-  return main1 ();
-}
+#include "vect-peel-2-src.c"
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
 /* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 1 "vect" { target { { vect_element_align } && { vect_aligned_arrays } } } } } */
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 9dd18d265361ba5635de685f9d898e355999bf4c..fc75ac4f112486934a007c90cc6b646b6115857b 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -938,6 +938,18 @@ vect_compute_data_ref_alignment (dr_vec_info *dr_info)
 = exact_div

[Committed] IBM Z: Add pattern for load truth value of comparison into reg

2019-11-07 Thread Andreas Krebbel

The RTXs used to express an overflow condition check in add/sub/mul are
too complex for if conversion.  However, there is code in
noce_emit_store_flag which generates a simple CC compare as the base
for using a conditional load.  All we have to do is to provide a
pattern to store the truth value of a CC compare into a GPR.
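
For context, here is a hedged sketch of the kind of source this is aimed at: an overflow check whose result is materialised as 0 or 1 in a register.  __builtin_add_overflow is the real GCC built-in; the function names are illustrative only, and whether a given compiler version actually emits a load-on-condition for this code is not guaranteed.

```c
#include <limits.h>

/* Returns 1 if A + B overflows int, 0 otherwise; the wrapped sum is
   stored in *SUM.  The truth value of the overflow condition is what
   has to end up in a general-purpose register.  */
int
add_overflows (int a, int b, int *sum)
{
  return __builtin_add_overflow (a, b, sum);
}

/* The shape the split produces, written as C: clear the register, then
   conditionally load 1.  COND stands in for the CC comparison.  */
int
truth_value (int cond)
{
  int r = 0;
  if (cond)
    r = 1;
  return r;
}
```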

Done with the attached patch.

Bootstrapped and regression tested on s390x.
Committed to mainline.

2019-11-07  Andreas Krebbel  

* config/s390/s390.md ("*cstorecc_z13"): New insn_and_split
pattern.

gcc/testsuite/ChangeLog:

2019-11-07  Andreas Krebbel  

* gcc.target/s390/addsub-signed-overflow-1.c: Expect lochi
instructions to be used.
* gcc.target/s390/addsub-signed-overflow-2.c: Likewise.
* gcc.target/s390/mul-signed-overflow-1.c: Likewise.
* gcc.target/s390/mul-signed-overflow-2.c: Likewise.
* gcc.target/s390/vector/vec-scalar-cmp-1.c: Check for 32 and 64
bit variant of lochi.  Swap the values for the lochi's.
* gcc.target/s390/zvector/vec-cmp-1.c: Likewise.
---
 gcc/config/s390/s390.md   | 15 
 .../s390/addsub-signed-overflow-1.c   |  2 +
 .../s390/addsub-signed-overflow-2.c   |  2 +
 .../gcc.target/s390/mul-signed-overflow-1.c   |  2 +
 .../gcc.target/s390/mul-signed-overflow-2.c   |  2 +
 .../gcc.target/s390/vector/vec-scalar-cmp-1.c | 18 +++--
 .../gcc.target/s390/zvector/vec-cmp-1.c   | 72 ---
 7 files changed, 83 insertions(+), 30 deletions(-)

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index e3881d07f2b..c1d73d5ca42 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -6810,6 +6810,21 @@
 [(set (match_dup 0) (ashiftrt:SI (match_dup 0) (const_int 28)))
  (clobber (reg:CC CC_REGNUM))])])
 
+; Such patterns get directly emitted by noce_emit_store_flag.
+(define_insn_and_split "*cstorecc_z13"
+  [(set (match_operand:GPR  0 "register_operand""=")
+   (match_operator:GPR 1 "s390_comparison"
+   [(match_operand 2 "cc_reg_operand""c")
+(match_operand 3 "const_int_operand"  "")]))]
+  "TARGET_Z13"
+  "#"
+  "reload_completed"
+  [(set (match_dup 0) (const_int 0))
+   (set (match_dup 0)
+   (if_then_else:GPR
+(match_op_dup 1 [(match_dup 2) (match_dup 3)])
+(const_int 1)
+(match_dup 0)))])
 
 ;;
 ;; - Conditional move instructions (introduced with z196)
diff --git a/gcc/testsuite/gcc.target/s390/addsub-signed-overflow-1.c b/gcc/testsuite/gcc.target/s390/addsub-signed-overflow-1.c
index 367dbcb3774..143220d5541 100644
--- a/gcc/testsuite/gcc.target/s390/addsub-signed-overflow-1.c
+++ b/gcc/testsuite/gcc.target/s390/addsub-signed-overflow-1.c
@@ -79,3 +79,5 @@ main ()
 /* { dg-final { scan-assembler-not "\trisbg" { target { lp64 } } } } */
 /* Just one for the ret != 6 comparison.  */
 /* { dg-final { scan-assembler-times "ci" 1 } } */
+/* { dg-final { scan-assembler-times "\tlochio\t" 6 { target { ! lp64 } } } } */
+/* { dg-final { scan-assembler-times "\tlocghio\t" 6 { target lp64 } } } */
diff --git a/gcc/testsuite/gcc.target/s390/addsub-signed-overflow-2.c b/gcc/testsuite/gcc.target/s390/addsub-signed-overflow-2.c
index 230ad4af1e7..798e489cece 100644
--- a/gcc/testsuite/gcc.target/s390/addsub-signed-overflow-2.c
+++ b/gcc/testsuite/gcc.target/s390/addsub-signed-overflow-2.c
@@ -78,3 +78,5 @@ main ()
 /* { dg-final { scan-assembler-not "\trisbg" { target { lp64 } } } } */
 /* Just one for the ret != 3 comparison.  */
 /* { dg-final { scan-assembler-times "ci" 1 } } */
+/* { dg-final { scan-assembler-times "\tlochio\t" 6 { target { ! lp64 } } } } */
+/* { dg-final { scan-assembler-times "\tlocghio\t" 6 { target lp64 } } } */
diff --git a/gcc/testsuite/gcc.target/s390/mul-signed-overflow-1.c b/gcc/testsuite/gcc.target/s390/mul-signed-overflow-1.c
index b3db60ffef5..fdf56d6e695 100644
--- a/gcc/testsuite/gcc.target/s390/mul-signed-overflow-1.c
+++ b/gcc/testsuite/gcc.target/s390/mul-signed-overflow-1.c
@@ -54,3 +54,5 @@ main ()
 /* { dg-final { scan-assembler-not "\trisbg" { target { lp64 } } } } */
 /* Just one for the ret != 3 comparison.  */
 /* { dg-final { scan-assembler-times "ci" 1 } } */
+/* { dg-final { scan-assembler-times "\tlochio\t" 3 { target { ! lp64 } } } } */
+/* { dg-final { scan-assembler-times "\tlocghio\t" 3 { target lp64 } } } */
diff --git a/gcc/testsuite/gcc.target/s390/mul-signed-overflow-2.c b/gcc/testsuite/gcc.target/s390/mul-signed-overflow-2.c
index 76b3fa60361..d0088188aa2 100644
--- a/gcc/testsuite/gcc.target/s390/mul-signed-overflow-2.c
+++ b/gcc/testsuite/gcc.target/s390/mul-signed-overflow-2.c
@@ -54,3 +54,5 @@ main ()
 /* { dg-final { scan-assembler-not "\trisbg" { target { lp64 } } } } */
 /* Just one for the ret != 3 comparison.  */
 /* { dg-final { scan-assembler-times "ci" 1 } } */
+/* { dg-final { scan-assembler-times "\tlochio\t" 3 { target { ! lp64 } }

Re: introduce -fcallgraph-info option

2019-11-07 Thread Richard Biener

On Thu, 7 Nov 2019, Alexandre Oliva wrote:

> On Nov  7, 2019, Richard Biener  wrote:
> 
> > A simple test shows we currently only pass -auxbase-strip /tmp/cc...o
> > to the LTRANS lto1 invocation plus -dumpbase cc...o
> 
> This -auxbase-strip argument is introduced by the gcc driver that runs
> lto1, based on the -o (tmp ltrans .o) argument, but this driver has no
> clue as to the executable name or location, and there's nothing
> whatsoever in the driver interface that enables aux_base_name to be
> located separately from the output name.  Thus the possibilities I
> brought up of introducing means for it to be told so, explicitly or by
> convention.

So how's -dumpbase handled?  I'd prefer the same approach for -auxbase

Richard.

[PATCH] Fix PR92405

2019-11-07 Thread Richard Biener



Bootstrapped & tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-11-07  Richard Biener  

PR tree-optimization/92405
* tree-vect-loop.c (vectorizable_reduction): Appropriately
restrict lane-reducing ops to single stmt chains.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 277907)
+++ gcc/tree-vect-loop.c(working copy)
@@ -5865,6 +5865,18 @@ vectorizable_reduction (stmt_vec_info st
   reduc_chain_length++;
 }
 
+  /* For lane-reducing ops we're reducing the number of reduction PHIs
+ which means the only use of that may be in the lane-reducing operation.  */
+  if (lane_reduc_code_p
+  && reduc_chain_length != 1
+  && !only_slp_reduc_chain)
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"lane-reducing reduction with extra stmts.\n");
+  return false;
+}
+
   reduc_def = PHI_RESULT (reduc_def_phi);
   for (i = 0; i < op_type; i++)
 {

Re: introduce -fcallgraph-info option

2019-11-07 Thread Alexandre Oliva

On Nov  6, 2019, Thomas Schwinge  wrote:

>> which is a valid VCG file (you can launch your favorite VCG
>> viewer on it unmodified)

> What should be my "favorite VCG viewer"?

-ENOCLUE, I'm afraid.  I honestly don't even know which one Eric used
back when he first attempted to contribute this feature, almost 10 years
ago.

What I do know is that visualization is not the primary goal.  There are
indeed newer and more elaborate and modern graph file formats for that.
The primary intended consumer of this output is gnatstack, which has long
used only this simple format.  It's hard to justify rewriting it and
creating an incompatibility when the simple format does the job well.

Plus, it's simple enough and regular enough that it should be quite easy
to parse it with a few lines of awk and post-process the .ci file into
any other graph format of interest, when visualization of the graph is
the aim.  If you show me examples of graph formats that you'd like, that
can represent all the data encoded in .ci files, it wouldn't take much
effort to persuade me to write the few lines of awk, or perhaps even
sed, to convert .ci files output by GCC to the other format ;-)
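
To give an idea of how small such a converter can be, here is a sketch, in C rather than awk.  It assumes each edge sits on a single line of the form 'edge: { sourcename: "a" targetname: "b" ... }', which is VCG edge syntax; the function names quoted_after and vcg_edge_to_dot are made up for this example, and node attributes and labels are simply ignored.

```c
#include <stdio.h>
#include <string.h>

/* Copy the double-quoted string that follows KEY in LINE into OUT
   (capacity OUTSZ).  Returns 1 on success, 0 if KEY or the quotes are
   missing or the string does not fit.  */
static int
quoted_after (const char *line, const char *key, char *out, size_t outsz)
{
  const char *p = strstr (line, key);
  if (!p)
    return 0;
  p = strchr (p + strlen (key), '"');
  if (!p)
    return 0;
  const char *q = strchr (p + 1, '"');
  if (!q || (size_t) (q - p) > outsz)
    return 0;
  memcpy (out, p + 1, (size_t) (q - p - 1));
  out[q - p - 1] = '\0';
  return 1;
}

/* Turn one VCG edge line into a DOT edge statement written to DOT
   (capacity DOTSZ).  Returns 1 if LINE was an edge line, 0 otherwise.  */
static int
vcg_edge_to_dot (const char *line, char *dot, size_t dotsz)
{
  char src[256], dst[256];
  if (!strstr (line, "edge:")
      || !quoted_after (line, "sourcename:", src, sizeof src)
      || !quoted_after (line, "targetname:", dst, sizeof dst))
    return 0;
  snprintf (dot, dotsz, "\"%s\" -> \"%s\";", src, dst);
  return 1;
}
```

Wrapping the emitted statements in "digraph G { ... }" would give something Graphviz can read; carrying over the su/da node information would need a similar helper for node lines.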

> I tried that, but 'xvcg' didn't render anything useful for a
> '-fcallgraph-info=su,da' dump, hmm.

Did xvcg fail to display the node information added by su and da?  As
in, do you see the difference the options make to the graph text file,
but not in the visualization?  Or is it something else?

> Also, I found that many years ago, in 2012, Steven Bosscher did "Rework
> RTL CFG graph dumping to dump DOT format" (that's Graphviz), and then did
> "remove vcg CFG dumper".

gnatstack and -fcallgraph-info have been available since long before
that move indeed.

> Note that I'm not actively objecting VCG

Good, thanks for pointing that out :-)

> unmaintained mid-90s software, containing obfuscated layout/rendering
> source code

Since gnatstack is the primary consumer, I think that objection doesn't
apply.

As a Free Software activist, however, I am a little concerned about the
claim about obfuscated source code.  I haven't been able to find any
substantiation of that in your message.  I think that would be OT for
this list, so would you please send me what you got about it at
ol...@fsf.org?  TIA,

-- 
Alexandre Oliva, freedom fighter   he/him   https://FSFLA.org/blogs/lxo
Free Software Evangelist   Stallman was right, but he's left :(
GNU Toolchain EngineerFSMatrix: It was he who freed the first of us
FSF & FSFLA board memberThe Savior shall return (true);

[committed] Remove gimple_call_types_likely_match_p (PR 70929)

2019-11-07 Thread Martin Jambor

Hi

On Mon, Nov 04 2019, Richard Biener wrote:
> On Fri, 1 Nov 2019, Martin Jambor wrote:
>
>> Hi,
>> 
>> I have spent some more time looking into PR 70929, how
>> gimple_check_call_matching_types behaves when LTO-building Firefox to
>> see what could replace it or if we perhaps could remove it.
>> 
>> TL;DR:
>> I believe it can and should be removed, possibly except the use in
>> value-prof.c where I replaced it with a clearly imprecise predicate to
>> catch cases where the profile info is corrupted and since I had it, I
>> also ended up using it for speculative devirtualization, in case it got
>> its guess somehow wrong (but I have not seen either of the two cases
>> happening).  See the patch below.
>> 
>> 
>> More details:
>> With LTO the predicate can always be fooled and so we cannot rely on it
>> as means to prevent ICEs, if the user calls incompatible functions, the
>> compiler should try to fix it with folding, VCEing or just using zero
>> constructors in the most bogus of cases.
>> 
>> And trying to make the predicate more clever can be difficult.  When
>> LTO-building Firefox (without PGO), gimple_check_call_matching_types
>> returns false 8133 times and those cases can be divided into:
>> 
>>   - 2507x the callee was __builtin_constant_p
>>   -   17x the callee was __builtin_signbit
>>   - 1388x the callee was __builtin_unreachable
>> 
>>   - 4215x would pass the suggested test in comment 5 of the bug.  I
>> examined quite a few and all were exactly the problem discussed in
>> this PR - they were all deemed incompatible because one parameter
>> was a reference to a TREE_ADDRESSABLE class.
>> 
>>   - 6x both predicates returned false for a target found by speculative
>> devirtualization.  I tend to think they were both wrong because...
>> 
>> ...the predicate from comment #5 of the bug is not a good substitute
>> because it returns false for perfectly fine virtual calls when the type
>> of the call is a method belonging to an ancestor of the class to which
>> the actual callee belongs.  Thousands of calls to AddRef did not pass
>> the test.
>> 
>> Without finding any real case for having the predicate, I decided to
>> remove its use from everywhere except for check_ic_target because its
>> comment says:
>> 
>> /* Perform sanity check on the indirect call target. Due to race conditions,
>>false function target may be attributed to an indirect call site. If the
>>call expression type mismatches with the target function's type, expand_call
>>may ICE. Here we only do very minimal sanity check just to make compiler happy.
>>Returns true if TARGET is considered ok for call CALL_STMT.  */
>> 
>> and if some such race conditions really happen and can be detected if
>> e.g. the number of parameters is clearly off, the compiler should
>> probably discard the target.  But I did not want to keep the original
>> clumsy predicate and therefore decided to extract the non-problematic
>> bits of useless_type_conversion_p into:
>> 
>> /* Check the type of FNDECL and return true if it is likely compatible with the
>>callee type in CALL.  The check is imprecise, the intended use of this
>>function is that when optimizations like value profiling and speculative
>>devirtualization somehow guess a clearly wrong target of an indirect call,
>>they can discard it.  */
>> 
>> bool
>> gimple_call_types_likely_match_p (gcall *call, tree fndecl)
>> {
>>   tree call_type = gimple_call_fntype (call);
>>   tree decl_type = TREE_TYPE (fndecl);
>> 
>>   /* If one is a function and the other a method, that's a mismatch.  */
>>   if (TREE_CODE (call_type) != TREE_CODE (decl_type))
>> return false;
>>   /* If the return types are not compatible, bail out.  */
>>   if (!useless_type_conversion_p (TREE_TYPE (call_type),
>>TREE_TYPE (decl_type)))
>> return false;
>>   /* If the call was to an unprototyped function, all bets are off.  */
>>   if (!prototype_p (call_type))
>> return true;
>> 
>>   /* If the unqualified argument types are compatible, the types match.  */
>>   if (TYPE_ARG_TYPES (call_type) == TYPE_ARG_TYPES (decl_type))
>> return true;
>> 
>>   tree call_parm, decl_parm;
>>   for (call_parm = TYPE_ARG_TYPES (call_type),
>>   decl_parm = TYPE_ARG_TYPES (decl_type);
>>call_parm && decl_parm;
>>call_parm = TREE_CHAIN (call_parm),
>>   decl_parm = TREE_CHAIN (decl_parm))
>> if (!useless_type_conversion_p
>>  (TYPE_MAIN_VARIANT (TREE_VALUE (call_parm)),
>>   TYPE_MAIN_VARIANT (TREE_VALUE (decl_parm))))
>>   return false;
>> 
>>   /* If there is a mismatch in the number of arguments the functions
>>  are not compatible.  */
>>   if (call_parm || decl_parm)
>> return false;
>> 
>>   return true;
>> }
>> 
>> Crucially, the function is missing the part that does:
>> 
>>   /* Method types should belong to a compatible base class.  */
>>   if (TREE_CODE (inner_type) ==

Re: introduce -fcallgraph-info option

2019-11-07 Thread Alexandre Oliva

On Nov  7, 2019, Richard Biener  wrote:

> A simple test shows we currently only pass -auxbase-strip /tmp/cc...o
> to the LTRANS lto1 invocation plus -dumpbase cc...o

This -auxbase-strip argument is introduced by the gcc driver that runs
lto1, based on the -o (tmp ltrans .o) argument, but this driver has no
clue as to the executable name or location, and there's nothing
whatsoever in the driver interface that enables aux_base_name to be
located separately from the output name.  Thus the possibilities I
brought up of introducing means for it to be told so, explicitly or by
convention.

-- 
Alexandre Oliva, freedom fighter   he/him   https://FSFLA.org/blogs/lxo
Free Software Evangelist   Stallman was right, but he's left :(
GNU Toolchain EngineerFSMatrix: It was he who freed the first of us
FSF & FSFLA board memberThe Savior shall return (true);

[C] Opt out of GNU vector extensions for built-in SVE types

2019-11-07 Thread Richard Sandiford

The AArch64 port defines built-in SVE types at start-up under names
like __SVInt8_t.  These types are represented in the front end and
gimple as normal VECTOR_TYPEs and are code-generated as normal vectors.
However, we'd like to stop the frontends from treating them in the
same way as GNU-style ("vector_size") vectors, for several reasons:

(1) We allowed the GNU vector extensions to be mixed with Advanced SIMD
intrinsics and it ended up causing a lot of confusion on big-endian
targets.  Although SVE handles big-endian vectors differently from
Advanced SIMD, there are still potential surprises; see the block
comment near the head of aarch64-sve.md for details.

(2) One of the SVE vectors is a packed one-bit-per-element boolean vector.
That isn't a combination the GNU vector extensions have supported
before.  E.g. it means that vectors can no longer decompose to
arrays for indexing, and that not all elements are individually
addressable.  It also makes it less clear which order the initialiser
should be in (lsb first, or bitfield ordering?).  We could define
all that of course, but it seems a bit weird to go to the effort
for this case when, given all the other reasons, we don't want the
extensions anyway.

(3) The GNU vector extensions only provide full-vector operations,
which is a very artificial limitation on a predicated architecture
like SVE.

(4) The set of operations provided by the GNU vector extensions is
relatively small, whereas the SVE intrinsics provide many more.

(5) It makes it easier to ensure that (with default options) code is
portable between compilers without the GNU vector extensions having
to become an official part of the SVE intrinsics spec.

(6) The length of the SVE types is usually not fixed at compile time,
whereas the GNU vector extension is geared around fixed-length
vectors.

It's possible to specify the length of an SVE vector using the
command-line option -msve-vector-bits=N, but in principle it should
be possible to have functions compiled for different N in the same
translation unit.  This isn't supported yet but would be very useful
for implementing ifuncs.  Once mixing lengths in a translation unit
is supported, the SVE types should represent the same type throughout
the translation unit, just as GNU vector types do.
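
For readers less familiar with the GNU vector extensions referred to in (2), (3) and (6), here is a minimal sketch of what they provide for an ordinary fixed-length "vector_size" type.  The type and function names are illustrative only; this is the behaviour the patch opts the built-in SVE types out of, not code from the patch.

```c
/* A GNU-style fixed-length vector of four ints (GCC/Clang extension).  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Full-vector, element-wise addition: every lane is always operated on,
   there is no way to express a predicate.  */
static v4si
add_v4si (v4si a, v4si b)
{
  return a + b;
}

/* Elements can be indexed as if the vector were an array.  */
static int
sum_lanes (v4si v)
{
  int sum = 0;
  for (int i = 0; i < 4; i++)
    sum += v[i];
  return sum;
}
```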

However, when -msve-vector-bits=N is in effect, we do allow conversions
between explicit GNU vector types of N bits and the corresponding SVE
types.  This doesn't undermine the intent of (5) because in this case
the use of GNU vector types is explicit and intentional.  It also doesn't
undermine the intent of (6) because converting between the types is just
a conditionally-supported operation.  In other words, the types still
represent the same types throughout the translation unit, it's just that
conversions between them are valid in cases where a certain precondition
is known to hold.  It's similar to the way that the SVE vector types are
defined throughout the translation unit but can only be used in functions
for which SVE is enabled.

The patch adds a new flag to tree_type_common to select this behaviour.
(We currently have 17 bits free.)  To avoid making the flag too specific
to vectors, I called it TYPE_INDIVISIBLE_P, to mean that the frontend
should not allow the components of the type to be accessed directly.
This could perhaps be useful in future for hiding the fact that a
type is an array, or for hiding the fields of a record or union.

The actual frontend changes are very simple, mostly just replacing
VECTOR_TYPE_P with gnu_vector_type_p in selected places.

One interesting case is:

  /* Need to convert condition operand into a vector mask.  */
  if (VECTOR_TYPE_P (TREE_TYPE (ifexp)))
{
  tree vectype = TREE_TYPE (ifexp);
  tree elem_type = TREE_TYPE (vectype);
  tree zero = build_int_cst (elem_type, 0);
  tree zero_vec = build_vector_from_val (vectype, zero);
  tree cmp_type = build_same_sized_truth_vector_type (vectype);
  ifexp = build2 (NE_EXPR, cmp_type, ifexp, zero_vec);
}

in build_conditional_expr.  This appears to be trying to support
elementwise conditions like "vec1 ? vec2 : vec3", which is something
the C++ frontend supports.  However, this code can never trigger AFAICT,
because "vec1" does not survive c_objc_common_truthvalue_conversion:

case VECTOR_TYPE:
  error_at (location, "used vector type where scalar is required");
  return error_mark_node;

Even if it did, the operation should be a VEC_COND_EXPR rather
than a COND_EXPR.

I've therefore left that condition as-is, but added tests for the
"vec1 ? vec2 : vec3" case to make sure that we don't accidentally
allow it for SVE vectors in future.

I originally wrote this as a series of patches that add one
gnu_vector_type_p test each, along with the associated lines of the
testcases.  I wasn't sure in the end if it was useful to submit
in

Re: [Patch][OpenMP][Fortran] Support absent optional args with use_device_{ptr,addr} (+ OpenACC's use_device clause)

2019-11-07 Thread Tobias Burnus

Thinking about the patch over night, I have now updated it a bit: 
Namely, I only add the "if(present-check)" condition, if the original 
variable is dereferenced. There is no need for code like


  omp_data_arr.c = c == NULL ? NULL : c;

and then, after the libgomp call, code like "c_2 = c == NULL ? NULL : 
omp_data_arr.c;"; due to the libgomp call, the latter cannot even be 
optimized away.


Hence, I added 'do_optional_check'; additionally, I had a 
libgomp.fortran/use_device_ptr-optional-1.f90 change floating around, 
which I included. Otherwise unchanged.
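
The distinction can be sketched in plain C (an illustration under assumptions, not the generated GIMPLE): "struct descriptor" is a hypothetical stand-in for a Fortran array descriptor, and both function names are made up.

```c
#include <stddef.h>

/* Hypothetical stand-in for a Fortran array descriptor.  */
struct descriptor
{
  void *data;
};

/* Plain pointer argument: an absent argument is simply NULL, so the value
   can be forwarded as-is; "c == NULL ? NULL : c" would add nothing.  */
static void *
forward_ptr (void *c)
{
  return c;
}

/* Descriptor argument: the data pointer is obtained by dereferencing the
   descriptor, so the presence check is required before the access.  */
static void *
forward_desc_data (struct descriptor *x)
{
  return x == NULL ? NULL : x->data;
}
```

Only the second form needs the present-check, which is the case the dereference-based condition is meant to single out.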


Retested. OK for the trunk?

Cheers,

Tobias

On 11/6/19 4:04 PM, Tobias Burnus wrote:

This patch is based on Kwok's patch, posted as (4/5) at 
https://gcc.gnu.org/ml/gcc-patches/2019-07/msg00964.html – which is 
targeting OpenACC's use_device* – but it also applies to OpenMP 
use_device_{ptr,addr}.


I added an OpenMP test case. It showed that for arguments with value 
attribute and for assumed-shape array, one needs to do more — as the 
decl cannot be directly used for the is-argument-present check.


(For 'value', a hidden boolean '_' + arg-name is passed in addition; 
for assumed-shape arrays, the array descriptor "x" is replaced by the 
local variable "x.0" (with "x.0 = x->data") and the original decl "x" 
is in GFC_DECL_SAVED_DESCRIPTOR. Especially for assumed-shape arrays, 
the new decl cannot be used unconditionally as it is uninitialized 
when the argument is absent.)


Bootstrapped and regtested on x86_64-gnu-linux without offloading + 
with nvptx.

OK?

Cheers,

Tobias

*The OpenACC test cases are in 5/5 and depend on some other changes. 
Submission of {1,missing one line of 2,3,5}/5 is planned next.
PPS: For full absent-optional support, mapping needs to be handled 
for OpenACC (see Kwok's …/5 patches) and OpenMP (which is quite 
different on FE level) – and OpenMP also needs changes for the share 
clauses.


2019-11-07  Tobias Burnus  
	Kwok Cheung Yeung  

	gcc/
	* langhooks-def.h (LANG_HOOKS_OMP_CHECK_OPTIONAL_ARGUMENT):
	Renamed from LANG_HOOKS_OMP_IS_OPTIONAL_ARGUMENT; update define.
	(LANG_HOOKS_DECLS): Rename also here.
	* langhooks.h (lang_hooks_for_decls): Rename
	omp_is_optional_argument to omp_check_optional_argument; take
	additional bool argument.
	* omp-general.c (omp_check_optional_argument): Likewise.
	* omp-general.h (omp_check_optional_argument): Likewise.
	* omp-low.c (lower_omp_target): Update calls; handle absent
	Fortran optional arguments with USE_DEVICE_ADDR/USE_DEVICE_PTR.

	gcc/fortran/
	* trans-decl.c (create_function_arglist): Also set
	GFC_DECL_OPTIONAL_ARGUMENT for per-value arguments.
	* f95-lang.c (LANG_HOOKS_OMP_CHECK_OPTIONAL_ARGUMENT):
	Renamed from LANG_HOOKS_OMP_IS_OPTIONAL_ARGUMENT; point
	to gfc_omp_check_optional_argument.
	* trans.h (gfc_omp_check_optional_argument): Substitutes
	gfc_omp_is_optional_argument declaration.
	* trans-openmp.c (gfc_omp_is_optional_argument): Make static.
	(gfc_omp_check_optional_argument): New function.

	libgomp/
	* testsuite/libgomp.fortran/use_device_ptr-optional-1.f90: Extend.
	* testsuite/libgomp.fortran/use_device_ptr-optional-2.f90: New.

 gcc/fortran/f95-lang.c  |   4 ++--
 gcc/fortran/trans-decl.c|   3 +--
 gcc/fortran/trans-openmp.c  |  62 +-
 gcc/fortran/trans.h |   2 +-
 gcc/langhooks-def.h |   4 ++--
 gcc/langhooks.h |  13 -
 gcc/omp-general.c   |  14 ++
 gcc/omp-general.h   |   2 +-
 gcc/omp-low.c   | 117 -
 libgomp/testsuite/libgomp.fortran/use_device_ptr-optional-1.f90 |  22 ++
 libgomp/testsuite/libgomp.fortran/use_device_ptr-optional-2.f90 |  33 +
 11 files changed, 229 insertions(+), 47 deletions(-)

diff --git a/gcc/fortran/f95-lang.c b/gcc/fortran/f95-lang.c
index 0684c3b99cf..c7b592dbfe2 100644
--- a/gcc/fortran/f95-lang.c
+++ b/gcc/fortran/f95-lang.c
@@ -115,7 +115,7 @@ static const struct attribute_spec gfc_attribute_table[] =
 #undef LANG_HOOKS_INIT_TS
 #undef LANG_HOOKS_OMP_ARRAY_DATA
 #undef LANG_HOOKS_OMP_IS_ALLOCATABLE_OR_PTR
-#undef LANG_HOOKS_OMP_IS_OPTIONAL_ARGUMENT
+#undef LANG_HOOKS_OMP_CHECK_OPTIONAL_ARGUMENT
 #undef LANG_HOOKS_OMP_PRIVATIZE_BY_REFERENCE
 #undef LANG_HOOKS_OMP_PREDETERMINED_SHARING
 #undef LANG_HOOKS_OMP_REPORT_DECL
@@ -150,7 +150,7 @@ static const struct attribute_spec gfc_attribute_table[] =
 #define LANG_HOOKS_INIT_TS		gfc_init_ts
 #define LANG_HOOKS_OMP_ARRAY_DATA

[PATCH][arm][4/X] Add initial support for GE-setting SIMD32 intrinsics

2019-11-07 Thread Kyrill Tkachov


Hi all,

This patch adds in plumbing for the ACLE intrinsics that set the GE bits in
APSR.  These are special SIMD instructions in Armv6 that pack bytes or
halfwords into the 32-bit general-purpose registers and set the GE bits in
APSR to indicate if some of the "lanes" of the result have overflowed or have
some other instruction-specific property.
These bits can then be used by the SEL instruction (accessed through the __sel
intrinsic) to select lanes for further processing.

This situation is similar to the Q-setting intrinsics: we have to track the GE
fake register, detect when a function reads it through __sel and restrict
existing patterns that may generate GE-clobbering instructions from
straight-line C code when reading the GE bits matters.
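
To make the GE/SEL interaction concrete, here is a portable sketch that emulates one GE-setting operation (UADD8) and SEL on plain 32-bit integers.  The emu_* names are made up and the GE bits are modelled as an explicit variable rather than APSR state; this only illustrates the semantics, it is not how the intrinsics are implemented.

```c
#include <stdint.h>

/* Portable emulation of UADD8: four independent unsigned byte additions.
   Bit i of *GE is set when lane i produced a carry (sum >= 0x100).  */
static uint32_t
emu_uadd8 (uint32_t a, uint32_t b, unsigned *ge)
{
  uint32_t result = 0;
  *ge = 0;
  for (int lane = 0; lane < 4; lane++)
    {
      uint32_t sum = ((a >> (8 * lane)) & 0xff) + ((b >> (8 * lane)) & 0xff);
      result |= (sum & 0xff) << (8 * lane);
      if (sum >= 0x100)
        *ge |= 1u << lane;
    }
  return result;
}

/* Portable emulation of SEL: per byte lane, take the byte from A when the
   corresponding GE bit is set, otherwise the byte from B.  */
static uint32_t
emu_sel (uint32_t a, uint32_t b, unsigned ge)
{
  uint32_t result = 0;
  for (int lane = 0; lane < 4; lane++)
    {
      uint32_t mask = (uint32_t) 0xff << (8 * lane);
      result |= (((ge >> lane) & 1) ? a : b) & mask;
    }
  return result;
}
```

In the emulation the dependency is explicit: any other GE-setting operation between emu_uadd8 and emu_sel would change the selection, which is why the compiler must not introduce GE-clobbering instructions in functions that read the GE bits.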

Bootstrapped and tested on arm-none-linux-gnueabihf.

Committed to trunk.
Thanks,
Kyrill


2019-11-07  Kyrylo Tkachov  

    * config/arm/aout.h (REGISTER_NAMES): Add apsrge.
    * config/arm/arm.md (APSRGE_REGNUM): Define.
    (arm_): New define_insn.
    (arm_sel): Likewise.
    * config/arm/arm.h (FIXED_REGISTERS): Add entry for apsrge.
    (CALL_USED_REGISTERS): Likewise.
    (REG_ALLOC_ORDER): Likewise.
    (FIRST_PSEUDO_REGISTER): Update value.
    (ARM_GE_BITS_READ): Define.
    * config/arm/arm.c (arm_conditional_register_usage): Clear
    APSRGE_REGNUM from operand_reg_set.
    (arm_ge_bits_access): Define.
    * config/arm/arm-builtins.c (arm_check_builtin_call): Handle
    ARM_BUILTIN_sel.
    * config/arm/arm-protos.h (arm_ge_bits_access): Declare prototype.
    * config/arm/arm-fixed.md (add3): Convert to define_expand.
    FAIL if ARM_GE_BITS_READ.
    (*arm_add3): New define_insn.
    (sub3): Convert to define_expand.  FAIL if ARM_GE_BITS_READ.
    (*arm_sub3): New define_insn.
    * config/arm/arm_acle.h (__sel, __sadd8, __ssub8, __uadd8, __usub8,
    __sadd16, __sasx, __ssax, __ssub16, __uadd16, __uasx, __usax,
    __usub16): Define.
    * config/arm/arm_acle_builtins.def: Define builtins for the above.
    * config/arm/iterators.md (SIMD32_GE): New int_iterator.
    (simd32_op): Handle the above.
    * config/arm/unspecs.md (UNSPEC_GE_SET): Define.
    (UNSPEC_SEL, UNSPEC_SADD8, UNSPEC_SSUB8, UNSPEC_UADD8, UNSPEC_USUB8,
    UNSPEC_SADD16, UNSPEC_SASX, UNSPEC_SSAX, UNSPEC_SSUB16, UNSPEC_UADD16,
    UNSPEC_UASX, UNSPEC_USAX, UNSPEC_USUB16): Define.

2019-11-07  Kyrylo Tkachov  

    * gcc.target/arm/acle/simd32.c: Update test.
    * gcc.target/arm/acle/simd32_sel.c: New test.

diff --git a/gcc/config/arm/aout.h b/gcc/config/arm/aout.h
index a5f83cb503f61cc1cab0e61795edde33250610e7..72782758853a869bcb9a9d69f3fa0da979cd711f 100644
--- a/gcc/config/arm/aout.h
+++ b/gcc/config/arm/aout.h
@@ -72,7 +72,7 @@
   "wr8",   "wr9",   "wr10",  "wr11",\
   "wr12",  "wr13",  "wr14",  "wr15",\
   "wcgr0", "wcgr1", "wcgr2", "wcgr3",\
-  "cc", "vfpcc", "sfp", "afp", "apsrq"\
+  "cc", "vfpcc", "sfp", "afp", "apsrq", "apsrge"		\
 }
 #endif
 
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 995f50785f6ebff7b3cd47185516f7bcb4fd5b81..2d902d0b325bc1fe5e22831ef8a59a2bb37c1225 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -3370,6 +3370,13 @@ arm_check_builtin_call (location_t , vec , tree fndecl,
 	  = tree_cons (get_identifier ("acle qbit"), NULL_TREE,
 		   DECL_ATTRIBUTES (cfun->decl));
 }
+  if (fcode == ARM_BUILTIN_sel)
+{
+  if (cfun && cfun->decl)
+	DECL_ATTRIBUTES (cfun->decl)
+	  = tree_cons (get_identifier ("acle gebits"), NULL_TREE,
+		   DECL_ATTRIBUTES (cfun->decl));
+}
   return true;
 }
 
diff --git a/gcc/config/arm/arm-fixed.md b/gcc/config/arm/arm-fixed.md
index 85dbc5d05c35921bc5115df68d30292a712729cf..6d949ba7064c0587d4c5d7b855f2c04c6d0e08e7 100644
--- a/gcc/config/arm/arm-fixed.md
+++ b/gcc/config/arm/arm-fixed.md
@@ -28,11 +28,22 @@
(set_attr "predicable_short_it" "yes,no")
(set_attr "type" "alu_sreg")])
 
-(define_insn "add3"
+(define_expand "add3"
+  [(set (match_operand:ADDSUB 0 "s_register_operand")
+	(plus:ADDSUB (match_operand:ADDSUB 1 "s_register_operand")
+		 (match_operand:ADDSUB 2 "s_register_operand")))]
+  "TARGET_INT_SIMD"
+  {
+if (ARM_GE_BITS_READ)
+  FAIL;
+  }
+)
+
+(define_insn "*arm_add3"
   [(set (match_operand:ADDSUB 0 "s_register_operand" "=r")
 	(plus:ADDSUB (match_operand:ADDSUB 1 "s_register_operand" "r")
 		 (match_operand:ADDSUB 2 "s_register_operand" "r")))]
-  "TARGET_INT_SIMD"
+  "TARGET_INT_SIMD && !ARM_GE_BITS_READ"
   "sadd%?\\t%0, %1, %2"
   [(set_attr "predicable" "yes")
(set_attr "type" "alu_dsp_reg")])
@@ -76,11 +87,22 @@
(set_attr "predicable_short_it" "yes,no")
(set_attr "type" "alu_sreg")])
 
-(define_insn "sub3"
+(define_expand "sub3"
+  [(set (match_operand:ADDSUB 0 "s_register_operand")
+	(minus:ADDSUB (match_operand:ADDSUB 1 "s_register_operand")
+		 (match_operand:ADDSUB 2 "s_register_operand")))]
+  "TARGET_INT_SIMD"
+  {
+if (ARM_GE_BITS_READ)
+

[PATCH][arm][5/X] Implement Q-bit-setting SIMD32 intrinsics

2019-11-07 Thread Kyrill Tkachov


Hi all,

This patch implements some more Q-setting intrinsics of the multiply-accumulate
variety, but these are in the SIMD32 family in that they treat their operands
as packed SIMD values, but that's not important at the RTL level.
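
As an illustration only (not part of the patch), the arithmetic behind the
dual multiply-accumulate intrinsics can be modelled in portable C.  This is a
sketch under assumptions: the names model_smlad, model_smladx, model_smuad,
lane, wrap_and_flag and q_flag are hypothetical, a plain uint32_t stands in
for the packed int16x2_t type, and a global variable mimics the sticky Q bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the sticky Q bit in APSR.  */
static int q_flag;

/* Extract 16-bit lane IDX (0 = bottom, 1 = top) as a signed value.  The
   narrowing conversion relies on GCC's modulo behaviour.  */
static int16_t
lane (uint32_t v, int idx)
{
  assert (idx == 0 || idx == 1);
  return (int16_t) (v >> (16 * idx));
}

/* Keep the low 32 bits of WIDE and set the model Q flag if they do not
   represent WIDE exactly (the result wraps, it is not clipped).  */
static int32_t
wrap_and_flag (int64_t wide)
{
  if (wide > INT32_MAX || wide < INT32_MIN)
    q_flag = 1;
  return (int32_t) (uint32_t) wide;
}

/* bottom*bottom + top*top + ACC.  */
static int32_t
model_smlad (uint32_t a, uint32_t b, int32_t acc)
{
  int64_t p0 = (int64_t) lane (a, 0) * lane (b, 0);
  int64_t p1 = (int64_t) lane (a, 1) * lane (b, 1);
  return wrap_and_flag (p0 + p1 + acc);
}

/* Exchanged variant: bottom*top + top*bottom + ACC.  */
static int32_t
model_smladx (uint32_t a, uint32_t b, int32_t acc)
{
  int64_t p0 = (int64_t) lane (a, 0) * lane (b, 1);
  int64_t p1 = (int64_t) lane (a, 1) * lane (b, 0);
  return wrap_and_flag (p0 + p1 + acc);
}

/* Same as model_smlad without an accumulator.  */
static int32_t
model_smuad (uint32_t a, uint32_t b)
{
  return model_smlad (a, b, 0);
}
```

In this model the only way model_smuad can raise the flag is when all four
lanes are -32768, since the two products then sum to 2^31.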

Bootstrapped and tested on arm-none-linux-gnueabihf.

Committing to trunk.
Thanks,
Kyrill

2019-11-07  Kyrylo Tkachov  

    * config/arm/arm.md (arm__insn):
    New define_insns.
    (arm_): New define_expands.
    * config/arm/arm_acle.h (__smlad, __smladx, __smlsd, __smlsdx,
    __smuad, __smuadx): Define.
    * config/arm/arm_acle_builtins.def: Define builtins for the above.
    * config/arm/iterators.md (SIMD32_TERNOP_Q): New int_iterator.
    (SIMD32_BINOP_Q): Likewise.
    (simd32_op): Handle the above.
    * config/arm/unspecs.md: Define unspecs for the above.

2019-11-07  Kyrylo Tkachov  

    * gcc.target/arm/acle/simd32.c: Update test.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 884a224a991102955787600317581e6468463bea..7717f547ab4706183d2727013496c249edbe7abf 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -5865,6 +5865,62 @@
   [(set_attr "predicable" "yes")
(set_attr "type" "alu_sreg")])
 
+(define_insn "arm__insn"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(unspec:SI
+	  [(match_operand:SI 1 "s_register_operand" "r")
+	   (match_operand:SI 2 "s_register_operand" "r")
+	   (match_operand:SI 3 "s_register_operand" "r")] SIMD32_TERNOP_Q))]
+  "TARGET_INT_SIMD && "
+  "%?\\t%0, %1, %2, %3"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "alu_sreg")])
+
+(define_expand "arm_"
+  [(set (match_operand:SI 0 "s_register_operand")
+	(unspec:SI
+	  [(match_operand:SI 1 "s_register_operand")
+	   (match_operand:SI 2 "s_register_operand")
+	   (match_operand:SI 3 "s_register_operand")] SIMD32_TERNOP_Q))]
+  "TARGET_INT_SIMD"
+  {
+if (ARM_Q_BIT_READ)
+  emit_insn (gen_arm__setq_insn (operands[0], operands[1],
+		operands[2], operands[3]));
+else
+  emit_insn (gen_arm__insn (operands[0], operands[1],
+	   operands[2], operands[3]));
+DONE;
+  }
+)
+
+(define_insn "arm__insn"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(unspec:SI
+	  [(match_operand:SI 1 "s_register_operand" "r")
+	   (match_operand:SI 2 "s_register_operand" "r")] SIMD32_BINOP_Q))]
+  "TARGET_INT_SIMD && "
+  "%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "alu_sreg")])
+
+(define_expand "arm_"
+  [(set (match_operand:SI 0 "s_register_operand")
+	(unspec:SI
+	  [(match_operand:SI 1 "s_register_operand")
+	   (match_operand:SI 2 "s_register_operand")] SIMD32_BINOP_Q))]
+  "TARGET_INT_SIMD"
+  {
+if (ARM_Q_BIT_READ)
+  emit_insn (gen_arm__setq_insn (operands[0], operands[1],
+		operands[2]));
+else
+  emit_insn (gen_arm__insn (operands[0], operands[1],
+	   operands[2]));
+DONE;
+  }
+)
+
 (define_insn "arm_sel"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
 	(unspec:SI
diff --git a/gcc/config/arm/arm_acle.h b/gcc/config/arm/arm_acle.h
index b8d02a5502f273fcba492bbeba2542b13334a8ea..c30645e3949f84321fb1dfe3afd06167ef859d62 100644
--- a/gcc/config/arm/arm_acle.h
+++ b/gcc/config/arm/arm_acle.h
@@ -522,6 +522,48 @@ __usub16 (uint16x2_t __a, uint16x2_t __b)
   return __builtin_arm_usub16 (__a, __b);
 }
 
+__extension__ extern __inline int32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__smlad (int16x2_t __a, int16x2_t __b, int32_t __c)
+{
+  return __builtin_arm_smlad (__a, __b, __c);
+}
+
+__extension__ extern __inline int32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__smladx (int16x2_t __a, int16x2_t __b, int32_t __c)
+{
+  return __builtin_arm_smladx (__a, __b, __c);
+}
+
+__extension__ extern __inline int32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__smlsd (int16x2_t __a, int16x2_t __b, int32_t __c)
+{
+  return __builtin_arm_smlsd (__a, __b, __c);
+}
+
+__extension__ extern __inline int32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__smlsdx (int16x2_t __a, int16x2_t __b, int32_t __c)
+{
+  return __builtin_arm_smlsdx (__a, __b, __c);
+}
+
+__extension__ extern __inline int32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__smuad (int16x2_t __a, int16x2_t __b)
+{
+  return __builtin_arm_smuad (__a, __b);
+}
+
+__extension__ extern __inline int32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__smuadx (int16x2_t __a, int16x2_t __b)
+{
+  return __builtin_arm_smuadx (__a, __b);
+}
+
 #endif
 
 #ifdef __ARM_FEATURE_SAT
diff --git a/gcc/config/arm/arm_acle_builtins.def b/gcc/config/arm/arm_acle_builtins.def
index 715c3c94e8c8f6355e880a36eb275be80d1a3912..018d89682c61a963961515823420f1b986cd40db 100644
--- a/gcc/config/arm/arm_acle_builtins.def
+++ b/gcc/config/arm/arm_acle_builtins.def
@@ -107,3 +107,10 @@ VAR1 (UBINOP, usax, si)
 VAR1 (UBINOP, usub16, si)
 
 VAR1 (UBINOP, sel, si)

[PATCH][arm][6/X] Add support for __[us]sat16 intrinsics

2019-11-07 Thread Kyrill Tkachov


Hi all,

This last patch adds the __ssat16 and __usat16 intrinsics that perform
"clipping" to a particular bitwidth on packed SIMD values, setting the Q bit
as appropriate.
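
For illustration (not part of the patch), the "clipping" of packed values can
be sketched in portable C.  Everything here is an assumption made for the
example: model_ssat16, model_usat16, pack, clip and q_flag are hypothetical
names, a uint32_t holds the two 16-bit lanes, and a global models the Q bit.
The immediate ranges (1..16 and 0..15) follow the __builtin_sat_imm_check
calls in the arm_acle.h hunk of this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the sticky Q bit in APSR.  */
static int q_flag;

/* Clip X to [LO, HI], recording in the model Q flag whether it changed.  */
static int32_t
clip (int32_t x, int32_t lo, int32_t hi)
{
  if (x < lo) { q_flag = 1; return lo; }
  if (x > hi) { q_flag = 1; return hi; }
  return x;
}

/* Pack two lane values into one 32-bit container (top lane first).  */
static uint32_t
pack (int32_t top, int32_t bottom)
{
  return ((uint32_t) (uint16_t) top << 16) | (uint16_t) bottom;
}

/* Clip each signed 16-bit lane to a signed SAT-bit range.  */
static uint32_t
model_ssat16 (uint32_t a, unsigned sat)
{
  assert (sat >= 1 && sat <= 16);
  int32_t hi = (1 << (sat - 1)) - 1;
  int32_t lo = -hi - 1;
  int16_t bottom = (int16_t) (a & 0xffff);
  int16_t top = (int16_t) (a >> 16);
  return pack (clip (top, lo, hi), clip (bottom, lo, hi));
}

/* Clip each signed 16-bit lane to the unsigned range [0, 2^SAT - 1].  */
static uint32_t
model_usat16 (uint32_t a, unsigned sat)
{
  assert (sat <= 15);
  int32_t hi = (1 << sat) - 1;
  int16_t bottom = (int16_t) (a & 0xffff);
  int16_t top = (int16_t) (a >> 16);
  return pack (clip (top, 0, hi), clip (bottom, 0, hi));
}
```

With sat = 8, lanes holding 1000 and -1000 clip to 127 and -128 in the signed
model, and lanes holding -5 and 300 clip to 0 and 255 in the unsigned one.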

Bootstrapped and tested on arm-none-linux-gnueabihf.
Committing to trunk.
Thanks,
Kyrill

2019-11-07  Kyrylo Tkachov  

    * config/arm/arm.md (arm_): New define_expand.
    (arm__insn): New define_insn.
    * config/arm/arm_acle.h (__ssat16, __usat16): Define.
    * config/arm/arm_acle_builtins.def: Define builtins for the above.
    * config/arm/iterators.md (USSAT16): New int_iterator.
    (simd32_op): Handle UNSPEC_SSAT16, UNSPEC_USAT16.
    (sup): Likewise.
    * config/arm/predicates.md (ssat16_imm): New predicate.
    (usat16_imm): Likewise.
    * config/arm/unspecs.md (UNSPEC_SSAT16, UNSPEC_USAT16): Define.

2019-11-07  Kyrylo Tkachov  

    * gcc.target/arm/acle/simd32.c: Update test.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 7717f547ab4706183d2727013496c249edbe7abf..f2f5094f9e2a802557e5c19db1edbc028a91cbd8 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -5921,6 +5921,33 @@
   }
 )
 
+(define_insn "arm__insn"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(unspec:SI
+	  [(match_operand:SI 1 "s_register_operand" "r")
+	   (match_operand:SI 2 "sat16_imm" "i")] USSAT16))]
+  "TARGET_INT_SIMD && "
+  "%?\\t%0, %2, %1"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "alu_sreg")])
+
+(define_expand "arm_"
+  [(set (match_operand:SI 0 "s_register_operand")
+	(unspec:SI
+	  [(match_operand:SI 1 "s_register_operand")
+	   (match_operand:SI 2 "sat16_imm")] USSAT16))]
+  "TARGET_INT_SIMD"
+  {
+if (ARM_Q_BIT_READ)
+  emit_insn (gen_arm__setq_insn (operands[0], operands[1],
+		operands[2]));
+else
+  emit_insn (gen_arm__insn (operands[0], operands[1],
+	   operands[2]));
+DONE;
+  }
+)
+
 (define_insn "arm_sel"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
 	(unspec:SI
diff --git a/gcc/config/arm/arm_acle.h b/gcc/config/arm/arm_acle.h
index c30645e3949f84321fb1dfe3afd06167ef859d62..9ea922f2d096870d2c2d34ac43f03e3bc9dc4741 100644
--- a/gcc/config/arm/arm_acle.h
+++ b/gcc/config/arm/arm_acle.h
@@ -564,6 +564,24 @@ __smuadx (int16x2_t __a, int16x2_t __b)
   return __builtin_arm_smuadx (__a, __b);
 }
 
+#define __ssat16(__a, __sat)	\
+  __extension__			\
+  ({\
+int16x2_t __arg = (__a);	\
+__builtin_sat_imm_check (__sat, 1, 16);			\
+int16x2_t __res = __builtin_arm_ssat16 (__arg, __sat);	\
+__res;			\
+  })
+
+#define __usat16(__a, __sat)	\
+  __extension__			\
+  ({\
+int16x2_t __arg = (__a);	\
+__builtin_sat_imm_check (__sat, 0, 15);			\
+int16x2_t __res = __builtin_arm_usat16 (__arg, __sat);	\
+__res;			\
+  })
+
 #endif
 
 #ifdef __ARM_FEATURE_SAT
diff --git a/gcc/config/arm/arm_acle_builtins.def b/gcc/config/arm/arm_acle_builtins.def
index 018d89682c61a963961515823420f1b986cd40db..8a21ff74f41840dd793221e079627055d379c474 100644
--- a/gcc/config/arm/arm_acle_builtins.def
+++ b/gcc/config/arm/arm_acle_builtins.def
@@ -114,3 +114,6 @@ VAR1 (TERNOP, smlsd, si)
 VAR1 (TERNOP, smlsdx, si)
 VAR1 (BINOP, smuad, si)
 VAR1 (BINOP, smuadx, si)
+
+VAR1 (SAT_BINOP_UNSIGNED_IMM, ssat16, si)
+VAR1 (SAT_BINOP_UNSIGNED_IMM, usat16, si)
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 72aba5e86fc20216bcba74f5cfa5b9f744497a6e..c412851843f4468c2c18bce264288705e076ac50 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -458,6 +458,8 @@
 
 (define_int_iterator SIMD32_BINOP_Q [UNSPEC_SMUAD UNSPEC_SMUADX])
 
+(define_int_iterator USSAT16 [UNSPEC_SSAT16 UNSPEC_USAT16])
+
 (define_int_iterator VQRDMLH_AS [UNSPEC_VQRDMLAH UNSPEC_VQRDMLSH])
 
 (define_int_iterator VFM_LANE_AS [UNSPEC_VFMA_LANE UNSPEC_VFMS_LANE])
@@ -918,6 +920,7 @@
   (UNSPEC_VRSRA_S_N "s") (UNSPEC_VRSRA_U_N "u")
   (UNSPEC_VCVTH_S "s") (UNSPEC_VCVTH_U "u")
   (UNSPEC_DOT_S "s") (UNSPEC_DOT_U "u")
+  (UNSPEC_SSAT16 "s") (UNSPEC_USAT16 "u")
 ])
 
 (define_int_attr vfml_half
@@ -1083,7 +1086,8 @@
 			(UNSPEC_USUB16 "usub16") (UNSPEC_SMLAD "smlad")
 			(UNSPEC_SMLADX "smladx") (UNSPEC_SMLSD "smlsd")
 			(UNSPEC_SMLSDX "smlsdx") (UNSPEC_SMUAD "smuad")
-			(UNSPEC_SMUADX "smuadx")])
+			(UNSPEC_SMUADX "smuadx") (UNSPEC_SSAT16 "ssat16")
+			(UNSPEC_USAT16 "usat16")])
 
 ;; Both kinds of return insn.
 (define_code_iterator RETURNS [return simple_return])
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 267c446c03e8903c21a0d74e43ae589ffcf689f4..c1f655c704011bbe8bac82c24a3234a23bf6b242 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -193,6 +193,14 @@
   (and (match_code "const_int")
(match_test "IN_RANGE (UINTVAL (op), 1, GET_MODE_BITSIZE (mode))")))
 
+(define_predicate "ssat16_imm"
+  (and (match_code "const_int")
+   (match_test "IN_RANGE (INTVAL (op), 1, 16)")))
+

[PATCH][arm][3/X] Implement __smla* intrinsics (Q-setting)

2019-11-07 Thread Kyrill Tkachov


Hi all,

This patch implements some more Q-setting intrinsics from the SMLA* group.
These can set the saturation bit on overflow in the accumulation step.
Like earlier, these have non-Q-setting RTL forms as well for when the Q-bit read
is not needed.
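
As an illustration only (not part of the patch), the halfword
multiply-accumulate arithmetic can be modelled in portable C.  The sketch
mirrors the RTL in the diff, where a top half is an arithmetic shift right by
16 and a bottom half is a sign-extended HImode lowpart.  The names half,
model_smla_xy, model_smlabb, model_smlatb, model_smlatt and q_flag are
hypothetical, and a global variable stands in for the Q bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the sticky Q bit in APSR.  */
static int q_flag;

/* Return the signed top (TOP = 1) or bottom (TOP = 0) halfword of V.  The
   narrowing conversion relies on GCC's modulo behaviour.  */
static int32_t
half (int32_t v, int top)
{
  assert (top == 0 || top == 1);
  return (int16_t) ((uint32_t) v >> (top ? 16 : 0));
}

/* Multiply the selected halves and add ACC.  The result wraps to 32 bits
   and the model Q flag records an overflow in the accumulation.  */
static int32_t
model_smla_xy (int32_t a, int a_top, int32_t b, int b_top, int32_t acc)
{
  int64_t wide = (int64_t) half (a, a_top) * half (b, b_top) + acc;
  if (wide > INT32_MAX || wide < INT32_MIN)
    q_flag = 1;
  return (int32_t) (uint32_t) wide;
}

static int32_t
model_smlabb (int32_t a, int32_t b, int32_t acc)
{
  return model_smla_xy (a, 0, b, 0, acc);
}

static int32_t
model_smlatb (int32_t a, int32_t b, int32_t acc)
{
  return model_smla_xy (a, 1, b, 0, acc);
}

static int32_t
model_smlatt (int32_t a, int32_t b, int32_t acc)
{
  return model_smla_xy (a, 1, b, 1, acc);
}
```

The product of two halfwords always fits in 32 bits, so in this model the
flag can only be raised by the final addition of the accumulator.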

Bootstrapped and tested on arm-none-linux-gnueabihf.
Committing to trunk.
Thanks,
Kyrill

2019-11-07  Kyrylo Tkachov 

    * config/arm/arm.md (arm_smlabb_setq): New define_insn.
    (arm_smlabb): New define_expand.
    (*maddhisi4tb): Rename to...
    (maddhisi4tb): ... This.
    (*maddhisi4tt): Rename to...
    (maddhisi4tt): ... This.
    (arm_smlatb_setq): New define_insn.
    (arm_smlatb): New define_expand.
    (arm_smlatt_setq): New define_insn.
    (arm_smlatt): New define_expand.
    (arm__insn): New define_insn.
    (arm_): New define_expand.
    * config/arm/arm_acle.h (__smlabb, __smlatb, __smlabt, __smlatt,
    __smlawb, __smlawt): Define.
    * config/arm/arm_acle_builtins.def: Define builtins for the above.
    * config/arm/iterators.md (SMLAWBT): New int_iterator.
    (slaw_op): New int_attribute.
    * config/arm/unspecs.md (UNSPEC_SMLAWB, UNSPEC_SMLAWT): Define.

2019-11-07  Kyrylo Tkachov 

    * gcc.target/arm/acle/dsp_arith.c: Update test.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index db7a4006eb4f354e08f22c666fea8f1e87726085..05c8ca2772d4475a25b037e3e745c9558e1c5742 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -2565,8 +2565,40 @@
(set_attr "predicable" "yes")]
 )
 
+(define_insn "arm_smlabb_setq"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(plus:SI (mult:SI (sign_extend:SI
+			   (match_operand:HI 1 "s_register_operand" "r"))
+			  (sign_extend:SI
+			   (match_operand:HI 2 "s_register_operand" "r")))
+		 (match_operand:SI 3 "s_register_operand" "r")))
+   (set (reg:CC APSRQ_REGNUM)
+	(unspec:CC [(reg:CC APSRQ_REGNUM)] UNSPEC_Q_SET))]
+  "TARGET_DSP_MULTIPLY"
+  "smlabb%?\\t%0, %1, %2, %3"
+  [(set_attr "type" "smlaxy")
+   (set_attr "predicable" "yes")]
+)
+
+(define_expand "arm_smlabb"
+ [(match_operand:SI 0 "s_register_operand")
+  (match_operand:SI 1 "s_register_operand")
+  (match_operand:SI 2 "s_register_operand")
+  (match_operand:SI 3 "s_register_operand")]
+  "TARGET_DSP_MULTIPLY"
+  {
+rtx mult1 = gen_lowpart (HImode, operands[1]);
+rtx mult2 = gen_lowpart (HImode, operands[2]);
+if (ARM_Q_BIT_READ)
+  emit_insn (gen_arm_smlabb_setq (operands[0], mult1, mult2, operands[3]));
+else
+  emit_insn (gen_maddhisi4 (operands[0], mult1, mult2, operands[3]));
+DONE;
+  }
+)
+
 ;; Note: there is no maddhisi4ibt because this one is canonical form
-(define_insn "*maddhisi4tb"
+(define_insn "maddhisi4tb"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
 	(plus:SI (mult:SI (ashiftrt:SI
 			   (match_operand:SI 1 "s_register_operand" "r")
@@ -2580,7 +2612,41 @@
(set_attr "predicable" "yes")]
 )
 
-(define_insn "*maddhisi4tt"
+(define_insn "arm_smlatb_setq"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(plus:SI (mult:SI (ashiftrt:SI
+			   (match_operand:SI 1 "s_register_operand" "r")
+			   (const_int 16))
+			  (sign_extend:SI
+			   (match_operand:HI 2 "s_register_operand" "r")))
+		 (match_operand:SI 3 "s_register_operand" "r")))
+   (set (reg:CC APSRQ_REGNUM)
+	(unspec:CC [(reg:CC APSRQ_REGNUM)] UNSPEC_Q_SET))]
+  "TARGET_DSP_MULTIPLY"
+  "smlatb%?\\t%0, %1, %2, %3"
+  [(set_attr "type" "smlaxy")
+   (set_attr "predicable" "yes")]
+)
+
+(define_expand "arm_smlatb"
+ [(match_operand:SI 0 "s_register_operand")
+  (match_operand:SI 1 "s_register_operand")
+  (match_operand:SI 2 "s_register_operand")
+  (match_operand:SI 3 "s_register_operand")]
+  "TARGET_DSP_MULTIPLY"
+  {
+rtx mult2 = gen_lowpart (HImode, operands[2]);
+if (ARM_Q_BIT_READ)
+  emit_insn (gen_arm_smlatb_setq (operands[0], operands[1],
+  mult2, operands[3]));
+else
+  emit_insn (gen_maddhisi4tb (operands[0], operands[1],
+  mult2, operands[3]));
+DONE;
+  }
+)
+
+(define_insn "maddhisi4tt"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
 	(plus:SI (mult:SI (ashiftrt:SI
 			   (match_operand:SI 1 "s_register_operand" "r")
@@ -2595,6 +2661,40 @@
(set_attr "predicable" "yes")]
 )
 
+(define_insn "arm_smlatt_setq"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(plus:SI (mult:SI (ashiftrt:SI
+			   (match_operand:SI 1 "s_register_operand" "r")
+			   (const_int 16))
+			  (ashiftrt:SI
+			   (match_operand:SI 2 "s_register_operand" "r")
+			   (const_int 16)))
+		 (match_operand:SI 3 "s_register_operand" "r")))
+   (set (reg:CC APSRQ_REGNUM)
+	(unspec:CC [(reg:CC APSRQ_REGNUM)] UNSPEC_Q_SET))]
+  "TARGET_DSP_MULTIPLY"
+  "smlatt%?\\t%0, %1, %2, %3"
+  [(set_attr "type" "smlaxy")
+   (set_attr "predicable" "yes")]
+)
+
+(define_expand "arm_smlatt"
+ [(match_operand:SI 0 "s_register_operand")
+  (match_operand:SI 1 "s_register_operand")
+  (match_operand:SI 2 "s_register_operand")
+  (match_operand:SI 3

[PATCH][arm][2/X] Implement qadd, qsub, __qdbl intrinsics

2019-11-07 Thread Kyrill Tkachov


Hi all,

This patch implements some more Q-bit-setting intrinsics from ACLE.
With the plumbing from patch 1 in place they are a simple builtin->RTL affair.
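
For illustration (not part of the patch), the saturating arithmetic these
intrinsics perform can be sketched in portable C.  The names model_qadd,
model_qsub, model_qdbl, sat32 and q_flag are hypothetical, and a global
variable stands in for the Q bit.  As in the arm_acle.h hunk below, the
doubling operation is expressed as a saturating add of a value to itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the sticky Q bit in APSR.  */
static int q_flag;

/* Clip a 64-bit intermediate to the int32_t range, setting the model Q flag
   when clipping happens.  */
static int32_t
sat32 (int64_t wide)
{
  if (wide > INT32_MAX) { q_flag = 1; return INT32_MAX; }
  if (wide < INT32_MIN) { q_flag = 1; return INT32_MIN; }
  return (int32_t) wide;
}

static int32_t
model_qadd (int32_t a, int32_t b)
{
  return sat32 ((int64_t) a + b);
}

static int32_t
model_qsub (int32_t a, int32_t b)
{
  return sat32 ((int64_t) a - b);
}

/* Saturating doubling, written as a saturating add of X to itself.  */
static int32_t
model_qdbl (int32_t x)
{
  return model_qadd (x, x);
}

/* Small self-check; returns 0 when the model behaves as described.  */
static int
model_selfcheck (void)
{
  assert (model_qadd (2, 3) == 5);
  assert (model_qdbl (0x40000000) == INT32_MAX);
  return 0;
}
```

Computing in 64 bits first avoids signed overflow in the model itself, which
would be undefined behaviour in C.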


Bootstrapped and tested on arm-none-linux-gnueabihf.

Committing to trunk.
Thanks,
Kyrill

2019-11-07  Kyrylo Tkachov  

    * config/arm/arm.md (arm_): New define_expand.
    (arm__insn): New define_insn.
    * config/arm/arm_acle.h (__qadd, __qsub, __qdbl): Define.
    * config/arm/arm_acle_builtins.def: Add builtins for qadd, qsub.
    * config/arm/iterators.md (SSPLUSMINUS): New code iterator.
    (ss_op): New code_attr.

2019-11-07  Kyrylo Tkachov  

    * gcc.target/arm/acle/dsp_arith.c: New test.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 09b632b5dbc8b38dcca22494468366c97a514bb6..db7a4006eb4f354e08f22c666fea8f1e87726085 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4076,6 +4076,32 @@
(set_attr "type" "multiple")]
 )
 
+
+(define_expand "arm_"
+  [(set (match_operand:SI 0 "s_register_operand")
+	(SSPLUSMINUS:SI (match_operand:SI 1 "s_register_operand")
+			(match_operand:SI 2 "s_register_operand")))]
+  "TARGET_DSP_MULTIPLY"
+  {
+if (ARM_Q_BIT_READ)
+  emit_insn (gen_arm__setq_insn (operands[0],
+	operands[1], operands[2]));
+else
+  emit_insn (gen_arm__insn (operands[0], operands[1], operands[2]));
+DONE;
+  }
+)
+
+(define_insn "arm__insn"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(SSPLUSMINUS:SI (match_operand:SI 1 "s_register_operand" "r")
+			(match_operand:SI 2 "s_register_operand" "r")))]
+  "TARGET_DSP_MULTIPLY && "
+  "%?\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "alu_dsp_reg")]
+)
+
 (define_code_iterator SAT [smin smax])
 (define_code_attr SATrev [(smin "smax") (smax "smin")])
 (define_code_attr SATlo [(smin "1") (smax "2")])
diff --git a/gcc/config/arm/arm_acle.h b/gcc/config/arm/arm_acle.h
index 2564ad849856610f9415586e386f85eea6947bf7..397653d3e8bf43cbcb82d98dd704bcd3a66cf782 100644
--- a/gcc/config/arm/arm_acle.h
+++ b/gcc/config/arm/arm_acle.h
@@ -478,6 +478,29 @@ __ignore_saturation (void)
   })
 #endif
 
+#ifdef __ARM_FEATURE_DSP
+__extension__ extern __inline int32_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+__qadd (int32_t __a, int32_t __b)
+{
+  return __builtin_arm_qadd (__a, __b);
+}
+
+__extension__ extern __inline int32_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+__qsub (int32_t __a, int32_t __b)
+{
+  return __builtin_arm_qsub (__a, __b);
+}
+
+__extension__ extern __inline int32_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+__qdbl (int32_t __x)
+{
+  return __qadd (__x, __x);
+}
+#endif
+
 #pragma GCC push_options
 #ifdef __ARM_FEATURE_CRC32
 #ifdef __ARM_FP
diff --git a/gcc/config/arm/arm_acle_builtins.def b/gcc/config/arm/arm_acle_builtins.def
index c72480321faa952ac307418f9e4f7d5f5f9e3745..def1a569311e67194a323decc309ed92747c4c86 100644
--- a/gcc/config/arm/arm_acle_builtins.def
+++ b/gcc/config/arm/arm_acle_builtins.def
@@ -84,3 +84,5 @@ VAR1 (SAT_BINOP_UNSIGNED_IMM, ssat, si)
 VAR1 (UNSIGNED_SAT_BINOP_UNSIGNED_IMM, usat, si)
 VAR1 (SAT_OCCURRED, saturation_occurred, si)
 VAR1 (SET_SAT, set_saturation, void)
+VAR1 (BINOP, qadd, si)
+VAR1 (BINOP, qsub, si)
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index e5cef6852a2dfcef4cd3597c163a53a6c247afab..ebb8218f265023786730881ef0bc9f818e7235b0 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -264,6 +264,9 @@
 ;; Conversions.
 (define_code_iterator FCVT [unsigned_float float])
 
+;; Saturating addition, subtraction
+(define_code_iterator SSPLUSMINUS [ss_plus ss_minus])
+
 ;; plus and minus are the only SHIFTABLE_OPS for which Thumb2 allows
 ;; a stack pointer operand.  The minus operation is a candidate for an rsub
 ;; and hence only plus is supported.
@@ -282,6 +285,8 @@
 
 (define_code_attr vfml_op [(plus "a") (minus "s")])
 
+(define_code_attr ss_op [(ss_plus "qadd") (ss_minus "qsub")])
+
 ;;
 ;; Int iterators
 ;;
diff --git a/gcc/testsuite/gcc.target/arm/acle/dsp_arith.c b/gcc/testsuite/gcc.target/arm/acle/dsp_arith.c
new file mode 100644
index ..f0bf80993beb0007b0eb360878f0fd1811098d9e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/dsp_arith.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_qbit_ok } */
+/* { dg-add-options arm_qbit  } */
+
+#include 
+
+int32_t
+test_qadd (int32_t a, int32_t b)
+{
+  return __qadd (a, b);
+}
+
+int32_t
+test_qdbl (int32_t a)
+{
+  return __qdbl(a);
+}
+
+/* { dg-final { scan-assembler-times "qadd\t...?, ...?, ...?" 2 } } */
+
+int32_t
+test_qsub (int32_t a, int32_t b)
+{
+  return __qsub (a, b);
+}
+
+/* { dg-final { scan-assembler-times "qsub\t...?,

[PATCH][arm][1/X] Add initial support for saturation intrinsics

2019-11-07 Thread Kyrill Tkachov


Hi all,

This patch adds the plumbing for and an implementation of the saturation
intrinsics from ACLE [1], in particular the __ssat, __usat intrinsics.
These intrinsics set the Q sticky bit in APSR if an overflow occurred.
ACLE allows the user to read that bit (within the same function, it's not
defined across function boundaries) using the __saturation_occurred intrinsic
and reset it using __set_saturation_occurred.
Thus, if the user cares about the Q bit they would be using a flow such as:

__set_saturation_occurred (0); // reset the Q bit
...
__ssat (...) // Do some calculations involving __ssat
...
if (__saturation_occurred ()) // if Q bit set handle overflow
  ...
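
For experimenting with this flow away from an Arm target, it can be modelled
in portable C.  This is only a sketch: model_ssat,
model_set_saturation_occurred, model_saturation_occurred, clip_all and q_flag
are hypothetical stand-ins, not the ACLE intrinsics, and a plain global
mimics the sticky Q bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the sticky Q bit in APSR.  */
static int q_flag;

static void
model_set_saturation_occurred (int value)
{
  q_flag = (value != 0);
}

static int
model_saturation_occurred (void)
{
  return q_flag;
}

/* Saturate X to a signed SAT-bit range (1 <= SAT <= 32), setting the model
   Q flag when clipping happens.  The flag is sticky: it is never cleared
   here.  */
static int32_t
model_ssat (int32_t x, unsigned sat)
{
  assert (sat >= 1 && sat <= 32);
  int64_t max = ((int64_t) 1 << (sat - 1)) - 1;
  int64_t min = -max - 1;
  if (x > max) { q_flag = 1; return (int32_t) max; }
  if (x < min) { q_flag = 1; return (int32_t) min; }
  return x;
}

/* Reset the flag, clip every element, then report whether any element
   saturated.  This follows the reset / compute / test flow above.  */
static int
clip_all (int32_t *v, int n, unsigned sat)
{
  model_set_saturation_occurred (0);
  for (int i = 0; i < n; i++)
    v[i] = model_ssat (v[i], sat);
  return model_saturation_occurred ();
}
```

Because the flag is sticky, a single test after the loop is enough to learn
whether any of the intermediate operations saturated.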

For the implementation this has a few implications:
* We must track the Q-setting side-effects of these instructions to make sure
saturation reading/writing intrinsics are ordered properly.
This is done by introducing a new "apsrq" register (and associated
APSRQ_REGNUM) in a similar way to the "fake" cc register.

* The RTL patterns coming out of these intrinsics can have two forms:
one where they set the APSRQ_REGNUM and one where they don't.
Which one is used depends on whether the function cares about reading the Q
flag. This is detected using the TARGET_CHECK_BUILTIN_CALL hook on the
__saturation_occurred, __set_saturation_occurred occurrences.
If no Q-flag read is present in the function we'll use the simpler
non-Q-setting form to allow for more aggressive scheduling and such.
If a Q-bit read is present then the Q-setting form is emitted.
To avoid adding two patterns for each intrinsic to the MD file we make
use of define_subst to auto-generate the Q-setting forms.

* Some existing patterns already produce instructions that may clobber the
Q bit, but they don't model it (as we didn't care about that bit up till now).
Since these patterns can be generated from straight-line C code they can affect
the Q-bit reads from intrinsics. Therefore they have to be disabled when
a Q-bit read is present.  These are mostly patterns in arm-fixed.md that are
not very common anyway, but there are also a couple of widening
multiply-accumulate patterns in arm.md that can set the Q-bit during
accumulation.

There are more Q-setting intrinsics in ACLE, but these will be implemented in
a more mechanical fashion once the infrastructure in this patch goes in.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Committing to trunk.
Thanks,
Kyrill


2019-11-07  Kyrylo Tkachov  

    * config/arm/aout.h (REGISTER_NAMES): Add apsrq.
    * config/arm/arm.md (APSRQ_REGNUM): Define.
    (add_setq): New define_subst.
    (add_clobber_q_name): New define_subst_attr.
    (add_clobber_q_pred): Likewise.
    (maddhisi4): Change to define_expand.  Split into mult and add if
    ARM_Q_BIT_READ.
    (arm_maddhisi4): New define_insn.
    (*maddhisi4tb): Disable for ARM_Q_BIT_READ.
    (*maddhisi4tt): Likewise.
    (arm_ssat): New define_expand.
    (arm_usat): Likewise.
    (arm_get_apsr): New define_insn.
    (arm_set_apsr): Likewise.
    (arm_saturation_occurred): New define_expand.
    (arm_set_saturation): Likewise.
    (*satsi_): Rename to...
    (satsi_): ... This.
    (*satsi__shift): Disable for ARM_Q_BIT_READ.
    * config/arm/arm.h (FIXED_REGISTERS): Mark apsrq as fixed.
    (CALL_USED_REGISTERS): Mark apsrq.
    (FIRST_PSEUDO_REGISTER): Update value.
    (REG_ALLOC_ORDER): Add APSRQ_REGNUM.
    (machine_function): Add q_bit_access.
    (ARM_Q_BIT_READ): Define.
    * config/arm/arm.c (TARGET_CHECK_BUILTIN_CALL): Define.
    (arm_conditional_register_usage): Clear APSRQ_REGNUM from
    operand_reg_set.
    (arm_q_bit_access): Define.
    * config/arm/arm-builtins.c: Include stringpool.h.
    (arm_sat_binop_imm_qualifiers,
    arm_unsigned_sat_binop_unsigned_imm_qualifiers,
    arm_sat_occurred_qualifiers, arm_set_sat_qualifiers): Define.
    (SAT_BINOP_UNSIGNED_IMM_QUALIFIERS,
    UNSIGNED_SAT_BINOP_UNSIGNED_IMM_QUALIFIERS, SAT_OCCURRED_QUALIFIERS,
    SET_SAT_QUALIFIERS): Likewise.
    (arm_builtins): Define ARM_BUILTIN_SAT_IMM_CHECK.
    (arm_init_acle_builtins): Initialize __builtin_sat_imm_check.
    Handle 0 argument expander.
    (arm_expand_acle_builtin): Handle ARM_BUILTIN_SAT_IMM_CHECK.
    (arm_check_builtin_call): Define.
    * config/arm/arm.md (ssmulsa3, usmulusa3, usmuluha3,
    arm_ssatsihi_shift, arm_usatsihi): Disable when ARM_Q_BIT_READ.
    * config/arm/arm-protos.h (arm_check_builtin_call): Declare prototype.
    (arm_q_bit_access): Likewise.
    * config/arm/arm_acle.h (__ssat, __usat, __ignore_saturation,
    __saturation_occurred, __set_saturation_occurred): Define.
    * config/arm/arm_acle_builtins.def: Define builtins for ssat, usat,
    saturation_occurred, set_saturation_occurred.
    * config/arm/unspecs.md (UNSPEC_Q_SET): Define.
    (UNSPEC_APSR_READ): Likewise.
    (VUNSPEC_APSR_WRITE): Likewise.
    * config/arm/arm-fixed.md (ssadd3): Convert to define_expand.
    (*arm_ssadd3): New define_insn.
    (sssub3): Convert

Re: [PATCH rs6000]Fix PR92132

2019-11-07 Thread Kewen.Lin

Hi Segher,

on 2019/11/7 上午7:49, Segher Boessenkool wrote:
> 
> The expander named "one_cmpl3":
> 
> Erm.  2, not 3 :-)
> 
> (define_expand "one_cmpl2"
>   [(set (match_operand:BOOL_128 0 "vlogical_operand")
> (not:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")))]
>   ""
>   "")
> 
> while the define_insn is
> 
> (define_insn_and_split "*one_cmpl3_internal"
>   [(set (match_operand:BOOL_128 0 "vlogical_operand" "=")
> (not:BOOL_128
>   (match_operand:BOOL_128 1 "vlogical_operand" "")))]
>   ""
> {
> 

Ah, sorry, I didn't notice that we have one cmpl**3** for what is actually a
one cmpl**2** expand; I'm a bit surprised.  Done.  Thanks for pointing that out.

> etc., so you can just delete the expand and rename the insn to the proper
> name (one_cmpl2).  It sometimes is useful to have an expand like
> this if there are multiple insns that could implement this, but that is
> not the case here.
> 

OK, example like vector_select?  :)

 +(define_code_iterator fpcmpun [ungt unge unlt unle])
>>>
>>> Why these four?  Should there be more?  Should this be added to some
>>> existing iterator?
>>
>> For floating point comparison operator and vector type, currently rs6000
>> supports eq, gt, ge, *ltgt, *unordered, *ordered, *uneq (* for unnamed).
>> We can leverage gt, ge, eq for lt, le, ne, then these four left.
> 
> There are four conditions for FP: lt/gt/eq/un.  For every comparison,
> exactly one of the four is true.  If not HONOR_NANS for this mode you
> never have un, so it is one of lt/gt/eq then, just like with integers.
> 
> If we have HONOR_NANS(mode) (or !flag_finite_math_only), there are 14
> possible combinations to test for (testing for any of the four or none
> of the four is easy ;-) )
> 
> Four test just if lt, gt, eq, or un is set.  Another four test if one of
> the flags is *not* set, or said differently, if one of three flags is set:
> ordered, ne, unle, unge.  The remaining six test two flags each: ltgt, le,
> unlt, ge, ungt, uneq.

Yes, for these 14, the current rs6000 support status is:

  ge: vector_ge -> define_expand -> match vsx/altivec insn
  gt: vector_gt -> define_expand -> match vsx/altivec insn
  eq: vector_eq -> define_expand -> match vsx/altivec insn
  
  ltgt: *vector_ltgt -> define_insn_and_split
  ord: *vector_ordered -> define_insn_and_split
  unord: *vector_unordered -> define_insn_and_split
  uneq: *vector_uneq -> define_insn_and_split

  ne: no RTL pattern.
  lt: Likewise.
  le: Likewise.
  unge: Likewise.
  ungt: Likewise.
  unle: Likewise.
  unlt: Likewise.

Since I thought un{ge,gt,le,lt} were a bit more complicated than ne/lt/le
(which was wrong, actually), I had added specific define_expands for them.
Following your simpler example below, I've added RTL patterns with
define_expand for the missing ne, lt, le, unge, ungt, unle, unlt.

I no longer use an iterator, since without further refactoring only a few
patterns (two per pair) can be shared through iterators, and they would need
to check to decide whether to swap or not.  Maybe a subsequent uniform
refactoring patch is needed to make that work?

> 
>> I originally wanted to merge them into the existing unordered or uneq, but
>> I found it's hard to share their existing patterns.  For example, the uneq
>> looks like:
>>
>>   [(set (match_dup 3)
>>  (gt:VEC_F (match_dup 1)
>>(match_dup 2)))
>>(set (match_dup 4)
>>  (gt:VEC_F (match_dup 2)
>>(match_dup 1)))
>>(set (match_dup 0)
>>  (and:VEC_F (not:VEC_F (match_dup 3))
>> (not:VEC_F (match_dup 4))))]
> 
> Or ge/ge/eqv, etc. -- there are multiple options.
> 
>> While ungt looks like:
>>
>>   [(set (match_dup 3)
>>  (ge:VEC_F (match_dup 1)
>>(match_dup 2)))
>>(set (match_dup 4)
>>  (ge:VEC_F (match_dup 2)
>>(match_dup 1)))
>>(set (match_dup 3)
>>  (ior:VEC_F (not:VEC_F (match_dup 3))
>> (not:VEC_F (match_dup 4))))
>>(set (match_dup 4)
>>  (gt:VEC_F (match_dup 1)
>>(match_dup 2)))
>>(set (match_dup 3)
>>  (ior:VEC_F (match_dup 3)
>> (match_dup 4)))]
> 
> (set (match_dup 3)
>  (ge:VEC_F (match_dup 2)
>(match_dup 1)))
> (set (match_dup 0)
>  (not:VEC_F (match_dup 3)))
> 
> should be enough?
> 

Nice!  I was stuck on computing unordered first.  :(

> 
> So we have only gt/ge/eq.
> 
> I think the following are optimal (not tested!):
> 
> lt(a,b) = gt(b,a)
yes, this is what I used for that operator.

> gt(a,b) = gt(a,b)
> eq(a,b) = eq(a,b)
> un(a,b) = ~(ge(a,b) | ge(b,a))
> 

existing code uses (~ge(a,b) & ~ge(b,a))
but should be the same.

> ltgt(a,b) = ge(a,b) ^ ge(b,a)

existing code uses gt(a,b) | gt(b,a)
but should be the same.

> le(a,b)   = ge(b,a)
> unlt(a,b) = ~ge(a,b)
> ge(a,b)   = ge(a,b)
> ungt(a,b) = ~ge(b,a)
> uneq(a,b) = ~(ge(a,b) ^ ge(b,a))
> 

existing code uses ~gt(a,b) & ~gt(b,a)
but should be the same.

> ord(a,b)  = ge(a,b) | ge(b,a)
> ne(a,b)   = ~eq(a,b)
> unle(a,b) = ~gt(a,b)
>

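The identities quoted above can be sanity-checked on scalars, which behave
like a single vector lane.  This is only a sketch under assumptions: the d_*
helpers, identities_hold and all_pairs_hold are hypothetical names, the
bitwise |, ^ and ~ on lane masks are modelled with operations on bool, and
the reference results come from the C99 <math.h> comparison macros.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* The three primitives assumed to be available: ge, gt, eq.  On scalars all
   of them are false when either operand is a NaN.  */
static bool ge (double a, double b) { return a >= b; }
static bool gt (double a, double b) { return a > b; }
static bool eq (double a, double b) { return a == b; }

/* Derived comparisons, written as in the list quoted above.  */
static bool d_lt (double a, double b)   { return gt (b, a); }
static bool d_le (double a, double b)   { return ge (b, a); }
static bool d_un (double a, double b)   { return !(ge (a, b) | ge (b, a)); }
static bool d_ord (double a, double b)  { return ge (a, b) | ge (b, a); }
static bool d_ltgt (double a, double b) { return ge (a, b) ^ ge (b, a); }
static bool d_uneq (double a, double b) { return !(ge (a, b) ^ ge (b, a)); }
static bool d_unlt (double a, double b) { return !ge (a, b); }
static bool d_ungt (double a, double b) { return !ge (b, a); }
static bool d_unle (double a, double b) { return !gt (a, b); }
static bool d_ne (double a, double b)   { return !eq (a, b); }

/* Compare every derived form against a reference definition.  */
static bool
identities_hold (double a, double b)
{
  bool un = isunordered (a, b);
  return d_lt (a, b) == (bool) isless (a, b)
	 && d_le (a, b) == (bool) islessequal (a, b)
	 && d_un (a, b) == un
	 && d_ord (a, b) == !un
	 && d_ltgt (a, b) == (bool) islessgreater (a, b)
	 && d_uneq (a, b) == (un || a == b)
	 && d_unlt (a, b) == (un || a < b)
	 && d_ungt (a, b) == (un || a > b)
	 && d_unle (a, b) == (un || a <= b)
	 && d_ne (a, b) == (a != b);
}

/* Run the check over all pairs of a few interesting values.  */
static bool
all_pairs_hold (void)
{
  const double vals[] = { NAN, -INFINITY, -1.0, 0.0, 1.0, INFINITY };
  const int n = sizeof vals / sizeof vals[0];
  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      if (!identities_hold (vals[i], vals[j]))
	return false;
  return true;
}
```

The check also covers the equivalences noted in the replies, since each
derived form is compared against its definition rather than against another
derived form.
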
[PATCH] Add OpenACC 2.6 `serial' construct support

2019-11-07 Thread Frederik Harwath

Hi,
this patch implements the OpenACC 2.6 "serial" construct.
It has been tested by running the testsuite with nvptx-none
offloading on x86_64-pc-linux-gnu.

Best regards,
Frederik
 
 8< ---

The `serial' construct (cf. section 2.5.3 of the OpenACC 2.6 standard)
is equivalent to a `parallel' construct with clauses `num_gangs(1)
 num_workers(1) vector_length(1)' implied.
These clauses are therefore not supported with the `serial'
construct. All the remaining clauses accepted with `parallel' are also
accepted with `serial'.

The `serial' construct is implemented like `parallel', except for
hardcoding dimensions rather than taking them from the relevant
clauses, in `expand_omp_target'.

Separate codes are used to denote the `serial' construct throughout the
middle end, even though the mapping of `serial' to an equivalent
`parallel' construct could have been done in the individual language
frontends. In particular, this makes it possible to distinguish between
`parallel' and `serial' in warnings, error messages, dumps etc.
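
As an illustration of the rule described above, here is a toy model in
Python.  It is only a sketch: launch_dimensions and DIMENSION_CLAUSES
are hypothetical names for this example and do not correspond to
anything in GCC.  It shows `serial' rejecting the three dimension
clauses and always yielding dimensions of 1, while `parallel' takes
them from its clauses.

```python
# Toy model only: these names are hypothetical, not GCC internals.
DIMENSION_CLAUSES = ("num_gangs", "num_workers", "vector_length")

def launch_dimensions(construct, clauses):
    """Return (gangs, workers, vector_length) for a compute construct.

    `clauses` maps clause names to their integer arguments.  For
    `parallel', a missing dimension clause is reported as None,
    meaning the choice is left to the implementation.
    """
    if construct == "serial":
        # The dimension clauses are implied, so they may not be given.
        rejected = sorted(set(DIMENSION_CLAUSES).intersection(clauses))
        if rejected:
            raise ValueError(f"not valid on 'serial': {rejected}")
        return (1, 1, 1)
    if construct == "parallel":
        return tuple(clauses.get(name) for name in DIMENSION_CLAUSES)
    raise ValueError(f"unknown construct: {construct!r}")
```

All other clauses pass through unchanged in this model, mirroring the
statement that the remaining `parallel' clauses are accepted on
`serial' as well.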

2019-11-07  Maciej W. Rozycki  
Tobias Burnus  
Frederik Harwath  

gcc/
* gimple.h (gf_mask): Add GF_OMP_TARGET_KIND_OACC_SERIAL
enumeration constant.
(is_gimple_omp_oacc): Handle GF_OMP_TARGET_KIND_OACC_SERIAL.
(is_gimple_omp_offloaded): Likewise.
* gimplify.c (omp_region_type): Add ORT_ACC_SERIAL enumeration
constant.  Adjust the value of ORT_NONE accordingly.
(is_gimple_stmt): Handle OACC_SERIAL.
(oacc_default_clause): Handle ORT_ACC_SERIAL.
(gomp_needs_data_present): Likewise.
(gimplify_adjust_omp_clauses): Likewise.
(gimplify_omp_workshare): Handle OACC_SERIAL.
(gimplify_expr): Likewise.
* omp-builtins.def (BUILT_IN_GOACC_PARALLEL): Add parameter.
* omp-expand.c (expand_omp_target):
Handle GF_OMP_TARGET_KIND_OACC_SERIAL.
(build_omp_regions_1, omp_make_gimple_edges): Likewise.
* omp-low.c (is_oacc_parallel): Rename function to...
(is_oacc_parallel_or_serial): ... this.
Handle GF_OMP_TARGET_KIND_OACC_SERIAL.
(scan_sharing_clauses): Adjust accordingly.
(scan_omp_for): Likewise.
(lower_oacc_head_mark): Likewise.
(convert_from_firstprivate_int): Likewise.
(lower_omp_target): Likewise.
(check_omp_nesting_restrictions): Handle
GF_OMP_TARGET_KIND_OACC_SERIAL.
(lower_oacc_reductions): Likewise.
(lower_omp_target): Likewise.
* tree.def (OACC_SERIAL): New tree code.
* tree-pretty-print.c (dump_generic_node): Handle OACC_SERIAL.

* doc/generic.texi (OpenACC): Document OACC_SERIAL.

gcc/c-family/
* c-pragma.h (pragma_kind): Add PRAGMA_OACC_SERIAL enumeration
constant.
* c-pragma.c (oacc_pragmas): Add "serial" entry.

gcc/c/
* c-parser.c (OACC_SERIAL_CLAUSE_MASK): New macro.
(c_parser_oacc_kernels_parallel): Rename function to...
(c_parser_oacc_compute): ... this.  Handle PRAGMA_OACC_SERIAL.
(c_parser_omp_construct): Update accordingly.

gcc/cp/
* constexpr.c (potential_constant_expression_1): Handle
OACC_SERIAL.
* parser.c (OACC_SERIAL_CLAUSE_MASK): New macro.
(cp_parser_oacc_kernels_parallel): Rename function to...
(cp_parser_oacc_compute): ... this.  Handle PRAGMA_OACC_SERIAL.
(cp_parser_omp_construct): Update accordingly.
(cp_parser_pragma): Handle PRAGMA_OACC_SERIAL.  Fix alphabetic
order.
* pt.c (tsubst_expr): Handle OACC_SERIAL.

gcc/fortran/
* gfortran.h (gfc_statement): Add ST_OACC_SERIAL_LOOP,
ST_OACC_END_SERIAL_LOOP, ST_OACC_SERIAL and ST_OACC_END_SERIAL
enumeration constants.
(gfc_exec_op): Add EXEC_OACC_SERIAL_LOOP and EXEC_OACC_SERIAL
enumeration constants.
* match.h (gfc_match_oacc_serial): New prototype.
(gfc_match_oacc_serial_loop): Likewise.
* dump-parse-tree.c (show_omp_node, show_code_node): Handle
EXEC_OACC_SERIAL_LOOP and EXEC_OACC_SERIAL.
* match.c (match_exit_cycle): Handle EXEC_OACC_SERIAL_LOOP.
* openmp.c (OACC_SERIAL_CLAUSES): New macro.
(gfc_match_oacc_serial_loop): New function.
(gfc_match_oacc_serial): Likewise.
(oacc_is_loop): Handle EXEC_OACC_SERIAL_LOOP.
(resolve_omp_clauses): Handle EXEC_OACC_SERIAL.
(oacc_code_to_statement): Handle EXEC_OACC_SERIAL and
EXEC_OACC_SERIAL_LOOP.
(gfc_resolve_oacc_directive): Likewise.
* parse.c (decode_oacc_directive) <'s'>: Add case for "serial"
and "serial loop".
(next_statement): Handle ST_OACC_SERIAL_LOOP and ST_OACC_SERIAL.
(gfc_ascii_statement): Likewise.  Handle ST_OACC_END_SERIAL_LOOP
and ST_OACC_END_SERIAL.

Re: [PATCH target/92295] Fix inefficient vector constructor

2019-11-07 Thread Richard Biener

On Thu, Nov 7, 2019 at 7:58 AM Hongtao Liu  wrote:
>
> Ping!

OK.

Thanks,
Richard.

> On Sat, Nov 2, 2019 at 9:38 PM Hongtao Liu  wrote:
> >
> > Hi Jakub:
> >   Could you help review this patch?
> >
> > PS: I am asking you because this patch is related to vectors (avx512f),
> > and Uros mentioned before that he has no intention of maintaining avx512f.
> >
> > On Fri, Nov 1, 2019 at 9:12 AM Hongtao Liu  wrote:
> > >
> > > Hi uros:
> > > >   This patch fixes an inefficient vector constructor.
> > > >   Currently, ix86_expand_vector_init_concat initializes vectors
> > > > two elements at a time, which can miss optimization opportunities
> > > > such as PR92295.
> > >
> > > >   Bootstrap and i386 regression tests are OK.
> > >   Ok for trunk?
> > >
> > > Changelog
> > > gcc/
> > > PR target/92295
> > > > * config/i386/i386-expand.c (ix86_expand_vector_init_concat):
> > > > Enhance ix86_expand_vector_init_concat.
> > >
> > > gcc/testsuite
> > > * gcc.target/i386/pr92295.c: New test.
> > >
> > > --
> > > BR,
> > > Hongtao
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao

Re: [patch][avr] PR92055: Add switches to enable 64-bit [long] double.

2019-11-07 Thread Martin Liška


Hello.

I've noticed quite a few GNU coding style violations in your patch.
Next time, please use something like:

$ git diff HEAD~ > /tmp/patch && ./contrib/check_GNU_style.py /tmp/patch

Thanks,
Martin

Re: Generalise gather and scatter optabs

2019-11-07 Thread Richard Biener

On Wed, Nov 6, 2019 at 5:06 PM Richard Sandiford
 wrote:
>
> The gather and scatter optabs required the vector offset to be
> the integer equivalent of the vector mode being loaded or stored.
> This patch generalises them so that the two vectors can have different
> element sizes, although they still need to have the same number of
> elements.
>
> One consequence of this is that it's possible (if unlikely)
> for two IFN_GATHER_LOADs to have the same arguments but different
> return types.  E.g. the same scalar base and vector of 32-bit offsets
> could be used to load 8-bit elements and to load 16-bit elements.
> From just looking at the arguments, we could wrongly deduce that
> they're equivalent.
>
> I know we saw this happen at one point with IFN_WHILE_ULT,
> and we dealt with it there by passing a zero of the return type
> as an extra argument.  Doing the same here also makes the load
> and store functions have the same argument assignment.
>
> For now this patch should be a no-op, but later SVE patches take
> advantage of the new flexibility.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.

Thanks,
Richard.
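
The ambiguity described in the quoted message (two IFN_GATHER_LOAD
calls with identical arguments but different return types) can be
sketched with a small Python model.  This is an assumption-laden
illustration, not GCC code: Call, cse_key and with_typed_zero are
hypothetical names, and types are represented as plain strings.

```python
from collections import namedtuple

# Hypothetical stand-in for an internal function call.
Call = namedtuple("Call", "fn args result_type")

def cse_key(call):
    """Equivalence key built from the function and its arguments only."""
    return (call.fn, call.args)

def with_typed_zero(call):
    """Append a zero of the result type as an extra argument."""
    zero = (call.result_type, 0)
    return Call(call.fn, call.args + (zero,), call.result_type)

# Same scalar base, same vector of 32-bit offsets, same scale ...
load8 = Call("IFN_GATHER_LOAD", ("base", "offsets_v_i32", 1), "v_i8")
# ... but loading 16-bit elements instead of 8-bit ones.
load16 = Call("IFN_GATHER_LOAD", ("base", "offsets_v_i32", 1), "v_i16")

# Looking at the arguments alone, the two calls are indistinguishable.
collide_before = cse_key(load8) == cse_key(load16)

# With the typed zero appended, the keys differ.
collide_after = cse_key(with_typed_zero(load8)) == cse_key(with_typed_zero(load16))
```

In this model collide_before is true and collide_after is false.  The
zero is appended after the scale, which matches the ChangeLog note
below that the mask is now expected to be argument 4.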

> Richard
>
>
> 2019-11-06  Richard Sandiford  
>
> gcc/
> * optabs.def (gather_load_optab, mask_gather_load_optab)
> (scatter_store_optab, mask_scatter_store_optab): Turn into
> conversion optabs, with the offset mode given explicitly.
> * doc/md.texi: Update accordingly.
> * config/aarch64/aarch64-sve-builtins-base.cc
> (svld1_gather_impl::expand): Likewise.
> (svst1_scatter_impl::expand): Likewise.
> * internal-fn.c (gather_load_direct, scatter_store_direct): Likewise.
> (expand_scatter_store_optab_fn): Likewise.
> (direct_gather_load_optab_supported_p): Likewise.
> (direct_scatter_store_optab_supported_p): Likewise.
> (expand_gather_load_optab_fn): Likewise.  Expect the mask argument
> to be argument 4.
> (internal_fn_mask_index): Return 4 for IFN_MASK_GATHER_LOAD.
> (internal_gather_scatter_fn_supported_p): Replace the offset sign
> argument with the offset vector type.  Require the two vector
> types to have the same number of elements but allow their element
> sizes to be different.  Treat the optabs as conversion optabs.
> * internal-fn.h (internal_gather_scatter_fn_supported_p): Update
> prototype accordingly.
> * optabs-query.c (supports_at_least_one_mode_p): Replace with...
> (supports_vec_convert_optab_p): ...this new function.
> (supports_vec_gather_load_p): Update accordingly.
> (supports_vec_scatter_store_p): Likewise.
> * tree-vectorizer.h (vect_gather_scatter_fn_p): Take a vec_info.
> Replace the offset sign and bits parameters with a scalar type tree.
> * tree-vect-data-refs.c (vect_gather_scatter_fn_p): Likewise.
> Pass back the offset vector type instead of the scalar element type.
> Allow the offset to be wider than the memory elements.  Search for
> an offset type that the target supports, stopping once we've
> reached the maximum of the element size and pointer size.
> Update call to internal_gather_scatter_fn_supported_p.
> (vect_check_gather_scatter): Update calls accordingly.
> When testing a new scale before knowing the final offset type,
> check whether the scale is supported for any signed or unsigned
> offset type.  Check whether the target supports the source and
> target types of a conversion before deciding whether to look
> through the conversion.  Record the chosen offset_vectype.
> * tree-vect-patterns.c (vect_get_gather_scatter_offset_type): Delete.
> (vect_recog_gather_scatter_pattern): Get the scalar offset type
> directly from the gs_info's offset_vectype instead.  Pass a zero
> of the result type to IFN_GATHER_LOAD and IFN_MASK_GATHER_LOAD.
> * tree-vect-stmts.c (check_load_store_masking): Update call to
> internal_gather_scatter_fn_supported_p, passing the offset vector
> type recorded in the gs_info.
> (vect_truncate_gather_scatter_offset): Update call to
> vect_check_gather_scatter, leaving it to search for a valid
> offset vector type.
> (vect_use_strided_gather_scatters_p): Convert the offset to the
> element type of the gs_info's offset_vectype.
> (vect_get_gather_scatter_ops): Get the offset vector type directly
> from the gs_info.
> (vect_get_strided_load_store_ops): Likewise.
> (vectorizable_load): Pass a zero of the result type to IFN_GATHER_LOAD
> and IFN_MASK_GATHER_LOAD.
> * config/aarch64/aarch64-sve.md (gather_load): Rename to...
> (gather_load): ...this.
> (mask_gather_load): Rename to...
> (mask_gather_load): ...this.
> (scatter_store): Rename
