Re: libgo patch committed: Upgrade to Go 1.9 release

2017-09-15 Thread Ian Lance Taylor
On Fri, Sep 15, 2017 at 5:03 AM, Rainer Orth
 wrote:
>
 the patch broke Solaris bootstrap:

 /vol/gcc/src/hg/trunk/local/libgo/go/syscall/exec_unix.go:240:11: error: 
 reference to undefined name 'forkExecPipe'
   if err = forkExecPipe(p[:]); err != nil {
^

 libgo/go/syscall/forkpipe_bsd.go is needed on Solaris, too.

 /vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/link.go:73:10: 
 error: use of undefined type 'lifnum'
  lifn := lifnum{Flags: sysLIFC_NOXMIT | sysLIFC_TEMPORARY | sysLIFC_ALLZONES | sysLIFC_UNDER_IPMP}
   ^
 make[8]: *** [Makefile:3349: golang_org/x/net/lif.lo] Error 1

 The Go 1.9 upgrade patch has

 @@ -70,7 +70,7 @@ func Links(af int, name string) ([]Link,

  func links(eps []endpoint, name string) ([]Link, error) {
 var lls []Link
 -   lifn := sysLifnum{Flags: sysLIFC_NOXMIT | sysLIFC_TEMPORARY | sysLIFC_ALLZONES | sysLIFC_UNDER_IPMP}
 +   lifn := lifnum{Flags: sysLIFC_NOXMIT | sysLIFC_TEMPORARY | sysLIFC_ALLZONES | sysLIFC_UNDER_IPMP}

 Reverting that allows link.go to compile.

 /vol/gcc/src/hg/trunk/local/libgo/go/internal/poll/fd_unix.go:366:21: 
 error: reference to undefined identifier 'syscall.ReadDirent'
n, err := syscall.ReadDirent(fd.Sysfd, buf)
  ^

 I don't yet see where this comes from on non-Linux systems...
>>>
>>> It's in forkpipe_bsd.go.  Does this patch fix the problem?
>>
>> that's true for forkExecPipe and I had this change in the patch I'd
>> attached.  But what about syscall.ReadDirent?  I couldn't find that
>> one...
>
> I've had success with this patch on sparc-sun-solaris2.11 and
> i386-pc-solaris2.11.
>
> I've no idea what's behind upstream's src/syscall/syscall_solaris.go:
>
> func ReadDirent(fd int, buf []byte) (n int, err error) {
> // Final argument is (basep *uintptr) and the syscall doesn't take nil.
> // TODO(rsc): Can we use a single global basep for all calls?
> return Getdents(fd, buf, new(uintptr))
> }
>
> I could find no hint that getdents(2) has an additional basep arg,
> neither in OpenSolaris sources nor in Illumos, so I've ignored this
> weirdness.

Thanks for sorting this out.  I committed your patch.

Ian


Re: [PATCH][RFC] Radically simplify emission of balanced tree for switch statements.

2017-09-15 Thread Jeff Law
On 09/14/2017 06:17 AM, Martin Liška wrote:
> Hello.
> 
> As mentioned at Cauldron 2017, the second step in switch lowering should be
> a massive simplification of the code that does expansion of the balanced
> tree.  Basically it includes VRP and DCE, which we can for obvious reasons
> do on our own.
> 
> The patch does that, and introduces a separate pass for -O0 that's responsible
> for lowering at the end of tree pass.
> 
> There's a small fallout that I would like to discuss:
> 
> 1) vrp105.c - the threading opportunity between the default cases is no
> longer caught; adding Patrick, who can probably help explain why the
> opportunity is skipped with the expanded tree.
Well, VRP knows how to analyze a switch statement and determine when
paths out of one switch imply a particular case will be taken in a later
switch.  In this particular case we're trying to verify that the default
case in the first switch threads directly to the default case in the
second (though it seems we ought to be able to totally thread the cases).

Obviously we don't have switches anymore after your lowering pass.  But
ISTM we still ought to be able to evaluate how your lowering affects
jump threading on this case.  In theory the lowered switch statements
should be easier to thread.

Reality is sadly different.  There are two cases to consider: one when
i < 0 (should go to the first default, then directly to the second
default), and one when i > 1 (which should also go to the first default,
then directly to the second default).
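The shape of such a testcase can be sketched as follows (a hypothetical reconstruction for illustration only; the actual vrp105.c differs — the counters here are just a way to observe which default arms run):

```c
#include <assert.h>

/* Hypothetical sketch in the spirit of vrp105.c (names illustrative):
   for i < 0 or i > 1, the default of the first switch should thread
   directly to the default of the second.  */
static int first_default, second_default;

static void
two_switches (int i)
{
  switch (i)
    {
    case 0:  break;
    case 1:  break;
    default: first_default++; break;   /* reached when i < 0 or i > 1 */
    }
  switch (i)
    {
    case 0:  break;
    case 1:  break;
    default: second_default++; break;  /* same range implies same arm */
    }
}
```

On any path through the first default, the range of i already determines the outcome of the second switch, which is what the jump threader is expected to exploit.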

When i < 0 we do in fact thread through the second switch and go from
the first default case directly to the second default case.

When i > 1 we still go through the test for the second switch.

These are the key blocks:

;;   basic block 2, loop depth 0, freq 1, maybe hot
;;prev block 0, next block 3, flags: (NEW, REACHABLE, VISITED)
;;pred:   ENTRY [always]  (FALLTHRU,EXECUTABLE)
  i.0_1 = i;
  if (i.0_1 > 0)
goto ; [50.01%] [count: INV]
  else
goto ; [49.99%] [count: INV]
;;succ:   3 [50.0% (guessed)]  (FALSE_VALUE,EXECUTABLE)
;;4 [50.0% (guessed)]  (TRUE_VALUE,EXECUTABLE)

;;   basic block 3, loop depth 0, freq 1, maybe hot
;;   Invalid sum of incoming frequencies 4999, should be 1
;;prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
;;pred:   2 [50.0% (guessed)]  (FALSE_VALUE,EXECUTABLE)
  if (i.0_1 == 0)
goto ; [40.00%] [count: INV]
  else
goto ; [60.00%] [count: INV]
;;succ:   14 [60.0% (guessed)]  (FALSE_VALUE,EXECUTABLE)
;;5 [40.0% (guessed)]  (TRUE_VALUE,EXECUTABLE)

;;   basic block 4, loop depth 0, freq 1, maybe hot
;;   Invalid sum of incoming frequencies 5001, should be 1
;;prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
;;pred:   2 [50.0% (guessed)]  (TRUE_VALUE,EXECUTABLE)
  _13 = i.0_1 != 1;
  if (_13 != 0)
goto ; [16.68%] [count: INV]
  else
goto ; [83.32%] [count: INV]
;;succ:   6 [83.3% (guessed)]  (FALSE_VALUE,EXECUTABLE)
;;13 [16.7% (guessed)]  (TRUE_VALUE,EXECUTABLE)

[ ... ]

;;   basic block 9, loop depth 0, freq , maybe hot
;;   Invalid sum of incoming frequencies 3999, should be 
;;prev block 8, next block 10, flags: (NEW, REACHABLE, VISITED)
;;pred:   13 [always]  (FALLTHRU,EXECUTABLE)
;;7 [always (guessed)]  (TRUE_VALUE,EXECUTABLE)
  # prephitmp_19 = PHI 
  _2 = prephitmp_19 != 1;
  if (_2 != 0)
goto ; [16.68%] [count: INV]
  else
goto ; [83.32%] [count: INV]
;;succ:   11 [83.3% (guessed)]  (FALSE_VALUE,EXECUTABLE)
;;12 [16.7% (guessed)]  (TRUE_VALUE,EXECUTABLE)

[ ... ]
;;   basic block 13, loop depth 0, freq 1668, maybe hot
;;prev block 12, next block 14, flags: (NEW, REACHABLE, VISITED)
;;pred:   4 [16.7% (guessed)]  (TRUE_VALUE,EXECUTABLE)
  # prephitmp_15 = PHI 
  goto ; [100.00%] [count: INV]
;;succ:   9 [always]  (FALLTHRU,EXECUTABLE)



When bb9 is reached from bb13 we know by back substitution that the
assignment simplifies step by step:

 _2 = prephitmp_19 != 1
 _2 = prephitmp_15 != 1
 _2 = i.0_1 != 1

And we should know enough about the range of i.0_1 to determine that it
is true, which would allow us to thread the jump.  But a few things get
in the way.

First, the VRP threader doesn't try hard at all to simplify gimple
assignments.  It really just tries to simplify gimple_cond and
gimple_switch.  This could be trivially improved.

So if I do the right things by hand we end up trying to evaluate i.0_1
!= 1.  So that's good.  But we don't get a useful result back from
vrp_evaluate_conditional.  Why?  Because the recorded range for i.0_1 is
[1,INF] -- that's the global range for i.0_1, not the range on the path.

Now it turns out this is precisely the problem that I've got code to
address in my queue which fixes a regression I was working on earlier
this year in the gcc-7 cycle.  It's backed up behind some improvements
to 

Backports to 6.x

2017-09-15 Thread Jakub Jelinek
Hi!

I've backported 13 commits of mine and one from Richard.
Bootstrapped/regtested on x86_64-linux and i686-linux, committed
to gcc-6-branch.

Jakub
2017-09-15  Jakub Jelinek  

Backported from mainline
2017-06-30  Jakub Jelinek  

PR target/81225
* config/i386/sse.md (vec_extract_lo_): For
V8FI, V16FI and VI8F_256 iterators, use  instead
of nonimmediate_operand and  instead of m for
the input operand.  For V8FI iterator, always split if input is a MEM.
For V16FI and V8SF_256 iterators, don't test if both operands are MEM
if .  For VI4F_256 iterator, use 
instead of register_operand and  instead of v for
the input operand.  Make sure both operands aren't MEMs for if not
.

* gcc.target/i386/pr81225.c: New test.

--- gcc/config/i386/sse.md  (revision 250284)
+++ gcc/config/i386/sse.md  (revision 250285)
@@ -7230,12 +7230,13 @@ (define_insn "vec_extract_lo__mask
 (define_insn "vec_extract_lo_"
   [(set (match_operand: 0 "" 
"=,v")
(vec_select:
- (match_operand:V8FI 1 "nonimmediate_operand" "v,m")
+ (match_operand:V8FI 1 "" 
"v,")
  (parallel [(const_int 0) (const_int 1)
 (const_int 2) (const_int 3)])))]
-  "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
+  "TARGET_AVX512F
+   && ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))"
 {
-  if ( || !TARGET_AVX512VL)
+  if ( || (!TARGET_AVX512VL && !MEM_P (operands[1])))
 return "vextract64x4\t{$0x0, %1, 
%0|%0, %1, 0x0}";
   else
 return "#";
@@ -7374,14 +7375,15 @@ (define_expand "avx_vextractf128"
 (define_insn "vec_extract_lo_"
   [(set (match_operand: 0 "nonimmediate_operand" "=v,m")
(vec_select:
- (match_operand:V16FI 1 "nonimmediate_operand" "vm,v")
+ (match_operand:V16FI 1 ""
+",v")
  (parallel [(const_int 0) (const_int 1)
  (const_int 2) (const_int 3)
  (const_int 4) (const_int 5)
  (const_int 6) (const_int 7)])))]
   "TARGET_AVX512F
&& 
-   && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
+   && ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))"
 {
   if ()
 return "vextract32x8\t{$0x0, %1, 
%0|%0, %1, 0x0}";
@@ -7413,11 +7415,12 @@ (define_split
 (define_insn "vec_extract_lo_"
   [(set (match_operand: 0 "" "=v,m")
(vec_select:
- (match_operand:VI8F_256 1 "nonimmediate_operand" "vm,v")
+ (match_operand:VI8F_256 1 ""
+   ",v")
  (parallel [(const_int 0) (const_int 1)])))]
   "TARGET_AVX
&&  && 
-   && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
+   && ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))"
 {
   if ()
 return "vextract64x2\t{$0x0, %1, %0%{%3%}|%0%{%3%}, %1, 0x0}";
@@ -7493,12 +7496,16 @@ (define_split
 
 
 (define_insn "vec_extract_lo_"
-  [(set (match_operand: 0 "" 
"=")
+  [(set (match_operand: 0 ""
+ "=,v")
(vec_select:
- (match_operand:VI4F_256 1 "register_operand" "v")
+ (match_operand:VI4F_256 1 ""
+   "v,")
  (parallel [(const_int 0) (const_int 1)
 (const_int 2) (const_int 3)])))]
-  "TARGET_AVX &&  && "
+  "TARGET_AVX
+   &&  && 
+   && ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))"
 {
   if ()
 return "vextract32x4\t{$0x0, %1, 
%0|%0, %1, 0x0}";
--- gcc/testsuite/gcc.target/i386/pr81225.c (nonexistent)
+++ gcc/testsuite/gcc.target/i386/pr81225.c (revision 250285)
@@ -0,0 +1,14 @@
+/* PR target/81225 */
+/* { dg-do compile } */
+/* { dg-options "-mavx512ifma -O3 -ffloat-store" } */
+
+long a[24];
+float b[4], c[24];
+int d;
+
+void
+foo ()
+{
+  for (d = 0; d < 24; d++)
+c[d] = (float) d ? : b[a[d]];
+}
2017-09-15  Jakub Jelinek  

PR libquadmath/65757
* math/roundq.c: Cherry-pick upstream glibc 2015-04-28 change.

--- libquadmath/math/roundq.c   (revision 250378)
+++ libquadmath/math/roundq.c   (revision 250379)
@@ -1,5 +1,5 @@
 /* Round __float128 to integer away from zero.
-   Copyright (C) 1997, 1999 Free Software Foundation, Inc.
+   Copyright (C) 1997-2017 Free Software Foundation, Inc.
This file is part of the GNU C Library.
Contributed by Ulrich Drepper , 1997 and
  Jakub Jelinek , 1999.
@@ -32,7 +32,7 @@ roundq (__float128 x)
 
   GET_FLT128_WORDS64 (i0, i1, x);
   j0 = ((i0 >> 48) & 0x7fff) - 0x3fff;
-  if (j0 < 31)
+  if (j0 < 48)
 {
   if (j0 < 0)
{
2017-09-15  Jakub Jelinek  

Backported from mainline
2017-08-08  Richard Biener  

PR middle-end/81766
* function.c (thread_prologue_and_epilogue_insns): Restore
behavior of always calling 

RFA (hash-map): PATCH to support GTY((cache)) with hash_map

2017-09-15 Thread Jason Merrill
The hash_map interface is a lot more convenient than that of
hash_table for cases where it makes sense, but there hasn't been a way
to get the ggc_cache_remove behavior with a hash_map.  In other words,
not marking elements during the initial ggc marking phase, but maybe
marking them during the clear_caches phase based on keep_cache_entry.

This patch implements that by:

Adding a ggc_maybe_mx member function to ggc_remove, and overriding
that instead of ggc_mx in ggc_cache_remove.
Calling H::ggc_maybe_mx instead of H::ggc_mx in gt_ggc_mx (hash_table *).
Calling H::ggc_mx in gt_cleare_caches (hash_table *) rather than
relying on an extern declaration of a plain function that cannot be
declared for hash_map::hash_entry.
Adding ggc_maybe_mx and keep_cache_entry to hash_map::hash_entry.
Adding gt_cleare_cache for hash_map.
Adding a boolean constant to the hash-map traits indicating whether we
want the cache behavior above.

I then define a typedef tree_cache_map to use this functionality, and
use it in a few places in the C++ front end.

Tested x86_64-pc-linux-gnu, OK for trunk?
commit 29a77538437442915ecb85516e3710c918d0e8ac
Author: Jason Merrill 
Date:   Wed Sep 13 14:33:33 2017 -0400

Support GTY((cache)) on hash_map.

gcc/
* hash-traits.h (ggc_remove): Add ggc_maybe_mx member function.
(ggc_cache_remove): Override it instead of ggc_mx.
* hash-table.h (gt_ggc_mx): Call it instead of ggc_mx.
(gt_cleare_cache): Call ggc_mx instead of gt_ggc_mx.
* hash-map-traits.h (simple_hashmap_traits): Add maybe_mx member.
(simple_cache_map_traits): Override maybe_mx.
* hash-map.h (hash_entry): Add ggc_maybe_mx and keep_cache_entry.
(hash_map): Friend gt_cleare_cache.
(gt_cleare_cache): New.
* tree.h (tree_cache_traits): New hash_map traits class.
(tree_cache_map): New typedef.
gcc/cp/
* decl.c (decomp_type_table): Use tree_cache_map.
* init.c (nsdmi_inst): Likewise.
* pt.c (defarg_ints): Likewise.
* cp-objcp-common.c (cp_get_debug_type): Likewise.

diff --git a/gcc/cp/cp-objcp-common.c b/gcc/cp/cp-objcp-common.c
index 183e7f7bf57..27f0b985378 100644
--- a/gcc/cp/cp-objcp-common.c
+++ b/gcc/cp/cp-objcp-common.c
@@ -131,19 +131,7 @@ cxx_types_compatible_p (tree x, tree y)
   return same_type_ignoring_top_level_qualifiers_p (x, y);
 }
 
-struct debug_type_hasher : ggc_cache_ptr_hash
-{
-  static hashval_t hash (tree_map *m) { return tree_map_hash (m); }
-  static bool equal (tree_map *a, tree_map *b) { return tree_map_eq (a, b); }
-
-  static int
-  keep_cache_entry (tree_map *)
-  {
-return ggc_marked_p (e->base.from);
-  }
-};
-
-static GTY((cache)) hash_table *debug_type_hash;
+static GTY((cache)) tree_cache_map *debug_type_map;
 
 /* Return a type to use in the debug info instead of TYPE, or NULL_TREE to
keep TYPE.  */
@@ -151,38 +139,29 @@ static GTY((cache)) hash_table 
*debug_type_hash;
 tree
 cp_get_debug_type (const_tree type)
 {
+  tree dtype = NULL_TREE;
+
   if (TYPE_PTRMEMFUNC_P (type) && !typedef_variant_p (type))
+dtype = build_offset_type (TYPE_PTRMEMFUNC_OBJECT_TYPE (type),
+  TREE_TYPE (TYPE_PTRMEMFUNC_FN_TYPE (type)));
+
+  /* We cannot simply return the debug type here because the function uses
+ the type canonicalization hashtable, which is GC-ed, so its behavior
+ depends on the actual collection points.  Since we are building these
+ types on the fly for the debug info only, they would not be attached
+ to any GC root and always be swept, so we would make the contents of
+ the debug info depend on the collection points.  */
+  if (dtype)
 {
-  if (debug_type_hash == NULL)
-   debug_type_hash = hash_table::create_ggc (512);
-
-  /* We cannot simply use build_offset_type here because the function uses
-the type canonicalization hashtable, which is GC-ed, so its behavior
-depends on the actual collection points.  Since we are building these
-types on the fly for the debug info only, they would not be attached
-to any GC root and always be swept, so we would make the contents of
-the debug info depend on the collection points.  */
-  struct tree_map in, *h;
-
-  in.base.from = CONST_CAST_TREE (type);
-  in.hash = htab_hash_pointer (type);
-  h = debug_type_hash->find_with_hash (, in.hash);
-  if (h)
-   return h->to;
-
-  tree t = build_offset_type (TYPE_PTRMEMFUNC_OBJECT_TYPE (type),
- TREE_TYPE (TYPE_PTRMEMFUNC_FN_TYPE (type)));
-
-  h = ggc_alloc ();
-  h->base.from = CONST_CAST_TREE (type);
-  h->hash = htab_hash_pointer (type);
-  h->to = t;
-  *debug_type_hash->find_slot_with_hash (h, h->hash, INSERT) = h;
-
-  return t;
+  tree ktype = CONST_CAST_TREE (type);
+  if 

Re: [PATCH, rs6000] gimple folding vector load test variant

2017-09-15 Thread Segher Boessenkool
On Fri, Sep 15, 2017 at 10:06:40AM -0500, Will Schmidt wrote:
>   This is a test created during investigation of the feedback on
> the rs6000 gimple vector folding code, regarding the handling of
> arg1_type.  Inspired by feedback from Richard and Bill.
> 
> This was useful to illustrate the issue to me.  Whether this is a
> valid test for the testsuite I'll defer to the judgement of the
> maintainers.. :-)

Looks fine to me, except one thing:

> +/* { dg-do compile { target lp64 } } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-mvsx -O2 -mpower8-vector -fdump-tree-gimple" } */

If you use -mpower8-vector (do you need it?), you should use

/* { dg-require-effective-target powerpc_p8vector_ok } */

(instead of the vsx_ok, which is then implied).

Okay for trunk with that fixed.  Thanks,


Segher


Go patch committed: return an error statement for fallthrough in last case

2017-09-15 Thread Ian Lance Taylor
This patch by Cherry Zhang changes the Go frontend to call
error_statement when generating GIMPLE for a fallthrough in the last
case in a switch.  This avoids generating incorrect IR in an erroneous
case.  Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.
Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 252849)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-de7b370901c4fc6852eaa7372282bb699429ec4a
+70cf67704699c8bcaf6f52437812367cdc4ad169
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/statements.cc
===
--- gcc/go/gofrontend/statements.cc (revision 252749)
+++ gcc/go/gofrontend/statements.cc (working copy)
@@ -3707,6 +3707,12 @@ Case_clauses::get_backend(Translate_cont
   std::vector cases;
   Bstatement* stat = p->get_backend(context, break_label, _constants,
);
+  // The final clause can't fall through.
+  if (i == c - 1 && p->is_fallthrough())
+{
+  go_assert(saw_errors());
+  stat = context->backend()->error_statement();
+}
   (*all_cases)[i].swap(cases);
   (*all_statements)[i] = stat;
 }


Re: [PATCH, rs6000] [v3] Folding of vector loads in GIMPLE

2017-09-15 Thread Segher Boessenkool
Hi Will,

On Fri, Sep 15, 2017 at 09:59:54AM -0500, Will Schmidt wrote:
> +/* Vector loads.  */
> +case ALTIVEC_BUILTIN_LVX_V16QI:
> +case ALTIVEC_BUILTIN_LVX_V8HI:
> +case ALTIVEC_BUILTIN_LVX_V4SI:
> +case ALTIVEC_BUILTIN_LVX_V4SF:
> +case ALTIVEC_BUILTIN_LVX_V2DI:
> +case ALTIVEC_BUILTIN_LVX_V2DF:
> +  {
> +  gimple *g;

This is only used much later (and for a short time only), so just declare
it there?

Other than that nit, looks fine to me.  Okay for trunk.  Thanks!


Segher


Go patch committed: check for error expression in Array_type::get_backend_length

2017-09-15 Thread Ian Lance Taylor
This patch by Cherry Zhang to the Go frontend checks for an error
expression in Array_type::get_backend_length.  Otherwise, a zero
length is created in the backend and the backend doesn't know there is
an error.  Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.
Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 252767)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-4e063a8eee636cce17aea48c7183e78431174de3
+de7b370901c4fc6852eaa7372282bb699429ec4a
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/types.cc
===
--- gcc/go/gofrontend/types.cc  (revision 252746)
+++ gcc/go/gofrontend/types.cc  (working copy)
@@ -7638,6 +7638,11 @@ Array_type::get_backend_length(Gogo* gog
   go_assert(this->length_ != NULL);
   if (this->blength_ == NULL)
 {
+  if (this->length_->is_error_expression())
+{
+  this->blength_ = gogo->backend()->error_expression();
+  return this->blength_;
+}
   Numeric_constant nc;
   mpz_t val;
   if (this->length_->numeric_constant_value() && nc.to_int())


[PATCH,rs6000] Replace swap of a loaded vector constant with load of a swapped vector constant

2017-09-15 Thread Kelvin Nilsen

On Power8 little endian, two instructions are needed to load from the
natural in-memory representation of a vector into a vector register: a
load followed by a swap.  When the vector value to be loaded is a
constant, more efficient code can be achieved by swapping the
representation of the constant in memory so that only a load instruction
is required.
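
The key observation can be illustrated in plain C (an illustrative sketch, not the GCC implementation): exchanging the two doublewords of a 16-byte constant image is an involution, so storing the constant-pool image pre-swapped means the little-endian load needs no runtime swap.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative sketch only: exchange the two 8-byte doublewords of a
   16-byte vector image.  A swap of a swap is the identity, so a
   pre-swapped constant-pool image followed by the hardware's
   little-endian load order yields the original lane values without a
   runtime swap instruction.  */
static void
swap_doublewords (uint8_t v[16])
{
  for (int i = 0; i < 8; i++)
    {
      uint8_t t = v[i];
      v[i] = v[i + 8];
      v[i + 8] = t;
    }
}
```

This mirrors the patch's strategy at the source level: instead of emitting the natural image and a swap instruction, emit the swapped image and no swap.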

This patch has been bootstrapped and tested without regressions on
powerpc64le-unknown-linux (P8) and on powerpc-unknown-linux (P8,
big-endian, with both -m32 and -m64 target options).

Is this ok for trunk?

gcc/ChangeLog:

2017-09-14  Kelvin Nilsen  

* config/rs6000/rs6000-p8swap.c (const_load_sequence_p): Revise
this function to return false if the definition used by the swap
instruction is artificial, or if the memory address from which the
constant value is loaded is not represented by a base address held
in a register or if the base address register is a frame or stack
pointer.  Additionally, return false if the base address of the
loaded constant is a SYMBOL_REF but is not considered to be a
constant.
(replace_swapped_load_constant): New function.
(rs6000_analyze_swaps): Add a new pass to replace a swap of a
loaded constant vector with a load of a swapped constant vector.

gcc/testsuite/ChangeLog:

2017-09-14  Kelvin Nilsen  

* gcc.target/powerpc/swaps-p8-28.c: New test.
* gcc.target/powerpc/swaps-p8-29.c: New test.
* gcc.target/powerpc/swps-p8-30.c: New test.

Index: gcc/config/rs6000/rs6000-p8swap.c
===
--- gcc/config/rs6000/rs6000-p8swap.c   (revision 252768)
+++ gcc/config/rs6000/rs6000-p8swap.c   (working copy)
@@ -342,7 +342,8 @@ const_load_sequence_p (swap_web_entry *insn_entry,
   FOR_EACH_INSN_INFO_USE (use, insn_info)
 {
   struct df_link *def_link = DF_REF_CHAIN (use);
-  if (!def_link || def_link->next)
+  if (!def_link || !def_link->ref || DF_REF_IS_ARTIFICIAL (def_link->ref)
+ || def_link->next)
return false;
 
   rtx def_insn = DF_REF_INSN (def_link->ref);
@@ -358,6 +359,8 @@ const_load_sequence_p (swap_web_entry *insn_entry,
 
   rtx mem = XEXP (SET_SRC (body), 0);
   rtx base_reg = XEXP (mem, 0);
+  if (!REG_P (base_reg))
+   return false;
 
   df_ref base_use;
   insn_info = DF_INSN_INFO_GET (def_insn);
@@ -370,6 +373,14 @@ const_load_sequence_p (swap_web_entry *insn_entry,
  if (!base_def_link || base_def_link->next)
return false;
 
+ /* Constants held on the stack are not "true" constants
+  * because their values are not part of the static load
+  * image.  If this constant's base reference is a stack
+  * or frame pointer, it is seen as an artificial
+  * reference. */
+ if (DF_REF_IS_ARTIFICIAL (base_def_link->ref))
+   return false;
+
  rtx tocrel_insn = DF_REF_INSN (base_def_link->ref);
  rtx tocrel_body = PATTERN (tocrel_insn);
  rtx base, offset;
@@ -385,6 +396,25 @@ const_load_sequence_p (swap_web_entry *insn_entry,
  split_const (XVECEXP (tocrel_base, 0, 0), , );
  if (GET_CODE (base) != SYMBOL_REF || !CONSTANT_POOL_ADDRESS_P (base))
return false;
+ else
+   {
+ /* FIXME: The conditions under which
+  *  ((GET_CODE (const_vector) == SYMBOL_REF) &&
+  *   !CONSTANT_POOL_ADDRESS_P (const_vector))
+  * are not well understood.  This code prevents
+  * an internal compiler error which will occur in
+  * replace_swapped_load_constant () if we were to return
+  * true.  Some day, we should figure out how to properly
+  * handle this condition in
+  * replace_swapped_load_constant () and then we can
+  * remove this special test.  */
+ rtx const_vector = get_pool_constant (base);
+ if (GET_CODE (const_vector) == SYMBOL_REF)
+   {
+ if (!CONSTANT_POOL_ADDRESS_P (const_vector))
+   return false;
+   }
+   }
}
 }
   return true;
@@ -1281,6 +1311,189 @@ replace_swap_with_copy (swap_web_entry *insn_entry
   insn->set_deleted ();
 }
 
+/* Given that swap_insn represents a swap of a load of a constant
+   vector value, replace with a single instruction that loads a
+   swapped variant of the original constant. 
+
+   The "natural" representation of a byte array in memory is the same
+   for big endian and little endian.
+
+   unsigned char byte_array[] =
+ { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f };
+
+   However, when loaded into a vector register, the representation
+   depends on endian conventions.
+
+   In big-endian mode, the register holds:
+
+ MSB   

Implement C11 excess precision semantics for conversions (PR c/82071)

2017-09-15 Thread Joseph Myers
C11 semantics for excess precision (from N1531) are that an implicit
conversion (from the usual arithmetic conversions, not by assignment)
from integer to floating point has a result in the corresponding
evaluation format of that floating-point type, so possibly with excess
precision (whereas a cast or conversion by assignment from integer to
floating point must produce a value without excess range or precision,
as always).  This patch makes GCC support those semantics if
flag_isoc11 (which in turn means that conditional expressions need to
support generating a result with excess precision even if neither
operand had excess precision).

C99 is less than entirely clear in this regard, but my reading as
outlined at 
is that the results of conversions from integer to floating-point
types are always expected to be representable in the target type
without excess precision, and this patch conservatively keeps these
semantics for pre-C11 (i.e. if an older standard is explicitly
selected).
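
A portable illustration of the distinction (the 0x10001234 values follow the new testcase; double stands in here for the wider x87 evaluation format, which the real test selects with -mfpmath=387 -fexcess-precision=standard):

```c
#include <assert.h>

/* Sketch of the C11 distinction, assuming float's 24-bit significand:
   0x10001234 is not exactly representable as float and rounds up to
   0x10001240.  A wider evaluation format (simulated here with double,
   standing in for the x87 format) keeps i + f exact.  */
static int
explicit_conversion (int i)
{
  return (int) (float) i;            /* cast: must drop excess precision */
}

static int
implicit_wide_add (int i, float f)
{
  /* Models the C11 implicit conversion evaluated in a wider format.  */
  return (int) ((double) i + (double) f);
}
```

Under C11, the implicit conversion in `i + f` may be carried out with excess precision, so the addition need not lose the low bits of i the way rounding to float would.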

Bootstrapped with no regressions on x86_64-pc-linux-gnu.  Applied to 
mainline.

gcc/c:
2017-09-15  Joseph Myers  

PR c/82071
* c-typeck.c (ep_convert_and_check): Just call convert_and_check
for C11.
(build_conditional_expr): For C11, generate result with excess
precision when one argument is an integer and the other is of a
type using excess precision.

gcc/testsuite:
2017-09-15  Joseph Myers  

PR c/82071
* gcc.target/i386/excess-precision-8.c: New test.

Index: gcc/c/c-typeck.c
===
--- gcc/c/c-typeck.c(revision 252806)
+++ gcc/c/c-typeck.c(working copy)
@@ -4866,7 +4866,9 @@ ep_convert_and_check (location_t loc, tree type, t
   if (TREE_TYPE (expr) == type)
 return expr;
 
-  if (!semantic_type)
+  /* For C11, integer conversions may have results with excess
+ precision.  */
+  if (flag_isoc11 || !semantic_type)
 return convert_and_check (loc, type, expr);
 
   if (TREE_CODE (TREE_TYPE (expr)) == INTEGER_TYPE
@@ -4994,7 +4996,31 @@ build_conditional_expr (location_t colon_loc, tree
   && (code2 == INTEGER_TYPE || code2 == REAL_TYPE
   || code2 == COMPLEX_TYPE))
 {
-  result_type = c_common_type (type1, type2);
+  /* In C11, a conditional expression between a floating-point
+type and an integer type should convert the integer type to
+the evaluation format of the floating-point type, with
+possible excess precision.  */
+  tree eptype1 = type1;
+  tree eptype2 = type2;
+  if (flag_isoc11)
+   {
+ tree eptype;
+ if (ANY_INTEGRAL_TYPE_P (type1)
+ && (eptype = excess_precision_type (type2)) != NULL_TREE)
+   {
+ eptype2 = eptype;
+ if (!semantic_result_type)
+   semantic_result_type = c_common_type (type1, type2);
+   }
+ else if (ANY_INTEGRAL_TYPE_P (type2)
+  && (eptype = excess_precision_type (type1)) != NULL_TREE)
+   {
+ eptype1 = eptype;
+ if (!semantic_result_type)
+   semantic_result_type = c_common_type (type1, type2);
+   }
+   }
+  result_type = c_common_type (eptype1, eptype2);
   if (result_type == error_mark_node)
return error_mark_node;
   do_warn_double_promotion (result_type, type1, type2,
Index: gcc/testsuite/gcc.target/i386/excess-precision-8.c
===
--- gcc/testsuite/gcc.target/i386/excess-precision-8.c  (nonexistent)
+++ gcc/testsuite/gcc.target/i386/excess-precision-8.c  (working copy)
@@ -0,0 +1,61 @@
+/* Excess precision tests.  Test C11 semantics for conversions from
+   integers to floating point: no excess precision for either explicit
+   conversions, but excess precision for implicit conversions.  */
+/* { dg-do run } */
+/* { dg-options "-std=c11 -mfpmath=387 -fexcess-precision=standard" } */
+
+extern void abort (void);
+extern void exit (int);
+
+int
+main (void)
+{
+  float f = 1.0f;
+  int i;
+
+  i = 0x10001234;
+  if ((float) i != 0x10001240)
+abort ();
+
+  i = 0x10001234;
+  i += f;
+  if (i != 0x10001235)
+abort ();
+
+  i = 0x10001234;
+  i += 1.0f;
+  if (i != 0x10001235)
+abort ();
+
+  i = 0x10001234;
+  i = i + f;
+  if (i != 0x10001235)
+abort ();
+
+  i = 0x10001234;
+  i = i + 1.0f;
+  if (i != 0x10001235)
+abort ();
+
+  i = 0x10001235;
+  i = (1 ? i : 1.0f);
+  if (i != 0x10001235)
+abort ();
+
+  i = 0x10001235;
+  i = (1 ? i : f);
+  if (i != 0x10001235)
+abort ();
+
+  i = 0x10001235;
+  i = (0 ? 1.0f :i);
+  if (i != 0x10001235)
+abort ();
+
+  i = 0x10001235;
+  i = (0 ? f : i);
+  if (i != 0x10001235)
+abort ();
+
+  exit (0);
+}

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH][aarch64] Fix pr81356 - copy empty string with wzr, not a ldrb/strb

2017-09-15 Thread Steve Ellcey
PR 81356 points out that doing a __builtin_strcpy of an empty string on
aarch64 does a copy from memory instead of just writing out a zero byte.
In looking at this I found that it was because of
aarch64_use_by_pieces_infrastructure_p, which returns false for
STORE_BY_PIECES.  The comment says:

  /* STORE_BY_PIECES can be used when copying a constant string, but
 in that case each 64-bit chunk takes 5 insns instead of 2 (LDR/STR).
 For now we always fail this and let the move_by_pieces code copy
 the string from read-only memory.  */

But this doesn't seem to be the case anymore.  When I remove this function
and the TARGET_USE_BY_PIECES_INFRASTRUCTURE_P macro that uses it, the code
for __builtin_strcpy of a constant string seems to be either better or the
same.  The only time I got more instructions after removing this function
was on an 8 byte __builtin_strcpy where we now generate a mov and 3 movk
instructions to create the source followed by a store instead of doing a
load/store of 8 bytes.  The comment may have been applicable for
-mstrict-align at one time but it doesn't seem to be the case now.  I still
get better code without this routine under that option as well.

Bootstrapped and tested without regressions, OK to checkin?

Steve Ellcey
sell...@cavium.com



2017-09-15  Steve Ellcey  

PR target/81356
* config/aarch64/aarch64.c (aarch64_use_by_pieces_infrastructure_p):
Remove.
(TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Remove define.


2017-09-15  Steve Ellcey  

* gcc.target/aarch64/pr81356.c: New test.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1c14008..fc72236 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14118,22 +14118,6 @@ aarch64_asan_shadow_offset (void)
   return (HOST_WIDE_INT_1 << 36);
 }
 
-static bool
-aarch64_use_by_pieces_infrastructure_p (unsigned HOST_WIDE_INT size,
-	unsigned int align,
-	enum by_pieces_operation op,
-	bool speed_p)
-{
-  /* STORE_BY_PIECES can be used when copying a constant string, but
- in that case each 64-bit chunk takes 5 insns instead of 2 (LDR/STR).
- For now we always fail this and let the move_by_pieces code copy
- the string from read-only memory.  */
-  if (op == STORE_BY_PIECES)
-return false;
-
-  return default_use_by_pieces_infrastructure_p (size, align, op, speed_p);
-}
-
 static rtx
 aarch64_gen_ccmp_first (rtx_insn **prep_seq, rtx_insn **gen_seq,
 			int code, tree treeop0, tree treeop1)
@@ -15631,10 +15615,6 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_LEGITIMIZE_ADDRESS
 #define TARGET_LEGITIMIZE_ADDRESS aarch64_legitimize_address
 
-#undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P
-#define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
-  aarch64_use_by_pieces_infrastructure_p
-
 #undef TARGET_SCHED_CAN_SPECULATE_INSN
 #define TARGET_SCHED_CAN_SPECULATE_INSN aarch64_sched_can_speculate_insn
 
diff --git a/gcc/testsuite/gcc.target/aarch64/pr81356.c b/gcc/testsuite/gcc.target/aarch64/pr81356.c
index e69de29..9fd6baa 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr81356.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr81356.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+void f(char *a)
+{
+  __builtin_strcpy (a, "");
+}
+
+/* { dg-final { scan-assembler-not "ldrb" } } */


Re: [C PATCH] field_decl_cmp

2017-09-15 Thread Joseph Myers
On Fri, 15 Sep 2017, Nathan Sidwell wrote:

> On 09/12/2017 12:48 PM, Joseph Myers wrote:
> 
> > I'd be concerned about the possibility of a qsort implementation that
> > calls the comparison function with two pointers to the same object (as far
> > as I can tell, it's valid for qsort to do that).  That is, I think you
> > need to check for the two DECLs being the same DECL, before asserting
> > their names are different.
> 
> I suppose we can drop the assert.  That does leave it returning +1 in the case
> you're concerned about, but I don't really see the need to tell such a stupid
> qsort that the things are unordered.

I don't know what such a qsort would do if such a case returned 1; my 
presumption is that all our comparison functions ought to return 0 when 
two objects are equal, even if that can only be if they are the same 
object.  It's OK with a return of 0 if x == y (or if DECL_NAME (x) == 
DECL_NAME (y), whichever you think appropriate).
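The rule Joseph describes — return 0 for equal (or identical) inputs instead of asserting — can be sketched as follows.  This is an illustrative stand-in, not GCC code: `name_cmp` and the address-based ordering mimic the DECL_NAME comparison in field_decl_cmp.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative qsort comparator in the spirit of field_decl_cmp: order
   elements by the address of their name, but return 0 when the two
   inputs compare equal, since a qsort implementation is allowed to
   compare an element against itself.  */
static int
name_cmp (const void *x_p, const void *y_p)
{
  const char *x = *(const char *const *) x_p;
  const char *y = *(const char *const *) y_p;
  if (x == y)
    return 0;  /* Equal inputs: never assert, never return +1.  */
  return x < y ? -1 : 1;
}
```

With this shape, a qsort that probes `cmp (p, p)` sees 0, and antisymmetry (`cmp (x, y) == -cmp (y, x)`) holds for every pair.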

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [C++ PATCH] Incremental patch for -std=gnu++2a and -std=c++2a

2017-09-15 Thread Jason Merrill
OK.

On Fri, Sep 15, 2017 at 8:49 AM, Jakub Jelinek  wrote:
> Hi!
>
> On Thu, Sep 14, 2017 at 11:28:09PM +0200, Jakub Jelinek wrote:
>> > I'd be tempted to say leave all this, and march 1z -> 2a for the _next_ 
>> > standard.  2020 or so is a good first stab at the date.
>>
>> I didn't want to add c++2a and gnu++2a in the same patch, it can be added
>> incrementally and readd the above wording.  Unless somebody else is planning
>> to do that, I can do that next.
>
> Here is so far untested incremental patch on top of the 1z -> 17 patch,
> mostly using Andrew's patch, but adjusted so that it applies and with
> various additions and small tweaks.
>
> 2017-09-15  Andrew Sutton  
> Jakub Jelinek  
>
> Add support for -std=c++2a.
> * c-common.h (cxx_dialect): Add cxx2a as a dialect.
> * c.opt: Add options for -std=c++2a and -std=gnu++2a.
> * c-opts.c (set_std_cxx2a): New.
> (c_common_handle_option): Set options when -std=c++2a is enabled.
> (c_common_post_options): Adjust comments.
> (set_std_cxx14, set_std_cxx17): Likewise.
>
> * doc/cpp.texi (__cplusplus): Document value for -std=c++2a
> or -std=gnu+2a.
> * doc/invoke.texi: Document -std=c++2a and -std=gnu++2a.
>
> * lib/target-supports.exp (check_effective_target_c++17): Return
> 1 also if check_effective_target_c++2a.
> (check_effective_target_c++17_down): New.
> (check_effective_target_c++2a_only): New.
> (check_effective_target_c++2a): New.
> * g++.dg/cpp2a/cplusplus.C: New.
>
> * include/cpplib.h (c_lang): Add CXX2A and GNUCXX2A.
> * init.c (lang_defaults): Add rows for CXX2A and GNUCXX2A.
> (cpp_init_builtins): Set __cplusplus to 201709L for C++2a.
>
> --- gcc/c-family/c-common.h.jj  2017-09-14 22:53:34.977313456 +0200
> +++ gcc/c-family/c-common.h 2017-09-15 12:59:25.539053983 +0200
> @@ -703,7 +703,9 @@ enum cxx_dialect {
>/* C++14 */
>cxx14,
>/* C++17 */
> -  cxx17
> +  cxx17,
> +  /* C++2a (C++20?) */
> +  cxx2a
>  };
>
>  /* The C++ dialect being used. C++98 is the default.  */
> --- gcc/c-family/c-opts.c.jj2017-09-14 22:53:34.978313443 +0200
> +++ gcc/c-family/c-opts.c   2017-09-15 14:28:21.287595277 +0200
> @@ -111,6 +111,7 @@ static void set_std_cxx98 (int);
>  static void set_std_cxx11 (int);
>  static void set_std_cxx14 (int);
>  static void set_std_cxx17 (int);
> +static void set_std_cxx2a (int);
>  static void set_std_c89 (int, int);
>  static void set_std_c99 (int);
>  static void set_std_c11 (int);
> @@ -637,6 +638,12 @@ c_common_handle_option (size_t scode, co
> set_std_cxx17 (code == OPT_std_c__17 /* ISO */);
>break;
>
> +case OPT_std_c__2a:
> +case OPT_std_gnu__2a:
> +  if (!preprocessing_asm_p)
> +   set_std_cxx2a (code == OPT_std_c__2a /* ISO */);
> +  break;
> +
>  case OPT_std_c90:
>  case OPT_std_iso9899_199409:
>if (!preprocessing_asm_p)
> @@ -938,7 +945,7 @@ c_common_post_options (const char **pfil
> warn_narrowing = 1;
>
>/* Unless -f{,no-}ext-numeric-literals has been used explicitly,
> -for -std=c++{11,14,17} default to -fno-ext-numeric-literals.  */
> +for -std=c++{11,14,17,2a} default to -fno-ext-numeric-literals.  */
>if (flag_iso && !global_options_set.x_flag_ext_numeric_literals)
> cpp_opts->ext_numeric_literals = 0;
>  }
> @@ -1589,7 +1596,7 @@ set_std_cxx14 (int iso)
>flag_no_gnu_keywords = iso;
>flag_no_nonansi_builtin = iso;
>flag_iso = iso;
> -  /* C++11 includes the C99 standard library.  */
> +  /* C++14 includes the C99 standard library.  */
>flag_isoc94 = 1;
>flag_isoc99 = 1;
>cxx_dialect = cxx14;
> @@ -1604,7 +1611,7 @@ set_std_cxx17 (int iso)
>flag_no_gnu_keywords = iso;
>flag_no_nonansi_builtin = iso;
>flag_iso = iso;
> -  /* C++11 includes the C99 standard library.  */
> +  /* C++17 includes the C11 standard library.  */
>flag_isoc94 = 1;
>flag_isoc99 = 1;
>flag_isoc11 = 1;
> @@ -1612,6 +1619,22 @@ set_std_cxx17 (int iso)
>lang_hooks.name = "GNU C++17";
>  }
>
> +/* Set the C++ 202a draft standard (without GNU extensions if ISO).  */
> +static void
> +set_std_cxx2a (int iso)
> +{
> +  cpp_set_lang (parse_in, iso ? CLK_CXX2A: CLK_GNUCXX2A);
> +  flag_no_gnu_keywords = iso;
> +  flag_no_nonansi_builtin = iso;
> +  flag_iso = iso;
> +  /* C++17 includes the C11 standard library.  */
> +  flag_isoc94 = 1;
> +  flag_isoc99 = 1;
> +  flag_isoc11 = 1;
> +  cxx_dialect = cxx2a;
> +  lang_hooks.name = "GNU C++17"; /* Pretend C++17 until standardization.  */
> +}
> +
>  /* Args to -d specify what to dump.  Silently ignore
> unrecognized options; they may be aimed at toplev.c.  */
>  static void
> --- gcc/c-family/c.opt.jj   2017-09-14 22:53:34.977313456 +0200
> +++ gcc/c-family/c.opt  2017-09-15 

Re: [C PATCH] field_decl_cmp

2017-09-15 Thread Nathan Sidwell

On 09/12/2017 12:48 PM, Joseph Myers wrote:


I'd be concerned about the possibility of a qsort implementation that
calls the comparison function with two pointers to the same object (as far
as I can tell, it's valid for qsort to do that).  That is, I think you
need to check for the two DECLs being the same DECL, before asserting
their names are different.


I suppose we can drop the assert.  That does leave it returning +1 in 
the case you're concerned about, but I don't really see the need to tell 
such a stupid qsort that the things are unordered.


nathan

--
Nathan Sidwell
2017-09-15  Nathan Sidwell  

	* c-decl.c (field_decl_cmp): No need to handle NULL or TYPE_DECLs.

Index: c-decl.c
===
--- c-decl.c	(revision 252833)
+++ c-decl.c	(working copy)
@@ -7845,19 +7845,10 @@ warn_cxx_compat_finish_struct (tree fiel
 static int
 field_decl_cmp (const void *x_p, const void *y_p)
 {
-  const tree *const x = (const tree *) x_p;
-  const tree *const y = (const tree *) y_p;
+  const tree x = *(const tree *) x_p;
+  const tree y = *(const tree *) y_p;
 
-  if (DECL_NAME (*x) == DECL_NAME (*y))
-    /* A nontype is "greater" than a type.  */
-    return (TREE_CODE (*y) == TYPE_DECL) - (TREE_CODE (*x) == TYPE_DECL);
-  if (DECL_NAME (*x) == NULL_TREE)
-    return -1;
-  if (DECL_NAME (*y) == NULL_TREE)
-    return 1;
-  if (DECL_NAME (*x) < DECL_NAME (*y))
-    return -1;
-  return 1;
+  return DECL_NAME (x) < DECL_NAME (y) ? -1 : +1;
 }
 
 /* Fill in the fields of a RECORD_TYPE or UNION_TYPE node, T.


[PATCH] lra: make reload_pseudo_compare_func a proper comparator

2017-09-15 Thread Alexander Monakov
Hello,

I'd like to apply the following LRA patch to make qsort comparator
reload_pseudo_compare_func proper (right now it lacks transitivity
due to incorrect use of non_reload_pseudos bitmap, PR 68988).

This function was originally a proper comparator, and the problematic
code was added as a fix for PR 57878.  However, some time later the fix
for PR 60650 really fixed this LRA spill failure, making the original
fix unneeded.  So now GCC can revert to the original, simpler comparator.

The only question is what comparison order would be better for performance.
The check in question only matters for multi-reg pseudos, so it matters
mostly for 64-bit modes on 32-bit architectures.

To investigate that, I've bootstrapped GCC on 32-bit x86 in 4 configurations:

1. Current trunk.

[2-4 are with original PR 57878 fix reverted]
2. Original code, with ira_reg_class_max_nregs below regno_assign_info.freq
   check.
3. Hybrid code, with i_r_c_max_nregs preferred over r_a_i.freq during the
   second assignment pass, but not first.
4. With i_r_c_max_nregs above r_a_i.freq check, i.e. always do fragmentation
   avoidance check before the frequency check. This is the original PR 57878
   fix proposed by Wei Mi.

I have found that cc1 size is largest with variant 1, variants 2 and 3 result
in ~500 bytes size reduction, and variant 4 is further ~500 bytes smaller than
that (considering only the .text section, debug info variance is larger).

I have also tested variants 2 and 4 on SPEC CPU 2000: there's no significant 
difference in performance (in fact generated code is the same on almost all
tests).

Therefore I suggest we go with variant 4, implemented by the following patch.

Bootstrapped and regtested on 32-bit x86, OK to apply?

Thanks.
Alexander

PR rtl-optimization/57878
PR rtl-optimization/68988
* lra-assigns.c (reload_pseudo_compare_func): Remove fragmentation
avoidance test involving non_reload_pseudos.  Move frequency test
below the general fragmentation avoidance test.

diff --git a/gcc/lra-assigns.c b/gcc/lra-assigns.c
index 9208fccfd59..5a65c7c8c5f 100644
--- a/gcc/lra-assigns.c
+++ b/gcc/lra-assigns.c
@@ -211,24 +211,15 @@ reload_pseudo_compare_func (const void *v1p, const void *v2p)
   if ((diff = (ira_class_hard_regs_num[cl1]
   - ira_class_hard_regs_num[cl2])) != 0)
 return diff;
-  if ((diff
-   = (ira_reg_class_max_nregs[cl2][lra_reg_info[r2].biggest_mode]
- - ira_reg_class_max_nregs[cl1][lra_reg_info[r1].biggest_mode])) != 0
-  /* The code below executes rarely as nregs == 1 in most cases.
-So we should not worry about using faster data structures to
-check reload pseudos.  */
-  && ! bitmap_bit_p (&non_reload_pseudos, r1)
-  && ! bitmap_bit_p (&non_reload_pseudos, r2))
-return diff;
-  if ((diff = (regno_assign_info[regno_assign_info[r2].first].freq
-  - regno_assign_info[regno_assign_info[r1].first].freq)) != 0)
-return diff;
   /* Allocate bigger pseudos first to avoid register file
  fragmentation.  */
   if ((diff
= (ira_reg_class_max_nregs[cl2][lra_reg_info[r2].biggest_mode]
  - ira_reg_class_max_nregs[cl1][lra_reg_info[r1].biggest_mode])) != 0)
 return diff;
+  if ((diff = (regno_assign_info[regno_assign_info[r2].first].freq
+  - regno_assign_info[regno_assign_info[r1].first].freq)) != 0)
+return diff;
   /* Put pseudos from the thread nearby.  */
   if ((diff = regno_assign_info[r1].first - regno_assign_info[r2].first) != 0)
 return diff;
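A comparator is transitive when every key it consults depends only on the two elements being compared — the property the patch restores by dropping the bitmap test.  A minimal sketch of the resulting multi-key shape, with invented field names standing in for the LRA data:

```c
#include <assert.h>

/* Illustrative multi-key comparator: each tie-break depends only on the
   two elements, so the induced order is a total preorder and qsort's
   transitivity requirement holds.  Fields are invented for the sketch. */
struct pseudo { int nregs; int freq; int thread; };

static int
pseudo_cmp (const void *v1p, const void *v2p)
{
  const struct pseudo *p1 = (const struct pseudo *) v1p;
  const struct pseudo *p2 = (const struct pseudo *) v2p;
  int diff;
  /* Bigger pseudos first, to avoid register file fragmentation.  */
  if ((diff = p2->nregs - p1->nregs) != 0)
    return diff;
  /* Then higher frequency first.  */
  if ((diff = p2->freq - p1->freq) != 0)
    return diff;
  /* Then keep pseudos from the same thread nearby.  */
  return p1->thread - p2->thread;
}
```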


Vxworks Maintainership

2017-09-15 Thread Nathan Sidwell
As I mentioned a few months back, I intended to step down as a vxworks 
maintainer.  Olivier is more than capable of looking after the target, 
plus he has hardware.


I've committed this patch.

nathan
--
Nathan Sidwell
2017-09-15  Nathan Sidwell  

	* MAINTAINERS: Remove myself as a vxworks maintainer.

Index: MAINTAINERS
===
--- MAINTAINERS	(revision 252076)
+++ MAINTAINERS	(working copy)
@@ -134,7 +134,6 @@ RTEMS Ports		Ralf Corsepius		
 VMS			Douglas Rupp		
 VMS			Tristan Gingold		
-VxWorks ports		Nathan Sidwell		
 VxWorks ports		Olivier Hainque		
 cygwin, mingw-w64	Jonathan Yong		<10wa...@gmail.com>
 


Re: [C++ PATCH] Renames/adjustments of 1z to 17

2017-09-15 Thread Sandra Loosemore

On 09/15/2017 11:07 AM, Nathan Sidwell wrote:

On 09/14/2017 04:26 PM, Jakub Jelinek wrote:


There is one 1z spot in gcc/doc/standards.texi, but that whole paragraph
looks wrong, can somebody please rewrite it to match the reality?


This patch addresses that paragraph.  I've rewritten it using the c++11
and C++14 examples and removed any reference to concepts there.  I
noticed the URLS don't point anywhere useful, so updated them. The web
page still uses cxx1z as the tag, so I left that.

ok?

nathan



Looks good to me.  :-)

-Sandra



Re: [C++ PATCH] Renames/adjustments of 1z to 17

2017-09-15 Thread Nathan Sidwell

On 09/14/2017 04:26 PM, Jakub Jelinek wrote:


There is one 1z spot in gcc/doc/standards.texi, but that whole paragraph
looks wrong, can somebody please rewrite it to match the reality?


This patch addresses that paragraph.  I've rewritten it using the c++11 
and C++14 examples and removed any reference to concepts there.  I 
noticed the URLS don't point anywhere useful, so updated them. The web 
page still uses cxx1z as the tag, so I left that.


ok?

nathan

--
Nathan Sidwell
2017-09-15  Nathan Sidwell  

	* doc/standards.texi: Fix C++17 description.  Update URLs for
	C++11 & 14.

Index: doc/standards.texi
===
--- doc/standards.texi	(revision 252076)
+++ doc/standards.texi	(working copy)
@@ -196,24 +196,22 @@ A revised ISO C++ standard was published
 14882:2011, and is referred to as C++11; before its publication it was
 commonly referred to as C++0x.  C++11 contains several changes to the
 C++ language, all of which have been implemented in GCC@. For details
-see @uref{https://gcc.gnu.org/projects/@/cxx0x.html}.
+see @uref{https://gcc.gnu.org/projects/@/cxx-status.html#cxx11}.
 To select this standard in GCC, use the option @option{-std=c++11}.
 
 Another revised ISO C++ standard was published in 2014 as ISO/IEC
 14882:2014, and is referred to as C++14; before its publication it was
 sometimes referred to as C++1y.  C++14 contains several further
 changes to the C++ language, all of which have been implemented in GCC@.
-For details see @uref{https://gcc.gnu.org/projects/@/cxx1y.html}.
+For details see @uref{https://gcc.gnu.org/projects/@/cxx-status.html#cxx14}.
 To select this standard in GCC, use the option @option{-std=c++14}.
 
-GCC also supports the C++ Concepts Technical Specification,
-ISO/IEC TS 19217:2015, which allows constraints to be defined for templates,
-allowing template arguments to be checked and for templates to be
-overloaded or specialized based on the constraints. Support for C++ Concepts
-is included in an experimental C++1z mode that corresponds to the next
-revision of the ISO C++ standard, expected to be published in 2017. To enable
-C++1z support in GCC, use the option @option{-std=c++17} or
-@option{-std=c++1z}.
+The C++ language was further revised in 2017 as ISO/IEC 14882:2017 was
+published.  This is referred to as C++17, and before publication was
+often referred to as C++1z.  GCC supports all the changes in the new
+specification.  For further details see
+@uref{https://gcc.gnu.org/projects/@/cxx-status.html#cxx1z}.  Use the option
+@option{-std=c++17} to select this variant of C++.
 
 More information about the C++ standards is available on the ISO C++
 committee's web site at @uref{http://www.open-std.org/@/jtc1/@/sc22/@/wg21/}.
@@ -232,7 +230,7 @@ select an extended version of the C++ la
 @option{-std=gnu++98} (for C++98 with GNU extensions), or
 @option{-std=gnu++11} (for C++11 with GNU extensions), or
 @option{-std=gnu++14} (for C++14 with GNU extensions), or
-@option{-std=gnu++1z} (for C++1z with GNU extensions).  
+@option{-std=gnu++17} (for C++17 with GNU extensions).  
 
 The default, if
 no C++ language dialect options are given, is @option{-std=gnu++14}.


Re: [PATCH 2/3] [ARM] Refactor costs calculation for MEM.

2017-09-15 Thread Kyrill Tkachov


On 15/09/17 16:38, Charles Baylis wrote:

On 13 September 2017 at 10:02, Kyrill  Tkachov
 wrote:

Hi Charles,

On 12/09/17 09:34, charles.bay...@linaro.org wrote:

From: Charles Baylis 

This patch moves the calculation of costs for MEM into a
separate function, and reforms the calculation into two
parts. Firstly any additional cost of the addressing mode
is calculated, and then the cost of the memory access itself
is added.

In this patch, the calculation of the cost of the addressing
mode is left as a placeholder, to be added in a subsequent
patch.


Can you please mention how has this series been tested?
A bootstrap and test run on arm-none-linux-gnueabihf is required at least.

It has been tested with make check on arm-unknown-linux-gnueabihf with
no regressions. I've successfully bootstrapped the next spin.


Thanks.


Also, do you have any benchmarking results for this?
I agree that generating the addressing modes in the new tests is desirable.
So I'm not objecting to the goal of this patch, but a check to make sure
that this doesn't regress SPEC
would be great.  Further comments on the patch inline.

SPEC2006 scores are unaffected by this patch on Cortex-A57.


Good, thanks for checking :)


+/* Helper function for arm_rtx_costs_internal.  Calculates the cost of a MEM,
+   considering the costs of the addressing mode and memory access
+   separately.  */
+static bool
+arm_mem_costs (rtx x, const struct cpu_cost_table *extra_cost,
+  int *cost, bool speed_p)
+{
+  machine_mode mode = GET_MODE (x);
+  if (flag_pic
+  && GET_CODE (XEXP (x, 0)) == PLUS
+  && will_be_in_index_register (XEXP (XEXP (x, 0), 1)))
+/* This will be split into two instructions.  Add the cost of the
+   additional instruction here.  The cost of the memory access is computed
+   below.  See arm.md:calculate_pic_address.  */
+*cost = COSTS_N_INSNS (1);
+  else
+*cost = 0;


For speed_p we want the size cost of the insn (COSTS_N_INSNS (1) for each
insn) plus the appropriate field in extra_cost. So you should unconditionally
initialise the cost to COSTS_N_INSNS (1), then conditionally increment it by
COSTS_N_INSNS (1) with the condition above.

OK. I also have to subtract that COSTS_N_INSNS (1) in the if (speed_p)
part because the cost of a single bus transfer is included in that
initial cost.


+
+  /* Calculate cost of the addressing mode.  */
+  if (speed_p)
+{
+  /* TODO: Add table-driven costs for addressing modes.  (See patch 2) */
+}


You mean "patch 3". I recommend you just remove this conditional from this
patch and add the logic
in patch 3 entirely.

OK.


+
+  /* Calculate cost of memory access.  */
+  if (speed_p)
+{
+  /* data transfer is transfer size divided by bus width.  */
+  int bus_width_bytes = current_tune->bus_width / 4;


This should be bus_width / BITS_PER_UNIT to get the size in bytes.
BITS_PER_UNIT is 8 though, so you'll have to double check to make sure the
cost calculation and generated code is still appropriate.

Oops, I changed the units around and messed this up. I'll fix this.


+  *cost += CEIL (GET_MODE_SIZE (mode), bus_width_bytes);
+  *cost += extra_cost->ldst.load;
+}
+  else
+{
+  *cost += COSTS_N_INSNS (1);
+}

Given my first comment above this else would be deleted.

OK


I have a concern about using the bus_width parameter which
I explain in the thread for patch 1 (I don't think we need it, we should
use the fields in extra_cost->ldst more carefully).

Kyrill
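The transfer-cost arithmetic under discussion — mode size divided by bus width, rounded up, with the width converted via BITS_PER_UNIT rather than a hard-coded 4 — can be sketched standalone.  Names mirror the patch, but this is an illustration, not the GCC implementation:

```c
#include <assert.h>

/* Illustrative version of the memory-access transfer cost: the number
   of bus transfers is the mode size in bytes divided by the bus width
   in bytes, rounded up (GCC's CEIL macro).  bus_width is in bits, so
   convert with BITS_PER_UNIT (8), not a hard-coded 4.  */
#define CEIL(a, b) (((a) + (b) - 1) / (b))
#define BITS_PER_UNIT 8

static int
mem_transfer_cost (int mode_size_bytes, int bus_width_bits)
{
  int bus_width_bytes = bus_width_bits / BITS_PER_UNIT;
  return CEIL (mode_size_bytes, bus_width_bytes);
}
```

For example, a 64-bit load costs one transfer on a 64-bit bus but two on a 32-bit bus, which is the distinction the bus_width tuning field was meant to capture.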



Re: [PATCH 1/3] [ARM] Add bus_width_bits to tune_params

2017-09-15 Thread Kyrill Tkachov


On 15/09/17 16:38, Charles Baylis wrote:

On 13 September 2017 at 10:02, Kyrill  Tkachov
 wrote:

Hi Charles,

On 12/09/17 09:34, charles.bay...@linaro.org wrote:

From: Charles Baylis 

Add bus widths. These use the approximation that v7 and later cores have
64bit data bus width, and earlier cores have 32 bit bus width, with the
exception of v7m.


Given the way this field is used in patch 2 does it affect the addressing
mode generation
in the tests you added depending on the -mtune option given?
If so, we'll get testsuite failures when people test with particular default
CPU configurations.

No, because the auto_inc_dec phase compares the cost of two different
MEMs which differ only by addressing mode. The part of the calculation
which depends on the bus_width is the same both times, so it is
cancelled out.


Could you expand on the benefits we get from this extra bus_width
information?
I get that we increase the cost of memory accesses if the size of the mode
we load is larger than the
bus width, but it's not as if there is ever an alternative in this regard,
such as loading less memory,
so what pass can make different decisions thanks to this field?

As far as this patch series is concerned, it doesn't matter. It is
there to encapsulate the notion that a larger transfer results in
rtx_costs() returning a larger cost, but I don't know of any part of
the compiler which is sensitive to that difference. It's done this way
because Ramana and Richard wanted it done that way
(https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00652.html).


From what I can tell Ramana and Richard preferred to encode this 
attribute as
a tuning struct property rather than an inline conditional based on 
arm_arch7.
I agree that if we want to use that information, it should be encoded 
this way.
What I'm not convinced about is whether we do want this parameter in the 
first place.


The cost tables already encode information about the costs of different 
sized loads/stores.
In patch 2, for example, you add the cost for extra_cost->ldst.load 
which is nominally just
the cost of a normal 32-bit ldr. But we also have costs for ldst.ldrd 
which is the 64-bit two-register load
which should reflect any extra cost due to a narrower bus in it. We also 
have costs for ldst.loadf (for 32-bit
VFP loads) and ldst.loadd (for 64-bit VFP D-register loads). So I think 
we should use those cost fields
depending on the mode class and size instead of using ldst.load 
unconditionally and adding a new bus_size parameter.


So I think the way forward is to drop this patch and modify patch 2/3 to 
use the extra_cost->ldst fields as described above.


Sorry for the back-and-forth. I think this is the best approach because 
it uses the existing fields more naturally and
doesn't add new parameters that partly duplicate the information encoded 
in the existing fields.
Ramana, Richard: if you prefer the bus_width approach I won't block it, 
but could you clarify your preference?
If we do end up adding the bus_width parameter then this patch and patch 
2/3 look ok.

Thanks,
Kyrill

P.S. I'm going on a 4-week holiday from today, so I won't be able to do 
any further review in that timeframe.
As I said, if we go with the bus_size approach then these patches are 
ok. If we go with my suggestion, this would
be dropped and patch 2 would be extended to select the appropriate 
extra_cost->ldst field depending on mode.


Re: [PATCH] Fix PR target/82066 - target pragma and attribute documentation for ARM, AArch64, and S/390

2017-09-15 Thread Jeff Law
On 09/13/2017 12:00 PM, Steve Ellcey wrote:
> This patch fixes the documentation issues pointed out in PR target/82066.
> It may be considered obvious enough to just check in but I'd rather have
> someone look it over to make sure I didn't mess something up.
> 
> Steve Ellcey
> sell...@cavium.com
> 
> 
> 2017-09-13  Steve Ellcey  
> 
>   PR target/82066
>   * doc/extend.texi (Common Function Attributes): Add 
>   references to ARM, AArch64, and S/390 specific attributes.
>   (Function Specific Option Pragmas): Add AArch64 and S/390
> to list of back ends that support the target pragma.
OK.
jeff


Re: [PATCH 3/3] [ARM] Add table of costs for AAarch32 addressing modes.

2017-09-15 Thread Kyrill Tkachov


On 15/09/17 16:38, Charles Baylis wrote:

On 13 September 2017 at 10:02, Kyrill  Tkachov
 wrote:


Please add a comment here saying that the units are in COSTS_N_INSNS
so that we can reduce the temptation to use these in inappropriate contexts.

+  if (VECTOR_MODE_P (mode))
+   {
+ *cost += current_tune->addr_mode_costs->vector[op_type];
+   }
+  else if (FLOAT_MODE_P (mode))
+   {
+ *cost += current_tune->addr_mode_costs->fp[op_type];
+   }
+  else
+   {
+ *cost += current_tune->addr_mode_costs->integer[op_type];
+   }


No need for brackets for single-statement conditionals.

Done.


Thanks, this is ok once the prerequisites are sorted.

Kyrill


Re: [demangler] Fix nested generic lambda

2017-09-15 Thread Pedro Alves
On 09/15/2017 05:15 PM, Pedro Alves wrote:
> On 09/15/2017 01:04 PM, Nathan Sidwell wrote:
>>
>>
>> Pedro, would you like me to port to gdb's libiberty, or will you do a
>> merge in the near future?
> 
> I wasn't planning to, but I'm doing it now.
> 

Now done:
 https://sourceware.org/ml/gdb-patches/2017-09/msg00421.html

> Thanks much for the fix!

Thanks,
Pedro Alves



Re: [PATCH] Factor out division by squares and remove division around comparisons (1/2)

2017-09-15 Thread Jeff Law
On 09/13/2017 03:20 PM, Wilco Dijkstra wrote:
> Jeff Law wrote:
>> On 09/06/2017 03:55 AM, Jackson Woodruff wrote:
>>> On 08/30/2017 01:46 PM, Richard Biener wrote:
> 
rdivtmp = 1 / (y*C);
tem = x *rdivtmp;
tem2= z * rdivtmp;

 instead of

rdivtmp = 1/y;
tem = x * 1/C * rdivtmp;
tem2 = z * 1/C * rdivtmp;
>>>
>>> Ideally we would be able to CSE that into
>>>
>>> rdivtmp = 1/y * 1/C;
>>> tem = x * rdivtmp;
>>> tem2 = z * rdivtmp;
>> So why is your sequence significantly better than Richi's desired
>> seqeuence?  They both seem to need 3 mults and a division (which in both
>> cases might be a reciprocal estimation).In Richi's sequence we have
>> to mult and feed the result as an operand into the reciprocal insn.  In
>> yours we feed the result of the reciprocal into the multiply.
> 
> Basically this stuff happens a lot in real code, which is exactly why I 
> proposed it.
> I even provided counts of how many divisions each transformation avoids:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026. 
I don't doubt that it happens in real code.  There's a reason why MIPS
IV added recip and rsqrt 20 years ago.  Our ability to exploit them has
always been fairly limited though.

What I'm trying to avoid is a transformation where the two forms are
roughly equal in terms of what they expose vs what they hide.

If pulling the 1/c out is consistently better, then that's obviously
good.  If it's essentially a toss-up because of the other interactions
with CSE, then we need to think about it more deeply.

I _suspect_ that pulling the 1/c out is generally better, but something
more concrete than my intuition would be helpful.
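For concreteness, the two associations being weighed look like this in source form.  This is a sketch: C stands for a compile-time constant, and the rewrite is only valid under -ffast-math-style reassociation.

```c
/* Two ways to associate x / (y * C).  With C a literal constant,
   1.0 / C folds at compile time, so the canonical form costs one
   runtime division and exposes a shared 1.0 / y that CSE can reuse
   across several divisions by y.  */
#define C 4.0

static double original_form (double x, double y)
{
  return x / (y * C);                /* one division, by the product   */
}

static double canonical_form (double x, double y)
{
  return x * (1.0 / C) * (1.0 / y); /* 1.0/C folds; 1.0/y is CSE-able */
}
```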




> 
> Note this transformation is a canonicalization - if you can't merge a
> constant somehow, moving it out to the LHS can expose more
> opportunities, like in (C1 * x) / (C2 * y) -> (C1 * x * C2) / y -> (C3 * x) / 
> y.
> Same for negation as it behaves like * -1.
FWIW, I generally agree.

> 
> The key here is that it is at least an order of magnitude worse if you have
> to execute an extra division than if you have an extra multiply.
No doubt.  I'll trade a division for a multiply any day.  Similarly 1/C
is just a constant, so I consider it essentially free.


> 
>> ISTM in the end if  y*C or 1/(y*C) is a CSE, then Richi's sequence wins.
>>  Similarly if 1/y is a CSE, then yours wins.  Is there some reason to
>> believe that one is a more likely CSE than the other?
> 
> The idea is that 1/y is much more likely a CSE than 1/(y*C).
Do we have anything other than intuition to back that up?
> 
> We could make the pattern only fire in single use cases and see whether
> that makes a difference. It would be easy to test old vs new vs single-use
> new and count how many divisions we end up with. We could add 1/ (y * C)
> to the reciprocal phase if it is unacceptable as a canonicalization, but then
> you won't be able to optimize (C1 * x * C2) / y.
We could.  I think the question would then become does restricting to
the single-use case kill too many opportunities.

Sigh.  I think the 4 of us could go round and round on this forever in
the pursuit of the perfect code.  Though in reality I'm happy with a
clear improvement.


> 
>> I think there's a fundamental phase ordering problem here.  You want to
>> CSE stuff as much as possible, then expose reciprocals, then CSE again
>> because exposing reciprocals can expose new CSE opportunities.
> 
> I agree there are phase ordering issues and various problems in
> reassociation, CSE and division optimizations not being able to find and
> optimize complex cases that are worthwhile.
> 
> However I don't agree doing CSE before reciprocals is a good idea. We
> want to expose reciprocals early on, even if that means we find fewer
> CSEs as a result - again because division is so much more expensive than
> any other operation. CSE is generally not smart enough to CSE a * x in
> a * b * x and a * c * x, something which is likely to happen quite frequently 
> -
> unlike the made up division examples here.
We have much stronger reassociation capabilities for multiplication --
it's a well understood problem and if we fail to pick up a * x across
those two kind of expressions, I'd consider our reassociation code as
failing pretty badly, particularly for integer types.

BUt yes, division is expensive.  And I'm all for a tranformation that
turns a division into a multiplication.  That's almost always a win.

> 
> The first question is whether you see it as a canonicalization. If so, then
> match.pd should be fine.
Pulling the constant part out of the denominator and turning it into a
multiplication by the recip constant should likely be considered
canonicalization.  I think Richi largely agreed to this in the thread as
well and asked for that hunk of the patch to be extracted and submitted
independent of the other changes so that it could go ahead and move
forward while we figure out the rest.

Note that 

Re: Transform (x / y) != 0 to x >=y and (x / y) == 0 to x < y if x, y are unsigned

2017-09-15 Thread Marc Glisse

On Fri, 15 Sep 2017, Wilco Dijkstra wrote:


Marc Glisse wrote:


The question is whether, having computed c=a/b, it is cheaper to test a<b or c!=0.

No, a

This would indicate that we do not need to check for single-use, makes the 
patch simpler, thanks.

(let's ignore -Os)
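The identity behind the transform is easy to check in miniature.  This standalone check (not GCC code) exercises the unsigned case with y != 0:

```c
#include <assert.h>
#include <limits.h>

/* For unsigned x, y with y != 0, x / y == 0 holds exactly when x < y,
   and x / y != 0 exactly when x >= y -- the equivalences the match.pd
   patterns rely on.  */
static int
identity_holds (unsigned x, unsigned y)  /* y must be nonzero */
{
  return ((x / y == 0) == (x < y))
      && ((x / y != 0) == (x >= y));
}
```

Note the identity is specific to unsigned types: for signed operands, negative values and rounding toward zero break it, which is why the patterns test TYPE_UNSIGNED.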

--
Marc Glisse


Re: [PATCH] Add comments to struct cgraph_thunk_info

2017-09-15 Thread Pierre-Marie de Rodat

On 09/15/2017 06:10 PM, Jeff Law wrote:

OK.
jeff


Committed. Thanks!

--
Pierre-Marie de Rodat


Re: [demangler] Fix nested generic lambda

2017-09-15 Thread Pedro Alves
On 09/15/2017 01:04 PM, Nathan Sidwell wrote:
> 
> 
> Pedro, would you like me to port to gdb's libiberty, or will you do a
> merge in the near future?

I wasn't planning to, but I'm doing it now.

Thanks much for the fix!

-- 
Pedro Alves



Re: Transform (x / y) != 0 to x >=y and (x / y) == 0 to x < y if x, y are unsigned

2017-09-15 Thread Wilco Dijkstra
Marc Glisse wrote:

> The question is whether, having computed c=a/b, it is cheaper to test a<b or c!=0. 
> I think it is usually the second one, but not for all types on all targets. 
> Although since
> you mention VRP, it is easier to do further optimizations using the 
> information a

Re: [c-family] Issue a warning in C++ on pragma scalar_storage_order

2017-09-15 Thread Jeff Law
On 09/15/2017 05:07 AM, Eric Botcazou wrote:
> Hi,
> 
> this plugs the hole reported by Florian on the gcc@ list, namely that no 
> warning is issued with -Wall in C++ on pragma scalar_storage_order.
> 
> Tested on x86_64-suse-linux, OK for the mainline?  And some branches?
> 
> 
> 2017-09-15  Eric Botcazou  
> 
>   * c-pragma.c (handle_pragma_scalar_storage_order): Expand on error
>   message for non-uniform endianness and issue a warning in C++.
> 
> 
> 2017-09-15  Eric Botcazou  
> 
>   * g++.dg/sso-1.C: New test.
>   * g++.dg/sso-2.C: Likewise.
> 
OK.
jeff


Re: [PATCH] Add comments to struct cgraph_thunk_info

2017-09-15 Thread Jeff Law
On 09/15/2017 08:48 AM, Pierre-Marie de Rodat wrote:
> On 09/15/2017 12:54 PM, Pierre-Marie de Rodat wrote:
>> I’m not super confident about this though, so I’ll resubmit a patch
>> without the reordering: I’ve added more comments anyway as I’ve
>> learned more about this since yesterday. ;-)
> 
> Here it is!
> 
> -- 
> Pierre-Marie de Rodat
> 
> 0001-Add-comments-to-struct-cgraph_thunk_info.patch
> 
> 
> From 601dd0e949f4af456a11036918da9dbadbb3aa0c Mon Sep 17 00:00:00 2001
> From: Pierre-Marie de Rodat 
> Date: Wed, 13 Sep 2017 16:21:04 +0200
> Subject: [PATCH] Add comments to struct cgraph_thunk_info
> 
> This commit adds comments to fields in the cgraph_thunk_info structure
> declaration from cgraph.h.  They will hopefully answer questions that
> people like myself can ask while discovering the thunk machinery.  I
> also made an assertion stricter in cgraph_node::create_thunk.
> 
> Bootstrapped and regtested on x86_64-linux.
> 
> gcc/
> 
>   * cgraph.h (cgraph_thunk_info): Add comments.
>   * cgraph.c (cgraph_node::create_thunk): Adjust comment, make
>   assert for VIRTUAL_* arguments stricter.
OK.
jeff


Re: Transform (x / y) != 0 to x >=y and (x / y) == 0 to x < y if x, y are unsigned

2017-09-15 Thread Jeff Law
On 09/15/2017 09:55 AM, Marc Glisse wrote:
> On Fri, 15 Sep 2017, Jeff Law wrote:
> 
>> On 09/15/2017 07:09 AM, Marc Glisse wrote:
>>> On Fri, 15 Sep 2017, Prathamesh Kulkarni wrote:
>>>
>>> +/* (X / Y) == 0 -> X < Y if X, Y are unsigned.  */
>>> +(simplify
>>> +  (eq (trunc_div @0 @1) integer_zerop)
>>> +  (if (TYPE_UNSIGNED (TREE_TYPE(@0)) && TYPE_UNSIGNED (TREE_TYPE (@1)))
>>> +(lt @0 @1)))
>>> +
>>> +/* (X / Y) != 0 -> X >= Y, if X, Y are unsigned.  */
>>> +(simplify
>>> +  (ne (trunc_div @0 @1) integer_zerop)
>>> +  (if (TYPE_UNSIGNED (TREE_TYPE(@0)) && TYPE_UNSIGNED (TREE_TYPE (@1)))
>>> +(ge @0 @1)))
>>> +
>>>
>>> Hello,
>>>
>>> you can merge the 2 transforms using "for". Also, no need to test the
>>> type of @1 since you are already testing @0.
>> Right.
>>
>>>
>>> - do we want a single_use restriction on the result of the division?
>> I think so.  If x/y is a common subexpression, then ideally we'd compute
>> it once.
> 
> The question is whether, having computed c=a/b, it is cheaper to test
> a<b or c!=0. I think it is usually the second one, but not for all
> types on all targets. Although since you mention VRP, it is easier to do
> further optimizations using the information a<b.
VRP should be able to use a < b and a >= b
for the two sides of the branch.  It falls out quite naturally, so I
wouldn't let which is easier for VRP to use play a significant role here.

>>> - do we also want a special case when X is 1 that produces Y==1, as
>>> asked in a recent PR?
>> Seems like a reasonable follow-up as well.
>>
>> The other follow-up to consider is detecting these cases in VRP to
>> produce suitable ASSERT_EXPRs and ranges.
>>
>>> - once in a while, someone mentions that eq, on vectors, can either do
>>> elementwise comparison and return a vector, or return a single boolean,
>>> which would fail here. However, I don't remember ever seeing an example.
>> We could always restrict to the integral types.  Probably wise to
>> explicitly do that anyway.
> 
> VECTOR_TYPE_P (type) || !VECTOR_TYPE_P (TREE_TYPE (@0))
> 
> should be enough to avoid the problematic case, the transformation can
> still be nice for vectors.
Seems reasonable to me.

Richi, further comments?

Prathamesh, I think you've got a few things to address, but hopefully
nothing terribly complex.
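As a quick sanity check (not part of the patch), the unsigned-division equivalence the proposed patterns rely on can be exercised directly:

```cpp
#include <cassert>
#include <cstdint>

// For unsigned x and y with y != 0, truncating division guarantees
// x / y == 0 exactly when x < y, and x / y != 0 exactly when x >= y,
// which is what the proposed match.pd patterns exploit.
bool div_eq_zero (uint32_t x, uint32_t y) { return x / y == 0; }
bool div_ne_zero (uint32_t x, uint32_t y) { return x / y != 0; }
```

So whenever the division result is only ever compared against zero, it can be replaced by a single comparison.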

jeff
> 



Re: [C++ PATCH] Renames/adjustments of 1z to 17

2017-09-15 Thread Nathan Sidwell

On 09/14/2017 04:26 PM, Jakub Jelinek wrote:

Hi!

Given https://herbsutter.com/2017/09/06/c17-is-formally-approved/
this patch makes -std=c++17 and -std=gnu++17 the documented options
and -std=c++1z and -std=gnu++1z deprecated aliases, adjusts diagnostics etc.

Bootstrapped/regtest on x86_64-linux and i686-linux, ok for trunk?
The changes in gcc/testsuite/ and libstdc++/testsuite appart from
*.exp files are just sed -i -e 's/1z/17/g' `find . -type f`.


I think the patch is good, modulo the issue Pedro pointed at.


There is one 1z spot in gcc/doc/standards.texi, but that whole paragraph
looks wrong, can somebody please rewrite it to match the reality?


Yeah, that's no longer true.  Will fix.

nathan

--
Nathan Sidwell


[committed] Fix combine make_extraction (PR rtl-optimization/82192)

2017-09-15 Thread Jakub Jelinek
Hi!

When we have (x >> y) & 0x1fff or similar (for non-constant y or
even for constant y if y + 13 is bigger than x's bits) and x is
a non-paradoxical lowpart subreg (in the testcase (subreg:SI (reg:DI ...) 0))
then the lshiftrt extracts some bits (0 to 13) out of the wider
DImode register starting at y, but the upper bits of it should be zeroed
out.  make_extraction happily changes it into an extraction out of
the reg:DI directly and that (at least for initially valid y 0 to 31)
will always extract exactly 13 bits out of the register; if there are
any bits above the low SImode part that are non-zero, this results
in different behavior.

The following patch stops doing that unless we can prove we don't care
about any of the bits above it.
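The difference can be sketched in plain C++ (a hedged illustration of the RTL semantics, reusing the testcase's constant):

```cpp
#include <cassert>
#include <cstdint>

// Correct semantics of the original RTL: shift the low 32-bit lowpart,
// then mask out 13 bits; bits above the SImode lowpart never contribute.
uint32_t extract_via_lowpart (uint64_t x, unsigned y)
{
  return ((uint32_t) x >> y) & 0x1fff;
}

// What the buggy make_extraction produced: a 13-bit extraction taken
// directly out of the full 64-bit register starting at bit y.
uint32_t extract_from_wide (uint64_t x, unsigned y)
{
  return (uint32_t) ((x >> y) & 0x1fff);
}
```

For y + 13 > 32 the two disagree whenever the upper DImode bits are non-zero, which is exactly the PR82192 failure mode; for y + 13 <= 32 they agree.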

Bootstrapped/regtested on x86_64-linux and i686-linux, extra statistics
collection didn't reveal changes in combiner's total_* vars at the end
of compilations except for this new testcase and combine.c itself.
Preapproved by Segher in the PR, committed to trunk.
For possible backports to release branches I'd like to wait some time.

2017-09-15  Jakub Jelinek  

PR rtl-optimization/82192
* combine.c (make_extraction): Don't look through non-paradoxical
SUBREGs or TRUNCATE if pos + len is or might be bigger than
inner's mode.

* gcc.c-torture/execute/pr82192.c: New test.

--- gcc/combine.c.jj2017-09-14 10:04:56.0 +0200
+++ gcc/combine.c   2017-09-14 16:59:28.529783572 +0200
@@ -7444,7 +7444,14 @@ make_extraction (machine_mode mode, rtx
   if (pos_rtx && CONST_INT_P (pos_rtx))
 pos = INTVAL (pos_rtx), pos_rtx = 0;
 
-  if (GET_CODE (inner) == SUBREG && subreg_lowpart_p (inner))
+  if (GET_CODE (inner) == SUBREG
+  && subreg_lowpart_p (inner)
+  && (paradoxical_subreg_p (inner)
+ /* If trying or potentionally trying to extract
+bits outside of is_mode, don't look through
+non-paradoxical SUBREGs.  See PR82192.  */
+ || (pos_rtx == NULL_RTX
+	 && pos + len <= GET_MODE_PRECISION (is_mode))))
 {
   /* If going from (subreg:SI (mem:QI ...)) to (mem:QI ...),
 consider just the QI as the memory to extract from.
@@ -7470,7 +7477,12 @@ make_extraction (machine_mode mode, rtx
   if (new_rtx != 0)
return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1));
 }
-  else if (GET_CODE (inner) == TRUNCATE)
+  else if (GET_CODE (inner) == TRUNCATE
+  /* If trying or potentionally trying to extract
+ bits outside of is_mode, don't look through
+ TRUNCATE.  See PR82192.  */
+  && pos_rtx == NULL_RTX
+  && pos + len <= GET_MODE_PRECISION (is_mode))
 inner = XEXP (inner, 0);
 
   inner_mode = GET_MODE (inner);
--- gcc/testsuite/gcc.c-torture/execute/pr82192.c.jj2017-09-14 
17:02:54.281234432 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr82192.c   2017-09-14 
17:02:39.0 +0200
@@ -0,0 +1,22 @@
+/* PR rtl-optimization/82192 */
+
+unsigned long long int a = 0x95dd3d896f7422e2ULL;
+struct S { unsigned int m : 13; } b;
+
+__attribute__((noinline, noclone)) void
+foo (void)
+{
+  b.m = ((unsigned) a) >> ((0x644eee9667723bf7LL
+			    | a & ~0xdee27af8U) - 0x644eee9667763bd8LL);
+}
+
+int
+main ()
+{
+  if (__INT_MAX__ != 0x7fffffffULL)
+return 0;
+  foo ();
+  if (b.m != 0)
+__builtin_abort ();
+  return 0;
+}

Jakub


Re: [C++ Patch Ping] PR 64644 (""warning: anonymous union with no members" should be an error with -pedantic-errors")

2017-09-15 Thread Nathan Sidwell

On 09/15/2017 05:53 AM, Paolo Carlini wrote:

Hi,

gently pinging this.

On 16/06/2017 15:47, Paolo Carlini wrote:

Hi,

submitter and Manuel analyzed this a while ago and came to the 
conclusion - which I think is still valid vs the current working draft 
- that strictly speaking this kind of code violates [dcl.dcl], thus a 
pedwarn seems more suited than a plain warning. The below one-liner, 
suggested by Manuel at the time, passes testing on x86_64-linux 
together with my testsuite changes.


     https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01193.html


Ok.  class.union.anon has the member-specification as non-optional.

nathan

--
Nathan Sidwell


Re: Transform (x / y) != 0 to x >=y and (x / y) == 0 to x < y if x, y are unsigned

2017-09-15 Thread Marc Glisse

On Fri, 15 Sep 2017, Jeff Law wrote:


On 09/15/2017 07:09 AM, Marc Glisse wrote:

On Fri, 15 Sep 2017, Prathamesh Kulkarni wrote:

+/* (X / Y) == 0 -> X < Y if X, Y are unsigned.  */
+(simplify
+  (eq (trunc_div @0 @1) integer_zerop)
+  (if (TYPE_UNSIGNED (TREE_TYPE(@0)) && TYPE_UNSIGNED (TREE_TYPE (@1)))
+(lt @0 @1)))
+
+/* (X / Y) != 0 -> X >= Y, if X, Y are unsigned.  */
+(simplify
+  (ne (trunc_div @0 @1) integer_zerop)
+  (if (TYPE_UNSIGNED (TREE_TYPE(@0)) && TYPE_UNSIGNED (TREE_TYPE (@1)))
+(ge @0 @1)))
+

Hello,

you can merge the 2 transforms using "for". Also, no need to test the
type of @1 since you are already testing @0.

Right.



- do we want a single_use restriction on the result of the division?

I think so.  If x/y is a common subexpression, then ideally we'd compute
it once.


The question is whether, having computed c=a/b, it is cheaper to test a<b
or c!=0. I think it is usually the second one, but not for all types on 
all targets. Although since you mention VRP, it is easier to do further 
optimizations using the information a<b.


- do we also want to handle (x>>4)==0?

I think so, but that can be a follow-up IMHO.


Yes, I forgot to specify it in my email, but these are not meant as 
requirements for this patch to move forward.



- do we also want a special case when X is 1 that produces Y==1, as
asked in a recent PR?

Seems like a reasonable follow-up as well.

The other follow-up to consider is detecting these cases in VRP to
produce suitable ASSERT_EXPRs and ranges.


- once in a while, someone mentions that eq, on vectors, can either do
elementwise comparison and return a vector, or return a single boolean,
which would fail here. However, I don't remember ever seeing an example.

We could always restrict to the integral types.  Probably wise to
explicitly do that anyway.


VECTOR_TYPE_P (type) || !VECTOR_TYPE_P (TREE_TYPE (@0))

should be enough to avoid the problematic case, the transformation can 
still be nice for vectors.


--
Marc Glisse


Re: Fix an SVE failure in the Fortran matmul* tests

2017-09-15 Thread Jeff Law
On 09/15/2017 04:49 AM, Richard Sandiford wrote:
> The vectoriser was calling vect_get_smallest_scalar_type without
> having proven that the type actually is a scalar.  This seems to
> be the intended behaviour: the ultimate test of whether the type
> is interesting (and hence scalar) is whether an associated vector
> type exists, but this is only tested later.
> 
> The patch simply makes the function cope gracefully with non-scalar
> inputs.
> 
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> OK to install?
> 
> Richard
> 
> 
> 2017-09-15  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * tree-vect-data-refs.c (vect_get_smallest_scalar_type): Cope
>   with types that aren't in fact scalar.
OK.
jeff


Re: [PATCHv2] Add a -Wcast-align=strict warning

2017-09-15 Thread Joseph Myers
On Thu, 14 Sep 2017, Bernd Edlinger wrote:

> Hi,
> 
> as suggested by Joseph, here is an updated patch that
> uses min_align_of_type instead of TYPE_ALIGN.
> 
> Is it OK?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Fix vectorizable_mask_load_store handling of invariant masks

2017-09-15 Thread Jeff Law
On 09/15/2017 04:47 AM, Richard Sandiford wrote:
> vectorizable_mask_load_store was not passing the required mask type to
> vect_get_vec_def_for_operand.  This doesn't matter for masks that are
> defined in the loop, since their STMT_VINFO_VECTYPE will be what we need
> anyway.  But it's not possible to tell which mask type the caller needs
> when looking at an invariant scalar boolean.  As the comment above the
> function says:
> 
>In case OP is an invariant or constant, a new stmt that creates a vector 
> def
>needs to be introduced.  VECTYPE may be used to specify a required type for
>vector invariant.
> 
> This fixes the attached testcase for SVE.
> 
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> OK to install?
> 
> Richard
> 
> 
> 2017-09-15  Richard Sandiford  
> 
> gcc/
>   * tree-vect-stmts.c (vectorizable_mask_load_store): Pass mask_vectype
>   to vect_get_vec_def_for_operand when getting the mask operand.
> 
> gcc/testsuite/
>   * gfortran.dg/vect/mask-store-1.f90: New test.
OK.  It does make me wonder though if we should just always be passing
in the vectype rather than defaulting it to NULL to avoid similar issues
at other call points.

I haven't really looked at those call points to know if they're actually
problematical or not -- it's just a general concern.

Jeff


Re: [PATCH] Add macro DISABLE_COPY_AND_ASSIGN

2017-09-15 Thread Yao Qi
On Sat, Sep 9, 2017 at 1:27 PM, Ian Lance Taylor  wrote:
>
> The patch to include/ansidecl.h is basically OK.  Please add an
> example in the comment showing how to use it: after `private:`, and
> with a trailing semicolon.  Thanks.

Patch below is committed.  Thanks for the review.

>
> The patches to the other files will have to be approved by the
> relevant maintainers.
>

I'll split it and post them later.

-- 
Yao (齐尧)
From 753d12319d85876c2513029037af539c43717251 Mon Sep 17 00:00:00 2001
From: qiyao 
Date: Fri, 15 Sep 2017 15:40:50 +
Subject: [PATCH] [include] Add macro DISABLE_COPY_AND_ASSIGN

We have many classes whose copy ctor and assignment operator are deleted
in different projects, gcc, gdb and gold.  So this patch adds a macro
to do this, and replace these existing mechanical code with macro
DISABLE_COPY_AND_ASSIGN.

The patch was posted in gdb-patches,
https://sourceware.org/ml/gdb-patches/2017-07/msg00254.html but we
think it is better to put this macro in include/ansidecl.h so that
other projects can use it too.

include:

2017-09-15  Yao Qi  
	Pedro Alves  

	* ansidecl.h (DISABLE_COPY_AND_ASSIGN): New macro.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@252823 138bc75d-0d04-0410-961f-82ee72b054a4
---
 include/ChangeLog  |  5 +
 include/ansidecl.h | 26 ++
 2 files changed, 31 insertions(+)

diff --git a/include/ChangeLog b/include/ChangeLog
index 4703588..0221586 100644
--- a/include/ChangeLog
+++ b/include/ChangeLog
@@ -1,3 +1,8 @@
+2017-09-15  Yao Qi  
+	Pedro Alves  
+
+	* ansidecl.h (DISABLE_COPY_AND_ASSIGN): New macro.
+
 2017-09-12  Jiong Wang  
 
 	* dwarf2.def (DW_CFA_AARCH64_negate_ra_state): New DW_CFA_DUP.
diff --git a/include/ansidecl.h b/include/ansidecl.h
index ab3b895..450ce35 100644
--- a/include/ansidecl.h
+++ b/include/ansidecl.h
@@ -360,6 +360,32 @@ So instead we use the macro below and test it against specific values.  */
 # define FINAL
 #endif
 
+/* A macro to disable the copy constructor and assignment operator.
+   When building with C++11 and above, the methods are explicitly
+   deleted, causing a compile-time error if something tries to copy.
+   For C++03, this just declares the methods, causing a link-time
+   error if the methods end up called (assuming you don't
+   define them).  For C++03, for best results, place the macro
+   under the private: access specifier, like this,
+
+   class name_lookup
+   {
+ private:
+   DISABLE_COPY_AND_ASSIGN (name_lookup);
+   };
+
+   so that most attempts at copy are caught at compile-time.  */
+
+#if __cplusplus >= 201103
+#define DISABLE_COPY_AND_ASSIGN(TYPE)		\
+  TYPE (const TYPE&) = delete;			\
+  void operator= (const TYPE &) = delete
+#else
+#define DISABLE_COPY_AND_ASSIGN(TYPE)		\
+  TYPE (const TYPE&);\
+  void operator= (const TYPE &)
+#endif /* __cplusplus >= 201103 */
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.9.1
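For illustration, here is how the new macro is meant to be used (a self-contained sketch that inlines a C++11 equivalent of the macro rather than including ansidecl.h):

```cpp
#include <type_traits>

// Stand-in equivalent of the DISABLE_COPY_AND_ASSIGN macro added to
// include/ansidecl.h (only the C++11 branch is shown here).
#define DISABLE_COPY_AND_ASSIGN(TYPE)		\
  TYPE (const TYPE &) = delete;			\
  void operator= (const TYPE &) = delete

class name_lookup
{
public:
  name_lookup () {}

private:
  DISABLE_COPY_AND_ASSIGN (name_lookup);
};

// Attempts to copy are rejected at compile time.
static_assert (!std::is_copy_constructible<name_lookup>::value,
	       "copy ctor is deleted");
static_assert (!std::is_copy_assignable<name_lookup>::value,
	       "assignment is deleted");
```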



Re: [PATCH, rs6000 version 2] Add support for vec_xst_len_r() and vec_xl_len_r() builtins

2017-09-15 Thread Segher Boessenkool
Hi Carl,

On Thu, Sep 14, 2017 at 02:23:47PM -0700, Carl Love wrote:
> vecload isn't really the correct type for this, but I see we have the
> same on the existing lvsl patterns (it's permute unit on p9; I expect
> the same on p8 and older, but please check).

It is a bit more complicated on older cores I think; but we'll deal with
all at once, there is nothing special about your added one.

>   * doc/extend.texi: Update the built-in documenation file for the new
>   built-in functions.

(Typo, "documentation").

> +(define_insn "altivec_lvsl_reg"
> +  [(set (match_operand:V16QI 0 "vsx_register_operand" "=v")

altivec_register_operand instead?  lvsl can target only the VR regs, not
all VSR regs.

> +;; Load VSX Vector with Length, right justified
> +(define_expand "lxvll"
> +  [(set (match_dup 3)
> + (match_operand:DI 2 "register_operand"))
> +   (set (match_operand:V16QI 0 "vsx_register_operand")
> + (unspec:V16QI
> + [(match_operand:DI 1 "gpc_reg_operand")
> +  (match_dup 3)]
> + UNSPEC_LXVLL))]
> +  "TARGET_P9_VECTOR && TARGET_64BIT"
> +{
> +  operands[3] = gen_reg_rtx (DImode);
> +})

I don't think you need to copy operands[2] to a temporary here, see below.

Why does this require TARGET_64BIT?

> +(define_insn "sldi"
> +  [(set (match_operand:DI 0 "vsx_register_operand" "=r")
> + (unspec:DI [(match_operand:DI 1 "gpc_reg_operand" "r")
> + (match_operand:DI 2 "u6bit_cint_operand" "")]
> +UNSPEC_SLDI))]
> +  ""
> +  "sldi %0,%1,%2"
> +)

As we discussed, you can just use ashldi3.

> +(define_insn "*lxvll"
> +  [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
> + (unspec:V16QI [(match_operand:DI 1 "gpc_reg_operand" "b")
> +(match_operand:DI 2 "register_operand" "+r")]
> +   UNSPEC_LXVLL))]
> +  "TARGET_P9_VECTOR && TARGET_64BIT"
> +  "lxvll %x0,%1,%2;"
> +  [(set_attr "length" "4")
> +   (set_attr "type" "vecload")])

Why "+r"?  The instruction doesn't write to that reg.  A leftover from
an earlier version of the patch, I guess.

No ";" at the end of pattern strings please.

Length 4 is the default, just leave it out.

> +;; Expand for builtin xl_len_r
> +(define_expand "xl_len_r"
> +  [(match_operand:V16QI 0 "vsx_register_operand" "=v")
> +   (match_operand:DI 1 "register_operand" "r")
> +   (match_operand:DI 2 "register_operand" "r")]
> +  "UNSPEC_XL_LEN_R"

Expanders don't need constraints; just leave them out :-)

> +{
> +  rtx shift_mask = gen_reg_rtx (V16QImode);
> +  rtx rtx_vtmp = gen_reg_rtx (V16QImode);
> +  rtx tmp = gen_reg_rtx (DImode);
> +
> +  /* Setup permute vector to shift right by operands[2] bytes.
> + Note: operands[2] is between 0 and 15, adding -16 to it results
> + in a negative value.  Shifting left by a negative value results in
> + the value being shifted right by the desired amount.  */
> +  emit_insn (gen_adddi3 (tmp, operands[2], GEN_INT (-16)));
> +  emit_insn (gen_altivec_lvsl_reg (shift_mask, tmp));

Since lvsl looks only at the low four bits, adding -16 does nothing for it.
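Segher's point is that lvsl derives its permute control purely from the low four bits of the computed address, and subtracting 16 cannot change those (a quick numeric check, not target code):

```cpp
#include <cassert>
#include <cstdint>

// Model of the only part of the address lvsl inspects: the value
// modulo 16 (its low four bits).
unsigned low4 (int64_t addr) { return (unsigned) (addr & 0xf); }
```

For any n in 0..15, `low4 (n - 16) == low4 (n)`, so the add of -16 before the lvsl has no effect on the generated permute vector.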

> +  emit_insn (gen_sldi (operands[2], operands[2], GEN_INT (56)));

Please use a new temporary instead of reusing operands[2]; this gives the
register allocator more freedom.

> +(define_insn "*stxvll"
> +  [(set (mem:V16QI (match_operand:DI 1 "gpc_reg_operand" "b"))
> + (unspec:V16QI
> +  [(match_operand:V16QI 0 "vsx_register_operand" "wa")
> +   (match_operand:DI 2 "register_operand" "+r")]
> +  UNSPEC_STXVLL))]
> +  "TARGET_P9_VECTOR && TARGET_64BIT"
> +  "stxvll %x0,%1,%2"
> +  [(set_attr "length" "8")
> +   (set_attr "type" "vecstore")])

That's the wrong length now (just a single insn; doesn't need a length
attribute).

Many of these comments apply to multiple places, please check all.

Thanks,


Segher


Re: Include phis in SLP unrolling calculation

2017-09-15 Thread Jeff Law
On 09/15/2017 04:48 AM, Richard Sandiford wrote:
> Without this we'd pick an unrolling factor based purely on longs,
> ignoring the ints.  It's possible that vect_get_smallest_scalar_type
> should also handle shifts, but I think we'd still want this as a
> belt-and-braces fix.
> 
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> OK to install?
> 
> Richard
> 
> 
> 2017-09-15  Richard Sandiford  
> 
> gcc/
>   * tree-vect-slp.c (vect_record_max_nunits): New function,
>   split out from...
>   (vect_build_slp_tree_1): ...here.
>   (vect_build_slp_tree_2): Call it for phis too.
> 
> gcc/testsuite/
>   * gcc.dg/vect/slp-multitypes-13.c: New test.
OK.
jeff


Re: Fix type of bitstart in vectorizable_live_operation

2017-09-15 Thread Jeff Law
On 09/15/2017 04:45 AM, Richard Sandiford wrote:
> This patch changes the type of the multiplier applied by
> vectorizable_live_operation from unsigned_type_node to bitsizetype,
> which matches the type of TYPE_SIZE and is the type expected of a
> BIT_FIELD_REF bit position.  This is shown by existing tests when
> SVE is added.
> 
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> OK to install?
> 
> Richard
> 
> 
> 2017-09-15  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * tree-vect-loop.c (vectorizable_live_operation): Fix type of
>   bitstart.
OK.
jeff


Re: [libstdc++/71500] make back reference work with icase

2017-09-15 Thread Jonathan Wakely

On 04/09/17 03:31 -0700, Tim Shen via libstdc++ wrote:

This fixes the follow-up comments in 71500.

Back-reference matching is different from other matching, as the
content the back-reference refers to is at "run-time", aka during
regex_match(), not regex() compilation.

For compilation we do have an abstraction layer to catch all
comparison customizations, namely _M_translator in regex_compiler.h.
Until this patch, we don't have an abstraction for "run-time"
matching. I believe that back-reference is the only place that needs
run-time matching, so I just build a _Backref_matcher in
regex_executor.tcc.

Tested on x86_64-linux-gnu.

Thanks!

--
Regards,
Tim Shen



commit a97b7fecd319e031ffc489a956b8cf3dc63eeb26
Author: Tim Shen 
Date:   Mon Sep 4 03:19:35 2017 -0700

   PR libstdc++/71500
   * include/bits/regex_executor.tcc: Support icase in
   regex_traits<...> for back reference matches.
   * testsuite/28_regex/regression.cc: Test case.

diff --git a/libstdc++-v3/include/bits/regex_executor.tcc 
b/libstdc++-v3/include/bits/regex_executor.tcc
index 226e05856e1..f6149fecf9d 100644
--- a/libstdc++-v3/include/bits/regex_executor.tcc
+++ b/libstdc++-v3/include/bits/regex_executor.tcc
@@ -335,6 +335,54 @@ namespace __detail
  _M_states._M_queue(__state._M_next, _M_cur_results);
}

+  template
+struct _Backref_matcher
+{
+  _Backref_matcher(bool __icase, const _TraitsT& __traits)
+  : _M_traits(__traits) { }
+
+  bool
+  _M_apply(_BiIter __expected_begin,
+  _BiIter __expected_end, _BiIter __actual_begin,
+  _BiIter __actual_end)
+  {
+   return _M_traits.transform(__expected_begin, __expected_end)
+   == _M_traits.transform(__actual_begin, __actual_end);
+  }
+
+  const _TraitsT& _M_traits;
+};
+
+  template
+struct _Backref_matcher<_BiIter, std::regex_traits<_CharT>>
+{
+  using _TraitsT = std::regex_traits<_CharT>;
+  _Backref_matcher(bool __icase, const _TraitsT& __traits)
+  : _M_icase(__icase), _M_traits(__traits) { }
+
+  bool
+  _M_apply(_BiIter __expected_begin,
+  _BiIter __expected_end, _BiIter __actual_begin,
+  _BiIter __actual_end)
+  {
+   if (!_M_icase)
+ return std::equal(__expected_begin, __expected_end,
+   __actual_begin, __actual_end);


This is only valid in C++14 and higher, because the 4-argument version
of std::equal isn't present in C++11.


+   typedef std::ctype<_CharT> __ctype_type;
+   const auto& __fctyp = use_facet<__ctype_type>(_M_traits.getloc());
+   return std::equal(__expected_begin, __expected_end,
+ __actual_begin, __actual_end,


Same here.


+ [this, &__fctyp](_CharT __lhs, _CharT __rhs)
+ {
+   return __fctyp.tolower(__lhs)
+   == __fctyp.tolower(__rhs);
+ });


We need to rewrite this to check the lengths are equal first, and then
call the 3-argument version of std::equal.

Alternatively, we could move the implementation of the C++14
std::equal overloads to __equal and make that available for C++11.
I'll try that.
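A C++11-compatible rewrite along those lines might look like this (a sketch with a hypothetical helper name, not the final libstdc++ change):

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <iterator>
#include <string>

// Hypothetical helper: emulate the C++14 four-argument std::equal by
// comparing range lengths first, then falling back to the C++11
// three-argument overload.
template <typename _BiIter, typename _Pred>
bool
__ranges_equal (_BiIter __first1, _BiIter __last1,
		_BiIter __first2, _BiIter __last2, _Pred __pred)
{
  if (std::distance (__first1, __last1) != std::distance (__first2, __last2))
    return false;
  return std::equal (__first1, __last1, __first2, __pred);
}
```

The length check is what makes the three-argument overload safe: it never reads past the end of the second range.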




Re: Fix vectorizable_live_operation handling of vector booleans

2017-09-15 Thread Jeff Law
On 09/15/2017 04:45 AM, Richard Sandiford wrote:
> vectorizable_live_operation needs to use BIT_FIELD_REF to extract one
> element of a vector.  For a packed vector boolean type, the number of
> bits to extract should be taken from TYPE_PRECISION rather than TYPE_SIZE.
> 
> This is shown by existing tests once SVE is added.
> 
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> OK to install?
> 
> Richard
> 
> 
> 2017-09-15  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * tree-vect-loop.c (vectorizable_live_operation): Fix element size
>   calculation for vector booleans.
OK.
jeff



Re: [PATCH 2/3] [ARM] Refactor costs calculation for MEM.

2017-09-15 Thread Charles Baylis
On 13 September 2017 at 10:02, Kyrill  Tkachov
 wrote:
> Hi Charles,
>
> On 12/09/17 09:34, charles.bay...@linaro.org wrote:
>>
>> From: Charles Baylis 
>>
>> This patch moves the calculation of costs for MEM into a
>> separate function, and reforms the calculation into two
>> parts. Firstly any additional cost of the addressing mode
>> is calculated, and then the cost of the memory access itself
>> is added.
>>
>> In this patch, the calculation of the cost of the addressing
>> mode is left as a placeholder, to be added in a subsequent
>> patch.
>>
>
> Can you please mention how has this series been tested?
> A bootstrap and test run on arm-none-linux-gnueabihf is required at least.

It has been tested with make check on arm-unknown-linux-gnueabihf with
no regressions. I've successfully bootstrapped the next spin.

> Also, do you have any benchmarking results for this?
> I agree that generating the addressing modes in the new tests is desirable.
> So I'm not objecting to the goal of this patch, but a check to make sure
> that this doesn't regress SPEC
> would be great.  Further comments on the patch inline.

SPEC2006 scores are unaffected by this patch on Cortex-A57.

>> +/* Helper function for arm_rtx_costs_internal.  Calculates the cost of a
>> MEM,
>> +   considering the costs of the addressing mode and memory access
>> +   separately.  */
>> +static bool
>> +arm_mem_costs (rtx x, const struct cpu_cost_table *extra_cost,
>> +  int *cost, bool speed_p)
>> +{
>> +  machine_mode mode = GET_MODE (x);
>> +  if (flag_pic
>> +  && GET_CODE (XEXP (x, 0)) == PLUS
>> +  && will_be_in_index_register (XEXP (XEXP (x, 0), 1)))
>> +/* This will be split into two instructions.  Add the cost of the
>> +   additional instruction here.  The cost of the memory access is
>> computed
>> +   below.  See arm.md:calculate_pic_address.  */
>> +*cost = COSTS_N_INSNS (1);
>> +  else
>> +*cost = 0;
>
>
> For speed_p we want the size cost of the insn (COSTS_N_INSNS (1) for each
> insn)
> plus the appropriate field in extra_cost. So you should unconditionally
> initialise the cost
> to COSTS_N_INSNS (1), conditionally increment it by COSTS_N_INSNS (1) with
> the condition above.

OK. I also have to subtract that COSTS_N_INSNS (1) in the if (speed_p)
part because the cost of a single bus transfer is included in that
initial cost.

>> +
>> +  /* Calculate cost of the addressing mode.  */
>> +  if (speed_p)
>> +{
>> +  /* TODO: Add table-driven costs for addressing modes.  (See patch
>> 2) */
>> +}
>
>
> You mean "patch 3". I recommend you just remove this conditional from this
> patch and add the logic
> in patch 3 entirely.

OK.

>> +
>> +  /* Calculate cost of memory access.  */
>> +  if (speed_p)
>> +{
>> +  /* data transfer is transfer size divided by bus width.  */
>> +  int bus_width_bytes = current_tune->bus_width / 4;
>
>
> This should be bus_width / BITS_PER_UNIT to get the size in bytes.
> BITS_PER_UNIT is 8 though, so you'll have to double check to make sure the
> cost calculation and generated code is still appropriate.

Oops, I changed the units around and messed this up. I'll fix this.
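With that correction the transfer count works out as follows (a sketch of the intended formula; `BITS_PER_UNIT` is 8 on ARM targets):

```cpp
#include <cassert>

const int BITS_PER_UNIT_SKETCH = 8;  // stand-in for GCC's BITS_PER_UNIT

// Number of bus transfers for a memory access: the access size in
// bytes divided by the bus width in bytes, rounded up (GCC's CEIL).
int bus_transfers (int mode_size_bytes, int bus_width_bits)
{
  int bus_width_bytes = bus_width_bits / BITS_PER_UNIT_SKETCH;
  return (mode_size_bytes + bus_width_bytes - 1) / bus_width_bytes;
}
```

Dividing by 4 instead of 8 would double the computed transfer count on a 64-bit bus.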

>> +  *cost += CEIL (GET_MODE_SIZE (mode), bus_width_bytes);
>> +  *cost += extra_cost->ldst.load;
>> +}
>> +  else
>> +{
>> +  *cost += COSTS_N_INSNS (1);
>> +}
>
> Given my first comment above this else would be deleted.

OK
From f81e1d3212475a3dc7aaeb8cb3171c6defd40687 Mon Sep 17 00:00:00 2001
From: Charles Baylis 
Date: Wed, 8 Feb 2017 16:52:10 +
Subject: [PATCH 2/3] [ARM] Refactor costs calculation for MEM.

This patch moves the calculation of costs for MEM into a
separate function, and reforms the calculation into two
parts. Firstly any additional cost of the addressing mode
is calculated, and then the cost of the memory access itself
is added.

In this patch, the calculation of the cost of the addressing
mode is left as a placeholder, to be added in a subsequent
patch.

gcc/ChangeLog:

  Charles Baylis  

	* config/arm/arm.c (arm_mem_costs): New function.
	(arm_rtx_costs_internal): Use arm_mem_costs.

gcc/testsuite/ChangeLog:

  Charles Baylis  

	* gcc.target/arm/addr-modes-float.c: New test.
	* gcc.target/arm/addr-modes-int.c: New test.
	* gcc.target/arm/addr-modes.h: New header.

Change-Id: I99e93406ea39ee31f71c7bf428ad3e127b7a618e
---
 gcc/config/arm/arm.c| 60 ++---
 gcc/testsuite/gcc.target/arm/addr-modes-float.c | 42 +
 gcc/testsuite/gcc.target/arm/addr-modes-int.c   | 46 +++
 gcc/testsuite/gcc.target/arm/addr-modes.h   | 53 ++
 4 files changed, 176 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/addr-modes-float.c
 create mode 100644 

Re: [PATCH 3/3] [ARM] Add table of costs for AAarch32 addressing modes.

2017-09-15 Thread Charles Baylis
On 13 September 2017 at 10:02, Kyrill  Tkachov
 wrote:

>
> Please add a comment here saying that the units are in COSTS_N_INSNS
> so that we can reduce the temptation to use these in inappropriate contexts.

>> +  if (VECTOR_MODE_P (mode))
>> +   {
>> + *cost += current_tune->addr_mode_costs->vector[op_type];
>> +   }
>> +  else if (FLOAT_MODE_P (mode))
>> +   {
>> + *cost += current_tune->addr_mode_costs->fp[op_type];
>> +   }
>> +  else
>> +   {
>> + *cost += current_tune->addr_mode_costs->integer[op_type];
>> +   }
>
>
> No need for brackets for single-statement conditionals.

Done.
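The resulting lookup in arm_mem_costs then reduces to a table index (a simplified sketch mirroring the enum and struct from the patch, with made-up cost values; the real tables hang off tune_params):

```cpp
#include <cassert>

// Mirrors enum arm_addr_mode_op / struct addr_mode_cost_table added
// to arm-protos.h in this patch.
enum arm_addr_mode_op { AMO_DEFAULT, AMO_NO_WB, AMO_WB, AMO_MAX };

struct addr_mode_cost_table
{
  int integer[AMO_MAX];
  int fp[AMO_MAX];
  int vector[AMO_MAX];
};

// Pick the row by access kind, the column by addressing-mode operation.
int
addr_mode_cost (const addr_mode_cost_table *table,
		bool is_vector, bool is_float, arm_addr_mode_op op)
{
  if (is_vector)
    return table->vector[op];
  else if (is_float)
    return table->fp[op];
  return table->integer[op];
}
```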
From a35fa59f4dc3be42a52519a90bdd2d47e74db086 Mon Sep 17 00:00:00 2001
From: Charles Baylis 
Date: Thu, 14 Sep 2017 12:47:41 +0100
Subject: [PATCH 3/3] [ARM] Add table of costs for AAarch32 addressing modes.

This patch adds support for modelling the varying costs of
different addressing modes. The generic cost table treats
all addressing modes as having equal cost.

gcc/ChangeLog:

  Charles Baylis  

	* config/arm/arm-protos.h (enum arm_addr_mode_op): New.
	(struct addr_mode_cost_table): New.
	(struct tune_params): Add field addr_mode_costs.
	* config/arm/arm.c (generic_addr_mode_costs): New.
	(arm_slowmul_tune): Initialise addr_mode_costs field.
	(arm_fastmul_tune): Likewise.
	(arm_strongarm_tune): Likewise.
	(arm_xscale_tune): Likewise.
	(arm_9e_tune): Likewise.
	(arm_marvell_pj4_tune): Likewise.
	(arm_v6t2_tune): Likewise.
	(arm_cortex_tune): Likewise.
	(arm_cortex_a8_tune): Likewise.
	(arm_cortex_a7_tune): Likewise.
	(arm_cortex_a15_tune): Likewise.
	(arm_cortex_a35_tune): Likewise.
	(arm_cortex_a53_tune): Likewise.
	(arm_cortex_a57_tune): Likewise.
	(arm_exynosm1_tune): Likewise.
	(arm_xgene1_tune): Likewise.
	(arm_cortex_a5_tune): Likewise.
	(arm_cortex_a9_tune): Likewise.
	(arm_cortex_a12_tune): Likewise.
	(arm_cortex_a73_tune): Likewise.
	(arm_v7m_tune): Likewise.
	(arm_cortex_m7_tune): Likewise.
	(arm_v6m_tune): Likewise.
	(arm_fa726te_tune): Likewise.
	(arm_mem_costs): Use table lookup to calculate cost of addressing
	mode.

Change-Id: If71bd7c4f4bb876c5ed82dc28791130efb8bf89e
---
 gcc/config/arm/arm-protos.h | 20 +++
 gcc/config/arm/arm.c| 81 +
 2 files changed, 101 insertions(+)

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 47a85cc..7769726 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -261,12 +261,32 @@ struct cpu_vec_costs {
 
 struct cpu_cost_table;
 
+/* Addressing mode operations.  Used to index tables in struct
+   addr_mode_cost_table.  */
+enum arm_addr_mode_op
+{
+   AMO_DEFAULT,
+   AMO_NO_WB,	/* Offset with no writeback.  */
+   AMO_WB,	/* Offset with writeback.  */
+   AMO_MAX	/* For array size.  */
+};
+
+/* Table of additional costs in units of COSTS_N_INSNS() when using
+   addressing modes for each access type.  */
+struct addr_mode_cost_table
+{
+   const int integer[AMO_MAX];
+   const int fp[AMO_MAX];
+   const int vector[AMO_MAX];
+};
+
 /* Dump function ARM_PRINT_TUNE_INFO should be updated whenever this
structure is modified.  */
 
 struct tune_params
 {
   const struct cpu_cost_table *insn_extra_cost;
+  const struct addr_mode_cost_table *addr_mode_costs;
   bool (*sched_adjust_cost) (rtx_insn *, int, rtx_insn *, int *);
   int (*branch_cost) (bool, bool);
   /* Vectorizer costs.  */
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 64230b8..7773ec3 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1751,9 +1751,32 @@ const struct cpu_cost_table v7m_extra_costs =
   }
 };
 
+const struct addr_mode_cost_table generic_addr_mode_costs =
+{
+  /* int.  */
+  {
+COSTS_N_INSNS (0),	/* AMO_DEFAULT.  */
+COSTS_N_INSNS (0),	/* AMO_NO_WB.  */
+COSTS_N_INSNS (0)	/* AMO_WB.  */
+  },
+  /* float.  */
+  {
+COSTS_N_INSNS (0),	/* AMO_DEFAULT.  */
+COSTS_N_INSNS (0),	/* AMO_NO_WB.  */
+COSTS_N_INSNS (0)	/* AMO_WB.  */
+  },
+  /* vector.  */
+  {
+COSTS_N_INSNS (0),	/* AMO_DEFAULT.  */
+COSTS_N_INSNS (0),	/* AMO_NO_WB.  */
+COSTS_N_INSNS (0)	/* AMO_WB.  */
+  }
+};
+
 const struct tune_params arm_slowmul_tune =
 {
   &generic_extra_costs,			/* Insn extra costs.  */
+  &generic_addr_mode_costs,		/* Addressing mode costs.  */
   NULL,	/* Sched adj cost.  */
   arm_default_branch_cost,
   &arm_default_vec_cost,
@@ -1777,6 +1800,7 @@ const struct tune_params arm_slowmul_tune =
 const struct tune_params arm_fastmul_tune =
 {
   &generic_extra_costs,			/* Insn extra costs.  */
+  &generic_addr_mode_costs,		/* Addressing mode costs.  */
   NULL,	/* Sched adj cost.  */
   arm_default_branch_cost,
   &arm_default_vec_cost,
@@ -1803,6 +1827,7 @@ const struct tune_params arm_fastmul_tune =
 const struct tune_params arm_strongarm_tune =
 {
   &generic_extra_costs,			/* Insn extra costs.  */
+  &generic_addr_mode_costs,		/* Addressing mode costs.  */

Re: [PATCH 1/3] [ARM] Add bus_width_bits to tune_params

2017-09-15 Thread Charles Baylis
On 13 September 2017 at 10:02, Kyrill Tkachov wrote:
> Hi Charles,
>
> On 12/09/17 09:34, charles.bay...@linaro.org wrote:
>>
>> From: Charles Baylis 
>>
>> Add bus widths. These use the approximation that v7 and later cores have
>> a 64-bit data bus, and earlier cores a 32-bit bus, with the
>> exception of v7m.
>>
>
> Given the way this field is used in patch 2 does it affect the addressing
> mode generation
> in the tests you added depending on the -mtune option given?
> If so, we'll get testsuite failures when people test with particular default
> CPU configurations.

No, because the auto_inc_dec phase compares the cost of two different
MEMs which differ only by addressing mode. The part of the calculation
which depends on the bus_width is the same both times, so it is
cancelled out.

> Could you expand on the benefits we get from this extra bus_width
> information?
> I get that we increase the cost of memory accesses if the size of the mode
> we load is larger than the
> bus width, but it's not as if there is ever an alternative in this regard,
> such as loading less memory,
> so what pass can make different decisions thanks to this field?

As far as this patch series is concerned, it doesn't matter. It is
there to encapsulate the notion that a larger transfer results in
rtx_costs() returning a larger cost, but I don't know of any part of
the compiler which is sensitive to that difference. It's done this way
because Ramana and Richard wanted it done that way
(https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00652.html).
From b7bec2e4f7ca0335e0e5bd84c297215a3a7fb8c7 Mon Sep 17 00:00:00 2001
From: Charles Baylis 
Date: Fri, 8 Sep 2017 12:53:50 +0100
Subject: [PATCH 1/3] [ARM] Add bus_width_bits to tune_params

Add bus widths. These use the approximation that v7 and later cores have
a 64-bit data bus, and earlier cores a 32-bit bus, with the
exception of v7m.

  Charles Baylis  

	* config/arm/arm-protos.h (struct tune_params): New field
	bus_width.
	* config/arm/arm.c (arm_slowmul_tune): Initialise bus_width field.
	(arm_fastmul_tune): Likewise.
	(arm_strongarm_tune): Likewise.
	(arm_xscale_tune): Likewise.
	(arm_9e_tune): Likewise.
	(arm_marvell_pj4_tune): Likewise.
	(arm_v6t2_tune): Likewise.
	(arm_cortex_tune): Likewise.
	(arm_cortex_a8_tune): Likewise.
	(arm_cortex_a7_tune): Likewise.
	(arm_cortex_a15_tune): Likewise.
	(arm_cortex_a35_tune): Likewise.
	(arm_cortex_a53_tune): Likewise.
	(arm_cortex_a57_tune): Likewise.
	(arm_exynosm1_tune): Likewise.
	(arm_xgene1_tune): Likewise.
	(arm_cortex_a5_tune): Likewise.
	(arm_cortex_a9_tune): Likewise.
	(arm_cortex_a12_tune): Likewise.
	(arm_cortex_a73_tune): Likewise.
	(arm_v7m_tune): Likewise.
	(arm_cortex_m7_tune): Likewise.
	(arm_v6m_tune): Likewise.
	(arm_fa726te_tune): Likewise.

Change-Id: I613e876db93ffd6f8c1e72ba483be2efc0b56d66
---
 gcc/config/arm/arm-protos.h |  2 ++
 gcc/config/arm/arm.c        | 24 ++++++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 4538078..47a85cc 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -278,6 +278,8 @@ struct tune_params
   int max_insns_inline_memset;
   /* Issue rate of the processor.  */
   unsigned int issue_rate;
+  /* Bus width (bits).  */
+  unsigned int bus_width;
   /* Explicit prefetch data.  */
   struct
 {
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index bca8a34..32001e5 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1761,6 +1761,7 @@ const struct tune_params arm_slowmul_tune =
   5,		/* Max cond insns.  */
   8,		/* Memset max inline.  */
   1,		/* Issue rate.  */
+  32,		/* Bus width.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
   tune_params::PREF_CONST_POOL_TRUE,
   tune_params::PREF_LDRD_FALSE,
@@ -1783,6 +1784,7 @@ const struct tune_params arm_fastmul_tune =
   5,		/* Max cond insns.  */
   8,		/* Memset max inline.  */
   1,		/* Issue rate.  */
+  32,		/* Bus width.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
   tune_params::PREF_CONST_POOL_TRUE,
   tune_params::PREF_LDRD_FALSE,
@@ -1808,6 +1810,7 @@ const struct tune_params arm_strongarm_tune =
   3,		/* Max cond insns.  */
   8,		/* Memset max inline.  */
   1,		/* Issue rate.  */
+  32,		/* Bus width.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
   tune_params::PREF_CONST_POOL_TRUE,
   tune_params::PREF_LDRD_FALSE,
@@ -1830,6 +1833,7 @@ const struct tune_params arm_xscale_tune =
   3,		/* Max cond insns.  */
   8,		/* Memset max inline.  */
   1,		/* Issue rate.  */
+  32,		/* Bus width.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
   tune_params::PREF_CONST_POOL_TRUE,
   tune_params::PREF_LDRD_FALSE,
@@ -1852,6 +1856,7 @@ const struct tune_params arm_9e_tune =
   5,		/* Max cond insns.  */
   8,		/* Memset max inline.  */
   1,		

Re: [PATCH 3/3] [ARM] Add table of costs for AArch32 addressing modes.

2017-09-15 Thread Charles Baylis
On 13 September 2017 at 10:02, Kyrill Tkachov wrote:

>
> Please add a comment here saying that the units are in COSTS_N_INSNS
> so that we can reduce the temptation to use these in inappropriate contexts.

>> +  if (VECTOR_MODE_P (mode))
>> +   {
>> + *cost += current_tune->addr_mode_costs->vector[op_type];
>> +   }
>> +  else if (FLOAT_MODE_P (mode))
>> +   {
>> + *cost += current_tune->addr_mode_costs->fp[op_type];
>> +   }
>> +  else
>> +   {
>> + *cost += current_tune->addr_mode_costs->integer[op_type];
>> +   }
>
>
> No need for brackets for single-statement conditionals.

Done.


Re: Invoke vectorizable_live_operation in a consistent way

2017-09-15 Thread Jeff Law
On 09/15/2017 04:44 AM, Richard Sandiford wrote:
> vect_transform_stmt calls vectorizable_live_operation for
> each live statement in an SLP node, but vect_analyze_stmt
> only called it the once.  This patch makes vect_analyze_stmt
> consistent with vect_transform_stmt, which should be a bit
> more robust, and also means that a later patch can use
> slp_index when deciding validity.
> 
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> OK to install?
> 
> Richard
> 
> 
> 2017-09-15  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * tree-vect-stmts.c (can_vectorize_live_stmts): New function,
>   split out from...
>   (vect_transform_stmt): ...here.
>   (vect_analyze_stmt): Use it instead of calling
>   vectorizable_live_operation directly.
OK.
jeff


RE: 0006-Part-6.-Add-x86-tests-for-Intel-CET-implementation

2017-09-15 Thread Tsimbalist, Igor V
> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Jeff Law
> Sent: Friday, August 25, 2017 11:03 PM
> To: Tsimbalist, Igor V ; 'gcc-
> patc...@gcc.gnu.org' 
> Subject: Re: 0006-Part-6.-Add-x86-tests-for-Intel-CET-implementation
> 
> On 08/01/2017 02:56 AM, Tsimbalist, Igor V wrote:
> > Part#6. Add x86 tests for Intel CET implementation.
> >
> >
> > 0006-Part-6.-Add-x86-tests-for-Intel-CET-implementation.patch
> >
> >
> > From e4a8227e83e8e9f3ddbaa97707f3d335009e0e77 Mon Sep 17 00:00:00
> 2001
> > From: Igor Tsimbalist 
> > Date: Fri, 21 Jul 2017 19:40:40 +0300
> > Subject: [PATCH 6/9] Part#6. Add x86 tests for Intel CET implementation.
> >
> > gcc/testsuite/
> >
> > * g++.dg/cet-notrack-1.C: New test.
> > * gcc.target/i386/cet-intrin-1.c: Likewise.
> > * gcc.target/i386/cet-intrin-10.c: Likewise.
> > * gcc.target/i386/cet-intrin-2.c: Likewise.
> > * gcc.target/i386/cet-intrin-3.c: Likewise.
> > * gcc.target/i386/cet-intrin-4.c: Likewise.
> > * gcc.target/i386/cet-intrin-5.c: Likewise.
> > * gcc.target/i386/cet-intrin-6.c: Likewise.
> > * gcc.target/i386/cet-intrin-7.c: Likewise.
> > * gcc.target/i386/cet-intrin-8.c: Likewise.
> > * gcc.target/i386/cet-intrin-9.c: Likewise.
> > * gcc.target/i386/cet-label.c: Likewise.
> > * gcc.target/i386/cet-notrack-1a.c: Likewise.
> > * gcc.target/i386/cet-notrack-1b.c: Likewise.
> > * gcc.target/i386/cet-notrack-2a.c: Likewise.
> > * gcc.target/i386/cet-notrack-2b.c: Likewise.
> > * gcc.target/i386/cet-notrack-3.c: Likewise.
> > * gcc.target/i386/cet-notrack-4a.c: Likewise.
> > * gcc.target/i386/cet-notrack-4b.c: Likewise.
> > * gcc.target/i386/cet-notrack-5a.c: Likewise.
> > * gcc.target/i386/cet-notrack-5b.c: Likewise.
> > * gcc.target/i386/cet-notrack-6a.c: Likewise.
> > * gcc.target/i386/cet-notrack-6b.c: Likewise.
> > * gcc.target/i386/cet-notrack-7.c: Likewise.
> > * gcc.target/i386/cet-property-1.c: Likewise.
> > * gcc.target/i386/cet-property-2.c: Likewise.
> > * gcc.target/i386/cet-rdssp-1.c: Likewise.
> > * gcc.target/i386/cet-sjlj-1.c: Likewise.
> > * gcc.target/i386/cet-sjlj-2.c: Likewise.
> > * gcc.target/i386/cet-sjlj-3.c: Likewise.
> > * gcc.target/i386/cet-switch-1.c: Likewise.
> > * gcc.target/i386/cet-switch-2.c: Likewise.
> > * lib/target-supports.exp (check_effective_target_cet): New
> > proc.
> Whoops.  Nevermind my previous comment about x86-specific tests.  I
> should have scanned the whole kit before starting to comment on the earlier
> patches.
> 
> Uros will have the say on the x86 specific bits.  Given it's been 3 weeks, you
> might want to ping him directly to start getting his feedback.

Thanks, Jeff. Whom should I ping to review the other patches, which relate
to compiler libraries like libgcc and other target libraries?

Igor

> jeff


Re: Move computation of SLP_TREE_NUMBER_OF_VEC_STMTS

2017-09-15 Thread Jeff Law
On 09/15/2017 04:43 AM, Richard Sandiford wrote:
> Previously SLP_TREE_NUMBER_OF_VEC_STMTS was calculated while scheduling
> an SLP tree after analysis, but sometimes it can be useful to know the
> value during analysis too.  This patch moves the calculation to
> vect_slp_analyze_node_operations instead.
> 
> This became more natural after:
> 
> 2017-06-30  Richard Biener  
> 
> * tree-vect-slp.c (vect_slp_analyze_node_operations): Only
> analyze the first scalar stmt.  Move vector type computation
> for the BB case here from ...
> * tree-vect-stmts.c (vect_analyze_stmt): ... here.  Guard
> live operation processing in the SLP case properly.
> 
> since the STMT_VINFO_VECTYPE is now always initialised in time.
> 
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> OK to install?
> 
> Richard
> 
> 
> 2017-09-15  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * tree-vectorizer.h (vect_slp_analyze_operations): Replace parameters
>   with a vec_info *.
>   * tree-vect-loop.c (vect_analyze_loop_operations): Update call
>   accordingly.
>   * tree-vect-slp.c (vect_slp_analyze_node_operations): Add vec_info *
>   parameter.  Set SLP_TREE_NUMBER_OF_VEC_STMTS here rather than in
>   vect_schedule_slp_instance.
>   (vect_slp_analyze_operations): Replace parameters with a vec_info *.
>   Update call to vect_slp_analyze_node_operations.  Simplify return
>   value.
>   (vect_slp_analyze_bb_1): Update call accordingly.
>   (vect_schedule_slp_instance): Remove vectorization_factor parameter.
>   Don't calculate SLP_TREE_NUMBER_OF_VEC_STMTS here.
>   (vect_schedule_slp): Update call accordingly.
OK.
jeff


Re: [RFC][PACH 3/5] Prevent tree unroller from completely unrolling inner loops if that results in excessive strided-loads in outer loop

2017-09-15 Thread Jeff Law
On 09/15/2017 03:42 AM, Richard Biener wrote:
> On Fri, Sep 15, 2017 at 5:44 AM, Andrew Pinski  wrote:
>> On Thu, Sep 14, 2017 at 6:30 PM, Kugan Vivekanandarajah
>>  wrote:
>>> This patch prevent tree unroller from completely unrolling inner loops if 
>>> that
>>> results in excessive strided-loads in outer loop.
>>
>> Same comments from the RTL version.
>>
>> Though one more comment here:
>> +  if (!INDIRECT_REF_P (op)
> 
> There's no INDIRECT_REF in GIMPLE.
> 
>> +  && TREE_CODE (op) != MEM_REF
>> +  && TREE_CODE (op) != TARGET_MEM_REF)
>> +continue;
>>
>> This does not handle ARRAY_REF which might be/should be handled.
> 
> It looks like he wants to do
> 
>  op = get_base_address (op);
> 
> first.
> 
> But OTOH the routine looks completely bogus to me ...
> 
> You want to do
> 
>   find_data_references_in_stmt ()
> 
> and then look at the data-refs and the evolution of their access fns.
> 
> The function needs _way_ more comments though, you have to apply excessive
> guessing as to what it computes.  It also feels like this shouldn't be a target
> hook but part of some generic cost modeling infrastructure and the target
> should instead provide the number of load/store streams it can handle
> well (aka HW-prefetch).  That would be also (very) useful information
> for the loop distribution pass.
> 
> Related information that is missing is for the vectorizer peeling cost model
> the number of store buffers when deciding whether to peel stores for alignment
> for example.
Yea.  I'd much rather see a costing model of some kind rather than just
calling into a backend hook to disable the transformation in some cases.

Jeff


RE: 0002-Part-2.-Document-finstrument-control-flow-and-notrack attribute

2017-09-15 Thread Tsimbalist, Igor V
> -Original Message-
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Friday, August 25, 2017 10:59 PM
> To: Tsimbalist, Igor V ; 'gcc-
> patc...@gcc.gnu.org' 
> Subject: Re: 0002-Part-2.-Document-finstrument-control-flow-and-notrack
> attribute
> 
> On 08/01/2017 02:56 AM, Tsimbalist, Igor V wrote:
> > Part#2. Document -finstrument-control-flow and notrack attribute.
> >
> >
> > 0002-Part-2.-Document-finstrument-control-flow-and-notrac.patch
> >
> >
> > From c3e45c80731672e74d638f787e80ba975279b9b9 Mon Sep 17 00:00:00
> 2001
> > From: Igor Tsimbalist 
> > Date: Mon, 3 Jul 2017 17:12:49 +0300
> > Subject: [PATCH 2/9] Part#2. Document -finstrument-control-flow and
> > notrack  attribute.
> >
> > gcc/
> > * doc/extend.texi: Add 'notrack' documentation.
> > * doc/invoke.texi: Add -finstrument-control-flow documentation.
> > * doc/rtl.texi: Add REG_CALL_NOTRACK documenation.
> > ---
> >  gcc/doc/extend.texi | 52
> > 
> >  gcc/doc/invoke.texi | 22 ++
> >  gcc/doc/rtl.texi| 15 +++
> >  3 files changed, 89 insertions(+)
> >
> > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index
> > 6934b4c..80de8a7 100644
> > --- a/gcc/doc/extend.texi
> > +++ b/gcc/doc/extend.texi
> > @@ -5632,6 +5632,58 @@ Specify which floating-point unit to use.  You
> > must specify the  @code{target("fpmath=sse,387")} option as
> > @code{target("fpmath=sse+387")} because the comma would separate
> > different options.
> > +
> > +@item notrack
> > +@cindex @code{notrack} function attribute The @code{notrack}
> > +attribute on a function is used to inform the compiler that the
> > +function's prolog should not be instrumented when compiled with the
> > +@option{-finstrument-control-flow} option.  The compiler assumes that
> > +the function's address is a valid target for a control-flow transfer.
> Is the default to instrument everything when -finstrument-control-flow is
> enabled?  Or can we avoid instrumentation on a function that never has its
> address taken (ie, it is only called via a call instruction?)
The instrumentation is on by default, but on all platforms except x86 it
does nothing, as the implementation is not supported there. For x86 the
implementation is lightweight and just increases code size a bit due to
the 'endbranch' instruction.

Given a function decl, is there information already available about
whether the function's address was taken? I plan to do what you suggested
later as an optimization, especially for global functions, where IPA is
required.

> > +
> > +The @code{notrack} attribute on a type of pointer to function is used
> > +to inform the compiler that a call through the pointer should not be
> > +instrumented when compiled with the
> > +@option{-finstrument-control-flow} option.  The compiler assumes that
> > +the function's address from the pointer is a valid target for a
> > +control-flow transfer.  A direct function call through a function
> > +name is assumed as a save call thus direct calls will not be
> > +instrumented by the compiler.
> s/save/safe/
> 
> FWIW, I think putting the attribute into in the type system is a good thing 
> :-)
> 
> > +
> > +The @code{notrack} attribute is applied to an object's type.  The
> > +@code{notrack} attribute is transferred to a call instruction at the
> > +GIMPLE and RTL translation phases.  The attribute is not propagated
> > +through assignment, store and load.
> > +
> > +@smallexample
> > +@{
> > +void (*foo)(void) __attribute__(notrack); void (*foo1)(void)
> > +__attribute__(notrack); void (*foo2)(void);
> > +
> > +int
> > +foo (void) /* The function's address is not tracked.  */
> > +
> > +  /* This call site is not tracked for
> > + control-flow instrumentation.  */  (*foo1)();
> > +  foo1 = foo2;
> > +  /* This call site is still not tracked for
> > + control-flow instrumentation.  */  (*foo1)();
> > +
> > +  /* This call site is tracked for
> > + control-flow instrumentation.  */  (*foo2)();
> > +  foo2 = foo1;
> > +  /* This call site is still tracked for
> > + control-flow instrumentation.  */  (*foo2)();
> > +
> > +  return 0;
> > +@}
> > +@end smallexample
> Given the notrack attribute is part of the type system, could we issue a
> warning on the foo1 = foo2 assignment since we're discarding tracking that's
> implicit on foo2?
Fixed. For the code above messages are issued
w.c: In function 'foo':
w.c:22:8: warning: nocf_check attribute mismatch for assignment [-Wattributes]
   foo1 = foo2;
^
w.c:31:8: warning: nocf_check attribute mismatch for assignment [-Wattributes]
   foo2 = foo1;
^

> > +
> >  @end table
> >
> >  On the x86, the inliner does not inline a diff --git
> > a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 5ae9dc4..ff2ce92
> > 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -459,6 +459,7 @@ 

Re: Transform (x / y) != 0 to x >=y and (x / y) == 0 to x < y if x, y are unsigned

2017-09-15 Thread Jeff Law
On 09/15/2017 07:09 AM, Marc Glisse wrote:
> On Fri, 15 Sep 2017, Prathamesh Kulkarni wrote:
> 
> +/* (X / Y) == 0 -> X < Y if X, Y are unsigned.  */
> +(simplify
> +  (eq (trunc_div @0 @1) integer_zerop)
> +  (if (TYPE_UNSIGNED (TREE_TYPE(@0)) && TYPE_UNSIGNED (TREE_TYPE (@1)))
> +(lt @0 @1)))
> +
> +/* (X / Y) != 0 -> X >= Y, if X, Y are unsigned.  */
> +(simplify
> +  (ne (trunc_div @0 @1) integer_zerop)
> +  (if (TYPE_UNSIGNED (TREE_TYPE(@0)) && TYPE_UNSIGNED (TREE_TYPE (@1)))
> +(ge @0 @1)))
> +
> 
> Hello,
> 
> you can merge the 2 transforms using "for". Also, no need to test the
> type of @1 since you are already testing @0.
Right.

> 
> - do we want a single_use restriction on the result of the division?
I think so.  If x/y is a common subexpression, then ideally we'd compute
it once.

> - do we also want to handle (x>>4)==0?
I think so, but that can be a follow-up IMHO.


> - do we also want a special case when X is 1 that produces Y==1, as
> asked in a recent PR?
Seems like a reasonable follow-up as well.

The other follow-up to consider is detecting these cases in VRP to
produce suitable ASSERT_EXPRs and ranges.

> - once in a while, someone mentions that eq, on vectors, can either do
> elementwise comparison and return a vector, or return a single boolean,
> which would fail here. However, I don't remember ever seeing an example.
We could always restrict to the integral types.  Probably wise to
explicitly do that anyway.

jeff


[PATCH, rs6000] gimple folding vector load test variant

2017-09-15 Thread Will Schmidt
Hi, 
  This is a test created during investigation of the feedback on
the rs6000 gimple vector folding code, regarding the handling of
arg1_type.  Inspired by feedback from Richard and Bill.

This was useful to illustrate the issue to me.  Whether this is a
valid test for the testsuite I'll defer to the judgement of the
maintainers.. :-)

OK for trunk? 

[gcc/testsuite]

2017-09-15  Will Schmidt  

* gcc.target/powerpc/fold-vec-ld-misc.c: New.



diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-ld-misc.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-ld-misc.c
new file mode 100644
index 000..01069f6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-ld-misc.c
@@ -0,0 +1,54 @@
+/* Verify that overloaded built-ins for vec_ld with 
+   structure pointer / double inputs produce the right code.  */
+
+/* This test is to ensure that when a cast is associated with arg1 on a
+   call to vec_ld (arg0, arg1), that the arg1 type is properly handled
+   through the gimple folding code.
+   We want something like this:
+   D.2736 = MEM[(voidD.44 *)D.2739];
+   We specifically do not want 'struct S' showing up:
+   D.3212 = MEM[(struct S *)D.3215];
+*/
+
+/* { dg-do compile { target lp64 } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-mvsx -O2 -mpower8-vector -fdump-tree-gimple" } */
+
+#include 
+#include 
+
+struct S {
+  vector int *i1,*i2;
+  vector long long *ll1;
+  vector double *vd1;
+  vector double *vd2;
+  vector double *vd3;
+  vector double *vd4;
+};
+
+vector double
+testld_struct1 (long long ll1, struct S *p)
+{
+  return __builtin_altivec_lvx_v2df (ll1, (double *)p);
+}
+
+vector double
+testld_struct1b (long long ll1, struct S *p)
+{
+  return vec_ld (ll1, (vector double *)p);
+}
+
+vector double
+testld_struct2 (struct S *p)
+{
+  return vec_ld (16, (vector double *)p);
+}
+
+vector double
+testld_struct3 (struct S *p)
+{
+  return vec_ld (16, (vector double *)p->vd2);
+}
+
+// We do not want the "struct S" reference to show up.
+/* { dg-final { scan-tree-dump-times "MEM\[\(struct S *\)D.\[0-9\]+\]" 0 
"gimple" } } */




[pushed] Fix compile time error when using ansidecl.h with an old version of GCC.

2017-09-15 Thread Pedro Alves
Hi guys,

I was looking at merging libiberty from gcc to binutils-gdb,
and noticed this one patch that is in binutils-gdb and not in gcc,
since last July.

I think the patch is borderline obvious (it's arguable whether
to define OVERRIDE/FINAL for C), but in the interest of re-syncing
the trees, I'm pushing the patch to gcc as is.

Thanks,
Pedro Alves
>From 47ba729a29c6fa2283835d95d2ab5695d8c5d732 Mon Sep 17 00:00:00 2001
From: Nick Clifton 
Date: Mon, 31 Jul 2017 15:08:32 +0100
Subject: [PATCH] Fix compile time error when using ansidecl.h with an old
 version of GCC.

	Binutils PR 21850
	* ansidecl.h (OVERRIDE): Protect check of __cplusplus value with
	#ifdef __cplusplus.
---
 include/ChangeLog  |  6 ++
 include/ansidecl.h | 30 ++
 2 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/include/ChangeLog b/include/ChangeLog
index c7ce259..4703588 100644
--- a/include/ChangeLog
+++ b/include/ChangeLog
@@ -8,6 +8,12 @@
 	* simple-object.h (simple_object_copy_lto_debug_sections): New
 	function.
 
+2017-07-31  Nick Clifton  
+
+	Binutils PR 21850
+	* ansidecl.h (OVERRIDE): Protect check of __cplusplus value with
+	#ifdef __cplusplus.
+
 2017-07-02  Jan Kratochvil  
 
 	* dwarf2.def (DW_IDX_compile_unit, DW_IDX_type_unit, DW_IDX_die_offset)
diff --git a/include/ansidecl.h b/include/ansidecl.h
index f6e1761..ab3b895 100644
--- a/include/ansidecl.h
+++ b/include/ansidecl.h
@@ -334,22 +334,28 @@ So instead we use the macro below and test it against specific values.  */
For gcc, use "-std=c++11" to enable C++11 support; gcc 6 onwards enables
this by default (actually GNU++14).  */
 
-#if __cplusplus >= 201103
-/* C++11 claims to be available: use it.  final/override were only
-   implemented in 4.7, though.  */
-# if GCC_VERSION < 4007
+#if defined __cplusplus
+# if __cplusplus >= 201103
+   /* C++11 claims to be available: use it.  Final/override were only
+  implemented in 4.7, though.  */
+#  if GCC_VERSION < 4007
+#   define OVERRIDE
+#   define FINAL
+#  else
+#   define OVERRIDE override
+#   define FINAL final
+#  endif
+# elif GCC_VERSION >= 4007
+   /* G++ 4.7 supports __final in C++98.  */
 #  define OVERRIDE
-#  define FINAL
+#  define FINAL __final
 # else
-#  define OVERRIDE override
-#  define FINAL final
+   /* No C++11 support; leave the macros empty.  */
+#  define OVERRIDE
+#  define FINAL
 # endif
-#elif GCC_VERSION >= 4007
-/* G++ 4.7 supports __final in C++98.  */
-# define OVERRIDE
-# define FINAL __final
 #else
-/* No C++11 support; leave the macros empty: */
+  /* No C++11 support; leave the macros empty.  */
 # define OVERRIDE
 # define FINAL
 #endif
-- 
2.5.5



[PATCH, rs6000] [v3] Folding of vector loads in GIMPLE

2017-09-15 Thread Will Schmidt
Hi,

[PATCH, rs6000] [v3] Folding of vector loads in GIMPLE

Folding of vector loads in GIMPLE.

Add code to handle gimple folding for the vec_ld builtins.
Remove the now obsoleted folding code for vec_ld from rs6000-c.c. Surrounding
comments have been adjusted slightly so they continue to read OK for the
existing vec_st code.

The resulting code is specifically verified by the powerpc/fold-vec-ld-*.c
tests which are already in-tree.

For V2 of this patch, I've removed the chunk of code that prohibited the
gimple fold from occurring in BE environments.   This had fixed an issue
for me earlier during my development of the code, and turns out this was
not necessary.  (this introduced a failure in LE environment, so V3...)

For V3 of this patch:
 I've added a reworked statement that prohibits the folding of a vector
load when altivec=be is specified in an LE environment.
 Adjusted the arg1_type definition to use ptr_type_node per feedback and
discussions and experimentation with generated code.

Regtest to be run on power6 and newer.

OK for trunk?  (assuming successful completion of regtest).

Thanks,
-Will

[gcc]

2017-09-15  Will Schmidt  

* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling
for early folding of vector loads (ALTIVEC_BUILTIN_LVX_*).
* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_LD.


diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index d27f563..a49db97 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -6470,89 +6470,19 @@ altivec_resolve_overloaded_builtin (location_t loc, 
tree fndecl,
 convert (TREE_TYPE (stmt), arg0));
   stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
   return stmt;
 }
 
-  /* Expand vec_ld into an expression that masks the address and
- performs the load.  We need to expand this early to allow
+  /* Expand vec_st into an expression that masks the address and
+ performs the store.  We need to expand this early to allow
  the best aliasing, as by the time we get into RTL we no longer
  are able to honor __restrict__, for example.  We may want to
  consider this for all memory access built-ins.
 
  When -maltivec=be is specified, or the wrong number of arguments
  is provided, simply punt to existing built-in processing.  */
-  if (fcode == ALTIVEC_BUILTIN_VEC_LD
-  && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)
-  && nargs == 2)
-{
-  tree arg0 = (*arglist)[0];
-  tree arg1 = (*arglist)[1];
-
-  /* Strip qualifiers like "const" from the pointer arg.  */
-  tree arg1_type = TREE_TYPE (arg1);
-  if (TREE_CODE (arg1_type) == ARRAY_TYPE && c_dialect_cxx ())
-   {
- /* Force array-to-pointer decay for C++.  */
- arg1 = default_conversion (arg1);
- arg1_type = TREE_TYPE (arg1);
-   }
-  if (!POINTER_TYPE_P (arg1_type))
-   goto bad;
-
-  tree inner_type = TREE_TYPE (arg1_type);
-  if (TYPE_QUALS (TREE_TYPE (arg1_type)) != 0)
-   {
- arg1_type = build_pointer_type (build_qualified_type (inner_type,
-   0));
- arg1 = fold_convert (arg1_type, arg1);
-   }
-
-  /* Construct the masked address.  Let existing error handling take
-over if we don't have a constant offset.  */
-  arg0 = fold (arg0);
-
-  if (TREE_CODE (arg0) == INTEGER_CST)
-   {
- if (!ptrofftype_p (TREE_TYPE (arg0)))
-   arg0 = build1 (NOP_EXPR, sizetype, arg0);
-
- tree addr = fold_build2_loc (loc, POINTER_PLUS_EXPR, arg1_type,
-  arg1, arg0);
- tree aligned = fold_build2_loc (loc, BIT_AND_EXPR, arg1_type, addr,
- build_int_cst (arg1_type, -16));
-
- /* Find the built-in to get the return type so we can convert
-the result properly (or fall back to default handling if the
-arguments aren't compatible).  */
- for (desc = altivec_overloaded_builtins;
-  desc->code && desc->code != fcode; desc++)
-   continue;
-
- for (; desc->code == fcode; desc++)
-   if (rs6000_builtin_type_compatible (TREE_TYPE (arg0), desc->op1)
-   && (rs6000_builtin_type_compatible (TREE_TYPE (arg1),
-   desc->op2)))
- {
-   tree ret_type = rs6000_builtin_type (desc->ret_type);
-   if (TYPE_MODE (ret_type) == V2DImode)
- /* Type-based aliasing analysis thinks vector long
-and vector long long are different and will put them
-in distinct alias classes.  Force our return type
-to be a may-alias type to avoid this.  */
-   

Re: [PATCH] Add comments to struct cgraph_thunk_info

2017-09-15 Thread Pierre-Marie de Rodat

On 09/15/2017 12:54 PM, Pierre-Marie de Rodat wrote:
I’m not super confident about this though, so I’ll resubmit a patch 
without the reordering: I’ve added more comments anyway as I’ve learned 
more about this since yesterday. ;-)


Here it is!

--
Pierre-Marie de Rodat
>From 601dd0e949f4af456a11036918da9dbadbb3aa0c Mon Sep 17 00:00:00 2001
From: Pierre-Marie de Rodat 
Date: Wed, 13 Sep 2017 16:21:04 +0200
Subject: [PATCH] Add comments to struct cgraph_thunk_info

This commit adds comments to fields in the cgraph_thunk_info structure
declaration from cgraph.h.  They will hopefully answer questions that
people like myself can ask while discovering the thunk machinery.  I
also made an assertion stricter in cgraph_node::create_thunk.

Bootstrapped and regtested on x86_64-linux.

gcc/

	* cgraph.h (cgraph_thunk_info): Add comments.
	* cgraph.c (cgraph_node::create_thunk): Adjust comment, make
	assert for VIRTUAL_* arguments stricter.
---
 gcc/cgraph.c | 14 +-
 gcc/cgraph.h | 39 +++
 2 files changed, 44 insertions(+), 9 deletions(-)

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 69aa6c5bce2..8bffdec8fb7 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -603,7 +603,7 @@ cgraph_node::create_same_body_alias (tree alias, tree decl)
 
 /* Add thunk alias into callgraph.  The alias declaration is ALIAS and it
aliases DECL with an adjustments made into the first parameter.
-   See comments in thunk_adjust for detail on the parameters.  */
+   See comments in struct cgraph_thunk_info for detail on the parameters.  */
 
 cgraph_node *
 cgraph_node::create_thunk (tree alias, tree, bool this_adjusting,
@@ -619,13 +619,17 @@ cgraph_node::create_thunk (tree alias, tree, bool this_adjusting,
 node->reset ();
   else
 node = cgraph_node::create (alias);
-  gcc_checking_assert (!virtual_offset
-		   || wi::eq_p (virtual_offset, virtual_value));
+
+  /* Make sure that VIRTUAL_OFFSET is in sync with VIRTUAL_VALUE.  */
+  gcc_checking_assert (virtual_offset
+		   ? wi::eq_p (virtual_offset, virtual_value)
+		   : virtual_value == 0);
+
   node->thunk.fixed_offset = fixed_offset;
-  node->thunk.this_adjusting = this_adjusting;
   node->thunk.virtual_value = virtual_value;
-  node->thunk.virtual_offset_p = virtual_offset != NULL;
   node->thunk.alias = real_alias;
+  node->thunk.this_adjusting = this_adjusting;
+  node->thunk.virtual_offset_p = virtual_offset != NULL;
   node->thunk.thunk_p = true;
   node->definition = true;
 
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 57cdaa45681..c668b37ef82 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -629,17 +629,48 @@ extern const char * const cgraph_availability_names[];
 extern const char * const ld_plugin_symbol_resolution_names[];
 extern const char * const tls_model_names[];
 
-/* Information about thunk, used only for same body aliases.  */
+/* Sub-structure of cgraph_node.  Holds information about a thunk, used only for
+   same body aliases.
+
+   Thunks are basically wrappers around methods which are introduced in case
+   of multiple inheritance in order to adjust the value of the "this" pointer
+   or of the returned value.
+
+   In the case of this-adjusting thunks, each back-end can override the
+   can_output_mi_thunk/output_mi_thunk target hooks to generate a minimal thunk
+   (with a tail call for instance) directly as assembly.  For the default hook
+   or for the case where the can_output_mi_thunk hooks return false, the thunk
+   is gimplified and lowered using the regular machinery.  */
 
 struct GTY(()) cgraph_thunk_info {
-  /* Information about the thunk.  */
+  /* Offset used to adjust "this".  */
   HOST_WIDE_INT fixed_offset;
+
+  /* Offset in the virtual table to get the offset to adjust "this".  Valid iff
+ VIRTUAL_OFFSET_P is true.  */
   HOST_WIDE_INT virtual_value;
+
+  /* Thunk target, i.e. the method that this thunk wraps.  Depending on the
+ TARGET_USE_LOCAL_THUNK_ALIAS_P macro, this may have to be a new alias.  */
   tree alias;
+
+  /* Nonzero for a "this" adjusting thunk and zero for a result adjusting
+ thunk.  */
   bool this_adjusting;
+
+  /* If true, this thunk is what we call a virtual thunk.  In this case:
+ * for this-adjusting thunks, after the FIXED_OFFSET based adjustment is
+   done, add to the result the offset found in the vtable at:
+	 vptr + VIRTUAL_VALUE
+	 * for result-adjusting thunks, the FIXED_OFFSET adjustment is done after
+   the virtual one.  */
   bool virtual_offset_p;
+
+  /* ??? True for special kind of thunks, seems related to instrumentation.  */
   bool add_pointer_bounds_args;
-  /* Set to true when alias node is thunk.  */
+
+  /* Set to true when the alias node (the cgraph_node to which this struct
+ belongs) is a thunk.  Access to any other fields is invalid if this is
+ false.  */
   bool thunk_p;
 };
 
@@ -983,7 +1014,7 @@ public:
 
   /* Add thunk alias into callgraph.  The alias declaration is 

Re: Backports for GCC 5 branch

2017-09-15 Thread Martin Liška
One more also for GCC 5 branch.

Martin
From 6853238cc8103fefb8b8acc8f56d444860495714 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 11 Aug 2017 10:01:13 +
Subject: [PATCH] Backport r251049

gcc/ChangeLog:

2017-08-11  Martin Liska  

	PR tree-opt/79987
	* tree-chkp.c (chkp_get_bounds_for_decl_addr): Do not instrument
	variables of void type.

gcc/testsuite/ChangeLog:

2017-08-11  Martin Liska  

	PR tree-opt/79987
	* gcc.target/i386/mpx/pr79987.c: New test.
---
 gcc/testsuite/gcc.target/i386/mpx/pr79987.c | 5 +
 gcc/tree-chkp.c | 3 +++
 2 files changed, 8 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/mpx/pr79987.c

diff --git a/gcc/testsuite/gcc.target/i386/mpx/pr79987.c b/gcc/testsuite/gcc.target/i386/mpx/pr79987.c
new file mode 100644
index 000..b3ebda95694
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/mpx/pr79987.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-fcheck-pointer-bounds -mmpx" } */
+
+extern void foo;
+void *bar = &foo; /* { dg-warning "taking address of expression of type .void." } */
diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c
index 2901e8b354d..28dac22add6 100644
--- a/gcc/tree-chkp.c
+++ b/gcc/tree-chkp.c
@@ -3144,6 +3144,9 @@ chkp_get_bounds_for_decl_addr (tree decl)
   && !flag_chkp_incomplete_type)
   return chkp_get_zero_bounds ();
 
+  if (VOID_TYPE_P (TREE_TYPE (decl)))
+return chkp_get_zero_bounds ();
+
   if (flag_chkp_use_static_bounds
   && TREE_CODE (decl) == VAR_DECL
   && (TREE_STATIC (decl)
-- 
2.14.1



Re: Backports for GCC 6 branch

2017-09-15 Thread Martin Liška
One more.

Martin
From 6853238cc8103fefb8b8acc8f56d444860495714 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 11 Aug 2017 10:01:13 +
Subject: [PATCH] Backport r251049

gcc/ChangeLog:

2017-08-11  Martin Liska  

	PR tree-opt/79987
	* tree-chkp.c (chkp_get_bounds_for_decl_addr): Do not instrument
	variables of void type.

gcc/testsuite/ChangeLog:

2017-08-11  Martin Liska  

	PR tree-opt/79987
	* gcc.target/i386/mpx/pr79987.c: New test.
---
 gcc/testsuite/gcc.target/i386/mpx/pr79987.c | 5 +
 gcc/tree-chkp.c | 3 +++
 2 files changed, 8 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/mpx/pr79987.c

diff --git a/gcc/testsuite/gcc.target/i386/mpx/pr79987.c b/gcc/testsuite/gcc.target/i386/mpx/pr79987.c
new file mode 100644
index 000..b3ebda95694
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/mpx/pr79987.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-fcheck-pointer-bounds -mmpx" } */
+
+extern void foo;
+void *bar = &foo; /* { dg-warning "taking address of expression of type .void." } */
diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c
index 2901e8b354d..28dac22add6 100644
--- a/gcc/tree-chkp.c
+++ b/gcc/tree-chkp.c
@@ -3144,6 +3144,9 @@ chkp_get_bounds_for_decl_addr (tree decl)
   && !flag_chkp_incomplete_type)
   return chkp_get_zero_bounds ();
 
+  if (VOID_TYPE_P (TREE_TYPE (decl)))
+return chkp_get_zero_bounds ();
+
   if (flag_chkp_use_static_bounds
   && TREE_CODE (decl) == VAR_DECL
   && (TREE_STATIC (decl)
-- 
2.14.1



Re: [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE

2017-09-15 Thread Bill Schmidt
On Sep 15, 2017, at 4:13 AM, Richard Biener  wrote:
> 
> On Thu, Sep 14, 2017 at 4:38 PM, Bill Schmidt
>  wrote:
>> On Sep 14, 2017, at 5:15 AM, Richard Biener  
>> wrote:
>>> 
>>> On Wed, Sep 13, 2017 at 10:14 PM, Bill Schmidt
>>>  wrote:
 On Sep 13, 2017, at 10:40 AM, Bill Schmidt  
 wrote:
> 
> On Sep 13, 2017, at 7:23 AM, Richard Biener  
> wrote:
>> 
>> On Tue, Sep 12, 2017 at 11:08 PM, Will Schmidt
>>  wrote:
>>> Hi,
>>> 
>>> [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE
>>> 
>>> Folding of vector loads in GIMPLE.
>>> 
>>> Add code to handle gimple folding for the vec_ld builtins.
>>> Remove the now obsoleted folding code for vec_ld from rs6000-c.c. 
>>> Surrounding
>>> comments have been adjusted slightly so they continue to read OK for the
>>> existing vec_st code.
>>> 
>>> The resulting code is specifically verified by the 
>>> powerpc/fold-vec-ld-*.c
>>> tests which have been posted separately.
>>> 
>>> For V2 of this patch, I've removed the chunk of code that prohibited the
>>> gimple fold from occurring in BE environments.   This had fixed an issue
>>> for me earlier during my development of the code, and turns out this was
>>> not necessary.  I've sniff-tested after removing that check and it looks
>>> OK.
>>> 
 + /* Limit folding of loads to LE targets.  */
 +  if (BYTES_BIG_ENDIAN || VECTOR_ELT_ORDER_BIG)
 +return false;
>>> 
>>> I've restarted a regression test on this updated version.
>>> 
>>> OK for trunk (assuming successful regression test completion)  ?
>>> 
>>> Thanks,
>>> -Will
>>> 
>>> [gcc]
>>> 
>>> 2017-09-12  Will Schmidt  
>>> 
>>> * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling
>>> for early folding of vector loads (ALTIVEC_BUILTIN_LVX_*).
>>> * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
>>> Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_LD.
>>> 
>>> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
>>> index fbab0a2..bb8a77d 100644
>>> --- a/gcc/config/rs6000/rs6000-c.c
>>> +++ b/gcc/config/rs6000/rs6000-c.c
>>> @@ -6470,92 +6470,19 @@ altivec_resolve_overloaded_builtin (location_t 
>>> loc, tree fndecl,
>>>  convert (TREE_TYPE (stmt), arg0));
>>>stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
>>>return stmt;
>>>  }
>>> 
>>> -  /* Expand vec_ld into an expression that masks the address and
>>> - performs the load.  We need to expand this early to allow
>>> +  /* Expand vec_st into an expression that masks the address and
>>> + performs the store.  We need to expand this early to allow
>>>   the best aliasing, as by the time we get into RTL we no longer
>>>   are able to honor __restrict__, for example.  We may want to
>>>   consider this for all memory access built-ins.
>>> 
>>>   When -maltivec=be is specified, or the wrong number of arguments
>>>   is provided, simply punt to existing built-in processing.  */
>>> -  if (fcode == ALTIVEC_BUILTIN_VEC_LD
>>> -  && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)
>>> -  && nargs == 2)
>>> -{
>>> -  tree arg0 = (*arglist)[0];
>>> -  tree arg1 = (*arglist)[1];
>>> -
>>> -  /* Strip qualifiers like "const" from the pointer arg.  */
>>> -  tree arg1_type = TREE_TYPE (arg1);
>>> -  if (!POINTER_TYPE_P (arg1_type) && TREE_CODE (arg1_type) != 
>>> ARRAY_TYPE)
>>> -   goto bad;
>>> -
>>> -  tree inner_type = TREE_TYPE (arg1_type);
>>> -  if (TYPE_QUALS (TREE_TYPE (arg1_type)) != 0)
>>> -   {
>>> - arg1_type = build_pointer_type (build_qualified_type 
>>> (inner_type,
>>> -   0));
>>> - arg1 = fold_convert (arg1_type, arg1);
>>> -   }
>>> -
>>> -  /* Construct the masked address.  Let existing error handling 
>>> take
>>> -over if we don't have a constant offset.  */
>>> -  arg0 = fold (arg0);
>>> -
>>> -  if (TREE_CODE (arg0) == INTEGER_CST)
>>> -   {
>>> - if (!ptrofftype_p (TREE_TYPE (arg0)))
>>> -   arg0 = build1 (NOP_EXPR, sizetype, arg0);
>>> -
>>> - tree arg1_type = TREE_TYPE (arg1);
>>> - if (TREE_CODE (arg1_type) == ARRAY_TYPE)
>>> -   {
>>> - arg1_type = TYPE_POINTER_TO (TREE_TYPE (arg1_type));
>>> - tree const0 = 

Re: [ARM] PR 67591 ARM v8 Thumb IT blocks are deprecated

2017-09-15 Thread Christophe Lyon
On 13 September 2017 at 18:33, Kyrill  Tkachov
 wrote:
> Hi Christophe,
>
>
> On 13/09/17 16:23, Christophe Lyon wrote:
>>
>> Hi,
>>
>> On 12 October 2016 at 11:22, Christophe Lyon 
>> wrote:
>>>
>>> On 12 October 2016 at 11:14, Kyrill Tkachov 
>>> wrote:

 On 12/10/16 09:59, Christophe Lyon wrote:
>
> Hi Kyrill,
>
> On 7 October 2016 at 17:00, Kyrill Tkachov
> 
> wrote:
>>
>> Hi Christophe,
>>
>>
>> On 07/09/16 21:05, Christophe Lyon wrote:
>>>
>>> Hi,
>>>
>>> The attached patch is a first part to solve PR 67591: it removes
>>> several occurrences of "IT blocks containing 32-bit Thumb
>>> instructions are deprecated in ARMv8" messages in the
>>> gcc/g++/libstdc++/fortran testsuites.
>>>
>>> It does not remove them all yet. This patch only modifies the
>>> *cmp_and, *cmp_ior, *ior_scc_scc, *ior_scc_scc_cmp,
>>> *and_scc_scc and *and_scc_scc_cmp patterns.
>>> Additional work is required in sub_shiftsi etc, at least.
>>> I've started looking at these, but I decided I could already
>>> post this self-contained patch to check if this implementation
>>> is OK.
>>>
>>> Regarding *cmp_and and *cmp_ior patterns, the addition of the
>>> enabled_for_depr_it attribute is aggressive in the sense that it
>>> keeps
>>> only the alternatives with 'l' and 'Py' constraints, while in some
>>> cases the constraints could be relaxed. Indeed, these 2 patterns can
>>> swap their input comparisons, meaning that any of them can be emitted
>>> in the IT-block, and is thus subject to the ARMv8 deprecation.
>>> The generated code is possibly suboptimal in the cases where the
>>> operands are not swapped, since 'r' could be used.
>>>
>>> Cross-tested on arm-none-linux-gnueabihf with -mthumb/-march=armv8-a
>>> and --with-cpu=cortex-a57 --with-mode=thumb, showing only
>>> improvements:
>>>
>>>
>>>
>>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/239850-depr-it-4/report-build-info.html
>>>
>>> Bootstrapped OK on armv8l HW.
>>>
>>> Is this OK?
>>>
>>> Thanks,
>>>
>>> Christophe
>>
>>
>>(define_insn_and_split "*ior_scc_scc"
>> -  [(set (match_operand:SI 0 "s_register_operand" "=Ts")
>> +  [(set (match_operand:SI 0 "s_register_operand" "=Ts,Ts")
>>   (ior:SI (match_operator:SI 3 "arm_comparison_operator"
>> -[(match_operand:SI 1 "s_register_operand" "r")
>> - (match_operand:SI 2 "arm_add_operand" "rIL")])
>> +[(match_operand:SI 1 "s_register_operand" "r,l")
>> + (match_operand:SI 2 "arm_add_operand" "rIL,lPy")])
>>   (match_operator:SI 6 "arm_comparison_operator"
>> -[(match_operand:SI 4 "s_register_operand" "r")
>> - (match_operand:SI 5 "arm_add_operand" "rIL")])))
>> +[(match_operand:SI 4 "s_register_operand" "r,l")
>> + (match_operand:SI 5 "arm_add_operand" "rIL,lPy")])))
>>
>> Can you please put the more restrictive alternatives (lPy) first?
>> Same with the other patterns your patch touches.
>> Ok with that change if a rebased testing run is ok.
>> Sorry for the delay in reviewing.
>>
> OK, I will update my patch accordingly.
>
> However, when I discussed this with Ramana during the Cauldron,
> he requested benchmark results. So far, I was able to run spec2006
> on an APM machine, and I'm seeing performance changes in the
> range 11% improvement (465.tonto) to 7% degradation (433.milc).
>
> Would that be acceptable?


 Those sound like quite large swings.
>>>
>>> Indeed, but most are in the -1%-+1% range.
>>>
 Are you sure the machine was not running anything else at the time
 or playing tricks with frequency scaling?
>>>
>>> No, I had no such guarantee. I used this machine temporarily,
>>> first to check that bootstrap worked. I planed to use another
>>> board with an A57 "standard" microarch for proper
>>> benchmarking, but I'm not sure when I'll have access to it
>>> (wrt to e/o gcc stage1), that's why I reported these early
>>> figures.
>>>
 Did all iterations of SPEC show a consistent difference?

 If the changes are consistent, could you have a look at the codegen
 to see if there are any clues to the differences?
>>>
>>> I will update my patch according to your comment, re-run the bench
>>> and have a deeper look at the codegen differences.
>>>
>> I have finally been able to run benchmarks with my patch updated
>> according to your comment, on new machines where we have
>> better control of the environment (frequency, etc...).
>>
>> These machines use cortex-a57 CPUs 

Re: Transform (x / y) != 0 to x >=y and (x / y) == 0 to x < y if x, y are unsigned

2017-09-15 Thread Marc Glisse

On Fri, 15 Sep 2017, Prathamesh Kulkarni wrote:

+/* (X / Y) == 0 -> X < Y if X, Y are unsigned.  */
+(simplify
+  (eq (trunc_div @0 @1) integer_zerop)
+  (if (TYPE_UNSIGNED (TREE_TYPE(@0)) && TYPE_UNSIGNED (TREE_TYPE (@1)))
+(lt @0 @1)))
+
+/* (X / Y) != 0 -> X >= Y, if X, Y are unsigned.  */
+(simplify
+  (ne (trunc_div @0 @1) integer_zerop)
+  (if (TYPE_UNSIGNED (TREE_TYPE(@0)) && TYPE_UNSIGNED (TREE_TYPE (@1)))
+(ge @0 @1)))
+

Hello,

you can merge the 2 transforms using "for". Also, no need to test the type 
of @1 since you are already testing @0.


- do we want a single_use restriction on the result of the division?
- do we also want to handle (x>>4)==0?
- do we also want a special case when X is 1 that produces Y==1, as asked 
in a recent PR?
- once in a while, someone mentions that eq, on vectors, can either do 
elementwise comparison and return a vector, or return a single boolean, 
which would fail here. However, I don't remember ever seeing an example.


--
Marc Glisse


Re: [PATCH PR82163]Rewrite loop into lcssa form instantly

2017-09-15 Thread Bin.Cheng
On Fri, Sep 15, 2017 at 12:49 PM, Richard Biener
 wrote:
> On Thu, Sep 14, 2017 at 5:02 PM, Bin Cheng  wrote:
>> Hi,
>> Current pcom implementation rewrites into lcssa form after all loops are 
>> transformed; this is
>> not enough because unrolling of later loop checks lcssa form in function 
>> tree_transform_and_unroll_loop.
>> This simple patch rewrites loop into lcssa form if store-store chain is 
>> handled.  I think it doesn't
>> affect compilation time since rewrite_into_loop_closed_ssa_1 is only called 
>> for store-store chain
>> transformation and only the transformed loop is rewritten.
>
> Well, it may look like only the transformed loop is rewritten -- yes,
> it is, but rewrite_into_loop_closed_ssa
> calls update_ssa () which operates on the whole function.
I see.
>
> So I'd rather _not_ do this.
>
> Is there a real problem or is it just the overly aggressive checking
> done?  IMHO we should remove
In this case, it's the check itself.
> the checking or pass in a param whether to skip the checking.  Or even
> better, restrict the
> checking to those loops trans_form_and_unroll actually touches.
Yes, will see if we can check loops only related to trans_form_and_unroll.

Thanks,
bin
>
> Richard.
>
>> Bootstrap and test ongoing on x86_64.  is it OK if no failures?
>>
>> Thanks,
>> bin
>> 2017-09-14  Bin Cheng  
>>
>> PR tree-optimization/82163
>> * tree-predcom.c (tree_predictive_commoning_loop): Rewrite into
>> loop closed ssa instantly.  Return boolean true if loop is unrolled.
>> (tree_predictive_commoning): Return TODO_cleanup_cfg if loop is
>> unrolled.
>>
>> gcc/testsuite
>> 2017-09-14  Bin Cheng  
>>
>> PR tree-optimization/82163
>> * gcc.dg/tree-ssa/pr82163.c: New test.


Re: [PATCH] Add -std=c++2a

2017-09-15 Thread Pedro Alves
On 09/15/2017 01:53 PM, Jakub Jelinek wrote:
> On Mon, Aug 07, 2017 at 02:17:17PM +0100, Pedro Alves wrote:
>> I happened to skim this patch and notice a couple issues.
>> See below.
> 
> Note I've just posted updated patch based on this to gcc-patches.

Thanks ( FWIW :-) ).

>>> @@ -497,7 +499,10 @@ cpp_init_builtins (cpp_reader *pfile, int hosted)
>>>  
>>>if (CPP_OPTION (pfile, cplusplus))
>>>  {
>>> -  if (CPP_OPTION (pfile, lang) == CLK_CXX1Z
>>> +  if (CPP_OPTION (pfile, lang) == CLK_CXX2A
>>> + || CPP_OPTION (pfile, lang) == CLK_GNUCXX2A)
>>> +   _cpp_define_builtin (pfile, "__cplusplus 201707L");
>>
>> I think you wanted 202007L here.
> 
> The documentation states some unspecified value strictly greater than
> 201703L.  In the patch I've posted it is 201709L, because that is this
> month, 202007L would be just a wild guess.  People aren't supposed to
> rely on a particular value until C++2z is finalized, so just use
> (__cplusplus > 201703L) for features beyond C++17.
> 

I see, I had assumed 202007L was the intention because that's
what was in the ChangeLog entry, and because "2020":

Add support for C++2a.
* include/cpplib.h (c_lang): Add CXX2A and GNUCXX2A.
* init.c (lang_defaults): Add rows for CXX2A and GNUCXX2A.
(cpp_init_builtins): Set __cplusplus to 202007L for C++2x.
^^

Thanks,
Pedro Alves



Re: [PATCH] Add -std=c++2a

2017-09-15 Thread Jakub Jelinek
On Mon, Aug 07, 2017 at 02:17:17PM +0100, Pedro Alves wrote:
> I happened to skim this patch and notice a couple issues.
> See below.

Note I've just posted updated patch based on this to gcc-patches.

> > +/* Set the C++ 202a draft standard (without GNU extensions if ISO).  */
> > +static void
> > +set_std_cxx2a (int iso)
> > +{
> > +  cpp_set_lang (parse_in, iso ? CLK_CXX2A: CLK_GNUCXX2A);
> > +  flag_no_gnu_keywords = iso;
> > +  flag_no_nonansi_builtin = iso;
> > +  flag_iso = iso;
> > +  /* C++1z includes the C99 standard library.  */
> > +  flag_isoc94 = 1;
> > +  flag_isoc99 = 1;
> > +  flag_isoc11 = 1;
> > +  cxx_dialect = cxx2a;
> > +  lang_hooks.name = "GNU C++17"; /* Pretend C++17 until standardization.  
> > */
> 
> Did you mean to write C++20 here?

No, that matches what we did with C++1z until the patch I've just posted.

> > --- a/libcpp/include/cpplib.h
> > +++ b/libcpp/include/cpplib.h
> > @@ -171,7 +171,8 @@ enum cpp_ttype
> >  enum c_lang {CLK_GNUC89 = 0, CLK_GNUC99, CLK_GNUC11,
> >  CLK_STDC89, CLK_STDC94, CLK_STDC99, CLK_STDC11,
> >  CLK_GNUCXX, CLK_CXX98, CLK_GNUCXX11, CLK_CXX11,
> > -CLK_GNUCXX14, CLK_CXX14, CLK_GNUCXX1Z, CLK_CXX1Z, CLK_ASM};
> > +CLK_GNUCXX14, CLK_CXX14, CLK_GNUCXX1Z, CLK_CXX1Z,
> > + CLK_GNUCXX2A, CLK_CXX2A, CLK_ASM};
> 
> Tabs vs spaces?

Fixed.
> 
> > @@ -497,7 +499,10 @@ cpp_init_builtins (cpp_reader *pfile, int hosted)
> >  
> >if (CPP_OPTION (pfile, cplusplus))
> >  {
> > -  if (CPP_OPTION (pfile, lang) == CLK_CXX1Z
> > +  if (CPP_OPTION (pfile, lang) == CLK_CXX2A
> > + || CPP_OPTION (pfile, lang) == CLK_GNUCXX2A)
> > +   _cpp_define_builtin (pfile, "__cplusplus 201707L");
> 
> I think you wanted 202007L here.

The documentation states some unspecified value strictly greater than
201703L.  In the patch I've posted it is 201709L, because that is this
month, 202007L would be just a wild guess.  People aren't supposed to
rely on a particular value until C++2z is finalized, so just use
(__cplusplus > 201703L) for features beyond C++17.

Jakub


[C++ PATCH] Incremental patch for -std=gnu++2a and -std=c++2a

2017-09-15 Thread Jakub Jelinek
Hi!

On Thu, Sep 14, 2017 at 11:28:09PM +0200, Jakub Jelinek wrote:
> > I'd be tempted to say leave all this, and march 1z -> 2a for the _next_ 
> > standard.  2020 or so is a good first stab at the date.
> 
> I didn't want to add c++2a and gnu++2a in the same patch, it can be added
> incrementally and readd the above wording.  Unless somebody else is planning
> to do that, I can do that next.

Here is so far untested incremental patch on top of the 1z -> 17 patch,
mostly using Andrew's patch, but adjusted so that it applies and with
various additions and small tweaks.

2017-09-15  Andrew Sutton  
Jakub Jelinek  

Add support for -std=c++2a.
* c-common.h (cxx_dialect): Add cxx2a as a dialect.
* opt.c: Add options for -std=c++2a and -std=gnu++2a.
* c-opts.c (set_std_cxx2a): New.
(c_common_handle_option): Set options when -std=c++2a is enabled.
(c_common_post_options): Adjust comments.
(set_std_cxx14, set_std_cxx17): Likewise.

* doc/cpp.texi (__cplusplus): Document value for -std=c++2a
	or -std=gnu++2a.
* doc/invoke.texi: Document -std=c++2a and -std=gnu++2a.

* lib/target-supports.exp (check_effective_target_c++17): Return
1 also if check_effective_target_c++2a.
(check_effective_target_c++17_down): New.
(check_effective_target_c++2a_only): New.
(check_effective_target_c++2a): New.
* g++.dg/cpp2a/cplusplus.C: New.

* include/cpplib.h (c_lang): Add CXX2A and GNUCXX2A.
* init.c (lang_defaults): Add rows for CXX2A and GNUCXX2A.
(cpp_init_builtins): Set __cplusplus to 201709L for C++2a.

--- gcc/c-family/c-common.h.jj  2017-09-14 22:53:34.977313456 +0200
+++ gcc/c-family/c-common.h 2017-09-15 12:59:25.539053983 +0200
@@ -703,7 +703,9 @@ enum cxx_dialect {
   /* C++14 */
   cxx14,
   /* C++17 */
-  cxx17
+  cxx17,
+  /* C++2a (C++20?) */
+  cxx2a
 };
 
 /* The C++ dialect being used. C++98 is the default.  */
--- gcc/c-family/c-opts.c.jj2017-09-14 22:53:34.978313443 +0200
+++ gcc/c-family/c-opts.c   2017-09-15 14:28:21.287595277 +0200
@@ -111,6 +111,7 @@ static void set_std_cxx98 (int);
 static void set_std_cxx11 (int);
 static void set_std_cxx14 (int);
 static void set_std_cxx17 (int);
+static void set_std_cxx2a (int);
 static void set_std_c89 (int, int);
 static void set_std_c99 (int);
 static void set_std_c11 (int);
@@ -637,6 +638,12 @@ c_common_handle_option (size_t scode, co
set_std_cxx17 (code == OPT_std_c__17 /* ISO */);
   break;
 
+case OPT_std_c__2a:
+case OPT_std_gnu__2a:
+  if (!preprocessing_asm_p)
+   set_std_cxx2a (code == OPT_std_c__2a /* ISO */);
+  break;
+
 case OPT_std_c90:
 case OPT_std_iso9899_199409:
   if (!preprocessing_asm_p)
@@ -938,7 +945,7 @@ c_common_post_options (const char **pfil
warn_narrowing = 1;
 
   /* Unless -f{,no-}ext-numeric-literals has been used explicitly,
-for -std=c++{11,14,17} default to -fno-ext-numeric-literals.  */
+for -std=c++{11,14,17,2a} default to -fno-ext-numeric-literals.  */
   if (flag_iso && !global_options_set.x_flag_ext_numeric_literals)
cpp_opts->ext_numeric_literals = 0;
 }
@@ -1589,7 +1596,7 @@ set_std_cxx14 (int iso)
   flag_no_gnu_keywords = iso;
   flag_no_nonansi_builtin = iso;
   flag_iso = iso;
-  /* C++11 includes the C99 standard library.  */
+  /* C++14 includes the C99 standard library.  */
   flag_isoc94 = 1;
   flag_isoc99 = 1;
   cxx_dialect = cxx14;
@@ -1604,7 +1611,7 @@ set_std_cxx17 (int iso)
   flag_no_gnu_keywords = iso;
   flag_no_nonansi_builtin = iso;
   flag_iso = iso;
-  /* C++11 includes the C99 standard library.  */
+  /* C++17 includes the C11 standard library.  */
   flag_isoc94 = 1;
   flag_isoc99 = 1;
   flag_isoc11 = 1;
@@ -1612,6 +1619,22 @@ set_std_cxx17 (int iso)
   lang_hooks.name = "GNU C++17";
 }
 
+/* Set the C++ 202a draft standard (without GNU extensions if ISO).  */
+static void
+set_std_cxx2a (int iso)
+{
+  cpp_set_lang (parse_in, iso ? CLK_CXX2A: CLK_GNUCXX2A);
+  flag_no_gnu_keywords = iso;
+  flag_no_nonansi_builtin = iso;
+  flag_iso = iso;
+  /* C++17 includes the C11 standard library.  */
+  flag_isoc94 = 1;
+  flag_isoc99 = 1;
+  flag_isoc11 = 1;
+  cxx_dialect = cxx2a;
+  lang_hooks.name = "GNU C++17"; /* Pretend C++17 until standardization.  */
+}
+
 /* Args to -d specify what to dump.  Silently ignore
unrecognized options; they may be aimed at toplev.c.  */
 static void
--- gcc/c-family/c.opt.jj   2017-09-14 22:53:34.977313456 +0200
+++ gcc/c-family/c.opt  2017-09-15 12:59:25.542053945 +0200
@@ -1932,6 +1932,10 @@ std=c++17
 C++ ObjC++
 Conform to the ISO 2017 C++ standard.
 
+std=c++2a
+C++ ObjC++
+Conform to the ISO 2020(?) C++ draft standard (experimental and incomplete support).
+
 std=c11
 C ObjC
 Conform to the ISO 2011 C standard.
@@ -1990,6 +1994,10 @@ std=gnu++17
 C++ ObjC++
 

Re: [RFC] propagate malloc attribute in ipa-pure-const pass

2017-09-15 Thread Prathamesh Kulkarni
On 1 September 2017 at 08:09, Prathamesh Kulkarni
 wrote:
> On 17 August 2017 at 18:02, Prathamesh Kulkarni
>  wrote:
>> On 8 August 2017 at 09:50, Prathamesh Kulkarni
>>  wrote:
>>> On 31 July 2017 at 23:53, Prathamesh Kulkarni
>>>  wrote:
 On 23 May 2017 at 19:10, Prathamesh Kulkarni
  wrote:
> On 19 May 2017 at 19:02, Jan Hubicka  wrote:
>>>
>>> * LTO and memory management
>>> This is a general question about LTO and memory management.
>>> IIUC the following sequence takes place during normal LTO:
>>> LGEN: generate_summary, write_summary
>>> WPA: read_summary, execute ipa passes, write_opt_summary
>>>
>>> So I assumed it was OK in LGEN to allocate return_callees_map in
>>> generate_summary and free it in write_summary and during WPA, allocate
>>> return_callees_map in read_summary and free it after execute (since
>>> write_opt_summary does not require return_callees_map).
>>>
>>> However with fat LTO, it seems the sequence changes for LGEN with
>>> execute phase takes place after write_summary. However since
>>> return_callees_map is freed in pure_const_write_summary and
>>> propagate_malloc() accesses it in execute stage, it results in
>>> segmentation fault.
>>>
>>> To work around this, I am using the following hack in 
>>> pure_const_write_summary:
>>> // FIXME: Do not free if -ffat-lto-objects is enabled.
>>> if (!global_options.x_flag_fat_lto_objects)
>>>   free_return_callees_map ();
>>> Is there a better approach for handling this ?
>>
>> I think most passes just do not free summaries with -flto.  We probably 
>> want
>> to fix it to make it possible to compile multiple units i.e. from plugin 
>> by
>> adding release_summaries method...
>> So I would say it is OK to do the same as others do and leak it with 
>> -flto.
>>> diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
>>> index e457166ea39..724c26e03f6 100644
>>> --- a/gcc/ipa-pure-const.c
>>> +++ b/gcc/ipa-pure-const.c
>>> @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "tree-scalar-evolution.h"
>>>  #include "intl.h"
>>>  #include "opts.h"
>>> +#include "ssa.h"
>>>
>>>  /* Lattice values for const and pure functions.  Everything starts out
>>> being const, then may drop to pure and then neither depending on
>>> @@ -69,6 +70,15 @@ enum pure_const_state_e
>>>
>>>  const char *pure_const_names[3] = {"const", "pure", "neither"};
>>>
>>> +enum malloc_state_e
>>> +{
>>> +  PURE_CONST_MALLOC_TOP,
>>> +  PURE_CONST_MALLOC,
>>> +  PURE_CONST_MALLOC_BOTTOM
>>> +};
>>
>> It took me a while to work out what PURE_CONST means here :)
>> I would just call it something like STATE_MALLOC_TOP... or so.
>> ipa_pure_const is outdated name from the time pass was doing only
>> those two.
>>> @@ -109,6 +121,10 @@ typedef struct funct_state_d * funct_state;
>>>
>>>  static vec funct_state_vec;
>>>
>>> +/* A map from node to subset of callees. The subset contains those 
>>> callees
>>> + * whose return-value is returned by the node. */
>>> +static hash_map< cgraph_node *, vec* > 
>>> *return_callees_map;
>>> +
>>
>> Hehe, a special case of return jump function.  We ought to support those 
>> more generally.
>> How do you keep it up to date over callgraph changes?
>>> @@ -921,6 +1055,23 @@ end:
>>>if (TREE_NOTHROW (decl))
>>>  l->can_throw = false;
>>>
>>> +  if (ipa)
>>> +{
>>> +  vec v = vNULL;
>>> +  l->malloc_state = PURE_CONST_MALLOC_BOTTOM;
>>> +  if (DECL_IS_MALLOC (decl))
>>> + l->malloc_state = PURE_CONST_MALLOC;
>>> +  else if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), v))
>>> + {
>>> +   l->malloc_state = PURE_CONST_MALLOC_TOP;
>>> +   vec *callees_p = new vec (vNULL);
>>> +   for (unsigned i = 0; i < v.length (); ++i)
>>> + callees_p->safe_push (v[i]);
>>> +   return_callees_map->put (fn, callees_p);
>>> + }
>>> +  v.release ();
>>> +}
>>> +
>>
>> I would do non-ipa variant, too.  I think most attributes can be 
>> detected that way
>> as well.
>>
>> The patch generally makes sense to me.  It would be nice to make it 
>> easier to write such
>> a basic propagators across callgraph (perhaps adding a template doing 
>> the basic
>> propagation logic). Also I think you need to solve the problem with 
>> keeping your
>> summaries up to date across callgraph node removal and duplications.
> Thanks for the suggestions, I will try to 

Re: Backports for GCC 6 branch

2017-09-15 Thread Martin Liška
One more that I've just tested.

Martin
From 69bdfd4b8d845d2262139b4406cfd9f2d947f80d Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 28 Jun 2017 07:59:23 +
Subject: [PATCH] Backport r249728

gcc/ChangeLog:

2017-06-28  Martin Liska  

	PR sanitizer/81224
	* asan.c (instrument_derefs): Bail out inner references
	that are hard register variables.

gcc/testsuite/ChangeLog:

2017-06-28  Martin Liska  

	PR sanitizer/81224
	* gcc.dg/asan/pr81224.c: New test.
---
 gcc/asan.c  |  3 +++
 gcc/testsuite/gcc.dg/asan/pr81224.c | 11 +++
 2 files changed, 14 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/asan/pr81224.c

diff --git a/gcc/asan.c b/gcc/asan.c
index 9b35104dd43..b80df24acfd 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -1800,6 +1800,9 @@ instrument_derefs (gimple_stmt_iterator *iter, tree t,
   || bitsize != size_in_bytes * BITS_PER_UNIT)
 return;
 
+  if (TREE_CODE (inner) == VAR_DECL && DECL_HARD_REGISTER (inner))
+return;
+
   if (TREE_CODE (inner) == VAR_DECL
   && offset == NULL_TREE
   && bitpos >= 0
diff --git a/gcc/testsuite/gcc.dg/asan/pr81224.c b/gcc/testsuite/gcc.dg/asan/pr81224.c
new file mode 100644
index 000..def5cb69aec
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/asan/pr81224.c
@@ -0,0 +1,11 @@
+/* PR sanitizer/80659 */
+/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-additional-options "-msse2" } */
+
+int a;
+int
+b ()
+{
+  register __attribute__ ((__vector_size__ (4 * sizeof (int)))) int c asm ("xmm0");
+  return c[a];
+}
-- 
2.14.1



Re: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling

2017-09-15 Thread Richard Biener
On Fri, Sep 15, 2017 at 1:12 PM, Tsimbalist, Igor V
 wrote:
>> -Original Message-
>> From: Tsimbalist, Igor V
>> Sent: Tuesday, September 12, 2017 5:35 PM
>> To: 'Richard Biener' 
>> Cc: 'gcc-patches@gcc.gnu.org' ; Tsimbalist, Igor V
>> 
>> Subject: RE: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling
>>
>> > -Original Message-
>> > From: Tsimbalist, Igor V
>> > Sent: Friday, August 18, 2017 4:43 PM
>> > To: 'Richard Biener' 
>> > Cc: gcc-patches@gcc.gnu.org; Tsimbalist, Igor V
>> > 
>> > Subject: RE: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling
>> >
>> > > -Original Message-
>> > > From: Richard Biener [mailto:richard.guent...@gmail.com]
>> > > Sent: Friday, August 18, 2017 3:53 PM
>> > > To: Tsimbalist, Igor V 
>> > > Cc: gcc-patches@gcc.gnu.org
>> > > Subject: Re: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling
>> > >
>> > > On Fri, Aug 18, 2017 at 3:11 PM, Tsimbalist, Igor V
>> > >  wrote:
>> > > >> -Original Message-
>> > > >> From: Richard Biener [mailto:richard.guent...@gmail.com]
>> > > >> Sent: Tuesday, August 15, 2017 3:43 PM
>> > > >> To: Tsimbalist, Igor V 
>> > > >> Cc: gcc-patches@gcc.gnu.org
>> > > >> Subject: Re: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling
>> > > >>
>> > > >> On Tue, Aug 1, 2017 at 10:56 AM, Tsimbalist, Igor V
>> > > >>  wrote:
>> > > >> > Part#1. Add generic part for Intel CET enabling.
>> > > >> >
>> > > >> > The spec is available at
>> > > >> >
>> > > >> > https://software.intel.com/sites/default/files/managed/4d/2a/co
>> > > >> > nt ro l-f low-enforcement-technology-preview.pdf
>
> <..skipped..>
>
>> > > >> I think 'notrack' is somewhat unspecific as a name; what
>> > > >> prevented you from using 'nocet'?
>> > > >
>> > > > Actually it's specific. The HW will have a prefix with exactly
>> > > > this name and
>> > > the same meaning. And I think, what is more important, 'track/notrack'
>> > > gives better semantic for a user. CET is a name bound with Intel
>> > > specific technology.
>> > >
>> > > But 'tracking' something is quite unspecific.  Tracking for what?
>> > > 'no_verify_cf' (aka do not verify control flow) maybe?
>> >
>> > The name just has to suggest the right semantics. 'no_verify_cf' is
>> > good, let's use it unless a different name appears.
>> I have renamed all newly introduced function and macro names to use
>> 'noverify_cf'. But I still keep the attribute name as 'notrack'. 
>> Historically the
>> attribute name follows the public CET specification, which uses 'no-track
>> prefix' wording. Is it ok to keep that attribute name?
>
> Here is an updated proposal about option name and attribute name.
>
> The new option has values to let a user choose what control-flow 
> protection to activate.
>
> -fcf-protection=[full|branch|return|none]
>   branch - do control-flow protection for indirect jumps and calls
>   return - do control-flow protection for function returns
>   full - alias to specify both branch + return
>   none - turn off protection. This value is needed when/if cf-protection is 
> turned on by default by the driver in the future
>
> The attribute name is the toughest one. Here are several names to evaluate: 
> 'nocf_verify' or 'nocf_check', or, to be more specific and to mimic the option 
> name, 'nocf_branch_verify' or 'nocf_branch_check'. I would prefer 'nocf_check' 
> as it applies to functions and function pointers, so it's definitely related 
> to a branch, and it's the shorter one.
>
> If you're ok with the new proposal I'll implement it in the generic parts (code, 
> documentation and tests) and resend these patches for review.

nocf_check sounds fine to me.

Richard.

> Thanks,
> Igor
>


Backports for GCC 5 branch

2017-09-15 Thread Martin Liška
Hello.

I'm going to install the following backports.

Patches can bootstrap on ppc64le-redhat-linux and survive regression tests.

Martin
From baece18f7986907d9cd7cedea78fea9b1d7ef895 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 28 Jun 2017 07:59:23 +
Subject: [PATCH 6/6] Backport r249728

gcc/ChangeLog:

2017-06-28  Martin Liska  

	PR sanitizer/81224
	* asan.c (instrument_derefs): Bail out inner references
	that are hard register variables.

gcc/testsuite/ChangeLog:

2017-06-28  Martin Liska  

	PR sanitizer/81224
	* gcc.dg/asan/pr81224.c: New test.
---
 gcc/asan.c  |  3 +++
 gcc/testsuite/gcc.dg/asan/pr81224.c | 11 +++
 2 files changed, 14 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/asan/pr81224.c

diff --git a/gcc/asan.c b/gcc/asan.c
index 8e359681fc4..3edbdf37612 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -1802,6 +1802,9 @@ instrument_derefs (gimple_stmt_iterator *iter, tree t,
   || bitsize != size_in_bytes * BITS_PER_UNIT)
 return;
 
+  if (TREE_CODE (inner) == VAR_DECL && DECL_HARD_REGISTER (inner))
+return;
+
   if (TREE_CODE (inner) == VAR_DECL
   && offset == NULL_TREE
   && bitpos >= 0
diff --git a/gcc/testsuite/gcc.dg/asan/pr81224.c b/gcc/testsuite/gcc.dg/asan/pr81224.c
new file mode 100644
index 000..def5cb69aec
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/asan/pr81224.c
@@ -0,0 +1,11 @@
+/* PR sanitizer/80659 */
+/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-additional-options "-msse2" } */
+
+int a;
+int
+b ()
+{
+  register __attribute__ ((__vector_size__ (4 * sizeof (int)))) int c asm ("xmm0");
+  return c[a];
+}
-- 
2.14.1

From 5b3c19d5c81cb1b8ba6686509ffb889295cdebbc Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 30 Aug 2017 12:38:31 +
Subject: [PATCH 5/6] Backport r251530

gcc/ChangeLog:

2017-08-30  Martin Liska  

	PR inline-asm/82001
	* ipa-icf-gimple.c (func_checker::compare_tree_list_operand):
	Rename to ...
	(func_checker::compare_asm_inputs_outputs): ... this function.
	(func_checker::compare_gimple_asm): Use the function to compare
	also ASM constrains.
	* ipa-icf-gimple.h: Rename the function.

gcc/testsuite/ChangeLog:

2017-08-30  Martin Liska  

	PR inline-asm/82001
	* gcc.dg/ipa/pr82001.c: New test.
---
 gcc/ipa-icf-gimple.c   | 19 +--
 gcc/ipa-icf-gimple.h   |  6 +++---
 gcc/testsuite/gcc.dg/ipa/pr82001.c | 21 +
 3 files changed, 37 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr82001.c

diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c
index 6227b7e9579..a97f282a7a2 100644
--- a/gcc/ipa-icf-gimple.c
+++ b/gcc/ipa-icf-gimple.c
@@ -577,11 +577,8 @@ func_checker::compare_operand (tree t1, tree t2)
 }
 }
 
-/* Compares two tree list operands T1 and T2 and returns true if these
-   two trees are semantically equivalent.  */
-
 bool
-func_checker::compare_tree_list_operand (tree t1, tree t2)
+func_checker::compare_asm_inputs_outputs (tree t1, tree t2)
 {
   gcc_assert (TREE_CODE (t1) == TREE_LIST);
   gcc_assert (TREE_CODE (t2) == TREE_LIST);
@@ -594,6 +591,16 @@ func_checker::compare_tree_list_operand (tree t1, tree t2)
   if (!compare_operand (TREE_VALUE (t1), TREE_VALUE (t2)))
 	return return_false ();
 
+  tree p1 = TREE_PURPOSE (t1);
+  tree p2 = TREE_PURPOSE (t2);
+
+  gcc_assert (TREE_CODE (p1) == TREE_LIST);
+  gcc_assert (TREE_CODE (p2) == TREE_LIST);
+
+  if (strcmp (TREE_STRING_POINTER (TREE_VALUE (p1)),
+		  TREE_STRING_POINTER (TREE_VALUE (p2))) != 0)
+	return return_false ();
+
   t2 = TREE_CHAIN (t2);
 }
 
@@ -1039,7 +1046,7 @@ func_checker::compare_gimple_asm (const gasm *g1, const gasm *g2)
   tree input1 = gimple_asm_input_op (g1, i);
   tree input2 = gimple_asm_input_op (g2, i);
 
-  if (!compare_tree_list_operand (input1, input2))
+  if (!compare_asm_inputs_outputs (input1, input2))
 	return return_false_with_msg ("ASM input is different");
 }
 
@@ -1048,7 +1055,7 @@ func_checker::compare_gimple_asm (const gasm *g1, const gasm *g2)
   tree output1 = gimple_asm_output_op (g1, i);
   tree output2 = gimple_asm_output_op (g2, i);
 
-  if (!compare_tree_list_operand (output1, output2))
+  if (!compare_asm_inputs_outputs (output1, output2))
 	return return_false_with_msg ("ASM output is different");
 }
 
diff --git a/gcc/ipa-icf-gimple.h b/gcc/ipa-icf-gimple.h
index 6a9cbed5ff4..4d2ec9169b7 100644
--- a/gcc/ipa-icf-gimple.h
+++ b/gcc/ipa-icf-gimple.h
@@ -215,9 +215,9 @@ public:
  is returned.  */
   bool compare_operand (tree t1, tree t2);
 
-  /* Compares two tree list operands T1 and T2 and returns true if these
- two trees are semantically equivalent.  */
-  bool compare_tree_list_operand (tree t1, tree t2);
+  /* Compares 

[PATCH] More PR81968 fixing

2017-09-15 Thread Richard Biener

This fixes simple_object_elf_copy_lto_debug_sections to properly
iterate the process of marking dependent sections as necessary.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2017-09-15  Richard Biener  

PR lto/81968
* simple-object-elf.c (simple_object_elf_copy_lto_debug_sections):
Iterate marking dependent sections necessary.

Index: libiberty/simple-object-elf.c
===
--- libiberty/simple-object-elf.c   (revision 252780)
+++ libiberty/simple-object-elf.c   (working copy)
@@ -1158,70 +1158,84 @@ simple_object_elf_copy_lto_debug_section
 
   /* Mark sections as preserved that are required by to be preserved
  sections.  */
-  for (i = 1; i < shnum; ++i)
+  int changed;
+  do
 {
-  unsigned char *shdr;
-  unsigned int sh_type, sh_info, sh_link;
-  off_t offset;
-  off_t length;
-
-  shdr = shdrs + (i - 1) * shdr_size;
-  sh_type = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
-shdr, sh_type, Elf_Word);
-  sh_info = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
-shdr, sh_info, Elf_Word);
-  sh_link = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
-shdr, sh_link, Elf_Word);
-  if (sh_type == SHT_GROUP)
+  changed = 0;
+  for (i = 1; i < shnum; ++i)
{
- /* Mark groups containing copied sections.  */
- unsigned entsize = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
- shdr, sh_entsize, Elf_Addr);
- unsigned char *ent, *buf;
- int keep = 0;
- offset = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
-   shdr, sh_offset, Elf_Addr);
- length = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
-   shdr, sh_size, Elf_Addr);
- buf = XNEWVEC (unsigned char, length);
- if (!simple_object_internal_read (sobj->descriptor,
-   sobj->offset + offset, buf,
-   (size_t) length, &errmsg, err))
+ unsigned char *shdr;
+ unsigned int sh_type, sh_info, sh_link;
+ off_t offset;
+ off_t length;
+
+ shdr = shdrs + (i - 1) * shdr_size;
+ sh_type = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
+shdr, sh_type, Elf_Word);
+ sh_info = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
+shdr, sh_info, Elf_Word);
+ sh_link = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
+shdr, sh_link, Elf_Word);
+ if (sh_type == SHT_GROUP)
{
- XDELETEVEC (buf);
- XDELETEVEC (names);
- XDELETEVEC (shdrs);
- return errmsg;
+ /* Mark groups containing copied sections.  */
+ unsigned entsize = ELF_FETCH_FIELD (type_functions, ei_class,
+ Shdr, shdr, sh_entsize,
+ Elf_Addr);
+ unsigned char *ent, *buf;
+ int keep = 0;
+ offset = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
+   shdr, sh_offset, Elf_Addr);
+ length = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
+   shdr, sh_size, Elf_Addr);
+ buf = XNEWVEC (unsigned char, length);
+ if (!simple_object_internal_read (sobj->descriptor,
+   sobj->offset + offset, buf,
+   (size_t) length, &errmsg, err))
+   {
+ XDELETEVEC (buf);
+ XDELETEVEC (names);
+ XDELETEVEC (shdrs);
+ return errmsg;
+   }
+ for (ent = buf + entsize; ent < buf + length; ent += entsize)
+   {
+ unsigned sec = type_functions->fetch_Elf_Word (ent);
+ if (pfnret[sec - 1] == 0)
+   keep = 1;
+   }
+ if (keep)
+   {
+ changed |= (pfnret[sh_link - 1] == -1
+ || pfnret[i - 1] == -1);
+ pfnret[sh_link - 1] = 0;
+ pfnret[i - 1] = 0;
+   }
}
- for (ent = buf + entsize; ent < buf + length; ent += entsize)
+ if (sh_type == SHT_RELA
+ || sh_type == SHT_REL)
{
- unsigned sec = type_functions->fetch_Elf_Word (ent);
- if (pfnret[sec - 1] == 0)
-   keep = 1;
+ /* Mark relocation sections and symtab of copied sections.  */
+ if (pfnret[sh_info - 1] == 0)
+ 

Re: [PATCH] Factor out division by squares and remove division around comparisons (1/2)

2017-09-15 Thread Richard Biener
On Wed, Sep 13, 2017 at 11:20 PM, Wilco Dijkstra  wrote:
> Jeff Law wrote:
>> On 09/06/2017 03:55 AM, Jackson Woodruff wrote:
>> > On 08/30/2017 01:46 PM, Richard Biener wrote:
>
   rdivtmp = 1 / (y*C);
   tem = x * rdivtmp;
   tem2 = z * rdivtmp;

 instead of

   rdivtmp = 1/y;
   tem = x * 1/C * rdivtmp;
   tem2 = z * 1/C * rdivtmp;
>>>
>>> Ideally we would be able to CSE that into
>>>
>>> rdivtmp = 1/y * 1/C;
>>> tem = x * rdivtmp;
>>> tem2 = z * rdivtmp;
>> So why is your sequence significantly better than Richi's desired
>> seqeuence?  They both seem to need 3 mults and a division (which in both
>> cases might be a reciprocal estimation).In Richi's sequence we have
>> to mult and feed the result as an operand into the reciprocal insn.  In
>> yours we feed the result of the reciprocal into the multiply.
>
> Basically this stuff happens a lot in real code, which is exactly why I 
> proposed it.
> I even provided counts of how many divisions each transformation avoids:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026.
>
> Note this transformation is a canonicalization - if you can't merge a
> constant somehow, moving it out to the LHS can expose more
> opportunities, like in (C1 * x) / (C2 * y) -> (C1 * x * C2) / y -> (C3 * x) / 
> y.
> Same for negation as it behaves like * -1.
>
> The key here is that it is at least an order of magnitude worse if you have
> to execute an extra division than if you have an extra multiply.
>
>> ISTM in the end if  y*C or 1/(y*C) is a CSE, then Richi's sequence wins.
>> Similarly if 1/y is a CSE, then yours wins.  Is there some reason to
>> believe that one is a more likely CSE than the other?
>
> The idea is that 1/y is much more likely a CSE than 1/(y*C).
>
> We could make the pattern only fire in single use cases and see whether
> that makes a difference. It would be easy to test old vs new vs single-use
> new and count how many divisions we end up with. We could add 1/ (y * C)
> to the reciprocal phase if it is unacceptable as a canonicalization, but then
> you won't be able to optimize (C1 * x * C2) / y.
>
>> I think there's a fundamental phase ordering problem here.  You want to
>> CSE stuff as much as possible, then expose reciprocals, then CSE again
>> because exposing reciprocals can expose new CSE opportunities.
>
> I agree there are phase ordering issues and various problems in
> reassociation, CSE and division optimizations not being able to find and
> optimize complex cases that are worthwhile.
>
> However I don't agree doing CSE before reciprocals is a good idea. We
> want to expose reciprocals early on, even if that means we find fewer
> CSEs as a result - again because division is so much more expensive than
> any other operation. CSE is generally not smart enough to CSE a * x in
> a * b * x and a * c * x, something which is likely to happen quite frequently 
> -
> unlike the made up division examples here.
>
>> And I suspect that no matter how hard we try, there's going to be cases
>> that get exposed by various transformations in the pipeline such that to
>> fully optimize the cse - reciprocal - cse sequence would need to be
>> repeated to fully optimize.  We may have to live with not being
>> particularly good at picking up the those second order effects.
>>
>> I do think that the need for cse - reciprocal - cse  sequencing suggests
>> that match.pd may not be the right place for these transformations.  I
>> think Richi has pointed this out a couple times already.
>
> I don't think you can ever expect to find the optimal case by repeating
> optimizations. It's quite easy to construct examples where first doing CSE
> makes things significantly worse. Ultimately to get something optimal you'd
> need to try lots of permutations and count for each possible permutation
> how many multiplies and divisons you end up after full optimization.
> Quite impractical...
>
>> I'm not going to accept or reject at this time.  I think we need to make
>> a higher level decision.  Are these transformations better suited for
>> match.pd or the reciprocal transformation pass?  I realize that some
>> patterns are already in match.pd, but let's try to settle the higher
>> level issue before we add more.
>
> The first question is whether you see it as a canonicalization. If so, then
> match.pd should be fine.

A canonicalization to more divisions is not fine.  That is, if we think
that x / (C * y) is non-canonical because constant parts should be
in the denominator then fine, canonicalize it as (x * C') / y with
C' = 1/C.  But then implement it so, not in a pattern that suggests
you'll end up with two divisions.

Let me repeat though that this looks like a job for re-association.

From that perspective the part of the patch doing

+ /* Simplify x / (C * y) to (x * (1 / C)) / y where C is a constant.  */
+ (if (optimize)
+  (simplify
+   (rdiv @0
+(mult @1 REAL_CST@2))
+   (if (!real_zerop (@1))
+(with
+ 

[PATCH] Revert r238089 (PR driver/81829).

2017-09-15 Thread Martin Liška
Hi.

In order to make the code simple and transparent, I suggest reverting r238089.
Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
From 0a865d51a5f61d0fa13e5d4eea208c62ff89e32e Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 15 Sep 2017 14:02:13 +0200
Subject: [PATCH] Revert r238089 (PR driver/81829).

gcc/ChangeLog:

2017-09-15  Martin Liska  

	PR driver/81829
	* file-find.c (remove_prefix): Remove.
	* file-find.h (remove_prefix): Likewise.
	* gcc-ar.c: Remove smartness of lookup.
---
 gcc/file-find.c | 35 ---
 gcc/file-find.h |  1 -
 gcc/gcc-ar.c|  8 
 3 files changed, 44 deletions(-)

diff --git a/gcc/file-find.c b/gcc/file-find.c
index b072a4993d7..b5a1fe8494e 100644
--- a/gcc/file-find.c
+++ b/gcc/file-find.c
@@ -208,38 +208,3 @@ prefix_from_string (const char *p, struct path_prefix *pprefix)
 }
   free (nstore);
 }
-
-void
-remove_prefix (const char *prefix, struct path_prefix *pprefix)
-{
-  struct prefix_list *remove, **prev, **remove_prev = NULL;
-  int max_len = 0;
-
-  if (pprefix->plist)
-{
-  prev = &pprefix->plist;
-  for (struct prefix_list *pl = pprefix->plist; pl->next; pl = pl->next)
-	{
-	  if (strcmp (prefix, pl->prefix) == 0)
-	{
-	  remove = pl;
-	  remove_prev = prev;
-	  continue;
-	}
-
-	  int l = strlen (pl->prefix);
-	  if (l > max_len)
-	max_len = l;
-
-	  prev = &pl->next;
-	}
-
-  if (remove_prev)
-	{
-	  *remove_prev = remove->next;
-	  free (remove);
-	}
-
-  pprefix->max_len = max_len;
-}
-}
diff --git a/gcc/file-find.h b/gcc/file-find.h
index 8f49a3af273..407feba26e7 100644
--- a/gcc/file-find.h
+++ b/gcc/file-find.h
@@ -41,7 +41,6 @@ extern void find_file_set_debug (bool);
 extern char *find_a_file (struct path_prefix *, const char *, int);
 extern void add_prefix (struct path_prefix *, const char *);
 extern void add_prefix_begin (struct path_prefix *, const char *);
-extern void remove_prefix (const char *prefix, struct path_prefix *);
 extern void prefix_from_env (const char *, struct path_prefix *);
 extern void prefix_from_string (const char *, struct path_prefix *);
 
diff --git a/gcc/gcc-ar.c b/gcc/gcc-ar.c
index 78d2fc1ad30..d5d80e042e5 100644
--- a/gcc/gcc-ar.c
+++ b/gcc/gcc-ar.c
@@ -194,14 +194,6 @@ main (int ac, char **av)
 #ifdef CROSS_DIRECTORY_STRUCTURE
   real_exe_name = concat (target_machine, "-", PERSONALITY, NULL);
 #endif
-  /* Do not search original location in the same folder.  */
-  char *exe_folder = lrealpath (av[0]);
-  exe_folder[strlen (exe_folder) - strlen (lbasename (exe_folder))] = '\0';
-  char *location = concat (exe_folder, PERSONALITY, NULL);
-
-  if (access (location, X_OK) == 0)
-	remove_prefix (exe_folder, &path);
-
   exe_name = find_a_file (, real_exe_name, X_OK);
   if (!exe_name)
 	{
-- 
2.14.1



[demangler] Fix nested generic lambda

2017-09-15 Thread Nathan Sidwell
This patch fixes PR82195, which turned out to be a demangler bug -- the 
specification and the compiler were DTRT.


The originating source contained a non-generic lambda within a generic 
lambda.  Because we had an older GDB without recursion protection in its 
demangler, GDB died, and that kind of makes debugging difficult.


I did reduce the testcase further to:

void Foo () {
  // Foo():: [with auto:1 = int]
  // _ZZ3FoovENKUlT_E_clIiEEfS_
  [](auto) ->float {
struct Local {
  // void Foo()::::Local::fn() [with auto:1 = int]
  // _ZZZ3FoovENKUlT_E_clIiEEfS_EN5Local2fnEv
  static void fn () {}
};
Local::fn ();
return 0.0f;
  } (0);
}

The salient point is that we have to mangle names within the local 
context of an instantiation of a lambda's templated operator ().  The 
demangler was not expecting a local name:

   Z <function encoding> E <entity name> [<discriminator>]
to have a function as a local-entity.  This wasn't noticed because it 
still managed to parse the trailing parameter types in the context of 
d_local_name's caller, as there usually is no discriminator.  But those 
parameters got attached to the wrong level of the demangle tree.


When those parameters contain template parameters that refer to the 
instantiation parameters of the local entity, it all goes wrong.  (The 
instantiation parameters get correctly attached to the local-entity).


After parsing a local entity name we peek at the next character, and if 
it's not NUL, 'E' (end of containing scope) or '_' (discriminator) we 
parse function parameters before closing out the local name.


A complication is that we need to know if we're the outermost local 
name, and not do it in that case.  Otherwise we end up gluing the 
parameters too deeply and consequently --no-param demangles break 
(you'll see there's similar funky code in d_encoding).  Luckily in such 
cases we cannot meet the templated case.


Applying to trunk.

Pedro, would you like me to port to gdb's libiberty, or will you do a 
merge in the near future?


nathan

--
Nathan Sidwell
2017-09-15  Nathan Sidwell  

	PR demangle/82195
	* cp-demangle.c (d_name): Add 'toplevel' parm.  Pass to	...
	(d_local_name): ... here.  Parse trailing function args on nested
	local_name.
	(d_encoding, d_special_name, d_class_enum_type): Adjust d_name calls.
	* testsuite/demangle-expected: Add tests.

Index: cp-demangle.c
===
--- cp-demangle.c	(revision 252802)
+++ cp-demangle.c	(working copy)
@@ -425,7 +425,7 @@ is_ctor_dtor_or_conversion (struct deman
 
 static struct demangle_component *d_encoding (struct d_info *, int);
 
-static struct demangle_component *d_name (struct d_info *);
+static struct demangle_component *d_name (struct d_info *, int);
 
 static struct demangle_component *d_nested_name (struct d_info *);
 
@@ -484,7 +484,7 @@ static struct demangle_component *d_expr
 
 static struct demangle_component *d_expr_primary (struct d_info *);
 
-static struct demangle_component *d_local_name (struct d_info *);
+static struct demangle_component *d_local_name (struct d_info *, int);
 
 static int d_discriminator (struct d_info *);
 
@@ -1308,7 +1308,7 @@ d_encoding (struct d_info *di, int top_l
 {
   struct demangle_component *dc, *dcr;
 
-  dc = d_name (di);
+  dc = d_name (di, top_level);
 
   if (dc != NULL && top_level && (di->options & DMGL_PARAMS) == 0)
 	{
@@ -1383,7 +1383,7 @@ d_abi_tags (struct d_info *di, struct de
 */
 
 static struct demangle_component *
-d_name (struct d_info *di)
+d_name (struct d_info *di, int top_level)
 {
   char peek = d_peek_char (di);
   struct demangle_component *dc;
@@ -1394,7 +1394,7 @@ d_name (struct d_info *di)
   return d_nested_name (di);
 
 case 'Z':
-  return d_local_name (di);
+  return d_local_name (di, top_level);
 
 case 'U':
   return d_unqualified_name (di);
@@ -2079,11 +2079,11 @@ d_special_name (struct d_info *di)
 
 	case 'H':
 	  return d_make_comp (di, DEMANGLE_COMPONENT_TLS_INIT,
-			  d_name (di), NULL);
+			  d_name (di, 0), NULL);
 
 	case 'W':
 	  return d_make_comp (di, DEMANGLE_COMPONENT_TLS_WRAPPER,
-			  d_name (di), NULL);
+			  d_name (di, 0), NULL);
 
 	default:
 	  return NULL;
@@ -2094,11 +2094,12 @@ d_special_name (struct d_info *di)
   switch (d_next_char (di))
 	{
 	case 'V':
-	  return d_make_comp (di, DEMANGLE_COMPONENT_GUARD, d_name (di), NULL);
+	  return d_make_comp (di, DEMANGLE_COMPONENT_GUARD,
+			  d_name (di, 0), NULL);
 
 	case 'R':
 	  {
-	struct demangle_component *name = d_name (di);
+	struct demangle_component *name = d_name (di, 0);
 	return d_make_comp (di, DEMANGLE_COMPONENT_REFTEMP, name,
 d_number_component (di));
 	  }
@@ -2934,7 +2935,7 @@ d_bare_function_type (struct d_info *di,
 static struct demangle_component *
 d_class_enum_type (struct d_info *di)
 {
-  return d_name (di);
+  return d_name (di, 0);

Re: libgo patch committed: Upgrade to Go 1.9 release

2017-09-15 Thread Rainer Orth
Hi Ian,

>>> the patch broke Solaris bootstrap:
>>>
>>> /vol/gcc/src/hg/trunk/local/libgo/go/syscall/exec_unix.go:240:11: error: 
>>> reference to undefined name 'forkExecPipe'
>>>   if err = forkExecPipe(p[:]); err != nil {
>>>^
>>>
>>> libgo/go/syscall/forkpipe_bsd.go is needed on Solaris, too.
>>>
>>> /vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/link.go:73:10: 
>>> error: use of undefined type 'lifnum'
>>>   lifn := lifnum{Flags: sysLIFC_NOXMIT | sysLIFC_TEMPORARY | 
>>> sysLIFC_ALLZONES | sysLIFC_UNDER_IPMP}
>>>   ^
>>> make[8]: *** [Makefile:3349: golang_org/x/net/lif.lo] Error 1
>>>
>>> The Go 1.9 upgrade patch has
>>>
>>> @@ -70,7 +70,7 @@ func Links(af int, name string) ([]Link,
>>>
>>>  func links(eps []endpoint, name string) ([]Link, error) {
>>> var lls []Link
>>> -   lifn := sysLifnum{Flags: sysLIFC_NOXMIT | sysLIFC_TEMPORARY | 
>>> sysLIFC_AL
>>> LZONES | sysLIFC_UNDER_IPMP}
>>> +   lifn := lifnum{Flags: sysLIFC_NOXMIT | sysLIFC_TEMPORARY | 
>>> sysLIFC_ALLZO
>>> NES | sysLIFC_UNDER_IPMP}
>>>
>>> Reverting that allows link.go to compile.
>>>
>>> /vol/gcc/src/hg/trunk/local/libgo/go/internal/poll/fd_unix.go:366:21: 
>>> error: reference to undefined identifier 'syscall.ReadDirent'
>>>n, err := syscall.ReadDirent(fd.Sysfd, buf)
>>>  ^
>>>
>>> I don't yet see where this comes from on non-Linux systems...
>>
>> It's in forkpipe_bsd.go.  Does this patch fix the problem?
>
> that's true for forkExecPipe and I had this change in the patch I'd
> attached.  But what about syscall.ReadDirent?  I couldn't find that
> one...

I've had success with this patch on sparc-sun-solaris2.11 and
i386-pc-solaris2.11.

I've no idea what's behind upstream's src/syscall/syscall_solaris.go:

func ReadDirent(fd int, buf []byte) (n int, err error) {
// Final argument is (basep *uintptr) and the syscall doesn't take nil.
// TODO(rsc): Can we use a single global basep for all calls?
return Getdents(fd, buf, new(uintptr))
}

I could find no hint that getdents(2) has an additional basep arg,
neither in OpenSolaris sources nor in Illumos, so I've ignored this
weirdness.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


# HG changeset patch
# Parent  7757f79801cbabf35852c59a0a056f48150cb0b6
Fix Solaris libgo build after Go 1.9 import

diff --git a/libgo/go/golang_org/x/net/lif/link.go b/libgo/go/golang_org/x/net/lif/link.go
--- a/libgo/go/golang_org/x/net/lif/link.go
+++ b/libgo/go/golang_org/x/net/lif/link.go
@@ -70,7 +70,7 @@ func Links(af int, name string) ([]Link,
 
 func links(eps []endpoint, name string) ([]Link, error) {
 	var lls []Link
-	lifn := lifnum{Flags: sysLIFC_NOXMIT | sysLIFC_TEMPORARY | sysLIFC_ALLZONES | sysLIFC_UNDER_IPMP}
+	lifn := sysLifnum{Flags: sysLIFC_NOXMIT | sysLIFC_TEMPORARY | sysLIFC_ALLZONES | sysLIFC_UNDER_IPMP}
 	lifc := lifconf{Flags: sysLIFC_NOXMIT | sysLIFC_TEMPORARY | sysLIFC_ALLZONES | sysLIFC_UNDER_IPMP}
 	for _, ep := range eps {
 		lifn.Family = uint16(ep.af)
diff --git a/libgo/go/syscall/forkpipe_bsd.go b/libgo/go/syscall/forkpipe_bsd.go
--- a/libgo/go/syscall/forkpipe_bsd.go
+++ b/libgo/go/syscall/forkpipe_bsd.go
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build darwin dragonfly netbsd openbsd
+// +build darwin dragonfly netbsd openbsd solaris
 
 package syscall
 
diff --git a/libgo/go/syscall/libcall_solaris.go b/libgo/go/syscall/libcall_solaris.go
new file mode 100644
--- /dev/null
+++ b/libgo/go/syscall/libcall_solaris.go
@@ -0,0 +1,12 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package syscall
+
+//sys Getdents(fd int, buf []byte) (n int, err error)
+//getdents(fd _C_int, buf *byte, nbyte Size_t) _C_int
+
+func ReadDirent(fd int, buf []byte) (n int, err error) {
+	return Getdents(fd, buf)
+}


Re: Turn CANNOT_CHANGE_MODE_CLASS into a hook

2017-09-15 Thread Richard Sandiford
Jeff Law  writes:
> On 09/13/2017 01:19 PM, Richard Sandiford wrote:
>> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
>> Also tested by comparing the testsuite assembly output on at least one
>> target per CPU directory.  OK to install?
>> 
>> Richard
>> 
>> 
>> 2017-09-13  Richard Sandiford  
>>  Alan Hayard  
>>  David Sherwood  
>> 
>> gcc/
>>  * target.def (can_change_mode_class): New hook.
>>  (mode_rep_extended): Refer to it instead of CANNOT_CHANGE_MODE_CLASS.
>>  (hard_regno_nregs): Likewise.
>>  * hooks.h (hook_bool_mode_mode_reg_class_t_true): Declare.
>>  * hooks.c (hook_bool_mode_mode_reg_class_t_true): New function.
>>  * doc/tm.texi.in (CANNOT_CHANGE_MODE_CLASS): Replace with...
>>  (TARGET_CAN_CHANGE_MODE_CLASS): ...this.
>>  (LOAD_EXTEND_OP): Update accordingly.
>>  * doc/tm.texi: Regenerate.
>>  * doc/rtl.texi: Refer to TARGET_CAN_CHANGE_MODE_CLASS instead of
>>  CANNOT_CHANGE_MODE_CLASS.
>>  * hard-reg-set.h (REG_CANNOT_CHANGE_MODE_P): Replace with...
>>  (REG_CAN_CHANGE_MODE_P): ...this new macro.
>>  * combine.c (simplify_set): Update accordingly.
>>  * emit-rtl.c (validate_subreg): Likewise.
>>  * recog.c (general_operand): Likewise.
>>  * regcprop.c (mode_change_ok): Likewise.
>>  * reload1.c (choose_reload_regs): Likewise.
>>  (inherit_piecemeal_p): Likewise.
>>  * rtlanal.c (simplify_subreg_regno): Likewise.
>>  * postreload.c (reload_cse_simplify_set): Use REG_CAN_CHANGE_MODE_P
>>  instead of CANNOT_CHANGE_MODE_CLASS.
>>  (reload_cse_simplify_operands): Likewise.
>>  * reload.c (push_reload): Use targetm.can_change_mode_class
>>  instead of CANNOT_CHANGE_MODE_CLASS.
>>  (push_reload): Likewise.  Also use REG_CAN_CHANGE_MODE_P instead of
>>  REG_CANNOT_CHANGE_MODE_P.
>>  * config/alpha/alpha.h (CANNOT_CHANGE_MODE_CLASS): Delete.
>>  * config/alpha/alpha.c (alpha_can_change_mode_class): New function.
>>  (TARGET_CAN_CHANGE_MODE_CLASS): Redefine.
>>  * config/arm/arm.h (CANNOT_CHANGE_MODE_CLASS): Delete.
>>  * config/arm/arm.c (TARGET_CAN_CHANGE_MODE_CLASS): Redefine.
>>  (arm_can_change_mode_class): New function.
>>  * config/arm/neon.md: Refer to TARGET_CAN_CHANGE_MODE_CLASS rather
>>  than CANNOT_CHANGE_MODE_CLASS in comments.
>>  * config/i386/i386.h (CANNOT_CHANGE_MODE_CLASS): Delete.
>>  * config/i386/i386-protos.h (ix86_cannot_change_mode_class): Delete.
>>  * config/i386/i386.c (ix86_cannot_change_mode_class): Replace with...
>>  (ix86_can_change_mode_class): ...this new function, inverting the
>>  sense of the return value.
>>  (TARGET_CAN_CHANGE_MODE_CLASS): Redefine.
>>  * config/ia64/ia64.h (CANNOT_CHANGE_MODE_CLASS): Delete.
>>  * config/ia64/ia64.c (TARGET_CAN_CHANGE_MODE_CLASS): Redefine.
>>  (ia64_can_change_mode_class): New function.
>>  * config/m32c/m32c.h (CANNOT_CHANGE_MODE_CLASS): Delete.
>>  * config/m32c/m32c-protos.h (m32c_cannot_change_mode_class): Delete.
>>  * config/m32c/m32c.c (m32c_cannot_change_mode_class): Replace with...
>>  (m32c_can_change_mode_class): ...this new function, inverting the
>>  sense of the return value.
>>  (TARGET_CAN_CHANGE_MODE_CLASS): Redefine.
>>  * config/mips/mips.h (CANNOT_CHANGE_MODE_CLASS): Delete.
>>  * config/mips/mips-protos.h (mips_cannot_change_mode_class): Delete.
>>  * config/mips/mips.c (mips_cannot_change_mode_class): Replace with...
>>  (mips_can_change_mode_class): ...this new function, inverting the
>>  sense of the return value.
>>  (TARGET_CAN_CHANGE_MODE_CLASS): Redefine.
>>  * config/msp430/msp430.h (CANNOT_CHANGE_MODE_CLASS): Delete.
>>  * config/msp430/msp430.c (TARGET_CAN_CHANGE_MODE_CLASS): Redefine.
>>  (msp430_can_change_mode_class): New function.
>>  * config/nvptx/nvptx.h (CANNOT_CHANGE_MODE_CLASS): Delete.
>>  * config/nvptx/nvptx.c (nvptx_can_change_mode_class): New function.
>>  (TARGET_CAN_CHANGE_MODE_CLASS): Redefine.
>>  * config/pa/pa32-regs.h (CANNOT_CHANGE_MODE_CLASS): Delete.
>>  * config/pa/pa64-regs.h (CANNOT_CHANGE_MODE_CLASS): Delete.
>>  * config/pa/pa-protos.h (pa_cannot_change_mode_class): Delete.
>>  * config/pa/pa.c (TARGET_CAN_CHANGE_MODE_CLASS): Redefine.
>>  (pa_cannot_change_mode_class): Replace with...
>>  (pa_can_change_mode_class): ...this new function, inverting the
>>  sense of the return value.
>>  (pa_modes_tieable_p): Refer to TARGET_CAN_CHANGE_MODE_CLASS rather
>>  than CANNOT_CHANGE_MODE_CLASS in comments.
>>  * config/pdp11/pdp11.h (CANNOT_CHANGE_MODE_CLASS): Delete.
>>  * config/pdp11/pdp11-protos.h (pdp11_cannot_change_mode_class): Delete.
>>  * config/pdp11/pdp11.c (TARGET_CAN_CHANGE_MODE_CLASS): Redefine.
>>  

Re: [PATCH PR82163]Rewrite loop into lcssa form instantly

2017-09-15 Thread Richard Biener
On Thu, Sep 14, 2017 at 5:02 PM, Bin Cheng  wrote:
> Hi,
> The current pcom implementation rewrites into lcssa form only after all
> loops are transformed.  This is not enough, because unrolling of a later
> loop checks lcssa form in function tree_transform_and_unroll_loop.
> This simple patch rewrites a loop into lcssa form as soon as its
> store-store chain is handled.  I think it doesn't affect compilation time,
> since rewrite_into_loop_closed_ssa_1 is only called for store-store chain
> transformations and only the transformed loop is rewritten.

Well, it may look like only the transformed loop is rewritten -- yes, it is,
but rewrite_into_loop_closed_ssa calls update_ssa (), which operates on the
whole function.

So I'd rather _not_ do this.

Is there a real problem, or is it just the overly aggressive checking?
IMHO we should remove the checking or pass in a param saying whether to
skip it.  Or even better, restrict the checking to those loops
tree_transform_and_unroll_loop actually touches.

Richard.

> Bootstrap and test ongoing on x86_64.  Is it OK if there are no failures?
>
> Thanks,
> bin
> 2017-09-14  Bin Cheng  
>
> PR tree-optimization/82163
> * tree-predcom.c (tree_predictive_commoning_loop): Rewrite into
> loop closed ssa instantly.  Return boolean true if loop is unrolled.
> (tree_predictive_commoning): Return TODO_cleanup_cfg if loop is
> unrolled.
>
> gcc/testsuite
> 2017-09-14  Bin Cheng  
>
> PR tree-optimization/82163
> * gcc.dg/tree-ssa/pr82163.c: New test.


Backports to 7.x

2017-09-15 Thread Jakub Jelinek
Hi!

I've bootstrapped/regtested (x86_64-linux and i686-linux) and committed the
following 4 backports to gcc-7-branch:

Jakub
2017-09-15  Jakub Jelinek  

Backported from mainline
2017-09-12  Jakub Jelinek  

PR target/82112
* c-common.c (sync_resolve_size): Instead of c_dialect_cxx ()
assertion check that in the condition.
(get_atomic_generic_size): Likewise.  Before testing if parameter
has pointer type, if it has array type, call for C++
default_conversion to perform array-to-pointer conversion.

* c-c++-common/pr82112.c: New test.
* gcc.dg/pr82112.c: New test.

--- gcc/c-family/c-common.c (revision 252002)
+++ gcc/c-family/c-common.c (revision 252003)
@@ -6478,10 +6478,9 @@ sync_resolve_size (tree function, vec

2017-09-15  Jakub Jelinek  

Backported from mainline
2017-09-12  Jakub Jelinek  

PR target/82112
* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin): For
ALTIVEC_BUILTIN_VEC_LD if arg1 has array type call default_conversion
on it early, rather than manual conversion late.  For
ALTIVEC_BUILTIN_VEC_ST if arg2 has array type call default_conversion
instead of performing manual conversion.

* gcc.target/powerpc/pr82112.c: New test.
* g++.dg/ext/altivec-18.C: New test.

--- gcc/config/rs6000/rs6000-c.c(revision 252027)
+++ gcc/config/rs6000/rs6000-c.c(revision 252028)
@@ -6489,7 +6489,13 @@ altivec_resolve_overloaded_builtin (loca
 
   /* Strip qualifiers like "const" from the pointer arg.  */
   tree arg1_type = TREE_TYPE (arg1);
-  if (!POINTER_TYPE_P (arg1_type) && TREE_CODE (arg1_type) != ARRAY_TYPE)
+  if (TREE_CODE (arg1_type) == ARRAY_TYPE && c_dialect_cxx ())
+   {
+ /* Force array-to-pointer decay for C++.  */
+ arg1 = default_conversion (arg1);
+ arg1_type = TREE_TYPE (arg1);
+   }
+  if (!POINTER_TYPE_P (arg1_type))
goto bad;
 
   tree inner_type = TREE_TYPE (arg1_type);
@@ -6509,15 +6515,6 @@ altivec_resolve_overloaded_builtin (loca
  if (!ptrofftype_p (TREE_TYPE (arg0)))
arg0 = build1 (NOP_EXPR, sizetype, arg0);
 
- tree arg1_type = TREE_TYPE (arg1);
- if (TREE_CODE (arg1_type) == ARRAY_TYPE)
-   {
- arg1_type = TYPE_POINTER_TO (TREE_TYPE (arg1_type));
- tree const0 = build_int_cstu (sizetype, 0);
- tree arg1_elt0 = build_array_ref (loc, arg1, const0);
- arg1 = build1 (ADDR_EXPR, arg1_type, arg1_elt0);
-   }
-
  tree addr = fold_build2_loc (loc, POINTER_PLUS_EXPR, arg1_type,
   arg1, arg0);
  tree aligned = fold_build2_loc (loc, BIT_AND_EXPR, arg1_type, addr,
@@ -6572,12 +6569,11 @@ altivec_resolve_overloaded_builtin (loca
arg1 = build1 (NOP_EXPR, sizetype, arg1);
 
  tree arg2_type = TREE_TYPE (arg2);
- if (TREE_CODE (arg2_type) == ARRAY_TYPE)
+ if (TREE_CODE (arg2_type) == ARRAY_TYPE && c_dialect_cxx ())
{
- arg2_type = TYPE_POINTER_TO (TREE_TYPE (arg2_type));
- tree const0 = build_int_cstu (sizetype, 0);
- tree arg2_elt0 = build_array_ref (loc, arg2, const0);
- arg2 = build1 (ADDR_EXPR, arg2_type, arg2_elt0);
+ /* Force array-to-pointer decay for C++.  */
+ arg2 = default_conversion (arg2);
+ arg2_type = TREE_TYPE (arg2);
}
 
  /* Find the built-in to make sure a compatible one exists; if not
--- gcc/testsuite/gcc.target/powerpc/pr82112.c  (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr82112.c  (revision 252028)
@@ -0,0 +1,16 @@
+/* PR target/82112 */
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-maltivec -std=gnu90" } */
+
+#include 
+
+struct __attribute__((aligned (16))) S { unsigned char c[64]; } bar (void);
+vector unsigned char v;
+
+void
+foo (void)
+{
+  vec_ld (0, bar ().c);/* { dg-error "invalid parameter combination for AltiVec intrinsic" } */
+  vec_st (v, 0, bar ().c); /* { dg-error "invalid parameter combination for AltiVec intrinsic" } */
+}
--- gcc/testsuite/g++.dg/ext/altivec-18.C   (nonexistent)
+++ gcc/testsuite/g++.dg/ext/altivec-18.C   (revision 252028)
@@ -0,0 +1,14 @@
+// PR target/82112
+// { dg-do compile { target powerpc*-*-* } }
+// { dg-require-effective-target powerpc_altivec_ok }
+// { dg-options "-maltivec" }
+
+#include 
+
+__attribute__((aligned (16))) extern const unsigned char c[16];
+
+void
+foo (void)
+{
+  vec_ld (0, c);
+}
2017-09-15  Jakub Jelinek  

Backported from mainline
2017-09-14  Jakub Jelinek  
 
PR target/81325
* cfgbuild.c (find_bb_boundaries): Ignore 

RE: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling

2017-09-15 Thread Tsimbalist, Igor V
> -Original Message-
> From: Tsimbalist, Igor V
> Sent: Tuesday, September 12, 2017 5:35 PM
> To: 'Richard Biener' 
> Cc: 'gcc-patches@gcc.gnu.org' ; Tsimbalist, Igor V
> 
> Subject: RE: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling
> 
> > -Original Message-
> > From: Tsimbalist, Igor V
> > Sent: Friday, August 18, 2017 4:43 PM
> > To: 'Richard Biener' 
> > Cc: gcc-patches@gcc.gnu.org; Tsimbalist, Igor V
> > 
> > Subject: RE: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling
> >
> > > -Original Message-
> > > From: Richard Biener [mailto:richard.guent...@gmail.com]
> > > Sent: Friday, August 18, 2017 3:53 PM
> > > To: Tsimbalist, Igor V 
> > > Cc: gcc-patches@gcc.gnu.org
> > > Subject: Re: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling
> > >
> > > On Fri, Aug 18, 2017 at 3:11 PM, Tsimbalist, Igor V
> > >  wrote:
> > > >> -Original Message-
> > > >> From: Richard Biener [mailto:richard.guent...@gmail.com]
> > > >> Sent: Tuesday, August 15, 2017 3:43 PM
> > > >> To: Tsimbalist, Igor V 
> > > >> Cc: gcc-patches@gcc.gnu.org
> > > >> Subject: Re: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling
> > > >>
> > > >> On Tue, Aug 1, 2017 at 10:56 AM, Tsimbalist, Igor V
> > > >>  wrote:
> > > >> > Part#1. Add generic part for Intel CET enabling.
> > > >> >
> > > >> > The spec is available at
> > > >> >
> > > >> > https://software.intel.com/sites/default/files/managed/4d/2a/co
> > > >> > nt ro l-f low-enforcement-technology-preview.pdf

<..skipped..>

> > > >> I think 'notrack' is somewhat unspecific of a name; what
> > > >> prevented you from using 'nocet'?
> > > >
> > > > Actually it's specific.  The HW will have a prefix with exactly
> > > > this name and the same meaning.  And, what is more important,
> > > > 'track/notrack' gives better semantics for a user.  CET is a name
> > > > bound to Intel-specific technology.
> > >
> > > But 'tracking' something is quite unspecific.  Tracking for what?
> > > 'no_verify_cf' (aka do not verify control flow) maybe?
> >
> > The name just has to suggest the right semantics.  'no_verify_cf' is
> > good; let's use it unless a different name appears.
> I have renamed all newly introduced function and macro names to use
> 'noverify_cf'.  But I still keep the attribute name as 'notrack'.
> Historically the attribute name follows the public CET specification,
> which uses 'no-track prefix' wording.  Is it OK to keep that attribute
> name?

Here is an updated proposal about option name and attribute name.

The new option takes values that let a user choose which control-flow
protection to activate.

-fcf-protection=[full|branch|return|none]
  branch - do control-flow protection for indirect jumps and calls
  return - do control-flow protection for function returns
  full   - alias to specify both branch + return
  none   - turn off protection.  This value is needed when/if cf-protection
           is turned on by default by the driver in the future

The attribute name is the toughest one.  Here are several names to evaluate:
'nocf_verify' or 'nocf_check', or, to be more specific and to mimic the option
name, 'nocf_branch_verify' or 'nocf_branch_check'.  I would prefer 'nocf_check'
as it applies to functions and function pointers, so it is clearly related to
a branch, and it is shorter.

If you are OK with the new proposal, I'll implement it in the generic parts
(code, documentation and tests) and resend these patches for review.

Thanks,
Igor



[c-family] Issue a warning in C++ on pragma scalar_storage_order

2017-09-15 Thread Eric Botcazou
Hi,

this plugs the hole reported by Florian on the gcc@ list, namely that no 
warning is issued with -Wall in C++ on pragma scalar_storage_order.

Tested on x86_64-suse-linux, OK for the mainline?  And some branches?


2017-09-15  Eric Botcazou  

* c-pragma.c (handle_pragma_scalar_storage_order): Expand on error
message for non-uniform endianness and issue a warning in C++.


2017-09-15  Eric Botcazou  

* g++.dg/sso-1.C: New test.
* g++.dg/sso-2.C: Likewise.

-- 
Eric Botcazou

Index: c-family/c-pragma.c
===
--- c-family/c-pragma.c	(revision 252749)
+++ c-family/c-pragma.c	(working copy)
@@ -415,7 +415,19 @@ handle_pragma_scalar_storage_order (cpp_
   tree x;
 
   if (BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN)
-error ("scalar_storage_order is not supported");
+{
+  error ("scalar_storage_order is not supported because endianness "
+	 "is not uniform");
+  return;
+}
+
+  if (c_dialect_cxx ())
+{
+  if (warn_unknown_pragmas > in_system_header_at (input_location))
+	warning (OPT_Wunknown_pragmas,
+		 "%<#pragma scalar_storage_order%> is not supported for C++");
+  return;
+}
 
   token = pragma_lex ();
   if (token != CPP_NAME)
/* Test support of scalar_storage_order attribute */

/* { dg-do compile } */

struct __attribute__((scalar_storage_order("little-endian"))) Rec /* { dg-warning "attribute ignored" } */
{
  int i;
};
/* Test support of scalar_storage_order pragma */

/* { dg-do compile } */
/* { dg-options "-Wall" } */

#pragma scalar_storage_order little-endian /* { dg-warning "not supported" } */


Transform (x / y) != 0 to x >=y and (x / y) == 0 to x < y if x, y are unsigned

2017-09-15 Thread Prathamesh Kulkarni
Hi,
This patch adds the transforms mentioned in $subject.
Bootstrap+test in progress on x86_64-unknown-linux-gnu.
OK to commit if it passes?

Thanks,
Prathamesh
2017-09-15  Prathamesh Kulkarni  

* match.pd ((X / Y) == 0 -> X < Y): New pattern.
((X / Y) != 0 -> X >= Y): Likewise.

testsuite/
* gcc.dg/tree-ssa/cmpdiv.c: New test.

diff --git a/gcc/match.pd b/gcc/match.pd
index dbfceaf10a5..0e1b59c9c10 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1266,6 +1266,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   || TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0
(op @1 @0
 
+/* (X / Y) == 0 -> X < Y if X, Y are unsigned.  */
+(simplify
+  (eq (trunc_div @0 @1) integer_zerop)
+  (if (TYPE_UNSIGNED (TREE_TYPE(@0)) && TYPE_UNSIGNED (TREE_TYPE (@1)))
+(lt @0 @1)))
+
+/* (X / Y) != 0 -> X >= Y, if X, Y are unsigned.  */
+(simplify
+  (ne (trunc_div @0 @1) integer_zerop)
+  (if (TYPE_UNSIGNED (TREE_TYPE(@0)) && TYPE_UNSIGNED (TREE_TYPE (@1)))
+(ge @0 @1)))
+
 /* X == C - X can never be true if C is odd.  */
 (for cmp (eq ne)
  (simplify
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cmpdiv.c 
b/gcc/testsuite/gcc.dg/tree-ssa/cmpdiv.c
new file mode 100644
index 000..14161f5ea6f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/cmpdiv.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized-raw" } */
+
+_Bool f1(unsigned x, unsigned y)
+{
+  unsigned t1 = x / y;
+  _Bool t2 = (t1 != 0);
+  return t2;
+}
+
+_Bool f2(unsigned x, unsigned y)
+{
+  unsigned t1 = x / y;
+  _Bool t2 = (t1 == 0);
+  return t2;
+}
+
+/* { dg-final { scan-tree-dump-not "trunc_div_expr" "optimized" } } */


Re: [PATCH] Add comments to struct cgraph_thunk_info

2017-09-15 Thread Pierre-Marie de Rodat

On 09/14/2017 07:16 PM, Jeff Law wrote:

The comment additions are fine.  What's the rationale behind the
ordering of the fields?  In general we want the opposite order from what
you did -- going from most strictly aligned to least strictly aligned
minimizes the amount of unused padding.


Thank you for the review!

I moved them because I thought it would make more sense to readers: flags 
that describe the “shape” of the thunk first, and then the associated data, 
whose validity depends on the flags.  I also thought that, in practice, 
cgraph_thunk_info is embedded only in the cgraph_node class, and the 
field after “thunk” (“count”, a profile_count class) must be aligned at 
least on 8 bytes (because of n_bits/m_quality), so this change would not 
matter structure-size-wise.


I’m not super confident about this though, so I’ll resubmit a patch 
without the reordering: I’ve added more comments anyway as I’ve learned 
more about this since yesterday. ;-)


--
Pierre-Marie de Rodat


Fix an SVE failure in the Fortran matmul* tests

2017-09-15 Thread Richard Sandiford
The vectoriser was calling vect_get_smallest_scalar_type without
having proven that the type actually is a scalar.  This seems to
be the intended behaviour: the ultimate test of whether the type
is interesting (and hence scalar) is whether an associated vector
type exists, but this is only tested later.

The patch simply makes the function cope gracefully with non-scalar
inputs.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
OK to install?

Richard


2017-09-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-vect-data-refs.c (vect_get_smallest_scalar_type): Cope
with types that aren't in fact scalar.

Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   2017-09-14 17:35:26.634355297 +0100
+++ gcc/tree-vect-data-refs.c   2017-09-15 11:41:22.764283196 +0100
@@ -118,6 +118,11 @@ vect_get_smallest_scalar_type (gimple *s
   tree scalar_type = gimple_expr_type (stmt);
   HOST_WIDE_INT lhs, rhs;
 
+  /* During the analysis phase, this function is called on arbitrary
+ statements that might not have scalar results.  */
+  if (!tree_fits_uhwi_p (TYPE_SIZE_UNIT (scalar_type)))
+return scalar_type;
+
   lhs = rhs = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (scalar_type));
 
   if (is_gimple_assign (stmt)


[Demangle PATCH] Some pre-fix cleanups

2017-09-15 Thread Nathan Sidwell
In working on bug 82195, a lambda-related demangler bug, I noticed some 
cleanup opportunities.


1) we express an is_fnqual_component_type in two separate, independent, 
ways.  It would be smarter to express one of those ways in terms of the 
other.


2) An overly nested function call made it awkward to look at the 
argument values at the outer call side.  Use a local variable.


3) An if condition ended with the same call in both true and false 
cases.  Move that call out of the if.


4) The testsuite had mis-ordered blank lines (optional args) towards the 
end.


Applying to trunk.

nathan
--
Nathan Sidwell
2017-09-15  Nathan Sidwell  

	* cp-demangle.c (is_fnqual_component_type): Reimplement using
	FNQUAL_COMPONENT_CASE.
	(d_encoding): Hold bare_function_type in local var.
	(d_local_name): Build name in both cases and build result once.
	Collapse switch-if to single conditional.
	* testsuite/demangle-expected: Realign blank lines with tests.

Index: cp-demangle.c
===
--- cp-demangle.c	(revision 252076)
+++ cp-demangle.c	(working copy)
@@ -568,22 +568,6 @@ static int d_demangle_callback (const ch
 demangle_callbackref, void *);
 static char *d_demangle (const char *, int, size_t *);
 
-/* True iff TYPE is a demangling component representing a
-   function-type-qualifier.  */
-
-static int
-is_fnqual_component_type (enum demangle_component_type type)
-{
-  return (type == DEMANGLE_COMPONENT_RESTRICT_THIS
-	  || type == DEMANGLE_COMPONENT_VOLATILE_THIS
-	  || type == DEMANGLE_COMPONENT_CONST_THIS
-	  || type == DEMANGLE_COMPONENT_RVALUE_REFERENCE_THIS
-	  || type == DEMANGLE_COMPONENT_TRANSACTION_SAFE
-	  || type == DEMANGLE_COMPONENT_NOEXCEPT
-	  || type == DEMANGLE_COMPONENT_THROW_SPEC
-	  || type == DEMANGLE_COMPONENT_REFERENCE_THIS);
-}
-
 #define FNQUAL_COMPONENT_CASE\
 case DEMANGLE_COMPONENT_RESTRICT_THIS:		\
 case DEMANGLE_COMPONENT_VOLATILE_THIS:		\
@@ -594,6 +578,23 @@ is_fnqual_component_type (enum demangle_
 case DEMANGLE_COMPONENT_NOEXCEPT:			\
 case DEMANGLE_COMPONENT_THROW_SPEC
 
+/* True iff TYPE is a demangling component representing a
+   function-type-qualifier.  */
+
+static int
+is_fnqual_component_type (enum demangle_component_type type)
+{
+  switch (type)
+{
+FNQUAL_COMPONENT_CASE:
+  return 1;
+default:
+  break;
+}
+  return 0;
+}
+
+
 #ifdef CP_DEMANGLE_DEBUG
 
 static void
@@ -1305,7 +1306,7 @@ d_encoding (struct d_info *di, int top_l
 return d_special_name (di);
   else
 {
-  struct demangle_component *dc;
+  struct demangle_component *dc, *dcr;
 
   dc = d_name (di);
 
@@ -1327,8 +1328,6 @@ d_encoding (struct d_info *di, int top_l
 	 which is local to a function.  */
 	  if (dc->type == DEMANGLE_COMPONENT_LOCAL_NAME)
 	{
-	  struct demangle_component *dcr;
-
 	  dcr = d_right (dc);
 	  while (is_fnqual_component_type (dcr->type))
 		dcr = d_left (dcr);
@@ -1341,8 +1340,8 @@ d_encoding (struct d_info *di, int top_l
   peek = d_peek_char (di);
   if (dc == NULL || peek == '\0' || peek == 'E')
 	return dc;
-  return d_make_comp (di, DEMANGLE_COMPONENT_TYPED_NAME, dc,
-			  d_bare_function_type (di, has_return_type (dc)));
+  dcr = d_bare_function_type (di, has_return_type (dc));
+  return d_make_comp (di, DEMANGLE_COMPONENT_TYPED_NAME, dc, dcr);
 }
 }
 
@@ -3571,6 +3570,7 @@ static struct demangle_component *
 d_local_name (struct d_info *di)
 {
   struct demangle_component *function;
+  struct demangle_component *name;
 
   if (! d_check_char (di, 'Z'))
 return NULL;
@@ -3585,13 +3585,10 @@ d_local_name (struct d_info *di)
   d_advance (di, 1);
   if (! d_discriminator (di))
 	return NULL;
-  return d_make_comp (di, DEMANGLE_COMPONENT_LOCAL_NAME, function,
-			  d_make_name (di, "string literal",
-   sizeof "string literal" - 1));
+  name = d_make_name (di, "string literal", sizeof "string literal" - 1);
 }
   else
 {
-  struct demangle_component *name;
   int num = -1;
 
   if (d_peek_char (di) == 'd')
@@ -3604,21 +3601,19 @@ d_local_name (struct d_info *di)
 	}
 
   name = d_name (di);
-  if (name)
-	switch (name->type)
-	  {
-	/* Lambdas and unnamed types have internal discriminators.  */
-	  case DEMANGLE_COMPONENT_LAMBDA:
-	  case DEMANGLE_COMPONENT_UNNAMED_TYPE:
-	break;
-	  default:
-	if (! d_discriminator (di))
-	  return NULL;
-	  }
+  if (name
+	  /* Lambdas and unnamed types have internal discriminators.  */
+	  && name->type != DEMANGLE_COMPONENT_LAMBDA
+	  && name->type != DEMANGLE_COMPONENT_UNNAMED_TYPE
+	  /* Otherwise read and ignore an optional discriminator.  */
+	  && ! d_discriminator (di))
+	return NULL;
+
   if (num >= 0)
 	name = d_make_default_arg (di, num, name);
-  return d_make_comp (di, DEMANGLE_COMPONENT_LOCAL_NAME, function, name);
 }
+
+  


Include phis in SLP unrolling calculation

2017-09-15 Thread Richard Sandiford
Without this we'd pick an unrolling factor based purely on longs,
ignoring the ints.  It's possible that vect_get_smallest_scalar_type
should also handle shifts, but I think we'd still want this as a
belt-and-braces fix.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
OK to install?

Richard


2017-09-15  Richard Sandiford  

gcc/
* tree-vect-slp.c (vect_record_max_nunits): New function,
split out from...
(vect_build_slp_tree_1): ...here.
(vect_build_slp_tree_2): Call it for phis too.

gcc/testsuite/
* gcc.dg/vect/slp-multitypes-13.c: New test.

Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c 2017-09-15 11:35:46.833592065 +0100
+++ gcc/tree-vect-slp.c 2017-09-15 11:40:10.286573578 +0100
@@ -480,6 +480,48 @@ vect_get_and_check_slp_defs (vec_info *v
   return 0;
 }
 
+/* A subroutine of vect_build_slp_tree for checking VECTYPE, which is the
+   caller's attempt to find the vector type in STMT with the narrowest
+   element type.  Return true if VECTYPE is nonnull and if it is valid
+   for VINFO.  When returning true, update MAX_NUNITS to reflect the
+   number of units in VECTYPE.  VINFO, GROUP_SIZE and MAX_NUNITS are
+   as for vect_build_slp_tree.  */
+
+static bool
+vect_record_max_nunits (vec_info *vinfo, gimple *stmt, unsigned int group_size,
+   tree vectype, unsigned int *max_nunits)
+{
+  if (!vectype)
+{
+  if (dump_enabled_p ())
+   {
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+  "Build SLP failed: unsupported data-type in ");
+ dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
+ dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+   }
+  /* Fatal mismatch.  */
+  return false;
+}
+
+  /* If populating the vector type requires unrolling then fail
+ before adjusting *max_nunits for basic-block vectorization.  */
+  if (is_a  (vinfo)
+  && TYPE_VECTOR_SUBPARTS (vectype) > group_size)
+{
+  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+  "Build SLP failed: unrolling required "
+  "in basic block SLP\n");
+  /* Fatal mismatch.  */
+  return false;
+}
+
+  /* In case of multiple types we need to detect the smallest type.  */
+  if (*max_nunits < TYPE_VECTOR_SUBPARTS (vectype))
+*max_nunits = TYPE_VECTOR_SUBPARTS (vectype);
+
+  return true;
+}
 
 /* Verify if the scalar stmts STMTS are isomorphic, require data
permutation or are of unsupported types of operation.  Return
@@ -560,38 +602,14 @@ vect_build_slp_tree_1 (vec_info *vinfo,
 
   scalar_type = vect_get_smallest_scalar_type (stmt, , );
   vectype = get_vectype_for_scalar_type (scalar_type);
-  if (!vectype)
-{
-  if (dump_enabled_p ())
-{
-  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, 
-  "Build SLP failed: unsupported data-type ");
-  dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
-scalar_type);
-  dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
-}
+  if (!vect_record_max_nunits (vinfo, stmt, group_size, vectype,
+  max_nunits))
+   {
  /* Fatal mismatch.  */
  matches[0] = false;
   return false;
 }
 
-  /* If populating the vector type requires unrolling then fail
- before adjusting *max_nunits for basic-block vectorization.  */
-  if (is_a  (vinfo)
- && TYPE_VECTOR_SUBPARTS (vectype) > group_size)
-   {
- dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, 
-  "Build SLP failed: unrolling required "
-  "in basic block SLP\n");
- /* Fatal mismatch.  */
- matches[0] = false;
- return false;
-   }
-
-  /* In case of multiple types we need to detect the smallest type.  */
-  if (*max_nunits < TYPE_VECTOR_SUBPARTS (vectype))
-   *max_nunits = TYPE_VECTOR_SUBPARTS (vectype);
-
   if (gcall *call_stmt = dyn_cast  (stmt))
{
  rhs_code = CALL_EXPR;
@@ -1018,6 +1036,12 @@ vect_build_slp_tree_2 (vec_info *vinfo,
  the recursion.  */
   if (gimple_code (stmt) == GIMPLE_PHI)
 {
+  tree scalar_type = TREE_TYPE (PHI_RESULT (stmt));
+  tree vectype = get_vectype_for_scalar_type (scalar_type);
+  if (!vect_record_max_nunits (vinfo, stmt, group_size, vectype,
+  max_nunits))
+   return NULL;
+
   vect_def_type def_type = STMT_VINFO_DEF_TYPE (vinfo_for_stmt (stmt));
   /* Induction from different IVs is not supported.  */
   if (def_type == vect_induction_def)
Index: gcc/testsuite/gcc.dg/vect/slp-multitypes-13.c

Fix vectorizable_mask_load_store handling of invariant masks

2017-09-15 Thread Richard Sandiford
vectorizable_mask_load_store was not passing the required mask type to
vect_get_vec_def_for_operand.  This doesn't matter for masks that are
defined in the loop, since their STMT_VINFO_VECTYPE will be what we need
anyway.  But it's not possible to tell which mask type the caller needs
when looking at an invariant scalar boolean.  As the comment above the
function says:

   In case OP is an invariant or constant, a new stmt that creates a vector def
   needs to be introduced.  VECTYPE may be used to specify a required type for
   vector invariant.

This fixes the attached testcase for SVE.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
OK to install?

Richard


2017-09-15  Richard Sandiford  

gcc/
* tree-vect-stmts.c (vectorizable_mask_load_store): Pass mask_vectype
to vect_get_vec_def_for_operand when getting the mask operand.

gcc/testsuite/
* gfortran.dg/vect/mask-store-1.f90: New test.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2017-09-15 11:36:58.331030784 +0100
+++ gcc/tree-vect-stmts.c   2017-09-15 11:39:17.850889641 +0100
@@ -2331,7 +2331,8 @@ vectorizable_mask_load_store (gimple *st
{
  tree rhs = gimple_call_arg (stmt, 3);
  vec_rhs = vect_get_vec_def_for_operand (rhs, stmt);
- vec_mask = vect_get_vec_def_for_operand (mask, stmt);
+ vec_mask = vect_get_vec_def_for_operand (mask, stmt,
+  mask_vectype);
  /* We should have catched mismatched types earlier.  */
  gcc_assert (useless_type_conversion_p (vectype,
 TREE_TYPE (vec_rhs)));
@@ -2388,7 +2389,8 @@ vectorizable_mask_load_store (gimple *st
 
  if (i == 0)
{
- vec_mask = vect_get_vec_def_for_operand (mask, stmt);
+ vec_mask = vect_get_vec_def_for_operand (mask, stmt,
+  mask_vectype);
  dataref_ptr = vect_create_data_ref_ptr (stmt, vectype, NULL,
  NULL_TREE, , gsi,
  _incr, false, _p);
Index: gcc/testsuite/gfortran.dg/vect/mask-store-1.f90
===
--- /dev/null   2017-09-15 10:12:35.472207962 +0100
+++ gcc/testsuite/gfortran.dg/vect/mask-store-1.f90 2017-09-15 11:39:17.849889812 +0100
@@ -0,0 +1,11 @@
+subroutine foo(a, b, x, n)
+  real(kind=8) :: a(n), b(n), tmp
+  logical(kind=1) :: x
+  integer(kind=4) :: i, n
+  do i = 1, n
+ if (x) then
+a(i) = b(i)
+ end if
+ b(i) = b(i) + 10
+  end do
+end subroutine
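For reference, a hypothetical C analogue of the Fortran testcase above: the scalar boolean `x` is loop-invariant, so when the conditional store to `a` is vectorized as a masked store, the invariant mask has no `STMT_VINFO_VECTYPE` of its own and the caller must supply the mask vector type. (This is an illustrative sketch of the source pattern, not GCC internals.)

```c
#include <assert.h>

/* Hypothetical C analogue of mask-store-1.f90: X is invariant in the
   loop, so a vectorizer would broadcast it into a mask vector whose
   type must be chosen by the caller, not derived from a loop-defined
   statement.  */
static void
foo (double *a, double *b, int x, int n)
{
  for (int i = 0; i < n; i++)
    {
      if (x)          /* invariant condition -> masked store of a[i] */
        a[i] = b[i];
      b[i] = b[i] + 10;
    }
}
```

With `x` nonzero every mask lane is set; with `x` zero the store to `a` is masked off entirely while the unconditional update of `b` still runs.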


Fix vectorizable_live_operation handling of vector booleans

2017-09-15 Thread Richard Sandiford
vectorizable_live_operation needs to use BIT_FIELD_REF to extract one
element of a vector.  For a packed vector boolean type, the number of
bits to extract should be taken from TYPE_PRECISION rather than TYPE_SIZE.

This is shown by existing tests once SVE is added.
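To illustrate the distinction: in a packed vector boolean, each lane occupies TYPE_PRECISION bits (often a single bit), so extracting lane `i` must use the precision, not the element's TYPE_SIZE. A sketch with a hypothetical 1-bit lane width (illustrative names, not the GCC API):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a packed vector boolean: each lane is PREC
   bits wide, so lane I lives at bit offset I * PREC and is PREC bits
   long -- exactly the quantities a BIT_FIELD_REF extraction needs.  */
enum { PREC = 1 };

static unsigned
extract_lane (uint64_t packed_mask, unsigned i)
{
  uint64_t lane_mask = (1ull << PREC) - 1;
  return (unsigned) ((packed_mask >> (i * PREC)) & lane_mask);
}
```

Using the full element size here instead of the precision would read bits belonging to neighbouring lanes.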

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
OK to install?

Richard


2017-09-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-vect-loop.c (vectorizable_live_operation): Fix element size
calculation for vector booleans.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c2017-09-15 11:35:46.832592132 +0100
+++ gcc/tree-vect-loop.c2017-09-15 11:37:45.639244036 +0100
@@ -7065,7 +7065,9 @@ vectorizable_live_operation (gimple *stm
: gimple_get_lhs (stmt);
   lhs_type = TREE_TYPE (lhs);
 
-  bitsize = TYPE_SIZE (TREE_TYPE (vectype));
+  bitsize = (VECTOR_BOOLEAN_TYPE_P (vectype)
+? bitsize_int (TYPE_PRECISION (TREE_TYPE (vectype)))
+: TYPE_SIZE (TREE_TYPE (vectype)));
   vec_bitsize = TYPE_SIZE (vectype);
 
   /* Get the vectorized lhs of STMT and the lane to use (counted in bits).  */


Fix type of bitstart in vectorizable_live_operation

2017-09-15 Thread Richard Sandiford
This patch changes the type of the multiplier applied by
vectorizable_live_operation from unsigned_type_node to bitsizetype,
which matches the type of TYPE_SIZE and is the type expected of a
BIT_FIELD_REF bit position.  This is shown by existing tests when
SVE is added.
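The bit position handed to BIT_FIELD_REF is the lane index times the element bit size; computing it in the same wide type used for sizes (the analogue of bitsizetype) keeps the arithmetic consistent. A sketch with illustrative names, not the GCC API:

```c
#include <assert.h>
#include <stdint.h>

/* bitstart = vec_index * bitsize, computed in a 64-bit "size" type
   (the analogue of bitsizetype) rather than a 32-bit unsigned int,
   matching the type BIT_FIELD_REF expects for its bit position.  */
static uint64_t
lane_bit_position (uint64_t vec_index, uint64_t elt_bits)
{
  return vec_index * elt_bits;
}
```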

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
OK to install?

Richard


2017-09-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-vect-loop.c (vectorizable_live_operation): Fix type of
bitstart.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c2017-09-15 11:37:45.639244036 +0100
+++ gcc/tree-vect-loop.c2017-09-15 11:38:33.276424843 +0100
@@ -7090,7 +7090,7 @@ vectorizable_live_operation (gimple *stm
   vec_lhs = gimple_get_lhs (SLP_TREE_VEC_STMTS (slp_node)[vec_entry]);
 
   /* Get entry to use.  */
-  bitstart = build_int_cst (unsigned_type_node, vec_index);
+  bitstart = bitsize_int (vec_index);
   bitstart = int_const_binop (MULT_EXPR, bitsize, bitstart);
 }
   else


Invoke vectorizable_live_operation in a consistent way

2017-09-15 Thread Richard Sandiford
vect_transform_stmt calls vectorizable_live_operation for
each live statement in an SLP node, but vect_analyze_stmt
only called it once.  This patch makes vect_analyze_stmt
consistent with vect_transform_stmt, which should be a bit
more robust, and also means that a later patch can use
slp_index when deciding validity.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
OK to install?

Richard


2017-09-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-vect-stmts.c (can_vectorize_live_stmts): New function,
split out from...
(vect_transform_stmt): ...here.
(vect_analyze_stmt): Use it instead of calling
vectorizable_live_operation directly.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2017-09-14 17:30:48.110211243 +0100
+++ gcc/tree-vect-stmts.c   2017-09-15 11:36:58.331030784 +0100
@@ -8479,6 +8479,35 @@ vectorizable_comparison (gimple *stmt, g
   return true;
 }
 
+/* If SLP_NODE is nonnull, return true if vectorizable_live_operation
+   can handle all live statements in the node.  Otherwise return true
+   if STMT is not live or if vectorizable_live_operation can handle it.
+   GSI and VEC_STMT are as for vectorizable_live_operation.  */
+
+static bool
+can_vectorize_live_stmts (gimple *stmt, gimple_stmt_iterator *gsi,
+ slp_tree slp_node, gimple **vec_stmt)
+{
+  if (slp_node)
+{
+  gimple *slp_stmt;
+  unsigned int i;
+  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (slp_node), i, slp_stmt)
+   {
+ stmt_vec_info slp_stmt_info = vinfo_for_stmt (slp_stmt);
+ if (STMT_VINFO_LIVE_P (slp_stmt_info)
+ && !vectorizable_live_operation (slp_stmt, gsi, slp_node, i,
+  vec_stmt))
+   return false;
+   }
+}
+  else if (STMT_VINFO_LIVE_P (vinfo_for_stmt (stmt))
+  && !vectorizable_live_operation (stmt, gsi, slp_node, -1, vec_stmt))
+return false;
+
+  return true;
+}
+
 /* Make sure the statement is vectorizable.  */
 
 bool
@@ -8685,17 +8714,13 @@ vect_analyze_stmt (gimple *stmt, bool *n
 
   /* Stmts that are (also) "live" (i.e. - that are used out of the loop)
   need extra handling, except for vectorizable reductions.  */
-  if (STMT_VINFO_LIVE_P (stmt_info)
-  && STMT_VINFO_TYPE (stmt_info) != reduc_vec_info_type)
-ok = vectorizable_live_operation (stmt, NULL, node, -1, NULL);
-
-  if (!ok)
+  if (STMT_VINFO_TYPE (stmt_info) != reduc_vec_info_type
+  && !can_vectorize_live_stmts (stmt, NULL, node, NULL))
 {
   if (dump_enabled_p ())
 {
   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-   "not vectorized: live stmt not ");
-  dump_printf (MSG_MISSED_OPTIMIZATION,  "supported: ");
+   "not vectorized: live stmt not supported: ");
   dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
 }
 
@@ -8861,26 +8886,9 @@ vect_transform_stmt (gimple *stmt, gimpl
 
   /* Handle stmts whose DEF is used outside the loop-nest that is
  being vectorized.  */
-  if (slp_node)
-{
-  gimple *slp_stmt;
-  int i;
-  if (STMT_VINFO_TYPE (stmt_info) != reduc_vec_info_type)
-   FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (slp_node), i, slp_stmt)
- {
-   stmt_vec_info slp_stmt_info = vinfo_for_stmt (slp_stmt);
-   if (STMT_VINFO_LIVE_P (slp_stmt_info))
- {
-   done = vectorizable_live_operation (slp_stmt, gsi, slp_node, i,
-   _stmt);
-   gcc_assert (done);
- }
- }
-}
-  else if (STMT_VINFO_LIVE_P (stmt_info)
-  && STMT_VINFO_TYPE (stmt_info) != reduc_vec_info_type)
+  if (STMT_VINFO_TYPE (stmt_info) != reduc_vec_info_type)
 {
-  done = vectorizable_live_operation (stmt, gsi, slp_node, -1, _stmt);
+  done = can_vectorize_live_stmts (stmt, gsi, slp_node, _stmt);
   gcc_assert (done);
 }
 


Move computation of SLP_TREE_NUMBER_OF_VEC_STMTS

2017-09-15 Thread Richard Sandiford
Previously SLP_TREE_NUMBER_OF_VEC_STMTS was calculated while scheduling
an SLP tree after analysis, but sometimes it can be useful to know the
value during analysis too.  This patch moves the calculation to
vect_slp_analyze_node_operations instead.

This became more natural after:

2017-06-30  Richard Biener  

* tree-vect-slp.c (vect_slp_analyze_node_operations): Only
analyze the first scalar stmt.  Move vector type computation
for the BB case here from ...
* tree-vect-stmts.c (vect_analyze_stmt): ... here.  Guard
live operation processing in the SLP case properly.

since the STMT_VINFO_VECTYPE is now always initialised in time.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
OK to install?

Richard


2017-09-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-vectorizer.h (vect_slp_analyze_operations): Replace parameters
with a vec_info *.
* tree-vect-loop.c (vect_analyze_loop_operations): Update call
accordingly.
* tree-vect-slp.c (vect_slp_analyze_node_operations): Add vec_info *
parameter.  Set SLP_TREE_NUMBER_OF_VEC_STMTS here rather than in
vect_schedule_slp_instance.
(vect_slp_analyze_operations): Replace parameters with a vec_info *.
Update call to vect_slp_analyze_node_operations.  Simplify return
value.
(vect_slp_analyze_bb_1): Update call accordingly.
(vect_schedule_slp_instance): Remove vectorization_factor parameter.
Don't calculate SLP_TREE_NUMBER_OF_VEC_STMTS here.
(vect_schedule_slp): Update call accordingly.

Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   2017-09-14 17:35:26.635276568 +0100
+++ gcc/tree-vectorizer.h   2017-09-15 11:35:46.833592065 +0100
@@ -1246,8 +1246,7 @@ extern void vect_free_slp_instance (slp_
 extern bool vect_transform_slp_perm_load (slp_tree, vec<tree>,
   gimple_stmt_iterator *, int,
   slp_instance, bool, unsigned *);
-extern bool vect_slp_analyze_operations (vec<slp_instance> slp_instances,
-void *);
+extern bool vect_slp_analyze_operations (vec_info *);
 extern bool vect_schedule_slp (vec_info *);
 extern bool vect_analyze_slp (vec_info *, unsigned);
 extern bool vect_make_slp_decision (loop_vec_info);
Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c2017-09-14 17:35:26.635276568 +0100
+++ gcc/tree-vect-loop.c2017-09-15 11:35:46.832592132 +0100
@@ -2031,8 +2031,7 @@ vect_analyze_loop_2 (loop_vec_info loop_
 remove unsupported SLP instances which makes the above
 SLP kind detection invalid.  */
   unsigned old_size = LOOP_VINFO_SLP_INSTANCES (loop_vinfo).length ();
-  vect_slp_analyze_operations (LOOP_VINFO_SLP_INSTANCES (loop_vinfo),
-  LOOP_VINFO_TARGET_COST_DATA (loop_vinfo));
+  vect_slp_analyze_operations (loop_vinfo);
   if (LOOP_VINFO_SLP_INSTANCES (loop_vinfo).length () != old_size)
goto again;
 }
Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c 2017-09-14 17:04:19.083694343 +0100
+++ gcc/tree-vect-slp.c 2017-09-15 11:35:46.833592065 +0100
@@ -2501,11 +2501,14 @@ _bb_vec_info::~_bb_vec_info ()
 }
 
 
-/* Analyze statements contained in SLP tree node after recursively analyzing
-   the subtree. Return TRUE if the operations are supported.  */
+/* Analyze statements contained in SLP tree NODE after recursively analyzing
+   the subtree.  NODE_INSTANCE contains NODE and VINFO contains INSTANCE.
+
+   Return true if the operations are supported.  */
 
 static bool
-vect_slp_analyze_node_operations (slp_tree node, slp_instance node_instance)
+vect_slp_analyze_node_operations (vec_info *vinfo, slp_tree node,
+ slp_instance node_instance)
 {
   bool dummy;
   int i, j;
@@ -2516,7 +2519,7 @@ vect_slp_analyze_node_operations (slp_tr
 return true;
 
   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
-if (!vect_slp_analyze_node_operations (child, node_instance))
+if (!vect_slp_analyze_node_operations (vinfo, child, node_instance))
   return false;
 
   stmt = SLP_TREE_SCALAR_STMTS (node)[0];
@@ -2568,6 +2571,29 @@ vect_slp_analyze_node_operations (slp_tr
STMT_VINFO_VECTYPE (vinfo_for_stmt (sstmt)) = vectype;
 }
 
+  /* Calculate the number of vector statements to be created for the
+ scalar stmts in this node.  For SLP reductions it is equal to the
+ number of vector statements in the children (which has already been
+ calculated by the recursive call).  Otherwise it 

Re: [PATCH] Enhance PHI processing in VN

2017-09-15 Thread David Edelsohn
On Fri, Sep 15, 2017 at 2:53 AM, Richard Biener  wrote:
> On Thu, 14 Sep 2017, David Edelsohn wrote:
>
>> * tree-ssa-sccvn.c (visit_phi): Merge undefined values similar
>> to VN_TOP.
>>
>> This seems to have regressed
>>
>> FAIL: gcc.dg/tree-prof/time-profiler-2.c scan-ipa-dump-times profile
>> "Read tp_first_run: 0" 2
>> FAIL: gcc.dg/tree-prof/time-profiler-2.c scan-ipa-dump-times profile
>> "Read tp_first_run: 2" 1
>> FAIL: gcc.dg/tree-prof/time-profiler-2.c scan-ipa-dump-times profile
>> "Read tp_first_run: 3" 1
>
> Hmm, I don't see these FAILs.  Looking at the testcase there are
> no undefined uses so I wonder how the patch could have any effect.
>
> Can you re-check and open a bugreport?

It disappeared again.  A different failure appeared and disappeared a
few weeks ago.  Something in the testsuite infrastructure appears to
not be stable, at least on AIX.  Sorry for the incorrect report.

- David


[C++ Patch Ping] PR 64644 (""warning: anonymous union with no members" should be an error with -pedantic-errors")

2017-09-15 Thread Paolo Carlini

Hi,

gently pinging this.

On 16/06/2017 15:47, Paolo Carlini wrote:

Hi,

submitter and Manuel analyzed this a while ago and came to the 
conclusion - which I think is still valid vs the current working draft 
- that strictly speaking this kind of code violates [dcl.dcl], thus a 
pedwarn seems more suited than a plain warning. The below one-liner, 
suggested by Manuel at the time, passes testing on x86_64-linux 
together with my testsuite changes.


https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01193.html

Thanks!
Paolo.


Re: [RFC][PACH 3/5] Prevent tree unroller from completely unrolling inner loops if that results in excessive strided-loads in outer loop

2017-09-15 Thread Richard Biener
On Fri, Sep 15, 2017 at 5:44 AM, Andrew Pinski  wrote:
> On Thu, Sep 14, 2017 at 6:30 PM, Kugan Vivekanandarajah
>  wrote:
>> This patch prevent tree unroller from completely unrolling inner loops if 
>> that
>> results in excessive strided-loads in outer loop.
>
> Same comments from the RTL version.
>
> Though one more comment here:
> +  if (!INDIRECT_REF_P (op)

There's no INDIRECT_REF in GIMPLE.

> +  && TREE_CODE (op) != MEM_REF
> +  && TREE_CODE (op) != TARGET_MEM_REF)
> +continue;
>
> This does not handle ARRAY_REF which might be/should be handled.

It looks like he wants to do

 op = get_base_address (op);

first.

But OTOH the routine looks completely bogus to me ...

You want to do

  find_data_references_in_stmt ()

and then look at the data-refs and the evolution of their access fns.

The function needs _way_ more comments though, you have to apply excessive
guessing as to what it computes.  It also feels like this should be a target
hook but part of some generic cost modeling infrastructure and the target
should instead provide the number of load/store streams it can handle
well (aka HW-prefetch).  That would be also (very) useful information
for the loop distribution pass.

Related information that is missing is for the vectorizer peeling cost model
the number of store buffers when deciding whether to peel stores for alignment
for example.
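The kind of cost question being discussed here — how many distinct load streams a loop body touches versus how many streams the hardware prefetcher can track — can be modeled roughly as grouping memory references by base object and counting the groups. A toy sketch under that assumption (hypothetical structures, not the GCC data-reference API):

```c
#include <assert.h>
#include <stddef.h>

/* Rough model of "count the load streams": group the loop's reads by
   base object and count distinct bases; a target hook could then
   compare the count against the number of HW prefetch streams the
   micro-architecture handles well.  */
struct dref { const void *base; long stride; };

static unsigned
count_load_streams (const struct dref *refs, size_t n)
{
  const void *seen[32];
  unsigned nstreams = 0;
  for (size_t i = 0; i < n; i++)
    {
      int dup = 0;
      for (unsigned j = 0; j < nstreams; j++)
        if (seen[j] == refs[i].base)
          dup = 1;
      if (!dup && nstreams < 32)
        seen[nstreams++] = refs[i].base;
    }
  return nstreams;
}
```

A real implementation would, as suggested above, derive the bases and strides from `find_data_references_in_stmt` and the access-function evolutions rather than guessing from the IL.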

Richard.

>
> +  if ((loop_father = loop_outer (loop)))
>
> Since you don't use loop_father outside of the if statement use the
> following (allowed) format
> if (struct loop *loop_father = loop_outer (loop))
>
> Thinking about this more, hw_prefetchers_avail might not be equivalent
> to num_slots (PARAM_SIMULTANEOUS_PREFETCHES) but the name does not fit
> what it means if I understand your hardware correctly.
> Maybe hw_load_non_cacheline_prefetcher_avail since if I understand the
> micro-arch is that the prefetchers are not based on the cacheline
> being loaded.
>
> Thanks,
> Andrew
>
>>
>> Thanks,
>> Kugan
>>
>> gcc/ChangeLog:
>>
>> 2017-09-12  Kugan Vivekanandarajah  
>>
>> * config/aarch64/aarch64.c (count_mem_load_streams): New.
>> (aarch64_ok_to_unroll): New.
>> * doc/tm.texi (ok_to_unroll): Define new target hook.
>> * doc/tm.texi.in (ok_to_unroll): Likewise.
>> * target.def (ok_to_unroll): Likewise.
>> * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Use
>>   ok_to_unroll while unrolling.


Re: [RFC][PATCH 4/5] Change iv_analyze_result to take const_rtx.

2017-09-15 Thread Richard Biener
On Fri, Sep 15, 2017 at 3:31 AM, Kugan Vivekanandarajah
 wrote:
> Change iv_analyze_result to take const_rtx. This is just to make the
> next patch compile. No functional changes:

Ok.

Richard.

> Thanks,
> Kugan
>
> gcc/ChangeLog:
>
> 2017-09-12  Kugan Vivekanandarajah  
>
> * cfgloop.h (iv_analyze_result): Change 2nd param from rtx to
>   const_rtx.
> * df-core.c (df_find_def): Likewise.
> * df.h (df_find_def): Likewise.
> * loop-iv.c (iv_analyze_result): Likewise.


Re: [RFC][PATCH 1/5] Add separate parms for rtl unroller

2017-09-15 Thread Richard Biener
On Fri, Sep 15, 2017 at 3:27 AM, Kugan Vivekanandarajah
 wrote:
> This patch adds separate params for rtl unroller so that they can be
> tuned accordingly. Default values I have are based on some testing on
> aarch64. I am happy to leave it as the current value and set them in
> the back-end.

PARAM_MAX_AVERAGE_UNROLLED_INSNS is only used by the RTL
unroller.  Why should we separate PARAM_MAX_UNROLL_TIMES?

PARAM_MAX_UNROLLED_INSNS is only used by gimple passes
that perform unrolling.  Since GIMPLE is three-address it should
match RTL reasonably well -- but I'd be ok in having a separate param
for those.  But I wouldn't name those 'partial'.

That said, those are magic numbers and I expect we can find some
that work well on RTL and GIMPLE.

Richard.

>
> Thanks,
> Kugan
>
>
> gcc/ChangeLog:
>
> 2017-09-12  Kugan Vivekanandarajah  
>
> * loop-unroll.c (decide_unroll_constant_iterations): Use new params.
> (decide_unroll_runtime_iterations): Likewise.
> (decide_unroll_stupid): Likewise.
> * params.def (DEFPARAM): Separate and add new params for rtl unroller.


Re: Backports for GCC 6 branch

2017-09-15 Thread Martin Liška
On 09/15/2017 10:26 AM, Martin Liška wrote:
> Hello.
> 
> I'm going to install following backports.
> 
> Patches can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Martin
> 

Small correction, as the gcc/c-family/c-attribs.c file was not yet created in GCC 6.
Thus here is an updated version of the patches, which I've just tested.

Martin
>From ab3e0619b82c0033c8c65417ffa796e6731e6d0f Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 30 Aug 2017 12:38:31 +
Subject: [PATCH 7/7] Backport r251530

gcc/ChangeLog:

2017-08-30  Martin Liska  

	PR inline-asm/82001
	* ipa-icf-gimple.c (func_checker::compare_tree_list_operand):
	Rename to ...
	(func_checker::compare_asm_inputs_outputs): ... this function.
	(func_checker::compare_gimple_asm): Use the function to compare
	also ASM constrains.
	* ipa-icf-gimple.h: Rename the function.

gcc/testsuite/ChangeLog:

2017-08-30  Martin Liska  

	PR inline-asm/82001
	* gcc.dg/ipa/pr82001.c: New test.
---
 gcc/ipa-icf-gimple.c   | 19 +--
 gcc/ipa-icf-gimple.h   |  6 +++---
 gcc/testsuite/gcc.dg/ipa/pr82001.c | 21 +
 3 files changed, 37 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr82001.c

diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c
index 1c5aebf00ca..9a9012c6983 100644
--- a/gcc/ipa-icf-gimple.c
+++ b/gcc/ipa-icf-gimple.c
@@ -543,11 +543,8 @@ func_checker::compare_operand (tree t1, tree t2)
 }
 }
 
-/* Compares two tree list operands T1 and T2 and returns true if these
-   two trees are semantically equivalent.  */
-
 bool
-func_checker::compare_tree_list_operand (tree t1, tree t2)
+func_checker::compare_asm_inputs_outputs (tree t1, tree t2)
 {
   gcc_assert (TREE_CODE (t1) == TREE_LIST);
   gcc_assert (TREE_CODE (t2) == TREE_LIST);
@@ -560,6 +557,16 @@ func_checker::compare_tree_list_operand (tree t1, tree t2)
   if (!compare_operand (TREE_VALUE (t1), TREE_VALUE (t2)))
 	return return_false ();
 
+  tree p1 = TREE_PURPOSE (t1);
+  tree p2 = TREE_PURPOSE (t2);
+
+  gcc_assert (TREE_CODE (p1) == TREE_LIST);
+  gcc_assert (TREE_CODE (p2) == TREE_LIST);
+
+  if (strcmp (TREE_STRING_POINTER (TREE_VALUE (p1)),
+		  TREE_STRING_POINTER (TREE_VALUE (p2))) != 0)
+	return return_false ();
+
   t2 = TREE_CHAIN (t2);
 }
 
@@ -1008,7 +1015,7 @@ func_checker::compare_gimple_asm (const gasm *g1, const gasm *g2)
   tree input1 = gimple_asm_input_op (g1, i);
   tree input2 = gimple_asm_input_op (g2, i);
 
-  if (!compare_tree_list_operand (input1, input2))
+  if (!compare_asm_inputs_outputs (input1, input2))
 	return return_false_with_msg ("ASM input is different");
 }
 
@@ -1017,7 +1024,7 @@ func_checker::compare_gimple_asm (const gasm *g1, const gasm *g2)
   tree output1 = gimple_asm_output_op (g1, i);
   tree output2 = gimple_asm_output_op (g2, i);
 
-  if (!compare_tree_list_operand (output1, output2))
+  if (!compare_asm_inputs_outputs (output1, output2))
 	return return_false_with_msg ("ASM output is different");
 }
 
diff --git a/gcc/ipa-icf-gimple.h b/gcc/ipa-icf-gimple.h
index 9530a8ed55c..c572a181736 100644
--- a/gcc/ipa-icf-gimple.h
+++ b/gcc/ipa-icf-gimple.h
@@ -215,9 +215,9 @@ public:
  is returned.  */
   bool compare_operand (tree t1, tree t2);
 
-  /* Compares two tree list operands T1 and T2 and returns true if these
- two trees are semantically equivalent.  */
-  bool compare_tree_list_operand (tree t1, tree t2);
+  /* Compares GIMPLE ASM inputs (or outputs) where we iterate tree chain
+ and compare both TREE_PURPOSEs and TREE_VALUEs.  */
+  bool compare_asm_inputs_outputs (tree t1, tree t2);
 
   /* Verifies that trees T1 and T2, representing function declarations
  are equivalent from perspective of ICF.  */
diff --git a/gcc/testsuite/gcc.dg/ipa/pr82001.c b/gcc/testsuite/gcc.dg/ipa/pr82001.c
new file mode 100644
index 000..05e32b10ef5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr82001.c
@@ -0,0 +1,21 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -fdump-ipa-icf-details"  } */
+
+int
+mullo (int a, int b)
+{
+  asm("mul %%edx   # %%1 was %1"
+  : "+"
+	"a"(a),
+	"+d"(b));
+  return a;
+}
+
+int
+mulhi (int a, int b)
+{
+  asm("mul %%edx   # %%1 was %1" : "+d"(a), "+a"(b));
+  return a;
+}
+
+/* { dg-final { scan-ipa-dump "Equal symbols: 0" "icf"  } } */
-- 
2.14.1

>From dbe394036c46f5f97d23b93b526991c018f6f8d9 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 29 Aug 2017 08:35:46 +
Subject: [PATCH 6/7] Backport r251406

gcc/ada/ChangeLog:

2017-08-29  Martin Liska  

	PR other/39851
	* gcc-interface/trans.c (Pragma_to_gnu): Set argument to NULL.
---
 gcc/ada/gcc-interface/trans.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/gcc-interface/trans.c 

[PATCH] Fix PR82217

2017-09-15 Thread Richard Biener

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2017-09-15  Richard Biener  

PR tree-optimization/82217
* tree-ssa-sccvn.c (visit_phi): Properly handle all VN_TOP
but not undefined case.

* gcc.dg/torture/pr82217.c: New testcase.

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 252780)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -3901,13 +3901,10 @@ visit_phi (gimple *phi)
  if only a single edge is exectuable use its value.  */
   if (n_executable <= 1)
 result = seen_undef ? seen_undef : sameval;
-  /* If we saw only undefined values create a new undef SSA name to
- avoid false equivalences.  */
+  /* If we saw only undefined values and VN_TOP use one of the
+ undefined values.  */
   else if (sameval == VN_TOP)
-{
-  gcc_assert (seen_undef);
-  result = seen_undef;
-}
+result = seen_undef ? seen_undef : sameval;
   /* First see if it is equivalent to a phi node in this block.  We prefer
  this as it allows IV elimination - see PRs 66502 and 67167.  */
   else if ((result = vn_phi_lookup (phi)))
Index: gcc/testsuite/gcc.dg/torture/pr82217.c
===
--- gcc/testsuite/gcc.dg/torture/pr82217.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr82217.c  (working copy)
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+
+int a, b, c;
+
+void fn1 ()
+{ 
+  while (1)
+{ 
+  if (c)
+   goto L2;
+  break;
+}
+  if (c)
+{
+L1:
+   {
+ int g[1];
+ if (b)
+   goto L1;
+ goto L1;
+L2:
+ for (a = 0; a;)
+   goto L1;
+   }
+}
+}
+
+int main ()
+{ 
+  fn1 ();
+  return 0;
+}
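The visit_phi change makes undefined PHI arguments behave like VN_TOP in the meet: both are absorbed by any defined value, and if only undefined values were seen, one of them becomes the result. A minimal C model of that lattice meet (hypothetical encoding, not GCC's representation):

```c
#include <assert.h>

/* Lattice values: TOP and UNDEF both carry "no information" and are
   absorbed by any concrete value (encoded here as >= 2); two different
   concrete values meet to "varying", modeled as -1.  */
enum { TOP = 0, UNDEF = 1 };

static int
meet (int a, int b)
{
  if (a == TOP || a == UNDEF)
    return b;
  if (b == TOP || b == UNDEF)
    return a;
  return a == b ? a : -1;
}
```

Before the fix, the VN_TOP branch asserted that an undefined value had been seen; afterwards it simply falls back to `sameval` (i.e. VN_TOP) when no undefined value is available, which this meet models uniformly.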


Re: [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE

2017-09-15 Thread Richard Biener
On Thu, Sep 14, 2017 at 4:38 PM, Bill Schmidt
 wrote:
> On Sep 14, 2017, at 5:15 AM, Richard Biener  
> wrote:
>>
>> On Wed, Sep 13, 2017 at 10:14 PM, Bill Schmidt
>>  wrote:
>>> On Sep 13, 2017, at 10:40 AM, Bill Schmidt  
>>> wrote:

 On Sep 13, 2017, at 7:23 AM, Richard Biener  
 wrote:
>
> On Tue, Sep 12, 2017 at 11:08 PM, Will Schmidt
>  wrote:
>> Hi,
>>
>> [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE
>>
>> Folding of vector loads in GIMPLE.
>>
>> Add code to handle gimple folding for the vec_ld builtins.
>> Remove the now obsoleted folding code for vec_ld from rs6000-c.c. 
>> Surrounding
>> comments have been adjusted slightly so they continue to read OK for the
>> existing vec_st code.
>>
>> The resulting code is specifically verified by the 
>> powerpc/fold-vec-ld-*.c
>> tests which have been posted separately.
>>
>> For V2 of this patch, I've removed the chunk of code that prohibited the
>> gimple fold from occurring in BE environments.   This had fixed an issue
>> for me earlier during my development of the code, and turns out this was
>> not necessary.  I've sniff-tested after removing that check and it looks
>> OK.
>>
>>> + /* Limit folding of loads to LE targets.  */
>>> +  if (BYTES_BIG_ENDIAN || VECTOR_ELT_ORDER_BIG)
>>> +return false;
>>
>> I've restarted a regression test on this updated version.
>>
>> OK for trunk (assuming successful regression test completion)  ?
>>
>> Thanks,
>> -Will
>>
>> [gcc]
>>
>>  2017-09-12  Will Schmidt  
>>
>>  * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling
>>  for early folding of vector loads (ALTIVEC_BUILTIN_LVX_*).
>>  * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
>>  Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_LD.
>>
>> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
>> index fbab0a2..bb8a77d 100644
>> --- a/gcc/config/rs6000/rs6000-c.c
>> +++ b/gcc/config/rs6000/rs6000-c.c
>> @@ -6470,92 +6470,19 @@ altivec_resolve_overloaded_builtin (location_t 
>> loc, tree fndecl,
>>   convert (TREE_TYPE (stmt), arg0));
>> stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
>> return stmt;
>>   }
>>
>> -  /* Expand vec_ld into an expression that masks the address and
>> - performs the load.  We need to expand this early to allow
>> +  /* Expand vec_st into an expression that masks the address and
>> + performs the store.  We need to expand this early to allow
>>the best aliasing, as by the time we get into RTL we no longer
>>are able to honor __restrict__, for example.  We may want to
>>consider this for all memory access built-ins.
>>
>>When -maltivec=be is specified, or the wrong number of arguments
>>is provided, simply punt to existing built-in processing.  */
>> -  if (fcode == ALTIVEC_BUILTIN_VEC_LD
>> -  && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)
>> -  && nargs == 2)
>> -{
>> -  tree arg0 = (*arglist)[0];
>> -  tree arg1 = (*arglist)[1];
>> -
>> -  /* Strip qualifiers like "const" from the pointer arg.  */
>> -  tree arg1_type = TREE_TYPE (arg1);
>> -  if (!POINTER_TYPE_P (arg1_type) && TREE_CODE (arg1_type) != 
>> ARRAY_TYPE)
>> -   goto bad;
>> -
>> -  tree inner_type = TREE_TYPE (arg1_type);
>> -  if (TYPE_QUALS (TREE_TYPE (arg1_type)) != 0)
>> -   {
>> - arg1_type = build_pointer_type (build_qualified_type 
>> (inner_type,
>> -   0));
>> - arg1 = fold_convert (arg1_type, arg1);
>> -   }
>> -
>> -  /* Construct the masked address.  Let existing error handling take
>> -over if we don't have a constant offset.  */
>> -  arg0 = fold (arg0);
>> -
>> -  if (TREE_CODE (arg0) == INTEGER_CST)
>> -   {
>> - if (!ptrofftype_p (TREE_TYPE (arg0)))
>> -   arg0 = build1 (NOP_EXPR, sizetype, arg0);
>> -
>> - tree arg1_type = TREE_TYPE (arg1);
>> - if (TREE_CODE (arg1_type) == ARRAY_TYPE)
>> -   {
>> - arg1_type = TYPE_POINTER_TO (TREE_TYPE (arg1_type));
>> - tree const0 = build_int_cstu (sizetype, 0);
>> - tree arg1_elt0 = build_array_ref (loc, arg1, const0);
>> - arg1 = build1 (ADDR_EXPR, arg1_type, arg1_elt0);
>> -   }
>> -
>> -  
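For context, the `vec_ld` (lvx) semantics being folded above: the instruction ignores the low four bits of the effective address, i.e. base plus offset is masked down to a 16-byte boundary before the 16-byte load. A sketch of just the address computation (integer addresses used to keep the sketch portable):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* lvx effective-address computation: base + offset with the low four
   bits cleared, so the load always comes from a 16-byte boundary.
   This is the masking the GIMPLE folding makes explicit so that
   aliasing information survives into later passes.  */
static uintptr_t
lvx_effective_address (uintptr_t base, ptrdiff_t offset)
{
  return (base + (uintptr_t) offset) & ~(uintptr_t) 15;
}
```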

  1   2   >