Re: [PATCH] fold a * (a > 0 ? 1 : -1) to abs(a) and related optimizations

2017-06-23 Thread Andrew Pinski
Forgot the patch

On Fri, Jun 23, 2017 at 8:59 PM, Andrew Pinski  wrote:
> Hi,
>   I saw this on llvm's review site (https://reviews.llvm.org/D34579)
> and I thought why not add it to GCC.  I expanded more than what was
> done on the LLVM patch.
>
> I added the following optimizations:
> Transform X * (X > 0 ? 1 : -1) into ABS(X).
> Transform X * (X >= 0 ? 1 : -1) into ABS(X).
> Transform X * (X > 0.0 ? 1.0 : -1.0) into ABS(X).
> Transform X * (X >= 0.0 ? 1.0 : -1.0) into ABS(X).
> Transform X * (X > 0 ? -1 : 1) into -ABS(X).
> Transform X * (X >= 0 ? -1 : 1) into -ABS(X).
> Transform X * (X > 0.0 ? -1.0 : 1.0) into -ABS(X).
> Transform X * (X >= 0.0 ? -1.0 : 1.0) into -ABS(X).
> Transform X * (X < 0 ? 1 : -1) into -ABS(X).
> Transform X * (X <= 0 ? 1 : -1) into -ABS(X).
> Transform X * (X < 0.0 ? 1.0 : -1.0) into -ABS(X).
> Transform X * (X <= 0.0 ? 1.0 : -1.0) into -ABS(X).
> Transform X * (X < 0 ? -1 : 1) into ABS(X).
> Transform X * (X <= 0 ? -1 : 1) into ABS(X).
> Transform X * (X < 0.0 ? -1.0 : 1.0) into ABS(X).
> Transform X * (X <= 0.0 ? -1.0 : 1.0) into ABS(X).
>
> The floating-point ones only happen when not honoring SNaNs and not
> honoring signed zeros.
>
> OK?  Bootstrapped and tested on aarch64-linux-gnu with no regressions.
>
> Thanks,
> Andrew Pinski
>
> ChangeLog:
> * match.pd ( X * (X >/>=/
> Testsuite/ChangeLog:
> * testsuite/gcc.dg/tree-ssa/mult-abs-1.c: New testcase.
> * testsuite/gcc.dg/tree-ssa/mult-abs-2.c: New testcase.
Index: match.pd
===
--- match.pd(revision 249613)
+++ match.pd(working copy)
@@ -155,6 +155,55 @@
|| !COMPLEX_FLOAT_TYPE_P (type)))
(negate @0)))
 
+(for cmp (gt ge)
+ /* Transform X * (X > 0 ? 1 : -1) into ABS(X). */
+ /* Transform X * (X >= 0 ? 1 : -1) into ABS(X). */
+ (simplify
+  (mult:c @0 (cond (cmp @0 integer_zerop) integer_onep integer_all_onesp))
+  (abs @0))
+ /* Transform X * (X > 0.0 ? 1.0 : -1.0) into ABS(X). */
+ /* Transform X * (X >= 0.0 ? 1.0 : -1.0) into ABS(X). */
+ (simplify
+  (mult:c @0 (cond (cmp @0 real_zerop) real_onep real_minus_onep))
+  (if (!HONOR_SNANS (type) && !HONOR_SIGNED_ZEROS (type))
+   (abs @0)))
+ /* Transform X * (X > 0 ? -1 : 1) into -ABS(X). */
+ /* Transform X * (X >= 0 ? -1 : 1) into -ABS(X). */
+ (simplify
+  (mult:c @0 (cond (cmp @0 integer_zerop) integer_all_onesp integer_onep))
+  (negate (abs @0)))
+ /* Transform X * (X > 0.0 ? -1.0 : 1.0) into -ABS(X). */
+ /* Transform X * (X >= 0.0 ? -1.0 : 1.0) into -ABS(X). */
+ (simplify
+  (mult:c @0 (cond (cmp @0 real_zerop) real_minus_onep real_onep))
+  (if (!HONOR_SNANS (type) && !HONOR_SIGNED_ZEROS (type))
+   (negate (abs @0)))))
+
+(for cmp (lt le)
+ /* Transform X * (X < 0 ? 1 : -1) into -ABS(X). */
+ /* Transform X * (X <= 0 ? 1 : -1) into -ABS(X). */
+ (simplify
+  (mult:c @0 (cond (cmp @0 integer_zerop) integer_onep integer_all_onesp))
+  (negate (abs @0)))
+ /* Transform X * (X < 0.0 ? 1.0 : -1.0) into -ABS(X). */
+ /* Transform X * (X <= 0.0 ? 1.0 : -1.0) into -ABS(X). */
+ (simplify
+  (mult:c @0 (cond (cmp @0 real_zerop) real_onep real_minus_onep))
+  (if (!HONOR_SNANS (type) && !HONOR_SIGNED_ZEROS (type))
+   (negate (abs @0))))
+ /* Transform X * (X < 0 ? -1 : 1) into ABS(X). */
+ /* Transform X * (X <= 0 ? -1 : 1) into ABS(X). */
+ (simplify
+  (mult:c @0 (cond (cmp @0 integer_zerop) integer_all_onesp integer_onep))
+  (abs @0))
+ /* Transform X * (X < 0.0 ? -1.0 : 1.0) into ABS(X). */
+ /* Transform X * (X <= 0.0 ? -1.0 : 1.0) into ABS(X). */
+ (simplify
+  (mult:c @0 (cond (cmp @0 real_zerop) real_minus_onep real_onep))
+  (if (!HONOR_SNANS (type) && !HONOR_SIGNED_ZEROS (type))
+   (abs @0))))
+
+
 /* X * 1, X / 1 -> X.  */
 (for op (mult trunc_div ceil_div floor_div round_div exact_div)
   (simplify
Index: testsuite/gcc.dg/tree-ssa/mult-abs-1.c
===
--- testsuite/gcc.dg/tree-ssa/mult-abs-1.c  (nonexistent)
+++ testsuite/gcc.dg/tree-ssa/mult-abs-1.c  (working copy)
@@ -0,0 +1,35 @@
+/* { dg-options "-O2 -fdump-tree-gimple" } */
+/* { dg-do compile } */
+int f(int x)
+{
+  return x * (x > 0 ? -1 : 1);
+}
+int f1(int x)
+{
+  return x * (x > 0 ? 1 : -1);
+}
+int g(int x)
+{
+  return x * (x >= 0 ? -1 : 1);
+}
+int g1(int x)
+{
+  return x * (x >= 0 ? 1 : -1);
+}
+int h(int x)
+{
+  return x * (x < 0 ? -1 : 1);
+}
+int h1(int x)
+{
+  return x * (x < 0 ? 1 : -1);
+}
+int i(int x)
+{
+  return x * (x <= 0 ? -1 : 1);
+}
+int i1(int x)
+{
+  return x * (x <= 0 ? 1 : -1);
+}
+/* { dg-final { scan-tree-dump-times "ABS" 8 "gimple"} } */
Index: testsuite/gcc.dg/tree-ssa/mult-abs-2.c
===
--- testsuite/gcc.dg/tree-ssa/mult-abs-2.c  (nonexistent)
+++ testsuite/gcc.dg/tree-ssa/mult-abs-2.c  (working copy)
@@ -0,0 +1,35 @@
+/* { dg-options "-O2 -ffast-math -fdump-tree-gimple" } */
+/* { dg-do compile } */
+float f(float x)

Re: [PR80693] drop value of parallel SETs dropped by combine

2017-06-23 Thread Alexandre Oliva
On Jun 22, 2017, Segher Boessenkool  wrote:

> On Thu, Jun 22, 2017 at 03:21:01AM -0300, Alexandre Oliva wrote:
>> On Jun  8, 2017, Segher Boessenkool  wrote:
>> 
>> > [ I missed this patch the first time around; please cc: me to prevent this 
>> > ]
>> 
>> > On Thu, May 18, 2017 at 07:25:57AM -0300, Alexandre Oliva wrote:
>> >> When an insn used by combine has multiple SETs, only the non-REG_UNUSED
>> >> set is used: others will end up dropped on the floor.
>> 
>> > Sometimes, yes; not always.
>> 
>> You mean sets to non-REGs, I suppose.  I didn't take them into account
>> in my statement indeed, but I think it still applies: can_combine_p will
>> reject parallel SETs if two or more of them don't have a REG_UNUSED note
>> for their respective SET_DESTs.

> can_combine_p is not called for I3; it also isn't called until after
> I2 is split, if that happens.

Oh, I see what you mean now.  I just don't think of I3 as "used"; other
insns are "used" in that they are substituted, in whole or in part, into
I3.

On Jun 22, 2017, Segher Boessenkool  wrote:

> On Thu, Jun 22, 2017 at 09:25:21AM -0300, Alexandre Oliva wrote:
>> On Jun  8, 2017, Segher Boessenkool  wrote:
>> 
>> > Would it work to just have "else" instead if this "if"?  Or hrm, we'll
>> > need to kill the recorded reg_stat value in the last case before this
>> > as well?
>> 
>> The patch below (is this what you meant?)

> Yes exactly.

>> How's this?  (I haven't run regression tests yet)

> Looks a lot better digestable than the previous patch, thanks!

> Things should probably be restructured a bit so we keep the sets count
> correct, if that is possible?

I'll have to think a bit to figure out the exact conditions in which to
decrement the sets count, and reset the recorded value.  I was thinking
the conditions were the same; am I missing something?

Or are you getting at cases in which we should do both and don't, or
vice-versa?  E.g., if reg_referenced_p holds but the subsequent test
doesn't?  I guess we do, but don't we have to distinguish the cases of
an original unused set remaining from that of reusing the pseudo for a
new set?

Do we have to test whether from_insn still reg_sets_p the REG_UNUSED
operand, when from_insn is not i3?  (e.g., it could be something that
remains set in i1 as a side effect, but that's not used in either i2 or
i3)

Am I overdoing this?  The situations I had to analyze in the patch I
posted before were much simpler, and even then I now think I missed a
number of them :-)

>> When an insn used by combine has multiple SETs, only the non-REG_UNUSED
>> set is used: others will end up dropped on the floor.

> write something simpler like

>   If combine drops a REG_UNUSED SET, [...]

Nice, thanks.

> Similar for this comment.  (It has a stray tab btw, before "We").

*nod*

> Maybe use "long long"?  Less incorrect/misleading on 32-bit targets ;-)

Sure, thanks.

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


C++ PATCH for c++/79056, C++17 ICE with invalid template-id

2017-06-23 Thread Jason Merrill
In this testcase, in C++17 mode after parsing the template-id fails,
we try to treat the template-name as a class template deduction
placeholder, producing an auto type, and then we try to take its
TREE_TYPE when calling cp_parser_check_for_invalid_type_id, which
doesn't work because it's a type, not a TYPE_DECL.  Fixed by removing
the TREE_TYPE and adjusting cp_parser_check_for_invalid_type_id to
handle TYPE_DECL.

Really we shouldn't be creating a deduction placeholder in the first
place when the template name is followed by <, but that distinction
shouldn't affect well-formed code since we try to parse a template-id
first.

I still want to improve our handling of parse errors in template-ids;
we never actually diagnose the syntax error here because it's
swallowed by tentative parsing, we just say "argument 1 is invalid".

Tested x86_64-pc-linux-gnu, applying to trunk.
commit bbea244e8042460d49970b7fffce13f79e58d8e9
Author: Jason Merrill 
Date:   Tue Jun 20 15:32:06 2017 -0400

PR c++/79056 - C++17 ICE with invalid template syntax.

* parser.c (cp_parser_simple_type_specifier): Don't assume that type
is a TYPE_DECL.
(cp_parser_check_for_invalid_template_id): Handle TYPE_DECL.
* pt.c (template_placeholder_p): New.
* cp-tree.h: Declare it.

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 40c113b..33dde15 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6413,6 +6413,7 @@ extern void check_template_variable   (tree);
 extern tree make_auto  (void);
 extern tree make_decltype_auto (void);
 extern tree make_template_placeholder  (tree);
+extern bool template_placeholder_p (tree);
 extern tree do_auto_deduction   (tree, tree, tree);
 extern tree do_auto_deduction   (tree, tree, tree,
  tsubst_flags_t,
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index ddb1cf3..97cd923 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -2983,7 +2983,9 @@ cp_parser_check_for_invalid_template_id (cp_parser* parser,
 
   if (cp_lexer_next_token_is (parser->lexer, CPP_LESS))
 {
-  if (TYPE_P (type))
+  if (TREE_CODE (type) == TYPE_DECL)
+   type = TREE_TYPE (type);
+  if (TYPE_P (type) && !template_placeholder_p (type))
error_at (location, "%qT is not a template", type);
   else if (identifier_p (type))
{
@@ -17060,7 +17062,7 @@ cp_parser_simple_type_specifier (cp_parser* parser,
   /* There is no valid C++ program where a non-template type is
 followed by a "<".  That usually indicates that the user
 thought that the type was a template.  */
-  cp_parser_check_for_invalid_template_id (parser, TREE_TYPE (type),
+  cp_parser_check_for_invalid_template_id (parser, type,
   none_type,
   token->location);
 }
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index fba7fb1..392fba0 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -24799,6 +24799,14 @@ make_template_placeholder (tree tmpl)
   return t;
 }
 
+/* True iff T is a C++17 class template deduction placeholder.  */
+
+bool
+template_placeholder_p (tree t)
+{
+  return is_auto (t) && CLASS_PLACEHOLDER_TEMPLATE (t);
+}
+
 /* Make a "constrained auto" type-specifier. This is an
auto type with constraints that must be associated after
deduction.  The constraint is formed from the given
diff --git a/gcc/testsuite/g++.dg/parse/template28.C b/gcc/testsuite/g++.dg/parse/template28.C
new file mode 100644
index 000..6868bc8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/parse/template28.C
@@ -0,0 +1,10 @@
+// PR c++/79056
+
+template <class T> struct A {};
+
+template <class T> void foo(A<T> = A<T>()) {} // { dg-error "" }
+
+void bar()
+{
+  foo(A<int>());   // { dg-error "" }
+}


Re: [PATCH v3, rs6000] Add vec_reve support

2017-06-23 Thread Segher Boessenkool
Hi Carl,

On Fri, Jun 23, 2017 at 02:59:05PM -0700, Carl Love wrote:
> +(define_expand "altivec_vreve<mode>2"
> +  [(set (match_operand:VEC_A 0 "register_operand" "=v")
> + (unspec:VEC_A [(match_operand:VEC_A 1 "register_operand" "v")]
> +   UNSPEC_VREVEV))]
> +  "TARGET_ALTIVEC"
> +{
> +  int i, j, size, num_elements;
> +  rtvec v = rtvec_alloc (16);
> +  rtx mask = gen_reg_rtx (V16QImode);
> +
> +  size = GET_MODE_UNIT_SIZE (<MODE>mode);
> +  num_elements = GET_MODE_NUNITS (<MODE>mode);
> +
> +  for (j = num_elements - 1; j >= 0; j--)

You're still running this loop backwards, is that on purpose?  If not,
please fix.

> +for (i = 0; i < size; i++)
> +  RTVEC_ELT (v, i + j * size)
> + =  gen_rtx_CONST_INT (QImode, i + (num_elements - 1 - j) * size);

Why not just GEN_INT?

> +/* { dg-do run { target { powerpc*-*-linux* } } } */
> +/* { dg-require-effective-target vsx_hw } */
> +/* { dg-options "-O2 -mvsx" } */

Does it actually use VSX?  The condition on the expander is just
TARGET_ALTIVEC.  Or will the testcase not work without VSX for some
other reason?

The rest looks good.  Okay for trunk if you can take care of these final
few things.

Thanks!


Segher


[PATCH v3, rs6000] Add vec_reve support

2017-06-23 Thread Carl Love
GCC maintainers:

I have updated the patch per the comments from Segher.  The vec_reve
builtin does work on Power 7, the test file was fixed to run on Power 7
as well.

The updated patch was tested on powerpc64le-unknown-linux-gnu
(Power 8 LE), powerpc64-unknown-linux-gnu (Power 8 BE), and
powerpc64-unknown-linux-gnu (Power 7).

Please let me know if you see anything else that needs fixing.  Thanks.

   Carl Love


gcc/ChangeLog:

2017-06-23  Carl Love  

* config/rs6000/rs6000-c.c: Add support for built-in functions
vector bool char vec_reve (vector bool char);
vector signed char vec_reve (vector signed char);
vector unsigned char vec_reve (vector unsigned char);
vector bool int vec_reve (vector bool int);
vector signed int vec_reve (vector signed int);
vector unsigned int vec_reve (vector unsigned int);
vector bool long long vec_reve (vector bool long long);
vector signed long long vec_reve (vector signed long long);
vector unsigned long long vec_reve (vector unsigned long long);
vector bool short vec_reve (vector bool short);
vector signed short vec_reve (vector signed short);
vector double vec_reve (vector double);
vector float vec_reve (vector float);
* config/rs6000/rs6000-builtin.def (VREVE_V2DI, VREVE_V4SI,
VREVE_V8HI, VREVE_V16QI, VREVE_V2DF, VREVE_V4SF, VREVE): New builtin.
* config/rs6000/altivec.md (UNSPEC_VREVEV): New UNSPEC.
(altivec_vreve<mode>2): New pattern.
* config/rs6000/altivec.h (vec_reve): New define.
* doc/extend.texi (vec_rev): Update the built-in documentation file
for the new built-in functions.

gcc/testsuite/ChangeLog:

2017-06-23  Carl Love  

* gcc.target/powerpc/builtins-3-vec_reve-runnable.c:
Add new runnable test file for the vec_rev built-ins.
---
 gcc/config/rs6000/altivec.h|   1 +
 gcc/config/rs6000/altivec.md   |  26 ++
 gcc/config/rs6000/rs6000-builtin.def   |   9 +
 gcc/config/rs6000/rs6000-c.c   |  29 ++
 gcc/doc/extend.texi|  13 +
 .../powerpc/builtins-3-vec_reve-runnable.c | 394 +
 6 files changed, 472 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-3-vec_reve-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index d542315..dd68ae1 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -142,6 +142,7 @@
 #define vec_madd __builtin_vec_madd
 #define vec_madds __builtin_vec_madds
 #define vec_mtvscr __builtin_vec_mtvscr
+#define vec_reve __builtin_vec_vreve
 #define vec_vmaxfp __builtin_vec_vmaxfp
 #define vec_vmaxsw __builtin_vec_vmaxsw
 #define vec_vmaxsh __builtin_vec_vmaxsh
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 25b2768..ac86e43 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -46,6 +46,7 @@
UNSPEC_VPACK_UNS_UNS_SAT
UNSPEC_VPACK_UNS_UNS_MOD
UNSPEC_VPACK_UNS_UNS_MOD_DIRECT
+   UNSPEC_VREVEV
UNSPEC_VSLV4SI
UNSPEC_VSLO
UNSPEC_VSR
@@ -3727,6 +3728,31 @@
   DONE;
 }")
 
+;; Vector reverse elements
+(define_expand "altivec_vreve<mode>2"
+  [(set (match_operand:VEC_A 0 "register_operand" "=v")
+   (unspec:VEC_A [(match_operand:VEC_A 1 "register_operand" "v")]
+ UNSPEC_VREVEV))]
+  "TARGET_ALTIVEC"
+{
+  int i, j, size, num_elements;
+  rtvec v = rtvec_alloc (16);
+  rtx mask = gen_reg_rtx (V16QImode);
+
+  size = GET_MODE_UNIT_SIZE (<MODE>mode);
+  num_elements = GET_MODE_NUNITS (<MODE>mode);
+
+  for (j = num_elements - 1; j >= 0; j--)
+for (i = 0; i < size; i++)
+  RTVEC_ELT (v, i + j * size)
+   =  gen_rtx_CONST_INT (QImode, i + (num_elements - 1 - j) * size);
+
+  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
+  emit_insn (gen_altivec_vperm_<mode> (operands[0], operands[1],
+operands[1], mask));
+  DONE;
+})
+
 ;; Vector SIMD PEM v2.06c defines LVLX, LVLXL, LVRX, LVRXL,
 ;; STVLX, STVLXL, STVVRX, STVRXL are available only on Cell.
 (define_insn "altivec_lvlx"
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index 4682628..20974b4 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -1130,6 +1130,13 @@ BU_ALTIVEC_1 (VUPKLSB, "vupklsb",CONST,  altivec_vupklsb)
 BU_ALTIVEC_1 (VUPKLPX,   "vupklpx",CONST,  altivec_vupklpx)
 BU_ALTIVEC_1 (VUPKLSH,   "vupklsh",CONST,  altivec_vupklsh)
 
+BU_ALTIVEC_1 (VREVE_V2DI,  "vreve_v2di", CONST,  altivec_vrevev2di2)
+BU_ALTIVEC_1 (VREVE_V4SI,  "vreve_v4si", CONST,  altivec_vrevev4si2)
+BU_ALTIVEC_1 (VREVE_V8HI,  "vreve_v8hi", CONST,  

Re: [PATCH, ARM/AArch64] drop aarch32 support for falkor/qdf24xx

2017-06-23 Thread Jim Wilson
On Mon, Jun 12, 2017 at 3:40 AM, James Greenhalgh
 wrote:
> In both the original patch, and the backport, you're modifying the
> AArch64 options here. I'd expect the edits to be to the AArch32 options
> (these start somewhere around line 15,000).

Yes, I screwed this up.  Richard Earnshaw already fixed the ARM Option
list in one of his -mcpu patches.  I checked in a fix for the AArch64
Option list under the obvious rule.  Tested with a make doc, and using
info to look at the docs to make sure that they are right.

I will fix the gcc-7 backport before I check it in.

Jim
	gcc/
	* doc/invoke.texi (AArch64 Options, -mtune): Re-add falkor and
	qdf24xx.

Index: doc/invoke.texi
===
--- doc/invoke.texi	(revision 249611)
+++ doc/invoke.texi	(working copy)
@@ -14079,7 +14079,8 @@ Specify the name of the target processor for which
 performance of the code.  Permissible values for this option are:
 @samp{generic}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a55},
 @samp{cortex-a57}, @samp{cortex-a72}, @samp{cortex-a73}, @samp{cortex-a75},
-@samp{exynos-m1}, @samp{xgene1}, @samp{vulcan}, @samp{thunderx},
+@samp{exynos-m1}, @samp{falkor}, @samp{qdf24xx},
+@samp{xgene1}, @samp{vulcan}, @samp{thunderx},
 @samp{thunderxt88}, @samp{thunderxt88p1}, @samp{thunderxt81},
 @samp{thunderxt83}, @samp{thunderx2t99}, @samp{cortex-a57.cortex-a53},
 @samp{cortex-a72.cortex-a53}, @samp{cortex-a73.cortex-a35},


RE: [PATCH][ARM] Fix static analysis warnings in arm backend

2017-06-23 Thread Michael Collison
Hi Eric,

The warnings are listed in the PR here:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68535

Regards,

Michael Collison

-Original Message-
From: Eric Gallager [mailto:eg...@gwmail.gwu.edu] 
Sent: Friday, June 23, 2017 2:37 PM
To: Michael Collison 
Cc: GCC Patches ; nd 
Subject: Re: [PATCH][ARM] Fix static analysis warnings in arm backend

On 6/23/17, Michael Collison  wrote:
> This patch cleans up warning messages due to unused variables and 
> overly complicated loop structures.
>
> Okay for trunk?
>
> 2017-03-30  Michael Collison  
>
>   PR target/68535
>   * config/arm/arm.c (gen_ldm_seq): Remove last unnecessary
>   set of base_reg
>   (arm_gen_movmemqi): Removed unused variable 'i'.
>   Convert 'for' loop into 'while' loop.
>   (arm_expand_prologue): Remove last unnecessary set of insn.
>   (thumb_pop): Remove unused variable 'pushed_words'.
>   (thumb_exit): Remove last unnecessary set of regs_to_pop.
>

What were the actual warning messages, could you post those?
Eric


Re: [PATCH][ARM] Fix static analysis warnings in arm backend

2017-06-23 Thread Eric Gallager
On 6/23/17, Michael Collison  wrote:
> This patch cleans up warning messages due to unused variables and overly
> complicated loop structures.
>
> Okay for trunk?
>
> 2017-03-30  Michael Collison  
>
>   PR target/68535
>   * config/arm/arm.c (gen_ldm_seq): Remove last unnecessary
>   set of base_reg
>   (arm_gen_movmemqi): Removed unused variable 'i'.
>   Convert 'for' loop into 'while' loop.
>   (arm_expand_prologue): Remove last unnecessary set of insn.
>   (thumb_pop): Remove unused variable 'pushed_words'.
>   (thumb_exit): Remove last unnecessary set of regs_to_pop.
>

What were the actual warning messages, could you post those?
Eric


libgo patch RFC: Test the runtime package with the go tool

2017-06-23 Thread Ian Lance Taylor
Many of the tests of the runtime package require invoking the go tool,
and therefore are not run when testing libgo.  This patch to
gotools/Makefile.am adds an additional test of the runtime package on
a native system, this time letting it run the newly built go tool.
This provides significantly more testing.  Bootstrapped and ran Go
testsuite on x86_64-pc-linux-gnu.

Ian


2017-06-23  Ian Lance Taylor  

* Makefile.am (MOSTLYCLEANFILES): Remove testing files and logs.
(mostlyclean-local): Remove check-runtime-dir.
(ECHO_ENV): Define.
(check-go-tool): Depend on cgo.  Write command to testlog.
(check-runtime): New target.
(check): Depend on check-runtime.  Add @ to prettify output.
* Makefile.in: Rebuild.
Index: Makefile.am
===
--- Makefile.am (revision 249203)
+++ Makefile.am (working copy)
@@ -106,7 +106,12 @@ s-zdefaultcc: Makefile
$(SHELL) $(srcdir)/../move-if-change zdefaultcc.go.tmp zdefaultcc.go
$(STAMP) $@ 
 
-MOSTLYCLEANFILES = zdefaultcc.go s-zdefaultcc
+MOSTLYCLEANFILES = \
+   zdefaultcc.go s-zdefaultcc \
+   check-gccgo gotools.head *-testlog gotools.sum gotools.log
+
+mostlyclean-local:
+   rm -rf check-go-dir check-runtime-dir
 
 if NATIVE
 
@@ -156,6 +161,7 @@ check-gccgo: Makefile
chmod +x $@
 
 # CHECK_ENV sets up the environment to run the newly built go tool.
+# If you change this, change ECHO_ENV, below.
 CHECK_ENV = \
PATH=`echo $(abs_builddir):$${PATH} | sed 
's,::*,:,g;s,^:*,,;s,:*$$,,'`; \
export PATH; \
@@ -169,25 +175,54 @@ CHECK_ENV = \
LD_LIBRARY_PATH=`echo $${abs_libgodir}/.libs:$${LD_LIBRARY_PATH} | sed 
's,::*,:,g;s,^:*,,;s,:*$$,,'`; \
export LD_LIBRARY_PATH;
 
+# ECHO_ENV is a variant of CHECK_ENV to put into a testlog file.
+# It assumes that abs_libgodir is set.
+ECHO_ENV = PATH=`echo $(abs_builddir):$${PATH} | sed 
's,::*,:,g;s,^:*,,;s,:*$$,,'` GCCGO='$(abs_builddir)/check-gccgo' 
GCCGOTOOLDIR='$(abs_builddir)' GO_TESTING_GOTOOLS=yes LD_LIBRARY_PATH=`echo 
$${abs_libgodir}/.libs:$${LD_LIBRARY_PATH} | sed 's,::*,:,g;s,^:*,,;s,:*$$,,'`
+
 # check-go-tools runs `go test cmd/go` in our environment.
-check-go-tool: go$(EXEEXT) check-head check-gccgo
-   rm -rf check-go-dir
+check-go-tool: go$(EXEEXT) cgo$(EXEEXT) check-head check-gccgo
+   rm -rf check-go-dir cmd_go-testlog
$(MKDIR_P) check-go-dir/src/cmd/go
cp $(cmdsrcdir)/go/*.go check-go-dir/src/cmd/go/
cp $(libgodir)/zstdpkglist.go check-go-dir/src/cmd/go/
cp zdefaultcc.go check-go-dir/src/cmd/go/
cp -r $(cmdsrcdir)/go/testdata check-go-dir/src/cmd/go/
+   @abs_libgodir=`cd $(libgodir) && $(PWD_COMMAND)`; \
+   abs_checkdir=`cd check-go-dir && $(PWD_COMMAND)`; \
+   echo "cd check-go-dir/src/cmd/go && $(ECHO_ENV) GOPATH=$${abs_checkdir} 
$(abs_builddir)/go$(EXEEXT) test -test.short -test.v" > cmd_go-testlog
$(CHECK_ENV) \
GOPATH=`cd check-go-dir && $(PWD_COMMAND)`; \
export GOPATH; \
-   (cd check-go-dir/src/cmd/go && $(abs_builddir)/go$(EXEEXT) test 
-test.short -test.v) >& cmd_go-testlog || true
+   (cd check-go-dir/src/cmd/go && $(abs_builddir)/go$(EXEEXT) test 
-test.short -test.v) >> cmd_go-testlog 2>&1 || true
grep '^--- ' cmd_go-testlog | sed -e 's/^--- \(.*\) ([^)]*)$$/\1/'
 
+# check-runtime runs `go test runtime` in our environment.
+# The runtime package is also tested as part of libgo,
+# but the runtime tests use the go tool heavily, so testing
+# here too will catch more problems.
+check-runtime: go$(EXEEXT) cgo$(EXEEXT) check-head check-gccgo
+   rm -rf check-runtime-dir runtime-testlog
+   $(MKDIR_P) check-runtime-dir
+   @abs_libgodir=`cd $(libgodir) && $(PWD_COMMAND)`; \
+   LD_LIBRARY_PATH=`echo $${abs_libgodir}/.libs:$${LD_LIBRARY_PATH} | sed 
's,::*,:,g;s,^:*,,;s,:*$$,,'`; \
+   GOARCH=`$(abs_builddir)/go$(EXEEXT) env GOARCH`; \
+   GOOS=`$(abs_builddir)/go$(EXEEXT) env GOOS`; \
+   files=`$(SHELL) $(libgosrcdir)/../match.sh --goarch=$${GOARCH} 
--goos=$${GOOS} --srcdir=$(libgosrcdir)/runtime 
--extrafiles="$(libgodir)/runtime_sysinfo.go $(libgodir)/sigtab.go" 
--tag=libffi`; \
+   echo "$(ECHO_ENV) GC='$(abs_builddir)/check-gccgo 
-fgo-compiling-runtime' GOARCH=$${GOARCH} GOOS=$${GOOS} $(SHELL) 
$(libgosrcdir)/../testsuite/gotest --goarch=$${GOARCH} --goos=$${GOOS} 
--basedir=$(libgosrcdir)/.. --srcdir=$(libgosrcdir)/runtime --pkgpath=runtime 
--pkgfiles='$${files}' -test.v" > runtime-testlog
+   $(CHECK_ENV) \
+   GC="$${GCCGO} -fgo-compiling-runtime"; \
+   export GC; \
+   GOARCH=`$(abs_builddir)/go$(EXEEXT) env GOARCH`; \
+   GOOS=`$(abs_builddir)/go$(EXEEXT) env GOOS`; \
+   files=`$(SHELL) $(libgosrcdir)/../match.sh --goarch=$${GOARCH} 
--goos=$${GOOS} --srcdir=$(libgosrcdir)/runtime 
--extrafiles="$(libgodir)/runtime_sysinfo.go $(libgodir)/sigtab.go" 
--tag=libffi`; \
+   $(SHELL) 

[PATCH][ARM] Fix static analysis warnings in arm backend

2017-06-23 Thread Michael Collison
This patch cleans up warning messages due to unused variables and overly 
complicated loop structures.

Okay for trunk?

2017-03-30  Michael Collison  

PR target/68535
* config/arm/arm.c (gen_ldm_seq): Remove last unnecessary
set of base_reg
(arm_gen_movmemqi): Removed unused variable 'i'.
Convert 'for' loop into 'while' loop.
(arm_expand_prologue): Remove last unnecessary set of insn.
(thumb_pop): Remove unused variable 'pushed_words'.
(thumb_exit): Remove last unnecessary set of regs_to_pop.


pr6294.patch
Description: pr6294.patch


Re: [C++ Patch] PR 62315 ("do not print typename in diagnostic if the original code does not have it")

2017-06-23 Thread Jason Merrill
OK.

On Fri, Jun 2, 2017 at 4:35 AM, Paolo Carlini  wrote:
> Hi,
>
> a while ago Manuel noticed that printing 'typename' in error messages about
> missing 'typename' can be confusing. That seems easy to fix, in fact we
> already handle correctly a similar situation in grokdeclarator. Tested
> x86_64-linux.
>
> Thanks, Paolo.
>
> //
>
>


libgo patch committed: Complete defer handling in CgocallBackDone

2017-06-23 Thread Ian Lance Taylor
This patch to libgo completes the handling of a cgo-generated defer in
CgocallBackDone.

When C code calls a Go function, it actually calls a function
generated by cgo. That function is written in Go, and, among other
things, it calls the real Go function like this:
    CgocallBack()
    defer CgocallBackDone()
    RealGoFunction()
The deferred CgocallBackDone function enters syscall mode as we return
to C. Typically the C function will then eventually return to Go.

However, in the case where the C function is running on a thread
created in C, it will not return to Go. For that case we will have
allocated an m struct, with an associated g struct, for the duration
of the Go code, and when the Go is complete we will return the m and g
to a free list.

That all works, but we are running in a deferred function, which means
that we have been invoked by deferreturn, and deferreturn expects to
do a bit of cleanup to record that the defer has been completed. Doing
that cleanup while using an m and g that have already been returned to
the free list is clearly a bad idea. It was kind of working because
deferreturn was holding the g pointer in a local variable, but there
were races with some other thread picking up and using the newly freed
g.  It was also kind of working because of a special check in
freedefer; that check is no longer necessary.

This patch changes the special case of releasing the m and g to do the
defer cleanup in CgocallBackDone itself.

This patch also checks for the special case of a panic through
CgocallBackDone. In that special case, we don't want to release the m
and g. Since we are returning to C code that was not called by Go
code, we know that the panic is not going to be caught and we are
going to exit the program. So for that special case we keep the m and
g structs so that the rest of the panic code can use them.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 249609)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-fc0cfdff94ca1099421900f43837ca5a70189cd6
+0a20181d00d43a423c55f4e772b759fba0619478
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/runtime/cgo_gccgo.go
===
--- libgo/go/runtime/cgo_gccgo.go   (revision 249205)
+++ libgo/go/runtime/cgo_gccgo.go   (working copy)
@@ -95,9 +95,34 @@ func CgocallBack() {
 // CgocallBackDone prepares to return to C/C++ code that has called
 // into Go code.
 func CgocallBackDone() {
+   // If we are the top level Go function called from C/C++, then
+   // we need to release the m. But don't release it if we are
+   // panicing; since this is the top level, we are going to
+   // crash the program, and we need the g and m to print the
+   // panic values.
+   //
+   // Dropping the m is going to clear g. This function is being
+   // called as a deferred function, so we will return to
+   // deferreturn which will want to clear the _defer field.
+   // As soon as we call dropm another thread may call needm and
+   // start using g, so we must not tamper with the _defer field
+   // after dropm. So clear _defer now.
+   gp := getg()
+   mp := gp.m
+   drop := false
+   if mp.dropextram && mp.ncgo == 0 && gp._panic == nil {
+   d := gp._defer
+   if d == nil || d.link != nil {
+   throw("unexpected g._defer in CgocallBackDone")
+   }
+   gp._defer = nil
+   freedefer(d)
+   drop = true
+   }
+
entersyscall(0)
-   mp := getg().m
-   if mp.dropextram && mp.ncgo == 0 {
+
+   if drop {
mp.dropextram = false
dropm()
}
Index: libgo/go/runtime/panic.go
===
--- libgo/go/runtime/panic.go   (revision 249590)
+++ libgo/go/runtime/panic.go   (working copy)
@@ -143,14 +143,6 @@ func newdefer() *_defer {
 //
 //go:nosplit
 func freedefer(d *_defer) {
-   // When C code calls a Go function on a non-Go thread, the
-   // deferred call to cgocallBackDone will set g to nil.
-   // Don't crash trying to put d on the free list; just let it
-   // be garbage collected.
-   if getg() == nil {
-   return
-   }
-
pp := getg().m.p.ptr()
if len(pp.deferpool) == cap(pp.deferpool) {
// Transfer half of local cache to the central cache.
@@ -201,6 +193,15 @@ func deferreturn(frame *bool) {
fn(d.arg)
}
 
+   // If we are returning from a Go function called by a
+   // C function running in a C thread, g may now be nil,
+

libgo patch committed: Don't require GOROOT

2017-06-23 Thread Ian Lance Taylor
This patch to libgo changes the libgo version of the go tool to not
require GOROOT.  GOROOT is only required for the gc toolchain, and is
not necessary for gccgo.  This fixes running the gotools testsuite in
a GCC build directory when using a --prefix for which you've never run
`make install`.  Bootstrapped and ran gotools testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 249599)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-c49ba1ca392b3c23a4b3934e0a95a908b1dc2f1d
+fc0cfdff94ca1099421900f43837ca5a70189cd6
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/cmd/go/main.go
===
--- libgo/go/cmd/go/main.go (revision 249205)
+++ libgo/go/cmd/go/main.go (working copy)
@@ -155,8 +155,13 @@ func main() {
}
 
if fi, err := os.Stat(goroot); err != nil || !fi.IsDir() {
-   fmt.Fprintf(os.Stderr, "go: cannot find GOROOT directory: 
%v\n", goroot)
-   os.Exit(2)
+   // For gccgo this is fine, carry on.
+   // Note that this check is imperfect as we have not yet
+   // parsed the -compiler flag.
+   if runtime.Compiler != "gccgo" {
+   fmt.Fprintf(os.Stderr, "go: cannot find GOROOT 
directory: %v\n", goroot)
+   os.Exit(2)
+   }
}
 
// Set environment (GOOS, GOARCH, etc) explicitly.


Re: [PATCH, GCC/ARM, Stage 1] Rename FPSCR builtins to correct names

2017-06-23 Thread Christophe Lyon
Hi Thomas,

On 23 June 2017 at 17:48, Thomas Preudhomme
 wrote:
> Hi Kyrill,
>
>
> On 10/04/17 15:01, Kyrill Tkachov wrote:
>>
>> Hi Prakhar,
>> Sorry for the delay,
>>
>> On 22/03/17 10:46, Prakhar Bahuguna wrote:
>>>
>>> The GCC documentation in section 6.60.8 ARM Floating Point Status and
>>> Control
>>> Intrinsics states that the FPSCR register can be read and written to
>>> using the
>>> intrinsics __builtin_arm_get_fpscr and __builtin_arm_set_fpscr. However,
>>> these
>>> are misnamed within GCC itself and these intrinsic names are not
>>> recognised.
>>> This patch corrects the intrinsic names to match the documentation, and
>>> adds
>>> tests to verify these intrinsics generate the correct instructions.
>>>
>>> Testing done: Ran regression tests on arm-none-eabi for Cortex-M4.
>>>
>>> 2017-03-09  Prakhar Bahuguna  
>>>
>>> gcc/ChangeLog:
>>>
>>> * gcc/config/arm/arm-builtins.c (arm_init_builtins): Rename
>>>   __builtin_arm_ldfscr to __builtin_arm_get_fpscr, and rename
>>>   __builtin_arm_stfscr to __builtin_arm_set_fpscr.
>>> * gcc/testsuite/gcc.target/arm/fpscr.c: New file.
>>>
>>> Okay for stage 1?
>>
>>
>> I see that the mistake was in not addressing one of the review comments
>> in:
>> https://gcc.gnu.org/ml/gcc-patches/2014-04/msg01832.html
>> properly in the patch that added these functions :(
>>
>> This is ok for stage 1 if a bootstrap and test on arm-none-linux-gnueabihf
>> works
>> fine
>> I don't think we want to maintain the __builtin_arm_[ld,st]fscr names for
>> backwards compatibility
>> as they were not documented and are __builtin_arm* functions that we don't
>> guarantee to maintain.
>
>
> How about a backport to GCC 5, 6 & 7? The patch applied cleanly on each of
> these versions and the testsuite didn't show any regression for any of the
> backport when run for Cortex-M7.
>

There's a problem with GCC 5:
gcc.target/arm/fpscr.c: unknown effective target keyword
`arm_fp_ok' for " dg-require-effective-target 4 arm_fp_ok "

Indeed arm_fp_ok effective-target does not exist in the gcc-5 branch.

Christophe

> Patches attached for reference.
>
> ChangeLog entries:
>
> *** gcc/ChangeLog ***
>
> 2017-06-20  Thomas Preud'homme  
>
> Backport from mainline
> 2017-05-04  Prakhar Bahuguna  
>
> * gcc/config/arm/arm-builtins.c (arm_init_builtins): Rename
> __builtin_arm_ldfscr to __builtin_arm_get_fpscr, and rename
> __builtin_arm_stfscr to __builtin_arm_set_fpscr.
>
>
> *** gcc/testsuite/ChangeLog ***
>
> 2017-06-20  Thomas Preud'homme  
>
> Backport from mainline
> 2017-05-04  Prakhar Bahuguna  
>
> gcc/testsuite/
> * gcc.target/arm/fpscr.c: New file.
>
>
> Best regards,
>
> Thomas


Re: [libcilkrts] Fix 64-bit SPARC/Linux port

2017-06-23 Thread David Miller
From: Eric Botcazou 
Date: Fri, 23 Jun 2017 19:34:54 +0200

> Since libcilkrts was ported to the SPARC architecture by Rainer, running the 
> testsuite on SPARC/Linux in 64-bit mode with sufficiently high parallelism has 
> resulted in an almost guaranteed kernel panic.
> 
> Fixed thusly, tested on SPARC64/Linux and SPARC/Solaris, applied to mainline 
> and 7 branch.  Rainer kindly agreed to submit a copy of the fix to the master 
> repository when he gets a chance.

Ok, but the kernel shouldn't crash because of a bad stack pointer.

The fact that an unaligned stack access causes the problem is a good
clue.  Thanks, I'll try to look into this.


[PATCH/AARCH64] Improve cost of arithmetic instructions with shift/extend on ThunderX2T99

2017-06-23 Thread Andrew Pinski
Hi,
  This patch is similar to what I did for ThunderX, where I increase
the costs of these instructions slightly to reflect that there is a
slight cost to using them over two separate instructions.

OK?  Bootstrapped and tested on aarch64-linux-gnu with no regressions.
This gave ~2% on SPEC CPU 2006 int.

Thanks,
Andrew Pinski

ChangeLog:
* config/aarch64/aarch64-cost-tables.h (thunderx2t99_extra_costs):
Increment Arith_shift, Arith_shift_reg, Log_shift, Log_shift_reg and
Extend_arith by 1.
Index: gcc/config/aarch64/aarch64-cost-tables.h
===
--- gcc/config/aarch64/aarch64-cost-tables.h(revision 249583)
+++ gcc/config/aarch64/aarch64-cost-tables.h(working copy)
@@ -239,12 +239,12 @@
 0, /* Logical.  */
 0, /* Shift.  */
 0, /* Shift_reg.  */
-COSTS_N_INSNS (1), /* Arith_shift.  */
-COSTS_N_INSNS (1), /* Arith_shift_reg.  */
-COSTS_N_INSNS (1), /* Log_shift.  */
-COSTS_N_INSNS (1), /* Log_shift_reg.  */
+COSTS_N_INSNS (1)+1,   /* Arith_shift.  */
+COSTS_N_INSNS (1)+1,   /* Arith_shift_reg.  */
+COSTS_N_INSNS (1)+1,   /* Log_shift.  */
+COSTS_N_INSNS (1)+1,   /* Log_shift_reg.  */
 0, /* Extend.  */
-COSTS_N_INSNS (1), /* Extend_arith.  */
+COSTS_N_INSNS (1)+1,   /* Extend_arith.  */
 0, /* Bfi.  */
 0, /* Bfx.  */
 COSTS_N_INSNS (3), /* Clz.  */


Re: [nvptx, PATCH, 3/3] Add v2di support

2017-06-23 Thread Jeff Law
On 06/06/2017 07:12 AM, Tom de Vries wrote:
> Hi,
> 
> this patch adds v2di support to the nvptx target.  This allows us to
> generate 128-bit loads and stores.
> 
> Tested in nvptx mainkernel mode and x86_64 accelerator mode.
> 
> OK for trunk?
> 
> Thanks,
> - Tom
> 
> 0003-Add-v2di-support.patch
> 
> 
> Add v2di support
> 
> 2017-06-06  Tom de Vries  
> 
>   * config/nvptx/nvptx-modes.def: Add V2DImode.
>   * config/nvptx/nvptx-protos.h (nvptx_data_alignment): Declare.
>   * config/nvptx/nvptx.c (nvptx_ptx_type_from_mode): Handle V2DImode.
>   (nvptx_output_mov_insn): Handle lack of mov.b128.
>   (nvptx_print_operand): Handle 'H' and 'L' codes.
>   (nvptx_vector_mode_supported): Allow V2DImode.
>   (nvptx_preferred_simd_mode): New function.
>   (nvptx_data_alignment): New function.
>   (TARGET_VECTORIZE_PREFERRED_SIMD_MODE): Redefine to
>   nvptx_preferred_simd_mode.
>   * config/nvptx/nvptx.h (STACK_BOUNDARY, BIGGEST_ALIGNMENT): Change from
>   64 to 128 bits.
>   (DATA_ALIGNMENT): Define.  Set to nvptx_data_alignment.
> 
>   * config/nvptx/nvptx.md (VECIM): Add V2DI.
> 
>   * gcc.target/nvptx/decl-init.c: Update alignment.
>   * gcc.target/nvptx/slp-2-run.c: New test.
>   * gcc.target/nvptx/slp-2.c: New test.
>   * gcc.target/nvptx/v2di.c: New test.
> 
>   * testsuite/libgomp.oacc-c/vec.c: New test.
>
OK.  I'm going to take your word that bumping STACK_BOUNDARY is the
right thing to do  rather than dynamic realignment.  Presumably mixing
code from different PTX compilers isn't something we're really worrying
about yet anyway...

jeff


Re: [PATCH] Fix expand_builtin_atomic_fetch_op for pre-op (PR80902)

2017-06-23 Thread Jeff Law
On 06/23/2017 11:44 AM, Segher Boessenkool wrote:
> On Thu, Jun 22, 2017 at 10:59:05PM -0600, Jeff Law wrote:
>> On 05/28/2017 06:31 AM, Segher Boessenkool wrote:
>>> __atomic_add_fetch adds a value to some memory, and returns the result.
>>> If there is no direct support for this, expand_builtin_atomic_fetch_op
>>> is asked to implement this as __atomic_fetch_add (which returns the
>>> original value of the mem), followed by the addition.  Now, the
>>> __atomic_add_fetch could have been a tail call, but we shouldn't
>>> perform the __atomic_fetch_add as a tail call: following code would
>>> not be executed, and in fact thrown away because there is a barrier
>>> after tail calls.
>>>
>>> This fixes it.
> 
>>> PR middle-end/80902
>>> * builtins.c (expand_builtin_atomic_fetch_op): If emitting code after
>>> a call, force the call to not be a tail call.
>> Hmmm.  I wonder if we have similar problems elsewhere.  For example
>> expand_builtin_int_roundingfn_2, stack_protect_epilogue,
>> expand_builtin_trap (though this one probably isn't broken in practice),
>> expand_ifn_atomic_compare_exchange_into_call.
>>
>> OK, but please check the other instances where we call expand_call, then
>> continue generating code afterwards.  Fixing those can be a follow-up patch.
> 
> I guess we want an expand_call_notail helper? 
Probably.

> Or, hrm, why are function
> calls expanded as tail calls at all, should that not be decided later?
That's how I thought it worked.  We create two streams of insns, then
decide later which of the two streams to use.

But I think part of the criteria for creating streams was that call was
in the tail position to start with.  And that's not the case with the
code you pointed out and the others I found.

Jeff


Re: [nvptx, PATCH, 2/3 ] Add v2si support

2017-06-23 Thread Jeff Law
On 06/06/2017 07:05 AM, Tom de Vries wrote:
> Hi,
> 
> this patch adds v2si support to the nvptx target.
> 
> Tested in nvptx mainkernel mode and x86_64 accelerator mode.
> 
> OK for trunk?
> 
> Thanks,
> - Tom
> 
> 
> 0002-Add-v2si-support.patch
> 
> 
> Add v2si support
> 
> 2017-06-06  Tom de Vries  
> 
>   * config/nvptx/nvptx-modes.def: New file.  Add V2SImode.
>   * config/nvptx/nvptx.c (nvptx_ptx_type_from_mode): Handle V2SImode.
>   (nvptx_vector_mode_supported): New function.  Allow V2SImode.
>   (TARGET_VECTOR_MODE_SUPPORTED_P): Redefine to 
> nvptx_vector_mode_supported.
>   * config/nvptx/nvptx.md (VECIM): New mode iterator. Add V2SI.
>   (mov<mode>_insn): New define_insn.
>   (define_expand "mov<mode>"): New define_expand.
> 
>   * gcc.target/nvptx/slp-run.c: New test.
>   * gcc.target/nvptx/slp.c: New test.
>   * gcc.target/nvptx/v2si-cvt.c: New test.
>   * gcc.target/nvptx/v2si-run.c: New test.
>   * gcc.target/nvptx/v2si.c: New test.
>   * gcc.target/nvptx/vec.inc: New test.
OK.
jeff


Re: [PATCH] Fix expand_builtin_atomic_fetch_op for pre-op (PR80902)

2017-06-23 Thread Segher Boessenkool
On Thu, Jun 22, 2017 at 10:59:05PM -0600, Jeff Law wrote:
> On 05/28/2017 06:31 AM, Segher Boessenkool wrote:
> > __atomic_add_fetch adds a value to some memory, and returns the result.
> > If there is no direct support for this, expand_builtin_atomic_fetch_op
> > is asked to implement this as __atomic_fetch_add (which returns the
> > original value of the mem), followed by the addition.  Now, the
> > __atomic_add_fetch could have been a tail call, but we shouldn't
> > perform the __atomic_fetch_add as a tail call: following code would
> > not be executed, and in fact thrown away because there is a barrier
> > after tail calls.
> > 
> > This fixes it.

> > PR middle-end/80902
> > * builtins.c (expand_builtin_atomic_fetch_op): If emitting code after
> > a call, force the call to not be a tail call.
> Hmmm.  I wonder if we have similar problems elsewhere.  For example
> expand_builtin_int_roundingfn_2, stack_protect_epilogue,
> expand_builtin_trap (though this one probably isn't broken in practice),
> expand_ifn_atomic_compare_exchange_into_call.
> 
> OK, but please check the other instances where we call expand_call, then
> continue generating code afterwards.  Fixing those can be a follow-up patch.

I guess we want an expand_call_notail helper?  Or, hrm, why are function
calls expanded as tail calls at all, should that not be decided later?


Segher


[libcilkrts] Fix 64-bit SPARC/Linux port

2017-06-23 Thread Eric Botcazou
Since libcilkrts was ported to the SPARC architecture by Rainer, running the 
testsuite on SPARC/Linux in 64-bit mode with sufficiently high parallelism has 
resulted in an almost guaranteed kernel panic.

Fixed thusly, tested on SPARC64/Linux and SPARC/Solaris, applied to mainline 
and 7 branch.  Rainer kindly agreed to submit a copy of the fix to the master 
repository when he gets a chance.

* runtime/config/sparc/os-unix-sysdep.c (__cilkrts_getticks): Adjust
preprocessor test for SPARC/Linux.
* runtime/jmpbuf.h (CILK_[UN]ADJUST_SP): Likewise.

-- 
Eric Botcazou

Index: runtime/config/sparc/os-unix-sysdep.c
===
--- runtime/config/sparc/os-unix-sysdep.c	(revision 249451)
+++ runtime/config/sparc/os-unix-sysdep.c	(working copy)
@@ -47,7 +47,7 @@
  *  for your assistance in helping us improve Cilk Plus.
  *
  *
- * This file contains system-specific code for sparc-based systems
+ * This file contains system-specific code for SPARC-based systems
  */
 
 #include "os.h"
@@ -60,7 +60,7 @@
 COMMON_SYSDEP unsigned long long __cilkrts_getticks(void)
 {
 unsigned long long tick;
-#ifdef __sparcv9
+#if defined(__sparcv9) || defined(__arch64__)
 __asm__ volatile("rd %%tick, %0" : "=r"(tick));
 #else
 __asm__ volatile("rd %%tick, %L0\n"
Index: runtime/jmpbuf.h
===
--- runtime/jmpbuf.h	(revision 249451)
+++ runtime/jmpbuf.h	(working copy)
@@ -110,8 +110,8 @@
 /**
  * @brief Some architecture-dependent stack adjustment.
  */
-#if defined(__sparcv9)
-// Subtract sparc v9 stack bias so the actual stack starts at the
+#if defined(__sparcv9) || (defined(__sparc__) && defined(__arch64__))
+// Subtract SPARC V9 stack bias so the actual stack starts at the
 // allocated area.
 #   define CILK_ADJUST_SP(SP) ((SP) - 2047)
 #   define CILK_UNADJUST_SP(SP) ((SP) + 2047)


Re: [PATCH], PR target/80510, Optimize 32-bit offsettable memory references on power7/power8

2017-06-23 Thread Segher Boessenkool
On Thu, Jun 22, 2017 at 06:54:52PM -0400, Michael Meissner wrote:
> This patch implements the necessary move and peephole support for 32-bit ISA
> 2.05/2.06 (power7/power8) targets, so that the compiler can optimize:
> 
>   load FPR,   move FPR, ALTIVEC
>   move ALTIVEC, FPR   store FPR, 
> 
> into:
> 
>   ADDI GPR,  ADDI GPR, 
>   load ALTIVEC, GPR   store ALTIVEC, GPR

> Can I install this into the trunk and after a burn in period, install it on 
> the
> GCC 7 and GCC 6 branches (the previous patch for 64-bit is already installed 
> on
> both branches)?  If desired, I can make sure it gets into 6.4, or I can wait 
> to
> install the patch until after 6.4 ships.

Okay for trunk; okay for 7.  Also okay for 6, if you are confident it will
not cause problems.  Thanks,


Segher


libgo patch committed: Align siginfo argument to waitid

2017-06-23 Thread Ian Lance Taylor
Backport a patch just committed to gc tip.  There is a bug report on
the golang-dev mailing list
(https://groups.google.com/d/msg/golang-dev/sDg-t1_DPw0/-AJmLxgPBQAJ)
in which waitid running on MIPS returns EFAULT, and this patch may fix
the problem.  Bootstrapped and ran Go tests on x86_64-pc-linux-gnu.
Committed to mainline and GCC 7 branch.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 249595)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-f107cc8bced1939b0083231fc1ea24669ca4832c
+c49ba1ca392b3c23a4b3934e0a95a908b1dc2f1d
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/os/wait_waitid.go
===
--- libgo/go/os/wait_waitid.go  (revision 249205)
+++ libgo/go/os/wait_waitid.go  (working copy)
@@ -23,7 +23,7 @@ func (p *Process) blockUntilWaitable() (
// On Darwin, it requires greater than or equal to 64 bytes
// for darwin/{386,arm} and 104 bytes for darwin/amd64.
// We don't care about the values it returns.
-   var siginfo [128]byte
+   var siginfo [16]uint64
	psig := &siginfo[0]
_, _, e := syscall.Syscall6(syscall.SYS_WAITID, _P_PID, uintptr(p.Pid), 
uintptr(unsafe.Pointer(psig)), syscall.WEXITED|syscall.WNOWAIT, 0, 0)
runtime.KeepAlive(p)


Re: [nvptx, PATCH, 1/3] Add generic v2 vector mode support

2017-06-23 Thread Jeff Law
On 06/06/2017 07:02 AM, Tom de Vries wrote:
> Hi,
> 
> this patch adds generic v2 vector mode support for nvptx.
> 
> Tested in nvptx mainkernel mode and x86_64 accelerator mode.
> 
> OK for trunk?
> 
> Thanks,
> - Tom
> 
> 0001-Add-generic-v2-vector-mode-support.patch
> 
> 
> Add generic v2 vector mode support
> 
> 2017-06-06  Tom de Vries  
> 
>   * config/nvptx/nvptx.c (nvptx_print_operand): Handle v2 vector mode.
OK.
jeff


Re: [PATCH] PR ipa/81185, Improve naming of target_clone cloned function names

2017-06-23 Thread Jeff Law
On 06/22/2017 07:58 PM, Michael Meissner wrote:
> The June 19th, 2017 change from Martin Liska made the
> target_clones support more usable, in that it changed the external name
> from being the default function to being the ifunc handler.  This means that calls
> from other modules will call the appropriate clone based on what machine it is
> running on.
> 
> The name generated for each of the clone functions for non-default
> architectures has the string ".default." added to it, while it already
> has the various names for the different architectures.
> 
> I tracked this down to create_dispatcher_calls getting called for each of the
> clone functions, since they have the DECL_FUNCTION_VERSIONED bit set.
> 
> I have done bootstrap builds on both x86_64 and PowerPC and this patch builds
> the current GCC and has no regressions in the test suite.  Can I check it into
> the trunk?
> 
> 2017-06-22  Michael Meissner  
> 
>   PR ipa/81185
>   * multiple_target.c (create_dispatcher_calls): Only create the
>   dispatcher call if the function is the default clone of a
>   versioned function.
OK
jeff


Re: [PATCH][Testsuite] Use user defined memmove in gcc.c-torture/execute/builtins/memops-asm-lib.c

2017-06-23 Thread Jeff Law
On 06/23/2017 03:19 AM, Renlin Li wrote:
> Hi all,
> 
> After the change in r249278, bcopy is folded into memmove.  And the newlib
> aarch64 memmove implementation will call memcpy in certain conditions.
> The memcpy defined in memops-asm-lib.c will abort when the test is running.
> 
> In this case, I defined a user memmove function which bypasses the
> library one, so that memcpy won't be called accidentally.
> 
> Okay to commit?
> 
> gcc/testsuite/ChangeLog:
> 
> 2017-06-22  Renlin Li  
> Szabolcs Nagy  
> 
> * gcc.c-torture/execute/builtins/memops-asm-lib.c (my_memmove): New.
> * gcc.c-torture/execute/builtins/memops-asm.c (memmove): Declare
> memmove.
OK.
jeff


Re: [PATCH] Avoid UB in the Ada FE

2017-06-23 Thread Eric Botcazou
> Another option would be to change atomic_access_required_p to add
> *sync = false;
> before the first return, or to initialize bool sync = false; at the
> definition.

Yes, let's do the initialization at the definition (no need to retest).

-- 
Eric Botcazou


Re: [PATCH GCC][4/6]Simple patch skips single element component

2017-06-23 Thread Jeff Law
On 05/12/2017 05:28 AM, Bin Cheng wrote:
> Hi,
> This is a simple patch discarding single-element components earlier in 
> predcom.
> Bootstrap and test on x86_64 and AArch64, is it OK?
> 
> Thanks,
> bin
> 2017-05-10  Bin Cheng  
> 
>   * tree-predcom.c (determine_roots_comp): Skip single-elem chain.
> 
OK.
jeff


Re: [PATCH GCC][2/6]Compute available register for each register classes

2017-06-23 Thread Jeff Law
On 05/12/2017 05:27 AM, Bin Cheng wrote:
> Hi,
> Currently available/clobber registers are computed only for GENERAL_REGS, this
> patch extends it for all reg pressure classes.  It also updates existing uses
> in various places.
> 
> Bootstrap and test on x86_64 and AArch64.  Is it OK?
> 
> Thanks,
> bin
> 2017-05-10  Bin Cheng  
> 
>   * cfgloop.h (struct target_cfgloop): Change x_target_avail_regs and
>   x_target_clobbered_regs into array fields.
>   (init_avail_clobber_regs): New declaration.
>   * cfgloopanal.c (memmodel.h, ira.h): Include header files.
>   (init_set_costs): Remove computation for old x_target_avail_regs and
>   x_target_clobbered_regs fields.
>   (init_avail_clobber_regs): New function.
>   (estimate_reg_pressure_cost): Update the uses.
>   * toplev.c (cfgloop.h): Update comment why the header file is needed.
>   (backend_init_target): Call init_avail_clobber_regs.
>   * tree-predcom.c (memmodel.h, ira.h): Include header files.
>   (MAX_DISTANCE): Update the use.
>   * tree-ssa-loop-ivopts.c (determine_set_costs): Update the uses.
>   (determine_set_costs): Ditto.
> 
OK.
jeff


Re: [PATCH] reorganize block/string move/compare expansions out of rs6000.c

2017-06-23 Thread Segher Boessenkool
On Thu, Jun 22, 2017 at 04:01:54PM -0500, Aaron Sawdey wrote:
> This patch moves about 1400 lines of code for various block and string
> compare/move/zero expansions out of rs6000.c into a new file 
> rs6000-string.c. Segher had asked me to do this before I go adding new
> code here.
> 
> Bootstrap passes on ppc64le, regtest in progress. OK for trunk if that
> passes?


> 2017-06-22  Aaron Sawdey  
> 
>   * config/rs6000/rs6000-string.c (expand_block_clear,
>   do_load_for_compare, select_block_compare_mode,
>   compute_current_alignment, expand_block_compare,
>   expand_strncmp_align_check, expand_strn_compare,
>   expand_block_move, rs6000_output_load_multiple)
>   Move functions related to string/block move/compare
>   to a separate file.
>   * config/rs6000/rs6000.c Move above functions to rs6000-string.c.
>   * config/rs6000/rs6000-protos.h (rs6000_emit_dot_insn) Add prototype
>   for this function which is now used in two files.
>   * config/rs6000/t-rs6000 Add rule to compile rs6000-string.o.
>   * config.gcc Add rs6000-string.o to extra_objs for
>   targets powerpc*-*-* and rs6000*-*-*.

You're missing colons everywhere here.

> +/* Subroutines used for code generation on IBM RS/6000.

Please make this more specific.

> +   Copyright (C) 1991-2017 Free Software Foundation, Inc.
> +   Contributed by Richard Kenner (ken...@vlsi1.ultra.nyu.edu)

I don't think Richard wrote anything in this new file?

> +#define min(A,B) ((A) < (B) ? (A) : (B))
> +#define max(A,B) ((A) > (B) ? (A) : (B))

There are MIN and MAX in system.h already...  min is used only once, max
is used never; change it to use MIN?

Okay for trunk with those things taken care of.  Thanks!


Segher


Re: [PATCH GCC][1/6]Compute type mode and register pressure class mapping

2017-06-23 Thread Jeff Law
On 05/12/2017 05:27 AM, Bin Cheng wrote:
> Hi,
> This will be a patch series implementing an interface which estimates register
> pressure on tree ssa and uses the information in predictive common 
> optimization.
> This the first patch computing map from type modes to register pressure 
> classes.
> 
> Given there is no pseudo register on tree ssa form, we need type mode -> 
> register
> class information to compute register pressure.  This patch adds such map in
> struct target_ira and a function computing the map.  Though the map is 
> computed
> by guess, it's enough for use on tree level.  As a matter of fact, we only 
> need
> to identify GENERAL, FLOAT and VECTOR register classes.
> 
> Bootstrap and test on x86_64 and AArch64, is it OK?
> 
> Thanks,
> bin
> 
> 2017-05-10  Bin Cheng  
> 
>   * ira.c (setup_mode_classes): New function.
>   (find_reg_classes): Call above function.
>   * ira.h (struct target_ira): New field x_ira_mode_classes.
>   (ira_mode_classes): New macro.
> 
OK.
jeff


Re: C/C++ PATCH to add __typeof_noqual (PR c/65455, c/39985)

2017-06-23 Thread Joseph Myers
On Fri, 23 Jun 2017, Marek Polacek wrote:

> You'll also see that I dropped all qualifiers for __auto_type.  But I actually
> couldn't trigger the
> init_type = c_build_qualified_type (init_type, TYPE_UNQUALIFIED);
> line in c_parser_declaration_or_fndef (even when running the whole testsuite)
> so I'm not convinced it makes any difference.

It looks like it would only make a difference, in the present code, for 
the case of an atomic register variable, or bit-field in an atomic 
structure, as the initializer.  Those are the cases where 
convert_lvalue_to_rvalue would not return a non-atomic result, given an 
atomic argument.  With the proposed change, it should apply to any 
qualified lvalue used as the initializer.

> @@ -506,6 +508,7 @@ const struct c_common_resword c_common_reswords[] =
>{ "typename",  RID_TYPENAME,   D_CXXONLY | D_CXXWARN },
>{ "typeid",RID_TYPEID, D_CXXONLY | D_CXXWARN },
>{ "typeof",RID_TYPEOF, D_ASM | D_EXT },
> +  { "typeof_noqual", RID_TYPEOF_NOQUAL, D_ASM | D_EXT },
>{ "union", RID_UNION,  0 },
>{ "unsigned",  RID_UNSIGNED,   0 },
>{ "using", RID_USING,  D_CXXONLY | D_CXXWARN },

I don't think we should have this keyword variant.

I think there should be tests of the change to __auto_type.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [committed] Fix -Werror=class-memaccess failures in jit testsuite (PR jit/81144)

2017-06-23 Thread Martin Sebor

On 06/20/2017 06:54 PM, David Malcolm wrote:

On Tue, 2017-06-20 at 17:15 -0600, Martin Sebor wrote:

On 06/20/2017 03:25 PM, David Malcolm wrote:

This patch fixes a couple of failures of the form:

  error: 'void* memset(void*, int, size_t)' clearing an object of
  non-trivial type 'struct quadratic_test'; use assignment or
  value-initialization instead [-Werror=class-memaccess]
  note: 'struct quadratic_test' declared here
  cc1plus: all warnings being treated as errors

seen within the jit testsuite, by using zero-initialization instead
of memset.

(presumably introduced by r249234 aka
a324786b4ded9047d05463b4bce9d238b6c6b3ef)

Successfully tested on x86_64-pc-linux-gnu; takes jit.sum from:
  # of expected passes            9211
  # of unexpected failures        2
to:
  # of expected passes            9349

Martin: it's unclear to me what the benefit of the warning is for
these
cases.  AIUI, it's complaining because the code is calling
the default ctor for struct quadratic_test, and then that object is
being clobbered by the memset.
But if I'm reading things right, the default ctor for this struct
zero-initializes all fields.  Can't the compiler simply optimize
away
the redundant memset, and not issue a warning?


Thanks for the info.


-Wclass-memaccess is issued because struct quadratic_test contains
members of classes that define a default ctor to initialize their
private members.
The premise behind the warning is that objects
of types with user-defined default and copy ctors should be
initialized by making use of their ctors, and those with private
data members manipulated via member functions rather than by
directly modifying their raw representation.  Using memset to
bypass the default ctor doesn't begin the lifetime of an object,
can violate invariants set up by it, and using it to overwrite
private members breaks encapsulation.  Examples of especially
insidious errors include overwriting const data, references, or
pointer to data members for which zero-initialization isn't
the same as clearing their bytes.


If I'm reading my code correctly, all of the default ctors of all of
the members of this struct are "merely" initializing the pointer they
wrap to NULL.


Yes, that's my reading as well.



So the ctors are initializing everything to NULL, and then the memset
redundantly re-inits everything to 0 bits (I guess I was going for a
"belt and braces" approach to ensure that things are initialized).


The warning runs early on in the C++ front end and has no knowledge
of either the effects of the type's ctors, dtor, and copy assignment
operator, or whether the raw memory function is called in lieu of
initializing an object (e.g., in storage obtained from malloc or
operator new), or as a shortcut to zero out its members, or when
zeroing them out happens to be safe and doesn't actually do any
of those bad things I mentioned above.


Aha: so at the place where the warning runs it's not possible to access
the ctors and tell that they're assigning NULL everywhere?


Right, though I view it less as a limitation of the choice to
implement the warning in the FE and more as a feature.



Might it be possible to convert the warning to work in a two-phase way
where it first gathers up a vec of suspicious-looking modifications,
and then flushes them later, filtering against ctor information when it
has the latter?  (so that we don't have to warn for this case at
-Wall?)


With some effort I suppose it might be possible to do something
sophisticated like that but based on the warnings we've seen so
far I'm not convinced it's necessary or that it would time well
spent.  In my view, classes with user-defined ctors and other
special functions (dtors, copy assignment), i.e., basically non
trivial types, are preferably manipulated using these special
functions, and memset and friends should only be used only for
raw memory operations, not as a substitute for the former.  In
this case (as in most others I've seen, including the one in your
patch), the code is clearer, more concise, and in general, also
safer (and much easier for GCC to analyze for correctness than
calls to memset et al.)


Alternatively maybe this is PEBCAK at my end; if so, maybe a case for adding 
this to the changes.html page?  (and maybe adding some notes on workarounds 
there, and/or to invoke.texi?)


Sure, that sounds good to me.  Let me make a mental note to add
something to the manual.





That said, I'm sorry (and a little surprised) that I missed these
errors in my tests.  I thought I had all the languages covered by
using

   --enable-languages=all,ada,c,c++,fortran,go,lto,objc,obj-c++

but I guess jit still isn't implied by all, even after Nathan's
recent change to it.  Let me add jit to my script (IIRC, I once
had it there but it was causing some trouble and I took it out.)


Reading r248454 (aka 01b4453cde8f1871495955298043d9fb589e4a36), it
looks like "jit" is only included in "all" if you also pass
  --enable-host-shared

Re: [PATCH][Testsuite] Use user defined memmove in gcc.c-torture/execute/builtins/memops-asm-lib.c

2017-06-23 Thread Renlin Li

Hi Martin,

On 23/06/17 16:27, Martin Sebor wrote:

On 06/23/2017 03:19 AM, Renlin Li wrote:

Hi all,

After the change in r249278, bcopy is folded into memmove.  And the newlib
aarch64 memmove implementation will call memcpy in certain conditions.
The memcpy defined in memops-asm-lib.c will abort when the test is running.

In this case, I defined a user memmove function which bypasses the
library one, so that memcpy won't be called accidentally.

Okay to commit?


Having memmove call memcpy when there is no overlap seems like
a valid transformation.  I don't know which test specifically
fails so the question on my mind is whether it perhaps is overly
restrictive in assuming that this transformation must never take
place.  Other than that, although I can't really approve patches,
this one looks okay to me.  Thanks for getting to the bottom of
the failure and fixing it!


Sorry I didn't mention the regressions.
It only happens with aarch64 baremetal targets because of the newlib memmove 
implementation.

FAIL: gcc.c-torture/execute/builtins/memops-asm.c execution,  -O0
FAIL: gcc.c-torture/execute/builtins/memops-asm.c execution,  -O1
FAIL: gcc.c-torture/execute/builtins/memops-asm.c execution,  -O2
FAIL: gcc.c-torture/execute/builtins/memops-asm.c execution,  -O3 -g
FAIL: gcc.c-torture/execute/builtins/memops-asm.c execution,  -Og -g
FAIL: gcc.c-torture/execute/builtins/memops-asm.c execution,  -Os

I think the purpose of the test is to check that the original functions are not 
directly called from the main_test function.

Instead, those calls are redirected to the "my_" versions; the test will abort otherwise.
I CCed Richard Sandiford as he is the original contributor of the test case.

Before r249278, bcopy had a corresponding my_bcopy function which actually 
got called.
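To make the redirection concrete, here is a minimal, self-contained memmove of the kind the fix adds. The name and details are illustrative, not the exact test code:

```c
#include <stddef.h>

/* Hypothetical sketch (names are mine, not the exact test code):
   a self-contained memmove that never calls memcpy, so a test that
   intercepts memcpy cannot be tripped from inside it.  */
void *
my_memmove (void *dst, const void *src, size_t n)
{
  char *d = dst;
  const char *s = src;
  if (d < s)
    for (size_t i = 0; i < n; i++)
      d[i] = s[i];              /* forward copy is safe when dst precedes src */
  else
    while (n--)
      d[n] = s[n];              /* backward copy handles the other overlap */
  return dst;
}
```

Because the copy loops are written out by hand, there is no library call for the test's interception machinery to trip over.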

Regards,
Renlin



Martin



gcc/testsuite/ChangeLog:

2017-06-22  Renlin Li  
Szabolcs Nagy  

* gcc.c-torture/execute/builtins/memops-asm-lib.c (my_memmove): New.
* gcc.c-torture/execute/builtins/memops-asm.c (memmove): Declare
memmove.




Re: [PATCH] Fix PR71815 (SLSR misses PHI opportunities)

2017-06-23 Thread Bill Schmidt
Hi,

Here's version 2 of the patch to fix the missed SLSR PHI opportunities,
addressing Richard's comments.  I've repeated regstrap and SPEC testing
on powerpc64le-unknown-linux-gnu, again showing the patch as neutral
with respect to performance.  Is this ok for trunk?

Thanks for the review!

Bill


[gcc]

2016-06-23  Bill Schmidt  

* gimple-ssa-strength-reduction.c (uses_consumed_by_stmt): New
function.
(find_basis_for_candidate): Call uses_consumed_by_stmt rather than
has_single_use.
(slsr_process_phi): Likewise.
(replace_uncond_cands_and_profitable_phis): Don't replace a
multiply candidate with a stride of 1 (copy or cast).
(phi_incr_cost): Call uses_consumed_by_stmt rather than
has_single_use.
(lowest_cost_path): Likewise.
(total_savings): Likewise.

[gcc/testsuite]

2016-06-23  Bill Schmidt  

* gcc.dg/tree-ssa/slsr-35.c: Remove -fno-code-hoisting workaround.
* gcc.dg/tree-ssa/slsr-36.c: Likewise.
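For readers unfamiliar with the pass, a hypothetical source shape that produces a PHI-dependent multiply candidate (purely illustrative; not one of the testcases above) is:

```c
/* Illustrative only: t is a PHI joining the related values i + 1 and
   i + 2, and t * 4 is a phi-dependent multiply candidate of the sort
   the patch's uses_consumed_by_stmt logic reasons about when
   estimating dead-code savings.  */
int
phi_mult (int c, int i)
{
  int t = c ? i + 1 : i + 2;
  return t * 4;
}
```

Whether replacing the multiply pays off depends on whether the PHI and its feeding statements die with it, which is exactly what the new helper checks.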


Index: gcc/gimple-ssa-strength-reduction.c
===
--- gcc/gimple-ssa-strength-reduction.c (revision 249223)
+++ gcc/gimple-ssa-strength-reduction.c (working copy)
@@ -482,6 +482,36 @@ find_phi_def (tree base)
   return c->cand_num;
 }
 
+/* Determine whether all uses of NAME are directly or indirectly
+   used by STMT.  That is, we want to know whether if STMT goes
+   dead, the definition of NAME also goes dead.  */
+static bool
+uses_consumed_by_stmt (tree name, gimple *stmt, unsigned recurse = 0)
+{
+  gimple *use_stmt;
+  imm_use_iterator iter;
+  bool retval = true;
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, iter, name)
+{
+  if (use_stmt == stmt || is_gimple_debug (use_stmt))
+   continue;
+
+  if (!is_gimple_assign (use_stmt)
+ || !gimple_get_lhs (use_stmt)
+ || !is_gimple_reg (gimple_get_lhs (use_stmt))
+ || recurse >= 10
+ || !uses_consumed_by_stmt (gimple_get_lhs (use_stmt), stmt,
+recurse + 1))
+   {
+ retval = false;
+ BREAK_FROM_IMM_USE_STMT (iter);
+   }
+}
+
+  return retval;
+}
+
 /* Helper routine for find_basis_for_candidate.  May be called twice:
once for the candidate's base expr, and optionally again either for
the candidate's phi definition or for a CAND_REF's alternative base
@@ -558,7 +588,8 @@ find_basis_for_candidate (slsr_cand_t c)
 
  /* If we found a hidden basis, estimate additional dead-code
 savings if the phi and its feeding statements can be removed.  */
- if (basis && has_single_use (gimple_phi_result (phi_cand->cand_stmt)))
+ tree feeding_var = gimple_phi_result (phi_cand->cand_stmt);
+ if (basis && uses_consumed_by_stmt (feeding_var, c->cand_stmt))
c->dead_savings += phi_cand->dead_savings;
}
 }
@@ -789,7 +820,7 @@ slsr_process_phi (gphi *phi, bool speed)
 
  /* Gather potential dead code savings if the phi statement
 can be removed later on.  */
- if (has_single_use (arg))
+ if (uses_consumed_by_stmt (arg, phi))
{
  if (gimple_code (arg_stmt) == GIMPLE_PHI)
savings += arg_cand->dead_savings;
@@ -2479,7 +2510,9 @@ replace_uncond_cands_and_profitable_phis (slsr_can
 {
   if (phi_dependent_cand_p (c))
 {
-  if (c->kind == CAND_MULT)
+  /* A multiply candidate with a stride of 1 is just an artifice
+of a copy or cast; there is no value in replacing it.  */
+  if (c->kind == CAND_MULT && wi::to_widest (c->stride) != 1)
{
  /* A candidate dependent upon a phi will replace a multiply by 
 a constant with an add, and will insert at most one add for
@@ -2725,8 +2758,9 @@ phi_incr_cost (slsr_cand_t c, const widest_int 
  if (gimple_code (arg_def) == GIMPLE_PHI)
{
  int feeding_savings = 0;
+ tree feeding_var = gimple_phi_result (arg_def);
  cost += phi_incr_cost (c, incr, arg_def, &feeding_savings);
- if (has_single_use (gimple_phi_result (arg_def)))
+ if (uses_consumed_by_stmt (feeding_var, phi))
*savings += feeding_savings;
}
  else
@@ -2739,7 +2773,7 @@ phi_incr_cost (slsr_cand_t c, const widest_int 
  tree basis_lhs = gimple_assign_lhs (basis->cand_stmt);
  tree lhs = gimple_assign_lhs (arg_cand->cand_stmt);
  cost += add_cost (true, TYPE_MODE (TREE_TYPE (basis_lhs)));
- if (has_single_use (lhs))
+ if (uses_consumed_by_stmt (lhs, phi))
*savings += stmt_cost (arg_cand->cand_stmt, true);
}
}
@@ -2816,7 +2850,7 @@ lowest_cost_path (int cost_in, int repl_savings, s
   gimple *phi = lookup_cand 

Re: [PATCH] handling address mode changes inside extract_bit_field

2017-06-23 Thread Jeff Law
On 06/08/2017 11:07 AM, Jim Wilson wrote:
> I've got a testcase to add for this patch.  Sorry about the delay, I
> took some time off to deal with a medical problem.
> 
> This was tested with and without the extract_bit_field patch.  The
> testcase fails without the patch and works with the patch.
Thanks.  Please go ahead and install this.

Sorry for the delay on my side as well.  stack-clash has had me buried
since late May and I'm just starting to dig out a bit.

jeff


libgo patch committed: Don't crash if no p in kickoff

2017-06-23 Thread Ian Lance Taylor
In libgo the kickoff function for g0 can be invoked without a p, for
example from mcall(exitsyscall0) in exitsyscall after exitsyscall has
cleared the p field. The assignment gp.param = nil will invoke a write
barrier.  If gp.param is not already nil, this will require a p. Avoid
the problem for a specific case that is known to be OK: when the value
in gp.param is a *g.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 249594)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-29c61dc3c5151df5de9362b7882ccf04679df976
+f107cc8bced1939b0083231fc1ea24669ca4832c
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/runtime/proc.go
===
--- libgo/go/runtime/proc.go(revision 249577)
+++ libgo/go/runtime/proc.go(working copy)
@@ -1097,7 +1097,25 @@ func kickoff() {
fv := gp.entry
param := gp.param
gp.entry = nil
+
+   // When running on the g0 stack we can wind up here without a p,
+   // for example from mcall(exitsyscall0) in exitsyscall.
+   // Setting gp.param = nil will call a write barrier, and if
+   // there is no p that write barrier will crash. When called from
+   // mcall the gp.param value will be a *g, which we don't need to
+   // shade since we know it will be kept alive elsewhere. In that
+   // case clear the field using uintptr so that the write barrier
+   // does nothing.
+   if gp.m.p == 0 {
+   if gp == gp.m.g0 && gp.param == unsafe.Pointer(gp.m.curg) {
+   *(*uintptr)(unsafe.Pointer(&gp.param)) = 0
+   } else {
+   throw("no p in kickoff")
+   }
+   }
+
gp.param = nil
+
fv(param)
goexit1()
 }


Go patch committed: Add go:notinheap magic comment

2017-06-23 Thread Ian Lance Taylor
This patch to the Go frontend implements go:notinheap as the gc
compiler does. A type marked as go:notinheap may not live in the heap,
and does not require a write barrier. Struct and array types that
incorporate notinheap types are themselves notinheap. Allocating a
value of a notinheap type on the heap is an error.

This is not just an optimization. There is code where a write barrier
may not occur that was getting a write barrier with gccgo but not gc,
because the types in question were notinheap. The case I found was
setting the mcache field in exitsyscallfast.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 249590)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-c4adba240f9d5af8ab0534316d6b05bd988c432c
+29c61dc3c5151df5de9362b7882ccf04679df976
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 249205)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -7499,6 +7499,10 @@ Builtin_call_expression::lower_make(Stat
 }
   Type* type = first_arg->type();
 
+  if (!type->in_heap())
+go_error_at(first_arg->location(),
+   "can't make slice of go:notinheap type");
+
   bool is_slice = false;
   bool is_map = false;
   bool is_chan = false;
@@ -8742,6 +8746,9 @@ Builtin_call_expression::do_check_types(
  }
 
Type* element_type = slice_type->array_type()->element_type();
+   if (!element_type->in_heap())
+ go_error_at(args->front()->location(),
+ "can't append to slice of go:notinheap type");
if (this->is_varargs())
  {
if (!args->back()->type()->is_slice_type()
@@ -12436,6 +12443,13 @@ Allocation_expression::do_type()
   return Type::make_pointer_type(this->type_);
 }
 
+void
+Allocation_expression::do_check_types(Gogo*)
+{
+  if (!this->type_->in_heap())
+go_error_at(this->location(), "can't heap allocate go:notinheap type");
+}
+
 // Make a copy of an allocation expression.
 
 Expression*
Index: gcc/go/gofrontend/expressions.h
===
--- gcc/go/gofrontend/expressions.h (revision 249205)
+++ gcc/go/gofrontend/expressions.h (working copy)
@@ -3220,6 +3220,9 @@ class Allocation_expression : public Exp
   do_determine_type(const Type_context*)
   { }
 
+  void
+  do_check_types(Gogo*);
+
   Expression*
   do_copy();
 
Index: gcc/go/gofrontend/lex.cc
===
--- gcc/go/gofrontend/lex.cc(revision 249564)
+++ gcc/go/gofrontend/lex.cc(working copy)
@@ -1897,6 +1897,11 @@ Lex::skip_cpp_comment()
   // Applies to the next function.  Do not inline the function.
   this->pragmas_ |= GOPRAGMA_NOINLINE;
 }
+  else if (verb == "go:notinheap")
+{
+  // Applies to the next type.  The type does not live in the heap.
+  this->pragmas_ |= GOPRAGMA_NOTINHEAP;
+}
   else if (verb == "go:systemstack")
 {
   // Applies to the next function.  It must run on the system stack.
Index: gcc/go/gofrontend/lex.h
===
--- gcc/go/gofrontend/lex.h (revision 249205)
+++ gcc/go/gofrontend/lex.h (working copy)
@@ -64,7 +64,8 @@ enum GoPragma
   GOPRAGMA_NOWRITEBARRIER = 1 << 6,// No write barriers.
   GOPRAGMA_NOWRITEBARRIERREC = 1 << 7, // No write barriers here or callees.
   GOPRAGMA_CGOUNSAFEARGS = 1 << 8, // Pointer to arg is pointer to all.
-  GOPRAGMA_UINTPTRESCAPES = 1 << 9 // uintptr(p) escapes.
+  GOPRAGMA_UINTPTRESCAPES = 1 << 9,// uintptr(p) escapes.
+  GOPRAGMA_NOTINHEAP = 1 << 10 // type is not in heap.
 };
 
 // A token returned from the lexer.
Index: gcc/go/gofrontend/parse.cc
===
--- gcc/go/gofrontend/parse.cc  (revision 249205)
+++ gcc/go/gofrontend/parse.cc  (working copy)
@@ -1310,14 +1310,16 @@ Parse::declaration()
   const Token* token = this->peek_token();
 
   unsigned int pragmas = this->lex_->get_and_clear_pragmas();
-  if (pragmas != 0 && !token->is_keyword(KEYWORD_FUNC))
+  if (pragmas != 0
+  && !token->is_keyword(KEYWORD_FUNC)
+  && !token->is_keyword(KEYWORD_TYPE))
 go_warning_at(token->location(), 0,
  "ignoring magic comment before non-function");
 
   if (token->is_keyword(KEYWORD_CONST))
 this->const_decl();
   else if (token->is_keyword(KEYWORD_TYPE))
-this->type_decl();
+this->type_decl(pragmas);
   else if (token->is_keyword(KEYWORD_VAR))
 this->var_decl();
   else if (token->is_keyword(KEYWORD_FUNC))
@@ -1342,7 +1344,8 

Re: [PATCH,rs6000] Add IEEE 128 support for several existing built-in functions

2017-06-23 Thread Segher Boessenkool
Hi Kelvin,

On Wed, Jun 21, 2017 at 04:42:46PM -0600, Kelvin Nilsen wrote:
> This patch adds IEEE 128 support to the existing scalar_insert_exp,
> scalar_extract_exp, scalar_extract_sig, scalar_test_data_class, and
> scalar_test_neg rs6000 built-in functions.  Test programs are provided
> to exercise the new IEEE 128 functionality and to validate forms of
> these built-in functions that do not depend on IEEE 128 support.


>   * config/rs6000/rs6000-builtin.def (VSEEQP): Add scalar extract

Stray tab (after "scalar").

> +;; VSX Scalar Extract Exponent Quad-Precision
> +(define_insn "xsxexpqp"
> +  [(set (match_operand:DI 0 "altivec_register_operand" "=v")
> + (unspec:DI [(match_operand:KF 1 "altivec_register_operand" "v")]
> +  UNSPEC_VSX_SXEXPDP))]
> +  "TARGET_P9_VECTOR && TARGET_64BIT"
> +  "xsxexpqp %0,%1"
> +  [(set_attr "type" "vecmove")])

TARGET_64BIT should probably be removed (and if not, a comment would help).

You also may want to explain the low half of the TI result is zeroed, but
we ignore it here?  (Because some other insns use TI).  Or maybe that is
obvious to people who actually know the instructions ;-)

> +;; VSX Scalar Extract Significand Quad-Precision
> +(define_insn "xsxsigqp"
> +  [(set (match_operand:TI 0 "altivec_register_operand" "=v")
> + (unspec:TI [(match_operand:KF 1 "altivec_register_operand" "v")]
> +  UNSPEC_VSX_SXSIGDP))]
> +  "TARGET_P9_VECTOR && TARGET_64BIT"
> +  "xsxsigqp %0,%1"
> +  [(set_attr "type" "vecmove")])

Should this be UNSPEC_VSX_SXSIGQP?  Or, if we can use the same unspec for
all data sizes, its name should not say "DP" :-)

(And TARGET_64BIT; please check it everywhere).

> +;; VSX Scalar Test Data Class Quad-Precision
> +;;  (The lt bit is set if operand 1 is negative.  The eq bit is set
> +;;   if any of the conditions tested by operand 2 are satisfied.
> +;;   The gt and unordered bits are cleared to zero.)
> +(define_expand "xststdcqp"
> +  [(set (match_dup 3)
> + (compare:CCFP
> +  (unspec:KF
> +   [(match_operand:KF 1 "vsx_register_operand" "v")
> +(match_operand:SI 2 "u7bit_cint_operand" "n")]
> +   UNSPEC_VSX_STSTDC)
> +  (match_dup 4)))
> +   (set (match_operand:SI 0 "register_operand" "=r")
> + (eq:SI (match_dup 3)
> +(const_int 0)))]

So this is specialised to only testing the "eq" part.  That is fine, but
please add a comment?

> +  operands[4] = CONST0_RTX (SImode);

Please write const0_rtx, instead, for scalar integer modes.

> +(define_insn "*xststdcqp"
> +  [(set (match_operand:CCFP 0 "" "=y")
> + (compare:CCFP
> +  (unspec:KF [(match_operand:KF 1 "altivec_register_operand" "v")
> +  (match_operand:SI 2 "u7bit_cint_operand" "n")]
> +   UNSPEC_VSX_STSTDC)
> +  (match_operand:SI 3 "zero_constant" "j")))]

Can't you just write (const_int 0) here?

> + if (GET_MODE_PRECISION (arg1_mode) > 64)
> +   {
> + /* if first argument is of float variety, choose variant
> +that expects __ieee128 argument.  Otherwise, expect
> +__int128 argument.  */

"If".

> + else
> +   {
> + /* if first argument is of float variety, choose variant

(again).

> +/* { dg-do compile { target { powerpc*-*-* } } } */

Btw, tests in gcc.target/powerpc do not need this line: from powerpc.exp:

# Exit immediately if this isn't a PowerPC target.
if { ![istarget powerpc*-*-*] && ![istarget rs6000-*-*] } then {
  return
}

(and compile is the default action).

Doesn't hurt either of course.

>  The @code{scalar_extract_exp} and @code{scalar_extract_sig} built-in
>  functions return the significand and the biased exponent value
>  respectively of their @code{source} arguments.
> -Within the result returned by @code{scalar_extract_sig},
> -the @code{0x10} bit is set if the
> +When supplied with a 64-bit @code{source} argument, the
> +result returned by @code{scalar_extract_sig} has
> +the @code{0x10} bit set if the
>  function's @code{source} argument is in normalized form.
>  Otherwise, this bit is set to 0.
> +When supplied with a 128-bit @code{source} argument, the
> +@code{0x1} bit of the result is
> +treated similarly.

Which bit is that?  Hard to tell...  Maybe write
@code{0x0001} (and @code{0x0010})
to make it easier to read?
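For reference, the 64-bit behavior being documented can be sketched portably. This mimics the semantics, not the builtin (the real thing is a single instruction); the bit positions come from the IEEE 754 binary64 layout:

```c
#include <stdint.h>
#include <string.h>

/* Sketch of what scalar_extract_sig returns for a 64-bit double:
   the 52 stored fraction bits, with bit 52 (0x0010000000000000)
   set when the argument is in normalized form.  Illustrative only.  */
static uint64_t
extract_sig64 (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  uint64_t frac = bits & 0x000fffffffffffffULL;
  unsigned exp = (bits >> 52) & 0x7ff;
  if (exp != 0 && exp != 0x7ff)   /* neither subnormal nor inf/NaN */
    frac |= 0x0010000000000000ULL;
  return frac;
}
```

Writing the constants with all sixteen hex digits, as suggested above, makes it much easier to see which bit is meant.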


Please consider the trivialities above; the patch is okay for trunk.
Thanks,


Segher


Minor update to contrib.texi

2017-06-23 Thread Jeff Law

Steven contacted me about getting his contribution mentioned.  While we
no longer use enquire to generate float.h, adding an entry for Steven's
work seemed reasonable.

Installed on the trunk.

Jeff
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 327d809a87c..130fa37ad34 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2017-06-23  Jeff Law  
+
+   * doc/contrib.texi: Add entry for Steven Pemberton's work on
+   enquire.
+
 2017-06-23  Will Schmidt  
 
* config/rs6000/rs6000.c: Add include of ssa-propagate.h for
diff --git a/gcc/doc/contrib.texi b/gcc/doc/contrib.texi
index 4f5ffc1710f..60b71026779 100644
--- a/gcc/doc/contrib.texi
+++ b/gcc/doc/contrib.texi
@@ -761,6 +761,11 @@ clean-ups and porting work, and maintaining the IRIX, 
Solaris 2, and
 Tru64 UNIX ports.
 
 @item
+Steven Pemberton for his contribution of @file{enquire} which allowed GCC to
+determine various properties of the floating point unit and generate
+@file{float.h} in older versions of GCC.
+
+@item
 Hartmut Penner for work on the s390 port.
 
 @item


Re: [PATCH, GCC/ARM, Stage 1] Rename FPSCR builtins to correct names

2017-06-23 Thread Kyrill Tkachov

Hi Thomas,

On 23/06/17 16:48, Thomas Preudhomme wrote:

Hi Kyrill,

On 10/04/17 15:01, Kyrill Tkachov wrote:

Hi Prakhar,
Sorry for the delay,

On 22/03/17 10:46, Prakhar Bahuguna wrote:

The GCC documentation in section 6.60.8 ARM Floating Point Status and Control
Intrinsics states that the FPSCR register can be read and written to using the
intrinsics __builtin_arm_get_fpscr and __builtin_arm_set_fpscr. However, these
are misnamed within GCC itself and these intrinsic names are not recognised.
This patch corrects the intrinsic names to match the documentation, and adds
tests to verify these intrinsics generate the correct instructions.

Testing done: Ran regression tests on arm-none-eabi for Cortex-M4.

2017-03-09  Prakhar Bahuguna  

gcc/ChangeLog:

* gcc/config/arm/arm-builtins.c (arm_init_builtins): Rename
  __builtin_arm_ldfscr to __builtin_arm_get_fpscr, and rename
  __builtin_arm_stfscr to __builtin_arm_set_fpscr.
* gcc/testsuite/gcc.target/arm/fpscr.c: New file.

Okay for stage 1?


I see that the mistake was in not addressing one of the review comments in:
https://gcc.gnu.org/ml/gcc-patches/2014-04/msg01832.html
properly in the patch that added these functions :(

This is ok for stage 1 if a bootstrap and test on arm-none-linux-gnueabihf works
fine.  I don't think we want to maintain the __builtin_arm_[ld,st]fscr names for
backwards compatibility, as they were not documented and are __builtin_arm*
functions that we don't guarantee to maintain.


How about a backport to GCC 5, 6 & 7? The patch applied cleanly on each of 
these versions and the testsuite didn't show any regressions for any of the 
backports when run for Cortex-M7.


Yes, thanks.
These were always documented "correctly". The patch makes sure the 
implementation matches that documentation.

Kyrill



Patches attached for reference.

ChangeLog entries:

*** gcc/ChangeLog ***

2017-06-20  Thomas Preud'homme  

Backport from mainline
2017-05-04  Prakhar Bahuguna  

* gcc/config/arm/arm-builtins.c (arm_init_builtins): Rename
__builtin_arm_ldfscr to __builtin_arm_get_fpscr, and rename
__builtin_arm_stfscr to __builtin_arm_set_fpscr.


*** gcc/testsuite/ChangeLog ***

2017-06-20  Thomas Preud'homme  

Backport from mainline
2017-05-04  Prakhar Bahuguna  

gcc/testsuite/
* gcc.target/arm/fpscr.c: New file.


Best regards,

Thomas




Re: [PATCH, GCC/ARM, Stage 1] Rename FPSCR builtins to correct names

2017-06-23 Thread Thomas Preudhomme

Hi Kyrill,

On 10/04/17 15:01, Kyrill Tkachov wrote:

Hi Prakhar,
Sorry for the delay,

On 22/03/17 10:46, Prakhar Bahuguna wrote:

The GCC documentation in section 6.60.8 ARM Floating Point Status and Control
Intrinsics states that the FPSCR register can be read and written to using the
intrinsics __builtin_arm_get_fpscr and __builtin_arm_set_fpscr. However, these
are misnamed within GCC itself and these intrinsic names are not recognised.
This patch corrects the intrinsic names to match the documentation, and adds
tests to verify these intrinsics generate the correct instructions.

Testing done: Ran regression tests on arm-none-eabi for Cortex-M4.

2017-03-09  Prakhar Bahuguna  

gcc/ChangeLog:

* gcc/config/arm/arm-builtins.c (arm_init_builtins): Rename
  __builtin_arm_ldfscr to __builtin_arm_get_fpscr, and rename
  __builtin_arm_stfscr to __builtin_arm_set_fpscr.
* gcc/testsuite/gcc.target/arm/fpscr.c: New file.

Okay for stage 1?


I see that the mistake was in not addressing one of the review comments in:
https://gcc.gnu.org/ml/gcc-patches/2014-04/msg01832.html
properly in the patch that added these functions :(

This is ok for stage 1 if a bootstrap and test on arm-none-linux-gnueabihf works
fine.  I don't think we want to maintain the __builtin_arm_[ld,st]fscr names for
backwards compatibility, as they were not documented and are __builtin_arm*
functions that we don't guarantee to maintain.


How about a backport to GCC 5, 6 & 7? The patch applied cleanly on each of these 
versions and the testsuite didn't show any regressions for any of the backports 
when run for Cortex-M7.


Patches attached for reference.

ChangeLog entries:

*** gcc/ChangeLog ***

2017-06-20  Thomas Preud'homme  

Backport from mainline
2017-05-04  Prakhar Bahuguna  

* gcc/config/arm/arm-builtins.c (arm_init_builtins): Rename
__builtin_arm_ldfscr to __builtin_arm_get_fpscr, and rename
__builtin_arm_stfscr to __builtin_arm_set_fpscr.


*** gcc/testsuite/ChangeLog ***

2017-06-20  Thomas Preud'homme  

Backport from mainline
2017-05-04  Prakhar Bahuguna  

gcc/testsuite/
* gcc.target/arm/fpscr.c: New file.


Best regards,

Thomas
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index da321440384628fb1770ff9e96377b341c61da6a..ab0e7c0167ac287b774378c3ecfb15a37d5362e7 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,12 @@
+2017-06-20  Thomas Preud'homme  
+
+	Backport from mainline
+	2017-05-04  Prakhar Bahuguna  
+
+	* gcc/config/arm/arm-builtins.c (arm_init_builtins): Rename
+	__builtin_arm_ldfscr to __builtin_arm_get_fpscr, and rename
+	__builtin_arm_stfscr to __builtin_arm_set_fpscr.
+
 2017-06-22  Martin Liska  
 
 	Backport from mainline
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 6f4fd9bdb9774b942f7f51145a406258a82ac1e7..edd6dac6ab73d24447e8c9f6e39c5ba22fbf9302 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -1747,10 +1747,10 @@ arm_init_builtins (void)
 	= build_function_type_list (unsigned_type_node, NULL);
 
   arm_builtin_decls[ARM_BUILTIN_GET_FPSCR]
-	= add_builtin_function ("__builtin_arm_ldfscr", ftype_get_fpscr,
+	= add_builtin_function ("__builtin_arm_get_fpscr", ftype_get_fpscr,
 ARM_BUILTIN_GET_FPSCR, BUILT_IN_MD, NULL, NULL_TREE);
   arm_builtin_decls[ARM_BUILTIN_SET_FPSCR]
-	= add_builtin_function ("__builtin_arm_stfscr", ftype_set_fpscr,
+	= add_builtin_function ("__builtin_arm_set_fpscr", ftype_set_fpscr,
 ARM_BUILTIN_SET_FPSCR, BUILT_IN_MD, NULL, NULL_TREE);
 }
 }
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index b411b9dbc108f12bd1931f57d3f4c1f315161ca0..a865ed054597c12de76a953fcf751209c1e4b84c 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,10 @@
+2017-06-20  Thomas Preud'homme  
+
+	Backport from mainline
+	2017-05-04  Prakhar Bahuguna  
+
+	* gcc.target/arm/fpscr.c: New file.
+
 2017-06-22  Martin Liska  
 
 	Backport from mainline
diff --git a/gcc/testsuite/gcc.target/arm/fpscr.c b/gcc/testsuite/gcc.target/arm/fpscr.c
new file mode 100644
index ..7b4d71d72d8964f6da0d0604bf59aeb4a895df43
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/fpscr.c
@@ -0,0 +1,16 @@
+/* Test the fpscr builtins.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_fp_ok } */
+/* { dg-skip-if "need fp instructions" { *-*-* } { "-mfloat-abi=soft" } { "" } } */
+/* { dg-add-options arm_fp } */
+
+void
+test_fpscr ()
+{
+  volatile unsigned int status = __builtin_arm_get_fpscr ();
+  __builtin_arm_set_fpscr (status);
+}
+
+/* { dg-final { scan-assembler "mrc\tp10, 7, r\[0-9\]+, cr1, cr0, 0" } } */
+/* { dg-final { 

Re: [PING][PATCH] Move the check for any_condjump_p from sched-deps to target macros

2017-06-23 Thread Jeff Law
On 05/10/2017 10:46 PM, Hurugalawadi, Naveen wrote:
> Hi,
> 
>>> Doesn't this avoid calling the target hook in cases where it used to 
>>> call it before?
> Yes. Thanks for pointing it out.
> 
>>> Consider a conditional jump inside a parallel that is not a single set.
> Please find attached the modified patch that handles the case mentioned.
> Please review the patch and let us know if its okay?
> 
> Bootstrapped and Regression tested on AArch64 and X86_64.
> Please review the patch and let us know if its okay?
Sorry it's taken so long to come back to this.

The code is a bit convoluted, but I think you've preserved the existing
logic.  You need a ChangeLog entry, but I think that's it.  Can you
please repost with a ChangeLog entry for final approval?

Thanks,

Jeff


Re: [PATCH][Testsuite] Use user defined memmove in gcc.c-torture/execute/builtins/memops-asm-lib.c

2017-06-23 Thread Martin Sebor

On 06/23/2017 03:19 AM, Renlin Li wrote:

Hi all,

After change r249278, bcopy is folded into memmove, and the newlib
aarch64 memmove implementation calls memcpy under certain conditions.
The memcpy defined in memops-asm-lib.c will abort when the test is running.

In this case, I defined a user memmove function which bypasses the
library one, so that memcpy won't be called accidentally.

Okay to commit?


Having memmove call memcpy when there is no overlap seems like
a valid transformation.  I don't know which test specifically
fails, so the question on my mind is whether it perhaps is overly
restrictive in assuming that this transformation must never take
place.  Other than that, although I can't really approve patches,
this one looks okay to me.  Thanks for getting to the bottom of
the failure and fixing it!

Martin



gcc/testsuite/ChangeLog:

2017-06-22  Renlin Li  
Szabolcs Nagy  

* gcc.c-torture/execute/builtins/memops-asm-lib.c (my_memmove): New.
* gcc.c-torture/execute/builtins/memops-asm.c (memmove): Declare
memmove.




Re: [PATCH] Fix more PR80928 fallout

2017-06-23 Thread Jeff Law
On 06/23/2017 05:39 AM, Richard Biener wrote:
> 
> SLP induction vectorization runs into the issue that it remembers
> pointers to PHI nodes in the SLP tree during analysis.  But those
> may get invalidated by loop copying (for prologue/epilogue peeling
> or versioning) as the low-level CFG helper copy_bbs works in the
> way of copying individual BBs plus their outgoing edges but with
> old destinations and at the end re-directing the edges to the
> desired location.  In SSA this triggers the whole machinery of
> making room for new PHI nodes -- that is undesirable because it
> causes re-allocation of PHI nodes in the set of source blocks.
> 
> After much pondering I arrived at the following (least ugly) solution
> to this "problem" (well, I define it as a problem, it's at least
> an inefficiency and a workaround in the vectorizer would be way
> uglier).  Namely simply do not trigger the SSA machinery for
> blocks with BB_DUPLICATED (I skimmed all other users and they seem
> fine).
> 
> In the process I also implemented some poisoning of the old PHI node
> when we reallocate (well, free) PHI nodes.  But that triggers some
> other issues, one fixed by the tree-ssa-phionlycoprop.c hunk below.
> So I'm not submitting it as part of this fix.
> 
> Bootstrapped (with the poisoning so far, plain patch still running)
> on x86_64-unknown-linux-gnu, testing in progress.
> 
> Comments welcome, testing won't finish before I leave for the
> weekend.
I fully support poisoning the old PHI nodes -- I tracked down a similar
problem just a few months back that probably would have been obvious if
we had poisoned the old nodes (79621 which is now a missed optimization
bug).

I wouldn't be surprised if there's others lurking and given the general
trend of using block duplication to enable various optimizations,
catching this stuff early would definitely be good.

Jeff


fenv.h builtins

2017-06-23 Thread Marc Glisse

Hello,

this is now the complete list of C99 fenv.h functions. I tried to be 
rather conservative: only fegetround is pure, and functions that "raise an 
exception" (in the fenv sense, not the C++ one) do not get nothrow,leaf. 
We can always change that afterwards.


I am not convinced there is much we will be able to do with those, but at 
least they are available now...


Trying to declare those functions with wrong prototypes now gives the 
expected error.


Bootstrap + testsuite on powerpc64le-unknown-linux-gnu.

2017-06-23  Marc Glisse  

* builtin-types.def (BT_FENV_T_PTR, BT_CONST_FENV_T_PTR,
BT_FEXCEPT_T_PTR, BT_CONST_FEXCEPT_T_PTR): New primitive types.
(BT_FN_INT_FENV_T_PTR, BT_FN_INT_CONST_FENV_T_PTR,
BT_FN_INT_FEXCEPT_T_PTR_INT, BT_FN_INT_CONST_FEXCEPT_T_PTR_INT):
New function types.
* builtins.def (BUILT_IN_FECLEAREXCEPT, BUILT_IN_FEGETENV,
BUILT_IN_FEGETEXCEPTFLAG, BUILT_IN_FEGETROUND,
BUILT_IN_FEHOLDEXCEPT, BUILT_IN_FERAISEEXCEPT,
BUILT_IN_FESETENV, BUILT_IN_FESETEXCEPTFLAG,
BUILT_IN_FESETROUND, BUILT_IN_FETESTEXCEPT,
BUILT_IN_FEUPDATEENV): New builtins.
* tree-core.h (TI_FENV_T_PTR_TYPE, TI_CONST_FENV_T_PTR_TYPE,
TI_FEXCEPT_T_PTR_TYPE, TI_CONST_FEXCEPT_T_PTR_TYPE): New entries.
* tree.h (fenv_t_ptr_type_node, const_fenv_t_ptr_type_node,
fexcept_t_ptr_type_node, const_fexcept_t_ptr_type_node): New
macros.
(builtin_structptr_types): Adjust size.
* tree.c (builtin_structptr_types): Add four entries.


--
Marc GlisseIndex: gcc/builtin-types.def
===
--- gcc/builtin-types.def	(revision 249585)
+++ gcc/builtin-types.def	(working copy)
@@ -100,20 +100,24 @@ DEF_PRIMITIVE_TYPE (BT_FLOAT64X, (float6
 DEF_PRIMITIVE_TYPE (BT_FLOAT128X, (float128x_type_node
    ? float128x_type_node
    : error_mark_node))
 DEF_PRIMITIVE_TYPE (BT_COMPLEX_FLOAT, complex_float_type_node)
 DEF_PRIMITIVE_TYPE (BT_COMPLEX_DOUBLE, complex_double_type_node)
 DEF_PRIMITIVE_TYPE (BT_COMPLEX_LONGDOUBLE, complex_long_double_type_node)
 
 DEF_PRIMITIVE_TYPE (BT_PTR, ptr_type_node)
 DEF_PRIMITIVE_TYPE (BT_FILEPTR, fileptr_type_node)
 DEF_PRIMITIVE_TYPE (BT_CONST_TM_PTR, const_tm_ptr_type_node)
+DEF_PRIMITIVE_TYPE (BT_FENV_T_PTR, fenv_t_ptr_type_node)
+DEF_PRIMITIVE_TYPE (BT_CONST_FENV_T_PTR, const_fenv_t_ptr_type_node)
+DEF_PRIMITIVE_TYPE (BT_FEXCEPT_T_PTR, fexcept_t_ptr_type_node)
+DEF_PRIMITIVE_TYPE (BT_CONST_FEXCEPT_T_PTR, const_fexcept_t_ptr_type_node)
 DEF_PRIMITIVE_TYPE (BT_CONST_PTR, const_ptr_type_node)
 DEF_PRIMITIVE_TYPE (BT_VOLATILE_PTR,
 		build_pointer_type
 		 (build_qualified_type (void_type_node,
 	TYPE_QUAL_VOLATILE)))
 DEF_PRIMITIVE_TYPE (BT_CONST_VOLATILE_PTR,
 		build_pointer_type
 		 (build_qualified_type (void_type_node,
 	  TYPE_QUAL_VOLATILE|TYPE_QUAL_CONST)))
 DEF_PRIMITIVE_TYPE (BT_PTRMODE, (*lang_hooks.types.type_for_mode)(ptr_mode, 0))
@@ -291,20 +295,22 @@ DEF_FUNCTION_TYPE_1 (BT_FN_UINT16_UINT16
 DEF_FUNCTION_TYPE_1 (BT_FN_UINT32_UINT32, BT_UINT32, BT_UINT32)
 DEF_FUNCTION_TYPE_1 (BT_FN_UINT64_UINT64, BT_UINT64, BT_UINT64)
 DEF_FUNCTION_TYPE_1 (BT_FN_UINT64_FLOAT, BT_UINT64, BT_FLOAT)
 DEF_FUNCTION_TYPE_1 (BT_FN_BOOL_INT, BT_BOOL, BT_INT)
 DEF_FUNCTION_TYPE_1 (BT_FN_PTR_CONST_PTR, BT_PTR, BT_CONST_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_CONST_PTR_CONST_PTR, BT_CONST_PTR, BT_CONST_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_BND_CONST_PTR, BT_BND, BT_CONST_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_CONST_PTR_BND, BT_CONST_PTR, BT_BND)
 DEF_FUNCTION_TYPE_1 (BT_FN_UINT16_UINT32, BT_UINT16, BT_UINT32)
 DEF_FUNCTION_TYPE_1 (BT_FN_UINT32_UINT16, BT_UINT32, BT_UINT16)
+DEF_FUNCTION_TYPE_1 (BT_FN_INT_FENV_T_PTR, BT_INT, BT_FENV_T_PTR)
+DEF_FUNCTION_TYPE_1 (BT_FN_INT_CONST_FENV_T_PTR, BT_INT, BT_CONST_FENV_T_PTR)
 
 DEF_POINTER_TYPE (BT_PTR_FN_VOID_PTR, BT_FN_VOID_PTR)
 
 DEF_FUNCTION_TYPE_2 (BT_FN_VOID_PTR_INT, BT_VOID, BT_PTR, BT_INT)
 DEF_FUNCTION_TYPE_2 (BT_FN_STRING_STRING_CONST_STRING,
 		 BT_STRING, BT_STRING, BT_CONST_STRING)
 DEF_FUNCTION_TYPE_2 (BT_FN_INT_CONST_STRING_CONST_STRING,
 		 BT_INT, BT_CONST_STRING, BT_CONST_STRING)
 DEF_FUNCTION_TYPE_2 (BT_FN_STRING_CONST_STRING_CONST_STRING,
 		 BT_STRING, BT_CONST_STRING, BT_CONST_STRING)
@@ -464,20 +470,24 @@ DEF_FUNCTION_TYPE_2 (BT_FN_VOID_UINT_UIN
 DEF_FUNCTION_TYPE_2 (BT_FN_UINT_UINT_PTR, BT_UINT, BT_UINT, BT_PTR)
 DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_SIZE, BT_PTR, BT_CONST_PTR, BT_SIZE)
 DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_CONST_PTR, BT_PTR, BT_CONST_PTR, BT_CONST_PTR)
 DEF_FUNCTION_TYPE_2 (BT_FN_VOID_PTRPTR_CONST_PTR, BT_VOID, BT_PTR_PTR, BT_CONST_PTR)
 DEF_FUNCTION_TYPE_2 (BT_FN_VOID_CONST_PTR_SIZE, BT_VOID, BT_CONST_PTR, BT_SIZE)
 DEF_FUNCTION_TYPE_2 (BT_FN_VOID_PTR_BND, BT_VOID, BT_PTR, BT_BND)
 DEF_FUNCTION_TYPE_2 (BT_FN_CONST_PTR_CONST_PTR_CONST_PTR, BT_CONST_PTR, BT_CONST_PTR, 

Re: C/C++ PATCH to add __typeof_noqual (PR c/65455, c/39985)

2017-06-23 Thread Marek Polacek
On Fri, Jun 23, 2017 at 04:48:33PM +0200, Jakub Jelinek wrote:
> On Fri, Jun 23, 2017 at 04:46:06PM +0200, Marek Polacek wrote:
> > +++ gcc/c-family/c-common.c
> > @@ -433,6 +433,8 @@ const struct c_common_resword c_common_reswords[] =
> >{ "__transaction_cancel", RID_TRANSACTION_CANCEL, 0 },
> >{ "__typeof",RID_TYPEOF, 0 },
> >{ "__typeof__",  RID_TYPEOF, 0 },
> > +  { "__typeof_noqual", RID_TYPEOF_NOQUAL, 0 },
> > +  { "__typeof_noqual__", RID_TYPEOF_NOQUAL, 0 },
> >{ "__underlying_type", RID_UNDERLYING_TYPE, D_CXXONLY },
> >{ "__volatile",  RID_VOLATILE,   0 },
> >{ "__volatile__",RID_VOLATILE,   0 },
> > @@ -506,6 +508,7 @@ const struct c_common_resword c_common_reswords[] =
> >{ "typename",RID_TYPENAME,   D_CXXONLY | D_CXXWARN },
> >{ "typeid",  RID_TYPEID, D_CXXONLY | D_CXXWARN },
> >{ "typeof",  RID_TYPEOF, D_ASM | D_EXT },
> > +  { "typeof_noqual",   RID_TYPEOF_NOQUAL, D_ASM | D_EXT },
> 
> Do you think we need this one?  Wouldn't just __typeof_noqual and
> __typeof_noqual__ be sufficient?

Unsure.  At first I didn't add typeof_noqual, but then I saw that our docs
use "typeof" and thought it might be better to be consistent and allow
typeof_noqual without leading underscores.  But in C++ it's only accepted
with -fgnu-keywords.

I could do without it.  Let's see what others think.

Marek


Re: [committed] Fix -fstack-check with really big frames on aarch64

2017-06-23 Thread Jeff Law
On 06/23/2017 05:15 AM, Christophe Lyon wrote:

> Hi,
> 
> A minor comment at this stage: this new test fails to compile for
> thumb-1 targets:
> testsuite/gcc.c-torture/compile/20031023-1.c:27:1: sorry,
> unimplemented: -fstack-check=specific for Thumb-1
> 
> for instance on arm-none-linux-gnueabi --with-mode=thumb --with-cpu=cortex-a9
> and forcing -march=armv5t in runtest flags.
> 
> Is there a clean way to make it unsupported?

Presumably we could create an effective-target test.  That would seem to
me to be the most reliable way.

We're going to want the ability to check for -fstack-check=specific and
-fstack-check=.  Do you mind waiting a few days as I start to
pull the larger stack checking issues together for submission?

Jeff


Re: [committed] Fix -fstack-check with really big frames on aarch64

2017-06-23 Thread Jeff Law
On 06/22/2017 11:28 AM, Jakub Jelinek wrote:
> On Thu, Jun 22, 2017 at 11:21:15AM -0600, Jeff Law wrote:
>> +2017-06-22  Jeff Law  
>> +
>> +* gcc.c-torture/compile/stack-check-1.c: New test.
>> +
>>  2016-06-22  Richard Biener  
>>  
>>  * gcc.dg/vect/pr65947-1.c: Remove xfail.
>> diff --git a/gcc/testsuite/gcc.c-torture/compile/stack-check-1.c b/gcc/testsuite/gcc.c-torture/compile/stack-check-1.c
>> new file mode 100644
>> index 000..4058eb58709
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.c-torture/compile/stack-check-1.c
>> @@ -0,0 +1,2 @@
>> +/* { dg-additional-options "-fstack-check" } */
>> +#include "20031023-1.c"
> 
> That test has:
> /* { dg-require-effective-target untyped_assembly } */
> which needs to be duplicated here (dejagnu isn't aware of the
> #include and doesn't scan dg- directives in there).
Ugh.  Good point.  Fixed in the obvious way.

Jeff
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 4e2defd7ab4..8c558622f78 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2017-06-22  Jeff Law  
+
+   * gcc.c-torture/compile/stack-check-1.c: Require "untyped_assembly".
+
 2017-06-23  Will Schmidt  
 
* gcc.target/powerpc/fold-vec-shift-char.c: New.
diff --git a/gcc/testsuite/gcc.c-torture/compile/stack-check-1.c b/gcc/testsuite/gcc.c-torture/compile/stack-check-1.c
index 4058eb58709..5c99688b35a 100644
--- a/gcc/testsuite/gcc.c-torture/compile/stack-check-1.c
+++ b/gcc/testsuite/gcc.c-torture/compile/stack-check-1.c
@@ -1,2 +1,3 @@
+/* { dg-require-effective-target untyped_assembly } */
 /* { dg-additional-options "-fstack-check" } */
 #include "20031023-1.c"


Re: C/C++ PATCH to add __typeof_noqual (PR c/65455, c/39985)

2017-06-23 Thread Jakub Jelinek
On Fri, Jun 23, 2017 at 04:46:06PM +0200, Marek Polacek wrote:
> +++ gcc/c-family/c-common.c
> @@ -433,6 +433,8 @@ const struct c_common_resword c_common_reswords[] =
>{ "__transaction_cancel", RID_TRANSACTION_CANCEL, 0 },
>{ "__typeof",  RID_TYPEOF, 0 },
>{ "__typeof__",RID_TYPEOF, 0 },
> +  { "__typeof_noqual",   RID_TYPEOF_NOQUAL, 0 },
> +  { "__typeof_noqual__", RID_TYPEOF_NOQUAL, 0 },
>{ "__underlying_type", RID_UNDERLYING_TYPE, D_CXXONLY },
>{ "__volatile",RID_VOLATILE,   0 },
>{ "__volatile__",  RID_VOLATILE,   0 },
> @@ -506,6 +508,7 @@ const struct c_common_resword c_common_reswords[] =
>{ "typename",  RID_TYPENAME,   D_CXXONLY | D_CXXWARN },
>{ "typeid",RID_TYPEID, D_CXXONLY | D_CXXWARN },
>{ "typeof",RID_TYPEOF, D_ASM | D_EXT },
> +  { "typeof_noqual", RID_TYPEOF_NOQUAL, D_ASM | D_EXT },

Do you think we need this one?  Wouldn't just __typeof_noqual and
__typeof_noqual__ be sufficient?

Jakub


C/C++ PATCH to add __typeof_noqual (PR c/65455, c/39985)

2017-06-23 Thread Marek Polacek
This patch adds a variant of __typeof, called __typeof_noqual.  As the name
suggests, this variant always drops all qualifiers, not just when the type
is atomic.  This was discussed several times in the past, see e.g.

or

It's been brought to my attention again here:


One approach would be to just modify the current __typeof, but that could
cause some incompatibilities, I'm afraid.  This is based on rth's earlier
patch:  but I
didn't do the address space-stripping variant __typeof_noas.  I also added
a couple of missing things.

You'll also see that I dropped all qualifiers for __auto_type.  But I actually
couldn't trigger the
init_type = c_build_qualified_type (init_type, TYPE_UNQUALIFIED);
line in c_parser_declaration_or_fndef (even when running the whole testsuite)
so I'm not convinced it makes any difference.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2017-06-23  Marek Polacek  
Richard Henderson  

PR c/65455
PR c/39985
* c-common.c (c_common_reswords): Add __typeof_noqual,
__typeof_noqual__, and typeof_noqual.
(keyword_begins_type_specifier): Handle RID_TYPEOF_NOQUAL.
* c-common.h (enum rid): Add RID_TYPEOF_NOQUAL.

* c-parser.c (c_keyword_starts_typename): Handle RID_TYPEOF_NOQUAL.
(c_token_starts_declspecs): Likewise.
(c_parser_declaration_or_fndef): Always strip all qualifiers for
__auto_type.
(c_parser_declspecs): Handle RID_TYPEOF_NOQUAL.
(c_parser_typeof_specifier): Handle RID_TYPEOF_NOQUAL by dropping
all the qualifiers.
(c_parser_objc_selector): Handle RID_TYPEOF_NOQUAL.

* parser.c (cp_keyword_starts_decl_specifier_p): Handle 
RID_TYPEOF_NOQUAL.
(cp_parser_simple_type_specifier): Handle RID_TYPEOF_NOQUAL by dropping
all the qualifiers.

* doc/extend.texi: Document __typeof_noqual.
* doc/invoke.texi: Update documentation regarding typeof.

* c-c++-common/typeof-noqual-1.c: New test.
* c-c++-common/typeof-noqual-2.c: New test.
* gcc.dg/typeof-noqual-1.c: New test.


diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
index f6a9d05..db9c3ba 100644
--- gcc/c-family/c-common.c
+++ gcc/c-family/c-common.c
@@ -433,6 +433,8 @@ const struct c_common_resword c_common_reswords[] =
   { "__transaction_cancel", RID_TRANSACTION_CANCEL, 0 },
   { "__typeof",RID_TYPEOF, 0 },
   { "__typeof__",  RID_TYPEOF, 0 },
+  { "__typeof_noqual", RID_TYPEOF_NOQUAL, 0 },
+  { "__typeof_noqual__", RID_TYPEOF_NOQUAL, 0 },
   { "__underlying_type", RID_UNDERLYING_TYPE, D_CXXONLY },
   { "__volatile",  RID_VOLATILE,   0 },
   { "__volatile__",RID_VOLATILE,   0 },
@@ -506,6 +508,7 @@ const struct c_common_resword c_common_reswords[] =
   { "typename",RID_TYPENAME,   D_CXXONLY | D_CXXWARN },
   { "typeid",  RID_TYPEID, D_CXXONLY | D_CXXWARN },
   { "typeof",  RID_TYPEOF, D_ASM | D_EXT },
+  { "typeof_noqual",   RID_TYPEOF_NOQUAL, D_ASM | D_EXT },
   { "union",   RID_UNION,  0 },
   { "unsigned",RID_UNSIGNED,   0 },
   { "using",   RID_USING,  D_CXXONLY | D_CXXWARN },
@@ -7511,6 +7514,7 @@ keyword_begins_type_specifier (enum rid keyword)
 case RID_SAT:
 case RID_COMPLEX:
 case RID_TYPEOF:
+case RID_TYPEOF_NOQUAL:
 case RID_STRUCT:
 case RID_CLASS:
 case RID_UNION:
diff --git gcc/c-family/c-common.h gcc/c-family/c-common.h
index 1748c19..3d98697 100644
--- gcc/c-family/c-common.h
+++ gcc/c-family/c-common.h
@@ -100,8 +100,9 @@ enum rid
   /* C extensions */
   RID_ASM,   RID_TYPEOF,   RID_ALIGNOF,  RID_ATTRIBUTE,  RID_VA_ARG,
   RID_EXTENSION, RID_IMAGPART, RID_REALPART, RID_LABEL,  RID_CHOOSE_EXPR,
-  RID_TYPES_COMPATIBLE_P,  RID_BUILTIN_COMPLEX, RID_BUILTIN_SHUFFLE,
-  RID_DFLOAT32, RID_DFLOAT64, RID_DFLOAT128,
+  RID_TYPES_COMPATIBLE_P,  RID_BUILTIN_COMPLEX,
+  RID_BUILTIN_SHUFFLE, RID_DFLOAT32, RID_DFLOAT64,  RID_DFLOAT128,
+  RID_TYPEOF_NOQUAL,
 
   /* TS 18661-3 keywords, in the same sequence as the TI_* values.  */
   RID_FLOAT16,
diff --git gcc/c/c-parser.c gcc/c/c-parser.c
index 6f954f2..9899592 100644
--- gcc/c/c-parser.c
+++ gcc/c/c-parser.c
@@ -495,6 +495,7 @@ c_keyword_starts_typename (enum rid keyword)
 case RID_STRUCT:
 case RID_UNION:
 case RID_TYPEOF:
+case RID_TYPEOF_NOQUAL:
 case RID_CONST:
 case RID_ATOMIC:
 case RID_VOLATILE:
@@ -671,6 +672,7 @@ c_token_starts_declspecs (c_token *token)
case RID_STRUCT:
case RID_UNION:
case RID_TYPEOF:
+   case RID_TYPEOF_NOQUAL:
case RID_CONST:
case RID_VOLATILE:
case 

Re: [PATCH] Fix more PR80928 fallout

2017-06-23 Thread Rainer Orth
Hi Richard,

> Bootstrapped (with the poisoning so far, plain patch still running)
> on x86_64-unknown-linux-gnu, testing in progress.
>
> Comments welcome, testing won't finish before I leave for the
> weekend.

an i686-pc-linux-gnu bootstrap just completed with the 64-bit libgomp
failures gone and no regressions.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: libgo patch committed: Fix ptrace implementation on MIPS

2017-06-23 Thread Ian Lance Taylor
On Fri, Jun 23, 2017 at 2:34 AM, James Cowgill  wrote:
> Hi,
>
> On 22/06/17 20:59, Ian Lance Taylor wrote:
>> James, any thoughts?
>>
>> Ian
>>
>> On Thu, Jun 22, 2017 at 12:55 AM, Andreas Schwab  wrote:
>>> On Jun 21 2017, Ian Lance Taylor  wrote:
>>>
 Index: libgo/sysinfo.c
 ===
 --- libgo/sysinfo.c   (revision 249205)
 +++ libgo/sysinfo.c   (working copy)
 @@ -102,6 +102,9 @@
 #if defined(HAVE_LINUX_NETLINK_H)
 #include <linux/netlink.h>
 #endif
+#if defined(HAVE_LINUX_PTRACE_H)
+#include <linux/ptrace.h>
+#endif
 #if defined(HAVE_LINUX_RTNETLINK_H)
 #include <linux/rtnetlink.h>
 #endif
>>>
>>> That breaks ia64:
>>>
>>> In file included from /usr/include/asm/ptrace.h:58:0,
>>>  from /usr/include/linux/ptrace.h:69,
>>>  from ../../../libgo/sysinfo.c:106:
>>> /usr/include/asm/fpu.h:57:8: error: redefinition of 'struct ia64_fpreg'
>>>  struct ia64_fpreg {
>>> ^~
>>> In file included from /usr/include/signal.h:339:0,
>>>  from 
>>> /usr/local/gcc/gcc-20170622/Build/gcc/include-fixed/sys/ucontext.h:32,
>>>  from /usr/include/ucontext.h:27,
>>>  from ../../../libgo/sysinfo.c:17:
>>> /usr/include/bits/sigcontext.h:32:8: note: originally defined here
>>>  struct ia64_fpreg
>>> ^~
>>> In file included from /usr/include/linux/ptrace.h:69:0,
>>>  from ../../../libgo/sysinfo.c:106:
>>> /usr/include/asm/ptrace.h:208:8: error: redefinition of 'struct 
>>> pt_all_user_regs'
>>>  struct pt_all_user_regs {
>>> ^~~~
>>> In file included from ../../../libgo/sysinfo.c:66:0:
>>> /usr/include/sys/ptrace.h:116:8: note: originally defined here
>>>  struct pt_all_user_regs
>>> ^~~~
>
> This looks like this glibc bug which was fixed in 2.19.
> https://sourceware.org/bugzilla/show_bug.cgi?id=762

Thanks.

Andreas, can we avoid the problem for earlier glibc versions with a
patch like the appended?

Ian
diff --git a/libgo/sysinfo.c b/libgo/sysinfo.c
index a1afc7d1..80407443 100644
--- a/libgo/sysinfo.c
+++ b/libgo/sysinfo.c
@@ -38,7 +38,10 @@
 #if defined(HAVE_NETINET_IF_ETHER_H)
 #include <netinet/if_ether.h>
 #endif
+/* Avoid https://sourceware.org/bugzilla/show_bug.cgi?id=762 .  */
+#define ia64_fpreg pt_ia64_fpreg
 #include <sys/ptrace.h>
+#undef ia64_fpreg
 #include 
 #include 
 #if defined(HAVE_SYSCALL_H)


libgo patch committed: Improve handling of panic during deferred function

2017-06-23 Thread Ian Lance Taylor
When a Go panic occurs while processing a deferred function that
recovered an earlier panic, we shouldn't report the recovered panic in
the panic stack trace. This libgo patch stops doing so by keeping
track of the panic that triggered a defer, marking it as aborted if we
see the defer again, and discarding aborted panics when a panic is
recovered. This is what the gc runtime does.

The test for this is TestRecursivePanic in runtime/crash_test.go.  We
don't run that test yet, but we will soon.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 249578)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-385efb8947af70b8425c833a1ab68ba5f357dfae
+c4adba240f9d5af8ab0534316d6b05bd988c432c
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/runtime/panic.go
===
--- libgo/go/runtime/panic.go   (revision 249559)
+++ libgo/go/runtime/panic.go   (working copy)
@@ -91,6 +91,9 @@ func throwinit() {
 // arg is a value to pass to pfn.
 func deferproc(frame *bool, pfn uintptr, arg unsafe.Pointer) {
d := newdefer()
+   if d._panic != nil {
+   throw("deferproc: d.panic != nil after newdefer")
+   }
d.frame = frame
d.panicStack = getg()._panic
d.pfn = pfn
@@ -338,17 +341,28 @@ func Goexit() {
if d == nil {
break
}
-   gp._defer = d.link
 
pfn := d.pfn
+   if pfn == 0 {
+   if d._panic != nil {
+   d._panic.aborted = true
+   d._panic = nil
+   }
+   gp._defer = d.link
+   freedefer(d)
+   continue
+   }
d.pfn = 0
 
-   if pfn != 0 {
-   var fn func(unsafe.Pointer)
-   *(*uintptr)(unsafe.Pointer(&fn)) = uintptr(unsafe.Pointer(&pfn))
-   fn(d.arg)
-   }
+   var fn func(unsafe.Pointer)
+   *(*uintptr)(unsafe.Pointer(&fn)) = uintptr(unsafe.Pointer(&pfn))
+   fn(d.arg)
 
+   if gp._defer != d {
+   throw("bad defer entry in Goexit")
+   }
+   d._panic = nil
+   gp._defer = d.link
freedefer(d)
// Note: we ignore recovers here because Goexit isn't a panic
}
@@ -442,39 +456,71 @@ func gopanic(e interface{}) {
}
 
pfn := d.pfn
+
+   // If defer was started by earlier panic or Goexit (and, since we're back here, that triggered a new panic),
+   // take defer off list. The earlier panic or Goexit will not continue running.
+   if pfn == 0 {
+   if d._panic != nil {
+   d._panic.aborted = true
+   }
+   d._panic = nil
+   gp._defer = d.link
+   freedefer(d)
+   continue
+   }
d.pfn = 0
 
-   if pfn != 0 {
-   var fn func(unsafe.Pointer)
-   *(*uintptr)(unsafe.Pointer(&fn)) = uintptr(unsafe.Pointer(&pfn))
-   fn(d.arg)
+   // Record the panic that is running the defer.
+   // If there is a new panic during the deferred call, that panic
+   // will find d in the list and will mark d._panic (this panic) aborted.
+   d._panic = p
+
+   var fn func(unsafe.Pointer)
+   *(*uintptr)(unsafe.Pointer(&fn)) = uintptr(unsafe.Pointer(&pfn))
+   fn(d.arg)
 
-   if p.recovered {
-   // Some deferred function called recover.
-   // Stop running this panic.
-   gp._panic = p.link
-
-   // Unwind the stack by throwing an exception.
-   // The compiler has arranged to create
-   // exception handlers in each function
-   // that uses a defer statement.  These
-   // exception handlers will check whether
-   // the entry on the top of the defer stack
-   // is from the current function.  If it is,
-   // we have unwound the stack far enough.
-   unwindStack()
+   if gp._defer != d {
+   throw("bad defer entry in panic")
+  

Re: [PATCH] Fix PR81175, make gather builtins pure

2017-06-23 Thread Jakub Jelinek
On Fri, Jun 23, 2017 at 03:22:13PM +0200, Richard Biener wrote:
> On Fri, 23 Jun 2017, Marc Glisse wrote:
> 
> > On Fri, 23 Jun 2017, Richard Biener wrote:
> > 
> > > The vectorizer is confused about the spurious VDEFs that are caused
> > > by gather vectorization so the following avoids them by making the
> > > builtins pure appropriately.
> > > 
> > > Bootstrap / regtest pending on x86_64-unknown-linux-gnu, ok for
> > > trunk and branch?
> > > 
> > > Thanks,
> > > Richard.
> > > 
> > > 2017-06-23  Richard Biener  
> > > 
> > >   PR target/81175
> > >   * config/i386/i386.c (struct builtin_isa): Add pure_p member.
> > >   (def_builtin2): Initialize pure_p.
> > >   (ix86_add_new_builtins): Honor pure_p.
> > >   (def_builtin_pure): New function.
> > 
> > If you svn update (or equivalent), you will notice that the above is already
> > available ;-)
> 
> Sorry, that was the GCC 7 variant of the patch ...  just scrap the
> already available pieces for trunk ;)

For GCC7, maybe it would be better to backport Marc's commit except perhaps
for the stmxcsr change and then backport your trunk patch on top of it.

Jakub


Re: [PATCH] Fix PR81175, make gather builtins pure

2017-06-23 Thread Richard Biener
On Fri, 23 Jun 2017, Marc Glisse wrote:

> On Fri, 23 Jun 2017, Richard Biener wrote:
> 
> > The vectorizer is confused about the spurious VDEFs that are caused
> > by gather vectorization so the following avoids them by making the
> > builtins pure appropriately.
> > 
> > Bootstrap / regtest pending on x86_64-unknown-linux-gnu, ok for
> > trunk and branch?
> > 
> > Thanks,
> > Richard.
> > 
> > 2017-06-23  Richard Biener  
> > 
> > PR target/81175
> > * config/i386/i386.c (struct builtin_isa): Add pure_p member.
> > (def_builtin2): Initialize pure_p.
> > (ix86_add_new_builtins): Honor pure_p.
> > (def_builtin_pure): New function.
> 
> If you svn update (or equivalent), you will notice that the above is already
> available ;-)

Sorry, that was the GCC 7 variant of the patch ...  just scrap the
already available pieces for trunk ;)

Richard.


Re: [PATCH] go.test: update MIPS architecture names

2017-06-23 Thread Ian Lance Taylor via gcc-patches
On Fri, Jun 23, 2017 at 5:40 AM, James Cowgill  wrote:
>
> This updates the go architecture names on MIPS in line with the recent
> changes to libgo.
>
> I do not have commit access, so please can someone else commit this for
> me.
>
> Thanks,
> James
>
> 2017-06-23  James Cowgill  
>
> * go.test/go-test.exp (go-set-goarch): update MIPS architecture
> names.

Thanks.  Committed.

Ian


Re: [PATCH] Fix PR81175, make gather builtins pure

2017-06-23 Thread Marc Glisse

On Fri, 23 Jun 2017, Richard Biener wrote:


The vectorizer is confused about the spurious VDEFs that are caused
by gather vectorization so the following avoids them by making the
builtins pure appropriately.

Bootstrap / regtest pending on x86_64-unknown-linux-gnu, ok for
trunk and branch?

Thanks,
Richard.

2017-06-23  Richard Biener  

PR target/81175
* config/i386/i386.c (struct builtin_isa): Add pure_p member.
(def_builtin2): Initialize pure_p.
(ix86_add_new_builtins): Honor pure_p.
(def_builtin_pure): New function.


If you svn update (or equivalent), you will notice that the above is 
already available ;-)



--
Marc Glisse


Simple reassoc transforms in match.pd

2017-06-23 Thread Marc Glisse

Hello,

here are a few simple transformations, mostly useful for types with 
undefined overflow where we do not have reassoc.


I did not name the testcase reassoc-* to leave that namespace to the 
reassoc pass, and -fno-tree-reassoc is just in case someone ever enhances 
that pass...


Bootstrap + testsuite on powerpc64le-unknown-linux-gnu.

2017-06-23  Marc Glisse  

gcc/
* match.pd ((A+-B)+(C-A), (A+B)-(A-C)): New transformations.

gcc/testsuite/
* gcc.dg/tree-ssa/assoc-1.c: New file.

--
Marc Glisse

Index: gcc/match.pd
===
--- gcc/match.pd	(revision 249585)
+++ gcc/match.pd	(working copy)
@@ -1314,20 +1314,32 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (negate @1))
   (simplify
 (plus:c (minus @0 @1) @1)
 @0)
   (simplify
(minus @0 (plus:c @0 @1))
(negate @1))
   (simplify
(minus @0 (minus @0 @1))
@1)
+  /* (A +- B) + (C - A)   -> C +- B */
+  /* (A +  B) - (A - C)   -> B + C */
+  /* More cases are handled with comparisons.  */
+  (simplify
+   (plus:c (plus:c @0 @1) (minus @2 @0))
+   (plus @2 @1))
+  (simplify
+   (plus:c (minus @0 @1) (minus @2 @0))
+   (minus @2 @1))
+  (simplify
+   (minus (plus:c @0 @1) (minus @0 @2))
+   (plus @1 @2))
 
   /* (A +- CST1) +- CST2 -> A + CST3
  Use view_convert because it is safe for vectors and equivalent for
  scalars.  */
   (for outer_op (plus minus)
(for inner_op (plus minus)
 	neg_inner_op (minus plus)
 (simplify
  (outer_op (nop_convert (inner_op @0 CONSTANT_CLASS_P@1))
 	   CONSTANT_CLASS_P@2)
Index: gcc/testsuite/gcc.dg/tree-ssa/assoc-1.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/assoc-1.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/assoc-1.c	(working copy)
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-optimized-raw -fno-tree-reassoc" } */
+
+int f0(int a,int b,int c){
+  int d = a + b;
+  int e = c + b;
+  return d - e;
+}
+int f1(int a,int b,int c){
+  int d = a + b;
+  int e = b - c;
+  return d - e;
+}
+int f2(int a,int b,int c){
+  int d = a + b;
+  int e = c - b;
+  return e + d;
+}
+int f3(int a,int b,int c){
+  int d = a - b;
+  int e = c - b;
+  return d - e;
+}
+int f4(int a,int b,int c){
+  int d = b - a;
+  int e = c - b;
+  return e + d;
+}
+
+/* { dg-final { scan-tree-dump-times "plus_expr" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "minus_expr" 3 "optimized" } } */


[PATCH] Fix PR81175, make gather builtins pure

2017-06-23 Thread Richard Biener

The vectorizer is confused about the spurious VDEFs that are caused
by gather vectorization so the following avoids them by making the
builtins pure appropriately.

Bootstrap / regtest pending on x86_64-unknown-linux-gnu, ok for
trunk and branch?

Thanks,
Richard.

2017-06-23  Richard Biener  

PR target/81175
* config/i386/i386.c (struct builtin_isa): Add pure_p member.
(def_builtin2): Initialize pure_p.
(ix86_add_new_builtins): Honor pure_p.
(def_builtin_pure): New function.
(ix86_init_mmx_sse_builtins): Use def_builtin_pure for all
gather builtins.

* gfortran.dg/pr81175.f: New testcase.

Index: gcc/config/i386/i386.c
===
*** gcc/config/i386/i386.c  (revision 249586)
--- gcc/config/i386/i386.c  (working copy)
*** struct builtin_isa {
*** 31074,31079 
--- 31074,31080 
HOST_WIDE_INT isa;  /* isa_flags this builtin is defined for */
HOST_WIDE_INT isa2; /* additional isa_flags this builtin is defined for */
bool const_p;   /* true if the declaration is constant */
+   bool pure_p;/* true if the declaration is pure */
bool leaf_p;/* true if the declaration has leaf attribute */
bool nothrow_p; /* true if the declaration has nothrow attribute */
bool set_and_not_built_p;
*** def_builtin_const (HOST_WIDE_INT mask, c
*** 31166,31171 
--- 31167,31187 
return decl;
  }
  
+ /* Like def_builtin, but also marks the function decl "pure".  */
+ 
+ static inline tree
+ def_builtin_pure (HOST_WIDE_INT mask, const char *name,
+ enum ix86_builtin_func_type tcode, enum ix86_builtins code)
+ {
+   tree decl = def_builtin (mask, name, tcode, code);
+   if (decl)
+ DECL_PURE_P (decl) = 1;
+   else
+ ix86_builtins_isa[(int) code].pure_p = true;
+ 
+   return decl;
+ }
+ 
  /* Like def_builtin, but for additional isa2 flags.  */
  
  static inline tree
*** def_builtin2 (HOST_WIDE_INT mask, const
*** 31200,31205 
--- 31216,31222 
ix86_builtins_isa[(int) code].leaf_p = false;
ix86_builtins_isa[(int) code].nothrow_p = false;
ix86_builtins_isa[(int) code].const_p = false;
+   ix86_builtins_isa[(int) code].pure_p = false;
ix86_builtins_isa[(int) code].set_and_not_built_p = true;
  }
  
*** ix86_add_new_builtins (HOST_WIDE_INT isa
*** 31259,31264 
--- 31276,31283 
  ix86_builtins[i] = decl;
  if (ix86_builtins_isa[i].const_p)
TREE_READONLY (decl) = 1;
+ if (ix86_builtins_isa[i].pure_p)
+   DECL_PURE_P (decl) = 1;
  if (ix86_builtins_isa[i].leaf_p)
DECL_ATTRIBUTES (decl) = build_tree_list (get_identifier ("leaf"),
  NULL_TREE);
*** ix86_init_mmx_sse_builtins (void)
*** 31663,31796 
   IX86_BUILTIN_RDRAND64_STEP);
  
/* AVX2 */
!   def_builtin (OPTION_MASK_ISA_AVX2, "__builtin_ia32_gathersiv2df",
!  V2DF_FTYPE_V2DF_PCDOUBLE_V4SI_V2DF_INT,
!  IX86_BUILTIN_GATHERSIV2DF);
! 
!   def_builtin (OPTION_MASK_ISA_AVX2, "__builtin_ia32_gathersiv4df",
!  V4DF_FTYPE_V4DF_PCDOUBLE_V4SI_V4DF_INT,
!  IX86_BUILTIN_GATHERSIV4DF);
! 
!   def_builtin (OPTION_MASK_ISA_AVX2, "__builtin_ia32_gatherdiv2df",
!  V2DF_FTYPE_V2DF_PCDOUBLE_V2DI_V2DF_INT,
!  IX86_BUILTIN_GATHERDIV2DF);
! 
!   def_builtin (OPTION_MASK_ISA_AVX2, "__builtin_ia32_gatherdiv4df",
!  V4DF_FTYPE_V4DF_PCDOUBLE_V4DI_V4DF_INT,
!  IX86_BUILTIN_GATHERDIV4DF);
! 
!   def_builtin (OPTION_MASK_ISA_AVX2, "__builtin_ia32_gathersiv4sf",
!  V4SF_FTYPE_V4SF_PCFLOAT_V4SI_V4SF_INT,
!  IX86_BUILTIN_GATHERSIV4SF);
! 
!   def_builtin (OPTION_MASK_ISA_AVX2, "__builtin_ia32_gathersiv8sf",
!  V8SF_FTYPE_V8SF_PCFLOAT_V8SI_V8SF_INT,
!  IX86_BUILTIN_GATHERSIV8SF);
! 
!   def_builtin (OPTION_MASK_ISA_AVX2, "__builtin_ia32_gatherdiv4sf",
!  V4SF_FTYPE_V4SF_PCFLOAT_V2DI_V4SF_INT,
!  IX86_BUILTIN_GATHERDIV4SF);
! 
!   def_builtin (OPTION_MASK_ISA_AVX2, "__builtin_ia32_gatherdiv4sf256",
!  V4SF_FTYPE_V4SF_PCFLOAT_V4DI_V4SF_INT,
!  IX86_BUILTIN_GATHERDIV8SF);
! 
!   def_builtin (OPTION_MASK_ISA_AVX2, "__builtin_ia32_gathersiv2di",
!  V2DI_FTYPE_V2DI_PCINT64_V4SI_V2DI_INT,
!  IX86_BUILTIN_GATHERSIV2DI);
! 
!   def_builtin (OPTION_MASK_ISA_AVX2, "__builtin_ia32_gathersiv4di",
!  V4DI_FTYPE_V4DI_PCINT64_V4SI_V4DI_INT,
!  IX86_BUILTIN_GATHERSIV4DI);
! 
!   def_builtin (OPTION_MASK_ISA_AVX2, "__builtin_ia32_gatherdiv2di",
!  V2DI_FTYPE_V2DI_PCINT64_V2DI_V2DI_INT,
!  IX86_BUILTIN_GATHERDIV2DI);
! 

[PATCH] go.test: update MIPS architecture names

2017-06-23 Thread James Cowgill
Hi,

This updates the go architecture names on MIPS in line with the recent
changes to libgo.

I do not have commit access, so please can someone else commit this for
me.

Thanks,
James

2017-06-23  James Cowgill  

* go.test/go-test.exp (go-set-goarch): update MIPS architecture
names.
---
 gcc/testsuite/go.test/go-test.exp | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/gcc/testsuite/go.test/go-test.exp b/gcc/testsuite/go.test/go-test.exp
index 5f6ef299e55..4b10e4e2d16 100644
--- a/gcc/testsuite/go.test/go-test.exp
+++ b/gcc/testsuite/go.test/go-test.exp
@@ -213,29 +213,27 @@ proc go-set-goarch { } {
#error FOO
#endif
}] {
-   set goarch "mipso32"
+   set goarch "mips"
} elseif [check_no_compiler_messages mipsn32 assembly {
#if _MIPS_SIM != _ABIN32
#error FOO
#endif
}] {
-   set goarch "mipsn32"
+   set goarch "mips64p32"
} elseif [check_no_compiler_messages mipsn64 assembly {
#if _MIPS_SIM != _ABI64
#error FOO
#endif
}] {
-   set goarch "mipsn64"
-   } elseif [check_no_compiler_messages mipso64 assembly {
-   #if _MIPS_SIM != _ABIO64
-   #error FOO
-   #endif
-   }] {
-   set goarch "mipso64"
+   set goarch "mips64"
} else {
perror "$target_triplet: unrecognized MIPS ABI"
return ""
}
+
+   if [istarget "mips*el-*-*"] {
+   append goarch "le"
+   }
}
"powerpc*-*-*" {
if [check_effective_target_ilp32] {
-- 
2.13.1


Re: PR81136: ICE from inconsistent DR_MISALIGNMENTs

2017-06-23 Thread Richard Sandiford
Richard Biener  writes:
> On Thu, Jun 22, 2017 at 1:30 PM, Richard Sandiford
>  wrote:
>> The test case triggered this assert in vect_update_misalignment_for_peel:
>>
>>   gcc_assert (DR_MISALIGNMENT (dr) / dr_size ==
>>   DR_MISALIGNMENT (dr_peel) / dr_peel_size);
>>
>> We knew that the two DRs had the same misalignment at runtime, but when
>> considered in isolation, one data reference guaranteed a higher compile-time
>> base alignment than the other.
>>
>> In the test case this looks like a missed opportunity.  Both references
>> are unconditional, so it should be possible to use the highest of the
>> available base alignment guarantees when analyzing each reference.
>> The patch does this.
>>
>> However, as the comment in the patch says, the base alignment guarantees
>> provided by a conditional reference only apply if the reference occurs
>> at least once.  In this case it would be legitimate for two references
>> to have the same runtime misalignment and for one reference to provide a
>> stronger compile-time guarantee than the other about what the misalignment
>> actually is.  The patch therefore relaxes the assert to handle that case.
>
> Hmm, but you don't actually check whether a reference occurs only conditional,
> do you?  You just seem to say that for masked loads/stores the reference
> is conditional (I believe that's not true).  But for a loop like
>
>  for (;;)
>if (a[i])
>  sum += b[j];
>
> you still assume b[j] executes unconditionally?

Maybe the documentation isn't clear enough, but DR_IS_CONDITIONAL
was supposed to mean "even if the containing statement executes
and runs to completion, the reference might not actually occur".
The example above isn't conditional in that sense because the
reference to b[j] does occur if the store is reached and completes.

Masked loads and stores are conditional in that sense though.
The reference only occurs if the mask is nonzero; the memory
isn't touched otherwise.  The functions are used to if-convert
things like:

   for (...)
 a[i] = b[i] ? c[i] : d[i];

where there's no guarantee that it's safe to access c[i] when !b[i]
(or d[i] when b[i]).  No reference occurs for an all-false mask.

> The vectorizer of course only sees unconditionally executed stmts.
>
> So - I'd simply not add this DR_IS_CONDITIONAL.  Did you run into
> any real-world (testsuite) issues without this?

Dropping DR_IS_CONDITIONAL would cause us to make invalid alignment
assumptions in silly corner cases.  I could add a scan test for it,
for targets with masked loads and stores.  It wouldn't trigger
an execution failure though because we assume that targets with
masked loads and stores allow unaligned accesses:

  /* For now assume all conditional loads/stores support unaligned
 access without any special code.  */
  if (is_gimple_call (stmt)
  && gimple_call_internal_p (stmt)
  && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD
  || gimple_call_internal_fn (stmt) == IFN_MASK_STORE))
return dr_unaligned_supported;

So the worst that would happen is that we'd supposedly peel for
alignment, but actually misalign everything instead, and so make
things slower rather than quicker.

> Note that the assert is to prevent bogus information.  Iff we aligned
> DR with base alignment 8 and misalign 3 then if another same-align
> DR has base alignment 16 we can't simply zero its DR_MISALIGNMENT
> as it still can be 8 after aligning DR.
>
> So I think it's wrong to put DRs with differing base-alignment into
> the same-align-refs chain, those should get their DR_MISALIGNMENT
> updated independently after peeling.

DR_MISALIGNMENT is relative to the vector alignment rather than
the base alignment though.  So:

a) when looking for references *A1 and *A2 with the same alignment,
   we simply have to prove that A1 % vecalign == A2 % vecalign.
   This doesn't require any knowledge about the base alignment.
   If we break the addresses down as:

  A1 = BASE1 + REST1,  REST1 = INIT1 + OFFSET1 + X * STEP1
  A2 = BASE2 + REST2,  REST2 = INIT2 + OFFSET2 + X * STEP2

   and can prove that BASE1 == BASE2, the alignment of that base
   isn't important.  We simply need to prove that REST1 % vecalign
   == REST2 % vecalign for all X.

b) In the assert, we've peeled the loop so that DR_PEEL is guaranteed
   to be vector-aligned.  If DR_PEEL is A1 in the example above, we have
   A1 % vecalign == 0, so A2 % vecalign must be 0 too.  This again doesn't
   rely on the base alignment being known.

What a high base alignment for DR_PEEL gives us is the ability to know
at compile time how many iterations need to be peeled to make DR_PEEL aligned.
But the points above apply regardless of whether we know that value at
compile time or not.
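Points (a) and (b) are just modular arithmetic, which a toy C check makes concrete (plain C, nothing GCC-specific; the BASE/REST values and `vecalign` are made-up inputs):

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the same-alignment argument: A1 = BASE + REST1 and
   A2 = BASE + REST2.  If REST1 % vecalign == REST2 % vecalign then
   A1 % vecalign == A2 % vecalign, regardless of how BASE itself is
   aligned; and peeling until A1 % vecalign == 0 also aligns A2.  */
static int
same_misalignment_p (uintptr_t base, uintptr_t rest1, uintptr_t rest2,
                     uintptr_t vecalign)
{
  return (base + rest1) % vecalign == (base + rest2) % vecalign;
}
```

Note the result does not change if `base` is replaced by any other value, which is the point: knowledge of the base alignment is not needed for either (a) or (b).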

In examples like the test case, we would have known at compile time that
VF-1 iterations would need to be peeled if we'd picked the store as the
DR_PEEL, but 

[PATCH] Fix more PR80928 fallout

2017-06-23 Thread Richard Biener

SLP induction vectorization runs into the issue that it remembers
pointers to PHI nodes in the SLP tree during analysis.  But those
may get invalidated by loop copying (for prologue/epilogue peeling
or versioning) as the low-level CFG helper copy_bbs works in the
way of copying individual BBs plus their outgoing edges but with
old destinations and at the end re-directing the edges to the
desired location.  In SSA this triggers the whole machinery of
making room for new PHI nodes -- that is undesirable because it
causes re-allocation of PHI nodes in the set of source blocks.

After much pondering I arrived at the following (least ugly) solution
to this "problem" (well, I define it as a problem, it's at least
an inefficiency and a workaround in the vectorizer would be way
uglier).  Namely simply do not trigger the SSA machinery for
blocks with BB_DUPLICATED (I skimmed all other users and they seem
fine).

In the process I also implemented some poisoning of the old PHI node
when we reallocate (well, free) PHI nodes.  But that triggers some
other issues, one fixed by the tree-ssa-phionlycprop.c hunk below.
So I'm not submitting it as part of this fix.

Bootstrapped (with the poisoning so far, plain patch still running)
on x86_64-unknown-linux-gnu, testing in progress.

Comments welcome, testing won't finish before I leave for the
weekend.

Thanks,
Richard.

2017-06-23  Richard Biener  

* cfghooks.c (duplicate_block): Do not copy BB_DUPLICATED flag.
(copy_bbs): Set BB_DUPLICATED flag early.
(execute_on_growing_pred): Do not execute for BB_DUPLICATED
marked blocks.
(execute_on_shrinking_pred): Likewise.
* tree-ssa.c (ssa_redirect_edge): Do not look for PHI args in
BB_DUPLICATED blocks.
* tree-ssa-phionlycprop.c (eliminate_degenerate_phis_1): Properly
iterate over all PHIs considering removal of *gsi.

Index: gcc/cfghooks.c
===
--- gcc/cfghooks.c  (revision 249552)
+++ gcc/cfghooks.c  (working copy)
@@ -1087,7 +1087,7 @@ duplicate_block (basic_block bb, edge e,
   if (after)
 move_block_after (new_bb, after);
 
-  new_bb->flags = bb->flags;
+  new_bb->flags = (bb->flags & ~BB_DUPLICATED);
   FOR_EACH_EDGE (s, ei, bb->succs)
 {
   /* Since we are creating edges from a new block to successors
@@ -1207,7 +1207,8 @@ flow_call_edges_add (sbitmap blocks)
 void
 execute_on_growing_pred (edge e)
 {
-  if (cfg_hooks->execute_on_growing_pred)
+  if (! (e->dest->flags & BB_DUPLICATED)
+  && cfg_hooks->execute_on_growing_pred)
 cfg_hooks->execute_on_growing_pred (e);
 }
 
@@ -1217,7 +1218,8 @@ execute_on_growing_pred (edge e)
 void
 execute_on_shrinking_pred (edge e)
 {
-  if (cfg_hooks->execute_on_shrinking_pred)
+  if (! (e->dest->flags & BB_DUPLICATED)
+  && cfg_hooks->execute_on_shrinking_pred)
 cfg_hooks->execute_on_shrinking_pred (e);
 }
 
@@ -1353,6 +1355,12 @@ copy_bbs (basic_block *bbs, unsigned n,
   basic_block bb, new_bb, dom_bb;
   edge e;
 
+  /* Mark the blocks to be copied.  This is used by edge creation hooks
+ to decide whether to reallocate PHI nodes capacity to avoid reallocating
+ PHIs in the set of source BBs.  */
+  for (i = 0; i < n; i++)
+bbs[i]->flags |= BB_DUPLICATED;
+
   /* Duplicate bbs, update dominators, assign bbs to loops.  */
   for (i = 0; i < n; i++)
 {
@@ -1360,7 +1368,6 @@ copy_bbs (basic_block *bbs, unsigned n,
   bb = bbs[i];
   new_bb = new_bbs[i] = duplicate_block (bb, NULL, after);
   after = new_bb;
-  bb->flags |= BB_DUPLICATED;
   if (bb->loop_father)
{
  /* Possibly set loop header.  */
Index: gcc/tree-ssa-phionlycprop.c
===
--- gcc/tree-ssa-phionlycprop.c (revision 249552)
+++ gcc/tree-ssa-phionlycprop.c (working copy)
@@ -420,10 +420,11 @@ eliminate_degenerate_phis_1 (basic_block
   basic_block son;
   bool cfg_altered = false;
 
-  for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+  for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi);)
 {
   gphi *phi = gsi.phi ();
-
+  /* We might end up removing PHI so advance the iterator now.  */
+  gsi_next (&gsi);
   cfg_altered |= eliminate_const_or_copy (phi, interesting_names,
  need_eh_cleanup);
 }
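The hunk above is the classic advance-before-remove iteration fix: fetch the successor before possibly deleting the element the iterator points at.  A minimal stand-alone C analogue (invented names, a singly linked list instead of a PHI sequence):

```c
#include <assert.h>
#include <stdlib.h>

struct node { int val; struct node *next; };

/* Analogue of the gsi fix above: read the next pointer *before*
   deciding whether to delete the current node, so removal cannot
   invalidate the iterator.  Removes all even values from the list.  */
static void
remove_evens (struct node **head)
{
  struct node **link = head;
  while (*link)
    {
      struct node *cur = *link;
      struct node *next = cur->next;  /* advance first */
      if (cur->val % 2 == 0)
        {
          *link = next;               /* unlink, then free */
          free (cur);
        }
      else
        link = &cur->next;
    }
}
```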

Index: gcc/tree-ssa.c
===
--- gcc/tree-ssa.c  (revision 249552)
+++ gcc/tree-ssa.c  (working copy)
@@ -142,21 +142,24 @@ ssa_redirect_edge (edge e, basic_block d
 
   redirect_edge_var_map_clear (e);
 
-  /* Remove the appropriate PHI arguments in E's destination block.  */
-  for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
-{
-  tree def;
-  source_location locus;
-
-  phi = gsi.phi ();
-  def = gimple_phi_arg_def (phi, e->dest_idx);
-  locus 

RE: Add support for use_hazard_barrier_return function attribute

2017-06-23 Thread Prachi Godbole
Please find the updated patch below. I hope I've covered everything.
I've added the test for inline restriction, could you check if I got all the 
options correct?

Changelog:

2017-06-23  Prachi Godbole  

gcc/
* config/mips/mips.h (machine_function): New variable
use_hazard_barrier_return_p.
* config/mips/mips.md (UNSPEC_JRHB): New unspec.
(mips_hb_return_internal): New insn pattern.
* config/mips/mips.c (mips_attribute_table): Add attribute
use_hazard_barrier_return.
(mips_use_hazard_barrier_return_p): New static function.
(mips_function_attr_inlinable_p): Likewise.
(mips_compute_frame_info): Set use_hazard_barrier_return_p.  Emit error
for unsupported architecture choice.
(mips_function_ok_for_sibcall, mips_can_use_return_insn): Return false
for use_hazard_barrier_return.
(mips_expand_epilogue): Emit hazard barrier return.
* doc/extend.texi: Document use_hazard_barrier_return.

gcc/testsuite/
* gcc.target/mips/hazard-barrier-return-attribute.c: New tests.

Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi (revision 246899)
+++ gcc/doc/extend.texi (working copy)
@@ -4496,6 +4496,12 @@ On MIPS targets, you can use the @code{nocompressi
 to locally turn off MIPS16 and microMIPS code generation.  This attribute
 overrides the @option{-mips16} and @option{-mmicromips} options on the
 command line (@pxref{MIPS Options}).
+
+@item use_hazard_barrier_return
+@cindex @code{use_hazard_barrier_return} function attribute, MIPS
+This function attribute instructs the compiler to generate a hazard
+barrier return that clears all execution and instruction hazards while
+returning, instead of generating a normal return instruction.
 @end table
 
 @node MSP430 Function Attributes
Index: gcc/config/mips/mips.md
===
--- gcc/config/mips/mips.md (revision 246899)
+++ gcc/config/mips/mips.md (working copy)
@@ -156,6 +156,9 @@
 
   ;; The `.insn' pseudo-op.
   UNSPEC_INSN_PSEUDO
+
+  ;; Hazard barrier return.
+  UNSPEC_JRHB
 ])
 
 (define_constants
@@ -6578,6 +6581,20 @@
   [(set_attr "type" "jump")
    (set_attr "mode" "none")])
 
+;; Insn to clear execution and instruction hazards while returning.
+;; However, it doesn't clear hazards created by the insn in its delay slot.
+;; Thus, explicitly place a nop in its delay slot.
+
+(define_insn "mips_hb_return_internal"
+  [(return)
+   (unspec_volatile [(match_operand 0 "pmode_register_operand" "")]
+   UNSPEC_JRHB)]
+  ""
+  {
+return "%(jr.hb\t$31%/%)";
+  }
+  [(set_attr "insn_count" "2")])
+
 ;; Normal return.
 
 (define_insn "_internal"
Index: gcc/config/mips/mips.c
===
--- gcc/config/mips/mips.c  (revision 246899)
+++ gcc/config/mips/mips.c  (working copy)
@@ -615,6 +615,7 @@ static const struct attribute_spec mips_attribute_
 mips_handle_use_shadow_register_set_attr, false },
   { "keep_interrupts_masked",  0, 0, false, true,  true, NULL, false },
   { "use_debug_exception_return", 0, 0, false, true,  true, NULL, false },
+  { "use_hazard_barrier_return", 0, 0, true, false, false, NULL, false },
   { NULL, 0, 0, false, false, false, NULL, false }
 };
 

@@ -1275,6 +1276,16 @@ mips_use_debug_exception_return_p (tree type)
   TYPE_ATTRIBUTES (type)) != NULL;
 }
 
+/* Check if the attribute to use hazard barrier return is set for
+   the function declaration DECL.  */
+
+static bool
+mips_use_hazard_barrier_return_p (const_tree decl)
+{
+  return lookup_attribute ("use_hazard_barrier_return",
+  DECL_ATTRIBUTES (decl)) != NULL;
+}
+
 /* Return the set of compression modes that are explicitly required
by the attributes in ATTRIBUTES.  */
 
@@ -1460,6 +1471,21 @@ mips_can_inline_p (tree caller, tree callee)
   return default_target_can_inline_p (caller, callee);
 }
 
+/* Implement TARGET_FUNCTION_ATTRIBUTE_INLINABLE_P.
+
+   A function requesting clearing of all instruction and execution hazards
+   before returning cannot be inlined - thereby not clearing any hazards.
+   All our other function attributes are related to how out-of-line copies
+   should be compiled or called.  They don't in themselves prevent inlining.  */
+
+static bool
+mips_function_attr_inlinable_p (const_tree decl)
+{
+  if (mips_use_hazard_barrier_return_p (decl))
+return false;
+  return true;
+}
+
 /* Handle an "interrupt" attribute with an optional argument.  */
 
 static tree
@@ -7863,6 +7889,17 @@ mips_function_ok_for_sibcall (tree decl, tree exp
   && !targetm.binds_local_p (decl))
 return false;
 
+  /* Functions that need to return with a hazard barrier cannot sibcall because:
+
+ 1) Hazard barriers are not possible for direct jumps

Re: PR81136: ICE from inconsistent DR_MISALIGNMENTs

2017-06-23 Thread Richard Biener
On Fri, Jun 23, 2017 at 1:19 PM, Richard Biener
 wrote:
> On Thu, Jun 22, 2017 at 1:30 PM, Richard Sandiford
>  wrote:
>> The test case triggered this assert in vect_update_misalignment_for_peel:
>>
>>   gcc_assert (DR_MISALIGNMENT (dr) / dr_size ==
>>   DR_MISALIGNMENT (dr_peel) / dr_peel_size);
>>
>> We knew that the two DRs had the same misalignment at runtime, but when
>> considered in isolation, one data reference guaranteed a higher compile-time
>> base alignment than the other.
>>
>> In the test case this looks like a missed opportunity.  Both references
>> are unconditional, so it should be possible to use the highest of the
>> available base alignment guarantees when analyzing each reference.
>> The patch does this.
>>
>> However, as the comment in the patch says, the base alignment guarantees
>> provided by a conditional reference only apply if the reference occurs
>> at least once.  In this case it would be legitimate for two references
>> to have the same runtime misalignment and for one reference to provide a
>> stronger compile-time guarantee than the other about what the misalignment
>> actually is.  The patch therefore relaxes the assert to handle that case.
>
> Hmm, but you don't actually check whether a reference occurs only conditional,
> do you?  You just seem to say that for masked loads/stores the reference
> is conditional (I believe that's not true).  But for a loop like
>
>  for (;;)
>if (a[i])
>  sum += b[j];
>
> you still assume b[j] executes unconditionally?
>
> The vectorizer of course only sees unconditionally executed stmts.
>
> So - I'd simply not add this DR_IS_CONDITIONAL.  Did you run into
> any real-world (testsuite) issues without this?

+   for (int i = 0; i < n; ++i)
+ ptr[i] = c[i] ? ((struct s *) (ptr - 1))->array[i] : 0;
+

note that this example shows an if-conversion bug as if-conversion
makes the access to ((struct s *) (ptr - 1))->array[i] unconditional
if it may not trap (or the trap helper doesn't consider alignment-related
traps).

I'm not convinced we should worry ... (ok, I didn't say that).

Richard.

> Note that the assert is to prevent bogus information.  Iff we aligned
> DR with base alignment 8 and misalign 3 then if another same-align
> DR has base alignment 16 we can't simply zero its DR_MISALIGNMENT
> as it still can be 8 after aligning DR.
>
> So I think it's wrong to put DRs with differing base-alignment into
> the same-align-refs chain, those should get their DR_MISALIGNMENT
> updated independenlty after peeling.
>
> I'd rather not mix fixing this with the improvement to eventually use a
> larger align for the other DR if possible.
>
> Thanks,
> Richard.
>
>> Tested on powerpc64-linux-gnu, aarch64-linux-gnu and x86_64-linux-gnu.
>> OK to install?
>>
>> Richard
>>
>>
>> 2017-06-22  Richard Sandiford  
>>
>> gcc/
>> PR tree-optimization/81136
>> * tree-vectorizer.h: Include tree-hash-traits.h.
>> (vec_base_alignments): New typedef.
>> (vec_info): Add a base_alignments field.
>> (vect_compute_base_alignments): Declare.
>> * tree-data-ref.h (data_reference): Add an is_conditional field.
>> (DR_IS_CONDITIONAL): New macro.
>> (create_data_ref): Add an is_conditional argument.
>> * tree-data-ref.c (create_data_ref): Likewise.  Use it to initialize
>> the is_conditional field.
>> (data_ref_loc): Add an is_conditional field.
>> (get_references_in_stmt): Set the is_conditional field.
>> (find_data_references_in_stmt): Update call to create_data_ref.
>> (graphite_find_data_references_in_stmt): Likewise.
>> * tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Likewise.
>> * tree-vect-data-refs.c (vect_analyze_data_refs): Likewise.
>> (vect_get_base_address): New function.
>> (vect_compute_base_alignments): Likewise.
>> (vect_compute_base_alignment): Likewise, split out from...
>> (vect_compute_data_ref_alignment): ...here.  Use precomputed
>> base alignments.  Only compute a new base alignment here if the
>> reference is conditional.
>> (vect_update_misalignment_for_peel): Allow the compile-time
>> DR_MISALIGNMENTs of two references with the same runtime alignment
>> to be different if one of the references is conditional.
>> (vect_find_same_alignment_drs): Compare base addresses instead
>> of base objects.
>> (vect_compute_data_ref_alignment): Call vect_compute_base_alignments.
>> * tree-vect-slp.c (vect_slp_analyze_bb_1): Likewise.
>> (new_bb_vec_info): Initialize base_alignments.
>> * tree-vect-loop.c (new_loop_vec_info): Likewise.
>> * tree-vectorizer.c (vect_destroy_datarefs): Release base_alignments.
>>
>> gcc/testsuite/
>> PR tree-optimization/81136

Re: PR81136: ICE from inconsistent DR_MISALIGNMENTs

2017-06-23 Thread Richard Biener
On Thu, Jun 22, 2017 at 1:30 PM, Richard Sandiford
 wrote:
> The test case triggered this assert in vect_update_misalignment_for_peel:
>
>   gcc_assert (DR_MISALIGNMENT (dr) / dr_size ==
>   DR_MISALIGNMENT (dr_peel) / dr_peel_size);
>
> We knew that the two DRs had the same misalignment at runtime, but when
> considered in isolation, one data reference guaranteed a higher compile-time
> base alignment than the other.
>
> In the test case this looks like a missed opportunity.  Both references
> are unconditional, so it should be possible to use the highest of the
> available base alignment guarantees when analyzing each reference.
> The patch does this.
>
> However, as the comment in the patch says, the base alignment guarantees
> provided by a conditional reference only apply if the reference occurs
> at least once.  In this case it would be legitimate for two references
> to have the same runtime misalignment and for one reference to provide a
> stronger compile-time guarantee than the other about what the misalignment
> actually is.  The patch therefore relaxes the assert to handle that case.

Hmm, but you don't actually check whether a reference occurs only conditional,
do you?  You just seem to say that for masked loads/stores the reference
is conditional (I believe that's not true).  But for a loop like

 for (;;)
   if (a[i])
 sum += b[j];

you still assume b[j] executes unconditionally?

The vectorizer of course only sees unconditionally executed stmts.

So - I'd simply not add this DR_IS_CONDITIONAL.  Did you run into
any real-world (testsuite) issues without this?

Note that the assert is to prevent bogus information.  Iff we aligned
DR with base alignment 8 and misalign 3 then if another same-align
DR has base alignment 16 we can't simply zero its DR_MISALIGNMENT
as it still can be 8 after aligning DR.

So I think it's wrong to put DRs with differing base-alignment into
the same-align-refs chain, those should get their DR_MISALIGNMENT
updated independently after peeling.

I'd rather not mix fixing this with the improvement to eventually use a
larger align for the other DR if possible.

Thanks,
Richard.

> Tested on powerpc64-linux-gnu, aarch64-linux-gnu and x86_64-linux-gnu.
> OK to install?
>
> Richard
>
>
> 2017-06-22  Richard Sandiford  
>
> gcc/
> PR tree-optimization/81136
> * tree-vectorizer.h: Include tree-hash-traits.h.
> (vec_base_alignments): New typedef.
> (vec_info): Add a base_alignments field.
> (vect_compute_base_alignments): Declare.
> * tree-data-ref.h (data_reference): Add an is_conditional field.
> (DR_IS_CONDITIONAL): New macro.
> (create_data_ref): Add an is_conditional argument.
> * tree-data-ref.c (create_data_ref): Likewise.  Use it to initialize
> the is_conditional field.
> (data_ref_loc): Add an is_conditional field.
> (get_references_in_stmt): Set the is_conditional field.
> (find_data_references_in_stmt): Update call to create_data_ref.
> (graphite_find_data_references_in_stmt): Likewise.
> * tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Likewise.
> * tree-vect-data-refs.c (vect_analyze_data_refs): Likewise.
> (vect_get_base_address): New function.
> (vect_compute_base_alignments): Likewise.
> (vect_compute_base_alignment): Likewise, split out from...
> (vect_compute_data_ref_alignment): ...here.  Use precomputed
> base alignments.  Only compute a new base alignment here if the
> reference is conditional.
> (vect_update_misalignment_for_peel): Allow the compile-time
> DR_MISALIGNMENTs of two references with the same runtime alignment
> to be different if one of the references is conditional.
> (vect_find_same_alignment_drs): Compare base addresses instead
> of base objects.
> (vect_compute_data_ref_alignment): Call vect_compute_base_alignments.
> * tree-vect-slp.c (vect_slp_analyze_bb_1): Likewise.
> (new_bb_vec_info): Initialize base_alignments.
> * tree-vect-loop.c (new_loop_vec_info): Likewise.
> * tree-vectorizer.c (vect_destroy_datarefs): Release base_alignments.
>
> gcc/testsuite/
> PR tree-optimization/81136
> * gcc.dg/vect/pr81136.c: New test.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2017-06-08 08:51:43.347264181 +0100
> +++ gcc/tree-vectorizer.h   2017-06-22 12:23:21.288421018 +0100
> @@ -22,6 +22,7 @@ Software Foundation; either version 3, o
>  #define GCC_TREE_VECTORIZER_H
>
>  #include "tree-data-ref.h"
> +#include "tree-hash-traits.h"
>  #include "target.h"
>
>  /* Used for naming of new temporaries.  */
> @@ -84,6 +85,10 @@ struct stmt_info_for_cost {
>
>  typedef vec<stmt_info_for_cost> stmt_vector_for_cost;
>

Re: [committed] Fix -fstack-check with really big frames on aarch64

2017-06-23 Thread Christophe Lyon
On 22 June 2017 at 19:21, Jeff Law  wrote:
>
> This time with the test.  Just #includes 20031023-1.c with a suitable dg
> directive to ensure we compile with -fstack-check.
>
> I won't be surprised if other targets fail this test.  It's a really big
> stack frame :-)
>
> Anyways, committed to the trunk.
>
>
>
> Jeff
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 0a3426eef3e..03a824f6b3f 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,8 @@
> +2017-06-22  Jeff Law  
> +
> +   * config/aarch64/aarch64.c (aarch64_emit_probe_stack_range): Handle
> +   frame sizes that do not satisfy aarch64_uimm12_shift.
> +
>  2017-06-22  Jan Hubicka 
>
> * profile-count.h (apply_probability,
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 3364a02e89c..95592f9fa17 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -2766,11 +2766,19 @@ aarch64_emit_probe_stack_range (HOST_WIDE_INT first, HOST_WIDE_INT size)
>  plus_constant (Pmode, stack_pointer_rtx, -first));
>
>/* LAST_ADDR = SP + FIRST + ROUNDED_SIZE.  */
> -  emit_set_insn (reg2,
> -plus_constant (Pmode, stack_pointer_rtx,
> -   -(first + rounded_size)));
> -
> -
> +  HOST_WIDE_INT adjustment = - (first + rounded_size);
> +  if (! aarch64_uimm12_shift (adjustment))
> +   {
> + aarch64_internal_mov_immediate (reg2, GEN_INT (adjustment),
> + true, Pmode);
> + emit_set_insn (reg2, gen_rtx_PLUS (Pmode, stack_pointer_rtx, reg2));
> +   }
> +  else
> +   {
> + emit_set_insn (reg2,
> +plus_constant (Pmode, stack_pointer_rtx, 
> adjustment));
> +   }
> +
>/* Step 3: the loop
>
>  do
> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> index 641e4124e37..e162386fb68 100644
> --- a/gcc/testsuite/ChangeLog
> +++ b/gcc/testsuite/ChangeLog
> @@ -1,3 +1,7 @@
> +2017-06-22  Jeff Law  
> +
> +   * gcc.c-torture/compile/stack-check-1.c: New test.
> +
>  2016-06-22  Richard Biener  
>
> * gcc.dg/vect/pr65947-1.c: Remove xfail.
> diff --git a/gcc/testsuite/gcc.c-torture/compile/stack-check-1.c b/gcc/testsuite/gcc.c-torture/compile/stack-check-1.c
> new file mode 100644
> index 000..4058eb58709
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/stack-check-1.c
> @@ -0,0 +1,2 @@
> +/* { dg-additional-options "-fstack-check" } */
> +#include "20031023-1.c"
>
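The hunk above falls back to materializing the full adjustment in a register whenever `first + rounded_size` does not fit the AArch64 add/sub immediate encoding.  A rough stand-alone model of that check (my reading of `aarch64_uimm12_shift`: an unsigned 12-bit immediate, optionally shifted left by 12 — treat this as an assumption, not the authoritative definition):

```c
#include <assert.h>

/* Rough model of aarch64_uimm12_shift: AArch64 add/sub immediates are
   an unsigned 12-bit value, optionally shifted left by 12 bits.  So
   0..4095 fits, as does any multiple of 4096 up to 0xfff << 12.  */
static int
uimm12_shift_p (long long val)
{
  return (val & ~0xfffLL) == 0             /* plain 12-bit immediate */
         || (val & ~(0xfffLL << 12)) == 0; /* 12-bit immediate, LSL #12 */
}
```

Note the adjustment in the patch is negative, so it always fails such a check and takes the `aarch64_internal_mov_immediate` path; the point of the guard is that small frames with a representable magnitude keep the single `plus_constant` form.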

Hi,

A minor comment at this stage: this new test fails to compile for
thumb-1 targets:
testsuite/gcc.c-torture/compile/20031023-1.c:27:1: sorry,
unimplemented: -fstack-check=specific for Thumb-1

for instance on arm-none-linux-gnueabi --with-mode=thumb --with-cpu=cortex-a9
and forcing -march=armv5t in runtest flags.

Is there a clean way to make it unsupported?

Thanks,

Christophe


Re: [PATCH GCC][5/5]Enable tree loop distribution at -O3 and above optimization levels.

2017-06-23 Thread Richard Biener
On Fri, Jun 23, 2017 at 10:47 AM, Bin.Cheng  wrote:
> On Fri, Jun 23, 2017 at 6:04 AM, Jeff Law  wrote:
>> On 06/07/2017 02:07 AM, Bin.Cheng wrote:
>>> On Tue, Jun 6, 2017 at 6:47 PM, Jeff Law  wrote:
 On 06/02/2017 05:52 AM, Bin Cheng wrote:
> Hi,
> This patch enables -ftree-loop-distribution by default at -O3 and above optimization levels.
> Bootstrap and test at O2/O3 on x86_64 and AArch64.  is it OK?
>
> Note I don't have strong opinion here and am fine with either it's accepted or rejected.
>
> Thanks,
> bin
> 2017-05-31  Bin Cheng  
>
>   * opts.c (default_options_table): Enable OPT_ftree_loop_distribution
>   for -O3 and above levels.
 I think the question is how does this generally impact the performance
 of the generated code and to a lesser degree compile-time.

 Do you have any performance data?
>>> Hi Jeff,
>>> At this stage of the patch, only hmmer is impacted and improved
>>> obviously in my local run of spec2006 for x86_64 and AArch64.  In long
>>> term, loop distribution is also one prerequisite transformation to
>>> handle bwaves (at least).  For these two impacted cases, it helps to
>>> resolve the gap against ICC.  I didn't check compilation time slow
>>> down, we can restrict it to problem with small partition number if
>>> that's a problem.
>> Just a note. I know you've iterated further with Richi -- I'm not
>> objecting to the patch, nor was I ready to approve.
>>
>> Are you and Richi happy with this as-is or are you looking to submit
>> something newer based on the conversation the two of you have had?
> Hi Jeff,
> The patch series is updated in various ways according to review
> comments, for example, it restricts compilation time by checking
> number of data references against MAX_DATAREFS_FOR_DATADEPS as well as
> restores data dependence cache.  There are still two missing parts I'd
> like to do as followup patches: one is loop nest distribution and the
> other is a data-locality cost model (at least) for small cases.  Now
> Richi approved most patches except the last major one, but I still
> need another iterate for some (approved) patches in order to fix
> mistake/typo introduced when I separating the patch.

The patch is ok after the approved parts of the ldist series have been committed.
Note your patch lacks updates to invoke.texi (what options are enabled at -O3).
Please adjust that before committing.

Thanks,
Richard.

> Thanks,
> bin
>>
>> jeff


Re: Avoid generating useless range info

2017-06-23 Thread Richard Biener
On Fri, Jun 23, 2017 at 12:32 PM, Jakub Jelinek  wrote:
> On Fri, Jun 23, 2017 at 12:24:25PM +0200, Richard Biener wrote:
>> > void
>> > set_nonzero_bits (tree name, const wide_int_ref &mask)
>> > {
>> >   gcc_assert (!POINTER_TYPE_P (TREE_TYPE (name)));
>> >   if (SSA_NAME_RANGE_INFO (name) == NULL)
>> > set_range_info (name, VR_RANGE,
>> >TYPE_MIN_VALUE (TREE_TYPE (name)),
>> >TYPE_MAX_VALUE (TREE_TYPE (name)));
>> >   range_info_def *ri = SSA_NAME_RANGE_INFO (name);
>> >   ri->set_nonzero_bits (mask);
>> > }
>> >
>> > Let me know how you'd like me to proceed.
>>
>> Just factor out a set_range_info_raw and call that then from here.
>
> And don't call it if the mask is all ones.  Perhaps set_range_info
> and set_nonzero_bits even should ggc_free and clear earlier range_info_def
> if the range is all values and nonzero bit mask is all ones.
> Or do we share range_info_def between multiple SSA_NAMEs?  If yes, of course
> we shouldn't use ggc_free.

We shouldn't as we don't copy on change.  We do for points-to but only for the
bitmap pointer IIRC.
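Jakub's suggestion — free the record outright when it carries no information, i.e. full range and all-ones nonzero-bits mask — is safe precisely because the info is not shared.  A toy model of that shape (invented names and plain malloc/free standing in for the GC; not the GCC internal API):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct range_info { uint64_t nonzero_bits; };

/* Toy model: keep range info only while it says something.  An
   all-ones nonzero-bits mask carries no information, so drop the
   record instead of storing it; otherwise allocate on demand.  */
static void
set_nonzero_bits_model (struct range_info **slot, uint64_t mask)
{
  if (mask == UINT64_MAX)
    {
      free (*slot);   /* safe only because the record is unshared */
      *slot = NULL;
      return;
    }
  if (*slot == NULL)
    *slot = malloc (sizeof **slot);
  (*slot)->nonzero_bits = mask;
}
```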

Richard.

>
> Jakub


Re: [PATCH][AArch64] Mark symbols as constant

2017-06-23 Thread Wilco Dijkstra
Andreas Schwab wrote:
>
> This breaks gcc.target/aarch64/reload-valid-spoff.c with -mabi=ilp32:

Indeed, there is an odd ILP32 bug that causes high/lo_sum to be generated
in SI mode in expand:

(insn 15 14 16 4 (set (reg:SI 125)
(high:SI (symbol_ref/u:DI ("*.LC1") [flags 0x2]))) 
 (nil))
(insn 16 15 17 4 (set (reg:SI 124)
(lo_sum:SI (reg:SI 125)
(symbol_ref/u:DI ("*.LC1") [flags 0x2])))

Eventually this may end up as a 64-bit adrp, a 32-bit lo_sum and a load that
expects a 64-bit address...
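To see why the SImode high/lo_sum pair is broken, consider an adrp/lo12-style split in plain C (a sketch of the address arithmetic only — `materialize` and its flag are invented, and real adrp is PC-relative):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of adrp/lo12-style materialization: split an address into a
   4 KiB-aligned high part and a 12-bit low part.  Done in 64 bits the
   parts recombine exactly; truncating the high part to 32 bits (as the
   SImode high/lo_sum above effectively does) loses any address bits
   beyond bit 31.  */
static uint64_t
materialize (uint64_t addr, int wide)
{
  uint64_t hi = addr & ~(uint64_t) 0xfff;
  uint64_t lo = addr & 0xfff;
  if (!wide)
    hi = (uint32_t) hi;  /* 32-bit high part: upper half dropped */
  return hi + lo;
}
```

Any constant pool entry placed above 4 GiB would thus be addressed incorrectly, which is why the load expecting a 64-bit address can misbehave.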

I have a simple workaround to disable the symbol optimization in ILP32,
but I'm looking into fixing the underlying cause.

Wilco


Re: [PATCH GCC][09/13]Simply cost model merges partitions with the same references

2017-06-23 Thread Bin.Cheng
On Fri, Jun 23, 2017 at 11:48 AM, Richard Biener
 wrote:
> On Fri, Jun 23, 2017 at 12:19 PM, Bin.Cheng  wrote:
>> On Mon, Jun 19, 2017 at 4:20 PM, Richard Biener
>>  wrote:
>>> On Mon, Jun 19, 2017 at 3:40 PM, Bin.Cheng  wrote:
 On Wed, Jun 14, 2017 at 2:54 PM, Richard Biener
  wrote:
> On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng  wrote:
>> Hi,
>> Current primitive cost model merges partitions with data references
>> sharing the same base address.  I believe it's designed to maximize data
>> reuse in distribution, but that should be done by dedicated data reusing
>> algorithm.  At this stage of merging, we should be conservative and only
>> merge partitions with the same references.
>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>
> Well, I'd say "conservative" is merging more, not less.  For example
> splitting a[i+1] from a[i]
> would be bad(?), so I'd see to allow unequal DR_INIT as "equal" for
> merging.  Maybe
> DR_INIT within a cacheline or so.
>
> How many extra distributions in say SPEC do you get from this change alone?
 Hi,
 I collected data for spec2006 only with/without this patch.  I am a
 bit surprised that it doesn't change the number of distributed loops.
>
> It shows also that having partition->reads_and_writes would be nice
> ...  the code duplication
 Yeah, I merged read/write data references in previous patch, now this
 duplication is gone.  Update patch attached.  Is it OK?
>>>
>>> +  gcc_assert (i < datarefs_vec.length ());
>>> +  dr1 = datarefs_vec[i];
>>>
>>> these asserts are superfluous -- vec::operator[] does them as well.
>>>
>>> Ok if you remove them.
>> Done.
>> I realized I made mistakes when measuring the impact of this patch.
>> This patch only apparently causes failure of
>> gcc.dg/tree-ssa/ldist-6.c, so here is the updated patch.  I also
>> collected the number of distributed loops in spec2k6 as below:
>>  trunk:  5882
>>  only this patch: 7130
>>  whole patch series: 5237
>> So the conclusion is, this patch does aggressive distribution like
>> ldist-6.c, which means worse data-locality.  The following patch does
>> more fusion which mitigates impact of this patch and results in
>> conservative distribution overall.
>
> What changed in the patch?  Did you attach the correct one?
No code changed in this one.  I just added test case change which
can't be resolved by following patches.  ldist-6.c slipped away
because of a bug in patch:

[11/13]Annotate partition by its parallelism execution type

>
> I'm not sure ldist-6.c is a "valid" testcase but I didn't try to see
> where it was reduced from.
>
>>   But as we lack of data locality
>> cost model, ldist-6.c remains failed even after applying whole patch
>> series.  Hmm, a cache-sensitive cost model is needed for several passes
>> now, distribution, prefetch and (possible) interchange.
>> Richard, do you have second comment based on the new data?
>
> I expected the "only this patch" result somewhat, as said, I'd have
> allowed "related" references to fuse by not requiring equal
> DR_INIT for example.
>
> I suggest to go forward with it in its current form.  We can tweak the
> cost model later.
Yeah.
>
> Thanks,
> Richard.
>
>> Thanks,
>> bin
>> 2017-06-20  Bin Cheng  
>>
>> * tree-loop-distribution.c (ref_base_address): Delete.
>> (similar_memory_accesses): Rename ...
>> (share_memory_accesses): ... to this.  Check if partitions access
>> the same memory reference.
>> (distribute_loop): Call share_memory_accesses.
>>
>> gcc/testsuite/ChangeLog
>> 2017-06-20  Bin Cheng  
>>
>> * gcc.dg/tree-ssa/ldist-6.c: XFAIL.


Re: [PATCH GCC][08/13]Refactoring structure partition for distribution

2017-06-23 Thread Richard Biener
On Fri, Jun 23, 2017 at 12:29 PM, Bin.Cheng  wrote:
> On Mon, Jun 19, 2017 at 4:18 PM, Richard Biener
>  wrote:
>> On Mon, Jun 19, 2017 at 3:37 PM, Bin.Cheng  wrote:
>>> On Wed, Jun 14, 2017 at 2:47 PM, Richard Biener
>>>  wrote:
 On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng  wrote:
> Hi,
> This patch refactors struct partition for later distribution.  It records
> bitmap of data references in struct partition rather than vertices' data in
> partition dependence graph.  It simplifies code as well as enables following
> rewriting.
> Bootstrap and test on x86_64 and AArch64.  Is it OK?

 Ok.
>>> Hi,
>>> I updated patch by merging read/write data references together in
>>> struct partition.  This helps remove code duplication.  Is it OK?
>>
>> Ok.
> Sorry, I made a mistake when separating the patch.  The previous patch uses
> the uninitialized variable "dir".  Though the related code was removed by a
> following patch, for this specific version the code is wrong.  Is it
> OK?
> Like:
> +int dir = pg_add_dependence_edges (rdg, dir,
> +   partition1->datarefs,
> +   partition2->datarefs);
> Now changed to
> +int dir = pg_add_dependence_edges (rdg, 0,
> +   partition1->datarefs,
> +   partition2->datarefs);

Ok.

> Thanks,
> bin
>
>>
>> Richard.
>>
>>> Thanks,
>>> bin
>>> 2017-06-07  Bin Cheng  
>>>
>>> * tree-loop-distribution.c (struct partition): New field recording
>>> its data reference.
>>> (partition_alloc, partition_free): Init and release data refs.
>>> (partition_merge_into): Merge data refs.
>>> (build_rdg_partition_for_vertex): Collect data refs for partition.
>>> (pg_add_dependence_edges): Change parameters from vector to bitmap.
>>> Update uses.
>> (distribute_loop): Remove data refs from vertex data of partition
>> graph.


Re: [PATCH GCC][11/13]Annotate partition by its parallelism execution type

2017-06-23 Thread Richard Biener
On Fri, Jun 23, 2017 at 12:25 PM, Bin.Cheng  wrote:
> And the patch.
>
> On Fri, Jun 23, 2017 at 11:24 AM, Bin.Cheng  wrote:
>> On Tue, Jun 20, 2017 at 12:34 PM, Richard Biener
>>  wrote:
>>> On Tue, Jun 20, 2017 at 11:18 AM, Bin.Cheng  wrote:
 On Fri, Jun 16, 2017 at 11:10 AM, Richard Biener
  wrote:
> On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng  wrote:
>> Hi,
>> This patch checks and records if a partition can be executed in parallel by
>> looking at whether there exist data dependence cycles.  The information is 
>> needed
>> for distribution because the idea is to distribute parallel type 
>> partitions
>> away from sequential ones.  I believe current distribution doesn't work
>> very well because it does blind distribution/fusion.
>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>
> +  /* In case of no data dependence.  */
> +  if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
> +return false;
> +  /* Or the data dependence can be resolved by compilation time alias
> + check.  */
> +  else if (!alias_sets_conflict_p (get_alias_set (DR_REF (dr1)),
> +  get_alias_set (DR_REF (dr2
> +return false;
>
> dependence analysis should use TBAA already, in which cases do you need 
> this?
> It seems to fall foul of the easy mistake of not honoring GCC's memory 
> model
> as well ... see dr_may_alias_p.
 I see.  Patch updated with this branch removed.

>
> +  /* Further check if any data dependence prevents us from executing the
> + partition parallelly.  */
> +  EXECUTE_IF_SET_IN_BITMAP (partition->reads, 0, i, bi)
> +{
> +  dr1 = (*datarefs_vec)[i];
> +  EXECUTE_IF_SET_IN_BITMAP (partition->writes, 0, j, bj)
> +   {
>
> what about write-write dependences?
>
> +  EXECUTE_IF_SET_IN_BITMAP (partition->reads, 0, i, bi)
> +{
> +  dr1 = (*datarefs_vec)[i];
> +  EXECUTE_IF_SET_IN_BITMAP (partition->writes, i + 1, j, bj)
> +   {
> + dr2 = (*datarefs_vec)[j];
> + /* Partition can only be executed sequentially if there is any
> +data dependence cycle.  */
>
> exact copy of the loop nest follows?!  Maybe you meant to iterate
> over writes in the first loop.
 Yes, this is a copy-paste typo.  Patch is also simplified because
 read/write are recorded together now.  Is it OK?
>>>
>>> Ok.
>> Sorry, I have to update this patch because of a mistake of mine.  I didn't
>> update the partition type when fusing partitions.  For some partition
>> fusions the update is necessary, otherwise we end up with an inaccurate
>> type and inaccurate fusion later.  Is it Ok?

Ok.

>> Thanks,
>> bin
>> 2017-06-20  Bin Cheng  
>>
>> * tree-loop-distribution.c (enum partition_type): New.
>> (struct partition): New field type.
>> (partition_merge_into): Add parameter.  Update partition type.
>> (data_dep_in_cycle_p, update_type_for_merge): New functions.
>> (build_rdg_partition_for_vertex): Compute partition type.
>> (rdg_build_partitions): Dump partition type.
>> (distribute_loop): Update calls to partition_merge_into.


Re: [PATCH GCC][10/13]Compute and cache data dependence relation

2017-06-23 Thread Richard Biener
On Fri, Jun 23, 2017 at 12:22 PM, Bin.Cheng  wrote:
> On Tue, Jun 20, 2017 at 12:32 PM, Richard Biener
>  wrote:
>> On Tue, Jun 20, 2017 at 11:15 AM, Bin.Cheng  wrote:
>>> On Fri, Jun 16, 2017 at 11:03 AM, Richard Biener
>>>  wrote:
 On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng  wrote:
> Hi,
> This patch computes and caches data dependence relation in a hash table
> so that it can be queried multiple times later for partition dependence
> check.
> Bootstrap and test on x86_64 and AArch64.  Is it OK?

 +/* Vector of data dependence relations.  */
 +static vec<ddr_p> *ddrs_vec;
 +
 +/* Hash table for data dependence relation in the loop to be distributed. 
  */
 +static hash_table<ddr_entry_hasher> *ddrs_table;

 avoid the extra indirection.

 +/* Hashtable entry for data reference relation.  */
 +struct ddr_entry
 +{
 +  data_reference_p a;
 +  data_reference_p b;
 +  ddr_p ddr;
 +  hashval_t hash;
 +};
 ...
 +/* Hash table equality function for data reference relation.  */
 +
 +inline bool
 +ddr_entry_hasher::equal (const ddr_entry *entry1, const ddr_entry *entry2)
 +{
 +  return (entry1->hash == entry2->hash
 + && DR_STMT (entry1->a) == DR_STMT (entry2->a)
 + && DR_STMT (entry1->b) == DR_STMT (entry2->b)
 + && operand_equal_p (DR_REF (entry1->a), DR_REF (entry2->a), 0)
 + && operand_equal_p (DR_REF (entry1->b), DR_REF (entry2->b), 0));
 +}

 what's the issue with using hash_table  with a custom hasher?
 That is, simply key on the dataref pointers (hash them, compare those
 for equality)?

 Your scheme looks too complicated / expensive to me ...

 You can drop ddrs_vec needed only for memory removal if you traverse
 the hashtable.
>>> Thanks for reviewing.  Patch simplified as suggested.  Is it OK?
>>
>> +inline hashval_t
>> +ddr_hasher::hash (const data_dependence_relation *ddr)
>> +{
>> +  return iterative_hash_object (DDR_A (ddr),
>> +   iterative_hash_object (DDR_B (ddr), 0));
>> +}
>> +
>>
>> please use
>>
>> inchash::hash h;
>> h.add_ptr (DDR_A (ddr));
>> h.add_ptr (DDR_B (ddr));
>> return h.end ();
>>
>> Ok with that change.
> Done, patch updated.

Ok.

> Thanks,
> bin
>>
>> Richard.
>>
>>> Thanks,
>>> bin
>>> 2017-06-17  Bin Cheng  
>>>
>>> * tree-loop-distribution.c (struct ddr_hasher): New.
>>> (ddr_hasher::hash, ::equal, get_data_dependence): New function.
>>> (ddrs_table): New.
>>> (classify_partition): Call get_data_dependence.
>>> (pg_add_dependence_edges): Ditto.
>>> (distribute_loop): Release data dependence hash table.
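
The pointer-keyed caching scheme reviewed above (hash the DDR by the addresses of its two datarefs, memoize the dependence computation) can be sketched outside of GCC as a small standalone model.  The type names, the hash-combine constant, and the dummy dependence computation are stand-ins for illustration, not GCC's actual inchash/hash_table API:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins for GCC's data_reference_p / ddr_p types.
struct data_reference { int id; };
struct dependence { bool may_depend; };

using dr_pair = std::pair<const data_reference *, const data_reference *>;

// Combine the two dataref pointers into one hash, analogous to
//   inchash::hash h; h.add_ptr (DDR_A (ddr)); h.add_ptr (DDR_B (ddr));
struct dr_pair_hash {
  std::size_t operator() (const dr_pair &p) const {
    std::size_t h = std::hash<const void *> () (p.first);
    // Generic hash-combine step; the constant is arbitrary, not GCC's.
    h ^= std::hash<const void *> () (p.second)
         + 0x9e3779b9u + (h << 6) + (h >> 2);
    return h;
  }
};

std::unordered_map<dr_pair, dependence, dr_pair_hash> ddr_cache;
int compute_calls = 0;

// Memoized query: the (expensive) dependence computation runs once per
// dataref pair; later queries are pure hash lookups.
dependence get_data_dependence (const data_reference *a,
                                const data_reference *b)
{
  dr_pair key (a, b);
  auto it = ddr_cache.find (key);
  if (it != ddr_cache.end ())
    return it->second;
  ++compute_calls;                    // stands in for compute_affine_dependence
  dependence d = { a->id == b->id };  // dummy result for the sketch
  ddr_cache.emplace (key, d);
  return d;
}
```

Keying on the pointers themselves is what makes the hasher cheap: no DR_STMT or operand_equal_p comparisons are needed, as Richard pointed out.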


Re: [PATCH GCC][12/13]Workaround reduction statements for distribution

2017-06-23 Thread Richard Biener
On Fri, Jun 23, 2017 at 12:26 PM, Bin.Cheng  wrote:
> On Tue, Jun 20, 2017 at 12:36 PM, Richard Biener
>  wrote:
>> On Tue, Jun 20, 2017 at 11:20 AM, Bin.Cheng  wrote:
>>> On Fri, Jun 16, 2017 at 6:15 PM, Bin.Cheng  wrote:
 On Fri, Jun 16, 2017 at 11:21 AM, Richard Biener
  wrote:
> On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng  wrote:
>> Hi,
>> For now, loop distribution handles variables used outside of the loop as
>> reductions.
>> This is inaccurate because all partitions contain statements defining
>> induction vars.
>
> But final induction values are usually not used outside of the loop...
 This is actually for induction variables which are used outside of the 
 loop.
>
> What is missing is loop distribution trying to change partition order.  
> In fact
> we somehow assume we can move a reduction across a detected builtin
> (I don't remember if we ever check for validity of that...).
 Hmm, I am not sure when we can't.  If there is any dependence between
 builtin/reduction partitions, it should be captured by RDG or PG,
 otherwise the partitions are independent and can be freely ordered as
 long as reduction partition is scheduled last?
>
>> Ideally we should factor out scev-propagation as a standalone interface
>> which can be called when necessary.  Before that, this patch simply
>> works around the
>> reduction issue by checking if the statement belongs to all partitions.
>> If yes,
>> the reduction must be computed in the last partition no matter how the 
>> loop is
>> distributed.
>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>
> stmt_in_all_partitions is not kept up-to-date during partition merging 
> and if
> merging makes the reduction partition(s) pass the stmt_in_all_partitions
> test your simple workaround doesn't work ...
 I think it doesn't matter because:
   A) it's really a workaround for induction variables.  In general,
 induction variables are included in all partitions.
   B) After classifying partitions, we immediately fuse all reduction
 partitions.  More stmt_in_all_partitions means we are fusing a
 non-reduction partition with a reduction partition, so the newly
 generated (stmt_in_all_partitions) are actually not reduction
 statements.  The workaround wouldn't work anyway even if the bitmap
 were maintained.
>
> As written it's a valid optimization, but can you please note its
> limitation in a comment?
 Yeah, I will add comment explaining it.
>>> Comment added in new version patch.  It also computes bitmap outside
>>> now, is it OK?
>>
>> Ok.  Can you add a testcase for this as well please?  I think the
>> series up to this
>> is now fully reviewed, I defered 1/n (the new IFN) to the last one
>> containing the
>> runtime versioning.  Can you re-post that (you can merge with the IFN patch)
>> to apply after the series has been applied up to this?
> Test case added.

Ok.

> Thanks,
> bin
> 2017-06-20  Bin Cheng  
>
> * tree-loop-distribution.c (classify_partition): New parameter and
> better handle reduction statement.
> (rdg_build_partitions): Revise comment.
> (distribute_loop): Compute statements in all partitions and pass it
> to classify_partition.
>
> gcc/testsuite/ChangeLog
> 2017-06-20  Bin Cheng  
>
> * gcc.dg/tree-ssa/ldist-26.c: New test.
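
A minimal model of the workaround discussed above: statements present in every partition's bitmap (typically the induction-variable updates) will be emitted with the last partition regardless of how the loop is split, so they need not force reduction handling.  Partitions are modeled here as plain bitsets of statement ids; the names are illustrative, not GCC's:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of computing stmt_in_all_partitions: intersect the statement
// bitmaps of all partitions, mirroring a bitmap_and_into loop over
// partition->stmts in the real pass.  A statement in the intersection
// belongs to every partition and is computed in whichever partition is
// emitted last, no matter how the loop is distributed.
std::vector<bool>
stmts_in_all_partitions (const std::vector<std::vector<bool>> &partitions)
{
  std::vector<bool> common = partitions.at (0);
  for (const auto &part : partitions)
    for (std::size_t i = 0; i < common.size (); ++i)
      common[i] = common[i] && part[i];
  return common;
}
```

As noted in the review, this set is computed once up front and is not maintained across partition fusion, which is acceptable for the induction-variable case the workaround targets.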


Re: [PATCH GCC][09/13]Simply cost model merges partitions with the same references

2017-06-23 Thread Richard Biener
On Fri, Jun 23, 2017 at 12:19 PM, Bin.Cheng  wrote:
> On Mon, Jun 19, 2017 at 4:20 PM, Richard Biener
>  wrote:
>> On Mon, Jun 19, 2017 at 3:40 PM, Bin.Cheng  wrote:
>>> On Wed, Jun 14, 2017 at 2:54 PM, Richard Biener
>>>  wrote:
 On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng  wrote:
> Hi,
> Current primitive cost model merges partitions with data references 
> sharing the same
> base address.  I believe it's designed to maximize data reuse in 
> distribution, but
> that should be done by a dedicated data-reuse algorithm.  At this stage 
> of merging,
> we should be conservative and only merge partitions with the same 
> references.
> Bootstrap and test on x86_64 and AArch64.  Is it OK?

 Well, I'd say "conservative" is merging more, not less.  For example
 splitting a[i+1] from a[i]
 would be bad(?), so I'd see to allow unequal DR_INIT as "equal" for
 merging.  Maybe
 DR_INIT within a cacheline or so.

 How many extra distributions in say SPEC do you get from this change alone?
>>> Hi,
>>> I collected data for spec2006 only with/without this patch.  I am a
>>> bit surprised that it doesn't change the number of distributed loops.

 It shows also that having partition->reads_and_writes would be nice
 ...  the code duplication
>>> Yeah, I merged read/write data references in previous patch, now this
>>> duplication is gone.  Update patch attached.  Is it OK?
>>
>> +  gcc_assert (i < datarefs_vec.length ());
>> +  dr1 = datarefs_vec[i];
>>
>> these asserts are superfluous -- vec::operator[] does them as well.
>>
>> Ok if you remove them.
> Done.
> I realized I made mistakes when measuring the impact of this patch.
> This patch apparently only causes the failure of
> gcc.dg/tree-ssa/ldist-6.c, so here is the updated patch.  I also
> collected the number of distributed loops in spec2k6 as below:
>  trunk:  5882
>  only this patch: 7130
>  whole patch series: 5237
> So the conclusion is, this patch does aggressive distribution like
> ldist-6.c, which means worse data locality.  The following patch does
> more fusion, which mitigates the impact of this patch and results in
> conservative distribution overall.

What changed in the patch?  Did you attach the correct one?

I'm not sure ldist-6.c is a "valid" testcase but I didn't try to see
where it was reduced from.

>   But as we lack a data locality
> cost model, ldist-6.c remains failing even after applying the whole patch
> series.  Hmm, a cache-sensitive cost model is needed for several passes
> now: distribution, prefetch and (possibly) interchange.
> Richard, do you have second comment based on the new data?

I expected the "only this patch" result somewhat, as said, I'd have
allowed "related" references to fuse by not requiring equal
DR_INIT for example.

I suggest to go forward with it in its current form.  We can tweak the
cost model later.

Thanks,
Richard.

> Thanks,
> bin
> 2017-06-20  Bin Cheng  
>
> * tree-loop-distribution.c (ref_base_address): Delete.
> (similar_memory_accesses): Rename ...
> (share_memory_accesses): ... to this.  Check if partitions access
> the same memory reference.
> (distribute_loop): Call share_memory_accesses.
>
> gcc/testsuite/ChangeLog
> 2017-06-20  Bin Cheng  
>
> * gcc.dg/tree-ssa/ldist-6.c: XFAIL.
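
Richard's suggestion above — treating unequal DR_INIT values as "equal" for fusion when they fall within a cache line, so that a[i] and a[i+1] still fuse — could be sketched as a predicate like the following.  The 64-byte line size, the helper name, and the distance-based test are assumptions for illustration, not anything in the pass:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical fusion predicate: two constant byte offsets (DR_INIT
// values) are "close enough" to fuse when less than one cache line
// apart, so a[i] and a[i + 1] stay in one partition while references a
// whole line apart do not.  64 bytes is an assumed line size.
const int64_t CACHELINE_SIZE = 64;

bool within_cacheline_distance (int64_t init1, int64_t init2)
{
  return std::llabs (init1 - init2) < CACHELINE_SIZE;
}
```

This is looser than requiring the exact same reference, and stricter than the old same-base-address test, which is the middle ground the thread is weighing against the measured distribution counts.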


Re: Avoid generating useless range info

2017-06-23 Thread Jakub Jelinek
On Fri, Jun 23, 2017 at 12:24:25PM +0200, Richard Biener wrote:
> > void
> > set_nonzero_bits (tree name, const wide_int_ref &mask)
> > {
> >   gcc_assert (!POINTER_TYPE_P (TREE_TYPE (name)));
> >   if (SSA_NAME_RANGE_INFO (name) == NULL)
> > set_range_info (name, VR_RANGE,
> >TYPE_MIN_VALUE (TREE_TYPE (name)),
> >TYPE_MAX_VALUE (TREE_TYPE (name)));
> >   range_info_def *ri = SSA_NAME_RANGE_INFO (name);
> >   ri->set_nonzero_bits (mask);
> > }
> >
> > Let me know how you'd like me to proceed.
> 
> Just factor out a set_range_info_raw and call that then from here.

And don't call it if the mask is all ones.  Perhaps set_range_info
and set_nonzero_bits even should ggc_free and clear earlier range_info_def
if the range is all values and nonzero bit mask is all ones.
Or do we share range_info_def between multiple SSA_NAMEs?  If yes, of course
we shouldn't use ggc_free.

Jakub
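
The pattern under discussion — allocate range info lazily on first use, and skip an all-ones nonzero-bits mask because it carries no information — looks roughly like this in a standalone model.  This is not GCC's real tree/SSA API; every name below is illustrative:

```cpp
#include <cassert>
#include <cstdint>

// Standalone model of the discussed pattern.  Range info is allocated
// lazily, and setting an all-ones nonzero-bits mask on a name with no
// range info yet is a no-op: "every bit may be nonzero" says nothing.
struct range_info
{
  uint64_t min, max;
  uint64_t nonzero_bits;  // bit set => that bit may be nonzero
};

struct ssa_name
{
  range_info *info = nullptr;
};

void set_nonzero_bits (ssa_name &name, uint64_t mask)
{
  // Jakub's point: don't allocate useless range info for an
  // all-ones mask.
  if (mask == ~UINT64_C (0) && name.info == nullptr)
    return;
  if (name.info == nullptr)
    // Lazy init with the full range of the (here: uint64_t) type,
    // mirroring the proposed set_range_info_raw factoring.  The
    // allocation is deliberately never freed in this sketch.
    name.info = new range_info{0, ~UINT64_C (0), ~UINT64_C (0)};
  name.info->nonzero_bits = mask;
}
```

Whether an existing record can be freed when it degenerates back to "all values, all bits" depends on the sharing question raised above, which this sketch sidesteps.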


Re: [PATCH GCC][13/13]Distribute loop with loop versioning under runtime alias check

2017-06-23 Thread Bin.Cheng
On Tue, Jun 20, 2017 at 10:22 AM, Bin.Cheng  wrote:
> On Mon, Jun 12, 2017 at 6:03 PM, Bin Cheng  wrote:
>> Hi,
>> This is the main patch rewriting loop distribution in order to handle hmmer.
>> It improves loop distribution by versioning loop under runtime alias check 
>> conditions.
>> As described in comments, the patch basically implements distribution in the 
>> following
>> steps:
>>
>>  1) Seed partitions with specific type statements.  For now we support
>> two types of seed statements: statements defining a variable used outside
>> of the loop, and statements storing to memory.
>>  2) Build reduced dependence graph (RDG) for loop to be distributed.
>> The vertices (RDG:V) model all statements in the loop and the edges
>> (RDG:E) model flow and control dependencies between statements.
>>  3) Apart from RDG, compute data dependencies between memory references.
>>  4) Starting from seed statement, build up partition by adding depended
>> statements according to RDG's dependence information.  Partition is
>> classified as parallel type if it can be executed in parallel; or as
>> sequential type if it can't.  Parallel type partition is further
>> classified as different builtin kinds if it can be implemented as
>> builtin function calls.
>>  5) Build partition dependence graph (PG) based on data dependencies.
>> The vertices (PG:V) model all partitions and the edges (PG:E) model
>> all data dependencies between every partitions pair.  In general,
>> data dependence is either compilation time known or unknown.  In C
>> family languages, there exist quite a number of compile-time unknown
>> dependencies because of possible alias relations of data references.
>> We categorize PG's edge to two types: "true" edge that represents
>> compilation time known data dependencies; "alias" edge for all other
>> data dependencies.
>>  6) Traverse subgraph of PG as if all "alias" edges don't exist.  Merge
>> partitions in each strong connected component (SCC) correspondingly.
>> Build new PG for merged partitions.
>>  7) Traverse PG again and this time with both "true" and "alias" edges
>> included.  We try to break SCCs by removing some edges.  Because
>> SCCs by "true" edges are all fused in step 6), we can break SCCs
>> by removing some "alias" edges.  It's NP-hard to choose an optimal
>> edge set; fortunately a simple approximation is good enough for us
>> given the small problem scale.
>>  8) Collect all data dependencies of the removed "alias" edges.  Create
>> runtime alias checks for collected data dependencies.
>>  9) Version loop under the condition of runtime alias checks.  Given
>> loop distribution generally introduces additional overhead, it is
>> only useful if vectorization is achieved in distributed loop.  We
>> version loop with internal function call IFN_LOOP_DIST_ALIAS.  If
>> no distributed loop can be vectorized, we simply remove distributed
>> loops and recover to the original one.
>>
>> Also, there is more to improve in the future (which I think isn't
>> difficult):
>>TODO:
>>  1) We only distribute innermost loops now.  This pass should handle loop
>> nests in the future.
>>  2) We only fuse partitions in SCC now.  A better fusion algorithm is
>> desired to minimize loop overhead, maximize parallelism and maximize
>>
>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>
> Trivially updated due to changes in previous patches.  Also fixed issues
> mentioned by Kugan.
Rebased V3 for changes in previous patches.  Bootstrap and test on
x86_64 and aarch64.

Thanks,
bin
>
> Thanks,
> bin
>
> 2017-06-17  Bin Cheng  
>
> * tree-loop-distribution.c: Add general explanation of the pass.
> (pg_add_dependence_edges): New parameter.  Handle alias data
> dependence specially and record it in the parameter if asked.
> (struct pg_vdata, pg_edata, pg_edge_callback_data): New structs.
> (init_partition_graph_vertices, add_partition_graph_edge): New.
> (pg_skip_alias_edge, free_partition_graph_edata_cb): New.
> (free_partition_graph_vdata, build_partition_graph): New.
> (sort_partitions_by_post_order, merge_dep_scc_partitions): New.
> (pg_collect_alias_ddrs, break_alias_scc_partitions): New.
> (data_ref_segment_size, latch_dominated_by_data_ref): New.
> (compute_alias_check_pairs, version_loop_by_alias_check): New.
> (version_for_distribution_p, finalize_partitions): New.
> (distribute_loop): Handle alias data dependence specially.  Factor
> out loop fusion code as functions and call these functions.
>
> gcc/testsuite/ChangeLog
> 2017-06-17  Bin Cheng  
>
> * gcc.dg/tree-ssa/ldist-4.c: 
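
Steps 5)-8) above can be illustrated with a tiny standalone model: vertices are partitions, each edge is a dependence labeled either "true" (compile-time known) or "alias" (only resolvable at runtime); SCCs of the true-edge subgraph are fused, and the alias edges still crossing the resulting components are the ones that need runtime checks.  This is a sketch of the idea only, not the pass's actual graph code:

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Toy model of the partition dependence graph (PG): "true" edges are
// compile-time known dependences, "alias" edges need a runtime check.
struct pg_edge { int from, to; bool alias; };

// Tarjan SCC over the subgraph with alias edges ignored (step 6);
// returns a component id per vertex.
std::vector<int>
true_edge_sccs (int n, const std::vector<pg_edge> &edges)
{
  std::vector<std::vector<int>> adj (n);
  for (const pg_edge &e : edges)
    if (!e.alias)
      adj[e.from].push_back (e.to);

  std::vector<int> comp (n, -1), low (n), num (n, -1), stack;
  int timer = 0, ncomp = 0;
  std::function<void (int)> dfs = [&] (int v)
  {
    num[v] = low[v] = timer++;
    stack.push_back (v);
    for (int w : adj[v])
      {
        if (num[w] == -1)
          {
            dfs (w);
            low[v] = std::min (low[v], low[w]);
          }
        else if (comp[w] == -1)
          low[v] = std::min (low[v], num[w]);
      }
    if (low[v] == num[v])
      {
        int w;
        do
          {
            w = stack.back ();
            stack.pop_back ();
            comp[w] = ncomp;
          }
        while (w != v);
        ncomp++;
      }
  };
  for (int v = 0; v < n; ++v)
    if (num[v] == -1)
      dfs (v);
  return comp;
}

// Alias edges that still cross components after step 6 correspond to
// the data dependences for which runtime alias checks are emitted
// (step 8); alias edges inside a fused SCC cost nothing.
int
surviving_alias_edges (int n, const std::vector<pg_edge> &edges)
{
  std::vector<int> comp = true_edge_sccs (n, edges);
  int count = 0;
  for (const pg_edge &e : edges)
    if (e.alias && comp[e.from] != comp[e.to])
      ++count;
  return count;
}
```

The real pass additionally breaks alias-edge SCCs heuristically (step 7) before collecting the surviving edges; that refinement is omitted here.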

Re: [PATCH GCC][08/13]Refactoring structure partition for distribution

2017-06-23 Thread Bin.Cheng
On Mon, Jun 19, 2017 at 4:18 PM, Richard Biener
 wrote:
> On Mon, Jun 19, 2017 at 3:37 PM, Bin.Cheng  wrote:
>> On Wed, Jun 14, 2017 at 2:47 PM, Richard Biener
>>  wrote:
>>> On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng  wrote:
 Hi,
 This patch refactors struct partition for later distribution.  It records
 a bitmap of data references in struct partition rather than in vertices'
 data in the partition dependence graph.  It simplifies the code as well
 as enables the following rewriting.
 Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>>
>>> Ok.
>> Hi,
>> I updated patch by merging read/write data references together in
>> struct partition.  This helps remove code duplication.  Is it OK?
>
> Ok.
Sorry, I made a mistake when separating the patch.  The previous patch uses
the uninitialized variable "dir".  Though the related code was removed by a
following patch, for this specific version the code is wrong.  Is it
OK?
Like:
+int dir = pg_add_dependence_edges (rdg, dir,
+   partition1->datarefs,
+   partition2->datarefs);
Now changed to
+int dir = pg_add_dependence_edges (rdg, 0,
+   partition1->datarefs,
+   partition2->datarefs);

Thanks,
bin

>
> Richard.
>
>> Thanks,
>> bin
>> 2017-06-07  Bin Cheng  
>>
>> * tree-loop-distribution.c (struct partition): New field recording
>> its data reference.
>> (partition_alloc, partition_free): Init and release data refs.
>> (partition_merge_into): Merge data refs.
>> (build_rdg_partition_for_vertex): Collect data refs for partition.
>> (pg_add_dependence_edges): Change parameters from vector to bitmap.
>> Update uses.
>> (distribute_loop): Remove data refs from vertex data of partition
>> graph.
From 324f9c5505cdf928e4178ab610aafc2800ca8577 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Fri, 9 Jun 2017 12:29:24 +0100
Subject: [PATCH 07/13] struct-partition-refactoring-20170608.txt

---
 gcc/tree-loop-distribution.c | 179 +++
 1 file changed, 94 insertions(+), 85 deletions(-)

diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index a013556..eafd119 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -500,30 +500,40 @@ enum partition_kind {
 PKIND_NORMAL, PKIND_MEMSET, PKIND_MEMCPY, PKIND_MEMMOVE
 };
 
+/* Partition for loop distribution.  */
 struct partition
 {
+  /* Statements of the partition.  */
   bitmap stmts;
+  /* Loops of the partition.  */
   bitmap loops;
+  /* True if the partition defines variable which is used outside of loop.  */
   bool reduction_p;
+  /* For builtin partition, true if it executes one iteration more than
+ number of loop (latch) iterations.  */
   bool plus_one;
   enum partition_kind kind;
   /* data-references a kind != PKIND_NORMAL partition is about.  */
   data_reference_p main_dr;
   data_reference_p secondary_dr;
+  /* Number of loop (latch) iterations.  */
   tree niter;
+  /* Data references in the partition.  */
+  bitmap datarefs;
 };
 
 
 /* Allocate and initialize a partition from BITMAP.  */
 
 static partition *
-partition_alloc (bitmap stmts, bitmap loops)
+partition_alloc (void)
 {
   partition *partition = XCNEW (struct partition);
-  partition->stmts = stmts ? stmts : BITMAP_ALLOC (NULL);
-  partition->loops = loops ? loops : BITMAP_ALLOC (NULL);
+  partition->stmts = BITMAP_ALLOC (NULL);
+  partition->loops = BITMAP_ALLOC (NULL);
   partition->reduction_p = false;
   partition->kind = PKIND_NORMAL;
+  partition->datarefs = BITMAP_ALLOC (NULL);
   return partition;
 }
 
@@ -534,6 +544,7 @@ partition_free (partition *partition)
 {
   BITMAP_FREE (partition->stmts);
   BITMAP_FREE (partition->loops);
+  BITMAP_FREE (partition->datarefs);
   free (partition);
 }
 
@@ -581,6 +592,8 @@ partition_merge_into (partition *dest, partition *partition, enum fuse_type ft)
   if (partition_reduction_p (partition))
 dest->reduction_p = true;
 
+  bitmap_ior_into (dest->datarefs, partition->datarefs);
+
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
   fprintf (dump_file, "Fuse partitions because %s:\n", fuse_message[ft]);
@@ -1051,10 +1064,11 @@ generate_code_for_partition (struct loop *loop,
 static partition *
 build_rdg_partition_for_vertex (struct graph *rdg, int v)
 {
-  partition *partition = partition_alloc (NULL, NULL);
+  partition *partition = partition_alloc ();
   auto_vec<int, 3> nodes;
-  unsigned i;
+  unsigned i, j;
   int x;
+  data_reference_p dr;
 
   graphds_dfs (rdg, &v, 1, &nodes, false, NULL);
 
@@ -1063,6 +1077,14 @@ build_rdg_partition_for_vertex (struct graph *rdg, int v)
   bitmap_set_bit (partition->stmts, x);
   bitmap_set_bit (partition->loops,
 		  loop_containing_stmt (RDG_STMT 

Re: [PATCH GCC][12/13]Workaround reduction statements for distribution

2017-06-23 Thread Bin.Cheng
On Tue, Jun 20, 2017 at 12:36 PM, Richard Biener
 wrote:
> On Tue, Jun 20, 2017 at 11:20 AM, Bin.Cheng  wrote:
>> On Fri, Jun 16, 2017 at 6:15 PM, Bin.Cheng  wrote:
>>> On Fri, Jun 16, 2017 at 11:21 AM, Richard Biener
>>>  wrote:
 On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng  wrote:
> Hi,
> For now, loop distribution handles variables used outside of the loop as
> reductions.
> This is inaccurate because all partitions contain statements defining
> induction vars.

 But final induction values are usually not used outside of the loop...
>>> This is actually for induction variables which are used outside of the 
>>> loop.

 What is missing is loop distribution trying to change partition order.  In 
 fact
 we somehow assume we can move a reduction across a detected builtin
 (I don't remember if we ever check for validity of that...).
>>> Hmm, I am not sure when we can't.  If there is any dependence between
>>> builtin/reduction partitions, it should be captured by RDG or PG,
>>> otherwise the partitions are independent and can be freely ordered as
>>> long as reduction partition is scheduled last?

> Ideally we should factor out scev-propagation as a standalone interface
> which can be called when necessary.  Before that, this patch simply
> works around the
> reduction issue by checking if the statement belongs to all partitions.
> If yes,
> the reduction must be computed in the last partition no matter how the 
> loop is
> distributed.
> Bootstrap and test on x86_64 and AArch64.  Is it OK?

 stmt_in_all_partitions is not kept up-to-date during partition merging and 
 if
 merging makes the reduction partition(s) pass the stmt_in_all_partitions
 test your simple workaround doesn't work ...
>>> I think it doesn't matter because:
>>>   A) it's really a workaround for induction variables.  In general,
>>> induction variables are included in all partitions.
>>>   B) After classifying partitions, we immediately fuse all reduction
>>> partitions.  More stmt_in_all_partitions means we are fusing a
>>> non-reduction partition with a reduction partition, so the newly
>>> generated (stmt_in_all_partitions) are actually not reduction
>>> statements.  The workaround wouldn't work anyway even if the bitmap
>>> were maintained.

 As written it's a valid optimization, but can you please note its
 limitation in a comment?
>>> Yeah, I will add comment explaining it.
>> Comment added in new version patch.  It also computes bitmap outside
>> now, is it OK?
>
> Ok.  Can you add a testcase for this as well please?  I think the
> series up to this
> is now fully reviewed, I defered 1/n (the new IFN) to the last one
> containing the
> runtime versioning.  Can you re-post that (you can merge with the IFN patch)
> to apply after the series has been applied up to this?
Test case added.

Thanks,
bin
2017-06-20  Bin Cheng  

* tree-loop-distribution.c (classify_partition): New parameter and
better handle reduction statement.
(rdg_build_partitions): Revise comment.
(distribute_loop): Compute statements in all partitions and pass it
to classify_partition.

gcc/testsuite/ChangeLog
2017-06-20  Bin Cheng  

* gcc.dg/tree-ssa/ldist-26.c: New test.
From b16a4839f3211737dccc3ff92ab2c4f325907cd3 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Thu, 22 Jun 2017 17:16:58 +0100
Subject: [PATCH 11/13] reduction-workaround-20170607.txt

---
 gcc/testsuite/gcc.dg/tree-ssa/ldist-26.c | 36 ++
 gcc/tree-loop-distribution.c | 43 
 2 files changed, 68 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ldist-26.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-26.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-26.c
new file mode 100644
index 000..3a69884
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-26.c
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -ftree-loop-distribution -fdump-tree-ldist-details" } */
+
+extern void abort (void);
+
+int a[130], b[128], c[128];
+
+int __attribute__((noinline,noclone))
+foo (int len, int x)
+{
+  int i;
+  for (i = 1; i <= len; ++i)
+{
+  a[i] = a[i + 2] + 1;
+  b[i] = 0;
+  a[i + 1] = a[i] - 3;
+  if (i < x)
+	c[i] = a[i];
+}
+  return i;
+}
+
+int main()
+{
+  int i;
+  for (i = 0; i < 130; ++i)
+a[i] = i;
+  foo (127, 67);
+  if (a[0] != 0 || a[1] != 4 || a[127] != 130)
+abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "distributed: split to 2 loops and 0 library calls" "ldist" } } */
+/* { dg-final { scan-tree-dump "distributed: split to 1 loops and 1 library calls" "ldist" } } */
+/* { dg-final { scan-tree-dump 

Re: [PATCH GCC][11/13]Annotate partition by its parallelism execution type

2017-06-23 Thread Bin.Cheng
And the patch.

On Fri, Jun 23, 2017 at 11:24 AM, Bin.Cheng  wrote:
> On Tue, Jun 20, 2017 at 12:34 PM, Richard Biener
>  wrote:
>> On Tue, Jun 20, 2017 at 11:18 AM, Bin.Cheng  wrote:
>>> On Fri, Jun 16, 2017 at 11:10 AM, Richard Biener
>>>  wrote:
 On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng  wrote:
> Hi,
> This patch checks and records if a partition can be executed in parallel by
> looking at whether there exist data dependence cycles.  The information is needed
> for distribution because the idea is to distribute parallel type 
> partitions
> away from sequential ones.  I believe current distribution doesn't work
> very well because it does blind distribution/fusion.
> Bootstrap and test on x86_64 and AArch64.  Is it OK?

 +  /* In case of no data dependence.  */
 +  if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
 +return false;
 +  /* Or the data dependence can be resolved by compilation time alias
 + check.  */
 +  else if (!alias_sets_conflict_p (get_alias_set (DR_REF (dr1)),
 +  get_alias_set (DR_REF (dr2
 +return false;

 dependence analysis should use TBAA already, in which cases do you need 
 this?
 It seems to fall foul of the easy mistake of not honoring GCC's memory model
 as well ... see dr_may_alias_p.
>>> I see.  Patch updated with this branch removed.
>>>

 +  /* Further check if any data dependence prevents us from executing the
 + partition parallelly.  */
 +  EXECUTE_IF_SET_IN_BITMAP (partition->reads, 0, i, bi)
 +{
 +  dr1 = (*datarefs_vec)[i];
 +  EXECUTE_IF_SET_IN_BITMAP (partition->writes, 0, j, bj)
 +   {

 what about write-write dependences?

 +  EXECUTE_IF_SET_IN_BITMAP (partition->reads, 0, i, bi)
 +{
 +  dr1 = (*datarefs_vec)[i];
 +  EXECUTE_IF_SET_IN_BITMAP (partition->writes, i + 1, j, bj)
 +   {
 + dr2 = (*datarefs_vec)[j];
 + /* Partition can only be executed sequentially if there is any
 +data dependence cycle.  */

 exact copy of the loop nest follows?!  Maybe you meant to iterate
 over writes in the first loop.
>>> Yes, this is a copy-paste typo.  Patch is also simplified because
>>> read/write are recorded together now.  Is it OK?
>>
>> Ok.
> Sorry, I have to update this patch because of a mistake of mine.  I didn't
> update the partition type when fusing partitions.  For some partition
> fusions the update is necessary, otherwise we end up with an inaccurate
> type and inaccurate fusion later.  Is it Ok?
>
> Thanks,
> bin
> 2017-06-20  Bin Cheng  
>
> * tree-loop-distribution.c (enum partition_type): New.
> (struct partition): New field type.
> (partition_merge_into): Add parameter.  Update partition type.
> (data_dep_in_cycle_p, update_type_for_merge): New functions.
> (build_rdg_partition_for_vertex): Compute partition type.
> (rdg_build_partitions): Dump partition type.
> (distribute_loop): Update calls to partition_merge_into.
From 3a00323b1773eaeab368d29fd5995d09afc0cb4e Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Fri, 23 Jun 2017 10:43:05 +0100
Subject: [PATCH 10/13] partition-type-20170608.txt

---
 gcc/tree-loop-distribution.c | 139 ++-
 1 file changed, 123 insertions(+), 16 deletions(-)

diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index 516d5f7..87fdc15 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -528,11 +528,19 @@ build_rdg (struct loop *loop, control_dependences *cd)
 }
 
 
-
+/* Kind of distributed loop.  */
 enum partition_kind {
 PKIND_NORMAL, PKIND_MEMSET, PKIND_MEMCPY, PKIND_MEMMOVE
 };
 
+/* Type of distributed loop.  */
+enum partition_type {
+/* The distributed loop can be executed parallelly.  */
+PTYPE_PARALLEL = 0,
+/* The distributed loop has to be executed sequentially.  */
+PTYPE_SEQUENTIAL
+};
+
 /* Partition for loop distribution.  */
 struct partition
 {
@@ -546,6 +554,7 @@ struct partition
  number of loop (latch) iterations.  */
   bool plus_one;
   enum partition_kind kind;
+  enum partition_type type;
   /* data-references a kind != PKIND_NORMAL partition is about.  */
   data_reference_p main_dr;
   data_reference_p secondary_dr;
@@ -615,18 +624,16 @@ static const char *fuse_message[] = {
   "they are in the same dependence scc",
   "there is no point to distribute loop"};
 
-/* Merge PARTITION into the partition DEST.  */
-
 static void
-partition_merge_into (partition *dest, partition *partition, enum fuse_type ft)
-{
-  dest->kind = PKIND_NORMAL;
-  bitmap_ior_into (dest->stmts, partition->stmts);
-  if (partition_reduction_p (partition))
-   

Re: [PATCH GCC][11/13]Annotate partition by its parallelism execution type

2017-06-23 Thread Bin.Cheng
On Tue, Jun 20, 2017 at 12:34 PM, Richard Biener
 wrote:
> On Tue, Jun 20, 2017 at 11:18 AM, Bin.Cheng  wrote:
>> On Fri, Jun 16, 2017 at 11:10 AM, Richard Biener
>>  wrote:
>>> On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng  wrote:
 Hi,
 This patch checks and records whether a partition can be executed in parallel
 by looking for data dependence cycles.  The information is needed
 for distribution because the idea is to distribute parallel type partitions
 away from sequential ones.  I believe current distribution doesn't work
 very well because it does blind distribution/fusion.
 Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>>
>>> +  /* In case of no data dependence.  */
>>> +  if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
>>> +return false;
>>> +  /* Or the data dependence can be resolved by compilation time alias
>>> + check.  */
>>> +  else if (!alias_sets_conflict_p (get_alias_set (DR_REF (dr1)),
>>> +  get_alias_set (DR_REF (dr2
>>> +return false;
>>>
>>> dependence analysis should use TBAA already, in which cases do you need 
>>> this?
>>> It seems to fall foul of the easy mistake of not honoring GCCs memory model
>>> as well ... see dr_may_alias_p.
>> I see.  Patch updated with this branch removed.
>>
>>>
>>> +  /* Further check if any data dependence prevents us from executing the
>>> + partition parallelly.  */
>>> +  EXECUTE_IF_SET_IN_BITMAP (partition->reads, 0, i, bi)
>>> +{
>>> +  dr1 = (*datarefs_vec)[i];
>>> +  EXECUTE_IF_SET_IN_BITMAP (partition->writes, 0, j, bj)
>>> +   {
>>>
>>> what about write-write dependences?
>>>
>>> +  EXECUTE_IF_SET_IN_BITMAP (partition->reads, 0, i, bi)
>>> +{
>>> +  dr1 = (*datarefs_vec)[i];
>>> +  EXECUTE_IF_SET_IN_BITMAP (partition->writes, i + 1, j, bj)
>>> +   {
>>> + dr2 = (*datarefs_vec)[j];
>>> + /* Partition can only be executed sequentially if there is any
>>> +data dependence cycle.  */
>>>
>>> exact copy of the loop nest follows?!  Maybe you meant to iterate
>>> over writes in the first loop.
>> Yes, this is a copy-paste typo.  Patch is also simplified because
>> read/write are recorded together now.  Is it OK?
>
> Ok.
Sorry, I have to update this patch because of a mistake of mine.  I didn't
update the partition type when fusing partitions.  For some partition
fusions, the update is necessary; otherwise we end up with an inaccurate
type and inaccurate fusion later.  Is it OK?

Thanks,
bin
2017-06-20  Bin Cheng  

* tree-loop-distribution.c (enum partition_type): New.
(struct partition): New field type.
(partition_merge_into): Add parameter.  Update partition type.
(data_dep_in_cycle_p, update_type_for_merge): New functions.
(build_rdg_partition_for_vertex): Compute partition type.
(rdg_build_partitions): Dump partition type.
(distribute_loop): Update calls to partition_merge_into.
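The correction described in this message, updating the partition type on fusion, can be sketched standalone. The enum below mirrors the patch, but the merge helper is a hypothetical illustration, not the actual partition_merge_into:

```cpp
#include <cassert>

// Mirrors the patch's partition_type: a partition stays parallel only
// while no data-dependence cycle forces sequential execution.
enum partition_type { PTYPE_PARALLEL = 0, PTYPE_SEQUENTIAL };

struct partition { partition_type type; };

// On fusion, the sequential constraint of either input carries over:
// the fused partition is parallel only if both inputs were parallel.
void merge_partition_type (partition *dest, const partition *src)
{
  if (src->type == PTYPE_SEQUENTIAL)
    dest->type = PTYPE_SEQUENTIAL;
}
```

Skipping this update is exactly the bug mentioned above: the fused partition would keep a stale PTYPE_PARALLEL and mislead later fusion decisions.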


Re: Avoid generating useless range info

2017-06-23 Thread Richard Biener
On Fri, Jun 23, 2017 at 10:59 AM, Aldy Hernandez  wrote:
>
>
> On Fri, Jun 16, 2017 at 4:00 AM, Richard Biener 
> wrote:
>>
>> On Wed, Jun 14, 2017 at 6:41 PM, Aldy Hernandez  wrote:
>> > Hi!
>> >
>> > As discovered in my range class work, we seem to generate a significant
>> > amount of useless range info out of VRP.
>> >
>> > Is there any reason why we can't avoid generating any range info that
>> > spans
>> > the entire domain, and yet contains nothing in the non-zero bitmask?
>> >
>> > The attached patch passes bootstrap, and the one regression it causes is
>> > because now the -Walloca-larger-than= pass is better able to determine
>> > that
>> > there is no range information at all, and the testcase is unbounded.
>> > So...win, win.
>> >
>> > OK for trunk?
>>
>> Can you please do this in set_range_info itself?  Thus, if min ==
>> wi::min_value && max == wi::max_value
>> simply return?  (do not use TYPE_MIN?MAX_VALUE please)
>
>
> The reason I did it in vrp_finalize is because if you do it in
> set_range_info, you break set_nonzero_bits when setting bits on an SSA that
> currently has no range info:
>
> void
> set_nonzero_bits (tree name, const wide_int_ref )
> {
>   gcc_assert (!POINTER_TYPE_P (TREE_TYPE (name)));
>   if (SSA_NAME_RANGE_INFO (name) == NULL)
> set_range_info (name, VR_RANGE,
>TYPE_MIN_VALUE (TREE_TYPE (name)),
>TYPE_MAX_VALUE (TREE_TYPE (name)));
>   range_info_def *ri = SSA_NAME_RANGE_INFO (name);
>   ri->set_nonzero_bits (mask);
> }
>
> Let me know how you'd like me to proceed.

Just factor out a set_range_info_raw and call that then from here.

Richard.

> Aldy
>
>>
>> Thanks,
>> Richard.
>>
>> > Aldy
>
>
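Richard's suggestion, factoring an unchecked set_range_info_raw out of set_range_info, has roughly the following shape. This is a simplified sketch with a toy uint8_t domain standing in for wide_int and SSA_NAME_RANGE_INFO; only the set_range_info/set_range_info_raw split reflects the actual proposal:

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-ins: a single range over uint8_t instead of
// per-SSA-name wide_int range info.
struct range_info { uint8_t min, max; bool set; };

static range_info info;                 // stands in for SSA_NAME_RANGE_INFO
const uint8_t TYPE_MIN = 0, TYPE_MAX = 255;

// Raw setter: always records the range, no usefulness check.
void set_range_info_raw (uint8_t min, uint8_t max)
{
  info = { min, max, true };
}

// Public setter: skip ranges spanning the entire domain, which carry
// no information (the check the patch adds).
void set_range_info (uint8_t min, uint8_t max)
{
  if (min == TYPE_MIN && max == TYPE_MAX)
    return;
  set_range_info_raw (min, max);
}

// A set_nonzero_bits-style caller must use the raw variant so it can
// attach a mask even when the range itself is uninformative.
void set_nonzero_bits_like ()
{
  if (!info.set)
    set_range_info_raw (TYPE_MIN, TYPE_MAX);
}
```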


Re: [PATCH GCC][10/13]Compute and cache data dependence relation

2017-06-23 Thread Bin.Cheng
On Tue, Jun 20, 2017 at 12:32 PM, Richard Biener
 wrote:
> On Tue, Jun 20, 2017 at 11:15 AM, Bin.Cheng  wrote:
>> On Fri, Jun 16, 2017 at 11:03 AM, Richard Biener
>>  wrote:
>>> On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng  wrote:
 Hi,
 This patch computes and caches data dependence relation in a hash table
 so that it can be queried multiple times later for partition dependence
 check.
 Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>>
>>> +/* Vector of data dependence relations.  */
>>> +static vec *ddrs_vec;
>>> +
>>> +/* Hash table for data dependence relation in the loop to be distributed.  
>>> */
>>> +static hash_table *ddrs_table;
>>>
>>> avoid the extra indirection.
>>>
>>> +/* Hashtable entry for data reference relation.  */
>>> +struct ddr_entry
>>> +{
>>> +  data_reference_p a;
>>> +  data_reference_p b;
>>> +  ddr_p ddr;
>>> +  hashval_t hash;
>>> +};
>>> ...
>>> +/* Hash table equality function for data reference relation.  */
>>> +
>>> +inline bool
>>> +ddr_entry_hasher::equal (const ddr_entry *entry1, const ddr_entry *entry2)
>>> +{
>>> +  return (entry1->hash == entry2->hash
>>> + && DR_STMT (entry1->a) == DR_STMT (entry2->a)
>>> + && DR_STMT (entry1->b) == DR_STMT (entry2->b)
>>> + && operand_equal_p (DR_REF (entry1->a), DR_REF (entry2->a), 0)
>>> + && operand_equal_p (DR_REF (entry1->b), DR_REF (entry2->b), 0));
>>> +}
>>>
>>> what's the issue with using hash_table  with a custom hasher?
>>> That is, simply key on the dataref pointers (hash them, compare those
>>> for equality)?
>>>
>>> Your scheme looks too complicated / expensive to me ...
>>>
>>> You can drop ddrs_vec needed only for memory removal if you traverse
>>> the hashtable.
>> Thanks for reviewing.  Patch simplified as suggested.  Is it OK?
>
> +inline hashval_t
> +ddr_hasher::hash (const data_dependence_relation *ddr)
> +{
> +  return iterative_hash_object (DDR_A (ddr),
> +   iterative_hash_object (DDR_B (ddr), 0));
> +}
> +
>
> please use
>
> inchash::hash h;
> h.add_ptr (DDR_A (ddr));
> h.add_ptr (DDR_B (ddr));
> return h.end ();
>
> Ok with that change.
Done, patch updated.

Thanks,
bin
>
> Richard.
>
>> Thanks,
>> bin
>> 2017-06-17  Bin Cheng  
>>
>> * tree-loop-distribution.c (struct ddr_hasher): New.
>> (ddr_hasher::hash, ::equal, get_data_dependence): New function.
>> (ddrs_table): New.
>> (classify_partition): Call get_data_dependence.
>> (pg_add_dependence_edges): Ditto.
>> (distribute_loop): Release data dependence hash table.
From f1bc5437a186af22398c6ab7071ba1ef8d0dd897 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Fri, 9 Jun 2017 13:02:09 +0100
Subject: [PATCH 09/13] cache-data-dependence-20170609.txt

---
 gcc/tree-loop-distribution.c | 99 
 1 file changed, 73 insertions(+), 26 deletions(-)

diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index 119863f..516d5f7 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -70,6 +70,35 @@ along with GCC; see the file COPYING3.  If not see
 #define MAX_DATAREFS_NUM \
 	((unsigned) PARAM_VALUE (PARAM_LOOP_MAX_DATAREFS_FOR_DATADEPS))
 
+/* Hashtable helpers.  */
+
+struct ddr_hasher : nofree_ptr_hash 
+{
+  static inline hashval_t hash (const data_dependence_relation *);
+  static inline bool equal (const data_dependence_relation *,
+			const data_dependence_relation *);
+};
+
+/* Hash function for data dependence.  */
+
+inline hashval_t
+ddr_hasher::hash (const data_dependence_relation *ddr)
+{
+  inchash::hash h;
+  h.add_ptr (DDR_A (ddr));
+  h.add_ptr (DDR_B (ddr));
+  return h.end ();
+}
+
+/* Hash table equality function for data dependence.  */
+
+inline bool
+ddr_hasher::equal (const data_dependence_relation *ddr1,
+		   const data_dependence_relation *ddr2)
+{
+  return (DDR_A (ddr1) == DDR_A (ddr2) && DDR_B (ddr1) == DDR_B (ddr2));
+}
+
 /* The loop (nest) to be distributed.  */
 static vec loop_nest;
 
@@ -79,6 +108,10 @@ static vec datarefs_vec;
 /* Store index of data reference in aux field.  */
 #define DR_INDEX(dr)  ((uintptr_t) (dr)->aux)
 
+/* Hash table for data dependence relation in the loop to be distributed.  */
+static hash_table ddrs_table (389);
+
+
 /* A Reduced Dependence Graph (RDG) vertex representing a statement.  */
 struct rdg_vertex
 {
@@ -1057,6 +1090,32 @@ generate_code_for_partition (struct loop *loop,
   return false;
 }
 
+/* Return data dependence relation for data references A and B.  The two
+   data references must be in lexicographic order wrt the reduced dependence
+   graph RDG.  We first try to find the ddr in the global hash table.  If
+   it doesn't exist, compute the ddr and cache it.  */
+
+static data_dependence_relation *
+get_data_dependence 

[Patch AArch64 docs] Document the RcPc extension

2017-06-23 Thread James Greenhalgh

Hi,

Andrew pointed out that I did not document the new architecture extension
flag I added for the RcPc extension. This was intentional, as enabling the
rcpc extension does not change GCC code generation; it is just an assembler
flag. But for completeness, here is documentation for the new option.

OK?

Thanks,
James

---
2017-06-21  James Greenhalgh  

* doc/invoke.texi (rcpc architecture extension): Document it.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 7e7a16a5..db00e51 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14172,6 +14172,10 @@ Enable Large System Extension instructions.  This is on by default for
 @option{-march=armv8.1-a}.
 @item fp16
 Enable FP16 extension.  This also enables floating-point instructions.
+@item rcpc
+Enable the RcPc extension.  This does not change code generation from GCC,
+but is passed on to the assembler, enabling inline asm statements to use
+instructions from the RcPc extension.
 
 @end table
 


Re: [PATCH GCC][09/13]Simply cost model merges partitions with the same references

2017-06-23 Thread Bin.Cheng
On Mon, Jun 19, 2017 at 4:20 PM, Richard Biener
 wrote:
> On Mon, Jun 19, 2017 at 3:40 PM, Bin.Cheng  wrote:
>> On Wed, Jun 14, 2017 at 2:54 PM, Richard Biener
>>  wrote:
>>> On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng  wrote:
 Hi,
 Current primitive cost model merges partitions with data references 
 sharing the same
 base address.  I believe it's designed to maximize data reuse in 
 distribution, but
 that should be done by dedicated data reusing algorithm.  At this stage of 
 merging,
 we should be conservative and only merge partitions with the same 
 references.
 Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>>
>>> Well, I'd say "conservative" is merging more, not less.  For example
>>> splitting a[i+1] from a[i]
>>> would be bad(?), so I'd see to allow unequal DR_INIT as "equal" for
>>> merging.  Maybe
>>> DR_INIT within a cacheline or so.
>>>
>>> How many extra distributions in say SPEC do you get from this change alone?
>> Hi,
>> I collected data for spec2006 only with/without this patch.  I am a
>> bit surprised that it doesn't change the number of distributed loops.
>>>
>>> It shows also that having partition->reads_and_writes would be nice
>>> ...  the code duplication
>> Yeah, I merged read/write data references in previous patch, now this
>> duplication is gone.  Update patch attached.  Is it OK?
>
> +  gcc_assert (i < datarefs_vec.length ());
> +  dr1 = datarefs_vec[i];
>
> these asserts are superfluous -- vec::operator[] does them as well.
>
> Ok if you remove them.
Done.
I realized I made mistakes when measuring the impact of this patch.
Apparently this patch only causes the failure of
gcc.dg/tree-ssa/ldist-6.c, so here is the updated patch.  I also
collected the number of distributed loops in spec2k6 as below:
 trunk:  5882
 only this patch: 7130
 whole patch series: 5237
So the conclusion is that this patch alone does aggressive distribution,
as in ldist-6.c, which means worse data locality.  The following patch
does more fusion, which mitigates the impact of this patch and results
in conservative distribution overall.  But as we lack a data-locality
cost model, ldist-6.c remains failed even after applying the whole patch
series.  Hmm, a cache-sensitive cost model is now needed for several
passes: distribution, prefetching and (possibly) interchange.
Richard, do you have a second comment based on the new data?

Thanks,
bin
2017-06-20  Bin Cheng  

* tree-loop-distribution.c (ref_base_address): Delete.
(similar_memory_accesses): Rename ...
(share_memory_accesses): ... to this.  Check if partitions access
the same memory reference.
(distribute_loop): Call share_memory_accesses.

gcc/testsuite/ChangeLog
2017-06-20  Bin Cheng  

* gcc.dg/tree-ssa/ldist-6.c: XFAIL.
From a002d0a88ab9e981d9c57bd8f1203072290623ad Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Fri, 9 Jun 2017 12:41:36 +0100
Subject: [PATCH 08/13] share-memory-access-20170608.txt

---
 gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c |  2 +-
 gcc/tree-loop-distribution.c| 69 +++--
 2 files changed, 32 insertions(+), 39 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c
index 8eb1c62..e0a68d8 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c
@@ -34,4 +34,4 @@ int loop1 (int k)
   return a[1000-2] + b[1000-1] + c[1000-2] + d[1000-2];
 }
 
-/* { dg-final { scan-tree-dump-times "distributed: split to 2 loops" 0 "ldist" } } */
+/* { dg-final { scan-tree-dump-times "distributed: split to 2 loops" 0 "ldist" { xfail *-*-* } } } */
diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index eafd119..119863f 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -1268,30 +1268,16 @@ classify_partition (loop_p loop, struct graph *rdg, partition *partition)
 }
 }
 
-/* For a data reference REF, return the declaration of its base
-   address or NULL_TREE if the base is not determined.  */
-
-static tree
-ref_base_address (data_reference_p dr)
-{
-  tree base_address = DR_BASE_ADDRESS (dr);
-  if (base_address
-  && TREE_CODE (base_address) == ADDR_EXPR)
-return TREE_OPERAND (base_address, 0);
-
-  return base_address;
-}
-
-/* Returns true when PARTITION1 and PARTITION2 have similar memory
-   accesses in RDG.  */
+/* Returns true when PARTITION1 and PARTITION2 access the same memory
+   object in RDG.  */
 
 static bool
-similar_memory_accesses (struct graph *rdg, partition *partition1,
-			 partition *partition2)
+share_memory_accesses (struct graph *rdg,
+		   partition *partition1, partition *partition2)
 {
-  unsigned i, j, k, l;
+  unsigned i, j;
   bitmap_iterator bi, bj;
-  data_reference_p ref1, ref2;
+  
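The conservative merging rule discussed in this thread, fusing only partitions that access the very same memory reference rather than merely the same base address, can be sketched standalone. The structures here are illustrative stand-ins, not GCC's data_reference:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative reference: base object plus constant offset.
struct mem_ref { std::string base; long init; };

// Two partitions are fusion candidates only if some reference in one
// is literally the same object reference as in the other (same base
// and same constant offset), not merely a shared base address.
bool share_memory_accesses (const std::vector<mem_ref> &p1,
                            const std::vector<mem_ref> &p2)
{
  for (const mem_ref &r1 : p1)
    for (const mem_ref &r2 : p2)
      if (r1.base == r2.base && r1.init == r2.init)
        return true;
  return false;
}
```

Under this rule a[i] and a[i+1] no longer force a merge, which is exactly why distribution becomes more aggressive (7130 vs 5882 loops) until the later fusion patch compensates.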

Re: [PATCH GCC][01/13]Introduce internal function IFN_LOOP_DIST_ALIAS

2017-06-23 Thread Bin.Cheng
On Mon, Jun 12, 2017 at 6:02 PM, Bin Cheng  wrote:
> Hi,
> I was asked by upstream to split the loop distribution patch into small ones.
> It is hard because data structure and algorithm are closely coupled together.
> Anyway, this is the patch series with smaller patches.  Basically I tried to
> separate data structure and bug-fix changes apart with one as the main patch.
> Note I only made necessary code refactoring in order to separate patch, apart
> from that, there is no change against the last version.
>
> This is the first patch introducing new internal function IFN_LOOP_DIST_ALIAS.
> GCC will distribute loops under condition of this function call.
>
> Bootstrap and test on x86_64 and AArch64.  Is it OK?
Hi,
I need to update this patch to fix an issue in
vect_loop_dist_alias_call.  The previous patch failed to find some
IFN_LOOP_DIST_ALIAS calls.

Bootstrap and test in series.  Is it OK?

Thanks,
bin
>
> Thanks,
> bin
> 2017-06-07  Bin Cheng  
>
> * cfgloop.h (struct loop): New field ldist_alias_id.
> * cfgloopmanip.c (lv_adjust_loop_entry_edge): Comment change.
> * internal-fn.c (expand_LOOP_DIST_ALIAS): New function.
> * internal-fn.def (LOOP_DIST_ALIAS): New.
> * tree-vectorizer.c (vect_loop_dist_alias_call): New function.
> (fold_loop_dist_alias_call): New function.
> (vectorize_loops): Fold IFN_LOOP_DIST_ALIAS call depending on
> successful vectorization or not.
From ab8334c5f109c593610df3efcf1aa5a2edcf6be9 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Wed, 7 Jun 2017 13:04:03 +0100
Subject: [PATCH 01/13] ifn_loop_dist_alias-20170608.txt

---
 gcc/cfgloop.h |  9 ++
 gcc/cfgloopmanip.c|  3 +-
 gcc/internal-fn.c |  8 ++
 gcc/internal-fn.def   |  1 +
 gcc/tree-vectorizer.c | 79 ++-
 5 files changed, 98 insertions(+), 2 deletions(-)

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index a8bec1d..be4187a 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -225,6 +225,15 @@ struct GTY ((chain_next ("%h.next"))) loop {
  builtins.  */
   tree simduid;
 
+  /* For loops generated by distribution with runtime alias checks, this
+ is a unique identifier of the original distributed loop.  Generally
+ it is the number of the original loop.  IFN_LOOP_DIST_ALIAS builtin
+ uses this id as its first argument.  Given a loop with an id, we can
+ look upward in the dominance tree for the corresponding IFN_LOOP_DIST_ALIAS
+ builtin.  Note this id has no meaning after IFN_LOOP_DIST_ALIAS is
+ folded and eliminated.  */
+  int ldist_alias_id;
+
   /* Upper bound on number of iterations of a loop.  */
   struct nb_iter_bound *bounds;
 
diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
index d764ab9..adb2f65 100644
--- a/gcc/cfgloopmanip.c
+++ b/gcc/cfgloopmanip.c
@@ -1653,7 +1653,8 @@ force_single_succ_latches (void)
 
   THEN_PROB is the probability of then branch of the condition.
   ELSE_PROB is the probability of else branch. Note that they may be both
-  REG_BR_PROB_BASE when condition is IFN_LOOP_VECTORIZED.  */
+  REG_BR_PROB_BASE when condition is IFN_LOOP_VECTORIZED or
+  IFN_LOOP_DIST_ALIAS.  */
 
 static basic_block
 lv_adjust_loop_entry_edge (basic_block first_head, basic_block second_head,
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 75fe027..96e40cb 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -2250,6 +2250,14 @@ expand_LOOP_VECTORIZED (internal_fn, gcall *)
   gcc_unreachable ();
 }
 
+/* This should get folded in tree-vectorizer.c.  */
+
+static void
+expand_LOOP_DIST_ALIAS (internal_fn, gcall *)
+{
+  gcc_unreachable ();
+}
+
 /* Expand MASK_LOAD call STMT using optab OPTAB.  */
 
 static void
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index e162d81..79c19fb 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -158,6 +158,7 @@ DEF_INTERNAL_FN (GOMP_SIMD_LAST_LANE, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMD_ORDERED_START, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMD_ORDERED_END, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (LOOP_VECTORIZED, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (LOOP_DIST_ALIAS, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (ANNOTATE,  ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (UBSAN_NULL, ECF_LEAF | ECF_NOTHROW, ".R.")
 DEF_INTERNAL_FN (UBSAN_BOUNDS, ECF_LEAF | ECF_NOTHROW, NULL)
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index 1bef2e4..05e9f84 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -469,6 +469,67 @@ fold_loop_vectorized_call (gimple *g, tree value)
 }
 }
 
+/* If LOOP has been versioned during loop distribution, return the internal
+   call guarding it.  */
+
+static gimple *
+vect_loop_dist_alias_call (struct loop *loop)
+{
+  gimple_stmt_iterator gsi;
+  gimple *g;

[C++ Patch PING] (was: [C++ Patch] PR 62315 ("do not print typename in diagnostic if the original code does not have it"))

2017-06-23 Thread Paolo Carlini

Hi,

gently pingning this:

On 02/06/2017 10:35, Paolo Carlini wrote:

Hi,

a while ago Manuel noticed that printing 'typename' in error messages 
about missing 'typename' can be confusing. That seems easy to fix, in 
fact we already handle correctly a similar situation in 
grokdeclarator. Tested x86_64-linux.


https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00099.html

Thanks!
Paolo.


Re: libgo patch committed: Fix ptrace implementation on MIPS

2017-06-23 Thread James Cowgill
Hi,

On 22/06/17 20:59, Ian Lance Taylor wrote:
> James, any thoughts?
> 
> Ian
> 
> On Thu, Jun 22, 2017 at 12:55 AM, Andreas Schwab  wrote:
>> On Jun 21 2017, Ian Lance Taylor  wrote:
>>
>>> Index: libgo/sysinfo.c
>>> ===
>>> --- libgo/sysinfo.c   (revision 249205)
>>> +++ libgo/sysinfo.c   (working copy)
>>> @@ -102,6 +102,9 @@
>>>  #if defined(HAVE_LINUX_NETLINK_H)
>>>  #include 
>>>  #endif
>>> +#if defined(HAVE_LINUX_PTRACE_H)
>>> +#include 
>>> +#endif
>>>  #if defined(HAVE_LINUX_RTNETLINK_H)
>>>  #include 
>>>  #endif
>>
>> That breaks ia64:
>>
>> In file included from /usr/include/asm/ptrace.h:58:0,
>>  from /usr/include/linux/ptrace.h:69,
>>  from ../../../libgo/sysinfo.c:106:
>> /usr/include/asm/fpu.h:57:8: error: redefinition of 'struct ia64_fpreg'
>>  struct ia64_fpreg {
>> ^~
>> In file included from /usr/include/signal.h:339:0,
>>  from 
>> /usr/local/gcc/gcc-20170622/Build/gcc/include-fixed/sys/ucontext.h:32,
>>  from /usr/include/ucontext.h:27,
>>  from ../../../libgo/sysinfo.c:17:
>> /usr/include/bits/sigcontext.h:32:8: note: originally defined here
>>  struct ia64_fpreg
>> ^~
>> In file included from /usr/include/linux/ptrace.h:69:0,
>>  from ../../../libgo/sysinfo.c:106:
>> /usr/include/asm/ptrace.h:208:8: error: redefinition of 'struct 
>> pt_all_user_regs'
>>  struct pt_all_user_regs {
>> ^~~~
>> In file included from ../../../libgo/sysinfo.c:66:0:
>> /usr/include/sys/ptrace.h:116:8: note: originally defined here
>>  struct pt_all_user_regs
>> ^~~~

This looks like this glibc bug which was fixed in 2.19.
https://sourceware.org/bugzilla/show_bug.cgi?id=762

James





Re: [PATCH] [Aarch64] Variable shift count truncation issues

2017-06-23 Thread James Greenhalgh
On Fri, Jun 23, 2017 at 10:27:55AM +0100, Michael Collison wrote:
> Fixed the "nitpick" issues pointed out by James. Okay for trunk?


> > I have a few comments below, which are closer to nitpicking than structural
> > issues with the patch.
> > 
> > With those fixed, this is OK to commit.

This is still OK for trunk.

Thanks,
James

> 2017-05-22  Kyrylo Tkachov  
>   Michael Collison 
> 
>   PR target/70119
>   * config/aarch64/aarch64.md (*aarch64__reg_3_mask1):
>   New pattern.
>   (*aarch64_reg_3_neg_mask2): New pattern.
>   (*aarch64_reg_3_minus_mask): New pattern.
>   (*aarch64__reg_di3_mask2): New pattern.
>   * config/aarch64/aarch64.c (aarch64_rtx_costs): Account for cost
>   of shift when the shift amount is masked with constant equal to
>   the size of the mode.
>   * config/aarch64/predicates.md (subreg_lowpart_operator): New
>   predicate.
> 
> 
> 2016-05-22  Kyrylo Tkachov  
>   Michael Collison 
> 
>   PR target/70119
>   * gcc.target/aarch64/var_shift_mask_1.c: New test.
> 



RE: [PATCH] [Aarch64] Variable shift count truncation issues

2017-06-23 Thread Michael Collison
Fixed the "nitpick" issues pointed out by James. Okay for trunk?

2017-05-22  Kyrylo Tkachov  
Michael Collison 

PR target/70119
* config/aarch64/aarch64.md (*aarch64__reg_3_mask1):
New pattern.
(*aarch64_reg_3_neg_mask2): New pattern.
(*aarch64_reg_3_minus_mask): New pattern.
(*aarch64__reg_di3_mask2): New pattern.
* config/aarch64/aarch64.c (aarch64_rtx_costs): Account for cost
of shift when the shift amount is masked with constant equal to
the size of the mode.
* config/aarch64/predicates.md (subreg_lowpart_operator): New
predicate.


2016-05-22  Kyrylo Tkachov  
Michael Collison 

PR target/70119
* gcc.target/aarch64/var_shift_mask_1.c: New test.

-Original Message-
From: James Greenhalgh [mailto:james.greenha...@arm.com] 
Sent: Thursday, June 22, 2017 3:17 AM
To: Michael Collison ; Wilco Dijkstra 
; Christophe Lyon ; GCC 
Patches ; nd ; 
richard.sandif...@linaro.org
Subject: Re: [PATCH] [Aarch64] Variable shift count truncation issues

On Wed, Jun 21, 2017 at 04:42:00PM +0100, Richard Sandiford wrote:
> Michael Collison  writes:
> > Updated the patch per Richard's suggestions to allow scheduling of 
> > instructions before reload.
> 
> Thanks, this looks good to me FWIW, but obviously I can't approve it.

Thanks for the review Richard, that gives me good confidence in this patch.

I have a few comments below, which are closer to nitpicking than structural 
issues with the patch.

With those fixed, this is OK to commit.

> > 2017-05-22  Kyrylo Tkachov  
> > Michael Collison 

With the work you've done, you can probably place yourself first on the 
ChangeLog now ;)

> >
> > PR target/70119
> > * config/aarch64/aarch64.md (*aarch64__reg_3_mask1):
> > New pattern.
> > (*aarch64_reg_3_neg_mask2): New pattern.
> > (*aarch64_reg_3_minus_mask): New pattern.
> > (*aarch64__reg_di3_mask2): New pattern.
> > * config/aarch64/aarch64.c (aarch64_rtx_costs): Account for cost
> > of shift when the shift amount is masked with constant equal to
> > the size of the mode.
> > * config/aarch64/predicates.md (subreg_lowpart_operator): New
> > predicate.
> >
> >
> > 2016-05-22  Kyrylo Tkachov  
> > Michael Collison 
> >
> > PR target/70119
> > * gcc.target/aarch64/var_shift_mask_1.c: New test.


> > diff --git a/gcc/config/aarch64/aarch64.md 
> > b/gcc/config/aarch64/aarch64.md index d89df66..ff5720c 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -3942,6 +3942,97 @@
> >}
> >  )
> >  
> > +;; When the LSL, LSR, ASR, ROR instructions operate on all register 
> > +arguments ;; they truncate the shift/rotate amount by the size of 
> > +the registers they ;; operate on: 32 for W-regs, 63 for X-regs.  
> > +This allows us to optimise away

Is this "63" a typo? Should it be 64?

> > +;; such redundant masking instructions.  GCC can do that 
> > +automatically when ;; SHIFT_COUNT_TRUNCATED is true, but we can't 
> > +enable it for TARGET_SIMD ;; because some of the SISD shift alternatives 
> > don't perform this truncations.
> > +;; So this pattern exists to catch such cases.
> > +
> > +(define_insn "*aarch64__reg_3_mask1"
> > +  [(set (match_operand:GPI 0 "register_operand" "=r")
> > +   (SHIFT:GPI
> > + (match_operand:GPI 1 "register_operand" "r")
> > + (match_operator 4 "subreg_lowpart_operator"
> > +  [(and:GPI (match_operand:GPI 2 "register_operand" "r")
> > +(match_operand 3 "const_int_operand" "n"))])))]
> > +  "(~INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1)) == 0"

Spaces around "-"

> > +  "\t%0, %1, %2"
> > +  [(set_attr "type" "shift_reg")]
> > +)
> > +
> > +(define_insn_and_split "*aarch64_reg_3_neg_mask2"
> > +  [(set (match_operand:GPI 0 "register_operand" "=")
> > +   (SHIFT:GPI
> > + (match_operand:GPI 1 "register_operand" "r")
> > + (match_operator 4 "subreg_lowpart_operator"
> > + [(neg:SI (and:SI (match_operand:SI 2 "register_operand" "r")
> > +  (match_operand 3 "const_int_operand" "n")))])))]
> > +  "((~INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1)) == 0)"
> > +  "#"
> > +  "&& 1"

I'd prefer "true" to "1"

> > +  [(const_int 0)]
> > +  {
> > +rtx tmp = (can_create_pseudo_p () ? gen_reg_rtx (SImode)
> > +  : operands[0]);
> > +emit_insn (gen_negsi2 (tmp, operands[2]));
> > +
> > +rtx and_op = gen_rtx_AND (SImode, tmp, operands[3]);
> > +rtx subreg_tmp = gen_rtx_SUBREG (GET_MODE (operands[4]), and_op,
> > + 
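The redundancy these patterns remove can be seen at the source level: because AArch64 register-operand shifts truncate the count to the operand width, masking the count by width-1 changes nothing. A small sketch of the C-level shape GCC matches:

```cpp
#include <cassert>
#include <cstdint>

// Shift with the count masked to the register width, mirroring what
// LSL/LSR/ASR/ROR on AArch64 do in hardware for register operands.
uint64_t shl_masked (uint64_t x, unsigned n)
{
  return x << (n & 63);   // X-register variant: the patch drops this AND
}

uint32_t shr_masked32 (uint32_t x, unsigned n)
{
  return x >> (n & 31);   // W-register variant: mask by 31
}
```

The masked form is well-defined C for any count, which is what lets the backend delete the explicit AND when the mask equals the mode size minus one.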

[PATCH][Testsuite] Use user defined memmove in gcc.c-torture/execute/builtins/memops-asm-lib.c

2017-06-23 Thread Renlin Li

Hi all,

After change r249278, bcopy is folded into memmove.  In the newlib AArch64
memmove implementation, memcpy is called under certain conditions.
The memcpy defined in memops-asm-lib.c will abort when the test is running.

In this case, I defined a user memmove function which bypasses the library
one, so that memcpy won't be called accidentally.

Okay to commit?

gcc/testsuite/ChangeLog:

2017-06-22  Renlin Li  
Szabolcs Nagy  

* gcc.c-torture/execute/builtins/memops-asm-lib.c (my_memmove): New.
* gcc.c-torture/execute/builtins/memops-asm.c (memmove): Declare 
memmove.
diff --git a/gcc/testsuite/gcc.c-torture/execute/builtins/memops-asm-lib.c b/gcc/testsuite/gcc.c-torture/execute/builtins/memops-asm-lib.c
index 529..25d4a40 100644
--- a/gcc/testsuite/gcc.c-torture/execute/builtins/memops-asm-lib.c
+++ b/gcc/testsuite/gcc.c-torture/execute/builtins/memops-asm-lib.c
@@ -37,6 +37,24 @@ my_bcopy (const void *s, void *d, size_t n)
 }
 }
 
+__attribute__ ((used))
+void
+my_memmove (void *d, const void *s, size_t n)
+{
+  char *dst = (char *) d;
+  const char *src = (const char *) s;
+  if (src >= dst)
+while (n--)
+  *dst++ = *src++;
+  else
+{
+  dst += n;
+  src += n;
+  while (n--)
+	*--dst = *--src;
+}
+}
+
 /* LTO code is at the present to able to track that asm alias my_bcopy on builtin
actually refers to this function.  See PR47181. */
 __attribute__ ((used))
diff --git a/gcc/testsuite/gcc.c-torture/execute/builtins/memops-asm.c b/gcc/testsuite/gcc.c-torture/execute/builtins/memops-asm.c
index ed2b06c..44e336c 100644
--- a/gcc/testsuite/gcc.c-torture/execute/builtins/memops-asm.c
+++ b/gcc/testsuite/gcc.c-torture/execute/builtins/memops-asm.c
@@ -12,6 +12,8 @@ extern void *memcpy (void *, const void *, size_t)
   __asm (ASMNAME ("my_memcpy"));
 extern void bcopy (const void *, void *, size_t)
   __asm (ASMNAME ("my_bcopy"));
+extern void *memmove (void *, const void *, size_t)
+  __asm (ASMNAME ("my_memmove"));
 extern void *memset (void *, int, size_t)
   __asm (ASMNAME ("my_memset"));
 extern void bzero (void *, size_t)
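The overlap handling in my_memmove above can be exercised standalone; this copy of its logic plus two overlapping-buffer checks is only a sketch of the behavior the testsuite relies on:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Same logic as the patch's my_memmove: forward copy when the source
// is at or above the destination, backward copy otherwise, so
// overlapping regions are handled without ever calling memcpy.
void my_memmove (void *d, const void *s, std::size_t n)
{
  char *dst = static_cast<char *> (d);
  const char *src = static_cast<const char *> (s);
  if (src >= dst)
    while (n--)
      *dst++ = *src++;
  else
    {
      dst += n;
      src += n;
      while (n--)
        *--dst = *--src;
    }
}
```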


Re: [Neon intrinsics] Literal vector construction through vcombine is poor

2017-06-23 Thread Richard Earnshaw (lists)
On 23/06/17 00:10, Michael Collison wrote:
> Richard,
> 
> I reworked the patch and retested on big endian as well as little. The 
> original code was performing two swaps in the big endian case which works out 
> to no swaps at all.
> 
> I also updated the ChangeLog per your comments. Okay for trunk?
> 
> 2017-06-19  Michael Collison  
> 
>   * config/aarch64/aarch64-simd.md (aarch64_combine): Directly
>   call aarch64_split_simd_combine.
>   * (aarch64_combine_internal): Delete pattern.
>   * config/aarch64/aarch64.c (aarch64_split_simd_combine):
>   Allow register and subreg operands.
> 
> -Original Message-
> From: Richard Earnshaw (lists) [mailto:richard.earns...@arm.com] 
> Sent: Monday, June 19, 2017 6:37 AM
> To: Michael Collison ; GCC Patches 
> 
> Cc: nd 
> Subject: Re: [Neon intrinsics] Literal vector construction through vcombine 
> is poor
> 
> On 16/06/17 22:08, Michael Collison wrote:
>> This patch improves code generation for literal vector construction by 
>> expanding and exposing the pattern to rtl optimization earlier. The current 
>> implementation delays splitting the pattern until after reload which results 
>> in poor code generation for the following code:
>>
>>
>> #include "arm_neon.h"
>>
>> int16x8_t
>> foo ()
>> {
>>   return vcombine_s16 (vdup_n_s16 (0), vdup_n_s16 (8)); }
>>
>> Trunk generates:
>>
>> foo:
>>  moviv1.2s, 0
>>  moviv0.4h, 0x8
>>  dup d2, v1.d[0]
>>  ins v2.d[1], v0.d[0]
>>  orr v0.16b, v2.16b, v2.16b
>>  ret
>>
>> With the patch we now generate:
>>
>> foo:
>>  moviv1.4h, 0x8
>>  moviv0.4s, 0
>>  ins v0.d[1], v1.d[0]
>>  ret
>>
>> Bootstrapped and tested on aarch64-linux-gnu. Okay for trunk.
>>
>> 2017-06-15  Michael Collison  
>>
>>  * config/aarch64/aarch64-simd.md(aarch64_combine_internal):
>>  Convert from define_insn_and_split into define_expand
>>  * config/aarch64/aarch64.c(aarch64_split_simd_combine):
>>  Allow register and subreg operands.
>>
> 
> Your changelog entry is confusing.  You've deleted the 
> aarch64_combine_internal pattern entirely, having merged some of its 
> functionality directly into its caller (aarch64_combine).
> 
> So I think it should read:
> 
> * config/aarch64/aarch64-simd.md (aarch64_combine): Directly call 
> aarch64_split_simd_combine.
> (aarch64_combine_internal): Delete pattern.
> * ...
> 
> Note also there should be a space between the file name and the open bracket 
> for the first function name.
> 
> Why don't you need the big-endian code path any more?
> 
> R.
> 
>>
>> pr7057.patch
>>
>>
>> diff --git a/gcc/config/aarch64/aarch64-simd.md 
>> b/gcc/config/aarch64/aarch64-simd.md
>> index c462164..4a253a9 100644
>> --- a/gcc/config/aarch64/aarch64-simd.md
>> +++ b/gcc/config/aarch64/aarch64-simd.md
>> @@ -2807,27 +2807,11 @@
>>op1 = operands[1];
>>op2 = operands[2];
>>  }
>> -  emit_insn (gen_aarch64_combine_internal (operands[0], op1, 
>> op2));
>> -  DONE;
>> -}
>> -)
>>  
>> -(define_insn_and_split "aarch64_combine_internal"
>> -  [(set (match_operand: 0 "register_operand" "=")
>> -(vec_concat: (match_operand:VDC 1 "register_operand" "w")
>> -   (match_operand:VDC 2 "register_operand" "w")))]
>> -  "TARGET_SIMD"
>> -  "#"
>> -  "&& reload_completed"
>> -  [(const_int 0)]
>> -{
>> -  if (BYTES_BIG_ENDIAN)
>> -aarch64_split_simd_combine (operands[0], operands[2], operands[1]);
>> -  else
>> -aarch64_split_simd_combine (operands[0], operands[1], operands[2]);
>> +  aarch64_split_simd_combine (operands[0], op1, op2);
>> +
>>DONE;
>>  }
>> -[(set_attr "type" "multiple")]
>>  )
>>  
>>  (define_expand "aarch64_simd_combine"
>> diff --git a/gcc/config/aarch64/aarch64.c 
>> b/gcc/config/aarch64/aarch64.c index 2e385c4..46bd78b 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -1650,7 +1650,8 @@ aarch64_split_simd_combine (rtx dst, rtx src1, 
>> rtx src2)
>>  
>>gcc_assert (VECTOR_MODE_P (dst_mode));
>>  
>> -  if (REG_P (dst) && REG_P (src1) && REG_P (src2))
>> +  if (register_operand (dst, dst_mode) && register_operand (src1, src_mode)
>> +  && register_operand (src2, src_mode))
>>  {
>>rtx (*gen) (rtx, rtx, rtx);
>>  
>>
> 
> 
> pr7057v4.patch
> 
> 
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index c462164..3043f81 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -2796,38 +2796,10 @@
> (match_operand:VDC 2 "register_operand")]
>"TARGET_SIMD"
>  {
> -  rtx op1, op2;
> -  if (BYTES_BIG_ENDIAN)
> -{
> -  op1 = operands[2];
> -  op2 = operands[1];
> -}
> -  else
> -{
> -  op1 = operands[1];
> -  op2 = operands[2];
> -}
> -  emit_insn 

Re: Avoid generating useless range info

2017-06-23 Thread Aldy Hernandez
[one more time, but without sending html which the list refuses :-/]

On Fri, Jun 16, 2017 at 4:00 AM, Richard Biener
 wrote:
> On Wed, Jun 14, 2017 at 6:41 PM, Aldy Hernandez  wrote:
>> Hi!
>>
>> As discovered in my range class work, we seem to generate a significant
>> amount of useless range info out of VRP.
>>
>> Is there any reason why we can't avoid generating any range info that spans
>> the entire domain, and yet contains nothing in the non-zero bitmask?
>>
>> The attached patch passes bootstrap, and the one regression it causes is
>> because now the -Walloca-larger-than= pass is better able to determine that
>> there is no range information at all, and the testcase is unbounded.
>> So...win, win.
>>
>> OK for trunk?
>
> Can you please do this in set_range_info itself?  Thus, if min ==
> wi::min_value && max == wi::max_value
> simply return?  (do not use TYPE_MIN/MAX_VALUE please)


The reason I did it in vrp_finalize is because if you do it in
set_range_info, you break set_nonzero_bits when setting bits on an SSA
that currently has no range info:

void
set_nonzero_bits (tree name, const wide_int_ref &mask)
{
  gcc_assert (!POINTER_TYPE_P (TREE_TYPE (name)));
  if (SSA_NAME_RANGE_INFO (name) == NULL)
set_range_info (name, VR_RANGE,
   TYPE_MIN_VALUE (TREE_TYPE (name)),
   TYPE_MAX_VALUE (TREE_TYPE (name)));
  range_info_def *ri = SSA_NAME_RANGE_INFO (name);
  ri->set_nonzero_bits (mask);
}

Let me know how you'd like me to proceed.
Aldy

>
> Thanks,
> Richard.
>
>> Aldy


[PATCH][x86] Scalar mask and round RTL templates

2017-06-23 Thread Peryt, Sebastian
Hi,

This patch adds three extra RTL meta-templates for scalar round and mask.
Additionally, it fixes errors caused by previous mask and round usage in some of
the intrinsics that I found.

2017-06-23  Sebastian Peryt  

gcc/
* config/i386/subst.md (mask_scalar, round_scalar, 
round_saeonly_scalar): New templates.
(mask_scalar_name, mask_scalar_operand3, round_scalar_name,
round_scalar_mask_operand3, round_scalar_mask_op3,
round_scalar_constraint, round_scalar_prefix, round_saeonly_scalar_name,
round_saeonly_scalar_mask_operand3, round_saeonly_scalar_mask_op3,
round_saeonly_scalar_constraint, round_saeonly_scalar_prefix): New 
subst attribute.
* config/i386/sse.md
(_vm3): Renamed to ...
_vm3 
... this.
(_vm3): Renamed to 
...
_vm3 
... this.
(_vm3): Renamed to ...
_vm3 ... 
this.
(v\t{%2, %1, 
%0|%0, %1, %2}): Changed to 
...
v\t{%2, 
%1, %0|%0, %1, 
%2} ... this.
(v\t{%2, %1, 
%0|%0, %1, %2}): Changed to 
...
v\t{%2, 
%1, %0|%0, %1, 
%2} ... this.
(v\t{%2, %1, 
%0|%0, %1, %2}): 
Changed to ...

v\t{%2, %1, 
%0|%0, %1, 
%2} ... this.

Is it ok for trunk?

Thanks,
Sebastian


Scalar-templates.patch
Description: Scalar-templates.patch


ping: [gcc patch] DWARF-5: Define DW_IDX_GNU_static and DW_IDX_GNU_external

2017-06-23 Thread Jan Kratochvil
http://dwarfstd.org/ShowIssue.php?issue=170527.1

170527.1 Jan Kratochvil DW_IDX_* for static/extern symbols Enhancement Open 

Section 6.1.1.4.7, pg 147
When a debugger wants to print 'somename' it logically tries to find first 
'somename' as an 
external symbol in all available libraries.  Only if none such external symbol 
is found the 
debugger starts searching for a static 'somename' symbol in those libraries.

This requires to know whether a symbol in .debug_names index has DW_AT_external 
or not.  
Otherwise a lot of needless CU expansions happen.  This extension improves 
performance 
gain of the .debug_names index.

(Discovered in an original fix by Doug Evans - GDB Bug 14125.)

Proposing and asking for pre-allocation:
  DW_IDX_static   = 6 = DW_FORM_flag_present = DIE's DW_AT_external is not 
present
  DW_IDX_external = 7 = DW_FORM_flag_present = DIE's DW_AT_external is present
include/ChangeLog
2017-05-26  Jan Kratochvil  

* dwarf2.def (DW_IDX_compile_unit, DW_IDX_type_unit, DW_IDX_die_offset)
(DW_IDX_parent, DW_IDX_type_hash, DW_IDX_lo_user, DW_IDX_hi_user)
(DW_IDX_GNU_static, DW_IDX_GNU_external): New.
* dwarf2.h (DW_IDX, DW_IDX_DUP, DW_FIRST_IDX, DW_END_IDX): New.
(enum dwarf_name_index_attribute): Remove.
(get_DW_IDX_name): New declaration.

libiberty/ChangeLog
2017-05-26  Jan Kratochvil  

* dwarfnames.c (DW_FIRST_IDX, DW_END_IDX, DW_IDX, DW_IDX_DUP): New.

diff --git a/include/dwarf2.def b/include/dwarf2.def
index ea6194e..1f0d50f 100644
--- a/include/dwarf2.def
+++ b/include/dwarf2.def
@@ -782,3 +782,15 @@ DW_CFA (DW_CFA_GNU_args_size, 0x2e)
 DW_CFA (DW_CFA_GNU_negative_offset_extended, 0x2f)
 
 DW_END_CFA
+
+/* Index attributes in the Abbreviations Table.  */
+DW_FIRST_IDX (DW_IDX_compile_unit, 1)
+DW_IDX (DW_IDX_type_unit, 2)
+DW_IDX (DW_IDX_die_offset, 3)
+DW_IDX (DW_IDX_parent, 4)
+DW_IDX (DW_IDX_type_hash, 5)
+DW_IDX_DUP (DW_IDX_lo_user, 0x2000)
+DW_IDX (DW_IDX_hi_user, 0x3fff)
+DW_IDX (DW_IDX_GNU_static, 0x2000)
+DW_IDX (DW_IDX_GNU_external, 0x2001)
+DW_END_IDX
diff --git a/include/dwarf2.h b/include/dwarf2.h
index 9c78880..14b6f22 100644
--- a/include/dwarf2.h
+++ b/include/dwarf2.h
@@ -52,6 +52,8 @@
 #define DW_ATE(name, value) , name = value
 #define DW_ATE_DUP(name, value) , name = value
 #define DW_CFA(name, value) , name = value
+#define DW_IDX(name, value) , name = value
+#define DW_IDX_DUP(name, value) , name = value
 
 #define DW_FIRST_TAG(name, value) enum dwarf_tag { \
   name = value
@@ -71,6 +73,9 @@
 #define DW_FIRST_CFA(name, value) enum dwarf_call_frame_info { \
   name = value
 #define DW_END_CFA };
+#define DW_FIRST_IDX(name, value) enum dwarf_name_index_attribute { \
+  name = value
+#define DW_END_IDX };
 
 #include "dwarf2.def"
 
@@ -86,6 +91,8 @@
 #undef DW_END_ATE
 #undef DW_FIRST_CFA
 #undef DW_END_CFA
+#undef DW_FIRST_IDX
+#undef DW_END_IDX
 
 #undef DW_TAG
 #undef DW_TAG_DUP
@@ -97,6 +104,8 @@
 #undef DW_ATE
 #undef DW_ATE_DUP
 #undef DW_CFA
+#undef DW_IDX
+#undef DW_IDX_DUP
 
 /* Flag that tells whether entry has a child or not.  */
 #define DW_children_no   0
@@ -420,18 +429,6 @@ enum dwarf_macro_record_type
 DW_MACRO_GNU_hi_user = 0xff
   };
 
-/* Index attributes in the Abbreviations Table.  */
-enum dwarf_name_index_attribute
-  {
-DW_IDX_compile_unit = 1,
-DW_IDX_type_unit = 2,
-DW_IDX_die_offset = 3,
-DW_IDX_parent = 4,
-DW_IDX_type_hash = 5,
-DW_IDX_lo_user = 0x2000,
-DW_IDX_hi_user = 0x3fff
-  };
-
 /* Range list entry kinds in .debug_rnglists* section.  */
 enum dwarf_range_list_entry
   {
@@ -524,6 +521,10 @@ extern const char *get_DW_ATE_name (unsigned int enc);
recognized.  */
 extern const char *get_DW_CFA_name (unsigned int opc);
 
+/* Return the name of a DW_IDX_ constant, or NULL if the value is not
+   recognized.  */
+extern const char *get_DW_IDX_name (unsigned int idx);
+
 #ifdef __cplusplus
 }
 #endif /* __cplusplus */
diff --git a/libiberty/dwarfnames.c b/libiberty/dwarfnames.c
index 62563b7..e58d03c 100644
--- a/libiberty/dwarfnames.c
+++ b/libiberty/dwarfnames.c
@@ -59,6 +59,11 @@ Boston, MA 02110-1301, USA.  */
   switch (opc) {   \
   DW_CFA (name, value)
 #define DW_END_CFA } return 0; }
+#define DW_FIRST_IDX(name, value) \
+  const char *get_DW_IDX_name (unsigned int idx) { \
+  switch (idx) {   \
+  DW_IDX (name, value)
+#define DW_END_IDX } return 0; }
 
 #define DW_TAG(name, value) case name: return # name ;
 #define DW_TAG_DUP(name, value)
@@ -70,6 +75,8 @@ Boston, MA 02110-1301, USA.  */
 #define DW_ATE(name, value) case name: return # name ;
 #define DW_ATE_DUP(name, value)
 #define DW_CFA(name, value) case name: return # name ;
+#define DW_IDX(name, value) case name: return # name ;
+#define DW_IDX_DUP(name, value)
 
 #include "dwarf2.def"
 
@@ -85,6 +92,8 @@ Boston, MA 02110-1301, USA.  */
 #undef 

Re: [PATCH GCC][5/5]Enable tree loop distribution at -O3 and above optimization levels.

2017-06-23 Thread Bin.Cheng
On Fri, Jun 23, 2017 at 6:04 AM, Jeff Law  wrote:
> On 06/07/2017 02:07 AM, Bin.Cheng wrote:
>> On Tue, Jun 6, 2017 at 6:47 PM, Jeff Law  wrote:
>>> On 06/02/2017 05:52 AM, Bin Cheng wrote:
 Hi,
 This patch enables -ftree-loop-distribution by default at -O3 and above 
 optimization levels.
 Bootstrap and test at O2/O3 on x86_64 and AArch64.  is it OK?

 Note I don't have strong opinion here and am fine with either it's 
 accepted or rejected.

 Thanks,
 bin
 2017-05-31  Bin Cheng  

   * opts.c (default_options_table): Enable OPT_ftree_loop_distribution
   for -O3 and above levels.
>>> I think the question is how does this generally impact the performance
>>> of the generated code and to a lesser degree compile-time.
>>>
>>> Do you have any performance data?
>> Hi Jeff,
>> At this stage of the patch, only hmmer is impacted and improved
>> obviously in my local run of spec2006 for x86_64 and AArch64.  In long
>> term, loop distribution is also one prerequisite transformation to
>> handle bwaves (at least).  For these two impacted cases, it helps to
>> resolve the gap against ICC.  I didn't check compilation time slow
>> down, we can restrict it to problem with small partition number if
>> that's a problem.
> Just a note. I know you've iterated further with Richi -- I'm not
> objecting to the patch, nor was I ready to approve.
>
> Are you and Richi happy with this as-is or are you looking to submit
> something newer based on the conversation the two of you have had?
Hi Jeff,
The patch series is updated in various ways according to review
comments, for example, it restricts compilation time by checking
number of data references against MAX_DATAREFS_FOR_DATADEPS as well as
restores data dependence cache.  There are still two missing parts I'd
like to do as followup patches: one is loop nest distribution and the
other is a data-locality cost model (at least) for small cases.  Now
Richi approved most patches except the last major one, but I still
need another iterate for some (approved) patches in order to fix
mistake/typo introduced when I separating the patch.

Thanks,
bin
>
> jeff


[PATCH] PR c++/81187 fix -Wnoexcept-type entry in manual

2017-06-23 Thread Jonathan Wakely

PR c++/81187
* doc/invoke.texi (-Wnoexcept-type): Fix name of option, from
-Wnoexcept.

Committed to trunk as obvious. Will also commit to gcc-7-branch.
commit 3c1266fdb7cf6241d9a08109160bf86753d733bd
Author: Jonathan Wakely 
Date:   Fri Jun 23 09:13:06 2017 +0100

PR c++/81187 fix -Wnoexcept-type entry in manual

PR c++/81187
* doc/invoke.texi (-Wnoexcept-type): Fix name of option, from
-Wnoexcept.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 7e7a16a5..7c81f0d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -2908,7 +2908,7 @@ to a function that does not have a non-throwing exception
 specification (i.e. @code{throw()} or @code{noexcept}) but is known by
 the compiler to never throw an exception.
 
-@item -Wnoexcept @r{(C++ and Objective-C++ only)}
+@item -Wnoexcept-type @r{(C++ and Objective-C++ only)}
 @opindex Wnoexcept-type
 @opindex Wno-noexcept-type
 Warn if the C++1z feature making @code{noexcept} part of a function


Re: [PATCH 2/3] Make early return predictor more precise.

2017-06-23 Thread Christophe Lyon
On 23 June 2017 at 09:03, Martin Liška  wrote:
> On 06/22/2017 04:14 PM, Christophe Lyon wrote:
>> Since this commit (r249450), I have noticed a regression:
>> FAIL: gcc.dg/tree-ssa/ipa-split-5.c scan-tree-dump optimized "part"
>> on aarch64/arm
>>
>> Christophe
>
> Hello.
>
> I'm aware of the failure and I fixed that (hopefully) in r249503.
> Can you please test that?
>
Yes, I can confirm it's now OK. Thanks

> Thanks,
> Martin

