[PATCH] Remove old diagnostic macros.

2016-05-30 Thread Marcin Baczyński
Hi,
this is my first GCC patch, so please bear with me if something is
obviously wrong with it.

I've found two unused macros in gcc/diagnostic.h. Is the patch okay as is?
Bootstrapped on x86_64-pc-linux-gnu.

2016-05-31 Marcin Baczyński 

   * gcc/diagnostic.h (diagnostic_line_cutoff, diagnostic_flush_buffer):
Delete.
---
 gcc/diagnostic.h | 6 --
 1 file changed, 6 deletions(-)

diff --git a/gcc/diagnostic.h b/gcc/diagnostic.h
index 48ae50d..58d77df 100644
--- a/gcc/diagnostic.h
+++ b/gcc/diagnostic.h
@@ -231,12 +231,6 @@ diagnostic_inhibit_notes (diagnostic_context * context)
 /* Same as output_prefixing_rule.  Works on 'diagnostic_context *'.  */
 #define diagnostic_prefixing_rule(DC) ((DC)->printer->wrapping.rule)
 
-/* Maximum characters per line in automatic line wrapping mode.
-   Zero means don't wrap lines.  */
-#define diagnostic_line_cutoff(DC) ((DC)->printer->wrapping.line_cutoff)
-
-#define diagnostic_flush_buffer(DC) pp_flush ((DC)->printer)
-
 /* True if the last module or file in which a diagnostic was reported is
different from the current one.  */
 #define diagnostic_last_module_changed(DC, MAP)\
-- 
2.8.3



Re: [PATCH] c++/71306 - bogus -Wplacement-new with an array element

2016-05-30 Thread Martin Sebor

On 05/30/2016 12:42 PM, Jason Merrill wrote:

OK.


Sorry, I should have asked to begin with: Is it okay to backport
this fix to the 6.x branch too?

Martin


Re: [PATCH] gcc/config/tilegx/tilegx.c (tilegx_function_profiler): Save r10 to stack before call mcount

2016-05-30 Thread Chen Gang
On 5/30/16 22:25, Chen Gang wrote:
> On 5/30/16 21:32, Chen Gang wrote:
>> On 5/30/16 03:18, Mike Stump wrote:
>>> On May 29, 2016, at 3:39 AM, cheng...@emindsoft.com.cn wrote:

 r10 may also be used as a parameter for the nested function, so we need
 to save it before calling mcount.
>>>
>>> mcount can have a special abi where it preserves more registers than one 
>>> would otherwise expect.  I'm wondering if you know what registers it saves 
>>> or doesn't touch?  Does this fix any bug found by running tests, or just by 
>>> inspection?
>>>

Oh, I forgot bundle optimization. We can use {st sp, 10; addi sp, sp, -8}.

I shall also add a reference to PR target/71331 in the comments.

If there is no additional reply within 3 days, I shall send patch v2 for it
within this week.

Thanks.

>>
>> It is about Bug71331, which I reported yesterday.
>>
>> For a nested function, r10 is treated as the stack pointer for parameters.
>> mcount does save r10, but tilegx also uses r10 to save lr, which causes
>> this issue ("move r10, lr; jal __mcount").
>>
>> What I have done is just like what gcc does on x86 ("push r10; callq mcount").
>>
> 
> After this patch, nested-func-4.c compiles and runs successfully under
> tilegx qemu. :-)
> 
> Thanks.
> 

-- 
Chen Gang (陈刚)

Managing Natural Environments is the Duty of Human Beings.


[committed] Fix some OpenMP 4.5 issues (PR c++/71349)

2016-05-30 Thread Jakub Jelinek
Hi!

As the changes to the testcase show, there were 3 issues:
1) the nowait clause was not allowed on target {teams distribute ,}parallel
for{, simd} - while parallel for{, simd} disallows nowait, it is allowed on
target, so on these constructs nowait should go to the target construct,
and not to the worksharing construct
2) the depend clause was allowed on combined constructs including target,
but resulted in an ICE; only specifying it on a non-combined target worked
3) for C++ only, after fixing 2), target parallel depend(inout: p[0])
resulted in a weird error, because we were finalizing clauses twice in
that case

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
committed to trunk and 6.2.
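
As an illustration, here is a sketch of mine (not the committed testcase)
of a combined construct that the three fixes together make valid:

void
bar (int n, int *a, int *dd)
{
  /* nowait previously bounced off the worksharing part of the combined
     construct, and depend either ICEd or (for C++) drew a bogus error
     from the clauses being finalized twice.  Both belong to target.  */
  #pragma omp target parallel for nowait depend(inout: dd[0]) \
                                  map(tofrom: a[0:n])
  for (int i = 0; i < n; i++)
    a[i]++;
}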

2016-05-30  Jakub Jelinek  

PR c++/71349
* c-parser.c (c_parser_omp_for): Don't disallow nowait clause
when combined with target construct.

* parser.c (cp_parser_omp_for): Don't disallow nowait clause
when combined with target construct.
(cp_parser_omp_parallel): Pass cclauses == NULL as last argument
to cp_parser_omp_all_clauses.

* c-omp.c (c_omp_split_clauses): Put OMP_CLAUSE_DEPEND to
C_OMP_CLAUSE_SPLIT_TARGET.  Put OMP_CLAUSE_NOWAIT to
C_OMP_CLAUSE_SPLIT_TARGET if combined with target construct,
instead of C_OMP_CLAUSE_SPLIT_FOR.

* c-c++-common/gomp/clauses-1.c (bar): Add dd argument.  Add
nowait depend(inout: dd[0]) clauses where permitted.

--- gcc/c/c-parser.c.jj 2016-05-26 21:10:52.0 +0200
+++ gcc/c/c-parser.c2016-05-30 20:11:36.522350438 +0200
@@ -15113,7 +15113,9 @@ c_parser_omp_for (location_t loc, c_pars
 
   strcat (p_name, " for");
   mask |= OMP_FOR_CLAUSE_MASK;
-  if (cclauses)
+  /* parallel for{, simd} disallows nowait clause, but for
+ target {teams distribute ,}parallel for{, simd} it should be accepted.  */
+  if (cclauses && (mask & (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_MAP)) == 0)
 mask &= ~(OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_NOWAIT);
   /* Composite distribute parallel for{, simd} disallows ordered clause.  */
   if ((mask & (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DIST_SCHEDULE)) != 0)
--- gcc/cp/parser.c.jj  2016-05-26 10:38:01.0 +0200
+++ gcc/cp/parser.c 2016-05-30 20:51:40.418637030 +0200
@@ -33917,7 +33917,9 @@ cp_parser_omp_for (cp_parser *parser, cp
 
   strcat (p_name, " for");
   mask |= OMP_FOR_CLAUSE_MASK;
-  if (cclauses)
+  /* parallel for{, simd} disallows nowait clause, but for
+ target {teams distribute ,}parallel for{, simd} it should be accepted.  */
+  if (cclauses && (mask & (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_MAP)) == 0)
 mask &= ~(OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_NOWAIT);
   /* Composite distribute parallel for{, simd} disallows ordered clause.  */
   if ((mask & (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DIST_SCHEDULE)) != 0)
@@ -34256,7 +34258,8 @@ cp_parser_omp_parallel (cp_parser *parse
}
 }
 
-  clauses = cp_parser_omp_all_clauses (parser, mask, p_name, pragma_tok);
+  clauses = cp_parser_omp_all_clauses (parser, mask, p_name, pragma_tok,
+  cclauses == NULL);
   if (cclauses)
 {
   cp_omp_split_clauses (loc, OMP_PARALLEL, mask, clauses, cclauses);
--- gcc/c-family/c-omp.c.jj 2016-05-01 12:21:55.0 +0200
+++ gcc/c-family/c-omp.c2016-05-30 20:21:59.107412705 +0200
@@ -983,6 +983,7 @@ c_omp_split_clauses (location_t loc, enu
case OMP_CLAUSE_MAP:
case OMP_CLAUSE_IS_DEVICE_PTR:
case OMP_CLAUSE_DEFAULTMAP:
+   case OMP_CLAUSE_DEPEND:
  s = C_OMP_CLAUSE_SPLIT_TARGET;
  break;
case OMP_CLAUSE_NUM_TEAMS:
@@ -998,7 +999,6 @@ c_omp_split_clauses (location_t loc, enu
  s = C_OMP_CLAUSE_SPLIT_PARALLEL;
  break;
case OMP_CLAUSE_ORDERED:
-   case OMP_CLAUSE_NOWAIT:
  s = C_OMP_CLAUSE_SPLIT_FOR;
  break;
case OMP_CLAUSE_SCHEDULE:
@@ -1333,6 +1333,18 @@ c_omp_split_clauses (location_t loc, enu
  else
s = C_OMP_CLAUSE_SPLIT_FOR;
  break;
+   case OMP_CLAUSE_NOWAIT:
+ /* Nowait clause is allowed on target, for and sections, but
+is not allowed on parallel for or parallel sections.  Therefore,
+put it on target construct if present, because that can only
+be combined with parallel for{, simd} and not with for{, simd},
+otherwise to the worksharing construct.  */
+ if ((mask & (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_MAP))
+ != 0)
+   s = C_OMP_CLAUSE_SPLIT_TARGET;
+ else
+   s = C_OMP_CLAUSE_SPLIT_FOR;
+ break;
default:
  gcc_unreachable ();
}
--- gcc/testsuite/c-c++-common/gomp/clauses-1.c.jj  2016-03-08 
09:01:48.0 +0100
+++ gcc/testsuite/c-c++-common/gomp/clauses-1.c 2016-05-30 20:18:59.0 
+0200
@@ -34,7 +34,7 @@ foo (int d, int m, int i1, int i2, 

Re: [v3 PATCH] Protect allocator-overloads of tuple-from-tuple constructors from cases that would create dangling references.

2016-05-30 Thread Paolo Carlini

Hi Ville,

On 29/05/2016 16:42, Ville Voutilainen wrote:

On 28 May 2016 at 21:25, Ville Voutilainen  wrote:

The fix to avoid binding dangling references to temporaries for tuple's
constructors that take tuples of a different type didn't include the fix
for allocator overloads. That was just lazy, and I should feel ashamed.
This patch fixes it, and takes us one step further toward passing libc++'s
testsuite for tuple. The added _NonNestedTuple checks could actually be folded
into the recently-added _TMCT alias, but I'll do that as a separate cleanup
patch. For now, this should do as an easy and straightforward fix.

Tested on Linux-x64.
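
To make the failure mode concrete, a hedged sketch of mine (not the
actual testsuite case) of the nested-tuple situation the _NonNestedTuple
check guards against in the allocator overloads:

#include <memory>
#include <tuple>
#include <utility>

int main()
{
  std::tuple<int> inner(42);
  /* The element-wise allocator-extended constructor should bind the
     reference element directly to 'inner'.  If overload resolution could
     pick the tuple-from-tuple converting constructor instead, it would
     materialize a temporary tuple and bind the reference to that,
     leaving it dangling.  */
  std::tuple<std::tuple<int>&&> outer(std::allocator_arg,
                                      std::allocator<int>{},
                                      std::move(inner));
  (void) outer;
}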

2016-05-28  Ville Voutilainen  

 Protect allocator-overloads of tuple-from-tuple constructors
 from cases that would create dangling references.
 * include/std/tuple (tuple(allocator_arg_t, const _Alloc&,
  const tuple<_UElements...>&), tuple(allocator_arg_t, const _Alloc&,
  tuple<_UElements...>&&)): Add a check for _NonNestedTuple.
 * testsuite/20_util/tuple/cons/nested_tuple_construct.cc: Adjust.

Since Jonathan is going to be out-of-reach for next week due to a
well-deserved holiday, would it be ok if Paolo approves such patches?
Admittedly, I didn't follow in detail all your latest contributions, but
sure, if this work goes only into trunk, please go ahead at your leisure. We
can certainly revisit details when Jon comes back, but trunk is 
currently in Stage 1, open for any kind of development work...


Thanks,
Paolo.


Re: [patch] config.gcc FreeBSD ARM

2016-05-30 Thread Andreas Tobler

On 27.05.16 22:34, Andreas Tobler wrote:

Hi all,

The FreeBSD ARM people eliminated the extra armv6hf target and moved the
hardfloat 'functionality' into the armv6-*-freebsd11+ target.
This applies / will apply to FreeBSD 11 (FreeBSD 11 is not released yet;
the planned date is in September 2016). On FreeBSD 10, armv6 still has only
soft float. The armv6hf target is not live on FreeBSD 10.

This simplifies life a bit.

I'll commit the attached patch to all the active branches. Regarding the
gcc-5 branch, do I have permission to apply?


Committed to trunk and gcc-6, waiting till gcc-5 is open again.

Andreas


2016-05-27  Andreas Tobler  

* config.gcc: Move hard float support for arm*hf*-*-freebsd* into
armv6*-*-freebsd* for FreeBSD11*. Eliminate the arm*hf*-*-freebsd*
target.

Index: config.gcc
===
--- config.gcc  (revision 236835)
+++ config.gcc  (working copy)
@@ -1058,13 +1058,11 @@
case $target in
armv6*-*-freebsd*)
tm_defines="${tm_defines} TARGET_FREEBSD_ARMv6=1"
+if test $fbsd_major -ge 11; then
+   tm_defines="${tm_defines} TARGET_FREEBSD_ARM_HARD_FLOAT=1"
+fi
;;
esac
-   case $target in
-   arm*hf-*-freebsd*)
-   tm_defines="${tm_defines} TARGET_FREEBSD_ARM_HARD_FLOAT=1"
-   ;;
-   esac
with_tls=${with_tls:-gnu}
;;
  arm*-*-netbsdelf*)





[PATCH] Fix PR c++/27100

2016-05-30 Thread Patrick Palka
Here's a more straightforward version of the patch that abuses the fact
that the fields DECL_PENDING_INLINE_INFO and DECL_SAVED_FUNCTION_DATA
are a union.

Bootstrapped + regtested on x86_64-pc-linux-gnu.  Does this look OK to
commit?

gcc/cp/ChangeLog:

PR c++/27100
* decl.c (duplicate_decls): Copy the DECL_PENDING_INLINE_P flag.

gcc/testsuite/ChangeLog:

PR c++/27100
* g++.dg/other/friend6.C: New test.
---
 gcc/cp/decl.c| 11 +--
 gcc/testsuite/g++.dg/other/friend6.C | 15 +++
 2 files changed, 24 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/other/friend6.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 90e147c..b26e8ae 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -2351,8 +2351,15 @@ duplicate_decls (tree newdecl, tree olddecl, bool 
newdecl_is_friend)
}
   else
{
- if (DECL_PENDING_INLINE_INFO (newdecl) == 0)
-   DECL_PENDING_INLINE_INFO (newdecl) = DECL_PENDING_INLINE_INFO 
(olddecl);
+ if (DECL_PENDING_INLINE_INFO (newdecl) == NULL)
+   {
+ /* This also copies DECL_SAVED_FUNCTION_DATA since the two fields
+are a union (tagged by DECL_PENDING_INLINE_P).  */
+ DECL_PENDING_INLINE_INFO (newdecl)
+   = DECL_PENDING_INLINE_INFO (olddecl);
+ DECL_PENDING_INLINE_P (newdecl)
+   = DECL_PENDING_INLINE_P (olddecl);
+   }
 
  DECL_DECLARED_INLINE_P (newdecl) |= DECL_DECLARED_INLINE_P (olddecl);
 
diff --git a/gcc/testsuite/g++.dg/other/friend6.C 
b/gcc/testsuite/g++.dg/other/friend6.C
new file mode 100644
index 000..5f593a1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/other/friend6.C
@@ -0,0 +1,15 @@
+// PR c++/27100
+// This used to fail at link time with an "undefined reference to 'foo'" error.
+// { dg-do link }
+
+struct A
+{
+  friend void foo (const A&) { }
+  friend void foo (const A&);
+};
+
+int
+main ()
+{
+  foo (A ());
+}
-- 
2.9.0.rc0.29.gabd6606



Re: [PATCH] Update documentation for ARM architecture

2016-05-30 Thread Sandra Loosemore

On 05/28/2016 07:52 AM, stefan.bru...@rwth-aachen.de wrote:

From: Stefan Brüns 

   * use lexicographical ordering, as "gcc -march=foo" does
   * add armv6k, armv6z, armv6zk
   * remove ep9312, as it is only valid for -mcpu
   * add armv6s-m and document it, as it is not an official ARM name.

Support for the OS extension/SVC is mandatory; non-supporting
implementations are deprecated (ARMv6-M Architecture Reference Manual, B.2).
---
  gcc/doc/invoke.texi | 17 -
  1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9e92133..6942b83 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14041,15 +14041,22 @@ name to determine what kind of instructions it can 
emit when generating
  assembly code.  This option can be used in conjunction with or instead
  of the @option{-mcpu=} option.  Permissible names are: @samp{armv2},
  @samp{armv2a}, @samp{armv3}, @samp{armv3m}, @samp{armv4}, @samp{armv4t},
-@samp{armv5}, @samp{armv5t}, @samp{armv5e}, @samp{armv5te},
-@samp{armv6}, @samp{armv6j},
-@samp{armv6t2}, @samp{armv6z}, @samp{armv6kz}, @samp{armv6-m},
-@samp{armv7}, @samp{armv7-a}, @samp{armv7-r}, @samp{armv7-m}, @samp{armv7e-m},
+@samp{armv5}, @samp{armv5e}, @samp{armv5t}, @samp{armv5te},
+@samp{armv6}, @samp{armv6-m}, @samp{armv6j}, @samp{armv6k},
+@samp{armv6kz}, @samp{armv6s-m},
+@samp{armv6t2}, @samp{armv6z}, @samp{armv6zk},
+@samp{armv7}, @samp{armv7-a}, @samp{armv7-m}, @samp{armv7-r}, @samp{armv7e-m},
  @samp{armv7ve}, @samp{armv8-a}, @samp{armv8-a+crc}, @samp{armv8.1-a},
-@samp{armv8.1-a+crc}, @samp{iwmmxt}, @samp{iwmmxt2}, @samp{ep9312}.
+@samp{armv8.1-a+crc}, @samp{iwmmxt}, @samp{iwmmxt2}.

  Architecture revisions older than @option{armv4t} are deprecated.


Can you please

s/@option{armv4t}/@samp{armv4t}/


+@option{-march=armv6s-m} is the armv6-m architecture with support for
+the (now mandatory) SVC instruction.
+
+@option{-march=armv6zk} is an alias for armv6kz, existing for backwards
+compatibility.
+
  @option{-march=armv7ve} is the armv7-a architecture with virtualization
  extensions.


And in the above 3 paragraphs, please add @samp{} markup to armv6-m, 
armv6kz, and armv7-a, respectively.


It's OK with those changes unless the arch maintainers have some 
technical complaint about the information.


-Sandra



[PATCH] Fix PR tree-optimization/71314

2016-05-30 Thread Patrick Palka
The test case ssa-thread-14.c expects that non-short-circuit logical ops
are used so it fails on targets that don't use such ops by default.  To
fix, I copied the dg directives from tree-ssa/reassoc-35.c which look
reasonable.  Does this look OK to commit?
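
For readers unfamiliar with the issue, a small sketch of mine of what
"non-short-circuit logical ops" means for this test:

/* With a branch cost of 2 or more (hence the -mbranch-cost=2 below for
   targets that support the option), GCC may compile the condition as
   straight-line bitwise code, roughly 'if ((a != 0) & (b != 0))', which
   produces the CFG shape that the "Threaded jump" scan expects.  */
extern void g (void);

void
f (int a, int b)
{
  if (a != 0 && b != 0)
    g ();
}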

gcc/testsuite/ChangeLog:

PR tree-optimization/71314
* gcc.dg/tree-ssa/ssa-thread-14.c: Adjust target selector.  Pass
-mbranch-cost=2.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c
index e2ac2f7..c754b5b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c
@@ -1,5 +1,6 @@
-/* { dg-do compile }  */
+/* { dg-do compile { target { ! { m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* 
v850*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* 
xtensa*-*-* hppa*-*-* nios2*-*-* } } } }  */
 /* { dg-additional-options "-O2 -fdump-tree-vrp-details" }  */
+/* { dg-additional-options "-mbranch-cost=2" { target mips*-*-* avr-*-* 
s390*-*-* i?86-*-* x86_64-*-* } }  */
 /* { dg-final { scan-tree-dump-times "Threaded jump" 8 "vrp1" } }  */
 
 void foo (void);
-- 
2.9.0.rc0.29.gabd6606



[PATCH] Fix PR tree-optimization/71077

2016-05-30 Thread Patrick Palka
In this PR the function simplify_control_stmt_condition_1(), which is
responsible for recursing into the operands of a GIMPLE_COND during jump
threading to check for dominating ASSERT_EXPRs, was erroneously
returning a VECTOR_CST when it should instead be returning an
INTEGER_CST.  The problem is that the zero/one constants that this
function uses share the type of the GIMPLE_COND's operands, operands
which may be vectors.  This makes no sense and there is no reason to
build and use such constants instead of using boolean_[true|false]_node
since all that matters is whether a constant returned by
simplify_control_stmt_condition() correctly satisfies
integer_[zero|one]p.  So this patch makes the combining part of
simplify_control_stmt_condition_1() just use boolean_false_node or
boolean_true_node as the designated false/true values.

Does this look OK to commit after bootstrap+regtest on
x86_64-pc-linux-gnu?  (I'm not sure if I put the test in the proper
directory since no other test in tree-ssa uses -flto.  Also not sure if
an i686 target accepts the -march=core-avx2 flag.)

gcc/ChangeLog:

PR tree-optimization/71077
* tree-ssa-threadedge.c (simplify_control_stmt_condition_1): In
the combining step, use boolean_false_node and boolean_true_node
as the designated false/true return values.

gcc/testsuite/ChangeLog:

PR tree-optimization/71077
* gcc.dg/tree-ssa/pr71077.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr71077.c | 18 ++
 gcc/tree-ssa-threadedge.c   | 26 --
 2 files changed, 30 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr71077.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71077.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr71077.c
new file mode 100644
index 000..4753740
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71077.c
@@ -0,0 +1,18 @@
+/* PR c++/71077  */
+/* { dg-do link { target { i?86-*-* x86_64-*-* } } }  */
+/* { dg-options "-O3 -flto -march=core-avx2" }  */
+
+int *a;
+int b, c, d, e;
+int sched_analyze(void) {
+ for (; b; b++) {
+   c = 0;
+   for (; c < 32; c++)
+ if (b & 1 << c)
+   a[b + c] = d;
+ }
+ return 0;
+}
+
+void schedule_insns(void) { e = sched_analyze(); }
+int main(void) { schedule_insns(); }
diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index eca3812..826d620 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -573,8 +573,6 @@ simplify_control_stmt_condition_1 (edge e,
  enum tree_code rhs_code = gimple_assign_rhs_code (def_stmt);
  const tree rhs1 = gimple_assign_rhs1 (def_stmt);
  const tree rhs2 = gimple_assign_rhs2 (def_stmt);
- const tree zero_cst = build_zero_cst (TREE_TYPE (op0));
- const tree one_cst = build_one_cst (TREE_TYPE (op0));
 
  /* Is A != 0 ?  */
  const tree res1
@@ -589,19 +587,19 @@ simplify_control_stmt_condition_1 (edge e,
{
  /* If A == 0 then (A & B) != 0 is always false.  */
  if (cond_code == NE_EXPR)
-   return zero_cst;
+   return boolean_false_node;
  /* If A == 0 then (A & B) == 0 is always true.  */
  if (cond_code == EQ_EXPR)
-   return one_cst;
+   return boolean_true_node;
}
  else if (rhs_code == BIT_IOR_EXPR && integer_nonzerop (res1))
{
  /* If A != 0 then (A | B) != 0 is always true.  */
  if (cond_code == NE_EXPR)
-   return one_cst;
+   return boolean_true_node;
  /* If A != 0 then (A | B) == 0 is always false.  */
  if (cond_code == EQ_EXPR)
-   return zero_cst;
+   return boolean_false_node;
}
 
  /* Is B != 0 ?  */
@@ -617,19 +615,19 @@ simplify_control_stmt_condition_1 (edge e,
{
  /* If B == 0 then (A & B) != 0 is always false.  */
  if (cond_code == NE_EXPR)
-   return zero_cst;
+   return boolean_false_node;
  /* If B == 0 then (A & B) == 0 is always true.  */
  if (cond_code == EQ_EXPR)
-   return one_cst;
+   return boolean_true_node;
}
  else if (rhs_code == BIT_IOR_EXPR && integer_nonzerop (res2))
{
  /* If B != 0 then (A | B) != 0 is always true.  */
  if (cond_code == NE_EXPR)
-   return one_cst;
+   return boolean_true_node;
  /* If B != 0 then (A | B) == 0 is always false.  */
  if (cond_code == EQ_EXPR)
-   return zero_cst;
+   return boolean_false_node;
}
 
  if (res1 != NULL_TREE && res2 != NULL_TREE)
@@ -641,10 +639,10 @@ simplify_control_stmt_condition_1 (edge e,
{
  /* If A != 0 and B != 0 then (bool)(A & B) != 0 is true.  */
 

Re: Enable loop peeling at -O3

2016-05-30 Thread Sandra Loosemore

On 05/30/2016 09:26 AM, Jan Hubicka wrote:

Hi,
this is the version of the patch I intend to commit after re-testing on
x86_64-linux with loop peeling enabled at -O3.

It drops -fpeel-all-loops, adds logic to not peel loops multiple times,
and fixes profile updating.

Bootstrapped/regtested x86_64-linux

Honza

* doc/invoke.texi (-fpeel-loops,-O3): Update documentation.
* opts.c (default_options): Enable peel loops at -O3.
* tree-ssa-loop-ivcanon.c (peeled_loops): New static var.
(try_peel_loop): Do not re-peel already peeled loops;
use likely upper bounds; fix profile updating.
(pass_complete_unroll::execute): Initialize peeled_loops.

* gcc.dg/tree-ssa/peel1.c: New testcase.
* gcc.dg/tree-ssa/peel2.c: New testcase.
* gcc.dg/tree-ssa/pr61743-1.c: Disable loop peeling.
* gcc.dg/tree-ssa/pr61743-2.c: Disable loop peeling.


The documentation parts look OK (and I'm glad we don't need to add yet 
another new option).


-Sandra



Re: [PATCH] x86 interrupt attribute patch [2/2]

2016-05-30 Thread Sandra Loosemore

On 05/30/2016 07:31 AM, Koval, Julia wrote:

Hi,
Here is the fixed version of the patch. Ok for trunk?

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 2d4f028..f4bd7dd 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -5266,6 +5266,96 @@ On x86-32 targets, the @code{stdcall} attribute causes th
e compiler to
 assume that the called function pops off the stack space used to
 pass arguments, unless it takes a variable number of arguments.

+@item no_caller_saved_registers
+@cindex @code{no_caller_saved_registers} function attribute, x86
+Use this attribute to indicate that the specified function has no
+caller-saved registers, for example for a function, called from an
+interrupt handler.  That is, all registers are callee-saved.


I think the "for example" information should come after the second 
sentence to improve the flow here.  I'm also confused; I can see how an 
interrupt handler might have different register usage conventions, but 
do regular functions called from inside an interrupt handler function 
really use a non-standard call/return sequence?
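
(For what it's worth, my reading of the intent, as a hedged sketch using
only names from the patch; both functions would be compiled with
-mgeneral-regs-only:)

struct interrupt_frame;

/* An ordinary function given an all-callee-saved convention, so the
   handler does not have to spill caller-saved registers around the
   call.  */
__attribute__ ((no_caller_saved_registers))
static void
helper (void)
{
}

__attribute__ ((interrupt))
void
handler (struct interrupt_frame *frame)
{
  (void) frame;
  helper ();
}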



+The compiler generates proper function entry and exit sequences to
+save and restore any modified registers, except for the EFLAGS
+register.  Since GCC doesn't preserve MPX, SSE, MMX nor x87 states,
+the GCC option, @option{-mgeneral-regs-only}, should be used to


Please delete both commas in the line above.


+compile functions with @code{no_caller_saved_registers} attribute.
+
+@emph{Note for compiler implementers:} If the compiler generates MPX,
+SSE, MMX or x87 instructions in a function with
+@code{no_caller_saved_registers} attribute or functions called from
+a function with @code{no_caller_saved_registers} attribute may contain
+MPX, SSE, MMX or x87 instructions, the compiler must save and restore
+the corresponding state.


A "Note for compiler implementers" has no place in user documentation. 
You should just document what GCC does, if it is relevant to how a user 
would use this feature.


It also seems like the admonition in this note that the compiler must 
save/restore the state contradicts the previous paragraph, where you say 
GCC doesn't preserve the state.



+
+@item interrupt
+@cindex @code{interrupt} function attribute, x86
+Use this attribute to indicate that the specified function is an
+interrupt handler or an exception handler (depending on parameters, passed


Delete the comma.


+to the function, explained further).  The compiler generates function
+entry and exit sequences suitable for use in an interrupt handler when
+this attribute is present.  The @code{IRET} instruction, instead of the
+@code{RET} instruction, is used to return from interrupt handlers.  All
+registers, except for the EFLAGS register which is restored by the
+@code{IRET} instruction, are preserved by the compiler.  Since GCC
+doesn't preserve MPX, SSE, MMX nor x87 states, the GCC option,
+@option{-mgeneral-regs-only}, should be used to compile interrupt and


Delete the two previous commas.


+exception handlers.
+
+@emph{Note for compiler implementers:} If the compiler generates MPX,
+SSE, MMX or x87 instructions in an interrupt or exception handler, or
+functions called from an interrupt or exception handler may contain
+MPX, SSE, MMX or x87 instructions, the compiler must save and restore
+the corresponding state.


Again, this is user documentation.  Just explain what GCC does if it is 
relevant to how a user would use the feature you are documenting.



+
+Since the direction flag in the FLAGS register in interrupt handlers
+is undetermined, cld instruction must be emitted in function prologue
+if rep string instructions are used in interrupt handler or interrupt
+handler isn't a leaf function.


This paragraph seems totally implementor-speaky and irrelevant to how a 
user would use the feature.



+
+Any interruptible-without-stack-switch code must be compiled with
+@option{-mno-red-zone} since interrupt handlers can and will, because
+of the hardware design, touch the red zone.
+
+An interrupt handler must be declared with a mandatory pointer
+argument:
+
+@smallexample
+struct interrupt_frame;
+
+__attribute__ ((interrupt))
+void
+f (struct interrupt_frame *frame)
+@{
+@}
+@end smallexample
+
+and you must define the structure the pointer pointing to, depending
+on the proper x86 interrupt frame, described in the processor's manual.


How about

@noindent
and you must define @code{struct interrupt_frame} as described in the 
processor's manual.




+
+The exception handler is very similar to the interrupt handler with
+a different mandatory function signature:
+
+@smallexample
+#ifdef __x86_64__
+typedef unsigned long long int uword_t;
+#else
+typedef unsigned int uword_t;
+#endif
+
+struct interrupt_frame;
+
+__attribute__ ((interrupt))
+void
+f (struct interrupt_frame *frame, uword_t error_code)
+@{
+  ...
+@}
+@end smallexample
+
+and compiler pops the error code off the stack before the @code{IRET}
+instruction.
+

[PATCH, i386]: Use "lock orl $0, -4(%esp)" in mfence_nosse

2016-05-30 Thread Uros Bizjak
On Sun, May 29, 2016 at 11:10 PM, Uros Bizjak  wrote:

> Attached RFC patch implements this proposal.
>
> 2016-05-29  Uros Bizjak  
>
> * config/i386/sync.md (mfence_nosse): Use "lock orl $0, -4(%esp)".
>
> Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>
> Any other opinion on this issue? The linux kernel also implements
> memory fence like the above proposal.

I have committed the patch to mainline SVN.
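
For reference, the idea in C form (a sketch of mine, assuming 32-bit code
without SSE2; the real change is in the mfence_nosse pattern in
config/i386/sync.md):

/* A locked read-modify-write of a dummy stack slot is a full memory
   barrier on x86 and needs no SSE2.  Using -4(%esp) keeps it off live
   data, since 32-bit x86 has no red zone below the stack pointer.  */
static inline void
mfence_nosse (void)
{
  __asm__ __volatile__ ("lock; orl $0, -4(%%esp)" ::: "memory");
}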

Uros.


Re: [PATCH] c++/71306 - bogus -Wplacement-new with an array element

2016-05-30 Thread Jason Merrill

OK.

Jason


Re: [C++ Patch] PR 71238 ("Undeclared function message imprecisely points to error column")

2016-05-30 Thread Jason Merrill

On 05/30/2016 12:45 PM, Paolo Carlini wrote:

Hi,

On 30/05/2016 18:07, Jason Merrill wrote:

On 05/30/2016 11:42 AM, Paolo Carlini wrote:

+unqualified_name_lookup_error (tree name, location_t loc)
 {
+  if (loc == UNKNOWN_LOCATION)
+loc = location_of (name);


When does this do anything useful?  If name is a DECL, this will give
the wrong answer.  If name is an IDENTIFIER_NODE, this will give
UNKNOWN_LOCATION.

Indeed, I wasn't sure; as I said, I tried to be conservative, given what
we used to call in that function. I'm going to regression test again with
input_location instead. Does the patch otherwise look sane?


The safest choice would be

EXPR_LOCATION_OR_LOC (name, input_location)

OK with that change.

Jason



Re: [PATCH, OpenACC] Make reduction arguments addressable

2016-05-30 Thread Jakub Jelinek
On Mon, May 30, 2016 at 10:38:59PM +0800, Chung-Lin Tang wrote:
> Hi, a previous patch of Cesar's has made the middle-end omp-lowering
> automatically create and insert a tofrom (i.e. present_or_copy) map for
> parallel reductions.  This allowed the user to not need explicit
> clauses to copy out the reduction result, but because reduction arguments
> are not marked addressable, async does not work as expected,
> i.e. the asynchronous copy-out results are not used in the compiler generated 
> code.

If you need it only for async parallel/kernels? regions, can't you do that
only for those and not for others?

> This patch fixes this in the front ends; I've tested it without
> new regressions, and it fixes some C++ OpenACC tests that regressed after
> my last OpenACC async patch.  Is this okay for trunk?

Testcases in the testsuite or others?  If the latter, we should add them.

Jakub


[gomp4.5] Doacross fixes for Fortran non-simple loops

2016-05-30 Thread Jakub Jelinek
Hi!

This fixes the handling of doacross on non-simple Fortran loops (ones where
the step isn't a compile-time-known 1 or -1), except that lastprivate is
still broken if the collapsed loop iterates a non-zero number of times but
one of the ordered loops inside of it has zero iterations.  Will need to
figure out something for that case.

Tested on x86_64-linux and i686-linux, committed to gomp-4_5-branch.
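
For context, a sketch of mine of the loop shape involved; the fix is in
the Fortran front end, so picture the DO-loop analogue of this:

/* A doacross loop whose step is not 1 or -1.  The sink offset (-4 here)
   has to be divided by the step to obtain an iteration distance; the
   TRUNC_DIV_EXPR wrapping added by the patch carries that step from the
   front end to omp-low.c.  */
void
f (int *a, int n)
{
  #pragma omp parallel for ordered(1)
  for (int i = 4; i < n; i += 4)
    {
      #pragma omp ordered depend (sink: i - 4)
      a[i] += a[i - 4];
      #pragma omp ordered depend (source)
    }
}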

2016-05-30  Jakub Jelinek  

* omp-low.c (expand_omp_ordered_sink): Handle TREE_PURPOSE
of deps being TRUNC_DIV_EXPR.
* gimplify.c (gimplify_scan_omp_clauses): Likewise.
gcc/fortran/
* trans-openmp.c (doacross_steps): New variable.
(gfc_trans_omp_clauses): Wrap depend sink addend into
TRUNC_DIV_EXPR with second operand the non-simple step.
(gfc_trans_omp_do): Set up doacross_steps.
libgomp/
* testsuite/libgomp.c/doacross-1.c (main): Add missing
#pragma omp atomic read.
* testsuite/libgomp.c/doacross-2.c (main): Likewise.
* testsuite/libgomp.c/doacross-3.c (main): Likewise.
* testsuite/libgomp.fortran/doacross1.f90: New test.
* testsuite/libgomp.fortran/doacross2.f90: New test.

--- gcc/omp-low.c.jj2016-05-20 14:52:59.0 +0200
+++ gcc/omp-low.c   2016-05-30 17:46:40.239706984 +0200
@@ -7996,12 +7996,27 @@ expand_omp_ordered_sink (gimple_stmt_ite
 
   for (i = 0; i < fd->ordered; i++)
 {
+  tree step = NULL_TREE;
   off = TREE_PURPOSE (deps);
+  if (TREE_CODE (off) == TRUNC_DIV_EXPR)
+   {
+ step = TREE_OPERAND (off, 1);
+ off = TREE_OPERAND (off, 0);
+   }
   if (!integer_zerop (off))
{
  gcc_assert (fd->loops[i].cond_code == LT_EXPR
  || fd->loops[i].cond_code == GT_EXPR);
  bool forward = fd->loops[i].cond_code == LT_EXPR;
+ if (step)
+   {
+ /* Non-simple Fortran DO loops.  If step is variable,
+we don't know at compile even the direction, so can't
+warn.  */
+ if (TREE_CODE (step) != INTEGER_CST)
+   break;
+ forward = tree_int_cst_sgn (step) != -1;
+   }
  if (forward ^ OMP_CLAUSE_DEPEND_SINK_NEGATIVE (deps))
warning_at (loc, 0, "%<depend(sink)%> clause waiting for "
"lexically later iteration");
@@ -8022,16 +8037,33 @@ expand_omp_ordered_sink (gimple_stmt_ite
   edge e1 = split_block (gsi_bb (gsi2), gsi_stmt (gsi2));
   edge e2 = split_block_after_labels (e1->dest);
 
-  *gsi = gsi_after_labels (e1->dest);
+  gsi2 = gsi_after_labels (e1->dest);
+  *gsi = gsi_last_bb (e1->src);
   for (i = 0; i < fd->ordered; i++)
 {
   tree itype = TREE_TYPE (fd->loops[i].v);
+  tree step = NULL_TREE;
+  tree orig_off = NULL_TREE;
   if (POINTER_TYPE_P (itype))
itype = sizetype;
   if (i)
deps = TREE_CHAIN (deps);
   off = TREE_PURPOSE (deps);
-  tree s = fold_convert_loc (loc, itype, fd->loops[i].step);
+  if (TREE_CODE (off) == TRUNC_DIV_EXPR)
+   {
+ step = TREE_OPERAND (off, 1);
+ off = TREE_OPERAND (off, 0);
+ gcc_assert (fd->loops[i].cond_code == LT_EXPR
+ && integer_onep (fd->loops[i].step)
+ && !POINTER_TYPE_P (TREE_TYPE (fd->loops[i].v)));
+   }
+  tree s = fold_convert_loc (loc, itype, step ? step : fd->loops[i].step);
+  if (step)
+   {
+ off = fold_convert_loc (loc, itype, off);
+ orig_off = off;
+ off = fold_build2_loc (loc, TRUNC_DIV_EXPR, itype, off, s);
+   }
 
   if (integer_zerop (off))
t = boolean_true_node;
@@ -8053,7 +8085,36 @@ expand_omp_ordered_sink (gimple_stmt_ite
  else
a = fold_build2_loc (loc, PLUS_EXPR, TREE_TYPE (fd->loops[i].v),
 fd->loops[i].v, co);
- if (fd->loops[i].cond_code == LT_EXPR)
+ if (step)
+   {
+ tree t1, t2;
+ if (OMP_CLAUSE_DEPEND_SINK_NEGATIVE (deps))
+   t1 = fold_build2_loc (loc, GE_EXPR, boolean_type_node, a,
+ fd->loops[i].n1);
+ else
+   t1 = fold_build2_loc (loc, LT_EXPR, boolean_type_node, a,
+ fd->loops[i].n2);
+ if (OMP_CLAUSE_DEPEND_SINK_NEGATIVE (deps))
+   t2 = fold_build2_loc (loc, LT_EXPR, boolean_type_node, a,
+ fd->loops[i].n2);
+ else
+   t2 = fold_build2_loc (loc, GE_EXPR, boolean_type_node, a,
+ fd->loops[i].n1);
+ t = fold_build2_loc (loc, LT_EXPR, boolean_type_node,
+  step, build_int_cst (TREE_TYPE (step), 0));
+ if (TREE_CODE (step) != INTEGER_CST)
+   {
+ t1 = unshare_expr (t1);
+ t1 = 

Re: [C++ Patch] PR 71238 ("Undeclared function message imprecisely points to error column")

2016-05-30 Thread Paolo Carlini

Hi,

On 30/05/2016 18:07, Jason Merrill wrote:

On 05/30/2016 11:42 AM, Paolo Carlini wrote:

+unqualified_name_lookup_error (tree name, location_t loc)
 {
+  if (loc == UNKNOWN_LOCATION)
+loc = location_of (name);


When does this do anything useful?  If name is a DECL, this will give 
the wrong answer.  If name is an IDENTIFIER_NODE, this will give 
UNKNOWN_LOCATION.
Indeed, I wasn't sure; as I said, I tried to be conservative, given what
we used to call in that function. I'm going to regression test again with
input_location instead. Does the patch otherwise look sane?


Thanks,
Paolo.



Re: [C++ Patch] PR 71238 ("Undeclared function message imprecisely points to error column")

2016-05-30 Thread Jason Merrill

On 05/30/2016 11:42 AM, Paolo Carlini wrote:

+unqualified_name_lookup_error (tree name, location_t loc)
 {
+  if (loc == UNKNOWN_LOCATION)
+loc = location_of (name);


When does this do anything useful?  If name is a DECL, this will give 
the wrong answer.  If name is an IDENTIFIER_NODE, this will give 
UNKNOWN_LOCATION.


Jason



Re: [PATCH, rs6000] Add builtin-support for new Power9 vslv and vsrv (vector shift left and right variable) instructions

2016-05-30 Thread Segher Boessenkool
On Fri, May 27, 2016 at 02:01:33PM -0600, Kelvin Nilsen wrote:
> 
> This patch adds built-in function support for the Power9 vslv and vsrv
> instructions.


> +Tne @code{vec_slv} and @code{vec_srv} functions operate in parallel on

Typo ("Tne").

> +all of the bytes of their @code{src} and @code{shift_distance}
> +arguments in parallel.  The behavior of the @code{vec_slv} is as if

You also say "in parallel" twice here.

> --- gcc/testsuite/gcc.target/powerpc/vslv-0.c (revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/vslv-0.c (working copy)
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */
> +/* { dg-options "-mcpu=power9" } */

I think you need to prevent the user from overriding -mcpu= for these
tests?  There are many examples of this in gcc.target/powerpc/.
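
One common idiom for that (a sketch; the exact arguments here are
illustrative):

/* Skip the test if the board or multilib flags force some other -mcpu=,
   so the -mcpu=power9 from dg-options is never overridden.  */
/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */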

Okay for trunk with those things taken care of; okay for 6 later, too.

Thanks,


Segher


Re: [PATCH 2/3] Add profiling support for IVOPTS

2016-05-30 Thread Martin Liška
On 05/24/2016 12:11 PM, Bin.Cheng wrote:
> Hi,
> Could you please factor out this as a function and remove the goto
> statements?  Okay with this change if no fallout in benchmarks you
> run.
> 
> Thanks,
> bin

Hi.

Thanks for the review. I've just verified that it does not introduce any
regression on SPECv6, and it improves a couple of SPEC2006 benchmarks with
PGO. I'm going to install the patch and make a control run of benchmarks.

Thanks
Martin
From 2991622862dd934e464f542e9e58270bf0088544 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 17 May 2016 15:22:43 +0200
Subject: [PATCH 1/5] Add profiling support for IVOPTS

gcc/ChangeLog:

2016-05-17  Martin Liska  

	* tree-ssa-loop-ivopts.c (get_computation_cost_at): Scale
	computed costs by frequency of BB they belong to.
	(get_scaled_computation_cost_at): New function.
---
 gcc/tree-ssa-loop-ivopts.c | 62 ++
 1 file changed, 46 insertions(+), 16 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index d770ec9..a541ef8 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -4794,7 +4794,33 @@ get_loop_invariant_expr (struct ivopts_data *data, tree ubase,
   return record_inv_expr (data, expr);
 }
 
+/* Scale (multiply) the computed COST (except scratch part that should be
+   hoisted out a loop) by header->frequency / AT->frequency,
+   which makes expected cost more accurate.  */
 
+static comp_cost
+get_scaled_computation_cost_at (ivopts_data *data, gimple *at, iv_cand *cand,
+comp_cost cost)
+{
+   int loop_freq = data->current_loop->header->frequency;
+   int bb_freq = at->bb->frequency;
+   if (loop_freq != 0)
+ {
+   gcc_assert (cost.scratch <= cost.cost);
+   int scaled_cost
+	 = cost.scratch + (cost.cost - cost.scratch) * bb_freq / loop_freq;
+
+   if (dump_file && (dump_flags & TDF_DETAILS))
+	 fprintf (dump_file, "Scaling iv_use based on cand %d "
+		  "by %2.2f: %d (scratch: %d) -> %d (%d/%d)\n",
+		  cand->id, 1.0f * bb_freq / loop_freq, cost.cost,
+		  cost.scratch, scaled_cost, bb_freq, loop_freq);
+
+   cost.cost = scaled_cost;
+ }
+
+  return cost;
+}
 
 /* Determines the cost of the computation by that USE is expressed
from induction variable CAND.  If ADDRESS_P is true, we just need
@@ -4982,18 +5008,21 @@ get_computation_cost_at (struct ivopts_data *data,
  (symbol/var1/const parts may be omitted).  If we are looking for an
  address, find the cost of addressing this.  */
   if (address_p)
-return cost + get_address_cost (symbol_present, var_present,
-offset, ratio, cstepi,
-mem_mode,
-TYPE_ADDR_SPACE (TREE_TYPE (utype)),
-speed, stmt_is_after_inc, can_autoinc);
+{
+  cost += get_address_cost (symbol_present, var_present,
+offset, ratio, cstepi,
+mem_mode,
+TYPE_ADDR_SPACE (TREE_TYPE (utype)),
+speed, stmt_is_after_inc, can_autoinc);
+  return get_scaled_computation_cost_at (data, at, cand, cost);
+}
 
   /* Otherwise estimate the costs for computing the expression.  */
   if (!symbol_present && !var_present && !offset)
 {
   if (ratio != 1)
 	cost += mult_by_coeff_cost (ratio, TYPE_MODE (ctype), speed);
-  return cost;
+  return get_scaled_computation_cost_at (data, at, cand, cost);
 }
 
   /* Symbol + offset should be compile-time computable so consider that they
@@ -5012,24 +5041,25 @@ get_computation_cost_at (struct ivopts_data *data,
   aratio = ratio > 0 ? ratio : -ratio;
   if (aratio != 1)
 cost += mult_by_coeff_cost (aratio, TYPE_MODE (ctype), speed);
-  return cost;
+
+  return get_scaled_computation_cost_at (data, at, cand, cost);
 
 fallback:
   if (can_autoinc)
 *can_autoinc = false;
 
-  {
-/* Just get the expression, expand it and measure the cost.  */
-tree comp = get_computation_at (data->current_loop, use, cand, at);
+  /* Just get the expression, expand it and measure the cost.  */
+  tree comp = get_computation_at (data->current_loop, use, cand, at);
 
-if (!comp)
-  return infinite_cost;
+  if (!comp)
+return infinite_cost;
+
+  if (address_p)
+comp = build_simple_mem_ref (comp);
 
-if (address_p)
-  comp = build_simple_mem_ref (comp);
+  cost = comp_cost (computation_cost (comp, speed), 0);
 
-return comp_cost (computation_cost (comp, speed), 0);
-  }
+  return get_scaled_computation_cost_at (data, at, cand, cost);
 }
 
 /* Determines the cost of the computation by that USE is expressed
-- 
2.8.3



[C++ Patch] PR 71238 ("Undeclared function message imprecisely points to error column")

2016-05-30 Thread Paolo Carlini

Hi,

this diagnostic bug report is about an incorrect column (normally the
final closing parenthesis) for error messages emitted by
unqualified_fn_lookup_error. The substance of the fix is obvious - pass
appropriate column information to the function - and it handles
correctly all the instances I could find in the C++ testsuite, but there
are a few implementation details that certainly could be different. Note
that I also tried to be quite conservative, thus actively avoiding
letting UNKNOWN_LOCATION ever get through. Eventually I'm proposing to change
unqualified_fn_lookup_error to take a cp_expr instead of my initial idea
of adding a location_t parameter: this way the call from
cp_parser_postfix_expression - which accounts for about a third of the
locations fixed in the testsuite - is automatically handled, and only the
call from perform_koenig_lookup needs a little adjustment, to wrap the
location in a cp_expr together with the identifier. But admittedly I
didn't follow in detail the discussion which led to the introduction of
cp_expr, so I'm not sure this kind of wrapping counts as an "appropriate"
use. Tested x86_64-linux.
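
A hypothetical example of the improvement (mine, not from the new test):

void
f ()
{
  undeclared_func (1, 2);
}

/* Before: error: 'undeclared_func' was not declared in this scope,
   with the caret on the closing ')'.  After: the same error with the
   caret on 'undeclared_func' itself.  */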


Thanks,
Paolo.

//
/cp
2016-05-30  Paolo Carlini  

PR c++/71238
* lex.c (unqualified_name_lookup_error): Take a location too.
(unqualified_fn_lookup_error): Take a cp_expr.
* cp-tree.h (unqualified_name_lookup_error,
unqualified_fn_lookup_error): Adjust declarations.
* semantics.c (perform_koenig_lookup): Adjust
unqualified_fn_lookup_error call, pass the location of
the identifier too as part of a cp_expr.

/testsuite
2016-05-30  Paolo Carlini  

PR c++/71238
* g++.dg/parse/pr71238.C: New.
* g++.dg/concepts/friend1.C: Test column numbers too.
* g++.dg/cpp0x/initlist31.C: Likewise.
* g++.dg/cpp0x/pr51420.C: Likewise.
* g++.dg/cpp0x/udlit-declare-neg.C: Likewise.
* g++.dg/cpp0x/udlit-member-neg.C: Likewise.
* g++.dg/ext/builtin3.C: Likewise.
* g++.dg/lookup/friend12.C: Likewise.
* g++.dg/lookup/friend7.C: Likewise.
* g++.dg/lookup/koenig1.C: Likewise.
* g++.dg/lookup/koenig5.C: Likewise.
* g++.dg/lookup/used-before-declaration.C: Likewise.
* g++.dg/overload/koenig1.C: Likewise.
* g++.dg/template/crash65.C: Likewise.
* g++.dg/template/friend57.C: Likewise.
* g++.dg/warn/Wshadow-5.C: Likewise.
* g++.dg/warn/Wunused-8.C: Likewise.
* g++.old-deja/g++.bugs/900211_01.C: Likewise.
* g++.old-deja/g++.jason/lineno5.C: Likewise.
* g++.old-deja/g++.jason/member.C: Likewise.
* g++.old-deja/g++.jason/report.C: Likewise.
* g++.old-deja/g++.jason/scoping12.C: Likewise.
* g++.old-deja/g++.law/visibility20.C: Likewise.
* g++.old-deja/g++.ns/koenig5.C: Likewise.
* g++.old-deja/g++.other/static5.C: Likewise.
* g++.old-deja/g++.pt/overload2.C: Likewise.
Index: cp/cp-tree.h
===
--- cp/cp-tree.h(revision 236880)
+++ cp/cp-tree.h(working copy)
@@ -5974,8 +5974,9 @@ extern tree build_vtbl_address  (t
 extern void cxx_dup_lang_specific_decl (tree);
 extern void yyungetc   (int, int);
 
-extern tree unqualified_name_lookup_error  (tree);
-extern tree unqualified_fn_lookup_error(tree);
+extern tree unqualified_name_lookup_error  (tree,
+location_t = UNKNOWN_LOCATION);
+extern tree unqualified_fn_lookup_error(cp_expr);
 extern tree build_lang_decl(enum tree_code, tree, tree);
 extern tree build_lang_decl_loc(location_t, enum 
tree_code, tree, tree);
 extern void retrofit_lang_decl (tree);
Index: cp/lex.c
===
--- cp/lex.c(revision 236880)
+++ cp/lex.c(working copy)
@@ -443,19 +443,22 @@ handle_pragma_java_exceptions (cpp_reader* /*dfile
IDENTIFIER_NODE) failed.  Returns the ERROR_MARK_NODE.  */
 
 tree
-unqualified_name_lookup_error (tree name)
+unqualified_name_lookup_error (tree name, location_t loc)
 {
+  if (loc == UNKNOWN_LOCATION)
+loc = location_of (name);
+
   if (IDENTIFIER_OPNAME_P (name))
 {
   if (name != ansi_opname (ERROR_MARK))
-   error ("%qD not defined", name);
+   error_at (loc, "%qD not defined", name);
 }
   else
 {
   if (!objc_diagnose_private_ivar (name))
{
- error ("%qD was not declared in this scope", name);
- suggest_alternatives_for (location_of (name), name);
+ error_at (loc, "%qD was not declared in this scope", name);
+ suggest_alternatives_for (loc, name);
}
   /* Prevent repeated error 

Re: Enable loop peeling at -O3

2016-05-30 Thread Jan Hubicka
Hi,
this is the version of the patch I intend to commit after re-testing on
x86_64-linux with loop peeling enabled at -O3.

It drops -fpeel-all-loops, adds logic to not peel loops multiple times,
and fixes profile updating.

Bootstrapped/regtested x86_64-linux

Honza

* doc/invoke.texi (-fpeel-loops,-O3): Update documentation.
* opts.c (default_options): Enable peel loops at -O3.
* tree-ssa-loop-ivcanon.c (peeled_loops): New static var.
(try_peel_loop): Do not re-peel already peeled loops;
use likely upper bounds; fix profile updating.
(pass_complete_unroll::execute): Initialize peeled_loops.

* gcc.dg/tree-ssa/peel1.c: New testcase.
* gcc.dg/tree-ssa/peel2.c: New testcase.
* gcc.dg/tree-ssa/pr61743-1.c: Disable loop peeling.
* gcc.dg/tree-ssa/pr61743-2.c: Disable loop peeling.
Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 236873)
+++ doc/invoke.texi (working copy)
@@ -6338,7 +6338,8 @@ by @option{-O2} and also turns on the @o
 @option{-fgcse-after-reload}, @option{-ftree-loop-vectorize},
 @option{-ftree-loop-distribute-patterns}, @option{-fsplit-paths}
 @option{-ftree-slp-vectorize}, @option{-fvect-cost-model},
-@option{-ftree-partial-pre} and @option{-fipa-cp-clone} options.
+@option{-ftree-partial-pre}, @option{-fpeel-loops}
+and @option{-fipa-cp-clone} options.
 
 @item -O0
 @opindex O0
@@ -8661,10 +8662,11 @@ the loop is entered.  This usually makes
 @item -fpeel-loops
 @opindex fpeel-loops
 Peels loops for which there is enough information that they do not
-roll much (from profile feedback).  It also turns on complete loop peeling
-(i.e.@: complete removal of loops with small constant number of iterations).
+roll much (from profile feedback or static analysis).  It also turns on
+complete loop peeling (i.e.@: complete removal of loops with small constant
+number of iterations).
 
-Enabled with @option{-fprofile-use}.
+Enabled with @option{-O3} and/or @option{-fprofile-use}.
 
 @item -fmove-loop-invariants
 @opindex fmove-loop-invariants
Index: opts.c
===
--- opts.c  (revision 236873)
+++ opts.c  (working copy)
@@ -535,6 +535,7 @@ static const struct default_options defa
 { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, VECT_COST_MODEL_DYNAMIC 
},
 { OPT_LEVELS_3_PLUS, OPT_fipa_cp_clone, NULL, 1 },
 { OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 },
+{ OPT_LEVELS_3_PLUS, OPT_fpeel_loops, NULL, 1 },
 
 /* -Ofast adds optimizations to -O3.  */
 { OPT_LEVELS_FAST, OPT_ffast_math, NULL, 1 },
Index: testsuite/gcc.dg/tree-ssa/peel1.c
===
--- testsuite/gcc.dg/tree-ssa/peel1.c   (revision 0)
+++ testsuite/gcc.dg/tree-ssa/peel1.c   (working copy)
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
+struct foo {int b; int a[3];} foo;
+void add(struct foo *a,int l)
+{
+  int i;
+  for (i=0;i<l;i++)
+    a->a[i]++;
+}
+/* { dg-final { scan-tree-dump "Loop 1 likely iterates at most 3 times." 
"cunroll"} } */
+/* { dg-final { scan-tree-dump "Peeled loop 1, 4 times." "cunroll"} } */
Index: testsuite/gcc.dg/tree-ssa/peel2.c
===
--- testsuite/gcc.dg/tree-ssa/peel2.c   (revision 0)
+++ testsuite/gcc.dg/tree-ssa/peel2.c   (working copy)
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fpeel-all-loops -fdump-tree-cunroll-details --param 
max-peel-times=16 --param max-peeled-insns=100" } */
+void add(int *a,int l)
+{
+  int i;
+  for (i=0;i<l;i++)
+    a[i]++;
+}

Re: [PATCH] match.pd: optimize unsigned mul overflow check

2016-05-30 Thread Alexander Monakov
On Sun, 29 May 2016, Marc Glisse wrote:
> On Sat, 28 May 2016, Alexander Monakov wrote:
> > For unsigned A, B, 'A > -1 / B' is a nice predicate for checking whether
> > 'A*B'
> > overflows (or 'B && A > -1 / B' if B may be zero).  Let's optimize it to an
> > invocation of __builtin_mul_overflow to avoid the divide operation.
> 
[snip]
> Any plan on optimizing the 'B && ovf' form?

Not in the immediate future; I don't have a good approach in mind to do that.

(if I get back to this topic I'd try to CSE the product with the IFN result)

Thanks for the discussion and help -- I've applied the patch.
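
For reference, the shape of the transformation (a sketch of mine; recall
that the simple form requires b to be known non-zero):

/* For unsigned a, b with b != 0, 'a > -1U / b' holds exactly when a * b
   overflows.  The new match.pd rule turns the division-based predicate
   into the overflow builtin.  */
bool
overflows_div (unsigned a, unsigned b)
{
  return a > -1U / b;                           /* before: a division */
}

bool
overflows_builtin (unsigned a, unsigned b)
{
  unsigned prod;
  return __builtin_mul_overflow (a, b, &prod);  /* after: mul + flags */
}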

Alexander


[PATCH, testsuite]: Fix clear_non_sret_int_hardware_registers define in gcc.target/i386/iamcu/args.h

2016-05-30 Thread Uros Bizjak
Hello!

2016-05-30  Uros Bizjak  

* gcc.target/i386/iamcu/args.h (clear_non_sret_int_hardware_registers):
Use correct register when clearing %edx.

Regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline and gcc-6 branch.

Uros.
Index: gcc.target/i386/iamcu/args.h
===
--- gcc.target/i386/iamcu/args.h(revision 236882)
+++ gcc.target/i386/iamcu/args.h(working copy)
@@ -30,7 +30,7 @@
 /* Clear all scratch integer registers, excluding the one used to return
aggregate.  */
 #define clear_non_sret_int_hardware_registers \
-  asm __volatile__ ("xor %%edx, %%ebx\n\t" \
+  asm __volatile__ ("xor %%edx, %%edx\n\t" \
"xor %%ecx, %%ecx\n\t" \
::: "edx", "ecx");
 


[PATCH, OpenACC] Make reduction arguments addressable

2016-05-30 Thread Chung-Lin Tang
Hi, a previous patch of Cesar's has made the middle-end omp-lowering
automatically create and insert a tofrom (i.e. present_or_copy) map for
parallel reductions.  This allowed the user to not need explicit
clauses to copy out the reduction result, but because reduction arguments
are not marked addressable, async does not work as expected,
i.e. the asynchronous copy-out results are not used in the compiler generated 
code.

This patch fixes this in the front ends; I've tested it without
new regressions, and it fixes some C++ OpenACC tests that regressed after
my last OpenACC async patch.  Is this okay for trunk?
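
To illustrate (a sketch of mine, not one of the regressed tests):

/* With 'async', the reduction result is copied back asynchronously.
   If 'sum' is not addressable, it can be privatized by value and the
   code after the wait never reads the asynchronous copy-out; marking
   the reduction variable addressable forces the read after 'acc wait'
   to go through memory.  */
float
f (const float *a, int n)
{
  float sum = 0.0f;
  #pragma acc parallel loop reduction(+:sum) copyin(a[0:n]) async(1)
  for (int i = 0; i < n; i++)
    sum += a[i];
  #pragma acc wait(1)
  return sum;
}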

Thanks,
Chung-Lin

2016-05-30  Chung-Lin Tang  

c/
* c-typeck.c (c_finish_omp_clauses): Mark OpenACC reduction
arguments as addressable.

cp/
* semantics.c (finish_omp_clauses): Mark OpenACC reduction
arguments as addressable.

fortran/
* trans-openmp.c (gfc_trans_oacc_construct): Mark OpenACC reduction
arguments as addressable.
(gfc_trans_oacc_combined_directive): Likewise.
Index: c/c-typeck.c
===
--- c/c-typeck.c(revision 236845)
+++ c/c-typeck.c(working copy)
@@ -12575,6 +12575,8 @@ c_finish_omp_clauses (tree clauses, enum c_omp_reg
  remove = true;
  break;
}
+ if (ort & C_ORT_ACC)
+   c_mark_addressable (t);
  type = TREE_TYPE (t);
  if (TREE_CODE (t) == MEM_REF)
type = TREE_TYPE (type);
Index: cp/semantics.c
===
--- cp/semantics.c  (revision 236845)
+++ cp/semantics.c  (working copy)
@@ -5827,6 +5827,8 @@ finish_omp_clauses (tree clauses, enum c_omp_regio
t = n;
  goto check_dup_generic_t;
}
+ if (ort & C_ORT_ACC)
+   cxx_mark_addressable (t);
  goto check_dup_generic;
case OMP_CLAUSE_COPYPRIVATE:
  copyprivate_seen = true;
Index: fortran/trans-openmp.c
===
--- fortran/trans-openmp.c  (revision 236845)
+++ fortran/trans-openmp.c  (working copy)
@@ -2704,6 +2704,10 @@ gfc_trans_oacc_construct (gfc_code *code)
   gfc_start_block ();
   oacc_clauses = gfc_trans_omp_clauses (&block, code->ext.omp_clauses,
code->loc);
+  for (tree c = oacc_clauses; c; c = OMP_CLAUSE_CHAIN (c))
+if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
+   && DECL_P (OMP_CLAUSE_DECL (c)))
+  TREE_ADDRESSABLE (OMP_CLAUSE_DECL (c)) = 1;
   stmt = gfc_trans_omp_code (code->block->next, true);
   stmt = build2_loc (input_location, construct_code, void_type_node, stmt,
 oacc_clauses);
@@ -3501,6 +3505,10 @@ gfc_trans_oacc_combined_directive (gfc_code *code)
construct_clauses.lists[OMP_LIST_REDUCTION] = NULL;
   oacc_clauses = gfc_trans_omp_clauses (&block, &construct_clauses,
code->loc);
+  for (tree c = oacc_clauses; c; c = OMP_CLAUSE_CHAIN (c))
+   if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
+   && DECL_P (OMP_CLAUSE_DECL (c)))
+ TREE_ADDRESSABLE (OMP_CLAUSE_DECL (c)) = 1;
 }
   if (!loop_clauses.seq)
 pblock = &block;


Re: [C++ Patch] PR 71109 ("Misleading diagnostic message with 'virtual' used in out-of-line definitions of class template member functions")

2016-05-30 Thread Jason Merrill
OK.

On Sun, May 29, 2016 at 6:49 PM, Paolo Carlini  wrote:
> Hi,
>
> submitter noticed that for wrong uses of 'virtual' outside of template
> classes (B in the testcase) vs plain classes (A) we wrongly emit the
> "templates may not be %" error message. Simply checking
> current_class_type seems enough to solve the problem. Case C in the extended
> testcase double checks that we still give the "templates may not be
> %" error message when appropriate. Tested x86_64-linux.
>
> Thanks,
> Paolo.
>
> //


Re: [PATCH] gcc/config/tilegx/tilegx.c (tilegx_function_profiler): Save r10 to stack before call mcount

2016-05-30 Thread Chen Gang
On 5/30/16 21:32, Chen Gang wrote:
> On 5/30/16 03:18, Mike Stump wrote:
>> On May 29, 2016, at 3:39 AM, cheng...@emindsoft.com.cn wrote:
>>>
>>> r10 may also be used as a parameter for the nested function, so we need
>>> to save it before calling mcount.
>>
>> mcount can have a special abi where it preserves more registers than one 
>> would otherwise expect.  I'm wondering if you know what registers it saves 
>> or doesn't touch?  Does this fix any bug found by running tests, or just by 
>> inspection?
>>
> 
> It is about Bug71331, which I reported yesterday.
> 
> For a nested function, r10 is treated as the stack pointer for parameters.
> mcount does save r10, but tilegx also uses r10 to save lr, which causes
> this issue ("move r10, lr; jal __mcount").
> 
> What I have done is just like what gcc does on x86 ("push r10; callq mcount").
> 

After this patch, nested-func-4.c compiles and runs successfully under
tilegx qemu. :-)

Thanks.
-- 
Chen Gang (陈刚)

Managing Natural Environments is the Duty of Human Beings.


Re: [PATCH, i386, AVX-512] Fix PR target/71346.

2016-05-30 Thread Jakub Jelinek
On Mon, May 30, 2016 at 04:46:50PM +0300, Kirill Yukhin wrote:
> Hello,
> The patch limits the constraint for the scalar operand in the split to
> AVX-512VL.
> 
> Bootstrap/regtest in progress for x86_64/ia32.
> 
> I'll check it in if pass.
> 
> PR target/71346.
> gcc/
>   * config/i386/sse.md (define_insn_and_split "*vec_extractv4sf_0"): Use
>   `Yv' for scalar operand.
> testsuite/
>   * gcc.target/i386/prr71346.c: New test.

Too many `r' chars above.

LGTM otherwise and sorry for introducing this in r235970.

> commit 6c0021bea7a5be8d9a10ef3f2fb30c1228f53d48
> Author: Kirill Yukhin 
> Date:   Mon May 30 16:31:28 2016 +0300
> 
> AVX-512. Fix PR target/71346.
> 
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index b348f2d..1267897 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -6837,7 +6837,7 @@
>"operands[1] = gen_lowpart (SFmode, operands[1]);")
>  
>  (define_insn_and_split "*sse4_1_extractps"
> -  [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,rm,rm,v,v")
> +  [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,rm,rm,Yv,Yv")
>   (vec_select:SF
> (match_operand:V4SF 1 "register_operand" "Yr,*x,v,0,v")
> (parallel [(match_operand:SI 2 "const_0_to_3_operand" 
> "n,n,n,n,n")])))]
> diff --git a/gcc/testsuite/gcc.target/i386/pr71346.c 
> b/gcc/testsuite/gcc.target/i386/pr71346.c
> new file mode 100644
> index 000..0a15869
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr71346.c
> @@ -0,0 +1,25 @@
> +/* PR target/71346 */
> +/* { dg-do compile { target { ! ia32 } }  } */
> +/* { dg-options "-O2 -ftree-vectorize -ffast-math -march=knl" } */
> +
> +typedef int rvec[3];
> +int a;
> +float b, c, d, g;
> +rvec *e, *f;
> +void fn2(float h, float g);
> +
> +void
> +fn1()
> +{
> +  float h;
> +  for (; a; a++) {
> +h += e[a][0] * f[a][0];
> +b += e[a][0] * f[a][1];
> +c += e[a][2] * f[a][0];
> +d += e[a][2] * f[a][1];
> +g += e[a][2] * f[a][2];
> +  }
> +  fn2(h, g);
> +}
> +
> +/* { dg-final { scan-assembler-not "vshufps\[ 
> \\t\]+\[^\n\]*%\xmm(?:1\[6-9\]|\[2-3\]\[0-9\])" } } */

Jakub


Re: [PATCH, PR69068] Handle 3-arg phi in copy_bb_and_scalar_dependences

2016-05-30 Thread Richard Biener
On Mon, 30 May 2016, Tom de Vries wrote:

> On 30/05/16 12:56, Richard Biener wrote:
> > On Mon, 30 May 2016, Tom de Vries wrote:
> > 
> > > On 30/05/16 11:46, Richard Biener wrote:
> > > > > This patch fixes the assert conservatively by aborting graphite code
> > > > > generation when encountering a phi with more than two arguments in
> > > > > copy_bb_and_scalar_dependences.
> > > > >
> > > > > Bootstrapped and reg-tested on x86_64.
> > > > >
> > > > > OK for trunk, 6 branch?
> > > >
> > > > Did you check if simply returning false from bb_contains_loop_phi_nodes
> > > > instead of asserting works?  The caller has a 'else' that is supposed
> > > > to handle condition PHIs.  After all it already handles one predecessor
> > > > specially ...  Thus
> > > >
> > > > if (EDGE_COUNT (bb->preds) != 2)
> > > >   return false;
> > > >
> > > > should work here.
> > >
> > > Unfortunately, that doesn't work. We run into another assert in
> > > copy_cond_phi_nodes:
> > > ...
> > >   /* Cond phi nodes should have exactly two arguments.  */
> > >   gcc_assert (2 == EDGE_COUNT (bb->preds));
> > > ...
> > Hah.  So can we still do my suggested change and bail out conservatively
> > in Cond PHI node handling instead?  Because the PHI node is clearly
> > _not_  a loop header PHI and the cond phi assert is also a latent bug.
> > 
> 
> Agreed. Updated as attached.
> 
> OK for trunk, 6 branch?

Ok with the now redundant 2nd check removed

@@ -1075,7 +1075,8 @@ bb_contains_loop_close_phi_nodes (basic_block bb)
 static bool
 bb_contains_loop_phi_nodes (basic_block bb)
 {
-  gcc_assert (EDGE_COUNT (bb->preds) <= 2);
+  if (EDGE_COUNT (bb->preds) != 2)
+    return false;
 
   if (bb->preds->length () == 1)
     return false;
^^


Thanks,
Richard.


> Thanks,
> - Tom
> 
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH, PR69068] Handle 3-arg phi in copy_bb_and_scalar_dependences

2016-05-30 Thread Tom de Vries

On 30/05/16 12:56, Richard Biener wrote:
> On Mon, 30 May 2016, Tom de Vries wrote:
> 
> > On 30/05/16 11:46, Richard Biener wrote:
> > > > This patch fixes the assert conservatively by aborting graphite code
> > > > generation when encountering a phi with more than two arguments in
> > > > copy_bb_and_scalar_dependences.
> > > >
> > > > Bootstrapped and reg-tested on x86_64.
> > > >
> > > > OK for trunk, 6 branch?
> > >
> > > Did you check if simply returning false from bb_contains_loop_phi_nodes
> > > instead of asserting works?  The caller has a 'else' that is supposed
> > > to handle condition PHIs.  After all it already handles one predecessor
> > > specially ...  Thus
> > >
> > > if (EDGE_COUNT (bb->preds) != 2)
> > >   return false;
> > >
> > > should work here.
> > 
> > Unfortunately, that doesn't work. We run into another assert in
> > copy_cond_phi_nodes:
> > ...
> >   /* Cond phi nodes should have exactly two arguments.  */
> >   gcc_assert (2 == EDGE_COUNT (bb->preds));
> > ...
> 
> Hah.  So can we still do my suggested change and bail out conservatively
> in Cond PHI node handling instead?  Because the PHI node is clearly
> _not_ a loop header PHI and the cond phi assert is also a latent bug.

Agreed. Updated as attached.

OK for trunk, 6 branch?

Thanks,
- Tom


Handle 3-arg phi in copy_bb_and_scalar_dependences

2016-05-30  Tom de Vries  

	PR tree-optimization/69068
	* graphite-isl-ast-to-gimple.c (copy_bb_and_scalar_dependences): Handle
	phis with more than two args.

	* gcc.dg/graphite/pr69068.c: New test.

---
 gcc/graphite-isl-ast-to-gimple.c        | 10 ++++++----
 gcc/testsuite/gcc.dg/graphite/pr69068.c | 14 ++++++++++++++
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index ff1d91f..b4ee9a9 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -1075,7 +1075,8 @@ bb_contains_loop_close_phi_nodes (basic_block bb)
 static bool
 bb_contains_loop_phi_nodes (basic_block bb)
 {
-  gcc_assert (EDGE_COUNT (bb->preds) <= 2);
+  if (EDGE_COUNT (bb->preds) != 2)
+    return false;
 
   if (bb->preds->length () == 1)
     return false;
@@ -2480,13 +2481,14 @@ copy_cond_phi_nodes (basic_block bb, basic_block new_bb, vec<tree> iv_map)
 
   gcc_assert (!bb_contains_loop_close_phi_nodes (bb));
 
+  /* TODO: Handle cond phi nodes with more than 2 predecessors.  */
+  if (EDGE_COUNT (bb->preds) != 2)
+    return false;
+
   if (dump_file)
 fprintf (dump_file, "[codegen] copying cond phi nodes in bb_%d.\n",
 	 new_bb->index);
 
-  /* Cond phi nodes should have exactly two arguments.  */
-  gcc_assert (2 == EDGE_COUNT (bb->preds));
-
   for (gphi_iterator psi = gsi_start_phis (bb); !gsi_end_p (psi);
       gsi_next (&psi))
 {
diff --git a/gcc/testsuite/gcc.dg/graphite/pr69068.c b/gcc/testsuite/gcc.dg/graphite/pr69068.c
new file mode 100644
index 000..0abea06
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/graphite/pr69068.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fgraphite-identity" } */
+
+int qo;
+int zh[2];
+
+void
+td (void)
+{
+  int ly, en;
+  for (ly = 0; ly < 2; ++ly)
+for (en = 0; en < 2; ++en)
+  zh[en] = ((qo == 0) || (((qo * 2) != 0))) ? 1 : -1;
+}


[PATCH, i386, AVX-512] Fix PR target/71346.

2016-05-30 Thread Kirill Yukhin
Hello,
Patch limits constraint for scalar operand in split to AVX-512VL.

Bootstrap/regtest in progress for x86_64/ia32.

I'll check it in if pass.

PR target/71346.
gcc/
* config/i386/sse.md (define_insn_and_split "*vec_extractv4sf_0"): Use
`Yv' for scalar operand.
testsuite/
* gcc.target/i386/prr71346.c: New test.

--
Thanks, K

commit 6c0021bea7a5be8d9a10ef3f2fb30c1228f53d48
Author: Kirill Yukhin 
Date:   Mon May 30 16:31:28 2016 +0300

AVX-512. Fix PR target/71346.

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index b348f2d..1267897 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -6837,7 +6837,7 @@
   "operands[1] = gen_lowpart (SFmode, operands[1]);")
 
 (define_insn_and_split "*sse4_1_extractps"
-  [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,rm,rm,v,v")
+  [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,rm,rm,Yv,Yv")
(vec_select:SF
  (match_operand:V4SF 1 "register_operand" "Yr,*x,v,0,v")
  (parallel [(match_operand:SI 2 "const_0_to_3_operand" 
"n,n,n,n,n")])))]
diff --git a/gcc/testsuite/gcc.target/i386/pr71346.c 
b/gcc/testsuite/gcc.target/i386/pr71346.c
new file mode 100644
index 000..0a15869
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr71346.c
@@ -0,0 +1,25 @@
+/* PR target/71346 */
+/* { dg-do compile { target { ! ia32 } }  } */
+/* { dg-options "-O2 -ftree-vectorize -ffast-math -march=knl" } */
+
+typedef int rvec[3];
+int a;
+float b, c, d, g;
+rvec *e, *f;
+void fn2(float h, float g);
+
+void
+fn1()
+{
+  float h;
+  for (; a; a++) {
+h += e[a][0] * f[a][0];
+b += e[a][0] * f[a][1];
+c += e[a][2] * f[a][0];
+d += e[a][2] * f[a][1];
+g += e[a][2] * f[a][2];
+  }
+  fn2(h, g);
+}
+
+/* { dg-final { scan-assembler-not "vshufps\[ \\t\]+\[^\n\]*%\xmm(?:1\[6-9\]|\[2-3\]\[0-9\])" } } */


RE: [PATCH] x86 interrupt attribute patch [2/2]

2016-05-30 Thread Koval, Julia
Hi,
Here is the fixed version of the patch. Ok for trunk?


-Original Message-
From: Sandra Loosemore [mailto:san...@codesourcery.com] 
Sent: Tuesday, May 10, 2016 11:02 PM
To: Koval, Julia ; gcc-patches@gcc.gnu.org
Cc: Lu, Hongjiu ; vaalfr...@gmail.com; ubiz...@gmail.com; 
l...@redhat.com; Zamyatin, Igor 
Subject: Re: [PATCH] x86 interrupt attribute patch [2/2]

On 04/20/2016 07:42 AM, Koval, Julia wrote:
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 
> a5a8b23..82de5bf 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -5263,6 +5263,83 @@ On x86-32 targets, the @code{stdcall} attribute 
> causes the compiler to  assume that the called function pops off the 
> stack space used to  pass arguments, unless it takes a variable number of 
> arguments.
>
> +@item no_caller_saved_registers
> +@cindex @code{no_caller_saved_registers} function attribute, x86
> +Use this attribute to indicate that the specified function has no
> +caller-saved registers.  That is, all registers are callee-saved.
> +The compiler generates proper function entry and exit sequences to 
> +save and restore any modified registers, except for the EFLAGS 
> +register.  If the compiler generates MPX, SSE, MMX or x87 
> +instructions in a function with @code{no_caller_saved_registers} 
> +attribute or functions called from a function with 
> +@code{no_caller_saved_registers} attribute may contain MPX, SSE, MMX 
> +or x87 instructions, the compiler must save and restore the corresponding 
> state.

I cannot parse the last sentence in this paragraph.  How can the compiler know 
whether called functions may contain those instructions? 
Plus, talking about what the compiler must do seems too implementor-speaky for 
user documentation.  Maybe you mean something like "The compiler also saves and 
restores state associated with MPX, SSE, MMX, and x87 instructions." ?

I also think the documentation needs to give some hint about why a user would 
want to put this attribute on a function.

> +
> +@item interrupt
> +@cindex @code{interrupt} function attribute, x86
> +Use this attribute to indicate that the specified function is an
> +interrupt handler.  The
> +compiler generates function entry and exit sequences suitable for use 
> +in an interrupt handler when this attribute is present.  The 
> +@code{IRET} instruction, instead of the @code{RET} instruction, is 
> +used to return from interrupt handlers.  All registers, except for 
> +the EFLAGS register which is restored by the @code{IRET} instruction, 
> +are preserved by the compiler.  If the compiler generates MPX, SSE, 
> +MMX or x87 instructions in an interrupt handler, or functions called 
> +from an interrupt handler may contain MPX, SSE, MMX or x87 
> +instructions, the compiler must save and restore the corresponding 
> +state.

Similar problems here.

From the further discussion that follows, it appears that you can use the 
"interrupt" attribute on exception handlers as well, but the paragraph above 
only mentions interrupt handlers.

> +
> +Since the direction flag in the FLAGS register in interrupt handlers 
> +is undetermined, cld instruction must be emitted in function prologue 
> +if rep string instructions are used in interrupt handler or interrupt 
> +handler isn't a leaf function.

Again, this sounds like implementor-speak, and there are grammatical errors 
(noun/verb disagreement, missing articles).  Do users of this attribute need to 
know what instructions the compiler is emitting?  We already say above that it 
causes GCC to generate suitable entry and exit sequences.

> +
> +Any interruptible-without-stack-switch code must be compiled with
> +@option{-mno-red-zone} since interrupt handlers can and will, because
> +of the hardware design, touch the red zone.
> +
> +An interrupt handler must be declared with a mandatory pointer
> +argument:
> +
> +@smallexample
> +struct interrupt_frame;
> +
> +__attribute__ ((interrupt))
> +void
> +f (struct interrupt_frame *frame)
> +@{
> +@}
> +@end smallexample
> +
> +and user must properly define the structure the pointer pointing to.

"user" == "you" in the GCC user manual.

How do you "properly define" this structure, or is that a stupid 
question?  (I know very little about x86, maybe this is obvious to experts.)
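
[For illustration, one plausible definition -- an assumption matching the
hardware-pushed x86 interrupt stack frame, not something taken from the
patch under review:

struct interrupt_frame
{
  uword_t ip;     /* instruction pointer pushed by the CPU */
  uword_t cs;     /* code segment */
  uword_t flags;  /* EFLAGS/RFLAGS image */
  uword_t sp;     /* stack pointer at the time of the interrupt */
  uword_t ss;     /* stack segment */
};

with uword_t as in the example below.]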

> +
> +The exception handler is very similar to the interrupt handler with
> +a different mandatory function signature:
> +
> +@smallexample
> +#ifdef __x86_64__
> +typedef unsigned long long int uword_t;
> +#else
> +typedef unsigned int uword_t;
> +#endif
> +
> +struct interrupt_frame;
> +
> +__attribute__ ((interrupt))
> +void
> +f (struct interrupt_frame *frame, uword_t error_code)
> +@{
> +  ...
> +@}
> +@end smallexample
> +
> +and compiler pops the error code off the stack before the @code{IRET}
> +instruction.
> +
> +The exception handler should only be used for exceptions which push an
> +error code and all 

Re: [PATCH] gcc/config/tilegx/tilegx.c (tilegx_function_profiler): Save r10 to stack before call mcount

2016-05-30 Thread Chen Gang
On 5/30/16 03:18, Mike Stump wrote:
> On May 29, 2016, at 3:39 AM, cheng...@emindsoft.com.cn wrote:
>>
>> r10 may also be used as a parameter for the nested function, so we need
>> to save it before calling mcount.
> 
> mcount can have a special abi where it preserves more registers than one 
> would otherwise expect.  I'm wondering if you know what registers it saves or 
> doesn't touch?  Does this fix any bug found by running tests, or just by 
> inspection?
> 

It is about Bug71331, which I reported yesterday.

For a nested function, r10 is treated as the stack pointer for parameters.
mcount itself does save r10, but tilegx also uses r10 to save lr, which
causes this issue ("move r10, lr; jal __mcount").

What I have done is just what gcc on x86 does ("push r10; callq mcount").
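
An illustrative reduction of the kind of test that trips this -- a
hypothetical sketch, not the actual nested-func-4.c:

/* Compile with -pg.  The profiled nested function receives live data in
   r10 on entry; the prologue-inserted "move r10, lr; jal __mcount"
   sequence clobbers r10 before the body can use it.  */
int
outer (int n)
{
  int nested (int i) { return i + n; }  /* needs incoming r10 intact */
  return nested (1);
}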

Thanks.
-- 
Chen Gang (陈刚)

Managing Natural Environments is the Duty of Human Beings.


Re: [PATCH][2/3] Vectorize inductions that are live after the loop

2016-05-30 Thread Richard Biener
On Fri, May 27, 2016 at 5:12 PM, Alan Hayward  wrote:
>
> On 27/05/2016 12:41, "Richard Biener"  wrote:
>
>>On Fri, May 27, 2016 at 11:09 AM, Alan Hayward 
>>wrote:
>>> This patch is a reworking of the previous vectorize inductions that are
>>> live
>>> after the loop patch.
>>> It now supports SLP and an optimisation has been moved to patch [3/3].
>>>
>>>
>>> Stmts which are live (ie: defined inside a loop and then used after the
>>> loop)
>>> are not currently supported by the vectorizer.  In many cases
>>> vectorization can
>>> still occur because the SCEV cprop pass will hoist the definition of the
>>> stmt
>>> outside of the loop before the vectorizor pass. However, there are
>>>various
>>> cases SCEV cprop cannot hoist, for example:
>>>   for (i = 0; i < n; ++i)
>>> {
>>>   ret = x[i];
>>>   x[i] = i;
>>> }
>>>    return ret;
>>>
>>> Currently stmts are marked live using a bool, and the relevant state
>>>using
>>> an
>>> enum. Both these states are propagated to the definition of all uses of
>>>the
>>> stmt. Also, a stmt can be live but not relevant.
>>>
>>> This patch vectorizes a live stmt definition normally within the loop
>>>and
>>> then
>>> after the loop uses BIT_FIELD_REF to extract the final scalar value from
>>> the
>>> vector.
>>>
>>> This patch adds a new relevant state (vect_used_only_live) for when a
>>>stmt
>>> is
>>> used only outside the loop. The relevant state is still propagated to
>>>all
>>> it's
>>> uses, but the live bool is not (this ensures that
>>> vectorizable_live_operation
>>> is only called with stmts that really are live).
>>>
>>> Tested on x86 and aarch64.
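
[Editorial sketch of that extraction, in pseudo-gimple; names invented,
not taken from the patch:

  /* In the loop: the live value is vectorized normally.  */
  vect_ret_5 = ...;
  /* After the loop: pull the final scalar out of the last vector,
     e.g. lane 3 of a V4SI (element size 32 bits, bit offset 96):  */
  ret_6 = BIT_FIELD_REF <vect_ret_5, 32, 96>;
]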
>>
>>+  /* If STMT is a simple assignment and its inputs are invariant, then it
>>+     can remain in place, unvectorized.  The original last scalar value
>>+     that it computes will be used.  */
>>+  if (is_simple_and_all_uses_invariant (stmt, loop_vinfo))
>>     {
>>
>>so we can't use STMT_VINFO_RELEVANT or so?  I thought we somehow
>>mark stmts we don't vectorize in the end.
>
> It’s probably worth making clear that this check exists in the current
> GCC head - today vectorize_live_operation only supports the
> simple+invariant
> case and the SSE2 case.
> My patch simply moved the simple+invariant code into the new function
> is_simple_and_all_uses_invariant.

Ah, didn't notice this.

> Looking at this again, I think what we really care about is….
> *If the stmt is live but not relevant, we need to mark it to ensure it is
> vectorised.
> *If a stmt is simple+invariant then a live usage of it can either use the
> scalar or vectorized version.
>
> So for a live stmt:
> *If it is simple+invariant and not relevant, then it is more optimal to
> use the
> scalar version.
> *If it is simple+invariant and relevant then it is more optimal to use the
> vectorized version.
> *If it is not simple+invariant then we must always use the vectorized
> version.

So, if ! relevant (aka not vectorized) use the scalar version.  Otherwise
use the vectorized version.  The case we fail to handle is a live but not
relevant computation derived from a vectorized stmt (the !invariant and
!relevant case).  This case should be handled by sinking the computation
up to the vectorized stmt to the exit and handling the "last" live operation
by using the vectorized def.  In fact stmt sinking should have performed
this sinking already (so a check and not vectorizing this case just in
case that didn't happen is fine I think)
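
A sketch of that !invariant && !relevant shape (made-up example):

int last;

void
f (int *x, int n)
{
  int t = 0;
  for (int i = 0; i < n; ++i)
    {
      t = x[i] * 2;  /* live: used only after the loop ...   */
      x[i] = i;      /* ... so t is not relevant inside it.  */
    }
  last = t;          /* wants the scalar value of the last t */
}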

> Therefore, the patch as it stands is correct but not optimal. In patch 3/3
> for
> the code above (vectorize_live_operation) I can change the check to if not
> relevant then assert that it is not simple+invariant and return true.

Sounds good.

>
>
>>
>>+  lhs = (is_a <gphi *> (stmt)) ? gimple_phi_result (stmt)
>>+                               : gimple_get_lhs (stmt);
>>+  lhs_type = TREE_TYPE (lhs);
>>+
>>+  /* Find all uses of STMT outside the loop.  */
>>+  auto_vec<gimple *> worklist;
>>+  FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
>>+    {
>>+      basic_block bb = gimple_bb (use_stmt);
>>+
>>+      if (!flow_bb_inside_loop_p (loop, bb))
>>+        worklist.safe_push (use_stmt);
>>+    }
>>+  gcc_assert (!worklist.is_empty ());
>>
>>as we are in loop-closed SSA there should be exactly one such use...?
>
> Yes, I should change this to assert that worklist is of length 1.
>
>>
>>+  /* Get the correct slp vectorized stmt.  */
>>+  vec_oprnds.create (num_vec);
>>+  vect_get_slp_vect_defs (slp_node, &vec_oprnds);
>>
>>As you look at the vectorized stmt you can directly use the
>>SLP_TREE_VEC_STMTS
>>array (the stmts lhs, of course), no need to export this function.
>
> Ok, I can change this.
>
>>
>>The rest of the changes look ok to me.
>
> Does that include PATCH 1/3  ?

I don't like how 1/3 ends up looking :/  So what was the alternative again?
I looked into 1/3 and what it takes to remove the 'stmt' argument and
instead pass in a vect_def_type.  It's a bit 

Re: [SPARC] Support for --with-{cpu,tune}-{32,64} in sparc*-* targets

2016-05-30 Thread Paolo Carlini

Hi Jose,

On 30/05/2016 12:03, jose.march...@oracle.com wrote:

 > Tested in sparc64-linux-gnu, sparcv9-linux-gnu and sparc-sun-solaris2.11.
 >
 > 2016-05-25  Jose E. Marchesi  
 >
 >   * config.gcc (sparc*-*-*): Support cpu_32, cpu_64, tune_32 and
 >   tune_64.
 >   * doc/install.texi (--with-cpu-32, --with-cpu-64): Document
 >   support on SPARC.
 >   * config/sparc/linux64.h (OPTION_DEFAULT_SPECS): Add entries for
 >   cpu_32, cpu_64, tune_32 and tune_64.
 >   * config/sparc/sol2.h (OPTION_DEFAULT_SPECS): Likewise.
 
 OK for mainline, thanks.


Thanks.  I would need someone to install it, as I don't have commit
access to the svn.

If nobody beats me to it, I can take care of that later today.

Cheers,
Paolo.


Re: [PATCH][RFC] Remove ifcvt_repair_bool_pattern, re-do bool patterns

2016-05-30 Thread Ilya Enkovich
2016-05-30 14:04 GMT+03:00 Richard Biener :
>
> The following patch removes the restriction on seeing a tree of stmts
> in vectorizer bool pattern detection (aka single-use).  With this
> it is no longer necessary to unshare DEFs in ifcvt_repair_bool_pattern
> and that compile-time hog can go (it's now enabled unconditionally for GCC 7).
>
> Instead the pattern detection code will now "unshare" the condition tree
> for each bool pattern root by means of adding all pattern stmts of the
> condition tree to its pattern def sequence (so we still get some
> unnecessary copying, worst-case quadratic rather than exponential).
>
> Ilya - I had to disable the
>
>       tree mask_type = get_mask_type_for_scalar_type (TREE_TYPE (rhs1));
>       if (mask_type
>           && expand_vec_cmp_expr_p (comp_vectype, mask_type))
>         return false;
>
> check you added to check_bool_pattern to get any coverage for bool
> patterns on x86_64.  Doing that regresses
>
> FAIL: gcc.target/i386/mask-pack.c scan-tree-dump-times vect "vectorized 1 loops" 10
> FAIL: gcc.target/i386/mask-unpack.c scan-tree-dump-times vect "vectorized 1 loops" 10
> FAIL: gcc.target/i386/pr70021.c scan-tree-dump-times vect "vectorized 1 loops" 2
>
> so somehow bool patterns mess up things here (I didn't investigate).
> The final patch will enable the above path again, avoiding the regression.

Mask conversion patterns handle some cases bool patterns don't.  So it's
expected we can't vectorize some loops if bool patterns are enforced.

Thanks,
Ilya

>
> Yuri - I suppose you have a larger set of testcases using OMP simd
> or other forced vectorization you added ifcvt_repair_bool_pattern for.
> I'd appreciate testing (and testcases if anything fails unexpectedly).
>
> Testing on other targets is of course appreciated as well.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu (with the #if 0).
>
> Comments?
>
> I agree with Ilya elsewhere to remove bool patterns completely
> (as a first step making them apply to individual stmts).  We do
> currently not support mixed cond/non-cond uses anyway, like
>
> _Bool a[64];
> unsigned char b[64];
>
> void foo (void)
> {
>   for (int i = 0; i < 64; ++i)
> {
>   _Bool x = a[i] && b[i] < 10;
>   a[i] = x;
> }
> }
>
> and stmt-local "patterns" can be added when vectorizing the stmt
> (like in the above case a tem = a[i] != 0 ? -1 : 0).  Doing
> bool "promotion" optimally requires a better vectorizer IL
> (similar to placing of shuffles).
>
> Thanks,
> Richard.
>


Fix profile updating in loop peeling

2016-05-30 Thread Jan Hubicka
Hi,
this patch fixes the profile updates in the loop peeling pass.  First, it
correctly sets wont_exit, which can only be set when we know the number of
iterations tested by EXIT and this number is higher than the maximal number
of iterations (an unlikely case which is usually removed by the VRP pass or
by earlier cunroll).

The second problem is that we determine the number of peelings as the number
of estimated iterations + 1.  After peeling we currently underflow when
updating the estimates, which makes them be re-computed later to bogus values.
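
To illustrate the underflow with made-up numbers: if nb_iterations_estimate
is 3, we peel npeel = 3 + 1 = 4 copies, and the old update did

  loop->nb_iterations_estimate -= npeel;  /* 3 - 4 wraps around */

leaving a huge bogus value for later consumers; the patch below clamps the
estimate (and the upper bounds) at zero instead.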

The last change is the way we determine the profile for the new loop header.
We used to drop the loop frequency to 1/1000.  This patch makes use of the
info on the remaining entry edges to the loop.

While working on this I noticed that try_peel_loop has a tendency to iterate
and peel one loop many times.  I will prepare a followup for that.  Also,
testcases will come once I commit the change enabling loop peeling at -O3
by using likely estimates.

Bootstrapped/regtested x86_64-linux.

* tree-ssa-loop-ivcanon.c (try_peel_loop): Correctly set wont_exit
for peeled copies; avoid underflow when updating estimates; correctly
scale loop profile.

Index: tree-ssa-loop-ivcanon.c
===================================================================
--- tree-ssa-loop-ivcanon.c (revision 236874)
+++ tree-ssa-loop-ivcanon.c (working copy)
@@ -970,7 +970,9 @@ try_peel_loop (struct loop *loop,
   if (!flag_peel_loops || PARAM_VALUE (PARAM_MAX_PEEL_TIMES) <= 0)
 return false;
 
-  /* Peel only innermost loops.  */
+  /* Peel only innermost loops.
+ While the code is perfectly capable of peeling non-innermost loops,
+ the heuristics would probably need some improvements. */
   if (loop->inner)
 {
   if (dump_file)
@@ -1029,13 +1031,23 @@ try_peel_loop (struct loop *loop,
   /* Duplicate possibly eliminating the exits.  */
   initialize_original_copy_tables ();
   wont_exit = sbitmap_alloc (npeel + 1);
-  bitmap_ones (wont_exit);
-  bitmap_clear_bit (wont_exit, 0);
+  if (exit && niter
+      && TREE_CODE (niter) == INTEGER_CST
+      && wi::leu_p (npeel, wi::to_widest (niter)))
+    {
+      bitmap_ones (wont_exit);
+      if (wi::eq_p (wi::to_widest (niter), npeel))
+        bitmap_clear_bit (wont_exit, 0);
+    }
+  else
+    {
+      exit = NULL;
+      bitmap_clear (wont_exit);
+    }
   if (!gimple_duplicate_loop_to_header_edge (loop, loop_preheader_edge (loop),
                                              npeel, wont_exit,
                                              exit, &to_remove,
-                                             DLTHE_FLAG_UPDATE_FREQ
-                                             | DLTHE_FLAG_COMPLETTE_PEEL))
+                                             DLTHE_FLAG_UPDATE_FREQ))
 {
   free_original_copy_tables ();
   free (wont_exit);
@@ -1053,14 +1065,48 @@ try_peel_loop (struct loop *loop,
   fprintf (dump_file, "Peeled loop %d, %i times.\n",
   loop->num, (int) npeel);
 }
+  if (loop->any_estimate)
+    {
+      if (wi::ltu_p (npeel, loop->nb_iterations_estimate))
+        loop->nb_iterations_estimate -= npeel;
+      else
+        loop->nb_iterations_estimate = 0;
+    }
   if (loop->any_upper_bound)
-    loop->nb_iterations_upper_bound -= npeel;
+    {
+      if (wi::ltu_p (npeel, loop->nb_iterations_estimate))
+        loop->nb_iterations_upper_bound -= npeel;
+      else
+        loop->nb_iterations_upper_bound = 0;
+    }
   if (loop->any_likely_upper_bound)
-    loop->nb_iterations_likely_upper_bound -= npeel;
-  loop->nb_iterations_estimate = 0;
-  /* Make sure to mark loop cold so we do not try to peel it more.  */
-  scale_loop_profile (loop, 1, 0);
-  loop->header->count = 0;
+    {
+      if (wi::ltu_p (npeel, loop->nb_iterations_estimate))
+        loop->nb_iterations_likely_upper_bound -= npeel;
+      else
+        {
+          loop->any_estimate = true;
+          loop->nb_iterations_estimate = 0;
+          loop->nb_iterations_likely_upper_bound = 0;
+        }
+    }
+  gcov_type entry_count = 0;
+  int entry_freq = 0;
+
+  edge_iterator ei;
+  FOR_EACH_EDGE (e, ei, loop->header->preds)
+    if (e->src != loop->latch)
+      {
+        entry_count += e->src->count;
+        entry_freq += e->src->frequency;
+        gcc_assert (!flow_bb_inside_loop_p (loop, e->src));
+      }
+  int scale = 1;
+  if (loop->header->count)
+    scale = RDIV (entry_count * REG_BR_PROB_BASE, loop->header->count);
+  else if (loop->header->frequency)
+    scale = RDIV (entry_freq * REG_BR_PROB_BASE, loop->header->frequency);
+  scale_loop_profile (loop, scale, 0);
   return true;
 }
 /* Adds a canonical induction variable to LOOP if suitable.
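
As a worked example of the new scaling (numbers invented): if the loop
header had count 1000 and the edges entering the loop (excluding the
latch) contribute count 100 after peeling, then

  scale = 100 * REG_BR_PROB_BASE / 1000;

and the loop profile is scaled to 10% of its former weight, instead of
the previous blanket 1/1000.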


[PATCH 2/2] [ARC] Update target specific tests.

2016-05-30 Thread Claudiu Zissulescu
Update the ARC specific tests.

OK to apply?
Claudiu

gcc/
2016-05-26  Claudiu Zissulescu  

* testsuite/gcc.target/arc/abitest.S: New file.
* testsuite/gcc.target/arc/va_args-1.c: Likewise.
* testsuite/gcc.target/arc/va_args-2.c: Likewise.
* testsuite/gcc.target/arc/va_args-3.c: Likewise.
* testsuite/gcc.target/arc/mcrc.c: Deleted.
* testsuite/gcc.target/arc/mdsp-packa.c: Likewise.
* testsuite/gcc.target/arc/mdvbf.c: Likewise.
* testsuite/gcc.target/arc/mmac-24.c: Likewise.
* testsuite/gcc.target/arc/mmac-d16.c: Likewise.
* testsuite/gcc.target/arc/mno-crc.c: Likewise.
* testsuite/gcc.target/arc/mno-dsp-packa.c: Likewise.
* testsuite/gcc.target/arc/mno-dvbf.c: Likewise.
* testsuite/gcc.target/arc/mno-mac-24.c: Likewise.
* testsuite/gcc.target/arc/mno-mac-d16.c: Likewise.
* testsuite/gcc.target/arc/mno-rtsc.c: Likewise.
* testsuite/gcc.target/arc/mno-xy.c: Likewise.
* testsuite/gcc.target/arc/mrtsc.c: Likewise.
* testsuite/gcc.target/arc/arc.exp (check_effective_target_arcem):
New function.
(check_effective_target_arc700): Likewise.
(check_effective_target_arc6xx): Likewise.
(check_effective_target_arcmpy): Likewise.
(check_effective_target_archs): Likewise.
(check_effective_target_clmcpu): Likewise.
* testsuite/gcc.target/arc/barrel-shifter-1.c: Changed.
* testsuite/gcc.target/arc/builtin_simd.c: Test only for ARC700
cpus.
* testsuite/gcc.target/arc/cmem-1.c: Changed.
* testsuite/gcc.target/arc/cmem-2.c: Likewise.
* testsuite/gcc.target/arc/cmem-3.c: Likewise.
* testsuite/gcc.target/arc/cmem-4.c: Likewise.
* testsuite/gcc.target/arc/cmem-5.c: Likewise.
* testsuite/gcc.target/arc/cmem-6.c: Likewise.
* testsuite/gcc.target/arc/cmem-7.c: Likewise.
* testsuite/gcc.target/arc/interrupt-1.c: Test for RTIE as well.
* testsuite/gcc.target/arc/interrupt-2.c: Skip it for ARCv2 cores.
* testsuite/gcc.target/arc/interrupt-3.c: Match also ARCv2
warnings.
* testsuite/gcc.target/arc/jump-around-jump.c: Update options.
* testsuite/gcc.target/arc/mARC601.c: Changed.
* testsuite/gcc.target/arc/mcpu-arc600.c: Changed.
* testsuite/gcc.target/arc/mcpu-arc601.c: Changed.
* testsuite/gcc.target/arc/mcpu-arc700.c: Changed.
* testsuite/gcc.target/arc/mdpfp.c: Skip for ARCv2 cores.
* testsuite/gcc.target/arc/movb-1.c: Changed.
* testsuite/gcc.target/arc/movb-2.c: Likewise.
* testsuite/gcc.target/arc/movb-3.c: Likewise.
* testsuite/gcc.target/arc/movb-4.c: Likewise.
* testsuite/gcc.target/arc/movb-5.c: Likewise.
* testsuite/gcc.target/arc/movb_cl-1.c: Likewise.
* testsuite/gcc.target/arc/movb_cl-2.c: Likewise.
* testsuite/gcc.target/arc/movbi_cl-1.c: Likewise.
* testsuite/gcc.target/arc/movh_cl-1.c: Likewise.
* testsuite/gcc.target/arc/mspfp.c: Skip for ARC HS cores.
* testsuite/gcc.target/arc/mul64.c: Enable it only for ARC600.
* testsuite/gcc.target/arc/mulsi3_highpart-1.c: Scan for ARCv2
instructions.
* testsuite/gcc.target/arc/mulsi3_highpart-2.c: Skip it for ARCv1
cores.
* testsuite/gcc.target/arc/no-dpfp-lrsr.c: Skip it for ARC HS.
* testsuite/gcc.target/arc/trsub.c: Only for ARC EM cores.
* testsuite/gcc.target/arc/builtin_simdarc.c: Changed.
* testsuite/gcc.target/arc/extzv-1.c: Likewise.
* testsuite/gcc.target/arc/insv-1.c: Likewise.
* testsuite/gcc.target/arc/insv-2.c: Likewise.
* testsuite/gcc.target/arc/mA6.c: Likewise.
* testsuite/gcc.target/arc/mA7.c: Likewise.
* testsuite/gcc.target/arc/mARC600.c: Likewise.
* testsuite/gcc.target/arc/mARC700.c: Likewise.
* testsuite/gcc.target/arc/mcpu-arc600.c: Likewise.
* testsuite/gcc.target/arc/mcpu-arc700.c: Likewise.
* testsuite/gcc.target/arc/movl-1.c: Likewise.
* testsuite/gcc.target/arc/nps400-1.c: Likewise.
* testsuite/gcc.target/arc/trsub.c: Likewise.
---
 gcc/testsuite/gcc.target/arc/abitest.S   | 31 +++
 gcc/testsuite/gcc.target/arc/arc.exp | 66 +++-
 gcc/testsuite/gcc.target/arc/barrel-shifter-1.c  |  2 +-
 gcc/testsuite/gcc.target/arc/builtin_simd.c  |  1 +
 gcc/testsuite/gcc.target/arc/builtin_simdarc.c   |  1 +
 gcc/testsuite/gcc.target/arc/cmem-1.c|  1 +
 gcc/testsuite/gcc.target/arc/cmem-2.c|  1 +
 gcc/testsuite/gcc.target/arc/cmem-3.c|  1 +
 gcc/testsuite/gcc.target/arc/cmem-4.c|  1 +
 gcc/testsuite/gcc.target/arc/cmem-5.c|  1 +
 gcc/testsuite/gcc.target/arc/cmem-6.c|  1 +
 gcc/testsuite/gcc.target/arc/cmem-7.c|  1 +
 

[PATCH 1/2] [ARC] New option handling, refurbish multilib support.

2016-05-30 Thread Claudiu Zissulescu
Overhaul of ARC options, and multilib support for various cpus.

OK to apply?
Claudiu

gcc/
2016-05-09  Claudiu Zissulescu  

* config/arc/arc-arch.h: New file.
* config/arc/arc-arches.def: Likewise.
* config/arc/arc-cpus.def: Likewise.
* config/arc/arc-options.def: Likewise.
* config/arc/t-multilib: Likewise.
* config/arc/genmultilib.awk: Likewise.
* config/arc/genoptions.awk: Likewise.
* config/arc/arc-tables.opt: Likewise.
* config/arc/driver-arc.c: Likewise.
* common/config/arc/arc-common.c (arc_handle_option): Trace
toggled options.
* config.gcc (arc*-*-*): Add arc-tables.opt to arc's extra
options; check for supported cpu against arc-cpus.def file.
(arc*-*-elf*, arc*-*-linux-uclibc*): Use new make fragment; define
TARGET_CPU_BUILD macro; add driver-arc.o as an extra object.
* config/arc/arc-c.def: Add emacs local variables.
* config/arc/arc-opts.h (processor_type): Use arc-cpus.def file.
(FPU_FPUS, FPU_FPUD, FPU_FPUDA, FPU_FPUDA_DIV, FPU_FPUDA_FMA)
(FPU_FPUDA_ALL, FPU_FPUS_DIV, FPU_FPUS_FMA, FPU_FPUS_ALL)
(FPU_FPUD_DIV, FPU_FPUD_FMA, FPU_FPUD_ALL): New defines.
(DEFAULT_arc_fpu_build): Define.
(DEFAULT_arc_mpy_option): Define.
* config/arc/arc-protos.h (arc_init): Delete.
* config/arc/arc.c (arc_cpu_name): New variable.
(arc_selected_cpu, arc_selected_arch, arc_arcem, arc_archs)
(arc_arc700, arc_arc600, arc_arc601): New variable.
(arc_init): Add static; remove selection of default tune value,
cleanup obsolete error messages.
(arc_override_options): Make use of .def files for selecting the
right cpu and option configurations.
* config/arc/arc.h (stdbool.h): Include.
(TARGET_CPU_DEFAULT): Define.
(CPP_SPEC): Remove mcpu=NPS400 handling.
(arc_cpu_to_as): Declare.
(EXTRA_SPEC_FUNCTIONS): Define.
(OPTION_DEFAULT_SPECS): Likewise.
(ASM_DEFAULT): Remove.
(ASM_SPEC): Use arc_cpu_to_as.
(DRIVER_SELF_SPECS): Remove deprecated options.
(arc_arc700, arc_arc600, arc_arc601, arc_arcem, arc_archs):
Declare.
(TARGET_ARC600, TARGET_ARC601, TARGET_ARC700, TARGET_EM)
(TARGET_HS, TARGET_V2, TARGET_ARC600): Make them use arc_arc*
variables.
(MULTILIB_DEFAULTS): Use ARC_MULTILIB_CPU_DEFAULT.
* config/arc/arc.md (attr_cpu): Remove.
* config/arc/arc.opt (arc_mpy_option): Make it target variable.
(mno-mpy): Deprecate.
(mcpu=ARC600, mcpu=ARC601, mcpu=ARC700, mcpu=NPS400, mcpu=ARCEM)
(mcpu=ARCHS): Remove.
(mcrc, mdsp-packa, mdvbf, mmac-d16, mmac-24, mtelephony, mrtsc):
Deprecate.
(mbarrel_shifte, mspfp_, mdpfp_, mdsp_pack, mmac_): Remove.
(arc_fpu): Use new defines.
(arc_seen_options): New target variable.
* config/arc/t-arc (driver-arc.o): New target.
(arc-cpus, t-multilib, arc-tables.opt): Likewise.
* config/arc/t-arc-newlib: Delete.
* config/arc/t-arc-uClibc: Renamed to t-uClibc.
* doc/invoke.texi (ARC): Update arc options.
---
 gcc/common/config/arc/arc-common.c | 162 -
 gcc/config.gcc |  47 +
 gcc/config/arc/arc-arch.h  | 120 ++
 gcc/config/arc/arc-arches.def  |  35 +++
 gcc/config/arc/arc-c.def   |   4 +
 gcc/config/arc/arc-cpus.def|  47 +
 gcc/config/arc/arc-options.def |  69 +
 gcc/config/arc/arc-opts.h  |  47 +++--
 gcc/config/arc/arc-protos.h|   1 -
 gcc/config/arc/arc-tables.opt  |  90 
 gcc/config/arc/arc.c   | 186 ++---
 gcc/config/arc/arc.h   |  91 -
 gcc/config/arc/arc.md  |   5 -
 gcc/config/arc/arc.opt | 109 ++--
 gcc/config/arc/driver-arc.c|  80 +++
 gcc/config/arc/genmultilib.awk | 204 +
 gcc/config/arc/genoptions.awk  |  86 
 gcc/config/arc/t-arc   |  19 
 gcc/config/arc/t-arc-newlib|  46 -
 gcc/config/arc/t-arc-uClibc|  20 
 gcc/config/arc/t-multilib  |  51 ++
 gcc/config/arc/t-uClibc|  20 
 gcc/doc/invoke.texi|  86 +---
 23 files changed, 1249 insertions(+), 376 deletions(-)
 create mode 100644 gcc/config/arc/arc-arch.h
 create mode 100644 gcc/config/arc/arc-arches.def
 create mode 100644 gcc/config/arc/arc-cpus.def
 create mode 100644 gcc/config/arc/arc-options.def
 create mode 100644 gcc/config/arc/arc-tables.opt
 create mode 100644 gcc/config/arc/driver-arc.c
 create mode 100644 gcc/config/arc/genmultilib.awk
 create mode 100644 

[PATCH 0/2] [ARC] Refurbish backend options

2016-05-30 Thread Claudiu Zissulescu
This series of patches redefines the way we handle the options within
the ARC backend. Firstly, a number of options got deprecated, as they
were intended to control the behavior of the assembler, which has since
been overhauled and now ignores the options in question. Secondly, we
remove the capitalized cpu names accepted by the mcpu option. Finally,
we introduce new cpu options, based on the Synopsys predefined CPU
templates, to ease the translation from a hardware CPU template to gcc
options. Those new cpu options are handled by a number of scripts which
seamlessly generate gcc options and multilib variants, and check the
allowed hardware extensions against the gcc command line options.

Please find two patches, one which is refurbishing the ARC options,
and the second one which updates the ARC specific tests.

Claudiu Zissulescu (2):
  New option handling, refurbish multilib support.
  Update target specific tests.

 gcc/common/config/arc/arc-common.c   | 162 --
 gcc/config.gcc   |  47 +++---
 gcc/config/arc/arc-arch.h| 120 +
 gcc/config/arc/arc-arches.def|  35 
 gcc/config/arc/arc-c.def |   4 +
 gcc/config/arc/arc-cpus.def  |  47 ++
 gcc/config/arc/arc-options.def   |  69 
 gcc/config/arc/arc-opts.h|  47 +-
 gcc/config/arc/arc-protos.h  |   1 -
 gcc/config/arc/arc-tables.opt|  90 ++
 gcc/config/arc/arc.c | 186 +++--
 gcc/config/arc/arc.h |  91 +-
 gcc/config/arc/arc.md|   5 -
 gcc/config/arc/arc.opt   | 109 
 gcc/config/arc/driver-arc.c  |  80 +
 gcc/config/arc/genmultilib.awk   | 204 +++
 gcc/config/arc/genoptions.awk|  86 ++
 gcc/config/arc/t-arc |  19 +++
 gcc/config/arc/t-arc-newlib  |  46 -
 gcc/config/arc/t-arc-uClibc  |  20 ---
 gcc/config/arc/t-multilib|  51 ++
 gcc/config/arc/t-uClibc  |  20 +++
 gcc/doc/invoke.texi  |  86 --
 gcc/testsuite/gcc.target/arc/abitest.S   |  31 
 gcc/testsuite/gcc.target/arc/arc.exp |  66 +++-
 gcc/testsuite/gcc.target/arc/barrel-shifter-1.c  |   2 +-
 gcc/testsuite/gcc.target/arc/builtin_simd.c  |   1 +
 gcc/testsuite/gcc.target/arc/builtin_simdarc.c   |   1 +
 gcc/testsuite/gcc.target/arc/cmem-1.c|   1 +
 gcc/testsuite/gcc.target/arc/cmem-2.c|   1 +
 gcc/testsuite/gcc.target/arc/cmem-3.c|   1 +
 gcc/testsuite/gcc.target/arc/cmem-4.c|   1 +
 gcc/testsuite/gcc.target/arc/cmem-5.c|   1 +
 gcc/testsuite/gcc.target/arc/cmem-6.c|   1 +
 gcc/testsuite/gcc.target/arc/cmem-7.c|   1 +
 gcc/testsuite/gcc.target/arc/extzv-1.c   |   1 +
 gcc/testsuite/gcc.target/arc/insv-1.c|   1 +
 gcc/testsuite/gcc.target/arc/insv-2.c|   1 +
 gcc/testsuite/gcc.target/arc/interrupt-1.c   |   7 +-
 gcc/testsuite/gcc.target/arc/interrupt-2.c   |   1 +
 gcc/testsuite/gcc.target/arc/interrupt-3.c   |   2 +-
 gcc/testsuite/gcc.target/arc/jump-around-jump.c  |   2 +-
 gcc/testsuite/gcc.target/arc/mA6.c   |   1 +
 gcc/testsuite/gcc.target/arc/mA7.c   |   1 +
 gcc/testsuite/gcc.target/arc/mARC600.c   |   1 +
 gcc/testsuite/gcc.target/arc/mARC601.c   |   3 +-
 gcc/testsuite/gcc.target/arc/mARC700.c   |   1 +
 gcc/testsuite/gcc.target/arc/mcpu-arc600.c   |   3 +-
 gcc/testsuite/gcc.target/arc/mcpu-arc601.c   |   5 +-
 gcc/testsuite/gcc.target/arc/mcpu-arc700.c   |   3 +-
 gcc/testsuite/gcc.target/arc/mcrc.c  |   8 -
 gcc/testsuite/gcc.target/arc/mdpfp.c |   1 +
 gcc/testsuite/gcc.target/arc/mdsp-packa.c|   9 -
 gcc/testsuite/gcc.target/arc/mdvbf.c |   9 -
 gcc/testsuite/gcc.target/arc/mmac-24.c   |   8 -
 gcc/testsuite/gcc.target/arc/mmac-d16.c  |   9 -
 gcc/testsuite/gcc.target/arc/mno-crc.c   |  11 --
 gcc/testsuite/gcc.target/arc/mno-dsp-packa.c |  11 --
 gcc/testsuite/gcc.target/arc/mno-dvbf.c  |  11 --
 gcc/testsuite/gcc.target/arc/mno-mac-24.c|  11 --
 gcc/testsuite/gcc.target/arc/mno-mac-d16.c   |  11 --
 gcc/testsuite/gcc.target/arc/mno-rtsc.c  |  11 --
 gcc/testsuite/gcc.target/arc/mno-xy.c|  10 --
 gcc/testsuite/gcc.target/arc/movb-1.c|   1 +
 gcc/testsuite/gcc.target/arc/movb-2.c|   1 +
 gcc/testsuite/gcc.target/arc/movb-3.c|   1 +
 gcc/testsuite/gcc.target/arc/movb-4.c|   1 +
 

Re: Enable loop peeling at -O3

2016-05-30 Thread Jan Hubicka
> 
> Sounds good.
> 
> The patch is ok if you omit the new flag for now.

Ok, I will omit that flag for now.  Thanks!

Honza
> 
> Thanks,
> Richard.


[build] Handle gas/gld --compress-debug-sections=type

2016-05-30 Thread Rainer Orth
I noticed that, while gcc.c is already prepared for it, gcc/configure
doesn't yet detect the full ELF gABI compression support introduced in
GNU binutils 2.26.

The following patch adds the missing pieces.  While mostly
straightforward, there are two noteworthy issues:

* I'm removing the `extra switches' arg to gcc_GAS_CHECK_FEATURE.  This
  would cause HAVE_AS_COMPRESS_DEBUG to be set to "no" instead of one of
  the numeric values when testing /bin/as on Solaris.  In the future,
  this might cause us to misdetect assembler compression support requiring
  a different option.

* When I removed the default in the gcc_cv_ld_compress test, the outcome
  always was 0, irrespective of the linker tested.  Before, it would
  almost always have been 1 if testing GNU ld.  It turns out that in
  this (and numerous other) cases the nesting of tests using ld_date was
  wrong.  I believe most if not all of those ld_date tests can go, being
  only relevant for truly prehistoric versions of GNU ld not in use
  anymore.  I'll probably submit a followup to remove them, simplifying
  several ld tests.

Bootstrapped without regressions on i386-pc-solaris2.1[02] with various
as/ld combinations and checking that the tests yielded the expected
results:

gcc_cv_as_compress_debug/gcc_cv_ld_compress_debug
as/ld 2.10  0/0
as/ld 2.12  0/3
gas 2.26/ld 2.102/0
gas 2.26/ld 2.122/3
gas 2.26/gld 2.26   2/3
gas 2.25/gld 2.25   1/1

I've also compiled and linked a test program and checked that assembler
and linker were invoked as expected for
-gz/-gz=none/-gz=zlib/-gz=zlib-gnu and yielded either errors or output
with compressed debug sections as should be.

Ok for mainline?

Rainer


2016-05-26  Rainer Orth  

* configure.ac (gcc_cv_as_compress_debug): Remove
--compress-debug-sections as extra as switch.
Handle gas --compress-debug-sections=type.
(gcc_cv_ld_compress_debug): Remove bogus ld_date check.
Handle gld --compress-debug-sections=type.
* configure: Regenerate.

# HG changeset patch
# Parent  42e566f9c2ba3a71305b2db7875ea2e5fe2df2c2
Handle gas/gld --compress-debug-sections=type

diff --git a/gcc/configure.ac b/gcc/configure.ac
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -4731,12 +4731,21 @@ if test x"$insn" != x; then
 fi
 
 gcc_GAS_CHECK_FEATURE([compressed debug sections],
-  gcc_cv_as_compress_debug,,[--compress-debug-sections],,
+  gcc_cv_as_compress_debug
   [# gas compiled without zlib cannot compress debug sections and warns
# about it, but still exits successfully.  So check for this, too.
if $gcc_cv_as --compress-debug-sections -o conftest.o conftest.s 2>&1 | grep -i warning > /dev/null
then
  gcc_cv_as_compress_debug=0
+   # Since binutils 2.26, gas supports --compress-debug-sections=type,
+   # defaulting to the ELF gABI format.
+   elif $gcc_cv_as --compress-debug-sections=zlib-gnu -o conftest.o conftest.s > /dev/null 2>&1
+   then
+ gcc_cv_as_compress_debug=2
+ gcc_cv_as_compress_debug_option="--compress-debug-sections"
+ gcc_cv_as_no_compress_debug_option="--nocompress-debug-sections"
+   # Before binutils 2.26, gas only supported --compress-debug-sections and
+   # emitted the traditional GNU format.
elif $gcc_cv_as --compress-debug-sections -o conftest.o conftest.s > /dev/null 2>&1
then
  gcc_cv_as_compress_debug=1
@@ -4744,8 +4753,6 @@ gcc_GAS_CHECK_FEATURE([compressed debug 
  gcc_cv_as_no_compress_debug_option="--nocompress-debug-sections"
else
  gcc_cv_as_compress_debug=0
-   # FIXME: Future gas versions will support ELF gABI style via
-   # --compress-debug-sections[=type].
fi])
 AC_DEFINE_UNQUOTED(HAVE_AS_COMPRESS_DEBUG, $gcc_cv_as_compress_debug,
 [Define to the level of your assembler's compressed debug section support.])
@@ -5120,27 +5127,30 @@ AC_MSG_RESULT($gcc_cv_ld_eh_gc_sections_
 
 AC_MSG_CHECKING(linker for compressed debug sections)
 # gold/gld support compressed debug sections since binutils 2.19/2.21
+# In binutils 2.26, gld gained support for the ELF gABI format.
 if test $in_tree_ld = yes ; then
   gcc_cv_ld_compress_debug=0
   if test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" -ge 19 -o "$gcc_cv_gld_major_version" -gt 2 \
  && test $in_tree_ld_is_elf = yes && test $ld_is_gold = yes; then
 gcc_cv_ld_compress_debug=2
 gcc_cv_ld_compress_debug_option="--compress-debug-sections"
+  elif test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" -ge 26 -o "$gcc_cv_gld_major_version" -gt 2 \
+ && test $in_tree_ld_is_elf = yes && test $ld_is_gold = no; then
+gcc_cv_ld_compress_debug=3
+gcc_cv_ld_compress_debug_option="--compress-debug-sections"
   elif test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" -ge 21 -o "$gcc_cv_gld_major_version" -gt 2 \
  && test $in_tree_ld_is_elf = yes; then
 

Re: [PATCH] match.pd: optimize unsigned mul overflow check

2016-05-30 Thread Richard Biener
On Mon, May 30, 2016 at 1:15 PM, Jakub Jelinek  wrote:
> On Mon, May 30, 2016 at 12:55:22PM +0200, Richard Biener wrote:
>> CCing Jakub who wrote the tree-ssa-math-opts.c code last year.  I remember we
>> discussed using match.pd but ended up with not doing it there but I
>> don't remember
>> the exact reason.
>
> <jakub> richi: does match.pd support turning non-call code into internal call?
> <richi> jakub: yes
> <richi> jakub: (simplify (plus @1 @2) (fn @1 @2))
> <jakub> richi: where fn is IFN_... or something else?
> <richi> jakub: IFN_ should work, yes
> <richi> jakub: CFN_ as well as far as I understand the code richard added
> <jakub> but then I don't control the type of the lhs, right?
> <jakub> want to transform
> <jakub>   r_4 = x_2(D) - y_3(D);
> <jakub>   if (x_2(D) < r_4)
> <jakub> into:
> <richi> well, the lhs is always the original type of the expression (obviously)
> <jakub>   _4 = SUB_OVERFLOW (x_2(D), y_3(D));
> <jakub>   r.0_5 = REALPART_EXPR <_4>;
> <jakub>   _7 = IMAGPART_EXPR <_4>;
> <jakub>   if (_7 != 0)
> <jakub> unsigned only
> <richi> jakub: hmm, you have two results, the compare and the subtract, that's not possible to handle right now (unless r_4 is unused otherwise)
> <richi> jakub: you can write a matcher and do the transform in a pass somewhere
> <jakub> richi: ah, ok; any suggestion which pass is best for this?  forwprop, something else?
> <richi> jakub: I'd say forwprop is good enough, yes
> <jakub> richi: FYI, this is for PR67089, rth is working on the optab side for that
> <richi> hmm, the (match ...) code doesn't deal well with values not having an SSA name (the compare in this case)
> <richi> "deal well" as in, not implemented
> <richi> it would require to output some auxiliary entry points
> <jakub> richi: if it is done in forwprop or some other pass, it can do everything in there by hand, doesn't need match.pd help I guess
> <richi> jakub: yes

Ok, so Alexander - you transform the condition only, as the multiplication
value is not available anyway with your formulation of the overflow check.
That's then different from Jakub's case, so match.pd is ok (that is, the
patch is ok for trunk).

>>  It would be nice to have these in one place rather
>> than split.
>>
>> As of umulv4 handling - users can write overflow checks using the 
>> builtins
>> so we better expand them to the optimal form for each target, thus
>> canonicalizing
>> them to the IFN looks reasonable to me.
>
>> The plus/minus case in tree-ssa-math-opts.c
>> _does_ disable itself if no uaddv is available though.
>
> That is because the uaddv/usubv case is so simple without the HW support that
> it is unlikely to be beneficial then.  That is not the case for 
> multiplication.

Richard.

> Jakub


Re: Enable loop peeling at -O3

2016-05-30 Thread Richard Biener
On Mon, 30 May 2016, Jan Hubicka wrote:

> > On Sat, 28 May 2016, Jan Hubicka wrote:
> > 
> > > Hello,
> > > thanks for feedback. I updated the patch and also noticed that
> > > -fpeel-all-loops gives up when the upper bound is known but large, and
> > > when max-peel-insns is too small to permit peeling max-peel-times.
> > > This patch also updates pr61743-2.c, which is now peeled before we
> > > manage to propagate the proper loop bound.
> > > 
> > > Bootstrapped/regtested x86_64-linux. OK?
> > 
> > Humm, so why add -fpeel-all-loops?  I don't think -funroll-all-loops
> > is useful.
> It is mostly there to trigger the transform to see if it is useful.  Not
> something you want to enable by default.
> 
> -fpeel-all-loops helps when you know your code has internal loops that
> iterate a few times.  I.e. one can get a good speedup for the sudoku
> solver benchmark because it has loops that iterate either once or 10 times.
> 
> http://www.ucw.cz/~hubicka/papers/amd64/node4.html also claims that
> -funroll-all-loops improves specint by 2.5%, while -funroll-loops by
> 2.23%, so it seemed somewhat useful back then.

Still I'm hesitant to introduce new user-visible options.

> > Did you check code-size/compile-time/performance effects of enabling
> > -fpeel-loops at -O3 for, say, SPEC CPU 2006?
> 
> Martin Liska ran v6 of the patch on SPEC2006 (not with the latest fixes to
> the heuristics).  Without FDO the loop peeling triggers only for loops where
> we have likely_upper_bound != upper_bound. We do not predict that too often
> (we may in the future, as there is room for improvement in niter).  The code
> size effect was +0.9% for SPECint and +2.2% on SPECfp.  The off-noise
> improvements were vrp 94.5->89.7 and John the ripper 106.9->100 (less than
> 0.1% in geomavg).  I have cut down the code size effects since then (there
> was a bug that made us peel without control when maxiter overflowed), but I
> did not re-run full specs since then.
> 
> My motivation was mainly to reduce the number of optimizations that are not
> good enough to be enabled by default, and also the observation that it
> helps some benchmarks.
> 
> We can re-run benchmarks with the current patch after fixing the profile
> update issues I will send shortly.  No fine-tuning of the parameters was
> done and I guess they are still set the way they were set for RTL peeling
> in 2003.

Sounds good.

The patch is ok if you omit the new flag for now.

Thanks,
Richard.


Re: [PATCH] match.pd: optimize unsigned mul overflow check

2016-05-30 Thread Jakub Jelinek
On Mon, May 30, 2016 at 12:55:22PM +0200, Richard Biener wrote:
> CCing Jakub who wrote the tree-ssa-math-opts.c code last year.  I remember we
> discussed using match.pd but ended up with not doing it there but I
> don't remember
> the exact reason.

<jakub> richi: does match.pd support turning non-call code into internal call?
<richi> jakub: yes
<richi> jakub: (simplify (plus @1 @2) (fn @1 @2))
<jakub> richi: where fn is IFN_... or something else?
<richi> jakub: IFN_ should work, yes
<richi> jakub: CFN_ as well as far as I understand the code richard added
<jakub> but then I don't control the type of the lhs, right?
<jakub> want to transform
<jakub>   r_4 = x_2(D) - y_3(D);
<jakub>   if (x_2(D) < r_4)
<jakub> into:
<richi> well, the lhs is always the original type of the expression (obviously)
<jakub>   _4 = SUB_OVERFLOW (x_2(D), y_3(D));
<jakub>   r.0_5 = REALPART_EXPR <_4>;
<jakub>   _7 = IMAGPART_EXPR <_4>;
<jakub>   if (_7 != 0)
<jakub> unsigned only
<richi> jakub: hmm, you have two results, the compare and the subtract, that's not possible to handle right now (unless r_4 is unused otherwise)
<richi> jakub: you can write a matcher and do the transform in a pass somewhere
<jakub> richi: ah, ok; any suggestion which pass is best for this?  forwprop, something else?
<richi> jakub: I'd say forwprop is good enough, yes
<jakub> richi: FYI, this is for PR67089, rth is working on the optab side for that
<richi> hmm, the (match ...) code doesn't deal well with values not having an SSA name (the compare in this case)
<richi> "deal well" as in, not implemented
<richi> it would require to output some auxiliary entry points
<jakub> richi: if it is done in forwprop or some other pass, it can do everything in there by hand, doesn't need match.pd help I guess
<richi> jakub: yes
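
In C source form, the shape discussed in the log is roughly (an editorial
illustration, not code from any patch):

unsigned
sub_with_check (unsigned x, unsigned y, int *overflowed)
{
  unsigned r = x - y;
  *overflowed = x < r;  /* borrow occurred iff the result exceeds x */
  return r;
}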

>  It would be nice to have these in one place rather
> than split.
> 
> As of umulv4 handling - users can write overflow checks using the 
> builtins
> so we better expand them to the optimal form for each target, thus
> canonicalizing
> them to the IFN looks reasonable to me.

> The plus/minus case in tree-ssa-math-opts.c
> _does_ disable itself if no uaddv is available though.

That is because the uaddv/usubv case is so simple without the HW support that
it is unlikely to be beneficial then.  That is not the case for multiplication.
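
(Roughly, the addition case without HW support costs just one extra
compare -- an editorial sketch:

int
add_overflows (unsigned a, unsigned b, unsigned *r)
{
  *r = a + b;
  return *r < a;  /* wrapped iff the sum is below an operand */
}

whereas open-coding a multiplication overflow check is much costlier.)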

Jakub


Re: Enable loop peeling at -O3

2016-05-30 Thread Jan Hubicka
> On Sat, 28 May 2016, Jan Hubicka wrote:
> 
> > Hello,
> > thanks for feedback. I updated the patch and also noticed that
> > -fpeel-all-loops gives up when the upper bound is known but large, and
> > when max-peel-insns is too small to permit peeling max-peel-times.
> > This patch also updates pr61743-2.c, which is now peeled before we
> > manage to propagate the proper loop bound.
> > 
> > Bootstrapped/regtested x86_64-linux. OK?
> 
> Humm, so why add -fpeel-all-loops?  I don't think -funroll-all-loops
> is useful.
It is mostly there to trigger the transform to see if it is useful.  Not
something you want to enable by default.

-fpeel-all-loops helps when you know your code has internal loops that
iterate a few times.  I.e. one can get a good speedup for the sudoku
solver benchmark because it has loops that iterate either once or 10 times.

http://www.ucw.cz/~hubicka/papers/amd64/node4.html also claims that
-funroll-all-loops improves specint by 2.5%, while -funroll-loops by
2.23%, so it seemed somewhat useful back then.
> 
> Did you check code-size/compile-time/performance effects of enabling
> -fpeel-loops at -O3 for, say, SPEC CPU 2006?

Martin Liska ran v6 of the patch on SPEC2006 (not with the latest fixes to
the heuristics).  Without FDO the loop peeling triggers only for loops where
we have likely_upper_bound != upper_bound. We do not predict that too often
(we may in the future, as there is room for improvement in niter).  The code
size effect was +0.9% for SPECint and +2.2% on SPECfp.  The off-noise
improvements were vrp 94.5->89.7 and John the ripper 106.9->100 (less than
0.1% in geomavg).  I have cut down the code size effects since then (there
was a bug that made us peel without control when maxiter overflowed), but I
did not re-run full specs since then.

My motivation was mainly to reduce the number of optimizations that are not
good enough to be enabled by default, and also the observation that it
helps some benchmarks.

We can re-run benchmarks with the current patch after fixing the profile
update issues I will send shortly.  No fine-tuning of the parameters was
done and I guess they are still set the way they were set for RTL peeling
in 2003.

Honza


[PATCH][RFC] Remove ifcvt_repair_bool_pattern, re-do bool patterns

2016-05-30 Thread Richard Biener

The following patch removes the restriction on seeing a tree of stmts
in vectorizer bool pattern detection (aka single-use).  With this
it is no longer necessary to unshare DEFs in ifcvt_repair_bool_pattern
and that compile-time hog can go (it's now enabled unconditionally for GCC 7).

Instead the pattern detection code will now "unshare" the condition tree
for each bool pattern root by means of adding all pattern stmts of the
condition tree to its pattern def sequence (so we still get some
unnecessary copying, worst-case quadratic rather than exponential).

Ilya - I had to disable the

      tree mask_type = get_mask_type_for_scalar_type (TREE_TYPE (rhs1));
      if (mask_type
          && expand_vec_cmp_expr_p (comp_vectype, mask_type))
        return false;

check you added to check_bool_pattern to get any coverage for bool
patterns on x86_64.  Doing that regresses

FAIL: gcc.target/i386/mask-pack.c scan-tree-dump-times vect "vectorized 1 loops" 10
FAIL: gcc.target/i386/mask-unpack.c scan-tree-dump-times vect "vectorized 1 loops" 10
FAIL: gcc.target/i386/pr70021.c scan-tree-dump-times vect "vectorized 1 loops" 2

so somehow bool patterns mess up things here (I didn't investigate).
The final patch will enable the above path again, avoiding the regression.

Yuri - I suppose you have a larger set of testcases using OMP simd
or other forced vectorization you added ifcvt_repair_bool_pattern for.
I'd appreciate testing (and testcases if anything fails unexpectedly).

Testing on other targets is of course appreciated as well.

Bootstrapped and tested on x86_64-unknown-linux-gnu (with the #if 0).

Comments?

I agree with Ilya elsewhere to remove bool patterns completely
(as a first step making them apply to individual stmts).  We do
currently not support mixed cond/non-cond uses anyway, like

_Bool a[64];
unsigned char b[64];

void foo (void)
{
  for (int i = 0; i < 64; ++i)
{
  _Bool x = a[i] && b[i] < 10;
  a[i] = x;
}
}

and stmt-local "patterns" can be added when vectorizing the stmt
(like in the above case a tem = a[i] != 0 ? -1 : 0).  Doing
bool "promotion" optimally requires a better vectorizer IL
(similar to placing of shuffles).
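
[Editorial sketch of such a stmt-local "pattern", in pseudo-gimple; an
assumed shape, not taken from the patch:

  /* Promote the bool load right at its use, instead of building a
     separate pattern tree for the whole condition:  */
  tem_1 = a[i] != 0 ? -1 : 0;
  ... uses of a[i] in the condition become uses of tem_1 ...
]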

Thanks,
Richard.

2016-05-30  Richard Biener  

PR tree-optimization/71261
* tree-vect-patterns.c (check_bool_pattern): Gather a hash-set
of stmts successfully put in the bool pattern.  Remove
single-use restriction.
(adjust_bool_pattern_cast): Add cast at the use site via the
pattern def sequence.
(adjust_bool_pattern): Remove recursion, maintain a hash-map
of patterned defs.  Use the pattern def sequence instead of
multiple independent patterns.
(sort_after_uid): New qsort compare function.
(adjust_bool_stmts): New function to process stmts in the bool
pattern in IL order.
(vect_recog_bool_pattern): Adjust.
* tree-if-conv.c (ifcvt_split_def_stmt): Remove.
(ifcvt_walk_pattern_tree): Likewise.
(stmt_is_root_of_bool_pattern): Likewise.
(ifcvt_repair_bool_pattern): Likewise.
(tree_if_conversion): Do not call ifcvt_repair_bool_pattern.

* gcc.dg/torture/vect-bool-1.c: New testcase.

Index: gcc/tree-vect-patterns.c
===================================================================
*** gcc/tree-vect-patterns.c.orig	2016-05-27 14:32:49.778686018 +0200
--- gcc/tree-vect-patterns.c	2016-05-30 11:06:23.563532173 +0200
*************** vect_recog_mixed_size_cond_pattern (vec<
*** 2888,2897 ****
  /* Helper function of vect_recog_bool_pattern.  Called recursively, return
 true if bool VAR can and should be optimized that way.  Assume it shouldn't
 in case it's a result of a comparison which can be directly vectorized into
!a vector comparison.  */
  
  static bool
! check_bool_pattern (tree var, vec_info *vinfo)
  {
gimple *def_stmt;
enum vect_def_type dt;
--- 2888,2898 ----
  /* Helper function of vect_recog_bool_pattern.  Called recursively, return
 true if bool VAR can and should be optimized that way.  Assume it shouldn't
 in case it's a result of a comparison which can be directly vectorized into
!a vector comparison.  Fills in STMTS with all stmts visited during the
!walk.  */
  
  static bool
! check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
  {
gimple *def_stmt;
enum vect_def_type dt;
*************** check_bool_pattern (tree var, vec_info *
*** 2907,2943 ****
if (!is_gimple_assign (def_stmt))
  return false;
  
!   if (!has_single_use (var))
! return false;
  
rhs1 = gimple_assign_rhs1 (def_stmt);
rhs_code = gimple_assign_rhs_code (def_stmt);
switch (rhs_code)
  {
  case SSA_NAME:
!   return check_bool_pattern (rhs1, vinfo);
  
  CASE_CONVERT:
if ((TYPE_PRECISION (TREE_TYPE (rhs1)) != 1
   || !TYPE_UNSIGNED (TREE_TYPE (rhs1)))
  && TREE_CODE 

Re: [PATCH, PR69068] Handle 3-arg phi in copy_bb_and_scalar_dependences

2016-05-30 Thread Richard Biener
On Mon, 30 May 2016, Tom de Vries wrote:

> On 30/05/16 11:46, Richard Biener wrote:
> > > This patch fixes the assert conservatively by aborting graphite code
> > > >generation when encountering a phi with more than two arguments in
> > > >copy_bb_and_scalar_dependences.
> > > >
> > > >Bootstrapped and reg-tested on x86_64.
> > > >
> > > >OK for trunk, 6 branch?
> 
> > Did you check if simply returning false from bb_contains_loop_phi_nodes
> > instead of asserting works?  The caller has an 'else' that is supposed
> > to handle condition PHIs.  After all it already handles one predecessor
> > specially ...  Thus
> > 
> >    if (EDGE_COUNT (bb->preds) != 2)
> >      return false;
> > 
> > should work here.
> 
> Unfortunately, that doesn't work. We run into another assert in
> copy_cond_phi_nodes:
> ...
>   /* Cond phi nodes should have exactly two arguments.  */
>   gcc_assert (2 == EDGE_COUNT (bb->preds));
> ...

Hah.  So can we still do my suggested change and bail out conservatively
in Cond PHI node handling instead?  Because the PHI node is clearly
_not_ a loop header PHI and the cond phi assert is also a latent bug.
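
Concretely, something like this (a sketch only, untested; it assumes
copy_cond_phi_nodes can set the translator's codegen_error flag the way
copy_bb_and_scalar_dependences does in the posted patch):

  /* In copy_cond_phi_nodes: a cond phi with more than two arguments
     is not handled, so give up on code generation instead of
     asserting.  */
  if (EDGE_COUNT (bb->preds) != 2)
    {
      codegen_error = true;
      return false;
    }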

> > Or replace this function with bb_loop_header_p (bb)
> > (works w/o loop info, the function seems to depend on loop info and
> > thus simply checking bb->loop_father->header == bb should work as well).
> 
> I think that will run into the same assert.

Yes.

Richard.

> Thanks,
> - Tom
> 
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH] match.pd: optimize unsigned mul overflow check

2016-05-30 Thread Richard Biener
On Mon, May 30, 2016 at 9:14 AM, Alexander Monakov  wrote:
> On Sun, 29 May 2016, Marc Glisse wrote:
>> On Sat, 28 May 2016, Alexander Monakov wrote:
>>
>> > For unsigned A, B, 'A > -1 / B' is a nice predicate for checking whether
>> > 'A*B'
>> > overflows (or 'B && A > -1 / B' if B may be zero).  Let's optimize it to an
>> > invocation of __builtin_mul_overflow to avoid the divide operation.
>>
>> I forgot to ask earlier: what does this give for modes / platforms where
>> umulv4 does not have a specific implementation? Is the generic implementation
>> worse than A>-1/B, in which case we may want to check optab_handler before
>> doing the transformation? Or is it always at least as good?
>
> If umulv4 is unavailable (which today is everywhere except x86), gcc
> falls back as follows.  First, it tries to see if doing a multiplication in a
> 2x wider type is possible (which it usually is, as gcc supports __int128_t on
> 64-bit platforms and 64-bit long long on 32-bit platforms), then it looks at
> high bits of the 2x wide product.  This should boil down to doing a 'high
> multiply' instruction if original operands' type matches register size, and a
> normal multiply + masking high bits if the type is smaller than register.
>
> Second, if the above fails (e.g. with 64-bit operands on a 32-bit platform),
> then gcc emits a sequence that performs the multiplication by parts in a 2x
> narrower type.
>
> I think the first, more commonly taken, fallback path results in an
> always-good code. In the second case, the eliminated 64-bit divide is unlikely
> to have a direct hw support; e.g., on i386 it's a library call to __udivdi3.
> This makes the transformation a likely loss for code size, a likely win for
> performance.  It could be better if GCC could CSE REALPART (IFN_MUL_OVERFLOW)
> with A*B on gimple.

CCing Jakub who wrote the tree-ssa-math-opts.c code last year.  I remember
we discussed using match.pd but ended up not doing it there, though I don't
remember the exact reason.  It would be nice to have these in one place
rather than split.

As for umulv4 handling - users can write overflow checks using the builtins,
so we had better expand them to the optimal form for each target; thus
canonicalizing them to the IFN looks reasonable to me.  The plus/minus case
in tree-ssa-math-opts.c _does_ disable itself if no uaddv is available
though.

As for the division by zero thing - division by zero invokes undefined
behavior.  Yes, with -fnon-call-exceptions you could catch this as an
exception, but you shouldn't be able to without a -fdivision-by-zero flag.
And yes, we're quite inconsistent in folding of, say, 0/0 - but that is due
to warning code (historically).  I'd rather ignore that bit in folding and
make us consistent here (hopefully without regressing in diagnostics).

Richard.

> Thanks.
> Alexander
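
For reference, the rewrite under discussion written out as plain C - a
sketch only, not the match.pd pattern itself; both predicates are true
exactly when a * b would overflow:

int
overflows_before (unsigned a, unsigned b)
{
  return b && a > (unsigned) -1 / b;   /* costs a divide */
}

int
overflows_after (unsigned a, unsigned b)
{
  unsigned prod;
  return __builtin_mul_overflow (a, b, &prod);   /* IFN_MUL_OVERFLOW */
}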


Re: [PATCH2][PR71252] Fix insertion point of stmt_to_insert

2016-05-30 Thread Richard Biener
On Mon, May 30, 2016 at 2:35 AM, kugan
 wrote:
>
>
> On 28/05/16 01:28, Kugan Vivekanandarajah wrote:
>>
>> Hi Richard,
>>
>> This fixes the insertion point of stmt_to_insert based on your comments.
>> In insert_stmt_before_use, I now use find_insert_point so that we insert
>> the stmt_to_insert after its operands are defined. This means that we
>> now have to handle both insert-before and insert-after cases.
>>
>> I also factored out uses of insert_stmt_before_use.
>>
>> I tested this with:
>> ./build/gcc/f951 cp2k_single_file.f90 -O3 -ffast-math -march=westmere
>>
>> I am running bootstrap and regression testing on x86_64-linux-gnu. Is
>> this OK for trunk if the testing is fine?  I will also test with other
>> test cases from relevant PRs.
>>
>>
> Hi,
>
> Here is the tested patch. I made a slight change to reflect the pattern used
> everywhere else in reassoc.
>
> i.e., in insert_stmt_before_use, after finding the insertion point I now do:
>
> +  if (stmt == insert_point)
> +gsi_insert_before (, stmt_to_insert, GSI_NEW_STMT);
> +  else
> +insert_stmt_after (stmt_to_insert, insert_point);
>
> This is consistent with what is done in other places.
>
> I tested the patch with CPU2006 and bootstrapped and regression tested on
> x86_64-none-linux-gnu with no new regressions.
>
> Is this OK for trunk?

Ok.

Thanks,
Richard.

> Thanks,
> Kugan
>
> gcc/testsuite/ChangeLog:
>
> 2016-05-28  Kugan Vivekanandarajah  
>
> PR middle-end/71269
> PR middle-end/71292
> * gcc.dg/tree-ssa/pr71269.c: New test.
> * gcc.dg/tree-ssa/pr71292.c: New test.
>
> gcc/ChangeLog:
>
> 2016-05-28  Kugan Vivekanandarajah  
>
> PR middle-end/71269
> PR middle-end/71252
>
> * tree-ssa-reassoc.c (insert_stmt_before_use): Use find_insert_point so
> that the inserted stmt will not dominate stmts that define its operands.
> (rewrite_expr_tree): Add stmt_to_insert before adding the use stmt.
> (rewrite_expr_tree_parallel): Likewise.


Re: [PATCH, PR69068] Handle 3-arg phi in copy_bb_and_scalar_dependences

2016-05-30 Thread Tom de Vries

On 30/05/16 11:46, Richard Biener wrote:

> > This patch fixes the assert conservatively by aborting graphite code
> > generation when encountering a phi with more than two arguments in
> > copy_bb_and_scalar_dependences.
> >
> > Bootstrapped and reg-tested on x86_64.
> >
> > OK for trunk, 6 branch?
>
> Did you check if simply returning false from bb_contains_loop_phi_nodes
> instead of asserting works?  The caller has an 'else' that is supposed
> to handle condition PHIs.  After all it already handles one predecessor
> specially ...  Thus
>
>    if (EDGE_COUNT (bb->preds) != 2)
>      return false;
>
> should work here.

Unfortunately, that doesn't work. We run into another assert in
copy_cond_phi_nodes:

...
  /* Cond phi nodes should have exactly two arguments.  */
  gcc_assert (2 == EDGE_COUNT (bb->preds));
...

> Or replace this function with bb_loop_header_p (bb)
> (works w/o loop info, the function seems to depend on loop info and
> thus simply checking bb->loop_father->header == bb should work as well).

I think that will run into the same assert.

Thanks,
- Tom




Re: C PATCH to add -Wswitch-unreachable (PR c/49859)

2016-05-30 Thread Florian Weimer

On 05/23/2016 12:16 PM, Marek Polacek wrote:


> > Can we document what a programmer needs to do to suppress the warning? My
> > worry is that the warning will be too unpredictable to be useful.
>
> Why would you suppress the warning?  Just delete the code that cannot be
> executed anyway.  And what is unpredictable?  You don't want the compiler
> to warn on
>    switch(x) {
>      found = 1;
>    case 1:
>      ...
> ?

Oh, I misread what the patch was about.  It looks sufficiently
predictable to do this at the GIMPLE level (but it's odd if that's indeed
the architecturally correct choice).
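
For archive readers, a minimal example of the kind of statement the
proposed warning flags (illustrative; the exact diagnostic wording is
whatever the patch defines):

int
f (int x)
{
  int found = 0;
  switch (x)
    {
      found = 1;   /* never executed: control jumps to a case label */
    case 1:
      return found;
    default:
      return -1;
    }
}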


Thanks,
Florian



Re: [SPARC] Support for --with-{cpu,tune}-{32,64} in sparc*-* targets

2016-05-30 Thread Jose E. Marchesi

> Tested in sparc64-linux-gnu, sparcv9-linux-gnu and sparc-sun-solaris2.11.
> 
> 2016-05-25  Jose E. Marchesi  
> 
>   * config.gcc (sparc*-*-*): Support cpu_32, cpu_64, tune_32 and
>   tune_64.
>   * doc/install.texi (--with-cpu-32, --with-cpu-64): Document
>   support on SPARC.
>   * config/sparc/linux64.h (OPTION_DEFAULT_SPECS): Add entries for
>   cpu_32, cpu_64, tune_32 and tune_64.
>   * config/sparc/sol2.h (OPTION_DEFAULT_SPECS): Likewise.

OK for mainline, thanks.

Thanks.  I would need someone to install it, as I don't have commit
access to the svn.


Re: [PATCH, PR69068] Handle 3-arg phi in copy_bb_and_scalar_dependences

2016-05-30 Thread Richard Biener
On Mon, 30 May 2016, Tom de Vries wrote:

> Hi,
> 
> this patch fixes graphite PR69068, a 6/7 regression.
> 
> 
> I.
> 
> Consider this test-case:
> ...
> int qo;
> int zh[2];
> 
> void
> td (void)
> {
>   int ly, en;
>   for (ly = 0; ly < 2; ++ly)
> for (en = 0; en < 2; ++en)
>   zh[en] = ((qo == 0) || (((qo * 2) != 0))) ? 1 : -1;
> }
> ...
> 
> When compiling with -O1 -fgraphite-identity, we run into an assert in
> bb_contains_loop_phi_nodes:
> ...
> pr-graphite.c: In function ‘td’:
> pr-graphite.c:5:1: internal compiler error: in bb_contains_loop_phi_nodes, at
> graphite-isl-ast-to-gimple.c:1078
> td(void)
> ^~
> ...
> 
> 
> II.
> 
> At graphite0, we have a 3 argument phi in bb 7:
> ...
> td ()
> {
>   int en;
>   int ly;
>   int qo.1_1;
>   int _3;
>   int iftmp.0_6;
> 
>   :
>   qo.1_1 = qo;
>   _3 = qo.1_1 * 2;
>   goto ;
> 
>   :
> 
>   :
>   # en_19 = PHI <0(10), en_12(3)>
>   if (qo.1_1 == 0)
> goto ;
>   else
> goto ;
> 
>   :
>   if (_3 != 0)
> goto ;
>   else
> goto ;
> 
>   :
> 
>   :
>   # iftmp.0_6 = PHI <1(6), -1(5), 1(4)>
>   zh[en_19] = iftmp.0_6;
>   en_12 = en_19 + 1;
>   if (en_12 <= 1)
> goto ;
>   else
> goto ;
> 
>   :
>   ly_13 = ly_18 + 1;
>   if (ly_13 <= 1)
> goto ;
>   else
> goto ;
> 
>   :
> 
>   :
>   # ly_18 = PHI 
>   goto ;
> 
>   :
>   return;
> 
> }
> ...
> 
> When instantiating the gimple from the graphite ast, we arrive at
> copy_bb_and_scalar_dependences with bb.index == 7, where we execute:
> ...
>   if (num_phis > 0 && bb_contains_loop_phi_nodes (bb))
> {
> ...
> 
> Subsequently we run into this assert, because
> EDGE_COUNT (bb->preds) == 3:
> ...
> /* Return true when BB contains loop phi nodes.  A loop phi node is the
>loop header containing phi nodes which has one init-edge and one
>back-edge.  */
> 
> static bool
> bb_contains_loop_phi_nodes (basic_block bb)
> {
>   gcc_assert (EDGE_COUNT (bb->preds) <= 2);
> ...
> 
> 
> III.
> 
> This patch fixes the assert conservatively by aborting graphite code
> generation when encountering a phi with more than two arguments in
> copy_bb_and_scalar_dependences.
> 
> Bootstrapped and reg-tested on x86_64.
> 
> OK for trunk, 6 branch?

Did you check if simply returning false from bb_contains_loop_phi_nodes
instead of asserting works?  The caller has an 'else' that is supposed
to handle condition PHIs.  After all it already handles one predecessor
specially ...  Thus

  if (EDGE_COUNT (bb->preds) != 2)
    return false;

should work here.  Or replace this function with bb_loop_header_p (bb)
(works w/o loop info, the function seems to depend on loop info and
thus simply checking bb->loop_father->header == bb should work as well).

Thanks,
Richard.

[PATCH, PR69068] Handle 3-arg phi in copy_bb_and_scalar_dependences

2016-05-30 Thread Tom de Vries

Hi,

this patch fixes graphite PR69068, a 6/7 regression.


I.

Consider this test-case:
...
int qo;
int zh[2];

void
td (void)
{
  int ly, en;
  for (ly = 0; ly < 2; ++ly)
for (en = 0; en < 2; ++en)
  zh[en] = ((qo == 0) || (((qo * 2) != 0))) ? 1 : -1;
}
...

When compiling with -O1 -fgraphite-identity, we run into an assert in 
bb_contains_loop_phi_nodes:

...
pr-graphite.c: In function ‘td’:
pr-graphite.c:5:1: internal compiler error: in 
bb_contains_loop_phi_nodes, at graphite-isl-ast-to-gimple.c:1078

td(void)
^~
...


II.

At graphite0, we have a 3 argument phi in bb 7:
...
td ()
{
  int en;
  int ly;
  int qo.1_1;
  int _3;
  int iftmp.0_6;

  :
  qo.1_1 = qo;
  _3 = qo.1_1 * 2;
  goto ;

  :

  :
  # en_19 = PHI <0(10), en_12(3)>
  if (qo.1_1 == 0)
goto ;
  else
goto ;

  :
  if (_3 != 0)
goto ;
  else
goto ;

  :

  :
  # iftmp.0_6 = PHI <1(6), -1(5), 1(4)>
  zh[en_19] = iftmp.0_6;
  en_12 = en_19 + 1;
  if (en_12 <= 1)
goto ;
  else
goto ;

  :
  ly_13 = ly_18 + 1;
  if (ly_13 <= 1)
goto ;
  else
goto ;

  :

  :
  # ly_18 = PHI 
  goto ;

  :
  return;

}
...

When instantiating the gimple from the graphite ast, we arrive at 
copy_bb_and_scalar_dependences with bb.index == 7, where we execute:

...
  if (num_phis > 0 && bb_contains_loop_phi_nodes (bb))
{
...

Subsequently we run into this assert, because
EDGE_COUNT (bb->preds) == 3:
...
/* Return true when BB contains loop phi nodes.  A loop phi node is the
   loop header containing phi nodes which has one init-edge and one
   back-edge.  */

static bool
bb_contains_loop_phi_nodes (basic_block bb)
{
  gcc_assert (EDGE_COUNT (bb->preds) <= 2);
...


III.

This patch fixes the assert conservatively by aborting graphite code 
generation when encountering a phi with more than two arguments in 
copy_bb_and_scalar_dependences.


Bootstrapped and reg-tested on x86_64.

OK for trunk, 6 branch?

Thanks,
- Tom
Handle 3-arg phi in copy_bb_and_scalar_dependences

2016-05-30  Tom de Vries  

	PR tree-optimization/69068
	* graphite-isl-ast-to-gimple.c (copy_bb_and_scalar_dependences): Handle
	phis with more than two args.

	* gcc.dg/graphite/pr69068.c: New test.

---
 gcc/graphite-isl-ast-to-gimple.c|  7 +++
 gcc/testsuite/gcc.dg/graphite/pr69068.c | 14 ++
 2 files changed, 21 insertions(+)

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index ff1d91f..c176db0 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -2725,6 +2725,13 @@ copy_bb_and_scalar_dependences (basic_block bb, edge next_e, vec<tree> iv_map)
 }
   else
 {
+  if (num_phis > 0
+	  && EDGE_COUNT (bb->preds) > 2)
+	{
+	  codegen_error = true;
+	  return NULL;
+	}
+
   new_bb = split_edge (next_e);
   if (num_phis > 0 && bb_contains_loop_phi_nodes (bb))
 	{
diff --git a/gcc/testsuite/gcc.dg/graphite/pr69068.c b/gcc/testsuite/gcc.dg/graphite/pr69068.c
new file mode 100644
index 000..0abea06
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/graphite/pr69068.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fgraphite-identity" } */
+
+int qo;
+int zh[2];
+
+void
+td (void)
+{
+  int ly, en;
+  for (ly = 0; ly < 2; ++ly)
+for (en = 0; en < 2; ++en)
+  zh[en] = ((qo == 0) || (((qo * 2) != 0))) ? 1 : -1;
+}


Re: Thoughts on memcmp expansion (PR43052)

2016-05-30 Thread Florian Weimer

On 01/15/2016 05:58 PM, Bernd Schmidt wrote:


> One question Richard posed in the comments: why aren't we optimizing
> small constant size memcmps other than size 1 to *s == *q? The reason is
> the return value of memcmp, which implies byte-sized operation
> (incidentally, the use of SImode in the cmpmem/cmpstr patterns is really
> odd). It's possible to work around this, but expansion becomes a little
> more tricky (subtract after bswap, maybe).

When I did this (big-endian conversion, wide subtract, sign) to the
tail difference check in glibc's x86_64 memcmp, it was actually a bit
faster than isolating the differing byte and returning its difference,
even for non-random data such as encountered during qsort:
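
A C sketch of that sequence for an 8-byte tail (illustrative only; it
assumes unaligned 8-byte loads are cheap, as on x86_64, and a
little-endian target, hence the bswap):

#include <stdint.h>
#include <string.h>

static int
tail_diff (const void *p, const void *q)
{
  uint64_t a, b;
  memcpy (&a, p, 8);
  memcpy (&b, q, 8);
  a = __builtin_bswap64 (a);   /* big-endian order: an unsigned   */
  b = __builtin_bswap64 (b);   /* compare now orders like memcmp  */
  return (a > b) - (a < b);    /* sign of the wide subtract       */
}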



>  * Expand __memcmp_eq for small constant sizes with loads and
>    comparison, fall back to a memcmp call.


Should we export such a function from glibc?  I expect it's fairly 
common.  Computing the tail difference costs a few cycles.


It may also make sense to call a streamlined implementation if you have 
interesting alignment information (for x86_64, that would be at least 16 
on one or both inputs, so it's perhaps not easy to come by).


Florian



Re: Enable loop peeling at -O3

2016-05-30 Thread Richard Biener
On Sat, 28 May 2016, Jan Hubicka wrote:

> Hello,
> thanks for the feedback. I updated the patch and also noticed that
> -fpeel-all-loops gives up when the upper bound is known but large, and
> when max-peel-insns is too small to permit peeling max-peel-times.
> This patch also updates pr61743-2.c, which is now peeled before we
> manage to propagate the proper loop bound.
> 
> Bootstrapped/regtested x86_64-linux. OK?

Humm, so why add -fpeel-all-loops?  I don't think -funroll-all-loops
is useful.

Did you check code-size/compile-time/performance effects of enabling
-fpeel-loops at -O3 for, say, SPEC CPU 2006?

Thanks,
Richard.
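
For context, what peeling does, schematically (this illustration is not
taken from the patch):

  /* Before: a loop expected to iterate only a few times.  */
  for (int i = 0; i < n; i++)
    body (i);

  /* After peeling two iterations: the peeled copies can be simplified
     or folded away when n is known to be small.  */
  int i = 0;
  if (i < n) { body (i); i++; }
  if (i < n) { body (i); i++; }
  for (; i < n; i++)
    body (i);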

> Honza
> 
>   * common.opt (flag_peel_all_loops): New option.
>   * doc/invoke.texi: (-fpeel-loops): Update documentation.
>   (-fpeel-all-loops): Document.
>   * opts.c (default_options): Add OPT_fpeel_loops to -O3+.
>   * toplev.c (process_options): flag_peel_all_loops implies
>   flag_peel_loops.
>   * tree-ssa-loop-ivcanon.c (try_peel_loop): Update comment; handle
>   -fpeel-all-loops, use likely estimates.
> 
>   * gcc.dg/tree-ssa/peel1.c: New testcase.
>   * gcc.dg/tree-ssa/peel2.c: New testcase.
>   * gcc.dg/tree-ssa/pr61743-1.c: Pass -fno-peel-loops.
>   * gcc.dg/tree-ssa/pr61743-2.c: Pass -fno-peel-loops.
> Index: common.opt
> ===
> --- common.opt(revision 236815)
> +++ common.opt(working copy)
> @@ -1840,6 +1840,10 @@ fpeel-loops
>  Common Report Var(flag_peel_loops) Optimization
>  Perform loop peeling.
>  
> +fpeel-all-loops
> +Common Report Var(flag_peel_all_loops) Optimization
> +Perform loop peeling of all loops.
> +
>  fpeephole
>  Common Report Var(flag_no_peephole,0) Optimization
>  Enable machine specific peephole optimizations.
> Index: doc/invoke.texi
> ===
> --- doc/invoke.texi   (revision 236815)
> +++ doc/invoke.texi   (working copy)
> @@ -375,7 +375,7 @@ Objective-C and Objective-C++ Dialects}.
>  -fno-sched-interblock -fno-sched-spec -fno-signed-zeros @gol
>  -fno-toplevel-reorder -fno-trapping-math -fno-zero-initialized-in-bss @gol
>  -fomit-frame-pointer -foptimize-sibling-calls @gol
> --fpartial-inlining -fpeel-loops -fpredictive-commoning @gol
> +-fpartial-inlining -fpeel-loops -fpeel-all-loops -fpredictive-commoning @gol
>  -fprefetch-loop-arrays @gol
>  -fprofile-correction @gol
>  -fprofile-use -fprofile-use=@var{path} -fprofile-values @gol
> @@ -6338,7 +6338,8 @@ by @option{-O2} and also turns on the @o
>  @option{-fgcse-after-reload}, @option{-ftree-loop-vectorize},
>  @option{-ftree-loop-distribute-patterns}, @option{-fsplit-paths}
>  @option{-ftree-slp-vectorize}, @option{-fvect-cost-model},
> -@option{-ftree-partial-pre} and @option{-fipa-cp-clone} options.
> +@option{-ftree-partial-pre}, @option{-fpeel-loops}
> +and @option{-fipa-cp-clone} options.
>  
>  @item -O0
>  @opindex O0
> @@ -8593,7 +8594,7 @@ data about values of expressions in the
>  With @option{-fbranch-probabilities}, it reads back the data gathered
>  from profiling values of expressions for usage in optimizations.
>  
> -Enabled with @option{-fprofile-generate} and @option{-fprofile-use}.
> +Enabled with @option{-fprofile-generate} and/or @option{-fprofile-use}.
>  
>  @item -fprofile-reorder-functions
>  @opindex fprofile-reorder-functions
> @@ -8661,10 +8662,17 @@ the loop is entered.  This usually makes
>  @item -fpeel-loops
>  @opindex fpeel-loops
>  Peels loops for which there is enough information that they do not
> -roll much (from profile feedback).  It also turns on complete loop peeling
> -(i.e.@: complete removal of loops with small constant number of iterations).
> -
> -Enabled with @option{-fprofile-use}.
> +roll much (from profile feedback or static analysis).  It also turns on
> +complete loop peeling (i.e.@: complete removal of loops with small constant
> +number of iterations).
> +
> +Enabled with @option{-O3} and @option{-fprofile-use}.
> +
> +@item -fpeel-all-loops
> +@opindex fpeel-all-loops
> +Peel all loops, even if their number of iterations is uncertain when
> +the loop is entered.  For loops with large number of iterations this leads
> +to wasted code size.
>  
>  @item -fmove-loop-invariants
>  @opindex fmove-loop-invariants
> Index: opts.c
> ===
> --- opts.c(revision 236815)
> +++ opts.c(working copy)
> @@ -535,6 +535,7 @@ static const struct default_options defa
>  { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, 
> VECT_COST_MODEL_DYNAMIC },
>  { OPT_LEVELS_3_PLUS, OPT_fipa_cp_clone, NULL, 1 },
>  { OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 },
> +{ OPT_LEVELS_3_PLUS, OPT_fpeel_loops, NULL, 1 },
>  
>  /* -Ofast adds optimizations to -O3.  */
>  { OPT_LEVELS_FAST, OPT_ffast_math, NULL, 1 },
> Index: testsuite/gcc.dg/tree-ssa/peel1.c
> 

Re: Moving backwards/FSM threader into its own pass

2016-05-30 Thread Richard Biener
On Fri, May 27, 2016 at 6:30 PM, Jeff Law  wrote:
>
> It's been my plan since finally wrapping my head around Bodik's thesis to
> revamp how we handle jump threading to use some of the principles from his
> thesis.  In particular, the back substitution and simplification model feels
> like the right long term direction.
>
> Sebastian's FSM threader was the first step on that path (gcc-5). Exploiting
> that threader for more than just FSM loops was the next big step (gcc-6).
>
> This patch takes the next step -- dis-entangling that new jump threading
> code from the old threading code and VRP/DOM.
>
> The key thing to realize here is that the backwards (FSM) jump threader does
> not inherently need the DOM tables nor the ASSERT_EXPRs from VRP to do its
> job.  ie, it can and should run completely independently of DOM/VRP (though
> one day it may exploit range information that a prior VRP pass has
> computed).
>
> By moving the backwards threader into its own pass, we can run it prior to
> DOM/VRP, which allow DOM/VRP to work on a simpler CFG with larger extended
> basic blocks.

Ok, but the placement (and number) of threading passes then no longer depends
on the DOM/VRP passes - and as you placed the threading passes _before_ those
passes, the threading itself does not benefit from DOM/VRP but only from
previous optimization passes.

I see this as an opportunity to remove some of them ;)  I now see in the main
optimization pipeline

  NEXT_PASS (pass_fre);
  NEXT_PASS (pass_thread_jumps);
  NEXT_PASS (pass_vrp, true /* warn_array_bounds_p */);

position makes sense - FRE removed redundancies and fully copy/constant
propagated the IL.

  NEXT_PASS (pass_sra);
  /* The dom pass will also resolve all __builtin_constant_p calls
 that are still there to 0.  This has to be done after some
 propagations have already run, but before some more dead code
 is removed, and this place fits nicely.  Remember this when
 trying to move or duplicate pass_dominator somewhere earlier.  */
  NEXT_PASS (pass_thread_jumps);
  NEXT_PASS (pass_dominator, true /* may_peel_loop_headers_p */);

this position OTOH doesn't make much sense as IL cleanup is missing
after SRA and previous opts.  After loop we now have

  NEXT_PASS (pass_tracer);
  NEXT_PASS (pass_thread_jumps);
  NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
  NEXT_PASS (pass_strlen);
  NEXT_PASS (pass_thread_jumps);
  NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);

I don't think we want two threadings so close together.  It makes some sense
to have a threading _after_ DOM but before VRP (DOM cleaned up the IL).

So that would leave two from your four passes and expose the opportunity
to re-add one during early-opts, also after FRE.  That one should be
throttled down to operate in "-Os" mode though.

So, can you see what removing the two threading passes that don't make
sense to me do to your statistics?  And check whether a -Os-like threading
can be done early?
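
For archive readers, the transform all of these passes perform,
schematically (not code from the patch):

  if (a > 0) x = 1; else x = 0;
  ...                       /* no redefinition of x in between */
  if (x != 0)               /* along the 'a > 0' path this test is
                               known true, so the threader duplicates
                               the block and wires that path straight
                               to the call of f ()  */
    f ();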

Thanks,
Richard.

> The removal of unexecutable paths before VRP also has the nice effect of
> also eliminating false positive warnings for some work Aldy is doing around
> out-of-bound array index warnings.
>
> We can remove all the calls to the backwards threader from the old style
> threader.  The way the FSM bits wired into the old threader caused redundant
> path evaluations.  That can be easily fixed with the FSM bits in their own
> pass.  The net is a 25% reduction in paths examined by the FSM threader.
>
> Finally, we ultimately end up threading more jumps.  I don't have the #s
> handy anymore, but when I ran this through my tester there was a clear
> decrease in the number of runtime jumps.
>
> So what are the downsides.
>
> With the threader in its own pass, we end up getting more calls into the CFG
> & SSA verification routines in a checking-enabled compiler.  So the
> compile-time improvement is lost for a checking-enabled compiler.
>
> The backward threader does not combine identical jump threading paths with
> different starting edges into a single jump threading path with multiple
> entry points.  This is primarily a codesize issue, but can have a secondary
> effect on performance.  I know how to fix this and it's on the list for
> gcc-7 along with further cleanups.
>
>
> Bootstrapped and regression tested on x86_64 linux.  Installing on the trunk
> momentarily.
>
> Jeff
>
> commit 35bd646a4834a68a49af9ccb5873362a0fc742ae
> Author: Jeff Law 
> Date:   Fri May 27 10:23:54 2016 -0600
>
> * tree-ssa-threadedge.c: Remove include of
> tree-ssa-threadbackward.h.
> (thread_across_edge): Remove calls to find_jump_threads_backwards.
> * passes.def: Add jump threading passes before DOM/VRP.
> * tree-ssa-threadbackward.c (find_jump_threads_backwards): Change
> argument to a basic block from an edge.  Remove tests which are
> 

Re: [visium] Split DImode arithmetical operations

2016-05-30 Thread Eric Botcazou
> This makes it so that DImode arithmetical operations are split into a pair
> of SImode operations in order to enable better scheduling.
> 
> Tested on visium-elf, applied on the mainline and 6 branch.

A minor follow-up, applied on the mainline and 6 branch.


2016-05-30  Eric Botcazou  

* config/visium/visium.c (visium_split_double_add): Minor tweaks.
(visium_expand_copysign): Use gen_int_mode directly.
(visium_compute_frame_size): Minor tweaks.

-- 
Eric Botcazou

Index: config/visium/visium.c
===
--- config/visium/visium.c	(revision 236827)
+++ config/visium/visium.c	(working copy)
@@ -2087,6 +2087,7 @@ visium_split_double_add (enum rtx_code c
   rtx op6 = gen_highpart (SImode, op0);
   rtx op7 = (op1 == const0_rtx ? op1 : gen_highpart (SImode, op1));
   rtx op8;
+  rtx x, pat, flags;
 
   /* If operand #2 is a small constant, then its high part is null.  */
   if (CONST_INT_P (op2))
@@ -2109,14 +2110,13 @@ visium_split_double_add (enum rtx_code c
 }
 
   /* This is the {add,sub,neg}si3_insn_set_flags pattern.  */
-  rtx x;
   if (op4 == const0_rtx)
 x = gen_rtx_NEG (SImode, op5);
   else
 x = gen_rtx_fmt_ee (code, SImode, op4, op5);
-  rtx pat = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+  pat = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
   XVECEXP (pat, 0, 0) = gen_rtx_SET (op3, x);
-  rtx flags = gen_rtx_REG (CC_NOOVmode, FLAGS_REGNUM);
+  flags = gen_rtx_REG (CC_NOOVmode, FLAGS_REGNUM);
   x = gen_rtx_COMPARE (CC_NOOVmode, shallow_copy_rtx (x), const0_rtx);
   XVECEXP (pat, 0, 1) = gen_rtx_SET (flags, x);
   emit_insn (pat);
@@ -2160,7 +2160,7 @@ visium_expand_copysign (rtx *operands, e
 	{
 	  long l;
 	  REAL_VALUE_TO_TARGET_SINGLE (*CONST_DOUBLE_REAL_VALUE (op1), l);
-	  op1 = force_reg (SImode, GEN_INT (trunc_int_for_mode (l, SImode)));
+	  op1 = force_reg (SImode, gen_int_mode (l, SImode));
 	}
 }
   else
@@ -3597,18 +3597,15 @@ visium_compute_frame_size (int size)
 int
 visium_initial_elimination_offset (int from, int to ATTRIBUTE_UNUSED)
 {
-  const int frame_size = visium_compute_frame_size (get_frame_size ());
   const int save_fp = current_frame_info.save_fp;
   const int save_lr = current_frame_info.save_lr;
   const int lr_slot = current_frame_info.lr_slot;
-  const int local_frame_offset
-= (save_fp + save_lr + lr_slot) * UNITS_PER_WORD;
   int offset;
 
   if (from == FRAME_POINTER_REGNUM)
-offset = local_frame_offset;
+offset = (save_fp + save_lr + lr_slot) * UNITS_PER_WORD;
   else if (from == ARG_POINTER_REGNUM)
-offset = frame_size;
+offset = visium_compute_frame_size (get_frame_size ());
   else
 gcc_unreachable ();
 


[gotools, libcc1] Update copyright dates

2016-05-30 Thread Rainer Orth
I noticed that the libcc1 copyright dates hadn't been updated since it
was initially added.  Looking around, the same applies to gotools.

This patch updates update-copyright.py to take care of that.  At the
same time, I checked that script, added appropriate entries for the
other missing directories, and removed libmudflap support.  I'm also
including the results of an update-copyright.py run on gotools and
libcc1, but am uncertain what to include in default_dirs.

Bootstrapped without regressions on i386-pc-solaris2.12, ok for
mainline?

Rainer

2016-05-25  Rainer Orth  

libcc1:
Update copyrights.

gotools:
Update copyrights.

contrib:
* update-copyright.py (LibMudflapFilter): Remove.
(GCCCmdLine.__init__): Add gotools, libcc1.
Remove libmudflap.
List unhandled intl, libcilkrts, libgo, liboffloadmic,
maintainer-scripts.

# HG changeset patch
# Parent  2977caa0bc2d9892e182af40fcd11c51d9ec921b
Update copyright dates

diff --git a/contrib/update-copyright.py b/contrib/update-copyright.py
--- a/contrib/update-copyright.py
+++ b/contrib/update-copyright.py
@@ -631,15 +631,6 @@ class LibJavaFilter (GenericFilter):
 return re.compile ('.*icSigCopyrightTag')
 return GenericFilter.get_line_filter (self, dir, filename)
 
-class LibMudflapFilter (GenericFilter):
-def __init__ (self):
-GenericFilter.__init__ (self)
-
-self.skip_dirs |= set ([
-# Handled separately.
-'testsuite',
-])
-
 class LibStdCxxFilter (GenericFilter):
 def __init__ (self):
 GenericFilter.__init__ (self)
@@ -724,30 +715,34 @@ class GCCCmdLine (CmdLine):
 self.add_dir ('gcc', GCCFilter())
 self.add_dir (os.path.join ('gcc', 'testsuite'), TestsuiteFilter())
 self.add_dir ('gnattools')
+self.add_dir ('gotools')
 self.add_dir ('include')
+# intl is imported from upstream.
 self.add_dir ('libada')
 self.add_dir ('libatomic')
 self.add_dir ('libbacktrace')
+self.add_dir ('libcc1')
+# libcilkrts is imported from upstream.
 self.add_dir ('libcpp', LibCppFilter())
 self.add_dir ('libdecnumber')
 # libffi is imported from upstream.
 self.add_dir ('libgcc', LibGCCFilter())
 self.add_dir ('libgfortran')
+# libgo is imported from upstream.
 self.add_dir ('libgomp')
 self.add_dir ('libiberty')
 self.add_dir ('libitm')
 self.add_dir ('libjava', LibJavaFilter())
 self.add_dir (os.path.join ('libjava', 'testsuite'), TestsuiteFilter())
-self.add_dir ('libmudflap', LibMudflapFilter())
-self.add_dir (os.path.join ('libmudflap', 'testsuite'),
-  TestsuiteFilter())
 self.add_dir ('libobjc')
+# liboffloadmic is imported from upstream.
 self.add_dir ('libquadmath')
-# libsanitiser is imported from upstream.
+# libsanitizer is imported from upstream.
 self.add_dir ('libssp')
 self.add_dir ('libstdc++-v3', LibStdCxxFilter())
 self.add_dir ('libvtv')
 self.add_dir ('lto-plugin')
+# maintainer-scripts maintainer-scripts
 # zlib is imported from upstream.
 
 self.default_dirs = [
@@ -761,7 +756,6 @@ class GCCCmdLine (CmdLine):
 'libgfortran',
 'libgomp',
 'libitm',
-'libmudflap',
 'libobjc',
 'libstdc++-v3',
 ]
diff --git a/gotools/Makefile.am b/gotools/Makefile.am
--- a/gotools/Makefile.am
+++ b/gotools/Makefile.am
@@ -1,5 +1,5 @@
 # Makefile for gotools
-#   Copyright 2015 Free Software Foundation, Inc.
+#   Copyright (C) 2015-2016 Free Software Foundation, Inc.
 #
 # This file is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
diff --git a/gotools/configure.ac b/gotools/configure.ac
--- a/gotools/configure.ac
+++ b/gotools/configure.ac
@@ -1,5 +1,5 @@
 # Configure script for gotools.
-#   Copyright 2015 Free Software Foundation, Inc.
+#   Copyright (C) 2015-2016 Free Software Foundation, Inc.
 #
 # This file is free software; you can redistribute it and/or modify it
 # under the terms of the GNU General Public License as published by
diff --git a/libcc1/Makefile.am b/libcc1/Makefile.am
--- a/libcc1/Makefile.am
+++ b/libcc1/Makefile.am
@@ -1,4 +1,4 @@
-## Copyright (C) 2014 Free Software Foundation, Inc.
+## Copyright (C) 2014-2016 Free Software Foundation, Inc.
 
 ## This file is part of GCC.
 
diff --git a/libcc1/callbacks.cc b/libcc1/callbacks.cc
--- a/libcc1/callbacks.cc
+++ b/libcc1/callbacks.cc
@@ -1,5 +1,5 @@
 /* Callback management.
-   Copyright (C) 2014 Free Software Foundation, Inc.
+   Copyright (C) 2014-2016 Free Software Foundation, Inc.
 
 This file is part of GCC.
 
diff --git 

Re: [ARM] implement division using vrecpe/vrecps with -funsafe-math-optimizations

2016-05-30 Thread Prathamesh Kulkarni
On 23 May 2016 at 14:59, Prathamesh Kulkarni
 wrote:
> On 5 February 2016 at 18:40, Prathamesh Kulkarni
>  wrote:
>> On 4 February 2016 at 16:31, Ramana Radhakrishnan
>>  wrote:
>>> On Sun, Jan 17, 2016 at 9:06 AM, Prathamesh Kulkarni
>>>  wrote:
 On 31 July 2015 at 15:04, Ramana Radhakrishnan
  wrote:
>
>
> On 29/07/15 11:09, Prathamesh Kulkarni wrote:
>> Hi,
>> This patch tries to implement division with multiplication by
>> reciprocal using vrecpe/vrecps
>> with -funsafe-math-optimizations and -freciprocal-math enabled.
>> Tested on arm-none-linux-gnueabihf using qemu.
>> OK for trunk ?
>>
>> Thank you,
>> Prathamesh
>>
>
> I've tried this in the past and never been convinced that 2 iterations 
> are enough to get to stability with this given that the results are only 
> precise for 8 bits / iteration. Thus I've always believed you need 3 
> iterations rather than 2 at which point I've never been sure that it's 
> worth it. So the testing that you've done with this currently is not 
> enough for this to go into the tree.
>
> I'd like this to be tested on a couple of different AArch32 
> implementations with a wider range of inputs to verify that the results 
> are acceptable as well as running something like SPEC2k(6) with at least 
> one iteration to ensure correctness.
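
The sequence under discussion, sketched with NEON intrinsics and the two
vrecps refinement steps the iteration-count question is about
(illustrative only, not the patch itself):

#include <arm_neon.h>

/* a/b via reciprocal estimate plus Newton-Raphson refinement.
   vrecpe gives roughly 8 correct bits; vrecps_f32 (b, r) computes
   2 - b*r, so each r *= (2 - b*r) step about doubles the precision.  */
float32x2_t
div_recip (float32x2_t a, float32x2_t b)
{
  float32x2_t r = vrecpe_f32 (b);
  r = vmul_f32 (r, vrecps_f32 (b, r));
  r = vmul_f32 (r, vrecps_f32 (b, r));
  return vmul_f32 (a, r);
}
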
 Hi,
 I got results of SPEC2k6 fp benchmarks:
 a15: +0.64% overall, 481.wrf: +6.46%
 a53: +0.21% overall, 416.gamess: -1.39%, 481.wrf: +6.76%
 a57: +0.35% overall, 481.wrf: +3.84%
 The other benchmarks had (almost) identical results.
>>>
>>> Thanks for the benchmarking results -  Please repost the patch with
>>> the changes that I had requested in my previous review - given it is
>>> now stage4 , I would rather queue changes like this for stage1 now.
>> Hi,
>> Please find the updated patch attached.
>> It passes testsuite for arm-none-linux-gnueabi, arm-none-linux-gnueabihf and
>> arm-none-eabi.
>> However the test-case added in the patch (neon-vect-div-1.c) fails to
>> get vectorized at -O2
>> for armeb-none-linux-gnueabihf.
>> Charles suggested me to try with -O3, which worked.
>> It appears the test-case fails to get vectorized with
>> -fvect-cost-model=cheap (which is default enabled at -O2)
>> and passes for -fno-vect-cost-model / -fvect-cost-model=dynamic
>>
>> I can't figure out why it fails with -fvect-cost-model=cheap.
>> From the vect dump (attached):
>> neon-vect-div-1.c:12:3: note: Setting misalignment to -1.
>> neon-vect-div-1.c:12:3: note: not vectorized: unsupported unaligned load.*_9
> Hi,
> I think I have some idea why the test-cases attached with the patch
> fail to get vectorized on armeb with -O2.
>
> Issue with the big-endian vectorizer:
> The patch does not cause regressions with the big-endian vectorizer,
> but the test-cases attached with the patch fail to vectorize, while
> they get vectorized on little-endian.
> It fails on armeb with the following message in the dump:
> note: not vectorized: unsupported unaligned load.*_9
>
> The behavior of big and little endian vectorizer seems to be different
> in arm_builtin_support_vector_misalignment() which overrides the hook
> targetm.vectorize.support_vector_misalignment().
>
> targetm.vectorize.support_vector_misalignment is called by
> vect_supportable_dr_alignment () which in turn is called
> by verify_data_refs_alignment ().
>
> Execution upto following condition is common between arm and armeb
> in vect_supportable_dr_alignment():
>
> if ((TYPE_USER_ALIGN (type) && !is_packed)
>   || targetm.vectorize.support_vector_misalignment (mode, type,
> DR_MISALIGNMENT (dr), is_packed))
> /* Can't software pipeline the loads, but can at least do them.  */
> return dr_unaligned_supported;
>
> For little endian case:
> arm_builtin_support_vector_misalignment() is called with
> V2SF mode and misalignment == -1, and the following condition
> becomes true:
> /* If the misalignment is unknown, we should be able to handle the access
>  so long as it is not to a member of a packed data structure.  */
>   if (misalignment == -1)
> return true;
>
> Since the hook returned true we enter the condition above in
> vect_supportable_dr_alignment() and return dr_unaligned_supported;
>
> For big-endian:
> arm_builtin_support_vector_misalignment() is called with V2SF mode.
> The following condition that gates the entire function body fails:
>  if (TARGET_NEON && !BYTES_BIG_ENDIAN && unaligned_access)
> and the default hook gets called with V2SF mode and the default hook
> returns false because
> movmisalign_optab does not exist for V2SF mode.
>
> So the condition above in vect_supportable_dr_alignment() fails
> and we come here:
>  /* Unsupported.  

Re: RFC [1/2] divmod transform

2016-05-30 Thread Richard Biener
On Mon, 30 May 2016, Prathamesh Kulkarni wrote:

> The attached patch ICEs during bootstrap for x86_64, and is reproducible with
> following case with -m32 -O2:
> 
> typedef long long type;
> 
> type f(type x, type y)
> {
>   type q = x / y;
>   type r = x % y;
>   return q + r;
> }
> 
> The ICE happens because the test-case hits
> gcc_assert (unsignedp);
> in default_expand_divmod_libfunc ().

That's of course your function (and ICE).

> Surprisingly, optab_libfunc (sdivmod_optab, DImode) returns a libfunc
> with the name "__divmoddi4", although __divmoddi4() is nowhere defined in
> libgcc for x86.
> (I verified that by forcing the patch to generate call to __divmoddi4,
> which results in undefined reference to __divmoddi4).
> 
> This happens because in optabs.def we have:
> OPTAB_NL(sdivmod_optab, "divmod$a4", UNKNOWN, "divmod", '4', gen_int_libfunc)
> 
> and gen_int_libfunc generates "__divmoddi4" on first call to optab_libfunc
> and sets optab_libfunc (sdivmod_optab, DImode) to "__divmoddi4".
> I wonder if we should remove gen_int_libfunc entry in optabs.def for
> sdivmod_optab ?
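
For reference, the contract such a combined libfunc would have - a sketch
of the semantics only, since (as noted) __divmoddi4 is not actually
provided by libgcc for x86 at this point:

  /* Return the quotient and store the remainder through *REM.  */
  long long
  divmoddi4_ref (long long x, long long y, long long *rem)
  {
    *rem = x % y;
    return x / y;
  }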

Hum, not sure - you might want to look at expand_divmod (though that
always just computes one part of the result in the end).

Joseph - do you know sth about why there's not a full set of divmod
libfuncs in libgcc?

Thanks,
Richard.


Re: [PATCH, PR69067] Remove assert in get_def_bb_for_const

2016-05-30 Thread Richard Biener
On Sun, 29 May 2016, Tom de Vries wrote:

> Hi,
> 
> this patch fixes graphite PR69067, a 6/7 regression.
> 
> 
> I.
> 
> Consider the following test-case, compiled with -O1 -floop-nest-optimize
> -flto:
> ...
> int a1, c1, cr, kt;
> int aa[2];
> 
> int
> ce (void)
> {
>   while (a1 < 1) {
> int g8;
> for (g8 = 0; g8 < 3; ++g8)
>   if (c1 != 0)
> cr = aa[a1 * 2] = kt;
> for (c1 = 0; c1 < 2; ++c1)
>   aa[c1] = cr;
> ++a1;
>   }
>   return 0;
> }
> 
> int
> main (void)
> {
>   return ce (aa);
> }
> ...
> 
> At graphite0, there's a loop with header bb4, which conditionally executes bb
> 5:
> ...
>   :
> 
>   :
>   # g8_39 = PHI <0(13), g8_19(3)>
>   # cr_lsm.1_4 = PHI 
>   # cr_lsm.2_22 = PHI 
>   if (c1_lsm.3_35 != 0)
> goto ;
>   else
> goto ;
> 
>   :
>   aa[_3] = 0;
> 
>   :
>   # cr_lsm.1_33 = PHI 
>   # cr_lsm.2_11 = PHI 
>   g8_19 = g8_39 + 1;
>   if (g8_19 <= 2)
> goto ;
>   else
> goto ;
> ...
> 
> 
> II.
> 
> The graphite transformation moves the condition '(P_35 <= -1 || P_35 >= 1)'
> out of the loop:
> ...
> [scheduler] original ast:
> {
>   for (int c0 = 0; c0 <= 2; c0 += 1) {
> S_4(c0);
> if (P_35 <= -1 || P_35 >= 1)
>   S_5(c0);
> S_6(c0);
>   }
>   S_7();
>   for (int c0 = 0; c0 <= 1; c0 += 1)
> S_8(c0);
> }
> 
> [scheduler] AST generated by isl:
> {
>   if (P_35 <= -1) {
> for (int c0 = 0; c0 <= 2; c0 += 1)
>   S_5(c0);
>   } else if (P_35 >= 1)
> for (int c0 = 0; c0 <= 2; c0 += 1)
>   S_5(c0);
>   for (int c0 = 0; c0 <= 2; c0 += 1) {
> S_4(c0);
> S_6(c0);
>   }
>   S_7();
>   for (int c0 = 0; c0 <= 1; c0 += 1)
> S_8(c0);
> }
> ...
> 
> When instantiating the ast back to gimple, we run into an assert:
> ...
> pr-graphite-4.c: In function ‘ce’:
> pr-graphite-4.c:5:1: internal compiler error: in get_def_bb_for_const, at
> graphite-isl-ast-to-gimple.c:1795
>  ce()
>  ^
> ...
> 
> 
> III.
> 
> What happens is the following: in copy_cond_phi_args we try to copy the
> arguments of phi in bb6 to the arguments of new_phi in bb 46
> ...
> (gdb) call debug_gimple_stmt (phi)
> cr_lsm.1_33 = PHI 
> (gdb) call debug_gimple_stmt (new_phi)
> cr_lsm.1_62 = PHI <(28), (47)>
> ...
> 
> While handling the '0' phi argument in add_phi_arg_for_new_expr we trigger
> this bit of code and call get_def_bb_for_const with bb.index == 46 and
> old_bb.index == 5:
> ...
>   /* If the corresponding def_bb could not be found the entry will
>  be NULL.  */
>   if (TREE_CODE (old_phi_args[i]) == INTEGER_CST)
> def_pred[i]
>   = get_def_bb_for_const (new_bb,
>   gimple_phi_arg_edge (phi, i)->src);
> ...
> 
> Neither of the two copies of bb 5 dominates bb 46, so we run into the assert
> at the end:
> ...
> /* Returns a basic block that could correspond to where a constant was
>   defined in the original code.  In the original code OLD_BB had the
>   definition, we need to find which basic block out of the copies of
>   old_bb, in the new region, should a definition correspond to if it
>   has to reach BB.  */
> 
> basic_block translate_isl_ast_to_gimple::
> get_def_bb_for_const (basic_block bb, basic_block old_bb) const
> {
>   vec <basic_block> *bbs = region->copied_bb_map->get (old_bb);
> 
>   if (!bbs || bbs->is_empty ())
> return NULL;
> 
>   if (1 == bbs->length ())
> return (*bbs)[0];
> 
>   int i;
>   basic_block b1 = NULL, b2;
>   FOR_EACH_VEC_ELT (*bbs, i, b2)
> {
>   if (b2 == bb)
> return bb;
> 
>   /* BB and B2 are in two unrelated if-clauses.  */
>   if (!dominated_by_p (CDI_DOMINATORS, bb, b2))
> continue;
> 
>   /* Compute the nearest dominator.  */
>   if (!b1 || dominated_by_p (CDI_DOMINATORS, b2, b1))
> b1 = b2;
> }
> 
>   gcc_assert (b1);
>   return b1;
> }
> ...
> 
> 
> IV.
> 
> Attached patch fixes this by removing the assert.
> 
> Bootstrapped and reg-tested on x86_64.
> 
> OK for trunk, 6-branch?

Ok.

Thanks,
Richard.

> Thanks,
> - Tom
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Re: [PATCH] match.pd: optimize unsigned mul overflow check

2016-05-30 Thread Alexander Monakov
On Sun, 29 May 2016, Marc Glisse wrote:
> On Sat, 28 May 2016, Alexander Monakov wrote:
> 
> > For unsigned A, B, 'A > -1 / B' is a nice predicate for checking whether
> > 'A*B'
> > overflows (or 'B && A > -1 / B' if B may be zero).  Let's optimize it to an
> > invocation of __builtin_mul_overflow to avoid the divide operation.
> 
> I forgot to ask earlier: what does this give for modes / platforms where
> umulv4 does not have a specific implementation? Is the generic implementation
> worse than A>-1/B, in which case we may want to check optab_handler before
> doing the transformation? Or is it always at least as good?

If umulv4 is unavailable (which today is everywhere except x86), gcc
falls back as follows.  First, it tries to see if doing a multiplication in a
2x wider type is possible (which it usually is, as gcc supports __int128_t on
64-bit platforms and 64-bit long long on 32-bit platforms), then it looks at
high bits of the 2x wide product.  This should boil down to doing a 'high
multiply' instruction if original operands' type matches register size, and a
normal multiply + masking high bits if the type is smaller than register.

Second, if the above fails (e.g. with 64-bit operands on a 32-bit platform),
then gcc emits a sequence that performs the multiplication by parts in a 2x
narrower type.

I think the first, more commonly taken, fallback path results in an
always-good code. In the second case, the eliminated 64-bit divide is unlikely
to have a direct hw support; e.g., on i386 it's a library call to __udivdi3.
This makes the transformation a likely loss for code size, a likely win for
performance.  It could be better if GCC could CSE REALPART (IFN_MUL_OVERFLOW)
with A*B on gimple.

Thanks.
Alexander