Re: [ipa-vrp] ice in set_value_range

2016-11-09 Thread kugan

Hi Andrew,

On 10/11/16 17:14, Andrew Pinski wrote:

On Wed, Nov 9, 2016 at 12:01 AM, kugan
 wrote:

Hi Andrew,

On 09/11/16 17:02, Andrew Pinski wrote:


Either this patch or the patch for "Handle unary pass-through jump
functions for ipa-vrp" caused a bootstrap failure on
aarch64-linux-gnu.
Bootstrap comparison failure!
gcc/go/types.o differs
gcc/fortran/class.o differs
gcc/tree-ssa-live.o differs
gcc/data-streamer-out.o differs
gcc/ira-build.o differs
gcc/hsa-gen.o differs
gcc/hsa-brig.o differs
gcc/omp-low.o differs
gcc/lto-streamer-in.o differs
gcc/real.o differs
gcc/final.o differs
gcc/df-core.o differs

I bootstrap with the following options:

--with-cpu=thunderx+lse --enable-languages=c,c++,fortran,go
--disable-werror --with-sysroot=/ --enable-plugins
--enable-gnu-indirect-function

I have not tried removing the +lse part though


I was able to reproduce it with just --with-cpu=thunderx.  I am
trying without "--with-cpu=thunderx" now.  This is in my Jenkins env
and I have not tried to reproduce it outside of it yet.


I can reproduce it. I am going to revert it, as this is affecting your 
bootstrap. I will commit after fixing this (of course, after review).


Thanks,
Kugan


Thanks,
Andrew




Sorry about the breakage. I will try to reproduce this.

Thanks,
Kugan


Re: [ipa-vrp] ice in set_value_range

2016-11-09 Thread Andrew Pinski
On Wed, Nov 9, 2016 at 12:01 AM, kugan
 wrote:
> Hi Andrew,
>
> On 09/11/16 17:02, Andrew Pinski wrote:
>>
>> Either this patch or the patch for "Handle unary pass-through jump
>> functions for ipa-vrp" caused a bootstrap failure on
>> aarch64-linux-gnu.
>> Bootstrap comparison failure!
>> gcc/go/types.o differs
>> gcc/fortran/class.o differs
>> gcc/tree-ssa-live.o differs
>> gcc/data-streamer-out.o differs
>> gcc/ira-build.o differs
>> gcc/hsa-gen.o differs
>> gcc/hsa-brig.o differs
>> gcc/omp-low.o differs
>> gcc/lto-streamer-in.o differs
>> gcc/real.o differs
>> gcc/final.o differs
>> gcc/df-core.o differs
>>
>> I bootstrap with the following options:
>>
>> --with-cpu=thunderx+lse --enable-languages=c,c++,fortran,go
>> --disable-werror --with-sysroot=/ --enable-plugins
>> --enable-gnu-indirect-function
>>
>> I have not tried removing the +lse part though

I was able to reproduce it with just --with-cpu=thunderx.  I am
trying without "--with-cpu=thunderx" now.  This is in my Jenkins env
and I have not tried to reproduce it outside of it yet.

Thanks,
Andrew

>
>
> Sorry about the breakage. I will try to reproduce this.
>
> Thanks,
> Kugan


[PATCH] loop distribution bug fix

2016-11-09 Thread Jim Wilson
This fixes a bug in the code that adds edges to the dependence graph.
Values for this_dir can be -1 (backward edge), 0 (no edge), 1 (forward
edge), and 2 (both backward and forward edges).  There can be multiple
dependencies checked, creating multiple edges that have to be merged
together; this_dir contains the current edge and dir contains the
previous edges.

The code fails to handle the case where this_dir is 2, in which case we
can return immediately.  This is a minor optimization to improve compile
time.  The code also handles the case where dir is non-zero and this_dir
is zero by returning 2, which is incorrect; dir should be unmodified in
this case.  We can return 2 only if both dir and this_dir are non-zero
and unequal, i.e. one is -1 and the other is 1.  This problem creates
extra unnecessary edges, which can prevent loops from being distributed.
The patch fixes both problems.
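
As a minimal sketch of the intended merge logic (hypothetical helper,
not the code in tree-loop-distribution.c), assuming the value encoding
described above:

/* Merge one dependence direction THIS_DIR into the accumulated DIR.
   -1 = backward edge, 0 = no edge, 1 = forward edge, 2 = both.  */
int
merge_dir (int dir, int this_dir)
{
  if (this_dir == 2)
    return 2;            /* Both directions present: nothing more to learn.  */
  if (this_dir == 0)
    return dir;          /* No new edge: DIR stays unmodified.  */
  if (dir == 0)
    return this_dir;     /* First edge recorded.  */
  if (dir != this_dir)
    return 2;            /* A -1 and a 1 combine to both directions.  */
  return dir;            /* Same direction as before.  */
}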

This passed an x86_64 gcc bootstrap with -ftree-loop-distribution
added to BOOT_CFLAGS and a testsuite regression check.  Curiously, I
see that I get different results for the C/C++ ubsan tests every time
I run them, but this has nothing to do with my patch, as it happens
with or without my patch.  I haven't tried debugging this yet.  Might
be related to my recent upgrade to Ubuntu 16.04 LTS.  Otherwise, there
are no regressions.

On SPEC CPU2006, on aarch64, I see 5879 loops distributed without the
patch, and 5906 loops distributed with the patch.  So 27 extra loops
are distributed, which is about 0.5% more loop distributions.  There is
no measurable performance gain from the bug fix on the CPU2006 run
time, though I plan to spend some more time looking at this code to see
if I can find other improvements.

OK?

Jim
2016-11-09  Jim Wilson  

	* tree-loop-distribution.c (pg_add_dependence_edges): Return 2 if
	this_dir is 2.  Check for this_dir non-zero before dir != this_dir
	check.

Index: gcc/tree-loop-distribution.c
===
--- gcc/tree-loop-distribution.c	(revision 242025)
+++ gcc/tree-loop-distribution.c	(working copy)
@@ -1408,9 +1408,11 @@ pg_add_dependence_edges (struct graph *rdg, vec

Re: [PATCH] simplify-rtx: Transform (xor (and (xor A B) C) B) with C const

2016-11-09 Thread Bernd Schmidt

On 11/09/2016 10:58 PM, Segher Boessenkool wrote:

I'll do a PowerPC-specific testcase for all rl[wd]* next week.  rl[wd]imi
will show this xor-xor thing (half of all possible insns were not optimised
before this patch).  Is that enough?


Sure. Once David's rtl testing work is ready we'll want to start doing 
selftests for these kinds of things.



Bernd



[committed] Fix g++.dg/asan/asan_test.C failure

2016-11-09 Thread Jakub Jelinek
Hi!

I've noticed that the asan_test.C test started failing today.
The problem is the many -Wc++1z-compat warnings.  This led to the discovery
of PR78283, which fixes the "many" part of it, but there are still some,
so this patch disables them.  While playing with the preprocessed source
of the testcase, I noticed some -Wformat warnings which somebody worked
around by using -Wno-format, but that looks wrong - we should instead fix
the underlying issue: if the condition for the EXPECT_TRUE etc. macros
contains a % character, then we invoke undefined behavior if there are
failures.
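
As a tiny illustration of the problem and the fix (hedged sketch, not
the dejagnu-gtest.h code; EXPECT_TRUE_OLD and EXPECT_TRUE_NEW are
made-up names for the before/after variants):

#include <stdio.h>
#include <stdlib.h>

/* Old form: the stringized condition becomes the format string, so a
   condition like "len % 8 == 0" makes "% 8" look like a conversion
   specification and the fprintf call invokes undefined behavior.  */
#define EXPECT_TRUE_OLD(condition)                                      \
  if (!(condition))                                                     \
    {                                                                   \
      fprintf (stderr, "EXPECT_TRUE failed: " #condition "\n");         \
      exit (1);                                                         \
    }

/* New form: the condition text is passed as data to a "%s" format.  */
#define EXPECT_TRUE_NEW(condition)                                      \
  if (!(condition))                                                     \
    {                                                                   \
      fprintf (stderr, "%s", "EXPECT_TRUE failed: " #condition "\n");   \
      exit (1);                                                         \
    }

int
main ()
{
  int len = 16;
  EXPECT_TRUE_NEW (len % 8 == 0);  /* On failure, the text is printed safely.  */
  return 0;
}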

Bootstrapped/regtested on x86_64-linux and i686-linux, committed as obvious
to trunk.

2016-11-09  Jakub Jelinek  

* g++.dg/asan/asan_test.C: Remove -Wno-format, add
-Wno-c++1z-compat.
* g++.dg/asan/dejagnu-gtest.h (EXPECT_TRUE): Use "%s" format
string in case condition contains % characters.

--- gcc/testsuite/g++.dg/asan/asan_test.C.jj	2014-09-25 15:02:37.0 +0200
+++ gcc/testsuite/g++.dg/asan/asan_test.C	2016-11-09 21:54:37.038627318 +0100
@@ -2,7 +2,7 @@
 // { dg-skip-if "" { *-*-* } { "*" } { "-O2" } }
 // { dg-skip-if "" { *-*-* } { "-flto" } { "" } }
 // { dg-additional-sources "asan_globals_test-wrapper.cc" }
-// { dg-options "-std=c++11 -fsanitize=address -fno-builtin -Wall -Wno-format -Werror -g -DASAN_UAR=0 -DASAN_HAS_EXCEPTIONS=1 -DASAN_HAS_BLACKLIST=0 -DSANITIZER_USE_DEJAGNU_GTEST=1 -lasan -lpthread -ldl" }
+// { dg-options "-std=c++11 -fsanitize=address -fno-builtin -Wall -Wno-c++1z-compat -Werror -g -DASAN_UAR=0 -DASAN_HAS_EXCEPTIONS=1 -DASAN_HAS_BLACKLIST=0 -DSANITIZER_USE_DEJAGNU_GTEST=1 -lasan -lpthread -ldl" }
 // { dg-additional-options "-DASAN_NEEDS_SEGV=1" { target { ! arm*-*-* } } }
 // { dg-additional-options "-DASAN_LOW_MEMORY=1 -DASAN_NEEDS_SEGV=0" { target arm*-*-* } }
 // { dg-additional-options "-DASAN_AVOID_EXPENSIVE_TESTS=1" { target { ! run_expensive_tests } } }
--- gcc/testsuite/g++.dg/asan/dejagnu-gtest.h.jj	2014-09-25 15:02:37.255763202 +0200
+++ gcc/testsuite/g++.dg/asan/dejagnu-gtest.h	2016-11-09 21:50:41.352612632 +0100
@@ -68,7 +68,8 @@ while (0)
 #define EXPECT_TRUE(condition) \
   if (!(condition))\
 {  \
-  fprintf (stderr, "EXPECT_TRUE failed: " #condition "\n");\
+  fprintf (stderr, "%s",   \
+  "EXPECT_TRUE failed: " #condition "\n"); \
   exit (1);\
 }
 #define EXPECT_FALSE(condition) EXPECT_TRUE (!condition)


Jakub


Re: [PATCH] simplify-rtx: Transform (xor (and (xor A B) C) B) with C const

2016-11-09 Thread Segher Boessenkool
On Wed, Nov 09, 2016 at 11:29:45PM +0100, Marc Glisse wrote:
> >>>This patch makes RTL simplify transform (xor (and (xor A B) C) B) back
> >>>to (ior (and A C) (and B ~C)) for constant C (and similar with A instead
> >>>of B for that last term).
> >>
> >>Would it make sense to implement this transformation in match.pd, next to
> >>the "opposite" one, or do you need it at the RTL level because C only
> >>becomes a constant at that stage?
> >
> >It becomes a constant in the later gimple passes, but we need it in the RTL
> >simplifiers as well, even if you also do it in match.pd?
> 
> (assuming it is always an improvement, even though it may use the same 
> number of operations and one more constant)

And a shallower evaluation tree.

Does match.pd (or the gimple optimisers in general) care about the number
of constants at all?

> Sure, it doesn't hurt to have it in both places. It just seems that since 
> the problem was caused by match.pd in your original testcase, fixing it at 
> that level (undoing the harm as soon as possible) would make the RTL 
> version less useful (though not useless). Anyway, I don't feel competent 
> to decide when which form is preferable, I was just curious.

I don't know either.  See PR63568.

> (simplify
>  (bit_xor:cs (bit_and:s (bit_xor:cs @0 @1) CONSTANT_CLASS_P@2) @0)
>  (bit_ior (bit_and @0 (bit_not @2)) (bit_and @1 @2)))
> 
> (this handles vectors as well, I don't know if that is desired)

No clue.  There is a theme here ;-)


Segher


[committed] Fix -Wc++1z-compat warning (PR c++/78283)

2016-11-09 Thread Jakub Jelinek
Hi!

Once mangle_decl warns once about the mangling changes for C++1z,
it warns about all following symbols, including stuff like ,
.L* symbols etc. or any other symbols that are not affected by the
exception specification mangling changes.
Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
approved by Jason in the PR, committed to trunk.

2016-11-09  Jakub Jelinek  

PR c++/78283
* mangle.c (start_mangling): Reset G.need_cxx1z_warning.

* g++.dg/cpp1z/noexcept-type13.C: New test.

--- gcc/cp/mangle.c.jj  2016-11-09 15:22:35.0 +0100
+++ gcc/cp/mangle.c 2016-11-09 21:45:44.013378262 +0100
@@ -3649,6 +3649,7 @@ start_mangling (const tree entity)
 {
   G.entity = entity;
   G.need_abi_warning = false;
+  G.need_cxx1z_warning = false;
   obstack_free (&name_obstack, name_base);
   mangle_obstack = &name_obstack;
   name_base = obstack_alloc (&name_obstack, 0);
--- gcc/testsuite/g++.dg/cpp1z/noexcept-type13.C.jj	2016-11-09 21:43:45.835873413 +0100
+++ gcc/testsuite/g++.dg/cpp1z/noexcept-type13.C	2016-11-09 21:44:32.906277891 +0100
@@ -0,0 +1,14 @@
+// PR c++/78283
+// { dg-do compile }
+// { dg-options "-Wall" }
+
+void foo () throw () {}// { dg-bogus "mangled name" }
+
+template 
+T bar (T x) { return x; }  // { dg-warning "mangled name" "" { target c++14_down } }
+
+void baz () {  // { dg-bogus "mangled name" }
+  return (bar (foo)) ();
+}
+
+void decl () {}// { dg-bogus "mangled name" }

Jakub


Re: Review debug message generation

2016-11-09 Thread Jonathan Wakely

On 09/11/16 22:36 +0100, François Dumont wrote:

Hi

   Here is a proposal to review how we generate the debug output in 
case of assertion failure. It removes usage of format_word which, as a 
side effect, will fix PR 77459. Should I reference this PR in the 
ChangeLog?


Please do either mention the PR in the ChangeLog, or just put a note
in the bugzilla PR that says "fixed by rN" with the SVN commit
number, to make it easier to know what fixed it.

   I introduced a print_literal function to avoid using strlen on 
string literals. I know that GCC optimizes calls to strlen on literals, 
but in our case we were not directly calling it on the literals.


   Tested under Linux x86_64, ok to commit ?

   * src/c++11/debug.cc (format_word): Delete.
   (print_literal): New. Replace call to print_word for literals.


This is a nice change, OK to commit, thanks.




Re: [PATCH] simplify-rtx: Transform (xor (and (xor A B) C) B) with C const

2016-11-09 Thread Marc Glisse

On Wed, 9 Nov 2016, Marc Glisse wrote:


On Wed, 9 Nov 2016, Segher Boessenkool wrote:


On Wed, Nov 09, 2016 at 10:54:53PM +0100, Marc Glisse wrote:

match.pd transforms (A&C)|(B&~C) to ((A^B)&C)^B, which is fewer
operations if C is not const (and it is not on simple tests at least,
this transform is done very early already).

Various processors have "insert" instructions that can do this, but
combine cannot build those from the xor-and-xor, especially it has no
chance at all to do that if A or B or multiple instructions as well
(on PowerPC, the rl[ws]imi instructions that can do this with a rotate,
or a simple shift with appropriate C; other ISAs have similar insns).

This patch makes RTL simplify transform (xor (and (xor A B) C) B) back
to (ior (and A C) (and B ~C)) for constant C (and similar with A instead
of B for that last term).


Would it make sense to implement this transformation in match.pd, next to
the "opposite" one, or do you need it at the RTL level because C only
becomes a constant at that stage?


It becomes a constant in the later gimple passes, but we need it in the RTL
simplifiers as well, even if you also do it in match.pd?


(assuming it is always an improvement, even though it may use the same number 
of operations and one more constant)


Sure, it doesn't hurt to have it in both places. It just seems that since the 
problem was caused by match.pd in your original testcase, fixing it at that 
level (undoing the harm as soon as possible) would make the RTL version less 
useful (though not useless). Anyway, I don't feel competent to decide when 
which form is preferable, I was just curious.


(simplify
(bit_xor:cs (bit_and:s (bit_xor:cs @0 @1) CONSTANT_CLASS_P@2) @0)


 (bit_xor:c (bit_and:s (bit_xor:cs @0 @1) CONSTANT_CLASS_P@2) @0)

without the initial "s" of course... (nice, genmatch does notice something 
is wrong if I leave it in)



(bit_ior (bit_and @0 (bit_not @2)) (bit_and @1 @2)))

(this handles vectors as well, I don't know if that is desired)


--
Marc Glisse


[PING] [PATCH] enhance buffer overflow warnings (and c/53562)

2016-11-09 Thread Martin Sebor

The attached minor update to the patch also resolves bug 77784, which
points out that -Wformat-length issues a warning that is also issued
during the expansion of some of the __builtin___sprintf_chk intrinsics.

Martin

On 11/04/2016 02:16 PM, Martin Sebor wrote:

Attached is an update to the patch that takes into consideration
the feedback I got.  It goes back to adding just one option,
-Wstringop-overflow, as in the original, while keeping the Object
Size type as an argument.  It uses type-1 as the default setting
for string functions (strcpy et al.) and, unconditionally, type-0
for raw memory functions (memcpy, etc.)
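
As a minimal illustration of the kind of overflow the option is meant to
diagnose (hypothetical example, not taken from the patch or its
testsuite):

char buf[4];

void
f (void)
{
  /* Writes far more than the 4 bytes buf can hold; the checks added to
     the builtin expanders can see this via the object size.  */
  __builtin_strcpy (buf, "this does not fit in buf");
}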

I retested Binutils 2.27 and the Linux kernel again with this patch
and also added Glibc, and it doesn't complain about anything (both
Binutils and the kernel also build cleanly with an unpatched GCC
with _FORTIFY_SOURCE=2 or its rough equivalent for the kernel).
The emit-rtl.c warning (bug 78174) has also been suppressed by
the change to bos type-0 for memcpy.

While the patch doesn't trigger any false positives (AFAIK), it is
subject to a fair number of false negatives due to the limitations
of the tree-object-size pass, and due to transformations done by
other passes that prevent it from detecting some otherwise obvious
overflows.  Although that is unfortunate, I believe the warnings that
are emitted are useful as the first line of defense in software that
doesn't use _FORTIFY_SOURCE (such as GCC itself).  And this can
of course be improved if some of the limitations are removed over
time.

Martin


PR c/53562 - Add -Werror= support for -D_FORTIFY_SOURCE / __builtin___memcpy_chk
PR middle-end/77784 - duplicate warning for snprintf when n > object size
PR middle-end/78149 - missing warning on strncpy buffer overflow due to an excessive bound
PR middle-end/78138 - missing warnings on buffer overflow with non-constant source length

gcc/c-family/ChangeLog:
	PR c/53562
	* c.opt (-Wstringop-overflow): New option.

gcc/ChangeLog:
	PR c/53562
	PR middle-end/77784
	PR middle-end/78149
	PR middle-end/78138
	
	* builtins.c (expand_builtin_strcat, expand_builtin_strncat): New
	functions.
	(compute_dest_size, get_size_range, check_sizes, check_strncat_sizes)
	(check_memop_sizes): Same.
	(expand_builtin_memcpy): Call check_memop_sizes.
	(expand_builtin_mempcpy): Same.
	(expand_builtin_memset): Same.
	(expand_builtin_bzero): Same.
	(expand_builtin_memory_chk): Call check_sizes.
	(expand_builtin_strcpy): Same.
	(expand_builtin_strncpy): Same.
	(maybe_emit_sprintf_chk_warning): Same.
	(expand_builtin): Handle strcat and strncat.
	(fini_object_sizes): Reset pointers.
	(compute_object_size): New function.
	* gimple-ssa-sprintf.c (pass_sprintf_length::handle_gimple_call):
	Avoid issuing warnings also issued during built-in expansion.
	* doc/invoke.texi (Warning Options): Document -Wstringop-overflow.

gcc/testsuite/ChangeLog:
	PR c/53562
	PR middle-end/77784
	PR middle-end/78149
	PR middle-end/78138

	* c-c++-common/Wsizeof-pointer-memaccess2.c: Adjust expected diagnostic.
	* g++.dg/ext/builtin-object-size3.C (bar): Same.
	* g++.dg/ext/strncpy-chk1.C: Same.
	* g++.dg/opt/memcpy1.C: Same.
	* g++.dg/torture/Wsizeof-pointer-memaccess1.C: Same.
	* gcc.c-torture/compile/pr55569.c: Disable -Wstringop-overflow.
	* gcc.dg/Wobjsize-1.c: Adjust expected diagnostic.
	* gcc.dg/attr-alloc_size.c: Same.
	* gcc.dg/builtin-stringop-chk-1.c: Adjust expected diagnostic.
	* gcc.dg/builtin-stringop-chk-2.c: Same.
	* gcc.dg/builtin-stringop-chk-4.c: New test.
	* gcc.dg/builtin-strncat-chk-1.c: Adjust expected diagnostic.
	* gcc.dg/memcpy-2.c: Same.
	* gcc.dg/pr40340-1.c: Same.
	* gcc.dg/pr40340-2.c (main): Same.
	* gcc.dg/pr40340-5.c (main): Same.
	* gcc.dg/torture/Wsizeof-pointer-memaccess1.c: Same.
	* gcc.dg/torture/pr71132.c: Disable -Wstringop-overflow.
	* gcc.dg/tree-ssa/builtin-sprintf-warn-1.c: Adjust text of expected
	warning.

	* gcc.dg/fstack-protector-strong.c: Add expected diagnostic.
diff --git a/gcc/builtins.c b/gcc/builtins.c
index cc711a0..cb7bbaf 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -67,7 +67,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "internal-fn.h"
 #include "case-cfn-macros.h"
 #include "gimple-fold.h"
-
+#include "intl.h"
 
 struct target_builtins default_target_builtins;
 #if SWITCHABLE_TARGET
@@ -125,9 +125,11 @@ static rtx expand_builtin_mempcpy (tree, rtx, machine_mode);
 static rtx expand_builtin_mempcpy_with_bounds (tree, rtx, machine_mode);
 static rtx expand_builtin_mempcpy_args (tree, tree, tree, rtx,
 	machine_mode, int, tree);
+static rtx expand_builtin_strcat (tree, rtx);
 static rtx expand_builtin_strcpy (tree, rtx);
 static rtx expand_builtin_strcpy_args (tree, tree, rtx);
 static rtx expand_builtin_stpcpy (tree, rtx, machine_mode);
+static rtx expand_builtin_strncat (tree, rtx);
 static rtx expand_builtin_strncpy (tree, rtx);
 static rtx builtin_memset_gen_str (void *, HOST_WIDE_INT, machine_mode);
 static rtx expand_builtin_memset (tree, rtx, machine_mode);
@@ 

Re: [PATCH] simplify-rtx: Transform (xor (and (xor A B) C) B) with C const

2016-11-09 Thread Marc Glisse

On Wed, 9 Nov 2016, Segher Boessenkool wrote:


On Wed, Nov 09, 2016 at 10:54:53PM +0100, Marc Glisse wrote:

match.pd transforms (A&C)|(B&~C) to ((A^B)&C)^B, which is fewer
operations if C is not const (and it is not on simple tests at least,
this transform is done very early already).

Various processors have "insert" instructions that can do this, but
combine cannot build those from the xor-and-xor, especially it has no
chance at all to do that if A or B or multiple instructions as well
(on PowerPC, the rl[ws]imi instructions that can do this with a rotate,
or a simple shift with appropriate C; other ISAs have similar insns).

This patch makes RTL simplify transform (xor (and (xor A B) C) B) back
to (ior (and A C) (and B ~C)) for constant C (and similar with A instead
of B for that last term).


Would it make sense to implement this transformation in match.pd, next to
the "opposite" one, or do you need it at the RTL level because C only
becomes a constant at that stage?


It becomes a constant in the later gimple passes, but we need it in the RTL
simplifiers as well, even if you also do it in match.pd?


(assuming it is always an improvement, even though it may use the same 
number of operations and one more constant)


Sure, it doesn't hurt to have it in both places. It just seems that since 
the problem was caused by match.pd in your original testcase, fixing it at 
that level (undoing the harm as soon as possible) would make the RTL 
version less useful (though not useless). Anyway, I don't feel competent 
to decide when which form is preferable, I was just curious.


(simplify
 (bit_xor:cs (bit_and:s (bit_xor:cs @0 @1) CONSTANT_CLASS_P@2) @0)
 (bit_ior (bit_and @0 (bit_not @2)) (bit_and @1 @2)))

(this handles vectors as well, I don't know if that is desired)

--
Marc Glisse


Re: [PATCH, LIBGCC] Avoid count_leading_zeros with undefined result (PR 78067)

2016-11-09 Thread Joseph Myers
On Wed, 9 Nov 2016, Bernd Edlinger wrote:

> Yes, but maybe introduce a test if the half-wide value fits?
> 
> like:
> 
> #define M_OK2(M, T) ((M) > sizeof(T) * CHAR_BIT / 2 - 1)

Something like that.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] PR78241, fix loop unroller when niter expr is not reliable

2016-11-09 Thread Pat Haugen
The following fixes a problem introduced by my earlier loop unroller patch, 
https://gcc.gnu.org/ml/gcc-patches/2016-09/msg01612.html. In instances where 
the niter expr is not reliable, we still need to emit an initial peel copy of 
the loop.

Bootstrap/regtest on powerpc64le with no new regressions. Ok for trunk?

-Pat


2016-11-09  Pat Haugen  

PR rtl-optimization/78241
* loop-unroll.c (unroll_loop_runtime_iterations): Don't adjust 'niter',
but emit initial peel copy if niter expr is not reliable.


testsuite/ChangeLog:
2016-11-09  Pat Haugen  

* gcc.dg/pr78241.c: New test.


Index: gcc/loop-unroll.c
===
--- gcc/loop-unroll.c	(revision 241821)
+++ gcc/loop-unroll.c	(working copy)
@@ -918,9 +918,10 @@ unroll_loop_runtime_iterations (struct l
   if (tmp != niter)
 emit_move_insn (niter, tmp);
 
-  /* For loops that exit at end, add one to niter to account for first pass
- through loop body before reaching exit test. */
-  if (exit_at_end)
+  /* For loops that exit at end and whose number of iterations is reliable,
+ add one to niter to account for first pass through loop body before
+ reaching exit test. */
+  if (exit_at_end && !desc->noloop_assumptions)
 {
   niter = expand_simple_binop (desc->mode, PLUS,
    niter, const1_rtx,
@@ -946,7 +947,7 @@ unroll_loop_runtime_iterations (struct l
 
   auto_sbitmap wont_exit (max_unroll + 2);
 
-  if (extra_zero_check)
+  if (extra_zero_check || desc->noloop_assumptions)
 {
   /* Peel the first copy of loop body.  Leave the exit test if the number
 	 of iterations is not reliable.  Also record the place of the extra zero
Index: gcc/testsuite/gcc.dg/pr78241.c
===
--- gcc/testsuite/gcc.dg/pr78241.c	(revision 0)
+++ gcc/testsuite/gcc.dg/pr78241.c	(working copy)
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-options "-Og -funroll-loops" } */
+
+static __attribute__((noinline, noclone)) unsigned
+foo (unsigned x)
+{
+  do
+x++;
+  while (x <= 15);
+  return x;
+}
+
+int main ()
+{
+  unsigned x = foo (-2);
+  if (x != (unsigned)-1)
+__builtin_abort();
+  return 0;
+}
+


Re: [PATCH] simplify-rtx: Transform (xor (and (xor A B) C) B) with C const

2016-11-09 Thread Segher Boessenkool
On Wed, Nov 09, 2016 at 10:54:53PM +0100, Marc Glisse wrote:
> >match.pd transforms (A&C)|(B&~C) to ((A^B)&C)^B, which is fewer
> >operations if C is not const (and it is not on simple tests at least,
> >this transform is done very early already).
> >
> >Various processors have "insert" instructions that can do this, but
> >combine cannot build those from the xor-and-xor, especially it has no
> >chance at all to do that if A or B or multiple instructions as well
> >(on PowerPC, the rl[ws]imi instructions that can do this with a rotate,
> >or a simple shift with appropriate C; other ISAs have similar insns).
> >
> >This patch makes RTL simplify transform (xor (and (xor A B) C) B) back
> >to (ior (and A C) (and B ~C)) for constant C (and similar with A instead
> >of B for that last term).
> 
> Would it make sense to implement this transformation in match.pd, next to 
> the "opposite" one, or do you need it at the RTL level because C only 
> becomes a constant at that stage?

It becomes a constant in the later gimple passes, but we need it in the RTL
simplifiers as well, even if you also do it in match.pd?


Segher


Re: [PATCH] simplify-rtx: Transform (xor (and (xor A B) C) B) with C const

2016-11-09 Thread Segher Boessenkool
On Wed, Nov 09, 2016 at 10:27:35PM +0100, Bernd Schmidt wrote:
> On 11/09/2016 10:13 PM, Segher Boessenkool wrote:
> > * simplify-rtx.c (simplify_binary_operation_1): Simplify
> > (xor (and (xor A B) C) B) to (ior (and A C) (and B ~C)) and
> > (xor (and (xor A B) C) B) to (ior (and A ~C) (and B C)) if C
> > is a const_int.
> 
> I think one of the xors should have A as the second operand.

The second, thanks for spotting it.

> >+/* If we have (xor (and (xor A B) C) A) with C a constant we can 
> >instead
> >+   do (ior (and A ~C) (and B C)) which is a machine instruction on 
> >some
> >+   machines, and also has shorter instruction path length.  */
> >+  if (GET_CODE (op0) == AND
> 
> Comments doesn't line up with the if/else on my monitor; could be email 
> damage but please check.

I messed it up (had // comments first, whoops).

> Other than that, I think it does qualify as a simplification (or at 
> least an improvement), so OK. Would be nice to check for it with a testcase.

I'll do a PowerPC-specific testcase for all rl[wd]* next week.  rl[wd]imi
will show this xor-xor thing (half of all possible insns were not optimised
before this patch).  Is that enough?


Segher


Re: [PATCH] simplify-rtx: Transform (xor (and (xor A B) C) B) with C const

2016-11-09 Thread Marc Glisse

On Wed, 9 Nov 2016, Segher Boessenkool wrote:


match.pd transforms (A&C)|(B&~C) to ((A^B)&C)^B, which is fewer
operations if C is not const (and it is not on simple tests at least,
this transform is done very early already).

Various processors have "insert" instructions that can do this, but
combine cannot build those from the xor-and-xor, especially it has no
chance at all to do that if A or B or multiple instructions as well
(on PowerPC, the rl[ws]imi instructions that can do this with a rotate,
or a simple shift with appropriate C; other ISAs have similar insns).

This patch makes RTL simplify transform (xor (and (xor A B) C) B) back
to (ior (and A C) (and B ~C)) for constant C (and similar with A instead
of B for that last term).


Would it make sense to implement this transformation in match.pd, next to 
the "opposite" one, or do you need it at the RTL level because C only 
becomes a constant at that stage?


--
Marc Glisse


[PATCH] shrink-wrap: New spread_components

2016-11-09 Thread Segher Boessenkool
This patch changes spread_components to use a simpler algorithm that
puts prologue components as early as possible, and epilogue components
as late as possible.  This allows better scheduling, and also saves a
bit of code size.  The set of blocks that run with some specific
component enabled after this patch is a strict superset of the set that
had it before the patch.

It does this by finding, for every component, the basic blocks where that
component is not needed on some path from the entry block (it reuses
head_components to store this), and similarly the blocks where the
component is not needed on some path to the exit block (or the exit
cannot be reached from that block) (stored in tail_components).  Blocks
that are then in neither of those two sets get the component active.

Tested on powerpc64-linux {-m32,-m64}.  Is this okay for trunk?


Segher


2016-11-09  Segher Boessenkool  

* shrink-wrap.c (init_separate_shrink_wrap): Do not clear
head_components and tail_components.
(spread_components): New algorithm.
(emit_common_tails_for_components): Clear head_components and
tail_components.
(insert_prologue_epilogue_for_components): Write extra output to the
dump file for sibcalls and abnormal exits.

---
 gcc/shrink-wrap.c | 181 +++---
 1 file changed, 146 insertions(+), 35 deletions(-)

diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index 4395d8a..e480d4d 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -1131,8 +1131,6 @@ init_separate_shrink_wrap (sbitmap components)
   SW (bb)->head_components = sbitmap_alloc (SBITMAP_SIZE (components));
   SW (bb)->tail_components = sbitmap_alloc (SBITMAP_SIZE (components));
   bitmap_clear (SW (bb)->has_components);
-  bitmap_clear (SW (bb)->head_components);
-  bitmap_clear (SW (bb)->tail_components);
 }
 }
 
@@ -1253,48 +1251,151 @@ place_prologue_for_one_component (unsigned int which, 
basic_block head)
 }
 }
 
-/* Mark HAS_COMPONENTS for every block dominated by at least one block with
-   HAS_COMPONENTS set for the respective components, starting at HEAD.  */
+/* Set HAS_COMPONENTS in every block to the maximum it can be set to without
+   setting it on any path from entry to exit where it was not already set
+   somewhere (or, for blocks that have no path to the exit, consider only
+   paths from the entry to the block itself).  */
 static void
-spread_components (basic_block head)
+spread_components (sbitmap components)
 {
-  basic_block bb = head;
-  bool first_visit = true;
-  /* This keeps a tally of all components active.  */
-  sbitmap components = SW (head)->has_components;
+  basic_block entry_block = ENTRY_BLOCK_PTR_FOR_FN (cfun);
+  basic_block exit_block = EXIT_BLOCK_PTR_FOR_FN (cfun);
 
-  for (;;)
+  /* A stack of all blocks left to consider, and a bitmap of all blocks
+ on that stack.  */
+  vec<basic_block> todo;
+  todo.create (n_basic_blocks_for_fn (cfun));
+  bitmap seen = BITMAP_ALLOC (NULL);
+
+  sbitmap old = sbitmap_alloc (SBITMAP_SIZE (components));
+
+  /* Find for every block the components that are *not* needed on some path
+ from the entry to that block.  Do this with a flood fill from the entry
+ block.  Every block can be visited at most as often as the number of
+ components (plus one), and usually much less often.  */
+
+  if (dump_file)
+fprintf (dump_file, "Spreading down...\n");
+
+  basic_block bb;
+  FOR_ALL_BB_FN (bb, cfun)
+bitmap_clear (SW (bb)->head_components);
+
+  bitmap_copy (SW (entry_block)->head_components, components);
+
+  edge e;
+  edge_iterator ei;
+
+  todo.quick_push (single_succ (entry_block));
+  bitmap_set_bit (seen, single_succ (entry_block)->index);
+  while (!todo.is_empty ())
 {
-  if (first_visit)
-   {
- bitmap_ior (SW (bb)->has_components, SW (bb)->has_components,
- components);
+  bb = todo.pop ();
 
- if (first_dom_son (CDI_DOMINATORS, bb))
-   {
- components = SW (bb)->has_components;
- bb = first_dom_son (CDI_DOMINATORS, bb);
- continue;
-   }
-   }
+  bitmap_copy (old, SW (bb)->head_components);
 
-  components = SW (bb)->has_components;
+  FOR_EACH_EDGE (e, ei, bb->preds)
+   bitmap_ior (SW (bb)->head_components, SW (bb)->head_components,
+   SW (e->src)->head_components);
 
-  if (next_dom_son (CDI_DOMINATORS, bb))
+  bitmap_and_compl (SW (bb)->head_components, SW (bb)->head_components,
+   SW (bb)->has_components);
+
+  if (!bitmap_equal_p (old, SW (bb)->head_components))
+   FOR_EACH_EDGE (e, ei, bb->succs)
+ if (bitmap_set_bit (seen, e->dest->index))
+   todo.quick_push (e->dest);
+
+  bitmap_clear_bit (seen, bb->index);
+}
+
+  /* Find for every block the components that are *not* needed on some reverse
+ path 

Re: [PATCH TEST]Drop xfail for gcc.dg/vect/vect-cond-2.c

2016-11-09 Thread Christophe Lyon
Hi Bin

On 8 November 2016 at 13:37, Bin Cheng  wrote:
> Hi,
> Test gcc.dg/vect/vect-cond-2.c can be vectorized by GCC now, this patch drops 
> the xfail.
>
> Thanks,
> bin
>
> gcc/testsuite/ChangeLog
> 2016-11-04  Bin Cheng  
>
> * gcc.dg/vect/vect-cond-2.c: Drop xfail.

But the test now fails on armeb.

Christophe


Go patch committed: rework static initializer code

2016-11-09 Thread Ian Lance Taylor
This patch to the Go frontend reworks the static initializer code.  It
renames is_immutable to is_static_initializer to try to capture what
it really means.  It is more precise about when an address expression,
or a binary expression, can be a static initializer.  Stop checking
whether a type has pointers when deciding whether an initializer must
be read-write, just check whether it is being used to initialize a
global variable.  To make that work set the Translate_context function
to NULL for a global variable with a static initializer.

The effect of this is to let more global variables be initialized
directly, rather than being initialized in the generated init
function.

I'm doing this now because it is a prerequisite to copying the signal
table from the Go 1.7 runtime, as the signal table has to be
initialized before init functions are run.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 241868)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-afe0456d25e3c6c0d91a8fd4c0fdfdbaa35cc251
+cac897bd27885c18a16dacfe27d5efd4526455c5
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 241740)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -536,10 +536,6 @@ class Error_expression : public Expressi
   { return true; }
 
   bool
-  do_is_immutable() const
-  { return true; }
-
-  bool
   do_numeric_constant_value(Numeric_constant* nc) const
   {
 nc->set_unsigned_long(NULL, 0);
@@ -1374,7 +1370,7 @@ class Func_code_reference_expression : p
   { return TRAVERSE_CONTINUE; }
 
   bool
-  do_is_immutable() const
+  do_is_static_initializer() const
   { return true; }
 
   Type*
@@ -1520,7 +1516,7 @@ class Boolean_expression : public Expres
   { return true; }
 
   bool
-  do_is_immutable() const
+  do_is_static_initializer() const
   { return true; }
 
   Type*
@@ -1889,7 +1885,7 @@ class Integer_expression : public Expres
   { return true; }
 
   bool
-  do_is_immutable() const
+  do_is_static_initializer() const
   { return true; }
 
   bool
@@ -2285,7 +2281,7 @@ class Float_expression : public Expressi
   { return true; }
 
   bool
-  do_is_immutable() const
+  do_is_static_initializer() const
   { return true; }
 
   bool
@@ -2475,7 +2471,7 @@ class Complex_expression : public Expres
   { return true; }
 
   bool
-  do_is_immutable() const
+  do_is_static_initializer() const
   { return true; }
 
   bool
@@ -2691,7 +2687,7 @@ class Const_expression : public Expressi
   { return true; }
 
   bool
-  do_is_immutable() const
+  do_is_static_initializer() const
   { return true; }
 
   bool
@@ -3047,7 +3043,7 @@ class Nil_expression : public Expression
   { return true; }
 
   bool
-  do_is_immutable() const
+  do_is_static_initializer() const
   { return true; }
 
   Type*
@@ -3284,10 +3280,11 @@ Type_conversion_expression::do_is_consta
   return true;
 }
 
-// Return whether a type conversion is immutable.
+// Return whether a type conversion can be used in a constant
+// initializer.
 
 bool
-Type_conversion_expression::do_is_immutable() const
+Type_conversion_expression::do_is_static_initializer() const
 {
   Type* type = this->type_;
   Type* expr_type = this->expr_->type();
@@ -3296,7 +3293,7 @@ Type_conversion_expression::do_is_immuta
   || expr_type->interface_type() != NULL)
 return false;
 
-  if (!this->expr_->is_immutable())
+  if (!this->expr_->is_static_initializer())
 return false;
 
   if (Type::are_identical(type, expr_type, false, NULL))
@@ -3542,10 +3539,11 @@ Unsafe_type_conversion_expression::do_tr
   return TRAVERSE_CONTINUE;
 }
 
-// Return whether an unsafe type conversion is immutable.
+// Return whether an unsafe type conversion can be used as a constant
+// initializer.
 
 bool
-Unsafe_type_conversion_expression::do_is_immutable() const
+Unsafe_type_conversion_expression::do_is_static_initializer() const
 {
   Type* type = this->type_;
   Type* expr_type = this->expr_->type();
@@ -3554,7 +3552,7 @@ Unsafe_type_conversion_expression::do_is
   || expr_type->interface_type() != NULL)
 return false;
 
-  if (!this->expr_->is_immutable())
+  if (!this->expr_->is_static_initializer())
 return false;
 
   if (Type::are_convertible(type, expr_type, NULL))
@@ -3855,6 +3853,44 @@ Unary_expression::do_is_constant() const
 return this->expr_->is_constant();
 }
 
+// Return whether a unary expression can be used as a constant
+// initializer.
+
+bool
+Unary_expression::do_is_static_initializer() const
+{
+  if (this->op_ == OPERATOR_MULT)
+return false;
+  else if (this->op_ == OPERATOR_AND)
+{
+  // The address of a global variable can 

Review debug message generation

2016-11-09 Thread François Dumont

Hi

    Here is a proposal to review how we generate the debug output in 
case of assertion failure. It removes usage of format_word which, as a 
side effect, will fix PR 77459. Should I reference this PR in the ChangeLog?


    I introduced a print_literal function to avoid using strlen on 
string literals. I know that GCC optimizes calls to strlen on literals, 
but in our case we were not directly calling it on the literals.
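
A minimal standalone sketch of the idea (illustration only, not the
patch itself): binding a string literal to a reference-to-array
parameter deduces its length, including the terminating '\0', at
compile time, so no strlen call is needed.

#include <cstddef>
#include <cstdio>

template<std::size_t Length>
  void
  print_literal(const char (&word)[Length])
  { std::fwrite(word, 1, Length - 1, stderr); }

int
main()
{
  print_literal("assertion failed\n");
  return 0;
}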


Tested under Linux x86_64, ok to commit ?

* src/c++11/debug.cc (format_word): Delete.
(print_literal): New. Replace call to print_word for literals.

François


On 08/11/2016 22:35, fdumont at gcc dot gnu.org wrote:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77459

--- Comment #8 from François Dumont  ---
Ok, at least it confirms what I thought about builtins. So the problem is
rather a buggy target.

Even if so I'll try to find an alternative approach to avoid snprintf usage.



diff --git a/libstdc++-v3/src/c++11/debug.cc b/libstdc++-v3/src/c++11/debug.cc
index d79e43b..2b9c00b 100644
--- a/libstdc++-v3/src/c++11/debug.cc
+++ b/libstdc++-v3/src/c++11/debug.cc
@@ -546,11 +546,6 @@ namespace
   using _Error_formatter = __gnu_debug::_Error_formatter;
   using _Parameter = __gnu_debug::_Error_formatter::_Parameter;
 
-  template<typename _Tp>
-    int
-    format_word(char* buf, int n, const char* fmt, _Tp s)
-    { return std::min(__builtin_snprintf(buf, n, fmt, s), n - 1); }
-
   void
   get_max_length(std::size_t& max_length)
   {
@@ -577,6 +572,11 @@ namespace
 bool	_M_wordwrap;
   };
 
+  template<size_t Length>
+    void
+    print_literal(PrintContext& ctx, const char(&word)[Length])
+    { print_word(ctx, word, Length - 1); }
+
   void
   print_word(PrintContext& ctx, const char* word,
 	 std::ptrdiff_t count = -1)
@@ -627,18 +627,19 @@ namespace
   }
 else
   {
-	print_word(ctx, "\n", 1);
+	print_literal(ctx, "\n");
 	print_word(ctx, word, count);
   }
   }
 
+  template<size_t Length>
 void
 print_type(PrintContext& ctx,
 	   const type_info* info,
-	 const char* unknown_name)
+	   const char(&unknown_name)[Length])
 {
   if (!info)
-  print_word(ctx, unknown_name);
+	print_literal(ctx, unknown_name);
   else
 	{
 	  int status;
@@ -784,20 +785,18 @@ namespace
   {
 if (type._M_name)
   {
-	const int bufsize = 64;
-	char buf[bufsize];
-	int written
-	  = format_word(buf, bufsize, "\"%s\"", type._M_name);
-	print_word(ctx, buf, written);
+	print_literal(ctx, "\"");
+	print_word(ctx, type._M_name);
+	print_literal(ctx, "\"");
   }
 
-print_word(ctx, " {\n");
+print_literal(ctx, " {\n");
 
 if (type._M_type)
   {
-	print_word(ctx, "  type = ");
+	print_literal(ctx, "  type = ");
 	print_type(ctx, type._M_type, "");
-	print_word(ctx, ";\n");
+	print_literal(ctx, ";\n");
   }
   }
 
@@ -809,9 +808,9 @@ namespace
 
 if (inst._M_name)
   {
-	int written
-	  = format_word(buf, bufsize, "\"%s\" ", inst._M_name);
-	print_word(ctx, buf, written);
+	print_literal(ctx, "\"");
+	print_word(ctx, inst._M_name);
+	print_literal(ctx, "\" ");
   }
 
 int written
@@ -820,7 +819,7 @@ namespace
 
 if (inst._M_type)
   {
-	print_word(ctx, "  type = ");
+	print_literal(ctx, "  type = ");
 	print_type(ctx, inst._M_type, "");
   }
   }
@@ -838,36 +837,36 @@ namespace
 	{
 	  const auto& ite = variant._M_iterator;
 
-	  print_word(ctx, "iterator ");
+	  print_literal(ctx, "iterator ");
 	  print_description(ctx, ite);
 
 	  if (ite._M_type)
 	{
 	  if (ite._M_constness != _Error_formatter::__unknown_constness)
 		{
-		  print_word(ctx, " (");
+		  print_literal(ctx, " (");
 		  print_field(ctx, param, "constness");
-		  print_word(ctx, " iterator)");
+		  print_literal(ctx, " iterator)");
 		}
 
-	  print_word(ctx, ";\n");
+	  print_literal(ctx, ";\n");
 	}
 
 	  if (ite._M_state != _Error_formatter::__unknown_state)
 	{
-	  print_word(ctx, "  state = ");
+	  print_literal(ctx, "  state = ");
 	  print_field(ctx, param, "state");
-	  print_word(ctx, ";\n");
+	  print_literal(ctx, ";\n");
 	}
 
 	  if (ite._M_sequence)
 	{
-	  print_word(ctx, "  references sequence ");
+	  print_literal(ctx, "  references sequence ");
 	  if (ite._M_seq_type)
 		{
-		  print_word(ctx, "with type '");
+		  print_literal(ctx, "with type '");
 		  print_field(ctx, param, "seq_type");
-		  print_word(ctx, "' ");
+		  print_literal(ctx, "' ");
 		}
 
 	  int written
@@ -875,34 +874,34 @@ namespace
 	  print_word(ctx, buf, written);
 	}
 
-	  print_word(ctx, "}\n", 2);
+	  print_literal(ctx, "}\n");
 	}
 	break;
 
   case _Parameter::__sequence:
-	print_word(ctx, "sequence ");
+	print_literal(ctx, "sequence ");
 	print_description(ctx, variant._M_sequence);
 
 	if (variant._M_sequence._M_type)
-	  print_word(ctx, ";\n", 2);
+	  print_literal(ctx, ";\n");
 
-	print_word(ctx, "}\n", 2);
+	print_literal(ctx, "}\n");
 	break;
 
   case _Parameter::__instance:
-	print_word(ctx, 

[ARM] PR 78253 do not resolve weak ref locally

2016-11-09 Thread Christophe Lyon
Hi,

PR 78253 shows that the handling of weak references has changed for
ARM with gcc-5.

When r220674 was committed, default_binds_local_p_2 gained a new
parameter (weak_dominate), which, when true, implies that a reference
to a weak symbol defined locally will be resolved locally, even though
it could be overridden by a strong definition in another object file.
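
A minimal sketch of the scenario (hypothetical example, not taken from
the PR): foo has a weak definition here, and another object file may
provide a strong definition that must win at link time.  With
weak_dominate=true the call below may be bound to the local weak
definition, so the strong one never gets used.

void __attribute__ ((weak)) foo (void) {}

void
bar (void)
{
  foo ();
}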

With r220674, default_binds_local_p forces weak_dominate=true,
effectively changing the previous behavior.

The attached patch introduces default_binds_local_p_4 which is a copy
of default_binds_local_p_2, but using weak_dominate=false, and updates
the ARM target to call default_binds_local_p_4 instead of
default_binds_local_p_2.

I ran cross-tests on various arm* configurations with no regression,
and checked that the test attached to the original bugzilla now works
as expected.

I am not sure why weak_dominate defaults to true, and I couldn't
really understand why by reading the threads related to r220674 and
following updates to default_binds_local_p_* which all deal with other
corner cases and do not discuss the weak_dominate parameter.

Or should this patch be made more generic?

Thanks,

Christophe
2016-11-09  Christophe Lyon  

PR target/78253
* output.h (default_binds_local_p_4): New.
* varasm.c (default_binds_local_p_4): New, use
weak_dominate=false.
* config/arm/linux-elf.h (TARGET_BINDS_LOCAL_P): Define to
default_binds_local_p_4.


diff --git a/gcc/config/arm/linux-elf.h b/gcc/config/arm/linux-elf.h
index cc17b51..4f32ce8 100644
--- a/gcc/config/arm/linux-elf.h
+++ b/gcc/config/arm/linux-elf.h
@@ -110,7 +110,7 @@
strong definitions in dependent shared libraries, will resolve
to COPY relocated symbol in the executable.  See PR65780.  */
 #undef TARGET_BINDS_LOCAL_P
-#define TARGET_BINDS_LOCAL_P default_binds_local_p_2
+#define TARGET_BINDS_LOCAL_P default_binds_local_p_4
 
 /* Define this to be nonzero if static stack checking is supported.  */
 #define STACK_CHECK_STATIC_BUILTIN 1
diff --git a/gcc/output.h b/gcc/output.h
index 0924499..11b5ce5 100644
--- a/gcc/output.h
+++ b/gcc/output.h
@@ -585,6 +585,7 @@ extern bool default_binds_local_p (const_tree);
 extern bool default_binds_local_p_1 (const_tree, int);
 extern bool default_binds_local_p_2 (const_tree);
 extern bool default_binds_local_p_3 (const_tree, bool, bool, bool, bool);
+extern bool default_binds_local_p_4 (const_tree);
 extern void default_globalize_label (FILE *, const char *);
 extern void default_globalize_decl_name (FILE *, tree);
 extern void default_emit_unwind_label (FILE *, tree, int, int);
diff --git a/gcc/varasm.c b/gcc/varasm.c
index 6a7ffc2..7a3cf99 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -6952,6 +6952,16 @@ default_binds_local_p_2 (const_tree exp)
  !flag_pic);
 }
 
+/* Similar to default_binds_local_p_2, but local weak definition does
+   not imply local resolution (weak_dominate is false).  */
+
+bool
+default_binds_local_p_4 (const_tree exp)
+{
+  return default_binds_local_p_3 (exp, flag_shlib != 0, false, true,
+ !flag_pic);
+}
+
 bool
 default_binds_local_p_1 (const_tree exp, int shlib)
 {


Re: [PATCH] simplify-rtx: Transform (xor (and (xor A B) C) B) with C const

2016-11-09 Thread Bernd Schmidt

On 11/09/2016 10:13 PM, Segher Boessenkool wrote:

* simplify-rtx.c (simplify_binary_operation_1): Simplify
(xor (and (xor A B) C) B) to (ior (and A C) (and B ~C)) and
(xor (and (xor A B) C) B) to (ior (and A ~C) (and B C)) if C
is a const_int.


I think one of the xors should have A as the second operand.



+/* If we have (xor (and (xor A B) C) A) with C a constant we can instead
+   do (ior (and A ~C) (and B C)) which is a machine instruction on some
+   machines, and also has shorter instruction path length.  */
+  if (GET_CODE (op0) == AND


Comments don't line up with the if/else on my monitor; could be email 
damage, but please check.


Other than that, I think it does qualify as a simplification (or at 
least an improvement), so OK. Would be nice to check for it with a testcase.



Bernd


[PATCH] libstdc++: Improve code generation for atomic reference counting

2016-11-09 Thread Pauli
Atomic reference counting generates pessimistic code on platforms where
the builtin atomics could optimize the code for the following branch by
using a subtract instruction.

To allow better code generation, a compile-time constant addend can be
checked for a negative value.  Those cases can then be better optimized
with the __atomic_fetch_sub builtin.
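
As an illustration of the intended effect (a hypothetical reference-count
release, not code from this patch): a constant negative addend lets the
update be written as an atomic subtract, which targets can typically
combine with the comparison that follows.

static inline bool
release (volatile int* count)
{
  /* Old value was 1 <=> new value is 0: the last reference went away.  */
  return __atomic_fetch_sub (count, 1, __ATOMIC_ACQ_REL) == 1;
}

int
main ()
{
  volatile int refcount = 1;
  return release (&refcount) ? 0 : 1;
}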

The extra branch checking for the presence of threads in the application
generates a lot of extra instructions, while the builtin atomics often
generate very few instructions.

Can I assume that __builtin_constant_p is available in all compilers
where _GLIBCXX_ATOMIC_BUILTINS is available?
bits/stl_list.h assumes __builtin_constant_p is available inside
_GLIBCXX_USE_CXX11_ABI.

x86_64 test results:
FAIL: 22_locale/numpunct/members/char/3.cc execution test
FAIL: 22_locale/time_get/get_date/wchar_t/4.cc execution test
# of expected passes		11941
# of unexpected failures	2
# of expected failures		65
# of unsupported tests		244
---

2016-11-09  Pauli Nieminen  

* include/ext/atomicity.h (__exchange_and_add): Optimize negative add
with builtin __atomic_fetch_sub.
(__atomic_add): Call __exchange_and_add to reuse the same code.
(__exchange_and_add_dispatch) (__atomic_add_dispatch): Optimize
threading check if builtins generate inline instructions. 

---
 libstdc++-v3/include/ext/atomicity.h | 24 
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/include/ext/atomicity.h b/libstdc++-v3/include/ext/atomicity.h
index 0fcb390..ecf5a9d 100644
--- a/libstdc++-v3/include/ext/atomicity.h
+++ b/libstdc++-v3/include/ext/atomicity.h
@@ -35,6 +35,14 @@
 #include 
 #include 
 
+#if defined(__GCC_ATOMIC_INT_LOCK_FREE)
+#define _GLIBCXX_INT_LOCK_FREE __GCC_ATOMIC_INT_LOCK_FREE
+#else
+#define _GLIBCXX_INT_LOCK_FREE 2
+#endif
+#define _GLIBCXX_ATOMIC_INLINE \
+  (_GLIBCXX_ATOMIC_BUILTINS && _GLIBCXX_INT_LOCK_FREE == 2)
+
 namespace __gnu_cxx _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -46,11 +54,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #ifdef _GLIBCXX_ATOMIC_BUILTINS
   static inline _Atomic_word 
   __exchange_and_add(volatile _Atomic_word* __mem, int __val)
-  { return __atomic_fetch_add(__mem, __val, __ATOMIC_ACQ_REL); }
+  {
+if (__builtin_constant_p(__val) && __val < 0)
+  return __atomic_fetch_sub(__mem, 0 - __val, __ATOMIC_ACQ_REL);
+else
+  return __atomic_fetch_add(__mem, __val, __ATOMIC_ACQ_REL);
+  }
 
   static inline void
   __atomic_add(volatile _Atomic_word* __mem, int __val)
-  { __atomic_fetch_add(__mem, __val, __ATOMIC_ACQ_REL); }
+  { __exchange_and_add(__mem, __val); }
 #else
   _Atomic_word
   __attribute__ ((__unused__))
@@ -78,7 +91,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __exchange_and_add_dispatch(_Atomic_word* __mem, int __val)
   {
 #ifdef __GTHREADS
-if (__gthread_active_p())
+if (_GLIBCXX_ATOMIC_INLINE || __gthread_active_p())
   return __exchange_and_add(__mem, __val);
 else
   return __exchange_and_add_single(__mem, __val);
@@ -92,7 +105,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __atomic_add_dispatch(_Atomic_word* __mem, int __val)
   {
 #ifdef __GTHREADS
-if (__gthread_active_p())
+if (_GLIBCXX_ATOMIC_INLINE || __gthread_active_p())
   __atomic_add(__mem, __val);
 else
   __atomic_add_single(__mem, __val);
@@ -104,6 +117,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
 
+#undef _GLIBCXX_INT_LOCK_FREE
+#undef _GLIBCXX_ATOMIC_INLINE
+
 // Even if the CPU doesn't need a memory barrier, we need to ensure
 // that the compiler doesn't reorder memory accesses across the
 // barriers.
-- 
2.9.3



[PATCH] libstdc++: Allow using without lock free atomic int

2016-11-09 Thread Pauli
Compiling programs using std::future for old ARM processors fails.  The
problem is caused by the preprocessor check for a lock-free atomic int.

std::future can be changed to work correctly without lock-free atomics
with minor changes to the exception_ptr implementation.

Without lock-free atomics there is the question whether a deadlock can
happen.  But atomic operations can't call outside code, which prevents
any ABBA or recursive-mutex deadlocks.
A deadlock could happen when throwing an exception or accessing an
is_lock_free() == false atomic from an asynchronous signal handler.
Throwing from a signal handler is undefined behavior.  I don't know about
accessing atomics from an asynchronous signal handler, but that feels
like undefined behavior if is_lock_free returns false.

Bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64735

Differences to the current code if atomic builtins are available:
* Race detector annotations that are empty by default
* Check for __gthread_active_p
* Generated x86 code uses xadd instead of xsub
This makes the code a bit worse.  But the problem is duplicated in any
other user of __exchange_and_add.  The internal API implementation should
be fixed to generate better code for all cases.  But that is a follow-up
patch.

 ** libstdc++ test results
x86_64 build from trunk@241870 resulted in exactly the same results with
and without the patch.  That is, two locale test cases failing:
FAIL: 22_locale/numpunct/members/char/3.cc execution test
FAIL: 22_locale/time_get/get_date/wchar_t/4.cc execution test
# of expected passes		11941
# of unexpected failures	2
# of expected failures		65
# of unsupported tests		244

armv5tel from gcc-6-branch@241866 in qemu-arm-static -cpu arm926 resulted
in the following differences.
Before patch:
WARNING: program timed out.
FAIL: 22_locale/locale/cons/12658_thread-2.cc execution test
FAIL: 22_locale/numpunct/members/char/3.cc execution test
FAIL: 22_locale/time_get/get_date/wchar_t/4.cc execution test
FAIL: 27_io/fpos/14775.cc execution test
# of expected passes		10628
# of unexpected failures	4
# of expected failures		66
# of unsupported tests		390

After patch:
WARNING: program timed out.
FAIL: 22_locale/locale/cons/12658_thread-1.cc execution test
WARNING: program timed out.
FAIL: 22_locale/locale/cons/12658_thread-2.cc execution test
FAIL: 22_locale/numpunct/members/char/3.cc execution test
FAIL: 22_locale/time_get/get_date/wchar_t/4.cc execution test
FAIL: 27_io/fpos/14775.cc execution test
# of expected passes		10831
# of unexpected failures	5
# of expected failures		66
# of unsupported tests		281

My CPU isn't fast enough to run the 12658 thread tests in QEMU.
---

2016-11-09  Pauli Nieminen  

* include/std/future: Remove check for ATOMIC_INT_LOCK_FREE
* libsupc++/eh_ptr.cc (exception_ptr::_M_addref)
(exception_ptr::_M_release) (__gxx_dependent_exception_cleanup)
(rethrow_exception): Use atomicity.h reference counting helpers.
* libsupc++/eh_throw.cc (__gxx_exception_cleanup): Likewise.
* libsupc++/eh_tm.cc (free_any_cxa_exception): Likewise.
* libsupc++/exception: Remove check for ATOMIC_INT_LOCK_FREE
* libsupc++/exception_ptr.h: Likewise
* libsupc++/nested_exception.cc: Likewise
* libsupc++/nested_exception.h: Likewise
* src/c++11/future.cc: Likewise
* testsuite: Remove atomic builtins check from exception_ptr,
nested_exception, future, promise, packaged_task, async and
shared_future.


---
 libstdc++-v3/include/std/future|  4 +---
 libstdc++-v3/libsupc++/eh_ptr.cc   | 26 +-
 libstdc++-v3/libsupc++/eh_throw.cc |  9 
 libstdc++-v3/libsupc++/eh_tm.cc| 11 +
 libstdc++-v3/libsupc++/exception   |  3 +--
 libstdc++-v3/libsupc++/exception_ptr.h |  4 
 libstdc++-v3/libsupc++/nested_exception.cc |  2 --
 libstdc++-v3/libsupc++/nested_exception.h  |  4 
 libstdc++-v3/src/c++11/future.cc   |  4 ++--
 .../testsuite/18_support/exception_ptr/40296.cc|  1 -
 .../18_support/exception_ptr/60612-terminate.cc|  1 -
 .../18_support/exception_ptr/60612-unexpected.cc   |  1 -
 .../testsuite/18_support/exception_ptr/62258.cc|  1 -
 .../testsuite/18_support/exception_ptr/64241.cc|  1 -
 .../18_support/exception_ptr/current_exception.cc  |  1 -
 .../testsuite/18_support/exception_ptr/lifespan.cc |  1 -
 .../18_support/exception_ptr/make_exception_ptr.cc |  1 -
 .../exception_ptr/make_exception_ptr_2.cc  |  1 -
 .../testsuite/18_support/exception_ptr/move.cc |  1 -
 .../18_support/exception_ptr/requirements.cc   |  1 -
 .../18_support/exception_ptr/requirements_neg.cc   |  1 -
 .../18_support/exception_ptr/rethrow_exception.cc  |  1 -
 .../testsuite/18_support/nested_exception/51438.cc |  1 -
 

[PATCH] simplify-rtx: Transform (xor (and (xor A B) C) B) with C const

2016-11-09 Thread Segher Boessenkool
match.pd transforms (A&C)|(B&~C) to ((A^B)&C)^B, which is fewer
operations if C is not const (and it is not on simple tests at least,
this transform is done very early already).

Various processors have "insert" instructions that can do this, but
combine cannot build those from the xor-and-xor, especially since it has
no chance at all to do that if A or B are multiple instructions as well
(on PowerPC, the rl[ws]imi instructions that can do this with a rotate,
or a simple shift with appropriate C; other ISAs have similar insns).

This patch makes RTL simplify transform (xor (and (xor A B) C) B) back
to (ior (and A C) (and B ~C)) for constant C (and similar with A instead
of B for that last term).
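
A quick sanity check (illustration only, not part of the patch) that the
two forms compute the same thing, a bit-select between A and B under
mask C:

#include <cassert>

unsigned int
form_ior (unsigned int a, unsigned int b, unsigned int c)
{ return (a & c) | (b & ~c); }   /* Form the RTL is rewritten back to.  */

unsigned int
form_xor (unsigned int a, unsigned int b, unsigned int c)
{ return ((a ^ b) & c) ^ b; }    /* Form produced by match.pd.  */

int
main ()
{
  for (unsigned int a = 0; a < 16; a++)
    for (unsigned int b = 0; b < 16; b++)
      for (unsigned int c = 0; c < 16; c++)
        assert (form_ior (a, b, c) == form_xor (a, b, c));
  return 0;
}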

Tested on powerpc64-linux {-m32,-m64}; no regressions.  Is this okay
for trunk?


Segher


2016-11-09  Segher Boessenkool  

* simplify-rtx.c (simplify_binary_operation_1): Simplify
(xor (and (xor A B) C) B) to (ior (and A C) (and B ~C)) and
(xor (and (xor A B) C) B) to (ior (and A ~C) (and B C)) if C
is a const_int.

---
 gcc/simplify-rtx.c | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 5c3dea1..62d60f3 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -2886,6 +2886,37 @@ simplify_binary_operation_1 (enum rtx_code code, 
machine_mode mode,
}
}
 
+/* If we have (xor (and (xor A B) C) A) with C a constant we can instead
+   do (ior (and A ~C) (and B C)) which is a machine instruction on some
+   machines, and also has shorter instruction path length.  */
+  if (GET_CODE (op0) == AND
+ && GET_CODE (XEXP (op0, 0)) == XOR
+ && CONST_INT_P (XEXP (op0, 1))
+ && rtx_equal_p (XEXP (XEXP (op0, 0), 0), trueop1))
+   {
+ rtx a = trueop1;
+ rtx b = XEXP (XEXP (op0, 0), 1);
+ rtx c = XEXP (op0, 1);
+ rtx nc = simplify_gen_unary (NOT, mode, c, mode);
+ rtx a_nc = simplify_gen_binary (AND, mode, a, nc);
+ rtx bc = simplify_gen_binary (AND, mode, b, c);
+ return simplify_gen_binary (IOR, mode, a_nc, bc);
+   }
+/* Similarly, (xor (and (xor A B) C) B) as (ior (and A C) (and B ~C))  */
+  else if (GET_CODE (op0) == AND
+ && GET_CODE (XEXP (op0, 0)) == XOR
+ && CONST_INT_P (XEXP (op0, 1))
+ && rtx_equal_p (XEXP (XEXP (op0, 0), 1), trueop1))
+   {
+ rtx a = XEXP (XEXP (op0, 0), 0);
+ rtx b = trueop1;
+ rtx c = XEXP (op0, 1);
+ rtx nc = simplify_gen_unary (NOT, mode, c, mode);
+ rtx b_nc = simplify_gen_binary (AND, mode, b, nc);
+ rtx ac = simplify_gen_binary (AND, mode, a, c);
+ return simplify_gen_binary (IOR, mode, ac, b_nc);
+   }
+
   /* (xor (comparison foo bar) (const_int 1)) can become the reversed
 comparison if STORE_FLAG_VALUE is 1.  */
   if (STORE_FLAG_VALUE == 1
-- 
1.9.3



Re: [PATCH] (v2) print-rtl-function.c: add (param) directive to dump

2016-11-09 Thread David Malcolm
On Wed, 2016-11-09 at 17:46 +0100, Bernd Schmidt wrote:
> On 11/09/2016 05:46 PM, David Malcolm wrote:
> 
> > OK for trunk if it passes bootstrap and regrtest?
> > 
> > gcc/ChangeLog:
> > * print-rtl-function.c (print_any_param_name): New function.
> > (print_param): New function.
> > (print_rtx_function): Call print_param for each argument.
> > * print-rtl.c (rtx_writer::finish_directive): New function.
> > * print-rtl.h (rtx_writer::finish_directive): New decl.
> > +
> > +  rtx decl_rtl = DECL_WRTL_CHECK (arg)->decl_with_rtl.rtl;
> 
> Isn't this DECL_RTL_IF_SET? If so, please use that macro. Otherwise 
> ok.

Yes, it is; thanks.  Using it required including varasm.h.

For reference, here's what I committed, as r242023 (having verified
 bootstrap and regrtest).

Index: gcc/ChangeLog
===
--- gcc/ChangeLog	(revision 242022)
+++ gcc/ChangeLog	(revision 242023)
@@ -1,3 +1,12 @@
+2016-11-09  David Malcolm  
+
+	* print-rtl-function.c: Include varasm.h.
+	(print_any_param_name): New function.
+	(print_param): New function.
+	(print_rtx_function): Call print_param for each argument.
+	* print-rtl.c (rtx_writer::finish_directive): New function.
+	* print-rtl.h (rtx_writer::finish_directive): New decl.
+
 2016-11-09  Uros Bizjak  
 
 	PR target/78262
Index: gcc/print-rtl.c
===
--- gcc/print-rtl.c	(revision 242022)
+++ gcc/print-rtl.c	(revision 242023)
@@ -804,6 +804,15 @@
   m_sawclose = 1;
 }
 
+/* Emit a closing parenthesis and newline.  */
+
+void
+rtx_writer::finish_directive ()
+{
+  fprintf (m_outfile, ")\n");
+  m_sawclose = 0;
+}
+
 /* Print an rtx on the current line of FILE.  Initially indent IND
characters.  */
 
Index: gcc/print-rtl.h
===
--- gcc/print-rtl.h	(revision 242022)
+++ gcc/print-rtl.h	(revision 242023)
@@ -31,6 +31,8 @@
   void print_rtl (const_rtx rtx_first);
   int print_rtl_single_with_indent (const_rtx x, int ind);
 
+  void finish_directive ();
+
  private:
   void print_rtx_operand_code_0 (const_rtx in_rtx, int idx);
   void print_rtx_operand_code_e (const_rtx in_rtx, int idx);
Index: gcc/print-rtl-function.c
===
--- gcc/print-rtl-function.c	(revision 242022)
+++ gcc/print-rtl-function.c	(revision 242023)
@@ -33,6 +33,7 @@
 #include "langhooks.h"
 #include "memmodel.h"
 #include "emit-rtl.h"
+#include "varasm.h"
 
 /* Print an "(edge-from)" or "(edge-to)" directive describing E
to OUTFILE.  */
@@ -127,6 +128,37 @@
   return true;
 }
 
+/* Subroutine of print_param.  Write the name of ARG, if any, to OUTFILE.  */
+
+static void
+print_any_param_name (FILE *outfile, tree arg)
+{
+  if (DECL_NAME (arg))
+fprintf (outfile, " \"%s\"", IDENTIFIER_POINTER (DECL_NAME (arg)));
+}
+
+/* Print a "(param)" directive for ARG to OUTFILE.  */
+
+static void
+print_param (FILE *outfile, rtx_writer &w, tree arg)
+{
+  fprintf (outfile, "  (param");
+  print_any_param_name (outfile, arg);
+  fprintf (outfile, "\n");
+
+  /* Print the value of DECL_RTL (without lazy-evaluation).  */
+  fprintf (outfile, "(DECL_RTL ");
+  w.print_rtx (DECL_RTL_IF_SET (arg));
+  w.finish_directive ();
+
+  /* Print DECL_INCOMING_RTL.  */
+  fprintf (outfile, "(DECL_RTL_INCOMING ");
+  w.print_rtx (DECL_INCOMING_RTL (arg));
+  fprintf (outfile, ")");
+
+  w.finish_directive ();
+}
+
 /* Write FN to OUTFILE in a form suitable for parsing, with indentation
and comments to make the structure easy for a human to grok.  Track
the basic blocks of insns in the chain, wrapping those that are within
@@ -197,6 +229,10 @@
 
   fprintf (outfile, "(function \"%s\"\n", dname);
 
+  /* Params.  */
+  for (tree arg = DECL_ARGUMENTS (fdecl); arg; arg = DECL_CHAIN (arg))
+print_param (outfile, w, arg);
+
   /* The instruction chain.  */
   fprintf (outfile, "  (insn-chain\n");
   basic_block curr_bb = NULL;


Re: [PATCH], PowerPC ISA 3.0, allow QImode/HImode to go into vector registers

2016-11-09 Thread Michael Meissner
Of course it would help, if I actually attached the patches:

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md 
(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)
(revision 241924)
+++ gcc/config/rs6000/rs6000.md (.../gcc/config/rs6000) (working copy)
@@ -325,6 +325,9 @@ (define_mode_iterator INT [QI HI SI DI T
 ; Any supported integer mode that fits in one register.
 (define_mode_iterator INT1 [QI HI SI (DI "TARGET_POWERPC64")])
 
+; Integer modes supported in VSX registers with ISA 3.0 instructions
+(define_mode_iterator INT_ISA3 [QI HI SI DI])
+
 ; Everything we can extend QImode to.
 (define_mode_iterator EXTQI [SI (DI "TARGET_POWERPC64")])
 
@@ -334,7 +337,7 @@ (define_mode_iterator EXTHI [SI (DI "TAR
 ; Everything we can extend SImode to.
 (define_mode_iterator EXTSI [(DI "TARGET_POWERPC64")])
 
-; QImode or HImode for small atomic ops
+; QImode or HImode for small integer moves and small atomic ops
 (define_mode_iterator QHI [QI HI])
 
 ; QImode, HImode, SImode for fused ops only for GPR loads
@@ -735,13 +738,15 @@ (define_code_attr SMINMAX [(smin "SM
 ;; complex forms.  Basic data transfer is done later.
 
 (define_insn "zero_extendqi2"
-  [(set (match_operand:EXTQI 0 "gpc_reg_operand" "=r,r")
-   (zero_extend:EXTQI (match_operand:QI 1 "reg_or_mem_operand" "m,r")))]
+  [(set (match_operand:EXTQI 0 "gpc_reg_operand" "=r,r,?*wJwK,?*wK")
+   (zero_extend:EXTQI (match_operand:QI 1 "reg_or_mem_operand" 
"m,r,Z,*wK")))]
   ""
   "@
lbz%U1%X1 %0,%1
-   rlwinm %0,%1,0,0xff"
-  [(set_attr "type" "load,shift")])
+   rlwinm %0,%1,0,0xff
+   lxsibzx %x0,%y1
+   vextractub %0,%1,7"
+  [(set_attr "type" "load,shift,fpload,vecperm")])
 
 (define_insn_and_split "*zero_extendqi2_dot"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x,?y")
@@ -786,13 +791,15 @@ (define_insn_and_split "*zero_extendqi2"
-  [(set (match_operand:EXTHI 0 "gpc_reg_operand" "=r,r")
-   (zero_extend:EXTHI (match_operand:HI 1 "reg_or_mem_operand" "m,r")))]
+  [(set (match_operand:EXTHI 0 "gpc_reg_operand" "=r,r,?*wJwK,?*wK")
+   (zero_extend:EXTHI (match_operand:HI 1 "reg_or_mem_operand" 
"m,r,Z,wK")))]
   ""
   "@
lhz%U1%X1 %0,%1
-   rlwinm %0,%1,0,0x"
-  [(set_attr "type" "load,shift")])
+   rlwinm %0,%1,0,0x
+   lxsihzx %x0,%y1
+   vextractuh %0,%1,6"
+  [(set_attr "type" "load,shift,fpload,vecperm")])
 
 (define_insn_and_split "*zero_extendhi2_dot"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x,?y")
@@ -893,11 +900,13 @@ (define_insn_and_split "*zero_extendsi2"
-  [(set (match_operand:EXTQI 0 "gpc_reg_operand" "=r")
-   (sign_extend:EXTQI (match_operand:QI 1 "gpc_reg_operand" "r")))]
+  [(set (match_operand:EXTQI 0 "gpc_reg_operand" "=r,?*wK")
+   (sign_extend:EXTQI (match_operand:QI 1 "gpc_reg_operand" "r,?*wK")))]
   ""
-  "extsb %0,%1"
-  [(set_attr "type" "exts")])
+  "@
+   extsb %0,%1
+   vextsb2d %0,%1"
+  [(set_attr "type" "exts,vecperm")])
 
 (define_insn_and_split "*extendqi2_dot"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x,?y")
@@ -948,14 +957,30 @@ (define_expand "extendhi2"
   "")
 
 (define_insn "*extendhi2"
-  [(set (match_operand:EXTHI 0 "gpc_reg_operand" "=r,r")
-   (sign_extend:EXTHI (match_operand:HI 1 "reg_or_mem_operand" "m,r")))]
+  [(set (match_operand:EXTHI 0 "gpc_reg_operand" "=r,r,?*wK,?*wK")
+   (sign_extend:EXTHI (match_operand:HI 1 "reg_or_mem_operand" 
"m,r,Z,wK")))]
   "rs6000_gen_cell_microcode"
   "@
lha%U1%X1 %0,%1
-   extsh %0,%1"
-  [(set_attr "type" "load,exts")
-   (set_attr "sign_extend" "yes")])
+   extsh %0,%1
+   #
+   vextsh2d %0,%1"
+  [(set_attr "type" "load,exts,fpload,vecperm")
+   (set_attr "sign_extend" "yes")
+   (set_attr "length" "4,4,8,4")])
+
+(define_split
+  [(set (match_operand:EXTHI 0 "altivec_register_operand" "")
+   (sign_extend:EXTHI
+(match_operand:HI 1 "indexed_or_indirect_operand" "")))]
+  "TARGET_P9_VECTOR && reload_completed"
+  [(set (match_dup 2)
+   (match_dup 1))
+   (set (match_dup 0)
+   (sign_extend:EXTHI (match_dup 2)))]
+{
+  operands[2] = gen_rtx_REG (HImode, REGNO (operands[1]));
+})
 
 (define_insn "*extendhi2_noload"
   [(set (match_operand:EXTHI 0 "gpc_reg_operand" "=r")
@@ -5299,30 +5324,33 @@ (define_insn_and_split "*floatunssidf2_i
(set_attr "type" "fp")])
 
 ;; ISA 3.0 adds instructions lxsi[bh]zx to directly load QImode and HImode to
-;; vector registers.  At the moment, QI/HImode are not allowed in floating
-;; point or vector registers, so we use UNSPEC's to use the load byte and
-;; half-word instructions.
+;; vector registers.  These insns favor doing the sign/zero extension in
+;; the vector registers, rather then loading up a GPR, doing a sign/zero
+;; extension and then a direct move.
 
 

Re: [Patch, fortran] PR44265 - Link error with reference to parameter array in specification expression

2016-11-09 Thread Dominique d'Humières

> On 9 Nov 2016, at 20:09, Paul Richard Thomas wrote:
> 
> Dear Dominique,
> 
> I am deeply embarrassed. This is the consequence of an additional
> condition added at the last minute.

No reason to be embarrassed;-)

> The attached removes it and makes sure that the original bug is tested
> in char_result_14.f90. The ChangeLogs are the same.
> 
> OK for trunk?

IMO yes

> 
> Paul
> 

I have a last glitch (which can be deferred if needed):

  FUNCTION Get(i) RESULT(s)
CHARACTER(*), PARAMETER :: names(3) = [  &
'Apple  ',  &
'Orange ',  &
'Mango  ' ];  
INTEGER, INTENT(IN) :: i
CHARACTER(LEN_TRIM(names(i))) :: s
!
s = names(i)
print *, len(s)
  END FUNCTION Get

PROGRAM WheresThatbLinkingConstantGone
  IMPLICIT NONE
  interface
FUNCTION Get(i) RESULT(s)
  CHARACTER(*), PARAMETER :: names(3) = [  &
  'Apple  ',  &
  'Orange ',  &
  'Mango  ' ];
  INTEGER, INTENT(IN) :: i
  CHARACTER(LEN_TRIM(names(i))) :: s
  END FUNCTION Get
  end interface

  integer :: i
  i = len(Get(1))
  print *, i
END PROGRAM WheresThatbLinkingConstantGone

does not link.

Dominique



[PATCH], PowerPC ISA 3.0, allow QImode/HImode to go into vector registers

2016-11-09 Thread Michael Meissner
The PowerPC ISA 3.0 (power9) has new instructions that make it feasible to
allow QImode and HImode to be allocated to vector registers.  The new
instructions are:

* load byte with zero extend
* load half word with zero extend
* store byte
* store half word
* extract byte from vector and zero extend
* extract half word from vector and zero extend
* sign extend byte to word/double word
* sign extend half word to word/double word
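
The kind of code that benefits is a small integer load feeding a
floating-point conversion; a minimal illustration (not one of the new
testcases, and the exact code generation of course depends on the rest of
the patch) is:

/* With -mcpu=power9 the intent is that the byte is loaded straight into a
   vector register (lxsibzx) and converted there, instead of being loaded
   into a GPR, zero extended and transferred with a direct move.  */
float
byte_to_float (unsigned char *p)
{
  return (float) *p;
}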

I have bootstrapped a previous version of the changes on a little endian Power8
system, and I'm now repeating the bootstrap on both a big endian Power8 and a
little endian Power8.  Assuming there are no regressions with the patches, can
I check these patches into the trunk?

I have built the spec 2006 CPU benchmark suite with these changes, and the
power8 (ISA 2.07) code generation does not change.

I have also built the spec 2006 CPU benchmark for power9.  The following
15 (out of 30) benchmarks had code changes:

* perlbench (char <-> floating point)
* gcc   (one extra ld/std)
* gamess(int <-> floating point)
* gromacs   (one fmr instead of li/mtvsrd)
* cactusADM (char/int <-> floating point)
* namd  (floating point -> int)
* gobmk (floating point -> int)
* dealII(int/long in vector regs. vs. gprs)
* povray(char/int <-> floating point)
* calculix  (int -> zero extend to long)
* hmmer (floating point -> int)
* h264ref   (zero extend short)
* tonto (floating point -> int)
* omnetpp   (floating point -> int)
* wrf   (floating point -> unsigned/int)

[gcc]
2016-11-09  Michael Meissner  

* config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok): If ISA 3.0,
enable HImode and QImode to go in vector registers by default if
the -mvsx-small-integer option is enabled.
(rs6000_secondary_reload_simple_move): Likewise.
(rs6000_preferred_reload_class): Don't force integer constants to
be loaded into vector registers that we can easily make into
memory (or being created in the GPRs and moved over with direct
move).
* config/rs6000/vsx.md (UNSPEC_P9_MEMORY): Delete, no longer
used.
(vsx_extract_): Rework V4SImode, V8HImode, and V16QImode
vector extraction on ISA 3.0 when the scalar integer can be
allocated in vector registers.  Generate the VEC_SELECT directly,
and don't use UNSPEC's to avoid having the scalar type in a vector
register.  Make the expander target registers, and let the
combiner fold in results storing to memory, if the machine
supports stores.
(vsx_extract__di): Likewise.
(vsx_extract__p9): Likewise.
(vsx_extract__di_p9): Likewise.
(vsx_extract__store_p9): Likewise.
(vsx_extract_si): Likewise.
(vsx_extract__p8): Likewise.
(p9_lxsizx): Delete, no longer used.
(p9_stxsix): Likewise.
* config/rs6000/rs6000.md (INT_ISA3): New mode iterator for
integers in vector registers for ISA 3.0.
(QHI): Update comment.
(zero_extendqi2): Add support for ISA 3.0 scalar load or
vector extract instructions in sign/zero extend.
(zero_extendhi): Likewise.
(extendqi): Likewise.
(extendhi2): Likewise.
(HImode splitter for load/sign extend in vector register):
Likewise.
(float2): Eliminate old method of
optimizing floating point conversions to/from small data types and
rewrite it to support QImode/HImode being allowed in vector
registers on ISA 3.0.
(float2_internal): Likewise.
(floatuns2): Likewise.
(floatuns2_internal): Likewise.
(fix_trunc2): Likewise.
(fix_trunc2_internal): Likewise.
(fixuns_trunc2): Likewise.
(fixuns_trunc2_internal): Likewise.
VSPLITISW on ISA 2.07.
(movhi_internal): Combine movhi_internal and movqi_internal into
one mov_internal with an iterator.  Add support for QImode
and HImode being allowed in vector registers.  Make large number
of attributes and constraints easier to read.
(movqi_internal): Likewise.
(mov_internal): Likewise.
(movdi_internal64): Fix constraint to allow loading -16..15 with
VSPLITISW on ISA 2.07.
(integer XXSPLTIB splitter): Add support for QI, HI, and SImode as
well as DImode.

[gcc/testsuite]
2016-11-09  Michael Meissner  

* gcc.target/powerpc/vsx-qimode.c: New test for QImode, HImode
being allowed in vector registers.
* gcc.target/powerpc/vsx-qimode2.c: Likewise.
* gcc.target/powerpc/vsx-qimode3.c: Likewise.
* gcc.target/powerpc/vsx-himode.c: Likewise.
* 

[Patch, Fortran, committed] PR 46459: ICE (segfault): Invalid read in compare_actual_formal [error recovery]

2016-11-09 Thread Janus Weil
Hi all,

I have committed yet another obvious ice-on-invalid fix:

https://gcc.gnu.org/viewcvs?rev=242020&root=gcc&view=rev

Cheers,
Janus
Index: gcc/fortran/interface.c
===
--- gcc/fortran/interface.c (Revision 241993)
+++ gcc/fortran/interface.c (Arbeitskopie)
@@ -3190,6 +3190,7 @@ compare_actual_formal (gfc_actual_arglist **ap, gf
 shape array, if the dummy argument has the VOLATILE attribute.  */
 
   if (f->sym->attr.volatile_
+ && a->expr->expr_type == EXPR_VARIABLE
  && a->expr->symtree->n.sym->as
  && a->expr->symtree->n.sym->as->type == AS_ASSUMED_SHAPE
  && !(f->sym->as && f->sym->as->type == AS_ASSUMED_SHAPE))
@@ -3219,6 +3220,7 @@ compare_actual_formal (gfc_actual_arglist **ap, gf
 dummy argument has the VOLATILE attribute.  */
 
   if (f->sym->attr.volatile_
+ && a->expr->expr_type == EXPR_VARIABLE
  && a->expr->symtree->n.sym->attr.pointer
  && a->expr->symtree->n.sym->as
  && !(f->sym->as


Re: [PATCH][1/2] GIMPLE Frontend, C FE parts (and GIMPLE parser)

2016-11-09 Thread Jason Merrill
On Mon, Nov 7, 2016 at 2:24 AM, Richard Biener  wrote:
> The issue with moving is that I failed to export the definition of
> c_parser in c-parser.h due to gengtype putting vec 
> handlers into gtype-c.h but not gtype-objc.h and thus objc bootstrap
> fails :/
>
> I believe (well, I hope) that code generation for the C parser
> should be mostly unaffected (inlining is still done as determined
> useful) and the performance of the GIMPLE parser shouldn't be
> too important.
>
> If anybody feels like digging into the gengtype issue, I gave up
> after trying for half a day to trick it to do what I want
> (like for example also putting it in gtype-objc.h).

I have a fix for that issue waiting for review:

https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02265.html

Jason


RE: [PATCH,testsuite] MIPS: Upgrade to MIPS IV if using (HAS_MOVN) with MIPS III.

2016-11-09 Thread Moore, Catherine


> -Original Message-
> From: Toma Tabacu [mailto:toma.tab...@imgtec.com]
> Sent: Wednesday, November 9, 2016 12:19 PM
> Subject: RE: [PATCH,testsuite] MIPS: Upgrade to MIPS IV if using (HAS_MOVN) 
> with MIPS III.
> 
> No, I don't have write access.
> I would be grateful if you could commit the patch for me, if you think it
> would be OK with Matthew.
> To be clear, I am not in a rush to get this committed.

You're all set now.  I committed this:  revision 242021.

Catherine

> gcc/testsuite/ChangeLog:
> 
> 2016-11-09  Toma Tabacu  
> 
>   * gcc.target/mips/mips.exp (mips-dg-options): Upgrade to MIPS IV if
>   using (HAS_MOVN) with MIPS III.
> 



C++ PATCH for some C++17 class deduction issues

2016-11-09 Thread Jason Merrill
Discussion at the meeting led me to notice that class template
deduction wasn't working with template template parameters.  This
patch also improves a few diagnostic issues.
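
For reference, the kind of construct this makes work is along these lines
(a sketch, not one of the committed testcases):

// A template template parameter used as a C++17 class-deduction placeholder.
template <typename T> struct box { box (T) {} };

template <template <typename> class TT>
auto
make ()
{
  return TT (42);   // placeholder for TT; deduces box<int> when TT = box
}

auto b = make<box> ();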

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 66d2621abd8ed6bb47d0fda747a572e00aece25e
Author: Jason Merrill 
Date:   Tue Nov 8 10:14:24 2016 -0800

Fix C++17 template placeholder for template template parm.

* parser.c (cp_parser_simple_type_specifier): Allow placeholder
for template template parameter.
(cp_parser_type_id_1): Improve diagnostic.
* decl.c (grokdeclarator): Handle class deduction diagnostics here.
* pt.c (splice_late_return_type): Not here.
(tsubst) [TEMPLATE_TYPE_PARM]: Substitute into placeholder template.
(do_class_deduction): Handle non-class templates.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index bd37faa..4b18d4e 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -9490,6 +9490,11 @@ grokdeclarator (const cp_declarator *declarator,
   if (initialized > 1)
 funcdef_flag = true;
 
+  location_t typespec_loc = smallest_type_quals_location (type_quals,
+ declspecs->locations);
+  if (typespec_loc == UNKNOWN_LOCATION)
+typespec_loc = declspecs->locations[ds_type_spec];
+
   /* Look inside a declarator for the name being declared
  and get it as a string, for an error message.  */
   for (id_declarator = declarator;
@@ -10011,6 +10016,16 @@ grokdeclarator (const cp_declarator *declarator,
   /* We might have ignored or rejected some of the qualifiers.  */
   type_quals = cp_type_quals (type);
 
+  if (cxx_dialect >= cxx1z && type && is_auto (type)
+  && innermost_code != cdk_function
+  && id_declarator && declarator != id_declarator)
+if (tree tmpl = CLASS_PLACEHOLDER_TEMPLATE (type))
+{
+  error_at (typespec_loc, "template placeholder type %qT must be followed "
+   "by a simple declarator-id", type);
+  inform (DECL_SOURCE_LOCATION (tmpl), "%qD declared here", tmpl);
+}
+
   staticp = 0;
   inlinep = decl_spec_seq_has_spec_p (declspecs, ds_inline);
   virtualp =  decl_spec_seq_has_spec_p (declspecs, ds_virtual);
@@ -10247,12 +10262,7 @@ grokdeclarator (const cp_declarator *declarator,
  {
if (SCALAR_TYPE_P (type) || VOID_TYPE_P (type))
  {
-   location_t loc;
-   loc = smallest_type_quals_location (type_quals,
-   declspecs->locations);
-   if (loc == UNKNOWN_LOCATION)
- loc = declspecs->locations[ds_type_spec];
-   warning_at (loc, OPT_Wignored_qualifiers, "type "
+   warning_at (typespec_loc, OPT_Wignored_qualifiers, "type "
"qualifiers ignored on function return type");
  }
/* We now know that the TYPE_QUALS don't apply to the
@@ -10301,11 +10311,12 @@ grokdeclarator (const cp_declarator *declarator,
funcdecl_p = inner_declarator && inner_declarator->kind == cdk_id;
 
/* Handle a late-specified return type.  */
+   tree late_return_type = declarator->u.function.late_return_type;
if (funcdecl_p)
  {
-   if (type_uses_auto (type))
+   if (tree auto_node = type_uses_auto (type))
  {
-   if (!declarator->u.function.late_return_type)
+   if (!late_return_type)
  {
if (current_class_type
&& LAMBDA_TYPE_P (current_class_type))
@@ -10333,8 +10344,32 @@ grokdeclarator (const cp_declarator *declarator,
   name, type);
return error_mark_node;
  }
+   if (tree tmpl = CLASS_PLACEHOLDER_TEMPLATE (auto_node))
+ {
+   if (!late_return_type)
+ {
+   if (dguide_name_p (unqualified_id))
+ error_at (typespec_loc, "deduction guide for "
+   "%qT must have trailing return type",
+   TREE_TYPE (tmpl));
+   else
+ error_at (typespec_loc, "deduced class type %qT "
+   "in function return type", type);
+   inform (DECL_SOURCE_LOCATION (tmpl),
+   "%qD declared here", tmpl);
+ }
+   else if (CLASS_TYPE_P (late_return_type)
+&& CLASSTYPE_TEMPLATE_INFO (late_return_type)
+&& (CLASSTYPE_TI_TEMPLATE (late_return_type)
+

C++ PATCH for C++17 auto non-type template parameters

2016-11-09 Thread Jason Merrill
This was pretty straightforward; auto template parameters look more or
less the same as any other parameter with dependent type, so
substitution works without any changes, we just need to do auto
deduction in a couple of places.  The most involved bit was handling
deduction of a type parameter from the type of an array bound, which
only happens if deduction otherwise doesn't find a binding.  At first
I tried to work this into unify et al, but eventually decided to do it
in a (much simpler) separate function.
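
A minimal example of the feature itself (illustrative only, not one of the
new tests):

// P0127R2: a non-type template parameter declared with auto gets its type
// deduced from the corresponding template argument.
template <auto N> struct constant { static constexpr auto value = N; };

constant<5> i;      // N deduced as int
constant<'a'> c;    // N deduced as char

static_assert (constant<5>::value == 5, "");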

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 04ade0b5013698fd1e458ef1425f7afd023feaf0
Author: Jason Merrill 
Date:   Mon Nov 7 16:45:06 2016 -0800

Implement P0127R2, Declaring non-type parameters with auto.

gcc/cp/
* cp-tree.h (enum auto_deduction_context): Add adc_unify.
* decl.c (grokdeclarator): Allow 'auto' in C++17 template non-type
parameter types.
* pt.c (do_auto_deduction): Add outer_targs parameter.
(convert_template_argument): Call do_auto_deduction.  If adc_unify,
don't give up on dependent init.
(unify): Likewise.  In C++17, walk into the type of a
TEMPLATE_PARM_INDEX.
(for_each_template_parm): Add any_fn parameter.
(struct pair_fn_data): Likewise.
(for_each_template_parm_r): Call it for any tree.  In C++17, walk
into the type of a TEMPLATE_PARM_INDEX.
(zero_r, array_deduction_r, try_array_deduction): New.
(type_unification_real): Call try_array_deduction.
(get_partial_spec_bindings): Likewise.
gcc/c-family/
* c-cppbuiltin.c (c_cpp_builtins): Define __cpp_template_auto.

diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 55dbf44..70eade1 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -942,6 +942,7 @@ c_cpp_builtins (cpp_reader *pfile)
  cpp_define (pfile, "__cpp_aggregate_bases=201603");
  cpp_define (pfile, "__cpp_deduction_guides=201606");
  cpp_define (pfile, "__cpp_noexcept_function_type=201510");
+ cpp_define (pfile, "__cpp_template_auto=201606");
}
   if (flag_concepts)
cpp_define (pfile, "__cpp_concepts=201507");
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 20b52ad..9b5b5bc 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5163,6 +5163,7 @@ enum auto_deduction_context
   adc_unspecified,   /* Not given */
   adc_variable_type, /* Variable initializer deduction */
   adc_return_type,   /* Return type deduction */
+  adc_unify, /* Template argument deduction */
   adc_requirement/* Argument dedution constraint */
 };
 
@@ -6088,7 +6089,8 @@ extern tree make_template_placeholder (tree);
 extern tree do_auto_deduction   (tree, tree, tree);
 extern tree do_auto_deduction   (tree, tree, tree,
  tsubst_flags_t,
- auto_deduction_context);
+ auto_deduction_context,
+tree = NULL_TREE);
 extern tree type_uses_auto (tree);
 extern tree type_uses_auto_or_concept  (tree);
 extern void append_type_to_template_for_access_check (tree, tree, tree,
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index c0321f9..bd37faa 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -11135,7 +11135,8 @@ grokdeclarator (const cp_declarator *declarator,
   if (ctype || in_namespace)
error ("cannot use %<::%> in parameter declaration");
 
-  if (type_uses_auto (type))
+  if (type_uses_auto (type)
+ && !(cxx_dialect >= cxx1z && template_parm_flag))
{
  if (cxx_dialect >= cxx14)
error ("% parameter not permitted in this context");
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 3df71dd..64e566e 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -161,7 +161,7 @@ static tree convert_nontype_argument (tree, tree, 
tsubst_flags_t);
 static tree convert_template_argument (tree, tree, tree,
   tsubst_flags_t, int, tree);
 static tree for_each_template_parm (tree, tree_fn_t, void*,
-   hash_set *, bool);
+   hash_set *, bool, tree_fn_t = NULL);
 static tree expand_template_argument_pack (tree);
 static tree build_template_parm_index (int, int, int, tree, tree);
 static bool inline_needs_template_parms (tree, bool);
@@ -7299,6 +7299,13 @@ convert_template_argument (tree parm,
 {
   tree t = tsubst (TREE_TYPE (parm), args, complain, in_decl);
 
+  if (tree a = type_uses_auto (t))
+   {
+ t = do_auto_deduction (t, arg, a, complain, adc_unspecified);
+ if (t == error_mark_node)
+   return error_mark_node;
+ 

[PATCH, i386]: Fix PR78262, wrong code with -fschedule-insns

2016-11-09 Thread Uros Bizjak
Hello!

We need earlyclobber on output operand of doubleword shift insns,
since we have to prevent (partial) output matching %ecx as count
argument.

2016-11-09  Uros Bizjak  

PR target/78262
* config/i386/i386.md (*3_doubleword): Mark
operand 0 as earlyclobber.

testsuite/ChangeLog:

2016-11-09  Uros Bizjak  

PR target/78262
* gcc.target/i386/pr78262.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline, will be backported to release branches.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 242004)
+++ config/i386/i386.md (working copy)
@@ -10339,7 +10339,7 @@
   "operands[2] = gen_lowpart (QImode, operands[2]);")
 
 (define_insn_and_split "*3_doubleword"
-  [(set (match_operand:DWI 0 "register_operand" "=r")
+  [(set (match_operand:DWI 0 "register_operand" "=&r")
(any_shiftrt:DWI (match_operand:DWI 1 "register_operand" "0")
 (match_operand:QI 2 "nonmemory_operand" "c")))
(clobber (reg:CC FLAGS_REG))]
Index: testsuite/gcc.target/i386/pr78262.c
===
--- testsuite/gcc.target/i386/pr78262.c (nonexistent)
+++ testsuite/gcc.target/i386/pr78262.c (working copy)
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+/* { dg-require-effective-target int128 } */
+/* { dg-options "-O -fschedule-insns" } */
+
+typedef unsigned char u8;
+typedef unsigned __int128 u128;
+
+static u128 u128_0;
+static u128 *p128;
+
+u128 __attribute__ ((noinline, noclone))
+foo(u8 u8_0)
+{
+  p128 = &u128_0;
+  u128_0 = u8_0;
+  u128_0 = u128_0 << 127 | u128_0 >> 1;
+  u128_0 >>= (u8)u128_0;
+  return 2 + u128_0;
+}
+
+int
+main()
+{
+  u128 x = foo(5);
+  if (p128 != &u128_0)
+__builtin_abort();
+  if (u128_0 != ((u128)2 << 124))
+__builtin_abort();
+  if (x != ((u128)2 << 124) + 2)
+__builtin_abort();
+  return 0;
+}


Re: [PATCH 2/4] BRIG (HSAIL) frontend: The FE itself.

2016-11-09 Thread Pekka Jääskeläinen
Hi Martin,

On Fri, Nov 4, 2016 at 4:58 PM, Martin Jambor  wrote:
> I personally primarily want this for debugging purposes, and we should
> try to eventually report errors in a more comprehensible way than
> HSAILasm.  But more generally, and more importantly, if the input,
> whether human readable or not, is incorrect, the compiler should not
> produce an "internal error."  If that happens, users will file bugs
> against gcc when in fact it is the generating tool that is broken.
> Once you maintain the FE, you personally will not want this :-)

Yes, I will add better error messages as soon as issues are reported.

> I was wondering why in case of VECTOR_TYPE, instead of
> return build1 (VIEW_CONVERT_EXPR, type, expr);
> you do not do
> return fold (convert_to_vector (type, expr));
>
> it seemed to me natural given how you handle the other cases and that
> the function internally also builds a V_C_E, after two checks.  So I
> thought the checks (matching overall size and only converting from an
> integer or another vector-type) were the problem and was wondering
> which one.  Especially the size one is something I believe you should
> never violate too.

I see. I cannot recall the origins of this, but my guess is that the code is
copied from an older Go frontend version. I tested with your
suggestion and it seems
to work fine.

> copy_move_inst_handler operator() starts by type-casting whatever it
> gets to BrigInstSourceType* and immediately dereferencing it, loading
> field sourceType, even though, in case of an LDA, what it is actually
> looking at is not a BrigInstSourceType but a BrigInstAddr, which is an
> entirely different structure that does not have that field.  So you
> read the 8-bit segment and the first reserved field and happily pass
> that to gccbrig_tree_type_for_hsa_type, hoping it does not explode on
> random data.  That is wrong.
>
> Later on in the operator() you actually figure out that you are
> looking at BrigInstSourceType but that is too late.

Ah, I see the issue now. It worked as with LDA the input type was
forced and handled explicitly. I now made the code more explicit
regarding this.

> Indeed it does, thanks.  However, I think that at the point when you
> do:
>
>   if (offs > 0)
> const_offset = build_int_cst (size_type_node, offs);
>
> const_offset can already be non-NULL you may lose whatever value it
> had before.  You might want to keep the constant offset as integer
> until the very moment when you may want to use it.

Great catch! There indeed was a major memory corruption with
group and private variables which was not caught by the PRM conformance
suite I use for testing. It should now be fixed.

> (Also, single statements should not have braces around them ;-)

...also fixed these, once again ;)

> I see.  If you think that MULT_HIGHPART_EXPR is broken, it may be
> worth filing a bug report (probably during next stage1).

It can be considered either broken or just not fully implemented for all
targets and various vector types (especially those not directly supported by
the target).  Yes, I would rather look closer and report this after the patch
has been merged, to be able to provide BRIG test cases that work with the
upstream code base.

> This week I have checked out the updated tree from github and looked
> at only a few changed functions there.  Hopefully the steering
> committee will decide soon and we'll get attention of a global
> reviewer during the next few weeks too.
>
> Thanks for addressing so many of my comments,

Thanks again, an updated FE patch attached.

BR,
Pekka


002-brig-fe-gcc.patch.gz
Description: GNU Zip compressed data


Re: [PATCH] Fix PR78189

2016-11-09 Thread Christophe Lyon
On 9 November 2016 at 09:36, Bin.Cheng  wrote:
> On Tue, Nov 8, 2016 at 9:11 AM, Richard Biener  wrote:
>> On Mon, 7 Nov 2016, Christophe Lyon wrote:
>>
>>> Hi Richard,
>>>
>>>
>>> On 7 November 2016 at 09:01, Richard Biener  wrote:
>>> >
>>> > The following fixes an oversight when computing alignment in the
>>> > vectorizer.
>>> >
>>> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
>>> >
>>> > Richard.
>>> >
>>> > 2016-11-07  Richard Biener  
>>> >
>>> > PR tree-optimization/78189
>>> > * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Fix
>>> > alignment computation.
>>> >
>>> > * g++.dg/torture/pr78189.C: New testcase.
>>> >
>>> > Index: gcc/testsuite/g++.dg/torture/pr78189.C
>>> > ===
>>> > --- gcc/testsuite/g++.dg/torture/pr78189.C  (revision 0)
>>> > +++ gcc/testsuite/g++.dg/torture/pr78189.C  (working copy)
>>> > @@ -0,0 +1,41 @@
>>> > +/* { dg-do run } */
>>> > +/* { dg-additional-options "-ftree-slp-vectorize -fno-vect-cost-model" } 
>>> > */
>>> > +
> >>> > +#include <cstddef>
>>> > +
>>> > +struct A
>>> > +{
>>> > +  void * a;
>>> > +  void * b;
>>> > +};
>>> > +
>>> > +struct alignas(16) B
>>> > +{
>>> > +  void * pad;
>>> > +  void * misaligned;
>>> > +  void * pad2;
>>> > +
>>> > +  A a;
>>> > +
>>> > +  void Null();
>>> > +};
>>> > +
>>> > +void B::Null()
>>> > +{
>>> > +  a.a = nullptr;
>>> > +  a.b = nullptr;
>>> > +}
>>> > +
>>> > +void __attribute__((noinline,noclone))
>>> > +NullB(void * misalignedPtr)
>>> > +{
> >>> > +  B* b = reinterpret_cast<B*>(reinterpret_cast<char*>(misalignedPtr) - 
>>> > offsetof(B, misaligned));
>>> > +  b->Null();
>>> > +}
>>> > +
>>> > +int main()
>>> > +{
>>> > +  B b;
> >>> > +  NullB(&b.misaligned);
>>> > +  return 0;
>>> > +}
>>> > diff --git gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
>>> > index 9346cfe..b03cb1e 100644
>>> > --- gcc/tree-vect-data-refs.c
>>> > +++ gcc/tree-vect-data-refs.c
>>> > @@ -773,10 +773,25 @@ vect_compute_data_ref_alignment (struct 
>>> > data_reference *dr)
>>> >base = ref;
>>> >while (handled_component_p (base))
>>> >  base = TREE_OPERAND (base, 0);
>>> > +  unsigned int base_alignment;
>>> > +  unsigned HOST_WIDE_INT base_bitpos;
> >>> > +  get_object_alignment_1 (base, &base_alignment, &base_bitpos);
>>> > +  /* As data-ref analysis strips the MEM_REF down to its base operand
>>> > + to form DR_BASE_ADDRESS and adds the offset to DR_INIT we have to
>>> > + adjust things to make base_alignment valid as the alignment of
>>> > + DR_BASE_ADDRESS.  */
>>> >if (TREE_CODE (base) == MEM_REF)
>>> > -base = build2 (MEM_REF, TREE_TYPE (base), base_addr,
>>> > -  build_int_cst (TREE_TYPE (TREE_OPERAND (base, 1)), 0));
>>> > -  unsigned int base_alignment = get_object_alignment (base);
>>> > +{
>>> > +  base_bitpos -= mem_ref_offset (base).to_short_addr () * 
>>> > BITS_PER_UNIT;
>>> > +  base_bitpos &= (base_alignment - 1);
>>> > +}
>>> > +  if (base_bitpos != 0)
>>> > +base_alignment = base_bitpos & -base_bitpos;
>>> > +  /* Also look at the alignment of the base address DR analysis
>>> > + computed.  */
>>> > +  unsigned int base_addr_alignment = get_pointer_alignment (base_addr);
>>> > +  if (base_addr_alignment > base_alignment)
>>> > +base_alignment = base_addr_alignment;
>>> >
>>> >if (base_alignment >= TYPE_ALIGN (TREE_TYPE (vectype)))
>>> >  DR_VECT_AUX (dr)->base_element_aligned = true;
>>>
>>> Since you committed this patch (r241892), I'm seeing execution failures:
>>>   gcc.dg/vect/pr40074.c -flto -ffat-lto-objects execution test
>>>   gcc.dg/vect/pr40074.c execution test
>>> on armeb-none-linux-gnueabihf --with-mode=arm --with-cpu=cortex-a9
>>> --with-fpu=neon-fp16
>>> (using qemu as simulator)
>>
>> The difference is that we now vectorize the testcase with versioning
>> for alignment (but it should never execute the vectorized variant).
>> I need arm people's help to understand what is wrong.
> Hi All,
> I will look at it.
>

Hi,

This is causing new regressions on armeb:
  gcc.dg/vect/vect-strided-a-u8-i2-gap.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorized 2 loops" 1
  gcc.dg/vect/vect-strided-a-u8-i2-gap.c scan-tree-dump-times vect
"vectorized 2 loops" 1

pr40074 passes now,

Thanks,

Christophe

> Thanks,
> bin
>>
>> At least the testcase shows there is (kind-of) a missed optimization
>> that we no longer figure out versioning for alignment is useless.
>> I'll look into that.
>>
>> Richard.


[Patch, Fortran, committed] PR 60777: [F03] RECURSIVE function rejected in specification expression

2016-11-09 Thread Janus Weil
Hi all,

I just committed a close-to-obvious fix for PR60777 (ok'd by Steve in
the PR), see attachment:

https://gcc.gnu.org/viewcvs?rev=242009&root=gcc&view=rev

Cheers,
Janus
Index: gcc/fortran/expr.c
===
--- gcc/fortran/expr.c  (Revision 241993)
+++ gcc/fortran/expr.c  (Arbeitskopie)
@@ -2794,12 +2794,12 @@ external_spec_function (gfc_expr *e)
   return false;
 }
 
-  if (f->attr.recursive)
-{
-  gfc_error ("Specification function %qs at %L cannot be RECURSIVE",
-f->name, &e->where);
+  /* F08:7.1.11.6. */
+  if (f->attr.recursive
+  && !gfc_notify_std (GFC_STD_F2003,
+ "Specification function '%s' "
+ "at %L cannot be RECURSIVE",  f->name, >where))
   return false;
-}
 
 function_allowed:
   return restricted_args (e->value.function.actual);
! { dg-do run }
!
! PR 60777: [F03] RECURSIVE function rejected in specification expression
!
! Contributed by Vladimir Fuka 

module recur
  implicit none
contains

  pure recursive function f(n) result(answer)
integer, intent(in) :: n
integer :: answer
if (n<2) then
  answer = 1
else
  answer = f(n-1)*n
end if
  end function

  pure function usef(n)
integer,intent(in) :: n
character(f(n)):: usef
usef = repeat('*',f(n))
  end function
end module

program testspecexpr
  use recur
  implicit none
  if (usef(1) /= '*')  call abort()
  if (usef(2) /= '**') call abort()
  if (usef(3) /= '******') call abort()
end


Re: [PATCH, LIBGCC] Avoid count_leading_zeros with undefined result (PR 78067)

2016-11-09 Thread Bernd Edlinger
On 11/09/16 17:52, Joseph Myers wrote:
> On Wed, 9 Nov 2016, James Greenhalgh wrote:
>
>> These new test cases look like they are going to be out of exponent range
>> for _Float16 - so the testcases will fail for a target which tests either
>> of:
>>
>>   gcc.dg/torture/fp-int-convert-float16.c
>>   gcc.dg/torture/fp-int-convert-float16-timode.c
>>
>> Should they have an M_OK1 check?
>
> Yes.
>

Yes, but maybe introduce a test if the half-wide value fits?

like:

#define M_OK2(M, T) ((M) > sizeof(T) * CHAR_BIT / 2 - 1)


Bernd.


Re: [PATCH][1/2] GIMPLE Frontend, C FE parts (and GIMPLE parser)

2016-11-09 Thread Richard Biener
On November 9, 2016 5:45:32 PM GMT+01:00, Joseph Myers 
 wrote:
>On Wed, 9 Nov 2016, Richard Biener wrote:
>
>> I'll push back c_parser to the header and put inlines I need to
>export
>> there as well.
>> 
>> Joseph, is this (with regard to the inlines) your preference as well?
>
>I'm not clear what the proposal is.  If some patch is now proposed 
>different from what was previously posted to gcc-patches, please post
>it.

I prefer the posted patch with c_parser not exported.  But Jakub raised 
concerns about the inlines, thus I dug again and found a possible solution to 
the gengtype issue.  Thus my question whether you prefer the inlines to stay as 
inlines and thus move to the header, or if you do not request this change (that 
will save me work).

Thanks,
Richard.




RE: [PATCH,testsuite] MIPS: Upgrade to MIPS IV if using (HAS_MOVN) with MIPS III.

2016-11-09 Thread Toma Tabacu
> From: Moore, Catherine [mailto:catherine_mo...@mentor.com]
> Sent: 08 November 2016 20:47
> To: Toma Tabacu; gcc-patches@gcc.gnu.org
> Cc: Matthew Fortune
> Subject: RE: [PATCH,testsuite] MIPS: Upgrade to MIPS IV if using (HAS_MOVN)
> with MIPS III.
> 
> 
> 
> > -Original Message-
> > From: Toma Tabacu [mailto:toma.tab...@imgtec.com]
> > Sent: Monday, November 7, 2016 11:21 AM
> > gcc/testsuite/ChangeLog:
> >
> > 2016-11-07  Toma Tabacu  
> >
> > * gcc.target/mips/mips.exp (mips-dg-options): Upgrade to MIPS IV if
> using
> > (HAS_MOVN) with MIPS III.
> >
> > diff --git a/gcc/testsuite/gcc.target/mips/mips.exp
> > b/gcc/testsuite/gcc.target/mips/mips.exp
> > index 39f44ff..e22d782 100644
> > --- a/gcc/testsuite/gcc.target/mips/mips.exp
> > +++ b/gcc/testsuite/gcc.target/mips/mips.exp
> > @@ -1129,7 +1129,7 @@ proc mips-dg-options { args } {
> >  # We need MIPS IV or higher for:
> > #
> > #
> > -   } elseif { $isa < 3
> > +   } elseif { $isa < 4
> >&& [mips_have_test_option_p options "HAS_MOVN"] }
> > {
> > mips_make_test_option options "-mips4"
> >  # We need MIPS III or higher for:
> 
> Hi Toma,
> 
> The patch itself is OK, but the ChangeLog entry line length is greater than 
> 80.
> 

Yes, sorry. The version below fixes this.

> Do you have write access to the repository?  Please let me know if you would
> like me to commit this for you?
> 
> Thanks,
> Catherine

No, I don't have write access.
I would be grateful if you could commit the patch for me, if you think it
would be OK with Matthew.
To be clear, I am not in a rush to get this committed.

Thanks,
Toma Tabacu

gcc/testsuite/ChangeLog:

2016-11-09  Toma Tabacu  

* gcc.target/mips/mips.exp (mips-dg-options): Upgrade to MIPS IV if
using (HAS_MOVN) with MIPS III.

diff --git a/gcc/testsuite/gcc.target/mips/mips.exp 
b/gcc/testsuite/gcc.target/mips/mips.exp
index 39f44ff..e22d782 100644
--- a/gcc/testsuite/gcc.target/mips/mips.exp
+++ b/gcc/testsuite/gcc.target/mips/mips.exp
@@ -1129,7 +1129,7 @@ proc mips-dg-options { args } {
 # We need MIPS IV or higher for:
#
#
-   } elseif { $isa < 3
+   } elseif { $isa < 4
   && [mips_have_test_option_p options "HAS_MOVN"] } {
mips_make_test_option options "-mips4"
 # We need MIPS III or higher for:



Re: [PATCH] enable -fprintf-return-value by default

2016-11-09 Thread Sandra Loosemore

On 11/08/2016 08:13 PM, Martin Sebor wrote:

The -fprintf-return-value optimization has been disabled since
the last time it caused a bootstrap failure on powerpc64le.  With
the underlying problems fixed GCC has bootstrapped fine on all of
powerpc64, powerpc64le and x86_64 and tested with no regressions.
I'd like to re-enable the option.  The attached patch does that.




diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 17c5c22..adebeff 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -8301,7 +8301,7 @@ if (snprintf (buf, "%08x", i) >= sizeof buf)
 The @option{-fprintf-return-value} option relies on other optimizations
 and yields best results with @option{-O2}.  It works in tandem with the
 @option{-Wformat-length} option.  The @option{-fprintf-return-value}
-option is disabled by default.
+option is enabled by default.

 @item -fno-peephole
 @itemx -fno-peephole2


Near the beginning of this chapter, in @node Invoking GCC, it says:

Many options have long names starting with @samp{-f} or with
@samp{-W}---for example,
@option{-fmove-loop-invariants}, @option{-Wformat} and so on.  Most of
these have both positive and negative forms; the negative form of
@option{-ffoo} is @option{-fno-foo}.  This manual documents
only one of these two forms, whichever one is not the default.

So you should be documenting the non-default negative form 
-fno-printf-return-value instead of the default positive form.  The 
corresponding entry in the list in @node Option Summary needs to be 
adjusted, too.


-Sandra



Re: [patch,avr] Add new option -mabsdata.

2016-11-09 Thread Sandra Loosemore

On 11/07/2016 05:54 AM, Georg-Johann Lay wrote:

@@ -15261,6 +15262,13 @@ GCC supports the following AVR devices a

 @include avr-mmcu.texi

+@item -mabsdata
+@opindex mabsdata
+
+Assume that all data in static stocage can be accessed by LDS / STS


s/stocage/storage/


+inctructions.  This option has only an effect on reduced Tiny devices like


s/inctructions/instructions/


+ATtiny40.
+
 @item -maccumulate-args
 @opindex maccumulate-args


The documentation changes are OK with those typos fixed.

-Sandra



Re: [PATCH, LIBGCC] Avoid count_leading_zeros with undefined result (PR 78067)

2016-11-09 Thread Joseph Myers
On Wed, 9 Nov 2016, James Greenhalgh wrote:

> These new test cases look like they are going to be out of exponent range
> for _Float16 - so the testcases will fail for a target which tests either
> of:
> 
>   gcc.dg/torture/fp-int-convert-float16.c
>   gcc.dg/torture/fp-int-convert-float16-timode.c
> 
> Should they have an M_OK1 check?

Yes.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][1/2] GIMPLE Frontend, C FE parts (and GIMPLE parser)

2016-11-09 Thread Joseph Myers
On Wed, 9 Nov 2016, Richard Biener wrote:

> I'll push back c_parser to the header and put inlines I need to export
> there as well.
> 
> Joseph, is this (with regard to the inlines) your preference as well?

I'm not clear what the proposal is.  If some patch is now proposed 
different from what was previously posted to gcc-patches, please post it.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] (v2) print-rtl-function.c: add (param) directive to dump

2016-11-09 Thread Bernd Schmidt

On 11/09/2016 05:46 PM, David Malcolm wrote:


OK for trunk if it passes bootstrap and regrtest?

gcc/ChangeLog:
* print-rtl-function.c (print_any_param_name): New function.
(print_param): New function.
(print_rtx_function): Call print_param for each argument.
* print-rtl.c (rtx_writer::finish_directive): New function.
* print-rtl.h (rtx_writer::finish_directive): New decl.
+
+  rtx decl_rtl = DECL_WRTL_CHECK (arg)->decl_with_rtl.rtl;


Isn't this DECL_RTL_IF_SET? If so, please use that macro. Otherwise ok.


Bernd


Re: [PATCH, GCC/ARM] Fix PR77933: stack corruption on ARM when using high registers and lr

2016-11-09 Thread Thomas Preudhomme
I've reworked the patch following comments from Wilco [1] (sorry, I could not 
find it in my MUA for some reason).


[1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00317.html


== Context ==

When saving registers, function thumb1_expand_prologue () aims at minimizing the 
number of push instructions. One of the optimizations it does is to push LR 
alongside high register(s) (after having moved them to low register(s)) when 
there is no low register to save. The way this is implemented is to add LR to 
the pushable_regs mask if it is live just before pushing the registers in that 
mask. The mask of live pushable registers, which is used to detect whether LR 
needs to be saved, is then cleared to ensure LR is only saved once.



== Problem ==

However, beyond deciding what registers to push, pushable_regs is used to track 
which pushable registers can be used to move a high register before it is 
pushed, hence the name. That mask is cleared when all high registers have been 
assigned a low register, but the clearing assumes the high registers were 
assigned to the registers with the biggest numbers in that mask. This is not the 
case because LR is not considered when looking for a register in that mask. 
Furthermore, LR might have been saved in the TARGET_BACKTRACE path above, yet 
the mask of live pushable registers is not cleared in that case.
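
A hypothetical reproducer (not necessarily the committed test) is a Thumb-1
function whose prologue has only high registers and LR to save, which an asm
clobber can force:

/* Hypothetical reproducer sketch; compile for a Thumb-1 target such as
   Cortex-M0.  The clobber forces the prologue to save high registers and LR
   with no low register to save, the situation described above.  */
int
main (void)
{
  __asm__ volatile ("" : : : "r8", "r9", "r10", "r11", "lr");
  return 0;
}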



== Solution ==

This patch changes the loop to iterate over registers LR down to r0 so as to 
both fix the stack corruption reported in PR77933 and reuse LR to push some 
high register when possible. This patch also introduces a new variable 
lr_needs_saving to record whether LR (still) needs to be saved at a given point 
in the code and sets the variable accordingly throughout the code, thus fixing 
the second issue. Finally, this patch creates a new push_mask variable to 
distinguish between the mask of registers to push and the mask of live 
pushable registers.



== Note ==

Other bits could have been improved but have been left out to allow the patch to 
be backported to stable branch:


(1) using argument registers that are not holding an argument
(2) using push_mask consistently instead of l_mask (in TARGET_BACKTRACE), mask 
(low register push) and push_mask

(3) the !l_mask case improved in TARGET_BACKTRACE since offset == 0
(4) rename l_mask to a more appropriate name (live_pushable_regs_mask?)

ChangeLog entry are as follow:

*** gcc/ChangeLog ***

2016-11-08  Thomas Preud'homme  

PR target/77933
* config/arm/arm.c (thumb1_expand_prologue): Distinguish between lr
being live in the function and lr needing to be saved.  Distinguish
between already saved pushable registers and registers to push.
Check for LR being an available pushable register.


*** gcc/testsuite/ChangeLog ***

2016-11-08  Thomas Preud'homme  

PR target/77933
* gcc.target/arm/pr77933-1.c: New test.
* gcc.target/arm/pr77933-2.c: Likewise.


Testing: no regression on arm-none-eabi GCC cross-compiler targeting Cortex-M0

Is this ok for trunk?

Best regards,

Thomas

On 02/11/16 17:08, Thomas Preudhomme wrote:

Hi,

When saving registers, function thumb1_expand_prologue () aims at minimizing the
number of push instructions. One of the optimization it does is to push lr
alongside high register(s) (after having moved them to low register(s)) when
there is no low register to save. The way this is implemented is to add lr to
the list of registers that can be pushed just before the push happens. This
would then push lr and allow it to be used for further pushes if there were not
enough registers to push all the high registers that need pushing.

However, the logic that decides what register to move high registers to before
being pushed only looks at low registers (see for loop initialization). This
means not only that lr is not used for pushing high registers but also that lr
is not removed from the list of registers to be pushed when it's not used. This
extra lr push is not popped in the epilogue, leading to stack corruption.

This patch changes the loop to iterate over register r0 to lr so as to both fix
the stack corruption and reuse lr to push some high register when possible.

ChangeLog entry are as follow:

*** gcc/ChangeLog ***

2016-11-01  Thomas Preud'homme  

PR target/77933
* config/arm/arm.c (thumb1_expand_prologue): Also check for lr being a
pushable register.


*** gcc/testsuite/ChangeLog ***

2016-11-01  Thomas Preud'homme  

PR target/77933
* gcc.target/arm/pr77933.c: New test.


Testing: no regression on arm-none-eabi GCC cross-compiler targeting Cortex-M0

Is this ok for trunk?

Best regards,

Thomas
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index dd8d5e5db8ca50daab648e58df290969aa794862..ddbda3e46dbcabb6c5775f847bc338c37705e122 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ 

Re: [PATCH, LIBGCC] Avoid count_leading_zeros with undefined result (PR 78067)

2016-11-09 Thread James Greenhalgh
On Sat, Oct 29, 2016 at 05:47:54AM +, Bernd Edlinger wrote:
> On 10/28/16 16:05, Bernd Edlinger wrote:
> > On 10/27/16 22:23, Joseph Myers wrote:
> >> On Thu, 27 Oct 2016, Bernd Edlinger wrote:
> >>
> >>> Hi,
> >>>
> >>> by code reading I became aware that libgcc can call count_leading_zeros
> >>> in certain cases which can give undefined results.  This happens on
> >>> signed int128 -> float or double conversions, when the int128 is in
> >>> the range
> >>> INT64_MAX+1 to UINT64_MAX.
> >>
> >> I'd expect testcases added to the testsuite that exercise this case at
> >> runtime, if not already present.
> >>
> >
> > Yes, thanks.  I somehow expected there were already test cases,
> > somewhere, but now when you ask that, I begin to doubt as well...
> >
> > I will try to add an asm("int 3") and see if that gets hit at all.
> >
> 
> The breakpoint got hit only once, in the libgo testsuite: runtime/pprof.
> 
> I see there are some int to float conversion tests at
> gcc.dg/torture/fp-int-convert*.c, where it is easy to add a
> test case that hits the breakpoint too.
> 
> However the test case does not fail before the patch,
> it is just slightly undefined behavior, that is not
> causing problems (at least for x86_64).
> 
> Find attached a new patch with test case.

These new test cases look like they are going to be out of exponent range
for _Float16 - so the testcases will fail for a target which tests either
of:

  gcc.dg/torture/fp-int-convert-float16.c
  gcc.dg/torture/fp-int-convert-float16-timode.c

Should they have an M_OK1 check?

Thanks,
James


> 
> 
> Boot-strapped on x86_64-pc-linux-gnu.
> Is it OK for trunk?
> 
> 
> Thanks
> Bernd.

> 2016-10-27  Bernd Edlinger  
> 
>   PR libgcc/78067
>   * libgcc2.c (__floatdisf, __floatdidf): Avoid undefined results from
>   count_leading_zeros.
> 
> testsuite:
> 2016-10-27  Bernd Edlinger  
> 
>   PR libgcc/78067
>   * gcc.dg/torture/fp-int-convert.h: Add more conversion tests.
>   
> 
> Index: libgcc2.c
> ===
> --- libgcc2.c (revision 241400)
> +++ libgcc2.c (working copy)
> @@ -1643,6 +1643,11 @@
>  hi = -(UWtype) hi;
>  
>UWtype count, shift;
> +#if !defined (COUNT_LEADING_ZEROS_0) || COUNT_LEADING_ZEROS_0 != W_TYPE_SIZE
> +  if (hi == 0)
> +count = W_TYPE_SIZE;
> +  else
> +#endif
>count_leading_zeros (count, hi);
>  
>/* No leading bits means u == minimum.  */
> Index: gcc/testsuite/gcc.dg/torture/fp-int-convert.h
> ===
> --- gcc/testsuite/gcc.dg/torture/fp-int-convert.h (revision 241647)
> +++ gcc/testsuite/gcc.dg/torture/fp-int-convert.h (working copy)
> @@ -53,6 +53,8 @@ do {
> \
>TEST_I_F_VAL (U, F, HVAL1U (P, U), P_OK (P, U));   \
>TEST_I_F_VAL (U, F, HVAL1U (P, U) + 1, P_OK (P, U));   \
>TEST_I_F_VAL (U, F, HVAL1U (P, U) - 1, P_OK (P, U));   \
> +  TEST_I_F_VAL (I, F, WVAL0S (I), 1);\
> +  TEST_I_F_VAL (I, F, -WVAL0S (I), 1);   \
>  } while (0)
>  
>  #define P_OK(P, T) ((P) >= sizeof(T) * CHAR_BIT)
> @@ -74,6 +76,7 @@ do {
> \
>? (S)1  \
>: (((S)1 << (sizeof(S) * CHAR_BIT - 2)) \
>   + ((S)3 << (sizeof(S) * CHAR_BIT - 2 - P
> +#define WVAL0S(S) (S)((S)1 << (sizeof(S) * CHAR_BIT / 2 - 1))
>  
>  #define TEST_I_F_VAL(IT, FT, VAL, PREC_OK)   \
>  do { \



Re: [PATCH] fix PR68468

2016-11-09 Thread Jakub Jelinek
On Wed, Nov 09, 2016 at 04:08:39PM +0100, Bernd Schmidt wrote:
> On 11/05/2016 06:14 PM, Waldemar Brodkorb wrote:
> >Hi,
> >
> >the following patch fixes PR68468.
> >Patch is used for a while in Buildroot without issues.
> >
> >2016-11-05  Waldemar Brodkorb 

Two spaces before < instead of just one.
> >
> >   PR gcc/68468

PR libgcc/68468
instead.

> >   * libgcc/unwind-dw2-fde-dip.c: fix build on FDPIC targets.

Capital F in Fix.
No libgcc/ prefix for files in libgcc/ChangeLog.

> This is ok.

I think Waldemar does not have SVN write access, are you going to check it
in or who will do that?

Jakub


[PATCH, GCC/ARM] Fix ICE when compiling empty FIQ interrupt handler in ARM mode

2016-11-09 Thread Thomas Preudhomme

Hi,

This patch fixes the following ICE when compiling an empty FIQ interrupt 
handler in ARM mode:


empty_fiq_handler.c:5:1: error: insn does not satisfy its constraints:
 }
 ^

(insn/f 13 12 14 (set (reg/f:SI 13 sp)
(plus:SI (reg/f:SI 11 fp)
(const_int 4 [0x4]))) irq.c:5 4 {*arm_addsi3}
 (expr_list:REG_CFA_ADJUST_CFA (set (reg/f:SI 13 sp)
(plus:SI (reg/f:SI 11 fp)
(const_int 4 [0x4])))
(nil)))

The ICE was provoked by a missing alternative to reflect that ARM mode can do 
an add of a general register into sp, which is unpredictable in the Thumb mode 
add immediate.


ChangeLog entries are as follow:

*** gcc/ChangeLog ***

2016-11-04  Thomas Preud'homme  

* config/arm/arm.md (arm_addsi3): Add alternative for addition of
general register with general register or ARM constant into SP
register.


*** gcc/testsuite/ChangeLog ***

2016-11-04  Thomas Preud'homme  

* gcc.target/arm/empty_fiq_handler.c: New test.


Testing: bootstrapped on ARMv7-A ARM mode & testsuite shows no regression.

Is this ok for trunk?

Best regards,

Thomas
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 8393f65bcf4c9c3e61b91e5adcd5f59ff7c6ec3f..70cd31f6cb176fe29efc1fbbf692bfc270ef5a1b 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -609,9 +609,9 @@
 ;;  (plus (reg rN) (reg sp)) into (reg rN).  In this case reload will
 ;; put the duplicated register first, and not try the commutative version.
 (define_insn_and_split "*arm_addsi3"
-  [(set (match_operand:SI  0 "s_register_operand" "=rk,l,l ,l ,r ,k ,r,r ,k ,r ,k,k,r ,k ,r")
-(plus:SI (match_operand:SI 1 "s_register_operand" "%0 ,l,0 ,l ,rk,k ,r,rk,k ,rk,k,r,rk,k ,rk")
- (match_operand:SI 2 "reg_or_int_operand" "rk ,l,Py,Pd,rI,rI,k,Pj,Pj,L ,L,L,PJ,PJ,?n")))]
+  [(set (match_operand:SI  0 "s_register_operand" "=rk,l,l ,l ,r ,k ,r,k ,r ,k ,r ,k,k,r ,k ,r")
+	(plus:SI (match_operand:SI 1 "s_register_operand" "%0 ,l,0 ,l ,rk,k ,r,r ,rk,k ,rk,k,r,rk,k ,rk")
+		 (match_operand:SI 2 "reg_or_int_operand" "rk ,l,Py,Pd,rI,rI,k,rI,Pj,Pj,L ,L,L,PJ,PJ,?n")))]
   "TARGET_32BIT"
   "@
add%?\\t%0, %0, %2
@@ -621,6 +621,7 @@
add%?\\t%0, %1, %2
add%?\\t%0, %1, %2
add%?\\t%0, %2, %1
+   add%?\\t%0, %1, %2
addw%?\\t%0, %1, %2
addw%?\\t%0, %1, %2
sub%?\\t%0, %1, #%n2
@@ -640,10 +641,10 @@
 		  operands[1], 0);
   DONE;
   "
-  [(set_attr "length" "2,4,4,4,4,4,4,4,4,4,4,4,4,4,16")
+  [(set_attr "length" "2,4,4,4,4,4,4,4,4,4,4,4,4,4,4,16")
(set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "yes,yes,yes,yes,no,no,no,no,no,no,no,no,no,no,no")
-   (set_attr "arch" "t2,t2,t2,t2,*,*,*,t2,t2,*,*,a,t2,t2,*")
+   (set_attr "predicable_short_it" "yes,yes,yes,yes,no,no,no,no,no,no,no,no,no,no,no,no")
+   (set_attr "arch" "t2,t2,t2,t2,*,*,*,a,t2,t2,*,*,a,t2,t2,*")
(set (attr "type") (if_then_else (match_operand 2 "const_int_operand" "")
 		  (const_string "alu_imm")
 		  (const_string "alu_sreg")))
diff --git a/gcc/testsuite/gcc.target/arm/empty_fiq_handler.c b/gcc/testsuite/gcc.target/arm/empty_fiq_handler.c
new file mode 100644
index ..bbcfd0e32f9d0cc60c8a013fd1bb584b21aaad16
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/empty_fiq_handler.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+
+/* Below code used to trigger an ICE due to missing constraints for
+   sp = fp + cst pattern.  */
+
+void fiq_handler (void) __attribute__((interrupt ("FIQ")));
+
+void
+fiq_handler (void)
+{
+}


[PATCH] (v2) print-rtl-function.c: add (param) directive to dump

2016-11-09 Thread David Malcolm
On Wed, 2016-11-09 at 12:59 +0100, Bernd Schmidt wrote:
> On 11/08/2016 07:03 PM, David Malcolm wrote:
> > int __RTL("rtl-combine") f1 (int n)
> > {
> > (function "f1"
> >   (param "n"
> > (DECL_RTL
> >   (reg/v:SI %1 [ n ])
> > ) ;; DECL_RTL
>
> The ;; DECL_RTL etc. comments seem somewhat redundant and add
> clutter.
> Please remove those.

Done.

> Also, why is the closing paren on its own line? That doesn't seem
> right.

It's because m_sawclose is set when calling print_rtx, which leads to
a leading newline.

I fixed this by introducing a rtx_writer::finish_directive method to
unset m_sawclose.  (That field seems misnamed; perhaps it should be
renamed to m_force_newline or somesuch?  That feels like a separate
patch though).

> Later (not for this patch) I'd really like to see some logic not to
> add
> linebreaks before simple expressions, so that we'd have (DECL_RTL
> (reg:SI xyz)) on a single line.

I've been thinking about a patch that would track indentation, which
would only insert newlines when close to some column limit, and would
line up the open parentheses for siblings.
That way you'd get things like:

(DECL_RTL (mem/c:SI (plus:DI (reg/f:DI virtual-stack-vars)
 (const_int -4)) [1 i+0 S4 A32]))

where the newline is auto-inserted to fit things within e.g. an
80 char column, and where "const_int" gets indented to be underneath
the "reg", since they're both operands of the same "plus".

Anyway, the following updated version of the patch eliminates
the newlines and comments.

Updated examples: aarch64 function taking one argument:

(function "times_two"
  (param "i"
(DECL_RTL (mem/c:SI (plus:DI (reg/f:DI virtual-stack-vars)
(const_int -4)) [1 i+0 S4 A32]))
(DECL_RTL_INCOMING (reg:SI x0 [ i ])))
  (insn-chain
 ;; etc

x86_64 function taking three ints:

(function "test_1"
  (param "i"
(DECL_RTL (mem/c:SI (plus:DI (reg/f:DI virtual-stack-vars)
(const_int -4)) [1 i+0 S4 A32]))
(DECL_RTL_INCOMING (reg:SI di [ i ])))
  (param "j"
(DECL_RTL (mem/c:SI (plus:DI (reg/f:DI virtual-stack-vars)
(const_int -8)) [1 j+0 S4 A32]))
(DECL_RTL_INCOMING (reg:SI si [ j ])))
  (param "k"
(DECL_RTL (mem/c:SI (plus:DI (reg/f:DI virtual-stack-vars)
(const_int -12)) [1 k+0 S4 A32]))
(DECL_RTL_INCOMING (reg:SI dx [ k ])))
  (insn-chain
 ;; etc

As before, only lightly tested so far.

OK for trunk if it passes bootstrap and regrtest?

gcc/ChangeLog:
* print-rtl-function.c (print_any_param_name): New function.
(print_param): New function.
(print_rtx_function): Call print_param for each argument.
* print-rtl.c (rtx_writer::finish_directive): New function.
* print-rtl.h (rtx_writer::finish_directive): New decl.
---
 gcc/print-rtl-function.c | 36 
 gcc/print-rtl.c  |  9 +
 gcc/print-rtl.h  |  2 ++
 3 files changed, 47 insertions(+)

diff --git a/gcc/print-rtl-function.c b/gcc/print-rtl-function.c
index b62f1b3..f18491a 100644
--- a/gcc/print-rtl-function.c
+++ b/gcc/print-rtl-function.c
@@ -127,6 +127,38 @@ can_have_basic_block_p (const rtx_insn *insn)
   return true;
 }
 
+/* Subroutine of print_param.  Write the name of ARG, if any, to OUTFILE.  */
+
+static void
+print_any_param_name (FILE *outfile, tree arg)
+{
+  if (DECL_NAME (arg))
+fprintf (outfile, " \"%s\"", IDENTIFIER_POINTER (DECL_NAME (arg)));
+}
+
+/* Print a "(param)" directive for ARG to OUTFILE.  */
+
+static void
+print_param (FILE *outfile, rtx_writer &w, tree arg)
+{
+  fprintf (outfile, "  (param");
+  print_any_param_name (outfile, arg);
+  fprintf (outfile, "\n");
+
+  /* Print the value of DECL_RTL (without lazy-evaluation).  */
+  fprintf (outfile, "(DECL_RTL ");
+  rtx decl_rtl = DECL_WRTL_CHECK (arg)->decl_with_rtl.rtl;
+  w.print_rtx (decl_rtl);
+  w.finish_directive ();
+
+  /* Print DECL_INCOMING_RTL.  */
+  fprintf (outfile, "(DECL_RTL_INCOMING ");
+  w.print_rtx (DECL_INCOMING_RTL (arg));
+  fprintf (outfile, ")");
+
+  w.finish_directive ();
+}
+
 /* Write FN to OUTFILE in a form suitable for parsing, with indentation
and comments to make the structure easy for a human to grok.  Track
the basic blocks of insns in the chain, wrapping those that are within
@@ -202,6 +234,10 @@ print_rtx_function (FILE *outfile, function *fn, bool 
compact)
 
   fprintf (outfile, "(function \"%s\"\n", dname);
 
+  /* Params.  */
+  for (tree arg = DECL_ARGUMENTS (fdecl); arg; arg = DECL_CHAIN (arg))
+print_param (outfile, w, arg);
+
   /* The instruction chain.  */
   fprintf (outfile, "  (insn-chain\n");
   basic_block curr_bb = NULL;
diff --git a/gcc/print-rtl.c b/gcc/print-rtl.c
index 30ff8fa..3bbd395 100644
--- a/gcc/print-rtl.c
+++ b/gcc/print-rtl.c
@@ -921,6 +921,15 @@ rtx_writer::print_rtx (const_rtx in_rtx)
   m_sawclose = 1;
 }
 
+/* Emit a closing parenthesis and newline.  */
+
+void
+rtx_writer::finish_directive ()

Re: [PATCH] Add testcase for PR middle-end/77718

2016-11-09 Thread Richard Biener
On November 9, 2016 4:34:27 PM GMT+01:00, Jakub Jelinek  
wrote:
>Hi!
>
>I've noticed this PR is still open, but has been fixed, though
>apparently Bernd's patch from the PR has been applied as is with
>the needed tab eaten (instead 7 spaces) and no testcase has been added.
>
>Tested with cross-compiler and the r240625 change reverted and as is
>and running the resulting assembler on ppc64le.
>
>Ok for trunk?

OK.

Richard.

>2016-11-09  Jakub Jelinek  
>
>   PR target/77718
>   * builtins.c (expand_builtin_memcmp): Formatting fix.
>
>   * gcc.c-torture/execute/pr77718.c: New test.
>
>--- gcc/ChangeLog.jj   2016-11-09 15:22:28.0 +0100
>+++ gcc/ChangeLog  2016-11-09 16:29:35.152056326 +0100
>@@ -5027,7 +5027,8 @@
> 
> 2016-09-29  Bernd Schmidt  
> 
>-  * builtins.c (expand_builtin_memcmp): don't swap args unless
>+  PR target/77718
>+  * builtins.c (expand_builtin_memcmp): Don't swap args unless
>   result is only being compared with zero.
> 
> 2016-09-29  Marek Polacek  
>--- gcc/builtins.c.jj  2016-10-31 13:28:06.0 +0100
>+++ gcc/builtins.c 2016-11-09 16:19:08.886909150 +0100
>@@ -3754,7 +3754,7 @@ expand_builtin_memcmp (tree exp, rtx tar
> {
>   src_str = c_getstr (arg1);
>   if (src_str != NULL)
>-   std::swap (arg1_rtx, arg2_rtx);
>+  std::swap (arg1_rtx, arg2_rtx);
> }
> 
>   /* If SRC is a string constant and block move would be done
>--- gcc/testsuite/gcc.c-torture/execute/pr77718.c.jj   2016-11-09
>16:28:30.414868074 +0100
>+++ gcc/testsuite/gcc.c-torture/execute/pr77718.c  2016-11-09
>16:22:06.0 +0100
>@@ -0,0 +1,25 @@
>+/* PR middle-end/77718 */
>+
>+char a[64] __attribute__((aligned (8)));
>+
>+__attribute__((noinline, noclone)) int
>+foo (void)
>+{
>+  return __builtin_memcmp ("bb", a, 6);
>+}
>+
>+__attribute__((noinline, noclone)) int
>+bar (void)
>+{
>+  return __builtin_memcmp (a, "bb", 6);
>+}
>+
>+int
>+main ()
>+{
>+  __builtin_memset (a, 'a', sizeof (a));
>+  if (((foo () < 0) ^ ('a' > 'b'))
>+  || ((bar () < 0) ^ ('a' < 'b')))
>+__builtin_abort ();
>+  return 0;
>+}
>
>   Jakub




Re: [PATCH] Fix folding of memcmp("a", "a", 2) (PR tree-optimization/78257)

2016-11-09 Thread David Edelsohn
This patch breaks bootstrap on AIX and probably on any 32-bit host.

Thanks, David

/nasfarm/edelsohn/src/src/gcc/fold-const-call.c: In function
'tree_node* fold_const_call(combined_fn, tree, tree, tree, tree)':
/nasfarm/edelsohn/src/src/gcc/fold-const-call.c:1541:36: error: cannot
convert 'size_t* {aka long unsigned int*}' to 'long long unsigned
int*' for argument '2' to 'const char* c_getstr(tree, long long
unsigned int*)'
   if ((p0 = c_getstr (arg0, &s0))
^
/nasfarm/edelsohn/src/src/gcc/fold-const-call.c:1542:32: error: cannot
convert 'size_t* {aka long unsigned int*}' to 'long long unsigned
int*' for argument '2' to 'const char* c_getstr(tree, long long
unsigned int*)'
&& (p1 = c_getstr (arg1, &s1))
^
make[3]: *** [fold-const-call.o] Error 1
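
For reference, a minimal stand-alone reduction of the mismatch (hypothetical
names, not the actual GCC code): on a host where size_t is unsigned long but
unsigned HOST_WIDE_INT is unsigned long long, as in the errors above, a
size_t* argument does not convert to the pointer type c_getstr now takes, so
the locals need the callee's type (or a cast/overload).

/* Stand-in for unsigned HOST_WIDE_INT on such a host.  */
typedef unsigned long long hwi;

/* Mirrors the shape of c_getstr's new signature; purely illustrative.  */
static const char *
c_getstr_stub (const char *s, hwi *len)
{
  *len = 2;   /* pretend "a" has size 2, including the terminating NUL */
  return s;
}

int
main ()
{
  /* size_t s0;   -- rejected where size_t is not unsigned long long */
  hwi s0;         /* matches the callee's parameter type everywhere */
  return c_getstr_stub ("a", &s0) ? 0 : 1;
}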


[PATCH, Fortran] PR78259: ICE in gfc_trans_subcomponent_assign, at fortran/trans-expr.c:7330

2016-11-09 Thread Fritz Reese
All,

Will commit the below to trunk as an obvious fix for PR78259. (The
regression was introduced in r241626 from
https://gcc.gnu.org/ml/fortran/2016-10/msg00206.html). The patch
clears regtests.

---
Fritz Reese

From: Fritz O. Reese 
Date: Wed, 9 Nov 2016 10:59:17 -0500
Subject: [PATCH] Fix ICE in gfc_trans_subcomponent_assign due to NULL
components.

PR fortran/78259
* gcc/fortran/trans-expr.c (gfc_trans_subcomponent_assign): Guard
against NULL values.

PR fortran/78259
* gcc/testsuite/gfortran.dg/pr78259.f90: New test.
---

diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index 7159b17..0352916 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -7321,7 +7321,8 @@ gfc_trans_subcomponent_assign (tree dest, gfc_component *
   gfc_constructor *c = gfc_constructor_first (expr->value.constructor);
   /* We mark that the entire union should be initialized with a contrived
  EXPR_NULL expression at the beginning.  */
-  if (c->n.component == NULL && c->expr->expr_type == EXPR_NULL)
+  if (c != NULL && c->n.component == NULL
+ && c->expr != NULL && c->expr->expr_type == EXPR_NULL)
 {
   tmp = build2_loc (input_location, MODIFY_EXPR, void_type_node,
dest, build_constructor (TREE_TYPE (dest), NULL));
diff --git a/gcc/testsuite/gfortran.dg/pr78259.f90 b/gcc/testsuite/gfortran.dg/p
new file mode 100644
index 000..82f48ea
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr78259.f90
@@ -0,0 +1,22 @@
+! { dg-do "compile" }
+! { dg-options "-fdec-structure" }
+!
+! PR fortran/78259
+!
+! ICE in gfc_trans_subcomponent_assign
+!
+
+subroutine sub
+  structure /s/
+union
+  map
+integer n(2)
+  end map
+  map
+integer(8) m /2/
+  end map
+end union
+  end structure
+  record /s/ r
+  r.n(1) = 1
+end


[PING][PATCH][aarch64] Improve Logical And Immediate Expressions

2016-11-09 Thread Michael Collison
Ping. Link to original post:

https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02305.html


[PATCH] Add testcase for PR middle-end/77718

2016-11-09 Thread Jakub Jelinek
Hi!

I've noticed this PR is still open, but has been fixed, though
apparently Bernd's patch from the PR has been applied as is with
the needed tab eaten (7 spaces instead) and no testcase has been added.

Tested with cross-compiler and the r240625 change reverted and as is
and running the resulting assembler on ppc64le.

Ok for trunk?

2016-11-09  Jakub Jelinek  

PR target/77718
* builtins.c (expand_builtin_memcmp): Formatting fix.

* gcc.c-torture/execute/pr77718.c: New test.

--- gcc/ChangeLog.jj2016-11-09 15:22:28.0 +0100
+++ gcc/ChangeLog   2016-11-09 16:29:35.152056326 +0100
@@ -5027,7 +5027,8 @@
 
 2016-09-29  Bernd Schmidt  
 
-   * builtins.c (expand_builtin_memcmp): don't swap args unless
+   PR target/77718
+   * builtins.c (expand_builtin_memcmp): Don't swap args unless
result is only being compared with zero.
 
 2016-09-29  Marek Polacek  
--- gcc/builtins.c.jj   2016-10-31 13:28:06.0 +0100
+++ gcc/builtins.c  2016-11-09 16:19:08.886909150 +0100
@@ -3754,7 +3754,7 @@ expand_builtin_memcmp (tree exp, rtx tar
 {
   src_str = c_getstr (arg1);
   if (src_str != NULL)
-   std::swap (arg1_rtx, arg2_rtx);
+   std::swap (arg1_rtx, arg2_rtx);
 }
 
   /* If SRC is a string constant and block move would be done
--- gcc/testsuite/gcc.c-torture/execute/pr77718.c.jj2016-11-09 
16:28:30.414868074 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr77718.c   2016-11-09 
16:22:06.0 +0100
@@ -0,0 +1,25 @@
+/* PR middle-end/77718 */
+
+char a[64] __attribute__((aligned (8)));
+
+__attribute__((noinline, noclone)) int
+foo (void)
+{
+  return __builtin_memcmp ("bb", a, 6);
+}
+
+__attribute__((noinline, noclone)) int
+bar (void)
+{
+  return __builtin_memcmp (a, "bb", 6);
+}
+
+int
+main ()
+{
+  __builtin_memset (a, 'a', sizeof (a));
+  if (((foo () < 0) ^ ('a' > 'b'))
+  || ((bar () < 0) ^ ('a' < 'b')))
+__builtin_abort ();
+  return 0;
+}

Jakub


Re: [PATCH] fix PR68468

2016-11-09 Thread Bernd Schmidt

On 11/05/2016 06:14 PM, Waldemar Brodkorb wrote:

Hi,

the following patch fixes PR68468.
Patch is used for a while in Buildroot without issues.

2016-11-05  Waldemar Brodkorb 

   PR gcc/68468
   * libgcc/unwind-dw2-fde-dip.c: fix build on FDPIC targets.


This is ok.


Bernd



Re: [Patch, fortran] PR44265 - Link error with reference to parameter array in specification expression

2016-11-09 Thread Dominique d'Humières
Sorry for the bad news, but while gfortran regtests without regressions with
the patch, I still get a link error with the original test:

% gfc pr44265.f90
Undefined symbols for architecture x86_64:
  "___fruits_MOD_names", referenced from:
  _MAIN__ in ccyeNqa1.o
ld: symbol(s) not found for architecture x86_64
collect2: error: ld returned 1 exit status

and its variants.

Thanks for working on this issue

Dominique

> On 9 Nov 2016, at 13:47, Paul Richard Thomas  wrote:
> 
> Dear All,
> 
> The title of this PR says what this is all about, except that it
> applies uniquely to character function result string
> lengths.
> 
> Ian Harvey wrote the first patch for this PR for which many thanks.
> However, two issues came up that took some little while to understand;
> (i) In comment #1, it was found that calls from sibling procedures
> could generate streams of undefined references at the link stage; and
> (ii) An entity of the same name as the contained procedure entity in
> module scope caused similar problems.
> 
> The relationship with Ian's patch is still obvious. The fundamental
> difference is that the parameter arrays are automatically promoted to
> module scope using a unique symtree. This fixes both the issues above.
> 
> Dominique, could you please check that the -m32 issue has gone away?
> 
> Bootstrapped and regtested on FC21/x86_64 - OK for trunk?
> 
> Paul
> 
> 2016-11-09  Paul Thomas  
> 
>PR fortran/44265
>* gfortran.h : Add fn_result_spec bitfield to gfc_symbol.
>* resolve.c (flag_fn_result_spec): New function.
>(resolve_fntype): Call it for character result lengths.
>* symbol.c (gfc_new_symbol): Set fn_result_spec to zero.
>* trans-decl.c (gfc_sym_mangled_identifier): Include the
>procedure name in the mangled name for symbols with the
>fn_result_spec bit set.
>(gfc_get_symbol_decl): Mangle the name of these symbols.
>(gfc_create_module_variable): Allow them through the assert.
>(gfc_generate_function_code): Remove the assert before the
>initialization of sym->tlink because the frontend no longer
>uses this field.
>* trans-expr.c (gfc_map_intrinsic_function): Add a case to
>treat the LEN_TRIM intrinsic.
> 
> 2016-11-09  Paul Thomas  
> 
>PR fortran/44265
>* gfortran.dg/char_result_14.f90: New test.
>* gfortran.dg/char_result_15.f90: New test.
> 
> 
> -- 
> The difference between genius and stupidity is; genius has its limits.
> 
> Albert Einstein
> 



Re: [PATCHv2 6/7, GCC, ARM, V8M] ARMv8-M Security Extension's cmse_nonsecure_call: use __gnu_cmse_nonsecure_call

2016-11-09 Thread Andre Vieira (lists)
On 27/10/16 11:01, Andre Vieira (lists) wrote:
> On 25/10/16 17:30, Andre Vieira (lists) wrote:
>> On 24/08/16 12:01, Andre Vieira (lists) wrote:
>>> On 25/07/16 14:26, Andre Vieira (lists) wrote:
 This patch extends support for the ARMv8-M Security Extensions
 'cmse_nonsecure_call' to use a new library function
 '__gnu_cmse_nonsecure_call'. This library function is responsible for
 (without using r0-r3 or d0-d7):
 1) saving and clearing all callee-saved registers using the secure stack
 2) clearing the LSB of the address passed in r4 and using blxns to
 'jump' to it
 3) clearing ASPR, including the 'ge bits' if DSP is enabled
 4) clearing FPSCR if using non-soft float-abi
 5) restoring callee-saved registers.

 The decisions whether to include DSP 'ge bits' clearing and floating
 point registers (single/double precision) all depends on the multilib used.

 See Section 5.5 of ARM®v8-M Security Extensions
 (http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html).

 *** gcc/ChangeLog ***
 2016-07-25  Andre Vieira
 Thomas Preud'homme  

 * config/arm/arm.c (detect_cmse_nonsecure_call): New.
 (cmse_nonsecure_call_clear_caller_saved): New.
 (arm_reorg): Use cmse_nonsecure_call_clear_caller_saved.
 * config/arm/arm-protos.h (detect_cmse_nonsecure_call): New.
 * config/arm/arm.md (call): Handle cmse_nonsecure_entry.
 (call_value): Likewise.
 (nonsecure_call_internal): New.
 (nonsecure_call_value_internal): New.
 * config/arm/thumb1.md (*nonsecure_call_reg_thumb1_v5): New.
 (*nonsecure_call_value_reg_thumb1_v5): New.
 * config/arm/thumb2.md (*nonsecure_call_reg_thumb2): New.
 (*nonsecure_call_value_reg_thumb2): New.
 * config/arm/unspecs.md (UNSPEC_NONSECURE_MEM): New.

 *** libgcc/ChangeLog ***
 2016-07-25  Andre Vieira
 Thomas Preud'homme  

 * config/arm/cmse_nonsecure_call.S: New.
* config/arm/t-arm: Compile cmse_nonsecure_call.S


 *** gcc/testsuite/ChangeLog ***
 2016-07-25  Andre Vieira
 Thomas Preud'homme  

 * gcc.target/arm/cmse/cmse.exp: Run tests in mainline dir.
 * gcc.target/arm/cmse/cmse-9.c: Added some extra tests.
 * gcc.target/arm/cmse/baseline/bitfield-4.c: New.
 * gcc.target/arm/cmse/baseline/bitfield-5.c: New.
 * gcc.target/arm/cmse/baseline/bitfield-6.c: New.
 * gcc.target/arm/cmse/baseline/bitfield-7.c: New.
 * gcc.target/arm/cmse/baseline/bitfield-8.c: New.
 * gcc.target/arm/cmse/baseline/bitfield-9.c: New.
 * gcc.target/arm/cmse/baseline/bitfield-and-union-1.c: New.
 * gcc.target/arm/cmse/baseline/cmse-11.c: New.
* gcc.target/arm/cmse/baseline/cmse-13.c: New.
* gcc.target/arm/cmse/baseline/cmse-6.c: New.
 * gcc/testsuite/gcc.target/arm/cmse/baseline/union-1.c: New.
 * gcc/testsuite/gcc.target/arm/cmse/baseline/union-2.c: New.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-13.c: New.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-7.c: New.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-8.c: New.
* gcc.target/arm/cmse/mainline/hard/cmse-13.c: New.
* gcc.target/arm/cmse/mainline/hard/cmse-7.c: New.
* gcc.target/arm/cmse/mainline/hard/cmse-8.c: New.
* gcc.target/arm/cmse/mainline/soft/cmse-13.c: New.
* gcc.target/arm/cmse/mainline/soft/cmse-7.c: New.
* gcc.target/arm/cmse/mainline/soft/cmse-8.c: New.
* gcc.target/arm/cmse/mainline/softfp-sp/cmse-7.c: New.
* gcc.target/arm/cmse/mainline/softfp-sp/cmse-8.c: New.
* gcc.target/arm/cmse/mainline/softfp/cmse-13.c: New.
* gcc.target/arm/cmse/mainline/softfp/cmse-7.c: New.
* gcc.target/arm/cmse/mainline/softfp/cmse-8.c: New.

>>>
>>> Updated this patch to correctly clear only the cumulative
>>> exception-status (0-4,7) and the condition code bits (28-31) of the FPSCR.
>>>
>>> 
>>>
>>> This patch extends support for the ARMv8-M Security Extensions
>>> 'cmse_nonsecure_call' to use a new library function
>>> '__gnu_cmse_nonsecure_call'. This library function is responsible for
>>> (without using r0-r3 or d0-d7):
>>> 1) saving and clearing all callee-saved registers using the secure stack
>>> 2) clearing the LSB of the address passed in r4 and using blxns to
>>> 'jump' to it
>>> 3) clearing ASPR, including the 'ge bits' if DSP is enabled
>>> 4) clearing the cumulative exception-status (0-4, 7) and the condition
>>> bits (28-31) of the FPSCR 

Re: [PATCH 7/7, GCC, ARM, V8M] Added support for ARMV8-M Security Extension cmse_nonsecure_caller intrinsic

2016-11-09 Thread Andre Vieira (lists)
On 09/11/16 10:26, Kyrill Tkachov wrote:
> 
> @@ -1832,6 +1834,17 @@ arm_init_builtins (void)
>  = add_builtin_function ("__builtin_arm_stfscr", ftype_set_fpscr,
>  ARM_BUILTIN_SET_FPSCR, BUILT_IN_MD, NULL, NULL_TREE);
>  }
> +
> +  if (arm_arch_cmse)
> +{
> +  tree ftype_cmse_nonsecure_caller
> += build_function_ty
> 
> Should this be use_cmse ?
> This looks ok to me otherwise.
> I believe patch [6/7] is the only one needing approval after this...
> 
> Kyrill

Hi,

Yeah it should indeed be 'use_cmse'. Here is the reworked version, no
changes to ChangeLog.

Cheers,
Andre
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 
e73043db6db69fa64bb1e72cf71a36d7169062db..232d2de52106324e9f22b535a1b02cb95a2294b7
 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -528,6 +528,8 @@ enum arm_builtins
   ARM_BUILTIN_GET_FPSCR,
   ARM_BUILTIN_SET_FPSCR,
 
+  ARM_BUILTIN_CMSE_NONSECURE_CALLER,
+
 #undef CRYPTO1
 #undef CRYPTO2
 #undef CRYPTO3
@@ -1832,6 +1834,17 @@ arm_init_builtins (void)
= add_builtin_function ("__builtin_arm_stfscr", ftype_set_fpscr,
ARM_BUILTIN_SET_FPSCR, BUILT_IN_MD, NULL, 
NULL_TREE);
 }
+
+  if (use_cmse)
+{
+  tree ftype_cmse_nonsecure_caller
+   = build_function_type_list (unsigned_type_node, NULL);
+  arm_builtin_decls[ARM_BUILTIN_CMSE_NONSECURE_CALLER]
+   = add_builtin_function ("__builtin_arm_cmse_nonsecure_caller",
+   ftype_cmse_nonsecure_caller,
+   ARM_BUILTIN_CMSE_NONSECURE_CALLER, BUILT_IN_MD,
+   NULL, NULL_TREE);
+}
 }
 
 /* Return the ARM builtin for CODE.  */
@@ -2452,6 +2465,12 @@ arm_expand_builtin (tree exp,
   emit_insn (pat);
   return target;
 
+case ARM_BUILTIN_CMSE_NONSECURE_CALLER:
+  target = gen_reg_rtx (SImode);
+  op0 = arm_return_addr (0, NULL_RTX);
+  emit_insn (gen_addsi3 (target, op0, const1_rtx));
+  return target;
+
 case ARM_BUILTIN_TEXTRMSB:
 case ARM_BUILTIN_TEXTRMUB:
 case ARM_BUILTIN_TEXTRMSH:
diff --git a/gcc/config/arm/arm_cmse.h b/gcc/config/arm/arm_cmse.h
index 
894343bb835b61e09c14668d45aa43a8693fd011..82b58b1c4f4a12ba6062e2cc2632653788d0eeb7
 100644
--- a/gcc/config/arm/arm_cmse.h
+++ b/gcc/config/arm/arm_cmse.h
@@ -163,6 +163,13 @@ __attribute__ ((__always_inline__))
 cmse_TTAT (void *__p)
 __CMSE_TT_ASM (at)
 
+/* FIXME: diagnose use outside cmse_nonsecure_entry functions.  */
+__extension__ static __inline int __attribute__ ((__always_inline__))
+cmse_nonsecure_caller (void)
+{
+  return __builtin_arm_cmse_nonsecure_caller ();
+}
+
 #define CMSE_AU_NONSECURE  2
 #define CMSE_MPU_NONSECURE 16
 #define CMSE_NONSECURE 18
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 
462e6c71e20791b35f02adabfc97b9b013fd296a..88f6e014c3def8e6a2d2452df5d4937a4f0dd1ef
 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -12625,6 +12625,7 @@ cmse_address_info_t cmse_TTAT_fptr (FPTR)
 void * cmse_check_address_range (void *, size_t, int)
 typeof(p) cmse_nsfptr_create (FPTR p)
 intptr_t cmse_is_nsfptr (FPTR)
+int cmse_nonsecure_caller (void)
 @end smallexample
 
 @node AVR Built-in Functions
diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-1.c 
b/gcc/testsuite/gcc.target/arm/cmse/cmse-1.c
index 
d5b9a2d9d59569de170da814ae660e9fb2b943e7..ddcf12a30a6c1806969d239c448da81ccf49532e
 100644
--- a/gcc/testsuite/gcc.target/arm/cmse/cmse-1.c
+++ b/gcc/testsuite/gcc.target/arm/cmse/cmse-1.c
@@ -65,3 +65,32 @@ int foo (char * p)
 /* { dg-final { scan-assembler-times "ttat " 2 } } */
 /* { dg-final { scan-assembler-times "bl.cmse_check_address_range" 7 } } */
 /* { dg-final { scan-assembler-not "cmse_check_pointed_object" } } */
+
+typedef int (*int_ret_funcptr_t) (void);
+typedef int __attribute__ ((cmse_nonsecure_call)) (*int_ret_nsfuncptr_t) 
(void);
+
+int __attribute__ ((cmse_nonsecure_entry))
+baz (void)
+{
+  return cmse_nonsecure_caller ();
+}
+
+int __attribute__ ((cmse_nonsecure_entry))
+qux (int_ret_funcptr_t int_ret_funcptr)
+{
+  int_ret_nsfuncptr_t int_ret_nsfunc_ptr;
+
+  if (cmse_is_nsfptr (int_ret_funcptr))
+{
+  int_ret_nsfunc_ptr = cmse_nsfptr_create (int_ret_funcptr);
+  return int_ret_nsfunc_ptr ();
+}
+  return 0;
+}
+/* { dg-final { scan-assembler "baz:" } } */
+/* { dg-final { scan-assembler "__acle_se_baz:" } } */
+/* { dg-final { scan-assembler-not "\tcmse_nonsecure_caller" } } */
+/* { dg-final { scan-rtl-dump "and.*reg.*const_int 1" expand } } */
+/* { dg-final { scan-assembler "bic" } } */
+/* { dg-final { scan-assembler "push\t\{r4, r5, r6" } } */
+/* { dg-final { scan-assembler "msr\tAPSR_nzcvq" } } */


Re: [RFC] Check number of uses in simplify_cond_using_ranges().

2016-11-09 Thread Richard Biener
On Wed, Nov 9, 2016 at 3:30 PM, Dominik Vogt  wrote:
> On Wed, Nov 09, 2016 at 01:43:58PM +0100, Richard Biener wrote:
>> On Tue, Nov 8, 2016 at 5:18 PM, Marc Glisse  wrote:
>> > On Tue, 8 Nov 2016, Dominik Vogt wrote:
>> >
>> >> On Fri, Nov 04, 2016 at 01:54:20PM +0100, Richard Biener wrote:
>> >>>
>> >>> On Fri, Nov 4, 2016 at 12:08 PM, Dominik Vogt 
>> >>> wrote:
>> 
>>  On Fri, Nov 04, 2016 at 09:47:26AM +0100, Richard Biener wrote:
>> >
>> > On Thu, Nov 3, 2016 at 4:03 PM, Dominik Vogt 
>> > wrote:
>> >>
>> >> Is VRP the right pass to do this optimisation or should a later
>> >> pass rather attempt to eliminate the new use of b_5 instead?  Uli
>> >> has brought up the idea a mini "sign extend elimination" pass that
>> >> checks if the result of a sign extend could be replaced by the
>> >> original quantity in all places, and if so, eliminate the ssa
>> >> name.  (I guess that won't help with the above code because l is
>> >> used also as a function argument.)  How could a sensible approach
>> >> to deal with the situation look like?
>> >
>> >
>> > We run into this kind of situation regularly and for general foldings
>> > in match.pd we settled with single_use () even though it is not
>> > perfect.
>> > Note the usual complaint is not extra extension instructions but
>> > the increase of register pressure.
>> >
>> > This is because it is hard to do better when you are doing local
>> > optimization.
>> >
>> > As for the question on whether VRP is the right pass to do this the
>> > answer is two-fold -- VRP has the most precise range information.
>> > But the folding itself should be moved to generic code and use
>> > get_range_info ().
>> 
>> 
>>  All right, is this a sensible approach then?
>> >>>
>> >>>
>> >>> Yes.
>> >>>
>>    1. Using has_single_use as in the experimental patch is good
>>   enough (provided further testing does not show serious
>>   regressions).
>> >>>
>> >>>
>> >>> I'd approve that, yes.
>> >>>
>>    2. Rip the whole top level if-block from simplify_cond_using_ranges().
>>    3. Translate that code to match.pd syntax.
>> >>>
>> >>>
>> >>> Might be some work but yes, that's also desired (you'd lose the ability
>> >>> to emit the warnings though).
>> >>
>> >>
>> >> Could you give me a match-pd-hint please?  We now have something
>> >> like this:
>> >>
>> >> (simplify
>> >>  (cond (gt SSA_NAME@0 INTEGER_CST@1) @2 @3)
>> >>  (if (... many conditions ...)
>> >>   (cond (gt ... ...) @2 @3))
>> >>
>> >> The generated code ends up in gimple_simplify_COND_EXPR, but when
>> >> gimple_simplify is actually called, it goes through the
>> >> GIMPLE_COND case and calls gimple_resimplify2(..., GT, ...) and
>> >> there it tries gimple_simplify_GT_EXPR(), peeling of the (cond
>> >> ...), i.e. it never tries the generated code.
>> >
>> >
>> > Not sure what you mean here.
>> >
>> >> There is another pattern in match.pd that uses a (cond ...) as the
>> >> first operand, and I do not understand how this works.  Should we
>> >> just use "(gt SSA_NAME@0 INTEGER_CST@1)" as the first operand
>> >> instead, and wouldn't this pattern be too general that way?
>> >
>> >
>> > IIUC, you are trying to move the second half of simplify_cond_using_ranges
>> > to match.pd. I don't see any reason to restrict it to the case where the
>> > comparison result is used directly in a COND_EXPR, so that would look like:
>> >
>> > (for cmp (...)
>> >  (simplify
>> >   (cmp (convert SSA_NAME@0) INTEGER_CST@1)
>> >   (if (...)
>> >(cmp @0 (convert @1)
>> >
>> > maybe? I think I may have missed your point.
>>
>> Yeah, if you'd use (cond (gt ... then it only matches in assignments
>> with COND_EXPRs on the RHS, _not_ in GIMPLE_CONDs.
>>
>> So you ran into the (cond vs. GIMPLE_COND "mismatch".
>>
>> You'd essentially implement sth similar to shorten_compare in match.pd.
>
> Something like the attached patch?  Robin and me have spent quite
> some time to figure out the new pattern.  Two questions:
>
> 1) In the match expression you cannot just use SSA_NAME@0 because
>then the "case SSA_NAME:" is added to a switch for another
>pattern that already has that label.  Thus we made that "proxy"
>predicate "ssa_name_p" that forces the code for the new pattern
>out of the old switch and into a separate code block.  We
>couldn't figure out whether this joining of case labels is a
>feature in the matching language.  So, is this the right way to
>deal with the conflicting labels?

No, just do not match SSA_NAME.  And instead of

+  (with { gimple *def_stmt = SSA_NAME_DEF_STMT (@0); }
+   (if (is_gimple_assign (def_stmt)
+   && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def_stmt)))

you instead want to change the pattern to

(simplify
  (cmp (convert @0) 

Re: [patch, avr] Add flash size to device info and make wrap around default

2016-11-09 Thread Georg-Johann Lay

On 09.11.2016 10:14, Pitchumani Sivanupandi wrote:

On Tuesday 08 November 2016 02:57 PM, Georg-Johann Lay wrote:

On 08.11.2016 08:08, Pitchumani Sivanupandi wrote:

I have updated patch to include the flash size as well. Took that info from
device headers (it was fed into crt's device information note section also).


The new option would render -mn-flash superfluous, but we should keep it for
backward compatibility.

Ok.

Shouldn't link_pmem_wrap then be removed from link_relax, i.e. from
LINK_RELAX_SPEC?  And what happens if relaxation is off?

Yes. Removed link_pmem_wrap from link_relax.
Disabling relaxation doesn't change -mpmem-wrap-around behavior.

Now, wrap around behavior is changed as follows:

For 8K flash devices:
The device specs add the --pmem-wrap-around=8k linker option if -mno-pmem-wrap-around
is NOT enabled.
This makes the --pmem-wrap-around=8k linker option the default for 8k flash devices.

For 16/32/64K flash devices:
Spec string 'link_pmem_wrap' is added to all 16/32/64k flash device specs.
Otherwise no changes, i.e. it adds the --pmem-wrap-around=16/32/64k option if
the -mpmem-wrap-around option is enabled.

For other devices, no changes in device specs.

Reg tested with default and -mrelax options enabled. No issues.

Regards,
Pitchumani

gcc/ChangeLog

2016-11-08  Pitchumani Sivanupandi 

* config/avr/avr-arch.h (avr_mcu_t): Add flash_size member.
* config/avr/avr-devices.c (avr_mcu_types): Add flash size info.
* config/avr/avr-mcu.def: Likewise.
* config/avr/gen-avr-mmcu-specs.c (print_mcu): Remove hard-coded prefix
check to find wrap-around value, instead use MCU flash size. For 8k flash
devices, update link_pmem_wrap spec string to add --pmem-wrap-around=8k.
* config/avr/specs.h: Remove link_pmem_wrap from LINK_RELAX_SPEC and
add to linker specs (LINK_SPEC) directly.

flashsize-and-wrap-around.patch



diff --git a/gcc/config/avr/avr-mcus.def b/gcc/config/avr/avr-mcus.def
index 6bcc6ff..9d4aa1a 100644



 /* Classic, == 128K.  */
-AVR_MCU ("avr31",ARCH_AVR31, AVR_ERRATA_SKIP,
NULL,0x0060, 0x0, 2)
-AVR_MCU ("atmega103",ARCH_AVR31, AVR_ERRATA_SKIP,
"__AVR_ATmega103__", 0x0060, 0x0, 2)
-AVR_MCU ("at43usb320",   ARCH_AVR31, AVR_ISA_NONE,
"__AVR_AT43USB320__",   0x0060, 0x0, 2)
+AVR_MCU ("avr31",ARCH_AVR31, AVR_ERRATA_SKIP,
NULL,0x0060, 0x0, 2, 0x2)
+AVR_MCU ("atmega103",ARCH_AVR31, AVR_ERRATA_SKIP,
"__AVR_ATmega103__", 0x0060, 0x0, 2, 0x2)
+AVR_MCU ("at43usb320",   ARCH_AVR31, AVR_ISA_NONE,
"__AVR_AT43USB320__",   0x0060, 0x0, 2, 0x1)


This looks incorrect: either .flash_size should be 0x2 or n_flash be 1.
As you draw the information from internal hardware descriptions, I'd guess
that the new information is more reliable?

...as this also determines whether AT43USB320 supports ELPM, this also means
that the device is in the wrong multilib set?

I couldn't find the data sheet at atmel.com, and the one I found on the net
only mentions 64KiB program memory.  It mentions LPM on pp. 9 but has no
reference to the supported instruction set.  From the manual I would conclude
that this device should be avr3, not avr31?

Yes. I couldn't find any source other than the datasheet on the net.
This device does not have internal program memory. Flash size is set to 64K
as the device can address only 64K. No RAMPZ register, so no ELPM. Moved the
device to the AVR3 architecture.

 /* Classic, > 8K, <= 64K.  */
-AVR_MCU ("avr3", ARCH_AVR3, AVR_ISA_NONE, NULL,
0x0060, 0x0, 1)
-AVR_MCU ("at43usb355",   ARCH_AVR3, AVR_ISA_NONE, "__AVR_AT43USB355__",
0x0060, 0x0, 1)
-AVR_MCU ("at76c711", ARCH_AVR3, AVR_ISA_NONE, "__AVR_AT76C711__",  
0x0060, 0x0, 1)
+AVR_MCU ("avr3", ARCH_AVR3, AVR_ISA_NONE, NULL,
0x0060, 0x0, 1, 0x6000)
+AVR_MCU ("at43usb355",   ARCH_AVR3, AVR_ISA_NONE, "__AVR_AT43USB355__",
0x0060, 0x0, 1, 0x6000)
+AVR_MCU ("at76c711", ARCH_AVR3, AVR_ISA_NONE, "__AVR_AT76C711__",  
0x0060, 0x0, 1, 0x4000)
+AVR_MCU ("at43usb320",   ARCH_AVR3, AVR_ISA_NONE, "__AVR_AT43USB320__",
0x0060, 0x0, 1, 0x1)
 /* Classic, == 128K.  */
-AVR_MCU ("avr31",ARCH_AVR31, AVR_ERRATA_SKIP, NULL,
0x0060, 0x0, 2)
-AVR_MCU ("atmega103",ARCH_AVR31, AVR_ERRATA_SKIP, "__AVR_ATmega103__", 
0x0060, 0x0, 2)
-AVR_MCU ("at43usb320",   ARCH_AVR31, AVR_ISA_NONE, "__AVR_AT43USB320__",   
0x0060, 0x0, 2)
+AVR_MCU ("avr31",ARCH_AVR31, AVR_ERRATA_SKIP, NULL,
0x0060, 0x0, 2, 0x2)
+AVR_MCU ("atmega103",ARCH_AVR31, AVR_ERRATA_SKIP, "__AVR_ATmega103__", 
0x0060, 0x0, 2, 0x2)
 /* Classic + MOVW + JMP/CALL.  */


If at43usb320 is in the wrong multilib, then this should be handled as a separate
issue / patch together with its own PR.  Sorry for the confusion.

Re: [PATCH] Fix folding of memcmp("a", "a", 2) (PR tree-optimization/78257)

2016-11-09 Thread Richard Biener
On Wed, Nov 9, 2016 at 3:08 PM, Martin Liška  wrote:
> On 11/09/2016 02:58 PM, Richard Biener wrote:
>> On Wed, Nov 9, 2016 at 2:56 PM, Martin Liška  wrote:
>>> Hello.
>>>
>>> Following patch fixes [1] (f0 function), where we have off-by-one error.
>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>>
>>> Ready to be installed?
>>
>> Ok (w/o the unrelated gimplify.c hunk).
>
> Thanks, installed as r242000 and sorry for the wrong hunk.
> As the patch fixed just of the half of issues in the PR, should I close
> it or should I keep it for tracking (thus maybe changing to enhancement?)?

Keep it open for the other issues - this one was minor and easy to fix.

Richard.

> Thanks,
> Martin
>
>
>>
>> Richard.
>>
>>> Martin
>>>
>>> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78257#c0
>


Re: [RFC] Check number of uses in simplify_cond_using_ranges().

2016-11-09 Thread Dominik Vogt
On Wed, Nov 09, 2016 at 01:43:58PM +0100, Richard Biener wrote:
> On Tue, Nov 8, 2016 at 5:18 PM, Marc Glisse  wrote:
> > On Tue, 8 Nov 2016, Dominik Vogt wrote:
> >
> >> On Fri, Nov 04, 2016 at 01:54:20PM +0100, Richard Biener wrote:
> >>>
> >>> On Fri, Nov 4, 2016 at 12:08 PM, Dominik Vogt 
> >>> wrote:
> 
>  On Fri, Nov 04, 2016 at 09:47:26AM +0100, Richard Biener wrote:
> >
> > On Thu, Nov 3, 2016 at 4:03 PM, Dominik Vogt 
> > wrote:
> >>
> >> Is VRP the right pass to do this optimisation or should a later
> >> pass rather attempt to eliminate the new use of b_5 instead?  Uli
> >> has brought up the idea a mini "sign extend elimination" pass that
> >> checks if the result of a sign extend could be replaced by the
> >> original quantity in all places, and if so, eliminate the ssa
> >> name.  (I guess that won't help with the above code because l is
> >> used also as a function argument.)  How could a sensible approach
> >> to deal with the situation look like?
> >
> >
> > We run into this kind of situation regularly and for general foldings
> > in match.pd we settled with single_use () even though it is not
> > perfect.
> > Note the usual complaint is not extra extension instructions but
> > the increase of register pressure.
> >
> > This is because it is hard to do better when you are doing local
> > optimization.
> >
> > As for the question on whether VRP is the right pass to do this the
> > answer is two-fold -- VRP has the most precise range information.
> > But the folding itself should be moved to generic code and use
> > get_range_info ().
> 
> 
>  All right, is this a sensible approach then?
> >>>
> >>>
> >>> Yes.
> >>>
>    1. Using has_single_use as in the experimental patch is good
>   enough (provided further testing does not show serious
>   regressions).
> >>>
> >>>
> >>> I'd approve that, yes.
> >>>
>    2. Rip the whole top level if-block from simplify_cond_using_ranges().
>    3. Translate that code to match.pd syntax.
> >>>
> >>>
> >>> Might be some work but yes, that's also desired (you'd lose the ability
> >>> to emit the warnings though).
> >>
> >>
> >> Could you give me a match-pd-hint please?  We now have something
> >> like this:
> >>
> >> (simplify
> >>  (cond (gt SSA_NAME@0 INTEGER_CST@1) @2 @3)
> >>  (if (... many conditions ...)
> >>   (cond (gt ... ...) @2 @3))
> >>
> >> The generated code ends up in gimple_simplify_COND_EXPR, but when
> >> gimple_simplify is actually called, it goes through the
> >> GIMPLE_COND case and calls gimple_resimplify2(..., GT, ...) and
> >> there it tries gimple_simplify_GT_EXPR(), peeling of the (cond
> >> ...), i.e. it never tries the generated code.
> >
> >
> > Not sure what you mean here.
> >
> >> There is another pattern in match.pd that uses a (cond ...) as the
> >> first operand, and I do not understand how this works.  Should we
> >> just use "(gt SSA_NAME@0 INTEGER_CST@1)" as the first operand
> >> instead, and wouldn't this pattern be too general that way?
> >
> >
> > IIUC, you are trying to move the second half of simplify_cond_using_ranges
> > to match.pd. I don't see any reason to restrict it to the case where the
> > comparison result is used directly in a COND_EXPR, so that would look like:
> >
> > (for cmp (...)
> >  (simplify
> >   (cmp (convert SSA_NAME@0) INTEGER_CST@1)
> >   (if (...)
> >(cmp @0 (convert @1)
> >
> > maybe? I think I may have missed your point.
> 
> Yeah, if you'd use (cond (gt ... then it only matches in assignments
> with COND_EXPRs on the RHS, _not_ in GIMPLE_CONDs.
> 
> So you ran into the (cond vs. GIMPLE_COND "mismatch".
> 
> You'd essentially implement sth similar to shorten_compare in match.pd.

Something like the attached patch?  Robin and me have spent quite
some time to figure out the new pattern.  Two questions:

1) In the match expression you cannot just use SSA_NAME@0 because
   then the "case SSA_NAME:" is added to a switch for another
   pattern that already has that label.  Thus we made that "proxy"
   predicate "ssa_name_p" that forces the code for the new pattern
   out of the old switch and into a separate code block.  We
   couldn't figure out whether this joining of case labels is a
   feature in the matching language.  So, is this the right way to
   deal with the conflicting labels?

2) There's an awful lot of if-clauses in the replacement
   expression.  Is it the right place to do all this checking?

> Btw, moving to match.pd shouldn't be a blocker for adding proper
> single_use tests
> just in case you get lost ...
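
To make the targeted transform concrete, a hedged source-level illustration
(simplified, not the attached patch): with range information the widening
conversion is redundant and the comparison can be done in the narrower type.

int
f (short s)
{
  long l = s;      /* the (convert @0) in the pattern */
  return l > 10;   /* candidate for: s > 10, since s's range fits */
}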

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
>From 26f75d777254c7e29d721c6813eca33539ac6574 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Wed, 2 Nov 2016 14:01:46 +0100
Subject: [PATCH] Convert folding 

[arm-embedded][PATCH, gcc/ARM, ping] Add support for Cortex-M23

2016-11-09 Thread Thomas Preudhomme

Hi,

We have decided to backport this patch to add support for ARM Cortex-M23 to our 
embedded-6-branch.


2016-11-09  Thomas Preud'homme  

Backport from mainline
2016-11-04  Thomas Preud'homme  

gcc/
* config/arm/arm-arches.def (armv8-m.base): Set Cortex-M23 as
representative core for this architecture.
* config/arm/arm-cores.def (cortex-m23): Define new processor.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Likewise.
* config/arm/arm.c (arm_v6m_tune): Add Cortex-M23 to the list of cores
this tuning parameters apply to in the comment.
* config/arm/bpabi.h (BE8_LINK_SPEC): Add Cortex-M23 to the list of
valid -mcpu options.
* doc/invoke.texi (ARM Options): Document new Cortex-M23 processor.

Best regards,

Thomas
--- Begin Message ---

On 02/11/16 10:13, Kyrill Tkachov wrote:


On 02/11/16 10:07, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 26/10/16 17:42, Thomas Preudhomme wrote:

Hi,

This patch adds support for the Cortex-M23 processor launched by ARM [1]. The
patch adds support for the name and wires it up to the ARMv8-M Baseline
architecture and arm_v6m_tune tuning parameters for the time being. It also
updates documentation to mention this new processor.

[1] http://www.arm.com/products/processors/cortex-m/cortex-m23-processor.php

ChangeLog entry is as follows:

*** gcc/Changelog ***

2016-10-26  Thomas Preud'homme 

* config/arm/arm-arches.def (armv8-m.base): Set Cortex-M23 as
representative core for this architecture.
* config/arm/arm-cores.def (cortex-m23): Define new processor.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Likewise.
* config/arm/arm.c (arm_v6m_tune): Add Cortex-M23 to the list of cores
this tuning parameters apply to in the comment.
* config/arm/bpabi.h (BE8_LINK_SPEC): Add Cortex-M23 to the list of
valid -mcpu options.
* doc/invoke.texi (ARM Options): Document new Cortex-M23 processor.


Tested by building libgcc and libstdc++ for Cortex-M23 and running a hello world
compiled for it.

Is this ok for master?



Ok.
Thanks,
Kyrill



Committed, thanks.

Best regards,

Thomas
--- End Message ---


Re: [PATCH] Fix folding of memcmp("a", "a", 2) (PR tree-optimization/78257)

2016-11-09 Thread Martin Liška
On 11/09/2016 02:58 PM, Richard Biener wrote:
> On Wed, Nov 9, 2016 at 2:56 PM, Martin Liška  wrote:
>> Hello.
>>
>> Following patch fixes [1] (f0 function), where we have off-by-one error.
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>
>> Ready to be installed?
> 
> Ok (w/o the unrelated gimplify.c hunk).

Thanks, installed as r242000 and sorry for the wrong hunk.
As the patch fixed just half of the issues in the PR, should I close
it or should I keep it for tracking (thus maybe changing to enhancement?)?

Thanks,
Martin


> 
> Richard.
> 
>> Martin
>>
>> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78257#c0



[PATCH] Remove unused parameter from tree_swap_operands_p

2016-11-09 Thread Richard Biener

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2016-11-09  Richard Biener  

* fold-const.h (tree_swap_operands_p): Remove unused arg.
* fold-const.c (tree_swap_operands_p): Likewise.
(fold_binary_loc): Adjust.
(fold_ternary_loc): Likewise.
* genmatch.c (dt_operand::gen_gimple_exp): Likewise.
* gimple-fold.c (fold_stmt_1): Likewise.
* gimple-match-head.c (gimple_resimplify2): Likewise.
(gimple_resimplify3): Likewise.
(gimple_simplify): Likewise.
* tree-ssa-dom.c (record_equality): Likewise.
* tree-ssa-reassoc.c (optimize_range_tests_var_bound): Likewise.
* tree-ssa-sccvn.c (vn_nary_op_compute_hash): Likewise.
* tree-ssa-threadedge.c (simplify_control_stmt_condition_1): Likewise.

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 603aff0..a774848 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -9159,13 +9159,13 @@ fold_binary_loc (location_t loc,
   /* If this is a commutative operation, and ARG0 is a constant, move it
  to ARG1 to reduce the number of tests below.  */
   if (commutative_tree_code (code)
-  && tree_swap_operands_p (arg0, arg1, true))
+  && tree_swap_operands_p (arg0, arg1))
 return fold_build2_loc (loc, code, type, op1, op0);
 
   /* Likewise if this is a comparison, and ARG0 is a constant, move it
  to ARG1 to reduce the number of tests below.  */
   if (kind == tcc_comparison
-  && tree_swap_operands_p (arg0, arg1, true))
+  && tree_swap_operands_p (arg0, arg1))
 return fold_build2_loc (loc, swap_tree_comparison (code), type, op1, op0);
 
   tem = generic_simplify (loc, code, type, op0, op1);
@@ -11271,7 +11271,7 @@ fold_ternary_loc (location_t loc, enum tree_code code, 
tree type,
   /* If this is a commutative operation, and OP0 is a constant, move it
  to OP1 to reduce the number of tests below.  */
   if (commutative_ternary_tree_code (code)
-  && tree_swap_operands_p (op0, op1, true))
+  && tree_swap_operands_p (op0, op1))
 return fold_build3_loc (loc, code, type, op1, op0, op2);
 
   tem = generic_simplify (loc, code, type, op0, op1, op2);
@@ -11400,7 +11400,7 @@ fold_ternary_loc (location_t loc, enum tree_code code, 
tree type,
   /* If the second operand is simpler than the third, swap them
 since that produces better jump optimization results.  */
   if (truth_value_p (TREE_CODE (arg0))
- && tree_swap_operands_p (op1, op2, false))
+ && tree_swap_operands_p (op1, op2))
{
  location_t loc0 = expr_location_or (arg0, loc);
  /* See if this can be inverted.  If it can't, possibly because
diff --git a/gcc/fold-const.h b/gcc/fold-const.h
index ae37142..46dcd28 100644
--- a/gcc/fold-const.h
+++ b/gcc/fold-const.h
@@ -124,7 +124,7 @@ extern tree build_invariant_address (tree, tree, 
HOST_WIDE_INT);
 extern tree constant_boolean_node (bool, tree);
 extern tree div_if_zero_remainder (const_tree, const_tree);
 
-extern bool tree_swap_operands_p (const_tree, const_tree, bool);
+extern bool tree_swap_operands_p (const_tree, const_tree);
 extern enum tree_code swap_tree_comparison (enum tree_code);
 
 extern bool ptr_difference_const (tree, tree, HOST_WIDE_INT *);
diff --git a/gcc/genmatch.c b/gcc/genmatch.c
index b14034d..41951c5 100644
--- a/gcc/genmatch.c
+++ b/gcc/genmatch.c
@@ -2701,7 +2701,7 @@ dt_operand::gen_gimple_expr (FILE *f, int indent)
  gen_opname (child_opname0, 0);
  gen_opname (child_opname1, 1);
  fprintf_indent (f, indent,
- "if (tree_swap_operands_p (%s, %s, false))\n",
+ "if (tree_swap_operands_p (%s, %s))\n",
  child_opname0, child_opname1);
  fprintf_indent (f, indent,
  "  std::swap (%s, %s);\n",
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 5d46405..aabc8ff 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -4166,7 +4166,7 @@ fold_stmt_1 (gimple_stmt_iterator *gsi, bool inplace, 
tree (*valueize) (tree))
{
  tree rhs1 = gimple_assign_rhs1 (stmt);
  tree rhs2 = gimple_assign_rhs2 (stmt);
- if (tree_swap_operands_p (rhs1, rhs2, false))
+ if (tree_swap_operands_p (rhs1, rhs2))
{
  gimple_assign_set_rhs1 (stmt, rhs2);
  gimple_assign_set_rhs2 (stmt, rhs1);
@@ -4232,7 +4232,7 @@ fold_stmt_1 (gimple_stmt_iterator *gsi, bool inplace, 
tree (*valueize) (tree))
/* Canonicalize operand order.  */
tree lhs = gimple_cond_lhs (stmt);
tree rhs = gimple_cond_rhs (stmt);
-   if (tree_swap_operands_p (lhs, rhs, false))
+   if (tree_swap_operands_p (lhs, rhs))
  {
  gcond *gc = as_a <gcond *> (stmt);
gimple_cond_set_lhs (gc, rhs);
diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
index 4eab90d..09f729f 100644

Re: [PATCH] Fix folding of memcmp("a", "a", 2) (PR tree-optimization/78257)

2016-11-09 Thread Richard Biener
On Wed, Nov 9, 2016 at 2:56 PM, Martin Liška  wrote:
> Hello.
>
> Following patch fixes [1] (f0 function), where we have off-by-one error.
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?

Ok (w/o the unrelated gimplify.c hunk).

Richard.

> Martin
>
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78257#c0


[PATCH] Fix folding of memcmp("a", "a", 2) (PR tree-optimization/78257)

2016-11-09 Thread Martin Liška
Hello.

The following patch fixes [1] (the f0 function), where we have an off-by-one error.
Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78257#c0
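
For context, a minimal illustration of the boundary case (a sketch, not
necessarily the PR's exact reproducer): the literal "a" occupies two bytes,
'a' plus the terminating NUL, so comparing both bytes is well defined and
should fold to 0, while the old check based on strlen ("a") == 1 rejected
the fold.

__attribute__((noinline, noclone)) int
f0 (void)
{
  return __builtin_memcmp ("a", "a", 2);   /* both compared bytes exist */
}

int
main ()
{
  if (f0 () != 0)
    __builtin_abort ();
  return 0;
}
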
>From d7c46e5eb4d295d7653eae188cae04a8f7f0719f Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 9 Nov 2016 10:33:43 +0100
Subject: [PATCH] Fix folding of memcmp("a", "a", 2) (PR
 tree-optimization/78257)

gcc/ChangeLog:

2016-11-09  Martin Liska  

	* fold-const-call.c (fold_const_call): Fix the folding.

gcc/testsuite/ChangeLog:

2016-11-09  Martin Liska  

	* gcc.dg/tree-ssa/builtins-folding-generic.c (main): Add new
	test-case for memcmp.
	* gcc.dg/tree-ssa/builtins-folding-gimple.c: Likewise.
---
 gcc/fold-const-call.c| 10 +-
 gcc/gimplify.c   |  1 -
 gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-generic.c |  5 +
 gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c  |  6 ++
 4 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/gcc/fold-const-call.c b/gcc/fold-const-call.c
index 1b3a755..38f9717 100644
--- a/gcc/fold-const-call.c
+++ b/gcc/fold-const-call.c
@@ -1506,7 +1506,7 @@ tree
 fold_const_call (combined_fn fn, tree type, tree arg0, tree arg1, tree arg2)
 {
   const char *p0, *p1;
-  size_t s2 = 0;
+  size_t s0, s1, s2 = 0;
   switch (fn)
 {
 case CFN_BUILT_IN_STRNCMP:
@@ -1538,11 +1538,11 @@ fold_const_call (combined_fn fn, tree type, tree arg0, tree arg1, tree arg2)
   }
 case CFN_BUILT_IN_BCMP:
 case CFN_BUILT_IN_MEMCMP:
-  if ((p0 = c_getstr (arg0))
-	  && (p1 = c_getstr (arg1))
+  if ((p0 = c_getstr (arg0, &s0))
+	  && (p1 = c_getstr (arg1, &s1))
 	  && host_size_t_cst_p (arg2, &s2)
-	  && s2 <= strlen (p0)
-	  && s2 <= strlen (p1))
+	  && s2 <= s0
+	  && s2 <= s1)
 	return build_cmp_result (type, memcmp (p0, p1, s2));
   return NULL_TREE;
 
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index e5930e6..15976e2 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -2266,7 +2266,6 @@ gimplify_switch_expr (tree *expr_p, gimple_seq *pre_p)
 
   labels = gimplify_ctxp->case_labels;
   gimplify_ctxp->case_labels = saved_labels;
-  gcc_assert (gimplify_ctxp->live_switch_vars->elements () == 0);
   delete gimplify_ctxp->live_switch_vars;
   gimplify_ctxp->live_switch_vars = saved_live_switch_vars;
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-generic.c b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-generic.c
index 175feff..3c7e001 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-generic.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-generic.c
@@ -66,6 +66,10 @@ main (void)
   if (__builtin_strncasecmp ("aab", "aac", 2) != 0)
 __builtin_abort ();
 
+  /* MEMCMP.  */
+  if (__builtin_memcmp ("a", "a", 6) != 0)
+__builtin_abort ();
+
   return 0;
 }
 
@@ -74,3 +78,4 @@ main (void)
 /* { dg-final { scan-tree-dump-not "__builtin_strncmp" "original" } } */
 /* { dg-final { scan-tree-dump-not "__builtin_strncasecmp" "original" } } */
 /* { dg-final { scan-tree-dump-not "__builtin_memchr" "original" } } */
+/* { dg-final { scan-tree-dump-not "__builtin_memcmp" "original" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
index 283bd1c..d08ced5 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
@@ -63,6 +63,7 @@ main (void)
 
   const char *a = "a";
   const char *hello = "hello";
+  const char *hello2 = "hello";
   const char *empty = "";
   const char *ab = "ab";
   const char *ba = "ba";
@@ -151,6 +152,10 @@ main (void)
   if (__builtin_strncasecmp (++s2, ++s3+2, 0) != 0 || s2 != s1+1 || s3 != s1+5)
 __builtin_abort();
 
+  /* MEMCMP.  */
+  if (__builtin_memcmp (hello, hello2, 6) != 0)
+__builtin_abort ();
+
   return 0;
 }
 
@@ -158,4 +163,5 @@ main (void)
 /* { dg-final { scan-tree-dump-not "__builtin_strcasecmp" "optimized" } } */
 /* { dg-final { scan-tree-dump-not "__builtin_strncmp" "optimized" } } */
 /* { dg-final { scan-tree-dump-not "__builtin_memchr" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "__builtin_memcmp" "optimized" } } */
 /* { dg-final { scan-tree-dump-times "__builtin_strncasecmp" 3 "optimized" } } */
-- 
2.10.1



Re: [PATCH] Remove unneeded gcc_assert in gimplifier (PR sanitizer/78270)

2016-11-09 Thread Martin Liška
On 11/09/2016 02:29 PM, Jakub Jelinek wrote:
> On Wed, Nov 09, 2016 at 02:16:45PM +0100, Martin Liška wrote:
>> As shown in the attached test-case, the assert cannot always be true.
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>
>> Ready to be installed?
>> Martin
> 
>> >From b55459461f3f7396a094be6801082715ddb4b30d Mon Sep 17 00:00:00 2001
>> From: marxin 
>> Date: Wed, 9 Nov 2016 11:52:00 +0100
>> Subject: [PATCH] Remove unneeded gcc_assert in gimplifier (PR 
>> sanitizer/78270)
>>
>> gcc/ChangeLog:
>>
>> 2016-11-09  Martin Liska  
>>
>>  PR sanitizer/78270
>>  * gimplify.c (gimplify_switch_expr):
> 
> No description on what you've changed.
> 
> That said, I'm not 100% sure it is the right fix.
> As the testcase shows, for switch without GIMPLE_BIND wrapping the body
> we can have variables that are in scope from the switch onwards.
> I bet we could also have variables that go out of scope, say if in the
> compound literal's initializer there is ({ ... }) that declares variables.
> I doubt you can have a valid case/default label in those though, so
> perhaps it would be simpler not to create live_switch_vars at all
> if SWITCH_BODY is not a BIND_EXPR?

I like the approach you introduced! I'll re-trigger regression tests and
send a newer version of patch.

Martin

> 
>   Jakub
> 



Re: [PATCH] Remove unneeded gcc_assert in gimplifier (PR sanitizer/78270)

2016-11-09 Thread Jakub Jelinek
On Wed, Nov 09, 2016 at 02:16:45PM +0100, Martin Liška wrote:
> As shown in the attached test-case, the assert cannot always be true.
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Ready to be installed?
> Martin

> >From b55459461f3f7396a094be6801082715ddb4b30d Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Wed, 9 Nov 2016 11:52:00 +0100
> Subject: [PATCH] Remove unneeded gcc_assert in gimplifier (PR sanitizer/78270)
> 
> gcc/ChangeLog:
> 
> 2016-11-09  Martin Liska  
> 
>   PR sanitizer/78270
>   * gimplify.c (gimplify_switch_expr):

No description on what you've changed.

That said, I'm not 100% sure it is the right fix.
As the testcase shows, for switch without GIMPLE_BIND wrapping the body
we can have variables that are in scope from the switch onwards.
I bet we could also have variables that go out of scope, say if in the
compound literal's initializer there is ({ ... }) that declares variables.
I doubt you can have a valid case/default label in those though, so
perhaps it would be simpler not to create live_switch_vars at all
if SWITCH_BODY is not a BIND_EXPR?

Jakub


[PATCH] Remove unneeded gcc_assert in gimplifier (PR sanitizer/78270)

2016-11-09 Thread Martin Liška
As shown in the attached test-case, the assert cannot always be true.
Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
>From b55459461f3f7396a094be6801082715ddb4b30d Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 9 Nov 2016 11:52:00 +0100
Subject: [PATCH] Remove unneeded gcc_assert in gimplifier (PR sanitizer/78270)

gcc/ChangeLog:

2016-11-09  Martin Liska  

	PR sanitizer/78270
	* gimplify.c (gimplify_switch_expr):

gcc/testsuite/ChangeLog:

2016-11-09  Martin Liska  

	PR sanitizer/78270
	* gcc.dg/asan/pr78269.c: New test.
---
 gcc/gimplify.c  |  1 -
 gcc/testsuite/gcc.dg/asan/pr78269.c | 13 +
 2 files changed, 13 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/asan/pr78269.c

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index e5930e6..15976e2 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -2266,7 +2266,6 @@ gimplify_switch_expr (tree *expr_p, gimple_seq *pre_p)
 
   labels = gimplify_ctxp->case_labels;
   gimplify_ctxp->case_labels = saved_labels;
-  gcc_assert (gimplify_ctxp->live_switch_vars->elements () == 0);
   delete gimplify_ctxp->live_switch_vars;
   gimplify_ctxp->live_switch_vars = saved_live_switch_vars;
 
diff --git a/gcc/testsuite/gcc.dg/asan/pr78269.c b/gcc/testsuite/gcc.dg/asan/pr78269.c
new file mode 100644
index 000..55840b0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/asan/pr78269.c
@@ -0,0 +1,13 @@
+// { dg-do compile }
+// { dg-additional-options "-Wno-switch-unreachable" }
+
+typedef struct
+{
+} bdaddr_t;
+
+int a;
+void fn1 ()
+{
+  switch (a)
+&(bdaddr_t){};
+}
-- 
2.10.1



Re: [WIP C++ PATCH] P0217R3 - C++17 structured bindings

2016-11-09 Thread Jakub Jelinek
Hi!

On Wed, Nov 09, 2016 at 01:24:22PM +0100, Jakub Jelinek wrote:
> The following patch is a WIP on P0217R3 - decomposition declarations.
> It contains various FIXMEs, Jason, do you think you could finish it up?

Some more comments:

Invalid? code like
int arr[2];
extern int x, y;
auto [ x, y ] = arr;
depends on PR78217 fix, so I haven't added testsuite coverage for that yet.
Nor for decomp at namespace scope.  There is no coverage for bitfields
either.

And the testsuite coverage surely needs to have some verification of the
exact types and cv quals of the individual decls; the tests only cover
their addresses.  As implemented in the patch, those decls with
DECL_VALUE_EXPRs always have non-reference type; I am not sure whether that
is ok or not.
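
For reference, a minimal sketch of the kind of type/cv-qualifier check that
could be added (a hypothetical test shape compiled with -std=c++1z, not the
patch's actual testsuite coverage); under the proposed rules decltype of a
structured binding should yield the referenced member's type:

#include <type_traits>

struct S { int i; const long l; };

void
f ()
{
  S s {1, 2L};
  auto [a, b] = s;      /* a, b name the members of a copy of s */
  auto &[ra, rb] = s;   /* ra, rb refer to s.i and s.l directly */
  static_assert (std::is_same<decltype(b), const long>::value, "");
  static_assert (std::is_same<decltype(ra), int>::value, "");
  (void) a; (void) rb;
}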

Jakub


Re: [PATCH, RFC] Improve ivopts group costs

2016-11-09 Thread Bin.Cheng
On Thu, Nov 3, 2016 at 4:00 PM, Bin.Cheng  wrote:
> On Thu, Nov 3, 2016 at 1:35 PM, Evgeny Kudryashov  
> wrote:
>> Hello,
>>
>> I'm facing the following problem related to ivopts. The problem is that GCC
>> generates a lot of induction variables and as a result there is an
>> unnecessary increase of stack usage and register pressure.
>>
>> For instance, for the attached testcase (tc_ivopts.c) GCC generates 26
>> induction variables (25 for each of lhsX[{0-4}][{0-4}][k] and one for
>> rhs[k][j][{0-2}]). You should be able to reproduce this issue on arm target
>> using GCC with "-O2 -mcpu=cortex-a9 -mtune=cortex-a9". For reference, I'm
>> attaching assembly I get on current trunk.
>>
>> The reason might be in use groups costs, in particular, in the way of
>> estimation. Currently, the cost of a tuple (group, candidate) is a sum of
>> per-use costs in the group. So, the cost of a group grows proportional to
>> the number of uses in the group. This approach has a negative effect on the
>> algorithm for finding the best set of induction variables: the part of a
>> total cost related to adding a new candidate is almost washed out by the
>> cost of the group. In addition, when there is a lot of groups with many uses
>> inside and a target is out of free registers, the cost of spill is washing
>> out too. As a result, GCC prefers to use native candidates (candidate
>> created for a particular group) for each group of uses instead of
>> considering the real cost of introducing a new variable into a set.
>>
>> The summing approach was added as a part of this patch
>> (https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00641.html) and the
>> motivation for taking the sum does not seem to be
>> specifically discussed.
>>
>> I propose the following patch that changes a group cost from cost summing to
>> selecting the largest one inside a group. For the given test case I have:
>> necessary size of stack is decreased by almost 3 times and ldr\str amount
>> are decreased by less than 2 times. Also I'm attaching assembly after
>> applying the patch.
>>
>> The essential change in the patch is just:
>>
>> diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
>> index f9211ad..a149418 100644
>> --- a/gcc/tree-ssa-loop-ivopts.c
>> +++ b/gcc/tree-ssa-loop-ivopts.c
>> @@ -5151,7 +5151,8 @@ determine_group_iv_cost_address (struct ivopts_data
>> *data,
>>  offset and where the iv_use is.  */
>> cost = get_computation_cost (data, next, cand, true,
>>  NULL, &can_autoinc, NULL);
>> -  sum_cost += cost;
>> +  if (sum_cost < cost)
>> +sum_cost = cost;
>>  }
>>set_group_iv_cost (data, group, cand, sum_cost, depends_on,
>>  NULL_TREE, ERROR_MARK, inv_expr);
>>
>> Any suggestions?
> Hi Evgeny,
>
> I tend to be practical here.  Given cost is not model that well in
> IVOPT, any heuristic tuning in cost is quite likely resulting in
> improvement in some cases, while regressions in others.  At least we
> need to run good number of benchmarks for such changes.  Specific to
> this one, the summary of cost in a group is a natural result of the
> original code, in which uses' cost is added up before candidate
> selection.  Take your code as an example, choosing loop's original
> candidate for group results in lots of memory accesses with [base +
> index << scale] addressing mode, which could have higher cost than
> [base + offset] mode wrto u-arch, IMHO, it makes sense to add up this
> cost.  OTOH, I totally agree that IVOPT tends to choose too many
> candidates at the moment, especially for large loops.  Is it possible
> to achieve the same effect by penalizing register pressure cost?
> Meanwhile, I can collect benchmark data for your patch on AArch64 and
> get back to you later.
I collected spec2k6 data on one AArch64 processor; it doesn't give a
meaningful improvement or regression.  Looking at the test, it boils
down to how we choose induction variables for loops having the below
memory references:
for (biv)
  MEM[base + biv << scale + off1];
  MEM[base + biv << scale + off2];
  // ...
  MEM[base + biv << scale + offn];

On targets supporting the [base + index << scale + offset] addressing mode,
the biv should be preferred (if the cost of the addressing mode is not
blocking), thus register pressure is 2.  While on targets that only support
[base + index << scale], it is more complicated.  Choosing the biv
actually increases the register pressure by 1 compared to choosing {base_1 +
biv << scale, step} as the candidate, or an additional computation
outside of the address expression is required for each memory reference.
Well, there is one exception, when "base" is actually anticipated on the
exit edge of this loop.  So this is really a target-dependent cost model
issue; IMHO, decreasing the group cost in a target-independent way is not
that good, what do you think?  I will look deeper into why choosing the
biv can get rid of spilling.
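
A reduced C sketch of that reference pattern (hypothetical, just to make
the trade-off concrete):

/* Every iteration touches several words at constant offsets from the
   same scaled index.  With [base + index << scale + offset] a single
   biv (i) covers all accesses; without the offset form, either each
   access needs an extra add, or ivopts introduces one pointer iv per
   offset and register pressure grows.  */
void
scale_rows (int *base, int n)
{
  for (int i = 0; i < n; i++)
    {
      base[i * 4 + 0] += 1;
      base[i * 4 + 1] += 1;
      base[i * 4 + 2] += 1;
      base[i * 4 + 3] += 1;
    }
}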

BTW, the case can be 

[PATCH, ARM] Enable ldrd/strd peephole rules unconditionally

2016-11-09 Thread Bernd Edlinger
Hi!


This patch enables the ldrd/strd peephole rules unconditionally.

It is meant to fix cases where the patch to reduce the sha512
stack usage splits ldrd/strd instructions into separate ldr/str insns,
but it is technically independent from the other patch:

See https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00523.html
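
For illustration only (this is not from the patch), the peepholes in
question merge adjacent word accesses like the ones below into single
ldrd/strd instructions on Thumb-2:

void
copy_pair (int *dst, const int *src)
{
  int lo = src[0];   /* two ldr, candidates for one ldrd */
  int hi = src[1];
  dst[0] = lo;       /* two str, candidates for one strd */
  dst[1] = hi;
}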

It was necessary to change check_effective_target_arm_prefer_ldrd_strd
to retain the true prefer_ldrd_strd tuning flag.


Bootstrapped and reg-tested on arm-linux-gnueabihf.
Is it OK for trunk?


Thanks
Bernd.

2016-11-02  Bernd Edlinger  

	PR target/77308
	* config/arm/arm.md (*thumb2_ldrd, *thumb2_ldrd_base,
	*thumb2_ldrd_base_neg, *thumb2_strd, *thumb2_strd_base,
	*thumb2_strd_base_neg): Recognize insn regardless of
	current_tune->prefer_ldrd_strd.
	* config/arm/ldrdstrd.md: Enable all ldrd/strd peephole rules
	whenever possible.

testsuite:
2016-11-02  Bernd Edlinger  

	PR target/77308
	* gcc.target/arm/pr53447-5.c: New test.
	* lib/target-supports.exp
	(check_effective_target_arm_prefer_ldrd_strd): Adjust.

Index: gcc/config/arm/arm.md
===
--- gcc/config/arm/arm.md	(revision 241816)
+++ gcc/config/arm/arm.md	(working copy)
@@ -11636,7 +11636,6 @@
 (mem:SI (plus:SI (match_dup 1)
  (match_operand:SI 4 "const_int_operand" ""))))]
   "TARGET_LDRD && TARGET_THUMB2 && reload_completed
- && current_tune->prefer_ldrd_strd
  && ((INTVAL (operands[2]) + 4) == INTVAL (operands[4]))
  && (operands_ok_ldrd_strd (operands[0], operands[3],
   operands[1], INTVAL (operands[2]),
@@ -11653,7 +11652,6 @@
 (mem:SI (plus:SI (match_dup 1)
  (const_int 4))))]
   "TARGET_LDRD && TARGET_THUMB2 && reload_completed
- && current_tune->prefer_ldrd_strd
  && (operands_ok_ldrd_strd (operands[0], operands[2],
   operands[1], 0, false, true))"
   "ldrd%?\t%0, %2, [%1]"
@@ -11668,7 +11666,6 @@
(set (match_operand:SI 2 "s_register_operand" "=r")
 (mem:SI (match_dup 1)))]
   "TARGET_LDRD && TARGET_THUMB2 && reload_completed
- && current_tune->prefer_ldrd_strd
  && (operands_ok_ldrd_strd (operands[0], operands[2],
   operands[1], -4, false, true))"
   "ldrd%?\t%0, %2, [%1, #-4]"
@@ -11684,7 +11681,6 @@
  (match_operand:SI 3 "const_int_operand" "")))
 (match_operand:SI 4 "s_register_operand" "r"))]
   "TARGET_LDRD && TARGET_THUMB2 && reload_completed
- && current_tune->prefer_ldrd_strd
  && ((INTVAL (operands[1]) + 4) == INTVAL (operands[3]))
  && (operands_ok_ldrd_strd (operands[2], operands[4],
   operands[0], INTVAL (operands[1]),
@@ -11701,7 +11697,6 @@
  (const_int 4)))
 (match_operand:SI 2 "s_register_operand" "r"))]
   "TARGET_LDRD && TARGET_THUMB2 && reload_completed
- && current_tune->prefer_ldrd_strd
  && (operands_ok_ldrd_strd (operands[1], operands[2],
   operands[0], 0, false, false))"
   "strd%?\t%1, %2, [%0]"
@@ -11716,7 +11711,6 @@
(set (mem:SI (match_dup 0))
 (match_operand:SI 2 "s_register_operand" "r"))]
   "TARGET_LDRD && TARGET_THUMB2 && reload_completed
- && current_tune->prefer_ldrd_strd
  && (operands_ok_ldrd_strd (operands[1], operands[2],
   operands[0], -4, false, false))"
   "strd%?\t%1, %2, [%0, #-4]"
Index: gcc/config/arm/ldrdstrd.md
===
--- gcc/config/arm/ldrdstrd.md	(revision 241816)
+++ gcc/config/arm/ldrdstrd.md	(working copy)
@@ -29,9 +29,7 @@
 (match_operand:SI 2 "memory_operand" ""))
(set (match_operand:SI 1 "arm_general_register_operand" "")
 (match_operand:SI 3 "memory_operand" ""))]
-  "TARGET_LDRD
- && current_tune->prefer_ldrd_strd
- && !optimize_function_for_size_p (cfun)"
+  "TARGET_LDRD"
   [(const_int 0)]
 {
   if (!gen_operands_ldrd_strd (operands, true, false, false))
@@ -63,9 +61,7 @@
 	(match_operand:SI 0 "arm_general_register_operand" ""))
(set (match_operand:SI 3 "memory_operand" "")
 	(match_operand:SI 1 "arm_general_register_operand" ""))]
-  "TARGET_LDRD
- && current_tune->prefer_ldrd_strd
- && !optimize_function_for_size_p (cfun)"
+  "TARGET_LDRD"
   [(const_int 0)]
 {
   if (!gen_operands_ldrd_strd (operands, false, false, false))
@@ -102,9 +98,7 @@
 (match_operand:SI 5 "const_int_operand" ""))
(set (match_operand:SI 3 "memory_operand" "")
 (match_dup 1))]
- "TARGET_LDRD
-  && current_tune->prefer_ldrd_strd
-  && !optimize_function_for_size_p (cfun)"
+  "TARGET_LDRD"
   [(const_int 0)]
 {
   if (!gen_operands_ldrd_strd (operands, false, true, false))
@@ -147,10 +141,8 @@
 (match_dup 0))
(set (match_operand:SI 3 

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-09 Thread Richard Biener
On Wed, 9 Nov 2016, Yuri Rumyantsev wrote:

> Thanks Richard for your comments.
> Your proposed to handle epilogue loop just like normal short-trip loop
> but this is not exactly truth since e.g. epilogue must not be peeled
> for alignment.

But if we know the epilogue data-refs are aligned we should have reflected
that in the code so the vectorizer wouldn't even try to peel for
alignment.  OTOH peeling for alignment already knows that peeling
a short-trip loop is not likely profitable (maybe the condition needs
to be hardened to also work for VF/2).

> It is not clear for me what are my next steps? Should I re-design the
> patch completely or simply decompose the whole patch to different
> parts? But it means that we must start review process from beginning
> but release is closed to its end.

What I disliked about the series from the beginning is that it does
everything at once rather than first introducing vectorizing of
epilogues as an independent patch.  Lumping it all together makes
it hard to decipher the conditions under which each feature is enabled.

I'm mostly concerned about the predication part and thus if we can
get the other parts separated and committed that would be a much
smaller patch to look at and experiment with.

Note that only stage1 is at its end, we usually still accept patches
that were posted before stage1 end during stage3.

> Note also that i work for Intel till the end of year and have not idea
> who will continue working on this project.

Noted.

Richard.

> Any help will be appreciated.
>
> Thanks.
> Yuri.
> 
> 2016-11-09 13:37 GMT+03:00 Bin.Cheng :
> > On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev  wrote:
> >> Hi All,
> >>
> >> I re-send all patches sent by Ilya earlier for review which support
> >> vectorization of loop epilogues and loops with low trip count. We
> >> assume that the only patch - vec-tails-07-combine-tail.patch - was not
> >> approved by Jeff.
> >>
> >> I did re-base of all patches and performed bootstrapping and
> >> regression testing that did not show any new failures. Also all
> >> changes related to new vect_do_peeling algorithm have been changed
> >> accordingly.
> >>
> >> Is it OK for trunk?
> >
> > Hi,
> > I can't approve patches, but had some comments after going through the
> > implementation.
> >
> > One confusing part is cost model change, as well as the way it's used
> > to decide how epilogue loop should be vectorized.  Given vect-tail is
> > disabled at the moment and the cost change needs further tuning, is it
> > reasonable to split this part out and get vectorization part
> > reviewed/submitted independently?  For example, let user specified
> > parameters make the decision for now.  Cost and target dependent
> > changes should go in at last, this could make the patch easier to
> > read.
> >
> > The implementation computes/shares quite amount information from main
> > loop to epilogue loop vectorization.  Furthermore, variables/fields
> > for such information are somehow named in a misleading way.  For
> > example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the
> > flag controlling whether epilogue loop should be vectorized with
> > masking.  However, it's actually controlled by exactly the same flag
> > as whether epilogue loop should be combined into the main loop with
> > masking:
> > @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo)
> >
> >slpeel_make_loop_iterate_ntimes (loop, niters_vector);
> >
> > +  if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
> > +vect_combine_loop_epilogue (loop_vinfo);
> > +
> >/* Reduce loop iterations by the vectorization factor.  */
> >scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf),
> >expected_iterations / vf);
> >
> > IMHO, we should decouple main loop vectorization and epilogue
> > vectorization as much as possible by sharing as few information as we
> > can.  The general idea is to handle epilogue loop just like normal
> > short-trip loop.  For example, we can rename
> > LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something
> > else), and we don't differentiate its meaning between main and
> > epilogue(short-trip) loop.  It only indicates the current loop should
> > be vectorized with masking no matter it's a main loop or epilogue
> > loop, and it works just like the current implementation.
> >
> > After this change, we can refine vectorization and make it more
> > general for normal loop and epilogue(short trip) loop.  For example,
> > this implementation sets LOOP_VINFO_PEELING_FOR_NITER  for epilogue
> > loop and use it to control how it should be vectorized:
> > +  if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
> > +{
> > +  LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false;
> > +  LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false;
> > +}
> > +  else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
> > +   && min_profitable_combine_iters >= 0)
> > +{
> >
> > This works, 

[PING][PATCH v2][aarch64] Add mcpu flag for Qualcomm falkor core

2016-11-09 Thread Siddhesh Poyarekar
Ping!

Siddhesh

On 4 November 2016 at 21:17, Siddhesh Poyarekar
 wrote:
> This adds an mcpu option for the upcoming Qualcomm Falkor core.  This
> is identical to the qdf24xx part that was added earlier and hence
> retains the same tuning structure and continues to have the a57
> pipeline for now.  The part number has also been changed and this
> patch fixes this for both qdf24xx and falkor options.
>
> Tested with aarch64 and armhf.
>
> Siddhesh
>
> * gcc/config/aarch64/aarch64-cores.def (qdf24xx): Update part
> number.
> (falkor): New core.
> * gcc/config/aarch64/aarch64-tune.md: Regenerated.
> * gcc/config/arm/arm-cores.def (falkor): New core.
> * gcc/config/arm/arm-tables.opt: Regenerated.
> * gcc/config/arm/arm-tune.md: Regenerated.
> * gcc/config/arm/bpabi.h (BE8_LINK_SPEC): Add falkor support.
> * gcc/config/arm/t-aprofile (MULTILIB_MATCHES): Add falkor
> support.
> * gcc/doc/invoke.texi (AArch64 Options/-mtune): Add falkor.
> (ARM Options/-mtune): Add falkor.
>
> ---
>  gcc/config/aarch64/aarch64-cores.def | 3 ++-
>  gcc/config/aarch64/aarch64-tune.md   | 2 +-
>  gcc/config/arm/arm-cores.def | 1 +
>  gcc/config/arm/arm-tables.opt| 3 +++
>  gcc/config/arm/arm-tune.md   | 2 +-
>  gcc/config/arm/bpabi.h   | 2 ++
>  gcc/config/arm/t-aprofile| 1 +
>  gcc/doc/invoke.texi  | 9 +
>  8 files changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-cores.def 
> b/gcc/config/aarch64/aarch64-cores.def
> index f9b7552..4b00f3f 100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -54,7 +54,8 @@ AARCH64_CORE("cortex-a73",  cortexa73, cortexa57, 8A,  
> AARCH64_FL_FOR_ARCH8 | AA
>  AARCH64_CORE("exynos-m1",   exynosm1,  exynosm1,  8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1,  0x53, 0x001)
>
>  /* Qualcomm ('Q') cores. */
> -AARCH64_CORE("qdf24xx", qdf24xx,   cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx,   0x51, 0x800)
> +AARCH64_CORE("falkor",  falkor,cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx,   0x51, 0xC00)
> +AARCH64_CORE("qdf24xx", qdf24xx,   cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx,   0x51, 0xC00)
>
>  /* Cavium ('C') cores. */
>  AARCH64_CORE("thunderx",thunderx,  thunderx,  8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  0x43, 0x0a1)
> diff --git a/gcc/config/aarch64/aarch64-tune.md 
> b/gcc/config/aarch64/aarch64-tune.md
> index 022b131..29afcdf 100644
> --- a/gcc/config/aarch64/aarch64-tune.md
> +++ b/gcc/config/aarch64/aarch64-tune.md
> @@ -1,5 +1,5 @@
>  ;; -*- buffer-read-only: t -*-
>  ;; Generated automatically by gentune.sh from aarch64-cores.def
>  (define_attr "tune"
> -   
> "cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,exynosm1,qdf24xx,thunderx,xgene1,vulcan,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53"
> +   
> "cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,exynosm1,falkor,qdf24xx,thunderx,xgene1,vulcan,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53"
> (const (symbol_ref "((enum attr_tune) aarch64_tune)")))
> diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
> index 2072e1e..1bfec9c 100644
> --- a/gcc/config/arm/arm-cores.def
> +++ b/gcc/config/arm/arm-cores.def
> @@ -173,6 +173,7 @@ ARM_CORE("cortex-a57",  cortexa57, cortexa57,   8A,   
>   ARM_FSET_MAKE_CPU1 (FL_LDSCHED
>  ARM_CORE("cortex-a72", cortexa72, cortexa57,   8A, ARM_FSET_MAKE_CPU1 
> (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a57)
>  ARM_CORE("cortex-a73", cortexa73, cortexa57,   8A, ARM_FSET_MAKE_CPU1 
> (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a73)
>  ARM_CORE("exynos-m1",  exynosm1,  exynosm1,8A, ARM_FSET_MAKE_CPU1 
> (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), exynosm1)
> +ARM_CORE("falkor", falkor,cortexa57,   8A, ARM_FSET_MAKE_CPU1 
> (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), qdf24xx)
>  ARM_CORE("qdf24xx",qdf24xx,   cortexa57,   8A, ARM_FSET_MAKE_CPU1 
> (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), qdf24xx)
>  ARM_CORE("xgene1",  xgene1,xgene1,  8A,ARM_FSET_MAKE_CPU1 
> (FL_LDSCHED | FL_FOR_ARCH8A),xgene1)
>
> diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
> index ee9e3bb..7b15c8c 100644
> --- a/gcc/config/arm/arm-tables.opt
> +++ b/gcc/config/arm/arm-tables.opt
> @@ -328,6 +328,9 @@ EnumValue
>  Enum(processor_type) String(exynos-m1) Value(exynosm1)
>
>  EnumValue
> +Enum(processor_type) String(falkor) Value(falkor)
> +
> +EnumValue
>  Enum(processor_type) String(qdf24xx) Value(qdf24xx)
>
>  EnumValue
> diff --git 

[Patch, fortran] PR44265 - Link error with reference to parameter array in specification expression

2016-11-09 Thread Paul Richard Thomas
Dear All,

The title of this PR says what this is all about, except that it
applies uniquely to character function result string lengths.

Ian Harvey wrote the first patch for this PR, for which many thanks.
However, two issues came up that took a little while to understand:
(i) in comment #1, it was found that calls from sibling procedures
could generate streams of undefined references at the link stage; and
(ii) an entity of the same name as the contained procedure entity at
module scope caused similar problems.

The relationship with Ian's patch is still obvious. The fundamental
difference is that the parameter arrays are automatically promoted to
module scope using a unique symtree. This fixes both the issues above.

Dominique, could you please check that the -m32 issue has gone away?

Bootstrapped and regtested on FC21/x86_64 - OK for trunk?

Paul

2016-11-09  Paul Thomas  

PR fortran/44265
* gfortran.h : Add fn_result_spec bitfield to gfc_symbol.
* resolve.c (flag_fn_result_spec): New function.
(resolve_fntype): Call it for character result lengths.
* symbol.c (gfc_new_symbol): Set fn_result_spec to zero.
* trans-decl.c (gfc_sym_mangled_identifier): Include the
procedure name in the mangled name for symbols with the
fn_result_spec bit set.
(gfc_get_symbol_decl): Mangle the name of these symbols.
(gfc_create_module_variable): Allow them through the assert.
(gfc_generate_function_code): Remove the assert before the
initialization of sym->tlink because the frontend no longer
uses this field.
* trans-expr.c (gfc_map_intrinsic_function): Add a case to
treat the LEN_TRIM intrinsic.

2016-11-09  Paul Thomas  

PR fortran/44265
* gfortran.dg/char_result_14.f90: New test.
* gfortran.dg/char_result_15.f90: New test.


-- 
The difference between genius and stupidity is; genius has its limits.

Albert Einstein
Index: gcc/fortran/gfortran.h
===
*** gcc/fortran/gfortran.h  (revision 241994)
--- gcc/fortran/gfortran.h  (working copy)
*** typedef struct gfc_symbol
*** 1498,1503 
--- 1498,1505 
unsigned equiv_built:1;
/* Set if this variable is used as an index name in a FORALL.  */
unsigned forall_index:1;
+   /* Set if the symbol is used in a function result specification .  */
+   unsigned fn_result_spec:1;
/* Used to avoid multiple resolutions of a single symbol.  */
unsigned resolved:1;
/* Set if this is a module function or subroutine with the
Index: gcc/fortran/resolve.c
===
*** gcc/fortran/resolve.c   (revision 241994)
--- gcc/fortran/resolve.c   (working copy)
*** resolve_equivalence (gfc_equiv *eq)
*** 15732,15737 
--- 15732,15776 
  }
  
  
+ /* Function called by resolve_fntype to flag other symbols used in the
+length type parameter specification of function results.  */
+ 
+ static bool
+ flag_fn_result_spec (gfc_expr *expr,
+  gfc_symbol *sym ATTRIBUTE_UNUSED,
+  int *f ATTRIBUTE_UNUSED)
+ {
+   gfc_namespace *ns;
+   gfc_symbol *s;
+ 
+   if (expr->expr_type == EXPR_VARIABLE)
+ {
+   s = expr->symtree->n.sym;
+   for (ns = s->ns; ns; ns = ns->parent)
+   if (!ns->parent)
+ break;
+ 
+   if (!s->fn_result_spec && s->ns->parent != NULL
+ && s->attr.flavor == FL_PARAMETER)
+   {
+ if (ns->proc_name && ns->proc_name->attr.flavor == FL_MODULE)
+   {
+ gfc_symtree *st;
+ s->fn_result_spec = 1;
+ /* Make sure that this symbol is translated as a module
+variable.  */
+ st = gfc_get_unique_symtree (ns);
+ st->n.sym = s;
+ s->refs++;
+   }
+ else if (s->attr.use_assoc || s->attr.used_in_submodule)
+   s->fn_result_spec = 1;
+   }
+ }
+   return false;
+ }
+ 
+ 
  /* Resolve function and ENTRY types, issue diagnostics if needed.  */
  
  static void
*** resolve_fntype (gfc_namespace *ns)
*** 15782,15787 
--- 15821,15829 
el->sym->attr.untyped = 1;
  }
}
+ 
+   if (sym->ts.type == BT_CHARACTER)
+ gfc_traverse_expr (sym->ts.u.cl->length, NULL, flag_fn_result_spec, 0);
  }
  
  
Index: gcc/fortran/symbol.c
===
*** gcc/fortran/symbol.c(revision 241994)
--- gcc/fortran/symbol.c(working copy)
*** gfc_new_symbol (const char *name, gfc_na
*** 2933,2938 
--- 2933,2939 
p->common_block = NULL;
p->f2k_derived = NULL;
p->assoc = NULL;
+   p->fn_result_spec = 0;

return p;
  }
Index: gcc/fortran/trans-decl.c
===
*** gcc/fortran/trans-decl.c(revision 

Re: [RFC] Check number of uses in simplify_cond_using_ranges().

2016-11-09 Thread Richard Biener
On Tue, Nov 8, 2016 at 5:18 PM, Marc Glisse  wrote:
> On Tue, 8 Nov 2016, Dominik Vogt wrote:
>
>> On Fri, Nov 04, 2016 at 01:54:20PM +0100, Richard Biener wrote:
>>>
>>> On Fri, Nov 4, 2016 at 12:08 PM, Dominik Vogt 
>>> wrote:

 On Fri, Nov 04, 2016 at 09:47:26AM +0100, Richard Biener wrote:
>
> On Thu, Nov 3, 2016 at 4:03 PM, Dominik Vogt 
> wrote:
>>
>> Is VRP the right pass to do this optimisation or should a later
>> pass rather attempt to eliminate the new use of b_5 instead?  Uli
>> has brought up the idea a mini "sign extend elimination" pass that
>> checks if the result of a sign extend could be replaced by the
>> original quantity in all places, and if so, eliminate the ssa
>> name.  (I guess that won't help with the above code because l is
>> used also as a function argument.)  How could a sensible approach
>> to deal with the situation look like?
>
>
> We run into this kind of situation regularly and for general foldings
> in match.pd we settled with single_use () even though it is not
> perfect.
> Note the usual complaint is not extra extension instructions but
> the increase of register pressure.
>
> This is because it is hard to do better when you are doing local
> optimization.
>
> As for the question on whether VRP is the right pass to do this the
> answer is two-fold -- VRP has the most precise range information.
> But the folding itself should be moved to generic code and use
> get_range_info ().


 All right, is this a sensible approach then?
>>>
>>>
>>> Yes.
>>>
   1. Using has_single_use as in the experimental patch is good
  enough (provided further testing does not show serious
  regressions).
>>>
>>>
>>> I'd approve that, yes.
>>>
   2. Rip the whole top level if-block from simplify_cond_using_ranges().
   3. Translate that code to match.pd syntax.
>>>
>>>
>>> Might be some work but yes, that's also desired (you'd lose the ability
>>> to emit the warnings though).
>>
>>
>> Could you give me a match-pd-hint please?  We now have something
>> like this:
>>
>> (simplify
>>  (cond (gt SSA_NAME@0 INTEGER_CST@1) @2 @3)
>>  (if (... many conditions ...)
>>   (cond (gt ... ...) @2 @3))
>>
>> The generated code ends up in gimple_simplify_COND_EXPR, but when
>> gimple_simplify is actually called, it goes through the
>> GIMPLE_COND case and calls gimple_resimplify2(..., GT, ...) and
>> there it tries gimple_simplify_GT_EXPR(), peeling of the (cond
>> ...), i.e. it never tries the generated code.
>
>
> Not sure what you mean here.
>
>> There is another pattern in match.pd that uses a (cond ...) as the
>> first operand, and I do not understand how this works.  Should we
>> just use "(gt SSA_NAME@0 INTEGER_CST@1)" as the first operand
>> instead, and wouldn't this pattern be too general that way?
>
>
> IIUC, you are trying to move the second half of simplify_cond_using_ranges
> to match.pd. I don't see any reason to restrict it to the case where the
> comparison result is used directly in a COND_EXPR, so that would look like:
>
> (for cmp (...)
>  (simplify
>   (cmp (convert SSA_NAME@0) INTEGER_CST@1)
>   (if (...)
>   (cmp @0 (convert @1)))))
>
> maybe? I think I may have missed your point.

Yeah, if you'd use (cond (gt ... then it only matches in assignments
with COND_EXPRs on the RHS, _not_ in GIMPLE_CONDs.

So you ran into the (cond vs. GIMPLE_COND "mismatch".

You'd essentially implement sth similar to shorten_compare in match.pd.
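
For reference, a source-level sketch of the fold being discussed (my own
example, not Dominik's testcase): with the range of b known, the
comparison done on the widened value can be done on b directly.

int
f (short b)
{
  long l = b;      /* (convert SSA_NAME@0) */
  return l > 42;   /* can become b > 42 when 42 is representable in
                      b's type and the conversion preserves the value */
}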

Btw, moving to match.pd shouldn't be a blocker for adding proper
single_use tests
just in case you get lost ...

Richard.

> (and yes, the first half would give a very general (simplify (cmp SSA_NAME@0
> INTEGER_CST@1) ...), that doesn't seem so bad)
>
> --
> Marc Glisse


Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-09 Thread Bin.Cheng
On Wed, Nov 9, 2016 at 12:12 PM, Yuri Rumyantsev  wrote:
> I am familiar with SVE extension and understand that implemented
> approach might be not suitable for ARM. The main point is that only
> load/store instructions are masked but all other calculations are not
> (we did special conversion for reduction statements to implement
> merging predication semantic). For SVE peeling for niters is not
> required but it is not true for x86 -  we must determine what
> vectorization scheme is more profitable: loop combining (the only
> essential for SVE) or separate epilogue vectorization using masking or
> less vectorization factor. So I'd like to have the full list of
> required changes to our implementation to try to remove them.
Hmm, sorry that my comment gave the impression that I was trying to hold
back the patch; that's not what I meant by any means.  Also, it's not
related to SVE.  As a matter of fact, I haven't read any document about
SVE yet.  Sorry again for the false impression conveyed by previous
messages.

Thanks,
bin
>
> Thanks.
> Yuri.
>
> 2016-11-09 14:46 GMT+03:00 Bin.Cheng :
>> On Wed, Nov 9, 2016 at 11:28 AM, Yuri Rumyantsev  wrote:
>>> Thanks Richard for your comments.
>>> Your proposed to handle epilogue loop just like normal short-trip loop
>>> but this is not exactly truth since e.g. epilogue must not be peeled
>>> for alignment.
>> Yes there must be some differences, my motivation is to minimize that
>> so we don't need to specially check normal/epilogue loops at too many
>> places.
>> Of course it's just my feeling when going through the patch set, and
>> could be wrong.
>>
>> Thanks,
>> bin
>>>
>>> It is not clear for me what are my next steps? Should I re-design the
>>> patch completely or simply decompose the whole patch to different
>>> parts? But it means that we must start review process from beginning
>>> but release is closed to its end.
>>> Note also that i work for Intel till the end of year and have not idea
>>> who will continue working on this project.
>>>
>>> Any help will be appreciated.
>>>
>>> Thanks.
>>> Yuri.
>>>
>>> 2016-11-09 13:37 GMT+03:00 Bin.Cheng :
 On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev  
 wrote:
> Hi All,
>
> I re-send all patches sent by Ilya earlier for review which support
> vectorization of loop epilogues and loops with low trip count. We
> assume that the only patch - vec-tails-07-combine-tail.patch - was not
> approved by Jeff.
>
> I did re-base of all patches and performed bootstrapping and
> regression testing that did not show any new failures. Also all
> changes related to new vect_do_peeling algorithm have been changed
> accordingly.
>
> Is it OK for trunk?

 Hi,
 I can't approve patches, but had some comments after going through the
 implementation.

 One confusing part is cost model change, as well as the way it's used
 to decide how epilogue loop should be vectorized.  Given vect-tail is
 disabled at the moment and the cost change needs further tuning, is it
 reasonable to split this part out and get vectorization part
 reviewed/submitted independently?  For example, let user specified
 parameters make the decision for now.  Cost and target dependent
 changes should go in at last, this could make the patch easier to
 read.

 The implementation computes/shares quite amount information from main
 loop to epilogue loop vectorization.  Furthermore, variables/fields
 for such information are somehow named in a misleading way.  For
 example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the
 flag controlling whether epilogue loop should be vectorized with
 masking.  However, it's actually controlled by exactly the same flag
 as whether epilogue loop should be combined into the main loop with
 masking:
 @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo)

slpeel_make_loop_iterate_ntimes (loop, niters_vector);

 +  if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
 +vect_combine_loop_epilogue (loop_vinfo);
 +
/* Reduce loop iterations by the vectorization factor.  */
scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf),
expected_iterations / vf);

 IMHO, we should decouple main loop vectorization and epilogue
 vectorization as much as possible by sharing as few information as we
 can.  The general idea is to handle epilogue loop just like normal
 short-trip loop.  For example, we can rename
 LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something
 else), and we don't differentiate its meaning between main and
 epilogue(short-trip) loop.  It only indicates the current loop should
 be vectorized with masking no matter it's a main loop or epilogue
 loop, and 

[WIP C++ PATCH] P0217R3 - C++17 structured bindings

2016-11-09 Thread Jakub Jelinek
Hi!

The following patch is a WIP on P0217R3 - decomposition declarations.
It contains various FIXMEs, Jason, do you think you could finish it up?

The most important unfinished part in the patch is that cp_finish_decomp
for classes doesn't try to check if std::tuple_size<T>::value is a
well-formed integral constant expression and use decl.get<i>() or
get<i>(decl) as initializers and std::tuple_element<i, T>::type
as the type of the individual vars (those need cp_finish_decl then and
dropping of DECL_HAS_VALUE_EXPR_P/DECL_VALUE_EXPR if they have any)
- template-ids, lookups, instantiations aren't something I'm comfortable
enough with to write that.

Another thing is with the non-reference decompositions - I think
int arr[2];
...
auto [ x, y ] = arr;
works properly as copy-initialization, avoids explicit copy constructors,
as I've tried to test in the decomp6.C, though the templates used during
the VEC_INIT_EXPR gimplification aren't actually instantiated (see the
fixme routine in the test that really shouldn't be necessary).
But
auto [ x, y ] { arr };
doesn't seem to work correctly, see the commented out part of decomp6.C,
the cp-gimplify.c hunk is kind of hackish, but it seems to try to
use A::A conversion ctor while I believe it should use the
A::A(const A &) copy ctor.

Another thing is in find_decomp_class_base - I think the current code
will just reject if the same base class with any non-static data members
appears as virtual base in more than one place in the bases tree,
not really sure what should be done, how to check if the paths to
that base are accessible etc.
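
For concreteness, the shape I have in mind (hedged example with
-std=c++1z, not from the testsuite) is something like:

struct B { int i; };
struct M1 : virtual B { };
struct M2 : virtual B { };
struct D : M1, M2 { };

void
f ()
{
  D d;
  auto [ i ] = d;   /* arguably fine: there is only one B subobject,
                       even though it is reachable along two paths */
}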

Another thing is in find_decomp_class_base caller, while
that function can return some base class if the only non-static data
members are in such a base and not in the derived class(es), the caller
isn't prepared to expand that, not sure if we need build_base_path or what
to actually generate the COMPONENT_REFs for the artificial FIELD_DECLs
or what.

Not sure about how the qualifiers from class fields and from the
cv-qualifiers on the decomposition declaration should be treated, at the
moment I'm oring them in.

The match.pd hunk is needed, otherwise the generic folding happily folds
int arr[2];
...
auto [ x, y ] = arr;
&x == &arr[0]
into 0, because it thinks x and arr are distinct VAR_DECLs.  Though, if
such comparisons are required to be folded in constexpr contexts under
certain conditions, we'd need to handle the DECL_VALUE_EXPRs in constexpr.c
somehow.

Mangling of the decomposition declaration base decls at namespace scope
isn't implemented; I think the DECL_ASSEMBLER_NAME for them (they have
NULL DECL_NAME) could be done in cp_finish_decomp, where we have all the
corresponding identifiers available, but it would probably need to be done
through direct calls to the demangler, as the base decl isn't all that is
needed for that.

Per IRC discussions, the associated VAR_DECLs are chained through
DECL_CHAIN.  It isn't that easy though, because decls are pushed into the
bindings in reverse order and afterwards nreversed.  Plus at least for
the (not yet implemented) std::tuple_size<T>::value stuff where the
VAR_DECLs shouldn't have DECL_VALUE_EXPR, the desired final order is
that the nameless artificial base decl comes first and then the other
decls in the order they appear.  But that means that at cp_finish_decl
time of the base decl they are actually in the order y, x, D.1234
for tha above auto [ x, y ] = arr; - so the finalizing of those is done
by separate cp_finish_decomp that needs to know the start of the chain
(the last named variable of the decomposition) plus the base artificial
decl and count.  For range for it uses the DECL_VALUE_EXPRs to find
those, maybe it would be better to pass around during parsing not just
address of a tree (the range decl), but address of a triplet (range decl, first
decomp decl in the chain (i.e. last one) and count, so that range for
parsing doesn't have to rediscover those.  The base decl has the
DECL_DECOMPOSITION_P flag on it set and NULL_TREE DECL_NAME, the
instantiation code in the patch doesn't parse DECL_VALUE_EXPRs though,
because if the decomp initializer is not type dependent, those
DECL_VALUE_EXPRs aren't actually in a standardized form that could be
easily parseable, so it just assumes those named decls have also
DECL_DECOMPOSITION_P flag on and non-NULL DECL_NAME and follow
the base decl in DECL_CHAIN (the new ones created with tsubst_expr
in reverse order again).  If nothing attempts to fold stuff in templates,
perhaps we could avoid setting DECL_VALUE_EXPRs at all when
processing_template_decl?

--- gcc/match.pd.jj 2016-11-07 18:32:56.0 +0100
+++ gcc/match.pd2016-11-08 14:00:05.391773322 +0100
@@ -2547,8 +2547,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (with
  {
int equal = 2;
-   if (decl_in_symtab_p (base0)
-  && decl_in_symtab_p (base1))
+   /* Punt in GENERIC on variables with value expressions;
+ the value expressions might point to 

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-09 Thread Yuri Rumyantsev
I am familiar with the SVE extension and understand that the implemented
approach might not be suitable for ARM. The main point is that only
load/store instructions are masked but all other calculations are not
(we did a special conversion for reduction statements to implement
merging predication semantics). For SVE, peeling for niters is not
required, but that is not true for x86 - we must determine which
vectorization scheme is more profitable: loop combining (the only one
essential for SVE) or separate epilogue vectorization using masking or a
smaller vectorization factor. So I'd like to have the full list of
required changes to our implementation to try to remove them.
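
To make the scheme concrete, here is a hand-written sketch of what a
main loop plus masked epilogue looks like for a simple kernel (my
illustration using AVX-512 intrinsics, not output of the patch; note
that only the loads/stores are masked, the add is not):

#include <immintrin.h>

void
add_arrays (int *a, const int *b, const int *c, int n)
{
  int i = 0;
  /* Main loop: full vectors, no masks.  */
  for (; i + 16 <= n; i += 16)
    {
      __m512i vb = _mm512_loadu_si512 ((const __m512i *) (b + i));
      __m512i vc = _mm512_loadu_si512 ((const __m512i *) (c + i));
      _mm512_storeu_si512 ((__m512i *) (a + i), _mm512_add_epi32 (vb, vc));
    }
  /* Epilogue: one masked vector iteration for the remaining
     n - i < 16 elements.  */
  if (i < n)
    {
      __mmask16 k = (__mmask16) ((1u << (n - i)) - 1);
      __m512i vb = _mm512_maskz_loadu_epi32 (k, b + i);
      __m512i vc = _mm512_maskz_loadu_epi32 (k, c + i);
      _mm512_mask_storeu_epi32 (a + i, k, _mm512_add_epi32 (vb, vc));
    }
}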

Thanks.
Yuri.

2016-11-09 14:46 GMT+03:00 Bin.Cheng :
> On Wed, Nov 9, 2016 at 11:28 AM, Yuri Rumyantsev  wrote:
>> Thanks Richard for your comments.
>> Your proposed to handle epilogue loop just like normal short-trip loop
>> but this is not exactly truth since e.g. epilogue must not be peeled
>> for alignment.
> Yes there must be some differences, my motivation is to minimize that
> so we don't need to specially check normal/epilogue loops at too many
> places.
> Of course it's just my feeling when going through the patch set, and
> could be wrong.
>
> Thanks,
> bin
>>
>> It is not clear for me what are my next steps? Should I re-design the
>> patch completely or simply decompose the whole patch to different
>> parts? But it means that we must start review process from beginning
>> but release is closed to its end.
>> Note also that i work for Intel till the end of year and have not idea
>> who will continue working on this project.
>>
>> Any help will be appreciated.
>>
>> Thanks.
>> Yuri.
>>
>> 2016-11-09 13:37 GMT+03:00 Bin.Cheng :
>>> On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev  wrote:
 Hi All,

 I re-send all patches sent by Ilya earlier for review which support
 vectorization of loop epilogues and loops with low trip count. We
 assume that the only patch - vec-tails-07-combine-tail.patch - was not
 approved by Jeff.

 I did re-base of all patches and performed bootstrapping and
 regression testing that did not show any new failures. Also all
 changes related to new vect_do_peeling algorithm have been changed
 accordingly.

 Is it OK for trunk?
>>>
>>> Hi,
>>> I can't approve patches, but had some comments after going through the
>>> implementation.
>>>
>>> One confusing part is cost model change, as well as the way it's used
>>> to decide how epilogue loop should be vectorized.  Given vect-tail is
>>> disabled at the moment and the cost change needs further tuning, is it
>>> reasonable to split this part out and get vectorization part
>>> reviewed/submitted independently?  For example, let user specified
>>> parameters make the decision for now.  Cost and target dependent
>>> changes should go in at last, this could make the patch easier to
>>> read.
>>>
>>> The implementation computes/shares quite amount information from main
>>> loop to epilogue loop vectorization.  Furthermore, variables/fields
>>> for such information are somehow named in a misleading way.  For
>>> example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the
>>> flag controlling whether epilogue loop should be vectorized with
>>> masking.  However, it's actually controlled by exactly the same flag
>>> as whether epilogue loop should be combined into the main loop with
>>> masking:
>>> @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo)
>>>
>>>slpeel_make_loop_iterate_ntimes (loop, niters_vector);
>>>
>>> +  if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
>>> +vect_combine_loop_epilogue (loop_vinfo);
>>> +
>>>/* Reduce loop iterations by the vectorization factor.  */
>>>scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf),
>>>expected_iterations / vf);
>>>
>>> IMHO, we should decouple main loop vectorization and epilogue
>>> vectorization as much as possible by sharing as few information as we
>>> can.  The general idea is to handle epilogue loop just like normal
>>> short-trip loop.  For example, we can rename
>>> LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something
>>> else), and we don't differentiate its meaning between main and
>>> epilogue(short-trip) loop.  It only indicates the current loop should
>>> be vectorized with masking no matter it's a main loop or epilogue
>>> loop, and it works just like the current implementation.
>>>
>>> After this change, we can refine vectorization and make it more
>>> general for normal loop and epilogue(short trip) loop.  For example,
>>> this implementation sets LOOP_VINFO_PEELING_FOR_NITER  for epilogue
>>> loop and use it to control how it should be vectorized:
>>> +  if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
>>> +{
>>> +  LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false;
>>> +  LOOP_VINFO_COMBINE_EPILOGUE 

Re: [PATCH] DECL_RTL and DECL_RTL_INCOMING in RTL dumps

2016-11-09 Thread Bernd Schmidt

On 11/08/2016 07:03 PM, David Malcolm wrote:

int __RTL("rtl-combine") f1 (int n)
{
(function "f1"
  (param "n"
(DECL_RTL
  (reg/v:SI %1 [ n ])
) ;; DECL_RTL


The ;; DECL_RTL etc. comments seem somewhat redundant and add clutter. 
Please remove those.


Also, why is the closing paren on its own line? That doesn't seem right.

Later (not for this patch) I'd really like to see some logic not to add 
linebreaks before simple expressions, so that we'd have (DECL_RTL 
(reg:SI xyz)) on a single line.



Bernd



Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-09 Thread Bin.Cheng
On Wed, Nov 9, 2016 at 11:28 AM, Yuri Rumyantsev  wrote:
> Thanks Richard for your comments.
> Your proposed to handle epilogue loop just like normal short-trip loop
> but this is not exactly truth since e.g. epilogue must not be peeled
> for alignment.
Yes, there must be some differences; my motivation is to minimize them
so we don't need to specially check normal/epilogue loops in too many
places.
Of course it's just my feeling from going through the patch set, and it
could be wrong.

Thanks,
bin
>
> It is not clear for me what are my next steps? Should I re-design the
> patch completely or simply decompose the whole patch to different
> parts? But it means that we must start review process from beginning
> but release is closed to its end.
> Note also that i work for Intel till the end of year and have not idea
> who will continue working on this project.
>
> Any help will be appreciated.
>
> Thanks.
> Yuri.
>
> 2016-11-09 13:37 GMT+03:00 Bin.Cheng :
>> On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev  wrote:
>>> Hi All,
>>>
>>> I re-send all patches sent by Ilya earlier for review which support
>>> vectorization of loop epilogues and loops with low trip count. We
>>> assume that the only patch - vec-tails-07-combine-tail.patch - was not
>>> approved by Jeff.
>>>
>>> I did re-base of all patches and performed bootstrapping and
>>> regression testing that did not show any new failures. Also all
>>> changes related to new vect_do_peeling algorithm have been changed
>>> accordingly.
>>>
>>> Is it OK for trunk?
>>
>> Hi,
>> I can't approve patches, but had some comments after going through the
>> implementation.
>>
>> One confusing part is cost model change, as well as the way it's used
>> to decide how epilogue loop should be vectorized.  Given vect-tail is
>> disabled at the moment and the cost change needs further tuning, is it
>> reasonable to split this part out and get vectorization part
>> reviewed/submitted independently?  For example, let user specified
>> parameters make the decision for now.  Cost and target dependent
>> changes should go in at last, this could make the patch easier to
>> read.
>>
>> The implementation computes/shares quite amount information from main
>> loop to epilogue loop vectorization.  Furthermore, variables/fields
>> for such information are somehow named in a misleading way.  For
>> example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the
>> flag controlling whether epilogue loop should be vectorized with
>> masking.  However, it's actually controlled by exactly the same flag
>> as whether epilogue loop should be combined into the main loop with
>> masking:
>> @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo)
>>
>>slpeel_make_loop_iterate_ntimes (loop, niters_vector);
>>
>> +  if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
>> +vect_combine_loop_epilogue (loop_vinfo);
>> +
>>/* Reduce loop iterations by the vectorization factor.  */
>>scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf),
>>expected_iterations / vf);
>>
>> IMHO, we should decouple main loop vectorization and epilogue
>> vectorization as much as possible by sharing as few information as we
>> can.  The general idea is to handle epilogue loop just like normal
>> short-trip loop.  For example, we can rename
>> LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something
>> else), and we don't differentiate its meaning between main and
>> epilogue(short-trip) loop.  It only indicates the current loop should
>> be vectorized with masking no matter it's a main loop or epilogue
>> loop, and it works just like the current implementation.
>>
>> After this change, we can refine vectorization and make it more
>> general for normal loop and epilogue(short trip) loop.  For example,
>> this implementation sets LOOP_VINFO_PEELING_FOR_NITER  for epilogue
>> loop and use it to control how it should be vectorized:
>> +  if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
>> +{
>> +  LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false;
>> +  LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false;
>> +}
>> +  else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
>> +   && min_profitable_combine_iters >= 0)
>> +{
>>
>> This works, but not that good for understanding or maintaining.
>>
>> Thanks,
>> bin


Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-09 Thread Yuri Rumyantsev
Thanks Richard for your comments.
You proposed to handle the epilogue loop just like a normal short-trip loop,
but this is not exactly true since e.g. the epilogue must not be peeled
for alignment.

It is not clear to me what my next steps are. Should I re-design the
patch completely or simply decompose the whole patch into different
parts? But that means we must start the review process from the beginning,
and the release is close to its end.
Note also that I work for Intel till the end of the year and have no idea
who will continue working on this project.

Any help will be appreciated.

Thanks.
Yuri.

2016-11-09 13:37 GMT+03:00 Bin.Cheng :
> On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev  wrote:
>> Hi All,
>>
>> I re-send all patches sent by Ilya earlier for review which support
>> vectorization of loop epilogues and loops with low trip count. We
>> assume that the only patch - vec-tails-07-combine-tail.patch - was not
>> approved by Jeff.
>>
>> I did re-base of all patches and performed bootstrapping and
>> regression testing that did not show any new failures. Also all
>> changes related to new vect_do_peeling algorithm have been changed
>> accordingly.
>>
>> Is it OK for trunk?
>
> Hi,
> I can't approve patches, but had some comments after going through the
> implementation.
>
> One confusing part is cost model change, as well as the way it's used
> to decide how epilogue loop should be vectorized.  Given vect-tail is
> disabled at the moment and the cost change needs further tuning, is it
> reasonable to split this part out and get vectorization part
> reviewed/submitted independently?  For example, let user specified
> parameters make the decision for now.  Cost and target dependent
> changes should go in at last, this could make the patch easier to
> read.
>
> The implementation computes/shares quite amount information from main
> loop to epilogue loop vectorization.  Furthermore, variables/fields
> for such information are somehow named in a misleading way.  For
> example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the
> flag controlling whether epilogue loop should be vectorized with
> masking.  However, it's actually controlled by exactly the same flag
> as whether epilogue loop should be combined into the main loop with
> masking:
> @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo)
>
>slpeel_make_loop_iterate_ntimes (loop, niters_vector);
>
> +  if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
> +vect_combine_loop_epilogue (loop_vinfo);
> +
>/* Reduce loop iterations by the vectorization factor.  */
>scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf),
>expected_iterations / vf);
>
> IMHO, we should decouple main loop vectorization and epilogue
> vectorization as much as possible by sharing as few information as we
> can.  The general idea is to handle epilogue loop just like normal
> short-trip loop.  For example, we can rename
> LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something
> else), and we don't differentiate its meaning between main and
> epilogue(short-trip) loop.  It only indicates the current loop should
> be vectorized with masking no matter it's a main loop or epilogue
> loop, and it works just like the current implementation.
>
> After this change, we can refine vectorization and make it more
> general for normal loop and epilogue(short trip) loop.  For example,
> this implementation sets LOOP_VINFO_PEELING_FOR_NITER  for epilogue
> loop and use it to control how it should be vectorized:
> +  if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
> +{
> +  LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false;
> +  LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false;
> +}
> +  else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
> +   && min_profitable_combine_iters >= 0)
> +{
>
> This works, but not that good for understanding or maintaining.
>
> Thanks,
> bin


[PATCH] Remove redundant canonicalize_comparison from DOM

2016-11-09 Thread Richard Biener

DOM currently canonicalizes GIMPLE_COND operand order.  But it does
so before constant/copy propagation into that stmt (so quite pointless)
and furthermore we fold the stmt if we constant/copy propagated
into it and fold already does the canonicalization.

Thus, removed.
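
For reference, a trivial example of the canonicalization in question;
fold already rewrites the constant-first form into canonical operand
order before DOM ever sees it:

int
cmp_canon (int x)
{
  return 5 < x;   /* canonicalized to x > 5 by fold */
}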

Bootstrap & regtest pending on x86_64-unknown-linux-gnu.

Richard.

2016-11-09  Richard Biener  

* tree-ssa-dom.c (canonicalize_comparison): Remove.
(optimize_stmt): Remove redundant pre-propagation canonicalization
of comparison operand order.

Index: gcc/tree-ssa-dom.c
===
--- gcc/tree-ssa-dom.c  (revision 241992)
+++ gcc/tree-ssa-dom.c  (working copy)
@@ -753,49 +753,6 @@ make_pass_dominator (gcc::context *ctxt)
 }
 
 
-/* Given a conditional statement CONDSTMT, convert the
-   condition to a canonical form.  */
-
-static void
-canonicalize_comparison (gcond *condstmt)
-{
-  tree op0;
-  tree op1;
-  enum tree_code code;
-
-  gcc_assert (gimple_code (condstmt) == GIMPLE_COND);
-
-  op0 = gimple_cond_lhs (condstmt);
-  op1 = gimple_cond_rhs (condstmt);
-
-  code = gimple_cond_code (condstmt);
-
-  /* If it would be profitable to swap the operands, then do so to
- canonicalize the statement, enabling better optimization.
-
- By placing canonicalization of such expressions here we
- transparently keep statements in canonical form, even
- when the statement is modified.  */
-  if (tree_swap_operands_p (op0, op1, false))
-{
-  /* For relationals we need to swap the operands
-and change the code.  */
-  if (code == LT_EXPR
- || code == GT_EXPR
- || code == LE_EXPR
- || code == GE_EXPR)
-   {
-  code = swap_tree_comparison (code);
-
-  gimple_cond_set_code (condstmt, code);
-  gimple_cond_set_lhs (condstmt, op1);
-  gimple_cond_set_rhs (condstmt, op0);
-
-  update_stmt (condstmt);
-   }
-}
-}
-
 /* A trivial wrapper so that we can present the generic jump
threading code with a simple API for simplifying statements.  */
 static tree
@@ -1789,9 +1746,6 @@ optimize_stmt (basic_block bb, gimple_st
   print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
 }
 
-  if (gimple_code (stmt) == GIMPLE_COND)
-canonicalize_comparison (as_a  (stmt));
-
   update_stmt_if_modified (stmt);
   opt_stats.num_stmts++;
 


[PATCH, GCC/testsuite] Fix PR78269: noexcept-type9.C should be a compile test

2016-11-09 Thread Thomas Preudhomme
g++.dg/cpp1z/noexcept-type9.C contains a dg-error but is currently a runtime 
test. The intent was clearly to make it a compile test to test that error, 
especially since there is no main. This patch changes it to a compile test.


ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2016-11-09  Thomas Preud'homme  

PR testsuite/78269
* g++.dg/cpp1z/noexcept-type9.C: Make it a compile test.


Applied as obvious.

Best regards,

Thomas
diff --git a/gcc/testsuite/g++.dg/cpp1z/noexcept-type9.C b/gcc/testsuite/g++.dg/cpp1z/noexcept-type9.C
index 4547c4cc4d3c2bf3f7e049b4e6c1179b85982a62..a29618a3001266b003ab966daa80ca5e8d5020da 100644
--- a/gcc/testsuite/g++.dg/cpp1z/noexcept-type9.C
+++ b/gcc/testsuite/g++.dg/cpp1z/noexcept-type9.C
@@ -1,6 +1,6 @@
 // Test for PMF template args.
 // { dg-options -std=c++1z }
-// { dg-do run }
+// { dg-do compile }
 
 struct A
 {


[PATCH] Remove flag_evaluation_order (Java remains)

2016-11-09 Thread Richard Biener

The Java FE was the only thing setting flag_evaluation_order to nonzero,
thus the following removes that flag and adjusts code checking it
(a followup will remove the now unused arg from tree_swap_operands_p).

Bootstrap / regtest in progress on x86_64-unknown-linux-gnu.

Richard.

2016-11-09  Richard Biener  

* common.opt (flag_evaluation_order): Remove.
* expr.c (expand_operands): Remove code guarded by
flag_evaluation_order.
* fold-const.c (reorder_operands_p): Remove, it always returns
true.
(negate_expr_p): Remove calls to reorder_operands_p.
(fold_negate_expr): Likewise.
(tree_swap_operands_p): Likewise.
(fold_binary_loc): Likewise.

Index: gcc/common.opt
===
--- gcc/common.opt  (revision 241992)
+++ gcc/common.opt  (working copy)
@@ -58,10 +58,6 @@ int flag_incremental_link = 0
 Variable
 int flag_complex_method = 1
 
-; Nonzero if subexpressions must be evaluated from left-to-right.
-Variable
-int flag_evaluation_order = 0
-
 ; Language specific warning pass for unused results.
 Variable
 bool flag_warn_unused_result = false
Index: gcc/expr.c
===
--- gcc/expr.c  (revision 241992)
+++ gcc/expr.c  (working copy)
@@ -7681,10 +7682,6 @@ expand_operands (tree exp0, tree exp1, r
 }
   else
 {
-  /* If we need to preserve evaluation order, copy exp0 into its own
-temporary variable so that it can't be clobbered by exp1.  */
-  if (flag_evaluation_order && TREE_SIDE_EFFECTS (exp1))
-   exp0 = save_expr (exp0);
   *op0 = expand_expr (exp0, target, VOIDmode, modifier);
   *op1 = expand_expr (exp1, NULL_RTX, VOIDmode, modifier);
 }
Index: gcc/fold-const.c
===
--- gcc/fold-const.c(revision 241992)
+++ gcc/fold-const.c(working copy)
@@ -133,7 +133,6 @@ static tree fold_binary_op_with_conditio
 tree, tree,
 tree, tree, int);
 static tree fold_div_compare (location_t, enum tree_code, tree, tree, tree);
-static bool reorder_operands_p (const_tree, const_tree);
 static tree fold_negate_const (tree, tree);
 static tree fold_not_const (const_tree, tree);
 static tree fold_relational_const (enum tree_code, tree, tree, tree);
@@ -435,9 +434,7 @@ negate_expr_p (tree t)
  && ! TYPE_OVERFLOW_WRAPS (type)))
return false;
   /* -(A + B) -> (-B) - A.  */
-  if (negate_expr_p (TREE_OPERAND (t, 1))
- && reorder_operands_p (TREE_OPERAND (t, 0),
-TREE_OPERAND (t, 1)))
+  if (negate_expr_p (TREE_OPERAND (t, 1)))
return true;
   /* -(A + B) -> (-A) - B.  */
   return negate_expr_p (TREE_OPERAND (t, 0));
@@ -447,9 +444,7 @@ negate_expr_p (tree t)
   return !HONOR_SIGN_DEPENDENT_ROUNDING (element_mode (type))
 && !HONOR_SIGNED_ZEROS (element_mode (type))
 && (! INTEGRAL_TYPE_P (type)
-|| TYPE_OVERFLOW_WRAPS (type))
-&& reorder_operands_p (TREE_OPERAND (t, 0),
-   TREE_OPERAND (t, 1));
+|| TYPE_OVERFLOW_WRAPS (type));
 
 case MULT_EXPR:
   if (TYPE_UNSIGNED (type))
@@ -606,9 +601,7 @@ fold_negate_expr (location_t loc, tree t
  && !HONOR_SIGNED_ZEROS (element_mode (type)))
{
  /* -(A + B) -> (-B) - A.  */
- if (negate_expr_p (TREE_OPERAND (t, 1))
- && reorder_operands_p (TREE_OPERAND (t, 0),
-TREE_OPERAND (t, 1)))
+ if (negate_expr_p (TREE_OPERAND (t, 1)))
{
  tem = negate_expr (TREE_OPERAND (t, 1));
  return fold_build2_loc (loc, MINUS_EXPR, type,
@@ -628,8 +621,7 @@ fold_negate_expr (location_t loc, tree t
 case MINUS_EXPR:
   /* - (A - B) -> B - A  */
   if (!HONOR_SIGN_DEPENDENT_ROUNDING (element_mode (type))
- && !HONOR_SIGNED_ZEROS (element_mode (type))
- && reorder_operands_p (TREE_OPERAND (t, 0), TREE_OPERAND (t, 1)))
+ && !HONOR_SIGNED_ZEROS (element_mode (type)))
return fold_build2_loc (loc, MINUS_EXPR, type,
TREE_OPERAND (t, 1), TREE_OPERAND (t, 0));
   break;
@@ -6761,27 +6753,12 @@ fold_single_bit_test (location_t loc, en
   return NULL_TREE;
 }
 
-/* Check whether we are allowed to reorder operands arg0 and arg1,
-   such that the evaluation of arg1 occurs before arg0.  */
-
-static bool
-reorder_operands_p (const_tree arg0, const_tree arg1)
-{
-  if (! flag_evaluation_order)
-  return true;
-  if (TREE_CONSTANT (arg0) || TREE_CONSTANT (arg1))
-return true;
-  return ! TREE_SIDE_EFFECTS (arg0)
-&& ! TREE_SIDE_EFFECTS (arg1);
-}
-
 /* Test whether it is preferable two swap two 

Re: Prevent aliasing between arguments in calls to move_alloc

2016-11-09 Thread Paul Richard Thomas
Hi Steve,

Committed as r241995. Thanks.

Paul

On 8 November 2016 at 20:43, Steve Kargl
 wrote:
> Yes.  I saw Ian's analysis in c.l.f.  It seems we both got
> caught out on this one.  The patch looks fine.
>
> --
> steve
>
> On Tue, Nov 08, 2016 at 08:26:37PM +0100, Paul Richard Thomas wrote:
>> Hi Steve,
>>
>> I moved too quickly and caused a regression. See the link in the
>> testcase. The attached fixes the problem and bootstraps/regtests.
>>
>> OK for trunk?
>>
>> Paul
>>
>>
>> On 5 November 2016 at 16:17, Steve Kargl
>>  wrote:
>> > On Sat, Nov 05, 2016 at 10:05:30AM +0100, Paul Richard Thomas wrote:
>> >>
>> >> Bootstraps and regtests on FC21/x86_64 - OK for trunk?
>> >
>> > OK with minor nit (see below).
>> >
>> >>
>> >> +   /*  F2003 12.4.1.7  */
>> >> +   if (to->expr_type == EXPR_VARIABLE && from->expr_type ==EXPR_VARIABLE
>> >
>> > Need a space after ==.
>> >
>> > --
>> > Steve
>>
>>
>>
>> --
>> The difference between genius and stupidity is; genius has its limits.
>>
>> Albert Einstein
>
>> Index: gcc/fortran/check.c
>> ===
>> *** gcc/fortran/check.c   (revision 241872)
>> --- gcc/fortran/check.c   (working copy)
>> *** gfc_check_move_alloc (gfc_expr *from, gf
>> *** 3343,3355 
>>   }
>>
>> /*  F2003 12.4.1.7  */
>> !   if (to->expr_type == EXPR_VARIABLE && from->expr_type ==EXPR_VARIABLE
>> && !strcmp (to->symtree->n.sym->name, from->symtree->n.sym->name))
>>   {
>> !   gfc_error ("The FROM and TO arguments at %L are either the same 
>> object "
>> !  "or subobjects thereof and so violate aliasing restrictions "
>> !  "(F2003 12.4.1.7)", >where);
>> !   return false;
>>   }
>>
>> /* CLASS arguments: Make sure the vtab of from is present.  */
>> --- 3343,3380 
>>   }
>>
>> /*  F2003 12.4.1.7  */
>> !   if (to->expr_type == EXPR_VARIABLE && from->expr_type == EXPR_VARIABLE
>> && !strcmp (to->symtree->n.sym->name, from->symtree->n.sym->name))
>>   {
>> !   gfc_ref *to_ref, *from_ref;
>> !   to_ref = to->ref;
>> !   from_ref = from->ref;
>> !   bool aliasing = true;
>> !
>> !   for (; from_ref && to_ref;
>> !from_ref = from_ref->next, to_ref = to_ref->next)
>> ! {
>> !   if (to_ref->type != from->ref->type)
>> ! aliasing = false;
>> !   else if (to_ref->type == REF_ARRAY
>> !&& to_ref->u.ar.type != AR_FULL
>> !&& from_ref->u.ar.type != AR_FULL)
>> ! /* Play safe; assume sections and elements are different.  */
>> ! aliasing = false;
>> !   else if (to_ref->type == REF_COMPONENT
>> !&& to_ref->u.c.component != from_ref->u.c.component)
>> ! aliasing = false;
>> !
>> !   if (!aliasing)
>> ! break;
>> ! }
>> !
>> !   if (aliasing)
>> ! {
>> !   gfc_error ("The FROM and TO arguments at %L violate aliasing "
>> !  "restrictions (F2003 12.4.1.7)", >where);
>> !   return false;
>> ! }
>>   }
>>
>> /* CLASS arguments: Make sure the vtab of from is present.  */
>> Index: gcc/testsuite/gfortran.dg/move_alloc_18.f90
>> ===
>> *** gcc/testsuite/gfortran.dg/move_alloc_18.f90   (revision 0)
>> --- gcc/testsuite/gfortran.dg/move_alloc_18.f90   (working copy)
>> ***
>> *** 0 
>> --- 1,21 
>> + ! { dg-do compile }
>> + !
>> + ! Test that the anti-aliasing restriction does not knock out valid code.
>> + !
>> + ! Contributed by  Andrew Balwin on
>> + ! https://groups.google.com/forum/#!topic/comp.lang.fortran/oiXdl1LPb_s
>> + !
>> +   PROGRAM TEST
>> + IMPLICIT NONE
>> +
>> + TYPE FOOBAR
>> +   INTEGER, ALLOCATABLE :: COMP(:)
>> + END TYPE
>> +
>> + TYPE (FOOBAR) :: MY_ARRAY(6)
>> +
>> + ALLOCATE (MY_ARRAY(1)%COMP(10))
>> +
>> + CALL MOVE_ALLOC (MY_ARRAY(1)%COMP, MY_ARRAY(2)%COMP)
>> +
>> +   END PROGRAM TEST
>
>
> --
> Steve



-- 
The difference between genius and stupidity is; genius has its limits.

Albert Einstein


Reject out-of-range bit pos in bit-fields insns operating on a register.

2016-11-09 Thread Andreas Schwab
As seen by the testcase in PR77822, combine can generate out-of-range
bit pos in a bit-field insn, unless the pattern explicitly rejects it.
This only makes a difference for expressions that are undefined at
runtime.  Without that we would either generate bad assembler or ICE in
output_btst.

PR target/78254
* config/m68k/m68k.md: Reject out-of-range bit pos in bit-fields
insns operating on a register.

Index: gcc/config/m68k/m68k.md
===
--- gcc/config/m68k/m68k.md (revision 241995)
+++ gcc/config/m68k/m68k.md (working copy)
@@ -706,7 +706,7 @@
   (minus:SI (const_int 31)
 (match_operand:SI 1 "general_operand" 
"di")))
 (const_int 0)))]
-  ""
+  "!(CONST_INT_P (operands[1]) && !IN_RANGE (INTVAL (operands[1]), 0, 31))"
 {
   return output_btst (operands, operands[1], operands[0], insn, 31);
 })
@@ -765,9 +765,10 @@
 (cc0)
 (compare (zero_extract:SI (match_operand:SI 0 "register_operand" "do")
  (const_int 1)
-   (match_operand:SI 1 "const_int_operand" "n"))
+ (match_operand:SI 1 "const_int_operand" "n"))
 (const_int 0)))]
-  "!TARGET_COLDFIRE"
+  "!TARGET_COLDFIRE
+   && !(REG_P (operands[0]) && !IN_RANGE (INTVAL (operands[1]), 0, 31))"
 {
   if (GET_CODE (operands[0]) == MEM)
 {
@@ -790,7 +791,8 @@
  (const_int 1)
  (match_operand:SI 1 "const_int_operand" "n"))
 (const_int 0)))]
-  "TARGET_COLDFIRE"
+  "TARGET_COLDFIRE
+   && !(REG_P (operands[0]) && !IN_RANGE (INTVAL (operands[1]), 0, 31))"
 {
   if (GET_CODE (operands[0]) == MEM)
 {
@@ -5397,6 +5399,7 @@
 (match_operand:SI 2 "const_int_operand" "n"))
(match_operand:SI 3 "register_operand" "d"))]
   "TARGET_68020 && TARGET_BITFIELD
+   && IN_RANGE (INTVAL (operands[2]), 0, 31)
&& (INTVAL (operands[1]) == 8 || INTVAL (operands[1]) == 16)
&& INTVAL (operands[2]) % INTVAL (operands[1]) == 0"
 {
@@ -5438,6 +5441,7 @@
 (match_operand:SI 2 "const_int_operand" "n")
 (match_operand:SI 3 "const_int_operand" "n")))]
   "TARGET_68020 && TARGET_BITFIELD
+   && IN_RANGE (INTVAL (operands[3]), 0, 31)
&& (INTVAL (operands[2]) == 8 || INTVAL (operands[2]) == 16)
&& INTVAL (operands[3]) % INTVAL (operands[2]) == 0"
 {
@@ -5480,6 +5484,7 @@
 (match_operand:SI 2 "const_int_operand" "n")
 (match_operand:SI 3 "const_int_operand" "n")))]
   "TARGET_68020 && TARGET_BITFIELD
+   && IN_RANGE (INTVAL (operands[3]), 0, 31)
&& (INTVAL (operands[2]) == 8 || INTVAL (operands[2]) == 16)
&& INTVAL (operands[3]) % INTVAL (operands[2]) == 0"
 {
@@ -5610,7 +5615,7 @@
(sign_extract:SI (match_operand:SI 1 "register_operand" "d")
 (match_operand:SI 2 "const_int_operand" "n")
 (match_operand:SI 3 "const_int_operand" "n")))]
-  "TARGET_68020 && TARGET_BITFIELD"
+  "TARGET_68020 && TARGET_BITFIELD && IN_RANGE (INTVAL (operands[3]), 0, 31)"
   "bfexts %1{%b3:%b2},%0")
 
 (define_insn "*extv_bfextu_reg"
@@ -5618,7 +5623,7 @@
(zero_extract:SI (match_operand:SI 1 "register_operand" "d")
 (match_operand:SI 2 "const_int_operand" "n")
 (match_operand:SI 3 "const_int_operand" "n")))]
-  "TARGET_68020 && TARGET_BITFIELD"
+  "TARGET_68020 && TARGET_BITFIELD && IN_RANGE (INTVAL (operands[3]), 0, 31)"
 {
   if (GET_CODE (operands[2]) == CONST_INT)
 {
@@ -5637,7 +5642,7 @@
 (match_operand:SI 1 "const_int_operand" "n")
 (match_operand:SI 2 "const_int_operand" "n"))
(const_int 0))]
-  "TARGET_68020 && TARGET_BITFIELD"
+  "TARGET_68020 && TARGET_BITFIELD && IN_RANGE (INTVAL (operands[2]), 0, 31)"
 {
   CC_STATUS_INIT;
   return "bfclr %0{%b2:%b1}";
@@ -5648,7 +5653,7 @@
 (match_operand:SI 1 "const_int_operand" "n")
 (match_operand:SI 2 "const_int_operand" "n"))
(const_int -1))]
-  "TARGET_68020 && TARGET_BITFIELD"
+  "TARGET_68020 && TARGET_BITFIELD && IN_RANGE (INTVAL (operands[2]), 0, 31)"
 {
   CC_STATUS_INIT;
   return "bfset %0{%b2:%b1}";
@@ -5659,7 +5664,7 @@
 (match_operand:SI 1 "const_int_operand" "n")
 (match_operand:SI 2 "const_int_operand" "n"))
(match_operand:SI 3 "register_operand" "d"))]
-  "TARGET_68020 && TARGET_BITFIELD"
+  "TARGET_68020 && TARGET_BITFIELD && IN_RANGE (INTVAL (operands[2]), 0, 31)"
 {
 #if 0
   /* These special cases are now recognized by a specific pattern.  */
@@ -5707,7 +5712,8 @@
  (match_operand:SI 1 "const_int_operand" "n")
  

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-09 Thread Bin.Cheng
On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev  wrote:
> Hi All,
>
> I am re-sending all the patches sent by Ilya earlier for review, which
> support vectorization of loop epilogues and loops with a low trip count.
> We assume that the only patch not approved by Jeff was
> vec-tails-07-combine-tail.patch.
>
> I re-based all patches and performed bootstrapping and regression
> testing, which did not show any new failures.  Also, all changes related
> to the new vect_do_peeling algorithm have been adjusted accordingly.
>
> Is it OK for trunk?

Hi,
I can't approve patches, but had some comments after going through the
implementation.

One confusing part is the cost model change, as well as the way it's used
to decide how the epilogue loop should be vectorized.  Given that vect-tails
is disabled at the moment and the cost change needs further tuning, is it
reasonable to split this part out and get the vectorization part
reviewed/submitted independently?  For example, let user-specified
parameters make the decision for now.  Cost and target-dependent changes
could go in last; this would make the patch easier to read.

The implementation computes/shares quite a lot of information from the main
loop's vectorization to the epilogue loop's vectorization.  Furthermore, the
variables/fields holding such information are named in a somewhat misleading
way.  For example, LOOP_VINFO_MASK_EPILOGUE gives me the impression that this
is the flag controlling whether the epilogue loop should be vectorized with
masking.  However, that is actually controlled by exactly the same flag as
whether the epilogue loop should be combined into the main loop with
masking:
@@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo)

   slpeel_make_loop_iterate_ntimes (loop, niters_vector);

+  if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
+vect_combine_loop_epilogue (loop_vinfo);
+
   /* Reduce loop iterations by the vectorization factor.  */
   scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf),
   expected_iterations / vf);

IMHO, we should decouple main loop vectorization and epilogue
vectorization as much as possible by sharing as little information as we
can.  The general idea is to handle the epilogue loop just like a normal
short-trip loop.  For example, we could rename
LOOP_VINFO_COMBINE_EPILOGUE to LOOP_VINFO_VECT_MASK (or something
else) and not differentiate its meaning between the main and the
epilogue (short-trip) loop.  It would only indicate that the current loop
should be vectorized with masking, no matter whether it's a main loop or
an epilogue loop, and it would work just like the current implementation.

After this change, we can refine vectorization and make it more
general for normal loops and epilogue (short-trip) loops.  For example,
this implementation sets LOOP_VINFO_PEELING_FOR_NITER for the epilogue
loop and uses it to control how it should be vectorized:
+  if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
+{
+  LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false;
+  LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false;
+}
+  else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
+   && min_profitable_combine_iters >= 0)
+{

This works, but it is not that easy to understand or maintain.
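
For what it's worth, here is a rough scalar-level sketch of the two schemes
discussed above.  It is purely illustrative and not taken from the patch; the
vector factor of 4 and the function names are made up:

/* Scheme A: vectorized main loop plus a separate short-trip epilogue
   loop for the remaining n % 4 iterations.  */
void
main_plus_epilogue (int *a, int n)
{
  int i = 0;
  for (; i + 4 <= n; i += 4)       /* main loop: whole vectors only */
    for (int lane = 0; lane < 4; lane++)
      a[i + lane] += 1;
  for (; i < n; i++)               /* epilogue: at most 3 iterations */
    a[i] += 1;
}

/* Scheme B: one masked loop; the final iteration disables the lanes
   that would run past n, so no separate epilogue is needed.  */
void
masked_combined (int *a, int n)
{
  for (int i = 0; i < n; i += 4)
    for (int lane = 0; lane < 4; lane++)
      if (i + lane < n)            /* the mask */
        a[i + lane] += 1;
}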

Thanks,
bin


Re: [PATCH 7/7, GCC, ARM, V8M] Added support for ARMV8-M Security Extension cmse_nonsecure_caller intrinsic

2016-11-09 Thread Kyrill Tkachov

Hi Andre,

On 25/10/16 17:30, Andre Vieira (lists) wrote:

On 24/08/16 12:01, Andre Vieira (lists) wrote:

On 25/07/16 14:28, Andre Vieira (lists) wrote:

This patch adds support for ARMv8-M's Security Extension's
cmse_nonsecure_caller intrinsic. This intrinsic is used to check whether
an entry function was called from a non-secure state.
See Section 5.4.3 of ARM®v8-M Security Extensions: Requirements on
Development Tools
(http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html)
for further details.

The FIXME in config/arm/arm_cmse.h is for a diagnostic message that is
suggested in the ARMv8-M Security Extensions document mentioned above,
to diagnose the use of the cmse_nonsecure_caller intrinsic outside of
functions with the 'cmse_nonsecure_entry' attribute.  Checking whether
the intrinsic is called from within such functions can easily be done
inside 'arm_expand_builtin'. However, making the warning point to the
right location is more complicated.  The ARMv8-M Security Extensions
specification does mention that such a diagnostic might become
mandatory, so I might have to pick this up later, otherwise it is left
as a potential extra feature.


*** gcc/ChangeLog ***
2016-07-25  Andre Vieira
 Thomas Preud'homme  

 * config/arm/arm-builtins.c (arm_builtins): Define
 ARM_BUILTIN_CMSE_NONSECURE_CALLER.
 (bdesc_2arg): Add line for cmse_nonsecure_caller.
 (arm_expand_builtin): Handle cmse_nonsecure_caller.
 * config/arm/arm_cmse.h (cmse_nonsecure_caller): New.

*** gcc/testsuite/ChangeLog ***
2016-07-25  Andre Vieira
 Thomas Preud'homme  

 * gcc.target/arm/cmse/cmse-1.c: Add test for
 cmse_nonsecure_caller.


Added more documentation as requested.

---

This patch adds support for ARMv8-M's Security Extension's
cmse_nonsecure_caller intrinsic. This intrinsic is used to check whether
an entry function was called from a non-secure state.
See Section 5.4.3 of ARM®v8-M Security Extensions: Requirements on
Development Tools
(http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html)
for further details.

The FIXME in config/arm/arm_cmse.h is for a diagnostic message that is
suggested in the ARMv8-M Security Extensions document mentioned above,
to diagnose the use of the cmse_nonsecure_caller intrinsic outside of
functions with the 'cmse_nonsecure_entry' attribute.  Checking whether
the intrinsic is called from within such functions can easily be done
inside 'arm_expand_builtin'. However, making the warning point to the
right location is more complicated.  The ARMv8-M Security Extensions
specification does mention that such a diagnostic might become
mandatory, so I might have to pick this up later, otherwise it is left
as a potential extra feature.
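
For illustration only (this is not part of the patch), a minimal sketch of how
the intrinsic is meant to be used from a secure entry function.  It assumes
arm_cmse.h and a build with the ARMv8-M Security Extensions enabled (e.g.
-mcmse); the function and variable names are made up:

#include <arm_cmse.h>

static int secret;

/* Entry function callable from the non-secure state.  */
int __attribute__ ((cmse_nonsecure_entry))
get_secret (void)
{
  /* Refuse to return the value if we were called from non-secure code.  */
  if (cmse_nonsecure_caller ())
    return -1;
  return secret;
}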


*** gcc/ChangeLog ***
2016-07-xx  Andre Vieira
 Thomas Preud'homme  

 * config/arm/arm-builtins.c (arm_builtins): Define
 ARM_BUILTIN_CMSE_NONSECURE_CALLER.
 (bdesc_2arg): Add line for cmse_nonsecure_caller.
 (arm_expand_builtin): Handle cmse_nonsecure_caller.
 * config/arm/arm_cmse.h (cmse_nonsecure_caller): New.
 * doc/extend.texi (ARM ARMv8-M Security Extensions): New intrinsic.

*** gcc/testsuite/ChangeLog ***
2016-07-xx  Andre Vieira
 Thomas Preud'homme  

 * gcc.target/arm/cmse/cmse-1.c: Add test for
 cmse_nonsecure_caller.


Hi,

Rebased previous patch on top of trunk as requested. No changes to
ChangeLog.

Cheers,
Andre


@@ -1832,6 +1834,17 @@ arm_init_builtins (void)
= add_builtin_function ("__builtin_arm_stfscr", ftype_set_fpscr,
ARM_BUILTIN_SET_FPSCR, BUILT_IN_MD, NULL, 
NULL_TREE);
 }
+
+  if (arm_arch_cmse)
+{
+  tree ftype_cmse_nonsecure_caller
+   = build_function_ty

Should this be use_cmse ?
This looks ok to me otherwise.
I believe patch [6/7] is the only one needing approval after this...

Kyrill


Re: [Patch, Fortran, committed] PR 71894: [OOP] ICE in gfc_add_component_ref, at fortran/class.c:227

2016-11-09 Thread Janus Weil
Hi Andre,

sorry, didn't want to be impolite. I was just not aware that it makes
a big difference whether I attach the patch or post the link. Patch
below.

Cheers,
Janus


Index: gcc/fortran/class.c
===
--- gcc/fortran/class.c(Revision 241992)
+++ gcc/fortran/class.c(Arbeitskopie)
@@ -224,7 +224,8 @@ gfc_add_component_ref (gfc_expr *e, const char *na
 break;
   tail = &((*tail)->next);
 }
-  if (derived->components->next->ts.type == BT_DERIVED &&
+  if (derived->components && derived->components->next &&
+  derived->components->next->ts.type == BT_DERIVED &&
   derived->components->next->ts.u.derived == NULL)
 {
   /* Fix up missing vtype.  */




2016-11-09 10:54 GMT+01:00 Andre Vehreschild :
> Hi Janus,
>
> may I ask you to also attach the "obvious" patches to the mail you send to the
> list? It is far more comfortable to look at the patch in the mail than to open
> the webpage. Furthermore, it is considered polite to attach the patches even
> when they are obvious. This is not meant to treat you badly, but to ensure a
> certain quality of gfortran. When a diff is attached I look at it, but I will
> not open the link, especially not when I am only on a mobile.
>
> Thank you in advance,
> Andre
>
> On Wed, 9 Nov 2016 10:35:10 +0100
> Janus Weil  wrote:
>
>> Hi all,
>>
>> I have committed to trunk another obvious patch to fix an ICE on invalid 
>> code:
>>
>> https://gcc.gnu.org/viewcvs?rev=241993&root=gcc&view=rev
>>
>> Cheers,
>> Janus
>
>
> --
> Andre Vehreschild * Email: vehre ad gmx dot de


Re: [PATCH] Convert character arrays to string csts

2016-11-09 Thread Richard Biener
On Fri, Nov 4, 2016 at 2:33 PM, Martin Liška  wrote:
> On 11/03/2016 02:00 PM, Jan Hubicka wrote:
>>> On 11/03/2016 01:12 PM, Martin Liška wrote:
 +  tree init = DECL_INITIAL (decl);
 +  if (init
 +  && TREE_READONLY (decl)
 +  && can_convert_ctor_to_string_cst (init))
 +DECL_INITIAL (decl) = build_string_cst_from_ctor (init);
>>>
>>> I'd merge these two new functions since they're only ever called
>>> together. We'd then have something like
>>>
>>> if (init && TREE_READONLY (decl))
>>>   init = convert_ctor_to_string_cst (init);
>>> if (init)
>>>   DECL_INITIAL (decl) = init;
>
> Done.
>
>>>
>>> I'll defer to Jan on whether finalize_decl seems like a good place
>>> to do this.
>>
>> I think finalize_decl may be a bit too early because frontends may expect the
>> ctors to be in the way they produced them.  We only want to convert those
>> arrays seen by the middle-end, so I would move the logic to varpool_node::analyze
>>
>> Otherwise the patch seems fine to me.
>>
>> Honza
>>>
 diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c 
 b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
 index 283bd1c..b2d1fd5 100644
 --- a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
 +++ b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
 @@ -4,12 +4,15 @@
 char *buffer1;
 char *buffer2;

 +const char global[] = {'a', 'b', 'c', 'd', '\0'};
 +
 #define SIZE 1000

 int
 main (void)
 {
   const char* const foo1 = "hello world";
 +  const char local[] = "abcd";

   buffer1 = __builtin_malloc (SIZE);
   __builtin_strcpy (buffer1, foo1);
 @@ -45,6 +48,10 @@ main (void)
 __builtin_abort ();
   if (__builtin_memchr (foo1, null, 12) != foo1 + 11)
 __builtin_abort ();
 +  if (__builtin_memchr (global, null, 5) == 0)
 +__builtin_abort ();
 +  if (__builtin_memchr (local, null, 5) == 0)
 +__builtin_abort ();
>>>
>>> How is that a meaningful test? This seems to work even with an
>>> unpatched gcc. I'd have expected something that shows a benefit for
>>> doing this conversion, and maybe also a test that shows it isn't
>>> done in cases where it's not allowed.
>
> It's meaningful as it scans for the absence of __builtin_memchr in the
> optimized dump.
> I'm adding new tests that do the opposite check.
>
>>>
 tree
 -build_string_literal (int len, const char *str)
 +build_string_literal (int len, const char *str, bool build_addr_expr)
>>>
>>> New arguments should be documented in the function comment.
>
> Yep, improved.
>
>>>
 +/* Return TRUE when CTOR can be converted to a string constant.  */
>>>
>>> "if", not "when".
>
> Done.
>
>>>
 +  unsigned HOST_WIDE_INT elements = CONSTRUCTOR_NELTS (ctor);
 +  FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (ctor), idx, key, value)
 +{
 +  if (key == NULL_TREE
 + || TREE_CODE (key) != INTEGER_CST
 + || !tree_fits_uhwi_p (value)
 + || !useless_type_conversion_p (TREE_TYPE (value), char_type_node))
 +   return false;
>>>
>>> Shouldn't all elements have the same type, or do you really have to
>>> call useless_type_conversion in a loop?
>>>
 +  /* Allow zero character just at the end of a string.  */
 +  if (integer_zerop (value))
 +   return idx == elements - 1;
>>>
>>> Don't you also have to explicitly check it's there?
>>>
>>>
>>> Bernd
>
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

I'm curious about the

@@ -1495,7 +1495,8 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p)
{
  if (!TREE_STATIC (decl))
{
- DECL_INITIAL (decl) = NULL_TREE;
+ if (!TREE_READONLY (decl) || TREE_CODE (init) != STRING_CST)
+   DECL_INITIAL (decl) = NULL_TREE;
  init = build2 (INIT_EXPR, void_type_node, decl, init);
  gimplify_and_add (init, seq_p);
  ggc_free (init);

change.  Why keep DECL_INITIAL if you build an INIT_EXPR anyway?

@@ -4438,6 +4439,19 @@ gimplify_init_constructor (tree *expr_p,
gimple_seq *pre_p, gimple_seq *post_p,
break;
  }

+   /* Replace a ctor with a string constant when possible.  */
+   if (TREE_READONLY (object)
+   && VAR_P (object))
+ {
+   tree string_ctor = convert_ctor_to_string_cst (ctor);
+   if (string_ctor)
+ {
+   TREE_OPERAND (*expr_p, 1) = string_ctor;
+   DECL_INITIAL (object) = string_ctor;
+   break;
+ }
+ }
+
/* Fetch information about the constructor to direct later processing.
   We might want to make static versions of it in various cases, and
   can only do so if it known to be a valid constant initializer.  */

hmm, so both these hunks will end up keeping a DECL_INITIAL
for non-static local consts?  

[PATCH 6/6][ARM] Implement support for ACLE Coprocessor MCRR and MRRC intrinsics

2016-11-09 Thread Andre Vieira (lists)
Hi,

This patch implements support for the ARM ACLE Coprocessor MCRR and MRRC
intrinsics. See below a table mapping the intrinsics to their respective
instructions:

+-----------------------------------------------------+----------------------------------+
| Intrinsic signature                                 | Instruction pattern              |
+-----------------------------------------------------+----------------------------------+
| void __arm_mcrr(coproc, opc1, uint64_t value, CRm)  | MCRR coproc, opc1, Rt, Rt2, CRm  |
+-----------------------------------------------------+----------------------------------+
| void __arm_mcrr2(coproc, opc1, uint64_t value, CRm) | MCRR2 coproc, opc1, Rt, Rt2, CRm |
+-----------------------------------------------------+----------------------------------+
| uint64_t __arm_mrrc(coproc, opc1, CRm)              | MRRC coproc, opc1, Rt, Rt2, CRm  |
+-----------------------------------------------------+----------------------------------+
| uint64_t __arm_mrrc2(coproc, opc1, CRm)             | MRRC2 coproc, opc1, Rt, Rt2, CRm |
+-----------------------------------------------------+----------------------------------+
Note that any untyped variable in the intrinsic signature is required to
be a compile-time constant and has the type 'unsigned int'.  We do some
boundary checks for coproc:[0-15], opc1:[0-7] and CR*:[0-31].  If any of
these requirements is not met, a diagnostic is issued.

I added a new arm_arch variable for ARMv5TE to use when deciding whether
or not the MCRR and MRRC intrinsics are available.
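
For illustration, a minimal usage sketch of the new intrinsics (not part of
the patch; the coprocessor and register numbers, and the function name, are
arbitrary values chosen to stay within the documented ranges):

#include <arm_acle.h>
#include <stdint.h>

uint64_t
coproc_roundtrip (uint64_t value)
{
  /* coproc, opc1 and CRm must all be compile-time constants within
     the ranges noted above.  */
  __arm_mcrr (1, 0, value, 2);   /* MCRR p1, 0, Rt, Rt2, c2 */
  return __arm_mrrc (1, 0, 2);   /* MRRC p1, 0, Rt, Rt2, c2 */
}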

Is this OK for trunk?

Regards,
Andre

gcc/ChangeLog:
2016-11-09  Andre Vieira  

  * config/arm/arm.md (): New.
  (): New.
  * config/arm/arm.c (arm_arch5te): New.
  (arm_option_override): Set arm_arch5te.
  (arm_coproc_builtin_available): Add support for mcrr, mcrr2, mrrc
  and mrrc2.
  * config/arm/arm-builtins.c (MCRR_QUALIFIERS): Define to...
  (arm_mcrr_qualifiers): ... this. New.
  (MRRC_QUALIFIERS): Define to...
  (arm_mrrc_qualifiers): ... this. New.
  * config/arm/arm_acle.h (__arm_mcrr, __arm_mcrr2, __arm_mrrc,
  __arm_mrrc2): New.
  * config/arm/arm_acle_builtins.def (mcrr, mcrr2, mrrc, mrrc2): New.
  * config/arm/iterators.md (MCRRI, mcrr, MCRR): New.
  (MRRCI, mrrc, MRRC): New.
  * config/arm/unspecs.md (VUNSPEC_MCRR, VUNSPEC_MCRR2, VUNSPEC_MRRC,
  VUNSPEC_MRRC2): New.

gcc/testsuite/ChangeLog:

2016-11-09  Andre Vieira  

  * gcc.target/arm/acle/mcrr: New.
  * gcc.target/arm/acle/mcrr2: New.
  * gcc.target/arm/acle/mrrc: New.
  * gcc.target/arm/acle/mrrc2: New.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 
44f255356dcc3ea6b8f554ba96f99fd7856bf6a1..ab641d6d5fb5f64b5f3317f461e13e5222150237
 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -216,6 +216,24 @@ arm_mrc_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   qualifier_unsigned_immediate, qualifier_unsigned_immediate };
 #define MRC_QUALIFIERS \
   (arm_mrc_qualifiers)
+
+/* void (unsigned immediate, unsigned immediate,  T, unsigned immediate).  */
+static enum arm_type_qualifiers
+arm_mcrr_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_unsigned_immediate,
+  qualifier_unsigned_immediate, qualifier_none,
+  qualifier_unsigned_immediate };
+#define MCRR_QUALIFIERS \
+  (arm_mcrr_qualifiers)
+
+/* T (unsigned immediate, unsigned immediate, unsigned immediate).  */
+static enum arm_type_qualifiers
+arm_mrrc_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_unsigned_immediate,
+  qualifier_unsigned_immediate, qualifier_unsigned_immediate };
+#define MRRC_QUALIFIERS \
+  (arm_mrrc_qualifiers)
+
 /* The first argument (return type) of a store should be void type,
which we represent with qualifier_void.  Their first operand will be
a DImode pointer to the location to store to, so we must use
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
25439a343e8540c5fca5cbe19e8b76e2fdb97a73..3f4e2a39580217b6564f047998681e5b8419e741
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -797,6 +797,9 @@ int arm_arch5 = 0;
 /* Nonzero if this chip supports the ARM Architecture 5E extensions.  */
 int arm_arch5e = 0;
 
+/* Nonzero if this chip supports the ARM Architecture 5TE extensions.  */
+int arm_arch5te = 0;
+
 /* Nonzero if this chip supports the ARM Architecture 6 extensions.  */
 int arm_arch6 = 0;
 
@@ -3231,6 +3234,7 @@ arm_option_override (void)
   arm_arch4t = arm_arch4 && (ARM_FSET_HAS_CPU1 (insn_flags, FL_THUMB));
   arm_arch5 = ARM_FSET_HAS_CPU1 (insn_flags, FL_ARCH5);
   arm_arch5e = ARM_FSET_HAS_CPU1 (insn_flags, FL_ARCH5E);
+  arm_arch5te = arm_arch5e && ARM_FSET_HAS_CPU1 (insn_flags, FL_THUMB);
   

[PATCH 5/6][ARM] Implement support for ACLE Coprocessor MCR and MRC intrinsics

2016-11-09 Thread Andre Vieira (lists)
Hi,

This patch implements support for the ARM ACLE Coprocessor MCR and MRC
intrinsics. See below a table mapping the intrinsics to their respective
instructions:

+---------------------------------------------------------------+---------------------------------------+
| Intrinsic signature                                           | Instruction pattern                   |
+---------------------------------------------------------------+---------------------------------------+
| void __arm_mcr(coproc, opc1, uint32_t value, CRn, CRm, opc2)  | MCR coproc, opc1, Rt, CRn, CRm, opc2  |
+---------------------------------------------------------------+---------------------------------------+
| void __arm_mcr2(coproc, opc1, uint32_t value, CRn, CRm, opc2) | MCR2 coproc, opc1, Rt, CRn, CRm, opc2 |
+---------------------------------------------------------------+---------------------------------------+
| uint32_t __arm_mrc(coproc, opc1, CRn, CRm, opc2)              | MRC coproc, opc1, Rt, CRn, CRm, opc2  |
+---------------------------------------------------------------+---------------------------------------+
| uint32_t __arm_mrc2(coproc, opc1, CRn, CRm, opc2)             | MRC2 coproc, opc1, Rt, CRn, CRm, opc2 |
+---------------------------------------------------------------+---------------------------------------+
Note that any untyped variable in the intrinsic signature is required to
be a compile-time constant and has the type 'unsigned int'.  We do some
boundary checks for coproc:[0-15], opc1:[0-7], CR*:[0-31] and opc2:[0-7].
If any of these requirements is not met, a diagnostic is issued.
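
For illustration, a minimal usage sketch (not part of the patch; the
coprocessor, opcode and register numbers, and the function name, are
arbitrary values within the documented ranges):

#include <arm_acle.h>
#include <stdint.h>

uint32_t
coproc_write_read (uint32_t value)
{
  /* All of coproc, opc1, CRn, CRm and opc2 must be compile-time
     constants within the ranges noted above.  */
  __arm_mcr (1, 0, value, 0, 0, 0);   /* MCR p1, 0, Rt, c0, c0, 0 */
  return __arm_mrc (1, 0, 0, 0, 0);   /* MRC p1, 0, Rt, c0, c0, 0 */
}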

Is this OK for trunk?

Regards,
Andre

gcc/ChangeLog:
2016-11-09  Andre Vieira  

  * config/arm/arm.md (): New.
  (): New.
  * config/arm/arm.c (arm_coproc_builtin_available): Add
  support for mcr, mrc, mcr2 and mrc2.
  * config/arm/arm-builtins.c (MCR_QUALIFIERS): Define to...
  (arm_mcr_qualifiers): ... this. New.
  (MRC_QUALIFIERS): Define to ...
  (arm_mrc_qualifiers): ... this. New.
  (MCR_QUALIFIERS): Define to ...
  (arm_mcr_qualifiers): ... this. New.
  * config/arm/arm_acle.h (__arm_mcr, __arm_mrc, __arm_mcr2,
  __arm_mrc2): New.
  * config/arm/arm_acle_builtins.def (mcr, mcr2, mrc, mrc2): New.
  * config/arm/iterators.md (MCRI, mcr, MCR, MRCI, mrc, MRC): New.
  * config/arm/unspecs.md (VUNSPEC_MCR, VUNSPEC_MCR2, VUNSPEC_MRC,
  VUNSPEC_MRC2): New.


gcc/ChangeLog:
2016-11-09  Andre Vieira  

  * gcc.target/arm/acle/mcr.c: New.
  * gcc.target/arm/acle/mrc.c: New.
  * gcc.target/arm/acle/mcr2.c: New.
  * gcc.target/arm/acle/mrc2.c: New.
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 
e6e58cd6a656732e37c6bca23ad980eea522a710..44f255356dcc3ea6b8f554ba96f99fd7856bf6a1
 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -196,6 +196,26 @@ arm_stc_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define STC_QUALIFIERS \
   (arm_stc_qualifiers)
 
+/* void (unsigned immediate, unsigned immediate,  T, unsigned immediate,
+unsigned immediate, unsigned immediate).  */
+static enum arm_type_qualifiers
+arm_mcr_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_unsigned_immediate,
+  qualifier_unsigned_immediate, qualifier_none,
+  qualifier_unsigned_immediate, qualifier_unsigned_immediate,
+  qualifier_unsigned_immediate };
+#define MCR_QUALIFIERS \
+  (arm_mcr_qualifiers)
+
+/* T (unsigned immediate, unsigned immediate, unsigned immediate,
+  unsigned immediate, unsigned immediate).  */
+static enum arm_type_qualifiers
+arm_mrc_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_unsigned_immediate,
+  qualifier_unsigned_immediate, qualifier_unsigned_immediate,
+  qualifier_unsigned_immediate, qualifier_unsigned_immediate };
+#define MRC_QUALIFIERS \
+  (arm_mrc_qualifiers)
 /* The first argument (return type) of a store should be void type,
which we represent with qualifier_void.  Their first operand will be
a DImode pointer to the location to store to, so we must use
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
d3191abee163d5fbf753fb5072be50fdb2b4c785..25439a343e8540c5fca5cbe19e8b76e2fdb97a73
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -30803,6 +30803,8 @@ bool arm_coproc_builtin_available (enum unspecv builtin)
   case VUNSPEC_LDCL:
   case VUNSPEC_STC:
   case VUNSPEC_STCL:
+  case VUNSPEC_MCR:
+  case VUNSPEC_MRC:
if (arm_arch4)
  return true;
break;
@@ -30811,6 +30813,8 @@ bool arm_coproc_builtin_available (enum unspecv builtin)
   case VUNSPEC_LDC2L:
   case VUNSPEC_STC2:
   case VUNSPEC_STC2L:
+  case VUNSPEC_MCR2:
+  case VUNSPEC_MRC2:
/* Only present in ARMv5*, ARMv6 (but not ARMv6-M), ARMv7* and
   ARMv8-{A,M}.  */
if (arm_arch5)
diff --git a/gcc/config/arm/arm.md 

[PATCH 4/6][ARM] Implement support for ACLE Coprocessor LDC and STC intrinsics

2016-11-09 Thread Andre Vieira (lists)
Hi,

This patch implements support for the ARM ACLE Coprocessor LDC and STC
intrinsics. See below a table mapping the intrinsics to their respective
instructions:

+----------------------------------------------+--------------------------+
| Intrinsic signature                          | Instruction pattern      |
+----------------------------------------------+--------------------------+
| void __arm_ldc(coproc, CRd, const void* p)   | LDC coproc, CRd, [...]   |
+----------------------------------------------+--------------------------+
| void __arm_ldcl(coproc, CRd, const void* p)  | LDCL coproc, CRd, [...]  |
+----------------------------------------------+--------------------------+
| void __arm_ldc2(coproc, CRd, const void* p)  | LDC2 coproc, CRd, [...]  |
+----------------------------------------------+--------------------------+
| void __arm_ldc2l(coproc, CRd, const void* p) | LDC2L coproc, CRd, [...] |
+----------------------------------------------+--------------------------+
| void __arm_stc(coproc, CRd, void* p)         | STC coproc, CRd, [...]   |
+----------------------------------------------+--------------------------+
| void __arm_stcl(coproc, CRd, void* p)        | STCL coproc, CRd, [...]  |
+----------------------------------------------+--------------------------+
| void __arm_stc2(coproc, CRd, void* p)        | STC2 coproc, CRd, [...]  |
+----------------------------------------------+--------------------------+
| void __arm_stc2l(coproc, CRd, void* p)       | STC2L coproc, CRd, [...] |
+----------------------------------------------+--------------------------+
Note that any untyped variable in the intrinsic signature is required to
be a compile-time constant and has the type 'unsigned int'.  We do some
boundary checks for coproc:[0-15] and CR*:[0-31].  If either of these
requirements is not met, a diagnostic is issued.
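
For illustration, a minimal usage sketch (not part of the patch; the
coprocessor and register numbers, and the function name, are arbitrary
values within the documented ranges):

#include <arm_acle.h>

void
coproc_load_store (const void *src, void *dst)
{
  /* coproc and CRd must be compile-time constants within the ranges
     noted above.  */
  __arm_ldc (1, 0, src);   /* LDC p1, c0, [src] */
  __arm_stc (1, 0, dst);   /* STC p1, c0, [dst] */
}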


Is this ok for trunk?

Regards,
Andre

gcc/ChangeLog:
2016-11-09  Andre Vieira  

  * config/arm/arm.md (*ldcstc): New.
  (): New.
  * config/arm/arm.c (arm_coproc_builtin_available): Add
  support for ldc,ldcl,stc,stcl,ldc2,ldc2l,stc2 and stc2l.
  (arm_coproc_ldc_stc_legitimate_address): New.
  * config/arm/arm-builtins.c (arm_type_qualifiers): Add
  'qualifier_const_pointer'.
  (LDC_QUALIFIERS): Define to...
  (arm_ldc_qualifiers): ... this. New.
  (STC_QUALIFIERS): Define to...
  (arm_stc_qualifiers): ... this. New.
  * config/arm/arm-protos.h
  (arm_coproc_ldc_stc_legitimate_address): New.
  * config/arm/arm_acle.h (__arm_ldc, __arm_ldcl, __arm_stc,
  __arm_stcl, __arm_ldc2, __arm_ldc2l, __arm_stc2, __arm_stc2l): New.
  * config/arm/arm_acle_builtins.def (ldc, ldc2, ldcl, ldc2l, stc,
  stc2, stcl, stc2l): New.
  * config/arm/constraints.md (Uz): New.
  * config/arm/iterators.md (LDCSTCI, ldcstc, LDCSTC): New.
  * config/arm/unspecs.md (VUNSPEC_LDC, VUNSPEC_LDC2, VUNSPEC_LDCL,
  VUNSPEC_LDC2L, VUNSPEC_STC, VUNSPEC_STC2, VUNSPEC_STCL,
  VUNSPEC_STC2L): New.

gcc/testsuite/ChangeLog:

2016-11-09  Andre Vieira  

  * gcc.target/arm/acle/ldc: New.
  * gcc.target/arm/acle/ldc2: New.
  * gcc.target/arm/acle/ldcl: New.
  * gcc.target/arm/acle/ldc2l: New.
  * gcc.target/arm/acle/stc: New.
  * gcc.target/arm/acle/stc2: New.
  * gcc.target/arm/acle/stcl: New.
  * gcc.target/arm/acle/stc2l: New.


diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 
bdb8aad8658af089b4977373654bb2d2c0b5c653..e6e58cd6a656732e37c6bca23ad980eea522a710
 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -50,6 +50,8 @@ enum arm_type_qualifiers
   qualifier_const = 0x2, /* 1 << 1  */
   /* T *foo.  */
   qualifier_pointer = 0x4, /* 1 << 2  */
+  /* const T * foo */
+  qualifier_const_pointer = 0x6,
   /* Used when expanding arguments if an operand could
  be an immediate.  */
   qualifier_immediate = 0x8, /* 1 << 3  */
@@ -177,6 +179,23 @@ arm_cdp_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   qualifier_unsigned_immediate };
 #define CDP_QUALIFIERS \
   (arm_cdp_qualifiers)
+
+/* void (unsigned immediate, unsigned immediate,  const void *).  */
+static enum arm_type_qualifiers
+arm_ldc_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_unsigned_immediate,
+  qualifier_unsigned_immediate, qualifier_const_pointer };
+#define LDC_QUALIFIERS \
+  (arm_ldc_qualifiers)
+
+/* void (unsigned immediate, unsigned immediate,  void *).  */
+static enum arm_type_qualifiers
+arm_stc_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_unsigned_immediate,
+  qualifier_unsigned_immediate, qualifier_pointer };
+#define 

[PATCH 3/6][ARM] Implement support for ACLE Coprocessor CDP intrinsics

2016-11-09 Thread Andre Vieira (lists)
Hi,

This patch implements support for the ARM ACLE Coprocessor CDP
intrinsics. See below a table mapping the intrinsics to their respective
instructions:

+----------------------------------------------------+----------------------------------------+
| Intrinsic signature                                | Instruction pattern                    |
+----------------------------------------------------+----------------------------------------+
| void __arm_cdp(coproc, opc1, CRd, CRn, CRm, opc2)  | CDP coproc, opc1, CRd, CRn, CRm, opc2  |
+----------------------------------------------------+----------------------------------------+
| void __arm_cdp2(coproc, opc1, CRd, CRn, CRm, opc2) | CDP2 coproc, opc1, CRd, CRn, CRm, opc2 |
+----------------------------------------------------+----------------------------------------+
Note that any untyped variable in the intrinsic signature is required to
be a compile-time constant and has the type 'unsigned int'.  We do some
boundary checks for coproc:[0-15], opc1:[0-15], CR*:[0-31] and opc2:[0-7].
If any of these requirements is not met, a diagnostic is issued.
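
For illustration, a minimal usage sketch (not part of the patch; all operand
values and the function name are arbitrary but within the documented ranges):

#include <arm_acle.h>

void
coproc_data_op (void)
{
  /* Every argument must be a compile-time constant within the ranges
     noted above.  */
  __arm_cdp (1, 2, 3, 4, 5, 6);    /* CDP  p1, 2, c3, c4, c5, 6 */
  __arm_cdp2 (1, 2, 3, 4, 5, 6);   /* CDP2 p1, 2, c3, c4, c5, 6 */
}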

In this patch I renamed neon_const_bounds to arm_const_bounds, simply
because it is also used by the coprocessor intrinsics.  The patch also
requires extending the builtin framework so that it accepts 'void' modes
and intrinsics with 6 arguments.

I also changed acle.exp to run the tests for multiple option sets, appending
-ffat-lto-objects to all LTO option sets so that assembly scans still work.

Is this OK for trunk?

Regards,
Andre

gcc/ChangeLog:
2016-11-09  Andre Vieira  

  * config/arm/arm.md (): New.
  * config/arm/arm.c (neon_const_bounds): Rename this ...
  (arm_const_bounds): ... this.
  (arm_coproc_builtin_available): New.
  * config/arm/arm-builtins.c (SIMD_MAX_BUILTIN_ARGS): Increase.
  (arm_type_qualifiers): Add 'qualifier_unsigned_immediate'.
  (CDP_QUALIFIERS): Define to...
  (arm_cdp_qualifiers): ... this. New.
  (void_UP): Define.
  (arm_expand_builtin_args): Add case for 6 arguments.
  * config/arm/arm-protos.h (neon_const_bounds): Rename this ...
  (arm_const_bounds): ... this.
  (arm_coproc_builtin_available): New.
  * config/arm/arm_acle.h (__arm_cdp): New.
  (__arm_cdp2): New.
  * config/arm/arm_acle_builtins.def (cdp): New.
  (cdp2): New.
  * config/arm/iterators.md (CDPI,CDP,cdp): New.
  * config/arm/neon.md: Rename all 'neon_const_bounds' to
  'arm_const_bounds'.
  * config/arm/types.md (coproc): New.
  * config/arm/unspecs.md (VUNSPEC_CDP, VUNSPEC_CDP2): New.
  * gcc/doc/extend.texi (ACLE): Add a mention of Coprocessor intrinsics.

gcc/testsuite/ChangeLog:
2016-11-09  Andre Vieira  

  * gcc.target/arm/acle/acle.exp: Run tests for different options
  and make sure fat-lto-objects is used such that we can still do
  assemble scans.
  * gcc.target/arm/acle/cdp.c: New.
  * gcc.target/arm/acle/cdp2.c: New.
  * lib/target-supports.exp (check_effective_target_arm_coproc1_ok): New.
  (check_effective_target_arm_coproc1_ok_nocache): New.
  (check_effective_target_arm_coproc2_ok): New.
  (check_effective_target_arm_coproc2_ok_nocache): New.
  (check_effective_target_arm_coproc3_ok): New.
  (check_effective_target_arm_coproc3_ok_nocache): New.
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 
2130a3004f17c47be6e42412c1ea30f3cff20573..bdb8aad8658af089b4977373654bb2d2c0b5c653
 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -38,7 +38,7 @@
 #include "langhooks.h"
 #include "case-cfn-macros.h"
 
-#define SIMD_MAX_BUILTIN_ARGS 5
+#define SIMD_MAX_BUILTIN_ARGS 7
 
 enum arm_type_qualifiers
 {
@@ -53,6 +53,7 @@ enum arm_type_qualifiers
   /* Used when expanding arguments if an operand could
  be an immediate.  */
   qualifier_immediate = 0x8, /* 1 << 3  */
+  qualifier_unsigned_immediate = 0x9,
   qualifier_maybe_immediate = 0x10, /* 1 << 4  */
   /* void foo (...).  */
   qualifier_void = 0x20, /* 1 << 5  */
@@ -164,6 +165,18 @@ arm_unsigned_binop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   qualifier_unsigned };
 #define UBINOP_QUALIFIERS (arm_unsigned_binop_qualifiers)
 
+/* void (unsigned immediate, unsigned immediate, unsigned immediate,
+unsigned immediate, unsigned immediate, unsigned immediate).  */
+static enum arm_type_qualifiers
+arm_cdp_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_unsigned_immediate,
+  qualifier_unsigned_immediate,
+  qualifier_unsigned_immediate,
+  qualifier_unsigned_immediate,
+  qualifier_unsigned_immediate,
+  qualifier_unsigned_immediate };
+#define CDP_QUALIFIERS \
+  (arm_cdp_qualifiers)
 /* The first argument (return type) of a store should be void type,
which we represent with qualifier_void.  Their first operand will be
a DImode pointer to the location to store to, so we must use
@@ -200,6 +213,7 @@ arm_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define oi_UP   OImode
 #define hf_UP 

[PATCH 1/6][ARM] Refactor NEON builtin framework to work for other builtins

2016-11-09 Thread Andre Vieira (lists)
Hi,

Refactor NEON builtin framework such that it can be used to implement
other builtins.

Is this OK for trunk?

Regards,
Andre

gcc/ChangeLog
2016-11-09  Andre Vieira  

* config/arm/arm-builtins.c (neon_builtin_datum): Rename to ..
(arm_builtin_datum): ... this.
(arm_init_neon_builtin): Rename to ...
(arm_init_builtin): ... this. Add a new parameters PREFIX
and USE_SIG_IN_NAME.
(arm_init_neon_builtins): Replace 'arm_init_neon_builtin' with
'arm_init_builtin'. Replace type 'neon_builtin_datum' with
'arm_builtin_datum'.
(arm_init_vfp_builtins): Likewise.
(builtin_arg): Rename enum values, replacing 'NEON_ARG' with
'ARG_BUILTIN', and add an 'ARG_BUILTIN_NEON_MEMORY'.
(arm_expand_neon_args): Rename to ...
(arm_expand_builtin_args): ... this. Rename builtin_arg
enum values and differentiate between ARG_BUILTIN_MEMORY
and ARG_BUILTIN_NEON_MEMORY.
(arm_expand_neon_builtin_1): Rename to ...
(arm_expand_builtin_1): ... this. Rename builtin_arg enum
values, arm_expand_builtin_args and add bool parameter NEON.
(arm_expand_neon_builtin): Use arm_expand_builtin_1.
(arm_expand_vfp_builtin): Likewise.
(NEON_MAX_BUILTIN_ARGS): Remove, it was unused.
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 
e73043db6db69fa64bb1e72cf71a36d7169062db..154dafc6bf2525165406ae07f7551a665b2bdee8
 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -202,7 +202,7 @@ typedef struct {
   const enum insn_code code;
   unsigned int fcode;
   enum arm_type_qualifiers *qualifiers;
-} neon_builtin_datum;
+} arm_builtin_datum;
 
 #define CF(N,X) CODE_FOR_neon_##N##X
 
@@ -242,7 +242,7 @@ typedef struct {
   VAR11 (T, N, A, B, C, D, E, F, G, H, I, J, K) \
   VAR1 (T, N, L)
 
-/* The NEON builtin data can be found in arm_neon_builtins.def and
+/* The builtin data can be found in arm_neon_builtins.def,
arm_vfp_builtins.def.  The entries in arm_neon_builtins.def require
TARGET_NEON to be true.  The feature tests are checked when the
builtins are expanded.
@@ -252,14 +252,14 @@ typedef struct {
would be specified after the assembler mnemonic, which usually
refers to the last vector operand.  The modes listed per
instruction should be the same as those defined for that
-   instruction's pattern in neon.md.  */
+   instruction's pattern, for instance in neon.md.  */
 
-static neon_builtin_datum vfp_builtin_data[] =
+static arm_builtin_datum vfp_builtin_data[] =
 {
 #include "arm_vfp_builtins.def"
 };
 
-static neon_builtin_datum neon_builtin_data[] =
+static arm_builtin_datum neon_builtin_data[] =
 {
 #include "arm_neon_builtins.def"
 };
@@ -915,11 +915,17 @@ arm_init_simd_builtin_scalar_types (void)
 "__builtin_neon_uti");
 }
 
-/* Set up a NEON builtin.  */
+/* Set up a builtin.  It will use information stored in the argument struct D 
to
+   derive the builtin's type signature and name.  It will append the name in D
+   to the PREFIX passed and use these to create a builtin declaration that is
+   then stored in 'arm_builtin_decls' under index FCODE.  This FCODE is also
+   written back to D for future use.  If USE_SIG_IN_NAME is true the builtin's
+   name is appended with type signature information to distinguish between
+   signedness and poly.  */
 
 static void
-arm_init_neon_builtin (unsigned int fcode,
-  neon_builtin_datum *d)
+arm_init_builtin (unsigned int fcode, arm_builtin_datum *d,
+ const char * prefix, bool use_sig_in_name)
 {
   bool print_type_signature_p = false;
   char type_signature[SIMD_MAX_BUILTIN_ARGS] = { 0 };
@@ -1007,12 +1013,12 @@ arm_init_neon_builtin (unsigned int fcode,
 
   gcc_assert (ftype != NULL);
 
-  if (print_type_signature_p)
-snprintf (namebuf, sizeof (namebuf), "__builtin_neon_%s_%s",
- d->name, type_signature);
+  if (print_type_signature_p && use_sig_in_name)
+snprintf (namebuf, sizeof (namebuf), "%s_%s_%s",
+ prefix, d->name, type_signature);
   else
-snprintf (namebuf, sizeof (namebuf), "__builtin_neon_%s",
- d->name);
+snprintf (namebuf, sizeof (namebuf), "%s_%s",
+ prefix, d->name);
 
   fndecl = add_builtin_function (namebuf, ftype, fcode, BUILT_IN_MD,
 NULL, NULL_TREE);
@@ -1048,8 +1054,8 @@ arm_init_neon_builtins (void)
 
   for (i = 0; i < ARRAY_SIZE (neon_builtin_data); i++, fcode++)
 {
-  neon_builtin_datum *d = &neon_builtin_data[i];
-  arm_init_neon_builtin (fcode, d);
+  arm_builtin_datum *d = &neon_builtin_data[i];
+  arm_init_builtin (fcode, d, "__builtin_neon", true);
 }
 }
 
@@ -1062,8 +1068,8 @@ arm_init_vfp_builtins (void)
 
   for (i = 0; i < ARRAY_SIZE (vfp_builtin_data); i++, fcode++)
 {
-  neon_builtin_datum *d = &vfp_builtin_data[i];
-  arm_init_neon_builtin (fcode, d);
+  arm_builtin_datum 

[PATCH 2/6][ARM] Move CRC builtins to refactored framework

2016-11-09 Thread Andre Vieira (lists)
Hi,

This patch refactors the implementation of the ARM ACLE CRC builtins to
use the builtin framework.
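
For context, the user-visible wrappers for these builtins are the ACLE CRC32
intrinsics from arm_acle.h (__crc32b and friends).  A usage sketch, not part
of the patch, assuming a CRC-capable target (e.g. -march=armv8-a+crc):

#include <arm_acle.h>
#include <stdint.h>

/* Accumulate a CRC-32 over a byte buffer, one byte at a time.  */
uint32_t
crc32_bytes (uint32_t crc, const uint8_t *buf, unsigned len)
{
  for (unsigned i = 0; i < len; i++)
    crc = __crc32b (crc, buf[i]);
  return crc;
}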

Is this OK for trunk?

Regards,
Andre

gcc/ChangeLog
2016-11-09  Andre Vieira  

  * config/arm/arm-builtins.c (arm_unsigned_binop_qualifiers): New.
  (UBINOP_QUALIFIERS): New.
  (si_UP): Define.
  (acle_builtin_data): New. Change comment.
  (arm_builtins): Remove ARM_BUILTIN_CRC32B, ARM_BUILTIN_CRC32H,
  ARM_BUILTIN_CRC32W, ARM_BUILTIN_CRC32CB, ARM_BUILTIN_CRC32CH,
  ARM_BUILTIN_CRC32CW. Add ARM_BUILTIN_ACLE_BASE and include
  arm_acle_builtins.def.
  (ARM_BUILTIN_ACLE_PATTERN_START): Define.
  (arm_init_acle_builtins): New.
  (CRC32_BUILTIN): Remove.
  (bdesc_2arg): Remove entries for crc32b, crc32h, crc32w,
  crc32cb, crc32ch and crc32cw.
  (arm_init_crc32_builtins): Remove.
  (arm_init_builtins): Use arm_init_acle_builtins rather
  than arm_init_crc32_builtins.
  (arm_expand_acle_builtin): New.
  (arm_expand_builtin): Use 'arm_expand_acle_builtin'.
  * config/arm/arm_acle_builtins.def: New.
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 
154dafc6bf2525165406ae07f7551a665b2bdee8..2130a3004f17c47be6e42412c1ea30f3cff20573
 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -157,6 +157,13 @@ arm_load1_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   qualifier_none, qualifier_struct_load_store_lane_index };
 #define LOAD1LANE_QUALIFIERS (arm_load1_lane_qualifiers)
 
+/* unsigned T (unsigned T, unsigned T, unsigned T).  */
+static enum arm_type_qualifiers
+arm_unsigned_binop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
+  qualifier_unsigned };
+#define UBINOP_QUALIFIERS (arm_unsigned_binop_qualifiers)
+
 /* The first argument (return type) of a store should be void type,
which we represent with qualifier_void.  Their first operand will be
a DImode pointer to the location to store to, so we must use
@@ -242,17 +249,16 @@ typedef struct {
   VAR11 (T, N, A, B, C, D, E, F, G, H, I, J, K) \
   VAR1 (T, N, L)
 
-/* The builtin data can be found in arm_neon_builtins.def,
-   arm_vfp_builtins.def.  The entries in arm_neon_builtins.def require
-   TARGET_NEON to be true.  The feature tests are checked when the
-   builtins are expanded.
+/* The builtin data can be found in arm_neon_builtins.def, arm_vfp_builtins.def
+   and arm_acle_builtins.def.  The entries in arm_neon_builtins.def require
+   TARGET_NEON to be true.  The feature tests are checked when the builtins are
+   expanded.
 
-   The mode entries in the following table correspond to the "key"
-   type of the instruction variant, i.e. equivalent to that which
-   would be specified after the assembler mnemonic, which usually
-   refers to the last vector operand.  The modes listed per
-   instruction should be the same as those defined for that
-   instruction's pattern, for instance in neon.md.  */
+   The mode entries in the following table correspond to the "key" type of the
+   instruction variant, i.e. equivalent to that which would be specified after
+   the assembler mnemonic for neon instructions, which usually refers to the
+   last vector operand.  The modes listed per instruction should be the same as
+   those defined for that instruction's pattern, for instance in neon.md.  */
 
 static arm_builtin_datum vfp_builtin_data[] =
 {
@@ -266,6 +272,15 @@ static arm_builtin_datum neon_builtin_data[] =
 
 #undef CF
 #undef VAR1
+#define VAR1(T, N, A) \
+  {#N, UP (A), CODE_FOR_##N, 0, T##_QUALIFIERS},
+
+static arm_builtin_datum acle_builtin_data[] =
+{
+#include "arm_acle_builtins.def"
+};
+
+#undef VAR1
 
 #define VAR1(T, N, X) \
   ARM_BUILTIN_NEON_##N##X,
@@ -518,13 +533,6 @@ enum arm_builtins
 
   ARM_BUILTIN_WMERGE,
 
-  ARM_BUILTIN_CRC32B,
-  ARM_BUILTIN_CRC32H,
-  ARM_BUILTIN_CRC32W,
-  ARM_BUILTIN_CRC32CB,
-  ARM_BUILTIN_CRC32CH,
-  ARM_BUILTIN_CRC32CW,
-
   ARM_BUILTIN_GET_FPSCR,
   ARM_BUILTIN_SET_FPSCR,
 
@@ -556,6 +564,14 @@ enum arm_builtins
 
 #include "arm_neon_builtins.def"
 
+#undef VAR1
+#define VAR1(T, N, X) \
+  ARM_BUILTIN_##N,
+
+  ARM_BUILTIN_ACLE_BASE,
+
+#include "arm_acle_builtins.def"
+
   ARM_BUILTIN_MAX
 };
 
@@ -565,6 +581,9 @@ enum arm_builtins
 #define ARM_BUILTIN_NEON_PATTERN_START \
   (ARM_BUILTIN_NEON_BASE + 1)
 
+#define ARM_BUILTIN_ACLE_PATTERN_START \
+  (ARM_BUILTIN_ACLE_BASE + 1)
+
 #undef CF
 #undef VAR1
 #undef VAR2
@@ -1025,6 +1044,23 @@ arm_init_builtin (unsigned int fcode, arm_builtin_datum 
*d,
   arm_builtin_decls[fcode] = fndecl;
 }
 
+/* Set up ACLE builtins, even builtins for instructions that are not
+   in the current target ISA to allow the user to compile particular modules
+   with different target specific options that differ from the command line
+   options.  Such builtins will be rejected in arm_expand_builtin.  */
+
+static void
+arm_init_acle_builtins (void)
+{
+  unsigned int i, fcode = ARM_BUILTIN_ACLE_PATTERN_START;
+
+  
