Re: [ipa-vrp] ice in set_value_range

2016-11-08 Thread Andrew Pinski
On Tue, Nov 8, 2016 at 2:11 AM, kugan  wrote:
> Hi,
>
> On 04/11/16 03:24, Martin Jambor wrote:
>>
>> Hi,
>>
>> On Fri, Oct 28, 2016 at 01:58:13PM +1100, kugan wrote:

>>>> Do I understand it correctly that extract_range_from_unary_expr deals
>>>> with any potential type conversions better (compared to what you did
>>>> before here)?
>>>
>>>
>>> Yes, this can be wrong at times too as reported in
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78121. I have separated this
>>> part of the patch with a testcase.
>>>
>>> Please note that I am using fold_convert in the attached patch.
>>>
>>> Bootstrapped and regression tested on x86_64-linux-gnu with no new
>>> regressions. Is this OK for trunk?
>>>
>>
>> I have no objections, but we need to wait for Honza.
>
> Thanks.
>
> Honza, is this OK for you ?


Either this patch or the patch for "Handle unary pass-through jump
functions for ipa-vrp" caused a bootstrap failure on
aarch64-linux-gnu.
Bootstrap comparison failure!
gcc/go/types.o differs
gcc/fortran/class.o differs
gcc/tree-ssa-live.o differs
gcc/data-streamer-out.o differs
gcc/ira-build.o differs
gcc/hsa-gen.o differs
gcc/hsa-brig.o differs
gcc/omp-low.o differs
gcc/lto-streamer-in.o differs
gcc/real.o differs
gcc/final.o differs
gcc/df-core.o differs

I bootstrap with the following options:

--with-cpu=thunderx+lse --enable-languages=c,c++,fortran,go
--disable-werror --with-sysroot=/ --enable-plugins
--enable-gnu-indirect-function

I have not tried removing the +lse part though

Thanks,
Andrew Pinski


>
> Thanks,
> Kugan
>
>
>>
>> Thanks,
>>
>> Martin
>>
>>> Thanks,
>>> Kugan
>>>
>>>
>>> gcc/ChangeLog:
>>>
>>> 2016-10-28  Kugan Vivekanandarajah  
>>>
>>> PR ipa/78121
>>> * ipa-cp.c (propagate_vr_accross_jump_function): Pass param type.
>>> Also fold constant passed as argument while computing value
>>> range.
>>> (propagate_constants_accross_call): Pass param type.
>>> * ipa-prop.c: export ipa_get_callee_param_type.
>>> * ipa-prop.h: export ipa_get_callee_param_type.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2016-10-28  Kugan Vivekanandarajah  
>>>
>>> PR ipa/78121
>>> * gcc.dg/ipa/pr78121.c: New test.
>>
>>
>


[PATCH] enable -fprintf-return-value by default

2016-11-08 Thread Martin Sebor

The -fprintf-return-value optimization has been disabled since
the last time it caused a bootstrap failure on powerpc64le.  With
the underlying problems fixed, GCC has bootstrapped fine on all of
powerpc64, powerpc64le, and x86_64, and tested with no regressions.
I'd like to re-enable the option.  The attached patch does that.
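
For readers unfamiliar with the option, here is a minimal sketch (not part
of the patch) of the kind of call whose return value can be known in
advance.  The helper name hex8_length is hypothetical, and the sketch
assumes a 32-bit unsigned int, so that "%08x" always yields exactly eight
characters:

```c
#include <stdio.h>

/* Format VALUE as eight zero-padded hex digits and return what
   snprintf reports.  Assuming a 32-bit unsigned int, the result is
   always 8, which is the sort of constant the optimization can
   substitute for the call's return value.  */
static int
hex8_length (unsigned int value)
{
  char buf[9];   /* 8 digits plus the terminating nul */
  return snprintf (buf, sizeof buf, "%08x", value);
}
```

Whether GCC actually folds a given call depends on the optimization level
and on what it can prove about the arguments.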

Thanks
Martin

gcc/c-family/ChangeLog:

	* c.opt (-fprintf-return-value): Enable by default.

gcc/ChangeLog:

	* doc/invoke.texi (-fprintf-return-value): Document that option
	is enabled by default.

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 7d8a726..9c9e83f 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1534,7 +1534,7 @@ C++ ObjC++ Var(flag_pretty_templates) Init(1)
 -fno-pretty-templates Do not pretty-print template specializations as the template signature followed by the arguments.
 
 fprintf-return-value
-C ObjC C++ ObjC++ LTO Optimization Var(flag_printf_return_value) Init(0)
+C ObjC C++ ObjC++ LTO Optimization Var(flag_printf_return_value) Init(1)
 Treat known sprintf return values as constants.
 
 freplace-objc-classes
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 17c5c22..adebeff 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -8301,7 +8301,7 @@ if (snprintf (buf, "%08x", i) >= sizeof buf)
 The @option{-fprintf-return-value} option relies on other optimizations
 and yields best results with @option{-O2}.  It works in tandem with the
 @option{-Wformat-length} option.  The @option{-fprintf-return-value}
-option is disabled by default.
+option is enabled by default.
 
 @item -fno-peephole
 @itemx -fno-peephole2


Re: [Patch 6/11] Migrate excess precision logic to use TARGET_EXCESS_PRECISION

2016-11-08 Thread Joseph Myers
On Wed, 2 Nov 2016, James Greenhalgh wrote:

> OK, I've reworked the patch along those lines. I noticed that the original
> logic looked for
> 
>   && TARGET_FLT_EVAL_METHOD != 0
> 
> And I no longer make that check. Is that something I need to reinstate?

No, the replacement logic should imply that previously supported cases 
with TARGET_FLT_EVAL_METHOD == 0 will pass as being IEEE-compatible.

> I didn't find any reference to excess precision in Annex F, so I'd guess
> not?

There are references, e.g. F.6 requiring that the return statement removes 
excess precision (but that's something done in the front end, nothing to 
do with this patch).

> diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
> index 547bab2..507967d 100644
> --- a/gcc/c-family/c-common.h
> +++ b/gcc/c-family/c-common.h
> @@ -1314,6 +1314,8 @@ c_tree_chain_next (tree t)
>  #define TM_STMT_ATTR_ATOMIC  4
>  #define TM_STMT_ATTR_RELAXED 8
>  
> +extern int parse_tm_stmt_attr (tree, int);
> +
>  /* Mask used by tm_attr_to_mask and tm_mask_to_attr.  Note that these
> are ordered specifically such that more restrictive attributes are
> at lower bit positions.  This fact is known by the C++ tm attribute
> @@ -1325,6 +1327,10 @@ c_tree_chain_next (tree t)
>  #define TM_ATTR_IRREVOCABLE  8
>  #define TM_ATTR_MAY_CANCEL_OUTER 16
>  
> +extern int tm_attr_to_mask (tree);
> +extern tree tm_mask_to_attr (int);
> +extern tree find_tm_attribute (tree);
> +
>  /* A suffix-identifier value doublet that represents user-defined literals
> for C++-0x.  */
>  enum overflow_type {

These changes to c-common.h are nothing to do with the subject of the 
patch.

The patch is OK with those changes removed.  If there are other changes in 
this series still needing review, please repost the whole series 
identifying what patches still need review.  (I think the ARM _Float16 
support still needs the testsuite changes to ensure that 
architecture-specific options to enable _FloatN / _FloatNx support are 
used when testing if the types are supported, so that the _Float16 tests 
are actually run in that case.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Fix PR78230

2016-11-08 Thread Kito Cheng
This change was also verified with gcc 6.1 and 5.4:
the test passes with gcc 6.1, while gcc 5.4 ICEs after this change.
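
As background on the type choice, a minimal sketch (not taken from the
testcase): uintptr_t is an integer type able to hold a converted pointer,
which int is not guaranteed to be.  The sketch uses <stdint.h> rather than
the testcase's __UINTPTR_TYPE__ typedef, and the helper name is
hypothetical:

```c
#include <stdint.h>

/* Convert a pointer to uintptr_t and back.  The C standard guarantees
   the round trip for void pointers, so this returns 1.  */
static int
pointer_survives_uintptr (void)
{
  static int object;
  uintptr_t bits = (uintptr_t) (void *) &object;
  return (void *) bits == (void *) &object;
}
```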

On Wed, Nov 9, 2016 at 10:43 AM, Kito Cheng  wrote:
> gcc/testsuite/ChangeLog:
>
> 2016-11-09  Kito Cheng 
>
> PR target/78230
> * gcc.dg/torture/pr66178.c (test): Use uintptr_t instead of int.
> (test2): Ditto.


[PATCH] Fix PR78230

2016-11-08 Thread Kito Cheng
gcc/testsuite/ChangeLog:

2016-11-09  Kito Cheng 

PR target/78230
* gcc.dg/torture/pr66178.c (test): Use uintptr_t instead of int.
(test2): Ditto.

From 73ff22745720ecfc2a33148f68ff7e0f36c1144b Mon Sep 17 00:00:00 2001
From: Kito Cheng 
Date: Wed, 9 Nov 2016 10:39:59 +0800
Subject: [PATCH] Use uintptr_t instead of int for pr66178.c

---
 gcc/testsuite/gcc.dg/torture/pr66178.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/torture/pr66178.c b/gcc/testsuite/gcc.dg/torture/pr66178.c
index c42996d..ee09cf6 100644
--- a/gcc/testsuite/gcc.dg/torture/pr66178.c
+++ b/gcc/testsuite/gcc.dg/torture/pr66178.c
@@ -1,9 +1,11 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target label_values } */
 
+typedef __UINTPTR_TYPE__ uintptr_t;
+
 int test(void)
 {
-static int a =  ((char *)&(char *)&)-1;
+static uintptr_t a =  ((char *)&(char *)&)-1;
 l1:
 l2:
 return a;
@@ -11,7 +13,7 @@ l2:
 
 int test2(void)
 {
-static int a =  ((char *)&(char *)&)+((char *)&(char *)&);
+static uintptr_t a =  ((char *)&(char *)&)+((char *)&(char *)&);
 l1:
 l2:
 l3:
-- 
2.7.4



Re: [Patch 1/11] Add a new target hook for describing excess precision intentions

2016-11-08 Thread Joseph Myers
On Wed, 2 Nov 2016, James Greenhalgh wrote:

> 2016-11-02  James Greenhalgh  
> 
>   * target.def (excess_precision): New hook.
>   * target.h (flt_eval_method): New.
>   (excess_precision_type): Likewise.
>   * targhooks.c (default_excess_precision): New.
>   * targhooks.h (default_excess_precision): New.
>   * doc/tm.texi.in (TARGET_C_EXCESS_PRECISION): New.
>   * doc/tm.texi: Regenerate.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH fix PR71767 2/4 : Darwin configury] Arrange for ld64 to be detected as Darwin's linker

2016-11-08 Thread Iain Sandoe

> On 8 Nov 2016, at 13:39, Mike Stump  wrote:
> 
> On Nov 8, 2016, at 1:05 PM, Iain Sandoe  wrote:
>> 
>> Simple for the simple case is already part of my patch, but capability for 
>> the expert and non-simple case is also present,
> 
> I'm trying to ask a specific question, what does the patch allow that can't 
> be done otherwise?  Kinda a trivial question, and I don't see any answer.
> 
> I'm not looking for, it allows it to work, or it makes the expert case work.  
> I'm looking for the specific question, and the specific information you want, 
> and why ld -v doesn't get it.

ld -v gets it when you can execute ld.
It doesn’t get it when the $host ld is not executable on $build.
Providing the option to give the version allows that without requiring the 
complexity of other (possibly valid) solutions.  If you know that you’re 
building (my patched) ld64-253.9 for powerpc-darwin9 (crossed from 
x86-64-darwin14) it’s easy, just put --with-ld64=253.9 .. 

I think we’ve debated this enough - I’m OK with keeping my extra facility 
locally and will resubmit the patch with it removed in due course,
Iain

> 
> For example, there is a host you are thinking of, there is a build system 
> you are thinking of, there is a target you are thinking of.  There is a set of 
> software you have, likely a particular Xcode release.  There are binaries 
> that you might have built up for it, there might be headers you grab from 
> some place, libraries or SDKs from another place.  This might be done for 
> quick debugging a darwin problem on a linux host, but without the full 
> ability to generate a tool chain, it might be to support a full tool chain.
> 
> I want it to work, without specifying the it, doesn't let me see what problem 
> you are solving.
> 
> I read through the PR listed, and it seems to just list a typical 
> x86_64-apple-darwin12.6.0 native port.  That really isn't rocket science is 
> it?  I mean, we see the wanting from the build tools directly.  So, the 
> change isn't to support that, is it?
> 



[PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)

2016-11-08 Thread Martin Sebor

The -Wformat-length checker relies on the compute_builtin_object_size
function to determine the size of the buffer it checks for overflow.
The function returns either a size computed by the tree-object-size
pass for objects referenced by the __builtin_object_size intrinsic
(if it's used in the program) or it tries to compute it for a small
subset of expressions otherwise.  This subset doesn't include objects
allocated by either malloc or alloca, and so for those the function
returns "unknown" or (size_t)-1 in the case of -Wformat-length.  As
a consequence, -Wformat-length is unable to detect overflows
involving such objects.

The attached patch adds a new function, compute_object_size, that
uses the existing algorithms to compute and return the sizes of
allocated objects as well, as if they were referenced by
__builtin_object_size in the program source, enabling the
-Wformat-length checker to detect more buffer overflows.
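
To illustrate the kind of overflow involved, here is a minimal sketch in
plain C (not part of the patch).  The hypothetical helper fits_in_buffer
measures at run time what the warning estimates at compile time: whether
the formatted output plus its terminating nul fits in a buffer of a given
size.

```c
#include <stdio.h>
#include <stddef.h>

/* Return 1 if formatting VALUE with "%i" fits in BUFSIZE bytes,
   counting the terminating nul, and 0 otherwise.  */
static int
fits_in_buffer (size_t bufsize, int value)
{
  /* With a null buffer and zero size, snprintf only reports the
     number of characters the output would need.  */
  int needed = snprintf (NULL, 0, "%i", value);
  if (needed < 0)
    return 0;
  return (size_t) needed + 1 <= bufsize;
}
```

With a 4-byte buffer, 999 fits but 1000 does not, which corresponds to the
T (4, "%i", Ra (999, 1000)) case in the test below.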

Martin

PS The function makes use of the init_object_sizes API that is
otherwise unused outside the tree-object-size pass to initialize
the internal structures, but then calls fini_object_sizes to
release them before returning.  That seems wasteful because
the size of the same object or one related to it might need
to be computed again in the context of the same function.  I
experimented with allocating and releasing the structures only
when current_function_decl changes but that led to crashes.
I suspect I'm missing something about the management of memory
allocated for these structures.  Does anyone have any suggestions
how to make this work?  (Do I perhaps need to allocate them using
a special allocator so they don't get garbage collected?)

PR middle-end/78245 - missing -Wformat-length on an overflow of a dynamically allocated buffer

gcc/testsuite/ChangeLog:

	PR middle-end/78245
	* gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Add tests.

gcc/ChangeLog:

	PR middle-end/78245
	* gimple-ssa-sprintf.c (get_destination_size): Call compute_object_size.
	* tree-object-size.c (addr_object_size): Adjust.
	(pass_through_call): Adjust.
	(compute_object_size, internal_object_size): New functions.
	(compute_builtin_object_size): Call internal_object_size.
	(pass_object_sizes::execute): Adjust.
	* tree-object-size.h (compute_object_size): Declare.

diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index 3138ad3..f360711 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -2471,7 +2471,7 @@ get_destination_size (tree dest)
  object (the function fails without optimization in this type).  */
   int ost = optimize > 0;
   unsigned HOST_WIDE_INT size;
-  if (compute_builtin_object_size (dest, ost, &size))
+  if (compute_object_size (dest, ost, &size))
 return size;
 
   return HOST_WIDE_INT_M1U;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
index 8d97fa8..9874332 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
@@ -1,5 +1,10 @@
 /* { dg-do compile } */
-/* { dg-options "-std=c99 -O2 -Wformat -Wformat-length=1 -ftrack-macro-expansion=0" } */
+/* { dg-options "-O2 -Wformat -Wformat-length=1 -ftrack-macro-expansion=0" } */
+/* Verify that all sprintf built-ins detect overflow involving directives
+   with non-constant arguments known to be constrained by some range of
+   values, and even when writing into dynamically allocated buffers.
+   -O2 (-ftree-vrp) is necessary for the tests involving ranges to pass,
+   otherwise -O1 is sufficient.  */
 
 #ifndef LINE
 #  define LINE 0
@@ -7,18 +12,26 @@
 
 #define bos(x) __builtin_object_size (x, 0)
 
-#define T(bufsize, fmt, ...)		\
-do {\
-  if (!LINE || __LINE__ == LINE)	\
-	{\
-	  char *d = (char *)__builtin_malloc (bufsize);			\
-	  __builtin___sprintf_chk (d, 0, bos (d), fmt, __VA_ARGS__);	\
-	  sink (d);			\
-	}\
-} while (0)
+/* Defined (and redefined) to the allocation function to use, either
+   malloc, or alloca, or a VLA.  */
+#define ALLOC(p, n)   (p) = __builtin_malloc (n)
 
-void
-sink (void*);
+/* Defined (and redefined) to the sprintf function to exercise.  */
+#define TEST_SPRINTF(d, maxsize, objsize, fmt, ...)		\
+  __builtin___sprintf_chk (d, 0, objsize, fmt, __VA_ARGS__)
+
+#define T(bufsize, fmt, ...)\
+  do {			\
+if (!LINE || __LINE__ == LINE)			\
+  {			\
+	char *d;	\
+	ALLOC (d, bufsize);\
+	TEST_SPRINTF (d, 0, bos (d), fmt, __VA_ARGS__);	\
+	sink (d);	\
+  }			\
+  } while (0)
+
+void sink (void*);
 
 /* Identity function to verify that the checker figures out the value
of the operand even when it's not constant (i.e., makes use of
@@ -232,3 +245,88 @@ void test_sprintf_chk_range_sshort (signed short *a, signed short *b)
   T ( 4, "%i",  Ra (998,  999));
   T ( 4, "%i",  Ra (999, 1000)); /* { dg-warning "may write a terminating nul past the end of the 

[Patch, Fortran, committed] PR 66840: [OOP] ICE on declaring class variable with wrong attribute

2016-11-08 Thread Janus Weil
Hi all,

I have just committed as obvious a small patch for an ice-on-invalid problem:

https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=241979

Cheers,
Janus


Re: [PATCH fix PR71767 2/4 : Darwin configury] Arrange for ld64 to be detected as Darwin's linker

2016-11-08 Thread Mike Stump
On Nov 8, 2016, at 1:05 PM, Iain Sandoe  wrote:
> 
> Simple for the simple case is already part of my patch, but capability for 
> the expert and non-simple case is also present,

I'm trying to ask a specific question, what does the patch allow that can't be 
done otherwise?  Kinda a trivial question, and I don't see any answer.

I'm not looking for, it allows it to work, or it makes the expert case work.  
I'm looking for the specific question, and the specific information you want, and 
why ld -v doesn't get it.

For example, there is a host you are thinking of, there is a build system you 
are thinking of, there is a target you are thinking of.  There is a set of 
software you have, likely a particular Xcode release.  There are binaries that 
you might have built up for it, there might be headers you grab from some 
place, libraries or SDKs from another place.  This might be done for quick 
debugging a darwin problem on a linux host, but without the full ability to 
generate a tool chain, it might be to support a full tool chain.

I want it to work, without specifying the it, doesn't let me see what problem 
you are solving.

I read through the PR listed, and it seems to just list a typical 
x86_64-apple-darwin12.6.0 native port.  That really isn't rocket science is it? 
 I mean, we see the wanting from the build tools directly.  So, the change 
isn't to support that, is it?



Re: [PATCH fix PR71767 2/4 : Darwin configury] Arrange for ld64 to be detected as Darwin's linker

2016-11-08 Thread Iain Sandoe

> On 8 Nov 2016, at 10:27, Mike Stump  wrote:
> 
> On Nov 8, 2016, at 8:31 AM, Iain Sandoe  wrote:
>> 
>>> On 8 Nov 2016, at 08:18, Mike Stump  wrote:
>>> 
>>> On Nov 7, 2016, at 6:33 PM, Iain Sandoe  wrote:
 
>>>> a) right now, we need to know the target linker version - while it’s not 
>>>> impossible to try and conjure up some test to see if a linker we can run 
>>>> supports coalesced sections or not, the configury code and complexity 
>>>> needed to support that would exceed what I’m proposing at present (and 
>>>> still would not cover the native and canadian cases).
>>> 
>>> A traditional canadian can run the host linker for the target on the build 
>>> machine with --version (or whatever flag) and capture the version number.  
>>> I don't know what setup you have engineered for, since you didn't say.  
>>> First question, can you run the host linker for the target on the build 
>>> machine?  If so, you can directly capture the output.  The next question 
>>> is, is it the same version as the version that would be used on the host?
>> 
>> I suppose that one could demand that - and require a build thus.
> 
> It is a statement of what we have today, already a requirement.  Sorry you 
> missed the memo.  The problem is software usually changes through time, and 
> the old software can't acquire the features of the new software, so to get 
> those you need the new software, not the old.  If there are no new features, 
> in some rare cases, one might be able to use a range of versions, but the 
> range that would work, would be the range that an expert tested and 
> documented as working.  Absent that, it generally speaking isn't that safe to 
> assume it is safe.  And, when it isn't safe and doesn't work, well, it just 
> doesn't work.  Thinking it will, or hoping it will, won't make it so.  If 
> those features were able to be pushed back into the old software, you'd 
> merely reinvent a source distribution of the new software in different 
> clothes, poorly.
> 
>> If we demand that the same version linker is used for all, then perhaps that 
>> could work.
> 
> More than that, it will just work; by design; and in the cases where that 
> isn't the case, those are bugs that can be worked around or fixed.
> 
>> It seems likely that we’ll end up with mis-configures and stuff hard to 
>> support with non-expert build folks.
> 
> Nope.  Indeed, remember the only reason why we do this, is to make the 
> phrase, it just works, true.  If you are imagining anything else, the flaw is 
> mine in communicating why we do what it is that we do.
> 
>>>> I’m not debating the various solutions in your reply to Jeff - but 
>>>> honestly I wonder how many of them are realistically in reach of the 
>>>> typical end-user (I have done most of them at one stage or another, but I 
>>>> wonder how many would be stopped dead by “first find and build ld64, which 
>>>> itself needs a c++11 compiler and BTW needs you to build libLTO.dylib .. 
>>>> which needs you to build at least LLVM itself").
>>> 
>>> Package managers exist to solve that problem nicely, if someone wants a 
>>> trivial solution.  They have the ability to scoop up binaries and just copy 
>>> them onto a machine, solving hard chicken/egg problems. Other possibilities 
>>> are scripts that setup everything and release the scripts.
>> 
>> yes, I’m working on at least the latter (don’t have time to become a package 
>> manager).
> 
> The thing is, the build scripts have already been written, we're just 'making 
> them work'.
> 
>>>> am I missing a point here?
>>> 
>>> The answer to the two questions above.  The answer to the question, what 
>>> specific question do you want answered, and what is available to the build 
>>> machine, specifically to answer that question?
>>> 
>>> Also, you deflect on the os version to linker version number, but you never 
>>> said what didn't actually work.  What specifically doesn't work?  This 
>>> method is trivial and the mapping is direct and expressible in a few lines 
>>> per version supported.  I still maintain that the only limitation is you 
>>> must choose exactly 1 version per config triplet; I don't see that as a 
>>> problem.  If it were, I didn't see you explain the problem. Even if it is, 
>>> that problem is solved after the general problem that nothing works today.  
>>> By having at least _a_ mapping, you generally solve the problem for most 
>>> people, most of the time.
>> 
>> It *requires* that one configures arch-darwinNN .. and doesn’t allow for 
>> arch-darwin (to mean anything other than  build=host=target)
> 
> No, that's a misunderstanding of my position.  Use the filter, does this mean 
> 'it just works', or not.  If not, then that's not what I mean.
> 
>> If we can engineer that a suitable ld64 can be run at configuration time so 
>> that the version can be discovered automatically, I’m with you 100% - 

[PATCH] combine: Do not call simplify from inside change_zero_ext (PR78232)

2016-11-08 Thread Segher Boessenkool
When combine splits a three-insn combination into two instructions it
can reuse i2dest for the temporary result of the first new instruction.
However all information it has in reg_stat about that register will be
stale.  This results in the simplify_gen_binary calls in change_zero_ext
using out-of-date information, which makes it think one of the ANDs
generated there always results in 0, and it doesn't get better from there.

This can also happen if a splitter in the MD uses nonzero_bits (for
example).  I tried to make the splitting code in combine save and restore
the i2dest reg_stat info, but that causes one of the acats tests to fail.

This whole reg_stat thing needs an overhaul.

This patch changes change_zero_ext to do the expected simplifications
itself and not call simplify_gen_*.
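
As a rough illustration of what the second hunk below builds, here is a
minimal sketch in plain C rather than RTL.  It assumes a 32-bit register,
and insert_field is a hypothetical name; the locals x, y and z mirror the
rtx variables in the patch, and the shift is skipped when the offset is
zero, as the patch now does by hand:

```c
#include <assert.h>
#include <stdint.h>

/* Insert the low WIDTH bits of SRC into REG at bit OFFSET, leaving
   the other bits of REG unchanged.  */
static uint32_t
insert_field (uint32_t reg, uint32_t src, unsigned offset, unsigned width)
{
  assert (width >= 1 && offset < 32 && width <= 32 - offset);

  /* WIDTH one bits starting at OFFSET (plays the role of mask2).  */
  uint32_t low = width == 32 ? UINT32_MAX : (UINT32_C (1) << width) - 1;
  uint32_t field = low << offset;

  uint32_t x = reg & ~field;                  /* keep bits outside the field */
  uint32_t y = offset ? src << offset : src;  /* no shift for offset 0 */
  uint32_t z = y & field;                     /* drop bits outside the field */
  return x | z;
}
```

For example, inserting 0x5 as a 4-bit field at offset 4 into 0xffffffff
gives 0xffffff5f.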

Does anyone have a brighter idea?


Segher


2016-11-08  Segher Boessenkool  

PR rtl-optimization/78232
* combine.c (change_zero_ext): Do not call simplify_gen_binary, do
the simplifications manually.

---
 gcc/combine.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/gcc/combine.c b/gcc/combine.c
index 7ed0a62..40fe9b4 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -11163,8 +11163,10 @@ change_zero_ext (rtx pat)
  if (BITS_BIG_ENDIAN)
start = GET_MODE_PRECISION (mode) - size - start;
 
- x = simplify_gen_binary (LSHIFTRT, mode,
-  XEXP (x, 0), GEN_INT (start));
+ if (start)
+   x = gen_rtx_LSHIFTRT (mode, XEXP (x, 0), GEN_INT (start));
+ else
+   x = XEXP (x, 0);
}
   else if (GET_CODE (x) == ZERO_EXTEND
   && SCALAR_INT_MODE_P (mode)
@@ -11220,16 +11222,18 @@ change_zero_ext (rtx pat)
   if (BITS_BIG_ENDIAN)
offset = reg_width - width - offset;
 
+  rtx x, y, z, w;
   wide_int mask = wi::shifted_mask (offset, width, true, reg_width);
-  rtx x = gen_rtx_AND (mode, reg, immed_wide_int_const (mask, mode));
-  rtx y = simplify_gen_binary (ASHIFT, mode, SET_SRC (pat),
-  GEN_INT (offset));
   wide_int mask2 = wi::shifted_mask (offset, width, false, reg_width);
-  y = simplify_gen_binary (AND, mode, y,
-  immed_wide_int_const (mask2, mode));
-  rtx z = simplify_gen_binary (IOR, mode, x, y);
+  x = gen_rtx_AND (mode, reg, immed_wide_int_const (mask, mode));
+  if (offset)
+   y = gen_rtx_ASHIFT (mode, SET_SRC (pat), GEN_INT (offset));
+  else
+   y = SET_SRC (pat);
+  z = gen_rtx_AND (mode, y, immed_wide_int_const (mask2, mode));
+  w = gen_rtx_IOR (mode, x, z);
   SUBST (SET_DEST (pat), reg);
-  SUBST (SET_SRC (pat), z);
+  SUBST (SET_SRC (pat), w);
 
   changed = true;
 }
-- 
1.9.3



RE: [PATCH,testsuite] MIPS: Upgrade to MIPS IV if using (HAS_MOVN) with MIPS III.

2016-11-08 Thread Moore, Catherine


> -----Original Message-----
> From: Toma Tabacu [mailto:toma.tab...@imgtec.com]
> Sent: Monday, November 7, 2016 11:21 AM
> gcc/testsuite/ChangeLog:
> 
> 2016-11-07  Toma Tabacu  
> 
>   * gcc.target/mips/mips.exp (mips-dg-options): Upgrade to MIPS IV if 
> using
>   (HAS_MOVN) with MIPS III.
> 
> diff --git a/gcc/testsuite/gcc.target/mips/mips.exp
> b/gcc/testsuite/gcc.target/mips/mips.exp
> index 39f44ff..e22d782 100644
> --- a/gcc/testsuite/gcc.target/mips/mips.exp
> +++ b/gcc/testsuite/gcc.target/mips/mips.exp
> @@ -1129,7 +1129,7 @@ proc mips-dg-options { args } {
>  # We need MIPS IV or higher for:
>   #
>   #
> - } elseif { $isa < 3
> + } elseif { $isa < 4
>  && [mips_have_test_option_p options "HAS_MOVN"] }
> {
>   mips_make_test_option options "-mips4"
>  # We need MIPS III or higher for:

Hi Toma,

The patch itself is OK, but the ChangeLog entry line length is greater than 80.

Do you have write access to the repository?  Please let me know if you would 
like me to commit this for you.

Thanks,
Catherine



Re: [PATCH v2] AIX visibility

2016-11-08 Thread Christophe Lyon
Hi David,


On 8 November 2016 at 19:00, David Edelsohn  wrote:
> On Tue, Nov 8, 2016 at 10:23 AM, Christophe Lyon
>  wrote:
>> Hi David,
>>
>> On 2 November 2016 at 16:41, David Edelsohn  wrote:
>>> This revised patch makes two changes:
>>>
>>> 1) Fix typo in configure.ac
>>> 2) Add AIX visibility support for ASM_WEAKEN_DECL, which does touch
>>> the same code as Linux.
>>>
>>> The AIX "weak" support fixes a large number of C++ visibility testcases.
>>>
>>> Bootstrapped on powerpc-ibm-aix7.2.0.0.
>>>
>>> * configure.ac (.hidden): Change to conftest_s string. Provide string
>>> for AIX assembler.
>>> (gcc_cv_ld_hidden): Yes for AIX.
>>> * configure: Regenerate.
>>>
>>> * dwarf2asm.c (USE_LINKONCE_INDIRECT): Don't set for AIX (XCOFF).
>>>
>>> * config/rs6000/rs6000-protos.h (rs6000_asm_weaken_decl): Declare
>>> (rs6000_xcoff_asm_output_aligned_decl_common): Declare.
>>> * config/rs6000/xcoff.h (TARGET_ASM_GLOBALIZE_DECL_NAME): Define.
>>> (ASM_OUTPUT_ALIGNED_DECL_COMMON): Define.
>>> (ASM_OUTPUT_ALIGNED_COMMON): Delete.
>>> * config/rs6000/rs6000.c (rs6000_init_builtins): Change clog rename
>>> from #if to if.
>>> (rs6000_xcoff_visibility): New.
>>> (rs6000_xcoff_declare_function_name): Add visibility support.
>>> (rs6000_xcoff_asm_globalize_decl_name): New.
>>> (rs6000_xcoff_asm_output_aligned_decl_common): New.
>>> (rs6000_asm_weaken_decl): New.
>>> (rs6000_code_end): Disable HIDDEN_LINKONCE on XCOFF.
>>> * config/rs6000/rs6000.h (ASM_WEAKEN_DECL): Change definition to
>>> reference function.
>>>
>>> dwarf2asm.c okay?
>>>
>>> Any comments on ASM_WEAKEN_DECL change?
>>>
>>> Thanks, David
>>
>> It seems this commit (r241930) is causing a regression on aarch64:
>> FAIL: g++.dg/torture/pr60750.C   -O2 -flto -fno-use-linker-plugin
>> -flto-partition=none  execution test
>
> Hi, Christophe
>
> Because GCC wants to move toward runtime tests, the appended patch is
> a better solution.
>
> Thanks, David
>
> Index: dwarf2asm.c
> ===
> --- dwarf2asm.c (revision 241972)
> +++ dwarf2asm.c (working copy)
> @@ -824,8 +824,8 @@
>
>  static GTY(()) int dw2_const_labelno;
>
> -#if defined(HAVE_GAS_HIDDEN) && !defined(XCOFF_DEBUGGING_INFO)
> -# define USE_LINKONCE_INDIRECT (SUPPORTS_ONE_ONLY)
> +#if defined(HAVE_GAS_HIDDEN)
> +# define USE_LINKONCE_INDIRECT (SUPPORTS_ONE_ONLY && !XCOFF_DEBUGGING_INFO)
>  #else
>  # define USE_LINKONCE_INDIRECT 0
>  #endif

I confirm this 2nd version works.

Thanks

Christophe


Re: Prevent aliasing between arguments in calls to move_alloc

2016-11-08 Thread Steve Kargl
Yes.  I saw Ian's analysis in c.l.f.  It seems we both got
caught out on this one.  The patch looks fine.

-- 
steve

On Tue, Nov 08, 2016 at 08:26:37PM +0100, Paul Richard Thomas wrote:
> Hi Steve,
> 
> I moved too quickly and caused a regression. See the link in the
> testcase. The attached fixes the problem and bootstraps/regtests.
> 
> OK for trunk?
> 
> Paul
> 
> 
> On 5 November 2016 at 16:17, Steve Kargl
>  wrote:
> > On Sat, Nov 05, 2016 at 10:05:30AM +0100, Paul Richard Thomas wrote:
> >>
> >> Bootstraps and regtests on FC21/x86_64 - OK for trunk?
> >
> > OK with minor nit (see below).
> >
> >>
> >> +   /*  F2003 12.4.1.7  */
> >> +   if (to->expr_type == EXPR_VARIABLE && from->expr_type ==EXPR_VARIABLE
> >
> > Need a space after ==.
> >
> > --
> > Steve
> 
> 
> 
> -- 
> The difference between genius and stupidity is; genius has its limits.
> 
> Albert Einstein

> Index: gcc/fortran/check.c
> ===
> *** gcc/fortran/check.c   (revision 241872)
> --- gcc/fortran/check.c   (working copy)
> *************** gfc_check_move_alloc (gfc_expr *from, gf
> *** 3343,3355 ****
>   }
>   
> /*  F2003 12.4.1.7  */
> !   if (to->expr_type == EXPR_VARIABLE && from->expr_type ==EXPR_VARIABLE
> && !strcmp (to->symtree->n.sym->name, from->symtree->n.sym->name))
>   {
> !   gfc_error ("The FROM and TO arguments at %L are either the same object "
> !  "or subobjects thereof and so violate aliasing restrictions "
> !  "(F2003 12.4.1.7)", &to->where);
> !   return false;
>   }
>   
> /* CLASS arguments: Make sure the vtab of from is present.  */
> --- 3343,3380 ----
>   }
>   
> /*  F2003 12.4.1.7  */
> !   if (to->expr_type == EXPR_VARIABLE && from->expr_type == EXPR_VARIABLE
> && !strcmp (to->symtree->n.sym->name, from->symtree->n.sym->name))
>   {
> !   gfc_ref *to_ref, *from_ref;
> !   to_ref = to->ref;
> !   from_ref = from->ref;
> !   bool aliasing = true;
> ! 
> !   for (; from_ref && to_ref;
> !from_ref = from_ref->next, to_ref = to_ref->next)
> ! {
> !   if (to_ref->type != from->ref->type)
> ! aliasing = false;
> !   else if (to_ref->type == REF_ARRAY
> !&& to_ref->u.ar.type != AR_FULL
> !&& from_ref->u.ar.type != AR_FULL)
> ! /* Play safe; assume sections and elements are different.  */
> ! aliasing = false;
> !   else if (to_ref->type == REF_COMPONENT
> !&& to_ref->u.c.component != from_ref->u.c.component)
> ! aliasing = false;
> ! 
> !   if (!aliasing)
> ! break;
> ! }
> ! 
> !   if (aliasing)
> ! {
> !   gfc_error ("The FROM and TO arguments at %L violate aliasing "
> !  "restrictions (F2003 12.4.1.7)", &to->where);
> !   return false;
> ! }
>   }
>   
> /* CLASS arguments: Make sure the vtab of from is present.  */
> Index: gcc/testsuite/gfortran.dg/move_alloc_18.f90
> ===
> *** gcc/testsuite/gfortran.dg/move_alloc_18.f90   (revision 0)
> --- gcc/testsuite/gfortran.dg/move_alloc_18.f90   (working copy)
> ***************
> *** 0 ****
> --- 1,21 ----
> + ! { dg-do compile }
> + !
> + ! Test that the anti-aliasing restriction does not knock out valid code.
> + !
> + ! Contributed by  Andrew Balwin on
> + ! https://groups.google.com/forum/#!topic/comp.lang.fortran/oiXdl1LPb_s
> + !
> +   PROGRAM TEST
> + IMPLICIT NONE
> + 
> + TYPE FOOBAR
> +   INTEGER, ALLOCATABLE :: COMP(:)
> + END TYPE
> + 
> + TYPE (FOOBAR) :: MY_ARRAY(6)
> + 
> + ALLOCATE (MY_ARRAY(1)%COMP(10))
> + 
> + CALL MOVE_ALLOC (MY_ARRAY(1)%COMP, MY_ARRAY(2)%COMP)
> + 
> +   END PROGRAM TEST


-- 
Steve


Re: Prevent aliasing between arguments in calls to move_alloc

2016-11-08 Thread Paul Richard Thomas
Hi Steve,

I moved too quickly and caused a regression. See the link in the
testcase. The attached fixes the problem and bootstraps/regtests.

OK for trunk?

Paul


On 5 November 2016 at 16:17, Steve Kargl wrote:
> On Sat, Nov 05, 2016 at 10:05:30AM +0100, Paul Richard Thomas wrote:
>>
>> Bootstraps and regtests on FC21/x86_64 - OK for trunk?
>
> OK with minor nit (see below).
>
>>
>> +   /*  F2003 12.4.1.7  */
>> +   if (to->expr_type == EXPR_VARIABLE && from->expr_type ==EXPR_VARIABLE
>
> Need a space after ==.
>
> --
> Steve



-- 
The difference between genius and stupidity is; genius has its limits.

Albert Einstein
Index: gcc/fortran/check.c
===
*** gcc/fortran/check.c (revision 241872)
--- gcc/fortran/check.c (working copy)
*** gfc_check_move_alloc (gfc_expr *from, gf
*** 3343,3355 
  }
  
/*  F2003 12.4.1.7  */
!   if (to->expr_type == EXPR_VARIABLE && from->expr_type ==EXPR_VARIABLE
&& !strcmp (to->symtree->n.sym->name, from->symtree->n.sym->name))
  {
!   gfc_error ("The FROM and TO arguments at %L are either the same object "
!"or subobjects thereof and so violate aliasing restrictions "
!"(F2003 12.4.1.7)", &to->where);
!   return false;
  }
  
/* CLASS arguments: Make sure the vtab of from is present.  */
--- 3343,3380 
  }
  
/*  F2003 12.4.1.7  */
!   if (to->expr_type == EXPR_VARIABLE && from->expr_type == EXPR_VARIABLE
&& !strcmp (to->symtree->n.sym->name, from->symtree->n.sym->name))
  {
!   gfc_ref *to_ref, *from_ref;
!   to_ref = to->ref;
!   from_ref = from->ref;
!   bool aliasing = true;
! 
!   for (; from_ref && to_ref;
!  from_ref = from_ref->next, to_ref = to_ref->next)
!   {
! if (to_ref->type != from->ref->type)
!   aliasing = false;
! else if (to_ref->type == REF_ARRAY
!  && to_ref->u.ar.type != AR_FULL
!  && from_ref->u.ar.type != AR_FULL)
!   /* Play safe; assume sections and elements are different.  */
!   aliasing = false;
! else if (to_ref->type == REF_COMPONENT
!  && to_ref->u.c.component != from_ref->u.c.component)
!   aliasing = false;
! 
! if (!aliasing)
!   break;
!   }
! 
!   if (aliasing)
!   {
! gfc_error ("The FROM and TO arguments at %L violate aliasing "
!"restrictions (F2003 12.4.1.7)", &to->where);
! return false;
!   }
  }
  
/* CLASS arguments: Make sure the vtab of from is present.  */
Index: gcc/testsuite/gfortran.dg/move_alloc_18.f90
===
*** gcc/testsuite/gfortran.dg/move_alloc_18.f90 (revision 0)
--- gcc/testsuite/gfortran.dg/move_alloc_18.f90 (working copy)
***
*** 0 
--- 1,21 
+ ! { dg-do compile }
+ !
+ ! Test that the anti-aliasing restriction does not knock out valid code.
+ !
+ ! Contributed by  Andrew Balwin on
+ ! https://groups.google.com/forum/#!topic/comp.lang.fortran/oiXdl1LPb_s
+ !
+   PROGRAM TEST
+ IMPLICIT NONE
+ 
+ TYPE FOOBAR
+   INTEGER, ALLOCATABLE :: COMP(:)
+ END TYPE
+ 
+ TYPE (FOOBAR) :: MY_ARRAY(6)
+ 
+ ALLOCATE (MY_ARRAY(1)%COMP(10))
+ 
+ CALL MOVE_ALLOC (MY_ARRAY(1)%COMP, MY_ARRAY(2)%COMP)
+ 
+   END PROGRAM TEST


[PATCH, i386]: Fix PR70799, STV pass does not convert DImode shifts

2016-11-08 Thread Uros Bizjak
Hello!

Attached patch converts non-variable DImode shifts to SSE shifts on
32bit targets.

Please note that the patch doesn't convert variable shifts. We can't
just use the QImode count register of the integer shift in SImode to
implement SSE shifts. The bits outside QImode can be non-zero, and the
narrowest mode for copying a value from an integer to an SSE register
is SImode. Since SSE shifts truncate for count values outside the
allowed range, the shifted value can be truncated to zero when the
count register is used in a wider mode (SImode).

The problem above can be solved by zero-extending the count value from
QImode to SImode first, but since we are saving only *one* shift
operation (out of two), I think this additional operation won't make
the conversion profitable anymore.
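
The count-width problem can be illustrated with a small C model. This is only a sketch, not GCC code: it assumes the scalar shift looks at just the low 8 bits (QImode) of the count register, and that an SSE shift such as psllq gives zero for any count above 63. The function names are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model: only the low 8 bits (QImode) of the count register matter.  */
uint64_t
scalar_shl (uint64_t value, uint32_t count_reg)
{
  unsigned int count = count_reg & 0xff;

  assert (count < 64);  /* Larger counts are outside this model.  */
  return value << count;
}

/* SSE model: the whole count is examined; anything above 63 gives zero.  */
uint64_t
sse_shl (uint64_t value, uint32_t count_reg)
{
  return count_reg > 63 ? 0 : value << count_reg;
}

/* The extra step discussed above: zero-extend the QImode count to SImode
   before handing it to the SSE shift.  */
uint64_t
sse_shl_zext (uint64_t value, uint32_t count_reg)
{
  return sse_shl (value, count_reg & 0xff);
}
```

With a count register holding 0x105 (count 5 plus a stray bit outside QImode), scalar_shl and sse_shl_zext both shift by 5, while sse_shl returns zero.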

The patch also converts only the non-variable counts that would
otherwise perform two shift operations (e.g. shifts > 31 bits would
originally result in one SImode register being zero).
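
As a rough illustration of why counts below 32 are the interesting case, here is a C sketch of a 64-bit left shift done on two 32-bit halves. It is a simplified model, not the code GCC emits; the function name and the shift_ops counter are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Left-shift a 64-bit value held as two 32-bit halves and report how many
   32-bit shift operations the scalar sequence needs in this model.  */
uint64_t
shl64_by_halves (uint32_t lo, uint32_t hi, unsigned int count, int *shift_ops)
{
  uint32_t new_lo, new_hi;

  assert (count < 64 && shift_ops != 0);
  if (count == 0)
    {
      new_lo = lo;
      new_hi = hi;
      *shift_ops = 0;
    }
  else if (count < 32)
    {
      /* A double-precision (shld-like) shift plus a plain shift.  */
      new_hi = (hi << count) | (lo >> (32 - count));
      new_lo = lo << count;
      *shift_ops = 2;
    }
  else
    {
      /* The low half becomes zero; at most one real shift remains.  */
      new_hi = lo << (count - 32);
      new_lo = 0;
      *shift_ops = 1;
    }
  return ((uint64_t) new_hi << 32) | new_lo;
}
```

In this model only counts from 1 to 31 cost two shifts, which is the case where a single SSE shift can save an operation.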

The patch noticeably improves compiled assembly from crypto code in
libgo and from random generators in libgfortran, resulting in longer
STV sequences on 32bit targets.

2016-11-08  Uros Bizjak  

* config/i386/i386.c (dimode_scalar_to_vector_candidate_p):
Handle ASHIFT and LSHIFTRT.
(dimode_scalar_chain::compute_convert_gain): Ditto.
(dimode_scalar_chain::convert_insn): Ditto.

testsuite/ChangeLog:

2016-11-08  Uros Bizjak  

* gcc.target/i386/pr70799-2.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 241929)
+++ config/i386/i386.c  (working copy)
@@ -2805,11 +2805,24 @@ dimode_scalar_to_vector_candidate_p (rtx_insn *ins
 
   switch (GET_CODE (src))
 {
+case ASHIFT:
+case LSHIFTRT:
+  /* Consider only non-variable shifts narrower
+than general register width.  */
+  if (!(CONST_INT_P (XEXP (src, 1))
+   && IN_RANGE (INTVAL (XEXP (src, 1)), 0, 31)))
+   return false;
+  break;
+
 case PLUS:
 case MINUS:
 case IOR:
 case XOR:
 case AND:
+  if (!REG_P (XEXP (src, 1))
+ && !MEM_P (XEXP (src, 1))
+ && !CONST_INT_P (XEXP (src, 1)))
+   return false;
   break;
 
 case REG:
@@ -2832,11 +2845,6 @@ dimode_scalar_to_vector_candidate_p (rtx_insn *ins
  || !REG_P (XEXP (XEXP (src, 0), 0))))
   return false;
 
-  if (!REG_P (XEXP (src, 1))
-  && !MEM_P (XEXP (src, 1))
-  && !CONST_INT_P (XEXP (src, 1)))
-  return false;
-
   if ((GET_MODE (XEXP (src, 0)) != DImode
&& !CONST_INT_P (XEXP (src, 0)))
   || (GET_MODE (XEXP (src, 1)) != DImode
@@ -3387,6 +3395,13 @@ dimode_scalar_chain::compute_convert_gain ()
gain += 2 * ix86_cost->int_store[2] - ix86_cost->sse_store[1];
   else if (MEM_P (src) && REG_P (dst))
gain += 2 * ix86_cost->int_load[2] - ix86_cost->sse_load[1];
+  else if (GET_CODE (src) == ASHIFT
+  || GET_CODE (src) == LSHIFTRT)
+   {
+ gain += ix86_cost->add;
+ if (CONST_INT_P (XEXP (src, 0)))
+   gain -= vector_const_cost (XEXP (src, 0));
+   }
   else if (GET_CODE (src) == PLUS
   || GET_CODE (src) == MINUS
   || GET_CODE (src) == IOR
@@ -3738,6 +3753,12 @@ dimode_scalar_chain::convert_insn (rtx_insn *insn)
 
   switch (GET_CODE (src))
 {
+case ASHIFT:
+case LSHIFTRT:
+  convert_op (&XEXP (src, 0), insn);
+  PUT_MODE (src, V2DImode);
+  break;
+
 case PLUS:
 case MINUS:
 case IOR:
Index: testsuite/gcc.target/i386/pr70799-2.c
===
--- testsuite/gcc.target/i386/pr70799-2.c   (nonexistent)
+++ testsuite/gcc.target/i386/pr70799-2.c   (working copy)
@@ -0,0 +1,17 @@
+/* PR target/pr70799 */
+/* { dg-do compile { target { ia32 } } } */
+/* { dg-options "-O2 -march=slm -mno-stackrealign" } */
+/* { dg-final { scan-assembler "psllq" } } */
+/* { dg-final { scan-assembler "psrlq" } } */
+
+unsigned long long a, b;
+
+void test1 (void)
+{
+  a = b << 21;
+}
+
+void test2 (void)
+{
+  a = b >> 21;
+}


Re: [PATCH] Fix regex_iterator end() state and operator==()

2016-11-08 Thread Jonathan Wakely
On 8 November 2016 at 07:01, Tim Shen wrote:
> This fixes libstdc++/78236. I'm surprised that this bug was not
> revealed until now :P.
>
> Bootstrapped and tested under x86_64-linux-gnu.
>
> I'm happy with however many backports.

OK for trunk, let's revisit in a few weeks and consider backporting to 6 and 5.

Thanks.


Re: [PATCH fix PR71767 2/4 : Darwin configury] Arrange for ld64 to be detected as Darwin's linker

2016-11-08 Thread Mike Stump
On Nov 8, 2016, at 8:31 AM, Iain Sandoe  wrote:
> 
>> On 8 Nov 2016, at 08:18, Mike Stump  wrote:
>> 
>> On Nov 7, 2016, at 6:33 PM, Iain Sandoe  wrote:
>>> 
>>> a) right now, we need to know the target linker version - while it’s not 
>>> impossible to try and conjure up some test to see if a linker we can run 
>>> supports coalesced sections or not, the configury code and complexity 
>>> needed to support that would exceed what I’m proposing at present (and 
>>> still would not cover the native and canadian cases).
>> 
>> A traditional canadian can run the host linker for the target on the build 
>> machine with --version (or whatever flag) and capture the version number.  I 
>> don't know what setup you have engineered for, since you didn't say.  First 
>> question, can you run the host linker for the target on the build machine?  
>> If so, you can directly capture the output.  The next question is, is it the 
>> same version as the version that would be used on the host?
> 
> I suppose that one could demand that - and require a build thus.

It is a statement of what we have today, already a requirement.  Sorry you 
missed the memo.  The problem is software usually changes through time, and the 
old software can't acquire the features of the new software, so to get those 
you need the new software, not the old.  If there are no new features, in some 
rare cases, one might be able to use a range of versions, but the range that 
would work, would be the range that an expert tested and documented as working. 
 Absent that, it generally speaking isn't that safe to assume it is safe.  And, 
when it isn't safe and doesn't work, well, it just doesn't work.  Thinking it 
will, or hoping it will, won't make it so.  If those features were able to be 
pushed back into the old software, you'd merely reinvent a source distribution 
of the new software in different clothes, poorly.

> If we demand that the same version linker is used for all, then perhaps that 
> could work.

More than that, it will just work; by design; and in the cases where that isn't 
the case, those are bugs that can be worked around or fixed.

> It seems likely that we’ll end up with mis-configures and stuff hard to 
> support with non-expert build folks.

Nope.  Indeed, remember the only reason why we do this, is to make the phrase, 
it just works, true.  If you are imagining anything else, the flaw is mine in 
communicating why we do what it is that we do.

>>> I’m not debating the various solutions in your reply to Jeff - but honestly 
>>> I wonder how many of them are realistically in reach of the typical 
>>> end-user (I have done most of them at one stage or another, but I wonder 
>>> how many would be stopped dead by “first find and build ld64, which itself 
>>> needs a c++11 compiler and BTW needs you to build libLTO.dylib .. which 
>>> needs you to build at least LLVM itself").
>> 
>> Package managers exist to solve that problem nicely, if someone wants a 
>> trivial solution.  They have the ability to scoop up binaries and just copy 
>> them onto a machine, solving hard chicken/egg problems.  Other possibilities 
>> are scripts that setup everything and release the scripts.
> 
> yes, I’m working on at least the latter (don’t have time to become a package 
> manager).

The thing is, the build scripts have already been written, we're just 'making 
them work'.

>>> am I missing a point here?
>> 
>> The answer to the two questions above.  The answer to the question, what 
>> specific question do you want answered, and what is available to the build 
>> machine, specifically to answer that question?
>> 
>> Also, you deflect on the os version to linker version number, but you never 
>> said what didn't actually work.  What specifically doesn't work?  This 
>> method is trivial and the mapping is direct and expressible in a few lines 
>> per version supported.  I still maintain that the only limitation is you 
>> must choose exactly 1 version per config triplet; I don't see that as a 
>> problem.  If it were, I didn't see you explain the problem.  Even if it is, 
>> that problem is solved after the general problem that nothing works today.  
>> By having at least _a_ mapping, you generally solve the problem for most 
>> people, most of the time.
> 
> It *requires* that one configures arch-darwinNN .. and doesn’t allow for 
> arch-darwin (to mean anything other than  build=host=target)

No, that's a misunderstanding of my position.  Use the filter, does this mean 
'it just works', or not.  If not, then that's not what I mean.

> If we can engineer that a suitable ld64 can be run at configuration time so 
> that the version can be discovered automatically, I’m with you 100% - but the 
> scenarios put forward seem very complex for typical folks,

A linker is a requirement of a build, this isn't supposed to be opaque.  A 
linker means you can run it.  Again, not 

Re: [PATCH] use-after-scope fallout

2016-11-08 Thread David Malcolm
On Tue, 2016-11-08 at 13:00 +0100, Martin Liška wrote:
> Hello.
> 
> This is fallout fix where I changed:
> 
> 1) Fix ICE for lambda functions (added test-case: use-after-scope
> -4.C)
> 2) Fix ICE in gimplify_switch_expr, at gimplify.c:2269 (fixed by not
> adding
> artificial variables)
> 3) PR testsuite/78242 - I basically removed the test (not
> interesting)
> 4) LEAF and NOTHROW flags are properly set on ASAN {un}poison
> functions
> 5) dbg_cnt has been added
> 6) use-after-scope-types-4.C - scanned pattern is updated to work on
> i686
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression
> tests.
> 
> Ready to be installed?

Thanks.  The jit build is now fixed (as of r241961).



Re: [PATCH v2] AIX visibility

2016-11-08 Thread David Edelsohn
On Tue, Nov 8, 2016 at 10:23 AM, Christophe Lyon wrote:
> Hi David,
>
> On 2 November 2016 at 16:41, David Edelsohn  wrote:
>> This revised patch makes two changes:
>>
>> 1) Fix typo in configure.ac
>> 2) Add AIX visibility support for ASM_WEAKEN_DECL, which does touch
>> the same code as Linux.
>>
>> The AIX "weak" support fixes a large number of C++ visibility testcases.
>>
>> Bootstrapped on powerpc-ibm-aix7.2.0.0.
>>
>> * configure.ac (.hidden): Change to conftest_s string. Provide string
>> for AIX assembler.
>> (gcc_cv_ld_hidden): Yes for AIX.
>> * configure: Regenerate.
>>
>> * dwarf2asm.c (USE_LINKONCE_INDIRECT): Don't set for AIX (XCOFF).
>>
>> * config/rs6000/rs6000-protos.h (rs6000_asm_weaken_decl): Declare
>> (rs6000_xcoff_asm_output_aligned_decl_common): Declare.
>> * config/rs6000/xcoff.h (TARGET_ASM_GLOBALIZE_DECL_NAME): Define.
>> (ASM_OUTPUT_ALIGNED_DECL_COMMON): Define.
>> (ASM_OUTPUT_ALIGNED_COMMON): Delete.
>> * config/rs6000/rs6000.c (rs6000_init_builtins): Change clog rename
>> from #if to if.
>> (rs6000_xcoff_visibility): New.
>> (rs6000_xcoff_declare_function_name): Add visibility support.
>> (rs6000_xcoff_asm_globalize_decl_name): New.
>> (rs6000_xcoff_asm_output_aligned_decl_common): New.
>> (rs6000_asm_weaken_decl): New.
>> (rs6000_code_end): Disable HIDDEN_LINKONCE on XCOFF.
>> config/rs6000/rs6000.h (ASM_WEAKEN_DECL): Change definition to
>> reference function.
>>
>> dwarf2asm.c okay?
>>
>> Any comments on ASM_WEAKEN_DECL change?
>>
>> Thanks, David
>
> It seems this commit (r241930) is causing a regression on aarch64:
> FAIL: g++.dg/torture/pr60750.C   -O2 -flto -fno-use-linker-plugin
> -flto-partition=none  execution test

Hi, Christophe

Because GCC wants to move toward runtime tests, the appended patch is
a better solution.

Thanks, David

Index: dwarf2asm.c
===
--- dwarf2asm.c (revision 241972)
+++ dwarf2asm.c (working copy)
@@ -824,8 +824,8 @@

 static GTY(()) int dw2_const_labelno;

-#if defined(HAVE_GAS_HIDDEN) && !defined(XCOFF_DEBUGGING_INFO)
-# define USE_LINKONCE_INDIRECT (SUPPORTS_ONE_ONLY)
+#if defined(HAVE_GAS_HIDDEN)
+# define USE_LINKONCE_INDIRECT (SUPPORTS_ONE_ONLY && !XCOFF_DEBUGGING_INFO)
 #else
 # define USE_LINKONCE_INDIRECT 0
 #endif
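
The difference between the two forms of the test can be seen in a standalone C sketch. It assumes XCOFF_DEBUGGING_INFO is always defined, with value 0 on targets that do not use XCOFF; that is an assumption consistent with the patch above, not something checked here. The macro values and the OLD_/NEW_ names are stand-ins for illustration only.

```c
#include <assert.h>

/* Stand-in configuration for a hypothetical non-XCOFF target.  */
#define HAVE_GAS_HIDDEN 1
#define SUPPORTS_ONE_ONLY 1
#define XCOFF_DEBUGGING_INFO 0

/* Old form: tests whether the macro is defined, not what its value is.  */
#if defined(HAVE_GAS_HIDDEN) && !defined(XCOFF_DEBUGGING_INFO)
# define OLD_USE_LINKONCE_INDIRECT (SUPPORTS_ONE_ONLY)
#else
# define OLD_USE_LINKONCE_INDIRECT 0
#endif

/* New form: the value takes part in an ordinary expression.  */
#if defined(HAVE_GAS_HIDDEN)
# define NEW_USE_LINKONCE_INDIRECT (SUPPORTS_ONE_ONLY && !XCOFF_DEBUGGING_INFO)
#else
# define NEW_USE_LINKONCE_INDIRECT 0
#endif

/* With the macro defined to 0, the old form disables the feature.  */
static_assert (OLD_USE_LINKONCE_INDIRECT == 0, "old form sees a defined macro");

int old_use_linkonce_indirect (void) { return OLD_USE_LINKONCE_INDIRECT; }
int new_use_linkonce_indirect (void) { return NEW_USE_LINKONCE_INDIRECT; }
```

Under these assumptions the old test turns USE_LINKONCE_INDIRECT off on every target, while the new one only turns it off where XCOFF_DEBUGGING_INFO is non-zero.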


Re: [PATCH 0/2] strncmp builtin expansion improvement

2016-11-08 Thread Jeff Law

On 11/05/2016 10:32 PM, Aaron Sawdey wrote:
> On Fri, 2016-11-04 at 20:43 -0600, Jeff Law wrote:
>> So what's the motivation here?  When we don't have any constants then
>> I'd think we'd be better off punting into the library.
>
> When none of the args to strncmp are constant, I'd be inclined to
> agree. However the current state of affairs is that strncmp is not
> expanded in the case where the length is a constant but the strings are
> not. This patch allows the expansion to be attempted.

Ah.  Yea, there are probably cases where we'd want to give the target
some control when the length is a known constant, but the strings are
unknowns.


For small constants (1-2 chars) the expander should probably just 
inline.  Beyond that we could be querying the target.
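
For a sense of what "just inline" could amount to for a length of 2, here is a hand-written C equivalent of strncmp (a, b, 2). It is only a sketch of the idea, not what the expander generates, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hand-inlined equivalent of strncmp (a, b, 2).  */
int
strncmp_len2 (const char *a, const char *b)
{
  const unsigned char *ua = (const unsigned char *) a;
  const unsigned char *ub = (const unsigned char *) b;

  assert (a != NULL && b != NULL);
  /* First character: a difference or a terminating NUL decides the result.  */
  if (ua[0] != ub[0] || ua[0] == '\0')
    return ua[0] - ub[0];
  /* Second (and last) character.  */
  return ua[1] - ub[1];
}
```

The whole comparison is two byte loads per string and at most two compares, which is the kind of sequence that is cheap to emit without a library call.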


jeff



[PATCH] DECL_RTL and DECL_RTL_INCOMING in RTL dumps

2016-11-08 Thread David Malcolm
Whilst working on the RTL frontend, I ran into various crashes
relating to missing RTL information for params, for DECL_RTL, and
DECL_RTL_INCOMING.

These are normally set up for a PARM_DECL by "expand", but are
currently NULL when reading dumps from print_rtx_function.

Attempting to access DECL_RTL without initialization leads to an
attempt to lazily set the RTL, which fails here in make_decl_rtl:

  /* Check that we are not being given an automatic variable.  */
  gcc_assert (TREE_CODE (decl) != PARM_DECL
              && TREE_CODE (decl) != RESULT_DECL);

Similarly, DECL_RTL_INCOMING is sometimes accessed by some passes, and
is currently NULL when reading RTL dumps.

I don't think we can re-run parts of expand, so I think we need to
store the values of DECL_RTL and DECL_RTL_INCOMING for PARM_DECLs in
the dump format.

The following patch implements this for print_rtx_function.

For example, a function on aarch64 taking one int:

int __RTL("rtl-combine") f1 (int n)
{
(function "f1"
  (param "n"
(DECL_RTL
  (reg/v:SI %1 [ n ])
) ;; DECL_RTL
(DECL_RTL_INCOMING
  (reg:SI x0 [ n ])
) ;; DECL_RTL_INCOMING
  ) ;; param "n"
  (insn-chain
;; etc

and a function on x86_64 taking three ints:

int __RTL("rtl-vregs") test_1 (int i, int j, int k)
{
(function "test_1"
  (param "i"
(DECL_RTL
  (mem/c:SI (plus:DI (reg/f:DI virtual-stack-vars)
(const_int -4)) [1 i+0 S4 A32])
) ;; DECL_RTL
(DECL_RTL_INCOMING
  (reg:SI di [ i ])
) ;; DECL_RTL_INCOMING
  ) ;; param "i"
  (param "j"
(DECL_RTL
  (mem/c:SI (plus:DI (reg/f:DI virtual-stack-vars)
(const_int -8)) [1 j+0 S4 A32])
) ;; DECL_RTL
(DECL_RTL_INCOMING
  (reg:SI si [ j ])
) ;; DECL_RTL_INCOMING
  ) ;; param "j"
  (param "k"
(DECL_RTL
  (mem/c:SI (plus:DI (reg/f:DI virtual-stack-vars)
(const_int -12)) [1 k+0 S4 A32])
) ;; DECL_RTL
(DECL_RTL_INCOMING
  (reg:SI dx [ k ])
) ;; DECL_RTL_INCOMING
  ) ;; param "k"
  (insn-chain
;; etc

I don't like how verbose the output is, but I think it's needed.
Or we could move it to after the "insn-chain" directive.

I have working code for the RTL frontend to read this format, and
it fixes the various bugs I ran into.

Only lightly tested so far.

OK for trunk if it passes bootstrap and regrtest?

gcc/ChangeLog:
* print-rtl-function.c (print_any_param_name): New function.
(print_param): New function.
(print_rtx_function): Call print_param for each argument.
---
 gcc/print-rtl-function.c | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/gcc/print-rtl-function.c b/gcc/print-rtl-function.c
index b62f1b3..9b1155d 100644
--- a/gcc/print-rtl-function.c
+++ b/gcc/print-rtl-function.c
@@ -127,6 +127,40 @@ can_have_basic_block_p (const rtx_insn *insn)
   return true;
 }
 
+/* Subroutine of print_param.  Write the name of ARG, if any, to OUTFILE.  */
+
+static void
+print_any_param_name (FILE *outfile, tree arg)
+{
+  if (DECL_NAME (arg))
+fprintf (outfile, " \"%s\"", IDENTIFIER_POINTER (DECL_NAME (arg)));
+}
+
+/* Print a "(param)" directive for ARG to OUTFILE.  */
+
+static void
+print_param (FILE *outfile, rtx_writer &w, tree arg)
+{
+  fprintf (outfile, "  (param");
+  print_any_param_name (outfile, arg);
+  fprintf (outfile, "\n");
+
+  /* Print the value of DECL_RTL (without lazy-evaluation).  */
+  fprintf (outfile, "(DECL_RTL\n");
+  rtx decl_rtl = DECL_WRTL_CHECK (arg)->decl_with_rtl.rtl;
+  w.print_rtl_single_with_indent (decl_rtl, 6);
+  fprintf (outfile, ") ;; DECL_RTL\n");
+
+  /* Print DECL_INCOMING_RTL.  */
+  fprintf (outfile, "(DECL_RTL_INCOMING\n");
+  w.print_rtl_single_with_indent (DECL_INCOMING_RTL (arg), 6);
+  fprintf (outfile, ") ;; DECL_RTL_INCOMING\n");
+
+  fprintf (outfile, "  ) ;; param");
+  print_any_param_name (outfile, arg);
+  fprintf (outfile, "\n");
+}
+
 /* Write FN to OUTFILE in a form suitable for parsing, with indentation
and comments to make the structure easy for a human to grok.  Track
the basic blocks of insns in the chain, wrapping those that are within
@@ -202,6 +236,10 @@ print_rtx_function (FILE *outfile, function *fn, bool compact)
 
   fprintf (outfile, "(function \"%s\"\n", dname);
 
+  /* Params.  */
+  for (tree arg = DECL_ARGUMENTS (fdecl); arg; arg = DECL_CHAIN (arg))
+print_param (outfile, w, arg);
+
   /* The instruction chain.  */
   fprintf (outfile, "  (insn-chain\n");
   basic_block curr_bb = NULL;
-- 
1.8.5.3



Re: [PATCH fix PR71767 2/4 : Darwin configury] Arrange for ld64 to be detected as Darwin's linker

2016-11-08 Thread Mike Stump
On Nov 7, 2016, at 9:15 PM, Iain Sandoe  wrote:
> 
> this is pretty “black belt” stuff - I don’t see most of our users wanting to 
> dive this deeply … 

No, not really.  It is true that newlib and GNU sim need a few more lines of 
code, but those lines are written for linux and most other posix style systems 
already and can be contributed.  Once they are in, I used a vanishingly small 
change to configure, to put $RUN on the front of some command lines.  RUN="" 
for natives, and RUN=triple-run for selecting the GNU simulator.  In later 
builds, I didn't even do that, I do things like:

  CC=triplet-run\ $install/bin/triplet-gcc CXX=cyclops2e-run\ 
$install/bin/triplet-g++ ../newlib/configure --target=triplet --host=triplet2 
--prefix=$install
  make -j$N CC_FOR_TARGET="triplet-run $install/bin/triplet-gcc"
  make -j$N CC_FOR_TARGET="triplet-run $install/bin/triplet-gcc" install

as an example of how to configure and build up newlib.  Works just fine as is.  
gcc is the same, and just as easy.  For cross natives, don't expect it can be 
much easier, unless I added 2 lines of code into configure that tacked in the 
$RUN by itself.  If that were done, it's literally just a couple of lines, and 
then presto, it is the standard documented build methodology as it exists today.

>> Also, for darwin, in some cases, we can actually run the target or host 
>> programs on the build machine directly.
> 
> I have that (at least weakly) in the patch posted

If you can do that, then you can run:
$ /usr/bin/ld -v
@(#)PROGRAM:ld  PROJECT:ld64-274.1
configured to support archs: armv6 armv7 armv7s arm64 i386 x86_64 x86_64h 
armv6m armv7k armv7m armv7em (tvOS)
LTO support using: LLVM version 8.0.0, (clang-800.0.42.1)
TAPI support using: Apple TAPI version 1.30

$ ld -v
@(#)PROGRAM:ld  PROJECT:ld64-264.3.102
configured to support archs: i386 x86_64 x86_64h armv6 armv7 armv7s armv7m 
armv7k arm64 (tvOS)
LTO support using: LLVM version 3.8.1

and get a flavor of ld directly, no?  Isn't the version number you want 
something like 274.1 and 264.3.102?  I can write the sed to get that if you 
want.

  $ld -v

is the program I used to get the above, and I'm assuming that you can figure 
out how to set $ld.  This also works for natives, just as well, as can be seen 
above, as those two are the usual native commands.
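
As a sketch of the extraction step, the following C function pulls the version out of output shaped like the two samples above. It assumes the version always follows the literal text "PROJECT:ld64-"; the function name and that assumption are illustrative, not part of any existing configure machinery.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the version that follows "PROJECT:ld64-" in LD_OUTPUT into BUF.
   Return 1 on success, 0 if no version is found or BUF is too small.  */
int
extract_ld64_version (const char *ld_output, char *buf, size_t buf_size)
{
  static const char marker[] = "PROJECT:ld64-";
  const char *start;
  size_t len;

  assert (ld_output != NULL && buf != NULL);
  start = strstr (ld_output, marker);
  if (start == NULL)
    return 0;
  start += sizeof marker - 1;
  /* The version is the run of digits and dots after the marker.  */
  len = strspn (start, "0123456789.");
  if (len == 0 || len >= buf_size)
    return 0;
  memcpy (buf, start, len);
  buf[len] = '\0';
  return 1;
}
```

Fed the first line of each sample, it would yield 274.1 and 264.3.102 respectively, and it reports failure for output that lacks the marker.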

> .. and produce the situation where we can never have a c++11 compiler on 
> powerpc-darwin9,

No, I never said that and never would accept that limitation.  If someone 
wanted to contribute patches to do that, I'd approve them.  If they wanted help 
and advice on how to make it happen, I'm there.  You are imagining limitations 
that don't exist.  I have envisioned, a world where everything just works and 
getting there is trivial.  Just takes a few well placed patches.

> because the “required ld64” doesn’t support it (OK. maybe we don’t care about 
> that) but supposing we can now have symbol aliases with modern ld64 (I think 
> we can) - would we want to prevent a 10.6 from using that?

Nope.

> So, I am strongly against the situation where we fix the capability of the 
> toolchain on some assumption of externally-available tools predicated on the 
> system revision.

We aren't.  It is only a notion of what the meaning of a particular configure 
line is.  If you want it to mean,  blow chunks, then you are wrong for that 
choice.  If you want it to mean, it just works, then we are on the same page.  
So, my question to you is, what happens when you don't give the linker version 
on the configure line?  Does it blow chunks or just work?  If it just works, 
why would you want to provide the flag?  If it blows chunks, then you now 
understand completely why I oppose that.

> The intent of my patch is to move away from this to a situation where we use 
> configuration tests to determine the capability from the tools [when we can 
> run them] and on the basis of their version(s) when we are configuring in a 
> cross scenario.

No, it is backward progress.  You would require infinite argument lists for 
infinite complexity and infinite ways for breakage given how hard it is to 
formulate those arguments.  This is against the autoconf/configure spirit.  
That spirit is no infinite argument list for infinite complexity, and 
everything just works.

When I say my cross native build is:

   ../newlib/configure --target=triplet --prefix=$install
  make && make install

Do you see why it must be right?  There just is no possibility that it is 
wrong.  Further, it is this today, tomorrow, yesterday and next century, by 
design.  I resist the configure flag because you said you can run ld, and as 
I've shown, -v will get us the information I think you seek, directly, no 
configure flag.  You have not said that you can't run it, and you have not said 
why the information you seek isn't in the output found.

> Actually, there’s a bunch of configury in GCC that picks up the version of 
> binutils components 

Re: [PATCH v2] AIX visibility

2016-11-08 Thread David Edelsohn
On Tue, Nov 8, 2016 at 10:23 AM, Christophe Lyon wrote:
> Hi David,
>
> On 2 November 2016 at 16:41, David Edelsohn  wrote:
>> This revised patch makes two changes:
>>
>> 1) Fix typo in configure.ac
>> 2) Add AIX visibility support for ASM_WEAKEN_DECL, which does touch
>> the same code as Linux.
>>
>> The AIX "weak" support fixes a large number of C++ visibility testcases.
>>
>> Bootstrapped on powerpc-ibm-aix7.2.0.0.
>>
>> * configure.ac (.hidden): Change to conftest_s string. Provide string
>> for AIX assembler.
>> (gcc_cv_ld_hidden): Yes for AIX.
>> * configure: Regenerate.
>>
>> * dwarf2asm.c (USE_LINKONCE_INDIRECT): Don't set for AIX (XCOFF).
>>
>> * config/rs6000/rs6000-protos.h (rs6000_asm_weaken_decl): Declare
>> (rs6000_xcoff_asm_output_aligned_decl_common): Declare.
>> * config/rs6000/xcoff.h (TARGET_ASM_GLOBALIZE_DECL_NAME): Define.
>> (ASM_OUTPUT_ALIGNED_DECL_COMMON): Define.
>> (ASM_OUTPUT_ALIGNED_COMMON): Delete.
>> * config/rs6000/rs6000.c (rs6000_init_builtins): Change clog rename
>> from #if to if.
>> (rs6000_xcoff_visibility): New.
>> (rs6000_xcoff_declare_function_name): Add visibility support.
>> (rs6000_xcoff_asm_globalize_decl_name): New.
>> (rs6000_xcoff_asm_output_aligned_decl_common): New.
>> (rs6000_asm_weaken_decl): New.
>> (rs6000_code_end): Disable HIDDEN_LINKONCE on XCOFF.
>> config/rs6000/rs6000.h (ASM_WEAKEN_DECL): Change definition to
>> reference function.
>>
>> dwarf2asm.c okay?
>>
>> Any comments on ASM_WEAKEN_DECL change?
>>
>> Thanks, David
>
> It seems this commit (r241930) is causing a regression on aarch64:
> FAIL: g++.dg/torture/pr60750.C   -O2 -flto -fno-use-linker-plugin
> -flto-partition=none  execution test

Hi, Christophe

Ah, I see.

Can you try the appended patch.

Thanks, David

Index: dwarf2asm.c
===
--- dwarf2asm.c (revision 241972)
+++ dwarf2asm.c (working copy)
@@ -824,7 +824,7 @@

 static GTY(()) int dw2_const_labelno;

-#if defined(HAVE_GAS_HIDDEN) && !defined(XCOFF_DEBUGGING_INFO)
+#if defined(HAVE_GAS_HIDDEN) && !XCOFF_DEBUGGING_INFO
 # define USE_LINKONCE_INDIRECT (SUPPORTS_ONE_ONLY)
 #else
 # define USE_LINKONCE_INDIRECT 0

Thanks, David


Re: [match.pd] Fix for PR35691

2016-11-08 Thread Martin Sebor

Christophe reported to me that the commit caused test-cases
pr35691-1.c and pr35691-2.c (which were added by the commit)
to FAIL for cortex-a5:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241915/arm-none-linux-gnueabihf/diff-gcc-rh60-arm-none-linux-gnueabihf-arm-cortex-a5-vfpv3-d16-fp16.txt


I also see the test fail on powerpc64le.  The forwprop1 tree dump
is attached in case it helps.

Martin

;; Function foo (foo, funcdef_no=0, decl_uid=2506, cgraph_uid=0, symbol_order=0)

Applying pattern match.pd:2422, gimple-match.c:1704
Applying pattern match.pd:913, gimple-match.c:621
Applying pattern match.pd:901, gimple-match.c:164
Applying pattern match.pd:2685, gimple-match.c:5
gimple_simplified to if (_1 != 0)
Applying pattern match.pd:913, generic-match.c:429
Applying pattern match.pd:901, generic-match.c:136
Applying pattern match.pd:2685, generic-match.c:30968
  Replaced '_1 != 0' with 'z0_4(D) == 0'
Applying pattern match.pd:2422, gimple-match.c:1704
Applying pattern match.pd:913, gimple-match.c:621
Applying pattern match.pd:901, gimple-match.c:164
Applying pattern match.pd:2685, gimple-match.c:5
gimple_simplified to if (_2 != 0)
Applying pattern match.pd:913, generic-match.c:429
Applying pattern match.pd:901, generic-match.c:136
Applying pattern match.pd:2685, generic-match.c:30968
  Replaced '_2 != 0' with 'z1_6(D) == 0'
gimple_simplified to _11 = iftmp.0_3;
foo (int z0, unsigned int z1)
{
  int t2;
  int t1;
  int t0;
  _Bool _1;
  _Bool _2;
  int iftmp.0_3;
  int _11;

  <bb 2>:
  _1 = z0_4(D) == 0;
  t0_5 = (int) _1;
  _2 = z1_6(D) == 0;
  t1_7 = (int) _2;
  if (z0_4(D) == 0)
    goto <bb 3>;
  else
    goto <bb 4>;

  <bb 3>:
  if (z1_6(D) == 0)
    goto <bb 5>;
  else
    goto <bb 4>;

  <bb 4>:

  <bb 5>:
  # iftmp.0_3 = PHI <1(3), 0(4)>
  t2_10 = iftmp.0_3;
  _11 = iftmp.0_3;
  return _11;

}




Re: [PATCH] S390: Fix PR/77822.

2016-11-08 Thread Matthias Klose
On 08.11.2016 15:38, Dominik Vogt wrote:
> The attached patch fixes PR/77822 on s390/s390x for gcc-6 *only*.
> See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77822
> 
> Bootstrapped and regression tested on s390 and s390x biarch on a
> zEC12.

missing the testcase mentioned in the ChangeLog.



Re: [PATCH fix PR71767 2/4 : Darwin configury] Arrange for ld64 to be detected as Darwin's linker

2016-11-08 Thread Iain Sandoe

> On 8 Nov 2016, at 08:18, Mike Stump  wrote:
> 
> On Nov 7, 2016, at 6:33 PM, Iain Sandoe  wrote:
>> 
>> a) right now, we need to know the target linker version - while it’s not 
>> impossible to try and conjure up some test to see if a linker we can run 
>> supports coalesced sections or not, the configury code and complexity needed 
>> to support that would exceed what I’m proposing at present (and still would 
>> not cover the native and canadian cases).
> 
> A traditional canadian can run the host linker for the target on the build 
> machine with --version (or whatever flag) and capture the version number.  I 
> don't know what setup you have engineered for, since you didn't say.  First 
> question, can you run the host linker for the target on the build machine?  
> If so, you can directly capture the output.  The next question is, is it the 
> same version as the version that would be used on the host?

I suppose that one could demand that - and require a build thus.

So I build x86_64-darwin14 X powerpc-darwin9
and then a native host  build = x86_64-darwin14 host = target = powerpc-darwin9
If we demand that the same version linker is used for all, then perhaps that 
could work.

It seems likely that we’ll end up with mis-configures and stuff hard to support 
with non-expert build folks.

>> I’m not debating the various solutions in your reply to Jeff - but honestly 
>> I wonder how many of them are realistically in reach of the typical end-user 
>> (I have done most of them at one stage or another, but I wonder how many 
>> would be stopped dead by “first find and build ld64, which itself needs a 
>> c++11 compiler and BTW needs you to build libLTO.dylib .. which needs you to 
>> build at least LLVM itself").
> 
> Package managers exist to solve that problem nicely, if someone wants a 
> trivial solution.  They have the ability to scoop up binaries and just copy 
> them onto a machine, solving hard chicken/egg problems.  Other possibilities 
> are scripts that set up everything and release the scripts.

yes, I’m working on at least the latter (don’t have time to become a package 
manager).
> 
>> am I missing a point here?
> 
> The answer to the two questions above.  The answer to the question, what 
> specific question do you want answered, and what is available to the build 
> machine, specifically to answer that question?
> 
> Also, you deflect on the os version to linker version number, but you never 
> said what didn't actually work.  What specifically doesn't work?  This method 
> is trivial and the mapping is direct and expressible in a few lines per 
> version supported.  I still maintain that the only limitation is you must 
> choose exactly 1 version per config triplet; I don't see that as a problem.  
> If it were, I didn't see you explain the problem.  Even if it is, that 
> problem is solved after the general problem that nothing works today.  By 
> having at least _a_ mapping, you generally solve the problem for most people, 
> most of the time.

It *requires* that one configures arch-darwinNN .. and doesn’t allow for 
arch-darwin (to mean anything other than  build=host=target) - but really we 
should be able to build arch-darwin with config flags including 
-mmacosx-version-min= to be deployable on any darwin > than the specified min.  
I actually do this for day job toolchains, it’s not purely hypothetical.  Since 
darwin toolchains are all supposed to be “just works” cross ones.

> For example, if you target 10.0, there are no new features from an Xcode that 
> was from 10.11.6 timeframe.  The only way to get those features, would be to 
> run a newer ld64, and, if you are doing that, then you likely have enough 
> sources to run ld directly.  And if you are running it directly, then you can 
> just ask it what version it is.

If we can engineer that a suitable ld64 can be run at configuration time so 
that the version can be discovered automatically, I’m with you 100% - but the 
scenarios put forward seem very complex for typical folks.

What would you prefer to replace this patch with?
Iain




Re: ineffective dg-skip-if directive in asan/use-after-scope-8.c

2016-11-08 Thread Martin Liška
On 11/08/2016 05:28 PM, Martin Sebor wrote:
> Hi Martin,
> 
> I noticed a new failure in the use-after-scope-8.c test on powerpc64le:
> 
>   FAIL: gcc.dg/asan/use-after-scope-8.c   -O0  (test for excess errors)
> 
> with the error being
> 
>   use-after-scope-8.c:9:16: error: invalid register name for 'a'
> 
> It looks to me as though the dg-skip-if directive in the test isn't
> having the desired effect:
> 
>   /* { dg-skip-if "" { *-*-* } { "*" } { "-O0" } } */
> 
> I'm not familiar with the specifics of the directive but changing
> it to this got rid of the error for me:
> 
>   // { dg-skip-if "" { *-*-* } "-O0" "" }
> 
> Looking more closely at the code, it references an x86 register
> which is obviously not going to be valid on other processors.
> I'm not sure I understand the purpose of the register variable
> (on powerpc64le the test passes without it) but if it's important
> for some reason then as an alternative to the above, changing
> the register to one that's generally available and getting rid
> of the directive should work too.  This did the trick for me:
> 
>   register int a asm ("0") = 123;
> 
> Martin

Hello.

I know about the issue, I decided to remove the test-case as it didn't test
any interesting behavior. Fixed in r241961.

Martin


ineffective dg-skip-if directive in asan/use-after-scope-8.c

2016-11-08 Thread Martin Sebor

Hi Martin,

I noticed a new failure in the use-after-scope-8.c test on powerpc64le:

  FAIL: gcc.dg/asan/use-after-scope-8.c   -O0  (test for excess errors)

with the error being

  use-after-scope-8.c:9:16: error: invalid register name for 'a'

It looks to me as though the dg-skip-if directive in the test isn't
having the desired effect:

  /* { dg-skip-if "" { *-*-* } { "*" } { "-O0" } } */

I'm not familiar with the specifics of the directive but changing
it to this got rid of the error for me:

  // { dg-skip-if "" { *-*-* } "-O0" "" }

Looking more closely at the code, it references an x86 register
which is obviously not going to be valid on other processors.
I'm not sure I understand the purpose of the register variable
(on powerpc64le the test passes without it) but if it's important
for some reason then as an alternative to the above, changing
the register to one that's generally available and getting rid
of the directive should work too.  This did the trick for me:

  register int a asm ("0") = 123;

Martin


Re: [Patch, Fortran, F03] PR77596: procedure pointer component with implicit interface can point to a function

2016-11-08 Thread Janus Weil
2016-11-08 15:51 GMT+01:00 Steve Kargl :
> On Tue, Nov 08, 2016 at 11:02:26AM +0100, Janus Weil wrote:
>>
>> here is a simple patch for the accepts-invalid problem of PR77596.
>> Regtests cleanly on x86_64-linux-gnu. Ok for trunk?
>>
>
> Yes.

Thanks! Committed as r241972.


> (and welcome back to the wonderful world of bugzilla).

:)

In fact I hope to be able to devote quite a bit of time to gfortran in
the coming weeks (and will probably start with some low-hanging fruit,
in order to warm up after a longer absence).

Cheers,
Janus


Re: [PATCH fix PR71767 2/4 : Darwin configury] Arrange for ld64 to be detected as Darwin's linker

2016-11-08 Thread Mike Stump
On Nov 7, 2016, at 6:33 PM, Iain Sandoe  wrote:
> 
> a) right now, we need to know the target linker version - while it’s not 
> impossible to try and conjure up some test to see if a linker we can run 
> supports coalesced sections or not, the configury code and complexity needed 
> to support that would exceed what I’m proposing at present (and still would 
> not cover the native and canadian cases).

A traditional canadian can run the host linker for the target on the build 
machine with --version (or whatever flag) and capture the version number.  I 
don't know what setup you have engineered for, since you didn't say.  First 
question, can you run the host linker for the target on the build machine?  If 
so, you can directly capture the output.  The next question is, is it the same 
version as the version that would be used on the host?

> I’m not debating the various solutions in your reply to Jeff - but honestly I 
> wonder how many of them are realistically in reach of the typical end-user (I 
> have done most of them at one stage or another, but I wonder how many would 
> be stopped dead by “first find and build ld64, which itself needs a c++11 
> compiler and BTW needs you to build libLTO.dylib .. which needs you to build 
> at least LLVM itself").

Package managers exist to solve that problem nicely, if someone wants a trivial 
solution.  They have the ability to scoop up binaries and just copy them onto a 
machine, solving hard chicken/egg problems.  Other possibilities are scripts 
that set up everything and release the scripts.

> am I missing a point here?

The answer to the two questions above.  The answer to the question, what 
specific question do you want answered, and what is available to the build 
machine, specifically to answer that question?

Also, you deflect on the os version to linker version number, but you never 
said what didn't actually work.  What specifically doesn't work?  This method 
is trivial and the mapping is direct and expressible in a few lines per version 
supported.  I still maintain that the only limitation is you must choose 
exactly 1 version per config triplet; I don't see that as a problem.  If it 
were, I didn't see you explain the problem.  Even if it is, that problem is 
solved after the general problem that nothing works today.  By having at least 
_a_ mapping, you generally solve the problem for most people, most of the time.

For example, if you target 10.0, there are no new features from an Xcode that 
was from 10.11.6 timeframe.  The only way to get those features, would be to 
run a newer ld64, and, if you are doing that, then you likely have enough 
sources to run ld directly.  And if you are running it directly, then you can 
just ask it what version it is.



Re: [RFC] Check number of uses in simplify_cond_using_ranges().

2016-11-08 Thread Marc Glisse

On Tue, 8 Nov 2016, Dominik Vogt wrote:


On Fri, Nov 04, 2016 at 01:54:20PM +0100, Richard Biener wrote:

On Fri, Nov 4, 2016 at 12:08 PM, Dominik Vogt  wrote:

On Fri, Nov 04, 2016 at 09:47:26AM +0100, Richard Biener wrote:

On Thu, Nov 3, 2016 at 4:03 PM, Dominik Vogt  wrote:

Is VRP the right pass to do this optimisation or should a later
pass rather attempt to eliminate the new use of b_5 instead?  Uli
has brought up the idea of a mini "sign extend elimination" pass that
checks if the result of a sign extend could be replaced by the
original quantity in all places, and if so, eliminate the ssa
name.  (I guess that won't help with the above code because l is
used also as a function argument.)  How could a sensible approach
to deal with the situation look like?


We run into this kind of situation regularly and for general foldings
in match.pd we settled with single_use () even though it is not perfect.
Note the usual complaint is not extra extension instructions but
the increase of register pressure.

This is because it is hard to do better when you are doing local
optimization.

As for the question on whether VRP is the right pass to do this the
answer is two-fold -- VRP has the most precise range information.
But the folding itself should be moved to generic code and use
get_range_info ().


All right, is this a sensible approach then?


Yes.


  1. Using has_single_use as in the experimental patch is good
 enough (provided further testing does not show serious
 regressions).


I'd approve that, yes.


  2. Rip the whole top level if-block from simplify_cond_using_ranges().
  3. Translate that code to match.pd syntax.


Might be some work but yes, that's also desired (you'd lose the ability
to emit the warnings though).


Could you give me a match-pd-hint please?  We now have something
like this:

(simplify
 (cond (gt SSA_NAME@0 INTEGER_CST@1) @2 @3)
 (if (... many conditions ...)
  (cond (gt ... ...) @2 @3)))

The generated code ends up in gimple_simplify_COND_EXPR, but when
gimple_simplify is actually called, it goes through the
GIMPLE_COND case and calls gimple_resimplify2(..., GT, ...) and
there it tries gimple_simplify_GT_EXPR(), peeling off the (cond
...), i.e. it never tries the generated code.


Not sure what you mean here.


There is another pattern in match.pd that uses a (cond ...) as the
first operand, and I do not understand how this works.  Should we
just use "(gt SSA_NAME@0 INTEGER_CST@1)" as the first operand
instead, and wouldn't this pattern be too general that way?


IIUC, you are trying to move the second half of simplify_cond_using_ranges 
to match.pd. I don't see any reason to restrict it to the case where the 
comparison result is used directly in a COND_EXPR, so that would look 
like:


(for cmp (...)
 (simplify
  (cmp (convert SSA_NAME@0) INTEGER_CST@1)
  (if (...)
   (cmp @0 (convert @1)))))

maybe? I think I may have missed your point.

(and yes, the first half would give a very general (simplify (cmp 
SSA_NAME@0 INTEGER_CST@1) ...), that doesn't seem so bad)


--
Marc Glisse
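
Editorial illustration of the pattern sketched above, not part of the
original message. The rewrite from (cmp (convert x) CST) to
(cmp x (convert CST)) can be modelled in plain C, assuming an 8-bit
narrow type and a guard that the constant is representable in it. All
function names below are hypothetical; none of this is GCC code, and the
real guard in the pass may well be stricter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when C can be represented in the narrow type int8_t.  This plays
   the role of the "(if (...)" guard in the pattern (an assumption).  */
bool fits_int8(long c)
{
    return c >= INT8_MIN && c <= INT8_MAX;
}

/* Original form: widen X first, then compare against the wide constant.  */
bool wide_gt(int8_t x, long c)
{
    return (long) x > c;
}

/* Simplified form: narrow the constant and compare without widening X.
   Only meaningful when fits_int8 (c) holds.  */
bool narrow_gt(int8_t x, long c)
{
    return x > (int8_t) c;
}

/* Exhaustively checks that both forms agree for every int8_t value.  */
bool forms_agree(long c)
{
    for (int v = INT8_MIN; v <= INT8_MAX; v++)
        if (wide_gt((int8_t) v, c) != narrow_gt((int8_t) v, c))
            return false;
    return true;
}

/* Small usage example.  */
void run_examples(void)
{
    assert(fits_int8(100) && forms_agree(100));
    assert(!fits_int8(300));
}
```

The guard matters because narrowing a constant such as 300 to int8_t does
not preserve its value, so the two forms are only interchangeable when
the constant fits.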


[gomp-nvptx 1/2] omp-low: add missing call to unshare_expr

2016-11-08 Thread Alexander Monakov
* omp-low.c (lower_lastprivate_clauses): Add missing call to
unshare_expr.
---
 gcc/ChangeLog.gomp-nvptx | 5 +
 gcc/omp-low.c| 1 +
 2 files changed, 6 insertions(+)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 347730d..da5476b 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -5599,6 +5599,7 @@ lower_lastprivate_clauses (tree clauses, tree predicate, gimple_seq *stmt_list,
  x = build_call_expr_internal_loc
(UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_IDX,
 TREE_TYPE (new_var), 2, new_var, simtlast);
+ new_var = unshare_expr (new_var);
  gimplify_assign (new_var, x, stmt_list);
  new_var = unshare_expr (new_var);
}


[gomp-nvptx 2/2] libgomp: adjust comments in nvptx team.c

2016-11-08 Thread Alexander Monakov
* config/nvptx/team.c: Adjust comments.
---
 libgomp/ChangeLog.gomp-nvptx |  4 
 libgomp/config/nvptx/team.c  | 16 +---
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/libgomp/config/nvptx/team.c b/libgomp/config/nvptx/team.c
index 6c6827a..f7b5e3e 100644
--- a/libgomp/config/nvptx/team.c
+++ b/libgomp/config/nvptx/team.c
@@ -23,7 +23,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
.  */
 
-/* This file handles the maintainence of threads on NVPTX.  */
+/* This file handles maintainance of threads on NVPTX.  */
 
 #if defined __nvptx_softstack__ && defined __nvptx_unisimt__
 
@@ -35,6 +35,16 @@ struct gomp_thread *nvptx_thrs __attribute__((shared,nocommon));
 
 static void gomp_thread_start (struct gomp_thread_pool *);
 
+
+/* This externally visible function handles target region entry.  It
+   sets up a per-team thread pool and transfers control by calling FN (FN_DATA)
+   in the master thread or gomp_thread_start in other threads.
+
+   The name of this function is part of the interface with the compiler: for
+   each target region, GCC emits a PTX .kernel function that sets up soft-stack
+   and uniform-simt state and calls this function, passing in FN the original
+   function outlined for the target region.  */
+
 void
 gomp_nvptx_main (void (*fn) (void *), void *fn_data)
 {
@@ -73,8 +83,8 @@ gomp_nvptx_main (void (*fn) (void *), void *fn_data)
 }
 }
 
-/* This function is a pthread_create entry point.  This contains the idle
-   loop in which a thread waits to be called up to become part of a team.  */
+/* This function contains the idle loop in which a thread waits
+   to be called up to become part of a team.  */
 
 static void
 gomp_thread_start (struct gomp_thread_pool *pool)


[gomp-nvptx 0/2] Branch merged from trunk and updated

2016-11-08 Thread Alexander Monakov
Hello,

I have done a trunk merge on amonakov/gomp-nvptx branch (up to svn rev. 241651)
and applied two small patches to address issues pointed out by Jakub.

Alexander Monakov (2):
  omp-low: add missing call to unshare_expr
  libgomp: adjust comments in nvptx team.c

 gcc/ChangeLog.gomp-nvptx |  5 +
 gcc/omp-low.c|  1 +
 libgomp/ChangeLog.gomp-nvptx |  4 
 libgomp/config/nvptx/team.c  | 16 +---
 4 files changed, 23 insertions(+), 3 deletions(-)


Re: [PATCH][2/2] Add store merging unit tests

2016-11-08 Thread Jeff Law

On 11/08/2016 05:03 AM, Kyrill Tkachov wrote:

Hi all,

This patch adds the plumbing for unit testing of the store merging pass.
It also adds some initial tests of some of the helpers used in
encode_tree_to_bitpos
to manipulate byte arrays. They caught an off-by-one error bug that is
fixed in patch [1/2].

Bootstrapped and tested on x86_64 and aarch64.

Ok for trunk?

Thanks,
Kyrill

2016-11-08  Kyrylo Tkachov  

* gimple-ssa-store-merging.c: Include selftest.h
(verify_array_eq): New function.
(verify_shift_bytes_in_array): Likewise.
(verify_shift_bytes_in_array_right): Likewise.
(verify_clear_bit_region): Likewise.
(verify_clear_bit_region_be): Likewise.
(store_merging_c_tests): Likewise.
* selftest.h (store_merging_c_tests): Declare prototype.
* selftest-run-tests.c (selftest::run_tests): Run
store_merging_c_tests.

OK.
jeff


Re: [PATCH v2] AIX visibility

2016-11-08 Thread Christophe Lyon
Hi David,

On 2 November 2016 at 16:41, David Edelsohn  wrote:
> This revised patch makes two changes:
>
> 1) Fix typo in configure.ac
> 2) Add AIX visibility support for ASM_WEAKEN_DECL, which does touch
> the same code as Linux.
>
> The AIX "weak" support fixes a large number of C++ visibility testcases.
>
> Bootstrapped on powerpc-ibm-aix7.2.0.0.
>
> * configure.ac (.hidden): Change to conftest_s string. Provide string
> for AIX assembler.
> (gcc_cv_ld_hidden): Yes for AIX.
> * configure: Regenerate.
>
> * dwarf2asm.c (USE_LINKONCE_INDIRECT): Don't set for AIX (XCOFF).
>
> * config/rs6000/rs6000-protos.h (rs6000_asm_weaken_decl): Declare
> (rs6000_xcoff_asm_output_aligned_decl_common): Declare.
> * config/rs6000/xcoff.h (TARGET_ASM_GLOBALIZE_DECL_NAME): Define.
> (ASM_OUTPUT_ALIGNED_DECL_COMMON): Define.
> (ASM_OUTPUT_ALIGNED_COMMON): Delete.
> * config/rs6000/rs6000.c (rs6000_init_builtins): Change clog rename
> from #if to if.
> (rs6000_xcoff_visibility): New.
> (rs6000_xcoff_declare_function_name): Add visibility support.
> (rs6000_xcoff_asm_globalize_decl_name): New.
> (rs6000_xcoff_asm_output_aligned_decl_common): New.
> (rs6000_asm_weaken_decl): New.
> (rs6000_code_end): Disable HIDDEN_LINKONCE on XCOFF.
> * config/rs6000/rs6000.h (ASM_WEAKEN_DECL): Change definition to
> reference function.
>
> dwarf2asm.c okay?
>
> Any comments on ASM_WEAKEN_DECL change?
>
> Thanks, David

It seems this commit (r241930) is causing a regression on aarch64:
FAIL: g++.dg/torture/pr60750.C   -O2 -flto -fno-use-linker-plugin
-flto-partition=none  execution test

Christophe


Re: [RFC] Handle unary pass-through jump functions for ipa-vrp

2016-11-08 Thread Jan Hubicka
> Hi,
> 
> 
> On 04/11/16 04:36, Martin Jambor wrote:
> >Hi,
> >
> >On Fri, Oct 28, 2016 at 02:03:47PM +1100, kugan wrote:
> >>
> >>...snip...
> >>
> >>I have also separated the constant parameter conversion out and posted as
> >>https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02309.html. This is now
> >>handling just unary pass-through jump functions.
> >>
> >>Bootstrapped and regression tested on x86_64-linux-gnu with no new
> >>regressions.
> >>
> >>Is this OK for trunk?
> >>
> >>Thanks,
> >>Kugan
> >>
> >>gcc/testsuite/ChangeLog:
> >>
> >>2016-10-28  Kugan Vivekanandarajah  
> >>
> >>* gcc.dg/ipa/vrp7.c: New test.
> >>
> >>
> >>gcc/ChangeLog:
> >>
> >>2016-10-28  Kugan Vivekanandarajah  
> >>
> >>* ipa-cp.c (ipa_get_jf_pass_through_result): Handle unary expressions.
> >>(propagate_vr_accross_jump_function): Likewise.
> >>* ipa-prop.c (ipa_set_jf_unary_pass_through): New.
> >>(load_from_param_1): New.
> >>(load_from_unmodified_param): Factor common part into load_from_param_1.
> >>(load_from_param): New.
> >>(compute_complex_assign_jump_func): Handle unary expressions.
> >>(ipa_write_jump_function): Likewise.
> >>(ipa_read_jump_function): Likewise.
> >>
> >>
> >>>Patch is OK with changes Martin suggested.
> >>>
> >>>Honza
> >>>
> >
> >>From b7d9b413951ba20d156a7801640cc7d7bc57c062 Mon Sep 17 00:00:00 2001
> >>From: Kugan Vivekanandarajah 
> >>Date: Fri, 28 Oct 2016 10:16:38 +1100
> >>Subject: [PATCH 2/2] add unary jump function
> >>
> >>---
> >> gcc/ipa-cp.c| 39 +++---
> >> gcc/ipa-prop.c  | 89 
> >> +++--
> >> gcc/testsuite/gcc.dg/ipa/vrp7.c | 32 +++
> >> 3 files changed, 142 insertions(+), 18 deletions(-)
> >> create mode 100644 gcc/testsuite/gcc.dg/ipa/vrp7.c
> >>
> >>diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> >>index 9f28557..8fc95dd 100644
> >>--- a/gcc/ipa-cp.c
> >>+++ b/gcc/ipa-cp.c
> >>@@ -1225,13 +1225,21 @@ ipa_get_jf_pass_through_result (struct 
> >>ipa_jump_func *jfunc, tree input)
> >> return NULL_TREE;
> >>
> >>   if (TREE_CODE_CLASS (ipa_get_jf_pass_through_operation (jfunc))
> >>-  == tcc_comparison)
> >>-restype = boolean_type_node;
> >>+  == tcc_unary)
> >>+{
> >>+  res = fold_unary (ipa_get_jf_pass_through_operation (jfunc),
> >>+   TREE_TYPE (input), input);
> >>+}
> >
> >Please do not put curly braces around a single statement.  Apart from
> >that, no objection from me.
> 
> Thanks Martin, I will fix this.
> 
> Honza, is this OK for you with the above fix?

OK,
thanks!
Honza
> 
> Thanks,
> Kugan
> >
> >Thanks,
> >
> >Martin
> >


Re: [ipa-vrp] ice in set_value_range

2016-11-08 Thread Jan Hubicka
> Hi,
> 
> On 04/11/16 03:24, Martin Jambor wrote:
> >Hi,
> >
> >On Fri, Oct 28, 2016 at 01:58:13PM +1100, kugan wrote:
> >>>Do I understand it correctly that extract_range_from_unary_expr deals
> >>>with any potential type conversions better (compared to what you did
> >>>before here)?
> >>
> >>Yes, this can be wrong at times too as reported in
> >>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78121. I have separated this
> >>part of the patch with a testcase.
> >>
> >>Please note that I am using fold_convert in the attached patch.
> >>
> >>Bootstrapped and regression tested on x86_64-linux-gnu with no new
> >>regressions. Is this OK for trunk?
> >>
> >
> >I have no objections, but we need to wait for Honza.
> Thanks.
> 
> Honza, is this OK for you ?
OK,
thanks!
Honza
> 
> Thanks,
> Kugan
> 
> >
> >Thanks,
> >
> >Martin
> >
> >>Thanks,
> >>Kugan
> >>
> >>
> >>gcc/ChangeLog:
> >>
> >>2016-10-28  Kugan Vivekanandarajah  
> >>
> >>PR ipa/78121
> >>* ipa-cp.c (propagate_vr_accross_jump_function): Pass param type.
> >>Also fold constant passed as argument while computing value range.
> >>(propagate_constants_accross_call): Pass param type.
> >>* ipa-prop.c: export ipa_get_callee_param_type.
> >>* ipa-prop.h: export ipa_get_callee_param_type.
> >>
> >>gcc/testsuite/ChangeLog:
> >>
> >>2016-10-28  Kugan Vivekanandarajah  
> >>
> >>PR ipa/78121
> >>* gcc.dg/ipa/pr78121.c: New test.
> >


Re: [Patch, Fortran, F03] PR77596: procedure pointer component with implicit interface can point to a function

2016-11-08 Thread Steve Kargl
On Tue, Nov 08, 2016 at 11:02:26AM +0100, Janus Weil wrote:
> 
> here is a simple patch for the accepts-invalid problem of PR77596.
> Regtests cleanly on x86_64-linux-gnu. Ok for trunk?
> 

Yes.  (and welcome back to the wonderful world of bugzilla).

-- 
Steve


[PATCH] S390: Fix PR/77822.

2016-11-08 Thread Dominik Vogt
The attached patch fixes PR/77822 on s390/s390x for gcc-6 *only*.
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77822

Bootstrapped and regression tested on s390 and s390x biarch on a
zEC12.

For gcc-7, there will be a different patch.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

* config/s390/s390.md ("extzv", "*extzv_zEC12")
("*extzv_z10"): Check validity of zero_extend arguments.
gcc/testsuite/ChangeLog

* gcc.target/s390/pr77822.c: New test for PR/77822.
From 5c007442158756a36f2ede66f372a42d9a7b2aa6 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Tue, 8 Nov 2016 09:54:03 +0100
Subject: [PATCH] S390: Fix PR/77822.

Check the range of the arguments in extzv patterns.  This avoids generating
risbg instructions with an out of range shift count.
---
 gcc/config/s390/s390.md | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index b3d4370..899ed62 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -3708,6 +3708,10 @@
  (clobber (reg:CC CC_REGNUM))])]
   "TARGET_Z10"
 {
+  if (!IN_RANGE (INTVAL (operands[3]), 0, GET_MODE_BITSIZE (DImode) - 1)
+  || !IN_RANGE (INTVAL (operands[3]) + INTVAL (operands[2]), 1,
+   GET_MODE_BITSIZE (DImode)))
+FAIL;
   /* Starting with zEC12 there is risbgn not clobbering CC.  */
   if (TARGET_ZEC12)
 {
@@ -3726,7 +3730,9 @@
 (match_operand:GPR 1 "register_operand" "d")
 (match_operand 2 "const_int_operand" "")   ; size
 (match_operand 3 "const_int_operand" "")))] ; start]
-  "TARGET_ZEC12"
+  "TARGET_ZEC12
+   && IN_RANGE (INTVAL (operands[3]), 0, GET_MODE_BITSIZE (DImode) - 1)
+   && IN_RANGE (INTVAL (operands[3]) + INTVAL (operands[2]), 1, GET_MODE_BITSIZE (DImode))"
   "risbgn\t%0,%1,64-%2,128+63,+%3+%2" ; dst, src, start, end, shift
   [(set_attr "op_type" "RIE")])
 
@@ -3737,7 +3743,9 @@
(match_operand 2 "const_int_operand" "")   ; size
(match_operand 3 "const_int_operand" ""))) ; start
(clobber (reg:CC CC_REGNUM))]
-  "TARGET_Z10"
+  "TARGET_Z10
+   && IN_RANGE (INTVAL (operands[3]), 0, GET_MODE_BITSIZE (DImode) - 1)
+   && IN_RANGE (INTVAL (operands[3]) + INTVAL (operands[2]), 1, GET_MODE_BITSIZE (DImode))"
   "risbg\t%0,%1,64-%2,128+63,+%3+%2" ; dst, src, start, end, shift
   [(set_attr "op_type" "RIE")
(set_attr "z10prop" "z10_super_E1")])
-- 
2.3.0
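
Editorial illustration, not part of the patch. The range condition the
patch adds can be sketched in standalone C, assuming a 64-bit (DImode)
operand and following the patch's own comments, which mark operand 2 as
the size and operand 3 as the start. The helper names in_range and
extzv_args_valid are hypothetical stand-ins, not GCC functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for GCC's IN_RANGE macro:
   true when LOW <= VALUE <= HIGH.  */
static bool in_range(long value, long low, long high)
{
    return value >= low && value <= high;
}

/* Mirrors the condition added by the patch for a 64-bit operand:
   START must lie in [0, 63] and START + SIZE must lie in [1, 64].  */
bool extzv_args_valid(long size, long start)
{
    const long bitsize = 64; /* models GET_MODE_BITSIZE (DImode) */

    return in_range(start, 0, bitsize - 1)
        && in_range(start + size, 1, bitsize);
}

/* Small usage example.  */
void run_examples(void)
{
    assert(extzv_args_valid(8, 0));    /* low byte: accepted */
    assert(!extzv_args_valid(8, 60));  /* would run past bit 64: rejected */
}
```

With a check of this shape, a field that would extend past the end of the
register is rejected before any risbg/risbgn operands are computed from
it.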



[arm-embedded][committed] Add support for -mpure-code option

2016-11-08 Thread Andre Vieira (lists)
Hi,

Backported the support for the ARM -mpure-code option, including the
follow-up fixes to the embedded-6-branch. Tested for arm-none-eabi.

Committed in revision r241970.

Cheers,
Andre

gcc/ChangeLog.arm:
2016-11-08  Andre Vieira  

   Backport from mainline
   2016-09-23  Uros Bizjak  
   Jakub Jelinek  

   * hooks.h (hook_uint_uintp_false): Rename to...
   (hook_bool_uint_uintp_false): ... this.
   * hooks.c (hook_uint_uintp_false): Rename to...
   (hook_bool_uint_uintp_false): ... this.
   * target.def (elf_flags_numeric): Use hook_bool_uint_uintp_false
   instead of hook_uint_uintp_false.

   2016-09-23  Richard Biener  

   * hooks.h (hook_uint_uintp_false): Declare.

   2016-09-22  Andre Vieira  
   Terry Guo  

   * target.def (elf_flags_numeric): New target hook.
   * targhooks.h (default_asm_elf_flags_numeric): New.
   * varasm.c (default_asm_elf_flags_numeric): New.
   (default_elf_asm_named_section): Use new target hook.
   * config/arm/arm.opt (mpure-code): New.
   * config/arm/arm.h (SECTION_ARM_PURECODE): New.
   * config/arm/arm.c (arm_asm_init_sections): Add section
   attribute to default text section if -mpure-code.
   (arm_option_check_internal): Diagnose use of option with
   non supported targets and/or options.
   (arm_asm_elf_flags_numeric): New.
   (arm_function_section): New.
   (arm_elf_section_type_flags): New.
   * config/arm/elf.h (JUMP_TABLES_IN_TEXT_SECTION): Disable
   for -mpure-code.
   * gcc/doc/texi (TARGET_ASM_ELF_FLAGS_NUMERIC): New.
   * gcc/doc/texi.in (TARGET_ASM_ELF_FLAGS_NUMERIC): Likewise.

gcc/testsuite/ChangeLog.arm:
2016-11-08  Andre Vieira  

   Backport from mainline
   2016-10-21  Andre Vieira  

   * gcc.target/arm/pure-code/pure-code.exp: Require arm_cortex_m
   effective target.

   2016-09-22  Andre Vieira  
   Terry Guo  

   * gcc.target/arm/pure-code/ffunction-sections.c: New.
   * gcc.target/arm/pure-code/no-literal-pool.c: New.
   * gcc.target/arm/pure-code/pure-code.exp: New.
diff --git a/gcc/ChangeLog.arm b/gcc/ChangeLog.arm
index f45bd18e94e597208132b067607bf186d2e30f09..b048981d664a952135ff942fe298cce32bbc52a9 100644
--- a/gcc/ChangeLog.arm
+++ b/gcc/ChangeLog.arm
@@ -1,6 +1,44 @@
 2016-11-08  Andre Vieira  
 
Backport from mainline
+   2016-09-23  Uros Bizjak  
+   Jakub Jelinek  
+
+   * hooks.h (hook_uint_uintp_false): Rename to...
+   (hook_bool_uint_uintp_false): ... this.
+   * hooks.c (hook_uint_uintp_false): Rename to...
+   (hook_bool_uint_uintp_false): ... this.
+   * target.def (elf_flags_numeric): Use hook_bool_uint_uintp_false
+   instead of hook_uint_uintp_false.
+
+   2016-09-23  Richard Biener  
+
+   * hooks.h (hook_uint_uintp_false): Declare.
+
+   2016-09-22  Andre Vieira  
+   Terry Guo  
+
+   * target.def (elf_flags_numeric): New target hook.
+   * targhooks.h (default_asm_elf_flags_numeric): New.
+   * varasm.c (default_asm_elf_flags_numeric): New.
+   (default_elf_asm_named_section): Use new target hook.
+   * config/arm/arm.opt (mpure-code): New.
+   * config/arm/arm.h (SECTION_ARM_PURECODE): New.
+   * config/arm/arm.c (arm_asm_init_sections): Add section
+   attribute to default text section if -mpure-code.
+   (arm_option_check_internal): Diagnose use of option with
+   non supported targets and/or options.
+   (arm_asm_elf_flags_numeric): New.
+   (arm_function_section): New.
+   (arm_elf_section_type_flags): New.
+   * config/arm/elf.h (JUMP_TABLES_IN_TEXT_SECTION): Disable
+   for -mpure-code.
+   * gcc/doc/texi (TARGET_ASM_ELF_FLAGS_NUMERIC): New.
+   * gcc/doc/texi.in (TARGET_ASM_ELF_FLAGS_NUMERIC): Likewise.
+
+2016-11-08  Andre Vieira  
+
+   Backport from mainline
2016-05-27  Ulrich Weigand  
 
* configure.ac: Treat a --with-headers option without argument
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 77f2e102fe55246676598b8f93ea6737dcc2abbb..93f5e25099501c2b475eb1801dfc7228ced3f839 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -2285,4 +2285,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 /* For switching between functions with different target attributes.  */
 #define SWITCHABLE_TARGET 1
 
+/* Define SECTION_ARM_PURECODE as the ARM specific 

Re: [RFC] Check number of uses in simplify_cond_using_ranges().

2016-11-08 Thread Dominik Vogt
On Fri, Nov 04, 2016 at 01:54:20PM +0100, Richard Biener wrote:
> On Fri, Nov 4, 2016 at 12:08 PM, Dominik Vogt  wrote:
> > On Fri, Nov 04, 2016 at 09:47:26AM +0100, Richard Biener wrote:
> >> On Thu, Nov 3, 2016 at 4:03 PM, Dominik Vogt  
> >> wrote:
> >> > Is VRP the right pass to do this optimisation or should a later
> >> > pass rather attempt to eliminate the new use of b_5 instead?  Uli
> >> > has brought up the idea of a mini "sign extend elimination" pass that
> >> > checks if the result of a sign extend could be replaced by the
> >> > original quantity in all places, and if so, eliminate the ssa
> >> > name.  (I guess that won't help with the above code because l is
> >> > used also as a function argument.)  How could a sensible approach
> >> > to deal with the situation look like?
> >>
> >> We run into this kind of situation regularly and for general foldings
> >> in match.pd we settled with single_use () even though it is not perfect.
> >> Note the usual complaint is not extra extension instructions but
> >> the increase of register pressure.
> >>
> >> This is because it is hard to do better when you are doing local
> >> optimization.
> >>
> >> As for the question on whether VRP is the right pass to do this the
> >> answer is two-fold -- VRP has the most precise range information.
> >> But the folding itself should be moved to generic code and use
> >> get_range_info ().
> >
> > All right, is this a sensible approach then?
> 
> Yes.
> 
> >   1. Using has_single_use as in the experimental patch is good
> >  enough (provided further testing does not show serious
> >  regressions).
> 
> I'd approve that, yes.
> 
> >   2. Rip the whole top level if-block from simplify_cond_using_ranges().
> >   3. Translate that code to match.pd syntax.
> 
> Might be some work but yes, that's also desired (you'd lose the ability
> to emit the warnings though).

Could you give me a match-pd-hint please?  We now have something
like this:

 (simplify
  (cond (gt SSA_NAME@0 INTEGER_CST@1) @2 @3)
  (if (... many conditions ...)
   (cond (gt ... ...) @2 @3)))

The generated code ends up in gimple_simplify_COND_EXPR, but when
gimple_simplify is actually called, it goes through the
GIMPLE_COND case and calls gimple_resimplify2(..., GT, ...) and
there it tries gimple_simplify_GT_EXPR(), peeling off the (cond
...), i.e. it never tries the generated code.

There is another pattern in match.pd that uses a (cond ...) as the
first operand, and I do not understand how this works.  Should we
just use "(gt SSA_NAME@0 INTEGER_CST@1)" as the first operand
instead, and wouldn't this pattern be too general that way?

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



[PING, PATCH] DWARF: make signedness explicit for enumerator const values

2016-11-08 Thread Pierre-Marie de Rodat

Hello,

Ping for the patch submitted there: 
.


Thank you in advance!

--
Pierre-Marie de Rodat


[PATCH] debug/PR78112: remove recent duplicates for DW_TAG_subprogram attributes

2016-11-08 Thread Pierre-Marie de Rodat
Hello,

This patch comes from the discussion for PR debug/78112
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78112).  It seems to fix
the regression Rainer detected on x86_64-apple-darwin, although I still
wonder why remaining duplicate attributes (which were already present
before my original patch) are not a problem on this platform.

Anyway, bootstrapped and regtested on x86_64-linux.  As I said on the
PR, I managed to check with a Python script that there were no
duplicates anymore but I don't know how to turn this into a DejaGNU
testcase.

Ok to commit?

gcc/

PR debug/78112
* dwarf2out.c (dwarf2out_early_global_decl): Call dwarf2out_decl
on the context only when it has no DIE yet.
---
 gcc/dwarf2out.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 1dfff38..f9ec090 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -25256,11 +25256,11 @@ dwarf2out_early_global_decl (tree decl)
  if (!DECL_STRUCT_FUNCTION (decl))
goto early_decl_exit;
 
- /* For nested functions, emit DIEs for the parents first so that all
-nested DIEs are generated at the proper scope in the first
-shot.  */
+ /* For nested functions, make sure we have DIEs for the parents first
+so that all nested DIEs are generated at the proper scope in the
+first shot.  */
  tree context = decl_function_context (decl);
- if (context != NULL)
+ if (context != NULL && lookup_decl_die (context) == NULL)
{
  current_function_decl = context;
  dwarf2out_decl (context);
-- 
2.10.2



Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-08 Thread Yuri Rumyantsev
Richard,

Here is the updated patch 3.

I checked that all new tests related to epilogue vectorization passed with it.

Your comments will be appreciated.

2016-11-08 15:38 GMT+03:00 Richard Biener :
> On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
>
>> Hi Richard,
>>
>> I did not understand your last remark:
>>
>> > That is, here (and avoid the FOR_EACH_LOOP change):
>> >
>> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>> >   && dump_enabled_p ())
>> >   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
>> >"loop vectorized\n");
>> > -   vect_transform_loop (loop_vinfo);
>> > +   new_loop = vect_transform_loop (loop_vinfo);
>> > num_vectorized_loops++;
>> >/* Now that the loop has been vectorized, allow it to be unrolled
>> >   etc.  */
>> >  loop->force_vectorize = false;
>> >
>> > +   /* Add new loop to a processing queue.  To make it easier
>> > +  to match loop and its epilogue vectorization in dumps
>> > +  put new loop as the next loop to process.  */
>> > +   if (new_loop)
>> > + {
>> > +   loops.safe_insert (i + 1, new_loop->num);
>> > +   vect_loops_num = number_of_loops (cfun);
>> > + }
>> >
>> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
>> > function which will set up stuff properly (and also perform
>> > the if-conversion of the epilogue there).
>> >
>> > That said, if we can get in non-masked epilogue vectorization
>> > separately that would be great.
>>
>> Could you please clarify your proposal.
>
> When a loop was vectorized set things up to immediately vectorize
> its epilogue, avoiding changing the loop iteration and avoiding
> the re-use of ->aux.
>
> Richard.
>
>> Thanks.
>> Yuri.
>>
>> 2016-11-02 15:27 GMT+03:00 Richard Biener :
>> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
>> >
>> >> Hi All,
>> >>
>> >> I re-send all patches sent by Ilya earlier for review which support
>> >> vectorization of loop epilogues and loops with low trip count. We
>> >> assume that the only patch - vec-tails-07-combine-tail.patch - was not
>> >> approved by Jeff.
>> >>
>> >> I did re-base of all patches and performed bootstrapping and
>> >> regression testing that did not show any new failures. Also all
>> >> changes related to new vect_do_peeling algorithm have been changed
>> >> accordingly.
>> >>
>> >> Is it OK for trunk?
>> >
>> > I would have preferred that the series up to -03-nomask-tails would
>> > _only_ contain epilogue loop vectorization changes but unfortunately
>> > the patchset is oddly separated.
>> >
>> > I have a comment on that part nevertheless:
>> >
>> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
>> > loop_vinfo)
>> >/* Check if we can possibly peel the loop.  */
>> >if (!vect_can_advance_ivs_p (loop_vinfo)
>> >|| !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
>> > -  || loop->inner)
>> > +  || loop->inner
>> > +  /* Required peeling was performed in prologue and
>> > +is not required for epilogue.  */
>> > +  || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
>> >  do_peeling = false;
>> >
>> >if (do_peeling
>> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
>> > loop_vinfo)
>> >
>> >do_versioning =
>> > optimize_loop_nest_for_speed_p (loop)
>> > -   && (!loop->inner); /* FORNOW */
>> > +   && (!loop->inner) /* FORNOW */
>> > +/* Required versioning was performed for the
>> > +  original loop and is not required for epilogue.  */
>> > +   && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
>> >
>> >if (do_versioning)
>> >  {
>> >
>> > please do that check in the single caller of this function.
>> >
>> > Otherwise I still dislike the new ->aux use and I believe that simply
>> > passing down info from the processed parent would be _much_ cleaner.
>> > That is, here (and avoid the FOR_EACH_LOOP change):
>> >
>> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>> > && dump_enabled_p ())
>> >dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
>> > "loop vectorized\n");
>> > -   vect_transform_loop (loop_vinfo);
>> > +   new_loop = vect_transform_loop (loop_vinfo);
>> > num_vectorized_loops++;
>> > /* Now that the loop has been vectorized, allow it to be unrolled
>> >etc.  */
>> > loop->force_vectorize = false;
>> >
>> > +   /* Add new loop to a processing queue.  To make it easier
>> > +  to match loop and its epilogue vectorization in dumps
>> > +  put new loop as the next loop to process.  */
>> > +   if (new_loop)
>> > + {
>> > +   loops.safe_insert (i + 1, new_loop->num);
>> > +   vect_loops_num = number_of_loops (cfun);
>> > + }
>> >
>> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
>> > 

Re: [PATCH][ARM] Fix ldrd offsets

2016-11-08 Thread Ramana Radhakrishnan
On Thu, Nov 3, 2016 at 12:20 PM, Wilco Dijkstra  wrote:
> Fix ldrd offsets of Thumb-2 - for TARGET_LDRD the range is +-1020,
> without it -255..4091.  This reduces the number of addressing instructions
> when using DI mode operations (such as in PR77308).
>
> Bootstrap & regress OK.
>
> ChangeLog:
> 2015-11-03  Wilco Dijkstra  
>
> gcc/
> * config/arm/arm.c (arm_legitimate_index_p): Add comment.
> (thumb2_legitimate_index_p): Use correct range for DI/DF mode.
> --
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 3c4c7042d9c2101619722b5822b3d1ca37d637b9..5d12cf9c46c27d60a278d90584bde36ec86bb3fe 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -7486,6 +7486,8 @@ arm_legitimate_index_p (machine_mode mode, rtx index, RTX_CODE outer,
> {
>   HOST_WIDE_INT val = INTVAL (index);
>
> + /* Assume we emit ldrd or 2x ldr if !TARGET_LDRD.
> +If vldr is selected it uses arm_coproc_mem_operand.  */
>   if (TARGET_LDRD)
> return val > -256 && val < 256;
>   else
> @@ -7613,11 +7615,13 @@ thumb2_legitimate_index_p (machine_mode mode, rtx index, int strict_p)
>if (code == CONST_INT)
> {
>   HOST_WIDE_INT val = INTVAL (index);
> - /* ??? Can we assume ldrd for thumb2?  */
> - /* Thumb-2 ldrd only has reg+const addressing modes.  */
> - /* ldrd supports offsets of +-1020.
> -However the ldr fallback does not.  */
> - return val > -256 && val < 256 && (val & 3) == 0;
> + /* Thumb-2 ldrd only has reg+const addressing modes.
> +Assume we emit ldrd or 2x ldr if !TARGET_LDRD.
> +If vldr is selected it uses arm_coproc_mem_operand.  */
> + if (TARGET_LDRD)

I suspect this should be : if (TARGET_LDRD && !fix_cm3_ldrd)  - I am a
bit worried about this change because of the non-uniformity with ldr
and the fallout with other places where things may break with this.  I
would like a test with -mcpu=cortex-m3/-mthumb as well for an
arm-none-eabi target to see what the fallout of this change is on that
...
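
For reference, the range check under discussion can be sketched in
standalone C.  This is only an illustration under stated assumptions: the
ranges (+-1020 with ldrd, -255..4091 without) come from Wilco's
description, the multiple-of-4 requirement comes from the code being
replaced, and have_ldrd / fix_cm3_ldrd are plain parameters standing in
for TARGET_LDRD and the fix_cm3_ldrd flag.  It is not the code from the
patch.

```c
#include <stdbool.h>

/* Hypothetical stand-alone model of the DImode/DFmode index check.
   have_ldrd and fix_cm3_ldrd stand in for GCC's TARGET_LDRD and
   fix_cm3_ldrd; they are parameters here only for illustration.  */
static bool
di_index_ok (long val, bool have_ldrd, bool fix_cm3_ldrd)
{
  if (have_ldrd && !fix_cm3_ldrd)
    /* ldrd: offset within +-1020 and a multiple of 4.  */
    return val >= -1020 && val <= 1020 && (val & 3) == 0;

  /* Fallback of two ldr instructions, using the -255..4091 range
     stated in the patch description.  */
  return val >= -255 && val <= 4091;
}
```

With the extra !fix_cm3_ldrd condition, an offset such as 1022 is rejected
when ldrd is used but still accepted on the ldr fallback path, which is
the kind of non-uniformity worth testing with -mcpu=cortex-m3.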


Ramana


Re: [PATCH, GCC/ARM, ping] Fix PR77904: callee-saved register trashed when clobbering sp

2016-11-08 Thread Thomas Preudhomme

Ping?

Best regards,

Thomas

On 03/11/16 16:52, Thomas Preudhomme wrote:

Hi,

When using a callee-saved register to save the frame pointer the Thumb-1
prologue fails to save the callee-saved register before that. For ARM and
Thumb-2 targets the frame pointer is handled as a special case but nothing is
done for Thumb-1 targets. This patch adds the same logic for Thumb-1 targets.
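
The intent of the change can be sketched outside of GCC as follows.  This
is a simplified model, not the actual thumb1_compute_save_reg_mask: the
bitmask parameters and the fp_regnum argument are hypothetical stand-ins
for df_regs_ever_live_p, callee_saved_reg_p and
HARD_FRAME_POINTER_REGNUM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: bit N of LIVE is set if register rN is ever live,
   bit N of CALLEE_SAVED if rN is callee-saved.  FP_REGNUM stands in for
   HARD_FRAME_POINTER_REGNUM.  */
static uint32_t
save_reg_mask (uint32_t live, uint32_t callee_saved,
               bool frame_pointer_needed, unsigned fp_regnum)
{
  uint32_t mask = live & callee_saved;

  /* Handle the frame pointer as a special case: it must be saved even
     if it was not reported as live.  */
  if (frame_pointer_needed)
    mask |= UINT32_C (1) << fp_regnum;

  return mask;
}
```

In this model, with r4-r7 callee-saved and only r4 and r5 live, the mask
gains the r7 bit only when a frame pointer is needed.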

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2016-11-02  Thomas Preud'homme  

PR target/77904
* config/arm/arm.c (thumb1_compute_save_reg_mask): mark frame pointer
in save register mask if it is needed.


*** gcc/testsuite/ChangeLog ***

2016-11-02  Thomas Preud'homme  

PR target/77904
* gcc.target/arm/pr77904.c: New test.


Testing: Testsuite shows no regression when run with arm-none-eabi GCC
cross-compiler for Cortex-M0 target.

Is this ok for trunk?

Best regards,

Thomas
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index dd8d5e5db8ca50daab648e58df290969aa794862..c7bf3320a3db5dfc4f33ae145ff2e5f239d6c0f9 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -19495,6 +19495,10 @@ thumb1_compute_save_reg_mask (void)
 if (df_regs_ever_live_p (reg) && callee_saved_reg_p (reg))
   mask |= 1 << reg;
 
+  /* Handle the frame pointer as a special case.  */
+  if (frame_pointer_needed)
+mask |= 1 << HARD_FRAME_POINTER_REGNUM;
+
   if (flag_pic
   && !TARGET_SINGLE_PIC_BASE
   && arm_pic_register != INVALID_REGNUM
diff --git a/gcc/testsuite/gcc.target/arm/pr77904.c b/gcc/testsuite/gcc.target/arm/pr77904.c
new file mode 100644
index ..76728c07e73350ce44160cabff3dd2fa7a6ef021
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr77904.c
@@ -0,0 +1,45 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+__attribute__ ((noinline, noclone)) void
+clobber_sp (void)
+{
+  __asm volatile ("" : : : "sp");
+}
+
+int
+main (void)
+{
+  int ret;
+
+  __asm volatile ("mov\tr4, #0xf4\n\t"
+		  "mov\tr5, #0xf5\n\t"
+		  "mov\tr6, #0xf6\n\t"
+		  "mov\tr7, #0xf7\n\t"
+		  "mov\tr0, #0xf8\n\t"
+		  "mov\tr8, r0\n\t"
+		  "mov\tr0, #0xfa\n\t"
+		  "mov\tr10, r0"
+		  : : : "r0", "r4", "r5", "r6", "r7", "r8", "r10");
+  clobber_sp ();
+
+  __asm volatile ("cmp\tr4, #0xf4\n\t"
+		  "bne\tfail\n\t"
+		  "cmp\tr5, #0xf5\n\t"
+		  "bne\tfail\n\t"
+		  "cmp\tr6, #0xf6\n\t"
+		  "bne\tfail\n\t"
+		  "cmp\tr7, #0xf7\n\t"
+		  "bne\tfail\n\t"
+		  "mov\tr0, r8\n\t"
+		  "cmp\tr0, #0xf8\n\t"
+		  "bne\tfail\n\t"
+		  "mov\tr0, r10\n\t"
+		  "cmp\tr0, #0xfa\n\t"
+		  "bne\tfail\n\t"
+		  "mov\t%0, #1\n"
+		  "fail:\n\t"
+		  "sub\tr0, #1"
+		  : "=r" (ret) : :);
+  return ret;
+}


Re: [PATCH, GCC/ARM 2/2, ping2] Allow combination of aprofile and rmprofile multilibs

2016-11-08 Thread Thomas Preudhomme

Ping?

Best regards,

Thomas

On 02/11/16 10:05, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 24/10/16 09:07, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 13/10/16 16:35, Thomas Preudhomme wrote:

Hi ARM maintainers,

This patchset aims at adding multilib support for R and M profile ARM
architectures and allowing it to be built alongside multilib for A profile ARM
architectures. This specific patch is concerned with the latter. The patch works
by moving the bits shared by both aprofile and rmprofile multilib build
(variable initialization as well as ISA and float ABI to build multilib for) to a
new t-multilib file. Then, based on which profile was requested in the
--with-multilib-list option, that file includes t-aprofile and/or t-rmprofile
where the architecture and FPU to build the multilib for are specified.

Unfortunately the duplication of CPU to A profile architectures could not be
avoided because substitutions due to MULTILIB_MATCHES are not transitive.
Therefore, mapping armv7-a to armv7 for rmprofile multilib build does not have
the expected effect. Two patches were written to allow this using 2 different
approaches but I decided against it because this is not the right solution IMO.
See caveats below for what I believe is the correct approach.
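
The non-transitivity can be pictured with a toy model in C.  This is only
a sketch of the behaviour described above, not how the driver implements
MULTILIB_MATCHES: the struct, the table entries and map_once are all
hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical match table: each entry says "treat FROM as TO".  The
   entries are illustrative, not copied from any t-* fragment.  */
struct match { const char *from; const char *to; };

static const struct match matches[] = {
  { "mcpu=cortex-a8", "march=armv7-a" },
  { "march=armv7-a",  "march=armv7"   },
};

/* Apply at most one substitution, modelling the non-transitive
   behaviour: the result is not looked up again.  */
static const char *
map_once (const char *opt)
{
  for (size_t i = 0; i < sizeof matches / sizeof matches[0]; i++)
    if (strcmp (opt, matches[i].from) == 0)
      return matches[i].to;
  return opt;
}
```

In this model a CPU option mapped to armv7-a stops at armv7-a and never
reaches armv7, which is why each CPU needs its own direct mapping.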


*** combined build caveats ***

As the documentation in this patch warns, there are a few caveats to using a
combined multilib build due to the way the multilib framework works.

1) For instance, when using only rmprofile the combination of options -mthumb
-march=armv7 -mfpu=neon selects the thumb/-march=armv7 multilib but in a combined
multilib build the default multilib would be used. This is because in the
rmprofile build -mfpu=neon is not specified in MULTILIB_OPTIONS and thus the
option is ignored when considering MULTILIB_REQUIRED entries.

2) Another issue is the fact that aprofile and rmprofile multilib build have
some conflicting requirements in terms of how to map options for which no
multilib is built to another option. (i) A first example of this is the
difference of CPU to architecture mapping mentioned above: rmprofile multilib
build needs A profile CPUs and architectures to be mapped down to ARMv7 so that
one of the v7-ar multilib gets chosen in such a case but aprofile needs A
profile architectures to stand on their own because multilibs are built for
several architectures.

(ii) Another example of this is that in aprofile multilib build no multilib is
built with -mfpu=fpv5-d16 but some multilibs are built with -mfpu=fpv4-d16.
Therefore, aprofile defines a match rule to map fpv5-d16 onto fpv4-d16. However,
rmprofile multilib profile *does* build some multilibs with -mfpu=fpv5-d16. This
has the consequence that when building for -mthumb -march=armv7e-m
-mfpu=fpv5-d16 -mfloat-abi=hard the default multilib is chosen because this is
rewritten into -mthumb -march=armv7e-m -mfpu=fpv4-d16 -mfloat-abi=hard and there
is no multilib for that.

Both of these issues could be handled by using MULTILIB_REUSE instead of
MULTILIB_MATCHES but this would require a large set of rules. I believe instead
the right approach is to create a new mechanism to inform GCC on how options can
be down mapped _when no multilib can be found_ which would require a smaller set
of rules and would make it explicit that the options are not equivalent. A patch
will be posted to this effect at a later time.

ChangeLog entry is as follows:


*** gcc/ChangeLog ***

2016-10-03  Thomas Preud'homme  

* config.gcc: Allow combinations of aprofile and rmprofile values for
--with-multilib-list.
* config/arm/t-multilib: New file.
* config/arm/t-aprofile: Remove initialization of MULTILIB_*
variables.  Remove setting of ISA and floating-point ABI in
MULTILIB_OPTIONS and MULTILIB_DIRNAMES.  Set architecture and FPU in
MULTI_ARCH_OPTS_A and MULTI_ARCH_DIRS_A rather than MULTILIB_OPTIONS
and MULTILIB_DIRNAMES respectively.  Add comment to introduce all
matches.  Add architecture matches for marvell-pj4 and generic-armv7-a
CPU options.
* config/arm/t-rmprofile: Likewise except for the matches changes.
* doc/install.texi (--with-multilib-list): Document the combination of
aprofile and rmprofile values and warn about pitfalls in doing that.


Testing:

* "tree install/lib/gcc/arm-none-eabi/7.0.0" is the same before and after the
patchset for both aprofile and rmprofile
* "tree install/lib/gcc/arm-none-eabi/7.0.0" is the same for aprofile,rmprofile
and rmprofile,aprofile
* default spec (gcc -dumpspecs) is the same for aprofile,rmprofile and
rmprofile,aprofile

* Difference in --print-multi-directory between aprofile or rmprofile and
aprofile,rmprofile for all combination of ISA (ARM/Thumb), architecture, CPU,
FPU and float ABI is as expected (best multilib for the combination is chosen),
modulo the caveat mentioned above and the new marvell-pj4 and 

Re: [PATCH, GCC/ARM 1/2, ping2] Add multilib support for embedded bare-metal targets

2016-11-08 Thread Thomas Preudhomme

Ping?

Best regards,

Thomas

On 02/11/16 10:05, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 27/10/16 15:26, Thomas Preudhomme wrote:

Hi Kyrill,

On 27/10/16 10:45, Kyrill Tkachov wrote:

Hi Thomas,

On 24/10/16 09:06, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 13/10/16 16:35, Thomas Preudhomme wrote:

Hi ARM maintainers,

This patchset aims at adding multilib support for R and M profile ARM
architectures and allowing it to be built alongside multilib for A profile ARM
architectures. This specific patch adds the t-rmprofile multilib Makefile
fragment for the former objective. Multilibs are built for all M profile
architectures involved: ARMv6S-M, ARMv7-M and ARMv7E-M as well as ARMv7. ARMv7
multilib is used for R profile architectures but also A profile architectures.

ChangeLog entry is as follows:


*** gcc/ChangeLog ***

2016-10-03  Thomas Preud'homme 

* config.gcc: Allow new rmprofile value for configure option
--with-multilib-list.
* config/arm/t-rmprofile: New file.
* doc/install.texi (--with-multilib-list): Document new rmprofile value
for ARM.


Testing:

== aprofile ==
* "tree install/lib/gcc/arm-none-eabi/7.0.0" is the same before and after the
patchset for both aprofile and rmprofile
* default spec (gcc -dumpspecs) is the same before and after the patchset for
aprofile
* No difference in --print-multi-directory between before and after the
patchset for aprofile for all combination of ISA (ARM/Thumb), architecture, CPU,
FPU and float ABI

== rmprofile ==
* aprofile and rmprofile use similar directory structure (ISA/arch/FPU/float
ABI) and directory naming
* Difference in --print-multi-directory between before [1] and after the
patchset for rmprofile for all combination of ISA (ARM/Thumb), architecture,
CPU, FPU and float ABI modulo the name and directory structure changes

[1] as per patch applied in ARM embedded branches
https://gcc.gnu.org/viewcvs/gcc/branches/ARM/embedded-5-branch/gcc/config/arm/t-baremetal?view=markup





== aprofile + rmprofile ==
* aprofile,rmprofile and rmprofile,aprofile builds give an error saying it is
not supported


Is this ok for master branch?

Best regards,

Thomas


+# Arch Matches
+MULTILIB_MATCHES   += march?armv6s-m=march?armv6-m
+MULTILIB_MATCHES   += march?armv8-m.main=march?armv8-m.main+dsp
+MULTILIB_MATCHES   += march?armv7=march?armv7-r
+ifeq (,$(HAS_APROFILE))
+MULTILIB_MATCHES   += march?armv7=march?armv7-a
+MULTILIB_MATCHES   += march?armv7=march?armv7ve
+MULTILIB_MATCHES   += march?armv7=march?armv8-a
+MULTILIB_MATCHES   += march?armv7=march?armv8-a+crc
+MULTILIB_MATCHES   += march?armv7=march?armv8.1-a
+MULTILIB_MATCHES   += march?armv7=march?armv8.1-a+crc
+endif

I think you want to update the patch to handle -march=armv8.2-a and
armv8.2-a+fp16
Thanks,
Kyrill


Indeed. Please find updated ChangeLog and patch (attached):

*** gcc/ChangeLog ***

2016-10-03  Thomas Preud'homme  

* config.gcc: Allow new rmprofile value for configure option
--with-multilib-list.
* config/arm/t-rmprofile: New file.
* doc/install.texi (--with-multilib-list): Document new rmprofile value
for ARM.

Ok for trunk?

Best regards,

Thomas
diff --git a/gcc/config.gcc b/gcc/config.gcc
index d956da22ad60abfe9c6b4be0882f9e7dd64ac39f..15b662ad5449f8b91eb760b7fbe45f33d8cecb4b 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3739,6 +3739,16 @@ case "${target}" in
 # pragmatic.
 tmake_profile_file="arm/t-aprofile"
 ;;
+			rmprofile)
+# Note that arm/t-rmprofile is a
+# stand-alone make file fragment to be
+# used only with itself.  We do not
+# specifically use the
+# TM_MULTILIB_OPTION framework because
+# this shorthand is more
+# pragmatic.
+tmake_profile_file="arm/t-rmprofile"
+;;
 			default)
 ;;
 			*)
@@ -3748,9 +3758,10 @@ case "${target}" in
 			esac
 
 			if test "x${tmake_profile_file}" != x ; then
-# arm/t-aprofile is only designed to work
-# without any with-cpu, with-arch, with-mode,
-# with-fpu or with-float options.
+# arm/t-aprofile and arm/t-rmprofile are only
+# designed to work without any with-cpu,
+# with-arch, with-mode, with-fpu or with-float
+# options.
 if test "x$with_arch" != x \
 || test "x$with_cpu" != x \
 || test "x$with_float" != x \
diff --git a/gcc/config/arm/t-rmprofile b/gcc/config/arm/t-rmprofile
new file mode 100644
index ..c8b5c9cbd03694eea69855e20372afa3e97d6b4c
--- /dev/null
+++ b/gcc/config/arm/t-rmprofile
@@ -0,0 +1,174 @@
+# Copyright (C) 2016 Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at 

Re: [PATCH, GCC/ARM, ping] Optional -mthumb for Thumb only targets

2016-11-08 Thread Thomas Preudhomme

Ping?

Best regards,

Thomas

On 25/10/16 18:07, Thomas Preudhomme wrote:

Hi,

Currently when a user compiles for a thumb-only target (such as Cortex-M
processors) without specifying the -mthumb option GCC throws the error "target
CPU does not support ARM mode". This is suboptimal from a usability point of
view: the -mthumb could be deduced from the -march or -mcpu option when there is
no ambiguity.

This patch implements this behavior by extending the DRIVER_SELF_SPECS to
automatically append -mthumb to the command line for thumb-only targets. It does
so by checking the last -march option if any is given or the last -mcpu option
otherwise. There is no ordering issue because conflicting -mcpu and -march is
already handled.

Note that the logic cannot be implemented in function arm_option_override
because we need to provide the modified command line to the GCC driver for
finding the right multilib path and the function arm_option_override is executed
too late for that effect.
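
The decision described above can be sketched in standalone C.  This is a
simplified illustration, not the function from the patch: the patch
consults arm_arch_core_flags and FL_NOTM, whereas the table, the
has_arm_mode field and thumb_only_spec below are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table: the real patch consults arm_arch_core_flags and
   the FL_NOTM flag; the entries here are only examples.  */
struct target { const char *name; bool has_arm_mode; };

static const struct target targets[] = {
  { "cortex-m0", false },
  { "armv7-m",   false },
  { "cortex-a8", true  },
  { "armv7-a",   true  },
};

/* ARGV holds the -march/-mcpu values; only the last one is examined.
   Returns "-mthumb" for a Thumb-only target and NULL otherwise.  */
static const char *
thumb_only_spec (int argc, const char **argv)
{
  if (argc == 0)
    return NULL;

  for (size_t i = 0; i < sizeof targets / sizeof targets[0]; i++)
    if (strcmp (argv[argc - 1], targets[i].name) == 0
        && !targets[i].has_arm_mode)
      return "-mthumb";

  return NULL;
}
```

Returning NULL for unknown or ARM-capable targets leaves the command line
untouched, so only unambiguous Thumb-only targets get -mthumb appended.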

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2016-10-18  Terry Guo  
Thomas Preud'homme  

PR target/64802
* common/config/arm/arm-common.c (arm_target_thumb_only): New function.
* config/arm/arm-opts.h: Include arm-flags.h.
(struct arm_arch_core_flag): Define.
(arm_arch_core_flags): Define.
* config/arm/arm-protos.h: Include arm-flags.h.
(FL_NONE, FL_ANY, FL_CO_PROC, FL_ARCH3M, FL_MODE26, FL_MODE32,
FL_ARCH4, FL_ARCH5, FL_THUMB, FL_LDSCHED, FL_STRONG, FL_ARCH5E,
FL_XSCALE, FL_ARCH6, FL_VFPV2, FL_WBUF, FL_ARCH6K, FL_THUMB2, FL_NOTM,
FL_THUMB_DIV, FL_VFPV3, FL_NEON, FL_ARCH7EM, FL_ARCH7, FL_ARM_DIV,
FL_ARCH8, FL_CRC32, FL_SMALLMUL, FL_NO_VOLATILE_CE, FL_IWMMXT,
FL_IWMMXT2, FL_ARCH6KZ, FL2_ARCH8_1, FL2_ARCH8_2, FL2_FP16INST,
FL_TUNE, FL_FOR_ARCH2, FL_FOR_ARCH3, FL_FOR_ARCH3M, FL_FOR_ARCH4,
FL_FOR_ARCH4T, FL_FOR_ARCH5, FL_FOR_ARCH5T, FL_FOR_ARCH5E,
FL_FOR_ARCH5TE, FL_FOR_ARCH5TEJ, FL_FOR_ARCH6, FL_FOR_ARCH6J,
FL_FOR_ARCH6K, FL_FOR_ARCH6Z, FL_FOR_ARCH6ZK, FL_FOR_ARCH6KZ,
FL_FOR_ARCH6T2, FL_FOR_ARCH6M, FL_FOR_ARCH7, FL_FOR_ARCH7A,
FL_FOR_ARCH7VE, FL_FOR_ARCH7R, FL_FOR_ARCH7M, FL_FOR_ARCH7EM,
FL_FOR_ARCH8A, FL2_FOR_ARCH8_1A, FL2_FOR_ARCH8_2A, FL_FOR_ARCH8M_BASE,
FL_FOR_ARCH8M_MAIN, arm_feature_set, ARM_FSET_MAKE,
ARM_FSET_MAKE_CPU1, ARM_FSET_MAKE_CPU2, ARM_FSET_CPU1, ARM_FSET_CPU2,
ARM_FSET_EMPTY, ARM_FSET_ANY, ARM_FSET_HAS_CPU1, ARM_FSET_HAS_CPU2,
ARM_FSET_HAS_CPU, ARM_FSET_ADD_CPU1, ARM_FSET_ADD_CPU2,
ARM_FSET_DEL_CPU1, ARM_FSET_DEL_CPU2, ARM_FSET_UNION, ARM_FSET_INTER,
ARM_FSET_XOR, ARM_FSET_EXCLUDE, ARM_FSET_IS_EMPTY,
ARM_FSET_CPU_SUBSET): Move to ...
* config/arm/arm-flags.h: This new file.
* config/arm/arm.h (TARGET_MODE_SPEC_FUNCTIONS): Define.
(EXTRA_SPEC_FUNCTIONS): Add TARGET_MODE_SPEC_FUNCTIONS to its value.
(TARGET_MODE_SPECS): Define.
(DRIVER_SELF_SPECS): Add TARGET_MODE_SPECS to its value.


*** gcc/testsuite/ChangeLog ***

2016-10-11  Thomas Preud'homme  

PR target/64802
* gcc.target/arm/optional_thumb-1.c: New test.
* gcc.target/arm/optional_thumb-2.c: New test.
* gcc.target/arm/optional_thumb-3.c: New test.


No regression when running the testsuite for -mcpu=cortex-m0 -mthumb,
-mcpu=cortex-m0 -marm and -mcpu=cortex-a8 -marm

Is this ok for trunk?

Best regards,

Thomas
diff --git a/gcc/common/config/arm/arm-common.c b/gcc/common/config/arm/arm-common.c
index a9abd6b026e2f35844e810fecf09e9890ea41e21..29ae0c35dd036a5293a51dc16f356e6ed668d3c2 100644
--- a/gcc/common/config/arm/arm-common.c
+++ b/gcc/common/config/arm/arm-common.c
@@ -97,6 +97,29 @@ arm_rewrite_mcpu (int argc, const char **argv)
   return arm_rewrite_selected_cpu (argv[argc - 1]);
 }
 
+/* Called by the driver to check whether the target denoted by current
+   command line options is a Thumb-only target.  ARGV is an array of
+   -march and -mcpu values (ie. it contains the rhs after the equal
+   sign) and we use the last one of them to make a decision.  The
+   number of elements in ARGV is given in ARGC.  */
+const char *
+arm_target_thumb_only (int argc, const char **argv)
+{
+  unsigned int opt;
+
+  if (argc)
+{
+  for (opt = 0; opt < (ARRAY_SIZE (arm_arch_core_flags)); opt++)
+	if ((strcmp (argv[argc - 1], arm_arch_core_flags[opt].name) == 0)
+	&& !ARM_FSET_HAS_CPU1(arm_arch_core_flags[opt].flags, FL_NOTM))
+	  return "-mthumb";
+
+  return NULL;
+}
+  else
+return NULL;
+}
+
 #undef ARM_CPU_NAME_LENGTH
 
 
diff --git a/gcc/config/arm/arm-flags.h b/gcc/config/arm/arm-flags.h
new file mode 100644
index ..9a5991aa07a229a7741e526c2876e7e0e4749db4
--- /dev/null
+++ b/gcc/config/arm/arm-flags.h
@@ -0,0 +1,210 @@
+/* Flags 

Re: [PATCH, ARM/testsuite 6/7, ping2] Force soft float in ARMv6-M and ARMv8-M Baseline options

2016-11-08 Thread Thomas Preudhomme

Ping,

Best regards,

Thomas

On 02/11/16 10:04, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 28/10/16 10:49, Thomas Preudhomme wrote:

On 22/09/16 16:47, Richard Earnshaw (lists) wrote:

On 22/09/16 15:51, Thomas Preudhomme wrote:

Sorry, noticed an error in the patch. It was not caught during testing
because GCC was built with --with-mode=thumb. Correct patch attached.

Best regards,

Thomas

On 22/09/16 14:49, Thomas Preudhomme wrote:

Hi,

ARMv6-M and ARMv8-M Baseline only support soft float ABI. Therefore, the
arm_arch_v8m_base add option should pass -mfloat-abi=soft, much like -mthumb is
passed for architectures that only support Thumb instruction set. This patch
adds -mfloat-abi=soft to both arm_arch_v6m and arm_arch_v8m_base add options.
Patch is in attachment.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2016-07-15  Thomas Preud'homme  

* lib/target-supports.exp (add_options_for_arm_arch_v6m): Add
-mfloat-abi=soft option.
(add_options_for_arm_arch_v8m_base): Likewise.


Is this ok for trunk?

Best regards,

Thomas


6_softfloat_testing_v6m_v8m_baseline.patch


diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 0dabea0850124947a7fe333e0b94c4077434f278..b5d72f1283be6a6e4736a1d20936e169c1384398 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3540,24 +3540,25 @@ proc check_effective_target_arm_fp16_hw { } {
 # Usage: /* { dg-require-effective-target arm_arch_v5_ok } */
 #/* { dg-add-options arm_arch_v5 } */
 # /* { dg-require-effective-target arm_arch_v5_multilib } */
-foreach { armfunc armflag armdef } { v4 "-march=armv4 -marm" __ARM_ARCH_4__
- v4t "-march=armv4t" __ARM_ARCH_4T__
- v5 "-march=armv5 -marm" __ARM_ARCH_5__
- v5t "-march=armv5t" __ARM_ARCH_5T__
- v5te "-march=armv5te" __ARM_ARCH_5TE__
- v6 "-march=armv6" __ARM_ARCH_6__
- v6k "-march=armv6k" __ARM_ARCH_6K__
- v6t2 "-march=armv6t2" __ARM_ARCH_6T2__
- v6z "-march=armv6z" __ARM_ARCH_6Z__
- v6m "-march=armv6-m -mthumb" __ARM_ARCH_6M__
- v7a "-march=armv7-a" __ARM_ARCH_7A__
- v7r "-march=armv7-r" __ARM_ARCH_7R__
- v7m "-march=armv7-m -mthumb" __ARM_ARCH_7M__
- v7em "-march=armv7e-m -mthumb" __ARM_ARCH_7EM__
- v8a "-march=armv8-a" __ARM_ARCH_8A__
- v8_1a "-march=armv8.1a" __ARM_ARCH_8A__
- v8m_base "-march=armv8-m.base -mthumb" __ARM_ARCH_8M_BASE__
- v8m_main "-march=armv8-m.main -mthumb" __ARM_ARCH_8M_MAIN__ } {
+foreach { armfunc armflag armdef } {
+v4 "-march=armv4 -marm" __ARM_ARCH_4__
+v4t "-march=armv4t" __ARM_ARCH_4T__
+v5 "-march=armv5 -marm" __ARM_ARCH_5__
+v5t "-march=armv5t" __ARM_ARCH_5T__
+v5te "-march=armv5te" __ARM_ARCH_5TE__
+v6 "-march=armv6" __ARM_ARCH_6__
+v6k "-march=armv6k" __ARM_ARCH_6K__
+v6t2 "-march=armv6t2" __ARM_ARCH_6T2__
+v6z "-march=armv6z" __ARM_ARCH_6Z__
+v6m "-march=armv6-m -mthumb -mfloat-abi=soft" __ARM_ARCH_6M__
+v7a "-march=armv7-a" __ARM_ARCH_7A__
+v7r "-march=armv7-r" __ARM_ARCH_7R__
+v7m "-march=armv7-m -mthumb" __ARM_ARCH_7M__
+v7em "-march=armv7e-m -mthumb" __ARM_ARCH_7EM__
+v8a "-march=armv8-a" __ARM_ARCH_8A__
+v8_1a "-march=armv8.1a" __ARM_ARCH_8A__
+v8m_base "-march=armv8-m.base -mthumb -mfloat-abi=soft" __ARM_ARCH_8M_BASE__
+v8m_main "-march=armv8-m.main -mthumb" __ARM_ARCH_8M_MAIN__ } {
 eval [string map [list FUNC $armfunc FLAG $armflag DEF $armdef ] {
 proc check_effective_target_arm_arch_FUNC_ok { } {
 if { [ string match "*-marm*" "FLAG" ] &&



I think if you're going to do this you need to also check that changing
the ABI in this way isn't incompatible with other aspects of how the
user has invoked dejagnu.


The reason this patch was made is that without it dg-require-effective-target
arm_arch_v8m_base_ok evaluates to true for an arm-none-linux-gnueabihf toolchain
but then any testcase containing a function for such a target (such as the
atomic-op-* in gcc.target/arm) will error out because ARMv8-M Baseline does not
support hard float ABI.

I see 2 ways to fix this:

1) the approach taken in this patch, ie saying that to select ARMv8-M baseline
architecture you need the right -march, -mthumb but also the right float ABI.

Note that the comment at the top of that procedure says:
# Creates a series of routines that return 1 if the given architecture
# can be selected and a routine to give the flags to select that architecture

2) Add a function to the assembly that is used to test support for the
architecture.

The reason I favor the first one is that it enables more test while 

Re: [Ping][PATCH, ARM] PR71607: New approach to arm_disable_literal_pool

2016-11-08 Thread Andre Vieira (lists)
On 21/10/16 09:55, Andre Vieira (lists) wrote:
> On 06/10/16 14:57, Andre Vieira (lists) wrote:
>> Hello,
>>
>> This patch tackles the issue reported in PR71607. This patch takes a
>> different approach for disabling the creation of literal pools. Instead
>> of disabling the patterns that would normally transform the rtl into
>> actual literal pools, it disables the creation of this literal pool rtl
>> by making the target hook TARGET_CANNOT_FORCE_CONST_MEM return true if
>> arm_disable_literal_pool is true. I added patterns to split floating
>> point constants for both SF and DFmode. A pattern to handle the
>> addressing of label_refs had to be included as well since all
>> "memory_operand" patterns are disabled when
>> TARGET_CANNOT_FORCE_CONST_MEM returns true. Also the pattern for
>> splitting 32-bit immediates had to be changed, it was not accepting
>> 32-bit unsigned integers with the MSB set. I believe
>> const_int_operand expects the mode of the operand to be set to VOIDmode
>> and not SImode. I have only changed it in the patterns that were
>> affecting this code, though I suggest looking into changing it in the
>> rest of the ARM backend.
>>
>> I added more test cases. No regressions for arm-none-eabi with
>> Cortex-M0, Cortex-M3 and Cortex-M7.
>>
>> Is this OK for trunk?
>>
>> Cheers,
>> Andre
>>
>> gcc/ChangeLog:
>>
>> 2016-10-06  Andre Vieira  
>>
>> PR target/71607
>> * config/arm/arm.md (use_literal_pool): Remove.
>> (64-bit immediate split): No longer take cost into consideration
>> if 'arm_disable_literal_pool' is enabled.
>> (32-bit const split): Remove SImode from constant, which was
>> not allowing large unsigned integers to be split.
>> * config/arm/arm.c (thumb2_legitimate_address_p): Remove handling
>> of 'arm_disable_literal_pool' here.
>> (arm_max_const_double_inline_cost): Likewise.
>> (arm_cannot_force_const_mem): Return false for
>> 'arm_disable_literal_pool'.
>> (thumb2_legitimate_address_p): Remove check involving
>> 'arm_disable_literal_pool'
>> that is no longer relevant.
>> (arm_legitimate_constant_p): Ignore the outcome of
>> 'arm_cannot_force_const_mem'
>> if 'arm_disable_literal_pool' is enabled.
>> * config/arm/vfp.md (no_literal_pool_df_immediate): New.
>> (no_literal_pool_sf_immediate): New.
>> * config/arm/thumb2.md (*thumb2_movsi_labelref_insn): New.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2016-10-06  Andre Vieira  
>> Thomas Preud'homme  
>>
>> PR target/71607
>> * gcc.target/arm/thumb2-slow-flash-data.c: Rename to ...
>> * gcc.target/arm/thumb2-slow-flash-data-1.c: ... this.
>> * gcc.target/arm/thumb2-slow-flash-data-2.c: New.
>> * gcc.target/arm/thumb2-slow-flash-data-3.c: New.
>>
>>
> Ping.
> 
Ping.


Re: [PATCH TEST]XFAIL gcc.dg/vect/pr56541.c for !vect_cond_mixed target for now

2016-11-08 Thread Richard Biener
On Tue, Nov 8, 2016 at 1:40 PM, Bin Cheng  wrote:
> Hi,
> GCC should be able to vectorize gcc.dg/vect/pr56541.c on targets that do not
> support vect_cond_mixed.  The problem is that jump threading factors the
> comparison out of cond_expr and creates a mixed-type cond_expr.  As a result,
> it can only be vectorized with vect_cond_mixed.  We don't have a good solution
> at the moment, so this patch xfails it on such targets.

Ok.

Richard.

> Thanks,
> bin
>
> gcc/testsuite/ChangeLog
> 2016-11-04  Bin Cheng  
>
> * gcc.dg/vect/pr56541.c: Xfail on !vect_cond_mixed targets.


Re: [PATCH TEST]Drop xfail for gcc.dg/tree-ssa/pr71347.c

2016-11-08 Thread Richard Biener
On Tue, Nov 8, 2016 at 1:39 PM, Bin Cheng  wrote:
> Hi,
> With tree pre improvement @239414, test gcc.dg/tree-ssa/pr71347.c no longer 
> xfails.  Given the change is target independent, I am dropping xfail of the 
> test for all targets.

Ok.

Richard.

> Thanks,
> bin
>
> gcc/testsuite/ChangeLog
> 2016-11-04  Bin Cheng  
>
> * gcc.dg/tree-ssa/pr71347.c: Drop xfail.


Re: [PATCH TEST]Drop xfail for gcc.dg/vect/vect-cond-2.c

2016-11-08 Thread Richard Biener
On Tue, Nov 8, 2016 at 1:37 PM, Bin Cheng  wrote:
> Hi,
> Test gcc.dg/vect/vect-cond-2.c can be vectorized by GCC now, so this patch
> drops the xfail.

Ok.

Richard.
> bin
>
> gcc/testsuite/ChangeLog
> 2016-11-04  Bin Cheng  
>
> * gcc.dg/vect/vect-cond-2.c: Drop xfail.


Re: [PATCH 2/2, expand] make expand_builtin_strncmp more general

2016-11-08 Thread Aaron Sawdey
Richard,
  Thanks for the review ... comments below.

On Tue, 2016-11-08 at 13:36 +0100, Richard Biener wrote:
> On Tue, Nov 1, 2016 at 11:29 PM, Aaron Sawdey
>  wrote:
> > 
> > This patch adds code to expand_builtin_strncmp so it also attempts
> > expansion via cmpstrnsi in the case where c_strlen() returns NULL
> > for
> > both string arguments, meaning that neither one is a constant.
> 
> @@ -3929,61 +3929,85 @@
>  unsigned int arg1_align = get_pointer_alignment (arg1) /
> BITS_PER_UNIT;
>  unsigned int arg2_align = get_pointer_alignment (arg2) /
> BITS_PER_UNIT;
> 
> +/* If we don't have POINTER_TYPE, call the function.  */
> +if (arg1_align == 0 || arg2_align == 0)
> +  return NULL_RTX;
> +
> 
> hm?  we call validate_arglist at the beginning...
> 
> +if (!len1 && !len2)
> +  {
> +   /* If neither arg1 nor arg2 are constant strings.  */
> +   /* Stabilize the arguments in case gen_cmpstrnsi fails.  */
> +   arg1 = builtin_save_expr (arg1);
> +   arg2 = builtin_save_expr (arg2);
> 
> we no longer need the stabilization dance since we expand from
> GIMPLE.
> 
> +   /* Use save_expr here because cmpstrnsi may modify arg3
> +  and builtin_save_expr() doesn't force the save to
> happen.  */
> +   len = save_expr (arg3);
> +   len = fold_convert_loc (loc, sizetype, len);
> 
> cmpstrnsi may certainly not modify trees in-place.  If it does it
> needs to be fixed.

I needed this to get past a bootstrap failure on i386 where the
cmpstrnsi pattern was modifying cx which also contained a length used
elsewhere. The pattern does this:

  count = operands[3];
  countreg = ix86_zero_extend_to_Pmode (count);

but does not clobber operand 3 even though it uses it as the count for
the repz cmpsb. I wonder if this is the source of that problem?

> 
> +   /* If both arguments have side effects, we cannot
> optimize.  */
> +   if (TREE_SIDE_EFFECTS (len))
> + return NULL_RTX;
> 
> this can't happen anymore
> 
> btw, due to the re-indentation a context diff would be _much_ easier to
> review.

I assume you mean a diff that ignores whitespace -- I'll make sure to
do that.

> 
> So the real "meat" of your change is
> 
> /* If both arguments have side effects, we cannot optimize.  */
> -if (!len || TREE_SIDE_EFFECTS (len))
> +   if (TREE_SIDE_EFFECTS (len))
>   return NULL_RTX;
> 
> and falling back to arg3 as len if !len1 && !len2.
> 
> Plus
> 
> +/* Set MEM_SIZE as appropriate.  This will only happen in
> +   the case where incoming arg3 is constant and arg1/arg2 are
> not.  */
> +if (CONST_INT_P (arg3_rtx))
> +  {
> +   set_mem_size (arg1_rtx, INTVAL (arg3_rtx));
> +   set_mem_size (arg2_rtx, INTVAL (arg3_rtx));
> +  }
> 
> where I don't really see why you need it or how it is even correct
> (arg1 might
> terminate with a '\0' before arg3).
> 
> It would  be nice to simplify the patch to simply do
> 
>    if (!len1 && !len2)
>  len = arg3;
>    else if (!len1)
> ...

I'll see if I can simplify it to do just this and also remove the
unnecessary stuff you've flagged.

Thanks,
    Aaron

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain



Re: [ssa-coalesce] Rename register_ssa_partition

2016-11-08 Thread Richard Biener
On Tue, Nov 8, 2016 at 3:32 AM, kugan  wrote:
> Hi,
>
> In tree-ssa-coalesce, register_ssa_partition () and
> register_ssa_partition_check have lost their meaning over various commits
> and now just verify that ssa_var is indeed an SSA_NAME and not a
> virtual_operand_p. It is confusing when one looks at it for the first time
> and would expect more while reading register_ssa_partition.
>
> Attached patch just changes it to verify_ssa_for_coalesce to better reflect
> what it is doing now.
>
> Bootstrap and regression testing is ongoing. Is this OK for trunk if no
> regressions?

Hum, can you retain the inline wrapper please?  I find the new name
verify_ssa_for_coalesce bad as tree-ssa-live.h is something generic,
not just coalescing related.  I'd say a better improvement would be to remove
register_ssa_partition completely.

Richard.

> Thanks,
> Kugan
>
>
>
> gcc/ChangeLog:
>
> 2016-11-08  Kugan Vivekanandarajah  
>
> * tree-ssa-coalesce.c (register_default_def): Remove usage of arg
> map which is not used at all.
> (create_outofssa_var_map): Use renamed verify_ssa_for_coalesce from
> register_ssa_partition.
> * tree-ssa-live.c (verify_ssa_for_coalesce): Renamed
> register_ssa_partition.
> (register_ssa_partition_check): Remove.
> * tree-ssa-live.h (register_ssa_partition): Renamed to
> verify_ssa_for_coalesce


[PATCH TEST]XFAIL gcc.dg/vect/pr56541.c for !vect_cond_mixed target for now

2016-11-08 Thread Bin Cheng
Hi,
GCC should be able to vectorize gcc.dg/vect/pr56541.c on targets that do not
support vect_cond_mixed.  The problem is that jump threading factors the
comparison out of cond_expr and creates a mixed-type cond_expr.  As a result,
it can only be vectorized with vect_cond_mixed.  We don't have a good solution
at the moment, so this patch xfails it on such targets.

Thanks,
bin

gcc/testsuite/ChangeLog
2016-11-04  Bin Cheng  

* gcc.dg/vect/pr56541.c: Xfail on !vect_cond_mixed targets.
diff --git a/gcc/testsuite/gcc.dg/vect/pr56541.c b/gcc/testsuite/gcc.dg/vect/pr56541.c
index 16b8d7c..d5def68 100644
--- a/gcc/testsuite/gcc.dg/vect/pr56541.c
+++ b/gcc/testsuite/gcc.dg/vect/pr56541.c
@@ -24,4 +24,4 @@ void foo()
 }
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_cond_mixed } } } } */


[PATCH TEST]Drop xfail for gcc.dg/tree-ssa/pr71347.c

2016-11-08 Thread Bin Cheng
Hi,
With tree pre improvement @239414, test gcc.dg/tree-ssa/pr71347.c no longer 
xfails.  Given the change is target independent, I am dropping xfail of the 
test for all targets.

Thanks,
bin

gcc/testsuite/ChangeLog
2016-11-04  Bin Cheng  

* gcc.dg/tree-ssa/pr71347.c: Drop xfail.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c
index 428e41b..d2f7f05 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c
@@ -14,4 +14,4 @@ void foo (void)
 }
 
 /* Load of X[i - i] can be omitted by reusing X[i] in previous iteration.  */
-/* { dg-final { scan-tree-dump-not ".* = MEM.*;" "optimized" { xfail { ia64-*-* arm*-*-* m68k*-*-* } } } } */
+/* { dg-final { scan-tree-dump-not ".* = MEM.*;" "optimized" } } */


Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-08 Thread Richard Biener
On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:

> Hi Richard,
> 
> I did not understand your last remark:
> 
> > That is, here (and avoid the FOR_EACH_LOOP change):
> >
> > @@ -580,12 +586,21 @@ vectorize_loops (void)
> >   && dump_enabled_p ())
> >   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
> >"loop vectorized\n");
> > -   vect_transform_loop (loop_vinfo);
> > +   new_loop = vect_transform_loop (loop_vinfo);
> > num_vectorized_loops++;
> >/* Now that the loop has been vectorized, allow it to be unrolled
> >   etc.  */
> >  loop->force_vectorize = false;
> >
> > +   /* Add new loop to a processing queue.  To make it easier
> > +  to match loop and its epilogue vectorization in dumps
> > +  put new loop as the next loop to process.  */
> > +   if (new_loop)
> > + {
> > +   loops.safe_insert (i + 1, new_loop->num);
> > +   vect_loops_num = number_of_loops (cfun);
> > + }
> >
> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
> > function which will set up stuff properly (and also perform
> > the if-conversion of the epilogue there).
> >
> > That said, if we can get in non-masked epilogue vectorization
> > separately that would be great.
> 
> Could you please clarify your proposal.

When a loop was vectorized set things up to immediately vectorize
its epilogue, avoiding changing the loop iteration and avoiding
the re-use of ->aux.

Richard.
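
For readers following the thread, the general idea of vectorizing a loop's epilogue can be modelled in plain C. This is only a sketch of the concept and says nothing about how the vectorizer implements it: the function name sum_with_vector_epilogue and the lane counts 8 and 4 are assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration: sum N ints using a "main" loop that handles
   8 elements per iteration, an "epilogue" loop that handles 4 elements per
   iteration, and a scalar tail for whatever is left.  */
long
sum_with_vector_epilogue (const int *a, size_t n)
{
  assert (a != NULL || n == 0);

  long sum = 0;
  size_t i = 0;

  /* Main loop: stands in for the vectorized loop body (8 lanes).  */
  for (; i + 8 <= n; i += 8)
    for (size_t lane = 0; lane < 8; lane++)
      sum += a[i + lane];

  /* Epilogue: instead of running up to 7 scalar iterations, process the
     leftover with a narrower 4-lane step.  It runs at most once.  */
  for (; i + 4 <= n; i += 4)
    for (size_t lane = 0; lane < 4; lane++)
      sum += a[i + lane];

  /* Scalar tail: at most 3 iterations remain.  */
  for (; i < n; i++)
    sum += a[i];

  return sum;
}
```

With n = 23, the main loop covers 16 elements, the epilogue covers 4 and the scalar tail covers the last 3.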

> Thanks.
> Yuri.
> 
> 2016-11-02 15:27 GMT+03:00 Richard Biener :
> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
> >
> >> Hi All,
> >>
> >> I re-send all patches sent by Ilya earlier for review which support
> >> vectorization of loop epilogues and loops with low trip count. We
> >> assume that the only patch - vec-tails-07-combine-tail.patch - was not
> >> approved by Jeff.
> >>
> >> I did re-base of all patches and performed bootstrapping and
> >> regression testing that did not show any new failures. Also all
> >> changes related to new vect_do_peeling algorithm have been changed
> >> accordingly.
> >>
> >> Is it OK for trunk?
> >
> > I would have preferred that the series up to -03-nomask-tails would
> > _only_ contain epilogue loop vectorization changes but unfortunately
> > the patchset is oddly separated.
> >
> > I have a comment on that part nevertheless:
> >
> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
> > loop_vinfo)
> >/* Check if we can possibly peel the loop.  */
> >if (!vect_can_advance_ivs_p (loop_vinfo)
> >|| !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
> > -  || loop->inner)
> > +  || loop->inner
> > +  /* Required peeling was performed in prologue and
> > +is not required for epilogue.  */
> > +  || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
> >  do_peeling = false;
> >
> >if (do_peeling
> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
> > loop_vinfo)
> >
> >do_versioning =
> > optimize_loop_nest_for_speed_p (loop)
> > -   && (!loop->inner); /* FORNOW */
> > +   && (!loop->inner) /* FORNOW */
> > +/* Required versioning was performed for the
> > +  original loop and is not required for epilogue.  */
> > +   && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
> >
> >if (do_versioning)
> >  {
> >
> > please do that check in the single caller of this function.
> >
> > Otherwise I still dislike the new ->aux use and I believe that simply
> > passing down info from the processed parent would be _much_ cleaner.
> > That is, here (and avoid the FOR_EACH_LOOP change):
> >
> > @@ -580,12 +586,21 @@ vectorize_loops (void)
> > && dump_enabled_p ())
> >dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
> > "loop vectorized\n");
> > -   vect_transform_loop (loop_vinfo);
> > +   new_loop = vect_transform_loop (loop_vinfo);
> > num_vectorized_loops++;
> > /* Now that the loop has been vectorized, allow it to be unrolled
> >etc.  */
> > loop->force_vectorize = false;
> >
> > +   /* Add new loop to a processing queue.  To make it easier
> > +  to match loop and its epilogue vectorization in dumps
> > +  put new loop as the next loop to process.  */
> > +   if (new_loop)
> > + {
> > +   loops.safe_insert (i + 1, new_loop->num);
> > +   vect_loops_num = number_of_loops (cfun);
> > + }
> >
> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
> > function which will set up stuff properly (and also perform
> > the if-conversion of the epilogue there).
> >
> > That said, if we can get in non-masked epilogue vectorization
> > separately that would be great.
> >
> > I'm still torn about all the rest of the stuff and question its
> > usability (esp. merging the epilogue with the main 

[PATCH TEST]Drop xfail for gcc.dg/vect/vect-cond-2.c

2016-11-08 Thread Bin Cheng
Hi,
Test gcc.dg/vect/vect-cond-2.c can be vectorized by GCC now, so this patch
drops the xfail.

Thanks,
bin

gcc/testsuite/ChangeLog
2016-11-04  Bin Cheng  

* gcc.dg/vect/vect-cond-2.c: Drop xfail.
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-2.c b/gcc/testsuite/gcc.dg/vect/vect-cond-2.c
index 9a62856..d7da803 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-2.c
@@ -39,8 +39,6 @@ int main (void)
   return 0;
 }
 
-/* The order of computation should not be changed for cond_expr, therefore, 
-   it cannot be vectorized in reduction.  */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
 
 


Re: [PATCH 2/2, expand] make expand_builtin_strncmp more general

2016-11-08 Thread Richard Biener
On Tue, Nov 1, 2016 at 11:29 PM, Aaron Sawdey
 wrote:
> This patch adds code to expand_builtin_strncmp so it also attempts
> expansion via cmpstrnsi in the case where c_strlen() returns NULL for
> both string arguments, meaning that neither one is a constant.

@@ -3929,61 +3929,85 @@
 unsigned int arg1_align = get_pointer_alignment (arg1) / BITS_PER_UNIT;
 unsigned int arg2_align = get_pointer_alignment (arg2) / BITS_PER_UNIT;

+/* If we don't have POINTER_TYPE, call the function.  */
+if (arg1_align == 0 || arg2_align == 0)
+  return NULL_RTX;
+

hm?  we call validate_arglist at the beginning...

+if (!len1 && !len2)
+  {
+   /* If neither arg1 nor arg2 are constant strings.  */
+   /* Stabilize the arguments in case gen_cmpstrnsi fails.  */
+   arg1 = builtin_save_expr (arg1);
+   arg2 = builtin_save_expr (arg2);

we no longer need the stabilization dance since we expand from GIMPLE.

+   /* Use save_expr here because cmpstrnsi may modify arg3
+  and builtin_save_expr() doesn't force the save to happen.  */
+   len = save_expr (arg3);
+   len = fold_convert_loc (loc, sizetype, len);

cmpstrnsi may certainly not modify trees in-place.  If it does it
needs to be fixed.

+   /* If both arguments have side effects, we cannot optimize.  */
+   if (TREE_SIDE_EFFECTS (len))
+ return NULL_RTX;

this can't happen anymore

btw, due to the re-indentation a context diff would be _much_ easier to review.

So the real "meat" of your change is

/* If both arguments have side effects, we cannot optimize.  */
-if (!len || TREE_SIDE_EFFECTS (len))
+   if (TREE_SIDE_EFFECTS (len))
  return NULL_RTX;

and falling back to arg3 as len if !len1 && !len2.

Plus

+/* Set MEM_SIZE as appropriate.  This will only happen in
+   the case where incoming arg3 is constant and arg1/arg2 are not.  */
+if (CONST_INT_P (arg3_rtx))
+  {
+   set_mem_size (arg1_rtx, INTVAL (arg3_rtx));
+   set_mem_size (arg2_rtx, INTVAL (arg3_rtx));
+  }

where I don't really see why you need it or how it is even correct (arg1 might
terminate with a '\0' before arg3).

It would  be nice to simplify the patch to simply do

   if (!len1 && !len2)
 len = arg3;
   else if (!len1)
...

Richard.

> --
> Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
> 050-2/C113  (507) 253-7520 home: 507/263-0782
> IBM Linux Technology Center - PPC Toolchain
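
The concern about set_mem_size above can be illustrated with a small reference model of strncmp semantics. This is a sketch under assumptions, not GCC code: ref_strncmp and its examined out-parameter are hypothetical names introduced here. It shows that the comparison may stop at a NUL byte or a mismatch well before the length argument is exhausted, so the length argument is not necessarily the size of the memory that is read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference model of strncmp that also reports how many
   bytes of each string were examined.  */
int
ref_strncmp (const char *s1, const char *s2, size_t n, size_t *examined)
{
  assert (s1 != NULL && s2 != NULL && examined != NULL);

  size_t i = 0;
  int result = 0;
  while (i < n)
    {
      unsigned char c1 = (unsigned char) s1[i];
      unsigned char c2 = (unsigned char) s2[i];
      i++;
      if (c1 != c2)
        {
          /* First mismatch decides the result.  */
          result = c1 < c2 ? -1 : 1;
          break;
        }
      if (c1 == '\0')
        /* Both strings ended; nothing past the NUL is read.  */
        break;
    }
  *examined = i;
  return result;
}
```

Comparing "ab" with "ab" and a length of 100 examines only three bytes of each string (two characters plus the terminating NUL), although the length argument is 100.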


Re: [PATCH][1/2] Fix off-by-one error in clear_bit_region in store merging (PR tree-optimization/78234 ?)

2016-11-08 Thread Richard Biener
On Tue, Nov 8, 2016 at 1:03 PM, Kyrill Tkachov
 wrote:
> Hi all,
>
> There is an off-by-one error in the clear_bit_region helper in store merging
> in the case where it deals with
> multi-byte quantities starting at a non-zero bit offset. The particular
> input is
> {0xff, 0xff, 0xff} and we want to clear all bits except the least and most
> significant i.e. we want:
> {0x01, 0x00, 0x80} so it's called as clear_bit_region (input, 1, 22);
> This ends up clearing one more bit due to this bug. The patch fixes that.
> The last argument to clear_bit_region is the number of bits left to clear
> and since in the previous call we cleared
> BITS_PER_UNIT - start bits we should subtract exactly that amount from len
> when calculating the bits left to clear.
> This was uncovered when writing initial unit tests for these functions which
> are included in the followup patch.
>
> Bootstrapped and tested on aarch64 and x86_64 (the affected function is only
> called for little-endian code).
>
> Ok for trunk?

Ok.

Richard.

> Thanks,
> Kyrill
>
> 2016-11-08  Kyrylo Tkachov  
>
> PR tree-optimization/78234
> * gimple-ssa-store-merging.c (clear_bit_region): Fix off-by-one error
> in start != 0 case.


Re: [PATCHv2 4/7, GCC, ARM, V8M] ARMv8-M Security Extension's cmse_nonsecure_entry: clear registers

2016-11-08 Thread Kyrill Tkachov


On 28/10/16 17:07, Andre Vieira (lists) wrote:

On 27/10/16 11:44, Kyrill Tkachov wrote:

On 27/10/16 11:00, Andre Vieira (lists) wrote:

On 26/10/16 17:30, Kyrill Tkachov wrote:

On 26/10/16 17:26, Andre Vieira (lists) wrote:

On 26/10/16 13:51, Kyrill Tkachov wrote:

Hi Andre,

On 25/10/16 17:29, Andre Vieira (lists) wrote:

On 24/08/16 12:01, Andre Vieira (lists) wrote:

On 25/07/16 14:23, Andre Vieira (lists) wrote:

This patch extends support for the ARMv8-M Security Extensions
'cmse_nonsecure_entry' attribute to safeguard against leak of
information through unbanked registers.

When returning from a nonsecure entry function we clear all caller-saved
registers that are not used to pass return values, by writing either the
LR, in case of general purpose registers, or the value 0, in case of FP
registers. We use the LR to write to APSR and FPSCR too. We currently do
not support entry functions that pass arguments or return variables on
the stack and we diagnose this. This patch relies on the existing code
to make sure callee-saved registers used in cmse_nonsecure_entry
functions are saved and restored thus retaining their nonsecure mode
value, this should be happening already as it is required by AAPCS.

This patch also clears padding bits for cmse_nonsecure_entry functions
with struct and union return types. For unions a bit is only considered
a padding bit if it is an unused bit in every field of that union. The
function that calculates these is used in a later patch to do the same
for arguments of cmse_nonsecure_call's.

*** gcc/ChangeLog ***
2016-07-25  Andre Vieira
Thomas Preud'homme  

* config/arm/arm.c (output_return_instruction): Clear registers.
(thumb2_expand_return): Likewise.
(thumb1_expand_epilogue): Likewise.
(thumb_exit): Likewise.
(arm_expand_epilogue): Likewise.
(cmse_nonsecure_entry_clear_before_return): New.
(comp_not_to_clear_mask_str_un): New.
(compute_not_to_clear_mask): New.
* config/arm/thumb1.md (*epilogue_insns): Change length attribute.
* config/arm/thumb2.md (*thumb2_return): Likewise.

*** gcc/testsuite/ChangeLog ***
2016-07-25  Andre Vieira
Thomas Preud'homme  

* gcc.target/arm/cmse/cmse.exp: Test different multilibs separate.
* gcc.target/arm/cmse/struct-1.c: New.
* gcc.target/arm/cmse/bitfield-1.c: New.
* gcc.target/arm/cmse/bitfield-2.c: New.
* gcc.target/arm/cmse/bitfield-3.c: New.
* gcc.target/arm/cmse/baseline/cmse-2.c: Test that registers are cleared.
* gcc.target/arm/cmse/mainline/soft/cmse-5.c: New.
* gcc.target/arm/cmse/mainline/hard/cmse-5.c: New.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-5.c: New.
* gcc.target/arm/cmse/mainline/softfp/cmse-5.c: New.
* gcc.target/arm/cmse/mainline/softfp-sp/cmse-5.c: New.


Updated this patch to correctly clear only the cumulative
exception-status (0-4,7) and the condition code bits (28-31) of the
FPSCR. I also adapted the code to handle the bigger floating point
register files.
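
As a worked example of the bit ranges named above, the following sketch builds the mask for FPSCR bits 0-4, 7 and 28-31 and applies it to a value. It is plain arithmetic under assumptions, not the code from the patch: bit_range, fpscr_clear_mask and clear_fpscr_status are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Build a mask with bits LO..HI (inclusive) set.  */
static uint32_t
bit_range (unsigned int lo, unsigned int hi)
{
  assert (lo <= hi && hi < 32);
  uint32_t width = hi - lo + 1;
  uint32_t ones = width == 32 ? UINT32_MAX : ((UINT32_C (1) << width) - 1);
  return ones << lo;
}

/* Bits to clear: cumulative exception-status (0-4, 7) and the condition
   code bits (28-31), as listed in the message above.  */
uint32_t
fpscr_clear_mask (void)
{
  return bit_range (0, 4) | bit_range (7, 7) | bit_range (28, 31);
}

/* Clear exactly those bits and leave every other bit untouched.  */
uint32_t
clear_fpscr_status (uint32_t fpscr)
{
  return fpscr & ~fpscr_clear_mask ();
}
```

The three ranges combine to 0x0000001F | 0x00000080 | 0xF0000000 = 0xF000009F, so an all-ones input comes back as 0x0FFFFF60.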



This patch extends support for the ARMv8-M Security Extensions
'cmse_nonsecure_entry' attribute to safeguard against leak of
information through unbanked registers.

When returning from a nonsecure entry function we clear all caller-saved
registers that are not used to pass return values, by writing either the
LR, in case of general purpose registers, or the value 0, in case of FP
registers. We use the LR to write to APSR. For FPSCR we clear only the
cumulative exception-status (0-4, 7) and the condition code bits
(28-31). We currently do not support entry functions that pass arguments
or return variables on the stack and we diagnose this. This patch relies
on the existing code to make sure callee-saved registers used in
cmse_nonsecure_entry functions are saved and restored thus retaining
their nonsecure mode value, this should be happening already as it is
required by AAPCS.

This patch also clears padding bits for cmse_nonsecure_entry functions
with struct and union return types. For unions a bit is only considered
a padding bit if it is an unused bit in every field of that union. The
function that calculates these is used in a later patch to do the same
for arguments of cmse_nonsecure_call's.

*** gcc/ChangeLog ***
2016-07-xx  Andre Vieira
Thomas Preud'homme  

* config/arm/arm.c (output_return_instruction): Clear registers.
(thumb2_expand_return): Likewise.
(thumb1_expand_epilogue): Likewise.
(thumb_exit): Likewise.
(arm_expand_epilogue): Likewise.
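
The rule for unions described above (a bit is padding only if it is unused in every field) amounts to intersecting the per-field padding. The sketch below models that with one "used bits" mask per field inside a 32-bit container. It is an assumption-laden illustration rather than the patch's implementation: union_padding_mask and the mask representation are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: each union field is described by a mask of the bits
   it actually uses within a 32-bit container.  A bit is padding for the
   union only if no field uses it.  */
uint32_t
union_padding_mask (const uint32_t *used_bits, size_t n_fields)
{
  assert (used_bits != NULL && n_fields > 0);

  uint32_t padding = UINT32_MAX;
  for (size_t i = 0; i < n_fields; i++)
    padding &= ~used_bits[i];	/* Intersect the per-field padding.  */
  return padding;
}
```

For two fields using 0x000000FF and 0x00FF000F, the per-field padding is 0xFFFFFF00 and 0xFF00FFF0, and only their intersection 0xFF00FF00 may be cleared.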

Re: [PATCH] use-after-scope fallout

2016-11-08 Thread Jakub Jelinek
On Tue, Nov 08, 2016 at 01:00:19PM +0100, Martin Liška wrote:
> This is a fallout fix where I changed:
> 
> 1) Fix ICE for lambda functions (added test-case: use-after-scope-4.C)
> 2) Fix ICE in gimplify_switch_expr, at gimplify.c:2269 (fixed by not adding
> artificial variables)
> 3) PR testsuite/78242 - I basically removed the test (not interesting)
> 4) LEAF and NOTHROW flags are properly set on ASAN {un}poison functions
> 5) dbg_cnt has been added
> 6) use-after-scope-types-4.C - scanned pattern is updated to work on i686
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Ready to be installed?
> Martin

> From 36eb4a8b3542729c9c428ac319d8422bea677869 Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Mon, 7 Nov 2016 14:49:00 +0100
> Subject: [PATCH] use-after-scope fallout
> 
> gcc/testsuite/ChangeLog:
> 
> 2016-11-08  Martin Liska  
> 
>   PR testsuite/78242
>   * g++.dg/asan/use-after-scope-4.C: New test.
>   * g++.dg/asan/use-after-scope-types-4.C: Update scanned pattern.
>   * gcc.dg/asan/use-after-scope-8.c: Remove.
> 
> gcc/ChangeLog:
> 
> 2016-11-08  Martin Liska  
> 
>   PR testsuite/78242
>   * dbgcnt.def: Add new debug counter asan_use_after_scope.
>   * gimplify.c (gimplify_decl_expr): Do not sanitize vars
>   with a value expr.  Do not add artificial variables to
>   live_switch_vars.  Use the debug counter.
>   (gimplify_target_expr): Use the debug counter.
>   * internal-fn.def: Remove ECF_TM_PURE from ASAN_MARK builtin.
>   * sanitizer.def: Set ATTR_NOTHROW_LEAF_LIST to
>   BUILT_IN_ASAN_CLOBBER_N and BUILT_IN_ASAN_UNCLOBBER_N.

Ok.  BTW, in stage3 please also check if/how nested functions (C and
fortran) work, I bet if you ASAN_MARK some vars and then
tree-nested.c moves them into an artificial struct that things might
not work 100% properly (e.g. would there be a guarantee that it is
unpoisoned upon function exit)?

Jakub


[PATCH][2/2] Add store merging unit tests

2016-11-08 Thread Kyrill Tkachov

Hi all,

This patch adds the plumbing for unit testing of the store merging pass.
It also adds some initial tests of some of the helpers used in
encode_tree_to_bitpos to manipulate byte arrays. They caught an off-by-one
bug that is fixed in patch [1/2].

Bootstrapped and tested on x86_64 and aarch64.

Ok for trunk?

Thanks,
Kyrill

2016-11-08  Kyrylo Tkachov  

* gimple-ssa-store-merging.c: Include selftest.h
(verify_array_eq): New function.
(verify_shift_bytes_in_array): Likewise.
(verify_shift_bytes_in_array_right): Likewise.
(verify_clear_bit_region): Likewise.
(verify_clear_bit_region_be): Likewise.
(store_merging_c_tests): Likewise.
* selftest.h (store_merging_c_tests): Declare prototype.
* selftest-run-tests.c (selftest::run_tests): Run
store_merging_c_tests.
diff --git a/gcc/gimple-ssa-store-merging.c b/gcc/gimple-ssa-store-merging.c
index 46f92ba2d2f4e85c4256be11be5c8b1d40c21499..7b59f81e26423c60f8bf1c975281d3904315b306 100644
--- a/gcc/gimple-ssa-store-merging.c
+++ b/gcc/gimple-ssa-store-merging.c
@@ -126,6 +126,7 @@
 #include "tree-eh.h"
 #include "target.h"
 #include "gimplify-me.h"
+#include "selftest.h"
 
 /* The maximum size (in bits) of the stores this pass should generate.  */
 #define MAX_STORE_BITSIZE (BITS_PER_WORD)
@@ -1499,3 +1500,141 @@ make_pass_store_merging (gcc::context *ctxt)
 {
   return new pass_store_merging (ctxt);
 }
+
+#if CHECKING_P
+
+namespace selftest {
+
+/* Selftests for store merging helpers.  */
+
+/* Assert that all elements of the byte arrays X and Y, both of length N
+   are equal.  */
+
+static void
+verify_array_eq (unsigned char *x, unsigned char *y, unsigned int n)
+{
+  for (unsigned int i = 0; i < n; i++)
+{
+  if (x[i] != y[i])
+	{
+	  fprintf (stderr, "Arrays do not match.  X:\n");
+	  dump_char_array (stderr, x, n);
+	  fprintf (stderr, "Y:\n");
+	  dump_char_array (stderr, y, n);
+	}
+  ASSERT_EQ (x[i], y[i]);
+}
+}
+
+/* Test shift_bytes_in_array and that it carries bits across between
+   bytes correctly.  */
+
+static void
+verify_shift_bytes_in_array (void)
+{
+   /* byte 1   | byte 0
+      00011111 | 11100000.  */
+  unsigned char orig[2] = { 0xe0, 0x1f };
+  unsigned char in[2];
+  memcpy (in, orig, sizeof orig);
+
+  unsigned char expected[2] = { 0x80, 0x7f };
+  shift_bytes_in_array (in, sizeof (in), 2);
+  verify_array_eq (in, expected, sizeof (in));
+
+  memcpy (in, orig, sizeof orig);
+  memcpy (expected, orig, sizeof orig);
+  /* Check that shifting by zero doesn't change anything.  */
+  shift_bytes_in_array (in, sizeof (in), 0);
+  verify_array_eq (in, expected, sizeof (in));
+
+}
+
+/* Test shift_bytes_in_array_right and that it carries bits across between
+   bytes correctly.  */
+
+static void
+verify_shift_bytes_in_array_right (void)
+{
+   /* byte 1   | byte 0
+      00011111 | 11100000.  */
+  unsigned char orig[2] = { 0x1f, 0xe0};
+  unsigned char in[2];
+  memcpy (in, orig, sizeof orig);
+  unsigned char expected[2] = { 0x07, 0xf8};
+  shift_bytes_in_array_right (in, sizeof (in), 2);
+  verify_array_eq (in, expected, sizeof (in));
+
+  memcpy (in, orig, sizeof orig);
+  memcpy (expected, orig, sizeof orig);
+  /* Check that shifting by zero doesn't change anything.  */
+  shift_bytes_in_array_right (in, sizeof (in), 0);
+  verify_array_eq (in, expected, sizeof (in));
+}
+
+/* Test clear_bit_region that it clears exactly the bits asked and
+   nothing more.  */
+
+static void
+verify_clear_bit_region (void)
+{
+  /* Start with all bits set and test clearing various patterns in them.  */
+  unsigned char orig[3] = { 0xff, 0xff, 0xff};
+  unsigned char in[3];
+  unsigned char expected[3];
+  memcpy (in, orig, sizeof in);
+
+  /* Check zeroing out all the bits.  */
+  clear_bit_region (in, 0, 3 * BITS_PER_UNIT);
+  expected[0] = expected[1] = expected[2] = 0;
+  verify_array_eq (in, expected, sizeof in);
+
+  memcpy (in, orig, sizeof in);
+  /* Leave the first and last bits intact.  */
+  clear_bit_region (in, 1, 3 * BITS_PER_UNIT - 2);
+  expected[0] = 0x1;
+  expected[1] = 0;
+  expected[2] = 0x80;
+  verify_array_eq (in, expected, sizeof in);
+}
+
+/* Test verify_clear_bit_region_be that it clears exactly the bits asked and
+   nothing more.  */
+
+static void
+verify_clear_bit_region_be (void)
+{
+  /* Start with all bits set and test clearing various patterns in them.  */
+  unsigned char orig[3] = { 0xff, 0xff, 0xff};
+  unsigned char in[3];
+  unsigned char expected[3];
+  memcpy (in, orig, sizeof in);
+
+  /* Check zeroing out all the bits.  */
+  clear_bit_region_be (in, BITS_PER_UNIT - 1, 3 * BITS_PER_UNIT);
+  expected[0] = expected[1] = expected[2] = 0;
+  verify_array_eq (in, expected, sizeof in);
+
+  memcpy (in, orig, sizeof in);
+  /* Leave the first and last bits intact.  */
+  clear_bit_region_be (in, BITS_PER_UNIT - 2, 3 * BITS_PER_UNIT - 2);
+  expected[0] = 0x80;
+  expected[1] = 0;
+  expected[2] = 0x1;
+  verify_array_eq 
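
The first shift test in the patch above expects { 0xe0, 0x1f } shifted by 2 to become { 0x80, 0x7f }. The following stand-alone sketch reproduces that arithmetic. It is a hypothetical re-implementation for illustration only (shift_left_le is not a GCC function), assuming ptr[0] is the least significant byte and that bits shifted out of one byte carry into the next higher-indexed byte.

```c
#include <assert.h>
#include <stddef.h>

#define BITS_PER_UNIT 8

/* Hypothetical helper: shift the SZ-byte array PTR left by AMNT bits
   (AMNT < BITS_PER_UNIT), treating ptr[0] as the least significant byte
   and carrying the bits shifted out of each byte into the next one.  */
void
shift_left_le (unsigned char *ptr, size_t sz, unsigned int amnt)
{
  assert (ptr != NULL && amnt < BITS_PER_UNIT);
  if (amnt == 0)
    return;

  unsigned char carry = 0;
  for (size_t i = 0; i < sz; i++)
    {
      /* Bits that fall off the top of this byte.  */
      unsigned char next_carry
	= (unsigned char) (ptr[i] >> (BITS_PER_UNIT - amnt));
      ptr[i] = (unsigned char) ((ptr[i] << amnt) | carry);
      carry = next_carry;
    }
}
```

Read as the 16-bit value 0x1fe0, the input becomes 0x7f80 after shifting left by 2, which is stored as { 0x80, 0x7f }.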

[PATCH][1/2] Fix off-by-one error in clear_bit_region in store merging (PR tree-optimization/78234 ?)

2016-11-08 Thread Kyrill Tkachov

Hi all,

There is an off-by-one error in the clear_bit_region helper in store merging
in the case where it deals with multi-byte quantities starting at a non-zero
bit offset. The particular input is {0xff, 0xff, 0xff} and we want to clear
all bits except the least and most significant i.e. we want:
{0x01, 0x00, 0x80} so it's called as clear_bit_region (input, 1, 22);
This ends up clearing one more bit due to this bug. The patch fixes that.
The last argument to clear_bit_region is the number of bits left to clear and
since in the previous call we cleared BITS_PER_UNIT - start bits we should
subtract exactly that amount from len when calculating the bits left to clear.
This was uncovered when writing initial unit tests for these functions which
are included in the followup patch.

Bootstrapped and tested on aarch64 and x86_64 (the affected function is only 
called for little-endian code).

Ok for trunk?
Thanks,
Kyrill

2016-11-08  Kyrylo Tkachov  

PR tree-optimization/78234
* gimple-ssa-store-merging.c (clear_bit_region): Fix off-by-one error
in start != 0 case.
diff --git a/gcc/gimple-ssa-store-merging.c b/gcc/gimple-ssa-store-merging.c
index 36bc833af910f85c4c8ed7581a247f5d3182053d..46f92ba2d2f4e85c4256be11be5c8b1d40c21499 100644
--- a/gcc/gimple-ssa-store-merging.c
+++ b/gcc/gimple-ssa-store-merging.c
@@ -337,7 +337,7 @@ clear_bit_region (unsigned char *ptr, unsigned int start,
   else if (start != 0)
 {
   clear_bit_region (ptr, start, BITS_PER_UNIT - start);
-  clear_bit_region (ptr + 1, 0, len - (BITS_PER_UNIT - start) + 1);
+  clear_bit_region (ptr + 1, 0, len - (BITS_PER_UNIT - start));
 }
   /* Whole bytes need to be cleared.  */
   else if (start == 0 && len > BITS_PER_UNIT)
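
The arithmetic behind the one-line fix can be checked with a small stand-alone model. This is a sketch under assumptions, not the GCC function: clear_bits is a simple bit-by-bit stand-in, and clear_split with its extra parameter is a hypothetical wrapper that mimics only the "start != 0" split, with extra = 1 reproducing the old "+ 1" and extra = 0 the corrected length.

```c
#include <assert.h>

#define BITS_PER_UNIT 8

/* Hypothetical stand-in: clear LEN bits of PTR starting at bit START,
   where bit 0 is the least significant bit of ptr[0].  */
static void
clear_bits (unsigned char *ptr, unsigned int start, unsigned int len)
{
  for (unsigned int bit = start; bit < start + len; bit++)
    ptr[bit / BITS_PER_UNIT]
      &= (unsigned char) ~(1u << (bit % BITS_PER_UNIT));
}

/* Model of the "start != 0" case: clear the rest of the first byte, then
   the remaining bits from the next byte on.  EXTRA is 1 to reproduce the
   off-by-one and 0 for the corrected arithmetic.  */
void
clear_split (unsigned char *ptr, unsigned int start, unsigned int len,
	     unsigned int extra)
{
  assert (start != 0 && start < BITS_PER_UNIT);
  assert (len >= BITS_PER_UNIT - start);

  unsigned int first = BITS_PER_UNIT - start;
  clear_bits (ptr, start, first);
  clear_bits (ptr + 1, 0, len - first + extra);
}
```

For the input from the message (start = 1, len = 22), the first step clears 7 bits, leaving 22 - 7 = 15 bits to clear from the second byte on. Passing 16 instead also clears the top bit of the third byte, which gives { 0x01, 0x00, 0x00 } rather than the intended { 0x01, 0x00, 0x80 }.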


[arm-embedded][committed] Treat a --with-headers option without argument same as default

2016-11-08 Thread Andre Vieira (lists)
Hi,

I backported this
https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02196.html to the
embedded-6-branch. Tested on arm-none-eabi.

Committed in revision r241960.

Cheers,
Andre

gcc/ChangeLog.arm:
2016-11-08  Andre Vieira  

   Backport from mainline
   2016-05-27  Ulrich Weigand  

   * configure.ac: Treat a --with-headers option without argument
   the same as the default (i.e. consult sys-include directory).
   * configure: Regenerate.
diff --git a/gcc/ChangeLog.arm b/gcc/ChangeLog.arm
index 6a1d2f46bc64820b54012c744987f988f556eeff..f45bd18e94e597208132b067607bf186d2e30f09 100644
--- a/gcc/ChangeLog.arm
+++ b/gcc/ChangeLog.arm
@@ -1,3 +1,12 @@
+2016-11-08  Andre Vieira  
+
+   Backport from mainline
+   2016-05-27  Ulrich Weigand  
+
+   * configure.ac: Treat a --with-headers option without argument
+   the same as the default (i.e. consult sys-include directory).
+   * configure: Regenerate.
+
 2016-10-27  Thomas Preud'homme  
 
Backport from mainline
diff --git a/gcc/configure b/gcc/configure
index fc83cc8ec6efc59c2e014e3c7619f6790fecf9a8..25e5e31a1afdffe8e922e6218705fe11c832edee 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -12263,7 +12263,7 @@ elif test "x$TARGET_SYSTEM_ROOT" != x; then
 fi
 
 if test x$host != x$target || test "x$TARGET_SYSTEM_ROOT" != x; then
-  if test "x$with_headers" != x; then
+  if test "x$with_headers" != x && test "x$with_headers" != xyes; then
 target_header_dir=$with_headers
   elif test "x$with_sysroot" = x; then
 target_header_dir="${test_exec_prefix}/${target_noncanonical}/sys-include"
diff --git a/gcc/configure.ac b/gcc/configure.ac
index dc22d3ce93c93b8e4874de1ff6560f592eb3ad11..228b58ce32393b2131cc8a4be13713aeb86476f8 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -2019,7 +2019,7 @@ elif test "x$TARGET_SYSTEM_ROOT" != x; then
 fi
 
 if test x$host != x$target || test "x$TARGET_SYSTEM_ROOT" != x; then
-  if test "x$with_headers" != x; then
+  if test "x$with_headers" != x && test "x$with_headers" != xyes; then
 target_header_dir=$with_headers
   elif test "x$with_sysroot" = x; then
 target_header_dir="${test_exec_prefix}/${target_noncanonical}/sys-include"


[PATCH] use-after-scope fallout

2016-11-08 Thread Martin Liška
Hello.

This is a fallout fix in which I changed:

1) Fix ICE for lambda functions (added test-case: use-after-scope-4.C)
2) Fix ICE in gimplify_switch_expr, at gimplify.c:2269 (fixed by not adding
artificial variables)
3) PR testsuite/78242 - I basically removed the test (not interesting)
4) LEAF and NOTHROW flags are properly set on ASAN {un}poison functions
5) dbg_cnt has been added
6) use-after-scope-types-4.C - scanned pattern is updated to work on i686

The patch bootstraps on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
From 36eb4a8b3542729c9c428ac319d8422bea677869 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 7 Nov 2016 14:49:00 +0100
Subject: [PATCH] use-after-scope fallout

gcc/testsuite/ChangeLog:

2016-11-08  Martin Liska  

	PR testsuite/78242
	* g++.dg/asan/use-after-scope-4.C: New test.
	* g++.dg/asan/use-after-scope-types-4.C: Update scanned pattern.
	* gcc.dg/asan/use-after-scope-8.c: Remove.

gcc/ChangeLog:

2016-11-08  Martin Liska  

	PR testsuite/78242
	* dbgcnt.def: Add new debug counter asan_use_after_scope.
	* gimplify.c (gimplify_decl_expr): Do not sanitize vars
	with a value expr.  Do not add artificial variables to
	live_switch_vars.  Use the debug counter.
	(gimplify_target_expr): Use the debug counter.
	* internal-fn.def: Remove ECF_TM_PURE from ASAN_MARK builtin.
	* sanitizer.def: Set ATTR_NOTHROW_LEAF_LIST to
	BUILT_IN_ASAN_CLOBBER_N and BUILT_IN_ASAN_UNCLOBBER_N.
---
 gcc/dbgcnt.def |  1 +
 gcc/gimplify.c | 10 --
 gcc/internal-fn.def|  2 +-
 gcc/sanitizer.def  |  4 +--
 gcc/testsuite/g++.dg/asan/use-after-scope-4.C  | 36 ++
 .../g++.dg/asan/use-after-scope-types-4.C  |  2 +-
 gcc/testsuite/gcc.dg/asan/use-after-scope-8.c  | 14 -
 7 files changed, 48 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/asan/use-after-scope-4.C
 delete mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-8.c

diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
index 78ddcc2..0a45bac 100644
--- a/gcc/dbgcnt.def
+++ b/gcc/dbgcnt.def
@@ -141,6 +141,7 @@ echo ubound: $ub
 */
 
 /* Debug counter definitions.  */
+DEBUG_COUNTER (asan_use_after_scope)
 DEBUG_COUNTER (auto_inc_dec)
 DEBUG_COUNTER (ccp)
 DEBUG_COUNTER (cfg_cleanup)
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index e5930e6..d392450 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -60,6 +60,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks-def.h"	/* FIXME: for lhd_set_decl_assembler_name */
 #include "builtins.h"
 #include "asan.h"
+#include "dbgcnt.h"
 
 /* Hash set of poisoned variables in a bind expr.  */
 static hash_set<tree> *asan_poisoned_variables = NULL;
@@ -1622,11 +1623,13 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p)
 	  && !asan_no_sanitize_address_p ()
 	  && !is_vla
 	  && TREE_ADDRESSABLE (decl)
-	  && !TREE_STATIC (decl))
+	  && !TREE_STATIC (decl)
+	  && !DECL_HAS_VALUE_EXPR_P (decl)
+	  && dbg_cnt (asan_use_after_scope))
 	{
 	  asan_poisoned_variables->add (decl);
 	  asan_poison_variable (decl, false, seq_p);
-	  if (gimplify_ctxp->live_switch_vars)
+	  if (!DECL_ARTIFICIAL (decl) && gimplify_ctxp->live_switch_vars)
 	gimplify_ctxp->live_switch_vars->add (decl);
 	}
 
@@ -6399,7 +6402,8 @@ gimplify_target_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p)
 	  else
 		cleanup = clobber;
 	}
-	  if (asan_sanitize_use_after_scope ())
+	  if (asan_sanitize_use_after_scope ()
+	  && dbg_cnt (asan_use_after_scope))
 	{
 	  tree asan_cleanup = build_asan_poison_call_expr (temp);
 	  if (asan_cleanup)
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 6a0a7f6..0869b2f 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -158,7 +158,7 @@ DEF_INTERNAL_FN (UBSAN_OBJECT_SIZE, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (ABNORMAL_DISPATCHER, ECF_NORETURN, NULL)
 DEF_INTERNAL_FN (BUILTIN_EXPECT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (ASAN_CHECK, ECF_TM_PURE | ECF_LEAF | ECF_NOTHROW, ".R...")
-DEF_INTERNAL_FN (ASAN_MARK, ECF_TM_PURE | ECF_LEAF | ECF_NOTHROW, ".R..")
+DEF_INTERNAL_FN (ASAN_MARK, ECF_LEAF | ECF_NOTHROW, ".R..")
 DEF_INTERNAL_FN (ADD_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (SUB_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (MUL_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
diff --git a/gcc/sanitizer.def b/gcc/sanitizer.def
index 1c142e9..c11c95a 100644
--- a/gcc/sanitizer.def
+++ b/gcc/sanitizer.def
@@ -166,9 +166,9 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_AFTER_DYNAMIC_INIT,
 		  "__asan_after_dynamic_init",
 		  BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST)
 DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_CLOBBER_N, "__asan_poison_stack_memory",
-		  BT_FN_VOID_PTR_PTRMODE, 0)
+		  

RE: [AArch64][ARM][GCC][PATCHv2 3/3] Add tests for missing Poly64_t intrinsics to GCC

2016-11-08 Thread Tamar Christina
Hi Christophe,

Thanks for the review!

> 
> A while ago I added p64_p128.c, to contain all the poly64/128 tests except for
> vreinterpret.
> Why do you need to create p64.c ?

I originally created it because I had a much smaller set of intrinsics that I
wanted to add initially. This grew, and it hadn't occurred to me that I can
use the existing file now.

Another reason was the effective-target arm_crypto_ok as you mentioned below.

> 
> Similarly, adding tests for vcreate_p64 etc... in p64.c or p64_p128.c might be
> easier to maintain than adding them to vcreate.c etc with several #ifdef
> conditions.

Fair enough, I'll move them to p64_p128.c.

> For vdup-vmov.c, why do you add the "&& defined(__aarch64__)"
> condition? These intrinsics are defined in arm/arm_neon.h, right?
> They are tested in p64_p128.c

I should have looked for them. They weren't being tested before, so I had
mistakenly assumed that they weren't available. Now I realize I just need
to add the proper test option to the file to enable crypto. I'll update this
as well.

> Looking at your patch, it seems some tests are currently missing for arm:
> vget_high_p64. I'm not sure why I missed it when I removed neont-
> testgen...

I'll adjust the test conditions so they run for ARM as well.

> 
> Regarding vreinterpret_p128.c, doesn't the existing effective-target
> arm_crypto_ok prevent the tests from running on aarch64?

Yes, they do. I was comparing the output against a clean version and hadn't
noticed that they weren't running. Thanks!

> 
> Thanks,
> 
> Christophe


Re: [Patch 1/4] [libgcc, ARM] Generalise float-to-half conversion function.

2016-11-08 Thread James Greenhalgh
On Mon, Oct 24, 2016 at 02:44:37PM +0100, James Greenhalgh wrote:
> 
> Hi,
> 
> I'm adapting this patch from work started by Matthew Wahab.
> 
> Conversions from double precision floats to the ARM __fp16 are required
> to round only once. A conversion function for double to __fp16 is needed
> to support this on soft-fp targets. This and the following patch add this
> conversion function by reusing the existing float to __fp16 function
> config/arm/fp16.c:__gnu_f2h_internal.
> 
> This patch generalizes __gnu_f2h_internal by adding a specification of
> the source format and reworking the code to make use of it. Initially,
> only the binary32 format is supported.
> 
> Bootstrapped on an ARMv8-A machine with no issues, and cross-tested with
> a reasonable multi-lib range.
> 
> 2016-10-24  James Greenhalgh  
>   Matthew Wahab  
> 
>   * config/arm/fp16.c (struct format): New.
>   (binary32): New.
>   (__gnu_float2h_internal): New.  Body moved from
>   __gnu_f2h_internal and generalized.
>   (__gnu_f2h_internal): Move body to function __gnu_float2h_internal.
>   Call it with binary32.

Looking at it carefully, this patch has a bug in the way it handles
rounding for doubles. The relevant code looks like this:

  if (aexp < -14)
    {
      mask = point | (point - 1);
      /* Minimum exponent for half-precision is 2^-24.  */
      if (aexp >= -25)
        mask >>= 25 + aexp;
    }
  else
    mask = 0x1fff;

  /* Round.  */
  if (mantissa & mask)
    {
      increment = (mask + 1) >> 1;
      if ((mantissa & mask) == increment)
        increment = mantissa & (increment << 1);
      mantissa += increment;
      if (mantissa >= (point << 1))
        {
          mantissa >>= 1;
          aexp++;
        }
    }

But clearly that mask is not going to be sufficient when we are deciding
whether we ought to round a double or not as there are many more bits for
us to look at.
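
For illustration, the width of the rounding mask follows from how many fraction bits are discarded. binary32 has 23 fraction bits and half precision has 10, so 13 bits are dropped for normal results, which is the 0x1fff above. binary64 has 52 fraction bits, so 42 bits would have to be examined. The sketch below is not the code from the patch: round_nearest_even is a hypothetical helper that rounds to nearest, ties to even, for a caller-supplied number of dropped bits, and it shifts the result down rather than incrementing in place.

```c
#include <assert.h>
#include <stdint.h>

/* Round MANTISSA to nearest, ties to even, while discarding its low DROP
   bits.  Requires 0 < DROP < 64.  Returns the shortened mantissa, which
   may carry into one extra bit that the caller has to renormalise.
   Hypothetical helper, not part of config/arm/fp16.c.  */
static uint64_t
round_nearest_even (uint64_t mantissa, unsigned int drop)
{
  assert (drop > 0 && drop < 64);

  uint64_t mask = (UINT64_C (1) << drop) - 1;   /* Bits being discarded.  */
  uint64_t half = UINT64_C (1) << (drop - 1);   /* Exactly halfway.  */
  uint64_t rem = mantissa & mask;
  uint64_t kept = mantissa >> drop;

  /* Round up when above halfway, or at halfway with an odd result.  */
  if (rem > half || (rem == half && (kept & 1)))
    kept++;
  return kept;
}
```

With drop set to 13 the mask is 0x1fff, as in the binary32 path. With drop set to 42 the same routine looks at all of the extra bits a double carries, which a fixed 0x1fff mask cannot do.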

I'll work on a new spin of this patch.

I'm still hopeful that both this and the AArch64 (and generic bits and
pieces) support for _Float16 will make it for GCC 7.

Thanks,
James



Re: [AArch64][GCC][PATCHv2 1/3] Add missing Poly64_t intrinsics to GCC

2016-11-08 Thread Christophe Lyon
On 8 November 2016 at 12:29, James Greenhalgh  wrote:
> On Tue, Nov 08, 2016 at 12:20:57PM +0100, Christophe Lyon wrote:
>> On 07/11/2016 14:55, Tamar Christina wrote:
>> >Hi all,
>> >
>> >This patch (1 of 3) adds the following NEON intrinsics
>> >to the Aarch64 back-end of GCC:
>> >
>> >* vsli_n_p64
>> >* vsliq_n_p64
>> I notice that vsrl_n_p64 and vsriq_n_p64 exist on aarch32. Is this an
>> omission in this patch for aarch64?
>
> Presumably you meant vsri_n_p64 here?
Yes, sorry for the typo.

>
> That looks like an omission in the patch, but I'm still happy for it to go
> in as is with a follow-up patch adding the final two intrinsics.
>
Indeed

> Thanks,
> James
>


Re: [AArch64][GCC][PATCHv2 1/3] Add missing Poly64_t intrinsics to GCC

2016-11-08 Thread James Greenhalgh
On Tue, Nov 08, 2016 at 12:20:57PM +0100, Christophe Lyon wrote:
> On 07/11/2016 14:55, Tamar Christina wrote:
> >Hi all,
> >
> >This patch (1 of 3) adds the following NEON intrinsics
> >to the Aarch64 back-end of GCC:
> >
> >* vsli_n_p64
> >* vsliq_n_p64
> I notice that vsrl_n_p64 and vsriq_n_p64 exist on aarch32. Is this an
> omission in this patch for aarch64?

Presumably you meant vsri_n_p64 here?

That looks like an omission in the patch, but I'm still happy for it to go
in as is with a follow-up patch adding the final two intrinsics.

Thanks,
James



Re: [PATCH][ARM][1/2] Use generic_extra_costs in all remaining tuning structs

2016-11-08 Thread Richard Earnshaw (lists)
On 01/11/16 17:12, Kyrill Tkachov wrote:
> Hi all,
> 
> This is the first of two patches to do away with the transitional
> -mold-rtx-costs option and finalise
> the transition to the table-based RTX costs approach.
> 
> This first patch switches the remaining tuning structs to use
> generic_extra_costs so that the 2nd
> patch can remove the rtx_costs function pointer in tune_params. This
> essentially makes the transitional
> option -mnew-generic-costs the default (though it will be removed in the
> second patch).
> 
> Bootstrapped and tested on arm-none-linux-gnueabihf.
> 
> Ok for trunk?
> 
> Thanks,
> Kyrill
> 
> 2016-11-01  Kyrylo Tkachov  kyrylo.tkac...@arm.com
> 
> * config/arm/arm.c (arm_slowmul_tune): Use generic_extra_costs.
> (arm_fastmul_tune): Likewise.
> (arm_strongarm_tune): Likewise.
> (arm_xscale_tune): Likewise.
> (arm_9e_tune): Likewise.
> (arm_marvell_pj4_tune): Likewise.
> (arm_v6t2_tune): Likewise.
> (arm_v6m_tune): Likewise.
> (arm_fa726te_tune): Likewise.

OK.

R.



Re: [PATCH][ARM][2/2] Remove old rtx costs

2016-11-08 Thread Richard Earnshaw (lists)
On 01/11/16 17:12, Kyrill Tkachov wrote:
> Hi all,
> 
> This is the big removal patch that removes the old costs functions, the
> function pointer
> field in tune_params, and the transitional options -mold-rtx-costs and
> -mnew-generic-costs.
> The diff stats come in at:
> 3 files changed, 61 insertions(+), 1275 deletions(-)
> 
> Bootstrapped and tested on arm-none-linux-gnueabihf.
> 
> Ok for trunk?
> 
> Thanks,
> Kyrill
> 
> 2016-11-01  Kyrylo Tkachov  
> 
> * config/arm/arm.opt (mold-rtx-costs): Delete.
> (mnew-generic-costs): Delete.
> * config/arm/arm-protos.h (struct tune_params): Delete rtx_costs field.
> * config/arm/arm.c (arm_rtx_costs_1): Delete.
> (arm_size_rtx_costs): Likewise.
> (arm_slowmul_rtx_costs): Likewise.
> (arm_fastmul_rtx_costs): Likewise.
> (arm_xscale_rtx_costs): Likewise.
> (arm_9e_rtx_costs): Likewise.
> (arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune,
> arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune,
> arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune,
> arm_cortex_a53_tune, arm_cortex_a57_tune, arm_cortex_a9_tune,
> arm_cortex_a12_tune, arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune
> arm_cortex_a5_tune, arm_xgene1_tune, arm_marvell_pj4_tune,
> arm_cortex_a35_tune, arm_exynosm1_tune, arm_cortex_a73_tune,
> arm_cortex_m7_tune):
> Delete rtx_costs field.
> (arm_new_rtx_costs): Rename to...
> (arm_rtx_costs_internal): ... This.
> (arm_rtx_costs): Remove old way of doing rtx costs.

OK.

R.



Re: [AArch64][ARM][GCC][PATCHv2 3/3] Add tests for missing Poly64_t intrinsics to GCC

2016-11-08 Thread Christophe Lyon
On 7 November 2016 at 14:55, Tamar Christina  wrote:
> Hi all,
>
> This patch (3 of 3) adds updated tests for the NEON intrinsics
> added by the previous patches:
>
> Ran regression tests on aarch64-none-linux-gnu
> and on arm-none-linux-gnueabihf.
>
> Ok for trunk?
>
> Thanks,
> Tamar
>
>
> gcc/testsuite/
> 2016-11-04  Tamar Christina  
>
> * gcc.target/aarch64/advsimd-intrinsics/p64.c: New.
> * gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
> (Poly64x1_t, Poly64x2_t): Added type.
> (AARCH64_ONLY): Added macro.
> * gcc.target/aarch64/advsimd-intrinsics/vcombine.c:
> Added test for Poly64.
> * gcc.target/aarch64/advsimd-intrinsics/vcreate.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vget_high.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vget_lane.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vget_low.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vldX.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vldX_dup.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vldX_lane.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vstX_lane.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vst1_lane.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vld1.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p128.c:
> Added AArch64 flags.
> * gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p64.c:
> Added Aarch64 flags.

Hi Tamar,

A while ago I added p64_p128.c, to contain all the poly64/128
tests except for vreinterpret.
Why do you need to create p64.c ?

Similarly, adding tests for vcreate_p64 etc... in p64.c or p64_p128.c
might be easier to maintain than adding them to vcreate.c etc
with several #ifdef conditions.

For vdup-vmov.c, why do you add the "&& defined(__aarch64__)"
condition? These intrinsics are defined in arm/arm_neon.h, right?
They are tested in p64_p128.c

Looking at your patch, it seems some tests are currently missing
for arm: vget_high_p64. I'm not sure why I missed it when I removed
neont-testgen...

Regarding vreinterpret_p128.c, doesn't the existing
effective-target arm_crypto_ok prevent the tests from
running on aarch64?

Thanks,

Christophe


Re: [AArch64][GCC][PATCHv2 1/3] Add missing Poly64_t intrinsics to GCC

2016-11-08 Thread Christophe Lyon

On 07/11/2016 14:55, Tamar Christina wrote:

Hi all,

This patch (1 of 3) adds the following NEON intrinsics
to the Aarch64 back-end of GCC:

* vsli_n_p64
* vsliq_n_p64

I notice that vsrl_n_p64 and vsriq_n_p64 exist on aarch32. Is this an omission 
in this patch for aarch64?



* vld1_p64
* vld1q_p64
* vld1_dup_p64
* vld1q_dup_p64

* vst1_p64
* vst1q_p64
   
* vld2_p64

* vld3_p64
* vld4_p64
* vld2q_p64
* vld3q_p64
* vld4q_p64

* vld2_dup_p64
* vld3_dup_p64
* vld4_dup_p64

* __aarch64_vdup_lane_p64
* __aarch64_vdup_laneq_p64
* __aarch64_vdupq_lane_p64
* __aarch64_vdupq_laneq_p64

* vget_lane_p64
* vgetq_lane_p64

* vreinterpret_p8_p64
* vreinterpretq_p8_p64
* vreinterpret_p16_p64
* vreinterpretq_p16_p64

* vreinterpret_p64_f16
* vreinterpret_p64_f64
* vreinterpret_p64_s8
* vreinterpret_p64_s16
* vreinterpret_p64_s32
* vreinterpret_p64_s64
* vreinterpret_p64_f32
* vreinterpret_p64_u8
* vreinterpret_p64_u16
* vreinterpret_p64_u32
* vreinterpret_p64_u64
* vreinterpret_p64_p8

* vreinterpretq_p64_f64
* vreinterpretq_p64_s8
* vreinterpretq_p64_s16
* vreinterpretq_p64_s32
* vreinterpretq_p64_s64
* vreinterpretq_p64_f16
* vreinterpretq_p64_f32
* vreinterpretq_p64_u8
* vreinterpretq_p64_u16
* vreinterpretq_p64_u32
* vreinterpretq_p64_u64
* vreinterpretq_p64_p8

* vreinterpret_f16_p64
* vreinterpretq_f16_p64
* vreinterpret_f32_p64
* vreinterpretq_f32_p64
* vreinterpret_f64_p64
* vreinterpretq_f64_p64
* vreinterpret_s64_p64
* vreinterpretq_s64_p64
* vreinterpret_u64_p64
* vreinterpretq_u64_p64
* vreinterpret_s8_p64
* vreinterpretq_s8_p64
* vreinterpret_s16_p64
* vreinterpret_s32_p64
* vreinterpretq_s32_p64
* vreinterpret_u8_p64
* vreinterpret_u16_p64
* vreinterpretq_u16_p64
* vreinterpret_u32_p64
* vreinterpretq_u32_p64

* vset_lane_p64
* vsetq_lane_p64

* vget_low_p64
* vget_high_p64

* vcombine_p64
* vcreate_p64

* vst2_lane_p64
* vst3_lane_p64
* vst4_lane_p64
* vst2q_lane_p64
* vst3q_lane_p64
* vst4q_lane_p64

* vget_lane_p64
* vget_laneq_p64
* vset_lane_p64
* vset_laneq_p64

* vcopy_lane_p64
* vcopy_laneq_p64

* vdup_n_p64
* vdupq_n_p64
* vdup_lane_p64
* vdup_laneq_p64

* vld1_p64
* vld1q_p64
* vld1_dup_p64
* vld1q_dup_p64
* vld1q_dup_p64
* vmov_n_p64
* vmovq_n_p64
* vst3q_p64
* vst4q_p64

* vld1_lane_p64
* vld1q_lane_p64
* vst1_lane_p64
* vst1q_lane_p64
* vcopy_laneq_p64
* vcopyq_laneq_p64
* vdupq_laneq_p64

Added new tests for these and ran regression tests on aarch64-none-linux-gnu
and on arm-none-linux-gnueabihf.

Ok for trunk?

Thanks,
Tamar

gcc/
2016-11-04  Tamar Christina  

* config/aarch64/aarch64-builtins.c (TYPES_SETREGP): Added poly type.
(TYPES_GETREGP): Likewise.
(TYPES_SHIFTINSERTP): Likewise.
(TYPES_COMBINEP): Likewise.
(TYPES_STORE1P): Likewise.
* config/aarch64/aarch64-simd-builtins.def
(combine): Added poly generator.
(get_dregoi): Likewise.
(get_dregci): Likewise.
(get_dregxi): Likewise.
(ssli_n): Likewise.
(ld1): Likewise.
(st1): Likewise.
* config/aarch64/arm_neon.h
(poly64x1x2_t, poly64x1x3_t): New.
(poly64x1x4_t, poly64x2x2_t): Likewise.
(poly64x2x3_t, poly64x2x4_t): Likewise.
(poly64x1_t): Likewise.
(vcreate_p64, vcombine_p64): Likewise.
(vdup_n_p64, vdupq_n_p64): Likewise.
(vld2_p64, vld2q_p64): Likewise.
(vld3_p64, vld3q_p64): Likewise.
(vld4_p64, vld4q_p64): Likewise.
(vld2_dup_p64, vld3_dup_p64): Likewise.
(vld4_dup_p64, vsli_n_p64): Likewise.
(vsliq_n_p64, vst1_p64): Likewise.
(vst1q_p64, vst2_p64): Likewise.
(vst3_p64, vst4_p64): Likewise.
(__aarch64_vdup_lane_p64, __aarch64_vdup_laneq_p64): Likewise.
(__aarch64_vdupq_lane_p64, __aarch64_vdupq_laneq_p64): Likewise.
(vget_lane_p64, vgetq_lane_p64): Likewise.
(vreinterpret_p8_p64, vreinterpretq_p8_p64): Likewise.
(vreinterpret_p16_p64, vreinterpretq_p16_p64): Likewise.
(vreinterpret_p64_f16, vreinterpret_p64_f64): Likewise.
(vreinterpret_p64_s8, vreinterpret_p64_s16): Likewise.
(vreinterpret_p64_s32, vreinterpret_p64_s64): Likewise.
(vreinterpret_p64_f32, vreinterpret_p64_u8): Likewise.
(vreinterpret_p64_u16, vreinterpret_p64_u32): Likewise.
(vreinterpret_p64_u64, vreinterpret_p64_p8): Likewise.
(vreinterpretq_p64_f64, vreinterpretq_p64_s8): Likewise.
(vreinterpretq_p64_s16, vreinterpretq_p64_s32): Likewise.
(vreinterpretq_p64_s64, vreinterpretq_p64_f16): Likewise.
(vreinterpretq_p64_f32, vreinterpretq_p64_u8): Likewise.
(vreinterpretq_p64_u16, vreinterpretq_p64_u32): Likewise.
(vreinterpretq_p64_u64, vreinterpretq_p64_p8): Likewise.
(vreinterpret_f16_p64, vreinterpretq_f16_p64): Likewise.
(vreinterpret_f32_p64, vreinterpretq_f32_p64): Likewise.
(vreinterpret_f64_p64, vreinterpretq_f64_p64): Likewise.

Re: [match.pd] Fix for PR35691

2016-11-08 Thread Kyrill Tkachov


On 08/11/16 11:16, Richard Biener wrote:

On Tue, 8 Nov 2016, Prathamesh Kulkarni wrote:


On 8 November 2016 at 13:23, Richard Biener  wrote:

On Mon, 7 Nov 2016, Prathamesh Kulkarni wrote:


On 7 November 2016 at 23:06, Prathamesh Kulkarni
 wrote:

On 7 November 2016 at 15:43, Richard Biener  wrote:

On Fri, 4 Nov 2016, Prathamesh Kulkarni wrote:


On 4 November 2016 at 13:41, Richard Biener  wrote:

On Thu, 3 Nov 2016, Marc Glisse wrote:


On Thu, 3 Nov 2016, Richard Biener wrote:


The transform would also work for vectors (element_precision for
the test but also a value-matching zero which should ensure the
same number of elements).

Um sorry, I didn't get how to check vectors to be of equal length by a
matching zero.
Could you please elaborate on that ?

He may have meant something like:

   (op (cmp @0 integer_zerop@2) (cmp @1 @2))

I meant with one being @@2 to allow signed vs. unsigned @0/@1 which was the
point of the pattern.

Oups, that's what I had written first, and then I somehow managed to confuse
myself enough to remove it so as to remove the call to types_match :-(


So the last operand is checked with operand_equal_p instead of
integer_zerop. But the fact that we could compute bit_ior on the
comparison results should already imply that the number of elements is the
same.

Though for equality compares we also allow scalar results IIRC.

Oh, right, I keep forgetting that :-( And I have no idea how to generate one
for a testcase, at least until the GIMPLE FE lands...


On platforms that have IOR on floats (at least x86 with SSE, maybe some
vector mode on s390?), it would be cool to do the same for floats (most
likely at the RTL level).

On GIMPLE view-converts could come to the rescue here as well.  Or we can
just allow bit-and/or on floats as much as we allow them on pointers.

Would that generate sensible code on targets that do not have logic insns for
floats? Actually, even on x86_64 that generates inefficient code, so there
would be some work (for instance grep finds no gen_iordf3, only gen_iorv2df3).

I am also a bit wary of doing those obfuscating optimizations too early...
a==0 is something that other optimizations might use. long
c=(long&)a|(long&)b; (double&)c==0; less so...

(and I am assuming that signaling NaNs don't make the whole transformation
impossible, which might be wrong)

Yeah.  I also think it's not so much important - I just wanted to mention
vectors...

Btw, I still think we need a more sensible infrastructure for passes
to gather, analyze and modify complex conditions.  (I'm always pointing
to tree-affine.c as an, albeit not very good, example for handling
a similar problem)

Thanks for mentioning the value-matching capture @@, I wasn't aware of
this match.pd feature.
The current patch keeps it restricted to only bitwise operators on integers.
Bootstrap+test running on x86_64-unknown-linux-gnu.
OK to commit if passes ?

+/* PR35691: Transform
+   (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0.
+   (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.  */
+

Please omit the vertical space

+(for bitop (bit_and bit_ior)
+ cmp (eq ne)
+ (simplify
+  (bitop (cmp @0 integer_zerop) (cmp @1 integer_zerop))

if you capture the first integer_zerop as @2 then you can re-use it...

+   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+   && INTEGRAL_TYPE_P (TREE_TYPE (@1))
+   && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE
(@1)))
+(cmp (bit_ior @0 (convert @1)) { build_zero_cst (TREE_TYPE (@0));

... here in place of the { build_zero_cst ... }.

OK with those changes.
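
As a plain C illustration of the identity the pattern relies on, the sketch below compares the original and folded forms. The function names are hypothetical and this is not the testcase from the patch. The operands are chosen as unsigned int and int, which have equal precision, so the conversion of y to the type of x keeps every bit and is well-defined in C.

```c
#include <assert.h>
#include <stdbool.h>

/* Original form: two comparisons against zero joined by a bitwise AND.  */
static bool
both_zero_original (unsigned int x, int y)
{
  return (x == 0) & (y == 0);
}

/* Folded form: one bitwise OR and a single comparison.  */
static bool
both_zero_folded (unsigned int x, int y)
{
  return (x | (unsigned int) y) == 0;
}

/* The dual: (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.  */
static bool
either_nonzero_original (unsigned int x, int y)
{
  return (x != 0) | (y != 0);
}

static bool
either_nonzero_folded (unsigned int x, int y)
{
  return (x | (unsigned int) y) != 0;
}

/* Abort if the two forms disagree for the given operands.  */
static void
check_equivalence (unsigned int x, int y)
{
  assert (both_zero_original (x, y) == both_zero_folded (x, y));
  assert (either_nonzero_original (x, y) == either_nonzero_folded (x, y));
}
```

The OR of two values is zero only when both values are zero, which is why the precision check matters: a narrowing conversion could discard the only nonzero bits of y.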

Thanks, committed the attached version as r241915.

ugh, the svn commit message has:

testsuite/
* gcc.dg/pr35691-1.c: New test-case.
* gcc.dg/pr35691-4.c: Likewise.

pr35691-4.c was a typo, should be pr35691-2.c :/
However testsuite/ChangeLog correctly has entry for pr35691-2.c
Is it possible to edit the commit message for r241915 ?
Sorry about this.

No, just leave it as-is.

Hi,
Christophe reported to me that the commit caused test-cases
pr35691-1.c and pr35691-2.c (which were added by the commit)
to FAIL for cortex-a5:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241915/arm-none-linux-gnueabihf/diff-gcc-rh60-arm-none-linux-gnueabihf-arm-cortex-a5-vfpv3-d16-fp16.txt

It seems truth_andif_expr is not simplified to bit_and_expr on
cortex-a5 as it is on x86_64 (and other arm variants).
The differences in dumps start from 004t.gimple for pr35691-1.c:

x86_64 gimple dump:
foo (int z0, unsigned int z1)
{
   int D.1800;
   int t0;
   int t1;
   int t2;

   _1 = z0 == 0;
   t0 = (int) _1;
   _2 = z1 == 0;
   t1 = (int) _2;
   _3 = t0 != 0;
   _4 = t1 != 0;
   _5 = _3 & _4;
   t2 = (int) _5;
   D.1800 = t2;
   return D.1800;
}

cortex-a5 gimple dump:
foo (int z0, unsigned int z1)
{
   int iftmp.0;
   int D.4176;
   int t0;
   int t1;
   int t2;

   _1 = z0 == 0;
   t0 = (int) _1;
   _2 = z1 == 0;
  

Re: [match.pd] Fix for PR35691

2016-11-08 Thread Richard Biener
On Tue, 8 Nov 2016, Prathamesh Kulkarni wrote:

> On 8 November 2016 at 13:23, Richard Biener  wrote:
> > On Mon, 7 Nov 2016, Prathamesh Kulkarni wrote:
> >
> >> On 7 November 2016 at 23:06, Prathamesh Kulkarni
> >>  wrote:
> >> > On 7 November 2016 at 15:43, Richard Biener  wrote:
> >> >> On Fri, 4 Nov 2016, Prathamesh Kulkarni wrote:
> >> >>
> >> >>> On 4 November 2016 at 13:41, Richard Biener  wrote:
> >> >>> > On Thu, 3 Nov 2016, Marc Glisse wrote:
> >> >>> >
> >> >>> >> On Thu, 3 Nov 2016, Richard Biener wrote:
> >> >>> >>
> >> >>> >> > > > > The transform would also work for vectors 
> >> >>> >> > > > > (element_precision for
> >> >>> >> > > > > the test but also a value-matching zero which should ensure 
> >> >>> >> > > > > the
> >> >>> >> > > > > same number of elements).
> >> >>> >> > > > Um sorry, I didn't get how to check vectors to be of equal 
> >> >>> >> > > > length by a
> >> >>> >> > > > matching zero.
> >> >>> >> > > > Could you please elaborate on that ?
> >> >>> >> > >
> >> >>> >> > > He may have meant something like:
> >> >>> >> > >
> >> >>> >> > >   (op (cmp @0 integer_zerop@2) (cmp @1 @2))
> >> >>> >> >
> >> >>> >> > I meant with one being @@2 to allow signed vs. unsigned @0/@1 
> >> >>> >> > which was the
> >> >>> >> > point of the pattern.
> >> >>> >>
> >> >>> >> Oups, that's what I had written first, and then I somehow managed 
> >> >>> >> to confuse
> >> >>> >> myself enough to remove it so as to remove the call to types_match 
> >> >>> >> :-(
> >> >>> >>
> >> >>> >> > > So the last operand is checked with operand_equal_p instead of
> >> >>> >> > > integer_zerop. But the fact that we could compute bit_ior on the
> >> >>> >> > > comparison results should already imply that the number of 
> >> >>> >> > > elements is the
> >> >>> >> > > same.
> >> >>> >> >
> >> >>> >> > Though for equality compares we also allow scalar results IIRC.
> >> >>> >>
> >> >>> >> Oh, right, I keep forgetting that :-( And I have no idea how to 
> >> >>> >> generate one
> >> >>> >> for a testcase, at least until the GIMPLE FE lands...
> >> >>> >>
> >> >>> >> > > On platforms that have IOR on floats (at least x86 with SSE, 
> >> >>> >> > > maybe some
> >> >>> >> > > vector mode on s390?), it would be cool to do the same for 
> >> >>> >> > > floats (most
> >> >>> >> > > likely at the RTL level).
> >> >>> >> >
> >> >>> >> > On GIMPLE view-converts could come to the rescue here as well.  
> >> >>> >> > Or we can
> >> >>> >> > just allow bit-and/or on floats as much as we allow them on 
> >> >>> >> > pointers.
> >> >>> >>
> >> >>> >> Would that generate sensible code on targets that do not have logic 
> >> >>> >> insns for
> >> >>> >> floats? Actually, even on x86_64 that generates inefficient code, 
> >> >>> >> so there
> >> >>> >> would be some work (for instance grep finds no gen_iordf3, only 
> >> >>> >> gen_iorv2df3).
> >> >>> >>
> >> >>> >> I am also a bit wary of doing those obfuscating optimizations too 
> >> >>> >> early...
> >> >>> >> a==0 is something that other optimizations might use. long
> >> >>> >> c=(long&)a|(long&)b; (double&)c==0; less so...
> >> >>> >>
> >> >>> >> (and I am assuming that signaling NaNs don't make the whole 
> >> >>> >> transformation
> >> >>> >> impossible, which might be wrong)
> >> >>> >
> >> >>> > Yeah.  I also think it's not so much important - I just wanted to 
> >> >>> > mention
> >> >>> > vectors...
> >> >>> >
> >> >>> > Btw, I still think we need a more sensible infrastructure for passes
> >> >>> > to gather, analyze and modify complex conditions.  (I'm always 
> >> >>> > pointing
> >> >>> > to tree-affine.c as an, albeit not very good, example for handling
> >> >>> > a similar problem)
> >> >>> Thanks for mentioning the value-matching capture @@, I wasn't aware of
> >> >>> this match.pd feature.
> >> >>> The current patch keeps it restricted to only bitwise operators on 
> >> >>> integers.
> >> >>> Bootstrap+test running on x86_64-unknown-linux-gnu.
> >> >>> OK to commit if passes ?
> >> >>
> >> >> +/* PR35691: Transform
> >> >> +   (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0.
> >> >> +   (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.  */
> >> >> +
> >> >>
> >> >> Please omit the vertical space
> >> >>
> >> >> +(for bitop (bit_and bit_ior)
> >> >> + cmp (eq ne)
> >> >> + (simplify
> >> >> +  (bitop (cmp @0 integer_zerop) (cmp @1 integer_zerop))
> >> >>
> >> >> if you capture the first integer_zerop as @2 then you can re-use it...
> >> >>
> >> >> +   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> >> >> +   && INTEGRAL_TYPE_P (TREE_TYPE (@1))
> >> >> +   && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE
> >> >> (@1)))
> >> >> +(cmp (bit_ior @0 (convert @1)) { build_zero_cst (TREE_TYPE (@0));
> >> >>
> >> >> ... here inplace of the { build_zero_cst ... }.
> >> >>
> >> >> Ok with that changes.
> >> > Thanks, committed the attached version as r241915.
> >> ugh, 

Re: [match.pd] Fix for PR35691

2016-11-08 Thread Prathamesh Kulkarni
On 8 November 2016 at 13:23, Richard Biener  wrote:
> On Mon, 7 Nov 2016, Prathamesh Kulkarni wrote:
>
>> On 7 November 2016 at 23:06, Prathamesh Kulkarni
>>  wrote:
>> > On 7 November 2016 at 15:43, Richard Biener  wrote:
>> >> On Fri, 4 Nov 2016, Prathamesh Kulkarni wrote:
>> >>
>> >>> On 4 November 2016 at 13:41, Richard Biener  wrote:
>> >>> > On Thu, 3 Nov 2016, Marc Glisse wrote:
>> >>> >
>> >>> >> On Thu, 3 Nov 2016, Richard Biener wrote:
>> >>> >>
>> >>> >> > > > > The transform would also work for vectors (element_precision 
>> >>> >> > > > > for
>> >>> >> > > > > the test but also a value-matching zero which should ensure 
>> >>> >> > > > > the
>> >>> >> > > > > same number of elements).
>> >>> >> > > > Um sorry, I didn't get how to check vectors to be of equal 
>> >>> >> > > > length by a
>> >>> >> > > > matching zero.
>> >>> >> > > > Could you please elaborate on that ?
>> >>> >> > >
>> >>> >> > > He may have meant something like:
>> >>> >> > >
>> >>> >> > >   (op (cmp @0 integer_zerop@2) (cmp @1 @2))
>> >>> >> >
>> >>> >> > I meant with one being @@2 to allow signed vs. Unsigned @0/@1 which 
>> >>> >> > was the
>> >>> >> > point of the pattern.
>> >>> >>
>> >>> >> Oups, that's what I had written first, and then I somehow managed to 
>> >>> >> confuse
>> >>> >> myself enough to remove it so as to remove the call to types_match :-(
>> >>> >>
>> >>> >> > > So the last operand is checked with operand_equal_p instead of
>> >>> >> > > integer_zerop. But the fact that we could compute bit_ior on the
>> >>> >> > > comparison results should already imply that the number of 
>> >>> >> > > elements is the
>> >>> >> > > same.
>> >>> >> >
>> >>> >> > Though for equality compares we also allow scalar results IIRC.
>> >>> >>
>> >>> >> Oh, right, I keep forgetting that :-( And I have no idea how to 
>> >>> >> generate one
>> >>> >> for a testcase, at least until the GIMPLE FE lands...
>> >>> >>
>> >>> >> > > On platforms that have IOR on floats (at least x86 with SSE, 
>> >>> >> > > maybe some
>> >>> >> > > vector mode on s390?), it would be cool to do the same for floats 
>> >>> >> > > (most
>> >>> >> > > likely at the RTL level).
>> >>> >> >
>> >>> >> > On GIMPLE view-converts could come to the rescue here as well.  Or 
>> >>> >> > we can
>> >>> >> > just allow bit-and/or on floats as much as we allow them on 
>> >>> >> > pointers.
>> >>> >>
>> >>> >> Would that generate sensible code on targets that do not have logic 
>> >>> >> insns for
>> >>> >> floats? Actually, even on x86_64 that generates inefficient code, so 
>> >>> >> there
>> >>> >> would be some work (for instance grep finds no gen_iordf3, only 
>> >>> >> gen_iorv2df3).
>> >>> >>
>> >>> >> I am also a bit wary of doing those obfuscating optimizations too 
>> >>> >> early...
>> >>> >> a==0 is something that other optimizations might use. long
>> >>> >> c=(long&)a|(long&)b; (double&)c==0; less so...
>> >>> >>
>> >>> >> (and I am assuming that signaling NaNs don't make the whole 
>> >>> >> transformation
>> >>> >> impossible, which might be wrong)
>> >>> >
>> >>> > Yeah.  I also think it's not so much important - I just wanted to 
>> >>> > mention
>> >>> > vectors...
>> >>> >
>> >>> > Btw, I still think we need a more sensible infrastructure for passes
>> >>> > to gather, analyze and modify complex conditions.  (I'm always pointing
>> >>> > to tree-affine.c as an, albeit not very good, example for handling
>> >>> > a similar problem)
>> >>> Thanks for mentioning the value-matching capture @@, I wasn't aware of
>> >>> this match.pd feature.
>> >>> The current patch keeps it restricted to only bitwise operators on 
>> >>> integers.
>> >>> Bootstrap+test running on x86_64-unknown-linux-gnu.
>> >>> OK to commit if passes ?
>> >>
>> >> +/* PR35691: Transform
>> >> +   (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0.
>> >> +   (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.  */
>> >> +
>> >>
>> >> Please omit the vertical space
>> >>
>> >> +(for bitop (bit_and bit_ior)
>> >> + cmp (eq ne)
>> >> + (simplify
>> >> +  (bitop (cmp @0 integer_zerop) (cmp @1 integer_zerop))
>> >>
>> >> if you capture the first integer_zerop as @2 then you can re-use it...
>> >>
>> >> +   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
>> >> +   && INTEGRAL_TYPE_P (TREE_TYPE (@1))
>> >> +   && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE
>> >> (@1)))
>> >> +(cmp (bit_ior @0 (convert @1)) { build_zero_cst (TREE_TYPE (@0));
>> >>
>> >> ... here inplace of the { build_zero_cst ... }.
>> >>
>> >> Ok with that changes.
>> > Thanks, committed the attached version as r241915.
>> ugh, the svn commit message has:
>>
>> testsuite/
>> * gcc.dg/pr35691-1.c: New test-case.
>> * gcc.dg/pr35691-4.c: Likewise.
>>
>> pr35691-4.c was a typo, should be pr35691-2.c :/
>> However testsuite/ChangeLog correctly has entry for pr35691-2.c
>> Is it possible to edit the commit message for r241915 ?
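
The fold discussed in this thread can be checked in plain C. What follows is only a sketch, not the match.pd pattern itself: the function names are made up for illustration, and it assumes two 32-bit operands of different signedness, which is the case the equal-precision test in the pattern allows.

```c
#include <assert.h>
#include <stdint.h>

/* Original forms: two compares combined with a bitwise operator.  */
static int both_zero (uint32_t x, int32_t y)
{
  return (x == 0) & (y == 0);
}

static int any_nonzero (uint32_t x, int32_t y)
{
  return (x != 0) | (y != 0);
}

/* Folded forms: one IOR and one compare.  The conversion keeps all 32
   bits of y, so the IOR is zero exactly when both inputs are zero.  */
static int both_zero_folded (uint32_t x, int32_t y)
{
  return (x | (uint32_t) y) == 0;
}

static int any_nonzero_folded (uint32_t x, int32_t y)
{
  return (x | (uint32_t) y) != 0;
}

/* Why the precisions must match: converting a wider y to a narrower
   type can drop the only set bits.  Returns nonzero when the original
   and the (invalid) narrowed fold disagree.  */
static int narrow_fold_is_wrong (void)
{
  uint8_t x = 0;
  uint16_t y = 0x100;
  int original = (x == 0) & (y == 0);
  int folded = (uint8_t) (x | (uint8_t) y) == 0;
  return original != folded;
}

/* Compare both forms over a small set of boundary values.  */
static void check_forms_agree (void)
{
  static const int32_t vals[] = { 0, 1, -1, INT32_MIN, INT32_MAX, 0x100 };
  const int n = sizeof vals / sizeof vals[0];

  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      {
        uint32_t x = (uint32_t) vals[i];
        int32_t y = vals[j];
        assert (both_zero (x, y) == both_zero_folded (x, y));
        assert (any_nonzero (x, y) == any_nonzero_folded (x, y));
      }
}
```

The narrowing case shows why the pattern tests TYPE_PRECISION on both operands before converting @1 to the type of @0.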

Re: [PATCH][AArch64] Optimized implementation of search_line_fast for the CPP lexer

2016-11-08 Thread Richard Earnshaw (lists)
On 08/11/16 09:46, James Greenhalgh wrote:
> On Mon, Nov 07, 2016 at 01:39:53PM +, Richard Earnshaw (lists) wrote:
>> This patch contains an implementation of search_line_fast for the CPP
>> lexer.  It's based in part on the AArch32 (ARM) code but incorporates
>> new instructions available in AArch64 (reduction add operations) plus
>> some tricks for reducing the realignment overheads.  We assume a page
>> size of 4k, but that's a safe assumption -- AArch64 systems can never
>> have a smaller page size than that: on systems with larger pages we will
>> go through the realignment code more often than strictly necessary, but
>> it's still likely to be in the noise (less than 0.5% of the time).
>> Bootstrapped on aarch64-none-linux-gnu.
> 
> Some very minor nits wrt. style for the Advanced SIMD intrinsics, otherwise
> OK from me.
> 
>>
>> +  const uint8x16_t xmask = (uint8x16_t) vdupq_n_u64 (0x8040201008040201ULL);
> 
> 
> It is a pedantic point, but these casts are a GNU extension, the "portable"
> way to write this would be:
> 
>   vreinterpretq_u8_u64 (vdupq_n_u64 (0x8040201008040201ULL));

We've used GNU-style casts in the original code and never encountered
problems.  I personally find the reinterpret casts less readable.

> 
>> +
>> +#ifdef __AARCH64EB
>> +  const int16x8_t shift = {8, 8, 8, 8, 0, 0, 0, 0};
> 
> This sort of vector initialisation is a bit scary for user programmers, as
> we shouldn't generally mix Neon intrinsics with the GNU extensions (for
> exactly the reason you have here, keeping BE and LE straight is extra
> effort)
> 
> This could be written portably as:
> 
>   vcombine_u16 (vdup_n_u16 (8), vdup_n_u16 (0));
> 

Nice idea, but that's the wrong way around and fixing it currently
generates *terrible* code.

> Or if you prefer to be explicit about the elements:
> 
>   int16_t buf[] = {8, 8, 8, 8, 0, 0, 0, 0};
>   int16x8_t shift = vld1q_s16 (buf);
> 
>> +#else
>> +  const int16x8_t shift = {0, 0, 0, 0, 8, 8, 8, 8};
>> +#endif
>> +
>> +  unsigned int found;
>> +  const uint8_t *p;
>> +  uint8x16_t data;
>> +  uint8x16_t t;
>> +  uint16x8_t m;
>> +  uint8x16_t u, v, w;
>> +
>> +  /* Align the source pointer.  */
>> +  p = (const uint8_t *)((uintptr_t)s & -16);
>> +
>> +  /* Assuming random string start positions, with a 4k page size we'll take
>> + the slow path about 0.37% of the time.  */
>> +  if (__builtin_expect ((AARCH64_MIN_PAGE_SIZE
>> + - (((uintptr_t) s) & (AARCH64_MIN_PAGE_SIZE - 1)))
>> +< 16, 0))
>> +{
>> +  /* Slow path: the string starts near a possible page boundary.  */
>> +  uint32_t misalign, mask;
>> +
>> +  misalign = (uintptr_t)s & 15;
>> +  mask = (-1u << misalign) & 0xffff;
>> +  data = vld1q_u8 (p);
>> +  t = vceqq_u8 (data, repl_nl);
>> +  u = vceqq_u8 (data, repl_cr);
>> +  v = vorrq_u8 (t, vceqq_u8 (data, repl_bs));
>> +  w = vorrq_u8 (u, vceqq_u8 (data, repl_qm));
>> +  t = vorrq_u8 (v, w);
> 
> Can you trust the compiler to perform the reassociation here manually?
> That would let you write this in the more natural form:
> 
>   t = vceqq_u8 (data, repl_nl);
>   t = vorrq_u8 (t, vceqq_u8 (data, repl_cr));
>   t = vorrq_u8 (t, vceqq_u8 (data, repl_bs));
>   t = vorrq_u8 (t, vceqq_u8 (data, repl_qm));
> 

Maybe, but we have plenty of spare registers (this is target specific
code, I know what's happening).

Either way, the reassoc code is currently messing with this and
serializing the VORRQ operations.

>> +  t = vandq_u8 (t, xmask);
>> +  m = vpaddlq_u8 (t);
>> +  m = vshlq_u16 (m, shift);
>> +  found = vaddvq_u16 (m);
>> +  found &= mask;
>> +  if (found)
>> +return (const uchar*)p + __builtin_ctz (found);
>> +}
>> +  else
>> +{
>> +  data = vld1q_u8 ((const uint8_t *) s);
>> +  t = vceqq_u8 (data, repl_nl);
>> +  u = vceqq_u8 (data, repl_cr);
>> +  v = vorrq_u8 (t, vceqq_u8 (data, repl_bs));
>> +  w = vorrq_u8 (u, vceqq_u8 (data, repl_qm));
>> +  t = vorrq_u8 (v, w);
>> +  if (__builtin_expect (vpaddd_u64 ((uint64x2_t)t), 0))
>> +goto done;
> 
> As above, this cast is a GNU extension:
> 
> if (__builtin_expect (vpaddd_u64 (vreinterpretq_u64_u8 (t)), 0))
> 
>> +}
>> +
>> +  do
>> +{
>> +  p += 16;
>> +  data = vld1q_u8 (p);
>> +  t = vceqq_u8 (data, repl_nl);
>> +  u = vceqq_u8 (data, repl_cr);
>> +  v = vorrq_u8 (t, vceqq_u8 (data, repl_bs));
>> +  w = vorrq_u8 (u, vceqq_u8 (data, repl_qm));
>> +  t = vorrq_u8 (v, w);
>> +} while (!vpaddd_u64 ((uint64x2_t)t));
> 
> Likewise here.
> 
> Thanks,
> James
> 
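
For anyone trying to follow what the xmask/vpaddlq/vshlq/vaddvq sequence in the slow path computes, here is a scalar model in C. It is a sketch only: it assumes little-endian lane order (the {0, 0, 0, 0, 8, 8, 8, 8} shift vector), the names match_bits and first_match are invented for illustration, and it takes the four repl_* vectors to hold newline, carriage return, backslash and question mark.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of: vceqq_u8 x4 + vorrq_u8, vandq_u8 with xmask,
   vpaddlq_u8, vshlq_u16, vaddvq_u16.  Bit i of the result is set when
   block[i] is one of the four characters of interest.  */
static unsigned match_bits (const uint8_t block[16])
{
  uint16_t m[8] = { 0 };
  unsigned found = 0;

  for (int i = 0; i < 16; i++)
    {
      uint8_t c = block[i];
      /* vceqq_u8/vorrq_u8: 0xff on a match, 0x00 otherwise.  */
      uint8_t t = (c == '\n' || c == '\r' || c == '\\' || c == '?') ? 0xff : 0;
      /* vandq_u8 with xmask: byte i keeps only bit (i % 8).  */
      t &= (uint8_t) (1u << (i % 8));
      /* vpaddlq_u8: adjacent byte pairs are summed into 16-bit lanes.  */
      m[i / 2] += t;
    }

  /* vshlq_u16 moves lanes 4..7 (bytes 8..15) up by 8 bits; vaddvq_u16
     then sums all lanes.  No two lanes share a bit, so the sum acts
     as an OR.  */
  for (int j = 0; j < 8; j++)
    found += (unsigned) m[j] << (j < 4 ? 0 : 8);

  return found;
}

/* Slow-path lookup: ignore the bytes that precede the real start of
   the string inside the aligned 16-byte block.  */
static const uint8_t *first_match (const uint8_t block[16], unsigned misalign)
{
  assert (misalign < 16);
  unsigned mask = (-1u << misalign) & 0xffff;
  unsigned found = match_bits (block) & mask;
  return found ? block + __builtin_ctz (found) : NULL;
}
```

With a block holding "ab\ncd?fghijk\\mn", match_bits sets bits 2, 5 and 12, and a misalignment of 3 makes first_match skip the newline at index 2 and return the '?' at index 5.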



Re: [PATCH] Avoid peeling for gaps if accesses are aligned

2016-11-08 Thread Richard Biener
On Mon, 7 Nov 2016, Richard Biener wrote:

> 
> Currently we force peeling for gaps whenever element overrun can occur
> but for aligned accesses we know that the loads won't trap and thus
> we can avoid this.
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu (I expect
> some testsuite fallout here so didn't bother to invent a new testcase).
> 
> Just in case somebody thinks the overrun is a bad idea in general
> (even when not trapping).  Like for ASAN or valgrind.

This is what I applied.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard.

2016-11-08  Richard Biener  

* tree-vect-stmts.c (get_group_load_store_type): If the
access is aligned do not trigger peeling for gaps.
* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Do not
force alignment of vars with DECL_USER_ALIGN.

* gcc.dg/vect/vect-nb-iter-ub-2.c: Adjust.

diff --git a/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c
index bc07b4b..4e13702 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c
@@ -3,7 +3,7 @@
 #include "tree-vect.h"
 
 int ii[32];
-char cc[66] =
+char cc[66] __attribute__((aligned(1))) =
   { 0, 0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7, 0, 8, 0, 9, 0,
 10, 0, 11, 0, 12, 0, 13, 0, 14, 0, 15, 0, 16, 0, 17, 0, 18, 0, 19, 0,
 20, 0, 21, 0, 22, 0, 23, 0, 24, 0, 25, 0, 26, 0, 27, 0, 28, 0, 29, 0,
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index b03cb1e..f014d68 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -831,6 +831,19 @@ vect_compute_data_ref_alignment (struct data_reference *dr)
  return true;
}
 
+  if (DECL_USER_ALIGN (base))
+   {
+ if (dump_enabled_p ())
+   {
+ dump_printf_loc (MSG_NOTE, vect_location,
+  "not forcing alignment of user-aligned "
+  "variable: ");
+ dump_generic_expr (MSG_NOTE, TDF_SLIM, base);
+ dump_printf (MSG_NOTE, "\n");
+   }
+ return true;
+   }
+
   /* Force the alignment of the decl.
 NOTE: This is the only change to the code we make during
 the analysis phase, before deciding to vectorize the loop.  */
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 15aec21..c29e73d 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1770,6 +1770,11 @@ get_group_load_store_type (gimple *stmt, tree vectype, bool slp,
   " non-consecutive accesses\n");
  return false;
}
+ /* If the access is aligned an overrun is fine.  */
+ if (overrun_p
+ && aligned_access_p
+  (STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt))))
+   overrun_p = false;
  if (overrun_p && !can_overrun_p)
{
  if (dump_enabled_p ())
@@ -1789,6 +1794,10 @@ get_group_load_store_type (gimple *stmt, tree vectype, bool slp,
   /* If there is a gap at the end of the group then these optimizations
 would access excess elements in the last iteration.  */
   bool would_overrun_p = (gap != 0);
+  /* If the access is aligned an overrun is fine.  */
+  if (would_overrun_p
+ && aligned_access_p (STMT_VINFO_DATA_REF (stmt_info)))
+   would_overrun_p = false;
   if (!STMT_VINFO_STRIDED_P (stmt_info)
  && (can_overrun_p || !would_overrun_p)
  && compare_step_with_zero (stmt) > 0)
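
The reasoning behind "aligned accesses won't trap" can be illustrated with a small C check. This is a sketch under stated assumptions, not vectorizer code: it assumes the page size is a multiple of the vector size and that a fault can only occur when a load touches a page the scalar code would not have touched. The helper names are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero when [addr, addr + vec_bytes) touches two different pages.  */
static int load_crosses_page (uintptr_t addr, size_t vec_bytes,
                              size_t page_bytes)
{
  assert (vec_bytes > 0 && page_bytes % vec_bytes == 0);
  return addr / page_bytes != (addr + vec_bytes - 1) / page_bytes;
}

/* Scan every vector-aligned address in two consecutive pages and
   report whether any such load straddles the page boundary.  */
static int aligned_load_can_cross (size_t vec_bytes, size_t page_bytes)
{
  for (uintptr_t addr = 0; addr < 2 * page_bytes; addr += vec_bytes)
    if (load_crosses_page (addr, vec_bytes, page_bytes))
      return 1;
  return 0;
}
```

An aligned load that overruns the group therefore stays in the page that already holds its valid elements, while an unaligned one (for example 16 bytes starting at offset 4090 of a 4096-byte page) may not.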


Re: [ARM][GCC][PATCHv2 2/3] Add missing Poly64_t intrinsics to GCC

2016-11-08 Thread Kyrill Tkachov


On 07/11/16 13:55, Tamar Christina wrote:

Hi all,

This patch (2 of 3) adds the following NEON intrinsics to
the ARM back-end of GCC:

* vget_lane_p64

Added new tests for these and ran regression tests on aarch64-none-linux-gnu
and on arm-none-linux-gnueabihf.

Ok for trunk?


Ok.
Thanks,
Kyrill


Thanks,
Tamar

gcc/
2016-11-04  Tamar Christina  

* config/arm/arm_neon.h (vget_lane_p64): New.




Re: [AArch64][GCC][PATCHv2 1/3] Add missing Poly64_t intrinsics to GCC

2016-11-08 Thread James Greenhalgh
On Mon, Nov 07, 2016 at 01:55:15PM +, Tamar Christina wrote:
> Hi all,
> 
> Added new tests for these and ran regression tests on aarch64-none-linux-gnu
> and on arm-none-linux-gnueabihf.
> 
> Ok for trunk?

OK.

Thanks,
James

> gcc/
> 2016-11-04  Tamar Christina  
> 
>   * config/aarch64/aarch64-builtins.c (TYPES_SETREGP): Added poly type.
>   (TYPES_GETREGP): Likewise.
>   (TYPES_SHIFTINSERTP): Likewise.
>   (TYPES_COMBINEP): Likewise.
>   (TYPES_STORE1P): Likewise.
>   * config/aarch64/aarch64-simd-builtins.def
>   (combine): Added poly generator.
>   (get_dregoi): Likewise.
>   (get_dregci): Likewise.
>   (get_dregxi): Likewise.
>   (ssli_n): Likewise.
>   (ld1): Likewise.
>   (st1): Likewise.
>   * config/aarch64/arm_neon.h
>   (poly64x1x2_t, poly64x1x3_t): New.
>   (poly64x1x4_t, poly64x2x2_t): Likewise.
>   (poly64x2x3_t, poly64x2x4_t): Likewise.
>   (poly64x1_t): Likewise.
>   (vcreate_p64, vcombine_p64): Likewise.
>   (vdup_n_p64, vdupq_n_p64): Likewise.
>   (vld2_p64, vld2q_p64): Likewise.
>   (vld3_p64, vld3q_p64): Likewise.
>   (vld4_p64, vld4q_p64): Likewise.
>   (vld2_dup_p64, vld3_dup_p64): Likewise.
>   (vld4_dup_p64, vsli_n_p64): Likewise.
>   (vsliq_n_p64, vst1_p64): Likewise.
>   (vst1q_p64, vst2_p64): Likewise.
>   (vst3_p64, vst4_p64): Likewise.
>   (__aarch64_vdup_lane_p64, __aarch64_vdup_laneq_p64): Likewise.
>   (__aarch64_vdupq_lane_p64, __aarch64_vdupq_laneq_p64): Likewise.
>   (vget_lane_p64, vgetq_lane_p64): Likewise.
>   (vreinterpret_p8_p64, vreinterpretq_p8_p64): Likewise.
>   (vreinterpret_p16_p64, vreinterpretq_p16_p64): Likewise.
>   (vreinterpret_p64_f16, vreinterpret_p64_f64): Likewise.
>   (vreinterpret_p64_s8, vreinterpret_p64_s16): Likewise.
>   (vreinterpret_p64_s32, vreinterpret_p64_s64): Likewise.
>   (vreinterpret_p64_f32, vreinterpret_p64_u8): Likewise.
>   (vreinterpret_p64_u16, vreinterpret_p64_u32): Likewise.
>   (vreinterpret_p64_u64, vreinterpret_p64_p8): Likewise.
>   (vreinterpretq_p64_f64, vreinterpretq_p64_s8): Likewise.
>   (vreinterpretq_p64_s16, vreinterpretq_p64_s32): Likewise.
>   (vreinterpretq_p64_s64, vreinterpretq_p64_f16): Likewise.
>   (vreinterpretq_p64_f32, vreinterpretq_p64_u8): Likewise.
>   (vreinterpretq_p64_u16, vreinterpretq_p64_u32): Likewise.
>   (vreinterpretq_p64_u64, vreinterpretq_p64_p8): Likewise.
>   (vreinterpret_f16_p64, vreinterpretq_f16_p64): Likewise.
>   (vreinterpret_f32_p64, vreinterpretq_f32_p64): Likewise.
>   (vreinterpret_f64_p64, vreinterpretq_f64_p64): Likewise.
>   (vreinterpret_s64_p64, vreinterpretq_s64_p64): Likewise.
>   (vreinterpret_u64_p64, vreinterpretq_u64_p64): Likewise.
>   (vreinterpret_s8_p64, vreinterpretq_s8_p64): Likewise.
>   (vreinterpret_s16_p64, vreinterpret_s32_p64): Likewise.
>   (vreinterpretq_s32_p64, vreinterpret_u8_p64): Likewise.
>   (vreinterpret_u16_p64, vreinterpretq_u16_p64): Likewise.
>   (vreinterpret_u32_p64, vreinterpretq_u32_p64): Likewise.
>   (vset_lane_p64, vsetq_lane_p64): Likewise.
>   (vget_low_p64, vget_high_p64): Likewise.
>   (vcombine_p64, vst2_lane_p64): Likewise.
>   (vst3_lane_p64, vst4_lane_p64): Likewise.
>   (vst2q_lane_p64, vst3q_lane_p64): Likewise.
>   (vst4q_lane_p64, vget_lane_p64): Likewise.
>   (vget_laneq_p64, vset_lane_p64): Likewise.
>   (vset_laneq_p64, vcopy_lane_p64): Likewise.
>   (vcopy_laneq_p64, vdup_n_p64): Likewise.
>   (vdupq_n_p64, vdup_lane_p64): Likewise.
>   (vdup_laneq_p64, vld1_p64): Likewise.
>   (vld1q_p64, vld1_dup_p64): Likewise.
>   (vld1q_dup_p64, vld1q_dup_p64): Likewise.
>   (vmov_n_p64, vmovq_n_p64): Likewise.
>   (vst3q_p64, vst4q_p64): Likewise.
>   (vld1_lane_p64, vld1q_lane_p64): Likewise.
>   (vst1_lane_p64, vst1q_lane_p64): Likewise.
>   (vcopy_laneq_p64, vcopyq_laneq_p64): Likewise.
>   (vdupq_laneq_p64): Likewise.



Re: [RFC] Handle unary pass-through jump functions for ipa-vrp

2016-11-08 Thread kugan

Hi,


On 04/11/16 04:36, Martin Jambor wrote:

Hi,

On Fri, Oct 28, 2016 at 02:03:47PM +1100, kugan wrote:


...snip...

I have also separated the constant parameter conversion out and posted as
https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02309.html. This is now
handling just unary pass-through jump functions.

Bootstrapped and regression tested on x86_64-linux-gnu with no new
regressions.

Is this OK for trunk?

Thanks,
Kugan

gcc/testsuite/ChangeLog:

2016-10-28  Kugan Vivekanandarajah  

* gcc.dg/ipa/vrp7.c: New test.


gcc/ChangeLog:

2016-10-28  Kugan Vivekanandarajah  

* ipa-cp.c (ipa_get_jf_pass_through_result): Handle unary expressions.
(propagate_vr_accross_jump_function): Likewise.
* ipa-prop.c (ipa_set_jf_unary_pass_through): New.
(load_from_param_1): New.
(load_from_unmodified_param): Factor common part into load_from_param_1.
(load_from_param): New.
(compute_complex_assign_jump_func): Handle unary expressions.
(ipa_write_jump_function): Likewise.
(ipa_read_jump_function): Likewise.



Patch is OK with changes Martin suggested.

Honza




From b7d9b413951ba20d156a7801640cc7d7bc57c062 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah 
Date: Fri, 28 Oct 2016 10:16:38 +1100
Subject: [PATCH 2/2] add unary jump function

---
 gcc/ipa-cp.c| 39 +++---
 gcc/ipa-prop.c  | 89 +++--
 gcc/testsuite/gcc.dg/ipa/vrp7.c | 32 +++
 3 files changed, 142 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/vrp7.c

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 9f28557..8fc95dd 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -1225,13 +1225,21 @@ ipa_get_jf_pass_through_result (struct ipa_jump_func *jfunc, tree input)
 return NULL_TREE;

   if (TREE_CODE_CLASS (ipa_get_jf_pass_through_operation (jfunc))
-  == tcc_comparison)
-restype = boolean_type_node;
+  == tcc_unary)
+{
+  res = fold_unary (ipa_get_jf_pass_through_operation (jfunc),
+   TREE_TYPE (input), input);
+}


Please do not put curly braces around a single statement.  Apart from
that, no objection from me.


Thanks Martin, I will fix this.

Honza, is this OK for you with the above fix?

Thanks,
Kugan


Thanks,

Martin
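
As a rough picture of what a unary pass-through means for value ranges, consider a caller that passes -x or ~x of its own parameter to the callee: the callee's range is the caller's range pushed through that operation. The C below is a toy model only. The struct, enum and function are hypothetical and are not the ipa-cp or ipa-prop data structures, and it assumes two's complement long.

```c
#include <assert.h>
#include <limits.h>

/* Inclusive bounds, lo <= hi.  */
struct range { long lo, hi; };

enum unary_op { OP_NEGATE, OP_BIT_NOT };

/* Apply a unary operation to every value in R and return the
   resulting range.  Both operations are monotonically decreasing,
   so the bounds swap.  */
static struct range apply_unary (enum unary_op op, struct range r)
{
  struct range out = r;

  assert (r.lo <= r.hi);
  switch (op)
    {
    case OP_NEGATE:
      /* -LONG_MIN would overflow, so refuse it in this toy.  */
      assert (r.lo > LONG_MIN);
      out.lo = -r.hi;
      out.hi = -r.lo;
      break;
    case OP_BIT_NOT:
      /* ~x equals -x - 1 on two's complement.  */
      out.lo = ~r.hi;
      out.hi = ~r.lo;
      break;
    }
  return out;
}
```

For a caller range of [1, 10] and a negating pass-through, the callee would see [-10, -1].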



Re: [ipa-vrp] ice in set_value_range

2016-11-08 Thread kugan

Hi,

On 04/11/16 03:24, Martin Jambor wrote:

Hi,

On Fri, Oct 28, 2016 at 01:58:13PM +1100, kugan wrote:

Do I understand it correctly that extract_range_from_unary_expr deals
with any potential type conversions better (compared to what you did
before here)?


Yes, this can be wrong at times too as reported in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78121. I have separated this
part of the patch with a testcase.

Please note that I am using fold_convert in the attached patch.

Bootstrapped and regression tested on x86_64-linux-gnu with no new
regressions. Is this OK for trunk?



I have no objections, but we need to wait for Honza.

Thanks.

Honza, is this OK for you ?

Thanks,
Kugan



Thanks,

Martin


Thanks,
Kugan


gcc/ChangeLog:

2016-10-28  Kugan Vivekanandarajah  

PR ipa/78121
* ipa-cp.c (propagate_vr_accross_jump_function): Pass param type.
Also fold constant passed as argument while computing value range.
(propagate_constants_accross_call): Pass param type.
* ipa-prop.c: export ipa_get_callee_param_type.
* ipa-prop.h: export ipa_get_callee_param_type.

gcc/testsuite/ChangeLog:

2016-10-28  Kugan Vivekanandarajah  

PR ipa/78121
* gcc.dg/ipa/pr78121.c: New test.




[Patch, Fortran, F03] PR77596: procedure pointer component with implicit interface can point to a function

2016-11-08 Thread Janus Weil
Hi all,

here is a simple patch for the accepts-invalid problem of PR77596.
Regtests cleanly on x86_64-linux-gnu. Ok for trunk?

Cheers,
Janus


2016-11-08  Janus Weil  

PR fortran/77596
* expr.c (gfc_check_pointer_assign): Add special check for procedure-
pointer component with absent interface.

2016-11-08  Janus Weil  

PR fortran/77596
* gfortran.dg/proc_ptr_comp_46.f90: New test.
Index: gcc/fortran/expr.c
===
--- gcc/fortran/expr.c  (Revision 241956)
+++ gcc/fortran/expr.c  (Arbeitskopie)
@@ -3445,7 +3445,7 @@ gfc_check_pointer_assign (gfc_expr *lvalue, gfc_ex
 {
   char err[200];
   gfc_symbol *s1,*s2;
-  gfc_component *comp;
+  gfc_component *comp1, *comp2;
   const char *name;
 
   attr = gfc_expr_attr (rvalue);
@@ -3549,9 +3549,9 @@ gfc_check_pointer_assign (gfc_expr *lvalue, gfc_ex
}
}
 
-  comp = gfc_get_proc_ptr_comp (lvalue);
-  if (comp)
-   s1 = comp->ts.interface;
+  comp1 = gfc_get_proc_ptr_comp (lvalue);
+  if (comp1)
+   s1 = comp1->ts.interface;
   else
{
  s1 = lvalue->symtree->n.sym;
@@ -3559,18 +3559,18 @@ gfc_check_pointer_assign (gfc_expr *lvalue, gfc_ex
s1 = s1->ts.interface;
}
 
-  comp = gfc_get_proc_ptr_comp (rvalue);
-  if (comp)
+  comp2 = gfc_get_proc_ptr_comp (rvalue);
+  if (comp2)
{
  if (rvalue->expr_type == EXPR_FUNCTION)
{
- s2 = comp->ts.interface->result;
+ s2 = comp2->ts.interface->result;
  name = s2->name;
}
  else
{
- s2 = comp->ts.interface;
- name = comp->name;
+ s2 = comp2->ts.interface;
+ name = comp2->name;
}
}
   else if (rvalue->expr_type == EXPR_FUNCTION)
@@ -3591,6 +3591,15 @@ gfc_check_pointer_assign (gfc_expr *lvalue, gfc_ex
   if (s2 && s2->attr.proc_pointer && s2->ts.interface)
s2 = s2->ts.interface;
 
+  /* Special check for the case of absent interface on the lvalue.
+   * All other interface checks are done below. */
+  if (!s1 && comp1 && comp1->attr.subroutine && s2 && s2->attr.function)
+   {
+ gfc_error ("Interface mismatch in procedure pointer assignment "
+"at %L: '%s' is not a subroutine", &rvalue->where, name);
+ return false;
+   }
+
   if (s1 == s2 || !s1 || !s2)
return true;
 
! { dg-do compile }
!
! PR 77596: [F03] procedure pointer component with implicit interface can point to a function
!
! Contributed by toK 

program xxx
  implicit none

  type tf
 procedure(), nopass, pointer :: fp
  end type tf

  call ass()

contains

  integer function ff(x)
integer, intent(in) :: x
ff = x + 1
  end function ff

  subroutine ass()
type(tf) :: p
p%fp=>ff! { dg-error "is not a subroutine" }
call p%fp(3)
  end subroutine ass

end


Re: [PATCH] [AArch64] Fix PR71727

2016-11-08 Thread Hurugalawadi, Naveen
Hi Kyrill,

Thanks for the review and suggestions.

>> It's a good idea to CC the AArch64 maintainers and reviewers
>> on aarch64 patches, or at least

Thanks for CCing the maintainers. Added [AArch64] in the subject line.

>> New functions need a function comment describing their arguments and their 
>> result.

Done.

>> Some more information about why the current behaviour is wrong
>> and how the patch fixes it would be useful in reviewing.

The default support_vector_misalignment target hook gives the wrong
answer on AArch64 when STRICT_ALIGNMENT is true.
The patch implements an AArch64-specific hook to correct this.

Please find attached the modified patch as per suggestions.

Thanks,
Naveen

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b7d4640..5a0eff5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -141,6 +141,10 @@ static bool aarch64_vector_mode_supported_p (machine_mode);
 static bool aarch64_vectorize_vec_perm_const_ok (machine_mode vmode,
 		 const unsigned char *sel);
 static int aarch64_address_cost (rtx, machine_mode, addr_space_t, bool);
+static bool aarch64_builtin_support_vector_misalignment (machine_mode mode,
+			 const_tree type,
+			 int misalignment,
+			 bool is_packed);
 
 /* Major revision number of the ARM Architecture implemented by the target.  */
 unsigned aarch64_architecture_version;
@@ -11148,6 +11152,37 @@ aarch64_simd_vector_alignment_reachable (const_tree type, bool is_packed)
   return true;
 }
 
+/* Return true if the vector misalignment factor is supported by the
+   target.  */
+static bool
+aarch64_builtin_support_vector_misalignment (machine_mode mode,
+	 const_tree type, int misalignment,
+	 bool is_packed)
+{
+  if (TARGET_SIMD && STRICT_ALIGNMENT)
+{
+  /* Return if movmisalign pattern is not supported for this mode.  */
+  if (optab_handler (movmisalign_optab, mode) == CODE_FOR_nothing)
+return false;
+
+  if (misalignment == -1)
+	{
+	  /* Misalignment factor is unknown at compile time but we know
+	 it's word aligned.  */
+	  if (aarch64_simd_vector_alignment_reachable (type, is_packed))
+{
+  int element_size = TREE_INT_CST_LOW (TYPE_SIZE (type));
+
+  if (element_size != 64)
+return true;
+}
+	  return false;
+	}
+}
+  return default_builtin_support_vector_misalignment (mode, type, misalignment,
+		  is_packed);
+}
+
 /* If VALS is a vector constant that can be loaded into a register
using DUP, generate instructions to do so and return an RTX to
assign to the register.  Otherwise return NULL_RTX.  */
@@ -14398,6 +14433,10 @@ aarch64_optab_supported_p (int op, machine_mode mode1, machine_mode,
 #undef TARGET_VECTOR_MODE_SUPPORTED_P
 #define TARGET_VECTOR_MODE_SUPPORTED_P aarch64_vector_mode_supported_p
 
+#undef TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT
+#define TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT \
+  aarch64_builtin_support_vector_misalignment
+
 #undef TARGET_ARRAY_MODE_SUPPORTED_P
 #define TARGET_ARRAY_MODE_SUPPORTED_P aarch64_array_mode_supported_p
 
diff --git a/gcc/testsuite/gcc.target/aarch64/pr71727.c b/gcc/testsuite/gcc.target/aarch64/pr71727.c
new file mode 100644
index 0000000..05eef3e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr71727.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-mstrict-align -O3" } */
+
+struct test_struct_s
+{
+  long a;
+  long b;
+  long c;
+  long d;
+  unsigned long e;
+};
+
+
+char _a;
+struct test_struct_s xarray[128];
+
+void
+_start (void)
+{
+  struct test_struct_s *new_entry;
+
+  new_entry = &xarray[0];
+  new_entry->a = 1;
+  new_entry->b = 2;
+  new_entry->c = 3;
+  new_entry->d = 4;
+  new_entry->e = 5;
+
+  return;
+}
+
+/* { dg-final { scan-assembler-times "mov\tx" 5 {target lp64} } } */
+/* { dg-final { scan-assembler-not "add\tx0, x0, :" {target lp64} } } */


Re: [PATCH][AArch64] Optimized implementation of search_line_fast for the CPP lexer

2016-11-08 Thread James Greenhalgh
On Mon, Nov 07, 2016 at 01:39:53PM +, Richard Earnshaw (lists) wrote:
> This patch contains an implementation of search_line_fast for the CPP
> lexer.  It's based in part on the AArch32 (ARM) code but incorporates
> new instructions available in AArch64 (reduction add operations) plus
> some tricks for reducing the realignment overheads.  We assume a page
> size of 4k, but that's a safe assumption -- AArch64 systems can never
> have a smaller page size than that: on systems with larger pages we will
> go through the realignment code more often than strictly necessary, but
> it's still likely to be in the noise (less than 0.5% of the time).
> Bootstrapped on aarch64-none-linux-gnu.

Some very minor nits wrt. style for the Advanced SIMD intrinsics, otherwise
OK from me.

> 
> +  const uint8x16_t xmask = (uint8x16_t) vdupq_n_u64 (0x8040201008040201ULL);


It is a pedantic point, but these casts are a GNU extension, the "portable"
way to write this would be:

  vreinterpretq_u8_u64 (vdupq_n_u64 (0x8040201008040201ULL));

> +
> +#ifdef __AARCH64EB
> +  const int16x8_t shift = {8, 8, 8, 8, 0, 0, 0, 0};

This sort of vector initialisation is a bit scary for user programmers, as
we shouldn't generally mix Neon intrinsics with the GNU extensions (for
exactly the reason you have here, keeping BE and LE straight is extra
effort)

This could be written portably as:

  vcombine_u16 (vdup_n_u16 (8), vdup_n_u16 (0));

Or if you prefer to be explicit about the elements:

  int16_t buf[] = {8, 8, 8, 8, 0, 0, 0, 0};
  int16x8_t shift = vld1q_s16 (buf);

> +#else
> +  const int16x8_t shift = {0, 0, 0, 0, 8, 8, 8, 8};
> +#endif
> +
> +  unsigned int found;
> +  const uint8_t *p;
> +  uint8x16_t data;
> +  uint8x16_t t;
> +  uint16x8_t m;
> +  uint8x16_t u, v, w;
> +
> +  /* Align the source pointer.  */
> +  p = (const uint8_t *)((uintptr_t)s & -16);
> +
> +  /* Assuming random string start positions, with a 4k page size we'll take
> + the slow path about 0.37% of the time.  */
> +  if (__builtin_expect ((AARCH64_MIN_PAGE_SIZE
> +  - (((uintptr_t) s) & (AARCH64_MIN_PAGE_SIZE - 1)))
> + < 16, 0))
> +{
> +  /* Slow path: the string starts near a possible page boundary.  */
> +  uint32_t misalign, mask;
> +
> +  misalign = (uintptr_t)s & 15;
> +  mask = (-1u << misalign) & 0xffff;
> +  data = vld1q_u8 (p);
> +  t = vceqq_u8 (data, repl_nl);
> +  u = vceqq_u8 (data, repl_cr);
> +  v = vorrq_u8 (t, vceqq_u8 (data, repl_bs));
> +  w = vorrq_u8 (u, vceqq_u8 (data, repl_qm));
> +  t = vorrq_u8 (v, w);

Can you trust the compiler to perform the reassociation here manually?
That would let you write this in the more natural form:

  t = vceqq_u8 (data, repl_nl);
  t = vorrq_u8 (t, vceqq_u8 (data, repl_cr));
  t = vorrq_u8 (t, vceqq_u8 (data, repl_bs));
  t = vorrq_u8 (t, vceqq_u8 (data, repl_qm));

> +  t = vandq_u8 (t, xmask);
> +  m = vpaddlq_u8 (t);
> +  m = vshlq_u16 (m, shift);
> +  found = vaddvq_u16 (m);
> +  found &= mask;
> +  if (found)
> + return (const uchar*)p + __builtin_ctz (found);
> +}
> +  else
> +{
> +  data = vld1q_u8 ((const uint8_t *) s);
> +  t = vceqq_u8 (data, repl_nl);
> +  u = vceqq_u8 (data, repl_cr);
> +  v = vorrq_u8 (t, vceqq_u8 (data, repl_bs));
> +  w = vorrq_u8 (u, vceqq_u8 (data, repl_qm));
> +  t = vorrq_u8 (v, w);
> +  if (__builtin_expect (vpaddd_u64 ((uint64x2_t)t), 0))
> + goto done;

As above, this cast is a GNU extension:

if (__builtin_expect (vpaddd_u64 (vreinterpretq_u64_u8 (t)), 0))

> +}
> +
> +  do
> +{
> +  p += 16;
> +  data = vld1q_u8 (p);
> +  t = vceqq_u8 (data, repl_nl);
> +  u = vceqq_u8 (data, repl_cr);
> +  v = vorrq_u8 (t, vceqq_u8 (data, repl_bs));
> +  w = vorrq_u8 (u, vceqq_u8 (data, repl_qm));
> +  t = vorrq_u8 (v, w);
> +} while (!vpaddd_u64 ((uint64x2_t)t));

Likewise here.

Thanks,
James
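
The "about 0.37%" figure in the patch comment can be reproduced by counting start offsets within a 4k page that trigger the slow-path test. A small C sketch, with MIN_PAGE_SIZE standing in for AARCH64_MIN_PAGE_SIZE and both function names invented for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define MIN_PAGE_SIZE 4096u   /* stands in for AARCH64_MIN_PAGE_SIZE */

/* Same test as the patch: fewer than 16 bytes remain before a
   possible page boundary.  */
static int takes_slow_path (uintptr_t s)
{
  /* The masking trick needs a power-of-two page size.  */
  assert ((MIN_PAGE_SIZE & (MIN_PAGE_SIZE - 1)) == 0);
  return (MIN_PAGE_SIZE - (s & (MIN_PAGE_SIZE - 1))) < 16;
}

/* Count the offsets within one page that take the slow path.  */
static unsigned slow_path_offsets (void)
{
  unsigned count = 0;

  for (uintptr_t s = 0; s < MIN_PAGE_SIZE; s++)
    count += takes_slow_path (s);
  return count;
}
```

Offsets 4081 through 4095 qualify, which is 15 of 4096, or roughly 0.37% for uniformly random start positions. Offset 4080 does not qualify because a 16-byte load from there ends exactly at the page boundary.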



Re: [ARM][GCC][PATCHv2 2/3] Add missing Poly64_t intrinsics to GCC

2016-11-08 Thread Christophe Lyon
On 7 November 2016 at 14:55, Tamar Christina  wrote:
> Hi all,
>
> This patch (2 of 3) adds the following NEON intrinsics to
> the ARM back-end of GCC:
>
> * vget_lane_p64
>
> Added new tests for these and ran regression tests on aarch64-none-linux-gnu
> and on arm-none-linux-gnueabihf.
>
> Ok for trunk?
>
> Thanks,
> Tamar
>
> gcc/
> 2016-11-04  Tamar Christina  
>
> * config/arm/arm_neon.h (vget_lane_p64): New.

LGTM (but I can't approve)

Thanks


Re: Fix build of jit (was Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v3))

2016-11-08 Thread Jakub Jelinek
On Tue, Nov 08, 2016 at 10:38:23AM +0100, Martin Liška wrote:
> > I believe the 0 here is a bug, I'd think we should be using something like
> > ATTR_TMPURE_NOTHROW_LEAF_LIST that we are using __asan_load* - the functions
> > aren't going to throw, nor call anything in the current TU.  Not 100% sure
> > about the TMPURE, after all they do write/read memory (the shadow one).
> > So maybe ATTR_NOTHROW_LEAF_LIST instead for now?  Martin?
> 
> Yes, 0 is bug. I'm inclining to ATTR_NOTHROW_LEAF_LIST as 
> __asan_{un}poison_stack_memory
> modifies global memory. It would be more safe. I'm also going to change it 
> for ASAN_MARK
> internal function (where ECF_TM_PURE is currently selected).

The TM stuff needs to be eventually resolved with the TM maintainers
(Richard Henderson and Torvald Riegel), the thing is that we can annotate
stuff even in TM regions, tm_pure functions etc.  I believe we have lots of
other TM issues (internal calls and the like) that haven't been addressed.

Jakub


Re: [Patch AArch64] aarch64-c.o should depend on TARGET_H

2016-11-08 Thread Richard Earnshaw (lists)
On 01/11/16 13:58, James Greenhalgh wrote:
> 
> Hi,
> 
> I've noticed that aarch64-c.o doesn't get rebuilt after a change to
> the target hooks. That leaves it out of sync with the rest of the compiler
> in incremental builds, which in turn causes this code to write to the wrong
> memory location:
> 
>   void
>   aarch64_register_pragmas (void)
>   {
> /* Update pragma hook to allow parsing #pragma GCC target.  */
> targetm.target_option.pragma_parse = aarch64_pragma_target_parse;
>   }
> 
> Leaving pragma_parse pointing at the default implementation, and mangling
> poor innocent targetm.target_option.print (which generally we don't use
> after the rebuild - else we likely would see it here too).
> 
> This change adds a dependency on target.h to aarch64-c.o in t-aarch64,
> which looks correct.
> 
> Thanks,
> James
> 
> ---
> 2016-11-01  James Greenhalgh  
> 
>   * config/aarch64/t-aarch64 (aarch64-c.o): Depend on TARGET_H.
> 

OK.

Seems pretty obvious...

R.

> 
> 0001-Patch-AArch64-aarch64-c.o-should-depend-on-TARGET_H.patch
> 
> 
> diff --git a/gcc/config/aarch64/t-aarch64 b/gcc/config/aarch64/t-aarch64
> index 04eb636..b461eb5 100644
> --- a/gcc/config/aarch64/t-aarch64
> +++ b/gcc/config/aarch64/t-aarch64
> @@ -52,7 +52,7 @@ aarch-common.o: $(srcdir)/config/arm/aarch-common.c $(CONFIG_H) $(SYSTEM_H) \
>   $(srcdir)/config/arm/aarch-common.c
>  
>  aarch64-c.o: $(srcdir)/config/aarch64/aarch64-c.c $(CONFIG_H) $(SYSTEM_H) \
> -coretypes.h $(TM_H) $(TREE_H) output.h $(C_COMMON_H)
> +coretypes.h $(TM_H) $(TREE_H) output.h $(C_COMMON_H) $(TARGET_H)
>   $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
>   $(srcdir)/config/aarch64/aarch64-c.c
>  
> 
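
The failure mode James describes, a stale aarch64-c.o storing through an outdated offset into targetm, can be sketched in plain C. This is only an illustration under stated assumptions: the two struct layouts, the member name added_hook, and the helper stale_store_target are hypothetical and are not the real target hook definitions. The sketch compares the offset the stale object would use for pragma_parse with the member that now lives at that offset.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*hook_fn) (void);

/* Hypothetical layout that the stale object file was compiled against.  */
struct hooks_old
{
  hook_fn print;
  hook_fn pragma_parse;
};

/* Hypothetical layout after a new hook was added in front of the others.  */
struct hooks_new
{
  hook_fn added_hook;
  hook_fn print;
  hook_fn pragma_parse;
};

enum member { M_ADDED_HOOK, M_PRINT, M_PRAGMA_PARSE, M_UNKNOWN };

/* Report which member of the new layout a stale store to
   hooks_old.pragma_parse would actually overwrite.  */
enum member
stale_store_target (void)
{
  size_t off = offsetof (struct hooks_old, pragma_parse);

  /* The stale offset must still fall inside the new structure.  */
  assert (off < sizeof (struct hooks_new));

  if (off == offsetof (struct hooks_new, added_hook))
    return M_ADDED_HOOK;
  if (off == offsetof (struct hooks_new, print))
    return M_PRINT;
  if (off == offsetof (struct hooks_new, pragma_parse))
    return M_PRAGMA_PARSE;
  return M_UNKNOWN;
}
```

With these assumed layouts the stale store lands on print rather than pragma_parse, which matches the symptom reported above. Listing $(TARGET_H) as a prerequisite forces the object to be rebuilt whenever the hook structure changes.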



Re: Fix build of jit (was Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v3))

2016-11-08 Thread Martin Liška
On 11/07/2016 05:17 PM, Jakub Jelinek wrote:
> On Mon, Nov 07, 2016 at 11:07:13AM -0500, David Malcolm wrote:
>> The patch (r241896) introduced an error in the build of the jit:
>>
>> ../../src/gcc/jit/jit-builtins.c:62:1: error: invalid conversion from
>> ‘int’ to ‘gcc::jit::built_in_attribute’ [-fpermissive]
>>  };
>>  ^
>>
>> which seems to be due to the "0" for ATTRS in:
>>
>> --- a/gcc/sanitizer.def
>> +++ b/gcc/sanitizer.def
>> @@ -165,6 +165,10 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_BEFORE_DYNAMIC_INIT,
>>  DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_AFTER_DYNAMIC_INIT,
>>"__asan_after_dynamic_init",
>>BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST)
>> +DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_CLOBBER_N, "__asan_poison_stack_memory",
>> +  BT_FN_VOID_PTR_PTRMODE, 0)
>> +DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_UNCLOBBER_N, "__asan_unpoison_stack_memory",
>> +  BT_FN_VOID_PTR_PTRMODE, 0)
> 
> I believe the 0 here is a bug, I'd think we should be using something like
> ATTR_TMPURE_NOTHROW_LEAF_LIST that we are using for __asan_load* - the functions
> aren't going to throw, nor call anything in the current TU.  Not 100% sure
> about the TMPURE, after all they do write/read memory (the shadow one).
> So maybe ATTR_NOTHROW_LEAF_LIST instead for now?  Martin?

Yes, the 0 is a bug. I'm inclined to use ATTR_NOTHROW_LEAF_LIST, as
__asan_{un}poison_stack_memory modifies global memory. It would be safer.
I'm also going to change it for the ASAN_MARK internal function (where
ECF_TM_PURE is currently selected).

I'm testing a patch for that.
Martin

> 
>> Is the attached patch OK as a fix? (assuming testing passes)  Or should
>> these builtins have other attrs?  (sorry, am not very familiar with the
>> sanitizer code).
> 
>   Jakub
> 



Re: Question about lambda function variables

2016-11-08 Thread Martin Liška
On 11/08/2016 10:12 AM, Jakub Jelinek wrote:
> On Tue, Nov 08, 2016 at 09:58:13AM +0100, Martin Liška wrote:
>> The problematic case is a lambda function (use-after-scope-ice-1.ii.004t.gimple):
>> C::AsyncCloseConnectionWithErrorMsg(const A&):: (const struct __lambda0 * const __closure)
>> {
>>   const struct A message [value-expr: __closure->__message];
>>   struct C * const this [value-expr: __closure->__this];
>>
>>   try
>> {
>>   ASAN_MARK (2, , 4);
> 
> That shows that the ASAN_MARK adds the  without going through
> gimplify_expr on it and therefore not handling the DECL_VALUE_EXPR in it,
> otherwise it would be
>   _2 = &__closure->__message;
>   ASAN_MARK (2, _2, 4);
> or something similar.
> That said, poisoning/unpoisoning the lambda captured vars inside of the
> lambda is of course wrong, 1) you really don't know where the members
> live, it could be on the stack, but could very well be on the heap or
> elsewhere, and while for stack and say longjmp we are prepared to unpoison
> it, for heap allocated vars you risk keeping the memory poisoned in corner
> cases and nothing will ever unpoison it; 2) the captured vars live longer
> than just in the lambda method, it is perhaps up to whatever function
> creates the lambda var to poison/unpoison it.
> 
>>   _1 = __closure->__this;
>>   C::DispatchConnectionCloseEvent (_1, __closure->__message);
>> }
>>   finally
>> {
>>   ASAN_MARK (1, , 4);
>> }
>> }
>>
>> Where, for quite obvious reasons, the variable 'message' can't be placed on
>> the stack, and an ICE is triggered in:
>> /tmp/use-after-scope-ice-1.ii:31:23: internal compiler error: in make_decl_rtl, at varasm.c:1311
>>
>> My question is how to properly identify local variables defined in the
>> __closure context. Is it somehow related to the DECL_HAS_VALUE_EXPR_P field
>> set on a var?
> 
> So yes, you should just ignore vars with DECL_HAS_VALUE_EXPR_P.  That can
> mean lots of things (e.g. heavily used for OpenMP/OpenACC/Cilk+), but I
> can't think of a case which you would like to poison - if it is
> DECL_VALUE_EXPR to another var or part thereof, the other var should still
> be declared in its scope.

Thank you for the clarification. I'm testing a patch for this and other
fallout issues.

Martin

> 
>   Jakub
> 



RE: [PATCH][GCC/TESTSUITE] Make test for traditional-cpp depend on

2016-11-08 Thread Tamar Christina
> Can you remove the comment: Newlib uses ## when including stdlib.h as of
> 2007-09-07.  while you are at it?  I think it doesn't make any sense post the
> change unless one reads history.
> 

No problem,
Thanks

> > 2016-10-31  Tamar Christina  
> >
> > PR testsuite/78136
> > * gcc.dg/cpp/trad/trad.exp
> > (dg-runtest): Added $srcdir/$subdir/ to Include dirs.
> > * gcc.dg/cpp/trad/include.c: Use local header
> > file.



Re: [patch, avr] Add flash size to device info and make wrap around default

2016-11-08 Thread Georg-Johann Lay

On 08.11.2016 08:08, Pitchumani Sivanupandi wrote:

I have updated patch to include the flash size as well. Took that info from
device headers (it was fed into crt's device information note section also).


The new option would render -mn-flash superfluous, but we should keep it for
backward compatibility.

Ok.

Shouldn't link_pmem_wrap then be removed from link_relax, i.e. from
LINK_RELAX_SPEC?  And what happens if relaxation is off?

Yes. Removed link_pmem_wrap from link_relax.
Disabling relaxation doesn't change -mpmem-wrap-around behavior.

Now, wrap around behavior is changed as follows:

For 8K flash devices:
Device specs add the --pmem-wrap-around=8k linker option if
-mno-pmem-wrap-around is NOT enabled.
This makes the --pmem-wrap-around=8k linker option the default for 8k flash
devices.

For 16/32/64K flash devices:
Spec string 'link_pmem_wrap' is added to all 16/32/64k flash device specs.
Otherwise no changes, i.e. it adds the --pmem-wrap-around=16/32/64k option if
the -mpmem-wrap-around option is enabled.

For other devices, no changes in device specs.
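
The selection rules listed above can be summarised in a small C sketch. This is not the code in gen-avr-mmcu-specs.c; the enum wrap_opt and the function pmem_wrap_kib are hypothetical names used only to restate the rules, and the sketch assumes the flash size is given in bytes. The function returns the wrap-around size in KiB that would be passed as --pmem-wrap-around=<N>k, or 0 when no such option is emitted.

```c
#include <assert.h>

/* Hypothetical tri-state for the driver options:
   neither given, -mpmem-wrap-around, or -mno-pmem-wrap-around.  */
enum wrap_opt { WRAP_UNSET, WRAP_ON, WRAP_OFF };

/* Return N for "--pmem-wrap-around=<N>k", or 0 if no option is added.  */
unsigned
pmem_wrap_kib (unsigned long flash_size, enum wrap_opt opt)
{
  assert (opt == WRAP_UNSET || opt == WRAP_ON || opt == WRAP_OFF);

  switch (flash_size)
    {
    case 0x2000UL:
      /* 8K devices: on by default, unless explicitly disabled.  */
      return opt != WRAP_OFF ? 8 : 0;
    case 0x4000UL:
      /* 16/32/64K devices: only when explicitly requested.  */
      return opt == WRAP_ON ? 16 : 0;
    case 0x8000UL:
      return opt == WRAP_ON ? 32 : 0;
    case 0x10000UL:
      return opt == WRAP_ON ? 64 : 0;
    default:
      /* All other flash sizes: no wrap-around option.  */
      return 0;
    }
}
```

For example, under these assumptions an 8K device with no explicit option yields 8, while a 32K device yields 32 only when -mpmem-wrap-around is given.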

Reg tested with default and -mrelax options enabled. No issues.

Regards,
Pitchumani


gcc/ChangeLog

2016-11-08  Pitchumani Sivanupandi 

* config/avr/avr-arch.h (avr_mcu_t): Add flash_size member.
* config/avr/avr-devices.c (avr_mcu_types): Add flash size info.
* config/avr/avr-mcus.def: Likewise.
* config/avr/gen-avr-mmcu-specs.c (print_mcu): Remove hard-coded prefix
check to find wrap-around value, instead use MCU flash size. For 8k flash
devices, update link_pmem_wrap spec string to add --pmem-wrap-around=8k.
* config/avr/specs.h: Remove link_pmem_wrap from LINK_RELAX_SPEC and
add to linker specs (LINK_SPEC) directly.

flashsize-and-wrap-around.patch



diff --git a/gcc/config/avr/avr-mcus.def b/gcc/config/avr/avr-mcus.def
index 6bcc6ff..9d4aa1a 100644



 /* Classic, == 128K.  */
-AVR_MCU ("avr31", ARCH_AVR31, AVR_ERRATA_SKIP, NULL, 0x0060, 0x0, 2)
-AVR_MCU ("atmega103", ARCH_AVR31, AVR_ERRATA_SKIP, "__AVR_ATmega103__", 0x0060, 0x0, 2)
-AVR_MCU ("at43usb320", ARCH_AVR31, AVR_ISA_NONE, "__AVR_AT43USB320__", 0x0060, 0x0, 2)
+AVR_MCU ("avr31", ARCH_AVR31, AVR_ERRATA_SKIP, NULL, 0x0060, 0x0, 2, 0x2)
+AVR_MCU ("atmega103", ARCH_AVR31, AVR_ERRATA_SKIP, "__AVR_ATmega103__", 0x0060, 0x0, 2, 0x2)
+AVR_MCU ("at43usb320", ARCH_AVR31, AVR_ISA_NONE, "__AVR_AT43USB320__", 0x0060, 0x0, 2, 0x1)


This looks incorrect: either .flash_size should be 0x2 or n_flash be 1.  As 
you draw the information from internal hardware descriptions, I'd guess that 
the new information is more reliable?


...as this also determines whether AT43USB320 supports ELPM, this also means
that the device is in the wrong multilib set?


I couldn't find the data sheet at atmel.com, and the one I found on the net
only mentions 64KiB program memory.  It mentions LPM on p. 9 but has no
reference to the supported instruction set.  From the manual I would conclude
that this device should be avr3, not avr31?




+AVR_MCU ("atxmega384c3", ARCH_AVRXMEGA6, AVR_ISA_RMW, "__AVR_ATxmega384C3__", 0x2000, 0x0, 6, 0x62000)
+AVR_MCU ("atxmega384d3", ARCH_AVRXMEGA6, AVR_ISA_NONE, "__AVR_ATxmega384D3__", 0x2000, 0x0, 6, 0x62000)
 /* Xmega, 128K < Flash, RAM > 64K RAM.  */


Two more glitches; presumably, .n_flash should be 7 here? Or even better, we 
can drop .n_flash field in the future and use the more reliable, finer grained 
information from .flash_size instead?
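
The suggestion to derive .n_flash from .flash_size can be sketched as follows, assuming that n_flash counts 64 KiB flash segments and that flash_size is in bytes. The helper name n_flash_from_size is hypothetical and not part of the patch.

```c
#include <assert.h>

/* Number of 64 KiB segments needed to cover FLASH_SIZE bytes,
   i.e. FLASH_SIZE divided by 0x10000, rounded up.  */
unsigned
n_flash_from_size (unsigned long flash_size)
{
  /* Guard the rounding addition against wrap-around.  */
  assert (flash_size <= 0xFFFFFFFFUL - 0xFFFFUL);

  return (unsigned) ((flash_size + 0xFFFFUL) / 0x10000UL);
}
```

Under that assumption a flash size of 0x62000 gives 7 segments, which agrees with the value proposed above for the two atxmega384 entries.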


Johann



Re: [PATCH][ARM][2/2] Remove old rtx costs

2016-11-08 Thread Kyrill Tkachov

Ping.
https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00064.html

Thanks,
Kyrill

On 01/11/16 17:12, Kyrill Tkachov wrote:

Hi all,

This is the big removal patch that removes the old costs functions, the
function pointer field in tune_params, and the transitional options
-mold-rtx-costs and -mnew-generic-costs.
The diff stats come in at:
3 files changed, 61 insertions(+), 1275 deletions(-)

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

Thanks,
Kyrill

2016-11-01  Kyrylo Tkachov  

* config/arm/arm.opt (mold-rtx-costs): Delete.
(mnew-generic-costs): Delete.
* config/arm/arm-protos.h (struct tune_params): Delete rtx_costs field.
* config/arm/arm.c (arm_rtx_costs_1): Delete.
(arm_size_rtx_costs): Likewise.
(arm_slowmul_rtx_costs): Likewise.
(arm_fastmul_rtx_costs): Likewise.
(arm_xscale_rtx_costs): Likewise.
(arm_9e_rtx_costs): Likewise.
(arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune,
arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune,
arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune,
arm_cortex_a53_tune, arm_cortex_a57_tune, arm_cortex_a9_tune,
arm_cortex_a12_tune, arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune,
arm_cortex_a5_tune, arm_xgene1_tune, arm_marvell_pj4_tune,
arm_cortex_a35_tune, arm_exynosm1_tune, arm_cortex_a73_tune,
arm_cortex_m7_tune):
Delete rtx_costs field.
(arm_new_rtx_costs): Rename to...
(arm_rtx_costs_internal): ... This.
(arm_rtx_costs): Remove old way of doing rtx costs.




Re: [PATCH][ARM][1/2] Use generic_extra_costs in all remaining tuning structs

2016-11-08 Thread Kyrill Tkachov

Ping.
https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00063.html

Thanks,
Kyrill
On 01/11/16 17:12, Kyrill Tkachov wrote:

Hi all,

This is the first of two patches to do away with the transitional
-mold-rtx-costs option and finalise the transition to the table-based RTX
costs approach.

This first patch switches the remaining tuning structs to use
generic_extra_costs so that the 2nd patch can remove the rtx_costs function
pointer in tune_params. This essentially makes the transitional option
-mnew-generic-costs the default (though it will be removed in the second
patch).

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

Thanks,
Kyrill

2016-11-01  Kyrylo Tkachov  kyrylo.tkac...@arm.com

* config/arm/arm.c (arm_slowmul_tune): Use generic_extra_costs.
(arm_fastmul_tune): Likewise.
(arm_strongarm_tune): Likewise.
(arm_xscale_tune): Likewise.
(arm_9e_tune): Likewise.
(arm_marvell_pj4_tune): Likewise.
(arm_v6t2_tune): Likewise.
(arm_v6m_tune): Likewise.
(arm_fa726te_tune): Likewise.



