Re: [PATCH][AArch64] Add crypto_pmull attribute

2017-06-20 Thread Hurugalawadi, Naveen
Hi Ramana,

Thanks for the review and approval.

>> Please update the ARM backend with the new attribute too
>> (define_insn "crypto_vmullp64"

It has already been updated in the patch posted at:
https://gcc.gnu.org/ml/gcc-patches/2017-03/msg00504.html

>> Ok with that change and checking that you can build cc1 for arm-none-eabi .

Checked and built the arm toolchain successfully with the patch.

The patch has been committed at:
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=249433

Thanks,
Naveen


[PATCH, rs6000] Add vec_reve support

2017-06-20 Thread Carl Love
GCC maintainers:

This patch adds support for the various vec_reve builtins.

The patch has been tested on powerpc64le-unknown-linux-gnu (Power 8 LE)
and on powerpc64-unknown-linux-gnu (Power 8 BE) with no regressions.

Is the patch OK for gcc mainline?

  Carl Love

---

gcc/ChangeLog:

2017-06-20  Carl Love  

* config/rs6000/rs6000-c.c: Add support for built-in functions
vector bool char vec_reve (vector bool char);
vector signed char vec_reve (vector signed char);
vector unsigned char vec_reve (vector unsigned char);
vector bool int vec_reve (vector bool int);
vector signed int vec_reve (vector signed int);
vector unsigned int vec_reve (vector unsigned int);
vector bool long long vec_reve (vector bool long long);
vector signed long long vec_reve (vector signed long long);
vector unsigned long long vec_reve (vector unsigned long long);
vector bool short vec_reve (vector bool short);
vector signed short vec_reve (vector signed short);
vector double vec_reve (vector double);
vector float vec_reve (vector float);
* config/rs6000/rs6000-builtin.def (VREVE_V2DI, VREVE_V4SI,
VREVE_V8HI, VREVE_V16QI, VREVE_V2DF, VREVE_V4SF, VREVE): New.
* config/rs6000/altivec.md (UNSPEC_VREVEV, VEC_A_size,
altivec_vreve<mode>2): New UNSPEC, new mode_attr, and new patterns.
* config/rs6000/altivec.h (vec_reve): New define.
* doc/extend.texi (vec_reve): Update the built-in documentation file
for the new built-in functions.

gcc/testsuite/ChangeLog:

2017-06-20  Carl Love  

* gcc.target/powerpc/builtins-3-vec_reve-runnable.c (test_results,
main): Add new runnable test file for the vec_reve built-ins.
---
 gcc/config/rs6000/altivec.h|   1 +
 gcc/config/rs6000/altivec.md   |  31 +++
 gcc/config/rs6000/rs6000-builtin.def   |   9 +
 gcc/config/rs6000/rs6000-c.c   |  29 +++
 gcc/doc/extend.texi|  13 ++
 .../powerpc/builtins-3-vec_reve-runnable.c | 251 +
 6 files changed, 334 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-3-vec_reve-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index d542315..98ccfd2 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -142,6 +142,7 @@
 #define vec_madd __builtin_vec_madd
 #define vec_madds __builtin_vec_madds
 #define vec_mtvscr __builtin_vec_mtvscr
+#define vec_reve   __builtin_vec_vreve
 #define vec_vmaxfp __builtin_vec_vmaxfp
 #define vec_vmaxsw __builtin_vec_vmaxsw
 #define vec_vmaxsh __builtin_vec_vmaxsh
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 25b2768..800d70c 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -46,6 +46,7 @@
UNSPEC_VPACK_UNS_UNS_SAT
UNSPEC_VPACK_UNS_UNS_MOD
UNSPEC_VPACK_UNS_UNS_MOD_DIRECT
+   UNSPEC_VREVEV
UNSPEC_VSLV4SI
UNSPEC_VSLO
UNSPEC_VSR
@@ -231,6 +232,11 @@
 ;; Vector negate
 (define_mode_iterator VNEG [V4SI V2DI])
 
+;; Vector reverse elements, uses define_mode_iterator VEC_A
+;; size in bytes of the vector element
+(define_mode_attr VEC_A_size [(V2DI "8") (V4SI "4") (V8HI "2")
+  (V16QI "1") (V2DF "8") (V4SF "4")])
+
 ;; Vector move instructions.
 (define_insn "*altivec_mov<mode>"
   [(set (match_operand:VM2 0 "nonimmediate_operand" "=Z,v,v,?Y,?*r,?*r,v,v,?*r")
@@ -3727,6 +3733,31 @@
   DONE;
 }")
 
+;; Vector reverse elements
+(define_expand "altivec_vreve<mode>2"
+  [(set (match_operand:VEC_A 0 "register_operand" "=v")
+   (unspec:VEC_A [(match_operand:VEC_A 1 "register_operand" "v")]
+ UNSPEC_VREVEV))]
+  "TARGET_ALTIVEC"
+{
+  int i, j, k, size, num_elements;
+  rtvec v = rtvec_alloc (16);
+  rtx mask = gen_reg_rtx (V16QImode);
+
+  size = <VEC_A_size>;
+  num_elements = 16 / size;
+  k = 0;
+
+  for (j = num_elements-1; j >= 0; j--)
+for (i = 0; i < size; i++)
+  RTVEC_ELT (v, i + j*size) = gen_rtx_CONST_INT (QImode, k++);
+
+  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
+  emit_insn (gen_altivec_vperm_<mode> (operands[0], operands[1],
+operands[1], mask));
+  DONE;
+})
+
 ;; Vector SIMD PEM v2.06c defines LVLX, LVLXL, LVRX, LVRXL,
 ;; STVLX, STVLXL, STVVRX, STVRXL are available only on Cell.
 (define_insn "altivec_lvlx"
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index 4682628..20974b4 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -1130,6 +1130,13 @@ BU_ALTIVEC_1 (VUPKLSB, "vupklsb", CONST, altivec_vupklsb)
 BU_ALTIVEC_1 (VUPKLPX,   "vupklpx",CONST,  altivec_vupklpx)
 BU_ALTIVEC_1 

Re: [committed] Fix -Werror=class-memaccess failures in jit testsuite (PR jit/81144)

2017-06-20 Thread David Malcolm
On Tue, 2017-06-20 at 17:15 -0600, Martin Sebor wrote:
> On 06/20/2017 03:25 PM, David Malcolm wrote:
> > This patch fixes a couple of failures of the form:
> > 
> >   error: 'void* memset(void*, int, size_t)' clearing an object of
> > non-trivial
> > type 'struct quadratic_test'; use assignment or value
> > -initialization
> > instead [-Werror=class-memaccess]
> >   note: 'struct quadratic_test' declared here
> >   cc1plus: all warnings being treated as errors
> > 
> > seen within the jit testsuite, by using zero-initialization instead
> > of memset.
> > 
> > (presumably introduced by r249234 aka
> > a324786b4ded9047d05463b4bce9d238b6c6b3ef)
> > 
> > Successfully tested on x86_64-pc-linux-gnu; takes jit.sum from:
> >   # of expected passes9211
> >   # of unexpected failures2
> > to:
> >   # of expected passes9349
> > 
> > Martin: it's unclear to me what the benefit of the warning is for
> > these
> > cases.  AIUI, it's complaining because the code is calling
> > the default ctor for struct quadratic_test, and then that object is
> > being clobbered by the memset.
> > But if I'm reading things right, the default ctor for this struct
> > zero-initializes all fields.  Can't the compiler simply optimize
> > away
> > the redundant memset, and not issue a warning?

Thanks for the info.

> -Wclass-memaccess is issued because struct quadratic_test contains
> members of classes that define a default ctor to initialize their
> private members.  
> The premise behind the warning is that objects
> of types with user-defined default and copy ctors should be
> initialized by making use of their ctors, and those with private
> data members manipulated via member functions rather than by
> directly modifying their raw representation.  Using memset to
> bypass the default ctor doesn't begin the lifetime of an object,
> can violate invariants set up by it, and using it to overwrite
> private members breaks encapsulation.  Examples of especially
> insidious errors include overwriting const data, references, or
> pointer to data members for which zero-initialization isn't
> the same as clearing their bytes.

If I'm reading my code correctly, all of the default ctors of all of
the members of this struct are "merely" initializing the pointer they
wrap to NULL.

So the ctors are initializing everything to NULL, and then the memset
redundantly re-initializes everything to 0 bits (I guess I was going for a
"belt and braces" approach to ensure that things are initialized).

> The warning runs early on in the C++ front end and has no knowledge
> of either the effects of the type's ctors, dtor, and copy assignment
> operator, or whether the raw memory function is called in lieu of
> initializing an object (e.g., in storage obtained from malloc or
> operator new), or as a shortcut to zero out its members, or when
> zeroing them out happens to be safe and doesn't actually do any
> of those bad things I mentioned above.

Aha: so at the place where the warning runs it's not possible to access
the ctors and tell that they're assigning NULL everywhere?

Might it be possible to convert the warning to work in a two-phase way
where it first gathers up a vec of suspicious-looking modifications,
and then flushes them later, filtering against ctor information when it
has the latter?  (so that we don't have to warn for this case at
-Wall?)

Alternatively maybe this is PEBCAK at my end; if so, maybe a case for adding 
this to the changes.html page?  (and maybe adding some notes on workarounds 
there, and/or to invoke.texi?)

> 
> That said, I'm sorry (and a little surprised) that I missed these
> errors in my tests.  I thought I had all the languages covered by
> using
> 
>--enable-languages=all,ada,c,c++,fortran,go,lto,objc,obj-c++
> 
> but I guess jit still isn't implied by all, even after Nathan's
> recent change to it.  Let me add jit to my script (IIRC, I once
> had it there but it was causing some trouble and I took it out.)

Reading r248454 (aka 01b4453cde8f1871495955298043d9fb589e4a36), it
looks like "jit" is only included in "all" if you also pass
  --enable-host-shared
Presumably that's what happened.  Bother.

Thanks; hope this is constructive.
Dave


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-20 Thread Wilco Dijkstra
Jeff Law wrote:
> But the stack pointer might have already been advanced into the guard
> page by the caller.   For the sake of argument assume the guard page is
> 0xf1000 and assume that our stack pointer at entry is 0xf1010 and that
> the caller hasn't touched the 0xf1000 page.
>
> If FrameSize >= 32, then the stores are going to hit the 0xf0000 page
> rather than the 0xf1000 page.   That's jumping the guard.  Thus we have
> to emit a probe prior to this stack allocation.

That's an incorrect ABI that allows adjusting the frame by 4080+32!  A correct
one might allow, say, 1024 bytes for outgoing arguments.  That means when you
call a function, there are still guard-page-size - 1024 bytes left that you
can use to allocate locals.  With a 4K guard page that allows leaf functions
up to 3KB of locals, and non-leaf functions, depending on the frame, locals of
2-3KB plus up to 1024 bytes of outgoing arguments, without inserting any
probes beyond the normal frame stores.

This design means almost no functions need additional probes. Assuming we're
also increasing the guard page size to 64KB, it's cheap even for large 
functions.

Wilco


Re: [committed] Fix -Werror=class-memaccess failures in jit testsuite (PR jit/81144)

2017-06-20 Thread Martin Sebor

On 06/20/2017 03:25 PM, David Malcolm wrote:

This patch fixes a couple of failures of the form:

  error: 'void* memset(void*, int, size_t)' clearing an object of non-trivial
type 'struct quadratic_test'; use assignment or value-initialization
instead [-Werror=class-memaccess]
  note: 'struct quadratic_test' declared here
  cc1plus: all warnings being treated as errors

seen within the jit testsuite, by using zero-initialization instead
of memset.

(presumably introduced by r249234 aka a324786b4ded9047d05463b4bce9d238b6c6b3ef)

Successfully tested on x86_64-pc-linux-gnu; takes jit.sum from:
  # of expected passes9211
  # of unexpected failures2
to:
  # of expected passes9349

Martin: it's unclear to me what the benefit of the warning is for these
cases.  AIUI, it's complaining because the code is calling
the default ctor for struct quadratic_test, and then that object is
being clobbered by the memset.
But if I'm reading things right, the default ctor for this struct
zero-initializes all fields.  Can't the compiler simply optimize away
the redundant memset, and not issue a warning?


-Wclass-memaccess is issued because struct quadratic_test contains
members of classes that define a default ctor to initialize their
private members.  The premise behind the warning is that objects
of types with user-defined default and copy ctors should be
initialized by making use of their ctors, and those with private
data members manipulated via member functions rather than by
directly modifying their raw representation.  Using memset to
bypass the default ctor doesn't begin the lifetime of an object,
can violate invariants set up by it, and using it to overwrite
private members breaks encapsulation.  Examples of especially
insidious errors include overwriting const data, references, or
pointer to data members for which zero-initialization isn't
the same as clearing their bytes.

The warning runs early on in the C++ front end and has no knowledge
of either the effects of the type's ctors, dtor, and copy assignment
operator, or whether the raw memory function is called in lieu of
initializing an object (e.g., in storage obtained from malloc or
operator new), or as a shortcut to zero out its members, or when
zeroing them out happens to be safe and doesn't actually do any
of those bad things I mentioned above.

That said, I'm sorry (and a little surprised) that I missed these
errors in my tests.  I thought I had all the languages covered by
using

  --enable-languages=all,ada,c,c++,fortran,go,lto,objc,obj-c++

but I guess jit still isn't implied by all, even after Nathan's
recent change to it.  Let me add jit to my script (IIRC, I once
had it there but it was causing some trouble and I took it out.)

Martin



Re: [PATCH][libgcc] Fix PR81080, build libgcov with large file support

2017-06-20 Thread Ian Lance Taylor via gcc-patches
On Wed, Jun 14, 2017 at 1:01 AM, Richard Biener  wrote:
>
> The following patch makes sure we build the 32bit multilib libgcov with
> large file support on x86_64-linux.  libgcov.h ends up using auto-host.h
> via including tconfig.h which is only valid for the main multilib
> (and on x86_64 doesn't need explicit large-file support defines).  That
> libgcc ends up using that is probably from times where it wasn't at
> the toplevel, some files already include auto-target.h generated by
> libgcc configure but most do so after including tsystem.h which is
> of course too late.  I suppose libgcc files shouldn't include tconfig.h
> from gcc/, but that's a change going too far for this bug ;)
>
> Thus, this makes libgcov.h include auto-target.h (but in the correct
> position) plus adds AC_SYS_LARGEFILE to libgccs configure.
>
> With that I properly end up with 32bit libgcov.a using fopen64 and open64
> as fopen/open seem to fail for some filesystems and inode numbers that
> do not fit 32bits even if the files in question are not large.  Failure
> mode is:
>
> int main(void) {
> return 0;
> }
>
> niffler:/home/mue/src # gcc -m32 --coverage -o t testit.c
> niffler:/home/mue/src # ./t
> profiling:/home/mue/src/testit.gcda:Cannot open
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk and
> branches after a while?
>
> libgcov is the only libgcc module doing I/O.
>
> Other than libgcov, libgomp, libcilkrts, libmpx and libstdc++
> are similarly affected (they use fopen on the 32bit multilib)
> but not fixed.  libubsan, libasan, libssp, libbacktrace and libgfortran
> use open.  While libgfortran configury has AC_SYS_LARGEFILE, the
> open use leaks in through libbacktrace (ubsan/asan might have the
> same issue, didn't investigate).  libbacktrace lacks AC_SYS_LARGEFILE.
>
> Thanks,
> Richard.
>
> 2017-06-14  Richard Biener  
>
> PR gcov-profile/81080
> * configure.ac: Add AC_SYS_LARGEFILE.
> * libgcov.h: Include auto-target.h before tsystem.h to pick
> up _FILE_OFFSET_BITS which might differ for multilibs.
> * config.in: Regenerate.
> * configure: Likewise.


This is OK.

Thanks.

Ian


Re: [PATCH] LFS support for libbacktrace

2017-06-20 Thread Ian Lance Taylor via gcc-patches
On Wed, Jun 14, 2017 at 3:40 AM, Richard Biener  wrote:
>
> This fixes the [f]open use in libgfortran.  Doesn't fix the ones
> in libsanitizer because those apparently use a copy because they
> need to rename stuff...
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk
> and branches?
>
> Thanks,
> Richard.
>
> 2017-06-14  Richard Biener  
>
> * configure.ac: Add AC_SYS_LARGEFILE.
> * config.h.in: Regenerate.
> * configure: Likewise.

This is OK everywhere.

Thanks.

Ian


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-20 Thread Eric Botcazou
> But what you end up depending on is undocumented behavior of a
> particular kernel implementation.  That seems rather unwise.

And it's the single example of such a thing in the entire codebase?
I don't know the code of the sanitizer much, but from the outside it looks
full of similar tricks...

> Which ABIs have that property?  I'll be the first to admit that I've
> purged much of my weird ABI memories.

The original Alpha ABI mentioned by Richard IIRC for example.

> Supporting ABIs which force us into a probe, then allocate strategy is
> actually easy.  We can use the existing -fstack-check code, but use the
> value 0 for STACK_CHECK_PROTECT.
> 
> Just replace all uses of STACK_CHECK_PROTECT with calls to a wrapper.
> 
> The wrapper looks like
> 
> if (magic_flag)
>   return STACK_CHECK_PROTECT;
> else
>   return 0;
> 
> That's precisely what we were planning to do prior to bumping against
> the valgrind issues.  That indirection makes it easy to ensure we didn't
> change the behavior of the existing stack-check for Ada, but also allows
> us to change the behavior for the new stack checking option.

Yes, that would seem the most straightforward thing to do modulo Valgrind.

> Ah, so if you're running on an alternate stack, then why probe ahead of
> need?  I thought the whole point of probing a couple pages ahead was to
> ensure you could take the signal in Ada.

We run on the alternate stack only when we do _not_ probe ahead, i.e. on 
x86/x86-64 Linux.

> I've also wondered if a 2 page guard would solve some of these problems.
>  In the event of stack overflow, the kernel maps in one of the two pages
> for use by the signal handler.   But changing things at this point may
> not be worth the effort.

That was exactly the strategy used by Tru64 (so you needed to manually unmap 
the page after you had recovered from the overflow).

-- 
Eric Botcazou


Re: [PATCH, AArch64] Add x86 intrinsic headers to GCC AArch64 taget

2017-06-20 Thread Segher Boessenkool
On Tue, Jun 20, 2017 at 09:34:25PM +, Joseph Myers wrote:
> On Tue, 20 Jun 2017, Segher Boessenkool wrote:
> 
> > > And as you see below the gcc.target tests have to be duplicated
> > > anyway. Even if the C code is common there will be many differences in
> > > dg-options and dg-require-effective-target. Trying to common these
> > > implementations only creates more small files to manage.
> > 
> > So somewhere in the near future we'll have to pull things apart again,
> > if we go with merging things now.
> 
> The common part in the intrinsics implementation should be exactly the 
> parts that can be implemented in GNU C without target-specific intrinsics 
> being needed.  There should be nothing to pull apart if you start with the 
> right things in the common header.  If a particular header has some 
> functions that can be implemented in GNU C and some that need 
> target-specific code, the generic GNU C functions should be in a common 
> header, #included by the target-specific header.  The common header should 
> have no conditionals on target architectures whatever (it might have 
> conditionals on things like endianness).

I don't think there is much that will end up in the common header
eventually.  If it was possible to describe most of this in plain C,
and in such a way that it would optimise well, there would not *be*
these intrinsics.

> I don't expect many different effective-target / dg-add-options keywords 
> to be needed for common tests (obviously, duplicating tests for each 
> architecture wanting these intrinsics is generally a bad idea).

Yeah, I think it should be possible to share the tests, perhaps with
some added dg things (so that we don't have to repeat the same things
over and over).


Segher


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-20 Thread Jeff Law
On 06/20/2017 06:27 AM, Richard Biener wrote:
> On Tue, Jun 20, 2017 at 2:20 PM, Uros Bizjak  wrote:
>> On Tue, Jun 20, 2017 at 2:17 PM, Uros Bizjak  wrote:
>>> On Tue, Jun 20, 2017 at 2:13 PM, Florian Weimer  wrote:
 On 06/20/2017 01:10 PM, Uros Bizjak wrote:

>   74,99%  a.outa.out  [.] test_or
>   12,50%  a.outa.out  [.] test_movb
>   12,50%  a.outa.out  [.] test_movl

 Could you try notl/notb/negl/negb as well, please?
>>>
>>> These all have the same (long) runtime as test_or.
>>
>> Perhaps we can use "testb $0, %0"? It doesn't write to the memory, but
>> otherwise has the same runtime as movb/movl.
> 
> That sounds good, OTOH it's a matter of putting strain on the
> memory fetch or store side...  We'll get cacheline allocations in
> any case (but the memory will be used eventually).  Instead
> of test a mere movb into a scratch register (aka, load instead of
> store) would work as well apart from the need of a scratch register.
It was never clear to me why we always implement probes via stores --
though from development standpoint a destructive store is useful.

I'd expect a tst to generate the desired SEGV.

How does that compare to the partial-allocation + push approach?


> 
> We can also vectorize with scatters ;)  (just kidding)
:-)

jeff


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-20 Thread Jeff Law
On 06/20/2017 02:16 AM, Eric Botcazou wrote:
> 
> Right, because the Linux kernel for x86/x86-64 is the only OS flavor that 
> doesn't let you probe the stack ahead of the stack pointer.  All other 
> combinations of OS and architecture we tried (and it's quite a lot) do.
But what you end up depending on is undocumented behavior of a
particular kernel implementation.  That seems rather unwise.


> 
>> After much poking around I concluded that we really need to implement
>> allocation and probing via a "moving sp" strategy.   Probing into
>> unallocated areas runs afoul of valgrind, so that's a non-starter.
> 
> The reason why you cannot use this strategy on a global basis for stack 
> checking is that some ABIs specify that you cannot update the stack pointer 
> more than once to establish a frame; others don't explicitly care but...
Which ABIs have that property?  I'll be the first to admit that I've
purged much of my weird ABI memories.

Supporting ABIs which force us into a probe, then allocate strategy is
actually easy.  We can use the existing -fstack-check code, but use the
value 0 for STACK_CHECK_PROTECT.

Just replace all uses of STACK_CHECK_PROTECT with calls to a wrapper.

The wrapper looks like

if (magic_flag)
  return STACK_CHECK_PROTECT;
else
  return 0;

That's precisely what we were planning to do prior to bumping against
the valgrind issues.  That indirection makes it easy to ensure we didn't
change the behavior of the existing stack-check for Ada, but also allows
us to change the behavior for the new stack checking option.


> 
>> Allocating stack space, then probing the pages within the space is
>> vulnerable to async signal delivery between the allocation point and the
>> probe point.  If that occurs the signal handler could end up running on
>> a stack that has collided with the heap.
> 
> ...yes, there are difficulties with the "moving sp" strategy.
> 
>> Finally, we need not ensure the ability to handle a signal at stack
>> overflow.  It is fine for the kernel to halt the process immediately if
>> it detects a reference to the guard page.
> 
> In Ada it's the opposite and we use an alternate signal stack in this case.
Ah, so if you're running on an alternate stack, then why probe ahead of
need?  I thought the whole point of probing a couple pages ahead was to
ensure you could take the signal in Ada.

I've also wondered if a 2 page guard would solve some of these problems.
 In the event of stack overflow, the kernel maps in one of the two pages
for use by the signal handler.   But changing things at this point may
not be worth the effort.

> 
>> Michael Matz has suggested some generic support so that we don't have to
> write target-specific code for each and every target we support.  The
> idea is to have a helper function which allocates and probes stack
> space.  The port can then call that helper function from within its
> prologue generator.  I think this is wise -- I wouldn't want to go
>> through this exercise on every port.
> 
> Interesting.  We never convinced ourselves that this was worthwhile.
The idea is not to have to write probing code for all those embedded
targets.  I doubt anyone really wants to write probes for the mn103,
rl78, mep, etc etc.

With Matz's little helper routine, they just have to pick the right
point in their prologue code to call the helper.  At least that's the
theory.

Jeff




Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-20 Thread Jeff Law
On 06/20/2017 03:27 AM, Richard Earnshaw (lists) wrote:
> On 19/06/17 18:07, Jeff Law wrote:
>> As some of you are likely aware, Qualys has just published fairly
>> detailed information on using stack/heap clashes as an attack vector.
>> Eric B, Michael M -- sorry I couldn't say more when I contact you about
>> -fstack-check and some PPC specific stuff.  This has been under embargo
>> for the last month.
>>
>>
>> --
>>
>>
>> http://www.openwall.com/lists/oss-security/2017/06/19/1
>>
> [...]
>> aarch64 is significantly worse.  There are no implicit probes we can
>> exploit.  Furthermore, the prologue may allocate stack space 3-4 times.
>> So we have to track the distance to the most recent probe and when that
>> distance grows too large, we have to emit a probe.  Of course we have to
>> make worst case assumptions at function entry.
>>
> 
> I'm not sure I understand what you're saying here.  According to the
> comment above aarch64_expand_prologue, the stack frame looks like:
> 
> +---+
> |   |
> |  incoming stack arguments |
> |   |
> +---+
> |   | <-- incoming stack pointer (aligned)
> |  callee-allocated save area   |
> |  for register varargs |
> |   |
> +---+
> |  local variables  | <-- frame_pointer_rtx
> |   |
> +---+
> |  padding0 | \
> +---+  |
> |  callee-saved registers   |  | frame.saved_regs_size
> +---+  |
> |  LR'  |  |
> +---+  |
> |  FP'  | / <- hard_frame_pointer_rtx (aligned)
> +---+
> |  dynamic allocation   |
> +---+
> |  padding  |
> +---+
> |  outgoing stack arguments | <-- arg_pointer
> |   |
> +---+
> |   | <-- stack_pointer_rtx (aligned)
> 
> Now for the majority of frames the amount of local variables is small
> and there is neither dynamic allocation nor the need for outgoing local
> variables.  In this case the first instruction in the function is
> 
>   stp fp, lr, [sp, #-FrameSize]!
But the stack pointer might have already been advanced into the guard
page by the caller.   For the sake of argument assume the guard page is
0xf1000 and assume that our stack pointer at entry is 0xf1010 and that
the caller hasn't touched the 0xf1000 page.

If FrameSize >= 32, then the stores are going to hit the 0xf0000 page
rather than the 0xf1000 page.   That's jumping the guard.  Thus we have
to emit a probe prior to this stack allocation.

Now because this instruction stores at *new_sp, it does allow us to
eliminate future probes and I do take advantage of that in my code.

The implementation is actually rather simple.  We keep a conservative
estimate of the offset of the last known probe relative to the stack
pointer.  At entry we have to assume the offset is:

PROBE_INTERVAL - (STACK_BOUNDARY / BITS_PER_UNIT)


A stack allocation increases the offset.  A store into the stack
decreases the offset.
A probe is required before an allocation that increases the offset to >=
PROBE_INTERVAL.

An allocation + store instruction such as shown does both, but can (and
is) easily modeled.  The only tricky case here is that you can't naively
break it up into an allocation and store as that can force an
unnecessary probe (say if the allocated space is just enough to hold the
stored objects).




> 
> 
> If the locals area gets slightly larger (>= 512 bytes) then the sequence
> becomes
>   sub sp, sp, #FrameSize
>   stp fp, lr, [sp]
> 
> But again this acts as a sufficient implicit probe provided that
> FrameSize does not exceed the probe interval.
And again, the store acts as a probe which can eliminate potential
probes that might occur later in the instruction stream.  But if the
allocation by the "sub" instruction causes our running offset to cross
PROBE_BOUNDARY, then we must emit a probe prior to the "sub" instruction.

Hopefully it'll be clearer when I post the code :-)  aarch64 is one that
will need updating as all work to-date has been with Red Hat's 4.8
compiler with the aarch64 code generator bolted onto the side.

So perhaps "no implicit probes" was too strong.  It would probably be
better stated "no implicit probes in the caller".  We certainly use
stores in the prologue to try and eliminate probes.  In fact, we try
harder on aarch64 than any other target.

Jeff


Re: [i386] __builtin_ia32_stmxcsr could be pure

2017-06-20 Thread Marc Glisse

Ping.

On Sat, 3 Jun 2017, Marc Glisse wrote:


Hello,

I don't think Richard's "sounds good" was meant as "ok to commit". Does an 
x86 maintainer want to approve or criticize the patch?


https://gcc.gnu.org/ml/gcc-patches/2017-05/msg02009.html

On Fri, 26 May 2017, Richard Biener wrote:


On Fri, May 26, 2017 at 10:55 AM, Marc Glisse  wrote:

Hello,

glibc marks fegetround as a pure function. On x86, people tend to use
_MM_GET_ROUNDING_MODE instead, which could benefit from the same. I think it
is safe, but a second opinion would be welcome.


Sounds good.  The important part is to keep the dependency to
SET_ROUNDING_MODE, which is done via claiming both touch global memory.


I could have handled just this builtin, but it seemed better to provide
def_builtin_pure (like "const" already has) since there should be other
builtins that can be marked this way (maybe the gathers?).


Should work for gathers.  They could even use stronger guarantees,
namely a fnspec with "..R" (the pointer argument is only read from directly).

Similarly scatter can use ".W" (the pointer argument is only written to
directly).

Richard.


Bootstrap+testsuite on x86_64-pc-linux-gnu with default languages.

2017-05-29  Marc Glisse  

gcc/
* config/i386/i386.c (struct builtin_isa): New field pure_p.
Reorder for compactness.
(def_builtin, def_builtin2, ix86_add_new_builtins): Handle pure_p.
(def_builtin_pure, def_builtin_pure2): New functions.
(ix86_init_mmx_sse_builtins) [__builtin_ia32_stmxcsr]: Mark as pure.


gcc/testsuite/
* gcc.target/i386/getround.c: New file.

--
Marc Glisse









Re: [PATCH, AArch64] Add x86 intrinsic headers to GCC AArch64 taget

2017-06-20 Thread Joseph Myers
On Tue, 20 Jun 2017, Segher Boessenkool wrote:

> > And as you see below the gcc.target tests have to be duplicated
> > anyway. Even if the C code is common there will be many differences in
> > dg-options and dg-require-effective-target. Trying to common these
> > implementations only creates more small files to manage.
> 
> So somewhere in the near future we'll have to pull things apart again,
> if we go with merging things now.

The common part in the intrinsics implementation should be exactly the 
parts that can be implemented in GNU C without target-specific intrinsics 
being needed.  There should be nothing to pull apart if you start with the 
right things in the common header.  If a particular header has some 
functions that can be implemented in GNU C and some that need 
target-specific code, the generic GNU C functions should be in a common 
header, #included by the target-specific header.  The common header should 
have no conditionals on target architectures whatever (it might have 
conditionals on things like endianness).

I don't expect many different effective-target / dg-add-options keywords 
to be needed for common tests (obviously, duplicating tests for each 
architecture wanting these intrinsics is generally a bad idea).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] [i386] Enable Control-flow Enforcement Technology (CET).

2017-06-20 Thread Joseph Myers
This patch is completely missing documentation (in *.texi files) of the 
new options, attribute, built-in functions etc.

You appear to be adding quite target-specific things to the 
architecture-independent compiler.  If the attribute, for example, is to 
be architecture-independent, the documentation needs to define semantics 
for it that make sense on any architecture (or any architecture providing 
such features), not just on x86; the patch submission needs to justify the 
design choices of what is target-specific and what is target-independent.  
(Cf. MPX where there are various architecture-independent features for 
which a software implementation would be logically possible, although 
actually the only implementation of those features in GCC is for MPX 
hardware.)  I don't think this patch would even build for non-x86 targets, 
because you're putting completely x86-specific references such as 
TARGET_CET and gen_nop_endbr in target-independent files.

-- 
Joseph S. Myers
jos...@codesourcery.com


[committed] Fix bootstrap on armv6-*-freebsd

2017-06-20 Thread Andreas Tobler

Hi All,

I committed the chunk below to fix bootstrap on armv6*-*-freebsd.

Andreas

2017-06-20  Andreas Tobler  

* config.gcc (armv6*-*-freebsd*): Change the target_cpu_cname to
arm1176jzf-s.

Index: config.gcc
===
--- config.gcc  (revision 249427)
+++ config.gcc  (working copy)
@@ -1089,7 +1089,7 @@
tm_file="${tm_file} arm/bpabi.h arm/freebsd.h arm/aout.h arm/arm.h"
case $target in
armv6*-*-freebsd*)
-   target_cpu_cname="arm1176jzfs"
+   target_cpu_cname="arm1176jzf-s"
tm_defines="${tm_defines} TARGET_FREEBSD_ARMv6=1"
 if test $fbsd_major -ge 11; then
tm_defines="${tm_defines} TARGET_FREEBSD_ARM_HARD_FLOAT=1"


Re: [PATCH, AArch64] Add x86 intrinsic headers to GCC AArch64 target

2017-06-20 Thread Segher Boessenkool
On Tue, Jun 20, 2017 at 01:51:24PM -0500, Steven Munroe wrote:
> I am not sure this works or is even a good idea.
> 
> As an accident bmiintrin.h can be implemented as C code or common
> builtins. But bmi2intrin.h depends on __builtin_bpermd which to my
> knowledge is PowerISA only.

Right.  And the plan is to only support 64-bit, LE, POWER8 and above
(I hope I got that right -- the point is, only systems with newish
features, not something generic even when considering rs6000 alone).

> As I work on mmx, sse, sse2, etc it gets more complicated. There are
> many X86 intrinsic instances that require altivec.h unique instrisics to
> implement efficiently for the power64le target and some inline __asm.

Yeah.  And even then the expectation is not to get perfectly good
performance, only something good enough as a starting point for a
porting effort.

> Net the current sample size so far is too small to make a reasonable
> assessment.

Right!  And we have only two implementations so far, as well.

> And as you see below the gcc.target tests have to be duplicated
> anyway. Even if the C code is common there will be many differences in
> dg-options and dg-require-effective-target. Trying to common these
> implementations only creates more small files to manage.

So somewhere in the near future we'll have to pull things apart again,
if we go with merging things now.

It's not like the "common" parts will see much (if any) maintenance,
anyway...  The interface is already set in stone, that's the whole
point of this all.


Segher


Re: [PATCH/AARCH64] Improve/correct ThunderX 1 cost model for Arith_shift

2017-06-20 Thread Andrew Pinski
On Mon, Jun 19, 2017 at 2:00 PM, Andrew Pinski  wrote:
> On Wed, Jun 7, 2017 at 10:16 AM, James Greenhalgh
>  wrote:
>> On Fri, Dec 30, 2016 at 10:05:26PM -0800, Andrew Pinski wrote:
>>> Hi,
>>>   Currently for the following function:
>>> int f(int a, int b)
>>> {
>>>   return a + (b <<7);
>>> }
>>>
>>> GCC produces:
>>> add w0, w0, w1, lsl 7
>>> But for ThunderX 1, it is better if the instruction was split allowing
>>> better scheduling to happen in most cases, the latency is the same.  I
>>> get a small improvement in coremarks, ~1%.
>>>
>>> Currently the code does not take into account Arith_shift even though
>>> the comment:
>>>   /* Strip any extend, leave shifts behind as we will
>>> cost them through mult_cost.  */
>>> Say it does not strip out the shift, aarch64_strip_extend does and has
>>> always has since the back-end was added to GCC.
>>>
>>> Once I fixed the code around aarch64_strip_extend, I got a regression
>>> for ThunderX 1 as some shifts/extends (left shifts <=4 and/or zero
>>> extends) are considered free so I needed to add a new tuning flag.
>>>
>>> Note I will get an even more improvement for ThunderX 2 CN99XX, but I
>>> have not measured it yet as I have not made the change to
>>> aarch64-cost-tables.h yet as I am waiting for approval of the renaming
>>> patch first before submitting any of the cost table changes.  Also I
>>> noticed this problem with this tuning first and then looked back at
>>> what I needed to do for ThunderX 1.
>>>
>>> OK?  Bootstrapped and tested on aarch64-linux-gnu without any
>>> regressions (both with and without --with-cpu=thunderx).
>>
>> This is mostly OK, but I don't like the name "easy"_shift_extend. Cheap
>> or free seems better. I have some other minor points below.
>
>
> Ok, that seems like a good idea.  I used easy since that was the
> wording our hardware folks had come up with.  I am changing the
> comments to make clearer when this flag should be used.
> I should have a new patch out by the end of today.

Due to the LSE ICE which I reported in the other thread, it took me
longer to send out a new patch.
Anyways here is the updated patch with the changes requested.


OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.


Thanks,
Andrew Pinski


* config/aarch64/aarch64-cost-tables.h (thunderx_extra_costs):
Increment Arith_shift and Arith_shift_reg by 1.
* config/aarch64/aarch64-tuning-flags.def (cheap_shift_extend): New tuning flag.
* config/aarch64/aarch64.c (thunderx_tunings): Enable
AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND.
(aarch64_strip_extend): Add new argument and test for it.
(aarch64_cheap_mult_shift_p): New function.
(aarch64_rtx_mult_cost): Call aarch64_cheap_mult_shift_p and don't add
a cost if it is true.
Update calls to aarch64_strip_extend.
(aarch64_rtx_costs): Update calls to aarch64_strip_extend.



>
> Thanks,
> Andrew
>
>
>>
>>> Index: config/aarch64/aarch64-tuning-flags.def
>>> ===
>>> --- config/aarch64/aarch64-tuning-flags.def   (revision 243974)
>>> +++ config/aarch64/aarch64-tuning-flags.def   (working copy)
>>> @@ -35,4 +35,8 @@ two load/stores are not at least 8 byte
>>>  pairs.   */
>>>  AARCH64_EXTRA_TUNING_OPTION ("slow_unaligned_ldpw", SLOW_UNALIGNED_LDPW)
>>>
>>> +/* Logical shift left <=4 with/without zero extend are considered easy
>>> +   extended, also zero extends without the shift. */
>>
>>
>> I'm struggling to parse this comment. "also zero extends without the shift"
>> is what is getting me. I'm also not certain I follow when I should set this
>> flag. If all shifts are cheap/free on my platform, should I set this flag?
>>
>>> +AARCH64_EXTRA_TUNING_OPTION ("easy_shift_extend", EASY_SHIFT_EXTEND)
>>> +
>>>  #undef AARCH64_EXTRA_TUNING_OPTION
>>
>>
>>> +
>>> +/* Return true iff X is an easy shift without a sign extend. */
>>> +
>>
>> Again I don't like calling <= 4 "easy", it feels imprecise.
>>
>> Thanks,
>> James
>>
Index: gcc/config/aarch64/aarch64-cost-tables.h
===
--- gcc/config/aarch64/aarch64-cost-tables.h(revision 249424)
+++ gcc/config/aarch64/aarch64-cost-tables.h(working copy)
@@ -136,8 +136,8 @@ const struct cpu_cost_table thunderx_ext
 0, /* Logical.  */
 0, /* Shift.  */
 0, /* Shift_reg.  */
-COSTS_N_INSNS (1), /* Arith_shift.  */
-COSTS_N_INSNS (1), /* Arith_shift_reg.  */
+COSTS_N_INSNS (1)+1,   /* Arith_shift.  */
+COSTS_N_INSNS (1)+1,   /* Arith_shift_reg.  */
 COSTS_N_INSNS (1), /* UNUSED: Log_shift.  */
 COSTS_N_INSNS (1), /* UNUSED: Log_shift_reg.  */
 0, /* Extend.  */
Index: gcc/config/aarch64/aarch64-tuning-flags.def
===
--- gcc/config/aarch64/aarch64-tuning-flags.def (revision 249424)
+++ 

NOP conversions in X+CST+CST

2017-06-20 Thread Marc Glisse

Hello,

now that FRE was fixed to avoid infinite recursion, this patch passes 
bootstrap+testsuite on x86_64-pc-linux-gnu multilib with all languages 
(including ada).


This isn't exactly the patch that was reverted, because the previous patch 
did not actually handle vectors properly.


It still shouldn't interfere with the patch by Robin Dapp which IIUC only 
handles the case where the conversion is an extension.


2017-06-21  Marc Glisse  

gcc/
* match.pd (nop_convert): New predicate.
((A +- CST1) +- CST2): Allow some NOP conversions.

gcc/testsuite/
* gcc.dg/tree-ssa/addadd.c: Un-XFAIL.
* gcc.dg/tree-ssa/addadd-2.c: New file.

--
Marc Glisse

Index: gcc/match.pd
===
--- gcc/match.pd	(revision 249413)
+++ gcc/match.pd	(working copy)
@@ -67,20 +67,34 @@ along with GCC; see the file COPYING3.
 			  BUILT_IN_L##FN \
 			  BUILT_IN_LL##FN) \
   (define_operator_list X##FN##L BUILT_IN_I##FN##L \
  BUILT_IN_L##FN##L \
  BUILT_IN_LL##FN##L)
 
 DEFINE_INT_AND_FLOAT_ROUND_FN (FLOOR)
 DEFINE_INT_AND_FLOAT_ROUND_FN (CEIL)
 DEFINE_INT_AND_FLOAT_ROUND_FN (ROUND)
 DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
+
+/* As opposed to convert?, this still creates a single pattern, so
+   it is not a suitable replacement for convert? in all cases.  */
+(match (nop_convert @0)
+ (convert @0)
+ (if (tree_nop_conversion_p (type, TREE_TYPE (@0)
+(match (nop_convert @0)
+ (view_convert @0)
+ (if (VECTOR_TYPE_P (type) && VECTOR_TYPE_P (TREE_TYPE (@0))
+  && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0))
+  && tree_nop_conversion_p (TREE_TYPE (type), TREE_TYPE (TREE_TYPE (@0))
+/* This one has to be last, or it shadows the others.  */
+(match (nop_convert @0)
+ @0) 
 
 /* Simplifications of operations with one constant operand and
simplifications to constants or single values.  */
 
 (for op (plus pointer_plus minus bit_ior bit_xor)
   (simplify
 (op @0 integer_zerop)
 (non_lvalue @0)))
 
 /* 0 +p index -> (type)index */
@@ -1289,32 +1303,58 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (simplify
 (plus:c (minus @0 @1) @1)
 @0)
   (simplify
(minus @0 (plus:c @0 @1))
(negate @1))
   (simplify
(minus @0 (minus @0 @1))
@1)
 
-  /* (A +- CST1) +- CST2 -> A + CST3  */
+  /* (A +- CST1) +- CST2 -> A + CST3
+ Use view_convert because it is safe for vectors and equivalent for
+ scalars.  */
   (for outer_op (plus minus)
(for inner_op (plus minus)
+	neg_inner_op (minus plus)
 (simplify
- (outer_op (inner_op @0 CONSTANT_CLASS_P@1) CONSTANT_CLASS_P@2)
- /* If the constant operation overflows we cannot do the transform
-	as we would introduce undefined overflow, for example
-	with (a - 1) + INT_MIN.  */
- (with { tree cst = const_binop (outer_op == inner_op
- ? PLUS_EXPR : MINUS_EXPR, type, @1, @2); }
-  (if (cst && !TREE_OVERFLOW (cst))
-   (inner_op @0 { cst; } ))
+ (outer_op (nop_convert (inner_op @0 CONSTANT_CLASS_P@1))
+	   CONSTANT_CLASS_P@2)
+ /* If one of the types wraps, use that one.  */
+ (if (!ANY_INTEGRAL_TYPE_P (type) || TYPE_OVERFLOW_WRAPS (type))
+  (if (outer_op == PLUS_EXPR)
+   (plus (view_convert @0) (inner_op @2 (view_convert @1)))
+   (minus (view_convert @0) (neg_inner_op @2 (view_convert @1
+  (if (!ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))
+	   || TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
+   (if (outer_op == PLUS_EXPR)
+	(view_convert (plus @0 (inner_op (view_convert @2) @1)))
+	(view_convert (minus @0 (neg_inner_op (view_convert @2) @1
+   /* If the constant operation overflows we cannot do the transform
+	  directly as we would introduce undefined overflow, for example
+	  with (a - 1) + INT_MIN.  */
+   (if (types_match (type, @0))
+	(with { tree cst = const_binop (outer_op == inner_op
+	? PLUS_EXPR : MINUS_EXPR,
+	type, @1, @2); }
+	 (if (cst && !TREE_OVERFLOW (cst))
+	  (inner_op @0 { cst; } )
+	  /* X+INT_MAX+1 is X-INT_MIN.  */
+	  (if (INTEGRAL_TYPE_P (type) && cst
+	   && wi::eq_p (cst, wi::min_value (type)))
+	   (neg_inner_op @0 { wide_int_to_tree (type, cst); })
+	   /* Last resort, use some unsigned type.  */
+	   (with { tree utype = unsigned_type_for (type); }
+	(view_convert (inner_op
+			   (view_convert:utype @0)
+			   (view_convert:utype
+			{ drop_tree_overflow (cst); })
 
   /* (CST1 - A) +- CST2 -> CST3 - A  */
   (for outer_op (plus minus)
(simplify
 (outer_op (minus CONSTANT_CLASS_P@1 @0) CONSTANT_CLASS_P@2)
 (with { tree cst = const_binop (outer_op, type, @1, @2); }
  (if (cst && !TREE_OVERFLOW (cst))
   (minus { cst; } @0)
 
   /* CST1 - (CST2 - A) -> CST3 + A  */
Index: gcc/testsuite/gcc.dg/tree-ssa/addadd-2.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/addadd-2.c	(nonexistent)
+++ 

[PATCH] [i386] Enable Control-flow Enforcement Technology (CET).

2017-06-20 Thread Tsimbalist, Igor V
Control-flow Enforcement Technology (CET) provides the following
capabilities to defend against ROP/JOP style control-flow subversion
attacks:
- Shadow Stack - return address protection to defend against Return
  Oriented Programming,
- Indirect branch tracking - free branch protection to defend
  against
  Jump/Call Oriented Programming.

Details are described in the doc
https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf

This patch enables CET in the compiler only (gcc directory). The
executables built with the CET enabled compiler can run successfully
on non-CET i386 HW as executed new instructions are NOPs there.
Functional testing can be done through the Intel® Software Development
Emulator (Intel® SDE).

There will be more patches to enable remaining intrinsics, to support CET
in the compiler libraries (exception handling) and in glibc.

The patch adds 1) new options to control the technology, 2) three new
instructions (endbr, rdssp, inssp) and intrinsics, 3) a pass to generate
endbr instruction, 4) new 'notrack' attribute for functions and pointers
to function and code generation for it, 5) shadow stack processing in
setjmp/longjmp builtins.

Basic functional tests are added. Bootstrap is done successfully w/o and
w/ CET option (-mcet).

gcc/
    * builtins.c
    (expand_builtin_setjmp_setup): Add saving shadow stack pointer
   in jmpbuf using rdssp instruction.
    (expand_builtin_longjmp): Add adjusting shadow stack pointer
   using incssp instruction.
    * c-family/c-attribs.c
    (handle_notrack_attribute): New function.
    (c_common_attribute_table): Add a 'notrack' attribute.
    * calls.c
    (emit_call_1): Set REG_CALL_NOTRACK on call insn.
    (flags_from_decl_or_type): Retrieve notrack attribute from a 
decl.
    (expand_call): Retrieve notrack attribute from a decl.
    * combine.c: Handle REG_CALL_NOTRACK.
    * common/config/i386/i386-common.c
    (OPTION_MASK_ISA_CET_SET, OPTION_MASK_ISA_CET_UNSET): New.
    (ix86_handle_option): Handle OPT_mcet.
    * config.gcc: Add cetintrin.h.
    * config/i386/cetintrin.h: New file.
    * config/i386/cpuid.h: (bit_CET) new bit.
    * config/i386/driver-i386.c
    (host_detect_local_cpu): Detect cet.
    * config/i386/i386-builtin.def
    (__builtin_ia32_rdsspd, __builtin_ia32_rdsspq,
    __builtin_ia32_incsspd, __builtin_ia32_incsspd): New intrinsics.
    * config/i386/i386-c.c
    (ix86_target_macros_internal): Define __CET__.
    * config/i386/i386-protos.h (ix86_notrack_prefixed_insn_p): New.
    * config/i386/i386.c
    (ix86_target_string): Add -mcet.
    (ix86_valid_target_attribute_inner_p): Add cet.
    (ix86_print_operand): Output notrack.
    BDESC_VERIFYS for CET intrinsics.
    (ix86_init_mmx_sse_builtins): Define CET intrinsics.
    (x86_output_mi_thunk): Add endbr instruction.
    (ix86_notrack_prefixed_insn_p): New function.
    * config/i386/i386.h
    (TARGET_CET, TARGET_CET_P): New.
    * config/i386/i386.md
    (define_insn "rdssp"): New instruction.
    (define_insn "incssp"): Likewise.
    (define_insn "nop_endbr"): Likewise.
    * config/i386/i386.opt
    (mcet, mcet-switch, mcet-indbranch-tracking,
    mcet-shadow-stack): New options.
    * config/i386/immintrin.h
    Add include .
    * final.c
    (rest_of_handle_cet): New.
    (pass_data_handle_cet): New.
    (pass_handle_cet): New.
    (make_pass_handle_cet): New.
    * passes.def: (pass_handle_cet) Add pass.
    * reg-notes.def: (CALL_NOTRACK) New note for notrack.
    * timevar.def: (TV_CET) New.
    * tree-core.h: (ECF_NOTRACK) New.
    * tree-pass.h: (make_pass_handle_cet) New.

gcc/testsuite/

    * gcc.target/i386/cet-intrin.c: New test.
    * gcc.target/i386/cet-label.c: Likewise.
    * gcc.target/i386/cet-notrack.c: Likewise.
    * gcc.target/i386/cet-sjlj.c: Likewise.
    * gcc.target/i386/cet-switch-1.c: Likewise.
    * gcc.target/i386/cet-switch-2.c: Likewise.
---



0001-Enable-Control-flow-Enforcement-Technology-CET.patch
Description: 0001-Enable-Control-flow-Enforcement-Technology-CET.patch


[committed] Fix -Werror=class-memaccess failures in jit testsuite (PR jit/81144)

2017-06-20 Thread David Malcolm
This patch fixes a couple of failures of the form:

  error: 'void* memset(void*, int, size_t)' clearing an object of non-trivial
type 'struct quadratic_test'; use assignment or value-initialization
instead [-Werror=class-memaccess]
  note: 'struct quadratic_test' declared here
  cc1plus: all warnings being treated as errors

seen within the jit testsuite, by using zero-initialization instead
of memset.

(presumably introduced by r249234 aka a324786b4ded9047d05463b4bce9d238b6c6b3ef)

Successfully tested on x86_64-pc-linux-gnu; takes jit.sum from:
  # of expected passes            9211
  # of unexpected failures        2
to:
  # of expected passes            9349

Martin: it's unclear to me what the benefit of the warning is for these
cases.  AIUI, it's complaining because the code is calling
the default ctor for struct quadratic_test, and then that object is
being clobbered by the memset.
But if I'm reading things right, the default ctor for this struct
zero-initializes all fields.  Can't the compiler simply optimize away
the redundant memset, and not issue a warning?

gcc/testsuite/ChangeLog:
PR jit/81144
* jit.dg/test-operator-overloading.cc (make_test_quadratic): Replace
memset call with zero-initialization.
* jit.dg/test-quadratic.cc (make_test_quadratic): Likewise.
---
 gcc/testsuite/jit.dg/test-operator-overloading.cc | 3 +--
 gcc/testsuite/jit.dg/test-quadratic.cc| 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/jit.dg/test-operator-overloading.cc 
b/gcc/testsuite/jit.dg/test-operator-overloading.cc
index cbb1e98..f57b3fc 100644
--- a/gcc/testsuite/jit.dg/test-operator-overloading.cc
+++ b/gcc/testsuite/jit.dg/test-operator-overloading.cc
@@ -272,8 +272,7 @@ make_test_quadratic (quadratic_test &testcase)
 void
 create_code (gcc_jit_context *ctxt, void *user_data)
 {
-  struct quadratic_test testcase;
-  memset (&testcase, 0, sizeof (testcase));
+  struct quadratic_test testcase = {};
   testcase.ctxt = ctxt;
   make_types (testcase);
   make_sqrt (testcase);
diff --git a/gcc/testsuite/jit.dg/test-quadratic.cc 
b/gcc/testsuite/jit.dg/test-quadratic.cc
index f347669..61b5cdd 100644
--- a/gcc/testsuite/jit.dg/test-quadratic.cc
+++ b/gcc/testsuite/jit.dg/test-quadratic.cc
@@ -328,8 +328,7 @@ make_test_quadratic (quadratic_test &testcase)
 void
 create_code (gcc_jit_context *ctxt, void *user_data)
 {
-  struct quadratic_test testcase;
-  memset (&testcase, 0, sizeof (testcase));
+  struct quadratic_test testcase = {};
   testcase.ctxt = ctxt;
   make_types (testcase);
   make_sqrt (testcase);
-- 
1.8.5.3



Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-20 Thread Jeff Law
On 06/20/2017 02:37 PM, Eric Botcazou wrote:
>> But then valgrind won't be able to find bugs in the code (storing and later
>> reading stuff into the volatile parts of the stack that could be overwritten
>> by any asynchronous signal).  GCC had various bugs in this area and
>> valgrind has been able to report those.  Unless the probe instruction is
>> sufficiently magic that it won't usually appear in other code.
> 
> Right, maybe this magic aspect was the reason why it was initially 
> implemented 
> like that for Cygwin, at least you know that orl $0 is meant to be special.
> 
>> Only checking loads below the stack is not sufficient, some buggy code could
>> e.g. store some data below stack pointer (below red zone if any), then
>> subtract stack and then try to read it, etc.
>>
>> Not to mention that it isn't just false positive messages with current
>> valgrind on -fstack-check code, e.g. on ppc64 it just crashes.
> 
> The reasoning seems weird though since, apart from x86/x86-64, you're going 
> to 
> gratuitously inflict this painful "moving sp" thing to every program compiled 
> on Linux because of just one tool that you can adapt.
I don't see MOVING_SP as painful, except perhaps on aarch64.  On
something like PPC MOVING_SP turns out to be exceedingly clean.

jeff


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-20 Thread Eric Botcazou
> But then valgrind won't be able to find bugs in the code (storing and later
> reading stuff into the volatile parts of the stack that could be overwritten
> by any asynchronous signal).  GCC had various bugs in this area and
> valgrind has been able to report those.  Unless the probe instruction is
> sufficiently magic that it won't usually appear in other code.

Right, maybe this magic aspect was the reason why it was initially implemented 
like that for Cygwin, at least you know that orl $0 is meant to be special.

> Only checking loads below the stack is not sufficient, some buggy code could
> e.g. store some data below stack pointer (below red zone if any), then
> subtract stack and then try to read it, etc.
> 
> Not to mention that it isn't just false positive messages with current
> valgrind on -fstack-check code, e.g. on ppc64 it just crashes.

The reasoning seems weird though since, apart from x86/x86-64, you're going to 
gratuitously inflict this painful "moving sp" thing to every program compiled 
on Linux because of just one tool that you can adapt.

-- 
Eric Botcazou


[PATCH, VAX] Correct ffs instruction constraint

2017-06-20 Thread coypu
VAX's FFS, a variable-length bit field instruction, uses a "base"
operand of type "vb", meaning "byte address".
"base" can be 32 bits (SI) and due to the definition of
ffssi2/__builtin_ffs() with the operand constraint "m", code can be
emitted which incorrectly implies a mode-dependent (= longword, for
the 32-bit operand) address.
File scsipi_base.c compiled with -Os for our VAX install kernel shows:

ffs $0x0,$0x20,0x50(r11)[r0],r9

Apparently, 0x50(r11)[r0] as a longword address is assumed to be
evaluated in longword context by FFS, but the instruction expects a
byte address.

Our fix is to change the operand constraint from "m" to "Q", i. e.
"operand is a MEM that does not have a mode-dependent address", which
results in:

moval 0x50(r11)[r0],r1
ffs $0x0,$0x20,(r1),r9

MOVAL evaluates the source operand/address in longword context, so
effectively converts the word address to a byte address for FFS.

See NetBSD PR port-vax/51761 (http://gnats.netbsd.org/51761) and
discussion on port-vax mailing list
(http://mail-index.netbsd.org/port-vax/2017/01/06/msg002954.html).

ChangeLog:

2017-06-20  Maya Rashish  

* gcc/config/vax/builtins.md: Correct ffssi2_internal
instruction constraint.


---
 gcc/config/vax/builtins.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/vax/builtins.md b/gcc/config/vax/builtins.md
index fb0f69acb..b78fb5616 100644
--- a/gcc/config/vax/builtins.md
+++ b/gcc/config/vax/builtins.md
@@ -41,7 +41,7 @@
 
 (define_insn "ffssi2_internal"
   [(set (match_operand:SI 0 "nonimmediate_operand" "=rQ")
-   (ffs:SI (match_operand:SI 1 "general_operand" "nrmT")))
+   (ffs:SI (match_operand:SI 1 "general_operand" "nrQT")))
(set (cc0) (match_dup 0))]
   ""
   "ffs $0,$32,%1,%0")
-- 
2.13.1



Re: [PATCH] [SPARC] Add a workaround for the LEON3FT store-store errata

2017-06-20 Thread David Miller
From: Eric Botcazou 
Date: Tue, 20 Jun 2017 21:19:37 +0200

>> I'm fine with this change.
> 
> I disagree, the existing policy is to avoid switches like -mfix-b2bst and use 
> -mfix-<cpu> where <cpu> is a CPU (here <cpu> could be ut699e or ut700).

Ok, I was not aware of that policy.  But this should be easy for the
submitter to fix.


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-20 Thread Jakub Jelinek
On Tue, Jun 20, 2017 at 10:21:14AM +0200, Eric Botcazou wrote:
> > Out of curiosity, does the old Alpha/VMS stack-checking API meet the
> > requirements?  From what I recall, I think it does.
> 
> No, it's the usual probe-first-and-then-allocate strategy and Jeff rejects it 
> because of valgrind.  I'd personally rather change valgrind but...

But then valgrind won't be able to find bugs in the code (storing and later
reading stuff into the volatile parts of the stack that could be overwritten
by any asynchronous signal).  GCC had various bugs in this area and valgrind
has been able to report those.  Unless the probe instruction is sufficiently
magic that it won't usually appear in other code.

Only checking loads below the stack is not sufficient, some buggy code could
e.g. store some data below stack pointer (below red zone if any), then
subtract stack and then try to read it, etc.

Not to mention that it isn't just false positive messages with current
valgrind on -fstack-check code, e.g. on ppc64 it just crashes.

Jakub


[PATCH, alpha, go]: Introduce applyRelocationsALPHA

2017-06-20 Thread Uros Bizjak
This patch introduces applyRelocationsALPHA to solve:

FAIL: TestCgoConsistentResults
FAIL: TestCgoPkgConfig
FAIL: TestCgoHandlesWlORIGIN

gotools errors.

Bootstrapped and regression tested on alphaev68-linux-gnu.

Uros.
Index: go/debug/elf/file.go
===
--- go/debug/elf/file.go(revision 249418)
+++ go/debug/elf/file.go(working copy)
@@ -602,6 +602,8 @@
return f.applyRelocationss390x(dst, rels)
case f.Class == ELFCLASS64 && f.Machine == EM_SPARCV9:
return f.applyRelocationsSPARC64(dst, rels)
+   case f.Class == ELFCLASS64 && f.Machine == EM_ALPHA:
+   return f.applyRelocationsALPHA(dst, rels)
default:
return errors.New("applyRelocations: not implemented")
}
@@ -1049,6 +1051,55 @@
return nil
 }
 
+func (f *File) applyRelocationsALPHA(dst []byte, rels []byte) error {
+   // 24 is the size of Rela64.
+   if len(rels)%24 != 0 {
+   return errors.New("length of relocation section is not a multiple of 24")
+   }
+
+   symbols, _, err := f.getSymbols(SHT_SYMTAB)
+   if err != nil {
+   return err
+   }
+
+   b := bytes.NewReader(rels)
+   var rela Rela64
+
+   for b.Len() > 0 {
+   binary.Read(b, f.ByteOrder, &rela)
+   symNo := rela.Info >> 32
+   t := R_ALPHA(rela.Info & 0xffffffff)
+
+   if symNo == 0 || symNo > uint64(len(symbols)) {
+   continue
+   }
+   sym := &symbols[symNo-1]
+   if SymType(sym.Info&0xf) != STT_SECTION {
+   // We don't handle non-section relocations for now.
+   continue
+   }
+
+   // There are relocations, so this must be a normal
+   // object file, and we only look at section symbols,
+   // so we assume that the symbol value is 0.
+
+   switch t {
+   case R_ALPHA_REFQUAD:
+   if rela.Off+8 >= uint64(len(dst)) || rela.Addend < 0 {
+   continue
+   }
+   f.ByteOrder.PutUint64(dst[rela.Off:rela.Off+8], uint64(rela.Addend))
+   case R_ALPHA_REFLONG:
+   if rela.Off+4 >= uint64(len(dst)) || rela.Addend < 0 {
+   continue
+   }
+   f.ByteOrder.PutUint32(dst[rela.Off:rela.Off+4], uint32(rela.Addend))
+   }
+   }
+
+   return nil
+}
+
 func (f *File) DWARF() (*dwarf.Data, error) {
// sectionData gets the data for s, checks its size, and
// applies any applicable relations.


Re: C++ PATCH for c++/81073, constexpr and static var in statement-expression

2017-06-20 Thread Andreas Schwab
On Jun 20 2017, Jason Merrill  wrote:

> On Tue, Jun 20, 2017 at 5:40 AM, Andreas Schwab  wrote:
>> FAIL: g++.dg/cpp0x/constexpr-cast.C  -std=c++11  (test for errors, line 10)
>> FAIL: g++.dg/cpp0x/constexpr-cast.C  -std=c++11 (test for excess errors)
>> FAIL: g++.dg/cpp0x/constexpr-cast.C  -std=c++14  (test for errors, line 10)
>> FAIL: g++.dg/cpp0x/constexpr-cast.C  -std=c++14 (test for excess errors)
>
> I'm not seeing this.  Can you give more detail?

http://gcc.gnu.org/ml/gcc-testresults/2017-06/msg02172.html

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


[PING^2] Re: [PATCH] c/c++: Add fix-it hints for suggested missing #includes

2017-06-20 Thread David Malcolm
Ping re:

  https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00321.html


On Fri, 2017-05-26 at 15:54 -0400, David Malcolm wrote:
> Ping:
>   https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00321.html
> 
> On Thu, 2017-05-04 at 12:36 -0400, David Malcolm wrote:
> > As of r247522, fix-it-hints can suggest the insertion of new lines.
> > 
> > This patch uses this to implement a new "maybe_add_include_fixit"
> > function in c-common.c and uses it in the two places where the C
> > and
> > C++
> > frontend can suggest missing #include directives. [1]
> > 
> > The idea is that the user can then click on the fix-it in an IDE
> > and have it add the #include for them (or use -fdiagnostics
> > -generate
> > -patch).
> > 
> > Examples can be seen in the test cases.
> > 
> > The function attempts to put the #include in a reasonable place:
> > immediately after the last #include within the file, or at the
> > top of the file.  It is idempotent, so -fdiagnostics-generate-patch
> > does the right thing if several such diagnostics are emitted.
> > 
> > Successfully bootstrapped on x86_64-pc-linux-gnu.
> > 
> > OK for trunk?
> > 
> > [1] I'm working on a followup which tweaks another diagnostic so
> > that
> > it
> > can suggest that a #include was missing, so I'll use it there as
> > well.
> > 
> > gcc/c-family/ChangeLog:
> > * c-common.c (try_to_locate_new_include_insertion_point): New
> > function.
> > (per_file_includes_t): New typedef.
> > (added_includes_t): New typedef.
> > (added_includes): New variable.
> > (maybe_add_include_fixit): New function.
> > * c-common.h (maybe_add_include_fixit): New decl.
> > 
> > gcc/c/ChangeLog:
> > * c-decl.c (implicitly_declare): When suggesting a missing
> > #include, provide a fix-it hint.
> > 
> > gcc/cp/ChangeLog:
> > * name-lookup.c (get_std_name_hint): Add '<' and '>' around
> > the header names.
> > (maybe_suggest_missing_header): Update for addition of '<' and
> > '>'
> > to above.  Provide a fix-it hint.
> > 
> > gcc/testsuite/ChangeLog:
> > * g++.dg/lookup/missing-std-include-2.C: New text case.
> > * gcc.dg/missing-header-fixit-1.c: New test case.
> > ---
> >  gcc/c-family/c-common.c| 117
> > +
> >  gcc/c-family/c-common.h|   2 +
> >  gcc/c/c-decl.c |  10 +-
> >  gcc/cp/name-lookup.c   |  94 +
> > --
> > --
> >  .../g++.dg/lookup/missing-std-include-2.C  |  55
> > ++
> >  gcc/testsuite/gcc.dg/missing-header-fixit-1.c  |  36 +++
> >  6 files changed, 267 insertions(+), 47 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.dg/lookup/missing-std-include
> > -2.C
> >  create mode 100644 gcc/testsuite/gcc.dg/missing-header-fixit-1.c
> > 
> > diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
> > index 0884922..19f7e60 100644
> > --- a/gcc/c-family/c-common.c
> > +++ b/gcc/c-family/c-common.c
> > @@ -7983,4 +7983,121 @@ c_flt_eval_method (bool maybe_c11_only_p)
> >  return c_ts18661_flt_eval_method ();
> >  }
> >  
> > +/* Attempt to locate a suitable location within FILE for a
> > +   #include directive to be inserted before.  FILE should
> > +   be a string from libcpp (pointer equality is used).
> > +
> > +   Attempt to return the location within FILE immediately
> > +   after the last #include within that file, or the start of
> > +   that file if it has no #include directives.
> > +
> > +   Return UNKNOWN_LOCATION if no suitable location is found,
> > +   or if an error occurs.  */
> > +
> > +static location_t
> > +try_to_locate_new_include_insertion_point (const char *file)
> > +{
> > +  /* Locate the last ordinary map within FILE that ended with a
> > #include.  */
> > +  const line_map_ordinary *last_include_ord_map = NULL;
> > +
> > +  /* ...and the next ordinary map within FILE after that one.  */
> > +  const line_map_ordinary *last_ord_map_after_include = NULL;
> > +
> > +  /* ...and the first ordinary map within FILE.  */
> > +  const line_map_ordinary *first_ord_map_in_file = NULL;
> > +
> > +  for (unsigned int i = 0; i < LINEMAPS_ORDINARY_USED
> > (line_table);
> > i++)
> > +{
> > +  const line_map_ordinary *ord_map
> > +   = LINEMAPS_ORDINARY_MAP_AT (line_table, i);
> > +
> > +  const line_map_ordinary *from = INCLUDED_FROM (line_table,
> > ord_map);
> > +  if (from)
> > +   if (from->to_file == file)
> > + {
> > +   last_include_ord_map = from;
> > +   last_ord_map_after_include = NULL;
> > + }
> > +
> > +  if (ord_map->to_file == file)
> > +   {
> > + if (!first_ord_map_in_file)
> > +   first_ord_map_in_file = ord_map;
> > + if (last_include_ord_map && !last_ord_map_after_include)
> > +   last_ord_map_after_include = ord_map;
> > +   }
> > +}
> > +
> > +  /* Determine where to insert the #include.  */
> > +  const line_map_ordinary *ord_map_for_insertion;

Re: [PATCH] [SPARC] Add a workaround for the LEON3FT store-store errata

2017-06-20 Thread Eric Botcazou
> I'm fine with this change.

I disagree, the existing policy is to avoid switches like -mfix-b2bst and use 
-mfix-<cpu> where <cpu> is a CPU (here <cpu> could be ut699e or ut700).

-- 
Eric Botcazou


Re: [PING] C++ Re: [PATCH] C/C++: fix quoting of "aka" typedef information (PR 62170)

2017-06-20 Thread Jason Merrill
On Tue, Jun 20, 2017 at 3:06 PM, David Malcolm  wrote:
> It's not clear to me what the issue alluded to with negative
> obstack_blank is, but I chose to follow the above docs and use
> obstack_blank_fast; am testing an updated patch in which the above line
> now looks like:
>
>   obstack_blank_fast (ob, -(type_start + type_len));
>
> Is the patch OK with that change? (assuming bootstrap
> pass), or should I re-post?

OK with that change.

> On a related matter, this patch conflicts with Volker's patch here:
>
>   https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01576.html
>
> in which he removes the trailing "{enum}" info (and hence all of our
> changes to the testsuite conflict between the two patches...)
>
> Do you have any thoughts on that other patch? [Ccing Volker]

That patch makes sense to me; I prefer "enum E" to "E {enum}".

Jason


Re: [PING] C++ Re: [PATCH] C/C++: fix quoting of "aka" typedef information (PR 62170)

2017-06-20 Thread David Malcolm
On Tue, 2017-06-20 at 14:01 -0400, Jason Merrill wrote:
> On Tue, Jun 20, 2017 at 1:58 PM, Jason Merrill 
> wrote:
> > On Tue, Jun 20, 2017 at 11:50 AM, David Malcolm <
> > dmalc...@redhat.com> wrote:
> > > > +   ob->next_free = p + type_start + type_len;
> > 
> > I'm uncomfortable with modifying the obstack directly.  Why not use
> > obstack_free?
> 
> ...because you aren't freeing the object, but shrinking it.  So
> obstack_blank is a better choice.

Thanks.

As of r229987 ("Copy gnulib obstack files", aka
1ed1385ecb1c11d6915adac74afa2ff7da8be5d1),

libiberty/obstacks.texi says:
> @cindex shrinking objects
> You can use @code{obstack_blank_fast} with a ``negative'' size
> argument to make the current object smaller.  Just don't try to
> shrink it beyond zero length---there's no telling what will happen 
> if you do that.  Earlier versions of obstacks allowed you to use
> @code{obstack_blank} to shrink objects.  This will no longer work.

It's not clear to me what the issue alluded to with negative
obstack_blank is, but I chose to follow the above docs and use
obstack_blank_fast; am testing an updated patch in which the above line
now looks like:

  obstack_blank_fast (ob, -(type_start + type_len));

Is the patch OK with that change? (assuming bootstrap
pass), or should I re-post?

On a related matter, this patch conflicts with Volker's patch here:

  https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01576.html

in which he removes the trailing "{enum}" info (and hence all of our
changes to the testsuite conflict between the two patches...)

Do you have any thoughts on that other patch? [Ccing Volker]

Thanks
Dave


[PATCH, testsuite]: Fix gcc.target/i386/pr80732.c execution test failure

2017-06-20 Thread Uros Bizjak
2017-06-20  Uros Bizjak  

* gcc.target/i386/pr80732.c: Include fma4-check.h.
(main): Renamed to ...
(fma4_test): ... this.

Tested on x86_64-linux-gnu and committed to mainline SVN.

Uros.
Index: gcc.target/i386/pr80732.c
===
--- gcc.target/i386/pr80732.c   (revision 249418)
+++ gcc.target/i386/pr80732.c   (working copy)
@@ -6,6 +6,8 @@
 /* { dg-require-effective-target fpic } */
 /* { dg-require-effective-target pie } */
 
+#include "fma4-check.h"
+
 #include 
 
 __attribute__((target_clones("default","fma"),noinline,optimize("fast-math")))
@@ -51,7 +53,8 @@
 
 double (*initializer) (double, double, double) = {  };
 
-int main()
+static void
+fma4_test (void)
 {
 char buffer[256];
 const char *expectation = "4.93038e-32, 4.93038e-32, 4.93038e-32";
@@ -87,6 +90,4 @@
 __builtin_sprintf(buffer, "%g, %g, %g", initializer (a, b, c), v2_2, v2_3);
 if (__builtin_strcmp (buffer, expectation) != 0)
   __builtin_abort ();
-
-return 0;
 }


Re: [PATCH, AArch64] Add x86 intrinsic headers to GCC AArch64 taget

2017-06-20 Thread Steven Munroe
On Tue, 2017-06-20 at 09:04 +, Hurugalawadi, Naveen wrote:
> Hi Joseph,
> 
> Thanks for your review and valuable comments on this issue.
> 
> Please find attached the patch that merges x86-intrinsics for AArch64 and PPC
> architectures.
> 
> >> it would seem to me to be a bad idea to duplicate the 
> >> implementation for more and more architectures.
> Merged the implementation for AArch64 and PPC architectures.
> 
> The testcase have not been merged yet. Will do it after checking out
> the comments on the current idea of implementation.
> 
> Please check the patch and let me know the comments.
> 
> Bootstrapped and Regression tested on aarch64-thunder-linux and PPC.
> 
I am not sure this works or is even a good idea.

As an accident, bmiintrin.h can be implemented as C code or common
builtins. But bmi2intrin.h depends on __builtin_bpermd, which to my
knowledge is PowerISA only.

As I work on mmx, sse, sse2, etc. it gets more complicated. There are
many X86 intrinsic instances that require altivec.h-unique intrinsics to
implement efficiently for the powerpc64le target, and some inline __asm.

Net, the current sample size so far is too small to make a reasonable
assessment.

And as you can see below, the gcc.target tests have to be duplicated
anyway. Even if the C code is common, there will be many differences in
dg-options and dg-require-effective-target. Trying to common these
implementations only creates more small files to manage.

> Thanks,
> Naveen
> 
> 2017-06-20  Naveen H.S  
> 
> [gcc]
>   * config.gcc (aarch64*-*-*): Add bmi2intrin.h, bmiintrin.h,
>   adxintrin.h and x86intrin.h in Config folder.
>   (powerpc*-*-*): Move bmi2intrin.h, bmiintrin.h and x86intrin.h into
>   Config folder.
>   * config/adxintrin.h: New file.
>   * config/bmi2intrin.h: New file.
>   * config/bmiintrin.h: New file.
>   * config/x86intrin.h: New file.
>   * config/rs6000/bmi2intrin.h: Delete file.
>   * config/rs6000/bmiintrin.h: Likewise.
>   * config/rs6000/x86intrin.h: Likewise.
> 
> [gcc/testsuite]
> 
>   * gcc.target/aarch64/adx-addcarryx32-1.c: New file.
>   * gcc.target/aarch64/adx-addcarryx32-2.c: New file.
>   * gcc.target/aarch64/adx-addcarryx32-3.c: New file.
>   * gcc.target/aarch64/adx-addcarryx64-1.c: New file.
>   * gcc.target/aarch64/adx-addcarryx64-2.c: New file
>   * gcc.target/aarch64/adx-addcarryx64-3.c: New file
>   * gcc.target/aarch64/adx-check.h: New file
>   * gcc.target/aarch64/bmi-andn-1.c: New file
>   * gcc.target/aarch64/bmi-andn-2.c: New file.
>   * gcc.target/aarch64/bmi-bextr-1.c: New file.
>   * gcc.target/aarch64/bmi-bextr-2.c: New file.
>   * gcc.target/aarch64/bmi-bextr-4.c: New file.
>   * gcc.target/aarch64/bmi-bextr-5.c: New file.
>   * gcc.target/aarch64/bmi-blsi-1.c: New file.
>   * gcc.target/aarch64/bmi-blsi-2.c: New file.
>   * gcc.target/aarch64/bmi-blsmsk-1.c: new file.
>   * gcc.target/aarch64/bmi-blsmsk-2.c: New file.
>   * gcc.target/aarch64/bmi-blsr-1.c: New file.
>   * gcc.target/aarch64/bmi-blsr-2.c: New File.
>   * gcc.target/aarch64/bmi-check.h: New File.
>   * gcc.target/aarch64/bmi-tzcnt-1.c: new file.
>   * gcc.target/aarch64/bmi-tzcnt-2.c: New file.
>   * gcc.target/aarch64/bmi2-bzhi32-1.c: New file.
>   * gcc.target/aarch64/bmi2-bzhi64-1.c: New file.
>   * gcc.target/aarch64/bmi2-bzhi64-1a.c: New file.
>   * gcc.target/aarch64/bmi2-check.h: New file.
>   * gcc.target/aarch64/bmi2-mulx32-1.c: New file.
>   * gcc.target/aarch64/bmi2-mulx32-2.c: New file.
>   * gcc.target/aarch64/bmi2-mulx64-1.c: New file.
>   * gcc.target/aarch64/bmi2-mulx64-2.c: New file.
>   * gcc.target/aarch64/bmi2-pdep32-1.c: New file.
>   * gcc.target/aarch64/bmi2-pdep64-1.c: New file.
>   * gcc.target/aarch64/bmi2-pext32-1.c: New File.
>   * gcc.target/aarch64/bmi2-pext64-1.c: New file.
>   * gcc.target/aarch64/bmi2-pext64-1a.c: New File.




Re: [PATCH] Fix UB in ira-costs.c (find_costs_and_classes)

2017-06-20 Thread Vladimir Makarov



On 06/20/2017 03:27 AM, Jakub Jelinek wrote:

Hi!

bootstrap-ubsan revealed many
../../gcc/ira-costs.c:1747:20: runtime error: member access within null pointer 
of type 'cost_classes *[107]'
issues.  The problem is that cost_classes_ptr is sometimes NULL, but
in those cases we have early exit:
   if (! allocno_p)
 {
   if (regno_reg_rtx[i] == NULL_RTX)
 continue;  // <- HERE
   memcpy (temp_costs, COSTS (costs, i), struct_costs_size);
   i_mem_cost = temp_costs->mem_cost;
 }
   else
 {
   if (ira_regno_allocno_map[i] == NULL)
 continue;  // <- or HERE
...
 }
Still, evaluating cost_classes_ptr->classes, where classes is an array, is UB
when cost_classes_ptr is NULL, so this patch moves the initialization after
the if (...) continue; in both branches (because it is needed both later in
the else branch and after the whole if).

Bootstrapped/regtested on x86_64-linux and i686-linux (with
bootstrap-ubsan), ok for trunk?

Sure.

Jakub, thank you for addressing the issue.

2017-06-20  Jakub Jelinek  

* ira-costs.c (find_costs_and_classes): Initialize cost_classes later
to make sure not to dereference a NULL cost_classes_ptr pointer.





Re: C++ PATCH for c++/81073, constexpr and static var in statement-expression

2017-06-20 Thread Jason Merrill
On Tue, Jun 20, 2017 at 5:40 AM, Andreas Schwab  wrote:
> FAIL: g++.dg/cpp0x/constexpr-cast.C  -std=c++11  (test for errors, line 10)
> FAIL: g++.dg/cpp0x/constexpr-cast.C  -std=c++11 (test for excess errors)
> FAIL: g++.dg/cpp0x/constexpr-cast.C  -std=c++14  (test for errors, line 10)
> FAIL: g++.dg/cpp0x/constexpr-cast.C  -std=c++14 (test for excess errors)

I'm not seeing this.  Can you give more detail?

Jason


Re: [PATCH v2] C++: Add fix-it hints for -Wold-style-cast

2017-06-20 Thread Jason Merrill
On Wed, May 3, 2017 at 9:51 AM, David Malcolm  wrote:
> On Thu, 2017-04-27 at 23:03 +0200, Marek Polacek wrote:
>> On Thu, Apr 27, 2017 at 05:10:24PM -0400, David Malcolm wrote:
>> > +  /* First try const_cast.  */
>> > +  trial = build_const_cast (dst_type, orig_expr, 0 /* complain
>> > */);
>> > +  if (trial != error_mark_node)
>> > +return "const_cast";
>> > +
>> > +  /* If that fails, try static_cast.  */
>> > +  trial = build_static_cast (dst_type, orig_expr, 0 /* complain
>> > */);
>> > +  if (trial != error_mark_node)
>> > +return "static_cast";
>> > +
>> > +  /* Finally, try reinterpret_cast.  */
>> > +  trial = build_reinterpret_cast (dst_type, orig_expr, 0 /*
>> > complain */);
>> > +  if (trial != error_mark_node)
>> > +return "reinterpret_cast";
>>
>> I think you'll want tf_none instead of 0 /* complain */ in these.
>>
>>   Marek
>
> Thanks.
>
> Here's an updated version of the patch.
>
> Changes since v1:
> - updated expected fixit-formatting (the new fix-it printer in
>   r247548 handles this properly now)
> - added new test cases as suggested by Florian
> - use "tf_none" rather than "0 /* complain */"
>
> Successfully bootstrapped on x86_64-pc-linux-gnu.
>
> OK for trunk?

OK.

Jason


Re: [PATCH][X86] Fix rounding pattern similar to PR73350

2017-06-20 Thread Kirill Yukhin
Hello Julia, Uroš,
On 16 Jun 09:05, Uros Bizjak wrote:
> On Fri, Jun 16, 2017 at 8:46 AM, Koval, Julia  wrote:
> > Hi,
> >
> > This test hangs on avx512er, maybe that's why:
> >> According to POSIX, the behavior of a process is undefined after it 
> >> ignores a SIGFPE, SIGILL, or SIGSEGV signal that was not generated by 
> >> kill(2) or raise(3).
> >
> > And volatile makes it work even without a patch (r1 and r2 are not
> > combined then).
> >
> > Added other changes.
> 
> The testcase LGTM. I'll leave the final approval to Kirill.
The change and the case are fine to me. I've committed it to main trunk.
> 
> Uros.

--
Thanks, K


Re: [Patch AArch64] Add rcpc extension

2017-06-20 Thread Andrew Pinski
On Tue, Jun 20, 2017 at 6:50 AM, James Greenhalgh
 wrote:
>
> Hi,
>
> While GCC doesn't need to know anything about the RcPc extension for code
> generation, we do need to add the extension flag to the string we pass
> to the assembler when we're compiling for a CPU which implements the RcPc
> extension.
>
> I've built a toolchain with this patch applied, and checked that we
> correctly pass +rcpc on to the assembler if we give something like
> -mcpu=generic+rcpc .
>
> OK?


I think you forgot to update the documentation for this option extension.
https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html#g_t-march-and--mcpu-Feature-Modifiers

I suspect there are others missing here too.

Thanks,
Andrew

>
> Thanks,
> James
>
> ---
> 2017-06-20  James Greenhalgh  
>
> * config/aarch64/aarch64-option-extensions.def (rcpc): New.
> * config/aarch64/aarch64.h (AARCH64_FL_RCPC): New.
>


[PATCH/AARCH64 v2] Enable software prefetching (-fprefetch-loop-arrays) for ThunderX 88xxx

2017-06-20 Thread Andrew Pinski
Here is the updated patch based on the new infrastructure which is now included.

OK?  Bootstrapped and tested on aarch64-linux-gnu with no regressions
and tested again on SPEC CPU 2006 on ThunderX T88 with the speed up
mentioned before.

Thanks,
Andrew Pinski

ChangeLog:
* config/aarch64/aarch64-cores.def (thunderxt88p1): Use thunderxt88 tunings.
(thunderxt88): Likewise.

* config/aarch64/aarch64.c (thunderxt88_prefetch_tune): New variable.
(thunderx_prefetch_tune): New variable.
(thunderx2t99_prefetch_tune): Update for the correct values.
(thunderxt88_tunings): New variable.
(thunderx_tunings): Use thunderx_prefetch_tune instead of generic_prefetch_tune.
(thunderx2t99_tunings): Use AUTOPREFETCHER_WEAK.
Index: gcc/config/aarch64/aarch64-cores.def
===
--- gcc/config/aarch64/aarch64-cores.def(revision 249422)
+++ gcc/config/aarch64/aarch64-cores.def(working copy)
@@ -56,8 +56,8 @@ AARCH64_CORE("cortex-a73",  cortexa73, c
 AARCH64_CORE("thunderx",  thunderx,  thunderx,  8A,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  0x43, 
0x0a0, -1)
 /* Do not swap around "thunderxt88p1" and "thunderxt88",
this order is required to handle variant correctly. */
-AARCH64_CORE("thunderxt88p1", thunderxt88p1, thunderx,  8A,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO,thunderx,  
0x43, 0x0a1, 0)
-AARCH64_CORE("thunderxt88",   thunderxt88,   thunderx,  8A,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  0x43, 
0x0a1, -1)
+AARCH64_CORE("thunderxt88p1", thunderxt88p1, thunderx,  8A,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO,thunderxt88,  
0x43, 0x0a1, 0)
+AARCH64_CORE("thunderxt88",   thunderxt88,   thunderx,  8A,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderxt88,  0x43, 
0x0a1, -1)
 AARCH64_CORE("thunderxt81",   thunderxt81,   thunderx,  8A,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  0x43, 
0x0a2, -1)
 AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  8A,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  0x43, 
0x0a3, -1)
 
Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c(revision 249422)
+++ gcc/config/aarch64/aarch64.c(working copy)
@@ -554,12 +554,30 @@ static const cpu_prefetch_tune qdf24xx_p
   3/* default_opt_level  */
 };
 
+static const cpu_prefetch_tune thunderxt88_prefetch_tune =
+{
+  8,   /* num_slots  */
+  32,  /* l1_cache_size  */
+  128, /* l1_cache_line_size  */
+  16*1024, /* l2_cache_size  */
+  3/* default_opt_level  */
+};
+
+static const cpu_prefetch_tune thunderx_prefetch_tune =
+{
+  8,   /* num_slots  */
+  32,  /* l1_cache_size  */
+  128, /* l1_cache_line_size  */
+  -1,  /* l2_cache_size  */
+  -1   /* default_opt_level  */
+};
+
 static const cpu_prefetch_tune thunderx2t99_prefetch_tune =
 {
-  0,   /* num_slots  */
-  -1,  /* l1_cache_size  */
+  8,   /* num_slots  */
+  32,  /* l1_cache_size  */
   64,  /* l1_cache_line_size  */
-  -1,  /* l2_cache_size  */
+  256, /* l2_cache_size  */
   -1   /* default_opt_level  */
 };
 
@@ -745,6 +763,31 @@ static const struct tune_params exynosm1
   _prefetch_tune
 };
 
+static const struct tune_params thunderxt88_tunings =
+{
+  _extra_costs,
+  _addrcost_table,
+  _regmove_cost,
+  _vector_cost,
+  _branch_cost,
+  _approx_modes,
+  6, /* memmov_cost  */
+  2, /* issue_rate  */
+  AARCH64_FUSE_CMP_BRANCH, /* fusible_ops  */
+  8,   /* function_align.  */
+  8,   /* jump_align.  */
+  8,   /* loop_align.  */
+  2,   /* int_reassoc_width.  */
+  4,   /* fp_reassoc_width.  */
+  1,   /* vec_reassoc_width.  */
+  2,   /* min_div_recip_mul_sf.  */
+  2,   /* min_div_recip_mul_df.  */
+  0,   /* max_case_values.  */
+  tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW),/* tune_flags.  */
+  &thunderxt88_prefetch_tune
+};
+
 static const struct tune_params thunderx_tunings =
 {
   _extra_costs,
@@ -767,7 +810,7 @@ static const struct tune_params thunderx
   0,   /* max_case_values.  */
   tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW),/* tune_flags.  */
-  &generic_prefetch_tune
+  &thunderx_prefetch_tune
 };
 
 static const struct tune_params xgene1_tunings =
@@ -841,7 +884,7 @@ static const struct tune_params thunderx
   2,   /* min_div_recip_mul_sf.  */
   2,   /* min_div_recip_mul_df.  */
   0,   /* max_case_values.  */
-  tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model.  */
+ 

Re: [PING] C++ Re: [PATCH] C/C++: fix quoting of "aka" typedef information (PR 62170)

2017-06-20 Thread Jason Merrill
On Tue, Jun 20, 2017 at 1:58 PM, Jason Merrill  wrote:
> On Tue, Jun 20, 2017 at 11:50 AM, David Malcolm  wrote:
>>> +   ob->next_free = p + type_start + type_len;
>
> I'm uncomfortable with modifying the obstack directly.  Why not use
> obstack_free?

...because you aren't freeing the object, but shrinking it.  So
obstack_blank is a better choice.

Jason


Re: [PING] C++ Re: [PATCH] C/C++: fix quoting of "aka" typedef information (PR 62170)

2017-06-20 Thread Jason Merrill
On Tue, Jun 20, 2017 at 11:50 AM, David Malcolm  wrote:
>> +   ob->next_free = p + type_start + type_len;

I'm uncomfortable with modifying the obstack directly.  Why not use
obstack_free?  I guess for that you'd want to change type_start to a
pointer and get it from obstack_next_free.

Jason


C++ PATCH for c++/80972, C++17 ICE with packed class

2017-06-20 Thread Jason Merrill
In C++17 mode we check to make sure that we are initializing directly
from class prvalues rather than copying them, which hit an issue
with packed fields: we create a temporary for binding a
reference to a packed field, and then pass that temporary to the copy
constructor.  This isn't actually a problem, since the packed field
must support trivial copy, so let's just allow it.

Tested x86_64-pc-linux-gnu, applying to trunk and 7.
commit 298b21a24bd9fbfac20a6dca12df2b64655e4f42
Author: Jason Merrill 
Date:   Mon Jun 19 14:44:02 2017 -0400

PR c++/80972 - C++17 ICE with attribute packed.

* call.c (build_over_call): Allow a TARGET_EXPR from reference
binding.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index d1f27dd..b56da35 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -8025,6 +8025,8 @@ build_over_call (struct z_candidate *cand, int flags, 
tsubst_flags_t complain)
 subobject.  */
   if (CHECKING_P && cxx_dialect >= cxx1z)
gcc_assert (TREE_CODE (arg) != TARGET_EXPR
+   /* It's from binding the ref parm to a packed field. */
+   || convs[0]->need_temporary_p
|| seen_error ()
/* See unsafe_copy_elision_p.  */
|| DECL_BASE_CONSTRUCTOR_P (fn));
diff --git a/gcc/testsuite/g++.dg/ext/packed12.C 
b/gcc/testsuite/g++.dg/ext/packed12.C
new file mode 100644
index 000..2ad14de
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/packed12.C
@@ -0,0 +1,6 @@
+// PR c++/80972
+
+struct A { int i; };
+struct B { A a; } __attribute__((packed));
+
+A a = B().a;


[C++ PATCH] remove unused identifier

2017-06-20 Thread Nathan Sidwell
'nelts' was a suspiciously gnu user space identifier to have.  Turns out
we don't use it anywhere, so killed ...


nathan
--
Nathan Sidwell
2017-06-20  Nathan Sidwell  

	* cp-tree.h (CPTI_NELTS_IDENTIFIER): Delete.
	(nelts_identifier): Delete.
	* decl.c (initialize_predefined_identifiers): Remove nelts.

Index: cp-tree.h
===
--- cp-tree.h	(revision 249418)
+++ cp-tree.h	(working copy)
@@ -136,7 +136,6 @@ enum cp_tree_index
 CPTI_DELTA_IDENTIFIER,
 CPTI_IN_CHARGE_IDENTIFIER,
 CPTI_VTT_PARM_IDENTIFIER,
-CPTI_NELTS_IDENTIFIER,
 CPTI_THIS_IDENTIFIER,
 CPTI_PFN_IDENTIFIER,
 CPTI_VPTR_IDENTIFIER,
@@ -234,7 +233,6 @@ extern GTY(()) tree cp_global_trees[CPTI
 /* The name of the parameter that contains a pointer to the VTT to use
for this subobject constructor or destructor.  */
 #define vtt_parm_identifier		cp_global_trees[CPTI_VTT_PARM_IDENTIFIER]
-#define nelts_identifier		cp_global_trees[CPTI_NELTS_IDENTIFIER]
 #define this_identifier			cp_global_trees[CPTI_THIS_IDENTIFIER]
 #define pfn_identifier			cp_global_trees[CPTI_PFN_IDENTIFIER]
 #define vptr_identifier			cp_global_trees[CPTI_VPTR_IDENTIFIER]
Index: decl.c
===
--- decl.c	(revision 249418)
+++ decl.c	(working copy)
@@ -3982,7 +3982,6 @@ initialize_predefined_identifiers (void)
 { "__base_dtor ", _dtor_identifier, 1 },
 { "__deleting_dtor ", _dtor_identifier, 1 },
 { IN_CHARGE_NAME, _charge_identifier, 0 },
-{ "nelts", _identifier, 0 },
 { THIS_NAME, _identifier, 0 },
 { VTABLE_DELTA_NAME, _identifier, 0 },
 { VTABLE_PFN_NAME, _identifier, 0 },


Re: [PATCH] Improved diagnostics for casts and enums

2017-06-20 Thread Martin Sebor

So here's the patch that reverts the special enum handling in
type_to_string and uses %q#T instead of %qT for two casting-related
diagnostics.

Bootstrapped and regtested on x86_64-pc-linux-gnu.

OK for trunk?


The "E {enum}" notation is still on trunk, so it seems that this
patch has never been committed and I can't find approval of it
in the archive.  To make sure it doesn't get forgotten, please
consider this a ping on Volker's behalf:

https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01576.html

Thanks
Martin



As one can see from the testsuite changes, there are several
casting- and conversion-related messages like "invalid conversion from",
"cannot convert", "invalid cast" that still use the simple %qT
form. I'll give it a try to use %q#T there as well and prepare a
separate patch if this really improves the diagnostics.

Regards,
Volker


2017-04-30  Volker Reichelt  

* parser.c (cp_parser_cast_expression): Use %q#T instead of %qT
in old-style cast diagnostic.
* typeck.c (maybe_warn_about_useless_cast): Use %q#T instead of %qT
in useless cast diagnostic.
* error.c (type_to_string): Remove enum special handling.

Index: gcc/cp/parser.c
===
--- gcc/cp/parser.c (revision 247394)
+++ gcc/cp/parser.c (working copy)
@@ -8764,7 +8764,7 @@
  && !VOID_TYPE_P (type)
  && current_lang_name != lang_name_c)
warning (OPT_Wold_style_cast,
-"use of old-style cast to %qT", type);
+"use of old-style cast to %q#T", type);

  /* Only type conversions to integral or enumeration types
 can be used in constant-expressions.  */
Index: gcc/cp/typeck.c
===
--- gcc/cp/typeck.c (revision 247394)
+++ gcc/cp/typeck.c (working copy)
@@ -6631,7 +6631,7 @@
   ? xvalue_p (expr) : lvalue_p (expr))
   && same_type_p (TREE_TYPE (expr), TREE_TYPE (type)))
  || same_type_p (TREE_TYPE (expr), type))
-   warning (OPT_Wuseless_cast, "useless cast to type %qT", type);
+   warning (OPT_Wuseless_cast, "useless cast to type %q#T", type);
 }
 }

Index: gcc/cp/error.c
===
--- gcc/cp/error.c  (revision 247394)
+++ gcc/cp/error.c  (working copy)
@@ -3134,10 +3134,6 @@
   if (len == aka_len && memcmp (p, p+aka_start, len) == 0)
p[len] = '\0';
 }
-
-  if (typ && TYPE_P (typ) && TREE_CODE (typ) == ENUMERAL_TYPE)
-pp_string (cxx_pp, M_(" {enum}"));
-
   return pp_ggc_formatted_text (cxx_pp);
 }

===

2017-04-30  Volker Reichelt  

* g++.dg/cpp1z/direct-enum-init1.C: Revert special enum handling.
* g++.dg/warn/pr12242.C: Likewise.

Index: gcc/testsuite/g++.dg/cpp1z/direct-enum-init1.C
===
--- gcc/testsuite/g++.dg/cpp1z/direct-enum-init1.C  (revision 247394)
+++ gcc/testsuite/g++.dg/cpp1z/direct-enum-init1.C  (working copy)
@@ -17,67 +17,67 @@
 void
 foo ()
 {
-  A a1 { 5 };  // { dg-error "invalid conversion from 'int' to 'A 
{enum}'" }
-  B b1 { 7 };  // { dg-error "invalid conversion from 'int' to 'B {enum}'" 
"" { target c++14_down } }
+  A a1 { 5 };  // { dg-error "invalid conversion from 'int' to 'A'" }
+  B b1 { 7 };  // { dg-error "invalid conversion from 'int' to 'B'" "" 
{ target c++14_down } }
   C c1 { s };
-  D d1 { D(t) };   // { dg-error "invalid cast from type 'T' to type 'D 
{enum}'" }
-  D d2 { t };  // { dg-error "cannot convert 'T' to 'D {enum}' in 
initialization" "" { target c++14_down } }
+  D d1 { D(t) };   // { dg-error "invalid cast from type 'T' to type 'D'" }
+  D d2 { t };  // { dg-error "cannot convert 'T' to 'D' in initialization" 
"" { target c++14_down } }
// { dg-error "invalid cast from type 'T' to type 'D'" 
"" { target c++1z } .-1 }
-  D d3 { 9 };  // { dg-error "cannot convert 'int' to 'D {enum}' in 
initialization" "" { target c++14_down } }
-  D d4 { l };  // { dg-error "cannot convert 'long int' to 'D {enum}' in 
initialization" "" { target c++14_down } }
+  D d3 { 9 };  // { dg-error "cannot convert 'int' to 'D' in initialization" 
"" { target c++14_down } }
+  D d4 { l };  // { dg-error "cannot convert 'long int' to 'D' in 
initialization" "" { target c++14_down } }
   D d5 { D(l) };
-  D d6 { G };  // { dg-error "cannot convert 'A {enum}' to 'D {enum}' in 
initialization" "" { target c++14_down } }
-  E e1 { 5 };  // { dg-error "cannot convert 'int' to 'E {enum}' in 
initialization" "" { target c++14_down } }
-  E e2 { -1 }; // { dg-error 

Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-20 Thread Jeff Law
On 06/20/2017 06:17 AM, Uros Bizjak wrote:
> On Tue, Jun 20, 2017 at 2:13 PM, Florian Weimer  wrote:
>> On 06/20/2017 01:10 PM, Uros Bizjak wrote:
>>
>>>   74,99%  a.out  a.out  [.] test_or
>>>   12,50%  a.out  a.out  [.] test_movb
>>>   12,50%  a.out  a.out  [.] test_movl
>>
>> Could you try notl/notb/negl/negb as well, please?
> 
> These all have the same (long) runtime as test_or.
That would be my expectation -- they (not/neg) are going to be RMW.

So can we agree that moving away from RMW to a simple W-style instruction
for the probe is where we want to go?  Then we can kick around the exact
form of that store.

FWIW, we don't have to store zero -- ultimately we care about the side
effect of triggering the page fault, not the value written.  So we could
just as easily store a register into the probed address to avoid the
codesize cost of encoding an immediate.

I did that in my local s390 patches.  It may not be necessary there, but
it allowed me to avoid thinking too hard about the ISA and get s390
proof of concept code running :-)


Jeff


[PATCH, alpha]: Update libstdc++ baseline_symbols.txt

2017-06-20 Thread Uros Bizjak
2017-06-20  Uros Bizjak  

* config/abi/post/alpha-linux-gnu/baseline_symbols.txt: Update.

Tested on alphaev68-linux-gnu and committed to mainline SVN.

Uros.
Index: config/abi/post/alpha-linux-gnu/baseline_symbols.txt
===
--- config/abi/post/alpha-linux-gnu/baseline_symbols.txt(revision 
249356)
+++ config/abi/post/alpha-linux-gnu/baseline_symbols.txt(working copy)
@@ -444,6 +444,7 @@
 FUNC:_ZNKSt13basic_istreamIwSt11char_traitsIwEE6gcountEv@@GLIBCXX_3.4
 FUNC:_ZNKSt13basic_istreamIwSt11char_traitsIwEE6sentrycvbEv@@GLIBCXX_3.4
 FUNC:_ZNKSt13basic_ostreamIwSt11char_traitsIwEE6sentrycvbEv@@GLIBCXX_3.4
+FUNC:_ZNKSt13random_device13_M_getentropyEv@@GLIBCXX_3.4.25
 FUNC:_ZNKSt13runtime_error4whatEv@@GLIBCXX_3.4
 FUNC:_ZNKSt14basic_ifstreamIcSt11char_traitsIcEE5rdbufEv@@GLIBCXX_3.4
 FUNC:_ZNKSt14basic_ifstreamIcSt11char_traitsIcEE7is_openEv@@GLIBCXX_3.4.5
@@ -1471,6 +1472,7 @@
 FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC1EPKwmRKS1_@@GLIBCXX_3.4
 FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC1ERKS1_@@GLIBCXX_3.4
 FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC1ERKS2_@@GLIBCXX_3.4
+FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC1ERKS2_mRKS1_@@GLIBCXX_3.4.23
 FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC1ERKS2_mm@@GLIBCXX_3.4
 FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC1ERKS2_mmRKS1_@@GLIBCXX_3.4
 
FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC1ESt16initializer_listIwERKS1_@@GLIBCXX_3.4.11
@@ -1484,6 +1486,7 @@
 FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC2EPKwmRKS1_@@GLIBCXX_3.4
 FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC2ERKS1_@@GLIBCXX_3.4
 FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC2ERKS2_@@GLIBCXX_3.4
+FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC2ERKS2_mRKS1_@@GLIBCXX_3.4.23
 FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC2ERKS2_mm@@GLIBCXX_3.4
 FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC2ERKS2_mmRKS1_@@GLIBCXX_3.4
 
FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC2ESt16initializer_listIwERKS1_@@GLIBCXX_3.4.11
@@ -1726,6 +1729,7 @@
 FUNC:_ZNSsC1EPKcmRKSaIcE@@GLIBCXX_3.4
 FUNC:_ZNSsC1ERKSaIcE@@GLIBCXX_3.4
 FUNC:_ZNSsC1ERKSs@@GLIBCXX_3.4
+FUNC:_ZNSsC1ERKSsmRKSaIcE@@GLIBCXX_3.4.23
 FUNC:_ZNSsC1ERKSsmm@@GLIBCXX_3.4
 FUNC:_ZNSsC1ERKSsmmRKSaIcE@@GLIBCXX_3.4
 FUNC:_ZNSsC1ESt16initializer_listIcERKSaIcE@@GLIBCXX_3.4.11
@@ -1739,6 +1743,7 @@
 FUNC:_ZNSsC2EPKcmRKSaIcE@@GLIBCXX_3.4
 FUNC:_ZNSsC2ERKSaIcE@@GLIBCXX_3.4
 FUNC:_ZNSsC2ERKSs@@GLIBCXX_3.4
+FUNC:_ZNSsC2ERKSsmRKSaIcE@@GLIBCXX_3.4.23
 FUNC:_ZNSsC2ERKSsmm@@GLIBCXX_3.4
 FUNC:_ZNSsC2ERKSsmmRKSaIcE@@GLIBCXX_3.4
 FUNC:_ZNSsC2ESt16initializer_listIcERKSaIcE@@GLIBCXX_3.4.11
@@ -2382,6 +2387,7 @@
 FUNC:_ZNSt15_List_node_base9_M_unhookEv@@GLIBCXX_3.4.14
 FUNC:_ZNSt15__exception_ptr13exception_ptr4swapERS0_@@CXXABI_1.3.3
 FUNC:_ZNSt15__exception_ptr13exception_ptrC1EMS0_FvvE@@CXXABI_1.3.3
+FUNC:_ZNSt15__exception_ptr13exception_ptrC1EPv@@CXXABI_1.3.11
 FUNC:_ZNSt15__exception_ptr13exception_ptrC1ERKS0_@@CXXABI_1.3.3
 FUNC:_ZNSt15__exception_ptr13exception_ptrC1Ev@@CXXABI_1.3.3
 FUNC:_ZNSt15__exception_ptr13exception_ptrC2EMS0_FvvE@@CXXABI_1.3.3
@@ -3068,6 +3074,7 @@
 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC1ERKS3_@@GLIBCXX_3.4.21
 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC1ERKS4_@@GLIBCXX_3.4.21
 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC1ERKS4_RKS3_@@GLIBCXX_3.4.21
+FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC1ERKS4_mRKS3_@@GLIBCXX_3.4.23
 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC1ERKS4_mm@@GLIBCXX_3.4.21
 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC1ERKS4_mmRKS3_@@GLIBCXX_3.4.21
 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC1ESt16initializer_listIcERKS3_@@GLIBCXX_3.4.21
@@ -3083,6 +3090,7 @@
 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC2ERKS3_@@GLIBCXX_3.4.21
 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC2ERKS4_@@GLIBCXX_3.4.21
 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC2ERKS4_RKS3_@@GLIBCXX_3.4.21
+FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC2ERKS4_mRKS3_@@GLIBCXX_3.4.23
 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC2ERKS4_mm@@GLIBCXX_3.4.21
 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC2ERKS4_mmRKS3_@@GLIBCXX_3.4.21
 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC2ESt16initializer_listIcERKS3_@@GLIBCXX_3.4.21
@@ -3209,6 +3217,7 @@
 
FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEEC1ERKS3_@@GLIBCXX_3.4.21
 
FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEEC1ERKS4_@@GLIBCXX_3.4.21
 
FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEEC1ERKS4_RKS3_@@GLIBCXX_3.4.21
+FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEEC1ERKS4_mRKS3_@@GLIBCXX_3.4.23
 
FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEEC1ERKS4_mm@@GLIBCXX_3.4.21
 
FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEEC1ERKS4_mmRKS3_@@GLIBCXX_3.4.21
 

[PING] C++ Re: [PATCH] C/C++: fix quoting of "aka" typedef information (PR 62170)

2017-06-20 Thread David Malcolm
Ping re the C++ part of this:

  https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00242.html

Marek approved the C parts; are the C++ parts OK for trunk?

Or can I self-approve this?  (exactly what the boundaries of
my "diagnostic messages" maintainer remit are isn't clear to me).

Thanks
Dave

On Mon, 2017-06-05 at 15:01 -0400, David Malcolm wrote:
> PR 62170 describes a problem with how the quoting in pp_format
> interacts with the "aka" information for typedefs in %qT for
> the C family of frontends, and also now for %qH and %qI in the
> C++ frontend.
> 
> Currently for %qT we print e.g.:
> 
>   ‘Py_ssize_t* {aka int*}’
>^^ colorized as "quote"
> 
> i.e.
>   ‘[START_COLOR]Py_ssize_t* {aka int*}[END_COLOR]’
> 
> when we should print:
> 
>   ‘Py_ssize_t*’ {aka ‘int*’}
>^^^ colorized as "quote"
> 
> i.e.
>   ‘[START_COLOR]Py_ssize_t*[END_COLOR]’ {aka
> ‘[START_COLOR]int*[END_COLOR]’}
> 
> where the opening and closing quote characters and colorization are
> currently added by the 'q' handling within pp_format, adding the
> closing
> quote unconditionally after whatever pp_format_decoder prints for 'T'
> within "%qT".
> 
> This patch fixes the quoting by updating the %T handling in C and C++
> and the %H/%I handling in C++ to insert the quoting appropriately.
> It converts the "quote" param of the pp_format_decoder callback from
> bool to bool *, allowing for the %T and %H/%I handlers to write
> false back to it, to avoid printing the closing quote for the cases
> like the above where a final trailing closing quote isn't needed.
> 
> It introduces pp_begin_quote/pp_end_quote to simplify this.  These
> take a "bool show_color", rather than using "pp_show_color (pp)"
> since cxx_pp's pp_show_color isn't currently initialized (since
> cxx_initialize_diagnostics happens before diagnostic_color_init).
> 
> Successfully bootstrapped on x86_64-pc-linux-gnu.
> 
> OK for trunk?
> 
> gcc/c/ChangeLog:
>   PR c++/62170
>   * c-objc-common.c (c_tree_printer): Convert penultimate param
> from
>   bool to bool *.  Within '%T' handling, if showing an "aka", use
>   "quoted" param to add appropriate quoting.
> 
> gcc/cp/ChangeLog:
>   PR c++/62170
>   * error.c (type_to_string): Add leading comment.  Add params
>   "postprocessed", "quote", and "show_color", using them to fix
>   quoting of the "aka" for types involving typedefs.
>   (arg_to_string): Update for new params to type_to_string.
>   (cxx_format_postprocessor::handle): Likewise.
>   (cp_printer): Convert penultimate param from bool to bool *.
>   Update call to type_to_string and calls to
>   defer_phase_2_of_type_diff.
> 
> gcc/fortran/ChangeLog:
>   PR c++/62170
>   * error.c (gfc_notify_std): Convert "quoted" param from bool to
>   bool *.
> 
> gcc/ChangeLog:
>   PR c++/62170
>   * pretty-print.c (pp_format): Move quoting implementation to
>   pp_begin_quote and pp_end_quote.  Update pp_format_decoder call
>   to pass address of "quote" local.
>   (pp_begin_quote): New function.
>   (pp_end_quote): New function.
>   * pretty-print.h (printer_fn): Convert penultimate param from
> bool
>   to bool *.
>   (pp_begin_quote): New decl.
>   (pp_end_quote): New decl.
>   * tree-diagnostic.c (default_tree_printer): Convert penultimate
>   param from bool to bool *.
>   * tree-diagnostic.h (default_tree_printer): Likewise.
> 
> gcc/testsuite/ChangeLog:
>   PR c++/62170
>   * g++.dg/cpp1z/direct-enum-init1.C: Update expected error
> messages
>   to reflect fixes to quoting.
>   * g++.dg/diagnostic/aka1.C: Likewise.
>   * g++.dg/diagnostic/aka2.C: New test case.
>   * g++.dg/parse/error55.C: Update expected error messages to
>   reflect fixes to quoting.
>   * g++.dg/warn/pr12242.C: Likewise.
>   * g++.old-deja/g++.mike/enum1.C: Likewise.
>   * gcc.dg/diag-aka-1.c: Likewise.
>   * gcc.dg/diag-aka-2.c: New test case.
>   * gcc.dg/pr13804-1.c: Update expected error messages to reflect
>   fixes to quoting.
>   * gcc.dg/pr56980.c: Likewise.
>   * gcc.dg/pr65050.c: Likewise.
>   * gcc.dg/redecl-14.c: Likewise.
>   * gcc.dg/utf16-4.c Likewise.
>   * gcc.target/i386/sse-vect-types.c (__m128d): Likewise.
>   * obj-c++.dg/invalid-type-1.mm: Likewise.
> ---
>  gcc/c/c-objc-common.c  |  12 +-
>  gcc/cp/error.c |  92 --
>  gcc/fortran/error.c|   2 +-
>  gcc/pretty-print.c |  37 +++-
>  gcc/pretty-print.h |   5 +-
>  gcc/testsuite/g++.dg/cpp1z/direct-enum-init1.C | 234 ---
> --
>  gcc/testsuite/g++.dg/diagnostic/aka1.C |   2 +-
>  gcc/testsuite/g++.dg/diagnostic/aka2.C |  32 
>  gcc/testsuite/g++.dg/parse/error55.C   |   2 +-
>  

Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-20 Thread Jeff Law
On 06/20/2017 02:27 AM, Richard Earnshaw (lists) wrote:
> On 19/06/17 20:04, Jeff Law wrote:
>> On 06/19/2017 11:50 AM, Joseph Myers wrote:
>>> On Mon, 19 Jun 2017, Jeff Law wrote:
>>>
 A key point to remember is that you can never have an allocation
 (potentially using more than one allocation site) which is larger than a
 page without probing the page.
>>>
>>> There's a platform ABI issue here.  At least some kernel fixes for these 
>>> stack issues, as I understand it, increase the size of the stack guard to 
>>> more than a single page.  It would be possible to define the ABI to 
>>> require such a larger guard for protection and so reduce the number of 
>>> (non-alloca/VLA-using) functions that need probes generated, depending on 
>>> whether a goal is to achieve security on kernels without such a fix.  
>>> (Thinking in terms of how to get to enabling such probes by default.)
>> On 32 bit platforms we don't have a lot of address space left, so we
>> have to be careful about creating too large of a guard.
>>
>> On 64 bit platforms we have a lot more freedom and I suspect larger
>> guards, mandated by the ABI would be useful, if for no other reason than
>> allowing us to allocate more stack without probing.   A simple array of
>> PATH_MAX characters triggers probing right now.   I suspect (but didn't
bother to confirm) that PATH_MAX arrays are what cause git to have so
>> many large stacks.
>>
>> Also if we look at something like ppc and aarch64, we've currently got
>> the PROBE_INTERVAL set to 4k.  But in reality they're using much larger
>> page sizes.  So we could improve things there as well.
>>
> 
> There are aarch64 linux systems using 4k pages for compatibility with
> existing aarch32 binaries.
Ah.  That's good to know.  Thanks.

jeff


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-20 Thread Jeff Law
On 06/20/2017 02:21 AM, Eric Botcazou wrote:
>> Out of curiousity, does the old Alpha/VMS stack-checking API meet the
>> requirements?  From what I recall, I think it does.
> 
> No, it's the usual probe-first-and-then-allocate strategy and Jeff rejects it 
> because of valgrind.  I'd personally rather change valgrind but...
I'm torn here.  It'd certainly be a hell of a lot easier to punt this to
valgrind, but with the issues I just couldn't get comfortable with that.

We're probing pages which are potentially unmapped and far away from the
stack pointer.  The kernel and/or valgrind has to look at the reference
and make a guess whether or not it's really a request for more stack or
a wild pointer -- and there's no real good way to tell the difference.
Thus we're dependent upon the heuristics used by the kernel and valgrind
and we're dependent on those heuristics essentially being the same (and
I'm certain they are not at this time :-)

One could also argue that these probes are undefined behavior precisely
because they potentially hit pages unmapped pages far away from the
stack pointer.

Any probing beyond the stack pointer is also going to trigger a valgrind
warning.  Valgrind has some support avoiding the warning if a reference
hits the red zone -- but these probes can be well beyond the red zone.

In a world where distros may be turning on -fstack-check= by
default, valgrind has to continue to work and not generate an
unreasonable number of false positive warnings else the tool becomes
useless.


Jeff





Re: [PATCH, contrib] Support multi-tool sum files in dg-cmp-results.sh

2017-06-20 Thread Mike Stump
On Jun 20, 2017, at 8:31 AM, Thomas Preudhomme  
wrote:
> 
> 2017-06-14  Thomas Preud'homme  
> 
>   * dg-cmp-results.sh: Keep test result lines rather than throwing
>   header and summary to support sum files with multiple tools.
> 
> 
> Is this still ok?

Ok.



Re: [Patch match.pd] Fold (A / (1 << B)) to (A >> B)

2017-06-20 Thread James Greenhalgh

On Fri, Jun 16, 2017 at 11:41:57AM +0200, Richard Biener wrote:
> On Fri, 16 Jun 2017, James Greenhalgh wrote:
> > On Mon, Jun 12, 2017 at 03:56:25PM +0200, Richard Biener wrote:
> > > +   We can't do the same for signed A, as it might be negative, which
> > > would
> > > +   introduce undefined behaviour.  */
> > >
> > > huh, AFAIR it is _left_ shift of negative values that invokes
> > > undefined behavior.
> >
> > You're right this is not a clear comment. The problem is not undefined
> > behaviour, so that text needs to go, but rounding towards/away from zero
> > for signed negative values. Division will round towards zero, arithmetic
> > right shift away from zero. For example in:
> >
> > -1 / (1 << 1) = -1 / 2 = 0,   but   -1 >> 1 = -1
> >
> > I've rewritten the comment to make it clear this is why we can only make
> > this optimisation for unsigned values.
>
> Ah, of course.  You could use
>
>  if ((TYPE_UNSIGNED (type)
>   || tree_expr_nonnegative_p (@0))
>
> here as improvement.

Thanks, I've made that change.

> > See, for example, gcc.c-torture/execute/pr34070-2.c
> >
> > > Note that as you are accepting vectors you need to make sure the
> > > target actually supports arithmetic right shift of vectors
> > > (you only know it supports left shift and division -- so it might
> > > be sort-of-superfluous to check in case there is no arch that supports
> > > those but not the other).
> >
> > I've added a check for that using optabs, is that the right way to do this?
>
> +  && (!VECTOR_TYPE_P (type)
> +  || optab_for_tree_code (RSHIFT_EXPR, type, optab_vector)
> +  || optab_for_tree_code (RSHIFT_EXPR, type, optab_scalar)))
>
> is not enough -- you need sth like
>
>  optab ot = optab_for_tree_code (RSHIFT_EXPR, type, optab_vector);
>  if (ot != unknown_optab
>  && optab_handler (ot, TYPE_MODE (type)) != CODE_FOR_nothing)
>.. ok! ...
>
> ideally we'd have a helper for this in optab-tree.[ch],
> tree-vect-patterns.c could also make use of that.

OK. I've added "target_has_vector_rshift_p" for this purpose.

Bootstrapped and tested on aarch64-none-linux-gnu with no issues.

OK?

Thanks,
James

---
gcc/

2017-06-19  James Greenhalgh  

* match.pd (A / (1 << B) -> A >> B): New.
* generic-match-head.c: Include optabs-tree.h.
* gimple-match-head.c: Likewise.
* optabs-tree.h (target_has_vector_rshift_p): New.
* optabs-tree.c (target_has_vector_rshift_p): New.

gcc/testsuite/

2017-06-19  James Greenhalgh  

* gcc.dg/tree-ssa/forwprop-37.c: New.

diff --git a/gcc/generic-match-head.c b/gcc/generic-match-head.c
index 0c0d182..4504401 100644
--- a/gcc/generic-match-head.c
+++ b/gcc/generic-match-head.c
@@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "case-cfn-macros.h"
 #include "gimplify.h"
+#include "optabs-tree.h"
 
 
 /* Routine to determine if the types T1 and T2 are effectively
diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
index e7e9839..5f6aa27 100644
--- a/gcc/gimple-match-head.c
+++ b/gcc/gimple-match-head.c
@@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "internal-fn.h"
 #include "case-cfn-macros.h"
 #include "gimplify.h"
+#include "optabs-tree.h"
 
 
 /* Forward declarations of the private auto-generated matchers.
diff --git a/gcc/match.pd b/gcc/match.pd
index 244e9eb..eb6bd59 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -147,6 +147,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (op @0 integer_onep)
 (non_lvalue @0)))
 
+/* (A / (1 << B)) -> (A >> B).
+   Only for unsigned A.  For signed A, this would not preserve rounding
+   toward zero.
+   For example: (-1 / ( 1 << B)) !=  -1 >> B.  */
+(simplify
+ (trunc_div @0 (lshift integer_onep@1 @2))
+ (if ((TYPE_UNSIGNED (type) || tree_expr_nonnegative_p (@0))
+  && (!VECTOR_TYPE_P (type)
+	  || target_has_vector_rshift_p (type, optab_default)))
+  (rshift @0 @2)))
+
 /* Preserve explicit divisions by 0: the C++ front-end wants to detect
undefined behavior in constexpr evaluation, and assuming that the division
traps enables better optimizations than these anyway.  */
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 4bb54ba..4a513d2 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -376,3 +376,24 @@ init_tree_optimization_optabs (tree optnode)
   ggc_free (tmp_optabs);
 }
 }
+
+/* Return TRUE if the target has support for vector right shift of an
+   operand of type TYPE.  If OT_TYPE is OPTAB_DEFAULT, check for existence
+   of a shift by either a scalar or a vector.  Otherwise, check only
+   for a shift that matches OT_TYPE.  */
+
+bool
+target_has_vector_rshift_p (tree type, enum optab_subtype ot_type)
+{
+  gcc_assert (VECTOR_TYPE_P (type));
+  if (ot_type != optab_default)
+{
+  optab ot = optab_for_tree_code (RSHIFT_EXPR, type, 

Re: [PATCH] [SPARC] Add a workaround for the LEON3FT store-store errata

2017-06-20 Thread David Miller
From: Sebastian Huber 
Date: Tue, 20 Jun 2017 07:55:33 +0200

> would someone mind reviewing this patch please. It was already sent
> for review on January this year and got no attention. Now we are in a
> different development stage.
> 
> https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01354.html

I'm fine with this change.


[Patch AArch64] Add initial tuning support for Cortex-A55 and Cortex-A75

2017-06-20 Thread James Greenhalgh

Hi,

This patch adds support for the ARM Cortex-A75 and
Cortex-A55 processors through the -mcpu/-mtune values cortex-a55 and
cortex-a75, and an ARM DynamIQ big.LITTLE configuration of these two
processors through the -mcpu/-mtune value cortex-a75.cortex-a55

The ARM Cortex-A75 is ARM's latest and highest performance applications
processor. For the initial tuning provided in this patch, I have chosen to
share the tuning structure with its predecessor, the Cortex-A73.

The ARM Cortex-A55 delivers the best combination of power efficiency
and performance in its class. For the initial tuning provided in this patch,
I have chosen to share the tuning structure with its predecessor, the
Cortex-A53.

Both Cortex-A55 and Cortex-A75 support ARMv8-A with the ARM8.1-A and
ARMv8.2-A extensions, along with the cryptography extension, and
the RCPC extensions from ARMv8.3-A. This is reflected in the patch,
-mcpu=cortex-a75 is treated as equivalent to passing -mtune=cortex-a75
-march=armv8.2-a+rcpc .

Tested on aarch64-none-elf with no issues.

OK for trunk?

Thanks,
James

---
2017-06-20  James Greenhalgh  

* config/aarch64/aarch64-cores.def (cortex-a55): New.
(cortex-a75): Likewise.
(cortex-a75.cortex-a55): Likewise.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi (-mtune): Document new values for -mtune.

diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index e333d5f..0baa20c 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -80,6 +80,12 @@ AARCH64_CORE("vulcan",  vulcan, thunderx2t99, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | AA
 /* Cavium ('C') cores. */
 AARCH64_CORE("thunderx2t99",  thunderx2t99,  thunderx2t99, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x43, 0x0af, -1)
 
+/* ARMv8.2-A Architecture Processors.  */
+
+/* ARM ('A') cores. */
+AARCH64_CORE("cortex-a55",  cortexa55, cortexa53, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_RCPC, cortexa53, 0x41, 0xd05, -1)
+AARCH64_CORE("cortex-a75",  cortexa75, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_RCPC, cortexa73, 0x41, 0xd0a, -1)
+
 /* ARMv8-A big.LITTLE implementations.  */
 
 AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, 0x41, AARCH64_BIG_LITTLE (0xd07, 0xd03), -1)
@@ -87,4 +93,8 @@ AARCH64_CORE("cortex-a72.cortex-a53",  cortexa72cortexa53, cortexa53, 8A,  AARCH
 AARCH64_CORE("cortex-a73.cortex-a35",  cortexa73cortexa35, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, 0x41, AARCH64_BIG_LITTLE (0xd09, 0xd04), -1)
 AARCH64_CORE("cortex-a73.cortex-a53",  cortexa73cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, 0x41, AARCH64_BIG_LITTLE (0xd09, 0xd03), -1)
 
+/* ARM DynamIQ big.LITTLE configurations.  */
+
+AARCH64_CORE("cortex-a75.cortex-a55",  cortexa75cortexa55, cortexa53, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_RCPC, cortexa73, 0x41, AARCH64_BIG_LITTLE (0xd0a, 0xd05), -1)
+
 #undef AARCH64_CORE
diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
index 4209f67..7fcd6cb 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-	"cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,xgene1,falkor,qdf24xx,exynosm1,thunderx2t99p1,vulcan,thunderx2t99,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53"
+	"cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,xgene1,falkor,qdf24xx,exynosm1,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55"
 	(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 86c8d62..2746c3e 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14077,17 +14077,19 @@ processors implementing the target architecture.
 @opindex mtune
 Specify the name of the target processor for which GCC should tune the
 performance of the code.  Permissible values for this option are:
-@samp{generic}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a57},
-@samp{cortex-a72}, @samp{cortex-a73}, @samp{exynos-m1},
-@samp{xgene1}, @samp{vulcan}, @samp{thunderx},
+@samp{generic}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a55},
+@samp{cortex-a57}, @samp{cortex-a72}, @samp{cortex-a73}, @samp{cortex-a75},
+@samp{exynos-m1}, @samp{xgene1}, @samp{vulcan}, @samp{thunderx},
 @samp{thunderxt88}, @samp{thunderxt88p1}, @samp{thunderxt81},
 @samp{thunderxt83}, @samp{thunderx2t99}, @samp{cortex-a57.cortex-a53},
 @samp{cortex-a72.cortex-a53}, 

Re: [PATCH, contrib] Support multi-tool sum files in dg-cmp-results.sh

2017-06-20 Thread Thomas Preudhomme

Hi Mike,

Sorry, there was a mistake in the patch I sent. Please find an updated patch 
below.

ChangeLog entry unchanged:


*** contrib/ChangeLog ***

2017-06-14  Thomas Preud'homme  

* dg-cmp-results.sh: Keep test result lines rather than throwing
header and summary to support sum files with multiple tools.


Is this still ok?

Best regards,

Thomas

On 19/06/17 16:55, Mike Stump wrote:

On Jun 14, 2017, at 5:30 AM, Thomas Preudhomme  
wrote:


2017-06-14  Thomas Preud'homme  

* dg-cmp-results.sh: Keep test result lines rather than throwing
header and summary to support sum files with multiple tools.

Tested successfully on sum file with single tool with similar results
and on sum file with multiple tools now showing a regression with patch
proposed in https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00875.html

Is this ok for trunk?


Ok.

diff --git a/contrib/dg-cmp-results.sh b/contrib/dg-cmp-results.sh
index d291769547dcd2a02ecf6f80d60d6be7802af4fd..921e9337d1f8ffea78ef566c351fb48a8f6ca064 100755
--- a/contrib/dg-cmp-results.sh
+++ b/contrib/dg-cmp-results.sh
@@ -90,8 +90,7 @@ echo "Newer log file: $NFILE"
 sed $E -e '/^[[:space:]]+===/,$d' $NFILE
 
 # Create a temporary file from the old file's interesting section.
-sed $E -e "1,/$header/d" \
-  -e '/^[[:space:]]+===/,$d' \
+sed $E -e '/^Running target /,/^[[:space:]]+===.*Summary ===/!d' \
   -e '/^[A-Z]+:/!d' \
   -e '/^(WARNING|ERROR):/d' \
   -e 's/\r$//' \
@@ -101,8 +100,7 @@ sed $E -e "1,/$header/d" \
   >/tmp/o$$-$OBASE
 
 # Create a temporary file from the new file's interesting section.
-sed $E -e "1,/$header/d" \
-  -e '/^[[:space:]]+===/,$d' \
+sed $E -e '/^Running target /,/^[[:space:]]+===.*Summary ===/!d' \
   -e '/^[A-Z]+:/!d' \
   -e '/^(WARNING|ERROR):/d' \
   -e 's/\r$//' \


Re: [PATCH/AARCH64] Improve aarch64 conditional compare usage

2017-06-20 Thread Steve Ellcey
On Tue, 2017-06-20 at 14:58 +0100, James Greenhalgh wrote:
> On Fri, Jun 16, 2017 at 10:06:51AM -0700, Steve Ellcey wrote:
> > 
> > 
> > https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00021.html
> > 
> > Ping.
> Hi Steve,
> 
> These changes all look like they are to the tree pass rather than to the
> AArch64 back end. Maybe reposting it without the AArch64 tag will get it
> more visibility from people other than the AArch64 maintainers?
> 
> Cheers,
> James

Someone else made the same suggestion so I re-pinged here:

https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01338.html

With the subject "[PATCH] Ping of ccmp.c (conditional compare) patch"

Steve Ellcey
sell...@cavium.com


Re: [PATCH][AArch64] Improve dup pattern

2017-06-20 Thread Wilco Dijkstra
James Greenhalgh wrote:
>
> Have you tested this in cases where an integer dup is definitely the right
> thing to do?

Yes, this still generates:

  #include <arm_neon.h>
  void f(unsigned a, unsigned b, uint32x4_t *c)
  {
    c[0] = vdupq_n_u32(a);
    c[1] = vdupq_n_u32(b);
  }

dup v1.4s, w0
dup v0.4s, w1
str q1, [x2]
str q0, [x2, 16]
ret

The reason is that the GP to FP register move cost is typically >= 5, while
the additional cost of '?' is just 1.

> And similar cases? If these still look good, then the patch is OK - though
> I'm still very nervous about the register allocator cost model!

Well it's complex and hard to get working well... However slightly preferring one
variant works alright (unlike using '*' which results in incorrect costs).

Wilco


[PATCH, GCC/ARM] Remove ARMv8-M code for D17-D31

2017-06-20 Thread Thomas Preudhomme

Hi,

Function cmse_nonsecure_entry_clear_before_return has code to deal with
high VFP register (D16-D31) while ARMv8-M Baseline and Mainline both do
not support more than 16 double VFP registers (D0-D15). This makes this
security-sensitive code harder to read for not much benefit since
libcall for cmse_nonsecure_call functions do not deal with those high
VFP registers anyway.

This commit gets rid of this code for simplicity and fixes 2 issues in
the same function:

- stop the first loop when reaching maxregno to avoid dealing with VFP
  registers if targetting Thumb-1 or using -mfloat-abi=soft
- include maxregno in that loop

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2017-06-13  Thomas Preud'homme  

* config/arm/arm.c (arm_option_override): Forbid ARMv8-M Security
Extensions with more than 16 double VFP registers.
(cmse_nonsecure_entry_clear_before_return): Remove second entry of
to_clear_mask and all code related to it and make the remaining
entry a 64-bit scalar integer variable and adapt code accordingly.

Testing: Testsuite shows no regression when run for ARMv8-M Baseline and
ARMv8-M Mainline.

Is this ok for trunk?

Best regards,

Thomas
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 259597d8890ee84c5bd92b12b6f9f6521c8dcd2e..60a4d1f46765d285de469f51fbb5a0ad76d56d9b 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3620,6 +3620,11 @@ arm_option_override (void)
   if (use_cmse && !arm_arch_cmse)
 error ("target CPU does not support ARMv8-M Security Extensions");
 
+  /* We don't clear D16-D31 VFP registers for cmse_nonsecure_call functions
+ and ARMv8-M Baseline and Mainline do not allow such configuration.  */
+  if (use_cmse && LAST_VFP_REGNUM > LAST_LO_VFP_REGNUM)
+error ("ARMv8-M Security Extensions incompatible with selected FPU");
+
   /* Disable scheduling fusion by default if it's not armv7 processor
  or doesn't prefer ldrd/strd.  */
   if (flag_schedule_fusion == 2
@@ -24996,15 +25001,15 @@ thumb1_expand_prologue (void)
 void
 cmse_nonsecure_entry_clear_before_return (void)
 {
-  uint64_t to_clear_mask[2];
+  uint64_t to_clear_mask;
   uint32_t padding_bits_to_clear = 0;
  uint32_t * padding_bits_to_clear_ptr = &padding_bits_to_clear;
   int regno, maxregno = IP_REGNUM;
   tree result_type;
   rtx result_rtl;
 
-  to_clear_mask[0] = (1ULL << (NUM_ARG_REGS)) - 1;
-  to_clear_mask[0] |= (1ULL << IP_REGNUM);
+  to_clear_mask = (1ULL << (NUM_ARG_REGS)) - 1;
+  to_clear_mask |= (1ULL << IP_REGNUM);
 
   /* If we are not dealing with -mfloat-abi=soft we will need to clear VFP
  registers.  We also check that TARGET_HARD_FLOAT and !TARGET_THUMB1 hold
@@ -25015,23 +25020,22 @@ cmse_nonsecure_entry_clear_before_return (void)
   maxregno = LAST_VFP_REGNUM;
 
   float_mask &= ~((1ULL << FIRST_VFP_REGNUM) - 1);
-  to_clear_mask[0] |= float_mask;
-
-  float_mask = (1ULL << (maxregno - 63)) - 1;
-  to_clear_mask[1] = float_mask;
+  to_clear_mask |= float_mask;
 
   /* Make sure we don't clear the two scratch registers used to clear the
 	 relevant FPSCR bits in output_return_instruction.  */
   emit_use (gen_rtx_REG (SImode, IP_REGNUM));
-  to_clear_mask[0] &= ~(1ULL << IP_REGNUM);
+  to_clear_mask &= ~(1ULL << IP_REGNUM);
   emit_use (gen_rtx_REG (SImode, 4));
-  to_clear_mask[0] &= ~(1ULL << 4);
+  to_clear_mask &= ~(1ULL << 4);
 }
 
+  gcc_assert ((unsigned) maxregno <= sizeof (to_clear_mask) * __CHAR_BIT__);
+
   /* If the user has defined registers to be caller saved, these are no longer
  restored by the function before returning and must thus be cleared for
  security purposes.  */
-  for (regno = NUM_ARG_REGS; regno < LAST_VFP_REGNUM; regno++)
+  for (regno = NUM_ARG_REGS; regno <= maxregno; regno++)
 {
   /* We do not touch registers that can be used to pass arguments as per
 	 the AAPCS, since these should never be made callee-saved by user
@@ -25041,7 +25045,7 @@ cmse_nonsecure_entry_clear_before_return (void)
   if (IN_RANGE (regno, IP_REGNUM, PC_REGNUM))
 	continue;
   if (call_used_regs[regno])
-	to_clear_mask[regno / 64] |= (1ULL << (regno % 64));
+	to_clear_mask |= (1ULL << regno);
 }
 
   /* Make sure we do not clear the registers used to return the result in.  */
@@ -25052,7 +25056,7 @@ cmse_nonsecure_entry_clear_before_return (void)
 
   /* No need to check that we return in registers, because we don't
 	 support returning on stack yet.  */
-  to_clear_mask[0]
+  to_clear_mask
 	&= ~compute_not_to_clear_mask (result_type, result_rtl, 0,
    padding_bits_to_clear_ptr);
 }
@@ -25063,7 +25067,7 @@ cmse_nonsecure_entry_clear_before_return (void)
   /* Padding bits to clear is not 0 so we know we are dealing with
 	 returning a composite type, which only uses r0.  Let's make sure that
 	 r1-r3 is cleared too, we will use r1 as a scratch register.  */
-  

Re: SSA range class and removal of VR_ANTI_RANGEs

2017-06-20 Thread Martin Sebor

On 06/20/2017 02:41 AM, Aldy Hernandez wrote:



On 05/23/2017 03:26 PM, Martin Sebor wrote:

On 05/23/2017 04:48 AM, Aldy Hernandez wrote:



+  void Union (wide_int x, wide_int y);
+  bool Union (const irange &r);
+  bool Union (const irange &r1, const irange &r2);
+
+  // THIS = THIS ^ [X,Y].  Return TRUE if result is non-empty.
+  bool Intersect (wide_int x, wide_int y, bool readonly = false);
+  // THIS = THIS ^ R.  Return TRUE if result is non-empty.
+  bool Intersect (const irange &r, bool readonly = false);
+  // THIS = R1 ^ R2.  Return TRUE if result is non-empty.
+  bool Intersect (const irange &r1, const irange &r2, bool readonly = false);
+  // Return TRUE if THIS ^ R will be non-empty.
+  bool Intersect_p (const irange &r)
+{ return Intersect (r, /*readonly=*/true); }

I would suggest the following changes to Union, Intersect, and Not:

1) Define all three members without the readonly argument and
returning irange& (i.e., *this).  The return value can be
used wherever irange& is expected, and the is_empty() member
function can be called on it to obtain the same result.  E.g.,
Intersect A with B, storing the result in A:

irange A, B;
if (A.Intersect (B).is_empty ()) { ... }

2) Add non-members like so:

   irange range_union (const irange &lhs, const irange &rhs)
   {
 return irange (lhs).Union (rhs);
   }

   and find out if the union of A or B is empty without modifying
   either argument:

   irange A, B;
   if (range_union (A, B).is_empty ()) { ... }


Perhaps we could provide an implicit conversion from irange to bool such
that we could write:

if (range_union (A, B)) { ... }

as well as being able to write:

if (!range_union (A, B).is_empty ()) { ... }

That is, have range_union() return an irange as suggested, but have a
bool overload (or whatever the C++ nomenclature is) such that converting
an irange to a bool is interpreted as ``nitems != 0''.

Is this acceptable C++ practice?


Implicit conversion to bool is a common way of testing validity
but I don't think it would be too surprising to use it as a test
for non-emptiness.  An alternative to consider is to provide
an implicit conversion to an unsigned integer (instead of
num_ranges()(*)) and have it return the number of ranges.  That
will make it possible to do the same thing as above while also
simplifying the API.

Martin

[*] FWIW, there's nothing wrong with the name num_ranges() but
those familiar with the C++ standard library are going to be
accustomed to size() as the name of a function that returns
the number of elements in a container.  Since the irange class
is an ordered sequence of ranges, size() would work for it too.

PS Thinking of the irange class as a container of ranges suggests
the design might benefit from introducing a simple lower-level
abstraction (class) for a single contiguous range.


Re: [PATCH][AArch64] Mark symbols as constant

2017-06-20 Thread Wilco Dijkstra
Richard Earnshaw wrote:

> What testing has this had with -fpic?  I'm not convinced that this
> assertion is true in that case?

I ran the GLIBC tests which pass. -fpic works since it does also form a
constant address, ie. instead of:

adrp  x1, global
add x1, x1, :lo12:global

we do:

adrp  x1, :got:global
ldr x1, [x1, :got_lo12:global]

CSEing or rematerializing either sequence works in the same way.

With TLS the resulting addresses are also constant, however this could
cause rather complex TLS sequences to be rematerialized.  It seems
best to block that.  Updated patch below:


aarch64_legitimate_constant_p currently returns false for symbols,
even though they are always valid constants.  This means LOSYM isn't
CSEd correctly.  If we return true CSE works better, resulting in
smaller/faster code (0.3% smaller code on SPEC2006).  Avoid this
for TLS symbols since their sequence is complex.

int x0 = 1, x1 = 2, x2 = 3;

int 
f (int x, int y)
{
  x += x1;
  if (x > 100)
y += x2;
  x += x0;
  return x + y;
}

Before:
adrp x3, .LANCHOR0
add x4, x3, :lo12:.LANCHOR0
ldr w2, [x3, #:lo12:.LANCHOR0]
add w0, w0, w2
cmp w0, 100
ble .L5
ldr w2, [x4, 8]
add w1, w1, w2
.L5:
add x3, x3, :lo12:.LANCHOR0
ldr w2, [x3, 4]
add w0, w0, w2
add w0, w0, w1
ret

After:
adrp x2, .LANCHOR0
add x3, x2, :lo12:.LANCHOR0
ldr w2, [x2, #:lo12:.LANCHOR0]
add w0, w0, w2
cmp w0, 100
ble .L5
ldr w2, [x3, 8]
add w1, w1, w2
.L5:
ldr w2, [x3, 4]
add w0, w0, w2
add w0, w0, w1
ret

Bootstrap OK, OK for commit?

ChangeLog:
2017-06-20  Wilco Dijkstra  

* config/aarch64/aarch64.c (aarch64_legitimate_constant_p):
Return true for non-tls symbols.
--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
5ec6bbfcf484baa4005b8a88cb98d0d04f710877..060cd8476d2954119daac495ecb059c9be73edbe
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10111,6 +10111,11 @@ aarch64_legitimate_constant_p (machine_mode mode, rtx 
x)
   && aarch64_valid_symref (XEXP (x, 0), GET_MODE (XEXP (x, 0
 return true;
 
+  /* Treat symbols as constants.  Avoid TLS symbols as they are complex,
+ so spilling them is better than rematerialization.  */
+  if (SYMBOL_REF_P (x) && !SYMBOL_REF_TLS_MODEL (x))
+return true;
+
   return aarch64_constant_address_p (x);
 }
 

Re: [Patch AArch64] Add rcpc extension

2017-06-20 Thread Richard Earnshaw (lists)
On 20/06/17 14:50, James Greenhalgh wrote:
> 
> Hi,
> 
> While GCC doesn't need to know anything about the RcPc extension for code
> generation, we do need to add the extension flag to the string we pass
> to the assembler when we're compiling for a CPU which implements the RcPc
> extension.
> 
> I've built a toolchain with this patch applied, and checked that we
> correctly pass +rcpc on to the assembler if we give something like
> -mcpu=generic+rcpc .
> 
> OK?
> 
> Thanks,
> James
> 

OK.

R.
> ---
> 2017-06-20  James Greenhalgh  
> 
>   * config/aarch64/aarch64-option-extensions.def (rcpc): New.
>   * config/aarch64/aarch64.h (AARCH64_FL_RCPC): New.
> 
> 
> 0001-Patch-AArch64-Add-rcpc-extension.patch
> 
> 
> diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
> b/gcc/config/aarch64/aarch64-option-extensions.def
> index b54de03..c0752ce 100644
> --- a/gcc/config/aarch64/aarch64-option-extensions.def
> +++ b/gcc/config/aarch64/aarch64-option-extensions.def
> @@ -60,4 +60,7 @@ AARCH64_OPT_EXTENSION("lse", AARCH64_FL_LSE, 0, 0, 
> "atomics")
> Disabling "fp16" just disables "fp16".  */
>  AARCH64_OPT_EXTENSION("fp16", AARCH64_FL_F16, AARCH64_FL_FP, 0, "fphp 
> asimdhp")
>  
> +/* Enabling or disabling "rcpc" only changes "rcpc".  */
> +AARCH64_OPT_EXTENSION("rcpc", AARCH64_FL_RCPC, 0, 0, "lrcpc")
> +
>  #undef AARCH64_OPT_EXTENSION
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index e4fb96f..3b3f27e 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -140,6 +140,7 @@ extern unsigned aarch64_architecture_version;
>  #define AARCH64_FL_F16 (1 << 9)  /* Has ARMv8.2-A FP16 
> extensions.  */
>  /* ARMv8.3-A architecture extensions.  */
>  #define AARCH64_FL_V8_3(1 << 10)  /* Has ARMv8.3-A features.  */
> +#define AARCH64_FL_RCPC(1 << 11)  /* Has support for RCpc model. 
>  */
>  
>  /* Has FP and SIMD.  */
>  #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD)
> 



Re: [PATCH 2/2] DWARF: make it possible to emit debug info for declarations only

2017-06-20 Thread Pierre-Marie de Rodat

On 06/20/2017 02:16 PM, Richard Biener wrote:

Nice.  This looks ok.


Great, thank you!


I'm mildly curious about the decrease of debuginfo size for cc1 -- did
you spot anything obvious there?


Well, the benchmark I presented was for the whole file size, not just 
.debug_info section size. Just to be sure, I compared object files for 
both trunk and my patched tree: outside of Ada units, I only get the 
following evolution:

gcc/dwarf2out.o: -10168 bytes
gcc/godump.o: +272 bytes
gcc/passes.o: +880 bytes


This diff comes from my changes themselves. I had a quick look at the 
same for cc1’s .debug_info: there is the expected evolution, too,



I suspect Fortran wants to do sth similar as Ada for imported modules.


Maybe. I have zero Fortran knowledge, so I’ll let a Fortran expert 
decide, if that is fine for you. :-) In any case, the back-end is ready 
for that.


--
Pierre-Marie de Rodat


Re: [testsuite, i386] Always check for target i?86 and x86_64

2017-06-20 Thread JonY
On 06/20/2017 01:35 PM, Rainer Orth wrote:
> JonY <10wa...@gmail.com> writes:
> 
>> On 06/20/2017 01:01 PM, Rainer Orth wrote:
>>> Given that there were no other comments, I've installed the patch.  It
>>> would still be nice if the Cygwin/MingW maintainer could comment on the
>>> testcase situation for those targets.
>>
>> Honestly, I'm not sure how ms-bitfields work on non-Windows targets,
> 
> I just noticed that it's handled in generic code in i386.c and the
> affected tests worked on the likes of Linux and Solaris ;-)

gcc/testsuite/gcc.dg/array-quals-1.c  asm bits probably won't work 
due to object format difference
gcc/testsuite/gcc.dg/lto/20091013-1_1.c I'm not familiar with LTO 
enough to comment on it
gcc/testsuite/gcc.dg/lto/20091013-1_2.c dto.
gcc/testsuite/gcc.dg/pr32370.c  This probably should be safe to 
enable for all.
gcc/testsuite/gcc.dg/pr50251.c  dto.
gcc/testsuite/gcc.dg/tls/thr-cse-1.c  I think this should stay as is.
gcc/testsuite/gcc.dg/weak/weak-15.c ELF weak symbols don't really 
work on Windows PE format and are known to be broken
gcc/testsuite/gcc.dg/weak/weak-16.c dto.
gcc/testsuite/gcc.dg/weak/weak-2.c  dto.
gcc/testsuite/gcc.dg/weak/weak-3.c  dto.
gcc/testsuite/gcc.dg/weak/weak-4.c  dto.
gcc/testsuite/gcc.dg/weak/weak-5.c  dto.
libffi/testsuite/libffi.call/cls_longdouble_va.c iirc libffi hasn't been ported 
for 64bit Windows



gcc/testsuite/g++.dg/abi/bitfield3.C  Should remain 32bit specific
gcc/testsuite/g++.dg/ext/dllexport3.C   Should be x86_64-*-cygwin too
gcc/testsuite/g++.dg/ext/selectany1.C   dto. 
gcc/testsuite/g++.dg/ext/selectany2.C   dto. 
gcc/testsuite/g++.old-deja/g++.ext/attrib5.C  Probably broken due to the 
underscore prefix in asm names for 32bit mingw/cygwin
gcc/testsuite/gcc.dg/dll-3.c  Should be x86_64-*-cygwin too
gcc/testsuite/gcc.dg/dll-4.c  dto.
gcc/testsuite/gcc.dg/dll-5.c  dto.
gcc/testsuite/gcc.dg/dll-8.c  dto.
gcc/testsuite/gcc.dg/tree-ssa/loop-1.c  dto.
gcc/testsuite/gcc.target/i386/fastcall-1.c  Keep as is, x86 specific test 
case.

I can't comment on the ARM mingw* port, I have no experience with it.





signature.asc
Description: OpenPGP digital signature


Re: [patch, libfortran, RFC] Speed up cshift with array shift

2017-06-20 Thread Dominique d'Humières
Hi Thomas,

On my machine I get the following timings without the patch

 cpu time cshift dim=1   0.490763009
 cpu time do loop dim=1   5.57969809E-02
 cpu time cshift dim=2   0.416319966
 cpu time do loop dim=2   0.187106013
 cpu time cshift dim=3   1.37362707
 cpu time do loop dim=3   1.39690399

and

 cpu time cshift dim=1   0.166012987
 cpu time do loop dim=1   5.48990071E-02
 cpu time cshift dim=2   0.183587968
 cpu time do loop dim=2   0.191835046
 cpu time cshift dim=3   1.35024190
 cpu time do loop dim=3   1.42215610

with the patch. Do you understand why cshift is so slow for dim=3?

Thanks for working on this issue.

Dominique

PS See also pr45689.



[AArch64] Improve HFA code generation

2017-06-20 Thread James Greenhalgh

Hi,

For this code:

  struct y {
float x[4];
  };

  float
  bar3 (struct y x)
  {
return x.x[3];
  }

GCC generates:

  bar3:
fmov x1, d2
mov x0, 0
bfi x0, x1, 0, 32
fmov x1, d3
bfi x0, x1, 32, 32
sbfx x0, x0, 32, 32
fmov s0, w0
ret

If you can wrap your head around that, you'll spot that it could be
simplified to:

  bar3:
fmov s0, s3
ret

Looking at it, I think the issue is the mode that we assign to the
PARALLEL we build for an HFA in registers. When we get in to
aarch64_layout_arg with a composite, MODE is set to the smallest integer
mode that would contain the size of the composite type. That is to say, in
the example above, MODE will be TImode.

Looking at the expansion path through assign_parms, we're going to go:

assign_parms
  assign_parm_setup_reg
assign_parm_remove_parallels
  emit_group_store

assign_parm_remove_parallels is going to try to create a REG in MODE,
then construct that REG using the values in the HFA PARALLEL we created. So,
for the example above, we're going to try to create a packed TImode value
built up from each of the four "S" registers we've assigned for the
arguments. Using one of the struct elements is then a 32-bit extract from
the TImode value (then a move back to FP/SIMD registers). This explains
the code-gen in the example. Note that an extract from the TImode value
makes the whole TImode value live, so we can't optimize away the
construction in registers.
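The redundancy described above — pack four 32-bit lanes into a 128-bit integer, then extract one lane — can be illustrated in a self-contained way. This is only a model of what the generated code was doing (the compiler works on registers, not memory; unsigned __int128 is a GCC extension standing in for TImode):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Model of the old code-gen for bar3: build a packed TImode-like
// value from four float "registers", then pull lane 3 back out.
// The round-trip returns the original lane bit-for-bit, which is
// why the pack/extract sequence is removable.
static float
extract_lane3_packed (float s0, float s1, float s2, float s3)
{
  const float in[4] = { s0, s1, s2, s3 };
  unsigned __int128 packed = 0;
  for (int i = 0; i < 4; i++)
    {
      uint32_t bits;
      std::memcpy (&bits, &in[i], sizeof bits);        // like fmov w, s
      packed |= (unsigned __int128) bits << (32 * i);  // like bfi
    }
  uint32_t lane3 = (uint32_t) (packed >> 96);          // like sbfx/ubfx
  float r;
  std::memcpy (&r, &lane3, sizeof r);                  // like fmov s, w
  return r;
}
```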

If instead we make the PARALLEL that we create in aarch64_layout_arg BLKmode
then our expansion path is through:

assign_parms
  assign_parm_setup_block

Which handles creating a stack slot of the right size for our HFA, and
copying it to there. We could then trust the usual optimisers to deal with
the object construction and eliminate it where possible.

However, we can't just return a BLKmode PARALLEL, as the mid-end was explicitly
asking us to return in MODE, and will eventually ICE given the inconsistency.
One other way we can force these structures to be given BLKmode is through
TARGET_MEMBER_TYPE_FORCES_BLK. Which is what we do in this patch.

We're going to tell the mid-end that any structure of more than one element
which contains either floating-point or vector data should be set out in
BLKmode rather than a large-enough integer mode. In doing so, we implicitly
fix the issue with HFA layout above. But at what cost!

A long running deficiency in GCC's code-gen (doesn't clean up stack
allocations after stack uses have been eliminated) prevents us from
getting what we really wanted, but:

  bar3:
sub sp, sp, #16
fmov s0, s3
add sp, sp, 16
ret

is pretty close, and a huge improvement over where we are today.

Note that we can still get some pretty bad code-generation out of the
compiler when passing and returning structs. I particularly like this
one:

  struct y {
float x[4];
  };

  struct y
  bar (struct y x)
  {
return x;
  }

  bar:
sub sp, sp, #48
stp s0, s1, [sp, 16]
stp s2, s3, [sp, 24]
ldp x0, x1, [sp, 16]
stp x0, x1, [sp, 32]
ldp s0, s1, [sp, 32]
ldp s2, s3, [sp, 40]
add sp, sp, 48
ret

But that looks to be a separate issue, and is not substantially worse than
current trunk:

  bar:
fmov x2, d0
mov x1, 0
mov x0, 0
bfi x1, x2, 0, 32
fmov x2, d2
bfi x0, x2, 0, 32
fmov x2, d1
bfi x1, x2, 32, 32
fmov x2, d3
bfi x0, x2, 32, 32
ubfx x2, x1, 0, 32
ubfx x1, x1, 32, 32
fmov s0, w2
ubfx x3, x0, 0, 32
fmov s1, w1
ubfx x0, x0, 32, 32
fmov s2, w3
fmov s3, w0
ret

I've benchmarked this with SPEC2000 and found no performance differences.
And bootstrapped on aarch64-none-linux-gnu with no issues.

Does this look like a sensible approach and if so, is it OK for trunk?

Thanks,
James

---
gcc/

2017-06-20  James Greenhalgh  

* config/aarch64/aarch64.c (aarch64_layout_arg): Construct HFA
PARALLELs in BLKmode.

gcc/testsuite/

2017-06-20  James Greenhalgh  

* gcc.target/aarch64/hfa_1.c: New.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 04417dc..a147068 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14925,6 +14925,32 @@ aarch64_sched_can_speculate_insn (rtx_insn *insn)
 }
 }
 
+/* We're a composite type, so MODE is the smallest integer mode
+   that can fit the total size of our aggregate.  However,
+   we're going to build a parallel that contains each of our
+   registers, and GCC is going to emit code to move them in
+   to a packed value in MODE.  As an example, for an HFA of
+   two 

Re: [PATCH] Fix PR71815 (SLSR misses PHI opportunities)

2017-06-20 Thread Bill Schmidt
On Jun 20, 2017, at 6:23 AM, Richard Biener  wrote:
> 
> On Fri, Jun 16, 2017 at 6:10 PM, Bill Schmidt
>  wrote:
>> Hi,
>> 
>> PR71815 identifies a situation where SLSR misses opportunities for
>> PHI candidates when code hoisting is enabled (which is now on by
>> default).  The basic problem is that SLSR currently uses an overly
>> simple test for profitability of the transformation.  The algorithm
>> currently requires that the PHI basis (through which the non-local
>> SLSR candidate is propagated) has only one use, which is the
>> candidate statement.  The true requirement for profitability is
>> that, if the candidate statement will be dead after transformation,
>> then so will the PHI candidate.
>> 
>> This patch fixes the problem by looking at the transitive reachability
>> of the PHI definitions.  If all paths terminate in the candidate
>> statement, then we know the PHI basis will go dead and we will not
>> make the code worse with the planned replacement.  To avoid compile
>> time issues, path search is arbitrarily terminated at depth 10.  The
>> new test is used throughout the cost calculation, so appears multiple
>> times in the code.
>> 
>> Also, I've added a check to avoid replacing multiply candidates with
>> a stride of 1.  Such a candidate is really a copy or cast statement,
>> and if we replace it, we will just generate a different copy or cast
>> statement.  I noticed this with one of the test cases from the PR
>> while debugging the problem.
>> 
>> I've updated the two test cases that were previously enabled only
>> with -fno-code-hoisting, removing that restriction.
>> 
>> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
>> regressions.  I've also tested this with SPEC cpu2006 and the
>> patch is performance neutral on a POWER8 box (as expected).  Is
>> this ok for trunk?
>> 
>> Thanks,
>> Bill
>> 
>> 
>> [gcc]
>> 
>> 2017-06-16  Bill Schmidt  
>> 
>>* gimple-ssa-strength-reduction.c (uses_consumed_by_stmt): New
>>function.
>>(find_basis_for_candidate): Call uses_consumed_by_stmt rather than
>>has_single_use.
>>(slsr_process_phi): Likewise.
>>(replace_uncond_cands_and_profitable_phis): Don't replace a
>>multiply candidate with a stride of 1 (copy or cast).
>>(phi_incr_cost): Call uses_consumed_by_stmt rather than
>>has_single_use.
>>(lowest_cost_path): Likewise.
>>(total_savings): Likewise.
>> 
>> [gcc/testsuite]
>> 
>> 2017-06-16  Bill Schmidt  
>> 
>>* gcc.dg/tree-ssa/slsr-35.c: Remove -fno-code-hoisting workaround.
>>* gcc.dg/tree-ssa/slsr-36.c: Likewise.
>> 
>> 
>> Index: gcc/gimple-ssa-strength-reduction.c
>> ===
>> --- gcc/gimple-ssa-strength-reduction.c (revision 239241)
>> +++ gcc/gimple-ssa-strength-reduction.c (working copy)
>> @@ -475,6 +475,48 @@ find_phi_def (tree base)
>>   return c->cand_num;
>> }
>> 
>> +/* Determine whether all uses of NAME are directly or indirectly
>> +   used by STMT.  That is, we want to know whether if STMT goes
>> +   dead, the definition of NAME also goes dead.  */
>> +static bool
>> +uses_consumed_by_stmt (tree name, gimple *stmt, unsigned recurse)
> 
> use a default arg 'unsigned recurse = 0' to hide this implementation
> detail at users.

Good idea, thanks.

> 
>> +{
>> +  gimple *use_stmt;
>> +  imm_use_iterator iter;
>> +  bool retval = true;
>> +
>> +  FOR_EACH_IMM_USE_STMT (use_stmt, iter, name)
>> +{
>> +  if (use_stmt == stmt || is_gimple_debug (use_stmt))
>> +   continue;
>> +
>> +  if (!is_gimple_assign (use_stmt))
>> +   {
>> + retval = false;
>> + BREAK_FROM_IMM_USE_STMT (iter);
>> +   }
>> +
>> +  /* Limit recursion.  */
>> +  if (recurse >= 10)
>> +   {
>> + retval = false;
>> + BREAK_FROM_IMM_USE_STMT (iter);
>> +   }
> 
> Put this limit right before the recursion.
> 
>> +  tree next_name = gimple_get_lhs (use_stmt);
>> +  if (!next_name || !is_gimple_reg (next_name))
>> +   {
>> + retval = false;
>> + BREAK_FROM_IMM_USE_STMT (iter);
>> +   }
>> +
>> +  if (uses_consumed_by_stmt (next_name, stmt, recurse + 1))
>> +   continue;
> 
> So this doesn't change dependent on the result which means you likely meant
> 
>  if (! uses)
>   {
> retval = false;
> BREAK...
>   }
> 
> which possibly also invalidates your testing?

Grumble.  Can't believe I did that.  Yep, will respin.

> 
> The whole thing is probably easier to optimize if you merge the ifs
> that break into one.

Will do!  Thanks, Richard!

Bill

> 
> Richard.
> 
>> +}
>> +
>> +  return retval;
>> +}
>> +
>> /* Helper routine for find_basis_for_candidate.  May be called twice:
>>once for the candidate's base expr, 
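Abstracted away from GIMPLE, the profitability test above is a depth-limited walk over the def-use graph: every use of NAME must terminate, directly or transitively, in STMT. A self-contained model of that logic follows, incorporating the failure propagation Richard asked for; the node ids, the "opaque" set standing in for non-assignment uses, and the depth limit of 10 mirror the patch but are otherwise illustrative:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// uses[n] lists the statements that use the value defined by n;
// "opaque" statements model uses we cannot analyse (non-assignments).
using use_graph = std::map<int, std::vector<int>>;

// True if every use of NAME is directly or indirectly consumed by
// STMT, i.e. if STMT goes dead, NAME's definition goes dead too.
static bool
uses_consumed_by_stmt (const use_graph &uses, const std::set<int> &opaque,
                       int name, int stmt, unsigned recurse = 0)
{
  auto it = uses.find (name);
  if (it == uses.end ())
    return true;                    // no uses: trivially consumed
  for (int use_stmt : it->second)
    {
      if (use_stmt == stmt)
        continue;
      if (opaque.count (use_stmt))  // can't see through this use
        return false;
      if (recurse >= 10)            // limit recursion, as in the patch
        return false;
      // Propagate failure from the recursive check (the fix).
      if (!uses_consumed_by_stmt (uses, opaque, use_stmt, stmt,
                                  recurse + 1))
        return false;
    }
  return true;
}
```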

[PING^3] re [PATCH v2] C++: Add fix-it hints for -Wold-style-cast

2017-06-20 Thread David Malcolm
Ping re this patch:

  https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00204.html

(more description can be seen in v1 of the patch here:
  https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01429.html
)


On Mon, 2017-06-05 at 12:41 -0400, David Malcolm wrote:
> Ping re this patch:
> 
>  https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00204.html
> 
> On Fri, 2017-05-26 at 15:35 -0400, David Malcolm wrote:
> > On Wed, 2017-05-03 at 09:51 -0400, David Malcolm wrote:
> > > On Thu, 2017-04-27 at 23:03 +0200, Marek Polacek wrote:
> > > > On Thu, Apr 27, 2017 at 05:10:24PM -0400, David Malcolm wrote:
> > > > > +  /* First try const_cast.  */
> > > > > +  trial = build_const_cast (dst_type, orig_expr, 0 /*
> > > > > complain
> > > > > */);
> > > > > +  if (trial != error_mark_node)
> > > > > +return "const_cast";
> > > > > +
> > > > > +  /* If that fails, try static_cast.  */
> > > > > +  trial = build_static_cast (dst_type, orig_expr, 0 /*
> > > > > complain
> > > > > */);
> > > > > +  if (trial != error_mark_node)
> > > > > +return "static_cast";
> > > > > +
> > > > > +  /* Finally, try reinterpret_cast.  */
> > > > > +  trial = build_reinterpret_cast (dst_type, orig_expr, 0 /*
> > > > > complain */);
> > > > > +  if (trial != error_mark_node)
> > > > > +return "reinterpret_cast";
> > > > 
> > > > I think you'll want tf_none instead of 0 /* complain */ in
> > > > these.
> > > > 
> > > > Marek
> > > 
> > > Thanks.
> > > 
> > > Here's an updated version of the patch.
> > > 
> > > Changes since v1:
> > > - updated expected fixit-formatting (the new fix-it printer in
> > >   r247548 handles this properly now)
> > > - added new test cases as suggested by Florian
> > > - use "tf_none" rather than "0 /* complain */"
> > > 
> > > Successfully bootstrapped on x86_64-pc-linux-gnu.
> > > 
> > > OK for trunk?
> > > 
> > > gcc/cp/ChangeLog:
> > >   * parser.c (get_cast_suggestion): New function.
> > >   (maybe_add_cast_fixit): New function.
> > >   (cp_parser_cast_expression): Capture the location of the
> > > closing
> > >   parenthesis.  Call maybe_add_cast_fixit when emitting warnings
> > >   about old-style casts.
> > > 
> > > gcc/testsuite/ChangeLog:
> > >   * g++.dg/other/old-style-cast-fixits.C: New test case.
> > > ---
> > >  gcc/cp/parser.c| 93
> > > -
> > >  gcc/testsuite/g++.dg/other/old-style-cast-fixits.C | 95
> > > ++
> > >  2 files changed, 186 insertions(+), 2 deletions(-)
> > >  create mode 100644 gcc/testsuite/g++.dg/other/old-style-cast
> > > -fixits.C
> > > 
> > > diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
> > > index 4714bc6..2f83aa9 100644
> > > --- a/gcc/cp/parser.c
> > > +++ b/gcc/cp/parser.c
> > > @@ -8633,6 +8633,85 @@ cp_parser_tokens_start_cast_expression
> > > (cp_parser *parser)
> > >  }
> > >  }
> > >  
> > > +/* Try to find a legal C++-style cast to DST_TYPE for ORIG_EXPR,
> > > trying them
> > > +   in the order: const_cast, static_cast, reinterpret_cast.
> > > +
> > > +   Don't suggest dynamic_cast.
> > > +
> > > +   Return the first legal cast kind found, or NULL otherwise. 
> > >  */
> > > +
> > > +static const char *
> > > +get_cast_suggestion (tree dst_type, tree orig_expr)
> > > +{
> > > +  tree trial;
> > > +
> > > +  /* Reuse the parser logic by attempting to build the various
> > > kinds
> > > of
> > > + cast, with "complain" disabled.
> > > + Identify the first such cast that is valid.  */
> > > +
> > > +  /* Don't attempt to run such logic within template processing.
> > >  */
> > > +  if (processing_template_decl)
> > > +return NULL;
> > > +
> > > +  /* First try const_cast.  */
> > > +  trial = build_const_cast (dst_type, orig_expr, tf_none);
> > > +  if (trial != error_mark_node)
> > > +return "const_cast";
> > > +
> > > +  /* If that fails, try static_cast.  */
> > > +  trial = build_static_cast (dst_type, orig_expr, tf_none);
> > > +  if (trial != error_mark_node)
> > > +return "static_cast";
> > > +
> > > +  /* Finally, try reinterpret_cast.  */
> > > +  trial = build_reinterpret_cast (dst_type, orig_expr, tf_none);
> > > +  if (trial != error_mark_node)
> > > +return "reinterpret_cast";
> > > +
> > > +  /* No such cast possible.  */
> > > +  return NULL;
> > > +}
> > > +
> > > +/* If -Wold-style-cast is enabled, add fix-its to RICHLOC,
> > > +   suggesting how to convert a C-style cast of the form:
> > > +
> > > + (DST_TYPE)ORIG_EXPR
> > > +
> > > +   to a C++-style cast.
> > > +
> > > +   The primary range of RICHLOC is assumed to be that of the
> > > original
> > > +   expression.  OPEN_PAREN_LOC and CLOSE_PAREN_LOC give the
> > > locations
> > > +   of the parens in the C-style cast.  */
> > > +
> > > +static void
> > > +maybe_add_cast_fixit (rich_location *rich_loc, location_t
> > > open_paren_loc,
> > > +   location_t close_paren_loc, tree
> > > orig_expr,
> > > +   tree dst_type)
> > > +{
> > > +  /* This function is 

Re: [PATCH/AARCH64] Improve aarch64 conditional compare usage

2017-06-20 Thread James Greenhalgh
On Fri, Jun 16, 2017 at 10:06:51AM -0700, Steve Ellcey wrote:
> 
> https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00021.html
> 
> Ping.

Hi Steve,

These changes all look like they are to the tree pass rather than to the
AArch64 back end. Maybe reposting it without the AArch64 tag will get it
more visibility from people other than the AArch64 maintainers?

Cheers,
James



[Patch AArch64] Add rcpc extension

2017-06-20 Thread James Greenhalgh

Hi,

While GCC doesn't need to know anything about the RcPc extension for code
generation, we do need to add the extension flag to the string we pass
to the assembler when we're compiling for a CPU which implements the RcPc
extension.

I've built a toolchain with this patch applied, and checked that we
correctly pass +rcpc on to the assembler if we give something like
-mcpu=generic+rcpc .

OK?

Thanks,
James

---
2017-06-20  James Greenhalgh  

* config/aarch64/aarch64-option-extensions.def (rcpc): New.
* config/aarch64/aarch64.h (AARCH64_FL_RCPC): New.

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index b54de03..c0752ce 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -60,4 +60,7 @@ AARCH64_OPT_EXTENSION("lse", AARCH64_FL_LSE, 0, 0, "atomics")
Disabling "fp16" just disables "fp16".  */
 AARCH64_OPT_EXTENSION("fp16", AARCH64_FL_F16, AARCH64_FL_FP, 0, "fphp asimdhp")
 
+/* Enabling or disabling "rcpc" only changes "rcpc".  */
+AARCH64_OPT_EXTENSION("rcpc", AARCH64_FL_RCPC, 0, 0, "lrcpc")
+
 #undef AARCH64_OPT_EXTENSION
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index e4fb96f..3b3f27e 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -140,6 +140,7 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_FL_F16	  (1 << 9)  /* Has ARMv8.2-A FP16 extensions.  */
 /* ARMv8.3-A architecture extensions.  */
 #define AARCH64_FL_V8_3	  (1 << 10)  /* Has ARMv8.3-A features.  */
+#define AARCH64_FL_RCPC	  (1 << 11)  /* Has support for RCpc model.  */
 
 /* Has FP and SIMD.  */
 #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD)


Re: [PATCH 2/3] Simplify wrapped binops

2017-06-20 Thread Richard Biener
On Tue, Jun 20, 2017 at 3:08 PM, Robin Dapp  wrote:
>>> Currently, extract_... () does that all that for me, is it really too
>>> expensive to call? I guess, using get_range_info first and calling
>>> extract when get_range_info returns a VR_RANGE is not really a favorable
>>> thing to do either? :)
>> Not only the cost, we should avoid introducing more interfaces while
>> old ones can do the work.  Anyway, it's Richard's call here.
>
> I rewrote the match.pd patterns to use get_range_info () now, keeping
> track of an "ok" overflow (both min and max overflow) and one which does
> not allow us to continue (min xor max overflow, split/anti range).  Test
> suite on s390x has no regressions, bootstrap is ok, x86 running.

+  (if (TREE_CODE (type) == INTEGER_TYPE
+   && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@3)))
+   (with

use INTEGRAL_TYPE_P.

+ bool ovf_undef = TYPE_OVERFLOW_UNDEFINED (inner_type);
+

so this is overflow behavior of the inner op.

+ /* Convert combined constant to tree of outer type if
+there was no overflow in the original operation.  */

"in the original inner operation."

you then go on and use ovf_undef also for the outer operation:

+ if (ovf_undef || vr_outer == VR_RANGE)
+ {

but you do not actually _use_ vr_outer.  Do you think that if
vr_outer is a VR_RANGE then the outer operation may not
possibly have wrapped?  That's a false conclusion.

But I don't see how overflow in the original outer operation matters,
and the code lacks comments explaining that as well.

So if you have a vr0 then you can compute whether the inner
operation cannot overflow.  You do this here:

+   if (!ovf_undef && vr0 == VR_RANGE)
+ {
+   int max_ovf = 0;
+   int min_ovf = 0;
+
+   signop sgn = TYPE_SIGN (inner_type);
+
+   wmin = wi::add (wmin0, w1);
+   min_ovf = wi::cmp (wmin, w1, sgn) < 0;
+
+   wmax = wi::add (wmax0, w1);
+   max_ovf = wi::cmp (wmax, w1, sgn) < 0;
+
+   ovf = min_ovf || max_ovf;
+
+   split_range = ((min_ovf && !max_ovf)
+  || (!min_ovf && max_ovf));

ah, here's the use of the outer value-range.  This lacks a comment
(and it looks fishy given the outer value-range is a conservative approximation
and thus could be [-INF, +INF]).  Why's this not using the
wi::add overload with the overflow flag?  ISTR you want to handle "negative"
unsigned constants somehow, but then I don't see how the above works.
I'd say if wmin/wmax interpreted as signed are positive and then using
a signed op to add w1 results in a still positive number you're fine
(you don't seem
to restrict the widening cast to either zero- or sign-extending).

+   if (ovf_undef || !split_range)
+ {
+   /* Extend @1 to TYPE. */
+   w1 = w1.from (w1, TYPE_PRECISION (type),
+ ovf ? SIGNED : TYPE_SIGN
(TREE_TYPE (@1)));

ideally you could always interpret w1 as signed?

+   /* Combine in outer, larger type.  */
+   wide_int combined_cst;
+   combined_cst = wi::add (w1, w2);

+(if (cst)
+(outer_op (convert @0) { cst; }))
+   )

bogus indent.

+/* ((T)(A)) +- CST -> (T)(A +- CST)  */
+#if GIMPLE
+   (for outer_op (plus minus)
+(simplify
+ (outer_op (convert SSA_NAME@0) INTEGER_CST@2)
+  (if (TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
+  && TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
+  && TREE_CODE (type) == INTEGER_TYPE)

INTEGRAL_TYPE_P and do that first before looking at TYPE_PRECISION.

+   if (vr == VR_RANGE)
+ {
+   wide_int wmin = wi::add (wmin0, w1);
+   bool min_ovf = wi::cmp (wmin, w1, sgn) < 0;
+
+   wide_int wmax = wi::add (wmax0, w1);
+   bool max_ovf = wi::cmp (wmax, w1, sgn) < 0;
+
+   split_range = (min_ovf && !max_ovf) || (!min_ovf && max_ovf);

similarly, why not use the wi::add overload with the overflow flag?

Btw, I find

  (with
   {
 tree x = NULL;
  if (...) x = non-NULL;
   }
   (if (x)
(

ugly.  Use

   (with
{
  ...
}
(if (...)
 (... { non-NULL } )

or sth like that which makes control flow more easily visible.

Richard.


> Regards
>  Robin
>
> --
>
> gcc/ChangeLog:
>
> 2017-06-19  Robin Dapp  
>
> * match.pd: Simplify wrapped binary operations.
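The safety condition being debated in this thread — fold only when neither bound of the inner range wraps, or both do — can be modelled in self-contained form with an overflow-reporting add, analogous to the wi::add overload Richard points to. Here uint32_t stands in for the narrow unsigned type and __builtin_add_overflow (a GCC builtin) for the wide-int overload; this is a sketch of the idea, not the match.pd code:

```cpp
#include <cassert>
#include <cstdint>

// Given the inner value range [min0, max0] of x + c1 computed in a
// 32-bit unsigned type, decide whether (T)(x + c1) + c2 may be folded
// into (T)x + (c1 + c2) in the wider type T.  The fold is unsafe when
// exactly one bound wraps: the true result range would then be a
// split (anti-)range, which the combined constant cannot represent.
static bool
inner_add_foldable (uint32_t min0, uint32_t max0, uint32_t c1)
{
  uint32_t wmin, wmax;
  bool min_ovf = __builtin_add_overflow (min0, c1, &wmin);
  bool max_ovf = __builtin_add_overflow (max0, c1, &wmax);
  return min_ovf == max_ovf;   // both wrap or neither: no split range
}
```

Note how a "negative" unsigned constant such as 0xFFFFFFFFu (-1) wraps both bounds of most ranges and so remains foldable, which is the case the pattern wants to keep.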


[arm-embedded] [PATCH, GCC/ARM, Stage 1] Rename FPSCR builtins to correct names

2017-06-20 Thread Thomas Preudhomme

Hi,

We have decided to apply the following patch to the embedded-6-branch to fix 
naming of an ARM intrinsic.


ChangeLog entry is as follows:

2017-06-20  Thomas Preud'homme  

Backport from mainline
2017-05-04  Prakhar Bahuguna  

gcc/
* gcc/config/arm/arm-builtins.c (arm_init_builtins): Rename
__builtin_arm_ldfscr to __builtin_arm_get_fpscr, and rename
__builtin_arm_stfscr to __builtin_arm_set_fpscr.

gcc/testsuite/
* gcc.target/arm/fpscr.c: New file.


Best regards,

Thomas
--- Begin Message ---

Hi Prakhar,
Sorry for the delay,

On 22/03/17 10:46, Prakhar Bahuguna wrote:

The GCC documentation in section 6.60.8 ARM Floating Point Status and Control
Intrinsics states that the FPSCR register can be read and written to using the
intrinsics __builtin_arm_get_fpscr and __builtin_arm_set_fpscr. However, these
are misnamed within GCC itself and these intrinsic names are not recognised.
This patch corrects the intrinsic names to match the documentation, and adds
tests to verify these intrinsics generate the correct instructions.

Testing done: Ran regression tests on arm-none-eabi for Cortex-M4.

2017-03-09  Prakhar Bahuguna  

gcc/ChangeLog:

* gcc/config/arm/arm-builtins.c (arm_init_builtins): Rename
  __builtin_arm_ldfscr to __builtin_arm_get_fpscr, and rename
  __builtin_arm_stfscr to __builtin_arm_set_fpscr.
* gcc/testsuite/gcc.target/arm/fpscr.c: New file.

Okay for stage 1?


I see that the mistake was in not addressing one of the review comments in:
https://gcc.gnu.org/ml/gcc-patches/2014-04/msg01832.html
properly in the patch that added these functions :(

This is ok for stage 1 if a bootstrap and test on arm-none-linux-gnueabihf 
works fine
I don't think we want to maintain the __builtin_arm_[ld,st]fscr names for 
backwards compatibility
as they were not documented and are __builtin_arm* functions that we don't 
guarantee to maintain.

Thanks,
Kyrill


--

Prakhar Bahuguna


--- End Message ---


[Patch AArch64 obvious] Fix expected string for fp16 extensions

2017-06-20 Thread James Greenhalgh

Hi,

As currently coded, the native detection of the fp16 architecture extension
from the ARMv8.2-A extensions looks for the string "fp16", but the kernel
exposes support of these features through two strings: "fphp", for scalar
16-bit floating point support, and "asimdhp" for vector 16-bit floating-point
support [1].

This patch fixes the string we look for, looking for the pair of both
fphp and asimdhp.

I have no platform to test this on, so my testing is to show that it builds
and correctly enables the fp16 extension when given a faked up /proc/cpuinfo

I've committed this as obvious to trunk (as revision 249411) and gcc-7-branch
(as revision 249413).

Thanks,
James

[1] Patchwork arm64: Add support for Half precision floating point
https://patchwork.kernel.org/patch/8124451/

---
2017-06-20  James Greenhalgh  

* config/aarch64/aarch64-option-extensions.def (fp16): Fix expected
feature string.

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index 36766d9..b54de03 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -58,6 +58,6 @@ AARCH64_OPT_EXTENSION("lse", AARCH64_FL_LSE, 0, 0, "atomics")
 
 /* Enabling "fp16" also enables "fp".
Disabling "fp16" just disables "fp16".  */
-AARCH64_OPT_EXTENSION("fp16", AARCH64_FL_F16, AARCH64_FL_FP, 0, "fp16")
+AARCH64_OPT_EXTENSION("fp16", AARCH64_FL_F16, AARCH64_FL_FP, 0, "fphp asimdhp")
 
 #undef AARCH64_OPT_EXTENSION


Re: [testsuite, i386] Always check for target i?86 and x86_64

2017-06-20 Thread Rainer Orth
JonY <10wa...@gmail.com> writes:

> On 06/20/2017 01:01 PM, Rainer Orth wrote:
>> Given that there were no other comments, I've installed the patch.  It
>> would still be nice if the Cygwin/MingW maintainer could comment on the
>> testcase situation for those targets.
>
> Honestly, I'm not sure how ms-bitfields work on non-Windows targets,

I just noticed that it's handled in generic code in i386.c and the
affected tests worked on the likes of Linux and Solaris ;-)

> beyond that, the patch looks like it won't change the tests that run for
> mingw/cygwin.

True: as I mentioned in the submission

https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01103.html

I didn't touch the Cygwin/MingW patches listed there except for those
where I could verify that they would/should work on any x86 target.
Maybe you can have a look at the questions raised there ("There's one
group of targets I've omitted completely"...)?

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[Patch AArch64 obvious] Rearrange the processors in aarch64-cores.def

2017-06-20 Thread James Greenhalgh

Hi,

This patch rearranges the cores in aarch64-cores.def first by architecture
revision, then by alphabetical order of implementer ID.

This just neatens up the file a bit, as it is growing to be unwieldy.

Committed as revision 249410.

Thanks,
James

---
2017-06-20  James Greenhalgh  

* config/aarch64/aarch64-cores.def: Rearrange to sort by
architecture, then by implementer ID.
* config/aarch64/aarch64-tune.md: Regenerate.

diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index 92b57cf..e333d5f 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -43,7 +43,7 @@
VARIANT is the variant of the CPU.  In a GNU/Linux system it can found
in /proc/cpuinfo.  If this is -1, this means it can match any variant.  */
 
-/* V8 Architecture Processors.  */
+/* ARMv8-A Architecture Processors.  */
 
 /* ARM ('A') cores. */
 AARCH64_CORE("cortex-a35",  cortexa35, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa35, 0x41, 0xd04, -1)
@@ -52,13 +52,6 @@ AARCH64_CORE("cortex-a57",  cortexa57, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AA
 AARCH64_CORE("cortex-a72",  cortexa72, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, 0x41, 0xd08, -1)
 AARCH64_CORE("cortex-a73",  cortexa73, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, 0x41, 0xd09, -1)
 
-/* Samsung ('S') cores. */
-AARCH64_CORE("exynos-m1",   exynosm1,  exynosm1,  8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1,  0x53, 0x001, -1)
-
-/* Qualcomm ('Q') cores. */
-AARCH64_CORE("falkor",  falkor,cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx,   0x51, 0xC00, -1)
-AARCH64_CORE("qdf24xx", qdf24xx,   cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx,   0x51, 0xC00, -1)
-
 /* Cavium ('C') cores. */
 AARCH64_CORE("thunderx",  thunderx,  thunderx,  8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  0x43, 0x0a0, -1)
 /* Do not swap around "thunderxt88p1" and "thunderxt88",
@@ -67,18 +60,27 @@ AARCH64_CORE("thunderxt88p1", thunderxt88p1, thunderx,  8A,  AARCH64_FL_FOR_ARCH
 AARCH64_CORE("thunderxt88",   thunderxt88,   thunderx,  8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  0x43, 0x0a1, -1)
 AARCH64_CORE("thunderxt81",   thunderxt81,   thunderx,  8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  0x43, 0x0a2, -1)
 AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  0x43, 0x0a3, -1)
-AARCH64_CORE("thunderx2t99",  thunderx2t99,  thunderx2t99, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x43, 0x0af, -1)
 
 /* APM ('P') cores. */
 AARCH64_CORE("xgene1",  xgene1,xgene1,8A,  AARCH64_FL_FOR_ARCH8, xgene1, 0x50, 0x000, -1)
 
-/* V8.1 Architecture Processors.  */
+/* Qualcomm ('Q') cores. */
+AARCH64_CORE("falkor",  falkor,cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx,   0x51, 0xC00, -1)
+AARCH64_CORE("qdf24xx", qdf24xx,   cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx,   0x51, 0xC00, -1)
+
+/* Samsung ('S') cores. */
+AARCH64_CORE("exynos-m1",   exynosm1,  exynosm1,  8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1,  0x53, 0x001, -1)
+
+/* ARMv8.1-A Architecture Processors.  */
 
 /* Broadcom ('B') cores. */
 AARCH64_CORE("thunderx2t99p1",  thunderx2t99p1, thunderx2t99, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x42, 0x516, -1)
 AARCH64_CORE("vulcan",  vulcan, thunderx2t99, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x42, 0x516, -1)
 
-/* V8 big.LITTLE implementations.  */
+/* Cavium ('C') cores. */
+AARCH64_CORE("thunderx2t99",  thunderx2t99,  thunderx2t99, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x43, 0x0af, -1)
+
+/* ARMv8-A big.LITTLE implementations.  */
 
 AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, 0x41, AARCH64_BIG_LITTLE (0xd07, 0xd03), -1)
 AARCH64_CORE("cortex-a72.cortex-a53",  cortexa72cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, 0x41, AARCH64_BIG_LITTLE (0xd08, 0xd03), -1)
diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
index c948846..4209f67 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-	

Re: [RFC] Dejagnu patch to handle multi-line directives

2017-06-20 Thread Rainer Orth
Mike Stump  writes:

> On Jun 10, 2017, at 12:57 AM, Tom de Vries  wrote:
>> 
>> one thing that has bothered me on a regular basis is the inability to
>> spread long dejagnu directives over multiple lines.
>
> I'm not terribly in favor of this.  I'd like to retain the ability to grep
> and sed single line things.  It makes exploring and finding things easier.
> Also, if we bulk convert to a new framework system for running test cases,
> the conversion is easier.

I have to agree.  Besides, extremely long lines with DejaGnu directives
often are a sign that something is amiss, e.g. long lists of targets
instead of an effective-target keyword describing what's common between
them or Tom's example which is way more readably handled via
dg-add-options as he's discovered in the meantime ;-)

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [testsuite, i386] Always check for target i?86 and x86_64

2017-06-20 Thread JonY
On 06/20/2017 01:01 PM, Rainer Orth wrote:
> 
> once I got the syntax right, this worked fine: it needs
> 
>   { target { ilp32 || lp64 } }
> 
> instead ;-)
> 
> I've also now managed to complete a Darwin/x86_64 bootstrap by locally
> reverting the two culprit patches for PR bootstrap/81033 and confirmed
> that the patched testcases are fine there, too.
> 
> Given that there were no other comments, I've installed the patch.  It
> would still be nice if the Cygwin/MingW maintainer could comment on the
> testcase situation for those targets.
> 
>   Rainer
> 

Honestly, I'm not sure how ms-bitfields work on non-Windows targets;
beyond that, the patch looks like it won't change the tests that run for
mingw/cygwin.






[arm-embedded] [PATCH, GCC/LTO, ping] Fix PR69866: LTO with def for weak alias in regular object file

2017-06-20 Thread Thomas Preudhomme

Hi,

We have decided to apply the referenced fix (r249352) to the 
ARM/embedded-6-branch along with its initial commit (r249224) to fix an ICE with 
LTO and aliases.


Fix PR69866

2017-06-20  Thomas Preud'homme  

Backport from mainline
2017-06-15  Jan Hubicka  
Thomas Preud'homme  

gcc/
PR lto/69866
* lto-symtab.c (lto_symtab_merge_symbols): Drop useless definitions
that resolved externally.
2017-06-15  Thomas Preud'homme  

gcc/testsuite/
PR lto/69866
* gcc.dg/lto/pr69866_0.c: New test.
* gcc.dg/lto/pr69866_1.c: Likewise.

Backport from mainline
2017-06-18  Jan Hubicka  

gcc/testsuite/
* gcc.dg/lto/pr69866_0.c: This test needs alias.

Best regards,

Thomas
--- Begin Message ---
> The new test fails on darwin with the usual
> 
> FAIL: gcc.dg/lto/pr69866 c_lto_pr69866_0.o-c_lto_pr69866_1.o link, -O0 -flto 
> -flto-partition=none
> 
> IMO it requires a
> 
> /* { dg-require-alias "" } */

Yep, I will add it shortly.

Honza
> 
> directive.
> 
> TIA
> 
> Dominique
--- End Message ---


[arm-embedded] [PATCH, ARM] Implement __ARM_FEATURE_COPROC coprocessor intrinsic feature macro

2017-06-20 Thread Thomas Preudhomme

Hi,

We have decided to apply the following patch to the ARM/embedded-6-branch and 
ARM/embedded-7-branch to implement the __ARM_FEATURE_COPROC coprocessor 
intrinsic feature macro.


2017-06-20  Thomas Preud'homme  

Backport from mainline
2017-06-20  Prakhar Bahuguna  

gcc/
* config/arm/arm-c.c (arm_cpu_builtins): New block to define
__ARM_FEATURE_COPROC according to support.

gcc/testsuite/
* gcc.target/arm/acle/cdp.c: Add feature macro bitmap test.
* gcc.target/arm/acle/cdp2.c: Likewise.
* gcc.target/arm/acle/ldc.c: Likewise.
* gcc.target/arm/acle/ldc2.c: Likewise.
* gcc.target/arm/acle/ldc2l.c: Likewise.
* gcc.target/arm/acle/ldcl.c: Likewise.
* gcc.target/arm/acle/mcr.c: Likewise.
* gcc.target/arm/acle/mcr2.c: Likewise.
* gcc.target/arm/acle/mcrr.c: Likewise.
* gcc.target/arm/acle/mcrr2.c: Likewise.
* gcc.target/arm/acle/mrc.c: Likewise.
* gcc.target/arm/acle/mrc2.c: Likewise.
* gcc.target/arm/acle/mrrc.c: Likewise.
* gcc.target/arm/acle/mrrc2.c: Likewise.
* gcc.target/arm/acle/stc.c: Likewise.
* gcc.target/arm/acle/stc2.c: Likewise.
* gcc.target/arm/acle/stc2l.c: Likewise.
* gcc.target/arm/acle/stcl.c: Likewise.

Best regards,

Thomas
--- Begin Message ---
On 16/06/2017 15:37:18, Richard Earnshaw (lists) wrote:
> On 16/06/17 08:48, Prakhar Bahuguna wrote:
> > On 15/06/2017 17:23:43, Richard Earnshaw (lists) wrote:
> >> On 14/06/17 10:35, Prakhar Bahuguna wrote:
> >>> The ARM ACLE defines the __ARM_FEATURE_COPROC macro which indicates which
> >>> coprocessor intrinsics are available for the target. If 
> >>> __ARM_FEATURE_COPROC is
> >>> undefined, the target does not support coprocessor intrinsics. The feature
> >>> levels are defined as follows:
> >>>
> >>> +-+---+--+
> >>> | **Bit** | **Value** | **Intrinsics Available** |
> >>> +-+---+--+
> >>> | 0   | 0x1   | __arm_cdp __arm_ldc, __arm_ldcl, __arm_stc,  |
> >>> | |   | __arm_stcl, __arm_mcr and __arm_mrc  |
> >>> +-+---+--+
> >>> | 1   | 0x2   | __arm_cdp2, __arm_ldc2, __arm_stc2, __arm_ldc2l, |
> >>> | |   | __arm_stc2l, __arm_mcr2 and __arm_mrc2   |
> >>> +-+---+--+
> >>> | 2   | 0x4   | __arm_mcrr and __arm_mrrc|
> >>> +-+---+--+
> >>> | 3   | 0x8   | __arm_mcrr2 and __arm_mrrc2  |
> >>> +-+---+--+
> >>>
> >>> This patch implements full support for this feature macro as defined in 
> >>> section
> >>> 5.9 of the ACLE
> >>> (https://developer.arm.com/products/software-development-tools/compilers/arm-compiler-5/docs/101028/latest/5-feature-test-macros).
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>> 2017-06-14  Prakhar Bahuguna  
> >>>
> >>>   * config/arm/arm-c.c (arm_cpu_builtins): New block to define
> >>>__ARM_FEATURE_COPROC according to support.
> >>>
> >>> 2017-06-14  Prakhar Bahuguna  
> >>>   * gcc/testsuite/gcc.target/arm/acle/cdp.c: Add feature macro bitmap
> >>>   test.
> >>>   * gcc/testsuite/gcc.target/arm/acle/cdp2.c: Likewise.
> >>>   * gcc/testsuite/gcc.target/arm/acle/ldc.c: Likewise.
> >>>   * gcc/testsuite/gcc.target/arm/acle/ldc2.c: Likewise.
> >>>   * gcc/testsuite/gcc.target/arm/acle/ldc2l.c: Likewise.
> >>>   * gcc/testsuite/gcc.target/arm/acle/ldcl.c: Likewise.
> >>>   * gcc/testsuite/gcc.target/arm/acle/mcr.c: Likewise.
> >>>   * gcc/testsuite/gcc.target/arm/acle/mcr2.c: Likewise.
> >>>   * gcc/testsuite/gcc.target/arm/acle/mcrr.c: Likewise.
> >>>   * gcc/testsuite/gcc.target/arm/acle/mcrr2.c: Likewise.
> >>>   * gcc/testsuite/gcc.target/arm/acle/mrc.c: Likewise.
> >>>   * gcc/testsuite/gcc.target/arm/acle/mrc2.c: Likewise.
> >>>   * gcc/testsuite/gcc.target/arm/acle/mrrc.c: Likewise.
> >>>   * gcc/testsuite/gcc.target/arm/acle/mrrc2.c: Likewise.
> >>>   * gcc/testsuite/gcc.target/arm/acle/stc.c: Likewise.
> >>>   * gcc/testsuite/gcc.target/arm/acle/stc2.c: Likewise.
> >>>   * gcc/testsuite/gcc.target/arm/acle/stc2l.c: Likewise.
> >>>   * gcc/testsuite/gcc.target/arm/acle/stcl.c: Likewise.
> >>>
> >>> Testing done: ACLE regression tests updated with tests for feature macro 
> >>> bits.
> >>> All regression tests pass.
> >>>
> >>> Okay for trunk?
> >>>
> >>>
> >>> 0001-Implement-__ARM_FEATURE_COPROC-coprocessor-intrinsic.patch
> >>>
> >>>
> >>> From 79d71aec9d2bdee936b240ae49368ff5f8d8fc48 Mon Sep 17 

Re: [PATCH 2/3] Simplify wrapped binops

2017-06-20 Thread Robin Dapp
>> Currently, extract_... () does all that for me; is it really too
>> expensive to call? I guess, using get_range_info first and calling
>> extract when get_range_info returns a VR_RANGE is not really a favorable
>> thing to do either? :)
> Not only the cost, we should avoid introducing more interfaces while
> old ones can do the work.  Anyway, it's Richard's call here.

I rewrote the match.pd patterns to use get_range_info () now, keeping
track of an "ok" overflow (both min and max overflow) and one which does
not allow us to continue (min xor max overflow, split/anti range).  Test
suite on s390x has no regressions, bootstrap is ok, x86 running.

Regards
 Robin

--

gcc/ChangeLog:

2017-06-19  Robin Dapp  

* match.pd: Simplify wrapped binary operations.
diff --git a/gcc/match.pd b/gcc/match.pd
index 80a17ba..66c37f6 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1290,6 +1290,128 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (if (cst && !TREE_OVERFLOW (cst))
  (plus { cst; } @0
 
+/* ((T)(A +- CST)) +- CST -> (T)(A) +- CST)  */
+#if GIMPLE
+   (for outer_op (plus minus)
+ (for inner_op (plus minus)
+   (simplify
+	 (outer_op (convert (inner_op@3 @0 INTEGER_CST@1)) INTEGER_CST@2)
+	   (if (TREE_CODE (type) == INTEGER_TYPE
+		&& TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@3)))
+	(with
+	{
+	  tree cst = NULL_TREE;
+	  tree inner_type = TREE_TYPE (@3);
+	  wide_int wmin, wmax;
+	  wide_int wmin0, wmax0;
+
+	  bool ovf = true;
+	  bool ovf_undef = TYPE_OVERFLOW_UNDEFINED (inner_type);
+
+	  enum value_range_type vr_outer =
+		get_range_info (@3, , );
+	  enum value_range_type vr0 =
+		get_range_info (@0, , );
+
+	  /* Convert combined constant to tree of outer type if
+		 there was no overflow in the original operation.  */
+	  if (ovf_undef || vr_outer == VR_RANGE)
+	  {
+		wide_int w1 = @1;
+		wide_int w2 = @2;
+
+		if (ovf_undef || vr0 == VR_RANGE)
+		  {
+		if (inner_op == MINUS_EXPR)
+		  w1 = wi::neg (w1);
+
+		if (outer_op == MINUS_EXPR)
+		  w2 = wi::neg (w2);
+
+		bool split_range = true;
+
+		if (!ovf_undef && vr0 == VR_RANGE)
+		  {
+			int max_ovf = 0;
+			int min_ovf = 0;
+
+			signop sgn = TYPE_SIGN (inner_type);
+
+			wmin = wi::add (wmin0, w1);
+			min_ovf = wi::cmp (wmin, w1, sgn) < 0;
+
+			wmax = wi::add (wmax0, w1);
+			max_ovf = wi::cmp (wmax, w1, sgn) < 0;
+
+			ovf = min_ovf || max_ovf;
+
+			split_range = ((min_ovf && !max_ovf)
+   || (!min_ovf && max_ovf));
+		  }
+
+		if (ovf_undef || !split_range)
+		  {
+			/* Extend @1 to TYPE. */
+			w1 = w1.from (w1, TYPE_PRECISION (type),
+  ovf ? SIGNED : TYPE_SIGN (TREE_TYPE (@1)));
+
+			/* Combine in outer, larger type.  */
+			wide_int combined_cst;
+			combined_cst = wi::add (w1, w2);
+
+			cst = wide_int_to_tree (type, combined_cst);
+		  }
+		  }
+	  }
+	}
+(if (cst)
+	 (outer_op (convert @0) { cst; }))
+	)
+#endif
+
+/* ((T)(A)) +- CST -> (T)(A +- CST)  */
+#if GIMPLE
+   (for outer_op (plus minus)
+(simplify
+ (outer_op (convert SSA_NAME@0) INTEGER_CST@2)
+  (if (TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
+	   && TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
+	   && TREE_CODE (type) == INTEGER_TYPE)
+   /* Perform binary operation inside the cast if the constant fits
+	  and there is no overflow.  */
+   (with
+	{
+	  bool split_range = true;
+	  tree cst_inner = NULL_TREE;
+	  enum value_range_type vr = VR_VARYING;
+	  tree inner_type = TREE_TYPE (@0);
+
+	  if (int_fits_type_p (@2, inner_type))
+	  {
+	cst_inner = fold_convert (inner_type, @2);
+
+	wide_int wmin0, wmax0;
+	wide_int w1 = cst_inner;
+	signop sgn = TYPE_SIGN (inner_type);
+	vr = get_range_info (@0, , );
+
+	if (vr == VR_RANGE)
+	  {
+		wide_int wmin = wi::add (wmin0, w1);
+		bool min_ovf = wi::cmp (wmin, w1, sgn) < 0;
+
+		wide_int wmax = wi::add (wmax0, w1);
+		bool max_ovf = wi::cmp (wmax, w1, sgn) < 0;
+
+		split_range = (min_ovf && !max_ovf) || (!min_ovf && max_ovf);
+	  }
+	  }
+	}
+	(if (cst_inner && !split_range)
+	 (convert (outer_op @0 { cst_inner; })))
+	
+#endif
+
   /* ~A + A -> -1 */
   (simplify
(plus:c (bit_not @0) @0)


Re: [PATCH] ASAN: handle addressable params (PR sanitize/81040).

2017-06-20 Thread Martin Liška
On 06/20/2017 11:32 AM, Jakub Jelinek wrote:
> On Tue, Jun 20, 2017 at 11:23:36AM +0200, Martin Liška wrote:
>>> Then something needs to be done for debugging too.  If it is without VTA,
>>> then probably just having DECL_VALUE_EXPR is good enough, otherwise
>>> (VTA) you probably don't want that (or can reset it at that point), but
>>> instead emit after the initialization stmt a debug stmt that the variable
>>> value now lives in a different var.  Though ideally we want the debugger
>>> to be able to also change the value of the var, that might be harder.
>>> With DECL_VALUE_EXPR on the other side the debug info will be incorrect in
>>> the prologue until it is assigned to the slot.
>>
>> Here I'm not sure about how to distinguish whether to build or not to build
>> the debug statement. According to flag_var_tracking?
> 
> More like if (target_for_debug_bind (arg))
> And if that is false, just make sure DECL_VALUE_EXPR is set to var.
> 
>> You mean something like:
>> g = gimple_build_debug_bind (arg, var, g);
>> ?
> 
> Well, there is no stmt, so the last argument would be just NULL.
> 
>>> I don't understand the distinction.  If you turn the original parm
>>> for complex/vector DECL_GIMPLE_REG_P, you should need the exact same code
>>> (but I think it would be better to use the default SSA_NAME of the PARM_DECL
>>> if it is a gimple reg type, rather than use the PARM_DECL itself
>>> and wait for update_ssa).
>>
>> Yes, the test-case /gcc/testsuite/g++.dg/asan/function-argument-3.C fails 
>> for me
>> as one needs to have a temporary SSA name, otherwise:
>>
>> /home/marxin/Programming/gcc/gcc/testsuite/g++.dg/asan/function-argument-3.C:13:1:
>>  error: invalid rhs for gimple memory store
>>  foo (v4si arg)
>>  ^~~
>> arg
>>
>> arg
>>
>> # .MEM_4 = VDEF <.MEM_1(D)>
>> arg = arg;
>> during GIMPLE pass: sanopt
>>
>> If I see correctly the function in my test-case does not have default def 
>> SSA name for the parameter.
>> Thus I guess I need to create a SSA name?
> 
> I'd expect if you have DECL_GIMPLE_REG_P set on the PARM_DECL and
> use the default def, you shouldn't run into this.
> 
>   Jakub
> 

Good, I fixed that in v2, which passes regression tests.
All objections should be resolved in this version.

Ready for trunk?
Martin
>From ed5da705250c3015e964de8d23d1aa3d0056012a Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 14 Jun 2017 11:40:01 +0200
Subject: [PATCH] ASAN: handle addressable params (PR sanitize/81040).

gcc/testsuite/ChangeLog:

2017-06-19  Martin Liska  

	PR sanitize/81040
	* g++.dg/asan/function-argument-1.C: New test.
	* g++.dg/asan/function-argument-2.C: New test.
	* g++.dg/asan/function-argument-3.C: New test.

gcc/ChangeLog:

2017-06-19  Martin Liska  

	PR sanitize/81040
	* sanopt.c (rewrite_usage_of_param): New function.
	(sanitize_rewrite_addressable_params): Likewise.
	(pass_sanopt::execute): Call rewrite_usage_of_param.
---
 gcc/sanopt.c| 132 
 gcc/testsuite/g++.dg/asan/function-argument-1.C |  30 ++
 gcc/testsuite/g++.dg/asan/function-argument-2.C |  24 +
 gcc/testsuite/g++.dg/asan/function-argument-3.C |  27 +
 4 files changed, 213 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/asan/function-argument-1.C
 create mode 100644 gcc/testsuite/g++.dg/asan/function-argument-2.C
 create mode 100644 gcc/testsuite/g++.dg/asan/function-argument-3.C

diff --git a/gcc/sanopt.c b/gcc/sanopt.c
index 16bdba76042..077811b5b93 100644
--- a/gcc/sanopt.c
+++ b/gcc/sanopt.c
@@ -37,6 +37,12 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-ssa.h"
 #include "tree-phinodes.h"
 #include "ssa-iterators.h"
+#include "gimplify.h"
+#include "gimple-iterator.h"
+#include "gimple-walk.h"
+#include "cfghooks.h"
+#include "tree-dfa.h"
+#include "tree-ssa.h"
 
 /* This is used to carry information about basic blocks.  It is
attached to the AUX field of the standard CFG block.  */
@@ -858,6 +864,129 @@ sanitize_asan_mark_poison (void)
 }
 }
 
+/* Rewrite all usages of tree OP which is a PARM_DECL with a VAR_DECL
+   that is it's DECL_VALUE_EXPR.  */
+
+static tree
+rewrite_usage_of_param (tree *op, int *walk_subtrees, void *)
+{
+  if (TREE_CODE (*op) == PARM_DECL && DECL_VALUE_EXPR (*op) != NULL_TREE)
+{
+  *op = DECL_VALUE_EXPR (*op);
+  *walk_subtrees = 0;
+}
+
+  return NULL;
+}
+
+/* For a given function FUN, rewrite all addressable parameters so that
+   a new automatic variable is introduced.  Right after function entry
+   a parameter is assigned to the variable.  */
+
+static void
+sanitize_rewrite_addressable_params (function *fun)
+{
+  gimple *g;
+  gimple_seq stmts = NULL;
+  auto_vec addressable_params;
+
+  for (tree arg = DECL_ARGUMENTS (current_function_decl);
+   arg; arg = DECL_CHAIN (arg))
+{
+  if (TREE_ADDRESSABLE (arg) && !TREE_ADDRESSABLE (TREE_TYPE (arg)))
+	{
+	  TREE_ADDRESSABLE (arg) = 0;
+	  /* 

Re: [testsuite, i386] Always check for target i?86 and x86_64

2017-06-20 Thread Rainer Orth
Hi Jonathan,

> On 15/06/17 12:51 +0200, Rainer Orth wrote:
>>I happened to notice that recently a couple of testcases have sneaked in
>>that are restricted to x86_64-*-* targets only.  This is always wrong:
>>it should be i?86-*-* and x86_64-*-* alike, eventually restricing the
>>test to ilp32 or lp64.  There were also instances of i?86-*-* only,
>>which I've handled as well.
>
> [...]
>
>>diff --git a/libstdc++-v3/testsuite/20_util/variant/index_type.cc
>> b/libstdc++-v3/testsuite/20_util/variant/index_type.cc
>>--- a/libstdc++-v3/testsuite/20_util/variant/index_type.cc
>>+++ b/libstdc++-v3/testsuite/20_util/variant/index_type.cc
>>@@ -1,5 +1,5 @@
>> // { dg-options "-std=gnu++17" }
>>-// { dg-do compile { target x86_64-*-* powerpc*-*-* } }
>>+// { dg-do compile { target i?86-*-* x86_64-*-* powerpc*-*-* } }
>>
>> // Copyright (C) 2017 Free Software Foundation, Inc.
>> //
>
> The concern here was just that we don't want the test to fail on
> targets with weird integer sizes, so the list of targets was
> restricted to just those where Ville had tested it.
>
> But { target ilp32 lp64 } would surely be fine. The test will only
> fail if a struct with two char-sized subobjects is the same size as
> size_t. Feel free to change it to { target ilp32 lp64 }.

once I got the syntax right, this worked fine: it needs

{ target { ilp32 || lp64 } }

instead ;-)

I've also now managed to complete a Darwin/x86_64 bootstrap by locally
reverting the two culprit patches for PR bootstrap/81033 and confirmed
that the patched testcases are fine there, too.

Given that there were no other comments, I've installed the patch.  It
would still be nice if the Cygwin/MingW maintainer could comment on the
testcase situation for those targets.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [rtlanal] Do a better job of costing parallel sets containing flag-setting operations.

2017-06-20 Thread Segher Boessenkool
On Mon, Jun 19, 2017 at 12:40:53PM -0500, Segher Boessenkool wrote:
> On Mon, Jun 19, 2017 at 05:01:10PM +0100, Richard Earnshaw (lists) wrote:
> > Yeah, and I'm not suggesting we change the logic there (sorry if the
> > description was misleading).  Instead I'm proposing that we handle more
> > cases for parallels to not return zero.
> 
> Right.  My test run is half way through, will have results later --
> your change looks good to me, but it is always surprising whether
> better costs help or not, or even *hurt* good code generation (things
> are just too tightly tuned to the current behaviour, so some things
> may need retuning).

Everything built successfully (31 targets) with --enable-checking=yes,rtl,tree,
so it took a while, sorry.

The targets with any differences (table shows code size):

 old patched
 arm  11545709  11545797
 powerpc   8442762   8442746
  x86_64  10627428  10627363

Arm has very many differences, the others do not.  For powerpc (which
is 32-bit, 64-bit showed no differences) most of the difference is
scheduling deciding to do things a bit differently, and most of it
in places where we have not-so-good costs anyway.  For arm the effects
often cascade to bb-reorder making different decisions.

Anyway, all differences are small, it is not likely to hurt anything.
I support the patch, if that helps -- but I cannot approve it.


Segher


[PR c++/67074] Namespace aliases to same name

2017-06-20 Thread Nathan Sidwell
This patch fixes a couple of places where namespace aliases refer to the 
same namespace.  These are not ambiguous or conflicting.


Firstly, aliases of the same name may exist in namespaces searched via 
using directives.  Those should be merged in lookup, which is the change 
to add_value.


Secondly, an alias to an existing namespace of the same name in the same 
scope is ok.  The change to duplicate_decls calms it down -- all it 
should do is say 'yup, this is the same decl', and disregard 
conflictingness -- that's pushdecl's problem.  Which is the change to 
update_binding.  As it happens, duplicate_decls will have already 
eliminated matching aliases, but it needs to be taught to permit an 
alias to the same named namespace to be allowed.


And finally diagnose_name_conflict needs to treat namespaces as always 
conflicting.  It's confusing to say the second instance is a redeclaration.


Applied to trunk.

nathan
--
Nathan Sidwell
2017-06-20  Nathan Sidwell  

	PR c++/67074 - namespace aliases
	* decl.c (duplicate_decls): Don't error here on mismatched
	namespace alias.
	* name-lookup.c (name_lookup::add_value): Matching namespaces are
	not ambiguous.
	(diagnose_name_conflict): Namespaces are never redeclarations.
	(update_binding): An alias can match a real namespace.

	PR c++/67074
	* g++.dg/lookup/pr67074.C: New.
	* g++.dg/parse/namespace-alias-1.C: Adjust.

Index: cp/decl.c
===
--- cp/decl.c	(revision 249384)
+++ cp/decl.c	(working copy)
@@ -1751,17 +1751,9 @@ duplicate_decls (tree newdecl, tree oldd
 	  && (DECL_NAMESPACE_ALIAS (newdecl)
 	  == DECL_NAMESPACE_ALIAS (olddecl)))
 	return olddecl;
-  /* [namespace.alias]
 
-	 A namespace-name or namespace-alias shall not be declared as
-	 the name of any other entity in the same declarative region.
-	 A namespace-name defined at global scope shall not be
-	 declared as the name of any other entity in any global scope
-	 of the program.  */
-  error ("conflicting declaration of namespace %q+D", newdecl);
-  inform (DECL_SOURCE_LOCATION (olddecl),
-	  "previous declaration of namespace %qD here", olddecl);
-  return error_mark_node;
+  /* Leave it to update_binding to merge or report error.  */
+  return NULL_TREE;
 }
   else
 {
Index: cp/name-lookup.c
===
--- cp/name-lookup.c	(revision 249385)
+++ cp/name-lookup.c	(working copy)
@@ -450,7 +450,13 @@ name_lookup::add_value (tree new_val)
   else if ((TREE_CODE (value) == TYPE_DECL
 	&& TREE_CODE (new_val) == TYPE_DECL
 	&& same_type_p (TREE_TYPE (value), TREE_TYPE (new_val
-;
+/* Typedefs to the same type. */;
+  else if (TREE_CODE (value) == NAMESPACE_DECL
+	   && TREE_CODE (new_val) == NAMESPACE_DECL
+	   && ORIGINAL_NAMESPACE (value) == ORIGINAL_NAMESPACE (new_val))
+/* Namespace (possibly aliased) to the same namespace.  Locate
+   the namespace*/
+value = ORIGINAL_NAMESPACE (value);
   else
 {
   if (deduping)
@@ -1630,10 +1636,10 @@ static void
 diagnose_name_conflict (tree decl, tree bval)
 {
   if (TREE_CODE (decl) == TREE_CODE (bval)
-  && (TREE_CODE (decl) != TYPE_DECL
-	  || (DECL_ARTIFICIAL (decl) && DECL_ARTIFICIAL (bval))
-	  || (!DECL_ARTIFICIAL (decl) && !DECL_ARTIFICIAL (bval)))
+  && TREE_CODE (decl) != NAMESPACE_DECL
   && !DECL_DECLARES_FUNCTION_P (decl)
+  && (TREE_CODE (decl) != TYPE_DECL
+	  || DECL_ARTIFICIAL (decl) == DECL_ARTIFICIAL (bval))
   && CP_DECL_CONTEXT (decl) == CP_DECL_CONTEXT (bval))
 error ("redeclaration of %q#D", decl);
   else
@@ -1809,15 +1815,14 @@ update_binding (cp_binding_level *level,
 }
   else if (TREE_CODE (old) == NAMESPACE_DECL)
 {
-  if (DECL_NAMESPACE_ALIAS (old) && DECL_NAMESPACE_ALIAS (decl)
-	  && ORIGINAL_NAMESPACE (old) == ORIGINAL_NAMESPACE (decl))
-	/* In a declarative region, a namespace-alias-definition can be
-	   used to redefine a namespace-alias declared in that declarative
-	   region to refer only to the namespace to which it already
-	   refers.  [namespace.alias] */
-	return old;
-  else
+  /* Two maybe-aliased namespaces.  If they're to the same target
+	 namespace, that's ok.  */
+  if (ORIGINAL_NAMESPACE (old) != ORIGINAL_NAMESPACE (decl))
 	goto conflict;
+
+  /* The new one must be an alias at this point.  */
+  gcc_assert (DECL_NAMESPACE_ALIAS (decl));
+  return old;
 }
   else if (TREE_CODE (old) == VAR_DECL)
 {
Index: testsuite/g++.dg/lookup/pr67074.C
===
--- testsuite/g++.dg/lookup/pr67074.C	(nonexistent)
+++ testsuite/g++.dg/lookup/pr67074.C	(working copy)
@@ -0,0 +1,18 @@
+// PR c++/67074 namespace aliases to the same place.
+
+namespace P {
+  namespace X {
+static int i = 1;
+  }
+}
+namespace Q {
+  namespace X = P::X;
+}
+
+using namespace P;
+using namespace 

[PATCH] Fix PR81097

2017-06-20 Thread Richard Biener

The following fixes PR81097 but not eventually more latent issues
in transforming ~x "back" to -x - 1.  I want a testcase before
fiddling more with this.  The following follows the pattern
of previous fixes to this function, making sure to do negation
in 'type'.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2017-06-20  Richard Biener  

PR middle-end/81097
* fold-const.c (split_tree): Fold to type before negating.

* c-c++-common/ubsan/pr81097.c: New testcase.

Index: gcc/fold-const.c
===
--- gcc/fold-const.c(revision 249397)
+++ gcc/fold-const.c(working copy)
@@ -853,9 +853,9 @@ split_tree (location_t loc, tree in, tre
   && code == PLUS_EXPR)
 {
   /* -X - 1 is folded to ~X, undo that here.  Do _not_ do this
- when IN is constant.  */
-  *minus_litp = build_one_cst (TREE_TYPE (in));
-  var = negate_expr (TREE_OPERAND (in, 0));
+ when IN is constant.  Convert to TYPE before negating.  */
+  *minus_litp = build_one_cst (type);
+  var = negate_expr (fold_convert_loc (loc, type, TREE_OPERAND (in, 0)));
 }
   else
 var = in;
Index: gcc/testsuite/c-c++-common/ubsan/pr81097.c
===
--- gcc/testsuite/c-c++-common/ubsan/pr81097.c  (nonexistent)
+++ gcc/testsuite/c-c++-common/ubsan/pr81097.c  (working copy)
@@ -0,0 +1,12 @@
+/* { dg-do run } */
+/* { dg-options "-fsanitize=undefined -fsanitize-undefined-trap-on-error" } */
+
+unsigned int a = 3309568;
+unsigned int b = -1204857327;
+short c = -10871;
+short x;
+int main()
+{
+  x = ((short)(~a) | ~c) +  ((short)(~b) | ~c);
+  return 0;
+}


Re: [PATCH, GCC/testsuite/ARM] Consistently check for neon in vect effective targets

2017-06-20 Thread Christophe Lyon
On 19 June 2017 at 16:47, Thomas Preudhomme
 wrote:
>
>
> On 19/06/17 15:31, Christophe Lyon wrote:
>>
>> On 19 June 2017 at 16:11, Thomas Preudhomme
>>  wrote:
>>>
>>>
>>>
>>> On 19/06/17 10:16, Thomas Preudhomme wrote:




 On 19/06/17 08:41, Christophe Lyon wrote:
>
>
> Hi Thomas,
>
>
> On 15 June 2017 at 18:18, Thomas Preudhomme
>  wrote:
>>
>>
>> Hi,
>>
>> Conditions checked for ARM targets in vector-related effective targets
>> are inconsistent:
>>
>> * sometimes arm*-*-* is checked
>> * sometimes Neon is checked
>> * sometimes arm_neon_ok and sometimes arm_neon is used for neon check
>> * sometimes check_effective_target_* is used, sometimes
>> is-effective-target
>>
>> This patch consolidate all of these check into using
>> is-effective-target
>> arm_neon and when little endian was checked, the check is kept.
>>
>> ChangeLog entry is as follows:
>>
>> *** gcc/testsuite/ChangeLog ***
>>
>> 2017-06-06  Thomas Preud'homme  
>>
>> * lib/target-supports.exp (check_effective_target_vect_int):
>> Replace
>> current ARM check by ARM NEON's availability check.
>> (check_effective_target_vect_intfloat_cvt): Likewise.
>> (check_effective_target_vect_uintfloat_cvt): Likewise.
>> (check_effective_target_vect_floatint_cvt): Likewise.
>> (check_effective_target_vect_floatuint_cvt): Likewise.
>> (check_effective_target_vect_shift): Likewise.
>> (check_effective_target_whole_vector_shift): Likewise.
>> (check_effective_target_vect_bswap): Likewise.
>> (check_effective_target_vect_shift_char): Likewise.
>> (check_effective_target_vect_long): Likewise.
>> (check_effective_target_vect_float): Likewise.
>> (check_effective_target_vect_perm): Likewise.
>> (check_effective_target_vect_perm_byte): Likewise.
>> (check_effective_target_vect_perm_short): Likewise.
>> (check_effective_target_vect_widen_sum_hi_to_si_pattern):
>> Likewise.
>> (check_effective_target_vect_widen_sum_qi_to_hi): Likewise.
>> (check_effective_target_vect_widen_mult_qi_to_hi): Likewise.
>> (check_effective_target_vect_widen_mult_hi_to_si): Likewise.
>> (check_effective_target_vect_widen_mult_qi_to_hi_pattern):
>> Likewise.
>> (check_effective_target_vect_widen_mult_hi_to_si_pattern):
>> Likewise.
>> (check_effective_target_vect_widen_shift): Likewise.
>> (check_effective_target_vect_extract_even_odd): Likewise.
>> (check_effective_target_vect_interleave): Likewise.
>> (check_effective_target_vect_multiple_sizes): Likewise.
>> (check_effective_target_vect64): Likewise.
>> (check_effective_target_vect_max_reduc): Likewise.
>>
>> Testing: Testsuite shows no regression when targeting ARMv7-A with
>> -mfpu=neon-fpv4 and -mfloat-abi=hard or when targeting Cortex-M3 with
>> default FPU and float ABI (soft). Testing was done with both
>> compare_tests
>> and the updated dg-cmp-results proposed in
>> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01030.html
>>
>> Is this ok for trunk?
>>
>
> I applied your patch on top of r249233, and noticed quite a few
> changes:
>
>
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/249233-consistent_neon_check.patch/report-build-info.html
>
>
> Note that "Big-Regression" cases are caused by the fact that there
> are PASS->XPASS transitions, XFAILs disappear with your patch, and many
> (3000-4000) PASS disappear.
> Is that intended?



 It certainly is not. I'd like to investigate this but the link to
 results
 for
 rev 249233 is broken. Could you provide me with the results you have for
 that so
 that I can compare manually?
>>>
>>>
>>>
>>> Actually yes it is, at least for the configurations with default (which
>>> still uses -mfpu=vfp in r249233) or VFP (whatever version) FPU. I've
>>> checked
>>> all the ->NA and ->UNSUPPORTED for the arm-none-linux-gnueabi
>>> configuration
>>> and none of them has a dg directive to select the neon unit (such as
>>> dg-additional-options >> line>).
>>> I've also looked at arm-none-linux-gnueabihf configuration with neon FPU
>>> and
>>> there is no regression there.
>>>
>>> I therefore think this is all normal and expected. Note that under
>>> current
>>> trunk this should be different because neon-fp16 would be selected
>>> instead
>>> of vfp for default FPU with Cortex-A9.
>>>
>>
>> OK, thanks for checking. So the version you sent on June 15th is OK?
>

Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-20 Thread Richard Biener
On Tue, Jun 20, 2017 at 2:20 PM, Uros Bizjak  wrote:
> On Tue, Jun 20, 2017 at 2:17 PM, Uros Bizjak  wrote:
>> On Tue, Jun 20, 2017 at 2:13 PM, Florian Weimer  wrote:
>>> On 06/20/2017 01:10 PM, Uros Bizjak wrote:
>>>
   74,99%  a.out  a.out  [.] test_or
   12,50%  a.out  a.out  [.] test_movb
   12,50%  a.out  a.out  [.] test_movl
>>>
>>> Could you try notl/notb/negl/negb as well, please?
>>
>> These all have the same (long) runtime as test_or.
>
> Perhaps we can use "testb $0, %0"? It doesn't write to the memory, but
> otherwise has the same runtime as movb/movl.

That sounds good, OTOH it's a matter of putting strain on the
memory fetch or store side...  We'll get cacheline allocations in
any case (but the memory will be used eventually).  Instead
of test a mere movb into a scratch register (aka, load instead of
store) would work as well apart from the need of a scratch register.

We can also vectorize with scatters ;)  (just kidding)

Richard.

> Uros.


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-20 Thread Uros Bizjak
On Tue, Jun 20, 2017 at 2:17 PM, Uros Bizjak  wrote:
> On Tue, Jun 20, 2017 at 2:13 PM, Florian Weimer  wrote:
>> On 06/20/2017 01:10 PM, Uros Bizjak wrote:
>>
>>>   74,99%  a.out  a.out  [.] test_or
>>>   12,50%  a.out  a.out  [.] test_movb
>>>   12,50%  a.out  a.out  [.] test_movl
>>
>> Could you try notl/notb/negl/negb as well, please?
>
> These all have the same (long) runtime as test_or.

Perhaps we can use "testb $0, %0"? It doesn't write to the memory, but
otherwise has the same runtime as movb/movl.

Uros.


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-20 Thread Uros Bizjak
On Tue, Jun 20, 2017 at 2:13 PM, Florian Weimer  wrote:
> On 06/20/2017 01:10 PM, Uros Bizjak wrote:
>
>>   74,99%  a.out  a.out  [.] test_or
>>   12,50%  a.out  a.out  [.] test_movb
>>   12,50%  a.out  a.out  [.] test_movl
>
> Could you try notl/notb/negl/negb as well, please?

These all have the same (long) runtime as test_or.

Uros.


Re: [PATCH 2/2] DWARF: make it possible to emit debug info for declarations only

2017-06-20 Thread Richard Biener
On Fri, Jun 16, 2017 at 6:35 PM, Pierre-Marie de Rodat
 wrote:
> On 05/31/2017 11:08 AM, Pierre-Marie de Rodat wrote:
>>
>> On 05/31/2017 09:34 AM, Richard Biener wrote:
>>>
>>> Actually for the bigger picture I'd refactor
>>> rest_of_decl_compilation, not calling it from the frontends but
>>> rely on finalize_decl/function.  The missing part would then be
>>> calling the dwarf hook which should eventually be done at some of
>>> the places the frontends now call rest_of_decl_compliation.
>
>
> I put some thought about this, but I suppose I don’t yet understand well
> enough the relation between what rest_of_decl_compilation and
> finalize_decl/function do. So I’ve tried to go half-way: I moved the
> “specification?” guard from the DWARF back-end to callers of the
> early_global_decl hook. In the end, this yielded a very small middle-end
> change: almost all hook calls in front-ends are for variables or namespaces,
> not for functions.
>
>>> But for an easier way (you might still explore the above ;)) just remove
>>> the guards from dwarf2out.c and handle it more like types that we
>>> prune if they end up being unused (OTOH I guess we don't refer to
>>> the decl DIEs from "calls" because not all calls are refered to with
>>> standard DWARF -- the GNU callsite stuff refers them I think but those
>>> get generated too late).
>>>
>>> That said, when early_finish is called the cgraph and IPA references
>>> exists and thus you can
>>> sort-of see which functions are "used".
>>
>>
>> Ok, thanks. I’ll give a try to the first option, then. :-)
>
>
> I finally decided not to implement this scheme, as it does not give the same
> results for the case in Ada that motivated this change: it would generate
> potentially one DIE per “calling unit” per called function, which is quite
> suboptimal compared to one DIE per subprogram definition or subprogram
> import. This would look like debug info bloat for debatable gain.
>
> So here’s an updated patch, without the new debug hook. It bootstrapped and
> regtested fine on x86_64-linux. After this change, I observed:
>
>   * an increase of ~22KB for gnat1 (base is 210MB);
>   * a decrease (?) of ~3KB for cc1 (base is 197MB);
>   * a similar decrease of 3KB for cc1plus (base is 220MB).
>
> Ok to commit?

Nice.  This looks ok.

I'm mildly curious about the decrease of debuginfo size for cc1 -- did
you spot anything obvious there?

I suspect Fortran wants to do sth similar as Ada for imported modules.

Thanks,
Richard.

> --
> Pierre-Marie de Rodat


Re: [PATCH] Call BUILT_IN_ASAN_HANDLE_NO_RETURN before BUILT_IN_UNWIND_RESUME (PR sanitizer/81021).

2017-06-20 Thread Martin Liška
PING^1

On 06/13/2017 10:09 AM, Martin Liška wrote:
> Hi.
> 
> For a function that does not handle an exception (and calls 
> BUILT_IN_UNWIND_RESUME),
> we need to emit call to BUILT_IN_ASAN_HANDLE_NO_RETURN. That will clean up 
> stack
> which can possibly contain poisoned shadow memory that will not be cleaned-up
> in function prologue.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Ready to be installed?
> Martin
> 
> gcc/testsuite/ChangeLog:
> 
> 2017-06-12  Martin Liska  
> 
>   PR sanitizer/81021
>   * g++.dg/asan/pr81021.C: New test.
> 
> gcc/ChangeLog:
> 
> 2017-06-12  Martin Liska  
> 
>   PR sanitizer/81021
>   * tree-eh.c (lower_resx): Call BUILT_IN_ASAN_HANDLE_NO_RETURN
>   before BUILT_IN_UNWIND_RESUME when ASAN is used.
> ---
>  gcc/testsuite/g++.dg/asan/pr81021.C | 33 +
>  gcc/tree-eh.c   | 14 ++
>  2 files changed, 47 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/asan/pr81021.C
> 
> 



Re: [PATCH 2/3] Make early return predictor more precise.

2017-06-20 Thread Martin Liška
On 06/19/2017 01:11 PM, Jan Hubicka wrote:
>> Ok, you're right that we can preserve the predictor. However, let's consider 
>> following test-case:
>>
>> static
>> int baz(int a)
>> {
>>   if (a == 1)
>> return 1;
>>
>>   return 0;
>> }
>>
>>   
>> static
>> int bar(int a)
>> {
>>   if (a == 1)
>> return baz(a);
>>
>>   return 0;
>> }
>>   
>> static
>> int foo(int a)
>> {
>>   if (a == 1)
>> return bar(a);
>>
>>   return 12;
>> }
>>
>> int main(int argc, char **argv)
>> {
>>   return foo(argc);
>> }
>>
>> There after einline we have:
>>
>> main (int argc, char * * argv)
>> {
>>   int D.1832;
>>   int _3;
>>   int _4;
>>
>>   <bb 2> [100.00%]:
>>   if (argc_2(D) == 1)
>> goto <bb 3>; [37.13%]
>>   else
>> goto <bb 4>; [62.87%]
>>
>>   <bb 3> [37.13%]:
>>   // predicted unlikely by early return (on trees) predictor.
>>   // predicted unlikely by early return (on trees) predictor.
>>   // predicted unlikely by early return (on trees) predictor.
>>
>>   <bb 4> [100.00%]:
>>   # _3 = PHI <12(2), 1(3)>
>>   _5 = _3;
>>   _4 = _5;
>>   return _4;
>>
>> }
>>
>> I'm thinking what's the best place to merge all the predictor
>> statements?
> 
> I wonder if we need to - predictors are relatively short lived.
> In fact they do not need to hit IPA passes but they do as at a time
> I was implementing them I was worrying about introducing yet another
> global IPA pass to remove them (we can't do during early inlining
> because we want to reuse them after inlining).

Ok, so I fixed that in the described way. There's one remaining fallout of:
gcc/testsuite/gcc.dg/tree-ssa/ipa-split-5.c

Where a fnsplit is properly done, but then it's again inlined:

Considering split_me.part.0/5 with 23 size
 to be inlined into test/2 in unknown:0
 Estimated badness is -0.01, frequency 0.33.
 Inlined split_me.part.0 into test which now has time 50.30 and size 44, 
net change of +17.

Considering split_me.part.0/5 with 23 size
 to be inlined into test/2 in unknown:0
 Estimated badness is -0.01, frequency 0.33.
 Inlined split_me.part.0 into test which now has time 70.76 and size 61, 
net change of +17.

Considering split_me.part.0/5 with 23 size
 to be inlined into test/2 in unknown:0
 Estimated badness is -0.01, frequency 0.33.
 Inlined split_me.part.0 into test which now has time 91.22 and size 78, 
net change of +17.

Considering split_me.part.0/5 with 23 size
 to be inlined into test/2 in unknown:0
 Estimated badness is -0.01, frequency 0.33.
 Inlined split_me.part.0 into test which now has time 111.68 and size 95, 
net change of +17.
Unit growth for small function inlining: 61->129 (111%)

...

Any hint how to block the IPA inlining?

Sending new version of patch.
Martin

> 
> I would just move pass_strip_predict_hints pre-IPA and not worry about
> them chaining.
> 
> There is problem that after inlining the prediction may expand its scope
> and predict branch that it outside of the original function body,
> but I do not see very easy solution for that besides not re-doing
> prediction (we could also copy probabilities from the inlined function
> when they exists and honnor them in the outer function. I am not sure
> that is going to improve prediction quality though - extra context
> is probably useful)
> 
> Thanks,
> Honza
>>
>> Thanks,
>> Martin
>>
>>>
>>> Where did you found this case?
>>> Honza
  
/* Create a new deep copy of the statement.  */
copy = gimple_copy (stmt);
 -- 
 2.13.0


From 84625a782add6ae2ed29630815b61b34a052770a Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 6 Jun 2017 10:55:18 +0200
Subject: [PATCH 1/2] Make early return predictor more precise.

gcc/ChangeLog:

2017-05-26  Martin Liska  

	PR tree-optimization/79489
	* gimplify.c (maybe_add_early_return_predict_stmt): New
	function.
	(gimplify_return_expr): Call the function.
	* predict.c (tree_estimate_probability_bb): Remove handling
	of early return.
	* predict.def: Update comment about early return predictor.
	* gimple-predict.h (is_gimple_predict): New function.
	* predict.def: Change default value of early return to 66.
	* tree-tailcall.c (find_tail_calls): Skip GIMPLE_PREDICT
	statements.
	* passes.def: Put pass_strip_predict_hints to the beginning of
	IPA passes.
---
 gcc/gimple-low.c |  2 ++
 gcc/gimple-predict.h |  8 
 gcc/gimplify.c   | 16 
 gcc/passes.def   |  1 +
 gcc/predict.c| 41 -
 gcc/predict.def  | 15 +++
 gcc/tree-tailcall.c  |  2 ++
 7 files changed, 32 insertions(+), 53 deletions(-)

diff --git a/gcc/gimple-low.c b/gcc/gimple-low.c
index 619b9d7bfb1..4ea6c3532f3 100644
--- a/gcc/gimple-low.c
+++ b/gcc/gimple-low.c
@@ -30,6 +30,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "calls.h"
 #include "gimple-iterator.h"
 #include "gimple-low.h"
+#include "predict.h"
+#include "gimple-predict.h"
 
 /* The differences between High GIMPLE 

Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-20 Thread Florian Weimer
On 06/20/2017 01:10 PM, Uros Bizjak wrote:

>   74,99%  a.out  a.out  [.] test_or
>   12,50%  a.out  a.out  [.] test_movb
>   12,50%  a.out  a.out  [.] test_movl

Could you try notl/notb/negl/negb as well, please?

Thanks,
Florian


Re: [PATCH][AArch64] Emit SIMD moves as mov

2017-06-20 Thread Wilco Dijkstra
James Greenhalgh wrote:
> 
> Does this introduce a dependency on a particular binutils version, or have
> we always supported this alias?
>
> The patch looks OK, but I don't want to introduce a new dependency so please
> check how far back this is supported.

Well gas/testsuite/gas/aarch64/alias.s contains "mov v0.8b, v1.8b"
since binutils 2.23, which was the first release with AArch64.

Wilco



Re: [PATCH GCC][12/13]Workaround reduction statements for distribution

2017-06-20 Thread Richard Biener
On Tue, Jun 20, 2017 at 11:20 AM, Bin.Cheng  wrote:
> On Fri, Jun 16, 2017 at 6:15 PM, Bin.Cheng  wrote:
>> On Fri, Jun 16, 2017 at 11:21 AM, Richard Biener
>>  wrote:
>>> On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng  wrote:
 Hi,
 For now, loop distribution handles variables used outside of loop as 
 reduction.
 This is inaccurate because all partitions contain statement defining 
 induction
 vars.
>>>
>>> But final induction values are usually not used outside of the loop...
>> This is actually for induction variables which are used outside of the
>> loop.
>>>
>>> What is missing is loop distribution trying to change partition order.  In 
>>> fact
>>> we somehow assume we can move a reduction across a detected builtin
>>> (I don't remember if we ever check for validity of that...).
>> Hmm, I am not sure when we can't.  If there is any dependence between
>> builtin/reduction partitions, it should be captured by RDG or PG,
>> otherwise the partitions are independent and can be freely ordered as
>> long as reduction partition is scheduled last?
>>>
 Ideally we should factor out scev-propagation as a standalone interface
 which can be called when necessary.  Before that, this patch simply 
 workarounds
 reduction issue by checking if the statement belongs to all partitions.  
 If yes,
 the reduction must be computed in the last partition no matter how the 
 loop is
 distributed.
 Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>>
>>> stmt_in_all_partitions is not kept up-to-date during partition merging and 
>>> if
>>> merging makes the reduction partition(s) pass the stmt_in_all_partitions
>>> test your simple workaround doesn't work ...
>> I think it doesn't matter because:
>>   A) it's really workaround for induction variables.  In general,
>> induction variables are included by all partition.
>>   B) After classify partition, we immediately fuses all reduction
>> partitions.  More stmt_in_all_partitions means we are fusing
>> non-reduction partition with reduction partition, so the newly
>> generated (stmt_in_all_partitions) are actually not reduction
>> statements.  The workaround won't work anyway even if the bitmap is
>> maintained.
>>>
>>> As written it's a valid optimization but can you please note its
>>> limitation in some comment?
>> Yeah, I will add comment explaining it.
> Comment added in new version patch.  It also computes bitmap outside
> now, is it OK?

Ok.  Can you add a testcase for this as well please?  I think the
series up to this
is now fully reviewed, I defered 1/n (the new IFN) to the last one
containing the
runtime versioning.  Can you re-post that (you can merge with the IFN patch)
to apply after the series has been applied up to this?

Thanks,
Richard.

> Thanks,
> bin
> 2017-06-07  Bin Cheng  
>
> * tree-loop-distribution.c (classify_partition): New parameter and
> better handle reduction statement.
> (rdg_build_partitions): Revise comment.
> (distribute_loop): Compute statements in all partitions and pass it
> to classify_partition.


Re: [PATCH GCC][11/13]Annotate partition by its parallelism execution type

2017-06-20 Thread Richard Biener
On Tue, Jun 20, 2017 at 11:18 AM, Bin.Cheng  wrote:
> On Fri, Jun 16, 2017 at 11:10 AM, Richard Biener
>  wrote:
>> On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng  wrote:
>>> Hi,
>>> This patch checks and records if partition can be executed in parallel by
>>> looking if there exists data dependence cycles.  The information is needed
>>> for distribution because the idea is to distribute parallel type partitions
>>> away from sequential ones.  I believe current distribution doesn't work
>>> very well because it does blind distribution/fusion.
>>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>
>> +  /* In case of no data dependence.  */
>> +  if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
>> +return false;
>> +  /* Or the data dependence can be resolved by compilation time alias
>> + check.  */
>> +  else if (!alias_sets_conflict_p (get_alias_set (DR_REF (dr1)),
>> +  get_alias_set (DR_REF (dr2
>> +return false;
>>
>> dependence analysis should use TBAA already, in which cases do you need this?
>> It seems to fall foul of the easy mistake of not honoring GCCs memory model
>> as well ... see dr_may_alias_p.
> I see.  Patch updated with this branch removed.
>
>>
>> +  /* Further check if any data dependence prevents us from executing the
>> + partition parallelly.  */
>> +  EXECUTE_IF_SET_IN_BITMAP (partition->reads, 0, i, bi)
>> +{
>> +  dr1 = (*datarefs_vec)[i];
>> +  EXECUTE_IF_SET_IN_BITMAP (partition->writes, 0, j, bj)
>> +   {
>>
>> what about write-write dependences?
>>
>> +  EXECUTE_IF_SET_IN_BITMAP (partition->reads, 0, i, bi)
>> +{
>> +  dr1 = (*datarefs_vec)[i];
>> +  EXECUTE_IF_SET_IN_BITMAP (partition->writes, i + 1, j, bj)
>> +   {
>> + dr2 = (*datarefs_vec)[j];
>> + /* Partition can only be executed sequentially if there is any
>> +data dependence cycle.  */
>>
>> exact copy of the loop nest follows?!  Maybe you meant to iterate
>> over writes in the first loop.
> Yes, this is a copy-paste typo.  Patch is also simplified because
> read/write are recorded together now.  Is it OK?

Ok.

Thanks,
Richard.

> Thanks,
> bin
> 2017-06-07  Bin Cheng  
>
> * tree-loop-distribution.c (enum partition_type): New.
> (struct partition): New field type.
> (partition_merge_into): Update partition type.
> (data_dep_in_cycle_p): New function.
> (build_rdg_partition_for_vertex): Compute partition type.
> (rdg_build_partitions): Dump partition type.


Re: [PATCH GCC][07/13]Preserve data references for whole distribution life time

2017-06-20 Thread Richard Biener
On Mon, Jun 19, 2017 at 5:59 PM, Bin.Cheng  wrote:
> On Mon, Jun 19, 2017 at 4:16 PM, Richard Biener
>  wrote:
>> On Mon, Jun 19, 2017 at 3:34 PM, Bin.Cheng  wrote:
>>> On Tue, Jun 13, 2017 at 12:14 PM, Richard Biener
>>>  wrote:
 On Mon, Jun 12, 2017 at 7:02 PM, Bin Cheng  wrote:
> Hi,
> This patch collects and preserves all data references in loop for whole
> distribution life time.  It will be used afterwards.
>
> Bootstrap and test on x86_64 and AArch64.  Is it OK?

 +/* Vector of data references in the loop to be distributed.  */
 +static vec *datarefs_vec;
 +
 +/* Map of data reference in the loop to a unique id.  */
 +static hash_map *datarefs_map;
 +

 no need to make those pointers.  It's not a unique id but
 the index into the datarefs_vec vector, right?

 loop distribution doesn't yet use dr->aux so it would be nice
 to avoid the hash_map in favor of using that field.

 #define DR_INDEX(dr) ((uintptr_t)(dr)->aux)

 +  if (datarefs_vec->length () > 64)

 There is PARAM_VALUE (PARAM_LOOP_MAX_DATAREFS_FOR_DATADEPS)
 with a default value of 1000.  Please use that instead of magic numbers.

 +{
 +  if (dump_file && (dump_flags & TDF_DETAILS))
 +   fprintf (dump_file,
 +"Loop %d not distributed: more than 64 memory 
 references.\n",
 +loop->num);
 +
 +  free_rdg (rdg);
 +  loop_nest->release ();
 +  delete loop_nest;
 +  free_data_refs (*datarefs_vec);
 +  delete datarefs_vec;
 +  return 0;
 +}

 auto_* were so nice ...
>>> Hi Richard,
>>> This is the updated patch.  It removes datarefs_map as well as checks
>>> number of data references against the parameter.  Is it OK?
>>
>> ENOPATCH
> Ah Sorry for that.

Ok.

Richard.

> Thanks,
> bin
>>
>>> Thanks,
>>> bin
>>> 2017-06-07  Bin Cheng  
>>>
>>> * tree-loop-distribution.c (params.h): Include header file.
>>> (MAX_DATAREFS_NUM, DR_INDEX): New macro.
>>> (datarefs_vec): New global var.
>>> (create_rdg_vertices): Use datarefs_vec directly.
>>> (free_rdg): Don't free data references.
>>> (build_rdg): Update use.  Don't free data references.
>>> (distribute_loop): Compute global variable for data references.
>>> Bail out if there are too many data references.


Re: [PATCH GCC][10/13]Compute and cache data dependence relation

2017-06-20 Thread Richard Biener
On Tue, Jun 20, 2017 at 11:15 AM, Bin.Cheng  wrote:
> On Fri, Jun 16, 2017 at 11:03 AM, Richard Biener
>  wrote:
>> On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng  wrote:
>>> Hi,
>>> This patch computes and caches data dependence relation in a hash table
>>> so that it can be queried multiple times later for partition dependence
>>> check.
>>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>
>> +/* Vector of data dependence relations.  */
>> +static vec *ddrs_vec;
>> +
>> +/* Hash table for data dependence relation in the loop to be distributed.  
>> */
>> +static hash_table *ddrs_table;
>>
>> avoid the extra indirection.
>>
>> +/* Hashtable entry for data reference relation.  */
>> +struct ddr_entry
>> +{
>> +  data_reference_p a;
>> +  data_reference_p b;
>> +  ddr_p ddr;
>> +  hashval_t hash;
>> +};
>> ...
>> +/* Hash table equality function for data reference relation.  */
>> +
>> +inline bool
>> +ddr_entry_hasher::equal (const ddr_entry *entry1, const ddr_entry *entry2)
>> +{
>> +  return (entry1->hash == entry2->hash
>> + && DR_STMT (entry1->a) == DR_STMT (entry2->a)
>> + && DR_STMT (entry1->b) == DR_STMT (entry2->b)
>> + && operand_equal_p (DR_REF (entry1->a), DR_REF (entry2->a), 0)
>> + && operand_equal_p (DR_REF (entry1->b), DR_REF (entry2->b), 0));
>> +}
>>
>> what's the issue with using hash_table  with a custom hasher?
>> That is, simply key on the dataref pointers (hash them, compare those
>> for equality)?
>>
>> Your scheme looks too complicated / expensive to me ...
>>
>> You can drop ddrs_vec needed only for memory removal if you traverse
>> the hashtable.
> Thanks for reviewing.  Patch simplified as suggested.  Is it OK?

+inline hashval_t
+ddr_hasher::hash (const data_dependence_relation *ddr)
+{
+  return iterative_hash_object (DDR_A (ddr),
+   iterative_hash_object (DDR_B (ddr), 0));
+}
+

please use

inchash::hash h;
h.add_ptr (DDR_A (ddr));
h.add_ptr (DDR_B (ddr));
return h.end ();

Ok with that change.

Richard.

> Thanks,
> bin
> 2017-06-17  Bin Cheng  
>
> * tree-loop-distribution.c (struct ddr_hasher): New.
> (ddr_hasher::hash, ::equal, get_data_dependence): New function.
> (ddrs_table): New.
> (classify_partition): Call get_data_dependence.
> (pg_add_dependence_edges): Ditto.
> (distribute_loop): Release data dependence hash table.


Re: [PATCH] Fix PR71815 (SLSR misses PHI opportunities)

2017-06-20 Thread Richard Biener
On Fri, Jun 16, 2017 at 6:10 PM, Bill Schmidt
 wrote:
> Hi,
>
> PR71815 identifies a situation where SLSR misses opportunities for
> PHI candidates when code hoisting is enabled (which is now on by
> default).  The basic problem is that SLSR currently uses an overly
> simple test for profitability of the transformation.  The algorithm
> currently requires that the PHI basis (through which the non-local
> SLSR candidate is propagated) has only one use, which is the
> candidate statement.  The true requirement for profitability is
> that, if the candidate statement will be dead after transformation,
> then so will the PHI candidate.
>
> This patch fixes the problem by looking at the transitive reachability
> of the PHI definitions.  If all paths terminate in the candidate
> statement, then we know the PHI basis will go dead and we will not
> make the code worse with the planned replacement.  To avoid compile
> time issues, path search is arbitrarily terminated at depth 10.  The
> new test is used throughout the cost calculation, so appears multiple
> times in the code.
>
> Also, I've added a check to avoid replacing multiply candidates with
> a stride of 1.  Such a candidate is really a copy or cast statement,
> and if we replace it, we will just generate a different copy or cast
> statement.  I noticed this with one of the test cases from the PR
> while debugging the problem.
>
> I've updated the two test cases that were previously enabled only
> with -fno-code-hoisting, removing that restriction.
>
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions.  I've also tested this with SPEC cpu2006 and the
> patch is performance neutral on a POWER8 box (as expected).  Is
> this ok for trunk?
>
> Thanks,
> Bill
>
>
> [gcc]
>
> 2016-06-16  Bill Schmidt  
>
> * gimple-ssa-strength-reduction.c (uses_consumed_by_stmt): New
> function.
> (find_basis_for_candidate): Call uses_consumed_by_stmt rather than
> has_single_use.
> (slsr_process_phi): Likewise.
> (replace_uncond_cands_and_profitable_phis): Don't replace a
> multiply candidate with a stride of 1 (copy or cast).
> (phi_incr_cost): Call uses_consumed_by_stmt rather than
> has_single_use.
> (lowest_cost_path): Likewise.
> (total_savings): Likewise.
>
> [gcc/testsuite]
>
> 2016-06-16  Bill Schmidt  
>
> * gcc.dg/tree-ssa/slsr-35.c: Remove -fno-code-hoisting workaround.
> * gcc.dg/tree-ssa/slsr-36.c: Likewise.
>
>
> Index: gcc/gimple-ssa-strength-reduction.c
> ===
> --- gcc/gimple-ssa-strength-reduction.c (revision 239241)
> +++ gcc/gimple-ssa-strength-reduction.c (working copy)
> @@ -475,6 +475,48 @@ find_phi_def (tree base)
>return c->cand_num;
>  }
>
> +/* Determine whether all uses of NAME are directly or indirectly
> +   used by STMT.  That is, we want to know whether if STMT goes
> +   dead, the definition of NAME also goes dead.  */
> +static bool
> +uses_consumed_by_stmt (tree name, gimple *stmt, unsigned recurse)

use a default arg 'unsigned recurse = 0' to hide this implementation
detail at users.

> +{
> +  gimple *use_stmt;
> +  imm_use_iterator iter;
> +  bool retval = true;
> +
> +  FOR_EACH_IMM_USE_STMT (use_stmt, iter, name)
> +{
> +  if (use_stmt == stmt || is_gimple_debug (use_stmt))
> +   continue;
> +
> +  if (!is_gimple_assign (use_stmt))
> +   {
> + retval = false;
> + BREAK_FROM_IMM_USE_STMT (iter);
> +   }
> +
> +  /* Limit recursion.  */
> +  if (recurse >= 10)
> +   {
> + retval = false;
> + BREAK_FROM_IMM_USE_STMT (iter);
> +   }

Put this limit right before the recursion.

> +  tree next_name = gimple_get_lhs (use_stmt);
> +  if (!next_name || !is_gimple_reg (next_name))
> +   {
> + retval = false;
> + BREAK_FROM_IMM_USE_STMT (iter);
> +   }
> +
> +  if (uses_consumed_by_stmt (next_name, stmt, recurse + 1))
> +   continue;

So this doesn't do anything dependent on the result, which means you likely meant

  if (! uses)
   {
 retval = false;
 BREAK...
   }

which possibly also invalidates your testing?

The whole thing is probably easier to optimize if you merge the ifs
that break into one.

Richard.

> +}
> +
> +  return retval;
> +}
> +
>  /* Helper routine for find_basis_for_candidate.  May be called twice:
> once for the candidate's base expr, and optionally again either for
> the candidate's phi definition or for a CAND_REF's alternative base
> @@ -550,7 +592,8 @@ find_basis_for_candidate (slsr_cand_t c)
>
>   /* If we found a hidden basis, estimate additional dead-code
>  savings if the phi and its feeding statements can be removed.  */
> - if (basis && 

Re: [PATCH][2/2] early LTO debug, main part

2017-06-20 Thread Richard Biener
On Wed, 7 Jun 2017, Richard Biener wrote:

> On Fri, 19 May 2017, Richard Biener wrote:
> 
> > 
> > This is a repost of the main part of the early LTO debug support.
> > The only changes relative to the last post is in the dwarf2out.c
> > pieces due to Jasons review and Jakubs introduction of
> > DW_OP_GNU_variable_value.
> > 
> > I've also adjusted testcases for fallout (the asan backtraces do
> > give files / line numbers because libbacktrace doesn't understand
> > the DWARF) plus added a -flto run over the libstdc++ pretty printer
> > testsuite -- after all the goal was to make those work with LTO,
> > and they now nicely do.
> > 
> > [LTO-]bootstrapped and tested on x86_64-unknown-linux-gnu.
> > 
> > I've also tested with -flto -g and compared to before the patch and
> > the outcome doesn't contain any surprises.
> > 
> > I've also ran the gdb testsuite with no differences (but I guess
> > it doesn't exercise LTO).
> > 
> > I've also built SPEC 2k6 with -flto -g.
> > 
> > I've also debugged optimized LTO bootstrapped cc1 a bit - not that
> > debugging (LTO) optimized cc1 is a pleasant experience, but at least
> > gdb doesn't crash.
> > 
> > Ok for trunk?
> 
> Ping.

Ping^2.

> > Both darwin and mingw maintainers were not concerned about LTO with -g
> > being broken for them.
> > 
> > This patch allows us to go forward with freeing more stuff after
> > the frontend finished, in particular remove LTO streaming of a lot
> > of type information that is referenced from trees (and, as a first
> > step, enable free-lang-data for non-LTO compiles).
> > 
> > Thanks,
> > Richard.
> > 
> > 2017-05-19  Richard Biener  
> > 
> > * debug.h (struct gcc_debug_hooks): Add die_ref_for_decl and
> > register_external_die hooks.
> > (debug_false_tree_charstarstar_uhwistar): Declare.
> > (debug_nothing_tree_charstar_uhwi): Likewise.
> > * debug.c (do_nothing_debug_hooks): Adjust.
> > (debug_false_tree_charstarstar_uhwistar): New do nothing.
> > (debug_nothing_tree_charstar_uhwi): Likewise.
> > * dbxout.c (dbx_debug_hooks): Adjust.
> > (xcoff_debug_hooks): Likewise.
> > * sdbout.c (sdb_debug_hooks): Likewise.
> > * vmsdbgout.c (vmsdbg_debug_hooks): Likewise.
> > 
> > * dwarf2out.c (macinfo_label_base): New global.
> > (dwarf2out_register_external_die): New function for the
> > register_external_die hook.
> > (dwarf2out_die_ref_for_decl): Likewise for die_ref_for_decl.
> > (dwarf2_debug_hooks): Use them.
> > (dwarf2_lineno_debug_hooks): Adjust.
> > (struct die_struct): Add with_offset flag.
> > (DEBUG_LTO_DWO_INFO_SECTION, DEBUG_LTO_INFO_SECTION,
> > DEBUG_LTO_DWO_ABBREV_SECTION, DEBUG_LTO_ABBREV_SECTION,
> > DEBUG_LTO_DWO_MACINFO_SECTION, DEBUG_LTO_MACINFO_SECTION,
> > DEBUG_LTO_DWO_MACRO_SECTION, DEBUG_LTO_MACRO_SECTION,
> > DEBUG_LTO_LINE_SECTION, DEBUG_LTO_DWO_STR_OFFSETS_SECTION,
> > DEBUG_LTO_STR_DWO_SECTION, DEBUG_STR_LTO_SECTION): New macros
> > defining section names for the early LTO debug variants.
> > (reset_indirect_string): New helper.
> > (add_AT_external_die_ref): Helper for 
> > dwarf2out_register_external_die.
> > (print_dw_val): Add support for offsetted symbol references.
> > (compute_section_prefix_1): Split out worker to distinguish
> > the comdat from the LTO case.
> > (compute_section_prefix): Wrap old comdat case here.
> > (output_die): Skip DIE symbol output for the LTO added one.
> > Handle DIE symbol references with offset.
> > (output_comp_unit): Guard section name mangling properly.
> > For LTO debug sections emit a symbol at the section beginning
> > which we use to refer to its DIEs.
> > (add_abstract_origin_attribute): For DIEs registered via
> > dwarf2out_register_external_die directly refer to the early
> > DIE rather than indirectly through the shadow one we created.
> > (gen_array_type_die): When generating early LTO debug do
> > not emit DW_AT_string_length.
> > (gen_formal_parameter_die): Do not re-create DIEs for PARM_DECLs
> > late when in LTO.
> > (gen_subprogram_die): Adjust the check for whether we face
> > a concrete instance DIE for an inline we can reuse for the
> > late LTO case.  Likewise avoid another specification DIE
> > for early built declarations/definitions for the late LTO case.
> > (gen_variable_die): Add type references for late duplicated VLA dies
> > when in late LTO.
> > (gen_inlined_subroutine_die): Do not call 
> > dwarf2out_abstract_function,
> > we have the abstract instance already.
> > (process_scope_var): Adjust decl DIE contexts in LTO which
> > first puts them in limbo.
> > (gen_decl_die): Do not generate type DIEs late 

Re: [PATCH][AArch64] Emit SIMD moves as mov

2017-06-20 Thread James Greenhalgh
On Tue, Jun 20, 2017 at 12:06:29PM +0100, Wilco Dijkstra wrote:
> SIMD moves are currently emitted as ORR.  Change this to use the MOV
> pseudo instruction just like integer moves (the ARM-ARM states MOV is the
> preferred disassembly), improving readability of -S output.
> 
> Passes bootstrap, OK for commit?

Does this introduce a dependency on a particular binutils version, or have
we always supported this alias?

The patch looks OK, but I don't want to introduce a new dependency, so please
check how far back this is supported.

Thanks,
James

> 
> ChangeLog:
> 2017-06-20  Wilco Dijkstra  
> 
>   * config/aarch64/aarch64.md (movti_aarch64):
>   Emit mov rather than orr.
>   (movtf_aarch64): Likewise.
>   * config/aarch64/aarch64-simd.md (aarch64_simd_mov):
>   Emit mov rather than orr.
> --
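For context, the instruction the patch touches is the same encoding either way; only the mnemonic printed in -S output changes. A hypothetical before/after for a Q-register copy (per the Arm ARM, MOV (vector) is the alias of ORR with identical source registers):

```asm
// Before: vector register copy emitted via ORR with both sources equal
orr     v0.16b, v1.16b, v1.16b

// After: the preferred MOV alias for the same operation
mov     v0.16b, v1.16b
```

Since MOV here is an assembler alias rather than a distinct instruction, the binutils-version question above is about when gas first accepted the alias, not about any change in generated code.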


  1   2   >