Re: [PATCH][AArch64] Add crypto_pmull attribute
Hi Ramana,

Thanks for the review and approval.

>> Please update the ARM backend with the new attribute too
>> (define_insn "crypto_vmullp64"

It's already been updated in the patch posted at:
https://gcc.gnu.org/ml/gcc-patches/2017-03/msg00504.html

>> Ok with that change and checking that you can build cc1 for arm-none-eabi.

Checked and built the arm toolchain successfully with the patch.

Patch has been committed at:
https://gcc.gnu.org/viewcvs/gcc?view=revision=249433

Thanks,
Naveen
[PATCH, rs6000] Add vec_reve support
GCC maintainers:

This patch adds support for the various vec_reve builtins.  The patch has
been tested on powerpc64le-unknown-linux-gnu (Power 8 LE) and on
powerpc64-unknown-linux-gnu (Power 8 BE) with no regressions.

Is the patch OK for gcc mainline?

Carl Love
---
gcc/ChangeLog:

2017-06-20  Carl Love

	* config/rs6000/rs6000-c.c: Add support for built-in functions
	vector bool char vec_reve (vector bool char);
	vector signed char vec_reve (vector signed char);
	vector unsigned char vec_reve (vector unsigned char);
	vector bool int vec_reve (vector bool int);
	vector signed int vec_reve (vector signed int);
	vector unsigned int vec_reve (vector unsigned int);
	vector bool long long vec_reve (vector bool long long);
	vector signed long long vec_reve (vector signed long long);
	vector unsigned long long vec_reve (vector unsigned long long);
	vector bool short vec_reve (vector bool short);
	vector signed short vec_reve (vector signed short);
	vector double vec_reve (vector double);
	vector float vec_reve (vector float);
	* config/rs6000/rs6000-builtin.def (VREVE_V2DI, VREVE_V4SI,
	VREVE_V8HI, VREVE_V16QI, VREVE_V2DF, VREVE_V4SF, VREVE): New.
	* config/rs6000/altivec.md (UNSPEC_VREVEV, VEC_A_size,
	altivec_vreve): New UNSPEC, new mode_attr, new patterns.
	* config/rs6000/altivec.h (vec_reve): New define.
	* doc/extend.texi (vec_reve): Update the built-in documentation
	file for the new built-in functions.

gcc/testsuite/ChangeLog:

2017-06-20  Carl Love

	* gcc.target/powerpc/builtins-3-vec_reve-runnable.c (test_results,
	main): Add new runnable test file for the vec_reve built-ins.
---
 gcc/config/rs6000/altivec.h                        |   1 +
 gcc/config/rs6000/altivec.md                       |  31 +++
 gcc/config/rs6000/rs6000-builtin.def               |   9 +
 gcc/config/rs6000/rs6000-c.c                       |  29 +++
 gcc/doc/extend.texi                                |  13 ++
 .../powerpc/builtins-3-vec_reve-runnable.c         | 251 +
 6 files changed, 334 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-3-vec_reve-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index d542315..98ccfd2 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -142,6 +142,7 @@
 #define vec_madd __builtin_vec_madd
 #define vec_madds __builtin_vec_madds
 #define vec_mtvscr __builtin_vec_mtvscr
+#define vec_reve __builtin_vec_vreve
 #define vec_vmaxfp __builtin_vec_vmaxfp
 #define vec_vmaxsw __builtin_vec_vmaxsw
 #define vec_vmaxsh __builtin_vec_vmaxsh
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 25b2768..800d70c 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -46,6 +46,7 @@
    UNSPEC_VPACK_UNS_UNS_SAT
    UNSPEC_VPACK_UNS_UNS_MOD
    UNSPEC_VPACK_UNS_UNS_MOD_DIRECT
+   UNSPEC_VREVEV
    UNSPEC_VSLV4SI
    UNSPEC_VSLO
    UNSPEC_VSR
@@ -231,6 +232,11 @@
 ;; Vector negate
 (define_mode_iterator VNEG [V4SI V2DI])

+;; Vector reverse elements, uses define_mode_iterator VEC_A
+;; size in bytes of the vector element
+(define_mode_attr VEC_A_size [(V2DI "8") (V4SI "4") (V8HI "2")
+			      (V16QI "1") (V2DF "8") (V4SF "4")])
+
 ;; Vector move instructions.
 (define_insn "*altivec_mov<mode>"
   [(set (match_operand:VM2 0 "nonimmediate_operand" "=Z,v,v,?Y,?*r,?*r,v,v,?*r")
@@ -3727,6 +3733,31 @@
   DONE;
 }")

+;; Vector reverse elements
+(define_expand "altivec_vreve<mode>2"
+  [(set (match_operand:VEC_A 0 "register_operand" "=v")
+	(unspec:VEC_A [(match_operand:VEC_A 1 "register_operand" "v")]
+		      UNSPEC_VREVEV))]
+  "TARGET_ALTIVEC"
+{
+  int i, j, k, size, num_elements;
+  rtvec v = rtvec_alloc (16);
+  rtx mask = gen_reg_rtx (V16QImode);
+
+  size = <VEC_A_size>;
+  num_elements = 16 / size;
+  k = 0;
+
+  for (j = num_elements - 1; j >= 0; j--)
+    for (i = 0; i < size; i++)
+      RTVEC_ELT (v, i + j*size) = gen_rtx_CONST_INT (QImode, k++);
+
+  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
+  emit_insn (gen_altivec_vperm_<mode> (operands[0], operands[1],
+				       operands[1], mask));
+  DONE;
+})
+
 ;; Vector SIMD PEM v2.06c defines LVLX, LVLXL, LVRX, LVRXL,
 ;; STVLX, STVLXL, STVVRX, STVRXL are available only on Cell.
 (define_insn "altivec_lvlx"
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index 4682628..20974b4 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -1130,6 +1130,13 @@
 BU_ALTIVEC_1 (VUPKLSB,	      "vupklsb",	CONST,	altivec_vupklsb)
 BU_ALTIVEC_1 (VUPKLPX,	      "vupklpx",	CONST,	altivec_vupklpx)
 BU_ALTIVEC_1
Re: [committed] Fix -Werror=class-memaccess failures in jit testsuite (PR jit/81144)
On Tue, 2017-06-20 at 17:15 -0600, Martin Sebor wrote:
> On 06/20/2017 03:25 PM, David Malcolm wrote:
> > This patch fixes a couple of failures of the form:
> >
> >   error: 'void* memset(void*, int, size_t)' clearing an object of
> >   non-trivial type 'struct quadratic_test'; use assignment or
> >   value-initialization instead [-Werror=class-memaccess]
> >   note: 'struct quadratic_test' declared here
> >   cc1plus: all warnings being treated as errors
> >
> > seen within the jit testsuite, by using zero-initialization instead
> > of memset.
> >
> > (presumably introduced by r249234 aka
> > a324786b4ded9047d05463b4bce9d238b6c6b3ef)
> >
> > Successfully tested on x86_64-pc-linux-gnu; takes jit.sum from:
> >   # of expected passes		9211
> >   # of unexpected failures	2
> > to:
> >   # of expected passes		9349
> >
> > Martin: it's unclear to me what the benefit of the warning is for
> > these cases.  AIUI, it's complaining because the code is calling the
> > default ctor for struct quadratic_test, and then that object is
> > being clobbered by the memset.
> > But if I'm reading things right, the default ctor for this struct
> > zero-initializes all fields.  Can't the compiler simply optimize
> > away the redundant memset, and not issue a warning?
>
> Thanks for the info.
>
> -Wclass-memaccess is issued because struct quadratic_test contains
> members of classes that define a default ctor to initialize their
> private members.
>
> The premise behind the warning is that objects of types with
> user-defined default and copy ctors should be initialized by making
> use of their ctors, and those with private data members manipulated
> via member functions rather than by directly modifying their raw
> representation.  Using memset to bypass the default ctor doesn't
> begin the lifetime of an object, can violate invariants set up by it,
> and using it to overwrite private members breaks encapsulation.
> Examples of especially insidious errors include overwriting const
> data, references, or pointer to data members for which
> zero-initialization isn't the same as clearing their bytes.

If I'm reading my code correctly, all of the default ctors of all of the
members of this struct are "merely" initializing the pointer they wrap
to NULL.

So the ctors are initializing everything to NULL, and then the memset
redundantly re-initializes everything to 0 bits (I guess I was going for
a "belt and braces" approach to ensure that things are initialized).

> The warning runs early on in the C++ front end and has no knowledge
> of either the effects of the type's ctors, dtor, and copy assignment
> operator, or whether the raw memory function is called in lieu of
> initializing an object (e.g., in storage obtained from malloc or
> operator new), or as a shortcut to zero out its members, or when
> zeroing them out happens to be safe and doesn't actually do any
> of those bad things I mentioned above.

Aha: so at the place where the warning runs it's not possible to access
the ctors and tell that they're assigning NULL everywhere?

Might it be possible to convert the warning to work in a two-phase way,
where it first gathers up a vec of suspicious-looking modifications, and
then flushes them later, filtering against ctor information when it has
the latter?  (so that we don't have to warn for this case at -Wall?)

Alternatively maybe this is PEBCAK at my end; if so, maybe a case for
adding this to the changes.html page?  (and maybe adding some notes on
workarounds there, and/or to invoke.texi?)

> That said, I'm sorry (and a little surprised) that I missed these
> errors in my tests.  I thought I had all the languages covered by
> using
>
>   --enable-languages=all,ada,c,c++,fortran,go,lto,objc,obj-c++
>
> but I guess jit still isn't implied by all, even after Nathan's
> recent change to it.
> Let me add jit to my script (IIRC, I once had it there but it was
> causing some trouble and I took it out.)

Reading r248454 (aka 01b4453cde8f1871495955298043d9fb589e4a36), it looks
like "jit" is only included in "all" if you also pass
--enable-host-shared.

Presumably that's what happened.  Bother.

Thanks; hope this is constructive.
Dave
Re: RFC: stack/heap collision vulnerability and mitigation with GCC
Jeff Law wrote:
> But the stack pointer might have already been advanced into the guard
> page by the caller.  For the sake of argument assume the guard page is
> 0xf1000 and assume that our stack pointer at entry is 0xf1010 and that
> the caller hasn't touched the 0xf1000 page.
>
> If FrameSize >= 32, then the stores are going to hit the 0xf0000 page
> rather than the 0xf1000 page.  That's jumping the guard.  Thus we have
> to emit a probe prior to this stack allocation.

That's an incorrect ABI that allows adjusting the frame by 4080+32!  A
correct one might allow say 1024 bytes for outgoing arguments.

That means when you call a function, there is still
guard-page-size - 1024 bytes left that you can use to allocate locals.
With a 4K guard page that allows leaf functions up to 3KB, and depending
on the frame locals of 2-3KB plus up to 1024 bytes of outgoing arguments
without inserting any probes beyond the normal frame stores.

This design means almost no functions need additional probes.  Assuming
we're also increasing the guard page size to 64KB, it's cheap even for
large functions.

Wilco
Re: [committed] Fix -Werror=class-memaccess failures in jit testsuite (PR jit/81144)
On 06/20/2017 03:25 PM, David Malcolm wrote:
> This patch fixes a couple of failures of the form:
>
>   error: 'void* memset(void*, int, size_t)' clearing an object of
>   non-trivial type 'struct quadratic_test'; use assignment or
>   value-initialization instead [-Werror=class-memaccess]
>   note: 'struct quadratic_test' declared here
>   cc1plus: all warnings being treated as errors
>
> seen within the jit testsuite, by using zero-initialization instead
> of memset.
>
> (presumably introduced by r249234 aka
> a324786b4ded9047d05463b4bce9d238b6c6b3ef)
>
> Successfully tested on x86_64-pc-linux-gnu; takes jit.sum from:
>   # of expected passes		9211
>   # of unexpected failures	2
> to:
>   # of expected passes		9349
>
> Martin: it's unclear to me what the benefit of the warning is for
> these cases.  AIUI, it's complaining because the code is calling the
> default ctor for struct quadratic_test, and then that object is being
> clobbered by the memset.  But if I'm reading things right, the default
> ctor for this struct zero-initializes all fields.  Can't the compiler
> simply optimize away the redundant memset, and not issue a warning?

-Wclass-memaccess is issued because struct quadratic_test contains
members of classes that define a default ctor to initialize their
private members.

The premise behind the warning is that objects of types with
user-defined default and copy ctors should be initialized by making use
of their ctors, and those with private data members manipulated via
member functions rather than by directly modifying their raw
representation.  Using memset to bypass the default ctor doesn't begin
the lifetime of an object, can violate invariants set up by it, and
using it to overwrite private members breaks encapsulation.

Examples of especially insidious errors include overwriting const data,
references, or pointer to data members for which zero-initialization
isn't the same as clearing their bytes.
The warning runs early on in the C++ front end and has no knowledge of
either the effects of the type's ctors, dtor, and copy assignment
operator, or whether the raw memory function is called in lieu of
initializing an object (e.g., in storage obtained from malloc or
operator new), or as a shortcut to zero out its members, or when zeroing
them out happens to be safe and doesn't actually do any of those bad
things I mentioned above.

That said, I'm sorry (and a little surprised) that I missed these errors
in my tests.  I thought I had all the languages covered by using

  --enable-languages=all,ada,c,c++,fortran,go,lto,objc,obj-c++

but I guess jit still isn't implied by all, even after Nathan's recent
change to it.  Let me add jit to my script (IIRC, I once had it there
but it was causing some trouble and I took it out.)

Martin
Re: [PATCH][libgcc] Fix PR81080, build libgcov with large file support
On Wed, Jun 14, 2017 at 1:01 AM, Richard Biener wrote:
> The following patch makes sure we build the 32bit multilib libgcov
> with large file support on x86_64-linux.  libgcov.h ends up using
> auto-host.h via including tconfig.h which is only valid for the main
> multilib (and on x86_64 doesn't need explicit large-file support
> defines).  That libgcc ends up using that is probably from times where
> it wasn't at the toplevel; some files already include auto-target.h
> generated by libgcc configure but most do so after including
> tsystem.h, which is of course too late.  I suppose libgcc files
> shouldn't include tconfig.h from gcc/, but that's a change going too
> far for this bug ;)
>
> Thus, this makes libgcov.h include auto-target.h (but in the correct
> position) plus adds AC_SYS_LARGEFILE to libgcc's configure.
>
> With that I properly end up with 32bit libgcov.a using fopen64 and
> open64, as fopen/open seem to fail for some filesystems and inode
> numbers that do not fit 32 bits even if the files in question are not
> large.  Failure mode is:
>
>   int main(void) {
>     return 0;
>   }
>
>   niffler:/home/mue/src # gcc -m32 --coverage -o t testit.c
>   niffler:/home/mue/src # ./t
>   profiling:/home/mue/src/testit.gcda:Cannot open
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk and
> branches after a while?
>
> libgcov is the only libgcc module doing I/O.
>
> Other than libgcov, libgomp, libcilkrts, libmpx and libstdc++ are
> similarly affected (they use fopen on the 32bit multilib) but not
> fixed.  libubsan, libasan, libssp, libbacktrace and libgfortran use
> open.  While libgfortran configury has AC_SYS_LARGEFILE, the open use
> leaks in through libbacktrace (ubsan/asan might have the same issue,
> didn't investigate).  libbacktrace lacks AC_SYS_LARGEFILE.
>
> Thanks,
> Richard.
>
> 2017-06-14  Richard Biener
>
> 	PR gcov-profile/81080
> 	* configure.ac: Add AC_SYS_LARGEFILE.
> 	* libgcov.h: Include auto-target.h before tsystem.h to pick
> 	up _FILE_OFFSET_BITS which might differ for multilibs.
> 	* config.in: Regenerate.
> 	* configure: Likewise.

This is OK.  Thanks.

Ian
Re: [PATCH] LFS support for libbacktrace
On Wed, Jun 14, 2017 at 3:40 AM, Richard Biener wrote:
> This fixes the [f]open use in libgfortran.  Doesn't fix the ones in
> libsanitizer because those apparently use a copy, because they need
> to rename stuff...
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk
> and branches?
>
> Thanks,
> Richard.
>
> 2017-06-14  Richard Biener
>
> 	* configure.ac: Add AC_SYS_LARGEFILE.
> 	* config.h.in: Regenerate.
> 	* configure: Likewise.

This is OK everywhere.  Thanks.

Ian
Re: RFC: stack/heap collision vulnerability and mitigation with GCC
> But what you end up depending on is undocumented behavior of a
> particular kernel implementation.  That seems rather unwise.

And it's the single example of such a thing in the entire codebase?  I
don't know the code of the sanitizer much, but from the outside it looks
full of similar tricks...

> Which ABIs have that property?  I'll be the first to admit that I've
> purged much of my weird ABI memories.

The original Alpha ABI mentioned by Richard, IIRC, for example.

> Supporting ABIs which force us into a probe, then allocate strategy
> is actually easy.  We can use the existing -fstack-check code, but
> use the value 0 for STACK_CHECK_PROTECT.
>
> Just replace all uses of STACK_CHECK_PROTECT with calls to a wrapper.
>
> The wrapper looks like
>
>   if (magic_flag)
>     return STACK_CHECK_PROTECT;
>   else
>     return 0;
>
> That's precisely what we were planning to do prior to bumping against
> the valgrind issues.  That indirection makes it easy to ensure we
> didn't change the behavior of the existing stack-check for Ada, but
> also allows us to change the behavior for the new stack checking
> option.

Yes, that would seem the most straightforward thing to do, modulo
Valgrind.

> Ah, so if you're running on an alternate stack, then why probe ahead
> of need?  I thought the whole point of probing a couple pages ahead
> was to ensure you could take the signal in Ada.

We run on the alternate stack only when we do _not_ probe ahead, i.e. on
x86/x86-64 Linux.

> I've also wondered if a 2 page guard would solve some of these
> problems.  In the event of stack overflow, the kernel maps in one of
> the two pages for use by the signal handler.  But changing things at
> this point may not be worth the effort.

That was exactly the strategy used by Tru64 (so you needed to manually
unmap the page after you had recovered from the overflow).

--
Eric Botcazou
Re: [PATCH, AArch64] Add x86 intrinsic headers to GCC AArch64 target
On Tue, Jun 20, 2017 at 09:34:25PM +, Joseph Myers wrote:
> On Tue, 20 Jun 2017, Segher Boessenkool wrote:
>
> > > And as you see below the gcc.target tests have to be duplicated
> > > anyway.  Even if the C code is common there will be many
> > > differences in dg-options and dg-require-effective-target.
> > > Trying to common these implementations only creates more small
> > > files to manage.
> >
> > So somewhere in the near future we'll have to pull things apart
> > again, if we go with merging things now.
>
> The common part in the intrinsics implementation should be exactly
> the parts that can be implemented in GNU C without target-specific
> intrinsics being needed.  There should be nothing to pull apart if
> you start with the right things in the common header.  If a
> particular header has some functions that can be implemented in GNU C
> and some that need target-specific code, the generic GNU C functions
> should be in a common header, #included by the target-specific
> header.  The common header should have no conditionals on target
> architectures whatever (it might have conditionals on things like
> endianness).

I don't think there is much that will end up in the common header
eventually.  If it was possible to describe most of this in plain C,
and in such a way that it would optimise well, there would not *be*
these intrinsics.

> I don't expect many different effective-target / dg-add-options
> keywords to be needed for common tests (obviously, duplicating tests
> for each architecture wanting these intrinsics is generally a bad
> idea).

Yeah, I think it should be possible to share the tests, perhaps with
some added dg things (so that we don't have to repeat the same things
over and over).

Segher
Re: RFC: stack/heap collision vulnerability and mitigation with GCC
On 06/20/2017 06:27 AM, Richard Biener wrote:
> On Tue, Jun 20, 2017 at 2:20 PM, Uros Bizjak wrote:
>> On Tue, Jun 20, 2017 at 2:17 PM, Uros Bizjak wrote:
>>> On Tue, Jun 20, 2017 at 2:13 PM, Florian Weimer wrote:
>>>> On 06/20/2017 01:10 PM, Uros Bizjak wrote:
>>>>> 74,99%  a.out  a.out  [.] test_or
>>>>> 12,50%  a.out  a.out  [.] test_movb
>>>>> 12,50%  a.out  a.out  [.] test_movl
>>>>
>>>> Could you try notl/notb/negl/negb as well, please?
>>>
>>> These all have the same (long) runtime as test_or.
>>
>> Perhaps we can use "testb $0, %0"?  It doesn't write to the memory,
>> but otherwise has the same runtime as movb/movl.
>
> That sounds good, OTOH it's a matter of putting strain on the memory
> fetch or store side...  We'll get cacheline allocations in any case
> (but the memory will be used eventually).  Instead of test, a mere
> movb into a scratch register (aka, load instead of store) would work
> as well, apart from the need of a scratch register.

It was never clear to me why we always implement probes via stores --
though from a development standpoint a destructive store is useful.
I'd expect a tst to generate the desired SEGV.

How does that compare to the partial-allocation + push approach?

> We can also vectorize with scatters ;)  (just kidding)

:-)

jeff
Re: RFC: stack/heap collision vulnerability and mitigation with GCC
On 06/20/2017 02:16 AM, Eric Botcazou wrote:
> Right, because the Linux kernel for x86/x86-64 is the only OS flavor
> that doesn't let you probe the stack ahead of the stack pointer.  All
> other combinations of OS and architecture we tried (and it's quite a
> lot) do.

But what you end up depending on is undocumented behavior of a
particular kernel implementation.  That seems rather unwise.

>> After much poking around I concluded that we really need to
>> implement allocation and probing via a "moving sp" strategy.
>> Probing into unallocated areas runs afoul of valgrind, so that's a
>> non-starter.
>
> The reason why you cannot use this strategy on a global basis for
> stack checking is that some ABIs specify that you cannot update the
> stack pointer more than once to establish a frame; others don't
> explicitly care but...

Which ABIs have that property?  I'll be the first to admit that I've
purged much of my weird ABI memories.

Supporting ABIs which force us into a probe, then allocate strategy is
actually easy.  We can use the existing -fstack-check code, but use the
value 0 for STACK_CHECK_PROTECT.

Just replace all uses of STACK_CHECK_PROTECT with calls to a wrapper.
The wrapper looks like

  if (magic_flag)
    return STACK_CHECK_PROTECT;
  else
    return 0;

That's precisely what we were planning to do prior to bumping against
the valgrind issues.  That indirection makes it easy to ensure we didn't
change the behavior of the existing stack-check for Ada, but also allows
us to change the behavior for the new stack checking option.

>> Allocating stack space, then probing the pages within the space is
>> vulnerable to async signal delivery between the allocation point and
>> the probe point.  If that occurs the signal handler could end up
>> running on a stack that has collided with the heap.
>
> ...yes, there are difficulties with the "moving sp" strategy.
>
>> Finally, we need not ensure the ability to handle a signal at stack
>> overflow.  It is fine for the kernel to halt the process immediately
>> if it detects a reference to the guard page.
>
> In Ada it's the opposite and we use an alternate signal stack in this
> case.

Ah, so if you're running on an alternate stack, then why probe ahead of
need?  I thought the whole point of probing a couple pages ahead was to
ensure you could take the signal in Ada.

I've also wondered if a 2 page guard would solve some of these problems.
In the event of stack overflow, the kernel maps in one of the two pages
for use by the signal handler.  But changing things at this point may
not be worth the effort.

>> Michael Matz has suggested some generic support so that we don't
>> have to write target specific code for each and every target we
>> support.  The idea is to have a helper function which allocates and
>> probes stack space.  The port can then call that helper function
>> from within its prologue generator.  I think this is wise -- I
>> wouldn't want to go through this exercise on every port.
>
> Interesting.  We never convinced ourselves that this was worthwhile.

The idea is not to have to write probing code for all those embedded
targets.  I doubt anyone really wants to write probes for the mn103,
rl78, mep, etc.  With Matz's little helper routine, they just have to
pick the right point in their prologue code to call the helper.  At
least that's the theory.

Jeff
Re: RFC: stack/heap collision vulnerability and mitigation with GCC
On 06/20/2017 03:27 AM, Richard Earnshaw (lists) wrote:
> On 19/06/17 18:07, Jeff Law wrote:
>> As some of you are likely aware, Qualys has just published fairly
>> detailed information on using stack/heap clashes as an attack vector.
>> Eric B, Michael M -- sorry I couldn't say more when I contacted you
>> about -fstack-check and some PPC specific stuff.  This has been under
>> embargo for the last month.
>>
>> http://www.openwall.com/lists/oss-security/2017/06/19/1
>>
> [...]
>> aarch64 is significantly worse.  There are no implicit probes we can
>> exploit.  Furthermore, the prologue may allocate stack space 3-4
>> times.  So we have to track the distance to the most recent probe
>> and when that distance grows too large, we have to emit a probe.  Of
>> course we have to make worst case assumptions at function entry.
>
> I'm not sure I understand what you're saying here.  According to the
> comment above aarch64_expand_prologue, the stack frame looks like:
>
>	+-------------------------------+
>	|                               |
>	|  incoming stack arguments     |
>	|                               |
>	+-------------------------------+
>	|                               | <-- incoming stack pointer (aligned)
>	|  callee-allocated save area   |
>	|  for register varargs         |
>	|                               |
>	+-------------------------------+
>	|  local variables              | <-- frame_pointer_rtx
>	|                               |
>	+-------------------------------+
>	|  padding0                     | \
>	+-------------------------------+  |
>	|  callee-saved registers       |  | frame.saved_regs_size
>	+-------------------------------+  |
>	|  LR'                          |  |
>	+-------------------------------+  |
>	|  FP'                          | / <- hard_frame_pointer_rtx (aligned)
>	+-------------------------------+
>	|  dynamic allocation           |
>	+-------------------------------+
>	|  padding                      |
>	+-------------------------------+
>	|  outgoing stack arguments     | <-- arg_pointer
>	|                               |
>	+-------------------------------+
>	|                               | <-- stack_pointer_rtx (aligned)
>
> Now for the majority of frames the amount of local variables is small
> and there is neither dynamic allocation nor the need for outgoing
> local variables.  In this case the first instruction in the function
> is
>
>	stp	fp, lr, [sp, #-FrameSize]!

But the stack pointer might have already been advanced into the guard
page by the caller.
For the sake of argument assume the guard page is 0xf1000 and assume
that our stack pointer at entry is 0xf1010 and that the caller hasn't
touched the 0xf1000 page.

If FrameSize >= 32, then the stores are going to hit the 0xf0000 page
rather than the 0xf1000 page.  That's jumping the guard.  Thus we have
to emit a probe prior to this stack allocation.

Now because this instruction stores at *new_sp, it does allow us to
eliminate future probes and I do take advantage of that in my code.

The implementation is actually rather simple.  We keep a conservative
estimate of the offset of the last known probe relative to the stack
pointer.  At entry we have to assume the offset is:

  PROBE_INTERVAL - (STACK_BOUNDARY / BITS_PER_UNIT)

A stack allocation increases the offset.  A store into the stack
decreases the offset.

A probe is required before an allocation that increases the offset to
>= PROBE_INTERVAL.

An allocation + store instruction such as shown does both, but can (and
is) easily modeled.  The only tricky case here is that you can't naively
break it up into an allocation and store, as that can force an
unnecessary probe (say if the allocated space is just enough to hold the
stored objects).

> If the locals area gets slightly larger (>= 512 bytes) then the
> sequence becomes
>
>	sub	sp, sp, #FrameSize
>	stp	fp, lr, [sp]
>
> But again this acts as a sufficient implicit probe provided that
> FrameSize does not exceed the probe interval.

And again, the store acts as a probe which can eliminate potential
probes that might occur later in the instruction stream.  But if the
allocation by the "sub" instruction causes our running offset to cross
PROBE_INTERVAL, then we must emit a probe prior to the "sub"
instruction.

Hopefully it'll be clearer when I post the code :-)

aarch64 is one that will need updating, as all work to-date has been
with Red Hat's 4.8 compiler with the aarch64 code generator bolted onto
the side.

So perhaps "no implicit probes" was too strong.
It would probably be better stated "no implicit probes in the caller".
We certainly use stores in the prologue to try and eliminate probes.  In
fact, we try harder on aarch64 than any other target.

Jeff
Re: [i386] __builtin_ia32_stmxcsr could be pure
Ping.

On Sat, 3 Jun 2017, Marc Glisse wrote:

> Hello,
>
> I don't think Richard's "sounds good" was meant as "ok to commit".
> Does an x86 maintainer want to approve or criticize the patch?
>
> https://gcc.gnu.org/ml/gcc-patches/2017-05/msg02009.html

On Fri, 26 May 2017, Richard Biener wrote:

> On Fri, May 26, 2017 at 10:55 AM, Marc Glisse wrote:
> > Hello,
> >
> > glibc marks fegetround as a pure function.  On x86, people tend to
> > use _MM_GET_ROUNDING_MODE instead, which could benefit from the
> > same.  I think it is safe, but a second opinion would be welcome.
>
> Sounds good.  The important part is to keep the dependency to
> SET_ROUNDING_MODE which is done via claiming both touch global
> memory.
>
> > I could have handled just this builtin, but it seemed better to
> > provide def_builtin_pure (like "const" already has) since there
> > should be other builtins that can be marked this way (maybe the
> > gathers?).
>
> Should work for gathers.  They could even use stronger guarantees,
> namely a fnspec with "..R" (the pointer argument is only read from
> directly).  Similarly scatter can use ".W" (the pointer argument is
> only written to directly).
>
> Richard.

Bootstrap+testsuite on x86_64-pc-linux-gnu with default languages.

2017-05-29  Marc Glisse

	gcc/
	* config/i386/i386.c (struct builtin_isa): New field pure_p.
	Reorder for compactness.
	(def_builtin, def_builtin2, ix86_add_new_builtins): Handle pure_p.
	(def_builtin_pure, def_builtin_pure2): New functions.
	(ix86_init_mmx_sse_builtins) [__builtin_ia32_stmxcsr]: Mark as
	pure.

	gcc/testsuite/
	* gcc.target/i386/getround.c: New file.

--
Marc Glisse
Re: [PATCH, AArch64] Add x86 intrinsic headers to GCC AArch64 target
On Tue, 20 Jun 2017, Segher Boessenkool wrote:

> > And as you see below the gcc.target tests have to be duplicated
> > anyway.  Even if the C code is common there will be many differences
> > in dg-options and dg-require-effective-target.  Trying to common
> > these implementations only creates more small files to manage.
>
> So somewhere in the near future we'll have to pull things apart
> again, if we go with merging things now.

The common part in the intrinsics implementation should be exactly the
parts that can be implemented in GNU C without target-specific
intrinsics being needed.  There should be nothing to pull apart if you
start with the right things in the common header.  If a particular
header has some functions that can be implemented in GNU C and some
that need target-specific code, the generic GNU C functions should be
in a common header, #included by the target-specific header.  The
common header should have no conditionals on target architectures
whatever (it might have conditionals on things like endianness).

I don't expect many different effective-target / dg-add-options
keywords to be needed for common tests (obviously, duplicating tests
for each architecture wanting these intrinsics is generally a bad
idea).

--
Joseph S. Myers
jos...@codesourcery.com
Re: [PATCH] [i386] Enable Control-flow Enforcement Technology (CET).
This patch is completely missing documentation (in *.texi files) of the
new options, attribute, built-in functions, etc.

You appear to be adding quite target-specific things to the
architecture-independent compiler.  If the attribute, for example, is
to be architecture-independent, the documentation needs to define
semantics for it that make sense on any architecture (or any
architecture providing such features), not just on x86; the patch
submission needs to justify the design choices of what is
target-specific and what is target-independent.  (Cf. MPX, where there
are various architecture-independent features for which a software
implementation would be logically possible, although actually the only
implementation of those features in GCC is for MPX hardware.)

I don't think this patch would even build for non-x86 targets, because
you're putting completely x86-specific references such as TARGET_CET
and gen_nop_endbr in target-independent files.

--
Joseph S. Myers
jos...@codesourcery.com
[committed] Fix bootstrap on armv6-*-freebsd
Hi All,

I committed the chunk below to fix bootstrap on armv6*-*-freebsd.

Andreas

2017-06-20  Andreas Tobler

	* config.gcc (armv6*-*-freebsd*): Change the target_cpu_cname to
	arm1176jzf-s.

Index: config.gcc
===
--- config.gcc	(revision 249427)
+++ config.gcc	(working copy)
@@ -1089,7 +1089,7 @@
 	tm_file="${tm_file} arm/bpabi.h arm/freebsd.h arm/aout.h arm/arm.h"
 	case $target in
 	armv6*-*-freebsd*)
-	    target_cpu_cname="arm1176jzfs"
+	    target_cpu_cname="arm1176jzf-s"
 	    tm_defines="${tm_defines} TARGET_FREEBSD_ARMv6=1"
 	    if test $fbsd_major -ge 11; then
 		tm_defines="${tm_defines} TARGET_FREEBSD_ARM_HARD_FLOAT=1"
Re: [PATCH, AArch64] Add x86 intrinsic headers to GCC AArch64 target
On Tue, Jun 20, 2017 at 01:51:24PM -0500, Steven Munroe wrote:
> I am not sure this works or is even a good idea.
>
> As an accident bmiintrin.h can be implemented as C code or common
> builtins. But bmi2intrin.h depends on __builtin_bpermd which to my
> knowledge is PowerISA only.

Right. And the plan is to only support 64-bit, LE, POWER8 and above (I hope I got that right -- the point is, only systems with newish features, not something generic even when considering rs6000 alone).

> As I work on mmx, sse, sse2, etc it gets more complicated. There are
> many x86 intrinsic instances that require altivec.h-unique intrinsics to
> implement efficiently for the power64le target and some inline __asm.

Yeah. And even then the expectation is not to get perfectly good performance, only something good enough as a starting point for a porting effort.

> Net, the current sample size so far is too small to make a reasonable
> assessment.

Right! And we have only two implementations so far, as well.

> And as you see below the gcc.target tests have to be duplicated
> anyway. Even if the C code is common there will be many differences in
> dg-options and dg-require-effective-target. Trying to common these
> implementations only creates more small files to manage.

So somewhere in the near future we'll have to pull things apart again, if we go with merging things now. It's not like the "common" parts will see much (if any) maintenance, anyway... The interface is already set in stone, that's the whole point of this all.

Segher
Re: [PATCH/AARCH64] Improve/correct ThunderX 1 cost model for Arith_shift
On Mon, Jun 19, 2017 at 2:00 PM, Andrew Pinski wrote:
> On Wed, Jun 7, 2017 at 10:16 AM, James Greenhalgh wrote:
>> On Fri, Dec 30, 2016 at 10:05:26PM -0800, Andrew Pinski wrote:
>>> Hi,
>>> Currently for the following function:
>>> int f(int a, int b)
>>> {
>>>   return a + (b << 7);
>>> }
>>>
>>> GCC produces:
>>> add w0, w0, w1, lsl 7
>>> But for ThunderX 1, it is better if the instruction was split, allowing
>>> better scheduling to happen in most cases; the latency is the same. I
>>> get a small improvement in coremarks, ~1%.
>>>
>>> Currently the code does not take into account Arith_shift even though
>>> the comment:
>>> /* Strip any extend, leave shifts behind as we will
>>>    cost them through mult_cost.  */
>>> says it does not strip out the shift; aarch64_strip_extend does, and
>>> always has since the back-end was added to GCC.
>>>
>>> Once I fixed the code around aarch64_strip_extend, I got a regression
>>> for ThunderX 1 as some shifts/extends (left shifts <= 4 and/or zero
>>> extends) are considered free, so I needed to add a new tuning flag.
>>>
>>> Note I will get an even bigger improvement for ThunderX 2 CN99XX, but I
>>> have not measured it yet, as I have not made the change to
>>> aarch64-cost-tables.h yet; I am waiting for approval of the renaming
>>> patch first before submitting any of the cost table changes. Also I
>>> noticed this problem with this tuning first and then looked back at
>>> what I needed to do for ThunderX 1.
>>>
>>> OK? Bootstrapped and tested on aarch64-linux-gnu without any
>>> regressions (both with and without --with-cpu=thunderx).
>>
>> This is mostly OK, but I don't like the name "easy"_shift_extend. Cheap
>> or free seems better. I have some other minor points below.
>
> Ok, that seems like a good idea. I used "easy" since that was the
> wording our hardware folks had come up with. I am changing the
> comments to make clearer when this flag should be used.
> I should have a new patch out by the end of today.
Due to the LSE ICE which I reported in the other thread, it took me longer to send out a new patch. Anyway, here is the updated patch with the changes requested.

OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

	* config/aarch64/aarch64-cost-tables.h (thunderx_extra_costs):
	Increment Arith_shift and Arith_shift_reg by 1.
	* config/aarch64/aarch64-tuning-flags.def (cheap_shift_extend):
	New tuning flag.
	* config/aarch64/aarch64.c (thunderx_tunings): Enable
	AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND.
	(aarch64_strip_extend): Add new argument and test for it.
	(aarch64_cheap_mult_shift_p): New function.
	(aarch64_rtx_mult_cost): Call aarch64_cheap_mult_shift_p and don't
	add a cost if it is true.
	Update calls to aarch64_strip_extend.
	(aarch64_rtx_costs): Update calls to aarch64_strip_extend.

> Thanks,
> Andrew
>
>>> Index: config/aarch64/aarch64-tuning-flags.def
>>> ===
>>> --- config/aarch64/aarch64-tuning-flags.def (revision 243974)
>>> +++ config/aarch64/aarch64-tuning-flags.def (working copy)
>>> @@ -35,4 +35,8 @@ two load/stores are not at least 8 byte
>>>  pairs.  */
>>>  AARCH64_EXTRA_TUNING_OPTION ("slow_unaligned_ldpw", SLOW_UNALIGNED_LDPW)
>>>
>>> +/* Logical shift left <=4 with/without zero extend are considered easy
>>> +   extended, also zero extends without the shift.  */
>>
>> I'm struggling to parse this comment. "also zero extends without the shift"
>> is what is getting me. I'm also not certain I follow when I should set this
>> flag. If all shifts are cheap/free on my platform, should I set this flag?
>>
>>> +AARCH64_EXTRA_TUNING_OPTION ("easy_shift_extend", EASY_SHIFT_EXTEND)
>>> +
>>>  #undef AARCH64_EXTRA_TUNING_OPTION
>>
>>> +
>>> +/* Return true iff X is an easy shift without a sign extend. */
>>> +
>>
>> Again I don't like calling <= 4 "easy", it feels imprecise.
>>
>> Thanks,
>> James
>>

Index: gcc/config/aarch64/aarch64-cost-tables.h
===
--- gcc/config/aarch64/aarch64-cost-tables.h	(revision 249424)
+++ gcc/config/aarch64/aarch64-cost-tables.h	(working copy)
@@ -136,8 +136,8 @@ const struct cpu_cost_table thunderx_ext
     0, /* Logical. */
     0, /* Shift. */
     0, /* Shift_reg. */
-    COSTS_N_INSNS (1), /* Arith_shift. */
-    COSTS_N_INSNS (1), /* Arith_shift_reg. */
+    COSTS_N_INSNS (1)+1, /* Arith_shift. */
+    COSTS_N_INSNS (1)+1, /* Arith_shift_reg. */
     COSTS_N_INSNS (1), /* UNUSED: Log_shift. */
     COSTS_N_INSNS (1), /* UNUSED: Log_shift_reg. */
     0, /* Extend. */
Index: gcc/config/aarch64/aarch64-tuning-flags.def
===
--- gcc/config/aarch64/aarch64-tuning-flags.def	(revision 249424)
+++
NOP conversions in X+CST+CST
Hello, now that FRE was fixed to avoid infinite recursion, this patch passes bootstrap+testsuite on x86_64-pc-linux-gnu multilib with all languages (including ada). This isn't exactly the patch that was reverted, because the previous patch did not actually handle vectors properly. It still shouldn't interfere with the patch by Robin Dapp which IIUC only handles the case where the conversion is an extension. 2017-06-21 Marc Glissegcc/ * match.pd (nop_convert): New predicate. ((A +- CST1) +- CST2): Allow some NOP conversions. gcc/testsuite/ * gcc.dg/tree-ssa/addadd.c: Un-XFAIL. * gcc.dg/tree-ssa/addadd-2.c: New file. -- Marc GlisseIndex: gcc/match.pd === --- gcc/match.pd (revision 249413) +++ gcc/match.pd (working copy) @@ -67,20 +67,34 @@ along with GCC; see the file COPYING3. BUILT_IN_L##FN \ BUILT_IN_LL##FN) \ (define_operator_list X##FN##L BUILT_IN_I##FN##L \ BUILT_IN_L##FN##L \ BUILT_IN_LL##FN##L) DEFINE_INT_AND_FLOAT_ROUND_FN (FLOOR) DEFINE_INT_AND_FLOAT_ROUND_FN (CEIL) DEFINE_INT_AND_FLOAT_ROUND_FN (ROUND) DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) + +/* As opposed to convert?, this still creates a single pattern, so + it is not a suitable replacement for convert? in all cases. */ +(match (nop_convert @0) + (convert @0) + (if (tree_nop_conversion_p (type, TREE_TYPE (@0) +(match (nop_convert @0) + (view_convert @0) + (if (VECTOR_TYPE_P (type) && VECTOR_TYPE_P (TREE_TYPE (@0)) + && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0)) + && tree_nop_conversion_p (TREE_TYPE (type), TREE_TYPE (TREE_TYPE (@0)) +/* This one has to be last, or it shadows the others. */ +(match (nop_convert @0) + @0) /* Simplifications of operations with one constant operand and simplifications to constants or single values. 
*/ (for op (plus pointer_plus minus bit_ior bit_xor) (simplify (op @0 integer_zerop) (non_lvalue @0))) /* 0 +p index -> (type)index */ @@ -1289,32 +1303,58 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (simplify (plus:c (minus @0 @1) @1) @0) (simplify (minus @0 (plus:c @0 @1)) (negate @1)) (simplify (minus @0 (minus @0 @1)) @1) - /* (A +- CST1) +- CST2 -> A + CST3 */ + /* (A +- CST1) +- CST2 -> A + CST3 + Use view_convert because it is safe for vectors and equivalent for + scalars. */ (for outer_op (plus minus) (for inner_op (plus minus) + neg_inner_op (minus plus) (simplify - (outer_op (inner_op @0 CONSTANT_CLASS_P@1) CONSTANT_CLASS_P@2) - /* If the constant operation overflows we cannot do the transform - as we would introduce undefined overflow, for example - with (a - 1) + INT_MIN. */ - (with { tree cst = const_binop (outer_op == inner_op - ? PLUS_EXPR : MINUS_EXPR, type, @1, @2); } - (if (cst && !TREE_OVERFLOW (cst)) - (inner_op @0 { cst; } )) + (outer_op (nop_convert (inner_op @0 CONSTANT_CLASS_P@1)) + CONSTANT_CLASS_P@2) + /* If one of the types wraps, use that one. */ + (if (!ANY_INTEGRAL_TYPE_P (type) || TYPE_OVERFLOW_WRAPS (type)) + (if (outer_op == PLUS_EXPR) + (plus (view_convert @0) (inner_op @2 (view_convert @1))) + (minus (view_convert @0) (neg_inner_op @2 (view_convert @1 + (if (!ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0)) + || TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0))) + (if (outer_op == PLUS_EXPR) + (view_convert (plus @0 (inner_op (view_convert @2) @1))) + (view_convert (minus @0 (neg_inner_op (view_convert @2) @1 + /* If the constant operation overflows we cannot do the transform + directly as we would introduce undefined overflow, for example + with (a - 1) + INT_MIN. */ + (if (types_match (type, @0)) + (with { tree cst = const_binop (outer_op == inner_op + ? PLUS_EXPR : MINUS_EXPR, + type, @1, @2); } + (if (cst && !TREE_OVERFLOW (cst)) + (inner_op @0 { cst; } ) + /* X+INT_MAX+1 is X-INT_MIN. 
*/ + (if (INTEGRAL_TYPE_P (type) && cst + && wi::eq_p (cst, wi::min_value (type))) + (neg_inner_op @0 { wide_int_to_tree (type, cst); }) + /* Last resort, use some unsigned type. */ + (with { tree utype = unsigned_type_for (type); } + (view_convert (inner_op + (view_convert:utype @0) + (view_convert:utype + { drop_tree_overflow (cst); }) /* (CST1 - A) +- CST2 -> CST3 - A */ (for outer_op (plus minus) (simplify (outer_op (minus CONSTANT_CLASS_P@1 @0) CONSTANT_CLASS_P@2) (with { tree cst = const_binop (outer_op, type, @1, @2); } (if (cst && !TREE_OVERFLOW (cst)) (minus { cst; } @0) /* CST1 - (CST2 - A) -> CST3 + A */ Index: gcc/testsuite/gcc.dg/tree-ssa/addadd-2.c === --- gcc/testsuite/gcc.dg/tree-ssa/addadd-2.c (nonexistent) +++
[PATCH] [i386] Enable Control-flow Enforcement Technology (CET).
Control-flow Enforcement Technology (CET) provides the following capabilities to defend against ROP/JOP-style control-flow subversion attacks:

- Shadow Stack - return address protection to defend against Return Oriented Programming,
- Indirect branch tracking - free branch protection to defend against Jump/Call Oriented Programming.

Details are described in the doc https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf

This patch enables CET in the compiler only (gcc directory). The executables built with the CET-enabled compiler can run successfully on non-CET i386 HW, as the new instructions execute as NOPs there. Functional testing can be done through the Intel® Software Development Emulator (Intel® SDE). There will be more patches to enable the remaining intrinsics, and to support CET in the compiler libraries (exception handling) and in glibc.

The patch adds 1) new options to control the technology, 2) three new instructions (endbr, rdssp, incssp) and intrinsics, 3) a pass to generate the endbr instruction, 4) a new 'notrack' attribute for functions and pointers to functions, and code generation for it, 5) shadow stack processing in the setjmp/longjmp builtins. Basic functional tests are added. Bootstrap is done successfully w/o and w/ the CET option (-mcet).

gcc/
	* builtins.c (expand_builtin_setjmp_setup): Add saving shadow stack
	pointer in jmpbuf using the rdssp instruction.
	(expand_builtin_longjmp): Add adjusting shadow stack pointer using
	the incssp instruction.
	* c-family/c-attribs.c (handle_notrack_attribute): New function.
	(c_common_attribute_table): Add a 'notrack' attribute.
	* calls.c (emit_call_1): Set REG_CALL_NOTRACK on call insn.
	(flags_from_decl_or_type): Retrieve notrack attribute from a decl.
	(expand_call): Retrieve notrack attribute from a decl.
	* combine.c: Handle REG_CALL_NOTRACK.
	* common/config/i386/i386-common.c (OPTION_MASK_ISA_CET_SET,
	OPTION_MASK_ISA_CET_UNSET): New.
	(ix86_handle_option): Handle OPT_mcet.
	* config.gcc: Add cetintrin.h.
	* config/i386/cetintrin.h: New file.
	* config/i386/cpuid.h (bit_CET): New bit.
	* config/i386/driver-i386.c (host_detect_local_cpu): Detect cet.
	* config/i386/i386-builtin.def (__builtin_ia32_rdsspd,
	__builtin_ia32_rdsspq, __builtin_ia32_incsspd,
	__builtin_ia32_incsspq): New intrinsics.
	* config/i386/i386-c.c (ix86_target_macros_internal): Define __CET__.
	* config/i386/i386-protos.h (ix86_notrack_prefixed_insn_p): New.
	* config/i386/i386.c (ix86_target_string): Add -mcet.
	(ix86_valid_target_attribute_inner_p): Add cet.
	(ix86_print_operand): Output notrack.  BDESC_VERIFYS for CET
	intrinsics.
	(ix86_init_mmx_sse_builtins): Define CET intrinsics.
	(x86_output_mi_thunk): Add endbr instruction.
	(ix86_notrack_prefixed_insn_p): New function.
	* config/i386/i386.h (TARGET_CET, TARGET_CET_P): New.
	* config/i386/i386.md (define_insn "rdssp"): New instruction.
	(define_insn "incssp"): Likewise.
	(define_insn "nop_endbr"): Likewise.
	* config/i386/i386.opt (mcet, mcet-switch, mcet-indbranch-tracking,
	mcet-shadow-stack): New options.
	* config/i386/immintrin.h: Add include <cetintrin.h>.
	* final.c (rest_of_handle_cet): New.
	(pass_data_handle_cet): New.
	(pass_handle_cet): New.
	(make_pass_handle_cet): New.
	* passes.def (pass_handle_cet): Add pass.
	* reg-notes.def (CALL_NOTRACK): New note for notrack.
	* timevar.def (TV_CET): New.
	* tree-core.h (ECF_NOTRACK): New.
	* tree-pass.h (make_pass_handle_cet): New.

gcc/testsuite/
	* gcc.target/i386/cet-intrin.c: New test.
	* gcc.target/i386/cet-label.c: Likewise.
	* gcc.target/i386/cet-notrack.c: Likewise.
	* gcc.target/i386/cet-sjlj.c: Likewise.
	* gcc.target/i386/cet-switch-1.c: Likewise.
	* gcc.target/i386/cet-switch-2.c: Likewise.

Attachment: 0001-Enable-Control-flow-Enforcement-Technology-CET.patch
[committed] Fix -Werror=class-memaccess failures in jit testsuite (PR jit/81144)
This patch fixes a couple of failures of the form:

error: 'void* memset(void*, int, size_t)' clearing an object of non-trivial type 'struct quadratic_test'; use assignment or value-initialization instead [-Werror=class-memaccess]
note: 'struct quadratic_test' declared here
cc1plus: all warnings being treated as errors

seen within the jit testsuite, by using zero-initialization instead of memset. (Presumably introduced by r249234, aka a324786b4ded9047d05463b4bce9d238b6c6b3ef.)

Successfully tested on x86_64-pc-linux-gnu; takes jit.sum from:

# of expected passes		9211
# of unexpected failures	2

to:

# of expected passes		9349

Martin: it's unclear to me what the benefit of the warning is for these cases. AIUI, it's complaining because the code is calling the default ctor for struct quadratic_test, and then that object is being clobbered by the memset. But if I'm reading things right, the default ctor for this struct zero-initializes all fields. Can't the compiler simply optimize away the redundant memset, and not issue a warning?

gcc/testsuite/ChangeLog:
	PR jit/81144
	* jit.dg/test-operator-overloading.cc (make_test_quadratic):
	Replace memset call with zero-initialization.
	* jit.dg/test-quadratic.cc (make_test_quadratic): Likewise.
---
 gcc/testsuite/jit.dg/test-operator-overloading.cc | 3 +--
 gcc/testsuite/jit.dg/test-quadratic.cc            | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/jit.dg/test-operator-overloading.cc b/gcc/testsuite/jit.dg/test-operator-overloading.cc
index cbb1e98..f57b3fc 100644
--- a/gcc/testsuite/jit.dg/test-operator-overloading.cc
+++ b/gcc/testsuite/jit.dg/test-operator-overloading.cc
@@ -272,8 +272,7 @@ make_test_quadratic (quadratic_test &testcase)
 void
 create_code (gcc_jit_context *ctxt, void *user_data)
 {
-  struct quadratic_test testcase;
-  memset (&testcase, 0, sizeof (testcase));
+  struct quadratic_test testcase = {};
   testcase.ctxt = ctxt;
   make_types (testcase);
   make_sqrt (testcase);
diff --git a/gcc/testsuite/jit.dg/test-quadratic.cc b/gcc/testsuite/jit.dg/test-quadratic.cc
index f347669..61b5cdd 100644
--- a/gcc/testsuite/jit.dg/test-quadratic.cc
+++ b/gcc/testsuite/jit.dg/test-quadratic.cc
@@ -328,8 +328,7 @@ make_test_quadratic (quadratic_test &testcase)
 void
 create_code (gcc_jit_context *ctxt, void *user_data)
 {
-  struct quadratic_test testcase;
-  memset (&testcase, 0, sizeof (testcase));
+  struct quadratic_test testcase = {};
   testcase.ctxt = ctxt;
   make_types (testcase);
  make_sqrt (testcase);
--
1.8.5.3
Re: RFC: stack/heap collision vulnerability and mitigation with GCC
On 06/20/2017 02:37 PM, Eric Botcazou wrote:
>> But then valgrind won't be able to find bugs in the code (storing and later
>> reading stuff into the volatile parts of the stack that could be overwritten
>> by any asynchronous signal). GCC had various bugs in this area and
>> valgrind has been able to report those. Unless the probe instruction is
>> sufficiently magic that it won't usually appear in other code.
>
> Right, maybe this magic aspect was the reason why it was initially implemented
> like that for Cygwin, at least you know that orl $0 is meant to be special.
>
>> Only checking loads below the stack is not sufficient, some buggy code could
>> e.g. store some data below stack pointer (below red zone if any), then
>> subtract stack and then try to read it, etc.
>>
>> Not to mention that it isn't just false positive messages with current
>> valgrind on -fstack-check code, e.g. on ppc64 it just crashes.
>
> The reasoning seems weird though since, apart from x86/x86-64, you're going to
> gratuitously inflict this painful "moving sp" thing on every program compiled
> on Linux because of just one tool that you can adapt.

I don't see MOVING_SP as painful, except perhaps on aarch64. On something like PPC, MOVING_SP turns out to be exceedingly clean.

jeff
Re: RFC: stack/heap collision vulnerability and mitigation with GCC
> But then valgrind won't be able to find bugs in the code (storing and later
> reading stuff into the volatile parts of the stack that could be overwritten
> by any asynchronous signal). GCC had various bugs in this area and
> valgrind has been able to report those. Unless the probe instruction is
> sufficiently magic that it won't usually appear in other code.

Right, maybe this magic aspect was the reason why it was initially implemented like that for Cygwin; at least you know that orl $0 is meant to be special.

> Only checking loads below the stack is not sufficient, some buggy code could
> e.g. store some data below stack pointer (below red zone if any), then
> subtract stack and then try to read it, etc.
>
> Not to mention that it isn't just false positive messages with current
> valgrind on -fstack-check code, e.g. on ppc64 it just crashes.

The reasoning seems weird though since, apart from x86/x86-64, you're going to gratuitously inflict this painful "moving sp" thing on every program compiled on Linux because of just one tool that you can adapt.

-- Eric Botcazou
[PATCH, VAX] Correct ffs instruction constraint
VAX's FFS, as a variable-length bit field instruction, uses a "base" operand of type "vb", meaning "byte address". "base" can be 32 bits (SI) and, due to the definition of ffssi2/__builtin_ffs() with the operand constraint "m", code can be emitted which incorrectly implies a mode-dependent (= longword, for the 32-bit operand) address.

File scsipi_base.c compiled with -Os for our VAX install kernel shows:

	ffs $0x0,$0x20,0x50(r11)[r0],r9

Apparently, 0x50(r11)[r0] as a longword address is assumed to be evaluated in longword context by FFS, but the instruction expects a byte address. Our fix is to change the operand constraint from "m" to "Q", i.e. "operand is a MEM that does not have a mode-dependent address", which results in:

	moval 0x50(r11)[r0],r1
	ffs $0x0,$0x20,(r1),r9

MOVAL evaluates the source operand/address in longword context, so it effectively converts the longword address to a byte address for FFS.

See NetBSD PR port-vax/51761 (http://gnats.netbsd.org/51761) and the discussion on the port-vax mailing list (http://mail-index.netbsd.org/port-vax/2017/01/06/msg002954.html).

Changelog:
2017-06-20  Maya Rashish

	* gcc/config/vax/builtins.md: Correct ffssi2_internal instruction
	constraint.

---
 gcc/config/vax/builtins.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/vax/builtins.md b/gcc/config/vax/builtins.md
index fb0f69acb..b78fb5616 100644
--- a/gcc/config/vax/builtins.md
+++ b/gcc/config/vax/builtins.md
@@ -41,7 +41,7 @@
 (define_insn "ffssi2_internal"
   [(set (match_operand:SI 0 "nonimmediate_operand" "=rQ")
-	(ffs:SI (match_operand:SI 1 "general_operand" "nrmT")))
+	(ffs:SI (match_operand:SI 1 "general_operand" "nrQT")))
    (set (cc0)
 	(match_dup 0))]
   ""
   "ffs $0,$32,%1,%0")
--
2.13.1
Re: [PATCH] [SPARC] Add a workaround for the LEON3FT store-store errata
From: Eric Botcazou
Date: Tue, 20 Jun 2017 21:19:37 +0200

>> I'm fine with this change.
>
> I disagree, the existing policy is to avoid switches like -mfix-b2bst and use
> -mfix-<cpu> where <cpu> is a CPU (here <cpu> could be ut699e or ut700).

Ok, I was not aware of that policy. But this should be easy for the submitter to fix.
Re: RFC: stack/heap collision vulnerability and mitigation with GCC
On Tue, Jun 20, 2017 at 10:21:14AM +0200, Eric Botcazou wrote:
>> Out of curiosity, does the old Alpha/VMS stack-checking API meet the
>> requirements? From what I recall, I think it does.
>
> No, it's the usual probe-first-and-then-allocate strategy and Jeff rejects it
> because of valgrind. I'd personally rather change valgrind but...

But then valgrind won't be able to find bugs in the code (storing and later reading stuff into the volatile parts of the stack that could be overwritten by any asynchronous signal). GCC had various bugs in this area and valgrind has been able to report those. Unless the probe instruction is sufficiently magic that it won't usually appear in other code.

Only checking loads below the stack is not sufficient: some buggy code could e.g. store some data below the stack pointer (below the red zone, if any), then subtract from the stack pointer and then try to read it, etc.

Not to mention that it isn't just false-positive messages with current valgrind on -fstack-check code; e.g. on ppc64 it just crashes.

Jakub
[PATCH, alpha, go]: Introduce applyRelocationsALPHA
This patch introduces applyRelocationsALPHA to solve:

FAIL: TestCgoConsistentResults
FAIL: TestCgoPkgConfig
FAIL: TestCgoHandlesWlORIGIN

gotools errors. Bootstrapped and regression tested on alphaev68-linux-gnu.

Uros.

Index: go/debug/elf/file.go
===
--- go/debug/elf/file.go	(revision 249418)
+++ go/debug/elf/file.go	(working copy)
@@ -602,6 +602,8 @@
 		return f.applyRelocationss390x(dst, rels)
 	case f.Class == ELFCLASS64 && f.Machine == EM_SPARCV9:
 		return f.applyRelocationsSPARC64(dst, rels)
+	case f.Class == ELFCLASS64 && f.Machine == EM_ALPHA:
+		return f.applyRelocationsALPHA(dst, rels)
 	default:
 		return errors.New("applyRelocations: not implemented")
 	}
@@ -1049,6 +1051,55 @@
 	return nil
 }

+func (f *File) applyRelocationsALPHA(dst []byte, rels []byte) error {
+	// 24 is the size of Rela64.
+	if len(rels)%24 != 0 {
+		return errors.New("length of relocation section is not a multiple of 24")
+	}
+
+	symbols, _, err := f.getSymbols(SHT_SYMTAB)
+	if err != nil {
+		return err
+	}
+
+	b := bytes.NewReader(rels)
+	var rela Rela64
+
+	for b.Len() > 0 {
+		binary.Read(b, f.ByteOrder, &rela)
+		symNo := rela.Info >> 32
+		t := R_ALPHA(rela.Info & 0x)
+
+		if symNo == 0 || symNo > uint64(len(symbols)) {
+			continue
+		}
+		sym := &symbols[symNo-1]
+		if SymType(sym.Info&0xf) != STT_SECTION {
+			// We don't handle non-section relocations for now.
+			continue
+		}
+
+		// There are relocations, so this must be a normal
+		// object file, and we only look at section symbols,
+		// so we assume that the symbol value is 0.
+
+		switch t {
+		case R_ALPHA_REFQUAD:
+			if rela.Off+8 >= uint64(len(dst)) || rela.Addend < 0 {
+				continue
+			}
+			f.ByteOrder.PutUint64(dst[rela.Off:rela.Off+8], uint64(rela.Addend))
+		case R_ALPHA_REFLONG:
+			if rela.Off+4 >= uint64(len(dst)) || rela.Addend < 0 {
+				continue
+			}
+			f.ByteOrder.PutUint32(dst[rela.Off:rela.Off+4], uint32(rela.Addend))
+		}
+	}
+
+	return nil
+}
+
 func (f *File) DWARF() (*dwarf.Data, error) {
 	// sectionData gets the data for s, checks its size, and
 	// applies any applicable relocations.
Re: C++ PATCH for c++/81073, constexpr and static var in statement-expression
On Jun 20 2017, Jason Merrill wrote:

> On Tue, Jun 20, 2017 at 5:40 AM, Andreas Schwab wrote:
>> FAIL: g++.dg/cpp0x/constexpr-cast.C -std=c++11 (test for errors, line 10)
>> FAIL: g++.dg/cpp0x/constexpr-cast.C -std=c++11 (test for excess errors)
>> FAIL: g++.dg/cpp0x/constexpr-cast.C -std=c++14 (test for errors, line 10)
>> FAIL: g++.dg/cpp0x/constexpr-cast.C -std=c++14 (test for excess errors)
>
> I'm not seeing this. Can you give more detail?

http://gcc.gnu.org/ml/gcc-testresults/2017-06/msg02172.html

Andreas.

--
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
[PING^2] Re: [PATCH] c/c++: Add fix-it hints for suggested missing #includes
Ping re: https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00321.html

On Fri, 2017-05-26 at 15:54 -0400, David Malcolm wrote:
> Ping:
> https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00321.html
>
> On Thu, 2017-05-04 at 12:36 -0400, David Malcolm wrote:
> > As of r247522, fix-it hints can suggest the insertion of new lines.
> >
> > This patch uses this to implement a new "maybe_add_include_fixit"
> > function in c-common.c and uses it in the two places where the C and C++
> > frontend can suggest missing #include directives. [1]
> >
> > The idea is that the user can then click on the fix-it in an IDE
> > and have it add the #include for them (or use
> > -fdiagnostics-generate-patch).
> >
> > Examples can be seen in the test cases.
> >
> > The function attempts to put the #include in a reasonable place:
> > immediately after the last #include within the file, or at the
> > top of the file. It is idempotent, so -fdiagnostics-generate-patch
> > does the right thing if several such diagnostics are emitted.
> >
> > Successfully bootstrapped on x86_64-pc-linux-gnu.
> >
> > OK for trunk?
> >
> > [1] I'm working on a followup which tweaks another diagnostic so that
> > it can suggest that a #include was missing, so I'll use it there as
> > well.
> >
> > gcc/c-family/ChangeLog:
> > 	* c-common.c (try_to_locate_new_include_insertion_point): New
> > 	function.
> > 	(per_file_includes_t): New typedef.
> > 	(added_includes_t): New typedef.
> > 	(added_includes): New variable.
> > 	(maybe_add_include_fixit): New function.
> > 	* c-common.h (maybe_add_include_fixit): New decl.
> >
> > gcc/c/ChangeLog:
> > 	* c-decl.c (implicitly_declare): When suggesting a missing
> > 	#include, provide a fix-it hint.
> >
> > gcc/cp/ChangeLog:
> > 	* name-lookup.c (get_std_name_hint): Add '<' and '>' around
> > 	the header names.
> > 	(maybe_suggest_missing_header): Update for addition of '<' and '>'
> > 	to above. Provide a fix-it hint.
> > > > gcc/testsuite/ChangeLog: > > * g++.dg/lookup/missing-std-include-2.C: New text case. > > * gcc.dg/missing-header-fixit-1.c: New test case. > > --- > > gcc/c-family/c-common.c| 117 > > + > > gcc/c-family/c-common.h| 2 + > > gcc/c/c-decl.c | 10 +- > > gcc/cp/name-lookup.c | 94 + > > -- > > -- > > .../g++.dg/lookup/missing-std-include-2.C | 55 > > ++ > > gcc/testsuite/gcc.dg/missing-header-fixit-1.c | 36 +++ > > 6 files changed, 267 insertions(+), 47 deletions(-) > > create mode 100644 gcc/testsuite/g++.dg/lookup/missing-std-include > > -2.C > > create mode 100644 gcc/testsuite/gcc.dg/missing-header-fixit-1.c > > > > diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c > > index 0884922..19f7e60 100644 > > --- a/gcc/c-family/c-common.c > > +++ b/gcc/c-family/c-common.c > > @@ -7983,4 +7983,121 @@ c_flt_eval_method (bool maybe_c11_only_p) > > return c_ts18661_flt_eval_method (); > > } > > > > +/* Attempt to locate a suitable location within FILE for a > > + #include directive to be inserted before. FILE should > > + be a string from libcpp (pointer equality is used). > > + > > + Attempt to return the location within FILE immediately > > + after the last #include within that file, or the start of > > + that file if it has no #include directives. > > + > > + Return UNKNOWN_LOCATION if no suitable location is found, > > + or if an error occurs. */ > > + > > +static location_t > > +try_to_locate_new_include_insertion_point (const char *file) > > +{ > > + /* Locate the last ordinary map within FILE that ended with a > > #include. */ > > + const line_map_ordinary *last_include_ord_map = NULL; > > + > > + /* ...and the next ordinary map within FILE after that one. */ > > + const line_map_ordinary *last_ord_map_after_include = NULL; > > + > > + /* ...and the first ordinary map within FILE. 
*/ > > + const line_map_ordinary *first_ord_map_in_file = NULL; > > + > > + for (unsigned int i = 0; i < LINEMAPS_ORDINARY_USED > > (line_table); > > i++) > > +{ > > + const line_map_ordinary *ord_map > > + = LINEMAPS_ORDINARY_MAP_AT (line_table, i); > > + > > + const line_map_ordinary *from = INCLUDED_FROM (line_table, > > ord_map); > > + if (from) > > + if (from->to_file == file) > > + { > > + last_include_ord_map = from; > > + last_ord_map_after_include = NULL; > > + } > > + > > + if (ord_map->to_file == file) > > + { > > + if (!first_ord_map_in_file) > > + first_ord_map_in_file = ord_map; > > + if (last_include_ord_map && !last_ord_map_after_include) > > + last_ord_map_after_include = ord_map; > > + } > > +} > > + > > + /* Determine where to insert the #include. */ > > + const line_map_ordinary *ord_map_for_insertion;
Re: [PATCH] [SPARC] Add a workaround for the LEON3FT store-store errata
> I'm fine with this change.

I disagree, the existing policy is to avoid switches like -mfix-b2bst and use -mfix-<cpu> where <cpu> is a CPU (here <cpu> could be ut699e or ut700).

-- Eric Botcazou
Re: [PING] C++ Re: [PATCH] C/C++: fix quoting of "aka" typedef information (PR 62170)
On Tue, Jun 20, 2017 at 3:06 PM, David Malcolm wrote:
> It's not clear to me what the issue alluded to with negative
> obstack_blank is, but I chose to follow the above docs and use
> obstack_blank_fast; am testing an updated patch in which the above line
> now looks like:
>
>   obstack_blank_fast (ob, -(type_start + type_len));
>
> Is the patch OK with that change? (assuming bootstrap
> passes), or should I re-post?

OK with that change.

> On a related matter, this patch conflicts with Volker's patch here:
>
>   https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01576.html
>
> in which he removes the trailing "{enum}" info (and hence all of our
> changes to the testsuite conflict between the two patches...)
>
> Do you have any thoughts on that other patch? [Ccing Volker]

That patch makes sense to me; I prefer "enum E" to "E {enum}".

Jason
Re: [PING] C++ Re: [PATCH] C/C++: fix quoting of "aka" typedef information (PR 62170)
On Tue, 2017-06-20 at 14:01 -0400, Jason Merrill wrote: > On Tue, Jun 20, 2017 at 1:58 PM, Jason Merrill> wrote: > > On Tue, Jun 20, 2017 at 11:50 AM, David Malcolm < > > dmalc...@redhat.com> wrote: > > > > + ob->next_free = p + type_start + type_len; > > > > I'm uncomfortable with modifying the obstack directly. Why not use > > obstack_free? > > ...because you aren't freeing the object, but shrinking it. So > obstack_blank is a better choice. Thanks. As of r229987 ("Copy gnulib obstack files", aka 1ed1385ecb1c11d6915adac74afa2ff7da8be5d1), libiberty/obstacks.texi says: > @cindex shrinking objects > You can use @code{obstack_blank_fast} with a ``negative'' size > argument to make the current object smaller. Just don't try to > shrink it beyond zero length---there's no telling what will happen > if you do that. Earlier versions of obstacks allowed you to use > @code{obstack_blank} to shrink objects. This will no longer work. It's not clear to me what the issue alluded to with negative obstack_blank is, but I chose to follow the above docs and use obstack_blank_fast; am testing an updated patch in which the above line now looks like: obstack_blank_fast (ob, -(type_start + type_len)); Is the patch OK with that change? (assuming bootstrap pass), or should I re-post? On a related matter, this patch conflicts with Volker's patch here: https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01576.html in which he removes the trailing "{enum}" info (and hence all of our changes to the testsuite conflict between the two patches...) Do you have any thoughts on that other patch? [Ccing Volker] Thanks Dave
[PATCH, testsuite]: Fix gcc.target/i386/pr80732.c execution test failure
2017-06-20  Uros Bizjak

* gcc.target/i386/pr80732.c: Include fma4-check.h. (main): Renamed to ... (fma4_test): ... this. Tested on x86_64-linux-gnu and committed to mainline SVN. Uros. Index: gcc.target/i386/pr80732.c === --- gcc.target/i386/pr80732.c (revision 249418) +++ gcc.target/i386/pr80732.c (working copy) @@ -6,6 +6,8 @@ /* { dg-require-effective-target fpic } */ /* { dg-require-effective-target pie } */ +#include "fma4-check.h" + #include __attribute__((target_clones("default","fma"),noinline,optimize("fast-math"))) @@ -51,7 +53,8 @@ double (*initializer) (double, double, double) = { }; -int main() +static void +fma4_test (void) { char buffer[256]; const char *expectation = "4.93038e-32, 4.93038e-32, 4.93038e-32"; @@ -87,6 +90,4 @@ __builtin_sprintf(buffer, "%g, %g, %g", initializer (a, b, c), v2_2, v2_3); if (__builtin_strcmp (buffer, expectation) != 0) __builtin_abort (); - -return 0; }
Re: [PATCH, AArch64] Add x86 intrinsic headers to GCC AArch64 target
On Tue, 2017-06-20 at 09:04 +, Hurugalawadi, Naveen wrote: > Hi Joseph, > > Thanks for your review and valuable comments on this issue. > > Please find attached the patch that merges x86-intrinsics for AArch64 and PPC > architectures. > > >> it would seem to me to be a bad idea to duplicate the > >> implementation for more and more architectures. > Merged the implementation for AArch64 and PPC architectures. > > The testcases have not been merged yet. Will do it after checking out > the comments on the current idea of implementation. > > Please check the patch and let me know the comments. > > Bootstrapped and Regression tested on aarch64-thunder-linux and PPC. > I am not sure this works or is even a good idea. As an accident bmiintrin.h can be implemented as C code or common builtins. But bmi2intrin.h depends on __builtin_bpermd which to my knowledge is PowerISA only. As I work on mmx, sse, sse2, etc it gets more complicated. There are many X86 intrinsic instances that require altivec.h unique intrinsics to implement efficiently for the power64le target and some inline __asm. Net, the current sample size so far is too small to make a reasonable assessment. And as you can see below the gcc.target tests have to be duplicated anyway. Even if the C code is common there will be many differences in dg-options and dg-require-effective-target. Trying to common these implementations only creates more small files to manage. > Thanks, > Naveen > > 2017-06-20  Naveen H.S > > [gcc] > * config.gcc (aarch64*-*-*): Add bmi2intrin.h, bmiintrin.h, > adxintrin.h and x86intrin.h in Config folder. > (powerpc*-*-*): Move bmi2intrin.h, bmiintrin.h and x86intrin.h into > Config folder. > * config/adxintrin.h: New file. > * config/bmi2intrin.h: New file. > * config/bmiintrin.h: New file. > * config/x86intrin.h: New file. > * config/rs6000/bmi2intrin.h: Delete file. > * config/rs6000/bmiintrin.h: Likewise. > * config/rs6000/x86intrin.h: Likewise.
> > [gcc/testsuite] > > * gcc.target/aarch64/adx-addcarryx32-1.c: New file. > * gcc.target/aarch64/adx-addcarryx32-2.c: New file. > * gcc.target/aarch64/adx-addcarryx32-3.c: New file. > * gcc.target/aarch64/adx-addcarryx64-1.c: New file. > * gcc.target/aarch64/adx-addcarryx64-2.c: New file > * gcc.target/aarch64/adx-addcarryx64-3.c: New file > * gcc.target/aarch64/adx-check.h: New file > * gcc.target/aarch64/bmi-andn-1.c: New file > * gcc.target/aarch64/bmi-andn-2.c: New file. > * gcc.target/aarch64/bmi-bextr-1.c: New file. > * gcc.target/aarch64/bmi-bextr-2.c: New file. > * gcc.target/aarch64/bmi-bextr-4.c: New file. > * gcc.target/aarch64/bmi-bextr-5.c: New file. > * gcc.target/aarch64/bmi-blsi-1.c: New file. > * gcc.target/aarch64/bmi-blsi-2.c: New file. > * gcc.target/aarch64/bmi-blsmsk-1.c: new file. > * gcc.target/aarch64/bmi-blsmsk-2.c: New file. > * gcc.target/aarch64/bmi-blsr-1.c: New file. > * gcc.target/aarch64/bmi-blsr-2.c: New File. > * gcc.target/aarch64/bmi-check.h: New File. > * gcc.target/aarch64/bmi-tzcnt-1.c: new file. > * gcc.target/aarch64/bmi-tzcnt-2.c: New file. > * gcc.target/aarch64/bmi2-bzhi32-1.c: New file. > * gcc.target/aarch64/bmi2-bzhi64-1.c: New file. > * gcc.target/aarch64/bmi2-bzhi64-1a.c: New file. > * gcc.target/aarch64/bmi2-check.h: New file. > * gcc.target/aarch64/bmi2-mulx32-1.c: New file. > * gcc.target/aarch64/bmi2-mulx32-2.c: New file. > * gcc.target/aarch64/bmi2-mulx64-1.c: New file. > * gcc.target/aarch64/bmi2-mulx64-2.c: New file. > * gcc.target/aarch64/bmi2-pdep32-1.c: New file. > * gcc.target/aarch64/bmi2-pdep64-1.c: New file. > * gcc.target/aarch64/bmi2-pext32-1.c: New File. > * gcc.target/aarch64/bmi2-pext64-1.c: New file. > * gcc.target/aarch64/bmi2-pext64-1a.c: New File.
Re: [PATCH] Fix UB in ira-costs.c (find_costs_and_classes)
On 06/20/2017 03:27 AM, Jakub Jelinek wrote: Hi! bootstrap-ubsan revealed many ../../gcc/ira-costs.c:1747:20: runtime error: member access within null pointer of type 'cost_classes *[107]' issues. The problem is that cost_classes_ptr is sometimes NULL, but in those cases we have early exit: if (! allocno_p) { if (regno_reg_rtx[i] == NULL_RTX) continue; // <- HERE memcpy (temp_costs, COSTS (costs, i), struct_costs_size); i_mem_cost = temp_costs->mem_cost; } else { if (ira_regno_allocno_map[i] == NULL) continue; // <- or HERE ... } Still, cost_classes_ptr->classes where classes is an array is UB when cost_classes_ptr is NULL, so this patch moves it after the if (...) continue; in both branches (because it is needed both later in the else ... and after the whole if). Bootstrapped/regtested on x86_64-linux and i686-linux (with bootstrap-ubsan), ok for trunk? Sure. Jakub, thank you for addressing the issue. 2017-06-20  Jakub Jelinek

* ira-costs.c (find_costs_and_classes): Initialize cost_classes later to make sure not to dereference a NULL cost_classes_ptr pointer.
Re: C++ PATCH for c++/81073, constexpr and static var in statement-expression
On Tue, Jun 20, 2017 at 5:40 AM, Andreas Schwab wrote: > FAIL: g++.dg/cpp0x/constexpr-cast.C -std=c++11 (test for errors, line 10) > FAIL: g++.dg/cpp0x/constexpr-cast.C -std=c++11 (test for excess errors) > FAIL: g++.dg/cpp0x/constexpr-cast.C -std=c++14 (test for errors, line 10) > FAIL: g++.dg/cpp0x/constexpr-cast.C -std=c++14 (test for excess errors) I'm not seeing this. Can you give more detail? Jason
Re: [PATCH v2] C++: Add fix-it hints for -Wold-style-cast
On Wed, May 3, 2017 at 9:51 AM, David Malcolm wrote: > On Thu, 2017-04-27 at 23:03 +0200, Marek Polacek wrote: >> On Thu, Apr 27, 2017 at 05:10:24PM -0400, David Malcolm wrote: >> > + /* First try const_cast. */ >> > + trial = build_const_cast (dst_type, orig_expr, 0 /* complain >> > */); >> > + if (trial != error_mark_node) >> > +return "const_cast"; >> > + >> > + /* If that fails, try static_cast. */ >> > + trial = build_static_cast (dst_type, orig_expr, 0 /* complain >> > */); >> > + if (trial != error_mark_node) >> > +return "static_cast"; >> > + >> > + /* Finally, try reinterpret_cast. */ >> > + trial = build_reinterpret_cast (dst_type, orig_expr, 0 /* >> > complain */); >> > + if (trial != error_mark_node) >> > +return "reinterpret_cast"; >> >> I think you'll want tf_none instead of 0 /* complain */ in these. >> >> Marek > > Thanks. > > Here's an updated version of the patch. > > Changes since v1: > - updated expected fixit-formatting (the new fix-it printer in > r247548 handles this properly now) > - added new test cases as suggested by Florian > - use "tf_none" rather than "0 /* complain */" > > Successfully bootstrapped on x86_64-pc-linux-gnu. > > OK for trunk? OK. Jason
Re: [PATCH][X86] Fix rounding pattern similar to PR73350
Hello Julia, Uroš, On 16 Jun 09:05, Uros Bizjak wrote: > On Fri, Jun 16, 2017 at 8:46 AM, Koval, Julia wrote: > > Hi, > > > > This test hangs on avx512er, maybe that's why: > >> According to POSIX, the behavior of a process is undefined after it > >> ignores a SIGFPE, SIGILL, or SIGSEGV signal that was not generated by > >> kill(2) or raise(3). > > > > And volatile makes it work even without a patch (r1 and r2 are not combined > > then). > > > > Added other changes. > > The testcase LGTM. I'll leave the final approval to Kirill. The change and the case are fine to me. I've committed it to main trunk. > > Uros. -- Thanks, K
Re: [Patch AArch64] Add rcpc extension
On Tue, Jun 20, 2017 at 6:50 AM, James Greenhalgh wrote: > > Hi, > > While GCC doesn't need to know anything about the RcPc extension for code > generation, we do need to add the extension flag to the string we pass > to the assembler when we're compiling for a CPU which implements the RcPc > extension. > > I've built a toolchain with this patch applied, and checked that we > correctly pass +rcpc on to the assembler if we give something like > -mcpu=generic+rcpc . > > OK? I think you forgot to update the documentation for this option extension. https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html#g_t-march-and--mcpu-Feature-Modifiers I suspect there are others missing here too. Thanks, Andrew > > Thanks, > James > > --- > 2017-06-20 James Greenhalgh > > * config/aarch64/aarch64-option-extensions.def (rcpc): New. > * config/aarch64/aarch64.h (AARCH64_FL_RCPC): New. >
[PATCH/AARCH64 v2] Enable software prefetching (-fprefetch-loop-arrays) for ThunderX 88xxx
Here is the updated patch based on the new infrastructure which is now included. OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions and tested again on SPEC CPU 2006 on THunderX T88 with the speed up mentioned before. Thanks, Andrew Pinski ChangeLog: * config/aarch64/aarch64-cores.def (thunderxt88p1): Use thunderxt88 tunings. (thunderxt88): Likewise. * config/aarch64/aarch64.c (thunderxt88_prefetch_tune): New variable. (thunderx_prefetch_tune): New variable. (thunderx2t99_prefetch_tune): Update for the correct values. (thunderxt88_tunings): New variable. (thunderx_tunings): Use thunderx_prefetch_tune instead of generic_prefetch_tune. (thunderx2t99_tunings): Use AUTOPREFETCHER_WEAK. Index: gcc/config/aarch64/aarch64-cores.def === --- gcc/config/aarch64/aarch64-cores.def(revision 249422) +++ gcc/config/aarch64/aarch64-cores.def(working copy) @@ -56,8 +56,8 @@ AARCH64_CORE("cortex-a73", cortexa73, c AARCH64_CORE("thunderx", thunderx, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx, 0x43, 0x0a0, -1) /* Do not swap around "thunderxt88p1" and "thunderxt88", this order is required to handle variant correctly. 
*/ -AARCH64_CORE("thunderxt88p1", thunderxt88p1, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO,thunderx, 0x43, 0x0a1, 0) -AARCH64_CORE("thunderxt88", thunderxt88, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx, 0x43, 0x0a1, -1) +AARCH64_CORE("thunderxt88p1", thunderxt88p1, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO,thunderxt88, 0x43, 0x0a1, 0) +AARCH64_CORE("thunderxt88", thunderxt88, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderxt88, 0x43, 0x0a1, -1) AARCH64_CORE("thunderxt81", thunderxt81, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx, 0x43, 0x0a2, -1) AARCH64_CORE("thunderxt83", thunderxt83, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx, 0x43, 0x0a3, -1) Index: gcc/config/aarch64/aarch64.c === --- gcc/config/aarch64/aarch64.c(revision 249422) +++ gcc/config/aarch64/aarch64.c(working copy) @@ -554,12 +554,30 @@ static const cpu_prefetch_tune qdf24xx_p 3/* default_opt_level */ }; +static const cpu_prefetch_tune thunderxt88_prefetch_tune = +{ + 8, /* num_slots */ + 32, /* l1_cache_size */ + 128, /* l1_cache_line_size */ + 16*1024, /* l2_cache_size */ + 3/* default_opt_level */ +}; + +static const cpu_prefetch_tune thunderx_prefetch_tune = +{ + 8, /* num_slots */ + 32, /* l1_cache_size */ + 128, /* l1_cache_line_size */ + -1, /* l2_cache_size */ + -1 /* default_opt_level */ +}; + static const cpu_prefetch_tune thunderx2t99_prefetch_tune = { - 0, /* num_slots */ - -1, /* l1_cache_size */ + 8, /* num_slots */ + 32, /* l1_cache_size */ 64, /* l1_cache_line_size */ - -1, /* l2_cache_size */ + 256, /* l2_cache_size */ -1 /* default_opt_level */ }; @@ -745,6 +763,31 @@ static const struct tune_params exynosm1 _prefetch_tune }; +static const struct tune_params thunderxt88_tunings = +{ + _extra_costs, + _addrcost_table, + _regmove_cost, + _vector_cost, + _branch_cost, + 
_approx_modes, + 6, /* memmov_cost */ + 2, /* issue_rate */ + AARCH64_FUSE_CMP_BRANCH, /* fusible_ops */ + 8, /* function_align. */ + 8, /* jump_align. */ + 8, /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW),/* tune_flags. */ + _prefetch_tune +}; + static const struct tune_params thunderx_tunings = { _extra_costs, @@ -767,7 +810,7 @@ static const struct tune_params thunderx 0, /* max_case_values. */ tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */ (AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW),/* tune_flags. */ - _prefetch_tune + _prefetch_tune }; static const struct tune_params xgene1_tunings = @@ -841,7 +884,7 @@ static const struct tune_params thunderx 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ 0, /* max_case_values. */ - tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */ +
Re: [PING] C++ Re: [PATCH] C/C++: fix quoting of "aka" typedef information (PR 62170)
On Tue, Jun 20, 2017 at 1:58 PM, Jason Merrill wrote: > On Tue, Jun 20, 2017 at 11:50 AM, David Malcolm wrote: >>> + ob->next_free = p + type_start + type_len; > > I'm uncomfortable with modifying the obstack directly. Why not use > obstack_free? ...because you aren't freeing the object, but shrinking it. So obstack_blank is a better choice. Jason
Re: [PING] C++ Re: [PATCH] C/C++: fix quoting of "aka" typedef information (PR 62170)
On Tue, Jun 20, 2017 at 11:50 AM, David Malcolm wrote: >> + ob->next_free = p + type_start + type_len; I'm uncomfortable with modifying the obstack directly. Why not use obstack_free? I guess for that you'd want to change type_start to a pointer and get it from obstack_next_free. Jason
C++ PATCH for c++/80972, C++17 ICE with packed class
In C++17 mode we check to make sure that we are initializing directly from class prvalues rather than copying them, which hit an issue with packed fields: we create a temporary for binding a reference to a packed field, and then pass that temporary to the copy constructor. This isn't actually a problem, since the packed field must support trivial copy, so let's just allow it. Tested x86_64-pc-linux-gnu, applying to trunk and 7. commit 298b21a24bd9fbfac20a6dca12df2b64655e4f42 Author: Jason Merrill Date: Mon Jun 19 14:44:02 2017 -0400 PR c++/80972 - C++17 ICE with attribute packed. * call.c (build_over_call): Allow a TARGET_EXPR from reference binding. diff --git a/gcc/cp/call.c b/gcc/cp/call.c index d1f27dd..b56da35 100644 --- a/gcc/cp/call.c +++ b/gcc/cp/call.c @@ -8025,6 +8025,8 @@ build_over_call (struct z_candidate *cand, int flags, tsubst_flags_t complain) subobject. */ if (CHECKING_P && cxx_dialect >= cxx1z) gcc_assert (TREE_CODE (arg) != TARGET_EXPR + /* It's from binding the ref parm to a packed field. */ + || convs[0]->need_temporary_p || seen_error () /* See unsafe_copy_elision_p. */ || DECL_BASE_CONSTRUCTOR_P (fn)); diff --git a/gcc/testsuite/g++.dg/ext/packed12.C b/gcc/testsuite/g++.dg/ext/packed12.C new file mode 100644 index 000..2ad14de --- /dev/null +++ b/gcc/testsuite/g++.dg/ext/packed12.C @@ -0,0 +1,6 @@ +// PR c++/80972 + +struct A { int i; }; +struct B { A a; } __attribute__((packed)); + +A a = B().a;
[C++ PATCH] remove unused identifier
'nelts' was a suspiciously gnu user space identifier to have. Turns out we don't use it anywhere. so killed ... nathan -- Nathan Sidwell 2017-06-20  Nathan Sidwell

* cp-tree.h (CPTI_NELTS_IDENTIFIER): Delete. (nelts_identifier): Delete. * decl.c (initialize_predefined_identifiers): Remove nelts. Index: cp-tree.h === --- cp-tree.h (revision 249418) +++ cp-tree.h (working copy) @@ -136,7 +136,6 @@ enum cp_tree_index CPTI_DELTA_IDENTIFIER, CPTI_IN_CHARGE_IDENTIFIER, CPTI_VTT_PARM_IDENTIFIER, -CPTI_NELTS_IDENTIFIER, CPTI_THIS_IDENTIFIER, CPTI_PFN_IDENTIFIER, CPTI_VPTR_IDENTIFIER, @@ -234,7 +233,6 @@ extern GTY(()) tree cp_global_trees[CPTI /* The name of the parameter that contains a pointer to the VTT to use for this subobject constructor or destructor. */ #define vtt_parm_identifier cp_global_trees[CPTI_VTT_PARM_IDENTIFIER] -#define nelts_identifier cp_global_trees[CPTI_NELTS_IDENTIFIER] #define this_identifier cp_global_trees[CPTI_THIS_IDENTIFIER] #define pfn_identifier cp_global_trees[CPTI_PFN_IDENTIFIER] #define vptr_identifier cp_global_trees[CPTI_VPTR_IDENTIFIER] Index: decl.c === --- decl.c (revision 249418) +++ decl.c (working copy) @@ -3982,7 +3982,6 @@ initialize_predefined_identifiers (void) { "__base_dtor ", _dtor_identifier, 1 }, { "__deleting_dtor ", _dtor_identifier, 1 }, { IN_CHARGE_NAME, _charge_identifier, 0 }, -{ "nelts", _identifier, 0 }, { THIS_NAME, _identifier, 0 }, { VTABLE_DELTA_NAME, _identifier, 0 }, { VTABLE_PFN_NAME, _identifier, 0 },
Re: [PATCH] Improved diagnostics for casts and enums
So here's the patch that reverts the special enum handling in type_to_string and uses %q#T instead of %qT for two casting-related diagnostics. Bootstrapped and regtested on x86_64-pc-linux-gnu. OK for trunk? The "E {enum}'" notation is still on trunk so it seems that this patch has never been committed and I can't find approval of it in the archive. To make sure it doesn't get forgotten, please consider this a ping on Volker's behalf: https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01576.html Thanks Martin As one can see from the testsuite changes, there are several casting- and conversion-related messages like "invalid conversion from", "cannot convert", "invalid cast" that still use the simple %qT form. I'll give it a try to use %q#T there as well and prepare a separate patch if this really improves the diagnostics. Regards, Volker 2017-04-30 Volker Reichelt* parser.c (cp_parser_cast_expression): Use %q#T instead of %qT in old-style cast diagnostic. * typeck.c (maybe_warn_about_useless_cast): Use %q#T instead of %qT in useless cast diagnostic. * error.c (type_to_string): Remove enum special handling. Index: gcc/cp/parser.c === --- gcc/cp/parser.c (revision 247394) +++ gcc/cp/parser.c (working copy) @@ -8764,7 +8764,7 @@ && !VOID_TYPE_P (type) && current_lang_name != lang_name_c) warning (OPT_Wold_style_cast, -"use of old-style cast to %qT", type); +"use of old-style cast to %q#T", type); /* Only type conversions to integral or enumeration types can be used in constant-expressions. */ Index: gcc/cp/typeck.c === --- gcc/cp/typeck.c (revision 247394) +++ gcc/cp/typeck.c (working copy) @@ -6631,7 +6631,7 @@ ? 
xvalue_p (expr) : lvalue_p (expr)) && same_type_p (TREE_TYPE (expr), TREE_TYPE (type))) || same_type_p (TREE_TYPE (expr), type)) - warning (OPT_Wuseless_cast, "useless cast to type %qT", type); + warning (OPT_Wuseless_cast, "useless cast to type %q#T", type); } } Index: gcc/cp/error.c === --- gcc/cp/error.c (revision 247394) +++ gcc/cp/error.c (working copy) @@ -3134,10 +3134,6 @@ if (len == aka_len && memcmp (p, p+aka_start, len) == 0) p[len] = '\0'; } - - if (typ && TYPE_P (typ) && TREE_CODE (typ) == ENUMERAL_TYPE) -pp_string (cxx_pp, M_(" {enum}")); - return pp_ggc_formatted_text (cxx_pp); } === 2017-04-30 Volker Reichelt * g++.dg/cpp1z/direct-enum-init1.C: Rever special enum handling. * g++.dg/warn/pr12242.C: Likewise. Index: gcc/testsuite/g++.dg/cpp1z/direct-enum-init1.C === --- gcc/testsuite/g++.dg/cpp1z/direct-enum-init1.C (revision 247394) +++ gcc/testsuite/g++.dg/cpp1z/direct-enum-init1.C (working copy) @@ -17,67 +17,67 @@ void foo () { - A a1 { 5 }; // { dg-error "invalid conversion from 'int' to 'A {enum}'" } - B b1 { 7 }; // { dg-error "invalid conversion from 'int' to 'B {enum}'" "" { target c++14_down } } + A a1 { 5 }; // { dg-error "invalid conversion from 'int' to 'A'" } + B b1 { 7 }; // { dg-error "invalid conversion from 'int' to 'B'" "" { target c++14_down } } C c1 { s }; - D d1 { D(t) }; // { dg-error "invalid cast from type 'T' to type 'D {enum}'" } - D d2 { t }; // { dg-error "cannot convert 'T' to 'D {enum}' in initialization" "" { target c++14_down } } + D d1 { D(t) }; // { dg-error "invalid cast from type 'T' to type 'D'" } + D d2 { t }; // { dg-error "cannot convert 'T' to 'D' in initialization" "" { target c++14_down } } // { dg-error "invalid cast from type 'T' to type 'D'" "" { target c++1z } .-1 } - D d3 { 9 }; // { dg-error "cannot convert 'int' to 'D {enum}' in initialization" "" { target c++14_down } } - D d4 { l }; // { dg-error "cannot convert 'long int' to 'D {enum}' in initialization" "" { target c++14_down } } + D d3 { 9 }; // 
{ dg-error "cannot convert 'int' to 'D' in initialization" "" { target c++14_down } } + D d4 { l }; // { dg-error "cannot convert 'long int' to 'D' in initialization" "" { target c++14_down } } D d5 { D(l) }; - D d6 { G }; // { dg-error "cannot convert 'A {enum}' to 'D {enum}' in initialization" "" { target c++14_down } } - E e1 { 5 }; // { dg-error "cannot convert 'int' to 'E {enum}' in initialization" "" { target c++14_down } } - E e2 { -1 }; // { dg-error
Re: RFC: stack/heap collision vulnerability and mitigation with GCC
On 06/20/2017 06:17 AM, Uros Bizjak wrote: > On Tue, Jun 20, 2017 at 2:13 PM, Florian Weimer wrote: >> On 06/20/2017 01:10 PM, Uros Bizjak wrote: >> >>> 74,99% a.out a.out [.] test_or >>> 12,50% a.out a.out [.] test_movb >>> 12,50% a.out a.out [.] test_movl >> >> Could you try notl/notb/negl/negb as well, please? > > These all have the same (long) runtime as test_or. That would be my expectation -- they (not/neg) are going to be RMW. So can we agree that moving away from RMW to a simple W style instruction for the probe is where we want to go? Then we can kick around the exact form of that store. FWIW, we don't have to store zero -- ultimately we care about the side effect of triggering the page fault, not the value written. So we could just as easily store a register into the probed address to avoid the codesize cost of encoding an immediate. I did that in my local s390 patches. It may not be necessary there, but it allowed me to avoid thinking too hard about the ISA and get s390 proof of concept code running :-) Jeff
[PATCH, alpha]: Update libstdc++ baseline_symbols.txt
2017-06-20 Uros Bizjak* config/abi/post/alpha-linux-gnu/baseline_symbols.txt: Update. Tested on alphaev68-linux-gnu and committed to mainline SVN. Uros. Index: config/abi/post/alpha-linux-gnu/baseline_symbols.txt === --- config/abi/post/alpha-linux-gnu/baseline_symbols.txt(revision 249356) +++ config/abi/post/alpha-linux-gnu/baseline_symbols.txt(working copy) @@ -444,6 +444,7 @@ FUNC:_ZNKSt13basic_istreamIwSt11char_traitsIwEE6gcountEv@@GLIBCXX_3.4 FUNC:_ZNKSt13basic_istreamIwSt11char_traitsIwEE6sentrycvbEv@@GLIBCXX_3.4 FUNC:_ZNKSt13basic_ostreamIwSt11char_traitsIwEE6sentrycvbEv@@GLIBCXX_3.4 +FUNC:_ZNKSt13random_device13_M_getentropyEv@@GLIBCXX_3.4.25 FUNC:_ZNKSt13runtime_error4whatEv@@GLIBCXX_3.4 FUNC:_ZNKSt14basic_ifstreamIcSt11char_traitsIcEE5rdbufEv@@GLIBCXX_3.4 FUNC:_ZNKSt14basic_ifstreamIcSt11char_traitsIcEE7is_openEv@@GLIBCXX_3.4.5 @@ -1471,6 +1472,7 @@ FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC1EPKwmRKS1_@@GLIBCXX_3.4 FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC1ERKS1_@@GLIBCXX_3.4 FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC1ERKS2_@@GLIBCXX_3.4 +FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC1ERKS2_mRKS1_@@GLIBCXX_3.4.23 FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC1ERKS2_mm@@GLIBCXX_3.4 FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC1ERKS2_mmRKS1_@@GLIBCXX_3.4 FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC1ESt16initializer_listIwERKS1_@@GLIBCXX_3.4.11 @@ -1484,6 +1486,7 @@ FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC2EPKwmRKS1_@@GLIBCXX_3.4 FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC2ERKS1_@@GLIBCXX_3.4 FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC2ERKS2_@@GLIBCXX_3.4 +FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC2ERKS2_mRKS1_@@GLIBCXX_3.4.23 FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC2ERKS2_mm@@GLIBCXX_3.4 FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC2ERKS2_mmRKS1_@@GLIBCXX_3.4 FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC2ESt16initializer_listIwERKS1_@@GLIBCXX_3.4.11 @@ -1726,6 +1729,7 @@ FUNC:_ZNSsC1EPKcmRKSaIcE@@GLIBCXX_3.4 FUNC:_ZNSsC1ERKSaIcE@@GLIBCXX_3.4 FUNC:_ZNSsC1ERKSs@@GLIBCXX_3.4 +FUNC:_ZNSsC1ERKSsmRKSaIcE@@GLIBCXX_3.4.23 
FUNC:_ZNSsC1ERKSsmm@@GLIBCXX_3.4 FUNC:_ZNSsC1ERKSsmmRKSaIcE@@GLIBCXX_3.4 FUNC:_ZNSsC1ESt16initializer_listIcERKSaIcE@@GLIBCXX_3.4.11 @@ -1739,6 +1743,7 @@ FUNC:_ZNSsC2EPKcmRKSaIcE@@GLIBCXX_3.4 FUNC:_ZNSsC2ERKSaIcE@@GLIBCXX_3.4 FUNC:_ZNSsC2ERKSs@@GLIBCXX_3.4 +FUNC:_ZNSsC2ERKSsmRKSaIcE@@GLIBCXX_3.4.23 FUNC:_ZNSsC2ERKSsmm@@GLIBCXX_3.4 FUNC:_ZNSsC2ERKSsmmRKSaIcE@@GLIBCXX_3.4 FUNC:_ZNSsC2ESt16initializer_listIcERKSaIcE@@GLIBCXX_3.4.11 @@ -2382,6 +2387,7 @@ FUNC:_ZNSt15_List_node_base9_M_unhookEv@@GLIBCXX_3.4.14 FUNC:_ZNSt15__exception_ptr13exception_ptr4swapERS0_@@CXXABI_1.3.3 FUNC:_ZNSt15__exception_ptr13exception_ptrC1EMS0_FvvE@@CXXABI_1.3.3 +FUNC:_ZNSt15__exception_ptr13exception_ptrC1EPv@@CXXABI_1.3.11 FUNC:_ZNSt15__exception_ptr13exception_ptrC1ERKS0_@@CXXABI_1.3.3 FUNC:_ZNSt15__exception_ptr13exception_ptrC1Ev@@CXXABI_1.3.3 FUNC:_ZNSt15__exception_ptr13exception_ptrC2EMS0_FvvE@@CXXABI_1.3.3 @@ -3068,6 +3074,7 @@ FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC1ERKS3_@@GLIBCXX_3.4.21 FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC1ERKS4_@@GLIBCXX_3.4.21 FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC1ERKS4_RKS3_@@GLIBCXX_3.4.21 +FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC1ERKS4_mRKS3_@@GLIBCXX_3.4.23 FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC1ERKS4_mm@@GLIBCXX_3.4.21 FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC1ERKS4_mmRKS3_@@GLIBCXX_3.4.21 FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC1ESt16initializer_listIcERKS3_@@GLIBCXX_3.4.21 @@ -3083,6 +3090,7 @@ FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC2ERKS3_@@GLIBCXX_3.4.21 FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC2ERKS4_@@GLIBCXX_3.4.21 FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC2ERKS4_RKS3_@@GLIBCXX_3.4.21 +FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC2ERKS4_mRKS3_@@GLIBCXX_3.4.23 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC2ERKS4_mm@@GLIBCXX_3.4.21 FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC2ERKS4_mmRKS3_@@GLIBCXX_3.4.21 FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC2ESt16initializer_listIcERKS3_@@GLIBCXX_3.4.21 @@ -3209,6 +3217,7 @@ FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEEC1ERKS3_@@GLIBCXX_3.4.21 FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEEC1ERKS4_@@GLIBCXX_3.4.21 FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEEC1ERKS4_RKS3_@@GLIBCXX_3.4.21 +FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEEC1ERKS4_mRKS3_@@GLIBCXX_3.4.23 FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEEC1ERKS4_mm@@GLIBCXX_3.4.21 FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEEC1ERKS4_mmRKS3_@@GLIBCXX_3.4.21
[PING] C++ Re: [PATCH] C/C++: fix quoting of "aka" typedef information (PR 62170)
Ping re the C++ part of this: https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00242.html Marek approved the C parts; are the C++ parts OK for trunk? Or can I self-approve this? (exactly what the boundaries of my"diagnostic messages" maintainer remit are aren't clear to me). Thanks Dave On Mon, 2017-06-05 at 15:01 -0400, David Malcolm wrote: > PR 62170 describes a problem with how the quoting in pp_format > interacts with the "aka" information for typedefs in %qT for > the C family of frontends, and also now for %qH and %qI in the > C++ frontend. > > Currently for %qT we print e.g.: > > ‘Py_ssize_t* {aka int*}’ >^^ colorized as "quote" > > i.e. > ‘[START_COLOR]Py_ssize_t* {aka int*}[END_COLOR]’ > > when we should print: > > ‘Py_ssize_t*’ {aka ‘int*’} >^^^ colorized as "quote" > > i.e. > ‘[START_COLOR]Py_ssize_t*[END_COLOR]’ {aka > ‘[START_COLOR]int*[END_COLOR]’} > > where the opening and closing quote characters and colorization are > currently added by the 'q' handling within pp_format, adding the > closing > quote unconditionally after whatever pp_format_decoder prints for 'T' > within "%qT". > > This patch fixes the quoting by updating the %T handling in C and C++ > and the %H/%I handling in C++ to insert the quoting appropriately. > It converts the "quote" param of the pp_format_decoder callback from > bool to bool *, allowing for the %T and %H/%I handlers to write > false back to it, to avoid printing the closing quote for the cases > like the above where a final trailing closing quote isn't needed. > > It introduces pp_begin_quote/pp_end_quote to simplify this. These > take a "bool show_color", rather than using "pp_show_color (pp)" > since cxx_pp's pp_show_color isn't currently initialized (since > cxx_initialize_diagnostics happens before diagnostic_color_init). > > Successfully bootstrapped on x86_64-pc-linux-gnu. > > OK for trunk? > > gcc/c/ChangeLog: > PR c++/62170 > * c-objc-common.c (c_tree_printer): Convert penultimate param > from > bool to bool *. 
Within '%T' handling, if showing an "aka", use > "quoted" param to add appropriate quoting. > > gcc/cp/ChangeLog: > PR c++/62170 > * error.c (type_to_string): Add leading comment. Add params > "postprocessed", "quote", and "show_color", using them to fix > quoting of the "aka" for types involving typedefs. > (arg_to_string): Update for new params to type_to_string. > (cxx_format_postprocessor::handle): Likewise. > (cp_printer): Convert penultimate param from bool to bool *. > Update call to type_to_string and calls to > defer_phase_2_of_type_diff. > > gcc/fortran/ChangeLog: > PR c++/62170 > * error.c (gfc_notify_std): Convert "quoted" param from bool to > bool *. > > gcc/ChangeLog: > PR c++/62170 > * pretty-print.c (pp_format): Move quoting implementation to > pp_begin_quote and pp_end_quote. Update pp_format_decoder call > to pass address of "quote" local. > (pp_begin_quote): New function. > (pp_end_quote): New function. > * pretty-print.h (printer_fn): Convert penultimate param from > bool > to bool *. > (pp_begin_quote): New decl. > (pp_end_quote): New decl. > * tree-diagnostic.c (default_tree_printer): Convert penultimate > param from bool to bool *. > * tree-diagnostic.h (default_tree_printer): Likewise. > > gcc/testsuite/ChangeLog: > PR c++/62170 > * g++.dg/cpp1z/direct-enum-init1.C: Update expected error > messages > to reflect fixes to quoting. > * g++.dg/diagnostic/aka1.C: Likewise. > * g++.dg/diagnostic/aka2.C: New test case. > * g++.dg/parse/error55.C: Update expected error messages to > reflect fixes to quoting. > * g++.dg/warn/pr12242.C: Likewise. > * g++.old-deja/g++.mike/enum1.C: Likewise. > * gcc.dg/diag-aka-1.c: Likewise. > * gcc.dg/diag-aka-2.c: New test case. > * gcc.dg/pr13804-1.c: Update expected error messages to reflect > fixes to quoting. > * gcc.dg/pr56980.c: Likewise. > * gcc.dg/pr65050.c: Likewise. > * gcc.dg/redecl-14.c: Likewise. > * gcc.dg/utf16-4.c Likewise. > * gcc.target/i386/sse-vect-types.c (__m128d): Likewise. 
> * obj-c++.dg/invalid-type-1.mm: Likewise. > --- > gcc/c/c-objc-common.c | 12 +- > gcc/cp/error.c | 92 -- > gcc/fortran/error.c| 2 +- > gcc/pretty-print.c | 37 +++- > gcc/pretty-print.h | 5 +- > gcc/testsuite/g++.dg/cpp1z/direct-enum-init1.C | 234 --- > -- > gcc/testsuite/g++.dg/diagnostic/aka1.C | 2 +- > gcc/testsuite/g++.dg/diagnostic/aka2.C | 32 > gcc/testsuite/g++.dg/parse/error55.C | 2 +- >
Re: RFC: stack/heap collision vulnerability and mitigation with GCC
On 06/20/2017 02:27 AM, Richard Earnshaw (lists) wrote: > On 19/06/17 20:04, Jeff Law wrote: >> On 06/19/2017 11:50 AM, Joseph Myers wrote: >>> On Mon, 19 Jun 2017, Jeff Law wrote: >>> A key point to remember is that you can never have an allocation (potentially using more than one allocation site) which is larger than a page without probing the page. >>> >>> There's a platform ABI issue here. At least some kernel fixes for these >>> stack issues, as I understand it, increase the size of the stack guard to >>> more than a single page. It would be possible to define the ABI to >>> require such a larger guard for protection and so reduce the number of >>> (non-alloca/VLA-using) functions that need probes generated, depending on >>> whether a goal is to achieve security on kernels without such a fix. >>> (Thinking in terms of how to get to enabling such probes by default.) >> On 32 bit platforms we don't have a lot of address space left, so we >> have to be careful about creating too large a guard. >> >> On 64 bit platforms we have a lot more freedom, and I suspect larger >> guards, mandated by the ABI, would be useful, if for no other reason than >> allowing us to allocate more stack without probing. A simple array of >> PATH_MAX characters triggers probing right now. I suspect (but didn't >> bother to confirm) that PATH_MAX arrays are what cause git to have so >> many large stacks. >> >> Also if we look at something like ppc and aarch64, we've currently got >> the PROBE_INTERVAL set to 4k. But in reality they're using much larger >> page sizes. So we could improve things there as well. >> > > There are aarch64 linux systems using 4k pages for compatibility with > existing aarch32 binaries. Ah. That's good to know. Thanks. jeff
Re: RFC: stack/heap collision vulnerability and mitigation with GCC
On 06/20/2017 02:21 AM, Eric Botcazou wrote: >> Out of curiosity, does the old Alpha/VMS stack-checking API meet the >> requirements? From what I recall, I think it does. > > No, it's the usual probe-first-and-then-allocate strategy and Jeff rejects it > because of valgrind. I'd personally rather change valgrind but... I'm torn here. It'd certainly be a hell of a lot easier to punt this to valgrind, but with the issues I just couldn't get comfortable with that. We're probing pages which are potentially unmapped and far away from the stack pointer. The kernel and/or valgrind has to look at the reference and make a guess whether or not it's really a request for more stack or a wild pointer -- and there's no real good way to tell the difference. Thus we're dependent upon the heuristics used by the kernel and valgrind, and we're dependent on those heuristics essentially being the same (and I'm certain they are not at this time :-) One could also argue that these probes are undefined behavior precisely because they potentially hit unmapped pages far away from the stack pointer. Any probing beyond the stack pointer is also going to trigger a valgrind warning. Valgrind has some support for avoiding the warning if a reference hits the red zone -- but these probes can be well beyond the red zone. In a world where distros may be turning on -fstack-check= by default, valgrind has to continue to work and not generate an unreasonable number of false positive warnings, else the tool becomes useless. Jeff
Re: [PATCH, contrib] Support multi-tool sum files in dg-cmp-results.sh
On Jun 20, 2017, at 8:31 AM, Thomas Preudhomme wrote: > > 2017-06-14 Thomas Preud'homme > > * dg-cmp-results.sh: Keep test result lines rather than throwing > header and summary to support sum files with multiple tools. > > > Is this still ok? Ok.
Re: [Patch match.pd] Fold (A / (1 << B)) to (A >> B)
On Fri, Jun 16, 2017 at 11:41:57AM +0200, Richard Biener wrote: > On Fri, 16 Jun 2017, James Greenhalgh wrote: > > On Mon, Jun 12, 2017 at 03:56:25PM +0200, Richard Biener wrote: > > > + We can't do the same for signed A, as it might be negative, which > > > would > > > + introduce undefined behaviour. */ > > > > > > huh, AFAIR it is _left_ shift of negative values that invokes > > > undefined behavior. > > > > You're right this is not a clear comment. The problem is not undefined > > behaviour, so that text needs to go, but rounding towards/away from zero > > for signed negative values. Division will round towards zero, arithmetic > > right shift away from zero. For example: > > > >   -1 / (1 << 1) != -1 >> 1, since -1 / (1 << 1) = -1 / 2 = 0, while -1 >> 1 = -1. > > > > I've rewritten the comment to make it clear this is why we can only make > > this optimisation for unsigned values. > > Ah, of course. You could use > > if ((TYPE_UNSIGNED (type) > || tree_expr_nonnegative_p (@0)) > > here as improvement. Thanks, I've made that change. > > See, for example, gcc.c-torture/execute/pr34070-2.c > > > > > Note that as you are accepting vectors you need to make sure the > > > target actually supports arithmetic right shift of vectors > > > (you only know it supports left shift and division -- so it might > > > be sort-of-superfluous to check in case there is no arch that supports > > > those but not the other). > > > > I've added a check for that using optabs, is that the right way to do this? > > + && (!VECTOR_TYPE_P (type) > + || optab_for_tree_code (RSHIFT_EXPR, type, optab_vector) > + || optab_for_tree_code (RSHIFT_EXPR, type, optab_scalar))) > > is not enough -- you need sth like > > optab ot = optab_for_tree_code (RSHIFT_EXPR, type, optab_vector); > if (ot != unknown_optab > && optab_handler (ot, TYPE_MODE (type)) != CODE_FOR_nothing) > ... ok! ... > > ideally we'd have a helper for this in optab-tree.[ch], > tree-vect-patterns.c could also make use of that. OK. 
I've added "target_has_vector_rshift_p" for this purpose. Bootstrapped and tested on aarch64-none-linux-gnu with no issues. OK? Thanks, James --- gcc/ 2017-06-19 James Greenhalgh* match.pd (A / (1 << B) -> A >> B): New. * generic-match-head.c: Include optabs-tree.h. * gimple-match-head.c: Likewise. * optabs-tree.h (target_has_vector_rshift_p): New. * optabs-tree.c (target_has_vector_rshift_p): New. gcc/testsuite/ 2017-06-19 James Greenhalgh * gcc.dg/tree-ssa/forwprop-37.c: New. diff --git a/gcc/generic-match-head.c b/gcc/generic-match-head.c index 0c0d182..4504401 100644 --- a/gcc/generic-match-head.c +++ b/gcc/generic-match-head.c @@ -33,6 +33,7 @@ along with GCC; see the file COPYING3. If not see #include "builtins.h" #include "case-cfn-macros.h" #include "gimplify.h" +#include "optabs-tree.h" /* Routine to determine if the types T1 and T2 are effectively diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c index e7e9839..5f6aa27 100644 --- a/gcc/gimple-match-head.c +++ b/gcc/gimple-match-head.c @@ -39,6 +39,7 @@ along with GCC; see the file COPYING3. If not see #include "internal-fn.h" #include "case-cfn-macros.h" #include "gimplify.h" +#include "optabs-tree.h" /* Forward declarations of the private auto-generated matchers. diff --git a/gcc/match.pd b/gcc/match.pd index 244e9eb..eb6bd59 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -147,6 +147,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (op @0 integer_onep) (non_lvalue @0))) +/* (A / (1 << B)) -> (A >> B). + Only for unsigned A. For signed A, this would not preserve rounding + toward zero. + For example: (-1 / ( 1 << B)) != -1 >> B. 
*/ +(simplify + (trunc_div @0 (lshift integer_onep@1 @2)) + (if ((TYPE_UNSIGNED (type) || tree_expr_nonnegative_p (@0)) + && (!VECTOR_TYPE_P (type) + || target_has_vector_rshift_p (type, optab_default))) + (rshift @0 @2))) + /* Preserve explicit divisions by 0: the C++ front-end wants to detect undefined behavior in constexpr evaluation, and assuming that the division traps enables better optimizations than these anyway. */ diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c index 4bb54ba..4a513d2 100644 --- a/gcc/optabs-tree.c +++ b/gcc/optabs-tree.c @@ -376,3 +376,24 @@ init_tree_optimization_optabs (tree optnode) ggc_free (tmp_optabs); } } + +/* Return TRUE if the target has support for vector right shift of an + operand of type TYPE. If OT_TYPE is OPTAB_DEFAULT, check for existence + of a shift by either a scalar or a vector. Otherwise, check only + for a shift that matches OT_TYPE. */ + +bool +target_has_vector_rshift_p (tree type, enum optab_subtype ot_type) +{ + gcc_assert (VECTOR_TYPE_P (type)); + if (ot_type != optab_default) +{ + optab ot = optab_for_tree_code (RSHIFT_EXPR, type,
Re: [PATCH] [SPARC] Add a workaround for the LEON3FT store-store errata
From: Sebastian Huber
Date: Tue, 20 Jun 2017 07:55:33 +0200 > would someone mind reviewing this patch please. It was already sent > for review in January this year and got no attention. Now we are in a > different development stage. > > https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01354.html I'm fine with this change.
[Patch AArch64] Add initial tuning support for Cortex-A55 and Cortex-A75
Hi, This patch adds support for the ARM Cortex-A75 and Cortex-A55 processors through the -mcpu/-mtune values cortex-a55 and cortex-a75, and an ARM DynamIQ big.LITTLE configuration of these two processors through the -mcpu/-mtune value cortex-a75.cortex-a55. The ARM Cortex-A75 is ARM's latest and highest performance applications processor. For the initial tuning provided in this patch, I have chosen to share the tuning structure with its predecessor, the Cortex-A73. The ARM Cortex-A55 delivers the best combination of power efficiency and performance in its class. For the initial tuning provided in this patch, I have chosen to share the tuning structure with its predecessor, the Cortex-A53. Both Cortex-A55 and Cortex-A75 support ARMv8-A with the ARMv8.1-A and ARMv8.2-A extensions, along with the cryptography extension, and the RCPC extension from ARMv8.3-A. This is reflected in the patch: -mcpu=cortex-a75 is treated as equivalent to passing -mtune=cortex-a75 -march=armv8.2-a+rcpc. Tested on aarch64-none-elf with no issues. OK for trunk? Thanks, James --- 2017-06-20 James Greenhalgh  * config/aarch64/aarch64-cores.def (cortex-a55): New. (cortex-a75): Likewise. (cortex-a75.cortex-a55): Likewise. * config/aarch64/aarch64-tune.md: Regenerate. * doc/invoke.texi (-mtune): Document new values for -mtune. diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def index e333d5f..0baa20c 100644 --- a/gcc/config/aarch64/aarch64-cores.def +++ b/gcc/config/aarch64/aarch64-cores.def @@ -80,6 +80,12 @@ AARCH64_CORE("vulcan", vulcan, thunderx2t99, 8_1A, AARCH64_FL_FOR_ARCH8_1 | AA /* Cavium ('C') cores. */ AARCH64_CORE("thunderx2t99", thunderx2t99, thunderx2t99, 8_1A, AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x43, 0x0af, -1) +/* ARMv8.2-A Architecture Processors. */ + +/* ARM ('A') cores. 
*/ +AARCH64_CORE("cortex-a55", cortexa55, cortexa53, 8_2A, AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_RCPC, cortexa53, 0x41, 0xd05, -1) +AARCH64_CORE("cortex-a75", cortexa75, cortexa57, 8_2A, AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_RCPC, cortexa73, 0x41, 0xd0a, -1) + /* ARMv8-A big.LITTLE implementations. */ AARCH64_CORE("cortex-a57.cortex-a53", cortexa57cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, 0x41, AARCH64_BIG_LITTLE (0xd07, 0xd03), -1) @@ -87,4 +93,8 @@ AARCH64_CORE("cortex-a72.cortex-a53", cortexa72cortexa53, cortexa53, 8A, AARCH AARCH64_CORE("cortex-a73.cortex-a35", cortexa73cortexa35, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, 0x41, AARCH64_BIG_LITTLE (0xd09, 0xd04), -1) AARCH64_CORE("cortex-a73.cortex-a53", cortexa73cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, 0x41, AARCH64_BIG_LITTLE (0xd09, 0xd03), -1) +/* ARM DynamIQ big.LITTLE configurations. */ + +AARCH64_CORE("cortex-a75.cortex-a55", cortexa75cortexa55, cortexa53, 8_2A, AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_RCPC, cortexa73, 0x41, AARCH64_BIG_LITTLE (0xd0a, 0xd05), -1) + #undef AARCH64_CORE diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md index 4209f67..7fcd6cb 100644 --- a/gcc/config/aarch64/aarch64-tune.md +++ b/gcc/config/aarch64/aarch64-tune.md @@ -1,5 +1,5 @@ ;; -*- buffer-read-only: t -*- ;; Generated automatically by gentune.sh from aarch64-cores.def (define_attr "tune" - "cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,xgene1,falkor,qdf24xx,exynosm1,thunderx2t99p1,vulcan,thunderx2t99,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53" + 
"cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,xgene1,falkor,qdf24xx,exynosm1,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55" (const (symbol_ref "((enum attr_tune) aarch64_tune)"))) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 86c8d62..2746c3e 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -14077,17 +14077,19 @@ processors implementing the target architecture. @opindex mtune Specify the name of the target processor for which GCC should tune the performance of the code. Permissible values for this option are: -@samp{generic}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a57}, -@samp{cortex-a72}, @samp{cortex-a73}, @samp{exynos-m1}, -@samp{xgene1}, @samp{vulcan}, @samp{thunderx}, +@samp{generic}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a55}, +@samp{cortex-a57}, @samp{cortex-a72}, @samp{cortex-a73}, @samp{cortex-a75}, +@samp{exynos-m1}, @samp{xgene1}, @samp{vulcan}, @samp{thunderx}, @samp{thunderxt88}, @samp{thunderxt88p1}, @samp{thunderxt81}, @samp{thunderxt83}, @samp{thunderx2t99}, @samp{cortex-a57.cortex-a53}, @samp{cortex-a72.cortex-a53},
Re: [PATCH, contrib] Support multi-tool sum files in dg-cmp-results.sh
Hi Mike, Sorry, there was a mistake in the patch I sent. Please find an updated patch below. ChangeLog entry unchanged: *** contrib/ChangeLog *** 2017-06-14 Thomas Preud'homme* dg-cmp-results.sh: Keep test result lines rather than throwing header and summary to support sum files with multiple tools. Is this still ok? Best regards, Thomas On 19/06/17 16:55, Mike Stump wrote: On Jun 14, 2017, at 5:30 AM, Thomas Preudhomme wrote: 2017-06-14 Thomas Preud'homme * dg-cmp-results.sh: Keep test result lines rather than throwing header and summary to support sum files with multiple tools. Tested successfully on sum file with single tool with similar results and on sum file with multiple tools now showing a regression with patch proposed in https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00875.html Is this ok for trunk? Ok. diff --git a/contrib/dg-cmp-results.sh b/contrib/dg-cmp-results.sh index d291769547dcd2a02ecf6f80d60d6be7802af4fd..921e9337d1f8ffea78ef566c351fb48a8f6ca064 100755 --- a/contrib/dg-cmp-results.sh +++ b/contrib/dg-cmp-results.sh @@ -90,8 +90,7 @@ echo "Newer log file: $NFILE" sed $E -e '/^[[:space:]]+===/,$d' $NFILE # Create a temporary file from the old file's interesting section. -sed $E -e "1,/$header/d" \ - -e '/^[[:space:]]+===/,$d' \ +sed $E -e '/^Running target /,/^[[:space:]]+===.*Summary ===/!d' \ -e '/^[A-Z]+:/!d' \ -e '/^(WARNING|ERROR):/d' \ -e 's/\r$//' \ @@ -101,8 +100,7 @@ sed $E -e "1,/$header/d" \ >/tmp/o$$-$OBASE # Create a temporary file from the new file's interesting section. -sed $E -e "1,/$header/d" \ - -e '/^[[:space:]]+===/,$d' \ +sed $E -e '/^Running target /,/^[[:space:]]+===.*Summary ===/!d' \ -e '/^[A-Z]+:/!d' \ -e '/^(WARNING|ERROR):/d' \ -e 's/\r$//' \
Re: [PATCH/AARCH64] Improve aarch64 conditional compare usage
On Tue, 2017-06-20 at 14:58 +0100, James Greenhalgh wrote: > On Fri, Jun 16, 2017 at 10:06:51AM -0700, Steve Ellcey wrote: > > > > > > https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00021.html > > > > Ping. > Hi Steve, > > These changes all look like they are to the tree pass rather than to the > AArch64 back end. Maybe reposting it without the AArch64 tag will get it > more visibility from people other than the AArch64 maintainers? > > Cheers, > James Someone else made the same suggestion so I re-pinged here: https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01338.html With the subject "[PATCH] Ping of ccmp.c (conditional compare) patch" Steve Ellcey sell...@cavium.com
Re: [PATCH][AArch64] Improve dup pattern
James Greenhalgh wrote: > > Have you tested this in cases where an integer dup is definitely the right > thing to do? Yes, this still generates:

#include <arm_neon.h>
void f(unsigned a, unsigned b, uint32x4_t *c) { c[0] = vdupq_n_u32(a); c[1] = vdupq_n_u32(b); }

	dup v1.4s, w0
	dup v0.4s, w1
	str q1, [x2]
	str q0, [x2, 16]
	ret

The reason is that the GP to FP register move cost is typically >= 5, while the additional cost of '?' is just 1. > And similar cases? If these still look good, then the patch is OK - though > I'm still very nervous about the register allocator cost model! Well it's complex and hard to get working well... However slightly preferring one variant works alright (unlike using '*' which results in incorrect costs). Wilco
[PATCH, GCC/ARM] Remove ARMv8-M code for D17-D31
Hi, Function cmse_nonsecure_entry_clear_before_return has code to deal with high VFP registers (D16-D31), while ARMv8-M Baseline and Mainline both do not support more than 16 double VFP registers (D0-D15). This makes this security-sensitive code harder to read for not much benefit, since the libcalls for cmse_nonsecure_call functions do not deal with those high VFP registers anyway. This commit gets rid of this code for simplicity and fixes 2 issues in the same function: - stop the first loop when reaching maxregno to avoid dealing with VFP registers if targeting Thumb-1 or using -mfloat-abi=soft - include maxregno in that loop ChangeLog entry is as follows: *** gcc/ChangeLog *** 2017-06-13 Thomas Preud'homme  * config/arm/arm.c (arm_option_override): Forbid ARMv8-M Security Extensions with more than 16 double VFP registers. (cmse_nonsecure_entry_clear_before_return): Remove second entry of to_clear_mask and all code related to it and make the remaining entry a 64-bit scalar integer variable and adapt code accordingly. Testing: Testsuite shows no regression when run for ARMv8-M Baseline and ARMv8-M Mainline. Is this ok for trunk? Best regards, Thomas diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 259597d8890ee84c5bd92b12b6f9f6521c8dcd2e..60a4d1f46765d285de469f51fbb5a0ad76d56d9b 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -3620,6 +3620,11 @@ arm_option_override (void) if (use_cmse && !arm_arch_cmse) error ("target CPU does not support ARMv8-M Security Extensions"); + /* We don't clear D16-D31 VFP registers for cmse_nonsecure_call functions + and ARMv8-M Baseline and Mainline do not allow such configuration. */ + if (use_cmse && LAST_VFP_REGNUM > LAST_LO_VFP_REGNUM) + error ("ARMv8-M Security Extensions incompatible with selected FPU"); + /* Disable scheduling fusion by default if it's not armv7 processor or doesn't prefer ldrd/strd. 
*/ if (flag_schedule_fusion == 2 @@ -24996,15 +25001,15 @@ thumb1_expand_prologue (void) void cmse_nonsecure_entry_clear_before_return (void) { - uint64_t to_clear_mask[2]; + uint64_t to_clear_mask; uint32_t padding_bits_to_clear = 0; uint32_t * padding_bits_to_clear_ptr = &padding_bits_to_clear; int regno, maxregno = IP_REGNUM; tree result_type; rtx result_rtl; - to_clear_mask[0] = (1ULL << (NUM_ARG_REGS)) - 1; - to_clear_mask[0] |= (1ULL << IP_REGNUM); + to_clear_mask = (1ULL << (NUM_ARG_REGS)) - 1; + to_clear_mask |= (1ULL << IP_REGNUM); /* If we are not dealing with -mfloat-abi=soft we will need to clear VFP registers. We also check that TARGET_HARD_FLOAT and !TARGET_THUMB1 hold @@ -25015,23 +25020,22 @@ cmse_nonsecure_entry_clear_before_return (void) maxregno = LAST_VFP_REGNUM; float_mask &= ~((1ULL << FIRST_VFP_REGNUM) - 1); - to_clear_mask[0] |= float_mask; - - float_mask = (1ULL << (maxregno - 63)) - 1; - to_clear_mask[1] = float_mask; + to_clear_mask |= float_mask; /* Make sure we don't clear the two scratch registers used to clear the relevant FPSCR bits in output_return_instruction. */ emit_use (gen_rtx_REG (SImode, IP_REGNUM)); - to_clear_mask[0] &= ~(1ULL << IP_REGNUM); + to_clear_mask &= ~(1ULL << IP_REGNUM); emit_use (gen_rtx_REG (SImode, 4)); - to_clear_mask[0] &= ~(1ULL << 4); + to_clear_mask &= ~(1ULL << 4); } + gcc_assert ((unsigned) maxregno <= sizeof (to_clear_mask) * __CHAR_BIT__); + /* If the user has defined registers to be caller saved, these are no longer restored by the function before returning and must thus be cleared for security purposes. 
*/ - for (regno = NUM_ARG_REGS; regno < LAST_VFP_REGNUM; regno++) + for (regno = NUM_ARG_REGS; regno <= maxregno; regno++) { /* We do not touch registers that can be used to pass arguments as per the AAPCS, since these should never be made callee-saved by user @@ -25041,7 +25045,7 @@ cmse_nonsecure_entry_clear_before_return (void) if (IN_RANGE (regno, IP_REGNUM, PC_REGNUM)) continue; if (call_used_regs[regno]) - to_clear_mask[regno / 64] |= (1ULL << (regno % 64)); + to_clear_mask |= (1ULL << regno); } /* Make sure we do not clear the registers used to return the result in. */ @@ -25052,7 +25056,7 @@ cmse_nonsecure_entry_clear_before_return (void) /* No need to check that we return in registers, because we don't support returning on stack yet. */ - to_clear_mask[0] + to_clear_mask &= ~compute_not_to_clear_mask (result_type, result_rtl, 0, padding_bits_to_clear_ptr); } @@ -25063,7 +25067,7 @@ cmse_nonsecure_entry_clear_before_return (void) /* Padding bits to clear is not 0 so we know we are dealing with returning a composite type, which only uses r0. Let's make sure that r1-r3 is cleared too, we will use r1 as a scratch register. */ -
Re: SSA range class and removal of VR_ANTI_RANGEs
On 06/20/2017 02:41 AM, Aldy Hernandez wrote: On 05/23/2017 03:26 PM, Martin Sebor wrote: On 05/23/2017 04:48 AM, Aldy Hernandez wrote: + void Union (wide_int x, wide_int y); + bool Union (const irange &r); + bool Union (const irange &r1, const irange &r2); + + // THIS = THIS ^ [X,Y]. Return TRUE if result is non-empty. + bool Intersect (wide_int x, wide_int y, bool readonly = false); + // THIS = THIS ^ R. Return TRUE if result is non-empty. + bool Intersect (const irange &r, bool readonly = false); + // THIS = R1 ^ R2. Return TRUE if result is non-empty. + bool Intersect (const irange &r1, const irange &r2, bool readonly = false); + // Return TRUE if THIS ^ R will be non-empty. + bool Intersect_p (const irange &r) + { return Intersect (r, /*readonly=*/true); } I would suggest the following changes to Union, Intersect, and Not: 1) Define all three members without the readonly argument and returning irange& (i.e., *this). The return value can be used wherever irange& is expected, and the is_empty() member function can be called on it to obtain the same result. E.g., Intersect A with B, storing the result in A: irange A, B; if (A.Intersect (B).is_empty ()) { ... } 2) Add non-members like so: irange range_union (const irange &lhs, const irange &rhs) { return irange (lhs).Union (rhs); } and find out if the union of A or B is empty without modifying either argument: irange A, B; if (range_union (A, B).is_empty ()) { ... } Perhaps we could provide an implicit conversion from irange to bool such that we could write: if (range_union (A, B)) { ... } as well as being able to write: if (!range_union (A, B).is_empty ()) { ... } That is, have range_union() return an irange as suggested, but have a bool overload (or whatever the C++ nomenclature is) such that converting an irange to a bool is interpreted as ``nitems != 0''. Is this acceptable C++ practice? Implicit conversion to bool is a common way of testing validity but I don't think it would be too surprising to use it as a test for non-emptiness. 
An alternative to consider is to provide an implicit conversion to an unsigned integer (instead of num_ranges()(*)) and have it return the number of ranges. That will make it possible to do the same thing as above while also simplifying the API. Martin [*] FWIW, there's nothing wrong with the name num_ranges() but those familiar with the C++ standard library are going to be accustomed to size() as the name of a function that returns the number of elements in a container. Since the irange class is an ordered sequence of ranges, size() would work for it too. PS Thinking of the irange class as a container of ranges suggests the design might benefit from introducing a simple lower-level abstraction (class) for a single contiguous range.
Re: [PATCH][AArch64] Mark symbols as constant
Richard Earnshaw wrote: > What testing has this had with -fpic? I'm not convinced that this > assertion is true in that case? I ran the GLIBC tests, which pass. -fpic works since it also forms a constant address, i.e. instead of:

	adrp x1, global
	add x1, x1, :lo12:global

we do:

	adrp x1, :got:global
	ldr x1, [x1, :got_lo12:global]

CSEing or rematerializing either sequence works in the same way. With TLS the resulting addresses are also constant, however this could cause rather complex TLS sequences to be rematerialized. It seems best to block that. Updated patch below: aarch64_legitimate_constant_p currently returns false for symbols, even though they are always valid constants. This means LOSYM isn't CSEd correctly. If we return true CSE works better, resulting in smaller/faster code (0.3% smaller code on SPEC2006). Avoid this for TLS symbols since their sequence is complex. int x0 = 1, x1 = 2, x2 = 3; int f (int x, int y) { x += x1; if (x > 100) y += x2; x += x0; return x + y; } Before:

	adrp x3, .LANCHOR0
	add x4, x3, :lo12:.LANCHOR0
	ldr w2, [x3, #:lo12:.LANCHOR0]
	add w0, w0, w2
	cmp w0, 100
	ble .L5
	ldr w2, [x4, 8]
	add w1, w1, w2
.L5:
	add x3, x3, :lo12:.LANCHOR0
	ldr w2, [x3, 4]
	add w0, w0, w2
	add w0, w0, w1
	ret

After:

	adrp x2, .LANCHOR0
	add x3, x2, :lo12:.LANCHOR0
	ldr w2, [x2, #:lo12:.LANCHOR0]
	add w0, w0, w2
	cmp w0, 100
	ble .L5
	ldr w2, [x3, 8]
	add w1, w1, w2
.L5:
	ldr w2, [x3, 4]
	add w0, w0, w2
	add w0, w0, w1
	ret

Bootstrap OK, OK for commit? ChangeLog: 2017-06-20 Wilco Dijkstra  * config/aarch64/aarch64.c (aarch64_legitimate_constant_p): Return true for non-tls symbols. 
-- diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 5ec6bbfcf484baa4005b8a88cb98d0d04f710877..060cd8476d2954119daac495ecb059c9be73edbe 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -10111,6 +10111,11 @@ aarch64_legitimate_constant_p (machine_mode mode, rtx x) && aarch64_valid_symref (XEXP (x, 0), GET_MODE (XEXP (x, 0 return true; + /* Treat symbols as constants. Avoid TLS symbols as they are complex, + so spilling them is better than rematerialization. */ + if (SYMBOL_REF_P (x) && !SYMBOL_REF_TLS_MODEL (x)) +return true; + return aarch64_constant_address_p (x); }
Re: [Patch AArch64] Add rcpc extension
On 20/06/17 14:50, James Greenhalgh wrote: > > Hi, > > While GCC doesn't need to know anything about the RcPc extension for code > generation, we do need to add the extension flag to the string we pass > to the assembler when we're compiling for a CPU which implements the RcPc > extension. > > I've built a toolchain with this patch applied, and checked that we > correctly pass +rcpc on to the assembler if we give something like > -mcpu=generic+rcpc . > > OK? > > Thanks, > James > OK. R. > --- > 2017-06-20 James Greenhalgh> > * config/aarch64/aarch64-option-extensions.def (rcpc): New. > * config/aarch64/aarch64.h (AARCH64_FL_RCPC): New. > > > 0001-Patch-AArch64-Add-rcpc-extension.patch > > > diff --git a/gcc/config/aarch64/aarch64-option-extensions.def > b/gcc/config/aarch64/aarch64-option-extensions.def > index b54de03..c0752ce 100644 > --- a/gcc/config/aarch64/aarch64-option-extensions.def > +++ b/gcc/config/aarch64/aarch64-option-extensions.def > @@ -60,4 +60,7 @@ AARCH64_OPT_EXTENSION("lse", AARCH64_FL_LSE, 0, 0, > "atomics") > Disabling "fp16" just disables "fp16". */ > AARCH64_OPT_EXTENSION("fp16", AARCH64_FL_F16, AARCH64_FL_FP, 0, "fphp > asimdhp") > > +/* Enabling or disabling "rcpc" only changes "rcpc". */ > +AARCH64_OPT_EXTENSION("rcpc", AARCH64_FL_RCPC, 0, 0, "lrcpc") > + > #undef AARCH64_OPT_EXTENSION > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h > index e4fb96f..3b3f27e 100644 > --- a/gcc/config/aarch64/aarch64.h > +++ b/gcc/config/aarch64/aarch64.h > @@ -140,6 +140,7 @@ extern unsigned aarch64_architecture_version; > #define AARCH64_FL_F16 (1 << 9) /* Has ARMv8.2-A FP16 > extensions. */ > /* ARMv8.3-A architecture extensions. */ > #define AARCH64_FL_V8_3(1 << 10) /* Has ARMv8.3-A features. */ > +#define AARCH64_FL_RCPC(1 << 11) /* Has support for RCpc model. > */ > > /* Has FP and SIMD. */ > #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD) >
Re: [PATCH 2/2] DWARF: make it possible to emit debug info for declarations only
On 06/20/2017 02:16 PM, Richard Biener wrote: Nice. This looks ok. Great, thank you! I'm mildly curious about the decrease of debuginfo size for cc1 -- did you spot anything obvious there? Well, the benchmark I exposed was for the whole file size, not just .debug_info section size. Just to be sure, I compared object files for both trunk and my patched tree: outside of Ada units, I only get the following evolution: gcc/dwarf2out.o: -10168 bytes gcc/godump.o: +272 bytes gcc/passes.o: +880 bytes This diff comes from my changes themselves. I had a quick look at the same for cc1’s .debug_info: there is the expected evolution, too. I suspect Fortran wants to do sth similar as Ada for imported modules. Maybe. I have zero Fortran knowledge, so I’ll let a Fortran expert decide, if that is fine for you. :-) In any case, the back-end is ready for that. -- Pierre-Marie de Rodat
Re: [testsuite, i386] Always check for target i?86 and x86_64
On 06/20/2017 01:35 PM, Rainer Orth wrote: > JonY <10wa...@gmail.com> writes: > >> On 06/20/2017 01:01 PM, Rainer Orth wrote: >>> Given that there were no other comments, I've installed the patch. It >>> would still be nice if the Cygwin/MingW maintainer could comment on the >>> testcase situation for those targets. >> >> Honestly, I'm not sure how ms-bitfields work on non-Windows targets, > > I just noticed that it's handled in generic code in i386.c and the > affected tests worked on the likes of Linux and Solaris ;-)

gcc/testsuite/gcc.dg/array-quals-1.c              asm bits probably won't work due to object format difference
gcc/testsuite/gcc.dg/lto/20091013-1_1.c           I'm not familiar with LTO enough to comment on it
gcc/testsuite/gcc.dg/lto/20091013-1_2.c           dto.
gcc/testsuite/gcc.dg/pr32370.c                    This probably should be safe to enable for all.
gcc/testsuite/gcc.dg/pr50251.c                    dto.
gcc/testsuite/gcc.dg/tls/thr-cse-1.c              I think this should stay as is.
gcc/testsuite/gcc.dg/weak/weak-15.c               ELF weak symbols don't really work on Windows PE format and are known to be broken
gcc/testsuite/gcc.dg/weak/weak-16.c               dto.
gcc/testsuite/gcc.dg/weak/weak-2.c                dto.
gcc/testsuite/gcc.dg/weak/weak-3.c                dto.
gcc/testsuite/gcc.dg/weak/weak-4.c                dto.
gcc/testsuite/gcc.dg/weak/weak-5.c                dto.
libffi/testsuite/libffi.call/cls_longdouble_va.c  iirc libffi hasn't been ported for 64bit Windows
gcc/testsuite/g++.dg/abi/bitfield3.C              Should remain 32bit specific
gcc/testsuite/g++.dg/ext/dllexport3.C             Should be x86_64-*-cygwin too
gcc/testsuite/g++.dg/ext/selectany1.C             dto.
gcc/testsuite/g++.dg/ext/selectany2.C             dto.
gcc/testsuite/g++.old-deja/g++.ext/attrib5.C      Probably broken due to the underscore prefix in asm names for 32bit mingw/cygwin
gcc/testsuite/gcc.dg/dll-3.c                      Should be x86_64-*-cygwin too
gcc/testsuite/gcc.dg/dll-4.c                      dto.
gcc/testsuite/gcc.dg/dll-5.c                      dto.
gcc/testsuite/gcc.dg/dll-8.c                      dto.
gcc/testsuite/gcc.dg/tree-ssa/loop-1.c            dto.
gcc/testsuite/gcc.target/i386/fastcall-1.c        Keep as is, x86 specific test case.
I can't comment on the ARM mingw* port; I have no experience with it.
Re: [patch, libfortran, RFC] Speed up cshift with array shift
Hi Thomas,

On my machine I get the following timings without the patch:

cpu time cshift  dim=1  0.490763009
cpu time do loop dim=1  5.57969809E-02
cpu time cshift  dim=2  0.416319966
cpu time do loop dim=2  0.187106013
cpu time cshift  dim=3  1.37362707
cpu time do loop dim=3  1.39690399

and

cpu time cshift  dim=1  0.166012987
cpu time do loop dim=1  5.48990071E-02
cpu time cshift  dim=2  0.183587968
cpu time do loop dim=2  0.191835046
cpu time cshift  dim=3  1.35024190
cpu time do loop dim=3  1.42215610

with the patch.  Do you understand why cshift is so slow for dim=3?

Thanks for working on this issue.

Dominique

PS See also pr45689.
[AArch64] Improve HFA code generation
Hi,

For this code:

struct y { float x[4]; };

float
bar3 (struct y x)
{
  return x.x[3];
}

GCC generates:

bar3:
	fmov	x1, d2
	mov	x0, 0
	bfi	x0, x1, 0, 32
	fmov	x1, d3
	bfi	x0, x1, 32, 32
	sbfx	x0, x0, 32, 32
	fmov	s0, w0
	ret

If you can wrap your head around that, you'll spot that it could be
simplified to:

bar3:
	fmov	s0, s3
	ret

Looking at it, I think the issue is the mode that we assign to the PARALLEL
we build for an HFA in registers.  When we get in to aarch64_layout_arg
with a composite, MODE is set to the smallest integer mode that would
contain the size of the composite type.  That is to say, in the example
above, MODE will be TImode.

Looking at the expansion path through assign_parms, we're going to go:

  assign_parms
    assign_parm_setup_reg
      assign_parm_remove_parallels
        emit_group_store

assign_parm_remove_parallels is going to try to create a REG in MODE, then
construct that REG using the values in the HFA PARALLEL we created.  So,
for the example above, we're going to try to create a packed TImode value
built up from each of the four "S" registers we've assigned for the
arguments.  Using one of the struct elements is then a 32-bit extract from
the TImode value (then a move back to FP/SIMD registers).  This explains
the code-gen in the example.  Note that an extract from the TImode value
makes the whole TImode value live, so we can't optimize away the
construction in registers.

If instead we make the PARALLEL that we create in aarch64_layout_arg
BLKmode, then our expansion path is through:

  assign_parms
    assign_parm_setup_block

which handles creating a stack slot of the right size for our HFA, and
copying it there.  We could then trust the usual optimisers to deal with
the object construction and eliminate it where possible.  However, we
can't just return a BLKmode PARALLEL, as the mid-end was explicitly asking
us to return in MODE, and will eventually ICE given the inconsistency.
One other way we can force these structures to be given BLKmode is through
TARGET_MEMBER_TYPE_FORCES_BLK.
Which is what we do in this patch.  We're going to tell the mid-end that
any structure of more than one element which contains either
floating-point or vector data should be set out in BLKmode rather than a
large-enough integer mode.  In doing so, we implicitly fix the issue with
HFA layout above.

But at what cost!  A long running deficiency in GCC's code-gen (it doesn't
clean up stack allocations after stack uses have been eliminated) prevents
us from getting what we really wanted, but:

bar3:
	sub	sp, sp, #16
	fmov	s0, s3
	add	sp, sp, 16
	ret

is pretty close, and a huge improvement over where we are today.

Note that we can still get some pretty bad code-generation out of the
compiler when passing and returning structs.  I particularly like this one:

struct y { float x[4]; };

struct y
bar (struct y x)
{
  return x;
}

bar:
	sub	sp, sp, #48
	stp	s0, s1, [sp, 16]
	stp	s2, s3, [sp, 24]
	ldp	x0, x1, [sp, 16]
	stp	x0, x1, [sp, 32]
	ldp	s0, s1, [sp, 32]
	ldp	s2, s3, [sp, 40]
	add	sp, sp, 48
	ret

But that looks to be a separate issue, and is not substantially worse than
current trunk:

bar:
	fmov	x2, d0
	mov	x1, 0
	mov	x0, 0
	bfi	x1, x2, 0, 32
	fmov	x2, d2
	bfi	x0, x2, 0, 32
	fmov	x2, d1
	bfi	x1, x2, 32, 32
	fmov	x2, d3
	bfi	x0, x2, 32, 32
	ubfx	x2, x1, 0, 32
	ubfx	x1, x1, 32, 32
	fmov	s0, w2
	ubfx	x3, x0, 0, 32
	fmov	s1, w1
	ubfx	x0, x0, 32, 32
	fmov	s2, w3
	fmov	s3, w0
	ret

I've benchmarked this with Spec2000 and found no performance differences,
and bootstrapped on aarch64-none-linux-gnu with no issues.

Does this look like a sensible approach and if so, is it OK for trunk?

Thanks,
James

---

gcc/

2017-06-20  James Greenhalgh

	* config/aarch64/aarch64.c (aarch64_layout_arg): Construct HFA
	PARALLELs in BLKmode.

gcc/testsuite/

2017-06-20  James Greenhalgh

	* gcc.target/aarch64/hfa_1.c: New.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 04417dc..a147068 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14925,6 +14925,32 @@ aarch64_sched_can_speculate_insn (rtx_insn *insn)
     }
 }
 
+/* We're an composite type, so MODE is the smallest integer mode
+   that can fit the total size of our aggregate.  However,
+   we're going to build a parallel that contains each of our
+   registers, and GCC is going to emit code to move them in
+   to a packed value in MODE.  As an example, for an HFA of
+   two
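The struct-passing examples above can be reproduced outside the testsuite. The harness below is a hypothetical sketch (it is not the patch's hfa_1.c): compile it with -O2 -S on an aarch64 target to inspect the argument-passing sequence for an HFA, or run it natively as a behavioural sanity check.

```cpp
// An HFA (Homogeneous Floating-point Aggregate): up to four elements
// of a single floating-point type.  Under the AAPCS64 this struct is
// passed in the four SIMD registers s0-s3.
struct y { float x[4]; };

// noinline keeps the argument-passing sequence visible in the assembly;
// with the patch applied this should reduce to "fmov s0, s3; ret".
__attribute__ ((noinline)) float
bar3 (struct y v)
{
  return v.x[3];
}
```

Comparing the -S output before and after the patch shows the TImode bit-insert sequence collapse to the single fmov.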
Re: [PATCH] Fix PR71815 (SLSR misses PHI opportunities)
On Jun 20, 2017, at 6:23 AM, Richard Biener wrote:
>
> On Fri, Jun 16, 2017 at 6:10 PM, Bill Schmidt wrote:
>> Hi,
>>
>> PR71815 identifies a situation where SLSR misses opportunities for
>> PHI candidates when code hoisting is enabled (which is now on by
>> default).  The basic problem is that SLSR currently uses an overly
>> simple test for profitability of the transformation.  The algorithm
>> currently requires that the PHI basis (through which the non-local
>> SLSR candidate is propagated) has only one use, which is the
>> candidate statement.  The true requirement for profitability is
>> that, if the candidate statement will be dead after transformation,
>> then so will the PHI candidate.
>>
>> This patch fixes the problem by looking at the transitive reachability
>> of the PHI definitions.  If all paths terminate in the candidate
>> statement, then we know the PHI basis will go dead and we will not
>> make the code worse with the planned replacement.  To avoid compile
>> time issues, path search is arbitrarily terminated at depth 10.  The
>> new test is used throughout the cost calculation, so appears multiple
>> times in the code.
>>
>> Also, I've added a check to avoid replacing multiply candidates with
>> a stride of 1.  Such a candidate is really a copy or cast statement,
>> and if we replace it, we will just generate a different copy or cast
>> statement.  I noticed this with one of the test cases from the PR
>> while debugging the problem.
>>
>> I've updated the two test cases that were previously enabled only
>> with -fno-code-hoisting, removing that restriction.
>>
>> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
>> regressions.  I've also tested this with SPEC cpu2006 and the
>> patch is performance neutral on a POWER8 box (as expected).  Is
>> this ok for trunk?
>>
>> Thanks,
>> Bill
>>
>>
>> [gcc]
>>
>> 2016-06-16  Bill Schmidt
>>
>>	* gimple-ssa-strength-reduction.c (uses_consumed_by_stmt): New
>>	function.
>>(find_basis_for_candidate): Call uses_consumed_by_stmt rather than >>has_single_use. >>(slsr_process_phi): Likewise. >>(replace_uncond_cands_and_profitable_phis): Don't replace a >>multiply candidate with a stride of 1 (copy or cast). >>(phi_incr_cost): Call uses_consumed_by_stmt rather than >>has_single_use. >>(lowest_cost_path): Likewise. >>(total_savings): Likewise. >> >> [gcc/testsuite] >> >> 2016-06-16 Bill Schmidt >> >>* gcc.dg/tree-ssa/slsr-35.c: Remove -fno-code-hoisting workaround. >>* gcc.dg/tree-ssa/slsr-36.c: Likewise. >> >> >> Index: gcc/gimple-ssa-strength-reduction.c >> === >> --- gcc/gimple-ssa-strength-reduction.c (revision 239241) >> +++ gcc/gimple-ssa-strength-reduction.c (working copy) >> @@ -475,6 +475,48 @@ find_phi_def (tree base) >> return c->cand_num; >> } >> >> +/* Determine whether all uses of NAME are directly or indirectly >> + used by STMT. That is, we want to know whether if STMT goes >> + dead, the definition of NAME also goes dead. */ >> +static bool >> +uses_consumed_by_stmt (tree name, gimple *stmt, unsigned recurse) > > use a default arg 'unsigned recurse = 0' to hide this implementation > detail at users. Good idea, thanks. > >> +{ >> + gimple *use_stmt; >> + imm_use_iterator iter; >> + bool retval = true; >> + >> + FOR_EACH_IMM_USE_STMT (use_stmt, iter, name) >> +{ >> + if (use_stmt == stmt || is_gimple_debug (use_stmt)) >> + continue; >> + >> + if (!is_gimple_assign (use_stmt)) >> + { >> + retval = false; >> + BREAK_FROM_IMM_USE_STMT (iter); >> + } >> + >> + /* Limit recursion. */ >> + if (recurse >= 10) >> + { >> + retval = false; >> + BREAK_FROM_IMM_USE_STMT (iter); >> + } > > Put this limit right before the recursion. 
> >> + tree next_name = gimple_get_lhs (use_stmt); >> + if (!next_name || !is_gimple_reg (next_name)) >> + { >> + retval = false; >> + BREAK_FROM_IMM_USE_STMT (iter); >> + } >> + >> + if (uses_consumed_by_stmt (next_name, stmt, recurse + 1)) >> + continue; > > So this doesn't change dependent on the result which means you likely meant > > if (! uses) > { > retval = false; > BREAK... > } > > which possibly also invalidates your testing? Grumble. Can't believe I did that. Yep, will respin. > > The whole thing is probably easier to optimize if you merge the ifs > that break into one. Will do! Thanks, Richard! Bill > > Richard. > >> +} >> + >> + return retval; >> +} >> + >> /* Helper routine for find_basis_for_candidate. May be called twice: >>once for the candidate's base expr,
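Richard's review point above — the un-negated recursive test — is easier to see on a toy model. The sketch below is hypothetical: it models SSA def-use chains with a plain map instead of GCC's immediate-use iterators, and uses a '!' name prefix to stand in for a use in a non-assignment statement. It shows the corrected logic: the recursion limit is checked just before recursing, and a use whose sub-uses are not all consumed fails the whole query.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy def-use graph: each SSA name maps to the names defined by the
// statements that use it.  A name starting with '!' models a use in a
// non-assignment statement (a call or return), which can never be
// consumed by the candidate.
using use_map = std::map<std::string, std::vector<std::string>>;

// Return true if every use of NAME is directly or transitively
// consumed by STMT -- i.e. if STMT goes dead, NAME's definition
// goes dead too.
static bool
uses_consumed_by_stmt (const use_map &uses, const std::string &name,
                       const std::string &stmt, unsigned recurse = 0)
{
  auto it = uses.find (name);
  if (it == uses.end ())
    return true;  // a result with no uses is trivially consumed

  for (const std::string &use : it->second)
    {
      if (use == stmt)
        continue;

      // Not an assignment: the value escapes the candidate.
      if (!use.empty () && use[0] == '!')
        return false;

      // Limit recursion, checked right before the recursive call.
      if (recurse >= 10)
        return false;

      // The fix from the review: the recursive result must be negated
      // -- an unconsumed sub-use fails the whole query.  The posted
      // patch forgot the '!', so the loop never set retval to false.
      if (!uses_consumed_by_stmt (uses, use, stmt, recurse + 1))
        return false;
    }
  return true;
}
```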
[PING^3] re [PATCH v2] C++: Add fix-it hints for -Wold-style-cast
Ping re this patch: https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00204.html (more description can be seen in v1 of the patch here: https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01429.html ) On Mon, 2017-06-05 at 12:41 -0400, David Malcolm wrote: > Ping re this patch: > > https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00204.html > > On Fri, 2017-05-26 at 15:35 -0400, David Malcolm wrote: > > On Wed, 2017-05-03 at 09:51 -0400, David Malcolm wrote: > > > On Thu, 2017-04-27 at 23:03 +0200, Marek Polacek wrote: > > > > On Thu, Apr 27, 2017 at 05:10:24PM -0400, David Malcolm wrote: > > > > > + /* First try const_cast. */ > > > > > + trial = build_const_cast (dst_type, orig_expr, 0 /* > > > > > complain > > > > > */); > > > > > + if (trial != error_mark_node) > > > > > +return "const_cast"; > > > > > + > > > > > + /* If that fails, try static_cast. */ > > > > > + trial = build_static_cast (dst_type, orig_expr, 0 /* > > > > > complain > > > > > */); > > > > > + if (trial != error_mark_node) > > > > > +return "static_cast"; > > > > > + > > > > > + /* Finally, try reinterpret_cast. */ > > > > > + trial = build_reinterpret_cast (dst_type, orig_expr, 0 /* > > > > > complain */); > > > > > + if (trial != error_mark_node) > > > > > +return "reinterpret_cast"; > > > > > > > > I think you'll want tf_none instead of 0 /* complain */ in > > > > these. > > > > > > > > Marek > > > > > > Thanks. > > > > > > Here's an updated version of the patch. > > > > > > Changes since v1: > > > - updated expected fixit-formatting (the new fix-it printer in > > > r247548 handles this properly now) > > > - added new test cases as suggested by Florian > > > - use "tf_none" rather than "0 /* complain */" > > > > > > Successfully bootstrapped on x86_64-pc-linux-gnu. > > > > > > OK for trunk? > > > > > > gcc/cp/ChangeLog: > > > * parser.c (get_cast_suggestion): New function. > > > (maybe_add_cast_fixit): New function. 
> > > (cp_parser_cast_expression): Capture the location of the > > > closing > > > parenthesis. Call maybe_add_cast_fixit when emitting warnings > > > about old-style casts. > > > > > > gcc/testsuite/ChangeLog: > > > * g++.dg/other/old-style-cast-fixits.C: New test case. > > > --- > > > gcc/cp/parser.c| 93 > > > - > > > gcc/testsuite/g++.dg/other/old-style-cast-fixits.C | 95 > > > ++ > > > 2 files changed, 186 insertions(+), 2 deletions(-) > > > create mode 100644 gcc/testsuite/g++.dg/other/old-style-cast > > > -fixits.C > > > > > > diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c > > > index 4714bc6..2f83aa9 100644 > > > --- a/gcc/cp/parser.c > > > +++ b/gcc/cp/parser.c > > > @@ -8633,6 +8633,85 @@ cp_parser_tokens_start_cast_expression > > > (cp_parser *parser) > > > } > > > } > > > > > > +/* Try to find a legal C++-style cast to DST_TYPE for ORIG_EXPR, > > > trying them > > > + in the order: const_cast, static_cast, reinterpret_cast. > > > + > > > + Don't suggest dynamic_cast. > > > + > > > + Return the first legal cast kind found, or NULL otherwise. > > > */ > > > + > > > +static const char * > > > +get_cast_suggestion (tree dst_type, tree orig_expr) > > > +{ > > > + tree trial; > > > + > > > + /* Reuse the parser logic by attempting to build the various > > > kinds > > > of > > > + cast, with "complain" disabled. > > > + Identify the first such cast that is valid. */ > > > + > > > + /* Don't attempt to run such logic within template processing. > > > */ > > > + if (processing_template_decl) > > > +return NULL; > > > + > > > + /* First try const_cast. */ > > > + trial = build_const_cast (dst_type, orig_expr, tf_none); > > > + if (trial != error_mark_node) > > > +return "const_cast"; > > > + > > > + /* If that fails, try static_cast. */ > > > + trial = build_static_cast (dst_type, orig_expr, tf_none); > > > + if (trial != error_mark_node) > > > +return "static_cast"; > > > + > > > + /* Finally, try reinterpret_cast. 
*/ > > > + trial = build_reinterpret_cast (dst_type, orig_expr, tf_none); > > > + if (trial != error_mark_node) > > > +return "reinterpret_cast"; > > > + > > > + /* No such cast possible. */ > > > + return NULL; > > > +} > > > + > > > +/* If -Wold-style-cast is enabled, add fix-its to RICHLOC, > > > + suggesting how to convert a C-style cast of the form: > > > + > > > + (DST_TYPE)ORIG_EXPR > > > + > > > + to a C++-style cast. > > > + > > > + The primary range of RICHLOC is asssumed to be that of the > > > original > > > + expression. OPEN_PAREN_LOC and CLOSE_PAREN_LOC give the > > > locations > > > + of the parens in the C-style cast. */ > > > + > > > +static void > > > +maybe_add_cast_fixit (rich_location *rich_loc, location_t > > > open_paren_loc, > > > + location_t close_paren_loc, tree > > > orig_expr, > > > + tree dst_type) > > > +{ > > > + /* This function is
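The order in which get_cast_suggestion tries the casts mirrors their strictness: const_cast changes only qualifiers, static_cast performs value conversions, and reinterpret_cast reinterprets bits. A hypothetical illustration of the three categories the fix-it distinguishes (not taken from the patch's test case):

```cpp
#include <cstdint>

const int answer = 42;

// (int *)&answer merely strips const: the fix-it suggests const_cast.
inline int *
strip_const ()
{
  return const_cast<int *> (&answer);
}

// (long)d is an ordinary value conversion: static_cast suffices.
inline long
to_long (double d)
{
  return static_cast<long> (d);
}

// (uintptr_t)p changes how the bits are interpreted: only
// reinterpret_cast is valid here.
inline std::uintptr_t
to_address (const int *p)
{
  return reinterpret_cast<std::uintptr_t> (p);
}
```

The C-style form of each body would trigger -Wold-style-cast; the patch's fix-it offers exactly the first cast kind that builds without error.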
Re: [PATCH/AARCH64] Improve aarch64 conditional compare usage
On Fri, Jun 16, 2017 at 10:06:51AM -0700, Steve Ellcey wrote:
>
> https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00021.html
>
> Ping.

Hi Steve,

These changes all look like they are to the tree pass rather than to the
AArch64 back end.  Maybe reposting it without the AArch64 tag will get it
more visibility from people other than the AArch64 maintainers?

Cheers,
James
[Patch AArch64] Add rcpc extension
Hi,

While GCC doesn't need to know anything about the RcPc extension for code
generation, we do need to add the extension flag to the string we pass to
the assembler when we're compiling for a CPU which implements the RcPc
extension.

I've built a toolchain with this patch applied, and checked that we
correctly pass +rcpc on to the assembler if we give something like
-mcpu=generic+rcpc .

OK?

Thanks,
James

---

2017-06-20  James Greenhalgh

	* config/aarch64/aarch64-option-extensions.def (rcpc): New.
	* config/aarch64/aarch64.h (AARCH64_FL_RCPC): New.

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index b54de03..c0752ce 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -60,4 +60,7 @@ AARCH64_OPT_EXTENSION("lse", AARCH64_FL_LSE, 0, 0, "atomics")
    Disabling "fp16" just disables "fp16".  */
 AARCH64_OPT_EXTENSION("fp16", AARCH64_FL_F16, AARCH64_FL_FP, 0, "fphp asimdhp")
 
+/* Enabling or disabling "rcpc" only changes "rcpc".  */
+AARCH64_OPT_EXTENSION("rcpc", AARCH64_FL_RCPC, 0, 0, "lrcpc")
+
 #undef AARCH64_OPT_EXTENSION
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index e4fb96f..3b3f27e 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -140,6 +140,7 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_FL_F16        (1 << 9)  /* Has ARMv8.2-A FP16 extensions.  */
 /* ARMv8.3-A architecture extensions.  */
 #define AARCH64_FL_V8_3       (1 << 10)  /* Has ARMv8.3-A features.  */
+#define AARCH64_FL_RCPC       (1 << 11)  /* Has support for RCpc model.  */
 
 /* Has FP and SIMD.  */
 #define AARCH64_FL_FPSIMD     (AARCH64_FL_FP | AARCH64_FL_SIMD)
Re: [PATCH 2/3] Simplify wrapped binops
On Tue, Jun 20, 2017 at 3:08 PM, Robin Dapp wrote:
>>> Currently, extract_... () does that all that for me, is it really too
>>> expensive to call?  I guess, using get_range_info first and calling
>>> extract when get_range_info returns a VR_RANGE is not really a favorable
>>> thing to do either? :)
>> Not only the cost, we should avoid introducing more interfaces while
>> old ones can do the work.  Anyway, it's Richard's call here.
>
> I rewrote the match.pd patterns to use get_range_info () now, keeping
> track of an "ok" overflow (both min and max overflow) and one which does
> not allow us to continue (min xor max overflow, split/anti range).  Test
> suite on s390x has no regressions, bootstrap is ok, x86 running.

+  (if (TREE_CODE (type) == INTEGER_TYPE
+       && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@3)))
+   (with

use INTEGRAL_TYPE_P.

+     bool ovf_undef = TYPE_OVERFLOW_UNDEFINED (inner_type);
+

so this is overflow behavior of the inner op.

+     /* Convert combined constant to tree of outer type if
+	there was no overflow in the original operation.  */

"in the original inner operation."

you then go on and use ovf_undef also for the outer operation:

+     if (ovf_undef || vr_outer == VR_RANGE)
+       {

but you do not actually _use_ vr_outer.  Do you think that if vr_outer is
a VR_RANGE then the outer operation may not possibly have wrapped?  That's
a false conclusion.  But I don't see how overflow in the original outer
operation matters and the code lacks comments as to explaining that as
well.

So if you have a vr0 then you can compute whether the inner operation
cannot overflow.
You do this here:

+     if (!ovf_undef && vr0 == VR_RANGE)
+       {
+	 int max_ovf = 0;
+	 int min_ovf = 0;
+
+	 signop sgn = TYPE_SIGN (inner_type);
+
+	 wmin = wi::add (wmin0, w1);
+	 min_ovf = wi::cmp (wmin, w1, sgn) < 0;
+
+	 wmax = wi::add (wmax0, w1);
+	 max_ovf = wi::cmp (wmax, w1, sgn) < 0;
+
+	 ovf = min_ovf || max_ovf;
+
+	 split_range = ((min_ovf && !max_ovf)
+			|| (!min_ovf && max_ovf));

ah, here's the use of the outer value-range.  This lacks a comment (and it
looks fishy given the outer value-range is a conservative approximation
and thus could be [-INF, +INF]).  Why's this not using the wi::add
overload with the overflow flag?

ISTR you want to handle "negative" unsigned constants somehow, but then I
don't see how the above works.  I'd say if wmin/wmax interpreted as signed
are positive and then using a signed op to add w1 results in a still
positive number you're fine (you don't seem to restrict the widening cast
to either zero- or sign-extending).

+	 if (ovf_undef || !split_range)
+	   {
+	     /* Extend @1 to TYPE.  */
+	     w1 = w1.from (w1, TYPE_PRECISION (type),
+			   ovf ? SIGNED : TYPE_SIGN (TREE_TYPE (@1)));

ideally you could always interpret w1 as signed?

+	     /* Combine in outer, larger type.  */
+	     wide_int combined_cst;
+	     combined_cst = wi::add (w1, w2);

+    (if (cst)
+     (outer_op (convert @0) { cst; }))
+   )

bogus indent.

+/* ((T)(A)) +- CST -> (T)(A +- CST) */
+#if GIMPLE
+  (for outer_op (plus minus)
+   (simplify
+    (outer_op (convert SSA_NAME@0) INTEGER_CST@2)
+     (if (TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
+	  && TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
+	  && TREE_CODE (type) == INTEGER_TYPE)

INTEGRAL_TYPE_P and do that first before looking at TYPE_PRECISION.
+     if (vr == VR_RANGE)
+       {
+	 wide_int wmin = wi::add (wmin0, w1);
+	 bool min_ovf = wi::cmp (wmin, w1, sgn) < 0;
+
+	 wide_int wmax = wi::add (wmax0, w1);
+	 bool max_ovf = wi::cmp (wmax, w1, sgn) < 0;
+
+	 split_range = (min_ovf && !max_ovf) || (!min_ovf && max_ovf);

similar -- why not use the wi::add overload with the overflow flag?

Btw, I find

  (with { tree x = NULL; if (...) x = non-NULL; }
   (if (x)
    (

ugly.  Use

  (with { ... }
   (if (...)
    (... { non-NULL } )

or sth like that which makes control flow more easily visible.

Richard.

> Regards
> Robin
>
> --
>
> gcc/ChangeLog:
>
> 2017-06-19  Robin Dapp
>
>	* match.pd: Simplify wrapped binary operations.
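The min/max-overflow bookkeeping under review guards a genuine semantic difference: hoisting a constant addition out of a widening conversion is only safe when the narrow addition cannot wrap. A standalone illustration (hypothetical; not the patch's code):

```cpp
#include <cstdint>

// (int)(uint8_t)(a + 100): the inner narrow addition may wrap mod 256.
inline int
narrow_then_widen (std::uint8_t a)
{
  std::uint8_t t = static_cast<std::uint8_t> (a + 100);
  return static_cast<int> (t);
}

// (int)a + 100: the form the match.pd pattern would like to produce.
inline int
widen_then_add (std::uint8_t a)
{
  return static_cast<int> (a) + 100;
}
```

For a = 10 both functions return 110; for a = 200 the first returns 44 (300 mod 256) while the second returns 300 — the wrapped case the VR_RANGE check must detect and reject.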
[arm-embedded] [PATCH, GCC/ARM, Stage 1] Rename FPSCR builtins to correct names
Hi,

We have decided to apply the following patch to the embedded-6-branch to
fix naming of an ARM intrinsic.  ChangeLog entry is as follows:

2017-06-20  Thomas Preud'homme

	Backport from mainline
	2017-05-04  Prakhar Bahuguna

	gcc/
	* gcc/config/arm/arm-builtins.c (arm_init_builtins): Rename
	__builtin_arm_ldfscr to __builtin_arm_get_fpscr, and rename
	__builtin_arm_stfscr to __builtin_arm_set_fpscr.

	gcc/testsuite/
	* gcc.target/arm/fpscr.c: New file.

Best regards,

Thomas

--- Begin Message ---

Hi Prakhar,

Sorry for the delay,

On 22/03/17 10:46, Prakhar Bahuguna wrote:

The GCC documentation in section 6.60.8 ARM Floating Point Status and
Control Intrinsics states that the FPSCR register can be read and written
to using the intrinsics __builtin_arm_get_fpscr and
__builtin_arm_set_fpscr.  However, these are misnamed within GCC itself
and these intrinsic names are not recognised.  This patch corrects the
intrinsic names to match the documentation, and adds tests to verify these
intrinsics generate the correct instructions.

Testing done: Ran regression tests on arm-none-eabi for Cortex-M4.

2017-03-09  Prakhar Bahuguna

gcc/ChangeLog:

	* gcc/config/arm/arm-builtins.c (arm_init_builtins): Rename
	__builtin_arm_ldfscr to __builtin_arm_get_fpscr, and rename
	__builtin_arm_stfscr to __builtin_arm_set_fpscr.
	* gcc/testsuite/gcc.target/arm/fpscr.c: New file.

Okay for stage 1?

I see that the mistake was in not addressing one of the review comments in:
https://gcc.gnu.org/ml/gcc-patches/2014-04/msg01832.html
properly in the patch that added these functions :(

This is ok for stage 1 if a bootstrap and test on arm-none-linux-gnueabihf
works fine.  I don't think we want to maintain the __builtin_arm_[ld,st]fscr
names for backwards compatibility as they were not documented and are
__builtin_arm* functions that we don't guarantee to maintain.

Thanks,
Kyrill

--
Prakhar Bahuguna

--- End Message ---
[Patch AArch64 obvious] Fix expected string for fp16 extensions
Hi,

As currently coded, the native detection of the fp16 architecture
extension from the ARMv8.2-A extensions looks for the string "fp16", but
the kernel exposes support of these features through two strings: "fphp"
for scalar 16-bit floating point support, and "asimdhp" for vector 16-bit
floating-point support [1].

This patch fixes the string we look for, looking for the pair of both
fphp and asimdhp.

I have no platform to test this on, so my testing is to show that it
builds and correctly enables the fp16 extension when given a faked up
/proc/cpuinfo.

I've committed this as obvious to trunk (as revision 249411) and
gcc-7-branch (as revision 249413).

Thanks,
James

[1] Patchwork arm64: Add support for Half precision floating point
https://patchwork.kernel.org/patch/8124451/

---

2017-06-20  James Greenhalgh

	* config/aarch64/aarch64-option-extensions.def (fp16): Fix expected
	feature string.

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index 36766d9..b54de03 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -58,6 +58,6 @@ AARCH64_OPT_EXTENSION("lse", AARCH64_FL_LSE, 0, 0, "atomics")
 
 /* Enabling "fp16" also enables "fp".
    Disabling "fp16" just disables "fp16".  */
-AARCH64_OPT_EXTENSION("fp16", AARCH64_FL_F16, AARCH64_FL_FP, 0, "fp16")
+AARCH64_OPT_EXTENSION("fp16", AARCH64_FL_F16, AARCH64_FL_FP, 0, "fphp asimdhp")
 
 #undef AARCH64_OPT_EXTENSION
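The fix boils down to matching the right tokens on the kernel's Features line. The function below is a hypothetical sketch of that check (GCC's real aarch64 native-detection code is structured differently); it matches whole tokens so that, say, a search for "fp" cannot falsely hit inside "fphp":

```cpp
#include <initializer_list>
#include <sstream>
#include <string>

// Return true if FEATURES (a space-separated list, as found on the
// "Features :" line of /proc/cpuinfo) contains every word in WANTED.
// For the fp16 extension the wanted words are "fphp" and "asimdhp".
static bool
has_all_features (const std::string &features,
                  std::initializer_list<const char *> wanted)
{
  for (const char *w : wanted)
    {
      std::istringstream in (features);
      std::string tok;
      bool found = false;
      // Whole-token comparison, not a substring search.
      while (in >> tok)
        if (tok == w)
          {
            found = true;
            break;
          }
      if (!found)
        return false;
    }
  return true;
}
```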
Re: [testsuite, i386] Always check for target i?86 and x86_64
JonY <10wa...@gmail.com> writes:
> On 06/20/2017 01:01 PM, Rainer Orth wrote:
>> Given that there were no other comments, I've installed the patch.  It
>> would still be nice if the Cygwin/MingW maintainer could comment on the
>> testcase situation for those targets.
>
> Honestly, I'm not sure how ms-bitfields work on non-Windows targets,

I just noticed that it's handled in generic code in i386.c and the
affected tests worked on the likes of Linux and Solaris ;-)

> beyond that, the patch looks like it won't change the tests that run for
> mingw/cygwin.

True: as I mentioned in the submission

	https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01103.html

I didn't touch the Cygwin/MingW patches listed there except for those
where I could verify that they would/should work on any x86 target.
Maybe you can have a look at the questions raised there ("There's one
group of targets I've omitted completely"...)?

Thanks.

	Rainer

--
Rainer Orth, Center for Biotechnology, Bielefeld University
[Patch AArch64 obvious] Rearrange the processors in aarch64-cores.def
Hi, This patch rearranges the cores in aarch64-cores.def first by architecture revision, then by alphabetical order of implementer ID. This just neatens up the file a bit, as it is growing to be unwieldy. Committed as revision 249410. Thanks, James --- 2017-06-20 James Greenhalgh* config/aarch64/aarch64-cores.def: Rearrange to sort by architecture, then by implementer ID. * config/aarch64/aarch64-tune.md: Regenerate. diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def index 92b57cf..e333d5f 100644 --- a/gcc/config/aarch64/aarch64-cores.def +++ b/gcc/config/aarch64/aarch64-cores.def @@ -43,7 +43,7 @@ VARIANT is the variant of the CPU. In a GNU/Linux system it can found in /proc/cpuinfo. If this is -1, this means it can match any variant. */ -/* V8 Architecture Processors. */ +/* ARMv8-A Architecture Processors. */ /* ARM ('A') cores. */ AARCH64_CORE("cortex-a35", cortexa35, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa35, 0x41, 0xd04, -1) @@ -52,13 +52,6 @@ AARCH64_CORE("cortex-a57", cortexa57, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AA AARCH64_CORE("cortex-a72", cortexa72, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, 0x41, 0xd08, -1) AARCH64_CORE("cortex-a73", cortexa73, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, 0x41, 0xd09, -1) -/* Samsung ('S') cores. */ -AARCH64_CORE("exynos-m1", exynosm1, exynosm1, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1, 0x53, 0x001, -1) - -/* Qualcomm ('Q') cores. */ -AARCH64_CORE("falkor", falkor,cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx, 0x51, 0xC00, -1) -AARCH64_CORE("qdf24xx", qdf24xx, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx, 0x51, 0xC00, -1) - /* Cavium ('C') cores. 
*/ AARCH64_CORE("thunderx", thunderx, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx, 0x43, 0x0a0, -1) /* Do not swap around "thunderxt88p1" and "thunderxt88", @@ -67,18 +60,27 @@ AARCH64_CORE("thunderxt88p1", thunderxt88p1, thunderx, 8A, AARCH64_FL_FOR_ARCH AARCH64_CORE("thunderxt88", thunderxt88, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx, 0x43, 0x0a1, -1) AARCH64_CORE("thunderxt81", thunderxt81, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx, 0x43, 0x0a2, -1) AARCH64_CORE("thunderxt83", thunderxt83, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx, 0x43, 0x0a3, -1) -AARCH64_CORE("thunderx2t99", thunderx2t99, thunderx2t99, 8_1A, AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x43, 0x0af, -1) /* APM ('P') cores. */ AARCH64_CORE("xgene1", xgene1,xgene1,8A, AARCH64_FL_FOR_ARCH8, xgene1, 0x50, 0x000, -1) -/* V8.1 Architecture Processors. */ +/* Qualcomm ('Q') cores. */ +AARCH64_CORE("falkor", falkor,cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx, 0x51, 0xC00, -1) +AARCH64_CORE("qdf24xx", qdf24xx, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx, 0x51, 0xC00, -1) + +/* Samsung ('S') cores. */ +AARCH64_CORE("exynos-m1", exynosm1, exynosm1, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1, 0x53, 0x001, -1) + +/* ARMv8.1-A Architecture Processors. */ /* Broadcom ('B') cores. */ AARCH64_CORE("thunderx2t99p1", thunderx2t99p1, thunderx2t99, 8_1A, AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x42, 0x516, -1) AARCH64_CORE("vulcan", vulcan, thunderx2t99, 8_1A, AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x42, 0x516, -1) -/* V8 big.LITTLE implementations. */ +/* Cavium ('C') cores. 
*/ +AARCH64_CORE("thunderx2t99", thunderx2t99, thunderx2t99, 8_1A, AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x43, 0x0af, -1) + +/* ARMv8-A big.LITTLE implementations. */ AARCH64_CORE("cortex-a57.cortex-a53", cortexa57cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, 0x41, AARCH64_BIG_LITTLE (0xd07, 0xd03), -1) AARCH64_CORE("cortex-a72.cortex-a53", cortexa72cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, 0x41, AARCH64_BIG_LITTLE (0xd08, 0xd03), -1) diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md index c948846..4209f67 100644 --- a/gcc/config/aarch64/aarch64-tune.md +++ b/gcc/config/aarch64/aarch64-tune.md @@ -1,5 +1,5 @@ ;; -*- buffer-read-only: t -*- ;; Generated automatically by gentune.sh from aarch64-cores.def (define_attr "tune" -
Re: [RFC] Dejagnu patch to handle multi-line directives
Mike Stump writes:
> On Jun 10, 2017, at 12:57 AM, Tom de Vries wrote:
>>
>> one thing that has bothered me on a regular basis is the inability to
>> spread long dejagnu directives over multiple lines.
>
> I'm not terribly in favor of this.  I'd like to retain the ability to
> grep and sed single line things.  It makes exploring and finding things
> easier.  Also, if we bulk convert to a new framework system for running
> test cases, the conversion is easier.

I have to agree.  Besides, extremely long lines with DejaGnu directives
often are a sign that something is amiss, e.g. long lists of targets
instead of an effective-target keyword describing what's common between
them, or Tom's example, which is way more readably handled via
dg-add-options as he's discovered in the meantime ;-)

	Rainer

--
Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [testsuite, i386] Always check for target i?86 and x86_64
On 06/20/2017 01:01 PM, Rainer Orth wrote:
>
> once I got the syntax right, this worked fine: it needs
>
>	{ target { ilp32 || lp64 } }
>
> instead ;-)
>
> I've also now managed to complete a Darwin/x86_64 bootstrap by locally
> reverting the two culprit patches for PR bootstrap/81033 and confirmed
> that the patched testcases are fine there, too.
>
> Given that there were no other comments, I've installed the patch.  It
> would still be nice if the Cygwin/MingW maintainer could comment on the
> testcase situation for those targets.

Honestly, I'm not sure how ms-bitfields work on non-Windows targets;
beyond that, the patch looks like it won't change the tests that run for
mingw/cygwin.
[arm-embedded] [PATCH, GCC/LTO, ping] Fix PR69866: LTO with def for weak alias in regular object file
Hi,

We have decided to apply the referenced fix (r249352) to the
ARM/embedded-6-branch along with its initial commit (r249224) to fix an
ICE with LTO and aliases.

Fix PR69866

2017-06-20  Thomas Preud'homme

	Backport from mainline
	2017-06-15  Jan Hubicka
		    Thomas Preud'homme

	gcc/
	PR lto/69866
	* lto-symtab.c (lto_symtab_merge_symbols): Drop useless definitions
	that resolved externally.

	2017-06-15  Thomas Preud'homme

	gcc/testsuite/
	PR lto/69866
	* gcc.dg/lto/pr69866_0.c: New test.
	* gcc.dg/lto/pr69866_1.c: Likewise.

	Backport from mainline
	2017-06-18  Jan Hubicka

	gcc/testsuite/
	* gcc.dg/lto/pr69866_0.c: This test needs alias.

Best regards,

Thomas

--- Begin Message ---

> The new test fails on darwin with the usual
>
> FAIL: gcc.dg/lto/pr69866 c_lto_pr69866_0.o-c_lto_pr69866_1.o link, -O0
> -flto -flto-partition=none
>
> IMO it requires a
>
> /* { dg-require-alias "" } */

Yep, I will add it shortly.

Honza

> directive.
>
> TIA
>
> Dominique

--- End Message ---
[arm-embedded] [PATCH, ARM] Implement __ARM_FEATURE_COPROC coprocessor intrinsic feature macro
Hi,

We have decided to apply the following patch to the ARM/embedded-6-branch and ARM/embedded-7-branch to implement the __ARM_FEATURE_COPROC coprocessor intrinsic feature macro.

2017-06-20  Thomas Preud'homme

	Backport from mainline
	2017-06-20  Prakhar Bahuguna

	gcc/
	* config/arm/arm-c.c (arm_cpu_builtins): New block to define
	__ARM_FEATURE_COPROC according to support.

	gcc/testsuite/
	* gcc.target/arm/acle/cdp.c: Add feature macro bitmap test.
	* gcc.target/arm/acle/cdp2.c: Likewise.
	* gcc.target/arm/acle/ldc.c: Likewise.
	* gcc.target/arm/acle/ldc2.c: Likewise.
	* gcc.target/arm/acle/ldc2l.c: Likewise.
	* gcc.target/arm/acle/ldcl.c: Likewise.
	* gcc.target/arm/acle/mcr.c: Likewise.
	* gcc.target/arm/acle/mcr2.c: Likewise.
	* gcc.target/arm/acle/mcrr.c: Likewise.
	* gcc.target/arm/acle/mcrr2.c: Likewise.
	* gcc.target/arm/acle/mrc.c: Likewise.
	* gcc.target/arm/acle/mrc2.c: Likewise.
	* gcc.target/arm/acle/mrrc.c: Likewise.
	* gcc.target/arm/acle/mrrc2.c: Likewise.
	* gcc.target/arm/acle/stc.c: Likewise.
	* gcc.target/arm/acle/stc2.c: Likewise.
	* gcc.target/arm/acle/stc2l.c: Likewise.
	* gcc.target/arm/acle/stcl.c: Likewise.

Best regards,

Thomas

--- Begin Message ---

On 16/06/2017 15:37:18, Richard Earnshaw (lists) wrote:
> On 16/06/17 08:48, Prakhar Bahuguna wrote:
> > On 15/06/2017 17:23:43, Richard Earnshaw (lists) wrote:
> >> On 14/06/17 10:35, Prakhar Bahuguna wrote:
> >>> The ARM ACLE defines the __ARM_FEATURE_COPROC macro which indicates which
> >>> coprocessor intrinsics are available for the target. If
> >>> __ARM_FEATURE_COPROC is undefined, the target does not support coprocessor
> >>> intrinsics. The feature levels are defined as follows:
> >>>
> >>> +---------+-----------+---------------------------------------------------+
> >>> | **Bit** | **Value** | **Intrinsics Available**                          |
> >>> +---------+-----------+---------------------------------------------------+
> >>> | 0       | 0x1       | __arm_cdp, __arm_ldc, __arm_ldcl, __arm_stc,      |
> >>> |         |           | __arm_stcl, __arm_mcr and __arm_mrc               |
> >>> +---------+-----------+---------------------------------------------------+
> >>> | 1       | 0x2       | __arm_cdp2, __arm_ldc2, __arm_stc2, __arm_ldc2l,  |
> >>> |         |           | __arm_stc2l, __arm_mcr2 and __arm_mrc2            |
> >>> +---------+-----------+---------------------------------------------------+
> >>> | 2       | 0x4       | __arm_mcrr and __arm_mrrc                         |
> >>> +---------+-----------+---------------------------------------------------+
> >>> | 3       | 0x8       | __arm_mcrr2 and __arm_mrrc2                       |
> >>> +---------+-----------+---------------------------------------------------+
> >>>
> >>> This patch implements full support for this feature macro as defined in
> >>> section 5.9 of the ACLE
> >>> (https://developer.arm.com/products/software-development-tools/compilers/arm-compiler-5/docs/101028/latest/5-feature-test-macros).
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>> 2017-06-14  Prakhar Bahuguna
> >>>
> >>> 	* config/arm/arm-c.c (arm_cpu_builtins): New block to define
> >>> 	__ARM_FEATURE_COPROC according to support.
> >>>
> >>> 2017-06-14  Prakhar Bahuguna
> >>>
> >>> 	* gcc/testsuite/gcc.target/arm/acle/cdp.c: Add feature macro bitmap
> >>> 	test.
> >>> 	* gcc/testsuite/gcc.target/arm/acle/cdp2.c: Likewise.
> >>> 	* gcc/testsuite/gcc.target/arm/acle/ldc.c: Likewise.
> >>> 	* gcc/testsuite/gcc.target/arm/acle/ldc2.c: Likewise.
> >>> 	* gcc/testsuite/gcc.target/arm/acle/ldc2l.c: Likewise.
> >>> 	* gcc/testsuite/gcc.target/arm/acle/ldcl.c: Likewise.
> >>> 	* gcc/testsuite/gcc.target/arm/acle/mcr.c: Likewise.
> >>> 	* gcc/testsuite/gcc.target/arm/acle/mcr2.c: Likewise.
> >>> 	* gcc/testsuite/gcc.target/arm/acle/mcrr.c: Likewise.
> >>> 	* gcc/testsuite/gcc.target/arm/acle/mcrr2.c: Likewise.
> >>> 	* gcc/testsuite/gcc.target/arm/acle/mrc.c: Likewise.
> >>> 	* gcc/testsuite/gcc.target/arm/acle/mrc2.c: Likewise.
> >>> 	* gcc/testsuite/gcc.target/arm/acle/mrrc.c: Likewise.
> >>> 	* gcc/testsuite/gcc.target/arm/acle/mrrc2.c: Likewise.
> >>> 	* gcc/testsuite/gcc.target/arm/acle/stc.c: Likewise.
> >>> 	* gcc/testsuite/gcc.target/arm/acle/stc2.c: Likewise.
> >>> 	* gcc/testsuite/gcc.target/arm/acle/stc2l.c: Likewise.
> >>> 	* gcc/testsuite/gcc.target/arm/acle/stcl.c: Likewise.
> >>>
> >>> Testing done: ACLE regression tests updated with tests for feature macro
> >>> bits. All regression tests pass.
> >>>
> >>> Okay for trunk?
> >>>
> >>>
> >>> 0001-Implement-__ARM_FEATURE_COPROC-coprocessor-intrinsic.patch
> >>>
> >>>
> >>> From 79d71aec9d2bdee936b240ae49368ff5f8d8fc48 Mon Sep 17
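[Editorial note] For reference, a minimal sketch of how user code would consume the macro; the bit values come from the ACLE table quoted above, while the fallback definition is purely an assumption so the snippet also compiles on non-ARM targets:

```cpp
#include <cassert>

#ifndef __ARM_FEATURE_COPROC
// Assumption for illustration only: pretend full coprocessor intrinsic
// support when the compiler does not define the macro (non-ARM target).
#define __ARM_FEATURE_COPROC 0xF
#endif

// Bit assignments per the ACLE table quoted above.
constexpr bool have_base_coproc = (__ARM_FEATURE_COPROC & 0x1) != 0; // __arm_cdp, __arm_ldc, ...
constexpr bool have_v2_coproc   = (__ARM_FEATURE_COPROC & 0x2) != 0; // __arm_cdp2, __arm_ldc2, ...
constexpr bool have_mcrr_mrrc   = (__ARM_FEATURE_COPROC & 0x4) != 0; // __arm_mcrr, __arm_mrrc
constexpr bool have_mcrr2_mrrc2 = (__ARM_FEATURE_COPROC & 0x8) != 0; // __arm_mcrr2, __arm_mrrc2
```

Guarding each intrinsic group on its bit rather than on the mere presence of the macro is exactly what the updated acle/*.c tests check.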
Re: [PATCH 2/3] Simplify wrapped binops
>> Currently, extract_... () does all that for me, is it really too
>> expensive to call? I guess, using get_range_info first and calling
>> extract when get_range_info returns a VR_RANGE is not really a favorable
>> thing to do either? :)
> Not only the cost, we should avoid introducing more interfaces while
> old ones can do the work. Anyway, it's Richard's call here.

I rewrote the match.pd patterns to use get_range_info () now, keeping track of an "ok" overflow (both min and max overflow) and one which does not allow us to continue (min xor max overflow, split/anti range).

Test suite on s390x has no regressions, bootstrap is ok, x86 running.

Regards
 Robin

--

gcc/ChangeLog:

2017-06-19  Robin Dapp

	* match.pd: Simplify wrapped binary operations.

diff --git a/gcc/match.pd b/gcc/match.pd
index 80a17ba..66c37f6 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1290,6 +1290,128 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (cst && !TREE_OVERFLOW (cst))
    (plus { cst; } @0
+/* ((T)(A +- CST)) +- CST -> (T)(A) +- CST) */
+#if GIMPLE
+ (for outer_op (plus minus)
+  (for inner_op (plus minus)
+   (simplify
+    (outer_op (convert (inner_op@3 @0 INTEGER_CST@1)) INTEGER_CST@2)
+    (if (TREE_CODE (type) == INTEGER_TYPE
+	 && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@3)))
+     (with
+      {
+       tree cst = NULL_TREE;
+       tree inner_type = TREE_TYPE (@3);
+       wide_int wmin, wmax;
+       wide_int wmin0, wmax0;
+
+       bool ovf = true;
+       bool ovf_undef = TYPE_OVERFLOW_UNDEFINED (inner_type);
+
+       enum value_range_type vr_outer =
+	 get_range_info (@3, &wmin, &wmax);
+       enum value_range_type vr0 =
+	 get_range_info (@0, &wmin0, &wmax0);
+
+       /* Convert combined constant to tree of outer type if
+	  there was no overflow in the original operation.  */
+       if (ovf_undef || vr_outer == VR_RANGE)
+	 {
+	   wide_int w1 = @1;
+	   wide_int w2 = @2;
+
+	   if (ovf_undef || vr0 == VR_RANGE)
+	     {
+	       if (inner_op == MINUS_EXPR)
+		 w1 = wi::neg (w1);
+
+	       if (outer_op == MINUS_EXPR)
+		 w2 = wi::neg (w2);
+
+	       bool split_range = true;
+
+	       if (!ovf_undef && vr0 == VR_RANGE)
+		 {
+		   int max_ovf = 0;
+		   int min_ovf = 0;
+
+		   signop sgn = TYPE_SIGN (inner_type);
+
+		   wmin = wi::add (wmin0, w1);
+		   min_ovf = wi::cmp (wmin, w1, sgn) < 0;
+
+		   wmax = wi::add (wmax0, w1);
+		   max_ovf = wi::cmp (wmax, w1, sgn) < 0;
+
+		   ovf = min_ovf || max_ovf;
+
+		   split_range = ((min_ovf && !max_ovf)
+				  || (!min_ovf && max_ovf));
+		 }
+
+	       if (ovf_undef || !split_range)
+		 {
+		   /* Extend @1 to TYPE.  */
+		   w1 = w1.from (w1, TYPE_PRECISION (type),
+				 ovf ? SIGNED : TYPE_SIGN (TREE_TYPE (@1)));
+
+		   /* Combine in outer, larger type.  */
+		   wide_int combined_cst;
+		   combined_cst = wi::add (w1, w2);
+
+		   cst = wide_int_to_tree (type, combined_cst);
+		 }
+	     }
+	 }
+      }
+      (if (cst)
+       (outer_op (convert @0) { cst; }))
+  )
+#endif
+
+/* ((T)(A)) +- CST -> (T)(A +- CST) */
+#if GIMPLE
+ (for outer_op (plus minus)
+  (simplify
+   (outer_op (convert SSA_NAME@0) INTEGER_CST@2)
+   (if (TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
+	&& TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
+	&& TREE_CODE (type) == INTEGER_TYPE)
+    /* Perform binary operation inside the cast if the constant fits
+       and there is no overflow.  */
+    (with
+     {
+       bool split_range = true;
+       tree cst_inner = NULL_TREE;
+       enum value_range_type vr = VR_VARYING;
+       tree inner_type = TREE_TYPE (@0);
+
+       if (int_fits_type_p (@2, inner_type))
+	 {
+	   cst_inner = fold_convert (inner_type, @2);
+
+	   wide_int wmin0, wmax0;
+	   wide_int w1 = cst_inner;
+	   signop sgn = TYPE_SIGN (inner_type);
+	   vr = get_range_info (@0, &wmin0, &wmax0);
+
+	   if (vr == VR_RANGE)
+	     {
+	       wide_int wmin = wi::add (wmin0, w1);
+	       bool min_ovf = wi::cmp (wmin, w1, sgn) < 0;
+
+	       wide_int wmax = wi::add (wmax0, w1);
+	       bool max_ovf = wi::cmp (wmax, w1, sgn) < 0;
+
+	       split_range = (min_ovf && !max_ovf) || (!min_ovf && max_ovf);
+	     }
+	 }
+     }
+     (if (cst_inner && !split_range)
+      (convert (outer_op @0 { cst_inner; })))
+
+#endif
+
 /* ~A + A -> -1 */
 (simplify
  (plus:c (bit_not @0) @0)
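[Editorial note] To see why the patterns consult get_range_info before merging the constants, here is a hedged C++ model of `((T)(A + CST)) + CST` with a narrower unsigned inner type (the numbers are invented for the example): if the inner addition can wrap, naively combining the constants in the wider type changes the result.

```cpp
#include <cassert>
#include <cstdint>

// Original expression: the inner add is done (and may wrap) in uint32_t,
// then the result is widened and the outer constant is added.
long long original(uint32_t a) { return (long long)(uint32_t)(a + 1u) + 5; }

// Naive fold to "(long long)a + 6": only valid when a + 1 cannot wrap,
// which is exactly what the range check in the pattern establishes.
long long folded(uint32_t a) { return (long long)a + 6; }
```

For any `a` whose range excludes `UINT32_MAX` the two agree; at `UINT32_MAX` the inner add wraps to 0, so `original` yields 5 while `folded` yields 4294967301, which is the "split range" case the patch refuses to fold.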
Re: [PATCH] ASAN: handle addressable params (PR sanitize/81040).
On 06/20/2017 11:32 AM, Jakub Jelinek wrote: > On Tue, Jun 20, 2017 at 11:23:36AM +0200, Martin Liška wrote: >>> Then something needs to be done for debugging too. If it is without VTA, >>> then probably just having DECL_VALUE_EXPR is good enough, otherwise >>> (VTA) you probably don't want that (or can reset it at that point), but >>> instead emit after the initialization stmt a debug stmt that the variable >>> value now lives in a different var. Though ideally we want the debugger >>> to be able to also change the value of the var, that might be harder. >>> With DECL_VALUE_EXPR on the other side the debug info will be incorrect in >>> the prologue until it is assigned to the slot. >> >> Here I'm not sure about how to distinguish whether to build or not to build >> the debug statement. According to flag_var_tracking? > > More like if (target_for_debug_bind (arg)) > And if that is false, just make sure DECL_VALUE_EXPR is set to var. > >> You mean something like: >> g = gimple_build_debug_bind (arg, var, g); >> ? > > Well, there is no stmt, so the last argument would be just NULL. > >>> I don't understand the distinction. If you turn the original parm >>> for complex/vector DECL_GIMPLE_REG_P, you should need the exact same code >>> (but I think it would be better to use the default SSA_NAME of the PARM_DECL >>> if it is a gimple reg type, rather than use the PARM_DECL itself >>> and wait for update_ssa). >> >> Yes, the test-case /gcc/testsuite/g++.dg/asan/function-argument-3.C fails >> for me >> as one needs to have a temporary SSA name, otherwise: >> >> /home/marxin/Programming/gcc/gcc/testsuite/g++.dg/asan/function-argument-3.C:13:1: >> error: invalid rhs for gimple memory store >> foo (v4si arg) >> ^~~ >> arg >> >> arg >> >> # .MEM_4 = VDEF <.MEM_1(D)> >> arg = arg; >> during GIMPLE pass: sanopt >> >> If I see correctly the function in my test-case does not have default def >> SSA name for the parameter. >> Thus I guess I need to create a SSA name? 
> I'd expect if you have DECL_GIMPLE_REG_P set on the PARM_DECL and
> use the default def, you shouldn't run into this.
>
> 	Jakub

Good, I fixed that in v2, which passes regression tests. All objections should be resolved in this version.

Ready for trunk?
Martin

>From ed5da705250c3015e964de8d23d1aa3d0056012a Mon Sep 17 00:00:00 2001
From: marxin
Date: Wed, 14 Jun 2017 11:40:01 +0200
Subject: [PATCH] ASAN: handle addressable params (PR sanitize/81040).

gcc/testsuite/ChangeLog:

2017-06-19  Martin Liska

	PR sanitize/81040
	* g++.dg/asan/function-argument-1.C: New test.
	* g++.dg/asan/function-argument-2.C: New test.
	* g++.dg/asan/function-argument-3.C: New test.

gcc/ChangeLog:

2017-06-19  Martin Liska

	PR sanitize/81040
	* sanopt.c (rewrite_usage_of_param): New function.
	(sanitize_rewrite_addressable_params): Likewise.
	(pass_sanopt::execute): Call rewrite_usage_of_param.
---
 gcc/sanopt.c                                    | 132
 gcc/testsuite/g++.dg/asan/function-argument-1.C |  30 ++
 gcc/testsuite/g++.dg/asan/function-argument-2.C |  24 +
 gcc/testsuite/g++.dg/asan/function-argument-3.C |  27 +
 4 files changed, 213 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/asan/function-argument-1.C
 create mode 100644 gcc/testsuite/g++.dg/asan/function-argument-2.C
 create mode 100644 gcc/testsuite/g++.dg/asan/function-argument-3.C

diff --git a/gcc/sanopt.c b/gcc/sanopt.c
index 16bdba76042..077811b5b93 100644
--- a/gcc/sanopt.c
+++ b/gcc/sanopt.c
@@ -37,6 +37,12 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-ssa.h"
 #include "tree-phinodes.h"
 #include "ssa-iterators.h"
+#include "gimplify.h"
+#include "gimple-iterator.h"
+#include "gimple-walk.h"
+#include "cfghooks.h"
+#include "tree-dfa.h"
+#include "tree-ssa.h"

 /* This is used to carry information about basic blocks.  It is
    attached to the AUX field of the standard CFG block.  */
@@ -858,6 +864,129 @@ sanitize_asan_mark_poison (void)
     }
 }

+/* Rewrite all usages of tree OP which is a PARM_DECL with a VAR_DECL
+   that is it's DECL_VALUE_EXPR.
*/ + +static tree +rewrite_usage_of_param (tree *op, int *walk_subtrees, void *) +{ + if (TREE_CODE (*op) == PARM_DECL && DECL_VALUE_EXPR (*op) != NULL_TREE) +{ + *op = DECL_VALUE_EXPR (*op); + *walk_subtrees = 0; +} + + return NULL; +} + +/* For a given function FUN, rewrite all addressable parameters so that + a new automatic variable is introduced. Right after function entry + a parameter is assigned to the variable. */ + +static void +sanitize_rewrite_addressable_params (function *fun) +{ + gimple *g; + gimple_seq stmts = NULL; + auto_vec addressable_params; + + for (tree arg = DECL_ARGUMENTS (current_function_decl); + arg; arg = DECL_CHAIN (arg)) +{ + if (TREE_ADDRESSABLE (arg) && !TREE_ADDRESSABLE (TREE_TYPE (arg))) + { + TREE_ADDRESSABLE (arg) = 0; + /*
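[Editorial note] The pass works on GIMPLE, but its effect can be sketched at the source level (a simplified analogy, not the actual implementation): an address-taken parameter is shadowed by a local copy initialized right after function entry, so ASan can guard the local's shadow memory while the parameter itself is no longer addressable.

```cpp
#include <cassert>

static int deref(const int *p) { return *p; }

// Before: 'arg' is TREE_ADDRESSABLE because its address escapes.
int before(int arg) { return deref(&arg); }

// After (conceptually): a fresh local takes over the addressable role;
// rewrite_usage_of_param redirects every use of &arg to the copy.
int after(int arg)
{
  int arg_copy = arg;        // assigned right after function entry
  return deref(&arg_copy);
}
```

The observable behavior is unchanged; what changes is that `arg_copy` lives in the function frame where ASan can poison red zones around it.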
Re: [testsuite, i386] Always check for target i?86 and x86_64
Hi Jonathan,

> On 15/06/17 12:51 +0200, Rainer Orth wrote:
>> I happened to notice that recently a couple of testcases have sneaked in
>> that are restricted to x86_64-*-* targets only. This is always wrong:
>> it should be i?86-*-* and x86_64-*-* alike, eventually restricting the
>> test to ilp32 or lp64. There were also instances of i?86-*-* only,
>> which I've handled as well.
>
> [...]
>
>> diff --git a/libstdc++-v3/testsuite/20_util/variant/index_type.cc
>> b/libstdc++-v3/testsuite/20_util/variant/index_type.cc
>> --- a/libstdc++-v3/testsuite/20_util/variant/index_type.cc
>> +++ b/libstdc++-v3/testsuite/20_util/variant/index_type.cc
>> @@ -1,5 +1,5 @@
>>  // { dg-options "-std=gnu++17" }
>> -// { dg-do compile { target x86_64-*-* powerpc*-*-* } }
>> +// { dg-do compile { target i?86-*-* x86_64-*-* powerpc*-*-* } }
>>
>>  // Copyright (C) 2017 Free Software Foundation, Inc.
>>  //
>
> The concern here was just that we don't want the test to fail on
> targets with weird integer sizes, so the list of targets was
> restricted to just those where Ville had tested it.
>
> But { target ilp32 lp64 } would surely be fine. The test will only
> fail if a struct with two char-sized subobjects is the same size as
> size_t. Feel free to change it to { target ilp32 lp64 }.

Once I got the syntax right, this worked fine: it needs

  { target { ilp32 || lp64 } }

instead ;-)

I've also now managed to complete a Darwin/x86_64 bootstrap by locally reverting the two culprit patches for PR bootstrap/81033 and confirmed that the patched testcases are fine there, too.

Given that there were no other comments, I've installed the patch. It would still be nice if the Cygwin/MingW maintainer could comment on the testcase situation for those targets.

	Rainer

--
Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [rtlanal] Do a better job of costing parallel sets containing flag-setting operations.
On Mon, Jun 19, 2017 at 12:40:53PM -0500, Segher Boessenkool wrote:
> On Mon, Jun 19, 2017 at 05:01:10PM +0100, Richard Earnshaw (lists) wrote:
> > Yeah, and I'm not suggesting we change the logic there (sorry if the
> > description was misleading). Instead I'm proposing that we handle more
> > cases for parallels to not return zero.
>
> Right. My test run is half way through, will have results later --
> your change looks good to me, but it is always surprising whether
> better costs help or not, or even *hurt* good code generation (things
> are just too tightly tuned to the current behaviour, so some things
> may need retuning).

Everything built successfully (31 targets); --enable-checking=yes,rtl,tree so it took a while, sorry. The targets with any differences (table shows code size):

            old        patched
  arm       11545709   11545797
  powerpc   8442762    8442746
  x86_64    10627428   10627363

Arm has very many differences, the others do not. For powerpc (which is 32-bit; 64-bit showed no differences) most of the difference is scheduling deciding to do things a bit differently, and most of it in places where we have not-so-good costs anyway. For arm the effects often cascade to bb-reorder making different decisions.

Anyway, all differences are small, it is not likely to hurt anything. I support the patch, if that helps -- but I cannot approve it.


Segher
[PR c++/67074] Namespace aliases to same name
This patch fixes a couple of places where namespace aliases refer to the same namespace. These are not ambiguous or conflicting.

Firstly, aliases of the same name may exist in namespaces searched via using directives. Those should be merged in lookup, which is the change to add_value.

Secondly, an alias to an existing namespace of the same name in the same scope is ok. The change to duplicate_decls calms it down -- all it should do is say 'yup, this is the same decl', and disregard conflictingness; that's pushdecl's problem, which is the change to update_binding. As it happens, duplicate_decls will have already eliminated matching aliases, but it needs to be taught to permit an alias to the same named namespace.

And finally, diagnose_name_conflict needs to treat namespaces as always conflicting. It's confusing to say the second instance is a redeclaration.

Applied to trunk.

nathan
--
Nathan Sidwell

2017-06-20  Nathan Sidwell

	PR c++/67074 - namespace aliases
	* decl.c (duplicate_decls): Don't error here on mismatched
	namespace alias.
	* name-lookup.c (name_lookup::add_value): Matching namespaces are
	not ambiguous.
	(diagnose_name_conflict): Namespaces are never redeclarations.
	(update_binding): An alias can match a real namespace.

	PR c++/67074
	* g++.dg/lookup/pr67074.C: New.
	* g++.dg/parse/namespace-alias-1.C: Adjust.

Index: cp/decl.c
===================================================================
--- cp/decl.c	(revision 249384)
+++ cp/decl.c	(working copy)
@@ -1751,17 +1751,9 @@ duplicate_decls (tree newdecl, tree oldd
 	  && (DECL_NAMESPACE_ALIAS (newdecl)
 	      == DECL_NAMESPACE_ALIAS (olddecl)))
 	return olddecl;
-      /* [namespace.alias]
-	 A namespace-name or namespace-alias shall not be declared as
-	 the name of any other entity in the same declarative region.
-	 A namespace-name defined at global scope shall not be
-	 declared as the name of any other entity in any global scope
-	 of the program.
*/ - error ("conflicting declaration of namespace %q+D", newdecl); - inform (DECL_SOURCE_LOCATION (olddecl), - "previous declaration of namespace %qD here", olddecl); - return error_mark_node; + /* Leave it to update_binding to merge or report error. */ + return NULL_TREE; } else { Index: cp/name-lookup.c === --- cp/name-lookup.c (revision 249385) +++ cp/name-lookup.c (working copy) @@ -450,7 +450,13 @@ name_lookup::add_value (tree new_val) else if ((TREE_CODE (value) == TYPE_DECL && TREE_CODE (new_val) == TYPE_DECL && same_type_p (TREE_TYPE (value), TREE_TYPE (new_val -; +/* Typedefs to the same type. */; + else if (TREE_CODE (value) == NAMESPACE_DECL + && TREE_CODE (new_val) == NAMESPACE_DECL + && ORIGINAL_NAMESPACE (value) == ORIGINAL_NAMESPACE (new_val)) +/* Namespace (possibly aliased) to the same namespace. Locate + the namespace*/ +value = ORIGINAL_NAMESPACE (value); else { if (deduping) @@ -1630,10 +1636,10 @@ static void diagnose_name_conflict (tree decl, tree bval) { if (TREE_CODE (decl) == TREE_CODE (bval) - && (TREE_CODE (decl) != TYPE_DECL - || (DECL_ARTIFICIAL (decl) && DECL_ARTIFICIAL (bval)) - || (!DECL_ARTIFICIAL (decl) && !DECL_ARTIFICIAL (bval))) + && TREE_CODE (decl) != NAMESPACE_DECL && !DECL_DECLARES_FUNCTION_P (decl) + && (TREE_CODE (decl) != TYPE_DECL + || DECL_ARTIFICIAL (decl) == DECL_ARTIFICIAL (bval)) && CP_DECL_CONTEXT (decl) == CP_DECL_CONTEXT (bval)) error ("redeclaration of %q#D", decl); else @@ -1809,15 +1815,14 @@ update_binding (cp_binding_level *level, } else if (TREE_CODE (old) == NAMESPACE_DECL) { - if (DECL_NAMESPACE_ALIAS (old) && DECL_NAMESPACE_ALIAS (decl) - && ORIGINAL_NAMESPACE (old) == ORIGINAL_NAMESPACE (decl)) - /* In a declarative region, a namespace-alias-definition can be - used to redefine a namespace-alias declared in that declarative - region to refer only to the namespace to which it already - refers. [namespace.alias] */ - return old; - else + /* Two maybe-aliased namespaces. 
If they're to the same target + namespace, that's ok. */ + if (ORIGINAL_NAMESPACE (old) != ORIGINAL_NAMESPACE (decl)) goto conflict; + + /* The new one must be an alias at this point. */ + gcc_assert (DECL_NAMESPACE_ALIAS (decl)); + return old; } else if (TREE_CODE (old) == VAR_DECL) { Index: testsuite/g++.dg/lookup/pr67074.C === --- testsuite/g++.dg/lookup/pr67074.C (nonexistent) +++ testsuite/g++.dg/lookup/pr67074.C (working copy) @@ -0,0 +1,18 @@ +// PR c++/67074 namespace aliases to the same place. + +namespace P { + namespace X { +static int i = 1; + } +} +namespace Q { + namespace X = P::X; +} + +using namespace P; +using namespace
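[Editorial note] Boiled down, the lookup situation the patch makes acceptable can be sketched like this (modeled on the pr67074.C testcase above; the `get` helper is invented for the example):

```cpp
#include <cassert>

namespace P { namespace X { int i = 1; } }
namespace Q { namespace X = P::X; }   // same-named alias in another namespace

using namespace P;
using namespace Q;

// Redeclaring an alias to the namespace it already names is also fine.
namespace B = P::X;
namespace B = P::X;

int get() { return X::i + B::i; }     // X is not ambiguous: both X's are P::X
```

Before the fix, lookup of `X` through the two using-directives reported an ambiguity even though both candidates denote the very same namespace; the add_value change merges them instead.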
[PATCH] Fix PR81097
The following fixes PR81097, but not eventually more latent issues in transforming ~x "back" to -x - 1. I want a testcase before fiddling more with this. The following follows the pattern of previous fixes to this function, making sure to do the negation in 'type'.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2017-06-20  Richard Biener

	PR middle-end/81097
	* fold-const.c (split_tree): Fold to type before negating.

	* c-c++-common/ubsan/pr81097.c: New testcase.

Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	(revision 249397)
+++ gcc/fold-const.c	(working copy)
@@ -853,9 +853,9 @@ split_tree (location_t loc, tree in, tre
 	   && code == PLUS_EXPR)
     {
       /* -X - 1 is folded to ~X, undo that here.  Do _not_ do this
-	 when IN is constant.  */
-      *minus_litp = build_one_cst (TREE_TYPE (in));
-      var = negate_expr (TREE_OPERAND (in, 0));
+	 when IN is constant.  Convert to TYPE before negating.  */
+      *minus_litp = build_one_cst (type);
+      var = negate_expr (fold_convert_loc (loc, type, TREE_OPERAND (in, 0)));
     }
   else
     var = in;
Index: gcc/testsuite/c-c++-common/ubsan/pr81097.c
===================================================================
--- gcc/testsuite/c-c++-common/ubsan/pr81097.c	(nonexistent)
+++ gcc/testsuite/c-c++-common/ubsan/pr81097.c	(working copy)
@@ -0,0 +1,12 @@
+/* { dg-do run } */
+/* { dg-options "-fsanitize=undefined -fsanitize-undefined-trap-on-error" } */
+
+unsigned int a = 3309568;
+unsigned int b = -1204857327;
+short c = -10871;
+short x;
+int main()
+{
+  x = ((short)(~a) | ~c) + ((short)(~b) | ~c);
+  return 0;
+}
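[Editorial note] The undo in split_tree relies on the two's-complement identity `~X == -X - 1`; the fix makes sure the negation happens after converting to the outer `type`. A quick hedged model (one value taken from the testcase above, helper names invented):

```cpp
#include <cassert>

// ~X and -X - 1 agree for two's-complement int.  via_minus performs the
// negation in the wider type after the conversion, mirroring the fixed
// code path in split_tree.
long long via_not(int x)   { return (long long)~x; }
long long via_minus(int x) { return -(long long)x - 1; }
```

The two helpers must agree for every `int`, which is exactly the equivalence the fold depends on.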
Re: [PATCH, GCC/testsuite/ARM] Consistently check for neon in vect effective targets
On 19 June 2017 at 16:47, Thomas Preudhommewrote: > > > On 19/06/17 15:31, Christophe Lyon wrote: >> >> On 19 June 2017 at 16:11, Thomas Preudhomme >> wrote: >>> >>> >>> >>> On 19/06/17 10:16, Thomas Preudhomme wrote: On 19/06/17 08:41, Christophe Lyon wrote: > > > Hi Thomas, > > > On 15 June 2017 at 18:18, Thomas Preudhomme > wrote: >> >> >> Hi, >> >> Conditions checked for ARM targets in vector-related effective targets >> are inconsistent: >> >> * sometimes arm*-*-* is checked >> * sometimes Neon is checked >> * sometimes arm_neon_ok and sometimes arm_neon is used for neon check >> * sometimes check_effective_target_* is used, sometimes >> is-effective-target >> >> This patch consolidate all of these check into using >> is-effective-target >> arm_neon and when little endian was checked, the check is kept. >> >> ChangeLog entry is as follows: >> >> *** gcc/testsuite/ChangeLog *** >> >> 2017-06-06 Thomas Preud'homme >> >> * lib/target-supports.exp (check_effective_target_vect_int): >> Replace >> current ARM check by ARM NEON's availability check. >> (check_effective_target_vect_intfloat_cvt): Likewise. >> (check_effective_target_vect_uintfloat_cvt): Likewise. >> (check_effective_target_vect_floatint_cvt): Likewise. >> (check_effective_target_vect_floatuint_cvt): Likewise. >> (check_effective_target_vect_shift): Likewise. >> (check_effective_target_whole_vector_shift): Likewise. >> (check_effective_target_vect_bswap): Likewise. >> (check_effective_target_vect_shift_char): Likewise. >> (check_effective_target_vect_long): Likewise. >> (check_effective_target_vect_float): Likewise. >> (check_effective_target_vect_perm): Likewise. >> (check_effective_target_vect_perm_byte): Likewise. >> (check_effective_target_vect_perm_short): Likewise. >> (check_effective_target_vect_widen_sum_hi_to_si_pattern): >> Likewise. >> (check_effective_target_vect_widen_sum_qi_to_hi): Likewise. >> (check_effective_target_vect_widen_mult_qi_to_hi): Likewise. 
>> (check_effective_target_vect_widen_mult_hi_to_si): Likewise. >> (check_effective_target_vect_widen_mult_qi_to_hi_pattern): >> Likewise. >> (check_effective_target_vect_widen_mult_hi_to_si_pattern): >> Likewise. >> (check_effective_target_vect_widen_shift): Likewise. >> (check_effective_target_vect_extract_even_odd): Likewise. >> (check_effective_target_vect_interleave): Likewise. >> (check_effective_target_vect_multiple_sizes): Likewise. >> (check_effective_target_vect64): Likewise. >> (check_effective_target_vect_max_reduc): Likewise. >> >> Testing: Testsuite shows no regression when targeting ARMv7-A with >> -mfpu=neon-fpv4 and -mfloat-abi=hard or when targeting Cortex-M3 with >> default FPU and float ABI (soft). Testing was done with both >> compare_tests >> and the updated dg-cmp-results proposed in >> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01030.html >> >> Is this ok for trunk? >> > > I applied your patch on top of r249233, and noticed quite a few > changes: > > > http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/249233-consistent_neon_check.patch/report-build-info.html > > > Note that "Big-Regression" cases are caused by the fact that there a > are PASS->XPASS and XFAILs disappear with your patch, and many > (3000-4000) PASS disappear. > In that intended? It certainly is not. I'd like to investigate this but the link to results for rev 249233 is broken. Could you provide me with the results you have for that so that I can compare manually? >>> >>> >>> >>> Actually yes it is, at least for the configurations with default (which >>> still uses -mfpu=vfp in r249233) or VFP (whatever version) FPU. I've >>> checked >>> all the ->NA and ->UNSUPPORTED for the arm-none-linux-gnueabi >>> configuration >>> and none of them has a dg directive to select the neon unit (such as >>> dg-additional-options >> line>). >>> I've also looked at arm-none-linux-gnueabihf configuration with neon FPU >>> and >>> there is no regression there. 
>>> >>> I therefore think this is all normal and expected. Note that under >>> current >>> trunk this should be different because neon-fp16 would be selected >>> instead >>> of vfp for default FPU with Cortex-A9. >>> >> >> OK, thanks for checking. So the version you sent on June 15th is OK? >
Re: RFC: stack/heap collision vulnerability and mitigation with GCC
On Tue, Jun 20, 2017 at 2:20 PM, Uros Bizjak wrote:
> On Tue, Jun 20, 2017 at 2:17 PM, Uros Bizjak wrote:
>> On Tue, Jun 20, 2017 at 2:13 PM, Florian Weimer wrote:
>>> On 06/20/2017 01:10 PM, Uros Bizjak wrote:
>>>
>>>> 74,99%  a.out  a.out  [.] test_or
>>>> 12,50%  a.out  a.out  [.] test_movb
>>>> 12,50%  a.out  a.out  [.] test_movl
>>>
>>> Could you try notl/notb/negl/negb as well, please?
>>
>> These all have the same (long) runtime as test_or.
>
> Perhaps we can use "testb $0, %0"? It doesn't write to the memory, but
> otherwise has the same runtime as movb/movl.

That sounds good. OTOH it's a matter of putting strain on the memory fetch or store side... We'll get cacheline allocations in any case (but the memory will be used eventually). Instead of test, a mere movb into a scratch register (aka load instead of store) would work as well, apart from the need of a scratch register.

We can also vectorize with scatters ;) (just kidding)

Richard.

> Uros.
Re: RFC: stack/heap collision vulnerability and mitigation with GCC
On Tue, Jun 20, 2017 at 2:17 PM, Uros Bizjak wrote:
> On Tue, Jun 20, 2017 at 2:13 PM, Florian Weimer wrote:
>> On 06/20/2017 01:10 PM, Uros Bizjak wrote:
>>
>>> 74,99%  a.out  a.out  [.] test_or
>>> 12,50%  a.out  a.out  [.] test_movb
>>> 12,50%  a.out  a.out  [.] test_movl
>>
>> Could you try notl/notb/negl/negb as well, please?
>
> These all have the same (long) runtime as test_or.

Perhaps we can use "testb $0, %0"? It doesn't write to the memory, but otherwise has the same runtime as movb/movl.

Uros.
Re: RFC: stack/heap collision vulnerability and mitigation with GCC
On Tue, Jun 20, 2017 at 2:13 PM, Florian Weimer wrote:
> On 06/20/2017 01:10 PM, Uros Bizjak wrote:
>
>> 74,99%  a.out  a.out  [.] test_or
>> 12,50%  a.out  a.out  [.] test_movb
>> 12,50%  a.out  a.out  [.] test_movl
>
> Could you try notl/notb/negl/negb as well, please?

These all have the same (long) runtime as test_or.

Uros.
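[Editorial note] For context, what the probed sequence has to achieve, sketched in C++ — the page size and buffer size are assumptions, and real compiler-emitted code uses one cheap inline instruction per page (the `or`/`movb`/`testb` variants being timed above) rather than a C loop:

```cpp
#include <assert.h>
#include <stddef.h>

const size_t kPage = 4096;            // assumed page size

// Touch each page of a large stack allocation in ascending order, so the
// kernel's guard page is always hit before the allocation can skip past it.
size_t probe_then_use()
{
  char buf[8 * kPage];
  for (size_t off = 0; off < sizeof buf; off += kPage)
    buf[off] = 0;                     // one probe per page
  buf[sizeof buf - 1] = 42;           // now safe to use anywhere in buf
  return (size_t)buf[sizeof buf - 1];
}
```

The instruction-selection debate above is purely about how cheaply each of those per-page touches can be encoded and executed.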
Re: [PATCH 2/2] DWARF: make it possible to emit debug info for declarations only
On Fri, Jun 16, 2017 at 6:35 PM, Pierre-Marie de Rodat wrote:
> On 05/31/2017 11:08 AM, Pierre-Marie de Rodat wrote:
>> On 05/31/2017 09:34 AM, Richard Biener wrote:
>>> Actually for the bigger picture I'd refactor
>>> rest_of_decl_compilation, not calling it from the frontends but
>>> rely on finalize_decl/function. The missing part would then be
>>> calling the dwarf hook which should eventually be done at some of
>>> the places the frontends now call rest_of_decl_compilation.
>
> I put some thought about this, but I suppose I don't yet understand well
> enough the relation between what rest_of_decl_compilation and
> finalize_decl/function do. So I've tried to go half-way: I moved the
> "specification?" guard from the DWARF back-end to callers of the
> early_global_decl hook. In the end, this yielded a very small middle-end
> change: almost all hook calls in front-ends are for variables or
> namespaces, not for functions.
>
>>> But for an easier way (you might still explore the above ;)) just remove
>>> the guards from dwarf2out.c and handle it more like types that we
>>> prune if they end up being unused (OTOH I guess we don't refer to
>>> the decl DIEs from "calls" because not all calls are referred to with
>>> standard DWARF -- the GNU callsite stuff refers them I think but those
>>> get generated too late).
>>>
>>> That said, when early_finish is called the cgraph and IPA references
>>> exist and thus you can sort-of see which functions are "used".
>>
>> Ok, thanks. I'll give a try to the first option, then. :-)
>
> I finally decided not to implement this scheme, as it does not give the
> same results for the case in Ada that motivated this change: it would
> generate potentially one DIE per "calling unit" per called function,
> which is quite suboptimal compared to one DIE per subprogram definition
> or subprogram import. This would look like debug info bloat for
> debatable gain.
>
> So here's an updated patch, without the new debug hook. It bootstrapped
> and regtested fine on x86_64-linux. After this change, I observed:
>
> * an increase of ~22KB for gnat1 (base is 210MB);
> * a decrease (?) of ~3KB for cc1 (base is 197MB);
> * a similar decrease of 3KB for cc1plus (base is 220MB).
>
> Ok to commit?

Nice. This looks ok. I'm mildly curious about the decrease of debuginfo size for cc1 -- did you spot anything obvious there? I suspect Fortran wants to do something similar as Ada for imported modules.

Thanks,
Richard.

> --
> Pierre-Marie de Rodat
Re: [PATCH] Call BUILT_IN_ASAN_HANDLE_NO_RETURN before BUILT_IN_UNWIND_RESUME (PR sanitizer/81021).
PING^1

On 06/13/2017 10:09 AM, Martin Liška wrote:
> Hi.
>
> For a function that does not handle an exception (and calls
> BUILT_IN_UNWIND_RESUME), we need to emit a call to
> BUILT_IN_ASAN_HANDLE_NO_RETURN. That will clean up the stack, which can
> possibly contain poisoned shadow memory that will not be cleaned up in
> the function prologue.
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?
> Martin
>
> gcc/testsuite/ChangeLog:
>
> 2017-06-12  Martin Liska
>
> 	PR sanitizer/81021
> 	* g++.dg/asan/pr81021.C: New test.
>
> gcc/ChangeLog:
>
> 2017-06-12  Martin Liska
>
> 	PR sanitizer/81021
> 	* tree-eh.c (lower_resx): Call BUILT_IN_ASAN_HANDLE_NO_RETURN
> 	before BUILT_IN_UNWIND_RESUME when ASAN is used.
> ---
>  gcc/testsuite/g++.dg/asan/pr81021.C | 33 +++++
>  gcc/tree-eh.c                       | 14 ++
>  2 files changed, 47 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/asan/pr81021.C
>
Re: [PATCH 2/3] Make early return predictor more precise.
On 06/19/2017 01:11 PM, Jan Hubicka wrote:
>> Ok, you're right that we can preserve the predictor. However, let's
>> consider the following test-case:
>>
>> static
>> int baz(int a)
>> {
>>   if (a == 1)
>>     return 1;
>>
>>   return 0;
>> }
>>
>> static
>> int bar(int a)
>> {
>>   if (a == 1)
>>     return baz(a);
>>
>>   return 0;
>> }
>>
>> static
>> int foo(int a)
>> {
>>   if (a == 1)
>>     return bar(a);
>>
>>   return 12;
>> }
>>
>> int main(int argc, char **argv)
>> {
>>   return foo(argc);
>> }
>>
>> There, after einline, we have:
>>
>> main (int argc, char * * argv)
>> {
>>   int D.1832;
>>   int _3;
>>   int _4;
>>
>>   [100.00%]:
>>   if (argc_2(D) == 1)
>>     goto ; [37.13%]
>>   else
>>     goto ; [62.87%]
>>
>>   [37.13%]:
>>   // predicted unlikely by early return (on trees) predictor.
>>   // predicted unlikely by early return (on trees) predictor.
>>   // predicted unlikely by early return (on trees) predictor.
>>
>>   [100.00%]:
>>   # _3 = PHI <12(2), 1(3)>
>>   _5 = _3;
>>   _4 = _5;
>>   return _4;
>>
>> }
>>
>> I'm thinking what's the best place to merge all the predictor
>> statements?
>
> I wonder if we need to - predictors are relatively short lived.
> In fact they do not need to hit IPA passes, but they do because at the
> time I was implementing them I was worrying about introducing yet another
> global IPA pass to remove them (we can't do it during early inlining
> because we want to reuse them after inlining).

Ok, so I fixed that in the described way. There's one remaining fallout:
gcc/testsuite/gcc.dg/tree-ssa/ipa-split-5.c, where a fnsplit is properly
done, but then it's again inlined:

Considering split_me.part.0/5 with 23 size
  to be inlined into test/2 in unknown:0
  Estimated badness is -0.01, frequency 0.33.
Inlined split_me.part.0 into test which now has time 50.30 and size 44, net change of +17.
Considering split_me.part.0/5 with 23 size
  to be inlined into test/2 in unknown:0
  Estimated badness is -0.01, frequency 0.33.
Inlined split_me.part.0 into test which now has time 70.76 and size 61, net change of +17.
Considering split_me.part.0/5 with 23 size
  to be inlined into test/2 in unknown:0
  Estimated badness is -0.01, frequency 0.33.
Inlined split_me.part.0 into test which now has time 91.22 and size 78, net change of +17.
Considering split_me.part.0/5 with 23 size
  to be inlined into test/2 in unknown:0
  Estimated badness is -0.01, frequency 0.33.
Inlined split_me.part.0 into test which now has time 111.68 and size 95, net change of +17.

Unit growth for small function inlining: 61->129 (111%)
...

Any hint how to block the IPA inlining? Sending a new version of the patch.

Martin

> I would just move pass_strip_predict_hints pre-IPA and not worry about
> them chaining.
>
> There is a problem that after inlining the prediction may expand its scope
> and predict a branch that is outside of the original function body,
> but I do not see a very easy solution for that besides not re-doing
> prediction (we could also copy probabilities from the inlined function
> when they exist and honor them in the outer function. I am not sure
> that is going to improve prediction quality though - extra context
> is probably useful)
>
> Thanks,
> Honza
>>
>> Thanks,
>> Martin
>>
>>> Where did you find this case?
>>> Honza

 /* Create a new deep copy of the statement.  */
 copy = gimple_copy (stmt);
--
2.13.0

From 84625a782add6ae2ed29630815b61b34a052770a Mon Sep 17 00:00:00 2001
From: marxin
Date: Tue, 6 Jun 2017 10:55:18 +0200
Subject: [PATCH 1/2] Make early return predictor more precise.

gcc/ChangeLog:

2017-05-26  Martin Liska

	PR tree-optimization/79489
	* gimplify.c (maybe_add_early_return_predict_stmt): New function.
	(gimplify_return_expr): Call the function.
	* predict.c (tree_estimate_probability_bb): Remove handling of
	early return.
	* predict.def: Update comment about early return predictor.
	* gimple-predict.h (is_gimple_predict): New function.
	* predict.def: Change default value of early return to 66.
	* tree-tailcall.c (find_tail_calls): Skip GIMPLE_PREDICT statements.
	* passes.def: Put pass_strip_predict_hints at the beginning of
	IPA passes.
---
 gcc/gimple-low.c     |  2 ++
 gcc/gimple-predict.h |  8
 gcc/gimplify.c       | 16
 gcc/passes.def       |  1 +
 gcc/predict.c        | 41 -
 gcc/predict.def      | 15 +++
 gcc/tree-tailcall.c  |  2 ++
 7 files changed, 32 insertions(+), 53 deletions(-)

diff --git a/gcc/gimple-low.c b/gcc/gimple-low.c
index 619b9d7bfb1..4ea6c3532f3 100644
--- a/gcc/gimple-low.c
+++ b/gcc/gimple-low.c
@@ -30,6 +30,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "calls.h"
 #include "gimple-iterator.h"
 #include "gimple-low.h"
+#include "predict.h"
+#include "gimple-predict.h"

 /* The differences between High GIMPLE
Re: RFC: stack/heap collision vulnerability and mitigation with GCC
On 06/20/2017 01:10 PM, Uros Bizjak wrote:
> 74,99%  a.out  a.out  [.] test_or
> 12,50%  a.out  a.out  [.] test_movb
> 12,50%  a.out  a.out  [.] test_movl

Could you try notl/notb/negl/negb as well, please?

Thanks,
Florian
Re: [PATCH][AArch64] Emit SIMD moves as mov
James Greenhalgh wrote:
>
> Does this introduce a dependency on a particular binutils version, or have
> we always supported this alias?
>
> The patch looks OK, but I don't want to introduce a new dependency so please
> check how far back this is supported.

Well, gas/testsuite/gas/aarch64/alias.s has contained "mov v0.8b, v1.8b"
since binutils 2.23, which was the first release with AArch64 support.

Wilco
Re: [PATCH GCC][12/13]Workaround reduction statements for distribution
On Tue, Jun 20, 2017 at 11:20 AM, Bin.Cheng wrote:
> On Fri, Jun 16, 2017 at 6:15 PM, Bin.Cheng wrote:
>> On Fri, Jun 16, 2017 at 11:21 AM, Richard Biener wrote:
>>> On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng wrote:
>>>> Hi,
>>>> For now, loop distribution handles variables used outside of the loop
>>>> as reductions. This is inaccurate because all partitions contain
>>>> statements defining induction vars.
>>>
>>> But final induction values are usually not used outside of the loop...
>> This is in actuality for induction variables which are used outside of
>> the loop.
>>>
>>> What is missing is loop distribution trying to change the partition
>>> order. In fact we somehow assume we can move a reduction across a
>>> detected builtin (I don't remember if we ever check for validity of
>>> that...).
>> Hmm, I am not sure when we can't. If there is any dependence between
>> builtin/reduction partitions, it should be captured by the RDG or PG;
>> otherwise the partitions are independent and can be freely ordered as
>> long as the reduction partition is scheduled last?
>>>
>>>> Ideally we should factor out scev-propagation as a standalone interface
>>>> which can be called when necessary. Before that, this patch simply
>>>> works around the reduction issue by checking if the statement belongs
>>>> to all partitions. If yes, the reduction must be computed in the last
>>>> partition no matter how the loop is distributed.
>>>> Bootstrap and test on x86_64 and AArch64. Is it OK?
>>>
>>> stmt_in_all_partitions is not kept up-to-date during partition merging,
>>> and if merging makes the reduction partition(s) pass the
>>> stmt_in_all_partitions test your simple workaround doesn't work ...
>> I think it doesn't matter because:
>> A) it's really a workaround for induction variables. In general,
>> induction variables are included by all partitions.
>> B) After classifying partitions, we immediately fuse all reduction
>> partitions.
More stmt_in_all_partitions means we are fusing
>> non-reduction partitions with reduction partitions, so the newly
>> generated (stmt_in_all_partitions) statements are actually not reduction
>> statements. The workaround won't work anyway even if the bitmap is
>> maintained.
>>>
>>> As written it's a valid optimization, but can you please note its
>>> limitation in a comment?
>> Yeah, I will add a comment explaining it.
> Comment added in the new version of the patch. It also computes the
> bitmap outside now, is it OK?

Ok. Can you add a testcase for this as well please?

I think the series up to this is now fully reviewed; I deferred 1/n (the
new IFN) to the last one containing the runtime versioning. Can you
re-post that (you can merge it with the IFN patch) to apply after the
series has been applied up to this?

Thanks,
Richard.

> Thanks,
> bin
> 2017-06-07  Bin Cheng
>
> 	* tree-loop-distribution.c (classify_partition): New parameter and
> 	better handle reduction statement.
> 	(rdg_build_partitions): Revise comment.
> 	(distribute_loop): Compute statements in all partitions and pass it
> 	to classify_partition.
Re: [PATCH GCC][11/13]Annotate partition by its parallelism execution type
On Tue, Jun 20, 2017 at 11:18 AM, Bin.Cheng wrote:
> On Fri, Jun 16, 2017 at 11:10 AM, Richard Biener wrote:
>> On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng wrote:
>>> Hi,
>>> This patch checks and records whether a partition can be executed in
>>> parallel by looking for data dependence cycles. The information is
>>> needed for distribution because the idea is to distribute parallel-type
>>> partitions away from sequential ones. I believe current distribution
>>> doesn't work very well because it does blind distribution/fusion.
>>> Bootstrap and test on x86_64 and AArch64. Is it OK?
>>
>> +  /* In case of no data dependence.  */
>> +  if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
>> +    return false;
>> +  /* Or the data dependence can be resolved by compilation time alias
>> +     check.  */
>> +  else if (!alias_sets_conflict_p (get_alias_set (DR_REF (dr1)),
>> +                                   get_alias_set (DR_REF (dr2))))
>> +    return false;
>>
>> dependence analysis should use TBAA already, in which cases do you need
>> this? It seems to fall foul of the easy mistake of not honoring GCC's
>> memory model as well ... see dr_may_alias_p.
> I see. Patch updated with this branch removed.
>
>> +  /* Further check if any data dependence prevents us from executing the
>> +     partition in parallel.  */
>> +  EXECUTE_IF_SET_IN_BITMAP (partition->reads, 0, i, bi)
>> +    {
>> +      dr1 = (*datarefs_vec)[i];
>> +      EXECUTE_IF_SET_IN_BITMAP (partition->writes, 0, j, bj)
>> +	{
>>
>> what about write-write dependences?
>>
>> +  EXECUTE_IF_SET_IN_BITMAP (partition->reads, 0, i, bi)
>> +    {
>> +      dr1 = (*datarefs_vec)[i];
>> +      EXECUTE_IF_SET_IN_BITMAP (partition->writes, i + 1, j, bj)
>> +	{
>> +	  dr2 = (*datarefs_vec)[j];
>> +	  /* Partition can only be executed sequentially if there is any
>> +	     data dependence cycle.  */
>>
>> exact copy of the loop nest follows?! Maybe you meant to iterate
>> over writes in the first loop.
> Yes, this is a copy-paste typo. Patch is also simplified because
> read/write are recorded together now. Is it OK?
Ok.

Thanks,
Richard.

> Thanks,
> bin
> 2017-06-07  Bin Cheng
>
> 	* tree-loop-distribution.c (enum partition_type): New.
> 	(struct partition): New field type.
> 	(partition_merge_into): Update partition type.
> 	(data_dep_in_cycle_p): New function.
> 	(build_rdg_partition_for_vertex): Compute partition type.
> 	(rdg_build_partitions): Dump partition type.
Re: [PATCH GCC][07/13]Preserve data references for whole distribution life time
On Mon, Jun 19, 2017 at 5:59 PM, Bin.Cheng wrote:
> On Mon, Jun 19, 2017 at 4:16 PM, Richard Biener wrote:
>> On Mon, Jun 19, 2017 at 3:34 PM, Bin.Cheng wrote:
>>> On Tue, Jun 13, 2017 at 12:14 PM, Richard Biener wrote:
>>>> On Mon, Jun 12, 2017 at 7:02 PM, Bin Cheng wrote:
>>>>> Hi,
>>>>> This patch collects and preserves all data references in the loop for
>>>>> the whole distribution lifetime. They will be used afterwards.
>>>>>
>>>>> Bootstrap and test on x86_64 and AArch64. Is it OK?
>>>>
>>>> +/* Vector of data references in the loop to be distributed.  */
>>>> +static vec *datarefs_vec;
>>>> +
>>>> +/* Map of data reference in the loop to a unique id.  */
>>>> +static hash_map *datarefs_map;
>>>>
>>>> no need to make those pointers. It's not a unique id but the index into
>>>> the datarefs_vec vector, right? loop distribution doesn't yet use
>>>> dr->aux so it would be nice to avoid the hash_map in favor of using
>>>> that field.
>>>>
>>>> #define DR_INDEX(dr) ((uintptr_t)(dr)->aux)
>>>>
>>>> +  if (datarefs_vec->length () > 64)
>>>>
>>>> There is PARAM_VALUE (PARAM_LOOP_MAX_DATAREFS_FOR_DATADEPS) with a
>>>> default value of 1000. Please use that instead of magic numbers.
>>>>
>>>> +    {
>>>> +      if (dump_file && (dump_flags & TDF_DETAILS))
>>>> +	fprintf (dump_file,
>>>> +		 "Loop %d not distributed: more than 64 memory references.\n",
>>>> +		 loop->num);
>>>> +
>>>> +      free_rdg (rdg);
>>>> +      loop_nest->release ();
>>>> +      delete loop_nest;
>>>> +      free_data_refs (*datarefs_vec);
>>>> +      delete datarefs_vec;
>>>> +      return 0;
>>>> +    }
>>>>
>>>> auto_* were so nice ...
>>> Hi Richard,
>>> This is the updated patch. It removes datarefs_map and checks the number
>>> of data references against the parameter. Is it OK?
>>
>> ENOPATCH
> Ah. Sorry for that.

Ok.

Richard.

> Thanks,
> bin
>>> Thanks,
>>> bin
>>> 2017-06-07  Bin Cheng
>>>
>>> 	* tree-loop-distribution.c (params.h): Include header file.
>>> 	(MAX_DATAREFS_NUM, DR_INDEX): New macro.
>>> 	(datarefs_vec): New global var.
>>> 	(create_rdg_vertices): Use datarefs_vec directly.
>>> 	(free_rdg): Don't free data references.
>>> 	(build_rdg): Update use. Don't free data references.
>>> (distribute_loop): Compute global variable for data references. >>> Bail out if there are too many data references.
Re: [PATCH GCC][10/13]Compute and cache data dependence relation
On Tue, Jun 20, 2017 at 11:15 AM, Bin.Cheng wrote:
> On Fri, Jun 16, 2017 at 11:03 AM, Richard Biener wrote:
>> On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng wrote:
>>> Hi,
>>> This patch computes and caches data dependence relations in a hash table
>>> so that they can be queried multiple times later for partition
>>> dependence checks.
>>> Bootstrap and test on x86_64 and AArch64. Is it OK?
>>
>> +/* Vector of data dependence relations.  */
>> +static vec *ddrs_vec;
>> +
>> +/* Hash table for data dependence relation in the loop to be distributed.  */
>> +static hash_table *ddrs_table;
>>
>> avoid the extra indirection.
>>
>> +/* Hashtable entry for data reference relation.  */
>> +struct ddr_entry
>> +{
>> +  data_reference_p a;
>> +  data_reference_p b;
>> +  ddr_p ddr;
>> +  hashval_t hash;
>> +};
>> ...
>> +/* Hash table equality function for data reference relation.  */
>> +
>> +inline bool
>> +ddr_entry_hasher::equal (const ddr_entry *entry1, const ddr_entry *entry2)
>> +{
>> +  return (entry1->hash == entry2->hash
>> +	  && DR_STMT (entry1->a) == DR_STMT (entry2->a)
>> +	  && DR_STMT (entry1->b) == DR_STMT (entry2->b)
>> +	  && operand_equal_p (DR_REF (entry1->a), DR_REF (entry2->a), 0)
>> +	  && operand_equal_p (DR_REF (entry1->b), DR_REF (entry2->b), 0));
>> +}
>>
>> what's the issue with using hash_table with a custom hasher?
>> That is, simply key on the dataref pointers (hash them, compare those
>> for equality)?
>>
>> Your scheme looks too complicated / expensive to me ...
>>
>> You can drop ddrs_vec, needed only for memory removal, if you traverse
>> the hashtable.
> Thanks for reviewing. Patch simplified as suggested. Is it OK?

+inline hashval_t
+ddr_hasher::hash (const data_dependence_relation *ddr)
+{
+  return iterative_hash_object (DDR_A (ddr),
+				iterative_hash_object (DDR_B (ddr), 0));
+}
+

please use

  inchash::hash h;
  h.add_ptr (DDR_A (ddr));
  h.add_ptr (DDR_B (ddr));
  return h.end ();

Ok with that change.

Richard.
> Thanks,
> bin
> 2017-06-17  Bin Cheng
>
> 	* tree-loop-distribution.c (struct ddr_hasher): New.
> 	(ddr_hasher::hash, ::equal, get_data_dependence): New function.
> 	(ddrs_table): New.
> 	(classify_partition): Call get_data_dependence.
> 	(pg_add_dependence_edges): Ditto.
> 	(distribute_loop): Release data dependence hash table.
Re: [PATCH] Fix PR71815 (SLSR misses PHI opportunities)
On Fri, Jun 16, 2017 at 6:10 PM, Bill Schmidt wrote:
> Hi,
>
> PR71815 identifies a situation where SLSR misses opportunities for
> PHI candidates when code hoisting is enabled (which is now on by
> default). The basic problem is that SLSR currently uses an overly
> simple test for profitability of the transformation. The algorithm
> currently requires that the PHI basis (through which the non-local
> SLSR candidate is propagated) has only one use, which is the
> candidate statement. The true requirement for profitability is
> that, if the candidate statement will be dead after transformation,
> then so will the PHI candidate.
>
> This patch fixes the problem by looking at the transitive reachability
> of the PHI definitions. If all paths terminate in the candidate
> statement, then we know the PHI basis will go dead and we will not
> make the code worse with the planned replacement. To avoid compile
> time issues, path search is arbitrarily terminated at depth 10. The
> new test is used throughout the cost calculation, so appears multiple
> times in the code.
>
> Also, I've added a check to avoid replacing multiply candidates with
> a stride of 1. Such a candidate is really a copy or cast statement,
> and if we replace it, we will just generate a different copy or cast
> statement. I noticed this with one of the test cases from the PR
> while debugging the problem.
>
> I've updated the two test cases that were previously enabled only
> with -fno-code-hoisting, removing that restriction.
>
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions. I've also tested this with SPEC cpu2006 and the
> patch is performance neutral on a POWER8 box (as expected). Is
> this ok for trunk?
>
> Thanks,
> Bill
>
> [gcc]
>
> 2016-06-16  Bill Schmidt
>
> 	* gimple-ssa-strength-reduction.c (uses_consumed_by_stmt): New
> 	function.
> 	(find_basis_for_candidate): Call uses_consumed_by_stmt rather than
> 	has_single_use.
> 	(slsr_process_phi): Likewise.
> 	(replace_uncond_cands_and_profitable_phis): Don't replace a
> 	multiply candidate with a stride of 1 (copy or cast).
> 	(phi_incr_cost): Call uses_consumed_by_stmt rather than
> 	has_single_use.
> 	(lowest_cost_path): Likewise.
> 	(total_savings): Likewise.
>
> [gcc/testsuite]
>
> 2016-06-16  Bill Schmidt
>
> 	* gcc.dg/tree-ssa/slsr-35.c: Remove -fno-code-hoisting workaround.
> 	* gcc.dg/tree-ssa/slsr-36.c: Likewise.
>
> Index: gcc/gimple-ssa-strength-reduction.c
> ===================================================================
> --- gcc/gimple-ssa-strength-reduction.c (revision 239241)
> +++ gcc/gimple-ssa-strength-reduction.c (working copy)
> @@ -475,6 +475,48 @@ find_phi_def (tree base)
>    return c->cand_num;
>  }
>
> +/* Determine whether all uses of NAME are directly or indirectly
> +   used by STMT.  That is, we want to know whether if STMT goes
> +   dead, the definition of NAME also goes dead.  */
> +static bool
> +uses_consumed_by_stmt (tree name, gimple *stmt, unsigned recurse)

use a default arg 'unsigned recurse = 0' to hide this implementation detail
from users.

> +{
> +  gimple *use_stmt;
> +  imm_use_iterator iter;
> +  bool retval = true;
> +
> +  FOR_EACH_IMM_USE_STMT (use_stmt, iter, name)
> +    {
> +      if (use_stmt == stmt || is_gimple_debug (use_stmt))
> +	continue;
> +
> +      if (!is_gimple_assign (use_stmt))
> +	{
> +	  retval = false;
> +	  BREAK_FROM_IMM_USE_STMT (iter);
> +	}
> +
> +      /* Limit recursion.  */
> +      if (recurse >= 10)
> +	{
> +	  retval = false;
> +	  BREAK_FROM_IMM_USE_STMT (iter);
> +	}

Put this limit right before the recursion.

> +      tree next_name = gimple_get_lhs (use_stmt);
> +      if (!next_name || !is_gimple_reg (next_name))
> +	{
> +	  retval = false;
> +	  BREAK_FROM_IMM_USE_STMT (iter);
> +	}
> +
> +      if (uses_consumed_by_stmt (next_name, stmt, recurse + 1))
> +	continue;

So this doesn't change dependent on the result, which means you likely meant
"if (! uses_consumed_by_stmt (...)) { retval = false; BREAK... }" -- which
possibly also invalidates your testing? The whole thing is probably easier
to optimize if you merge the ifs that break into one.
Richard.

> +    }
> +
> +  return retval;
> +}
>
>  /* Helper routine for find_basis_for_candidate.  May be called twice:
>     once for the candidate's base expr, and optionally again either for
>     the candidate's phi definition or for a CAND_REF's alternative base
> @@ -550,7 +592,8 @@ find_basis_for_candidate (slsr_cand_t c)
>
>    /* If we found a hidden basis, estimate additional dead-code
>       savings if the phi and its feeding statements can be removed.  */
> -  if (basis &&
Re: [PATCH][2/2] early LTO debug, main part
On Wed, 7 Jun 2017, Richard Biener wrote:

> On Fri, 19 May 2017, Richard Biener wrote:
>
>> This is a repost of the main part of the early LTO debug support.
>> The only changes relative to the last post are in the dwarf2out.c
>> pieces, due to Jason's review and Jakub's introduction of
>> DW_OP_GNU_variable_value.
>>
>> I've also adjusted testcases for fallout (the asan backtraces do
>> give files / line numbers because libbacktrace doesn't understand
>> the DWARF) plus added a -flto run over the libstdc++ pretty printer
>> testsuite -- after all the goal was to make those work with LTO,
>> and they now nicely do.
>>
>> [LTO-]bootstrapped and tested on x86_64-unknown-linux-gnu.
>>
>> I've also tested with -flto -g and compared to before the patch and
>> the outcome doesn't contain any surprises.
>>
>> I've also run the gdb testsuite with no differences (but I guess
>> it doesn't exercise LTO).
>>
>> I've also built SPEC 2k6 with -flto -g.
>>
>> I've also debugged optimized LTO bootstrapped cc1 a bit - not that
>> debugging (LTO) optimized cc1 is a pleasant experience, but at least
>> gdb doesn't crash.
>>
>> Ok for trunk?
>
> Ping.

Ping^2.

>> Both darwin and mingw maintainers were not concerned about LTO with -g
>> being broken for them.
>>
>> This patch allows us to go forward with freeing more stuff after
>> the frontend finishes, in particular removing LTO streaming of a lot
>> of type information that is referenced from trees (and, as a first
>> step, enabling free-lang-data for non-LTO compiles).
>>
>> Thanks,
>> Richard.
>>
>> 2017-05-19  Richard Biener
>>
>> 	* debug.h (struct gcc_debug_hooks): Add die_ref_for_decl and
>> 	register_external_die hooks.
>> 	(debug_false_tree_charstarstar_uhwistar): Declare.
>> 	(debug_nothing_tree_charstar_uhwi): Likewise.
>> 	* debug.c (do_nothing_debug_hooks): Adjust.
>> 	(debug_false_tree_charstarstar_uhwistar): New do nothing.
>> 	(debug_nothing_tree_charstar_uhwi): Likewise.
> > * dbxout.c (dbx_debug_hooks): Adjust. > > (xcoff_debug_hooks): Likewise. > > * sdbout.c (sdb_debug_hooks): Likewise. > > * vmsdbgout.c (vmsdbg_debug_hooks): Likewise. > > > > * dwarf2out.c (macinfo_label_base): New global. > > (dwarf2out_register_external_die): New function for the > > register_external_die hook. > > (dwarf2out_die_ref_for_decl): Likewise for die_ref_for_decl. > > (dwarf2_debug_hooks): Use them. > > (dwarf2_lineno_debug_hooks): Adjust. > > (struct die_struct): Add with_offset flag. > > (DEBUG_LTO_DWO_INFO_SECTION, DEBUG_LTO_INFO_SECTION, > > DEBUG_LTO_DWO_ABBREV_SECTION, DEBUG_LTO_ABBREV_SECTION, > > DEBUG_LTO_DWO_MACINFO_SECTION, DEBUG_LTO_MACINFO_SECTION, > > DEBUG_LTO_DWO_MACRO_SECTION, DEBUG_LTO_MACRO_SECTION, > > DEBUG_LTO_LINE_SECTION, DEBUG_LTO_DWO_STR_OFFSETS_SECTION, > > DEBUG_LTO_STR_DWO_SECTION, DEBUG_STR_LTO_SECTION): New macros > > defining section names for the early LTO debug variants. > > (reset_indirect_string): New helper. > > (add_AT_external_die_ref): Helper for > > dwarf2out_register_external_die. > > (print_dw_val): Add support for offsetted symbol references. > > (compute_section_prefix_1): Split out worker to distinguish > > the comdat from the LTO case. > > (compute_section_prefix): Wrap old comdat case here. > > (output_die): Skip DIE symbol output for the LTO added one. > > Handle DIE symbol references with offset. > > (output_comp_unit): Guard section name mangling properly. > > For LTO debug sections emit a symbol at the section beginning > > which we use to refer to its DIEs. > > (add_abstract_origin_attribute): For DIEs registered via > > dwarf2out_register_external_die directly refer to the early > > DIE rather than indirectly through the shadow one we created. > > (gen_array_type_die): When generating early LTO debug do > > not emit DW_AT_string_length. > > (gen_formal_parameter_die): Do not re-create DIEs for PARM_DECLs > > late when in LTO. 
> > (gen_subprogram_die): Adjust the check for whether we face > > a concrete instance DIE for an inline we can reuse for the > > late LTO case. Likewise avoid another specification DIE > > for early built declarations/definitions for the late LTO case. > > (gen_variable_die): Add type references for late duplicated VLA dies > > when in late LTO. > > (gen_inlined_subroutine_die): Do not call > > dwarf2out_abstract_function, > > we have the abstract instance already. > > (process_scope_var): Adjust decl DIE contexts in LTO which > > first puts them in limbo. > > (gen_decl_die): Do not generate type DIEs late
Re: [PATCH][AArch64] Emit SIMD moves as mov
On Tue, Jun 20, 2017 at 12:06:29PM +0100, Wilco Dijkstra wrote:
> SIMD moves are currently emitted as ORR. Change this to use the MOV
> pseudo instruction just like integer moves (the ARM-ARM states MOV is the
> preferred disassembly), improving readability of -S output.
>
> Passes bootstrap, OK for commit?

Does this introduce a dependency on a particular binutils version, or have
we always supported this alias?

The patch looks OK, but I don't want to introduce a new dependency so please
check how far back this is supported.

Thanks,
James

>
> ChangeLog:
> 2017-06-20  Wilco Dijkstra
>
> 	* config/aarch64/aarch64.md (movti_aarch64):
> 	Emit mov rather than orr.
> 	(movtf_aarch64): Likewise.
> 	* config/aarch64/aarch64-simd.md (aarch64_simd_mov):
> 	Emit mov rather than orr.
> --