Re: [PR libfortran/62768] Handle filenames with embedded nulls
On Wed, 17 Sep 2014, Hans-Peter Nilsson wrote: On Thu, 18 Sep 2014, Janne Blomqvist wrote: On Thu, Sep 18, 2014 at 12:57 AM, Hans-Peter Nilsson h...@bitrange.com wrote: 'k so we'll track the regressions in a PR. Do you prefer to tack on to 62768 or a new PR? Hijacking 62768 for the purposes of reporting a regression for its fix would not be proper. Will tell in a new PR, unless I see a really obvious fix. False alarm. If you look back at the patch I posted, there's a typo. :-} Duly warned about, but I'd rather expect the build to fail. Apparently libgfortran is not compiled with -Werror, at least not for crosses. Maybe -Werror is there for native but I'm not sure as I see some warning: array subscript has type 'char' [-Wchar-subscripts] which seems generic and also some others. Though no more than can be fixed or excepted, IMHO. brgds, H-P
Re: [PATCH AArch64]: Add constraint letter for stack_protect_test pattern)
On 17 September 2014 15:43, James Greenhalgh james.greenha...@arm.com wrote: On Wed, Sep 17, 2014 at 09:30:31AM +0100, Richard Earnshaw wrote: "=&r" is correct for an early-clobbered scratch. R.

In that case... How is the attached patch for trunk? I've bootstrapped it on AArch64 with -fstack-protector-strong and -frename-registers in the BOOT_CFLAGS without seeing any issues. OK? Thanks, James

---

gcc/

2014-09-15  James Greenhalgh  james.greenha...@arm.com

	* config/aarch64/aarch64.md (stack_protect_test_<mode>): Mark
	scratch register as an output to placate register renaming.

OK for this part.

gcc/testsuite/

2014-09-15  James Greenhalgh  james.greenha...@arm.com

	* gcc.target/aarch64/stack_protector_set_1.c: New.
	* gcc.target/aarch64/stack_protector_set_2.c: Likewise.

I agree with Andrew, these don't need to be aarch64 specific.

/Marcus
Re: [PR libfortran/62768] Handle filenames with embedded nulls
On Thu, Sep 18, 2014 at 11:14 AM, Hans-Peter Nilsson h...@bitrange.com wrote: On Wed, 17 Sep 2014, Hans-Peter Nilsson wrote: On Thu, 18 Sep 2014, Janne Blomqvist wrote: On Thu, Sep 18, 2014 at 12:57 AM, Hans-Peter Nilsson h...@bitrange.com wrote: 'k so we'll track the regressions in a PR. Do you prefer to tack on to 62768 or a new PR? Hijacking 62768 for the purposes of reporting a regression for its fix would not be proper. Will tell in a new PR, unless I see a really obvious fix. False alarm.

Ok, good; I was a bit perplexed about what could be wrong.

If you look back at the patch I posted, there's a typo. :-} Duly warned about, but I'd rather expect the build to fail.

Yes, strange that it didn't fail. There's no prototype for cf_fstrcpy, and since we use -std=gnu11, prototypes should be mandatory. Also, since there's no symbol called cf_fstrcpy, at least the linking should fail. Unless the link picked up some old inquire.o file?

Apparently libgfortran is not compiled with -Werror, at least not for crosses. Maybe -Werror is there for native but I'm not sure, as I see some warning: array subscript has type 'char' [-Wchar-subscripts] which seems generic, and also some others. Though no more than can be fixed or excepted, IMHO.

No, -Werror isn't used. It was tried, but apparently caused issues. From the changelog:

2008-06-13  Tobias Burnus  bur...@net-b.de

	* configure.ac (AM_CFLAGS): Remove -Werror again.
	* configure: Regenerate.

2008-06-13  Tobias Burnus  bur...@net-b.de

	PR libgfortran/36518
	* configure.ac (AM_CFLAGS): Add -Werror.
	* configure: Regenerate.
	* m4/ifunction_logical.m4: Cast n to (int).
	* generated/any_l16.c: Regenerate.
	* generated/any_l2.c: Regenerate.
	* generated/all_l1.c: Regenerate.
	* generated/all_l2.c: Regenerate.
	* generated/all_l16.c: Regenerate.
	* generated/any_l4.c: Regenerate.
	* generated/count_4_l.c: Regenerate.
	* generated/count_8_l.c: Regenerate.
	* generated/all_l4.c: Regenerate.
	* generated/count_1_l.c: Regenerate.
	* generated/count_16_l.c: Regenerate.
	* generated/any_l8.c: Regenerate.
	* generated/count_2_l.c: Regenerate.
	* generated/any_l1.c: Regenerate.
	* generated/all_l8.c: Regenerate.

I have a vague recollection that there were issues with system headers on non-glibc targets. It would be nice if -Werror were used by default; I think we've had a few cases where bugs have slipped past due to it.

-- Janne Blomqvist
Re: [PATCHv4] Vimrc config with GNU formatting
On 09/18/2014 07:52 AM, Segher Boessenkool wrote: +# Local Vim config + +vimrc: + (cd $(srcdir); $(LN_S) contrib/vimrc .local.vimrc; $(LN_S) contrib/vimrc .lvimrc) + This is another target than what the doc (in the script itself) mentions. Right, I've forgotten to fix it before sending the patch. Too much experimenting in the evening... It is not marked as phony. Noted. It should not _be_ phony; the two files should be separate targets. I've done that initially but that may look weird for the user. When typing 'make .local.vimrc' in the GCC build directory one would expect .local.vimrc to be created at the root of the build directory, not srcdir. Why make links instead of copies? A user will likely want to edit his config. I see your point. On the other hand a bug fix in contrib/vimrc will not be propagated to .local.vimrc, which looks like a major disadvantage (to me at least). The way you use ; is wrong (it continues if there is an error). Agreed. Current Makefiles do use ; in backticks and that drew me away. You don't need the cd anyway, come to that. Noted. It's pretty silly to have a makefile target that only copies a file (that is never used by the makefile itself); just tell in the doc where to copy the file. I personally prefer a Makefile target to simplify things. But let's wait for other people's opinions on this. --- /dev/null +++ b/contrib/vimrc @@ -0,0 +1,45 @@ + Code formatting settings for Vim. + + To enable this for GCC files by default, install thinca's vim-localrc + plugin and do + $ make .local.vimrc No, we should *not* advertise an enough rope solution without mentioning it *will* kill you. How about adding a disclaimer? E.g. beware that Vim plugins are a GAPING SECURITY HOLE so use them at YOUR OWN RISK. (And note that Braun's plugin does use sandboxes). Or not mention it at all. Esp. since your next option has all the same functionality and more.
It lacks very important functionality: the user has to specify the path to a concrete GCC source tree when adding the autocmd. I have a dozen trees on my box and I regularly rename, move or copy them. With plugins one doesn't have to bother fixing paths in ~/.vimrc, which is important for productivity. + Or if you dislike plugins, add autocmd in your ~/.vimrc: + :au BufNewFile,BufReadPost path/to/gcc/* :so path/to/gcc/contrib/vimrc There are many more reasons than just dislike of plugins to prefer something like this. For one thing, many Vim users will have many similar statements in their config _already_. So if you don't want to use plugins? + Or just source file manually every time if you are masochist: + :so path/to/gcc/contrib/vimrc How is that masochist? Typing that cino by hand though, now that would qualify ;-) Note that the user has to type the source command for every newly opened file. This indeed looks inconvenient (to me again). Just keep things neutral please. Trying to salt the boring docs a bit to attract the reader's attention ;) +setlocal cindent +setlocal shiftwidth=2 +setlocal softtabstop=2 +setlocal cinoptions=2s,n-s,{s,^-s,:s,=s,g0,f0,hs,p2s,t0,+s,(0,u0,w1,m0 If you write this as absolute numbers instead of as shift widths, you do not need to force sw and sts settings down people's throats. It might also be easier to read? Well I doubt that, but it will be slightly shorter at least. IMHO matching shiftwidth with GNU indent may be useful. E.g. Vim won't reindent when you start editing an empty line and the user will have to insert indents manually. Also replacing offsets with numbers hides the fact that they are based on GNU shiftwidth. +setlocal textwidth=79 The coding conventions say maximum line length is 80. From https://www.gnu.org/prep/standards/html_node/Formatting.html : Please keep the length of source lines to 79 characters or less, for maximum readability in the widest range of environments.
'tw' is a user preference as well. The config just follows the GNU coding standard. We do rarely violate textwidth in our code; that's why I do formatoptions+=l below. +setlocal formatoptions-=ro formatoptions+=cql Yet another user preference. Also mostly the default, except l -- which won't do anything if tw=0 as it should be. And you do not enable t (also on by default), so you do not want to wrap text anyway? Confused now. Me as well, the original config author did it that way. IMHO +t makes sense here. -Y
Re: [PR libfortran/62768] Handle filenames with embedded nulls
On Sep 18 2014, Janne Blomqvist wrote: Apparently libgfortran is not compiled with -Werror, at least not for crosses. Maybe -Werror is there for native but I'm not sure as I see some warning: array subscript has type 'char' [-Wchar-subscripts] which seems generic and also some others. Though no more than can be fixed or excepted, IMHO. No, -Werror isn't used. It was tried, but apparently caused issues. From the changelog: 2008-06-13 Tobias Burnus bur...@net-b.de * configure.ac (AM_CFLAGS): Remove -Werror again. I have a vague recollection that there were issues with system headers on non-glibc targets. It would be nice if -Werror were used by default; I think we've had a few cases where bugs have slipped past due to it. I wasn't involved, but that sounds more than just likely! I have had that experience with several options, including -Werror, -pedantic and specific standards ones. My experience is that most vendors clean up at least the standard C headers with time, and usually the more basic POSIX ones, but any others often remain beyond redemption. And what is not going to help is the ongoing incompatibilities in de jure and de facto standards. I have certainly seen standard headers that would compile only with specific language selection options. Oh, yes, their COMPILER supported other ones - you just couldn't use some important system headers with them :-( If I get time, I will look at the libfortran header use and see if I can make any useful specific comments. Regards, Nick Maclaren.
Re: [PATCH AArch64]: Add constraint letter for stack_protect_test pattern)
On Thu, Sep 18, 2014 at 09:18:53AM +0100, Marcus Shawcroft wrote: gcc/testsuite/ 2014-09-15 James Greenhalgh james.greenha...@arm.com * gcc.target/aarch64/stack_protector_set_1.c: New. * gcc.target/aarch64/stack_protector_set_2.c: Likewise. I agree with Andrew, these don't need to be aarch64 specific. Well, guess the 16 needs to be replaced with sizeof buffer, because sizeof (unsigned int) is not 4 on all architectures. And /* { dg-require-effective-target fstack_protector } */ is needed; not all targets support -fstack-protector*. Jakub
Re: [PATCH] Relax check against commuting XOR and ASHIFTRT in combine.c
Thanks for the reply - and the in-depth investigation. I agree that the correctness of the compiler is critical rather than particular platforms such as Ada / Alpha. Moreover, I think we both agree that if result_mode == shift_mode, the transformation is correct. Just putting that check in achieves what I'm trying for here, so I'd be happy to go with the attached patch and call it a day. However, I'm a little concerned about the other cases - i.e. where shift_mode is wider than result_mode. If I understand correctly (and I'm not sure about that, let's see how far I get), this means we'll perform the shift in (say) DImode, when we're only really concerned about the lower (say) 32 bits (for an originally-SImode shift). try_widen_shift_mode will in this case check that the result of the operation *inside* the shift (in our case an XOR) has 33 sign bit copies (via num_sign_bit_copies), i.e. that its *top* 32 bits are all equal to the original SImode sign bit. Some of these bits may be fed into the top of the desired SImode result by the DImode shift. Right so far? AFAICT, num_sign_bit_copies for an XOR conservatively returns the minimum of the num_sign_bit_copies of its two operands. I'm not sure whether this is behaviour we should rely on in its callers, or whether for the sake of abstraction we should treat num_sign_bit_copies as a black box (which does what it says on the, erm, tin). If the former, doesn't having num_sign_bit_copies >= the difference in size between result_mode and shift_mode, for both operands to the XOR, guarantee safety of the commutation (whether the constant is positive or negative)? We could perform the shift (in the larger mode) on both of the XOR operands safely, then XOR together their lower parts.
If, however, we want to play safe and ensure that we deal safely with any XOR whose top (mode size difference + 1) bits were the same, then I think the restriction that the XOR constant is positive is neither necessary nor sufficient; rather (mirroring try_widen_shift_mode) I think we need that num_sign_bit_copies of the constant in shift_mode, is more than the size difference between result_mode and shift_mode. Hmmm. I might try that patch at some point, I think it is the right check to make. (Meta-comment: this would be *so*much* easier if we could write unit tests more easily!) In the meantime I'd be happy to settle for the attached... (tests are as they were posted https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01233.html .) --Alan Jeff Law wrote: On 07/17/14 10:56, Alan Lawrence wrote: Ok, the attached tests are passing on x86_64-none-linux-gnu, aarch64-none-elf, arm-none-eabi, and a bunch of smaller platforms for which I've only built a stage 1 compiler (i.e. as far as necessary to assemble). That's with either change to simplify_shift_const_1. As to the addition of result_mode != shift_mode, or removing the whole check against XOR - I've now tested the latter: bootstrapped on x86_64-none-linux-gnu, check-gcc and check-ada; bootstrapped on arm-none-linux-gnueabihf; bootstrapped on aarch64-none-linux-gnu; cross-tested check-gcc on aarch64-none-elf; cross-tested on arm-none-eabi; (and Uros has bootstrapped on alpha and done a suite of tests, as per https://gcc.gnu.org/ml/gcc-testresults/2014-07/msg01236.html). From a perspective of paranoia, I'd lean towards adding result_mode != shift_mode, but for neatness removing the whole check against XOR is nicer. So I'd defer to the maintainers as to whether one might be preferable to the other...(but my unproven suspicion is that the two are equivalent, and no case where result_mode != shift_mode is possible!) So first, whether or not someone cares about Alpha-VMS isn't the issue at hand. 
It's whether or not the new code is correct. Similarly, the fact that the code generation paths are radically different now when compared to 2004, and that we can't currently trigger the cases where the modes are different, isn't the issue; again, it's whether or not your code is correct. I think the key is to look at try_widen_shift_mode and realize that it can return a mode larger than the original mode of the operations. However, it only does so when it is presented with a case where it knows the sign bit being shifted in from the left will be the same as the sign bit in the original mode. In the case of an XOR with the sign bit set in shift_mode, that's not going to be the case. We would violate the assumption made when we decided to widen the shift to shift_mode. So your relaxation is safe when shift_mode == result_mode, but unsafe otherwise -- even though we don't currently have a testcase for the shift_mode != result_mode case, we don't want to break that. So your updated patch is correct. However, I would ask that you make one additional change. Namely, the comment before the two fragments of code you changed needs updating. Something like ... and the constant has its sign bit set in shift_mode and shift_mode
[AArch64] Auto-generate the BUILTIN_ macros for aarch64-builtins.c
Hi, A possible source of errors is in keeping the iterators.md file and the iterator macros in aarch64-builtin.c synchronized. Clearly this shouldn't be a problem given standard unix tools, it is just a text processing job. This patch adds geniterators.sh to the AArch64 backend which takes the iterators.md file and generates aarch64-builtin-iterators.h, this replaces the definitions from aarch64-builtins.c, which now just include this file. Bootstrapped for aarch64-none-linux-gnueabi, and regression tested for aarch64-none-elf with no issues. OK? Thanks, James --- gcc/ 2014-09-18 James Greenhalgh james.greenha...@arm.com * config/aarch64/aarch64-builtin-iterators.h: New. * config/aarch64/geniterators.sh: New. * config/aarch64/iterators.md (VDQF_DF): New. * config/aarch64/t-aarch64: Add dependencies on new build script. * config/aarch64/aarch64-builtins.c (BUILTIN_*) Remove. diff --git a/gcc/config/aarch64/aarch64-builtin-iterators.h b/gcc/config/aarch64/aarch64-builtin-iterators.h new file mode 100644 index 000..ae579e8 --- /dev/null +++ b/gcc/config/aarch64/aarch64-builtin-iterators.h @@ -0,0 +1,113 @@ +/* -*- buffer-read-only: t -*- */ +/* Generated automatically by geniterators.sh from iterators.md. 
*/ +#ifndef GCC_AARCH64_ITERATORS_H +#define GCC_AARCH64_ITERATORS_H +#define BUILTIN_GPI(T, N, MAP) \ + VAR2 (T, N, MAP, si, di) +#define BUILTIN_SHORT(T, N, MAP) \ + VAR2 (T, N, MAP, qi, hi) +#define BUILTIN_ALLI(T, N, MAP) \ + VAR4 (T, N, MAP, qi, hi, si, di) +#define BUILTIN_SDQ_I(T, N, MAP) \ + VAR4 (T, N, MAP, qi, hi, si, di) +#define BUILTIN_ALLX(T, N, MAP) \ + VAR3 (T, N, MAP, qi, hi, si) +#define BUILTIN_GPF(T, N, MAP) \ + VAR2 (T, N, MAP, sf, df) +#define BUILTIN_VDQ(T, N, MAP) \ + VAR7 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di) +#define BUILTIN_VDQ_I(T, N, MAP) \ + VAR7 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di) +#define BUILTIN_VSDQ_I(T, N, MAP) \ + VAR11 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di, qi, hi, si, di) +#define BUILTIN_VSDQ_I_DI(T, N, MAP) \ + VAR8 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di, di) +#define BUILTIN_VD(T, N, MAP) \ + VAR4 (T, N, MAP, v8qi, v4hi, v2si, v2sf) +#define BUILTIN_VD_BHSI(T, N, MAP) \ + VAR3 (T, N, MAP, v8qi, v4hi, v2si) +#define BUILTIN_VDQ_BHSI(T, N, MAP) \ + VAR6 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si) +#define BUILTIN_VQ(T, N, MAP) \ + VAR6 (T, N, MAP, v16qi, v8hi, v4si, v2di, v4sf, v2df) +#define BUILTIN_VQ_NO2E(T, N, MAP) \ + VAR4 (T, N, MAP, v16qi, v8hi, v4si, v4sf) +#define BUILTIN_VQ_2E(T, N, MAP) \ + VAR2 (T, N, MAP, v2di, v2df) +#define BUILTIN_VQ_S(T, N, MAP) \ + VAR6 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si) +#define BUILTIN_VSDQ_I_BHSI(T, N, MAP) \ + VAR10 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di, qi, hi, si) +#define BUILTIN_VDQM(T, N, MAP) \ + VAR6 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si) +#define BUILTIN_VDQF(T, N, MAP) \ + VAR3 (T, N, MAP, v2sf, v4sf, v2df) +#define BUILTIN_VDQF_DF(T, N, MAP) \ + VAR4 (T, N, MAP, v2sf, v4sf, v2df, df) +#define BUILTIN_VDQSF(T, N, MAP) \ + VAR2 (T, N, MAP, v2sf, v4sf) +#define BUILTIN_VDQF_COND(T, N, MAP) \ + VAR6 (T, N, MAP, v2sf, v2si, v4sf, v4si, v2df, v2di) +#define BUILTIN_VALLF(T, 
N, MAP) \ + VAR5 (T, N, MAP, v2sf, v4sf, v2df, sf, df) +#define BUILTIN_V2F(T, N, MAP) \ + VAR2 (T, N, MAP, v2sf, v2df) +#define BUILTIN_VALL(T, N, MAP) \ + VAR10 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di, v2sf, v4sf, v2df) +#define BUILTIN_VALLDI(T, N, MAP) \ + VAR11 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di, v2sf, v4sf, v2df, di) +#define BUILTIN_VALLDIF(T, N, MAP) \ + VAR12 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di, v2sf, v4sf, v2df, di, df) +#define BUILTIN_VDQV(T, N, MAP) \ + VAR6 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v4si, v2di) +#define BUILTIN_VDQV_S(T, N, MAP) \ + VAR5 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v4si) +#define BUILTIN_VDN(T, N, MAP) \ + VAR3 (T, N, MAP, v4hi, v2si, di) +#define BUILTIN_VQN(T, N, MAP) \ + VAR3 (T, N, MAP, v8hi, v4si, v2di) +#define BUILTIN_VDW(T, N, MAP) \ + VAR3 (T, N, MAP, v8qi, v4hi, v2si) +#define BUILTIN_VSQN_HSDI(T, N, MAP) \ + VAR6 (T, N, MAP, v8hi, v4si, v2di, hi, si, di) +#define BUILTIN_VQW(T, N, MAP) \ + VAR3 (T, N, MAP, v16qi, v8hi, v4si) +#define BUILTIN_VDC(T, N, MAP) \ + VAR6 (T, N, MAP, v8qi, v4hi, v2si, v2sf, di, df) +#define BUILTIN_VDIC(T, N, MAP) \ + VAR3 (T, N, MAP, v8qi, v4hi, v2si) +#define BUILTIN_VD1(T, N, MAP) \ + VAR5 (T, N, MAP, v8qi, v4hi, v2si, v2sf, v1df) +#define BUILTIN_VDQIF(T, N, MAP) \ + VAR9 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2sf, v4sf, v2df) +#define BUILTIN_VDQQH(T, N, MAP) \ + VAR4 (T, N, MAP, v8qi, v16qi, v4hi, v8hi) +#define BUILTIN_VDQHS(T, N, MAP) \ + VAR4 (T, N, MAP, v4hi, v8hi, v2si, v4si) +#define BUILTIN_VDQHSD(T, N, MAP) \ + VAR5 (T, N, MAP, v4hi, v8hi, v2si, v4si, v2di) +#define BUILTIN_VDQQHS(T, N, MAP) \ + VAR6 (T,
[PATCH][ARM] Fix insn type of movmisalign neon load pattern
Hi all,

While browsing the code I noticed that the pattern in the patch has a store type when it is really a vld1 operation. Looking at the patterns around it, I think it was just a copy-pasto. The patch corrects that.

Tested arm-none-eabi. Ok for trunk?

2014-09-18  Kyrylo Tkachov  kyrylo.tkac...@arm.com

	* config/arm/neon.md (*movmisalign<mode>_neon_load): Change type
	to neon_load1_1regq.

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 354a105..69b7cfa 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -296,7 +296,7 @@ (define_insn "*movmisalign<mode>_neon_load"
 	UNSPEC_MISALIGNED_ACCESS))]
   "TARGET_NEON && !BYTES_BIG_ENDIAN && unaligned_access"
   "vld1.<V_sz_elem>\t{%q0}, %A1"
-  [(set_attr "type" "neon_store1_1regq")])
+  [(set_attr "type" "neon_load1_1regq")])
 
 (define_insn "vec_set<mode>_internal"
   [(set (match_operand:VD 0 "s_register_operand" "=w,w")
Re: [PATCH][ARM] Fix insn type of movmisalign neon load pattern
On 18/09/14 11:01, Kyrill Tkachov wrote: Hi all, While browsing the code I noticed that the pattern in the patch has a store type when it is really a vld1 operation. Looking at the patterns around it, I think it was just a copy-pasto. The patch corrects that. Tested arm-none-eabi. Ok for trunk?

2014-09-18  Kyrylo Tkachov  kyrylo.tkac...@arm.com

	* config/arm/neon.md (*movmisalign<mode>_neon_load): Change type
	to neon_load1_1regq.

arm-movmisalign-type.patch

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 354a105..69b7cfa 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -296,7 +296,7 @@ (define_insn "*movmisalign<mode>_neon_load"
 	UNSPEC_MISALIGNED_ACCESS))]
   "TARGET_NEON && !BYTES_BIG_ENDIAN && unaligned_access"
   "vld1.<V_sz_elem>\t{%q0}, %A1"
-  [(set_attr "type" "neon_store1_1regq")])
+  [(set_attr "type" "neon_load1_1regq")])
 
 (define_insn "vec_set<mode>_internal"
   [(set (match_operand:VD 0 "s_register_operand" "=w,w")

OK.
R.
[PATCH 0/5] Fix handling of word subregs of wide registers
This series is a cleaned-up version of: https://gcc.gnu.org/ml/gcc/2014-03/msg00163.html The underlying problem is that the semantics of subregs depend on the word size. You can't have a subreg for byte 2 of a 4-byte word, say, but you can have a subreg for word 2 of a 4-word value (as well as lowpart subregs of that word, etc.). This causes problems when an architecture has wider-than-word registers, since the addressability of a word can then depend on which register class is used. The register allocators need to fix up cases where a subreg turns out to be invalid for a particular class. This is really an extension of what we need to do for CANNOT_CHANGE_MODE_CLASS. Tested on x86_64-linux-gnu, powerpc64-linux-gnu and aarch64_be-elf. Thanks, Richard
[PATCH 1/5] Allow *_HARD_REG_SET arguments to be const
Patch 4 needs to pass a const HARD_REG_SET to AND/COPY_HARD_REG_SET. This patch allows that for all intent-in arguments.

gcc/
	* hard-reg-set.h (COPY_HARD_REG_SET, COMPL_HARD_REG_SET)
	(AND_HARD_REG_SET, AND_COMPL_HARD_REG_SET, IOR_HARD_REG_SET)
	(IOR_COMPL_HARD_REG_SET): Allow the "from" set to be constant.

Index: gcc/hard-reg-set.h
===================================================================
--- gcc/hard-reg-set.h	2014-09-15 10:00:12.133398136 +0100
+++ gcc/hard-reg-set.h	2014-09-15 10:00:12.129398185 +0100
@@ -168,32 +168,38 @@ do { HARD_REG_ELT_TYPE *scan_tp_ = (TO);
      scan_tp_[1] = -1; } while (0)
 
 #define COPY_HARD_REG_SET(TO, FROM) \
-do { HARD_REG_ELT_TYPE *scan_tp_ = (TO), *scan_fp_ = (FROM); \
+do { HARD_REG_ELT_TYPE *scan_tp_ = (TO); \
+     const HARD_REG_ELT_TYPE *scan_fp_ = (FROM); \
      scan_tp_[0] = scan_fp_[0];		\
      scan_tp_[1] = scan_fp_[1]; } while (0)
 
 #define COMPL_HARD_REG_SET(TO, FROM) \
-do { HARD_REG_ELT_TYPE *scan_tp_ = (TO), *scan_fp_ = (FROM); \
+do { HARD_REG_ELT_TYPE *scan_tp_ = (TO); \
+     const HARD_REG_ELT_TYPE *scan_fp_ = (FROM); \
      scan_tp_[0] = ~ scan_fp_[0];	\
      scan_tp_[1] = ~ scan_fp_[1]; } while (0)
 
 #define AND_HARD_REG_SET(TO, FROM) \
-do { HARD_REG_ELT_TYPE *scan_tp_ = (TO), *scan_fp_ = (FROM); \
+do { HARD_REG_ELT_TYPE *scan_tp_ = (TO); \
+     const HARD_REG_ELT_TYPE *scan_fp_ = (FROM); \
      scan_tp_[0] &= scan_fp_[0];	\
      scan_tp_[1] &= scan_fp_[1]; } while (0)
 
 #define AND_COMPL_HARD_REG_SET(TO, FROM) \
-do { HARD_REG_ELT_TYPE *scan_tp_ = (TO), *scan_fp_ = (FROM); \
+do { HARD_REG_ELT_TYPE *scan_tp_ = (TO); \
+     const HARD_REG_ELT_TYPE *scan_fp_ = (FROM); \
      scan_tp_[0] &= ~ scan_fp_[0];	\
      scan_tp_[1] &= ~ scan_fp_[1]; } while (0)
 
 #define IOR_HARD_REG_SET(TO, FROM) \
-do { HARD_REG_ELT_TYPE *scan_tp_ = (TO), *scan_fp_ = (FROM); \
+do { HARD_REG_ELT_TYPE *scan_tp_ = (TO); \
+     const HARD_REG_ELT_TYPE *scan_fp_ = (FROM); \
      scan_tp_[0] |= scan_fp_[0];	\
      scan_tp_[1] |= scan_fp_[1]; } while (0)
 
 #define IOR_COMPL_HARD_REG_SET(TO, FROM) \
-do { HARD_REG_ELT_TYPE *scan_tp_ = (TO), *scan_fp_ = (FROM); \
+do { HARD_REG_ELT_TYPE *scan_tp_ = (TO); \
+     const HARD_REG_ELT_TYPE *scan_fp_ = (FROM); \
      scan_tp_[0] |= ~ scan_fp_[0];	\
      scan_tp_[1] |= ~ scan_fp_[1]; } while (0)
 
@@ -236,37 +242,43 @@ do { HARD_REG_ELT_TYPE *scan_tp_ = (TO);
      scan_tp_[2] = -1; } while (0)
 
 #define COPY_HARD_REG_SET(TO, FROM) \
-do { HARD_REG_ELT_TYPE *scan_tp_ = (TO), *scan_fp_ = (FROM); \
+do { HARD_REG_ELT_TYPE *scan_tp_ = (TO); \
+     const HARD_REG_ELT_TYPE *scan_fp_ = (FROM); \
      scan_tp_[0] = scan_fp_[0];		\
      scan_tp_[1] = scan_fp_[1];		\
      scan_tp_[2] = scan_fp_[2]; } while (0)
 
 #define COMPL_HARD_REG_SET(TO, FROM) \
-do { HARD_REG_ELT_TYPE *scan_tp_ = (TO), *scan_fp_ = (FROM); \
+do { HARD_REG_ELT_TYPE *scan_tp_ = (TO); \
+     const HARD_REG_ELT_TYPE *scan_fp_ = (FROM); \
      scan_tp_[0] = ~ scan_fp_[0];	\
      scan_tp_[1] = ~ scan_fp_[1];	\
      scan_tp_[2] = ~ scan_fp_[2]; } while (0)
 
 #define AND_HARD_REG_SET(TO, FROM) \
-do { HARD_REG_ELT_TYPE *scan_tp_ = (TO), *scan_fp_ = (FROM); \
+do { HARD_REG_ELT_TYPE *scan_tp_ = (TO); \
+     const HARD_REG_ELT_TYPE *scan_fp_ = (FROM); \
      scan_tp_[0] &= scan_fp_[0];	\
      scan_tp_[1] &= scan_fp_[1];	\
      scan_tp_[2] &= scan_fp_[2]; } while (0)
 
 #define AND_COMPL_HARD_REG_SET(TO, FROM) \
-do { HARD_REG_ELT_TYPE *scan_tp_ = (TO), *scan_fp_ = (FROM); \
+do { HARD_REG_ELT_TYPE *scan_tp_ = (TO); \
+     const HARD_REG_ELT_TYPE *scan_fp_ = (FROM); \
      scan_tp_[0] &= ~ scan_fp_[0];	\
      scan_tp_[1] &= ~ scan_fp_[1];	\
      scan_tp_[2] &= ~ scan_fp_[2]; } while (0)
 
 #define IOR_HARD_REG_SET(TO, FROM) \
-do { HARD_REG_ELT_TYPE *scan_tp_ = (TO), *scan_fp_ = (FROM); \
+do { HARD_REG_ELT_TYPE *scan_tp_ = (TO); \
+     const HARD_REG_ELT_TYPE *scan_fp_ = (FROM); \
      scan_tp_[0] |= scan_fp_[0];	\
      scan_tp_[1] |= scan_fp_[1];	\
      scan_tp_[2] |= scan_fp_[2]; } while (0)
 
 #define
Re: Fix i386 FP_TRAPPING_EXCEPTIONS
On Wed, Sep 17, 2014 at 9:47 PM, Joseph S. Myers jos...@codesourcery.com wrote: The i386 sfp-machine.h defines FP_TRAPPING_EXCEPTIONS in a way that is always wrong: it treats a set bit as indicating the exception is trapping, when actually a set bit (both for 387 and SSE floating point) indicates it is masked, and a clear bit indicates it is trapping. This patch fixes this bug. Bootstrapped with no regressions on x86_64-unknown-linux-gnu. OK to commit? Note to ia64 maintainers: it would be a good idea to add a definition of FP_TRAPPING_EXCEPTIONS for ia64, and I expect the new test to fail on ia64 until you do so. libgcc: 2014-09-17 Joseph Myers jos...@codesourcery.com * config/i386/sfp-machine.h (FP_TRAPPING_EXCEPTIONS): Treat clear bits not set bits as indicating trapping exceptions. gcc/testsuite: 2014-09-17 Joseph Myers jos...@codesourcery.com * gcc.dg/torture/float128-exact-underflow.c: New test. My brown paperbag bug :( OK for mainline and release branches. Thanks, Uros.
[PATCH 2/5] Tweak subreg_get_info documentation
Try to clarify what subreg_get_info does and doesn't check.

gcc/
	* rtl.h (subreg_info): Expand commentary.
	* rtlanal.c (subreg_get_info): Likewise.

Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h	2014-09-15 10:00:14.693366097 +0100
+++ gcc/rtl.h	2014-09-15 10:00:14.689366147 +0100
@@ -2866,10 +2866,13 @@ struct subreg_info
 {
   /* Offset of first hard register involved in the subreg.  */
   int offset;
-  /* Number of hard registers involved in the subreg.  */
+  /* Number of hard registers involved in the subreg.  In the case of
+     a paradoxical subreg, this is the number of registers that would
+     be modified by writing to the subreg; some of them may be don't-care
+     when reading from the subreg.  */
   int nregs;
   /* Whether this subreg can be represented as a hard reg with the new
-     mode.  */
+     mode (by adding OFFSET to the original hard register).  */
   bool representable_p;
 };

Index: gcc/rtlanal.c
===================================================================
--- gcc/rtlanal.c	2014-09-15 10:00:14.693366097 +0100
+++ gcc/rtlanal.c	2014-09-15 10:00:14.689366147 +0100
@@ -3411,7 +3411,20 @@ subreg_lsb (const_rtx x)
    xmode - The mode of xregno.
    offset - The byte offset.
    ymode - The mode of a top level SUBREG (or what may become one).
-   info - Pointer to structure to fill in.  */
+   info - Pointer to structure to fill in.
+
+   Rather than considering one particular inner register (and thus one
+   particular outer register) in isolation, this function really uses
+   XREGNO as a model for a sequence of isomorphic hard registers.  Thus the
+   function does not check whether adding INFO->offset to XREGNO gives
+   a valid hard register; even if INFO->offset + XREGNO is out of range,
+   there might be another register of the same type that is in range.
+   Likewise it doesn't check whether HARD_REGNO_MODE_OK accepts the new
+   register, since that can depend on things like whether the final
+   register number is even or odd.  Callers that want to check whether
+   this particular subreg can be replaced by a simple (reg ...) should
+   use simplify_subreg_regno.  */
+
 void
 subreg_get_info (unsigned int xregno, enum machine_mode xmode,
		  unsigned int offset, enum machine_mode ymode,
Re: [AArch64] Auto-generate the BUILTIN_ macros for aarch64-builtins.c
On 18/09/14 10:53, James Greenhalgh wrote: Hi, A possible source of errors is in keeping the iterators.md file and the iterator macros in aarch64-builtin.c synchronized. Clearly this shouldn't be a problem given standard unix tools, it is just a text processing job. This patch adds geniterators.sh to the AArch64 backend which takes the iterators.md file and generates aarch64-builtin-iterators.h, this replaces the definitions from aarch64-builtins.c, which now just include this file. Bootstrapped for aarch64-none-linux-gnueabi, and regression tested for aarch64-none-elf with no issues. OK? Thanks, James --- gcc/ 2014-09-18 James Greenhalgh james.greenha...@arm.com * config/aarch64/aarch64-builtin-iterators.h: New. * config/aarch64/geniterators.sh: New. * config/aarch64/iterators.md (VDQF_DF): New. * config/aarch64/t-aarch64: Add dependencies on new build script. * config/aarch64/aarch64-builtins.c (BUILTIN_*) Remove. 0001-AArch64-Auto-generate-the-BUILTIN_-macros-for-aarch6.patch diff --git a/gcc/config/aarch64/aarch64-builtin-iterators.h b/gcc/config/aarch64/aarch64-builtin-iterators.h new file mode 100644 index 000..ae579e8 --- /dev/null +++ b/gcc/config/aarch64/aarch64-builtin-iterators.h @@ -0,0 +1,113 @@ +/* -*- buffer-read-only: t -*- */ +/* Generated automatically by geniterators.sh from iterators.md. 
*/ +#ifndef GCC_AARCH64_ITERATORS_H +#define GCC_AARCH64_ITERATORS_H +#define BUILTIN_GPI(T, N, MAP) \ + VAR2 (T, N, MAP, si, di) +#define BUILTIN_SHORT(T, N, MAP) \ + VAR2 (T, N, MAP, qi, hi) +#define BUILTIN_ALLI(T, N, MAP) \ + VAR4 (T, N, MAP, qi, hi, si, di) +#define BUILTIN_SDQ_I(T, N, MAP) \ + VAR4 (T, N, MAP, qi, hi, si, di) +#define BUILTIN_ALLX(T, N, MAP) \ + VAR3 (T, N, MAP, qi, hi, si) +#define BUILTIN_GPF(T, N, MAP) \ + VAR2 (T, N, MAP, sf, df) +#define BUILTIN_VDQ(T, N, MAP) \ + VAR7 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di) +#define BUILTIN_VDQ_I(T, N, MAP) \ + VAR7 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di) +#define BUILTIN_VSDQ_I(T, N, MAP) \ + VAR11 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di, qi, hi, si, di) +#define BUILTIN_VSDQ_I_DI(T, N, MAP) \ + VAR8 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di, di) +#define BUILTIN_VD(T, N, MAP) \ + VAR4 (T, N, MAP, v8qi, v4hi, v2si, v2sf) +#define BUILTIN_VD_BHSI(T, N, MAP) \ + VAR3 (T, N, MAP, v8qi, v4hi, v2si) +#define BUILTIN_VDQ_BHSI(T, N, MAP) \ + VAR6 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si) +#define BUILTIN_VQ(T, N, MAP) \ + VAR6 (T, N, MAP, v16qi, v8hi, v4si, v2di, v4sf, v2df) +#define BUILTIN_VQ_NO2E(T, N, MAP) \ + VAR4 (T, N, MAP, v16qi, v8hi, v4si, v4sf) +#define BUILTIN_VQ_2E(T, N, MAP) \ + VAR2 (T, N, MAP, v2di, v2df) +#define BUILTIN_VQ_S(T, N, MAP) \ + VAR6 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si) +#define BUILTIN_VSDQ_I_BHSI(T, N, MAP) \ + VAR10 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di, qi, hi, si) +#define BUILTIN_VDQM(T, N, MAP) \ + VAR6 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si) +#define BUILTIN_VDQF(T, N, MAP) \ + VAR3 (T, N, MAP, v2sf, v4sf, v2df) +#define BUILTIN_VDQF_DF(T, N, MAP) \ + VAR4 (T, N, MAP, v2sf, v4sf, v2df, df) +#define BUILTIN_VDQSF(T, N, MAP) \ + VAR2 (T, N, MAP, v2sf, v4sf) +#define BUILTIN_VDQF_COND(T, N, MAP) \ + VAR6 (T, N, MAP, v2sf, v2si, v4sf, v4si, v2df, v2di) +#define BUILTIN_VALLF(T, 
N, MAP) \ + VAR5 (T, N, MAP, v2sf, v4sf, v2df, sf, df) +#define BUILTIN_V2F(T, N, MAP) \ + VAR2 (T, N, MAP, v2sf, v2df) +#define BUILTIN_VALL(T, N, MAP) \ + VAR10 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di, v2sf, v4sf, v2df) +#define BUILTIN_VALLDI(T, N, MAP) \ + VAR11 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di, v2sf, v4sf, v2df, di) +#define BUILTIN_VALLDIF(T, N, MAP) \ + VAR12 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di, v2sf, v4sf, v2df, di, df) +#define BUILTIN_VDQV(T, N, MAP) \ + VAR6 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v4si, v2di) +#define BUILTIN_VDQV_S(T, N, MAP) \ + VAR5 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v4si) +#define BUILTIN_VDN(T, N, MAP) \ + VAR3 (T, N, MAP, v4hi, v2si, di) +#define BUILTIN_VQN(T, N, MAP) \ + VAR3 (T, N, MAP, v8hi, v4si, v2di) +#define BUILTIN_VDW(T, N, MAP) \ + VAR3 (T, N, MAP, v8qi, v4hi, v2si) +#define BUILTIN_VSQN_HSDI(T, N, MAP) \ + VAR6 (T, N, MAP, v8hi, v4si, v2di, hi, si, di) +#define BUILTIN_VQW(T, N, MAP) \ + VAR3 (T, N, MAP, v16qi, v8hi, v4si) +#define BUILTIN_VDC(T, N, MAP) \ + VAR6 (T, N, MAP, v8qi, v4hi, v2si, v2sf, di, df) +#define BUILTIN_VDIC(T, N, MAP) \ + VAR3 (T, N, MAP, v8qi, v4hi, v2si) +#define BUILTIN_VD1(T, N, MAP) \ + VAR5 (T, N, MAP, v8qi, v4hi, v2si, v2sf, v1df) +#define BUILTIN_VDQIF(T, N, MAP) \ + VAR9 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2sf, v4sf, v2df) +#define BUILTIN_VDQQH(T, N, MAP) \ + VAR4 (T, N, MAP,
[PATCH 3/5] Use simplify_subreg_regno in combine.c:subst
combine.c:subst should refuse to substitute a hard register into a subreg if the new subreg would not be simplified to a simple hard register, since the result would have to be reloaded. This is more for optimisation than correctness, since in theory the RA should be able to fix up any unsimplified subregs. gcc/ * combine.c (subst): Use simplify_subreg_regno rather than REG_CANNOT_CHANGE_MODE_P to detect invalid mode changes. Index: gcc/combine.c === --- gcc/combine.c 2014-09-15 10:00:17.545330404 +0100 +++ gcc/combine.c 2014-09-15 10:00:17.545330404 +0100 @@ -5121,15 +5121,13 @@ #define COMBINE_RTX_EQUAL_P(X,Y) \ ) return gen_rtx_CLOBBER (VOIDmode, const0_rtx); -#ifdef CANNOT_CHANGE_MODE_CLASS if (code == SUBREG && REG_P (to) && REGNO (to) < FIRST_PSEUDO_REGISTER - && REG_CANNOT_CHANGE_MODE_P (REGNO (to), - GET_MODE (to), - GET_MODE (x))) + && simplify_subreg_regno (REGNO (to), GET_MODE (to), + SUBREG_BYTE (x), + GET_MODE (x)) < 0) return gen_rtx_CLOBBER (VOIDmode, const0_rtx); -#endif new_rtx = (unique_copy && n_occurrences ? copy_rtx (to) : to); n_occurrences++;
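For readers unfamiliar with the check, a toy model may help. This is a deliberately simplified stand-in, not GCC's simplify_subreg_regno; WORD_BYTES and NUM_HARD_REGS are made-up parameters, and the real function additionally consults the target's register layout and mode-change rules:

```c
#include <assert.h>

#define WORD_BYTES 4     /* hypothetical word size */
#define NUM_HARD_REGS 8  /* hypothetical hard register count */

/* Toy stand-in for simplify_subreg_regno: map a (hard regno, byte
   offset into a multi-word value) pair to the hard register holding
   that word, or -1 when no single hard register does.  */
static int
toy_simplify_subreg_regno (int regno, int inner_bytes, int byte)
{
  if (byte % WORD_BYTES != 0 || byte >= inner_bytes)
    return -1;
  int new_regno = regno + byte / WORD_BYTES;
  return new_regno < NUM_HARD_REGS ? new_regno : -1;
}
```

In these terms, the patched subst abandons the substitution (emits a CLOBBER) exactly when the mapping fails, instead of relying on the coarser REG_CANNOT_CHANGE_MODE_P test.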
Re: [PATCH AArch64]: Add constraint letter for stack_protect_test pattern)
On Wed, Sep 17, 2014 at 03:50:55PM +0100, pins...@gmail.com wrote: On Sep 17, 2014, at 7:43 AM, James Greenhalgh james.greenha...@arm.com wrote: On Wed, Sep 17, 2014 at 09:30:31AM +0100, Richard Earnshaw wrote: =&r is correct for an early-clobbered scratch. R. In that case... How is the attached patch for trunk? I've bootstrapped it on AArch64 with -fstack-protector-strong and -frename-registers in the BOOT_CFLAGS without seeing any issues. There is nothing aarch64 specific about this testcase so I would place them under gcc.dg and add the extra marker which says this testcase requires stack protector. That sounds reasonable to me. Updated as attached, along with Jakub's suggestions. And maybe even use compile instead of just assemble too. Compile is weaker than assemble. Assemble takes you up to an object file, which is as far as we need to go. Thanks, James --- gcc/ 2014-09-18 James Greenhalgh james.greenha...@arm.com * config/aarch64/aarch64.md (stack_protect_test_<mode>): Mark scratch register as an output to placate register renaming. gcc/testsuite/ 2014-09-18 James Greenhalgh james.greenha...@arm.com * gcc.dg/ssp-3.c: New. * gcc.dg/ssp-4.c: Likewise. 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index c60038a9015d614f40f6d9e3fd228ad3e2b247a8..f15a516bb0559c86bea7512f91d60dc179ec9149 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -4031,7 +4031,7 @@ (define_insn stack_protect_test_<mode> (unspec:PTR [(match_operand:PTR 1 memory_operand m) (match_operand:PTR 2 memory_operand m)] UNSPEC_SP_TEST)) - (clobber (match_scratch:PTR 3 &r))] + (clobber (match_scratch:PTR 3 =&r))] ldr\t%w3, %x1\;ldr\t%w0, %x2\;eor\t%w0, %w3, %w0 [(set_attr length 12) diff --git a/gcc/testsuite/gcc.dg/ssp-3.c b/gcc/testsuite/gcc.dg/ssp-3.c new file mode 100644 index 000..98c12da --- /dev/null +++ b/gcc/testsuite/gcc.dg/ssp-3.c @@ -0,0 +1,16 @@ +/* { dg-do assemble } */ +/* { dg-options -fstack-protector-strong -O1 -frename-registers } */ +/* { dg-require-effective-target fstack_protector } */ + +extern int bar (const char *s, int *argc); +extern int baz (const char *s); + +char +foo (const char *s) +{ + int argc; + int ret; + if (!bar (s, &argc)) +ret = baz (s); + return *s; +} diff --git a/gcc/testsuite/gcc.dg/ssp-4.c b/gcc/testsuite/gcc.dg/ssp-4.c new file mode 100644 index 000..402033c --- /dev/null +++ b/gcc/testsuite/gcc.dg/ssp-4.c @@ -0,0 +1,18 @@ +/* { dg-do assemble } */ +/* { dg-options -fstack-protector-strong -O1 -frename-registers } */ +/* { dg-require-effective-target fstack_protector } */ + +typedef unsigned int uint32_t; +struct ctx +{ + uint32_t A; +}; + +void * +buffer_copy (const struct ctx *ctx, void *resbuf) +{ + uint32_t buffer[4]; + buffer[0] = (ctx->A); + __builtin_memcpy (resbuf, buffer, sizeof (buffer)); + return resbuf; +}
[Patch] Teach genrecog/genoutput that scratch registers require write constraint modifiers
Hi, As discussed in https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01334.html The construct (clobber (match_scratch 0 "&r")) is invalid - operand 0 must be marked either write or read/write. Likewise (match_* 0 "&r") is invalid, marking an operand earlyclobber does not remove the need to also mark it write or read/write. This patch adds checking for these two error conditions to the generator programs and documents the restriction. Bootstrapped on x86, ARM and AArch64 with no new issues. Ok? Thanks, James --- 2014-09-17 James Greenhalgh james.greenha...@arm.com * doc/md.texi (Modifiers): Consistently use read/write nomenclature rather than input/output. * genrecog.c (constraints_supported_in_insn_p): New. (validate_pattern): If needed, also check constraints on MATCH_SCRATCH operands. * genoutput.c (validate_insn_alternatives): Catch earlyclobber operands with no '=' or '+' modifier. diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 80e8bd6..435d850 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -1546,18 +1546,18 @@ Here are constraint modifier characters. @table @samp @cindex @samp{=} in constraint @item = -Means that this operand is write-only for this instruction: the previous -value is discarded and replaced by output data. +Means that this operand is written to by this instruction: +the previous value is discarded and replaced by new data. @cindex @samp{+} in constraint @item + Means that this operand is both read and written by the instruction. When the compiler fixes up the operands to satisfy the constraints, -it needs to know which operands are inputs to the instruction and -which are outputs from it. @samp{=} identifies an output; @samp{+} -identifies an operand that is both input and output; all other operands -are assumed to be input only. +it needs to know which operands are read by the instruction and +which are written by it. 
@samp{=} identifies an operand which is only +written; @samp{+} identifies an operand that is both read and written; all +other operands are assumed to only be read. If you specify @samp{=} or @samp{+} in a constraint, you put it in the first character of the constraint string. @@ -1566,9 +1566,9 @@ first character of the constraint string. @cindex earlyclobber operand @item & Means (in a particular alternative) that this operand is an -@dfn{earlyclobber} operand, which is modified before the instruction is +@dfn{earlyclobber} operand, which is written before the instruction is finished using the input operands. Therefore, this operand may not lie -in a register that is used as an input operand or as part of any memory +in a register that is read by the instruction or as part of any memory address. @samp{&} applies only to the alternative in which it is written. In @@ -1576,16 +1576,19 @@ constraints with multiple alternatives, sometimes one alternative requires @samp{&} while others do not. See, for example, the @samp{movdf} insn of the 68000. -An input operand can be tied to an earlyclobber operand if its only -use as an input occurs before the early result is written. Adding -alternatives of this form often allows GCC to produce better code -when only some of the inputs can be affected by the earlyclobber. -See, for example, the @samp{mulsi3} insn of the ARM@. +An operand which is read by the instruction can be tied to an earlyclobber +operand if its only use as an input occurs before the early result is +written. Adding alternatives of this form often allows GCC to produce +better code when only some of the read operands can be affected by the +earlyclobber. See, for example, the @samp{mulsi3} insn of the ARM@. -Furthermore, if the @dfn{earlyclobber} operand is also read/write operand, then -that operand is modified only after it's used. +Furthermore, if the @dfn{earlyclobber} operand is also a read/write +operand, then that operand is written only after it's used. 
-@samp{&} does not obviate the need to write @samp{=} or @samp{+}. +@samp{&} does not obviate the need to write @samp{=} or @samp{+}. As +@dfn{earlyclobber} operands are always written, a read-only +@dfn{earlyclobber} operand is ill-formed and will be rejected by the +compiler. @cindex @samp{%} in constraint @item % @@ -1593,7 +1596,7 @@ Declares the instruction to be commutative for this operand and the following operand. This means that the compiler may interchange the two operands if that is the cheapest way to make all operands fit the constraints. @samp{%} applies to all alternatives and must appear as -the first character in the constraint. Only input operands can use +the first character in the constraint. Only read-only operands can use @samp{%}. @ifset INTERNALS diff --git a/gcc/genoutput.c b/gcc/genoutput.c index 69d5ab0..8094288 100644 --- a/gcc/genoutput.c +++ b/gcc/genoutput.c @@ -769,6 +769,7 @@ validate_insn_alternatives (struct data *d) char c; int
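The rule the generator patch enforces can be sketched in a few lines of C. This is a simplified model of the idea only, not the actual genoutput.c code (real constraints are validated per alternative and per character):

```c
#include <stdbool.h>
#include <string.h>

/* Model of the new check: a constraint string using the earlyclobber
   modifier '&' is only well-formed if the operand is also marked as
   written, i.e. the string starts with '=' (write-only) or '+'
   (read/write).  "&r" alone is rejected; "=&r" is accepted.  */
static bool
earlyclobber_constraint_ok (const char *constraint)
{
  bool earlyclobber = strchr (constraint, '&') != NULL;
  bool written = constraint[0] == '=' || constraint[0] == '+';
  return !earlyclobber || written;
}
```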
Re: Fix ARM ICE for register var asm (pc) (PR target/60606)
It seems to be the change to arm_regno_class relating to PC_REGNUM. I see scal-to-vec1.c failing with just that, or that in combination with the changes to cfgexpand.c+varasm.c. And scal-to-vec1.c is OK on -fPIC if I apply the changes to cfgexpand.c, varasm.c, and arm.c (arm_hard_regno_ok), i.e. all bar the change to arm_regno_class. A change relating to the program counter affecting -fPIC does sound plausible, I haven't looked any further than that... --Alan Joseph S. Myers wrote: On Wed, 17 Sep 2014, Alan Lawrence wrote: We've just noticed this patch causes an ICE in gcc.c-torture/execute/scal-to-vec1.c at -O3 when running with -fPIC on arm-none-linux-gnueabi and arm-none-linux-gnueabihf; test logs: Which part causes the ICE? The arm_hard_regno_mode_ok change relating to modes assigned to CC_REGNUM, the arm_regno_class change relating to PC_REGNUM, or something else? Either of those would indicate something very strange going on in LRA (maybe something else needs to change somewhere as well to stop attempts to use CC_REGNUM or PC_REGNUM inappropriately?).
[PATCH 4/5] Generalise invalid_mode_change_p
This is the main patch for the bug. We should treat a register as invalid for a mode change if simplify_subreg_regno cannot provide a new register number for the result. We should treat a class as invalid for a mode change if all registers in the class are invalid. This is an extension of the old CANNOT_CHANGE_MODE_CLASS-based check (simplify_subreg_regno checks C_C_C_M). I forgot to say that the patch is a prerequisite to removing aarch64's C_C_C_M. There are other prerequisites too, but removing C_C_C_M without this patch caused regressions in the existing testsuite, which is why no new tests are needed. gcc/ * hard-reg-set.h: Include hash-table.h. (target_hard_regs): Add a finalize method and a x_simplifiable_subregs field. * target-globals.c (target_globals::~target_globals): Handle hard_regs->finalize. * rtl.h (subreg_shape): New structure. (shape_of_subreg): New function. (simplifiable_subregs): Declare. * reginfo.c (simplifiable_subreg): New structure. (simplifiable_subregs_hasher): Likewise. (simplifiable_subregs): New function. (invalid_mode_changes): Delete. (valid_mode_changes, valid_mode_changes_obstack): New variables. (record_subregs_of_mode): Remove subregs_of_mode parameter. Record valid mode changes in valid_mode_changes. (find_subregs_of_mode): Remove subregs_of_mode parameter. Update calls to record_subregs_of_mode. (init_subregs_of_mode): Remove invalid_mode_changes and bitmap handling. Initialize new variables. Update call to find_subregs_of_mode. (invalid_mode_change_p): Check new variables instead of invalid_mode_changes. (finish_subregs_of_mode): Finalize new variables instead of invalid_mode_changes. (target_hard_regs::finalize): New function. * ira-costs.c (print_allocno_costs): Call invalid_mode_change_p even when CLASS_CANNOT_CHANGE_MODE is undefined. 
Index: gcc/hard-reg-set.h === --- gcc/hard-reg-set.h 2014-09-15 11:55:40.459855161 +0100 +++ gcc/hard-reg-set.h 2014-09-15 11:55:40.455855210 +0100 @@ -20,6 +20,8 @@ Software Foundation; either version 3, o #ifndef GCC_HARD_REG_SET_H #define GCC_HARD_REG_SET_H +#include "hash-table.h" + /* Define the type of a set of hard registers. */ /* HARD_REG_ELT_TYPE is a typedef of the unsigned integral type which @@ -613,7 +615,11 @@ #define EXECUTE_IF_SET_IN_HARD_REG_SET(S extern char global_regs[FIRST_PSEUDO_REGISTER]; +struct simplifiable_subregs_hasher; + struct target_hard_regs { + void finalize (); + /* The set of registers that actually exist on the current target. */ HARD_REG_SET x_accessible_reg_set; @@ -688,6 +694,10 @@ struct target_hard_regs { /* Vector indexed by hardware reg giving its name. */ const char *x_reg_names[FIRST_PSEUDO_REGISTER]; + + /* Records which registers can form a particular subreg, with the subreg + being identified by its outer mode, inner mode and offset. */ + hash_table <simplifiable_subregs_hasher> *x_simplifiable_subregs; }; extern struct target_hard_regs default_target_hard_regs; Index: gcc/target-globals.c === --- gcc/target-globals.c2014-09-15 11:55:40.459855161 +0100 +++ gcc/target-globals.c2014-09-15 11:55:40.459855161 +0100 @@ -125,6 +125,7 @@ target_globals::~target_globals () /* default_target_globals points to static data so shouldn't be freed. */ if (this != default_target_globals) { + hard_regs->finalize (); XDELETE (flag_state); XDELETE (regs); XDELETE (recog); Index: gcc/rtl.h === --- gcc/rtl.h 2014-09-15 11:55:40.459855161 +0100 +++ gcc/rtl.h 2014-09-15 12:26:21.249077760 +0100 @@ -1822,6 +1822,64 @@ costs_add_n_insns (struct full_rtx_costs c->size += COSTS_N_INSNS (n); } +/* Describes the shape of a subreg: + + inner_mode == the mode of the SUBREG_REG + offset == the SUBREG_BYTE + outer_mode == the mode of the SUBREG itself. 
*/ +struct subreg_shape { + subreg_shape (enum machine_mode, unsigned int, enum machine_mode); + bool operator == (const subreg_shape &) const; + bool operator != (const subreg_shape &) const; + unsigned int unique_id () const; + + enum machine_mode inner_mode; + unsigned int offset; + enum machine_mode outer_mode; +}; + +inline +subreg_shape::subreg_shape (enum machine_mode inner_mode_in, + unsigned int offset_in, + enum machine_mode outer_mode_in) + : inner_mode (inner_mode_in), offset (offset_in), outer_mode (outer_mode_in) +{} + +inline bool +subreg_shape::operator == (const subreg_shape &other) const +{ + return (inner_mode == other.inner_mode + && offset == other.offset
[PATCH 5/5] Remove CANNOT_CHANGE_MODE_CLASS workaround in i386.c
Patch 4 should make it possible to relax i386's CANNOT_CHANGE_MODE_CLASS, solving the missed optimisation that triggered the original thread. gcc/ * config/i386/i386.c (ix86_cannot_change_mode_class): Remove GET_MODE_SIZE (to) < GET_MODE_SIZE (from) test. Index: gcc/config/i386/i386.c === --- gcc/config/i386/i386.c 2014-09-15 09:48:11.310438531 +0100 +++ gcc/config/i386/i386.c 2014-09-15 09:48:11.310438531 +0100 @@ -37526,13 +37526,6 @@ ix86_cannot_change_mode_class (enum mach the vec_dupv4hi pattern. */ if (GET_MODE_SIZE (from) < 4) return true; - - /* Vector registers do not support subreg with nonzero offsets, which -are otherwise valid for integer registers. Since we can't see -whether we have a nonzero offset from here, prohibit all - nonparadoxical subregs changing size. */ - if (GET_MODE_SIZE (to) < GET_MODE_SIZE (from)) - return true; } return false;
[patch] powerpc-vxworksmils port, variant of powerpc-vxworksae
Hello, We have been maintaining a port to VxWorks MILS for powerpc for a while now and thought others might be interested. VxWorksMILS is very close to VxWorksAE, so the patch is pretty small. The main noticeable difference is that only the vthreads environment headers are available, so we arrange to build the libgcc variants all with -mvthreads. We have been using this with a gcc-4.7 based toolchain for a couple of years and moved to gcc-4.9 recently. The 4.9 patch applies as-is on mainline. OK to commit ? Thanks in advance for your feedback, With Kind Regards, Olivier 2014-09-18 Olivier Hainque hain...@adacore.com gcc/ * config.gcc (powerpc-wrs-vxworksmils): New configuration. * config/rs6000/t-vxworksmils: New file. * config/rs6000/vxworksmils.h: New file. libgcc/ * config.host (powerpc-wrs-vxworksmils): New configuration, same as vxworksae. contrib/ * config-list.mk (LIST): Add powerpc-wrs-vxworksmils.
Re: [PATCH][PING] Enable -fsanitize-recover for KASan
On Mon, Sep 15, 2014 at 01:38:42PM +0400, Yury Gribov wrote: --- a/gcc/builtins.def +++ b/gcc/builtins.def @@ -176,7 +176,7 @@ along with GCC; see the file COPYING3. If not see DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,\ true, true, true, ATTRS, true, \ (flag_sanitize & (SANITIZE_ADDRESS | SANITIZE_THREAD \ - | SANITIZE_UNDEFINED | SANITIZE_NONDEFAULT))) + | SANITIZE_UNDEFINED | SANITIZE_UNDEFINED_NONDEFAULT))) This is too long line after the change. --- a/gcc/gcc.c +++ b/gcc/gcc.c @@ -8236,7 +8236,7 @@ sanitize_spec_function (int argc, const char **argv) if (strcmp (argv[0], "thread") == 0) return (flag_sanitize & SANITIZE_THREAD) ? "" : NULL; if (strcmp (argv[0], "undefined") == 0) -return ((flag_sanitize & (SANITIZE_UNDEFINED | SANITIZE_NONDEFAULT)) +return ((flag_sanitize & (SANITIZE_UNDEFINED | SANITIZE_UNDEFINED_NONDEFAULT)) Likewise. --- a/gcc/opts.c +++ b/gcc/opts.c @@ -1551,6 +1551,12 @@ common_handle_option (struct gcc_options *opts, | SANITIZE_RETURNS_NONNULL_ATTRIBUTE)) opts->x_flag_delete_null_pointer_checks = 0; + /* UBSan and KASan enable recovery by default. */ + opts->x_flag_sanitize_recover + = !!(flag_sanitize & (SANITIZE_UNDEFINED + | SANITIZE_UNDEFINED_NONDEFAULT + | SANITIZE_KERNEL_ADDRESS)); + Doesn't this override even user supplied -fsanitize-recover or -fno-sanitize-recover ? Have you tried both -fno-sanitize-recover -fsanitize=kernel-address and -fsanitize=kernel-address -fno-sanitize-recover option orders? Seems for -fdelete-null-pointer-checks we got it wrong too, IMHO for -fsanitize={null,{,returns-}nonnull-attribute,undefined} we want to disable it unconditionally, regardless of whether that option appears on the command line or not. And we handle it right for -fdelete-null-pointer-checks -fsanitize=undefined but not for -fsanitize=undefined -fdelete-null-pointer-checks Joseph, thoughts where to override it instead (I mean, after all options are processed)? 
In the -fsanitize-recover case, I'd on the other side think that it should just override the default and not override explicit user's decision. Which could be done here, but supposedly guarded with if (!opts_set->x_flag_sanitize_recover)? I don't think your proposal will work properly though, if one compiles with -fsanitize=undefined -fsanitize=address you'll just get userland asan with error recovery, which is highly undesirable (not just that it changes the behavior from how it behaved before, but especially because libasan doesn't contain such entrypoints at all). -fsanitize=undefined,address or -fsanitize=address,undefined is normal supported mode and thus I think you either can't reuse -fsanitize-recover option for what you want to do, or asan.c needs to limit it to flag_sanitize & SANITIZE_KERNEL_ADDRESS mode only. Depends if you ever want to add recovery for userland sanitization. Jakub
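The opts_set guard suggested here can be sketched abstractly. This is only a schematic of GCC's opts/opts_set convention, not the actual opts.c code, and the field and function names are invented for the example: the default derived from -fsanitize= is applied only when the user gave no explicit -f[no-]sanitize-recover.

```c
#include <stdbool.h>

struct toy_opts
{
  bool flag_sanitize_recover;
};

/* An opts_set structure mirrors opts and records which options the
   user set explicitly on the command line.  The derived default only
   applies when the option was not set explicitly, so an explicit
   -fno-sanitize-recover survives in either argument order.  */
static void
set_sanitize_recover_default (struct toy_opts *opts,
                              const struct toy_opts *opts_set,
                              bool sanitizer_wants_recovery)
{
  if (!opts_set->flag_sanitize_recover)
    opts->flag_sanitize_recover = sanitizer_wants_recovery;
}
```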
Re: [kyukhin/gomp4-offload] DESTDIR issues
Ok, the approach with additional --enable-offload-targets arguments seems to be more appropriate, so I will fix offloading infrastructure patch #1. Thanks, -- Ilya
Re: [PATCH][Kasan][PING] Allow to override Asan shadow offset from command line
On Mon, Sep 15, 2014 at 01:46:14PM +0400, Yury Gribov wrote: On 09/08/2014 06:29 PM, Yury Gribov wrote: Kasan developers has asked for an option to override offset of Asan shadow memory region. This should simplify experimenting with memory layouts on 64-bit architectures. I've bootstrapped and regtested this on x64. Ok to commit? I don't like it at all. For the kernel-address perhaps it might make sense as a param, but for userland, as it is an ABI changing option, I'm afraid people would start to create objects/shared libraries/binaries with ABI incompatible values. So, if you need it for kernel, use a param that can be eventually dropped, and limit it to kernel-address mode only. Jakub
Re: [PATCH][Kasan][PING] Allow to override Asan shadow offset from command line
On 09/18/2014 03:01 PM, Jakub Jelinek wrote: On Mon, Sep 15, 2014 at 01:46:14PM +0400, Yury Gribov wrote: On 09/08/2014 06:29 PM, Yury Gribov wrote: Kasan developers has asked for an option to override offset of Asan shadow memory region. This should simplify experimenting with memory layouts on 64-bit architectures. I've bootstrapped and regtested this on x64. Ok to commit? I don't like it at all. For the kernel-address perhaps it might make sense as a param, but for userland, as it is an ABI changing option, I'm afraid people would start to create objects/shared libraries/binaries with ABI incompatible values. So, if you need it for kernel, use a param that can be eventually dropped, and limit it to kernel-address mode only. Problem with params is that they are ints so won't work for 64-bit platforms. How about aborting if -fasan-shadow-offset is supplied without -fsanitize=kernel-address? -Y
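For context, the value in question is the constant in ASan's shadow mapping, shadow = (addr >> 3) + offset. A minimal sketch follows; the 0xdffffc0000000000 figure used in the test is the shadow offset of some x86-64 KASan kernel configurations and is given only as an illustration of a value that cannot fit in an int-valued --param:

```c
#include <stdint.h>

/* ASan maps each 8-byte granule of application memory to one shadow
   byte located at (addr >> 3) + shadow_offset.  A 64-bit offset is
   therefore part of the instrumentation ABI.  */
static uint64_t
asan_shadow_addr (uint64_t addr, uint64_t shadow_offset)
{
  return (addr >> 3) + shadow_offset;
}
```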
[PATCH 0/14+2][Vectorizer] Made reductions endianness-neutral, fixes PR/61114
The end goal here is to remove this code from tree-vect-loop.c (vect_create_epilog_for_reduction): if (BYTES_BIG_ENDIAN) bitpos = size_binop (MULT_EXPR, bitsize_int (TYPE_VECTOR_SUBPARTS (vectype) - 1), TYPE_SIZE (scalar_type)); else as this is the root cause of PR/61114 (see testcase there, failing on all bigendian targets supporting reduc_[us]plus_optab). Quoting Richard Biener, "all code conditional on BYTES/WORDS_BIG_ENDIAN in tree-vect* is suspicious". The code snippet above is used on two paths: (Path 1) (patches 1-6) Reductions using REDUC_(PLUS|MIN|MAX)_EXPR = reduc_[us](plus|min|max)_optab. The optab is documented as "the scalar result is stored in the least significant bits of operand 0", but the tree code as "the first element in the vector holding the result of the reduction of all elements of the operand". This mismatch means that when the tree code is folded, the code snippet above reads the result from the wrong end of the vector. The strategy (as per https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00041.html) is to define new tree codes and optabs that produce scalar results directly; this seems better than tying (the element of the vector into which the result is placed) to (the endianness of the target), and avoids generating extra moves on current bigendian targets. However, the previous optabs are retained for now as a migration strategy so as not to break existing backends; moving individual platforms over will follow. A complication here is on AArch64, where we directly generate REDUC_PLUS_EXPRs from intrinsics in gimple_fold_builtin; I temporarily remove this folding in order to decouple the midend and AArch64 backend. (Path 2) (patches 7-13) Reductions using whole-vector-shifts, i.e. VEC_RSHIFT_EXPR and vec_shr_optab. Here the tree code as well as the optab is defined in an endianness-dependent way, leading to significant complication in fold-const.c. (Moreover, the equivalent vec_shl_optab is never used!). 
Few platforms appear to handle vec_shr_optab (and fewer bigendian - I see only PowerPC and MIPS), so it seems pertinent to change the existing optab to be endianness-neutral. Patch 10 defines vec_shr for AArch64, for the old specification; patch 13 updates that implementation to fit the new endianness-neutral specification, serving as a guide for other existing backends. Patches/RFCs 15 and 16 are equivalents for MIPS and PowerPC; I haven't tested these but hope they act as useful pointers for the port maintainers. Finally patch 14 cleans up the affected part of tree-vect-loop.c (vect_create_epilog_for_reduction). --Alan
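The lane ambiguity driving Path 1 can be modelled in plain C. This is a sketch of the two conventions only, not vectorizer code, and NLANES is an arbitrary illustrative width: under the old optab contract the scalar lands in lane 0 on little-endian targets but lane N-1 on big-endian ones, so target-independent code that always reads one fixed lane is wrong on one endianness; the new tree codes yield a scalar directly and the lane choice disappears.

```c
#define NLANES 4

/* Old convention: the reduc_* optab leaves the scalar result in an
   endianness-dependent lane of its vector output.  */
static int
reduction_result_lane (int big_endian)
{
  return big_endian ? NLANES - 1 : 0;
}

/* New convention: the reduction produces the scalar directly, so
   target-independent code never has to pick a lane.  */
static int
reduc_plus_scalar (const int *lanes)
{
  int sum = 0;
  for (int i = 0; i < NLANES; i++)
    sum += lanes[i];
  return sum;
}
```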
[PATCH 1/14][AArch64] Temporarily remove aarch64_gimple_fold_builtin code for reduction operations
The gimple folding ties the AArch64 backend to the tree representation of the midend via the neon intrinsics. This code enables constant folding of Neon intrinsics reduction ops, so improves performance, but is not necessary for correctness. By temporarily removing it (here), we can then change the midend representation independently of the AArch64 backend + intrinsics. However, I'm leaving the code in place, as a later patch will bring it all back in a very similar form (but enabled for bigendian). Bootstrapped on aarch64-none-linux; tested aarch64.exp on aarch64-none-elf and aarch64_be-none-elf. (The removed code was already disabled for bigendian; and this is solely a __builtin-folding mechanism, i.e. used only for Neon/ACLE intrinsics.) gcc/ChangeLog: * config/aarch64/aarch64.c (TARGET_GIMPLE_FOLD_BUILTIN): Comment out. * config/aarch64/aarch64-builtins.c (aarch64_gimple_fold_builtin): Remove using preprocessor directives.diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index 5217f4a5f39224dbf8029542ad33790ef2c191be..15eb7c686d95b1d66cbd514500ec29ba074eaa3f 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -1333,6 +1333,9 @@ aarch64_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED, tree *args, return NULL_TREE; } +/* Handling of reduction operations temporarily removed so as to decouple + changes to tree codes from AArch64 NEON Intrinsics. 
*/ +#if 0 bool aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi) { @@ -1404,6 +1407,7 @@ aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi) return changed; } +#endif void aarch64_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index e7946fc0b70ced70a4e98caa0a33121f29242aad..9197ec038b7d40a601c886b846113c50a29cf5e2 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -9925,8 +9925,8 @@ aarch64_expand_movmem (rtx *operands) #undef TARGET_FRAME_POINTER_REQUIRED #define TARGET_FRAME_POINTER_REQUIRED aarch64_frame_pointer_required -#undef TARGET_GIMPLE_FOLD_BUILTIN -#define TARGET_GIMPLE_FOLD_BUILTIN aarch64_gimple_fold_builtin +//#undef TARGET_GIMPLE_FOLD_BUILTIN +//#define TARGET_GIMPLE_FOLD_BUILTIN aarch64_gimple_fold_builtin #undef TARGET_GIMPLIFY_VA_ARG_EXPR #define TARGET_GIMPLIFY_VA_ARG_EXPR aarch64_gimplify_va_arg_expr
[PATCH i386 AVX512] [42/n] Add masked vunpck[lh]pd.
Hello, Patch in the bottom extends/adds patterns for masked unpack instructions. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_insn avx_unpckhpd256<mask_name>): Add masking. (define_insn avx512vl_unpckhpd128_mask): New. (define_expand avx_movddup256<mask_name>): Add masking. (define_expand avx_unpcklpd256<mask_name>): Ditto. (define_insn *avx_unpcklpd256<mask_name>): Ditto. (define_insn avx512vl_unpcklpd128_mask): New. -- Thanks, K diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 533308b..ab2d3b1 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -7081,16 +7081,16 @@ (set_attr mode V8DF)]) ;; Recall that the 256-bit unpck insns only shuffle within their lanes. -(define_insn avx_unpckhpd256 - [(set (match_operand:V4DF 0 register_operand =x) +(define_insn avx_unpckhpd256<mask_name> + [(set (match_operand:V4DF 0 register_operand =v) (vec_select:V4DF (vec_concat:V8DF - (match_operand:V4DF 1 register_operand x) - (match_operand:V4DF 2 nonimmediate_operand xm)) + (match_operand:V4DF 1 register_operand v) + (match_operand:V4DF 2 nonimmediate_operand vm)) (parallel [(const_int 1) (const_int 5) (const_int 3) (const_int 7)])))] - TARGET_AVX - vunpckhpd\t{%2, %1, %0|%0, %1, %2} + TARGET_AVX && <mask_avx512vl_condition> + vunpckhpd\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2} [(set_attr type sselog) (set_attr prefix vex) (set_attr mode V4DF)]) @@ -7124,6 +7124,22 @@ }) +(define_insn avx512vl_unpckhpd128_mask + [(set (match_operand:V2DF 0 register_operand =v) + (vec_merge:V2DF + (vec_select:V2DF + (vec_concat:V4DF + (match_operand:V2DF 1 register_operand v) + (match_operand:V2DF 2 nonimmediate_operand vm)) + (parallel [(const_int 1) (const_int 3)])) + (match_operand:V2DF 3 vector_move_operand 0C) + (match_operand:QI 4 register_operand Yk)))] + TARGET_AVX512VL + vunpckhpd\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2} + [(set_attr type sselog) + (set_attr prefix evex) + 
(set_attr mode V2DF)]) + (define_expand vec_interleave_highv2df [(set (match_operand:V2DF 0 register_operand) (vec_select:V2DF @@ -7204,7 +7220,7 @@ (set_attr mode V8DF)]) ;; Recall that the 256-bit unpck insns only shuffle within their lanes. -(define_expand avx_movddup256 +(define_expand avx_movddup256<mask_name> [(set (match_operand:V4DF 0 register_operand) (vec_select:V4DF (vec_concat:V8DF @@ -7212,9 +7228,9 @@ (match_dup 1)) (parallel [(const_int 0) (const_int 4) (const_int 2) (const_int 6)])))] - TARGET_AVX) + TARGET_AVX && <mask_avx512vl_condition>) -(define_expand avx_unpcklpd256 +(define_expand avx_unpcklpd256<mask_name> [(set (match_operand:V4DF 0 register_operand) (vec_select:V4DF (vec_concat:V8DF @@ -7222,20 +7238,20 @@ (match_operand:V4DF 2 nonimmediate_operand)) (parallel [(const_int 0) (const_int 4) (const_int 2) (const_int 6)])))] - TARGET_AVX) + TARGET_AVX && <mask_avx512vl_condition>) -(define_insn *avx_unpcklpd256 - [(set (match_operand:V4DF 0 register_operand =x,x) +(define_insn *avx_unpcklpd256<mask_name> + [(set (match_operand:V4DF 0 register_operand =v,v) (vec_select:V4DF (vec_concat:V8DF - (match_operand:V4DF 1 nonimmediate_operand x,m) - (match_operand:V4DF 2 nonimmediate_operand xm,1)) + (match_operand:V4DF 1 nonimmediate_operand v,m) + (match_operand:V4DF 2 nonimmediate_operand vm,1)) (parallel [(const_int 0) (const_int 4) (const_int 2) (const_int 6)])))] - TARGET_AVX + TARGET_AVX && <mask_avx512vl_condition> @ - vunpcklpd\t{%2, %1, %0|%0, %1, %2} - vmovddup\t{%1, %0|%0, %1} + vunpcklpd\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2} + vmovddup\t{%1, %0<mask_operand3>|%0<mask_operand3>, %1} [(set_attr type sselog) (set_attr prefix vex) (set_attr mode V4DF)]) @@ -7268,6 +7284,22 @@ operands[4] = gen_reg_rtx (V4DFmode); }) +(define_insn avx512vl_unpcklpd128_mask + [(set (match_operand:V2DF 0 register_operand =v) + (vec_merge:V2DF + (vec_select:V2DF + (vec_concat:V4DF + (match_operand:V2DF 1 register_operand v) + (match_operand:V2DF 2 nonimmediate_operand vm)) + 
(parallel [(const_int 0) (const_int 2)])) + (match_operand:V2DF 3 vector_move_operand 0C) + (match_operand:QI 4 register_operand Yk)))] + TARGET_AVX512VL + vunpcklpd\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2} + [(set_attr type sselog) + (set_attr prefix evex) + (set_attr mode V2DF)]) + (define_expand vec_interleave_lowv2df [(set (match_operand:V2DF 0
[PATCH 2/14][Vectorizer] Make REDUC_xxx_EXPR tree codes produce a scalar result
This fixes PR/61114 by redefining the REDUC_{MIN,MAX,PLUS}_EXPR tree codes. These are presently documented as producing a vector with the result in element 0, and this is inconsistent with their use in tree-vect-loop.c (which on bigendian targets pulls the bits out of the wrong end of the vector result). This leads to bugs on bigendian targets - see also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114. I discounted fixing the vectorizer (to read from element 0) and then making bigendian targets (whose architectural insn produces the result in lane N-1) permute the result vector, as optimization of vectors in RTL seems unlikely to remove such a permute and would lead to a performance regression. Instead it seems more natural for the tree code to produce a scalar result (producing a vector with the result in lane 0 has already caused confusion, e.g. https://gcc.gnu.org/ml/gcc-patches/2012-10/msg01100.html). However, this patch preserves the meaning of the optab (producing a result in lane 0 on little-endian architectures or N-1 on bigendian), thus generally avoiding the need to change backends. Thus, expr.c extracts an endianness-dependent element from the optab result to give the result expected for the tree code. Previously posted as an RFC https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00041.html , now with an extra VIEW_CONVERT_EXPR if the types of the reduction/result do not match.

Testing:
x86_64-none-linux-gnu: bootstrap, check-gcc, check-g++
aarch64-none-linux-gnu: bootstrap
aarch64-none-elf: check-gcc, check-g++
arm-none-eabi: check-gcc
aarch64_be-none-elf: check-gcc, showing
FAIL->PASS: gcc.dg/vect/no-scevccp-outer-7.c execution test
FAIL->PASS: gcc.dg/vect/no-scevccp-outer-13.c execution test

Passes the (previously-failing) reduced testcase on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114. I have also assembler/stage-1 tested that testcase on PowerPC, which is also fixed.
gcc/ChangeLog: * expr.c (expand_expr_real_2): For REDUC_{MIN,MAX,PLUS}_EXPR, add extract_bit_field around optab result. * fold-const.c (fold_unary_loc): For REDUC_{MIN,MAX,PLUS}_EXPR, produce scalar not vector. * tree-cfg.c (verify_gimple_assign_unary): Check result vs operand type for REDUC_{MIN,MAX,PLUS}_EXPR. * tree-vect-loop.c (vect_analyze_loop): Update comment. (vect_create_epilog_for_reduction): For direct vector reduction, use result of tree code directly without extract_bit_field. * tree.def (REDUC_MAX_EXPR, REDUC_MIN_EXPR, REDUC_PLUS_EXPR): Update comment. diff --git a/gcc/expr.c b/gcc/expr.c index 58b87ba7ed7eee156b9730b61679af946694e8df..a293c06489f09586ed56dff1381467401687be45 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -9020,7 +9020,17 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode, { op0 = expand_normal (treeop0); this_optab = optab_for_tree_code (code, type, optab_default); -temp = expand_unop (mode, this_optab, op0, target, unsignedp); +enum machine_mode vec_mode = TYPE_MODE (TREE_TYPE (treeop0)); +temp = expand_unop (vec_mode, this_optab, op0, NULL_RTX, unsignedp); +gcc_assert (temp); +/* The tree code produces a scalar result, but (somewhat by convention) + the optab produces a vector with the result in element 0 if + little-endian, or element N-1 if big-endian. So pull the scalar + result out of that element. */ +int index = BYTES_BIG_ENDIAN ? 
GET_MODE_NUNITS (vec_mode) - 1 : 0; +int bitsize = GET_MODE_BITSIZE (GET_MODE_INNER (vec_mode)); +temp = extract_bit_field (temp, bitsize, bitsize * index, unsignedp, + target, mode, mode); gcc_assert (temp); return temp; } diff --git a/gcc/fold-const.c b/gcc/fold-const.c index d44476972158b125aecd8c4a5c8d6176ad3b0e5c..b8baa94d37a74ebb824e2a4d03f2a10befcdf749 100644 --- a/gcc/fold-const.c +++ b/gcc/fold-const.c @@ -8475,12 +8475,13 @@ fold_unary_loc (location_t loc, enum tree_code code, tree type, tree op0) case REDUC_MAX_EXPR: case REDUC_PLUS_EXPR: { - unsigned int nelts = TYPE_VECTOR_SUBPARTS (type), i; + unsigned int nelts, i; tree *elts; enum tree_code subcode; if (TREE_CODE (op0) != VECTOR_CST) return NULL_TREE; +nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (op0)); elts = XALLOCAVEC (tree, nelts); if (!vec_cst_ctor_to_array (op0, elts)) @@ -8499,10 +8500,9 @@ fold_unary_loc (location_t loc, enum tree_code code, tree type, tree op0) elts[0] = const_binop (subcode, elts[0], elts[i]); if (elts[0] == NULL_TREE || !CONSTANT_CLASS_P (elts[0])) return NULL_TREE; - elts[i] = build_zero_cst (TREE_TYPE (type)); } - return build_vector (type, elts); + return elts[0]; } default: diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
Re: [PATCH] RTEMS: Update contrib/config-list.mk
On Wed, 2014-09-17 10:52:34 -0500, Joel Sherrill joel.sherr...@oarcorp.com wrote: On 9/17/2014 10:41 AM, Sebastian Huber wrote: On 09/17/2014 04:45 PM, Jan-Benedict Glaw wrote: On Wed, 2014-09-17 15:37:32 +0200, Sebastian Huber sebastian.hu...@embedded-brains.de wrote: contrib/ChangeLog 2014-09-17 Sebastian Huber sebastian.hu...@embedded-brains.de * config-list.mk (LIST): Add arm-rtems. Add nios2-rtems. Remove extra option from powerpc-rtems. What's the rationale for removing --enable-threads=yes here, as well as the specific version number? [...] And is this the input to your buildbot? :) Yes, the target list in contrib/config-list.mk is what'll be built using the config-list.mk-building backend. (The robot has another backend using a different build strategy, which has a separate target list, though one could argue that I'd also include all the config-list.mk targets in that other list as well.) And to tell the whole story, Sebastian approached me with extending the target lists in use by those targets he sent a patch for; I just asked him to go this route, because I guess that'd be beneficial for other folks as well. MfG, JBG -- Jan-Benedict Glaw jbg...@lug-owl.de +49-172-7608481 Signature of: The course of history shows that as a government grows, liberty the second : decreases. (Thomas Jefferson)
[PATCH i386 AVX512] [43/n] Add rest of vunpck[lh]ps.
Hello, This patch adds rest of unpack insn patterns. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_insn avx_unpckhps256mask_name): Add masking. (define_insn vec_interleave_highv4sfmask_name): Ditto. (define_insn avx_unpcklps256mask_name): Ditto. (define_insn unpcklps128_mask): New. -- Thanks, K diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index ab2d3b1..295f11a 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -5525,18 +5525,18 @@ (set_attr mode V16SF)]) ;; Recall that the 256-bit unpck insns only shuffle within their lanes. -(define_insn avx_unpckhps256 - [(set (match_operand:V8SF 0 register_operand =x) +(define_insn avx_unpckhps256mask_name + [(set (match_operand:V8SF 0 register_operand =v) (vec_select:V8SF (vec_concat:V16SF - (match_operand:V8SF 1 register_operand x) - (match_operand:V8SF 2 nonimmediate_operand xm)) + (match_operand:V8SF 1 register_operand v) + (match_operand:V8SF 2 nonimmediate_operand vm)) (parallel [(const_int 2) (const_int 10) (const_int 3) (const_int 11) (const_int 6) (const_int 14) (const_int 7) (const_int 15)])))] - TARGET_AVX - vunpckhps\t{%2, %1, %0|%0, %1, %2} + TARGET_AVX mask_avx512vl_condition + vunpckhps\t{%2, %1, %0mask_operand3|%0mask_operand3, %1, %2} [(set_attr type sselog) (set_attr prefix vex) (set_attr mode V8SF)]) @@ -5575,18 +5575,18 @@ operands[4] = gen_reg_rtx (V8SFmode); }) -(define_insn vec_interleave_highv4sf - [(set (match_operand:V4SF 0 register_operand =x,x) +(define_insn vec_interleave_highv4sfmask_name + [(set (match_operand:V4SF 0 register_operand =x,v) (vec_select:V4SF (vec_concat:V8SF - (match_operand:V4SF 1 register_operand 0,x) - (match_operand:V4SF 2 nonimmediate_operand xm,xm)) + (match_operand:V4SF 1 register_operand 0,v) + (match_operand:V4SF 2 nonimmediate_operand xm,vm)) (parallel [(const_int 2) (const_int 6) (const_int 3) (const_int 7)])))] - TARGET_SSE + TARGET_SSE 
mask_avx512vl_condition @ unpckhps\t{%2, %0|%0, %2} - vunpckhps\t{%2, %1, %0|%0, %1, %2} + vunpckhps\t{%2, %1, %0mask_operand3|%0mask_operand3, %1, %2} [(set_attr isa noavx,avx) (set_attr type sselog) (set_attr prefix orig,vex) @@ -5613,22 +5613,39 @@ (set_attr mode V16SF)]) ;; Recall that the 256-bit unpck insns only shuffle within their lanes. -(define_insn avx_unpcklps256 - [(set (match_operand:V8SF 0 register_operand =x) +(define_insn avx_unpcklps256mask_name + [(set (match_operand:V8SF 0 register_operand =v) (vec_select:V8SF (vec_concat:V16SF - (match_operand:V8SF 1 register_operand x) - (match_operand:V8SF 2 nonimmediate_operand xm)) + (match_operand:V8SF 1 register_operand v) + (match_operand:V8SF 2 nonimmediate_operand vm)) (parallel [(const_int 0) (const_int 8) (const_int 1) (const_int 9) (const_int 4) (const_int 12) (const_int 5) (const_int 13)])))] - TARGET_AVX - vunpcklps\t{%2, %1, %0|%0, %1, %2} + TARGET_AVX mask_avx512vl_condition + vunpcklps\t{%2, %1, %0mask_operand3|%0mask_operand3, %1, %2} [(set_attr type sselog) (set_attr prefix vex) (set_attr mode V8SF)]) +(define_insn unpcklps128_mask + [(set (match_operand:V4SF 0 register_operand =v) + (vec_merge:V4SF + (vec_select:V4SF + (vec_concat:V8SF + (match_operand:V4SF 1 register_operand v) + (match_operand:V4SF 2 nonimmediate_operand vm)) + (parallel [(const_int 0) (const_int 4) + (const_int 1) (const_int 5)])) + (match_operand:V4SF 3 vector_move_operand 0C) + (match_operand:QI 4 register_operand Yk)))] + TARGET_AVX512VL + vunpcklps\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2} + [(set_attr type sselog) + (set_attr prefix evex) + (set_attr mode V4SF)]) + (define_expand vec_interleave_lowv8sf [(set (match_dup 3) (vec_select:V8SF
[PATCH 3/14] Add new optabs for reducing vectors to scalars
These match their corresponding tree codes, by taking a vector and returning a scalar; this is more architecturally neutral than the (somewhat loosely defined) previous optab that took a vector and returned a vector with the result in the least significant bits (i.e. element 0 for little-endian or N-1 for bigendian). However, the old optabs are preserved so as not to break existing backends, so clients check for both old + new optabs. Bootstrap, check-gcc and check-g++ on x86_64-none-linux-gnu. aarch64.exp + vect.exp on aarch64{,_be}-none-elf. (of course at this point in the series all these are using the old optab + migration path.) gcc/ChangeLog: * doc/md.texi (Standard Names): Add reduc_(plus,[us](min|max))|scal optabs, and note in reduc_[us](plus|min|max) to prefer the former. * expr.c (expand_expr_real_2): Use reduc_..._scal if available, fall back to old reduc_... + BIT_FIELD_REF only if not. * optabs.c (optab_for_tree_code): for REDUC_(MAX,MIN,PLUS)_EXPR, return the reduce-to-scalar (reduc_..._scal) optab. (scalar_reduc_to_vector): New. * optabs.def (reduc_smax_scal_optab, reduc_smin_scal_optab, reduc_plus_scal_optab, reduc_umax_scal_optab, reduc_umin_scal_optab): New. * optabs.h (scalar_reduc_to_vector): Declare. * tree-vect-loop.c (vectorizable_reduction): Look for optabs reducing to either scalar or vector.diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index dd7861188afb8afd01971f9f75f0e32da9f9f826..3f5fd6f0e3ac3fcc30f6c961e3e2709a35f4d413 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -4811,29 +4811,48 @@ it is unspecified which of the two operands is returned as the result. @cindex @code{reduc_smax_@var{m}} instruction pattern @item @samp{reduc_smin_@var{m}}, @samp{reduc_smax_@var{m}} Find the signed minimum/maximum of the elements of a vector. The vector is -operand 1, and the scalar result is stored in the least significant bits of +operand 1, and the result is stored in the least significant bits of operand 0 (also a vector). 
The output and input vector should have the same -modes. +modes. These are legacy optabs, and platforms should prefer to implement +@samp{reduc_smin_scal_@var{m}} and @samp{reduc_smax_scal_@var{m}}. @cindex @code{reduc_umin_@var{m}} instruction pattern @cindex @code{reduc_umax_@var{m}} instruction pattern @item @samp{reduc_umin_@var{m}}, @samp{reduc_umax_@var{m}} Find the unsigned minimum/maximum of the elements of a vector. The vector is -operand 1, and the scalar result is stored in the least significant bits of +operand 1, and the result is stored in the least significant bits of operand 0 (also a vector). The output and input vector should have the same -modes. +modes. These are legacy optabs, and platforms should prefer to implement +@samp{reduc_umin_scal_@var{m}} and @samp{reduc_umax_scal_@var{m}}. @cindex @code{reduc_splus_@var{m}} instruction pattern -@item @samp{reduc_splus_@var{m}} -Compute the sum of the signed elements of a vector. The vector is operand 1, -and the scalar result is stored in the least significant bits of operand 0 -(also a vector). The output and input vector should have the same modes. - @cindex @code{reduc_uplus_@var{m}} instruction pattern -@item @samp{reduc_uplus_@var{m}} -Compute the sum of the unsigned elements of a vector. The vector is operand 1, -and the scalar result is stored in the least significant bits of operand 0 +@item @samp{reduc_splus_@var{m}}, @samp{reduc_uplus_@var{m}} +Compute the sum of the signed/unsigned elements of a vector. The vector is +operand 1, and the result is stored in the least significant bits of operand 0 (also a vector). The output and input vector should have the same modes. +These are legacy optabs, and platforms should prefer to implement +@samp{reduc_plus_scal_@var{m}}. 
+ +@cindex @code{reduc_smin_scal_@var{m}} instruction pattern +@cindex @code{reduc_smax_scal_@var{m}} instruction pattern +@item @samp{reduc_smin_scal_@var{m}}, @samp{reduc_smax_scal_@var{m}} +Find the signed minimum/maximum of the elements of a vector. The vector is +operand 1, and operand 0 is the scalar result, with mode equal to the mode of +the elements of the input vector. + +@cindex @code{reduc_umin_scal_@var{m}} instruction pattern +@cindex @code{reduc_umax_scal_@var{m}} instruction pattern +@item @samp{reduc_umin_scal_@var{m}}, @samp{reduc_umax_scal_@var{m}} +Find the unsigned minimum/maximum of the elements of a vector. The vector is +operand 1, and operand 0 is the scalar result, with mode equal to the mode of +the elements of the input vector. + +@cindex @code{reduc_plus_scal_@var{m}} instruction pattern +@item @samp{reduc_plus_scal_@var{m}} +Compute the sum of the elements of a vector. The vector is operand 1, and +operand 0 is the scalar result, with mode equal to the mode of the elements of +the input vector. @cindex @code{sdot_prod@var{m}} instruction pattern @item
[PATCH 4/14][AArch64] Use new reduc_plus_scal optabs, inc. for __builtins
This migrates AArch64 over to the new optab for 'plus' reductions, i.e. so the define_expands produce scalars by generating a MOV to a GPR. Effectively, this moves the vget_lane inside every arm_neon.h intrinsic, into the inside of the define_expand. Tested: aarch64.exp vect.exp on aarch64-none-elf and aarch64_be-none-elf (full check-gcc on next patch for reduc_min/max) gcc/ChangeLog: * config/aarch64/aarch64-simd-builtins.def (reduc_splus_mode/VDQF, reduc_uplus_mode/VDQF, reduc_splus_v4sf): Remove. (reduc_plus_scal_mode, reduc_plus_scal_v4sf): New. * config/aarch64/aarch64-simd.md (reduc_surplus_mode): Remove. (reduc_splus_mode, reduc_uplus_mode, reduc_plus_scal_mode): New. (reduc_surplus_mode): Change SUADDV - UNSPEC_ADDV, rename to... (aarch64_reduc_plus_internalmode): ...this. (reduc_surplus_v2si): Change SUADDV - UNSPEC_ADDV, rename to... (aarch64_reduc_plus_internalv2si): ...this. (reduc_splus_mode/V2F): Rename to... (aarch64_reduc_plus_internalmode): ...this. * config/aarch64/iterators.md (UNSPEC_SADDV, UNSPEC_UADDV, SUADDV): Remove. (UNSPEC_ADDV): New. (sur): Remove elements for UNSPEC_SADDV and UNSPEC_UADDV. * config/aarch64/arm_neon.h (vaddv_s8, vaddv_s16, vaddv_s32, vaddv_u8, vaddv_u16, vaddv_u32, vaddvq_s8, vaddvq_s16, vaddvq_s32, vaddvq_s64, vaddvq_u8, vaddvq_u16, vaddvq_u32, vaddvq_u64, vaddv_f32, vaddvq_f32, vaddvq_f64): Change __builtin_aarch64_reduc_[us]plus_... to __builtin_aarch64_reduc_plus_scal, remove vget_lane wrapper.diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index 4f3bd12c8447e7125dfeba3f06536cdf9acc2440..ae4ab42e3e3df7de4e4b2c5e46a1476a2ed64175 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -248,9 +248,8 @@ BUILTIN_VSDQ_I_DI (BINOP, cmgtu, 0) BUILTIN_VSDQ_I_DI (BINOP, cmtst, 0) - /* Implemented by reduc_surplus_mode. 
*/ - BUILTIN_VALL (UNOP, reduc_splus_, 10) - BUILTIN_VDQ (UNOP, reduc_uplus_, 10) + /* Implemented by aarch64_reduc_plus_mode. */ + BUILTIN_VALL (UNOP, reduc_plus_scal_, 10) /* Implemented by reduc_maxmin_uns_mode. */ BUILTIN_VDQIF (UNOP, reduc_smax_, 10) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index f5fa4aebe4cafe1430b31ca3a89ec5f3698d23bd..23b89584d9ba1d88ff49bfa28d210b325e7dea7f 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -1719,25 +1719,74 @@ ;; 'across lanes' add. -(define_insn reduc_surplus_mode +(define_expand reduc_splus_mode + [(match_operand:VALL 0 register_operand =w) + (match_operand:VALL 1 register_operand w)] + TARGET_SIMD + { +/* Old optab/standard name, should not be used since we are providing + newer reduc_plus_scal_mode. */ +gcc_unreachable (); + } +) + +(define_expand reduc_uplus_mode + [(match_operand:VALL 0 register_operand =w) + (match_operand:VALL 1 register_operand w)] + TARGET_SIMD + { +/* Old optab/standard name, should not be used since we are providing + newer reduc_plus_scal_mode. 
*/ +gcc_unreachable (); + } +) + +(define_expand reduc_plus_scal_mode + [(match_operand:VEL 0 register_operand =w) + (unspec:VDQ [(match_operand:VDQ 1 register_operand w)] + UNSPEC_ADDV)] + TARGET_SIMD + { +rtx elt = GEN_INT (ENDIAN_LANE_N (MODEmode, 0)); +rtx scratch = gen_reg_rtx (MODEmode); +emit_insn (gen_aarch64_reduc_plus_internalmode (scratch, operands[1])); +emit_insn (gen_aarch64_get_lanemode (operands[0], scratch, elt)); +DONE; + } +) + +(define_expand reduc_plus_scal_mode + [(match_operand:VEL 0 register_operand =w) + (match_operand:V2F 1 register_operand w)] + TARGET_SIMD + { +rtx elt = GEN_INT (ENDIAN_LANE_N (MODEmode, 0)); +rtx scratch = gen_reg_rtx (MODEmode); +emit_insn (gen_aarch64_reduc_plus_internalmode (scratch, operands[1])); +emit_insn (gen_aarch64_get_lanemode (operands[0], scratch, elt)); +DONE; + } +) + +(define_insn aarch64_reduc_plus_internalmode [(set (match_operand:VDQV 0 register_operand =w) (unspec:VDQV [(match_operand:VDQV 1 register_operand w)] - SUADDV))] + UNSPEC_ADDV))] TARGET_SIMD addVDQV:vp\\t%Vetype0, %1.Vtype [(set_attr type neon_reduc_addq)] ) -(define_insn reduc_surplus_v2si +(define_insn aarch64_reduc_plus_internalv2si [(set (match_operand:V2SI 0 register_operand =w) (unspec:V2SI [(match_operand:V2SI 1 register_operand w)] - SUADDV))] + UNSPEC_ADDV))] TARGET_SIMD addp\\t%0.2s, %1.2s, %1.2s [(set_attr type neon_reduc_add)] ) -(define_insn reduc_splus_mode +(define_insn aarch64_reduc_plus_internalmode [(set (match_operand:V2F 0 register_operand =w) (unspec:V2F [(match_operand:V2F 1 register_operand w)] UNSPEC_FADDV))] @@
[PATCH i386 AVX512] [44/n] Add vshufps insn patterns.
Hello, Patch in the bottom extends AVX-512 shufps. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_expand avx_shufps256mask_expand4_name): Add masking. (define_insn avx_shufps256_1mask_name): Ditto. (define_expand sse_shufpsmask_expand4_name): Ditto. (define_insn sse_shufps_v4sf_mask): New. -- Thanks, K diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 295f11a..9151063 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -5805,7 +5805,7 @@ (set_attr prefix evex) (set_attr mode V16SF)]) -(define_expand avx_shufps256 +(define_expand avx_shufps256mask_expand4_name [(match_operand:V8SF 0 register_operand) (match_operand:V8SF 1 register_operand) (match_operand:V8SF 2 nonimmediate_operand) @@ -5813,25 +5813,28 @@ TARGET_AVX { int mask = INTVAL (operands[3]); - emit_insn (gen_avx_shufps256_1 (operands[0], operands[1], operands[2], - GEN_INT ((mask 0) 3), - GEN_INT ((mask 2) 3), - GEN_INT (((mask 4) 3) + 8), - GEN_INT (((mask 6) 3) + 8), - GEN_INT (((mask 0) 3) + 4), - GEN_INT (((mask 2) 3) + 4), - GEN_INT (((mask 4) 3) + 12), - GEN_INT (((mask 6) 3) + 12))); + emit_insn (gen_avx_shufps256_1mask_expand4_name (operands[0], +operands[1], +operands[2], +GEN_INT ((mask 0) 3), +GEN_INT ((mask 2) 3), +GEN_INT (((mask 4) 3) + 8), +GEN_INT (((mask 6) 3) + 8), +GEN_INT (((mask 0) 3) + 4), +GEN_INT (((mask 2) 3) + 4), +GEN_INT (((mask 4) 3) + 12), +GEN_INT (((mask 6) 3) + 12) +mask_expand4_args)); DONE; }) ;; One bit in mask selects 2 elements. 
-(define_insn avx_shufps256_1 - [(set (match_operand:V8SF 0 register_operand =x) +(define_insn avx_shufps256_1mask_name + [(set (match_operand:V8SF 0 register_operand =v) (vec_select:V8SF (vec_concat:V16SF - (match_operand:V8SF 1 register_operand x) - (match_operand:V8SF 2 nonimmediate_operand xm)) + (match_operand:V8SF 1 register_operand v) + (match_operand:V8SF 2 nonimmediate_operand vm)) (parallel [(match_operand 3 const_0_to_3_operand ) (match_operand 4 const_0_to_3_operand ) (match_operand 5 const_8_to_11_operand ) @@ -5841,6 +5844,7 @@ (match_operand 9 const_12_to_15_operand) (match_operand 10 const_12_to_15_operand)])))] TARGET_AVX +mask_avx512vl_condition (INTVAL (operands[3]) == (INTVAL (operands[7]) - 4) INTVAL (operands[4]) == (INTVAL (operands[8]) - 4) INTVAL (operands[5]) == (INTVAL (operands[9]) - 4) @@ -5853,14 +5857,14 @@ mask |= (INTVAL (operands[6]) - 8) 6; operands[3] = GEN_INT (mask); - return vshufps\t{%3, %2, %1, %0|%0, %1, %2, %3}; + return vshufps\t{%3, %2, %1, %0mask_operand11|%0mask_operand11, %1, %2, %3}; } [(set_attr type sseshuf) (set_attr length_immediate 1) - (set_attr prefix vex) + (set_attr prefix mask_prefix) (set_attr mode V8SF)]) -(define_expand sse_shufps +(define_expand sse_shufpsmask_expand4_name [(match_operand:V4SF 0 register_operand) (match_operand:V4SF 1 register_operand) (match_operand:V4SF 2 nonimmediate_operand) @@ -5868,14 +5872,46 @@ TARGET_SSE { int mask = INTVAL (operands[3]); - emit_insn (gen_sse_shufps_v4sf (operands[0], operands[1], operands[2], - GEN_INT ((mask 0) 3), - GEN_INT ((mask 2) 3), - GEN_INT (((mask 4) 3) + 4), - GEN_INT (((mask 6) 3) + 4))); + emit_insn (gen_sse_shufps_v4sfmask_expand4_name (operands[0], +operands[1], +operands[2], +GEN_INT ((mask 0) 3), +GEN_INT ((mask 2) 3), +GEN_INT (((mask 4) 3) + 4), +
[PATCH 5/14][AArch64] Use new reduc_[us](min|max)_scal optabs, inc. for builtins
Similarly to the previous patch (r/2205), this migrates AArch64 to the new reduce-to-scalar optabs for min and max. For consistency we apply the same treatment to the smax_nan and smin_nan patterns (used for __builtins), even though reduc_smin_nan_scal (etc.) is not a standard name. Tested: check-gcc on aarch64-none-elf and aarch64_be-none-elf. gcc/ChangeLog: * config/aarch64/aarch64-simd-builtins.def (reduc_smax_, reduc_smin_, reduc_umax_, reduc_umin_, reduc_smax_nan_, reduc_smin_nan_): Remove. (reduc_smax_scal_, reduc_smin_scal_, reduc_umax_scal_, reduc_umin_scal_, reduc_smax_nan_scal_, reduc_smin_nan_scal_): New. * config/aarch64/aarch64-simd.md (reduc_maxmin_uns_mode): Rename VDQV_S variant to... (reduc_maxmin_uns_internalmode): ...this. (reduc_maxmin_uns_mode): New (VDQ_BHSI). (reduc_maxmin_uns_scal_mode): New (*2). (reduc_maxmin_uns_v2si): Combine with below, renaming... (reduc_maxmin_uns_mode): Combine V2F with above, renaming... (reduc_maxmin_uns_internal_mode): ...to this (VDQF). * config/aarch64/arm_neon.h (vmaxv_f32, vmaxv_s8, vmaxv_s16, vmaxv_s32, vmaxv_u8, vmaxv_u16, vmaxv_u32, vmaxvq_f32, vmaxvq_f64, vmaxvq_s8, vmaxvq_s16, vmaxvq_s32, vmaxvq_u8, vmaxvq_u16, vmaxvq_u32, vmaxnmv_f32, vmaxnmvq_f32, vmaxnmvq_f64, vminv_f32, vminv_s8, vminv_s16, vminv_s32, vminv_u8, vminv_u16, vminv_u32, vminvq_f32, vminvq_f64, vminvq_s8, vminvq_s16, vminvq_s32, vminvq_u8, vminvq_u16, vminvq_u32, vminnmv_f32, vminnmvq_f32, vminnmvq_f64): Update to use __builtin_aarch64_reduc_..._scal; remove vget_lane wrapper.diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index ae4ab42e3e3df7de4e4b2c5e46a1476a2ed64175..e213b9ce3adfc0c4c50b4dc34f4f1b995d5e8042 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -251,13 +251,13 @@ /* Implemented by aarch64_reduc_plus_mode. */ BUILTIN_VALL (UNOP, reduc_plus_scal_, 10) - /* Implemented by reduc_maxmin_uns_mode. 
*/ - BUILTIN_VDQIF (UNOP, reduc_smax_, 10) - BUILTIN_VDQIF (UNOP, reduc_smin_, 10) - BUILTIN_VDQ_BHSI (UNOP, reduc_umax_, 10) - BUILTIN_VDQ_BHSI (UNOP, reduc_umin_, 10) - BUILTIN_VDQF (UNOP, reduc_smax_nan_, 10) - BUILTIN_VDQF (UNOP, reduc_smin_nan_, 10) + /* Implemented by reduc_maxmin_uns_scal_mode (producing scalar). */ + BUILTIN_VDQIF (UNOP, reduc_smax_scal_, 10) + BUILTIN_VDQIF (UNOP, reduc_smin_scal_, 10) + BUILTIN_VDQ_BHSI (UNOPU, reduc_umax_scal_, 10) + BUILTIN_VDQ_BHSI (UNOPU, reduc_umin_scal_, 10) + BUILTIN_VDQF (UNOP, reduc_smax_nan_scal_, 10) + BUILTIN_VDQF (UNOP, reduc_smin_nan_scal_, 10) /* Implemented by maxminmode3. smax variants map to fmaxnm, diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 23b89584d9ba1d88ff49bfa28d210b325e7dea7f..d4a745be59897b4cb2a0de23adb56b5d79203592 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -1828,7 +1828,64 @@ ;; 'across lanes' max and min ops. -(define_insn reduc_maxmin_uns_mode +(define_expand reduc_maxmin_uns_mode + [(match_operand:VDQ_BHSI 0 register_operand) + (unspec:VDQ_BHSI [(match_operand:VDQ_BHSI 1 register_operand)] + MAXMINV)] + TARGET_SIMD + { +/* Old optab/standard name, should not be used since we are providing +newer reduc_..._scal_mode. */ +gcc_unreachable (); + } +) + +(define_expand reduc_maxmin_uns_mode + [(match_operand:VDQF 0 register_operand) + (unspec:VDQF [(match_operand:VDQF 1 register_operand)] + FMAXMINV)] + TARGET_SIMD + { +/* Old optab/standard name, should not be used since we are providing +newer reduc_..._scal_mode. */ +gcc_unreachable (); + } +) + +;; Template for outputting a scalar, so we can create __builtins which can be +;; gimple_fold'd to the REDUC_(MAX|MIN)_EXPR tree code. (This is FP smax/smin). 
+(define_expand reduc_maxmin_uns_scal_mode + [(match_operand:VEL 0 register_operand) + (unspec:VDQF [(match_operand:VDQF 1 register_operand)] + FMAXMINV)] + TARGET_SIMD + { +rtx elt = GEN_INT (ENDIAN_LANE_N (MODEmode, 0)); +rtx scratch = gen_reg_rtx (MODEmode); +emit_insn (gen_aarch64_reduc_maxmin_uns_internalmode (scratch, + operands[1])); +emit_insn (gen_aarch64_get_lanemode (operands[0], scratch, elt)); +DONE; + } +) + +;; Likewise for integer cases, signed and unsigned. +(define_expand reduc_maxmin_uns_scal_mode + [(match_operand:VEL 0 register_operand) + (unspec:VDQ_BHSI [(match_operand:VDQ_BHSI 1 register_operand)] + MAXMINV)] + TARGET_SIMD + { +rtx elt = GEN_INT (ENDIAN_LANE_N (MODEmode, 0)); +rtx scratch = gen_reg_rtx (MODEmode); +emit_insn (gen_aarch64_reduc_maxmin_uns_internalmode (scratch, + operands[1])); +emit_insn (gen_aarch64_get_lanemode (operands[0],
[PATCH i386 AVX512] [45/n] Add vshufpd insn patterns.
Hello, This patch supports AVX-512's vshufpd insns. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_expand avx_shufpd256mask_expand4_name): Add masking. (define_insn avx_shufpd256_1mask_name): Ditto. (define_expand sse2_shufpdmask_expand4_name): Ditto. (define_insn sse2_shufpd_v2df_mask): New. -- Thanks, K diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 9151063..9e0c0e8 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -7790,7 +7790,7 @@ (set_attr prefix evex) (set_attr mode V8DF)]) -(define_expand avx_shufpd256 +(define_expand avx_shufpd256mask_expand4_name [(match_operand:V4DF 0 register_operand) (match_operand:V4DF 1 register_operand) (match_operand:V4DF 2 nonimmediate_operand) @@ -7798,25 +7798,28 @@ TARGET_AVX { int mask = INTVAL (operands[3]); - emit_insn (gen_avx_shufpd256_1 (operands[0], operands[1], operands[2], - GEN_INT (mask 1), - GEN_INT (mask 2 ? 5 : 4), - GEN_INT (mask 4 ? 3 : 2), - GEN_INT (mask 8 ? 7 : 6))); + emit_insn (gen_avx_shufpd256_1mask_expand4_name (operands[0], +operands[1], +operands[2], +GEN_INT (mask 1), +GEN_INT (mask 2 ? 5 : 4), +GEN_INT (mask 4 ? 3 : 2), +GEN_INT (mask 8 ? 
7 : 6) +mask_expand4_args)); DONE; }) -(define_insn avx_shufpd256_1 - [(set (match_operand:V4DF 0 register_operand =x) +(define_insn avx_shufpd256_1mask_name + [(set (match_operand:V4DF 0 register_operand =v) (vec_select:V4DF (vec_concat:V8DF - (match_operand:V4DF 1 register_operand x) - (match_operand:V4DF 2 nonimmediate_operand xm)) + (match_operand:V4DF 1 register_operand v) + (match_operand:V4DF 2 nonimmediate_operand vm)) (parallel [(match_operand 3 const_0_to_1_operand) (match_operand 4 const_4_to_5_operand) (match_operand 5 const_2_to_3_operand) (match_operand 6 const_6_to_7_operand)])))] - TARGET_AVX + TARGET_AVX mask_avx512vl_condition { int mask; mask = INTVAL (operands[3]); @@ -7825,14 +7828,14 @@ mask |= (INTVAL (operands[6]) - 6) 3; operands[3] = GEN_INT (mask); - return vshufpd\t{%3, %2, %1, %0|%0, %1, %2, %3}; + return vshufpd\t{%3, %2, %1, %0mask_operand7|%0mask_operand7, %1, %2, %3}; } [(set_attr type sseshuf) (set_attr length_immediate 1) (set_attr prefix vex) (set_attr mode V4DF)]) -(define_expand sse2_shufpd +(define_expand sse2_shufpdmask_expand4_name [(match_operand:V2DF 0 register_operand) (match_operand:V2DF 1 register_operand) (match_operand:V2DF 2 nonimmediate_operand) @@ -7840,12 +7843,38 @@ TARGET_SSE2 { int mask = INTVAL (operands[3]); - emit_insn (gen_sse2_shufpd_v2df (operands[0], operands[1], operands[2], - GEN_INT (mask 1), - GEN_INT (mask 2 ? 3 : 2))); + emit_insn (gen_sse2_shufpd_v2dfmask_expand4_name (operands[0], operands[1], + operands[2], GEN_INT (mask 1), + GEN_INT (mask 2 ? 
3 : 2) + mask_expand4_args)); DONE; }) +(define_insn sse2_shufpd_v2df_mask + [(set (match_operand:V2DF 0 register_operand =v) +(vec_merge:V2DF + (vec_select:V2DF + (vec_concat:V4DF + (match_operand:V2DF 1 register_operand v) + (match_operand:V2DF 2 nonimmediate_operand vm)) + (parallel [(match_operand 3 const_0_to_1_operand) + (match_operand 4 const_2_to_3_operand)])) + (match_operand:V2DF 5 vector_move_operand 0C) + (match_operand:QI 6 register_operand Yk)))] + TARGET_AVX512VL +{ + int mask; + mask = INTVAL (operands[3]); + mask |= (INTVAL (operands[4]) - 2) 1; + operands[3] = GEN_INT (mask); + + return vshufpd\t{%3, %2, %1, %0%{%6%}%N5|%0%{6%}%N5, %1, %2, %3}; +} + [(set_attr type sseshuf) + (set_attr length_immediate 1) + (set_attr prefix evex) + (set_attr mode V2DF)]) + ;; punpcklqdq and punpckhqdq are shorter than shufpd. (define_insn avx2_interleave_highv4dimask_name [(set (match_operand:V4DI 0 register_operand =v)
[PATCH 6/14][AArch64] Restore gimple_folding of reduction intrinsics
This gives us back the constant-folding of the neon-intrinsics that was removed in the first patch, but is now OK for bigendian too. bootstrapped on aarch64-none-linux-gnu. check-gcc on aarch64-none-elf and aarch64_be-none-elf. gcc/ChangeLog: * config/aarch64/aarch64.c (TARGET_GIMPLE_FOLD_BUILTIN): Define again. * config/aarch64/aarch64-builtins.c (aarch64_gimple_fold_builtin): Restore, enable for bigendian, update to use __builtin..._scal...diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index 15eb7c686d95b1d66cbd514500ec29ba074eaa3f..0432d3aa1a515a15b051ba89afec7c0306cb5803 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -1333,9 +1333,6 @@ aarch64_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED, tree *args, return NULL_TREE; } -/* Handling of reduction operations temporarily removed so as to decouple - changes to tree codes from AArch64 NEON Intrinsics. */ -#if 0 bool aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi) { @@ -1345,19 +1342,6 @@ aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi) tree fndecl; gimple new_stmt = NULL; - /* The operations folded below are reduction operations. These are - defined to leave their result in the 0'th element (from the perspective - of GCC). The architectural instruction we are folding will leave the - result in the 0'th element (from the perspective of the architecture). - For big-endian systems, these perspectives are not aligned. - - It is therefore wrong to perform this fold on big-endian. There - are some tricks we could play with shuffling, but the mid-end is - inconsistent in the way it treats reduction operations, so we will - end up in difficulty. Until we fix the ambiguity - just bail out. */ - if (BYTES_BIG_ENDIAN) -return false; - if (call) { fndecl = gimple_call_fndecl (stmt); @@ -1369,23 +1353,28 @@ aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi) ? 
gimple_call_arg_ptr (stmt, 0) : error_mark_node); + /* We use gimple's REDUC_(PLUS|MIN|MAX)_EXPRs for float, signed int + and unsigned int; it will distinguish according to the types of + the arguments to the __builtin. */ switch (fcode) { - BUILTIN_VALL (UNOP, reduc_splus_, 10) - new_stmt = gimple_build_assign_with_ops ( + BUILTIN_VALL (UNOP, reduc_plus_scal_, 10) + new_stmt = gimple_build_assign_with_ops ( REDUC_PLUS_EXPR, gimple_call_lhs (stmt), args[0], NULL_TREE); break; - BUILTIN_VDQIF (UNOP, reduc_smax_, 10) + BUILTIN_VDQIF (UNOP, reduc_smax_scal_, 10) + BUILTIN_VDQ_BHSI (UNOPU, reduc_umax_scal_, 10) new_stmt = gimple_build_assign_with_ops ( REDUC_MAX_EXPR, gimple_call_lhs (stmt), args[0], NULL_TREE); break; - BUILTIN_VDQIF (UNOP, reduc_smin_, 10) + BUILTIN_VDQIF (UNOP, reduc_smin_scal_, 10) + BUILTIN_VDQ_BHSI (UNOPU, reduc_umin_scal_, 10) new_stmt = gimple_build_assign_with_ops ( REDUC_MIN_EXPR, gimple_call_lhs (stmt), @@ -1407,7 +1396,6 @@ aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi) return changed; } -#endif void aarch64_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 9197ec038b7d40a601c886b846113c50a29cf5e2..e7946fc0b70ced70a4e98caa0a33121f29242aad 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -9925,8 +9925,8 @@ aarch64_expand_movmem (rtx *operands) #undef TARGET_FRAME_POINTER_REQUIRED #define TARGET_FRAME_POINTER_REQUIRED aarch64_frame_pointer_required -//#undef TARGET_GIMPLE_FOLD_BUILTIN -//#define TARGET_GIMPLE_FOLD_BUILTIN aarch64_gimple_fold_builtin +#undef TARGET_GIMPLE_FOLD_BUILTIN +#define TARGET_GIMPLE_FOLD_BUILTIN aarch64_gimple_fold_builtin #undef TARGET_GIMPLIFY_VA_ARG_EXPR #define TARGET_GIMPLIFY_VA_ARG_EXPR aarch64_gimplify_va_arg_expr
Re: Fix i386 FP_TRAPPING_EXCEPTIONS
On Thu, 18 Sep 2014, Uros Bizjak wrote: OK for mainline and release branches. I've omitted ia64 from the targets in the testcase in the release branch version, given the lack of any definition of FP_TRAPPING_EXCEPTIONS at all there. (I think a definition as (~_fcw & 0x3f) should work for ia64, but haven't tested that.) -- Joseph S. Myers jos...@codesourcery.com
[PATCH 7/14][Testsuite] Add tests of reductions using whole-vector-shifts (multiplication)
For reduction operations (e.g. multiply) that don't have such a tree code, or where the target platform doesn't define an optab handler for the tree code, we can perform the reduction using a series of log(N) shifts (where N = #elements in vector), using the VEC_RSHIFT_EXPR (whole-vector-shift) tree code, if the platform handles the vec_shr_optab. First stage is to add some tests of non-(min/max/plus) reductions; here, multiplies. The first is designed to be non-foldable, so we make sure the architectural instructions line up with what the tree codes specify. The second is designed to be easily constant-propagated, to test the (currently endianness-dependent) constant-folding code. In lib/target-supports.exp, I've defined a new check_effective_target_whole_vector_shift, which I intended to define to true for platforms with the vec_shr optab. However, I've not managed to make this test pass on PowerPC - even with -maltivec, -fdump-tree-vect-details gives me a message about the target not supporting vector multiplication - so I've omitted PowerPC from the whole_vector_shift. This doesn't feel right; suggestions welcomed from PowerPC maintainers? Tests passing on arm-none-eabi and x86_64-none-linux-gnu; also verified the scan-tree-dump part works on ia64-none-linux-gnu (by compiling to assembly only). (Tests are not run on AArch64, because we have no vec_shr_optab at this point; PowerPC, as above; or MIPS, as check_effective_target_vect_int_mult yields 0.) gcc/testsuite/ChangeLog: * lib/target-supports.exp (check_effective_target_whole_vector_shift): New. * gcc.dg/vect/vect-reduc-mul_1.c: New test. * gcc.dg/vect/vect-reduc-mul_2.c: New test.
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-mul_1.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-mul_1.c
new file mode 100644
index ..44f026ff9b561bcf314224c44d51bdd19448851b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-mul_1.c
@@ -0,0 +1,36 @@
+/* { dg-require-effective-target vect_int_mult } */
+/* { dg-require-effective-target whole_vector_shift } */
+
+/* Write a reduction loop to be reduced using vector shifts.  */
+
+extern void abort(void);
+
+unsigned char in[16];
+
+int
+main (unsigned char argc, char **argv)
+{
+  unsigned char i = 0;
+  unsigned char sum = 1;
+
+  for (i = 0; i < 16; i++)
+    in[i] = i + i + 1;
+
+  /* Prevent constant propagation of the entire loop below.  */
+  asm volatile ("" : : : "memory");
+
+  for (i = 0; i < 16; i++)
+    sum *= in[i];
+
+  if (sum != 33)
+    {
+      __builtin_printf("Failed %d\n", sum);
+      abort();
+    }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-mul_2.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-mul_2.c
new file mode 100644
index ..414fba7a5c96c4dd89030682492edb57ebba3b16
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-mul_2.c
@@ -0,0 +1,32 @@
+/* { dg-require-effective-target vect_int_mult } */
+/* { dg-require-effective-target whole_vector_shift } */
+
+/* Write a reduction loop to be reduced using vector shifts and folded.  */
+
+extern void abort(void);
+
+int
+main (unsigned char argc, char **argv)
+{
+  unsigned char in[16];
+  unsigned char i = 0;
+  unsigned char sum = 1;
+
+  for (i = 0; i < 16; i++)
+    in[i] = i + i + 1;
+
+  for (i = 0; i < 16; i++)
+    sum *= in[i];
+
+  if (sum != 33)
+    {
+      __builtin_printf("Failed %d\n", sum);
+      abort();
+    }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index fa5137ea472e1773be60759caad32bbc7ab4c551..0f4bebd533c9268adfcd4ed250f06fca825c92b1 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3320,6 +3320,22 @@ proc check_effective_target_vect_shift { } {
     return $et_vect_shift_saved
 }

+proc check_effective_target_whole_vector_shift { } {
+    if { [istarget x86_64-*-*]
+	 || [istarget ia64-*-*]
+	 || ([check_effective_target_arm32]
+	     && [check_effective_target_arm_little_endian])
+	 || ([istarget mips*-*-*]
+	     && [check_effective_target_mips_loongson]) } {
+	set answer 1
+    } else {
+	set answer 0
+    }
+
+    verbose "check_effective_target_vect_long: returning $answer" 2
+    return $answer
+}
+
 # Return 1 if the target supports vector bswap operations.

 proc check_effective_target_vect_bswap { } {
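The shift-based reduction that these tests exercise can be modelled in plain C. The sketch below is an illustration of the technique only, not code from the patch: at each of log2(N) steps the vector is shifted towards element 0 by half the remaining width and combined element-wise, so the result lands in element 0 regardless of endianness. The helper names `vec_shr` and `reduc_mul` are made up for the sketch.

```c
#include <assert.h>

#define N 16  /* number of vector elements; must be a power of two */

/* Model of a whole-vector right shift (towards element 0) by 'amt'
   elements; vacated elements at the tail are filled with zeros.
   Safe in-place: each read position is ahead of every later write.  */
static void
vec_shr (unsigned char v[N], int amt)
{
  for (int i = 0; i < N; i++)
    v[i] = (i + amt < N) ? v[i + amt] : 0;
}

/* Reduce by multiplication using log2(N) shift-and-combine steps;
   the final product (mod 256, as in the testcase) ends up in v[0].  */
static unsigned char
reduc_mul (unsigned char v[N])
{
  for (int amt = N / 2; amt >= 1; amt /= 2)
    {
      unsigned char tmp[N];
      for (int i = 0; i < N; i++)
        tmp[i] = v[i];
      vec_shr (tmp, amt);
      for (int i = 0; i < N; i++)
        v[i] = (unsigned char) (v[i] * tmp[i]);
    }
  return v[0];
}
```

With the same inputs as vect-reduc-mul_1.c (in[i] = i + i + 1), this model produces the value 33 that the test checks for.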
[PATCH 8/14][Testsuite] Add tests of reductions using whole-vector-shifts (ior)
These are like the previous patch, but using | rather than * - I was unable to get the previous test to pass on PowerPC and MIPS. I note there is no inherent vector operation here - a bitwise OR across a word, and a reduction via shifts using scalar (not vector) ops would be all that's necessary. However, GCC doesn't exploit this possibility at present, and I don't have any plans at present to add such myself. Passing on x86_64-linux-gnu, aarch64-none-elf, aarch64_be-none-elf, arm-none-eabi. The 'scan-tree-dump' part passes on mips64 and powerpc (although the latter is disabled as check_effective_target_whole_vector_shift gives 0, as per previous patch) gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-reduc-or_1.c: New test. * gcc.dg/vect/vect-reduc-or_2.c: Likewise.
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c
new file mode 100644
index ..4e1a8577ce21aad539fca7cf07700b99575dfab0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c
@@ -0,0 +1,35 @@
+/* { dg-require-effective-target whole_vector_shift } */
+
+/* Write a reduction loop to be reduced using vector shifts.  */
+
+extern void abort(void);
+
+unsigned char in[16] __attribute__((__aligned__(16)));
+
+int
+main (unsigned char argc, char **argv)
+{
+  unsigned char i = 0;
+  unsigned char sum = 1;
+
+  for (i = 0; i < 16; i++)
+    in[i] = (i + i + 1) & 0xfd;
+
+  /* Prevent constant propagation of the entire loop below.  */
+  asm volatile ("" : : : "memory");
+
+  for (i = 0; i < 16; i++)
+    sum |= in[i];
+
+  if (sum != 29)
+    {
+      __builtin_printf("Failed %d\n", sum);
+      abort();
+    }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-or_2.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-or_2.c
new file mode 100644
index ..e25467e59221adc09cbe0bb7548842902a4bf6da
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-or_2.c
@@ -0,0 +1,31 @@
+/* { dg-require-effective-target whole_vector_shift } */
+
+/* Write a reduction loop to be reduced using vector shifts and folded.  */
+
+extern void abort(void);
+
+int
+main (unsigned char argc, char **argv)
+{
+  unsigned char in[16] __attribute__((aligned(16)));
+  unsigned char i = 0;
+  unsigned char sum = 1;
+
+  for (i = 0; i < 16; i++)
+    in[i] = (i + i + 1) & 0xfd;
+
+  for (i = 0; i < 16; i++)
+    sum |= in[i];
+
+  if (sum != 29)
+    {
+      __builtin_printf("Failed %d\n", sum);
+      abort();
+    }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
[PATCH 9/14] Enforce whole-vector-shifts to always be by a whole number of elements
The VEC_RSHIFT_EXPR is only ever used by the vectorizer in tree-vect-loop.c (vect_create_epilog_for_reduction), to shift the vector by a whole number of elements. The tree code allows more general shifts but only for integral types. This only causes pain and difficulty for backends (particularly for backends with different endiannesses), and enforcing that restriction for integral types too does no harm. bootstrapped on aarch64-none-linux-gnu and x86-64-none-linux-gnu check-gcc on aarch64-none-elf and x86_64-none-linux-gnu gcc/ChangeLog: * tree-cfg.c (verify_gimple_assign_binary): for VEC_RSHIFT_EXPR (and VEC_LSHIFT_EXPR), require shifts to be by a whole number of elements for all types, rather than only non-integral types. * tree.def (VEC_LSHIFT_EXPR, VEC_RSHIFT_EXPR): Update comment. * doc/md.texi (vec_shl_m, vec_shr_m): Update comment.
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 3f5fd6f0e3ac3fcc30f6c961e3e2709a35f4d413..a78aea2f3f6e35b0d89719a42d734e62a2f5bd65 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4888,7 +4888,8 @@ of a wider mode.)
 @item @samp{vec_shl_@var{m}}, @samp{vec_shr_@var{m}}
 Whole vector left/right shift in bits.
 Operand 1 is a vector to be shifted.
-Operand 2 is an integer shift amount in bits.
+Operand 2 is an integer shift amount in bits, which must be a multiple of the
+element size.
 Operand 0 is where the resulting shifted vector is stored.
 The output and input vectors should have the same modes.
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 49986cc40758bb5998e395c727142e75f7d6e9f4..1ea2e256b09b25331810a57a9c35e5cc875d0404 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3667,14 +3667,11 @@ verify_gimple_assign_binary (gimple stmt)
 	      debug_generic_expr (rhs2_type);
 	      return true;
 	    }
-	  /* For shifting a vector of non-integral components we
-	     only allow shifting by a constant multiple of the element size.  */
-	  if (!INTEGRAL_TYPE_P (TREE_TYPE (rhs1_type))
-	      && (TREE_CODE (rhs2) != INTEGER_CST
-		  || !div_if_zero_remainder (rhs2,
-					     TYPE_SIZE (TREE_TYPE (rhs1_type)))))
+	  /* All shifts must be by a constant multiple of the element size.  */
+	  if (TREE_CODE (rhs2) != INTEGER_CST
+	      || !div_if_zero_remainder (rhs2, TYPE_SIZE (TREE_TYPE (rhs1_type))))
 	    {
-	      error ("non-element sized vector shift of floating point vector");
+	      error ("non-element sized vector shift");
 	      return true;
 	    }
diff --git a/gcc/tree.def b/gcc/tree.def
index e9af52e554babb100d49ea14f47c805cd5024949..5406ffe67c53ff3f12920ca8c965cf0740a079c2 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1240,7 +1240,8 @@ DEFTREECODE (FMA_EXPR, fma_expr, tcc_expression, 3)
 /* Whole vector left/right shift in bits.
    Operand 0 is a vector to be shifted.
-   Operand 1 is an integer shift amount in bits. */
+   Operand 1 is an integer shift amount in bits, which must be a multiple of the
+   element size. */
 DEFTREECODE (VEC_LSHIFT_EXPR, vec_lshift_expr, tcc_binary, 2)
 DEFTREECODE (VEC_RSHIFT_EXPR, vec_rshift_expr, tcc_binary, 2)
v3 PATCH to __dynamic_cast to handle undefined behavior
A while back a customer complained about this program crashing, which I explained was because the behavior is undefined; it crashes specifically because when we try to do the dynamic_cast the F vptr is pointing to a construction vtable for E-in-F which doesn't have a vbase offset entry for C, so when dynamic_cast goes looking for it it instead loads some random value from whatever happens to be just before that vtable. But while looking at Jakub's -fsanitize=vptr work, it occurred to me that it would be easy and pretty cheap to catch this situation in dynamic_cast: if the whole object disagrees with the original subobject about what type it is, just fail. Jakub, I think ubsan could use the same approach to check the argument. Tested x86_64-pc-linux-gnu, applying to trunk.
commit e5f1ca3b03c352380dda95474e06f525ff9849be
Author: Jason Merrill ja...@redhat.com
Date: Wed Sep 17 14:47:20 2014 -0400

	* libsupc++/dyncast.cc (__dynamic_cast): Handle mid-destruction
	dynamic_cast more gracefully.

diff --git a/gcc/testsuite/g++.dg/rtti/dyncast7.C b/gcc/testsuite/g++.dg/rtti/dyncast7.C
new file mode 100644
index 000..deb4397
--- /dev/null
+++ b/gcc/testsuite/g++.dg/rtti/dyncast7.C
@@ -0,0 +1,28 @@
+// I think this dynamic_cast has undefined behavior when destroying E::o
+// because we're in the F period of destruction and ap doesn't
+// point to the object currently being destroyed--but the reasonable
+// options are success or failure, not SEGV.
+
+// { dg-do run }
+
+extern "C" void abort();
+
+struct A { virtual ~A(); };
+struct B { virtual ~B() { } };
+struct C : B, A { };
+struct E : virtual B { A o; };
+struct F : virtual C, virtual E { };
+
+A* ap;
+C* cp;
+
+A::~A() {
+  C* cp2 = dynamic_cast<C*>(ap);
+  if (cp2 != cp && cp2 != 0)
+    abort();
+}
+
+int main() {
+  F f;
+  ap = cp = &f;
+}
diff --git a/libstdc++-v3/libsupc++/dyncast.cc b/libstdc++-v3/libsupc++/dyncast.cc
index 2bcb7dd..9f6adef 100644
--- a/libstdc++-v3/libsupc++/dyncast.cc
+++ b/libstdc++-v3/libsupc++/dyncast.cc
@@ -55,6 +55,18 @@ __dynamic_cast (const void *src_ptr,	// object started from
     adjust_pointer <void> (src_ptr, prefix->whole_object);
   const __class_type_info *whole_type = prefix->whole_type;
   __class_type_info::__dyncast_result result;
+
+  // If the whole object vptr doesn't refer to the whole object type, we're
+  // in the middle of constructing a primary base, and src is a separate
+  // base.  This has undefined behavior and we can't find anything outside
+  // of the base we're actually constructing, so fail now rather than
+  // segfault later trying to use a vbase offset that doesn't exist.
+  const void *whole_vtable = *static_cast <const void *const *> (whole_ptr);
+  const vtable_prefix *whole_prefix =
+    adjust_pointer <vtable_prefix> (whole_vtable,
+				    -offsetof (vtable_prefix, origin));
+  if (whole_prefix->whole_type != whole_type)
+    return NULL;
   whole_type->__do_dyncast (src2dst, __class_type_info::__contained_public,
			     dst_type, whole_ptr, src_type, src_ptr, result);
[PATCH 10/14][AArch64] Implement vec_shr optab
This allows reduction of non-(plus|min|max) operations using log_2(N) shifts rather than N vec_extracts; e.g. for example code int main (unsigned char argc, char **argv) { unsigned char in[16] = { 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31 }; unsigned char i = 0; unsigned char sum = 1; /* Prevent constant propagation of the entire loop below. */ asm volatile ( : : : memory); for (i = 0; i 16; i++) sum *= in[i]; if (sum != 33) __builtin_printf(Failed %d\n, sum); } (a simplified, less-general version of vect-reduc-mul_1.c) this gives main: ldr q0, .LC0 sub sp, sp, #16 str q0, [sp] ldr q1, [sp] moviv0.4s, 0 ext v2.16b, v1.16b, v0.16b, #8 mul v1.16b, v1.16b, v2.16b ext v2.16b, v1.16b, v0.16b, #4 mul v1.16b, v2.16b, v1.16b ext v2.16b, v1.16b, v0.16b, #2 mul v1.16b, v2.16b, v1.16b ext v0.16b, v1.16b, v0.16b, #1 mul v0.16b, v0.16b, v1.16b umovw1, v0.b[0] cmp w1, 33 beq .L2 ... rather than previously: main: ldr q0, .LC0 sub sp, sp, #16 str q0, [sp] ldr d1, [sp] ldr d0, [sp, 8] mul v0.8b, v0.8b, v1.8b umovw0, v0.b[1] umovw3, v0.b[0] umovw2, v0.b[2] umovw7, v0.b[3] umovw6, v0.b[4] mul w3, w0, w3 umovw5, v0.b[5] umovw4, v0.b[6] umovw1, v0.b[7] mul w3, w3, w2 mul w2, w3, w7 mul w2, w2, w6 mul w0, w2, w5 mul w0, w0, w4 mul w1, w0, w1 uxtbw1, w1 cmp w1, 33 beq .L2 ... Tested check-gcc on aarch64-none-elf and aarch64_be-none-elf. (Including new tests from previous patches.) gcc/ChangeLog: * config/aarch64/aarch64-simd.md (vec_shrmode): New (*2). gcc/testsuite/ChangeLog: * lib/target_supports.exp (check_effective_target_whole_vector_shift): Add aarch64*-*-*. diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index d4a745be59897b4cb2a0de23adb56b5d79203592..3fcf809113d73b37a95653b8c2be432478d2bc1e 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -770,6 +770,45 @@ } ) +;; For 64-bit modes we use ushl/r, as this does not require a SIMD zero. 
+(define_insn vec_shr_mode + [(set (match_operand:VD 0 register_operand =w) +(lshiftrt:VD (match_operand:VD 1 register_operand w) + (match_operand:SI 2 immediate_operand i)))] + TARGET_SIMD + ushr %d0, %d1, %2 + [(set_attr type neon_shift_imm)] +) + +(define_expand vec_shr_mode + [(set (match_operand:VQ 0 register_operand =w) +(lshiftrt:VQ (match_operand:VQ 1 register_operand w) + (match_operand:SI 2 immediate_operand i)))] + TARGET_SIMD +{ + HOST_WIDE_INT num_bits = INTVAL (operands[2]); + HOST_WIDE_INT elem_bits = GET_MODE_BITSIZE (GET_MODE_INNER (MODEmode)); + rtx zero_reg = force_reg (MODEmode, CONST0_RTX (MODEmode)); + + gcc_assert (GET_MODE_BITSIZE (MODEmode) == 128); + gcc_assert (num_bits % elem_bits == 0); + + if (num_bits == 0) +{ + emit_move_insn (operands[0], operands[1]); + DONE; +} + else if (num_bits == 128) +{ + emit_move_insn (operands[0], CONST0_RTX (MODEmode)); + DONE; +} + + emit_insn (gen_aarch64_extmode (operands[0], operands[1], zero_reg, + GEN_INT (num_bits / elem_bits))); + DONE; +}) + (define_insn aarch64_simd_vec_setv2di [(set (match_operand:V2DI 0 register_operand =w,w) (vec_merge:V2DI diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 5e40f5fcdfc95e41e804075bb5daa7030eb9bc66..720cc345bf6a76470cc85116d7b3365be07caa97 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -3323,6 +3323,7 @@ proc check_effective_target_vect_shift { } { proc check_effective_target_whole_vector_shift { } { if { [istarget x86_64-*-*] || [istarget ia64-*-*] + || [istarget aarch64*-*-*] || ([check_effective_target_arm32] [check_effective_target_arm_little_endian]) || ([istarget mips*-*-*]
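The expander above builds vec_shr from the EXT instruction with a zero vector as the second operand. A byte-wise model of EXT on a little-endian view shows why that amounts to a whole-vector right shift with zero fill; `ext16` is a hypothetical helper for illustration, not code from the patch, and the big-endian lane/byte correspondence (addressed in patch 13) is deliberately ignored here.

```c
#include <assert.h>

/* Byte-wise model of the AArch64 EXT instruction, little-endian view:
   the result is bytes pos..15 of a followed by bytes 0..pos-1 of b.
   With b an all-zero vector this is exactly a whole-vector right shift
   (towards byte 0) by pos bytes.  r must not alias a or b.  */
static void
ext16 (unsigned char r[16], const unsigned char a[16],
       const unsigned char b[16], int pos)
{
  for (int i = 0; i < 16; i++)
    r[i] = (i + pos < 16) ? a[i + pos] : b[i + pos - 16];
}
```

The expander's special cases fall out of this model: a shift by 0 bits is a plain move, and a shift by the full 128 bits yields the zero vector, so neither needs an EXT at all.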
[PATCH 11/14] Remove VEC_LSHIFT_EXPR and vec_shl_optab
The VEC_LSHIFT_EXPR tree code, and the corresponding vec_shl_optab, seem to have been added for completeness, providing a counterpart to VEC_RSHIFT_EXPR and vec_shr_optab. However, whereas VEC_RSHIFT_EXPRs are generated (only) by the vectorizer, VEC_LSHIFT_EXPR expressions are not generated at all, so there seems little point in maintaining it. Bootstrapped on x86_64-unknown-linux-gnu. aarch64.exp+vect.exp on aarch64-none-elf and aarch64_be-none-elf. gcc/ChangeLog: * expr.c (expand_expr_real_2): Remove code handling VEC_LSHIFT_EXPR. * fold-const.c (const_binop): Likewise. * cfgexpand.c (expand_debug_expr): Likewise. * tree-inline.c (estimate_operator_cost, dump_generic_node, op_code_prio, op_symbol_code): Likewise. * tree-vect-generic.c (expand_vector_operations_1): Likewise. * optabs.c (optab_for_tree_code): Likewise. (expand_vec_shift_expr): Likewise, update comment. * tree.def: Delete VEC_LSHIFT_EXPR, remove comment. * optabs.h (expand_vec_shift_expr): Remove comment re. VEC_LSHIFT_EXPR. * optabs.def: Remove vec_shl_optab. * doc/md.texi: Remove references to vec_shr_m.diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c index f6da5d632f441544fdacafc266e9cf17083a825a..6b46b08538c01190215a174773dfcb1109134873 100644 --- a/gcc/cfgexpand.c +++ b/gcc/cfgexpand.c @@ -4592,7 +4592,6 @@ expand_debug_expr (tree exp) case REDUC_MIN_EXPR: case REDUC_PLUS_EXPR: case VEC_COND_EXPR: -case VEC_LSHIFT_EXPR: case VEC_PACK_FIX_TRUNC_EXPR: case VEC_PACK_SAT_EXPR: case VEC_PACK_TRUNC_EXPR: diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index a78aea2f3f6e35b0d89719a42d734e62a2f5bd65..f94e0f62c622d43e2df0d0619fb1eba74c415165 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -4883,10 +4883,9 @@ operand 1. Add operand 1 to operand 2 and place the widened result in operand 0. (This is used express accumulation of elements into an accumulator of a wider mode.) 
-@cindex @code{vec_shl_@var{m}} instruction pattern @cindex @code{vec_shr_@var{m}} instruction pattern -@item @samp{vec_shl_@var{m}}, @samp{vec_shr_@var{m}} -Whole vector left/right shift in bits. +@item @samp{vec_shr_@var{m}} +Whole vector right shift in bits. Operand 1 is a vector to be shifted. Operand 2 is an integer shift amount in bits, which must be a multiple of the element size. diff --git a/gcc/expr.c b/gcc/expr.c index 11930ca121e4e1f3807261a2e5b0ca4f6723176d..30ea87af3ef102d7071c6c29db37df875af316f5 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -9053,7 +9053,6 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode, return temp; } -case VEC_LSHIFT_EXPR: case VEC_RSHIFT_EXPR: { target = expand_vec_shift_expr (ops, target); diff --git a/gcc/fold-const.c b/gcc/fold-const.c index b8baa94d37a74ebb824e2a4d03f2a10befcdf749..bd4ba5f0c64c710df9fa36d4059f7b08e949fae0 100644 --- a/gcc/fold-const.c +++ b/gcc/fold-const.c @@ -1406,8 +1406,7 @@ const_binop (enum tree_code code, tree arg1, tree arg2) int count = TYPE_VECTOR_SUBPARTS (type), i; tree *elts = XALLOCAVEC (tree, count); - if (code == VEC_LSHIFT_EXPR - || code == VEC_RSHIFT_EXPR) + if (code == VEC_RSHIFT_EXPR) { if (!tree_fits_uhwi_p (arg2)) return NULL_TREE; @@ -1419,11 +1418,10 @@ const_binop (enum tree_code code, tree arg1, tree arg2) if (shiftc = outerc || (shiftc % innerc) != 0) return NULL_TREE; int offset = shiftc / innerc; - /* The direction of VEC_[LR]SHIFT_EXPR is endian dependent. - For reductions, compiler emits VEC_RSHIFT_EXPR always, - for !BYTES_BIG_ENDIAN picks first vector element, but - for BYTES_BIG_ENDIAN last element from the vector. */ - if ((code == VEC_RSHIFT_EXPR) ^ (!BYTES_BIG_ENDIAN)) + /* The direction of VEC_RSHIFT_EXPR is endian dependent. + For reductions, if !BYTES_BIG_ENDIAN then compiler picks first + vector element, but last element if BYTES_BIG_ENDIAN. 
*/ + if (BYTES_BIG_ENDIAN) offset = -offset; tree zero = build_zero_cst (TREE_TYPE (type)); for (i = 0; i count; i++) diff --git a/gcc/optabs.c b/gcc/optabs.c index e422bcce18d06a39b26547b510c35858efc2303e..9c5b5daa6f2b51bda5ba92fcd61534f1dd55e646 100644 --- a/gcc/optabs.c +++ b/gcc/optabs.c @@ -515,9 +515,6 @@ optab_for_tree_code (enum tree_code code, const_tree type, case REDUC_PLUS_EXPR: return reduc_plus_scal_optab; -case VEC_LSHIFT_EXPR: - return vec_shl_optab; - case VEC_RSHIFT_EXPR: return vec_shr_optab; @@ -765,7 +762,7 @@ force_expand_binop (enum machine_mode mode, optab binoptab, return true; } -/* Generate insns for VEC_LSHIFT_EXPR, VEC_RSHIFT_EXPR. */ +/* Generate insns for VEC_RSHIFT_EXPR. */ rtx expand_vec_shift_expr (sepops ops, rtx target) @@ -776,21 +773,10 @@ expand_vec_shift_expr (sepops ops, rtx target) enum machine_mode mode = TYPE_MODE (ops-type); tree vec_oprnd =
[PATCH 12/14][Vectorizer] Redefine VEC_RSHIFT_EXPR and vec_shr_optab as endianness-neutral
The direction of VEC_RSHIFT_EXPR has been endian-dependent, contrary to the general principles of tree. This patch updates fold-const and the vectorizer (the only place where such expressions are created), such that VEC_RSHIFT_EXPR always shifts towards element 0. The tree code still maps directly onto the vec_shr_optab, and so this patch *will break any bigendian platform defining the vec_shr optab*. -- For AArch64_be, patch follows next in series; -- For PowerPC, I think patch/rfc 15 should fix, please inspect; -- For MIPS, I think patch/rfc 16 should fix, please inspect. gcc/ChangeLog: * fold-const.c (const_binop): VEC_RSHIFT_EXPR always shifts towards element 0. * tree-vect-loop.c (vect_create_epilog_for_reduction): always extract the result of a reduction with vector shifts from element 0. * tree.def (VEC_RSHIFT_EXPR, VEC_LSHIFT_EXPR): Comment shift direction. * doc/md.texi (vec_shr_m, vec_shl_m): Document shift direction. Testing Done: Bootstrap and check-gcc on x86_64-none-linux-gnu; check-gcc on aarch64-none-elf. diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index f94e0f62c622d43e2df0d0619fb1eba74c415165..a2e8f297fbdd69dfec23e6e0769a21917b06b5c7 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -4885,7 +4885,7 @@ of a wider mode.) @cindex @code{vec_shr_@var{m}} instruction pattern @item @samp{vec_shr_@var{m}} -Whole vector right shift in bits. +Whole vector right shift in bits, i.e. towards element 0. Operand 1 is a vector to be shifted. Operand 2 is an integer shift amount in bits, which must be a multiple of the element size. diff --git a/gcc/fold-const.c b/gcc/fold-const.c index bd4ba5f0c64c710df9fa36d4059f7b08e949fae0..2a4fafa1b0634edd7a56f2484dec3a51a4699222 100644 --- a/gcc/fold-const.c +++ b/gcc/fold-const.c @@ -1418,15 +1418,10 @@ const_binop (enum tree_code code, tree arg1, tree arg2) if (shiftc = outerc || (shiftc % innerc) != 0) return NULL_TREE; int offset = shiftc / innerc; - /* The direction of VEC_RSHIFT_EXPR is endian dependent. 
- For reductions, if !BYTES_BIG_ENDIAN then compiler picks first - vector element, but last element if BYTES_BIG_ENDIAN. */ - if (BYTES_BIG_ENDIAN) - offset = -offset; tree zero = build_zero_cst (TREE_TYPE (type)); for (i = 0; i count; i++) { - if (i + offset 0 || i + offset = count) + if (i + offset = count) elts[i] = zero; else elts[i] = VECTOR_CST_ELT (arg1, i + offset); diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index d0a29d312bfd9a7eb552d937e3c64cf9b30d558a..016e2c1fc839fc4d1c97caaa38064fb8bbb510d8 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -3860,7 +3860,7 @@ vect_create_epilog_for_reduction (vectree vect_defs, gimple stmt, gimple epilog_stmt = NULL; enum tree_code code = gimple_assign_rhs_code (stmt); gimple exit_phi; - tree bitsize, bitpos; + tree bitsize; tree adjustment_def = NULL; tree vec_initial_def = NULL; tree reduction_op, expr, def; @@ -4371,14 +4371,8 @@ vect_create_epilog_for_reduction (vectree vect_defs, gimple stmt, dump_printf_loc (MSG_NOTE, vect_location, extract scalar result\n); - if (BYTES_BIG_ENDIAN) -bitpos = size_binop (MULT_EXPR, - bitsize_int (TYPE_VECTOR_SUBPARTS (vectype) - 1), - TYPE_SIZE (scalar_type)); - else -bitpos = bitsize_zero_node; - - rhs = build3 (BIT_FIELD_REF, scalar_type, new_temp, bitsize, bitpos); + rhs = build3 (BIT_FIELD_REF, scalar_type, + new_temp, bitsize, bitsize_zero_node); epilog_stmt = gimple_build_assign (new_scalar_dest, rhs); new_temp = make_ssa_name (new_scalar_dest, epilog_stmt); gimple_assign_set_lhs (epilog_stmt, new_temp); diff --git a/gcc/tree.def b/gcc/tree.def index ff56bfc18bc00e8dac2dfc072fd4fa878a0f2a04..90bc27fde303e1606baac858738a7a86a517573b 100644 --- a/gcc/tree.def +++ b/gcc/tree.def @@ -1238,7 +1238,7 @@ DEFTREECODE (WIDEN_LSHIFT_EXPR, widen_lshift_expr, tcc_binary, 2) before adding operand three. */ DEFTREECODE (FMA_EXPR, fma_expr, tcc_expression, 3) -/* Whole vector right shift in bits. +/* Whole vector right shift in bits, i.e. towards element 0. 
Operand 0 is a vector to be shifted. Operand 1 is an integer shift amount in bits, which must be a multiple of the element size. */
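With the endianness dependence removed, the constant folding in const_binop reduces to a single rule: element i of the result takes element i + offset of the operand, and vacated elements become zero. The sketch below mirrors that logic as a standalone function; it is a hypothetical helper over plain int elements, not the actual GCC code.

```c
#include <assert.h>

/* Endianness-neutral folding of VEC_RSHIFT_EXPR: shift by shiftc bits,
   which must be a non-negative multiple of the element size and less
   than the vector width; every element moves offset positions towards
   index 0 and vacated elements become zero.  Returns 0 (not foldable)
   where const_binop would return NULL_TREE.  */
static int
fold_vec_rshift (int *res, const int *arg, int count,
                 int shiftc, int elem_bits)
{
  if (shiftc % elem_bits != 0 || shiftc >= count * elem_bits)
    return 0;                   /* not foldable */
  int offset = shiftc / elem_bits;
  for (int i = 0; i < count; i++)
    res[i] = (i + offset >= count) ? 0 : arg[i + offset];
  return 1;
}
```

Note there is no BYTES_BIG_ENDIAN test and no negative offset anywhere: the direction is fixed towards element 0, matching the updated vec_shr documentation above.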
Re: [C++ Patch] PR 62232
OK. Jason
[PATCH 13/14][AArch64_be] Fix vec_shr pattern to correctly implement endianness-neutral optab
The previous patch broke aarch64_be by redefining VEC_RSHIFT_EXPR / vec_shr_optab to always shift the vector towards gcc's element 0. This fixes aarch64_be to do that. check-gcc on aarch64-none-elf (no changes) and aarch64_be-none-elf (fixes all regressions produced by previous patch, i.e. no regressions from before redefining vec_shr). gcc/ChangeLog: * config/aarch64/aarch64-simd.md (vec_shr_mode *2): Fix bigendian. diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 3fcf809113d73b37a95653b8c2be432478d2bc1e..e45eddbda7528cfbb4b0953b2c9934c5408d2f6d 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -776,7 +776,12 @@ (lshiftrt:VD (match_operand:VD 1 register_operand w) (match_operand:SI 2 immediate_operand i)))] TARGET_SIMD - ushr %d0, %d1, %2 + { +if (BYTES_BIG_ENDIAN) + return ushl %d0, %d1, %2; +else + return ushr %d0, %d1, %2; + } [(set_attr type neon_shift_imm)] ) @@ -804,6 +809,14 @@ DONE; } + if (BYTES_BIG_ENDIAN) +{ + rtx temp = operands[1]; + operands[1] = zero_reg; + zero_reg = temp; + num_bits = 128 - num_bits; +} + emit_insn (gen_aarch64_extmode (operands[0], operands[1], zero_reg, GEN_INT (num_bits / elem_bits))); DONE;
[PATCH 14/14][Vectorizer] Tidy up vect_create_epilog / use_scalar_result
Following earlier patches, vect_create_epilog_for_reduction contains exactly one case where extract_scalar_result==true. Hence, move the code 'if (extract_scalar_result)' there, and tidy-up/remove some variables. bootstrapped on x86_64-none-linux-gnu + check-gcc + check-g++. gcc/ChangeLog: * tree-vect-loop.c (vect_create_epilog_for_reduction): Move code for 'if (extract_scalar_result)' to the only place that it is true.diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index 016e2c1fc839fc4d1c97caaa38064fb8bbb510d8..62b279e4d29d1fdfbfbd4e606fc8be9d608d3707 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -3867,7 +3867,6 @@ vect_create_epilog_for_reduction (vectree vect_defs, gimple stmt, tree orig_name, scalar_result; imm_use_iterator imm_iter, phi_imm_iter; use_operand_p use_p, phi_use_p; - bool extract_scalar_result = false; gimple use_stmt, orig_stmt, reduction_phi = NULL; bool nested_in_vect_loop = false; auto_vecgimple new_phis; @@ -4235,6 +4234,8 @@ vect_create_epilog_for_reduction (vectree vect_defs, gimple stmt, Create: va = vop va, va' } */ + tree rhs; + if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, Reduce using vector shifts\n); @@ -4260,7 +4261,20 @@ vect_create_epilog_for_reduction (vectree vect_defs, gimple stmt, gsi_insert_before (exit_gsi, epilog_stmt, GSI_SAME_STMT); } - extract_scalar_result = true; + /* 2.4 Extract the final scalar result. 
Create: + s_out3 = extract_field v_out2, bitpos */ + + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + extract scalar result\n); + + rhs = build3 (BIT_FIELD_REF, scalar_type, new_temp, + bitsize, bitsize_zero_node); + epilog_stmt = gimple_build_assign (new_scalar_dest, rhs); + new_temp = make_ssa_name (new_scalar_dest, epilog_stmt); + gimple_assign_set_lhs (epilog_stmt, new_temp); + gsi_insert_before (exit_gsi, epilog_stmt, GSI_SAME_STMT); + scalar_results.safe_push (new_temp); } else { @@ -4355,30 +4369,8 @@ vect_create_epilog_for_reduction (vectree vect_defs, gimple stmt, else /* Not SLP - we have one scalar to keep in SCALAR_RESULTS. */ scalar_results.safe_push (new_temp); - - extract_scalar_result = false; } } - - /* 2.4 Extract the final scalar result. Create: - s_out3 = extract_field v_out2, bitpos */ - - if (extract_scalar_result) -{ - tree rhs; - - if (dump_enabled_p ()) -dump_printf_loc (MSG_NOTE, vect_location, - extract scalar result\n); - - rhs = build3 (BIT_FIELD_REF, scalar_type, - new_temp, bitsize, bitsize_zero_node); - epilog_stmt = gimple_build_assign (new_scalar_dest, rhs); - new_temp = make_ssa_name (new_scalar_dest, epilog_stmt); - gimple_assign_set_lhs (epilog_stmt, new_temp); - gsi_insert_before (exit_gsi, epilog_stmt, GSI_SAME_STMT); - scalar_results.safe_push (new_temp); -} vect_finalize_reduction:
parallel check output changes?
Have the changes that have gone into the check parallelization made the .sum file non-deterministic? I'm seeing a lot of small hunks in different orders which cause my comparison scripts to show big differences. I haven't been paying attention to the nature of the make check changes, so I'm not sure if this is expected... or is this something else? It's the same code base between runs, just with a few changes made to some include files. I.e., the order of the options -mstackrealign and -mno-stackrealign is swapped in this output:

  Running /gcc/2014-09-16/gcc/gcc/testsuite/gcc.target/i386/stackalign/stackalign.exp ...
- UNSUPPORTED: gcc.target/i386/stackalign/asm-1.c -mstackrealign
  UNSUPPORTED: gcc.target/i386/stackalign/asm-1.c -mno-stackrealign
! UNSUPPORTED: gcc.target/i386/stackalign/longlong-1.c -mstackrealign
  UNSUPPORTED: gcc.target/i386/stackalign/longlong-1.c -mno-stackrealign
  UNSUPPORTED: gcc.target/i386/stackalign/longlong-2.c -mstackrealign
  UNSUPPORTED: gcc.target/i386/stackalign/longlong-2.c -mno-stackrealign
  PASS: gcc.target/i386/stackalign/pr39146.c -mstackrealign (test for excess errors)
--- 110393,110402 ----
  PASS: gcc.target/i386/math-torture/trunc.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (test for excess errors)
  PASS: gcc.target/i386/math-torture/trunc.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (test for excess errors)
  Running /gcc/2014-09-16/gcc/gcc/testsuite/gcc.target/i386/stackalign/stackalign.exp ...
  UNSUPPORTED: gcc.target/i386/stackalign/asm-1.c -mno-stackrealign
! UNSUPPORTED: gcc.target/i386/stackalign/asm-1.c -mstackrealign
  UNSUPPORTED: gcc.target/i386/stackalign/longlong-1.c -mno-stackrealign
+ UNSUPPORTED: gcc.target/i386/stackalign/longlong-1.c -mstackrealign
  UNSUPPORTED: gcc.target/i386/stackalign/longlong-2.c -mstackrealign
  UNSUPPORTED: gcc.target/i386/stackalign/longlong-2.c -mno-stackrealign
  PASS: gcc.target/i386/stackalign/pr39146.c -mstackrealign (test for excess errors)

Andrew
[PATCH/RFC 15 / 14+2][RS6000] Remove vec_shl and (hopefully) fix vec_shr
Patch 12 of 14 (https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01475.html) will break bigendian targets implementing vec_shr. This is a PowerPC parallel of patch 13 of 14 (https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01477.html) for AArch64. I've checked I can build a stage 1 compiler for powerpc-none-eabi and that the assembly output looks plausible, but no further than that. In fact I find BYTES_BIG_ENDIAN is defined to true on powerpcle-none-eabi as well as powerpc-none-eabi (and also on ppc64-none-elf, but to false on ppc64le-none-elf), so I'm not quite sure how your backend works in this regard; nonetheless I hope this is a helpful starting point even if not definitive.

gcc/ChangeLog:

	* config/rs6000/vector.md (vec_shl_<mode>): Remove.
	(vec_shr_<mode>): Reverse shift if BYTES_BIG_ENDIAN.

diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index edbb83161d142b1a562735635fe90ef65b09fbbf..8bc010eb26526e2997d02ea7aef655e60eca8707 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -972,53 +972,11 @@
   "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_ALLOW_MOVMISALIGN"
   "")

-
-;; Vector shift left in bits.  Currently supported ony for shift
-;; amounts that can be expressed as byte shifts (divisible by 8).
-;; General shift amounts can be supported using vslo + vsl.  We're
-;; not expecting to see these yet (the vectorizer currently
-;; generates only shifts divisible by byte_size).
-(define_expand "vec_shl_<mode>"
-  [(match_operand:VEC_L 0 "vlogical_operand" "")
-   (match_operand:VEC_L 1 "vlogical_operand" "")
-   (match_operand:QI 2 "reg_or_short_operand" "")]
-  "TARGET_ALTIVEC"
-  "
-{
-  rtx bitshift = operands[2];
-  rtx shift;
-  rtx insn;
-  HOST_WIDE_INT bitshift_val;
-  HOST_WIDE_INT byteshift_val;
-
-  if (! CONSTANT_P (bitshift))
-    FAIL;
-  bitshift_val = INTVAL (bitshift);
-  if (bitshift_val & 0x7)
-    FAIL;
-  byteshift_val = bitshift_val >> 3;
-  if (TARGET_VSX && (byteshift_val & 0x3) == 0)
-    {
-      shift = gen_rtx_CONST_INT (QImode, byteshift_val >> 2);
-      insn = gen_vsx_xxsldwi_<mode> (operands[0], operands[1], operands[1],
-				     shift);
-    }
-  else
-    {
-      shift = gen_rtx_CONST_INT (QImode, byteshift_val);
-      insn = gen_altivec_vsldoi_<mode> (operands[0], operands[1], operands[1],
-					shift);
-    }
-
-  emit_insn (insn);
-  DONE;
-}")
-
 ;; Vector shift right in bits.  Currently supported ony for shift
 ;; amounts that can be expressed as byte shifts (divisible by 8).
 ;; General shift amounts can be supported using vsro + vsr.  We're
 ;; not expecting to see these yet (the vectorizer currently
-;; generates only shifts divisible by byte_size).
+;; generates only shifts by a whole number of vector elements).
 (define_expand "vec_shr_<mode>"
   [(match_operand:VEC_L 0 "vlogical_operand" "")
    (match_operand:VEC_L 1 "vlogical_operand" "")
@@ -1037,7 +995,9 @@
   bitshift_val = INTVAL (bitshift);
   if (bitshift_val & 0x7)
     FAIL;
-  byteshift_val = 16 - (bitshift_val >> 3);
+  byteshift_val = (bitshift_val >> 3);
+  if (!BYTES_BIG_ENDIAN)
+    byteshift_val = 16 - byteshift_val;
   if (TARGET_VSX && (byteshift_val & 0x3) == 0)
     {
       shift = gen_rtx_CONST_INT (QImode, byteshift_val >> 2);
[PATCH 16 / 14+2][MIPS] Remove vec_shl and (hopefully) fix vec_shr
Patch 12 of 14 (https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01475.html) will break bigendian targets implementing vec_shr. This is a MIPS parallel of patch 13 of 14 (https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01477.html) for AArch64; the idea is that vec_shr should be unaffected on little-endian, but reversed (to be the same as the old vec_shl) if big-endian. Manual inspection of assembler output looks to do the right sort of thing on mips and mips64, but I haven't been able to run any testcases, so this is not definitive. I'm hoping it is nonetheless helpful as a starting point!

gcc/ChangeLog:

	* config/mips/loongson.md (unspec): Remove UNSPEC_LOONGSON_DSLL.
	(vec_shl_<mode>): Remove.
	(vec_shr_<mode>): Reverse shift if BYTES_BIG_ENDIAN.

diff --git a/gcc/config/mips/loongson.md b/gcc/config/mips/loongson.md
index 474033d1e2c244d3b70ad5ed630ab9f29d5fd5f6..dcba23440a5cb8cf0f2063ee15fbcf9d2a579714 100644
--- a/gcc/config/mips/loongson.md
+++ b/gcc/config/mips/loongson.md
@@ -39,7 +39,6 @@
   UNSPEC_LOONGSON_PUNPCKL
   UNSPEC_LOONGSON_PADDD
   UNSPEC_LOONGSON_PSUBD
-  UNSPEC_LOONGSON_DSLL
   UNSPEC_LOONGSON_DSRL
 ])

@@ -834,22 +833,18 @@
 })

 ;; Whole vector shifts, used for reduction epilogues.
-(define_insn "vec_shl_<mode>"
-  [(set (match_operand:VWHBDI 0 "register_operand" "=f")
-        (unspec:VWHBDI [(match_operand:VWHBDI 1 "register_operand" "f")
-                        (match_operand:SI 2 "register_operand" "f")]
-                       UNSPEC_LOONGSON_DSLL))]
-  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "dsll\t%0,%1,%2"
-  [(set_attr "type" "fcvt")])
-
 (define_insn "vec_shr_<mode>"
   [(set (match_operand:VWHBDI 0 "register_operand" "=f")
         (unspec:VWHBDI [(match_operand:VWHBDI 1 "register_operand" "f")
                         (match_operand:SI 2 "register_operand" "f")]
                        UNSPEC_LOONGSON_DSRL))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "dsrl\t%0,%1,%2"
+  {
+    if (BYTES_BIG_ENDIAN)
+      return "dsll\t%0,%1,%2";
+    else
+      return "dsrl\t%0,%1,%2";
+  }
   [(set_attr "type" "fcvt")])

 (define_expand "reduc_uplus_<mode>"
Re: parallel check output changes?
On 09/18/2014 09:01 AM, Jakub Jelinek wrote:
> On Thu, Sep 18, 2014 at 08:56:50AM -0400, Andrew MacLeod wrote:
>> Has the changes that have gone into the check parallelization made the .sum file non-deterministic? I'm seeing a lot of small hunks in different orders which cause my comparison scripts to show big differences. I haven't been paying attention to the nature of the make check changes, so I'm not sure if this is expected... Or is this something else? It's the same code base between runs, just with a few changes made to some include files.
> I'm using contrib/test_summary and haven't seen any non-determinisms in the output of that command. As for dg-extract-results.sh, we have two versions of that, one if you have python 2.6 or newer, another one if you don't. Perhaps the behavior of those two (I'm using the python version probably) differs?
> Jakub

Not sure, although I do have python 2.7.5 installed for what it's worth... I'll try another run in a bit.

Andrew
Re: parallel check output changes?
On Thu, Sep 18, 2014 at 08:56:50AM -0400, Andrew MacLeod wrote:
> Has the changes that have gone into the check parallelization made the .sum file non-deterministic? I'm seeing a lot of small hunks in different orders which cause my comparison scripts to show big differences. I haven't been paying attention to the nature of the make check changes, so I'm not sure if this is expected... Or is this something else? It's the same code base between runs, just with a few changes made to some include files.

I'm using contrib/test_summary and haven't seen any non-determinisms in the output of that command. As for dg-extract-results.sh, we have two versions of that, one if you have python 2.6 or newer, another one if you don't. Perhaps the behavior of those two (I'm using the python version probably) differs?

Jakub
Re: [PATCH 12/14][Vectorizer] Redefine VEC_RSHIFT_EXPR and vec_shr_optab as endianness-neutral
On Thu, Sep 18, 2014 at 8:42 AM, Alan Lawrence alan.lawre...@arm.com wrote:
> The direction of VEC_RSHIFT_EXPR has been endian-dependent, contrary to the general principles of tree. This patch updates fold-const and the vectorizer (the only place where such expressions are created), such that VEC_RSHIFT_EXPR always shifts towards element 0. The tree code still maps directly onto the vec_shr_optab, and so this patch *will break any bigendian platform defining the vec_shr optab*.
> -- For AArch64_be, patch follows next in series;
> -- For PowerPC, I think patch/rfc 15 should fix, please inspect;
> -- For MIPS, I think patch/rfc 16 should fix, please inspect.
>
> gcc/ChangeLog:
>
> 	* fold-const.c (const_binop): VEC_RSHIFT_EXPR always shifts towards
> 	element 0.
> 	* tree-vect-loop.c (vect_create_epilog_for_reduction): Always extract
> 	the result of a reduction with vector shifts from element 0.
> 	* tree.def (VEC_RSHIFT_EXPR, VEC_LSHIFT_EXPR): Comment shift direction.
> 	* doc/md.texi (vec_shr_m, vec_shl_m): Document shift direction.
>
> Testing Done: Bootstrap and check-gcc on x86_64-none-linux-gnu; check-gcc on aarch64-none-elf.

Why wasn't this tested on the PowerLinux system in the GCC Compile Farm? Also, Bill Schmidt can help check the PPC parts of the patches.

Thanks, David
Re: [PATCH][PING] Enable -fsanitize-recover for KASan
Added Marek to comment on proposed UBSan option change.

On 09/18/2014 02:52 PM, Jakub Jelinek wrote:

--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -1551,6 +1551,12 @@ common_handle_option (struct gcc_options *opts,
 				| SANITIZE_RETURNS_NONNULL_ATTRIBUTE))
 	opts->x_flag_delete_null_pointer_checks = 0;

+      /* UBSan and KASan enable recovery by default.  */
+      opts->x_flag_sanitize_recover
+	= !!(flag_sanitize & (SANITIZE_UNDEFINED
+			      | SANITIZE_UNDEFINED_NONDEFAULT
+			      | SANITIZE_KERNEL_ADDRESS));
+

Doesn't this override even user supplied -fsanitize-recover or -fno-sanitize-recover? Have you tried both -fno-sanitize-recover -fsanitize=kernel-address and -fsanitize=kernel-address -fno-sanitize-recover option orders?

I did, and this worked in a seemingly logical way:

* -fsanitize=address (disable recovery)
* -fsanitize-recover -fsanitize=address (disable recovery)
* -fsanitize=address -fsanitize-recover (enable recovery)
* -fsanitize=kernel-address (enable recovery)
* -fno-sanitize-recover -fsanitize=kernel-address (enable recovery)
* -fsanitize=kernel-address -fno-sanitize-recover (enable recovery)

Seems for -fdelete-null-pointer-checks we got it wrong too; IMHO for -fsanitize={null,{,returns-}nonnull-attribute,undefined} we want to disable it unconditionally, regardless of whether that option appears on the command line or not.

My understanding is that all -fsanitize=(address|kernel-address|undefined|you-name-it) are simply packs of options to enable. The user may override any selected option from the pack if he so desires.

I don't think your proposal will work properly though; if one compiles with -fsanitize=undefined -fsanitize=address you'll just get userland asan with error recovery, which is highly undesirable

Now that's a problem. Looks like I'll need a separate flag to achieve what I need (-fasan-recover? And maybe then rename -fsanitize-recover to -fubsan-recover for consistency?).

or asan.c needs to limit it to flag_sanitize & SANITIZE_KERNEL_ADDRESS mode only.
We may want to UBsanitize the kernel in the future, and this may cause the same problem as the userspace ASan/UBSan interaction you described above.

Depends if you ever want to add recovery for userland sanitization.

Also, kernel developers want both recoverable (more user-friendly) and non-recoverable (faster) ASan error handling.

-Y
Re: [PATCH, i386, Pointer Bounds Checker 31/x] Pointer Bounds Checker builtins for i386 target
On 17 Sep 20:06, Uros Bizjak wrote: On Wed, Sep 17, 2014 at 6:31 PM, Ilya Enkovich enkovich@gmail.com wrote: I don't like the way arguments are prepared. For the case above, bnd_ldx should have index_register_operand predicate in its pattern, and this predicate (and its mode) should be checked in the expander code. There are many examples of argument expansion in ix86_expand_builtin function, including how Pmode is handled. Also, please see how target is handled there. Target can be null, so REG_P predicate will crash. You should also select insn patterns depending on BNDmode, not TARGET_64BIT. Please use assign_386_stack_local so stack slots can be shared. SLOT_TEMP is intended for short-lived temporaries, you can introduce new slots if you need more live values at once. Uros. Thanks for comments! Here is a new version in which I addressed all your concerns. Unfortunately, it doesn't. The patch only fixed one instance w.r.t to target handling, the one I referred as an example. You still have unchecked target, at least in IX86_BUILTIN_BNDMK. However, you have a general problems in your builtin expansion code, so please look at how other builtins are handled. E.g.: if (optimize || !target || GET_MODE (target) != tmode || !register_operand(target, tmode)) target = gen_reg_rtx (tmode); also, here is an example how input operands are prepared: op0 = expand_normal (arg0); op1 = expand_normal (arg1); op2 = expand_normal (arg2); if (!register_operand (op0, Pmode)) op0 = ix86_zero_extend_to_Pmode (op0); if (!register_operand (op1, SImode)) op1 = copy_to_mode_reg (SImode, op1); if (!register_operand (op2, SImode)) op2 = copy_to_mode_reg (SImode, op2); So, Pmode is handled in a special way, even when x32 is not considered. BTW: I wonder if word_mode is needed here, Pmode can be SImode with address prefix (x32). Inside the expanders, please use expand_simple_binop and expand_unop on RTX, not tree expressions. Again, please see many examples. 
Thank you for additional explanations. Hope this time I answer your concerns correctly :) Yes, this version is MUCH better. There are further comments down the code. 2014-09-17 Ilya Enkovich ilya.enkov...@intel.com * config/i386/i386-builtin-types.def (BND): New. (ULONG): New. (BND_FTYPE_PCVOID_ULONG): New. (VOID_FTYPE_BND_PCVOID): New. (VOID_FTYPE_PCVOID_PCVOID_BND): New. (BND_FTYPE_PCVOID_PCVOID): New. (BND_FTYPE_PCVOID): New. (BND_FTYPE_BND_BND): New. (PVOID_FTYPE_PVOID_PVOID_ULONG): New. (PVOID_FTYPE_PCVOID_BND_ULONG): New. (ULONG_FTYPE_VOID): New. (PVOID_FTYPE_BND): New. * config/i386/i386.c: Include tree-chkp.h, rtl-chkp.h. (ix86_builtins): Add IX86_BUILTIN_BNDMK, IX86_BUILTIN_BNDSTX, IX86_BUILTIN_BNDLDX, IX86_BUILTIN_BNDCL, IX86_BUILTIN_BNDCU, IX86_BUILTIN_BNDRET, IX86_BUILTIN_BNDNARROW, IX86_BUILTIN_BNDINT, IX86_BUILTIN_SIZEOF, IX86_BUILTIN_BNDLOWER, IX86_BUILTIN_BNDUPPER. (builtin_isa): Add leaf_p and nothrow_p fields. (def_builtin): Initialize leaf_p and nothrow_p. (ix86_add_new_builtins): Handle leaf_p and nothrow_p flags. (bdesc_mpx): New. (bdesc_mpx_const): New. (ix86_init_mpx_builtins): New. (ix86_init_builtins): Call ix86_init_mpx_builtins. (ix86_emit_cmove): New. (ix86_emit_move_max): New. (ix86_expand_builtin): Expand IX86_BUILTIN_BNDMK, IX86_BUILTIN_BNDSTX, IX86_BUILTIN_BNDLDX, IX86_BUILTIN_BNDCL, IX86_BUILTIN_BNDCU, IX86_BUILTIN_BNDRET, IX86_BUILTIN_BNDNARROW, IX86_BUILTIN_BNDINT, IX86_BUILTIN_SIZEOF, IX86_BUILTIN_BNDLOWER, IX86_BUILTIN_BNDUPPER. * config/i386/i386.h (ix86_stack_slot): Added SLOT_BND_STORED. .. + /* We need to move bounds to memory before any computations. */ + if (!MEM_P (op1)) + { + m1 = assign_386_stack_local (BNDmode, SLOT_TEMP); + emit_move_insn (m1, op1); + } + else + m1 = op1; No negative conditions, please. Just swap the arms of if sentence. It is much more readable. + + /* Generate mem expression to be used for access to LB and UB. 
 */
+  m1h1 = gen_rtx_MEM (Pmode, XEXP (m1, 0));
+  m1h2 = gen_rtx_MEM (Pmode, plus_constant (Pmode, XEXP (m1, 0),
+					    GET_MODE_SIZE (Pmode)));

Please use adjust_address instead of manually producing MEMs.

+
+  t1 = gen_reg_rtx (Pmode);
+
+  /* Compute LB.  */
+  emit_move_insn (t1, m1h1);
+  ix86_emit_move_max (t1, lb);
+
[committed] Fix up pr59594.c testcase (PR testsuite/63292)
Hi! This testcase contains a buffer overflow; I've fixed it thusly, tested that it still fails with the fix reverted and works with current trunk, and committed as obvious to 4.9 and trunk.

2014-09-18  Jakub Jelinek  ja...@redhat.com

	PR testsuite/63292
	* gcc.dg/vect/pr59594.c (b): Increase size to N + 2 elements.

--- gcc/testsuite/gcc.dg/vect/pr59594.c.jj	2014-01-29 10:26:34.0 +0100
+++ gcc/testsuite/gcc.dg/vect/pr59594.c	2014-09-18 15:42:38.628739317 +0200
@@ -3,7 +3,7 @@
 #include "tree-vect.h"

 #define N 1024
-int b[N + 1];
+int b[N + 2];

 int
 main ()

	Jakub
Re: [PATCH, i386, Pointer Bounds Checker 30/x] Size relocation
On 17 Sep 20:51, Uros Bizjak wrote: On Wed, Sep 17, 2014 at 8:35 PM, Ilya Enkovich enkovich@gmail.com wrote: On 16 Sep 12:22, Uros Bizjak wrote: On Tue, Sep 16, 2014 at 11:37 AM, Ilya Enkovich enkovich@gmail.com wrote: 2014-09-16 13:08 GMT+04:00 Uros Bizjak ubiz...@gmail.com: Can x86_64_immediate_operand predicate be used here? I think it cannot be used because of TLS symbols not counting as immediate. OK, please introduce a new predicate, similar to x86_64_immediate_operand, perhaps x86_64_immediate_size_operand, so we can add some comments there. This will also help to macroize the insn, x86_64_immediate_operand has !TARGET_64BIT shortcut for this case. Uros. I don't see how new predicate would help to macroize insn. Single template may look as following patch. You put early return for !TARGET_64BITS. Please see x86_64_immediate_operand predicate. So, /* Here comes comment. */ (define_predicate x86_64_immediate_size_operand (match_code symbol_ref) { if (!TARGET_64BIT) return true; /* Comment here explaining these conditions. */ return (ix86_cmodel == CM_SMALL || ix86_cmodel == CM_KERNEL); } And then in the pattern itself: if (x86_64_immediate_size_operand (operands[1], VOIDmode) return mov{l}\t{%1@SIZE, %k0|%k0, %1@SIZE}; else return movabs{q}\t{%1@SIZE, %0|%0, %1@SIZE}; Uros. Here is a version with check in a form you suggest. Thanks, Ilya -- 2014-09-18 Ilya Enkovich ilya.enkov...@intel.com * config/i386/i386.md (UNSPEC_SIZEOF): New. (move_size_reloc_mode): New. * config/i386/predicates.md (symbol_operand): New. (x86_64_immediate_size_operand): New. 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 2c367b2..db22b06 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -79,6 +79,7 @@
   UNSPEC_PLTOFF
   UNSPEC_MACHOPIC_OFFSET
   UNSPEC_PCREL
+  UNSPEC_SIZEOF

   ;; Prologue support
   UNSPEC_STACK_ALLOC
@@ -18554,6 +18555,21 @@
   "bndstx\t{%2, %3|%3, %2}"
   [(set_attr "type" "mpxst")])

+(define_insn "move_size_reloc_<mode>"
+  [(set (match_operand:SWI48 0 "register_operand" "=r")
+	(unspec:SWI48
+	 [(match_operand:SWI48 1 "symbol_operand")]
+	 UNSPEC_SIZEOF))]
+  "TARGET_MPX"
+{
+  if (x86_64_immediate_size_operand (operands[1], VOIDmode))
+    return "mov{l}\t{%1@SIZE, %k0|%k0, %1@SIZE}";
+  else
+    return "movabs{q}\t{%1@SIZE, %0|%0, %1@SIZE}";
+}
+  [(set_attr "type" "imov")
+   (set_attr "mode" "<MODE>")])
+
 (include "mmx.md")
 (include "sse.md")
 (include "sync.md")
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index cd542b7..da01c9a 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -124,6 +124,10 @@
        (match_test "TARGET_64BIT")
        (match_test "REGNO (op) > BX_REG")))

+;; Return true if VALUE is a symbol reference
+(define_predicate "symbol_operand"
+  (match_code "symbol_ref"))
+
 ;; Return true if VALUE can be stored in a sign extended immediate field.
 (define_predicate "x86_64_immediate_operand"
   (match_code "const_int,symbol_ref,label_ref,const")
@@ -336,6 +340,19 @@
   return false;
 })

+;; Return true if size of VALUE can be stored in a sign
+;; extended immediate field.
+(define_predicate "x86_64_immediate_size_operand"
+  (match_code "symbol_ref")
+{
+  if (!TARGET_64BIT)
+    return true;
+
+  /* For 64 bit target we may assume size of object fits
+     immediate only when code model guarantees that.  */
+  return (ix86_cmodel == CM_SMALL || ix86_cmodel == CM_KERNEL);
+})
+
 ;; Return true if OP is general operand representable on x86_64.
 (define_predicate "x86_64_general_operand"
   (if_then_else (match_test "TARGET_64BIT")
[committed] Don't instrument clobbers with asan (PR c++/62017)
Hi! Clobber stmts, being artificial statements, were certainly never meant to be instrumented. In 4.8, when asan was introduced into gcc, the lhs of a clobber could only be a decl, and a whole-decl store would not really be instrumented, but with *this clobbers in 4.9 that is no longer the case. Fixed thusly, tested on x86_64-linux, committed to trunk and 4.9 as obvious.

2014-09-18  Jakub Jelinek  ja...@redhat.com

	PR c++/62017
	* asan.c (transform_statements): Don't instrument clobber statements.

	* g++.dg/asan/pr62017.C: New test.

--- gcc/asan.c.jj	2014-09-08 22:12:52.0 +0200
+++ gcc/asan.c	2014-09-18 14:34:30.023446693 +0200
@@ -2072,6 +2072,7 @@ transform_statements (void)
       if (has_stmt_been_instrumented_p (s))
	gsi_next (&i);
       else if (gimple_assign_single_p (s)
+	       && !gimple_clobber_p (s)
	       && maybe_instrument_assignment (&i))
	/* Nothing to do as maybe_instrument_assignment advanced
	   the iterator I.  */;
--- gcc/testsuite/g++.dg/asan/pr62017.C.jj	2014-09-18 14:44:03.964525585 +0200
+++ gcc/testsuite/g++.dg/asan/pr62017.C	2014-09-18 14:43:52.0 +0200
@@ -0,0 +1,17 @@
+// PR c++/62017
+// { dg-do run }
+
+struct A
+{
+  int x;
+  virtual ~A () {}
+};
+struct B : public virtual A {};
+struct C : public virtual A {};
+struct D : public B, virtual public C {};
+
+int
+main ()
+{
+  D d;
+}

	Jakub
Re: [PATCH] Fix PR 58867: asan and ubsan tests not run for installed testing.
Hi Andrew! What is the status of this patch? Enabling ASan and UBSan testsuites is useful for testing installed toolchain, so I wonder if you are going to commit this. -Maxim
[PATCH][PING] Keep patch file permissions in mklog
On 08/04/2014 12:14 PM, Tom de Vries wrote: On 04-08-14 08:45, Yury Gribov wrote: Thanks! My 2 (actually 4) cents below. Hi Yuri, thanks for the review. +if ($#ARGV == 1 ($ARGV[0] eq -i || $ARGV[0] eq --inline)) { +$diff = $ARGV[1]; Can we shift here and then just set $diff to $ARGV[0] unconditionally? Done. +if ($diff eq -) { +die Reading from - and using -i are not compatible; +} Hm, can't we dump ChangeLog to stdout in that case? The limitation looks rather strange. My original idea here was that --inline means 'in the patch file', which is not possible if the patch comes from stdin. I've now interpreted it such that --inline prints to stdout what it would print to the patch file otherwise, that is, both log and patch. Printing just the log to stdout can be already be achieved by not using --inline. +open (FILE1, '', $tmp) or die Could not open temp file; Could we use more descriptive name? I've used the slightly more descriptive 'OUTPUTFILE'. +system (cat $diff $tmp) == 0 +or die Could not append patch to temp file; ... +unlink ($tmp) == 1 or die Could not remove temp file; The checks look like an overkill given that we don't check for result of mktemp... I've added a check for the result of mktemp, and removed the unlink result check. I've left in the Could not append patch to temp file check because the patch file might be read-only. OK for trunk? Thanks, - Tom Pinging the patch for Tom.
Re: [PATCH] RTEMS: Update contrib/config-list.mk
I committed this to 4.9 and head. Sebastian, please double-check that it is OK. I had some issues with applying it to head and manually did it.

--joel

On 9/17/2014 8:37 AM, Sebastian Huber wrote:

contrib/ChangeLog
2014-09-17  Sebastian Huber  sebastian.hu...@embedded-brains.de

	* config-list.mk (LIST): Add arm-rtems.  Add nios2-rtems.
	Remove extra option from powerpc-rtems.
---
 contrib/config-list.mk | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/contrib/config-list.mk b/contrib/config-list.mk
index 4345487..056fbf0 100644
--- a/contrib/config-list.mk
+++ b/contrib/config-list.mk
@@ -17,7 +17,7 @@ LIST = aarch64-elf aarch64-linux-gnu \
 	arc-elf32OPT-with-cpu=arc600 arc-elf32OPT-with-cpu=arc700 \
 	arc-linux-uclibcOPT-with-cpu=arc700 arceb-linux-uclibcOPT-with-cpu=arc700 \
 	arm-wrs-vxworks arm-netbsdelf \
-	arm-linux-androideabi arm-uclinux_eabi arm-eabi \
+	arm-linux-androideabi arm-uclinux_eabi arm-eabi arm-rtems \
 	arm-symbianelf avr-rtems avr-elf \
 	bfin-elf bfin-uclinux bfin-linux-uclibc bfin-rtems bfin-openbsd \
 	c6x-elf c6x-uclinux cr16-elf cris-elf cris-linux crisv32-elf crisv32-linux \
@@ -48,13 +48,13 @@ LIST = aarch64-elf aarch64-linux-gnu \
 	moxie-uclinux moxie-rtems \
 	msp430-elf \
 	nds32le-elf nds32be-elf \
-	nios2-elf nios2-linux-gnu \
+	nios2-elf nios2-linux-gnu nios2-rtems \
 	pdp11-aout picochip-elfOPT-enable-obsolete \
 	powerpc-darwin8 \
 	powerpc-darwin7 powerpc64-darwin powerpc-freebsd6 powerpc-netbsd \
 	powerpc-eabispe powerpc-eabisimaltivec powerpc-eabisim ppc-elf \
 	powerpc-eabialtivec powerpc-xilinx-eabi powerpc-eabi \
-	powerpc-rtems4.11OPT-enable-threads=yes powerpc-linux_spe \
+	powerpc-rtems powerpc-linux_spe \
 	powerpc-linux_paired powerpc64-linux_altivec \
 	powerpc-wrs-vxworks powerpc-wrs-vxworksae powerpc-lynxos powerpcle-elf \
 	powerpcle-eabisim powerpcle-eabi rs6000-ibm-aix4.3 rs6000-ibm-aix5.1.0 \

-- 
Joel Sherrill, Ph.D.
Director of Research & Development
joel.sherr...@oarcorp.com        On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
Support Available                (256) 722-9985
Re: [PATCH] RTEMS: Update contrib/config-list.mk
On 9/18/2014 6:51 AM, Jan-Benedict Glaw wrote: On Wed, 2014-09-17 10:52:34 -0500, Joel Sherrill joel.sherr...@oarcorp.com wrote: On 9/17/2014 10:41 AM, Sebastian Huber wrote: On 09/17/2014 04:45 PM, Jan-Benedict Glaw wrote: On Wed, 2014-09-17 15:37:32 +0200, Sebastian Hubersebastian.hu...@embedded-brains.de wrote: contrib/ChangeLog 2014-09-17 Sebastian Hubersebastian.hu...@embedded-brains.de * config-list.mk (LIST): Add arm-rtems. Add nios2-rtems. Remove extra option from powerpc-rtems. What's the rationale for removing --enable-threads=yes here, as well as the specific version number? [...] And is this the input to your buildbot? :) Yes, the target list in contrib/config-list.mk is what'll be built using the config-list.mk-building backend. (The robot has another backend using a different build strategy, which has a separate target list, though one could argue that I'd also include all the config-list.mk targets in that other list as well.) And to tell the whole story, Sebastian approached me with extending the target lists in use by those targets he sent a patch for; I just asked him to go this route, because I guess that'd be beneficial for other folks as well. OK. Thanks for clarifying that. I suspected there was a link. And it is committed. And I will post a follow up patch to add v850-elf and v850-rtems. --joel MfG, JBG
[PATCH] Add v850-rtems to contrib/config-list.mk
OK to commit?

2014-09-18  Joel Sherrill  joel.sherr...@oarcorp.com

	* config-list.mk (LIST): Add v850-rtems.

Index: contrib/config-list.mk
===================================================================
--- contrib/config-list.mk	(revision 215357)
+++ contrib/config-list.mk	(working copy)
@@ -68,7 +68,7 @@
 	sparc-wrs-vxworks sparc64-elf sparc64-rtems sparc64-linux sparc64-freebsd6 \
 	sparc64-netbsd sparc64-openbsd spu-elf \
 	tilegx-linux-gnu tilegxbe-linux-gnu tilepro-linux-gnu \
-	v850e-elf v850-elf vax-linux-gnu \
+	v850e-elf v850-elf v850-rtems vax-linux-gnu \
 	vax-netbsdelf vax-openbsd x86_64-apple-darwin \
 	x86_64-pc-linux-gnuOPT-with-fpmath=avx \
 	x86_64-elfOPT-with-fpmath=sse x86_64-freebsd6 x86_64-netbsd \

-- 
Joel Sherrill, Ph.D.
Director of Research & Development
joel.sherr...@oarcorp.com        On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
Support Available                (256) 722-9985
Re: [jit] Add sphinx-based documentation for libgccjit
On Sep 17, 2014, at 6:22 PM, David Malcolm dmalc...@redhat.com wrote:
> I greatly prefer to use Sphinx over Texinfo, both for the ease of editing and the quality of the generated HTML; I already use it for both the Python bindings to libgccjit and for gcc-python-plugin. Hence I've used Sphinx for these docs. It's trivial to build texinfo and info files from it (assuming you have sphinx installed).

So, I'd recommend that you additionally generate and build the texinfo (and possibly the HTML) source from it and check that in as a generated file. That way, no one has to have the package, and all the builds and installs just work as normal. People who want to edit those files, which will be few, would then just install that package, regenerate, and check it in. People doing spelling corrections and the like could just edit the source and let someone else regenerate as well.
Re: [PATCHv4] Vimrc config with GNU formatting
On Sep 18, 2014, at 1:40 AM, Yury Gribov y.gri...@samsung.com wrote:
> How about adding a disclaimer? E.g. beware that Vim plugins are a GAPING SECURITY HOLE, so use them at YOUR OWN RISK. (And note that Braun's plugin does use sandboxes.)

Building gcc poses a security risk at least as big as a plugin for vim. And, yes, I've built gcc in a sandbox before.
[PING] [PATCH 2/2] Add patch for debugging compiler ICEs.
Ping. On 09/11/2014 08:20 PM, Maxim Ostapenko wrote: Hi, Joseph, Thanks for your review! I've added comments for new functions and replaced POSIX subprocess interfaces with libiberty's ones. In general, when cc1 or cc1plus ICE-es, we try to reproduce the bug by running compiler 3 times and comparing stderr and stdout on each attempt with respective ones that were gotten as the result of previous compiler run (we use temporary dump files to do this). If these files are identical, we add GCC configuration (e.g. target, configure options and version), compiler command line and preprocessed source code into last dump file, containing backtrace. Following Jakub's approach, we trigger ICE_EXIT_CODE instead of FATAL_EXIT_CODE in case of DK_FATAL error to differ ICEs from other fatal errors, so try_generate_repro routine will be able to run even if fatal_error occurred in compiler. We've noticed that on rare occasion a particularly severe segfault can cause GCC to abort without ICE-ing. These (hopefully rare) errors will be missed by our patch, because SIGSEGV handler is not able to catch the signal due to corrupted stack. It could make sense to allocate separate stack for SIGSEGV handler to resolve this situation. -Maxim On 09/10/2014 08:37 PM, Joseph S. Myers wrote: On Wed, 10 Sep 2014, Jakub Jelinek wrote: On Tue, Sep 09, 2014 at 10:51:23PM +, Joseph S. Myers wrote: On Thu, 28 Aug 2014, Maxim Ostapenko wrote: diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c index 0cc7593..67b8c5b 100644 --- a/gcc/diagnostic.c +++ b/gcc/diagnostic.c @@ -492,7 +492,7 @@ diagnostic_action_after_output (diagnostic_context *context, real_abort (); diagnostic_finish (context); fnotice (stderr, compilation terminated.\n); - exit (FATAL_EXIT_CODE); + exit (ICE_EXIT_CODE); Why? This is the case for fatal_error. FATAL_EXIT_CODE seems right for this, and ICE_EXIT_CODE wrong. So that the driver can understand the difference between an ICE and other fatal errors (e.g. sorry etc.). 
Users are typically using the driver, and for them it matters what exit code is returned from the driver, not from cc1/cc1plus etc.

Well, I think the next revision of the patch submission needs more explanation in this area. What exit codes do cc1 and the driver now return for (normal error, fatal error, ICE), what do they return after the patch, and how does the change to the fatal_error case avoid incorrect changes if either cc1 or the driver called fatal_error (as opposed to either cc1 or the driver having an ICE)? Maybe that explanation should be in the form of a comment on this exit call, explaining why the counterintuitive use of ICE_EXIT_CODE in the DK_FATAL case is correct.

2014-09-04  Jakub Jelinek  ja...@redhat.com
	    Max Ostapenko  m.ostape...@partner.samsung.com

	* common.opt: New option.
	* doc/invoke.texi: Describe new option.
	* diagnostic.c (diagnostic_action_after_output): Exit with
	ICE_EXIT_CODE instead of FATAL_EXIT_CODE.
	* gcc.c (execute): Don't free first string early, but at the end
	of the function.  Call retry_ice if compiler exited with
	ICE_EXIT_CODE.
	(main): Factor out common code.
	(print_configuration): New function.
	(try_fork): Likewise.
	(redirect_stdout_stderr): Likewise.
	(files_equal_p): Likewise.
	(check_repro): Likewise.
	(run_attempt): Likewise.
	(do_report_bug): Likewise.
	(append_text): Likewise.
	(try_generate_repro): Likewise

diff --git a/gcc/common.opt b/gcc/common.opt
index 7d78803..ce71f09 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1120,6 +1120,11 @@ fdump-noaddr
 Common Report Var(flag_dump_noaddr)
 Suppress output of addresses in debugging dumps

+freport-bug
+Common Driver Var(flag_report_bug)
+Collect and dump debug information into temporary file if ICE in C/C++
+compiler occurred.
+ fdump-passes Common Var(flag_dump_passes) Init(0) Dump optimization passes diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c index 73666d6..dbc928b 100644 --- a/gcc/diagnostic.c +++ b/gcc/diagnostic.c @@ -494,7 +494,10 @@ diagnostic_action_after_output (diagnostic_context *context, real_abort (); diagnostic_finish (context); fnotice (stderr, compilation terminated.\n); - exit (FATAL_EXIT_CODE); + /* Exit with ICE_EXIT_CODE rather then FATAL_EXIT_CODE so the driver + understands the difference between an ICE and other fatal errors + (DK_SORRY and DK_ERROR). */ + exit (ICE_EXIT_CODE); default: gcc_unreachable (); diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 863b382..565421c 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -6336,6 +6336,11 @@ feasible to use diff on debugging dumps for compiler invocations with different compiler binaries and/or different text / bss / data / heap / stack / dso start locations. +@item -freport-bug +@opindex freport-bug +Collect and dump debug information into temporary file if ICE in C/C++ +compiler
Re: parallel check output changes?
On 09/18/2014 09:05 AM, Andrew MacLeod wrote: On 09/18/2014 09:01 AM, Jakub Jelinek wrote: On Thu, Sep 18, 2014 at 08:56:50AM -0400, Andrew MacLeod wrote: Have the changes that have gone into the check parallelization made the .sum file non-deterministic? I'm seeing a lot of small hunks in different orders which cause my comparison scripts to show big differences. I haven't been paying attention to the nature of the make check changes so I'm not sure if this is expected... Or is this something else? It's the same code base between runs, just with a few changes made to some include files. I'm using contrib/test_summary and haven't seen any non-determinism in the output of that command. As for dg-extract-results.sh, we have two versions of that, one if you have python 2.6 or newer, another one if you don't. Perhaps the behavior of those two (I'm using the python version probably) differs? Jakub Not sure, although I do have python 2.7.5 installed for what it's worth... I'll try another run in a bit. Andrew hum. My 3rd run (which has no compilation change from the 2nd one) is different from both other runs :-P. I did tweak my -j parameter in the make check, but that is it. Andrew
Re: [AArch64] Auto-generate the BUILTIN_ macros for aarch64-builtins.c
On Sep 18, 2014, at 3:12 AM, Richard Earnshaw rearn...@arm.com wrote: Is there any real need to write this into the source directory and have the built file checked in? Ie. can't we always write to the build directory and use it from there. I build part of my .md file from a C++ program, so I have to build that program, then generate the md file: s-mddeps: abi.md abi.md: s-abi; @true s-abi: genmd $(srcdir)/config/port/port-assist.h ./genmd tmp-abi.md $(SHELL) $(srcdir)/../move-if-change tmp-abi.md abi.md $(STAMP) s-abi genmd: $(srcdir)/config/port/genmd.c $(OPTIONS_H) s-genbuiltin touch insn-constants.h touch insn-flags.h $(COMPILER_FOR_BUILD) $(BUILD_COMPILERFLAGS) $(BUILD_CPPFLAGS) $(srcdir)/config/port/genmd.c -o genmd Certainly bash source is portable enough to just build on demand. This lets me take content that is in .h files concerning the abi and generate md constants from them.
Patch to fix PR61360
The following patch fixes the PR. The details can be found on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61360 The patch was bootstrapped and tested on x86/x86-64. Committed as rev. 215358. 2014-09-18 Vladimir Makarov vmaka...@redhat.com PR target/61360 * lra.c (lra): Call recog_init. 2014-09-18 Vladimir Makarov vmaka...@redhat.com PR target/61360 * gcc.target/i386/pr61360.c: New. Index: lra.c === --- lra.c (revision 215337) +++ lra.c (working copy) @@ -2135,6 +2135,11 @@ lra (FILE *f) lra_in_progress = 1; + /* The enable attributes can change their values as LRA starts + although it is a bad practice. To prevent reuse of the outdated + values, clear them. */ + recog_init (); + lra_live_range_iter = lra_coalesce_iter = 0; lra_constraint_iter = lra_constraint_iter_after_spill = 0; lra_inheritance_iter = lra_undo_inheritance_iter = 0; Index: testsuite/gcc.target/i386/pr61360.c === --- testsuite/gcc.target/i386/pr61360.c (revision 0) +++ testsuite/gcc.target/i386/pr61360.c (working copy) @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options -march=amdfam10 -O2 } */ +int a, b, c, e, f, g, h; +long *d; +__attribute__((cold)) void fn1() { + int i = g | 1; + for (; g; h++) { +for (; a; e++) d[0] = c; +if (0.002 * i) break; +for (; b; f++) d[h] = 0; + } +}
[jit] Build the example files from the documentation when running the testsuite
Doing this caught missing return statements in the examples. This brings the result I see in jit.sum to: # of expected passes 4286 Committed to branch dmalcolm/jit: gcc/jit/ChangeLog.jit: * docs/examples/install-hello-world.c (main): Fix missing return. * docs/examples/tut01-square.c (main): Likewise. * docs/examples/tut02-sum-of-squares.c (main): Likewise. gcc/testsuite/ChangeLog.jit: * jit.dg/jit.exp: When constructing tests, add the example files from the documentation, to ensure that they compile. (I accidentally committed this to gcc/testsuite/ChangeLog; I've fixed it up to use ChangeLog.jit in a subsequent commit) --- gcc/jit/ChangeLog.jit| 7 +++ gcc/jit/docs/examples/install-hello-world.c | 1 + gcc/jit/docs/examples/tut01-square.c | 1 + gcc/jit/docs/examples/tut02-sum-of-squares.c | 1 + gcc/testsuite/ChangeLog | 5 + gcc/testsuite/jit.dg/jit.exp | 7 +++ 6 files changed, 22 insertions(+) diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit index b42038e..7ee7ebf 100644 --- a/gcc/jit/ChangeLog.jit +++ b/gcc/jit/ChangeLog.jit @@ -1,3 +1,10 @@ +2014-09-18 David Malcolm dmalc...@redhat.com + + * docs/examples/install-hello-world.c (main): Fix missing + return. + * docs/examples/tut01-square.c (main): Likewise. + * docs/examples/tut02-sum-of-squares.c (main): Likewise. + 2014-09-17 David Malcolm dmalc...@redhat.com * docs/Makefile: New file. 
diff --git a/gcc/jit/docs/examples/install-hello-world.c b/gcc/jit/docs/examples/install-hello-world.c index c75543d..29afad9 100644 --- a/gcc/jit/docs/examples/install-hello-world.c +++ b/gcc/jit/docs/examples/install-hello-world.c @@ -100,4 +100,5 @@ main (int argc, char **argv) gcc_jit_context_release (ctxt); gcc_jit_result_release (result); + return 0; } diff --git a/gcc/jit/docs/examples/tut01-square.c b/gcc/jit/docs/examples/tut01-square.c index ddb218e..ea07b92 100644 --- a/gcc/jit/docs/examples/tut01-square.c +++ b/gcc/jit/docs/examples/tut01-square.c @@ -84,4 +84,5 @@ main (int argc, char **argv) error: gcc_jit_context_release (ctxt); gcc_jit_result_release (result); + return 0; } diff --git a/gcc/jit/docs/examples/tut02-sum-of-squares.c b/gcc/jit/docs/examples/tut02-sum-of-squares.c index e2811ac..1970a36 100644 --- a/gcc/jit/docs/examples/tut02-sum-of-squares.c +++ b/gcc/jit/docs/examples/tut02-sum-of-squares.c @@ -149,4 +149,5 @@ main (int argc, char **argv) error: gcc_jit_context_release (ctxt); gcc_jit_result_release (result); + return 0; } diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 448a7ef..942e219 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,8 @@ +2014-09-18 David Malcolm dmalc...@redhat.com + + * jit.dg/jit.exp: When constructing tests, add the example files + from the documentation, to ensure that they compile. + 2014-09-09 Bill Schmidt wschm...@linux.vnet.ibm.com * gcc.target/powerpc/swaps-p8-15.c: Remove scan-assembler-not for diff --git a/gcc/testsuite/jit.dg/jit.exp b/gcc/testsuite/jit.dg/jit.exp index 878ff4b..7986185 100644 --- a/gcc/testsuite/jit.dg/jit.exp +++ b/gcc/testsuite/jit.dg/jit.exp @@ -33,7 +33,14 @@ if ![info exists GCC_UNDER_TEST] { dg-init # Gather a list of all tests. 
+ +# Tests within the testsuite: gcc/testsuite/jit.dg/test-*.c set tests [lsort [find $srcdir/$subdir test-*.c]] + +# We also test the examples within the documentation, to ensure that +# they compile: +set tests [lsort [concat $tests [find $srcdir/../jit/docs/examples *.c]]] + verbose tests: $tests proc jit-dg-test { prog do_what extra_tool_flags } { -- 1.7.11.7
Re: Patch to fix PR61360
On Thu, Sep 18, 2014 at 12:04:30PM -0400, Vladimir Makarov wrote: The following patch fixes the PR. The details can be found on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61360 The patch was bootstrapped and tested on x86/x86-64. Committed as rev. 215358. What effect does this have on compile time? 2014-09-18 Vladimir Makarov vmaka...@redhat.com PR target/61360 * lra.c (lra): Call recog_init. 2014-09-18 Vladimir Makarov vmaka...@redhat.com PR target/61360 * gcc.target/i386/pr61360.c: New. Jakub
[patch] rename DECL_ABSTRACT to DECL_ABSTRACT_P
Similarly named DECL_ABSTRACT, DECL_ABSTRACT_ORIGIN, and DECL_ORIGIN are somewhat confusing to my poor brain. Particularly annoying is DECL_ABSTRACT which is actually a boolean, unlike the other two. Would it be OK to rename it something more sensible like DECL_ABSTRACT_P? I know this is a longstanding name, but the proposed name is clearer and virtually the same. OK for mainline? commit 2705197662689e354fe397abe907ebf3763eae2d Author: Aldy Hernandez al...@redhat.com Date: Thu Sep 18 10:06:43 2014 -0600 * cgraph.h, dbxout.c, dwarf2out.c, gimple-fold.c, lto-streamer-out.c, print-tree.c, symtab.c, tree-inline.c, tree-streamer-in.c, tree-streamer-out.c, tree.c, tree.h, varpool.c: Rename all instances of DECL_ABSTRACT to DECL_ABSTRACT_P. cp/ * class.c, decl.c, optimize.c: Rename all instances of DECL_ABSTRACT to DECL_ABSTRACT_P. lto/ * lto-symtab.c, lto.c: Rename all instances of DECL_ABSTRACT to DECL_ABSTRACT_P. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index dd76758..9c4ec45 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,11 @@ +2014-09-18 Aldy Hernandez al...@redhat.com + + * cgraph.h, dbxout.c, dwarf2out.c, gimple-fold.c, + lto-streamer-out.c, print-tree.c, symtab.c, tree-inline.c, + tree-streamer-in.c, tree-streamer-out.c, tree.c, tree.h, + varpool.c: Rename all instances of DECL_ABSTRACT to + DECL_ABSTRACT_P. 
+ 2014-09-05 David Malcolm dmalc...@redhat.com * config/arc/arc.c (arc_print_operand): Use insn method of diff --git a/gcc/cgraph.h b/gcc/cgraph.h index 030a1c7..0902fe9 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -1970,7 +1970,7 @@ symtab_node::real_symbol_p (void) { cgraph_node *cnode; - if (DECL_ABSTRACT (decl)) + if (DECL_ABSTRACT_P (decl)) return false; if (!is_a <cgraph_node *> (this)) return true; diff --git a/gcc/cp/ChangeLog b/gcc/cp/ChangeLog index 3d87231..a236569 100644 --- a/gcc/cp/ChangeLog +++ b/gcc/cp/ChangeLog @@ -1,3 +1,8 @@ +2014-09-18 Aldy Hernandez al...@redhat.com + + * class.c, decl.c, optimize.c: Rename all instances of + DECL_ABSTRACT to DECL_ABSTRACT_P. + 2014-09-05 Jason Merrill ja...@redhat.com PR c++/62659 diff --git a/gcc/cp/class.c b/gcc/cp/class.c index 09f946f..1c802b4 100644 --- a/gcc/cp/class.c +++ b/gcc/cp/class.c @@ -4580,7 +4580,7 @@ clone_function_decl (tree fn, int update_method_vec_p) } /* Note that this is an abstract function that is never emitted. */ - DECL_ABSTRACT (fn) = 1; + DECL_ABSTRACT_P (fn) = 1; } /* DECL is an in charge constructor, which is being defined. This will diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index d8fb35e..775e057 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -2262,7 +2262,7 @@ duplicate_decls (tree newdecl, tree olddecl, bool newdecl_is_friend) } /* Preserve abstractness on cloned [cd]tors. */ - DECL_ABSTRACT (newdecl) = DECL_ABSTRACT (olddecl); + DECL_ABSTRACT_P (newdecl) = DECL_ABSTRACT_P (olddecl); /* Update newdecl's parms to point at olddecl. */ for (parm = DECL_ARGUMENTS (newdecl); parm; @@ -10272,7 +10272,7 @@ grokdeclarator (const cp_declarator *declarator, clones. The decloning optimization (for space) may revert this subsequently if it determines that the clones should share a common implementation. 
*/ - DECL_ABSTRACT (decl) = 1; + DECL_ABSTRACT_P (decl) = 1; } else if (current_class_type && constructor_name_p (unqualified_id, current_class_type)) diff --git a/gcc/cp/optimize.c b/gcc/cp/optimize.c index 31acb07..f37515ec2 100644 --- a/gcc/cp/optimize.c +++ b/gcc/cp/optimize.c @@ -270,7 +270,7 @@ maybe_thunk_body (tree fn, bool force) (for non-vague linkage ctors) or the COMDAT group (otherwise). */ populate_clone_array (fn, fns); - DECL_ABSTRACT (fn) = false; + DECL_ABSTRACT_P (fn) = false; if (!DECL_WEAK (fn)) { TREE_PUBLIC (fn) = false; diff --git a/gcc/dbxout.c b/gcc/dbxout.c index d856bdd..91cedf7 100644 --- a/gcc/dbxout.c +++ b/gcc/dbxout.c @@ -1618,7 +1618,7 @@ dbxout_type_methods (tree type) /* Also ignore abstract methods; those are only interesting to the DWARF backends. */ - if (DECL_IGNORED_P (fndecl) || DECL_ABSTRACT (fndecl)) + if (DECL_IGNORED_P (fndecl) || DECL_ABSTRACT_P (fndecl)) continue; /* Redundantly output the plain name, since that's what gdb diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index 23a80d8..e3c4f98 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -3679,7 +3679,7 @@ decl_ultimate_origin (const_tree decl) /* output_inline_function sets DECL_ABSTRACT_ORIGIN for all the nodes in the function to point to themselves; ignore that if we're trying to output the abstract instance of this function. */ - if
Re: [patch] rename DECL_ABSTRACT to DECL_ABSTRACT_P
On Thu, Sep 18, 2014 at 10:11:24AM -0600, Aldy Hernandez wrote: Similarly named DECL_ABSTRACT, DECL_ABSTRACT_ORIGIN, and DECL_ORIGIN are somewhat confusing to my poor brain. Particularly annoying is DECL_ABSTRACT which is actually a boolean, unlike the other two. Would it be OK to rename it something more sensible like DECL_ABSTRACT_P? I know this is a longstanding name, but the proposed is clearer and virtually the same. OK for mainline? IMHO a good idea. --- a/gcc/cp/class.c +++ b/gcc/cp/class.c @@ -4580,7 +4580,7 @@ clone_function_decl (tree fn, int update_method_vec_p) } /* Note that this is an abstract function that is never emitted. */ - DECL_ABSTRACT (fn) = 1; + DECL_ABSTRACT_P (fn) = 1; It'd probably make sense to use 'true' now. @@ -10272,7 +10272,7 @@ grokdeclarator (const cp_declarator *declarator, clones. The decloning optimization (for space) may revert this subsequently if it determines that the clones should share a common implementation. */ - DECL_ABSTRACT (decl) = 1; + DECL_ABSTRACT_P (decl) = 1; Likewise. --- a/gcc/tree-inline.c +++ b/gcc/tree-inline.c @@ -5095,7 +5095,7 @@ copy_decl_no_change (tree decl, copy_body_data *id) copy = copy_node (decl); /* The COPY is not abstract; it will be generated in DST_FN. */ - DECL_ABSTRACT (copy) = 0; + DECL_ABSTRACT_P (copy) = 0; lang_hooks.dup_lang_specific_decl (copy); And false here. --- a/gcc/varpool.c +++ b/gcc/varpool.c @@ -704,7 +704,7 @@ add_new_static_var (tree type) TREE_STATIC (new_decl) = 1; TREE_USED (new_decl) = 1; DECL_CONTEXT (new_decl) = NULL_TREE; - DECL_ABSTRACT (new_decl) = 0; + DECL_ABSTRACT_P (new_decl) = 0; And here. Marek
Re: [PATCH] Add header guard to several header files.
Hi Joseph: Here is the updated patch and ChangeLog. However, I don't have commit rights yet; can you help me commit it? Thanks. By the way, I have already signed the FSF agreement. :) 2014-09-19 Kito Cheng k...@0xlab.org except.h: Fix header guard. addresses.h: Add missing header guard. cfghooks.h: Likewise. collect-utils.h: Likewise. collect2-aix.h: Likewise. conditions.h: Likewise. cselib.h: Likewise. dwarf2asm.h: Likewise. graphds.h: Likewise. graphite-scop-detection.h: Likewise. gsyms.h: Likewise. hw-doloop.h: Likewise. incpath.h: Likewise. ipa-inline.h: Likewise. ipa-ref.h: Likewise. ira-int.h: Likewise. ira.h: Likewise. lra-int.h: Likewise. lra.h: Likewise. lto-section-names.h: Likewise. read-md.h: Likewise. reload.h: Likewise. rtl-error.h: Likewise. sdbout.h: Likewise. targhooks.h: Likewise. tree-affine.h: Likewise. xcoff.h: Likewise. xcoffout.h: Likewise. OK except for the changes to target-def.h and target-hooks-macros.h. (Those aren't exactly normal headers that could reasonably be included more than once in a source file; they have dependencies on where they get included and what's defined before/after inclusion. So while I suspect the include guards would not cause any problems in those headers, it's not obvious they're desirable either.) From 8c7e08c00526265f21830f72c7b266fd48ddea17 Mon Sep 17 00:00:00 2001 From: Kito Cheng kito.ch...@gmail.com Date: Fri, 22 Aug 2014 17:34:49 +0800 Subject: [PATCH] Add header guard to several header files. 2014-09-19 Kito Cheng k...@0xlab.org except.h: Fix header guard. addresses.h: Add missing header guard. cfghooks.h: Likewise. collect-utils.h: Likewise. collect2-aix.h: Likewise. conditions.h: Likewise. cselib.h: Likewise. dwarf2asm.h: Likewise. graphds.h: Likewise. graphite-scop-detection.h: Likewise. gsyms.h: Likewise. hw-doloop.h: Likewise. incpath.h: Likewise. ipa-inline.h: Likewise. ipa-ref.h: Likewise. ira-int.h: Likewise. ira.h: Likewise. lra-int.h: Likewise. lra.h: Likewise. lto-section-names.h: Likewise. read-md.h: Likewise. 
reload.h: Likewise. rtl-error.h: Likewise. sdbout.h: Likewise. targhooks.h: Likewise. tree-affine.h: Likewise. xcoff.h: Likewise. xcoffout.h: Likewise. --- gcc/addresses.h | 5 + gcc/cfghooks.h| 4 gcc/collect-utils.h | 5 + gcc/collect2-aix.h| 4 gcc/conditions.h | 5 + gcc/cselib.h | 5 + gcc/dwarf2asm.h | 4 gcc/except.h | 5 +++-- gcc/graphds.h | 5 + gcc/graphite-scop-detection.h | 4 gcc/gsyms.h | 4 gcc/hw-doloop.h | 5 + gcc/incpath.h | 5 + gcc/ipa-inline.h | 5 + gcc/ipa-ref.h | 5 + gcc/ira-int.h | 5 + gcc/ira.h | 5 + gcc/lra-int.h | 5 + gcc/lra.h | 5 + gcc/lto-section-names.h | 5 + gcc/read-md.h | 5 + gcc/reload.h | 4 gcc/rtl-error.h | 5 + gcc/sdbout.h | 5 + gcc/targhooks.h | 5 + gcc/tree-affine.h | 5 + gcc/xcoff.h | 5 + gcc/xcoffout.h| 4 28 files changed, 131 insertions(+), 2 deletions(-) diff --git a/gcc/addresses.h b/gcc/addresses.h index e323b58..3f0089a 100644 --- a/gcc/addresses.h +++ b/gcc/addresses.h @@ -21,6 +21,9 @@ along with GCC; see the file COPYING3. If not see MODE_BASE_REG_REG_CLASS, MODE_BASE_REG_CLASS and BASE_REG_CLASS. Arguments as for the MODE_CODE_BASE_REG_CLASS macro. */ +#ifndef GCC_ADDRESSES_H +#define GCC_ADDRESSES_H + static inline enum reg_class base_reg_class (enum machine_mode mode ATTRIBUTE_UNUSED, addr_space_t as ATTRIBUTE_UNUSED, @@ -82,3 +85,5 @@ regno_ok_for_base_p (unsigned regno, enum machine_mode mode, addr_space_t as, return ok_for_base_p_1 (regno, mode, as, outer_code, index_code); } + +#endif /* GCC_ADDRESSES_H */ diff --git a/gcc/cfghooks.h b/gcc/cfghooks.h index 8ff808c..1b8587a 100644 --- a/gcc/cfghooks.h +++ b/gcc/cfghooks.h @@ -18,6 +18,9 @@ You should have received a copy of the GNU General Public License along with GCC; see the file COPYING3. If not see http://www.gnu.org/licenses/. */ +#ifndef GCC_CFGHOOKS_H +#define GCC_CFGHOOKS_H + /* Only basic-block.h includes this. 
*/ struct cfg_hooks @@ -221,3 +224,4 @@ extern void gimple_register_cfg_hooks (void); extern struct cfg_hooks get_cfg_hooks (void); extern void set_cfg_hooks (struct cfg_hooks); +#endif /* GCC_CFGHOOKS_H */ diff --git a/gcc/collect-utils.h b/gcc/collect-utils.h index 2989c6b..ba1985e 100644 --- a/gcc/collect-utils.h +++ b/gcc/collect-utils.h @@ -17,6 +17,9 @@ You should have received a copy of the
Re: [patch] rename DECL_ABSTRACT to DECL_ABSTRACT_P
- DECL_ABSTRACT (fn) = 1; + DECL_ABSTRACT_P (fn) = 1; It'd probably make sense to use 'true' now. I thought about it, but I wanted to change as little as possible, plus I wanted to follow the same style as what we've been doing for a lot of the _P macros: DECL_HAS_VALUE_EXPR_P (t) = 1; DECL_HAS_DEBUG_ARGS_P (from) = 1; DECL_IGNORED_P (lab) = 1; TREE_PUBLIC (decl) = 1; CONSTANT_POOL_ADDRESS_P (symbol) = 1; etc, etc. But I am happy to change it, if people feel strongly about it. (Though I'm not volunteering to change the other umteenhundred _P macros that currently use 0/1 ;-)). Aldy
Re: [patch] rename DECL_ABSTRACT to DECL_ABSTRACT_P
On Thu, Sep 18, 2014 at 10:30:30AM -0600, Aldy Hernandez wrote: - DECL_ABSTRACT (fn) = 1; + DECL_ABSTRACT_P (fn) = 1; It'd probably make sense to use 'true' now. I thought about it, but I wanted to change as little as possible, plus I wanted to follow the same style as what we've been doing for a lot of the _P macros: DECL_HAS_VALUE_EXPR_P (t) = 1; DECL_HAS_DEBUG_ARGS_P (from) = 1; DECL_IGNORED_P (lab) = 1; TREE_PUBLIC (decl) = 1; CONSTANT_POOL_ADDRESS_P (symbol) = 1; etc, etc. But I am happy to change it, if people feel strongly about it. (Though I'm not volunteering to change the other umteenhundred _P macros that currently use 0/1 ;-)). Yeah, sure, either way it's a good cleanup ;). Marek
[patch] update comments on *_ultimate_origin
output_inline_function was removed in tree-ssa times; there's no sense referencing it a decade later. I still see DECL_ABSTRACT_ORIGIN pointing to itself in some instances, though I haven't tracked down where, so I assume we still need the functionality, just not the comment :). OK for mainline? Aldy commit d51576de0a8450634ff7622e4688fd02fc8fcee9 Author: Aldy Hernandez al...@redhat.com Date: Thu Sep 18 10:35:30 2014 -0600 * dwarf2out.c (decl_ultimate_origin): Update comment. * tree.c (block_ultimate_origin): Same. diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index 23a80d8..c65c756 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -3676,8 +3676,7 @@ decl_ultimate_origin (const_tree decl) if (!CODE_CONTAINS_STRUCT (TREE_CODE (decl), TS_DECL_COMMON)) return NULL_TREE; - /* output_inline_function sets DECL_ABSTRACT_ORIGIN for all the - nodes in the function to point to themselves; ignore that if + /* DECL_ABSTRACT_ORIGIN can point to itself; ignore that if we're trying to output the abstract instance of this function. */ if (DECL_ABSTRACT (decl) && DECL_ABSTRACT_ORIGIN (decl) == decl) return NULL_TREE; diff --git a/gcc/tree.c b/gcc/tree.c index d1d67ef..fc544de 100644 --- a/gcc/tree.c +++ b/gcc/tree.c @@ -11554,8 +11554,7 @@ block_ultimate_origin (const_tree block) { tree immediate_origin = BLOCK_ABSTRACT_ORIGIN (block); - /* output_inline_function sets BLOCK_ABSTRACT_ORIGIN for all the - nodes in the function to point to themselves; ignore that if + /* BLOCK_ABSTRACT_ORIGIN can point to itself; ignore that if we're trying to output the abstract instance of this function. */ if (BLOCK_ABSTRACT (block) && immediate_origin == block) return NULL_TREE;
Re: [PATCH] Put all constants last in tree_swap_operands_p, remove odd -Os check
We've been seeing errors using aarch64-none-linux-gnu gcc to build the 403.gcc benchmark from spec2k6, that we've traced back to this patch. The error looks like: /home/alalaw01/bootstrap_richie/gcc/xgcc -B/home/alalaw01/bootstrap_richie/gcc -O3 -mcpu=cortex-a57.cortex-a53 -DSPEC_CPU_LP64alloca.o asprintf.o vasprintf.o c-parse.o c-lang.o attribs.o c-errors.o c-lex.o c-pragma.o c-decl.o c-typeck.o c-convert.o c-aux-info.o c-common.o c-format.o c-semantics.o c-objc-common.o main.o cpplib.o cpplex.o cppmacro.o cppexp.o cppfiles.o cpphash.o cpperror.o cppinit.o cppdefault.o line-map.o mkdeps.o prefix.o version.o mbchar.o alias.o bb-reorder.o bitmap.o builtins.o caller-save.o calls.o cfg.o cfganal.o cfgbuild.o cfgcleanup.o cfglayout.o cfgloop.o cfgrtl.o combine.o conflict.o convert.o cse.o cselib.o dbxout.o debug.o dependence.o df.o diagnostic.o doloop.o dominance.o dwarf2asm.o dwarf2out.o dwarfout.o emit-rtl.o except.o explow.o expmed.o expr.o final.o flow.o fold-const.o function.o gcse.o genrtl.o ggc-common.o global.o graph.o haifa-sched.o hash.o hashtable.o hooks.o ifcvt.o insn-attrtab.o insn-emit.o insn-extract.o insn-opinit.o insn-output.o insn-peep.o insn-recog.o integrate.o intl.o jump.o langhooks.o lcm.o lists.o local-alloc.o loop.o obstack.o optabs.o params.o predict.o print-rtl.o print-tree.o profile.o real.o recog.o reg-stack.o regclass.o regmove.o regrename.o reload.o reload1.o reorg.o resource.o rtl.o rtlanal.o rtl-error.o sbitmap.o sched-deps.o sched-ebb.o sched-rgn.o sched-vis.o sdbout.o sibcall.o simplify-rtx.o ssa.o ssa-ccp.o ssa-dce.o stmt.o stor-layout.o stringpool.o timevar.o toplev.o tree.o tree-dump.o tree-inline.o unroll.o varasm.o varray.o vmsdbgout.o xcoffout.o ggc-page.o i386.o xmalloc.o xexit.o hashtab.o safe-ctype.o splay-tree.o xstrdup.o md5.o fibheap.o xstrerror.o concat.o partition.o hex.o lbasename.o getpwd.o ucbqsort.o -lm-o gcc emit-rtl.o: In function `gen_rtx_REG': emit-rtl.c:(.text+0x12f8): relocation truncated to fit: 
R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON section in regclass.o emit-rtl.o: In function `gen_rtx': emit-rtl.c:(.text+0x1824): relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON section in regclass.o collect2: error: ld returned 1 exit status specmake: *** [gcc] Error 1 Error with make 'specmake -j7 build': check file '/home/alalaw01/spectest/benchspec/CPU2006/403.gcc/build/build_base_test./make.err' Command returned exit code 2 Error with make! *** Error building 403.gcc Inspecting the compiled emit-rtl.o shows: $ readelf --relocs good/emit-rtl.o | grep fixed_regs 12a8 005d0113 R_AARCH64_ADR_PRE fixed_regs + 0 12ac 005d0115 R_AARCH64_ADD_ABS fixed_regs + 0 1800 005d0113 R_AARCH64_ADR_PRE fixed_regs + 0 1804 005d0115 R_AARCH64_ADD_ABS fixed_regs + 0 (that's compiled with a gcc just before this patch), contrastingly using a gcc with that patch: $ readelf --relocs bad/emit-rtl.o | grep fixed_regs 12a8 005d0113 R_AARCH64_ADR_PRE fixed_regs + 0 12ac 005d0115 R_AARCH64_ADD_ABS fixed_regs + 0 12f8 005d0113 R_AARCH64_ADR_PRE fixed_regs + 12fc 005d0116 R_AARCH64_LDST8_A fixed_regs + 1824 005d0113 R_AARCH64_ADR_PRE fixed_regs + 1828 005d0116 R_AARCH64_LDST8_A fixed_regs + 186c 005d0113 R_AARCH64_ADR_PRE fixed_regs + 0 1870 005d0115 R_AARCH64_ADD_ABS fixed_regs + 0 I attach a candidate 'fix', which allows building of 403.gcc on aarch64-none-linux-gnu, full regression etc ongoing. (I admit there may be better options in terms of canonicalizing if you want to!) --Alan Richard Biener wrote: The following makes tree_swap_operands_p put all constants 2nd place, also looks through sign-changes when considering further canonicalzations and removes the odd -Os guard for those. That was put in with https://gcc.gnu.org/ml/gcc-patches/2003-10/msg01208.html just motivated by CSiBE numbers - but rather than disabling canonicalization this should have disabled the actual harmful transforms. 
Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu. Richard. 2014-08-15 Richard Biener rguent...@suse.de * fold-const.c (tree_swap_operands_p): Put all constants last, also strip sign-changing NOPs when considering further canonicalization. Canonicalize also when optimizing for size. Index: gcc/fold-const.c === --- gcc/fold-const.c(revision 214007) +++ gcc/fold-const.c(working copy) @@ -6642,37 +6650,19 @@ reorder_operands_p (const_tree arg0,
[PATCH, rs6000] Rename GCC version in warning messages
Hello, the ABI warning messages I introduced in recent patches refer to a GCC version 4.10. As GCC has since adopted a new version naming scheme, this patch updates those messages to refer to GCC 5 instead. Tested on powerpc64le-linux. OK for mainline? Bye, Ulrich ChangeLog: * config/rs6000/rs6000.c (rs6000_special_adjust_field_align_p): Update GCC version name to GCC 5. (rs6000_function_arg_boundary): Likewise. (rs6000_function_arg): Likewise. Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 215355) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -5939,7 +5939,7 @@ warned = true; inform (input_location, the layout of aggregates containing vectors with - %d-byte alignment has changed in GCC 4.10, + %d-byte alignment has changed in GCC 5, computed / BITS_PER_UNIT); } } @@ -9307,7 +9307,7 @@ warned = true; inform (input_location, the ABI of passing aggregates with %d-byte alignment - has changed in GCC 4.10, + has changed in GCC 5, (int) TYPE_ALIGN (type) / BITS_PER_UNIT); } } @@ -10428,7 +10428,7 @@ warned = true; inform (input_location, the ABI of passing homogeneous float aggregates - has changed in GCC 4.10); + has changed in GCC 5); } } -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
[committed] Fix Crash when OpenMP target's array section handling is used with templates (PR c++/63248)
On Wed, Sep 10, 2014 at 05:21:07PM +0200, Thomas Schwinge wrote: Hi! On Wed, 10 Sep 2014 12:23:04 +0200, Jakub Jelinek ja...@redhat.com wrote: On Wed, Sep 10, 2014 at 12:12:03PM +0200, Thomas Schwinge wrote: Are the following issues known? No, please file a PR. Will do tomorrow. Here is a fix I've committed to trunk/4.9 after testing on x86_64-linux. 2014-09-18 Jakub Jelinek ja...@redhat.com PR c++/63248 * semantics.c (finish_omp_clauses): Don't call cp_omp_mappable_type on type of type dependent expressions, and don't call it if handle_omp_array_sections has kept TREE_LIST because something was type dependent. * pt.c (tsubst_expr) case OMP_TARGET, case OMP_TARGET_DATA: Use keep_next_level, begin_omp_structured_block and finish_omp_structured_block instead of push_stmt_list and pop_stmt_list. libgomp/ * testsuite/libgomp.c++/pr63248.C: New test. --- gcc/cp/semantics.c.jj 2014-09-17 21:01:11.0 +0200 +++ gcc/cp/semantics.c 2014-09-18 17:05:19.785988633 +0200 @@ -5668,7 +5668,9 @@ finish_omp_clauses (tree clauses) else { t = OMP_CLAUSE_DECL (c); - if (!cp_omp_mappable_type (TREE_TYPE (t))) + if (TREE_CODE (t) != TREE_LIST + && !type_dependent_expression_p (t) + && !cp_omp_mappable_type (TREE_TYPE (t))) { error_at (OMP_CLAUSE_LOCATION (c), array section does not have mappable type @@ -5708,6 +5710,7 @@ finish_omp_clauses (tree clauses) remove = true; else if (!(OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP && OMP_CLAUSE_MAP_KIND (c) == OMP_CLAUSE_MAP_POINTER) + && !type_dependent_expression_p (t) && !cp_omp_mappable_type ((TREE_CODE (TREE_TYPE (t)) == REFERENCE_TYPE) ? 
TREE_TYPE (TREE_TYPE (t)) --- gcc/cp/pt.c.jj 2014-09-16 10:00:35.0 +0200 +++ gcc/cp/pt.c 2014-09-18 18:12:09.804850925 +0200 @@ -14089,8 +14089,6 @@ tsubst_expr (tree t, tree args, tsubst_f case OMP_SECTIONS: case OMP_SINGLE: case OMP_TEAMS: -case OMP_TARGET_DATA: -case OMP_TARGET: tmp = tsubst_omp_clauses (OMP_CLAUSES (t), false, args, complain, in_decl); stmt = push_stmt_list (); @@ -14099,6 +14097,22 @@ tsubst_expr (tree t, tree args, tsubst_f t = copy_node (t); OMP_BODY (t) = stmt; + OMP_CLAUSES (t) = tmp; + add_stmt (t); + break; + +case OMP_TARGET_DATA: +case OMP_TARGET: + tmp = tsubst_omp_clauses (OMP_CLAUSES (t), false, + args, complain, in_decl); + keep_next_level (true); + stmt = begin_omp_structured_block (); + + RECUR (OMP_BODY (t)); + stmt = finish_omp_structured_block (stmt); + + t = copy_node (t); + OMP_BODY (t) = stmt; OMP_CLAUSES (t) = tmp; add_stmt (t); break; --- libgomp/testsuite/libgomp.c++/pr63248.C.jj 2014-09-18 18:19:49.806529990 +0200 +++ libgomp/testsuite/libgomp.c++/pr63248.C 2014-09-18 18:18:58.0 +0200 @@ -0,0 +1,62 @@ +// PR c++/63248 +// { dg-do run } + +int *v; + +template <typename T> +T +foo (T A, T B) +{ + T a = 2; + T b = 4; + +#pragma omp target map(v[a:b]) + v[a] = 1; + +#pragma omp target map(v[A:B]) + v[a] = 2; + +#pragma omp target map(A) + A = 19; + return A; +} + +template <int N> +int +bar (int A, int B) +{ +#pragma omp target map(A) + A = 8; + if (A != 8) +__builtin_abort (); +#pragma omp target map(A, B) + { +A = 1; +B = 2; + } + return A + B; +} + +int +baz (int A, int B) +{ +#pragma omp target map(A) + A = 8; + if (A != 8) +__builtin_abort (); +#pragma omp target map(A, B) + { +A = 1; +B = 2; + } + return A + B; +} + +int +main () +{ + int a[10] = { 0 }; + v = a; + if (foo (1, 5) != 19 || v[2] != 2 || bar<0> (5, 7) != 3 || baz (5, 7) != 3) +__builtin_abort (); +} Jakub
Re: Patch to fix PR61360
On 09/18/2014 12:10 PM, Jakub Jelinek wrote: On Thu, Sep 18, 2014 at 12:04:30PM -0400, Vladimir Makarov wrote: The following patch fixes the PR. The details can be found on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61360 The patch was bootstrapped and tested on x86/x86-64. Committed as rev. 215358. What effect does this have on compile time? It is hard to measure real time but 0.05% according to valgrind lackey on combine.i for -O2
Re: [PATCH i386 AVX512] [42/n] Add masked vunpck[lh]pd.
On Thu, Sep 18, 2014 at 1:47 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, Patch in the bottom extends/adds patterns for masked unpack instructions. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_insn "avx_unpckhpd256<mask_name>"): Add masking. (define_insn "avx512vl_unpckhpd128_mask"): New. (define_expand "avx_movddup256<mask_name>"): Add masking. (define_expand "avx_unpcklpd256<mask_name>"): Ditto. (define_insn "*avx_unpcklpd256<mask_name>"): Ditto. (define_insn "avx512vl_unpcklpd128_mask"): New. OK. Thanks, Uros.
Re: [PATCH i386 AVX512] [43/n] Add rest of vunpck[lh]ps.
On Thu, Sep 18, 2014 at 1:54 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, This patch adds rest of unpack insn patterns. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_insn "avx_unpckhps256<mask_name>"): Add masking. (define_insn "vec_interleave_highv4sf<mask_name>"): Ditto. (define_insn "avx_unpcklps256<mask_name>"): Ditto. (define_insn "unpcklps128_mask"): New. OK. Thanks, Uros.
[patch] normalize the x86-vxworks port organization
Hello, VxWorks ports typically come in two flavors: regular VxWorks and VxWorksAE (653). In most cases, cpu/vxworks.h is used as a common configuration file for the two flavors and cpu/vxworksae.h overrides/adds on top of that. There are also config/vx*.h shared by everybody. The x86 port departs from this scheme, with an i386/vx-common.h file. The attached patch is a proposal to bring the x86 port organization in line with what is done for other CPUs. It essentially - moves the contents of i386/vx-common.h within i386/vxworks.h, - removes i386/vx-common.h - adjusts config.gcc accordingly The patch takes the opportunity to - cleanup i386/vxworksae.h, removing redundant or obsolete definitions and putting the one we use wrt stack-checking support for this platform. We (AdaCore) have been using this successfully for a while on gcc-4.7 and recently on gcc-4.9, for both VxWorks6 and VxWorksAE targets. The patch attached here applies on mainline and passes make all-gcc for --target=i686-wrs-vxworksae --enable-languages=c OK to commit? Thanks in advance for your feedback, With Kind Regards, Olivier 2014-09-18 Olivier Hainque hain...@adacore.com * config/i386/vxworksae.h: Remove obsolete definitions. (STACK_CHECK_PROTECT): Define. * config/i386/vx-common.h: Remove. Merge contents within config/i386/vxworks.h. * config.gcc (i?86-vxworks*): Use i386/vxworks.h instead of i386/vx-common.h. cleanup-x86vx653.diff Description: Binary data
Re: [PATCH i386 AVX512] [44/n] Add vshufps insn patterns.
On Thu, Sep 18, 2014 at 1:59 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, Patch in the bottom extends AVX-512 shufps. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_expand "avx_shufps256<mask_expand4_name>"): Add masking. (define_insn "avx_shufps256_1<mask_name>"): Ditto. (define_expand "sse_shufps<mask_expand4_name>"): Ditto. (define_insn "sse_shufps_v4sf_mask"): New. OK. Thanks, Uros.
Re: [PATCH i386 AVX512] [45/n] Add vshufpd insn patterns.
On Thu, Sep 18, 2014 at 2:02 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, This patch supports AVX-512's vshufpd insns. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_expand "avx_shufpd256<mask_expand4_name>"): Add masking. (define_insn "avx_shufpd256_1<mask_name>"): Ditto. (define_expand "sse2_shufpd<mask_expand4_name>"): Ditto. (define_insn "sse2_shufpd_v2df_mask"): New. OK. Thanks, Uros.
Re: [PATCHv4] Vimrc config with GNU formatting
On Thu, Sep 18, 2014 at 12:40:08PM +0400, Yury Gribov wrote: When typing 'make .local.vimrc' in GCC build directory one would expect .local.vimrc to be created at the root of build directory, not srcdir. Yes, you would not expect it to do anything to your source dir, ever :-) + To enable this for GCC files by default, install thinca's vim-localrc + plugin and do + $ make .local.vimrc No, we should *not* advertise an enough-rope solution without mentioning it *will* kill you. How about adding a disclaimer? E.g. beware that Vim plugins are a GAPING SECURITY HOLE so use them at YOUR OWN RISK. (And note that Braun's plugin does use sandboxes). This *particular* plugin is suicidal. Most plugins are just fine. Or not mention it at all. Esp. since your next option has all the same functionality and more. It lacks very important functionality: the user has to specify the path to a concrete GCC source tree when adding the autocmd. I was talking about mbr's plugin here :-) I have a dozen trees on my box and I regularly rename, move or copy them. With plugins one doesn't have to bother fixing paths in ~/.vimrc, which is important for productivity. And :au bufread ~/src/gcc/* ... works for me. To each their own. + Or if you dislike plugins, add autocmd in your ~/.vimrc: + :au BufNewFile,BufReadPost path/to/gcc/* :so path/to/gcc/contrib/vimrc There are many more reasons than just dislike of plugins to prefer something like this. For one thing, many Vim users will have many similar statements in their config _already_. So if you don't want to use plugins? Just mention it as another option? Something like You can add these options to your .vimrc; or you can :source this script file; or do either with an :autocmd; or use e.g. the <name of plugin here> plugin <some vim.org url>. Don't say do X if Y; let people decide for themselves what fits their situation best. + Or just source file manually every time if you are masochist: + :so path/to/gcc/contrib/vimrc How is that masochist? 
Typing that cino by hand though, now that would qualify ;-) Note that the user has to type the source command for every newly opened file. This indeed looks inconvenient (to me again). Well for most people it is just :so contrib/vimrc. Or just :lo if you're talking about crazy people with views. +setlocal cinoptions=2s,n-s,{s,^-s,:s,=s,g0,f0,hs,p2s,t0,+s,(0,u0,w1,m0 If you write this as absolute numbers instead of as shift widths, you do not need to force sw and sts settings down people's throat. It might also be easier to read? Well I doubt that, but it will be slightly shorter at least. IMHO matching shiftwidth with GNU indent may be useful. E.g. Vim won't reindent when you start editing an empty line and the user will have to insert indents manually. Also replacing offsets with numbers hides the fact that they are based on GNU shiftwidth. I have no idea what you mean with matching with GNU indent, sorry. I was suggesting you could write it as :set cino=4,n-2,{2,^-2,:2,=2,g0,f0,h2,p4,t0,+2,(0,u0,w1,m0 and you'd be independent of sw setting. The coding standard says indent two spaces etc. anyway. And yeah sw=2 does make sense for editing GCC, if you are used to sw=2 that is. The point is that the sw setting has nothing to do with what your text will look like, only with what keys you press. +setlocal textwidth=79 The coding conventions say maximum line length is 80. From https://www.gnu.org/prep/standards/html_node/Formatting.html : Please keep the length of source lines to 79 characters or less, for maximum readability in the widest range of environments. There is a doc on gcc.gnu.org as well, which describes many more details. Now we rarely do violate textwidth in our codes, rarely? Ho hum. There are many worse formatting errors, of course. And how much do those matter _really_. And you do not enable t (also on by default), so you do not want to wrap text anyway? Confused now. Me as well, the original config author did it that way. IMHO +t makes sense here. 
It is certainly more consistent. Segher
Re: [PATCH, i386, Pointer Bounds Checker 30/x] Size relocation
On Thu, Sep 18, 2014 at 4:00 PM, Ilya Enkovich enkovich@gmail.com wrote: On 17 Sep 20:51, Uros Bizjak wrote: On Wed, Sep 17, 2014 at 8:35 PM, Ilya Enkovich enkovich@gmail.com wrote: On 16 Sep 12:22, Uros Bizjak wrote: On Tue, Sep 16, 2014 at 11:37 AM, Ilya Enkovich enkovich@gmail.com wrote: 2014-09-16 13:08 GMT+04:00 Uros Bizjak ubiz...@gmail.com: Can x86_64_immediate_operand predicate be used here? I think it cannot be used because of TLS symbols not counting as immediate. OK, please introduce a new predicate, similar to x86_64_immediate_operand, perhaps x86_64_immediate_size_operand, so we can add some comments there. This will also help to macroize the insn, x86_64_immediate_operand has !TARGET_64BIT shortcut for this case. Uros. I don't see how new predicate would help to macroize insn. Single template may look as following patch. You put early return for !TARGET_64BIT. Please see x86_64_immediate_operand predicate. So, /* Here comes comment. */ (define_predicate "x86_64_immediate_size_operand" (match_code "symbol_ref") { if (!TARGET_64BIT) return true; /* Comment here explaining these conditions. */ return (ix86_cmodel == CM_SMALL || ix86_cmodel == CM_KERNEL); }) And then in the pattern itself: if (x86_64_immediate_size_operand (operands[1], VOIDmode)) return "mov{l}\t{%1@SIZE, %k0|%k0, %1@SIZE}"; else return "movabs{q}\t{%1@SIZE, %0|%0, %1@SIZE}"; Uros. Here is a version with check in a form you suggest. Thanks, Ilya -- 2014-09-18 Ilya Enkovich ilya.enkov...@intel.com * config/i386/i386.md (UNSPEC_SIZEOF): New. (move_size_reloc_<mode>): New. * config/i386/predicates.md (symbol_operand): New. (x86_64_immediate_size_operand): New. OK. We are always on the safe side now, movl is an optimization exception. I wonder if we can also add something like || (ix86_cmodel == CM_MEDIUM && !SYMBOL_REF_FAR_ADDR_P (op))); as is the case with x86_64_immediate_operand, but I am not sure that object size is guaranteed to fit in 31 bits. Maybe Honza (CC'd) can confirm this. Uros. 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 2c367b2..db22b06 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -79,6 +79,7 @@ UNSPEC_PLTOFF UNSPEC_MACHOPIC_OFFSET UNSPEC_PCREL + UNSPEC_SIZEOF ;; Prologue support UNSPEC_STACK_ALLOC @@ -18554,6 +18555,21 @@ "bndstx\t{%2, %3|%3, %2}" [(set_attr "type" "mpxst")]) +(define_insn "move_size_reloc_<mode>" + [(set (match_operand:SWI48 0 "register_operand" "=r") + (unspec:SWI48 +[(match_operand:SWI48 1 "symbol_operand")] +UNSPEC_SIZEOF))] + "TARGET_MPX" +{ + if (x86_64_immediate_size_operand (operands[1], VOIDmode)) +return "mov{l}\t{%1@SIZE, %k0|%k0, %1@SIZE}"; + else +return "movabs{q}\t{%1@SIZE, %0|%0, %1@SIZE}"; +} + [(set_attr "type" "imov") + (set_attr "mode" "<MODE>")]) + (include "mmx.md") (include "sse.md") (include "sync.md") diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md index cd542b7..da01c9a 100644 --- a/gcc/config/i386/predicates.md +++ b/gcc/config/i386/predicates.md @@ -124,6 +124,10 @@ (match_test "TARGET_64BIT") (match_test "REGNO (op) > BX_REG"))) +;; Return true if VALUE is symbol reference +(define_predicate "symbol_operand" + (match_code "symbol_ref")) + ;; Return true if VALUE can be stored in a sign extended immediate field. (define_predicate "x86_64_immediate_operand" (match_code "const_int,symbol_ref,label_ref,const") @@ -336,6 +340,19 @@ return false; }) +;; Return true if size of VALUE can be stored in a sign +;; extended immediate field. +(define_predicate "x86_64_immediate_size_operand" + (match_code "symbol_ref") +{ + if (!TARGET_64BIT) +return true; + + /* For 64 bit target we may assume size of object fits + immediate only when code model guarantees that. */ + return (ix86_cmodel == CM_SMALL || ix86_cmodel == CM_KERNEL); +}) + ;; Return true if OP is general operand representable on x86_64. (define_predicate "x86_64_general_operand" (if_then_else (match_test "TARGET_64BIT")
Re: parallel check output changes?
On 09/18/2014 05:03 PM, Andrew MacLeod wrote: On 09/18/2014 09:05 AM, Andrew MacLeod wrote: On 09/18/2014 09:01 AM, Jakub Jelinek wrote: On Thu, Sep 18, 2014 at 08:56:50AM -0400, Andrew MacLeod wrote: Have the changes that have gone into the check parallelization made the .sum file non-deterministic? I'm seeing a lot of small hunks in different orders which cause my comparison scripts to show big differences. I haven't been paying attention to the nature of the make check changes so I'm not sure if this is expected... Or is this something else? It's the same code base between runs, just with a few changes made to some include files. I'm using contrib/test_summary and haven't seen any non-determinisms in the output of that command. As for dg-extract-results.sh, we have two versions of that, one if you have python 2.6 or newer, another one if you don't. Perhaps the behavior of those two (I'm using the python version probably) differs? Jakub Not sure, although I do have python 2.7.5 installed for what it's worth... I'll try another run in a bit. Andrew hum. My 3rd run (which has no compilation change from the 2nd one) is different from both other runs :-P. I did tweak my -j parameter in the make check, but that is it. I'm also seeing this. Python 3.3.5 here. Bernd
[jit] Markup fixes within documentation
Committed to branch dmalcolm/jit: gcc/jit/ChangeLog.jit: * docs/intro/install.rst: Markup fixes. * docs/intro/tutorial01.rst: Likewise. * docs/intro/tutorial02.rst: Likewise. * docs/topics/contexts.rst: Likewise. * docs/topics/expressions.rst: Likewise. * docs/topics/functions.rst: Likewise. * docs/topics/locations.rst: Likewise. * docs/topics/types.rst: Likewise. --- gcc/jit/ChangeLog.jit | 11 +++ gcc/jit/docs/intro/install.rst | 32 +++- gcc/jit/docs/intro/tutorial01.rst | 8 +--- gcc/jit/docs/intro/tutorial02.rst | 4 +++- gcc/jit/docs/topics/contexts.rst| 16 +--- gcc/jit/docs/topics/expressions.rst | 34 +- gcc/jit/docs/topics/functions.rst | 28 +--- gcc/jit/docs/topics/locations.rst | 8 ++-- gcc/jit/docs/topics/types.rst | 16 9 files changed, 119 insertions(+), 38 deletions(-) diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit index 7ee7ebf..11c9298 100644 --- a/gcc/jit/ChangeLog.jit +++ b/gcc/jit/ChangeLog.jit @@ -1,5 +1,16 @@ 2014-09-18 David Malcolm dmalc...@redhat.com + * docs/intro/install.rst: Markup fixes. + * docs/intro/tutorial01.rst: Likewise. + * docs/intro/tutorial02.rst: Likewise. + * docs/topics/contexts.rst: Likewise. + * docs/topics/expressions.rst: Likewise. + * docs/topics/functions.rst: Likewise. + * docs/topics/locations.rst: Likewise. + * docs/topics/types.rst: Likewise. + +2014-09-18 David Malcolm dmalc...@redhat.com + * docs/examples/install-hello-world.c (main): Fix missing return. * docs/examples/tut01-square.c (main): Likewise. diff --git a/gcc/jit/docs/intro/install.rst b/gcc/jit/docs/intro/install.rst index fc2e96e..1a39192 100644 --- a/gcc/jit/docs/intro/install.rst +++ b/gcc/jit/docs/intro/install.rst @@ -41,12 +41,14 @@ your system. Having done this, sudo yum install libgccjit-devel should give you both the JIT library (`libgccjit`) and the header files -needed to develop against it (`libgccjit-devel`):: +needed to develop against it (`libgccjit-devel`): - [david@c64 ~]$ rpm -qlv libgccjit +.. 
code-block:: console + + $ rpm -qlv libgccjit lrwxrwxrwx1 rootroot 18 Aug 12 07:56 /usr/lib64/libgccjit.so.0 - libgccjit.so.0.0.1 -rwxr-xr-x1 rootroot 14463448 Aug 12 07:57 /usr/lib64/libgccjit.so.0.0.1 - [david@c64 ~]$ rpm -qlv libgccjit-devel + $ rpm -qlv libgccjit-devel -rwxr-xr-x1 rootroot37654 Aug 12 07:56 /usr/include/libgccjit++.h -rwxr-xr-x1 rootroot28967 Aug 12 07:56 /usr/include/libgccjit.h lrwxrwxrwx1 rootroot 14 Aug 12 07:56 /usr/lib64/libgccjit.so - libgccjit.so.0 @@ -103,7 +105,9 @@ To build it (within the jit/build subdirectory, installing to On my 4-core laptop this takes 17 minutes and 1.1G of disk space (it's much faster with many cores and a corresponding -j setting). -This should build a libgccjit.so within jit/build/gcc:: +This should build a libgccjit.so within jit/build/gcc: + +.. code-block:: console [build] $ file gcc/libgccjit.so* gcc/libgccjit.so: symbolic link to `libgccjit.so.0' @@ -126,14 +130,18 @@ earlier) via: On my laptop this uses a further 0.4G of disk space. You should be able to see the header files within the `include` -subdirectory of the installation prefix:: +subdirectory of the installation prefix: + +.. code-block:: console $ find $PREFIX/include /home/david/gcc-jit/install/include /home/david/gcc-jit/install/include/libgccjit.h /home/david/gcc-jit/install/include/libgccjit++.h -and the library within the `lib` subdirectory:: +and the library within the `lib` subdirectory: + +.. code-block:: console $ find $PREFIX/lib/libgccjit.* /home/david/gcc-jit/install/lib/libgccjit.so @@ -152,7 +160,9 @@ a call to `printf` and use it to write a message to stdout. Copy it to `jit-hello-world.c`. -To build it with prebuilt packages, use:: +To build it with prebuilt packages, use: + +.. 
code-block:: console $ gcc \ jit-hello-world.c \ @@ -165,7 +175,9 @@ To build it with prebuilt packages, use:: If building against an locally-built install (to $PREFIX), specify the -include and library paths with -I and -L:: +include and library paths with -I and -L: + +.. code-block:: console $ gcc \ jit-hello-world.c \ @@ -173,7 +185,9 @@ include and library paths with -I and -L:: -lgccjit \ -I$PREFIX/include -L$PREFIX/lib -and when running, specify the dynamic linkage path via LD_LIBRARY_PATH:: +and when running, specify the dynamic linkage path via LD_LIBRARY_PATH: + +.. code-block:: console $ LD_LIBRARY_PATH=$PREFIX/lib ./jit-hello-world hello world diff --git
Re: [PATCH, i386, Pointer Bounds Checker 31/x] Pointer Bounds Checker builtins for i386 target
On Thu, Sep 18, 2014 at 3:47 PM, Ilya Enkovich enkovich@gmail.com wrote: Thanks for your comments. Below is a fixed verison. Ilya -- 2014-09-17 Ilya Enkovich ilya.enkov...@intel.com * config/i386/i386-builtin-types.def (BND): New. (ULONG): New. (BND_FTYPE_PCVOID_ULONG): New. (VOID_FTYPE_BND_PCVOID): New. (VOID_FTYPE_PCVOID_PCVOID_BND): New. (BND_FTYPE_PCVOID_PCVOID): New. (BND_FTYPE_PCVOID): New. (BND_FTYPE_BND_BND): New. (PVOID_FTYPE_PVOID_PVOID_ULONG): New. (PVOID_FTYPE_PCVOID_BND_ULONG): New. (ULONG_FTYPE_VOID): New. (PVOID_FTYPE_BND): New. * config/i386/i386.c: Include tree-chkp.h, rtl-chkp.h. (ix86_builtins): Add IX86_BUILTIN_BNDMK, IX86_BUILTIN_BNDSTX, IX86_BUILTIN_BNDLDX, IX86_BUILTIN_BNDCL, IX86_BUILTIN_BNDCU, IX86_BUILTIN_BNDRET, IX86_BUILTIN_BNDNARROW, IX86_BUILTIN_BNDINT, IX86_BUILTIN_SIZEOF, IX86_BUILTIN_BNDLOWER, IX86_BUILTIN_BNDUPPER. (builtin_isa): Add leaf_p and nothrow_p fields. (def_builtin): Initialize leaf_p and nothrow_p. (ix86_add_new_builtins): Handle leaf_p and nothrow_p flags. (bdesc_mpx): New. (bdesc_mpx_const): New. (ix86_init_mpx_builtins): New. (ix86_init_builtins): Call ix86_init_mpx_builtins. (ix86_emit_cmove): New. (ix86_emit_move_max): New. (ix86_expand_builtin): Expand IX86_BUILTIN_BNDMK, IX86_BUILTIN_BNDSTX, IX86_BUILTIN_BNDLDX, IX86_BUILTIN_BNDCL, IX86_BUILTIN_BNDCU, IX86_BUILTIN_BNDRET, IX86_BUILTIN_BNDNARROW, IX86_BUILTIN_BNDINT, IX86_BUILTIN_SIZEOF, IX86_BUILTIN_BNDLOWER, IX86_BUILTIN_BNDUPPER. OK with a few nits below. Thanks, Uros. diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def index 35c0035..989297a 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -47,6 +47,7 @@ DEF_PRIMITIVE_TYPE (UCHAR, unsigned_char_type_node) DEF_PRIMITIVE_TYPE (QI, char_type_node) DEF_PRIMITIVE_TYPE (HI, intHI_type_node) DEF_PRIMITIVE_TYPE (SI, intSI_type_node) +DEF_PRIMITIVE_TYPE (BND, pointer_bounds_type_node) # ??? 
Logically this should be intDI_type_node, but that maps to long # with 64-bit, and that's not how the emmintrin.h is written. Again, # changing this would change name mangling. @@ -60,6 +61,7 @@ DEF_PRIMITIVE_TYPE (USHORT, short_unsigned_type_node) DEF_PRIMITIVE_TYPE (INT, integer_type_node) DEF_PRIMITIVE_TYPE (UINT, unsigned_type_node) DEF_PRIMITIVE_TYPE (UNSIGNED, unsigned_type_node) +DEF_PRIMITIVE_TYPE (ULONG, long_unsigned_type_node) DEF_PRIMITIVE_TYPE (LONGLONG, long_long_integer_type_node) DEF_PRIMITIVE_TYPE (ULONGLONG, long_long_unsigned_type_node) DEF_PRIMITIVE_TYPE (UINT8, unsigned_char_type_node) @@ -806,3 +808,15 @@ DEF_FUNCTION_TYPE_ALIAS (V2DI_FTYPE_V2DI_V2DI, TF) DEF_FUNCTION_TYPE_ALIAS (V4SF_FTYPE_V4SF_V4SF, TF) DEF_FUNCTION_TYPE_ALIAS (V4SI_FTYPE_V4SI_V4SI, TF) DEF_FUNCTION_TYPE_ALIAS (V8HI_FTYPE_V8HI_V8HI, TF) + +# MPX builtins +DEF_FUNCTION_TYPE (BND, PCVOID, ULONG) +DEF_FUNCTION_TYPE (VOID, PCVOID, BND) +DEF_FUNCTION_TYPE (VOID, PCVOID, BND, PCVOID) +DEF_FUNCTION_TYPE (BND, PCVOID, PCVOID) +DEF_FUNCTION_TYPE (BND, PCVOID) +DEF_FUNCTION_TYPE (BND, BND, BND) +DEF_FUNCTION_TYPE (PVOID, PVOID, PVOID, ULONG) +DEF_FUNCTION_TYPE (PVOID, PCVOID, BND, ULONG) +DEF_FUNCTION_TYPE (ULONG, VOID) +DEF_FUNCTION_TYPE (PVOID, BND) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index d0f58b1..6082f86 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -85,6 +85,8 @@ along with GCC; see the file COPYING3. 
If not see <http://www.gnu.org/licenses/>. #include "tree-vectorizer.h" #include "shrink-wrap.h" #include "builtins.h" +#include "tree-chkp.h" +#include "rtl-chkp.h" static rtx legitimize_dllimport_symbol (rtx, bool); static rtx legitimize_pe_coff_extern_decl (rtx, bool); @@ -28775,6 +28777,19 @@ enum ix86_builtins IX86_BUILTIN_XABORT, IX86_BUILTIN_XTEST, + /* MPX */ + IX86_BUILTIN_BNDMK, + IX86_BUILTIN_BNDSTX, + IX86_BUILTIN_BNDLDX, + IX86_BUILTIN_BNDCL, + IX86_BUILTIN_BNDCU, + IX86_BUILTIN_BNDRET, + IX86_BUILTIN_BNDNARROW, + IX86_BUILTIN_BNDINT, + IX86_BUILTIN_SIZEOF, + IX86_BUILTIN_BNDLOWER, + IX86_BUILTIN_BNDUPPER, + /* BMI instructions. */ IX86_BUILTIN_BEXTR32, IX86_BUILTIN_BEXTR64, @@ -28848,6 +28863,8 @@ struct builtin_isa { enum ix86_builtin_func_type tcode; /* type to use in the declaration */ HOST_WIDE_INT isa; /* isa_flags this builtin is defined for */ bool const_p; /* true if the declaration is constant */ + bool leaf_p; /* true if the declaration has leaf attribute */ + bool nothrow_p; /* true if the declaration has nothrow attribute */
Re: parallel check output changes?
On Thu, Sep 18, 2014 at 07:32:00PM +0200, Bernd Schmidt wrote: hum. My 3rd run (which has no compilation change from the 2nd one) is different from both other runs :-P. I did tweak my -j parameter in the make check, but that is it. I'm also seeing this. Python 3.3.5 here. Segher on IRC mentioned that changing result_re in dg-extract-results.py should help here (or disabling the python version, *.sh version should sort everything). Jakub
Re: Patch to fix PR61360
Jakub Jelinek ja...@redhat.com writes: On Thu, Sep 18, 2014 at 12:04:30PM -0400, Vladimir Makarov wrote: The following patch fixes the PR. The details can be found on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61360 The patch was bootstrapped and tested on x86/x86-64. Committed as rev. 215358. What effect does this have on compile time? Regardless of compile time, I strongly object to this kind of hack. (a) it will in practice never go away. (b) (more importantly) it makes no conceptual sense. It means that passes before lra use the old, cached enabled attribute while lra and after will use fresh values. The only reason the call has been put here is because lra was the only pass that checks for and asserted on inconsistent values. Passes before lra will still see the same inconsistent values but they happen not to assert. I.e. we've put the call here to shut up a valid assert rather than because it's the right place to do it. (c) the enabled attribute was never supposed to be used in this way. I really think the patch should be reverted. Thanks, Richard
Re: [PATCH][PING] Keep patch file permissions in mklog
On Thu, Sep 18, 2014 at 10:56 AM, Yury Gribov y.gri...@samsung.com wrote: On 08/04/2014 12:14 PM, Tom de Vries wrote: On 04-08-14 08:45, Yury Gribov wrote: Thanks! My 2 (actually 4) cents below. Hi Yuri, thanks for the review. +if ($#ARGV == 1 ($ARGV[0] eq -i || $ARGV[0] eq --inline)) { +$diff = $ARGV[1]; Can we shift here and then just set $diff to $ARGV[0] unconditionally? Done. +if ($diff eq -) { +die Reading from - and using -i are not compatible; +} Hm, can't we dump ChangeLog to stdout in that case? The limitation looks rather strange. My original idea here was that --inline means 'in the patch file', which is not possible if the patch comes from stdin. I've now interpreted it such that --inline prints to stdout what it would print to the patch file otherwise, that is, both log and patch. Printing just the log to stdout can be already be achieved by not using --inline. +open (FILE1, '', $tmp) or die Could not open temp file; Could we use more descriptive name? I've used the slightly more descriptive 'OUTPUTFILE'. +system (cat $diff $tmp) == 0 +or die Could not append patch to temp file; ... +unlink ($tmp) == 1 or die Could not remove temp file; The checks look like an overkill given that we don't check for result of mktemp... I've added a check for the result of mktemp, and removed the unlink result check. I've left in the Could not append patch to temp file check because the patch file might be read-only. OK for trunk? Thanks, - Tom Pinging the patch for Tom. Apologies for the delay. Could someone post the latest patch. I see it's gone through a cycle of reviews and changes. Thanks. Diego.
[gomp4] OpenACC acc_on_device (was: various OpenACC/PTX built-ins and a reduction tweak)
Hi! Here is my OpenACC acc_on_device patch, in a more complete form, with test cases and all that. Thanks, Cesar, for getting the ball rolling! On Wed, 17 Sep 2014 10:49:54 +0200, Jakub Jelinek ja...@redhat.com wrote: On Wed, Sep 17, 2014 at 10:44:12AM +0200, Tobias Burnus wrote: Cesar Philippidis wrote: The patch introduces the following OpenACC/PTX-specific built-ins: ... It is not completely clear how they are supposed to get used. Should the user call them directly in some cases? Or are they only used internally? acc_on_device sounds like a function which would be in C/C++ made available to the user via #define acc_on_device __builtin_acc_on_device. And not just providing acc_on_device prototype in some header? Yes, just a prototype. And next to DEF_GOACC_BUILTIN (configured the same as DEF_GOMP_BUILTIN), I add a new DEF_GOACC_BUILTIN_COMPILER that is configured to always provide the __builtin_[...] variant, but the un-prefixed [...] only if -fopenacc is in effect. Does that look alright? Without looking at the OpenACC standard, it sounds like this function could be similar to omp_is_initial_device, so can and should be handled supposedly similarly. I think we've been talking about this at the Cauldron, where you agreed that omp_is_initial_device should also be implemented as a builtin. (Or am I confusing things?) However, the rest looks as if it should rather be an internal function instead of a builtin. Or should the user really ever call the builtin directly? GOMP_* functions are builtins and not internal functions too, all those functions are library functions, while the user typically doesn't call them directly, they still are implemented in the library. Internal functions are used for something that doesn't have a library implementation and is not something user can call directly. Regarding Fortran: Builtins aren't directly available to the user. You have to wrap them into an intrinsic to make them available. 
If they have to be made available via a module (e.g. via module acc) - you have to create a virtual module, which provides the intrinsic. If you don't want to convert the whole module, you could create an auxiliary module (e.g. acc_internal_) which provides only those bits - and then include it (use, intrinsic :: ...) in the main module - written in normal Fortran. This I have not yet addressed -- please see the TODO comments in the gcc/fortran/ files as well as Fortran test cases. For the user-callable Fortran functions, for OpenMP libgomp just provides *_ entrypoints to * functions. Perhaps acc_on_device_ could be provided too. This is what I had done already. Does that patch look good? (With the Fortran things still to be addressed.) (And, obviously this is not yet based on the Tobias/Jim Fortran module/header rewrite.) commit 8efbd08ed058d7ed3c43e10fbff0eac35b4defc9 Author: Thomas Schwinge tho...@codesourcery.com Date: Fri Jul 4 11:45:05 2014 + OpenACC acc_on_device. gcc/ * builtins.def (DEF_GOACC_BUILTIN_COMPILER): New macro. * oacc-builtins.def (BUILT_IN_GOACC_UPDATE): New builtin. * builtins.c (expand_builtin_acc_on_device): New function. (expand_builtin): Use it to handle BUILT_IN_ACC_ON_DEVICE. (is_inexpensive_builtin): Handle BUILT_IN_ACC_ON_DEVICE. gcc/fortran/ * f95-lang.c (DEF_GOACC_BUILTIN_COMPILER): New macro. * types.def (BT_FN_INT_INT): New type. gcc/testsuite/ * c-c++-common/goacc/acc_on_device-1.c: New file. * c-c++-common/goacc/acc_on_device-2.c: Likewise. * c-c++-common/goacc/acc_on_device-2-off.c: Likewise. * gfortran.dg/goacc/acc_on_device-1.f95: Likewise. * gfortran.dg/goacc/acc_on_device-2.f95: Likewise. * gfortran.dg/goacc/acc_on_device-2-off.f95: Likewise. libgomp/ * libgomp.map (OACC_2.0): Add acc_on_device, acc_on_device_. * fortran.c: Include "openacc.h". (acc_on_device_): New function. * oacc-parallel.c: Include "openacc.h". (acc_on_device): New function. 
* openacc.f90 (acc_device_kind, acc_device_none) (acc_device_default, acc_device_host, acc_device_not_host): New parameters. (acc_on_device): New function declaration. * openacc_lib.h (acc_device_kind, acc_device_none) (acc_device_default, acc_device_host, acc_device_not_host): New parameters. (acc_on_device): New function declaration. * openacc.h (acc_device_t): New enum. (acc_on_device): New function declaration. * testsuite/libgomp.oacc-c/acc_on_device-1.c: New file. * testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Likewise. * testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f: Likewise. --- gcc/ChangeLog.gomp | 8
Re: Patch to fix PR61360
On 09/18/2014 01:36 PM, Richard Sandiford wrote: Jakub Jelinek ja...@redhat.com writes: On Thu, Sep 18, 2014 at 12:04:30PM -0400, Vladimir Makarov wrote: The following patch fixes the PR. The details can be found on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61360 The patch was bootstrapped and tested on x86/x86-64. Committed as rev. 215358. What effect does this have on compile time? Regardless of compile time, I strongly object to this kind of hack. (a) it will in practice never go away. (b) (more importantly) it makes no conceptual sense. It means that passes before lra use the old, cached enabled attribute while lra and after will use fresh values. The only reason the call has been put here is because lra was the only pass that checks for and asserted on inconsistent values. Passes before lra will still see the same inconsistent values but they happen not to assert. I.e. we've put the call here to shut up a valid assert rather than because it's the right place to do it. (c) the enabled attribute was never supposed to be used in this way. I really think the patch should be reverted. Richard, I waited 4 months for somebody to fix this in the md file (and people tried to do this without success). Instead I was asked numerous times, by people interested in fixing these crashes, to fix it in LRA. After a recent request, I gave up. So I could revert it, transferring the blame to you, but I don't think this hack is so bad as to warrant that (maybe I am wrong).
[jit] Use the pyramid theme for generated HTML docs
Committed to branch dmalcolm/jit: The default Sphinx theme is perhaps a little dated; switch to a non-default one. The pyramid one is clean and attractive IMHO. I've updated the prebuilt docs currently at: https://dmalcolm.fedorapeople.org/gcc/libgccjit-api-docs/ to use the new theme. gcc/jit/ChangeLog.jit: * docs/conf.py (Options for HTML output): Update html_theme from default to pyramid. --- gcc/jit/ChangeLog.jit | 5 + gcc/jit/docs/conf.py | 2 +- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit index 11c9298..06734db 100644 --- a/gcc/jit/ChangeLog.jit +++ b/gcc/jit/ChangeLog.jit @@ -1,5 +1,10 @@ 2014-09-18 David Malcolm dmalc...@redhat.com + * docs/conf.py (Options for HTML output): Update html_theme from + default to pyramid. + +2014-09-18 David Malcolm dmalc...@redhat.com + * docs/intro/install.rst: Markup fixes. * docs/intro/tutorial01.rst: Likewise. * docs/intro/tutorial02.rst: Likewise. diff --git a/gcc/jit/docs/conf.py b/gcc/jit/docs/conf.py index 6199010..22e763a 100644 --- a/gcc/jit/docs/conf.py +++ b/gcc/jit/docs/conf.py @@ -91,7 +91,7 @@ pygments_style = 'sphinx' # The theme to use for HTML and HTML Help pages. See the documentation for # a list of builtin themes. -html_theme = 'default' +html_theme = 'pyramid' # Theme options are theme-specific and customize the look and feel of a theme # further. For a list of options available for each theme, see the -- 1.7.11.7
Re: [PATCH] Put all constants last in tree_swap_operands_p, remove odd -Os check
On Thu, Sep 18, 2014 at 9:44 AM, Alan Lawrence alan.lawre...@arm.com wrote: We've been seeing errors using aarch64-none-linux-gnu gcc to build the 403.gcc benchmark from spec2k6, which we've traced back to this patch. The error looks like:

/home/alalaw01/bootstrap_richie/gcc/xgcc -B/home/alalaw01/bootstrap_richie/gcc -O3 -mcpu=cortex-a57.cortex-a53 -DSPEC_CPU_LP64 alloca.o asprintf.o vasprintf.o c-parse.o c-lang.o attribs.o c-errors.o c-lex.o c-pragma.o c-decl.o c-typeck.o c-convert.o c-aux-info.o c-common.o c-format.o c-semantics.o c-objc-common.o main.o cpplib.o cpplex.o cppmacro.o cppexp.o cppfiles.o cpphash.o cpperror.o cppinit.o cppdefault.o line-map.o mkdeps.o prefix.o version.o mbchar.o alias.o bb-reorder.o bitmap.o builtins.o caller-save.o calls.o cfg.o cfganal.o cfgbuild.o cfgcleanup.o cfglayout.o cfgloop.o cfgrtl.o combine.o conflict.o convert.o cse.o cselib.o dbxout.o debug.o dependence.o df.o diagnostic.o doloop.o dominance.o dwarf2asm.o dwarf2out.o dwarfout.o emit-rtl.o except.o explow.o expmed.o expr.o final.o flow.o fold-const.o function.o gcse.o genrtl.o ggc-common.o global.o graph.o haifa-sched.o hash.o hashtable.o hooks.o ifcvt.o insn-attrtab.o insn-emit.o insn-extract.o insn-opinit.o insn-output.o insn-peep.o insn-recog.o integrate.o intl.o jump.o langhooks.o lcm.o lists.o local-alloc.o loop.o obstack.o optabs.o params.o predict.o print-rtl.o print-tree.o profile.o real.o recog.o reg-stack.o regclass.o regmove.o regrename.o reload.o reload1.o reorg.o resource.o rtl.o rtlanal.o rtl-error.o sbitmap.o sched-deps.o sched-ebb.o sched-rgn.o sched-vis.o sdbout.o sibcall.o simplify-rtx.o ssa.o ssa-ccp.o ssa-dce.o stmt.o stor-layout.o stringpool.o timevar.o toplev.o tree.o tree-dump.o tree-inline.o unroll.o varasm.o varray.o vmsdbgout.o xcoffout.o ggc-page.o i386.o xmalloc.o xexit.o hashtab.o safe-ctype.o splay-tree.o xstrdup.o md5.o fibheap.o xstrerror.o concat.o partition.o hex.o lbasename.o getpwd.o ucbqsort.o -lm -o gcc
emit-rtl.o: In function
`gen_rtx_REG':
emit-rtl.c:(.text+0x12f8): relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON section in regclass.o
emit-rtl.o: In function `gen_rtx':
emit-rtl.c:(.text+0x1824): relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21 against symbol `fixed_regs' defined in COMMON section in regclass.o
collect2: error: ld returned 1 exit status
specmake: *** [gcc] Error 1
Error with make 'specmake -j7 build': check file '/home/alalaw01/spectest/benchspec/CPU2006/403.gcc/build/build_base_test./make.err'
Command returned exit code 2
Error with make!
*** Error building 403.gcc

Inspecting the compiled emit-rtl.o shows:

$ readelf --relocs good/emit-rtl.o | grep fixed_regs
12a8  005d0113 R_AARCH64_ADR_PRE fixed_regs + 0
12ac  005d0115 R_AARCH64_ADD_ABS fixed_regs + 0
1800  005d0113 R_AARCH64_ADR_PRE fixed_regs + 0
1804  005d0115 R_AARCH64_ADD_ABS fixed_regs + 0

(that's compiled with a gcc just before this patch); contrastingly, using a gcc with that patch:

$ readelf --relocs bad/emit-rtl.o | grep fixed_regs
12a8  005d0113 R_AARCH64_ADR_PRE fixed_regs + 0
12ac  005d0115 R_AARCH64_ADD_ABS fixed_regs + 0
12f8  005d0113 R_AARCH64_ADR_PRE fixed_regs +
12fc  005d0116 R_AARCH64_LDST8_A fixed_regs +
1824  005d0113 R_AARCH64_ADR_PRE fixed_regs +
1828  005d0116 R_AARCH64_LDST8_A fixed_regs +
186c  005d0113 R_AARCH64_ADR_PRE fixed_regs + 0
1870  005d0115 R_AARCH64_ADD_ABS fixed_regs + 0

I attach a candidate 'fix', which allows building of 403.gcc on aarch64-none-linux-gnu; full regression testing etc. is ongoing. (I admit there may be better options in terms of canonicalizing if you want to!)

I don't think this is the correct fix, or even close to the real issue. I think we have some tiny memory model issues coming in when the medium memory model is being used. I ran into this issue while compiling php with a GCC 4.7 based compiler. Try the attached patch.
Thanks, Andrew Pinski

--Alan

Richard Biener wrote: The following makes tree_swap_operands_p put all constants 2nd place, also looks through sign-changes when considering further canonicalizations and removes the odd -Os guard for those. That was put in with https://gcc.gnu.org/ml/gcc-patches/2003-10/msg01208.html just motivated by CSiBE numbers - but rather than disabling canonicalization this should have disabled the actual harmful transforms. Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu. Richard. 2014-08-15 Richard Biener rguent...@suse.de * fold-const.c