[PATCH 0/3][AArch64]More intrinsics/builtins improvements

2014-11-14 Thread Alan Lawrence
These three are logically independent, but all on a common theme, and I've tested them all together: bootstrapped + check-gcc on aarch64-none-elf; cross-tested check-gcc on aarch64_be-none-elf. Ok for trunk?

[PATCH 1/3][AArch64]Replace __builtin_aarch64_createv1df with a cast, cleanup

2014-11-14 Thread Alan Lawrence
Now that float64x1_t is a vector, casting to it from a uint64_t causes the bit pattern to be reinterpreted, just as vcreate_f64 should. (Previously when float64x1_t was still a scalar, casting caused a conversion.) Hence, replace the __builtin with a cast. None of the other variants of the
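
The difference between reinterpreting a bit pattern and converting a value can be sketched in portable C. This is only an illustration and does not use arm_neon.h: the helper names bits_to_double and value_to_double are made up, memcpy stands in for the vector cast, and IEEE 754 binary64 doubles are assumed.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret: keep the 64 bits and change only the type, which is the
   behaviour described for vcreate_f64.  memcpy stands in for the cast
   to a vector type.  */
static double bits_to_double (uint64_t bits)
{
  double d;
  memcpy (&d, &bits, sizeof d);
  return d;
}

/* Convert: keep the numeric value and change the bits, which is what a
   cast to a scalar double does.  */
static double value_to_double (uint64_t value)
{
  return (double) value;
}
```

Under those assumptions, bits_to_double (0x3FF0000000000000) is 1.0, while value_to_double (2) is 2.0 and bits_to_double (2) is a tiny denormal.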

[PATCH 2/3][AArch64] Extend aarch64_simd_vec_set pattern, replace asm for vld1_lane

2014-11-14 Thread Alan Lawrence
The vld1_lane intrinsic is currently implemented using inline asm. This patch replaces that with a load and a straightforward use of vset_lane (this gives us correct bigendian lane-flipping in a simple manner). Naively this would produce assembler along the lines of (for vld1_lane_u8):
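
As a rough model of "a load and a straightforward use of vset_lane", the following plain-C sketch may help. The u8x8 struct and the functions set_lane_u8 and ld1_lane_u8 are hypothetical stand-ins for uint8x8_t, vset_lane_u8 and vld1_lane_u8; they are not the arm_neon.h definitions.

```c
#include <stdint.h>

#define NLANES 8

/* Hypothetical stand-in for uint8x8_t: eight lanes, lane 0 first.  */
typedef struct { uint8_t lane[NLANES]; } u8x8;

/* Stand-in for vset_lane_u8: return a copy of VEC with one lane replaced.  */
static u8x8 set_lane_u8 (uint8_t value, u8x8 vec, int lane)
{
  vec.lane[lane] = value;
  return vec;
}

/* Stand-in for vld1_lane_u8: a scalar load followed by set_lane.  */
static u8x8 ld1_lane_u8 (const uint8_t *ptr, u8x8 vec, int lane)
{
  return set_lane_u8 (*ptr, vec, lane);
}
```

Because the lane update goes through a single set-lane helper, any endianness handling would only need to live in that one place.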

[PATCH 3/3][AArch64]Replace temporary assembler for vld1_dup

2014-11-14 Thread Alan Lawrence
This patch replaces the inline asm for vld1_dup intrinsics with a vdup_n_ and a load from the pointer. The existing *aarch64_simd_ld1rmode insn, combiner, etc., are quite capable of generating the expected single ld1r instruction from this. (I've verified by inspecting assembler output.)

Re: [PATCH 0/4][Vectorizer] Reductions: replace VEC_RSHIFT_EXPR with VEC_PERM_EXPR

2014-11-14 Thread Alan Lawrence
Ah, I didn't realize Loongson was little-endian only. In that case (with mid-end reductions-via-shifts changes pushed) I don't think I have actually broken anything, or at least, no MIPS platform that exists :). However, yes, that would seem a safe bet (and simpler than my linked patch that

[PATCH][AArch64]Add vec_shr pattern for 64-bit vectors using ush{l,r}; enable tests.

2014-11-14 Thread Alan Lawrence
Following recent vectorizer changes to reductions via shifts, AArch64 will now reduce loops such as this unsigned char in[8] = {1, 3, 5, 7, 9, 11, 13, 15}; int main (unsigned char argc, char **argv) { unsigned char prod = 1; /* Prevent constant propagation of the entire loop below. */

Re: [PATCH][AArch64]Add vec_shr pattern for 64-bit vectors using ush{l,r}; enable tests.

2014-11-14 Thread Alan Lawrence
...Patch attached... Alan Lawrence wrote: Following recent vectorizer changes to reductions via shifts, AArch64 will now reduce loops such as this unsigned char in[8] = {1, 3, 5, 7, 9, 11, 13, 15}; int main (unsigned char argc, char **argv) { unsigned char prod = 1; /* Prevent

[PATCH][AArch64] Remove/merge redundant iterators

2014-11-13 Thread Alan Lawrence
Hi, gcc/config/aarch64/iterators.md contains numerous duplicates - not always obvious as they are not always sorted the same. Sometimes, one copy is used in aarch64-simd-builtins.def and another in aarch64-simd.md; other times there is no obvious pattern ;). This patch just removes all the

Re: [PATCH 10/11][RS6000] Migrate reduction optabs to reduc_..._scal

2014-11-12 Thread Alan Lawrence
So I'm no expert on RS6000 here, but following on from Segher's observation about the change in pattern...so the difference in 'expand' is exactly that, a vsx_reduc_splus_v2df followed by a vec_extract to DF, becomes a vsx_reduc_splus_v2df_scalar - as I expected the combiner to produce by

Re: [PATCH][AArch64] Add bounds checking to vqdm*_lane intrinsics via a qualifier that also flips endianness

2014-11-12 Thread Alan Lawrence
Nice! One nit - can the extra tree argument be a const_tree ? - I'll defer to the maintainers on the use of C++ default arguments in the AArch64 backend. But LGTM. --Alan Charles Baylis wrote: On 11 November 2014 15:25, Alan Lawrence alan.lawre...@arm.com wrote: [Resending in gcc-patches

[PATCH 0/4][Vectorizer] Reductions: replace VEC_RSHIFT_EXPR with VEC_PERM_EXPR

2014-11-12 Thread Alan Lawrence
In response to https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01803.html, this series removes the VEC_RSHIFT_EXPR, instead using a VEC_PERM_EXPR (with a second argument full of constant zeroes) to represent the shift. I've kept the use of vec_shr optab for platforms that define it, as even on
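
A whole-vector shift expressed as a two-input permute with an all-zero second vector operand can be modelled in scalar C as below. This is a sketch, not GCC code: vec_perm and shift_via_perm are illustrative names, the vector length N is fixed at 8, and the shift direction (towards element 0) is an assumption made for the example.

```c
#include <stdint.h>

#define N 8

/* Two-input permute in the style of VEC_PERM_EXPR: selector values
   0..N-1 pick from A, values N..2N-1 pick from B.  */
static void vec_perm (const uint8_t a[N], const uint8_t b[N],
                      const unsigned sel[N], uint8_t out[N])
{
  for (int i = 0; i < N; i++)
    out[i] = sel[i] < N ? a[sel[i]] : b[sel[i] - N];
}

/* Whole-vector shift by K elements (0 <= K <= N) towards element 0,
   expressed as a permute of V with an all-zero second operand.  */
static void shift_via_perm (const uint8_t v[N], unsigned k, uint8_t out[N])
{
  static const uint8_t zeroes[N] = { 0 };
  unsigned sel[N];
  for (unsigned i = 0; i < N; i++)
    sel[i] = i + k;   /* indices >= N select from the zero vector */
  vec_perm (v, zeroes, sel, out);
}
```

The selector is just a run of consecutive indices starting at K, which is what would let a target recognise such a permute and map it back to a shift instruction.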

[PATCH 1/4][Vectorizer] Split vect_gen_perm_mask into _checked and _any variants

2014-11-12 Thread Alan Lawrence
This is a preliminary to patch 2, which wants functionality equivalent to vect_gen_perm_mask (converting a char* to an RTL const_vector) but without the check of can_vec_perm_p. All existing calls to vect_gen_perm_mask barring that in perm_mask_for_reverse, assert the return value is

[PATCH 2/4][Vectorizer] Use a VEC_PERM_EXPR instead of VEC_RSHIFT_EXPR; expand appropriate VEC_PERM_EXPRs using vec_shr_optab

2014-11-12 Thread Alan Lawrence
This makes the vectorizer use VEC_PERM_EXPRs when doing reductions via shifts, rather than VEC_RSHIFT_EXPR. VEC_RSHIFT_EXPR presently has an endianness-dependent meaning (paralleling vec_shr_optab). While the overall destination of this patch series is to make these endianness-neutral, this

[PATCH 3/4] Remove VEC_RSHIFT_EXPR tree code, now unused

2014-11-12 Thread Alan Lawrence
Tested (with patches 1+2): Bootstrap + check-gcc on x64-none-linux-gnu; cross-tested check-gcc on aarch64-none-elf and aarch64_be-none-elf as these platforms stand (i.e. without vec_shr_optab); also cross-tested check-gcc on aarch64-none-elf and aarch64_be-none-elf after applying

[PATCH 4/4][Vectorizer]Make reductions-via-shifts and vec_shr_optab endianness-neutral

2014-11-12 Thread Alan Lawrence
This redefines vec_shr optab to be the same (in terms of gcc vectors) regardless of target endianness. The vectorizer uses this to do reductions via shifts, so also change the vectorizer to shift things always the same way (from the midend's POV of vectors). cross-tested check-gcc on (1)

Re: [PATCH 10/11][RS6000] Migrate reduction optabs to reduc_..._scal

2014-11-12 Thread Alan Lawrence
Have run check-gcc on gcc110.fsffrance.org (powerpc64-unknown-linux-gnu) using this snippet on top of original patch; no regressions. Alan Lawrence wrote: So I'm no expert on RS6000 here, but following on from Segher's observation about the change in pattern...so the difference in 'expand

Re: [PATCH][AArch64] Add bounds checking to vqdm*_lane intrinsics via a qualifier that also flips endianness

2014-11-12 Thread Alan Lawrence
Pushed as r217440, also with Charles' whitespace fixes ('' - tab) - good spot! Cheers, Alan Marcus Shawcroft wrote: On 6 November 2014 10:19, Alan Lawrence alan.lawre...@arm.com wrote: This generates out-of-range errors at compile- (rather than assemble-)time for the vqdm*_lane

Re: [PATCH][AArch64] Add bounds checking to vqdm*_lane intrinsics via a qualifier that also flips endianness

2014-11-11 Thread Alan Lawrence
, but there's still ARM, indeed. If you have any way/ideas to get better error messages (i.e. line numbers), that'd be particularly good, tho :) Cheers, Alan Charles Baylis wrote: On 6 November 2014 10:19, Alan Lawrence alan.lawre...@arm.com wrote: This generates out

Re: [PATCH 10/11][RS6000] Migrate reduction optabs to reduc_..._scal

2014-11-07 Thread Alan Lawrence
Ah I see now! Thank you for explaining that bit, I was a bit puzzled when I saw it, but it makes sense now! Cheers, Alan Bill Schmidt wrote: On Thu, 2014-11-06 at 16:44 +, Alan Lawrence wrote: Hmmm. I am a little surprised by your mention of saturation points as I would not expect any

[PATCH][AArch64] Add bounds checking to vqdm*_lane intrinsics via a qualifier that also flips endianness

2014-11-06 Thread Alan Lawrence
This generates out-of-range errors at compile- (rather than assemble-)time for the vqdm*_lane intrinsics, and also provides a single place to do bigendian lane-swapping for all those intrinsics (and others to follow in later patches). This allows us to remove many define_expands that just do a
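
The two jobs described here, range-checking a lane index and flipping it for big-endian, can be sketched as a pair of small helpers. Both lane_in_range and flip_lane are hypothetical names for illustration and are not the functions in the AArch64 backend; a real compiler would emit a diagnostic rather than return false.

```c
#include <stdbool.h>

/* Hypothetical helper: true if LANE is a valid index for a vector of
   NUNITS elements.  A compiler would report an error here instead.  */
static bool lane_in_range (int lane, int nunits)
{
  return lane >= 0 && lane < nunits;
}

/* Hypothetical helper: map an architectural lane number to the lane
   number in the compiler's vector model.  On big-endian the order is
   reversed; on little-endian it is unchanged.  */
static int flip_lane (int lane, int nunits, bool big_endian)
{
  return big_endian ? nunits - 1 - lane : lane;
}
```

Flipping twice returns the original lane, so the same helper serves in both directions.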

Re: [PATCH 10/11][RS6000] Migrate reduction optabs to reduc_..._scal

2014-11-06 Thread Alan Lawrence
Hmmm. I am a little surprised by your mention of saturation points as I would not expect any variety of reduc_plus to be a saturating operation??? A. Bill Schmidt wrote: On Fri, 2014-10-24 at 19:49 -0400, David Edelsohn wrote: On Fri, Oct 24, 2014 at 8:06 AM, Alan Lawrence alan.lawre

Re: [COMMITTED][PATCH PR63173] [AARCH64, NEON] Improve vld[234](q?)_dup intrinsics

2014-11-03 Thread Alan Lawrence
So we've been seeing FAIL: gcc.target/aarch64/vldN_dup_1.c on aarch64_be-none-elf, since this patch went in. Felix, did you test for bigendian? However, this failure is fixed if I apply David Sherwood's patch set: https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00942.html

Re: FW: [AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe.

2014-10-28 Thread Alan Lawrence
When you say a patch by Alan Hayward that's coming soon, I take it you mean this one? https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00952.html Just so that we know it has now arrived :). --Alan David Sherwood wrote: Hi, I forgot to mention that this patch needs was tested in combination

Re: [PATCH 11/14] Remove VEC_LSHIFT_EXPR and vec_shl_optab

2014-10-27 Thread Alan Lawrence
, there is just a one-line conflict with a change to a comment from the previous patch (which I'm skipping)... Cheers, Alan Richard Biener wrote: On Thu, Sep 18, 2014 at 2:35 PM, Alan Lawrence alan.lawre...@arm.com wrote: The VEC_LSHIFT_EXPR tree code, and the corresponding vec_shl_optab, seem to have

Re: [PATCH] Relax check against commuting XOR and ASHIFTRT in combine.c

2014-10-24 Thread Alan Lawrence
Rainer Orth wrote: However, as a quick first step, does adding the ilp32 / lp64 (and keeping the architectures list for now) solve the immediate problem? Patch attached, OK for trunk? No, as I said this is wrong for biarch targets like sparc and i386. When you say no this does not solve

[PATCH v2 0-6/11] Fix PR/61114, make direct vector reductions endianness-neutral

2014-10-24 Thread Alan Lawrence
, with the existing name, so am open to suggestions? Cheers, Alan commit 9819291c17610dcdcca19a3d9ea3a4260df0577e Author: Alan Lawrence alan.lawre...@arm.com Date: Thu Aug 21 13:05:43 2014 +0100 Temporarily remove gimple_fold diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64

[PATCH 7/11][ARM] Migrate to new reduc_plus_scal_optab

2014-10-24 Thread Alan Lawrence
to... (reduc_plus_scal_*): ...this; reduce to temp and extract scalar result. commit 22e60bd46f2a591f5357a543d76b19ed89f401ed Author: Alan Lawrence alan.lawre...@arm.com Date: Thu Aug 28 16:12:24 2014 +0100 ARM reduc_plus_scal, V_elem not V_ext, rm old reduc_[us]plus, emit the extract! diff --git

[PATCH 8/11][ARM] Migrate to new reduc_[us](min|max)_scal_optab

2014-10-24 Thread Alan Lawrence
): ...this; extract scalar result. commit 537c31561933f8054a2289198f35b19cf5c4196e Author: Alan Lawrence alan.lawre...@arm.com Date: Thu Aug 28 16:49:24 2014 +0100 ARM reduc_[us](min|max)_scal, V_elem not V_ext, rm old non-_scal version. diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md

[PATCH 9/11][i386] Migrate reduction optabs to reduc_..._scal

2014-10-24 Thread Alan Lawrence
, reduc_plus_scal_v8sf, reduc_plus_scal_v4sf): ...these, adding gen_vec_extract for scalar result. commit 80b0d10a78b2f3e86325f373e99e9cf71e42e622 Author: Alan Lawrence alan.lawre...@arm.com Date: Tue Oct 7 13:25:08 2014 +0100 i386 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386

[Protopatch 11/11][IA64] Migrate to reduc_(plus|min|max)_scal_v2df optab

2014-10-24 Thread Alan Lawrence
This is an attempt to migrate IA64 to the newer optabs; however, I found none of the tests in gcc.dg/vect seemed to touch any of the affected patterns, so this is only really tested by building a stage-1 compiler. gcc/ChangeLog: * config/ia64/vect.md (reduc_splus_v2sf): Rename to...

[PATCH 10/11][RS6000] Migrate reduction optabs to reduc_..._scal

2014-10-24 Thread Alan Lawrence
This migrates the reduction patterns in altivec.md and vector.md to the new names. I've not touched paired.md as I wasn't really sure how to fix that (how do I vec_extractv2sf ?), moreover the testing I did didn't seem to exercise any of those patterns (iow: I'm not sure what would be an

Re: [PATCH 10/11][RS6000] Migrate reduction optabs to reduc_..._scal

2014-10-24 Thread Alan Lawrence
Ooops, attached. commit e48d59399722ce8316d4b1b4f28b40d87b1193fa Author: Alan Lawrence alan.lawre...@arm.com Date: Tue Oct 7 15:28:47 2014 +0100 PowerPC v2 (but not paired.md) diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index 02ea142..92bb5d0 100644 --- a/gcc

Re: [Protopatch 11/11][IA64] Migrate to reduc_(plus|min|max)_scal_v2df optab

2014-10-24 Thread Alan Lawrence
Ooops, attached. commit 56296417b9f6795e541b1101dce6e6ac1789de9a Author: Alan Lawrence alan.lawre...@arm.com Date: Wed Oct 8 15:58:27 2014 +0100 IA64 (?!) diff --git a/gcc/config/ia64/vect.md b/gcc/config/ia64/vect.md index e3ce292..45f4156 100644 --- a/gcc/config/ia64/vect.md +++ b/gcc

Re: [PATCH] Relax check against commuting XOR and ASHIFTRT in combine.c

2014-10-24 Thread Alan Lawrence
-supports.exp - yes, could do that, but it's difficult to come up with a good characterization of what the criteria is, and I don't see it'd generalize to any other tests at all :() --Alan Rainer Orth wrote: Alan Lawrence alan.lawre...@arm.com writes: Rainer Orth wrote: However

RE: [PATCH] Relax check against commuting XOR and ASHIFTRT in combine.c

2014-10-23 Thread Alan Lawrence
From: Rainer Orth [r...@cebitec.uni-bielefeld.de] Sent: 23 October 2014 14:10 To: Andreas Schwab Cc: Alan Lawrence; Jeff Law; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Relax check against commuting XOR and ASHIFTRT in combine.c Andreas Schwab sch...@linux

Re: [PATCH] support ggc hash_map and hash_set

2014-10-17 Thread Alan Lawrence
Sorry, somehow I missed this email. Yes, that appears to have fixed it! Thank you very much, Alan Trevor Saunders wrote: On Tue, Sep 09, 2014 at 03:37:26PM +0100, Alan Lawrence wrote: Following this, we're seeing ICEs in tests in gcc.dg/pch.exp and g++.dg/pch.exp, with cross-builds (hosted

Re: [PATCH 0/14+2][Vectorizer] Made reductions endianness-neutral, fixes PR/61114

2014-10-09 Thread Alan Lawrence
a vec_extract, to produce a scalar result, to the end of each reduc_ optab ? --Alan Richard Biener wrote: On Mon, Oct 6, 2014 at 7:30 PM, Alan Lawrence alan.lawre...@arm.com wrote: Ok, so unless there are objections, I plan to commit patches 1, 2, 4, 5, and 6, which have been previously

Re: [PATCH 0/14+2][Vectorizer] Made reductions endianness-neutral, fixes PR/61114

2014-10-06 Thread Alan Lawrence
backends to the new _scal_ optab (and removing the vector optab). Certainly I'd like to replace vec_shr/l with vec_perm_expr too, but I'm conscious that the end of stage 1 is approaching! --Alan Richard Biener wrote: On Thu, Sep 18, 2014 at 1:41 PM, Alan Lawrence alan.lawre...@arm.com wrote

Re: [PATCH] Put all constants last in tree_swap_operands_p, remove odd -Os check

2014-09-25 Thread Alan Lawrence
executed, of course! So yes, my workaround is wrong, we are working on a proper fix... --Alan Andrew Pinski wrote: On Mon, Sep 22, 2014 at 4:10 AM, Alan Lawrence alan.lawre...@arm.com wrote: Well, I haven't looked into this in detail: I've gone only as far as * swapping emit-rtl.o between

Re: [PATCH 2/14][Vectorizer] Make REDUC_xxx_EXPR tree codes produce a scalar result

2014-09-25 Thread Alan Lawrence
Many thanks indeed! :) --Alan Segher Boessenkool wrote: On Wed, Sep 24, 2014 at 04:02:11PM +0100, Alan Lawrence wrote: However my CompileFarm account is still pending, so to that end, if you were able to test patch 2/14 (attached inc. Richie's s/VIEW_CONVERT_EXPR/NOP_EXPR

Re: [PATCH/RFC v2 3/14] Add new optabs for reducing vectors to scalars

2014-09-25 Thread Alan Lawrence
, is there any chance you could test this on powerpc too? (in combination with patch 2/14, which will need to be applied first; you can skip patch 1, and >=4.) --Alan Richard Biener wrote: On Thu, Sep 25, 2014 at 4:32 PM, Alan Lawrence alan.lawre...@arm.com wrote: Ok, so, I've tried making reduc_plus

Re: [PATCH 2/14][Vectorizer] Make REDUC_xxx_EXPR tree codes produce a scalar result

2014-09-24 Thread Alan Lawrence
to the list shortly, and follow up with .md updates to the various backends. Cheers, Alan Richard Biener wrote: On Thu, Sep 18, 2014 at 1:50 PM, Alan Lawrence alan.lawre...@arm.com wrote: This fixes PR/61114 by redefining the REDUC_{MIN,MAX,PLUS}_EXPR tree codes. These are presently documented

Re: [PATCH] Put all constants last in tree_swap_operands_p, remove odd -Os check

2014-09-22 Thread Alan Lawrence
for trunk...? --Alan Andrew Pinski wrote: On Thu, Sep 18, 2014 at 9:44 AM, Alan Lawrence alan.lawre...@arm.com wrote: We've been seeing errors using aarch64-none-linux-gnu gcc to build the 403.gcc benchmark from spec2k6, that we've traced back to this patch. The error looks like: /home/alalaw01

[AArch64] Re: [PATCH] Relax check against commuting XOR and ASHIFTRT in combine.c

2014-09-22 Thread Alan Lawrence
injecting some dubious RTL from a builtin, although this'll only give a momentary snapshot of behaviour. I may or may not have time to look into this though ;)... Cheers, Alan Jeff Law wrote: On 09/18/14 03:35, Alan Lawrence wrote: Moreover, I think we both agree that if result_mode==shift_mode

Re: [PATCH 2/14][Vectorizer] Make REDUC_xxx_EXPR tree codes produce a scalar result

2014-09-22 Thread Alan Lawrence
Richard Biener wrote: Huh. Does that ever happen? Please use a NOP_EXPR instead of a VIEW_CONVERT_EXPR. Yes, the testcase is gcc.target/i386/pr51235.c which performs black magic*** with void *. (This testcase otherwise fails the verify_gimple_assign_unary check in tree-cfg.c .) However,

Re: [PATCH 3/14] Add new optabs for reducing vectors to scalars

2014-09-22 Thread Alan Lawrence
Richard Biener wrote: scalar_reduc_to_vector misses a comment. Ok to reuse the comment in optabs.h in optabs.c also? I wonder if at the end we wouldn't transition all backends and then renaming reduc_*_scal_optab back to reduc_*_optab makes sense. Yes, that sounds like a plan, the _scal

Re: [PATCH] Relax check against commuting XOR and ASHIFTRT in combine.c

2014-09-18 Thread Alan Lawrence
... (tests are as they were posted https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01233.html .) --Alan Jeff Law wrote: On 07/17/14 10:56, Alan Lawrence wrote: Ok, the attached tests are passing on x86_64-none-linux-gnu, aarch64-none-elf, arm-none-eabi, and a bunch of smaller platforms for which I've

Re: Fix ARM ICE for register var asm (pc) (PR target/60606)

2014-09-18 Thread Alan Lawrence
), i.e. all bar the change to arm_regno_class. A change relating to the program counter affecting -fPIC does sound plausible, I haven't looked any further than that... --Alan Joseph S. Myers wrote: On Wed, 17 Sep 2014, Alan Lawrence wrote: We've just noticed this patch causes an ICE in gcc.c

[PATCH 0/14+2][Vectorizer] Made reductions endianness-neutral, fixes PR/61114

2014-09-18 Thread Alan Lawrence
The end goal here is to remove this code from tree-vect-loop.c (vect_create_epilog_for_reduction): if (BYTES_BIG_ENDIAN) bitpos = size_binop (MULT_EXPR, bitsize_int (TYPE_VECTOR_SUBPARTS (vectype) - 1), TYPE_SIZE

[PATCH 1/14][AArch64] Temporarily remove aarch64_gimple_fold_builtin code for reduction operations

2014-09-18 Thread Alan Lawrence
The gimple folding ties the AArch64 backend to the tree representation of the midend via the neon intrinsics. This code enables constant folding of Neon intrinsics reduction ops, so improves performance, but is not necessary for correctness. By temporarily removing it (here), we can then change

[PATCH 2/14][Vectorizer] Make REDUC_xxx_EXPR tree codes produce a scalar result

2014-09-18 Thread Alan Lawrence
This fixes PR/61114 by redefining the REDUC_{MIN,MAX,PLUS}_EXPR tree codes. These are presently documented as producing a vector with the result in element 0, and this is inconsistent with their use in tree-vect-loop.c (which on bigendian targets pulls the bits out of the wrong end of the

[PATCH 3/14] Add new optabs for reducing vectors to scalars

2014-09-18 Thread Alan Lawrence
These match their corresponding tree codes, by taking a vector and returning a scalar; this is more architecturally neutral than the (somewhat loosely defined) previous optab that took a vector and returned a vector with the result in the least significant bits (i.e. element 0 for little-endian

[PATCH 4/14][AArch64] Use new reduc_plus_scal optabs, inc. for __builtins

2014-09-18 Thread Alan Lawrence
This migrates AArch64 over to the new optab for 'plus' reductions, i.e. so the define_expands produce scalars by generating a MOV to a GPR. Effectively, this moves the vget_lane inside every arm_neon.h intrinsic, into the inside of the define_expand. Tested: aarch64.exp vect.exp on

[PATCH 5/14][AArch64] Use new reduc_[us](min|max)_scal optabs, inc. for builtins

2014-09-18 Thread Alan Lawrence
Similarly to the previous patch (r/2205), this migrates AArch64 to the new reduce-to-scalar optabs for min and max. For consistency we apply the same treatment to the smax_nan and smin_nan patterns (used for __builtins), even though reduc_smin_nan_scal (etc.) is not a standard name. Tested:

[PATCH 6/14][AArch64] Restore gimple_folding of reduction intrinsics

2014-09-18 Thread Alan Lawrence
This gives us back the constant-folding of the neon-intrinsics that was removed in the first patch, but is now OK for bigendian too. bootstrapped on aarch64-none-linux-gnu. check-gcc on aarch64-none-elf and aarch64_be-none-elf. gcc/ChangeLog: * config/aarch64/aarch64.c

[PATCH 7/14][Testsuite] Add tests of reductions using whole-vector-shifts (multiplication)

2014-09-18 Thread Alan Lawrence
For reduction operations (e.g. multiply) that don't have such a tree code, or where the target platform doesn't define an optab handler for the tree code, we can perform the reduction using a series of log(N) shifts (where N = #elements in vector), using the VEC_RSHIFT_EXPR=whole-vector-shift

[PATCH 8/14][Testsuite] Add tests of reductions using whole-vector-shifts (ior)

2014-09-18 Thread Alan Lawrence
These are like the previous patch, but using | rather than * - I was unable to get the previous test to pass on PowerPC and MIPS. I note there is no inherent vector operation here - a bitwise OR across a word, and a reduction via shifts using scalar (not vector) ops would be all that's

[PATCH 9/14] Enforce whole-vector-shifts to always be by a whole number of elements

2014-09-18 Thread Alan Lawrence
The VEC_RSHIFT_EXPR is only ever used by the vectorizer in tree-vect-loop.c (vect_create_epilog_for_reduction), to shift the vector by a whole number of elements. The tree code allows more general shifts but only for integral types. This only causes pain and difficulty for backends

[PATCH 10/14][AArch64] Implement vec_shr optab

2014-09-18 Thread Alan Lawrence
This allows reduction of non-(plus|min|max) operations using log_2(N) shifts rather than N vec_extracts; e.g. for example code int main (unsigned char argc, char **argv) { unsigned char in[16] = { 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31 }; unsigned char i = 0; unsigned char sum = 1;
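
To make the "log_2(N) shifts rather than N vec_extracts" idea concrete, here is a scalar C model of a multiply reduction over 16 unsigned chars. It is a sketch under stated assumptions: whole_vector_shift, reduce_mul_by_shifts and reduce_mul_linear are illustrative names, the shift moves elements towards element 0 and fills with zeroes, and arithmetic wraps modulo 256.

```c
#include <stdint.h>
#include <string.h>

#define N 16

/* Shift V by K elements towards element 0, filling with zeroes.  */
static void whole_vector_shift (const uint8_t v[N], unsigned k, uint8_t out[N])
{
  for (unsigned i = 0; i < N; i++)
    out[i] = i + k < N ? v[i + k] : 0;
}

/* Multiply-reduce with log2(N) shift-and-multiply steps.  The result
   ends up in element 0; the upper elements hold don't-care values.  */
static uint8_t reduce_mul_by_shifts (const uint8_t in[N])
{
  uint8_t acc[N], shifted[N];
  memcpy (acc, in, N);
  for (unsigned k = N / 2; k >= 1; k /= 2)
    {
      whole_vector_shift (acc, k, shifted);
      for (unsigned i = 0; i < N; i++)
        acc[i] = (uint8_t) (acc[i] * shifted[i]);
    }
  return acc[0];
}

/* Reference: fold in one element at a time.  */
static uint8_t reduce_mul_linear (const uint8_t in[N])
{
  uint8_t prod = 1;
  for (unsigned i = 0; i < N; i++)
    prod = (uint8_t) (prod * in[i]);
  return prod;
}
```

For N = 16 the shifted version needs four vector multiplies (shifts of 8, 4, 2 and 1), and both functions should agree on any input.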

[PATCH 11/14] Remove VEC_LSHIFT_EXPR and vec_shl_optab

2014-09-18 Thread Alan Lawrence
The VEC_LSHIFT_EXPR tree code, and the corresponding vec_shl_optab, seem to have been added for completeness, providing a counterpart to VEC_RSHIFT_EXPR and vec_shr_optab. However, whereas VEC_RSHIFT_EXPRs are generated (only) by the vectorizer, VEC_LSHIFT_EXPR expressions are not generated at

[PATCH 12/14][Vectorizer] Redefine VEC_RSHIFT_EXPR and vec_shr_optab as endianness-neutral

2014-09-18 Thread Alan Lawrence
The direction of VEC_RSHIFT_EXPR has been endian-dependent, contrary to the general principles of tree. This patch updates fold-const and the vectorizer (the only place where such expressions are created), such that VEC_RSHIFT_EXPR always shifts towards element 0. The tree code still maps

[PATCH 13/14][AArch64_be] Fix vec_shr pattern to correctly implement endianness-neutral optab

2014-09-18 Thread Alan Lawrence
The previous patch broke aarch64_be by redefining VEC_RSHIFT_EXPR / vec_shr_optab to always shift the vector towards gcc's element 0. This fixes aarch64_be to do that. check-gcc on aarch64-none-elf (no changes) and aarch64_be-none-elf (fixes all regressions produced by previous patch, i.e. no

[PATCH 14/14][Vectorizer] Tidy up vect_create_epilog / use_scalar_result

2014-09-18 Thread Alan Lawrence
Following earlier patches, vect_create_epilog_for_reduction contains exactly one case where extract_scalar_result==true. Hence, move the code 'if (extract_scalar_result)' there, and tidy-up/remove some variables. bootstrapped on x86_64-none-linux-gnu + check-gcc + check-g++. gcc/ChangeLog:

[PATCH/RFC 15 / 14+2][RS6000] Remove vec_shl and (hopefully) fix vec_shr

2014-09-18 Thread Alan Lawrence
Patch 12 of 14 (https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01475.html) will break bigendian targets implementing vec_shr. This is a PowerPC parallel of patch 13 of 14 (https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01477.html) for AArch64. I've checked I can build a stage 1 compiler for

[PATCH 16 / 14+2][MIPS] Remove vec_shl and (hopefully) fix vec_shr

2014-09-18 Thread Alan Lawrence
Patch 12 of 14 (https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01475.html) will break bigendian targets implementing vec_shr. This is a MIPS parallel of patch 13 of 14 (https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01477.html) for AArch64; the idea is that vec_shr should be unaffected on

Re: [PATCH] Put all constants last in tree_swap_operands_p, remove odd -Os check

2014-09-18 Thread Alan Lawrence
We've been seeing errors using aarch64-none-linux-gnu gcc to build the 403.gcc benchmark from spec2k6, that we've traced back to this patch. The error looks like: /home/alalaw01/bootstrap_richie/gcc/xgcc -B/home/alalaw01/bootstrap_richie/gcc -O3 -mcpu=cortex-a57.cortex-a53 -DSPEC_CPU_LP64

Re: Fix ARM ICE for register var asm (pc) (PR target/60606)

2014-09-17 Thread Alan Lawrence
We've just noticed this patch causes an ICE in gcc.c-torture/execute/scal-to-vec1.c at -O3 when running with -fPIC on arm-none-linux-gnueabi and arm-none-linux-gnueabihf; test logs: spawn /tmp/alan/buildarm-none-linux-gnueabi/obj/gcc4/gcc/xgcc -B/tmp/alan/builda

Re: [PATCH][AArch64 Testsuite] Add test of vld[234]q? intrinsic

2014-09-11 Thread Alan Lawrence
marcus.shawcr...@gmail.com wrote: On 8 September 2014 11:35, Alan Lawrence alan.lawre...@arm.com wrote: This adds a test of all the variants of vld2, vld2q, vld3, vld3q, vld4, and vld4q. These all use typexNxM structs and the OI/CI/XImode mechanism, so the test cross-checks this against plain ol' vst1(q

[PATCH 4.9] Backported r214946: One-liner: fix type of an add in SIMD registers

2014-09-11 Thread Alan Lawrence
Original patch applied cleanly to 4.9 HEAD as r215175. Marcus Shawcroft wrote: On 20 August 2014 10:25, Alan Lawrence alan.lawre...@arm.com wrote: The SIMD-register variant is miscategorized as alu_reg despite not using any ALU registers, and should be neon_add for e.g. scheduling. Tested

[PATCH][AArch64 Testsuite]Fix scan-assembler test false alarm on aarch64-linux-gnu

2014-09-09 Thread Alan Lawrence
...@gmail.com wrote: On 19 August 2014 11:44, Alan Lawrence alan.lawre...@arm.com wrote: gcc/ChangeLog: * config/aarch64/aarch64-builtins.c (aarch64_types_cmtst_qualifiers, TYPES_TST): Define. (aarch64_fold_builtin): Update pattern for cmtst. * config/aarch64/aarch64

Re: [PATCH] support ggc hash_map and hash_set

2014-09-09 Thread Alan Lawrence
Following this, we're seeing ICEs in tests in gcc.dg/pch.exp and g++.dg/pch.exp, with cross-builds (hosted on x86_64) targeting bare metal AArch64 and ARM (aarch64-none-elf, aarch64_be-none-elf and arm-none-eabi; I haven't tested armeb-none-eabi; builds targeting linux are OK), for *release

[PATCH][AArch64 Testsuite] Add test of vld[234]q? intrinsic

2014-09-08 Thread Alan Lawrence
This adds a test of all the variants of vld2, vld2q, vld3, vld3q, vld4, and vld4q. These all use typexNxM structs and the OI/CI/XImode mechanism, so the test cross-checks this against plain ol' vst1(q?). Cross-tested on aarch64-none-elf (passing), also on aarch64_be-none-elf

[PATCH][AArch64 Testsuite] Extend test of vld1+vst1 intrinsics to cover more variants

2014-09-08 Thread Alan Lawrence
The existing vld1/vst1_1.c test in gcc.target/aarch64 covers only vld1_s8 and vld1q_s16. This extends it to cover all int/float variants via token-pasting. Passing on aarch64-none-elf and aarch64_be-none-elf. gcc/testsuite/ChangeLog: * gcc.target/aarch64/vld1-vst1_1.c: Rewrite to test
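
Covering many element types via token-pasting might look roughly like the sketch below. It is not the contents of vld1-vst1_1.c: the DEFINE_ROUNDTRIP macro and the roundtrip_* functions are invented for illustration, and memcpy stands in for the vld1/vst1 intrinsics so that the example builds on any host.

```c
#include <stdint.h>
#include <string.h>

/* Generate one round-trip check per element type via token-pasting.
   memcpy stands in for the vld1/vst1 intrinsics in this sketch.  */
#define DEFINE_ROUNDTRIP(SUFFIX, TYPE, COUNT)                     \
  static int roundtrip_##SUFFIX (const TYPE *src)                 \
  {                                                               \
    TYPE vec[COUNT], dst[COUNT];                                  \
    memcpy (vec, src, sizeof vec);   /* "load"  */                \
    memcpy (dst, vec, sizeof dst);   /* "store" */                \
    return memcmp (src, dst, sizeof dst) == 0;                    \
  }

DEFINE_ROUNDTRIP (s8, int8_t, 8)
DEFINE_ROUNDTRIP (s16, int16_t, 4)
DEFINE_ROUNDTRIP (u32, uint32_t, 2)
DEFINE_ROUNDTRIP (f32, float, 2)
```

Adding a new variant is then a one-line change, which is the main attraction of this style for intrinsics that differ only in element type and count.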

[PATCH][AArch64 Testsuite] Add a test of vldN_dup intrinsics

2014-09-08 Thread Alan Lawrence
This adds a test of the vld2_dup, vld2q_dup, vld3_dup, vld3q_dup, vld4_dup and vld4q_dup intrinsics. Passing on aarch64-none-elf and aarch64_be-none-elf. gcc/testsuite/ChangeLog: * gcc.target/aarch64/vldN_dup_1.c: New test. diff --git a/gcc/testsuite/gcc.target/aarch64/vldN_dup_1.c

[PATCH][AArch64 Testsuite] Add a test of the vldN_lane intrinsic

2014-09-08 Thread Alan Lawrence
At present there is no test coverage of the vld2_lane, vld2q_lane, vld3_lane, vld3q_lane, vld4_lane, vld4q_lane intrinsics. So this adds a test using the vld1 and vst1 intrinsics. Passing on aarch64-none-elf. Failing on aarch64_be-none-elf; I believe because the intrinsic is [modifying the]

[PATCH][AArch64 Testsuite] Add a test of the vst[234](q?) intrinsics

2014-09-08 Thread Alan Lawrence
This adds a test of all the variants of vst2, vst2q, vst3, vst3q, vst4, and vst4q. These all use typexNxM structs and the OI/CI/XImode mechanism, so the test cross-checks this against plain ol' vld1(q?). Cross-tested on aarch64-none-elf (passing), also on aarch64_be-none-elf (failing as per

Re: [PATCH 4.9][AArch64] Backport r214953: Rename [u]int32x1_t to [u]int32_t (resp 16x1, 8x1)in arm_neon.h

2014-09-08 Thread Alan Lawrence
(No regressions in check-gcc or check-g++ on aarch64-none-elf.) --Alan Alan Lawrence wrote: Some manual editing of patch required due to e.g. int64x1 changes present on trunk but not on the 4.9 branch; new patch attached. I've done a quick smoke test of aarch64.exp+simd.exp (check-gcc

[Obvious] Remove unused aarch64_types_cmtst_qualifiers, was breaking bootstrap.

2014-09-08 Thread Alan Lawrence
Pushed as r215015. gcc/ChangeLog: * config/aarch64/aarch64-builtins.c (aarch64_types_cmtst_qualifiers, TYPES_TST): Remove as unused. - Index: gcc/config/aarch64/aarch64-builtins.c === ---

Re: [PATCH AArch64 1/2] Improve codegen of vector compares inc. tst instruction

2014-09-08 Thread Alan Lawrence
aarch64-none-elf and aarch64_be-none-elf, but FAIL for aarch64-none-linux-gnu. It seems this is not what you saw in your own validations? Christophe. On 2 September 2014 17:17, Marcus Shawcroft marcus.shawcr...@gmail.com wrote: On 19 August 2014 11:44, Alan Lawrence alan.lawre...@arm.com wrote

[PATCH 1/2][AArch64] Simplify patterns for sshr_n_[us]64 intrinsic

2014-09-08 Thread Alan Lawrence
The sshr_n_64 intrinsics allow performing a signed shift right by 64 places. The standard ashrdi3 pattern masks the shift amount with 63, so cannot be used. However, such a shift fills the result with the sign bit, which is identical to shifting right by 63. This patch just simplifies the code to
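
The equivalence between a signed shift right by 64 and one by 63 can be checked with a small scalar model. This is a sketch only: sshr_model and ushr_model are hypothetical names, and the signed version assumes that >> on a negative int64_t is an arithmetic shift, as it is with GCC. The unsigned counterpart, where a shift by 64 simply yields zero, is included for contrast.

```c
#include <stdint.h>

/* Hypothetical scalar model of a signed shift right by N, 1 <= N <= 64.
   C leaves a shift by 64 undefined, so clamp to 63: both fill the result
   with copies of the sign bit.  Assumes >> on a negative int64_t is an
   arithmetic shift, as it is with GCC.  */
static int64_t sshr_model (int64_t x, int n)
{
  return x >> (n > 63 ? 63 : n);
}

/* Hypothetical model of the unsigned case: shifting all 64 bits out
   leaves zero, so no clamping trick applies.  */
static uint64_t ushr_model (uint64_t x, int n)
{
  return n > 63 ? 0 : x >> n;
}
```

For any negative input the signed model gives -1 at both 63 and 64, and for any non-negative input it gives 0, which is why the two shift amounts are interchangeable there.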

Re: [PATCH 1/2][AArch64] Simplify patterns for sshr_n_[us]64 intrinsic

2014-09-08 Thread Alan Lawrence
Patch attached. Alan Lawrence wrote: The sshr_n_64 intrinsics allow performing a signed shift right by 64 places. The standard ashrdi3 pattern masks the shift amount with 63, so cannot be used. However, such a shift fills the result with the sign bit, which is identical to shifting right by 63

[PATCH 2/2][AArch64] Simplify+improve patterns for ushr(d?)_n_u64 intrinsic

2014-09-08 Thread Alan Lawrence
const0_rtx. (aarch64_ushr_simddi): Delete. * config/aarch64/aarch64.md (enum unspec): Delete UNSPEC_USHR64. Alan Lawrence wrote: The sshr_n_64 intrinsics allow performing a signed shift right by 64 places. The standard ashrdi3 pattern masks the shift amount with 63, so cannot be used
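For the unsigned case, shifting a 64-bit value right by 64 moves every bit out, so the result is zero. The `const0_rtx` in the ChangeLog fragment above is read here as reflecting that, which is an assumption since the snippet is truncated. A minimal scalar sketch follows; `ushr_n_u64_model` is a hypothetical name, not a function from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of an unsigned shift right by n, 1 <= n <= 64.
   A shift by the full width is undefined in C, so n == 64 is handled
   explicitly: all 64 bits are shifted out and the result is zero. */
static uint64_t ushr_n_u64_model(uint64_t x, int n)
{
    assert(n >= 1 && n <= 64);
    return n == 64 ? 0 : x >> n;
}
```

Unlike the signed case, a shift by 64 is not equivalent to a shift by 63 here: shifting by 63 still keeps the top bit of the input.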

[PATCH 1/2][AArch64 Testsuite] Add execution test of vset(q?)_lane intrinsics.

2014-09-08 Thread Alan Lawrence
This adds a test that checks that the result of a vset_lane intrinsic is identical to the input apart from one value being changed. The test checks only one index per vset_lane_xxx in a somewhat ad hoc fashion, as the index has to be a compile-time immediate and I felt that doing a loop using macros
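The property this test checks can be sketched without the intrinsics themselves. The following is a portable C model over plain arrays, not the testcase from the patch; `LANES`, `set_lane_model` and `only_lane_changed` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4  /* illustrative lane count, e.g. a vector of four 16-bit lanes */

/* Hypothetical model of vset_lane: copy `in` to `out`, replacing one lane. */
static void set_lane_model(int16_t value, const int16_t in[LANES],
                           int16_t out[LANES], size_t lane)
{
    assert(lane < LANES);
    for (size_t i = 0; i < LANES; i++)
        out[i] = (i == lane) ? value : in[i];
}

/* Returns 1 if `out` equals `in` in every lane except `lane`, and `lane`
   holds `value`; returns 0 otherwise. */
static int only_lane_changed(int16_t value, const int16_t in[LANES],
                             const int16_t out[LANES], size_t lane)
{
    for (size_t i = 0; i < LANES; i++)
        if (out[i] != ((i == lane) ? value : in[i]))
            return 0;
    return 1;
}
```

In this model the lane index is an ordinary run-time argument, so a loop over all lanes is easy; with the real intrinsics the index must be a compile-time immediate, which is the constraint the message refers to.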

[PATCH 2/2][AArch64] Replace temporary inline assembler for vset_lane

2014-09-08 Thread Alan Lawrence
with __aarch64_vset_lane_any. OK for trunk? Alan Lawrence wrote: This adds a test that checks that the result of a vset_lane intrinsic is identical to the input apart from one value being changed. The test checks only one index per vset_lane_xxx in a somewhat ad hoc fashion, as the index has to be a compile-time immediate and I

[PATCH][AArch64] Simplify vreinterpret for float64x1_t using casts.

2014-09-08 Thread Alan Lawrence
, vreinterpret_s16_f64, vreinterpret_s32_f64, vreinterpret_u8_f64, vreinterpret_u16_f64, vreinterpret_u32_f64): Use cast. * config/aarch64/iterators.md (VD_RE): Delete. commit 126c5b92eea2850367f005ebe89f86c5b8b4e4f9 Author: Alan Lawrence alan.lawre...@arm.com Date: Wed Aug 6 14:23:00 2014

Re: [PATCH][AArch64] One-liner: fix type of an add in SIMD registers

2014-09-05 Thread Alan Lawrence
Pushed r214946. (In the meantime the erroneous alu_reg had been changed to alu_sreg by r212750 https://gcc.gnu.org/ml/gcc-patches/2014-07/msg00679.html .) Marcus Shawcroft wrote: On 20 August 2014 10:25, Alan Lawrence alan.lawre...@arm.com wrote: The SIMD-register variant is miscategorized

[PATCH 4.9][AArch64] Backport r214953: Rename [u]int32x1_t to [u]int32_t (resp 16x1, 8x1) in arm_neon.h

2014-09-05 Thread Alan Lawrence
running. I repeat, this is source-code-compatibility breaking, but not ABI breaking; if it causes you any problems, it'll be the 4.9.x compiler shouting at you ;). Ok assuming no regressions? --Alan Marcus Shawcroft wrote: On 24 July 2014 11:18, Alan Lawrence alan.lawre...@arm.com wrote

Re: [PATCH AArch64 2/2] Replace temporary inline assembler for vget_high

2014-09-04 Thread Alan Lawrence
. --Alan Marcus Shawcroft wrote: On 12 August 2014 11:12, Alan Lawrence alan.lawre...@arm.com wrote: This patch replaces the current inline assembler for the vget_high intrinsics in arm_neon.h with a sequence of other calls, in a similar fashion to vget_low. Unlike the assembler, these are all

[PATCH][AArch64] Remove varargs from aarch64_simd_expand_args

2014-08-20 Thread Alan Lawrence
This patch just replaces the varargs with a builtin_simd_arg*. The use of varargs seems to make stepping into, and breakpointing, aarch64_simd_expand_args difficult, and this adds typesafety and (I argue!) reduces complexity. Tested check-gcc on aarch64-none-elf. gcc/ChangeLog: *

[PATCH][AArch64] One-liner: fix type of an add in SIMD registers

2014-08-20 Thread Alan Lawrence
The SIMD-register variant is miscategorized as alu_reg despite not using any ALU registers, and should be neon_add for e.g. scheduling. Tested with check-gcc and check-g++ on aarch64-none-elf and aarch64_be-none-elf. gcc/ChangeLog: * config/aarch64/aarch64.md (adddi3_aarch64): set

[PATCH][AArch64] Tidy: remove unused qualifier_const_pointer

2014-08-20 Thread Alan Lawrence
The only reference is in a comment. gcc/ChangeLog: * config/aarch64/aarch64-builtins.c (enum aarch64_type_qualifiers): Remove qualifier_const_pointer, update comment. diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index

Re: [PATCH] Relax check against commuting XOR and ASHIFTRT in combine.c

2014-08-20 Thread Alan Lawrence
Thanks to Arnaud for confirming that AdaCore does not have interest in the Ada/Alpha combination (https://gcc.gnu.org/ml/gcc-patches/2014-08/msg01832.html). As per below, I've tested check-ada on x86_64-none-linux-gnu without problems. Can I say, ping? :) Cheers, Alan Alan Lawrence wrote

Re: [PATCH,rs6000] Add __VEC_ELEMENT_REG_ORDER__ builtin define for PowerPC

2014-08-20 Thread Alan Lawrence
Completely as an aside: this makes me wonder whether having and using a similar macro _inside_ gcc, i.e. for targets to specify the ordering of elements within a vector independently of BYTES_BIG_ENDIAN, might be a good thing? --Alan Bill Schmidt wrote: Hi, This adds a macro to indicate

Re: Does anyone use Ada on Alpha?

2014-08-19 Thread Alan Lawrence
of you folk are able to build+test the patch (below) for Ada on Alpha: is this really reason for us to want to hold this up? --Alan Alan Lawrence wrote: ...as I've not managed to build such a gcc. If so, is there any chance you could please test check-ada with the following patch (in gcc

[PATCH AArch64 1/2] Improve codegen of vector compares inc. tst instruction

2014-08-19 Thread Alan Lawrence
Vector comparisons are sometimes generated with needless 'not' instructions, and 'cmtst' is generally not output at all. This patch makes gen_aarch64_vcond_internal more intelligent with regard to swapping the operands to both the comparison and the conditional move, such that 'not' is avoided

[PATCH AArch64 2/2] Remove vector compare/tst __builtins

2014-08-19 Thread Alan Lawrence
The vector compare intrinsics (vc[gl][et]z, vceqz, vtst) were written using __builtin functions as (IIUC) at the time gcc vector extensions did not support comparison ops across both C and C++ frontends. These have since been updated. Following the first patch, we now get equal/better code
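As a rough illustration of what writing such comparisons with GCC vector extensions looks like, here is a sketch of a vtst-style and a vceqz-style operation. It is not the arm_neon.h code from the patch: the type `u32x4` and the function names are hypothetical, and a 16-byte generic vector is used so that the sketch does not depend on an AArch64 target.

```c
#include <assert.h>
#include <stdint.h>

/* Generic GCC vector type: four unsigned 32-bit lanes (hypothetical name). */
typedef uint32_t u32x4 __attribute__((vector_size(16)));

/* vtst-style: a lane becomes all ones if (a & b) is non-zero there, else 0.
   A vector comparison yields a signed vector of 0 / -1 lanes; the cast
   reinterprets those bits as unsigned lanes. */
static u32x4 tst_u32x4(u32x4 a, u32x4 b)
{
    const u32x4 zero = {0, 0, 0, 0};
    return (u32x4)((a & b) != zero);
}

/* vceqz-style: a lane becomes all ones if the input lane is zero, else 0. */
static u32x4 ceqz_u32x4(u32x4 a)
{
    const u32x4 zero = {0, 0, 0, 0};
    return (u32x4)(a == zero);
}

/* Small self-check of the lane masks produced by the two models. */
static void check_compare_models(void)
{
    u32x4 a = {1, 2, 4, 8};
    u32x4 b = {1, 1, 4, 0};
    u32x4 t = tst_u32x4(a, b);   /* a & b is {1, 0, 4, 0} */
    u32x4 z = ceqz_u32x4(b);
    assert(t[0] == UINT32_MAX && t[1] == 0 && t[2] == UINT32_MAX && t[3] == 0);
    assert(z[0] == 0 && z[1] == 0 && z[2] == 0 && z[3] == UINT32_MAX);
}
```

Because the comparison is expressed in ordinary C operators, the compiler is free to pick whichever instruction sequence it considers best for it.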

[PATCH AArch64] Add a builtin for rbit(q?)_p8; add intrinsics and tests.

2014-08-19 Thread Alan Lawrence
This patch adds the missing vrbit_p8 and vrbitq_p8 intrinsics to arm_neon.h, and implements all the vrbit(q?)_[psu]8 intrinsics using a new builtin, rather than the previous temporary asm. Also adds a testcase checking (a) execution results and (b) that we output rbit vXX.8b,vYY.8b or
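The per-lane operation behind these intrinsics, reversing the order of the bits within each byte, can be modelled in portable C. This is a sketch for checking expected results, not the new builtin or the testcase from the patch; `rbit8_model` and `rbit_bytes_model` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the bit order within a single byte (bit 0 <-> bit 7, and so on). */
static uint8_t rbit8_model(uint8_t x)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (uint8_t)((r << 1) | (x & 1));  /* append the lowest bit of x */
        x >>= 1;
    }
    return r;
}

/* Apply the byte-wise reversal to each of n lanes independently. */
static void rbit_bytes_model(const uint8_t *in, uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = rbit8_model(in[i]);
}

/* Small self-check: known values, and reversing twice restores the input. */
static void check_rbit8_model(void)
{
    assert(rbit8_model(0x01) == 0x80);
    assert(rbit8_model(0xF0) == 0x0F);
    for (unsigned v = 0; v < 256; v++)
        assert(rbit8_model(rbit8_model((uint8_t)v)) == v);
}
```

An execution test can compare a model like this against the intrinsic's output lane by lane.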

Re: [PATCH AArch64 1/3] Don't disparage add/sub in SIMD registers

2014-08-18 Thread Alan Lawrence
, the '!' still doesn't express that; and leaving it in affects code-generation on all cores. And it is inconsistent with other instructions. --Alan pins...@gmail.com wrote: On Aug 12, 2014, at 7:40 AM, Alan Lawrence alan.lawre...@arm.com wrote: (It is no more expensive.) Yes on some processors

Re: [PATCH AArch64 3/3] Fix XOR_one_cmpl pattern; add SIMD-reg variants for BIC,ORN,EON

2014-08-13 Thread Alan Lawrence
, Alan Lawrence wrote: ...patch attached... Alan Lawrence wrote: [When I wrote that xor was broken on GPRs and this fixes it, I meant xor_one_cmpl rather than xor, sorry!] The pattern for xor_one_cmpl never matched, due to the action of combine_simplify_rtx; hence, separate this pattern out from

[PATCH AArch64 1/2] Add execution tests of vget_low and vget_high

2014-08-12 Thread Alan Lawrence
The following patch replaces the current temporary inline assembler implementation of vget_high, so this patch adds a test first. We don't have any test coverage of vget_low, either, so add that too. Passing on aarch64-none-elf and aarch64_be-none-elf. diff --git
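What such an execution test needs to establish can be sketched with a plain-array model: vget_low returns the lower-numbered half of the lanes of a 128-bit vector and vget_high the upper half. The code below is not the testcase from the patch; `QLANES`, `DLANES` and the function names are hypothetical, and lanes are identified purely by array index, so endianness does not enter into the model.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define QLANES 4             /* lanes in the illustrative 128-bit vector */
#define DLANES (QLANES / 2)  /* lanes in each 64-bit half */

/* Model of vget_low: lanes 0 .. DLANES-1 of the input. */
static void get_low_model(const int32_t q[QLANES], int32_t d[DLANES])
{
    memcpy(d, q, DLANES * sizeof(int32_t));
}

/* Model of vget_high: lanes DLANES .. QLANES-1 of the input. */
static void get_high_model(const int32_t q[QLANES], int32_t d[DLANES])
{
    memcpy(d, q + DLANES, DLANES * sizeof(int32_t));
}

/* Small self-check: the two halves together reproduce the input. */
static void check_get_halves(void)
{
    const int32_t q[QLANES] = {10, 20, 30, 40};
    int32_t lo[DLANES], hi[DLANES];
    get_low_model(q, lo);
    get_high_model(q, hi);
    assert(lo[0] == 10 && lo[1] == 20);
    assert(hi[0] == 30 && hi[1] == 40);
}
```

Checking both halves against the same input also covers the case where an implementation accidentally returns the wrong half.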
