These three are logically independent, but all on a common theme, and I've
tested them all together by
bootstrapped + check-gcc on aarch64-none-elf
cross-tested check-gcc on aarch64_be-none-elf
Ok for trunk?
Now that float64x1_t is a vector, casting to it from a uint64_t causes the bit
pattern to be reinterpreted, just as vcreate_f64 should. (Previously when
float64x1_t was still a scalar, casting caused a conversion.) Hence, replace the
__builtin with a cast. None of the other variants of the
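For illustration only, here is a portable C sketch of the difference between reinterpreting and converting. It assumes IEEE-754 binary64 doubles, and the function names are hypothetical (not from arm_neon.h). memcpy plays the role of the bit-pattern reinterpretation that vcreate_f64 should perform. An ordinary scalar cast plays the role of the value conversion that casting to the old scalar float64x1_t gave.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret: copy the 64-bit pattern unchanged into a double
   (the behaviour wanted from vcreate_f64). */
double reinterpret_u64_as_f64(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Convert: produce the double with the same numeric value
   (what a cast to a scalar float64x1_t used to do). */
double convert_u64_to_f64(uint64_t value)
{
    return (double) value;
}
```

So 0x3FF0000000000000 reinterprets to 1.0 but converts to 4607182418800017408.0.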
The vld1_lane intrinsic is currently implemented using inline asm. This patch
replaces that with a load and a straightforward use of vset_lane (this gives us
correct bigendian lane-flipping in a simple manner).
Naively this would produce assembler along the lines of (for vld1_lane_u8):
This patch replaces the inline asm for vld1_dup intrinsics with a vdup_n_ and a
load from the pointer. The existing *aarch64_simd_ld1r<mode> insn, combiner,
etc., are quite capable of generating the expected single ld1r instruction from
this. (I've verified by inspecting assembler output.)
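As a rough scalar model of the two rewrites (vld1_lane as "load, then vset_lane" and vld1_dup as "load, then vdup_n"), the sketch below uses a hypothetical struct in place of uint8x8_t. It uses GCC/ACLE lane numbering, so it says nothing about how lanes map to bytes on big-endian. The names are illustrative only, not arm_neon.h functions.

```c
#include <stdint.h>

/* Hypothetical stand-in for uint8x8_t: eight lanes, lane 0 first. */
typedef struct { uint8_t lane[8]; } u8x8_model;

/* vdup_n_u8 analogue: broadcast one scalar into every lane. */
u8x8_model dup_n_u8_model(uint8_t x)
{
    u8x8_model r;
    for (int i = 0; i < 8; i++)
        r.lane[i] = x;
    return r;
}

/* vset_lane_u8 analogue: a copy of v with one lane replaced. */
u8x8_model set_lane_u8_model(uint8_t x, u8x8_model v, int lane)
{
    v.lane[lane] = x;
    return v;
}

/* vld1_lane_u8 expressed as "scalar load, then set lane". */
u8x8_model ld1_lane_u8_model(const uint8_t *p, u8x8_model v, int lane)
{
    return set_lane_u8_model(*p, v, lane);
}

/* vld1_dup_u8 expressed as "scalar load, then dup". */
u8x8_model ld1_dup_u8_model(const uint8_t *p)
{
    return dup_n_u8_model(*p);
}
```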
Ah, I didn't realize Loongson was little-endian only. In that case (with mid-end
reductions-via-shifts changes pushed) I don't think I have actually broken
anything, or at least, no MIPS platform that exists :).
However, yes, that would seem a safe bet (and simpler than my linked patch that
Following recent vectorizer changes to reductions via shifts, AArch64 will now
reduce loops such as this
unsigned char in[8] = {1, 3, 5, 7, 9, 11, 13, 15};
int
main (unsigned char argc, char **argv)
{
unsigned char prod = 1;
/* Prevent constant propagation of the entire loop below. */
...Patch attached...
Alan Lawrence wrote:
Following recent vectorizer changes to reductions via shifts, AArch64 will now
reduce loops such as this
unsigned char in[8] = {1, 3, 5, 7, 9, 11, 13, 15};
int
main (unsigned char argc, char **argv)
{
unsigned char prod = 1;
/* Prevent
Hi,
gcc/config/aarch64/iterators.md contains numerous duplicates, not always
obvious as they are not always sorted the same way. Sometimes one copy is used in
aarch64-simd-builtins.def and another in aarch64-simd.md; other times there is no
obvious pattern ;).
This patch just removes all the
So I'm no expert on RS6000 here, but following on from Segher's observation
about the change in pattern...so the difference in 'expand' is exactly that, a
vsx_reduc_splus_v2df followed by a vec_extract to DF, becomes a
vsx_reduc_splus_v2df_scalar - as I expected the combiner to produce by
Nice! One nit: can the extra tree argument be a const_tree? I'll defer
to the maintainers on the use of C++ default arguments in the AArch64 backend.
But LGTM.
--Alan
Charles Baylis wrote:
On 11 November 2014 15:25, Alan Lawrence alan.lawre...@arm.com wrote:
[Resending in gcc-patches
In response to https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01803.html, this
series removes the VEC_RSHIFT_EXPR, instead using a VEC_PERM_EXPR (with a second
argument full of constant zeroes) to represent the shift.
I've kept the use of vec_shr optab for platforms that define it, as even on
This is a preliminary to patch 2, which wants functionality equivalent to
vect_gen_perm_mask (converting a char* to an RTL const_vector) but without the
check of can_vec_perm_p.
All existing calls to vect_gen_perm_mask barring that in perm_mask_for_reverse,
assert the return value is
This makes the vectorizer use VEC_PERM_EXPRs when doing reductions via shifts,
rather than VEC_RSHIFT_EXPR.
VEC_RSHIFT_EXPR presently has an endianness-dependent meaning (paralleling
vec_shr_optab). While the overall destination of this patch series is to make
these endianness-neutral, this
Tested (with patches 1+2):
Bootstrap + check-gcc on x64-none-linux-gnu
cross-tested check-gcc on aarch64-none-elf and aarch64_be-none-elf as these
platforms stand (i.e. without vec_shr_optab).
also cross-tested check-gcc on aarch64-none-elf and aarch64_be-none-elf after
applying
This redefines vec_shr optab to be the same (in terms of gcc vectors) regardless
of target endianness. The vectorizer uses this to do reductions via shifts, so
also change the vectorizer to shift things always the same way (from the
midend's POV of vectors).
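To illustrate the shift-as-permutation idea described in this series, here is a scalar C model. A whole-vector shift is written as a VEC_PERM_EXPR whose second operand is all zeroes, and it always moves elements towards GCC's element 0, regardless of target endianness. The helper names are hypothetical. The model assumes mask indices lie in [0, 2*NLANES) and that the shift amount k is at most NLANES.

```c
#include <stddef.h>

#define NLANES 8

/* Model of VEC_PERM_EXPR <a, b, mask>: output lane i takes element mask[i]
   of the 2*NLANES-element concatenation {a, b}. Assumes every mask value
   is in [0, 2*NLANES). */
void vec_perm_model(const int a[NLANES], const int b[NLANES],
                    const unsigned mask[NLANES], int out[NLANES])
{
    for (size_t i = 0; i < NLANES; i++) {
        unsigned m = mask[i];
        out[i] = m < NLANES ? a[m] : b[m - NLANES];
    }
}

/* Whole-vector shift by k elements towards element 0, written as a
   permutation of {v, zeros}: lane i takes element i + k, and indices
   >= NLANES select from the zero vector. Assumes k <= NLANES. */
void vec_shift_to_elt0_model(const int v[NLANES], unsigned k, int out[NLANES])
{
    int zeros[NLANES] = {0};
    unsigned mask[NLANES];
    for (unsigned i = 0; i < NLANES; i++)
        mask[i] = i + k;
    vec_perm_model(v, zeros, mask, out);
}
```

Because the shift is stated purely in terms of element indices, nothing in this formulation depends on BYTES_BIG_ENDIAN.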
cross-tested check-gcc on (1)
Have run check-gcc on gcc110.fsffrance.org (powerpc64-unknown-linux-gnu) using
this snippet on top of original patch; no regressions.
Alan Lawrence wrote:
So I'm no expert on RS6000 here, but following on from Segher's observation
about the change in pattern...so the difference in 'expand
Pushed as r217440, also with Charles' whitespace fixes ('' - tab) -
good spot!
Cheers, Alan
Marcus Shawcroft wrote:
On 6 November 2014 10:19, Alan Lawrence alan.lawre...@arm.com wrote:
This generates out-of-range errors at compile- (rather than assemble-)time
for the vqdm*_lane
, but there's still ARM, indeed.
If you have any way/ideas to get better error messages (i.e. line numbers),
that'd be particularly good, tho :)
Cheers, Alan
Charles Baylis wrote:
On 6 November 2014 10:19, Alan Lawrence alan.lawre...@arm.com
mailto:alan.lawre...@arm.com wrote:
This generates out
Ah I see now! Thank you for explaining that bit, I was a bit puzzled when I saw
it, but it makes sense now!
Cheers, Alan
Bill Schmidt wrote:
On Thu, 2014-11-06 at 16:44 +, Alan Lawrence wrote:
Hmmm. I am a little surprised by your mention of saturation points as I would
not expect any
This generates out-of-range errors at compile- (rather than assemble-)time for
the vqdm*_lane intrinsics, and also provides a single place to do bigendian
lane-swapping for all those intrinsics (and others to follow in later patches).
This allows us to remove many define_expands that just do a
Hmmm. I am a little surprised by your mention of saturation points as I would
not expect any variety of reduc_plus to be a saturating operation???
A.
Bill Schmidt wrote:
On Fri, 2014-10-24 at 19:49 -0400, David Edelsohn wrote:
On Fri, Oct 24, 2014 at 8:06 AM, Alan Lawrence alan.lawre
So we've been seeing
FAIL: gcc.target/aarch64/vldN_dup_1.c
on aarch64_be-none-elf, since this patch went in. Felix, did you test for
bigendian?
However, this failure is fixed if I apply David Sherwood's patch set:
https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00942.html
When you say a patch by Alan Hayward that's coming soon, I take it you mean
this one? https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00952.html
Just so that we know it has now arrived :).
--Alan
David Sherwood wrote:
Hi,
I forgot to mention that this patch was tested in combination
, there is just a one-line
conflict with a change to a comment from the previous patch (which I'm skipping)...
Cheers, Alan
Richard Biener wrote:
On Thu, Sep 18, 2014 at 2:35 PM, Alan Lawrence alan.lawre...@arm.com wrote:
The VEC_LSHIFT_EXPR tree code, and the corresponding vec_shl_optab, seem to
have
Rainer Orth wrote:
However, as a quick first step, does adding the ilp32 / lp64 (and keeping
the architectures list for now) solve the immediate problem? Patch
attached, OK for trunk?
No, as I said this is wrong for biarch targets like sparc and i386.
When you say no this does not solve
, with the existing name, so am open to suggestions?
Cheers, Alan
commit 9819291c17610dcdcca19a3d9ea3a4260df0577e
Author: Alan Lawrence alan.lawre...@arm.com
Date: Thu Aug 21 13:05:43 2014 +0100
Temporarily remove gimple_fold
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64
to...
(reduc_plus_scal_*): ...this; reduce to temp and extract scalar result.
commit 22e60bd46f2a591f5357a543d76b19ed89f401ed
Author: Alan Lawrence alan.lawre...@arm.com
Date: Thu Aug 28 16:12:24 2014 +0100
ARM reduc_plus_scal, V_elem not V_ext, rm old reduc_[us]plus, emit the extract!
diff --git
): ...this; extract scalar result.
commit 537c31561933f8054a2289198f35b19cf5c4196e
Author: Alan Lawrence alan.lawre...@arm.com
Date: Thu Aug 28 16:49:24 2014 +0100
ARM reduc_[us](min|max)_scal, V_elem not V_ext, rm old non-_scal version.
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
,
reduc_plus_scal_v8sf, reduc_plus_scal_v4sf): ...these, adding
gen_vec_extract for scalar result.
commit 80b0d10a78b2f3e86325f373e99e9cf71e42e622
Author: Alan Lawrence alan.lawre...@arm.com
Date: Tue Oct 7 13:25:08 2014 +0100
i386
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386
This is an attempt to migrate IA64 to the newer optabs, however, I found none of
the tests in gcc.dg/vect seemed to touch any of the affected patterns, so this
is only really tested by building a stage-1 compiler.
gcc/ChangeLog:
* config/ia64/vect.md (reduc_splus_v2sf): Rename to...
This migrates the reduction patterns in altivec.md and vector.md to the new
names. I've not touched paired.md as I wasn't really sure how to fix that (how
do I vec_extractv2sf ?), moreover the testing I did didn't seem to exercise any
of those patterns (iow: I'm not sure what would be an
Ooops, attached.
commit e48d59399722ce8316d4b1b4f28b40d87b1193fa
Author: Alan Lawrence alan.lawre...@arm.com
Date: Tue Oct 7 15:28:47 2014 +0100
PowerPC v2 (but not paired.md)
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 02ea142..92bb5d0 100644
--- a/gcc
Ooops, attached.
commit 56296417b9f6795e541b1101dce6e6ac1789de9a
Author: Alan Lawrence alan.lawre...@arm.com
Date: Wed Oct 8 15:58:27 2014 +0100
IA64 (?!)
diff --git a/gcc/config/ia64/vect.md b/gcc/config/ia64/vect.md
index e3ce292..45f4156 100644
--- a/gcc/config/ia64/vect.md
+++ b/gcc
-supports.exp
- yes, could do that, but it's difficult to come up with a good characterization
of what the criteria are, and I don't see that it'd generalize to any other tests at
all :()
--Alan
Rainer Orth wrote:
Alan Lawrence alan.lawre...@arm.com writes:
Rainer Orth wrote:
However
From: Rainer Orth [r...@cebitec.uni-bielefeld.de]
Sent: 23 October 2014 14:10
To: Andreas Schwab
Cc: Alan Lawrence; Jeff Law; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Relax check against commuting XOR and ASHIFTRT in combine.c
Andreas Schwab sch...@linux
Sorry, somehow I missed this email. Yes, that appears to have fixed it!
Thank you very much,
Alan
Trevor Saunders wrote:
On Tue, Sep 09, 2014 at 03:37:26PM +0100, Alan Lawrence wrote:
Following this, we're seeing ICEs in tests in gcc.dg/pch.exp and g++.dg/pch.exp,
with cross-builds (hosted
a vec_extract, to produce a scalar result, to the
end of each reduc_ optab ?
--Alan
Richard Biener wrote:
On Mon, Oct 6, 2014 at 7:30 PM, Alan Lawrence alan.lawre...@arm.com wrote:
Ok, so unless there are objections, I plan to commit patches 1, 2, 4, 5,
and
6,
which have been previously
backends to the new _scal_ optab (and
removing the vector optab). Certainly I'd like to replace vec_shr/l with
vec_perm_expr too, but I'm conscious that the end of stage 1 is approaching!
--Alan
Richard Biener wrote:
On Thu, Sep 18, 2014 at 1:41 PM, Alan Lawrence alan.lawre...@arm.com wrote
executed, of course!
So yes, my workaround is wrong, we are working on a proper fix...
--Alan
Andrew Pinski wrote:
On Mon, Sep 22, 2014 at 4:10 AM, Alan Lawrence alan.lawre...@arm.com wrote:
Well, I haven't looked into this in detail: I've gone only as far as
* swapping emit-rtl.o between
Many thanks indeed! :)
--Alan
Segher Boessenkool wrote:
On Wed, Sep 24, 2014 at 04:02:11PM +0100, Alan Lawrence wrote:
However my CompileFarm account is still pending, so to that end, if you
were able to test patch 2/14 (attached inc. Richie's
s/VIEW_CONVERT_EXPR/NOP_EXPR
, is there any
chance you could test this on powerpc too? (in combination with patch 2/14,
which will need to be applied first; you can skip patch 1, and =4.)
--Alan
Richard Biener wrote:
On Thu, Sep 25, 2014 at 4:32 PM, Alan Lawrence alan.lawre...@arm.com wrote:
Ok, so, I've tried making reduc_plus
to the list shortly, and follow up with .md updates to the various backends.
Cheers, Alan
Richard Biener wrote:
On Thu, Sep 18, 2014 at 1:50 PM, Alan Lawrence alan.lawre...@arm.com wrote:
This fixes PR/61114 by redefining the REDUC_{MIN,MAX,PLUS}_EXPR tree codes.
These are presently documented
for trunk...?
--Alan
Andrew Pinski wrote:
On Thu, Sep 18, 2014 at 9:44 AM, Alan Lawrence alan.lawre...@arm.com wrote:
We've been seeing errors using aarch64-none-linux-gnu gcc to build the
403.gcc benchmark from spec2k6, that we've traced back to this patch. The
error looks like:
/home/alalaw01
injecting some dubious RTL from
a builtin, although this'll only give a momentary snapshot of behaviour. I may
or may not have time to look into this though ;)...
Cheers, Alan
Jeff Law wrote:
On 09/18/14 03:35, Alan Lawrence wrote:
Moreover, I think we both agree that if result_mode==shift_mode
Richard Biener wrote:
Huh. Does that ever happen? Please use a NOP_EXPR instead of
a VIEW_CONVERT_EXPR.
Yes, the testcase is gcc.target/i386/pr51235.c which performs black magic***
with void *. (This testcase otherwise fails the verify_gimple_assign_unary check
in tree-cfg.c .) However,
Richard Biener wrote:
scalar_reduc_to_vector misses a comment.
Ok to reuse the comment in optabs.h in optabs.c also?
I wonder if at the end we wouldn't transition all backends and then
renaming reduc_*_scal_optab back to reduc_*_optab makes sense.
Yes, that sounds like a plan, the _scal
...
(tests are as they were posted
https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01233.html .)
--Alan
Jeff Law wrote:
On 07/17/14 10:56, Alan Lawrence wrote:
Ok, the attached tests are passing on x86_64-none-linux-gnu,
aarch64-none-elf, arm-none-eabi, and a bunch of smaller platforms for
which I've
), i.e. all bar the change to arm_regno_class.
A change relating to the program counter affecting -fPIC does sound plausible, I
haven't looked any further than that...
--Alan
Joseph S. Myers wrote:
On Wed, 17 Sep 2014, Alan Lawrence wrote:
We've just noticed this patch causes an ICE in
gcc.c
The end goal here is to remove this code from tree-vect-loop.c
(vect_create_epilog_for_reduction):
if (BYTES_BIG_ENDIAN)
bitpos = size_binop (MULT_EXPR,
bitsize_int (TYPE_VECTOR_SUBPARTS (vectype) - 1),
TYPE_SIZE
The gimple folding ties the AArch64 backend to the tree representation of the
midend via the neon intrinsics. This code enables constant folding of Neon
intrinsics reduction ops, so improves performance, but is not necessary for
correctness. By temporarily removing it (here), we can then change
This fixes PR/61114 by redefining the REDUC_{MIN,MAX,PLUS}_EXPR tree codes.
These are presently documented as producing a vector with the result in element
0, and this is inconsistent with their use in tree-vect-loop.c (which on
bigendian targets pulls the bits out of the wrong end of the
These match their corresponding tree codes, by taking a vector and returning a
scalar; this is more architecturally neutral than the (somewhat loosely defined)
previous optab that took a vector and returned a vector with the result in the
least significant bits (i.e. element 0 for little-endian
This migrates AArch64 over to the new optab for 'plus' reductions, i.e. so the
define_expands produce scalars by generating a MOV to a GPR. Effectively, this
moves the vget_lane that was inside every arm_neon.h intrinsic into the
define_expand itself.
Tested: aarch64.exp vect.exp on
Similarly to the previous patch (r/2205), this migrates AArch64 to the new
reduce-to-scalar optabs for min and max. For consistency we apply the same
treatment to the smax_nan and smin_nan patterns (used for __builtins), even
though reduc_smin_nan_scal (etc.) is not a standard name.
Tested:
This gives us back the constant-folding of the neon-intrinsics that was removed
in the first patch, but is now OK for bigendian too.
bootstrapped on aarch64-none-linux-gnu.
check-gcc on aarch64-none-elf and aarch64_be-none-elf.
gcc/ChangeLog:
* config/aarch64/aarch64.c
For reduction operations (e.g. multiply) that don't have such a tree code, or
where the target platform doesn't define an optab handler for the tree code, we
can perform the reduction using a series of log(N) shifts (where N = #elements
in vector), using the VEC_RSHIFT_EXPR=whole-vector-shift
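The following is a scalar C model of reducing N elements with log2(N) whole-vector shifts, using multiplication (one of the operations with no REDUC_*_EXPR). It is a sketch of the technique, not the vectorizer's code. The function name is hypothetical, and the input values in the usage check are borrowed from the unsigned char example earlier in this thread.

```c
#include <stdint.h>

#define N 8   /* number of vector elements; must be a power of two */

/* Reduce v[0..N-1] by multiplication in log2(N) steps. At each step,
   shift the whole vector towards element 0 by `half` elements
   (zero-filling) and combine lane-wise with the unshifted vector.
   Upper lanes become meaningless, but after the last step element 0
   holds the full product (wrapping modulo 256, like unsigned char). */
uint8_t reduce_mult_via_shifts(const uint8_t in[N])
{
    uint8_t v[N];
    for (int i = 0; i < N; i++)
        v[i] = in[i];

    for (int half = N / 2; half >= 1; half /= 2) {
        uint8_t shifted[N];
        for (int i = 0; i < N; i++)
            shifted[i] = (i + half < N) ? v[i + half] : 0;  /* whole-vector shift */
        for (int i = 0; i < N; i++)
            v[i] = (uint8_t)(v[i] * shifted[i]);            /* lane-wise combine */
    }
    return v[0];   /* extract the scalar result from element 0 */
}
```

For {1, 3, 5, 7, 9, 11, 13, 15} this yields 17 (2027025 mod 256), the same as a sequential loop, using 3 shift+multiply steps instead of 7 extracts.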
These are like the previous patch, but using | rather than * - I was unable to
get the previous test to pass on PowerPC and MIPS.
I note there is no inherent vector operation here - a bitwise OR across a word,
and a reduction via shifts using scalar (not vector) ops would be all that's
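As a small illustration of that point, here is a sketch (with a hypothetical helper name) that ORs together the eight bytes of a 64-bit word. It uses only log2(8) = 3 scalar shift-and-OR steps, with no vector operations at all.

```c
#include <stdint.h>

/* OR together the eight bytes of w with three shift-and-OR steps on a
   scalar value; the combined result ends up in the low byte. */
uint8_t or_reduce_bytes(uint64_t w)
{
    w |= w >> 32;
    w |= w >> 16;
    w |= w >> 8;
    return (uint8_t)(w & 0xff);
}
```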
The VEC_RSHIFT_EXPR is only ever used by the vectorizer in tree-vect-loop.c
(vect_create_epilog_for_reduction), to shift the vector by a whole number of
elements. The tree code allows more general shifts but only for integral types.
This only causes pain and difficulty for backends
This allows reduction of non-(plus|min|max) operations using log_2(N) shifts
rather than N vec_extracts; e.g. for example code
int
main (unsigned char argc, char **argv)
{
unsigned char in[16] = { 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31 };
unsigned char i = 0;
unsigned char sum = 1;
The VEC_LSHIFT_EXPR tree code, and the corresponding vec_shl_optab, seem to have
been added for completeness, providing a counterpart to VEC_RSHIFT_EXPR and
vec_shr_optab. However, whereas VEC_RSHIFT_EXPRs are generated (only) by the
vectorizer, VEC_LSHIFT_EXPR expressions are not generated at
The direction of VEC_RSHIFT_EXPR has been endian-dependent, contrary to the
general principles of tree. This patch updates fold-const and the vectorizer
(the only place where such expressions are created), such that VEC_RSHIFT_EXPR
always shifts towards element 0.
The tree code still maps
The previous patch broke aarch64_be by redefining VEC_RSHIFT_EXPR /
vec_shr_optab to always shift the vector towards gcc's element 0. This fixes
aarch64_be to do that.
check-gcc on aarch64-none-elf (no changes) and aarch64_be-none-elf (fixes all
regressions produced by previous patch, i.e. no
Following earlier patches, vect_create_epilog_for_reduction contains exactly one
case where extract_scalar_result==true. Hence, move the code 'if
(extract_scalar_result)' there, and tidy-up/remove some variables.
bootstrapped on x86_64-none-linux-gnu + check-gcc + check-g++.
gcc/ChangeLog:
Patch 12 of 14 (https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01475.html) will
break bigendian targets implementing vec_shr. This is a PowerPC parallel of
patch 13 of 14 (https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01477.html) for
AArch64. I've checked I can build a stage 1 compiler for
Patch 12 of 14 (https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01475.html) will
break bigendian targets implementing vec_shr. This is a MIPS parallel of
patch 13 of 14 (https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01477.html) for
AArch64; the idea is that vec_shr should be unaffected on
We've been seeing errors using aarch64-none-linux-gnu gcc to build the 403.gcc
benchmark from spec2k6, that we've traced back to this patch. The error looks like:
/home/alalaw01/bootstrap_richie/gcc/xgcc -B/home/alalaw01/bootstrap_richie/gcc
-O3 -mcpu=cortex-a57.cortex-a53 -DSPEC_CPU_LP64
We've just noticed this patch causes an ICE in
gcc.c-torture/execute/scal-to-vec1.c at -O3 when running with -fPIC on
arm-none-linux-gnueabi and arm-none-linux-gnueabihf; test logs:
spawn /tmp/alan/buildarm-none-linux-gnueabi/obj/gcc4/gcc/xgcc -B/tmp/alan/builda
marcus.shawcr...@gmail.com wrote:
On 8 September 2014 11:35, Alan Lawrence alan.lawre...@arm.com wrote:
This adds a test of all the variants of vld2, vld2q, vld3, vld3q, vld4, and
vld4q. These all use typexNxM structs and the OI/CI/XImode mechanism, so the
test cross-checks this against plain ol' vst1(q
Original patch applied cleanly to 4.9 HEAD as r215175.
Marcus Shawcroft wrote:
On 20 August 2014 10:25, Alan Lawrence alan.lawre...@arm.com wrote:
The SIMD-register variant is miscategorized as alu_reg despite not using
any ALU registers, and should be neon_add for e.g. scheduling.
Tested
...@gmail.com wrote:
On 19 August 2014 11:44, Alan Lawrence alan.lawre...@arm.com wrote:
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.c (aarch64_types_cmtst_qualifiers,
TYPES_TST): Define.
(aarch64_fold_builtin): Update pattern for cmtst.
* config/aarch64/aarch64
Following this, we're seeing ICEs in tests in gcc.dg/pch.exp and g++.dg/pch.exp,
with cross-builds (hosted on x86_64) targeting bare metal AArch64 and ARM
(aarch64-none-elf, aarch64_be-none-elf and arm-none-eabi; I haven't tested
armeb-none-eabi; builds targeting linux are OK), for *release
This adds a test of all the variants of vld2, vld2q, vld3, vld3q, vld4, and
vld4q. These all use typexNxM structs and the OI/CI/XImode mechanism, so the
test cross-checks this against plain ol' vst1(q?).
Cross-tested on aarch64-none-elf (passing), also on aarch64_be-none-elf
The existing vld1/vst1_1.c test in gcc.target/aarch64 covers only vld1_s8 and
vld1q_s16. This extends it to cover all int/float variants via token-pasting.
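To show what "via token-pasting" means here, this sketch generates one round-trip check per element type from a single macro. It is not the actual vld1-vst1_1.c. The real test calls vld1_<suffix>/vst1_<suffix> from arm_neon.h. In this portable version, a hypothetical memcpy-based load/store pair stands in for them, and the test_roundtrip_* names are illustrative only.

```c
#include <stdint.h>
#include <string.h>

/* Expand to a function test_roundtrip_<SUFFIX> that fills an array of
   LANES elements of TYPE, "loads" it into a stand-in register, "stores"
   it back, and returns 1 if the data survived unchanged. */
#define DEFINE_ROUNDTRIP_TEST(TYPE, SUFFIX, LANES)                      \
    int test_roundtrip_##SUFFIX(void)                                   \
    {                                                                   \
        TYPE src[LANES], dst[LANES];                                    \
        for (int i = 0; i < (LANES); i++)                               \
            src[i] = (TYPE)(i + 1);                                     \
        TYPE reg[LANES];              /* stands in for a vector reg */  \
        memcpy(reg, src, sizeof reg); /* "load" */                      \
        memcpy(dst, reg, sizeof dst); /* "store" */                     \
        return memcmp(src, dst, sizeof src) == 0;                       \
    }

DEFINE_ROUNDTRIP_TEST(int8_t, s8, 8)
DEFINE_ROUNDTRIP_TEST(uint16_t, u16, 4)
DEFINE_ROUNDTRIP_TEST(int32_t, s32, 4)
DEFINE_ROUNDTRIP_TEST(float, f32, 2)
DEFINE_ROUNDTRIP_TEST(double, f64, 2)
```

Adding another variant is then a one-line macro invocation rather than a hand-written function.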
Passing on aarch64-none-elf and aarch64_be-none-elf.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vld1-vst1_1.c: Rewrite to test
This adds a test of the vld2_dup, vld2q_dup, vld3_dup, vld3q_dup, vld4_dup and
vld4q_dup intrinsics.
Passing on aarch64-none-elf and aarch64_be-none-elf.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vldN_dup_1.c: New test.
diff --git a/gcc/testsuite/gcc.target/aarch64/vldN_dup_1.c
At present there is no test coverage of the vld2_lane, vld2q_lane, vld3_lane,
vld3q_lane, vld4_lane, vld4q_lane intrinsics. So this adds a test using the vld1
and vst1 intrinsics.
Passing on aarch64-none-elf.
Failing on aarch64_be-none-elf; I believe because the intrinsic is [modifying
the]
This adds a test of all the variants of vst2, vst2q, vst3, vst3q, vst4, and
vst4q. These all use typexNxM structs and the OI/CI/XImode mechanism, so the
test cross-checks this against plain ol' vld1(q?).
Cross-tested on aarch64-none-elf (passing), also on aarch64_be-none-elf (failing
as per
(No regressions in check-gcc or check-g++ on aarch64-none-elf.)
--Alan
Alan Lawrence wrote:
Some manual editing of patch required due to e.g. int64x1 changes present on
trunk but not on the 4.9 branch; new patch attached.
I've done a quick smoke test of aarch64.exp+simd.exp (check-gcc
Pushed as r215015.
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.c
(aarch64_types_cmtst_qualifiers, TYPES_TST): Remove as unused.
-
Index: gcc/config/aarch64/aarch64-builtins.c
===
---
aarch64-none-elf and aarch64_be-none-elf, but
FAIL for aarch64-none-linux-gnu.
It seems this is not what you saw in your own validations?
Christophe.
On 2 September 2014 17:17, Marcus Shawcroft marcus.shawcr...@gmail.com wrote:
On 19 August 2014 11:44, Alan Lawrence alan.lawre...@arm.com wrote
The sshr_n_64 intrinsics allow performing a signed shift right by 64 places. The
standard ashrdi3 pattern masks the shift amount with 63, so cannot be used.
However, such a shift fills the result with the sign bit, which is identical to
shifting right by 63. This patch just simplifies the code to
Patch attached.
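Here is a small C sketch of the equivalence (the function name is hypothetical). A shift by 64 is undefined in C, much as ashrdi3 masks the amount to 63. Because an arithmetic shift by 64 would leave nothing but copies of the sign bit, clamping the amount to 63 gives the intended result. This assumes >> on a negative int64_t is an arithmetic shift, which GCC documents as its behaviour.

```c
#include <stdint.h>

/* Signed shift right for n in [1, 64], the range the sshr_n_64 intrinsics
   accept. A shift by 64 is replaced by a shift by 63: both yield all
   copies of the sign bit (0 for non-negative x, -1 for negative x). */
int64_t sshr64_model(int64_t x, int n)
{
    return x >> (n == 64 ? 63 : n);
}
```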
Alan Lawrence wrote:
The sshr_n_64 intrinsics allow performing a signed shift right by 64 places. The
standard ashrdi3 pattern masks the shift amount with 63, so cannot be used.
However, such a shift fills the result with the sign bit, which is identical to
shifting right by 63
const0_rtx.
(aarch64_ushr_simddi): Delete.
* config/aarch64/aarch64.md (enum unspec): Delete UNSPEC_USHR64.
Alan Lawrence wrote:
The sshr_n_64 intrinsics allow performing a signed shift right by 64 places. The
standard ashrdi3 pattern masks the shift amount with 63, so cannot be used
This adds a test that checks the result of a vset_lane intrinsic is identical
to the input apart from one value being changed.
Test checks only one index per vset_lane_xxx in a somewhat ad hoc fashion as the
index has to be a compile-time immediate and I felt that doing a loop using
macros
with __aarch64_vset_lane_any.
OK for trunk?
Alan Lawrence wrote:
This adds a test that checks the result of a vset_lane intrinsic is identical
to the input apart from one value being changed.
Test checks only one index per vset_lane_xxx in a somewhat ad hoc fashion as the
index has to be a compile-time immediate and I
, vreinterpret_s16_f64,
vreinterpret_s32_f64, vreinterpret_u8_f64, vreinterpret_u16_f64,
vreinterpret_u32_f64): Use cast.
* config/aarch64/iterators.md (VD_RE): Delete.
commit 126c5b92eea2850367f005ebe89f86c5b8b4e4f9
Author: Alan Lawrence alan.lawre...@arm.com
Date: Wed Aug 6 14:23:00 2014
Pushed r214946. (In the meantime the erroneous alu_reg had been changed to
alu_sreg by r212750 https://gcc.gnu.org/ml/gcc-patches/2014-07/msg00679.html .)
Marcus Shawcroft wrote:
On 20 August 2014 10:25, Alan Lawrence alan.lawre...@arm.com wrote:
The SIMD-register variant is miscategorized
running.
I repeat, this is source-code-compatibility breaking, but not ABI breaking; if
it causes you any problems, it'll be the 4.9.x compiler shouting at you ;).
Ok assuming no regressions?
--Alan
Marcus Shawcroft wrote:
On 24 July 2014 11:18, Alan Lawrence alan.lawre...@arm.com wrote
.
--Alan
Marcus Shawcroft wrote:
On 12 August 2014 11:12, Alan Lawrence alan.lawre...@arm.com wrote:
This patch replaces the current inline assembler for the vget_high
intrinsics in arm_neon.h with a sequence of other calls, in a similar
fashion to vget_low. Unlike the assembler, these are all
This patch just replaces the varargs with a builtin_simd_arg*. The use of
varargs seems to make stepping into, and breakpointing, aarch64_simd_expand_args
difficult, and this adds type safety and (I argue!) reduces complexity.
Tested check-gcc on aarch64-none-elf.
gcc/ChangeLog:
*
The SIMD-register variant is miscategorized as alu_reg despite not using any
ALU registers, and should be neon_add for e.g. scheduling.
Tested with check-gcc and check-g++ on aarch64-none-elf and aarch64_be-none-elf.
gcc/ChangeLog:
* config/aarch64/aarch64.md (adddi3_aarch64): set
The only reference is in a comment.
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.c (enum aarch64_type_qualifiers):
Remove qualifier_const_pointer, update comment.
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index
Thanks to Arnaud for confirming that AdaCore does not have an interest in the
Ada/Alpha combination
(https://gcc.gnu.org/ml/gcc-patches/2014-08/msg01832.html).
As per below, I've tested check-ada on x86_64-none-linux-gnu without problems.
Can I say, ping? :)
Cheers, Alan
Alan Lawrence wrote
Completely as an aside: this makes me wonder whether having and using a similar
macro _inside_ gcc, i.e. for targets to specify the ordering of elements within
a vector independently of BYTES_BIG_ENDIAN, might be a good thing?
--Alan
Bill Schmidt wrote:
Hi,
This adds a macro to indicate
of you folk are
able to build+test the patch (below) for Ada on Alpha: is this really reason for
us to want to hold this up?
--Alan
Alan Lawrence wrote:
...as I've not managed to build such a gcc. If so, is there any chance you could
please test check-ada with the following patch (in gcc
Vector comparisons are sometimes generated with needless 'not' instructions, and
'cmtst' is generally not output at all. This patch makes
gen_aarch64_vcond_internal more intelligent with regard to swapping the operands
to both the comparison and the conditional move, such that the 'not' is avoided
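As a lane-wise illustration of why a needless 'not' can appear, the sketch below computes the cmtst/vtst result in two equivalent ways. The first is the direct test, and the second is "compare-equal to zero, then invert". This is only a model of the semantics with hypothetical function names. I am not claiming it shows the exact RTL that gen_aarch64_vcond_internal produces.

```c
#include <stdint.h>

#define LANES 8

/* cmtst/vtst semantics per lane: all-ones if (a & b) != 0, else zero. */
void cmtst_model(const uint8_t a[LANES], const uint8_t b[LANES],
                 uint8_t out[LANES])
{
    for (int i = 0; i < LANES; i++)
        out[i] = (a[i] & b[i]) ? 0xff : 0x00;
}

/* The same result spelled as "cmeq against zero, then not": the shape
   that costs an extra instruction if cmtst is not selected. */
void cmeq_zero_then_not_model(const uint8_t a[LANES], const uint8_t b[LANES],
                              uint8_t out[LANES])
{
    for (int i = 0; i < LANES; i++) {
        uint8_t eqz = ((a[i] & b[i]) == 0) ? 0xff : 0x00;  /* cmeq with #0 */
        out[i] = (uint8_t)~eqz;                            /* not */
    }
}
```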
The vector compare intrinsics (vc[gl][et]z, vceqz, vtst) were written using
__builtin functions as (IIUC) at the time gcc vector extensions did not support
comparison ops across both C and C++ frontends. These have since been updated.
Following the first patch, we now get equal/better code
This patch adds the missing vrbit_p8 and vrbitq_p8 intrinsics to arm_neon.h, and
implements all the vrbit(q?)_[psu]8 intrinsics using a new builtin, rather than
the previous temporary asm. Also adds a testcase checking (a) execution results
and (b) that we output rbit vXX.8b,vYY.8b or
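For reference, here is a portable sketch of the per-lane operation: "rbit vD.8b, vS.8b" reverses the bit order within each byte independently. The function names below are hypothetical and not the new builtin.

```c
#include <stdint.h>

/* Reverse the bit order within one byte by swapping nibbles, then bit
   pairs, then adjacent bits. */
uint8_t rbit8(uint8_t x)
{
    x = (uint8_t)(((x & 0xF0u) >> 4) | ((x & 0x0Fu) << 4));  /* swap nibbles */
    x = (uint8_t)(((x & 0xCCu) >> 2) | ((x & 0x33u) << 2));  /* swap bit pairs */
    x = (uint8_t)(((x & 0xAAu) >> 1) | ((x & 0x55u) << 1));  /* swap adjacent bits */
    return x;
}

/* vrbit_u8 analogue: apply rbit8 to each of the eight lanes. */
void vrbit_u8_model(const uint8_t in[8], uint8_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = rbit8(in[i]);
}
```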
, the '!' still doesn't express
that; and leaving it in affects code-generation on all cores. And it is
inconsistent with other instructions.
--Alan
pins...@gmail.com wrote:
On Aug 12, 2014, at 7:40 AM, Alan Lawrence alan.lawre...@arm.com wrote:
(It is no more expensive.)
Yes on some processors
, Alan Lawrence wrote:
...patch attached...
Alan Lawrence wrote:
[When I wrote that xor was broken on GPRs and this fixes it, I meant
xor_one_cmpl rather than xor, sorry!]
The pattern for xor_one_cmpl never matched, due to the action of
combine_simplify_rtx; hence, separate this pattern out from
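The underlying reason is that "xor with one's complement" (XNOR) has several equivalent spellings. A pattern written in one shape may therefore never be matched if the simplifier rewrites it into another. The C sketch below just checks the identities with hypothetical function names. I have not verified which of these forms combine_simplify_rtx actually prefers.

```c
#include <stdint.h>

/* Three spellings of XNOR; they agree for all inputs. */
uint64_t xnor_not_of_xor(uint64_t a, uint64_t b) { return ~(a ^ b); }
uint64_t xnor_not_first(uint64_t a, uint64_t b)  { return ~a ^ b; }
uint64_t xnor_not_second(uint64_t a, uint64_t b) { return a ^ ~b; }
```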
Following patch replaces the current temporary inline assembler implementation
of vget_high. So this patch adds a test first. We don't have any test coverage
of vget_low, either, so add that too.
Passing on aarch64-none-elf and aarch64_be-none-elf.
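The property such a test checks can be sketched as follows: with GCC/ACLE lane numbering, vget_low returns lanes 0..N/2-1 and vget_high returns lanes N/2..N-1. The structs and function names below are hypothetical stand-ins for int16x8_t/int16x4_t and the intrinsics, and they model only the lane semantics, not the register layout on big-endian.

```c
#include <stdint.h>

/* Hypothetical stand-ins for int16x8_t (128-bit) and int16x4_t (64-bit). */
typedef struct { int16_t lane[8]; } s16x8_model;
typedef struct { int16_t lane[4]; } s16x4_model;

/* vget_low_s16 analogue: lanes 0..3 of the 128-bit vector. */
s16x4_model get_low_s16_model(s16x8_model v)
{
    s16x4_model r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = v.lane[i];
    return r;
}

/* vget_high_s16 analogue: lanes 4..7 of the 128-bit vector. */
s16x4_model get_high_s16_model(s16x8_model v)
{
    s16x4_model r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = v.lane[i + 4];
    return r;
}
```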
diff --git