On 06/11/15 10:39, Richard Biener wrote:
../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: error: location
references block not in block tree
l1_279 = PHI <1(28), l1_299(33)>
^^^
this is the error to look at! It means that the GC heap will be corrupted
quite easily.
Thanks, I'll
On 04/11/15 13:13, Jakub Jelinek wrote:
On Mon, Jul 06, 2015 at 05:38:35PM +0100, Alan Lawrence wrote:
Trying to push these now (svn!), patch 2 is going first.
I realize my second iteration of patch 1/2, dropped the testcases from the
first version. Okay to include those as per
https
On 30/10/15 10:54, Eric Botcazou wrote:
> On 30/10/15 10:44, Richard Biener wrote:
>>
>> I think you want to use wide-ints here and
>>
>> wide_int idx = wi::from (minidx, TYPE_PRECISION (TYPE_DOMAIN
>> (...)), TYPE_SIGN (TYPE_DOMAIN (..)));
>> wide_int maxidx = ...
>>
>> you can then
On 3 November 2015 at 14:01, Richard Biener wrote:
>
> Hum. I still wonder why we need all this complication ...
Well, certainly I'd love to make it simpler, and if the complication
is because I've gone about trying to deal with especially Ada in the
wrong way...
>
On 03/11/15 13:39, Richard Biener wrote:
> On Tue, Oct 27, 2015 at 6:38 PM, Alan Lawrence <alan.lawre...@arm.com> wrote:
>>
>> Say I...P are consecutive, the input would have gaps 0 1 1 1 1 1 1 1. If we
>> split the load group, we would want subgroups with gaps 0 1 1
On 3 November 2015 at 11:35, Richard Biener wrote:
>
> I think this should simply re-write A << B to (type) (unsigned-type) A
> * (1U << B).
>
> Does that then still vectorize the signed case?
I didn't realize our representation of chrec's could express that.
Yes, it
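A minimal sketch of the rewrite suggested above, written as plain C rather than as GCC internals. The helper name shl_via_unsigned_mul is made up for illustration, and the checks assume a 32-bit int. The multiplication is done in the unsigned type, so it wraps instead of overflowing a signed type:

```c
/* Hypothetical helper, not part of any patch: computes a << b as
   (int) ((unsigned) a * (1U << b)).  The multiply happens in unsigned
   arithmetic, which wraps modulo 2^N rather than overflowing.
   Requires b to be less than the width of unsigned int.  */
int
shl_via_unsigned_mul (int a, unsigned int b)
{
  unsigned int scaled = (unsigned int) a * (1U << b);

  /* Converting an out-of-range value back to int is
     implementation-defined in ISO C; GCC documents it as reduction
     modulo 2^N, which is what this sketch relies on.  */
  return (int) scaled;
}
```

With GCC on a target with 32-bit int, shl_via_unsigned_mul (0x7fffffff, 1) gives -2, matching the shift, and no signed multiplication is ever performed.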
> s/explicitely/explicitly/ And remove the '*' from the 2nd and 3rd lines
> of the comment.
>
> It looks like get_ctor_element_at_index has numerous formatting
> problems. In particular you didn't indent the braces across the board
> properly. Also check for tabs vs spaces issues please.
Yes,
On 30/10/15 05:35, Jeff Law wrote:
> On 10/29/2015 01:18 PM, Alan Lawrence wrote:
>> This patch just teaches DOM that ARRAY_REFs can be equivalent to MEM_REFs
>> (with
>> pointer type to the array element type).
>>
>> gcc/ChangeLog:
>>
>> * t
On 3 November 2015 at 10:27, Alan Lawrence <alan.lawre...@arm.com> wrote:
> That is, ssa-dom-cse-7.c passes (and the patch series solves PR/63679) if
> instead of my patch 2 (normalization of MEM_REFs) we have this:
>
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index 43
There are still a few uses of the old reduc_[us](plus|min|max)_ optabs
remaining. This migrates the instances in mips-ps-3d.md.
This seemed straightforward, as mips-ps-3d.md also provides a vec_extractv2sf.
I tried to be conservative and handle all the possible cases for endianness,
this may be
This migrates the various reduction optabs in sse.md to use the reduce-to-scalar
form. I took the straightforward approach (equivalent to the migration code in
expr.c/optabs.c) of generating a vector temporary, using the existing code to
reduce to that, and extracting lane 0, in each pattern.
On 27/10/15 22:27, H.J. Lu wrote:
>
> It caused:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68112
Bah :(.
So yes, in general case, we can't rewrite (a << 1) to (a * 2) as for signed
types (0x7f...f) << 1 == -2 whereas (0x7f...f * 2) is undefined behaviour.
Oh well :(...
I don't have a
On 02/11/15 14:38, Alan Lawrence wrote:
>
I'm a bit puzzled as to why nobody else has been seeing this, as it's been
happening to me as part of building gcc on x86_64, but since this patch I've
been seeing an ICE in vec::operator[] in reorder_basic_blocks_simple, building
libitm/beginend
On 26/10/15 16:26, Alan Lawrence wrote:
The included testcase demonstrates the ICE: aarch64_valid_floating_const
(via aarch64_float_const_representable_p) disables HFmode immediates, but
allows 0.0. However, *movhf_aarch64 does not allow this insn:
(insn 7 6 10 2 (set (mem:HF (reg/f:DI 73) [0
This patch just teaches DOM that ARRAY_REFs can be equivalent to MEM_REFs (with
pointer type to the array element type).
gcc/ChangeLog:
* tree-ssa-dom.c (dom_normalize_single_rhs): New.
(dom_normalize_gimple_stmt): New.
(lookup_avail_expr): Call dom_normalize_gimple_stmt.
This is a revision of previous series at
https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01485.html , and follows on from
the first two patches of that series, which have been pushed already.
A few things have happened since. The previous patch 3, making SRA generate
ARRAY_REFS, is removed. As
gcc/ChangeLog:
* tree-sra.c (scalarizable_type_p): Comment variable-length arrays.
(completely_scalarize): Comment zero-length arrays.
(get_access_replacement): Correct comment re. precondition.
---
gcc/tree-sra.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
This is in response to https://gcc.gnu.org/ml/gcc/2015-10/msg00097.html, where
Richi points out that CONSTRUCTOR elements are not necessarily ordered.
I wasn't sure of a good naming convention for the new get_ctor_element_at_index,
other suggestions welcome.
gcc/ChangeLog:
*
This has changed quite a bit since the previous revision
(https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01484.html), mostly due to Ada
and specifically Ada on ARM.
I didn't find a good alternative to scanning for constant-pool accesses "as we
go" through the function, and although I didn't find
The code I added to completely_scalarize for arrays isn't right in some cases
of negative array indices (e.g. arrays with indices from -1 to 1 in the Ada
testsuite). On ARM, this prevents a failure bootstrapping Ada with the next
patch, as well as a few ACATS tests (e.g. c64106a).
Some discussion
This makes dom2 identify e.g. MEM[(int[8] *)...] with MEM[(int *)...].
These are not generally equivalent as they have different aliasing behaviour
but they have the same value as far as dom is concerned and so this helps
find more equivalences.
There is some question over the best policy here,
--in-reply-to
<cafiyyc3tepgber2jqc8-x_ij4ghtjjoxfzffcnyzhxhgqbe...@mail.gmail.com>
On 26/10/15 08:58, Richard Biener wrote:
>
> On Fri, Oct 23, 2015 at 5:15 PM, Alan Lawrence <alan.lawre...@arm.com> wrote:
>> + chrec2 = fold_build2 (LSHI
On 26/10/15 15:04, Richard Biener wrote:
apart from the fact that you'll post a new version you need to adjust GROUP_GAP.
You also seem to somewhat "confuse" "first I stmts" and "a group of
size I", those
are not the same when the group has gaps. I'd say "a group of size i" makes the
most
The included testcase demonstrates the ICE: aarch64_valid_floating_const
(via aarch64_float_const_representable_p) disables HFmode immediates, but
allows 0.0. However, *movhf_aarch64 does not allow this insn:
(insn 7 6 10 2 (set (mem:HF (reg/f:DI 73) [0 *f_2(D)+0 S2 A16])
(const_double:HF
On 23 October 2015 at 16:20, Alan Lawrence <alan.lawre...@arm.com> wrote:
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-7.c
> b/gcc/testsuite/gcc.dg/vect/bb-slp-7.c
> index ab54a48..b012d78 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-7.c
> +++ b/gcc/testsuite/g
vect_analyze_slp_instance currently only creates an slp_instance if _all_ stores
in a group fitted the same pattern. This patch splits non-matching groups up
on vector boundaries, allowing only part of the group to be SLP'd, or multiple
subgroups to be SLP'd differently.
The algorithm could be
On 19/10/15 12:49, Richard Biener wrote:
> Err, you should always do the shift in the type of rhs1. You should also
> avoid the chrec_convert of rhs2 above for shifts.
Err, yes, indeed. Needed to keep the chrec_convert before the
chrec_fold_multiply, and the rest followed. How's this?
Just one very small point...
On 19/10/15 09:17, Alan Hayward wrote:
> - if (check_reduction
> - && (!commutative_tree_code (code) || !associative_tree_code (code)))
> + if (check_reduction)
> {
> - if (dump_enabled_p ())
> -report_vect_op (MSG_MISSED_OPTIMIZATION,
On closer inspection I think you can also remove this guy (from loongson.md):
(define_insn "reduc_uplus_v8qi"
[(set (match_operand:V8QI 0 "register_operand" "=f")
(unspec:V8QI [(match_operand:V8QI 1 "register_operand" "f")]
UNSPEC_LOONGSON_BIADD))]
gcc.dg/tree-ssa/sra-12.c is skipped on a bunch of targets, including AArch64,
because the default max-scalarization-size depends on MOVE_RATIO, and on those
targets thus ends up being too small for SRA to optimize the testcase. Recently
I noticed that the test has been failing for some time on ARM
The test vdiv_f.c #define's NAN to (0.0 / 0.0). This produces extra scalar
fdiv's, which complicate the scan-assembler testing. We can remove these by
using __builtin_nan instead.
Tested on AArch64 Linux.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vdiv_f.c: Use __builtin_nan.
---
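For illustration only, a sketch of the kind of definition the ChangeLog entry describes. The macro name TEST_NAN and both helper functions are hypothetical; the actual test redefines NAN itself. __builtin_nan ("") yields a quiet NaN as a constant, so no scalar division is needed to produce it:

```c
/* Hypothetical macro name: a quiet NaN obtained from the GCC builtin
   instead of evaluating 0.0 / 0.0.  */
#define TEST_NAN __builtin_nan ("")

/* Returns the NaN constant defined above.  */
double
get_test_nan (void)
{
  return TEST_NAN;
}

/* NaN is the only value that compares unequal to itself (this relies
   on IEEE semantics, i.e. no -ffast-math).  */
int
is_test_nan (double x)
{
  return x != x;
}
```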
On 14/10/15 23:02, Charles Baylis wrote:
On 12 October 2015 at 11:58, Alan Lawrence <alan.lawre...@arm.com> wrote:
>
Given we are making changes here to how this all works on bigendian, have
you tested armeb at all?
I tested on big endian, and it passes, except
Well, I aske
This enables tests bb-slp-11.c and bb-slp-26.c for AArch64. Both of these are
currently passing on little- and big-endian.
(Tested on aarch64-none-linux-gnu and aarch64_be-none-elf).
OK for trunk?
gcc/testsuite/ChangeLog:
* lib/target-supports.exp (check_effective_target_vect64): Add
This lets the vectorizer handle some simple strides expressed using left-shift
rather than mul, e.g. a[i << 1] (whereas previously only a[i * 2] would have
been handled).
This patch does *not* handle the general case of shifts - neither a[i << j]
nor a[1 << i] will be handled; that would be a
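To make the two forms concrete, here is a small sketch with made-up function names (this is not the testcase from the patch). Both loops store to every second element; the first writes the stride as a left shift, the second as a multiplication:

```c
/* Hypothetical example: stride of 2 expressed as a left shift.  */
void
store_stride_shift (int *a, int n)
{
  for (int i = 0; i < n; i++)
    a[i << 1] = i;
}

/* The same access pattern with the stride expressed as a multiply.  */
void
store_stride_mul (int *a, int n)
{
  for (int i = 0; i < n; i++)
    a[i * 2] = i;
}
```

For non-negative i small enough that i * 2 does not overflow, the two functions touch exactly the same elements, which is why handling them the same way is reasonable.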
On 07/10/15 00:59, charles.bay...@linaro.org wrote:
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
...
case NEON_ARG_MEMORY:
/* Check if expand failed. */
if (op[argc] == const0_rtx)
{
- va_end
On 09/10/15 22:01, Jeff Law wrote:
So my question for the series as a whole is whether or not we need to do
something for the other languages, particularly Fortran. I was a bit
surprised to see this stuff bleed into the C/C++ front-ends and
obviously wonder if it's bled into Fortran, Ada,
On 07/10/15 00:59, charles.bay...@linaro.org wrote:
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 2667866..251afdc 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -4261,8 +4261,9 @@ if (BYTES_BIG_ENDIAN)
UNSPEC_VLD1_LANE))]
On 07/10/15 11:50, Simon Dardis wrote:
On the change from smin/smax it was a deliberate change as I managed to confuse
myself of the mode patterns, correct version follows. Reverted back to VWHB for
smax/smin. Stylistic point addressed.
No new regression, ok for commit?
Well, I'm not a MIPS
Thanks for working on this, Simon!
On 01/10/15 15:43, Simon Dardis wrote:
-(define_expand "reduc_smax_"
- [(match_operand:VWHB 0 "register_operand" "")
- (match_operand:VWHB 1 "register_operand" "")]
+(define_expand "reduc_smax_scal_"
+ [(match_operand:HI 0 "register_operand" "")
+
On 21/09/15 15:38, James Greenhalgh wrote:
On Mon, Sep 21, 2015 at 10:44:32AM +0100, Alan Lawrence wrote:
[Resending in plain text] This makes sense to me now, although I find
your comment slightly confusing:
[] in that
+;; the meaning of HI and LO is always taken with a little-endian
[Resending in plain text] This makes sense to me now, although I find
your comment slightly confusing:
[] in that
+;; the meaning of HI and LO is always taken with a little-endian view of
+;; the vector
You mean vec_unpacks_{hi,lo} (which seems to go against the
*architectural* bit after
On 18/09/15 13:17, Richard Biener wrote:
Ok, I see.
That this case is already vectorized is because it implements MAX_EXPR,
modifying it slightly to
int foo (int *a)
{
  int val = 0;
  for (int i = 0; i < 1024; ++i)
    if (a[i] > val)
      val = a[i] + 1;
  return val;
}
makes it no
On 18/09/15 09:35, Richard Biener wrote:
Btw, we ditched the original reduce-to-vector variant due to its
endianness issues (it only had _one_ element of the vector contain
the reduction result). Re-introducing reduce-to-vector but with
the reduction result in all elements wouldn't have any
This is a respin of https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01024.html
after discovering that patch was broken on power64le - thanks to Bill Schmidt
for pointing out that gcc112 is the opposite endianness to gcc110...
This time I decided to avoid any funny business with making RTL match
On 02/09/15 23:12, Alexandre Oliva wrote:
On Sep 2, 2015, Alan Lawrence <alan.lawre...@arm.com> wrote:
One more failure to report, I'm afraid. On AArch64 Bigendian,
aapcs64/func-ret-4.c ICEs in simplify_subreg (line refs here are from
r227348):
Thanks. The failure mode was dif
On 15/09/15 08:43, Richard Biener wrote:
>
> Sorry for chiming in so late...
Not at all, TYVM for your help!
> TREE_CONSTANT isn't the correct thing to test. You should use
> TREE_CODE () == INTEGER_CST instead.
Done (in some cases, via tree_fits_shwi_p).
> Also you need to handle
> NULL_TREE
On 16/09/15 15:28, Bill Schmidt wrote:
2015-09-16 Bill Schmidt
* config/rs6000/altivec.md (UNSPEC_REDUC_SMAX, UNSPEC_REDUC_SMIN,
UNSPEC_REDUC_UMAX, UNSPEC_REDUC_UMIN, UNSPEC_REDUC_SMAX_SCAL,
UNSPEC_REDUC_SMIN_SCAL,
On 16/09/15 17:10, Bill Schmidt wrote:
On Wed, 2015-09-16 at 16:29 +0100, Alan Lawrence wrote:
On 16/09/15 15:28, Bill Schmidt wrote:
2015-09-16 Bill Schmidt <wschm...@linux.vnet.ibm.com>
* config/rs6000/altivec.md (UNSPEC_REDUC_SMAX, UNSPEC_REDU
On 16/09/15 17:19, Bill Schmidt wrote:
On Wed, 2015-09-16 at 16:29 +0100, Alan Lawrence wrote:
I proposed a patch to migrate PPC off the old patterns, but have forgotten to
ping it recently - last at
https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01024.html ... (ping?!)
Hi Alan,
Thanks
Here's a rebased version, which fixes conflicts with float16 and Christophe's
fixes for bigendian lane indices. Also fiddled around with whitespace in
aarch64-simd.md
vec_store_lanes{oi,ci,xi}_lane are not standard pattern names, so using them in
aarch64-simd.md is misleading. This adds an aarch64_ prefix to those pattern
names, paralleling aarch64_vec_load_lanes_lane.
bootstrapped and check-gcc on aarch64-none-linux-gnu
gcc/ChangeLog:
*
This removes V_FOUR_ELEM in the same way that patch 3 removed V_THREE_ELEM,
again using BLKmode + set_mem_size. (This makes the four-lane expanders very
similar to the three-lane expanders, and they will be combined in patch 7.)
bootstrapped and check-gcc on aarch64-none-linux-gnu
gcc/ChangeLog:
The previous patches leave ld[234]_lane, st[234]_lane, and ld[234]r expanders
all nearly identical, so we can easily parameterize across the number of lanes
and combine them.
For the ld_lane pattern, I switched from the VCONQ attribute to
just using the MODE attribute, this is identical for
This removes EImode from the (AArch64) compiler, and all mention of or support
for it.
bootstrapped and check-gcc on aarch64-none-linux-gnu
gcc/ChangeLog:
* config/aarch64/aarch64.c (aarch64_simd_attr_length_rglist): Update
comment.
* config/aarch64/aarch64-builtins.c
The V_THREE_ELEM attribute used BLKmode for most sizes, but occasionally
EImode. This patch changes to BLKmode in all cases, explicitly setting
memory size (thus, preserving size for the cases that were EImode, and
setting size for the first time for cases that were already BLKmode).
The patterns
aarch64_st and
aarch64_ld expanders back onto 12 insns
aarch64_{ld,st}{2,3,4}_dreg (for VD and DX modes), using the
VSTRUCT_DREG iterator over TI/EI/OI modes to represent the block of memory
transferred. Instead, use BLKmode for all memory transfers, explicitly setting
mem_size.
Bootstrapped and
Same logic as previous; this makes the 2-, 3-, and 4-lane expanders all follow
the same pattern.
bootstrapped and check-gcc on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_simd_ld2r,
aarch64_vec_load_lanesoi_lane,
This adds an AARCH64_VALID_SIMD_DREG_MODE exactly paralleling the existing
...QREG... macro.
The new test now compiles (at -O3) to:
test_1:
add v1.2s, v1.2s, v5.2s
add v2.2s, v2.2s, v6.2s
add v3.2s, v3.2s, v7.2s
add v0.2s, v0.2s, v4.2s
ret
On 15/09/15 10:43, James Greenhalgh wrote:
>
> It is convenient that this falls out, but likely surprising for nregs.
> Please add a comment to nregs explaining the dual use of nregs to represent
> both the number of Q registers used for the type, and the number of elements
> touched by the
Ping. (Rerevert with 5 lines extra paranoia in scalarizable_type_p).
Thanks, Alan
On 08/09/15 13:43, Martin Jambor wrote:
Hi,
On Mon, Sep 07, 2015 at 02:15:45PM +0100, Alan Lawrence wrote:
In-Reply-To: <55e0697d.2010...@arm.com>
On 28/08/15 16:08, Alan Lawrence wrote:
Alan Lawrence
On 11/09/15 14:19, Bill Schmidt wrote:
A secondary concern for powerpc is that REDUC_MAX_EXPR produces a scalar
that has to be broadcast back to a vector, and the best way to implement
it for us already has the max value in all positions of a vector. But
that is something we should be able to
On 09/09/15 11:31, Alan Lawrence wrote:
Hmmm, hang on. I'm not quite sure what the actual issue/bug is here, but is this
the same issue as my patch 12 "with BE RTL fix"?
(https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01482.html, explanation last at
https://gcc.gnu.org/ml/gcc-patch
Hmmm, hang on. I'm not quite sure what the actual issue/bug is here, but is this
the same issue as my patch 12 "with BE RTL fix"?
(https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01482.html, explanation last at
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02365.html) I pushed this as
r227551
Ping. (Thanks, Christophe!)
Correct version here: https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01501.html
Cheers, Alan
On 25/08/15 15:21, Christophe Lyon wrote:
On 25 August 2015 at 15:57, Alan Lawrence <alan.lawre...@arm.com> wrote:
Sorry - wrong version posted. Th
Ping. (Thanks, Christophe!).
Original message: https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02366.html
On 25/08/15 14:28, Alan Lawrence wrote:
Christophe Lyon wrote:
On 28 July 2015 at 13:26, Alan Lawrence <alan.lawre...@arm.com> wrote:
This is a respin of
https://gcc.gnu.org/ml/gcc-p
On 08/09/15 09:26, James Greenhalgh wrote:
On Tue, Sep 08, 2015 at 09:21:08AM +0100, James Greenhalgh wrote:
On Mon, Sep 07, 2015 at 02:09:01PM +0100, Alan Lawrence wrote:
On 04/09/15 13:32, James Greenhalgh wrote:
In that case, these should be implemented as inline assembly blocks
Original message here: https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02363.html
On 28/07/15 12:27, Alan Lawrence wrote:
> This documents the change to arm_neon_fp16_ok in the first patch; the addition
> of arm_neon_fp16_hw_ok in the last patch; and corrects a cross-reference.
>
> (I
In-Reply-To: <55e0697d.2010...@arm.com>
On 28/08/15 16:08, Alan Lawrence wrote:
> Alan Lawrence wrote:
>>
>> Right. I think VLA's are the problem with pr64312.C also. I'm testing a fix
>> (that declares arrays with any of these properties as unscalarizable).
> ...
On 04/09/15 13:32, James Greenhalgh wrote:
> In that case, these should be implemented as inline assembly blocks. As it
> stands, the code generation for these intrinsics will be very poor with this
> patch applied.
>
> I'm going to hold off OKing this until I see a follow-up to fix the code
>
On 02/09/15 23:12, Alexandre Oliva wrote:
On Sep 2, 2015, Alan Lawrence <alan.lawre...@arm.com> wrote:
One more failure to report, I'm afraid. On AArch64 Bigendian,
aapcs64/func-ret-4.c ICEs in simplify_subreg (line refs here are from
r227348):
Thanks. The failure mode was dif
On 14/08/15 19:57, Alexandre Oliva wrote:
I'm glad it appears to be working to everyone's
satisfaction now. I've just committed it as r226901, with only a
context adjustment to account for a change in use_register_for_decl in
function.c. /me crosses fingers :-)
Here's the patch as checked
Rainer Orth wrote:
It seems that since 20150717, gcc.dg/vect/no-scevccp-outer-11.c XPASSes
everywhere:
XPASS: gcc.dg/vect/no-scevccp-outer-11.c scan-tree-dump-times vect "OUTER LOOP
VECTORIZED." 1
To reduce testsuite noise, I'd like to remove the xfail as follows.
Tested with the appropriate
Christophe Lyon wrote:
I asked because I assumed that Alan saw it pass in his configuration.
Bah. No - I now discover a problem in my C++ testsuite setup that was causing a
large number of tests to not be executed. I see the problem too now,
investigating
--Alan
Richard Biener wrote:
On Fri, 28 Aug 2015, Alan Lawrence wrote:
Christophe Lyon wrote:
I asked because I assumed that Alan saw it pass in his configuration.
Bah. No - I now discover a problem in my C++ testsuite setup that was causing
a large number of tests to not be executed. I see
The code in the dom_valueize function is duplicated a number of times; so, call
the function.
Also remove a comment in lookup_avail_expr re const_and_copies, describing one
of said duplicates, that looks like it was superseded in r87787.
Bootstrapped + check-gcc on x86-none-linux-gnu.
Alan Lawrence wrote:
Right. I think VLA's are the problem with pr64312.C also. I'm testing a fix
(that declares arrays with any of these properties as unscalarizable).
Monday is a bank holiday in UK and so I expect to get back to you on Tuesday.
--Alan
In the meantime I've reverted
Jeff Law wrote:
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-15.c
b/gcc/testsuite/gcc.dg/tree-ssa/sra-15.c
new file mode 100644
index 000..e251058
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-15.c
@@ -0,0 +1,38 @@
+/* Verify that SRA total scalarization works on records
Martin Jambor wrote:
If you change what the function does, you have to change the comment
too. If I am not mistaken, even with the whole patch set applied, the
first sentence would still be: Create total_scalarization accesses
for all scalar type fields in VAR and for VAR as a whole. And
Martin Jambor wrote:
First, I would be much
happier if you added a proper comment to scalarize_elem function which
you forgot completely. The name is not very descriptive and it has
quite few parameters too.
Second, this patch should also fix PR 67283. It would be great if you
could
Richard Biener wrote:
One extra question is does the way we limit total scalarization work
well
for arrays? I suppose we have either sth like the maximum size of an
aggregate we scalarize or the maximum number of component accesses
we create?
Only the former and that would be kept intact.
This adds an AARCH64_VALID_SIMD_DREG_MODE exactly paralleling the existing
...QREG... macro, and as a driveby fixes mode-(MODE) in the latter.
The new test now compiles (at -O3) to:
test_1:
add v1.2s, v1.2s, v5.2s
add v2.2s, v2.2s, v6.2s
add v3.2s, v3.2s,
The V_THREE_ELEM attribute used BLKmode for most sizes, but occasionally
EImode. This patch changes to BLKmode in all cases, explicitly setting
memory size (thus, preserving size for the cases that were EImode, and
setting size for the first time for cases that were already BLKmode).
The patterns
This removes V_FOUR_ELEM in the same way that patch 3 removed V_THREE_ELEM,
again using BLKmode + set_mem_size. (This makes the four-lane expanders very
similar to the three-lane expanders, and they will be combined in patch 7.)
bootstrapped and check-gcc on aarch64-none-linux-gnu
gcc/ChangeLog:
This removes EImode from the (AArch64) compiler, and all mention of or support
for it.
bootstrapped and check-gcc on aarch64-none-linux-gnu
gcc/ChangeLog:
* config/aarch64/aarch64.c (aarch64_simd_attr_length_rglist): Update
comment.
* config/aarch64/aarch64-builtins.c
aarch64_stVSTRUCT:nregsVDC:mode and
aarch64_ldVSTRUCT:nregsVDC:mode expanders back onto 12 insns
aarch64_{ld,st}{2,3,4}mode_dreg (for VD and DX modes), using the
VSTRUCT_DREG iterator over TI/EI/OI modes to represent the block of memory
transferred. Instead, use BLKmode for all memory transfers,
Same logic as previous; this makes the 2-, 3-, and 4-lane expanders all follow
the same pattern.
bootstrapped and check-gcc on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_simd_ld2rmode,
aarch64_vec_load_lanesoi_lanemode,
The end goal of this series of patches is to enable 64bit vector modes for
TARGET_ARRAY_MODE_SUPPORTED_P, achieved in the last patch. At present, doing so
causes ICEs with illegal subregs (e.g. returning the middle bits from a large
int mode covering 3 vectors); the patchset avoids these by first
vec_store_lanes{oi,ci,xi}_lane are not standard pattern names, so using them in
aarch64-simd.md is misleading. This adds an aarch64_ prefix to those pattern
names, paralleling aarch64_vec_load_lanesmode_lane.
bootstrapped and check-gcc on aarch64-none-linux-gnu
gcc/ChangeLog:
*
The previous patches leave ld[234]_lane, st[234]_lane, and ld[234]r expanders
all nearly identical, so we can easily parameterize across the number of lanes
and combine them.
For the ldVSTRUCT:nregs_lane pattern, I switched from the VCONQ attribute to
just using the MODE attribute, this is
Jeff Law wrote:
The question I have is why this differs from the effects of patch #5.
That would seem to indicate that there's things we're not getting into
the candidate tables with this approach?!?
I'll answer this first, as I think (Richard and) Martin have identified enough
other
Sorry - wrong version posted. The hunk for add_options_for_arm_neon_fp16 has
moved to the previous patch! This version also fixes some whitespace issues.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c: New.
* lib/target-supports.exp
Christophe Lyon wrote:
On 28 July 2015 at 13:27, Alan Lawrence alan.lawre...@arm.com wrote:
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp:
set additional flags for neon-fp16 support.
* gcc.target/aarch64/advsimd-intrinsics
James Greenhalgh wrote:
- VAR1 (UNOP, vec_unpacks_hi_, 10, v4sf)
+ VAR2 (UNOP, vec_unpacks_hi_, 10, v4sf, v8hf)
Should this not use the appropriate BUILTIN_... iterator?
Indeed; BUILTIN_VQ_HSF it is.
VAR1 (BINOP, float_truncate_hi_, 0, v4sf)
VAR1 (BINOP, float_truncate_hi_, 0,
ssa-dom-cse-2.c fails on a number of platforms because the input array is pushed
out to the constant pool, preventing later stages from folding away the entire
computation. This patch series fixes the failure by extending SRA to pull the
constants back in.
This is my first patch(set) to SRA and
This makes SRA replace loads of records/arrays from constant pool entries,
with elementwise assignments of the constant values, hence, overcoming the
fundamental problem in PR/63679.
As a first pass, the approach I took was to look for constant-pool loads as
we scanned through other accesses, and
I used this as a means of better-testing the previous changes, as it exercises
the constant replacement code a whole lot more. Indeed, quite a few tests are
now optimized away to nothing on AArch64...
Always pulling in constants is almost certainly not what we want, but we may
nonetheless want
This changes the completely_scalarize_record path to also work on arrays (thus
allowing records containing arrays, etc.). This just required extending the
existing type_consists_of_records_p and completely_scalarize_record methods
to handle things of ARRAY_TYPE as well as RECORD_TYPE. Hence, I
When SRA completely scalarizes an array, this patch changes the generated
accesses from e.g.
MEM[(int[8] *)a + 4B] = 1;
to
a[1] = 1;
This overcomes a limitation in dom2, that accesses to equivalent chunks of e.g.
MEM[(int[8] *)a] are not hashable_expr_equal_p with accesses to e.g.
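A source-level analogy, with hypothetical function names, of the two access forms mentioned above. This is ordinary C rather than GIMPLE, so it only illustrates that an indexed access and a byte-offset access can name the same element; the "+ 4B" in the dump corresponds to sizeof (int) here:

```c
/* Reads element 1 by index, like the a[1] form.  */
int
read_by_index (const int a[8])
{
  return a[1];
}

/* Reads the same element by byte offset from the start of the array,
   loosely like the MEM[... + 4B] form when int is 4 bytes wide.  */
int
read_by_offset (const int a[8])
{
  const char *base = (const char *) a;
  return *(const int *) (base + sizeof (int));
}
```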
This is a small refactoring/renaming patch, it just moves the call to
completely_scalarize_record out from completely_scalarize_var, and renames
the latter to create_total_scalarization_access.
This is because the next patch needs to drop the _record suffix and I felt
it would be confusing to
Alan Lawrence wrote:
All AArch64 patches are unchanged from previous version. However, in response to
discussion, the ARM patches are changed (much as I suggested
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02249.html); this version:
* Hides the existing vcvt_f16_f32 and vcvt_f32_f16