Re: [PATCH][AArch64][1/2] Add support for the INS (element) instruction to copy lanes between vectors
On 07/06/16 17:56, Kyrill Tkachov wrote:
> Hi all,
>
> This patch addresses a deficiency we have in handling vector
> lane-to-lane moves in the AArch64 backend.  Generally we can use the
> INS (element) instruction but, as a user complains in
> https://gcc.gnu.org/ml/gcc-help/2016-05/msg00069.html, we don't.
> James had a patch adding an appropriate combine pattern some time ago
> (https://gcc.gnu.org/ml/gcc-patches/2013-09/msg01068.html) but it
> never got applied.
>
> This patch is a rebase of that patch that adds the necessary
> vec_merge+vec_duplicate+vec_select combine pattern.
> I chose to use a define_insn rather than the define_insn_and_split in
> that patch that just deletes the instruction when the source and
> destination registers are the same, as I think it's not the combine
> pattern's job to delete the redundant instruction but rather some
> other pass's job.  Also, I was not able to create a testcase where it
> would make a difference.
>
> Also, this patch doesn't reimplement the vcopy*lane* intrinsics from
> inline assembly to a vget_lane+vset_lane combo.
> This can be done as a separate patch on top of this one.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Also tested on aarch64_be-none-elf.
>
> Ok for trunk?

OK.
R.

> Thanks,
> Kyrill
>
> 2016-06-07  James Greenhalgh
> 	    Kyrylo Tkachov
>
> 	* config/aarch64/aarch64-simd.md (*aarch64_simd_vec_copy_lane<mode>):
> 	New define_insn.
> 	(*aarch64_simd_vec_copy_lane_<vswap_width_name><mode>): Likewise.
>
> 2016-06-07  James Greenhalgh
> 	    Kyrylo Tkachov
>
> 	* gcc.target/aarch64/vget_set_lane_1.c: New test.
Re: [PATCH][AArch64][1/2] Add support for the INS (element) instruction to copy lanes between vectors
Ping.

Thanks,
Kyrill

On 22/06/16 11:07, Kyrill Tkachov wrote:
> Ping.
>
> Richard, Marcus, do you have any feedback on this?
> https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00502.html
>
> Thanks,
> Kyrill
>
> On 14/06/16 10:36, James Greenhalgh wrote:
>> On Tue, Jun 07, 2016 at 05:56:47PM +0100, Kyrill Tkachov wrote:
>>> [original patch description snipped; the full text is in the
>>> original posting below]
>>
>> This looks OK to me, but as it is based on my code I probably can't
>> approve it within the spirit of the write access policies (I only
>> have localized review permission).  Best wait for Richard/Marcus or
>> a global reviewer to take a look.
>>
>> Watch your ChangeLog formatting.
>>
>> Thanks,
>> James
Re: [PATCH][AArch64][1/2] Add support for the INS (element) instruction to copy lanes between vectors
Ping.

Richard, Marcus, do you have any feedback on this?
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00502.html

Thanks,
Kyrill

On 14/06/16 10:36, James Greenhalgh wrote:
> On Tue, Jun 07, 2016 at 05:56:47PM +0100, Kyrill Tkachov wrote:
>> [original patch description snipped; the full text is in the
>> original posting below]
>
> This looks OK to me, but as it is based on my code I probably can't
> approve it within the spirit of the write access policies (I only
> have localized review permission).  Best wait for Richard/Marcus or
> a global reviewer to take a look.
>
> Watch your ChangeLog formatting.
>
> Thanks,
> James
Re: [PATCH][AArch64][1/2] Add support for the INS (element) instruction to copy lanes between vectors
On Tue, Jun 07, 2016 at 05:56:47PM +0100, Kyrill Tkachov wrote:
> Hi all,
>
> This patch addresses a deficiency we have in handling vector
> lane-to-lane moves in the AArch64 backend.  Generally we can use the
> INS (element) instruction but, as a user complains in
> https://gcc.gnu.org/ml/gcc-help/2016-05/msg00069.html, we don't.
> James had a patch adding an appropriate combine pattern some time ago
> (https://gcc.gnu.org/ml/gcc-patches/2013-09/msg01068.html) but it
> never got applied.
>
> This patch is a rebase of that patch that adds the necessary
> vec_merge+vec_duplicate+vec_select combine pattern.  I chose to use a
> define_insn rather than the define_insn_and_split in that patch that
> just deletes the instruction when the source and destination
> registers are the same, as I think it's not the combine pattern's job
> to delete the redundant instruction but rather some other pass's job.
> Also, I was not able to create a testcase where it would make a
> difference.
>
> Also, this patch doesn't reimplement the vcopy*lane* intrinsics from
> inline assembly to a vget_lane+vset_lane combo.  This can be done as
> a separate patch on top of this one.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Also tested on aarch64_be-none-elf.
>
> Ok for trunk?

This looks OK to me, but as it is based on my code I probably can't
approve it within the spirit of the write access policies (I only have
localized review permission).  Best wait for Richard/Marcus or a
global reviewer to take a look.

> Thanks,
> Kyrill
>
> 2016-06-07  James Greenhalgh
> 	    Kyrylo Tkachov
>
> 	* config/aarch64/aarch64-simd.md (*aarch64_simd_vec_copy_lane<mode>):
> 	New define_insn.
> 	(*aarch64_simd_vec_copy_lane_<vswap_width_name><mode>): Likewise.

Watch your ChangeLog formatting.

Thanks,
James

> 2016-06-07  James Greenhalgh
> 	    Kyrylo Tkachov
>
> 	* gcc.target/aarch64/vget_set_lane_1.c: New test.
[PATCH][AArch64][1/2] Add support for the INS (element) instruction to copy lanes between vectors
Hi all,

This patch addresses a deficiency we have in handling vector
lane-to-lane moves in the AArch64 backend.  Generally we can use the
INS (element) instruction but, as a user complains in
https://gcc.gnu.org/ml/gcc-help/2016-05/msg00069.html, we don't.
James had a patch adding an appropriate combine pattern some time ago
(https://gcc.gnu.org/ml/gcc-patches/2013-09/msg01068.html) but it
never got applied.

This patch is a rebase of that patch that adds the necessary
vec_merge+vec_duplicate+vec_select combine pattern.  I chose to use a
define_insn rather than the define_insn_and_split in that patch that
just deletes the instruction when the source and destination registers
are the same, as I think it's not the combine pattern's job to delete
the redundant instruction but rather some other pass's job.  Also, I
was not able to create a testcase where it would make a difference.

Also, this patch doesn't reimplement the vcopy*lane* intrinsics from
inline assembly to a vget_lane+vset_lane combo.  This can be done as a
separate patch on top of this one.

Bootstrapped and tested on aarch64-none-linux-gnu.
Also tested on aarch64_be-none-elf.

Ok for trunk?

Thanks,
Kyrill

2016-06-07  James Greenhalgh
	    Kyrylo Tkachov

	* config/aarch64/aarch64-simd.md (*aarch64_simd_vec_copy_lane<mode>):
	New define_insn.
	(*aarch64_simd_vec_copy_lane_<vswap_width_name><mode>): Likewise.

2016-06-07  James Greenhalgh
	    Kyrylo Tkachov

	* gcc.target/aarch64/vget_set_lane_1.c: New test.
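What the new combine pattern recognizes is, semantically, a single
scalar copy of one lane into another vector.  A minimal plain-C model
of what the vget_lane/vset_lane intrinsic pair computes (the `vec8x8`
type and `copy_lane` function are illustrative names, not the NEON
API; the patch's point is that GCC should emit one INS (element) for
this instead of bouncing the lane through a general-purpose register):

```c
#include <assert.h>
#include <stdint.h>

/* Model of vset_lane_u8 (vget_lane_u8 (b, src), a, dst) on 8x8-bit
   vectors: the result is 'a' with lane 'dst' replaced by lane 'src'
   of 'b'.  This is exactly what one INS (element) instruction does.  */
typedef struct { uint8_t lane[8]; } vec8x8;

static vec8x8
copy_lane (vec8x8 a, vec8x8 b, int dst, int src)
{
  a.lane[dst] = b.lane[src];   /* ins  v_a.b[dst], v_b.b[src] */
  return a;
}
```

Before the patch, the vget_lane half would move the lane to a
general-purpose register (UMOV) and the vset_lane half would move it
back (INS from GP), which is the two-instruction sequence the user in
the linked gcc-help thread complained about.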
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 6ea35bf487eaa47dd78742e3eae7507b6875ba1a..5600e5bd0a94fd7efd704a4b13d95d993fd5b62f 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -555,6 +555,49 @@ (define_insn "aarch64_simd_vec_set<mode>"
   [(set_attr "type" "neon_from_gp, neon_ins, neon_load1_1reg")]
 )
 
+(define_insn "*aarch64_simd_vec_copy_lane<mode>"
+  [(set (match_operand:VALL 0 "register_operand" "=w")
+	(vec_merge:VALL
+	    (vec_duplicate:VALL
+	      (vec_select:<VEL>
+		(match_operand:VALL 3 "register_operand" "w")
+		(parallel
+		  [(match_operand:SI 4 "immediate_operand" "i")])))
+	    (match_operand:VALL 1 "register_operand" "0")
+	    (match_operand:SI 2 "immediate_operand" "i")))]
+  "TARGET_SIMD"
+  {
+    int elt = ENDIAN_LANE_N (<MODE>mode, exact_log2 (INTVAL (operands[2])));
+    operands[2] = GEN_INT (HOST_WIDE_INT_1 << elt);
+    operands[4] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[4])));
+
+    return "ins\t%0.<Vetype>[%p2], %3.<Vetype>[%4]";
+  }
+  [(set_attr "type" "neon_ins")]
+)
+
+(define_insn "*aarch64_simd_vec_copy_lane_<vswap_width_name><mode>"
+  [(set (match_operand:VALL 0 "register_operand" "=w")
+	(vec_merge:VALL
+	    (vec_duplicate:VALL
+	      (vec_select:<VEL>
+		(match_operand:<VSWAP_WIDTH> 3 "register_operand" "w")
+		(parallel
+		  [(match_operand:SI 4 "immediate_operand" "i")])))
+	    (match_operand:VALL 1 "register_operand" "0")
+	    (match_operand:SI 2 "immediate_operand" "i")))]
+  "TARGET_SIMD"
+  {
+    int elt = ENDIAN_LANE_N (<MODE>mode, exact_log2 (INTVAL (operands[2])));
+    operands[2] = GEN_INT (HOST_WIDE_INT_1 << elt);
+    operands[4] = GEN_INT (ENDIAN_LANE_N (<VSWAP_WIDTH>mode,
+					  INTVAL (operands[4])));
+
+    return "ins\t%0.<Vetype>[%p2], %3.<Vetype>[%4]";
+  }
+  [(set_attr "type" "neon_ins")]
+)
+
 (define_insn "aarch64_simd_lshr<mode>"
  [(set (match_operand:VDQ_I 0 "register_operand" "=w")
   (lshiftrt:VDQ_I (match_operand:VDQ_I 1 "register_operand" "w")
diff --git a/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c b/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c
new file mode 100644
index
..07a77de319206c5c6dad1c0d2d9bcc998583f9c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c
@@ -0,0 +1,72 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#include "arm_neon.h"
+
+#define BUILD_TEST(TYPE1, TYPE2, Q1, Q2, SUFFIX, INDEX1, INDEX2)	\
+TYPE1 __attribute__((noinline,noclone))					\
+test_copy##Q1##_lane##Q2##_##SUFFIX (TYPE1 a, TYPE2 b)			\
+{									\
+  return vset##Q1##_lane_##SUFFIX (vget##Q2##_lane_##SUFFIX (b, INDEX2),\
+				   a, INDEX1);				\
+}
+
+BUILD_TEST (poly8x8_t, poly8x8_t, , , p8, 7, 6)
+BUILD_TEST (int8x8_t, int8x8_t, , , s8, 7, 6)
+BUILD_TEST (uint8x8_t, uint8x8_t, , , u8, 7, 6)
+/* { dg-final { scan-assembler-times "ins\\tv0.b\\\[7\\\], v1.b\\\[6\\\]" 3 } } */
+BUILD_TEST (poly16x4_t, poly16x4_t, , , p16, 3, 2)
+BUILD_TEST (int16x4_t, int16x4_t, , , s16, 3, 2)
+BUILD_TEST (uint16x4_t, uint16x4_t, , , u16, 3, 2)
+/* { dg-final { scan-assembler-times "ins\\tv0.h\\\[3\\\], v1.h\\\[2\\\]" 3 } } */
+BUILD_TEST (float32x2_t, float32x2_t, , , f32, 1, 0)
+BUILD_TEST (int32x2_t, int32x2_t, , , s32, 1, 0)
+BUILD_TEST (uint32x2_t, uint32x2_t, , , u32, 1, 0)
+/* { dg-final { scan-assembler-times "ins\\tv0.s\\\[1\\\], v1.s\\\[0\\\]" 3 } } */
+
+BUILD_TEST (poly8x8_t, poly8x16_t, , q, p8, 7, 15)
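A note on the prep statements shared by both define_insns above:
combine presents operand 2 (the vec_merge selector) as a one-hot
immediate, 1 << lane.  The insn decodes it with exact_log2, remaps the
lane through ENDIAN_LANE_N (which is why the patch was also tested on
aarch64_be-none-elf), and re-encodes it one-hot so the %p2 output
modifier prints the lane index.  A standalone C sketch of that
arithmetic (`endian_lane_n` models GCC's ENDIAN_LANE_N with a
`big_endian` flag standing in for BYTES_BIG_ENDIAN, and
`exact_log2_ull` models exact_log2; these are illustrative stand-ins,
not the GCC functions themselves):

```c
#include <assert.h>

/* exact_log2: lane index if 'x' is a power of two, else -1.  */
static int
exact_log2_ull (unsigned long long x)
{
  int n = 0;
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;                       /* not one-hot */
  while ((x >>= 1) != 0)
    n++;
  return n;
}

/* ENDIAN_LANE_N: architectural lane for GCC lane 'n' in a vector of
   'nunits' elements.  Big-endian reverses the lane numbering.  */
static int
endian_lane_n (int nunits, int n, int big_endian)
{
  return big_endian ? nunits - 1 - n : n;
}

/* The two prep statements for operand 2, as plain C: decode the
   one-hot merge mask, remap the lane, re-encode it one-hot.  */
static unsigned long long
remap_merge_mask (unsigned long long mask, int nunits, int big_endian)
{
  int elt = endian_lane_n (nunits, exact_log2_ull (mask), big_endian);
  return 1ULL << elt;
}
```

On little-endian the mask passes through unchanged; on big-endian a
mask selecting GCC lane 7 of an 8-lane vector becomes a mask selecting
architectural lane 0, which is what lets one pattern serve both
endiannesses.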