Re: [PATCH][AArch64][1/2] Add support for the INS (element) instruction to copy lanes between vectors

2016-06-30 Thread Richard Earnshaw (lists)
On 07/06/16 17:56, Kyrill Tkachov wrote:
> Hi all,
> 
> This patch addresses a deficiency we have in handling vector
> lane-to-lane moves in the AArch64 backend.
> Generally we can use the INS (element) instruction but, as a user
> complains in https://gcc.gnu.org/ml/gcc-help/2016-05/msg00069.html
> we don't. James had a patch adding an appropriate combine pattern some
> time ago (https://gcc.gnu.org/ml/gcc-patches/2013-09/msg01068.html)
> but it never got applied.
> 
> This patch is a rebase of that patch that adds the necessary
> vec_merge+vec_duplicate+vec_select combine pattern.
> I chose to use a define_insn rather than the define_insn_and_split in
> that patch that just deletes the instruction when
> the source and destination registers are the same, as I think it's not
> the combine pattern's job to delete the redundant instruction
> but rather some other pass's job. Also, I was not able to create a
> testcase where it would make a difference.
> 
> Also, this patch doesn't reimplement the vcopy*lane* intrinsics from
> inline assembly to a vget_lane+vset_lane combo.
> This can be done as a separate patch on top of this one.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Also tested on aarch64_be-none-elf.
> 
> Ok for trunk?
> 

OK.

R.

> Thanks,
> Kyrill
> 
> 2016-06-07  James Greenhalgh  
> Kyrylo Tkachov  
> 
> * config/aarch64/aarch64-simd.md (*aarch64_simd_vec_copy_lane<mode>):
> New define_insn.
> (*aarch64_simd_vec_copy_lane_<mode><vswap_width_name>): Likewise.
> 
> 2016-06-07  James Greenhalgh  
> Kyrylo Tkachov  
> 
> * gcc.target/aarch64/vget_set_lane_1.c: New test.
> 
> aarch64-ins-vec.patch
> 
> 
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> 6ea35bf487eaa47dd78742e3eae7507b6875ba1a..5600e5bd0a94fd7efd704a4b13d95d993fd5b62f
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -555,6 +555,49 @@ (define_insn "aarch64_simd_vec_set<mode>"
>[(set_attr "type" "neon_from_gp, neon_ins, neon_load1_1reg")]
>  )
>  
> +(define_insn "*aarch64_simd_vec_copy_lane<mode>"
> +  [(set (match_operand:VALL 0 "register_operand" "=w")
> +	(vec_merge:VALL
> +	    (vec_duplicate:VALL
> +	      (vec_select:<VEL>
> +		(match_operand:VALL 3 "register_operand" "w")
> +		(parallel
> +		  [(match_operand:SI 4 "immediate_operand" "i")])))
> +	    (match_operand:VALL 1 "register_operand" "0")
> +	    (match_operand:SI 2 "immediate_operand" "i")))]
> +  "TARGET_SIMD"
> +  {
> +    int elt = ENDIAN_LANE_N (<MODE>mode, exact_log2 (INTVAL (operands[2])));
> +    operands[2] = GEN_INT (HOST_WIDE_INT_1 << elt);
> +    operands[4] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[4])));
> +
> +    return "ins\t%0.<Vetype>[%p2], %3.<Vetype>[%4]";
> +  }
> +  [(set_attr "type" "neon_ins")]
> +)
> +
> +(define_insn "*aarch64_simd_vec_copy_lane_<mode><vswap_width_name>"
> +  [(set (match_operand:VALL 0 "register_operand" "=w")
> +	(vec_merge:VALL
> +	    (vec_duplicate:VALL
> +	      (vec_select:<VEL>
> +		(match_operand:<VSWAP_WIDTH> 3 "register_operand" "w")
> +		(parallel
> +		  [(match_operand:SI 4 "immediate_operand" "i")])))
> +	    (match_operand:VALL 1 "register_operand" "0")
> +	    (match_operand:SI 2 "immediate_operand" "i")))]
> +  "TARGET_SIMD"
> +  {
> +    int elt = ENDIAN_LANE_N (<MODE>mode, exact_log2 (INTVAL (operands[2])));
> +    operands[2] = GEN_INT (HOST_WIDE_INT_1 << elt);
> +    operands[4] = GEN_INT (ENDIAN_LANE_N (<VSWAP_WIDTH>mode,
> +					  INTVAL (operands[4])));
> +
> +    return "ins\t%0.<Vetype>[%p2], %3.<Vetype>[%4]";
> +  }
> +  [(set_attr "type" "neon_ins")]
> +)
> +
>  (define_insn "aarch64_simd_lshr<mode>"
>   [(set (match_operand:VDQ_I 0 "register_operand" "=w")
> (lshiftrt:VDQ_I (match_operand:VDQ_I 1 "register_operand" "w")
> diff --git a/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c 
> b/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c
> new file mode 100644
> index 
> ..07a77de319206c5c6dad1c0d2d9bcc998583f9c1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c
> @@ -0,0 +1,72 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#include "arm_neon.h"
> +
> +#define BUILD_TEST(TYPE1, TYPE2, Q1, Q2, SUFFIX, INDEX1, INDEX2) \
> +TYPE1 __attribute__((noinline,noclone))  \
> +test_copy##Q1##_lane##Q2##_##SUFFIX (TYPE1 a, TYPE2 b)			\
> +{\
> +  return vset##Q1##_lane_##SUFFIX (vget##Q2##_lane_##SUFFIX (b, INDEX2),\
> + a, INDEX1); \
> +}
> +
> +BUILD_TEST (poly8x8_t, poly8x8_t, , , p8, 7, 6)
> +BUILD_TEST (int8x8_t,  int8x8_t,  , , s8, 7, 6)
> +BUILD_TEST (uint8x8_t, uint8x8_t, , , u8, 7, 6)
> +/* { dg-final 

Re: [PATCH][AArch64][1/2] Add support for the INS (element) instruction to copy lanes between vectors

2016-06-30 Thread Kyrill Tkachov

Ping.

Thanks,
Kyrill

On 22/06/16 11:07, Kyrill Tkachov wrote:

Ping.
Richard, Marcus, do you have any feedback on this?

https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00502.html

Thanks,
Kyrill
On 14/06/16 10:36, James Greenhalgh wrote:

On Tue, Jun 07, 2016 at 05:56:47PM +0100, Kyrill Tkachov wrote:

Hi all,

This patch addresses a deficiency we have in handling vector lane-to-lane
moves in the AArch64 backend.  Generally we can use the INS (element)
instruction but, as a user complains in
https://gcc.gnu.org/ml/gcc-help/2016-05/msg00069.html
we don't. James had a patch adding an appropriate combine pattern some time
ago (https://gcc.gnu.org/ml/gcc-patches/2013-09/msg01068.html) but it never
got applied.

This patch is a rebase of that patch that adds the necessary
vec_merge+vec_duplicate+vec_select combine pattern.  I chose to use a
define_insn rather than the define_insn_and_split in that patch that just
deletes the instruction when the source and destination registers are the
same, as I think it's not the combine pattern's job to delete the redundant
instruction but rather some other pass's job. Also, I was not able to create
a testcase where it would make a difference.

Also, this patch doesn't reimplement the vcopy*lane* intrinsics from inline
assembly to a vget_lane+vset_lane combo.  This can be done as a separate
patch on top of this one.

Bootstrapped and tested on aarch64-none-linux-gnu.
Also tested on aarch64_be-none-elf.

Ok for trunk?

This looks OK to me, but as it is based on my code I probably can't
approve it within the spirit of the write access policies (I only have
localized review permission).

Best wait for Richard/Marcus or a global reviewer to take a look.


Thanks,
Kyrill

2016-06-07  James Greenhalgh  
 Kyrylo Tkachov  

 * config/aarch64/aarch64-simd.md (*aarch64_simd_vec_copy_lane<mode>):
 New define_insn.
 (*aarch64_simd_vec_copy_lane_<mode><vswap_width_name>): Likewise.

Watch your ChangeLog formatting.

Thanks,
James


2016-06-07  James Greenhalgh 
 Kyrylo Tkachov  

 * gcc.target/aarch64/vget_set_lane_1.c: New test.








Re: [PATCH][AArch64][1/2] Add support for the INS (element) instruction to copy lanes between vectors

2016-06-22 Thread Kyrill Tkachov

Ping.
Richard, Marcus, do you have any feedback on this?

https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00502.html

Thanks,
Kyrill
On 14/06/16 10:36, James Greenhalgh wrote:

On Tue, Jun 07, 2016 at 05:56:47PM +0100, Kyrill Tkachov wrote:

Hi all,

This patch addresses a deficiency we have in handling vector lane-to-lane
moves in the AArch64 backend.  Generally we can use the INS (element)
instruction but, as a user complains in
https://gcc.gnu.org/ml/gcc-help/2016-05/msg00069.html
we don't. James had a patch adding an appropriate combine pattern some time
ago (https://gcc.gnu.org/ml/gcc-patches/2013-09/msg01068.html) but it never
got applied.

This patch is a rebase of that patch that adds the necessary
vec_merge+vec_duplicate+vec_select combine pattern.  I chose to use a
define_insn rather than the define_insn_and_split in that patch that just
deletes the instruction when the source and destination registers are the
same, as I think it's not the combine pattern's job to delete the redundant
instruction but rather some other pass's job. Also, I was not able to create
a testcase where it would make a difference.

Also, this patch doesn't reimplement the vcopy*lane* intrinsics from inline
assembly to a vget_lane+vset_lane combo.  This can be done as a separate
patch on top of this one.

Bootstrapped and tested on aarch64-none-linux-gnu.
Also tested on aarch64_be-none-elf.

Ok for trunk?

This looks OK to me, but as it is based on my code I probably can't
approve it within the spirit of the write access policies (I only have
localized review permission).

Best wait for Richard/Marcus or a global reviewer to take a look.


Thanks,
Kyrill

2016-06-07  James Greenhalgh  
 Kyrylo Tkachov  

 * config/aarch64/aarch64-simd.md (*aarch64_simd_vec_copy_lane<mode>):
 New define_insn.
 (*aarch64_simd_vec_copy_lane_<mode><vswap_width_name>): Likewise.

Watch your ChangeLog formatting.

Thanks,
James


2016-06-07  James Greenhalgh  
 Kyrylo Tkachov  

 * gcc.target/aarch64/vget_set_lane_1.c: New test.






Re: [PATCH][AArch64][1/2] Add support for the INS (element) instruction to copy lanes between vectors

2016-06-14 Thread James Greenhalgh
On Tue, Jun 07, 2016 at 05:56:47PM +0100, Kyrill Tkachov wrote:
> Hi all,
> 
> This patch addresses a deficiency we have in handling vector lane-to-lane
> moves in the AArch64 backend.  Generally we can use the INS (element)
> instruction but, as a user complains in
> https://gcc.gnu.org/ml/gcc-help/2016-05/msg00069.html
> we don't. James had a patch adding an appropriate combine pattern some time
> ago (https://gcc.gnu.org/ml/gcc-patches/2013-09/msg01068.html) but it never
> got applied.
> 
> This patch is a rebase of that patch that adds the necessary
> vec_merge+vec_duplicate+vec_select combine pattern.  I chose to use a
> define_insn rather than the define_insn_and_split in that patch that just
> deletes the instruction when the source and destination registers are the
> same, as I think it's not the combine pattern's job to delete the redundant
> instruction but rather some other pass's job. Also, I was not able to create
> a testcase where it would make a difference.
> 
> Also, this patch doesn't reimplement the vcopy*lane* intrinsics from inline
> assembly to a vget_lane+vset_lane combo.  This can be done as a separate
> patch on top of this one.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Also tested on aarch64_be-none-elf.
> 
> Ok for trunk?

This looks OK to me, but as it is based on my code I probably can't
approve it within the spirit of the write access policies (I only have
localized review permission).

Best wait for Richard/Marcus or a global reviewer to take a look.

> 
> Thanks,
> Kyrill
> 
> 2016-06-07  James Greenhalgh  
> Kyrylo Tkachov  
> 
> * config/aarch64/aarch64-simd.md (*aarch64_simd_vec_copy_lane<mode>):
> New define_insn.
> (*aarch64_simd_vec_copy_lane_<mode><vswap_width_name>): Likewise.

Watch your ChangeLog formatting.

Thanks,
James

> 
> 2016-06-07  James Greenhalgh  
> Kyrylo Tkachov  
> 
> * gcc.target/aarch64/vget_set_lane_1.c: New test.




[PATCH][AArch64][1/2] Add support for the INS (element) instruction to copy lanes between vectors

2016-06-07 Thread Kyrill Tkachov

Hi all,

This patch addresses a deficiency we have in handling vector lane-to-lane
moves in the AArch64 backend.
Generally we can use the INS (element) instruction but, as a user complains in 
https://gcc.gnu.org/ml/gcc-help/2016-05/msg00069.html
we don't. James had a patch adding an appropriate combine pattern some time ago 
(https://gcc.gnu.org/ml/gcc-patches/2013-09/msg01068.html)
but it never got applied.

This patch is a rebase of that patch that adds the necessary
vec_merge+vec_duplicate+vec_select combine pattern.
I chose to use a define_insn rather than the define_insn_and_split in that 
patch that just deletes the instruction when
the source and destination registers are the same, as I think it's not the
combine pattern's job to delete the redundant instruction
but rather some other pass's job. Also, I was not able to create a testcase
where it would make a difference.
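
As a plain-C illustration (not part of the patch, and using hypothetical
helper names), the RTL shape the combine pattern matches —
vec_merge of a vec_duplicate of a vec_select'ed lane — is exactly a
lane-to-lane copy, i.e. what one INS (element) instruction performs:

```c
#include <assert.h>

#define NLANES 4

/* Model of (vec_merge (vec_duplicate (vec_select src [j])) dst (1 << i)):
   insert lane j of src into lane i of dst, leaving other lanes alone.  */
static void
vec_copy_lane (int dst[NLANES], const int src[NLANES], int i, int j)
{
  int dup[NLANES];

  /* vec_duplicate (vec_select src [j]): broadcast lane j of src.  */
  for (int k = 0; k < NLANES; k++)
    dup[k] = src[j];

  /* vec_merge with one-hot mask (1 << i): lane k is taken from the
     duplicate when bit k of the mask is set, else from the old dst.  */
  for (int k = 0; k < NLANES; k++)
    if (((1 << i) >> k) & 1)
      dst[k] = dup[k];
}
```

Because the mask has exactly one bit set, the whole expression collapses
to a single-lane insert, which is why a define_insn emitting one "ins"
suffices.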

Also, this patch doesn't reimplement the vcopy*lane* intrinsics from inline
assembly to a vget_lane+vset_lane combo.
This can be done as a separate patch on top of this one.
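
For reference, the lane-number bookkeeping in the patterns' output code can
be sketched in standalone C. These are simplified stand-ins for GCC's
exact_log2 and ENDIAN_LANE_N (the real ones live in GCC's internals), written
here only to show the arithmetic:

```c
#include <assert.h>

/* Stand-in for exact_log2: the vec_merge mask operand has a single bit
   set, and that bit's position is the destination lane number.  */
static int
my_exact_log2 (unsigned long long x)
{
  int n = 0;
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;			/* not a power of two */
  while ((x >>= 1) != 0)
    n++;
  return n;
}

/* Stand-in for ENDIAN_LANE_N: on big-endian targets GCC's lane numbering
   is reversed relative to the architectural one, so lane L of an N-lane
   vector maps to N - 1 - L.  */
static int
my_endian_lane_n (int nunits, int lane, int big_endian)
{
  return big_endian ? nunits - 1 - lane : lane;
}
```

In the patterns, the one-hot mask in operand 2 is decoded with exact_log2,
remapped through ENDIAN_LANE_N, and re-encoded as a mask, which is what lets
the same pattern assemble correctly on both aarch64 and aarch64_be.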

Bootstrapped and tested on aarch64-none-linux-gnu.
Also tested on aarch64_be-none-elf.

Ok for trunk?

Thanks,
Kyrill

2016-06-07  James Greenhalgh  
Kyrylo Tkachov  

* config/aarch64/aarch64-simd.md (*aarch64_simd_vec_copy_lane<mode>):
New define_insn.
(*aarch64_simd_vec_copy_lane_<mode><vswap_width_name>): Likewise.

2016-06-07  James Greenhalgh  
Kyrylo Tkachov  

* gcc.target/aarch64/vget_set_lane_1.c: New test.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 6ea35bf487eaa47dd78742e3eae7507b6875ba1a..5600e5bd0a94fd7efd704a4b13d95d993fd5b62f 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -555,6 +555,49 @@ (define_insn "aarch64_simd_vec_set<mode>"
   [(set_attr "type" "neon_from_gp, neon_ins, neon_load1_1reg")]
 )
 
+(define_insn "*aarch64_simd_vec_copy_lane<mode>"
+  [(set (match_operand:VALL 0 "register_operand" "=w")
+	(vec_merge:VALL
+	    (vec_duplicate:VALL
+	      (vec_select:<VEL>
+		(match_operand:VALL 3 "register_operand" "w")
+		(parallel
+		  [(match_operand:SI 4 "immediate_operand" "i")])))
+	    (match_operand:VALL 1 "register_operand" "0")
+	    (match_operand:SI 2 "immediate_operand" "i")))]
+  "TARGET_SIMD"
+  {
+    int elt = ENDIAN_LANE_N (<MODE>mode, exact_log2 (INTVAL (operands[2])));
+    operands[2] = GEN_INT (HOST_WIDE_INT_1 << elt);
+    operands[4] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[4])));
+
+    return "ins\t%0.<Vetype>[%p2], %3.<Vetype>[%4]";
+  }
+  [(set_attr "type" "neon_ins")]
+)
+
+(define_insn "*aarch64_simd_vec_copy_lane_<mode><vswap_width_name>"
+  [(set (match_operand:VALL 0 "register_operand" "=w")
+	(vec_merge:VALL
+	    (vec_duplicate:VALL
+	      (vec_select:<VEL>
+		(match_operand:<VSWAP_WIDTH> 3 "register_operand" "w")
+		(parallel
+		  [(match_operand:SI 4 "immediate_operand" "i")])))
+	    (match_operand:VALL 1 "register_operand" "0")
+	    (match_operand:SI 2 "immediate_operand" "i")))]
+  "TARGET_SIMD"
+  {
+    int elt = ENDIAN_LANE_N (<MODE>mode, exact_log2 (INTVAL (operands[2])));
+    operands[2] = GEN_INT (HOST_WIDE_INT_1 << elt);
+    operands[4] = GEN_INT (ENDIAN_LANE_N (<VSWAP_WIDTH>mode,
+					  INTVAL (operands[4])));
+
+    return "ins\t%0.<Vetype>[%p2], %3.<Vetype>[%4]";
+  }
+  [(set_attr "type" "neon_ins")]
+)
+
 (define_insn "aarch64_simd_lshr<mode>"
  [(set (match_operand:VDQ_I 0 "register_operand" "=w")
(lshiftrt:VDQ_I (match_operand:VDQ_I 1 "register_operand" "w")
diff --git a/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c b/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c
new file mode 100644
index ..07a77de319206c5c6dad1c0d2d9bcc998583f9c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c
@@ -0,0 +1,72 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#include "arm_neon.h"
+
+#define BUILD_TEST(TYPE1, TYPE2, Q1, Q2, SUFFIX, INDEX1, INDEX2)	\
+TYPE1 __attribute__((noinline,noclone))\
+test_copy##Q1##_lane##Q2##_##SUFFIX (TYPE1 a, TYPE2 b)			\
+{	\
+  return vset##Q1##_lane_##SUFFIX (vget##Q2##_lane_##SUFFIX (b, INDEX2),\
+a, INDEX1);\
+}
+
+BUILD_TEST (poly8x8_t, poly8x8_t, , , p8, 7, 6)
+BUILD_TEST (int8x8_t,  int8x8_t,  , , s8, 7, 6)
+BUILD_TEST (uint8x8_t, uint8x8_t, , , u8, 7, 6)
+/* { dg-final { scan-assembler-times "ins\\tv0.b\\\[7\\\], v1.b\\\[6\\\]" 3 } } */
+BUILD_TEST (poly16x4_t, poly16x4_t, , , p16, 3, 2)
+BUILD_TEST (int16x4_t,  int16x4_t,  , , s16, 3, 2)
+BUILD_TEST (uint16x4_t, uint16x4_t, , , u16, 3, 2)
+/* { dg-final { scan-assembler-times "ins\\tv0.h\\\[3\\\], v1.h\\\[2\\\]" 3 } } */
+BUILD_TEST (float32x2_t, float32x2_t, , , f32, 1, 0)
+BUILD_TEST (int32x2_t,   int32x2_t,   , , s32, 1, 0)
+BUILD_TEST (uint32x2_t,  uint32x2_t,  , , u32, 1, 0)
+/* { dg-final { scan-assembler-times "ins\\tv0.s\\\[1\\\], v1.s\\\[0\\\]" 3 } } */
+
+BUILD_TEST (poly8x8_t, poly8x16_t, , q, p8, 7, 15)